KHTTP_PARSE(3) Library Functions Manual KHTTP_PARSE(3)

khttp_parse, khttp_parsex - parse a CGI instance for kcgi

library “libkcgi”

#include <sys/types.h>
#include <stdarg.h>
#include <stdint.h>
#include <kcgi.h>

enum kcgi_err
khttp_parse(struct kreq *req, const struct kvalid *keys, size_t keysz, const char *const *pages, size_t pagesz, size_t defpage);

enum kcgi_err
khttp_parsex(struct kreq *req, const struct kmimemap *suffixes, const char *const *mimes, size_t mimesz, const struct kvalid *keys, size_t keysz, const char *const *pages, size_t pagesz, size_t defmime, size_t defpage, void *arg, void (*argfree)(void *arg), unsigned int debugging, const struct kopts *opts);

extern const char *const kmimetypes[KMIME__MAX];
extern const char *const khttps[KHTTP__MAX];
extern const char *const kschemes[KSCHEME__MAX];
extern const char *const kmethods[KMETHOD__MAX];
extern const struct kmimemap ksuffixmap[];
extern const char *const ksuffixes[KMIME__MAX];

The khttp_parse() and khttp_parsex() functions parse and validate input and the HTTP environment (compression, paths, MIME types, and so on). They are the central functions in the kcgi(3) library, parsing and validating key-value form (query string, message body, cookie) data and opaque message bodies.

They must be matched by khttp_free(3) if and only if the return value is KCGI_OK. Otherwise, resources are internally freed.

The collective arguments are as follows:

arg
A pointer to private application data. It is not touched unless argfree is provided.
argfree
Function invoked with arg by the child process starting to parse untrusted network data. This makes sure that no unnecessary data is leaked into the child.
debugging
This bit-field enables debugging of the underlying parse and/or write routines. It may have KREQ_DEBUG_WRITE for writes and KREQ_DEBUG_READ_BODY for the pre-parsed body. Debugging messages to kutil_info(3) consist of the process ID followed by "-tx" or "-rx" for writing or reading, a colon and space, then the logged data. A newline will flush the existing line, as will reaching 80 characters. If flushed at 80 characters and not at a newline, an ellipsis will follow the line. The total logged bytes will be emitted at the end of all reads or writes.
defmime
If no MIME type is specified (that is, there's no suffix to the page request), use this index in the mimes array.
defpage
If no page was specified (e.g., the default landing page), this is provided as the requested page index.
keys
An optional array of input and validation fields or NULL.
keysz
The number of elements in keys.
mimesz
The number of elements in mimes. Also the MIME index used if no MIME type was matched. This differs from defmime, which is used if there is no MIME suffix at all.
mimes
An array of MIME types (e.g., “text/html”), mapped into a MIME index during MIME body parsing. This relates both to pages and input fields with a body type. Any array should include at least text/plain, as this is the default content type for MIME documents.
opts
Tunable options regarding socket buffer sizes and so on. If set to NULL, meaningful defaults are used.
pages
An array of recognised pathnames. When pathnames are parsed, they're matched to indices in this array.
pagesz
The number of elements in pages. Also the page index used if the requested page was not in pages.
req
This structure is cleared and filled with input fields and HTTP context parsed from the CGI environment. It is the main structure carried around in a kcgi(3) application.
suffixes
Define the MIME type (suffix) mapping.
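
The relationship between pages, pagesz, and defpage can be sketched as a plain lookup. The function below is not part of kcgi(3): lookup_page is a hypothetical name, and the exact, case-sensitive comparison is an assumption, since this manual does not say how names are compared. It only mirrors the documented results: an empty page name yields defpage and an unknown name yields pagesz.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kcgi function): map a page name to an
 * index the way the pages, pagesz and defpage arguments are described.
 * Assumes a case-sensitive match, which the manual does not specify.
 */
static size_t
lookup_page(const char *pagename, const char *const *pages,
    size_t pagesz, size_t defpage)
{
	size_t i;

	/* No page requested: use the caller's default landing page. */
	if (pagename[0] == '\0')
		return defpage;

	/* Known page: its position in the pages array. */
	for (i = 0; i < pagesz; i++)
		if (strcmp(pages[i], pagename) == 0)
			return i;

	/* Unknown page: the out-of-range index pagesz. */
	return pagesz;
}
```

An application can therefore treat a page index equal to pagesz as "not found" and respond accordingly.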

The first form, khttp_parse(), is for applications using the system-recognised MIME types. This should work well enough for most applications. It is equivalent to invoking the second form, khttp_parsex(), as follows:

khttp_parsex(req, ksuffixmap,
  kmimetypes, KMIME__MAX, keys, keysz,
  pages, pagesz, KMIME_TEXT_HTML,
  defpage, NULL, NULL, 0, NULL);

A struct kreq object is filled in by khttp_parse() and khttp_parsex(). It consists of the following fields:

void *arg
Private application data. This is set during khttp_parse().
enum kauth auth
Type of “managed” HTTP authorisation performed by the web server according to the AUTH_TYPE header variable, if any. This is KAUTH_DIGEST for the AUTH_TYPE of "digest", KAUTH_BASIC for "basic", KAUTH_BEARER for "bearer", KAUTH_UNKNOWN for other values of AUTH_TYPE, or KAUTH_NONE if AUTH_TYPE is not set. See the rawauth field for raw (i.e., not processed by the web server) authorisation requests.
struct kpair **cookiemap
An array of keysz singly linked lists of elements of the cookies array. If cookie->key is equal to one of the entries of keys and cookie->state is KPAIR_VALID or KPAIR_UNCHECKED, the cookie is added to the list cookiemap[cookie->keypos]. Empty lists are NULL. If a list contains more than one cookie, cookie->next points to the next cookie. For the last cookie in a list, cookie->next is NULL.
struct kpair **cookienmap
Similar to cookiemap, except that it contains the cookies where cookie->state is KPAIR_INVALID.
struct kpair *cookies
Key-value pairs read from request cookies found in the HTTP_COOKIE header variable, or NULL if cookiesz is 0. See fields for key-value pairs from the request query string or message body.
size_t cookiesz
The size of the cookies array.
struct kpair **fieldmap
Similar to cookiemap, except that the lists contain elements of the fields array.
struct kpair **fieldnmap
Similar to fieldmap, except that it contains the fields where field->state is KPAIR_INVALID.
struct kpair *fields
Key-value pairs read from the QUERY_STRING header variable and from the message body, or NULL if fieldsz is 0. See cookies for key-value pairs from request cookies.
size_t fieldsz
The number of elements in the fields array.
char *fullpath
The full requested path as contained in the PATH_INFO header variable. For example, requesting "https://bsd.lv/app.cgi/dir/file.html?q=v", where "app.cgi" is the CGI program, this value would be /dir/file.html. It is not guaranteed to start with a slash and it may be an empty string.
char *host
The host name received in the HTTP_HOST header variable. When using name-based virtual hosting, this is typically the virtual host name specified by the client in the HTTP request, and it should not be confused with the canonical DNS name of the host running the web server. For example, a request to "https://bsd.lv/app.cgi/file" would have a host of "bsd.lv". If HTTP_HOST is not defined, host is set to "localhost".
struct kdata *kdata
Internal data. Should not be touched.
const struct kvalid *keys
Value passed to khttp_parse().
size_t keysz
Value passed to khttp_parse().
enum kmethod method
The KMETHOD_ACL, KMETHOD_CONNECT, KMETHOD_COPY, KMETHOD_DELETE, KMETHOD_GET, KMETHOD_HEAD, KMETHOD_LOCK, KMETHOD_MKCALENDAR, KMETHOD_MKCOL, KMETHOD_MOVE, KMETHOD_OPTIONS, KMETHOD_POST, KMETHOD_PROPFIND, KMETHOD_PROPPATCH, KMETHOD_PUT, KMETHOD_REPORT, KMETHOD_TRACE, or KMETHOD_UNLOCK submission method obtained from the REQUEST_METHOD header variable. If an unknown method was requested, KMETHOD__MAX is used. If no method was specified, the default is KMETHOD_GET.

Applications will usually accept only KMETHOD_GET and KMETHOD_POST, so be sure to emit a KHTTP_405 status for undesired methods.

size_t mime
The MIME type of the requested file as determined by its suffix matched to the suffixes map passed to khttp_parsex() or the default ksuffixmap if using khttp_parse(). This defaults to the mimesz value passed to khttp_parsex() or the default KMIME__MAX if using khttp_parse() when no suffix is specified or when the suffix is specified but not known.
size_t page
The page index found by looking up pagename in the pages array. If pagename is not found in pages, pagesz is used; if pagename is empty, defpage is used.
char *pagename
The first component of fullpath or an empty string if there is none. It is compared to the elements of the pages array to determine which page it corresponds to. For example, for a fullpath of "/dir/file.html" this component corresponds to dir. For "/file.html", it's file.
char *path
The middle part of fullpath, after stripping pagename/ at the beginning and .suffix at the end, or an empty string if there is none. For example, if the fullpath is bar/baz.html, this component is baz.
char *pname
The script name received in the SCRIPT_NAME header variable. For example, for a request to a CGI program /var/www/cgi-bin/app.cgi mapped by the web server from "https://bsd.lv/app.cgi/file", this would be app.cgi. This may not reflect a file system entity and it may be an empty string.
uint16_t port
The server's receiving TCP port according to the SERVER_PORT header variable, or 80 if that is not defined or an invalid number.
struct khttpauth rawauth
The raw authorisation request according to the HTTP_AUTHORIZATION header variable passed by the web server. This is only set if the web server is not managing authorisation itself.
char *remote
The string form of the client's IPv4 or IPv6 address taken from the REMOTE_ADDR header variable, or "127.0.0.1" if that is not defined. The address format of the string is not checked.
struct khead *reqmap[KREQU__MAX]
Mapping of enum krequ enumeration values to reqs parsed from the input stream.
struct khead *reqs
List of all HTTP request headers, known via enum krequ and not known, parsed from the input stream, or NULL if reqsz is 0.
size_t reqsz
Number of request headers in reqs.
enum kscheme scheme
The access scheme according to the HTTPS header variable, either KSCHEME_HTTPS if HTTPS is set and equal to the string "on" or KSCHEME_HTTP otherwise.
char *suffix
The suffix part of the last component of fullpath or an empty string if there is none. For example, if the fullpath is /bar/baz.html, this component is html. See the mime field for the MIME type parsed from the suffix.
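
The way fullpath decomposes into pagename, path, and suffix can be illustrated with a small stand-alone function. This is a sketch of the documented outcome only, not the code kcgi(3) uses: struct pathparts, split_fullpath, and the fixed buffer sizes are all hypothetical, and overly long components are silently truncated here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three documented components. */
struct pathparts {
	char pagename[64];
	char path[256];
	char suffix[32];
};

/*
 * Hypothetical helper: split a PATH_INFO-style string the way the
 * pagename, path and suffix fields are described in this manual.
 */
static void
split_fullpath(const char *fullpath, struct pathparts *out)
{
	const char *p, *end, *slash, *dot;

	/* The leading slash, if any, is not part of any component. */
	p = fullpath;
	if (*p == '/')
		p++;
	end = p + strlen(p);

	/* Suffix: text after the last '.' of the last component. */
	slash = strrchr(p, '/');
	dot = strrchr(slash != NULL ? slash + 1 : p, '.');
	if (dot != NULL) {
		snprintf(out->suffix, sizeof(out->suffix), "%s", dot + 1);
		end = dot;
	} else
		out->suffix[0] = '\0';

	/* Page name: first component; path: whatever lies between. */
	slash = memchr(p, '/', (size_t)(end - p));
	if (slash == NULL) {
		snprintf(out->pagename, sizeof(out->pagename),
		    "%.*s", (int)(end - p), p);
		out->path[0] = '\0';
	} else {
		snprintf(out->pagename, sizeof(out->pagename),
		    "%.*s", (int)(slash - p), p);
		snprintf(out->path, sizeof(out->path),
		    "%.*s", (int)(end - slash - 1), slash + 1);
	}
}
```

For "/dir/file.html" this yields a page name of dir, a path of file, and a suffix of html, matching the examples given for the individual fields.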

The application may optionally define keys provided to khttp_parse() and khttp_parsex() as an array of struct kvalid. This structure is central to the validation of input data. It consists of the following fields:

const char *name
The field name, i.e., how it appears in the HTML form input name. This cannot be NULL. If the field name is an empty string and the HTTP message consists of an opaque body (and not key-value pairs), then that field will be used to validate the HTTP message body. This is useful for KMETHOD_PUT style requests.
int (*)(struct kpair *) valid
A validation function returning non-zero if parsing and validation succeed or 0 otherwise. If it is NULL, then no validation is performed, the data is considered as valid, and it is bucketed into cookiemap or fieldmap as such.

User-defined valid functions usually set the type and parsed fields in the key-value pair. When working with binary data or with a key that can take different data types, it is acceptable for a validation function to set the type to KPAIR__MAX and for the application to ignore the parsed field and to work directly with val and valsz.

The validation function is allowed to allocate new memory for val: if the val pointer changes during validation, the memory pointed to after validation will be freed with free(3) after the data is passed out of the sandbox.

These functions are invoked from within a system-specific sandbox that may not allow some system calls, for example opening files or sockets. In other words, validation functions should only do pure computation.
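
A validation function in this style might look like the following sketch. So that it compiles without kcgi(3), it uses stand-in types: struct pair_sketch and enum pairtype_sketch are hypothetical and carry only the val, valsz, type, and parsed members described in this manual. A real validator would take a struct kpair pointer instead. The accepted range of 0 to 150 is likewise an arbitrary choice for illustration. The function does pure computation only, as the sandbox requires.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the documented kcgi types; not the real definitions. */
enum pairtype_sketch { PAIR_INTEGER, PAIR_STRING, PAIR_DOUBLE, PAIR__MAX };

struct pair_sketch {
	char *val;		/* raw input, val[valsz] is NUL */
	size_t valsz;		/* length of val in bytes */
	enum pairtype_sketch type;
	union {
		int64_t i;
		const char *s;
		double d;
	} parsed;
};

/*
 * Hypothetical validator: accept a decimal integer in [0, 150].
 * Returns non-zero on success and 0 on failure, as valid requires.
 */
static int
valid_age(struct pair_sketch *p)
{
	char *ep;
	long long v;

	/* Reject empty input and input with embedded NUL bytes. */
	if (p->valsz == 0 || strlen(p->val) != p->valsz)
		return 0;

	errno = 0;
	v = strtoll(p->val, &ep, 10);
	if (errno == ERANGE || ep == p->val || *ep != '\0')
		return 0;
	if (v < 0 || v > 150)
		return 0;

	/* Record the parsed value and its type for the application. */
	p->type = PAIR_INTEGER;
	p->parsed.i = v;
	return 1;
}
```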

The struct kpair structure presents the user with fields parsed from input and (possibly) matched to the keys variable passed to khttp_parse() and khttp_parsex(). It is also passed to the validation function to be filled in. In this case, the MIME-related fields are already filled in and may be examined to determine the method of validation. This is useful when validating opaque message bodies.

char *ctype
The value's MIME content type (e.g., image/jpeg), or an empty string if not defined.
size_t ctypepos
If ctype is not empty, it is looked up in the mimes parameter passed to khttp_parsex() or kmimetypes if using khttp_parse(). If found, ctypepos is set to the matching index. Otherwise, it's mimesz.
char *file
The value's MIME source filename or an empty string if not defined.
char *key
The NUL-terminated key (input) name. If the HTTP message body is opaque (e.g., KMETHOD_PUT), then an empty-string key is cooked up. The key may contain an arbitrary sequence of non-NUL bytes, even non-ASCII bytes, control characters, and shell metacharacters.
size_t keypos
If found in the keys array passed to khttp_parse(), the index of the matching key. Otherwise keysz.
struct kpair *next
In a cookie or field map, next points to the next parsed key-value pair with the same key name. This occurs most often in HTML checkbox forms, where many fields may have the same name.
union parsed parsed
The parsed, validated value. This may be an integer i, for a 64-bit signed integer; a string s, for a NUL-terminated character string; or a double d, for a double-precision floating-point number. This is intentionally basic because the resulting data must be reliably passed from the parsing context back into the web application.
enum kpairstate state
The validation state: KPAIR_VALID if the pair was successfully validated by a validation function, KPAIR_INVALID if a validation function was invoked but failed, or KPAIR_UNCHECKED if no validation function is defined for this key.
enum kpairtype type
If parsed, the type of data in parsed, otherwise KPAIR__MAX.
char *val
The (input) value, which may contain an arbitrary sequence of bytes, even NUL bytes, non-ASCII bytes, control characters, and shell metacharacters. The byte following the end of the array, val[valsz], is always guaranteed to be NUL. The validation function may modify the contents. For example, for integer numbers and e-mail addresses, trailing whitespace may be replaced with NUL bytes.
size_t valsz
The length of the val buffer in bytes. It is not a string length.
char *xcode
The value's MIME content transfer encoding (e.g., base64), or an empty string if not defined.
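
Two consequences of the fields above are worth illustrating: pairs sharing a key are chained through next, and val must be handled as a byte buffer of valsz bytes, not as a C string. The sketch below uses a hypothetical stand-in, struct pair_sketch, holding only the val, valsz, and next members described here. The helper names are likewise invented for illustration and are not kcgi(3) functions.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the documented members of struct kpair. */
struct pair_sketch {
	const char *val;
	size_t valsz;
	struct pair_sketch *next;
};

/*
 * Hypothetical helper: count the pairs in one map bucket, such as
 * all checked boxes that share a single field name.
 */
static size_t
count_values(const struct pair_sketch *head)
{
	size_t n = 0;

	/* The last pair in a bucket has a NULL next pointer. */
	for (; head != NULL; head = head->next)
		n++;
	return n;
}

/*
 * Hypothetical helper: report whether val contains a NUL byte before
 * val[valsz], in which case string functions would see it truncated.
 */
static int
has_embedded_nul(const struct pair_sketch *p)
{
	return memchr(p->val, '\0', p->valsz) != NULL;
}
```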

The struct khttpauth structure holds authorisation data if passed by the server. The specific fields are as follows.

enum kauth type
If no data was passed by the server, the type value is KAUTH_NONE. Otherwise it's KAUTH_BASIC, KAUTH_BEARER, or KAUTH_DIGEST. KAUTH_UNKNOWN signals that the authorisation type was not recognised.
int authorised
For KAUTH_BASIC, KAUTH_BEARER, or KAUTH_DIGEST authorisation, this field indicates whether all required values were specified for the application to perform authorisation.
char *digest
An MD5 digest of REQUEST_METHOD, SCRIPT_NAME, PATH_INFO, header variables and the request body. It is not a NUL-terminated string, but an array of exactly MD5_DIGEST_LENGTH bytes. Only filled in when HTTP_AUTHORIZATION is "digest" and authorised is non-zero. Otherwise, it remains NULL. Used in khttpdigest_validatehash(3).
d
An anonymous union containing parsed fields per type: struct khttpbasic basic for KAUTH_BASIC or KAUTH_BEARER, or struct khttpdigest digest for KAUTH_DIGEST.

If the field for an HTTP authorisation request is KAUTH_BASIC or KAUTH_BEARER, it will consist of the following for its parsed entities in its struct khttpbasic structure:

response
The hashed and encoded response string for KAUTH_BASIC, or an opaque string for KAUTH_BEARER.

If the field for an HTTP authorisation request is KAUTH_DIGEST, it will consist of the following in its struct khttpdigest structure:

alg
The encoding algorithm, parsed from the possible MD5 or MD5-Sess values.
qop
The quality of protection algorithm, which may be unspecified, auth, or auth-int.
user
The user coordinating the request.
uri
The URI for which the request is designated. (This must match the request URI).
realm
The request realm.
nonce
The server-generated nonce value.
cnonce
The (optional) client-generated nonce value.
response
The hashed and encoded response string, which entangles fields depending on algorithm and quality of protection.
count
The (optional) cnonce counter.
opaque
The (optional) opaque string requested by the server.

The struct kopts structure consists of tunables for network performance. You probably don't want to use these unless you really know what you're doing!

sndbufsz
The size of the output buffer. The output buffer is a heap-allocated region into which writes (via khttp_write(3) and khttp_head(3)) are buffered instead of being flushed directly to the wire. The buffer is flushed when it is full, when the HTTP headers are flushed, and when khttp_free(3) is invoked. If the buffer size is zero, writes are flushed immediately to the wire. If the buffer size is less than zero, it is filled with a meaningful default.

Lastly, the struct khead structure holds parsed HTTP headers.

key
Holds the HTTP header name. This is not the CGI header name (e.g., HTTP_COOKIE), but the reconstituted HTTP name (e.g., Cookie).
val
The opaque header value, which may be an empty string.

A number of variables are defined in <kcgi.h> to simplify invocations of the khttp_parse() family. Applications are strongly encouraged to use these variables (and associated enumerations) in khttp_parse() instead of overriding them with hand-rolled sets in khttp_parsex().

kmimetypes
Indexed list of common MIME types, for example, “text/html” and “application/json”. Corresponds to enum kmime.
khttps
Indexed list of HTTP status code and identifier, for example, “200 OK”. Corresponds to enum khttp.
kschemes
Indexed list of URL schemes, for example, “https” or “ftp”. Corresponds to enum kscheme.
kmethods
Indexed list of HTTP methods, for example, “GET” and “POST”. Corresponds to enum kmethod.
ksuffixmap
Map of MIME types defined in enum kmime to possible suffixes. This array is terminated with a MIME type of KMIME__MAX and name NULL.
ksuffixes
Indexed list of canonical suffixes for MIME types corresponding to enum kmime. This may be a NULL pointer for types that have no canonical suffix, for example, “application/octet-stream”.

khttp_parse() and khttp_parsex() return an error code:

KCGI_OK
Success (not an error).
KCGI_ENOMEM
Memory failure. This can occur in many places: spawning a child, allocating memory, creating sockets, etc.
KCGI_ENFILE
Could not allocate file descriptors.
KCGI_EAGAIN
Could not spawn a child.
KCGI_FORM
Malformed data between parent and child whilst parsing an HTTP request. (Internal system error.)
KCGI_SYSTEM
Opaque operating system error.

On failure, the calling application should terminate as soon as possible. Applications should not try to write an HTTP 505 error or similar, but should allow the web server to handle the empty CGI response on its own.

kcgi(3), khttp_free(3)

The khttp_parse() and khttp_parsex() functions were written by Kristaps Dzonsons <kristaps@bsd.lv>.

September 15, 2024 OpenBSD 7.5