|KHTTP_PARSE(3)||Library Functions Manual||KHTTP_PARSE(3)|
khttp_parse, khttp_parsex — parse a CGI
instance for kcgi
khttp_parse(struct kreq *req,
const struct kvalid *keys, size_t
keysz, const char *const *pages,
size_t pagesz, size_t defpage);
khttp_parsex(struct kreq *req,
const struct kmimemap *suffixes, const
char *const *mimes, size_t mimesz,
const struct kvalid *keys, size_t
keysz, const char *const *pages,
size_t pagesz, size_t defmime,
size_t defpage, void *arg,
void (*argfree)(void *arg), unsigned
int debugging, const struct kopts *opts);
extern const char *const kmimetypes[KMIME__MAX];
extern const char *const khttps[KHTTP__MAX];
extern const char *const kschemes[KSCHEME__MAX];
extern const char *const kmethods[KMETHOD__MAX];
extern const struct kmimemap ksuffixmap;
extern const char *const ksuffixes[KMIME__MAX];
The khttp_parse() and khttp_parsex() functions parse and validate
input and the HTTP environment (compression, paths, MIME types, and so on).
They are the central functions in the kcgi(3) library,
parsing and validating key-value form (query string, message body, cookie)
data and opaque message bodies.
They must be matched by khttp_free(3) if and
only if the return value indicates success. Otherwise,
resources are internally freed.
The collective arguments are as follows:
- arg
- A pointer to private application data. It is not touched unless argfree is provided.
- argfree
- Function invoked with arg by the child process starting to parse untrusted network data. This makes sure that no unnecessary data is leaked into the child.
- debugging
- This bit-field enables debugging of the underlying parse and/or write routines. It may have KREQ_DEBUG_WRITE for writes and KREQ_DEBUG_READ_BODY for the pre-parsed body. Debugging messages to kutil_info(3) consist of the process ID followed by "-tx" or "-rx" for writing or reading, a colon and space, then the logged data. A newline will flush the existing line, as will reaching 80 characters. If flushed at 80 characters and not a newline, an ellipsis will follow the line. The total logged bytes will be emitted at the end of all reads or writes.
- defmime
- If no MIME type is specified (that is, there's no suffix to the page request), use this index in the mimes array.
- defpage
- If no page was specified (e.g., the default landing page), this is provided as the requested page index.
- keys
- An optional array of input and validation fields, or NULL if there are none.
- keysz
- The number of elements in keys.
- mimesz
- The number of elements in mimes. Also the MIME index used if no MIME type was matched. This differs from defmime, which is used if there is no MIME suffix at all.
- mimes
- An array of MIME types (e.g., “text/html”), mapped into a MIME index during MIME body parsing. This relates both to pages and input fields with a body type. Any array should include at least text/plain, as this is the default content type for MIME documents.
- opts
- Tunable options regarding socket buffer sizes and so on. If set to NULL, meaningful defaults are used.
- pages
- An array of recognised pathnames. When pathnames are parsed, they're matched to indices in this array.
- pagesz
- The number of pages in pages. Also used if the requested page was not in pages.
- req
- This structure is cleared and filled with input fields and HTTP context parsed from the CGI environment. It is the main structure carried around in a kcgi(3) application.
- suffixes
- Define the MIME type (suffix) mapping.
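The fallback rules for defpage, pagesz, defmime and mimesz described above can be sketched in plain C. This is not the library's implementation: struct demo_suffix, demo_page_index(), demo_mime_index() and the sample tables are hypothetical stand-ins, and exact string matching is an assumption made for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one suffix-to-MIME mapping entry. */
struct demo_suffix {
	const char *name; /* suffix without the dot, e.g. "html" */
	size_t mime;      /* index into the mimes array */
};

/* Sample tables used only for illustration. */
static const char *const demo_pages[] = { "index", "about" };
static const struct demo_suffix demo_map[] = {
	{ "html", 0 }, { "htm", 0 }, { "txt", 1 }
};

/*
 * Resolve a page name: defpage when the name is empty, the matching
 * index when it is found, and pagesz when it is unknown.
 */
static size_t
demo_page_index(const char *name, const char *const *pages,
    size_t pagesz, size_t defpage)
{
	size_t i;

	if (name[0] == '\0')
		return defpage;
	for (i = 0; i < pagesz; i++)
		if (strcmp(name, pages[i]) == 0)
			return i;
	return pagesz;
}

/*
 * Resolve a suffix: defmime when there is no suffix at all, the mapped
 * index when the suffix is known, and mimesz when it is unknown.
 */
static size_t
demo_mime_index(const char *suffix, const struct demo_suffix *map,
    size_t mapsz, size_t mimesz, size_t defmime)
{
	size_t i;

	if (suffix[0] == '\0')
		return defmime;
	for (i = 0; i < mapsz; i++)
		if (strcmp(suffix, map[i].name) == 0)
			return map[i].mime;
	return mimesz;
}
```

With these tables, demo_page_index("nope", demo_pages, 2, 0) yields 2 (the pagesz value), so an application can treat an index equal to pagesz as "page not found" in a switch over its page enumeration.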
The first form, khttp_parse(),
is for applications using the system-recognised MIME types. This should work
well enough for most applications. It is equivalent to invoking the second
form, khttp_parsex(), as follows:
khttp_parsex(req, ksuffixmap, kmimetypes, KMIME__MAX, keys, keysz, pages, pagesz, KMIME_TEXT_HTML, defpage, NULL, NULL, 0, NULL);
A struct kreq object is filled in by
khttp_parse() and khttp_parsex(). It consists of the following fields:
- void *arg
- Private application data. This is set during parsing from the arg argument.
- enum kauth auth
- Type of “managed” HTTP authorisation performed by the web server according to the AUTH_TYPE header variable, if any. This is KAUTH_UNKNOWN for values of AUTH_TYPE that are not recognised, or KAUTH_NONE if AUTH_TYPE is not set. See the rawauth field for raw authorisation requests.
- struct kpair **cookiemap
- An array of keysz singly linked lists of elements of the cookies array. If cookie->key is equal to one of the entries of keys and cookie->state is KPAIR_VALID or KPAIR_UNCHECKED, the cookie is added to the list cookiemap[cookie->keypos]. Empty lists are NULL. If a list contains more than one cookie, cookie->next points to the next cookie. For the last cookie in a list, cookie->next is NULL.
- struct kpair **cookienmap
- Similar to cookiemap, except that it contains the cookies where cookie->state is KPAIR_INVALID.
- struct kpair *cookies
- Key-value pairs read from request cookies found in the HTTP_COOKIE header variable, or NULL if cookiesz is 0. See fields for key-value pairs from the request query string or message body.
- size_t cookiesz
- The size of the cookies array.
- struct kpair **fieldmap
- Similar to cookiemap, except that the lists contain elements of the fields array.
- struct kpair **fieldnmap
- Similar to fieldmap, except that it contains the fields where field->state is KPAIR_INVALID.
- struct kpair *fields
- Key-value pairs read from the QUERY_STRING header variable and from the message body, or NULL if fieldsz is 0. See cookies for key-value pairs from request cookies.
- size_t fieldsz
- The number of elements in the fields array.
- char *fullpath
- The full requested path as contained in the PATH_INFO header variable. For example, requesting "https://bsd.lv/app.cgi/dir/file.html?q=v", where "app.cgi" is the CGI program, this value would be /dir/file.html. It is not guaranteed to start with a slash and it may be an empty string.
- char *host
- The host name received in the HTTP_HOST header variable. When using name-based virtual hosting, this is typically the virtual host name specified by the client in the HTTP request, and it should not be confused with the canonical DNS name of the host running the web server. For example, a request to "https://bsd.lv/app.cgi/file" would have a host of "bsd.lv". If HTTP_HOST is not defined, host is set to "localhost".
- struct kdata *kdata
- Internal data. Should not be touched.
- const struct kvalid *keys
- Value passed to khttp_parse() or khttp_parsex().
- size_t keysz
- Value passed to khttp_parse() or khttp_parsex().
- enum kmethod method
- The submission method, one of the enum kmethod values up to KMETHOD_UNLOCK, obtained from the REQUEST_METHOD header variable. If an unknown method was requested, KMETHOD__MAX is used. If no method was specified, the default is KMETHOD_GET. Applications will usually accept only KMETHOD_GET and KMETHOD_POST, so be sure to emit a KHTTP_405 status for undesired methods.
- size_t mime
- The MIME type of the requested file as determined by its suffix matched against the suffixes map passed to khttp_parsex(), or against ksuffixmap if using khttp_parse(). When no suffix is specified, this is the defmime value passed to khttp_parsex() (KMIME_TEXT_HTML for khttp_parse()). When the suffix is specified but not known, this is the mimesz value passed to khttp_parsex() (KMIME__MAX for khttp_parse()).
- size_t page
- The page index found by looking up pagename in the pages array. If pagename is not found in pages, pagesz is used; if pagename is empty, defpage is used.
- char *pagename
- The first component of fullpath or an empty string if there is none. It is compared to the elements of the pages array to determine which page it corresponds to. For example, for a fullpath of "/dir/file.html" this component corresponds to dir. For "/file.html", it's file.
- char *path
- The middle part of fullpath, after stripping pagename/ at the beginning and .suffix at the end, or an empty string if there is none. For example, if the fullpath is bar/baz.html, this component is baz.
- char *pname
- The script name received in the SCRIPT_NAME header variable. For example, for a request to a CGI program /var/www/cgi-bin/app.cgi mapped by the web server from "https://bsd.lv/app.cgi/file", this would be app.cgi. This may not reflect a file system entity and it may be an empty string.
- uint16_t port
- The server's receiving TCP port according to the SERVER_PORT header variable, or 80 if that is not defined or an invalid number.
- struct khttpauth rawauth
- The raw authorization request according to the HTTP_AUTHORIZATION header variable passed by the web server. Some web servers, for example Apache, do not set this variable by default.
- char *remote
- The string form of the client's IPv4 or IPv6 address taken from the REMOTE_ADDR header variable, or "127.0.0.1" if that is not defined. The address format of the string is not checked.
- struct khead
- Mapping of enum krequ enumeration values to reqs parsed from the input stream.
- struct khead *reqs
- List of all HTTP request headers, both those known via enum krequ and those not known, parsed from the input stream, or NULL if reqsz is 0.
- size_t reqsz
- Number of request headers in reqs.
- enum kscheme scheme
- The access scheme according to the HTTPS header variable: HTTPS if the variable is set and equal to the string "on", or HTTP otherwise.
- char *suffix
- The suffix part of the last component of fullpath or an empty string if there is none. For example, if the fullpath is /bar/baz.html, this component is html. See the mime field for the MIME type parsed from the suffix.
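The relationship between fullpath, pagename, path and suffix can be illustrated with a small stand-alone sketch. It only reproduces the examples given above and is not the library's parser; struct demo_path and demo_split() are hypothetical names, and the fixed 64-byte buffers (longer components are truncated) are an assumption made to keep the example short.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three components of a request path. */
struct demo_path {
	char pagename[64];
	char path[64];
	char suffix[64];
};

/* Split "/page/middle.suffix" into its parts; missing parts are "". */
static void
demo_split(const char *fullpath, struct demo_path *out)
{
	const char *start, *end, *slash, *dot;

	memset(out, 0, sizeof(*out));

	/* The leading slash is optional, so skip it only if present. */
	start = fullpath;
	if (*start == '/')
		start++;
	end = start + strlen(start);

	/* The suffix follows the last dot of the last path component. */
	slash = strrchr(start, '/');
	dot = strrchr(start, '.');
	if (dot != NULL && (slash == NULL || dot > slash)) {
		snprintf(out->suffix, sizeof(out->suffix), "%s", dot + 1);
		end = dot;
	}

	/* The page name is everything up to the first slash, if any. */
	slash = memchr(start, '/', (size_t)(end - start));
	if (slash == NULL) {
		snprintf(out->pagename, sizeof(out->pagename), "%.*s",
		    (int)(end - start), start);
		return;
	}
	snprintf(out->pagename, sizeof(out->pagename), "%.*s",
	    (int)(slash - start), start);

	/* Whatever remains between page name and suffix is the path. */
	snprintf(out->path, sizeof(out->path), "%.*s",
	    (int)(end - slash - 1), slash + 1);
}
```

For "/dir/file.html" this gives a pagename of dir, a path of file and a suffix of html; for "/file.html" the pagename is file and the path is empty.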
The application may optionally define
keys provided to
khttp_parse() and khttp_parsex() as an array of
struct kvalid. This structure is central to the
validation of input data. It consists of the following fields:
- const char *name
- The field name, i.e., how it appears in the HTML form input name. This must not be NULL. If the field name is an empty string and the HTTP message consists of an opaque body (and not key-value pairs), then that field will be used to validate the HTTP message body. This is useful for KMETHOD_PUT style requests.
- int (*)(struct kpair *) valid
- A validation function returning non-zero if parsing and validation succeed
or 0 otherwise. If it is
NULL, then no validation is performed, the data is considered as valid, and it is bucketed into cookiemap or fieldmap as such.
User-defined valid functions usually set the type and parsed fields in the key-value pair. When working with binary data or with a key that can take different data types, it is acceptable for a validation function to set the type to KPAIR__MAX and for the application to ignore the parsed field and to work directly with val and valsz.
The validation function is allowed to allocate new memory for val: if the val pointer changes during validation, the memory pointed to after validation will be freed with free(3) after the data is passed out of the sandbox.
These functions are invoked from within a system-specific sandbox that may not allow some system calls, for example opening files or sockets. In other words, validation functions should only do pure computation.
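A validation function in this style might look like the following sketch. It is written against a cut-down, hypothetical struct demo_pair rather than the real struct kpair so that it stands alone; the field names, the demo_type enumeration and the accepted range of 1 to 150 are all assumptions for illustration. It performs pure computation only, in keeping with the sandbox restrictions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parsed-type enumeration. */
enum demo_type { DEMO_INTEGER, DEMO__MAX };

/* Hypothetical, cut-down stand-in for the pair being validated. */
struct demo_pair {
	char *val;           /* raw input bytes; val[valsz] is NUL */
	size_t valsz;        /* length of val in bytes */
	enum demo_type type; /* set by the validator on success */
	int64_t parsed_i;    /* parsed integer value */
};

/*
 * Return non-zero if the value is a decimal integer between 1 and 150
 * inclusive, filling in the type and parsed value; return 0 otherwise.
 */
static int
demo_valid_age(struct demo_pair *p)
{
	char *ep;
	long long v;

	/*
	 * The value is a byte buffer, not a string: reject empty input
	 * and input with embedded NUL bytes before using string calls.
	 */
	if (p->valsz == 0 || strlen(p->val) != p->valsz)
		return 0;

	/* strtoll(3) itself tolerates leading whitespace and a sign. */
	errno = 0;
	v = strtoll(p->val, &ep, 10);
	if (errno != 0 || *ep != '\0')
		return 0;
	if (v < 1 || v > 150)
		return 0;

	p->type = DEMO_INTEGER;
	p->parsed_i = v;
	return 1;
}
```

Checking the byte length against strlen(3) first matters because the raw value may legitimately contain NUL bytes, in which case string functions would silently see only a prefix of the input.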
The struct kpair
structure presents the user with fields parsed from input and (possibly)
matched to the keys variable passed to
khttp_parse() or khttp_parsex(). It is also passed to the
validation function to be filled in. In this case, the MIME-related fields
are already filled in and may be examined to determine the method of
validation. This is useful when validating opaque message bodies.
- char *ctype
- The value's MIME content type (e.g.,
image/jpeg), or an empty string if not defined.
- size_t ctypepos
- If ctype is not
NULL, it is looked up in the mimes parameter passed to
khttp_parsex() or ksuffixmap if using
khttp_parse(). If found, it is set to the appropriate index. Otherwise, it's mimesz.
- char *file
- The value's MIME source filename or an empty string if not defined.
- char *key
- The NUL-terminated key (input) name. If the HTTP message body is opaque (e.g., KMETHOD_PUT), then an empty-string key is cooked up. The key may contain an arbitrary sequence of non-NUL bytes, even non-ASCII bytes, control characters, and shell metacharacters.
- size_t keypos
- If found in the keys array passed to
khttp_parse(), the index of the matching key. Otherwise keysz.
- struct kpair *next
- In a cookie or field map, next points to the next parsed key-value pair with the same key name. This occurs most often in HTML checkbox forms, where many fields may have the same name.
- union parsed parsed
- The parsed, validated value. This may be an integer i, for a 64-bit signed integer; a string s, for a NUL-terminated character string; or a double d, for a double-precision floating-point number. This is intentionally basic because the resulting data must be reliably passed from the parsing context back into the web application.
- enum kpairstate state
- The validation state: KPAIR_VALID if the pair was successfully validated by a validation function, KPAIR_INVALID if a validation function was invoked but failed, or KPAIR_UNCHECKED if no validation function is defined for this key.
- enum kpairtype type
- If parsed, the type of data in parsed, otherwise KPAIR__MAX.
- char *val
- The (input) value, which may contain an arbitrary sequence of bytes, even NUL bytes, non-ASCII bytes, control characters, and shell metacharacters. The byte following the end of the array, val[valsz], is always guaranteed to be NUL. The validation function may modify the contents. For example, for integer numbers and e-mail addresses, trailing whitespace may be replaced with NUL bytes.
- size_t valsz
- The length of the val buffer in bytes. It is not a string length.
- char *xcode
- The value's MIME content transfer encoding (e.g.,
base64), or an empty string if not defined.
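How pairs end up in the per-key lists chained through next can be sketched as follows. The types and functions here (struct demo_field, demo_bucket(), demo_count()) are hypothetical stand-ins, not library code. Two points are assumptions of the sketch: failed pairs go to the second ("n") map, and pairs are appended in input order.

```c
#include <stddef.h>

/* Hypothetical stand-in for the validation state. */
enum demo_state { DEMO_VALID, DEMO_INVALID, DEMO_UNCHECKED };

/* Hypothetical, cut-down stand-in for a parsed key-value pair. */
struct demo_field {
	const char *key;
	size_t keypos;           /* index into keys, or keysz if unmatched */
	enum demo_state state;
	struct demo_field *next; /* next pair with the same key, or NULL */
};

/*
 * Bucket a flat array of pairs into keysz lists. Valid and unchecked
 * pairs go into map, invalid ones into nmap; pairs whose key matched
 * nothing (keypos == keysz) appear in neither.
 */
static void
demo_bucket(struct demo_field *fields, size_t fieldsz,
    struct demo_field **map, struct demo_field **nmap, size_t keysz)
{
	struct demo_field **list;
	size_t i;

	for (i = 0; i < keysz; i++)
		map[i] = nmap[i] = NULL;

	for (i = 0; i < fieldsz; i++) {
		fields[i].next = NULL;
		if (fields[i].keypos >= keysz)
			continue;
		list = fields[i].state == DEMO_INVALID ?
		    &nmap[fields[i].keypos] : &map[fields[i].keypos];
		/* Walk to the tail so that input order is preserved. */
		while (*list != NULL)
			list = &(*list)->next;
		*list = &fields[i];
	}
}

/* Count the pairs in one bucket; an empty bucket is a NULL head. */
static size_t
demo_count(const struct demo_field *head)
{
	size_t n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```

An application handling a multi-valued checkbox would walk one bucket in the same way as demo_count(), reading each pair's parsed value instead of counting.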
The struct khttpauth structure holds authorisation data if passed by the server. The specific fields are as follows.
- enum kauth type
- If no data was passed by the server, the type value is KAUTH_NONE. Otherwise it's KAUTH_BASIC or KAUTH_DIGEST, or KAUTH_UNKNOWN if the authorisation type was not recognised.
- int authorised
- For KAUTH_DIGEST authorisation, this field indicates whether all required values were specified.
- char *digest
- An MD5 digest of PATH_INFO, header variables and the request body. It is not a NUL-terminated string, but an array of exactly MD5_DIGEST_LENGTH bytes. Only filled in when HTTP_AUTHORIZATION is "digest" and authorised is non-zero. Otherwise, it remains NULL. Used in khttpdigest_validatehash(3).
- An anonymous union containing parsed fields per type:
struct khttpbasic basic for KAUTH_BASIC or struct khttpdigest digest for KAUTH_DIGEST.
If the field for an HTTP authorisation request is
KAUTH_BASIC, it will consist of the following for
its parsed entities in its struct khttpbasic structure:
- The hashed and encoded response string.
If the field for an HTTP authorisation request is
KAUTH_DIGEST, it will consist of the following in
its struct khttpdigest structure:
- The encoding algorithm, parsed from the possible
- The quality of protection algorithm, which may be unspecified,
- The user coordinating the request.
- The URI for which the request is designated. (This must match the request URI).
- The request realm.
- The server-generated nonce value.
- The (optional) client-generated nonce value.
- The hashed and encoded response string, which entangles fields depending on algorithm and quality of protection.
- The (optional) cnonce counter.
- The (optional) opaque string requested by the server.
The struct kopts structure consists of tunables for network performance. You probably don't want to use these unless you really know what you're doing!
- The size of the output buffer. The output buffer is a heap-allocated region into which writes (via khttp_write(3) and khttp_head(3)) are buffered instead of being flushed directly to the wire. The buffer is flushed when it is full, when the HTTP headers are flushed, and when khttp_free(3) is invoked. If the buffer size is zero, writes are flushed immediately to the wire. If the buffer size is less than zero, it is filled with a meaningful default.
Lastly, the struct khead structure holds parsed HTTP headers.
- Holds the HTTP header name. This is not the CGI header name (e.g., HTTP_COOKIE), but the reconstituted HTTP name (e.g., Cookie).
- The opaque header value, which may be an empty string.
A number of variables are defined in
<kcgi.h> to simplify
invocations of the khttp_parse()
family. Applications are strongly encouraged to use these variables (and
associated enumerations) in
khttp_parse() instead of
overriding them with hand-rolled sets in khttp_parsex().
- kmimetypes
- Indexed list of common MIME types, for example, “text/html” and “application/json”. Corresponds to enum kmime.
- khttps
- Indexed list of HTTP status code and identifier, for example, “200 OK”. Corresponds to enum khttp.
- kschemes
- Indexed list of URL schemes, for example, “https” or “ftp”. Corresponds to enum kscheme.
- kmethods
- Indexed list of HTTP methods, for example, “GET” and “POST”. Corresponds to enum kmethod.
- ksuffixmap
- Map of MIME types defined in enum kmime to possible suffixes. This array is terminated with a MIME type of KMIME__MAX.
- ksuffixes
- Indexed list of canonical suffixes for MIME types corresponding to enum kmime. This may be a NULL pointer for types that have no canonical suffix, for example, “application/octet-stream”.
khttp_parse() and khttp_parsex() return an error code:
- Success (not an error).
- Memory failure. This can occur in many places: spawning a child, allocating memory, creating sockets, etc.
- Could not allocate file descriptors.
- Could not spawn a child.
- Malformed data between parent and child whilst parsing an HTTP request. (Internal system error.)
- Opaque operating system error.
On failure, the calling application should terminate as soon as possible. Applications should not try to write an HTTP 505 error or similar, but allow the web server to handle the empty CGI response on its own.
khttp_parsex() functions were written by
|July 21, 2020||OpenBSD 6.7|