KHTTP_PARSE(3) Library Functions Manual KHTTP_PARSE(3)

NAME

khttp_parse, khttp_parsex - parse a CGI instance for kcgi

LIBRARY

library “libkcgi”

SYNOPSIS

#include <sys/types.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <kcgi.h>
enum kcgi_err
khttp_parse(struct kreq *req, const struct kvalid *keys, size_t keysz, const char *const *pages, size_t pagesz, size_t defpage);
enum kcgi_err
khttp_parsex(struct kreq *req, const struct kmimemap *suffixes, const char *const *mimes, size_t mimesz, const struct kvalid *keys, size_t keysz, const char *const *pages, size_t pagesz, size_t defmime, size_t defpage, void *arg, void (*argfree)(void *arg), unsigned int debugging, const struct kopts *opts);
extern const char *const kmimetypes[KMIME__MAX];
extern const char *const khttps[KHTTP__MAX];
extern const char *const kschemes[KSCHEME__MAX];
extern const char *const kresps[KRESP__MAX];
extern const char *const kmethods[KMETHOD__MAX];
extern const struct kmimemap ksuffixmap[];
extern const char *const ksuffixes[KMIME__MAX];

DESCRIPTION

The khttp_parse and khttp_parsex functions parse and validate input and the HTTP environment (compression, paths, MIME types, and so on). They are the central functions in the kcgi(3) library, parsing and validating key-value form (query string, message body, cookie) data and opaque message bodies. They must be matched by khttp_free(3) if and only if the return value is KCGI_OK (otherwise, resources are internally freed).
The collective arguments are as follows:
 
 
arg
A pointer to private application data. It is not touched unless argfree is provided.
 
 
argfree
Function invoked with arg by the child process starting to parse untrusted network data. This makes sure that no unnecessary data is leaked into the child.
 
 
debugging
This bit-field sets debugging of the underlying parse and/or write routines. Debugging messages are sent to stderr and consist of the process ID, a colon, then the logged data. Logged data consists of printable ASCII characters and spaces. A newline will flush the existing line. There are at most BUFSIZ characters per line. Other characters are either escaped (\v, \r, \b) or replaced with a question mark. If the KREQ_DEBUG_WRITE bit is set, write operations directly or indirectly via khttp_write(3) will be logged. When the request is torn down with khttp_free(3), the process ID and total logged bytes are printed on their own line. If the KREQ_DEBUG_READ_BODY bit is set, the entire input body is logged. The total byte count is printed on its own line afterward.
 
 
defmime
If no MIME type is specified (that is, there's no suffix to the page request), use this index in the mimes array.
 
 
defpage
If no page was specified (e.g., the default landing page), this is provided as the requested page index.
 
 
keys
An optional array of input and validation fields or NULL.
 
 
keysz
The number of elements in keys.
 
 
mimesz
The number of elements in mimes. Also the MIME index used if no MIME type was matched. This differs from defmime, which is used if there is no MIME suffix at all.
 
 
mimes
An array of MIME types (e.g., “text/html”), mapped into a MIME index during MIME body parsing. This relates both to pages and input fields with a body type.
 
 
opts
Tunable options regarding socket buffer sizes and so on. If set to NULL, meaningful defaults are used.
 
 
pages
An array of recognised pathnames. When pathnames are parsed, they're matched to indices in this array.
 
 
pagesz
The number of pages in pages. Also used if the requested page was not in pages.
 
 
req
This structure is cleared and filled with input fields and HTTP context parsed from the CGI environment. It is the main structure carried around in a kcgi(3) application.
 
 
suffixes
Define the MIME type (suffix) mapping.
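As an illustration only, the rules for defpage and pagesz can be modelled in a few lines of plain C. The function page_index below is hypothetical and is not part of kcgi; it assumes the first path component of the request has already been extracted (an empty string meaning no page was given) and shows which index ends up in the page field.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the page index rules; not kcgi's code.
 * "first" is the first path component, or "" if none was requested.
 */
static size_t
page_index(const char *first, const char *const *pages,
    size_t pagesz, size_t defpage)
{
	size_t i;

	if (*first == '\0')
		return defpage;         /* nothing requested: landing page */
	for (i = 0; i < pagesz; i++)
		if (strcmp(first, pages[i]) == 0)
			return i;       /* recognised pathname */
	return pagesz;                  /* requested but not recognised */
}
```

With pages set to { "index", "login" } and defpage 0, an empty request yields 0, "login" yields 1, and any other name yields 2 (pagesz), which an application would typically answer with a "not found" response.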
The first form, khttp_parse, is for applications using the system-recognised MIME types. This should work well enough for most applications. It is equivalent to invoking the second form, khttp_parsex, as follows:
khttp_parsex(req, ksuffixmap, 
  kmimetypes, KMIME__MAX, keys, keysz, 
  pages, pagesz, KMIME_TEXT_HTML, 
  defpage, NULL, NULL, 0, NULL);
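The character handling described for the debugging argument can be sketched as a small filter. The function debug_escape is hypothetical and only models the stated rule (printable ASCII and spaces pass through, \v, \r and \b are escaped, everything else becomes a question mark); it is not the routine kcgi uses, and newline handling (which flushes the line) is left out.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical model of the debug-log sanitising rule; not kcgi code.
 * Writes at most outsz - 1 bytes plus a NUL and returns bytes written.
 */
static size_t
debug_escape(const char *in, size_t insz, char *out, size_t outsz)
{
	size_t i, o = 0;

	if (outsz == 0)
		return 0;
	for (i = 0; i < insz; i++) {
		unsigned char c = (unsigned char)in[i];
		char esc = '\0';

		if (c == '\v')
			esc = 'v';
		else if (c == '\r')
			esc = 'r';
		else if (c == '\b')
			esc = 'b';

		if (esc != '\0') {
			if (o + 2 >= outsz)
				break;          /* no room for "\x" plus NUL */
			out[o++] = '\\';
			out[o++] = esc;
		} else {
			if (o + 1 >= outsz)
				break;
			/* isprint() in the C locale covers space to '~'. */
			out[o++] = isprint(c) ? (char)c : '?';
		}
	}
	out[o] = '\0';
	return o;
}
```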

Types

A struct kreq object is filled in by khttp_parse and khttp_parsex. It consists of the following fields:
 
 
arg
Private application data. This is the arg value passed to khttp_parsex() (NULL when using khttp_parse()).
 
 
auth
Type of “managed” HTTP authorisation, if any. This is digest (KAUTH_DIGEST) or basic (KAUTH_BASIC) authorisation performed by the web server. See the rawauth field for raw authorisation requests. If a managed authorisation is specified but with unknown type (i.e., not digest or basic authentication), this is set to KAUTH_UNKNOWN.
 
 
cookies
Key-value pairs read from request cookies, or NULL if cookiesz is 0. See fields for key-value pairs from the request query string or message body.
 
 
cookiemap
Entries in successfully-parsed (or un-parsed) cookies mapped into field indices as defined by the keys argument to khttp_parse().
 
 
cookienmap
Entries in unsuccessfully-parsed (but still attempted) cookies mapped into field indices as defined by the keys argument to khttp_parse().
 
 
cookiesz
The size of the cookies array.
 
 
fields
Key-value pairs read from the request query string and message body, or NULL if fieldsz is 0. See cookies for key-value pairs from request cookies.
 
 
fieldmap
Entries in successfully-parsed (or un-parsed) fields mapped into field indices as defined by the keys argument to khttp_parse().
 
 
fieldnmap
Entries in unsuccessfully-parsed (but still attempted) fields mapped into field indices as defined by the keys argument to khttp_parse().
 
 
fieldsz
The number of elements in the fields array.
 
 
fullpath
The full path following the server name or an empty string if there is no path following the server. For example, if foo.cgi/bar/baz is the PATH_INFO, this would be /bar/baz.
 
 
host
The host-name (i.e., the host of the web application) request passed to the application. This shouldn't be confused with the application host's canonical name.
 
 
method
The KMETHOD_ACL, KMETHOD_CONNECT, KMETHOD_COPY, KMETHOD_DELETE, KMETHOD_GET, KMETHOD_HEAD, KMETHOD_LOCK, KMETHOD_MKCALENDAR, KMETHOD_MKCOL, KMETHOD_MOVE, KMETHOD_OPTIONS, KMETHOD_POST, KMETHOD_PROPFIND, KMETHOD_PROPPATCH, KMETHOD_PUT, KMETHOD_REPORT, KMETHOD_TRACE, or KMETHOD_UNLOCK submission method obtained from the REQUEST_METHOD environment variable. If an unknown method was requested, KMETHOD__MAX is used. If no method was specified, the default is KMETHOD_GET.
Note: applications will usually accept only KMETHOD_GET and KMETHOD_POST, so be sure to emit a KHTTP_405 status for undesired methods.
 
 
kdata
Internal data. Should not be touched.
 
 
keys
Value passed to khttp_parse().
 
 
keysz
Value passed to khttp_parse().
 
 
mime
The MIME type of the requested file as determined by its suffix matched to the suffixes map passed to khttp_parsex() or the default ksuffixmap if using khttp_parse(). If no suffix is specified, this is the defmime value passed to khttp_parsex() (KMIME_TEXT_HTML if using khttp_parse()). If a suffix is specified but not known, this is the mimesz value passed to khttp_parsex() (KMIME__MAX if using khttp_parse()).
 
 
page
The page index as defined by the pages array passed to khttp_parse() and parsed from the requested file. This is the first path component! The default page provided to khttp_parse() is used if no path was specified, and pagesz is used if the path failed lookup.
 
 
pagename
The string corresponding to page.
 
 
port
The server's receiving TCP port.
 
 
path
The path (or empty string) following the parsed component, regardless of whether that component was located in the pages array provided to khttp_parse(). For example, if the PATH_INFO is foo.cgi/bar/baz.html, the path component would be baz (with the leading slash stripped).
 
 
pname
The script name (which may be an empty string in degenerate cases) passed to the server. This may not reflect a file-system entity if re-written by the web server.
 
 
rawauth
If the web server passes the “Authorization” header (which, for example, Apache doesn't by default), then the header is parsed into this field, which is of type struct khttpauth.
 
 
remote
The string form of the client's IPv4 or IPv6 address.
 
 
reqmap
Mapping of enum krequ enumeration values to reqs parsed from the input stream.
 
 
reqs
List of all HTTP request headers, known via enum krequ and not known, parsed from the input stream, or NULL if reqsz is 0.
 
 
reqsz
Number of request headers in reqs.
 
 
scheme
The access scheme, which is either KSCHEME_HTTP or KSCHEME_HTTPS. The scheme defaults to KSCHEME_HTTP if not specified by the request.
 
 
suffix
The suffix part of the PATH_INFO or an empty string if none exists. For example, if the PATH_INFO is foo.cgi/bar/baz.html, the suffix would be html. See the mime field for the MIME type parsed from the suffix.
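To make the relationship between pagename, path and suffix concrete, the following sketch splits the portion after the script name (/bar/baz.html in the examples above) into those three pieces. The struct path_parts and the function split_path are hypothetical; this is an illustrative model of the documented examples, not kcgi's parser, and it ignores edge cases such as dots in directory names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the pieces described above. */
struct path_parts {
	char pagename[64]; /* first path component, e.g. "bar" */
	char path[256];    /* remainder, leading slash stripped, e.g. "baz" */
	char suffix[16];   /* text after the final '.', e.g. "html" */
};

/* Copy src into dst of size dstsz; return -1 if it does not fit. */
static int
copy_part(char *dst, size_t dstsz, const char *src)
{
	size_t len = strlen(src);

	if (len >= dstsz)
		return -1;
	memcpy(dst, src, len + 1);
	return 0;
}

/* Split "/bar/baz.html" into "bar", "baz" and "html". */
static int
split_path(const char *pathinfo, struct path_parts *pp)
{
	char buf[256], *p, *last, *dot, *slash;
	const char *rest = "";

	if (copy_part(buf, sizeof buf, pathinfo) == -1)
		return -1;
	p = buf;
	if (*p == '/')
		p++;                    /* drop the leading slash */

	/* The suffix belongs to the final component only. */
	last = strrchr(p, '/');
	dot = strrchr(last != NULL ? last : p, '.');
	if (dot != NULL) {
		*dot = '\0';
		if (copy_part(pp->suffix, sizeof pp->suffix, dot + 1) == -1)
			return -1;
	} else
		pp->suffix[0] = '\0';

	/* The first component names the page; the rest is the path. */
	if ((slash = strchr(p, '/')) != NULL) {
		*slash = '\0';
		rest = slash + 1;
	}
	if (copy_part(pp->pagename, sizeof pp->pagename, p) == -1 ||
	    copy_part(pp->path, sizeof pp->path, rest) == -1)
		return -1;
	return 0;
}
```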
The application may optionally define keys provided to khttp_parse and khttp_parsex as an array of struct kvalid. This structure is central to the validation of input data. It consists of the following fields:
 
 
name
The field name, i.e., how it appears in the HTML form input name. This cannot be NULL. If the field name is an empty string and the HTTP message consists of an opaque body (and not key-value pairs), then that field will be used to validate the HTTP message body. This is useful for KMETHOD_PUT style requests.
 
 
valid
Validating function. This function accepts a single struct kpair * argument and returns an int where zero is failure and non-zero is parse success. If the function is NULL, then no validation is performed and the data is considered as valid and is bucketed into fieldmap as such. If you provide your own valid function, it usually sets the type and parsed variables in the key-value pair. However, if you're working with binary or alternatively-typed data, you can set the type to KPAIR__MAX, ignore the parsed field, and work directly with val and valsz. You can also allocate new memory for val (updating valsz to match): if the value of val changes during your validation, the new value will be freed with free(3) after being passed out of the sandbox. Note: these functions are invoked from within a system-specific sandbox. You should assume that you cannot invoke any “invasive” system calls such as opening files, sockets, etc. In other words, these must be pure computation.
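A custom valid function follows the contract above: inspect val and valsz, set type and parsed on success, and return zero on failure, using only pure computation. The sketch below accepts strictly positive base-10 integers. To stay self-contained it uses a hypothetical struct mock_pair with only the fields it touches and a hypothetical enum mock_type in place of the real kcgi types; an actual validator would take struct kpair * and use the library's own type constants.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kcgi types: only the fields used here. */
enum mock_type { MOCK_INTEGER, MOCK_STRING, MOCK_DOUBLE, MOCK__MAX };

struct mock_pair {
	char *val;              /* nil-terminated input value */
	size_t valsz;           /* true length of val */
	enum mock_type type;    /* set on successful parse */
	union {
		int64_t i;
		const char *s;
		double d;
	} parsed;
};

/* Accept only strictly positive integers; no I/O, per the sandbox rule. */
static int
valid_positive_int(struct mock_pair *kp)
{
	char *end;
	long long v;

	if (kp->valsz == 0 || strlen(kp->val) != kp->valsz)
		return 0;               /* empty, or embedded NUL bytes */
	errno = 0;
	v = strtoll(kp->val, &end, 10);
	if (errno != 0 || *end != '\0' || v <= 0)
		return 0;               /* overflow, trailing junk, or <= 0 */
	kp->type = MOCK_INTEGER;
	kp->parsed.i = (int64_t)v;
	return 1;
}
```

On failure the pair is left untouched, so it would land in the "unsuccessfully-parsed" map rather than the successful one.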
The struct kpair structure presents the user with fields parsed from input and (possibly) matched to the keys variable passed to khttp_parse and khttp_parsex. It is also passed to the validation function to be filled in. In this case, the MIME-related fields are already filled in and may be examined to determine the method of validation. This is useful when validating opaque message bodies.
 
 
ctype
The value's MIME content type (e.g., image/jpeg), or an empty string if not defined.
 
 
ctypepos
If ctype is not NULL, it is looked up in the mimes parameter passed to khttp_parsex or kmimetypes if using khttp_parse. If found, it is set to the appropriate index. Otherwise, it's mimesz.
 
 
file
The value's MIME source filename or an empty string if not defined.
 
 
key
The nil-terminated key (input) name. If the HTTP message body is opaque (e.g., KMETHOD_PUT), then an empty-string key is cooked up.
 
 
keypos
If looked up in the keys variable passed to khttp_parse, the index of the looked-up key. Otherwise keysz.
 
 
next
In a cookie or field map, next points to the next parsed key-value pair with the same key name. This occurs most often in HTML checkbox forms, where many fields may have the same name.
 
 
parsed
The parsed, validated value. These may be integer, for a 64-bit signed integer; string, for a nil-terminated character string; or double, for a double-precision floating-point number. This is intentionally basic because the resulting data must be reliably passed from the parsing context back into the web application.
 
 
state
The validation state: whether validated by a parse, invalidated by a parse, or non-validated (unparsed).
 
 
type
If parsed, the type of data in parsed, otherwise KPAIR__MAX.
 
 
val
The (input) value, which is always nil-terminated, but if the data is binary, nil terminators may occur before the true data length of valsz.
 
 
valsz
The true length of val.
 
 
xcode
The value's MIME content transfer encoding (e.g., base64), or an empty string if not defined.
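The next field turns repeated keys into a singly-linked chain. With a real struct kreq, the head of the chain for key index i would be fieldmap[i] (or cookiemap[i]). The sketch below walks such a chain to count submitted values and check whether one was ticked; struct mock_pair and chain_count are hypothetical stand-ins that keep only the key, val and next fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct kpair with only key, val and next. */
struct mock_pair {
	const char *key;
	const char *val;
	struct mock_pair *next; /* next pair with the same key, or NULL */
};

/*
 * Count the values chained from head and report in *found whether
 * "want" is among them (e.g. a ticked checkbox value).
 */
static size_t
chain_count(const struct mock_pair *head, const char *want, int *found)
{
	size_t n = 0;

	*found = 0;
	for (; head != NULL; head = head->next) {
		n++;
		if (strcmp(head->val, want) == 0)
			*found = 1;
	}
	return n;
}
```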
The struct khttpauth structure holds authorisation data if passed by the server. If no data was passed by the server, the type value is KAUTH_NONE. Otherwise it's KAUTH_BASIC or KAUTH_DIGEST, with KAUTH_UNKNOWN if the authorisation type was not recognised. The specific fields are as follows.
 
 
authorised
For KAUTH_BASIC or KAUTH_DIGEST authorisation, this field indicates whether all required values were specified.
 
 
d
A union containing parsed fields per type: basic for KAUTH_BASIC or digest for KAUTH_DIGEST.
If the field for an HTTP authorisation request is KAUTH_BASIC, it will consist of the following for its parsed entities in its struct khttpbasic structure:
 
 
response
The encoded response string.
If the field for an HTTP authorisation request is KAUTH_DIGEST, it will consist of the following in its struct khttpdigest structure:
 
 
alg
The encoding algorithm, parsed from the possible MD5 or MD5-Sess values.
 
 
qop
The quality of protection algorithm, which may be unspecified, Auth or Auth-Int.
 
 
user
The user coordinating the request.
 
 
uri
The URI for which the request is designated. (This must match the request URI).
 
 
realm
The request realm.
 
 
nonce
The server-generated nonce value.
 
 
cnonce
The (optional) client-generated nonce value.
 
 
response
The hashed and encoded response string, which entangles fields depending on the algorithm and quality of protection.
 
 
count
The (optional) cnonce counter.
 
 
opaque
The (optional) opaque string requested by the server.
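The authorised field reports whether all required values were present. As a rough model of what "required" means for digest authorisation, the sketch below applies the RFC 2617 rules: user, uri, realm, nonce and response are always needed, and cnonce and the nonce count become mandatory once a qop is sent. The struct mock_digest, enum mock_qop and digest_complete are hypothetical, and the exact checks kcgi performs may differ; a count of zero is assumed here to mean "not sent".

```c
#include <stddef.h>

/* Hypothetical subset of the digest fields listed above. */
enum mock_qop { MOCK_QOP_NONE, MOCK_QOP_AUTH, MOCK_QOP_AUTH_INT };

struct mock_digest {
	enum mock_qop qop;
	const char *user, *uri, *realm, *nonce;
	const char *cnonce, *response, *opaque;
	unsigned long count;    /* nonce count; 0 taken as "absent" */
};

/* Return 1 if the fields RFC 2617 requires are all present. */
static int
digest_complete(const struct mock_digest *d)
{
	if (d->user == NULL || d->uri == NULL || d->realm == NULL ||
	    d->nonce == NULL || d->response == NULL)
		return 0;
	/* When a qop directive is sent, cnonce and nc are mandatory. */
	if (d->qop != MOCK_QOP_NONE && (d->cnonce == NULL || d->count == 0))
		return 0;
	return 1;
}
```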
The struct kopts structure consists of tunables for network performance. You probably don't want to use these unless you really know what you're doing!
 
 
sndbufsz
The size of the output buffer. The output buffer is a heap-allocated region into which writes (via khttp_write(3) and khttp_head(3)) are buffered instead of being flushed directly to the wire. The buffer is flushed when it is full, when the HTTP headers are flushed, and when khttp_free(3) is invoked. If the buffer size is zero, writes are flushed immediately to the wire. If the buffer size is less than zero, it is filled with a meaningful default.
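The buffering policy for sndbufsz can be modelled with a fixed-size buffer that is flushed when full, flushed explicitly at the points the text lists (headers, khttp_free(3)), and bypassed entirely when its size is zero. Everything below (struct outbuf, ob_write, ob_flush, and the wire array standing in for the socket) is hypothetical; the "less than zero means default" case is not modelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the output buffer; not kcgi code. */
struct outbuf {
	char *buf;              /* buffer storage (unused if bufsz is 0) */
	size_t bufsz;           /* capacity; 0 means write through */
	size_t len;             /* bytes currently buffered */
	size_t flushes;         /* times data reached the "wire" */
	char wire[256];         /* stand-in for the socket */
	size_t wirelen;
};

static void
ob_send(struct outbuf *ob, const char *p, size_t sz)
{
	/* Truncate if the stand-in wire fills up. */
	if (sz > sizeof ob->wire - ob->wirelen)
		sz = sizeof ob->wire - ob->wirelen;
	memcpy(ob->wire + ob->wirelen, p, sz);
	ob->wirelen += sz;
	ob->flushes++;
}

static void
ob_flush(struct outbuf *ob)
{
	if (ob->len > 0) {
		ob_send(ob, ob->buf, ob->len);
		ob->len = 0;
	}
}

static void
ob_write(struct outbuf *ob, const char *p, size_t sz)
{
	if (ob->bufsz == 0) {
		ob_send(ob, p, sz);     /* size zero: straight to the wire */
		return;
	}
	while (sz > 0) {
		size_t n = ob->bufsz - ob->len;

		if (n > sz)
			n = sz;
		memcpy(ob->buf + ob->len, p, n);
		ob->len += n;
		p += n;
		sz -= n;
		if (ob->len == ob->bufsz)
			ob_flush(ob);   /* flush when full */
	}
}
```

With a 4-byte buffer, writing "abcdef" sends "abcd" once the buffer fills and holds "ef" until the next explicit flush.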
Lastly, the struct khead structure holds parsed HTTP headers.
 
 
key
Holds the HTTP header name. This is not the CGI header name (e.g., HTTP_COOKIE), but the reconstituted HTTP name (e.g., Cookie).
 
 
val
The opaque header value, which may be an empty string.
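The reconstitution described for key (HTTP_COOKIE becoming Cookie) can be approximated by stripping the HTTP_ prefix, turning underscores into hyphens and capitalising each word. The function cgi_to_http_name is a hypothetical sketch of that mapping, not kcgi's routine; it only handles HTTP_-prefixed variables, so names such as CONTENT_TYPE that CGI passes without the prefix are rejected here.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: rebuild an HTTP header name from a CGI variable name,
 * e.g. "HTTP_USER_AGENT" -> "User-Agent". Returns -1 if the name lacks
 * the HTTP_ prefix or does not fit in out.
 */
static int
cgi_to_http_name(const char *cgi, char *out, size_t outsz)
{
	const char *p;
	size_t i = 0;
	int start = 1;          /* next letter begins a word */

	if (outsz == 0 || strncmp(cgi, "HTTP_", 5) != 0)
		return -1;
	for (p = cgi + 5; *p != '\0'; p++) {
		unsigned char c = (unsigned char)*p;

		if (i + 1 >= outsz)
			return -1;      /* no room for this byte plus NUL */
		if (c == '_') {
			out[i++] = '-';
			start = 1;
		} else {
			out[i++] = (char)(start ? toupper(c) : tolower(c));
			start = 0;
		}
	}
	out[i] = '\0';
	return 0;
}
```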

Variables

A number of variables are defined in <kcgi.h> to simplify invocations of the khttp_parse family. Applications are strongly suggested to use these variables (and associated enumerations) in khttp_parse instead of overriding them with hand-rolled sets in khttp_parsex.
 
 
kmimetypes
Indexed list of common MIME types, for example, “text/html” and “application/json”. Corresponds to enum kmime.
 
 
khttps
Indexed list of HTTP status code and identifier, for example, “200 OK”. Corresponds to enum khttp.
 
 
kschemes
Indexed list of URL schemes, for example, “https” or “ftp”. Corresponds to enum kscheme.
 
 
kresps
Indexed list of header response names, for example, “Cache-Control” or “Content-Length”. Corresponds to enum kresp.
 
 
kmethods
Indexed list of HTTP methods, for example, “GET” and “POST”. Corresponds to enum kmethod.
 
 
ksuffixmap
Map of MIME types defined in enum kmime to possible suffixes. This array is terminated with a MIME type of KMIME__MAX and name NULL.
 
 
ksuffixes
Indexed list of canonical suffixes for MIME types corresponding to enum kmime. Note: this may be a NULL pointer for types that have no canonical suffix, for example, “application/octet-stream”.
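A map shaped like ksuffixmap is walked until the terminating entry with a NULL name. The sketch below does that and applies the defmime and mimesz fallbacks described earlier for khttp_parsex. The struct suffix_map and suffix_to_mime are hypothetical mirrors written for illustration (they are not the kcgi declarations), and the comparison is case-sensitive, which may not match kcgi's behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of a suffix map: a NULL name ends the array. */
struct suffix_map {
	const char *name;       /* suffix without the dot, e.g. "html" */
	size_t mime;            /* index into the MIME type array */
};

/* Resolve a request suffix to a MIME index. */
static size_t
suffix_to_mime(const char *suffix, const struct suffix_map *map,
    size_t mimesz, size_t defmime)
{
	const struct suffix_map *m;

	if (*suffix == '\0')
		return defmime;         /* no suffix at all */
	for (m = map; m->name != NULL; m++)
		if (strcmp(m->name, suffix) == 0)
			return m->mime;
	return mimesz;                  /* suffix given but not known */
}
```

For a two-entry MIME array with the map { "html", 0 }, { "htm", 0 }, { "json", 1 }, { NULL, 2 } and defmime 0, the suffix "json" yields 1, an empty suffix yields 0, and "xyz" yields 2.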

RETURN VALUES

khttp_parse and khttp_parsex return an error code:
 
 
KCGI_OK
Success (not an error).
 
 
KCGI_ENOMEM
Memory failure. This can occur in many places: spawning a child, allocating memory, creating sockets, etc.
 
 
KCGI_ENFILE
Could not allocate file descriptors.
 
 
KCGI_EAGAIN
Could not spawn a child.
 
 
KCGI_FORM
Malformed data between parent and child whilst parsing an HTTP request. (Internal system error.)
 
 
KCGI_SYSTEM
Opaque operating system error.
On failure, the calling application should terminate as soon as possible. Applications should not try to write an HTTP 500 error or similar, but allow the web server to handle the empty CGI response on its own.

SEE ALSO

kcgi(3), khttp_free(3)

AUTHORS

The khttp_parse and khttp_parsex functions were written by Kristaps Dzonsons <kristaps@bsd.lv>.
September 22, 2017 OpenBSD 5.8