NAME
khttp_parse, khttp_parsex - parse a CGI instance for kcgi
LIBRARY
library “libkcgi”
SYNOPSIS
#include <sys/types.h>
#include <stdarg.h>
#include <stdint.h>
#include <kcgi.h>
enum kcgi_err
khttp_parse(struct kreq *req, const struct kvalid *keys, size_t keysz,
    const char *const *pages, size_t pagesz, size_t defpage);
enum kcgi_err
khttp_parsex(struct kreq *req, const struct kmimemap *suffixes,
    const char *const *mimes, size_t mimesz, const struct kvalid *keys,
    size_t keysz, const char *const *pages, size_t pagesz, size_t defmime,
    size_t defpage, void *arg, void (*argfree)(void *arg),
    unsigned int debugging, const struct kopts *opts);
extern const char *const kmimetypes[KMIME__MAX];
extern const char *const khttps[KHTTP__MAX];
extern const char *const kschemes[KSCHEME__MAX];
extern const char *const kmethods[KMETHOD__MAX];
extern const struct kmimemap ksuffixmap[];
extern const char *const ksuffixes[KMIME__MAX];
DESCRIPTION
The khttp_parse() and khttp_parsex() functions parse and validate input and the HTTP environment (compression, paths, MIME types, and so on). They are the central functions in the kcgi(3) library. They parse and validate key-value form data (from the query string, message body, and cookies) as well as opaque message bodies. They must be matched by khttp_free(3) if and only if the return value is KCGI_OK. Otherwise, resources are freed internally.
The collective arguments are as follows:
- arg
- A pointer to private application data. It is not touched unless argfree is provided.
- argfree
- Function invoked with arg by the child process before it starts parsing untrusted network data. This makes sure that no unnecessary data is leaked into the child.
- debugging
- This bit-field enables debugging of the underlying parse and/or write routines. It may have KREQ_DEBUG_WRITE set for writes and KREQ_DEBUG_READ_BODY set for the pre-parsed body. Debugging messages to kutil_info(3) have the following form:
  - the process ID;
  - "-tx" for writing or "-rx" for reading;
  - a colon and a space;
  - the logged data.
  A newline flushes the current line, as does reaching 80 characters. If the line is flushed at 80 characters rather than at a newline, an ellipsis follows it. The total number of logged bytes is emitted at the end of all reads or writes.
- defmime
- If no MIME type is specified (that is, there's no suffix to the page request), use this index in the mimes array.
- defpage
- If no page was specified (e.g., the default landing page), this is provided as the requested page index.
- keys
- An optional array of input and validation fields, or NULL.
- keysz
- The number of elements in keys.
- mimesz
- The number of elements in mimes. Also the MIME index used if no MIME type was matched. This differs from defmime, which is used if there is no MIME suffix at all.
- mimes
- An array of MIME types (e.g., “text/html”), mapped into a MIME index during MIME body parsing. This relates both to pages and to input fields with a body type. Any array should include at least text/plain, as this is the default content type for MIME documents.
- opts
- Tunable options regarding socket buffer sizes and so on. If set to NULL, meaningful defaults are used.
- pages
- An array of recognised pathnames. When pathnames are parsed, they're matched to indices in this array.
- pagesz
- The number of pages in pages. Also used as the page index if the requested page was not found in pages.
- req
- This structure is cleared and filled with input fields and HTTP context parsed from the CGI environment. It is the main structure carried around in a kcgi(3) application.
- suffixes
- Maps filename suffixes to indices in the mimes array. Like ksuffixmap, the array is terminated by an entry whose name is NULL.
The first form, khttp_parse(), is for applications using the system-recognised MIME types. This should work well enough for most applications. It is equivalent to invoking the second form, khttp_parsex(), as follows:
khttp_parsex(req, ksuffixmap, kmimetypes, KMIME__MAX, keys, keysz, pages, pagesz, KMIME_TEXT_HTML, defpage, NULL, NULL, 0, NULL);
Types
A struct kreq object is filled in by khttp_parse() and khttp_parsex(). It consists of the following fields:
- void *arg
- Private application data, set from the arg argument of khttp_parsex() (NULL when using khttp_parse()).
- enum kauth auth
- Type of “managed” HTTP authorisation performed by the web server according to the AUTH_TYPE header variable, if any. This is KAUTH_DIGEST for an AUTH_TYPE of "digest", KAUTH_BASIC for "basic", KAUTH_BEARER for "bearer", KAUTH_UNKNOWN for other values of AUTH_TYPE, or KAUTH_NONE if AUTH_TYPE is not set. See the rawauth field for raw (i.e., not processed by the web server) authorisation requests.
- struct kpair **cookiemap
- An array of keysz singly linked lists of elements of the cookies array. A cookie is added to the list cookiemap[cookie->keypos] when both of the following hold:
  - cookie->key is equal to one of the entries of keys;
  - cookie->state is KPAIR_VALID or KPAIR_UNCHECKED.
  Empty lists are NULL. If a list contains more than one cookie, cookie->next points to the next cookie. For the last cookie in a list, cookie->next is NULL.
- struct kpair **cookienmap
- Similar to cookiemap, except that it contains the cookies where cookie->state is KPAIR_INVALID.
- struct kpair *cookies
- Key-value pairs read from request cookies found in the HTTP_COOKIE header variable, or NULL if cookiesz is 0. See fields for key-value pairs from the request query string or message body.
- size_t cookiesz
- The size of the cookies array.
- struct kpair **fieldmap
- Similar to cookiemap, except that the lists contain elements of the fields array.
- struct kpair **fieldnmap
- Similar to fieldmap, except that it contains the fields where field->state is KPAIR_INVALID.
- struct kpair *fields
- Key-value pairs read from the QUERY_STRING header variable and from the message body, or NULL if fieldsz is 0. See cookies for key-value pairs from request cookies.
- size_t fieldsz
- The number of elements in the fields array.
- char *fullpath
- The full requested path as contained in the PATH_INFO header variable. For example, for a request to "https://bsd.lv/app.cgi/dir/file.html?q=v", where "app.cgi" is the CGI program, this value is /dir/file.html. It is not guaranteed to start with a slash, and it may be an empty string.
- char *host
- The host name received in the HTTP_HOST header variable. When using name-based virtual hosting, this is typically the virtual host name specified by the client in the HTTP request. It should not be confused with the canonical DNS name of the host running the web server. For example, a request to "https://bsd.lv/app.cgi/file" would have a host of "bsd.lv". If HTTP_HOST is not defined, host is set to "localhost".
- struct kdata *kdata
- Internal data. Should not be touched.
- const struct kvalid *keys
- The keys value passed to khttp_parse() or khttp_parsex().
- size_t keysz
- The keysz value passed to khttp_parse() or khttp_parsex().
- enum kmethod method
- The submission method obtained from the REQUEST_METHOD header variable. It is one of KMETHOD_ACL, KMETHOD_CONNECT, KMETHOD_COPY, KMETHOD_DELETE, KMETHOD_GET, KMETHOD_HEAD, KMETHOD_LOCK, KMETHOD_MKCALENDAR, KMETHOD_MKCOL, KMETHOD_MOVE, KMETHOD_OPTIONS, KMETHOD_POST, KMETHOD_PROPFIND, KMETHOD_PROPPATCH, KMETHOD_PUT, KMETHOD_REPORT, KMETHOD_TRACE, or KMETHOD_UNLOCK. If an unknown method was requested, KMETHOD__MAX is used. If no method was specified, the default is KMETHOD_GET.
  Applications will usually accept only KMETHOD_GET and KMETHOD_POST, so be sure to emit a KHTTP_405 status for undesired methods.
- size_t mime
- The MIME type index of the requested file, determined by matching its suffix against a suffix map. The map is the suffixes argument of khttp_parsex(), or ksuffixmap if using khttp_parse(). The value falls back as follows:
  - If no suffix is specified, it is the defmime value passed to khttp_parsex() (KMIME_TEXT_HTML if using khttp_parse()).
  - If a suffix is specified but not known, it is the mimesz value passed to khttp_parsex() (KMIME__MAX if using khttp_parse()).
- size_t page
- The page index found by looking up pagename in the pages array. If pagename is not found in pages, pagesz is used; if pagename is empty, defpage is used.
- char *pagename
- The first component of fullpath or an empty string if there is none. It is compared to the elements of the pages array to determine which page it corresponds to. For example, for a fullpath of "/dir/file.html" this component corresponds to dir. For "/file.html", it's file.
- char *path
- The middle part of fullpath, after stripping pagename/ at the beginning and .suffix at the end, or an empty string if there is none. For example, if the fullpath is bar/baz.html, this component is baz.
- char *pname
- The script name received in the SCRIPT_NAME header variable. Consider a request to "https://bsd.lv/app.cgi/file" that the web server maps to the CGI program /var/www/cgi-bin/app.cgi. For this request, the value would be app.cgi. This may not reflect a file system entity, and it may be an empty string.
- uint16_t port
- The server's receiving TCP port according to the SERVER_PORT header variable, or 80 if that is not defined or is an invalid number.
- struct khttpauth rawauth
- The raw authorisation request according to the HTTP_AUTHORIZATION header variable passed by the web server. This is only set if the web server is not managing authorisation itself.
- char *remote
- The string form of the client's IPv4 or IPv6 address taken from the REMOTE_ADDR header variable, or "127.0.0.1" if that is not defined. The address format of the string is not checked.
- struct khead *reqmap[KREQU__MAX]
- Mapping of enum krequ enumeration values to reqs parsed from the input stream.
- struct khead *reqs
- List of all HTTP request headers parsed from the input stream, or NULL if reqsz is 0. This includes both headers known via enum krequ and unknown ones.
- size_t reqsz
- Number of request headers in reqs.
- enum kscheme scheme
- The access scheme according to the HTTPS header variable: KSCHEME_HTTPS if HTTPS is set and equal to the string "on", or KSCHEME_HTTP otherwise.
- char *suffix
- The suffix part of the last component of fullpath or an empty string if there is none. For example, if the fullpath is /bar/baz.html, this component is html. See the mime field for the MIME type parsed from the suffix.
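The sketch below illustrates how fullpath relates to pagename, path, and suffix. It does not depend on kcgi. The split_path() helper and struct split are hypothetical and only reproduce the examples given for these fields. kcgi performs this decomposition itself, and its handling of edge cases (several dots, trailing slashes) may differ from this sketch.

```c
#include <string.h>

/* Hypothetical result type; kcgi fills req->pagename, req->path and
 * req->suffix itself. Fixed sizes keep the sketch short. */
struct split {
	char pagename[256];
	char path[256];
	char suffix[64];
};

/* Copies n bytes of s into dst (of size sz), truncating if needed. */
static void
copyn(char *dst, size_t sz, const char *s, size_t n)
{
	if (n >= sz)
		n = sz - 1;
	memcpy(dst, s, n);
	dst[n] = '\0';
}

/* Splits fullpath into pagename, path and suffix. */
static void
split_path(const char *fullpath, struct split *out)
{
	const char *p = fullpath, *last, *dot, *slash, *end;

	if (*p == '/')
		p++;				/* the leading slash is optional */
	end = p + strlen(p);
	last = strrchr(p, '/');			/* start of the last component */
	last = (last == NULL) ? p : last + 1;
	dot = strrchr(last, '.');
	if (dot != NULL) {
		copyn(out->suffix, sizeof(out->suffix), dot + 1, strlen(dot + 1));
		end = dot;			/* strip ".suffix" */
	} else
		out->suffix[0] = '\0';
	slash = memchr(p, '/', (size_t)(end - p));
	if (slash == NULL) {			/* single component */
		copyn(out->pagename, sizeof(out->pagename), p, (size_t)(end - p));
		out->path[0] = '\0';
	} else {
		copyn(out->pagename, sizeof(out->pagename), p, (size_t)(slash - p));
		copyn(out->path, sizeof(out->path), slash + 1,
		    (size_t)(end - slash - 1));
	}
}
```

With this sketch, "/dir/file.html" yields a pagename of dir, a path of file, and a suffix of html. "/file.html" yields a pagename of file and an empty path.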
The application may optionally define the keys provided to khttp_parse() and khttp_parsex() as an array of struct kvalid. This structure is central to the validation of input data. It consists of the following fields:
- const char *name
- The field name, i.e., how it appears in the HTML form input name. This cannot be NULL. If the field name is an empty string and the HTTP message consists of an opaque body (and not key-value pairs), then that field will be used to validate the HTTP message body. This is useful for KMETHOD_PUT style requests.
- int (*)(struct kpair *) valid
- A validation function returning non-zero if parsing and validation succeed, or 0 otherwise. If it is NULL, no validation is performed: the data is considered valid and is bucketed into cookiemap or fieldmap as such.
  User-defined valid functions usually set the type and parsed fields in the key-value pair. A key may carry binary data or may take different data types. In that case, a validation function may set the type to KPAIR__MAX, and the application may ignore the parsed field and work directly with val and valsz.
  The validation function is allowed to allocate new memory for val. If the val pointer changes during validation, the memory it points to afterwards will be freed with free(3) after the data is passed out of the sandbox.
These functions are invoked from within a system-specific sandbox that may not allow some system calls, for example opening files or sockets. In other words, validation functions should only do pure computation.
The struct kpair structure presents the user with fields parsed from input and (possibly) matched to the keys variable passed to khttp_parse() and khttp_parsex(). It is also passed to the validation function to be filled in. In this case, the MIME-related fields are already filled in and may be examined to determine the method of validation. This is useful when validating opaque message bodies.
- char *ctype
- The value's MIME content type (e.g., image/jpeg), or an empty string if not defined.
- size_t ctypepos
- If ctype is not empty, it is looked up in the mimes parameter passed to khttp_parsex(), or in kmimetypes if using khttp_parse(). If found, this is set to the matching index. Otherwise, it is mimesz.
- char *file
- The value's MIME source filename or an empty string if not defined.
- char *key
- The NUL-terminated key (input) name. If the HTTP message body is opaque (e.g., KMETHOD_PUT), then an empty-string key is synthesised. The key may contain an arbitrary sequence of non-NUL bytes, even non-ASCII bytes, control characters, and shell metacharacters.
- size_t keypos
- If found in the keys array passed to khttp_parse(), the index of the matching key. Otherwise keysz.
- struct kpair *next
- In a cookie or field map, next points to the next parsed key-value pair with the same key name. This occurs most often in HTML checkbox forms, where many fields may have the same name.
- union parsed parsed
- The parsed, validated value. This may be an integer in i, a 64-bit signed integer; a string in s, a NUL-terminated character string; or a double in d, a double-precision floating-point number. This is intentionally basic because the resulting data must be reliably passed from the parsing context back into the web application.
- enum kpairstate state
- The validation state:
  - KPAIR_VALID if the pair was successfully validated by a validation function;
  - KPAIR_INVALID if a validation function was invoked but failed;
  - KPAIR_UNCHECKED if no validation function is defined for this key.
- enum kpairtype type
- If parsed, the type of data in parsed, otherwise KPAIR__MAX.
- char *val
- The (input) value, which may contain an arbitrary sequence of bytes, even NUL bytes, non-ASCII bytes, control characters, and shell metacharacters. The byte following the end of the array, val[valsz], is always guaranteed to be NUL. The validation function may modify the contents. For example, for integer numbers and e-mail addresses, trailing whitespace may be replaced with NUL bytes.
- size_t valsz
- The length of the val buffer in bytes. It is not a string length.
- char *xcode
- The value's MIME content transfer encoding (e.g., base64), or an empty string if not defined.
The struct khttpauth structure holds authorisation data if passed by the server. The specific fields are as follows.
- enum kauth type
- If no data was passed by the server, the type value is KAUTH_NONE. Otherwise it is KAUTH_BASIC, KAUTH_BEARER, or KAUTH_DIGEST. KAUTH_UNKNOWN signals that the authorisation type was not recognised.
- int authorised
- For KAUTH_BASIC, KAUTH_BEARER, or KAUTH_DIGEST authorisation, this field indicates whether all required values were specified for the application to perform authorisation.
- char *digest
- An MD5 digest of the REQUEST_METHOD, SCRIPT_NAME, and PATH_INFO header variables and the request body. It is not a NUL-terminated string, but an array of exactly MD5_DIGEST_LENGTH bytes. It is only filled in when HTTP_AUTHORIZATION is "digest" and authorised is non-zero; otherwise, it remains NULL. Used in khttpdigest_validatehash(3).
- d
- A union containing parsed fields per type: struct khttpbasic basic for KAUTH_BASIC or KAUTH_BEARER, or struct khttpdigest digest for KAUTH_DIGEST.
If the type of an HTTP authorisation request is KAUTH_BASIC or KAUTH_BEARER, its parsed entities are stored in the following field of its struct khttpbasic structure:
- response
- The encoded response string for KAUTH_BASIC (the base64 encoding of the user name and password), or an opaque string for KAUTH_BEARER.
If the type of an HTTP authorisation request is KAUTH_DIGEST, its parsed entities are stored in the following fields of its struct khttpdigest structure:
- alg
- The hash algorithm, parsed from the possible MD5 or MD5-Sess values.
- qop
- The quality of protection, which may be unspecified, Auth, or Auth-Int.
- user
- The user coordinating the request.
- uri
- The URI for which the request is designated. (This must match the request URI).
- realm
- The request realm.
- nonce
- The server-generated nonce value.
- cnonce
- The (optional) client-generated nonce value.
- response
- The hashed and encoded response string, which combines fields depending on the algorithm and quality of protection.
- count
- The (optional) cnonce counter.
- opaque
- The (optional) opaque string requested by the server.
The struct kopts structure consists of tunables for network performance. You probably don't want to use these unless you really know what you're doing!
- sndbufsz
- The size of the output buffer. The output buffer is a heap-allocated region into which writes (via khttp_write(3) and khttp_head(3)) are buffered instead of being flushed directly to the wire. The buffer is flushed when it is full, when the HTTP headers are flushed, and when khttp_free(3) is invoked. If the buffer size is zero, writes are flushed immediately to the wire. If the buffer size is less than zero, it is filled with a meaningful default.
Lastly, the struct khead structure holds parsed HTTP headers.
- key
- Holds the HTTP header name. This is not the CGI header name (e.g., HTTP_COOKIE), but the reconstituted HTTP name (e.g., Cookie).
- val
- The opaque header value, which may be an empty string.
Variables
A number of variables are defined in <kcgi.h> to simplify invocations of the khttp_parse() family. Applications are strongly encouraged to use these variables (and associated enumerations) with khttp_parse() instead of overriding them with hand-rolled sets in khttp_parsex().
- kmimetypes
- Indexed list of common MIME types, for example, “text/html” and “application/json”. Corresponds to enum kmime.
- khttps
- Indexed list of HTTP status code and identifier, for example, “200 OK”. Corresponds to enum khttp.
- kschemes
- Indexed list of URL schemes, for example, “https” or “ftp”. Corresponds to enum kscheme.
- kmethods
- Indexed list of HTTP methods, for example, “GET” and “POST”. Corresponds to enum kmethod.
- ksuffixmap
- Map of MIME types defined in enum kmime to possible suffixes. This array is terminated with a MIME type of KMIME__MAX and name NULL.
- ksuffixes
- Indexed list of canonical suffixes for MIME types corresponding to enum kmime. An entry may be a NULL pointer for types that have no canonical suffix, for example, “application/octet-stream”.
RETURN VALUES
khttp_parse
() and
khttp_parsex
() return an error code:
KCGI_OK
- Success (not an error).
KCGI_ENOMEM
- Memory failure. This can occur in many places: spawning a child, allocating memory, creating sockets, etc.
KCGI_ENFILE
- Could not allocate file descriptors.
KCGI_EAGAIN
- Could not spawn a child.
KCGI_FORM
- Malformed data between parent and child whilst parsing an HTTP request. (Internal system error.)
KCGI_SYSTEM
- Opaque operating system error.
On failure, the calling application should terminate as soon as possible. Applications should not try to write an HTTP 500 error or similar, but should allow the web server to handle the empty CGI response on its own.
SEE ALSO
AUTHORS
The khttp_parse() and khttp_parsex() functions were written by Kristaps Dzonsons <kristaps@bsd.lv>.