NAME
khttp_parse, khttp_parsex - parse a CGI instance for kcgi
LIBRARY
library “libkcgi”
SYNOPSIS
#include <sys/types.h>
#include <stdarg.h>
#include <stdint.h>
#include <kcgi.h>
enum kcgi_err
khttp_parse(struct kreq *req, const struct kvalid *keys, size_t keysz,
    const char *const *pages, size_t pagesz, size_t defpage);
enum kcgi_err
khttp_parsex(struct kreq *req, const struct kmimemap *suffixes,
    const char *const *mimes, size_t mimesz, const struct kvalid *keys,
    size_t keysz, const char *const *pages, size_t pagesz, size_t defmime,
    size_t defpage, void *arg, void (*argfree)(void *arg),
    unsigned int debugging, const struct kopts *opts);
extern const char *const kmimetypes[KMIME__MAX];
extern const char *const khttps[KHTTP__MAX];
extern const char *const kschemes[KSCHEME__MAX];
extern const char *const kmethods[KMETHOD__MAX];
extern const struct kmimemap ksuffixmap[];
extern const char *const ksuffixes[KMIME__MAX];
DESCRIPTION
The
    khttp_parse()
    and khttp_parsex() functions parse and validate
    input and the HTTP environment (compression, paths, MIME types, and so on).
    They are the central functions in the kcgi(3) library, parsing and validating key-value form (query
    string, message body, cookie) data and opaque message bodies.
They must be matched by khttp_free(3) if and only if the return value is
    KCGI_OK. Otherwise, resources are internally
  freed.
The collective arguments are as follows:
- arg
 - A pointer to private application data. It is not touched unless argfree is provided.
 - argfree
 - Function invoked with arg by the child process starting to parse untrusted network data. This makes sure that no unnecessary data is leaked into the child.
 - debugging
 - This bit-field enables debugging of the underlying parse and/or write routines. It may be set to KREQ_DEBUG_WRITE for writes and KREQ_DEBUG_READ_BODY for the pre-parsed body. Debugging messages to kutil_info(3) consist of the process ID followed by "-tx" or "-rx" for writing or reading, a colon and space, then the logged data. A newline will flush the existing line, as will reaching 80 characters. If flushed at 80 characters and not a newline, an ellipsis will follow the line. The total logged bytes will be emitted at the end of all reads or writes.
 - defmime
 - If no MIME type is specified (that is, there's no suffix to the page request), use this index in the mimes array.
 - defpage
 - If no page was specified (e.g., the default landing page), this is provided as the requested page index.
 - keys
 - An optional array of input and validation fields or NULL.
 - keysz
 - The number of elements in keys.
 - mimesz
 - The number of elements in mimes. Also the MIME index used if no MIME type was matched. This differs from defmime, which is used if there is no MIME suffix at all.
 - mimes
 - An array of MIME types (e.g., “text/html”), mapped into a MIME index during MIME body parsing. This relates both to pages and input fields with a body type. Any array should include at least text/plain, as this is the default content type for MIME documents.
 - opts
 - Tunable options regarding socket buffer sizes and so on. If set to NULL, meaningful defaults are used.
 - pages
 - An array of recognised pathnames. When pathnames are parsed, they're matched to indices in this array.
 - pagesz
 - The number of pages in pages. Also used if the requested page was not in pages.
 - req
 - This structure is cleared and filled with input fields and HTTP context parsed from the CGI environment. It is the main structure carried around in a kcgi(3) application.
 - suffixes
 - Define the MIME type (suffix) mapping.
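As a rough illustration of how pages, pagesz, and defpage interact, the following standalone sketch maps a page name to an index. It does not call kcgi: page_index() is a hypothetical helper, and the exact, case-sensitive comparison it uses is an assumption rather than documented behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of kcgi.
 * Returns defpage for an empty page name, the matching index in pages
 * if the name is recognised, or pagesz if it is not.
 */
static size_t
page_index(const char *pagename, const char *const *pages,
    size_t pagesz, size_t defpage)
{
	size_t	 i;

	assert(pagename != NULL);

	/* No page requested (e.g., the default landing page). */
	if (pagename[0] == '\0')
		return defpage;

	/* Assumption: names are compared exactly and case-sensitively. */
	for (i = 0; i < pagesz; i++)
		if (strcmp(pagename, pages[i]) == 0)
			return i;

	/* Requested page is not in pages. */
	return pagesz;
}
```

An application would typically treat a result equal to pagesz as "page not found" and respond accordingly.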
 
The first form,
    khttp_parse(),
    is for applications using the system-recognised MIME types. This should work
    well enough for most applications. It is equivalent to invoking the second
    form, khttp_parsex(), as follows:
khttp_parsex(req, ksuffixmap, kmimetypes, KMIME__MAX, keys, keysz, pages, pagesz, KMIME_TEXT_HTML, defpage, NULL, NULL, 0, NULL);
Types
A struct kreq object is filled in by
    khttp_parse() and
    khttp_parsex(). It consists of the following
  fields:
- void *arg
 - Private application data. This is set during khttp_parse().
 - enum kauth auth
 - Type of “managed” HTTP authorisation performed by the web server according to the AUTH_TYPE header variable, if any. This is KAUTH_DIGEST for the AUTH_TYPE of "digest", KAUTH_BASIC for "basic", KAUTH_BEARER for "bearer", KAUTH_UNKNOWN for other values of AUTH_TYPE, or KAUTH_NONE if AUTH_TYPE is not set. See the rawauth field for raw (i.e., not processed by the web server) authorisation requests.
 - struct kpair **cookiemap
 - An array of keysz singly linked lists of elements of the cookies array. If cookie->key is equal to one of the entries of keys and cookie->state is KPAIR_VALID or KPAIR_UNCHECKED, the cookie is added to the list cookiemap[cookie->keypos]. Empty lists are NULL. If a list contains more than one cookie, cookie->next points to the next cookie. For the last cookie in a list, cookie->next is NULL.
 - struct kpair **cookienmap
 - Similar to cookiemap, except that it contains the cookies where cookie->state is KPAIR_INVALID.
 - struct kpair *cookies
 - Key-value pairs read from request cookies found in the HTTP_COOKIE header variable, or NULL if cookiesz is 0. See fields for key-value pairs from the request query string or message body.
 - size_t cookiesz
 - The size of the cookies array.
 - struct kpair **fieldmap
 - Similar to cookiemap, except that the lists contain elements of the fields array.
 - struct kpair **fieldnmap
 - Similar to fieldmap, except that it contains the fields where field->state is KPAIR_INVALID.
 - struct kpair *fields
 - Key-value pairs read from the QUERY_STRING header variable and from the message body, or NULL if fieldsz is 0. See cookies for key-value pairs from request cookies.
 - size_t fieldsz
 - The number of elements in the fields array.
 - char *fullpath
 - The full requested path as contained in the PATH_INFO header variable. For example, when requesting "https://bsd.lv/app.cgi/dir/file.html?q=v", where "app.cgi" is the CGI program, this value would be /dir/file.html. It is not guaranteed to start with a slash and it may be an empty string.
 - char *host
 - The host name received in the HTTP_HOST header variable. When using name-based virtual hosting, this is typically the virtual host name specified by the client in the HTTP request, and it should not be confused with the canonical DNS name of the host running the web server. For example, a request to "https://bsd.lv/app.cgi/file" would have a host of "bsd.lv". If HTTP_HOST is not defined, host is set to "localhost".
 - struct kdata *kdata
 - Internal data. Should not be touched.
 - const struct kvalid *keys
 - Value passed to khttp_parse().
 - size_t keysz
 - Value passed to khttp_parse().
 - enum kmethod method
 - The KMETHOD_ACL, KMETHOD_CONNECT, KMETHOD_COPY, KMETHOD_DELETE, KMETHOD_GET, KMETHOD_HEAD, KMETHOD_LOCK, KMETHOD_MKCALENDAR, KMETHOD_MKCOL, KMETHOD_MOVE, KMETHOD_OPTIONS, KMETHOD_POST, KMETHOD_PROPFIND, KMETHOD_PROPPATCH, KMETHOD_PUT, KMETHOD_REPORT, KMETHOD_TRACE, or KMETHOD_UNLOCK submission method obtained from the REQUEST_METHOD header variable. If an unknown method was requested, KMETHOD__MAX is used. If no method was specified, the default is KMETHOD_GET.
   Applications will usually accept only KMETHOD_GET and KMETHOD_POST, so be sure to emit a KHTTP_405 status for undesired methods.
 - size_t mime
 - The MIME type of the requested file as determined by its suffix matched to the suffixes map passed to khttp_parsex() or the default ksuffixmap if using khttp_parse(). This defaults to the mimesz value passed to khttp_parsex() or the default KMIME__MAX if using khttp_parse() when no suffix is specified or when the suffix is specified but not known.
 - size_t page
 - The page index found by looking up pagename in the pages array. If pagename is not found in pages, pagesz is used; if pagename is empty, defpage is used.
 - char *pagename
 - The first component of fullpath or an empty string if there is none. It is compared to the elements of the pages array to determine which page it corresponds to. For example, for a fullpath of "/dir/file.html" this component corresponds to dir. For "/file.html", it's file.
 - char *path
 - The middle part of fullpath, after stripping pagename/ at the beginning and .suffix at the end, or an empty string if there is none. For example, if the fullpath is bar/baz.html, this component is baz.
 - char *pname
 - The script name received in the SCRIPT_NAME header variable. For example, for a request to a CGI program /var/www/cgi-bin/app.cgi mapped by the web server from "https://bsd.lv/app.cgi/file", this would be app.cgi. This may not reflect a file system entity and it may be an empty string.
 - uint16_t port
 - The server's receiving TCP port according to the SERVER_PORT header variable, or 80 if that is not defined or an invalid number.
 - struct khttpauth rawauth
 - The raw authorization request according to the HTTP_AUTHORIZATION header variable passed by the web server. This is only set if the web server is not managing authorisation itself.
 - char *remote
 - The string form of the client's IPv4 or IPv6 address taken from the REMOTE_ADDR header variable, or "127.0.0.1" if that is not defined. The address format of the string is not checked.
 - struct khead *reqmap[KREQU__MAX]
 - Mapping of enum krequ enumeration values to reqs parsed from the input stream.
 - struct khead *reqs
 - List of all HTTP request headers, known via enum krequ and not known, parsed from the input stream, or NULL if reqsz is 0.
 - size_t reqsz
 - Number of request headers in reqs.
 - enum kscheme scheme
 - The access scheme according to the HTTPS header variable, either KSCHEME_HTTPS if HTTPS is set and equal to the string "on" or KSCHEME_HTTP otherwise.
 - char *suffix
 - The suffix part of the last component of fullpath or an empty string if there is none. For example, if the fullpath is /bar/baz.html, this component is html. See the mime field for the MIME type parsed from the suffix.
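The relationship between fullpath, pagename, path, and suffix can be sketched in standalone C as follows. This is only an illustration of the descriptions above, not kcgi's own parser: struct parts, copy_span(), and split_fullpath() are hypothetical names, the fixed-size buffers are chosen for brevity, and edge cases may be handled differently by the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the three derived components. */
struct parts {
	char	 pagename[64];
	char	 path[64];
	char	 suffix[64];
};

/* Copy the bytes in [b, e) into dst, truncating to fit, NUL-terminated. */
static void
copy_span(char *dst, size_t dstsz, const char *b, const char *e)
{
	size_t	 n = (size_t)(e - b);

	if (n >= dstsz)
		n = dstsz - 1;
	memcpy(dst, b, n);
	dst[n] = '\0';
}

static void
split_fullpath(const char *fullpath, struct parts *p)
{
	const char	*cp, *end, *last, *dot, *slash;

	assert(fullpath != NULL && p != NULL);
	memset(p, 0, sizeof(*p));

	/* fullpath is not guaranteed to start with a slash. */
	cp = fullpath[0] == '/' ? fullpath + 1 : fullpath;
	end = cp + strlen(cp);

	/* The suffix follows the last dot of the last component. */
	last = strrchr(cp, '/');
	last = last == NULL ? cp : last + 1;
	if ((dot = strrchr(last, '.')) != NULL) {
		copy_span(p->suffix, sizeof(p->suffix), dot + 1, end);
		end = dot;
	}

	/* pagename is the first component; path is whatever remains. */
	slash = memchr(cp, '/', (size_t)(end - cp));
	if (slash == NULL) {
		copy_span(p->pagename, sizeof(p->pagename), cp, end);
	} else {
		copy_span(p->pagename, sizeof(p->pagename), cp, slash);
		copy_span(p->path, sizeof(p->path), slash + 1, end);
	}
}
```

With this sketch, "/dir/file.html" yields a pagename of dir, a path of file, and a suffix of html, matching the examples given for those fields.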
 
The application may optionally define
    keys provided to
    khttp_parse()
    and khttp_parsex() as an array of
    struct kvalid. This structure is central to the
    validation of input data. It consists of the following fields:
- const char *name
 - The field name, i.e., how it appears in the HTML form input name. This cannot be NULL. If the field name is an empty string and the HTTP message consists of an opaque body (and not key-value pairs), then that field will be used to validate the HTTP message body. This is useful for KMETHOD_PUT style requests.
 - int (*)(struct kpair *) valid
 - A validation function returning non-zero if parsing and validation succeed or 0 otherwise. If it is NULL, then no validation is performed, the data is considered as valid, and it is bucketed into cookiemap or fieldmap as such.
   User-defined valid functions usually set the type and parsed fields in the key-value pair. When working with binary data or with a key that can take different data types, it is acceptable for a validation function to set the type to KPAIR__MAX and for the application to ignore the parsed field and to work directly with val and valsz.
   The validation function is allowed to allocate new memory for val: if the val pointer changes during validation, the memory pointed to after validation will be freed with free(3) after the data is passed out of the sandbox.
   These functions are invoked from within a system-specific sandbox that may not allow some system calls, for example opening files or sockets. In other words, validation functions should only do pure computation.
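A validation function in this style might look like the sketch below. To keep it standalone, it uses a hypothetical struct pair that mirrors only the val, valsz, type, and integer parsed members described in this manual; the real struct kpair has more fields, and valid_int() is an illustrative name, not a kcgi function. The function performs pure computation only, as the sandbox requires.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the relevant parts of struct kpair. */
enum pairtype {
	PAIR_INTEGER,
	PAIR__MAX
};

struct pair {
	char		*val;	/* input bytes; val[valsz] is NUL */
	size_t		 valsz;	/* buffer length, not a string length */
	enum pairtype	 type;	/* set on successful validation */
	int64_t		 i;	/* parsed integer value */
};

/* Returns non-zero if val is a well-formed decimal integer, else 0. */
static int
valid_int(struct pair *p)
{
	char		*ep;
	long long	 v;

	assert(p != NULL && p->val != NULL);

	/* Reject empty input and input with embedded NUL bytes. */
	if (p->valsz == 0 || strlen(p->val) != p->valsz)
		return 0;

	errno = 0;
	v = strtoll(p->val, &ep, 10);
	if (errno == ERANGE || ep == p->val || *ep != '\0')
		return 0;

	p->type = PAIR_INTEGER;
	p->i = (int64_t)v;
	return 1;
}
```

Comparing strlen(3) against valsz matters because the value may contain NUL bytes, in which case the buffer is longer than the string it appears to hold.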
 
The struct kpair
    structure presents the user with fields parsed from input and (possibly)
    matched to the keys variable passed to
    khttp_parse()
    and khttp_parsex(). It is also passed to the
    validation function to be filled in. In this case, the MIME-related fields
    are already filled in and may be examined to determine the method of
    validation. This is useful when validating opaque message bodies.
- char *ctype
 - The value's MIME content type (e.g., image/jpeg), or an empty string if not defined.
 - size_t ctypepos
 - If ctype is not NULL, it is looked up in the mimes parameter passed to khttp_parsex() or kmimetypes if using khttp_parse(). If found, ctypepos is set to the matching index. Otherwise, it's mimesz.
 - char *file
 - The value's MIME source filename or an empty string if not defined.
 - char *key
 - The NUL-terminated key (input) name. If the HTTP message body is opaque (e.g., KMETHOD_PUT), then an empty-string key is cooked up. The key may contain an arbitrary sequence of non-NUL bytes, even non-ASCII bytes, control characters, and shell metacharacters.
 - size_t keypos
 - If found in the keys array passed to khttp_parse(), the index of the matching key. Otherwise keysz.
 - struct kpair *next
 - In a cookie or field map, next points to the next parsed key-value pair with the same key name. This occurs most often in HTML checkbox forms, where many fields may have the same name.
 - union parsed parsed
 - The parsed, validated value. This may be an integer i, for a 64-bit signed integer; a string s, for a NUL-terminated character string; or a double d, for a double-precision floating-point number. This is intentionally basic because the resulting data must be reliably passed from the parsing context back into the web application.
 - enum kpairstate state
 - The validation state: KPAIR_VALID if the pair was successfully validated by a validation function, KPAIR_INVALID if a validation function was invoked but failed, or KPAIR_UNCHECKED if no validation function is defined for this key.
 - enum kpairtype type
 - If parsed, the type of data in parsed, otherwise KPAIR__MAX.
 - char *val
 - The (input) value, which may contain an arbitrary sequence of bytes, even NUL bytes, non-ASCII bytes, control characters, and shell metacharacters. The byte following the end of the array, val[valsz], is always guaranteed to be NUL. The validation function may modify the contents. For example, for integer numbers and e-mail addresses, trailing whitespace may be replaced with NUL bytes.
 - size_t valsz
 - The length of the val buffer in bytes. It is not a string length.
 - char *xcode
 - The value's MIME content transfer encoding (e.g., base64), or an empty string if not defined.
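The way keypos, state, and next combine to form the cookiemap, cookienmap, fieldmap, and fieldnmap lists can be sketched as below. This is a standalone illustration under stated assumptions: struct pair and enum pstate are hypothetical stand-ins for struct kpair and enum kpairstate, bucket_pairs() is not a kcgi function, and appending in input order is an assumption, since the ordering within a list is not specified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum kpairstate and struct kpair. */
enum pstate {
	PSTATE_UNCHECKED,
	PSTATE_VALID,
	PSTATE_INVALID
};

struct pair {
	size_t		 keypos; /* index into keys, or keysz if unmatched */
	enum pstate	 state;
	struct pair	*next;
};

/*
 * Fill map and nmap, each an array of keysz list heads.
 * Valid and unchecked pairs go to map[keypos], invalid pairs go to
 * nmap[keypos], and pairs whose key matched nothing are skipped.
 */
static void
bucket_pairs(struct pair *pairs, size_t pairsz,
    struct pair **map, struct pair **nmap, size_t keysz)
{
	struct pair	**pp;
	size_t		  i;

	assert(map != NULL && nmap != NULL);

	/* Empty lists are NULL. */
	for (i = 0; i < keysz; i++)
		map[i] = nmap[i] = NULL;

	for (i = 0; i < pairsz; i++) {
		pairs[i].next = NULL;
		if (pairs[i].keypos >= keysz)
			continue;
		pp = pairs[i].state == PSTATE_INVALID ?
		    &nmap[pairs[i].keypos] : &map[pairs[i].keypos];
		/* Assumption: append, preserving input order. */
		while (*pp != NULL)
			pp = &(*pp)->next;
		*pp = &pairs[i];
	}
}
```

An application then reads the first value for a key from map[index] and follows next for repeated names, such as multiple checkboxes sharing one name.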
The struct khttpauth structure holds authorisation data if passed by the server. The specific fields are as follows.
- enum kauth type
 - If no data was passed by the server, the type value is KAUTH_NONE. Otherwise it's KAUTH_BASIC, KAUTH_BEARER, or KAUTH_DIGEST. KAUTH_UNKNOWN signals that the authorisation type was not recognised.
 - int authorised
 - For KAUTH_BASIC, KAUTH_BEARER, or KAUTH_DIGEST authorisation, this field indicates whether all required values were specified for the application to perform authorisation.
 - char *digest
 - An MD5 digest of REQUEST_METHOD, SCRIPT_NAME, PATH_INFO, header variables and the request body. It is not a NUL-terminated string, but an array of exactly MD5_DIGEST_LENGTH bytes. Only filled in when HTTP_AUTHORIZATION is "digest" and authorised is non-zero. Otherwise, it remains NULL. Used in khttpdigest_validatehash(3).
 - d
 - An anonymous union containing parsed fields per type: struct khttpbasic basic for KAUTH_BASIC or KAUTH_BEARER, or struct khttpdigest digest for KAUTH_DIGEST.
If the field for an HTTP authorisation request is
    KAUTH_BASIC or KAUTH_BEARER,
    it will consist of the following for its parsed entities in its
    struct khttpbasic structure:
- response
 - The hashed and encoded response string for KAUTH_BASIC, or an opaque string for KAUTH_BEARER.
If the field for an HTTP authorisation request is
    KAUTH_DIGEST, it will consist of the following in
    its struct khttpdigest structure:
- alg
 - The encoding algorithm, parsed from the possible MD5 or MD5-Sess values.
 - qop
 - The quality of protection algorithm, which may be unspecified, Auth, or Auth-Int.
 - user
 - The user coordinating the request.
 - uri
 - The URI for which the request is designated. (This must match the request URI).
 - realm
 - The request realm.
 - nonce
 - The server-generated nonce value.
 - cnonce
 - The (optional) client-generated nonce value.
 - response
 - The hashed and encoded response string, which entangles fields depending on algorithm and quality of protection.
 - count
 - The (optional) cnonce counter.
 - opaque
 - The (optional) opaque string requested by the server.
 
The struct kopts structure consists of tunables for network performance. You probably don't want to use these unless you really know what you're doing!
- sndbufsz
 - The size of the output buffer. The output buffer is a heap-allocated region into which writes (via khttp_write(3) and khttp_head(3)) are buffered instead of being flushed directly to the wire. The buffer is flushed when it is full, when the HTTP headers are flushed, and when khttp_free(3) is invoked. If the buffer size is zero, writes are flushed immediately to the wire. If the buffer size is less than zero, it is filled with a meaningful default.
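The buffering behaviour described for sndbufsz can be modelled with the standalone sketch below. It is not kcgi's implementation: struct outbuf, outbuf_write(), and outbuf_flush() are hypothetical, the "wire" is represented by a byte counter, the buffer is a fixed array rather than heap-allocated, and the negative-size default is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OUTBUF_MAX 64	/* hypothetical capacity limit for this sketch */

struct outbuf {
	char	 buf[OUTBUF_MAX];	/* pending bytes */
	size_t	 sz;		/* configured size; 0 disables buffering */
	size_t	 pos;		/* bytes currently pending */
	size_t	 wire;		/* total bytes "sent" (stand-in for the wire) */
	size_t	 flushes;	/* number of flushes performed */
};

/* Send all pending bytes to the wire. */
static void
outbuf_flush(struct outbuf *o)
{
	if (o->pos == 0)
		return;
	o->wire += o->pos;
	o->pos = 0;
	o->flushes++;
}

/* Buffer data, flushing whenever the buffer becomes full. */
static void
outbuf_write(struct outbuf *o, const char *data, size_t len)
{
	size_t	 n;

	assert(o != NULL && o->sz <= OUTBUF_MAX);

	/* A zero-sized buffer means writes go straight to the wire. */
	if (o->sz == 0) {
		o->wire += len;
		return;
	}

	while (len > 0) {
		n = o->sz - o->pos;
		if (n > len)
			n = len;
		memcpy(o->buf + o->pos, data, n);
		o->pos += n;
		data += n;
		len -= n;
		if (o->pos == o->sz)
			outbuf_flush(o);
	}
}
```

In this model, an explicit outbuf_flush() call plays the role of the flush that happens when headers are flushed or when khttp_free(3) is invoked.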
 
Lastly, the struct khead structure holds parsed HTTP headers.
- key
 - Holds the HTTP header name. This is not the CGI header name (e.g., HTTP_COOKIE), but the reconstituted HTTP name (e.g., Cookie).
 - val
 - The opaque header value, which may be an empty string.
 
Variables
A number of variables are defined in <kcgi.h> to simplify invocations of the khttp_parse() family. Applications are strongly encouraged to use these variables (and associated enumerations) in khttp_parse() instead of overriding them with hand-rolled sets in khttp_parsex().
- kmimetypes
 - Indexed list of common MIME types, for example, “text/html” and “application/json”. Corresponds to enum kmime.
 - khttps
 - Indexed list of HTTP status code and identifier, for example, “200 OK”. Corresponds to enum khttp.
 - kschemes
 - Indexed list of URL schemes, for example, “https” or “ftp”. Corresponds to enum kscheme.
 - kmethods
 - Indexed list of HTTP methods, for example, “GET” and “POST”. Corresponds to enum kmethod.
 - ksuffixmap
 - Map of MIME types defined in enum kmime to possible suffixes. This array is terminated with a MIME type of KMIME__MAX and name NULL.
 - ksuffixes
 - Indexed list of canonical suffixes for MIME types corresponding to enum kmime. This may be a NULL pointer for types that have no canonical suffix, for example, “application/octet-stream”.
RETURN VALUES
khttp_parse() and
    khttp_parsex() return an error code:
- KCGI_OK
 - Success (not an error).
 - KCGI_ENOMEM
 - Memory failure. This can occur in many places: spawning a child, allocating memory, creating sockets, etc.
 - KCGI_ENFILE
 - Could not allocate file descriptors.
 - KCGI_EAGAIN
 - Could not spawn a child.
 - KCGI_FORM
 - Malformed data between parent and child whilst parsing an HTTP request. (Internal system error.)
 - KCGI_SYSTEM
 - Opaque operating system error.
 
On failure, the calling application should terminate as soon as possible. Applications should not try to write an HTTP 505 error or similar, but allow the web server to handle the empty CGI response on its own.
SEE ALSO
AUTHORS
The khttp_parse() and
    khttp_parsex() functions were written by
    Kristaps Dzonsons
    <kristaps@bsd.lv>.