kcgi: minimal CGI library in C

Version 0.3.3: rename khttp_parse to khttp_parsex and use simplified arguments for the original khttp_parse. Have content types passed into struct kpair looked up in the MIME type database and, if found, record the index as ctypepos. Remove some overly restrictive assertions on empty input fields and add the kvalid_stringne and kvalid_date validators for non-empty strings and ISO 8601 dates, respectively (the latter to avoid strptime, which violates the sandbox). Lastly, add kutil_urlpartx for typed arguments.

kcgi is a minimal CGI library for web applications in ISC-licensed ISO C. It was designed to be secure and auditable. See a Comparison of CGI Libraries in C for alternatives. To start, download kcgi.tgz and run make install with the PREFIX of your choice. The kcgi(3) manpage documents usage. kcgi is a BSD.lv project.

    int main(void) {
        struct kreq r;
        struct kvalid key = { kvalid_int, "integer" };
        const char *page = "index";

        if ( ! khttp_parse(&r, &key, 1, &page, 1, 0))
            return(EXIT_FAILURE);
        khttp_head(&r, kresps[KRESP_STATUS], "%s", khttps[KHTTP_200]);
        khttp_head(&r, kresps[KRESP_CONTENT_TYPE], "%s", kmimetypes[r.mime]);
        khttp_body(&r);
        return(EXIT_SUCCESS);
    }

Most kcgi applications work as follows (the sample.c file distributed in the source consists of a full working example):

  1. Call khttp_parse as early as possible. This will parse form, query string, and cookie data; validate fields; set up the HTTP environment; and map page and MIME requests. Validation uses kvalid_date, kvalid_double, kvalid_email, kvalid_int, kvalid_string, kvalid_stringne, kvalid_udouble, kvalid_uint, or locally-defined functions.
  2. Examine the struct kpair elements of the struct kreq structure and potentially perform high-level, database-driven revalidation. This structure contains all elements parsed by khttp_parse.
  3. Emit HTTP headers with khttp_head, followed by khttp_body to begin the HTTP body. The latter will automatically trigger compression if requested by the client.
  4. Emit HTTP body output using HTML5 tree-building functions khtml_attr, khtml_attrx, khtml_close, khtml_closeto, khtml_elem, khtml_elemat, khtml_entity, khtml_int, khtml_ncr, or khtml_text; or
  5. use the khttp_template or khttp_template_buf functions to populate file templates; or
  6. directly use khttp_putc, khttp_puts, and khttp_write.
  7. Call khttp_free to close the HTTP document and free all memory.
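
Step 1 mentions locally-defined validation functions. The following is a minimal, self-contained sketch of the kind of check such a function might perform: accepting only a YYYY-MM-DD calendar date without calling strptime. The names valid_iso_date and days_in_month are hypothetical, and the signature is not kcgi's validator signature (see kcgi(3) for that); it only illustrates a check that touches nothing outside its argument.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of kcgi): days in a month of a given year. */
static int
days_in_month(int year, int month)
{
	static const int days[12] =
	    { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
	int leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;

	return (month == 2 && leap) ? 29 : days[month - 1];
}

/*
 * Hypothetical validator: accept only "YYYY-MM-DD" naming a real
 * calendar date.  Returns 1 if valid, 0 otherwise.  It reads only
 * its argument: no files, no locale, no strptime.
 */
int
valid_iso_date(const char *val)
{
	int i, year, month, day;

	assert(val != NULL);
	if (strlen(val) != 10 || val[4] != '-' || val[7] != '-')
		return 0;
	for (i = 0; i < 10; i++)
		if (i != 4 && i != 7 && !isdigit((unsigned char)val[i]))
			return 0;

	year = (val[0] - '0') * 1000 + (val[1] - '0') * 100 +
	    (val[2] - '0') * 10 + (val[3] - '0');
	month = (val[5] - '0') * 10 + (val[6] - '0');
	day = (val[8] - '0') * 10 + (val[9] - '0');

	if (month < 1 || month > 12)
		return 0;
	return day >= 1 && day <= days_in_month(year, month);
}
```

Keeping the check free of file and locale access matters because, as the Security section explains, validators run inside the sandboxed child.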

This library is still quite new. Contact Kristaps with questions or comments.

The following is a rough feature list of kcgi. See the manual for details.

Security

As a security precaution, the kcgi library parses and validates untrusted network data in a sandboxed child process as follows. When invoked, khttp_parse will fork. The child process is responsible for reading and parsing form data from the web server. This parsed data is returned to the parent process over a socketpair.
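
The fork-and-socketpair pattern described above can be sketched roughly as follows. This is not kcgi's implementation: count_fields_in_child is a hypothetical function, the "parsing" is reduced to counting non-empty fields separated by '&', and no sandbox is actually entered. It only shows the shape of the idea, with the child handling the untrusted bytes and the parent receiving just the result.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch: count the non-empty '&'-separated fields of
 * "query" in a forked child and pass the count back to the parent
 * over a socketpair.  Returns the count, or -1 on any failure.
 */
int
count_fields_in_child(const char *query)
{
	int fd[2], status, count = 0;
	pid_t pid;
	ssize_t n;

	assert(query != NULL);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd) == -1)
		return -1;
	if ((pid = fork()) == -1) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: a real implementation would enter its sandbox here. */
		const char *p = query;

		close(fd[0]);
		while (*p != '\0') {
			size_t len = strcspn(p, "&");

			if (len > 0)
				count++;
			p += len;
			if (*p == '&')
				p++;
		}
		if (write(fd[1], &count, sizeof(count)) !=
		    (ssize_t)sizeof(count))
			_exit(EXIT_FAILURE);
		_exit(EXIT_SUCCESS);
	}

	/* Parent: read the child's answer, then reap the child. */
	close(fd[1]);
	n = read(fd[0], &count, sizeof(count));
	close(fd[0]);
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	if (n != (ssize_t)sizeof(count) || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != EXIT_SUCCESS)
		return -1;
	return count;
}
```

If the child crashes or is killed while parsing, the parent sees a short read or an abnormal exit status and reports failure instead of acting on partial data.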

kcgi follows OpenSSH's method of sandboxing the untrusted child process. This requires special handling for each operating system. For now, only two methods are supported.

systrace
The systrace(4) device as found on OpenBSD and other operating systems. This requires the existence of /dev/systrace if running in a chroot(2). Note: if you're using a stock OpenBSD, make sure that the file system containing /dev/systrace isn't mounted nodev!
Mac OS X Sandbox
The sandbox(7) facility for pure computation provided in Mac OS X Leopard and later. This is supplemented by resource limiting with setrlimit(2).
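
As a rough illustration of resource limiting with setrlimit(2), the sketch below forks a child, sets its file-size and open-descriptor limits to zero, and checks that the child can no longer open a file. The choice of limits and the names limit_self and open_refused_under_limits are assumptions made for this example; they are not taken from kcgi's source.

```c
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: forbid the calling process from growing files
 * or obtaining new file descriptors.  Returns 0 on success, -1 on error.
 */
static int
limit_self(void)
{
	struct rlimit zero = { 0, 0 };

	if (setrlimit(RLIMIT_FSIZE, &zero) == -1)
		return -1;
	if (setrlimit(RLIMIT_NOFILE, &zero) == -1)
		return -1;
	return 0;
}

/*
 * Fork a child, apply the limits in it, and try to open "path".
 * Returns 1 if the open was refused, 0 if it succeeded, -1 on error.
 */
int
open_refused_under_limits(const char *path)
{
	pid_t pid;
	int status;

	assert(path != NULL);
	if ((pid = fork()) == -1)
		return -1;
	if (pid == 0) {
		if (limit_self() == -1)
			_exit(2);
		_exit(open(path, O_RDONLY) == -1 ? 1 : 0);
	}
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) == 2)
		return -1;
	return WEXITSTATUS(status);
}
```

Limits applied this way are inherited by nothing but the child, so the parent process keeps its normal access to files and descriptors.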

Since validation occurs within the sandbox, special care must be taken that validation routines don't access the environment (e.g., by opening files); otherwise, the child will be abruptly killed.

Portability

kcgi should run on any modern UNIX system and with any web server. To date, it has been built and run on GNU/Linux machines, BSD (OpenBSD), and Mac OS X (Snow Leopard, Lion) on i386 and AMD64. It has been deployed under Apache and nginx (via the slowcgi wrapper).

Portability across UNIX systems is made possible by a small configure script that checks for minor inconsistencies such as the presence of strlcpy, for the Security mechanisms, and for Compression support.
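
A configure-time check of this kind usually ends up as a preprocessor switch in the source. The sketch below assumes a macro named HAVE_STRLCPY (by analogy with HAVE_ZLIB; the actual macro name is not given here) and a hypothetical wrapper name, compat_strlcpy, that maps to the system function when it exists and to a small fallback otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef HAVE_STRLCPY
/* Assumed macro: the configure step found strlcpy, so use it directly. */
#define compat_strlcpy strlcpy
#else
/*
 * Fallback with strlcpy semantics: copy at most dstsz - 1 bytes,
 * always NUL-terminate when dstsz > 0, and return strlen(src) so the
 * caller can detect truncation (return value >= dstsz).
 */
size_t
compat_strlcpy(char *dst, const char *src, size_t dstsz)
{
	size_t srclen;

	assert(dst != NULL && src != NULL);
	srclen = strlen(src);
	if (dstsz > 0) {
		size_t n = srclen < dstsz - 1 ? srclen : dstsz - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}
#endif
```

Callers use compat_strlcpy everywhere and never need to know which branch was compiled.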

Extensibility

While page maps and input validation are entirely driven by the interfacing application, kcgi also allows for extension of the default HTTP headers, schemas, MIME types, and so on. Reasonable defaults have been provided for convenience. For specifics, see khttp_parse and khttp_parsex in kcgi(3).

Compression

If HAVE_ZLIB is enabled during compilation (via the Portability mechanism), khttp_body will signal use of zlib to compress the HTTP body. Compression is only enabled if the client provides the correct (gzip) HTTP request header.
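
Assuming the request header in question is the standard Accept-Encoding header, a check for gzip support might look like the sketch below. The function accepts_gzip is hypothetical and is not how kcgi is known to perform the test; it scans the comma-separated list, matches the coding name case-insensitively, and treats an explicit weight of q=0 as a refusal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical check: does an Accept-Encoding value allow gzip?
 * Pass "" when the header is absent.  Returns 1 if gzip is listed
 * with a non-zero weight (or no weight), 0 otherwise.
 */
int
accepts_gzip(const char *hdr)
{
	const char *p, *q, *end;

	assert(hdr != NULL);
	p = hdr;
	while (*p != '\0') {
		end = p + strcspn(p, ",");	/* end of this list element */
		while (p < end && (*p == ' ' || *p == '\t'))
			p++;
		if (end - p >= 4 && strncasecmp(p, "gzip", 4) == 0) {
			q = p + 4;
			while (q < end && (*q == ' ' || *q == '\t'))
				q++;
			if (q == end)
				return 1;	/* bare "gzip" */
			if (*q == ';') {
				q++;
				while (q < end && (*q == ' ' || *q == '\t'))
					q++;
				/* An optional weight such as "q=0.8". */
				if (end - q >= 2 &&
				    (q[0] == 'q' || q[0] == 'Q') && q[1] == '=')
					return strtod(q + 2, NULL) > 0.0;
				return 1;
			}
			/* Something like "gzipped": not a match. */
		}
		p = (*end == ',') ? end + 1 : end;
	}
	return 0;
}
```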

Input Processing

All common input methods (query string, cookie, and form, the latter as multipart form-data and mixed, urlencoded, and plain) are supported by kcgi. As described in the Security section, these fields are all parsed and validated from network data in a child process. Each input key-value pair can be matched (by key name) to a validator, which is run when fields are parsed. You can then look up key-value pairs in constant time in a table indexed by that key.
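
The idea of matching pairs to validators by key name and then looking them up by index can be sketched as follows. Everything here is hypothetical (the enum, the struct pair layout, the two validators, and map_fields) and is much simpler than kcgi's own structures; it only shows why the later lookup costs a single array access.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key indices known to the application. */
enum key { KEY_AGE, KEY_NAME, KEY__MAX };

/* Hypothetical parsed input pair. */
struct pair {
	const char *key;
	const char *val;
};

typedef int (*validator)(const char *val);

static int
valid_nonempty(const char *val)
{
	return val[0] != '\0';
}

static int
valid_digits(const char *val)
{
	size_t i;

	for (i = 0; val[i] != '\0'; i++)
		if (!isdigit((unsigned char)val[i]))
			return 0;
	return i > 0;
}

/* Key names and their validators, indexed by enum key. */
static const struct {
	const char *name;
	validator valid;
} keys[KEY__MAX] = {
	{ "age", valid_digits },	/* KEY_AGE */
	{ "name", valid_nonempty },	/* KEY_NAME */
};

/*
 * Point map[k] at the first pair whose key matches keys[k].name and
 * whose value passes keys[k].valid, or leave it NULL if none does.
 */
void
map_fields(const struct pair *pairs, size_t npairs,
    const struct pair *map[KEY__MAX])
{
	size_t i;
	int k;

	assert(pairs != NULL || npairs == 0);
	assert(map != NULL);
	for (k = 0; k < KEY__MAX; k++)
		map[k] = NULL;
	for (i = 0; i < npairs; i++)
		for (k = 0; k < KEY__MAX; k++)
			if (map[k] == NULL &&
			    strcmp(pairs[i].key, keys[k].name) == 0 &&
			    keys[k].valid(pairs[i].val))
				map[k] = &pairs[i];
}
```

After map_fields has run once, the application reads map[KEY_AGE] directly; a NULL entry means the field was missing or failed validation.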

Output Processing

kcgi provides just the functions needed to build HTML5 trees, output HTTP headers, and build URLs. Many of these functions have both a basic and an extended calling style (with the function name ending in x, such as khtml_attrx). As a convenience, it also provides memory allocation wrappers, but these can be safely disregarded or mixed with the standard UNIX memory allocation routines.

Templating

While you can build HTML5 trees as noted in Output Processing, most applications will want just to fill in a template. kcgi provides two simple functions, khttp_template and khttp_template_buf, to fill in files or memory buffers with data. Templates are the most common usage of kcgi, as they allow for a strong disconnect between presentation and logic.
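
To make the fill-in idea concrete, here is a minimal sketch of template substitution over a memory buffer. It assumes placeholders written as @@name@@ (an assumption for this example; consult the manual for the syntax kcgi actually uses) and does not use khttp_template_buf or its callback interface. The function fill_template and the helper append are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append n bytes of s at out[*pos], keeping out NUL-terminated.
 * Returns 1 on success, 0 if the result would not fit in outsz bytes.
 */
static int
append(char *out, size_t outsz, size_t *pos, const char *s, size_t n)
{
	if (n >= outsz - *pos)
		return 0;
	memcpy(out + *pos, s, n);
	*pos += n;
	out[*pos] = '\0';
	return 1;
}

/*
 * Hypothetical sketch: copy tmpl to out, replacing each @@key@@ with
 * the value whose name matches.  Unknown keys are copied unchanged.
 * Returns 1 on success, 0 if out is too small.
 */
int
fill_template(const char *tmpl, const char *const *keys,
    const char *const *vals, size_t nkeys, char *out, size_t outsz)
{
	size_t pos = 0, i, klen;
	const char *start, *end;

	assert(tmpl != NULL && out != NULL && outsz > 0);
	out[0] = '\0';
	while ((start = strstr(tmpl, "@@")) != NULL &&
	    (end = strstr(start + 2, "@@")) != NULL) {
		/* Literal text before the placeholder. */
		if (!append(out, outsz, &pos, tmpl, (size_t)(start - tmpl)))
			return 0;
		klen = (size_t)(end - (start + 2));
		for (i = 0; i < nkeys; i++)
			if (strlen(keys[i]) == klen &&
			    strncmp(keys[i], start + 2, klen) == 0)
				break;
		if (i < nkeys) {
			if (!append(out, outsz, &pos, vals[i], strlen(vals[i])))
				return 0;
		} else if (!append(out, outsz, &pos, start, klen + 4)) {
			return 0;
		}
		tmpl = end + 2;
	}
	/* Remaining literal text after the last placeholder. */
	return append(out, outsz, &pos, tmpl, strlen(tmpl));
}
```

The template text carries all of the presentation, while the key and value arrays carry the application's data, which is the separation the paragraph above describes.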