kcgi is a minimal CGI library for web applications in ISC licensed ISO C. It was designed to be secure and auditable. To start, install the library then read the usage guide, which links to the canonical manpage documentation. Contact Kristaps with questions or comments. kcgi is a BSD.lv project.

The following simple example implements a CGI application that responds to every request with Hello, world! Click on any italicised field to link to the documentation.
#include <stdint.h>
#include <stdlib.h>
#include <kcgi.h>

int main(void) {
  struct kreq r;
  const char *page = "index";

  /*
   * Parse the HTTP environment.
   * We only know a single page, "index", which is also
   * the default page if none is supplied.
   * (We don't validate any input fields.)
   */
  if (KCGI_OK != khttp_parse(&r, NULL, 0, &page, 1, 0))
    return(EXIT_FAILURE);

  /* 
   * Ordinarily, here I'd switch on the method (OPTIONS, etc.,
   * defined in the method variable) then switch on which
   * page was requested (page variable).
   * But for the sake of example, just output a response.
   */

  /* Emit the HTTP status 200 header: everything's ok. */
  khttp_head(&r, kresps[KRESP_STATUS], "%s", khttps[KHTTP_200]);
  /* Echo our content-type, defaulting to HTML if none was specified. */
  khttp_head(&r, kresps[KRESP_CONTENT_TYPE], "%s", kmimetypes[r.mime]);
  /* No more HTTP headers: start the HTTP document body. */
  khttp_body(&r);
  
  /*
   * We can put any content below here: JSON, HTML, etc.
   * Usually we'd switch on our MIME type.
   * However, we're just going to put the literal string as noted...
   */
  khttp_puts(&r, "Hello, world!");
  /* Flush the document and free resources. */
  khttp_free(&r);
  return(EXIT_SUCCESS);
}

Installation

First, check whether kcgi is already available as a third-party port for your system, such as for OpenBSD or FreeBSD. If so, install it using that system.

If not, you'll need a modern UNIX system. To date, kcgi has been built and run on GNU/Linux machines, BSD (OpenBSD, FreeBSD), and Mac OS X (Snow Leopard, Lion) on i386 and AMD64. It has been deployed under Apache, nginx, and OpenBSD's httpd(8) (the latter two via the slowcgi wrapper). Begin by downloading kcgi.tgz and verifying the archive with kcgi.tgz.sha512. Then compile the software with make, which will automatically run a configuration script to conditionally deploy portability glue. Finally, install the software using make install, optionally specifying the PREFIX if you don't intend to use /usr/local.

If kcgi doesn't compile, please send me the config.log file and the output of the failed compilation. If you're running on an operating system with an unsupported sandbox, let me know and we can work together to fit it into the configuration and portability layer. In particular, kcgi doesn't yet support seccomp, so if you're running on Linux, you're not being sandboxed. Please contact me if you can give me access to a server on which to add this support.

Usage

While this section describes a fairly typical setup, you'll want to read kcgi(3) and the related manpages for a fuller description. The most relevant component of kcgi is input parsing, described in khttp_parse(3). All key-value pairs are parsed from input, as are non-key-value message bodies. Consider a sample application that wishes to process two named HTML fields: string, a non-empty string, and integer, a signed integer. First, assign these fields to numeric identifiers. This simplifies later access to the field values.

enum key {
  KEY_STRING,
  KEY_INTEGER,
  KEY__MAX
};

Next, connect the indices with validation functions and names. The validation function is run when the value is parsed; the name is the HTML form name for the given element. Validation applies to cookies as well as form and query-string data. You can provide your own validation functions instead of using the stock kcgi ones described in kvalid_string(3), of course. An empty string for the name will be applied to non-key-value message bodies.

const struct kvalid keys[KEY__MAX] = {
  { kvalid_stringne, "string" }, 
  { kvalid_int, "integer" }, 
};
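
A validator for the integer field has to reject empty input, trailing garbage, and out-of-range values. The following is a standalone sketch of such a check using only the C standard library. It is not kcgi's implementation, and parse_int64 is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of kcgi: convert a NUL-terminated
 * string to a signed 64-bit integer.
 * Returns 1 and stores the value in *out on success.
 * Returns 0 if the string is empty, is not a number, has trailing
 * characters, or is out of range.
 * Like strtoll(3), it accepts leading whitespace and a sign.
 * Assumes long long is 64 bits wide, as on common platforms.
 */
int
parse_int64(const char *s, int64_t *out)
{
  char *end;
  long long v;

  assert(s != NULL && out != NULL);
  if (*s == '\0')
    return 0;
  errno = 0;
  v = strtoll(s, &end, 10);
  if (errno == ERANGE || end == s || *end != '\0')
    return 0;
  *out = (int64_t)v;
  return 1;
}
```

A caller would pass the raw field value and use the converted number only when the function returns 1, which mirrors the split between fieldmap and fieldnmap described here.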

Next, define a function that acts upon parsed fields. Note that this is application logic, and thus constitutes only an example: each application will handle its input differently. For simplicity, I focus only on the string input. If the value is found, it is assigned into the fieldmap array. If it was found but did not validate, it is assigned into the fieldnmap array. In this trivial example, the function emits the string values if found or indicates that they're not found (or not valid). There can be multiple inputs matching the same name, such as for HTML checkboxes.

Beyond directly writing to the HTTP document with khttp_write(3) and templating with khttp_template(3), kcgi provides the kcgihtml(3), kcgijson(3), and kcgixml(3) libraries for media-specific output functions. These are intended only for the simplest use: complex applications will probably use their own.

void process(struct kreq *r) {
  struct kpair *p;
  khttp_puts(r, "string = ");
  if ((p = r->fieldmap[KEY_STRING]))
    khttp_puts(r, p->parsed.s);
  else if (r->fieldnmap[KEY_STRING])
    khttp_puts(r, "failed parse");
  else 
    khttp_puts(r, "not provided");
}

Before acting on any parsed input, sanitise the HTTP context. To begin, I provide an array of indexed page identifiers. These define the page requests accepted by the application, in this case being only /index.html.

enum page {
  PAGE_INDEX,
  PAGE__MAX
};
const char *const pages[PAGE__MAX] = {
  "index",
};

Now, validate the page request and HTTP context based upon pre-parsed components. This function checks the page request (it must be /index), MIME type (expanding to /index.html), and HTTP method (it must be an HTTP GET, such as /index.html?string=foo).

int sanitise(struct kreq *r) {
  if (PAGE_INDEX != r->page)
    return(0);
  else if (KMIME_TEXT_HTML != r->mime)
    return(0);
  return (KMETHOD_GET == r->method);
}
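
The page index tested above results from matching the requested page name against the pages array. As a rough illustration of that lookup, here is a standalone sketch. It assumes an empty name selects the default page and an unknown name yields the array size, so it can never equal a valid index. The function lookup_page is hypothetical and not part of kcgi; see khttp_parse(3) for the library's actual behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of kcgi: map a requested page name
 * onto an index in the pages array.
 * An empty name selects the default page; an unknown name yields
 * pagesz, which is one past the last valid index.
 */
size_t
lookup_page(const char *name, const char *const *pages,
    size_t pagesz, size_t defpage)
{
  size_t i;

  assert(name != NULL && pages != NULL);
  if (name[0] == '\0')
    return defpage;
  for (i = 0; i < pagesz; i++)
    if (strcmp(name, pages[i]) == 0)
      return i;
  return pagesz;
}
```

Under these assumptions, a request for an unlisted page produces an index that fails the PAGE_INDEX comparison, so the request is rejected.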

Putting all of these together, parse the HTTP context, validate it, process it, then free the resources. This simple example emits an HTTP 404 error regardless of the cause. A real application would distinguish the causes, emitting HTTP 405 for unsupported methods and an empty response for non-text media. Headers are output using khttp_head(3), with the document body started with khttp_body(3). The HTTP context is closed with khttp_free(3).

int main(void) {
  struct kreq r;
  if (KCGI_OK != khttp_parse(&r, keys, KEY__MAX,
      pages, PAGE__MAX, PAGE_INDEX))  
    return(EXIT_FAILURE);
  if ( ! sanitise(&r)) {
    khttp_head(&r, kresps[KRESP_STATUS],
      "%s", khttps[KHTTP_404]);
    khttp_head(&r, kresps[KRESP_CONTENT_TYPE],
      "%s", kmimetypes[r.mime]);
    khttp_body(&r);
    khttp_puts(&r, "Page not found.");
  } else {
    khttp_head(&r, kresps[KRESP_STATUS],
      "%s", khttps[KHTTP_200]);
    khttp_head(&r, kresps[KRESP_CONTENT_TYPE],
      "%s", kmimetypes[r.mime]);
    khttp_body(&r);
    process(&r);
  }
  khttp_free(&r);
  return(EXIT_SUCCESS);
}

Usually, web applications for HTML content will use the page and input maps quite heavily, switching on the page requested (which is usually known beforehand) and data sent along with the page. CMS-style web applications will not parse the page identifier, leaving that to a database lookup. Moreover, while HTML applications can generally disregard the HTTP method, a DAV implementation will focus much more on the method itself.

Deploying

Applications using kcgi behave just like any other application. To compile kcgi applications, just include the kcgi.h header file and make sure it appears in the compiler inclusion path. (According to C99, you'll need to include stdint.h before it for the int64_t type used for parsing integers.) Linking is similarly conventional: link to libkcgi and, if your system has compression support, libz.

Well-configured web servers, such as the default OpenBSD server, run within a chroot(2) by default. If this is the case, you'll need to statically link your binary. If running within a chroot(2) and on OpenBSD, be aware that the sandbox method requires /dev/systrace within the server root. By default, this file does not exist in the web server root. Moreover, the default web server root mount-point, /var, is mounted nodev. This complication does not exist for the other sandboxes.

Implementation Details

The bulk of kcgi lies in khttp_parse(3), which fully parses the HTTP context. Application developers must invoke this function before all others. It must be matched by a call to khttp_free(3), which frees all resources.

The khttp_parse(3) function isolates its parsing and validation of untrusted network data within a sandboxed child process. Sandboxes limit the environment available to a process, so exploitable errors in the parsing process (or validation with third-party libraries) cannot touch the system environment. This parsed data is returned to the parent process over a socket.
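
The general shape of this design, a child that handles untrusted data and returns results to its parent over a socket, can be sketched with fork(2) and socketpair(2). This is only an illustration under simplifying assumptions, not kcgi's code: parse_value and parse_in_child are hypothetical names, the "parsing" is a trivial key=value split, and the point at which an operating-system sandbox would be entered is marked only by a comment.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for real parsing: copy the text following
 * the first '=' of a "key=value" string into buf.
 * Returns the number of bytes copied (0 if there is no '=').
 */
static size_t
parse_value(const char *input, char *buf, size_t bufsz)
{
  const char *eq = strchr(input, '=');
  size_t len;

  if (eq == NULL || bufsz == 0)
    return 0;
  len = strlen(eq + 1);
  if (len >= bufsz)
    len = bufsz - 1;
  memcpy(buf, eq + 1, len);
  buf[len] = '\0';
  return len;
}

/*
 * Parse untrusted input in a child process and pass the result back
 * to the parent over a socket pair.
 * Returns the number of bytes stored in out (NUL-terminated), or -1
 * if the child could not be created or did not exit cleanly.
 */
ssize_t
parse_in_child(const char *input, char *out, size_t outsz)
{
  int sv[2], status;
  pid_t pid;
  ssize_t n, total = 0;
  char buf[256];
  size_t len;

  assert(input != NULL && out != NULL && outsz > 0);
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
    return -1;
  if ((pid = fork()) == -1) {
    close(sv[0]);
    close(sv[1]);
    return -1;
  }
  if (pid == 0) {
    /* Child: an OS-specific sandbox would be entered here. */
    close(sv[0]);
    len = parse_value(input, buf, sizeof(buf));
    if (len > 0 && write(sv[1], buf, len) != (ssize_t)len)
      _exit(EXIT_FAILURE);
    close(sv[1]);
    _exit(EXIT_SUCCESS);
  }
  /* Parent: read until the buffer is full or the child closes its end. */
  close(sv[1]);
  while ((size_t)total < outsz - 1 &&
      (n = read(sv[0], out + total, outsz - 1 - total)) > 0)
    total += n;
  out[total] = '\0';
  close(sv[0]);
  if (waitpid(pid, &status, 0) == -1 ||
      !WIFEXITED(status) || WEXITSTATUS(status) != EXIT_SUCCESS)
    return -1;
  return total;
}
```

Because the parent only ever sees what arrives on the socket, a crash or abrupt termination of the child shows up as a failed return value rather than as corrupted state in the parent.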

This method of sandboxing the untrusted child process follows OpenSSH, and requires special handling for each operating system:

systrace(4) (OpenBSD)
This requires the existence of /dev/systrace if running in a chroot(2), which is strongly suggested. If you're using a stock OpenBSD, make sure that the mount-point of /dev/systrace isn't mounted nodev!
sandbox_init(3) (Apple OS X)
This uses the sandboxing profile for pure computation as provided in Mac OS X Leopard and later. This is supplemented by resource limiting via setrlimit(2).
capsicum(4) (FreeBSD)
Uses the capabilities facility on FreeBSD 10 and later. This is supplemented by resource limiting with setrlimit(2).
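
Resource limiting with setrlimit(2), mentioned for two of the sandboxes above, can be sketched in isolation. Which limits kcgi actually sets is not stated here. Zeroing RLIMIT_NOFILE and RLIMIT_FSIZE is an assumption made for this example, and both function names are hypothetical.

```c
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: forbid the calling process from opening new
 * descriptors or growing files.
 * Returns 1 on success, 0 on failure.
 */
int
limit_resources(void)
{
  struct rlimit zero;

  zero.rlim_cur = zero.rlim_max = 0;
  if (setrlimit(RLIMIT_FSIZE, &zero) == -1)
    return 0;
  if (setrlimit(RLIMIT_NOFILE, &zero) == -1)
    return 0;
  return 1;
}

/*
 * Fork a child, apply the limits there, and try to open path.
 * Returns 1 if the open was refused, 0 if it succeeded, and -1 if
 * the child could not be run or the limits could not be set.
 */
int
open_refused_in_child(const char *path)
{
  pid_t pid;
  int status, fd;

  assert(path != NULL);
  if ((pid = fork()) == -1)
    return -1;
  if (pid == 0) {
    if (!limit_resources())
      _exit(2);
    fd = open(path, O_RDONLY);
    _exit(fd == -1 ? 0 : 1);
  }
  if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
    return -1;
  if (WEXITSTATUS(status) == 0)
    return 1;
  return WEXITSTATUS(status) == 1 ? 0 : -1;
}
```

The limits are applied only inside the child, so the parent keeps its ability to open files and write output.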

Since validation occurs within the sandbox, special care must be taken that validation routines don't access the environment (e.g., by opening files, network connections, etc.), as the child will be abruptly killed by the sandbox facility. If required, this kind of validation can take place after the parse validation sequence.

Testing

kcgi is shipped with a fully automated testing framework executed with make regress. Interfacing systems can also make use of this by working with the kcgi_regress(3) function library. This framework acts as a mini-webserver, listening on a local port, translating an HTTP document into a minimal CGI request, and passing the request to a kcgi CGI client. For internal tests, test requests are constructed with libcurl.

At the moment, the automated test framework has only a few tests for basic functionality and sandboxing. The local port it binds to is also fixed, so if you plan on running the regression suite, you may need to change that port.

Another testing framework exists for use with American fuzzy lop. To use it, build the afl target with your compiler of choice, e.g., make clean, then make afl CC=afl-gcc. Then run the afl-fuzz tool on the afl-multipart, afl-plain, and afl-urlencoded binaries using the test cases (and dictionaries, for the first) provided.

Performance

Security comes at a price. By design, kcgi incurs overhead in three ways: first, spawning a child to process the untrusted network data; second, enacting the sandbox framework; and third, passing parsed pairs back to the parent context.

This figure illustrates the cost of running kcgi against a baseline on OpenBSD 5.5 running nginx and slowcgi(8), i386. It shows the empirical cumulative distribution of a statistically significant number of page requests (>1000) as measured by ab(1).

Line (1) shows a static file being served by the web server. The high speed is due to the file (most likely) being cached by the web server and/or kernel. Moving right, line (2) shows a basic CGI request producing no content. The CGI simply exits with an HTTP 200. This reflects the cost of invoking slowcgi(8). Line (3) is a simple kcgi application that only emits an HTTP 200. This instance is neither sandboxed nor does it compress output. The overhead relative to line (2) is due to the additional child being spawned. Line (4) shows the addition of sandboxing without compression, in this case via systrace(4). Lastly, line (5) shows both compression and sandboxing.

In this bar chart, I show the same data as a relative comparison of the distribution means. The similarity between the compressed and non-compressed versions is due to the small amount of data being transmitted in the response body.