Getting Started with CGI in C

Source Code

I'll describe this as if reading a source file from top to bottom, so let's start with the header files. We'll obviously need kcgi.h, preceded by the standard headers it expects for the types and macros it uses: sys/types.h, stdarg.h, stddef.h, and stdint.h (the comments note what each provides). I'll also include kcgihtml.h, the HTML library for kcgi; I'll explain why later.

#include <sys/types.h> /* size_t, ssize_t */
#include <stdarg.h> /* va_list */
#include <stddef.h> /* NULL */
#include <stdint.h> /* int64_t */
#include <kcgi.h>
#include <kcgihtml.h>

Next, I assign each form field we're interested in a numeric identifier. Later, these identifiers let us give each field a name and a validator.

enum key {
  KEY_STRING,
  KEY_INTEGER,
  KEY__MAX
};

The enumeration lets us size an array with KEY__MAX and refer to individual buckets in that array by enumeration value. C numbers enumerators from zero in declaration order, so KEY_STRING is 0 and KEY_INTEGER is 1.
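As a minimal, self-contained sketch of the pattern (the `names` table and `key_name` helper are hypothetical, not part of kcgi), the compiler can even check those values for us, and the sentinel doubles as a bounds check:

```c
#include <stddef.h> /* NULL, size_t */

enum key {
  KEY_STRING,  /* 0 */
  KEY_INTEGER, /* 1 */
  KEY__MAX     /* 2: number of real keys */
};

/* Compile-time checks of what the C standard already guarantees. */
_Static_assert(KEY_STRING == 0, "first enumerator is zero");
_Static_assert(KEY_INTEGER == 1, "enumerators increment by one");

/* Hypothetical table indexed by enum key and sized by the sentinel. */
static const char *const names[KEY__MAX] = {
  "string",  /* KEY_STRING */
  "integer", /* KEY_INTEGER */
};

/* Hypothetical helper: return the form name for a key, or NULL if out of range. */
const char *key_name(enum key k) {
  if ((size_t)k >= KEY__MAX)
    return NULL;
  return names[k];
}
```

If a key is added to the enumeration but not to the table, the table's remaining slot is zero-initialised to NULL rather than overflowing, because the array is sized by KEY__MAX.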

Next, connect the indices with validation functions and names. The validation function is run by khttp_parse(3); the name is the HTML form name of the given element. The built-in validation functions, which we'll use, are described in kvalid_string(3). In this example, kvalid_stringne validates a non-empty, NUL-terminated C string, while kvalid_int validates a signed 64-bit integer.

static const struct kvalid keys[KEY__MAX] = {
  { kvalid_stringne, "string" }, /* KEY_STRING */
  { kvalid_int, "integer" }, /* KEY_INTEGER */
};
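To make the validation step concrete, here is a stand-alone approximation of what the two built-in validators check. These are hypothetical helpers operating on a plain C string, not kcgi's own implementations: kcgi's validators receive a struct kpair and, on success, also fill in its parsed value.

```c
#include <errno.h>  /* errno, ERANGE */
#include <stdint.h> /* int64_t */
#include <stdlib.h> /* strtoll, NULL */

/* Hypothetical analogue of kvalid_stringne: accept any non-empty string. */
int valid_stringne(const char *s) {
  return s != NULL && *s != '\0';
}

/*
 * Hypothetical analogue of kvalid_int: accept a base-10 signed
 * integer with no trailing characters that fits in a long long.
 * On success, store the value in *out and return 1; else return 0.
 */
int valid_int(const char *s, int64_t *out) {
  char *end;
  long long v;

  if (s == NULL || *s == '\0')
    return 0;
  errno = 0;
  v = strtoll(s, &end, 10); /* note: strtoll skips leading whitespace */
  if (errno == ERANGE || end == s || *end != '\0')
    return 0;
  *out = (int64_t)v; /* assumes long long is 64 bits, as on common platforms */
  return 1;
}
```

The strtoll(3) approach is only one way to do this; it is slightly looser than a strict parser because of the whitespace skipping noted in the comment.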

Next, I define a function that acts upon the parsed fields. According to khttp_parse(3), if a field is found and validates, it is assigned into the fieldmap array; if it is found but does not validate, it is assigned into the fieldnmap array. Both arrays are indexed by the field's position in keys. (We could also have walked the raw fields list, but that's for chumps.)
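That bookkeeping can be modelled without kcgi. The sketch below is a simplified, hypothetical stand-in for the sorting khttp_parse(3) performs: each name/value pair is matched against the field table by name, run through that field's validator, and a pointer to it is stored in either fieldmap or fieldnmap at the field's index. The structures are invented for illustration; the real struct kpair and struct kreq carry more information and handle repeated names, whereas this sketch keeps only the last match.

```c
#include <stddef.h> /* NULL, size_t */
#include <string.h> /* strcmp */

/* Hypothetical, simplified stand-in for a parsed name/value pair. */
struct pair {
  const char *key; /* form field name, e.g. "string" */
  const char *val; /* raw value from the request */
};

/* Hypothetical, simplified stand-in for a validator/name entry. */
struct field {
  int (*valid)(const char *); /* validator; NULL means accept (in this sketch) */
  const char *name;           /* form field name */
};

/*
 * Sort each pair into fieldmap (validated) or fieldnmap (failed
 * validation), both indexed by the field's position in fields[].
 * Pairs whose names match no field are ignored.
 */
void sort_pairs(struct pair *pairs, size_t npairs,
    const struct field *fields, size_t nfields,
    struct pair **fieldmap, struct pair **fieldnmap) {
  size_t i, j;

  for (j = 0; j < nfields; j++)
    fieldmap[j] = fieldnmap[j] = NULL;

  for (i = 0; i < npairs; i++)
    for (j = 0; j < nfields; j++) {
      if (strcmp(pairs[i].key, fields[j].name) != 0)
        continue;
      if (fields[j].valid == NULL || fields[j].valid(pairs[i].val))
        fieldmap[j] = &pairs[i];
      else
        fieldnmap[j] = &pairs[i];
      break; /* each pair belongs to at most one field */
    }
}
```

With a request such as string=foo&integer=, the "string" slot of fieldmap would point at the first pair, while an always-failing or empty "integer" value would land in fieldnmap instead.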

In this trivial example, process() emits the string value if it was found, or notes that it was either not provided or failed validation.

static void process(struct kreq *r) {
  struct kpair *p;
  khttp_puts(r, "<p>\n");
  khttp_puts(r, "The string value is ");
  if ((p = r->fieldmap[KEY_STRING]))
    khttp_puts(r, p->parsed.s);
  else if (r->fieldnmap[KEY_STRING])
    khttp_puts(r, "<i>failed parse</i>");
  else 
    khttp_puts(r, "<i>not provided</i>");
  khttp_puts(r, "</p>\n");
}

As written, this routine has a serious flaw: if the KEY_STRING value contains HTML, it is inserted into the output stream verbatim, opening the page to cross-site scripting (XSS). Instead, let's use the kcgihtml(3) library to handle proper encoding and element nesting.

static void process_safe(struct kreq *r) {
  struct kpair *p;
  struct khtmlreq req;
  khtml_open(&req, r, 0);
  khtml_elem(&req, KELEM_P);
  khtml_puts(&req, "The string value is ");
  if ((p = r->fieldmap[KEY_STRING])) {
    khtml_puts(&req, p->parsed.s);
  } else if (r->fieldnmap[KEY_STRING]) {
    khtml_elem(&req, KELEM_I);
    khtml_puts(&req, "failed parse");
  } else {
    khtml_elem(&req, KELEM_I);
    khtml_puts(&req, "not provided");
  }
  khtml_close(&req);
}
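What makes process_safe() safe is that khtml_puts(3) escapes characters that are special in HTML before writing them. The function below is a hypothetical, self-contained illustration of that kind of escaping, not kcgi's implementation; the exact set of characters kcgihtml(3) escapes, and the entities it uses, may differ.

```c
#include <stddef.h> /* size_t */
#include <string.h> /* strlen, memcpy */

/*
 * Hypothetical helper: copy src into dst, replacing the five
 * HTML-special characters with entity references. Returns the
 * length of the escaped string, or (size_t)-1 if dst is too small.
 */
size_t html_escape(char *dst, size_t dstsz, const char *src) {
  size_t len = 0;

  for (; *src != '\0'; src++) {
    char one[2] = { *src, '\0' }; /* unescaped character as a string */
    const char *rep;
    size_t n;

    switch (*src) {
    case '<':  rep = "&lt;";   break;
    case '>':  rep = "&gt;";   break;
    case '&':  rep = "&amp;";  break;
    case '"':  rep = "&quot;"; break;
    case '\'': rep = "&#39;";  break;
    default:   rep = one;      break;
    }
    n = strlen(rep);
    if (len + n >= dstsz) /* leave room for the terminator */
      return (size_t)-1;
    memcpy(dst + len, rep, n);
    len += n;
  }
  if (len >= dstsz) /* only possible when dstsz is 0 */
    return (size_t)-1;
  dst[len] = '\0';
  return len;
}
```

With this kind of escaping, a submitted value of <b>hi</b> is displayed as literal text instead of being interpreted as markup.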

Before processing any fields, I sanitise the HTTP context: the page requested, the MIME type, the HTTP method, and so on.

To begin, I provide an array of page identifiers, indexed just like the field validators and names. This is also passed to khttp_parse(3). It defines the pages accepted by the application: in this case only index, which I'll also make the default page when the script is invoked without a path (i.e., just http://www.foo.com). Note that only the first path component is matched, so index will also accept index/foo.

enum page {
  PAGE_INDEX,
  PAGE__MAX
};
static const char *const pages[PAGE__MAX] = {
  "index", /* PAGE_INDEX */
};
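As a rough model of how the first path component is matched, here is a hypothetical, simplified lookup against a pages array like the one above. It takes the text after the script name, considers only the characters before the first '/' or '.', and searches the table, falling back to the default page when that component is empty. kcgi's real parser also records the suffix (which selects the MIME type) and the remaining sub-path; in this sketch an unknown name returns PAGE__MAX, which is why sanitise() compares against PAGE_INDEX explicitly.

```c
#include <stddef.h> /* size_t */
#include <string.h> /* strcspn, strlen, strncmp */

enum page {
  PAGE_INDEX,
  PAGE__MAX
};

static const char *const pages[PAGE__MAX] = {
  "index", /* PAGE_INDEX */
};

/*
 * Hypothetical, simplified page lookup. "path" is the text after
 * the script name without its leading slash, e.g. "index.html" or
 * "index/foo". Returns the matching page, defpage if the first
 * component is empty, or PAGE__MAX if no page matches.
 */
enum page lookup_page(const char *path, enum page defpage) {
  size_t len = strcspn(path, "/."); /* length of the first component */
  size_t i;

  if (len == 0)
    return defpage;
  for (i = 0; i < PAGE__MAX; i++)
    if (strlen(pages[i]) == len && strncmp(pages[i], path, len) == 0)
      return (enum page)i;
  return PAGE__MAX;
}
```

Comparing lengths before strncmp(3) keeps "indexes" from matching "index" merely because it shares a prefix.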

Now, I validate the page request and HTTP context against these definitions. This function checks the page request (it must be index with no sub-path), the MIME type (it must be HTML, as in index.html), and the HTTP method (it must be GET, as in index.html?string=foo). To keep things simple, the sanitiser returns an HTTP status code (see RFC 2616, since superseded by RFC 7231 and now RFC 9110, for what each code means).

static enum khttp sanitise(const struct kreq *r) {
  if (PAGE_INDEX != r->page)
    return KHTTP_404;
  else if ('\0' != *r->path) /* no index/xxxx */
    return KHTTP_404;
  else if (KMIME_TEXT_HTML != r->mime)
    return KHTTP_404;
  else if (KMETHOD_GET != r->method)
    return KHTTP_405;
  return KHTTP_200;
}
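Because sanitise() reads only four fields, its decision table can be exercised without a web server. The sketch below is hypothetical: struct mockreq and the M* enumerations stand in for struct kreq and kcgi's page, MIME, and method enumerations, keeping only the members sanitise() looks at, and plain integers replace enum khttp.

```c
/* Hypothetical stand-ins for the kcgi types sanitise() reads. */
enum mpage { MPAGE_INDEX, MPAGE__MAX };
enum mmime { MMIME_TEXT_HTML, MMIME_APP_JSON };
enum mmethod { MMETHOD_GET, MMETHOD_POST };

struct mockreq {
  enum mpage page;     /* matched page, or MPAGE__MAX if unknown */
  const char *path;    /* sub-path after the page name */
  enum mmime mime;     /* MIME type implied by the suffix */
  enum mmethod method; /* HTTP request method */
};

/* Same checks, in the same order, as sanitise(); returns a status code. */
int mock_sanitise(const struct mockreq *r) {
  if (MPAGE_INDEX != r->page)
    return 404; /* unknown page */
  else if ('\0' != *r->path)
    return 404; /* index/xxxx */
  else if (MMIME_TEXT_HTML != r->mime)
    return 404; /* e.g. index.json */
  else if (MMETHOD_GET != r->method)
    return 405; /* Method Not Allowed */
  return 200;
}
```

In these terms, GET index.html yields 200, index/foo.html and index.json yield 404, and a POST to index.html yields 405. Strictly, RFC 9110 requires a 405 response to carry an Allow header listing the supported methods (here, GET); the main() routine in this tutorial does not send one.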

Putting it all together: parse the HTTP context, validate it, process it, and then free the resources. Headers are written with khttp_head(3), and the document body is started with khttp_body(3). The HTTP context is freed with khttp_free(3).

int main(void) {
  struct kreq r;
  enum khttp er;

  /* Parse the request; give up if kcgi could not. */
  if (KCGI_OK != khttp_parse(&r, keys, KEY__MAX,
      pages, PAGE__MAX, PAGE_INDEX))
    return 0;

  if (KHTTP_200 != (er = sanitise(&r))) {
    /* Rejected request: send the error status as plain text. */
    khttp_head(&r, kresps[KRESP_STATUS],
      "%s", khttps[er]);
    khttp_head(&r, kresps[KRESP_CONTENT_TYPE],
      "%s", kmimetypes[KMIME_TEXT_PLAIN]);
    khttp_body(&r);
    if (KMIME_TEXT_HTML == r.mime)
      khttp_puts(&r, "Could not service request.");
  } else {
    /* Accepted request: send headers, then the escaped page body. */
    khttp_head(&r, kresps[KRESP_STATUS],
      "%s", khttps[KHTTP_200]);
    khttp_head(&r, kresps[KRESP_CONTENT_TYPE],
      "%s", kmimetypes[r.mime]);
    khttp_body(&r);
    process_safe(&r);
  }

  /* Release the resources allocated by khttp_parse(3). */
  khttp_free(&r);
  return 0;
}

That's it!
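Under the hood, a CGI program answers by writing header lines, a blank line, and then the body to standard output; the web server turns that into an HTTP response. The hypothetical helper below formats such a response into a buffer to show roughly what the khttp_head(3) and khttp_body(3) calls in main() produce for a successful request. It is an illustration only: kcgi may emit additional headers, and its exact formatting may differ.

```c
#include <stddef.h> /* size_t */
#include <stdio.h>  /* snprintf */

/*
 * Hypothetical helper: format a CGI response as header lines,
 * an empty line, then the body. Returns the number of characters
 * that would have been written, as snprintf(3) does.
 */
int format_cgi_response(char *buf, size_t sz, const char *status,
    const char *ctype, const char *body) {
  return snprintf(buf, sz,
      "Status: %s\r\n"       /* CGI status header, e.g. "200 OK" */
      "Content-Type: %s\r\n" /* MIME type of the body */
      "\r\n"                 /* blank line ends the headers */
      "%s",                  /* document body */
      status, ctype, body);
}
```

For a rejected request, the same shape applies with the error status from khttps[er] (for example "404 Not Found") and a text/plain content type.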

Compile and Link

Your source is no good until it's compiled and linked into an executable. In this section I'll mention two strategies: dynamic linking and static linking. Dynamic linking is normal for most applications, but CGI applications are often placed in a file-system jail (a chroot(2)) without access to other libraries, and are thus statically linked. In short, it depends on your environment. Let's call our application tutorial0.cgi and the source file tutorial0.c. I assume kcgi has been installed into /usr/local and is linked against zlib(3). To dynamically link:

% cc -I/usr/local/include -c -o tutorial0.o tutorial0.c
% cc -L/usr/local/lib -o tutorial0.cgi tutorial0.o -lkcgihtml -lkcgi -lz

For static linking, which is usual on OpenBSD because httpd(8) and slowcgi(8) run chrooted into /var/www by default:

% cc -static -L/usr/local/lib -o tutorial0.cgi tutorial0.o -lkcgihtml -lkcgi -lz

Install

Installation steps depend on your operating system, web server, and a thousand other factors. I'll stick with the simplest installation, using OpenBSD's defaults and its default web server, httpd(8). To begin with, configure /etc/httpd.conf so that the server (chrooted to /var/www by default) serves static files from /var/www/htdocs and hands requests under /cgi-bin/ to FastCGI, which runs the programs in /var/www/cgi-bin. If you already have a suitable configuration file in place, skip this step.

server "me.local" {
  listen on * port 80
  root "/htdocs"
  location "/cgi-bin/*" {
    fastcgi
    root "/"
  }
}

Next, we use the rcctl(8) tool to enable and start the httpd(8) web server and the slowcgi(8) wrapper. The latter is needed because httpd(8) speaks only FastCGI, so slowcgi(8) sits in between and runs CGI programs on its behalf. Again, you may not need to do this part. We also make sure to follow the instructions on the kcgi main page regarding OpenBSD sandboxing inside the file-system jail.

% doas rcctl enable httpd
% doas rcctl start httpd
% doas rcctl check httpd
httpd(ok)
% doas rcctl enable slowcgi
% doas rcctl start slowcgi
% doas rcctl check slowcgi
slowcgi(ok)

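Assuming we built the static binary, we can now install it into the CGI directory and we're ready to go! With the configuration above, it should answer at http://me.local/cgi-bin/tutorial0.cgi/index.html?string=foo.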

% doas install -m 0555 tutorial0.cgi /var/www/cgi-bin