Thanks to Ross Richardson's fine work in contributing this tutorial!
To make common cases convenient to handle, kcgi provides functionality for dealing with the CGI meta variable PATH_INFO.
For example, if /cgi-bin/foo is the CGI script, invoking /cgi-bin/foo/bar/baz will pass /bar/baz as additional information.
Many CGI scripts use this functionality for URL normalisation, or for pushing query-string variables into the path.
This tutorial describes an example CGI which implements a news site devoted to some particular topic. The default document shows an index page, and there are sections for particular relevant areas. In each of these, the trailing slash may be included or omitted. I assume that your script is available at /cgi-bin/news.
- /cgi-bin/news, /cgi-bin/news/index: main index
- about the site
- archive of old articles
- archive/index of articles for year yyyy
- archive/index of articles for month mm of year yyyy
- archive/index of articles for date yyyy-mm-dd
- a random article
- articles tagged with "subj"
Assuming a call to khttp_parse(3) returns KCGI_OK, the relevant fields of the struct kreq are:
- fullpath: the value of the CGI meta variable PATH_INFO (which may be the empty string)
- pagename: the substring of PATH_INFO from after the initial '/' to (but excluding) the next '/', or to the end-of-string (or the empty string if no such substring exists)
- page: if pagename is the empty string, the defpage parameter passed to khttp_parse(3) (that is, the index corresponding to the default page); if pagename matches one of the strings in the pages parameter passed to khttp_parse(3), the index of that string; if pagename does not match any of the strings in pages, the pagesz parameter passed to khttp_parse(3)
- path: the middle part of PATH_INFO after stripping pagename/ at the beginning and .suffix at the end
In addition, the field pname contains the value of the CGI meta variable SCRIPT_NAME.
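To make the relationship between these fields concrete, here is a minimal sketch that splits a PATH_INFO string by the rules listed above. It is not kcgi's own code: struct split and split_path_info() are hypothetical names, the buffer sizes are arbitrary, and kcgi's parser may treat edge cases differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the pieces; kcgi keeps these in struct kreq. */
struct split {
	char pagename[64];
	char path[256];
	char suffix[32];
};

/* Split PATH_INFO into page name, remaining path, and suffix. */
static void
split_path_info(const char *path_info, struct split *out)
{
	const char *p, *end, *dot, *slash;

	assert(path_info != NULL && out != NULL);
	out->pagename[0] = out->path[0] = out->suffix[0] = '\0';

	/* Skip the initial '/', if any. */
	p = path_info;
	if (*p == '/')
		p++;
	end = p + strlen(p);

	/* The suffix follows the last '.', provided no '/' comes after it. */
	dot = strrchr(p, '.');
	if (dot != NULL && strchr(dot, '/') == NULL) {
		snprintf(out->suffix, sizeof(out->suffix), "%s", dot + 1);
		end = dot;
	}

	/* The page name runs up to the next '/'; the rest is the path. */
	slash = memchr(p, '/', (size_t)(end - p));
	if (slash == NULL) {
		snprintf(out->pagename, sizeof(out->pagename),
		    "%.*s", (int)(end - p), p);
	} else {
		snprintf(out->pagename, sizeof(out->pagename),
		    "%.*s", (int)(slash - p), p);
		snprintf(out->path, sizeof(out->path),
		    "%.*s", (int)(end - slash - 1), slash + 1);
	}
}
```

With this sketch, /archive/2020/01/02 gives the page name archive and the path 2020/01/02, while /about.html gives the page name about and the suffix html.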
Here we look only at the code snippets not covered by the earlier tutorials. Firstly, we define some values corresponding with the subsections of the site.
Next, we define the path strings corresponding with the enumeration values.
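As a sketch of these two definitions: only enum pg and PG_ARCHIVE are named in this tutorial, so the other enumerators and the exact path strings below are assumptions. The page_lookup() function is a hypothetical stand-in that illustrates how the page field is derived. khttp_parse(3) performs this matching itself, and the case-sensitive comparison here is a simplification.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>
#include <string.h>

/* One value per subsection of the site; PG__MAX counts them. */
enum pg {
	PG_INDEX,
	PG_ABOUT,
	PG_ARCHIVE,
	PG_RANDOM,
	PG_TAG,
	PG__MAX
};

/* Path strings, in the same order as enum pg (assumed names). */
static const char *const pages[] = {
	"index",	/* PG_INDEX */
	"about",	/* PG_ABOUT */
	"archive",	/* PG_ARCHIVE */
	"random",	/* PG_RANDOM */
	"tag",		/* PG_TAG */
};

/* Catch the two definitions drifting apart at compile time. */
static_assert(sizeof(pages) / sizeof(pages[0]) == PG__MAX,
    "pages[] must have one entry per enum pg value");

/*
 * Illustration only: map a page name to its index, returning defpage
 * for the empty string and PG__MAX when nothing matches.
 */
static size_t
page_lookup(const char *pagename, size_t defpage)
{
	size_t i;

	if (pagename[0] == '\0')
		return defpage;
	for (i = 0; i < PG__MAX; i++)
		if (strcmp(pagename, pages[i]) == 0)
			return i;
	return PG__MAX;
}
```

Keeping PG__MAX as the last enumerator means it can double as the pagesz argument, so an unmatched page name is easy to detect.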
We then define a constant bitmap corresponding with those enum pg values for which no extra path information should be present in the HTTP request. This will be used for sanity-checking the request.
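One way such a bitmap and its check might look is sketched below. The name pg_no_extra_permitted, the helper extra_path_ok(), and the choice of which pages forbid extra path information are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same enumeration as described in the text; repeated so this compiles alone. */
enum pg { PG_INDEX, PG_ABOUT, PG_ARCHIVE, PG_RANDOM, PG_TAG, PG__MAX };

/* One bit per page that must NOT be followed by extra path components. */
static const unsigned int pg_no_extra_permitted =
    (1u << PG_INDEX) | (1u << PG_ABOUT) | (1u << PG_RANDOM);

/*
 * Return non-zero if `path` (the part of PATH_INFO after the page name)
 * is acceptable for `page`. Pages in the bitmap require an empty path.
 */
static int
extra_path_ok(enum pg page, const char *path)
{
	assert(page < PG__MAX);
	if (pg_no_extra_permitted & (1u << page))
		return path[0] == '\0';
	return 1;
}
```

A request handler could call extra_path_ok() with the page and path fields of struct kreq and answer with a 404 when it returns zero.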
Next, we define a type for dates, a constant for the earliest valid year, and functions for parsing a string specifying a date. (We use year zero to indicate an invalid specification, and month/day zero to indicate that a month/day value was not specified.)
Editor's note: remember that strptime(3) and friends may not be available within a file-system sandbox due to time-zone access, so we need to find another way.
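A hand-rolled parser avoids strptime(3) altogether. The sketch below is one possible shape, not the tutorial's original listing. The name str_to_adate() comes from this tutorial, but its signature here is a guess; struct adate, archive_first_yr (and its value), parse_field(), and days_in_month() are hypothetical. It expects fixed-width fields (yyyy, mm, dd) and rejects a trailing separator, so a caller would need to strip a trailing slash first.

```c
#include <assert.h>
#include <stddef.h>

/* Year 0 marks an invalid date; month or day 0 means "not specified". */
struct adate {
	unsigned int year;
	unsigned int month;
	unsigned int day;
};

/* Hypothetical earliest year for which articles exist. */
static const unsigned int archive_first_yr = 1995;

static unsigned int
days_in_month(unsigned int y, unsigned int m)
{
	static const unsigned int d[] =
	    { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
	int leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;

	assert(m >= 1 && m <= 12);
	return (m == 2 && leap) ? 29 : d[m - 1];
}

/* Read exactly `ndigits` decimal digits, advancing *sp on success. */
static int
parse_field(const char **sp, size_t ndigits, unsigned int *out)
{
	unsigned int v = 0;
	size_t i;

	for (i = 0; i < ndigits; i++) {
		if ((*sp)[i] < '0' || (*sp)[i] > '9')
			return 0;	/* also stops at the terminating NUL */
		v = v * 10 + (unsigned int)((*sp)[i] - '0');
	}
	*sp += ndigits;
	*out = v;
	return 1;
}

/* Parse "yyyy", "yyyy<sep>mm" or "yyyy<sep>mm<sep>dd". */
static struct adate
str_to_adate(const char *s, char sep)
{
	struct adate d = { 0, 0, 0 };
	const struct adate bad = { 0, 0, 0 };

	if (!parse_field(&s, 4, &d.year) || d.year < archive_first_yr)
		return bad;
	if (*s == '\0')
		return d;		/* year only */
	if (*s++ != sep || !parse_field(&s, 2, &d.month) ||
	    d.month < 1 || d.month > 12)
		return bad;
	if (*s == '\0')
		return d;		/* year and month */
	if (*s++ != sep || !parse_field(&s, 2, &d.day) ||
	    d.day < 1 || d.day > days_in_month(d.year, d.month))
		return bad;
	return *s == '\0' ? d : bad;	/* reject trailing garbage */
}
```

Taking the separator as a parameter lets the same function handle '/' for path components and '-' for other sources of date strings.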
Now, we consider the basic handling of the request.
Suppose we now decide that we wish to fall back to looking for a date specification (with '-' separators rather than '/') in the query string if none is specified in the path. This is as simple as adding the required definition…
…and adding a validator function…
(Note that the same date parsing function, str_to_adate(), is used but in this case it is wrapped in a validator function and thus executes in the sandboxed environment.)
…and, in main(), modifying the call to khttp_parse(3)…
…and handling of the PG_ARCHIVE case…
Whilst some specifications are naturally suited to the use of path information (for example, dates, file system hierarchies, and timezones), others are a less natural fit. Suppose, in our example, that we want to be able to specify a date and a tag at the same time. This could be achieved by extending the behaviour of the archive or tag "page", but does not fit comfortably with either. In general, use of query string keys is preferred over pages because the former:
- involves parsing/validation in a sandboxed environment
- allows for greater flexibility
Editor's note: Ross makes a good case for putting some sort of handling facility for URLs into the protected child process. For example, we could pass a string into khttp_parsex(3) that would define a template for splitting the path into arguments.
/@@0@@/@@1@@/@@2@@ might consider a pathname matching
/foo/bar/baz with components being validated as