Using Pages
Ross Richardson
Thanks to Ross Richardson's fine work in contributing this tutorial!
To make common cases convenient to handle, kcgi provides functionality for dealing with the CGI meta variable PATH_INFO.
For example, if /cgi-bin/foo is the CGI script, invoking /cgi-bin/foo/bar/baz will pass /bar/baz as additional information.
Many CGI scripts use this functionality for URL normalisation, or for pushing query-string variables into the path.
This tutorial describes an example CGI which implements a news site devoted to some particular topic. The default document shows an index page, and there are sections for particular relevant areas. In each of these, the trailing slash may be included or omitted. I assume that your script is available at /cgi-bin/news.
- /cgi-bin/news, /cgi-bin/news/index: main index
- /cgi-bin/news/about/: about the site
- /cgi-bin/news/archive/: archive of old articles
- /cgi-bin/news/archive/yyyy: archive/index of articles for year yyyy
- /cgi-bin/news/archive/yyyy/mm: archive/index of articles for month mm of year yyyy
- /cgi-bin/news/archive/yyyy/mm/dd: archive/index of articles for date yyyy-mm-dd
- /cgi-bin/news/random: a random article
- /cgi-bin/news/tag/subj: articles tagged with "subj"
Basic Handling
Assuming a call to khttp_parse(3) returns KCGI_OK, the relevant fields of the struct kreq are:
- fullpath: the value of the CGI meta variable PATH_INFO (which may be the empty string)
- pagename: the substring of PATH_INFO from after the initial '/' to (but excluding) the next '/', or to the end-of-string (or the empty string if no such substring exists)
- page:
  - if pagename is the empty string, the defpage parameter passed to khttp_parse(3) (that is, the index corresponding to the default page)
  - if pagename matches one of the strings in the pages parameter passed to khttp_parse(3), the index of that string
  - if pagename does not match any of the strings in pages, the pagesz parameter passed to khttp_parse(3)
- path: the middle part of PATH_INFO after stripping pagename/ at the beginning and .suffix at the end
In addition, the field pname contains the value of the CGI meta variable SCRIPT_NAME.
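The description of pagename above can be illustrated with a small stand-alone sketch. It is not kcgi's implementation: first_component() is a hypothetical helper written only for this illustration, it follows the wording of the list above, and it ignores the suffix handling that applies to path.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of kcgi): copy the first component of
 * "path_info" (the text after the initial '/' and before the next '/')
 * into "buf", which must hold at least one byte.
 * Returns a pointer to the text following that component's '/', or to
 * the terminating NUL if nothing follows.
 */
static const char *
first_component(const char *path_info, char *buf, size_t bufsz)
{
	const char *start, *end;
	size_t len;

	buf[0] = '\0';
	if (path_info[0] != '/')
		return path_info + strlen(path_info); /* no component */

	start = path_info + 1;
	end = strchr(start, '/');
	len = (end == NULL) ? strlen(start) : (size_t)(end - start);
	if (len >= bufsz)
		len = bufsz - 1; /* truncate rather than overflow */
	memcpy(buf, start, len);
	buf[len] = '\0';

	return (end == NULL) ? start + strlen(start) : end + 1;
}
```

Given "/archive/2019/02", this fills buf with "archive" and returns a pointer to "2019/02"; given the empty string, both results are empty, which is the case where kcgi falls back to the default page.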
Source Code
Here we look only at the code snippets not covered by the earlier tutorials. Firstly, we define some values corresponding with the subsections of the site.
Next, we define the path strings corresponding with the enumeration values.
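A minimal sketch of what these two definitions might look like follows. Only enum pg and PG_ARCHIVE are named in this tutorial; the other enumerator names, PG__MAX, and the array name pages are assumptions chosen to match the site layout listed earlier.

```c
/*
 * One value per subsection of the site.  PG__MAX counts the pages and
 * is not itself a page.  Names other than PG_ARCHIVE are assumed.
 */
enum pg {
	PG_INDEX,
	PG_ABOUT,
	PG_ARCHIVE,
	PG_RANDOM,
	PG_TAG,
	PG__MAX
};

/* Path strings, indexed by enum pg. */
static const char *const pages[PG__MAX] = {
	"index",   /* PG_INDEX */
	"about",   /* PG_ABOUT */
	"archive", /* PG_ARCHIVE */
	"random",  /* PG_RANDOM */
	"tag"      /* PG_TAG */
};
```

With definitions of this shape, pages, PG__MAX and PG_INDEX would serve as the pages, pagesz and defpage parameters of khttp_parse(3), so that an unmatched page name yields PG__MAX in the page field.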
We then define a constant bitmap corresponding with those enum pg values for which no extra path information should be present in the HTTP request. This will be used for sanity-checking the request.
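One way such a bitmap and its check could be written is sketched here. The names pg_no_extra and pg_path_ok() are hypothetical, the enumeration is repeated (with assumed names) only so that the fragment compiles on its own, and the choice of index, about and random as the pages that take no extra path is inferred from the site layout above.

```c
#include <stddef.h>

/* Repeated so this fragment stands alone; names are assumed. */
enum pg { PG_INDEX, PG_ABOUT, PG_ARCHIVE, PG_RANDOM, PG_TAG, PG__MAX };

/* Bit n is set when page n must not be followed by extra path data. */
static const unsigned int pg_no_extra =
    (1U << PG_INDEX) | (1U << PG_ABOUT) | (1U << PG_RANDOM);

/*
 * Return 1 if the request looks sane: the page is known and, for the
 * pages in the bitmap, the remaining path is empty.  Return 0 if not.
 * "page" and "path" stand for the struct kreq fields of those names.
 */
static int
pg_path_ok(size_t page, const char *path)
{
	if (page >= PG__MAX)
		return 0; /* page name matched nothing */
	if ((pg_no_extra & (1U << page)) && path[0] != '\0')
		return 0; /* for example, /about/unexpected */
	return 1;
}
```

A request failing this check would typically be answered with a 404 before any page-specific handling runs.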
Next, we define a type for dates, a constant for the earliest valid year, and functions for parsing a string specifying a date. (We use year zero to indicate an invalid specification, and month/day zero to indicate that a month/day value was not specified.)
Editor's note: remember that strptime(3) and friends may not be available within a file-system sandbox due to time-zone access, so we need to find another way.
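The following is a sketch of date parsing done by hand, so that it needs neither strptime(3) nor time-zone data. The function name str_to_adate() comes from this tutorial, but its signature here, struct adate, the helper functions, and the value of ADATE_YEAR_MIN are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define ADATE_YEAR_MIN 1990 /* assumed earliest valid year */

struct adate {
	unsigned int year;  /* 0: invalid specification */
	unsigned int month; /* 0: not specified */
	unsigned int day;   /* 0: not specified */
};

/* Parse exactly "n" decimal digits at "*s", advancing "*s" past them. */
static int
parse_digits(const char **s, size_t n, unsigned int *out)
{
	unsigned int v = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((*s)[i] < '0' || (*s)[i] > '9')
			return 0; /* also stops at the terminating NUL */
		v = v * 10 + (unsigned int)((*s)[i] - '0');
	}
	*s += n;
	*out = v;
	return 1;
}

/* Days in month "m" (1 to 12) of Gregorian year "y". */
static unsigned int
days_in_month(unsigned int y, unsigned int m)
{
	static const unsigned int days[] =
	    { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
	int leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;

	return (m == 2 && leap) ? 29 : days[m - 1];
}

/*
 * Parse "yyyy", "yyyy<sep>mm" or "yyyy<sep>mm<sep>dd" into "d".
 * On any error, d->year is left as zero.
 */
static void
str_to_adate(const char *s, char sep, struct adate *d)
{
	struct adate r = { 0, 0, 0 };

	memset(d, 0, sizeof(*d));
	if (!parse_digits(&s, 4, &r.year) || r.year < ADATE_YEAR_MIN)
		return;
	if (*s == sep) {
		s++;
		if (!parse_digits(&s, 2, &r.month) ||
		    r.month < 1 || r.month > 12)
			return;
		if (*s == sep) {
			s++;
			if (!parse_digits(&s, 2, &r.day) || r.day < 1 ||
			    r.day > days_in_month(r.year, r.month))
				return;
		}
	}
	if (*s != '\0')
		return; /* trailing characters */
	*d = r;
}
```

Taking the separator as a parameter lets the same function serve both '/' in the path and '-' in a query string. A trailing separator is rejected in this sketch, so a caller wishing to accept a path such as 2019/ would strip it first.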
Now, we consider the basic handling of the request.
Suppose we now decide that we wish to fall back to looking for a date specification (with '-' separators rather than '/') in the query string if none is specified in the path. This is as simple as adding the required definition…
…and adding a validator function…
(Note that the same date parsing function, str_to_adate(), is used but in this case it is wrapped in a validator function and thus executes in the sandboxed environment.)
…and, in main(), modifying the call to khttp_parse(3)…
…and handling of the PG_ARCHIVE case…
Whilst some specifications are naturally suited to the use of path information (for example, dates, file system hierarchies, and timezones), others are a less natural fit. Suppose, in our example, that we want to be able to specify a date and a tag at the same time. This could be achieved by extending the behaviour of the archive or tag "page", but does not fit comfortably with either. In general, use of query string keys is preferred over pages because the former:
- involve parsing/validation in a sandboxed environment
- allow for greater flexibility
Editor's note: Ross makes a good case for putting some sort of handling facility for URLs into the protected child process. For example, we could pass a string into khttp_parsex(3) that would define a template for splitting the path into arguments. A template such as /@@0@@/@@1@@/@@2@@ might match a pathname like /foo/bar/baz, with the components being validated as query arguments.