SBLG(1)

SYNOPSIS

sblg [-acjlLrV] [-C file] [-o file] [-s sort] [-t template] file ...

DESCRIPTION

The sblg utility merges XML articles and templates in a number of ways.

Standalone mode (-c) merges a single article's content and metadata into a template. For example, "sblg -o- -c foo.xml" merges foo.xml into the template article-template.xml.
Blog mode (the default) merges multiple articles' content and metadata into a template. For example, "sblg -o- bar.xml baz.xml" merges bar.xml and baz.xml into the template blog-template.xml.
Combined mode (-C) links multiple articles' content and metadata in standalone style. For example, "sblg -o- -C bar.xml baz.xml" will show content for only bar.xml, but metadata for both inputs. The similar -L flag runs the process for each input file without reparsing.
Atom mode (-a) merges multiple articles into an Atom feed template.
JSON mode (-j) merges all articles into a JSON object.

By default, sblg operates in blog mode with template blog-template.xml. Its arguments are as follows:

-a: Creates an Atom feed from its input files.
-c: Create standalone articles instead of merging articles together.
-l: Instead of emitting any output files, simply process the input and report a table of tags. This table consists of the input file name, a tab, then the tag. (Also known as article-major order.) The tag has escaped white-space printed as unescaped. You can also use -r to have tag-major order and -j for JSON output. Specify -l twice to show matches (tags for article-major, articles for tag-major) all on one tab-separated line, instead of one per line.
-r: Print the -l tag listing in “tag-major” order wherein the first column is the tag and the second column is the article. If the -j flag is specified, this is JSON formatted.
-j: JSON instead of XML output mode. This behaves as in blog mode, but outputs JSON instead of XML. If -l is specified, the tag listing will be displayed in JSON instead. See JSON Schema for details.
-C file: Like -c, but creating a blog from the article in file with the remaining files being articles used for navigation.
-L: Like -C, but acting on all input files, translating the input to output files such as in -c without -o. If there are multiple articles in an output file, the output is recreated for each (so only the last will remain). So running with “article0.xml article1.xml” will produce “article0.html article1.html” as if -C were separately specified for both. This avoids needing to parse all inputs for each input.
-o file: Output file. If unspecified, standalone articles have .html appended to the input file name, unless the input file extension is .xml, in which case the .xml is replaced by .html. If multiple input files are specified, -o is ignored. If unspecified for the blog, blog.html is used by default. If unspecified for the Atom feed or JSON, atom.xml or blog.json, respectively, is used by default. Use -o - for standard output.
-s sort: Change how articles are sorted before being written into navigation or article entries. The default is date, which sorts newest to oldest by date. You can also specify filename, which sorts in increasing A–Z case-sensitive order of the source filename; cmdline for the command-line order; ititle for the case-insensitive document title; or title for the case-sensitive document title. Each sort may be prefixed with "r" (e.g., rcmdline) to reverse the sort.
-t template: Template for all modes. If unspecified, defaults to article-template.xml for -c, atom-template.xml for -a, and blog-template.xml otherwise.
-V: Emits the version as sblg-xx.yy.zz and exits.
file ...: Input files. In standalone mode with -c, input XML files are merged with a template into an output file. Otherwise, multiple input files are merged into a single blog.
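
The default output names described under -o can be sketched as follows. This is a minimal illustration in Python, not sblg's own code; the function name and the mode labels ("standalone", "blog", "atom", "json") are invented for the example.

```python
def default_output(mode, input_name=None):
    """Return the default output name used when -o is not given.

    mode is a label invented for this sketch: "standalone", "blog",
    "atom", or "json".
    """
    if mode == "standalone":
        # A trailing .xml is replaced by .html; otherwise .html is appended.
        if input_name.endswith(".xml"):
            return input_name[: -len(".xml")] + ".html"
        return input_name + ".html"
    return {"blog": "blog.html", "atom": "atom.xml", "json": "blog.json"}[mode]


print(default_output("standalone", "article1.xml"))  # article1.html
print(default_output("standalone", "notes.txt"))     # notes.txt.html
print(default_output("atom"))                        # atom.xml
```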

All input must be well-formed XML. Element names and attributes are case-sensitive.
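
Input can be checked for well-formedness before it is handed to sblg. The sketch below uses Python's standard XML parser as a stand-in; it is an assumption that this parser accepts and rejects the same documents as sblg does, so treat it as a rough pre-flight check only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Properly nested and closed.
print(is_well_formed('<article data-sblg-article="1"><p>ok</p></article>'))
# The <p> element is never closed.
print(is_well_formed('<article><p>unclosed</article>'))
# Element names are case-sensitive, so these tags do not match.
print(is_well_formed('<Article></article>'))
```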

Article Input

Article input files consist of the following within the document:

<article data-sblg-article="1">
  <header>
    <h1>Article Name</h1>
    <address>Author Name</address>
    <time datetime="2013-06-29">29 June, 2013</time>
  </header>
  <aside>
    This is used as the feed <b>abstract</b>.
  </aside>
  <p>
    Some text in the <b>content</b>.
    <img src="foo.jpg" alt="An image for the feed" />
  </p>
</article>

All content outside of the element with the data-sblg-article="1" attribute, usually an <article>, is discarded. Then the article is scanned for the following:

the article title (both as text data only and inclusive of markup) is extracted from the first <hn> (header 1–4);
the article publication date is extracted from the datetime attribute of the first <time> (which must be a date, YYYY-MM-DD, or time, YYYY-MM-DDTHH:MM:SSZ) interpreted in UTC;
the author (both as text data only and inclusive of markup) from the first <address>;
the first <aside> is used for the feed abstract; and
the first <img> is associated as the article's image.

These are all set once: subsequent occurrences do not override a prior setting. See data-sblg-aside, data-sblg-author, data-sblg-datetime, data-sblg-img, and data-sblg-title for explicitly setting or overriding these values.

If unspecified, the default article title text (and mark-up) is "Untitled article", the default author text (and mark-up) is "Unknown author", the publication time is set to the document's file-system creation time, the abstract is left empty, and the image is empty.
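
The scanning rules above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not sblg's parser: it collects only the text-only forms, takes the image from the src attribute of the first <img> (an assumption), and leaves out the file-system time fallback. The names scan and parse_when are invented for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SAMPLE = """<article data-sblg-article="1">
  <header>
    <h1>Article Name</h1>
    <address>Author Name</address>
    <time datetime="2013-06-29">29 June, 2013</time>
  </header>
  <aside>This is used as the feed <b>abstract</b>.</aside>
  <p>Some text. <img src="foo.jpg" alt="An image" /></p>
</article>"""


def parse_when(value):
    """Parse YYYY-MM-DD or YYYY-MM-DDTHH:MM:SSZ as UTC."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            pass
    raise ValueError("unsupported datetime: " + value)


def scan(article):
    """Collect text-only metadata; each field is set by its first match."""
    meta = {}
    for el in article.iter():  # document order
        text = "".join(el.itertext()).strip()
        if el.tag in ("h1", "h2", "h3", "h4"):
            meta.setdefault("title", text)
        elif el.tag == "address":
            meta.setdefault("author", text)
        elif el.tag == "time" and "datetime" in el.attrib:
            meta.setdefault("date", parse_when(el.attrib["datetime"]))
        elif el.tag == "aside":
            meta.setdefault("aside", text)
        elif el.tag == "img":
            meta.setdefault("img", el.attrib.get("src", ""))
    # Defaults from the text above; the date fallback is omitted here.
    meta.setdefault("title", "Untitled article")
    meta.setdefault("author", "Unknown author")
    meta.setdefault("aside", "")
    meta.setdefault("img", "")
    return meta


meta = scan(ET.fromstring(SAMPLE))
print(meta["title"], "|", meta["author"], "|", meta["date"].date())
```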

There are a number of special attributes that are recognised in the input file.

data-sblg-aside=string: Sets the aside material as otherwise would be set from the first <aside> element. It overrides the previously set aside. The alternative data-sblg-const-aside only sets the aside if it has not yet been set.
data-sblg-author=string: Sets the author as otherwise would be set from the first <address> element. It overrides the previously set author. The alternative data-sblg-const-author only sets the author if it has not yet been set.
data-sblg-datetime=datetime: Overrides the first <time> element. This must be YYYY-MM-DD or YYYY-MM-DDTHH:MM:SSZ. It overrides the previously set date. The alternative data-sblg-const-datetime only sets the date if it has not yet been set.
data-sblg-img=url: Set the image associated with the article. It overrides any previously set image. The alternative data-sblg-const-img only sets the image if it has not yet been set.
data-sblg-lang=string: May only be set on the <article> and specifies one or more space-separated languages for the document. You can escape spaces with a backslash (“\”) if you have spaces in the language name, e.g., “foo\ bar”. These languages are removed in the “stripping” operations for the Tag Symbols.
data-sblg-set-xxx=string: This allows arbitrary values to be attached to the article. For example, specifying data-sblg-set-foo="bar" sets the foo keyword to bar. If specified multiple times for the same key, only the last value is used. These may be retrieved with ${sblg-get} or queried with ${sblg-has} of the Tag Symbols.
data-sblg-sort=first|last: May only be set on the <article> element and overrides the article's position relative to other articles. This can be either first or last. If multiple articles have the same sort override, they are ordered in the natural way.
data-sblg-source=file: Set the source filename associated with the article. It overrides the implicit value set from the actual file.
data-sblg-tags=string: This attribute may be specified on any element within the article and consists of space-separated tag names. You can escape spaces with a backslash (“\”) if you have spaces in the tag name, e.g., “foo\ bar”. These tags are extracted for navigation tag operation. It may not contain any tabs.
data-sblg-title=string: Sets the title as otherwise would be set in a <hN> element. It overrides the previously set title. The alternative data-sblg-const-title only sets the title if it has not yet been set.
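
The backslash escaping used by data-sblg-tags and data-sblg-lang can be illustrated with a small splitter. This is a hedged sketch of the rule as described above, not sblg's tokeniser; the function name split_tags is invented, and the treatment of a trailing backslash (dropped) is an assumption.

```python
def split_tags(value):
    """Split on unescaped white-space; a backslash keeps the next character."""
    tags, current, escaped = [], "", False
    for ch in value:
        if escaped:
            current += ch      # escaped character is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True     # remember the escape, drop the backslash
        elif ch.isspace():
            if current:
                tags.append(current)
            current = ""
        else:
            current += ch
    if current:
        tags.append(current)
    return tags


print(split_tags(r"foo\ bar baz"))  # ['foo bar', 'baz']
```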

Standalone Template

The standalone template file replaces the first element with the data-sblg-article="1" attribute, usually an <article>, with the article contents.

<body>
  <header>This consists of a single blog entry.</header>
  <article>This is kept.</article>
  <article data-sblg-article="1">This is removed.</article>
  <footer>Something.</footer>
</body>

Article templates may contain the following attributes:

data-sblg-article=boolean: If set to true, the contents are replaced with the input article. This only happens once: subsequent elements are ignored.
data-sblg-ign-once=boolean: If an element with the data-sblg-article="1" attribute has this set to true, the element is not processed as an article and the data-sblg-ign-once attribute is removed.

See Tag Symbols for a list of symbols that will be replaced if found in attribute value or textual contexts. These may occur anywhere in the template document.

Blog Template

The blog template replaces elements with the data-sblg-article="1" attribute, usually <article>, with ordered (by default, newest to oldest) article contents. If there aren't enough articles, the element is removed.
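
The ordering applied before articles are written into the template can be sketched as below, using the sort names accepted by -s and the "r" prefix. This is an illustration only: the article records and their field names are hypothetical, the date sort assumes the newest-to-oldest default described here, and the handling of ties is not taken from sblg.

```python
from datetime import date

# Hypothetical article records; the field names are invented for this sketch.
ARTICLES = [
    {"file": "b.xml", "title": "apple", "date": date(2013, 6, 29)},
    {"file": "a.xml", "title": "Banana", "date": date(2014, 1, 5)},
    {"file": "c.xml", "title": "Cherry", "date": date(2012, 3, 1)},
]


def sort_articles(articles, sort="date"):
    """Order articles by one of the -s names, honouring the "r" prefix."""
    keys = {
        "date": (lambda a: a["date"], True),       # newest first
        "filename": (lambda a: a["file"], False),  # case-sensitive A-Z
        "title": (lambda a: a["title"], False),    # case-sensitive
        "ititle": (lambda a: a["title"].lower(), False),
    }
    name, flip = sort, False
    if name not in keys and name != "cmdline" and name.startswith("r"):
        name, flip = name[1:], True
    if name == "cmdline":
        ordered = list(articles)  # keep command-line order
    else:
        key, descending = keys[name]
        ordered = sorted(articles, key=key, reverse=descending)
    return ordered[::-1] if flip else ordered


print([a["file"] for a in sort_articles(ARTICLES)])
print([a["file"] for a in sort_articles(ARTICLES, "rcmdline")])
print([a["file"] for a in sort_articles(ARTICLES, "ititle")])
```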

Elements with a data-sblg-nav="1" attribute, usually <nav>, are replaced by the same list of articles within an unordered list.

If an element has both attributes, only the first is recognised.

Usually, the article elements are used for displaying full articles, while the navigation elements are used for displaying navigation to articles, such as just their titles, dates, and links.

<body>
  <header>This consists of two blog entries.</header>
  <nav data-sblg-nav="1" />
  <article data-sblg-article="1" />
  <article data-sblg-article="1" />
  <footer>Something.</footer>
</body>

Article templates may contain several attributes.

data-sblg-article=boolean: If set to true, the contents (including the element itself) are replaced with the input article.
data-sblg-articletag=string: If an element with the data-sblg-article="1" attribute contains this, limit displayed articles to those matching the space-separated tags or ${sblg-get|xxx} when in -L or -C mode. This scans for tags from the current article in the list of articles.
data-sblg-ign-once=boolean: If an element with the data-sblg-article="1" attribute has this set to true, the element is not processed as an article and the data-sblg-ign-once attribute is removed.
data-sblg-permlink=boolean: If an element with the data-sblg-article="1" attribute has this set to true, a permanent link to the article's input filename is emitted within a <div data-sblg-permlink="1"> element after the element with the data-sblg-article="1" attribute.

The navigation element may contain several attributes.

data-sblg-navcontent=boolean: Deprecated alias for respective content and element styles list-keep and keep if true, list-summarise and keep if false.
data-sblg-navstyle-content=style: Style for formatting articles into the content of the navigation element. May be keep, to output the content per-article and perform Tag Symbols substitution; summarise or summarize, to discard content and output the article time followed by a link to the article; list-keep, same as keep except surrounding each article with <li> and all articles with <ul>; or list-summarise, or list-summarize, same as summarise except surrounding each article with <li> and all articles with <ul>. If not given or unknown, defaults to list-summarise.
data-sblg-navstyle-element=style: Style for the navigation element. May be keep, to output the element as-is once around all articles; keep-strip, to output the element without attributes once around all articles; repeat-strip, to output the element without attributes around each article (if the content styles are list-keep or list-summarise, the element is output within the li); or discard to suppress output. If not given or unknown, defaults to keep.
data-sblg-navsort=sort: Overrides the global sort order given with -s. Uses the same names. If the sort name is not recognised, the attribute is silently ignored and the global sort order is used.
data-sblg-navstart=number: The position of the first article shown as a navigation entry, with earlier articles skipped. If tags are given, only articles matching those tags are counted. Positions start at one (a value of zero is the same as a value of one).
data-sblg-navsz=number: If the <nav> element contains this attribute with a positive integer, it is used to limit the number of navigation entries.
data-sblg-navtag=string: Only articles with matching tags are shown. You can specify multiple space-separated tags, for instance, data-sblg-navtag="foo bar" will search for foo or bar. Tags to be matched against are extracted from the space-separated data-sblg-tags attribute of each article's topmost element. Escape spaces with a backslash (“\”) if you have spaces in the tag name, e.g., “foo\ bar”. Use ${sblg-get|xxx} or (for multi-word values) ${sblg-get-escaped|xxx} when in -C or -L mode to use the current article's set data as part of a string, e.g., location-${sblg-get|location}.
data-sblg-navxml=boolean: Deprecated alias for respective content and element styles keep and discard if true, list-summarise and keep if false.
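
How data-sblg-navtag, data-sblg-navstart, and data-sblg-navsz might combine can be sketched as follows. This is one reading of the attribute descriptions above, not sblg's implementation: it assumes tags are filtered first, that data-sblg-navstart is a one-based position among the matching articles, and that the size limit applies last. The function name and the (name, tags) pairs are invented for the example.

```python
def nav_entries(articles, navtag=None, navstart=1, navsz=0):
    """Pick navigation entries from an already-sorted article list.

    articles is a list of (name, tags) pairs; navtag is a list of tags,
    any one of which may match; navsz of zero means no limit.
    """
    if navtag:
        wanted = set(navtag)
        articles = [a for a in articles if wanted & set(a[1])]
    # Positions start at one, and zero is treated the same as one.
    start = max(navstart, 1) - 1
    articles = articles[start:]
    if navsz > 0:
        articles = articles[:navsz]
    return [name for name, _ in articles]


ARTICLES = [
    ("a.xml", ["foo"]),
    ("b.xml", ["bar", "baz"]),
    ("c.xml", ["baz"]),
    ("d.xml", ["foo", "bar"]),
]
print(nav_entries(ARTICLES, navtag=["foo", "bar"]))
print(nav_entries(ARTICLES, navtag=["foo", "bar"], navstart=2, navsz=1))
```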

Combined Template

This is identical to the Blog Template except that a single article is noted with -C, and this is the only article displayed in the article stub. Furthermore, like in standalone mode, Tag Symbols may be used anywhere in the document template and refer to the current article unless within a navigation element, in which case the symbol resolves to the currently-printed article. In the given example,

<body>
  <header>This consists of two blog entries.</header>
  <nav data-sblg-nav="1" />
  <article data-sblg-article="1" />
  <article data-sblg-article="1" />
  <footer>Something.</footer>
</body>

the navigation would be populated by all articles, but only the first article stub would be filled in with the specified article. The second would be removed.

This follows the usual rules of data-sblg-articletag, so if the article you specify with -C doesn't have the correct tag, it won't inline the article.

Atom Template

The Atom template file must be a well-formed XML file where each <entry> element with a Boolean data-sblg-entry attribute is replaced by ordered (newest to oldest) article information. If there aren't enough articles, the element is removed. The template may contain pre-existing entries.

The following is a minimal template; anything less will not conform to the Atom specification.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link href="http://example.org" />
  <title>A Title Here</title>
  <updated />
  <id />
  <entry data-sblg-entry="1" data-sblg-forall="1" />
</feed>

The recognised elements are as follows. Un-recognised elements are printed verbatim.

<entry data-sblg-entry="1">: Filled-in article entry. If the attribute is not specified, the entry is retained verbatim. Otherwise it is filled in with an article's information.
<id>: If this is empty, it is filled in with the URL in <link [rel="alternate"]>, which must exist. Otherwise, the value is copied and used for subsequent feed entries.
<link [rel="alternate"]>: Unless an <id> is provided, the href attribute must be a full URL, e.g., <link href="https://kristaps.bsd.lv/">. Otherwise, it may be a relative path. This element must be first.
<updated>: This is filled in with the most recent article. Its contents are discarded.

There are a number of special attributes that may be given to the above elements.

data-sblg-altlink=boolean: If an <entry data-sblg-entry="1"> element contains this set to true, the alternate <link> is printed.
data-sblg-altlink-fmt=string: If both data-sblg-entry and data-sblg-altlink are true for an <entry>, the value is used as the link address. Accepts Tag Symbols, most commonly being ${sblg-base}.
data-sblg-atomcontent=boolean: If <entry data-sblg-entry="1"> contains this set to true, the contents are printed directly and the Tag Symbols are processed. This overrides data-sblg-altlink and data-sblg-content.
data-sblg-content=boolean: If <entry data-sblg-entry="1"> contains this set to true, the article's contents (everything within the element having the data-sblg-article="1" attribute) are inlined within the <content> element with type html. Tag Symbols are processed.
data-sblg-entry=boolean: Each <entry> element with this is filled in with article content.
data-sblg-forall=boolean: If an <entry data-sblg-entry="1"> element contains this set to true, it is used for all remaining articles. Any <entry data-sblg-entry="1"> following this are discarded.

If not using data-sblg-atomcontent, entries are filled in with a <title>, <id>, <author>, HTML <content> (specified in the article as an <aside>), and alternate <link>. The <id> is constructed by appending the source filename, hash print, and date following the feed's <id> or <link> element.

When filling in HTML content, sblg will strip away HTML attributes that do not fit into a white-list. This white-list is defined by the W3C's Feed Validator.

JSON Schema

sblg can produce JSON with the -j flag. The structure of the JSON file is consumable either with a JSON schema (noted in the FILES section) or using the typings that may be downloaded with npm(1):

npm install sblg

If -l is specified, the output schema is simply an array as follows. Let source1.xml and source2.xml be input files with a variety of tags.

[
 {"src": "source1.xml",
  "tags": ["tag1","tag2"]},
 {"src": "source2.xml",
  "tags": ["tag1"]}
]

If, however, -r is also specified, the reverse format is used:

[
 {"tag": "tag1",
  "srcs": ["source1.xml","source2.xml"]},
 {"tag": "tag2",
  "srcs": ["source1.xml"]}
]
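
The relationship between the two listings can be shown by regrouping the article-major array into the tag-major one. This is a sketch for consumers of the JSON output, assuming only the "src", "tags", "tag", and "srcs" keys shown above; the function name to_tag_major is invented, and the ordering of tags (first seen first) is an assumption.

```python
import json

ARTICLE_MAJOR = """[
 {"src": "source1.xml", "tags": ["tag1", "tag2"]},
 {"src": "source2.xml", "tags": ["tag1"]}
]"""


def to_tag_major(entries):
    """Regroup article-major entries by tag, keeping first-seen order."""
    by_tag = {}
    for entry in entries:
        for tag in entry["tags"]:
            by_tag.setdefault(tag, []).append(entry["src"])
    return [{"tag": tag, "srcs": srcs} for tag, srcs in by_tag.items()]


print(json.dumps(to_tag_major(json.loads(ARTICLE_MAJOR)), indent=1))
```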

Tag Symbols

Within the template for -c or -C, or in any article contents written (either into an article or navigation entry), the following special strings are replaced. These symbols concern the current article being processed: in a navigation entry, or as article contents. In the event of the positional “next” and “prev” symbols, these refer to the article's position within the input articles. Obviously, -c has only a single article.

In general, these must be considered strict values, e.g., ${sblg-aside} and not ${ sblg-aside }. Some symbols accept optional arguments, which have the format ${sblg-tags[|argument]}. Here, |argument may be omitted.

Be careful in using tag symbols: the contents are copied directly, so if specifying a value within an HTML attribute that has a double-quote, the attribute will be prematurely closed.

To prevent regular text with ${...} from being processed, escape one or more characters, such as &dollar;{...}.
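
The strict matching can be illustrated with a small substitution routine for ${sblg-get|key} and ${sblg-has|key}. This is a sketch, not sblg's parser: the regular expression and the function name are invented, and printing "sblg-has-" followed by the key name is one reading of the symbol's description.

```python
import re

# Strict form only: "${", the symbol name, an optional "|argument", "}".
SYMBOL = re.compile(r"\$\{(sblg-get|sblg-has)(?:\|([^}]*))?\}")


def substitute(text, values):
    """Replace ${sblg-get|key} and ${sblg-has|key} using the values dict."""
    def repl(match):
        name, key = match.group(1), match.group(2)
        if key is None or key not in values:
            return ""  # unspecified or unknown keys print nothing
        if name == "sblg-get":
            return values[key]
        return "sblg-has-" + key
    return SYMBOL.sub(repl, text)


values = {"foo": "bar"}
# Both symbols are in strict form and are replaced.
print(substitute('<p class="${sblg-has|foo}">${sblg-get|foo}</p>', values))
# Padded or escaped forms are left untouched.
print(substitute("${ sblg-get|foo } and &dollar;{sblg-get|foo}", values))
```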

${sblg-abscount}: The total number of articles. This is only valid in <nav data-sblg-nav="1">, otherwise it always prints 1. See also ${sblg-count} and ${sblg-setcount}.
${sblg-abspos}: The position (from 1) of the article in the list of all articles. This is only valid in a <nav data-sblg-nav="1"> context, otherwise it always prints 1. See also ${sblg-pos}.
${sblg-aside}: The article's first aside with markup.
${sblg-asidetext}: The article's first aside, textual parts only.
${sblg-author}: The article's author with markup.
${sblg-authortext}: The article's author, textual parts only.
${sblg-realbase}: Like ${sblg-base}, and having the same sub-types, except deriving from ${sblg-real}.
${sblg-base}: Same as ${sblg-source} but with the last suffix part chopped off. For example, foo/bar.xml becomes foo/bar. The ${sblg-stripbase} variant will strip off the directory part and any suffix. For example, foo/bar.xml becomes bar. The ${sblg-striplangbase} variant will also strip the language. For example, if “en” language was specified on the article, foo/bar.en.xml becomes bar.
${sblg-count}: The total number of articles that will be shown, i.e., taking into consideration the navigation length and offset. In standalone mode, this is always 1. In <nav data-sblg-nav="1">, it's the total number within the navigation. See also ${sblg-abscount} and ${sblg-setcount}.
${sblg-date}: The publication date as YYYY-MM-DD (UTC).
${sblg-datetime}: The publication date and time as YYYY-MM-DDTHH:MM:SSZ (UTC).
${sblg-datetime-fmt[|fmt]}: A human-readable representation of the date and, if specified, time in local time. This accepts an optional format string passed to strftime(3). If the format string is empty or “auto”, a human-readable date (with %x) or date-time (%c) is printed.
${sblg-img}: The article's associated image. This will be an empty string if no image was specified.
${sblg-first-base}: The first (newest) base name in the list of articles. There are also ${sblg-first-stripbase} and ${sblg-first-striplangbase} variants. See ${sblg-base}.
${sblg-last-base}: The last (oldest) base name in the list of articles. There are also ${sblg-last-stripbase} and ${sblg-last-striplangbase} variants. See ${sblg-base}.
${sblg-next-base}: The next base name when chronologically ordered from newest to oldest, wrapping back to the beginning for the last. There are also ${sblg-next-stripbase} and ${sblg-next-striplangbase} variants. See ${sblg-base}.
${sblg-next-has}: Prints sblg-next-has if there exists a next article in the ordered set, otherwise prints nothing.
${sblg-pos}: The position (from 1) of the articles actually shown. This always starts at 1 and increments by one, regardless of tag filtering or starting position. In standalone mode, it always prints 1. In blog mode (outside of a <nav> context), it shows the position in the input files. Within a <nav> context, it shows the position within the navigation.
${sblg-pos-frac}: The fractional (0–1) value of ${sblg-pos}/${sblg-count}.
${sblg-pos-pct}: The percentage (0–100, not including the percent sign) form of ${sblg-pos-frac}.
${sblg-prev-base}: The previous base name when chronologically ordered from newest to oldest, wrapping to the end for the first. There are also ${sblg-prev-stripbase} and ${sblg-prev-striplangbase} variants. See ${sblg-base}.
${sblg-prev-has}: Prints sblg-prev-has if there exists a previous article in the ordered set, otherwise prints nothing.
${sblg-get[|key]}: Print the value of key assigned in data-sblg-set-key. If unspecified or the key was not found, this is ignored and omitted from output. The lookup is case sensitive.
${sblg-get-escaped[|key]}: Like ${sblg-get[|key]}, but escapes the value of the key so that it may be used for data-sblg-navtag or data-sblg-articletag attribute values for multi-word tags.
${sblg-has[|key]}: Like ${sblg-get[|key]}, but queries whether the key exists. If it is specified and it does exist, then the string sblg-has-key is printed. This is useful in class attributes to test whether a given key has been specified.
${sblg-setcount}: Like ${sblg-count}, but only the articles matching the requested tags. See also ${sblg-count} and ${sblg-abscount}.
${sblg-real}: The article's actual source file. See ${sblg-source} for an overridable source indicator.
${sblg-source}: The source file associated with the article.
${sblg-tags[|tagspec]}: List of unique tags in the article, optionally filtered by those having the prefix tagspec. If the prefix is not specified, all tags. Each tag (e.g., TAG) is listed as <span class="sblg-tag">TAG</span>. If no tags were found, a single <span class="sblg-tags-notfound"></span> is emitted.
${sblg-title}: The article title with markup.
${sblg-titletext}: The article title, textual parts only.
${sblg-url}: The output filename, which is empty for standard output.
${sblg-version}: The current sblg version as xx.yy.zz.
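
The ${sblg-base}, ${sblg-stripbase}, and ${sblg-striplangbase} transformations can be reproduced on the examples given in the list above. This is a sketch of the documented results, not sblg's code; the function names are invented, and it assumes that only the last suffix and one trailing language suffix are removed.

```python
import os


def base(source):
    """Chop the last suffix: foo/bar.xml becomes foo/bar."""
    root, _ = os.path.splitext(source)
    return root


def stripbase(source):
    """Drop the directory part and the last suffix."""
    return base(os.path.basename(source))


def striplangbase(source, langs):
    """Like stripbase, then drop a trailing language suffix such as .en."""
    name = stripbase(source)
    for lang in langs:
        if name.endswith("." + lang):
            return name[: -len(lang) - 1]
    return name


print(base("foo/bar.xml"))                      # foo/bar
print(stripbase("foo/bar.xml"))                 # bar
print(striplangbase("foo/bar.en.xml", ["en"]))  # bar
```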

FILES

The following files are installed in /usr/local/share/sblg.

schema.json: JSON schema for output generated with -j.

EXIT STATUS

The sblg utility exits 0 on success, and >0 if an error occurs.

EXAMPLES

First, create standalone HTML5 files (filled-in <article data-sblg-article="1">) from article fragments. An article-template.xml file is assumed to exist. This will create article1.html and article2.html from the re-write rule for the XML suffix.

% sblg -c article1.xml article2.xml

Next, merge formatted files into a front page. A blog-template.xml file is assumed to exist.

% sblg -o index.html article1.html article2.html

This will create index.html with filled-in <article data-sblg-article="1"> and <nav data-sblg-nav="1"> elements.

Combining the above two examples, we can specify a single article to be displayed along with a full navigation as follows:

% sblg -o article1.html -C article1.xml article1.xml article2.xml

This will fill the contents of article1.xml into the <article data-sblg-article="1"> but use both (along with any others) in the <nav data-sblg-nav="1">.

If we want to make an output article as in the above example for each element of the input, we could either run -C for each input element, or use -L to avoid re-running sblg for each input article, which can be costly for many articles!

% sblg -L article1.xml article2.xml

This re-writes the suffixes and fills in the <article data-sblg-article="1"> for article1.xml in article1.html, and so on. For each of these, it will fill in <nav data-sblg-nav="1">.

NAME