Restless: Yet Another Lightweight Markup Processor

What is lightweight markup?

By lightweight markup, we mean a file format that looks a lot like plain text, but which conforms to some non-intrusive conventions for writing headings, emphasis, links and so on. This markup can be detected by a processor which will typically generate HTML, or some other less human-readable but prettier format, from the lightweight input.

The most common application of lightweight markup is in online applications (e.g. forums and wikis), where it evolved to overcome the plain-text-only limitation of the HTML <TEXTAREA> element. It's also often used for documentation embedded in programs, which is likewise limited to non-WYSIWYG markup.

Rather than list any of the numerous existing lightweight markup languages here, I'll refer to this Wikipedia page which links to many of them: http://en.wikipedia.org/wiki/Lightweight_markup_language

Why "Yet Another" one?

Existing lightweight languages are great when they're not trying to do too much. It's easy to think of obvious markup conventions for a few basic things such as headings and sub-headings, emphasis and links. But most (all?) of the existing languages have, in my opinion, gone too far. In order to support more advanced markup they've introduced cryptic conventions which are less human-readable than, for example, an HTML tag would be. (And what's more, each of these languages has introduced its own, different conventions. At least there's only one HTML, which has been the same for 15 years or so.)

You might hope that it's possible to simply ignore the unwanted features (as you can with HTML tags that you don't care about). But instead you'll find that some bit of text that you wanted to appear literally is instead interpreted as markup: you then need to learn the escaping convention for that language.

So I've written restless, a "rest" processor that does "less". ("Rest" is short for reStructuredText, a form of lightweight markup supported by docutils and used a lot in the Python community; see http://docutils.sourceforge.net/rst.html.)

Restless currently converts only to HTML. You can use it as a command-line tool and also as an on-the-fly converter invoked by your web server.

The Restless Markup

My guess is that many of the existing lightweight markup languages gained their first layer of "too much markup" when their authors attempted to bootstrap them by using them to write their own documentation: of course, you need a way to escape the markup so that it appears literally. Restless has no such way to escape anything, which I consider to be a feature, not a bug. So here's a quick summary in prose:

And that's (currently) it.

Heuristic Processing

My intention is that any formatting beyond the rules described above, with a few possible exceptions as described below, will be performed heuristically. I consider this experimental; the danger is that the heuristic will be a bad one and the results frustrating.

At present, I have implemented a heuristic that attempts to distinguish code blocks from surrounding text based on indentation and character frequency. This is not yet working perfectly but I've not put much effort into it, and I think it can probably be made to work well enough in due course.

Possible Markup Extensions

The following are forms of lightweight markup that I consider to be potentially useful, but have not yet implemented:

Other Required Features

The following are features that I consider important, but have not yet thought of a way to implement:

License

Restless is distributed under the terms of the GNU General Public License, version 2 or later.

Installing Restless

Restless is a small C++ program which should be simple to install on a Linux or similar system. First you'll need to install the following prerequisites:

Then do something like this:

$ svn co http://svn.chezphil.org/restless/trunk/ restless
$ cd restless
$ make
$ sudo make install

Using Restless

You can use restless from the command line much like any other Unix utility:

$ restless foo.rest > foo.html
$ zcat something | restless | more

Using Restless With Apache

To have Apache process .rest files into HTML on request, use mod_ext_filter; see the documentation at http://httpd.apache.org/docs/2.2/mod/mod_ext_filter.html. First make sure that the module is enabled, with something like this:

# a2enmod ext_filter
# /etc/init.d/apache2 force-reload

Now put something like this in your Apache configuration:

ExtFilterDefine rest-to-html mode=output intype=text/rest outtype=text/html \
    cmd="/usr/local/bin/restless"
SetOutputFilter rest-to-html
AddType text/rest .rest

TODO

About The Code/Text Character Frequency Heuristic

The character frequency heuristic measures the correlation between the character frequencies in the input and two reference character frequency tables, one for code and one for text. These tables are #included into the restless executable when it is compiled. You may prefer to substitute your own character frequency tables. You can do this using the countfreqs program, which reads data on standard input and writes a frequency table to standard output. Use it as follows:

$ cat *.txt | countfreqs > textfreqs.h
$ cat *.c *.py *.sh | countfreqs > codefreqs.h
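
To illustrate the correlation test described above, here is a minimal C++ sketch. It is not the actual Restless source. The names FreqTable, count_freqs, correlation and looks_like_code are hypothetical, the tables are assumed to hold one relative frequency per byte value, and the Pearson coefficient is assumed as the correlation measure; the real code and the real table format may well differ.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical table type: relative frequency of each of the 256 byte values.
using FreqTable = std::array<double, 256>;

// Count how often each byte value occurs in `s`, normalised so the
// entries sum to 1. An empty string gives an all-zero table.
FreqTable count_freqs(const std::string& s) {
    FreqTable t{};  // value-initialised: every entry starts at zero
    if (s.empty()) return t;
    for (unsigned char c : s) t[c] += 1.0;
    for (double& f : t) f /= static_cast<double>(s.size());
    return t;
}

// Pearson correlation coefficient between two tables, in [-1, 1].
// Returns 0 if either table is constant, where the coefficient is undefined.
double correlation(const FreqTable& a, const FreqTable& b) {
    const double n = static_cast<double>(a.size());
    double mean_a = 0.0, mean_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        mean_a += a[i];
        mean_b += b[i];
    }
    mean_a /= n;
    mean_b /= n;

    double cov = 0.0, var_a = 0.0, var_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - mean_a;
        const double db = b[i] - mean_b;
        cov += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    if (var_a == 0.0 || var_b == 0.0) return 0.0;
    return cov / std::sqrt(var_a * var_b);
}

// Assumed decision rule: a block is treated as code if its character
// frequencies correlate better with the code table than with the text table.
bool looks_like_code(const std::string& block,
                     const FreqTable& text_ref,
                     const FreqTable& code_ref) {
    const FreqTable f = count_freqs(block);
    return correlation(f, code_ref) > correlation(f, text_ref);
}
```

In this sketch the reference tables would be built by calling count_freqs on sample text and sample code, standing in for the #included tables. Because spaces are common in both prose and code, both correlations tend to be fairly high, and it is the difference between them that carries the signal.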

Status

Restless is currently new and experimental. If you want to hear about new releases, please subscribe to the freshmeat entry at http://freshmeat.net/projects/restless/.

The Author

Restless is written by Phil Endecott; I'm also responsible for Anyterm (http://anyterm.org/) and Decimail (http://decimail.org/). Your feedback is welcome; email contact details can be found at http://chezphil.org/email/genemail.cgi.