1. Document Sets

1.1. Standard Links
1.2. Source Code Syntax Highlighting
1.3. Mathematics

This is an introduction to a set of tools that I have put together for managing document sets related to a course.

A document set is a collection of documents in web-browser accessible form that can cross-reference one another. The problem of cross-referencing is not as trivial as simply writing <a href=...> links within HTML web pages. The links must be specified when the documents are written, but the documents might be published on the website in any of several different formats, each with its own URLs.
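To make the problem concrete, here is a minimal Python sketch that assumes a hypothetical directory layout. The names `resolve`, `assignment1`, and `tableDecl` are illustrative only and are not part of the tool set. The same logical target resolves to a different URL in each output format, so an href typed by hand at writing time can only be correct for one of them.

```python
# Hypothetical sketch: the directory layout below is an assumption made
# for illustration, not the tool set's actual publishing scheme.

def resolve(targetdoc: str, targetptr: str, fmt: str) -> str:
    """Turn a logical (document, anchor) reference into a URL for one output format."""
    if fmt == "page":
        # The whole document is one HTML file, so the anchor becomes a fragment.
        return f"../{targetdoc}/{targetdoc}.html#{targetptr}"
    if fmt in ("pages", "slides"):
        # Assumed: the section containing the anchor is split into its own file.
        return f"../{targetdoc}/{fmt}/{targetptr}.html"
    raise ValueError(f"unsupported output format: {fmt}")


if __name__ == "__main__":
    for fmt in ("page", "pages", "slides"):
        print(fmt, resolve("assignment1", "tableDecl", fmt))
```

Deferring the choice of URL until publication time is what lets one input document be published in several formats without editing its links.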

This tool set supports three input formats and five output formats. Some output formats are intended for interactive viewing in a browser, others for printing. A single input document can be published in multiple output formats. In fact, I expect most documents to be published in two formats, one for viewing and one for printing.

The possible inputs and outputs are

Table 1. Input and Output Formats

| Input Form | Output Form Name | Output Form Description |
|---|---|---|
| DocBook | page | Single HTML page |
| DocBook | pages | Multiple HTML pages, linked by "next" and "previous" buttons |
| DocBook | slides | Like pages, but split into more and shorter pages, formatted with a larger font for use in presentations |
| DocBook | pdf | A single PDF document; not all features are supported. |
| html | html | A single HTML page, essentially a copy of the original, except that a footer can be added with course-specific links and the <a> elements can use DocBook-style cross-referencing (targetdoc and targetptr) instead of href. |
| Source code (*.h, *.cpp, *.java) | N/A | Single HTML page with syntax highlighting of the code |
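For the html input form, rewriting targetdoc/targetptr links into ordinary hrefs could look something like the minimal Python sketch below. It assumes the input is well-formed XHTML with no default namespace (ElementTree cannot parse arbitrary HTML, and a namespaced document would need qualified tag names), and it uses a hypothetical single-page URL layout. The names `resolve` and `rewrite_links` are illustrative, not the tool set's actual functions.

```python
import xml.etree.ElementTree as ET


def resolve(targetdoc: str, targetptr: str = "") -> str:
    """Hypothetical URL layout for single-page output; the real scheme may differ."""
    url = f"../{targetdoc}/{targetdoc}.html"
    return f"{url}#{targetptr}" if targetptr else url


def rewrite_links(xhtml: str) -> str:
    """Replace targetdoc/targetptr attributes on <a> with a computed href.

    <a> elements that already carry a plain href are left untouched.
    """
    root = ET.fromstring(xhtml)
    for a in root.iter("a"):
        if "targetdoc" in a.attrib:
            doc = a.attrib.pop("targetdoc")
            ptr = a.attrib.pop("targetptr", "")  # anchor is optional
            a.set("href", resolve(doc, ptr))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = '<p>See <a targetdoc="assignment1" targetptr="tableDecl">the declaration</a>.</p>'
    print(rewrite_links(page))
```

Because the author writes only the document name and anchor, the same html input could be re-run with a different `resolve` if the site layout ever changes.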

The output forms "pages" and "slides" are considered viewing formats, because printing them would require someone to navigate from page to page, printing each one. "pdf" is a printable format. "page" and "html" can serve as either.

As you can see, the primary input format supported is DocBook. DocBook is a format that emphasizes describing one's text rather than formatting it. For example, suppose you were typing out an assignment and wanted to write something like

"You will need to modify the declaration of Table in table.h."

If you were working directly in HTML, you would probably think to yourself, "How do I want to format the variable name and the file name in that sentence?" You might opt for something like

You will need to modify the declaration of <i>Table</i> in <tt>table.h</tt>.

The problem here is that you might wind up using italics (<i>) to mean many different things in this document or across a set of related documents. You might later come to regret having used italics for variable names and want to change this pattern. That will be difficult if an <i> is sometimes a variable name, sometimes a term being defined, and sometimes simply emphasis.

A more savvy HTML author might have written

You will need to modify the declaration of <span class="varname">Table</span> in <span class="filename">table.h</span>.

Although more awkward to type, and requiring a lot more self-discipline, this has the advantage of later allowing you to choose distinctive formatting (via CSS) for variable names and file names. This is an example of writing descriptively rather than visually.

DocBook provides, in essence, a set of descriptive tags that you can use:

You will need to modify the declaration of <varname>Table</varname> in <filename>table.h</filename>.

This can be translated into HTML, PDF, ebook formats, etc. In fact, the HTML translation would probably look a lot like the span/class example above, so that a little CSS can then be used to uniformly format different "kinds" of text.
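A minimal Python sketch of that kind of translation follows. It assumes a small, hand-picked subset of DocBook 4-style (un-namespaced) elements; the real DocBook stylesheets handle far more elements and may choose different HTML tags, so the mapping and the name `docbook_to_html` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumed subset of DocBook inline elements; each becomes <span class="...">.
INLINE_TAGS = {"varname", "filename", "function", "classname", "emphasis"}
# Assumed block-level mapping: DocBook <para> becomes an HTML <p>.
BLOCK_TAGS = {"para": "p"}


def docbook_to_html(fragment: str) -> str:
    """Translate descriptive DocBook markup into HTML that CSS can style by class."""
    root = ET.fromstring(fragment)
    for elem in root.iter():
        if elem.tag in INLINE_TAGS:
            # Keep the descriptive name as the CSS class, e.g. class="varname".
            elem.set("class", elem.tag)
            elem.tag = "span"
        elif elem.tag in BLOCK_TAGS:
            elem.tag = BLOCK_TAGS[elem.tag]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    src = ("<para>You will need to modify the declaration of "
           "<varname>Table</varname> in <filename>table.h</filename>.</para>")
    print(docbook_to_html(src))
```

Since every variable name ends up with class="varname", a single CSS rule such as `.varname { font-style: italic; }` controls how all of them look, and changing that rule later does not touch terms being defined or emphasized text.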

The tools provided here not only take care of translation and cross-referencing but also provide some convenience functions.