2. The Site Map

Documents are grouped into directories. In general, a given directory can hold only a single document that will be published in "pages" form and a single document (possibly the same one) that will be published in "slides" form. Any number of "page", "pdf", or source code documents may occur in a directory.

Typically, a directory name will reflect the name of the pages/slides document that it contains. For example, a directory overview might contain a DocBook file overview.dbk together with any graphics files, stylesheets, or other auxiliary content that is used to prepare the final document. Let's suppose that this overview document also links to some C++ source code foo.cpp and to an assignment file asst.html. Let's further suppose that we decide to publish the overview in all available output forms. Then the relation between the input and output files is given by Table 2:

Table 2. Input and Output Files

Input File             Output File                  Description
overview/overview.dbk  overview/page/overview.html  The "page" output format
overview/overview.dbk  overview/pages/index.html    The opening page of the multi-page output for the "pages" output form
overview/overview.dbk  overview/slides/index.html   The opening page of the multi-page output for the "slides" output form
overview/overview.dbk  overview/page/overview.pdf   The "pdf" output format
overview/foo.cpp       overview/foo.cpp.html        Highlighted version of the source code
overview/asst.html     overview/html/asst.html      The "html" output format

Actually, the table above oversimplifies in one respect. There will be two separate directories named "overview": one is the original input directory, and the other is a copy of that directory created in an output area. In general, document conversion leaves the input directories untouched.
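The mapping in Table 2 can be written down as a small function. This is only a sketch inferred from the rows of the table, not the conversion tool's own code; the function name output_path and the constant SOURCE_SUFFIXES are made up for illustration, and the paths are relative to the output area.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions treated as source code (assumed from the list given later in this section).
SOURCE_SUFFIXES = {".h", ".cpp", ".java"}


def output_path(input_file: str, form: Optional[str] = None) -> str:
    """Return the output path that Table 2 pairs with an input file.

    `form` is one of "page", "pages", "slides", "pdf", "html". It is
    ignored for source code files, which have no form in the site map.
    """
    path = PurePosixPath(input_file)
    directory, base = path.parent, path.stem

    if path.suffix in SOURCE_SUFFIXES:
        # foo.cpp -> foo.cpp.html, directly in the directory
        return str(directory / (path.name + ".html"))
    if form == "page":
        return str(directory / "page" / (base + ".html"))
    if form == "pdf":
        # Table 2 places the PDF in the "page" directory.
        return str(directory / "page" / (base + ".pdf"))
    if form in ("pages", "slides"):
        # Multi-page forms open at index.html in a directory named for the form.
        return str(directory / form / "index.html")
    if form == "html":
        return str(directory / "html" / (base + ".html"))
    raise ValueError(f"unknown output form: {form!r}")
```

For example, output_path("overview/overview.dbk", "slides") gives overview/slides/index.html, matching the third row of the table.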

The thing that determines which documents are included in the set and what output forms should be used is the site map, an XML file that lists the directories in which documents are stored, the documents stored there, and the output forms desired for each document.

The site map is normally called "course.sitemap" and looks something like this:

<?xml version="1.0" encoding="utf-8"?>
<targetset>
  <targetsetinfo>
    CS 250 <!-- (1) -->
  </targetsetinfo>

  <sitemap home="index.html" email="cs250@cs.odu.edu"> <!-- (2) home, (3) email -->
    <dir name="cs250Documents"> <!-- (4) -->
      <dir name="Directory"> <!-- (5) -->
         <document targetdoc="topics"> <!-- (6) -->
            <form>page</form>
         </document>
         <document targetdoc="info">
            <form>page</form>
         </document>
         <document targetdoc="buttons">
            <form>html</form>
         </document>
      </dir>

      <dir name="syllabus"> <!-- (7) -->
         <document targetdoc="syllabus">
            <form>page</form>
         </document>
      </dir>

      <!-- ============ Lectures =============== -->

      <dir name="cppProgramStructure"> <!-- (8) -->
         <document targetdoc="cppProgramStructure">
            <form>slides</form>
            <form>page</form>
         </document>
      </dir>

      <dir name="arrays">
         <document targetdoc="arrays">
            <form>slides</form>
            <form>page</form>
         </document>
      </dir>

 ...

1

This is used in a few places as part of a title, to identify the course.

2

The home attribute contains a URL that all documents will link back to as a home page for the course. This might be a relative URL to a document inside the document set (e.g., the topics page) or, if the course will be published on BlackBoard or another LMS, the address of the course on that system.

3

This email address is used to construct the messaging link at the bottom of each page.

These links exist so that, when students send messages about a lecture, an assignment, or anything else, the URL of the web page they are asking about is automatically copied into their message.

4

This introduces a directory named "cs250Documents". Documents and other directories can be nested inside this one. The top-level directory named in the sitemap is a bit special. It gives a name to the directory within which all output forms are written. It also provides a name for the zip file that will eventually contain all the outputs and that can eventually be published to a website.

5

This introduces an inner directory. For the input, this is a directory at the same level as the sitemap file. In the output this appears inside the topmost directory.

For example, if we had stored the sitemap at ~cs250/website/course.sitemap, then this would say that we have an input directory ~cs250/website/Directory/ and that output from that directory will be stored at ~cs250/website/cs250Documents/Directory/.
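The relation between the two locations can be sketched as follows. This assumes only what the example above states, and it covers only directories listed directly under the top-level one; the function name input_and_output_dirs is hypothetical.

```python
from pathlib import PurePosixPath


def input_and_output_dirs(sitemap_file: str, top_dir: str, inner_dir: str):
    """Return (input_dir, output_dir) for a directory listed directly
    under the top-level <dir> of the site map."""
    site_root = PurePosixPath(sitemap_file).parent
    input_dir = site_root / inner_dir              # beside the sitemap file
    output_dir = site_root / top_dir / inner_dir   # inside the top-level output directory
    return str(input_dir), str(output_dir)


dirs = input_and_output_dirs(
    "~cs250/website/course.sitemap", "cs250Documents", "Directory"
)
# dirs == ("~cs250/website/Directory", "~cs250/website/cs250Documents/Directory")
```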

6

This introduces the first of three documents that are stored in Directory. The targetdoc entry for a document is the base name of the input file. It must be unique over the entire site map.

Within the document can be one or more form elements. Each names an output form that we wish to generate for that document.

In this example, we are defining three documents. Each will be published as a single web page, but the input forms differ. The first two documents are created from inputs Directory/topics.dbk and Directory/info.dbk, with outputs cs250Documents/Directory/page/topics.html and cs250Documents/Directory/page/info.html. The third document is created from input Directory/buttons.html and is output as cs250Documents/Directory/html/buttons.html.

The "topics" document is actually something of a special case. The DocBook .dbk file for this document is created from the course outline as described later.
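Because every targetdoc must be unique across the whole site map, it can be handy to check a site map before running a conversion. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; it is not part of the conversion tools. The names list_documents and duplicate_targetdocs are made up for illustration, and SAMPLE is a shortened copy of the listing above with closing tags added only so that it parses.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened, closed-off copy of the example site map (an assumption:
# the original listing is elided, so the closing tags here are added).
SAMPLE = """<targetset>
  <targetsetinfo>CS 250</targetsetinfo>
  <sitemap home="index.html" email="cs250@cs.odu.edu">
    <dir name="cs250Documents">
      <dir name="Directory">
        <document targetdoc="topics"><form>page</form></document>
        <document targetdoc="info"><form>page</form></document>
        <document targetdoc="buttons"><form>html</form></document>
      </dir>
      <dir name="cppProgramStructure">
        <document targetdoc="cppProgramStructure">
          <form>slides</form>
          <form>page</form>
        </document>
      </dir>
    </dir>
  </sitemap>
</targetset>
"""


def list_documents(sitemap_xml: str):
    """Return (directory, targetdoc, forms) for every <document> entry.

    The directory is given relative to the top-level <dir>.
    """
    root = ET.fromstring(sitemap_xml)
    top = root.find("sitemap/dir")  # the top-level (output) directory
    found = []

    def walk(dir_elem, path):
        for doc in dir_elem.findall("document"):
            forms = [f.text.strip() for f in doc.findall("form")]
            found.append((path, doc.get("targetdoc"), forms))
        for sub in dir_elem.findall("dir"):
            name = sub.get("name")
            walk(sub, f"{path}/{name}" if path else name)

    walk(top, "")
    return found


def duplicate_targetdocs(documents):
    """Return the targetdoc names that occur more than once."""
    counts = Counter(name for _, name, _ in documents)
    return sorted(name for name, n in counts.items() if n > 1)


documents = list_documents(SAMPLE)
duplicates = duplicate_targetdocs(documents)  # empty for this sample
```

Each entry of documents pairs a directory with a targetdoc and its list of forms, in the order they appear in the site map.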

7

This introduces another document. It will be created from the input syllabus/syllabus.dbk and output as cs250Documents/syllabus/page/syllabus.html.

8

This is actually a more typical entry. It introduces a document (a set of slides for a lecture) that is produced in two output forms. One is the multi-page set of slides and the other is a single web page (for printing).

When a document is declared to have multiple output forms, a couple of things need to be noted:

  • If any other documents link to this one, they will actually link to the first form listed for this document. Thus, in some sense, the first form listed is the primary or most important output format.

  • If any of the output formats is "pages" or "slides" (the two multi-html-page outputs) and any of the other output formats is "page" or "pdf", then the multi-page documents will show a printer icon at the bottom of each page that links to the page/pdf version of the output.
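The two rules above can be sketched in a few lines. This is an illustration rather than the tools' actual logic, and the function names are hypothetical. One point is a guess: when both "page" and "pdf" are listed, the text does not say which one the printer icon links to, so the sketch simply takes whichever is listed first.

```python
MULTI_PAGE_FORMS = ("pages", "slides")  # the two multi-HTML-page outputs
PRINTABLE_FORMS = ("page", "pdf")


def primary_form(forms):
    """The form that links from other documents resolve to: the first listed."""
    return forms[0]


def printer_icon_target(forms):
    """Return the form the printer icon links to, or None if no icon is shown.

    Assumption: if both "page" and "pdf" are listed, the first one wins.
    """
    if not any(f in MULTI_PAGE_FORMS for f in forms):
        return None  # no multi-page output, so nowhere to show the icon
    for f in forms:
        if f in PRINTABLE_FORMS:
            return f
    return None  # multi-page output only, nothing printable to link to
```

For the cppProgramStructure entry, primary_form(["slides", "page"]) is "slides", and the slides link to the "page" version for printing.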

It's worth pointing out that nothing in the above sitemap makes any explicit mention of source code documents. That's because source code documents tend to be numerous and are likely to have identical names. Source code documents are handled (almost) implicitly. For every directory named in the site map (whether that directory contains any explicit document entries or not), that directory will be scanned for files ending in ".h", ".cpp", or ".java". Each such file located will be converted into a web page.

Note that, if you should have a directory that contains only source code but no input DocBook or HTML documents, you would list that directory in the sitemap in order to collect and convert the source code.
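The scan for source code can be pictured roughly like this. It is a sketch under two assumptions that the text does not settle: that the scan looks only at files directly in the named directory (not in subdirectories), and that each page is named by appending ".html" to the file name, as in the foo.cpp.html row of Table 2. The function name source_code_pages is hypothetical.

```python
import tempfile
from pathlib import Path

SOURCE_SUFFIXES = {".h", ".cpp", ".java"}


def source_code_pages(directory):
    """Map each source file in `directory` to the name of its web page."""
    return {
        entry.name: entry.name + ".html"
        for entry in sorted(Path(directory).iterdir())
        if entry.is_file() and entry.suffix in SOURCE_SUFFIXES
    }


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("foo.cpp", "foo.h", "notes.txt"):
        Path(tmp, name).write_text("")
    pages = source_code_pages(tmp)
# notes.txt is ignored; foo.cpp and foo.h each get a page.
```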

It is also possible to have HTML pages that are not converted in any way. If the input directories contain a *.html file that is not listed in the sitemap, that file simply gets copied to the corresponding output directory. I use this, for example, to provide an index.html for the entire document set that consists of a frameset showing a navigation table (buttons.html) on one side and the topics page on the other.

This is actually not a special case, by the way. All files in a named input directory get copied to the corresponding output directory and to the output form directories. That's how graphics, stylesheets, and other related content stay with your web pages.
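As a rough picture of that copying step, the sketch below copies every regular file in an input directory into the output directory and into one subdirectory per output form. It is an illustration only: the function name copy_directory_files and the file diagram.png are made up, and the real tools may well treat subdirectories or particular file types differently.

```python
import shutil
import tempfile
from pathlib import Path


def copy_directory_files(input_dir, output_dir, forms):
    """Copy each regular file in input_dir to output_dir and to
    output_dir/<form> for every listed form. Returns the target directories."""
    sources = [p for p in Path(input_dir).iterdir() if p.is_file()]
    targets = [Path(output_dir)] + [Path(output_dir) / form for form in forms]
    for target in targets:
        target.mkdir(parents=True, exist_ok=True)
        for src in sources:
            shutil.copy2(src, target / src.name)
    return targets


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "overview")
    src.mkdir()
    (src / "diagram.png").write_bytes(b"")
    out = Path(tmp, "cs250Documents", "overview")
    copy_directory_files(src, out, ["slides", "page"])
    copied = sorted(p.relative_to(out).as_posix() for p in out.rglob("diagram.png"))
# The graphic now sits beside the output and inside each form directory.
```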