Overview - Course Document Sets

Steven Zeil

ODU Dept of Computer Science

August 1, 2010


Table of Contents

1. Document Sets
1.1. Standard Links
1.2. Source Code Syntax Highlighting
1.3. Mathematics
2. The Site Map
3. The Outline

1. Document Sets

This is an introduction to a set of tools that I have put together for managing document sets related to a course.

A document set is a collection of documents in web-browser accessible form that can cross-reference one another. The problem of cross-referencing is not as trivial as simply writing <a href=...> links within HTML web pages. That's because the links need to be specified when the documents are written, but the documents might be published on the website in any of several different formats.

This tool set supports three different input formats and five different output formats. Some output formats are intended for interactive viewing in a browser, others for printing. It is possible to publish a single input document in multiple output formats. In fact, I would expect that most documents would be published in two formats, one for viewing and one for printing.

The possible inputs and outputs are:

Table 1. Input and Output Formats
Input Form | Output Form Name | Output Form Description
DocBook | page | Single HTML page
DocBook | pages | Multiple HTML pages, linked by "next" and "previous" buttons
DocBook | slides | Like pages, but split into more and shorter pages, formatted with a larger font for use in presentations
DocBook | pdf | A single PDF document - not all features are supported.
html | html | A single HTML page, essentially a copy of the original, except that a footer can be added with course-specific links and the <a> elements can use DocBook-style cross-referencing (targetdoc and targetptr) instead of href.
Source code (*.h, *.cpp, *.java) | N/A | Single HTML page with syntax highlighting of the code

The output forms "pages" and "slides" are considered to be viewing formats, because printing these would require someone to navigate from page to page, printing each one. "pdf" is a printable format. "page" and "html" can be either.

As you can see, the primary input format supported is DocBook. DocBook is a format that emphasizes describing one's text rather than formatting it. For example, let's say that you were typing out an assignment and wanted to write something like

"You will need to modify the declaration of Table in table.h."

If you were working directly in HTML, you would probably think to yourself, "How do I want to format the variable name and the file name in that sentence?" You might opt for something like

You will need to modify the declaration of <i>Table</i> in <tt>table.h</tt>.

The problem here is that you might wind up using italics (<i>) to mean many different things in this document or across a set of related documents. You might later come to regret having used italics to format variable names and want to change this pattern. That will be difficult if sometimes an <i> is a variable name, sometimes it is a term being defined, and sometimes it is used simply for emphasis.

A more savvy HTML author might have written

You will need to modify the declaration of <span class="varname">Table</span> in <span class="filename">table.h</span>.

Although more awkward to type, and requiring a lot more self-discipline, this has the advantage of later allowing you to choose distinctive formatting (via CSS) for variable names and file names. This is an example of writing descriptively rather than visually.

DocBook provides, in essence, a set of descriptive tags that you can use:

You will need to modify the declaration of <varname>Table</varname> in <filename>table.h</filename>.

This can be translated into HTML, PDF, ebook formats, etc. In fact, the HTML translation would probably look a lot like the span/class example above, so that a little CSS can then be used to uniformly format different "kinds" of text.
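As a rough illustration of that translation, here is a small Python sketch that rewrites the two descriptive tags from the example into the span/class form. It is not the stylesheet the tools actually use; the function name docbook_inline_to_html and the two-tag list are assumptions made up for this example.

```python
import re

# Hypothetical subset of DocBook inline tags, chosen for this example only.
INLINE_TAGS = ("varname", "filename")


def docbook_inline_to_html(text: str) -> str:
    """Rewrite <tag>...</tag> as <span class="tag">...</span> for known tags."""
    names = "|".join(INLINE_TAGS)
    pattern = re.compile(rf"<({names})>(.*?)</\1>", re.DOTALL)
    return pattern.sub(r'<span class="\1">\2</span>', text)


source = ("You will need to modify the declaration of "
          "<varname>Table</varname> in <filename>table.h</filename>.")
html = docbook_inline_to_html(source)
print(html)
```

Because the class name is taken directly from the tag name, a single CSS rule per class is then enough to restyle every variable name or file name at once.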

The tools provided here not only take care of translation and cross-referencing, they also provide some convenience functions.

1.1. Standard Links

Each HTML page can provide links to

  • an alternative printable format of the same document,

  • a course home page, and

  • a messaging facility (currently generates a mailto: link with the course name in the email subject line and the URL of the originating page at the start of the message).
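To make the messaging link concrete, the following Python sketch builds a mailto: URL of the kind described in the last item. The function name messaging_link and the example page URL are hypothetical, and the exact subject and body layout that the tools generate may differ.

```python
from urllib.parse import quote


def messaging_link(email: str, course: str, page_url: str) -> str:
    """Build a mailto: URL with the course name as the subject and the
    originating page URL at the start of the message body."""
    subject = quote(course, safe="")
    # Two CRLF line breaks leave room for the student to start typing.
    body = quote(page_url + "\r\n\r\n", safe="")
    return f"mailto:{email}?subject={subject}&body={body}"


# The page URL below is a made-up example address.
link = messaging_link("cs250@cs.odu.edu", "CS 250",
                      "http://www.example.edu/cs250/arrays/pages/index.html")
print(link)
```

Percent-encoding both fields keeps spaces, slashes, and line breaks from being misread as part of the URL syntax.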

1.2. Source Code Syntax Highlighting

C++ and Java code can be automatically syntax-highlighted.

#include <iostream>


/** An example of some C++ code
*/

using namespace std;

int main() {
  cout << "Hello world!" << endl;
  return 0;
}

You can also apply various highlighting and other formatting to the code.

#include <iostream>


/** An example of some C++ code
*/

using namespace std;

int main() {
  cout << "Hello world!" << endl;
  return 0;
}

You can also place callouts in the code

#include <iostream>


/** An example of some C++ code
*/

using namespace std;  // 1

int main() {
  cout << "Hello world!" << endl;
  return 0;  // 2
}

1

Without this, we would have to spell out some of the names in their long form: std::cout and std::endl.

2

main has a return type of int, so it must return a value. Zero is traditionally returned to indicate that the program has not encountered any problems.

1.3. Mathematics

Mathematics is supported using the ASCIIMath input format. For example, you can type

x = (-b +- sqrt(b^2-4a c))/(2a)

and get

$$ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$

2. The Site Map

Documents are grouped into directories. In general, a given directory can hold only a single document that will be published in "pages" form and a single document (possibly the same one) that will be published in "slides" form. Any number of "page", "pdf", or source code documents may occur in a directory.

Typically, a directory name will reflect the name of the pages/slides document that it contains. For example, a directory overview might contain a DocBook file overview.dbk together with any graphics files, stylesheets, or other auxiliary content that is used to prepare the final document. Let's suppose that this overview document also links to some C++ source code foo.cpp and to an assignment file asst.html. Let's further suppose that we decide to publish the overview in all available output forms. Then the relation between the input and output files is given by

Table 2. Input and Output Files
Input File | Output File | Description
overview/overview.dbk | overview/page/overview.html | The "page" output format
overview/overview.dbk | overview/pages/index.html | The opening page of the multi-page output for the "pages" output form
overview/overview.dbk | overview/slides/index.html | The opening page of the multi-page output for the "slides" output form
overview/overview.dbk | overview/page/overview.pdf | The "pdf" output format
overview/foo.cpp | overview/foo.cpp.html | Highlighted version of the source code
overview/asst.html | overview/html/asst.html | The "html" output format

Actually, the table above is oversimplified in one respect. There will actually be two separate directories named "overview". One is the original input directory and the other is a copy of that directory created in an output area. In general, document conversion leaves the input directories untouched.
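The naming pattern in Table 2 can be summarized in a few lines of Python. This is only a sketch inferred from the rows of that table, not code taken from the tools. The function name output_path and the form label "source" (for syntax-highlighted source files, which have no named output form) are assumptions made for this example.

```python
from pathlib import PurePosixPath


def output_path(directory: str, name: str, form: str) -> PurePosixPath:
    """Return the output file for a document, following the rows of Table 2.

    `name` is the base name for DocBook and HTML inputs, or the full file
    name (e.g. "foo.cpp") when `form` is the made-up label "source".
    """
    base = PurePosixPath(directory)
    if form == "page":
        return base / "page" / f"{name}.html"
    if form in ("pages", "slides"):
        return base / form / "index.html"
    if form == "pdf":
        # Table 2 lists the PDF under the "page" directory.
        return base / "page" / f"{name}.pdf"
    if form == "html":
        return base / "html" / f"{name}.html"
    if form == "source":
        return base / f"{name}.html"
    raise ValueError(f"unknown output form: {form}")


for form in ("page", "pages", "slides", "pdf"):
    print(form, "->", output_path("overview", "overview", form))
```

The paths returned are relative to the output area, so the input directory of the same name is never written to.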

The thing that determines which documents are included in the set and what output forms should be used is the site map, an XML file that lists the directories in which documents are stored, the documents stored there, and the output forms desired for each document.

The site map is normally called "course.sitemap" and looks something like this:

<?xml version="1.0" encoding="utf-8"?>
<targetset> 
  <targetsetinfo>
    CS 250 <!-- 1 -->
  </targetsetinfo>

  <sitemap home="index.html" email="cs250@cs.odu.edu"> <!-- 2 (home), 3 (email) -->
    <dir name="cs250Documents"> <!-- 4 -->
      <dir name="Directory"> <!-- 5 -->
         <document targetdoc="topics"> <!-- 6 -->
            <form>page</form>
         </document>
         <document targetdoc="info">
            <form>page</form>
         </document>
         <document targetdoc="buttons">
            <form>html</form>
         </document>
      </dir>

      
      <dir name="syllabus"> <!-- 7 -->
         <document targetdoc="syllabus">
            <form>page</form>
         </document>
      </dir>


<!--- ============ Lectures ===============  -->


      <dir name="cppProgramStructure"> <!-- 8 -->
         <document targetdoc="cppProgramStructure">
            <form>slides</form>
            <form>page</form>
         </document>
      </dir>

      <dir name="arrays">
         <document targetdoc="arrays">
            <form>slides</form>
            <form>page</form>
         </document>
      </dir>

 ...

1

This is used in a few places as part of a title, to identify the course.

2

The home attribute contains a URL that all documents will link back to as a home page for the course. This might be a relative URL to a document inside the document set (e.g., the topics page) or, if the course will be published on Blackboard or another LMS, the address of the course on that system.

3

This email address is used to construct the messaging link at the bottom of each page.

The purpose of providing such links is so that, when students send messages about a lecture/assignment/whatever, the URL of the web page they are asking about can be automatically copied into their message.

4

This introduces a directory named "cs250Documents". Documents and other directories can be nested inside this one. The top-level directory named in the sitemap is a bit special. It gives a name to the directory within which all output forms are written. It also provides a name for the zip file that will eventually contain all the outputs and that can eventually be published to a website.

5

This introduces an inner directory. For the input, this is a directory at the same level as the sitemap file. In the output this appears inside the topmost directory.

For example, if we had stored the sitemap at ~cs250/website/course.sitemap, then this would say that we have an input directory ~cs250/website/Directory/ and that output from that directory will be stored at ~cs250/website/cs250Documents/Directory/.

6

This introduces the first of three documents that are stored in Directory. The targetdoc entry for a document is the base name of the input file. It must be unique over the entire site map.

Within the document can be one or more form elements. Each names an output form that we wish to generate for that document.

In this example, we are defining three documents. Each will be published as a single web page, but the input forms differ. The first two documents are created from inputs Directory/topics.dbk and Directory/info.dbk, with outputs cs250Documents/Directory/page/topics.html and cs250Documents/Directory/page/info.html. The third document is created from input Directory/buttons.html and is output as cs250Documents/Directory/html/buttons.html.

The "topics" document is actually something of a special case. The DocBook .dbk file for this document is created from the course outline as described later.

7

This introduces another document. It will be created from the input syllabus/syllabus.dbk and output as cs250Documents/syllabus/page/syllabus.html.

8

This is actually a more typical entry. It introduces a document (a set of slides for a lecture) that is produced in two output forms. One is the multi-page set of slides and the other is a single web page (for printing).

When a document is declared to have multiple output forms, a couple of things need to be noted:

  • If any other documents link to this one, they will actually link to the first form listed for this document. Thus, in some sense, the first form listed is the primary or most important output format.

  • If any of the output formats is "pages" or "slides" (the two multi-html-page outputs) and any of the other output formats is "page" or "pdf", then the multi-paged documents will show a printer icon at the bottom of each page that links to the page/pdf version of the output.
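The two rules above can be expressed as a short Python sketch. The function names are made up for this example. One detail is an assumption: when both "page" and "pdf" are listed, the sketch picks whichever comes first, which the description above does not specify.

```python
MULTI_PAGE = {"pages", "slides"}   # the two multi-html-page outputs
PRINTABLE = {"page", "pdf"}        # the outputs a printer icon can point to


def link_target_form(forms):
    """Links from other documents go to the first form listed."""
    return forms[0]


def printer_icon_target(forms):
    """Return the printable form that the multi-page outputs link to,
    or None when no printer icon would be shown."""
    if not any(form in MULTI_PAGE for form in forms):
        return None
    for form in forms:
        if form in PRINTABLE:
            return form  # assumption: the first printable form wins
    return None


forms = ["slides", "page"]  # as declared for cppProgramStructure
print(link_target_form(forms), printer_icon_target(forms))
```

For the cppProgramStructure entry, other documents would link to the slides, and each slide would carry a printer icon leading to the single-page version.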

It's worth pointing out that nothing in the above sitemap makes any explicit mention of source code documents. That's because source code documents tend to be numerous and are likely to have identical names. Source code documents are handled (almost) implicitly. For every directory named in the site map (whether that directory contains any explicit document entries or not), that directory will be scanned for files ending in ".h", ".cpp", or ".java". Each such file located will be converted into a web page.

Note that, if you should have a directory that contains only source code but no input DocBook or HTML documents, you would list that directory in the sitemap in order to collect and convert the source code.
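The implicit handling of source code can be sketched in Python as follows. This is an illustration under stated assumptions, not the tools' own code: the function names, the sample directory "examples", and the choice to scan each directory without descending into unlisted subdirectories are all assumptions.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

SOURCE_SUFFIXES = {".h", ".cpp", ".java"}

# A cut-down sitemap; the "examples" directory is hypothetical.
SITEMAP = """<targetset>
  <sitemap home="index.html" email="cs250@cs.odu.edu">
    <dir name="cs250Documents">
      <dir name="arrays">
        <document targetdoc="arrays"><form>slides</form></document>
      </dir>
      <dir name="examples"/>
    </dir>
  </sitemap>
</targetset>"""


def input_directories(sitemap_xml):
    """Yield input directories (relative to the sitemap file) for every <dir>
    nested inside the top-level <dir>, which only names the output area."""
    top = ET.fromstring(sitemap_xml).find("sitemap/dir")

    def walk(node, prefix):
        for child in node.findall("dir"):
            path = prefix / child.get("name")
            yield path
            yield from walk(child, path)

    if top is not None:
        yield from walk(top, Path("."))


def source_files(website_root, sitemap_xml):
    """List source files found directly in each directory named in the sitemap."""
    found = []
    for rel in input_directories(sitemap_xml):
        directory = website_root / rel
        if directory.is_dir():
            found.extend(sorted(p for p in directory.iterdir()
                                if p.is_file() and p.suffix in SOURCE_SUFFIXES))
    return found


def demo():
    """Build a throwaway input tree and report which files would be converted."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "arrays").mkdir()
        (root / "examples").mkdir()
        (root / "arrays" / "arrays.dbk").write_text("<article/>")
        (root / "examples" / "foo.cpp").write_text("int main() { return 0; }\n")
        (root / "examples" / "foo.h").write_text("// empty\n")
        return [p.relative_to(root).as_posix()
                for p in source_files(root, SITEMAP)]


print(demo())
```

Note that the "examples" directory contributes source files even though it declares no documents, which is why a source-only directory still has to be listed.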

It is also possible to have HTML pages that are not converted in any way. If the input directories contain a *.html file that is not listed in the sitemap, that file simply gets copied to the corresponding output directory. I use this, for example, to provide an index.html for the entire document set that consists of a frameset showing a navigation table (buttons.html) on one side and the topics page on the other.

This is actually not a special case, by the way. All files in a named input directory get copied to the corresponding output directory and to the output form directories. That's how graphics, stylesheets, and other related content stay with your web pages.
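A minimal Python sketch of that copying step might look like the following. The function name copy_support_files and the sample file names are hypothetical, and the real tools may well filter or order the copies differently.

```python
import shutil
import tempfile
from pathlib import Path


def copy_support_files(input_dir, output_dir, forms):
    """Copy every regular file in input_dir to output_dir and to one
    subdirectory of output_dir per output form."""
    files = [p for p in input_dir.iterdir() if p.is_file()]
    targets = [output_dir] + [output_dir / form for form in forms]
    for target in targets:
        target.mkdir(parents=True, exist_ok=True)
        for item in files:
            shutil.copy2(item, target / item.name)


def demo():
    """Copy a two-file input directory and list what lands in the output."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "website" / "arrays"
        out = Path(tmp) / "website" / "cs250Documents" / "arrays"
        src.mkdir(parents=True)
        (src / "arrays.dbk").write_text("<article/>")
        (src / "diagram.png").write_bytes(b"\x89PNG")
        copy_support_files(src, out, ["slides", "page"])
        return sorted(p.relative_to(out).as_posix()
                      for p in out.rglob("*") if p.is_file())


print(demo())
```

Each form directory ends up with its own copy of the graphics, so relative image references keep working from every output form.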

3. The Outline

The outline file organizes documents into a hierarchical structure of topics. Normally stored in a file named course.outline, it serves as the input from which the course topics page is generated.

If you have no interest in preparing a topics page (e.g., because you are going to link to each document directly from Blackboard or some other Learning Management System) then you can simply provide a skeleton outline file and ignore the generated page.

The outline file looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<outline>
  <preamble> <!-- 1 -->
    <info>
      <title>CS333 Topics</title>
    </info>
  </preamble>


<topic title="Part I. Coding in C++"> <!-- 2 -->
    <topic title="Overview"> <!-- 3 -->
      <item kind="slides" targetdoc="policiesAndThemes">Course Policies and Themes</item>
    </topic>

    <topic title="Primitive Data Types and Assignments"> <!-- 4 -->
      <item kind="lab"
          date="2010-01-11" enddate="2010-01-15"
          targetdoc="accountSetup"/> <!-- 5 (kind), 6 (dates), 7 (targetdoc) -->
      <item kind="reading">Chapters 1,2</item> <!-- 8 -->
      <item kind="lecturenotes" targetdoc="assignment"/>
      <item kind="quiz"
            href="http://www.course.com/downloads/computerscience/malikcpp3e/selftests2.cfm">
        Self-test (ungraded)</item> <!-- 9 -->
    </topic>

    <topic title="I/O">
      <item kind="reading">Chapter 3</item>
      <item kind="lecturenotes" targetdoc="io"/>
      <item kind="lab" 
          targetdoc="usingCodeBlocks1"/>
      <item kind="quiz" 
            href="http://www.course.com/downloads/computerscience/malikcpp3e/selftests3.cfm">
        Self-test (ungraded)</item>
    </topic>

       ...

    
    <topic title="End of Part I">
        <item kind="asst" date="2010-02-19">All assignments from Part I are due by the end of the day</item>
        <item kind="exam"  id="exam1_s10" href="../assts/exam1.html" date="2010-02-14">Exam 1</item>
        
        
    </topic>
</topic>

<topic title="Part II. Programming in C++">
    ...
</topic>


<postscript> <!-- 10 -->
  <informaltable>
    <tr>
      <th colspan="2">
        Symbol Key
      </th>
    </tr>
    <tr>
      <td>
        <img alt="conference" src="lecture.gif"/>
      </td>
      <td>Conference</td>
    </tr>
    <tr>
      <td>
        <img alt="lecture notes" src="lecturenotes.gif"/>
      </td>
      <td>Lecture Notes</td>
    </tr>
    <tr>
      <td>
        <img alt="slides" src="slides.gif"/>
      </td>
      <td>Slides</td>
    </tr>
    <tr>
      <td>
      <img alt="text" src="text.gif"/></td>
      <td>Text</td>
    </tr>
     ...
  </informaltable>
  <blockquote>
    <para>All times in this schedule are given in Eastern Time.</para>
  </blockquote>
</postscript>

  <presentation> <!-- 11 -->
    <column title="Topics" kinds="topics"/>
    <column title="Lecture Notes" kinds="lecture lecturenotes slides event exam"/>
    <column title="Readings" kinds="reading text"/>
    <column title="Assignments &amp; Quizzes" kinds="exam quiz asst lab unix"/>
  </presentation>
</outline>

1

The preamble can contain any text that you want to appear at the top of the topics page.

2

The actual content is grouped into topics. Each topic can (should) have a title.

3

Topics can nest within other topics.

4

Topics that do not contain other topics can instead hold an arbitrary number of items. Each item represents something a student can read or do.

5

Each item has a "kind". The item kinds are arbitrary labels. They are, however, used to group items into columns in the topics page. They also are used to select a small icon graphic that will appear next to the item in the topics page. For example, this item will be labeled with the icon lab.gif.

6

Each item may have a single date (date=) or a range of dates (date=, enddate=) as shown here.

7

You will want most (though maybe not all) items to appear in the topics page as a hyperlink. There are two ways to do this. The first, shown here, is for documents that are part of the same document set. Use targetdoc (and, optionally, targetptr) to name the desired document.

If the <item> does not contain any text, then the title of the document is fetched automatically and inserted into the topics page as the content of the link.

8

An example of an item that does not turn into a hyperlink.

9

To link to documents that are not part of the same document set, use href=URL. For this style of linking, you must give the document title explicitly in the item.

I often find that I want to leave item titles in the topics page so that students can see that something is coming up, while not providing a hyperlink until the material (e.g., an assignment) is actually ready. Renaming the href attribute to something else (e.g., "hrefx") takes care of this without losing the URL entirely.

You can do something similar with the targetdoc style links, but if you have not provided an explicit title, then breaking the link prevents the system from fetching the document title and the item simply disappears from the topics page.

10

The postscript section contains any text that you would like to appear at the bottom of the topics page. I usually use this area to provide a symbol key for the various icons.

11

The presentation section controls how many columns will be used for the topics page and which items will appear in which column. For each desired column (you need at least one), give a title to appear in the header of that column and then list the item kinds to appear in that column.

A common mistake is to omit item kinds from these lists. Items of any omitted kind will not appear in the topics page.
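One way to catch that mistake before publishing is a small check like the Python sketch below. It is not part of the tool set. The function name unlisted_kinds and the item kind "video" (with its example URL) are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# A cut-down outline; the "video" item and its URL are hypothetical.
OUTLINE = """<outline>
  <topic title="I/O">
    <item kind="reading">Chapter 3</item>
    <item kind="lecturenotes" targetdoc="io"/>
    <item kind="video" href="http://www.example.edu/io.mp4">I/O demo</item>
  </topic>
  <presentation>
    <column title="Topics" kinds="topics"/>
    <column title="Lecture Notes" kinds="lecture lecturenotes slides"/>
    <column title="Readings" kinds="reading text"/>
  </presentation>
</outline>"""


def unlisted_kinds(outline_xml):
    """Return the item kinds that no <column> in <presentation> lists."""
    root = ET.fromstring(outline_xml)
    listed = set()
    for column in root.iter("column"):
        listed.update(column.get("kinds", "").split())
    used = {item.get("kind") for item in root.iter("item") if item.get("kind")}
    return sorted(used - listed)


print(unlisted_kinds(OUTLINE))  # ['video']
```

Any kind reported here belongs to items that would silently vanish from the topics page until it is added to one of the column lists.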