Documentation and Documentation Generators

Steven J Zeil

Last modified: Apr 1, 2020

Contents:

1 Source Code Documentation

1.1 Comments

1.2 Self-Documenting Code

1.3 Charting

2 API Documentation

2.1 javadoc

2.2 doxygen

2.3 Other API Documentation Generators

… because everyone loves writing documentation.

1 Source Code Documentation

1.1 Comments

widely used
widely abused

1.1.1 Do Comments Matter?

McConnell has a good & balanced discussion on this.

Source code commenting is often a crutch to hide
- poor naming
- a failure to extract code blocks into recognizable functions
- poor design
- lack of quality tools (version control, issue tracking, source formatters)
Still useful for
- Explaining why a thing is being done
- Documenting a pseudo-code based design
- Cross-referencing related items

Modern focus has shifted considerably away from commenting code bodies and towards API documentation.

1.1.2 Which is better?

double m; // mean average
double s; // standard deviation

double meanAverage;
double standardDeviation;

1.1.3 Which is better?

// Sum up the data
double sum = 0.0;
double sumSquares = 0.0;
// Add up the sums
for (double d: scores)
{
   sum += d;
   sumSquares += d*d;
}

// Compute the average and standard
//  deviation
double meanAverage = sum / numScores;
double standardDeviation =
   sqrt ((sumSquares - numScores*meanAverage*meanAverage)
            /(numScores - 1.0));

// Subtract the average from each data
// item and divide by the standard
// deviation.
for (int i = 0; i < numScores; ++i)
{
   scores[i] = (scores[i] - meanAverage)
       / standardDeviation;
}

// Compute summary statistics
double sum = 0.0;
double sumSquares = 0.0;

for (double d: scores)
{
   sum += d;
   sumSquares += d*d;
}

double meanAverage = sum / numScores;
double standardDeviation =
   sqrt ((sumSquares - numScores*meanAverage*meanAverage)
            / (numScores - 1.0));

// Normalize the scores
for (int i = 0; i < numScores; ++i)
{
   scores[i] = (scores[i] - meanAverage)
       / standardDeviation;
}

1.1.4 Which is better?

// Compute summary statistics
double sum = 0.0;
double sumSquares = 0.0;

for (double d: scores)
{
   sum += d;
   sumSquares += d*d;
}

double meanAverage = sum / numScores;
double standardDeviation =
   sqrt ((sumSquares - numScores*meanAverage*meanAverage)
            /(numScores - 1.0));

// Normalize the scores
for (int i = 0; i < numScores; ++i)
   scores[i] = (scores[i] - meanAverage)
       / standardDeviation;

void computeSummaryStatistics (
   const double* scores,      // inputs
   int numScores,
   double& meanAverage,       // outputs
   double& standardDeviation)
{
  double sum = 0.0;
  double sumSquares = 0.0;
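  // scores is a pointer, so a range-based for cannot be used here
  for (int i = 0; i < numScores; ++i)
  {
     sum += scores[i];
     sumSquares += scores[i]*scores[i];
  }

  meanAverage = sum / numScores;
  standardDeviation =
     sqrt ((sumSquares - numScores*meanAverage*meanAverage)
              /(numScores - 1.0));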
}


void normalizeData (double* data,
                    int numData,
                    double center,
					double spread)
{
  for (int i = 0; i < numData; ++i)
    data[i] = (data[i] - center) / spread;
}

    ⋮

double meanAverage;
double standardDeviation;
computeSummaryStatistics (scores, numScores,
    meanAverage, standardDeviation);
normalizeData (scores, numScores,
    meanAverage, standardDeviation);

1.1.5 Which is better?

void computeSummaryStatistics (
   const double* scores,      // inputs
   int numScores,
   double& meanAverage,       // outputs
   double& standardDeviation)
{
  double sum = 0.0;
  double sumSquares = 0.0;

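  // scores is a pointer, so a range-based for cannot be used here
  for (int i = 0; i < numScores; ++i)
  {
     sum += scores[i];
     sumSquares += scores[i]*scores[i];
  }

  meanAverage = sum / numScores;
  standardDeviation =
     sqrt ((sumSquares - numScores*meanAverage*meanAverage)
              /(numScores - 1.0));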
}


void normalizeData (double* data,
                    int numData,
                    double center,
					double spread)
{
  for (int i = 0; i < numData; ++i)
	 data[i] = (data[i] - center) / spread;
}
    ⋮

double meanAverage;
double standardDeviation;
computeSummaryStatistics (scores, numScores,
    meanAverage, standardDeviation);
normalizeData (scores, numScores,
    meanAverage, standardDeviation);

void computeSummaryStatistics (
   const double* scores,      // inputs
   int numScores,
   double& meanAverage,       // outputs
   double& standardDeviation)
{
  // accumulate is std::accumulate from <numeric>;
  // it needs an initial value (0.0)
  double sum = accumulate(
     scores, scores+numScores, 0.0);
  double sumSquares = accumulate(
     scores, scores+numScores, 0.0,
     [](double x, double y)
       {return x + y*y;});

  meanAverage = sum / numScores;
  standardDeviation =
     sqrt ((sumSquares - numScores*meanAverage*meanAverage)
              /(numScores - 1.0));
}


    ⋮

// Normalize the scores
double meanAverage;
double standardDeviation;
computeSummaryStatistics (scores, numScores,
    meanAverage, standardDeviation);
// transform is std::transform from <algorithm>;
// the lambda must capture the two statistics
transform (
    scores, scores+numScores,
    scores,
    [=] (double d) {
      return (d - meanAverage)
               / standardDeviation;});

1.1.6 Kinds of Comments

Repeat of the code
- Useless
Explanation of the code
- Only useful if the code is confusing
- In which case, first priority should be to simplify the code.
Markers
- notes not intended to be left in final code
- If standardized, useful
  - e.g., // TODO in Eclipse
    - flagged in editor
    - easily searched for
Summary of the code
- applied to entire code “paragraphs”
- useful to allow easy skimming of the code
Description of intent
- similar to summary, but describes problem rather than solution
Information that cannot be expressed in the code
- e.g., authors, copyright, date of modification
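
As a rough illustration of these categories, here is a small C++ sketch. The function meanScore and the placeholder author line are hypothetical, invented only so that each kind of comment has something to attach to.

```cpp
#include <cassert>
#include <vector>

// Author: (placeholder name)     <- information that cannot be
//                                   expressed in the code

// Intent: a report needs one representative value per student, so
// each student's list of scores is reduced to its mean.
double meanScore(const std::vector<double>& scores)
{
    // An empty list has no mean; callers must check first.
    assert(!scores.empty());

    // TODO: consider offering a median for skewed data   <- marker

    // Summary: add up the scores, then divide by how many there are.
    double sum = 0.0;
    for (double score : scores)
        sum += score;   // add score to sum   <- repeat of the code (useless)

    return sum / scores.size();
}
```

The intent and summary comments let a reader skim the routine; the "repeat" comment adds nothing that the statement itself does not already say.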

1.2 Self-Documenting Code

Self-documenting code relies on good programming style to perform most of the documentation.

“the Holy Grail of legibility” (McConnell)

1.2.1 Characteristics of Self-Documenting Code

Classes

Does the class’s interface present a consistent abstraction?

Is the class well named, and does its name describe its central purpose?

Does the class’s interface make obvious how you should use the class?

Is the class’s interface abstract enough that you don’t have to think about how its services are implemented? Can you treat the class as a black box?

Routines

Does each routine’s name describe exactly what the routine does?

Does each routine perform one well-defined task?

Have all parts of each routine that would benefit from being put into their own routines been put into their own routines?

Is each routine’s interface obvious and clear?

Data Names

Are type names descriptive enough to help document data declarations?

Are variables named well?

Are variables used only for the purpose for which they’re named?

Are loop counters given more informative names than i, j, and k?

Are well-named enumerated types used instead of makeshift flags or boolean variables?

Are named constants used instead of magic numbers or magic strings?

Do naming conventions distinguish among type names, enumerated types, named constants, local variables, class variables, and global variables?

Data Organization

Are extra variables used for clarity when needed?

Are references to variables close together?

Are data types simple so that they minimize complexity?

Is complicated data accessed through abstract access routines (abstract data types)?

Control

Is the nominal path through the code clear?

Are related statements grouped together?

Have relatively independent groups of statements been packaged into their own routines?

Does the normal case follow the if rather than the else?

Are control structures simple so that they minimize complexity?

Does each loop perform one and only one function, as a well-defined routine would?

Is nesting minimized?

Have boolean expressions been simplified by using additional boolean variables, boolean functions, and decision tables?

Layout

Does the program’s layout show its logical structure?

Design

Is the code straightforward, and does it avoid cleverness?

Are implementation details hidden as much as possible?

Is the program written in terms of the problem domain as much as possible rather than in terms of computer-science or programming-language structures?

(McConnell, ch 32)
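
A few of the checklist items can be sketched in C++ as follows. This is only an illustration under assumed requirements: the names PassingScore, MaxScore, Outcome, isValidScore, and outcomeFor are all hypothetical and do not come from McConnell.

```cpp
#include <cassert>

// Named constants instead of magic numbers.
const int PassingScore = 60;
const int MaxScore = 100;

// A well-named enumerated type instead of a makeshift boolean flag.
enum class Outcome { Pass, Fail };

// A boolean function gives a compound condition a name.
bool isValidScore(int score)
{
    return score >= 0 && score <= MaxScore;
}

// The routine's name says what it does, and it does one task.
Outcome outcomeFor(int score)
{
    assert(isValidScore(score));
    return (score >= PassingScore) ? Outcome::Pass : Outcome::Fail;
}
```

A call such as outcomeFor(75) reads in terms of the problem domain, and none of the lines above needed an explanatory comment about what a literal 60 or a bare true/false means.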

1.3 Charting

How many forms of software documentation charting do you know?

Control Flow
- Flowcharts
- Nassi-Shneiderman Charts
- State diagrams
- UML interaction diagrams
Module relationships
- Structure (call) charts
- Data-Flow Diagrams
- SADT (Structured Analysis and Design Technique)
- E-R (entity-relationship) diagrams
- UML class relationship diagrams

1.3.1 From Code to Charts

For as long as people have been writing source code, they’ve been looking for ways to ease the effort of documenting that code.

Often after-the-fact

Earliest examples were automatic flowchart generators

Generating flowcharts from source code.

Raw results were poor quality

But still could be claimed to satisfy client requirements

As flowcharts declined in popularity, so did the demand for these tools.

Still offered in reverse engineering tools

Flowchart synced to code viewer

Human retitles blocks as “understanding” of the code progresses

1.3.2 From Charts to Code

A hallmark of so-called CASE (Computer-Aided Software Engineering) systems

Modern versions generate class declarations from UML class diagrams

2 API Documentation

API documentation tools are now more common

Reflect modern emphasis on re-usable interfaces
Combine info from
- a (limited) language parser
  - Extracts info about module/function structure and function parameters
- and specially formatted blocks of comments embedded in the source code
Encourages updating comments as code is modified

Comments become a legitimately useful tool for application writers.
- Application writers have less need to access actual code.
- Generate linked documents to facilitate browsing of referenced type names and other entities
- Some IDEs understand this markup as well and use it to enhance “live” help while editing code.

2.1 javadoc

Perhaps the best known tool in this category

part of the standard Java distribution

achieved prominence when Sun used it to document the Java “standard library”.

E.g., the Java 1.6 and 1.8 API documentation

2.1.1 Javadoc Comments

Javadoc markup is enclosed in comments delineated by /** ... */

And therefore processed as normal comments by the Java compiler.

A comment block precedes the entity that it describes

e.g., This page is generated from this source code.
In addition to “free-form” text, can contain special markup

Common Javadoc Markup

@author authorName
@version versionNumber
@param name description
@return description
@throws exceptionClassName description
@see crossReference

Running javadoc

Command line
```
javadoc -d destinationDir -sourcepath sourceCodeDir \
    -link https://docs.oracle.com/javase/7/docs/api/ \
    -subpackages packageName
```
- Can add multiple source paths, links to external libraries
- Must also name the packages (e.g., with -subpackages) or source files to document
Eclipse: Project ⇒ Generate Javadoc ...

2.1.2 JavaDoc and build managers

Ant

Ant has a javadoc task among its default task set.

A typical invocation might be:

<javadoc packagenames="edu.odu.cs.*"
         destdir="target/javadoc"
         classpathref="javadoc.classpath" Author="yes"
         Version="yes" Use="yes" defaultexcludes="yes">
   <fileset dir="." defaultexcludes="yes">
      <include name="extractor/src/main/java/**" />
      <exclude name="**/*.html" />
   </fileset>
   <doctitle><![CDATA[<h1>ODU CS Extract
                    Project</h1>]]></doctitle>
</javadoc>

Gradle

Gradle's java plugin provides a javadoc task.

Here is the full build.gradle for a simple Java build with Javadoc generation.

plugins {
   id 'java'       ➀
}

repositories {
    mavenCentral()
}

dependencies {
    testImplementation("junit:junit:4.12")
    testRuntimeOnly("org.junit.vintage:junit-vintage-engine:5.5.2")
}

test {
    useJUnit()
    ignoreFailures = true
}

javadoc {                  ➁
   options.with {
     links 'https://docs.oracle.com/javase/8/docs/api/', 'gradle/javadocs/jdk'
   }
   failOnError = false;  ➂
}

build.dependsOn javadoc    ➃

➀ The java plugin supplies the javadoc task; no separate plugin is needed.
➁ Include links to the online Java API documentation.
➂ Don’t stop the build just because javadoc finds a problem.
➃ Tell Gradle to run the javadoc task before the build task.

2.2 doxygen

the most popular API generator for C/C++
- Also works with Objective-C, C#, Java, IDL, Python, PHP, VHDL, and FORTRAN
Markup is essentially identical to javadoc
- Internal formatting available via Markdown
Output can be HTML, LaTeX, or RTF
Can also generate
- various not-quite-UML diagrams
- and hyperlinked source code
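
Because the markup is so close to javadoc, a doxygen comment on a C++ function can look like the sketch below. The function standardDeviation is a hypothetical example written for these notes (it reuses the summary-statistics computation from earlier), and the sketch assumes doxygen's usual acceptance of Javadoc-style /** ... */ blocks.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

/**
 * Compute the sample standard deviation of an array of scores.
 *
 * @param scores     pointer to the first score; must not be null
 * @param numScores  number of scores in the array
 * @return the sample standard deviation of the scores
 * @throws std::invalid_argument if numScores is less than 2
 */
double standardDeviation(const double* scores, int numScores)
{
    assert(scores != nullptr);
    if (numScores < 2)
        throw std::invalid_argument("need at least two scores");

    double sum = 0.0;
    double sumSquares = 0.0;
    for (int i = 0; i < numScores; ++i)
    {
        sum += scores[i];
        sumSquares += scores[i] * scores[i];
    }

    double meanAverage = sum / numScores;
    return std::sqrt((sumSquares - numScores * meanAverage * meanAverage)
                     / (numScores - 1.0));
}
```

The comment block precedes the function it describes, just as in javadoc, and the compiler treats it as an ordinary comment.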

Running doxygen

Command line

doxygen configFile

The config file can contain any of a bewildering set of options in typical property-file style:

PROJECT_NAME = "C++ Spreadsheet"
INPUT = src/model
OUTPUT_DIRECTORY = target/doc
EXTRACT_ALL = YES
CLASS_DIAGRAMS = YES
GENERATE_HTML = YES
GENERATE_LATEX = YES
USE_PDFLATEX = YES

Eclipse: Eclox plugin
Ant task for doxygen
Gradle plugins for doxygen (untried)

2.3 Other API Documentation Generators

Because a documentation generator needs to parse module and function structure and function parameters, a distinct parser is needed for each programming language.

This leads to a variety of language-specific tools, e.g.,

JSDoc for JavaScript
YARD for Ruby
Sandcastle for .NET