Documentation and Documentation Generators
Steven J Zeil
… because everyone loves writing documentation.
1 Source Code Documentation
1.1 Comments
- widely used
- widely abused
1.1.1 Do Comments Matter?
McConnell has a good & balanced discussion on this.
- Source code commenting is often a crutch to hide
  - poor naming
  - a failure to extract code blocks into recognizable functions
  - poor design
  - lack of quality tools (version control, issue tracking, source formatters)
- Still useful for
  - Explaining why a thing is being done
  - Documenting a pseudo-code based design
  - Cross-referencing related items
Modern focus has shifted considerably away from commenting function bodies and towards API documentation.
1.1.2 Which is better?
double m; // mean average
double s; // standard deviation

double meanAverage;
double standardDeviation;
1.1.3 Which is better?
// Sum up the data
double sum = 0.0;
double sumSquares = 0.0;
// Add up the sums
for (double d: scores)
{
    sum += d;
    sumSquares += d*d;
}
// Compute the average and standard
// deviation
double meanAverage = sum / numScores;
double standardDeviation =
    sqrt ((sumSquares - sum*sum/numScores)
          / (numScores - 1.0));
// Subtract the average from each data
// item and divide by the standard
// deviation.
for (int i = 0; i < numScores; ++i)
{
    scores[i] = (scores[i] - meanAverage)
                / standardDeviation;
}

// Compute summary statistics
double sum = 0.0;
double sumSquares = 0.0;
for (double d: scores)
{
    sum += d;
    sumSquares += d*d;
}
double meanAverage = sum / numScores;
double standardDeviation =
    sqrt ((sumSquares - sum*sum/numScores)
          / (numScores - 1.0));
// Normalize the scores
for (int i = 0; i < numScores; ++i)
{
    scores[i] = (scores[i] - meanAverage)
                / standardDeviation;
}
1.1.4 Which is better?
// Compute summary statistics
double sum = 0.0;
double sumSquares = 0.0;
for (double d: scores)
{
    sum += d;
    sumSquares += d*d;
}
double meanAverage = sum / numScores;
double standardDeviation =
    sqrt ((sumSquares - sum*sum/numScores)
          / (numScores - 1.0));
// Normalize the scores
for (int i = 0; i < numScores; ++i)
    scores[i] = (scores[i] - meanAverage)
                / standardDeviation;

void computeSummaryStatistics (
    const double* scores,      // inputs
    int numScores,
    double& meanAverage,       // outputs
    double& standardDeviation)
{
    double sum = 0.0;
    double sumSquares = 0.0;
    for (int i = 0; i < numScores; ++i)
    {
        sum += scores[i];
        sumSquares += scores[i]*scores[i];
    }
    meanAverage = sum / numScores;
    standardDeviation =
        sqrt ((sumSquares - sum*sum/numScores)
              / (numScores - 1.0));
}
void normalizeData (double* data,
                    int numData,
                    double center,
                    double spread)
{
    for (int i = 0; i < numData; ++i)
        data[i] = (data[i] - center) / spread;
}
⋮
double meanAverage;
double standardDeviation;
computeSummaryStatistics (scores, numScores,
    meanAverage, standardDeviation);
normalizeData (scores, numScores,
    meanAverage, standardDeviation);
1.1.5 Which is better?
void computeSummaryStatistics (
    const double* scores,      // inputs
    int numScores,
    double& meanAverage,       // outputs
    double& standardDeviation)
{
    double sum = 0.0;
    double sumSquares = 0.0;
    for (int i = 0; i < numScores; ++i)
    {
        sum += scores[i];
        sumSquares += scores[i]*scores[i];
    }
    meanAverage = sum / numScores;
    standardDeviation =
        sqrt ((sumSquares - sum*sum/numScores)
              / (numScores - 1.0));
}
void normalizeData (double* data,
                    int numData,
                    double center,
                    double spread)
{
    for (int i = 0; i < numData; ++i)
        data[i] = (data[i] - center) / spread;
}
⋮
double meanAverage;
double standardDeviation;
computeSummaryStatistics (scores, numScores,
    meanAverage, standardDeviation);
normalizeData (scores, numScores,
    meanAverage, standardDeviation);

void computeSummaryStatistics (
    const double* scores,      // inputs
    int numScores,
    double& meanAverage,       // outputs
    double& standardDeviation)
{
    double sum = accumulate(
        scores, scores+numScores, 0.0);
    double sumSquares = accumulate(
        scores, scores+numScores, 0.0,
        [](double x, double y)
          {return x + y*y;});
    meanAverage = sum / numScores;
    standardDeviation =
        sqrt ((sumSquares - sum*sum/numScores)
              / (numScores - 1.0));
}
⋮
// Normalize the scores
double meanAverage;
double standardDeviation;
computeSummaryStatistics (scores, numScores,
    meanAverage, standardDeviation);
transform (
    scores, scores+numScores,
    scores,
    [=] (double d) {
        return (d - meanAverage)
               / standardDeviation;});
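For readers who want to compile the algorithm-based variant, here is one possible self-contained sketch. It makes assumptions the slides do not: the scores are held in a std::vector, the two results are returned in a hypothetical SummaryStatistics struct rather than through reference parameters, and the usual sample standard deviation (n - 1 denominator) is intended.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical result holder; not part of the original notes.
struct SummaryStatistics {
    double meanAverage;
    double standardDeviation;
};

// Computes the mean and the sample standard deviation of the scores.
// Assumes at least two scores, so that the n - 1 denominator is nonzero.
SummaryStatistics computeSummaryStatistics(const std::vector<double>& scores)
{
    assert(scores.size() >= 2);
    const double numScores = static_cast<double>(scores.size());
    const double sum = std::accumulate(scores.begin(), scores.end(), 0.0);
    const double sumSquares = std::accumulate(
        scores.begin(), scores.end(), 0.0,
        [](double total, double d) { return total + d * d; });
    const double meanAverage = sum / numScores;
    const double variance =
        (sumSquares - numScores * meanAverage * meanAverage)
        / (numScores - 1.0);
    return {meanAverage, std::sqrt(variance)};
}

// Shifts each item by center and scales it by spread, in place.
// Assumes spread is nonzero.
void normalizeData(std::vector<double>& data, double center, double spread)
{
    std::transform(data.begin(), data.end(), data.begin(),
        [center, spread](double d) { return (d - center) / spread; });
}
```

A caller would compute the statistics once and pass them to normalizeData, mirroring the two calls in the earlier variants.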
1.1.6 Kinds of Comments
- Repeat of the code
  - Useless
- Explanation of the code
  - Only useful if the code is confusing
    - In which case, first priority should be to simplify the code.
- Markers
  - notes not intended to be left in final code
  - If standardized, useful
    - e.g., // TODO in Eclipse
      - flagged in editor
      - easily searched for
- Summary of the code
  - applied to entire code “paragraphs”
  - useful to allow easy skimming of the code
- Description of intent
  - similar to summary, but describes problem rather than solution
- Information that cannot be expressed in the code
  - e.g., authors, copyright, date of modification
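Several of these kinds of comments can be seen side by side in a small sketch. The function countPassing and the grading policy it mentions are hypothetical, invented only to give the comments something to describe.

```cpp
#include <vector>

// Hypothetical example: counts the scores at or above a passing mark.
int countPassing(const std::vector<int>& scores, int passingMark)
{
    int passing = 0;   // set passing to zero  (repeat of the code: useless)

    // Count the scores that meet the passing mark.  (summary of a "paragraph")
    for (int score : scores)
    {
        // A score equal to the mark passes, because the assumed grading
        // policy treats the mark as the minimum acceptable score.
        // (description of intent: states the problem, not the mechanics)
        if (score >= passingMark)
            ++passing;
    }

    // TODO: decide how missing scores should be recorded.  (marker)
    return passing;
}
```

Deleting the first comment loses nothing, while the intent comment records a decision that the comparison operator alone cannot explain.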
1.2 Self-Documenting Code
Self-Documenting code relies on good programming style to perform most of the documentation.
- “the Holy Grail of legibility” (McConnell)
1.2.1 Characteristics of Self-Documenting Code
Classes
Does the class’s interface present a consistent abstraction?
Is the class well named, and does its name describe its central purpose?
Does the class’s interface make obvious how you should use the class?
Is the class’s interface abstract enough that you don’t have to think about how its services are implemented? Can you treat the class as a black box?
Routines
Does each routine’s name describe exactly what the routine does?
Does each routine perform one well-defined task?
Have all parts of each routine that would benefit from being put into their own routines been put into their own routines?
Is each routine’s interface obvious and clear?
Data Names
Are type names descriptive enough to help document data declarations?
Are variables named well?
Are variables used only for the purpose for which they’re named?
Are loop counters given more informative names than i, j, and k?
Are well-named enumerated types used instead of makeshift flags or boolean variables?
Are named constants used instead of magic numbers or magic strings?
Do naming conventions distinguish among type names, enumerated types, named constants, local variables, class variables, and global variables?
Data Organization
Are extra variables used for clarity when needed?
Are references to variables close together?
Are data types simple so that they minimize complexity?
Is complicated data accessed through abstract access routines (abstract data types)?
Control
Is the nominal path through the code clear?
Are related statements grouped together?
Have relatively independent groups of statements been packaged into their own routines?
Does the normal case follow the if rather than the else?
Are control structures simple so that they minimize complexity?
Does each loop perform one and only one function, as a well-defined routine would?
Is nesting minimized?
Have boolean expressions been simplified by using additional boolean variables, boolean functions, and decision tables?
Layout
Does the program’s layout show its logical structure?
Design
Is the code straightforward, and does it avoid cleverness?
Are implementation details hidden as much as possible?
Is the program written in terms of the problem domain as much as possible rather than in terms of computer-science or programming-language structures?
(McConnell, ch 32)
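A few of the Data Names and Routines items can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the Grade type, the two threshold constants and their values, and the functions gradeFor and describe.

```cpp
#include <string>

// A well-named enumerated type replaces makeshift integer flags.
enum class Grade { Fail, Pass, Distinction };

// Named constants replace magic numbers (threshold values are assumed).
constexpr int passingMark = 50;
constexpr int distinctionMark = 85;

// The routine name says what it does, and it performs one task.
Grade gradeFor(int score)
{
    if (score >= distinctionMark)
        return Grade::Distinction;
    if (score >= passingMark)
        return Grade::Pass;
    return Grade::Fail;
}

// Converts a grade to a display string.
std::string describe(Grade grade)
{
    switch (grade)
    {
        case Grade::Distinction: return "distinction";
        case Grade::Pass:        return "pass";
        case Grade::Fail:        return "fail";
    }
    return "unknown";   // not reached for valid Grade values
}
```

Compare this with a version that returns 0, 1, or 2 and tests score >= 85 inline: the logic is identical, but the reader would need a comment to learn what each number means.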
1.3 Charting
How many forms of software documentation charting do you know?
- Control Flow
  - Flowcharts
  - Nassi-Shneiderman Charts
  - State diagrams
  - UML interaction diagrams
- Module relationships
  - Structure (call) charts
  - Data-Flow Diagrams
  - SADT (Structured Analysis and Design Technique)
  - E-R
  - UML class relationship diagrams
1.3.1 From Code to Charts
- For as long as people have been writing source code, they’ve been looking for ways to ease the effort of documenting that code.
  - Often after-the-fact
- Earliest examples were automatic flowchart generators
  - Generating flowcharts from source code.
  - Raw results were poor quality
  - But still could be claimed to satisfy client requirements
  - As flowcharts declined in popularity, so did the demand for these tools.
- Still offered in reverse engineering tools
  - Flowchart synced to code viewer
  - Human retitles blocks as “understanding” of the code progresses
1.3.2 From Charts to Code
A hallmark of so-called CASE (Computer-Aided Software Engineering) systems
- Modern versions generate class declarations from UML class diagrams
2 API Documentation
API documentation tools are now more common
- Reflect modern emphasis on re-usable interfaces
- Combine info from
  - a (limited) language parser
    - Extracts info about module/function structure and function parameters
  - and specially formatted blocks of comments embedded in the source code
- Encourages updating comments as code is modified
  - Comments become a legitimately useful tool for application writers.
  - Application writers have less need to access actual code.
- Generate linked documents to facilitate browsing of referenced type names and other entities
- Some IDEs understand this markup as well and use it to enhance “live” help while editing code.
2.1 javadoc
Perhaps the best known tool in this category
- part of the standard Java distribution
- achieved prominence when Sun used it to document the Java “standard library”.
2.1.1 Javadoc Comments
- Javadoc markup is enclosed in comments delimited by /** ... */
  - And therefore treated as ordinary comments by the Java compiler.
- A comment block precedes the entity that it describes
  - e.g., This page is generated from this source code.
- In addition to “free-form” text, can contain special markup
Common Javadoc Markup
- @author authorName
- @version versionNumber
- @param name description
- @return description
- @throws exceptionClassName description
- @see crossReference
Running javadoc
- Command line
  javadoc -d destinationDir -sourcepath sourceCodeDir \
      -link http://docs.oracle.com/javase/7/docs/api/
  - Can add multiple source paths, links to external libraries
  - Can also specify which packages from source code to document
- Eclipse: Project ⇒ Generate Javadoc ...
2.1.2 JavaDoc and build managers
Ant
Ant has a javadoc task among its default task set.
A typical invocation might be:
<javadoc packagenames="edu.odu.cs.*"
         destdir="target/javadoc"
         classpathref="javadoc.classpath" Author="yes"
         Version="yes" Use="yes" defaultexcludes="yes">
  <fileset dir="." defaultexcludes="yes">
    <include name="extractor/src/main/java/**" />
    <exclude name="**/*.html" />
  </fileset>
  <doctitle><![CDATA[<h1>ODU CS Extract Project</h1>]]></doctitle>
</javadoc>
Gradle
Gradle’s java plugin provides a javadoc task.
Here is the full build.gradle for a simple Java build with Javadoc generation.
2.2 doxygen
- the most popular API documentation generator for C/C++
  - Also works with Objective-C, C#, Java, IDL, Python, PHP, VHDL, and FORTRAN
- Markup is essentially identical to javadoc
  - Internal formatting available via Markdown
- Output can be HTML, LaTeX, or RTF
- Can also generate
  - various not-quite-UML diagrams
  - and hyperlinked source code
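As a rough illustration of how closely the markup follows javadoc, here is a sketch of a doxygen-style comment block on a C++ function. The function meanAverage is hypothetical, written only to carry the comment; the tags are the same @param, @return, and @throws listed for Javadoc, plus doxygen's @brief.

```cpp
#include <stdexcept>
#include <vector>

/**
 * @brief Computes the arithmetic mean of a collection of scores.
 *
 * Hypothetical function used only to illustrate the markup.
 *
 * @param scores the values to average; must not be empty
 * @return the sum of the scores divided by their count
 * @throws std::invalid_argument if scores is empty
 */
double meanAverage(const std::vector<double>& scores)
{
    if (scores.empty())
        throw std::invalid_argument("scores must not be empty");
    double sum = 0.0;
    for (double d : scores)
        sum += d;
    return sum / static_cast<double>(scores.size());
}
```

The compiler treats the block as an ordinary comment; only the documentation generator reads the tags.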
Running doxygen
- Command line
  doxygen configFile
  - The config file can contain any of a bewildering set of options in typical property-file style:
    PROJECT_NAME = "C++ Spreadsheet"
    INPUT = src/model
    OUTPUT_DIRECTORY = target/doc
    EXTRACT_ALL = YES
    CLASS_DIAGRAMS = YES
    GENERATE_HTML = YES
    GENERATE_LATEX = YES
    USE_PDFLATEX = YES
- Eclipse: Eclox plugin
- Ant task for doxygen
- Gradle plugins for doxygen (untried)
2.3 Other API Documentation Generators
Because a documentation generator needs to extract module and function structure and function parameters, a distinct parser is needed for each programming language.
This leads to a variety of language-specific tools, e.g.,
- JSDoc for JavaScript
- YARD for Ruby
- Sandcastle for .NET