Program Analysis Tools

Steven J Zeil

Last modified: Nov 30, 2020

Abstract

In this lesson we look at a variety of code analysis tools available to the practicing software developer. These include static analysis tools that examine code without executing it, and dynamic analysis tools that monitor code while it is being run on tests or in operation.

We will look at the kinds of information that developers can obtain from these tools, the potential value offered by this information, and how such tools can be integrated into an automated build or a continuous integration setup.


Classifying Analysis Tools


Analysis Tools and Compilers

Analysis tools, particularly static ones, share a great deal with compilers.

1 Representing Programs

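Most static analysis is based upon one of two kinds of graph: abstract syntax trees (and their generalization, abstract syntax graphs) or control flow graphs.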

That’s “graphs” in the discrete mathematics (CS 381) or data structures (CS 361) sense: a collection of nodes connected by edges, not the sense of points plotted on X-Y axes.

1.1 Abstract Syntax Trees (ASTs)
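
As a small illustration of what an AST records, the sketch below uses Python's standard `ast` module as a stand-in for the front end of a compiler. The helper names `node_kinds` and `top_level_operator` are made up for this example, and the sample expressions simply borrow variable names from the SQRT code used later in these notes.

```python
import ast

def node_kinds(source):
    """Return the class name of every node in the AST of `source`."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

def top_level_operator(expression):
    """Name the operator at the root of a binary expression's AST.

    Assumes `expression` parses to a single binary operation.
    """
    tree = ast.parse(expression, mode="eval")
    return type(tree.body.op).__name__

if __name__ == "__main__":
    # Operator precedence is encoded in the shape of the tree, not in
    # parentheses: the root operator is the one applied last.
    print(top_level_operator("x2 + h * 2"))
    print(top_level_operator("(x2 + h) * 2"))
    print(node_kinds("x1 = x2 + h * 2"))
```

The parentheses of the second expression do not appear as nodes at all, which is what makes the tree "abstract".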

 



1.1.1 Abstract Syntax Graphs

 

1.2 Control Flow Graphs

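Represent each executable statement in the code as a node, with an edge from one node to another whenever control can pass directly from the first statement to the second.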

1.2.1 Sample CFG

 

01: procedure SQRT (Q, A, B: in float;
02:                 X: out float);
03: // Compute X = square root of Q,
04: //    given that A <= X <= B
05:    X1, X2, F1, F2, H: float;
06: begin
07:    X1 := A;
08:    X2 := B;
09:    F1 := Q - X1**2;
10:    H := X2 - X1;
11:    while (ABS(H) >= 0.001) loop
12:       F2 := Q - X2**2;
13:       H := - F2 * ((X2-X1)/(F2-F1));
14:       X1 := X2;
15:       X2 := X2 + H;
16:       F1 := F2;
17:    end loop;
18:    X := (X1 + X2) / 2.0;
19: end SQRT;
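
One way to write this statement-level CFG down is as a table of successor lists. The Python sketch below assumes that each node is named by the line number of its statement in the listing above, and that the `end loop` on line 17 contributes only the back edge from line 16 to the loop test on line 11. The names `SQRT_CFG` and `predecessors` are illustrative choices.

```python
# Successor lists: one node per executable statement, keyed by line number.
SQRT_CFG = {
    7: [8],
    8: [9],
    9: [10],
    10: [11],
    11: [12, 18],   # loop test: enter the body or leave the loop
    12: [13],
    13: [14],
    14: [15],
    15: [16],
    16: [11],       # back edge to the loop test
    18: [],         # last statement before the procedure returns
}

def predecessors(cfg):
    """Invert the successor lists to obtain predecessor lists."""
    preds = {node: [] for node in cfg}
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].append(node)
    return preds

if __name__ == "__main__":
    for node, preds in sorted(predecessors(SQRT_CFG).items()):
        print(node, "<-", preds)
```

The loop test is the only node with two successors and the only node with two predecessors; every other statement simply falls through to the next.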

Simplifying CFGs: Basic Blocks

 

procedure SQRT (Q, A, B: in float; //  node 0
                X: out float);
// Compute X = square root of Q,
//    given that A <= X <= B
   X1, X2, F1, F2, H: float;
begin
   X1 := A;
   X2 := B;                        // node 1
   F1 := Q - X1**2;
   H := X2 - X1;
   while (ABS(H) >= 0.001) loop    // node 2
      F2 := Q - X2**2;
      H := - F2 * ((X2-X1)/(F2-F1));
      X1 := X2;                    // node 3
      X2 := X2 + H;
      F1 := F2;
   end loop;
   X := (X1 + X2) / 2.0;           // node 4
end SQRT;                          // node 5
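
A basic block is a maximal straight-line run of statements that can be entered only at its first statement and left only at its last. The sketch below shows one possible way to collapse a statement-level CFG into basic blocks. It assumes the statements of SQRT are identified by their position in the listing (the first assignment, `X1 := A`, is statement 7), and the function name `basic_blocks` is hypothetical.

```python
# Statement-level CFG of SQRT as successor lists (assumed numbering).
SQRT_CFG = {
    7: [8], 8: [9], 9: [10], 10: [11],
    11: [12, 18],
    12: [13], 13: [14], 14: [15], 15: [16], 16: [11],
    18: [],
}

def basic_blocks(cfg, entry):
    """Group statements into maximal single-entry, single-exit runs."""
    preds = {node: [] for node in cfg}
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].append(node)

    def is_leader(node):
        # A statement starts a block if it is the entry, if it is a join
        # point, or if its only predecessor is a branch.
        if node == entry or len(preds[node]) != 1:
            return True
        return len(cfg[preds[node][0]]) > 1

    blocks = []
    for node in sorted(cfg):
        if not is_leader(node):
            continue
        block = [node]
        # Extend the block while control can only fall through.
        while len(cfg[block[-1]]) == 1 and not is_leader(cfg[block[-1]][0]):
            block.append(cfg[block[-1]][0])
        blocks.append(block)
    return blocks

if __name__ == "__main__":
    for block in basic_blocks(SQRT_CFG, entry=7):
        print(block)
```

The four blocks this produces correspond to nodes 1 through 4 in the listing above; nodes 0 and 5 stand for procedure entry and exit.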

1.2.2 Data Flow Analysis


Data-Flow Annotated CFG

 

procedure SQRT (Q, A, B: in float; //  node 0
                X: out float);
// Compute X = square root of Q,
//    given that A <= X <= B
   X1, X2, F1, F2, H: float;
begin
   X1 := A;
   X2 := B;                        // node 1
   F1 := Q - X1**2;
   H := X2 - X1;
   while (ABS(H) >= 0.001) loop    // node 2
      F2 := Q - X2**2;
      H := - F2 * ((X2-X1)/(F2-F1));
      X1 := X2;                    // node 3
      X2 := X2 + H;
      F1 := F2;
   end loop;
   X := (X1 + X2) / 2.0;           // node 4
end SQRT;                          // node 5

1.2.3 Reaching Definitions

 

A definition d_i(x), that is, a definition of variable x at node n_i, reaches a node n_j iff there exists a path from n_i to n_j on which x is neither defined nor undefined.

What definitions reach the reference to X1 in node 4?

What definitions reach the reference to H in node 2?
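
Questions like these can be checked mechanically. The following Python sketch runs the usual iterative reaching-definitions computation over the six-node CFG of SQRT. It assumes that the in parameters Q, A and B count as being defined at node 0, and it reports the definitions that reach the entry of each node. The names `EDGES`, `DEFS` and `definitions_reaching` are illustrative.

```python
# Basic-block CFG of SQRT: successor lists for nodes 0..5.
EDGES = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: [5], 5: []}

# Variables defined in each node (parameters are treated as defined at node 0).
DEFS = {
    0: {"Q", "A", "B"},
    1: {"X1", "X2", "F1", "H"},
    2: set(),
    3: {"F2", "H", "X1", "X2", "F1"},
    4: {"X"},
    5: set(),
}

def reaching_definitions(edges, defs):
    """Return, for each node, the set of (variable, defining node) pairs
    that reach the entry of that node."""
    preds = {n: [] for n in edges}
    for n, succs in edges.items():
        for s in succs:
            preds[s].append(n)

    reach_in = {n: set() for n in edges}
    reach_out = {n: set() for n in edges}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for n in edges:
            new_in = set()
            for p in preds[n]:
                new_in |= reach_out[p]
            # Definitions of a variable redefined here do not survive.
            new_out = {(v, d) for (v, d) in new_in if v not in defs[n]}
            new_out |= {(v, n) for v in defs[n]}
            if new_in != reach_in[n] or new_out != reach_out[n]:
                reach_in[n], reach_out[n] = new_in, new_out
                changed = True
    return reach_in

def definitions_reaching(node, variable):
    """List the nodes whose definition of `variable` reaches `node`."""
    reach_in = reaching_definitions(EDGES, DEFS)
    return sorted(d for (v, d) in reach_in[node] if v == variable)

if __name__ == "__main__":
    print("X1 at node 4:", definitions_reaching(4, "X1"))
    print("H at node 2:", definitions_reaching(2, "H"))
```

Working the two questions by hand first and then comparing against the program's output is a good way to check your understanding of the definition.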

1.2.4 Data Flow Anomalies

The reaching definitions problem can be used to detect anomalous patterns that may reflect errors.
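
As one hedged example of such a check, the sketch below flags references to possibly undefined variables. It treats "undefined" as a pseudo-definition that every local variable receives on entry, then reports any use that this pseudo-definition can reach. The four-node program, the function name `undefined_references`, and the `uses` table (variables referenced in a node before any definition in that same node) are all invented for illustration.

```python
UNDEFINED = "undefined"   # pseudo-definition: the variable has no value yet

def undefined_references(edges, defs, uses, entry, local_vars):
    """Return (node, variable) pairs where an undefined value may be used."""
    preds = {n: [] for n in edges}
    for n, succs in edges.items():
        for s in succs:
            preds[s].append(n)

    reach_in = {n: set() for n in edges}
    reach_out = {n: set() for n in edges}
    changed = True
    while changed:
        changed = False
        for n in edges:
            new_in = set()
            if n == entry:
                new_in = {(v, UNDEFINED) for v in local_vars}
            for p in preds[n]:
                new_in |= reach_out[p]
            new_out = {(v, d) for (v, d) in new_in if v not in defs[n]}
            new_out |= {(v, n) for v in defs[n]}
            if new_in != reach_in[n] or new_out != reach_out[n]:
                reach_in[n], reach_out[n] = new_in, new_out
                changed = True

    return sorted((n, v) for n in edges for v in uses[n]
                  if (v, UNDEFINED) in reach_in[n])

# Hypothetical program:
#   node 0: entry (parameter flag)     node 2: x := 1
#   node 1: if flag then ...           node 3: print(x)
EDGES = {0: [1], 1: [2, 3], 2: [3], 3: []}
DEFS = {0: {"flag"}, 1: set(), 2: {"x"}, 3: set()}
USES = {0: set(), 1: {"flag"}, 2: set(), 3: {"x"}}

if __name__ == "__main__":
    print(undefined_references(EDGES, DEFS, USES, entry=0, local_vars={"x"}))
```

Here the path 0, 1, 3 skips the assignment in node 2, so the use of x in node 3 is reported. A report like this is only a warning: the analysis considers every path in the graph, including ones that can never actually be executed.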

1.2.5 Available Expressions

An expression e is available at a node n iff every path from the start of the program to n evaluates e and, after the last evaluation of e on each such path, there are no subsequent definitions or undefinitions of the variables in e.

 

procedure SQRT (Q, A, B: in float; //  node 0
                X: out float);
// Compute X = square root of Q,
//    given that A <= X <= B
   X1, X2, F1, F2, H: float;
begin
   X1 := A;
   X2 := B;                        // node 1
   F1 := Q - X1**2;
   H := X2 - X1;
   while (ABS(H) >= 0.001) loop    // node 2
      F2 := Q - X2**2;
      H := - F2 * ((X2-X1)/(F2-F1));
      X1 := X2;                    // node 3
      X2 := X2 + H;
      F1 := F2;
   end loop;
   X := (X1 + X2) / 2.0;           // node 4
end SQRT;                          // node 5

Is the expression X2 - X1 available at the start of node 3?

At the end of node 3?

Same questions for Q - X2**2
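
Availability can also be computed by iteration, this time taking the intersection over predecessors because the property must hold on every path. The Python sketch below tracks only a handful of the expressions in SQRT and models each node as an ordered list of (assigned variable, expressions evaluated) pairs. That encoding, and the names `EXPRS`, `BLOCKS` and `available_expressions`, are assumptions made for the example.

```python
EDGES = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: [5], 5: []}

# Tracked expressions and the variables each one depends on.
EXPRS = {
    "Q - X1**2": {"Q", "X1"},
    "Q - X2**2": {"Q", "X2"},
    "X2 - X1": {"X1", "X2"},
    "F2 - F1": {"F1", "F2"},
    "X2 + H": {"X2", "H"},
}

# Each node: assignments in order, as (target, tracked expressions evaluated).
BLOCKS = {
    0: [("Q", []), ("A", []), ("B", [])],
    1: [("X1", []), ("X2", []), ("F1", ["Q - X1**2"]), ("H", ["X2 - X1"])],
    2: [],
    3: [("F2", ["Q - X2**2"]), ("H", ["X2 - X1", "F2 - F1"]),
        ("X1", []), ("X2", ["X2 + H"]), ("F1", [])],
    4: [("X", [])],
    5: [],
}

def transfer(block, available):
    """Apply one node's assignments to the set of available expressions."""
    available = set(available)
    for target, evaluated in block:
        available |= set(evaluated)
        # Assigning to `target` invalidates every expression that uses it.
        available = {e for e in available if target not in EXPRS[e]}
    return available

def available_expressions(edges, blocks, entry=0):
    preds = {n: [] for n in edges}
    for n, succs in edges.items():
        for s in succs:
            preds[s].append(n)

    avail_in = {n: set() for n in edges}
    avail_out = {n: set(EXPRS) for n in edges}   # optimistic starting point
    changed = True
    while changed:
        changed = False
        for n in edges:
            if n == entry or not preds[n]:
                new_in = set()
            else:
                new_in = set.intersection(*(avail_out[p] for p in preds[n]))
            new_out = transfer(blocks[n], new_in)
            if new_in != avail_in[n] or new_out != avail_out[n]:
                avail_in[n], avail_out[n] = new_in, new_out
                changed = True
    return avail_in, avail_out

if __name__ == "__main__":
    avail_in, avail_out = available_expressions(EDGES, BLOCKS)
    for n in sorted(EDGES):
        print(n, sorted(avail_in[n]), sorted(avail_out[n]))
```

Comparing the sets printed for node 3 with your own answers shows the effect of the back edge from node 3 to node 2: an expression that is available at the end of node 1 is not necessarily available on every path into the loop.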

1.2.6 Live Variables

 

A variable x is live at node n iff there exists a path starting at n along which x is used without prior redefinition.

In what nodes is H live?

In what nodes is X1 live?

What does this tell you about memory allocation within this function?
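
Liveness is a backward problem: information flows from uses back toward definitions. A minimal Python sketch for SQRT follows. It assumes that `USES` lists the variables each node references before defining them itself, and that the out parameter X counts as being used at the exit node 5. The table and function names are illustrative.

```python
EDGES = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: [5], 5: []}

# Variables referenced in a node before any definition in that node.
USES = {
    0: set(),
    1: {"A", "B", "Q"},
    2: {"H"},
    3: {"Q", "X1", "X2", "F1"},
    4: {"X1", "X2"},
    5: {"X"},            # assumption: the out parameter is used on return
}

DEFS = {
    0: {"Q", "A", "B"},
    1: {"X1", "X2", "F1", "H"},
    2: set(),
    3: {"F2", "H", "X1", "X2", "F1"},
    4: {"X"},
    5: set(),
}

def live_variables(edges, uses, defs):
    """Return (live_in, live_out): variables live at entry to and exit
    from each node."""
    live_in = {n: set() for n in edges}
    live_out = {n: set() for n in edges}
    changed = True
    while changed:
        changed = False
        for n in sorted(edges, reverse=True):   # backward problem
            new_out = set()
            for s in edges[n]:
                new_out |= live_in[s]
            new_in = uses[n] | (new_out - defs[n])
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out

if __name__ == "__main__":
    live_in, live_out = live_variables(EDGES, USES, DEFS)
    for n in sorted(EDGES):
        print(n, "in:", sorted(live_in[n]), "out:", sorted(live_out[n]))
```

Variables that are never live at the same time never need to hold values simultaneously, which is the observation behind the memory allocation question above.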

1.2.7 Data Flow and Optimization

Optimization Technique                    Data-Flow Information
Constant Propagation                      reach
Copy Propagation                          reach
Elimination of Common Subexpressions      available
Dead Code Elimination                     live, reach
Register Allocation                       live
Anomaly Detection                         reach
Code Motion                               reach

2 Style and Anomaly Checking

A common form of static analysis:

2.1 Lint

Perhaps the first such tool to be widely used, lint (1979) became a staple tool for C programmers.

Combines static analysis with style recommendations, e.g.,


Is there room for lint-like tools?

2.2 Static Analysis by Compilers


Analysis Options for g++

g++ offers several “collection” flags, such as -Wall and -Wextra, each of which turns on multiple warnings that could also have been turned on individually.

You explored these in an earlier lab.

2.3 CheckStyle

checkstyle is a tool for enforcing Java coding standards.

2.3.1 Example: Adding Checkstyle to gradle

2.4 SpotBugs

Unlike Checkstyle, SpotBugs goes well beyond cosmetics:

2.4.1 SpotBugs in Gradle

In build.gradle:

plugins {
   id 'java'
   ⋮
   id "com.github.spotbugs" version "4.6.0"
}

spotbugsMain {
    ignoreFailures = true
    effort = 'max'
    reportLevel = 'medium'
    reports {
       xml.enabled = false
       html.enabled = true
    }
}

spotbugsTest.enabled = false

2.5 PMD

Another good tool for finding non-cosmetic problems in your code:

2.5.1 PMD Reports

2.5.2 Example: Adding PMD to gradle

2.5.3 Example: Adding PMD to Eclipse

2.5.4 Customizing PMD

In Gradle and Eclipse, you customize PMD by supplying your own ruleset file.

Gradle:

pmd {
    ruleSetFiles = files("config/pmd/ruleset.xml")
}

Removing a Rule

A common thing to do in a custom ruleset is to remove a rule entirely:

<rule ref="rulesets/java/comments.xml">
   <exclude name="CommentSize"/>
</rule>

(This rule is well-intentioned, but tends to flag Javadoc-style comments that often have good reason to exceed its limit of 6 lines per comment.)


Modifying a Rule

<rule ref="category/java/codestyle.xml/ClassNamingConventions">
   <properties>
     <property name="utilityClassPattern" value="[A-Z][a-zA-Z0-9]+"/>
   </properties>
</rule>

By default, this rule insists that all Java “utility” classes (ones that have no constructors) have names ending with “Helper” or “Util”. The modified pattern above instead accepts any class name that starts with an upper-case letter.


Excluding Source Code

<ruleset ⋮>
  <description>PMD rule set - java applications</description>
  <exclude-pattern>.*/src/test/java/.*</exclude-pattern>

I don’t find PMD checks on unit test code to be particularly useful.

3 Reverse-Engineering Tools

Reverse engineering makes heavy use of static analysis, and is even more closely tied to compiler technology than the tools we have looked at so far.

3.1 Reverse Compilers

a.k.a. “uncompilers”


Java and Decompilation

3.1.1 Example of Java Decompilation

For example, I might write the following code:

void drawGraphics(Graphics g, Point[] pts)
{
  double xMin = pts[0].x;
  double xMax = pts[0].x;
  double yMin = pts[0].y;
  double yMax = pts[0].y;

Defending Against Decompilers

3.2 Java Obfuscators

Work by a combination of

3.3 Obfuscation Example

For example, given the compiled code from

void drawGraphics(Graphics g, Point[] pts)
{
  double xMin = pts[0].x;
  double xMax = pts[0].x;
  double yMin = pts[0].y;
  double yMax = pts[0].y;

the obfuscator yguard will rewrite the code so that the best that a decompiler could produce is:

void a(Graphics a, Point[] b)
{
  double d0;
  double d1;
  double d2;
  double d3;
  _mthfor(d0, _mthdo(b, 0));
  _mthfor(d1, _mthdo(b, 0));
  _mthfor(d2, _mthif(b, 0));
  _mthfor(d3, _mthif(b, 0));

4 Dynamic Analysis Tools

Not all useful analysis can be done statically


Abusing Data Structures

4.1 Pointer/Memory Errors

Memory Abuse


How to Catch Pointer Errors


Memory Analysis Tools

4.2 Profilers

Profilers provide info on where a program is spending most of its execution time.
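
To give a feel for what a profiler reports, here is a small sketch using Python's standard `cProfile` and `pstats` modules as a stand-in for the C++ and Java profilers a project would actually use. The functions `slow_sum`, `fast_sum`, `workload` and `profile_call` are invented for the example.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Sum of squares 0^2 + 1^2 + ... + (n-1)^2, computed with a loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_sum(n):
    """The same sum, computed with the closed-form formula."""
    return (n - 1) * n * (2 * n - 1) // 6

def workload():
    return slow_sum(200_000), fast_sum(200_000)

def profile_call(func, *args):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)   # five costliest entries
    return result, stream.getvalue()

if __name__ == "__main__":
    result, report = profile_call(workload)
    print(result)
    print(report)
```

The report lists each function with its call count and the time spent in it, so the loop in `slow_sum` should stand out against the constant-time `fast_sum`. The exact timings will vary from run to run and from machine to machine.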


Profiling Tools