Program Analysis Tools

Steven J Zeil

Last modified: Apr 07, 2014

Analysis Tools and Compilers

Analysis tools, particularly static analysis tools, share a great deal with compilers.

1. Abstract Syntax Trees (ASTs)
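
As a small illustration of the kind of tree a compiler front end hands to an analysis tool, the sketch below uses Python's standard `ast` module as a stand-in (the sample code in these notes is Ada-like, so this is a swapped-in language, not the notes' own). The statement in `source` and the helper name `names_by_role` are hypothetical, chosen only for this example.

```python
import ast

# A one-line program; the variable names are illustrative only.
source = "h = x2 - x1"
tree = ast.parse(source)


def names_by_role(tree):
    """Walk the AST and split variable names into defined and referenced."""
    defined, referenced = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)      # name appears as an assignment target
            elif isinstance(node.ctx, ast.Load):
                referenced.add(node.id)   # name is read inside an expression
    return defined, referenced


defined, referenced = names_by_role(tree)
print("defined:", sorted(defined))
print("referenced:", sorted(referenced))
print(ast.dump(tree.body[0]))  # exact text of the dump varies by Python version
```

The same walk over definitions and references is the raw material for the data flow problems in the next section.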

Abstract Syntax Graphs

2. Data Flow Analysis

Sample CFG

procedure SQRT (Q, A, B: in float; --  node 0
                X: out float) is
-- Compute X = square root of Q,
--    given that A <= X <= B
   X1, X2, F1, F2, H: float;
begin
   X1 := A;
   X2 := B;                        -- node 1
   F1 := Q - X1**2;
   H := X2 - X1;
   while (ABS(H) >= 0.001) loop    -- node 2
      F2 := Q - X2**2;
      H := - F2 * ((X2-X1)/(F2-F1));
      X1 := X2;                    -- node 3
      X2 := X2 + H;
      F1 := F2;
   end loop;
   X := (X1 + X2) / 2.0;           -- node 4
end SQRT;                          -- node 5

2.1 Reaching Definitions

A definition d_i(x), that is, a definition of variable x at node n_i, reaches a node n_j iff there exists a path from n_i to n_j on which x is neither redefined nor undefined after n_i.

What definitions reach the reference to X1 in node 4?

What definitions reach the reference to H in node 2?
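
A minimal sketch, in Python, of how the two questions above could be answered mechanically. It assumes the six-node CFG suggested by the node comments in the SQRT listing (edges 0->1, 1->2, 2->3, 3->2, 2->4, 4->5) and treats each node as one block. The `EDGES` and `DEFS` tables and the function name are illustrative assumptions, not part of the original notes.

```python
# Assumed control-flow edges for SQRT, read off the node comments.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 2), (2, 4), (4, 5)]

# Variables defined in each node (assumed reading of the listing).
DEFS = {
    0: {"Q", "A", "B"},                  # "in" parameters are defined on entry
    1: {"X1", "X2", "F1", "H"},
    2: set(),
    3: {"F2", "H", "X1", "X2", "F1"},
    4: {"X"},
    5: set(),
}


def reaching_definitions(edges, defs):
    """Return, for each node, the set of (variable, defining node) pairs
    that reach the entry of that node."""
    nodes = sorted(defs)
    preds = {n: [a for a, b in edges if b == n] for n in nodes}
    reach_in = {n: set() for n in nodes}
    reach_out = {n: {(v, n) for v in defs[n]} for n in nodes}
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for n in nodes:
            new_in = set().union(*(reach_out[p] for p in preds[n]))
            # A definition survives the node unless the node redefines its variable.
            survivors = {(v, d) for (v, d) in new_in if v not in defs[n]}
            new_out = {(v, n) for v in defs[n]} | survivors
            if new_in != reach_in[n] or new_out != reach_out[n]:
                reach_in[n], reach_out[n] = new_in, new_out
                changed = True
    return reach_in


reach_in = reaching_definitions(EDGES, DEFS)
x1_at_4 = sorted(d for (v, d) in reach_in[4] if v == "X1")
h_at_2 = sorted(d for (v, d) in reach_in[2] if v == "H")
print("definitions of X1 reaching node 4 come from nodes", x1_at_4)
print("definitions of H reaching node 2 come from nodes", h_at_2)
```

Under these assumptions both questions have the same answer: the definitions in node 1 (before the loop) and in node 3 (the loop body) both reach the reference.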

Data Flow Anomalies

The reaching definitions problem can be used to detect anomalous patterns that may reflect errors.
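
One way this might look in practice: the sketch below flags references that an "undefined" pseudo-definition can reach, which is one common kind of anomaly (a possible use before definition). The diamond-shaped CFG, the `DEFS` and `REFS` tables, the `UNDEF` marker, and the function name are all hypothetical and chosen only for this example.

```python
UNDEF = "undef"  # pseudo-definition meaning "no value assigned yet"

# Hypothetical diamond-shaped CFG: 0 -> 1 -> 3 and 0 -> 2 -> 3.
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3)]
DEFS = {0: {"a"}, 1: {"y"}, 2: set(), 3: {"z"}}
REFS = {0: set(), 1: {"a"}, 2: {"a"}, 3: {"a", "y"}}


def possibly_undefined_references(edges, defs, refs, entry=0):
    """Return sorted (node, variable) pairs where a reference may see no value."""
    nodes = sorted(defs)
    variables = set().union(*defs.values(), *refs.values())
    preds = {n: [a for a, b in edges if b == n] for n in nodes}
    reach_in = {n: set() for n in nodes}
    # On entry, every variable is "defined" only by the UNDEF pseudo-definition.
    reach_in[entry] = {(v, UNDEF) for v in variables}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            for p in preds[n]:
                # Definitions leaving p: those p generates plus those it does not kill.
                out_p = {(v, p) for v in defs[p]} | {
                    (v, d) for (v, d) in reach_in[p] if v not in defs[p]
                }
                if not out_p <= reach_in[n]:
                    reach_in[n] |= out_p
                    changed = True
    return sorted(
        (n, v) for n in nodes for v in refs[n] if (v, UNDEF) in reach_in[n]
    )


anomalies = possibly_undefined_references(EDGES, DEFS, REFS)
print(anomalies)
```

Here `y` is defined only on the branch through node 1, so the reference in node 3 is reported; whether that is a real error depends on whether the path through node 2 is feasible.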

2.2 Available Expressions

An expression e is available at a node n iff every path from the start of the program to n evaluates e and, after the last evaluation of e on each such path, there are no subsequent definitions or undefinitions of the variables in e.

procedure SQRT (Q, A, B: in float; --  node 0
                X: out float) is
-- Compute X = square root of Q,
--    given that A <= X <= B
   X1, X2, F1, F2, H: float;
begin
   X1 := A;
   X2 := B;                        -- node 1
   F1 := Q - X1**2;
   H := X2 - X1;
   while (ABS(H) >= 0.001) loop    -- node 2
      F2 := Q - X2**2;
      H := - F2 * ((X2-X1)/(F2-F1));
      X1 := X2;                    -- node 3
      X2 := X2 + H;
      F1 := F2;
   end loop;
   X := (X1 + X2) / 2.0;           -- node 4
end SQRT;                          -- node 5

Is the expression X2 - X1 available at the start of node 3?

At the end of node 3?

Ask the same questions for Q - X2**2.
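
The questions above can be checked with a small iterative solver. This is a sketch under stated assumptions: the CFG edges are read off the node comments in the SQRT listing, each node is modeled as a list of assignments in source order, and only a handful of subexpressions are tracked. The tables `OPERANDS` and `NODES` and the function names are illustrative, not part of the original notes.

```python
EDGES = [(0, 1), (1, 2), (2, 3), (3, 2), (2, 4), (4, 5)]

# Operands of each tracked expression.
OPERANDS = {
    "Q - X1**2": {"Q", "X1"},
    "Q - X2**2": {"Q", "X2"},
    "X2 - X1": {"X1", "X2"},
    "F2 - F1": {"F1", "F2"},
    "X2 + H": {"X2", "H"},
    "X1 + X2": {"X1", "X2"},
}

# Each node: list of (assigned variable or None, tracked expressions evaluated).
NODES = {
    0: [("Q", []), ("A", []), ("B", [])],
    1: [("X1", []), ("X2", []), ("F1", ["Q - X1**2"]), ("H", ["X2 - X1"])],
    2: [(None, [])],
    3: [("F2", ["Q - X2**2"]), ("H", ["X2 - X1", "F2 - F1"]),
        ("X1", []), ("X2", ["X2 + H"]), ("F1", [])],
    4: [("X", ["X1 + X2"])],
    5: [],
}


def transfer(available, statements):
    """Apply a node's statements, in order, to a set of available expressions."""
    available = set(available)
    for target, expressions in statements:
        available |= set(expressions)      # the right-hand side is evaluated first
        if target is not None:             # then the assignment kills stale results
            available = {e for e in available if target not in OPERANDS[e]}
    return available


def available_expressions(edges, nodes, entry=0):
    order = sorted(nodes)
    preds = {n: [a for a, b in edges if b == n] for n in order}
    everything = set(OPERANDS)
    # Optimistic start: everything available, except at the entry node.
    avail_in = {n: set(everything) for n in order}
    avail_in[entry] = set()
    changed = True
    while changed:
        changed = False
        for n in order:
            if n == entry or not preds[n]:
                continue
            new_in = set(everything)
            for p in preds[n]:             # must hold on every incoming path
                new_in &= transfer(avail_in[p], nodes[p])
            if new_in != avail_in[n]:
                avail_in[n] = new_in
                changed = True
    avail_out = {n: transfer(avail_in[n], nodes[n]) for n in order}
    return avail_in, avail_out


avail_in, avail_out = available_expressions(EDGES, NODES)
for expr in ("X2 - X1", "Q - X2**2"):
    print(expr, "| start of node 3:", expr in avail_in[3],
          "| end of node 3:", expr in avail_out[3])
```

Under this model neither expression is available at either point: the loop body redefines X1 and X2 after evaluating them, so the path around the back edge carries nothing into node 2 or node 3.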

2.3 Live Variables

A variable x is live at node n iff there exists a path starting at n along which x is used without prior redefinition.

In what nodes is H live?

In what nodes is X1 live?

What does this tell you about memory allocation within this function?
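
A sketch of a backward solver for these questions, interpreting "live at node n" as "live on entry to n". It assumes the CFG suggested by the node comments in the SQRT listing, lists for each node only the variables used before being redefined inside that node, and treats node 5 as a use of X because X is an out parameter. The `USES` and `DEFS` tables and the function name are illustrative assumptions.

```python
EDGES = [(0, 1), (1, 2), (2, 3), (3, 2), (2, 4), (4, 5)]

# Variables read in a node before any redefinition inside that node.
USES = {
    0: set(),
    1: {"A", "B", "Q"},
    2: {"H"},
    3: {"Q", "X1", "X2", "F1"},
    4: {"X1", "X2"},
    5: {"X"},        # assumption: the out parameter is handed back to the caller
}

# Variables assigned in each node.
DEFS = {
    0: {"Q", "A", "B"},
    1: {"X1", "X2", "F1", "H"},
    2: set(),
    3: {"F2", "H", "X1", "X2", "F1"},
    4: {"X"},
    5: set(),
}


def live_variables(edges, uses, defs):
    """Return, for each node, the set of variables live on entry to it."""
    nodes = sorted(uses)
    succs = {n: [b for a, b in edges if a == n] for n in nodes}
    live_in = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in reversed(nodes):          # backward problem: visit exits first
            live_out = set().union(*(live_in[s] for s in succs[n]))
            new_in = uses[n] | (live_out - defs[n])
            if new_in != live_in[n]:
                live_in[n] = new_in
                changed = True
    return live_in


live_in = live_variables(EDGES, USES, DEFS)
h_nodes = [n for n in sorted(live_in) if "H" in live_in[n]]
x1_nodes = [n for n in sorted(live_in) if "X1" in live_in[n]]
f2_nodes = [n for n in sorted(live_in) if "F2" in live_in[n]]
print("H live on entry to nodes", h_nodes)
print("X1 live on entry to nodes", x1_nodes)
print("F2 live on entry to nodes", f2_nodes)
```

With these tables F2 is never live across a node boundary, which suggests it needs storage only inside the loop body; that is one reading of the memory allocation question, not the only one.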

2.4 Data Flow and Optimization

Optimization Technique                  Data-Flow Information
--------------------------------------  ---------------------
Constant Propagation                    reach
Copy Propagation                        reach
Elimination of Common Subexpressions    available
Dead Code Elimination                   live, reach
Register Allocation                     live
Anomaly Detection                       reach
Code Motion                             reach

3. Static Analysis Tools

3.1 Style and Anomaly Checking

Lint

Perhaps the first such tool to be widely used, lint (1979) became a staple tool for C programmers.

Combines static analysis with style recommendations, e.g.,

Is there room for lint-like tools?

FindBugs

What Bugs does FindBugs Find?

PMD

PMD Reports

3.2 Reverse Compilers & Obfuscators

Reverse Compilers

a.k.a. “uncompilers”

Java and Decompilation

Java Obfuscators

Work by a combination of

Example: yGuard

4. Dynamic Analysis Tools

Not all useful analysis can be done statically

Abusing Data Structures

4.1 Pointer/Memory Errors

Memory Abuse

How to Catch Pointer Errors

Memory Analysis Tools

Sample of LeakTracer Output

Gathered 8 (8 unique) points of data.
(gdb)
Allocations: 1 / Size: 36
0x80608e6 is in NullArcableInstance::NullArcableInstance(void) (Machine.cc:40).
39      public:
40          NullArcableInstance() : ArcableInstance(new NullArcable) {}

Allocations: 1 / Size: 8 
0x8055b02 is in init_types(void) (Type.cc:119). 
118 void init_types() { 
119 Type::Integer = new IntegerType;

Allocations: 1 / Size: 132 (new[]) 
0x805f4ab is in Hashtable<NativeCallable, String, false, true>::Hashtable(unsigned int) (ea/h/Hashtable.h:15). 
14 Hashtable (uint _size = 32) : size(_size), count(0) { 
15 table = new List<E, own> [size]; 

4.2 Profilers

Profilers provide information on where a program is spending most of its execution time.
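
As a rough illustration of what such a report looks like, the sketch below profiles a toy workload with Python's standard `cProfile` and `pstats` modules. The functions `slow_sum`, `fast_sum`, and `workload` are hypothetical and exist only to give the profiler something uneven to measure; the tools these notes go on to list may present their data differently.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Sum of squares 0^2 + ... + (n-1)^2, computed with a loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum(n):
    """The same sum, computed with the closed-form formula."""
    return (n - 1) * n * (2 * n - 1) // 6


def workload():
    return slow_sum(200_000), fast_sum(200_000)


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists call counts and time per function; the loop-based version should account for nearly all of the time even though both functions return the same value.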

Profiling Tools