Program Analysis Tools

Steven J Zeil

Last modified: Dec 4, 2016
Contents:
1 Representing Programs
1.1 Abstract Syntax Trees (ASTs)
1.2 Control Flow Graphs
2 Style and Anomaly Checking
2.1 Lint
2.2 Static Analysis by Compilers
2.3 CheckStyle
2.4 FindBugs
2.5 PMD
3 Reverse-Engineering Tools
3.1 Reverse Compilers
3.2 Java Obfuscators
3.3 Obfuscation Example
4 Dynamic Analysis Tools
4.1 Pointer/Memory Errors
4.2 Profilers

Classifying Analysis Tools


Analysis Tools and Compilers

Analysis tools, particularly static ones, share a great deal with compilers.

1 Representing Programs

Most static analysis is based upon one of the graph representations described below.

1.1 Abstract Syntax Trees (ASTs)
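
To make the idea concrete, here is a minimal sketch that prints the AST of a small expression. It uses Python's standard `ast` module rather than any tool named in these notes, and the helper names `outline` and `expression_outline` are hypothetical, chosen only for this illustration.

```python
import ast


def outline(node, depth=0):
    """Return the tree rooted at `node` as a list of indented lines."""
    label = type(node).__name__
    if isinstance(node, ast.Name):
        label += f"({node.id})"          # show the identifier
    elif isinstance(node, ast.Constant):
        label += f"({node.value!r})"     # show the literal value
    lines = ["  " * depth + label]
    for child in ast.iter_child_nodes(node):
        # Load/Store markers are bookkeeping, not part of the expression shape.
        if isinstance(child, ast.expr_context):
            continue
        lines.extend(outline(child, depth + 1))
    return lines


def expression_outline(source):
    """Parse a single expression and outline its syntax tree."""
    tree = ast.parse(source, mode="eval")
    return outline(tree.body)


if __name__ == "__main__":
    print("\n".join(expression_outline("a + b * 2")))
```

Note that the multiplication appears as a subtree of the addition: operator precedence is encoded in the shape of the tree, so no parentheses need to be stored.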

 



1.1.1 Abstract Syntax Graphs

 

1.2 Control Flow Graphs

Represent each executable statement in the code as a node, with a directed edge from each node to every node that could execute immediately after it.

1.2.1 Sample CFG

 

procedure SQRT (Q, A, B: in float;
                X: out float);
// Compute X = square root of Q,
//    given that A <= X <= B
   X1, X2, F1, F2, H: float;
begin
   X1 := A;
   X2 := B;
   F1 := Q - X1**2;
   H := X2 - X1;
   while (ABS(H) >= 0.001) loop
      F2 := Q - X2**2;
      H := - F2 * ((X2-X1)/(F2-F1));
      X1 := X2;
      X2 := X2 + H;
      F1 := F2;
   end loop;
   X := (X1 + X2) / 2.;
end SQRT;

Simplifying CFGs: Basic Blocks

 

procedure SQRT (Q, A, B: in float; // node 0
                X: out float);
// Compute X = square root of Q,
//    given that A <= X <= B
   X1, X2, F1, F2, H: float;
begin
   X1 := A;
   X2 := B;                        // node 1
   F1 := Q - X1**2;
   H := X2 - X1;
   while (ABS(H) >= 0.001) loop    // node 2
      F2 := Q - X2**2;
      H := - F2 * ((X2-X1)/(F2-F1));
      X1 := X2;                    // node 3
      X2 := X2 + H;
      F1 := F2;
   end loop;
   X := (X1 + X2) / 2.;            // node 4
end SQRT;                          // node 5
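
The basic-block graph for this procedure can be written down as a small adjacency table. The sketch below is in Python, and the edge set is my reading of the node labels above (entry, initialization, loop test, loop body, final assignment, exit), not something taken from the original figure. The function names are hypothetical.

```python
from collections import deque

# Basic-block CFG for SQRT: node -> list of successor nodes.
# Node 2 (the loop test) branches to the body (3) or past the loop (4).
CFG = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: [5], 5: []}


def predecessors(cfg):
    """Invert the successor table."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    return preds


def reachable_from(cfg, entry):
    """Breadth-first search: which nodes can execution reach from entry?"""
    seen = {entry}
    queue = deque([entry])
    while queue:
        n = queue.popleft()
        for s in cfg[n]:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return seen


def back_edges(cfg, entry):
    """Depth-first search; an edge to a node still on the stack closes a loop."""
    on_stack, done, found = set(), set(), []

    def visit(n):
        on_stack.add(n)
        for s in cfg[n]:
            if s in on_stack:
                found.append((n, s))
            elif s not in done:
                visit(s)
        on_stack.discard(n)
        done.add(n)

    visit(entry)
    return found


if __name__ == "__main__":
    print(predecessors(CFG))
    print(back_edges(CFG, 0))
```

With this edge set, the only back edge is from node 3 to node 2, which corresponds to the `while` loop.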

1.2.2 Data Flow Analysis


Data-Flow Annotated CFG

 

procedure SQRT (Q, A, B: in float; // node 0
                X: out float);
// Compute X = square root of Q,
//    given that A <= X <= B
   X1, X2, F1, F2, H: float;
begin
   X1 := A;
   X2 := B;                        // node 1
   F1 := Q - X1**2;
   H := X2 - X1;
   while (ABS(H) >= 0.001) loop    // node 2
      F2 := Q - X2**2;
      H := - F2 * ((X2-X1)/(F2-F1));
      X1 := X2;                    // node 3
      X2 := X2 + H;
      F1 := F2;
   end loop;
   X := (X1 + X2) / 2.;            // node 4
end SQRT;                          // node 5

1.2.3 Reaching Definitions

 

A definition d_i(x) of a variable x at node n_i reaches a node n_j iff there exists a path from n_i to n_j on which x is neither redefined nor undefined.

What definitions reach the reference to X1 in node 4?

What definitions reach the reference to H in node 2?
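
Questions like these can be answered mechanically. The following Python sketch runs the usual iterative reaching-definitions computation over the basic-block graph of SQRT. It rests on two assumptions of mine: the edge set is my reading of the node labels, and the `in` parameters Q, A, B are treated as defined at the entry node 0. The names `CFG`, `DEFS`, `reaching_definitions` and `defs_reaching` are hypothetical.

```python
# Basic-block CFG for SQRT: node -> successors (assumed edge set).
CFG = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: [5], 5: []}

# Variables assigned in each node.
# Assumption: "in" parameters count as defined at the entry node.
DEFS = {
    0: {"Q", "A", "B"},
    1: {"X1", "X2", "F1", "H"},
    2: set(),
    3: {"F2", "H", "X1", "X2", "F1"},
    4: {"X"},
    5: set(),
}


def reaching_definitions(cfg, defs):
    """Return, for each node, the set of (variable, defining node) pairs
    that reach the start of that node."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)

    reach_in = {n: set() for n in cfg}
    reach_out = {n: set() for n in cfg}
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for n in cfg:
            new_in = set()
            for p in preds[n]:
                new_in |= reach_out[p]
            # A definition in n kills incoming definitions of the same variable.
            survivors = {(v, d) for (v, d) in new_in if v not in defs[n]}
            new_out = survivors | {(v, n) for v in defs[n]}
            if new_in != reach_in[n] or new_out != reach_out[n]:
                reach_in[n], reach_out[n] = new_in, new_out
                changed = True
    return reach_in


def defs_reaching(reach_in, var, node):
    """Nodes whose definition of `var` reaches the start of `node`."""
    return sorted(d for (v, d) in reach_in[node] if v == var)


if __name__ == "__main__":
    reach = reaching_definitions(CFG, DEFS)
    print("X1 at node 4:", defs_reaching(reach, "X1", 4))
    print("H  at node 2:", defs_reaching(reach, "H", 2))
```

Under these assumptions, both questions get the same answer: the definitions in node 1 (before the loop) and node 3 (the loop body) each reach the reference.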

1.2.4 Data Flow Anomalies

The reaching definitions problem can be used to detect anomalous patterns that may reflect errors.
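
As an illustration, here is a small Python sketch that scans one execution path for three commonly cited anomaly patterns. It assumes the patterns meant here are the classic ones: a reference to an undefined variable (ur), two definitions with no intervening reference (dd), and a definition that is discarded without being referenced (du). The event encoding and the name `find_anomalies` are hypothetical.

```python
def find_anomalies(events):
    """Scan a path given as (action, variable) pairs, where action is
    'd' (define), 'r' (reference) or 'u' (undefine, e.g. leaving scope).
    Returns a list of (position, variable, pattern) tuples."""
    last_action = {}                 # variable -> most recent action
    anomalies = []
    for position, (action, var) in enumerate(events):
        previous = last_action.get(var, "u")   # variables start undefined
        if action == "r" and previous == "u":
            anomalies.append((position, var, "ur"))   # used before defined
        elif action == "d" and previous == "d":
            anomalies.append((position, var, "dd"))   # value overwritten unused
        elif action == "u" and previous == "d":
            anomalies.append((position, var, "du"))   # value never used
        last_action[var] = action
    return anomalies


if __name__ == "__main__":
    path = [("d", "x"), ("d", "x"), ("r", "x"),
            ("r", "y"),
            ("d", "z"), ("u", "z")]
    for position, var, pattern in find_anomalies(path):
        print(f"event {position}: {pattern} anomaly on {var}")
```

A real tool would check every path through the CFG using data flow information rather than a single event list, but the patterns it looks for are the same.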

2 Style and Anomaly Checking

A common form of static analysis:

2.1 Lint

Perhaps the first such tool to be widely used, lint (1979) became a staple tool for C programmers.

Combines static analysis with style recommendations, e.g.,


Is there room for lint-like tools?

2.2 Static Analysis by Compilers


Analysis Options for g++

g++ offers several “collection” flags, such as -Wall and -Wextra, each of which turns on multiple warnings that could also have been turned on individually.

You explored these in an earlier lab.

2.3 CheckStyle

Checkstyle is a tool for enforcing Java coding standards.

2.3.1 Example: Adding Checkstyle to gradle

2.4 FindBugs

2.4.1 What Bugs does FindBugs Find?

Unlike Checkstyle, FindBugs goes well beyond cosmetics:

2.4.2 Example: Adding FindBugs to gradle

2.5 PMD

Another good tool for finding non-cosmetic problems in your code:

2.5.1 PMD Reports

2.5.2 Example: Adding PMD to gradle

3 Reverse-Engineering Tools

Reverse engineering makes heavy use of static analysis, and is even more closely tied to compiler technology than the tools we have looked at so far.

3.1 Reverse Compilers

a.k.a. “uncompilers”


Java and Decompilation

3.1.1 Example of Java Decompilation

For example, I might write the following code:

void drawGraphics(Graphics g, Point[] pts)
{
  double xMin = pts[0].x;
  double xMax = pts[0].x;
  double yMin = pts[0].y;
  double yMax = pts[0].y;

Defending Against Decompilers

3.2 Java Obfuscators

Work by a combination of

3.3 Obfuscation Example

For example, given the compiled code from

void drawGraphics(Graphics g, Point[] pts)
{
  double xMin = pts[0].x;
  double xMax = pts[0].x;
  double yMin = pts[0].y;
  double yMax = pts[0].y;

the obfuscator yGuard will rewrite the code so that the best a decompiler could produce is:

void a(Graphics a, Point[] b)
{
  double d0;
  double d1;
  double d2;
  double d3;
  _mthfor(d0, _mthdo(b, 0));
  _mthfor(d1, _mthdo(b, 0));
  _mthfor(d2, _mthif(b, 0));
  _mthfor(d3, _mthif(b, 0));

4 Dynamic Analysis Tools

Not all useful analysis can be done statically.


Abusing Data Structures

4.1 Pointer/Memory Errors

Memory Abuse


How to Catch Pointer Errors


Memory Analysis Tools


Sample of LeakTracer Output

Gathered 8 (8 unique) points of data.
(gdb)
Allocations: 1 / Size: 36
0x80608e6 is in NullArcableInstance::NullArcableInstance(void) (Machine.cc:40).
39      public:
40          NullArcableInstance() : ArcableInstance(new NullArcable) {}

Allocations: 1 / Size: 8 
0x8055b02 is in init_types(void) (Type.cc:119). 
118 void init_types() { 
119 Type::Integer = new IntegerType;

Allocations: 1 / Size: 132 (new[]) 
0x805f4ab is in Hashtable<NativeCallable, String, false, true>::Hashtable(unsigned int) (ea/h/Hashtable.h:15). 
14 Hashtable (uint _size = 32) : size(_size), count(0) { 
15 table = new List<E, own> [size]; 
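
The report above groups outstanding allocations by the source line that made them. The same style of dynamic analysis can be sketched in Python with the standard `tracemalloc` module. This is only an analogy to the C++ tool shown here, not a description of how it works, and the helper name `allocation_sites` is hypothetical.

```python
import tracemalloc


def allocation_sites(work, top=3):
    """Run `work()` and report the source lines responsible for memory
    that is still allocated afterwards, largest growth first."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    kept = work()                      # keep the result alive, like a leak would
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Each entry describes one source line: bytes and block count gained.
    growth = after.compare_to(before, "lineno")
    return kept, growth[:top]


if __name__ == "__main__":
    kept, sites = allocation_sites(lambda: [bytearray(1000) for _ in range(100)])
    for site in sites:
        frame = site.traceback[0]
        print(f"Allocations: {site.count_diff} / Size: {site.size_diff} "
              f"at {frame.filename}:{frame.lineno}")
```

As with the sample report, the output points at the line that allocated the memory, which is usually the fastest route to finding out why it was never released.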

4.2 Profilers

Profilers provide information on where a program is spending most of its execution time.
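
A minimal sketch of what a profiler reports, using Python's standard `cProfile` and `pstats` modules (chosen for brevity, not because they are the tools covered in these notes). The functions `slow_sum`, `fast_sum`, `workload` and `profile_report` are hypothetical examples.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Sum of squares 0..n-1 computed with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum(n):
    """The same sum computed with the closed-form formula."""
    return (n - 1) * n * (2 * n - 1) // 6


def workload():
    return slow_sum(200_000), fast_sum(200_000)


def profile_report():
    """Run the workload under the profiler and return the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive calls come first.
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

The report lists each function with its call count and time. Here nearly all the time should be attributed to `slow_sum`, which tells you where optimization effort would pay off.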


Profiling Tools