Last modified: Apr 07, 2014
Analysis Tools and Compilers
Analysis tools, particularly static ones, share a great deal with compilers
Data flow techniques originated in compiler optimization
Abstract syntax trees (ASTs) are generally viewed as a generalization of operator-applied-to-operands
Abstract Syntax Trees (cont.)
ASTs can be applied to larger constructions than just expressions
In fact, generally reduce entire program or compilation unit to one AST
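As a rough illustration of the operator-applied-to-operands view, the sketch below uses Python's standard `ast` module to parse a one-statement program. Python is chosen only for illustration, and the statement `x = a + b * 2` is a made-up example. The whole unit comes back as a single tree, and each operator node holds its operands as children.

```python
import ast

# Parse a tiny "program"; the result is one tree for the whole unit.
tree = ast.parse("x = a + b * 2")

assign = tree.body[0]    # the assignment statement node
expr = assign.value      # BinOp node: '+' applied to two operands

# The root operator is '+'; its right operand is itself a '*' node.
root_op = type(expr.op).__name__
right_op = type(expr.right.op).__name__

print(root_op, right_op)
print(ast.dump(tree))
```

The nesting of the `*` node under the `+` node is how the tree records operator precedence without needing parentheses.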
Abstract Syntax Graphs
All data-flow information is obtained by propagating data flow markers through the program.
Data flow problems are solved by propagating markers around a control flow graph (flowchart)
Sample CFG
    procedure SQRT (Q, A, B: in float; // node 0
                    X: out float);
    // Compute X = square root of Q,
    // given that A <= X <= B
      X1, X2, F1, F2, H: float;
    begin
      X1 := A;
      X2 := B; // node 1
      F1 := Q - X1**2;
      H := X2 - X1;
      while (ABS(H) >= 0.001) loop // node 2
        F2 := Q - X2**2;
        H := - F2 * ((X2-X1)/(F2-F1));
        X1 := X2; // node 3
        X2 := X2 + H;
        F1 := F2;
      end loop;
      X := (X1 + X2) / 2.; // node 4
    end SQRT; // node 5
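One possible way to hold this control flow graph in a program is a map from each node to its successors. The sketch below is in Python, chosen only for illustration. The edge list is an assumption read off the node comments in the listing, and `predecessors` is a hypothetical helper name.

```python
# Successor lists for the six nodes labelled in the SQRT comments.
# The edges are an assumption read off the code's control structure.
cfg = {
    0: [1],      # procedure entry -> initial assignments
    1: [2],      # initial assignments -> loop test
    2: [3, 4],   # loop test -> loop body, or exit to the final assignment
    3: [2],      # loop body -> back to the loop test
    4: [5],      # final assignment -> procedure end
    5: [],       # procedure end
}

def predecessors(graph):
    """Invert a successor map into a predecessor map."""
    preds = {n: [] for n in graph}
    for n, succs in graph.items():
        for s in succs:
            preds[s].append(n)
    return preds

preds = predecessors(cfg)
print(preds)
```

Forward problems propagate markers along the successor edges; backward problems use the same graph in the other direction.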
A definition di(x) of a variable x at node ni reaches a node nj iff there exists a path from ni to nj on which x is neither redefined nor undefined.
What definitions reach the reference to X1 in node 4?
What definitions reach the reference to H in node 2?
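A minimal sketch of how such questions could be answered mechanically, assuming the usual iterative fixed-point formulation. The `cfg` and `defs` tables are a hand-made encoding of the SQRT example, not something the notes provide, and undefinitions are ignored for brevity. Running it is one way to check your answers to the two questions above.

```python
# Hypothetical encoding of the SQRT example: successor lists and the
# variables each node defines (read off the node comments).
cfg = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: [5], 5: []}
defs = {
    0: {"Q", "A", "B"},                  # "in" parameters get values on entry
    1: {"X1", "X2", "F1", "H"},
    2: set(),
    3: {"F2", "H", "X1", "X2", "F1"},
    4: {"X"},
    5: set(),
}

def reaching_definitions(cfg, defs):
    """Return reach_in[n]: the (variable, defining node) pairs reaching n."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    reach_in = {n: set() for n in cfg}
    reach_out = {n: set() for n in cfg}
    changed = True
    while changed:                       # iterate until nothing changes
        changed = False
        for n in cfg:
            new_in = set()
            for p in preds[n]:
                new_in |= reach_out[p]
            # A node kills incoming definitions of variables it redefines.
            survivors = {(v, d) for (v, d) in new_in if v not in defs[n]}
            new_out = survivors | {(v, n) for v in defs[n]}
            if new_in != reach_in[n] or new_out != reach_out[n]:
                reach_in[n], reach_out[n] = new_in, new_out
                changed = True
    return reach_in

reach_in = reaching_definitions(cfg, defs)
x1_at_4 = sorted(d for (v, d) in reach_in[4] if v == "X1")
h_at_2 = sorted(d for (v, d) in reach_in[2] if v == "H")
print("X1 at node 4:", x1_at_4)
print("H at node 2:", h_at_2)
```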
The reaching definitions problem can be used to detect anomalous patterns that may reflect errors.
ur anomalies: if an undefinition of a variable reaches a reference of the same variable
dd anomalies: if a definition of a variable reaches a definition of the same variable
du anomalies: if a definition of a variable reaches an undefinition of the same variable
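The three patterns can be sketched as a scan over the events on a single path. This is a simplification: a real tool would propagate markers over the whole control flow graph rather than one path. It also assumes that a reference in between means the earlier definition was used, so only definitions with no intervening use are flagged. The function name and the event encoding are hypothetical.

```python
def find_anomalies(events):
    """Scan one path's events and report ur, dd, and du patterns.

    events is a list of (action, variable) pairs, where action is
    'd' (define), 'r' (reference), or 'u' (undefine).
    """
    last = {}      # variable -> most recent relevant action on this path
    found = []
    for position, (action, var) in enumerate(events):
        prev = last.get(var, "u")        # variables start out undefined
        if prev + action in ("ur", "dd", "du"):
            found.append((prev + action, var, position))
        # Reading an undefined variable leaves it undefined.
        if not (action == "r" and prev == "u"):
            last[var] = action
    return found

# Hypothetical path: y is defined twice with no use in between,
# and z is read while still undefined.
path = [("d", "y"), ("d", "y"), ("r", "y"), ("r", "z"), ("u", "y")]
anomalies = find_anomalies(path)
print(anomalies)
```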
An expression e is available at a node n iff every path from the start of the program to n evaluates e and, after the last evaluation of e on each such path, there are no subsequent definitions or undefinitions of the variables in e.
    procedure SQRT (Q, A, B: in float; // node 0
                    X: out float);
    // Compute X = square root of Q,
    // given that A <= X <= B
      X1, X2, F1, F2, H: float;
    begin
      X1 := A;
      X2 := B; // node 1
      F1 := Q - X1**2;
      H := X2 - X1;
      while (ABS(H) >= 0.001) loop // node 2
        F2 := Q - X2**2;
        H := - F2 * ((X2-X1)/(F2-F1));
        X1 := X2; // node 3
        X2 := X2 + H;
        F1 := F2;
      end loop;
      X := (X1 + X2) / 2.; // node 4
    end SQRT; // node 5
Is the expression X2 - X1 available at the start of node 3?
At the end of node 3?
Same questions for Q - X2**2
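A sketch of how availability could be computed, assuming the usual formulation in which the sets arriving from all predecessors are intersected. The `statements` table is a hand-made encoding of the SQRT nodes, and the four tracked expressions in `vars_of` are a hypothetical selection; neither comes from the notes. Undefinitions are ignored.

```python
cfg = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: [5], 5: []}

# Variables read by each tracked expression (hypothetical selection).
vars_of = {
    "X2 - X1": {"X1", "X2"},
    "Q - X1**2": {"Q", "X1"},
    "Q - X2**2": {"Q", "X2"},
    "F2 - F1": {"F1", "F2"},
}

# Each node is an ordered list of (assigned variable, expressions evaluated).
statements = {
    0: [], 2: [], 5: [],
    1: [("X1", []), ("X2", []), ("F1", ["Q - X1**2"]), ("H", ["X2 - X1"])],
    3: [("F2", ["Q - X2**2"]), ("H", ["X2 - X1", "F2 - F1"]),
        ("X1", []), ("X2", []), ("F1", [])],
    4: [("X", [])],
}

def transfer(node, avail):
    """Apply a node's statements in order to the set available on entry."""
    avail = set(avail)
    for target, exprs in statements[node]:
        avail |= set(exprs)                                  # evaluated here
        avail -= {e for e in avail if target in vars_of[e]}  # operand redefined
    return avail

def available_expressions(cfg, entry=0):
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    everything = set(vars_of)
    # Optimistic start: assume everything is available, then shrink.
    avail_in = {n: set(everything) for n in cfg}
    avail_in[entry] = set()
    avail_out = {n: transfer(n, avail_in[n]) for n in cfg}
    changed = True
    while changed:
        changed = False
        for n in cfg:
            if n == entry:
                continue
            new_in = set(everything)
            for p in preds[n]:
                new_in &= avail_out[p]   # must hold on every incoming path
            new_out = transfer(n, new_in)
            if new_in != avail_in[n] or new_out != avail_out[n]:
                avail_in[n], avail_out[n] = new_in, new_out
                changed = True
    return avail_in, avail_out

avail_in, avail_out = available_expressions(cfg)
print("start of node 3:", sorted(avail_in[3]))
print("end of node 3:", sorted(avail_out[3]))
```

The intersection is what distinguishes this from reaching definitions: an expression must survive along every path into a node, so the loop's back edge from node 3 matters as much as the entry from node 1.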
A variable x is live at node n iff there exists a path starting at n along which x is used without prior redefinition.
In what nodes is H live?
In what nodes is X1 live?
What does this tell you about memory allocation within this function?
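Liveness is a backward problem, so markers flow from uses toward the start of the program. The sketch below assumes the usual use/define formulation. The `use` and `define` tables are hand-derived from the SQRT listing and are assumptions, including the choice to treat the "out" parameter X as used at node 5. A variable is reported as live at a node when it is live on entry to that node.

```python
cfg = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: [5], 5: []}

# use[n]: variables read in n before any assignment to them in n.
# define[n]: variables assigned in n. Both are hand-derived assumptions.
use = {
    0: set(),
    1: {"A", "B", "Q"},
    2: {"H"},
    3: {"Q", "X1", "X2", "F1"},
    4: {"X1", "X2"},
    5: {"X"},                            # the "out" parameter is handed back
}
define = {
    0: {"Q", "A", "B"},
    1: {"X1", "X2", "F1", "H"},
    2: set(),
    3: {"F2", "H", "X1", "X2", "F1"},
    4: {"X"},
    5: set(),
}

def live_variables(cfg, use, define):
    """Return live_in[n]: the variables live on entry to node n."""
    live_in = {n: set() for n in cfg}
    changed = True
    while changed:
        changed = False
        for n in reversed(list(cfg)):    # visit nodes back to front
            live_out = set()
            for s in cfg[n]:
                live_out |= live_in[s]
            new_in = use[n] | (live_out - define[n])
            if new_in != live_in[n]:
                live_in[n] = new_in
                changed = True
    return live_in

live_in = live_variables(cfg, use, define)
h_nodes = sorted(n for n in cfg if "H" in live_in[n])
x1_nodes = sorted(n for n in cfg if "X1" in live_in[n])
print("H live on entry to:", h_nodes)
print("X1 live on entry to:", x1_nodes)
```

Variables whose live ranges never overlap could, in principle, share the same storage or register.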
Optimization Technique | Data-Flow Information |
---|---|
Constant Propagation | reach |
Copy Propagation | reach |
Elimination of Common Subexpressions | available |
Dead Code Elimination | live, reach |
Register Allocation | live |
Anomaly Detection | reach |
Code Motion | reach |
Perhaps the first such tool to be widely used, lint (1979) became a staple tool for C programmers.
Combines static analysis with style recommendations, e.g.,
data flow anomalies
conditional statements with constant values
potential = versus == confusion
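To give a feel for how small such a check can be, here is a sketch of one of them (conditional statements with constant values) written against Python source using the standard `ast` module. It is not how lint itself works: lint analyzes C, and the function name `constant_conditions` is hypothetical.

```python
import ast

def constant_conditions(source):
    """Return line numbers of if/while tests that are literal constants."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        is_branch = isinstance(node, (ast.If, ast.While))
        if is_branch and isinstance(node.test, ast.Constant):
            hits.append(node.lineno)
    return sorted(hits)

sample = (
    "x = 3\n"
    "if 0:\n"
    "    x = 4\n"
    "while x > 0:\n"
    "    x -= 1\n"
)
print(constant_conditions(sample))
```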
Is there room for lint-like tools?
lint was a response, in part, to the weak capabilities of early C compilers
Much of what lint does is now handled by optimizing compilers
FindBugs is an open source project from the University of Maryland
Works on compiled Java bytecode
What Bugs does FindBugs Find?
Bugs are also given “priorities” (p1, p2, p3 from high to low)
PMD works on source code
Sample reports (PMD & CPD)
PMD Reports
Reports provide cross reference to source location
Decompilers, a.k.a. “uncompilers”
Generate source code from object code
But also great tools for plagiarism
Java and Decompilation
Obfuscators work by a combination of
Renaming identifiers to meaningless names; the challenge is to preserve those names of entry points needed to execute a program or applet or make calls upon a library’s public API
Stripping away debugging information (e.g., source code file names and line numbers associated with blocks of code)
Applying optimization techniques to reduce code size while also confusing the object-to-source mapping
Example: yGuard
Not all useful analysis can be done statically
Profiling
Memory leaks, corruption, etc.
Data structure abuse
Abusing Data Structures
In a sense, the assert command of C++ and Java is the language’s own extension mechanism for such checks.
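A small sketch of the idea, written in Python (which has a comparable `assert` statement) rather than C++ or Java. The `SortedList` class and its invariant are made up for illustration. The data structure re-checks its own invariant after every update, so abuse is caught at run time near the point where it happens.

```python
class SortedList:
    """A list that is supposed to stay in ascending order."""

    def __init__(self):
        self._items = []

    def _check_invariant(self):
        # Run-time check: each element is <= its successor.
        pairs = zip(self._items, self._items[1:])
        assert all(a <= b for a, b in pairs), "SortedList invariant violated"

    def insert(self, value):
        position = 0
        while position < len(self._items) and self._items[position] < value:
            position += 1
        self._items.insert(position, value)
        self._check_invariant()

    def items(self):
        return list(self._items)

numbers = SortedList()
for v in (5, 1, 3):
    numbers.insert(v)
print(numbers.items())
```

Such checks are typically compiled out or disabled in production builds (in Python, by running with `-O`), so they cost nothing once testing is done.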
Memory Abuse
How to Catch Pointer Errors
Memory Analysis Tools
Purify is a well-known commercial (pricey) tool
**Sample of Leaktracer Output**
    Gathered 8 (8 unique) points of data.
    (gdb)
    Allocations: 1 / Size: 36
    0x80608e6 is in NullArcableInstance::NullArcableInstance(void) (Machine.cc:40).
    39 public:
    40 NullArcableInstance() : ArcableInstance(new NullArcable) {}
    Allocations: 1 / Size: 8
    0x8055b02 is in init_types(void) (Type.cc:119).
    118 void init_types() {
    119 Type::Integer = new IntegerType;
    Allocations: 1 / Size: 132 (new[])
    0x805f4ab is in Hashtable<NativeCallable, String, false, true>::Hashtable(unsigned int) (ea/h/Hashtable.h:15).
    14 Hashtable (uint _size = 32) : size(_size), count(0) {
    15 table = new List<E, own> [size];
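The same idea, attributing outstanding allocations to the source line that made them, can be tried out with Python's standard `tracemalloc` module. This is only an analogy to the C++ tools above, not a description of how they work; the `leaky` function and the sizes are made up for illustration.

```python
import tracemalloc

def leaky(store):
    # Hypothetical "leak": blocks are parked in a long-lived list.
    for _ in range(1000):
        store.append(bytearray(1024))

tracemalloc.start()
before = tracemalloc.take_snapshot()

kept = []
leaky(kept)

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group the growth by allocating source line, largest first.
growth = after.compare_to(before, "lineno")
top = growth[0]
print("line", top.traceback[0].lineno,
      "grew by", top.size_diff, "bytes in", top.count_diff, "blocks")
```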
Profilers provide info on where a program is spending most of its execution time
Profiling Tools
jvisualvm for Java, part of the Java SDK
Provides multiple monitoring tools, including both CPU and memory profiling
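For a quick hands-on feel for CPU profiling, the sketch below uses Python's standard `cProfile` and `pstats` modules. This is a different tool from the one named above, used here only because it is easy to run. The functions `slow_sum` and `workload` are made-up stand-ins for real code.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def workload():
    return [slow_sum(20000) for _ in range(20)]

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Format the five entries with the largest cumulative time into a string.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists call counts and time per function, which is usually enough to see which routine deserves attention first.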