Last modified: Apr 07, 2014
Testing Process Components
Oracle: The process, person, and/or program that determines if test output is correct
Why do we test?
To show that the program works
Dijkstra: “Testing can show the presence of errors, but never their absence.”
Choosing Test Data
From these motives come different approaches to choosing test data
Representative testing
* Tests designed to reflect the frequency of user inputs.
* Used for reliability estimation.
Choose data that is representative of the way the end users will exercise the software.
Producing Representative Tests
Representative Testing
Sample Op Profile
Input Category | Percentage
---|---
Transaction Proc. | 85%
Balancing | 14%
Year-end Report | 1%
Sample Op Profile
Input Category | Subcategory | Percentage
---|---|---
Transaction Proc. | | 85%
 | New account | 7%
 | Close account | 3%
 | Debits | 70%
 | Credits | 20%
Balancing | | 14%
Year-end Report | | 1%
Sample Op Profile
Input Category | Subcategory | Case | Percentage
---|---|---|---
Transaction Proc. | | | 85%
 | New account | | 7%
 | | new | 90%
 | | already exists | 10%
 | Close account | | 3%
 | | non-existent | 15%
 | | exists | 85%
 | Debits | | 70%
 | | non-existent | 25%
 | | exists | 75%
 | Credits | | 20%
 | | non-existent | 25%
 | | exists | 75%
Balancing | | | 14%
Year-end Report | | | 1%
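Given such a profile, representative test inputs can be generated by sampling categories in proportion to their percentages. A minimal sketch, using the cumulative cutoffs of the top-level profile above (the function name and the idea of passing in the uniform draw are assumptions for illustration):

```cpp
#include <cassert>
#include <string>

// Map a uniform random draw u in [0, 1) to an input category using the
// cumulative percentages of the top-level operational profile:
// 85% transaction processing, 14% balancing, 1% year-end reports.
std::string pickCategory(double u) {
    if (u < 0.85) return "Transaction Proc.";
    if (u < 0.99) return "Balancing";
    return "Year-end Report";
}
```

In practice u would come from a pseudo-random generator; fixed draws are used here only to make the mapping deterministic and checkable.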
Representative testing difficulties
A growth model is a mathematical model of how the system's reliability changes as it is tested and faults are removed
Used as a means of reliability prediction by extrapolating from current data
Reliability modeling procedure
Determine operational profile of the software
Generate a set of test data corresponding to the profile
Apply tests, measuring amount of execution time between each failure
After a statistically valid number of tests have been executed, reliability can be measured
Reliability metrics
Reliability measurement
Time units
Time units in reliability measurement must be carefully selected; they are not the same for all systems:
Raw execution time (for non-stop systems)
Calendar time (for systems which have a regular usage pattern e.g. systems which are always run once per day)
Number of transactions (for systems which are used on demand)
Assumptions:
Software contains \(N\) faults (\(N\) is unknown)
Each fault manifests (causes a failure) at rate \(\phi\)
Faults manifest independently
Faults are fixed perfectly, without introducing new ones
If we have repaired \(i\) faults, the program’s failure rate \(\lambda\) is \[ \lambda_{i} = (N-i)\phi \]
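The formula can be sketched as a one-liner (the parameter values in the checks are illustrative, not from any real system). Note that each repair lowers the failure rate by the same amount \(\phi\):

```cpp
// Basic model: after repairing i of N faults, each remaining fault
// still manifests at rate phi, so lambda_i = (N - i) * phi.
double failureRate(int N, int i, double phi) {
    return (N - i) * phi;
}
```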
Observed reliability growth
A simple equal-step model, in which each repair improves reliability by the same amount, does not reflect reality
Reliability does not necessarily increase with change as the change can introduce new faults
The rate of reliability growth tends to slow down with time as frequently occurring faults are discovered and removed from the software
Musa Logarithmic Poisson Model
Assumptions:
Software can never be completely free of faults.
Faults manifest independently
Faults are found in decreasing order of failure rate.
The program failure rate before repairing any faults is \(\lambda_{0}\)
Faults are fixed perfectly, without introducing new ones
If we have repaired \(i\) faults, the program’s failure rate \(\lambda\) is \[ \lambda_{i} = \lambda_{0} e^{-\theta i} \]
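A sketch of the rate function (illustrative parameters only). Unlike the basic model, each successive repair removes a smaller share of the failure rate:

```cpp
#include <cmath>

// Musa logarithmic Poisson model: the failure rate decays
// geometrically with the number of repaired faults i:
// lambda_i = lambda_0 * exp(-theta * i).
double musaRate(double lambda0, double theta, int i) {
    return lambda0 * std::exp(-theta * i);
}
```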
Fitting Example
Choose tests designed to reveal
* many faults
* as quickly as possible
Choosing Good Test Data
Techniques for selecting directed test data are generally termed either
Black-box (a.k.a. specification-based) testing chooses tests without consulting the implementation.
Functional Coverage
a.k.a Equivalence partitioning
Large, structured projects place emphasis on tracking requirements to functional test cases
Boundary Values Testing
a.k.a., Extremal Values Testing
Special Values Testing
Choose as test data those values that experience shows tend to cause trouble.
Programmers eventually develop a sense for these. They include
For integers: –1, 0, 1
For floating point numbers: -e, 0, +e, where “e” is a very small number
For strings: the empty string, strings containing only blanks, strings containing no alphabetic characters
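Such lists can be kept as reusable test fixtures. A minimal sketch (the particular values are just the ones named above; the function names are illustrative):

```cpp
#include <string>
#include <vector>

// Special values for integers: -1, 0, 1.
std::vector<int> specialInts() { return {-1, 0, 1}; }

// Special values for strings: empty, blanks only, no alphabetics.
std::vector<std::string> specialStrings() {
    return {"", "   ", "1234!?#"};
}
```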
What is “Special”?
Special values and Boundary values often overlap
(F14 example)
White-Box (a.k.a. Implementation-based testing) uses information from the implementation to choose tests.
Common forms:
Structural testing (a.k.a. “path testing”, though not in your text)
Designate a set of paths through the program that must be exercised during testing.
Mutation testing
Perturbation testing
Statement Coverage
Require that every statement in the code be executed at least once during testing.
Special programs (“software tools”) will monitor this requirement for you.
Example
cin >> x >> y;
while (x > y)
{
    if (x > 0)
        cout << x;
    x = f(x, y);
}
cout << x;
What kinds of tests are required for statement coverage?
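One way to answer: refactor the fragment into a testable function and pick inputs that drive execution through every statement. The slide leaves f unspecified, so the stub below (f(x, y) = y - x) is purely an assumption for illustration. With that stub, the single test (x = 2, y = 0) enters the loop, takes the if's true branch, then exits, touching every statement:

```cpp
#include <vector>

// Hypothetical stand-in for the slide's unspecified f.
int f(int x, int y) { return y - x; }

// The example fragment, refactored to return the values it would print.
std::vector<int> run(int x, int y) {
    std::vector<int> printed;
    while (x > y) {
        if (x > 0)
            printed.push_back(x);  // the "cout << x;" inside the loop
        x = f(x, y);
    }
    printed.push_back(x);          // the final "cout << x;"
    return printed;
}
```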
Branch Coverage
Requires that every “branch” in the flowchart be tested at least once
if X < 0 then
    X := -X;
Y := sqrt(X);
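The fragment above shows the difference between the two criteria: a single test with X < 0 executes every statement, but never exercises the false branch of the if. A C++ sketch of the same logic:

```cpp
#include <cmath>

// Same logic as the Pascal fragment: x = -4 alone gives statement
// coverage; branch coverage also needs a test with x >= 0 that
// skips the if.
double absSqrt(double x) {
    if (x < 0)
        x = -x;
    return std::sqrt(x);
}
```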
Branch Coverage example
cin >> x >> y;
while (x > y)
{
    if (x > 0)
        cout << x;
    x = f(x, y);
}
cout << x;
What kinds of tests are required for branch coverage?
a.k.a., Condition coverage
Various approaches to coping with boolean expressions, particularly short-circuited ones.
Goal: given a boolean expression \( a \oplus b \), where \( \oplus \) could be &, &&, |, etc., we need tests in which each operand takes on both truth values and, in at least one test, independently determines the value of the whole expression.
For example, for the expression a & b, we would need the combinations
a | b
---|---
true | true
false | true
true | false
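The point of the three rows: relative to (true, true), each of the other two tests flips exactly one operand and changes the result, so each operand is shown to independently affect the outcome (the idea behind what is elsewhere called modified condition/decision coverage). A trivial check, written with C++'s logical operator:

```cpp
// The decision under test (the slide writes "a & b"; the logical
// form is used here since the operands are booleans).
bool expr(bool a, bool b) { return a && b; }
```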
(a.k.a “independent path coverage”, “path testing”)
The latter term (used in your text) should be discouraged: it is vague, and it means something entirely different to most of the testing community.
A Control Flow Graph
The number of independent paths in a program can be discovered by computing the cyclomatic complexity (McCabe, 1976)
\[CC(G) = \mbox{Number}(\mbox{edges}) - \mbox{Number}(\mbox{nodes}) + 2\]
This is a popular metric for module complexity.
Actually pretty trivial: for structured programs with only binary decision constructs, it equals the number of conditional statements + 1
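For the while/if example earlier, one plausible flow graph (the node and edge counts below are my own drawing of it, shown as an assumption) has 6 nodes and 7 edges. With two binary decisions (the while and the if), McCabe's formula for a single connected graph gives 7 - 6 + 2 = 3 = 2 + 1, matching the rule of thumb:

```cpp
// Cyclomatic complexity of a single connected control-flow graph
// (McCabe, 1976): CC = edges - nodes + 2.
int cyclomaticComplexity(int edges, int nodes) {
    return edges - nodes + 2;
}
```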
Uniqueness
Sets of independent paths are not unique, nor is their size:
Attempts to test significant combinations of branches.
Any stmt i where a variable X may be assigned a new value is called a definition of X at i: def(X,i)
Any stmt i where a variable X may be used/retrieved is called a reference or use of X at i: ref(X,i)
def-clear
A path from statement i to statement j is def-clear with respect to X if it contains no definition of X other than at its endpoints.
all-defs
The all-defs criterion requires that each definition def(X,i) be tested along some def-clear path to some reference ref(X,j).
1: cin >> x >> y;   d(x,1) d(y,1)
2: while (x > y)    r(x,2), r(y,2)
3: {
4:   if (x > 0)     r(x,4)
5:     cout << x;   r(x,5)
6:   x = f(x, y);   r(x,6), r(y,6), d(x,6)
7: }
8: cout << x;       r(x,8)
What kinds of tests are required for all-defs coverage?
all-uses
The all-uses criterion requires that each pair (def(X,i), ref(X,j)) be tested using some def-clear path from i to j.
1: cin >> x >> y;   d(x,1) d(y,1)
2: while (x > y)    r(x,2), r(y,2)
3: {
4:   if (x > 0)     r(x,4)
5:     cout << x;   r(x,5)
6:   x = f(x, y);   r(x,6), r(y,6), d(x,6)
7: }
8: cout << x;       r(x,8)
What kinds of tests are required for all-uses coverage?
Given a program \(P\),
Form a set of mutant programs, each of which differs from \(P\) by some single change.
The change is typically a simple syntactic one, but can be almost any single small change.
Mutation Testing (cont.)
Run \(P\) and each mutant \(P_i\) on a previously chosen set of tests
Mutation Testing (cont.)
A set of test data is considered inadequate if it cannot distinguish between the program as written (\(P\)) and programs that differ from it by only a simple change.
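A tiny sketch of what “distinguish” means (both functions are hypothetical; the mutant replaces + with -):

```cpp
// Original code under test (hypothetical).
int original(int a, int b) { return a + b; }

// A mutant differing by one small change: + becomes -.
int mutant(int a, int b) { return a - b; }

// A test (a, b) "kills" the mutant if the two disagree on it.
bool kills(int a, int b) { return original(a, b) != mutant(a, b); }
```

A test set containing only cases with b == 0 is inadequate here: no such test kills this mutant.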
Mutation Testing Problems
Some mutants are actually equivalent to the original program:
Original:
⋮
X = Y;
if (X > 0) then
⋮

Mutant:
⋮
X = Y;
if (Y > 0) then
⋮
Perturbation testing (Zeil) treats each arithmetic expression \( f(\bar{v}) \) in the code as if it had been modified to \( f(\bar{v}) + e(\bar{v}) \), where \( \bar{v} \) are the program variables and \( e \) can be a polynomial of arbitrarily high degree (which can approximate almost any error)
Monitor the variable values actually encountered during testing
Most literature on reliability models assumes that reliability measurement can only be done with representative testing, because
Directed tests’ time-to-failure is unrelated to operational time-to-failure
Directed tests may find faults “out of order”
Order Statistic Model
Zeil & Mitchell (1996) presented a model for reliability growth under either representative or directed testing.
Assumptions:
Software contains \(N\) faults, whose failure rates are described by a distribution \(F\).
Faults manifest independently
The test process is biased towards finding faults with higher failure rates.
Measurement Process
Fault failure rates are measured when the fault has been identified and corrected.
Fault failure rate data is then sorted by failure rate.
Fitting Example