Choosing Tests

Steven Zeil

Last modified: Dec 21, 2019

Abstract

Testing starts with the design of test cases that express our strategy for testing our code. Where do these come from?

Testing strategies are generally divided into three categories:

We will examine each of these possibilities.

1 Types of testing

We can differentiate testing strategies by overall goal:


Choosing Test Data

From these motives come different approaches to choosing test data.

2 Representative Testing

Choose data that is representative of the way the end users will exercise the software.


Producing Representative Tests

Test data can be obtained via


Representative Testing

2.1 The Operational Profile

The operational profile is a description of the probability distribution of the input.

It describes how often, during operational program use, certain “kinds” of inputs will be seen.


Sample Op Profile

Input Category           Percentage
Transaction Proc.        85%
Balancing                14%
Year-end Report          1%

For an accounting program, we might start with an observation that past activities have broken down like this.


Sample Op Profile

Input Category           Percentage
Transaction Proc.        85%
    New account              7%
    Close account            3%
    Debits                   70%
    Credits                  20%
Balancing                14%
Year-end Report          1%

Sample Op Profile

Input Category           Percentage
Transaction Proc.        85%
    New account              7%
        new                      90%
        already exists           10%
    Close account            3%
        non-existent             15%
        exists                   85%
    Debits                   70%
        non-existent             25%
        exists                   75%
    Credits                  20%
        non-existent             25%
        exists                   75%
Balancing                14%
Year-end Report          1%

The breakdown of transactions into cases based on whether the account is new (non-existent) or already exists suggests something of an answer to the earlier discussion of randomly generating customer names and asking how often they should repeat. By explicitly measuring how often a new transaction involves an existing account, we know whether to randomly generate a new name or to randomly select from among already-generated ones.
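As a rough sketch of how an operational profile might drive test generation, the code below picks an input category with probability proportional to its share of the profile. The names `ProfileEntry`, `sampleProfile`, `pickCategory`, and `randomCategory` are invented for this illustration, and the percentages are the top-level ones from the sample profile above.

```cpp
#include <cassert>
#include <random>
#include <string>
#include <vector>

// One row of an operational profile (hypothetical type for this sketch).
struct ProfileEntry {
    std::string category;
    double probability;   // fraction of operational inputs, 0.0 to 1.0
};

// Top-level rows of the sample profile.
const std::vector<ProfileEntry>& sampleProfile() {
    static const std::vector<ProfileEntry> profile = {
        {"Transaction Proc.", 0.85},
        {"Balancing", 0.14},
        {"Year-end Report", 0.01}};
    return profile;
}

// Map a uniform value u in [0,1) onto a category by walking the
// cumulative probabilities.
std::string pickCategory(const std::vector<ProfileEntry>& profile, double u) {
    assert(!profile.empty());
    double cumulative = 0.0;
    for (const ProfileEntry& entry : profile) {
        cumulative += entry.probability;
        if (u < cumulative)
            return entry.category;
    }
    return profile.back().category;   // guards against rounding error
}

// Draw a category at random according to the profile.
std::string randomCategory(const std::vector<ProfileEntry>& profile,
                           std::mt19937& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return pickCategory(profile, uniform(rng));
}
```

The same selection step could be applied again at each nested level of the profile, for example to decide whether a generated debit should name an existing account or a non-existent one.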


Representative testing difficulties

2.2 Reliability Growth Models


Reliability modeling procedure


Reliability metrics

Some common metrics[1] that come out of these statistical models are:

2.2.1 Collecting data for reliability measurement


Time units

2.2.2 Jelinski-Moranda Model

Assumptions:

If we have repaired \(i\) faults, the program’s failure rate \(\lambda\) is \[ \lambda_{i} = (N-i)\phi \]
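A minimal sketch of this formula follows. It assumes the usual reading of the Jelinski-Moranda parameters, with \(N\) the number of faults initially in the program and \(\phi\) the amount each remaining fault contributes to the failure rate. The function name `jmFailureRate` is invented for this illustration.

```cpp
#include <cassert>

// Failure rate after `repaired` faults have been fixed, per
// lambda_i = (N - i) * phi.
// Assumption: totalFaults is N, the initial fault count, and phi is the
// per-fault contribution to the failure rate.
double jmFailureRate(int totalFaults, int repaired, double phi) {
    assert(repaired >= 0 && repaired <= totalFaults);
    return (totalFaults - repaired) * phi;
}
```

Under this model each repair lowers the failure rate by the same fixed amount \(\phi\), so the rate falls linearly and reaches zero once all \(N\) faults are gone.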


Observed reliability growth

2.2.3 Musa Logarithmic Poisson Model

Assumptions:

If we have repaired \(i\) faults, the program’s failure rate \(\lambda\) is \[ \lambda_{i} = \lambda_{0} e^{-\theta i} \]
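The formula can be sketched the same way. The function name `musaFailureRate` is invented here, and the sketch assumes \(\lambda_0\) is the failure rate before any repairs and \(\theta\) controls how quickly the rate decays.

```cpp
#include <cassert>
#include <cmath>

// Failure rate after `repaired` faults have been fixed, per
// lambda_i = lambda_0 * exp(-theta * i).
// Assumption: lambda0 is the initial failure rate, theta the decay parameter.
double musaFailureRate(double lambda0, double theta, int repaired) {
    assert(repaired >= 0);
    return lambda0 * std::exp(-theta * repaired);
}
```

Here each repair multiplies the failure rate by the constant factor \(e^{-\theta}\), so early repairs reduce the rate more than later ones and the rate never quite reaches zero.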


Fitting Example

3 Directed Testing

Choose tests designed to reveal


Choosing Good Test Data

Techniques for selecting directed test data are generally termed either

3.1 Black-Box Testing

Black-box (a.k.a. specification-based) testing chooses tests without consulting the implementation.

One of the goals of black-box testing is to be sure that every distinct behavior expected of a unit has been triggered in at least one test. Another is to try to choose tests that are likely to cause trouble, no matter what the actual algorithm is.

Some of the best-known techniques for choosing black-box tests focus on the input values that will be supplied to the unit during testing.


Functional Coverage

a.k.a. Equivalence Partitioning


Boundary Values Testing

a.k.a. Extremal Values Testing


Special Values Testing

Choose as test data those values that just tend to cause trouble.

Programmers eventually develop a sense for these. They include


What is “Special”?


3.2 White-Box Testing

White-box (a.k.a. implementation-based) testing uses information from the implementation to choose tests.

Common forms:

3.2.1 Statement Coverage

Require that every statement in the code be executed at least once during testing.

Special programs (“software tools”) will monitor this requirement for you.


Example

cin >> x >> y;
while (x > y)
  {
   if (x > 0)
      cout << x;
   x = f(x, y);
  }
cout << x;

What kinds of tests are required for statement coverage?
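One way to explore this question is to instrument the example by hand, roughly as a coverage tool would. In the sketch below the inputs are passed as parameters instead of being read from `cin`, output is omitted, and `f` is a purely hypothetical stand-in (the example never defines it). The statement labels and the name `executedStatements` are also invented for this illustration.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for f; the original example leaves it undefined.
int f(int x, int /*y*/) { return x - 1; }

// Instrumented copy of the example. Returns the labels of all statements
// executed for the given inputs.
std::set<std::string> executedStatements(int x, int y) {
    std::set<std::string> hit;
    hit.insert("input");                 // cin >> x >> y;
    while (true) {
        hit.insert("while-test");        // while (x > y)
        if (!(x > y))
            break;
        hit.insert("if-test");           // if (x > 0)
        if (x > 0)
            hit.insert("print-in-loop"); // cout << x;
        hit.insert("assign");            // x = f(x, y);
        x = f(x, y);
    }
    hit.insert("final-print");           // cout << x;
    return hit;
}
```

With this stand-in for `f`, the single test `executedStatements(2, 0)` reaches all six labeled statements, while `executedStatements(0, 5)` never enters the loop and reaches only three.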

3.2.2 Branch Coverage

Requires that every “branch” in the flowchart be tested at least once.

  if X < 0 then
    X := -X;
  Y := sqrt(X);
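The fragment above shows why branch coverage asks for more than statement coverage. The C++ sketch below is a hand-instrumented translation of it; the `Trace` type, the branch labels, and the name `absSqrt` are invented for this illustration.

```cpp
#include <cmath>
#include <set>
#include <string>

// Result of one run: the computed value and the branches that were taken.
struct Trace {
    double y;
    std::set<std::string> branches;
};

// Translation of the fragment, recording which way the `if` went.
Trace absSqrt(double x) {
    Trace t{0.0, {}};
    if (x < 0) {
        t.branches.insert("if-true");
        x = -x;
    } else {
        t.branches.insert("if-false");   // the implicit, empty else
    }
    t.y = std::sqrt(x);
    return t;
}
```

A single negative input such as `absSqrt(-4.0)` executes every statement of the original fragment but takes only the true branch. Branch coverage also needs a non-negative input, such as `absSqrt(9.0)`, to exercise the false branch.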

Branch Coverage example

cin >> x >> y;
while (x > y)
  {
   if (x > 0)
      cout << x;
   x = f(x, y);
  }
cout << x;

What kinds of tests are required for branch coverage?

3.2.3 Cyclomatic Coverage

(a.k.a. “independent path coverage”, “path testing”)


A Control Flow Graph

 
What are the independent paths?

3.2.4 Cyclomatic Complexity

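The number of independent paths in a program can be discovered by computing the cyclomatic complexity (McCabe, 1976). For a control flow graph \(G\) with a single entry and a single exit:

\[CC(G) = \mbox{Number}(\mbox{edges}) - \mbox{Number}(\mbox{nodes}) + 2\]

(Equivalently, edges minus nodes plus 1 if an added edge from the exit back to the entry is counted among the edges.)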
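As a small sketch, the function below computes the cyclomatic complexity from an edge list. It assumes McCabe's convention for a flow graph with a single entry and a single exit, edges minus nodes plus 2. The same number results from edges minus nodes plus 1 when a virtual edge from the exit back to the entry is counted among the edges. The graph in `loopExampleEdges` is one possible hand-drawn flow graph for the `while`/`if` example used earlier, with nodes named by statement; it is an assumption of this sketch, not the graph from the figure.

```cpp
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;   // (from node, to node)

// Cyclomatic complexity of a connected flow graph with one entry and one
// exit, using the convention edges - nodes + 2.
int cyclomaticComplexity(const std::vector<Edge>& edges) {
    std::set<int> nodes;
    for (const Edge& e : edges) {
        nodes.insert(e.first);
        nodes.insert(e.second);
    }
    return static_cast<int>(edges.size()) - static_cast<int>(nodes.size()) + 2;
}

// Assumed flow graph for the loop example:
// 1 = input, 2 = while test, 4 = if test, 5 = print in loop,
// 6 = assignment, 8 = final print.
std::vector<Edge> loopExampleEdges() {
    return {{1, 2}, {2, 4}, {2, 8}, {4, 5}, {4, 6}, {5, 6}, {6, 2}};
}
```

For this graph there are 7 edges and 6 nodes, giving a complexity of 3, which matches the familiar shortcut of counting the two decisions (`while` and `if`) and adding one.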


Uniqueness

 

Sets of independent paths are not unique, nor is their size.

3.2.5 Data-Flow Coverage

Attempts to test significant combinations of branches.


def-clear


all-defs

The all-defs criterion requires that each definition def(X,i) be tested by some def-clear path to some reference ref(X,j).

1:  cin >> x >> y;     d(x,1), d(y,1)
2:  while (x > y)      r(x,2), r(y,2)
3:    {
4:     if (x > 0)      r(x,4)
5:        cout << x;   r(x,5)
6:     x = f(x, y);    r(x,6), r(y,6), d(x,6)
7:    }
8:  cout << x;         r(x,8)

What kinds of tests are required for all-defs coverage?


all-uses

The all-uses criterion requires that each pair (def(X,i), ref(X,j)) be tested using some def-clear path from i to j.

1:  cin >> x >> y;     d(x,1), d(y,1)
2:  while (x > y)      r(x,2), r(y,2)
3:    {
4:     if (x > 0)      r(x,4)
5:        cout << x;   r(x,5)
6:     x = f(x, y);    r(x,6), r(y,6), d(x,6)
7:    }
8:  cout << x;         r(x,8)

What kinds of tests are required for all-uses coverage?
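To see what all-uses demands here, it helps to enumerate the (definition, reference) pairs that are joined by a def-clear path. The sketch below does this for `x` with a simple search over a flow graph whose nodes are the line numbers of the listing. The graph edges, the `FlowGraph` type, and the function names are assumptions made for this illustration.

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>

// Flow graph annotated for a single variable (hypothetical type).
struct FlowGraph {
    std::map<int, std::vector<int>> successors;
    std::set<int> defs;   // nodes that define the variable
    std::set<int> refs;   // nodes that reference the variable
};

// All (def, ref) pairs connected by at least one def-clear path.
std::set<std::pair<int, int>> defUsePairs(const FlowGraph& g) {
    std::set<std::pair<int, int>> pairs;
    for (int d : g.defs) {
        std::set<int> visited;
        std::vector<int> work;
        auto start = g.successors.find(d);
        if (start != g.successors.end())
            work = start->second;
        while (!work.empty()) {
            int n = work.back();
            work.pop_back();
            if (!visited.insert(n).second)
                continue;                  // already explored
            if (g.refs.count(n))
                pairs.insert(std::make_pair(d, n));
            if (g.defs.count(n))
                continue;                  // redefinition ends def-clear paths
            auto next = g.successors.find(n);
            if (next != g.successors.end())
                work.insert(work.end(), next->second.begin(), next->second.end());
        }
    }
    return pairs;
}

// The listing above, annotated for x (edges are an assumed reading of it).
FlowGraph loopExampleForX() {
    FlowGraph g;
    g.successors = {{1, {2}}, {2, {4, 8}}, {4, {5, 6}}, {5, {6}}, {6, {2}}};
    g.defs = {1, 6};
    g.refs = {2, 4, 5, 6, 8};
    return g;
}
```

A reference is recorded before the check for a redefinition because line 6 reads `x` before assigning to it. For `x` this yields ten pairs, five from each definition, including the pair from d(x,6) back around the loop to r(x,6). All-uses asks for a test covering each pair, whereas all-defs is satisfied by covering just one pair per definition.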

3.2.6 Mutation Testing

Given a program \(P\),


Mutation Testing (cont.)


Mutation Testing (cont.)

A set of test data is considered inadequate if it cannot distinguish between the program as written (\(P\)) and programs that differ from it by only a simple change.
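A tiny sketch of this adequacy check follows. The program `original`, its mutant (a single relational operator changed), and the helper `distinguishes` are all invented for this illustration.

```cpp
#include <functional>
#include <vector>

// The program as written (hypothetical example).
bool original(int x) { return x > 0; }

// A mutant that differs by one simple change: > replaced by >=.
bool mutant(int x) { return x >= 0; }

// True if at least one test input makes the two programs behave differently.
bool distinguishes(const std::vector<int>& tests,
                   const std::function<bool(int)>& program,
                   const std::function<bool(int)>& variant) {
    for (int t : tests) {
        if (program(t) != variant(t))
            return true;
    }
    return false;
}
```

The test set {-5, 7} produces identical results from both versions, so it would be judged inadequate. Adding the input 0 separates them.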


Mutation Testing Problems

      ⋮                    ⋮
    X = Y;               X = Y;
    if (X > 0) then      if (Y > 0) then
      ⋮                    ⋮

3.2.7 Perturbation Testing

Perturbation testing (Zeil) treats each arithmetic expression \(f(\bar{v})\) in the code as if it had been modified by the addition of an error term, giving \(f(\bar{v}) + e(\bar{v})\), where \(\bar{v}\) are the program variables and \(e\) can be a polynomial of arbitrarily high degree (and so can approximate almost any error).

3.3 Reliability Modeling with Directed Tests

Most literature on reliability models assumes that it can only be done with representative testing, because


Order Statistic Model

Zeil & Mitchell (1996) presented a model for reliability growth under either representative or directed testing.

Assumptions:


Measurement Process


Fitting Example

 

This plot shows an alternative scenario in which the testers started by using representative testing, but once the intervals between failures (and, therefore, between fixes) became lengthy, switched to directed testing to accelerate the process of actually finding and fixing bugs. The Order Statistic model is able, after a period of adjustment to the new testing approach, to model (predict) the severity of the remaining bugs.


1: For some reason, in this field people like to talk about “metrics” rather than the entirely equivalent but less impressive-sounding “measures”.