Automating the Testing Oracle

Steven J Zeil

Last modified: Oct 20, 2020

Earlier we introduced this model of the basic testing process.

In this lesson, we turn our attention to the oracle, the process for determining whether a test has failed.

We will argue that the economics of testing provide a powerful incentive to automate the oracle. Traditionally, this has been done by capture-and-examine, capturing outputs and then using a separate automated process to examine and pass judgement upon those outputs.

Current practice, however, emphasizes self-checking tests: drivers that both feed input to the code under test and immediately evaluate its responses. Self-checking tests are supported by the family of *Unit frameworks (JUnit and its relatives). We will look at popular frameworks for both Java and C++.

1 Automating the Oracle

Why Automate the Oracle?

The best way to be sure programmers rerun the tests on a regular basis is to make the test runs part of the regular build process (e.g., build the test runs into the project makefile) and to make the tests self-checking.
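A self-checking driver can report its verdict through its exit status, which is exactly what make looks at when it decides whether a build step succeeded. The sketch below is illustrative only. The function under test, `absoluteValue`, and the driver function `runAllTests` are hypothetical names. A real driver's `main` would simply `return runAllTests();`.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>

// Hypothetical function under test.
int absoluteValue(int x) { return (x < 0) ? -x : x; }

// Runs every test case, reports each failure on std::cerr, and returns an
// exit status: EXIT_SUCCESS if all cases passed, EXIT_FAILURE otherwise.
int runAllTests()
{
    struct Case { int input; int expected; };
    const Case cases[] = { {5, 5}, {-5, 5}, {0, 0} };

    int failures = 0;
    for (const Case& c : cases) {
        int actual = absoluteValue(c.input);
        if (actual != c.expected) {
            std::cerr << "FAIL: absoluteValue(" << c.input << ") returned "
                      << actual << ", expected " << c.expected << '\n';
            ++failures;
        }
    }
    return (failures == 0) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Suppose the project makefile runs this driver as one of its build steps. Any failing case then produces a nonzero status, and make stops the build, so nobody has to remember to look at the test results.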

1.1 How can oracles be automated?

Output Capture

If we are doing system/regression tests, the first step towards automation is to capture the output:

If a program produces output files, we can make its tests self-checking. First, create a file containing the expected correct output. Then run the program to produce the actual output file, and use a simple comparison utility such as the Unix diff or cmp command to check whether the actual output is identical to the expected output.
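The comparison step itself is simple enough to sketch. The function below is a hypothetical stand-in for cmp and returns true only when two files have byte-for-byte identical contents. Unlike the real cmp, it does not report where the first difference occurs.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Returns true if both files can be opened and their contents are identical,
// byte for byte, in the spirit of the Unix cmp command.
bool sameContents(const std::string& expectedPath, const std::string& actualPath)
{
    std::ifstream expected(expectedPath, std::ios::binary);
    std::ifstream actual(actualPath, std::ios::binary);
    if (!expected || !actual)
        return false;  // a missing output file counts as a failed test

    std::istreambuf_iterator<char> e(expected), a(actual), end;
    // Walk both files in lockstep until either one runs out.
    while (e != end && a != end) {
        if (*e != *a)
            return false;
        ++e;
        ++a;
    }
    // Identical only if both files ended at the same point.
    return e == end && a == end;
}
```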

For system-level regression tests, this is even simpler. Once we have a program that passes our system tests, we run it on those tests and save the outputs. Those become the expected output files for later regression testing. (Remember, the point of regression testing is to determine if any behavior has changed due to recent updates to the code.)
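The "save the outputs, then compare against them later" procedure can be sketched as a single check. This is an illustrative assumption, not a prescribed tool: the hypothetical `matchesBaseline` records the current output as the baseline the first time it is called for a given file. After that, it requires every later output to match the baseline exactly.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Reads an entire file into a string; sets ok to false if it cannot be opened.
std::string readFile(const std::string& path, bool& ok)
{
    std::ifstream in(path, std::ios::binary);
    ok = static_cast<bool>(in);
    std::ostringstream contents;
    if (ok)
        contents << in.rdbuf();
    return contents.str();
}

// Regression check against a saved expected-output file.
// No baseline yet: record this output as the baseline (this assumes the
// program has already passed its system tests) and report success.
// Baseline present: the output must be identical to it.
bool matchesBaseline(const std::string& actualOutput, const std::string& baselinePath)
{
    bool haveBaseline = false;
    std::string expected = readFile(baselinePath, haveBaseline);
    if (!haveBaseline) {
        std::ofstream out(baselinePath, std::ios::binary);
        out << actualOutput;
        return static_cast<bool>(out);
    }
    return expected == actualOutput;
}
```

A failed check here does not necessarily mean a bug. It means behavior changed, which is exactly what regression testing is meant to detect. If the change was intended, the baseline file is simply replaced.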

If the program updates a database, it may be possible to capture entire databases in a similar fashion. Alternatively, we write database queries to check for changes in the records most likely to have been affected by a test.

On the other hand, if the program’s main function is to present information on a screen, self-checking is very difficult. Screen captures are often not much use, because we are unlikely to want to deal with changes where, say, one window is a pixel wider or a pixel to the left of where it had been in a prior test. Self-checking tests for programs like this either require extremely fine control over all possible interactive inputs and graphics device characteristics, or they require a careful “design for testability” to record, in a testing log file, information about what is being rendered on the screen. (We’ll revisit this idea later in the semester when we discuss the MVC pattern for designing user interfaces.)
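One way to get such a log is sketched below. It assumes, hypothetically, that all drawing goes through an interface of our own design (`Screen`), so a test can substitute a version that writes a log line instead of rendering pixels. All of the names here are illustrative, not taken from any particular GUI library.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical drawing interface. Production code would implement it with a
// real graphics library; tests substitute LoggingScreen.
class Screen {
public:
    virtual ~Screen() = default;
    virtual void drawText(int x, int y, const std::string& text) = 0;
};

// Test double: records what was drawn, one line per request. Coordinates are
// deliberately left out of the log, so moving a widget by a pixel does not
// change the log and does not break the test.
class LoggingScreen : public Screen {
public:
    explicit LoggingScreen(std::ostream& log) : log_(log) {}
    void drawText(int /*x*/, int /*y*/, const std::string& text) override {
        log_ << "text: " << text << '\n';
    }
private:
    std::ostream& log_;
};

// Hypothetical code under test: displays a status line.
void showStatus(Screen& screen, int itemsInCart)
{
    screen.drawText(10, 20, "Items: " + std::to_string(itemsInCart));
}
```

The log can then be captured to a file and examined with the same file tools used for any other output.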

Output Capture and Drivers

At the unit and integration test level, we are testing functions and ADT member functions that most often produce data, not files, as their output. That data could be of any type.

How can we capture that output in a fashion that allows automated examination?
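One straightforward answer is to have the test driver convert each result to text. The output can then be redirected to a file and examined with the same file tools as system-test output. In the sketch below, the function under test (`splitWords`) and the driver (`driveSplitWords`) are hypothetical. The driver writes to any `std::ostream`, so it can be pointed at `std::cout` for capture or at a string stream for checking.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical function under test: splits a line into whitespace-separated words.
std::vector<std::string> splitWords(const std::string& line)
{
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string w;
    while (in >> w)
        words.push_back(w);
    return words;
}

// Driver: runs the function on each input and writes the result as one line
// of text, bracketing each word so that stray whitespace would be visible.
void driveSplitWords(const std::vector<std::string>& inputs, std::ostream& out)
{
    for (const std::string& input : inputs) {
        out << '"' << input << "\" ->";
        for (const std::string& w : splitWords(input))
            out << " [" << w << ']';
        out << '\n';
    }
}
```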

1.2 Examining Output

1.2.1 File Tools

Alternatives

Custom oracles
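One situation where a custom oracle helps is when exact-match tools like diff or cmp are too strict, for example when floating-point results differ only in their last digits. The sketch below is a hypothetical oracle, `numbersMatch`. It assumes both outputs consist only of whitespace-separated numbers, and it accepts the actual output when each value is within a given tolerance of its expected counterpart.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>

// Returns true if actual contains the same count of numbers as expected and
// each one is within tolerance of the corresponding expected value.
// Assumes both strings contain nothing but whitespace-separated numbers.
bool numbersMatch(const std::string& expected, const std::string& actual,
                  double tolerance)
{
    std::istringstream e(expected), a(actual);
    double x, y;
    while (e >> x) {
        if (!(a >> y))
            return false;                  // actual output has too few numbers
        if (std::fabs(x - y) > tolerance)
            return false;                  // value outside the tolerance
    }
    return !(a >> y);                      // actual must not have extra numbers
}
```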

1.2.2 expect

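expect is a Tcl-based scripting tool for automating and testing interactive programs. A script launches a program, waits for patterns to appear in its output, and sends it input, much as a human user would.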

Key expect Commands

Sample Expect Script

Log in to another machine, automatically answering “yes” to ssh’s host “authenticity” warning.

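#!/usr/local/bin/expect
# Usage: scriptname hostname
set timeout 60
set host [lindex $argv 0]

# Ask the user for the password without echoing it.
send_user "password: "
stty -echo
expect_user -re "(.*)\n"
stty echo
send_user "\n"
set password $expect_out(1,string)

spawn ssh $host
# Stop looping once the remote prompt appears (assumed to contain the
# machine name) or ssh exits. Comments cannot go inside the expect braces.
while {1} {
  expect {
    eof                          {break}
    timeout                      {puts "timed out"; exit 1}
    "The authenticity of host"   {send "yes\r"}
    "password:"                  {send "$password\r"}
    "$host"                      {break}
  }
}
interact
close $spawn_id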

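Expect: Testing a program

The pass and fail commands in this script are not built into expect itself. They are presumably supplied by an expect-based test framework such as DejaGnu, and programdir is assumed to be set elsewhere.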

puts "in test0: $programdir/testsets\n"
catch {
   spawn $programdir/testsets

   expect \
    "RESULT: 0" {fail "testsets"} \
    "missing expected element" {fail "testsets"} \
    "contains unexpected element" {fail "testsets"} \
    "does not match" {fail "testsets"} \
    "but not destroyed" {fail "testsets"} \
    {RESULT: 1} {pass "testsets"} \
    eof {fail "testsets"; puts "eof\n"} \
    timeout {fail "testsets"; puts "timeout\n"}
}
catch {
    close
    wait
}
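On the other side of that conversation, the program under test only has to print text that the script recognizes. The sketch below is a hypothetical fragment of a driver like testsets. It uses std::set<int> as the set under test, and its messages mirror the “missing expected element” and “RESULT:” patterns the script watches for.

```cpp
#include <cassert>
#include <initializer_list>
#include <ostream>
#include <set>
#include <sstream>

// Complains, using a phrase the expect script watches for, if element is absent.
bool checkContains(const std::set<int>& s, int element, std::ostream& out)
{
    if (s.count(element) == 0) {
        out << "missing expected element " << element << '\n';
        return false;
    }
    return true;
}

// Runs every check (even after a failure), then prints the summary line the
// script matches: "RESULT: 1" if all checks passed, "RESULT: 0" otherwise.
void runSetChecks(const std::set<int>& s, std::ostream& out)
{
    bool ok = true;
    for (int expected : {1, 2, 3})
        ok = checkContains(s, expected, out) && ok;
    out << "RESULT: " << (ok ? 1 : 0) << std::endl;  // flush so the script sees it
}
```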

1.3 Limitations of Capture-and-Examine Tests

Structured Output

For unit/integration tests, output is often a data structure.
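Before a data structure can be captured and examined, it must be written out as text, and that text has to be stable. For example, std::unordered_set iterates in an unspecified order, so printing its elements directly could make two equal sets produce different text. A diff-based oracle would then report a spurious failure. The hypothetical helper below avoids this by sorting before printing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Writes the set in a canonical form ("{1, 2, 3}") with elements in
// ascending order, so equal sets always produce identical text.
std::string canonicalText(const std::unordered_set<int>& s)
{
    std::vector<int> elements(s.begin(), s.end());
    std::sort(elements.begin(), elements.end());
    std::ostringstream out;
    out << '{';
    for (std::size_t i = 0; i < elements.size(); ++i)
        out << (i == 0 ? "" : ", ") << elements[i];
    out << '}';
    return out.str();
}
```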


Repository Output

For system and high-level unit/integration tests, output may be updates to a database or other repository.


Graphics Output

2 Self-Checking Unit & Integration Tests

In testing an ADT, we are not testing an individual function, but a collection of related functions. In some ways that makes things easier, because we can use many of these functions to help test one another.

2.1 First Cut at a Self-Checking Test

Suppose you were testing a SetOfInteger ADT and had to test the add function in isolation. You would need to know how the data was stored in the set and would have to write code to search that storage for the value you had just added in your test. E.g.,

void testAdd (SetOfInteger aSet)
{
   aSet.add (23);
   bool found = false;
   for (int i = 0; i < aSet.numMembers && !found; ++i)
      found = (aSet.data[i] == 23);
   assert(found);
}

2.1.1 What’s Good and Bad About This?


2.2 Better Idea: Test the Public Functions Against Each Other

On the other hand, if you are testing the add and the contains functions, you could use the second function to check the results of the first:

void testAdd (SetOfInteger aSet)
{
  aSet.add (23);
  assert (aSet.contains(23));
}

Not only is this code simpler than writing your own search function as part of the test driver, it continues to work even if the data structure used to implement the ADT should be changed. What’s more, it is, in a sense, a more thorough test, since it tests two functions at once. Finally, there’s the simple fact that the test with the explicit loop probably won’t even compile, since it refers directly to data members that are almost certainly private.

In a sense, we have made a transition from white-box towards black-box testing. The new test case deliberately ignores the underlying structure.
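To see the black-box test actually run, it needs some SetOfInteger to work on. The class below is a hypothetical minimal implementation backed by an unsorted std::vector. Because the test touches only add and contains, it would keep working if the class were reimplemented with a tree or a hash table.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal SetOfInteger; the representation is private.
class SetOfInteger {
public:
    void add(int x) { if (!contains(x)) data.push_back(x); }
    bool contains(int x) const {
        for (int y : data)
            if (y == x) return true;
        return false;
    }
    int size() const { return static_cast<int>(data.size()); }
private:
    std::vector<int> data;   // tests cannot (and need not) look in here
};

// The test from the text: add is checked using contains.
void testAdd(SetOfInteger aSet)
{
    aSet.add(23);
    assert(aSet.contains(23));
}
```

Since testAdd takes its parameter by value, it works on a copy, and the caller’s set is left unchanged.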

2.2.1 Idiom: Preserve and Compare

void testAdd (SetOfInteger startingSet)
{
  SetOfInteger aSet = startingSet;
  aSet.add (23);
  assert (aSet.contains(23));
  if (startingSet.contains(23))
     assert (aSet.size() == startingSet.size());
  else
     assert (aSet.size() == startingSet.size() + 1);
}

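Note that we preserve a copy of the starting set before calling add, so that afterwards we can compare the new state against the old one. The size should be unchanged if 23 was already in the set and should grow by exactly one otherwise.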
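The comparison half of the idiom can go further than the size. The sketch below uses std::set<int> as a stand-in for SetOfInteger, with insert playing the role of add and count(v) == 1 the role of contains. It also checks that every other value in a probe range keeps its old membership status, which would catch an add that silently dropped or invented elements. The probe range of -100 to 100 is an arbitrary assumption. The predicate returns a bool so the caller decides how to report a failure.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Returns true if inserting x into a copy of startingSet behaves as a set add should.
bool insertBehavesLikeAdd(const std::set<int>& startingSet, int x)
{
    std::set<int> aSet = startingSet;   // preserve: startingSet stays untouched
    aSet.insert(x);

    if (aSet.count(x) != 1)
        return false;

    // Compare sizes against the preserved original.
    std::size_t expectedSize = startingSet.size() + (startingSet.count(x) ? 0 : 1);
    if (aSet.size() != expectedSize)
        return false;

    // Compare membership of every other value in the probe range.
    for (int v = -100; v <= 100; ++v)
        if (v != x && aSet.count(v) != startingSet.count(v))
            return false;
    return true;
}
```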

2.2.2 More Thorough Tests

You can see the usefulness of “preserve and compare” in this more thorough test.

void testAdd (SetOfInteger aSet)
{
  for (int i = 0; i < 1000; ++i)
    {
     int x = rand() % 500;
     bool alreadyContained = aSet.contains(x);
     int oldSize = aSet.size();
     aSet.add (x);
     assert (aSet.contains(x));
     if (alreadyContained)
       assert (aSet.size() == oldSize);
     else
       assert (aSet.size() == oldSize + 1);
    }
}

2.3 assert() might not be quite what we want

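Our use of assert() in these examples has mixed results. On the plus side, each test checks itself, and a failed assertion reports the condition, file, and line. On the minus side, the first failed assertion aborts the entire driver, so the remaining tests never run. Worse, if the code is compiled with NDEBUG defined, every assert() disappears and the tests silently check nothing.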
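A common workaround is a home-grown check macro. The hypothetical CHECK below records and reports a failure without aborting, and it is not affected by NDEBUG. The counter name checksFailed is illustrative. A driver would typically finish by turning that count into its exit status.

```cpp
#include <cassert>
#include <iostream>

// Number of failed checks so far in this test run.
static int checksFailed = 0;

// Like assert(condition), but a failure is counted and reported on std::cerr
// (with the condition's text, file, and line) instead of aborting the program.
#define CHECK(condition)                                              \
    do {                                                              \
        if (!(condition)) {                                           \
            ++checksFailed;                                           \
            std::cerr << __FILE__ << ':' << __LINE__                  \
                      << ": check failed: " #condition << '\n';       \
        }                                                             \
    } while (0)
```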

All of what we have shown in this lesson is a lot of work, but represents what programmers did on a regular basis until unit testing frameworks became popular. These frameworks are the subject of the next lesson.