Testing
Steven Zeil
Abstract
Testing is a critical skill for any software developer, but its treatment in most textbooks belies the complexity of this process.
1 The Testing Process
The diagram here illustrates the steps involved in testing code.
Although you have been doing testing for some time as CS students, most of your testing has probably been informal. Still, most of the activities here should be familiar to you.
- Beginning from an overall test plan (or test specification),
- we eventually seek to discover a collection of failures.
- These failures become the input to the process of debugging, where we seek to find the faults in the code responsible for those failures.
Terminology
- Failure: An execution on which incorrect behavior occurs
- Fault: A defect in the code that (may) cause a failure
Related to these, though not part of our testing process:
- Error: A human mistake that results in a fault or, alternatively, the difference between the expected output and the actual output on a failed test.
These terms get abused a lot, but there really is a clear difference among the three.
In informal discussions, we sometimes like to make reference to something that is wrong even if we aren’t quite sure (or just don’t care) whether it is an error, a failure, or a fault. I tend to use words like “bug”, “problem”, or “defect” in those circumstances, as these do not have formal definitions.
1.1 The Steps in a Testing Process
A test plan (more properly, a test specification) describes a set of test cases.
- Test case: a general description of a required test
For any given test case, there may be many possible inputs that would serve. For example, in testing a square root function, we might have a test case “find a square root where the value cannot be represented exactly in floating point”, for which possible inputs would be 2.0, 3.0, 200.0, etc.
With that in mind, the first step in testing is to
- Derive inputs for each test case.
In most cases, you will also need to record the expected outputs or behavior for your test inputs.
The inputs and expected outputs may be recorded in a database of regression tests for later use. But the most obvious use for the new inputs is to…
- Execute the tests.
The test inputs are fed into the program being tested and the actual outputs collected.
- Determine which tests have failed.
The test inputs, actual outputs obtained from their execution, and the expected outputs are passed on to the testing oracle. The oracle is the person, program, or process used to determine if a test has failed.
- Pass the failures on for debugging.
The purpose of debugging is to determine the faults in the code that are actually responsible for the failures observed during testing.
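The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `sqrt_under_test` is a hypothetical unit with a deliberately seeded fault, the test table and tolerance are made up for the example, and the oracle is a simple automated comparison rather than any particular tool.

```python
import math

def sqrt_under_test(x):
    """Hypothetical unit under test, with a seeded fault: it truncates to 3 decimals."""
    return math.floor(math.sqrt(x) * 1000) / 1000

# Step 1: inputs derived from the test cases, with recorded expected outputs.
TESTS = [
    (4.0, 2.0),                  # test case: root is exactly representable
    (2.0, 1.4142135623730951),   # test case: root is not exactly representable
    (3.0, 1.7320508075688772),
    (200.0, 14.142135623730951),
]

def oracle(expected, actual):
    """Automated oracle: True when the actual output is acceptably close."""
    return math.isclose(actual, expected, rel_tol=1e-9)

def run_tests(unit, tests):
    """Steps 2-4: execute each test, consult the oracle, collect the failures."""
    failures = []
    for test_input, expected in tests:
        actual = unit(test_input)          # execute the test
        if not oracle(expected, actual):   # determine whether it failed
            failures.append((test_input, expected, actual))
    return failures                        # passed on for debugging

failures = run_tests(sqrt_under_test, TESTS)
for test_input, expected, actual in failures:
    print(f"FAILED: input={test_input} expected={expected} actual={actual}")
```

Here the exactly representable case passes while the other three inputs fail, which is the kind of evidence that debugging then traces back to the fault (the truncation).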
1.1.1 Oracles
The testing oracle is the person, program, or process used to determine if a test has failed.
The term “oracle” stems from the “Oracle of Delphi”, a priestess of Apollo in ancient Greece who was believed to have the gift of divination, although her pronouncements were famously cryptic. In Computer Science, “oracles” are invoked as models for answering questions that cannot be entirely solved by algorithms.
Testing oracles come in many forms, ranging from automated oracles that we will consider in a later lesson, to the “eyeball oracle” that you probably employ for most of your academic assignments.
The eyeball oracle is notoriously prone to missing failures. We just aren’t very good at looking at thousands of lines of output and picking out the one or two that are incorrect. On the other hand, the eyeball oracle is very good at noticing faults from the early stages of requirements and design that were faithfully translated into code (i.e., the eyeball oracle does more validation than verification).
Another common form of oracle is the “head to head” oracle. If we are developing a system to replace an existing one, then we can run the test inputs through both systems and compare the outputs, usually by simple byte-by-byte comparison (e.g., the diff command).
This sort of situation (head to head testing) is more common than you might think. Developing new, first-time systems from scratch is a comparatively rare activity. Even if we expect to add some new functions to the new system (or are adding functionality to the existing one), we can use head-to-head for the shared portion of the two.
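A head-to-head oracle might look something like the following sketch. Both `legacy_report` and `new_report` are hypothetical stand-ins for an existing system and its replacement (the replacement has a seeded fault), and the comparison is a plain byte-for-byte equality check rather than a call to the diff command.

```python
def legacy_report(values):
    """Hypothetical existing system: one sorted value per line, 2 decimals."""
    return "\n".join(f"{v:.2f}" for v in sorted(values))

def new_report(values):
    """Hypothetical replacement, with a seeded fault: set() drops duplicates."""
    return "\n".join(f"{v:.2f}" for v in sorted(set(values)))

def head_to_head(old_system, new_system, test_inputs):
    """Return the inputs on which the two systems' outputs differ byte for byte."""
    mismatches = []
    for test_input in test_inputs:
        old_bytes = old_system(test_input).encode("utf-8")
        new_bytes = new_system(test_input).encode("utf-8")
        if old_bytes != new_bytes:
            mismatches.append(test_input)
    return mismatches

inputs = [[3, 1, 2], [], [1, 1, 2]]
print(head_to_head(legacy_report, new_report, inputs))  # prints [[1, 1, 2]]
```

Note that no expected outputs had to be written down in advance: the existing system supplies them.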
1.1.2 Regression
The regression log or regression database is a collection of tests and expected outputs from past testing.
It is used, during regression testing, to quickly rerun old tests. Regression databases can quickly grow to thousands or tens of thousands of cases or more. It becomes particularly important that we not rely on the eyeball oracle for evaluating regression tests.
2 Stages of Testing
We recognize several different stages of testing. These differ in scope (how much of the program is involved) and purpose (who conducts the testing and what information do they derive from it).
- Unit Test: Tests of individual subroutines and modules,
  - usually conducted by the programmer.
- Integration Test: Tests of “subtrees” of the total project hierarchy chart (groups of subroutines calling each other).
  - generally a team responsibility.
- System Test: Test of the entire system,
  - supervised by team leaders or by V&V specialists.
  - Many companies have independent teams for this purpose.
- Regression Test: Unit/Integration/System tests that are repeated after a change has been made to the code.
- Acceptance Test: A test conducted by the customers or their representatives to decide whether to purchase/accept a developed system.
Testing goals
Focusing on the differing purposes of testing, …
- Unit Test: does it work?
- Integration Test: does it work?
- System Test: does it work?
- Regression Test: has it changed?
- Acceptance Test: should we pay for it?
Regression testing is particularly interesting. We regression test after a change to make sure we have not inadvertently broken anything else. In fact, we really are looking for unintended effects of our changes.
Regression logs commonly record tests that have been both passed and failed, and we want to be informed of changes in that status.
So, while most testing has possible outcomes “pass” or “fail”, regression testing has outcomes
- Expected pass: this test used to pass, and it still does
- Expected fail: this test used to fail, and it still does
- Unexpected pass: this test used to fail, but now it passes.
  - This can be a concern because, if we did not intend to fix that failure with our most recent changes, it suggests that the fault is still somewhere in the code but is now hidden. We no longer have a test case revealing the fault, so it’s lurking in there somewhere, just waiting to spring out at some particularly inopportune time.
- Unexpected fail: this test used to pass, and now it fails.
  - Sadly, very common. This means that while fixing one bug, we broke something else.
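The four regression outcomes amount to comparing each test's previous status with its current one. A minimal sketch, assuming a regression log that simply maps hypothetical test names to whether they passed last time:

```python
def classify(passed_before, passes_now):
    """Map a test's previous and current status to a regression outcome."""
    if passed_before:
        return "expected pass" if passes_now else "unexpected fail"
    return "unexpected pass" if passes_now else "expected fail"

# Hypothetical regression log: test name -> did it pass on the previous run?
regression_log = {"t1": True, "t2": False, "t3": False, "t4": True}
# Hypothetical results of rerunning the same tests after a change.
current_run = {"t1": True, "t2": False, "t3": True, "t4": False}

report = {
    name: classify(regression_log[name], current_run[name])
    for name in regression_log
}

# Only the unexpected outcomes need a human's attention.
for name, outcome in report.items():
    if outcome.startswith("unexpected"):
        print(f"{name}: {outcome}")
```

Reporting only the unexpected outcomes is one way to keep a log of thousands of tests reviewable without falling back on the eyeball oracle.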
3 Unit Testing
We’re going to spend a lot of time talking about unit testing this semester, so it deserves some special attention now.
By testing modules in isolation from the rest of the system:
- Easier to design and run extensive tests
- Much easier to debug any failures
- Errors caught much earlier
The main challenge is how to test in isolation.
3.1 Scaffolding
To do Unit tests, we have to provide replacements for parts of the program that we will omit from the test.
- Scaffolding is any code that we write, not as part of the application, but simply to support the process of Unit and Integration testing.
- Scaffolding comes in two forms:
  - Drivers
  - Stubs
3.1.1 Drivers
A driver is test scaffolding that calls the module being tested.
- Often just a simple main program that reads values, uses them to construct ADT values, applies ADT operations, and prints the results
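As a rough illustration of such a driver, here is a sketch in Python. The `IntStack` class is a hypothetical ADT standing in for the module under test, and the tiny command language (`push`, `pop`, `size`) is invented for this example.

```python
import io
import sys

class IntStack:
    """Hypothetical ADT under test."""
    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.pop()

    def size(self):
        return len(self._items)

def drive(lines, out=sys.stdout):
    """Driver: read commands, apply them to the ADT, print each result."""
    stack = IntStack()
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        command, args = words[0], words[1:]
        if command == "push":
            stack.push(int(args[0]))
            print(f"pushed {args[0]}", file=out)
        elif command == "pop":
            print(f"popped {stack.pop()}", file=out)
        elif command == "size":
            print(f"size {stack.size()}", file=out)
        else:
            print(f"unknown command: {command}", file=out)

# Demonstration with in-memory input; drive(sys.stdin) would read real input.
drive(io.StringIO("push 3\npush 7\npop\nsize\n"))
```

The driver contains no checking logic of its own. It only exercises the ADT and prints what happened, leaving the pass/fail decision to whatever oracle examines the output.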
3.1.2 Stubs
Stubs are replacements for code being called from the unit under test
- Must match the “real” API
- May need to provide simulated output parameters and return values
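A stub can be sketched along these lines. Everything here is hypothetical: `average_temperature` plays the unit under test, and `StubSensor` replaces an imagined real sensor class whose only relevant method is assumed to be `read_celsius()`.

```python
class StubSensor:
    """Stub standing in for a hypothetical real sensor class.

    It matches the one method the unit under test calls, read_celsius(),
    and returns canned values instead of touching hardware.
    """
    def __init__(self, canned_readings):
        self._readings = list(canned_readings)
        self.calls = 0  # lets a test confirm how often the stub was called

    def read_celsius(self):
        # Cycle through the canned readings as simulated return values.
        value = self._readings[self.calls % len(self._readings)]
        self.calls += 1
        return value

def average_temperature(sensor, samples):
    """Unit under test: averages `samples` readings from the sensor."""
    total = 0.0
    for _ in range(samples):
        total += sensor.read_celsius()
    return total / samples

stub = StubSensor([20.0, 22.0, 24.0])
print(average_temperature(stub, 3))  # prints 22.0
```

Because the stub's return values are fixed in advance, the expected output of the unit under test can be worked out by hand.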
4 Integration Testing
Integration testing is testing that combines several modules, but still falls short of exercising the entire program all at once.
- Integration testing usually combines a small number of modules that call upon one another.
- Integration testing can be conducted
  - bottom-up (start by unit-testing the modules that don’t call anything else, then add the modules that call those starting modules and test the combination, then add the modules that call those, and so on until you are ready to test main()).
    - relieves the need for stubs
  - or top-down (start by unit-testing main() with stubs for everything it calls, then replace those stubs with the real code, leaving in stubs for anything called from the replacement code, then replace those stubs, and so on, until you have assembled and tested the entire program).
- It’s worth noting that unit testing and integration testing can sometimes use some of the same test inputs (and maybe the same expected outputs), because we are testing the software in different configurations.
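The bottom-up and top-down orders described above can be derived mechanically from a call graph. The sketch below assumes a small, hypothetical call graph with invented module names and uses Python's standard `graphlib` module (Python 3.9 or later). It illustrates the ordering idea only, not any particular project's process.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical call graph: each module maps to the modules it calls.
calls = {
    "main": {"parse", "report"},
    "parse": {"read_line"},
    "report": {"format_row"},
    "read_line": set(),
    "format_row": set(),
}

# TopologicalSorter treats the mapped values as predecessors, so callees
# come out before their callers: a bottom-up integration order.
bottom_up = list(TopologicalSorter(calls).static_order())

# Reversing it puts callers before callees: a top-down order.
top_down = list(reversed(bottom_up))

print("bottom-up:", bottom_up)
print("top-down: ", top_down)
```

In the bottom-up order every module is added only after everything it calls has been tested, which is why no stubs are needed. In the top-down order each newly added module still needs stubs for the callees that come after it in the list.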