Black-Box Testing

Steven Zeil

Last modified: Dec 26, 2016

1 Testing

In this lesson, we explore how to do a thorough job of testing your programs.

If you’ve ever submitted a programming assignment that you thought was OK and then been surprised to receive a poor grade for it, then it’s a pretty good bet that the instructor had done a more rigorous job of testing your code than you had.

In the professional world, testing and debugging typically take far more time than actually writing the code. They may amount to as much as half the total development time with the requirements analysis, design, and coding taking up the remainder.

As a professional, it will be a mark of your professionalism and a factor in your company’s reputation to catch bugs before the software is released. As a student, it’s a matter of self-interest to do a good job of testing your code if you hope to get good grades.


Why do we test?

1.1 Definitions


Testing and Debugging

In practical terms, we don’t start debugging until we know that something is wrong. How do we find out that something is wrong? Hopefully, we failed one of our own tests. If that’s not how we found out, it’s probably because someone using our software reported a bug. That, of course, makes us look bad.

And if someone else reported the bug, then before we can start debugging we have to figure out how to reproduce the bug, which means we are right back to testing, trying to find test data that reveal the bug so that we can figure out what’s causing it.


Failures, Faults, and Errors


Test Cases and Suites

1.2 Choosing Test Data


Strategies for Choosing Tests

Testing traditionally can be conducted in three styles: black-box testing, white-box testing, and random testing.

We’ll discuss black-box testing today and white-box testing later this semester.

Random testing is not used as often. It sometimes is useful as a way to check against unwarranted assumptions by the developers and testers (“Oh, I never thought of even trying that!”). But its primary use is in large projects where statistics collected during random testing are plugged into models that are used to predict how reliable the software will be if released in its current state or how much longer we will need to test before we reach a desired level of reliability.

2 Black Box Testing


Black-Box Techniques

There are a number of well-known BB strategies, including representative inputs, functional coverage, boundary values, and special values.

A good BB test suite will include all of these.

In fact, these should become second nature to most programmers.

2.1 Representative Inputs

Choose at least one test that is a “typical” input to the system when in real use.


Example: the Auction Program

What would we use as “typical” or “representative” inputs to our auction program?

Well, the auction program description contains some sample input, and that’s certainly worth using, but it’s really not clear how representative that’s intended to be.

But there’s also mention of existing auction systems, so we might try to collect some sample sessions from some of those.

2.2 Test Specifications

Software development projects manage testing via a couple of standard documents:

We’ll demonstrate the development of a test specification document, following (roughly) the format of the IEEE 829 Standard for Software and System Test Documentation.


Starting the Auction Program Test Spec

1. Module Overview

The auction program resolves bids received at an online auction site.

1.1 Inputs Three files, one describing the items up for bid on a single day, another describing the registered bidders, a third describing bids received.

1.2 Outputs List of auction winners, written to standard output.

2. Test Data

Representative Input

That last item is our first test case.

2.3 Functional Coverage

This is a bit of a misnomer. The “function” in “Functional Coverage” has nothing to do with the idea of a C++ function. Instead it really pertains more to “functionality”.

If you look through most program requirements, you’ll see that the program is probably supposed to carry out a variety of different behaviors under various circumstances. Sometimes these behaviors are described in terms of input conditions:

For any input $x$, if $x \geq 0$, return $x$ but if $x < 0$, return $-x$.

Here we have clearly drawn a distinction between different behaviors depending upon the input values.
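As a minimal sketch of what covering these two behaviors might look like, here is one test per behavior. The function name `absoluteValue` and the particular test values are my own choices for illustration, not part of the requirement.

```cpp
#include <cassert>

// Hypothetical implementation of the requirement quoted above.
int absoluteValue(int x) {
    if (x >= 0) {
        return x;   // behavior 1: a non-negative input is returned unchanged
    }
    return -x;      // behavior 2: a negative input is negated
}

// One test for each distinct behavior named in the requirement.
void testFunctionalCoverage() {
    assert(absoluteValue(23) == 23);    // covers the x >= 0 behavior
    assert(absoluteValue(-17) == 17);   // covers the x < 0 behavior
}
```

Calling `testFunctionalCoverage()` from a test driver exercises each behavior once. It says nothing yet about what happens near the dividing line between the two.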

In other cases, distinct behaviors are most easily identified by looking at the output requirements:

This program reads a file name from the input. It should print a single line of output

filename file-size-in-kilobytes

if the file exists but should print

filename does not exist.

if the file cannot be found.

Here we can tell from the distinct output specifications that we have two separate behaviors going on.
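A rough sketch of this requirement and one test per behavior follows. Several details are assumptions on my part: the name `describeFile`, treating “cannot be opened for reading” as “does not exist”, rounding the size down to whole kilobytes of 1024 bytes, and the scratch file names used by the test.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: builds the single output line for a file name.
// Assumption: a file that cannot be opened for reading is reported as missing.
// Assumption: the size is rounded down to whole kilobytes (1024 bytes each).
std::string describeFile(const std::string& fileName) {
    std::ifstream in(fileName, std::ios::binary | std::ios::ate);
    if (!in) {
        return fileName + " does not exist.";
    }
    const std::streamoff bytes = in.tellg();  // opened at the end, so this is the size
    return fileName + " " + std::to_string(bytes / 1024);
}

// One test for each of the two behaviors in the requirement.
void testBothBehaviors() {
    const std::string present = "bbtest_present.tmp";  // scratch names, chosen arbitrarily
    const std::string missing = "bbtest_missing.tmp";
    {
        std::ofstream out(present, std::ios::binary);
        out << std::string(2048, 'x');                 // exactly 2 kilobytes
    }                                                  // stream closes here
    std::remove(missing.c_str());                      // make sure it really is absent

    assert(describeFile(present) == present + " 2");
    assert(describeFile(missing) == missing + " does not exist.");

    std::remove(present.c_str());                      // clean up the scratch file
}
```

Notice that the test has to set up its own environment (one file that exists, one that does not) before either behavior can be observed.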

Now, from a purely theoretical point of view, distinct behaviors must have both distinct input patterns and distinct output patterns. I could just as easily argue that, in the above example, the two behaviors are associated with the input cases of “the input named an existing file” and “the input named a non-existent file”. But sometimes the people writing the requirements will find it easier to focus on the input conditions and sometimes on the output cases. As a tester (and coder), you have to roll with whatever they give you.

Sometimes, a behavior may be described in purely internal terms:

Apply the formula

x = x - (x*x - N) / 2

up to 1000 times until x changes by no more than $\pm 0.001$. If, after 1000 iterations, x is still changing by more than that, abort the program with message

Calculation does not converge.

In this case, if x is not an input but some value computed internally, it might be quite difficult to figure out what inputs would lead to this “does not converge” behavior. Nonetheless, as a tester, you will want to try to do so.
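To make the two behaviors concrete, here is a small sketch of the iteration exactly as the requirement states it. The function name `iterate`, the explicit starting value, and reporting failure through a `bool` instead of aborting with a message are assumptions made so that both behaviors can be checked from a test.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the quoted requirement. Returns true and stores the final x in
// `result` if x settles to within 0.001; returns false (leaving `result`
// untouched) if x is still changing after 1000 iterations.
// Assumption: the starting value of x is passed in by the caller.
bool iterate(double N, double x, double& result) {
    const int maxIterations = 1000;
    const double tolerance = 0.001;
    for (int i = 0; i < maxIterations; ++i) {
        const double next = x - (x * x - N) / 2;
        const double change = std::fabs(next - x);
        x = next;
        if (change <= tolerance) {
            result = x;
            return true;    // the "converged" behavior
        }
    }
    return false;           // the "Calculation does not converge." behavior
}

// One test for each behavior. Finding the second input took some
// experimenting: with N = 100 the values of x run away instead of settling.
void testIterationBehaviors() {
    double r = 0.0;
    assert(iterate(2.0, 1.0, r));
    assert(std::fabs(r * r - 2.0) < 0.01);   // settled near the square root of 2
    assert(!iterate(100.0, 1.0, r));
}
```

Even in this tiny example, the input that triggers the second behavior is not obvious from reading the requirement. That is typical of behaviors described in internal terms.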


Functional Coverage

Choose at least one test that covers each distinct “behavior” described in the requirements.

Don’t make the common mistake of assuming that different parts of a program’s calculations or outputs are distinct behaviors. For example, given a requirement

This program reads a file name from the input.

It should print that file name.

If the file exists, it should then print the size of the file (in kilobytes), on the same line.

If the file does not exist, it should print “ does not exist.” on the same line as the file name.

We only have two distinct behaviors here, not three. The printing of the file name is not a distinct behavior from the behavior of printing the file size or the “does not exist” message. It’s simply a component of the behavior in each of the other two cases. How do we know? Because there is no distinct set of inputs on which file names are (and are not) printed.


Example: Auction Functional Coverage

Question:

What are the distinct behaviors associated with the auction program?


Continuing the Auction Program Test Specification

2. Test Data

Representative Input

Functional Coverage

  1. Highest bid for some item qualifies to win.

  2. Highest bid for some item is too late but otherwise qualifies to win.

  3. Highest bid for some item is below the reserve price but otherwise qualifies to win.

  4. Highest bid for some item is above the bidder’s account balance but otherwise qualifies to win.

  5. No qualifying bids received for some item

Note: many test cases can be satisfied by a single test run on one set of inputs. That’s OK, as long as they don’t interfere with one another.

For example, by packing lots of auctions into one set of input files, we might well be able to cover all of the above cases with one test run.

Now whether or not that’s a good idea is another matter. If you carry it to an extreme, you wind up with tests that are hard to check. And if something goes wrong and you have to start debugging, you won’t be happy if you have to wade through 99 correctly handled behaviors to get to buggy number 100.
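The alternative, one small test per functional case, might be sketched like this. Everything here is hypothetical: the `Item` and `Bid` structures, the `findWinner` function, and especially the rule it implements (the highest bid that is on time, meets the reserve, and fits within the bidder’s balance wins). The real auction requirements may differ in any of these details.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical data model; amounts are in cents, times in seconds.
struct Item { std::string name; long reserveCents; long endTime; };
struct Bid  { std::string bidder; std::string item; long amountCents; long time; };
using Balances = std::map<std::string, long>;

// Assumed rule: the highest qualifying bid wins.
// Returns the winner's name, or an empty string if no bid qualifies.
std::string findWinner(const Item& item, const std::vector<Bid>& bids,
                       const Balances& balances) {
    std::string winner;
    long best = -1;
    for (const Bid& b : bids) {
        const auto account = balances.find(b.bidder);
        const bool qualifies = b.item == item.name
            && b.time <= item.endTime
            && b.amountCents >= item.reserveCents
            && account != balances.end()
            && b.amountCents <= account->second;
        if (qualifies && b.amountCents > best) {
            best = b.amountCents;
            winner = b.bidder;
        }
    }
    return winner;
}

// One short check per functional case, numbered as in the test specification.
void testAuctionFunctionalCoverage() {
    const Item lamp{"lamp", 1000, 3600};
    const Balances balances{{"ann", 5000}, {"bob", 5000}};

    const std::vector<Bid> case1{{"ann", "lamp", 2000, 100}, {"bob", "lamp", 3000, 200}};
    assert(findWinner(lamp, case1, balances) == "bob");   // 1. highest bid qualifies

    const std::vector<Bid> case2{{"ann", "lamp", 2000, 100}, {"bob", "lamp", 3000, 4000}};
    assert(findWinner(lamp, case2, balances) == "ann");   // 2. highest bid is too late

    const std::vector<Bid> case3{{"ann", "lamp", 500, 100}, {"bob", "lamp", 900, 200}};
    assert(findWinner(lamp, case3, balances).empty());    // 3. highest bid below reserve

    const std::vector<Bid> case4{{"ann", "lamp", 2000, 100}, {"bob", "lamp", 6000, 200}};
    assert(findWinner(lamp, case4, balances) == "ann");   // 4. highest bid above balance

    const std::vector<Bid> case5{};
    assert(findWinner(lamp, case5, balances).empty());    // 5. no qualifying bids
}
```

When a check fails here, its case number points straight at the behavior that is broken, which is exactly what a single giant input file cannot do.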

2.4 Boundary Values Testing

A.k.a., Extremal Values Testing

Choose as test data the largest and the smallest values for each input and for each “functional behavior” range.


Seeking Boundaries


Sample Boundary Tests

Suppose that we are testing an implementation of the absolute value function abs(x)

The test cases $0$ and $-1$ are useful because they give us a chance of detecting common errors like using the wrong relational operator.

Notice that, if we have done a thorough job of listing the functional cases for a program, then we will get the “largest and smallest possible inputs” cases automatically when we list out the largest and smallest cases for each distinct behavior.
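Here is a sketch of boundary tests for an absolute value function, together with a deliberately faulty variant that a boundary test catches but a more “typical” test misses. Both function names are hypothetical. I have also assumed that the most negative `int` is not a legal input, because its absolute value cannot be represented, so the smallest input tested is the negation of the largest.

```cpp
#include <cassert>
#include <limits>

// Hypothetical implementation under test.
int absoluteValue(int x) {
    return (x >= 0) ? x : -x;
}

// Deliberately faulty variant: the boundary comparison is off by one.
int faultyAbsoluteValue(int x) {
    return (x >= -1) ? x : -x;
}

// Smallest and largest values for each of the two behavior ranges.
void testBoundaryValues() {
    const int largest = std::numeric_limits<int>::max();
    assert(absoluteValue(0) == 0);              // smallest non-negative input
    assert(absoluteValue(largest) == largest);  // largest non-negative input
    assert(absoluteValue(-1) == 1);             // negative input closest to zero
    assert(absoluteValue(-largest) == largest); // most negative input assumed legal
}
```

The faulty variant returns the right answer for an input like -17, so a test suite without boundary cases would pass it. Only the test at -1 exposes the problem.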


Example: Auction Extremal Values

We had functional cases


Example: Auction Extremal Values (cont.)

We also have some explicit limits on the legal inputs, e.g.:

This leads to some additional boundary cases:


Continuing the Auction Program Test Specification

2. Test Data

Representative Input

Functional Coverage

  1. Highest bid for some item qualifies to win.

  2. Highest bid for some item is too late but otherwise qualifies to win.

  3. Highest bid for some item is below the reserve price but otherwise qualifies to win.

  4. Highest bid for some item is above the bidder’s account balance but otherwise qualifies to win.

  5. No qualifying bids received for some item

Boundary Values

  1. Winning bid arrives at same second as auction ends

  2. Winning bid arrives one second after auction ends

  3. Winning bid exactly matches reserve price

  4. Winning bid one cent below reserve price

  5. Winning bid exactly matches bidder’s account balance

  6. Winning bid one cent above bidder’s account balance

  7. Number of items is zero

  8. Number of items is large

  9. Reserve price is zero

  10. Reserve price is large

  11. Auction end time 00:00:00

  12. Auction end time 23:59:59

  13. # bidders is zero

  14. # bidders is large
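The first six boundary cases come in pairs, one on each side of a limit. A sketch of how they might be checked is below. The `Bid` and `Limits` structures and the `qualifies` predicate are hypothetical, and the choice of `<=` and `>=` at each limit is my assumption about which side of the boundary is acceptable. The actual requirements would have to settle that.

```cpp
#include <cassert>

// Hypothetical data model; amounts are in cents, times in seconds since midnight.
struct Bid    { long amountCents; long timeSeconds; };
struct Limits { long reserveCents; long balanceCents; long endTimeSeconds; };

// Assumed rule: a bid qualifies if it arrives no later than the auction end,
// is at least the reserve price, and is no more than the bidder's balance.
bool qualifies(const Bid& bid, const Limits& limits) {
    return bid.timeSeconds <= limits.endTimeSeconds
        && bid.amountCents >= limits.reserveCents
        && bid.amountCents <= limits.balanceCents;
}

// Boundary cases 1 through 6: each limit is probed exactly at, and just past, its edge.
void testAuctionBoundaries() {
    const Limits limits{1000, 5000, 43200};  // reserve $10.00, balance $50.00, ends at noon

    assert( qualifies({2000, 43200}, limits));  // 1. arrives the same second the auction ends
    assert(!qualifies({2000, 43201}, limits));  // 2. arrives one second after the end
    assert( qualifies({1000, 100},   limits));  // 3. exactly matches the reserve price
    assert(!qualifies({999,  100},   limits));  // 4. one cent below the reserve price
    assert( qualifies({5000, 100},   limits));  // 5. exactly matches the account balance
    assert(!qualifies({5001, 100},   limits));  // 6. one cent above the account balance
}
```

If the code under test had used `<` where the requirements call for `<=`, only the “exactly at the limit” member of each pair would fail, which is why both sides of every boundary get a test.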

2.5 Special Values Testing

Choose as test data those certain data values that just tend to cause trouble. Programmers eventually develop a sense for these. They include


What is Special?

Special values and Boundary values often overlap. That’s OK.

One of my favorite stories about bugs that should have been caught by boundary and special values testing comes from early testing of the software controllers for the F-16 jet fighter. During flight simulations (thankfully!), one pilot decided to fly the simulated plane across the equator. A sign error in handling southern latitudes caused the control software to decide that the plane was upside down, and to attempt to roll the plane over to compensate.


Example: Auction Special Cases

Special Values

  1. Number of items is zero

  2. Number of items is one

  3. Item name has no alphabetic characters

  4. Item reserve price is zero

  5. Auction ends at midnight

  6. Auction ends at noon

  7. Number of bidders is zero

  8. Number of bidders is one

  9. Bidder’s balance is zero

  10. Bidder’s name has no alphabetic characters

  11. Number of bids is zero

  12. Number of bids is one

  13. Amount bid is zero

  14. Time of bid is midnight

  15. Time of bid is noon

2.6 Completed Test Specification

After eliminating duplicates from the preceding lists, we have:

Test Specification Auction Program

1. Module Overview

The auction program resolves bids received at an online auction site.

1.1 Inputs Three files, one describing the items up for bid on a single day, another describing the registered bidders, a third describing bids received.

1.2 Outputs List of auction winners, written to standard output.

2. Test Data

Representative Input

  1. At least one typical input as described in the requirements document.

Functional Coverage

  1. Highest bid for some item qualifies to win.

Boundary Values

  1. Winning bid arrives at same second as auction ends

Special Values

  1. Number of items is one

3 Illegal Inputs

Earlier, we observed that the input requirement:

#items is non-negative

led to a boundary value test case:

I did not add a test case

If we have an input requirement

#items is non-negative

then the input #items == -1 is illegal.

Notice that I did not say “should not”. I said “cannot”. It can’t be done.

Therefore …

You may have been told something differently by other teachers or read something different elsewhere.

I don’t care.

They are either wrong, or you misunderstood what they were talking about.

There are people who make statements like “a good program never crashes” or “a good program should always behave sensibly on any input”. Setting aside the fact that the first is impossible and the second relies on a definition of “sensible” that no two people will ever agree upon, these kinds of statements are really aimed at how to write a good requirements statement for programs. They have nothing at all to do with how we code programs or how we test them.

3.1 Why CAN’T we test illegal inputs?

On an illegal test input, no matter what the program actually does, it passes the test by definition.

3.2 Illegal versus Unusual Inputs

Don’t confuse illegal inputs with unusual inputs.

3.3 How do you test a program on …

4 Some General Guidelines for Testing

4.1 Choosing the Test Data

Within the limits of the test case specification,

4.2 Conducting Tests

Many of these are covered in the earlier labs.