Refactoring & Regression Testing
Thomas J. Kennedy
1 Looking Back…
In a lecture during Module 3 we discussed a paint example, i.e., Module-03/Painting-2. We would like to refactor the code… and fix a few implementation and style quirks.
2 Adding Tests
Before we refactor the example… let us write a few tests. This is an opportunity to introduce regression testing. We are going to…
-
Develop a set of tests for the current code.
-
Refactor the code.
-
Run the tests and confirm that nothing has gone wrong.
-
Repeat step 2 and step 3.
Let us start by introducing two test files.
├── compute_paint.py
├── estimate_paint.py
└── tests
├── test_compute_paint.py
└── test_estimate_paint.py
We have one test file for compute_paint.py
and one file for estimate_paint.py
.
2.1 Writing the Tests
Let us start with the compute_paint
module. There are two functions that need tests:
def wall_surface_area(length: float, width: float) -> float:
area_one_wall = length * width
area_four_walls = 4 * area_one_wall
and
def gallons_required(
wall_area: float, min_coverage: float = 350, max_coverage: float = 400
) -> tuple[int, int]:
Let us start with wall_surface_area
…
import pytest
from hamcrest import *
from compute_paint import (wall_surface_area, gallons_required)
We need to import:
-
pytest
as our testing framework -
hamcrest
for matchers (to write our checks) -
wall_surface_area
andgallons_required
so that we can use (call/invoke) them
2.2 Introducing Hamcrest Matchers
Each test takes the form of a function with assertions. Instead of the built-in Python assert
or unittest
module’s assertTrue
or assertFalse
… we are using assert_that
.
If you come from a Java background you may be familiar with Hamcrest Matchers. Instead of writing a boolean expression in the form…
assertTrue(num_books == 20)
we would like to write something closer to a sentence…
assert_that(num_books, equal_to(20))
or
assert_that(num_books, is_(equal_to(20)))
The is_
is purely syntactic sugar (i.e., it exists purely for readability).
assert_that(wall_surface_area(10, 12), close_to(480, 1e-1))
assert_that(wall_surface_area(10.5, 10), close_to(420, 1e-1))
2.3 Back to Writing Tests
We know that wall_surface_area
-
takes the length and width of a room
-
returns the total surface area of all four walls
-
uses
4 * length * width
to compute the surface area
Writing tests is a fairly quick endeavor…
def test_wall_surface_area():
assert_that(wall_surface_area(1, 1), equal_to(4))
assert_that(wall_surface_area(1, 2), equal_to(8))
assert_that(wall_surface_area(2, 2), equal_to(16))
However, wall measurements are seldom a nice whole numbers. Walls are usually something closer to 8 feet 5 inches than an even 8 feet. Let us write a couple floating point number checks.
assert_that(wall_surface_area(10, 12), close_to(480, 1e-1))
assert_that(wall_surface_area(10.5, 10), close_to(420, 1e-1))
Take note of how we are checking for a value within a certain tolerance (e.g., a number that is within 0.1
of 480
instead of exactly 480). Keep in mind that 1e-1
is scientific notation for 0.01
.
2.4 Parametrizing the Test
Each check is the same with the exception of the input (length and width) and expected result (total surface area). We can define a three-tuple (3-tuple) in the form…
test_data = [
(1, 1, 4),
(1, 2, 8),
(2, 2, 16),
(10, 12, 480),
(10.5, 10, 420),
]
where the first entry is length, the second entry is width, and the final entry is the expected surface area.
We can then tell pytest
to run the test once for each tuple using…
@pytest.mark.parametrize("length, width, surface_area", test_data)
def test_wall_surface_area(length, width, surface_area):
assert_that(wall_surface_area(length, width), close_to(surface_area, 1e-1))
The first line (@pytest.mark.parametrize(...)
) tells pytest
to how to unpack each tuple. The function itself then needs to be modified to accept three arguments.
We can now get away with a single assertion.
A parametrized test is ideal when you want to rerun the same test for different inputs.
2.5 Testing the Second Function
We still need to test gallons_required
. Let us start with a draft…
def test_gallons_required():
lower, upper = gallons_required(wall_area=1)
assert_that(lower, equal_to(1))
assert_that(upper, equal_to(1))
lower, upper = gallons_required(wall_area=10)
assert_that(lower, equal_to(1))
assert_that(upper, equal_to(1))
lower, upper = gallons_required(wall_area=100)
assert_that(lower, equal_to(1))
assert_that(upper, equal_to(1))
lower, upper = gallons_required(wall_area=1_000)
assert_that(lower, equal_to(3))
assert_that(upper, equal_to(3))
lower, upper = gallons_required(wall_area=1_050)
assert_that(lower, equal_to(3))
assert_that(upper, equal_to(3))
lower, upper = gallons_required(wall_area=1_200)
assert_that(lower, equal_to(3))
assert_that(upper, equal_to(4))
We start each check by invoking gallons_required
and unpacking the returned tuple
into lower
and upper
.
However, we need two checks. Separate checks would make sense if we were dealing with float
values. But… gallons_required
returns a tuple[int,
int]
(i.e., two int
values).
We can simplify our test_gallons_required
function to…
def test_gallons_required():
assert_that(gallons_required(wall_area=1), equal_to((1, 1)))
assert_that(gallons_required(wall_area=10), equal_to((1, 1)))
assert_that(gallons_required(wall_area=100), equal_to((1, 1)))
assert_that(gallons_required(wall_area=1_000), equal_to((3, 3)))
assert_that(gallons_required(wall_area=1_050), equal_to((3, 3)))
assert_that(gallons_required(wall_area=1_200), equal_to((3, 4)))
The trick is recognizing the fact that tuple
s can be compared directly in Python.
3 Testing Strings
The tests for test_estimate_paint.py
will require a different perspective. We will need to keep in mind:
-
get_report
can generate two (2) different reports depending on whethermin_gallons == max_gallons
. -
comparing two entire strings (expected/correct vs actual) is seldom the best approach.
-
the content of the generated string along with the order of that content should
usuallyalmost always form the basis of any test.
3.1 Checking the First Case
Let us start by examining the first case (i.e., min_gallons == max_gallons
).
actual_report = get_report(min_gallons=4, max_gallons=4, price_per_gallon=35.10)
The correct output is known to be…
You will need to buy 4 gallons of paint.
You will spend $ 140.40.
One might start by saying…
-
4
must appear since that is the correct number of gallons.assert_that(actual_report, contains_string("4"))
-
140.40
must appear since that is the correct cost.assert_that(actual_report, contains_string("140.40"))
However, that does not really capture what we expect… we expect to see a 4
and then (later in the report) 140.40
.
assert_that(actual_report, string_contains_in_order("4", "$ 140.40."))
That is a better check. However, we know the content of the report…
assert_that(
actual_report,
string_contains_in_order(
"You will need to buy "
"4",
" gallons of paint.",
"\n",
"You will spend",
"$ 140.40."
)
)
We should (in this case) since the report content is known check for the values that we expect along with the fixed content (including line breaks).
3.2 Checking the Second Case
The second case can be checked with…
def test_get_report_different_min_max():
actual_report = get_report(min_gallons=10, max_gallons=14, price_per_gallon=32.50)
assert_that(
actual_report,
string_contains_in_order(
"You will need to buy "
"10 to 14",
" gallons of paint.",
"\n",
"You will spend",
"$ 325.00 to $ 455.00."
)
)
Take note of how different values were selected. While we could parametrize this test and check for different values… this function (get_report
) does not compute min_gallons
or max_gallons
. The values are passed in. We just need to check the two branches (i.e., if
and else
).
4 Taking Stock
The two complete test files can be found in Module-08/Painting-3-Tests within the tests directory.
The next lecture will start the actual refactoring and code review.