NumPy & Multi-Dimension Arrays - Refactoring
Thomas J. Kennedy
1 Where Did We Leave Off?
In the previous lecture we implemented a matrix solver using NumPy. We left off with a discussion of the main
function.
def main():
# Set up input data points, X, Y, and XT
points = [(0., 0.), (1., 1.), (2., 4.)]
# Set up X, Y, and XT matrices
matrix_X = np.array([[1., 0., 0.],
[1., 1., 1.],
[1., 2., 4.]])
matrix_Y = np.array([0,
1,
4])
matrix_XT = matrix_X.transpose()
# Compute XTX and XTY
matrix_XTX = np.matmul(matrix_XT, matrix_X)
matrix_XTY = np.matmul(matrix_XT, matrix_Y)
print_matrices(matrix_XTX, matrix_XTY)
print()
print("{:-^40}".format("Solution"))
solution = solve_matrix(matrix_XTX, matrix_XTY)
print(solution)
2 Where to Start?
I would like to start with points
, matrix_X
, and matrix_Y
. The convention here is not truly snake case. However, the X
and Y
are the names used in mathematical notations. The compromise you see here is to prefix them both with matrix_
.
So… the names will be left alone. However, points
is an issue. While the points captured in the list of tuples do serve as the source of our data… they are never actually used. We can fix that by changing
# Set up input data points, X, Y, and XT
points = [(0., 0.), (1., 1.), (2., 4.)]
# Set up X, Y, and XT matrices
matrix_X = np.array([[1., 0., 0.],
[1., 1., 1.],
[1., 2., 4.]])
to
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
x_values = [x for x, _ in points]
x_values = np.array(x_values)
x_values_squared = x_values ** 2
matrix_X = np.column_stack((np.ones(len(points)), x_values, x_values_squared))
The first change was to each point. I believe that .0
is more natural than just a decimal point.
The first real change was to how matrix_X
is defined:
-
x_values
starts of a list comprehension to grab just thex
value of each point. -
x_values
is turned into annp.array
-
x_values_squared
uses NumPy’s broadcast functionality to square each entry. -
np.ones(len(points))
creates annp.array
with three (3) ones (which matches the number of points). -
np.ones(len(points))
,x_values
. andx_values_squared
are placed into a tuple -
The tuple is passed to
cp.column_stack
which uses the three arrays as column zero, column one, and column two, respectively.
The Y
matrix is a one liner.
matrix_Y = np.array([y for _, y in points])
This time a list comprehension was used to grab the y value of each point.
We could factor this setup out into a get_matrices
function. However, we do want to refactor more than just main
.
2.1 The main
Function So Far
Let us take a moment to examine main
with the changes made thus far.
def main():
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
# Compute X
x_values = [x for x, _ in points]
x_values = np.array(x_values)
x_values_squared = x_values ** 2
matrix_X = np.column_stack((np.ones(len(points)), x_values, x_values_squared))
# Compute Y
matrix_Y = np.array([y for _, y in points])
matrix_XT = matrix_X.transpose()
# Compute XTX and XTY
matrix_XTX = np.matmul(matrix_XT, matrix_X)
matrix_XTY = np.matmul(matrix_XT, matrix_Y)
print_matrices(matrix_XTX, matrix_XTY)
print()
print("{:-^40}".format("Solution"))
solution = solve_matrix(matrix_XTX, matrix_XTY)
print(np.polynomial.Polynomial(solution))
Make note of the last line. The use of np.polynomial.Polynomial
was mentioned at the end of the previous lecture.
2.2 Merging the Matrices
While there is quite a bit of refactoring to perform in general. I would like to rewrite the code to work with a single augmented matrix (that is one where matrix_XTY
is added as the last column of matrix_XTX
).
matrix_XTY = matrix_XTY.reshape(matrix_XTX.shape[0], 1)
matrix_augmented = np.hstack((matrix_XTX, matrix_XTY))
These two lines are tricky. We want to take matrix_XTY
which contains three float
s…
[ 5. 9. 17.]
and turn it into a list
of list
s that each contain one float
…
[[ 5.]
[ 9.]
[17.]]
The reshape line takes care of this by getting the number of rows in matrix_XTX
and reshaping matrix_XTY
from one row of three values into three rows which each contain one (1) value.
The second line uses np.hstack
to append the corresponding entries to each row of matrix_XTX
. This yields our final main
(for now)…
def main():
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
# Compute X
x_values = [x for x, _ in points]
x_values = np.array(x_values)
x_values_squared = x_values ** 2
matrix_X = np.column_stack((np.ones(len(points)), x_values, x_values_squared))
# Compute Y
matrix_Y = np.array([y for _, y in points])
matrix_XT = matrix_X.transpose()
# Compute XTX and XTY
matrix_XTX = np.matmul(matrix_XT, matrix_X)
matrix_XTY = np.matmul(matrix_XT, matrix_Y)
print_matrices(matrix_XTX, matrix_XTY)
matrix_XTY = matrix_XTY.reshape(matrix_XTX.shape[0], 1)
matrix_augmented = np.hstack((matrix_XTX, matrix_XTY))
print(matrix_augmented)
solution = solve_matrix(matrix_augmented)
print()
print("{:-^40}".format("Solution"))
print(np.polynomial.Polynomial(solution))
Of course… now we need to rewrite solve_matrix
to work with a single augmented matrix.
3 Updating the Solver
Now… this is the fun part. The solve_matrix
function needs be rewritten to work with an augmented matrix. That means…
def solve_matrix(matrix_XTX, matrix_XTY):
"""
Solve a matrix and return the resulting solution vector
"""
# Get the dimensions (shape) of the XTX matrix
num_rows, num_columns = matrix_XTX.shape
needs have three immediate changes made.
-
The function must accept a single augmented matrix.
def solve_matrix(matrix_augmented: np.array) -> np.array:
Take note of the addition of type hints. The solve function both accepts an
np.array
and returns annp.array
. -
The docstring must be rewritten.
""" Solve a matrix and return the resulting solution vector Args: matrix_augmented: an n-by-n matrix with a vector augmented in the right-most column Returns: constants c_0, c_1, c_2, ... c_n depending on the number of rows in the supplied matrix """
-
We need to retrieve the number of rows in the matrix, but we can ignore the number of columns.
# Get the number of rows in the matrix num_rows, _ = matrix_augmented.shape
3.1 The Solver Loop
Let us rewrite the solver loop to use current_row_idx
instead of i
as the loop counter.
for current_row_idx in range(0, num_rows):
# Find column with largest entry
largest_idx = find_largest_row_by_col(matrix_augmented, current_row_idx)
# Swap
current_col = current_row_idx
swap_rows(matrix_augmented, largest_idx, current_col)
# Scale
scale_row(matrix_augmented, current_row_idx)
# Eliminate
eliminate(matrix_augmented, current_row_idx)
_backsolve(matrix_augmented)
I also took the opportunity to update each function call within the loop. Note that the functions invoked (called) within the loop still need to be updated. This change temporarily breaks the code.
3.1.1 Updating swap_rows
When we run the code (in its current form) the following error is generated:
TypeError: swap_rows() missing 1 required positional argument: 'i'
While updating swap rows could be our next step… I want to look at the preceding lines of code…
for current_row_idx in range(0, num_rows):
# Find column with largest entry
largest_idx = find_largest_row_by_col(matrix_augmented, current_row_idx)
# Swap
current_col = current_row_idx
swap_rows(matrix_augmented, largest_idx, current_col)
These first four lines (not including the blank line) are really one operation. Both largest_idx
and current_col
are used exclusively as arguments to swap_rows
.
Let us combine these operations into a single new function named swap_current_row_with_largest_row
.
def swap_current_row_with_largest_row(matrix: np.array, current_idx: int) -> None:
"""
Find the row (starting with the current row) with the largest entry in the
column with the same index as the current row (e.g., matrix[1,1]). Consider
only rows below the current one.
Args:
matrix: augmented matrix to update
current_idx: current row (and column) index
"""
Let us make a point of writing pydoc documentation and type hints for any new or updated functions.
The first part of the function will be an updated version of the find largest column entry logic.
num_rows, _ = matrix.shape
# Find the row with the largest column entry
row_idx = current_idx
largest_idx = row_idx
current_col = current_idx
for j in range(row_idx + 1, num_rows):
if matrix[largest_idx, row_idx] < matrix[j, current_col]:
largest_idx = j
For the moment… we will stick with a loop. In a future tweak… this loop will be replace with some NumPy magic. The next step is to add the swap logic.
# If the current row is not the largest row then swap
if largest_idx != current_idx:
matrix[[current_idx, largest_idx], :] = matrix[[largest_idx, current_idx], :]
With those changes… our solver loop no longer requires comments… since the function calls are now self documenting.
for current_row_idx in range(0, num_rows):
swap_current_row_with_largest_row(matrix_augmented, current_row_idx)
scale_row(matrix_augmented, current_row_idx)
eliminate(matrix_augmented, current_row_idx)
3.1.2 Updating scale_row
The scale_row
method can be updated fairly quickly.
def scale_row(matrix: np.array, current_row_idx: int) -> None:
"""
Scale every entry of the current row by the value of the corresponding
column (e.g., matrix[2,2])
"""
scaling_factor = matrix[current_row_idx, current_row_idx]
matrix[current_row_idx, :] /= scaling_factor
The function itself (barring the renamed variables) is same as before, but with a single scale operation instead of two.
3.1.3 And Now… eliminate
The eliminate
function can now be written as…
def eliminate(matrix: np.array, current_row_idx: int) -> None:
"""
Subract multiples of the current rows from all rows below it. Once this
function completes all rows below this one will contain zero in the
"current_row_idx" column
"""
num_rows, _ = matrix.shape
for row_i in range(current_row_idx + 1, num_rows):
scaling_factor = matrix[row_i][current_row_idx]
matrix[row_i] = matrix[row_i] - scaling_factor * matrix[current_row_idx]
Take note of the variables that were renamed. Let us be sure to replace any single letter variable names with more descriptive identifiers.
3.1.4 Finally… _backsolve
The last function is _backsolve
.
def _backsolve(matrix: np.array) -> None:
"""
Back solve the matrix by performing the necessary row scale and subtraction
operations to obtain a diagonal matrix with ones on the diagonal.
The augmented column will contain the solution.
Args:
matrix: augmented matrix
"""
num_rows, _ = matrix.shape
for i in reversed(range(1, num_rows)):
for j in reversed(range(0, i)):
scaling_factor = matrix[j, i]
matrix[j, i] -= scaling_factor * matrix[i, i]
matrix[j, -1] -= scaling_factor * matrix[i, -1]
Since this function starts with an upper-triangular matrix… we know that for each row every column to the left of index “i
” is zero. This means we only need to subtract column “i
” and the augmented column from each row above row “i
”.
We can actually combine the two lines using NumPy magic.
matrix[j, [i, -1]] -= scaling_factor * matrix[i, [i, -1]]
The matrix[j, [i, -1]]
specifies that we want to retrieve only column “i
” and the last column (i.e., -1
) from row “j
”.
3.2 Back to the Solve Loop
We need to make one final edit to the solve_matrix
function. Yes… I know that the return happens after the loop. But… I needed a section heading (naming things is hard).
The previous return statement no longer works.
return matrix_augmented
We do not want to return the entire matrix. We only need the last column…
return matrix_augmented[:, -1].flatten()
Notice the .flatten()
? We do not want an array of arrays, e.g.,
[[0], [0], [1]]
We want to remove the inner level or arrays and end up with…
[0, 0, 1]
3.3 Where is the Code?
The refactored code can be accessed directly here. You will notice a small change in the output logic.
print("{:*^40}".format("XTX"))
print(matrix_XTX)
print()
print("{:*^40}".format("XTY"))
print(matrix_XTY)
print()
print("{:*^40}".format("XTX|XTY"))
print(matrix_augmented)
solution = solve_matrix(matrix_augmented)
print()
print("{:-^40}".format("Solution"))
print(np.polynomial.Polynomial(solution))
The print_matrices
function has been removed. All output logic (which now includes the augmented matrix) happens directly in main
.
Current Output
******************XTX******************* [[ 3. 3. 5.] [ 3. 5. 9.] [ 5. 9. 17.]] ******************XTY******************* [[ 5.] [ 9.] [17.]] ****************XTX|XTY***************** [[ 3. 3. 5. 5.] [ 3. 5. 9. 9.] [ 5. 9. 17. 17.]] ----------------Solution---------------- 0.0 + 0.0·x + 1.0·x²
4 Where is the NumPy Magic?
There is still a bit of tweaking left. There are a few places where NumPy can both simplify the code and make the code more performant. However, that will be covered in the next lecture.