NumPy & Multi-Dimension Arrays

Thomas J. Kennedy

Contents:

1 Perspective

Matrices are often used to solve systems of simultaneous linear equations. In Linear Algebra… Gaussian Elimination is one of the first techniques taught. Note that while this discussion focuses on a matrix solver… you are not expected to have a Linear Algebra background.

The main function is omitted intentionally. One of the lessons, alongside NumPy, will be on refactoring code. The main function will be introduced near the end of this lecture.

2 Pseudocode

The pseudocode for a matrix solver can be broken into four (4) pieces.

  1. Pivot
  2. Scale
  3. Eliminate
  4. Backsolve

Note that pivot is included in the driver function.

Example 1: Top Level Pseudocode
# Let `matrix_A` be an n by n matrix with an augmented vector in col n
def solve(matrix_A):

    # Iterate through all rows
    for every i in 0 to n-1:

        # Pivot
        idx <- find_largest_row_by_col(A, col_index, num_rows)
        swap_row(A, i, idx)

        scale(A, i, num_cols, A[i][i])
        A[i][i] = 1

        eliminate(A, i, num_rows)
Example 2: Scale
def scale_row(A, row_idx, num_cols, s):

    for every j in 0 to num_cols:
        A[row_idx][j] = A[row_idx][j] / s

Example 3: Eliminate
def eliminate(A, src_row_idx, num_cols, num_rows):

    start_col <- src_row_idx

    for every i in (src_row_idx + 1) to num_rows:

        s <- A[i][start_col]

        for every j in start_col to num_cols:
            A[i][j] = A[i][j] - s * A[src_row_idx][j]

        A[i][start_col] = 0

Example 4: Backsolve
def back_solve(matrix):

    augColIdx <- matrix.cols()  # the augmented column
    lastRow = matrix.rows() - 1

    for i in lastRow down to 1:
        for j <- (i - 1) down to 0:
            s <- matrix[j][i]

            matrix[j][i] -= (s * matrix[i][i])
            matrix[j][augCol] -= (s * matrix[i][augColIdx])

3 Implementing the Solver

The initial implementation will use np.array objects. However, the implementation will forgo most NumPy functions, relying heavily on vanilla Python.

3.1 What About Best Practices?

The initial implementation will be rough or crude. It will work. It will get the job done. The implementation will need a lot of clean up. The follow-up implementations will incorporate Python conventions with some NumPy functionality.

3.2 The Driver

The first piece of pseudocode can be written, in Python, in a solve_matrix function…

def solve_matrix(matrix_XTX, matrix_XTY):
    """
    Solve a matrix and return the resulting solution vector
    """

    # Get the dimensions (shape) of the XTX matrix
    num_rows, num_columns = matrix_XTX.shape

    # Placeholder until the interfaces for each function are defined
    for i in range(0, num_rows):
        pass

Note the use of the .shape tuple to retrieve the number of rows and columns in matrix_XTX.

3.3 Pivot

The pivot operation can be completed in two steps:

  1. look down the current column and find the row with the largest value in that column
  2. swap the largest row with the current row

The pseudocode, coincidentally, is also two lines…

    idx <- find_largest_row_by_col(A, col_index, num_rows)
    swap_row(A, i, idx)

A naive implementation is more than two lines.

def find_largest_row_by_col(matrix, col_index):
    num_rows, _ = matrix.shape

    i = col_index
    largest_idx = i
    current_col = i
    for j in range(i + 1, num_rows):
        if matrix[largest_idx, i] < matrix[j, current_col]:
            largest_idx = j

    return largest_idx


def swap_rows(matrix_XTX, matrix_XTY, largest_idx, i):
    if largest_idx != i:
        matrix_XTX[[i, largest_idx], :] = matrix_XTX[[largest_idx, i], :]
        matrix_XTY[[i, largest_idx]] = matrix_XTY[[largest_idx, i]]

3.3.1 Indices - Row and Column

NumPy provides extensions to the Python index syntax. Consider…

matrix[largest_idx, i]

This excerpt specifies that a value is being retrieved from row largest_idx and column i within that row.

3.3.2 Indices - Entire Rows

Consider the somewhat intimidating…

matrix_XTX[[i, largest_idx], :]

This line grabs the rows at two indices (i.e., i and largest_idx) and every column within those rows.

3.4 Scale

The scale pseudocode calls for a loop.

def scale_row(A, row_idx, num_cols, s):

    for every j in 0 to num_cols:
        A[row_idx][j] = A[row_idx][j] / s

However, we can use NumPy’s broadcast mechanic to scale every column within a row.

def scale_row(matrix_XTX, matrix_XTY, i):
    scaling_factor = matrix_XTX[i, i]
    matrix_XTX[i, :] /= scaling_factor
    matrix_XTY[i] /= scaling_factor

3.5 Eliminate

The eliminate logic takes the current row and subtracts it from every subsequent row.

def eliminate(A, src_row_idx, num_cols, num_rows):

    start_col <- src_row_idx

    for every i in (src_row_idx + 1) to num_rows:

        s <- A[i][start_col]

        for every j in start_col to num_cols:
            A[i][j] = A[i][j] - s * A[src_row_idx][j]

        A[i][start_col] = 0

The pseudocode starts of by accessing the .shape property and discards the number of columns (since we only need the number of rows).

def eliminate(matrix_XTX, matrix_XTY, i):
    num_rows, _ = matrix_XTX.shape

    for row_i in range(i + 1, num_rows):
        s = matrix_XTX[row_i][i]

        matrix_XTX[row_i] = matrix_XTX[row_i] - s * matrix_XTX[i]
        matrix_XTY[row_i] = matrix_XTY[row_i] - s * matrix_XTY[i]

Note how we take row i, multiply every value in the row by s and then subtract it from row row_i. Have you noticed a pattern varaible names (specifically indices)? They are anything but self-documenting. These naming issues will be part of our follow-up refactoring discussion.

3.6 Backsolve

The backsolve operation is the final step. Now that we have worked our way to the last row of the matrix (and ended up with an upper triangular matrix)… we can work our way back up.

def back_solve(matrix):

    augColIdx <- matrix.cols()  # the augmented column
    lastRow = matrix.rows() - 1

    for i in lastRow down to 1:
        for j <- (i - 1) down to 0:
            s <- matrix[j][i]

            matrix[j][i] -= (s * matrix[i][i])
            matrix[j][augCol] -= (s * matrix[i][augColIdx])

The pseudocode once again seems to differ from the implementation. The pseudocode has a single matrix while the implementation has two. The why will be discussed during the follow-up refactoring discussion.

def _backsolve(matrix_XTX, matrix_XTY):

    num_rows, _ = matrix_XTX.shape

    for i in reversed(range(1, num_rows)):
        for j in reversed(range(0, i)):
            s = matrix_XTX[j, i]

            matrix_XTX[j, i] -= (s * matrix_XTX[i, i])
            matrix_XTY[j] -= (s * matrix_XTY[i])

Take note of the leading underscore. By convention… any function that starts with a “_” is to be treated as an implementation detail.

4 What About main?

As promised… we will discuss the main function. Before solve is ever called we set up some input data.

    # Set up input data points, X, Y, and XT
    points = [(0., 0.), (1., 1.), (2., 4.)]

We have three input points for which we would like to compute a polynomial of best fit. The polynomial will be of degree two (2) and take the form…

$$ \hat{\varphi} = c_0 + c_1 x + c_2 x^2 $$

where $c_0$, $c_1$, and $c_2$ are the unknowns for which we are solving.

The $X$ matrix is defined as…

    # Set up X, Y, and XT matrices 
    matrix_X = np.array([[1., 0., 0.],
                         [1., 1., 1.],
                         [1., 2., 4.]])

The $Y$ matrix is defined as…

    matrix_Y = np.array([0,
                         1,
                         4])

The transpose of $X$ (i.e. $X^T$) can be obtined using NumPy’s transpose method.

    matrix_XT = matrix_X.transpose()

We need to perform two matrix multiplications. My preference is to call np.matmul.

    # Compute XTX and XTY
    matrix_XTX = np.matmul(matrix_XT, matrix_X)
    matrix_XTY = np.matmul(matrix_XT, matrix_Y)

However, some NumPy code will use the @ operator, e.g.,

    matrix_XTX = matrix_XT @ matrix_X

The print_matrices function is used for debugging. However, it should be replaced with the pprint module.

    print_matrices(matrix_XTX, matrix_XTY)

The rest of main is just outputting the results.

    print()
    print("{:-^40}".format("Solution"))
    solution = solve_matrix(matrix_XTX, matrix_XTY)
    print(solution)

5 But… What is the Output?

If you run the code from this lecture using

python3.11 least_squares_initial.py

you will obtain, as output…

Sample Output

******************XTX*******************
[[ 3.  3.  5.]
 [ 3.  5.  9.]
 [ 5.  9. 17.]]

******************XTY*******************
[ 5.  9. 17.]

----------------Solution----------------
[0. 0. 1.]

We can make the output a little more readable…

0.0 + 0.0·x + 1.0·x²

The trick is to replace…

    print(solution)

with

    print(np.polynomial.Polynomial(solution))

which uses the NumPy Polynomial class to generate more presentable output.

6 Where is the Refactoring?

Refactoring the code from this discussion will a separate lecture (i.e., the very next lexture).