Least Squares - A Quick First Example

Thomas J. Kennedy

Later in this course we will discuss a more robust form of Least Squares Approximation. However, we will start with the $[X^TX|X^TY]$ method.

1 A Quick Initial Example

First, we will need some input points.

$$ \begin{array}{r|r} x & f(x) \\ \hline -2 & 4 \\ -1 & 1 \\ 0 & 0 \\ 1 & 1 \\ 2 & 4 \\ \end{array} $$

One could guess that $f(x) = x^2$. However, we actually want to do some math (of course, that might just be me).

Suppose we want to find a line that approximates this (supposedly) unknown function (i.e., $f(x)$) for $x \ge 0$. We want a function of the form

\[ \begin{align} \hat{\varphi} & = \sum_{i=0}^{1} c_i x^i \\ & = c_0 + c_1 x \\ \end{align} \]

Did you notice the subtle change from $\varphi$ to $\hat{\varphi}$? We need to compute values for both $c_0$ and $c_1$. However, we want the two values that give us the best possible approximation function (i.e., the line with the smallest error).

Later in the course we will define this error using

\[ ||f - \hat{\varphi}|| \]

However, that is a discussion for the next module. For now, we need to define three matrices:

  1. $X$
  2. $X^T$
  3. $Y$

1.1 The Set Up

The $X$ matrix is defined by taking the term multiplied by each constant in our approximation function and plugging in each point. For our selected points

$$ \begin{array}{r|r} x & f(x) \\ \hline 0 & 0 \\ 1 & 1 \\ 2 & 4 \\ \end{array} $$

and selected approximation function

\[ \hat{\varphi} = c_0 (1) + c_1 x \]

the matrix $X$ will be defined as

$$ X = \left[\begin{array}{rr} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ \end{array}\right] $$

The first column is defined by taking each $x$ value and plugging it into $y=1$. The second column is defined by taking each $x$ value and plugging it into $y=x$. For the $Y$ matrix, we need only copy each $y$.

$$ Y = \left[\begin{array}{r} 0 \\ 1 \\ 4 \\ \end{array}\right] $$

$X^T$ is the transpose of $X$.

$$ X^T = \left[\begin{array}{rrr} 1 & 1 & 1\\ 0 & 1 & 2\\ \end{array}\right] $$
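The set up above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the names `points`, `basis`, `X`, `Y`, and `X_T` are assumptions chosen for this sketch, and $Y$ is stored as a flat list rather than a column.

```python
# Selected points (x, f(x)) for x >= 0
points = [(0, 0), (1, 1), (2, 4)]

# One function per constant: c_0 multiplies 1, c_1 multiplies x
basis = [lambda x: 1, lambda x: x]

# Each row of X is one x value plugged into every basis function
X = [[term(x) for term in basis] for x, _ in points]

# Y simply copies each f(x) value
Y = [y for _, y in points]

# Transpose: the columns of X become the rows of X_T
X_T = [list(column) for column in zip(*X)]

print(X)    # [[1, 0], [1, 1], [1, 2]]
print(Y)    # [0, 1, 4]
print(X_T)  # [[1, 1, 1], [0, 1, 2]]
```

Adding a third entry such as `lambda x: x * x` to `basis` would add a third column to `X`, which is how the same set up would extend beyond a line.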

1.2 Constructing XTX|XTY

Once the matrices are defined, $X^{T}X$ and $X^{T}Y$ must be computed. The two matrix multiplications are left as an exercise to the reader (i.e., you).

$$ X^{T}X = \left[\begin{array}{rr} 3 & 3\\ 3 & 5\\ \end{array}\right] $$

$$ X^{T}Y = \left[\begin{array}{r} 5\\ 9\\ \end{array}\right] $$

With both matrices computed, we can construct the $[X^TX|X^TY]$ augmented matrix. This will, in turn, allow us to compute $c_0$ and $c_1$.

$$ [X^{T}X|X^{T}Y] = \left[\begin{array}{rr|r} 3 & 3 & 5\\ 3 & 5 & 9\\ \end{array}\right] $$
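If you would like to check your hand multiplication, the following is a minimal sketch in plain Python. The helper `matmul` and the variable names are assumptions made for this example; any matrix library would do the same job.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

X = [[1, 0], [1, 1], [1, 2]]
Y = [[0], [1], [4]]  # column vector, one row per point
X_T = [list(column) for column in zip(*X)]

XTX = matmul(X_T, X)
XTY = matmul(X_T, Y)

# Append each row of X^T Y to the matching row of X^T X
augmented = [lhs + rhs for lhs, rhs in zip(XTX, XTY)]

print(XTX)        # [[3, 3], [3, 5]]
print(XTY)        # [[5], [9]]
print(augmented)  # [[3, 3, 5], [3, 5, 9]]
```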

1.3 Solving XTX|XTY

The system $[X^TX|X^TY]$ can be solved using Gaussian Elimination. We will start with the augmented matrix.

$$ \left[\begin{array}{rr|r} 3 & 3 & 5\\ 3 & 5 & 9\\ \end{array}\right] $$

Scale row 0 using $\frac{1}{3}r_0$

$$ \left[\begin{array}{rr|r} 1 & 1 & \frac{5}{3}\\ 3 & 5 & 9\\ \end{array}\right] $$

Subtract three times row 0 from row 1 using $r_1 = r_1 - 3r_0$.

$$ \left[\begin{array}{rr|r} 1 & 1 & \frac{5}{3}\\ 0 & 2 & 4\\ \end{array}\right] $$

Scale row 1 using $\frac{1}{2}r_1$.

$$ \left[\begin{array}{rr|r} 1 & 1 & \frac{5}{3}\\ 0 & 1 & 2\\ \end{array}\right] $$

Backsolve by using $r_0 = r_0-r_1$. This leaves us with our answer.

$$ \left[\begin{array}{rr|r} 1 & 0 & \frac{-1}{3}\\ 0 & 1 & 2\\ \end{array}\right] $$
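The row operations above can be sketched as a short Python function. This is a hedged illustration rather than a robust solver: the name `solve_augmented` is hypothetical, and the sketch assumes every pivot is nonzero, so it never swaps rows. It uses `fractions.Fraction` so that values such as $\frac{-1}{3}$ stay exact.

```python
from fractions import Fraction

def solve_augmented(aug):
    """Reduce an n x (n+1) augmented matrix to [I | c] and return c.

    Assumes every pivot is nonzero, so no row swaps are attempted.
    """
    A = [[Fraction(value) for value in row] for row in aug]
    n = len(A)
    for i in range(n):
        pivot = A[i][i]
        # Scale the pivot row so the pivot becomes 1
        A[i] = [value / pivot for value in A[i]]
        # Clear column i in every other row
        for r in range(n):
            if r != i:
                factor = A[r][i]
                A[r] = [value - factor * p for value, p in zip(A[r], A[i])]
    return [row[-1] for row in A]

c0, c1 = solve_augmented([[3, 3, 5], [3, 5, 9]])
print(c0, c1)  # -1/3 2
```

For this two by two system the loop performs the same four steps in the same order: scale row 0, eliminate below, scale row 1, and backsolve.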

1.4 Final Result

Coefficients

$$c_0 = \frac{-1}{3}$$ $$c_1 = 2$$

Approximation Function (phi hat)

\[ \hat{\varphi} = \frac{-1}{3} + 2x^1 \]
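As a sanity check, we can plug the three selected points back into $\hat{\varphi}$ and look at the differences $f(x) - \hat{\varphi}(x)$. The sketch below is illustrative only; the names `phi_hat` and `residuals` are assumptions made for this example.

```python
from fractions import Fraction

c0, c1 = Fraction(-1, 3), Fraction(2)

def phi_hat(x):
    """Evaluate the approximation function c0 + c1 * x."""
    return c0 + c1 * x

points = [(0, 0), (1, 1), (2, 4)]

# Difference between each known f(x) and the line's prediction
residuals = [y - phi_hat(x) for x, y in points]

print([str(r) for r in residuals])  # ['1/3', '-2/3', '1/3']
print(sum(residuals))               # 0
```

The line does not pass through any of the three points exactly. The differences sum to zero, which is what the first row of the $[X^TX|X^TY]$ system enforces.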


2 The Other Half

Suppose we are now interested in the domain $x \le 0$. We need to construct the same three matrices:

  1. $X$
  2. $X^T$
  3. $Y$

2.1 The Set Up

The $X$ matrix is defined by taking the term multiplied by each constant in our approximation function and plugging in each point. For our selected points

$$ \begin{array}{r|r} x & f(x) \\ \hline -2 & 4 \\ -1 & 1 \\ 0 & 0 \\ \end{array} $$

and selected (linear/polynomial) approximation function

\[ \hat{\varphi} = c_0 (1) + c_1 x \]

the matrix $X$ will be defined as

$$ X = \left[\begin{array}{rr} 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ \end{array}\right] $$

The first column is defined by taking each $x$ value and plugging it into $y=1$. The second column is defined by taking each $x$ value and plugging it into $y=x$. For the $Y$ matrix, we need only copy each $y$.

$$ Y = \left[\begin{array}{r} 4 \\ 1 \\ 0 \\ \end{array}\right] $$

$X^T$ is the transpose of $X$.

$$ X^T = \left[\begin{array}{rrr} 1 & 1 & 1\\ -2 & -1 & 0\\ \end{array}\right] $$

2.2 Constructing XTX|XTY

Once the matrices are defined, $X^{T}X$ and $X^{T}Y$ must be computed. The two matrix multiplications are left as an exercise to the reader (i.e., you).

$$ X^{T}X = \left[\begin{array}{rr} 3 & -3\\ -3 & 5\\ \end{array}\right] $$

$$ X^{T}Y = \left[\begin{array}{r} 5\\ -9\\ \end{array}\right] $$

With both matrices computed, we can construct the $[X^TX|X^TY]$ augmented matrix. This will, in turn, allow us to compute $c_0$ and $c_1$.

$$ [X^{T}X|X^{T}Y] = \left[\begin{array}{rr|r} 3 & -3 & 5\\ -3 & 5 & -9\\ \end{array}\right] $$

2.3 Solving XTX|XTY

The system $[X^TX|X^TY]$ can be solved using Gaussian Elimination. We will start with the augmented matrix.

$$ \left[\begin{array}{rr|r} 3 & -3 & 5\\ -3 & 5 & -9\\ \end{array}\right] $$

Add row 0 to row 1 using $r_1 + r_0$

$$ \left[\begin{array}{rr|r} 3 & -3 & 5\\ 0 & 2 & -4\\ \end{array}\right] $$

Scale row 1 using $\frac{1}{2}r_1$ and row 0 using $\frac{1}{3}r_0$.

$$ \left[\begin{array}{rr|r} 1 & -1 & \frac{5}{3}\\ 0 & 1 & -2\\ \end{array}\right] $$

Add $r_1$ to $r_0$ ($r_0=r_1 + r_0$).

$$ \left[\begin{array}{rr|r} 1 & 0 & \frac{5}{3} - \frac{6}{3}\\ 0 & 1 & -2\\ \end{array}\right] $$

…after simplifying we have

$$ \left[\begin{array}{rr|r} 1 & 0 & \frac{-1}{3}\\ 0 & 1 & -2\\ \end{array}\right] $$

2.4 Final Result

Coefficients

$$c_0 = \frac{-1}{3}$$ $$c_1 = -2$$

Approximation Function (phi hat)

\[ \hat{\varphi} = -\frac{1}{3} - 2x \]
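Both halves follow the same recipe, so the whole process for a line can be condensed into one function. The sketch below is an assumption-laden shortcut, not the method as presented in these notes: for a line, the entries of $X^TX$ and $X^TY$ are the sums $n$, $\sum x$, $\sum x^2$, $\sum f(x)$, and $\sum x f(x)$, and the two by two elimination is written out by hand. The name `fit_line` is hypothetical, and the points are assumed to have integer coordinates with at least two distinct $x$ values.

```python
from fractions import Fraction

def fit_line(points):
    """Return (c0, c1) for the least squares line through the points.

    Assumes integer coordinates and at least two distinct x values.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sxx = sum(x * x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    # [X^T X | X^T Y] is [[n, sx | sy], [sx, sxx | sxy]].
    # Eliminating c0 from the second row leaves one equation in c1.
    c1 = Fraction(n * sxy - sx * sy, n * sxx - sx * sx)
    # Back-substitute into the first row to recover c0.
    c0 = (sy - c1 * sx) / n
    return c0, c1

left = fit_line([(-2, 4), (-1, 1), (0, 0)])   # c0 = -1/3, c1 = -2
right = fit_line([(0, 0), (1, 1), (2, 4)])    # c0 = -1/3, c1 = 2
print(left, right)
```

The two lines share the same $c_0$ and have slopes of opposite sign, which reflects the symmetry of the input points about $x = 0$.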

3 TL;DR

Given a collection of discrete points, we need to find a line (a polynomial of degree one) of best fit. Since more than two points will be included, a single line will generally not pass perfectly through every point. Instead, a line of best fit is computed. While most cases use a line, it is possible