Least Squares - A Quick First Example
Thomas J. Kennedy
Later in this course we will discuss a more robust form of Least Squares Approximation. However, we will start with the $[X^TX|X^TY]$ method.
1 A Quick Initial Example
First, we will need some input points.
x | f(x) |
---|---|
-2 | 4 |
-1 | 1 |
0 | 0 |
1 | 1 |
2 | 4 |
One could guess that $f(x) = x^2$. However, we actually want to do some math (of course, that might just be me).
Suppose we want to find a line that approximates this (supposedly) unknown function (i.e., $f(x)$) for $x \ge 0$. We want a function in the form
\[ \begin{align} \hat{\varphi} & = \sum_{i=0}^{1} c_i x^i \\ & = c_0 + c_1 x \\ \end{align} \]
Did you notice the subtle change from $\varphi$ to $\hat{\varphi}$? We need to compute values for both $c_0$ and $c_1$. However, we want the two values that give us the best possible approximation function (i.e., the line with the smallest error).
Later in the course we will define this error using
\[ ||f - \hat{\varphi}|| \]
However, that is a discussion for the next module. For now, we need to define three matrices:
- $X$
- $X^T$
- $Y$
1.1 The Set Up
The $X$ matrix is defined by taking each term in our approximation function (i.e., the function multiplied by each constant $c_i$) and evaluating it at each point. For our selected points
x | f(x) |
---|---|
0 | 0 |
1 | 1 |
2 | 4 |
and selected approximation function
\[ \hat{\varphi} = c_0 (1) + c_1 x \]
the matrix $X$ will be defined as
$$ X = \left[\begin{array}{rr} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ \end{array}\right] $$
The first column is defined by taking each $x$ value and plugging it into $y=1$. The second column is defined by taking each $x$ value and plugging it into $y=x$. For the $Y$ matrix, we need only copy each $y$.
$$ Y = \left[\begin{array}{r} 0 \\ 1 \\ 4 \\ \end{array}\right] $$
$X^T$ is the transpose of $X$.
$$ X^T = \left[\begin{array}{rrr} 1 & 1 & 1\\ 0 & 1 & 2\\ \end{array}\right] $$
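The set up above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `points`, `basis`, `X`, `Y`, and `X_T` are hypothetical, and the matrices are stored as plain lists of rows rather than with any particular library.

```python
# Selected points for x >= 0, copied from the table above.
points = [(0, 0), (1, 1), (2, 4)]

# The two terms of phi_hat = c_0 * 1 + c_1 * x (hypothetical helper list).
basis = [lambda x: 1, lambda x: x]

# One row per point, one column per term.
X = [[term(x) for term in basis] for x, _ in points]

# Y is simply a copy of each y value.
Y = [y for _, y in points]

# Transpose: the columns of X become the rows of X_T.
X_T = [list(col) for col in zip(*X)]

print(X)    # [[1, 0], [1, 1], [1, 2]]
print(Y)    # [0, 1, 4]
print(X_T)  # [[1, 1, 1], [0, 1, 2]]
```

Swapping in a different list of terms (e.g., adding `lambda x: x * x`) would add a column to `X` without changing the rest of the sketch.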
1.2 Constructing XTX|XTY
Once the matrices are defined, $X^{T}X$ and $X^{T}Y$ must be computed. The two matrix multiplications are left as an exercise to the reader (i.e., you).
$$ X^{T}X = \left[\begin{array}{rr} 3 & 3\\ 3 & 5\\ \end{array}\right] $$
$$ X^{T}Y = \left[\begin{array}{r} 5\\ 9\\ \end{array}\right] $$
With both matrices computed, we can construct the $[X^TX|X^TY]$ augmented matrix. This will, in turn, allow us to compute $c_0$ and $c_1$.
$$ [X^{T}X|X^{T}Y] = \left[\begin{array}{rr|r} 3 & 3 & 5\\ 3 & 5 & 9\\ \end{array}\right] $$
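If you would like to check your hand multiplication, the following sketch computes both products and the augmented matrix. The helper `matmul` is a hypothetical name introduced for illustration, and the matrices are assumed to be stored as lists of rows.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

X = [[1, 0], [1, 1], [1, 2]]
Y = [[0], [1], [4]]                    # Y as a column vector
X_T = [list(col) for col in zip(*X)]   # transpose of X

XTX = matmul(X_T, X)
XTY = matmul(X_T, Y)

# Append the single X^T Y column to the right of X^T X.
augmented = [lhs + rhs for lhs, rhs in zip(XTX, XTY)]

print(XTX)        # [[3, 3], [3, 5]]
print(XTY)        # [[5], [9]]
print(augmented)  # [[3, 3, 5], [3, 5, 9]]
```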
1.3 Solving XTX|XTY
The system $[X^TX|X^TY]$ can be solved using Gaussian Elimination. We will start with the augmented matrix itself.
$$ \left[\begin{array}{rr|r} 3 & 3 & 5\\ 3 & 5 & 9\\ \end{array}\right] $$
Scale row 0 using $\frac{1}{3}r_0$
$$ \left[\begin{array}{rr|r} 1 & 1 & \frac{5}{3}\\ 3 & 5 & 9\\ \end{array}\right] $$
Subtract three times row 0 from row 1 using $r_1 - 3r_0$
$$ \left[\begin{array}{rr|r} 1 & 1 & \frac{5}{3}\\ 0 & 2 & 4\\ \end{array}\right] $$
Scale row 1 using $\frac{1}{2}r_1$.
$$ \left[\begin{array}{rr|r} 1 & 1 & \frac{5}{3}\\ 0 & 1 & 2\\ \end{array}\right] $$
Backsolve by using $r_0 = r_0-r_1$. This leaves us with our answer.
$$ \left[\begin{array}{rr|r} 1 & 0 & \frac{-1}{3}\\ 0 & 1 & 2\\ \end{array}\right] $$
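The row operations above can be sketched in code. This is a minimal illustration, not a robust solver: the function name `solve_augmented` is hypothetical, the sketch assumes every pivot is nonzero (so it never swaps rows), and it uses `fractions.Fraction` so that values such as $\frac{-1}{3}$ stay exact.

```python
from fractions import Fraction

def solve_augmented(aug):
    """Reduce an augmented matrix [A|b] to [I|c] and return the column c.

    Assumption: every pivot is nonzero, so no row swaps are performed.
    """
    m = [[Fraction(v) for v in row] for row in aug]
    n = len(m)
    for i in range(n):
        pivot = m[i][i]
        m[i] = [v / pivot for v in m[i]]           # scale the pivot row
        for j in range(n):
            if j != i:
                factor = m[j][i]
                # eliminate column i from every other row
                m[j] = [vj - factor * vi for vj, vi in zip(m[j], m[i])]
    return [row[-1] for row in m]

c0, c1 = solve_augmented([[3, 3, 5], [3, 5, 9]])
print(c0, c1)  # -1/3 2
```

For this matrix the sketch happens to perform the same four steps shown above: scale $r_0$, eliminate below the pivot, scale $r_1$, and backsolve.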
1.4 Final Result
Coefficients
$$c_0 = \frac{-1}{3}$$ $$c_1 = 2$$
Approximation Function (phi hat)
\[ \hat{\varphi} = \frac{-1}{3} + 2x^1 \]
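To get a feel for how well this line fits, one can evaluate it at the three selected points and compare against $f(x)$. The sketch below only lists the pointwise differences $f(x) - \hat{\varphi}(x)$; it does not compute the norm discussed in the next module. The names `phi_hat` and `residuals` are hypothetical.

```python
from fractions import Fraction

points = [(0, 0), (1, 1), (2, 4)]
c0, c1 = Fraction(-1, 3), Fraction(2)

def phi_hat(x):
    """Evaluate the approximation function c0 + c1 * x."""
    return c0 + c1 * x

# Difference between the known value and the line at each point.
residuals = [y - phi_hat(x) for x, y in points]

print([str(r) for r in residuals])  # ['1/3', '-2/3', '1/3']
print(sum(residuals))               # 0
```

The line misses every one of the three points, but the misses balance out: the differences sum to zero.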
2 The Other Half
Suppose we are now interested in the domain $x \le 0$. We need to construct the same three matrices:
- $X$
- $X^T$
- $Y$
2.1 The Set Up
The $X$ matrix is defined by taking each term in our approximation function (i.e., the function multiplied by each constant $c_i$) and evaluating it at each point. For our selected points
x | f(x) |
---|---|
-2 | 4 |
-1 | 1 |
0 | 0 |
and selected (linear/polynomial) approximation function
\[ \hat{\varphi} = c_0 (1) + c_1 x \]
the matrix $X$ will be defined as
$$ X = \left[\begin{array}{rr} 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ \end{array}\right] $$
The first column is defined by taking each $x$ value and plugging it into $y=1$. The second column is defined by taking each $x$ value and plugging it into $y=x$. For the $Y$ matrix, we need only copy each $y$.
$$ Y = \left[\begin{array}{r} 4 \\ 1 \\ 0 \\ \end{array}\right] $$
$X^T$ is the transpose of $X$.
$$ X^T = \left[\begin{array}{rrr} 1 & 1 & 1\\ -2 & -1 & 0\\ \end{array}\right] $$
2.2 Constructing XTX|XTY
Once the matrices are defined, $X^{T}X$ and $X^{T}Y$ must be computed. The two matrix multiplications are left as an exercise to the reader (i.e., you).
$$ X^{T}X = \left[\begin{array}{rr} 3 & -3\\ -3 & 5\\ \end{array}\right] $$
$$ X^{T}Y = \left[\begin{array}{r} 5\\ -9\\ \end{array}\right] $$
With both matrices computed, we can construct the $[X^TX|X^TY]$ augmented matrix. This will, in turn, allow us to compute $c_0$ and $c_1$.
$$ [X^{T}X|X^{T}Y] = \left[\begin{array}{rr|r} 3 & -3 & 5\\ -3 & 5 & -9\\ \end{array}\right] $$
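As a quick sanity check on the multiplication (a side note, not a required step): for the linear form $c_0 + c_1 x$, the entries of both products reduce to simple sums over the $n = 3$ selected points $(x_i, y_i)$.

$$ X^{T}X = \left[\begin{array}{rr} n & \sum x_i\\ \sum x_i & \sum x_i^2\\ \end{array}\right] = \left[\begin{array}{rr} 3 & -2 - 1 + 0\\ -2 - 1 + 0 & 4 + 1 + 0\\ \end{array}\right] = \left[\begin{array}{rr} 3 & -3\\ -3 & 5\\ \end{array}\right] $$

$$ X^{T}Y = \left[\begin{array}{r} \sum y_i\\ \sum x_i y_i\\ \end{array}\right] = \left[\begin{array}{r} 4 + 1 + 0\\ -8 - 1 + 0\\ \end{array}\right] = \left[\begin{array}{r} 5\\ -9\\ \end{array}\right] $$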
2.3 Solving XTX|XTY
The system $[X^TX|X^TY]$ can be solved using Gaussian Elimination. We will start with the augmented matrix itself.
$$ \left[\begin{array}{rr|r} 3 & -3 & 5\\ -3 & 5 & -9\\ \end{array}\right] $$
Add row 0 to row 1 using $r_1 + r_0$
$$ \left[\begin{array}{rr|r} 3 & -3 & 5\\ 0 & 2 & -4\\ \end{array}\right] $$
Scale row 1 using $\frac{1}{2}r_1$ and row 0 using $\frac{1}{3}r_0$.
$$ \left[\begin{array}{rr|r} 1 & -1 & \frac{5}{3}\\ 0 & 1 & -2\\ \end{array}\right] $$
Add $r_1$ to $r_0$ ($r_0=r_1 + r_0$).
$$ \left[\begin{array}{rr|r} 1 & 0 & \frac{5}{3} - \frac{6}{3}\\ 0 & 1 & -2\\ \end{array}\right] $$
…after simplifying we have
$$ \left[\begin{array}{rr|r} 1 & 0 & \frac{-1}{3}\\ 0 & 1 & -2\\ \end{array}\right] $$
2.4 Final Result
Coefficients
$$c_0 = \frac{-1}{3}$$ $$c_1 = -2$$
Approximation Function (phi hat)
\[ \hat{\varphi} = -\frac{1}{3} - 2x \]
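Both halves can be checked with one short function. Note the swap in technique: instead of Gaussian Elimination, this sketch solves the $2 \times 2$ system with Cramer's rule, using the fact that for a line $X^TX$ contains $n$, $\sum x$, and $\sum x^2$, while $X^TY$ contains $\sum y$ and $\sum xy$. The name `fit_line` is hypothetical, and integer inputs are assumed.

```python
from fractions import Fraction

def fit_line(points):
    """Return (c0, c1) for the least squares line through `points`.

    Assumes integer coordinates and at least two distinct x values.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sxx = sum(x * x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)

    # Solve [[n, sx], [sx, sxx]] [c0, c1]^T = [sy, sxy]^T by Cramer's rule.
    det = n * sxx - sx * sx
    c0 = Fraction(sy * sxx - sx * sxy, det)
    c1 = Fraction(n * sxy - sx * sy, det)
    return c0, c1

right = fit_line([(0, 0), (1, 1), (2, 4)])    # x >= 0
left = fit_line([(-2, 4), (-1, 1), (0, 0)])   # x <= 0

print(*right)  # -1/3 2
print(*left)   # -1/3 -2
```

The two lines share the same intercept and have opposite slopes, which matches the symmetry of the input points about $x = 0$.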
3 TL;DR
Given a collection of discrete points, we need to find a line (a polynomial of degree one) of best fit. When more than two points are included, a single line generally cannot pass through every point (unless the points happen to be collinear). Instead, a line of best fit is computed. While most cases use a line, it is possible
- for a polynomial of any degree to be used as an approximation function (provided a sufficient number of points).
- to extend the problem to 3 or more spatial dimensions.
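As an illustration of the first point, the same $[X^TX|X^TY]$ procedure can be run with a degree-two polynomial on all five original points. The sketch below is a minimal version under stated assumptions: the function name `polyfit_normal` is hypothetical, the elimination never swaps rows (every pivot is assumed nonzero), and exact fractions are used in place of floating point.

```python
from fractions import Fraction

def polyfit_normal(points, degree):
    """Fit c_0 + c_1 x + ... + c_d x^d by building and solving [X^T X | X^T Y].

    Assumption: every pivot is nonzero, so no row swaps are performed.
    """
    # One row per point, one column per power of x.
    X = [[Fraction(x) ** k for k in range(degree + 1)] for x, _ in points]
    Y = [Fraction(y) for _, y in points]
    cols = list(zip(*X))  # columns of X are the rows of X^T

    # Row i of the augmented matrix: (X^T X)[i][:] followed by (X^T Y)[i].
    aug = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols]
           + [sum(a * y for a, y in zip(ci, Y))]
           for ci in cols]

    n = degree + 1
    for i in range(n):
        pivot = aug[i][i]
        aug[i] = [v / pivot for v in aug[i]]
        for j in range(n):
            if j != i:
                factor = aug[j][i]
                aug[j] = [vj - factor * vi for vj, vi in zip(aug[j], aug[i])]
    return [row[-1] for row in aug]

all_points = [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]
c0, c1, c2 = polyfit_normal(all_points, 2)
print(c0, c1, c2)  # 0 0 1
```

With a degree-two polynomial the fit recovers $c_0 = 0$, $c_1 = 0$, $c_2 = 1$, i.e., the guess $f(x) = x^2$ from the start of this example.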