Finite Precision & Error... in Base 2

Thomas J. Kennedy

1 What is Error?

Let us start with the definitions for absolute error and relative error.

Absolute error is the magnitude of the difference between some experimental (or measured or derived) value and a known correct value.

$$ |x - x^{*}| $$

In these problems… we will use $x$ to refer to the correct value and $x^{*}$ to refer to a value subject to error (e.g., due to machine precision).

Relative error is absolute error in relation to the known correct value… it captures how far off we are in relation to the magnitude of the known correct value.

$$ \frac{|x - x^{*}|}{|x|} $$
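The two definitions translate almost directly into code. The following is a minimal sketch; the function names `absolute_error` and `relative_error` and the sample values are illustrative assumptions (they do not come from these notes).

```python
def absolute_error(x, x_star):
    """Return |x - x*|."""
    return abs(x - x_star)


def relative_error(x, x_star):
    """Return |x - x*| / |x|; undefined when x is zero."""
    if x == 0:
        raise ValueError("relative error is undefined for x = 0")
    return absolute_error(x, x_star) / abs(x)


# Illustrative values: x = 1/8 is the correct value, x* = 3/32 is subject to error.
x, x_star = 0.125, 0.09375
print(absolute_error(x, x_star))  # 0.03125
print(relative_error(x, x_star))  # 0.25
```

Both sample values are exact in binary, so the printed results are exact as well.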

2 Representing Real Numbers

$x^{*}$ (read as “x-star” or “x-chop”) represents a number that is subject to finite precision (i.e., must be stored using a finite number of digits/bits).

$$ x^{*} = \pm \left( \sum\limits_{i=1}^t b_{-i} \beta^{-i} \right) * \beta^{e^{*}} $$

where

$$ e^{*} = \pm \sum\limits_{i=1}^s s_{i} \beta^{i} $$

This notation can be quite overwhelming at first. Let us break it down…

$$\sum\limits_{i=1}^t b_{-i} \beta^{-i}$$

represents the mantissa (i.e., the digits to the right of the radix point).

$$ \pm \sum\limits_{i=1}^s s_{i} \beta^{i} $$

represents the exponent bits.
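To make the mantissa digits $b_{-i}$ concrete… here is a minimal sketch that assumes $\beta = 2$. It relies on Python's `math.frexp`, which returns a mantissa in $[0.5, 1)$ together with a base 2 exponent. The helper name `mantissa_bits` is hypothetical.

```python
import math


def mantissa_bits(x, t):
    """Return (sign, bits, exponent) so that |x| ~ 0.b1 b2 ... bt (base 2) * 2**exponent."""
    sign = -1 if x < 0 else 1
    m, e = math.frexp(abs(x))  # abs(x) == m * 2**e with 0.5 <= m < 1 (or m == 0)
    bits = []
    for _ in range(t):
        m *= 2
        bit = int(m)           # next digit b_{-i}, either 0 or 1
        bits.append(bit)
        m -= bit
    return sign, bits, e


# 0.15625 = 0.00101 (base 2) = 0.101 (base 2) * 2**-2
print(mantissa_bits(0.15625, 4))  # (1, [1, 0, 1, 0], -2)
```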

3 Let us Clarify Notation

$x^{*}$ is a real number that can be represented by $t$ mantissa digits and $s$ exponent digits.

$$ x^{*} \in \mathbb{R}(t,s) $$

$x$ is more interesting…

$$ x \in \mathbb{R}(\infty,\infty) $$

$x$ is a real number for which we have access to infinite precision (i.e., infinite digits).
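The distinction can be seen in practice. The sketch below assumes that Python floats are IEEE 754 doubles (53 significant bits) and uses `fractions.Fraction` as a stand-in for infinite precision. Note that hardware floats round to nearest rather than chop, so the $2^{-53}$ bound checked here is the rounding bound for doubles.

```python
from fractions import Fraction

x = Fraction(1, 10)      # exact value: a stand-in for x with infinite precision
x_star = Fraction(0.1)   # exact value of the double that is stored for 0.1

print(x == x_star)                    # False
rel_err = abs(x - x_star) / abs(x)
print(rel_err <= Fraction(1, 2**53))  # True
```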

4 Bounding Relative Error

Now… it is time to derive an upper bound for the error between $x$ and $x^*$. We know that (by definition):

$$ x^{*} = \pm \left( \sum\limits_{i=1}^t b_{-i} \beta^{-i} \right) * \beta^{e^{*}} $$

where

$$ e^{*} = \pm \sum\limits_{i=1}^s s_{i} \beta^{i} $$

and

$$ x = \pm \left( \sum\limits_{i=1}^{\infty} b_{-i} \beta^{-i} \right) * \beta^{e} $$

where

$$e = \pm \sum\limits_{i=1}^{\infty} s_{i} \beta^{i} $$

The difference comes down to finite (i.e., limited) precision, i.e., $x^* \in \mathbb{R}(t, s)$ vs $x \in \mathbb{R}(\infty, \infty)$.

4.1 Absolute Error

Relative error is defined as $\frac{|x-x^*|}{|x|}$. Let us start with the numerator… which happens to be absolute error!

$$ \left| x - x^{*} \right| = \left| \sum\limits_{i=0}^{\infty} b_{-i}\beta^{-i} * \beta^{e} - \sum\limits_{i=0}^{k} b_{-i}\beta^{-i} * \beta^{e^{*}} \right| $$

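Let $k$ denote the number of mantissa digits that are kept, and assume that the exponent is stored exactly, i.e., $e^{*} = e$. The infinite sum for $x$ can then be split at index $k$ into the digits that are kept and the digits that are lost.

These two “small” observations lead to…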

$$ \begin{eqnarray} \left| x - x^{*} \right| &=& \left| \sum\limits_{i=0}^{k} b_{-i}\beta^{-i} * \beta^{e} + \sum\limits_{i=k+1}^{\infty} b_{-i}\beta^{-i} * \beta^{e} - \sum\limits_{i=0}^{k} b_{-i}\beta^{-i}* \beta^{e} \right| \\ &=& \left| \sum\limits_{i=0}^{k} b_{-i}\beta^{-i} + \sum\limits_{i=k+1}^{\infty} b_{-i}\beta^{-i} - \sum\limits_{i=0}^{k} b_{-i}\beta^{-i} \right| \beta^{e} \\ &=& \left| \sum\limits_{i=k+1}^{\infty} b_{-i}\beta^{-i} \right| \beta^{e} \\ \end{eqnarray} $$

Unlike most problems… let us sacrifice a little generality. Our interest lies with base 2 (binary). Let us set $\beta = 2$.

$$ \begin{eqnarray} \left| x - x^{*} \right| &=& \left| \sum\limits_{i=k+1}^{\infty} b_{-i}2^{-i} \right| 2^{e} \\ \end{eqnarray} $$

Now… we need to bound the error by using the largest possible mantissa. Letting $b_{-i} = (\beta - 1) = (2 - 1) = 1$ for all $i$ leads to

$$ \begin{eqnarray} \left| x - x^{*} \right| &=& \left| \sum\limits_{i=k+1}^{\infty} b_{-i}2^{-i} \right| 2^{e} \\ &\le& \left| \sum\limits_{i=k+1}^{\infty} (2 - 1) 2^{-i} \right| 2^{e} \\ &\le& \left| (2 - 1) \sum\limits_{i=k+1}^{\infty}2^{-i} \right| 2^{e} \\ &\le& \left| \sum\limits_{i=k+1}^{\infty}2^{-i} \right| 2^{e} \\ &\le& \left| 2^{-k} \sum\limits_{i=1}^{\infty}2^{-i} \right| 2^{e} \\ &\le& \left| 2^{-k - 1} \sum\limits_{i=0}^{\infty}2^{-i} \right| 2^{e} \\ \end{eqnarray} $$

Now we need to tackle the sum

$$ \sum\limits_{i=0}^{\infty}2^{-i} $$

We can use the formula for a convergent infinite geometric series with first term $1$ and common ratio $r$.

$S_{\infty} = \frac{1}{1-r}$ provided $|r| < 1$

In this case… $r = 2^{-1}$.

$$ \begin{eqnarray} S_{\infty} &=& \frac{1}{1-r} \\ &=& \frac{1}{1 - \frac{1}{2}} \\ &=& \frac{1}{\frac{1}{2}} \\ &=& 2 \end{eqnarray} $$
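As a quick sanity check… the partial sums can be computed exactly with `fractions.Fraction`. This is a sketch, and the helper name `partial_sum` is hypothetical. The gap between the $n$-th partial sum and $2$ is exactly $2^{-n}$, which shrinks to zero.

```python
from fractions import Fraction


def partial_sum(n):
    """Return the sum of 2**-i for i = 0, ..., n as an exact fraction."""
    return sum(Fraction(1, 2**i) for i in range(n + 1))


for n in (1, 4, 16):
    s = partial_sum(n)
    print(n, float(s), float(2 - s))  # the gap to 2 is exactly 2**-n
```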

Using this result leads to…

$$ \begin{eqnarray} \left| x - x^{*} \right| &\le& \left|2^{-k - 1} 2\right| 2^{e} \\ &\le& \left|2^{-k}\right| 2^{e} \\ \end{eqnarray} $$

The worst case error (i.e., upper bound for error) can be written as…

$$ \left| x - x^{*} \right| \le 2^{-k}2^{e} $$
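The bound can be checked on a concrete value. The following is a minimal sketch that assumes a normalized mantissa in $[2^{-1}, 1)$ and chopping (truncation) after $k$ bits. Exact rational arithmetic keeps the check free of rounding; the helper names `normalize` and `chop` are hypothetical.

```python
from fractions import Fraction


def normalize(x):
    """Return (m, e) with |x| == m * 2**e and 1/2 <= m < 1, for nonzero x."""
    m, e = abs(Fraction(x)), 0
    while m >= 1:
        m, e = m / 2, e + 1
    while m < Fraction(1, 2):
        m, e = m * 2, e - 1
    return m, e


def chop(x, k):
    """Keep the first k mantissa bits of x and discard (chop) the rest."""
    m, e = normalize(x)
    m_chopped = Fraction(int(m * 2**k), 2**k)  # truncate after bit k
    sign = -1 if x < 0 else 1
    return sign * m_chopped * Fraction(2) ** e, e


k = 4
x = Fraction(1, 3)                     # 0.010101... in base 2
x_star, e = chop(x, k)
bound = Fraction(2) ** (e - k)         # 2**-k * 2**e
print(x_star, abs(x - x_star), bound)  # 5/16 1/48 1/32
```

Here $e = -1$, so the bound is $2^{-4} 2^{-1} = \frac{1}{32}$, and the actual error $\frac{1}{48}$ stays below it.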

4.2 Relative Error

We know that relative error is defined as

$$ \frac{|x - x^*|}{|x|} $$

From the absolute error derivation, we know that

$$ \left| x - x^{*} \right| \le 2^{-k}2^{e} $$

That leads to…

$$ \frac{|x - x^*|}{|x|} \le \frac{2^{-k}2^{e}}{\min(|x|) * 2^{e}} $$

Notice how $|x|$ became $\min(|x|) * 2^{e}$, where $\min(|x|)$ denotes the smallest possible mantissa of $x$. We are bounding the bound. As $|x|$ gets smaller, the relative error gets larger. The smallest legal non-zero mantissa (by the normalization constraint) is $2^{-1}$.

$$ \begin{eqnarray} \frac{|x - x^*|}{|x|} &\le& \frac{2^{-k}2^{e}}{\min(|x|) * 2^e} \\ &\le& \frac{2^{-k}2^{e}}{2^{-1}2^e} \\ &\le& \frac{2^{-k}}{2^{-1}} \\ &\le& 2^1 2^{-k} \\ &\le& 2^{-k + 1} \\ \end{eqnarray} $$
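The final bound $2^{-k+1}$ can also be probed empirically. The sketch below chops ordinary Python floats to $k$ mantissa bits with `math.frexp` and `math.ldexp`; it assumes $k$ is well below the 53 bits of a double. The helper name `chop_float`, the choice $k = 8$, and the sampling range are illustrative assumptions.

```python
import math
import random


def chop_float(x, k):
    """Chop a nonzero float x to k mantissa bits (base 2, mantissa in [0.5, 1))."""
    m, e = math.frexp(x)                     # x == m * 2**e, 0.5 <= |m| < 1
    m_chopped = math.trunc(m * 2**k) / 2**k  # drop every bit after the k-th
    return math.ldexp(m_chopped, e)


k = 8
bound = 2.0 ** (-k + 1)
random.seed(0)
worst = 0.0
for _ in range(100_000):
    x = random.uniform(-1000.0, 1000.0)
    if x == 0.0:
        continue
    worst = max(worst, abs(x - chop_float(x, k)) / abs(x))

print(worst <= bound)  # True
```

A random search like this cannot prove the bound… it only fails to find a counterexample, which is what the derivation predicts.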