Our primary tool in trying to predict the speed of an algorithm is going to be an idea we call the “complexity” or more specifically the “worst case complexity” of the algorithm.
Let’s look at how we might analyze a simple algorithm to determine its run time before implementation:
1: for (i = 0; i < N; ++i) {
2: a[i] = 0;
3: for (j = 0; j < N; ++j)
4: a[i] = a[i] + i*j;
5: }
We do this by tallying up all of the primitive operations that would be performed by this code. (This requires a certain level of insight into how a compiler will translate non-trivial operations, such as array indexing.)
Analysis:
line 1 is executed $N$ times.
line 2 is executed $N$ times.
line 3 is executed $N^2$ times.
line 4 is executed $N^2$ times.
Now let’s break down line 1 into more detail:
line 1 is executed $N$ times. More precisely, we have 1 assignment when i is initialized, plus an addition, comparison, and assignment to i each time the loop is repeated. There is also a final comparison just before exiting the loop.
line 2 is executed $N$ times.
line 3 is executed $N^2$ times.
line 4 is executed $N^2$ times.
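To see where these counts for line 1 come from, here is a sketch (with an arbitrary small N, purely for illustration) of the for loop rewritten as an equivalent while loop, with the primitive operations called out in the comments:

#include <iostream>

int main()
{
    const int N = 5;
    int i;

    // Line 1, "for (i = 0; i < N; ++i)", desugared into primitive operations:
    i = 0;                  // 1 assignment (initialization), done once
    while (i < N) {         // 1 comparison per test; the test succeeds N times
        // ... loop body (lines 2-4) would go here ...
        i = i + 1;          // 1 addition and 1 assignment per repetition: N of each
    }                       // plus 1 final, failing comparison that ends the loop
    // Total for line 1: N+1 assignments, N additions, N+1 comparisons.

    std::cout << "the body would run " << i << " times\n";
    return 0;
}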
We’ll tally these into a table like this:
Line | Iterations | Assignments | Additions | Multiplications | Comparisons |
---|---|---|---|---|---|
1 | $N$ | $N+1$ | $N$ | $0$ | $N+1$ |
Now let’s look in detail at line 2:
The 2nd line contributes 1 addition, 1 multiplication and 1 assignment each time it is executed, because an array indexing op like a[i] gets translated by the compiler into
\[ \mbox{address} = \mbox{addr}_{a} + i * s_{a} \]
where $\mbox{addr}_{a}$ is the starting address of the array a and $s_{a}$ is the size (in bytes) of a single element of a.
Line | Iterations | Assignments | Additions | Multiplications | Comparisons |
---|---|---|---|---|---|
1 | $N$ | $N+1$ | $N$ | $0$ | $N+1$ |
2 | $N$ | $N$ | $N$ | $N$ | $0$ |
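As an aside, we can observe this address arithmetic directly in C++. The sketch below (the array size and index are arbitrary choices for illustration) checks that indexing a[i] yields the same address as offsetting the array’s base address by i * sizeof(element) bytes:

#include <iostream>

int main()
{
    double a[10] = {0};
    int i = 3;

    // The compiler translates a[i] into: address = addr_a + i * s_a,
    // where addr_a is the base address of a and s_a == sizeof(double).
    double* byIndexing   = &a[i];
    double* byArithmetic = reinterpret_cast<double*>(
        reinterpret_cast<char*>(a) + i * sizeof(double));

    std::cout << std::boolalpha << (byIndexing == byArithmetic) << "\n";  // prints: true
    return 0;
}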
Moving on to line 3:
The 3rd line contributes 1 assignment when j is initialized. This happens N times. It also has an addition, comparison, and assignment to j each time the loop is repeated, plus one other comparison at the start of the loop.
Line | Iterations | Assignments | Additions | Multiplications | Comparisons |
---|---|---|---|---|---|
1 | $N$ | $N+1$ | $N$ | $0$ | $N+1$ |
2 | $N$ | $N$ | $N$ | $N$ | $0$ |
3 | $N^2$ | $N^2 + N$ | $N^2$ | $0$ | $N (N + 1)$ |
And, finally, line 4:
The 4th line produces 3 additions, 3 multiplications (2 of these additions and multiplications are from the array indexing) and one assignment each time it is executed.
Some compilers will recognize that “a[i]” is the same expression in two places, and must refer to the same address each time. These compilers would only do the calculation once, saving an addition and a multiplication.
Recognition of common subexpressions like that is one of the more common optimizations that you are likely to see if you compile your code with the appropriate flags (e.g., -O2 for g++) to request optimization.
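As a rough illustration of what that optimization does, here is the same loop rewritten by hand so that the address of a[i] is computed only once per execution of line 4. This is only a sketch of the idea; a real optimizer works on the compiled code, and might hoist the computation even further, out of the inner loop entirely.

#include <iostream>

int main()
{
    const int N = 4;
    int a[N];
    int i, j;

    for (i = 0; i < N; ++i) {
        a[i] = 0;
        for (j = 0; j < N; ++j) {
            // Line 4 was:  a[i] = a[i] + i*j;   -- two separate address computations.
            int& ai = a[i];   // compute addr_a + i*s_a just once
            ai = ai + i*j;    // saving one addition and one multiplication
        }
    }

    std::cout << a[N-1] << "\n";  // same result as the original loop
    return 0;
}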
Line | Iterations | Assignments | Additions | Multiplications | Comparisons |
---|---|---|---|---|---|
1 | $N$ | $N+1$ | $N$ | $0$ | $N+1$ |
2 | $N$ | $N$ | $N$ | $N$ | $0$ |
3 | $N^2$ | $N^2 + N$ | $N^2$ | $0$ | $N (N + 1)$ |
4 | $N^2$ | $N^2$ | $3 N^2$ | $3 N^2$ | $0$ |
Totals
And the total is …
Line | Iterations | Assignments | Additions | Multiplications | Comparisons |
---|---|---|---|---|---|
1 | $N$ | $N+1$ | $N$ | $0$ | $N+1$ |
2 | $N$ | $N$ | $N$ | $N$ | $0$ |
3 | $N^2$ | $N^2 + N$ | $N^2$ | $0$ | $N (N + 1)$ |
4 | $N^2$ | $N^2$ | $3 N^2$ | $3 N^2$ | $0$ |
Totals: | | $2 N^2 + 3N + 1$ | $4 N^2 + 2 N$ | $3 N^2 + N$ | $N^2 + 2N + 1$ |
The total run time, $T(N)$, for this algorithm is
\[ \begin{eqnarray*} T(N) & = & (2N^2 + 3N + 1) t_{\mbox{asst}} + (N^2 + 2N + 1)t_{\mbox{comp}} \\ & & + (4N^2 + 2N)t_{\mbox{add}} + (3N^2 + N)t_{\mbox{mult}} \end{eqnarray*} \]
where $t_{\mbox{asst}}$ is the time required by our CPU to do one assignment, $t_{\mbox{comp}}$ is the time required by our CPU to do one comparison, $t_{\mbox{add}}$ is the time required to do one addition, and $t_{\mbox{mult}}$ is the time required to do one multiplication.
- This process is tedious.
- The actual value depends upon how fast your CPU does assignments, additions, …
- The actual value depends on how your compiler translates the source code into lower-level instructions and upon the settings you use when invoking the compiler. We’ve made some reasonable assumptions in this example, but that might be harder to do if we were working with more complicated code.
Suppose that we group together terms that involve different powers of N:
\[\begin{eqnarray*} T(N) & = & N^{2}(2t_{\mbox{asst}} + t_{\mbox{comp}} + 4t_{\mbox{add}} + 3t_{\mbox{mult}}) \\ & & + N(3t_{\mbox{asst}} + 2t_{\mbox{comp}} + 2t_{\mbox{add}} + t_{\mbox{mult}}) \\ & & + 1(t_{\mbox{asst}} + t_{\mbox{comp}}) \end{eqnarray*}\]
Each of the parenthesized terms is, for any given CPU, a constant.
Define
\[ \begin{align} c_1 & = 2t_{\mbox{asst}} + t_{\mbox{comp}} + 4t_{\mbox{add}} + 3t_{\mbox{mult}} \\
c_2 & = 3t_{\mbox{asst}} + 2t_{\mbox{comp}} + 2t_{\mbox{add}} + t_{\mbox{mult}} \\ c_3 & = t_{\mbox{asst}} + t_{\mbox{comp}}
\end{align} \]
Then $T(N)=c_1 N^2 +c_2 N + c_3$.
The $c_{i}$ will be different for different CPUs and even for different compilers on the same CPU.
But this formula for $T(N)$ still allows us to describe the behavior of this algorithm for different input (array) sizes.
Shortly, we will see that’s all that really matters.
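For instance, plugging in some purely made-up operation times (the numbers below are illustrative, not measurements of any real CPU), we can evaluate $T(N)$ for several input sizes and watch the $N^2$ term take over:

#include <iostream>

int main()
{
    // Hypothetical per-operation times, in nanoseconds (illustrative only).
    const double tAsst = 1.0, tComp = 1.0, tAdd = 1.0, tMult = 3.0;

    const double c1 = 2*tAsst + tComp + 4*tAdd + 3*tMult;   // coefficient of N^2
    const double c2 = 3*tAsst + 2*tComp + 2*tAdd + tMult;   // coefficient of N
    const double c3 = tAsst + tComp;                        // constant term

    for (double N = 10; N <= 100000; N *= 10) {
        const double T = c1*N*N + c2*N + c3;
        std::cout << "N = " << N << "   T(N) = " << T << " ns"
                  << "   (N^2 term alone: " << c1*N*N << " ns)\n";
    }
    return 0;
}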
Many algorithms will run in different amounts of time depending upon which input data they are given, even when given the same “amount” of data.
Consider, for example, a basic sequential search:
int seqSearch (const int arr[], int N, int key)
{
    for (int i = 0; i < N; ++i)
    {
        if (arr[i] == key)
            return i;
    }
    return -1;
}
- If we are searching for a key that happens to be in arr[0], we return after a single iteration of the loop.
- If we are searching for a key that happens to be in arr[N-1], we return after N iterations of the loop.
- If we are searching for a key that isn’t anywhere within arr, we return after N iterations of the loop, but N+1 evaluations of the loop condition. This is actually slightly slower than the previous case.
- If we are searching for a key that is somewhere within arr, we will take something between the two extremes of 1 iteration and N iterations of the loop. Intuitively, we might guess that we will average around N/2 iterations.
For a fixed “size” of input,
- we call the minimum run-time we can obtain, over all possible inputs of that size, the best case time of that algorithm,
- we call the maximum run-time we can obtain, over all possible inputs of that size, the worst case time of that algorithm, and
- we call the average run-time we can obtain, over all possible inputs of that size, the average case time of that algorithm.
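Returning to seqSearch, the sketch below (an instrumented copy of the function; the counter and the names are just for illustration) counts loop iterations instead of measuring wall-clock time, and reproduces the best-case and worst-case behavior described above:

#include <iostream>
#include <vector>

// Copy of seqSearch that also reports how many loop iterations were performed.
int seqSearchCounted(const std::vector<int>& arr, int key, int& iterations)
{
    iterations = 0;
    for (int i = 0; i < static_cast<int>(arr.size()); ++i)
    {
        ++iterations;
        if (arr[i] == key)
            return i;
    }
    return -1;
}

int main()
{
    const int N = 1000;
    std::vector<int> a(N);
    for (int i = 0; i < N; ++i)
        a[i] = i;                              // a[i] == i, so searches are predictable

    int iters = 0;
    seqSearchCounted(a, 0, iters);             // best case: key is in arr[0]
    std::cout << "key at front: " << iters << " iteration(s)\n";   // 1

    seqSearchCounted(a, N - 1, iters);         // worst case (found): key is in arr[N-1]
    std::cout << "key at back:  " << iters << " iterations\n";     // N

    seqSearchCounted(a, -42, iters);           // key absent: N iterations, N+1 loop tests
    std::cout << "key absent:   " << iters << " iterations\n";     // N
    return 0;
}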
Definition
We say that an algorithm requires time proportional to $f(\bar{n})$ if there are constants $c$ and $\bar{n}_0$ such that the algorithm requires no more than $c*f(\bar{n})$ time units to process an input set of size $\bar{n}$ whenever $\bar{n} \geq \bar{n}_0$ .
Let’s take this definition apart, piece by piece, and see what it actually tells us.
First, we are talking about an input set of size $\bar{n}$.
It’s not always clear what the components of $\bar{n}$ are, but, as it happens, the techniques that we will be developing will help to reveal the appropriate “size” measure(s).
Second, we are talking about algorithms taking time proportional to $f(\bar{n})$. The “f” here describes the rate at which the time required goes up as you change the size of the input, for a particular program or algorithm.
The multiplier c is used so that we can talk about the algorithm requiring no more than “$c*f(\bar{n})$ time units”. In that sense, c is the multiplier that converts this abstract function $f(\bar{n})$ into a real time.
The final component of this definition is $\bar{n}_0$.
$\bar{n}_0$ is used to place a lower limit on the inputs that we are really worried about. The reason for doing this is that it’s only when we have large amounts of data to process that speed becomes an issue.
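As a quick worked example of the definition (with made-up numbers), suppose an algorithm takes $t(n) = 3n + 5$ time units on inputs of size $n$. Then it requires time proportional to $f(n) = n$, because
\[ 3n + 5 \leq 3n + 5n = 8n \quad \mbox{whenever } n \geq 1, \]
so the constants $c = 8$ and $n_0 = 1$ satisfy the definition.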
What does “time proportional to” have to do with the idea of complexity?
When I started off this lesson, I said what we are trying to do was to come up with a way to discuss the speed of an algorithm. Then I told you that we would not measure the speed directly, but instead use this notion of “complexity”. Then I turned around and gave you this definition of “time proportional to”. Is this some sort of bait and switch?
There are two steps remaining to justify the connections between “time proportional to” and “complexity”.
First, we introduce a shorthand notation so we don’t have to keep writing out “time proportional to”.
Definition
O(f(N)) is the set of all functions that are proportional to f(N).
If $t(N)$ is the time required to run a given program on an input set of size $N$, we write
\[ t(N) = O(f(N)) \]
to mean “there exist constants $c$ and $n_0$ such that $t(N) \le c*f(N)$ when $N > n_0$.”
We’re abusing the “=” sign here. That’s not really equality. Since $O(f(N))$ is a set of functions, we should probably be writing
\[ t(N) \in O(f(N)) \]
instead, and the original definitions of $O(f(N))$ did exactly that. But somewhere along the line, it became tradition to use the “$=$” symbol instead of “$\in$”.
For example, if a program runs in time T(N) and we can show that T(N) is proportional to $f(N)=N^2$, then we would say that “T(N) is in $O(N^2)$”.
This is often called “big-O” notation.
The “O” in this notation stands for “order”, so people will sometimes talk about $O(N)$ functions as having “linear order” or $O(N^2)$ functions as having “quadratic order”.
Second, we assert that this quantity, $O(f(n))$, is in fact what we call the worst case complexity of an algorithm, and that this will be our measure of algorithm speed in most cases.
Suppose that we have run a program all day, every day for a year.
In all that time, we have never managed to process more than 1,000 data records in one day.
We are considering buying a new CPU that runs twice as fast as our current one.
How many data records should we expect to process per day with the new CPU?
If we know the complexity of the program, we can answer that question.
Program Complexity | Max Records per Day |
---|---|
$O(\log n)$ | 1,000,000 |
$O(n)$ | 2,000 |
$O(n \log n)$ | 1,850 |
$O(n^{2})$ | 1,414 |
$O(n^{3})$ | 1,259 |
$O(2^{n})$ | 1,001 |
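Each entry comes from asking what input size $n'$ the twice-as-fast CPU can handle in a day, i.e., from solving $f(n') = 2 f(1000)$ for $n'$. A sketch of the arithmetic for three of the rows:
\[ \begin{eqnarray*} O(n): & & n' = 2 \cdot 1000 = 2000 \\ O(n^2): & & (n')^2 = 2 \cdot 1000^2 \Rightarrow n' = 1000\sqrt{2} \approx 1414 \\ O(2^n): & & 2^{n'} = 2 \cdot 2^{1000} \Rightarrow n' = 1000 + 1 = 1001 \end{eqnarray*} \]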
All the big-O expressions shown above are ones that we will encounter in the course of this semester, so you can see that there is a very noticeable and practical difference in the answers we would offer to the question, “Is it worth it to buy the faster CPU?” Certainly, if we knew that our program was in $O(2^{n})$, then buying a faster CPU would not actually boost our processing capability by much at all. We would probably do better by devoting our resources to redesigning the program to use a much lower-complexity algorithm.
We left this earlier example
1: for (i = 0; i < N; ++i) {
2: a[i] = 0;
3: for (j = 0; j < N; ++j)
4: a[i] = a[i] + i*j;
5: }
after concluding that $T(N) = c_1 N^2 + c_2 N + c_3$
Conjecture
\[ T(N) = c_1 N^2 + c_2 N + c_3 = O(N^2) \]
(i.e., the algorithm has a worst-case complexity of $N^{2}$)
Proof
\[ T(N) = c_1 N^2 + c_2 N + c_3 \]
Let $n_0=1$. It should be clear that
\[ c_1 N^2 + c_2 N + c_3 \leq c_1 N^2 + c_2 N^2 + c_3 N^2 \]
because $c_2 N \leq c_2 N^2$ and $c_3 \leq c_3 N^2$ whenever $N \geq 1$.
We know, therefore, that
\[T(N) \leq c_1 N^2 + c_2 N^2 + c_3 N^2 \]
Now define c as $c = c_1 + c_2 + c_3$, and we have
\[ T(N) \leq c N^2 \]
which means, by the definition of “proportional to”, that $T(N)$ is proportional to $N^2$ and therefore is in $O(N^{2})$.
The tallest building in the world is said to be the Burj Khalifa, at 2,717 ft. So if I told you that “h(b) is the height of a building b”, you might suggest that $h(b) \leq 2717\mbox{ft}$. And that’s a reasonable bound.
Of course it’s also true that $h(b) \leq 4000\mbox{ft}$. And that $h(b) \leq 10000\mbox{mi}$. None of those are false statements, but 2717ft is a tight bound, 4000ft is a loose bound, and 10000mi is a ridiculously loose bound.
What is the complexity of $t(n) = 0.5n$? The tight answer is $O(n)$ (take, say, $c = 1$ and $n_0 = 1$); it is also, much more loosely, in $O(n^2)$ or even $O(2^n)$.
In general, you should always try to give as tight a bound for the complexity as you can.
Sometimes we will settle for slightly looser bounds because proving anything tighter would be difficult. But being off by one or more powers of $n$ will be considered a failure to provide a proper bound.
Recall our definition:
We say that an algorithm requires time proportional to $f(\bar{n})$ if there are constants $c$ and $\bar{n}_0$ such that the algorithm requires no more than $c*f(\bar{n})$ time units to process an input set of size $\bar{n}$ whenever $\bar{n} \geq \bar{n}_0$.
In our definition of complexity, $\bar{n}$ is simply some numeric measure of the “size” of the algorithm’s inputs.
Writing a big-O expression with undefined variables (especially “n”) is one of the most common, and least forgivable, mistakes in students’ answers on assignments and tests.
Writing a big-O expression with an expression that does not describe a number is probably the next most common error.
In many cases, the algorithm will not actually contain a variable or parameter named “n”. (n is not the same thing as N, in C++ or in mathematics!)
For example, consider the function
ostream& operator<< (ostream& out, const std::string& str);
We can’t simply say:
operator<< is in $O(n)$
because there is no “n” here.
We also can’t say
operator<< is in $O(out)$
or
operator<< is in $O(str)$
There are two ways to express the idea that this function’s complexity is probably proportional to the length of the string str. The first is to simply define the symbol we want:
operator<< is in $O(n)$, where $n$ is the number of characters in str.
The added definition makes all the difference.
The other way to express the same idea is to replace “n” by an appropriate expression that captures the concept of “number of characters in str”. Taking advantage of the public interface to std::string, we can write:
operator<< is in O(str.size())
which is perfectly understandable to anyone familiar with std::string in C++.
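To see why str.size() is the natural measure here, consider a purely hypothetical sketch of what such an output operator might do internally (the function name is made up, and the real library implementation is more sophisticated, but the work is still one constant-time step per character):

#include <iostream>
#include <string>

// Hypothetical sketch only -- not the actual library implementation of operator<<.
std::ostream& printString(std::ostream& out, const std::string& str)
{
    for (std::string::size_type i = 0; i < str.size(); ++i)   // str.size() iterations
        out.put(str[i]);                                       // constant work per character
    return out;
}

int main()
{
    printString(std::cout, "hello, world") << '\n';
    return 0;
}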