Analysis of Algorithms: Worst Case Complexity
Steven J. Zeil
Our primary tool in trying to predict the speed of an algorithm is going to be an idea we call the “complexity” or more specifically the “worst case complexity” of the algorithm.
The English language word “complexity” may mean many things, and two people might not necessarily agree which of two algorithms is more complex in the colloquial sense of the word. But in this lesson we are going to explore a specific definition of complexity that will, as it turns out, be directly related to the running speed of the algorithm.
I’ll warn you right up front that the initial work we are going to do, both giving this definition and showing how it applies, is going to be a little bit tedious. But as we go through the remainder of the lessons in this section of the course, we will come up with faster and more elegant ways to apply this idea of complexity.
1 Brute Force Timing Analysis
1.1 Timing Analysis Example
The total run time, $T(N)$, for this algorithm is
\[ \begin{eqnarray*} T(N) & = & (2N^2 + 3N + 1) t_{\mbox{asst}} + (N^2 + 2N + 1)t_{\mbox{comp}} \\ & & + (4N^2 + 2N)t_{\mbox{add}} + (3N^2 + N)t_{\mbox{mult}} \end{eqnarray*} \]
where $t_{\mbox{asst}}$ is the time required by our CPU to do one assignment, $t_{\mbox{comp}}$ is the time required by our CPU to do one comparison, $t_{\mbox{add}}$ is the time required to do one addition, and $t_{\mbox{mult}}$ is the time required to do one multiplication.
1.1.1 Limitations of Detailed Analysis

- This process is tedious.
- The actual value depends upon how fast your CPU does assignments, additions, …
- The actual value depends on how your compiler translates the source code into lower-level instructions and upon the settings you use when invoking the compiler. We’ve made some reasonable assumptions in this example, but that might be harder to do if we were working with more complicated code.
1.2 Do We Need That Much Detail?
The total run time, T(N), for this algorithm is
\[ \begin{eqnarray*} T(N) & = & (2N^2 + 3N + 1) t_{\mbox{asst}} + (N^2 + 2N + 1)t_{\mbox{comp}} \\ & & + (4N^2 + 2N)t_{\mbox{add}} + (3N^2 + N)t_{\mbox{mult}} \end{eqnarray*} \]
Suppose that we group together terms that involve different powers of N:
\[\begin{eqnarray*} T(N) & = & N^{2}(2t_{\mbox{asst}} + t_{\mbox{comp}} + 4t_{\mbox{add}} + 3t_{\mbox{mult}}) \\ & & + N(3t_{\mbox{asst}} + 2t_{\mbox{comp}} + 2t_{\mbox{add}} + t_{\mbox{mult}}) \\ & & + 1(t_{\mbox{asst}} + t_{\mbox{comp}}) \end{eqnarray*}\]
Each of the parenthesized terms is, for any given CPU, a constant.
1.2.1 CPU Constants
Define
\[ \begin{align} c_1 & = 2t_{\mbox{asst}} + t_{\mbox{comp}} + 4t_{\mbox{add}} + 3t_{\mbox{mult}} \\
c_2 & = 3t_{\mbox{asst}} + 2t_{\mbox{comp}} + 2t_{\mbox{add}} + t_{\mbox{mult}} \\ c_3 & = t_{\mbox{asst}} + t_{\mbox{comp}}
\end{align} \]
Then $T(N)=c_1 N^2 +c_2 N + c_3$

- The $c_{i}$ will be different for different CPUs and even for different compilers on the same CPU.
- But this formula for $T(N)$ still allows us to describe the behavior of this algorithm for different input (array) sizes.
- Shortly, we will see that’s all that really matters.
2 Best, Worst, and Average Cases
2.1 Varying Times
Many algorithms will run in different amounts of time depending upon which input data they are given, even when given the same “amount” of data.
Consider, for example, a basic sequential search:
int seqSearch (int[] arr, int N, int key)
{
    for (int i = 0; i < N; ++i)
    {
        if (arr[i] == key)
            return i;
    }
    return -1;
}
Obviously, if we increase N
, the number of elements in the array, this algorithm will slow down. But even if we look at different possibilities for some fixed value of N
, there can be very different run times.
- If we search for a key that happens to be in arr[0], we return after a single iteration of the loop.
- If we search for a key that happens to be in arr[N-1], we return after N iterations of the loop.
- If we search for a key that isn’t anywhere within arr, we return after N iterations of the loop, but N+1 evaluations of the loop condition. This is actually slightly slower than the previous case.
- If we make a random selection of a key from among the values filled into arr, we will take something between the two extremes of 1 iteration and N iterations of the loop. Intuitively, we might guess that we will average around N/2 iterations.
For a fixed “size” of input,
- we call the minimum runtime we can obtain, over all possible inputs of that size, the best case time of that algorithm,
- we call the maximum runtime we can obtain, over all possible inputs of that size, the worst case time of that algorithm, and
- we call the average runtime we can obtain, over all possible inputs of that size, the average case time of that algorithm.
Now, the notion of what constitutes an “average” is a surprisingly slippery one, so we will defer discussion of that till a later lesson.
Even the idea of “size of input” is a bit problematic, but we need to come to grips with that almost immediately.
First though, let’s point out that we are only rarely interested in the best case behavior. It would be absurdly over-optimistic to make decisions about what algorithm to use based on its best-case behavior.
We might think that the average case behavior would be a much more reasonable basis for choice. And, sometimes it is. But if the worst case behavior is much, much slower than the average, then Murphy’s law says that we will hit that slow worst case just when it is most critical, when we really need that output quickly or when we are demoing the code to upper management or potential customers.
 Worst case behavior is particularly crucial for interactive programs. If you have a button in front of you, and when you click on it, you are used to getting a response in an average of a tenth of a second, what do you do when you suddenly get a “worst case” delay of 5 seconds? Why you click it again, and again, and… (Come on, be honest, you know you do this!) And suddenly instead of performing a function once, the program finally finishes handling the first click and then immediately tries to honor all of those other clicks as well.
So, for now, we’re going to focus on worst-case behavior.
2.2 Size of the Input
What do we mean by “size”? Often, that’s pretty obvious. In our sequential search function, the array size N is clearly an indicator of how much data we have to work with. If, as another example, we had a program that read a series of numbers and computed the average of them all, the “size” is probably how many numbers are in the input file. If we had a program that read personnel records for preparing a company’s payroll, the “size” is probably the number of people working for the company.
For some code, there may be multiple possibilities for a “size” measure. If our program reads and processes lines of text, the best “size” measure might be the number of lines in the input file. But if we actually scan through each character in each line, the total number of characters might be a more appropriate “size” measure.
Sometimes, we will need to use multiple, separate numbers to describe the size of the input. Let’s change our sequential search function just slightly:
int seqSearch (string[] arr, int N, string key)
{
    for (int i = 0; i < N; ++i)
    {
        if (arr[i] == key)
            return i;
    }
    return -1;
}
Now, the time required to do the comparison arr[i] == key depends upon how many characters are in our strings. Obviously, strings of a few dozen characters can be compared quickly. Strings of thousands or millions of characters may take a long time to compare.
So we might need many numbers to measure the size $\bar{n}$ of the input for this function:
\[ \bar{n} = (N, w_0, w_1, \ldots, w_{N-1}, w_{\mbox{key}}) \]
where $N$ is the parameter denoting the array size, the $w_i$ are the widths of the strings in arr, and $w_{\mbox{key}}$ is the width of the string in key.
Well, that got ugly real fast!
Now, given that we have said that we are going to focus on worst case behavior, we can guess that we can probably reduce that a bit to
\[ \bar{n} = (N, w_{\mbox{max}}) \]
where $N$ is the parameter denoting the array size, and $w_{\mbox{max}}$ is the maximum width of any string in arr and key.
In some cases we might be able to reduce the size $\bar{n}$ to a single number. If someone were to walk in on our design meeting and say “Oh, by the way, we never have more than 25 elements in the array.”, we might conclude that $w_{\mbox{max}}$ is the only thing we need to worry about. Or, hearkening back to our earlier example of a spellchecker design, if we believed that our array was filled with English language words, we could argue that there is a reasonably small limit on $w_{\mbox{max}}$ and that $N$ is the only thing we really need to worry about.

Reputedly, “antidisestablishmentarianism” is the longest word recognized in English dictionaries. While quite a mouthful, that’s still not long enough to be a major concern in most text processing applications.

On the other hand, if we were working in genetics, our “strings” might represent genomes and a “word” might be many millions of characters long, in which case we might expect the length of the strings being processed would be a major factor in the algorithm performance.
3 Worst Case Complexity
3.1 Time Proportional To
Definition
We say that an algorithm requires time proportional to $f(\bar{n})$ if there are constants $c$ and $\bar{n}_0$ such that the algorithm requires no more than $c*f(\bar{n})$ time units to process an input set of size $\bar{n}$ whenever $\bar{n} \geq \bar{n}_0$ .
The definition you see here is going to be fundamental to our idea of complexity of an algorithm. This definition explains what we mean when we say that an algorithm requires time proportional to some function of n, where n is the measure of the amount of input data.
Let’s take this definition apart, piece by piece, and see what it actually tells us.

First, we are talking about an input set of size $\bar{n}$. We’ve already talked about this idea. It’s not always clear what the components of $\bar{n}$ are, but, as it happens, the techniques that we will be developing will help to reveal the appropriate “size” measure(s) as a side effect of the analysis that we will be doing anyway. So we don’t need to sweat over this question too much for now.

Second, we are talking about algorithms taking time proportional to $f(\bar{n})$. The “f” here describes the rate at which the time required by this algorithm goes up as you change the size of the input for a particular program or algorithm.
For example, if I were to tell you that an algorithm runs in time proportional to N (i.e., $f(\bar{n})=N$), then I am telling you that the running time of this algorithm is directly proportional to the size of the input set.

Doubling the size of the input set (running twice as much data as we did previously) may double the running time. More specifically, it will double the maximum amount of time we expect this algorithm to take.

On the other hand if I run 10 times as many inputs through this algorithm, I will expect that, for an algorithm with time proportional to n, it would take 10 times as much time to run.
Now, suppose I tell you that this algorithm runs in time proportional to $N^{2}$. Then the running time of the algorithm in the worst case should go up with the square of the input size.
- So if I run twice as much input as before, I should see the maximum running time go up by a factor of 4.
- If I run 10 times as many inputs as I did previously, I could see the running time go up by as much as 100 times.
Keep in mind that we are talking about worst case situations here. If an algorithm takes time proportional to $N^{2}$ and I double the amount of input I put through it, I might see the running time go up as much as a factor of 4. It doesn’t have to be that bad, but it could be.


The next component of the definition I want to look at is the multiplier c. The multiplier c is used so that we can talk about the algorithm requiring no more than “$c*f(\bar{n})$ time units”. In that sense, c is the multiplier that converts this abstract function $f(\bar{n})$ into a real time.
If I tell you, let’s say, that this algorithm runs in time proportional to $f(\bar{n}) = N$ and also tell you that constant c is 0.25 seconds, then you can take any particular value of N, plug it into this formula, and get the exact value for the maximum run time of this algorithm. For example, when N=10, then I would expect this algorithm would take no more than 2.5 seconds.

The final component of this definition is $\bar{n}_0$.
When we say $\bar{n} \geq \bar{n}_{0}$, we mean that each component of $\bar{n}$ is greater than or equal to the corresponding component of $\bar{n}_0$.
$\bar{n}_0$ is used to place a lower limit on the inputs that we are really worried about. The reason for doing this is that
- many algorithms behave rather differently on small input sets than they do on large sets, and
- algorithms are often fast enough on small input sets that we just don’t care about their exact speed.
It’s only when we have large amounts of data to process that speed becomes an issue.
Suppose that algorithm A is faster than B on small input sets, but B is faster than A on large input sets. Odds are that, on the small sets, both algorithms are “fast enough” and any differences between them are of little practical consequence. What we are usually interested in is talking about what happens when the input set size is big enough to actually be troublesome. That’s what $n_{0}$ does for us. The definition says that we are only worried about this when $\bar{n} \geq \bar{n}_{0}$. $\bar{n}_{0}$ represents the limit of “being large enough to worry about”. When $\bar{n}$ is smaller than $\bar{n}_{0}$ we don’t care about the speed.
Sometimes an algorithm is able to achieve good run times for large input sizes because it does a lot of extra preparation at the beginning of the run. For example, some algorithms compute a bunch of intermediate results and store them away in data structures from which they can be retrieved later. In circumstances like that, you often find that the algorithm’s speed suffers for very small input set sizes because there is a certain amount of overhead involved in doing that preparation. And so a simpler algorithm that doesn’t do that kind of extra preparation may run faster for small input set sizes. But when you start giving larger and larger sets of inputs, the algorithm that does the extra preparation starts to go faster and faster, taking advantage of that extra work that it did earlier.
Given a choice between these two algorithms, we probably want to go with the more elaborate algorithm, not because it is more elaborate but because, when it really counts, that algorithm actually runs faster for us.
This is what $n_{0}$ does for us in that definition. It sets a certain threshold below which we don’t really care.
3.1.1 Getting back to “complexity”
What does “time proportional to” have to do with the idea of complexity?
When I started off this lesson, I said what we are trying to do was to come up with the way to discuss the speed of an algorithm. Then I told you that we would not measure the speed directly, but instead use this notion of “complexity”. Then I turned around and gave you this definition of “time proportional to”. Is this some sort of bait and switch?
There are two steps remaining to justify the connections between “time proportional to” and “complexity”.
3.2 Big-O
First, we introduce a shorthand notation so we don’t have to keep writing out “time proportional to”.
Definition
O(f(N)) is the set of all functions that are proportional to f(N).
If $t(N)$ is the time required to run a given program on an input set of size $N$, we write
\[ t(N) = O(f(N)) \]
to mean “there exist constants $c$ and $n_0$ such that $t(N) \le c*f(N)$ when $N \geq n_0$.”
This is often called “big-O” notation. For example, if a program runs in time T(N) and we can show that T(N) is proportional to $f(N)=N^2$, then we would say that “T(N) is in $O(N^2)$”. Informally, we often simply say that the program is in $O(N^2)$. Some people will shorten that phrase even further and simply say that the program “is $O(N^2)$”, but that tends to hide the fact that $O(N^2)$ is actually the name of a whole set of functions.
The “O” in this notation stands for “order”, so people will sometimes talk about $O(N)$ functions as having “linear order” or $O(N^2)$ functions as having “quadratic order”.
3.3 Big-O == Worst Case Complexity
Second, we assert that this quantity, $O(f(n))$, is in fact what we call the worst case complexity of an algorithm, and that this will be our measure of algorithm speed in most cases.
3.3.1 Big-O and Expectations for Run Times
Suppose that we have run a program all day, every day for a year.

- In all that time, we have never managed to process more than 1,000 data records in one day.
- We are considering buying a new CPU that runs twice as fast as our current one.

How many data records should we expect to process per day with the new CPU? If we know the complexity of the program, we can answer that question.
Program Complexity | Max Records per Day
------------------ | -------------------
$O(\log n)$ | 1,000,000
$O(n)$ | 2,000
$O(n \log n)$ | 1,850
$O(n^{2})$ | 1,414
$O(n^{3})$ | 1,259
$O(2^{n})$ | 1,001
All the big-O expressions shown above are ones that we will encounter in the course of this semester, so you can see that there is a very noticeable and practical difference in the answers we would offer to the question, “Is it worth it to buy the faster CPU?” Certainly, if we knew that our program was in $O(2^{n})$, then buying a faster CPU would not actually boost our processing capability by much at all. We would probably do better by devoting our resources to redesigning the program to use a much lower-complexity algorithm.
4 Testing an O(f(N)) Hypothesis
For example, suppose that we run a program on different sizes of input. Suppose, furthermore, that we have observed this program for a long time and noted that, for any given size of the input, it always takes approximately the same amount of time. In other words, whatever the worst case time is, we are reasonably sure that we have seen it or that it’s not significantly worse than the other times we have been observing.
If this is what we have observed:
Input Set Size | Average Time (seconds)
-------------- | ----------------------
1.0 | 10.0
2.0 | 40.0
3.0 | 80.0
4.0 | 165.0
We would suspect that this algorithm is in $O(N^{2})$ (with c = 10).
To defend this suspicion, refer back to our definition:
We say that an algorithm requires time proportional to f(n) if there are constants c and $n_{0}$ such that the algorithm requires no more than c*f(n) time units to process an input set of size n whenever $n \geq n_{0}$ .
Divide each running time by $f(N) = N^{2}$. This would give us an approximate value for the constant $c$ from our definition of complexity.
Input Set Size (N) | Avg. Time (sec) | Time / $N^2$
------------------ | --------------- | ------------
1.0 | 10.0 | 10.0
2.0 | 40.0 | 10.0
3.0 | 80.0 | 8.9
4.0 | 165.0 | 10.3
If our guess is correct, the quotients should stay roughly constant.
Some variation is normal in any experiment, but this looks pretty good. One important limitation, however, is that in this example we’re only dealing with averages over a finite number of observations. If that sample of input cases does not, in fact, include the input case responsible for the worst time of the algorithm, and if that worst case is significantly different from the average, then our numbers don’t mean much at all about the worst case complexity.
Another thing to consider: the following is also probably $O(N^{2})$:
Input Set Size (N) | Time (seconds) | Time / $N^2$
------------------ | -------------- | ------------
1.0 | 2000.0 | 2000.0
2.0 | 8000.0 | 2000.0
3.0 | 18000.0 | 2000.0
4.0 | 32000.0 | 2000.0
It just has a significantly larger value for $c$.
So two programs can both be in $O(f)$, but have very different run times. As we’ll see shortly, however, when two programs have different complexities $O(f)$ and $O(g)$, no matter what constants they might involve, the function part of the complexities will eventually dominate any comparison.
5 Detailed Timing Example Revisited
We left this earlier example
for (i = 0; i < N; ++i) {
    a[i] = 0;
    for (j = 0; j < N; ++j)
        a[i] = a[i] + i*j;
}
after concluding that $T(N) = c_1 N^2 + c_2 N + c_3$
Conjecture
\[ T(N) = c_1 N^2 + c_2 N + c_3 = O(N^2) \]
(i.e., the algorithm has a worst-case complexity of $O(N^{2})$)
Proof
\[ T(N) = c_1 N^2 + c_2 N + c_3 \]
Let $c=c_1 + c_2 + c_3$ and let $n_0=1$. It should be clear that
\[ c_1 N^2 + c_2 N + c_3 \leq c_1 N^2 + c_2 N^2 + c_3 N^2 \]
because $c_2 N \leq c_2 N^2$ and $c_3 \leq c_3 N^2$ whenever $N \geq 1$.
We know, therefore, that
\[T(N) \leq c_1 N^2 + c_2 N^2 + c_3 N^2 \]
Since we defined $c = c_1 + c_2 + c_3$, we have
\[ T(N) \leq c N^2 \]
which means, by the definition of “proportional to” that T(N) is proportional to $N^2$ and therefore is in $O(N^{2})$.
6 Pitfalls to Avoid
6.1 Loose Bounds and Tight Bounds
The tallest building in the world is said to be the Burj Khalifa, at 2,717 ft. So if I told you that “h(b) is the height of a building b”, you might suggest that $h(b) \leq 2717\mbox{ft}$. And that’s a reasonable bound.
Of course it’s also true that $h(b) \leq 4000\mbox{ft}$. And that $h(b) \leq 10000\mbox{mi}$. None of those are false statements, but 2,717 ft is a tight bound, 4,000 ft is a loose bound, and 10,000 mi is a ridiculously loose bound.
What is the complexity of $t(n) = 0.5n$?
- We could say that $t(n) \in O(n)$, and that’s a tight bound.
- We could say that $t(n) \in O(n^2)$, and that’s true but is a loose bound.
- We could say that $t(n) \in O(\infty)$, and that’s also true but is such a loose bound that it’s completely useless.
In general, you should always try to give as tight a bound for the complexity as you can.
Sometimes we will settle for slightly looser bounds because proving anything tighter would be difficult. But being off by one or more powers of $n$ will be considered a failure to provide a proper bound.
6.2 “n” is “n”othing special
Recall our definition:
We say that an algorithm requires time proportional to $f(\bar{n})$ if there are constants $c$ and $\bar{n}_0$ such that the algorithm requires no more than $c*f(\bar{n})$ time units to process an input set of size $\bar{n}$ whenever $\bar{n} \geq \bar{n}_0$.
In our definition of complexity, $\bar{n}$ is simply some numeric measure of the “size” of the algorithm’s inputs. Now, that doesn’t mean the algorithm will actually contain a variable or parameter named “n”, or that there will even be a single program variable that we can substitute for n in this definition. Sometimes, our “n” is an expression that, somehow, defines a measure of the size of the input.
For example, consider the function
ostream& operator<< (ostream& out, const std::string& str);
for writing strings to an output stream. Now, intuitively, we can guess that if we were to double the number of characters in the string str to be written out, the amount of time taken by this function would also double. That suggests that this function’s complexity is probably proportional to the length of the string str.
But how do we write that? We can’t simply say

operator<< is in $O(n)$

because there is no “n” here.
You know that, when you are programming, you aren’t allowed to use variables that you have not properly declared. Surprise! The same rule applies to mathematics: you can’t use undeclared variables in mathematics!
We also can’t say

operator<< is in $O(out)$

or

operator<< is in $O(str)$

because, although out and str are defined variables in this context, neither one is a numeric quantity, so the expressions O(out) and O(str) don’t make sense. For example, O(str) would, ultimately, mean that $t(n) < c * \mbox{str}$, and that simply makes no sense if str is a string.
There are two ways to express the idea that this function’s complexity is probably proportional to the length of the string str. The first is to simply define the symbol we want:

operator<< is in $O(n)$, where $n$ is the number of characters in str.

The added definition makes all the difference: it allows a reader to ascertain that $n$ is a property of str (and not of some other variable that might be mentioned nearby, such as, for example, out) and exactly what the nature of that property is. This assumes, of course, that the idea of “number of characters in” is sufficiently obvious to be easily and unambiguously understood.
The other way to express the same idea is to replace “n” by an appropriate expression that captures the concept of “number of characters in str”. Taking advantage of the public interface to std::string, we can write:

operator<< is in O(str.size())

which is perfectly understandable to anyone familiar with std::string in C++.
Writing a big-O expression with undefined variables (especially “n”) is one of the most common, and least forgivable, mistakes in students’ answers on assignments and tests.
Writing a big-O expression with an expression that does not describe a number is probably the next most common error on tests.