Analysis of Algorithms: Worst Case Complexity

Steven J. Zeil

Last modified: Oct 20, 2020
Our primary tool in trying to predict the speed of an algorithm is going to be an idea we call the “complexity” or more specifically the “worst case complexity” of the algorithm.

The English language word “complexity” may mean many things, and two people might not necessarily agree which of two algorithms is more complex in the colloquial sense of the word. But in this lesson we are going to explore a specific definition of complexity that will, as it turns out, be directly related to the running speed of the algorithm.

I’ll warn you right up front that the initial work we are going to do, both giving this definition and showing how it applies, is going to be a little bit tedious. But as we go through the remainder of the lessons in this section of the course, we will come up with faster and more elegant ways to apply this idea of complexity.

1 Brute Force Timing Analysis

1.1 Timing Analysis Example

for (i = 0; i < N; ++i) {
   a[i] = 0;
   for (j = 0; j < N; ++j)
      a[i] = a[i] + i*j;
}

Let’s look at how we might analyze a simple algorithm to determine its run time before implementation:

We do this by tallying up all of the primitive operations that would be performed by this code. (This requires a certain level of insight into how a compiler will translate non-trivial operations, such as array indexing.)

The total run time, $T(N)$, for this algorithm is

\[ \begin{eqnarray*} T(N) & = & (2N^2 + 3N + 1) t_{\mbox{asst}} + (N^2 + 2N + 1)t_{\mbox{comp}} \\ & & + (4N^2 + 2N)t_{\mbox{add}} + (3N^2 + N)t_{\mbox{mult}} \end{eqnarray*} \]

where $t_{\mbox{asst}}$ is the time required by our CPU to do one assignment, $t_{\mbox{comp}}$ is the time required by our CPU to do one comparison, $t_{\mbox{add}}$ is the time required to do one addition, and $t_{\mbox{mult}}$ is the time required to do one multiplication.
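The tally behind this formula can be checked mechanically. The following is a minimal sketch, not the method used to derive the formula: it rewrites the loops so that every primitive operation bumps a counter. The names `OpCounts` and `countOps` are invented for this illustration, and the sketch assumes that each array access `a[i]` costs one multiplication and one addition (to compute the address), which is the convention that reproduces the coefficients above.

```cpp
#include <cassert>
#include <vector>

// Tallies of the four kinds of primitive operation used in the analysis.
struct OpCounts {
    long long asst = 0;  // assignments
    long long comp = 0;  // comparisons
    long long add  = 0;  // additions (including ++ and index arithmetic)
    long long mult = 0;  // multiplications (including index arithmetic)
};

// Runs the nested loops for an array of N elements and counts operations.
// Assumption: each a[i] access costs one multiplication and one addition.
OpCounts countOps(int N) {
    assert(N >= 0);
    std::vector<long long> a(N);
    OpCounts c;
    int i, j;

    i = 0; ++c.asst;                       // i = 0
    while (++c.comp, i < N) {              // i < N, evaluated N+1 times
        a[i] = 0;
        ++c.mult; ++c.add; ++c.asst;       // index a[i], then store 0

        j = 0; ++c.asst;                   // j = 0
        while (++c.comp, j < N) {          // j < N, evaluated N+1 times per i
            a[i] = a[i] + static_cast<long long>(i) * j;
            c.mult += 3;                   // two index computations plus i*j
            c.add  += 3;                   // two index computations plus the +
            ++c.asst;                      // store into a[i]
            ++j; ++c.add; ++c.asst;        // ++j
        }
        ++i; ++c.add; ++c.asst;            // ++i
    }
    return c;
}
```

For N = 5, the counters come out to 66 assignments, 36 comparisons, 110 additions, and 80 multiplications, which agrees with $2N^2+3N+1$, $N^2+2N+1$, $4N^2+2N$, and $3N^2+N$.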

1.1.1 Limitations of Detailed Analysis

1.2 Do We Need That Much Detail?

The total run time, T(N), for this algorithm is

\[ \begin{eqnarray*} T(N) & = & (2N^2 + 3N + 1) t_{\mbox{asst}} + (N^2 + 2N + 1)t_{\mbox{comp}} \\ & & + (4N^2 + 2N)t_{\mbox{add}} + (3N^2 + N)t_{\mbox{mult}} \end{eqnarray*} \]

Suppose that we group together the terms that involve the same power of N:

\[\begin{eqnarray*} T(N) & = & N^{2}(2t_{\mbox{asst}} + t_{\mbox{comp}} + 4t_{\mbox{add}} + 3t_{\mbox{mult}}) \\ & & + N(3t_{\mbox{asst}} + 2t_{\mbox{comp}} + 2t_{\mbox{add}} + t_{\mbox{mult}}) \\ & & + 1(t_{\mbox{asst}} + t_{\mbox{comp}}) \end{eqnarray*}\]

Each of the parenthesized terms is, for any given CPU, a constant.

1.2.1 CPU Constants

Define

\[ \begin{eqnarray*} c_1 & = & 2t_{\mbox{asst}} + t_{\mbox{comp}} + 4t_{\mbox{add}} + 3t_{\mbox{mult}} \\ c_2 & = & 3t_{\mbox{asst}} + 2t_{\mbox{comp}} + 2t_{\mbox{add}} + t_{\mbox{mult}} \\ c_3 & = & t_{\mbox{asst}} + t_{\mbox{comp}} \end{eqnarray*} \]

Then $T(N) = c_1 N^2 + c_2 N + c_3$, where $c_1$, $c_2$, and $c_3$ depend only on the CPU and not on $N$.

2 Best, Worst, and Average Cases

2.1 Varying Times

Many algorithms will run in different amounts of time depending upon which input data they are given, even when given the same “amount” of data.

Consider, for example, a basic sequential search:

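// Returns the position of the first element equal to key,
// or -1 if key does not occur in arr[0..N-1].
int seqSearch (const int arr[], int N, int key)
{
    for (int i = 0; i < N; ++i)
    {
       if (arr[i] == key)
          return i;   // found: stop early
    }
    return -1;        // examined all N elements without a match
}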

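Obviously, if we increase N, the number of elements in the array, this algorithm will slow down. But even if we look at different possibilities for some fixed value of N, there can be very different run times. If key happens to be in arr[0], the loop stops after a single comparison; if key is not in the array at all, the loop examines all N elements before giving up.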

For a fixed “size” of input,

  • we call the minimum run-time we can obtain, over all possible inputs of that size, the best case time of that algorithm,
  • we call the maximum run-time we can obtain, over all possible inputs of that size, the worst case time of that algorithm, and
  • we call the average run-time we can obtain, over all possible inputs of that size, the average case time of that algorithm.
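To make the best and worst cases of the sequential search concrete, here is a small sketch that counts comparisons instead of returning a position. The function name `countComparisons` and the use of `std::vector` are choices made for this illustration only; the number of comparisons stands in for run time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the seqSearch loop, but returns the number of element
// comparisons performed instead of the position found.
std::size_t countComparisons(const std::vector<int>& arr, int key) {
    std::size_t comparisons = 0;
    for (std::size_t i = 0; i < arr.size(); ++i) {
        ++comparisons;
        if (arr[i] == key)
            break;                          // found: stop early
    }
    assert(comparisons <= arr.size());      // never more than N comparisons
    return comparisons;
}
```

With the array {7, 3, 9, 4}, searching for 7 takes 1 comparison (the best case), while searching for 4 or for a missing value such as 5 takes all 4 (the worst case), even though N is the same in every call.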

Now, the notion of what constitutes an “average” is a surprisingly slippery one, so we will defer discussion of that till a later lesson.

Even the idea of “size of input” is a bit problematic, but we need to come to grips with that almost immediately.

First though, let’s point out that we are only rarely interested in the best case behavior. It would be absurdly over-optimistic to make decisions about what algorithm to use based on its best-case behavior.

We might think that the average case behavior would be a much more reasonable basis for choice. And, sometimes it is. But if the worst case behavior is much, much slower than the average, then Murphy’s law says that we will hit that slow worst case just when it is most critical, when we really need that output quickly or when we are demoing the code to upper management or potential customers.

So, for now, we’re going to focus on worst-case behavior.

2.2 Size of the Input

What do we mean by “size”? Often, that’s pretty obvious. In our sequential search function, the array size N is clearly an indicator of how much data we have to work with. If, as another example, we had a program that read a series of numbers and computed the average of them all, the “size” is probably how many numbers are in the input file. If we had a program that read personnel records for preparing a company’s payroll, the “size” is probably the number of people working for the company.

For some code, there may be multiple possibilities for a “size” measure. If our program reads and processes lines of text, the best “size” measure might be the number of lines in the input file. But if we actually scan through each character in each line, the total number of characters might be a more appropriate “size” measure.

Sometimes, we will need to use multiple, separate numbers to describe the size of the input. Let’s change our sequential search function just slightly:

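#include <string>

// Same search as before, but over strings: each == now compares
// the two strings character by character.
int seqSearch (const std::string arr[], int N, const std::string& key)
{
    for (int i = 0; i < N; ++i)
    {
       if (arr[i] == key)
          return i;
    }
    return -1;
}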

Now, the time required to do the comparison arr[i] == key depends upon how many characters are in our strings. Obviously, strings of a few dozen characters can be compared quickly. Strings of thousands or millions of characters may take a long time to compare.

So we might need many numbers to measure the size $\bar{n}$ of the input for this function:

\[ \bar{n} = (N, w_0, w_1, \ldots, w_{N-1}, w_{\mbox{key}}) \]

where N is the parameter denoting the array size, the $w_i$ are the widths of the strings in arr, and $w_{\mbox{key}}$ is the width of the string in key.

Well, that got ugly real fast!

Now, given that we have said that we are going to focus on worst case behavior, we can guess that we can probably reduce that a bit to

\[ \bar{n} = (N, w_{\mbox{max}}) \]

where N is the parameter denoting the array size, and $w_{\mbox{max}}$ is the maximum width of any string in arr and key.
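One way to see why the pair $(N, w_{\mbox{max}})$ is enough for worst-case purposes is to count individual character comparisons. The sketch below is illustrative only: `equalCounting`, `searchCost`, and `maxWidth` are hypothetical helpers, and the hand-written string comparison is a stand-in for whatever the library's `==` actually does internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Width of the longest string among arr and key (the w_max of the text).
std::size_t maxWidth(const std::vector<std::string>& arr, const std::string& key) {
    std::size_t w = key.size();
    for (const std::string& s : arr)
        w = std::max(w, s.size());
    return w;
}

// Compares two strings character by character, adding the number of
// character comparisons performed to charComps.
bool equalCounting(const std::string& a, const std::string& b, long long& charComps) {
    std::size_t shared = std::min(a.size(), b.size());
    for (std::size_t k = 0; k < shared; ++k) {
        ++charComps;
        if (a[k] != b[k])
            return false;
    }
    return a.size() == b.size();
}

// Sequential search over strings; returns the total character comparisons.
long long searchCost(const std::vector<std::string>& arr, const std::string& key) {
    long long charComps = 0;
    for (const std::string& s : arr) {
        if (equalCounting(s, key, charComps))
            break;
    }
    // Worst-case bound expressed with the size measure (N, w_max).
    assert(charComps <= static_cast<long long>(arr.size() * maxWidth(arr, key)));
    return charComps;
}
```

Searching {"aaaa", "aaab", "aaac"} for "aaad" costs 12 character comparisons, exactly $N \cdot w_{\mbox{max}} = 3 \cdot 4$, whereas searching for "b" costs only 3. The individual widths matter for the exact count, but the worst case is governed by N and $w_{\mbox{max}}$ alone.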

In some cases we might be able to reduce the size $\bar{n}$ to a single number. If someone were to walk in on our design meeting and say “Oh, by the way, we never have more than 25 elements in the array”, we might conclude that $w_{\mbox{max}}$ is the only thing we need to worry about. Or, hearkening back to our earlier example of a spellchecker design, if we believed that our array was filled with English language words, we could argue that there is a reasonably small limit on $w_{\mbox{max}}$ and that N is the only thing we really need to worry about.

3 Worst Case Complexity

3.1 Time Proportional To

Definition

We say that an algorithm requires time proportional to $f(\bar{n})$ if there are constants $c$ and $\bar{n}_0$ such that the algorithm requires no more than $c*f(\bar{n})$ time units to process an input set of size $\bar{n}$ whenever $\bar{n} \geq \bar{n}_0$ .

 

The definition you see here is going to be fundamental to our idea of complexity of an algorithm. This definition explains what we mean when we say that an algorithm requires time proportional to some function of n, where n is the measure of the amount of input data.

Let’s take this definition apart, piece by piece, and see what it actually tells us.

3.1.1 Getting back to “complexity”

What does “time proportional to” have to do with the idea of complexity?

When I started off this lesson, I said that what we are trying to do is to come up with a way to discuss the speed of an algorithm. Then I told you that we would not measure the speed directly, but instead use this notion of “complexity”. Then I turned around and gave you this definition of “time proportional to”. Is this some sort of bait and switch?

There are two steps remaining to justify the connections between “time proportional to” and “complexity”.

3.2 Big-O

First, we introduce a shorthand notation so we don’t have to keep writing out “time proportional to”.

Definition

O(f(N)) is the set of all functions that are proportional to f(N).

If $t(N)$ is the time required to run a given program on an input set of size $N$, we write

 

\[ t(N) = O(f(N)) \]

to mean “there exist constants $c$ and $n_0$ such that $t(N) \le c*f(N)$ when $N > n_0$.”

This is often called “big-O” notation. For example, if a program runs in time T(N) and we can show that T(N) is proportional to $f(N)=N^2$, then we would say that “T(N) is in $O(N^2)$”. Informally, we often simply say that the program is in $O(N^2)$. Some people will shorten that phrase even further and simply say that the program “is $O(N^2)$”, but that tends to hide the fact that $O(N^2)$ is actually the name of a whole set of functions, of which the program’s run time is just one member.
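The definition can be explored numerically. The sketch below uses a made-up run-time function $t(N) = 3N + 10$ and the candidate $f(N) = N$; the names `tExample`, `fLinear`, and `boundHolds` are hypothetical. Note that a scan over finitely many values of N can only refute a proposed pair $(c, n_0)$. It cannot prove that the bound holds for all larger N.

```cpp
#include <cassert>

// Hypothetical run-time function, for illustration only: t(N) = 3N + 10.
double tExample(long long N) { return 3.0 * N + 10.0; }

// Candidate bounding function f(N) = N.
double fLinear(long long N) { return static_cast<double>(N); }

// Returns true if t(N) <= c*f(N) for every N in [n0, limit].
// A finite scan can refute a proposed (c, n0) but cannot prove the bound.
template <typename T, typename F>
bool boundHolds(T t, F f, double c, long long n0, long long limit) {
    assert(n0 <= limit);
    for (long long N = n0; N <= limit; ++N) {
        if (t(N) > c * f(N))
            return false;
    }
    return true;
}
```

With $c = 4$ and $n_0 = 10$ the check succeeds, because $3N + 10 \leq 4N$ exactly when $N \geq 10$. With $c = 4$ and $n_0 = 1$ it fails at $N = 1$, and with $c = 3$ it fails for every N, which shows why the definition lets us choose both constants.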

The “O” in this notation stands for “order”, so people will sometimes talk about $O(N)$ functions as having “linear order” or $O(N^2)$ functions as having “quadratic order”.

3.3 Big-O == Worst Case Complexity

Second, we assert that this quantity, $O(f(n))$, is in fact what we call the worst case complexity of an algorithm, and that this will be our measure of algorithm speed in most cases.

3.3.1 Big-O and Expectations for Run Times

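Suppose that we have run a program all day, every day for a year, and that it processes 1,000 data records per day. We are then offered a new CPU that runs twice as fast.

How many data records should we expect to process per day with the new CPU?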

If we know the complexity of the program, we can answer that question.

Program Complexity    Max Records per Day
$O(\log n)$           1,000,000
$O(n)$                2,000
$O(n \log n)$         1,850
$O(n^{2})$            1,414
$O(n^{3})$            1,259
$O(2^{n})$            1,001

All the big-O expressions shown above are ones that we will encounter in the course of this semester, so you can see that there is a very noticeable and practical difference in the answers we would offer to the question, “Is it worth it to buy the faster CPU?” Certainly, if we knew that our program was in $O(2^{n})$, then buying a faster CPU would not actually boost our processing capability by much at all. We would probably do better by devoting our resources to redesign the program to use a much lower-complexity algorithm.
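The figures in the table can be reproduced with a short search, under two assumptions that the table values imply: the old CPU handles 1,000 records per day, and the new CPU is twice as fast. The sketch treats the run time as exactly $f(n)$, with no constant factors or lower-order terms, and looks for the largest $n$ with $f(n) \leq 2 f(1000)$. The function names are invented for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Largest n with f(n) <= speedup * f(base), for a non-decreasing f.
// Assumptions (hypothetical, chosen to match the table): `base` records
// per day on the old CPU, and a new CPU that is `speedup` times faster.
template <typename F>
long long maxRecords(F f, long long base, double speedup) {
    assert(base >= 1 && speedup >= 1.0);
    // Small relative tolerance so rounding error does not reject an exact tie.
    const double budget = speedup * f(base) * (1.0 + 1e-12);

    long long lo = base;        // f(lo) is known to fit in the budget
    long long hi = 2 * base;    // grow hi until f(hi) exceeds the budget
    while (f(hi) <= budget) {
        lo = hi;
        hi *= 2;
    }
    while (hi - lo > 1) {       // binary search between lo and hi
        long long mid = lo + (hi - lo) / 2;
        if (f(mid) <= budget)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

double logN(long long n)        { return std::log(static_cast<double>(n)); }
double linear(long long n)      { return static_cast<double>(n); }
double nLogN(long long n)       { return n * std::log(static_cast<double>(n)); }
double quadratic(long long n)   { return static_cast<double>(n) * n; }
double cubic(long long n)       { return static_cast<double>(n) * n * n; }
double exponential(long long n) { return std::ldexp(1.0, static_cast<int>(n)); }
```

Calling `maxRecords(f, 1000, 2.0)` gives 1,000,000 for `logN`, 2,000 for `linear`, 1,414 for `quadratic`, 1,259 for `cubic`, and 1,001 for `exponential`. For `nLogN` the exact search gives 1,838, a little below the rounded figure of 1,850 in the table.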

4 Testing a O(f(N)) Hypothesis

For example, suppose that we run a program on different sizes of input. Suppose, furthermore, that we have observed this program for a long time and noted that, for any given size of the input, it always takes approximately the same amount of time. In other words, whatever the worst case time is, we are reasonably sure that we have seen it or that it’s not significantly worse than the other times we have been observing.

If this is what we have observed:

Input Set Size    Average Time (seconds)
1.0               10.0
2.0               40.0
3.0               80.0
4.0               165.0

We would suspect that this algorithm is in $O(N^{2})$ (with $c \approx 10$).

To defend this suspicion, refer back to our definition:

We say that an algorithm requires time proportional to f(n) if there are constants c and $n_{0}$ such that the algorithm requires no more than c*f(n) time units to process an input set of size n whenever $n \geq n_{0}$ .

Divide each running time by $f(N) = N^{2}$. This would give us an approximate value for the constant $c$ from our definition of complexity.

Input Set Size (N)    Avg. Time (sec)    Time / ($N^2$)
1.0                   10.0               10.0
2.0                   40.0               10.0
3.0                   80.0               8.9
4.0                   165.0              10.3

If our guess is correct, the quotients should stay roughly constant.

Some variation is normal in any experiment, but this looks pretty good. One important limitation, however, is that in this example we’re only dealing with averages over a finite number of observations. If that sample of input cases does not, in fact, include the input case responsible for the worst time of the algorithm, and if that worst case is significantly different from the average, then our numbers don’t mean much at all about the worst case complexity.
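The division step is easy to automate. Here is a minimal sketch, with a hypothetical helper named `quotients`, that divides each observed time by $N^2$ so that the results can be inspected for rough constancy. It hard-codes the guess $f(N) = N^2$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Divides each observed time by the square of the matching input size.
// If the O(N^2) guess is right, the results should stay roughly constant.
std::vector<double> quotients(const std::vector<double>& sizes,
                              const std::vector<double>& times) {
    assert(sizes.size() == times.size());
    std::vector<double> q;
    for (std::size_t k = 0; k < sizes.size(); ++k)
        q.push_back(times[k] / (sizes[k] * sizes[k]));
    return q;
}
```

Applied to the sizes 1, 2, 3, 4 and the times 10, 40, 80, 165 observed above, this yields 10, 10, about 8.9, and about 10.3. To test a different hypothesis, such as $O(N)$ or $O(N^3)$, one would change the divisor accordingly and look for the guess whose quotients neither grow nor shrink steadily.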


Another thing to consider: the following is also probably $O(N^{2})$:

Input Set Size (N)    Time (seconds)    Time / ($N^2$)
1.0                   2000.0            2000.0
2.0                   8000.0            2000.0
3.0                   18000.0           2000.0
4.0                   32000.0           2000.0

It just has a significantly larger value for $c$.

So two programs can both be in $O(f)$, but have very different run times. As we’ll see shortly, however, when two programs have different complexities $O(f)$ and $O(g)$, no matter what constants they might involve, the function part of the complexities will eventually dominate any comparison.
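As a small preview of that last point, consider a linear-time program with a large constant competing against a quadratic-time program with a small one. The sketch below uses hypothetical run times of $2000N$ and $10N^2$ (the constants are borrowed from the two tables above purely for illustration) and finds the input size at which the quadratic program becomes the slower of the two.

```cpp
#include <cassert>

// Smallest N >= 1 at which cQuad * N^2 exceeds cLin * N, that is, the
// point beyond which the quadratic program is the slower one.
// Both constants are hypothetical inputs, not measurements.
long long crossover(double cLin, double cQuad) {
    assert(cLin > 0.0 && cQuad > 0.0);
    long long N = 1;
    while (cQuad * N * N <= cLin * N)
        ++N;
    return N;
}
```

Here `crossover(2000.0, 10.0)` is 201: up to $N = 200$ the quadratic program is at least as fast, but from $N = 201$ onward the linear program wins, and its advantage keeps growing with N. A larger constant only moves the crossover point; it does not remove it.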

5 Detailed Timing Example Revisited

We left this earlier example

for (i = 0; i < N; ++i) {
   a[i] = 0;
   for (j = 0; j < N; ++j)
      a[i] = a[i] + i*j;
}

after concluding that $T(N) = c_1 N^2 + c_2 N + c_3$.

Conjecture

\[ T(N) = c_1 N^2 + c_2 N + c_3 = O(N^2) \]

(i.e., the algorithm has a worst-case complexity of $O(N^{2})$)

Proof

\[ T(N) = c_1 N^2 + c_2 N + c_3 \]

Let $c=c_1 + c_2 + c_3$ and let $n_0=1$. It should be clear that

\[ c_1 N^2 + c_2 N + c_3 \leq c_1 N^2 + c_2 N^2 + c_3 N^2 \]

because $c_2 N \leq c_2 N^2$ and $c_3 \leq c_3 N^2$ whenever $N \geq 1$.

We know, therefore, that

\[T(N) \leq c_1 N^2 + c_2 N^2 + c_3 N^2 \]

Because $c = c_1 + c_2 + c_3$, we have, for all $N \geq n_0 = 1$,

\[ T(N) \leq c N^2 \]

which means, by the definition of “proportional to” that T(N) is proportional to $N^2$ and therefore is in $O(N^{2})$.
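The inequality in the proof can be spot-checked with concrete numbers. The per-operation times below are invented for the sake of the example (they are not measurements of any real CPU), and the check only covers finitely many values of N, so it illustrates the proof rather than replacing it.

```cpp
#include <cassert>

// Hypothetical per-operation times in nanoseconds (not measurements).
const double tAsst = 1.0, tComp = 1.0, tAdd = 1.0, tMult = 3.0;

// The CPU constants defined earlier in the lesson.
const double c1 = 2 * tAsst + tComp + 4 * tAdd + 3 * tMult;   // 16
const double c2 = 3 * tAsst + 2 * tComp + 2 * tAdd + tMult;   // 10
const double c3 = tAsst + tComp;                              // 2

// T(N) = c1*N^2 + c2*N + c3
double T(long long N) { return c1 * N * N + c2 * N + c3; }

// Checks T(N) <= (c1 + c2 + c3) * N^2 for every N in [1, limit].
bool proofBoundHolds(long long limit) {
    assert(limit >= 1);
    const double c = c1 + c2 + c3;
    for (long long N = 1; N <= limit; ++N) {
        if (T(N) > c * N * N)
            return false;
    }
    return true;
}
```

With these values $c = 28$, and $T(1) = 28$ meets the bound exactly. The choice $n_0 = 1$ matters: at $N = 0$ we have $T(0) = c_3 = 2$, which is larger than $c \cdot 0^2 = 0$.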

6 Pitfalls to Avoid

6.1 Loose Bounds and Tight Bounds

The tallest building in the world is said to be the Burj Khalifa, at 2,717 ft. So if I told you that “h(b) is the height of a building b”, you might suggest that $h(b) \leq 2717\mbox{ft}$. And that’s a reasonable bound.

Of course it’s also true that $h(b) \leq 4000\mbox{ft}$. And that $h(b) \leq 10000\mbox{mi}$. None of those are false statements, but 2717ft is a tight bound, 4000ft is a loose bound, and 10000mi is a ridiculously loose bound.

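What is the complexity of $t(n) = 0.5n$? It is true that $t(n)$ is in $O(n^2)$, in $O(n^3)$, and even in $O(2^n)$, but those are all loose bounds. The tight bound is $O(n)$.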

In general, you should always try to give as tight a bound for the complexity as you can.

Sometimes we will settle for slightly looser bounds because proving anything tighter would be difficult. But being off by one or more powers of $n$ will be considered a failure to provide a proper bound.
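One informal way to tell a tight bound from a loose one is to watch the ratio $t(n)/f(n)$ as $n$ grows. The sketch below does this for $t(n) = 0.5n$ against powers of $n$; the helper names are hypothetical, and the test is a rule of thumb rather than a formal definition of tightness.

```cpp
#include <cassert>
#include <cmath>

// The run-time function from the question above: t(n) = 0.5n.
double tHalf(double n) { return 0.5 * n; }

// Ratio t(n) / n^k. It settles near a positive constant when n^k is a
// tight bound and shrinks toward 0 when n^k is a loose one.
double ratioToPower(double n, int k) {
    assert(n > 0.0 && k >= 0);
    return tHalf(n) / std::pow(n, k);
}
```

Against $f(n) = n$ the ratio is 0.5 no matter how large $n$ gets, so $O(n)$ is tight. Against $f(n) = n^2$ the ratio is 0.0005 at $n = 1000$ and keeps shrinking, which is the signature of a bound that is true but looser than it needs to be.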

6.2 “n” is “n”othing special

Recall our definition:

We say that an algorithm requires time proportional to $f(\bar{n})$ if there are constants $c$ and $\bar{n}_0$ such that the algorithm requires no more than $c*f(\bar{n})$ time units to process an input set of size $\bar{n}$ whenever $\bar{n} \geq \bar{n}_0$.

In our definition of complexity, $\bar{n}$ is simply some numeric measure of the “size” of the algorithm’s inputs. Now, that doesn’t mean the algorithm will actually contain a variable or parameter named “n”, or that there will even be a single program variable that we can substitute for n in this definition. Sometimes, our “n” is an expression that, somehow, defines a measure of size of the input.

For example, consider the function

  std::ostream& operator<< (std::ostream& out, const std::string& str);

for writing strings to an output stream. Now, intuitively, we can guess that if we were to double the number of characters in the string, str, to be written out, the amount of time taken by this function would also double. That suggests that this function’s complexity is probably proportional to the length of the string str.

But how do we write that? We can’t simply say:

This function is in $O(n)$.

because there is no “n” here.

You know that, when you are programming, you aren’t allowed to use variables that you have not properly declared. Surprise! The same rule applies to mathematics: you can’t use undeclared variables in mathematics!

We also can’t say

This function is in O(out).

or

This function is in O(str).

because, although out and str are defined variables in this context, neither one is a numeric quantity, so the expressions O(out) and O(str) don’t make sense. For example, O(str) would, ultimately, mean that $t(n) < c * \mbox{str}$, and that simply makes no sense if str is a string.

There are two ways to express the idea that this function’s complexity is probably proportional to the length of the string str. The first is to simply define the symbol we want:

This function is in $O(n)$, where $n$ is the number of characters in str.

The added definition makes all the difference — it allows a reader to ascertain that $n$ is a property of str (and not of some other variable that might be mentioned nearby, such as, for example, out) and exactly what the nature of that property is. This assumes, of course, that the idea of “number of characters in” is sufficiently obvious to be easily and unambiguously understood.

The other way to express the same idea is to replace “n” by an appropriate expression that captures the concept of “number of characters in str”. Taking advantage of the public interface to std::string, we can write:

This function is in O(str.size()).

which is perfectly understandable to anyone familiar with std::string in C++.
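The intuition that the work grows with the number of characters in str can be illustrated with a stand-in for the output operator. The function `writeCounting` below is hypothetical and is not how the standard library implements `operator<<`. It simply writes one character at a time and reports how many writes it performed.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Writes str to out one character at a time and returns the number of
// characters written. A stand-in for operator<<, for illustration only.
std::size_t writeCounting(std::ostream& out, const std::string& str) {
    std::size_t written = 0;
    for (char ch : str) {
        out.put(ch);
        ++written;
    }
    assert(written == str.size());   // the work is exactly str.size() writes
    return written;
}
```

Writing "hello" performs 5 character writes and writing a 10-character string performs 10, so the count is a function of `str.size()` and of nothing else. That is why `str.size()` is a legitimate numeric expression to place inside the big-O, while `out` and `str` themselves are not.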

Writing a big-O expression with undefined variables (especially “n”) is one of the most common, and least forgivable, mistakes in students’ answers on assignments and tests.

Writing a big-O expression with an expression that does not describe a number is probably the next most common error on tests.