Average Case Analysis

Steven J. Zeil

Last modified: May 2, 2024

1 Introduction

Earlier, we looked at the process of analyzing the worst-case running time of an algorithm, and the use of worst-case analysis and big-O notation as a way of describing it.

In this section, we will introduce average-case complexity. Just as the worst-case complexity describes an upper bound on the worst-case time we would see when running an algorithm, average case complexity will present an upper bound on the average time we would see when running the program many times on many different inputs.

Worst-case complexity gets used more often than average-case, for a number of reasons: worst-case analysis is generally simpler to carry out, and in many settings what matters most is whether any single run, such as the response to a single user action, can be unacceptably slow.

In those circumstances, it makes sense to focus on worst-case behavior and to do what we can to improve that worst case.


On the other hand, suppose we’re talking about a batch program that will process thousands of inputs per run, or we’re talking about a critical piece of an interactive program that gets run hundreds or thousands of times in between each response from the user.

In that situation, adding up hundreds or thousands of worst-cases may be just too pessimistic. The cumulative time of thousands of different runs should show some averaging out of the worst-case behavior, and an average case analysis may give a more realistic picture of what the user will be seeing.

2 Definition

Definition: Average Case Complexity

We say that an algorithm requires average time proportional to $f(n)$ (or that it has average-case complexity $O(f(n))$) if there are constants $c$ and $n_{0}$ such that the average time the algorithm requires to process an input set of size $n$ is no more than $c*f(n)$ time units whenever $n \geq n_{0}$.

This definition is very similar to the one for worst case complexity. The difference is that for worst-case complexity, we want $T_{\mbox{max}}(n) \leq c*f(n)$ where $T_{\mbox{max}}(n)$ is the maximum time taken by any input of size $n$, but for average case complexity we want $T_{\mbox{avg}}(n) \leq c*f(n)$ where $T_{\mbox{avg}}(n)$ is the average time required by inputs of size $n$.

The average case complexity describes how quickly the average time increases when n increases, just as the worst case complexity describes how quickly the worst case time increases when n increases.

In both forms of complexity, we are looking for upper bounds, so our big-O notation (and its peculiar algebra) will still apply.
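One consequence is worth spelling out. No input of size $n$ takes longer than the worst case for that size, so the average over inputs of size $n$ can never exceed the worst case:

\[ T_{\mbox{avg}}(n) \leq T_{\mbox{max}}(n) \leq c*f(n) \mbox{ whenever } n \geq n_{0} \]

In other words, any worst-case bound is automatically a (possibly loose) average-case bound.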


Question: Suppose we have an algorithm with worst case complexity $O(n)$.

True or false: It is possible for that algorithm to have average case complexity $O(n^2)$.

Answer

Strictly speaking, yes: big-O provides only an upper bound, and anything that is $O(n)$ is also, loosely, $O(n^2)$. Keep in mind, though, that the average time for inputs of size $n$ can never exceed the worst-case time for those inputs, so a worst-case bound of $O(n)$ already applies to the average case as well.

This is an important idea to keep in mind as we discuss the rules for average case analysis. The worst-case analysis rules all apply, because they do provide an upper bound. But that bound is sometimes not as tight as it could be. One of the things that makes average case analysis tricky is recognizing when we can get by with the worst-case rules and when we can gain by using a more elaborate average-case rule. There’s no hard-and-fast way to tell; it requires the same kind of personal judgment that goes on in most mathematical proofs.

3 Probably Not a Problem

We’re going to need a few basic facts about probabilities for this section.

You can find a fuller tutorial here.
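For our purposes, the facts we need amount to little more than this: each probability is a number between 0 and 1, and the probabilities of all the possible outcomes of some event add up to 1:

\[ 0 \leq p_i \leq 1, \qquad \sum_i p_i = 1 \]

We will also lean heavily on the idea of an expected value, introduced below.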

4 What’s an Average?


For some people, average case analysis is difficult because they don’t have a very flexible idea of what an “average” is.

Example:

Last semester, Professor Cord gave out the following grades in his CS361 class:

A, A, A-, B+, B, B, B, B-, C+, C, C-, D, F, F, F, F

Translating these to their numerical equivalent,

4, 4, 3.7, 3.3, 3, 3, 3, 2.7, 2.3, 2, 1.7, 1, 0, 0, 0, 0

what was the average grade in Cord’s class?

According to some classic forms of average:

Median
the middle value of the sorted list, (the midpoint between the two middle values given an even number of items)

\[ \mbox{avg}_{\mbox{median}} = 2.5 \]

Mode
the most commonly occurring value

\[ \mbox{avg}_{\mbox{modal}} = 0 \]

Mean
Computed from the sum of the elements

\[ \begin{align} \mbox{avg}_{\mbox{mean}} &= (4 + 4 + 3.7 + 3.3 + 3 + 3 + 3 + 2.7 + 2.3 + 2 + 1.7 + 1 + 0 + 0 + 0 + 0) / 16 \\ &= 2.11 \end{align} \]

4.1 The Mean Average

The mean average is the most commonly used, and corresponds to most people’s idea of a “normal” average, but even it comes in many varieties:

Simple mean
$\bar{x} = \frac{\sum_{i=1}^N x_i}{N}$
Weighted mean
$\bar{x} = \frac{\sum_{i=1}^N w_i * x_i}{\sum_{i=1}^N w_i}$

The $w_i$ are the weights that adjust the relative importance of the scores.

Example:

Last semester Professor Cord gave the following grades

Grade # students
4.0 2
3.7 1
3.3 1
3.0 3
2.7 1
2.3 1
2.0 1
1.7 1
1.3 0
1.0 1
0.0 4

The weighted average is $\frac{2*4.0 + 1*3.7 + 1*3.3 + 3*3.0 + 1*2.7 + 1*2.3 + 1*2.0 + 1*1.7 + 0*1.3 + 1*1.0 + 4*0.0}{2 + 1 + 1 + 3 + 1 + 1 + 1 + 1 + 0 + 1 + 4}$ $= 2.11$

Another example of weighted averages:

When one student asked about his overall grade for the semester, Professor Cord pointed out that assignments were worth 50% of the grade, the final exam was worth 30%, and the midterm exam was worth 20%. The student had earned a B, an A, and a C-, respectively, on these.

Category Score Weight
Assignments 3.0 50
Final 4.0 30
Midterm 1.7 20

So the student’s average grade was $$\frac{50*3.0 + 30*4.0 + 20*1.7}{50+30+20} = 3.04$$
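If it helps to see the arithmetic spelled out, here is a small code sketch of the weighted-mean formula applied to this example (the class and method names are made up for illustration):

// WeightedMeanDemo: illustrative sketch; class and method names are made up
public class WeightedMeanDemo {

    // weighted mean = sum(w_i * x_i) / sum(w_i)
    static double weightedMean(double[] scores, double[] weights) {
        double numerator = 0.0, denominator = 0.0;
        for (int i = 0; i < scores.length; ++i) {
            numerator   += weights[i] * scores[i];
            denominator += weights[i];
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double[] scores  = {3.0, 4.0, 1.7};   // assignments, final, midterm
        double[] weights = {50, 30, 20};
        System.out.printf("%.2f%n", weightedMean(scores, weights));  // prints 3.04
    }
}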

4.2 Expected Value

The expected value is a special version of the weighted mean in which the weights are the probability of seeing each particular value.

If $x_1, x_2, \ldots$ are all the possible values of some quantity, and these values occur with probabilities $p_1, p_2, \ldots$, then the expected value of that quantity is

\[ E(x) = \sum_{i=1}^N p_i * x_i \]

Note that if we have listed all possible values, then

\[ \sum_{i=1}^N p_i = 1 \]

so you can regard the $E(x)$ formula above as a special case of the weighted average in which the denominator (the sum of the weights) becomes simply “1”.

Example:

After long observation, we have determined that Professor Cord tends to give grades with the following distribution:

Grade probability
4.0 2/16
3.7 1/16
3.3 1/16
3.0 3/16
2.7 1/16
2.3 1/16
2.0 1/16
1.7 1/16
1.3 0/16
1.0 1/16
0.0 4/16

So the expected value of the grade for an average student in his class is

$$\begin{align} &((2/16)*4.0 + (1/16)*3.7 + (1/16)*3.3 + (3/16)*3.0 \\ &+ (1/16)*2.7 + (1/16)*2.3 + (1/16)*2.0 + (1/16)*1.7 + (0/16)*1.3 \\ &+ (1/16)*1.0 + (4/16)*0.0) \\ &= 2.11 \end{align}$$

The expected value is the kind of average we will use throughout this course in discussing average case complexity.
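Here is the same expected-value computation as a short code sketch (again, the class name is made up); it simply applies $E(x) = \sum_i p_i x_i$ to the distribution in the table above:

// ExpectedGradeDemo: illustrative sketch; the class name is made up
public class ExpectedGradeDemo {
    public static void main(String[] args) {
        // Professor Cord's grade distribution (probabilities given as counts out of 16)
        double[] grades = {4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0, 0.0};
        double[] counts = {  2,   1,   1,   3,   1,   1,   1,   1,   0,   1,   4};

        double expected = 0.0;
        for (int i = 0; i < grades.length; ++i) {
            double p = counts[i] / 16.0;    // p_i
            expected += p * grades[i];      // p_i * x_i
        }
        System.out.printf("%.2f%n", expected);   // prints 2.11
    }
}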

5 Determining the Average Case Complexity

In many ways, determining average case complexity is similar to determining the worst-case complexity.

5.1 Why Might Average-Case Be Smaller than Worst-Case?

It boils down to 3 possible reasons:

  1. Your code calls another function whose average case complexity is smaller than its worst case.
  2. You have a loop (or recursion) that, on average, does not repeat as often as it would in the worst case.
  3. You have an if statement with different complexities for its then and else parts, that if statement is inside a loop (or recursion), and, on average, the cheaper option is taken more often than it would be in the worst case.

5.2 It Still All Boils Down to Addition

If you want to know how much time a complicated process takes, you figure that out by adding up the times of its various components.

That basic observation is the same for average times as it is for worst-case times.

When in doubt, just add things up.

  • Just keep in mind that you want to add up their average times instead of their worst-case times.

5.3 All of Your Variables Must be Defined

In math, as in programming, all of your variables must be declared/defined before you can use them.

5.4 Complexity is Written in Terms of the Inputs

The complexity of a block of code must be a function of the inputs (only!) to that block.

5.5 The Complexity of Any Block of Code Must be Numeric

No reason for this to change.

5.6 Surprises Demand Explanation

The vast majority of algorithms we will look at will have the same average-case complexity as worst-case. So, if you come up with a different value for the average-case, make sure that you understand why.

By the same token, though, if you make it to the end of an average-case analysis and never once took the time to even consider how the “average input” is different from the “worst-case input”, you may need to start over.

6 Why Would Average-Case Complexity Be Different from Worst-Case?

There are basically only three reasons why a piece of code would run in an average case complexity faster than its worst case:

  1. The code calls some function that is known to have a faster average case than worst case.
  2. The code contains a conditional statement that chooses between two alternatives, one of which is faster than the other, and that, on average, the faster choice is taken more often.
  3. The code contains a loop (or recursive call) that, on average, repeats far fewer times than it does in the worst case.

That’s pretty much it.

6.1 Exiting Early from a Loop

The total time for a loop is found by adding up the times of all of its iterations. But, what do we do if the number of iterations can vary depending on subtle properties of the input?

If we can say that each iteration of the loop runs in time $O_{\mbox{iteration}}(f(N))$, i.e., the time of an iteration does not depend on which iteration we are in, then we can write

\[ T_{\mbox{loop}} = \sum_{i=1}^k O_{\mbox{iteration}}(f(N)) \]

where $k$ is the number of times that the loop repeats. Now, if we are doing worst case analysis, we figure out what the maximum value of $k$ would be. But if we are doing average case analysis, we might ask if the average (or expected) value of $k$ is significantly less than that maximum.

6.1.1 Example 1: Searching an Ordered Array

Consider this code for searching an ordered (sorted) array of integers. We will assume that both the numbers inserted into the array and the values we use when searching are drawn randomly from the integers in some range $0 \ldots M$.

int orderedSearch (int[] array, int key) {
    int i = 0;
    // Advance past the elements that are still smaller than the key.
    while (i < array.length && array[i] < key) {
        ++i;
    }
    if (i < array.length && array[i] == key)
        return i;    // found it
    else
        return -1;   // not present: we ran past where it would have been
}

This search takes advantage of the fact that the array is ordered by stopping the loop as soon as we get to an array value larger than or equal to the key. For example, if we had the array

0 42 101 252 568 890

and if we were searching for 97, we would stop with i==2 because array[2] > 97, and so we know that, if we have not found 97 yet, there’s no point in looking through the even larger numbers in the rest of the array.

So if we let $k$ denote the number of iterations of this loop,

What is $k$ in the worst case?
What does that tell us about the worst case complexity?
Now, what is $k$ in the average case?
What does that tell us about the average case complexity?

6.1.2 Example 2: Simulating a Rolling Die

Java has a useful class for generating pseudo-random numbers.

package java.util;

public class Random {
    // Creates a new random number generator.
    public Random() {...}
      ⋮
    // Returns a pseudorandom int value between 0 (inclusive) and bound (exclusive).
    public int nextInt(int bound) {...}
      ⋮
}

The distinction between a “pseudorandom” and true “random” integer is not particularly important to us.

The nextInt(bound) function returns a random integer in the range 0...bound-1. It does this in O(1) time.

We can simulate the roll of a six-sided die by taking nextInt(6) + 1. This gives us a uniform random selection in the range 1...6.

Consider the following code:

    Random rand = new Random();
    int roll = rand.nextInt(6) + 1;
    while (roll != 2) {
       roll = rand.nextInt(6) + 1;
    }

What is the worst-case complexity of this code?
What is the average-case complexity of this code?
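Before answering, it can be instructive to measure. The following sketch (the class name and trial count are arbitrary choices for illustration) runs the loop above many times and reports the average number of calls to nextInt per run; comparing that measurement to your worst-case answer should suggest why the two complexities can differ.

import java.util.Random;

// DieLoopEstimate: illustrative sketch; the class name and trial count are arbitrary
public class DieLoopEstimate {
    public static void main(String[] args) {
        Random rand = new Random();
        final int trials = 1_000_000;
        long totalRolls = 0;

        for (int t = 0; t < trials; ++t) {
            int rolls = 1;                       // count the roll made before the loop
            int roll = rand.nextInt(6) + 1;
            while (roll != 2) {
                roll = rand.nextInt(6) + 1;
                ++rolls;
            }
            totalRolls += rolls;
        }

        // Average number of calls to nextInt per run of the loop above
        System.out.println((double) totalRolls / trials);
    }
}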


7 Extended Example: Ordered Insertion, Different Input Distributions

We’ll illustrate the process of doing average-case analysis by looking at a simple but useful algorithm, exploring how changes in the input distribution (the probabilities of seeing various possible inputs) affect the average case behavior.

Here is our ordered insertion algorithm.

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // stop marks the first unused position
  --preStop;                  // preStop points to the last stored element
  while (stop != start && value < *preStop) {
    *stop = *preStop;         // shift the larger element up one position
    --stop;
    --preStop;
  }
  // Insert the new value
  *stop = value;
  return stop;                // position where the new value was placed
}

We will, as always, assume that the basic iterator operations are $O(1)$. For the sake of this example, we will also assume that the operations on the Comparable type are $O(1)$. (We’ll discuss the practical implications of this at the end.)

We start, as usual, by marking the simple bits O(1).

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) {
    *stop = *preStop;         // O(1)
    --stop;                   // O(1)
    --preStop;                // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

Next we note that the loop body can be collapsed to O(1).

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) {
    // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

The loop condition is O(1):

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) { //cond: O(1)
    // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

Because the loop condition and body are $O(1)$, we can use the shortcut of simply analyzing this loop on the expected number of iterations.

That, however, depends on the values already in the container, and how the new value compares to them.

The loop might execute anywhere from zero times (when the new value is at least as large as everything already in the container) up to $n$ times (when it is smaller than everything already there), where $n$ is the number of values already stored.

What we don’t know are the probabilities to associate with these different numbers of iterations.

Consider using this algorithm as part of a spell checking program. We can envision two very different input patterns: the words to be inserted might arrive already in sorted (alphabetical) order, for example when loading a previously sorted word list, or they might arrive in arbitrary order, as they are encountered in the text being checked.

Let’s analyze each of these cases in turn and see how they might differ in performance.

7.1 Input in Sorted Order

 

If the input arrives in sorted order, then each call to the function will execute the loop zero times, because the word being inserted will always be alphabetically greater than all the words already in the container.

So if $p_k$ denotes the probability of executing the loop $k$ times, then \( p_0=1, p_1 = 0, p_2 = 0, \ldots \) .

So the time is

\[ \begin{align} t_{\mbox{loop}} & = t_L(0) \\ & = O(1) \end{align} \]

For this input pattern, the entire algorithm has an average-case complexity of $O(1)$.

7.2 Input in Arbitrary Order

 

In this case, we are equally likely to need 0 iterations, 1 iteration, 2 iterations, … , n iterations, where $n$ is distance(start,stop). So the possible numbers of iterations from 0 to $n$ are all equally likely:

\[p_k = \left\{ \begin{array}{ll}\frac{1}{n+1} & \mbox{if } 0 \leq k \leq n \\ 0 & \mbox{otherwise}\end{array}\right. \]

The cost of the loop condition and of the body is constant for each iteration, however, so we can use the special case

\[ t_{\mbox{loop}} = t_L(E(k)) \]

where $E(k)$ is the expected number of iterations of the loop.

What is $E(k)$?

Intuitively, if we are equally likely to repeat the loop 0 times, 1 time, 2 times, … , $n$ times, the average number of iterations would seem to be $n/2$.

Formally,

\[ \begin{eqnarray*} E(k) & = & \sum_{k=0}^{\infty} p_k k \\ & = & \sum_{k=0}^{\mbox{n}} p_k k \; \; (\mbox{because } p_k=0 \mbox{ when } k > \mbox{n})\\ & = & \sum_{k=0}^{\mbox{n}} \frac{1}{\mbox{n}+1} k \\ & = & \frac{1}{\mbox{n}+1} \sum_{k=0}^{\mbox{n}} k \\ & = & \frac{1}{\mbox{n}+1} \frac{\mbox{n}(\mbox{n}+1)}{2} \\ & = & \frac{\mbox{n}}{2} \\ \end{eqnarray*} \]

Chalk one up for intuition!

So the loop is $\frac{n}{2} O(1) = O(n)$

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) { //cond: O(1) #:  n/2  total: O(n)
    // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

And we can then replace the entire loop by $O(\mbox{n})$.

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  // O(n)
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

And now, we add up the complexities in the remaining straight-line sequence, and conclude that the entire algorithm has an average case complexity of $O(n)$, where $n$ is the distance from start to stop, when presented with randomly arranged inputs.

This is the same result we had for the worst case analysis. Does this mean that it runs in the same time on average as it does in the worst case? No, on average, it runs in half the time of its worst case, but that’s only a constant multiplier, so it disappears when we simplify.

Under similar randomly arranged inputs, the average case complexity of ordered search is $O(n)$ and the average case complexity of binary search is $O(\log n)$. Again, these are the same as their worst-case complexities.

7.3 Inputs in Almost-Sorted Order


We’ve already considered the case where the inputs to this function were already arranged into ascending order. What would happen if the inputs were almost, but not exactly, already sorted into ascending order?

For example, suppose that, on average, one out of $n$ items is out of order. Then the probability of a given input repeating the loop zero times would be $p_{\mbox{0}} = \frac{n-1}{n}$, and some single $p_{\mbox{i}}$ would have probability $1/n$, with all the other probabilities being zero.

Assuming the worst (because we want to find an upper bound), let’s assume that the one out-of-order element is the very last one added, and that it actually gets inserted into position 0. Then we have $p_0 = (n-1)/n, p_1 = 0, p_2 = 0, … , p_{n-1} = 0, p_n = 1/n$

So the average number of iterations would be given by \begin{eqnarray*} E(k) & = & \sum_{k=0}^{n} k p_k \\ & = & 0 * (n-1)/n + n * 1/n \\ & = & n/n \\ & = & 1 \\ \end{eqnarray*} and the function is $O(E(k)) = O(1)$

7.4 Almost Sorted - version 2

Now, that’s only one possible scenario in which the inputs are almost sorted. Let’s look at another. Suppose that we knew that, for each successive input, the probability of it appearing in the input $m$ steps out of its correct position is proportional to $1/(m+1)$ (i.e., each additional step out of its correct position is progressively more unlikely). Then we have $p_{0}=c, p_{1}=c/2, p_{2}=c/3, … p_{n-1}=c/n, p_{n}=c/(n+1)$.

The constant $c$ is necessary because the sum of all the probabilities must be exactly 1. We can compute the value of $c$ by using that fact:

\begin{align} \sum_{i=0}^{n} p_i = & 1 \\ \sum_{i=0}^{n} \frac{c}{i+1} = & 1 \\ c \sum_{i=0}^{n} \frac{1}{i+1} = & 1 \end{align}

This sum, for reasonably large n, is approximately $\log n$.

So we conclude that $c$ is approximately $= 1/\log(n)$.

So the function, for this input distribution, is

\begin{align} t_{\mbox{loop}} = & O(E(k)) \\ = & O\left(\sum_{i=0}^n (i + 1)p_i\right) \\ = & O\left(\sum_{i=0}^n (i + 1) \frac{c}{i+1}\right) \\ = & O\left(\sum_{i=0}^n c\right) \\ = & O((n+1)c) \\ = & O\left(\frac{n}{\log n}\right) \end{align}

So the average case is slightly smaller than the worst case, though not by much (remember that $\log n$ is nearly constant over large ranges of $n$, so $n/\log(n)$ grows only slightly slower than $n$).

8 The Input Distribution is Key

You can see, then, that average case complexity can vary considerably depending upon just what constitutes an “average” set of inputs.

Utility functions that get used in many different programs may see different input distributions in each program, and so their average performances in different programs will vary accordingly.
