Average Case Analysis

Steven J. Zeil

Last modified: Feb 19, 2024

1 Introduction

In the first section of this course, we looked at the process of analyzing the worst-case running time of an algorithm, and the use of worst-case analysis and big-O notation as a way of describing it.

In this section, we will introduce average-case complexity. Just as the worst-case complexity describes an upper bound on the worst-case time we would see when running an algorithm, average case complexity will present an upper bound on the average time we would see when running the program many times on many different inputs.

Worst-case complexity gets used more often than average-case. There are a number of reasons for this.

When we must be prepared for the worst on every individual run, it makes sense to focus on worst-case behavior and to do what we can to improve that worst case.


On the other hand, suppose we’re talking about a batch program that will process thousands of inputs per run, or we’re talking about a critical piece of an interactive program that gets run hundreds or thousands of times in between each response from the user.

In that situation, adding up hundreds or thousands of worst-cases may be just too pessimistic. The cumulative time of thousands of different runs should show some averaging out of the worst-case behavior, and an average case analysis may give a more realistic picture of what the user will be seeing.

2 Definition

Definition: Average Case Complexity

We say that an algorithm requires average time proportional to $f(n)$ (or that it has average-case complexity $O(f(n))$) if there are constants $c$ and $n_0$ such that the average time the algorithm requires to process an input set of size $n$ is no more than $c*f(n)$ time units whenever $n \geq n_0$.

This definition is very similar to the one for worst case complexity. The difference is that for worst-case complexity, we want $T_{\mbox{max}}(n) \leq c*f(n)$ where $T_{\mbox{max}}(n)$ is the maximum time taken by any input of size $n$, but for average case complexity we want $T_{\mbox{avg}}(n) \leq c*f(n)$ where $T_{\mbox{avg}}(n)$ is the average time required by inputs of size $n$.

The average case complexity describes how quickly the average time increases when n increases, just as the worst case complexity describes how quickly the worst case time increases when n increases.

In both forms of complexity, we are looking for upper bounds, so our big-O notation (and its peculiar algebra) will still apply.


Question: Suppose we have an algorithm with worst case complexity $O(n)$.

True or false: It is possible for that algorithm to have average case complexity $O(n^2)$.

Answer

This is an important idea to keep in mind as we discuss the rules for average case analysis. The worst-case analysis rules all apply, because they do provide an upper bound. But that bound is sometimes not as tight as it could be. One of the things that makes average case analysis tricky is recognizing when we can get by with the worst-case rules and when we can gain by using a more elaborate average-case rule. There’s no hard-and-fast way to tell; it requires the same kind of personal judgment that goes on in most mathematical proofs.

3 Probably Not a Problem

We’re going to need a few basic facts about probabilities for this section.

You can find a fuller tutorial here.

4 What’s an Average?


For some people, average case analysis is difficult because they don’t have a very flexible idea of what an “average” is.

Example:

Last semester, Professor Cord gave out the following grades in his CS361 class:

A, A, A-, B+, B, B, B, B-, C+, C, C-, D, F, F, F, F

Translating these to their numerical equivalent,

4, 4, 3.7, 3.3, 3, 3, 3, 2.7, 2.3, 2, 1.7, 1, 0, 0, 0, 0

what was the average grade in Cord’s class?

According to some classic forms of average:

Median
the middle value of the sorted list, (the midpoint between the two middle values given an even number of items)

\[ \mbox{avg}_{\mbox{median}} = 2.5 \]

Mode
the most commonly occurring value

\[ \mbox{avg}_{\mbox{modal}} = 0 \]

Mean
Computed from the sum of the elements

\[ \begin{align} \mbox{avg}_{\mbox{mean}} &= (4 + 4 + 3.7 + 3.3 + 3 + 3 + 3 + 2.7 + 2.3 + 2 + 1.7 + 1 + 0 + 0 + 0 + 0) / 16 \\ &= 2.11 \end{align} \]

4.1 The Mean Average

The mean average is the most commonly used, and corresponds to most people’s idea of a “normal” average, but even that comes in many varieties:

Simple mean
$\bar{x} = \frac{\sum_{i=1}^N x_i}{N}$
Weighted mean
$\bar{x} = \frac{\sum_{i=1}^N w_i * x_i}{\sum_{i=1}^N w_i}$

The $w_i$ are the weights that adjust the relative importance of the scores.

Example:

Last semester Professor Cord gave the following grades

Grade # students
4.0 2
3.7 1
3.3 1
3.0 3
2.7 1
2.3 1
2.0 1
1.7 1
1.3 0
1.0 1
0.0 4

The weighted average is $\frac{2*4.0 + 1*3.7 + 1*3.3 + 3*3.0 + 1*2.7 + 1*2.3 + 1*2.0 + 1*1.7 + 0*1.3 + 1*1.0 + 4*0.0}{2 + 1 + 1 + 3 + 1 + 1 + 1 + 1 + 0 + 1 + 4}$ $= 2.11$

Another example of weighted averages:

When one student asked about his overall grade for the semester, Professor Cord pointed out that assignments were worth 50% of the grade, the final exam was worth 30%, and the midterm exam worth 20%. The student has a B, A, and C-, respectively on these.

Category Score Weight
Assignments 3.0 50
Final 4.0 30
Midterm 1.7 20

So the student’s average grade was $$\frac{50*3.0 + 30*4.0 + 20*1.7}{50+30+20} = 3.04$$

4.2 Expected Value

The expected value is a special version of the weighted mean in which the weights are the probability of seeing each particular value.

If $x_1, x_2, \ldots$ are all the possible values of some quantity, and these values occur with probabilities $p_1, p_2, \ldots$, then the expected value of that quantity is

\[ E(x) = \sum_{i=1}^N p_i * x_i \]

Note that if we have listed all possible values, then

\[ \sum_{i=1}^N p_i = 1 \]

so you can regard the $E(x)$ formula above as a special case of the weighted average in which the denominator (the sum of the weights) becomes simply “1”.

Example:

After long observation, we have determined that Professor Cord tends to give grades with the following distribution:

Grade probability
4.0 2/16
3.7 1/16
3.3 1/16
3.0 3/16
2.7 1/16
2.3 1/16
2.0 1/16
1.7 1/16
1.3 0/16
1.0 1/16
0.0 4/16

So the expected value of the grade for an average student in his class is

$$\begin{align} &((2/16)*4.0 + (1/16)*3.7 + (1/16)*3.3 + (3/16)*3.0 \\ &+ (1/16)*2.7 + (1/16)*2.3 + (1/16)*2.0 + (1/16)*1.7 + (0/16)*1.3 \\ &+ (1/16)*1.0 + (4/16)*0.0) \\ &= 2.11 \end{align}$$

The expected value is the kind of average we will use throughout this course in discussing average case complexity.
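
Since the expected value is just a probability-weighted mean, it is also easy to compute mechanically. As a quick illustration, here is a minimal C++ sketch (the helper name expectedValue and the use of std::pair are just for this example) that reproduces the grade calculation above:

#include <iostream>
#include <utility>
#include <vector>

// The expected value is a probability-weighted mean.
// Each pair holds (value, probability); the probabilities should sum to 1.
double expectedValue(const std::vector<std::pair<double, double>>& distribution)
{
    double e = 0.0;
    for (const auto& [value, probability] : distribution)
        e += probability * value;
    return e;
}

int main()
{
    // Professor Cord's grade distribution from the example above
    std::vector<std::pair<double, double>> grades = {
        {4.0, 2.0/16}, {3.7, 1.0/16}, {3.3, 1.0/16}, {3.0, 3.0/16},
        {2.7, 1.0/16}, {2.3, 1.0/16}, {2.0, 1.0/16}, {1.7, 1.0/16},
        {1.3, 0.0/16}, {1.0, 1.0/16}, {0.0, 4.0/16}
    };
    std::cout << expectedValue(grades) << "\n";   // prints 2.10625, i.e., about 2.11
}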

5 Determining the Average Case Complexity

In many ways, determining average case complexity is similar to determining the worst-case complexity.

5.1 Why Might Average-Case Be Smaller than Worst-Case?

It boils down to 3 possible reasons:

  1. Your code calls another function whose average case complexity is smaller than its worst case.
  2. You have a loop (or recursion) that, on average, does not repeat as often as it would in the worst case.
  3. You have an if statement with different complexities for its then and else parts, that if statement is inside a loop (or recursion), and, on average, it takes the cheaper option more often than it would in the worst case.

5.2 It Still All Boils Down to Addition

If you want to know how much time a complicated process takes, you figure that out by adding up the times of its various components.

That basic observation is the same for average times as it is for worst-case times.

When in doubt, just add things up.

  • Just keep in mind that you want to add up their average times instead of their worst-case times.

5.3 All of Your Variables Must be Defined

In math, as in programming, all of your variables must be declared/defined before you can use them.

5.4 Complexity is Written in Terms of the Inputs

The complexity of a block of code must be a function of the inputs (only!) to that block.

5.5 The Complexity of Any Block of Code Must be Numeric

No reason for this to change.

5.6 Surprises Demand Explanation

The vast majority of algorithms we will look at will have the same average-case complexity as worst-case. So, if you come up with a different value for the average-case, make sure that you understand why.

By the same token, though, if you make it to the end of an average-case analysis and never once took the time to even consider how the “average input” is different from the “worst-case input”, you may need to start over.

6 The Complexity of Expression Evaluation

Very little changes when evaluating the cost of C++ expressions.

With so much the same between average and worst-case analysis, where do the differences come into play? The differences are in the treatment of compound statements, especially loops.

7 The Complexity of Compound Statements

As in worst-case analysis, we will usually analyze compound statements in an inside-out fashion, starting with the most deeply nested components and working our way outward from there.

We will annotate our code in the same way as before to record our analysis.

7.1 Sequences of Statements

The simplest form of compound statement is the sequence or block, usually written in C++ between { } brackets. Examples include function bodies, loop bodies, and the “then” and “else” parts of if statements.

A sequence tells the machine to process statements one after the other. So,

The time for a sequence of statements is the sum of the times of the individual statements.

More formally, for a sequence of statements $s_1, s_2, \ldots, s_k$

\[ E(t_{\mbox{seq}}) = \sum_{i=1}^k E(t_{s_i}) \]

When doing worst-case analysis, we added up the worst-case times.

When doing average-case analysis, we add up the average-case (expected value) of the times instead.
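
For instance (a hypothetical two-statement block, chosen just for illustration): if a block consists of statement $s_1$ with expected time $O(1)$ followed by statement $s_2$ with expected time $O(n)$, then

\[ E(t_{\mbox{seq}}) = E(t_{s_1}) + E(t_{s_2}) = O(1) + O(n) = O(n) \]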

7.2 Conditional statements

When we have an if statement, we know that we will execute either the then part or the else part, but not both. However, when doing an average-case analysis we generally do not know which part we will take. Instead, we have to consider the probability of taking each part.

Let $p$ be the probability that the if condition is true (i.e., that we will take the “then” part).

\[E(t_{\mbox{if}}) = E(t_{\mbox{condition}}) + \left(p*E(t_{\mbox{then}}) + (1-p)*E(t_{\mbox{else}})\right) \]

The expression $\left(p*E(t_{\mbox{then}}) + (1-p)*E(t_{\mbox{else}})\right)$ is simply the expected value (average) of the time to take the “then” and “else” parts.
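
As a hypothetical illustration (the numbers are chosen just for this example): suppose the condition itself is $O(1)$, the “then” part costs $O(n)$, the “else” part costs $O(1)$, and the condition is true with probability $p = 1/n$. Then

\[ E(t_{\mbox{if}}) = O(1) + \left(\frac{1}{n} O(n) + \left(1-\frac{1}{n}\right) O(1)\right) = O(1) + O(1) + O(1) = O(1) \]

The expensive branch is taken so rarely that, on average, the if statement behaves as a constant-time operation, even though its worst case is $O(n)$.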

7.3 Loops

When we analyze a loop, we need to add up the expected time required for all of its iterations.

The complicating factor here is that, in worst case analysis, we can almost always say “in the worst case, for an input of size N, this loop will repeat k times”, and then sum up the times over those $k$ iterations.

When we are doing an average case analysis, we often have to consider the possibility that the number of iterations may vary, even among inputs of the same size $N$. Each possible number of iterations will have a certain probability, and we would need to compute the expected value of the time over all of those possible numbers of iterations.

In general, let $p_k$ denote the probability that the loop repeats exactly $k$ times, and let $t_L(k)$ denote the time the loop requires when it repeats $k$ times.

Then the average (expected) time $t_{\mbox{loop}}$ for that loop is

\[ t_{\mbox{loop}} = \sum_{k=0}^{\infty} p_k t_L(k) \]

7.3.1 Special case: constant or linear loops

Luckily, we often will not need to evaluate this formula in all of its generality.

For example, if $t_L(k)$ is $O(1)$ or $O(k)$ we can instead say

\[ t_{\mbox{loop}} = t_L(E(k)) \]

i.e., we simply need to find the average number of iterations and evaluate the loop time on that number.
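
To see why this shortcut is safe (a brief justification added here), suppose $t_L(k) \leq c\,k$ for some constant $c$, i.e., the loop’s cost is at most linear in the number of iterations. Then, by linearity of expectation,

\[ t_{\mbox{loop}} = \sum_{k=0}^{\infty} p_k t_L(k) \leq \sum_{k=0}^{\infty} p_k \, c \, k = c \sum_{k=0}^{\infty} p_k k = c\,E(k) \]

which is exactly the bound we get by evaluating the loop’s cost at the average number of iterations. (The $O(1)$ case is even simpler, because $\sum_k p_k = 1$.)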

Example 1: Simulating a Rolling Die

The function rand() returns a pseudo-random non-negative integer each time it is called. rand() runs in $O(1)$ time (worst-case and average-case).

We can simulate the roll of a six-sided die by taking (rand() % 6) + 1. The % 6 takes the remainder of the random number when divided by 6, which we can regard as a uniform random selection in the range 0..5. “Uniform” means here that all six possible values are equally likely on any given call to rand(). Then adding 1 to that gives us a uniform random selection in the range 1..6.

Consider the following code:

int roll = (rand() % 6) + 1;
while (roll != 2)
{
   roll = (rand() % 6) + 1;
}

What is the worst-case complexity of this code?

What is the average-case complexity of this code?

7.3.2 Special case: a fixed number of iterations

Another common simplification occurs when we can determine the number of iterations for all inputs of size $N$. If we know that the loop executes $h(N)$ times for all inputs of size $N$, then the probability $p_k$ of executing the loop exactly $k$ times is

\[ p_k = \left\{ \begin{array}{ll} 1.0 & \mbox{if } k = h(N) \\ 0.0 & \mbox{if } k \neq h(N) \\ \end{array} \right. \]

so that all of the terms in

\[ t_{\mbox{loop}} = \sum_{k=0}^{\infty} p_k t_L(k) \]

are zero except the $k = h(N)$ term:

\[ t_{\mbox{loop}} = t_L(h(N)) \]

Example 2: Filling an Array

Consider the code:

void fill_n (int* array, int n, int value)
{
   for (int i = 0; i < n; ++i)
       array[i] = value;
}

where the values in the array are known to be in the range 0..10, with half of them being zero.

What is the average case complexity of this code?

8 Extended Example: Ordered Insertion, Different Input Distributions

We’ll illustrate the process of doing average-case analysis by looking at a simple but useful algorithm, exploring how changes in the input distribution (the probabilities of seeing various possible inputs) affect the average case behavior.

Here is our ordered insertion algorithm.

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    
  --preStop;                  
  while (stop != start && value < *preStop) {
    *stop = *preStop;         
    --stop;                   
    --preStop;                
  }
  // Insert the new value
  *stop = value;              
  return stop;                
}
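
For concreteness, a typical call might look like the following sketch (an illustration assuming the addInOrder template above is in scope, and that the container has one unused slot at the stop position to receive the shifted elements):

#include <iostream>
#include <vector>

int main()
{
    // One extra slot at the end leaves room for the value being inserted.
    std::vector<int> v = {1, 3, 5, 7, 0};

    // Insert 4 into the sorted range [v.begin(), v.begin()+4).
    // 5 and 7 are shifted one slot to the right, so the loop runs twice.
    auto where = addInOrder(v.begin(), v.begin() + 4, 4);

    std::cout << "inserted at index " << (where - v.begin()) << ": ";
    for (int x : v)
        std::cout << x << ' ';      // prints: inserted at index 2: 1 3 4 5 7
    std::cout << '\n';
}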

We will, as always, assume that the basic iterator operations are $O(1)$. For the sake of this example, we will also assume that the operations on the Comparable type are $O(1)$. (We’ll discuss the practical implications of this at the end.)

We start, as usual, by marking the simple bits O(1).

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) {
    *stop = *preStop;         // O(1)
    --stop;                   // O(1)
    --preStop;                // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

Next we note that the loop body can be collapsed to O(1).

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) {
    // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

The loop condition is O(1):

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) { //cond: O(1)
    // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

Because the loop condition and body are $O(1)$, we can use the shortcut of simply analyzing this loop on the expected number of iterations.

That, however, depends on the values already in the container, and how the new value compares to them.

The loop might execute anywhere from zero times (if the new value belongs after everything already in the container) up to $n$ times (if it belongs before everything), where $n$ is the number of elements between start and stop.

What we don’t know are the probabilities to associate with these different numbers of iterations.

Consider using this algorithm as part of a spell checking program. We can envision two very different input patterns:

  • The words might arrive already in alphabetical order (for example, if they are read from a sorted dictionary file).
  • The words might arrive in essentially arbitrary order (for example, if they are collected as they are encountered in the document being checked).

Let’s analyze each of these cases in turn and see how they might differ in performance.

8.1 Input in Sorted Order

 

If the input arrives in sorted order, then each call to the function will execute the loop zero times, because the word being inserted will always be alphabetically greater than all the words already in the container.

So if $p_k$ denotes the probability of executing the loop $k$ times, then \( p_0=1, p_1 = 0, p_2 = 0, \ldots \) .

So the time is

\[ \begin{align} t_{\mbox{loop}} & = t_L(0) \\ & = O(1) \end{align} \]

For this input pattern, the entire algorithm has an average-case complexity of $O(1)$.

8.2 Input in Arbitrary Order

 

In this case, we are equally likely to need 0 iterations, 1 iteration, 2 iterations, …, $n$ iterations, where $n$ is distance(start,stop). So the possible numbers of iterations from 0 to $n$ are all equally likely:

\[p_k = \left\{ \begin{array}{ll}\frac{1}{n+1} & \mbox{if } 0 \leq k \leq n \\ 0 & \mbox{otherwise}\end{array}\right. \]

The cost of the loop condition and of the body is constant for each iteration, however, so we can use the special case

\[ t_{\mbox{loop}} = t_L(E(k)) \]

where $E(k)$ is the expected number of iterations of the loop.

What is $E(k)$?

Intuitively, if we are equally likely to repeat the loop 0 times, 1 time, 2 times, … , $n$ times, the average number of iterations would seem to be $n/2$.

Formally,

\[ \begin{eqnarray*} E(k) & = & \sum_{k=0}^{\infty} p_k k \\ & = & \sum_{k=0}^{\mbox{n}} p_k k \; \; (\mbox{because } p_k=0 \mbox{ when } k > \mbox{n})\\ & = & \sum_{k=0}^{\mbox{n}} \frac{1}{\mbox{n}+1} k \\ & = & \frac{1}{\mbox{n}+1} \sum_{k=0}^{\mbox{n}} k \\ & = & \frac{1}{\mbox{n}+1} \frac{\mbox{n}(\mbox{n}+1)}{2} \\ & = & \frac{\mbox{n}}{2} \\ \end{eqnarray*} \]

Chalk one up for intuition!

So the loop is $\frac{n}{2} O(1) = O(n)$

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) { //cond: O(1) #:  n/2  total: O(n)
    // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

And we can then replace the entire loop by $O(\mbox{n})$.

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  // O(n)
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

And now, we add up the complexities in the remaining straight-line sequence, and conclude that the entire algorithm has an average case complexity of $O(n)$, where $n$ is the distance from start to stop, when presented with randomly arranged inputs.

This is the same result we had for the worst case analysis. Does this mean that it runs in the same time on average as it does in the worst case? No, on average, it runs in half the time of its worst case, but that’s only a constant multiplier, so it disappears when we simplify.

Under similar randomly arranged inputs, the average case complexity of ordered search is $O(n)$ and the average case complexity of binary search is $O(\log n)$. Again, these are the same as their worst-case complexities.

8.3 Inputs in Almost-Sorted Order


We’ve already considered the case where the inputs to this function were already arranged into ascending order. What would happen if the inputs were almost, but not exactly, already sorted into ascending order?

For example, suppose that, on average, one out of $n$ items is out of order. Then the probability of a given input repeating the loop zero times would be $p_0 = \frac{n-1}{n}$, some single $p_i$ would be $\frac{1}{n}$, and all the other probabilities would be zero.

Assuming the worst (because we want to find an upper bound), let’s assume that the one out-of-order element is the very last one added, and that it actually gets inserted into position 0. Then we have $p_0 = (n-1)/n, p_1 = 0, p_2 = 0, … , p_{n-1} = 0, p_n = 1/n$

So the average number of iterations would be given by \begin{eqnarray*} E(k) & = & \sum_{k=0}^{n} k p_k \\ & = & 0 * (n-1)/n + n * 1/n \\ & = & n/n \\ & = & 1 \\ \end{eqnarray*} and the function is $O(E(k)) = O(1)$

8.4 Almost Sorted - version 2

Now, that’s only one possible scenario in which the inputs are almost sorted. Let’s look at another. Suppose that we knew that, for each successive input, the probability of it appearing in the input $m$ steps out of its correct position is proportional to $1/(m+1)$ (i.e., each additional step out of its correct position is progressively more unlikely). Then we have $p_{0}=c, p_{1}=c/2, p_{2}=c/3, \ldots, p_{n-1}=c/n, p_{n}=c/(n+1)$.

The constant $c$ is necessary because the sum of all the probabilities must be exactly 1. We can compute the value of $c$ by using that fact:

\begin{align} \sum_{i=0}^{n} p_i = & 1 \\ \sum_{i=0}^{n} \frac{c}{i+1} = & 1 \\ c \sum_{i=0}^{n} \frac{1}{i+1} = & 1 \end{align}

This sum, for reasonably large n, is approximately $\log n$.

So we conclude that $c$ is approximately $= 1/\log(n)$.

So the function, for this input distribution, is

\begin{align} t_{\mbox{loop}} = & O(E(k)) \\ = & O\left(\sum_{i=0}^n (i + 1)p_i\right) \\ = & O\left(\sum_{i=0}^n (i + 1) \frac{c}{i+1}\right) \\ = & O\left(\sum_{i=0}^n c\right) \\ = & O((n+1)c) \\ = & O\left(\frac{n}{\log n}\right) \end{align}

So the average case is slightly smaller than the worst case, though not by much (remember that $\log n$ is nearly constant over large ranges of $n$, so $n/\log(n)$ grows only slightly slower than $n$).
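
As a rough numeric check, the following small C++ sketch (the helper name expectedIterations is just for this example) computes $E(k)$ for this distribution directly and compares it against $n/\ln n$:

#include <cmath>
#include <iostream>

// Expected number of iterations when p_k is proportional to 1/(k+1), k = 0..n.
double expectedIterations(int n)
{
    double weightSum = 0.0;                     // sum of 1/(k+1); c = 1/weightSum
    for (int k = 0; k <= n; ++k)
        weightSum += 1.0 / (k + 1);

    double e = 0.0;
    for (int k = 0; k <= n; ++k)
        e += k * (1.0 / (k + 1)) / weightSum;   // k * p_k
    return e;
}

int main()
{
    for (int n : {100, 10000, 1000000})
        std::cout << "n = " << n
                  << "   E(k) = " << expectedIterations(n)
                  << "   n/ln(n) = " << n / std::log(n) << "\n";
}

For large $n$ the two quantities grow at the same rate, which is consistent with the $O(n / \log n)$ result above.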

9 The Input Distribution is Key

You can see, then, that average case complexity can vary considerably depending upon just what constitutes an “average” set of inputs.

Utility functions that get used in many different programs may see different input distributions in each program, and so their average performances in different programs will vary accordingly.