Average Case Analysis

Steven J. Zeil

Last modified: Feb 19, 2024

1 Introduction

In the first section of this course, we looked at the process of analyzing the worst-case running time of an algorithm, and the use of worst-case analysis and big-O notation as a way of describing it.

In this section, we will introduce average-case complexity. Just as the worst-case complexity describes an upper bound on the worst-case time we would see when running an algorithm, average case complexity will present an upper bound on the average time we would see when running the program many times on many different inputs.

Worst-case complexity gets used more often than average-case. There are a number of reasons for this.

When we must be prepared for the worst on every individual run, it makes sense to focus on worst-case behavior and to do what we can to improve that worst case.


On the other hand, suppose we’re talking about a batch program that will process thousands of inputs per run, or we’re talking about a critical piece of an interactive program that gets run hundreds or thousands of times in between each response from the user.

In that situation, adding up hundreds or thousands of worst-cases may be just too pessimistic. The cumulative time of thousands of different runs should show some averaging out of the worst-case behavior, and an average case analysis may give a more realistic picture of what the user will be seeing.

2 Definition

Definition: Average Case Complexity

We say that an algorithm requires average time proportional to $f(n)$ (or that it has average-case complexity $O(f(n))$) if there are constants $c$ and $n_0$ such that the average time the algorithm requires to process an input set of size $n$ is no more than $c*f(n)$ time units whenever $n \geq n_0$.

This definition is very similar to the one for worst case complexity. The difference is that for worst-case complexity, we want $T_{\mbox{max}}(n) \leq c*f(n)$ where $T_{\mbox{max}}(n)$ is the maximum time taken by any input of size $n$, but for average case complexity we want $T_{\mbox{avg}}(n) \leq c*f(n)$ where $T_{\mbox{avg}}(n)$ is the average time required by inputs of size $n$.

The average case complexity describes how quickly the average time increases when n increases, just as the worst case complexity describes how quickly the worst case time increases when n increases.

In both forms of complexity, we are looking for upper bounds, so our big-O notation (and its peculiar algebra) will still apply.


Question: Suppose we have an algorithm with worst case complexity $O(n)$.

True or false: It is possible for that algorithm to have average case complexity $O(n^2)$.

Answer

This is an important idea to keep in mind as we discuss the rules for average case analysis. The worst-case analysis rules all apply, because they do provide an upper bound. But that bound is sometimes not as tight as it could be. One of the things that makes average case analysis tricky is recognizing when we can get by with the worst-case rules and when we can gain by using a more elaborate average-case rule. There’s no hard-and-fast way to tell; it requires the same kind of personal judgment that goes on in most mathematical proofs.

3 Probably Not a Problem

We’re going to need a few basic facts about probabilities for this section.

You can find a fuller tutorial here.

4 What’s an Average?


For some people, average case analysis is difficult because they don’t have a very flexible idea of what an “average” is.

Example:

Last semester, Professor Cord gave out the following grades in his CS361 class:

A, A, A-, B+, B, B, B, B-, C+, C, C-, D, F, F, F, F

Translating these to their numerical equivalent,

4, 4, 3.7, 3.3, 3, 3, 3, 2.7, 2.3, 2, 1.7, 1, 0, 0, 0, 0

what was the average grade in Cord’s class?

According to some classic forms of average:

Median
the middle value of the sorted list, (the midpoint between the two middle values given an even number of items)

\[ \mbox{avg}_{\mbox{median}} = 2.5 \]

Mode
the most commonly occurring value

\[ \mbox{avg}_{\mbox{modal}} = 0 \]

Mean
Computed from the sum of the elements

\[ \begin{align} \mbox{avg}_{\mbox{mean}} &= (4 + 4 + 3.7 + 3.3 + 3 + 3 + 3 + 2.7 + 2.3 + 2 + 1.7 + 1 + 0 + 0 + 0 + 0) / 16 \\ &= 2.11 \end{align} \]

4.1 The Mean Average

The mean average is the most commonly used, and corresponds to most people’s idea of a “normal” average, but even that comes in many varieties:

Simple mean
$\bar{x} = \frac{\sum_{i=1}^N x_i}{N}$
Weighted mean
$\bar{x} = \frac{\sum_{i=1}^N w_i * x_i}{\sum_{i=1}^N w_i}$

The $w_i$ are the weights that adjust the relative importance of the scores.

Example:

Last semester Professor Cord gave the following grades

Grade # students
4.0 2
3.7 1
3.3 1
3.0 3
2.7 1
2.3 1
2.0 1
1.7 1
1.3 0
1.0 1
0.0 4

The weighted average is $\frac{2*4.0 + 1*3.7 + 1*3.3 + 3*3.0 + 1*2.7 + 1*2.3 + 1*2.0 + 1*1.7 + 0*1.3 + 1*1.0 + 4*0.0}{2 + 1 + 1 + 3 + 1 + 1 + 1 + 1 + 0 + 1 + 4}$ $= 2.11$

Another example of weighted averages:

When one student asked about his overall grade for the semester, Professor Cord pointed out that assignments were worth 50% of the grade, the final exam was worth 30%, and the midterm exam worth 20%. The student has a B, A, and C-, respectively on these.

Category Score Weight
Assignments 3.0 50
Final 4.0 30
Midterm 1.7 20

So the student’s average grade was $$\frac{50*3.0 + 30*4.0 + 20*1.7}{50+30+20} = 3.04$$

4.2 Expected Value

The expected value is a special version of the weighted mean in which the weights are the probability of seeing each particular value.

If $x_1, x_2, \ldots$ are all the possible values of some quantity, and these values occur with probabilities $p_1, p_2, \ldots$, then the expected value of that quantity is

\[ E(x) = \sum_{i=1}^N p_i * x_i \]

Note that if we have listed all possible values, then

\[ \sum_{i=1}^N p_i = 1 \]

so you can regard the $E(x)$ formula above as a special case of the weighted average in which the denominator (the sum of the weights) becomes simply “1”.

Example:

After long observation, we have determined that Professor Cord tends to give grades with the following distribution:

Grade probability
4.0 2/16
3.7 1/16
3.3 1/16
3.0 3/16
2.7 1/16
2.3 1/16
2.0 1/16
1.7 1/16
1.3 0/16
1.0 1/16
0.0 4/16

So the expected value of the grade for an average student in his class is

$$\begin{align} &((2/16)*4.0 + (1/16)*3.7 + (1/16)*3.3 + (3/16)*3.0 \\ &+ (1/16)*2.7 + (1/16)*2.3 + (1/16)*2.0 + (1/16)*1.7 + (0/16)*1.3 \\ &+ (1/16)*1.0 + (4/16)*0.0) \\ &= 2.11 \end{align}$$

The expected value is the kind of average we will use throughout this course in discussing average case complexity.
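
Since the expected value is just a probability-weighted mean, it is also easy to compute mechanically. As a quick illustration, here is a minimal C++ sketch (the helper name expectedValue and the use of std::pair are just for this example) that reproduces the grade calculation above:

#include <iostream>
#include <utility>
#include <vector>

// The expected value is a probability-weighted mean.
// Each pair holds (value, probability); the probabilities should sum to 1.
double expectedValue(const std::vector<std::pair<double, double>>& distribution)
{
    double e = 0.0;
    for (const auto& [value, probability] : distribution)
        e += probability * value;
    return e;
}

int main()
{
    // Professor Cord's grade distribution from the example above
    std::vector<std::pair<double, double>> grades = {
        {4.0, 2.0/16}, {3.7, 1.0/16}, {3.3, 1.0/16}, {3.0, 3.0/16},
        {2.7, 1.0/16}, {2.3, 1.0/16}, {2.0, 1.0/16}, {1.7, 1.0/16},
        {1.3, 0.0/16}, {1.0, 1.0/16}, {0.0, 4.0/16}
    };
    std::cout << expectedValue(grades) << "\n";   // prints 2.10625, i.e., about 2.11
}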

5 Determining the Average Case Complexity

In many ways, determining average case complexity is similar to determining the worst-case complexity.

5.1 Why Might Average-Case Be Smaller than Worst-Case?

It boils down to 3 possible reasons:

  1. Your code calls another function whose average case complexity is smaller than its worst case.
  2. You have a loop (or recursion) that, on average, does not repeat as often as it would in the worst case.
  3. You have an if statement with different complexities for its then and else parts, that if statement is inside a loop (or recursion), and, on average, it takes the cheaper option more often than it would in the worst case.

5.2 It Still All Boils Down to Addition

If you want to know how much time a complicated process takes, you figure that out by adding up the times of its various components.

That basic observation is the same for average times as it is for worst-case times.

When in doubt, just add things up.

  • Just keep in mind that you want to add up their average times instead of their worst-case times.

5.3 All of Your Variables Must be Defined

In math, as in programming, all of your variables must be declared/defined before you can use them.

5.4 Complexity is Written in Terms of the Inputs

The complexity of a block of code must be a function of the inputs (only!) to that block.

5.5 The Complexity of Any Block of Code Must be Numeric

No reason for this to change.

5.6 Surprises Demand Explanation

The vast majority of algorithms we will look at will have the same average-case complexity as worst-case. So, if you come up with a different value for the average-case, make sure that you understand why.

By the same token, though, if you make it to the end of an average-case analysis and never once took the time to even consider how the “average input” is different from the “worst-case input”, you may need to start over.

6 The Complexity of Expression Evaluation

Very little changes when evaluating the cost of C++ expressions.

With so much the same between average and worst-case analysis, where do the differences come into play? The differences are in the treatment of compound statements, especially loops.

7 The Complexity of Compound Statements

As in worst-case analysis, we will usually analyze compound statements in an inside-out fashion, starting with the most deeply nested components and working our way outward from there.

We will annotate our code in the same way as before to record our analysis.

7.1 Sequences of Statements

The simplest form of compound statement is the sequence or block, usually written in C++ between { } brackets. Examples include function bodies, loop bodies, and the “then” and “else” parts of if statements.

A sequence tells the machine to process statements one after the other. So,

The time for a sequence of statements is the sum of the times of the individual statements.

More formally, for a sequence of statements $s_1, s_2, \ldots, s_k$

\[ E(t_{\mbox{seq}}) = \sum_{i=1}^k E(t_{s_i}) \]

When doing worst-case analysis, we added up the worst-case times.

When doing average-case analysis, we add up the average-case (expected value) of the times instead.
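
For instance (a hypothetical two-statement block, chosen just for illustration): if a block consists of statement $s_1$ with expected time $O(1)$ followed by statement $s_2$ with expected time $O(n)$, then

\[ E(t_{\mbox{seq}}) = E(t_{s_1}) + E(t_{s_2}) = O(1) + O(n) = O(n) \]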

7.2 Conditional statements

When we have an if statement, we know that we will execute either the then part or the else part, but not both. However, when doing an average-case analysis we generally do not know which part we will take. Instead, we have to consider the probability of taking each part.

Let $p$ be the probability that the if condition is true (i.e., that we will take the “then” part).

\[E(t_{\mbox{if}}) = E(t_{\mbox{condition}}) + \left(p*E(t_{\mbox{then}}) + (1-p)*E(t_{\mbox{else}})\right) \]

The expression $\left(p*E(t_{\mbox{then}}) + (1-p)*E(t_{\mbox{else}})\right)$ is simply the expected value (average) of the time to take the “then” and “else” parts.
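
As a hypothetical illustration (the numbers are chosen just for this example): suppose the condition itself is $O(1)$, the “then” part costs $O(n)$, the “else” part costs $O(1)$, and the condition is true with probability $p = 1/n$. Then

\[ E(t_{\mbox{if}}) = O(1) + \left(\frac{1}{n} O(n) + \left(1-\frac{1}{n}\right) O(1)\right) = O(1) + O(1) + O(1) = O(1) \]

The expensive branch is taken so rarely that, on average, the if statement behaves as a constant-time operation, even though its worst case is $O(n)$.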

7.3 Loops

When we analyze a loop, we need to add up the expected time required for all of its iterations.

The complicating factor here is that, in worst case analysis, we can almost always say “in the worst case, for an input of size N, this loop will repeat k times”, and then sum up the times over those $k$ iterations.

When we are doing an average case analysis, we often have to consider the possibility that the number of iterations may vary, even among inputs of the same size $N$. Each possible number of iterations will have a certain probability, and we would need to compute the expected value of the time over all of those possible numbers of iterations.

In general, let $p_k$ denote the probability that the loop repeats exactly $k$ times, and let $t_L(k)$ denote the time the loop requires when it repeats $k$ times.

Then the average (expected) time $t_{\mbox{loop}}$ for that loop is

\[ t_{\mbox{loop}} = \sum_{k=0}^{\infty} p_k t_L(k) \]

7.3.1 Special case: constant or linear loops

Luckily, we often will not need to evaluate this formula in all of its generality.

For example, if $t_L(k)$ is $O(1)$ or $O(k)$ we can instead say

\[ t_{\mbox{loop}} = t_L(E(k)) \]

i.e., we simply need to find the average number of iterations and evaluate the loop time on that number.
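
To see why this shortcut is safe (a brief justification added here), suppose $t_L(k) \leq c\,k$ for some constant $c$, i.e., the loop’s cost is at most linear in the number of iterations. Then, by linearity of expectation,

\[ t_{\mbox{loop}} = \sum_{k=0}^{\infty} p_k t_L(k) \leq \sum_{k=0}^{\infty} p_k \, c \, k = c \sum_{k=0}^{\infty} p_k k = c\,E(k) \]

which is exactly the bound we get by evaluating the loop’s cost at the average number of iterations. (The $O(1)$ case is even simpler, because $\sum_k p_k = 1$.)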

Example 1: Simulating a Rolling Die

The function rand() returns a pseudo-random non-negative integer each time it is called. rand() runs in $O(1)$ time (worst-case and average-case).

We can simulate the roll of a six-sided die by taking (rand() % 6) + 1. The % 6 takes the remainder of the random number when divided by 6, which we can regard as a uniform random selection in the range 0..5. “Uniform” means here that all six possible values are equally likely on any given call to rand(). Then adding 1 to that gives us a uniform random selection in the range 1..6.

Consider the following code:

int roll = (rand() % 6) + 1;
while (roll != 2)
{
   roll = (rand() % 6) + 1;
}

What is the worst-case complexity of this code?

What is the average-case complexity of this code?

7.3.2 Special case: a fixed number of iterations

Another common simplification occurs when we can determine the number of iterations for all inputs of size $N$. If we know that the loop executes $h(N)$ times for all inputs of size $N$, then the probability $p_k$ of executing the loop exactly $k$ times is

\[ p_k = \left\{ \begin{array}{ll} 1.0 & \mbox{if } k = h(N) \\ 0.0 & \mbox{if } k \neq h(N) \\ \end{array} \right. \]

so that all of the terms in

\[ t_{\mbox{loop}} = \sum_{k=0}^{\infty} p_k t_L(k) \]

are zero except the $k = h(N)$ term:

\[ t_{\mbox{loop}} = t_L(h(N)) \]

Example 2: Filling an Array

Consider the code:

void fill_n (int* array, int n, int value)
{
   for (int i = 0; i < n; ++i)
       array[i] = value;
}

where the values in the array are known to be in the range 0..10, with half of them being zero.

What is the average case complexity of this code?

8 Extended Example: Ordered Insertion, Different Input Distributions

We’ll illustrate the process of doing average-case analysis by looking at a simple but useful algorithm, exploring how changes in the input distribution (the probabilities of seeing various possible inputs) affect the average case behavior.

Here is our ordered insertion algorithm.

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    
  --preStop;                  
  while (stop != start && value < *preStop) {
    *stop = *preStop;         
    --stop;                   
    --preStop;                
  }
  // Insert the new value
  *stop = value;              
  return stop;                
}
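
For concreteness, a typical call might look like the following sketch (an illustration assuming the addInOrder template above is in scope, and that the container has one unused slot at the stop position to receive the shifted elements):

#include <iostream>
#include <vector>

int main()
{
    // One extra slot at the end leaves room for the value being inserted.
    std::vector<int> v = {1, 3, 5, 7, 0};

    // Insert 4 into the sorted range [v.begin(), v.begin()+4).
    // 5 and 7 are shifted one slot to the right, so the loop runs twice.
    auto where = addInOrder(v.begin(), v.begin() + 4, 4);

    std::cout << "inserted at index " << (where - v.begin()) << ": ";
    for (int x : v)
        std::cout << x << ' ';      // prints: inserted at index 2: 1 3 4 5 7
    std::cout << '\n';
}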

We will, as always, assume that the basic iterator operations are $O(1)$. For the sake of this example, we will also assume that the operations on the Comparable type are $O(1)$. (We’ll discuss the practical implications of this at the end.)

We start, as usual, by marking the simple bits O(1).

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) {
    *stop = *preStop;         // O(1)
    --stop;                   // O(1)
    --preStop;                // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

Next we note that the loop body can be collapsed to O(1).

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) {
    // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

The loop condition is O(1):

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) { //cond: O(1)
    // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

Because the loop condition and body are $O(1)$, we can use the shortcut of simply analyzing this loop on the expected number of iterations.

That, however, depends on the values already in the container, and how the new value compares to them.

The loop might execute anywhere from zero times (if the new value belongs after everything already in the container) up to $n$ times (if it belongs before everything), where $n$ is the number of elements between start and stop.

What we don’t know are the probabilities to associate with these different numbers of iterations.

Consider using this algorithm as part of a spell checking program. We can envision two very different input patterns:

  • The words might arrive already in alphabetical order (for example, if they are read from a sorted dictionary file).
  • The words might arrive in essentially arbitrary order (for example, if they are collected as they are encountered in the document being checked).

Let’s analyze each of these cases in turn and see how they might differ in performance.

8.1 Input in Sorted Order

 

If the input arrives in sorted order, then each call to the function will execute the loop zero times, because the word being inserted will always be alphabetically greater than all the words already in the container.

So if $p_k$ denotes the probability of executing the loop $k$ times, then \( p_0=1, p_1 = 0, p_2 = 0, \ldots \) .

So the time is

\[ \begin{align} t_{\mbox{loop}} & = t_L(0) \\ & = O(1) \end{align} \]

For this input pattern, the entire algorithm has an average-case complexity of $O(1)$.

8.2 Input in Arbitrary Order

 

In this case, we are equally likely to need 0 iterations, 1 iteration, 2 iterations, …, $n$ iterations, where $n$ is distance(start,stop). So the possible numbers of iterations from 0 to $n$ are all equally likely:

\[p_k = \left\{ \begin{array}{ll}\frac{1}{n+1} & \mbox{if } 0 \leq k \leq n \\ 0 & \mbox{otherwise}\end{array}\right. \]

The cost of the loop condition and of the body is constant for each iteration, however, so we can use the special case

\[ t_{\mbox{loop}} = t_L(E(k)) \]

where $E(k)$ is the expected number of iterations of the loop.

What is $E(k)$?

Intuitively, if we are equally likely to repeat the loop 0 times, 1 time, 2 times, … , $n$ times, the average number of iterations would seem to be $n/2$.

Formally,

\[ \begin{eqnarray*} E(k) & = & \sum_{k=0}^{\infty} p_k k \\ & = & \sum_{k=0}^{\mbox{n}} p_k k \; \; (\mbox{because } p_k=0 \mbox{ when } k > \mbox{n})\\ & = & \sum_{k=0}^{\mbox{n}} \frac{1}{\mbox{n}+1} k \\ & = & \frac{1}{\mbox{n}+1} \sum_{k=0}^{\mbox{n}} k \\ & = & \frac{1}{\mbox{n}+1} \frac{\mbox{n}(\mbox{n}+1)}{2} \\ & = & \frac{\mbox{n}}{2} \\ \end{eqnarray*} \]

Chalk one up for intuition!

So the loop is $\frac{n}{2} O(1) = O(n)$

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  while (stop != start && value < *preStop) { //cond: O(1) #:  n/2  total: O(n)
    // O(1)
  }
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

And we can then replace the entire loop by $O(\mbox{n})$.

template <typename Iterator, typename Comparable>
Iterator addInOrder (Iterator start, Iterator stop, const Comparable& value)
{
  Iterator preStop = stop;    // O(1)
  --preStop;                  // O(1)
  // O(n)
  // Insert the new value
  *stop = value;              // O(1)
  return stop;                // O(1)
}

And now, we add up the complexities in the remaining straight-line sequence, and conclude that the entire algorithm has an average case complexity of $O(n)$, where $n$ is the distance from start to stop, when presented with randomly arranged inputs.

This is the same result we had for the worst case analysis. Does this mean that it runs in the same time on average as it does in the worst case? No, on average, it runs in half the time of its worst case, but that’s only a constant multiplier, so it disappears when we simplify.

Under similar randomly arranged inputs, the average case complexity of ordered search is $O(n)$ and the average case complexity of binary search is $O(\log n)$. Again, these are the same as their worst-case complexities.

8.3 Inputs in Almost-Sorted Order


We’ve already considered the case where the inputs to this function were already arranged into ascending order. What would happen if the inputs were almost, but not exactly, already sorted into ascending order?

For example, suppose that, on average, one out of $n$ items is out of order. Then the probability of a given input repeating the loop zero times would be $p_0 = \frac{n-1}{n}$, some single $p_i$ would be $\frac{1}{n}$, and all the other probabilities would be zero.

Assuming the worst (because we want to find an upper bound), let’s assume that the one out-of-order element is the very last one added, and that it actually gets inserted into position 0. Then we have $p_0 = (n-1)/n, p_1 = 0, p_2 = 0, … , p_{n-1} = 0, p_n = 1/n$

So the average number of iterations would be given by \begin{eqnarray*} E(k) & = & \sum_{k=0}^{n} k p_k \\ & = & 0 * (n-1)/n + n * 1/n \\ & = & n/n \\ & = & 1 \\ \end{eqnarray*} and the function is $O(E(k)) = O(1)$

8.4 Almost Sorted - version 2

Now, that’s only one possible scenario in which the inputs are almost sorted. Let’s look at another. Suppose that we knew that, for each successive input, the probability of it appearing in the input $m$ steps out of its correct position is proportional to $1/(m+1)$ (i.e., each additional step out of its correct position is progressively more unlikely). Then we have $p_{0}=c, p_{1}=c/2, p_{2}=c/3, \ldots, p_{n-1}=c/n, p_{n}=c/(n+1)$.

The constant $c$ is necessary because the sum of all the probabilities must be exactly 1. We can compute the value of $c$ by using that fact:

\begin{align} \sum_{i=0}^{n} p_i = & 1 \\ \sum_{i=0}^{n} \frac{c}{i+1} = & 1 \\ c \sum_{i=0}^{n} \frac{1}{i+1} = & 1 \end{align}

This sum, for reasonably large n, is approximately $\log n$.

So we conclude that $c$ is approximately $= 1/\log(n)$.

So the function, for this input distribution, is

\begin{align} t_{\mbox{loop}} = & O(E(k)) \\ = & O\left(\sum_{i=0}^n (i + 1)p_i\right) \\ = & O\left(\sum_{i=0}^n (i + 1) \frac{c}{i+1}\right) \\ = & O\left(\sum_{i=0}^n c\right) \\ = & O((n+1)c) \\ = & O\left(\frac{n}{\log n}\right) \end{align}

So the average case is slightly smaller than the worst case, though not by much (remember that $\log n$ is nearly constant over large ranges of $n$, so $n/\log(n)$ grows only slightly slower than $n$).
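
As a rough numeric check, the following small C++ sketch (the helper name expectedIterations is just for this example) computes $E(k)$ for this distribution directly and compares it against $n/\ln n$:

#include <cmath>
#include <iostream>

// Expected number of iterations when p_k is proportional to 1/(k+1), k = 0..n.
double expectedIterations(int n)
{
    double weightSum = 0.0;                     // sum of 1/(k+1); c = 1/weightSum
    for (int k = 0; k <= n; ++k)
        weightSum += 1.0 / (k + 1);

    double e = 0.0;
    for (int k = 0; k <= n; ++k)
        e += k * (1.0 / (k + 1)) / weightSum;   // k * p_k
    return e;
}

int main()
{
    for (int n : {100, 10000, 1000000})
        std::cout << "n = " << n
                  << "   E(k) = " << expectedIterations(n)
                  << "   n/ln(n) = " << n / std::log(n) << "\n";
}

For large $n$ the two quantities grow at the same rate, which is consistent with the $O(n / \log n)$ result above.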

9 The Input Distribution is Key

You can see, then, that average case complexity can vary considerably depending upon just what constitutes an “average” set of inputs.

Utility functions that get used in many different programs may see different input distributions in each program, and so their average performances in different programs will vary accordingly.