Sorting --- Insertion Sort

Steven J. Zeil

Last modified: Oct 26, 2023

Sorting: given a sequence of data items in an unknown order, rearrange the items to put them into ascending (or descending) order by key.

Sorting algorithms have been studied extensively. There is no one best algorithm for all circumstances, but the big-O behavior is a key to understanding where and when to use different algorithms.

The insertion sort divides the list of items into a sorted and an unsorted region, with the sorted items in the first part of the list.

Idea: Repeatedly take the first item from the unsorted region and insert it into the proper position in the sorted portion of the list.
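
For example, on the array $[29 \; 10 \; 14 \; 37 \; 13]$ (which appears again later in these notes), the successive passes look like this, with the bar marking the boundary between the sorted region (left) and the unsorted region (right):

$[29 \;|\; 10 \; 14 \; 37 \; 13]$ (initially, the one-element prefix is trivially sorted)

$[10 \; 29 \;|\; 14 \; 37 \; 13]$ (after inserting 10)

$[10 \; 14 \; 29 \;|\; 37 \; 13]$ (after inserting 14)

$[10 \; 14 \; 29 \; 37 \;|\; 13]$ (after inserting 37)

$[10 \; 13 \; 14 \; 29 \; 37]$ (after inserting 13; fully sorted)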

1 The Algorithm

This is the insertion sort:

// Weiss 7.2
//
#include <utility>   // std::move
#include <vector>
using std::vector;

template <typename Comparable>
void insertionSort(vector<Comparable>& a)
{
   for( int p = 1; p < a.size(); ++p)
   {
     // place a[p] into the sublist
     //   a[0] ... a[i-1], 1 <= i < p,
     //   so it is in the correct position
     Comparable tmp = std::move( a[p] );
     
     int j;
      // locate insertion point by scanning downward as long
      // as tmp < a[j-1] and we have not encountered the
      // beginning of the list
     for( j = p; j > 0 && tmp < a[j-1]; --j)
         a[j] = std::move (a[j-1]);
      // the location is found; insert target
     a[j] = std::move( tmp );
   }
}
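
A minimal driver (not from Weiss or the original notes) showing one way to exercise the function above; it assumes the listing above is available in the same file:

// Hypothetical test driver for the insertionSort listing above.
#include <iostream>

int main()
{
    vector<int> a {29, 10, 14, 37, 13};
    insertionSort(a);                 // sorts in place, ascending by operator<
    for (int x : a)
        std::cout << x << ' ';        // prints: 10 13 14 29 37
    std::cout << '\n';
    return 0;
}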

2 Insertion Sort: Worst Case Analysis

// Weiss 7.2
//
template <typename Comparable>
void insertionSort(vector<Comparable>& a)
{
   for( int p = 1; p < a.size(); ++p)
   {
     // place a[p] into the sublist
     //   a[0] ... a[i-1], 1 <= i < p,
     //   so it is in the correct position
     Comparable tmp = std::move( a[p] );         // O(1)
     
     int j;                                      // O(1)
      // locate insertion point by scanning downward as long
      // as tmp < a[j-1] and we have not encountered the
      // beginning of the list
     for( j = p; j > 0 && tmp < a[j-1]; --j)
         a[j] = std::move (a[j-1]);              // O(1)
      // the location is found; insert target
     a[j] = std::move( tmp );                    // O(1)
   }
}

Looking at the inner loop,

// Weiss 7.2
//
template <typename Comparable>
void insertionSort(vector<Comparable>& a)
{
   for( int p = 1; p < a.size(); ++p)
   {
     // place a[p] into the sublist
     //   a[0] ... a[i-1], 1 <= i < p,
     //   so it is in the correct position
     Comparable tmp = std::move( a[p] );         // O(1)
     
     int j;                                      // O(1)
      // locate insertion point by scanning downward as long
      // as tmp < a[j-1] and we have not encountered the
      // beginning of the list
     for( j = p; j > 0 && tmp < a[j-1]; --j)
         a[j] = std::move (a[j-1]);              // O(1)
      // the location is found; insert target
     a[j] = std::move( tmp );                    // O(1)
   }
}

Question: In the worst case, how many times do we go around the inner loop (to within plus or minus 1)?

**Answer:** In the worst case (the new item is smaller than everything already in the sorted region), the condition fails only when j reaches 0, so the inner loop body executes p times.

With that determined,

// Weiss 7.2
//
template <typename Comparable>
void insertionSort(vector<Comparable>& a)
{
   for( int p = 1; p < a.size(); ++p)
   {
     // place a[p] into the sublist
     //   a[0] ... a[i-1], 1 <= i < p,
     //   so it is in the correct position
     Comparable tmp = std::move( a[p] );         // O(1)
     
     int j;                                      // O(1)
      // locate insertion point by scanning downward as long
      // as tmp < a[j-1] and we have not encountered the
      // beginning of the list
     for( j = p; j > 0 && tmp < a[j-1]; --j)  // cond: O(1) #: p
         a[j] = std::move (a[j-1]);              // O(1)
      // the location is found; insert target
     a[j] = std::move( tmp );                    // O(1)
   }
}

Moving on…

Question: So what is the complexity of the inner loop?

**Answer:** The loop condition and the loop body are each O(1), and in the worst case they execute p times, so the inner loop is $O(p)$.

Now, looking at the outer loop body,

// Weiss 7.2
//
template <typename Comparable>
void insertionSort(vector<Comparable>& a)
{
   for( int p = 1; p < a.size(); ++p)
   {
     // place a[p] into the sublist
     //   a[0] ... a[i-1], 1 <= i < p,
     //   so it is in the correct position
     Comparable tmp = std::move( a[p] );         // O(1)
     
     int j;                                      // O(1)
      // locate insertion point by scanning downward as long
      // as tmp < a[j-1] and we have not encountered the
      // beginning of the list
      O(p)
      // the location is found; insert target
     a[j] = std::move( tmp );                    // O(1)
   }
}

So the entire outer loop body is $O(p)$.

// Weiss 7.2
//
template <typename Comparable>
void insertionSort(vector<Comparable>& a)
{
   for( int p = 1; p < a.size(); ++p)
   {
      O(p) 
   }
}

Let $n$ denote a.size(). The outer loop executes $n-1$ times.

Question: What, then, is the complexity of the entire outer loop?

**Answer:**
// Weiss 7.2
//
template <typename Comparable>
void insertionSort(vector<Comparable>& a)
{ // let n = a.size()
   for( int p = 1; p < a.size(); ++p) // cond: O(1) #: n total: O(n^2)
   {
      O(p)
   }
}

If you gave any answer involving p, you should have known better from the start. The complexity of a block of code must always be described in terms of the inputs to that code. p is not an input to the loop: any value it might have held prior to the start of the loop is ignored and overwritten.

Question: What, then, is the complexity of the entire function?

**Answer:** The function body is just the outer loop (plus O(1) overhead), so the entire function is $O(n^2)$.

Insertion sort has a worst case of $O(N^2)$ where $N$ is the size of the input vector.
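
That bound is just the sum of the per-iteration costs of the outer loop:

$$\sum_{p=1}^{n-1} O(p) = O\left(\sum_{p=1}^{n-1} p\right) = O\left(\frac{n(n-1)}{2}\right) = O(n^2)$$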

3 Insertion Sort: Special Case

As a special case, consider the behavior of this algorithm when applied to an array that is already sorted.

// Weiss 7.2
//
template <typename Comparable>
void insertionSort(vector<Comparable>& a)
{
   for( int p = 1; p < a.size(); ++p)
   {
     // place a[p] into the sublist
     //   a[0] ... a[i-1], 1 <= i < p,
     //   so it is in the correct position
     Comparable tmp = std::move( a[p] );
     
     int j;
      // locate insertion point by scanning downward as long
      // as tmp < a[j-1] and we have not encountered the
      // beginning of the list
     for( j = p; j > 0 && tmp < a[j-1]; --j)
         a[j] = std::move (a[j-1]);
      // the location is found; insert target
     a[j] = std::move( tmp );
   }
}

When the array is already sorted, the test tmp < a[j-1] fails on the very first comparison of every pass, so the inner loop body never executes. Each pass then does only a constant amount of work, and the whole sort runs in $O(n)$ time.

This makes insertion sort a reasonable choice when adding a few items to a large, already sorted array.
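
To see the difference concretely, here is a small experiment of my own (the name insertionSortCounted is hypothetical, not from Weiss): an instrumented copy of the sort that counts how many times the inner loop body runs, applied to a sorted and a reverse-sorted vector.

// Hypothetical instrumented variant: same algorithm as above, but it
// returns the number of times the inner loop body executed.
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>
using std::vector;

template <typename Comparable>
std::size_t insertionSortCounted(vector<Comparable>& a)
{
    std::size_t moves = 0;                       // inner-loop body executions
    for (int p = 1; p < (int) a.size(); ++p)
    {
        Comparable tmp = std::move(a[p]);
        int j;
        for (j = p; j > 0 && tmp < a[j-1]; --j)
        {
            a[j] = std::move(a[j-1]);
            ++moves;
        }
        a[j] = std::move(tmp);
    }
    return moves;
}

int main()
{
    vector<int> sorted   {1, 2, 3, 4, 5, 6, 7, 8};
    vector<int> reversed {8, 7, 6, 5, 4, 3, 2, 1};
    std::cout << insertionSortCounted(sorted)   << '\n';  // 0  (already sorted)
    std::cout << insertionSortCounted(reversed) << '\n';  // 28 (= 8*7/2, the maximum)
    return 0;
}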

4 Average-Case Analysis for Insertion Sort

Instead of doing the average case analysis by the copy-and-paste technique, we’ll produce a result that works for all algorithms that behave like it.

Define an inversion of an array a as any pair (i,j) such that i<j but a[i]>a[j].

Question: How many inversions in this array?

$ [29 \; 10 \; 14 \; 37 \; 13] $

**Answer:** 5. The out-of-order pairs are (29,10), (29,14), (29,13), (14,13), and (37,13).
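
As a cross-check, here is a brute-force counter that applies the definition directly (the helper name countInversions and the driver are mine, not from the notes):

// Hypothetical brute-force inversion counter: examines every pair (i,j) with i < j.
#include <cstddef>
#include <iostream>
#include <vector>
using std::vector;

template <typename Comparable>
std::size_t countInversions(const vector<Comparable>& a)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[j] < a[i])          // a[i] > a[j] means (i,j) is an inversion
                ++count;
    return count;
}

int main()
{
    vector<int> a {29, 10, 14, 37, 13};
    std::cout << countInversions(a) << '\n';   // prints 5
    return 0;
}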

4.1 Inversions

In an array of n elements, the most inversions occur when the array is in exactly reversed order. The inversions are then:

| Inversions | Count |
| --- | --- |
| (1,2), (1,3), (1,4), …, (1,n) | n-1 |
| (2,3), (2,4), …, (2,n) | n-2 |
| (3,4), …, (3,n) | n-3 |
| ⋮ | ⋮ |
| (n-1,n) | 1 |

Counting these (starting from the bottom row), we have $\sum_{i=1}^{n-1} i$ inversions, so the total number of inversions is $\frac{n(n-1)}{2}$.

We’ll state this formally:

Theorem: The maximum number of inversions in an array of $n$ elements is $\frac{n(n-1)}{2}$.

We have just proven that theorem. Now, another one, describing the average:

Theorem: The average number of inversions in an array of $n$ randomly selected elements is $\frac{n(n-1)}{4}$.

We won't prove this, but it is intuitively plausible: the minimum number of inversions is 0 and the maximum is $\frac{n(n-1)}{2}$, so it is natural that the average falls at the midpoint of those two values.

4.2 A Speed Limit on Adjacent-Swap Sorting

Now, the result we have been working toward:

Theorem: Any sorting algorithm that only swaps adjacent elements has average time no faster than $O(n^2)$.

Proof

Swapping two adjacent elements of an array removes at most one inversion.

But on average there are $\frac{n(n-1)}{4}$ inversions to remove, so on average at least that many swaps are required.

Hence, on average, the algorithm as a whole can run no faster than $O(n^2)$.
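
In symbols (this chain is my paraphrase of the argument, not from the original notes):

$$\text{average running time} \;\ge\; \text{average number of swaps} \;\ge\; \text{average number of inversions} \;=\; \frac{n(n-1)}{4},$$

which grows as $n^2$.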

QED

And,

Corollary: Insertion sort has average case complexity $O(n^2)$.

Proof

Insertion sort is often written like this:

// Weiss 7.2, rewritten to use swaps of adjacent elements
//
template <typename Comparable>
void insertionSort(vector<Comparable>& a)
{
   for( int p = 1; p < a.size(); ++p)
   {
      // bubble a[p] downward into the sorted region
      //   a[0] ... a[p-1] by repeatedly swapping it with
      //   its left-hand neighbor until it is in position
      for( int j = p; j > 0 && a[j] < a[j-1]; --j)
          std::swap(a[j], a[j-1]);
   }
}

and it is clear that this version only exchanges adjacent elements.

By the theorem just given, the best average case complexity we could therefore get is $O(n^2)$.

The theorem does not preclude an average case complexity even slower than that, but we know that the worst case complexity is also $O(n^2)$, and the average case can’t be any slower than the worst case.

So we conclude that the average case complexity is, indeed, $O(n^2)$.
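
Putting the two bounds together (the notation here is mine, not the notes'):

$$\frac{n(n-1)}{4} \;\le\; \text{average-case cost} \;\le\; \text{worst-case cost} \;=\; O(n^2)$$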

Our actual algorithm replaces each swap (three move assignments) with a single move assignment, reducing the cost of the inner loop body by a constant factor. But a reduction by a constant multiplier cannot affect the overall complexity, so the actual algorithm given at the top of the page is also $O(n^2)$ on average.