Sorting --- Merge Sort
Steven J. Zeil
Our next algorithm actually achieves the optimal big-O behavior for a sorting algorithm. The merge sort has $O(n \log n)$ time for both its worst and average case.
This doesn’t necessarily make it the ideal choice for all sorting applications, however. The constant multiplier on the timing is somewhat high, and merge sort can require an unusually large amount of memory.
Variants of the basic merge sort algorithm are, however, often used with linked lists (which can’t be sorted by most other $O(n \log n)$ algorithms) and are used to sort data residing on disk or magnetic tape.
1 Merging Sorted Data
Before tackling the merge sort itself, we start with a simpler function that is used by merge sort.
Suppose that our sequence of data can be divided into two parts, such that a[leftPos..rightPos-1]
is already sorted and a[rightPos..rightEnd]
is already sorted. Then we could merge the two parts into a combined sorted sequence using the code shown here.
template <typename Comparable>
void merge(vector<Comparable>& a, vector<Comparable>& tmpArray,
           int leftPos, int rightPos, int rightEnd)
{
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int numElements = rightEnd - leftPos + 1;
    // Main loop
    while( leftPos <= leftEnd && rightPos <= rightEnd ) ➀
        if ( a[leftPos] <= a[rightPos] )
            tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );
        else
            tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] );
    while( leftPos <= leftEnd )   // Copy rest of first half ➁
        tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );
    while( rightPos <= rightEnd ) // Copy rest of second half
        tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] );
    for( int i = 0; i < numElements; ++i, --rightEnd ) ➂
        a[rightEnd] = std::move( tmpArray[rightEnd] );
}
1.1 Understanding the Merge Algorithm
The heart of the merge algorithm is the first loop (➀).
The variables leftPos, rightPos, and rightEnd mark off two subsequences that we want to merge. We can think of a[leftPos ... rightPos-1]
and a[rightPos ... rightEnd]
as two separate, sorted sequences. We want to combine them into a single sorted sequence, tmpArray.
The way to do this is quite simple. Just compare the first element in each of the two input (sub)sequences and copy the smaller one.
For example, if we were merging subsequences
$ [ 2 \; 4 \; 5 \; 6 ] $
and
$ [ 1 \; 3] $
we would compare the first element in each one (2 and 1) and decide to copy 1.
Then we continue with the remainder, merging
$ [ 2 \; 4 \; 5 \; 6] $
and
$ [ 3 ]$.
On the next step we would copy 2, and be left with the merge of
$[ 4 \; 5 \; 6 ] $
and
$ [ 3 ]$
We would then copy 3.
At this point, our temporary vector contains
$ [ 1 \; 2 \; 3] $
We would now exit from this main loop, because one of the subsequences has been completely emptied out.
The rest of the algorithm is “cleanup”. We exit the main loop when we have emptied one of the two subsequences, so there is a possibility that the other subsequence still has data. The next two loops (➁) copy that data from the remainder of the two subsequences. (Because one of those subsequences has been emptied, one of these loops will execute zero times.)
Finally (➂), we copy the entire merged data set back out of the temporary vector into the original vector.
The code discussed here is available as an animation that you can run to see how it works.
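If you cannot run the animation, the walkthrough can also be checked in code. Below is a self-contained sketch of the merge routine applied to the text’s example runs $[2 \; 4 \; 5 \; 6]$ and $[1 \; 3]$; the mergeDemo function and its index choices are illustrative additions, not part of the original listing.

```cpp
#include <utility>
#include <vector>
using std::vector;

template <typename Comparable>
void merge(vector<Comparable>& a, vector<Comparable>& tmpArray,
           int leftPos, int rightPos, int rightEnd)
{
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int numElements = rightEnd - leftPos + 1;

    while (leftPos <= leftEnd && rightPos <= rightEnd)   // main loop
        if (a[leftPos] <= a[rightPos])
            tmpArray[tmpPos++] = std::move(a[leftPos++]);
        else
            tmpArray[tmpPos++] = std::move(a[rightPos++]);
    while (leftPos <= leftEnd)                           // rest of first half
        tmpArray[tmpPos++] = std::move(a[leftPos++]);
    while (rightPos <= rightEnd)                         // rest of second half
        tmpArray[tmpPos++] = std::move(a[rightPos++]);
    for (int i = 0; i < numElements; ++i, --rightEnd)    // copy back into a
        a[rightEnd] = std::move(tmpArray[rightEnd]);
}

// Demo (illustrative): a[0..3] holds the sorted run [2 4 5 6],
// a[4..5] holds the sorted run [1 3].
inline vector<int> mergeDemo()
{
    vector<int> a = {2, 4, 5, 6, 1, 3};
    vector<int> tmp(a.size());
    merge(a, tmp, 0, 4, 5);
    return a;
}
```

Tracing mergeDemo by hand reproduces the steps above: 1, then 2, then 3 are copied by the main loop, and the remaining 4, 5, 6 are copied by the first cleanup loop.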
1.2 Merge Analysis
template <typename Comparable>
void merge(vector<Comparable>& a, vector<Comparable>& tmpArray,
           int leftPos, int rightPos, int rightEnd)
{
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int numElements = rightEnd - leftPos + 1;
    // Main loop
    while( leftPos <= leftEnd && rightPos <= rightEnd ) ➀
        if ( a[leftPos] <= a[rightPos] )
            tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );
        else
            tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] );
    while( leftPos <= leftEnd )   // Copy rest of first half ➁
        tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );
    while( rightPos <= rightEnd ) // Copy rest of second half
        tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] );
    for( int i = 0; i < numElements; ++i, --rightEnd ) ➂
        a[rightEnd] = std::move( tmpArray[rightEnd] );
}
There are several assignment/move calls, which we will assume are $O(1)$.
This means that all the loop bodies are $O(1)$.
template <typename Comparable>
void merge(vector<Comparable>& a, vector<Comparable>& tmpArray,
           int leftPos, int rightPos, int rightEnd)
{
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int numElements = rightEnd - leftPos + 1;
    // Main loop
    while( leftPos <= leftEnd && rightPos <= rightEnd )
        if ( a[leftPos] <= a[rightPos] )                         // total: O(1)
            tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );  // O(1)
        else
            tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] ); // O(1)
    while( leftPos <= leftEnd )   // Copy rest of first half
        tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );      // O(1)
    while( rightPos <= rightEnd ) // Copy rest of second half
        tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] );     // O(1)
    for( int i = 0; i < numElements; ++i, --rightEnd )
        a[rightEnd] = std::move( tmpArray[rightEnd] );           // O(1)
}
Looking at the code for the first three loops, note that
- each iteration of those loops adds one element into tmpArray.
- no element is copied multiple times. If we copy the element at leftPos, we also increment leftPos, so we will not copy that element again. Similarly, if we copy the element at rightPos, we also increment rightPos, so we will not copy that element again.
Since there are a total of rightEnd-leftPos+1 elements, each loop can repeat no more than rightEnd-leftPos+1 times.
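The iteration-count argument can be made concrete by instrumenting the copy loops with a counter. The helper below (countMergeCopies is a hypothetical name, written just for this illustration) mirrors merge’s three loops and counts every copy into tmpArray; the count always comes out to exactly rightEnd - leftPos + 1.

```cpp
#include <vector>
using std::vector;

// Illustrative variant of merge's three copy loops that counts how many
// elements are copied into tmpArray.  Takes a by value because the demo
// only cares about the count, not the merged output.
inline int countMergeCopies(vector<int> a, int leftPos, int rightPos, int rightEnd)
{
    vector<int> tmpArray(a.size());
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int copies = 0;
    while (leftPos <= leftEnd && rightPos <= rightEnd) {   // main loop
        tmpArray[tmpPos++] = (a[leftPos] <= a[rightPos]) ? a[leftPos++]
                                                         : a[rightPos++];
        ++copies;
    }
    while (leftPos <= leftEnd) {                           // rest of first half
        tmpArray[tmpPos++] = a[leftPos++];
        ++copies;
    }
    while (rightPos <= rightEnd) {                         // rest of second half
        tmpArray[tmpPos++] = a[rightPos++];
        ++copies;
    }
    return copies;   // equals rightEnd - leftPos + 1
}
```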
template <typename Comparable>
void merge(vector<Comparable>& a, vector<Comparable>& tmpArray,
           int leftPos, int rightPos, int rightEnd)
{
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int numElements = rightEnd - leftPos + 1;
    // Main loop
    while( leftPos <= leftEnd && rightPos <= rightEnd )          // total: O(numElements)
        if ( a[leftPos] <= a[rightPos] )                         // total: O(1)
            tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );  // O(1)
        else
            tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] ); // O(1)
    while( leftPos <= leftEnd )   // Copy rest of first half     // total: O(numElements)
        tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );      // O(1)
    while( rightPos <= rightEnd ) // Copy rest of second half    // total: O(numElements)
        tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] );     // O(1)
    for( int i = 0; i < numElements; ++i, --rightEnd )
        a[rightEnd] = std::move( tmpArray[rightEnd] );           // O(1)
}
In fact, the sum of the number of iterations of all three loops is rightEnd-leftPos+1. So all three loops, summed together, are O(numElements), where numElements is rightEnd-leftPos+1.
template <typename Comparable>
void merge(vector<Comparable>& a, vector<Comparable>& tmpArray,
           int leftPos, int rightPos, int rightEnd)
{
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int numElements = rightEnd - leftPos + 1;
    // O(numElements)
    for( int i = 0; i < numElements; ++i, --rightEnd )
        a[rightEnd] = std::move( tmpArray[rightEnd] );  // O(1)
}
The last loop clearly repeats numElements times.
template <typename Comparable>
void merge(vector<Comparable>& a, vector<Comparable>& tmpArray,
           int leftPos, int rightPos, int rightEnd)
{
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int numElements = rightEnd - leftPos + 1;
    // O(numElements)
    for( int i = 0; i < numElements; ++i, --rightEnd ) // cond: O(1)  #: numElements  total: O(numElements)
        a[rightEnd] = std::move( tmpArray[rightEnd] ); // O(1)
}
That leaves only a handful of O(1) statements that will all be dominated by the complexity of the loops, so
- merge is $O(\mbox{rightEnd}-\mbox{leftPos})$.
2 Merge Sort
The merge function lets us combine two sorted sequences of data into a single sorted sequence. But how do we get the two sorted sequences in the first place? By merge’ing two even smaller sorted sequences!
2.1 The Algorithm
template <typename Comparable>
void mergeSort(vector<Comparable>& a)
{
    vector<Comparable> tmpArray( a.size() );
    mergeSort( a, tmpArray, 0, a.size()-1 );
}

template <typename Comparable>
void mergeSort(vector<Comparable>& a, vector<Comparable>& tmpArray,
               int left, int right)
{
    // if the sublist has more than 1 element, continue
    if (left < right)
    {
        int center = (left + right) / 2;
        mergeSort(a, tmpArray, left, center);
        mergeSort(a, tmpArray, center+1, right);
        merge(a, tmpArray, left, center+1, right);
    }
}
The first function sets up the second function by allocating the temporary vector that is used by merge, and then telling the second function to merge sort the entire range of input data.
The heart of the merge sort algorithm is the second function. It is almost amazingly simple: just two recursive calls to itself, each sorting half the vector, followed by a call to merge to combine the two sorted halves into a single sorted sequence.
For many people, the very simplicity of this algorithm makes it hard to believe that it can work. I therefore recommend strongly that you run this algorithm until you are comfortable with your understanding of it.
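If the animation is not handy, the sketch below combines the merge routine with the two mergeSort functions into a runnable whole; the mergeSortDemo function and its sample data are illustrative additions.

```cpp
#include <utility>
#include <vector>
using std::vector;

template <typename Comparable>
void merge(vector<Comparable>& a, vector<Comparable>& tmpArray,
           int leftPos, int rightPos, int rightEnd)
{
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int numElements = rightEnd - leftPos + 1;

    while (leftPos <= leftEnd && rightPos <= rightEnd)   // main loop
        if (a[leftPos] <= a[rightPos])
            tmpArray[tmpPos++] = std::move(a[leftPos++]);
        else
            tmpArray[tmpPos++] = std::move(a[rightPos++]);
    while (leftPos <= leftEnd)                           // rest of first half
        tmpArray[tmpPos++] = std::move(a[leftPos++]);
    while (rightPos <= rightEnd)                         // rest of second half
        tmpArray[tmpPos++] = std::move(a[rightPos++]);
    for (int i = 0; i < numElements; ++i, --rightEnd)    // copy back into a
        a[rightEnd] = std::move(tmpArray[rightEnd]);
}

template <typename Comparable>
void mergeSort(vector<Comparable>& a, vector<Comparable>& tmpArray,
               int left, int right)
{
    if (left < right)    // more than one element in the sublist?
    {
        int center = (left + right) / 2;
        mergeSort(a, tmpArray, left, center);            // sort left half
        mergeSort(a, tmpArray, center + 1, right);       // sort right half
        merge(a, tmpArray, left, center + 1, right);     // combine them
    }
}

template <typename Comparable>
void mergeSort(vector<Comparable>& a)
{
    vector<Comparable> tmpArray(a.size());
    mergeSort(a, tmpArray, 0, static_cast<int>(a.size()) - 1);
}

// Demo (illustrative data)
inline vector<int> mergeSortDemo()
{
    vector<int> a = {5, 1, 4, 2, 6, 3};
    mergeSort(a);
    return a;
}
```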
2.2 MergeSort Analysis
template <typename Comparable>
void mergeSort(vector<Comparable>& a, vector<Comparable>& tmpArray,
int left, int right)
{
// if the sublist has more than 1 element continue
if (left < right)
{
int center = (left + right) / 2;
mergeSort(a, tmpArray, first, center);
mergeSort(a, tmpArray, center+1, right);
merge(a, tmpArray, left, center+1, right);
}
}
- Each call to mergeSort is either done in O(1) time (if $\mbox{left} \geq \mbox{right}$) or splits the array into two equal ($\pm 1$) pieces.
- How many times can we split the vector into halves? That should be a familiar idea by now. We can do this split up to $\log N$ times.
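That halving count is just $\lfloor \log_2 N \rfloor$, which a few lines can confirm. (The halvings function is an illustrative helper, not part of the sort.)

```cpp
// Count how many times n can be halved before only one element remains --
// the depth of mergeSort's recursion tree, roughly log2(n).
inline int halvings(int n)
{
    int count = 0;
    while (n > 1) {
        n /= 2;
        ++count;
    }
    return count;
}
```

For example, a vector of 1024 elements can be split in half only 10 times before every sublist is down to a single element.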
We can envision the recursive mergeSort calls (in blue) and the subsequent calls to merge (in yellow) as a tree-like structure.
Let $N$ denote the total number of elements being sorted (the value of right-left+1 on the very first call to mergeSort).
- Each level in the tree involves no more than $N$ objects, split in various ways and needing to be merged.
- merge is $O(k)$, where $k$ is the number of elements to be merged. The sum of all the $k$ values at any level of the yellow tree is $N$. Consequently the combined set of merges at each level of the tree is $O(N)$.
- The blue tree represents all the non-merge work in mergeSort. But there’s only $O(1)$ work in each of those blue nodes. Since the most blue nodes we could have at one level is $N$, each blue level is, at most, $O(N)$ total work.
- Because we have $\log N$ levels, each level taking $O(N)$ work, the overall merge sort code is (worst & average case) $O(N \log N)$.
So merge sort is as fast as any pairwise-comparison sort can be. Still, merge sort is not considered to be the “ideal” sorting algorithm. Its primary drawbacks are
- It requires $O(N)$ extra storage (for the tmpArray).
- It does the full set of comparisons and copies even when applied to arrays that are already sorted.
On the other hand, merge sort has an advantage that may, at first glance, not have seemed very important. The merge routine itself moves sequentially through its working arrays, not jumping from place to place. This behavior would be absolutely wonderful if we were storing our arrays in some strange kind of memory where moving forward one place is cheap, but jumping to an arbitrary position is expensive.
In fact, that “strange kind of memory” does exist:
- Linked lists support this movement pattern and can be sorted quickly using merge sort. Most other fast sorting algorithms are limited to array-like structures.
- Disk drives and magnetic tape both meet that movement pattern as well. Hence variations of merge sort have long been the algorithm of choice in external sorting, sorting sets of material stored in disk/tape files that are too large to load into memory.