
Heaps

Steven J. Zeil

Last modified: Mar 11, 2025

Problem: Given a collection of elements that carry a numeric “score”, find and remove the element with the smallest [largest] score. New elements may be added at any time.

In an earlier lesson, we saw that this collection is called a priority queue. Now we will look at an efficient way of implementing it.

1 Recap: the std Priority Queue Interface

This is the essential part of the priority queue interface.

public class PriorityQueue<E> implements Queue<E> {

    // Create an empty priority queue.
    public PriorityQueue() { ... }

    // Create a priority queue using a specific comparator.
    public PriorityQueue(Comparator<E> comparator) { ... }

    // Create a priority queue initialized with all of the elements
    // from another Collection.
    public PriorityQueue(Collection<? extends E> c) { ... }

    public void clear() { ... }

    public boolean isEmpty() { ... }

    public int size() { ... }

    // Add an element to the queue.
    public boolean add(E e) { ... }

    // Look at the front element without removing it.
    public E peek() { ... }

    // Remove and return the front element.
    public E poll() { ... }

       ⋮
}
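
Before looking at the implementation, here is a small, hypothetical usage example. It uses java.util.PriorityQueue, which presents this same interface; the Job type and its scores are invented for illustration. Like the implementation below, it returns the element with the smallest score first.

import java.util.Comparator;
import java.util.PriorityQueue;

public class PriorityQueueDemo {
    // A made-up element type: a job with a name and a numeric score.
    record Job(String name, int score) {}

    public static void main(String[] args) {
        // Order jobs by ascending score, so the smallest score is at the front.
        PriorityQueue<Job> queue =
                new PriorityQueue<>(Comparator.comparingInt(Job::score));

        queue.add(new Job("backup", 30));
        queue.add(new Job("render", 5));
        queue.add(new Job("index", 12));

        System.out.println(queue.peek().name()); // render (score 5), still in the queue
        System.out.println(queue.poll().name()); // render, now removed
        System.out.println(queue.poll().name()); // index (score 12)
        System.out.println(queue.size());        // 1
    }
}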

1.1 The Priority Queue Implementation

public class PriorityQueue<E> implements Queue<E> {

    private MinHeap<E> heap;

    // Create an empty priority queue.
    public PriorityQueue() {
        heap = new MinHeap<>();
    }

    // Create a priority queue using a specific comparator.
    public PriorityQueue(Comparator<E> comparator) {
        heap = new MinHeap<>(comparator);
    }

    // Create a priority queue initialized with all of the elements
    // from another Collection.
    public PriorityQueue(Collection<? extends E> c) {
        heap = new MinHeap<>(c);
    }

    public void clear() { heap.clear(); }

    public boolean isEmpty() { return heap.heapSize() == 0; }

    public int size() { return heap.heapSize(); }

    public boolean add(E e) {
        heap.insert(e);
        return true;
    }

    public E peek() { return heap.peek(); }

    public E poll() { return heap.remove(); }

    ⋮
}

We have added an implementing data structure, a heap. You can then see that the priority queue functions are pretty much one-liners, all passing the buck to similar functions provided by the heap.

We will look at this heap data structure in just a bit. First, though, let’s look at what we could do to implement priority queues using the data structures we already know.

One possibility would be to use a sorted sequential structure (an array or a linked list). For example, using an ArrayList, we could keep the elements in descending order by priority. Then peek() on the priority queue would simply return get(size()-1) of the implementing ArrayList.
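
As a sketch of that idea (not code from the lecture; the class name and details are invented), add would search for the insertion point and shift the later elements over, while poll would simply remove the last element:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;

// A sketch of a priority queue backed by a sorted ArrayList.
// The list is kept in descending order, so the front of the queue
// (the element with the smallest score) is always at index size()-1.
class SortedListPriorityQueue<E> {
    private final ArrayList<E> list = new ArrayList<>();
    private final Comparator<E> descending;

    SortedListPriorityQueue(Comparator<E> comparator) {
        descending = comparator.reversed();
    }

    public void add(E e) {
        int pos = Collections.binarySearch(list, e, descending);
        if (pos < 0) {
            pos = -pos - 1;      // convert the "not found" code into an insertion point
        }
        list.add(pos, e);        // every element after pos must shift over
    }

    public E peek() {
        return list.get(list.size() - 1);    // the front is at the end of the list
    }

    public E poll() {
        return list.remove(list.size() - 1); // removing the last element shifts nothing
    }
}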

Question: With this data structure, what would the complexities of the priority queue add and poll operations be?

Answer: add would be O(n), because inserting a new element into its proper place in the sorted ArrayList requires shifting, on average, half of the elements. poll, however, would be O(1), because the front element is simply removed from the end of the list.

We can do better than that.

We might consider instead using a balanced binary search tree to store the priority queue. This time, it will be a little easier if we store the items in ascending order by priority.

Question:

Using a balanced binary search tree as the underlying data structure, what would the complexities of the priority queue add and poll operations be?

Answer: Both add and poll would be O(log n): each is a single insertion or removal in a balanced search tree of n elements.

That sounds pretty good.
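
As a sketch of that approach (again, not the lecture's code; the class name and the duplicate-counting scheme are my own), we could sit a min-priority queue on top of java.util.TreeMap, which is a red-black tree, so put, remove, and firstEntry each take time proportional to the height of the tree:

import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// A sketch of a min-priority queue backed by a balanced search tree.
class TreeBackedPriorityQueue<E> {
    private final TreeMap<E, Integer> tree;   // element -> number of copies
    private int size = 0;

    TreeBackedPriorityQueue(Comparator<E> comparator) {
        tree = new TreeMap<>(comparator);
    }

    public void add(E e) {
        tree.merge(e, 1, Integer::sum);       // insert, or bump the count of a duplicate
        ++size;
    }

    public E peek() {
        return tree.firstKey();               // smallest element: leftmost node of the tree
    }

    public E poll() {
        Map.Entry<E, Integer> first = tree.firstEntry();
        if (first.getValue() == 1) {
            tree.remove(first.getKey());
        } else {
            tree.put(first.getKey(), first.getValue() - 1);
        }
        --size;
        return first.getKey();
    }

    public int size() { return size; }
}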

We can't actually hope to improve on the worst-case times offered by balanced search trees, but we can match those worst-case times (and improve on the multiplicative constant), and we can achieve O(1) average-case time for insertion, by using a new data structure called a "heap".

2 Implementing Priority Queues - the Heap

We can implement priority queues using a data structure called a heap, sometimes known more specifically as a "binary heap".

2.1 Binary Heaps

A binary heap is a binary tree with two properties:

  • Shape: the tree is complete. Every level is full except possibly the last, which is filled from left to right.

  • Ordering: each child's value is smaller than (or equal to) its parent's value.

Important: A heap is a binary tree, but not a binary search tree. The ordering rules for heaps are different from those of binary search trees.

What I have defined here is sometimes called a max-heap, because the largest value in the heap will be at the root. We can also have a min-heap, in which every child has a value larger than its parent.

Max-heaps always have their very largest value in their root. Min-heaps always have their smallest value in the root.

In this course, we will always assume that a “heap” is a “max-heap” unless explicitly stated otherwise.

Let’s look at the implications of each of these two properties.

2.2 Heaps are complete trees

 

Here’s an example of a complete binary tree.

Complete binary trees have a very simple linear representation, allowing us to implement them in an array or vector with no pointers.
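
For example, a complete tree holding six values (chosen arbitrarily here) can be stored level by level, left to right, using nothing but the array itself:

// A hypothetical complete tree with six values, stored level by level:
//
//            A                     index: 0  1  2  3  4  5
//          /   \                   value: A  B  C  D  E  F
//         B     C
//        / \   /
//       D   E F
//
// No child or parent pointers are needed: the children of the node at
// index i (if they exist) are at indices 2*i+1 and 2*i+2, and its
// parent is at index (i-1)/2.
String[] tree = { "A", "B", "C", "D", "E", "F" };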

 

2.3 Children’s Values are Smaller than Their Parent’s

 

Using the same tree shape, we can fill in some values to show an example of a heap.

Each parent has a value larger than its children’s values (and, therefore, larger than the values of any of its descendants).

So when we ask for the front (largest value) of a priority queue, we find it in the root of the heap, which in turn will be in position 0 of the array/vector.

2.4 The Data Structure

The code in the textbook is not generic, so I will present a generic version.

class MaxHeap<E> {

    private Object[] heap; // The array holding the heap contents
    private Comparator<E> compare; // comparator to use when comparing elements
    private int n; // Number of things now in heap

Our basic data structure will be an array. In the textbook, this array has a fixed size that constitutes the maximum size of the queue. I will instead use an ArrayList-style doubling of the array whenever an add operation threatens to overflow it.

We are using the array to store a tree, but most of our “thinking” about this code will be in terms of a tree. So some useful utility functions will find the parents and children of any node.

    // Return true if pos is a leaf position, false otherwise
    private boolean isLeaf(int pos) {
        return (n / 2 <= pos) && (pos < n);
    }

    // Return position for left child of pos
    private static int leftChild(int pos) {
        return 2 * pos + 1;
    }

    // Return position for right child of pos
    private static int rightChild(int pos) {
        return 2 * pos + 2;
    }

    // Return position for parent
    private static int parent(int pos) {
        return (pos - 1) / 2;
    }
    ⋮
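
For example, in a hypothetical heap with n = 10 elements, the node at position 3 has its children at positions 7 and 8, and positions 5 through 9 are the leaves. Inside the class we could check:

    // Sanity checks of the index arithmetic, assuming n == 10:
    assert leftChild(3) == 7 && rightChild(3) == 8; // 2*3+1 and 2*3+2
    assert parent(7) == 3 && parent(8) == 3;        // (7-1)/2 and (8-1)/2
    assert isLeaf(5) && !isLeaf(4);                 // n/2 == 5, so positions 5..9 are leaves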

2.5 Sifting Up and Sifting Down

Before looking in detail at how to add and delete elements from a heap, let’s consider a situation in which we have a “damaged” heap with one node out of position.

How do we “fix” the heap? There are two cases to consider.

2.5.1 Sifting Up

 

When we have a node that is larger than its parent, we sift it up (sometimes called "bubbling up") by swapping it with its parent until it has reached its proper position.

    // Moves an element up to its correct place
    private void siftUp(int pos) {
        while (pos > 0) {
            int parent = parent(pos);
            if (isGreaterThan(parent, pos)) {
                return; // stop early
            }
            swap(pos, parent);
            pos = parent; // keep sifting up
        }
    }

    // swaps the elements at two positions
    private void swap(int pos1, int pos2) {
        Object temp = heap[pos1];
        heap[pos1] = heap[pos2];
        heap[pos2] = temp;
    }

    // does comparison used for checking heap validity
    private boolean isGreaterThan(int pos1, int pos2) {
        E e1 = (E) heap[pos1];
        E e2 = (E) heap[pos2];
        return compare.compare(e1, e2) > 0;
    }

 

In this case, starting with pos = 8, we swap node 8 with its parent 3 …


Note that we have repaired the heap. The final arrangement satisfies the ordering requirements for a heap.
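
Traced on a hypothetical array of values (invented here for illustration; only position 8 violates the heap ordering), the repair looks like this:

// A damaged max-heap: the 95 at position 8 is larger than its ancestors.
int[] before = { 90, 80, 85, 40, 70, 60, 50, 20, 95 };

// siftUp(8) swaps 95 with its parent 40 (position 3), then with 80
// (position 1), then with 90 (position 0), and stops at the root:
int[] after  = { 95, 90, 85, 80, 70, 60, 50, 20, 40 };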

2.5.2 Sifting Down

 

When we have a node that is smaller than one or both of its children, we sift it down (also known as “percolate down” or “drip down”) by swapping it with the larger of its children until it has reached its proper position.

    // Moves an element down to its correct place
    private void siftDown(int pos) {
        while (!isLeaf(pos)) {
            int child = leftChild(pos);
            if ((child + 1 < n) && isGreaterThan(child + 1, child)) {
                child = child + 1; // child is now the index with the greater value
            }
            if (!isGreaterThan(child, pos)) {
                return; // stop early
            }
            swap(pos, child);
            pos = child; // keep sifting down
        }
    }

This is only a little more complicated than bubbling up. The main complication is that the current node might have 0 children, 1 child, or 2 children, so we need to be careful that we don’t try to access the value of non-existent children.

 

In this case, starting with pos = 0, we swap node 0 with its larger child, 2 …
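
Traced on a similar hypothetical array (only the root is out of position), sifting down looks like this:

// A damaged max-heap: the 30 at the root is smaller than both of its children.
int[] before = { 30, 80, 85, 40, 70, 60, 50, 20, 10 };

// siftDown(0) swaps 30 with its larger child 85 (position 2), then with
// that node's larger child 60 (position 5), which is a leaf, so it stops:
int[] after  = { 85, 80, 60, 40, 70, 30, 50, 20, 10 };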


If you understand the ideas of sifting up and sifting down, then almost all the things you would want to do to a heap become a variant of those two ideas.

2.6 Inserting into a heap

 

Suppose we have this heap and we want to add a new item to it.

Now, after we add an item to the heap, it will have one more tree node than it currently does. Because heaps are complete trees, we know exactly how the shape of the tree will change, even if we can’t be sure how the data values in the tree might be rearranged.

 

Question: How will the shape of the tree shown above change?

Answer: Because the tree must remain complete, the new node must appear in the leftmost unoccupied position of the bottom level; in array terms, that is the next unused position at the end of the array.

 

Well, suppose that we just go ahead and put the new value into that position.

We've got two possibilities:

  • The new value might be smaller than (or equal to) its parent's value. In that case the tree is already a valid heap, and we are done.

  • The new value might be larger than its parent's value. Then it would be the only node that was out of position, and we know how to "repair" a heap with a single node out of position that is larger than its parent: we sift up!

    // Insert key into the heap
    public void insert(E key) {
        expandArrayIfNecessary();
        // Add the new value
        heap[n] = key;
        n++;
        siftUp(n - 1);
    }

    // Make sure that we have room to add one more element
    private void expandArrayIfNecessary() {
        if (n >= heap.length) {
            // If we are about to overflow the array, double its size.
            int newCapacity = Math.max(1, 2 * heap.length);
            Object[] newHeap = new Object[newCapacity];
            System.arraycopy(heap, 0, newHeap, 0, n);
            heap = newHeap;
        }
    }

 

For example, suppose we wanted to add 54 to the heap. First we would add 54 onto the end of the array, in effect adding it to the complete tree.
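
Traced on a hypothetical heap array (values invented for illustration), the insertion looks like this:

// A max-heap with 7 elements, before the insertion:
int[] before = { 63, 50, 47, 20, 31, 40, 12 };

// insert(54) places 54 at position 7, as a child of the 20 at position 3.
// siftUp(7) then swaps it with 20 and with 50, and stops below the 63:
int[] after  = { 63, 54, 47, 50, 31, 40, 12, 20 };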


2.7 Removing from Heaps

When we remove the largest element from a heap, we know that the value being removed is the value currently in the root.

We also know how the tree shape will change. The rightmost node in the bottom level will disappear.

Now, unless the heap only has one node, the node that's disappearing does not contain the value that we're actually removing. So, we have two problems:

  • the root contains the value that we want to remove, so it will be left, in effect, with no data, and

  • the last node of the tree must disappear, but it still contains data that we need to keep.

So, we've got a node with no data, and data that needs a node. The natural thing to do is to put the data in that node.

That data value will almost certainly be out of position, being smaller than one or both of its children, but, again, that’s only a single node that’s out of position. We know how to fix that.

    // Remove and return root
    public E remove() {
        n--;
        swap(0, n); // Swap maximum with last value
        if (n > 0)
            siftDown(0); // Put new heap root val in correct place
        return (E) heap[n];
    }

 

Suppose we wanted to remove the maximum value from this heap.

The first step is to replace the root value with 47, the value from the last node.

Then we drop that last node from the array (by decrementing n) and sift the new root value, 47, down to its proper position.
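
Traced on a hypothetical heap array whose last element happens to be 47 (the other values are invented for illustration), the removal looks like this:

// A max-heap with 9 elements; the last element is 47.
int[] before = { 63, 54, 50, 48, 31, 40, 12, 20, 47 };

// remove() decrements n to 8, swaps the 63 at the root with the 47 at the
// end, and will return the 63 now sitting just past the end of the heap.
// siftDown(0) then swaps 47 with 54 (the larger child of the root) and
// with 48, restoring heap order among the remaining 8 elements:
int[] after  = { 54, 48, 50, 47, 31, 40, 12, 20 };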


3 Analysis

A binary heap has the same shape as a balanced binary search tree.

Therefore its height, for n nodes, is log(n).

3.1 insert and remove

insert and remove do O(1) work on each node along a path that runs, at worst, between a single leaf and the root.

Hence both operations are O(log n), worst case.

The average case for insert is O(1). The proof of this is beyond the scope of this class.

3.2 buildHeap

A single insertion is O(log n) worst case and O(1) average.

What happens if we start with an empty heap and do n inserts? The resulting total could be O(n log n).

As it happens, we can do better with a special build operation to build an entire heap from an array (or array-like structure such as a vector).

    // Heapify contents of the heap array
    protected void buildHeap() {
        for (int i = parent(n - 1); i >= 0; i--) {
            siftDown(i);
        }
    }

  • Start with the data in any order.

  • Force heap order by percolating each non-leaf node.

Since each siftDown takes, in worst case, a time proportional to the height of the node being sifted, the total time for buildHeap is proportional to the sum of the heights of all the nodes in a complete tree.

 

Consider an array with N elements. Let h be the height of the complete tree representing that array, so h = ⌊log N⌋.

We apply percolate (siftDown) to the first N/2 elements, i.e., the non-leaf nodes. A complete tree has at most N/2^(i+1) nodes of height i, and sifting down a node of height i takes at most i swaps, so the total work is

∑_{i=0}^{h} i · N / 2^(i+1)

= ∑_{i=0}^{h} N · i / 2^(i+1)

= N · ∑_{i=0}^{h} i / 2^(i+1)

Using one of our simplifications from the FAQ, ∑_{i=0}^{∞} i / 2^(i+1) = 1, so the total work is

< N

= O(N)

Therefore buildHeap is O(n).

So it’s cheaper to build a heap all at once than to do it one insert at a time, although neither approach is terribly expensive.

In our MaxHeap, MinHeap, and PriorityQueue classes, this buildHeap function is used when we construct a new heap or priority queue from an existing collection of objects, e.g.,

    // Constructor supporting preloading of heap contents
    public MinHeap(Collection<? extends E> h) {
        heap = new Object[h.size()];
        compare = ...
        n = 0;
        for (E e : h) {
            heap[n] = e;
            ++n;
        }
        buildHeap();
    }
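
For example (with made-up scores, and assuming the no-comparator case falls back to natural Integer ordering), a priority queue preloaded from an existing list is heap-ordered by a single O(n) buildHeap pass rather than by n separate O(log n) inserts:

import java.util.List;

List<Integer> scores = List.of(42, 7, 19, 3, 88);            // made-up data
PriorityQueue<Integer> queue = new PriorityQueue<>(scores);   // MinHeap's constructor calls buildHeap()
System.out.println(queue.poll());                             // 3: the MinHeap returns the smallest score first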

4 From MaxHeap to MinHeap

We can create a min heap (that returns the smallest value first) with a simple change to the MaxHeap code:

class MinHeap<E> {
       ⋮
    // Moves an element down to its correct place
    private void siftDown(int pos) {
        assert (0 <= pos && pos < n) : "Invalid heap position";
        while (!isLeaf(pos)) {
            int child = leftChild(pos);
            if ((child + 1 < n) && isLessThan(child + 1, child)) {
                child = child + 1; // child is now index with the lesser value
            }
            if (!isLessThan(child, pos)) {
                return; // stop early
            }
            swap(pos, child);
            pos = child; // keep sifting down
        }
    }

    // Moves an element up to its correct place
    private void siftUp(int pos) {
        assert (0 <= pos && pos < n) : "Invalid heap position";
        while (pos > 0) {
            int parent = parent(pos);
            if (isLessThan(parent, pos)) {
                return; // stop early
            }
            swap(pos, parent);
            pos = parent; // keep sifting up
        }
    }

    // does comparison used for checking heap validity
    private boolean isLessThan(int pos1, int pos2) {
        E e1 = (E) heap[pos1];
        E e2 = (E) heap[pos2];
        return compare.compare(e1, e2) < 0;
    }

      ⋮
}

In each place where our MaxHeap called isGreaterThan, the MinHeap calls isLessThan instead.

Note that our PriorityQueue implementation uses MinHeap.

5 Recap: Complexity of Heap Operations