Binary Search Trees
Steven J. Zeil
A tree in which every parent has at most 2 children is a binary tree.
The most common use of binary trees is for ADTs that require frequent searches for arbitrary keys.
- E.g., sets, maps
For this we use a special form of binary tree, the binary search tree.
1 Definition: Binary Search Trees
A binary tree T is a binary search tree if, for each node $n$ with children $T_L$ and $T_R$:
- The value in $n$ is greater than the values in every node in $T_L$.
- The value in $n$ is less than the values in every node in $T_R$.
- Both $T_L$ and $T_R$ are binary search trees.
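The definition can also be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Node` struct that holds `int` values (it is not the textbook's `BinaryNode`); the names `isBST` and `isBSTWithin` are illustrative only. Note that it is not enough to compare each node with its two children: every value in a subtree must respect the bounds set by all of its ancestors.

```cpp
#include <climits>

// Hypothetical node type for illustration; not the textbook's BinaryNode.
struct Node {
    int value;
    Node* left;
    Node* right;
};

// Returns true if every value in the subtree rooted at t lies strictly
// between lo and hi, and each of its subtrees obeys the same rule.
// The bounds are long long so that any int value fits strictly inside them.
bool isBSTWithin(const Node* t, long long lo, long long hi) {
    if (t == nullptr)
        return true;                              // an empty tree is a BST
    if (t->value <= lo || t->value >= hi)
        return false;                             // violates an ancestor's ordering
    return isBSTWithin(t->left, lo, t->value)     // left subtree: values < t->value
        && isBSTWithin(t->right, t->value, hi);   // right subtree: values > t->value
}

bool isBST(const Node* root) {
    return isBSTWithin(root, LLONG_MIN, LLONG_MAX);
}
```

A tree in which 40 sits somewhere in the left subtree of 30 fails this check even if 40 is correctly ordered relative to its own parent.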
Question: Is this a BST?
1.1 The Binary Search Tree ADT
Let’s look at the basic interface for a binary search tree, from your textbook:
#ifndef BINARY_SEARCH_TREE_H
#define BINARY_SEARCH_TREE_H
#include "dsexceptions.h"
#include <algorithm>
#include <iostream>
#include <utility>
using namespace std;
// BinarySearchTree class
//
// CONSTRUCTION: zero parameter
//
// ******************PUBLIC OPERATIONS*********************
// void insert( x ) --> Insert x
// void remove( x ) --> Remove x
// bool contains( x ) --> Return true if x is present
// Comparable findMin( ) --> Return smallest item
// Comparable findMax( ) --> Return largest item
// bool isEmpty( ) --> Return true if empty; else false
// void makeEmpty( ) --> Remove all items
// void printTree( ) --> Print tree in sorted order
// ******************ERRORS********************************
// Throws UnderflowException as warranted
template <typename Comparable>
class BinarySearchTree ➂
{
public:
BinarySearchTree( ) : root{ nullptr }
{
}
/**
* Copy constructor
*/
BinarySearchTree( const BinarySearchTree & rhs ) : root{ nullptr }
{
root = clone( rhs.root );
}
/**
* Move constructor
*/
BinarySearchTree( BinarySearchTree && rhs ) : root{ rhs.root }
{
rhs.root = nullptr;
}
/**
* Destructor for the tree
*/
~BinarySearchTree( )
{
makeEmpty( );
}
/**
* Copy assignment
*/
BinarySearchTree & operator=( const BinarySearchTree & rhs )
{
BinarySearchTree copy = rhs;
std::swap( *this, copy );
return *this;
}
/**
* Move assignment
*/
BinarySearchTree & operator=( BinarySearchTree && rhs )
{
std::swap( root, rhs.root );
return *this;
}
/**
* Find the smallest item in the tree.
* Throw UnderflowException if empty.
*/
const Comparable & findMin( ) const
{
if( isEmpty( ) )
throw UnderflowException{ };
return findMin( root )->element;
}
/**
* Find the largest item in the tree.
* Throw UnderflowException if empty.
*/
const Comparable & findMax( ) const
{
if( isEmpty( ) )
throw UnderflowException{ };
return findMax( root )->element;
}
/**
* Returns true if x is found in the tree.
*/
bool contains( const Comparable & x ) const ➃
{
return contains( x, root );
}
/**
* Test if the tree is logically empty.
* Return true if empty, false otherwise.
*/
bool isEmpty( ) const
{
return root == nullptr;
}
/**
* Print the tree contents in sorted order.
*/
void printTree( ostream & out = cout ) const
{
if( isEmpty( ) )
out << "Empty tree" << endl;
else
printTree( root, out );
}
/**
* Make the tree logically empty.
*/
void makeEmpty( )
{
makeEmpty( root );
}
/**
* Insert x into the tree; duplicates are ignored.
*/
void insert( const Comparable & x ) ➄
{
insert( x, root );
}
/**
* Insert x into the tree; duplicates are ignored.
*/
void insert( Comparable && x )
{
insert( std::move( x ), root );
}
/**
* Remove x from the tree. Nothing is done if x is not found.
*/
void remove( const Comparable & x ) ➅
{
remove( x, root );
}
private:
struct BinaryNode ➀
{
Comparable element;
BinaryNode *left;
BinaryNode *right;
BinaryNode( const Comparable & theElement, BinaryNode *lt, BinaryNode *rt )
: element{ theElement }, left{ lt }, right{ rt } { }
BinaryNode( Comparable && theElement, BinaryNode *lt, BinaryNode *rt )
: element{ std::move( theElement ) }, left{ lt }, right{ rt } { }
};
BinaryNode *root; ➁
/**
* Internal method to insert into a subtree.
* x is the item to insert.
* t is the node that roots the subtree.
* Set the new root of the subtree.
*/
void insert( const Comparable & x, BinaryNode * & t ) ➆
{
if( t == nullptr )
t = new BinaryNode{ x, nullptr, nullptr };
else if( x < t->element )
insert( x, t->left );
else if( t->element < x )
insert( x, t->right );
else
; // Duplicate; do nothing
}
/**
* Internal method to insert into a subtree.
* x is the item to insert.
* t is the node that roots the subtree.
* Set the new root of the subtree.
*/
void insert( Comparable && x, BinaryNode * & t )
{
if( t == nullptr )
t = new BinaryNode{ std::move( x ), nullptr, nullptr };
else if( x < t->element )
insert( std::move( x ), t->left );
else if( t->element < x )
insert( std::move( x ), t->right );
else
; // Duplicate; do nothing
}
/**
* Internal method to remove from a subtree.
* x is the item to remove.
* t is the node that roots the subtree.
* Set the new root of the subtree.
*/
void remove( const Comparable & x, BinaryNode * & t )
{
if( t == nullptr )
return; // Item not found; do nothing
if( x < t->element )
remove( x, t->left );
else if( t->element < x )
remove( x, t->right );
else if( t->left != nullptr && t->right != nullptr ) // Two children
{
t->element = findMin( t->right )->element;
remove( t->element, t->right );
}
else
{
BinaryNode *oldNode = t;
t = ( t->left != nullptr ) ? t->left : t->right;
delete oldNode;
}
}
/**
* Internal method to find the smallest item in a subtree t.
* Return node containing the smallest item.
*/
BinaryNode * findMin( BinaryNode *t ) const
{
if( t == nullptr )
return nullptr;
if( t->left == nullptr )
return t;
return findMin( t->left );
}
/**
* Internal method to find the largest item in a subtree t.
* Return node containing the largest item.
*/
BinaryNode * findMax( BinaryNode *t ) const
{
if( t != nullptr )
while( t->right != nullptr )
t = t->right;
return t;
}
/**
* Internal method to test if an item is in a subtree.
* x is item to search for.
* t is the node that roots the subtree.
*/
bool contains( const Comparable & x, BinaryNode *t ) const
{
if( t == nullptr )
return false;
else if( x < t->element )
return contains( x, t->left );
else if( t->element < x )
return contains( x, t->right );
else
return true; // Match
}
/****** NONRECURSIVE VERSION*************************
bool contains( const Comparable & x, BinaryNode *t ) const
{
while( t != nullptr )
if( x < t->element )
t = t->left;
else if( t->element < x )
t = t->right;
else
return true; // Match
return false; // No match
}
*****************************************************/
/**
* Internal method to make subtree empty.
*/
void makeEmpty( BinaryNode * & t )
{
if( t != nullptr )
{
makeEmpty( t->left );
makeEmpty( t->right );
delete t;
}
t = nullptr;
}
/**
* Internal method to print a subtree rooted at t in sorted order.
*/
void printTree( BinaryNode *t, ostream & out ) const
{
if( t != nullptr )
{
printTree( t->left, out );
out << t->element << endl;
printTree( t->right, out );
}
}
/**
* Internal method to clone subtree.
*/
BinaryNode * clone( BinaryNode *t ) const
{
if( t == nullptr )
return nullptr;
else
return new BinaryNode{ t->element, clone( t->left ), clone( t->right ) };
}
};
#endif
Some points of note:
- ➀ : The nested BinaryNode struct implements individual tree nodes. It shows the very characteristic structure of binary tree nodes: a data member to hold the “real” data value, and a pair of pointers to other tree nodes, one for the left subtree and one for the right subtree. It is private here so that applications cannot get access to the internals of this tree.
  - That may be overkill: making the pointer to the root ➁ private would probably have sufficed. It would mean that application programmers could create and access their own tree node objects, but would still have no access to the specific nodes that make up this particular tree.
- ➂ : The BinarySearchTree template represents the entire tree (the whole collection of related nodes), with functions for searching, insertion, iteration, etc.
- Our primary focus in this lecture will be on the contains ➃ , insert ➄ , and remove ➅ functions.
- Many of the functions making up this tree are implemented as a simple public function (e.g., ➄ ) that passes the tree root to a similarly named, private, “internal” function (e.g., ➆ ) that uses recursion (starting from that root) to do the actual work.
2 Implementing Binary Search Trees
Since you have, presumably, read your text’s discussion of how to implement BSTs, I’m mainly going to hit the high points.
2.1 Searching a Binary Tree
We’ll start by reviewing the basic searching algorithm.
/**
* Returns true if x is found in the tree.
*/
bool contains( const Comparable & x ) const ➃
{
return contains( x, root );
}
The tree’s contains operation works by using a private utility function, also named contains, to find the node containing the desired data by starting a search from the root.
We search a tree by comparing the value we’re searching for to the “current” node, t. If the value we want is smaller, we look in the left subtree. If the value we want is larger, we look in the right subtree. If it is neither, we have found it; and if we reach a null pointer, the value is not in the tree.
You may note that this algorithm bears a certain resemblance to the binary search algorithm we studied earlier in the semester. We shall see shortly that the performance of both search algorithms on a collection of N items is $O(\log N)$ (for trees, provided the tree is reasonably balanced), but that binary trees support faster insertion operations, allowing us to build the searchable collection in less time than when using binary search over sorted arrays.
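To see the cost of a search directly, here is a small sketch that follows the same left/right decisions but also counts how many nodes it examines. The `Node` struct and the `containsCounting` function are hypothetical names introduced for this illustration; they are not part of the textbook class.

```cpp
// Hypothetical node type for illustration; not the textbook's BinaryNode.
struct Node {
    int value;
    Node* left;
    Node* right;
};

// Searches the subtree rooted at t for x, making the same decisions as the
// contains function above. Each node examined adds one to visited, so the
// caller can observe how many levels of the tree the search touched.
bool containsCounting(const Node* t, int x, int& visited) {
    while (t != nullptr) {
        ++visited;
        if (x < t->value)
            t = t->left;        // x can only be in the left subtree
        else if (t->value < x)
            t = t->right;       // x can only be in the right subtree
        else
            return true;        // neither smaller nor larger: a match
    }
    return false;               // ran off the bottom of the tree: x is absent
}
```

The count never exceeds the number of nodes on the path from the root to the deepest leaf, which is why the tree's height governs the running time.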
The code discussed here is available as an animation that you can run to see how it works.
2.2 Inserting into Binary Search Trees
/**
* Insert x into the tree; duplicates are ignored.
*/
void insert( const Comparable & x ) ➀
{
insert( x, root );
}
⋮
/**
* Internal method to insert into a subtree.
* x is the item to insert.
* t is the node that roots the subtree.
* Set the new root of the subtree.
*/
void insert( const Comparable & x, BinaryNode * & ➁ t )
{
if( t == nullptr )
t = new BinaryNode{ x, nullptr, nullptr }; ➂
else if( x < t->element )
insert( x, t->left );
else if( t->element < x )
insert( x, t->right );
else
; // Duplicate; do nothing
}
- We start, again, with a public function ➀ that simply passes the buck to a recursive version, telling it to start from the root.
- Note that the recursive function receives this pointer as a reference to a pointer ➁ , meaning that it can change the value of the pointer that it was given.
- It does this specifically when our traversal brings us to a null pointer in the tree ➂ , indicating that this is the place where we want to insert a new tree node with a copy of the data that we are trying to insert.
- But how do we find the place at which to insert that new node? Basically, we ask “where would we go if we were searching for this data in the tree?”
- The remaining code in this function is almost exactly copied from the search code in the earlier contains function.
The code discussed here is available as an animation that you can run to see how it works.
Experiment with inserting nodes into binary search trees. Take particular note of what happens if you insert data in ascending or descending order, as opposed to inserting “randomly” ordered data.
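If the animation is not at hand, a stripped-down version of the insertion code can be used for the same experiments. This is a sketch under stated assumptions: the `Node` struct holds only `int` values, and `insertValue`, `inOrder`, and `destroy` are hypothetical helper names, not the textbook's interface. The point to watch is the `Node*&` parameter: it refers to whichever link we followed to arrive at the current position, so assigning to it attaches the new node to its parent (or to the root pointer) without any special case.

```cpp
#include <vector>

// Hypothetical simplified node (int values only); not the textbook's BinaryNode.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

// t is a reference to the link we followed to get here: either the root
// pointer or some parent's left/right member. Writing to t therefore
// changes the tree itself, not a local copy of the pointer.
void insertValue(Node*& t, int x) {
    if (t == nullptr)
        t = new Node(x);              // found the empty spot: attach here
    else if (x < t->value)
        insertValue(t->left, x);
    else if (t->value < x)
        insertValue(t->right, x);
    // else: duplicate; do nothing
}

// Appends the values of the subtree rooted at t to out in sorted order.
void inOrder(const Node* t, std::vector<int>& out) {
    if (t == nullptr)
        return;
    inOrder(t->left, out);
    out.push_back(t->value);
    inOrder(t->right, out);
}

// Deletes every node in the subtree and resets the link to null.
void destroy(Node*& t) {
    if (t == nullptr)
        return;
    destroy(t->left);
    destroy(t->right);
    delete t;
    t = nullptr;
}
```

Whatever order the values are inserted in, `inOrder` produces them sorted; only the shape of the tree changes.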
2.3 Deletion
Removing a value starts, again, with a public function that simply passes the job to a private recursive one, telling it to start from the root.
Here is the recursive part of the remove algorithm.
Looking first at the beginning of the function, we see the by-now-familiar search for the desired value. So eventually, if the data we said to remove is really in the tree, we should find it.
What do we do with it when we find it? We can’t just delete the tree node. Take a look at this tree. If we were to remove 10, 40, or 60 by simply deleting the tree node, that might work. But deleting any other node would break the tree into two or three pieces, rendering it useless.
So, we’ll need to be careful here. Let’s break this problem down into cases:
- Removing a leaf
- Removing a node that has only one child
  - only a left child
  - only a right child
- Removing a node that has two children
2.3.1 Removing a Leaf
Question: Suppose we wanted to remove the “40” from this tree. What would we have to do so that the remaining nodes would still be a valid BST?
Now, take a look at the remove function.
/**
* Internal method to remove from a subtree.
* x is the item to remove.
* t is the node that roots the subtree.
* Set the new root of the subtree.
*/
void remove( const Comparable & x, BinaryNode * & t )
{
if( t == nullptr )
return; // Item not found; do nothing
if( x < t->element )
remove( x, t->left );
else if( t->element < x )
remove( x, t->right );
else if( t->left != nullptr && t->right != nullptr ) // Two children
{
t->element = findMin( t->right )->element;
remove( t->element, t->right );
}
else
{
BinaryNode *oldNode = t;
t = ( t->left != nullptr ) ? t->left : t->right;
delete oldNode;
}
}
Look at the “leaf” case code (the final else branch), and you can see that all we do is to delete the node.
We reach this code when t points to a leaf that contains the data we want to remove. In that case, because t->left is null, we replace the address in t by t->right. If t is pointing to a leaf, then t->right is also null, so we wind up writing a null pointer into the parent node, replacing whichever of its two child pointers was the one that we followed to get to t.
So if we are removing a tree leaf, we “replace” it by a null pointer.
2.3.2 Removing A Non-Leaf Node with a Null Right Child
Question: Suppose we wanted to remove the “20” or the “70” from this tree. What would we have to do so that the remaining nodes would still be a valid BST?
For example, starting from the tree shown here, verify for yourself that, if we remove 20 or 70 in this manner, the results are still valid BSTs.
Looking again at the remove function,
/**
* Internal method to remove from a subtree.
* x is the item to remove.
* t is the node that roots the subtree.
* Set the new root of the subtree.
*/
void remove( const Comparable & x, BinaryNode * & t )
{
if( t == nullptr )
return; // Item not found; do nothing
if( x < t->element )
remove( x, t->left );
else if( t->element < x )
remove( x, t->right );
else if( t->left != nullptr && t->right != nullptr ) // Two children
{
t->element = findMin( t->right )->element;
remove( t->element, t->right );
}
else
{
BinaryNode *oldNode = t;
t = ( t->left != nullptr ) ? t->left : t->right;
delete oldNode;
}
}
we talked about this code for the leaf case, but we also come to the same code when the node with the desired data has exactly one null child pointer.
If, in this case, the right child is null but the left child is not, then we replace the parent’s pointer to this node, t, by t->left, so it winds up pointing directly to the only child of the node holding the data we want to remove.
2.3.3 Removing A Non-Leaf Node with a Null Left Child
This tree does not feature any non-leaf nodes with null left children, but examination of that same code will show that there is a symmetry with the prior case. If t->left is null but t->right is not, then we force the parent’s pointer t to change to point to t->right.
2.3.4 Removing a Node with Two Non-Null Children
Suppose we wanted to remove the “50” or the “30” from this tree. What would we have to do so that the remaining nodes would still be a valid BST?
This is a hard case. Clearly, if we remove either the “50” or “30” nodes, we break the tree into pieces, with no obvious place to put the now-detached subtrees.
So let’s take a different tack. Instead of deleting this node, is there some other data value that we could put into that node that would preserve the BST ordering (all nodes to the left must be less, all nodes to the right must be greater)?
There are, in fact, two values that we could safely put in there: the smallest value from the right subtree, or the largest value from the left subtree.
We can find the largest value on the left by
- taking one step to the left
- then running as far down to the right as we can go
We can find the smallest value on the right by
- taking one step to the right
- then running as far down to the left as we can go
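Both walks can be sketched in a few lines. The code below assumes a hypothetical `Node` struct with `int` values, and the function names `largestOnLeft` and `smallestOnRight` are made up for this illustration (the textbook code uses `findMin` on the right subtree for the second walk).

```cpp
// Hypothetical node type for illustration; not the textbook's BinaryNode.
struct Node {
    int value;
    Node* left;
    Node* right;
};

// One step to the left, then as far right as possible.
// Precondition: t and t->left are non-null.
const Node* largestOnLeft(const Node* t) {
    const Node* p = t->left;
    while (p->right != nullptr)
        p = p->right;
    return p;               // p->right is null here by construction
}

// One step to the right, then as far left as possible.
// Precondition: t and t->right are non-null.
const Node* smallestOnRight(const Node* t) {
    const Node* p = t->right;
    while (p->left != nullptr)
        p = p->left;
    return p;               // p->left is null here by construction
}
```

Each loop stops only when the pointer it is following is null, so the node returned always has at least one null child.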
Now, if we replace “30” by the largest value from the left, or by the smallest value from the right, the results are properly ordered for a BST, except, arguably, for the node we just copied the value from. But since that node is now redundant, we can delete it from the tree.
And here’s the best part. Since we find the node to copy from by running as far as we can go in one direction or the other, we know that the node we copied from has at least 1 null child pointer (otherwise we would have kept running past it). So removing it from the tree will always fall into one of the earlier, simpler cases (leaf or only one child).
Again, take a look at the code for removing a node.
/**
* Internal method to remove from a subtree.
* x is the item to remove.
* t is the node that roots the subtree.
* Set the new root of the subtree.
*/
void remove( const Comparable & x, BinaryNode * & t )
{
if( t == nullptr )
return; // Item not found; do nothing
if( x < t->element )
remove( x, t->left );
else if( t->element < x )
remove( x, t->right );
else if( t->left != nullptr && t->right != nullptr ) // Two children
{
t->element = findMin( t->right )->element;
remove( t->element, t->right );
}
else
{
BinaryNode *oldNode = t;
t = ( t->left != nullptr ) ? t->left : t->right;
delete oldNode;
}
}
- The test t->left != nullptr && t->right != nullptr checks whether we are actually trying to remove from a node with two non-null children.
- The call findMin( t->right ) does the “step to the right, then run to the left” behavior we have just described in order to find the replacement value. That value is then used to replace the value in this node, the one we want to remove.
- The remaining call, remove( t->element, t->right ), then removes the replacement value from the right subtree.
The code discussed here is available as an animation that you can run to see how it works. Try this on a variety of trees and nodes. Try to observe each of the major cases, as outlined here, in action.
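For experimenting without the animation, the sketch below restates the removal logic on a simplified node type. It is an illustration under stated assumptions: `Node` holds only `int` values, and `insertValue`, `removeValue`, `inOrder`, and `destroy` are hypothetical names, not the textbook's interface. The logic of `removeValue` mirrors the remove function shown above.

```cpp
#include <vector>

// Hypothetical simplified node (int values only); not the textbook's BinaryNode.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

void insertValue(Node*& t, int x) {
    if (t == nullptr)            t = new Node(x);
    else if (x < t->value)       insertValue(t->left, x);
    else if (t->value < x)       insertValue(t->right, x);
    // else: duplicate; do nothing
}

// Smallest node in a non-empty subtree: run left as far as possible.
const Node* findMin(const Node* t) {
    while (t->left != nullptr)
        t = t->left;
    return t;
}

void removeValue(Node*& t, int x) {
    if (t == nullptr)
        return;                                   // not found; do nothing
    if (x < t->value)
        removeValue(t->left, x);
    else if (t->value < x)
        removeValue(t->right, x);
    else if (t->left != nullptr && t->right != nullptr) {
        // Two children: copy up the smallest value on the right, then
        // remove that value from the right subtree (a simpler case).
        t->value = findMin(t->right)->value;
        removeValue(t->right, t->value);
    } else {
        // Leaf or one child: splice the node out by redirecting the
        // link we followed to reach it.
        Node* old = t;
        t = (t->left != nullptr) ? t->left : t->right;
        delete old;
    }
}

// Appends the values of the subtree rooted at t to out in sorted order.
void inOrder(const Node* t, std::vector<int>& out) {
    if (t == nullptr)
        return;
    inOrder(t->left, out);
    out.push_back(t->value);
    inOrder(t->right, out);
}

// Deletes every node in the subtree and resets the link to null.
void destroy(Node*& t) {
    if (t == nullptr)
        return;
    destroy(t->left);
    destroy(t->right);
    delete t;
    t = nullptr;
}
```

Inserting 30, 20, 50, 10, 40, 70, 60 in that order gives a tree that matches the description in the text (leaves 10, 40, 60; a null right child at 20 and 70; two children at 30 and 50). That the original figure has exactly this shape is an assumption. Removing 40, then 20, then 30 from it exercises the leaf, one-child, and two-children cases in turn.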
3 How Fast Are Binary Search Trees?
Each step in the BST insert and contains algorithms moves one level deeper in the tree. Similarly, in remove, the only part that is not constant time is the “running down the tree” to find the smallest value to the right.
The number of recursive calls/loop iterations in all these algorithms is therefore no greater than the height of the tree.
But how high can a BST be?
That depends on how well the tree is “balanced”.
3.1 Balancing
A binary tree is balanced if, for every interior node, the heights of its two children differ by at most 1.
Unbalanced trees are easy to obtain.
This is a BST.
But, so is this!
The shape of the tree depends upon the order of insertions. Try out the tree insertion in an animation. Try running this again. This time, clear the tree and then insert the values 1,2,3,4,5. Then clear the tree again and insert the values 8,6,4,2.
The worst case behavior for binary search trees is when the data being inserted is already in order (or in reverse order). In that case, the tree degenerates into a sorted linked list.
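The effect of insertion order on height can be measured with a short sketch. The `Node` struct and the helper names (`insertValue`, `height`, `destroy`, `heightAfterInserting`) are hypothetical, introduced only for this illustration. Height is counted here in edges, so an empty tree has height -1 and a single node has height 0; that convention is an assumption of this sketch.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical simplified node (int values only); not the textbook's BinaryNode.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

void insertValue(Node*& t, int x) {
    if (t == nullptr)            t = new Node(x);
    else if (x < t->value)       insertValue(t->left, x);
    else if (t->value < x)       insertValue(t->right, x);
    // else: duplicate; do nothing
}

// Height in edges: -1 for an empty tree, 0 for a single node.
int height(const Node* t) {
    if (t == nullptr)
        return -1;
    return 1 + std::max(height(t->left), height(t->right));
}

// Deletes every node in the subtree and resets the link to null.
void destroy(Node*& t) {
    if (t == nullptr)
        return;
    destroy(t->left);
    destroy(t->right);
    delete t;
    t = nullptr;
}

// Builds a tree by inserting the values in the order given,
// then reports the height of the result.
int heightAfterInserting(const std::vector<int>& values) {
    Node* root = nullptr;
    for (int x : values)
        insertValue(root, x);
    int h = height(root);
    destroy(root);
    return h;
}
```

With this convention, inserting 1,2,3,4,5 yields height 4 and inserting 8,6,4,2 yields height 3 (each new value adds a level), whereas inserting 4,2,6,1,3,5,7 yields height 2 for seven values.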
The best case is when the inserted data yields a tree that is balanced, meaning that, for each node, the heights of the node’s children are nearly the same.
3.2 Performance
Consider the contains operation on a nearly balanced tree with N nodes.
Question: What is the complexity of the best case?
- $O(1)$
- $O(\log N)$
- $O(N)$
- $O(N \log N)$
- $O(N^2)$
Question: Consider the contains operation on a nearly balanced tree with N nodes.
What is the complexity of the worst case?
- $O(1)$
- $O(\log N)$
- $O(N)$
- $O(N \log N)$
- $O(N^2)$
But how high is a balanced tree?
A nearly balanced tree will have height approximately $\log N$.
Consider a tree that is completely balanced and has its lowest level full. Since every node on the lowest level shares a parent with one other, there will be exactly half as many nodes on the next-to-lowest level as on the lowest. And, by the same reasoning, each level will have half as many nodes as the one below it, until we finally get to the single root at the top of the tree.
So a balanced tree has height $\log N$, and searching a balanced binary tree would be $O(\log N)$.
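The halving argument can be made concrete. As a sketch, assume the height $h$ is counted in edges from the root to the lowest level (a convention chosen here, not one stated in these notes), so that level $i$ holds $2^i$ nodes for $i = 0, \ldots, h$:

$$N = \sum_{i=0}^{h} 2^i = 2^{h+1} - 1 \quad\Longrightarrow\quad h = \log_2(N+1) - 1$$

For example, $N = 7$ gives $h = 2$ and $N = 1023$ gives $h = 9$, so the height grows as $O(\log N)$.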
Question: Consider the contains operation on a degenerate tree with N nodes.
What is the complexity of the worst case?
- $O(1)$
- $O(\log N)$
- $O(N)$
- $O(N \log N)$
- $O(N^2)$
There’s quite a difference, then, between the worst case behavior of trees, depending upon the tree’s “shape”.
3.3 Average-Case
So we might wonder, then, does the “average” binary tree look more like the balanced or the degenerate case?
An intuitive argument is:
- No tree with $n$ nodes has $\mbox{height} < \log{n}$.
- No tree with $n$ nodes has $\mbox{height} > n$.
- The average depth of all nodes is therefore somewhere between $n/2$ and $(\log n)/2$.
- The more unbalanced a tree is, the less likely that a random insertion would increase the tree height.
For example, if we are inserting into this tree, then any insertion will increase the tree’s height.
But if we were inserting a randomly selected value into this one, then there is only a $2/8$ chance that we will increase the height of the tree.
For trees that are somewhere between those two extremes, the chances of a random insertion actually increasing the height of the tree will fall somewhere between those two probability extremes.
- Insertions that don’t increase the tree height make the tree more balanced.
So, the more unbalanced a tree is, the more likely that a random insertion will actually tend to increase the balance of the tree.
This suggests (but does not prove) that randomly constructed binary search trees tend to be reasonably balanced.
It is possible to prove this claim, but the proof is beyond the scope of this class.
But, yes, we expect randomly created binary search trees will be reasonably balanced.
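This expectation can be explored empirically, though an experiment is no substitute for the proof. The sketch below inserts the values 1..n in a pseudo-random order and reports the resulting height. All names (`Node`, `insertValue`, `height`, `destroy`, `randomTreeHeight`) are hypothetical, height is counted in edges (an assumption of this sketch), and the shuffle is written out by hand so that the order depends only on the seed given to `std::mt19937`.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical simplified node (int values only); not the textbook's BinaryNode.
struct Node {
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

void insertValue(Node*& t, int x) {
    if (t == nullptr)            t = new Node(x);
    else if (x < t->value)       insertValue(t->left, x);
    else if (t->value < x)       insertValue(t->right, x);
    // else: duplicate; do nothing
}

// Height in edges: -1 for an empty tree, 0 for a single node.
int height(const Node* t) {
    if (t == nullptr)
        return -1;
    return 1 + std::max(height(t->left), height(t->right));
}

// Deletes every node in the subtree and resets the link to null.
void destroy(Node*& t) {
    if (t == nullptr)
        return;
    destroy(t->left);
    destroy(t->right);
    delete t;
    t = nullptr;
}

// Inserts 1..n in a pseudo-random order determined by seed and
// returns the height of the resulting tree (-1 if n <= 0).
int randomTreeHeight(int n, unsigned seed) {
    if (n <= 0)
        return -1;
    std::vector<int> values(static_cast<std::size_t>(n));
    std::iota(values.begin(), values.end(), 1);

    // Fisher-Yates shuffle. Using % introduces a slight bias, which
    // does not matter for an informal experiment like this one.
    std::mt19937 rng(seed);
    for (std::size_t i = values.size(); i > 1; --i) {
        std::size_t j = rng() % i;
        std::swap(values[i - 1], values[j]);
    }

    Node* root = nullptr;
    for (int x : values)
        insertValue(root, x);
    int h = height(root);
    destroy(root);
    return h;
}
```

For n = 1023, a perfectly balanced tree would have height 9 and a degenerate one height 1022. Running `randomTreeHeight(1023, seed)` for several seeds shows where randomly built trees fall between those two extremes.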
But, it’s not safe to be too confident about the height of binary search trees. Although random construction tends to yield reasonable balance, in real applications we often do not get random values.
Question: Which of the following data would, if inserted into an initially empty binary search tree, yield a degenerate tree?
- data that is in ascending order
- data that is in descending order
- both of the above
- none of the above
3.4 Can We Avoid the Worst Case?
Data in either ascending or descending order results in a degenerate tree.
It’s very common to get data that is in sorted or almost sorted order, so degenerate behavior turns out to be more common than we might expect.
Also, the arguments made so far don’t take deletions into account, which tend to unbalance trees.
Later, we’ll look at variants of the binary search tree that use more elaborate insertion and deletion algorithms to maintain tree balance.