Most of the data structures we have looked at so far have been devoted to keeping a collection of elements in some linear order.
digraph g {
graph [
rankdir = "UD"
];
ODU -> Academic_Affairs
ODU -> Admin_and_Finance
Academic_Affairs -> Engineering
Academic_Affairs -> Science
Academic_Affairs -> Arts_and_Letters
Science -> Physics
Science -> Computer_Science
Science -> Mathematics
Admin_and_Finance -> Financial_Serv
Admin_and_Finance -> Public_Safety
}
Trees are the most common non-linear data structure in computer science. Trees are useful in representing things that naturally occur in hierarchies
Trees also turn out to be exceedingly useful in implementing associative containers like std::set
.
*Compare this to the data structures we’ve seen so far, which may allow us to search in $O(\log N)$ time but insert in $O(N)$, or insert in $O(1)$ but search in $O(N)$.
A tree is a collection of nodes.
If nonempty, the collection includes a designated node r
, the root, and zero or more (sub)trees T1
, T2
, … , Tk
, each of whose roots are connected by an edge to r
.
The collection of nodes shown here is a tree. We can designate A as the root, and we note that the collections of nodes {B}, {C, F}, {D}, and {E,G,H,I}, together with their edges, are all trees whose own roots are connected to A.
Focusing on a tree as a collection of nodes leads to some other terminology:
Each node except the root has a parent.
Parent nodes have children. Nodes without children are leaves.
Nodes with the same parent are siblings.
A tree in which every parent has at most 2 children is a binary tree.
Trees in which parents may have more than 2 children are general trees.
This, then is a general tree.
\[ \forall i, 1 \leq i < k, n_i \; \mbox{is the parent of} \; n_{\mbox{i+1}}. \]
The length of a path is the number of edges in it.
$n_1$ is an ancestor of $n_k$.
$n_k$ is a descendant of $n_1$.
Question: Which of the following sequences of nodes are paths? (May be more than one)
[C, A, E, G]
[A, E, G]
[E]
[]
[C, A, E, G] is not a path, because C is not a parent of A.
[A, E, G] is a path. Each node in the sequence is a parent of the one that follows it.
[E] is a path. The definition says a path is a sequence of nodes such that for each successive pair of nodes, the first is the parent of the other. Since there are no such pairs, the definition reduces to “a sequence of nodes” in this case.
Empty sequences are paths.
The depth of a node is the length of the path from the root to that node.
The height of a node is the length of the longest path from it to any leaf.
The height of an empty tree is -1.
Question: What is the height of E?
The height of E is 1, because the longest path from E to any leaf is of length 1. (If you answered “2”, remember that the length of a path is computed by counting the edges, not the nodes.
Many algorithms for manipulating trees need to “traverse” the tree, to visit each node in the tree and process the data in that node. In this section, we’ll look at some prototype algorithms for traversing trees, mainly using recursion.
Later, we’ll look at how to devise iterators for tree traversal. But iterators are primarily for application code. The underlying implementation of a tree-based ADT will still need to employ the kinds of algorithms we are about to discuss.
A pre-order traversal is one in which the data of each node is processed before
visiting any of its children.
A post-order traversal is one in which the data of each node is processed after
visiting all of its children.
An in-order traversal is one in which the data of each node is processed after visiting its left child but before visiting its right child.
This traversal is specific to binary trees.
A level-order traversal is one in which all nodes of the same height are visited before any lower nodes.
Compilers, interpreters, spreadsheets, and other programs that read and evaluate arithmetic expressions often represent those expressions as trees. Constants and variables go in the leaves, and each non-leaf node represents the application of some operator to the subtrees representing its operands. The tree here, for example, shows the product of a sum and of a subtraction.
If we were to traverse this tree, printing each node as we visit it, we would get:
Compare this to ((13+a)*(x-1)), the “natural” way to write this expression. You can see that in-order traversal yields the normal way to write this expression except for parentheses. If we made our output routine put parentheses around everything, we would actually get an algebraically correct expression.
When applied to a “binary search tree”, which we will introduce in a later lesson, in-order traversal processes the nodes in sorted order.
Post-order traversal yields post-fix notation, which in turn is related to stack-based algorithms for expression evaluation.
// represents a node in a binary tree
template <typename T>
class BiTreeNode
{
public:
// BiTreeNode is a class implementation structure. making the
// data public simplifies building class functions
T nodeValue;
BiTreeNode<T> *left, *right;
// default constructor. data not initialized
BiTreeNode()
{}
// initialize the data members
BiTreeNode (const T& item, BiTreeNode<T> *lptr = nullptr,
BiTreeNode<T> *rptr = nullptr):
nodeValue(item), left(lptr), right(rptr)
{}
};
Let’s suppose that we have a binary tree whose nodes are declared as shown here.
This is a typical binary tree structure, with a field for data and pointers for up to two children. If the node is a leaf, both the left
and right
pointers will be null. If the node has only one child, either the left
or right
will be null.
It’s fairly easy to write pre-, in-, and post-order traversal algorithms using recursion.
template <typename T>
void basicTraverse (BiTreeNode<T>* t)
{
if (t != 0)
{
basicTraverse(t->left);
basicTraverse(t->right);
}
}
This is the basic structure for a recursive traversal. If this function is called with a null pointer, we do nothing. But if we have a real pointer to some node, we invoke the function recursively on the left and right subtrees. In this manner, we will eventually visit every node in the tree.
But we can convert the basic traversal into a pre-order traversal by applying the rule:
process the node before visiting its children
template <typename T>
void preorder(BiTreeNode<T> *t)
{
// the recursive scan terminates on a empty subtree
if (t != nullptr)
{
doSomethingWith (t->nodeValue);
preorder(t->left); // descend left
preorder(t->right); // descend right
}
}
We get a post-order traversal by applying the rule:
process the node after visiting its children
template <typename T>
void postorder(BiTreeNode<T> *t)
{
// the recursive scan terminates on a empty subtree
if (t != nullptr)
{
postorder(t->left); // descend left
postorder(t->right); // descend right
doSomethingWith (t->nodeValue);
}
}
And we get an in-order traversal by applying the rule:
process the node after visiting its left descendants and before visiting its right descendants.
template <typename T>
void inorder(BiTreeNode<T> *t)
{
// the recursive scan terminates on a empty subtree
if (t != nullptr)
{
inorder(t->left); // descend left
doSomethingWith (t->nodeValue);
inorder(t->right); // descend right
}
}
Note that, while pre- and post- order traversals can be applied to trees with any number of children, in-order really only makes sense when applied to binary trees.
Try out the traversals in an animation. Try these with different trees until you are comfortable with them.
When we work with general trees, we have to allow for an arbitrary number of children. So, instead of data members like left
and right
to hold individual children, we use a sequence container of some type to hold pointers to all of the children.
// represents a node in a binary tree
template <typename T>
class TreeNode
{
public:
// TreeNode is a class implementation structure. making the
// data public simplifies building class functions
T nodeValue;
std::vector<TreeNode<T> > children;
// default constructor. data not initialized
TreeNode()
{}
// initialize the data members
TreeNode (const T& item):
nodeValue(item)
{}
// Add a child
TreeNode<T>& addChild(TreeNode<T>* newChild)
{
children.push_back(newChild);
return *this;
}
};
Let’s suppose that we have a general tree whose nodes are declared as shown here.
We have used a vector to hold the pointers to the children. Unlike binary trees, we won’t use null pointers to indicate that a child is missing. Anything that is “missing” will simply never have been added to the vector. If the node is a leaf, the children
vector will have size zero.
A basic traversal of a general tree looks like this
template <typename T>
void basicTraverse (TreeNode<T>* t)
{
if (t != 0)
{
for (TreeNode<T>* child: t->children)
basicTraverse(child);
}
}
This is the basic structure for a recursive traversal. If this function is called with a null pointer, we do nothing. But if we have a real pointer to some node, we invoke the function recursively on the children. In this manner, we will eventually visit every node in the tree.
It’s fairly easy to rewrite the pre- and post-order traversal algorithms to accommodate the arbitrary number of children.
Again, to make this useful, we need to add code to actually do something with the data in each node.
The most common places to do this are just before or just after visiting all the children.
process the node before visiting its children
template <typename T>
void preorder(TreeNode<T> *t)
{
if (t != 0)
{
doSomethingWith (t->nodeValue);
for (TreeNode<T>* child: t->children)
basicTraverse(child);
}
}
process the node after visiting its children
template <typename T>
void postorder(TreeNode<T> *t)
{
template <typename T>
void preorder(TreeNode<T> *t)
{
if (t != 0)
{
for (TreeNode<T>* child: t->children)
basicTraverse(child);
doSomethingWith (t->nodeValue);
}
}
}
Note that, while pre- and post- order traversals can be applied to trees with any number of children, in-order really only makes sense when applied to binary trees.
This form of traversal is different from the others. In a level-order traversal, we visit the root, then all elements 1 level below the root, then all elements two levels below the root, and so on. Unlike the other traversals, elements visited successively may not be related in the parent-child sense except for having the root as a common (and possibly distant) ancestor.
To program a level-order traversal, we use a queue to keep track of nodes at the next lower level that need to be visited.
Here’s an algorithm for level-order traversal of a general tree. You can adapt this for us with binary trees as well, though in my experience level-order traversals seem more likely to arise in general-tree problems.
template <typename T>
void levelOrder (TreeNode<T>* t)
{
// store siblings of each node in a queue so that they are
// visited in order at the next level of the tree
queue<TreeNode<T> *> q;
TreeNode<T> *p;
// initialize the queue by inserting the root in the queue
q.push(t);
// continue the iterative process until the queue is empty
while(!q.empty())
{
// delete front node from queue and output the node value
p = q.front();
q.pop();
doSomethingWith (t->nodeValue);
// Add the children onto the queue for future processing
for (TreeNode<T>* child: children)
q.push(child);
}
}
The code discussed here is available as an animation that you can run to see how it works.
As an example of applying a tree traversal, let’s consider the problem of computing the height of a tree.
Remember that the height of a tree node was defined as “the length of the longest path from it to any leaf”.
That means that, if we knew the height of each of that node’s children, the height of this node would be one more than that of it’s “tallest” child.
int height = 1 + max(leftChildHeight, rightChildHeight);
This suggests that we much compute the height of a node’s children before we can compute the height of the node itself. That means that we can look to a post-order traversal: compute something for each child before computing it for the parent.
Our basic post-order traversal code looks like:
template <typename T>
void postorder(BiTreeNode<T> *t)
{
// the recursive scan terminates on a empty subtree
if (t != nullptr)
{
postorder(t->left); // descend left
postorder(t->right); // descend right
doSomethingWith (t->nodeValue);
}
}
To use this to compute the height, we would do
template <typename T>
int height(BiTreeNode<T> *t)
{
// the recursive scan terminates on a empty subtree
if (t != nullptr)
{
int leftChildHeight = height(t->left); // descend left
int rightChildHeight = height(t->right); // descend right
int height = 1 + max(leftChildHeight, rightChildHeight);
return height;
} else {
return -1; // height of an empty tree is -1
}
}
You should still be able to recognize the pattern of a post-order traversal in this code.
We could even streamline this as
template <typename T>
int height(BiTreeNode<T> *t)
{
// the recursive scan terminates on a empty subtree
if (t != nullptr)
{
return 1 + max(height(t->left), height(t->right));
} else {
return -1; // height of an empty tree is -1
}
}
and it would still be a post-order traversal (because the height calculation does not take place until we have returned form both recursive calls), even if it is a little harder to recognize as such.
A general tree version of this works along similar lines:
template <typename T>
int height(TreeNode<T> *t)
{
// the recursive scan terminates on a empty subtree
if (t != nullptr)
{
int maxHeight = -1;
for (TreeNode<T>* child: t->children)
{
maxHeight = max(maxHeight, height(child));
}
return 1 + maxHeight;
} else {
return -1; // height of an empty tree is -1
}
}