Trees

Steven J. Zeil

Last modified: Jul 16, 2014

Contents:
1. Tree Terminology
2. Tree Traversal
2.1 Recursive Traversals
3. Example: Processing Expressions
4. Example: Processing XML
5. Using Trees for Searching
5.1 How Fast Are Binary Search Trees?

Most of the data structures we have looked at so far have been devoted to keeping a collection of elements in some linear order.


Trees

Trees are the most common non-linear data structure in computer science.

Trees also turn out to be exceedingly useful in implementing fast searching and insertion.

1. Tree Terminology


Definition

A tree is a collection of nodes.

If nonempty, the collection includes a designated node r, the root, and zero or more (sub)trees T1, T2, … , Tk, each of whose roots is connected by an edge to r.


A Tree

It’s a subtle but important point that, in discussing trees, we sometimes treat the things connected to the root as individual nodes, and at other times as entire trees.
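The definition maps naturally onto a recursive data type. The following is a minimal sketch, assuming a hypothetical `TreeNode` struct (not part of these notes) that keeps its subtrees in a `std::vector`. The `countNodes` function treats each thing connected to the root as an entire tree.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical general-tree node: a label plus zero or more subtrees.
struct TreeNode
{
  std::string label;
  std::vector<TreeNode> children;  // the roots of T1, T2, ..., Tk
};

// Number of nodes in the tree rooted at t: the root itself
// plus the nodes of each subtree.
std::size_t countNodes (const TreeNode& t)
{
  std::size_t count = 1;
  for (const TreeNode& child : t.children)
    count += countNodes(child);
  return count;
}
```

A node with an empty `children` vector is a leaf, so `countNodes` returns 1 for it.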


Botanical Families


Binary and General Trees

2. Tree Traversal


Tree Traversal

Many algorithms for manipulating trees need to “traverse” the tree, to visit each node in the tree and process the data in that node. In this section, we’ll look at some prototype algorithms for traversing trees.


Kinds of Traversals


Example: Expression Trees

The tree here, for example, shows the product of a sum and of a subtraction.


Traversing an Expression Tree

Pre-order:

* + 13 a - x 1

In-order:

13 + a * x - 1

Compare to ((13+a)*(x-1)), the “natural” way to write this expression.

Post-order:

13 a + x 1 - *

Post-order traversal yields post-fix notation, which in turn is related to algorithms for expression evaluation.
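The connection to evaluation can be sketched with a stack: scanning the post-fix form from left to right, we push operands, and each operator pops its two operands and pushes the result. This is a minimal illustration, not code from these notes. The function name `evaluatePostfix` and the use of a `std::map` to supply variable values are assumptions made for the example.

```cpp
#include <map>
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Evaluate a space-separated post-fix expression such as "13 a + x 1 - *".
// Variable values are looked up in vars (a hypothetical helper structure).
int evaluatePostfix (const std::string& postfix,
                     const std::map<std::string, int>& vars)
{
  std::stack<int> operands;
  std::istringstream in(postfix);
  std::string token;
  while (in >> token)
    {
      if (token == "+" || token == "-" || token == "*")
        {
          if (operands.size() < 2)
            throw std::invalid_argument("operator is missing an operand");
          int right = operands.top(); operands.pop();
          int left = operands.top(); operands.pop();
          if (token == "+") operands.push(left + right);
          else if (token == "-") operands.push(left - right);
          else operands.push(left * right);
        }
      else if (vars.count(token) > 0)
        operands.push(vars.at(token));     // a variable
      else
        operands.push(std::stoi(token));   // a numeric constant
    }
  if (operands.size() != 1)
    throw std::invalid_argument("malformed post-fix expression");
  return operands.top();
}
```

With a = 2 and x = 5, the post-fix string from the example evaluates to (13+2)*(5-1) = 60.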

2.1 Recursive Traversals


Recursive Traversals

Let’s suppose that we have a binary tree whose nodes are declared as shown here.

struct BinaryTreeNode 
{
   T data;
   BinaryTreeNode* left;
   BinaryTreeNode* right;
};

It’s fairly easy to write pre-, in-, and post-order traversal algorithms using recursion.


A Basic Traversal

void basicTraverse (BinaryTreeNode* t)
{
  if (t != 0)
    {
      basicTraverse(t->left);
      basicTraverse(t->right);
    }
}

This is the basic structure for a recursive traversal. If this function is called with a null pointer, we do nothing. But if we have a real pointer to some node, we invoke the function recursively on the left and right subtrees. In this manner, we will eventually visit every node in the tree. The problem is, this basic form doesn’t do anything.


Pre-Order Traversals

But we can convert it into a pre-order traversal by applying the rule:

process the node before visiting its children


void preOrder (BinaryTreeNode* t)
{
  if (t != 0)
    {
      foo (t->data);
      preOrder(t->left);
      preOrder(t->right);
    }
}



Post-Order Traversals


We get a post-order traversal by applying the rule:

process the node after visiting its children


void postOrder (BinaryTreeNode* t)
{
  if (t != 0)
    {
      postOrder(t->left);
      postOrder(t->right);
      foo (t->data);
    }
}


In-Order Traversals

And we get an in-order traversal by applying the rule:

process the node after visiting its left descendants and before visiting its right descendants.


void inOrder (BinaryTreeNode* t)
{
  if (t != 0)
    {
      inOrder(t->left);
      foo (t->data);
      inOrder(t->right);
    }
}
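To see the three rules side by side, the sketch below rebuilds the tree for the expression (13 + a) * (x - 1) and collects the node data in each order. It is illustrative only: the node type is fixed to `std::string` data, and the names `Node`, `Order`, `visit`, and `traverseExample` are assumptions for this example rather than part of the course code.

```cpp
#include <string>

// Hypothetical node type: BinaryTreeNode with T fixed to std::string.
struct Node
{
  std::string data;
  Node* left;
  Node* right;
};

enum Order { PRE_ORDER, IN_ORDER, POST_ORDER };

// Append the data of every node in t to out, each followed by a space,
// processing each node at the point dictated by order.
void visit (const Node* t, Order order, std::string& out)
{
  if (t == 0)
    return;
  if (order == PRE_ORDER)  out += t->data + " ";
  visit(t->left, order, out);
  if (order == IN_ORDER)   out += t->data + " ";
  visit(t->right, order, out);
  if (order == POST_ORDER) out += t->data + " ";
}

// Build the tree for (13 + a) * (x - 1) and traverse it.
std::string traverseExample (Order order)
{
  Node n13 = {"13", 0, 0}, a = {"a", 0, 0};
  Node x = {"x", 0, 0}, one = {"1", 0, 0};
  Node sum = {"+", &n13, &a};
  Node diff = {"-", &x, &one};
  Node product = {"*", &sum, &diff};
  std::string out;
  visit(&product, order, out);
  return out;
}
```

The only difference among the three traversals is where the "process the node" step sits relative to the two recursive calls.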




3. Example: Processing Expressions


Example: Processing Expressions

We’ll develop a program that can simplify certain basic arithmetic expressions, e.g., converting (2 - 2) * x + 1 * (y + z) to y + z.


The Data Structure

class Expression {
private:
  std::string opOrVarName;
  bool thisIsAConstant;
  bool thisIsAVariable;
  Expression* left;
  Expression* right;
  int value;
   ⋮

Note the left and right pointers, which give this class its essential “tree-ness”.
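To give a feel for how simplification can work on such a tree, here is a minimal sketch. It does not reproduce the course's Expression class. `Expr`, `ExprPtr`, `constant`, `variable`, `binary`, `isConst`, `simplify`, and `toString` are all hypothetical names, and only a handful of algebraic identities are applied. The key idea is a post-order traversal: simplify the children first, then the node.

```cpp
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for the Expression class:
// a leaf holds a constant or a variable name; an interior node
// holds an operator and two subtrees.
struct Expr;
typedef std::shared_ptr<const Expr> ExprPtr;

struct Expr
{
  std::string opOrVarName;  // "+", "-", "*", or a variable name
  bool isConstant;
  int value;                // meaningful only if isConstant
  ExprPtr left;
  ExprPtr right;
};

ExprPtr constant (int v)
{ return std::make_shared<Expr>(Expr{"", true, v, nullptr, nullptr}); }

ExprPtr variable (const std::string& name)
{ return std::make_shared<Expr>(Expr{name, false, 0, nullptr, nullptr}); }

ExprPtr binary (const std::string& op, ExprPtr l, ExprPtr r)
{ return std::make_shared<Expr>(Expr{op, false, 0, l, r}); }

bool isConst (const ExprPtr& e, int v)
{ return e->isConstant && e->value == v; }

// Post-order simplification: simplify both subtrees first, then
// apply a few algebraic identities at the current node.
ExprPtr simplify (const ExprPtr& e)
{
  if (!e->left)               // leaf: nothing to simplify
    return e;
  ExprPtr l = simplify(e->left);
  ExprPtr r = simplify(e->right);
  const std::string& op = e->opOrVarName;
  if (l->isConstant && r->isConstant)    // constant folding
    {
      if (op == "+") return constant(l->value + r->value);
      if (op == "-") return constant(l->value - r->value);
      if (op == "*") return constant(l->value * r->value);
    }
  if (op == "*" && (isConst(l, 0) || isConst(r, 0))) return constant(0);
  if (op == "*" && isConst(l, 1)) return r;
  if (op == "*" && isConst(r, 1)) return l;
  if (op == "+" && isConst(l, 0)) return r;
  if ((op == "+" || op == "-") && isConst(r, 0)) return l;
  return binary(op, l, r);
}

// Fully parenthesized in-order rendering, for inspecting results.
std::string toString (const ExprPtr& e)
{
  if (e->isConstant) return std::to_string(e->value);
  if (!e->left) return e->opOrVarName;
  return "(" + toString(e->left) + " " + e->opOrVarName + " "
         + toString(e->right) + ")";
}
```

Applied to the tree for (2 - 2) * x + 1 * (y + z), this sketch folds 2 - 2 to 0, reduces 0 * x to 0 and 1 * (y + z) to y + z, and finally drops the 0 from the sum.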


The Application

simplifier.cpp

expression.h

expression.cpp

We’ll look at selected parts in more detail.


Printing Expressions

printExpr.cpp


Distributive Law

distrib.cpp


Simplifying Expressions

simplify.cpp

4. Example: Processing XML


XML

XML is a markup language used for exchanging data among a wide variety of programs on the web. It is flexible enough to represent almost any kind of data.

In XML, data is described by structures consisting of nested “elements” and text.


XML Tags


An example of XML:

<Question type="Choice" id="ultimate">
<QCategory>trial</QCategory>
<Body>What is the answer to the ultimate question of life, 
the universe, and everything?</Body>
<Choices>
<Choice>a good nap</Choice>
<Choice value="1">42</Choice>
<Choice>inner peace</Choice>
<Choice>money</Choice>
</Choices>
<AnswerKey>42</AnswerKey>
<Explanation>D Adams,
The Hitchhiker's Guide to the Galaxy</Explanation>
<Revisions>5/15/2001 1:35:29 PM</Revisions>
</Question>


XML and Trees

Although it may not be obvious, XML actually describes a tree structure.

For example, the structure above shows that all the elements are inside a “Question”. One of those elements inside the Question is a “Choices” element, and each individual “Choice” occurs inside there. We can diagram this tree structure as shown here.

XML is closely related to HTML

Just a test

Nothing much to see here.


Move along.

For example, the text shown here can be produced by the following XML-legal HTML:

<html>
  <head>
    <title>Just a Test</title>
  </head>

  <body>
    <h1>Just a test</h1>
<p>Nothing much to see <a href="test.html">here</a>.
</p>
<hr/>
<p>
Move <a href="nextpage.html">along</a>.
</p>
  </body>
</html>

The tree structure for this HTML page is:

The following program reads a web page and prints a list of all links (<a> elements with href= attributes).

/** Example of tree manipulation using XML documents */

#include <iostream>

using namespace std;

#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/sax/HandlerBase.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/util/PlatformUtils.hpp>

using namespace XERCES_CPP_NAMESPACE;

DOMDocument* readXML (const char *xmlFile) 
{
  ⋮
}

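string getHrefAttribute (DOMNode* linkNode)
{
  DOMElement* linkNodeE = (DOMElement*)linkNode;
  XMLCh* href = XMLString::transcode("href");
  const XMLCh* attributeValue = linkNodeE->getAttribute(href);
  char* valueChars = XMLString::transcode(attributeValue);
  string result(valueChars);
  // transcode() allocates its result; release it to avoid leaks
  XMLString::release(&href);
  XMLString::release(&valueChars);
  return result;
}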


void processTree (DOMNode *tree)
{
  if (tree != NULL)
    {
      if (tree->getNodeType() == DOMNode::ELEMENT_NODE)
        {
          const XMLCh* elName = tree->getNodeName();
          const XMLCh* aName = XMLString::transcode("a");
          if (XMLString::equals(elName, aName))
            cout << "Link to " << getHrefAttribute(tree) << endl;
        }
      for (DOMNode* child = tree->getFirstChild();
           child != NULL; child = child->getNextSibling())
        processTree(child);
    }
}


int main(int argc, char **argv) 
{
  if (argc != 2)
    {
      cerr << "usage: " << argv[0] << " xmlfile" << endl;
      return 1;
    }

  // Initialize the Xerces XML C++ library
  try {
    XMLPlatformUtils::Initialize();
  }
  catch (const XMLException& toCatch) {
    char* message = XMLString::transcode(toCatch.getMessage());
    cout << "Error during initialization! :\n"
         << message << "\n";
    XMLString::release(&message);
    return 1;
  }

  DOMDocument* doc = readXML(argv[1]);
  if (doc == 0)
    {
      cerr << "Could not read " << argv[1] << endl;
      return 2;
    }
  processTree (doc->getDocumentElement());

  // Cleanup
  doc->release();
  return 0;
}

This program is based upon the Xerces-C++ library.

If this code is compiled and run on the HTML page we have just seen, it would print

Link to test.html
Link to nextpage.html


Processing XML as a Tree

void processTree (DOMNode *tree)
{
  if (tree != NULL)
    {
      if (tree->getNodeType() == DOMNode::ELEMENT_NODE)
        {
          const XMLCh* elName = tree->getNodeName();
          const XMLCh* aName = XMLString::transcode("a");
          if (XMLString::equals(elName, aName))
            cout << "Link to " << getHrefAttribute(tree) << endl;
        }
      for (DOMNode* child = tree->getFirstChild();
           child != NULL; child = child->getNextSibling())
        processTree(child);
    }
}

Now, this code features an interface that you have never seen before, and a lot of the details are bound to look mysterious.

Nonetheless, if you look at where the processTree function calls itself, you can readily tell that this function works by pre-order traversal.
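The same traversal pattern can be tried without installing Xerces. The sketch below uses a hypothetical `SimpleNode` struct (an assumption for illustration, not a Xerces type) that mimics only the first-child and next-sibling links of a DOM node. It collects the `href` values of all `a` elements in pre-order.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM element: each node knows its
// first child and its next sibling, as a DOMNode does.
struct SimpleNode
{
  std::string name;
  std::string href;        // empty if the element has no href attribute
  SimpleNode* firstChild;
  SimpleNode* nextSibling;
};

// Pre-order traversal: process the node, then each child in turn.
void collectLinks (const SimpleNode* tree, std::vector<std::string>& links)
{
  if (tree != 0)
    {
      if (tree->name == "a")
        links.push_back(tree->href);
      for (const SimpleNode* child = tree->firstChild;
           child != 0; child = child->nextSibling)
        collectLinks(child, links);
    }
}

// Build a reduced version of the "Just a test" page and list its links.
std::vector<std::string> linksInSamplePage ()
{
  SimpleNode a2 = {"a", "nextpage.html", 0, 0};
  SimpleNode p2 = {"p", "", &a2, 0};
  SimpleNode hr = {"hr", "", 0, &p2};
  SimpleNode a1 = {"a", "test.html", 0, 0};
  SimpleNode p1 = {"p", "", &a1, &hr};
  SimpleNode h1 = {"h1", "", 0, &p1};
  SimpleNode body = {"body", "", &h1, 0};
  SimpleNode title = {"title", "", 0, 0};
  SimpleNode head = {"head", "", &title, &body};
  SimpleNode html = {"html", "", &head, 0};
  std::vector<std::string> links;
  collectLinks(&html, links);
  return links;
}
```

Because a general tree node may have any number of children, the two fixed recursive calls of the binary traversals become a loop over the sibling chain.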

5. Using Trees for Searching


Search Trees

For large collections of data, our current data structures allow us to do fast searches or fast insertions, but not both.

Trees offer us a way out of this conflict.


Definition: Binary Search Trees

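A tree in which every parent has at most 2 children is a binary tree.

A binary tree is a binary search tree if, for each node n with left subtree \(T_{L}\) and right subtree \(T_{R}\), the value in n is greater than every value in \(T_{L}\) and less than every value in \(T_{R}\).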
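The binary search tree ordering property (each node's value is greater than everything in its left subtree and less than everything in its right subtree) can be checked mechanically. The following sketch is an illustration under assumptions, not part of the course code. It uses a hypothetical `IntNode` struct and verifies that every value lies strictly between bounds inherited from its ancestors, so it corresponds to the no-duplicates variant.

```cpp
// Hypothetical node type holding int values.
struct IntNode
{
  int value;
  IntNode* left;
  IntNode* right;
};

// True if every value in t lies strictly between *low and *high
// (a null bound means "unbounded") and every subtree is ordered.
bool isBSTBetween (const IntNode* t, const int* low, const int* high)
{
  if (t == 0)
    return true;
  if (low != 0 && !(*low < t->value))
    return false;
  if (high != 0 && !(t->value < *high))
    return false;
  return isBSTBetween(t->left, low, &t->value)
      && isBSTBetween(t->right, &t->value, high);
}

bool isBST (const IntNode* t)
{
  return isBSTBetween(t, 0, 0);
}
```

Comparing each node only with its own two children would not be enough: a value can be correctly ordered relative to its parent and still violate the property with respect to a more distant ancestor, which is why the bounds are passed down.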

The Binary Search Tree ADT

Let’s look at the basic interface for a binary search tree.

template <class T>
class node 
{
public:
  node (const T & v, 
        node<T> * left, 
        node<T> * right)
    : value(v), 
      leftChild(left), 
      rightChild(right) { }
  
  T value;
  node<T> * leftChild;
  node<T> * rightChild;
};


Searching a Binary Tree

template <class T> 
const node<T>* find (const T& element, 
                     const node<T>* t)
{ 
 if (t == NULL)
   return NULL;   // empty tree: element is not present
 if (element < t->value)
   return find(element, t->leftChild);
 else if (t->value < element)
   return find(element, t->rightChild);
 else  // t->value == element
   return t;
}

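We search a tree by comparing the value we’re searching for to the value in the “current” node.

  • If the value we want is smaller, we look in the left subtree.

  • If the value we want is larger, we look in the right subtree.

  • Otherwise, the current node holds the value we want. If we run out of nodes (reach a null pointer), the value is not in the tree.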
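Because each step simply moves from a node to one of its children, the same search can also be written as a loop. The following is a sketch under assumptions: `findIteratively` is a hypothetical name, and the `node` struct here is a simplified copy (without a constructor) of the node class, repeated so that the example compiles on its own.

```cpp
#include <cstddef>

// Simplified copy of the node class, repeated so the sketch is self-contained.
template <class T>
struct node
{
  T value;
  node<T>* leftChild;
  node<T>* rightChild;
};

// Iterative search: walk down from the root, going left or right
// at each node, until the element is found or we fall off the tree.
template <class T>
const node<T>* findIteratively (const T& element, const node<T>* t)
{
  while (t != NULL)
    {
      if (element < t->value)
        t = t->leftChild;
      else if (t->value < element)
        t = t->rightChild;
      else
        return t;   // neither smaller nor larger: found it
    }
  return NULL;      // reached an empty subtree: not present
}
```

Like the recursive version, this relies only on `operator<` for the element type.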




Inserting into Binary Search Trees

template <class T> 
void insert (const T& element, node<T>*& t)
{ 
 if (t == NULL)
   t = new node<T>(element, NULL, NULL);
 else if (element < t->value)
   insert (element, t->leftChild);
 else if (t->value < element)
   insert (element, t->rightChild);
 else  // t->value == element
   return; // If we want no duplicates
   // insert (element, t->rightChild); // If we permit duplicates
}
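One consequence of this insertion rule is that an in-order traversal of the resulting tree visits the values in sorted order. The sketch below is a self-contained illustration under assumptions: it repeats a simplified `node` struct and an `insert` in the no-duplicates style, and adds two hypothetical helpers, `inOrderValues` and `destroy`.

```cpp
#include <cstddef>
#include <vector>

// Simplified copy of the node class, repeated so the sketch is self-contained.
template <class T>
struct node
{
  T value;
  node<T>* leftChild;
  node<T>* rightChild;
};

// Insert element into the tree rooted at t, ignoring duplicates.
template <class T>
void insert (const T& element, node<T>*& t)
{
  if (t == NULL)
    t = new node<T>{element, NULL, NULL};
  else if (element < t->value)
    insert(element, t->leftChild);
  else if (t->value < element)
    insert(element, t->rightChild);
  // otherwise the element is already present
}

// In-order traversal: left subtree, node, right subtree.
template <class T>
void inOrderValues (const node<T>* t, std::vector<T>& out)
{
  if (t != NULL)
    {
      inOrderValues(t->leftChild, out);
      out.push_back(t->value);
      inOrderValues(t->rightChild, out);
    }
}

// Post-order traversal: delete both subtrees before the node itself.
template <class T>
void destroy (node<T>* t)
{
  if (t != NULL)
    {
      destroy(t->leftChild);
      destroy(t->rightChild);
      delete t;
    }
}
```

Note that `t` is passed by reference to `insert`, so assigning to it at an empty position updates the parent's child pointer (or the caller's root pointer) directly.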


5.1 How Fast Are Binary Search Trees?


Trees Can Be Fast

Each step in the BST insert and find algorithms moves one level deeper in the tree, so the number of steps is no greater than the height of the tree.

How high the tree can be depends on how well the tree is “balanced”.


Shapes of Trees


What Determines a Tree’s Shape?

The shape of the tree depends upon the order of insertions.
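The effect of insertion order can be measured directly. The sketch below is an illustration under assumptions, not course code: `IntNode`, `insertValue`, `height`, `destroyTree`, and `heightAfterInserting` are hypothetical names. It builds a tree of ints with the no-duplicates insertion rule and reports its height, counted in nodes along the longest path from the root to a leaf.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical node type holding int values.
struct IntNode
{
  int value;
  IntNode* left;
  IntNode* right;
};

// Standard BST insertion, ignoring duplicates.
void insertValue (int element, IntNode*& t)
{
  if (t == NULL)
    t = new IntNode{element, NULL, NULL};
  else if (element < t->value)
    insertValue(element, t->left);
  else if (t->value < element)
    insertValue(element, t->right);
}

// Height counted in nodes: 0 for an empty tree, 1 for a single node.
int height (const IntNode* t)
{
  if (t == NULL)
    return 0;
  return 1 + std::max(height(t->left), height(t->right));
}

void destroyTree (IntNode* t)
{
  if (t != NULL)
    {
      destroyTree(t->left);
      destroyTree(t->right);
      delete t;
    }
}

// Insert the values in the given order and return the tree's height.
int heightAfterInserting (const std::vector<int>& values)
{
  IntNode* root = NULL;
  for (std::size_t i = 0; i < values.size(); ++i)
    insertValue(values[i], root);
  int h = height(root);
  destroyTree(root);
  return h;
}
```

Inserting 1 through 7 in ascending order gives a height of 7, because every node has only a right child and the tree degenerates into a linked list. Inserting the same values in the order 4, 2, 6, 1, 3, 5, 7 gives a height of 3.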