Graphs --- the Basics

Steven J. Zeil

Last modified: Oct 21, 2023

We have previously seen that compilers often represent code as trees.

 

For example, in this case, the expression is (13 + a) * (x - 1).

Each non-leaf node represents the application of some operator to one or more subexpressions.


 

As good C++ programmers, we know that assignment and many other “built-in” parts of C++ are just more operators, so we can easily extend this idea to entire statements …

x = (13 + a) * (x - 1);

 

This idea can be extended to other kinds of statements as well, if we’re willing to be a bit loose in our interpretation of what an “operator” is.

Here, for example, is a structure that might be used to represent a declaration of a variable

int x = 0;

Now, let’s consider what happens just a little bit later in the compilation. We have trees for expressions and statements, and trees for declarations. All of these trees are actually joined together as subtrees of larger trees representing entire functions, classes, and other larger C++ constructs.

Suppose we have a tree for, say, the assignment

x = (13 + a) * (x - 1);

 

Now the compiler wants to know just what “x” actually refers to. In a typical C++ program, we may have lots of objects named “x”, occurring in different functions, as data members of different classes and structs, etc. Each of these “x” objects has a unique tree representing its declaration, but which of those declarations do the x’s in this assignment statement refer to?


 

Well, the language has various rules allowing the compiler to resolve this question, and typically once the compiler has figured out the answer, it records that answer by adding a pointer from the uses of “x” to the appropriate declaration, as shown here.

We add pointers

So here we have a perfectly useful data structure. But what is it?

Whatever this is, it’s not a tree anymore!

This structure is an example of a graph.

The compiler must traverse this structure, generating code for each node of the tree. Processing graphs requires different kinds of algorithms from what we used for trees. Obviously we don’t want to generate code multiple times for the same nodes, but this example shows that in a graph, we can reach the same nodes multiple times using different paths. Even worse, recursion and other constructs can lead to cycles (loops) in the graph, but we still need to make sure that traversals will terminate.


 

As another example, consider the map representing the flights offered by a small airline.

This is also a graph. It can be used in such practical problems as

1 Definitions

A graph $G=(V,E)$ consists of a set of vertices ($V$) and a set of edges ($E$).

1.1 Paths

A path in G is a sequence of vertices $[w_1, w_2, \ldots , w_n]$ such that $(w_i, w_{i+1}) \in E $ for all $i = 1 \ldots n-1$.
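
The definition can be checked mechanically. Here is a minimal sketch, assuming vertices are identified by int values and $E$ is stored as a std::set of (from, to) pairs. The names Edge and isPath are illustrative only and are not part of the course code.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Assumed representation: an edge is an ordered (from, to) pair of vertex numbers.
using Edge = std::pair<int, int>;

// Returns true if every consecutive pair (w[i], w[i+1]) is an edge in E.
// A sequence of zero or one vertices passes trivially, since it has no pairs to check.
bool isPath (const std::set<Edge>& E, const std::vector<int>& w)
{
  for (std::size_t i = 0; i + 1 < w.size(); ++i)
    if (E.count(Edge(w[i], w[i + 1])) == 0)
      return false;
  return true;
}
```

With edges (1,3) and (3,6), the sequence [1, 3, 6] is a path, but [1, 6] is not, because (1,6) is not in $E$.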

1.2 Graphs and Multi-Graphs

Usually when we discuss graphs, we make the assumption that there is at most one edge between any two vertices.

This means that, for a digraph, $|E| \leq |V|^2$, since each vertex can have at most $|V|$ vertices adjacent to it (including itself – there is no rule against edges from a vertex to itself, although some problems involving graphs may not permit this).

For an undirected graph with no edges from a vertex to itself, $|E| \leq \frac{|V|(|V|-1)}{2}$.

In general, it is safe to say that $|E| \in O(|V|^2)$.


There are some graph-related problems, however, that use multi-graphs, which permit multiple edges between the same pair of vertices.

For example, a public transit system might be modeled as a graph with bus stops & train stations as vertices, and an edge for each scheduled bus/train between two vertices. In that case a pair of stops may share several connecting edges, each marked with a different departure and arrival time.

Multi-graphs are not subject to the limitation $|E| \in O(|V|^2)$.

Unless specified otherwise, we will be concentrating on graphs, not multi-graphs, in this course.

2 Data Structures for Implementing Graphs

2.1 Vertices and Pointers

One of the most direct approaches to implementing a graph is to use the same approach we employed with trees: create a class/struct to represent the vertices (for trees, nodes), linked by pointers.

For example, in our discussion of trees, we had this structure for a general tree node:

template <typename T>
class treenode
{
public:
  T nodeValue;
  std::vector<treenode<T>*> children;

  treenode (const T& item = T()):
    nodeValue(item)
  {}
};

We can do almost exactly the same thing for a graph vertex:

template <typename T>
class Vertex
{
public:
  T value;
  std::vector<Vertex<T>*> neighbors;

  Vertex (const T& item = T()):
    value(item)
  {}
};

 

Then, to get a graph like this, it’s a matter of allocating the vertices and setting up the pointers, e.g.,

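// Master list of all vertices in the graph
std::vector<Vertex<int>*> sampleGraph;
for (int i = 0; i < 10; ++i)
    sampleGraph.push_back(new Vertex<int>(i+1));
// The list holds pointers, so use -> to reach each vertex's neighbors
sampleGraph[0]->neighbors.push_back(sampleGraph[2]);
sampleGraph[0]->neighbors.push_back(sampleGraph[3]);
sampleGraph[0]->neighbors.push_back(sampleGraph[4]);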
   ⋮

The major drawback to this approach to implementing graphs is memory management. Because graphs do not follow the discipline of trees of having each node reachable from exactly one parent, it can be very difficult to tell when a graph vertex can be safely deleted. That’s why it becomes important to have a “master list” of vertices, sampleGraph in this case.
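
One way the master list can help with cleanup is sketched below, assuming that the master list is the only owner of the vertices and that the neighbors vectors are treated as non-owning links. The helper name destroyGraph is hypothetical. The Vertex class is repeated only so that the sketch compiles on its own.

```cpp
#include <vector>

// Same shape as the Vertex class shown earlier, repeated for self-containment.
template <typename T>
class Vertex
{
public:
  T value;
  std::vector<Vertex<T>*> neighbors;   // non-owning links to other vertices

  Vertex (const T& item = T()):
    value(item)
  {}
};

// Hypothetical helper: free every vertex exactly once by walking the master list.
// The neighbor pointers are never followed, so shared vertices and cycles cause
// neither double deletion nor an endless loop.
template <typename T>
void destroyGraph (std::vector<Vertex<T>*>& masterList)
{
  for (Vertex<T>* v : masterList)
    delete v;
  masterList.clear();   // drop the now-dangling pointers
}
```

Because deletion is driven by the master list rather than by the links, it does not matter how many edges lead to a given vertex.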

2.2 Adjacency Matrix

A common approach is to number the vertices and keep them in an array.

An adjacency matrix indicates which vertices are connected. A 1 indicates the presence of an edge between two vertices; a 0 indicates no edge.

 

The adjacency matrix for this graph would be

      1 2 3 4 5 6 7 8 9
  1   0 0 1 1 1 0 0 0 0
  2   0 0 0 0 0 1 0 0 0
  3   0 0 0 0 0 1 0 0 0
  4   0 0 0 0 0 0 1 0 0
  5   0 0 0 0 0 0 1 1 1
  6   0 0 0 0 0 0 0 0 0
  7   0 0 0 0 0 0 0 0 0
  8   0 0 0 0 0 0 0 0 0
  9   0 0 0 0 1 0 0 0 0

The advantage of this structure is that we can determine adjacency in O(1) time. A disadvantage to this structure is the lack of any easy way to remove (or add) vertices.

A potentially serious disadvantage is the $O(|V|)$ time required to list the vertices adjacent to a given vertex, a very common step in many graph algorithms. There are many algorithms of the form

for each vertex v in G
{
    do something to v
    do something to each neighbor of v
}

For the vertices and pointers approach, this is $O(|V| + |E|)$. With adjacency matrices, this is $O(|V|^2)$. Now, in any graph, we know that

\[ |E| \leq |V|^2 \]

but there are a lot of problems where the graph is sparse, with $|E| \in O(|V|)$, and for those graphs the above algorithm is $O(|V|)$ for vertices and pointers, $O(|V|^2)$ for adjacency matrices.

Another problem with adjacency matrices in many applications is the $O(|V|^2)$ storage size. Again, this is particularly annoying when the graph is sparse, when only a small fraction of the adjacency matrix elements are 1.
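
The trade-offs described in this section can be seen in a small sketch. It assumes vertices are numbered from 0 and that the number of vertices is fixed at construction. The class name MatrixGraph and its member names are illustrative, not part of the course code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical adjacency-matrix graph over vertices 0 .. n-1.
class MatrixGraph
{
public:
  explicit MatrixGraph (std::size_t n)
    : adj(n, std::vector<bool>(n, false))   // |V| x |V| entries, all "no edge"
  {}

  void addEdge (std::size_t from, std::size_t to) { adj[from][to] = true; }

  // O(1): a single lookup in the matrix.
  bool adjacent (std::size_t from, std::size_t to) const { return adj[from][to]; }

  // O(|V|): the whole row must be scanned, however few neighbors there are.
  std::vector<std::size_t> neighbors (std::size_t v) const
  {
    std::vector<std::size_t> result;
    for (std::size_t w = 0; w < adj.size(); ++w)
      if (adj[v][w])
        result.push_back(w);
    return result;
  }

private:
  std::vector<std::vector<bool>> adj;
};
```

Calling neighbors for every vertex scans every row, which is where the $O(|V|^2)$ total comes from, even when the graph is sparse.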

2.3 Adjacency Lists

A common structure that gives better time and space performance for sparse graphs is the adjacency list. For each vertex, we keep a list of vertices adjacent to it.

 

The adjacency list for this graph

 

would be this:

The adjacency list
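
In code, such a structure might be sketched as a vector of std::list, one list per vertex. The sketch below assumes that the graph intended here is the same one described by the adjacency matrix in section 2.2, and takes its edges from that matrix. The names AdjacencyList, buildAdjacencyList, and sampleAdjacencyList are hypothetical.

```cpp
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// One list of adjacent vertex numbers per vertex.
using AdjacencyList = std::vector<std::list<int>>;

// Build the lists for vertices numbered 1..n (index 0 is left unused
// so that vertex numbers can be used directly as indices).
AdjacencyList buildAdjacencyList (std::size_t n,
                                  const std::vector<std::pair<int,int>>& edges)
{
  AdjacencyList adj(n + 1);
  for (const auto& e : edges)
    adj[e.first].push_back(e.second);   // edge from e.first to e.second
  return adj;
}

// Edges read off the adjacency matrix shown in section 2.2 (assumed same graph).
AdjacencyList sampleAdjacencyList ()
{
  return buildAdjacencyList(9, {{1,3}, {1,4}, {1,5}, {2,6}, {3,6},
                                {4,7}, {5,7}, {5,8}, {5,9}, {9,5}});
}
```

Storage is proportional to $|V| + |E|$, and listing the neighbors of a vertex takes time proportional to the number of neighbors it actually has.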

2.3.1 Approximating Adjacency Lists with std Containers

In C++, you can implement an adjacency list as an array or vector of std::list. You can actually get something along the lines of an adjacency list by using a multimap:

class Node {
   ⋮
};

typedef std::multimap<Node,Node> Graph;
Graph g;

g.insert (Graph::value_type(node1, node3));
g.insert (Graph::value_type(node1, node4));
g.insert (Graph::value_type(node1, node5));
g.insert (Graph::value_type(node2, node6));
g.insert (Graph::value_type(node3, node6));
   ⋮

or a combination of a map and a set:

class Node {
   ⋮
};

typedef std::map<Node, std::set<Node> > Graph;
Graph g;

g[node1].insert (node3);
g[node1].insert (node4);
g[node1].insert (node5);
g[node2].insert (node6);
g[node3].insert (node6);
   ⋮

With these approaches, you don’t necessarily have to number the nodes. You can use any identifying information that can be inserted into a map or unordered map.

 

For example, you could build this airline graph as simply as

typedef std::multimap<std::string,std::string> Graph;
typedef Graph::value_type Flight;
Graph g;

g.insert (Flight("N.Y.", "Boston"));
g.insert (Flight("Boston", "N.Y."));
g.insert (Flight("Raleigh", "N.Y."));
g.insert (Flight("N.Y.", "Wash DC"));
g.insert (Flight("Wash DC", "N.Y."));
g.insert (Flight("Boston", "Wash DC"));
g.insert (Flight("Wash DC", "Boston"));
g.insert (Flight("Wash DC", "Norfolk"));
g.insert (Flight("Norfolk", "Raleigh"));
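
To read the neighbors of a vertex back out of a multimap-based graph, one option is equal_range, which returns the range of entries sharing a key. The following is a minimal sketch under that approach. The helper name destinationsFrom is hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::multimap<std::string,std::string> Graph;

// Hypothetical helper: collect every city reachable by one flight from `city`.
std::vector<std::string> destinationsFrom (const Graph& g, const std::string& city)
{
  std::vector<std::string> result;
  // equal_range yields [first, second): all entries whose key equals city.
  auto range = g.equal_range(city);
  for (auto it = range.first; it != range.second; ++it)
    result.push_back(it->second);
  return result;
}
```

A city that only appears as a destination, or not at all, simply yields an empty vector.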

This is a useful approach if you want to quickly construct a usable graph for a specific application. It does not lend itself well to re-use, however. So, in the next lesson, we’ll look at a reusable Graph ADT in the style of the std library containers.