Graphs --- the Basics
Steven J. Zeil
We have previously seen that compilers often represent code as trees.
For example, in this case, the expression is (13 + a) * (x - 1).
Each non-leaf node represents the application of some operator to one or more subexpressions.
As good C++ programmers, we know that assignment and many other “built-in” parts of C++ are just more operators, so we can easily extend this idea to entire statements …
x = (13 + a) * (x - 1);
This idea can be extended to other kinds of statements as well, if we’re willing to be a bit loose in our interpretation of what an “operator” is.
Here, for example, is a structure that might be used to represent a declaration of a variable
int x = 0;
Now, let’s consider what happens just a little bit later in the compilation. We have trees for expressions and statements, and trees for declarations. All of these trees are actually joined together as subtrees of larger trees representing entire functions, classes, and other larger C++ constructs.
Suppose we have a tree for, say, the assignment
x = (13 + a) * (x - 1);
Now the compiler wants to know just what “x” actually refers to. In a typical C++ program, we may have lots of objects named “x”, occurring in different functions, as data members of different classes and structs, etc. Each of these “x” objects has a unique tree representing its declaration, but to which of those declarations do the x’s in this assignment statement refer?
Well, the language has various rules allowing the compiler to resolve this question, and typically once the compiler has figured out the answer, it records that answer by adding a pointer from the uses of “x” to the appropriate declaration, as shown here.
We add pointers
- from each use of a name
- to the declaration to which it refers.
So here we have a perfectly useful data structure. But what is it?
Whatever this is, it’s not a tree anymore!
This structure is an example of a graph.
The compiler must traverse this structure, generating code for each node of the tree. Processing graphs requires different kinds of algorithms from what we used for trees. Obviously we don’t want to generate code multiple times for the same nodes, but this example shows that in a graph, we can reach the same nodes multiple times using different paths. Even worse, recursion and other constructs can lead to cycles (loops) in the graph, but we still need to make sure that traversals will terminate.
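To make the point about repeated visits and cycles concrete, here is a minimal sketch of a traversal that remembers which nodes it has already processed. The Node type and the visitOnce function are hypothetical names introduced only for this illustration; they are not the structures a real compiler would use.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical node type for illustration: a label plus outgoing pointers.
struct Node {
    int label;
    std::vector<Node*> links;
};

// Visit every node reachable from 'start' exactly once, appending each
// label to 'order'. The 'visited' set is what guarantees termination
// even when the graph contains cycles.
void visitOnce(Node* start, std::unordered_set<Node*>& visited,
               std::vector<int>& order)
{
    assert(start != nullptr);
    if (!visited.insert(start).second)
        return;                      // already reached by another path
    order.push_back(start->label);
    for (Node* next : start->links)
        visitOnce(next, visited, order);
}
```

Without the visited set, a node reachable by two paths would be processed twice, and a cycle would cause unbounded recursion.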
As another example, consider the map representing the flights offered by a small airline.
This is also a graph. It can be used in such practical problems as
- Given travel time on each route, find the fastest way to travel between two cities.
- Find the fastest way to visit every city in the graph.
1 Definitions
A graph $G=(V,E)$ consists of a set of vertices ($V$) and a set of edges ($E$).
- An edge is an ordered pair $(v,w)$, where $v \in V$ and $w \in V$.
- A graph is undirected if for any vertices $v$ and $w$, $(v,w) \in E$ iff (if and only if) $(w,v) \in E$.
- Graphs that are not undirected are directed graphs or digraphs.
  - Both the assignment/declaration graph and the airline graph above are directed graphs.
- A node $w$ is adjacent to $v$ in $G$ if there exists an edge $(v,w) \in E$.
  - In the assignment graph above, the “*” vertex is adjacent to the “=” vertex. The “=” vertex is not adjacent to the “*” vertex (or to any other vertices).
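These definitions translate almost directly into code. The sketch below assumes integer vertex labels and stores $V$ and $E$ as std::set objects; the SimpleGraph name and its member functions are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical illustration of G = (V, E) with integer vertex labels.
struct SimpleGraph {
    std::set<int> V;                     // vertices
    std::set<std::pair<int, int>> E;     // edges as ordered pairs (v, w)

    // An edge (v, w) requires both v and w to be members of V.
    void addEdge(int v, int w)
    {
        assert(V.count(v) > 0 && V.count(w) > 0);
        E.insert(std::make_pair(v, w));
    }

    // True if (v, w) is in E, that is, if w is adjacent to v.
    bool hasEdge(int v, int w) const
    {
        return E.count(std::make_pair(v, w)) > 0;
    }

    // Undirected: (v, w) is in E if and only if (w, v) is in E.
    bool isUndirected() const
    {
        for (const std::pair<int, int>& e : E)
            if (!hasEdge(e.second, e.first))
                return false;
        return true;
    }
};
```

Note that adjacency is not symmetric in a digraph: after adding only the edge (1, 2), hasEdge(1, 2) is true but hasEdge(2, 1) is false.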
1.1 Paths
A path in G is a sequence of vertices $[w_1, w_2, \ldots , w_n]$ such that $(w_i, w_{i+1}) \in E $ for all $i = 1 \ldots n-1$.
- A path is simple if no two of its vertices, except possibly the first and last, are the same.
  - In the airline graph above, [Boston, N.Y., Wash DC] is a simple path.
- A cycle is a path of length 1 or more for which $w_1 = w_n$.
  - In the airline graph above, [Boston, N.Y., Wash DC, Boston] is a cycle.
- A directed graph is acyclic if it contains no cycles.
  - Sometimes called a DAG for “directed acyclic graph”.
  - The assignment/declaration graph is acyclic.
- An undirected graph is connected if there is a path from each vertex to each other vertex.
  - A directed graph with this property is called strongly connected.
  - The airline graph above is strongly connected. The assignment/declaration graph is not. Even if we removed the char x = 'a' declaration structure, that graph would still not be strongly connected. (For example, one cannot reach the “=” vertex from any other vertex.)
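The path definitions can be checked mechanically. The following is a sketch only, assuming the edges are stored as a set of ordered pairs of city names; the names EdgeSet, Path, isPath, isCycle, and isSimplePath are hypothetical and introduced just for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using City = std::string;
using EdgeSet = std::set<std::pair<City, City>>;   // ordered pairs (v, w)
using Path = std::vector<City>;                    // [w_1, ..., w_n]

// A path requires (w_i, w_{i+1}) to be an edge for every consecutive pair.
bool isPath(const EdgeSet& E, const Path& p)
{
    assert(!p.empty());
    for (std::size_t i = 0; i + 1 < p.size(); ++i)
        if (E.count(std::make_pair(p[i], p[i + 1])) == 0)
            return false;
    return true;
}

// A cycle is a path with at least one edge whose first and last vertices match.
bool isCycle(const EdgeSet& E, const Path& p)
{
    return p.size() >= 2 && p.front() == p.back() && isPath(E, p);
}

// Simple: no vertex repeats, except that the last may equal the first.
bool isSimplePath(const EdgeSet& E, const Path& p)
{
    if (!isPath(E, p))
        return false;
    std::set<City> seen;
    const std::size_t last = p.size() - 1;
    for (std::size_t i = 0; i < p.size(); ++i) {
        bool closesLoop = (i == last && last > 0 && p[i] == p.front());
        if (!closesLoop && !seen.insert(p[i]).second)
            return false;
    }
    return true;
}
```

Given an edge set that contains Boston to N.Y., N.Y. to Wash DC, and Wash DC to Boston, isSimplePath accepts [Boston, N.Y., Wash DC] and isCycle accepts [Boston, N.Y., Wash DC, Boston], matching the examples above.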
1.2 Graphs and Multi-Graphs
Usually when we discuss graphs, we make the assumption that there is at most one edge between any two vertices.
This means that, for a digraph, $|E| \leq |V|^2$, since each vertex can have at most $|V|$ vertices adjacent to it (including itself – there is no rule against edges from a vertex to itself, although some problems involving graphs may not permit this).
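For an undirected graph, if we count each connected pair of vertices as a single edge and do not allow edges from a vertex to itself, $|E| \leq \frac{|V|(|V|-1)}{2}$.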
In general, it is safe to say that $|E| \in O(|V|^2)$.
There are some graph-related problems, however, that use multi-graphs, which permit multiple edges between the same pair of vertices.
For example, a public transit system might be modeled as a graph with bus stops & train stations as vertices, and an edge for each scheduled bus/train between two vertices. In that case a pair of stops may share several connecting edges, each marked with a different departure and arrival time.
Multi-graphs are not subject to the limitation $|E| \in O(|V|^2)$.
Unless specified otherwise, we will be concentrating on graphs, not multi-graphs, in this course.
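For contrast, here is a rough sketch of how the transit multi-graph described above might be stored, assuming a std::multimap keyed on the pair of stops. The Trip structure, the stop names used in any example, and the helper functions are all hypothetical; times are assumed to be minutes after midnight.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical edge label: one scheduled trip (times in minutes after midnight).
struct Trip {
    int departs;
    int arrives;
};

using StopPair = std::pair<std::string, std::string>;      // (from, to)
using TransitMultiGraph = std::multimap<StopPair, Trip>;   // duplicate keys allowed

// Add one more edge between 'from' and 'to'; earlier edges are kept.
void addTrip(TransitMultiGraph& g, const std::string& from,
             const std::string& to, int departs, int arrives)
{
    assert(departs <= arrives);
    g.insert(std::make_pair(StopPair(from, to), Trip{departs, arrives}));
}

// Number of parallel edges from 'from' to 'to'.
std::size_t tripsBetween(const TransitMultiGraph& g, const std::string& from,
                         const std::string& to)
{
    return g.count(StopPair(from, to));
}
```

Because the same pair of stops can appear as a key any number of times, the number of edges here is not bounded by $|V|^2$.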
2 Data Structures for Implementing Graphs
2.1 Vertices and Pointers
One of the most direct approaches to implementing a graph is to use the same approach we employed with trees: create a class/struct to represent the vertices (for trees, nodes), linked by pointers.
For example, in our discussion of trees, we had this structure for a general tree node:
#include <vector>

// A general tree node: a value plus pointers to any number of children.
template <typename T>
class treenode
{
public:
    T nodeValue;
    std::vector<treenode<T>*> children;

    treenode (const T& item = T()):
        nodeValue(item)
    {}
};
We can do almost exactly the same thing for a graph vertex:
#include <vector>

// A graph vertex: a value plus pointers to the vertices adjacent to it.
template <typename T>
class Vertex
{
public:
    T value;
    std::vector<Vertex<T>*> neighbors;

    Vertex (const T& item = T()):
        value(item)
    {}
};
Then, to get a graph like this, it’s a matter of allocating the vertices and setting up the pointers, e.g.,
std::vector<Vertex<int>*> sampleGraph;
for (int i = 0; i < 10; ++i)
    sampleGraph.push_back(new Vertex<int>(i+1));
// sampleGraph holds pointers, so members are reached with ->
sampleGraph[0]->neighbors.push_back(sampleGraph[2]);
sampleGraph[0]->neighbors.push_back(sampleGraph[3]);
sampleGraph[0]->neighbors.push_back(sampleGraph[4]);
⋮
The major drawback to this approach to implementing graphs is memory management. Because graphs do not follow the discipline of trees of having each node reachable from exactly one parent, it can be very difficult to tell when a graph vertex can be safely deleted. That’s why it becomes important to have a “master list” of vertices, sampleGraph in this case.
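One way to use the master list for cleanup is sketched below. The Vertex template is repeated so the example stands alone; addEdge and destroyGraph are hypothetical helper names, not part of the course code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class Vertex
{
public:
    T value;
    std::vector<Vertex<T>*> neighbors;

    Vertex (const T& item = T()):
        value(item)
    {}
};

// Hypothetical helper: add an edge between two entries of the master list.
template <typename T>
void addEdge(std::vector<Vertex<T>*>& graph, std::size_t from, std::size_t to)
{
    assert(from < graph.size() && to < graph.size());
    graph[from]->neighbors.push_back(graph[to]);
}

// Hypothetical helper: delete each vertex exactly once by walking the
// master list, no matter how many edges (or cycles) point at it.
template <typename T>
void destroyGraph(std::vector<Vertex<T>*>& graph)
{
    for (Vertex<T>* v : graph)
        delete v;
    graph.clear();
}
```

Deleting by following the neighbors pointers instead would risk deleting a shared vertex twice or looping forever on a cycle; the master list sidesteps both problems.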
2.2 Adjacency Matrix
A common approach is to number the vertices and keep them in an array.
An adjacency matrix indicates which vertices are connected. A 1 in row $v$, column $w$ indicates the presence of an edge from $v$ to $w$; a 0 indicates no edge.
The adjacency matrix for this graph would be
|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
The advantage of this structure is that we can determine adjacency in O(1) time. A disadvantage to this structure is the lack of any easy way to remove (or add) vertices.
A potentially serious disadvantage is the $O(|V|)$ time required to list the vertices adjacent to a given vertex, a very common step in many graph algorithms. There are many algorithms of the form
for each vertex v in G
{
    do something to v
    do something to each neighbor of v
}
For the vertices and pointers approach, this is $O(|V| + |E|)$. With adjacency matrices, this is $O(|V|^2)$. Now, in any graph (as opposed to a multi-graph), we know that
\[ |E| \leq |V|^2 \]
but there are a lot of problems where the graph is sparse, with $|E| \in O(|V|)$, and for those graphs the above algorithm is $O(|V|)$ for vertices and pointers, $O(|V|^2)$ for adjacency matrices.
Another problem with adjacency matrices in many applications is the $O(|V|^2)$ storage size. Again, this is particularly annoying when the graph is sparse, when only a small fraction of the adjacency matrix elements are 1.
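The trade-offs just described show up clearly in code. This is a minimal sketch, assuming vertices are numbered from 0 (so vertex 1 in the table above would be index 0); the AdjacencyMatrix class and its member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical adjacency-matrix class; vertices are numbered 0 .. n-1.
class AdjacencyMatrix {
public:
    // Storage is O(|V|^2) no matter how few edges are present.
    explicit AdjacencyMatrix(std::size_t n)
        : matrix(n, std::vector<bool>(n, false))
    {}

    std::size_t numVertices() const { return matrix.size(); }

    // Record a directed edge from v to w.
    void addEdge(std::size_t v, std::size_t w)
    {
        assert(v < numVertices() && w < numVertices());
        matrix[v][w] = true;
    }

    // O(1): is there an edge (v, w), that is, is w adjacent to v?
    bool hasEdge(std::size_t v, std::size_t w) const
    {
        assert(v < numVertices() && w < numVertices());
        return matrix[v][w];
    }

    // O(|V|): the whole row is scanned, however few neighbors v has.
    std::vector<std::size_t> neighbors(std::size_t v) const
    {
        assert(v < numVertices());
        std::vector<std::size_t> result;
        for (std::size_t w = 0; w < numVertices(); ++w)
            if (matrix[v][w])
                result.push_back(w);
        return result;
    }

private:
    std::vector<std::vector<bool>> matrix;
};
```

Calling neighbors once for every vertex therefore costs $O(|V|^2)$ in total, even for a sparse graph.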
2.3 Adjacency Lists
A common structure that gives better time and space performance for sparse graphs is the adjacency list. For each vertex, we keep a list of vertices adjacent to it.
The adjacency list for this graph would be this:
The adjacency list
- is more flexible in terms of storage use, but
- requires $O(|V|)$ time to test whether v1 is adjacent to v2
- makes it easier to iterate over all vertices or all edges
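As a rough sketch of these properties, an adjacency list can be built from a vector of std::list, again assuming vertices numbered from 0. The AdjacencyList class and its member names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Hypothetical adjacency-list class; vertices are numbered 0 .. n-1.
class AdjacencyList {
public:
    explicit AdjacencyList(std::size_t n) : adj(n) {}

    std::size_t numVertices() const { return adj.size(); }

    // Record a directed edge from v to w. Storage grows with |E|.
    void addEdge(std::size_t v, std::size_t w)
    {
        assert(v < numVertices() && w < numVertices());
        adj[v].push_back(w);
    }

    // Linear search of v's list: O(|V|) in the worst case.
    bool hasEdge(std::size_t v, std::size_t w) const
    {
        assert(v < numVertices());
        return std::find(adj[v].begin(), adj[v].end(), w) != adj[v].end();
    }

    // The vertices adjacent to v, with no row scan required.
    const std::list<std::size_t>& neighbors(std::size_t v) const
    {
        assert(v < numVertices());
        return adj[v];
    }

    // Collect every edge (v, w): O(|V| + |E|) overall.
    std::vector<std::pair<std::size_t, std::size_t>> edges() const
    {
        std::vector<std::pair<std::size_t, std::size_t>> result;
        for (std::size_t v = 0; v < adj.size(); ++v)
            for (std::size_t w : adj[v])
                result.push_back(std::make_pair(v, w));
        return result;
    }

private:
    std::vector<std::list<std::size_t>> adj;
};
```

Iterating over all vertices and their neighbors touches each list entry once, which is where the advantage for sparse graphs comes from.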
2.3.1 Approximating Adjacency Lists with std Containers
In C++, you can implement an adjacency list as an array or vector of std::list. You can actually get something along the lines of an adjacency list by using a multimap:
#include <map>

// Node must provide operator< so that it can serve as a multimap key.
class Node {
    ⋮
};

typedef std::multimap<Node,Node> Graph;

Graph g;
// Each entry is one edge: (source, destination).
g.insert (Graph::value_type(node1, node3));
g.insert (Graph::value_type(node1, node4));
g.insert (Graph::value_type(node1, node5));
g.insert (Graph::value_type(node2, node6));
g.insert (Graph::value_type(node3, node6));
⋮
or a combination of a map and a set:
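#include <map>
#include <set>

// Node must provide operator< so that it can serve as a map key and set element.
class Node {
    ⋮
};

typedef std::map<Node, std::set<Node> > Graph;

Graph g;
// g[v] is the set of nodes adjacent to v; operator[] creates it on first use.
g[node1].insert (node3);
g[node1].insert (node4);
g[node1].insert (node5);
g[node2].insert (node6);
g[node3].insert (node6);
⋮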
With these approaches, you don’t necessarily have to number the nodes. You can use any identifying information that can be inserted into a map or unordered map.
For example, you could build the airline graph as easily as this:
#include <map>
#include <string>

typedef std::multimap<std::string,std::string> Graph;
typedef Graph::value_type Flight;

Graph g;
// Each Flight is one directed edge: (origin, destination).
g.insert (Flight("N.Y.", "Boston"));
g.insert (Flight("Boston", "N.Y."));
g.insert (Flight("Raleigh", "N.Y."));
g.insert (Flight("N.Y.", "Wash DC"));
g.insert (Flight("Wash DC", "N.Y."));
g.insert (Flight("Boston", "Wash DC"));
g.insert (Flight("Wash DC", "Boston"));
g.insert (Flight("Wash DC", "Norfolk"));
g.insert (Flight("Norfolk", "Raleigh"));
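Once the flights are in a multimap, the vertices adjacent to a city are exactly the entries sharing that key, which equal_range retrieves. The sketch below builds on that to test strong connectivity with a breadth-first search; destinationsFrom, reachableFrom, and isStronglyConnected are hypothetical helper names, and the set of cities is assumed to be supplied by the caller.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

typedef std::multimap<std::string, std::string> Graph;

// All cities with a direct flight from 'city'.
std::vector<std::string> destinationsFrom(const Graph& g, const std::string& city)
{
    std::vector<std::string> result;
    auto range = g.equal_range(city);          // all entries keyed by 'city'
    for (auto it = range.first; it != range.second; ++it)
        result.push_back(it->second);
    return result;
}

// Breadth-first search: every city reachable from 'start' (including 'start').
std::set<std::string> reachableFrom(const Graph& g, const std::string& start)
{
    std::set<std::string> visited = {start};
    std::queue<std::string> pending;
    pending.push(start);
    while (!pending.empty()) {
        std::string city = pending.front();
        pending.pop();
        for (const std::string& next : destinationsFrom(g, city))
            if (visited.insert(next).second)   // true only on first visit
                pending.push(next);
    }
    return visited;
}

// Strongly connected: every city can reach every city in 'cities'.
// Assumes every flight in g starts and ends at a member of 'cities'.
bool isStronglyConnected(const Graph& g, const std::set<std::string>& cities)
{
    assert(!cities.empty());
    for (const std::string& c : cities)
        if (reachableFrom(g, c) != cities)
            return false;
    return true;
}
```

Applied to the nine flights listed above and the five cities they mention, isStronglyConnected should confirm the earlier claim that the airline graph is strongly connected.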
This is a useful approach if you want to quickly construct a usable graph for a specific application. It does not lend itself well to re-use, however. So, in the next lesson, we’ll look at a reusable Graph ADT in the style of the std library containers.