We have previously seen that compilers often represent code as trees.
For example, the expression `(13 + a) * (x - 1)` is represented by a tree with the `*` operator at its root, whose two subtrees represent `13 + a` and `x - 1`.
Each non-leaf node represents the application of some operator to one or more subexpressions.
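A minimal sketch of such an expression tree, assuming only binary operators, integer constants, and variables looked up by name. The names `ExprNode`, `makeNum`, `makeVar`, `makeOp`, and `evaluate` are hypothetical and chosen for illustration; this is not how any particular compiler structures its trees.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression-tree node: a leaf is a constant or a variable;
// an interior node applies a binary operator to two subexpressions.
struct ExprNode {
    char op = 0;                        // '+', '-', or '*'; 0 marks a leaf
    int constant = 0;                   // value of a numeric leaf
    std::string variable;               // name of a variable leaf ("" if numeric)
    std::unique_ptr<ExprNode> left, right;
};

std::unique_ptr<ExprNode> makeNum(int c) {
    auto n = std::make_unique<ExprNode>();
    n->constant = c;
    return n;
}

std::unique_ptr<ExprNode> makeVar(const std::string& name) {
    auto n = std::make_unique<ExprNode>();
    n->variable = name;
    return n;
}

std::unique_ptr<ExprNode> makeOp(char op, std::unique_ptr<ExprNode> l,
                                 std::unique_ptr<ExprNode> r) {
    auto n = std::make_unique<ExprNode>();
    n->op = op;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Evaluate bottom-up: leaves yield their values, interior nodes apply
// their operator to the values of their two children.
int evaluate(const ExprNode& n, const std::map<std::string, int>& env) {
    if (n.op == 0)
        return n.variable.empty() ? n.constant : env.at(n.variable);
    int l = evaluate(*n.left, env);
    int r = evaluate(*n.right, env);
    switch (n.op) {
        case '+': return l + r;
        case '-': return l - r;
        default:  return l * r;         // '*'
    }
}
```

For example, `makeOp('*', makeOp('+', makeNum(13), makeVar("a")), makeOp('-', makeVar("x"), makeNum(1)))` builds the tree for `(13 + a) * (x - 1)`; with `a` bound to 2 and `x` bound to 5, evaluating it gives (13 + 2) * (5 - 1) = 60.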
As good C++ programmers, we know that assignment and many other "built-in" parts of C++ are just more operators, so we can easily extend this idea to entire statements, such as the assignment `x = (13 + a) * (x - 1);`
This idea can be extended to other kinds of statements as well, such as the declaration `int x = 0;`, if we're willing to be a bit loose in our interpretation of what an "operator" is.
Suppose we have a tree for, say, the assignment `x = (13 + a) * (x - 1);`. Now the compiler wants to know just what "x" actually refers to. In a typical C++ program, we may have lots of objects named "x", declared in different scopes.
We add pointers from each use of a name to the declaration to which it refers.
Whatever this is, it’s not a tree anymore!
This structure is an example of a graph.
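One way to see that the result is no longer a tree is to count incoming edges: in a tree, no node has more than one. The sketch below assumes a hypothetical `AstNode` type with ordinary child pointers plus an optional `declaration` pointer from a use of a name to its declaration; `inDegrees` is likewise a hypothetical helper, not part of any real compiler.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical node: tree edges to children, plus (for a use of a name)
// an extra edge to the node where that name is declared.
struct AstNode {
    std::string label;
    std::vector<AstNode*> children;   // ordinary tree edges
    AstNode* declaration = nullptr;   // use-to-declaration edge, if any
};

// Count incoming edges for every node reachable from the given roots.
// Each node is processed once, so each edge is counted exactly once.
std::map<const AstNode*, int> inDegrees(const std::vector<AstNode*>& roots) {
    std::map<const AstNode*, int> count;
    std::set<const AstNode*> visited;
    std::vector<const AstNode*> stack(roots.begin(), roots.end());
    while (!stack.empty()) {
        const AstNode* n = stack.back();
        stack.pop_back();
        if (!visited.insert(n).second) continue;   // already processed
        std::vector<const AstNode*> out(n->children.begin(), n->children.end());
        if (n->declaration) out.push_back(n->declaration);
        for (const AstNode* m : out) {
            ++count[m];
            stack.push_back(m);
        }
    }
    return count;
}
```

If both uses of `x` in `x = (13 + a) * (x - 1);` point at the node for `x` in `int x = 0;`, that declaration node ends up with three incoming edges (one from its own statement, one from each use), which is impossible in a tree.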
As another example, consider the map representing the flights offered by a small airline.
This is also a graph. It can be used in such practical problems as:

- Given the travel time on each route, find the fastest way to travel between two cities.
- Find the fastest way to visit every city in the graph.
A graph $G=(V,E)$ consists of a set of vertices ($V$) and a set of edges ($E$).
An edge is an ordered pair $(v,w)$, where $v \in V$ and $w \in V$.
A graph is undirected if, for any vertices $v$ and $w$, $(v,w) \in E$ iff (if and only if) $(w,v) \in E$.
Graphs that are not undirected are directed graphs or digraphs.
Both the assignment/declaration graph and the airline graph above are directed graphs.
A vertex $w$ is adjacent to $v$ in $G$ if there exists an edge $(v,w) \in E$.
In the assignment graph above, the `*` vertex is adjacent to the `=` vertex, because there is an edge from `=` to `*`. The `=` vertex is not adjacent to the `*` vertex (or to any other vertex), because no edge leads into it.
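The definitions so far translate almost directly into code if we store $E$ as a set of ordered pairs. This is a minimal sketch assuming vertices are named by strings; the functions `isUndirected` and `isAdjacent` are hypothetical helpers for illustration.

```cpp
#include <set>
#include <string>
#include <utility>

// Assumption for illustration: vertices are strings, E is a set of
// ordered pairs (v, w).
using Edge = std::pair<std::string, std::string>;

// Undirected: (v, w) is in E exactly when (w, v) is in E.
bool isUndirected(const std::set<Edge>& edges) {
    for (const Edge& e : edges)
        if (edges.count({e.second, e.first}) == 0)
            return false;
    return true;
}

// w is adjacent to v iff the edge (v, w) is in E.
bool isAdjacent(const std::set<Edge>& edges,
                const std::string& w, const std::string& v) {
    return edges.count({v, w}) > 0;
}
```

With the two edges leaving `=` in the assignment tree, `isAdjacent(edges, "*", "=")` is true while `isAdjacent(edges, "=", "*")` is false, and the graph is not undirected.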
A path in $G$ is a sequence of vertices $[w_1, w_2, \ldots, w_n]$ such that $(w_i, w_{i+1}) \in E$ for all $1 \leq i < n$. The length of such a path is the number of edges on it, $n-1$.
A path is simple if no two of its vertices, except possibly the first and last, are the same.
In the airline graph above, [Boston, N.Y., Wash DC] is a simple path.
A cycle is a path of length 1 or more for which $w_1 = w_n$.
In the airline graph above, [Boston, N.Y., Wash DC, Boston] is a cycle.
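The path, simple-path, and cycle definitions can be checked mechanically. This sketch again assumes edges stored as a set of ordered pairs of strings; the three functions are hypothetical helpers, and the test uses only the three flights implied by the examples above (Boston to N.Y., N.Y. to Wash DC, Wash DC to Boston), not the full airline map.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;
using Path = std::vector<std::string>;

// [w1, ..., wn] is a path iff (w_i, w_{i+1}) is in E for every i < n.
bool isPath(const std::set<Edge>& edges, const Path& p) {
    if (p.empty()) return false;
    for (std::size_t i = 0; i + 1 < p.size(); ++i)
        if (edges.count({p[i], p[i + 1]}) == 0)
            return false;
    return true;
}

// Simple: no vertex repeats, except that the first and last may coincide.
bool isSimplePath(const std::set<Edge>& edges, const Path& p) {
    if (!isPath(edges, p)) return false;
    std::size_t last = p.size();
    if (p.size() > 1 && p.front() == p.back()) --last;   // allow w1 == wn
    std::set<std::string> seen;
    for (std::size_t i = 0; i < last; ++i)
        if (!seen.insert(p[i]).second)
            return false;
    return true;
}

// Cycle: a path of length (number of edges) at least 1 with w1 == wn.
bool isCycle(const std::set<Edge>& edges, const Path& p) {
    return isPath(edges, p) && p.size() >= 2 && p.front() == p.back();
}
```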
A directed graph is acyclic if it contains no cycles. Such a graph is sometimes called a DAG, for "directed acyclic graph".
The assignment/declaration graph is acyclic.
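One common way to test whether a digraph is acyclic is a depth-first search that marks the vertices on the current search path; reaching such a vertex again means an edge closes a cycle. This is a sketch of that standard technique, assuming vertices numbered $0 \ldots |V|-1$ and an adjacency-list representation (`adj[v]` lists the vertices adjacent to `v`); the name `isAcyclic` is hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Returns true if the directed graph has no cycles (is a DAG).
// adj[v] lists every w such that (v, w) is an edge.
bool isAcyclic(const std::vector<std::vector<int>>& adj) {
    // White = unvisited, Gray = on the current DFS path, Black = finished.
    enum Color { White, Gray, Black };
    std::vector<Color> color(adj.size(), White);

    std::function<bool(int)> dfs = [&](int v) {
        color[v] = Gray;
        for (int w : adj[v]) {
            if (color[w] == Gray) return false;            // edge back onto the path: cycle
            if (color[w] == White && !dfs(w)) return false;
        }
        color[v] = Black;
        return true;
    };

    for (std::size_t v = 0; v < adj.size(); ++v)
        if (color[v] == White && !dfs(static_cast<int>(v)))
            return false;
    return true;
}
```

Note that a single edge from a vertex to itself counts as a cycle of length 1, so it also makes the graph cyclic.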
An undirected graph is connected if there is a path from each vertex to each other vertex.
A directed graph with this property is called strongly connected.
The airline graph above is strongly connected. The assignment/declaration graph is not.
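A simple way to test strong connectivity: a digraph is strongly connected exactly when every vertex can be reached from one chosen vertex, and that vertex can be reached from every other vertex. The second condition can be checked by searching the graph with all its edges reversed. The sketch below uses breadth-first search and assumes vertices numbered from 0 with an adjacency-list representation; `countReachable` and `isStronglyConnected` are hypothetical names.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

using AdjList = std::vector<std::vector<int>>;

// Number of vertices reachable from `start` (including itself), via BFS.
std::size_t countReachable(const AdjList& adj, int start) {
    std::vector<bool> seen(adj.size(), false);
    std::queue<int> q;
    seen[start] = true;
    q.push(start);
    std::size_t count = 1;
    while (!q.empty()) {
        int v = q.front();
        q.pop();
        for (int w : adj[v])
            if (!seen[w]) {
                seen[w] = true;
                ++count;
                q.push(w);
            }
    }
    return count;
}

// Strongly connected iff vertex 0 reaches every vertex, and every vertex
// reaches vertex 0 (i.e., vertex 0 reaches everything in the reversed graph).
bool isStronglyConnected(const AdjList& adj) {
    if (adj.empty()) return true;
    AdjList reversed(adj.size());
    for (std::size_t v = 0; v < adj.size(); ++v)
        for (int w : adj[v])
            reversed[w].push_back(static_cast<int>(v));
    return countReachable(adj, 0) == adj.size() &&
           countReachable(reversed, 0) == adj.size();
}
```

For an undirected graph stored with both $(v,w)$ and $(w,v)$, the same function tests ordinary connectivity.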
Usually when we discuss graphs, we make the assumption that there is at most one edge between any two vertices.
This means that, for a digraph, $|E| \leq |V|^2$, since each vertex can have at most $|V|$ vertices adjacent to it (including itself – there is no rule against edges from a vertex to itself, although some problems involving graphs may not permit this).
For an undirected graph, counting each edge between $v$ and $w$ once and excluding edges from a vertex to itself, $|E| \leq \frac{|V|(|V|-1)}{2}$.
In general, it is safe to say that $|E| \in O(|V|^2)$.
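As a worked instance of these bounds, take $|V| = 9$, assuming for the undirected case that each edge is counted once and self-loops are excluded:

$$
|E|_{\text{directed}} \le |V| \cdot |V| = 9 \cdot 9 = 81,
\qquad
|E|_{\text{undirected}} \le \binom{|V|}{2} = \frac{|V|(|V|-1)}{2} = \frac{9 \cdot 8}{2} = 36.
$$

Both bounds grow quadratically in $|V|$, which is why $|E| \in O(|V|^2)$ either way.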
There are some graph-related problems, however, that use multi-graphs, which permit multiple edges between the same pair of vertices.
For example, a public transit system might be modeled as a graph with bus stops & train stations as vertices, and an edge for each scheduled bus/train between two vertices. In that case a pair of stops may share several connecting edges, each marked with a different departure and arrival time.
Multi-graphs are not subject to the limitation $|E| \in O(|V|^2)$.
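A minimal sketch of the transit example as a multi-graph: each scheduled trip is its own edge, so the same pair of stops can be joined many times. The `Trip` and `TransitMultigraph` types, the numbering of stops from 0, and the use of minutes after midnight for times are all assumptions made for illustration.

```cpp
#include <vector>

// Hypothetical edge: one scheduled bus or train trip.
struct Trip {
    int to;             // destination stop (stops numbered 0..n-1)
    int departMinute;   // departure time, minutes after midnight
    int arriveMinute;   // arrival time, minutes after midnight
};

struct TransitMultigraph {
    std::vector<std::vector<Trip>> trips;   // trips[v] = trips leaving stop v

    explicit TransitMultigraph(int stops) : trips(stops) {}

    void addTrip(int from, int to, int depart, int arrive) {
        trips[from].push_back({to, depart, arrive});
    }

    // Number of parallel edges from `from` to `to`.
    int countTrips(int from, int to) const {
        int n = 0;
        for (const Trip& t : trips[from])
            if (t.to == to) ++n;
        return n;
    }
};
```

Because nothing stops us from adding another trip between the same two stops, $|E|$ here is bounded by the timetable, not by $|V|^2$.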
Unless specified otherwise, we will be concentrating on graphs, not multi-graphs, in this course.
One of the most direct approaches to implementing a graph is to use the same approach we employed with trees: create a class/struct to represent the vertices (for trees, nodes), linked by pointers.
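```cpp
#include <vector>

// A vertex holds a value plus pointers to the vertices adjacent to it.
template <typename T>
class Vertex
{
public:
  T value;
  std::vector<Vertex<T>*> neighbors;

  Vertex (const T& item = T()):
    value(item)
  {}
};
```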
Then, to build a graph, it's a matter of allocating the vertices and setting up the pointers, e.g.,
```cpp
std::vector<Vertex<int>*> sampleGraph;
for (int i = 0; i < 9; ++i)
  sampleGraph.push_back(new Vertex<int>(i+1));   // vertices numbered 1..9

// Vertex 1 (index 0) has edges to vertices 3, 4, and 5 (indices 2, 3, 4).
sampleGraph[0]->neighbors.push_back(sampleGraph[2]);
sampleGraph[0]->neighbors.push_back(sampleGraph[3]);
sampleGraph[0]->neighbors.push_back(sampleGraph[4]);
// ⋮
```
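Major problem: memory management. Every vertex is allocated with `new`, and because several vertices may point to the same vertex (and the pointers may even form cycles), it is not obvious which part of the program is responsible for deleting each vertex, or when it is safe to do so.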
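One possible remedy, sketched here as an assumption rather than the only design: let a single graph object own every vertex through `std::unique_ptr`, while the neighbor lists hold plain, non-owning pointers. Destroying the graph then frees each vertex exactly once, no matter how many edges (or cycles) point at it. The `Graph` class and its `addVertex`/`addEdge` members are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

template <typename T>
class Vertex {
public:
    T value;
    std::vector<Vertex<T>*> neighbors;   // non-owning links to other vertices
    Vertex(const T& item = T()) : value(item) {}
};

// The graph owns all vertices; edges never own anything.
template <typename T>
class Graph {
public:
    Vertex<T>* addVertex(const T& value) {
        vertices.push_back(std::make_unique<Vertex<T>>(value));
        return vertices.back().get();
    }
    void addEdge(Vertex<T>* from, Vertex<T>* to) {
        from->neighbors.push_back(to);
    }
    std::size_t size() const { return vertices.size(); }

private:
    std::vector<std::unique_ptr<Vertex<T>>> vertices;   // freed when the Graph is destroyed
};
```

The cost is that vertex pointers must not outlive the `Graph` that created them.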
A common alternative is to number the vertices and keep them in an array, so that edges can be described by vertex numbers rather than by pointers.
An adjacency matrix indicates which vertices are connected: the entry in row $v$, column $w$ is 1 if there is an edge $(v,w)$, and 0 if there is no such edge.
The adjacency matrix for this graph would be:
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|
1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
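A minimal adjacency-matrix sketch, assuming vertices numbered $1 \ldots n$ as in the table (row and column 0 are simply left unused). The class name `AdjacencyMatrix` and its members are hypothetical. Testing for an edge is a single lookup, but listing the vertices adjacent to $v$ means scanning an entire row.

```cpp
#include <vector>

class AdjacencyMatrix {
public:
    // (n+1) x (n+1) grid so vertex numbers can be used directly as indices.
    explicit AdjacencyMatrix(int vertexCount)
        : n(vertexCount),
          m(vertexCount + 1, std::vector<bool>(vertexCount + 1, false)) {}

    void addEdge(int v, int w) { m[v][w] = true; }

    // O(1): is there an edge (v, w)?
    bool hasEdge(int v, int w) const { return m[v][w]; }

    // O(|V|): the whole row must be scanned to list v's adjacent vertices.
    std::vector<int> successors(int v) const {
        std::vector<int> result;
        for (int w = 1; w <= n; ++w)
            if (m[v][w]) result.push_back(w);
        return result;
    }

private:
    int n;
    std::vector<std::vector<bool>> m;
};
```

Storage is $|V|^2$ entries regardless of how many edges the graph actually has.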
A common structure that gives better time and space performance for sparse graphs (graphs in which $|E|$ is much smaller than $|V|^2$) is the adjacency list. For each vertex, we keep a list of the vertices adjacent to it.
The adjacency list for this graph would be:

- 1: 3, 4, 5
- 2: 6
- 3: 6
- 4: 7
- 5: 7, 8, 9
- 6: (none)
- 7: (none)
- 8: (none)
- 9: 5
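Compared with the adjacency matrix, the adjacency list

- is more flexible in terms of storage use, needing space proportional to $|V| + |E|$ rather than $|V|^2$;
- requires $O(|V|)$ time to test whether $v_1$ is adjacent to $v_2$, since a list may have to be scanned, whereas the matrix answers in $O(1)$ time;
- makes it easier to iterate over all vertices or all edges.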
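A minimal adjacency-list sketch for the same 9-vertex graph, again assuming vertices numbered $1 \ldots n$ with index 0 unused; `AdjacencyList`, `hasEdge`, and `allEdges` are hypothetical names. It illustrates both trade-offs above: an edge test scans one list, while visiting every edge touches each list once, for $O(|V| + |E|)$ work in total.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

class AdjacencyList {
public:
    explicit AdjacencyList(int vertexCount) : adj(vertexCount + 1) {}

    void addEdge(int v, int w) { adj[v].push_back(w); }

    // Worst case O(|V|): v's list may hold up to |V| entries.
    bool hasEdge(int v, int w) const {
        return std::find(adj[v].begin(), adj[v].end(), w) != adj[v].end();
    }

    // O(|V| + |E|): each list is visited once, each edge reported once.
    std::vector<std::pair<int, int>> allEdges() const {
        std::vector<std::pair<int, int>> result;
        for (std::size_t v = 1; v < adj.size(); ++v)
            for (int w : adj[v])
                result.emplace_back(static_cast<int>(v), w);
        return result;
    }

private:
    std::vector<std::vector<int>> adj;   // adj[v] = vertices adjacent to v
};
```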