2.2 Mark and Sweep
Mark and sweep is one of the
earliest and best-known garbage collection algorithms.
- It handles cycles correctly, but
- requires significant support from the
compiler and the run-time system.
Assumptions
The core assumptions of mark and sweep are:
- Each object on the heap has a hidden "mark"
bit.
- We can find all pointers outside the heap
(i.e., in the activation stack and static area).
- For each data object on the heap, we can find
all pointers within that object.
- We can iterate over all objects on the
heap.
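One way to picture these assumptions in C++ is sketched below. The names GCObject, Runtime, and ListNode are hypothetical, invented here for illustration. A real run-time system would usually get this information from compiler-generated tables rather than from a virtual function.

```cpp
#include <vector>

// Hypothetical base class for every object the collector manages.
struct GCObject {
    bool marked = false;            // the hidden "mark" bit
    virtual ~GCObject() = default;
    // Assumption 3: each object can report the pointers stored inside it.
    virtual std::vector<GCObject*> pointers() const = 0;
};

// Hypothetical bookkeeping kept by the run-time system.
struct Runtime {
    std::vector<GCObject**> roots;  // addresses of pointers outside the heap
    std::vector<GCObject*> heap;    // every allocated object, so we can iterate
};

// An illustrative heap object: a singly linked list node.
struct ListNode : GCObject {
    int data = 0;
    ListNode* next = nullptr;

    std::vector<GCObject*> pointers() const override {
        std::vector<GCObject*> out;
        if (next != nullptr) out.push_back(next);
        return out;
    }
};
```

The mark bit lives in the base class so that ordinary code never needs to see it, which is the sense in which it is "hidden".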
The Mark and Sweep Algorithm
With those assumptions, the mark and sweep
garbage collector is pretty simple:
void markAndSweep()
{
  // Mark phase: trace from every pointer outside the heap.
  for (all pointers P on the run-time stack or
       in the static data area)
  {
    mark(P);
  }
  // Sweep phase: reclaim whatever was not reached.
  for (all objects *P on the heap)
  {
    if (*P is not marked)
      delete P;
    else
      unmark *P;
  }
}

template <class T>
void mark(T* p)
{
  if (*p is not already marked)
  {
    set the mark bit of *p;
    for (all pointers q inside *p)
    {
      mark(q);   // recursive call
    }
  }
}
The algorithm works in two stages.
- In the first stage, we start from every pointer
outside the heap and recursively mark each object reachable via
that pointer. (In graph terms, this is a depth-first traversal of
objects on the heap.)
- In the second stage, we look at each item on
the heap.
- If it’s marked, then we have demonstrated
that it’s possible to reach that object from a pointer
outside the heap.
- It isn’t garbage, so we leave it alone
(but clear the mark so we’re ready to repeat the whole
process at some time in the future).
- If the object on the heap is not marked, then
it’s garbage and we scavenge it.
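The two stages can be sketched in runnable C++ as follows. This is a toy model under stated assumptions, not a production collector: Obj, Heap, allocate(), and the explicit roots vector are hypothetical stand-ins for the compiler and run-time support described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical heap object: a mark bit plus the pointers it contains.
struct Obj {
    bool marked = false;
    std::vector<Obj*> refs;
};

// Hypothetical toy heap that tracks every allocation and every root.
struct Heap {
    std::vector<Obj*> objects;  // all objects currently on the heap
    std::vector<Obj*> roots;    // stand-in for stack and static pointers

    Obj* allocate() {
        Obj* o = new Obj;
        objects.push_back(o);
        return o;
    }

    // Stage 1: recursively mark everything reachable from p.
    static void mark(Obj* p) {
        if (p == nullptr || p->marked) return;
        p->marked = true;
        for (Obj* q : p->refs) mark(q);
    }

    // Runs both stages and returns the number of objects reclaimed.
    std::size_t markAndSweep() {
        for (Obj* r : roots) mark(r);

        // Stage 2: delete unmarked objects, clear the mark on the rest.
        std::size_t freed = 0;
        std::vector<Obj*> survivors;
        for (Obj* o : objects) {
            if (o->marked) {
                o->marked = false;
                survivors.push_back(o);
            } else {
                delete o;
                ++freed;
            }
        }
        objects.swap(survivors);
        return freed;
    }
};
```

If two objects point at each other but neither is reachable from a root, neither gets marked, so both are reclaimed. That is why cycles cause no trouble for this approach.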
Mark and Sweep Example
As an example, suppose that we start with this
data.
Then, let’s assume that the
local variable holding the list header is destroyed.
Mark and Sweep Example II
- At some later point, the mark and sweep
algorithm is started (typically because our code has
tried to allocate something new and the run-time system
has discovered that it is running low on available storage).
- The main algorithm begins the marking phase,
looping through the pointers in the activation stack.
- There are two such pointers. The first points to the Boston
node, so we invoke the mark()
function on the pointer to Boston.
- The Boston node has not been marked yet, so we
mark it.
- Then the mark() function iterates over the
pointers in the Boston object. It first looks at the N.Y. pointer
and recursively invokes itself on that.
- The N.Y. object has not been marked yet, so we
mark it and then iterate over the pointers in N.Y.
We first come to the pointer to Boston, and
recursively invoke mark() on that. But Boston is already marked, so
we return immediately to the N.Y. call. Continuing on, we find a
pointer to Wash DC and invoke mark() on that.
The Wash DC object has not been marked yet, so we
mark it and then iterate over the pointers in Wash DC. We first
come to the pointer to Boston, and recursively invoke mark() on
that. But Boston is already marked, so we return immediately to the
Wash DC call. Following the pointer to N.Y. gives the same result:
that object is already marked, so we return immediately. The Wash DC
call has now visited all of its pointers, so it returns to the
earlier N.Y. call. That one has also visited all of
its pointers, so it returns to the first Boston call.
The Boston call resumes iterating over its
pointers, and finds a pointer to Wash DC. It calls mark() on that
pointer, but Wash DC has already been marked, so we return
immediately. The Boston call has now iterated over all of its
pointers, so we return to the main mark and sweep algorithm.
That algorithm continues looking at pointers on
the activation stack. We have a pointer to N.Y., and call mark() on
that. But N.Y. is already marked, so we return immediately.
Mark and Sweep Example III
Once the mark phase of the main algorithm is
complete,
- We have marked the Boston, N.Y., and Wash DC
objects.
- The Norfolk, Raleigh, Adams, Baker, and Davis
objects are unmarked.
The Sweep Phase
In the sweep phase, we visit each object on the
heap.
- The three marked hubs will be kept, but their
marks will be cleared in preparation for running the algorithm
again at some time in the future.
- All of the other objects will be
scavenged.
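The example can be replayed in code. The sketch below is illustrative only. The links among the three hubs follow the walkthrough, but the links out of Norfolk, Raleigh, Adams, Baker, and Davis are assumptions, because the text does not spell them out. Node, SweepResult, and collectExample() are hypothetical names, and "scavenging" here just records the name instead of freeing memory.

```cpp
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string name;
    std::vector<Node*> refs;   // pointers inside this object
    bool marked = false;
    explicit Node(std::string n) : name(std::move(n)) {}
};

void mark(Node* p) {
    if (p == nullptr || p->marked) return;
    p->marked = true;
    for (Node* q : p->refs) mark(q);
}

struct SweepResult {
    std::vector<std::string> kept;
    std::vector<std::string> scavenged;
};

SweepResult collectExample() {
    const char* names[] = {"Boston", "N.Y.", "Wash DC", "Norfolk",
                           "Raleigh", "Adams", "Baker", "Davis"};
    std::vector<Node> heap;
    for (const char* n : names) heap.emplace_back(n);

    Node* boston = &heap[0];  Node* ny = &heap[1];      Node* dc = &heap[2];
    Node* norfolk = &heap[3]; Node* raleigh = &heap[4];
    Node* adams = &heap[5];   Node* baker = &heap[6];   Node* davis = &heap[7];

    // Hub links, in the order the walkthrough visits them.
    boston->refs = {ny, dc};
    ny->refs = {boston, dc};
    dc->refs = {boston, ny};
    // Assumed links: nothing reachable points at these five objects.
    norfolk->refs = {dc};
    raleigh->refs = {dc};
    adams->refs = {baker, norfolk};
    baker->refs = {davis, raleigh};
    davis->refs = {};

    // Mark phase: the two remaining stack pointers are the roots.
    std::vector<Node*> roots = {boston, ny};
    for (Node* r : roots) mark(r);

    // Sweep phase: keep and unmark marked objects, scavenge the rest.
    SweepResult result;
    for (Node& n : heap) {
        if (n.marked) {
            n.marked = false;
            result.kept.push_back(n.name);
        } else {
            result.scavenged.push_back(n.name);
        }
    }
    return result;
}
```

Under these assumed links, collectExample() keeps Boston, N.Y., and Wash DC and scavenges the other five objects. Garbage objects may still point at live ones (Norfolk points at Wash DC here), but that does not keep the garbage alive.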
Assessing Mark and Sweep
In practice, the recursive form of mark-and-sweep
requires too much stack space.
- It can frequently result in recursive calls of
the mark() function running thousands deep.
- Even when marking is rewritten to avoid deep recursion, systems that use
mark and sweep are often criticized as slow. Tracing
every object on the heap can be quite
time-consuming. On virtual memory systems, it can result in an
extraordinary number of page faults. The net effect is that
mark-and-sweep systems often appear to freeze up for seconds to
minutes at a time when the garbage collector is running. There are
a couple of ways to improve performance.
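Returning to the stack-depth concern above, one common way to avoid recursive calls thousands deep is to replace the recursion with an explicit worklist. The sketch below is one possible formulation, not necessarily the one any particular system uses; Cell and markIterative() are hypothetical names. Note that the worklist still consumes memory, just not call-stack memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical heap object: a mark bit plus the pointers it contains.
struct Cell {
    bool marked = false;
    std::vector<Cell*> refs;
};

// Marks everything reachable from root without recursion.
// Returns the number of objects newly marked by this call.
std::size_t markIterative(Cell* root) {
    std::size_t count = 0;
    std::vector<Cell*> pending;          // explicit stack of objects to visit
    if (root != nullptr) pending.push_back(root);

    while (!pending.empty()) {
        Cell* p = pending.back();
        pending.pop_back();
        if (p->marked) continue;         // already handled via another path
        p->marked = true;
        ++count;
        for (Cell* q : p->refs) {
            if (q != nullptr && !q->marked) pending.push_back(q);
        }
    }
    return count;
}
```

A long linked chain of objects, which would drive the recursive mark() one call deeper per node, is handled here in a single loop with the call depth staying constant.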