Sharing Pointers and Garbage Collection

Steven J Zeil

Last modified: Mar 21, 2014

Swearing by Sharing

We’ve talked a lot about using pointers to share information, but mainly as something that causes problems.

1. Shared Structures

In this section, we will introduce three examples that we will explore further in the remainder of the lesson. All three involve some degree of essential sharing.

1.1 Singly Linked Lists

We’ll start with a fairly prosaic example. In its simplest form, a singly linked list involves no sharing, and so we could safely treat all of its components as deep-copied.

SLL Destructors

In particular, we can take a simple approach of writing the destructors — if you have a pointer, delete it:
struct SLNode {
   string data;
   SLNode* next;
     ⋮
   ~SLNode () {delete next;}
};

class List {
   SLNode* first;
public:
     ⋮
   ~List() {delete first;}
};

Problem: each node’s destructor recursively invokes the destructor of the next node, so the stack depth is \(O(N)\), where \(N\) is the length of the list.

Destroy the List, not the Nodes

struct SLNode {
   string data;
   SLNode* next;
     ⋮
   ~SLNode () {/* do nothing */}
};

class List {
   SLNode* first;
public:
     ⋮
   ~List() 
    {
      while (first != 0)
        {
          SLNode* next = first->next;
          delete first;
          first = next;
        }
    }
};

This avoids stacking up large numbers of recursive calls.

First-Last Headers

But now let’s consider one of the more common variations on linked lists.

So, if we were to extend our basic approach of writing destructors that simply delete their pointers:

Aggressively Deleting Pointers

struct SLNode {
   string data;
   SLNode* next;
     ⋮
   ~SLNode () {delete next;}
};

class List {
   SLNode* first;
   SLNode* last;
public:
     ⋮
   ~List() {delete first; delete last;}
};

Then, when a list object is destroyed, the final node in the list will actually be deleted twice.

1.2 Doubly Linked Lists

Now, let’s make things just a little more difficult.

If we consider doubly linked lists, our straightforward approach of “delete everything” is really going to be a problem.

struct DLNode {
   string data;
   DLNode* prev;
   DLNode* next;
     ⋮
   ~DLNode () {delete prev; delete next;}
};

class List {
   DLNode* first;
   DLNode* last;
public:
     ⋮
   ~List() {delete first; delete last;}
};

Deleting the DLL

Deleting and Cycles

We’re now in an infinite recursion.

What makes this so much nastier than the singly linked list?

1.3 Airline Connections

Lest you think that this issue only arises in low-level data structures, let’s consider how it might arise in programming at the application level.

This graph illustrates flight connections available from an airline.

Aggressively Deleting a Graph

If we were to implement this airport graph with Big 3-style operations:

class Airport
{
   ⋮

private:
   vector<Airport*> hasFlightsTo;
};

Airport::~Airport()
{
   for (int i = 0; i < hasFlightsTo.size(); ++i)
      delete hasFlightsTo[i];
}

we would quickly run into a disaster.

Deleting the Graph

Suppose that we delete the Boston airport.

The Airline

Now, you might wonder just how or why we would have deleted that Boston pointer in the first place.

class AirLine {
   ⋮
   string name;
   map<string, Airport*> hubs;
};


AirLine::~AirLine()
{
   for (map<string, Airport*>::iterator i = hubs.begin();
        i != hubs.end(); ++i)
     delete i->second;
}

The AirLine Structure

Suppose that PuddleJumper Air goes bankrupt.

Can We Do Better?

Now, that’s a problem. But what makes this example particularly vexing is that it’s not all that obvious what would constitute a better approach.

Changing the Hubs

Suppose that Wash DC were to lose its status as a hub.

Even though the pointer to it was removed from the hubs table, the Wash DC airport needs to remain in the map.

Changing the Connections

On the other hand, if Wash DC were to drop its service to Norfolk, one might argue that Norfolk and Raleigh should then be deleted, as there would be no way to reach them.

2. Garbage Collection

Garbage

Objects on the heap that can no longer be reached (in one or more hops) from any pointers in the activation stack or from any pointers in the static storage area are called garbage.

Garbage Example

Garbage Collection

Determining when something on the heap has become garbage is sufficiently difficult that many programming languages take over this job for the programmer.

The runtime support system for these languages provides automatic garbage collection, a service that determines when an object on the heap has become garbage and automatically scavenges (reclaims the storage of) such objects.

Java has GC

Although Java and C++ look very similar, in Java there is no “delete” operator.

Java programmers use many more pointers than typical C++ programmers do.

But Java programmers never worry about deleting anything. They just trust in the garbage collector to come along eventually and clean up the mess.

C++ Does Not

Automatic garbage collection really can simplify a programmer’s life. Sadly, C++ does not support automatic garbage collection.

But how is this magic accomplished (and why doesn’t C++ support it)?

2.1 Reference Counting

Reference counting is one of the simplest techniques for implementing garbage collection.

Reference Counting Example


For example, here’s our airline example with reference counts. Now, suppose that Wash DC loses its hub status.

Reference Counting Example II

Now, suppose that Wash DC drops its service to Norfolk.

Reference Counting Example III

So the Norfolk object can be scavenged.

Reference Counting Example IV

Doing that reduces N.Y.’s reference count, but the count stays above zero, so we don’t try to scavenge N.Y.

Can we do this?

Implementing reference counting requires that we take control of pointers.

A Reference Counted Pointer

Here is an (incomplete) sketch of a reference counted pointer ADT (which I will call a “smart pointer” for short).

refCountPtr.h
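To make the idea concrete, here is a minimal sketch of what a reference counted pointer could look like. The class name `RefCountPtr`, the member `useCount()`, and the `Tracked` helper are assumptions made for this illustration, not the contents of refCountPtr.h. The counter lives on the heap so that every copy of the pointer shares it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical sketch of a reference counted pointer; the name RefCountPtr
// and its members are assumptions for illustration.
template <typename T>
class RefCountPtr {
    T* ptr;              // the shared object (may be null)
    std::size_t* count;  // shared counter: how many RefCountPtrs refer to *ptr

    void release() {
        if (count != nullptr && --(*count) == 0) {
            delete ptr;    // last reference is gone: scavenge the object
            delete count;
        }
        ptr = nullptr;
        count = nullptr;
    }

public:
    explicit RefCountPtr(T* p = nullptr)
        : ptr(p), count(p ? new std::size_t(1) : nullptr) {}

    RefCountPtr(const RefCountPtr& other)
        : ptr(other.ptr), count(other.count) {
        if (count != nullptr) ++(*count);
    }

    RefCountPtr& operator=(RefCountPtr other) {  // copy-and-swap
        std::swap(ptr, other.ptr);
        std::swap(count, other.count);
        return *this;   // 'other' releases our old reference when it is destroyed
    }

    ~RefCountPtr() { release(); }

    T& operator*() const { assert(ptr != nullptr); return *ptr; }
    T* operator->() const { assert(ptr != nullptr); return ptr; }

    std::size_t useCount() const { return count ? *count : 0; }
};

// Hypothetical helper: counts how many Tracked objects are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;
```

Copying a `RefCountPtr<Tracked>` raises `useCount()`, destroying a copy lowers it, and `Tracked::alive` drops to zero only when the last copy disappears. Note that every copy, assignment, and destruction now pays for a counter update, which is part of the cost to weigh.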

Is it worth the effort?

Disappearing Airline

Let’s return to our original airline example, with reference counts.

Leaky Airports

Here is the result, with the updated reference counts.

Ref Counted SLL

Here is our singly linked list with reference counts.

Assume that the list header itself is a local variable that is about to be destroyed.

Ref Counted SLL II

So that works just fine!

Ref Counted DLL

Now let’s look at our doubly linked list.

Again, let’s assume that the list header itself is a local variable that is about to be destroyed.

Ref Counted DLL II

Here’s the result.

Alas, we can see that none of the reference counters have gone to zero, so nothing will be scavenged, even though all three nodes are garbage.
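We can reproduce this failure in a few lines. In the sketch below, `std::shared_ptr` from C++11 stands in for our reference counted pointer, and `Node`, `liveNodes`, and `leakedAfterListDestroyed` are hypothetical names chosen for the demonstration. The `std::weak_ptr` observers do not affect the counts; they exist only so that the demo can break the cycles and clean up after itself.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Illustration only: std::shared_ptr stands in for the reference counted
// pointer. Node and liveNodes are hypothetical names.
static int liveNodes = 0;

struct Node {
    std::string data;
    std::shared_ptr<Node> prev;   // counted
    std::shared_ptr<Node> next;   // counted
    explicit Node(const std::string& d) : data(d) { ++liveNodes; }
    ~Node() { --liveNodes; }
};

// Builds a three-node doubly linked list, lets the header pointers go out of
// scope, and returns the number of nodes that were NOT reclaimed.
int leakedAfterListDestroyed() {
    std::weak_ptr<Node> watchB, watchC;   // observers; they do not count
    {
        std::shared_ptr<Node> first = std::make_shared<Node>("A");
        std::shared_ptr<Node> b = std::make_shared<Node>("B");
        std::shared_ptr<Node> last = std::make_shared<Node>("C");
        first->next = b;  b->prev = first;
        b->next = last;   last->prev = b;
        watchB = b;
        watchC = last;
        assert(liveNodes == 3);
    }   // header pointers destroyed here; every count stays above zero
    int leaked = liveNodes;

    // Clean up so the demo itself does not leak: resetting the prev links
    // removes the cycles, and the counts can then fall to zero.
    if (std::shared_ptr<Node> n = watchB.lock()) n->prev.reset();
    if (std::shared_ptr<Node> n = watchC.lock()) n->prev.reset();
    return leaked;
}
```

Each adjacent pair of nodes forms a cycle through `next` and `prev`, so each node is always kept alive by a neighbor, and the function reports all three nodes as still allocated after the header is gone.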

Reference Counting’s Achilles Heel

What’s the common factor between the failures in the first and third examples?

2.2 Mark and Sweep

Mark and sweep is one of the earliest and best-known garbage collection algorithms.

Assumptions

The core assumptions of mark and sweep are:

The Mark and Sweep Algorithm

With those assumptions, the mark and sweep garbage collector is pretty simple:

markAndSweep.cpp

The algorithm works in two stages.
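As a rough sketch of the two stages, assume a toy heap in which every object carries a mark bit and a list of the pointers it contains. All of the names here (`Obj`, `ToyHeap`, `roots`, `collect`) are hypothetical, and the `roots` vector merely stands in for the activation stack and static storage area.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, all names hypothetical: each object knows the pointers it
// contains and carries a mark bit.
struct Obj {
    std::vector<Obj*> pointers;  // pointers stored inside this object
    bool marked = false;
};

struct ToyHeap {
    std::vector<Obj*> objects;   // every object allocated and not yet freed
    std::vector<Obj*> roots;     // stands in for the stack and static storage

    Obj* allocate() {
        objects.push_back(new Obj);
        return objects.back();
    }

    // Mark phase: recursively flag everything reachable from p.
    static void mark(Obj* p) {
        if (p == nullptr || p->marked) return;   // the mark bit stops cycles
        p->marked = true;
        for (Obj* q : p->pointers) mark(q);
    }

    // Sweep phase: visit every heap object, free the unmarked ones, and
    // clear the mark on the survivors. Returns the number of objects freed.
    std::size_t sweep() {
        std::size_t freed = 0;
        std::vector<Obj*> survivors;
        for (Obj* p : objects) {
            assert(p != nullptr);   // the heap list never holds null
            if (p->marked) {
                p->marked = false;
                survivors.push_back(p);
            } else {
                delete p;
                ++freed;
            }
        }
        objects.swap(survivors);
        return freed;
    }

    std::size_t collect() {
        for (Obj* r : roots) mark(r);
        return sweep();
    }

    ~ToyHeap() { for (Obj* p : objects) delete p; }
};
```

Because marking starts only from the roots, a cycle of objects that nothing else points to is never marked and is reclaimed by the sweep. That is exactly the case that defeats reference counting.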

Mark and Sweep Example

As an example, suppose that we start with this data.

Then, let’s assume that the local variable holding the list header is destroyed.

Mark and Sweep Example II

Mark and Sweep Example III

Once the mark phase of the main algorithm is complete,

The Sweep Phase

In the sweep phase, we visit each object on the heap.

Assessing Mark and Sweep

In practice, the recursive form of mark-and-sweep requires too much stack space.

Practical implementations of mark-and-sweep have countered this problem with an iterative version of the mark function that “reverses” the pointers it is exploring so that they leave a trace behind it of where to return to.
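A sketch of the pointer-reversing idea, under the simplifying assumption that every object holds exactly two pointers. This is the technique usually called Deutsch-Schorr-Waite marking. The names `BNode`, `state`, and `markByPointerReversal` are hypothetical. While the traversal is below a node, one of that node's own pointer fields temporarily holds the way back up, so no stack is needed.

```cpp
#include <cassert>

// Hypothetical two-pointer node; 'state' records which child is being explored.
struct BNode {
    BNode* left = nullptr;
    BNode* right = nullptr;
    bool marked = false;
    int state = 0;   // 0: exploring the left side, 1: exploring the right side
};

// Marks everything reachable from root without recursion or an auxiliary stack.
void markByPointerReversal(BNode* root) {
    BNode* prev = nullptr;   // node we came from (head of the reversed chain)
    BNode* cur = root;       // node we are about to visit
    while (true) {
        // Advance: descend through left pointers, reversing each one.
        while (cur != nullptr && !cur->marked) {
            cur->marked = true;
            cur->state = 0;
            BNode* next = cur->left;
            cur->left = prev;          // left now points back toward the root
            prev = cur;
            cur = next;
        }
        // Retreat: climb out of nodes whose right side is finished,
        // restoring each right pointer on the way up.
        while (prev != nullptr && prev->state == 1) {
            BNode* parent = prev->right;   // the back pointer was stored here
            prev->right = cur;             // restore the original right child
            cur = prev;
            prev = parent;
        }
        if (prev == nullptr) {             // climbed past the root: done
            assert(cur == root);
            return;
        }
        // Switch: the left side of prev is finished. Restore its left pointer,
        // move the back pointer into right, and explore the right child.
        prev->state = 1;
        BNode* parent = prev->left;
        prev->left = cur;
        cur = prev->right;
        prev->right = parent;
    }
}
```

When the function returns, every reachable node is marked and every `left` and `right` field holds its original value again. The price is an extra state field per object and a heap that is temporarily scrambled while the marker runs.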

2.3 Generation-Based Collectors

Old versus New Garbage

In many programs, people have observed that object lifetime tends toward the extreme possibilities.

Generational GC

Generational collectors take advantage of this behavior by dividing the heap into “generations”.

2.4 Incremental Collection

Another way to avoid the appearance that garbage collection is locking up the system is to modify the algorithm so that it can be run one small piece at a time.

3. Strong and Weak Pointers

Doing Without

OK, garbage collection is great if you can get it.

Ownership

One approach that works in many cases is to try to identify which ADTs are the owners of the shared data, and which ones merely use the data.

Ownership Example


In this example that we looked at earlier, we saw that if both the Airline object on the left and the Airport objects on the right deleted their own pointers when destroyed, our program would crash.

Ownership Example

We could improve this situation by deciding that the Airline owns the Airport descriptors that it uses. So the Airline object would delete the pointers it has, but the Airports would never do so.

class Airport
{
   ⋮

private:
   vector<Airport*> hasFlightsTo;
};

Airport::~Airport()
{
  /* for (int i = 0; i < hasFlightsTo.size(); ++i)
      delete hasFlightsTo[i]; */
}

class AirLine {
   ⋮
   string name;
   map<string, Airport*> hubs;
};


AirLine::~AirLine()
{
   for (map<string, Airport*>::iterator i = hubs.begin();
        i != hubs.end(); ++i)
     delete i->second;
}


Ownership Example


Thus, when the airline object on the left is destroyed, it will delete the Boston, N.Y., and Wash DC objects.

Asserting Ownership

I would probably resolve this by modifying the Airline class to keep better track of its Airports.


class AirLine {
   ⋮
   string name;
   set<string> hubs;
   map<string, Airport*> airportsServed;
};


AirLine::~AirLine()
{
   for (map<string, Airport*>::iterator i = airportsServed.begin();
        i != airportsServed.end(); ++i)
     delete i->second;
}


Asserting Ownership (cont.)

The new map tracks all of the airports served by this airline, and we use a separate data structure to indicate which of those airports are hubs.

Now, when an airline object is destroyed, all of its airport descriptors will be reclaimed as well.

Ownership Can Be Too Strong

Ownership is sometimes a bit too strong a relation to be useful.

Strong and Weak Pointers

We can generalize the notion of ownership by characterizing the various pointer data members as strong or weak.

When an object containing pointer data members is destroyed, it deletes its strong pointer members and leaves its weak ones alone.

Strong and Weak SLL

In this example, if we characterize the pointers as shown:

struct SLNode {
   string data;
   SLNode* next; // strong
     ⋮
   ~SLNode () {delete next;}
};

class List {
   SLNode* first; // strong
   SLNode* last;  // weak
public:
     ⋮
   ~List() 
    {
      delete first;  // OK, because this is strong
      /*delete last;*/ // Don't delete. last is weak.
    }
};

then our program will run correctly.

Picking the Strong Ones

The key idea is to select the smallest set of pointer data members that would connect together all of the allocated objects, while giving you exactly one path to each such object.

Strong and Weak DLL

Similarly, in a doubly linked list, we can designate the pointers as follows:

struct DLNode {
   string data;
   DLNode* prev; // weak
   DLNode* next; // strong
     ⋮
   ~DLNode () {delete next;}
};

class List {
   DLNode* first; // strong
   DLNode* last;  // weak
public:
     ⋮
   ~List() {delete first;}
};

and so achieve a program that recovers all garbage without deleting anything twice.

4. C++11: std Reference Counting

The new C++11 standard contains smart pointer templates, quite similar in concept to the RefCountPointer discussed earlier.

shared and weak ptrs

There are two primary class templates involved: shared_ptr and weak_ptr.
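As a sketch of how the two templates, `std::shared_ptr` and `std::weak_ptr`, fit the strong/weak designations we chose for the doubly linked list: the strong pointers become `shared_ptr` and the weak ones become `weak_ptr`. The names `SharedDLNode`, `SharedDLList`, `liveDLNodes`, and `nodesLeftAfterListDestroyed` are hypothetical and exist only for this demonstration.

```cpp
#include <cassert>
#include <memory>
#include <string>

static int liveDLNodes = 0;   // hypothetical counter, for demonstration only

struct SharedDLNode {
    std::string data;
    std::weak_ptr<SharedDLNode> prev;    // weak: does not keep the node alive
    std::shared_ptr<SharedDLNode> next;  // strong: counted
    explicit SharedDLNode(const std::string& d) : data(d) { ++liveDLNodes; }
    ~SharedDLNode() { --liveDLNodes; }
};

struct SharedDLList {
    std::shared_ptr<SharedDLNode> first;  // strong
    std::weak_ptr<SharedDLNode> last;     // weak

    void push_back(const std::string& d) {
        std::shared_ptr<SharedDLNode> node = std::make_shared<SharedDLNode>(d);
        // lock() yields a shared_ptr to the old last node, or null if none
        if (std::shared_ptr<SharedDLNode> oldLast = last.lock()) {
            node->prev = oldLast;
            oldLast->next = node;
        } else {
            first = node;
        }
        last = node;
    }
};

// Builds a three-node list, destroys it, and returns how many nodes remain.
int nodesLeftAfterListDestroyed() {
    {
        SharedDLList list;
        list.push_back("A");
        list.push_back("B");
        list.push_back("C");
        assert(liveDLNodes == 3);
        assert(list.first.use_count() == 1);  // only the header's strong pointer
    }   // list destroyed: first releases A, A releases B, B releases C
    return liveDLNodes;
}
```

Only the `shared_ptr` members contribute to the counts, so the `prev` links no longer form counted cycles and every node is reclaimed when the header goes away. The release still cascades from node to node, so a very long list reintroduces the \(O(N)\) stack depth we saw with the recursive destructors.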

5. Java Programmers Have it Easy

Java has included automatic garbage collection since its beginning.