Sharing Pointers and Garbage Collection

Steven J Zeil

Last modified: Oct 26, 2023

Swearing by Sharing

We’ve talked a lot about using pointers to share information, but mainly as something that causes problems.

Example

For some data structures, this is OK. If we are using a pointer mainly to give us access to a dynamically allocated array, we can copy the entire array as necessary. In the example shown here, we would want each catalog to get its own distinct array of books. So we would implement a deep copy for the assignment operator and copy constructor, and delete the allBooks pointer in the destructor.
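The deep-copy approach described above can be sketched roughly as follows. The names Catalog and allBooks come from the text; the Book type, the numBooks member, and the at and size functions are assumptions made only for illustration. Because allBooks refers to an array, the sketch releases it with delete [].

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical Book type; only the title matters for this sketch.
struct Book {
    std::string title;
};

class Catalog {
    Book* allBooks;          // dynamically allocated array, owned by this catalog
    std::size_t numBooks;    // assumed member: number of elements in allBooks
public:
    explicit Catalog(std::size_t n)
        : allBooks(new Book[n]), numBooks(n) {}

    // Copy constructor: allocate a distinct array and copy every element.
    Catalog(const Catalog& other)
        : allBooks(new Book[other.numBooks]), numBooks(other.numBooks)
    {
        std::copy(other.allBooks, other.allBooks + numBooks, allBooks);
    }

    // Assignment: build a deep copy first, then swap it into place.
    Catalog& operator=(const Catalog& other)
    {
        if (this != &other) {
            Catalog temp(other);
            std::swap(allBooks, temp.allBooks);
            std::swap(numBooks, temp.numBooks);
        }   // temp's destructor releases the array we used to hold
        return *this;
    }

    ~Catalog() { delete [] allBooks; }

    Book& at(std::size_t i) { return allBooks[i]; }
    std::size_t size() const { return numBooks; }
};

// Returns true if changing the copies leaves the original untouched.
bool copiesAreIndependent()
{
    Catalog a(2);
    a.at(0).title = "Emma";
    Catalog b(a);            // deep copy via the copy constructor
    b.at(0).title = "Ulysses";
    Catalog c(1);
    c = a;                   // deep copy via assignment
    c.at(0).title = "Dracula";
    return a.at(0).title == "Emma"
        && b.at(0).title == "Ulysses"
        && c.size() == 2;
}
```

Each catalog ends up with its own array, so destroying one catalog cannot invalidate another.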

1 Shared Structures

In this section, we will call out three examples that we will explore further in the remainder of the lesson. All three involve some degree of essential sharing.

1.1 Singly Linked Lists

 

We’ll start with a fairly prosaic example. In its simplest form, a singly linked list involves no sharing, and so we could safely treat all of its components as deep-copied.

1.1.1 SLL Destructors

In particular, we can take a simple approach of writing the destructors — if you have a pointer, delete it:

struct SLNode {
   string data;
   SLNode* next;
     ⋮
   ~SLNode () {delete next;}
};

class List {
   SLNode* first;
public:
     ⋮
   ~List() {delete first;}
};

Problem: each node's destructor invokes the destructor of the next node before it finishes, so the stack depth is $O(N)$ where $N$ is the length of the list.


Alternative: Destroy the List, not the Nodes

struct SLNode {
   string data;
   SLNode* next;
     ⋮
   ~SLNode () {/* do nothing */}
};

class List {
   SLNode* first;
public:
     ⋮
   ~List() 
    {
      while (first != 0)
        {
          SLNode* next = first->next;
          delete first;
          first = next;
        }
    }
};

This avoids stacking up large numbers of recursive calls.
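A compilable version of this idea might look like the sketch below. The liveNodes counter, the add function, and the helper nodesLeftAfterDestroying are hypothetical additions used only to observe that every node is reclaimed.

```cpp
#include <cassert>
#include <string>

struct SLNode {
    std::string data;
    SLNode* next;
    static int liveNodes;    // hypothetical counter, for demonstration only

    SLNode(const std::string& d, SLNode* nxt) : data(d), next(nxt) { ++liveNodes; }
    ~SLNode() { --liveNodes; }   // deliberately does not delete next
};
int SLNode::liveNodes = 0;

class List {
    SLNode* first;
public:
    List() : first(nullptr) {}
    List(const List&) = delete;              // copying is outside this sketch
    List& operator=(const List&) = delete;

    void add(const std::string& s) { first = new SLNode(s, first); }

    // The list walks its own nodes, so no destructor calls pile up.
    ~List()
    {
        while (first != nullptr) {
            SLNode* next = first->next;
            delete first;
            first = next;
        }
    }
};

// Builds a list of n nodes, destroys it, and reports how many nodes remain.
int nodesLeftAfterDestroying(int n)
{
    {
        List list;
        for (int i = 0; i < n; ++i)
            list.add("x");
        assert(SLNode::liveNodes == n);
    }   // list is destroyed here
    return SLNode::liveNodes;
}
```

Because the loop uses constant stack space, the list length is limited only by available heap memory.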

1.1.2 First-Last Headers

 

But now let’s consider one of the more common variations on linked lists.


FLH SLL Destructor

So, if we were to extend our basic approach of writing destructors that simply delete their pointers:

struct SLNode {
   string data;
   SLNode* next;
     ⋮
   ~SLNode () {delete next;}
};

class List {
   SLNode* first;
   SLNode* last;
public:
     ⋮
   ~List() {delete first; delete last;}
};

Then, when a list object is destroyed, the final node in the list will actually be deleted twice.

Deleting the same block of memory twice can corrupt the heap (by breaking the structure of the system free list) and eventually cause the program to fail.

1.2 Doubly Linked Lists



 

Now, let’s make things just a little more difficult.

If we consider doubly linked lists, our straightforward approach of “delete everything” is really going to be a problem.


DLL Aggressive Deleting

struct DLNode {
   string data;
   DLNode* prev;
   DLNode* next;
     ⋮
   ~DLNode () {delete prev; delete next;}
};

class List {
   DLNode* first;
   DLNode* last;
public:
     ⋮
   ~List() {delete first; delete last;}
};

 


Deleting and Cycles

We’re now in an infinite recursion.

What makes this so much nastier than the singly linked list?

1.3 Airline Connections



 

Lest you think that this issue only arises in low-level data structures, let’s consider how it might arise in programming at the application level.

This graph illustrates flight connections available from an airline.


Aggressively Deleting a Graph

If we were to implement this airport graph with Big 3-style operations:

class Airport
{
   ⋮

private:
   vector<Airport*> hasFlightsTo;
};

Airport::~Airport()
{
   for (int i = 0; i < hasFlightsTo.size(); ++i)
      delete hasFlightsTo[i];
}

we would quickly run into a disaster.


Deleting the Graph

 

Suppose that we delete the Boston airport.

This should not be a big surprise. Looking at the graph, we can see that it is possible to form cycles.


The Airline

Now, you might wonder just how or why we would have deleted that Boston pointer in the first place.

So, let’s add a bit of context.


The AirLine Structure

 

Suppose that PuddleJumper Air goes bankrupt.


Can We Do Better?

Now, that’s a problem. But what makes this example particularly vexing is that it’s not all that obvious what would constitute a better approach.


Changing the Hubs

 

Suppose that Wash DC were to lose its status as a hub.

Even though the pointer to it was removed from the hubs table, the Wash DC airport needs to remain in the map.


Changing the Connections

 

On the other hand, if Wash DC were to drop its service to Norfolk, one might argue that Norfolk and Raleigh should then be deleted, as there would be no way to reach them.

2 Garbage Collection

Objects on the heap that can no longer be reached (in one or more hops) from any pointers in the activation stack (i.e., in local variables of active functions) or from any pointers in the static storage area (variables declared in C++ as “static”) are called garbage.


Garbage Example

 


Garbage Collection

Determining when something on the heap has become garbage is sufficiently difficult that many programming languages take over this job for the programmer.

The runtime support system for these languages provides automatic garbage collection, a service that determines when an object on the heap has become garbage and automatically scavenges (reclaims the storage of) such objects.


Java has GC

The programming language Java, for example, looks very similar to C++. A lot of code written in one of these languages will work in the other.

But, in Java, there is no delete operator.

Java programmers use lots of pointers,1 many more than the typical C++ programmer.

But Java programmers never worry about deleting anything. They just trust in the garbage collector to come along eventually and clean up the mess.


C++ Does Not Have GC

Automatic garbage collection really can simplify a programmer’s life. Sadly, C++ does not support automatic garbage collection.

How automatic garbage collection can be implemented is the subject of the remainder of this section.

2.1 Reference Counting

Reference counting is one of the simplest techniques for implementing garbage collection.

2.1.1 Reference Counting Example

 

For example, here’s our airline example with reference counts.

Now, suppose that Wash DC loses its hub status…


2.1.2 Reference Counted Pointers in C++

Implementing reference counting requires that we take control of pointers.

C++ now provides such an ADT — they are called “smart” pointers.

shared_ptr<T> p (new T());

This declares p to be a smart pointer to a reference-counted object of type T.

Important: You have to commit fully to using shared pointers on the objects they manage. You cannot have ordinary pointers to an object simultaneously while you also have shared_ptrs to the same objects.

Mixing ordinary and shared pointers will likely leave you with dangling ordinary pointers when the shared pointer decides to scavenge an object, eventually causing your program to crash.
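To see the reference count change as shared pointers are copied and destroyed, one can query use_count(), a standard member of std::shared_ptr. The Counts struct and the observeCounts function below are hypothetical names introduced only for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical record of the reference count at three points in time.
struct Counts {
    long afterCreate;
    long afterCopy;
    long afterCopyDestroyed;
};

Counts observeCounts()
{
    Counts c{};
    std::shared_ptr<std::string> p = std::make_shared<std::string>("Boston");
    c.afterCreate = p.use_count();           // one owner: p
    {
        std::shared_ptr<std::string> q = p;  // copying the pointer adds an owner
        c.afterCopy = p.use_count();
    }                                        // q is destroyed here
    c.afterCopyDestroyed = p.use_count();
    return c;                                // the string is released when p goes away
}
```

The sketch uses std::make_shared rather than new; either way, the object is scavenged when the last shared_ptr to it disappears.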

2.1.3 Example: A Reference-Counted Singly Linked List

Starting from:

#include <string>

using namespace std;

struct SLNode {
   string data;
   SLNode* next;
     ⋮
   SLNode (string d = string(), SLNode* nxt = nullptr)
   : data(d), next(nxt)
   {}

   ~SLNode () { }
};

class List {
   SLNode* first;
public:
     ⋮
   ~List() 
    {
      while (first != 0)
       {
         SLNode* next = first->next;
         delete first;
         first = next;
       }
    }


   void add(string s)
   {
       first = new SLNode(s, first);
   }

};

we can change all uses of SLNode* to shared_ptr<SLNode>:

#include <string>
#include <memory>

using namespace std;

struct SLNode {
   string data;
   shared_ptr<SLNode> next;

   SLNode (string d = string(), shared_ptr<SLNode> nxt = nullptr)
   : data(d), next(nxt)
   {}
     ⋮
   ~SLNode () {/* do nothing */}
};

class List {
   shared_ptr<SLNode> first;
public:
    ⋮
   ~List()
    {    }

   void add(string s)
   {
       first = shared_ptr<SLNode>(new SLNode(s, first));
   }
     ⋮

};

And we no longer have to worry about explicitly deleting our unneeded nodes.
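A self-contained variant of the listing above is sketched here so that the effect can be observed. The liveNodes counter and the helper nodesLeftAfterScope are hypothetical additions; the sketch also uses std::make_shared instead of new.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct SLNode {
    std::string data;
    std::shared_ptr<SLNode> next;
    static int liveNodes;    // hypothetical counter, for demonstration only

    SLNode(const std::string& d, std::shared_ptr<SLNode> nxt)
        : data(d), next(std::move(nxt)) { ++liveNodes; }
    ~SLNode() { --liveNodes; }   // no delete needed: next releases itself
};
int SLNode::liveNodes = 0;

class List {
    std::shared_ptr<SLNode> first;
public:
    void add(const std::string& s)
    {
        first = std::make_shared<SLNode>(s, first);
    }
    // No destructor: destroying first drops the head's count to zero,
    // which in turn releases each later node.
};

// Builds a list of n nodes, lets it go out of scope, reports nodes remaining.
int nodesLeftAfterScope(int n)
{
    {
        List list;
        for (int i = 0; i < n; ++i)
            list.add("x");
        assert(SLNode::liveNodes == n);
    }
    return SLNode::liveNodes;
}
```

One caveat: the nodes are still released one inside another, so for very long lists the stack-depth concern raised for the delete-next destructor applies here as well.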

2.1.4 Is it worth the effort?


The Case of the Disappearing Airline

 

Let’s return to our original airline example, with reference counts.

  • Assume that

    • the airline object itself is a local variable in a function and that
    • we are about to return from that function.
  • That object will therefore be destroyed, and its reference counted pointers to the three hubs will disappear.


What went wrong? Let’s look at our other examples.


Ref Counted SLL

Here is our singly linked list with reference counts.

 
Assume that the list header itself is a local variable that is about to be destroyed.


Ref Counted DLL

Now let’s look at our doubly linked list.

 

Again, let’s assume that the list header itself is a local variable that is about to be destroyed.


Reference Counting’s Achilles Heel

So two of our last three examples failed when trying to use reference counting.

What’s the common factor between the two failures?

So a more general approach is needed.
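As a small illustration of how reference counting can fail, consider two objects that hold shared pointers to each other. The Node type and both helper functions are hypothetical. The first helper leaks its two nodes on purpose.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::shared_ptr<Node> other;
    static int liveNodes;    // hypothetical counter, for demonstration only
    Node() { ++liveNodes; }
    ~Node() { --liveNodes; }
};
int Node::liveNodes = 0;

// Two nodes that point at each other. Returns how many of them survive
// after every local pointer to them has been destroyed.
int nodesLeftAfterCycle()
{
    int before = Node::liveNodes;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->other = b;
        b->other = a;        // closes the cycle
    }
    // Each node still holds one reference to the other, so neither
    // count reached zero and neither destructor ran.
    return Node::liveNodes - before;
}

// Same two nodes, but with only a one-way link.
int nodesLeftWithoutCycle()
{
    int before = Node::liveNodes;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->other = b;
    }
    return Node::liveNodes - before;
}
```

With the one-way link, both nodes are reclaimed. With the cycle, both remain on the heap even though nothing in the program can reach them.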

2.2 Mark and Sweep

Mark and sweep is one of the earliest and best-known garbage collection algorithms.


Assumptions

The core assumptions of mark and sweep are:


The Mark and Sweep Algorithm

With those assumptions, the mark and sweep garbage collector is pretty simple:

markAndSweep.cpp
void markAndSweep()
{
 // mark: trace everything reachable from the roots
 for (all pointers P on the run-time stack or
   in the static data area )
  {
    mark(P);
  }

 // sweep: reclaim whatever the mark phase did not reach
 for (all objects *P on the heap)
   {
     if *P is not marked then
        delete P
     else
        unmark *P
   }
}

template <class T>
void mark(T* p)
{
  if *p is not already marked
    {
      set the mark bit of *p;
      for (all pointers q inside *p)
        {
          mark(q);   // recursive call
        }
     }
}

The algorithm works in two stages.

(In graph terms, this is a depth-first traversal of objects on the heap.)
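The pseudocode can be turned into a small runnable model, sketched below under heavy simplifying assumptions: every heap object is a hypothetical Object that lists its own pointers, and a hypothetical ToyHeap records all allocations plus a roots vector standing in for the run-time stack and static area. A real collector has to discover this information from the program's memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heap object: a mark bit plus the pointers it contains.
struct Object {
    bool marked = false;
    std::vector<Object*> pointers;
};

struct ToyHeap {
    std::vector<Object*> objects;   // every object allocated and not yet swept
    std::vector<Object*> roots;     // stands in for the stack and static area

    Object* allocate()
    {
        Object* obj = new Object;
        objects.push_back(obj);
        return obj;
    }

    // Mark phase: recursive depth-first traversal starting at p.
    static void mark(Object* p)
    {
        if (p == nullptr || p->marked) return;
        p->marked = true;
        for (Object* q : p->pointers)
            mark(q);
    }

    // Runs one full collection and returns the number of objects reclaimed.
    std::size_t markAndSweep()
    {
        for (Object* r : roots)
            mark(r);

        std::size_t reclaimed = 0;
        std::vector<Object*> survivors;
        for (Object* obj : objects) {
            if (!obj->marked) {
                delete obj;                 // unreachable: scavenge it
                ++reclaimed;
            } else {
                obj->marked = false;        // unmark for the next pass
                survivors.push_back(obj);
            }
        }
        objects.swap(survivors);
        return reclaimed;
    }

    ~ToyHeap() { for (Object* obj : objects) delete obj; }
};

struct DemoResult { std::size_t firstPass, secondPass, remaining; };

// a -> b is reachable from a root; c <-> d is an unreachable cycle.
DemoResult collectDemo()
{
    ToyHeap heap;
    Object* a = heap.allocate();
    Object* b = heap.allocate();
    Object* c = heap.allocate();
    Object* d = heap.allocate();
    a->pointers.push_back(b);
    c->pointers.push_back(d);
    d->pointers.push_back(c);
    heap.roots.push_back(a);

    DemoResult r{};
    r.firstPass = heap.markAndSweep();    // reclaims the c/d cycle
    heap.roots.clear();                   // now nothing is reachable
    r.secondPass = heap.markAndSweep();   // reclaims a and b
    r.remaining = heap.objects.size();
    return r;
}
```

Note that the unreachable cycle is reclaimed on the first pass, since marking depends only on reachability from the roots.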


Mark and Sweep Example

 

As an example, suppose that we start with this data.

Then, let’s assume that the local variable holding the list header is destroyed.


Assessing Mark and Sweep

In practice, the recursive form of mark-and-sweep requires too much stack space.

Practical implementations of mark-and-sweep have countered this problem with an iterative version of the mark function that “reverses” the pointers it is exploring so that they leave a trace behind it of where to return to.
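The pointer-reversal technique is intricate, so the sketch below substitutes a simpler idea: an iterative mark that keeps its pending work in an explicit, heap-allocated stack. This is not pointer reversal. It still needs extra memory proportional to the number of pending objects, which pointer reversal avoids, but it shows how the recursion itself can be eliminated. The Object type and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heap object: a mark bit plus the pointers it contains.
struct Object {
    bool marked = false;
    std::vector<Object*> pointers;
};

// Marks everything reachable from root without recursion.
// Returns the number of objects newly marked.
std::size_t markIteratively(Object* root)
{
    std::size_t count = 0;
    std::vector<Object*> pending;           // explicit stack of objects to visit
    if (root != nullptr)
        pending.push_back(root);

    while (!pending.empty()) {
        Object* p = pending.back();
        pending.pop_back();
        if (p->marked) continue;            // reached earlier by another path
        p->marked = true;
        ++count;
        for (Object* q : p->pointers)
            if (q != nullptr && !q->marked)
                pending.push_back(q);
    }
    return count;
}

// Builds a chain of n objects, each pointing to the next, and marks it
// starting from the head of the chain.
std::size_t markLongChain(std::size_t n)
{
    std::vector<Object> chain(n);
    for (std::size_t i = 0; i + 1 < n; ++i)
        chain[i].pointers.push_back(&chain[i + 1]);
    return n == 0 ? 0 : markIteratively(&chain[0]);
}
```

A chain of this shape is the worst case for the recursive mark function, which would need one stack frame per object.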

The fact is, tracing every object on the heap can be quite time-consuming. On virtual memory systems, it can result in an extraordinary number of page faults. The net effect is that mark-and-sweep systems often appear to freeze up for seconds to minutes at a time when the garbage collector is running. There are a couple of ways to improve performance.

2.3 Generation-Based Collectors


Old versus New Garbage

In many programs, people have observed that object lifetime tends toward the extreme possibilities.


Generational GC

Generational collectors take advantage of this behavior by dividing the heap into “generations”.

The actual scanning process is a modified mark and sweep. But because relatively few objects are scanned on each pass, the passes are short and the overall cost of GC is low.

To keep the cost of a pass low, we need to avoid scanning the old objects on the heap. The problem is that some of those objects may have pointers to the newer ones. Most generational schemes use traps in the virtual memory system to detect pointers from “old” pages to “new” ones to avoid having to explicitly scan the old area on each pass.

2.4 Incremental Collection


Incremental GC

Another way to avoid the appearance that garbage collection is locking up the system is to modify the algorithm so that it can be run one small piece at a time.

There is a difficulty here, though. Because the program might be modifying the heap while we are marking objects, we have to take extra care to be sure that we don’t improperly flag something as garbage just because all the pointers to it have suddenly been moved into some other data structure that we had already swept.

Again, special care has to be taken so that the continuously running garbage collector and the main calculation don’t interfere with one another.

3 Strong and Weak Pointers


Doing Without

OK, garbage collection is great if you can get it.


Ownership

One approach that works in many cases is to try to identify which ADTs are the owners of the shared data, and which ones merely use the data.


Ownership Example

 

In this example that we looked at earlier, we saw that if both the Airline object on the left and the Airport objects on the right deleted their own pointers when destroyed, our program would crash.


Ownership Example

We could improve this situation by deciding that the Airline owns the Airport descriptors that it uses. So the Airline object would delete the pointers it has, but the Airports would never do so.

class Airport
{
   ⋮

private:
   vector<Airport*> hasFlightsTo;
};

Airport::~Airport()
{
  /* for (int i = 0; i < hasFlightsTo.size(); ++i)
      delete hasFlightsTo[i]; */
}

class AirLine {
   ⋮
   string name;
   map<string, Airport*> hubs;
};


AirLine::~AirLine()
{
   for (map<string, Airport*>::iterator i = hubs.begin();
        i != hubs.end(); ++i)
     delete i->second;
}



Ownership Example

 

Thus, when the airline object on the left is destroyed, it will delete the Boston, N.Y., and Wash DC objects.

The problem is that, having decided that the Airline owns the Airport descriptors, we have some Airport objects with no owner at all.


Asserting Ownership

I would probably resolve this by modifying the Airline class to keep better track of its Airports.

class AirLine {
   ⋮
   string name;
   set<string> hubs;
   map<string, Airport*> airportsServed;
};


AirLine::~AirLine()
{
   for (map<string, Airport*>::iterator i = airportsServed.begin();
       i != airportsServed.end(); ++i)
     delete i->second;
}

Asserting Ownership (cont.)

 

The new map tracks all of the airports served by this airline, and we use a separate data structure to indicate which of those airports are hubs.

Now, when an airline object is destroyed, all of its airport descriptors will be reclaimed as well.
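A compilable sketch of this ownership arrangement follows. The class and member names hubs, airportsServed, and hasFlightsTo come from the listings above; addAirport, addFlightTo, the liveAirports counter, and the helper airportsLeftAfterBankruptcy are assumptions added so the behavior can be checked.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

class Airport {
public:
    static int liveAirports;     // hypothetical counter, for demonstration only

    explicit Airport(const std::string& n) : name(n) { ++liveAirports; }
    ~Airport() { --liveAirports; }   // deletes nothing: airports do not own airports

    void addFlightTo(Airport* dest) { hasFlightsTo.push_back(dest); }
    const std::string& getName() const { return name; }
private:
    std::string name;
    std::vector<Airport*> hasFlightsTo;   // non-owning pointers
};
int Airport::liveAirports = 0;

class AirLine {
public:
    explicit AirLine(const std::string& n) : name(n) {}
    AirLine(const AirLine&) = delete;              // copying is outside this sketch
    AirLine& operator=(const AirLine&) = delete;

    // The airline owns every airport it serves, hub or not.
    ~AirLine()
    {
        for (auto& entry : airportsServed)
            delete entry.second;
    }

    // Hypothetical helper: creates the airport if needed and returns it.
    Airport* addAirport(const std::string& city, bool isHub)
    {
        Airport*& slot = airportsServed[city];     // null if newly inserted
        if (slot == nullptr)
            slot = new Airport(city);
        if (isHub)
            hubs.insert(city);
        return slot;
    }
    const std::string& getName() const { return name; }
private:
    std::string name;
    std::set<std::string> hubs;
    std::map<std::string, Airport*> airportsServed;
};

// Builds a small airline with a cycle of connections, destroys it,
// and reports how many airports are left on the heap.
int airportsLeftAfterBankruptcy()
{
    {
        AirLine pj("PuddleJumper Air");
        Airport* boston  = pj.addAirport("Boston", true);
        Airport* dc      = pj.addAirport("Wash DC", true);
        Airport* norfolk = pj.addAirport("Norfolk", false);
        boston->addFlightTo(dc);
        dc->addFlightTo(boston);     // a cycle, harmless because it is non-owning
        dc->addFlightTo(norfolk);
        assert(Airport::liveAirports == 3);
    }   // the airline is destroyed here
    return Airport::liveAirports;
}
```

Each airport is deleted exactly once, by the airline, no matter how many connections lead to it.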


Ownership Can Be Too Strong

 

Ownership is sometimes a bit too strong a relation to be useful.


Strong and Weak Pointers

We can generalize the notion of ownership by characterizing the various pointer data members as strong or weak.

When an object containing pointer data members is destroyed, it deletes its strong pointer members and leaves its weak ones alone.


Strong and Weak SLL

In this example, if we characterize the pointers as shown:

struct SLNode {
   string data;
   SLNode* next; // strong
     ⋮
   ~SLNode () {delete next;}
};

class List {
   SLNode* first; // strong
   SLNode* last;  // weak
public:
     ⋮
   ~List() 
    {
      delete first;  // OK, because this is strong
      /*delete last;*/ // Don't delete. last is weak.
     }
};

then our program will run correctly.
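Filled out into something compilable, the strong/weak designation might look like the sketch below. The addToEnd function, the liveNodes counter, and the helper nodesLeftAfterScope are hypothetical additions for demonstration.

```cpp
#include <cassert>
#include <string>

struct SLNode {
    std::string data;
    SLNode* next;            // strong
    static int liveNodes;    // hypothetical counter, for demonstration only

    explicit SLNode(const std::string& d) : data(d), next(nullptr) { ++liveNodes; }
    ~SLNode() { --liveNodes; delete next; }   // strong pointer: delete it
};
int SLNode::liveNodes = 0;

class List {
    SLNode* first;  // strong
    SLNode* last;   // weak
public:
    List() : first(nullptr), last(nullptr) {}
    List(const List&) = delete;              // copying is outside this sketch
    List& operator=(const List&) = delete;

    void addToEnd(const std::string& s)
    {
        SLNode* node = new SLNode(s);
        if (last == nullptr)
            first = node;        // empty list: the new node is also the first
        else
            last->next = node;
        last = node;
    }

    ~List()
    {
        delete first;   // strong: reaches every node exactly once via next
        // last is weak, so it is left alone
    }
};

// Builds a three-node list, destroys it, and reports nodes remaining.
int nodesLeftAfterScope()
{
    {
        List list;
        list.addToEnd("a");
        list.addToEnd("b");
        list.addToEnd("c");
        assert(SLNode::liveNodes == 3);
    }
    return SLNode::liveNodes;
}
```

The final node is reachable through both first and last, but only the strong path deletes it.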


Picking the Strong Ones

 

The key idea is to select the smallest set of pointer data members that would connect together all of the allocated objects, while giving you exactly one path to each such object.


Strong and Weak DLL

Similarly, in a doubly linked list, we can designate the pointers as follows:

struct DLNode {
   string data;
   DLNode* prev; // weak
   DLNode* next; // strong
     ⋮
   ~DLNode () {delete next;}
};

class List {
   DLNode* first; // strong
   DLNode* last;  // weak
public:
     ⋮
   ~List() {delete first;}
};

and so achieve a program that recovers all garbage without deleting anything twice.

3.1 Smart Pointers can be Strong or Weak

C++ smart pointers actually come in two “flavors”: the strong shared_ptr and the weak weak_ptr.

For example, if we were doing a doubly linked list, this would not be useful:

struct DLNode {
   string data;
   shared_ptr<DLNode> prev;
   shared_ptr<DLNode> next;
     ⋮
   ~DLNode () {/* do nothing */}
};

class List {
   shared_ptr<DLNode> first;
   shared_ptr<DLNode> last;
public:
     ⋮
};

because the cycles induced by the prev and next pointers would prevent any nodes’ reference counts from dropping to zero.

But if we make the back pointer weak:

struct DLNode {
   string data;
   weak_ptr<DLNode> prev;
   shared_ptr<DLNode> next;
     ⋮
   ~DLNode () {/* do nothing */}
};

class List {
   shared_ptr<DLNode> first;
   shared_ptr<DLNode> last;
public:
     ⋮
};

then the list should have no cycles, and reference counting should work just fine.
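A runnable sketch of this arrangement is given below. The addToEnd and secondToLast functions, the liveNodes counter, and the helper demoSecondToLast are hypothetical. The sketch also shows that a weak_ptr cannot be dereferenced directly: calling lock() yields a shared_ptr, which is empty if the object has already been reclaimed.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct DLNode {
    std::string data;
    std::weak_ptr<DLNode> prev;      // weak: does not affect the reference count
    std::shared_ptr<DLNode> next;    // strong
    static int liveNodes;            // hypothetical counter, for demonstration only

    explicit DLNode(const std::string& d) : data(d) { ++liveNodes; }
    ~DLNode() { --liveNodes; }
};
int DLNode::liveNodes = 0;

class List {
    std::shared_ptr<DLNode> first;
    std::shared_ptr<DLNode> last;
public:
    void addToEnd(const std::string& s)
    {
        auto node = std::make_shared<DLNode>(s);
        node->prev = last;           // weak back link
        if (last)
            last->next = node;       // strong forward link
        else
            first = node;
        last = node;
    }

    // Data of the node just before the last one, or "" if there is none.
    std::string secondToLast() const
    {
        if (!last) return "";
        std::shared_ptr<DLNode> p = last->prev.lock();
        return p ? p->data : "";
    }
};

// Builds a three-node list and follows one weak back pointer.
std::string demoSecondToLast()
{
    List list;
    list.addToEnd("a");
    list.addToEnd("b");
    list.addToEnd("c");
    assert(DLNode::liveNodes == 3);
    return list.secondToLast();
}   // list is destroyed here, releasing all three nodes
```

Since only the forward links and the list header hold counts, destroying the header lets every node's count fall to zero.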

4 Java Programmers Have it Easy

Java has included automatic garbage collection since its beginning.

C++ programmers may sometimes sneer at the slowdown caused by garbage collection. The collector implementations, however, continue to evolve. In fact, current versions of Java commonly offer multiple garbage collectors, one of which can be selected at run-time in an attempt to find one whose run-time characteristics (i.e., how aggressively it tries to collect garbage and how much of the time it can block the main program threads while it is working) match your program’s needs.

Java programmers sometimes face an issue of running out of memory because they have inadvertently kept pointers to data that they no longer need. This is a particular problem in implementing algorithms that use caches or memoization to keep the answers to prior computations in case the same result is needed again in the future. Because of this, Java added a concept of a weak reference (pointer) that can be ignored when checking to see if an object is garbage and that gets set to null if the object it points to gets collected.


1: Though, somewhat confusingly, they call them “references” instead of “pointers”. But they really are more like C++ pointers than like C++ references because you

All three of these properties are true of C++ pointers but not of C++ references. So Java “references” really are the equivalent of C++ “pointers”.

By renaming them, Java advocates are able to boast that Java is a simpler language because it doesn’t have pointers. That’s more than a little disingenuous, IMO. (If it looks like a duck, swims like a duck, and quacks like a duck,…)

In actual fact, Java programs are absolutely swimming in pointers, but the pointers just aren’t as problematic as they are in C++.