Copying Data and the "Rule of the Big 3"

Steven J Zeil

Last modified: Oct 26, 2023

Next we turn our attention to a set of issues that are often given short shrift in both introductory courses and textbooks, but that are extremely important in practical C++ programming.

As we begin to build up our own ADTs, implemented as C++ classes, we quickly come to the point where we need more than one of each kind of ADT object. Sometimes we will simply have multiple variables of our ADT types. Once we do, we will often want to copy or assign one variable to another, and we need to understand what will happen when we do so. Even more important, we need to be sure that what does happen is what we want to have happen, depending upon our intended behavior for our ADTs.

As we move past the simple case of multiple variables of the same ADT type, we may want to build collections of that ADT. The simplest case of this would be an array or linked list of our ADT type, though we will study other collections as the semester goes on. We will need to understand what happens when we initialize such a collection and when we copy values into and out of it, and we need to make sure that behavior is what we want for our ADTs.

In this lesson, we set the stage for this kind of understanding by looking at how we control initialization and copying of class values.

1 Copying Data – Shallow and Deep Copies

One of the most common things we do with data is to copy it from one place to another. After all, the basic assignment statement:

x = y;

may be the first C++ statement you learned, and assignments constitute the bulk of many programmers’ code.


Closely related to assignment is copying, creating a brand new object with the same content as the original.


Copying occurs in many places in C++ code, as we will see when we look at where copy constructors are used.

1.1 Copying Blocks of Bits

We often envision copying data as a simple process of copying the bits from one location to another. This is a simple, intuitive view, and works well with some data.

1.1.1 Book - simple arrays

book1.h
#ifndef BOOK_H
#define BOOK_H

#include <string>
#include "author.h"
#include "publisher.h"

class Book {
public:
  Book();
  
  Book (std::string theTitle, const Publisher* thePubl,
		      int numberOfAuthors, Author* theAuthors,
			  std::string theISBN);

  Book (std::string theTitle, const Publisher* thePubl,
        const Author& theAuthor,
        std::string theISBN);

  std::string getTitle() const        {return title;}
  void setTitle(std::string theTitle) {title = theTitle;}

  Publisher* getPublisher() const    {return publisher;}
  void setPublisher(const Publisher* publ) {publisher = publ;}

  int getNumberOfAuthors() const {return numAuthors;}

  Author getAuthor (int authorNumber) const;
  void addAuthor (const Author&);
  void removeAuthor (const Author&);

  std::string getISBN() const   {return isbn;}
  void setISBN(std::string id)  {isbn = id;}

private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  static const int maxAuthors = 12;
  Author authors[maxAuthors];
  std::string isbn;
};


#endif

Consider the problem of copying a book, where this list of authors has been implemented as a basic (fixed-size) array.


If we start with a single book object, b1, as shown here, and then we execute

Book b2 = b1;

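and assume that the copy of b1 is created by simply copying the block of bits that make up b1 to the location of the new variable b2, then b2 receives its own complete copy of the authors array, because that array is stored inside the Book itself. The two books are independent, which is just what we want.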
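A small experiment can confirm that copying is harmless for the fixed-array design. The sketch below does not use the course's Book class. It assumes a hypothetical stand-in, FixedBook, whose authors are plain strings held in a fixed-size array member, and a hypothetical helper function that edits a copy and then looks at the original.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Book: the authors are stored inside the
// object itself, in a fixed-size array, as in book1.h.
struct FixedBook {
  static const int maxAuthors = 12;
  std::string title;
  int numAuthors;
  std::string authors[maxAuthors];
};

// Copies b, changes the first author in the copy, and reports the
// first author of the original afterwards.
std::string firstAuthorAfterEditingCopy (const FixedBook& b)
{
  assert (b.numAuthors > 0);
  FixedBook copy = b;                // the array is copied along with the object
  copy.authors[0] = "Someone Else";  // changes the copy's array only
  return b.authors[0];
}
```

Calling firstAuthorAfterEditingCopy on a book whose first author is "Budd" should still return "Budd", because the copy has its own array.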

Of course, this fixed-length array design has a lot of drawbacks, so let’s look at another possible implementation.

1.1.2 Book - dynamic arrays

Now let’s consider the problem of copying a book implemented using dynamically allocated arrays.

class Book {
public:
  ⋮
private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  static const int maxAuthors = 12;
  Author* authors; // array of authors
  std::string isbn;
};

Here we have replaced the simple array with a pointer. When initializing a Book object, we would allocate an appropriately-sized array of Authors on the heap, storing the address of that array in the pointer.

Copying dynamic arrays

 

If we start with a single book object, b1, as shown here, and then we execute

Book b2 = b1;

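again carrying out the copy by copying the block of bits making up b1, then b2.authors receives the same address that is stored in b1.authors. Both books now share a single array of authors on the heap, so a change made to the authors of one book also changes the other.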

Sometimes it’s Better to Have 2 Copies

 

What we really wanted, after the copy:

Book b2 = b1;

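is for b2 to have its own separate array of authors, holding copies of the authors of b1, so that each book can be changed without affecting the other.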

But to get that, we will not be able to rely on copying books as simple blocks of bits.
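The sharing problem can be observed directly. The following is only a sketch: PointerBook is a hypothetical, stripped-down stand-in for the dynamic-array Book, with strings in place of Author objects and, deliberately, no copy constructor or destructor of its own.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the dynamic-array Book. It has no copy
// constructor, so the compiler copies the pointer and nothing more.
struct PointerBook {
  int numAuthors;
  std::string* authors;   // address of an array on the heap
};

// Returns true if editing a copy also changes the original.
bool copySharesAuthors ()
{
  PointerBook b1;
  b1.numAuthors = 1;
  b1.authors = new std::string[12];
  b1.authors[0] = "Budd";

  PointerBook b2 = b1;             // copies the address, not the array
  b2.authors[0] = "Someone Else";  // writes into the array b1 is using

  bool shared = (b1.authors == b2.authors)
             && (b1.authors[0] == "Someone Else");

  delete [] b1.authors;   // release the one shared array exactly once
  return shared;
}
```

After the copy, both objects hold the same address, so the edit made through b2 is visible through b1.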

1.2 Shallow vs Deep Copy

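Copy operations are distinguished by how they treat pointers:

  • In a shallow copy, each pointer is copied bit by bit, so the copy holds the same address as the original and the two objects share the data being pointed to.
  • In a deep copy, the data being pointed to is itself copied, and the pointer in the copy holds the address of that new data.

The bit-by-bit copy of the dynamic-array Book was a shallow copy. The copy we really wanted, with two separate arrays of authors, was a deep copy.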


Shallow versus Deep

For any ADT, we must decide whether the things it points to are things we want to share with other objects or whether we want to own them exclusively. That’s a matter of just how we want our ADTs to behave, which depends in turn on what we expect to do with them.

Take note: “shallow” and “deep” are actually two extremes of a range of possible copy depths – sometimes our ADTs call for a behavior that has us treat some pointers shallowly and others deeply.

1.2.1 Copying Books Two Ways

Here is a shallow copy:

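// Assumes shallowCopyOf has been declared a friend of Book,
// because it reads and writes Book's private data members.
Book shallowCopyOf (const Book& b)
{
  Book copy;
  copy.title = b.title;
  copy.isbn = b.isbn;
  copy.publisher = b.publisher;
  copy.numAuthors = b.numAuthors;
  // maxAuthors is a static constant shared by all Books: nothing to copy
  copy.authors = b.authors;   // copies the address only
  return copy;
}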

If we replace the last assignment (the one that copies the authors pointer):

// Assumes deepCopyOf has been declared a friend of Book.
Book deepCopyOf (const Book& b)
{
  Book copy;
  copy.title = b.title;
  copy.isbn = b.isbn;
  copy.publisher = b.publisher;
  copy.numAuthors = b.numAuthors;
  // allocate a separate array, then copy the authors one by one
  copy.authors = new Author[Book::maxAuthors];
  for (int i = 0; i < copy.numAuthors; ++i)
    copy.authors[i] = b.authors[i];
  return copy;
}

then we get a deep copy.

In both copies, the non-pointer data members can be copied easily. But for the deep copy, the authors pointer is “copied” by allocating a new array big enough to hold the existing data; then all the existing data has to be copied into the new array.

(In truth, the second copy is not actually a fully deep copy. publisher is also a pointer, and we are still copying publisher shallowly. But that’s a deliberate choice. publisher was made a pointer so that multiple books published by the same company could point back to their common publisher: we want to share the publisher.)

1.3 Choosing Between Shallow and Deep Copy

How do we decide which form of copy we want for any particular ADT?

First, if none of your data members are pointers (or references), then there’s no problem. Shallow and deep copy are entirely equivalent in the absence of pointers.

If you do have pointers among your data members, then you have to ask whether your ADT should share the data it points to. There’s no magic answer to this question. An ADT exists to support some mental model of a collection of data.

Sometimes sharing is a part of that mental model. For example, a publisher might want to keep all of its information about an author in one place, even though that author has written many books. In that case, it would make sense to use pointers to a shared Author object in our Books.

On the other hand, the dynamically allocated arrays we use to collect a group of co-authors of any particular book are properties specific to that book. So sharing of those arrays makes less sense and, as we have seen, can lead to easily corrupted data.

We can offer up this observation:

Shallow copy is wrong for any ADT that has pointers among its data members to things that it does not want to share.

2 The Big 3

The Big 3 in C++ are the

  1. copy constructor,
  2. assignment operator, and
  3. destructor.

These three functions are closely related to one another with regard to their treatment of shallow and deep copying.

2.1 Copy Constructors

The copy constructor for a class Foo is the constructor of the form:

Foo (const Foo& oldCopy);

Where are Copy Constructors Used?

The copy constructor gets used in 5 situations:

  1. When you declare a new object as a copy of an old one:

    Book book2 (book1);
    

    or

    Book book2 = book1;
    
  2. When a function call passes a parameter “by copy” (i.e., the formal parameter does not have a &):

    void foo (Book b, int k);
      ⋮
    
    // publ is a Publisher* and budd is an Author, both declared elsewhere
    Book text361 ("Data Structures in C++ Using the Standard Template Library",
                  publ, budd, "0201308787");
    foo (text361, 0);   // foo actually gets a copy of text361
    
  3. When a function returns an object:

    Book foo (int k)
    {
      Book b;
      ⋮
      return b; // a copy of b is placed in the caller's memory area
    }
    
  4. When data members are initialized in a constructor’s initialization list from a single parameter of the same type:

    Author::Author (std::string theName, 
                 Address theAddress, long id)
      : name(theName), 
        address(theAddress),
        identifier(id)
    {
    }
    
  5. When an object is a data member of another class for which the compiler has generated its own copy constructor.
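Several of these situations can be checked by counting copy constructor calls. The sketch below assumes a hypothetical class, Tracked, whose copy constructor increments a counter. The helper functions and the Holder class are likewise invented for this illustration. Returning an object from a function (situation 3) is left out on purpose, because compilers are permitted to optimize that particular copy away, so the count there can vary.

```cpp
#include <cassert>

// Hypothetical class that counts how often its copy constructor runs.
class Tracked {
public:
  static int copies;
  Tracked () {}
  Tracked (const Tracked&) { ++copies; }
};
int Tracked::copies = 0;

void byCopy (Tracked) {}               // parameter passed "by copy"
void byReference (const Tracked&) {}   // parameter passed by reference

// A data member initialized from a parameter of the same type.
class Holder {
public:
  Holder (const Tracked& t) : member(t) {}
private:
  Tracked member;
};

// Situation 1: declaring new objects as copies of an old one.
int copiesWhenDeclaring ()
{
  Tracked::copies = 0;
  Tracked a;
  Tracked b (a);
  Tracked c = a;
  return Tracked::copies;
}

// Situation 2: only the by-copy call invokes the copy constructor.
int copiesWhenPassing ()
{
  Tracked::copies = 0;
  Tracked a;
  byCopy (a);
  byReference (a);
  return Tracked::copies;
}

// Situation 4: a member initialized in an initialization list.
int copiesWhenInitializingMember ()
{
  Tracked::copies = 0;
  Tracked a;
  Holder h (a);
  return Tracked::copies;
}
```

The two declarations account for two copies, the by-copy call for one, and the member initialization for one. Passing by reference makes no copy at all.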

2.1.1 Compiler-Generated Copy Constructors

As you can see from that list, the copy constructor gets used a lot. It would be very awkward to work with a class that did not provide a copy constructor.

So, again, the compiler tries to be helpful.

If we do not create a copy constructor for a class, the compiler generates one for us.


Example 1: copying Address
addressDecl.h
class Address {
public:
  Address (std::string theStreet, std::string theCity,
           std::string theState, std::string theZip);

  std::string getStreet() const;
  void putStreet (std::string theStreet);
  
  std::string getCity() const;
  void putCity (std::string theCity);
  
  std::string getState() const;
  void putState (std::string theState);
  
  std::string getZip() const;
  void putZip (std::string theZip);
  
private:
  std::string street;
  std::string city;
  std::string state;
  std::string zip;
};

In the case of our Address class, we have not provided a copy constructor, so the compiler would generate one for us.

The implicitly generated copy constructor would behave as if it had been written this way:

Address::Address (const Address& a)
  : street(a.street), city(a.city), 
    state(a.state), zip(a.zip)
{
}

If our data members do not have explicit copy constructors (and their data members do not have explicit copy constructors, and … ) then the compiler-provided copy constructor amounts to a shallow copy.

The compiler is all too happy to generate a copy constructor for us, but can we trust what it generates? To understand when we can and cannot trust it, we need to understand the different ways in which copying can occur.


Example 2: copying Books
book1.h
#ifndef BOOK_H
#define BOOK_H

#include <string>
#include "author.h"
#include "publisher.h"

class Book {
public:
  Book();
  
  Book (std::string theTitle, const Publisher* thePubl,
		      int numberOfAuthors, Author* theAuthors,
			  std::string theISBN);

  Book (std::string theTitle, const Publisher* thePubl,
        const Author& theAuthor,
        std::string theISBN);

  std::string getTitle() const        {return title;}
  void setTitle(std::string theTitle) {title = theTitle;}

  Publisher* getPublisher() const    {return publisher;}
  void setPublisher(const Publisher* publ) {publisher = publ;}

  int getNumberOfAuthors() const {return numAuthors;}

  Author getAuthor (int authorNumber) const;
  void addAuthor (const Author&);
  void removeAuthor (const Author&);

  std::string getISBN() const   {return isbn;}
  void setISBN(std::string id)  {isbn = id;}

private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  static const int maxAuthors = 12;
  Author authors[maxAuthors];
  std::string isbn;
};


#endif

If we provide no copy constructor for Book, the compiler generates one for us. It would be equivalent to

Book::Book (const Book& b)
  : title(b.title), publisher(b.publisher),
    numAuthors(b.numAuthors),
    authors(b.authors),   // array version: every element is copied
    isbn(b.isbn)
{
  // maxAuthors is static, so it is not part of any one Book and is not copied
}

We’ve already seen that this would be fine for a Book implemented using a simple array, but a disaster for a Book implemented using a dynamically allocated array.

2.1.2 Do We Trust the Compiler?

So we conclude

The compiler-generated copy constructor is wrong for classes that have pointers among their data members to data that they don’t want to share.

2.1.3 Implementing a deep Copy Constructor

So for our dynamic array version of Book, we need to implement our own copy constructor.

We start by adding the constructor declaration:

class Book {
public:
  Book();

  Book (std::string theTitle, const Publisher* thePubl,
           int numberOfAuthors, Author* theAuthors,
           std::string theISBN);

  Book (std::string theTitle, const Publisher* thePubl,
       const Author& theAuthor,
       std::string theISBN);

  Book(const Book&);

  std::string getTitle() const        {return title;}
  void setTitle(std::string theTitle) {title = theTitle;}

    ⋮

private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  static const int maxAuthors = 12;
  Author* authors; // array of authors
  std::string isbn;
};

Then we supply a function body for this constructor.

Book::Book (const Book& b)
  : title(b.title), publisher(b.publisher),
    numAuthors(b.numAuthors),
    authors(new Author[maxAuthors]),  // a separate array for the new Book
    isbn(b.isbn)
{
  // copy the authors one by one into the new array
  for (int i = 0; i < numAuthors; ++i)
    authors[i] = b.authors[i];
}

Most of the data members can be copied easily. But the authors pointer is copied by allocating a new array big enough to hold the existing data; then all the existing data has to be copied into the new array.

2.2 Assignment Operators

In most cases, when we think of copying, we think of assignment, not the copy constructor.

When we write book1 = book2, that’s shorthand for book1.operator=(book2).

The difference between the assignment operator and the copy constructor seems subtle to some people, but it typically comes down to whether the statement is a declaration or not. Look for the type name at the start of the statement:

MyClass x = y;
  • invokes the copy constructor
  • creates a new variable, x
  • initializes that new variable as a copy of y
x = y;
  • invokes the assignment operator
  • changes the value of an existing variable, x
  • replaces the value of that variable by a copy of y
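To see which function each form invokes, we can have the two functions leave a note behind. This is a sketch using a hypothetical class, Noisy, invented for the purpose. It records the name of the last special function that ran on each object.

```cpp
#include <cassert>
#include <string>

// Hypothetical class that remembers which function last set its value.
class Noisy {
public:
  std::string lastOperation;

  Noisy () : lastOperation("default constructor") {}
  Noisy (const Noisy&) : lastOperation("copy constructor") {}
  Noisy& operator= (const Noisy&)
  {
    lastOperation = "assignment operator";
    return *this;
  }
};

std::string declareAsCopy ()
{
  Noisy y;
  Noisy x = y;     // a declaration: x is being created
  return x.lastOperation;
}

std::string assignToExisting ()
{
  Noisy y;
  Noisy x;         // x already exists...
  x = y;           // ...so this replaces its value
  return x.lastOperation;
}
```

Even though both statements contain an = sign, declareAsCopy reports "copy constructor" and assignToExisting reports "assignment operator".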

It’s arguable which actually gets used more in typical C++ programming, the copy constructor or the assignment operator. Most of us probably write a lot more assignments, but the compiler generates a lot of copy constructor calls for us.

Assignment is so common in most people’s programming that, once again, the compiler tries to be helpful:

If you don’t provide your own assignment operator for a class, the compiler generates one automatically.


Example 3: assigning Address
addressDecl.h
class Address {
public:
  Address (std::string theStreet, std::string theCity,
           std::string theState, std::string theZip);

  std::string getStreet() const;
  void putStreet (std::string theStreet);
  
  std::string getCity() const;
  void putCity (std::string theCity);
  
  std::string getState() const;
  void putState (std::string theState);
  
  std::string getZip() const;
  void putZip (std::string theZip);
  
private:
  std::string street;
  std::string city;
  std::string state;
  std::string zip;
};

For example, we have not provided an assignment operator for the Address class. Therefore the compiler will generate one, just as if we had written

class Address {
public:
  Address (std::string theStreet, std::string theCity,
           std::string theState, std::string theZip);


  Address& operator= (const Address&);
    ⋮
};

The automatically generated body for this assignment operator will be the equivalent of

Address& Address::operator= (const Address& a)
{
  street = a.street;
  city = a.city;
  state = a.state;
  zip = a.zip;
  return *this;
}

And that automatically generated assignment is just fine for Address.

2.2.1 Return values in Asst Ops

The return statement in the prior example returns the value just assigned, allowing programmers to chain assignments together:

addr3 = addr2 = addr1;
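Chaining works because assignment groups from right to left, so the value returned by the inner assignment becomes the right-hand operand of the outer one. Here is a minimal sketch with a hypothetical one-member class, Celsius, standing in for Address.

```cpp
#include <cassert>

// Hypothetical one-member class with a hand-written assignment operator.
class Celsius {
public:
  double degrees;

  explicit Celsius (double d = 0.0) : degrees(d) {}

  Celsius& operator= (const Celsius& c)
  {
    degrees = c.degrees;
    return *this;    // hand back the object just assigned to
  }
};

// Chains two assignments and returns the value that reaches the leftmost one.
double chainedAssignment (double value)
{
  Celsius t1 (value), t2, t3;
  t3 = t2 = t1;      // parsed as t3 = (t2 = t1)
  return t3.degrees;
}
```

If operator= returned void instead, the inner assignment would produce nothing for the outer one to copy, and the chained statement would not compile.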

The compiler is all too happy to generate an assignment operator for us, but can we trust what it generates? To understand when we can and cannot trust it, we need to understand the different ways in which copying can occur.

Example 4: Assigning Books
book1.h
#ifndef BOOK_H
#define BOOK_H

#include <string>
#include "author.h"
#include "publisher.h"

class Book {
public:
  Book();
  
  Book (std::string theTitle, const Publisher* thePubl,
		      int numberOfAuthors, Author* theAuthors,
			  std::string theISBN);

  Book (std::string theTitle, const Publisher* thePubl,
        const Author& theAuthor,
        std::string theISBN);

  std::string getTitle() const        {return title;}
  void setTitle(std::string theTitle) {title = theTitle;}

  Publisher* getPublisher() const    {return publisher;}
  void setPublisher(const Publisher* publ) {publisher = publ;}

  int getNumberOfAuthors() const {return numAuthors;}

  Author getAuthor (int authorNumber) const;
  void addAuthor (const Author&);
  void removeAuthor (const Author&);

  std::string getISBN() const   {return isbn;}
  void setISBN(std::string id)  {isbn = id;}

private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  static const int maxAuthors = 12;
  Author authors[maxAuthors];
  std::string isbn;
};


#endif

If we provide no assignment operator for Book, the compiler generates one for us. It would be equivalent to

Book& Book::operator= (const Book& b)
{
  title = b.title;
  isbn = b.isbn;
  publisher = b.publisher;
  numAuthors = b.numAuthors;
  // maxAuthors is a static constant: there is nothing to assign
  authors = b.authors;   // array version: every element is copied
  return *this;
}

Again, we’ve seen that this would be fine for a Book implemented using a simple array, but a disaster for a Book implemented using a dynamically allocated array.

2.2.2 Do We Trust the Compiler?

So we conclude

The compiler-generated assignment operator is wrong for classes that have pointers among their data members to data that they don’t want to share.

2.2.3 Implementing a deep assignment operator

So for our dynamic array version of Book, we need to implement our own assignment operator. We start by adding the operator declaration:

class Book {
public:
  Book();

  Book (std::string theTitle, const Publisher* thePubl,
           int numberOfAuthors, Author* theAuthors,
           std::string theISBN);

  Book (std::string theTitle, const Publisher* thePubl,
       const Author& theAuthor,
       std::string theISBN);

  Book(const Book&);

  const Book& operator= (const Book&);

  std::string getTitle() const        {return title;}
  void setTitle(std::string theTitle) {title = theTitle;}

    ⋮

private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  static const int maxAuthors = 12;
  Author* authors; // array of authors
  std::string isbn;
};

Then we supply a function body for this operator.

const Book& Book::operator= (const Book& b)
{
  title = b.title;
  isbn = b.isbn;
  publisher = b.publisher;
  numAuthors = b.numAuthors;
  delete [] authors;                     // ➀
  authors = new Author[maxAuthors];
  for (int i = 0; i < numAuthors; ++i)   // ➁
    authors[i] = b.authors[i];
  return *this;
}

Most of the data members can be copied easily. But the authors pointer is copied by allocating a new array big enough to hold the existing data; then all the existing data has to be copied into the new array (➁).

Note also that one of the big differences between a copy constructor and an assignment operator is that copy constructors build new values, but assignment operators replace existing values. That means that one of the tasks of an assignment operator has to be to clean up the old value, which is what you see in step ➀, above.

And that leads us to an interesting issue…

2.2.4 Self-Assignment

If we assign something to itself:

x = x;

we normally expect that nothing really happens.

But when we are writing our own assignment operators, that’s not always the case. Sometimes assignment of an object to itself is a nasty special case that breaks things badly.

In the Book assignment operator we have just developed, what happens if we do b1 = b1;?

In step ➀, we deleted the existing authors array. In step ➁, we copy the old authors into the new array. But if we are assigning a book to itself, there won’t be any old authors left to copy, because we will have just deleted them.

So, instead of b1 = b1; leaving b1 unchanged, it would actually destroy b1.


Checking for Self-Assignment

bookSelfAsst.cpp
const Book& Book::operator= (const Book& b)
{
  if (this != &b)   // do nothing when a Book is assigned to itself
    {
      title = b.title;
      isbn = b.isbn;
      publisher = b.publisher;
      numAuthors = b.numAuthors;
      delete [] authors;
      authors = new Author[maxAuthors];
      for (int i = 0; i < numAuthors; ++i)
        authors[i] = b.authors[i];
    }
  return *this;
}

This is safer.

You might think that self-assignment is so rare that we wouldn’t need to worry about it. But, in practice, you might have lots of ways to reach the same object.

For example, we might have passed the same object as two different parameters of a function call foo(b1,b1). If the function body of foo were to assign one parameter to another, we would then have a self-assignment that would likely not have been anticipated by the author of foo and that would be very hard to detect in the code that called foo.

As another example, algorithms for sorting arrays often contain statements like

array[i] = array[j];

with a very real possibility that, on occasion, i and j might be equal.

So self-assignment does occur in practice, and it’s a good idea to check for this whenever you write your own assignment operators.
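The foo(b1,b1) scenario can be reproduced in a few lines. This is only a sketch: NameList is a hypothetical miniature of the dynamic-array Book, holding strings instead of Authors, and copyInto plays the role of foo. It assumes the list is created with at least one element.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the dynamic-array Book: it owns a heap array.
class NameList {
public:
  NameList (int n, const std::string& fill)
    : size(n), names(new std::string[n])
  {
    for (int i = 0; i < size; ++i)
      names[i] = fill;
  }

  NameList (const NameList& other)
    : size(other.size), names(new std::string[other.size])
  {
    for (int i = 0; i < size; ++i)
      names[i] = other.names[i];
  }

  NameList& operator= (const NameList& other)
  {
    if (this != &other)   // skip everything when assigning to ourselves
      {
        delete [] names;
        size = other.size;
        names = new std::string[size];
        for (int i = 0; i < size; ++i)
          names[i] = other.names[i];
      }
    return *this;
  }

  ~NameList () { delete [] names; }

  std::string first () const { return names[0]; }
  int getSize () const       { return size; }

private:
  int size;
  std::string* names;
};

// Like foo: it cannot tell that both parameters may name the same object.
void copyInto (NameList& destination, const NameList& source)
{
  destination = source;
}
```

With the check in place, copyInto(list, list) leaves list exactly as it was. Without the check, the loop would read from an array that had already been deleted.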

2.3 Destructors

We’ve already talked about the purpose of destructors. They are used to clean up objects that are no longer in use.

Once again, we find that we can’t do without them:

If you don’t provide a destructor for a class, the compiler generates one for you automatically.


Example 5: Destroying addresses
addrNoDestructor.h
class Address {
public:
  Address (std::string theStreet, std::string theCity,
           std::string theState, std::string theZip);

  std::string getStreet() const;
  void putStreet (std::string theStreet);
  
  std::string getCity() const;
  void putCity (std::string theCity);
  
  std::string getState() const;
  void putState (std::string theState);
  
  std::string getZip() const;
  void putZip (std::string theZip);
  
private:
  std::string street;
  std::string city;
  std::string state;
  std::string zip;
};


class Author
{
public:
  Author (std::string theName, Address theAddress, long id);

  std::string getName() const        {return name;}
  void putName (std::string theName) {name = theName;}

  const Address& getAddress() const   {return address;}
  void putAddress (const Address& addr) {address = addr;}

  long getIdentifier() const     {return identifier;}

private:
  std::string name;
  Address address;
  const long identifier;
};

We have not declared or implemented a destructor for any of our classes. For Address and Author, that’s OK.


Example 6: Destroying Books

Start with our simple array version.

book1.h
#ifndef BOOK_H
#define BOOK_H

#include <string>
#include "author.h"
#include "publisher.h"

class Book {
public:
  Book();
  
  Book (std::string theTitle, const Publisher* thePubl,
		      int numberOfAuthors, Author* theAuthors,
			  std::string theISBN);

  Book (std::string theTitle, const Publisher* thePubl,
        const Author& theAuthor,
        std::string theISBN);

  std::string getTitle() const        {return title;}
  void setTitle(std::string theTitle) {title = theTitle;}

  Publisher* getPublisher() const    {return publisher;}
  void setPublisher(const Publisher* publ) {publisher = publ;}

  int getNumberOfAuthors() const {return numAuthors;}

  Author getAuthor (int authorNumber) const;
  void addAuthor (const Author&);
  void removeAuthor (const Author&);

  std::string getISBN() const   {return isbn;}
  void setISBN(std::string id)  {isbn = id;}

private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  static const int maxAuthors = 12;
  Author authors[maxAuthors];
  std::string isbn;
};


#endif

This version of the book has all of its data in a single block of memory. Assuming that each data member knows how to clean up its own internal storage, there’s really nothing we would have to do when this book gets destroyed.

We can rely on the compiler-provided destructor.

Now, let’s think about the dynamically allocated array.

book2.h
#ifndef BOOK_H
#define BOOK_H

#include <string>
#include "author.h"
#include "publisher.h"

class Book {
public:
  Book();
  
  Book (std::string theTitle, const Publisher* thePubl,
		      int numberOfAuthors, Author* theAuthors,
			  std::string theISBN);

  Book (std::string theTitle, const Publisher* thePubl,
        const Author& theAuthor,
        std::string theISBN);

  Book(const Book&);
  const Book& operator= (const Book&);

  std::string getTitle() const        {return title;}
  void setTitle(std::string theTitle) {title = theTitle;}

  Publisher* getPublisher() const    {return publisher;}
  void setPublisher(const Publisher* publ) {publisher = publ;}

  int getNumberOfAuthors() const {return numAuthors;}

  Author getAuthor (int authorNumber) const;
  void addAuthor (const Author&);
  void removeAuthor (const Author&);

  std::string getISBN() const   {return isbn;}
  void setISBN(std::string id)  {isbn = id;}

private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  static const int maxAuthors = 12;
  Author* authors; // array of authors
  std::string isbn;
};


#endif

 

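In this version of the Book class, a portion of the data is kept on the heap. The compiler-generated destructor would discard the authors pointer without deleting the array it points to, so the storage for that array would never be recovered.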


To implement our own destructor, we start by adding the destructor declaration:

class Book {
public:
  Book();

  Book (std::string theTitle, const Publisher* thePubl,
           int numberOfAuthors, Author* theAuthors,
           std::string theISBN);

  Book (std::string theTitle, const Publisher* thePubl,
       const Author& theAuthor,
       std::string theISBN);

  Book(const Book&);
  const Book& operator= (const Book&);
  ~Book();
    ⋮
};

Then we supply a function body for this destructor.

Book::~Book()
{
  delete [] authors;
}

Not much needs to be done: just delete the array of authors that the pointer refers to.

2.3.1 Trusting the Compiler-Generated Destructor

By now, you may have perceived a pattern.

Compiler-generated destructors are wrong for an ADT when…

  • Your ADT has pointers among its data members, and
  • You don’t want to share the objects being pointed to.

Under those circumstances, the compiler-generated destructor would result in a memory leak by failing to recover storage of an allocated object that is not accessible from anywhere else.
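One way to convince yourself that a destructor really does recover the storage is to count the arrays that are currently allocated. The following sketch assumes a hypothetical class, Buffer, with a static counter added purely as instrumentation. Copying is declared but never implemented, so this class cannot be copied by accident.

```cpp
#include <cassert>

// Hypothetical class that owns a heap array and keeps a running count
// of how many such arrays currently exist.
class Buffer {
public:
  static int liveArrays;

  explicit Buffer (int n) : data(new int[n]) { ++liveArrays; }
  ~Buffer ()                                 { delete [] data; --liveArrays; }

private:
  Buffer (const Buffer&);             // declared but not implemented:
  Buffer& operator= (const Buffer&);  // copying a Buffer will not compile

  int* data;
};
int Buffer::liveArrays = 0;

int liveArraysInsideScope ()
{
  Buffer b (100);
  return Buffer::liveArrays;   // b is still alive when this value is read
}

int liveArraysAfterScope ()
{
  {
    Buffer b (100);
  }                            // b's destructor runs here
  return Buffer::liveArrays;
}
```

If the destructor were removed, the count would never come back down: each Buffer that went out of scope would leave its array behind on the heap.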

3 The Rule of the Big 3

The “Big 3” are the copy constructor, the assignment operator, and the destructor.

We’ve seen that the compiler will provide each of these if we don’t, but that the compiler-generated versions will be wrong for our ADT under identical circumstances.

This leads to the Rule of the Big 3:

If you provide your own version of any one of the Big 3, you should provide your own version of all 3.

This is an important rule of thumb for C++ programmers. Like all “rules of thumb”, there are exceptions, but they are rare.

Watch for situations where you have pointers to data you don’t share. When you see that, plan on implementing your own version of these three functions.
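Putting the three functions together, here is a compact sketch of a class that follows the rule. AuthorList is a hypothetical stand-in for the authors portion of Book, using strings instead of Author objects. The liveArrays counter and the helper function copiesAreIndependent are instrumentation invented for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical class owning a heap array, with all of the Big 3.
class AuthorList {
public:
  static int liveArrays;   // instrumentation: arrays currently allocated

  AuthorList ()
    : numAuthors(0), authors(new std::string[maxAuthors])
  { ++liveArrays; }

  // 1. copy constructor: allocate a separate array, then copy the data
  AuthorList (const AuthorList& a)
    : numAuthors(a.numAuthors), authors(new std::string[maxAuthors])
  {
    ++liveArrays;
    for (int i = 0; i < numAuthors; ++i)
      authors[i] = a.authors[i];
  }

  // 2. assignment operator: clean up the old value, then copy deeply
  AuthorList& operator= (const AuthorList& a)
  {
    if (this != &a)
      {
        delete [] authors;
        numAuthors = a.numAuthors;
        authors = new std::string[maxAuthors];
        for (int i = 0; i < numAuthors; ++i)
          authors[i] = a.authors[i];
      }
    return *this;
  }

  // 3. destructor: recover the array
  ~AuthorList () { delete [] authors; --liveArrays; }

  void add (const std::string& name)
  {
    if (numAuthors < maxAuthors)
      authors[numAuthors++] = name;
  }
  std::string get (int i) const { return authors[i]; }
  int size () const             { return numAuthors; }

private:
  static const int maxAuthors = 12;
  int numAuthors;
  std::string* authors;
};
int AuthorList::liveArrays = 0;

// Returns true if a copy and an assigned-to list stay independent
// of later changes to the original.
bool copiesAreIndependent ()
{
  AuthorList original;
  original.add ("Budd");
  AuthorList copied (original);
  AuthorList assigned;
  assigned = original;
  original.add ("Zeil");
  return copied.size() == 1 && assigned.size() == 1
      && original.size() == 2 && copied.get(0) == "Budd";
}
```

After copiesAreIndependent returns, liveArrays is back to zero: every array that was allocated by a constructor or an assignment has been deleted exactly once.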

3.1 What Happens When You Violate the Rule of the Big 3?

3.1.1 Suppose you are missing the destructor…

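…but you have the copy constructor and assignment operator. Every copy then allocates its own array, but nothing ever deletes those arrays, so storage leaks each time an object is destroyed.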

3.1.2 Suppose you are missing the copy constructor…

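…but you have the destructor and assignment operator. The compiler-generated copy constructor makes a shallow copy, so the original and the copy point to the same array. Whichever one is destroyed first deletes that array, leaving the other with a pointer to deleted storage, and the second destructor then tries to delete the same array again.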

3.1.3 Suppose you are missing the assignment operator…

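…but you have the destructor and copy constructor. The compiler-generated assignment operator copies the pointer shallowly, so after an assignment both objects share one array, which will eventually be deleted twice. The array that the left-hand object held before the assignment is never deleted at all.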

3.1.4 Would it be safer (and easier) to just omit all three?

It’s the worst of all possible worlds!

4 Moving Data – l-values and r-values

A common, and valid, criticism of C++ is that it forces so much copying to take place that programs are noticeably slowed down by all the copying.

The 2011 C++ standard responded to that criticism by introducing some new features for moving data rather than copying it. To understand this, though, we need to go back to some ideas from some of the very earliest programming languages.

4.1 L-values and R-values

Assignment is actually a rather interesting operation (even setting aside the fact that, in C++, you can overload operator= to make assignment mean almost anything you want). Ask someone for an example of assignment, and they might respond with something like this:

x = y;

But that really over-simplifies things, because we know that we can put almost any kind of expression on the right:

x = y;
x = y + 1;
x = a[i].data + f(x);

So we can see a difference in how the left and right hand sides of the assignment are treated. The left side names a location where we want to store something. The right denotes a value to store there. That value can be given as another location from which to fetch a value, or as an expression to compute the value to be stored.

But a little more thought shows that this picture is still over-simplified. There are many expressions that we can use on the left hand side as well:

x = y;
a[i] = y + 1;
b.title = a[i].data + f(x);
c.foo() = 42;

So is it just expressions on either side? No, not quite. There are some expressions that make no sense at all on the left hand side of an assignment:

y + 1 = 23; // No!
sqrt(x) = 2.0; // No!

So how is it that some expressions can appear on the left of an assignment but others cannot? Well, the important distinction is that any expression that appears on the left of the assignment must somehow compute a location. On the right hand side, we can have expressions that yield locations or “pure” calculated values.

An l-value is an expression that denotes a location where data can be stored.

An r-value is an expression that denotes a value that can be stored in a location.

int x;
int a[100];
x = 1; // OK: x is a location
x+1 = 1; // No: x+1 is not a location
a[2] = 1; // OK: a[2] is a location
a[x+1] = 1; // OK: a[x+1] is a location

I’ve seen authors explain the names by indicating that the “l” stands for “location” and the “r” for, well, that one varies a bit, sometimes “reference”, sometimes something else. But that’s pure revisionism. It’s clear from the earliest uses of the terms that the “l” and the “r” stand for “left” and “right”, because they describe the idea that an assignment is legal if it has the form

l-value = r-value;

So how is it that, on the right, we can sometimes have a location and sometimes a “real” data value?

x = 2*y; // OK: 2*y is an int
x = y;   // OK: y is a location (an int&)

In older programming languages, this was explained by claiming that the l-values are a special case of r-values. After all, the very existence of pointers shows that locations can be stored as data. But that explanation is a trifle weak, because it fails to explain why ordinary assignment copies values instead of addresses.

int x, y;
int* p;
x = y; // OK
x = p; // No good
p = y; // Also no good

C++ took a more formal approach to this problem by introducing reference types as fundamental types in the language. A reference holds a location, and assignment is defined as taking a reference type on the left and introducing some special rules:

 

Reference types turn out to be very useful. Among other things, they open up a useful option for functions. When we have a function with an “output parameter”, that’s because we actually passed in a reference type for that parameter.

For the purposes of this discussion, however, what is important is that reference types are l-values. Any operator or function whose return type is a reference type can be used to supply a location to which we can assign:

int a[100];

int& foo(int i) { return a[i]; }
int bar(int i) { return a[i]; }
   ⋮
foo(0) = 12; // OK: assigns to a[0]
bar(1) = 11; // compilation error - bar() does not return a reference

OK, so references are l-values. Each reference holds a location where data can be stored.

int a[100];
int k = 1;
int& x = a[2*k+1]; // x holds the location of a[3]
x = 22; // stores 22 at a[3]

Now let’s shift our attention back to the right. It’s certainly possible in C++ to have references on the right as well:

int a[100];
int k = 1;
int y;
int& x = a[2*k+1]; // x holds the location of a[3]
y = x; // copies a[3] into y

The thing to note here is that references denote actual memory locations where data is kept. But what about a statement like

x = y+1;

Where is the value y+1 stored prior to the actual assignment? In fact it might not be stored in memory at all. It might be a value that is computed in a CPU register and held there until we are ready to perform the assignment into x. If it does get stored in memory, it would be in some temporary storage location not directly accessible to the programmer.

y+1 is a “pure” r-value without a notion of “storage location”.

4.2 R-values and Returns

Suppose that we have a function to produce a new edition of a Book:

Book newEdition (const Book& ofBook)
{
   Book b = ofBook;
   ++b.edition;
   return b;
}

We’ve already discussed how the return statement triggers a call to the Book copy constructor, so that a copy of the book b is made as part of the return. That means that, if we use our new function like this:

Book oldBook;
  ⋮
Book newBook = newEdition(oldBook); ➀

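then line ➀ actually results in 2 calls to the Book copy constructor, one to enact the return statement in the newEdition function, and the second to copy that return value into newBook. (Compilers are allowed to optimize away one or both of these copies, but we cannot count on that in every situation.)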

Let’s assume, for the sake of example, that we are using the dynamic array version of Book. That’s two new arrays allocated on the heap. Now, as soon as the returned value has been copied, it is no longer usable, so its destructor will be invoked. That destructor will delete one of those new arrays on the heap. So, we went to all the trouble of building it just to immediately throw it away.

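The C++11 standard provides a new mechanism that enables us to avoid excess copying when working with temporary values like that: a class can provide a move constructor and a move assignment operator, each of which takes its parameter as an r-value reference (written Book&& for our Book class).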

In some circumstances, these two “move” functions might run faster by taking advantage of the fact that, if the value being copied is known to be a temporary that’s about to be destroyed, there’s no penalty if we destroy the value in the course of copying it.

I know, that sounds strange. Let’s look at some examples.

Example 7: Book Copy and Move Constructors

We’ve already looked at a copy constructor for Book:

Book::Book (const Book& b)
  : title(b.title), isbn(b.isbn),
    publisher(b.publisher), edition(b.edition),
    numAuthors(b.numAuthors), MAXAUTHORS(b.MAXAUTHORS)
{
  authors = new Author[numAuthors+1];
  for (int i = 0; i < numAuthors; ++i)
    authors[i] = b.authors[i];
}

In this copy constructor, we had to copy the non-pointer data members and then allocate a new array and copy the contents of the old array into the new one.

This leaves us with two perfectly valid books, the original book b and the newly constructed one. But suppose that we know that we will be doing a lot with functions like newEdition that construct and return books, which appear in the caller as temporary variables:

Book newBook = newEdition(oldBook);

We could avoid creating and copying the array by providing a move constructor:

Book::Book (Book&& b)
  : title(b.title), isbn(b.isbn),
    publisher(b.publisher), edition(b.edition),
    numAuthors(b.numAuthors), MAXAUTHORS(b.MAXAUTHORS),
    authors(b.authors)
{
  b.authors = nullptr;
}

Instead of allocating a new array, we copy the address of the old array into the new book. Now, from our previous discussion of shallow copying, we know that this leads to sharing, which is not what we want here.

But, this constructor will only be selected by the compiler if b is a temporary r-value, which means that it’s going to go away as soon as this call is finished. So, sharing problem solved, right? Well, just one thing to worry about. The Book destructor deletes the authors array, so when b is destroyed, it will try to take its array with it. We stymie that by deliberately breaking b’s pointer to its array by setting that to null. In effect, we are deliberately trashing b now that we have got what we want from it. Because b no longer has a pointer to that array, it won’t be able to destroy it.

Example 8: Book Assignment and Move Assignment

We can do something similar with assignment. Our old assignment operator for books was

bookSelfAsst.cpp
const Book& Book::operator= (const Book& b)
{
  if (this != &b)
    {
      title = b.title;
      isbn = b.isbn;
      publisher = b.publisher;
      edition = b.edition;
      numAuthors = b.numAuthors;
      MAXAUTHORS = b.MAXAUTHORS;
      delete [] authors;
      authors = new Author[MAXAUTHORS];
      for (int i = 0; i < numAuthors; ++i)
        authors[i] = b.authors[i];
    }
  return *this;
}

We can write a special version for assigning from r-values:

Book& Book::operator= (Book&& b)
{
  if (this != &b)
    {
      title = b.title;
      isbn = b.isbn;
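      publisher = b.publisher;
      edition = b.edition;
      numAuthors = b.numAuthors;
      MAXAUTHORS = b.MAXAUTHORS; // the adopted array has b's capacity
      delete [] authors;         // release our old array
      authors = b.authors;       // take over b's array
      b.authors = nullptr;       // so b's destructor cannot delete it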
    }
  return *this;
}

This saves the time that would be spent allocating and copying data from an array that would be discarded momentarily, anyway.

4.3 The Rule of the Big 5?

So, do we now have a “Rule of the Big 5” to replace the former “Rule of the Big 3”?

Well, yes and no.

The Rule of the Big 3 was important because violations of it almost always meant that our program would not function correctly.

The two new move operations are not nearly so crucial. Implementing them may result in a speedup of our code, but the code would still function correctly without them. So, if there is a Rule of the Big 5, it would have to be something like this:

If you provide your own version of a copy constructor, assignment operator, or destructor, you should provide your own version of all 3.

You should then at least consider providing your own version of the move constructor and move assignment operator.

Not quite as catchy, and it remains to be seen just how widely the move functions will be embraced by the C++ programming community.

5 Summary

The Big 3 are the destructor, copy constructor, and assignment operator.

If your class has pointers among its data members and the data is not something you want to share among different class instances, you need to implement your own versions of the Big 3.

Rule of the Big 3: If you implement your own version of any of the Big 3, you usually will need to implement your own versions of all 3.