Implementing ADTs in C++

Steven J. Zeil

Last modified: Oct 26, 2023

Once we have defined our desired ADT interface, it’s time to look at implementation – choosing a data structure and writing algorithms for the member functions.

In C++, this is generally done using a C++ class. The class enforces the ADT contract by dividing the information about the implementation into public and private portions. The ADT designer declares all items that the application programmer can use as public, and makes the rest private. An application programmer who then tries to use the private information will be issued error messages by the compiler.
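As a minimal sketch of that enforcement (the stripped-down Book here is a hypothetical stand-in, not the class developed in the rest of this lesson), the commented-out line shows the kind of access that a compiler would reject:

```cpp
#include <string>

// Hypothetical, stripped-down Book used only to illustrate public vs. private.
class Book {
public:
  std::string getTitle() const {return title;}
  void setTitle(const std::string& theTitle) {title = theTitle;}
private:
  std::string title;   // hidden from application code
};

// Application code may only go through the public interface.
std::string retitle(Book& b)
{
  b.setTitle("Emma");       // OK: setTitle is public
  // b.title = "Emma";      // error if uncommented: 'title' is private
  return b.getTitle();      // OK: getTitle is public
}
```

The function retitle plays the role of the application programmer's code: it can change the title only by asking the Book to do so.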

1 Data Members

To be useful, an ADT must usually contain some internal data. These are declared as data members of the class. To continue our prior example, what data should we associate with a book? Earlier, we said that “A Book has a title, one or more authors, and a unique identification code.” and we can declare these as shown here.

class Book {
    ⋮
 private:
  std::string title;
  std::string isbn;
  Publisher publisher;
  int numberOfAuthors;
  static const int maxAuthors = 12;
  Author authors[maxAuthors];  // array of authors
};

Note that it’s not unusual for data members to involve still other ADTs (e.g., the Author and Publisher data types in this example are presumably declared elsewhere as classes with a number of different data fields, including name, address, etc.).

2 Function Members

2.1 Many Function Members are Simple

Many ADTs are rich in attributes and lean in operations. That means that many of the function members will be “gets” and “sets” that do little more than fetch and store in private data members.

class Book {
public:
  Book();

  Book(const std::string& title, const std::string& isbn, 
       const Publisher& publisher,
       Author* authors = nullptr, int numAuthors = 0);

  std::string getTitle() const {return title;}
  void setTitle(std::string theTitle) {title = theTitle;}

  const Publisher& getPublisher() const {return publisher;}
  void setPublisher(const Publisher& publ) {publisher = publ;}

  int getNumberOfAuthors() const {return numberOfAuthors;}

  Author getAuthor (int authorNumber) const;
  void addAuthor (const Author& au);
  void removeAuthor (const Author& au);

  std::string getISBN() const {return isbn;}
  void setISBN(std::string id) {isbn = id;}

  bool operator== (const Book& right) const;
  bool operator< (const Book& right) const;

private:
  std::string title;
  std::string isbn;
  Publisher publisher;
  int numberOfAuthors;
  static const int maxAuthors = 12;
  Author authors[maxAuthors];  // array of authors
};

Because these are so simple, they are good candidates for inlining.
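For illustration, here is a small sketch of the two usual ways to inline a member function. The two-field class is a hypothetical stand-in, not the Publisher class used elsewhere in these notes.

```cpp
#include <string>

// Hypothetical cut-down class, used only to show the two ways of inlining.
class Publisher {
public:
  // Defined inside the class body: implicitly inline.
  std::string getName() const {return name;}

  // Declared here, defined below with the inline keyword.
  void setName(const std::string& theName);
private:
  std::string name;
};

// An inline definition outside the class belongs in the header, so that
// every .cpp file that calls it can see the body.
inline void Publisher::setName(const std::string& theName)
{
  name = theName;
}
```

Either form gives the compiler the option of replacing a call with the function body; which one to use is largely a matter of how cluttered we are willing to let the class declaration become.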

2.2 Separately Compiled Member Functions

The remaining member functions would then have their bodies placed in a separate .cpp file. The usual convention is to package each ADT into its own pair of appropriately named .h and .cpp files. The class declaration above would typically appear in a file named book.h, as shown here.

#ifndef BOOK_H
#define BOOK_H 

#include <iostream>
#include <string>

#include "author.h"

class Publisher;                                 ➀


class Book {
public:
    ⋮
private:
  std::string title;
  Publisher* publ;
  int numAuthors;
  static const int maxAuthors = 12;
  Author* authors;  // array of authors
  std::string isbn;
};

std::ostream& operator<< (std::ostream& out, const Book& book);

#endif

Then, in a separate file named book.cpp we would place the remaining function definitions (bodies).

#include "book.h"
#include "publisher.h"

#include <cassert>
#include "arrayManipulation.h"

using namespace std;


Book::Book()              ➀
: title(), isbn(),        ➁
  publisher(),
  numAuthors(0)
{

}

Book::Book(const std::string& theTitle, const std::string& theISBN, 
           const Publisher& thePublisher,
		   Author* theAuthors, int theNumAuthors)
: title(theTitle), isbn(theISBN),            ➁
  publisher(thePublisher), numAuthors(0)
{
	for (int i = 0; i < theNumAuthors; ++i)
	{
		addAuthor(theAuthors[i]);
	}
}


void Book::addAuthor (const Author& au)      ➀
{
	assert (numAuthors < maxAuthors);
	addIfNotPresent(authors, numAuthors, au.getName());  ➂
}

void Book::removeAuthor (const Author& au)
{
	removeIfPresent (authors, numAuthors, au.getName());  ➂
}

Author Book::getAuthor(int i) const
{
	return Author(authors[i], Address());
}


bool Book::operator== (const Book& right) const
{
	return getISBN() == right.getISBN();    ➃
}

bool Book::operator< (const Book& right) const
{
	return getISBN() < right.getISBN();   ➃
}


std::ostream& operator<< (std::ostream& out, const Book& book)
{
	out << book.getTitle() << ", by ";
	for (int i = 0; i < book.getNumberOfAuthors(); ++i)
	{
		if (i > 0)
			out << ", ";
		out << book.getAuthor(i);
	}
	out << "; " << book.getPublisher() << ", " << book.getISBN();
	return out;
}

2.3 Initialization Lists

There is a subtle but important distinction in C++ between initializing data and assigning data.

Initializing data is something that happens only once, when that data value is created. Assigning replaces the value of data that already exists, and can happen any number of times. For example, we assign variables like this:

int x;
string s1, s2;
  ⋮
x = 23;
s1 = s2;

We initialize variables like this:

int x = 0;
double pi (3.14159);
string s1 = "abcdef";
string s2 ("abcdef");
string s3 {"abcdef"};
string s4;
Address holmesHome ("221b Baker St", "London", "", "England");

You can easily tell the difference between assignment and initialization because the initialization is combined with declaring the variables, so the type name appears in front of the variable being initialized.

Initialization of a value takes place by invoking the constructor for that value’s data type. The first three strings are each initialized by calling the constructor for class std::string that expects a single parameter of type const char* (the data type of string literals like "abcdef"). If a constructor takes exactly one parameter, then the initialization can be written in any of the three forms shown above for s1, s2, and s3. Here they all mean exactly the same thing. The { } form for s3 was added in C++11, and is intended to make initialization look less like ordinary assignment or like an ordinary function call.
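A quick way to convince ourselves that the three forms agree for strings is a sketch like the following. The function name sameInitialization is made up for this illustration.

```cpp
#include <string>

// Returns true if the three single-parameter initialization forms
// produce equal strings.
bool sameInitialization()
{
  std::string s1 = "abcdef";    // "=" form
  std::string s2 ("abcdef");    // parenthesized form
  std::string s3 {"abcdef"};    // brace form (C++11 and later)
  return s1 == s2 && s2 == s3;
}
```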

When writing constructors for classes, it’s tempting to write them much like ordinary functions, using a series of assignments:

Book::Book()             
{
    title = "";
    isbn = "";
    publisher = Publisher();
    numAuthors = 0;
}

Book::Book(const std::string& theTitle, const std::string& theISBN, 
           const Publisher& thePublisher,
		   Author* theAuthors, int theNumAuthors)
{
  title = theTitle;
  isbn = theISBN;
  publisher = thePublisher;
  numAuthors = 0;
  for (int i = 0; i < theNumAuthors; ++i)
	{
		addAuthor(theAuthors[i]);
	}
}

But these are all assignments. Before they even begin, the data members of the class have already been initialized (using their class’s default constructor). Then our assignments come along and write all over those initial values. This is rather inefficient.
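To make the extra work visible, here is a sketch that counts what happens to a data member under each style. Everything in it (Traced, AssignedBook, InitializedBook, operationsUsed) is hypothetical scaffolding for the illustration, not part of the Book ADT.

```cpp
#include <string>

// Hypothetical member type that counts how it gets its value.
struct Traced {
  static int defaults, copies, assigns;
  std::string value;

  Traced() {++defaults;}
  Traced(const Traced& t) : value(t.value) {++copies;}
  Traced& operator= (const Traced& t) {value = t.value; ++assigns; return *this;}
};
int Traced::defaults = 0;
int Traced::copies = 0;
int Traced::assigns = 0;

// Assignment style: the member is default-constructed, then overwritten.
struct AssignedBook {
  Traced title;
  AssignedBook(const Traced& t) {title = t;}
};

// Initialization-list style: the member is copy-constructed once.
struct InitializedBook {
  Traced title;
  InitializedBook(const Traced& t) : title(t) {}
};

// Returns the number of operations applied to the member 'title'
// when one book is built in the given style.
int operationsUsed(bool useInitializationList)
{
  Traced source;
  Traced::defaults = Traced::copies = Traced::assigns = 0;
  if (useInitializationList) {
    InitializedBook b(source);
  } else {
    AssignedBook b(source);
  }
  return Traced::defaults + Traced::copies + Traced::assigns;
}
```

With these definitions, operationsUsed(false) reports two operations (a default construction followed by an assignment), while operationsUsed(true) reports one (a single copy construction).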

We could simplify the Book default constructor a bit. Since we are overwriting each of the data members with the same value that their default constructors will have already initialized them to, we could write

Book::Book()             
{
    numAuthors = 0;
}

(Integers, by default, simply keep whatever random bits happened to be in that block of memory, so we still need to put some better value into numAuthors.) Although faster, this has the disadvantage of being less helpful to someone reading the code trying to perceive what data members are initialized and how.

Another way to write the same constructor is to make use of an initialization list, a special C++ syntax for initializing data members. It’s the part between the colon and the opening { in the constructor code below.

Book::Book()             
: title(), isbn(),
  publisher(),
  numAuthors(0)
{
}

Book::Book(const std::string& theTitle, const std::string& theISBN, 
           const Publisher& thePublisher,
		   Author* theAuthors, int theNumAuthors)
: title(theTitle), isbn(theISBN),
  publisher(thePublisher), numAuthors(0)
{
	for (int i = 0; i < theNumAuthors; ++i)
	{
		addAuthor(theAuthors[i]);
	}
}

These describe how to initialize the data members. Each item in the list represents a call to a constructor. For example, in the first constructor

title()

means to initialize title (a string) by invoking the string constructor that takes no parameters (the default constructor for string), while, in the second constructor,

title(theTitle)

means to initialize title by calling the string constructor that takes a single expression of type string as its only parameter (the string copy constructor).


Initialization lists

Initialization lists are often faster and more efficient than doing the same steps via normal assignment.


How data members are initialized in constructors

Each data member of a class is initialized, before the { } body of the constructor is started, according to the following rules:

  1. If a data member x is of a class/struct type T and x is listed in the initialization list as x(parameters), then x is initialized using the constructor T(parameters).
  2. If a data member x is of a primitive (not a class or struct) type T and x is listed in the initialization list as x(expression), then x is initialized as if by the assignment x = expression.
  3. If a data member x is of a class/struct type T and x is not listed in the initialization list, then x is initialized using the constructor T().

So you can see that all non-primitive data members will be initialized before the constructor’s function body begins. It only makes sense to try and make that first initialization be the one that we actually want.
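The three rules can be watched in action with a sketch like this one. The names initLog, Noisy, Demo, and traceDemo are all invented for the illustration.

```cpp
#include <string>

std::string initLog;   // records the order of events (illustration only)

// Hypothetical class type whose constructors announce themselves.
struct Noisy {
  Noisy() {initLog += "Noisy();";}
  Noisy(int) {initLog += "Noisy(int);";}
};

struct Demo {
  Noisy listed;      // rule 1: listed, so Noisy(int) is used
  int count;         // rule 2: listed primitive, initialized to 7
  Noisy unlisted;    // rule 3: not listed, so Noisy() is used

  Demo()
  : listed(42), count(7)
  {
    initLog += "body;";   // runs only after every member is initialized
  }
};

// Builds one Demo and returns the recorded sequence of events.
std::string traceDemo()
{
  initLog.clear();
  Demo d;
  return initLog + "count=" + std::to_string(d.count);
}
```

Calling traceDemo() yields "Noisy(int);Noisy();body;count=7": both class-type members have been constructed, and the integer has its value, before the first statement of the constructor body runs.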

2.4 What is “this”?

With the ADT we have now prepared, we could write code like:

Book b1;
Book b2;
  ⋮
if (b1.getAuthor(0) == b2.getAuthor(0))
{
   ⋮

Now, let’s look at getAuthor again:

Author Book::getAuthor (int i) const
{
  return authors[i];
}

authors is, of course, a data member of the Book class. But in the two calls above, we expect that the getAuthor function will retrieve b1.authors on the first call and b2.authors on the second. But how does the code produced by compiling the function body above “know” which book object it should be working with?

The answer lies in a bit of legerdemain carried out by the C++ language designers. If you were to look at the code actually generated for member functions, you would discover that all (non-static) member functions have a hidden parameter, a pointer to the object that will appear to the left of the ‘.’ when we write calls like b1.getAuthor(0). So when we write:

class Book {
public:
     ⋮
  std::string getTitle() const;
  void setTitle(std::string theTitle);
     ⋮
  Author getAuthor (int i) const;
     ⋮
};

what the compiler actually generates is

class Book {
public:
     ⋮
  std::string getTitle(const Book* this) const;
  void setTitle(Book* this, std::string theTitle);
     ⋮
  Author getAuthor (const Book* this, int i) const;
     ⋮
};

Every non-static member function has a hidden first parameter. Its name is “this” and its data type is a pointer to the class being declared.

If the member function is const, this will be a const pointer (more precisely, a pointer to a const object, such as const Book*). const pointers can be used to look at the data they point to, but cannot be used to change that data.

So when we write

if (b1.getAuthor(0) == b2.getAuthor(0))

how does Book::getAuthor know whether to access b1’s data members or b2’s data members? The answer is that

  1. Member function calls written with an object and ‘.’ on the left, like

    b1.getAuthor(0) 
    

    are actually translated as if we had written

    Book::getAuthor(&b1, 0);
    

    The address of the variable on the left of the ‘.’ is passed as the first, hidden parameter.

  2. The C++ compiler will insert a dereference of the hidden pointer parameter whenever it sees a data or function member name that can’t be properly compiled otherwise. So when we write:

    Author Book::getAuthor (int authorNumber) const
    {
      return authors[authorNumber];
    }
    

    the compiler pretends that we wrote

    Author Book::getAuthor (const Book* this, int authorNumber)
    {
      return this->authors[authorNumber];
    }
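
We cannot write a parameter named this ourselves, but we can imitate the translation by hand. In this sketch, getTitleExplicit is a hypothetical function that takes the object's address as an ordinary, visible parameter (here called self), which is roughly what the compiler arranges for getTitle behind the scenes.

```cpp
#include <string>

// Hypothetical simplified Book with a single data member.
class Book {
public:
  Book(const std::string& theTitle) : title(theTitle) {}

  // Ordinary member function: the compiler supplies 'this' for us.
  std::string getTitle() const {return title;}

  // Hand-written imitation: the object's address is an explicit parameter.
  static std::string getTitleExplicit(const Book* self) {return self->title;}

private:
  std::string title;
};

// Both calls fetch the title of whichever book is named in the call.
bool sameResult(const Book& b)
{
  return b.getTitle() == Book::getTitleExplicit(&b);
}
```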
    

We almost never need to write this-> in our code. But there are times when we do need to do things to the object on the left of a ‘.’ call other than use it with a -> dereference. In those cases, we need to know that this is very real even though it is hidden, that it is a pointer, and that it is available for us to use in our code just like any other function parameter.
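Two common uses of this in its own right are sketched below, under the assumption of a simplified Book. The chaining setters and isSameObject are illustrations only, not part of the Book interface developed above.

```cpp
#include <string>

// Hypothetical simplified Book; isSameObject and the chaining setters
// are illustrations, not part of the Book interface above.
class Book {
public:
  // Returning *this lets calls be chained: b.setTitle(...).setISBN(...)
  Book& setTitle(const std::string& theTitle) {title = theTitle; return *this;}
  Book& setISBN(const std::string& id) {isbn = id; return *this;}

  std::string getTitle() const {return title;}
  std::string getISBN() const {return isbn;}

  // Comparing addresses asks "are these the very same object?"
  bool isSameObject(const Book& other) const {return this == &other;}

private:
  std::string title;
  std::string isbn;
};
```

In the first case we dereference this as a whole (*this) to hand back the object itself; in the second we use the pointer value directly, without dereferencing it at all.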

3 Filling in the Related ADTs

The Book is not our only class in this world. Let’s start to fill in the other ADTs from our example:

Address is pretty simple:

class Address {
public:

  Address (std::string theStreet = std::string("unknown"),
		   std::string theCity = std::string(),
           std::string theState = std::string(),
           std::string theZip = std::string());

  std::string getStreet() const {return street;}
  void setStreet (std::string theStreet) {street = theStreet;}

  std::string getCity() const {return city;}
  void setCity (std::string theCity) {city = theCity;}

  std::string getState() const {return state;}
  void setState (std::string theState) {state = theState;}

  std::string getZip() const {return zipcode;}
  void setZip (std::string theZip) {zipcode = theZip;}

  bool operator== (const Address& right) const;
  bool operator< (const Address& right) const;

private:
  std::string street;
  std::string city;
  std::string state;
  std::string zipcode;
};

inline Address::Address (std::string theStreet, std::string theCity,
        std::string theState, std::string theZip)
  : street(theStreet), city(theCity), state(theState), zipcode(theZip)
{
}


std::ostream& operator<< (std::ostream& out, const Address& addr);

Note that, again, you can see the use of an initialization list in the Address constructor.

Author seems no more complicated than Book:

class Author
{
public:
  Author();

  Author (std::string theName, const Address& theAddress);

  std::string getName() const        {return name;}
  void setName (std::string theName) {name = theName;}

  const Address& getAddress() const   {return address;}
  void setAddress (const Address& addr) {address = addr;}

  int numberOfBooks() const;
  Book& getBook(int i);
  const Book& getBook(int i) const;

  void addBook (Book& b);
  void removeBook (Book& b);

  bool operator== (const Author& right) const;
  bool operator< (const Author& right) const;


private:
  std::string name;
  Address address;

  static const int BookMax = 10;
  int numBooks;
  Book books[BookMax];
};

std::ostream& operator<< (std::ostream& out, const Author& author);

But when we consider this together with Book, we can see a problem. If every Author object contains the Books that person has written, and every Book object contains the list of Authors of that book, we have a conflict. So the block of memory for one Book will include bytes reserved to contain several Authors, each of which will set aside some bytes to contain several Books, each of which will contain bytes to hold its Authors, each of which will… This never ends.

We have a similar problem when we introduce the Publishers:

class Publisher
{
public:
  Publisher (std::string theName = std::string());

  std::string getName() const        {return name;}
  void setName (std::string theName) {name = theName;}

  int numberOfBooks() const;
  Book& getBook(int i);
  const Book& getBook(int i) const;
  void addBook (Book& b);

  int numberOfAuthors() const;
  Author& getAuthor(int i);
  const Author& getAuthor(int i) const;
  void addAuthor (const Author& au);

  bool operator== (const Publisher& right) const;
  bool operator< (const Publisher& right) const;


private:
  std::string name;

  static const int BookMax = 10;
  int numBooks;
  Book books[BookMax];

  static const int AuthorMax = 20;
  int numAuthors;
  Author authors[AuthorMax];

};

std::ostream& operator<< (std::ostream& out, const Publisher& publ);

If every Book contains its Publisher, which contains all of the Books it publishes, each of which contains its Publisher, … There we go again.

There are a couple of ways out of this dilemma:

3.0.1 Cheating – Break the Abstraction

We can “cheat” by not storing the authors and publishers directly in the books, but only storing their names. We would need to alter the interface accordingly, e.g.,

class Book {
public:
  Book();

  Book(const std::string& title, const std::string& isbn, 
       const Publisher& publisher,
       Author* authors = nullptr, int numAuthors = 0);

  std::string getTitle() const {return title;}
  void setTitle(std::string theTitle) {title = theTitle;}

  const std::string& getPublisher() const {return publisher;}
  void setPublisher(const std::string& publ) {publisher = publ;}

  int getNumberOfAuthors() const {return numberOfAuthors;}

  std::string getAuthor (int authorNumber) const;
  void addAuthor (std::string);
  void removeAuthor (std::string);

  std::string getISBN() const {return isbn;}
  void setISBN(std::string id) {isbn = id;}

  bool operator== (const Book& right) const;
  bool operator< (const Book& right) const;

private:
  std::string title;
  std::string isbn;
  std::string publisher;
  int numberOfAuthors;
  static const int maxAuthors = 12;
  std::string authors[maxAuthors];  // array of author names
};

I consider this cheating because it’s not the way our original abstract idea worked. If we ever want, for example, to get the address of the author of a book, or to find out if a book’s author has other books under the same publisher, we’re going to need some significant additional coding and data structures to obtain that kind of information.

Nonetheless, it’s a workable approach, and relatively simple, and is an example that we will use in several later lessons.

3.0.2 Use Pointers

If we want to stay faithful to our original abstraction, we can do so by breaking the unending “contains” recursion using pointers.

Indeed, this was hinted at in our earlier diagram.
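A minimal sketch of why pointers break the cycle, using hypothetical cut-down versions of the two classes with public data only: a pointer has a fixed size no matter what it points to, so each class can be laid out in memory without knowing the size of the other.

```cpp
#include <string>

class Author;   // forward declaration: enough to declare an Author*

// Hypothetical cut-down Book: it refers to its authors instead of containing them.
class Book {
public:
  std::string title;
  static const int maxAuthors = 12;
  Author* authors[maxAuthors];   // 12 addresses, not 12 whole Authors
};

// Hypothetical cut-down Author: it refers to its books the same way.
class Author {
public:
  std::string name;
  static const int BookMax = 10;
  Book* books[BookMax];          // 10 addresses, not 10 whole Books
};

// Each object now has a fixed, finite size.
static_assert(sizeof(Book) >= Book::maxAuthors * sizeof(Author*),
              "a Book holds room for its author pointers");
static_assert(sizeof(Author) >= Author::BookMax * sizeof(Book*),
              "an Author holds room for its book pointers");
```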

 

4 The Perils of Pointers

…and pointer-like types.

4.1 Pointers and References

First, a quick review.

Both pointers and references hold the address of a value.

4.1.1 Pointers

A pointer type is indicated by following a type name with *, e.g. Author*.

Usually, we generate a pointer value by allocating an object on the heap:

Author* auPtr = new Author("William Shakespeare", avonAddress);

Once that pointer value has been obtained, we can copy it to other places, pass it to functions, store it in other data structures, etc.

If we want to get at the entire object denoted by a pointer, we use the * operator. If we want to get at a function or data member of an object denoted by a pointer, we use the -> operator.

Author shakespeare = *auPtr; 
string name = auPtr->getName();

A special pointer value, the null pointer, written in C++ as nullptr, indicates a pointer that does not actually contain a valid address.

Objects allocated on the heap stay there until our code explicitly removes them, which it does via delete.

delete auPtr;  // hands the allocated storage 
               // back to the system for later reuse.

There’s a special case for arrays. If we use a pointer to point to (the start of) an array:

Book* readingList = new Book[30];

then to later recover the storage we say

delete [] readingList;
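
Putting those pieces together, here is a small sketch of the whole life cycle. The Author struct is a hypothetical stand-in reduced to a name, and pointerTour is a made-up function name.

```cpp
#include <string>

// Hypothetical stand-in for the Author class, reduced to a name.
struct Author {
  std::string name;
  std::string getName() const {return name;}
};

// Walks through the life cycle of one heap object and one heap array.
std::string pointerTour()
{
  Author* auPtr = new Author{"William Shakespeare"};  // allocate on the heap
  Author copy = *auPtr;                   // * yields the whole object
  std::string name = auPtr->getName();    // -> reaches a member
  delete auPtr;                           // storage returned to the system
  auPtr = nullptr;                        // no longer holds a valid address

  Author* group = new Author[3];          // pointer to the start of an array
  group[0] = copy;
  name += " / " + group[0].getName();
  delete [] group;                        // array form of delete
  return name;
}
```

Note that every new is matched by exactly one delete of the corresponding form.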

4.1.2 References

References also hold addresses to objects, but they have several key differences from pointers.

A reference type is indicated by following a type name with &, e.g. Author&.

We generate a reference value by initializing it with the object we want it to point to:

Author shakespeare ("William Shakespeare", avonAddress);
Author& auRef = shakespeare;  // auRef holds the memory address of shakespeare

Once that reference value has been obtained, we can use it to initialize other references and pass it to functions, but we cannot change where it points to. This is unlike pointer variables, which can be reassigned the addresses contained in other pointers.

If we want to get at the entire object denoted by a reference, we just use the name of the reference. If we want to get at a function or data member of an object denoted by a reference, we use the . operator.

Author shakespeare2 = auRef; 
string name = auRef.getName();

In fact, once a reference has been initialized, it looks pretty much like an ordinary variable.

Because references are always initialized by giving them the address of a real object, they cannot be null.

You are probably most familiar with reference types being used in the parameter lists for functions, e.g.,

class Book {
      ⋮
   void addAuthor(Author& au);
      ⋮
};

We understand that the & means that we don’t want our code to make an actual copy of the Author object that we pass to this function (which might take considerable time) but simply to pass the address of that object – just a few bytes.

However, references can also be used to simplify and speed up code. Consider some code like this:

Point points[30000];
   ⋮
for (int i = 0; i < 10000; ++i)
{
    double z = points[2*i+1].y;
    points[2*i+1].y = points[2*i+1].x;
    points[2*i+1].x = z;
}

It takes time to do the array calculation points[2*i+1] and, given how many times we will go around that loop, we might not want to do the same pointless (sorry!) calculation three times in a row. And frankly, this is just easier to read:

Point points[30000];
   ⋮
for (int i = 0; i < 10000; ++i)
{
    Point& p = points[2*i+1];
    double z = p.y;
    p.y = p.x;
    p.x = z;
}

Without the &, this code would not actually change any of the values in the array. With the &, however, p holds the address of an element inside the array and lets us look at and alter that array element efficiently.
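The difference that one & makes can be checked with a sketch like this. The Point struct is assumed to have just the x and y fields used in the loops above, and the two function names are invented for the comparison.

```cpp
// Hypothetical Point type matching the fields used in the loops above.
struct Point {
  double x;
  double y;
};

// Swaps x and y through a reference: the array element itself changes.
void swapViaReference(Point points[], int i)
{
  Point& p = points[i];
  double z = p.y;
  p.y = p.x;
  p.x = z;
}

// Same code without the &: p is a copy, so the array is left untouched.
void swapViaCopy(Point points[], int i)
{
  Point p = points[i];
  double z = p.y;
  p.y = p.x;
  p.x = z;
}
```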

4.1.3 Converting Between Pointers and References

If we have a pointer p, then *p is actually a reference to the same object.

If we have a reference r, then &r is a pointer to the same object.
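
A short sketch of the round trip, using a plain int so that nothing else gets in the way (roundTrip is a made-up name):

```cpp
// Returns true if converting pointer -> reference -> pointer
// leads back to the same object.
bool roundTrip()
{
  int value = 23;
  int* p = &value;    // pointer to value
  int& r = *p;        // *p used to initialize a reference to value
  int* q = &r;        // &r is a pointer to the same object
  r = 42;             // writes through the reference...
  return q == p && *p == 42 && value == 42;   // ...and every view sees it
}
```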

4.1.4 const Pointers and const References

Both pointers and references can be declared as const. In both cases, it means the same thing: we can use that address to look at the value at that address but not to change that value in any way.
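
As a sketch of what that restriction looks like in practice, assume a simplified Book with one get and one set, and read "const pointer" as const Book*. The function inspect is hypothetical.

```cpp
#include <string>

// Hypothetical simplified Book with one get and one set.
class Book {
public:
  std::string getTitle() const {return title;}
  void setTitle(const std::string& theTitle) {title = theTitle;}
private:
  std::string title;
};

// Through const access we may look but not change.
std::string inspect(const Book* bp, const Book& br)
{
  // bp->setTitle("x");   // error if uncommented: *bp is const
  // br.setTitle("x");    // error if uncommented: br is const
  return bp->getTitle() + "|" + br.getTitle();   // const members are fine
}
```

This is also why the "gets" in our classes are marked const: only const member functions can be called through a const pointer or const reference.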

4.2 Pointers are Perilous

Pointers introduce a whole host of potential run-time errors if used improperly.

All of these various pointer errors are pernicious and hard to debug. Unlike most “simple” bugs in our code, pointer errors can have different effects on different executions of the program, even if we consistently give the program the same input each time. Worse, pointer errors can have effects that are only seen long after the mistaken pointer action took place and can affect data seemingly unrelated to the pointer itself. This makes it very hard to reason backwards from, say, seeing an incorrect value in the program output to the actual cause of the problem.

5 Pointers in the Publishing World

Let’s face it. Many programmers would gladly forgo the risk of working with pointers at all if the things weren’t so blasted useful.

Remember our issue with books inside of authors inside of books inside of…? A few pointers can save the day.

class Author
{
public:
  Author();

  Author (std::string theName, const Address& theAddress);


  std::string getName() const        {return name;}
  void setName (std::string theName) {name = theName;}

  const Address& getAddress() const   {return address;}
  void setAddress (const Address& addr) {address = addr;}

  int numberOfBooks() const;
  Book* getBook(int i);
  const Book* getBook(int i) const;

  void addBook (Book* b);
  void removeBook (Book* b);

  bool operator== (const Author& right) const;
  bool operator< (const Author& right) const;


private:
  std::string name;
  Address address;

  static const int BookMax = 10;
  int numBooks;
  Book* books[BookMax];
};

Do the same with the Authors inside of Books, and with the Publisher class, and we wind up being able to support our abstraction as originally envisioned.
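
To suggest how the pieces might fit together, here is a sketch with hypothetical cut-down classes: one author pointer per book, no Publisher, and no removal. It is not the full design, and it sidesteps the deletion question by using only objects that are never allocated with new.

```cpp
#include <cassert>
#include <string>

class Book;   // forward declaration

// Hypothetical cut-down Author holding pointers to its books.
class Author {
public:
  Author(const std::string& theName) : name(theName), numBooks(0) {}

  std::string getName() const {return name;}
  int numberOfBooks() const {return numBooks;}
  Book* getBook(int i) const {return books[i];}

  void addBook(Book* b)
  {
    assert(numBooks < BookMax);
    books[numBooks] = b;
    ++numBooks;
  }

private:
  std::string name;
  static const int BookMax = 10;
  int numBooks;
  Book* books[BookMax];
};

// Hypothetical cut-down Book with a single author pointer.
class Book {
public:
  Book(const std::string& theTitle, Author* theAuthor)
  : title(theTitle), author(theAuthor)
  {
    author->addBook(this);   // register this book with its author
  }

  std::string getTitle() const {return title;}
  Author* getAuthor() const {return author;}

private:
  std::string title;
  Author* author;
};
```

With this arrangement we can navigate in both directions, e.g., from a book to its author and on to that author's other books, which is exactly what the "cheating" version could not do without extra data structures.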

 

We’re going to leave that version for a while, though, because actually figuring out how and when to delete these objects is something of a nightmare. But we will get to it eventually.

And, in the meantime, we will continue to use pointers, in a more limited fashion, to give ourselves some flexibility in our data structures.