ADTs

Steven J Zeil

Last modified: Aug 6, 2019

 

If you were to look at a program that is large enough to require a full team of programmers to implement it, you would probably not be surprised to find that it is not organized as a single, large, monolithic unit, but instead as a large number of cooperating functions. You already know how to design and write functions in C++. What may come as a bit of a surprise, if you have not worked much with programs of that size, is that even these functions will be further organized into a higher-level structure.

I’ve tried to illustrate that structure in the diagram that you see here. At the top we have the main application program, for example, a spell checker. The code that occurs at this level is very specific to this application and is the main thing that differentiates a spell checker from, let’s say, a spreadsheet. On the other hand, at the very bottom of the hierarchy, we have all the basic primitive data types and operations, such as the int type, the char type, addition, subtraction, and so on, that are provided by our programming language (C++ in this example). These primitive operations may very well show up in almost any kind of program.

In between those, we have all the things that we can build on top of the language primitives as we work our way up towards our application program. Just above the language primitives we have the basic data structures, structures like linked lists or trees. We’re going to spend a lot of time this semester looking at these kinds of structures - you may already be familiar with some of them. They are certainly very important. And yet, if we stopped at that level, we would wind up “building to order” for every application. As we move from one application to another we would often find ourselves doing the same kinds of coding, over and over again.

What’s wrong with that? Most companies, and therefore most programmers, do not move from one application to a wildly different application on the next project. Programmers who have been working on “accounts receivable” are unlikely to start writing compilers the next week, and programmers who have been writing compilers are not going to be writing control software for jet aircraft the week after. Instead, programmers are likely to remain within a general application domain. The people who are currently working on our spell checker may very well be assigned to work on a grammar checker next month, or on some other text processing tool. That means that any special support we can design for dealing with text, words, sentences, or other concepts natural to this application domain may prove valuable in the long run, because we can share that work over the course of several projects.

And so, on top of the basic data structures, we expect to find layers of reusable libraries. Just above the basic data structures, the libraries are likely to provide fairly general purpose structures, such as support for look-up tables, user interfaces, and the like. As we move up in the hierarchy, the libraries become more specialized to the application domain in which we’re working. Just below the application level, we will find support for concepts that are very close to the spell checker, such as “words”, “misspellings”, and “corrections”.

The libraries that make up all but the topmost layer of this diagram may contain individual functions or groups of functions organized as Abstract Data Types. In this lesson, we’ll review the idea of Abstract Data Types and their implementations. Little, if any, of the material in this lesson should be entirely new to you - all of it is covered in CS 250.

1 Abstraction

In general, abstraction is a creative process of focusing attention on the main problems by ignoring lower-level details.

In programming, we encounter two particular kinds of abstraction:

1.1 Procedural Abstraction

A procedural abstraction is a mental model of what we want a subprogram to do (but not how to do it).

Example:

If you wanted to compute the length of the hypotenuse of a right triangle, you might write something like

double hypotenuse = sqrt(side1*side1 + side2*side2);

We can write this, understanding that the sqrt function is supposed to compute a square root, even if we have no idea how that square root actually gets computed.

When we start actually writing the code, we implement a procedural abstraction by

In practice, there may be many algorithms to achieve the same abstraction, and we use engineering considerations such as speed, memory requirements, and ease of implementation to choose among the possibilities.

For example, the “sqrt” function is probably implemented using a technique completely unrelated to any technique you may have learned in grade school for computing square roots. On many systems, sqrt doesn’t compute a square root at all, but computes a polynomial function that was chosen as a good approximation to the actual square root and that can be evaluated much more quickly than an actual square root. It may then refine the accuracy of that approximation by applying Newton’s method, a technique you may have learned in Calculus.

Does it bother you that sqrt does not actually compute via a square root algorithm? Probably not. It shouldn’t. As long as we trust the results, the method is something we are happy to ignore.
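To make the point concrete, here is a minimal sketch of one way a square root could be computed using Newton’s method alone. The names newtonSqrt and hypotenuse are my own, chosen for illustration, and this is not how any particular library implements sqrt; real implementations are typically faster and more careful about accuracy. The calling code looks the same no matter which algorithm sits behind the function.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for sqrt, using Newton's method only.
// pre: x >= 0
double newtonSqrt(double x)
{
    assert(x >= 0.0);
    if (x == 0.0)
        return 0.0;

    // Start at or above the true root; each step then moves downward.
    double guess = (x > 1.0) ? x : 1.0;
    while (true) {
        double next = 0.5 * (guess + x / guess);  // Newton step for g*g - x = 0
        if (next >= guess)
            break;                                // no further progress: stop
        guess = next;
    }
    return guess;
}

// Application code depends only on the abstraction "compute a square root".
double hypotenuse(double side1, double side2)
{
    return newtonSqrt(side1 * side1 + side2 * side2);
}
```

Calling hypotenuse(3.0, 4.0) should give a value within rounding error of 5, and swapping newtonSqrt for the library sqrt would not require any change to the code that calls hypotenuse.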

1.2 Data Abstraction

A data abstraction is a mental model of what can be done to a collection of data. It deliberately excludes details of how to do it.

1.2.1 Example: calendar days

A day (date?) in a calendar denotes a 24-hour period, identified by a specific year, month, and day number.

1.2.2 Example: cell names

One of the running examples I will use throughout this course is the design and implementation of a spreadsheet. I assume that you are familiar with some sort of spreadsheet program such as Microsoft’s Excel or the OpenOffice Calc program. All spreadsheets present a rectangular arrangement of cells, with each cell containing a mathematical expression or formula to be evaluated.

Every cell in a spreadsheet has a unique name. The name has a column part and a row part.

1.2.3 Example: a book

How to describe a book?

1.2.4 Example: positions within a container

Many of the abstractions that we work with are “containers” of arbitrary numbers of pieces of other data.

This is obvious with things like arrays and lists, but is also true of more prosaic items. For example, a book is, in effect, a container of an arbitrary number of authors (and in other variations, an arbitrary number of pages).

Any time you have an ordered sequence of data, you can imagine the need to look through it. That then leads to the concept of a position within that sequence, with notions like

2 Abstract Data Types

Adding Interfaces

2.1 Definition of an Abstract Data Type

Definition (traditional): An abstract data type (ADT) is a type name and a list of operations on that type.

It’s convenient, for the purpose of this course, to modify this definition just slightly:

Definition (alternate): An abstract data type (ADT) is a type name and a list of members (data or function) on that type.

An ADT corresponds, more or less, to the public portion of a typical class.

This change is not really all that significant. It’s mainly a matter of clarifying what we mean by “operations”. Traditionally, a data member X is modeled as a pair of getX() and putX(x) functions. But in practice, we will allow ADTs to include data members in their specification. This definition may make it a bit clearer that an ADT corresponds, more or less, to the public portion of a typical class.

In either case, when we talk about listing the members, this includes giving their names and their data types (for functions, their return types and the data types of their parameters).

If you search the web for the phrase “abstract data type”, you’ll find lots of references to stacks, queues, etc. - the “classic” data structures. Certainly, these are ADTs. But, just as with the abstractions introduced earlier, each application domain has certain characteristic or natural abstractions that may also need programming interfaces.

2.1.1 ADT Members: attributes and operations

The “members” of an ADT are commonly divided into

2.2 Examples

2.2.1 Calendar Days

Nothing in the definition of an ADT says that the interface has to be written out in a programming language.

 

UML diagrams present classes as a 3-part box: name, attributes, & operations


Calendar Days: alternative

But we can use a more programming-style interface:

 

class Day {
public:
   // Attributes
   int day;
   int month;
   int year;
   
   // Operations
   Day operator+ (int numDays);
   int operator- (Day);
   bool operator< (Day);
   bool operator== (Day);
     ⋮

or

class Day {
public:
   // Attributes
   int getDay();
   void setDay (int);
   int getMonth();
   void setMonth(int);
   int getYear();
   void setYear(int);
   
   // Operations
   Day operator+ (int numDays);
   int operator- (Day);
   bool operator< (Day);
   bool operator== (Day);
     ⋮

Either of these interfaces captures the sense of the ADT described in the diagram.

From a programming style point of view, we lean towards the second interface, hiding the data members and revealing the attributes via get… and set… operations.

2.2.2 Cell Names

Here is a possible interface for our cell name abstraction.

 

cellnameInterface.h

class CellName
{
public:
  CellName (std::string column, int row,
            bool fixTheColumn = false,
            bool fixTheRow=false);
  //pre: column.size() > 0 && all characters in column are alphabetic
  //     row > 0

  CellName (std::string cellname);
  //pre: exists j, 0<=j<cellname.size()-1, 
  //        cellname.substr(0,j) is all alphabetic (except for a
  //             possible cellname[0]=='$')
  //        && cellname.substr(j) is all numeric (except for a
  //             possible cellname[j]=='$') with at least one non-zero
  //             digit

  CellName (unsigned columnNumber = 0, unsigned rowNumber = 0,
            bool fixTheColumn = false,
            bool fixTheRow=false);

  std::string toString() const;
  // render the entire CellName as a string

  // Get components in spreadsheet notation
  std::string getColumn() const;
  int getRow() const;

  bool isRowFixed() const;
  bool isColumnFixed() const;


  // Get components as integer indices in range 0..
  int getColumnNumber() const;
  int getRowNumber() const;


  bool operator== (const CellName& r) const;
     ⋮
private:
     ⋮

Arguably, the diagram presents much the same information as the code.

2.2.3 Example: a book

If we were to try to capture our book abstraction (concentrating on the metadata), we might come up with something like:

bookAbstraction0.h

class Book {
public:
  Book (Author);                 // for books with single authors
  Book (Author[], int nAuthors); // for books with multiple authors

  std::string getTitle() const;
  void putTitle(std::string theTitle);

  int getNumberOfAuthors() const;

  std::string getISBN() const;
  void putISBN(std::string id);

  Publisher getPublisher() const;
  void putPublisher(const Publisher& publ);

  AuthorPosition begin();
  AuthorPosition end();

  void addAuthor (AuthorPosition at, const Author& author);
  void removeAuthor (AuthorPosition at);

private:
  ⋮
};

2.2.4 Example: positions within a container

Coming up with a good interface for our position abstraction is a problem that has challenged many an ADT designer.


C++ Iterators

The solution adopted by the C++ community is to have every ADT that is a “container” of sequences of other data provide a special type for positions within that sequence.



A Possible Position Interface

In theory, we could satisfy this requirement with an ADT like this:

authorPosition0.h

class AuthorPosition {
public:
   AuthorPosition();

   // get data at this position
   Author getData() const;

   // get the position just after this one
   AuthorPosition next() const;

   // Is this the same position as pos?
   bool operator== (const AuthorPosition& pos) const;
   bool operator!= (const AuthorPosition& pos) const;

};

which in turn would allow us to access authors like this:

void listAllAuthors(Book& b)
{
   for (AuthorPosition p = b.begin(); p != b.end(); 
        p = p.next())
     cout << "author: " << p.getData() << endl;
}


The Iterator ADT

For historical reasons (and brevity), however, C++ programmers use overloaded operators for the getData() and next() operations:

Given a container c and “positions” it and it0 somewhere within c:

access the data at that position: *it, it->
move it to the next position within c: ++it or it++
compare two position values it and it0: it == it0, it != it0
get the beginning and ending positions in a container: c.begin(), c.end()
copy a position: it0 = it

We call position ADTs that conform to this pattern iterators (because they let us iterate over a collection of data).

For example, we might define an iterator for authors in a book as:

authorPosition1.h

class AuthorPosition {
public:
   AuthorPosition();

   // get data at this position
   Author operator*() const;

   // get a data/function member at this position
   Author* operator->() const;

   // move forward to the position just after this one
   AuthorPosition operator++();

   // Is this the same position as pos?
   bool operator== (const AuthorPosition& pos) const;
   bool operator!= (const AuthorPosition& pos) const;

};

so that code to access authors would then look like this:

void listAllAuthors(Book& b)
{
   for (AuthorPosition p = b.begin(); p != b.end(); 
        ++p)
     cout << "author: " << *p << endl;
}


Range-Based For Loops

In later years, C++ embraced the idea that iterators would be a pervasive part of typical programming style. New short-hand versions of for loops were introduced specifically to work with classes that provide iterators via the conventional interfaces.

For example, in C++, instead of

void listAllAuthors(Book& b)
{
   for (AuthorPosition p = b.begin(); p != b.end(); ++p)
     cout << "author: " << *p << endl;
}

we would be more likely to write:

void listAllAuthors(Book& b)
{
   for (const Author& au: b)
     cout << "author: " << au << endl;
}

Iterators are an important part of the “idiom” or “style” of working in C++ (and Java). If this is your first time encountering them, you can read more here.
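If it helps to see all the pieces in one place, the following sketch shows a tiny container that supplies the iterator operations listed earlier, which is enough for a range-based for loop to work with it. Everything here is a stand-in: AuthorList, its nested Position class, the add function, and joinAuthors are hypothetical names, and Author is simplified to a std::string so that the example is self-contained.

```cpp
#include <cassert>
#include <string>

typedef std::string Author;   // simplified stand-in for a real Author class

class AuthorList {
public:
    // The "position" type: a thin wrapper around a pointer into the array.
    class Position {
    public:
        explicit Position(const Author* p = nullptr) : current(p) {}

        const Author& operator*() const  { return *current; }   // data at this position
        const Author* operator->() const { return current; }    // member access
        Position& operator++()           { ++current; return *this; }  // ++it
        Position operator++(int)         { Position old(*this); ++current; return old; }  // it++

        bool operator==(const Position& pos) const { return current == pos.current; }
        bool operator!=(const Position& pos) const { return current != pos.current; }
    private:
        const Author* current;
    };

    // pre: fewer than MAXAUTHORS authors have been added so far
    void add(const Author& a) { assert(count < MAXAUTHORS); authors[count++] = a; }

    Position begin() const { return Position(authors); }
    Position end() const   { return Position(authors + count); }

private:
    static const int MAXAUTHORS = 12;
    Author authors[MAXAUTHORS];
    int count = 0;
};

// The range-based for loop relies on begin(), end(), !=, ++ and *.
std::string joinAuthors(const AuthorList& list)
{
    std::string result;
    for (const Author& au : list) {
        if (!result.empty())
            result += ", ";
        result += au;
    }
    return result;
}
```

Nothing in joinAuthors mentions Position or the underlying array, so the storage inside AuthorList could change without touching that function.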

2.3 Design Patterns

Iterator as a Design Pattern

 

The idea of an iterator is an instance of what we call a design pattern:

The illustration of that pattern shown here highlights the fact that we have a Collection class and an Iterator class, and lists the operations that need to be provided by each. (Again, in UML a class is shown as a box divided into three parts: 1) the class name, 2) the attributes (empty in this case), and 3) the operations.)


Pattern, not ADT

In C++, our application code does not work with actual ADTs named “Collection” and “Iterator”.


Realizing a Design Pattern

 

Keep an eye out, as we move through the semester, for more instances of common design patterns.

3 ADTs as contracts

An ADT represents a contract between the ADT developer and the users (application programmers).

The Contract


Why the Contract

What do we gain by holding ourselves to this contract?

Look back at the sample ADTs from the previous sections. Note that, although none of them contain data structures or algorithms to actually provide the required functions, all of them provide enough information that you could start writing application code using them.

3.1 Information Hiding


Information Hiding

Every design can be viewed as a collection of “design decisions”.


Information Hiding: The Day ADT

An example of a design decision that we might want to hide:

But if we choose our second interface, making the data members, whatever they are, private, and accessing the attributes via get… and set… operations, then any code that uses our Day ADT will be unaffected by this design decision. And if we should later change our minds, that code will not need to be rewritten.

It’s important to note that, when we do information hiding, we are not hiding information from customers. We are not hiding the information from other programmers – if they have access to the source code, they can see what we have done. We are certainly not hiding information from ourselves, either!

The information about what design decision was made is being hidden from other parts of the program. Like firewalls in a large building, which are intended to make sure that, when something bad happens, it cannot easily spread, the point of information hiding is to limit the amount of code that would need to be rewritten in the event that we have to reconsider and change an earlier design decision.


Encapsulation

Although ADTs can be designed without language support, they rely on programmers’ self-discipline for enforcement of information hiding.

Encapsulation is the enforcement of information hiding by programming language constructs.

In C++, this is accomplished by allowing ADT implementors to put some declarations into a private: area. Any application code attempting to use those private names will fail to compile.
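A small sketch of what that enforcement looks like in practice. The Day class here is a cut-down, hypothetical version of the earlier interface that happens to store three integers, and monthAfter is an invented piece of application code; neither is taken from the course’s actual implementation.

```cpp
#include <cassert>

class Day {
public:
    // pre: 1 <= m <= 12
    Day(int y, int m, int d) : year(y), month(m), day(d)
    {
        assert(m >= 1 && m <= 12);
    }

    int getYear() const  { return year; }
    int getMonth() const { return month; }
    int getDay() const   { return day; }

private:
    // Design decision hidden from application code: three separate integers.
    int year;
    int month;
    int day;
};

// Application code: which month number follows the month of d?
int monthAfter(const Day& d)
{
    // return d.month % 12 + 1;   // would not compile: 'month' is private
    return d.getMonth() % 12 + 1; // fine: uses only the public interface
}
```

Because monthAfter can only reach the public members, it would keep working unchanged if the private data members were later replaced by some other representation.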

4 ADT Implementations


ADT Implementations

An ADT is implemented by supplying

We sometimes refer to the ADT itself as the ADT specification or the ADT interface, to distinguish it from the code of the ADT implementation.

In C++, implementation is generally done using a C++ class.

4.1 Examples


Calendar Day Implementations

As an ADT designer, I might consider two possible data structures:

Each approach has pros and cons. The first one makes for very quick I/O and retrieval of the day/month/year attributes. But getting the date of a day k days in the future or past is complicated and slow.

The second one makes computations of future and past dates trivial, but slows the process of doing I/O and of retrieving individual components of the date.
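As a rough sketch of that second kind of trade-off, suppose a Day is stored as a single count of days since a fixed date. The choice of 1 January 1970 as day zero, the helper names toDayNumber and fromDayNumber, and the use of the Gregorian calendar for all dates are assumptions of mine for illustration, not details from this lesson. The conversions use a well-known technique based on 400-year cycles.

```cpp
#include <cassert>

class Day {
public:
    // pre: 1 <= month <= 12, 1 <= day <= 31
    Day(int year, int month, int day)
        : dayNumber(toDayNumber(year, month, day)) {}

    // Attributes: every getter has to convert the count back into a date.
    int getYear() const  { int y, m, d; fromDayNumber(dayNumber, y, m, d); return y; }
    int getMonth() const { int y, m, d; fromDayNumber(dayNumber, y, m, d); return m; }
    int getDay() const   { int y, m, d; fromDayNumber(dayNumber, y, m, d); return d; }

    // Operations: date arithmetic is plain integer arithmetic.
    Day operator+(int numDays) const { Day r(*this); r.dayNumber += numDays; return r; }
    int operator-(const Day& d) const { return static_cast<int>(dayNumber - d.dayNumber); }
    bool operator<(const Day& d) const { return dayNumber < d.dayNumber; }
    bool operator==(const Day& d) const { return dayNumber == d.dayNumber; }

private:
    long dayNumber;   // days since 1970-01-01 (assumed starting point)

    static long toDayNumber(long y, long m, long d)
    {
        assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
        if (m <= 2) --y;                                  // count Jan/Feb with the prior year
        const long era = (y >= 0 ? y : y - 399) / 400;    // which 400-year cycle
        const long yoe = y - era * 400;                   // year within the cycle, 0..399
        const long doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // day of March-based year
        const long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // day within the cycle
        return era * 146097 + doe - 719468;               // shift so 1970-01-01 is day 0
    }

    static void fromDayNumber(long z, int& year, int& month, int& day)
    {
        z += 719468;
        const long era = (z >= 0 ? z : z - 146096) / 146097;
        const long doe = z - era * 146097;
        const long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
        const long doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
        const long mp = (5 * doy + 2) / 153;              // March-based month, 0..11
        day = static_cast<int>(doy - (153 * mp + 2) / 5 + 1);
        month = static_cast<int>(mp < 10 ? mp + 3 : mp - 9);
        year = static_cast<int>(yoe + era * 400 + (month <= 2 ? 1 : 0));
    }
};
```

Note how operator+ is a single addition, while each get… call has to run the full conversion. Storing the year, month, and day as three integers would reverse that balance.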


CellName implementation

cellnameImpl.cpp

class CellName
{
public:
  CellName (std::string column, int row,
            bool fixTheColumn = false,
            bool fixTheRow=false);
  //pre: column.size() > 0 && all characters in column are alphabetic
  //     row > 0

  CellName (std::string cellname);
  //pre: exists j, 0<=j<cellname.size()-1, 
  //        cellname.substr(0,j) is all alphabetic (except for a
  //             possible cellname[0]=='$')
  //        && cellname.substr(j) is all numeric (except for a
  //             possible cellname[j]=='$') with at least one non-zero
  //             digit

  CellName (unsigned columnNumber = 0, unsigned rowNumber = 0,
            bool fixTheColumn = false,
            bool fixTheRow=false);

  std::string toString() const;
  // render the entire CellName as a string

  // Get components in spreadsheet notation
  std::string getColumn() const;
  int getRow() const;

  bool isRowFixed() const;
  bool isColumnFixed() const;


  // Get components as integer indices in range 0..
  int getColumnNumber() const;
  int getRowNumber() const;


  bool operator== (const CellName& r) const
    {return (columnNumber == r.columnNumber &&
             rowNumber == r.rowNumber &&
             theColIsFixed == r.theColIsFixed &&
             theRowIsFixed == r.theRowIsFixed);}

private:
  ⋮
  int rowNumber;
  bool theRowIsFixed;
  bool theColIsFixed;

  int alphaToInt (std::string columnIndicator) const;
  std::string intToAlpha (int columnIndex) const;

};


inline
bool CellName::isRowFixed() const {return theRowIsFixed;}

inline
bool CellName::isColumnFixed() const {return theColIsFixed;}



#endif
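The bodies of the private helpers alphaToInt and intToAlpha are not shown in the listing above. As a hedged sketch of what such conversions might look like, the following free functions translate between column letters and integer indices. I am assuming that indices start at 0 (so “A” is 0, “Z” is 25, and “AA” is 26), in keeping with the “range 0..” comment in the interface; the actual CellName code may differ.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Column letters to a zero-based index: "A" -> 0, "Z" -> 25, "AA" -> 26.
// pre: columnIndicator.size() > 0 && all characters are alphabetic
int alphaToInt(const std::string& columnIndicator)
{
    assert(!columnIndicator.empty());
    int n = 0;
    for (char c : columnIndicator) {
        unsigned char uc = static_cast<unsigned char>(c);
        assert(std::isalpha(uc));
        // Each letter acts as a digit 1..26 (there is no zero digit).
        n = 26 * n + (std::toupper(uc) - 'A' + 1);
    }
    return n - 1;   // shift from 1-based to 0-based
}

// Zero-based index back to column letters: 0 -> "A", 26 -> "AA".
// pre: columnIndex >= 0
std::string intToAlpha(int columnIndex)
{
    assert(columnIndex >= 0);
    std::string result;
    int n = columnIndex + 1;          // work with the 1-based value
    while (n > 0) {
        int digit = (n - 1) % 26;     // 0..25, the rightmost remaining letter
        result.insert(result.begin(), static_cast<char>('A' + digit));
        n = (n - 1) / 26;
    }
    return result;
}
```

The two functions are inverses of one another for upper-case input, so alphaToInt(intToAlpha(k)) gives back k for any non-negative k.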

There are some options here that have not been explored:


Book implementation

We can implement Book in book.h:

book1.h
#ifndef BOOK_H
#define BOOK_H

#include <string>
#include "author.h"
#include "publisher.h"


class Book {
public:
  typedef const Author* AuthorPosition;

  Book (Author);                       // for books with single authors
  Book (const Author[], int nAuthors); // for books with multiple authors


  std::string getTitle() const;
  void setTitle(std::string theTitle);

  int getNumberOfAuthors() const;

  std::string getISBN() const;
  void setISBN(std::string id);

  Publisher getPublisher() const;
  void setPublisher(const Publisher& publ);

  AuthorPosition begin() const;
  AuthorPosition end() const;

  void addAuthor (AuthorPosition at, const Author& author);
  void removeAuthor (AuthorPosition at);

private:

  std::string title;
  int numAuthors;
  std::string isbn;
  Publisher publisher;

  static const int MAXAUTHORS = 12;
  Author authors[MAXAUTHORS];

};

#endif

and in book.cpp:

book1.cpp
#include "book1.h"

  // for books with single authors
Book::Book (Author a)
{
  numAuthors = 1;
  authors[0] = a;
}

// for books with multiple authors
Book::Book (const Author au[], int nAuthors)
{
  numAuthors = nAuthors;
  for (int i = 0; i < nAuthors; ++i)
    {
      authors[i] = au[i];
    }
}

std::string Book::getTitle() const
{
  return title;
}

void Book::setTitle(std::string theTitle)
{
  title = theTitle;
}

int Book::getNumberOfAuthors() const
{
  return numAuthors;
}

std::string Book::getISBN() const
{
  return isbn;
}

void Book::setISBN(std::string id)
{
  isbn = id;
}

Publisher Book::getPublisher() const
{
  return publisher;
}

void Book::setPublisher(const Publisher& publ)
{
  publisher = publ;
}

Book::AuthorPosition Book::begin() const
{
  return authors;
}

Book::AuthorPosition Book::end() const
{
  return authors+numAuthors;
}


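// pre: numAuthors < MAXAUTHORS and at is a position in begin()..end()
void Book::addAuthor (Book::AuthorPosition at, const Author& author)
{
  int i = numAuthors - 1;   // index of the last author currently stored
  int atk = at - authors;
  // shift authors[atk..numAuthors-1] up one slot to open a gap at atk
  while (i >= atk) 
    {
      authors[i+1] = authors[i];
      i--;
    }
  authors[atk] = author;
  ++numAuthors;
}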


void Book::removeAuthor (Book::AuthorPosition at)
{
  int atk = at - authors;
  while (atk + 1 < numAuthors)
    {
      authors[atk] = authors[atk + 1];
      ++atk;
    }
  --numAuthors;
}


We’ll explore some of the details and alternatives of these implementations in the next lesson.