ADTs

Steven J Zeil

Last modified: Aug 6, 2019

Contents:

1 Abstraction

1.1 Procedural Abstraction

1.2 Data Abstraction

2 Abstract Data Types

2.1 Definition of an Abstract Data Type

2.2 Examples

1 Abstraction

In general, abstraction is a creative process of focusing attention on the main problems by ignoring lower-level details.

In programming, we encounter two particular kinds of abstraction:

procedural abstraction and
data abstraction.

1.1 Procedural Abstraction

A procedural abstraction is a mental model of what we want a subprogram to do (but not how to do it).

Example:

double hypotenuse = sqrt(side1*side1 + side2*side2);

We can write this, understanding that the sqrt function is supposed to compute a square root, even if we have no idea how that square root actually gets computed.

That’s because we understand what a square root is.
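A small sketch may make the point concrete. The names `newtonSqrt`, `librarySqrt`, and `hypotenuse` are hypothetical, invented here for illustration. The caller is written against the idea "returns a square root" and works unchanged with either implementation.

```cpp
#include <cassert>
#include <cmath>

// One possible "how": Newton's method. A sketch for values of
// moderate size, not a library-quality routine.
double newtonSqrt(double x)
{
    assert(x >= 0.0);                 // pre: x is non-negative
    if (x == 0.0) return 0.0;
    double guess = (x > 1.0) ? x : 1.0;
    for (int i = 0; i < 60; ++i)
        guess = 0.5 * (guess + x / guess);   // refine the estimate
    return guess;
}

// Another "how": defer to the standard library.
double librarySqrt(double x)
{
    return std::sqrt(x);
}

// The caller relies only on the "what": sqrtFn computes a square root.
double hypotenuse(double side1, double side2, double (*sqrtFn)(double))
{
    return sqrtFn(side1 * side1 + side2 * side2);
}
```

Calling `hypotenuse(3.0, 4.0, newtonSqrt)` and `hypotenuse(3.0, 4.0, librarySqrt)` should both give a value very close to 5, even though the two square roots are computed in entirely different ways.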

1.2 Data Abstraction

A data abstraction is a mental model of what can be done to a collection of data. It deliberately excludes details of how to do it.

1.2.1 Example: calendar days

A day (date?) in a calendar denotes a 24-hour period, identified by a specific year, month, and day number.

That’s it. That’s probably all you need to know for you and me to agree that we are talking about a common idea.

1.2.2 Example: cell names

Every cell in a spreadsheet has a unique name. The name has a column part and a row part.

The row indicators are integer values starting at 1.
The column indicators are case-insensitive strings of alphabetic characters as follows: A, B, … , Z, AA, AB, AC, … , AZ, BA, BB, … ZZ, AAA, AAB, … and so on.
Optional $ markers may appear in front of each part to “fix” the row or column during copying.
- Examples: A1, BC23, B$3, $A$1, $ZZZZZ2
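The column sequence A, …, Z, AA, AB, … is easy to mistake for ordinary base 26, but it has no zero digit. The following is a minimal sketch of one way to turn a column indicator into a 0-based index. The function name `columnToIndex` is hypothetical, and the code assumes plain ASCII letters.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Convert a column indicator such as "A", "z", or "AA" to a 0-based
// index: A -> 0, Z -> 25, AA -> 26, and so on.
int columnToIndex(const std::string& column)
{
    assert(!column.empty());               // pre: at least one letter
    int n = 0;
    for (char c : column) {
        unsigned char uc = static_cast<unsigned char>(c);
        assert(std::isalpha(uc));          // pre: alphabetic characters only
        // Each letter acts as a "digit" in the range 1..26 (there is no zero).
        n = n * 26 + (std::toupper(uc) - 'A' + 1);
    }
    return n - 1;
}
```

Because the letters are folded to upper case first, `columnToIndex("az")` and `columnToIndex("AZ")` give the same result, matching the case-insensitivity described above.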

1.2.3 Example: a book

How to describe a book?

If we are implementing a card catalog and library checkout, it is probably enough to list the metadata, e.g., title, authors, publisher, date.
If, however, we are going to be working on a project involving the full text of the document (e.g., automatic metadata extraction and indexing), then we might need all the pages and all the text.
On the other hand, if we were building bookshelves, we might need more physical attributes such as size and weight!

1.2.4 Example: positions within a container

Many of the abstractions that we work with are “containers” of arbitrary numbers of pieces of other data.

Obvious cases: arrays and lists,
More subtle: a book is, in effect, a container of
- an arbitrary number of authors
- an arbitrary number of pages

Any time you have an ordered sequence of data, you can imagine the need to look through it. That then leads to the concept of a position within that sequence, with notions like

finding the first and last position,
going forward to the next position, etc.

2 Abstract Data Types

Adding Interfaces

The mental model offered by a data abstraction gives us an informal understanding of how and when to use it.
But because it is simply a mental model, it does not tell us enough information to program with it.
An abstract data type (ADT) captures this model in a programming language interface.

2.1 Definition of an Abstract Data Type

Definition (traditional): An abstract data type (ADT) is a type name and a list of operations on that type.

It’s convenient, for the purpose of this course, to modify this definition just slightly:

Definition (alternate): An abstract data type (ADT) is a type name and a list of members (data or function) on that type.

An ADT corresponds, more or less, to the public portion of a typical class.

The “list of members” includes
- names
- data types
- expected behavior
a.k.a. an ADT specification

2.1.1 ADT Members: attributes and operations

The “members” of an ADT are commonly divided into

attributes: the things that we think of as being data stored in the ADT
- Actual interface is often through getAttr() and setAttr() functions.
  - which, in turn, might or might not actually involve direct access to a “data member”
operations: the functions or behaviors of the ADT
- the “type” of a function consists of its return type and an ordered list of its parameters’ types
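To illustrate how get/set functions need not map directly onto data members, here is a sketch of a hypothetical `TimeOfDay` ADT (not part of these notes). Its attributes are "hour" and "minute", yet the only data member is a single count of minutes since midnight.

```cpp
#include <cassert>

// Hypothetical ADT: attributes "hour" and "minute", stored as one integer.
class TimeOfDay {
public:
    TimeOfDay(int hour, int minute)
        : minutesSinceMidnight(hour * 60 + minute)
    {
        assert(0 <= hour && hour < 24);      // pre: valid hour
        assert(0 <= minute && minute < 60);  // pre: valid minute
    }

    // Attributes
    int getHour() const   { return minutesSinceMidnight / 60; }
    int getMinute() const { return minutesSinceMidnight % 60; }

    void setHour(int hour)
    {
        assert(0 <= hour && hour < 24);
        minutesSinceMidnight = hour * 60 + getMinute();
    }

    void setMinute(int minute)
    {
        assert(0 <= minute && minute < 60);
        minutesSinceMidnight = getHour() * 60 + minute;
    }

private:
    int minutesSinceMidnight;   // the only actual data member
};
```

Code that uses `getHour()` and `setMinute()` cannot tell whether hours and minutes are stored separately or computed on demand, which is the freedom the ADT is meant to preserve.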

2.2 Examples

2.2.1 Calendar Days

Nothing in the definition of an ADT says that the interface has to be written out in a programming language.

UML diagrams present classes as a 3-part box: name, attributes, & operations

Calendar Days: alternative

But we can use a more programming-style interface:

class Day {
public:
   // Attributes
   int day;
   int month;
   int year;
   
   // Operations
   Day operator+ (int numDays);
   int operator- (Day);
   bool operator< (Day);
   bool operator== (Day);
     ⋮

or

class Day {
public:
   // Attributes
   int getDay();
   void setDay (int);
   int getMonth();
   void setMonth(int);
   int getYear();
   void setYear(int);
   
   // Operations
   Day operator+ (int numDays);
   int operator- (Day);
   bool operator< (Day);
   bool operator== (Day);
     ⋮

Either of these interfaces captures the sense of the ADT described in the diagram.

Possible disadvantages of moving early to programming-style interfaces:
- getting lost in language details
- prematurely committing to those details

2.2.2 Cell Names

Here is a possible interface for our cell name abstraction.

cellnameInterface.h


class CellName
{
public:
  CellName (std::string column, int row,
            bool fixTheColumn = false,
            bool fixTheRow=false);
  //pre: column.size() > 0 && all characters in column are alphabetic
  //     row > 0

  CellName (std::string cellname);
  //pre: exists j, 0<=j<cellname.size()-1, 
  //        cellname.substr(0,j) is all alphabetic (except for a
  //             possible cellname[0]=='$')
  //        && cellname.substr(j) is all numeric (except for a
  //             possible cellname[j]=='$') with at least one non-zero
  //             digit

  CellName (unsigned columnNumber = 0, unsigned rowNumber = 0,
            bool fixTheColumn = false,
            bool fixTheRow=false);

  std::string toString() const;
  // render the entire CellName as a string

  // Get components in spreadsheet notation
  std::string getColumn() const;
  int getRow() const;

  bool isRowFixed() const;
  bool isColumnFixed() const;


  // Get components as integer indices in range 0..
  int getColumnNumber() const;
  int getRowNumber() const;


  bool operator== (const CellName& r) const;
     ⋮
private:
     ⋮

Arguably, the diagram presents much the same information as the code.
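The precondition on the string constructor describes a small grammar: an optional $, some letters, another optional $, then digits. The sketch below shows one way such a string might be taken apart. The struct `CellNameParts` and the function `parseCellName` are hypothetical helpers, not part of the interface above, and the code assumes ASCII input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type: the four pieces named in the abstraction.
struct CellNameParts {
    std::string column;   // upper-cased column indicator
    int row;              // row indicator, starting at 1
    bool columnFixed;     // was the column preceded by '$'?
    bool rowFixed;        // was the row preceded by '$'?
};

CellNameParts parseCellName(const std::string& cellname)
{
    CellNameParts parts{"", 0, false, false};
    std::size_t i = 0;

    if (i < cellname.size() && cellname[i] == '$') {   // optional column marker
        parts.columnFixed = true;
        ++i;
    }
    while (i < cellname.size()
           && std::isalpha(static_cast<unsigned char>(cellname[i]))) {
        parts.column += static_cast<char>(
            std::toupper(static_cast<unsigned char>(cellname[i])));
        ++i;
    }
    if (i < cellname.size() && cellname[i] == '$') {   // optional row marker
        parts.rowFixed = true;
        ++i;
    }
    while (i < cellname.size()
           && std::isdigit(static_cast<unsigned char>(cellname[i]))) {
        parts.row = parts.row * 10 + (cellname[i] - '0');
        ++i;
    }

    // pre: a non-empty column, a positive row, and nothing left over
    assert(!parts.column.empty() && parts.row > 0 && i == cellname.size());
    return parts;
}
```

For instance, `parseCellName("B$3")` should report column "B", row 3, with only the row fixed.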

2.2.3 Example: a book

If we were to try to capture our book abstraction (concentrating on the metadata), we might come up with something like:

bookAbstraction0.h


class Book {
public:
  Book (Author);                 // for books with single authors
  Book (Author[], int nAuthors); // for books with multiple authors

  std::string getTitle() const;
  void putTitle(std::string theTitle);

  int getNumberOfAuthors() const;

  std::string getISBN() const;
  void putISBN(std::string id);

  Publisher getPublisher() const;
  void putPublisher(const Publisher& publ);

  AuthorPosition begin();
  AuthorPosition end();

  void addAuthor (AuthorPosition at, const Author& author);
  void removeAuthor (AuthorPosition at);

private:
  ⋮
};

What are Author and Publisher in this interface?
- They are simply other ADTs in this library world, and will need to have designed interfaces of their own.

2.2.4 Example: positions within a container

Coming up with a good interface for our position abstraction is a problem that has challenged many an ADT designer.

A look at our Book interface may suggest why.

bookNumericPositions.h

class Book {
public:
  Book (Author);                 // for books with single authors
  Book (Author[], int nAuthors); // for books with multiple authors

  std::string getTitle() const;
  void putTitle(std::string theTitle);

  int getNumberOfAuthors() const;

  std::string getISBN() const;
  void putISBN(std::string id);

  Publisher getPublisher() const;
  void putPublisher(const Publisher& publ);

  typedef int AuthorPosition;
  Author getAuthor (AuthorPosition authorNum) const;

  void addAuthor (AuthorPosition at, const Author& author);
  void removeAuthor (AuthorPosition at);

private:
  ⋮
};

One intuitive idea might be to simply number the authors and treat the number as a position indicator, as shown here.
- biases implementors towards arrays/vectors
- may lock us into inefficient solutions
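A rough sketch of why numeric positions can be inefficient: suppose the authors happened to be stored in a linked list. The functions `getAuthorByNumber` and `stepsToVisitAll` are hypothetical, and `std::string` stands in for the Author type. Fetching author number k means walking k links from the front, so visiting every author by number repeats most of that walk each time.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical stand-in for getAuthor(authorNum) over linked storage.
// `steps` accumulates how many links are followed.
std::string getAuthorByNumber(const std::list<std::string>& authors,
                              std::size_t authorNum, long& steps)
{
    assert(authorNum < authors.size());   // pre: a valid position number
    std::list<std::string>::const_iterator it = authors.begin();
    for (std::size_t i = 0; i < authorNum; ++i) {
        ++it;       // follow one link
        ++steps;
    }
    return *it;
}

// Visit every author by number; return the total number of links followed.
long stepsToVisitAll(const std::list<std::string>& authors)
{
    long steps = 0;
    for (std::size_t i = 0; i < authors.size(); ++i)
        getAuthorByNumber(authors, i, steps);
    return steps;
}
```

For n authors this loop follows n(n-1)/2 links in total, whereas a single front-to-back walk would follow about n.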

C++ Iterators

The solution adopted by the C++ community is to have every ADT that is a “container” of sequences of other data provide a special type for positions within that sequence.

The container itself provides functions to return
- the beginning position in the sequence and
- the position just after the last data item in the sequence
(these are typically called begin() and end()).

The position ADT must provide, at a minimum:
- A function to fetch the data item at the given position.
- A function to advance from the current position to the next position in the sequence.
- A function to compare two positions to see if they are the same.

A Possible Position Interface

In theory, we could satisfy this requirement with an ADT like this:

authorPosition0.h


class AuthorPosition {
public:
   AuthorPosition();

   // get data at this position
   Author getData() const;

   // get the position just after this one
   AuthorPosition next() const;

   // Is this the same position as pos?
   bool operator== (const AuthorPosition& pos) const;
   bool operator!= (const AuthorPosition& pos) const;

};

which in turn would allow us to access authors like this:

void listAllAuthors(Book& b)
{
   for (AuthorPosition p = b.begin(); p != b.end(); 
        p = p.next())
     cout << "author: " << p.getData() << endl;
}
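As a sketch of how such an interface might be implemented without numbering the authors, the version below assumes the authors are kept in a singly linked chain. The `AuthorNode` struct, the use of `std::string` as a stand-in for Author, the constructor parameter, and `countAuthors` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

typedef std::string Author;   // assumption: stand-in for the real Author ADT

// Hypothetical storage: a singly linked chain of authors.
struct AuthorNode {
    Author data;
    AuthorNode* next;
};

class AuthorPosition {
public:
    explicit AuthorPosition(const AuthorNode* node = nullptr) : current(node) {}

    // get data at this position
    Author getData() const
    {
        assert(current != nullptr);   // pre: not the end position
        return current->data;
    }

    // get the position just after this one
    AuthorPosition next() const
    {
        assert(current != nullptr);   // pre: not the end position
        return AuthorPosition(current->next);
    }

    // Is this the same position as pos?
    bool operator== (const AuthorPosition& pos) const { return current == pos.current; }
    bool operator!= (const AuthorPosition& pos) const { return current != pos.current; }

private:
    const AuthorNode* current;   // nullptr represents the end position
};

// Count the positions from first up to, but not including, last.
int countAuthors(AuthorPosition first, AuthorPosition last)
{
    int n = 0;
    for (AuthorPosition p = first; p != last; p = p.next())
        ++n;
    return n;
}
```

Nothing in the loop depends on the linked representation. An array-based `AuthorPosition` could offer the same three operations, and `countAuthors` would not need to change.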

The Iterator ADT

For historical reasons (and brevity), however, C++ programmers use overloaded operators for the getData() and next() operations:

Given a container c and “positions” it and it0 somewhere within c:

access the data at that position	`*it`, `it->`
move `it` to the next position within `c`	`++it` or `it++`
compare two position values `it` and `it0`	`it == it0`, `it != it0`
get the beginning and ending positions in a container	`c.begin()`, `c.end()`
copy a position	`it0 = it`

We call position ADTs that conform to this pattern iterators (because they let us iterate over a collection of data).
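The standard library containers supply iterators of this kind. The short sketch below exercises each of the operations listed above on a `std::list` of strings. The function names `joinAll` and `longestItem` are hypothetical.

```cpp
#include <cassert>
#include <list>
#include <string>

// Join the items of c with ", " between them.
std::string joinAll(const std::list<std::string>& c)
{
    std::string result;
    std::list<std::string>::const_iterator it0 = c.begin();  // beginning position
    for (std::list<std::string>::const_iterator it = it0;    // copy a position
         it != c.end();                                       // compare positions
         ++it) {                                              // move to the next
        if (it != it0)
            result += ", ";
        result += *it;                                        // access the data
    }
    return result;
}

// Return the longest item; it->size() reaches a member of the data item.
std::string longestItem(const std::list<std::string>& c)
{
    assert(!c.empty());                  // pre: at least one item
    auto longest = c.begin();
    for (auto it = c.begin(); it != c.end(); ++it)
        if (it->size() > longest->size())
            longest = it;                // copy a position
    return *longest;
}
```

Replacing `std::list` with `std::vector` in both functions should require no other changes, since both containers provide iterators with the same interface.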

For example, we might define an iterator for authors in a book as:

authorPosition1.h


class AuthorPosition {
public:
   AuthorPosition();

   // get data at this position
   Author operator*() const;

   // get a data/function member at this position
   Author* operator->() const;

   // move forward to the position just after this one
   AuthorPosition operator++();

   // Is this the same position as pos?
   bool operator== (const AuthorPosition& pos) const;
   bool operator!= (const AuthorPosition& pos) const;

};

so that code to access authors would then look like this:

void listAllAuthors(Book& b)
{
   for (AuthorPosition p = b.begin(); p != b.end(); 
        ++p)
     cout << "author: " << *p << endl;
}

Range-Based For Loops

In later years, C++ embraced the idea that iterators would be a pervasive part of typical programming style. New short-hand versions of for loops were introduced specifically to work with classes that provide iterators via the conventional interfaces.
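As a sketch of what this looks like in practice, the hypothetical `AuthorList` class below provides conventional `begin()` and `end()` members by borrowing the iterators of an underlying `std::vector`. The same traversal is then written twice, once with explicit iterators and once as a range-based for loop (available since C++11).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container with the conventional iterator interface.
class AuthorList {
public:
    typedef std::vector<std::string>::const_iterator const_iterator;

    void add(const std::string& name)
    {
        assert(!name.empty());       // pre: a non-empty name
        names.push_back(name);
    }

    const_iterator begin() const { return names.begin(); }
    const_iterator end() const   { return names.end(); }

private:
    std::vector<std::string> names;
};

// Explicit iterator loop.
std::size_t totalLengthExplicit(const AuthorList& authors)
{
    std::size_t total = 0;
    for (AuthorList::const_iterator it = authors.begin();
         it != authors.end(); ++it)
        total += it->size();
    return total;
}

// The same traversal as a range-based for loop.
std::size_t totalLengthRange(const AuthorList& authors)
{
    std::size_t total = 0;
    for (const std::string& name : authors)
        total += name.size();
    return total;
}
```

The range-based form is shorthand for the explicit one: the compiler calls `begin()` and `end()` and applies `!=`, `++`, and `*` to the resulting iterators.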