ADTs

Steven J Zeil

Contents:
1. Abstraction
1.1 Procedural Abstraction
1.2 Data Abstraction
2. Abstract Data Types
2.1 Examples
2.2 Design Patterns
3. ADTs as contracts
3.1 Information Hiding
4. ADT Implementations
4.1 Examples
If we were to look at a program that is actually large enough to require a full team of programmers to implement it, you would probably not be surprised to find that it would not be organized as a single, large, monolithic unit, but instead as a large number of cooperating functions. You already know how to design and write functions in C++ . What may, however, come as a bit of a surprise if you have not worked much with programs that size is that even these functions will be further organized into higher level structure.

I’ve tried to illustrate that structure in the diagram that you see here. At the top we have the main application program, for example, a spell checker. The code that occurs at this level is very specific to this application and is the main thing that differentiates a spell checker from, let’s say, a spreadsheet. On the other hand, at the very bottom of the hierarchy, we have all the basic primitive data types and operations, such as the int type, the char type, addition, subtraction, and so on, that are provided by our programming language, (C++ in this example). These primitive operations may very well show up in almost any kind of program.

In between those, we have all the things that we can build on top of the language primitives on our way working up towards our application program. Just above the language primitives we have the basic data structures, structures like linked lists or trees. We’re going to spend a lot of time the semester looking at these kinds of structures - you may already be familiar with some of them. They are certainly very important. And yet, if we stopped at that level, we would wind up “building to order” for every application. As we move from one application to another we would often find ourselves doing the same kinds of coding, over and over again.

What’s wrong with that? Most companies, and therefore most programmers, do not move from one application to a wildly different application on the next project. Programmers who been working on “accounts receivable” are unlikely to start writing compilers the next week, and programmers who have been writing compilers are not going to be writing control software for jet aircraft the week after. Instead, programmers are likely to remain within a general application domain. The people who are currently working on our spell checker may very well be assigned to work on a grammar checker next month, or on some other text processing tool. That means that any special support we can design for dealing with text, words, sentences, or other concepts natural to this application to make may prove valuable in the long run because we can share that work over the course of several projects.

And so, on top of the basic data structures, we expect to find layers of reusable libraries. Just above the basic data structures, the libraries are likely to provide fairly general purpose structures, such as support for look-up tables, user interfaces, and the like. As we move up in the hierarchy, the libraries become more specialized to the application domain in which we’re working. Just below the application level, we will find support for concepts that are very close to the spell checker, such as “words”, “misspellings”, and “corrections”.

The libraries that make up all but the topmost layer of this diagram may contain individual functions or groups of functions organized as Abstract Data Types. In this lesson, we’ll review the idea of Abstract Data Types and their implementations. Little, if any, of the material in this lesson should be entirely new to you - all of it is covered in CS 250.

1. Abstraction

In general, abstraction is a creative process of focusing attention on the main problems by ignoring lower-level details.

In programming, we encounter two particular kinds of abstraction:

1.1 Procedural Abstraction

A procedural abstraction is a mental model of what we want a subprogram to do (but not how to do it).

Example:

if you wanted to compute the length of the a hypotenuse of a right triangle, you might write something like

double hypotenuse = sqrt(side1*side1 + side2*side2);

We can write this, understanding that the sqrt function is supposed to compute a square root, even if we have no idea how that square root actually gets computed.

When we start actually writing the code, we implement a procedural abstraction by

In practice, there may be many algorithms to achieve the same abstraction, and we use engineering considerations such as speed, memory requirements, and ease of implementation to choose among the possibilities.

For example, the “sqrt” function is probably implemented using a technique completely unrelated to any technique you may have learned in grade school for computing square roots. On many systems. sqrt doesn’t compute a square root at all, but computes a polynomial function that was chosen as a good approximation to the actual square root and that can be evaluated much more quickly than an actual square root. It may then refine the accuracy of that approximation by applying Newton’s method, a technique you may have learned in Calculus.

Does it bother you that sqrt does not actually compute via a square root algorithm? Probably not. It shouldn’t. As long as we trust the results, the method is something we are happy to ignore.

1.2 Data Abstraction

Data abstraction works much the same way.

A data abstraction is a mental model of what can be done to a collection of data. It deliberately excludes details of how to do it.


Example: calendar days

A day (date?) in a calendar denotes a 24-hour period, identified by a specific year, month, and day number.

That’s it. That’s probably all you need to know for you and I to agree that we are talking about a common idea.


Example: cell names

One of the running examples I will use throughout this course is the design and implementation of a spreadsheet. I assume that you are familiar with some sort of spreadsheet program such as Microsoft’s Excel or the OpenOffice Calc program. All spreadsheets present a rectangular arrangement of cells, with each cell containing a mathematical expression or formula to be evaluated.

Every cell in a spreadsheet has a unique name. The name has a column part and a row part.

For example, suppose we have a cell B1 containing the formula 2*A1 and that we copy and paste that cell to position C3. When we look in C3, we would find the pasted formula was 2*B3 adjusted for the number of rows and columns over which we moved the copied value.

But if the cell B1 originally contained the formula 2*A$1, the copied formula would be 2*B$1. The $ indicates that we are fixing the column indicator during copies. Similarly, if the cell B1 originally contained the formula 2*$A$1, the copied formula would be 2*$A$1. (If this isn’t clear, fire up a spreadsheet and try it. We can’t expect to share mental models (abstractions) if we don’t share an experience with the subject domain.)


Example: a book

How to describe a book?


Example: positions within a container

Many of the abstractions that we work with are “containers” of arbitrary numbers of pieces of other data.

This is obvious with things like arrays and lists, but is also true of more prosaic items. For example, a book is, in effect, a container of an arbitrary number of authors (and in other variations, an arbitrary number of pages).

Any time you have an ordered sequence of data, you can imagine the need to look through it. That then leads to the concept of a position within that sequence, with notions like

2. Abstract Data Types


Adding Interfaces


Definition of an Abstract Data Type

(traditional): An abstract data type (ADT) is a type name and a list of operations on that type.

It’s convenient, for the purpose of this course, to modify this definition just slightly:

Definition (alternate): An abstract data type (ADT) is a type name and a list of members (data or function) on that type.

This change is not really all that significant. Traditionally, a data member X is modeled as a pair of getX() and putX(x) functions. But in practice, we will allow ADTs to include data members in their specification. This definition may make it a bit clearer that an ADT corresponds, more or less, to the public portion of a typical class.

In either case, when we talk about listing the members, this includes giving their names and their data types (for functions, their return types and the data types of their parameters).

If you search the web for the phrase “abstract data type”, you’ll find lots of references to stacks, queues, etc. - the “classic” data structures. Certainly, these are ADTs. But, just as with the abstractions introduced earlier, each application domain has certain characteristic or natural abstractions that may also need programming interfaces.


ADT Members: attributes and operations

Commonly divided into

2.1 Examples


Calendar Days

Nothing in the definition of ADT that says that the interface has to be written out in a programming language.

UML diagrams present classes as a 3-part box: name, attributes, & operations


Calendar Days: alternative

But we can use a more programming-style interface:

class Day {
public:
   // Attributes
   int getDay();
   void setDay (int);
   int getMonth();
   void setMonth(int);
   int getYear();
   void setYear(int);
   
   // Operations
   Day operator+ (int numDays);
   int operator- (Day);
   bool operator< (Day);
   bool operator== (Day);
     ⋮

See also the interface developed in sections 3.1 and 3.2 of your text (Horstmann).


Notations

class Day {
public:
   // Attributes
   int getDay();
   void setDay (int);
   int getMonth();
   void setMonth(int);
   int getYear();
   void setYear(int);
   
   // Operations
   Day operator+ (int numDays);
   int operator- (Day);
   bool operator< (Day);
   bool operator== (Day);
     ⋮


Cell Names

Here is a possible interface for our cell name abstraction.

cellnameInterface.h

class CellName
{
public:
  CellName (std::string column, int row,
            bool fixTheColumn = false,
            bool fixTheRow=false);
  //pre: column.size() > 0 && all characters in column are alphabetic
  //     row > 0

  CellName (std::string cellname);
  //pre: exists j, 0<=j<cellname.size()-1, 
  //        cellname.substr(0,j) is all alphabetic (except for a
  //             possible cellname[0]=='$')
  //        && cellname.substr(j) is all numeric (except for a
  //             possible cellname[j]=='$') with at least one non-zero
  //             digit

  CellName (unsigned columnNumber = 0, unsigned rowNumber = 0,
            bool fixTheColumn = false,
            bool fixTheRow=false);

  std::string toString() const;
  // render the entire CellName as a string

  // Get components in spreadsheet notation
  std::string getColumn() const;
  int getRow() const;

  bool isRowFixed() const;
  bool isColumnFixed() const;


  // Get components as integer indices in range 0..
  int getColumnNumber() const;
  int getRowNumber() const;


  bool operator== (const CellName& r) const
     ⋮
private:
     ⋮

Arguably, the diagram presents much the same information as the code


Example: a book

If we were to try to capture our book abstraction (concentrating on the metadata), we might come up with something like:

bookAbstraction0.h

class Book {
public:
  Book (Author)                 // for books with single authors
  Book (Author[], int nAuthors) // for books with multiple authors

  std::string getTitle() const;
  void putTitle(std::string theTitle);

  int getNumberOfAuthors() const;

  std::string getIsBN() const;
  void putISBN(std::string id);

  Publisher getPublisher() const;
  void putPublisher(const Publisher& publ);

  AuthorPosition begin();
  AuthorPosition end();

  void addAuthor (AuthorPosition at, const Author& author);
  void removeAuthor (AuthorPosition at);

private:
  ⋮
};

Example: positions within a container

Coming up with a good interface for our position abstraction is a problem that has challenged many an ADT designer.

bookNumericPositions.h
class Book {
public:
  Book (Author)                 // for books with single authors
  Book (Author[], int nAuthors) // for books with multiple authors

  std::string getTitle() const;
  void putTitle(std::string theTitle);

  int getNumberOfAuthors() const;

  std::string getIsBN() const;
  void putISBN(std::string id);

  Publisher getPublisher() const;
  void putPublisher(const Publisher& publ);

  typedef int AuthorPosition;
  <+1>Author getAuthor (AuthorPosition authorNum) const;  <-1>

  void addAuthor (AuthorPosition at, const Author& author);
  void removeAuthor (AuthorPosition at);

private:
  ⋮
};

A problem with this is that the getAuthor function could then be done efficiently only if the authors inside each book were stored in an array or array-like data structure. And then addAuthor and removeAuthor cannot be implemented efficiently. Arrays also pose a difficulty – how large an array should be allocated for this purpose? If we allocate too few, the program crashes. So programmers usually wind up allocating an array large enough to contain as many items as possible – in this case as many authors as have ever collaborated on a single book. This means a lot of wasted storage for most books.

Both C++ and Java provide “expandable” array types, std::vector and java.util.ArrayList, respectively, that grow to accommodate however much data you actually insert into them. These resolve the storage issue, at a slight cost in speed, but still do not permit efficient implementation of addAuthor and removeAuthor.


Iterators

The solution adapted by the C++ community is to have every ADT that is a “container” of sequences of other data to provide a special type for positions within that sequence.



A Possible Position Interface

In theory, we could satisfy this requirement with an ADT like this:

authorPosition0.h

class AuthorPosition {
public:
   AuthorPosition();

   // get data at this position
   Author getData() const;

   // get the position just after this one
   AuthorPosition next() const;

   // Is this the same position as pos?
   bool operator== (const AuthorPosition& pos) const;
   bool operator!= (const AuthorPosition& pos) const;

};

which in turn would allow us to access authors like this:

void listAllAuthors(Book& b)
{
   for (AuthorPosition p = b.begin(); p != b.end(); 
        p = p.next())
     cout << "author: " << p.getData() << endl;
}


The Iterator ADT

For historical reasons (and brevity), however, C++ programmers use overloaded operators for the getData() and next() operations:

authorPosition1.h

class AuthorPosition {
public:
   AuthorPosition();

   // get data at this position
   Author operator*() const;

   // get a data/function member at this position
   Author* operator->() const;

   // move forward to the position just after this one
   AuthorPosition operator++();

   // Is this the same position as pos?
   bool operator== (const AuthorPosition& pos) const;
   bool operator!= (const AuthorPosition& pos) const;

};

so that code to access authors would look like this:

void listAllAuthors(Book& b)
{
   for (AuthorPosition p = b.begin(); p != b.end(); 
        ++p)
     cout << "author: " << *p << endl;
}

This ADT for positions is called an iterator (because it lets us iterate over a collection of data).

Java has similar ADTs for positions within a container, called Enumeration and Iterator, which we will see in a later lesson.

2.2 Design Patterns


Iterator as a Design Pattern

The idea of an iterator is an instance of what we call a design pattern:

Pattern, not ADT

In C++, our application code does not actually work with an actual ADT named “Iterator”.

Again, there’s probably no class in our application code actually named “Collection”.


Realizing a Design Pattern

You may have noticed that your textbook is titled “Objected-Oriented Design & Patterns”. Keep an eye out, as we move through the semester, for more instances of common design patterns. (You might want to compare this diagram to the more Java-oriented version of this pattern on page 178 - in Java there really is something in the library named Iterator and our application works directly with that and only indirectly with the concrete realization.)

3. ADTs as contracts


ADTs as contracts

An ADT represents a contract between the ADT developer and the users (application programmers).

The Contract


Why the Contract

What do we gain by holding ourselves to this contract?

Look back at the sample ADTs from the previous sections. Note that, although none of them contain data structures or algorithms to actually provide the required functions, all of them provide enough information that you could start writing application code using them.

3.1 Information Hiding


Information Hiding

Every design can be viewed as a collection of “design decisions”.


Encapsulation

Although ADTs can be designed without language support, they rely on programmers’ self-discipline for enforcement of information hiding.

Encapsulation is the enforcement of information hiding by programming language constructs.

In C++, this is accomplished by allowing ADT implementors to put some declarations into a private: area. Any application code attempting to use those private names will fail to compile.

4. ADT Implementations


ADT Implementations

An ADT is implemented by supplying

We sometimes refer to the ADT itself as the ADT specification or the ADT interface, to distinguish it from the code of the ADT implementation.

In C++, implementation is generally done using a C++ class.

4.1 Examples


Calendar Day Implementations

Read section 3.3 of your text (Horstmann) for a discussion of three different implementations of the Day ADT. notice how we can choose among implementations for performance reasons, without breaking any application code that relies on the Day interface.


CellName implementation

cellnameImpl.cpp

class CellName
{
public:
  CellName (std::string column, int row,
            bool fixTheColumn = false,
            bool fixTheRow=false);
  //pre: column.size() > 0 && all characters in column are alphabetic
  //     row > 0

  CellName (std::string cellname);
  //pre: exists j, 0<=j<cellname.size()-1, 
  //        cellname.substr(0,j) is all alphabetic (except for a
  //             possible cellname[0]=='$')
  //        && cellname.substr(j) is all numeric (except for a
  //             possible cellname[j]=='$') with at least one non-zero
  //             digit

  CellName (unsigned columnNumber = 0, unsigned rowNumber = 0,
            bool fixTheColumn = false,
            bool fixTheRow=false);

  std::string toString() const;
  // render the entire CellName as a string

  // Get components in spreadsheet notation
  std::string getColumn() const;
  int getRow() const;

  bool isRowFixed() const;
  bool isColumnFixed() const;


  // Get components as integer indices in range 0..
  int getColumnNumber() const;
  int getRowNumber() const;


  bool operator== (const CellName& r) const
    {return (columnNumber == r.columnNumber &&
             rowNumber == r.rowNumber &&
             theColIsFixed == r.theColIsFixed &&
             theRowIsFixed == r.theRowIsFixed);}

private:
  ⋮
  int rowNumber;
  bool theRowIsFixed;
  bool theColIsFixed;

  int CellName::alphaToInt (std::string columnIndicator) const;
  std::string CellName::intToAlpha (int columnIndex) const;

};


inline
bool CellName::isRowFixed() const {return theRowIsFixed;}

inline
bool CellName::isColumnFixed() const {return theColIsFixed;}



#endif

There are some options here the have not been explored:


Book implementation

We can implement Book in book.h:

book1.h
#ifndef BOOK_H
#include "author.h"
#include "publisher.h"


class Book {
public:
  typedef const Author* AuthorPosition;

  Book (Author);                       // for books with single authors
  Book (const Author[], int nAuthors); // for books with multiple authors


  std::string getTitle() const;
  void setTitle(std::string theTitle);

  int getNumberOfAuthors() const;

  std::string getISBN() const;
  void setISBN(std::string id);

  Publisher getPublisher() const;
  void setPublisher(const Publisher& publ);

  AuthorPosition begin() const;
  AuthorPosition end() const;

  void addAuthor (AuthorPosition at, const Author& author);
  void removeAuthor (AuthorPosition at);

private:

  std::string title;
  int numAuthors;
  std::string isbn;
  Publisher publisher;

  static const int MAXAUTHORS = 12;
  Author authors[MAXAUTHORS];

};

#endif

and in book.cpp:

book1.cpp
#include "book1.h"

  // for books with single authors
Book::Book (Author a)
{
  numAuthors = 1;
  authors[0] = a;
}

// for books with multiple authors
Book::Book (const Author au[], int nAuthors)
{
  numAuthors = nAuthors;
  for (int i = 0; i < nAuthors; ++i)
    {
      authors[i] = au[i];
    }
}

std::string Book::getTitle() const
{
  return title;
}

void Book::setTitle(std::string theTitle)
{
  title = theTitle;
}

int Book::getNumberOfAuthors() const
{
  return numAuthors;
}

std::string Book::getISBN() const
{
  return isbn;
}

void Book::setISBN(std::string id)
{
  isbn = id;
}

Publisher Book::getPublisher() const
{
  return publisher;
}

void Book::setPublisher(const Publisher& publ)
{
  publisher = publ;
}

Book::AuthorPosition Book::begin() const
{
  return authors;
}

Book::AuthorPosition Book::end() const
{
  return authors+numAuthors;
}


void Book::addAuthor (Book::AuthorPosition at, const Author& author)
{
  int i = numAuthors;
  int atk = at - authors;
  while (i >= atk) 
    {
      authors[i+1] = authors[i];
      i--;
    }
  authors[atk] = author;
  ++numAuthors;
}


void Book::removeAuthor (Book::AuthorPosition at)
{
  int atk = at - authors;
  while (atk + 1 < numAuthors)
    {
      authors[atk] = authors[atk + 1];
      ++atk;
    }
  --numAuthors;
}



We’ll explore some of the details and alternatives of these implementations in the next lesson.