ADTs

Steven J Zeil

I’ve tried to illustrate that structure in the diagram that you see here. At the top we have the main application program, for example, a spell checker. The code that occurs at this level is very specific to this application and is the main thing that differentiates a spell checker from, let’s say, a spreadsheet. On the other hand, at the very bottom of the hierarchy, we have all the basic primitive data types and operations, such as the int type, the char type, addition, subtraction, and so on, that are provided by our programming language (C++ in this example). These primitive operations may very well show up in almost any kind of program.

In between those, we have all the things that we can build on top of the language primitives as we work our way up toward our application program. Just above the language primitives we have the basic data structures, structures like linked lists or trees. We’re going to spend a lot of time this semester looking at these kinds of structures - you may already be familiar with some of them. They are certainly very important. And yet, if we stopped at that level, we would wind up “building to order” for every application. As we move from one application to another we would often find ourselves doing the same kinds of coding, over and over again.

What’s wrong with that? Most companies, and therefore most programmers, do not move from one application to a wildly different application on the next project. Programmers who have been working on “accounts receivable” are unlikely to start writing compilers the next week, and programmers who have been writing compilers are not going to be writing control software for jet aircraft the week after. Instead, programmers are likely to remain within a general application domain. The people who are currently working on our spell checker may very well be assigned to work on a grammar checker next month, or on some other text processing tool. That means that any special support we can design for dealing with text, words, sentences, or other concepts natural to this application domain may prove valuable in the long run, because we can share that work over the course of several projects.

And so, on top of the basic data structures, we expect to find layers of reusable libraries. Just above the basic data structures, the libraries are likely to provide fairly general purpose structures, such as support for look-up tables, user interfaces, and the like. As we move up in the hierarchy, the libraries become more specialized to the application domain in which we’re working. Just below the application level, we will find support for concepts that are very close to the spell checker, such as “words”, “misspellings”, and “corrections”.

The libraries that make up all but the topmost layer of this diagram may contain individual functions or groups of functions organized as Abstract Data Types. In this lesson, we’ll review the idea of Abstract Data Types and their implementations. Little, if any, of the material in this lesson should be entirely new to you - all of it is covered in CS 250.

1. Abstraction


Abstraction

In general, abstraction is a creative process of focusing attention on the main problems by ignoring lower-level details.

In programming, we encounter two particular kinds of abstraction:

1.1 Procedural Abstraction


Procedural Abstraction

A procedural abstraction is a mental model of what we want a subprogram to do (but not how to do it).

Example: if you wanted to compute the length of the hypotenuse of a right triangle, you might write something like

double hypotenuse = sqrt(side1*side1 + side2*side2);

We can write this, understanding that the sqrt function is supposed to compute a square root, even if we have no idea how that square root actually gets computed.

When we start actually writing the code, we implement a procedural abstraction by

In practice, there may be many algorithms to achieve the same abstraction, and we use engineering considerations such as speed, memory requirements, and ease of implementation to choose among the possibilities.

For example, the “sqrt” function is probably implemented using a technique completely unrelated to any technique you may have learned in grade school for computing square roots. On many systems, sqrt doesn’t compute a square root at all, but computes a polynomial function that was chosen as a good approximation to the actual square root and that can be evaluated much more quickly than an actual square root. It may then refine the accuracy of that approximation by applying Newton’s method, a technique you may have learned in Calculus.

Does it bother you that sqrt does not actually compute via a square root algorithm? Probably not. It shouldn’t. As long as we trust the results, the method is something we are happy to ignore.
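
To make this concrete, here is a minimal sketch of one way the square root abstraction could be realized using Newton’s method alone. The name approxSqrt, the starting guess, and the stopping tolerance are assumptions made for illustration; this is not how any particular library implements sqrt.

```cpp
#include <cmath>

// Hypothetical stand-in for a library square root. It refines a guess with
// Newton's method: next = (x + value/x) / 2.
// Assumption: value >= 0 (anything else simply yields 0 in this sketch).
double approxSqrt(double value)
{
    if (value <= 0.0)
        return 0.0;
    double x = (value >= 1.0) ? value : 1.0;   // starting guess >= sqrt(value)
    for (int i = 0; i < 2000; ++i) {           // the cap is only a safety net
        double next = 0.5 * (x + value / x);
        if (std::fabs(next - x) <= 1e-15 * next)
            return next;                       // successive guesses agree
        x = next;
    }
    return x;
}

// The caller relies only on the abstraction "returns the square root".
double hypotenuse(double side1, double side2)
{
    return approxSqrt(side1 * side1 + side2 * side2);
}
```

The hypotenuse function would read exactly the same if approxSqrt were replaced by a polynomial approximation or by a call to the library's sqrt.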

1.2 Data Abstraction


Data Abstraction

Data abstraction works much the same way. A data abstraction is a mental model of what can be done to a collection of data. It deliberately excludes details of how to do it.


Example: calendar days

A day (date?) in a calendar denotes a 24-hour period, identified by a specific year, month, and day number.

That’s it. That’s probably all you need to know for you and me to agree that we are talking about a common idea.


Example: cell names

One of the running examples I will use throughout this course is the design and implementation of a spreadsheet. I assume that you are familiar with some sort of spreadsheet program such as Microsoft’s Excel or the OpenOffice Calc program. All spreadsheets present a rectangular arrangement of cells, with each cell containing a mathematical expression or formula to be evaluated.

Every cell in a spreadsheet has a unique name. The name has a column part and a row part.

But if the cell B1 originally contained the formula 2*A$1, the copied formula would be 2*B$1. The $ indicates that we are fixing the row indicator during copies. Similarly, if the cell B1 originally contained the formula 2*$A$1, the copied formula would be 2*$A$1. (If this isn’t clear, fire up a spreadsheet and try it. We can’t expect to share mental models (abstractions) if we don’t share an experience with the subject domain.)
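
The copying rule can be sketched in a few lines of C++. The function name shiftReference and its helpers are hypothetical, and the offsets passed in (how many columns and rows the copy moves) are whatever the copy happens to be. A $ fixes whichever part of the name it immediately precedes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Convert column letters ("A", "Z", "AA", ...) to a 1-based number.
int columnToNumber(const std::string& letters)
{
    int n = 0;
    for (char c : letters)
        n = n * 26 + (c - 'A' + 1);
    return n;
}

// Convert a 1-based column number back to letters.
std::string numberToColumn(int n)
{
    std::string letters;
    while (n > 0) {
        int rem = (n - 1) % 26;
        letters.insert(letters.begin(), static_cast<char>('A' + rem));
        n = (n - 1) / 26;
    }
    return letters;
}

// Hypothetical helper: adjust one cell reference such as "A1", "A$1", or
// "$A$1" for a copy that moves colOffset columns and rowOffset rows.
// Assumes ref is well formed, uses upper-case letters, and stays in range.
std::string shiftReference(const std::string& ref, int colOffset, int rowOffset)
{
    std::size_t i = 0;
    bool fixedColumn = (i < ref.size() && ref[i] == '$');
    if (fixedColumn) ++i;
    std::string letters;
    while (i < ref.size() && std::isalpha(static_cast<unsigned char>(ref[i])))
        letters += ref[i++];
    bool fixedRow = (i < ref.size() && ref[i] == '$');
    if (fixedRow) ++i;
    int row = std::stoi(ref.substr(i));

    if (!fixedColumn)
        letters = numberToColumn(columnToNumber(letters) + colOffset);
    if (!fixedRow)
        row += rowOffset;

    return std::string(fixedColumn ? "$" : "") + letters
         + (fixedRow ? "$" : "") + std::to_string(row);
}
```

For a copy that moves one column to the right, shiftReference("A$1", 1, 2) yields "B$1" and shiftReference("$A$1", 1, 2) yields "$A$1", matching the formulas above.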


Example: a book

How to describe a book?


Example: positions within a container

Many of the abstractions that we work with are “containers” of arbitrary numbers of pieces of other data. This is obvious with things like arrays and lists, but is also true of more prosaic items. For example, a book is, in effect, a container of an arbitrary number of authors (and in other variations, an arbitrary number of pages). Any time you have an ordered sequence of data, you can imagine the need to look through it. That then leads to the concept of a position within that sequence, with notions like

2. Abstract Data Types


Adding Interfaces


Definition of an Abstract Data Type

(traditional): An abstract data type (ADT) is a type name and a list of operations on that type.

It’s convenient, for the purpose of this course, to modify this definition just slightly:

Definition (alternate): An abstract data type (ADT) is a type name and a list of members (data or function) on that type.

In either case, when we talk about listing the members, this includes giving their names and their data types (for functions, their return types and the data types of their parameters).

If you search the web for the phrase “abstract data type”, you’ll find lots of references to stacks, queues, etc. - the “classic” data structures. Certainly, these are ADTs. But, just as with the abstractions introduced earlier, each application domain has certain characteristic or natural abstractions that may also need programming interfaces.


ADT Members: attributes and operations

Commonly divided into

2.1 Examples


Calendar Days

Nothing in the definition of an ADT says that the interface has to be written out in a programming language.

UML diagrams present classes as a 3-part box: name, attributes, & operations


Calendar Days: alternative

But we can use a more programming-style interface:

class Day {
public:
   // Attributes
   int getDay();
   void setDay (int);
   int getMonth();
   void setMonth(int);
   int getYear();
   void setYear(int);
   
   // Operations
   Day operator+ (int numDays);
   int operator- (Day);
   bool operator< (Day);
   bool operator== (Day);
     ⋮

See also the interface developed in sections 3.1 and 3.2 of your text (Horstmann).



Cell Names

Here is a possible interface for our cell name abstraction.

cellnameInterface.h

Arguably, the diagram presents much the same information as the code


Example: a book

If we were to try to capture our book abstraction (concentrating on the metadata), we might come up with something like:

bookAbstraction0.h

Example: positions within a container

Coming up with a good interface for our position abstraction is a problem that has challenged many an ADT designer.

bookNumericPositions.h

A problem with this is that the getAuthor function could then be done efficiently only if the authors inside each book were stored in an array or array-like data structure. And then addAuthor and removeAuthor cannot be implemented efficiently. Arrays also pose a difficulty – how large an array should be allocated for this purpose? If we allocate too few, the program crashes. So programmers usually wind up allocating an array large enough to contain as many items as possible – in this case as many authors as have ever collaborated on a single book. This means a lot of wasted storage for most books.

Both C++ and Java provide “expandable” array types, std::vector and java.util.ArrayList, respectively, that grow to accommodate however much data you actually insert into them. These resolve the storage issue, at a slight cost in speed, but still do not permit efficient implementation of addAuthor and removeAuthor.
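
The trade-off can be seen in a small sketch of an array-backed book. This Book class is hypothetical: the member names follow the discussion above, but the signatures and the use of std::vector are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical array-backed Book that identifies authors by numeric position.
class Book {
public:
    void addAuthor(std::size_t position, const std::string& name)
    {
        // Every author at or after position moves up one slot: O(n).
        authors.insert(authors.begin() + position, name);
    }
    void removeAuthor(std::size_t position)
    {
        // Every author after position moves down one slot: O(n).
        authors.erase(authors.begin() + position);
    }
    std::string getAuthor(std::size_t position) const
    {
        return authors[position];   // direct indexing: O(1)
    }
    std::size_t numAuthors() const { return authors.size(); }
private:
    std::vector<std::string> authors;   // grows as needed, no fixed limit
};
```

The vector removes the need to guess a maximum number of authors, but inserting or removing anywhere except the end still shifts the remaining elements.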


Iterators

The solution adopted by the C++ community is to have every ADT that is a “container” of sequences of other data provide a special type for positions within that sequence.



A Possible Position Interface

In theory, we could satisfy this requirement with an ADT like this:

authorPosition0.h

which in turn would allow us to access authors like this:

void listAllAuthors(Book& b)
{
   for (AuthorPosition p = b.begin(); p != b.end(); 
        p = p.next())
     cout << "author: " << p.getData() << endl;
}
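
As a rough illustration of how such a position type might work underneath, here is a sketch in which the authors are kept in a singly linked list. The AuthorNode structure and the layout of Book are assumptions made for this example; only getData(), next(), begin(), and end() come from the loop above.

```cpp
#include <string>

// Hypothetical singly linked node holding one author name.
struct AuthorNode {
    std::string name;
    AuthorNode* next;
};

// Sketch of a position ADT offering the getData()/next() operations.
class AuthorPosition {
public:
    explicit AuthorPosition(const AuthorNode* n = nullptr) : node(n) {}
    std::string getData() const { return node->name; }
    AuthorPosition next() const { return AuthorPosition(node->next); }
    bool operator==(const AuthorPosition& p) const { return node == p.node; }
    bool operator!=(const AuthorPosition& p) const { return node != p.node; }
private:
    const AuthorNode* node;   // nullptr represents the "end" position
};

class Book {
public:
    Book() : first(nullptr), last(nullptr) {}
    ~Book()
    {
        while (first != nullptr) {
            AuthorNode* doomed = first;
            first = first->next;
            delete doomed;
        }
    }
    Book(const Book&) = delete;              // copying omitted in this sketch
    Book& operator=(const Book&) = delete;

    void addAuthor(const std::string& name)  // append at the end
    {
        AuthorNode* n = new AuthorNode{name, nullptr};
        if (last != nullptr)
            last->next = n;
        else
            first = n;
        last = n;
    }
    AuthorPosition begin() const { return AuthorPosition(first); }
    AuthorPosition end() const { return AuthorPosition(nullptr); }
private:
    AuthorNode* first;
    AuthorNode* last;
};
```

The application loop never sees AuthorNode, so the book could switch to a different storage scheme without changing that loop.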


The Iterator ADT

For historical reasons (and brevity), however, C++ programmers use overloaded operators for the getData() and next() operations:

authorPosition1.h

so that code to access authors would look like this:

void listAllAuthors(Book& b)
{
   for (AuthorPosition p = b.begin(); p != b.end(); 
        ++p)
     cout << "author: " << *p << endl;
}

This ADT for positions is called an iterator (because it lets us iterate over a collection of data).
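
One way to obtain such a position type without writing it by hand is to borrow the iterator of a standard container. The sketch below assumes the authors are stored in a std::list and makes AuthorPosition a nested type name, so it is written Book::AuthorPosition here; both choices, and the helper joinAuthors, are assumptions for illustration rather than the course's actual Book code.

```cpp
#include <list>
#include <string>

// Hypothetical Book whose position type is the list's own iterator.
class Book {
public:
    typedef std::list<std::string>::const_iterator AuthorPosition;

    void addAuthor(const std::string& name) { authors.push_back(name); }

    // Removing at a known position does not shift any other element.
    void removeAuthor(AuthorPosition at) { authors.erase(at); }

    AuthorPosition begin() const { return authors.begin(); }
    AuthorPosition end() const { return authors.end(); }
private:
    std::list<std::string> authors;
};

// Application code uses only begin(), end(), ++, *, and != on positions.
std::string joinAuthors(const Book& b)
{
    std::string result;
    for (Book::AuthorPosition p = b.begin(); p != b.end(); ++p) {
        if (!result.empty())
            result += ", ";
        result += *p;
    }
    return result;
}
```

Because positions rather than integer indices identify authors, removeAuthor no longer depends on array-style storage.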

Java has similar ADTs for positions within a container, called Enumeration and Iterator, which we will see in a later lesson.

2.2 Design Patterns


Iterator as a Design Pattern

design pattern

Pattern, not ADT

In C++, our application code does not actually work with an actual ADT named “Iterator”.


Realizing a Design Pattern

You may have noticed that your textbook is titled “Object-Oriented Design & Patterns”. Keep an eye out, as we move through the semester, for more instances of common design patterns. (You might want to compare this diagram to the more Java-oriented version of this pattern on page 178 - in Java there really is something in the library named Iterator and our application works directly with that and only indirectly with the concrete realization.)

3. ADTs as contracts


ADTs as contracts

An ADT represents a contract between the ADT developer and the users (application programmers).

The Contract


Why the Contract

What do we gain by holding ourselves to this contract?

Look back at the sample ADTs from the previous sections. Note that, although none of them contain data structures or algorithms to actually provide the required functions, all of them provide enough information that you could start writing application code using them.

3.1 Information Hiding


Information Hiding

Every design can be viewed as a collection of “design decisions”.


Encapsulation

Although ADTs can be designed without language support, they rely on programmers’ self-discipline for enforcement of information hiding.

Encapsulation is the enforcement of information hiding by programming language constructs.

In C++, this is accomplished by allowing ADT implementors to put some declarations into a private: area. Any application code attempting to use those private names will fail to compile.
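
A minimal sketch of this, using a cut-down Day class: the data member names theYear, theMonth, and theDay and the helper yearOf are hypothetical, chosen only to show the effect of private:.

```cpp
// Hypothetical fragment of a Day class with its representation hidden.
class Day {
public:
    Day(int year, int month, int day)
        : theYear(year), theMonth(month), theDay(day) {}
    int getDay() const { return theDay; }
    int getMonth() const { return theMonth; }
    int getYear() const { return theYear; }
private:
    // Design decisions the application is not allowed to depend on.
    int theYear;
    int theMonth;
    int theDay;
};

// Application code: must go through the public interface.
int yearOf(const Day& d)
{
    // return d.theYear;   // would not compile: theYear is private
    return d.getYear();
}
```

If the implementor later replaces the three integers with some other representation, yearOf still compiles unchanged.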

4. ADT Implementations


ADT Implementations

An ADT is implemented by supplying

We sometimes refer to the ADT itself as the ADT specification or the ADT interface, to distinguish it from the code of the ADT implementation.

In C++, implementation is generally done using a C++ class.

4.1 Examples


Calendar Day Implementations

Read section 3.3 of your text (Horstmann) for a discussion of three different implementations of the Day ADT. Notice how we can choose among implementations for performance reasons, without breaking any application code that relies on the Day interface.
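
The idea can be sketched with two hypothetical implementations of a reduced Day interface. These are not the textbook's implementations; the class names DayAsFields and DayAsCount, the helper functions, and the sample application daysInFebruary are all assumptions made for illustration. One class stores year, month, and day; the other stores a single count of days since 1970-01-01 in the Gregorian calendar.

```cpp
// Days from 1970-01-01 to the given Gregorian date (negative if earlier).
// Assumes a valid date.
long daysFromCivil(int y, int m, int d)
{
    y -= (m <= 2);                               // treat Jan/Feb as end of prior year
    const long era = (y >= 0 ? y : y - 399) / 400;
    const long yoe = y - era * 400;              // year within 400-year era
    const long doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

// Inverse of daysFromCivil.
void civilFromDays(long z, int& y, int& m, int& d)
{
    z += 719468;
    const long era = (z >= 0 ? z : z - 146096) / 146097;
    const long doe = z - era * 146097;
    const long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const long doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const long mp = (5 * doy + 2) / 153;
    d = static_cast<int>(doy - (153 * mp + 2) / 5 + 1);
    m = static_cast<int>(mp < 10 ? mp + 3 : mp - 9);
    y = static_cast<int>(yoe + era * 400 + (m <= 2));
}

// Implementation 1: fast getters, slower date arithmetic.
class DayAsFields {
public:
    DayAsFields(int y, int m, int d) : year(y), month(m), day(d) {}
    int getYear() const { return year; }
    int getMonth() const { return month; }
    int getDay() const { return day; }
    long operator-(const DayAsFields& other) const
    {
        return daysFromCivil(year, month, day)
             - daysFromCivil(other.year, other.month, other.day);
    }
private:
    int year, month, day;
};

// Implementation 2: fast date arithmetic, slower getters.
class DayAsCount {
public:
    DayAsCount(int y, int m, int d) : count(daysFromCivil(y, m, d)) {}
    int getYear() const { int y, m, d; civilFromDays(count, y, m, d); return y; }
    int getMonth() const { int y, m, d; civilFromDays(count, y, m, d); return m; }
    int getDay() const { int y, m, d; civilFromDays(count, y, m, d); return d; }
    long operator-(const DayAsCount& other) const { return count - other.count; }
private:
    long count;
};

// Application code written against the interface only.
template <class DayType>
long daysInFebruary(int year)
{
    return DayType(year, 3, 1) - DayType(year, 2, 1);
}
```

daysInFebruary compiles and gives the same answer with either class, which is the sense in which the implementation can change without breaking the application.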


CellName implementation

cellnameImpl.cpp

There are some options here that have not been explored:


Book implementation

We can implement Book in book.h:

book1.h

and in book.cpp:

book1.cpp

We’ll explore some of the details and alternatives of these implementations in the next lesson.