Abstraction and Abstract Data Types

Steven J. Zeil

Last modified: Oct 21, 2023

If we were to look at a program that is actually large enough to require a full team of programmers to implement it, we would probably not be surprised to find that it is organized not as a single, large, monolithic unit, but as a large number of cooperating functions. You already know how to design and write functions in C++. What may come as a bit of a surprise, however, if you have not worked much with programs of that size, is that even these functions will be further organized into higher-level structures.

 

I’ve tried to illustrate that structure in the diagram that you see here. At the top we have the main application program, for example, a spell checker. The code that occurs at this level is very specific to this application and is the main thing that differentiates a spell checker from, let’s say, a spreadsheet. On the other hand, at the very bottom of the hierarchy, we have all the basic primitive data types and operations, such as the int type, the char type, addition, subtraction, and so on, that are provided by our programming language, C++. These primitive operations may very well show up in almost any kind of program.

In between those, we have all the things that we can build on top of the language primitives as we work our way up toward our application program. Just above the language primitives we have the basic data structures, structures like linked lists or trees. We’re going to spend a lot of time this semester looking at these kinds of structures - you may already be familiar with some of them. They are certainly very important. And yet, if we stopped at that level, we would wind up “building to order” for every application. As we move from one application to another we would often find ourselves doing the same kinds of coding, over and over again.

What’s wrong with that? Most companies, and therefore most programmers, do not move from one application to a wildly different application on the next project. Programmers who have been working on “accounts receivable” are unlikely to start writing compilers the next week, and programmers who have been writing compilers are not going to be writing control software for jet aircraft the week after. Instead, programmers are likely to remain within a general application domain. The people who are currently working on our spell checker may very well be assigned to work on a grammar checker next month, or on some other text processing tool. That means that any special support we can design for dealing with text, words, sentences, or other concepts natural to this application domain may prove valuable in the long run, because we can share that work over the course of several projects.

And so, on top of the basic data structures, we expect to find layers of reusable libraries. Just above the basic data structures, the libraries are likely to provide fairly general purpose structures, such as support for look-up tables, user interfaces, and the like. As we move up in the hierarchy, the libraries become more specialized to the application domain in which we’re working. Just below the application level, we will find support for concepts that are very close to the spell checker, such as “words”, “misspellings”, and “corrections”.

The libraries that make up all but the topmost layer of this diagram may contain individual functions or groups of functions organized as Abstract Data Types. In this lesson, we’ll review the idea of Abstract Data Types and how to use C++ classes to implement them. None of the material in this lesson should be entirely new to you - all of it is covered in CS 250.

1 Abstraction

In general, abstraction is a creative process of focusing attention on the main problems by ignoring lower-level details.

In programming, we encounter two particular kinds of abstraction: procedural abstraction and data abstraction.

1.1 Procedural Abstraction

A procedural abstraction is a mental model of what we want a subprogram to do (but not how to do it).

Example: if you wanted to compute the length of the hypotenuse of a right triangle, you might write something like

double hypotenuse = sqrt(side1*side1 + side2*side2);

We can write this, understanding that the sqrt function is supposed to compute a square root, even if we have no idea how that square root actually gets computed.

When we start actually writing the code, we implement a procedural abstraction by choosing an algorithm that carries it out.

In practice, there may be many algorithms to achieve the same abstraction, and we use engineering considerations such as speed, memory requirements, and ease of implementation to choose among the possibilities.

For example, the sqrt function is probably implemented using a technique completely unrelated to any technique you may have learned in grade school for computing square roots. On many systems, sqrt doesn’t compute a square root at all, but computes a polynomial function that was chosen as a good approximation to the actual square root and that can be evaluated much more quickly than an actual square root. (This initial approximation may then be refined by several iterations of a numerical technique known as the Newton-Raphson method.)

Does it bother you to hear that the implementation of sqrt might not, in fact, work by computing a square root? It shouldn’t. You are still being guaranteed that if you call sqrt(x), then multiply the return value by itself, you will get something that is approximately x. And isn’t that what you were presumably looking for?
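To make the point concrete, here is a minimal sketch of one way a square root routine could be written. It is not how any particular library implements sqrt: the name approxSqrt is hypothetical, and it uses only Newton-Raphson refinement from a crude starting guess, with no polynomial approximation. The hypotenuse function that calls it relies only on the abstraction (the result, multiplied by itself, is approximately x), so the body of approxSqrt could be swapped for a different algorithm without changing the caller.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical stand-in for a library square root (an assumption made for
// illustration). Contract: for finite x >= 0, the result r has r*r close to x.
double approxSqrt(double x)
{
    if (!(x >= 0.0) || std::isinf(x))
        throw std::domain_error("approxSqrt: argument must be finite and non-negative");
    if (x == 0.0)
        return 0.0;

    // Crude starting guess that is never below the true square root.
    double guess = (x >= 1.0) ? x : 1.0;

    // Newton-Raphson refinement for r*r - x = 0. Starting from above, the
    // guesses decrease toward the root; stop when a step no longer improves.
    while (true) {
        double next = 0.5 * (guess + x / guess);
        if (next >= guess)
            return guess;
        guess = next;
    }
}

// The caller depends only on what approxSqrt promises, not on how it works.
double hypotenuse(double side1, double side2)
{
    return approxSqrt(side1 * side1 + side2 * side2);
}
```

Calling hypotenuse(3.0, 4.0) should give a value very close to 5, whichever algorithm sits inside approxSqrt.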

1.2 Data Abstraction

Data abstraction works much the same way. A data abstraction is a mental model of what can be done to a collection of data. It deliberately excludes details of how to do it.

1.2.1 Example: elapsed time

Elapsed time refers to a period or extent of time, as opposed to an instant in time that you might read in a single glance at a clock.

Elapsed time is generally measured in a mixture of hours, minutes, and seconds.

That’s it. That’s probably all you need to know for you and me to agree that we are talking about a common idea.

1.2.2 Example: a book

How to describe a book?

1.2.3 Example: Publishing

In many cases, we have a collection of related data abstractions that work together to define a simulated “world” in which our later programming design will take place.

Let’s explore that by expanding on the abstraction of a book, putting it into context.

 

Books

We start with Books:

There may be other properties of books, but we’re going to stick with just these few simple ones.

The diagram does not attempt to show all of the data associated with books, just enough to illustrate our idea.

 

Authors

 

Books and Authors

 

However, the most interesting thing about Authors is how they work in relationship with books:

Addresses

Addresses are also an abstraction. For our purposes, we will follow U.S. conventions:

Publishers

Our final abstraction in this world is that of Publishers.

 

Again, however, the most interesting thing about Publishers is their relationship to other abstractions:


We will return to this Publishing world example and its component abstractions many times over the course of the semester.

1.2.4 Summary

A data abstraction is a mental model. It’s not enough to start programming with. The evolution from mental model to implemented code proceeds in two steps:

  1. We devise an ADT interface for the data abstraction that describes what we want to do with it.
  2. We implement the data abstraction by
    • choosing an appropriate data structure - a specific construction of data, and
    • providing appropriate operations (algorithms) to manipulate that data.

This course is primarily about data structures and algorithms - the implementation level of the abstractions.
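The two steps can be sketched for the elapsed time abstraction from earlier. Everything here is an assumption made for illustration: the class name ElapsedTime, its operations, the restriction to non-negative times, and the choice to store a single count of seconds. The public part plays the role of the ADT interface (step 1); the private data member and the function bodies are one possible implementation (step 2).

```cpp
// Step 1: the interface says what can be done with an elapsed time.
// (Hypothetical names; non-negative times are assumed throughout.)
class ElapsedTime {
public:
    ElapsedTime(long hours, long minutes, long seconds);

    long getHours() const;    // whole hours
    long getMinutes() const;  // leftover minutes, 0..59
    long getSeconds() const;  // leftover seconds, 0..59

    ElapsedTime plus(const ElapsedTime& other) const;

private:
    // Step 2a: the data structure. One possible choice: total seconds.
    long totalSeconds;
};

// Step 2b: the operations (algorithms) that manipulate that data.
ElapsedTime::ElapsedTime(long hours, long minutes, long seconds)
    : totalSeconds(hours * 3600 + minutes * 60 + seconds)
{
}

long ElapsedTime::getHours() const   { return totalSeconds / 3600; }
long ElapsedTime::getMinutes() const { return (totalSeconds / 60) % 60; }
long ElapsedTime::getSeconds() const { return totalSeconds % 60; }

ElapsedTime ElapsedTime::plus(const ElapsedTime& other) const
{
    return ElapsedTime(0, 0, totalSeconds + other.totalSeconds);
}
```

Storing three separate fields for hours, minutes, and seconds would be an equally valid data structure. Code that uses only the public operations could not tell the difference.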

2 Abstract Data Types

The mental model offered by a data abstraction gives us an informal understanding of how and when to use it. But because it is simply a mental model, it does not tell us enough information to program with it.

An abstract data type (ADT) captures this model in a programming language interface.

Definition: An abstract data type (ADT) is a type name and a set of operations on that type where

  • Users of the ADT are expected to alter/examine values of this type only via the operations provided.

  • The creator of the ADT promises to leave the operation specifications unchanged.

  • The creator of the ADT is allowed to change the code of the operations at any time, as long as it continues to satisfy the specifications.

  • The creator of the ADT is also allowed to change the data structure actually used to implement the type.

2.1 ADTs as contracts

An ADT represents a contract between the ADT developer and the users (application programmers).

What do we gain by holding ourselves to this contract?

Example 1: Book ADT

Suppose we are working on a program to track the offerings of a book publishing house. Among the abstractions that we might want to capture are:

  • A book has a title, one or more authors, a publisher, and a unique identification code.

  • An author has a name, an address, and a numeric identifier that never changes. One author may have written or co-written multiple books.

  • A publisher has a name and address and a catalog of books that they publish.

  • An address consists of a street address, city, state, and zip code.

These are, of course, highly simplified for the purpose of this example. There’s probably a lot more information that would really need to be captured about books and authors, and our address structure assumes U.S. addresses and is fairly limited, even given that assumption.

So, among the ADTs for this system, we will have

Address

   get/set Street
   get/set City
   get/set State
   get/set Zipcode
   

Author

   get/set Name
   get/set Address
   get/set ID

Book

   get/set Title
   get number of Authors
   get/set Author[i]
   get/set ID

Publisher

   get/set Name
   get/set Address
   get the catalog   

Catalog

   get # of books
   get a specific book from the catalog
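As a small illustration of how such a list of operations maps onto code, the Address operations might be declared as a C++ class along the following lines. This is only a sketch: the member names and the decision to hold every field as a std::string are assumptions, not part of the abstraction.

```cpp
#include <string>
#include <utility>

// A possible Address ADT following the get/set list above.
// (Illustrative only; the private representation is an assumption.)
class Address {
public:
    Address() = default;

    const std::string& getStreet() const  { return street; }
    void setStreet(std::string s)         { street = std::move(s); }

    const std::string& getCity() const    { return city; }
    void setCity(std::string c)           { city = std::move(c); }

    const std::string& getState() const   { return state; }
    void setState(std::string s)          { state = std::move(s); }

    const std::string& getZipcode() const { return zipcode; }
    void setZipcode(std::string z)        { zipcode = std::move(z); }

private:
    std::string street;
    std::string city;
    std::string state;
    std::string zipcode;  // kept as text so that leading zeros survive
};
```

Keeping the zip code as text rather than as an int is a design choice: a code such as 02134 would lose its leading zero if stored as a number.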

One possible C++ realization for the Book ADT for that abstraction is shown here:

   typedef ... Book;
 
   void initialize (Book&);
 
   void setTitle(Book&, std::string);
   std::string getTitle (const Book&);
 
   void setPublisher(Book&, Publisher);
   Publisher getPublisher (const Book&);

   int getNumAuthors(const Book&);
   void addAuthor (Book&, Author*);
   void removeAuthor (Book&, Author*);
   Author* getAuthor (const Book&, int);
 
   void setIdentifier(Book&, std::string);
   std::string getIdentifier (const Book&);

This isn’t a particularly good ADT, but it illustrates the main point — with the information provided here, we could write code that manipulates Books, even though we haven’t yet established how the Book will actually be implemented (e.g., how to store and retrieve the multiple authors).
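As a sketch of what that looks like, the citation function below is written purely in terms of the interface functions. To make the example compile on its own, it supplies placeholder definitions for Author, Book, and a subset of the functions. Those placeholders, along with the getName function and the citation function itself, are assumptions made for illustration and are not the definitions used in this course.

```cpp
#include <string>
#include <vector>

// --- Placeholder definitions (assumptions, so that the sketch compiles) ---
struct Author { std::string name; };
std::string getName(const Author& a) { return a.name; }

struct Book {
    std::string title;
    std::vector<Author*> authors;
};

void initialize(Book& b)              { b.title.clear(); b.authors.clear(); }
void setTitle(Book& b, std::string t) { b.title = t; }
std::string getTitle(const Book& b)   { return b.title; }
int getNumAuthors(const Book& b)      { return static_cast<int>(b.authors.size()); }
void addAuthor(Book& b, Author* a)    { b.authors.push_back(a); }
Author* getAuthor(const Book& b, int i) { return b.authors[i]; }

// --- Application code: uses only the interface functions above ---
// Builds a string of the form "Author1, Author2. Title".
std::string citation(const Book& b)
{
    std::string result;
    for (int i = 0; i < getNumAuthors(b); ++i) {
        if (i > 0)
            result += ", ";
        result += getName(*getAuthor(b, i));
    }
    return result + ". " + getTitle(b);
}
```

Nothing in citation mentions how a Book stores its title or its authors, so it would not need to change if the placeholder struct were replaced by a different data structure.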

2.2 ADT Implementations

An ADT is implemented by supplying a data structure for the type and coded algorithms for its operations.

We sometimes refer to the ADT itself as the ADT specification or the ADT interface, to distinguish it from the code of the ADT implementation.

In C++, implementation is generally done using a C++ class, e.g.,

class Book {
public:
  std::string getTitle() const;
  void setTitle(std::string theTitle);

  Publisher* getPublisher() const;
  void setPublisher(Publisher*);

  int getNumberOfAuthors() const;

  Author getAuthor (int authorNumber) const;
  void addAuthor (Author);
  void removeAuthor (Author);

  std::string getISBN() const;
  void setISBN(std::string id);

private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  std::string isbn;
  ⋮
};

Like most classes, this uses public/private to enforce the ADT contract.

Would it really be so awful if we did not hide the data members in the private area? Why not just do:

struct Book {
  std::string title;
  Publisher* publisher;
  int numAuthors;
  std::string identifier;

  void addAuthor (Author*);
  void removeAuthor (Author*);
  ⋮
};

Well, for one thing, we have not yet figured out how we actually intend to store all the authors. As it is, our “proper” ADT version would allow you or a member of your team to start writing application code that uses Books even before we’ve made up our mind on that issue.

Second, the ADT version gives us the opportunity to change our minds later if we decide to store the authors in a different way. For example, right now you might be inclined to write something like this:

struct Book {
  std::string title;
  Publisher* publisher;
  int numAuthors;
  std::string identifier;

  void addAuthor (Author*);
  void removeAuthor (Author*);
  Author* authors[1000];
};

but, by the time this course is finished, we will have explored many better options. However, once that array has been made a public part of the interface, changing our mind later would be possible only at the risk of breaking application code that already references that array. By contrast, the ADT interface effectively raises a firewall around our choice of data structure for the authors - no matter what destruction we might later wreak on that authors data structure, none of the rest of the code in our program will get burned.
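The firewall can be sketched with two versions of a class that share the same public operations but store the authors differently. The names BookV1, BookV2, and lastAuthorName are hypothetical, Author is reduced to a minimal placeholder, and only the author-related operations are shown. The template is merely a device for compiling the same client text against both versions.

```cpp
#include <string>
#include <vector>

struct Author { std::string name; };  // minimal placeholder (assumption)

// Version 1: authors kept in a fixed-size array, hidden in the private area.
class BookV1 {
public:
    int getNumberOfAuthors() const { return numAuthors; }
    Author* getAuthor(int i) const { return authors[i]; }
    void addAuthor(Author* a)
    {
        if (numAuthors < MaxAuthors)  // this sketch silently ignores overflow
            authors[numAuthors++] = a;
    }
private:
    static constexpr int MaxAuthors = 1000;
    Author* authors[MaxAuthors] = {};
    int numAuthors = 0;
};

// Version 2: same public interface, different data structure.
class BookV2 {
public:
    int getNumberOfAuthors() const { return static_cast<int>(authors.size()); }
    Author* getAuthor(int i) const { return authors[i]; }
    void addAuthor(Author* a)      { authors.push_back(a); }
private:
    std::vector<Author*> authors;
};

// Client code written once, using only the public operations.
template <typename BookType>
std::string lastAuthorName(const BookType& book)
{
    int n = book.getNumberOfAuthors();
    return (n == 0) ? std::string() : book.getAuthor(n - 1)->name;
}
```

Because lastAuthorName never touches the array or the vector directly, it behaves the same with either version. Had the array been public and the client indexed it directly, moving to the vector would have broken that code.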

2.3 Where Do ADTs Come From?

 

ADTs may be

Domain and application-specific ADTs generally reflect the real-world objects found in the application domain.