In general, abstraction is a creative process of focusing attention on the main problems by ignoring lower-level details.
In programming, we encounter two particular kinds of abstraction:
A procedural abstraction is a mental model of what we want a subprogram to do (but not how to do it).
Example:
double hypotenuse = sqrt(side1*side1 + side2*side2);
We can write this, understanding that the sqrt function is supposed to compute a square root, even if we have no idea how that square root actually gets computed.
A data abstraction is a mental model of what
can be done to a collection of data. It deliberately excludes details of how
to do it.
A day (date?) in a calendar denotes a 24-hour period, identified by a specific year, month, and day number.
Every cell in a spreadsheet has a unique name. The name has a column part and a row part.
The row indicators are integer values starting at 1.
The column indicators are case-insensitive strings of alphabetic characters as follows: A
, B
, … , Z
, AA
, AB
, AC
, … , AZ
, BA
, BB
, … ZZ
, AAA
, AAB
, … and so on.
Optional $
markers may appear in front of each part to “fix” the row or column during copying.
Examples: A1, BC23, B$3, $A$1, $ZZZZZ2
How to describe a book?
How to describe a book?
If we are implementing a card catalog and library checkout, it is probably enough to list the metadata, e.g., title, authors, publisher, date.
How to describe a book?
If we are implementing a card catalog and library checkout, it is probably enough to list the metadata, e.g., title, authors, publisher, date.
If, however, we are going to be working on a project involving the full text of the document (e.g., automatic metadata extraction and indexing), then we might need all the pages and all the text.
How to describe a book?
If we are implementing a card catalog and library checkout, it is probably enough to list the metadata, e.g., title, authors, publisher, date.
If, however, we are going to be working on a project involving the full text of the document (e.g., automatic metadata extraction and indexing), then we might need all the pages and all the text.
On the other hand, if we were building bookshelves, we might need more physical attributes such as size and weight!
Many of the abstractions that we work with are “containers” of arbitrary numbers of pieces of other data.
Any time you have an ordered sequence of data, you can imagine the need to look through it. That then leads to the concept of a position within that sequence, with notions like
Adding Interfaces
The mental model offered by a data abstraction gives us an informal understanding of how and when to use it.
But because it is simply a mental model, it does not tell us enough information to program with it.
An abstract data type (ADT) captures this model in a programming language interface.
Definition (traditional): An abstract data type (ADT) is a type name and a list of operations on that type.
It’s convenient, for the purpose of this course, to modify this definition just slightly:
Definition (alternate): An abstract data type (ADT) is a type name and a list of members (data or function) on that type.
An ADT corresponds, more or less, to the public portion of a typical class.
The “members” of an ADT are Commonly divided into
attributes: the things that we think of as being data stored inthe ADT
operations: the functions or behaviors or the ADT
Nothing in the definition of ADT that says that the interface has to be written out in a programming language.
UML diagrams present classes as a 3-part box: name, attributes, & operations
Calendar Days: alternative
But we can use a more programming-style interface:
class Day {
public:
// Attributes
int day;
int month;
int year;
// Operations
Day operator+ (int numDays);
int operator- (Day);
bool operator< (Day);
bool operator== (Day);
⋮
or
class Day {
public:
// Attributes
int getDay();
void setDay (int);
int getMonth();
void setMonth(int);
int getYear();
void setYear(int);
// Operations
Day operator+ (int numDays);
int operator- (Day);
bool operator< (Day);
bool operator== (Day);
⋮
Either of these interfaces captures the sense of the ADT described in the diagram.
Here is a possible interface for our cell name abstraction.
class CellName
{
public:
CellName (std::string column, int row,
bool fixTheColumn = false,
bool fixTheRow=false);
//pre: column.size() > 0 && all characters in column are alphabetic
// row > 0
CellName (std::string cellname);
//pre: exists j, 0<=j<cellname.size()-1,
// cellname.substr(0,j) is all alphabetic (except for a
// possible cellname[0]=='$')
// && cellname.substr(j) is all numeric (except for a
// possible cellname[j]=='$') with at least one non-zero
// digit
CellName (unsigned columnNumber = 0, unsigned rowNumber = 0,
bool fixTheColumn = false,
bool fixTheRow=false);
std::string toString() const;
// render the entire CellName as a string
// Get components in spreadsheet notation
std::string getColumn() const;
int getRow() const;
bool isRowFixed() const;
bool isColumnFixed() const;
// Get components as integer indices in range 0..
int getColumnNumber() const;
int getRowNumber() const;
bool operator== (const CellName& r) const
⋮
private:
⋮
Arguably, the diagram presents much the same information as the code
If we were to try to capture our book abstraction (concentrating on the metadata), we might come up with something like:
class Book {
public:
Book (Author) // for books with single authors
Book (Author[], int nAuthors) // for books with multiple authors
std::string getTitle() const;
void putTitle(std::string theTitle);
int getNumberOfAuthors() const;
std::string getIsBN() const;
void putISBN(std::string id);
Publisher getPublisher() const;
void putPublisher(const Publisher& publ);
AuthorPosition begin();
AuthorPosition end();
void addAuthor (AuthorPosition at, const Author& author);
void removeAuthor (AuthorPosition at);
private:
⋮
};
Author
and Publisher
in this interface?
Coming up with a good interface for our position abstraction is a problem that has challenged many an ADT designer.
A look at our Book
interface may suggest why.
class Book {
public:
Book (Author) // for books with single authors
Book (Author[], int nAuthors) // for books with multiple authors
std::string getTitle() const;
void putTitle(std::string theTitle);
int getNumberOfAuthors() const;
std::string getIsBN() const;
void putISBN(std::string id);
Publisher getPublisher() const;
void putPublisher(const Publisher& publ);
typedef int AuthorPosition;
Author getAuthor (AuthorPosition authorNum) const;
void addAuthor (AuthorPosition at, const Author& author);
void removeAuthor (AuthorPosition at);
private:
⋮
};
One intuitive idea might be to simply number the authors and treat the number as a position indicator, as shown here.
C++ Iterators
The solution adapted by the C++ community is to have every ADT that is a “container” of sequences of other data to provide a special type for positions within that sequence.
begin()
and end()
A function to fetch the data item at the given position.
A function to advance from the current position to the next position in the sequence.
A function to compare two positions to see if they are the same.
A Possible Position Interface
In theory, we could satisfy this requirement with an ADT like this:
class AuthorPosition {
public:
AuthorPosition();
// get data at this position
Author getData() const;
// get the position just after this one
AuthorPosition next() const;
// Is this the same position as pos?
bool operator== (const AuthorPosition& pos) const;
bool operator!= (const AuthorPosition& pos) const;
};
which in turn would allow us to access authors like this:
void listAllAuthors(Book& b)
{
for (AuthorPosition p = b.begin(); p != b.end();
p = p.next())
cout << "author: " << p.getData() << endl;
}
The Iterator ADT
For historical reasons (and brevity), however, C++ programmers use overloaded operators for the getData()
and next()
operations:
Given a container c
and “positions” it
and it0
somewhere within c
:
access the data at that position | *it , it-> |
move it to the next position within c |
++it or it++ |
compare two position values it and it0 |
it == it0 , it != it0 |
get the beginning and ending positions in a container | c.begin() , c.end() |
copy a position | it0 = it |
We call position ADTs that conform to this patter iterators (because they let us iterate over a collection of data).
For example, we might define an iterator for authors in a book as:
class AuthorPosition {
public:
AuthorPosition();
// get data at this position
Author operator*() const;
// get a data/function member at this position
Author* operator->() const;
// move forward to the position just after this one
AuthorPosition operator++();
// Is this the same position as pos?
bool operator== (const AuthorPosition& pos) const;
bool operator!= (const AuthorPosition& pos) const;
};
so that code to access authors would then look like this:
void listAllAuthors(Book& b)
{
for (AuthorPosition p = b.begin(); p != b.end();
++p)
cout << "author: " << *p << endl;
}
Range-Based For Loops
In later years, C++ embraced the idea that iterators would be a pervasive part of typical programming style. New short-hand versions of for
loops were introduced specifically to work with classes that provide iterators via the conventional interfaces.