Iterators: an ADT for Positions

Steven J. Zeil

Last modified: Oct 26, 2023

In this lesson we introduce one of the fundamental building blocks of the C++ standard library: the iterator.

1 The Abstraction: Positions Within a Collection of Data

Let’s look back at our Book interface and focus on a couple of awkward interface decisions that we glossed over in our earlier looks.

Here’s what we have so far:

class Book {
public:
      Book();

      Book (std::string theTitle, const Publisher* thePubl,
            int numberOfAuthors, Author* theAuthors,
            std::string theISBN);  // for books with multiple authors

      Book (std::string theTitle, const Publisher* thePubl,
            const Author& theAuthor,
            std::string theISBN);    // for books with single authors

      Book(const Book&);
      ~Book();
      const Book& operator= (const Book&);

    std::string getTitle() const;
    void setTitle(std::string theTitle);

    int getNumberOfAuthors() const;
    Author& getAuthor (int authorNumber);
    const Author& getAuthor (int authorNumber) const;

    void addAuthor (const Author&);
    void removeAuthor (const Author&);

    Publisher& getPublisher() const;
    void setPublisher(const Publisher& publ);

    std::string getISBN() const;
    void setISBN(std::string id);

private:
    ⋮
};

What do I see as awkward about this?

1.1 Redesigning the Constructors

Well, consider first the idea of trying to create a Book with several authors already in place. The first constructor lets us do that by passing in an array. But if we don’t already have an array, we would need to create one, e.g.,

Author jones = ...;
Author smith = ...;
Author doe = ...;
   ⋮
Author tempArray[] = {jones, smith, doe};
Book b  (theTitle, thePublisher, 3, tempArray, theISBN);

It’s kind of ugly. Ugly enough to justify another constructor for the common special case of a book with only one author:

Book b2 (theTitle, thePublisher, jones, theISBN);

But if we are going to have a special case for one author, why not a special case for zero authors, or for two authors?

Or, while we’re griping about that interface, what if we have our authors already in a std::array, or a vector, or … any of the other sequential containers that we will be studying in the coming weeks? Seems a bit silly to unpack data from one data structure just to repack it into a temporary array so that we can pass that to our constructor:

std::array<Author, 3> authors = {jones, smith, doe};
  ⋮
Author* tempArray = new Author[3];
for (int i = 0; i < authors.size(); ++i)
    tempArray[i] = authors[i];
Book b  (theTitle, thePublisher, 3, tempArray, theISBN);
delete [] tempArray;

What would be better? Well, for one thing you can see in the code examples above I have twice used the syntax { …comma-separated list of values… }. In early versions of C++, this syntax was limited to initializing arrays. But in modern C++, this syntax is used to provide initial values to all kinds of sequential ADTs. So what we would really like to be able to do is:

Book b  (theTitle, thePublisher, {jones, smith, doe}, theISBN);
Book b2 (anotherTitle, thePublisher, {jones}, anotherISBN);   

That gives us a very nice way to construct books when we know, at compile time, exactly how many authors we will need.

We’ll come back shortly to the problem of initializing books when the number of authors is determined at run time. For now, however, let’s look at another, related problem.

1.2 Providing Access to Individual Elements

Our current interface lets us access specific elements by index number:

    int getNumberOfAuthors() const;
    const Author& getAuthor (int authorNumber) const;

So if, for example, we wanted to print all of the authors in a book b, we could write

for (int i = 0; i < b.getNumberOfAuthors(); ++i)
    cout << b.getAuthor(i) << endl;

which seems fine. But…

Getting the $i^{\mbox{th}}$ element of a sequence is efficient if the sequence is implemented using ordinary arrays or std::array. We just index into the underlying array and go directly to the element we want. It takes no more time to access authors[9999] than it does to access authors[0].

But if the sequence were implemented with a linked list, we can only get to the $i^{\mbox{th}}$ element by starting at element 0 and then moving forward, one element at a time, through the sequence. getAuthor(9999) likely takes 10,000 times as long as getAuthor(0).
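To make that cost concrete, here is a small sketch that fetches an element of a std::list by index and counts the forward moves it needs. The function getByIndex and its steps parameter are my own illustrative names, not part of the Book interface, and the const_iterator used to walk the list is explained later in this lesson.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical helper: fetch element number `index` of a linked list the only
// way a list allows, by walking forward from the first element.
// `steps` reports how many forward moves were needed.
int getByIndex(const std::list<int>& data, std::size_t index, std::size_t& steps)
{
    assert(index < data.size());          // caller must supply a valid index
    std::list<int>::const_iterator pos = data.begin();
    steps = 0;
    while (steps < index) {
        ++pos;                            // one link at a time
        ++steps;
    }
    return *pos;
}
```

Asking for element 0 needs no moves at all, while asking for element 9999 needs 9999 of them, which is the bias against linked lists described above.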

1.2.1 Please Don’t Bias the Implementers

For that reason, the moment most programmers see an operation like

    Element get (int index) const;

they immediately assume that they will need to use arrays or array-like structures for the implementation. Anything else is automatically too slow.

But there are many circumstances where a non-array structure would be preferred for other operations. So, can we replace that get-by-integer-position operation with something a bit more neutral, something that would not immediately force the programmer into an array-style solution? What we want is a more general notion of a position-within-the-container:

    Author getAuthor (Position authorNumber) const;

Suppose that we wanted to add a new capability to our book – an ability to check to see if a particular author is in the author list and, if so, where in the list that author can be found.

Something like this:

class Book
{
   ⋮
  ??? find (const Author& au) const; 
   ⋮

What data type should the find function return?

1.4 What Operations Should A Position ADT Provide?

So, what do we need our hypothetical position-within-a-collection ADT to do?

  1. Given a position, we want to be able to access the data at that position in a short, fixed amount of time.

    • By “access”, I mean to both examine and store/alter the data at that location.

  2. Given a position, we need to be able to easily and quickly get the “next” position within the container.

    • If we do this enough times, we should eventually visit every position within the container.

  3. We should be able to compare two position values to see if they denote the same position within a container.

  4. For any container, we should be able to get the beginning position in that container and the ending position.

  5. We should be able to quickly and easily copy a position.

If we had an ADT that supported these operations, we could imagine various ways of using it. For example, looping through a container could look like:

Position p = container.getStartingPosition();
while (p != container.getEndingPosition())
{
    Data x = valueAtPosition(p);
    doSomethingWith(x);
    p = nextPositionAfter(p);
}

A simple sequential search through a container could look like

Position seqSearch (container, x)
{
    Position p = container.getStartingPosition();
    while (p != container.getEndingPosition()
           && valueAtPosition(p) != x)
    {
       p = nextPositionAfter(p);
    }
    return p;
}

In fact, it becomes clear that one of the common uses for such Positions is to support iterating through all or part of a container. For this reason, many past attempts to develop a useful ADT interface for positions within a container have called the resulting ADT an “iterator”.
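For readers who want to see the sequential search pseudocode in compilable form, here is one possible rendering. It borrows the C++ notation introduced in the next section (*p for the value at a position, ++p for the next position), and the names seqSearch and seqSearchExample are simply carried over or invented for this sketch.

```cpp
#include <cassert>
#include <list>
#include <string>

// Sequential search over any container that can report its starting and
// ending positions. Returns the ending position when x is not present.
template <typename Container, typename Value>
typename Container::const_iterator seqSearch(const Container& container, const Value& x)
{
    typename Container::const_iterator p = container.begin();
    while (p != container.end() && *p != x) {
        ++p;                              // move to the next position
    }
    return p;
}

void seqSearchExample()
{
    std::list<std::string> names = {"jones", "smith", "doe"};
    assert(*seqSearch(names, std::string("smith")) == "smith");
    assert(seqSearch(names, std::string("budd")) == names.cend());
}
```

Nothing in the search cares whether the container is an array or a linked list, which is exactly the neutrality we were after.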

2 The C++ iterator

This problem of dealing with positions and using them to iterate over collections of data is so common that a specific C++ style has evolved to deal with it. In C++, we call these position ADTs iterators. Iterators are deeply embedded into the C++ std:: library.

Remember that there were five things that we wanted to do with a “position” value. The table below summarizes those things and gives the C++ style operation for that purpose.

Given a container c and iterators it and it0, denoting positions somewhere within c:

Purpose                                                   C++ operation
access the data at that position                          *it, it->
move it to the next position within c                     ++it or it++
compare two position values it and it0                    it == it0, it != it0
get the beginning and ending positions in a container     c.begin(), c.end()
copy a position                                           it0 = it

One of the container types that supports this abstraction is std::array.

So if I have

std::array<int, N> arr;
   ⋮
sum = 0;
for (int i = 0; i < N; ++i)
    sum += arr[i];

I could rewrite that loop as

std::array<int, N> arr;
   ⋮
sum = 0;
for (std::array<int, N>::iterator it = arr.begin();
     it != arr.end(); ++it)
    sum += *it;

If the use of * and -> suggests pointers to you, that’s no accident. C and C++ programmers have long had a kind of second style for dealing with arrays that relies on the fact that an array name can be used as a pointer to its first element, and that incrementing a pointer moves it from one element of the array to the next, e.g.

    string* arr = new string[N];
       ⋮
    sum = 0;
    for (string* it = arr; it != arr + N; ++it)
       sum += it->size();

Comparing iterators to this older style of working with arrays:

Purpose                                   container c             array arr
access the data at that position          *it, it->               *it, it->
move it to the next position              ++it or it++            ++it or it++
compare two position values it and it0    it == it0, it != it0    it == it0, it != it0
get the beginning position                c.begin()               arr
get the ending position                   c.end()                 arr+N
copy a position                           it0 = it                it0 = it

You can see that, other than the problem of finding the beginning and ending positions within a container, the notation is the same.

2.1 The iterator as an ADT Interface

There is no single data type in C++ for iterators. Instead, “iterator” is a pattern that we adopt. Each different type of std container will provide its own class that implements this pattern. And, outside of std::, most C++ programmers who face the problem of providing access to a sequence of data values (e.g., the authors for a book) will provide something that implements the same iterator pattern.

If, however, we were to try to capture this pattern in the form of a class, it would probably look something like this.

template <typename T>
class MyIterator
{
public:
  typedef std::forward_iterator_tag iterator_category;  // ➀
  typedef T                          value_type;
  typedef ptrdiff_t                  difference_type;
  typedef T*                         pointer;
  typedef T&                         reference;

  MyIterator();

  // Get the data element at this position
  reference operator*() const;                          // ➁
  pointer operator->() const;

  // Move position forward 1 place
  MyIterator& operator++();                             // ➂
  MyIterator operator++(int);

  // Comparison operators
  bool operator== (const MyIterator&) const;            // ➃
  bool operator!= (const MyIterator&) const;
private:
  ⋮
};

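Although there is no single iterator class, there is actually a std::iterator template that provides a little bit of aid in writing our own iterator classes. What it does is to provide a shorthand for writing out that block of internal data type names (➀ above). Be aware that std::iterator was deprecated in C++17, so newer code usually writes the typedefs out directly, as in the listing above.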

#include <iterator>

template <typename T>
class MyIterator: public std::iterator<std::forward_iterator_tag, T>
{
  typedef std::iterator<std::forward_iterator_tag, T> Base;
public:
  // Names inherited from a template base class must be imported explicitly
  using typename Base::reference;
  using typename Base::pointer;

  MyIterator();

  // Get the data element at this position
  reference operator*() const;
  pointer operator->() const;

  // Move position forward 1 place
  MyIterator& operator++();
  MyIterator operator++(int);

  // Comparison operators
  bool operator== (const MyIterator&) const;
  bool operator!= (const MyIterator&) const;
private:
  ⋮
};
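Why bother with that block of type names at all? Generic code uses them to ask an iterator about itself, usually through std::iterator_traits, which also knows how to answer for plain pointers. The following is a small sketch of that idea; sumRange and sumRangeExample are names I have made up for the illustration.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Add up a range of any element type. The iterator's value_type tells us
// what type the running total should have.
template <typename Iterator>
typename std::iterator_traits<Iterator>::value_type
sumRange(Iterator start, Iterator stop)
{
    typedef typename std::iterator_traits<Iterator>::value_type Value;
    Value total = Value();                // zero for the numeric types
    for (Iterator it = start; it != stop; ++it)
        total += *it;
    return total;
}

void sumRangeExample()
{
    int raw[] = {1, 2, 3, 4};
    std::list<double> values = {0.5, 1.5};
    assert(sumRange(raw, raw + 4) == 10);                  // pointers act as iterators
    assert(sumRange(values.begin(), values.end()) == 2.0); // so do list iterators
}
```

An iterator class that omits those typedefs would still work in hand-written loops, but it could not be passed to templates like this one.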

2.2 Names for iterator types

Most containers will actually provide two different iterator types:

Suppose we have a class C that will provide iterators.

  • Iterators for use with objects of type C or C& are called C::iterator.

  • Iterators for use with objects of type const C or const C& are called C::const_iterator.
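A short sketch may help to show the difference in practice. The two functions below are illustrative only (the names doubleAll and countPositive are mine): one needs to change the elements and so uses an iterator, the other only reads them and so uses a const_iterator.

```cpp
#include <cassert>
#include <vector>

// Changes the elements, so it takes a non-const container
// and walks it with an iterator.
void doubleAll(std::vector<int>& v)
{
    for (std::vector<int>::iterator it = v.begin(); it != v.end(); ++it)
        *it = 2 * (*it);                  // writing through an iterator is allowed
}

// Only reads the elements, so it takes a const container
// and must use a const_iterator.
int countPositive(const std::vector<int>& v)
{
    int count = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        if (*it > 0)
            ++count;                      // "*it = 0;" here would not compile
    return count;
}

void iteratorKindsExample()
{
    std::vector<int> v = {-1, 2, 3};
    doubleAll(v);
    assert(v[0] == -2 && v[1] == 4 && v[2] == 6);
    assert(countPositive(v) == 2);
}
```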

2.3 Working with iterators

We now have all the tools we need to show some of the more basic patterns of working with iterators. We already know three containers that supply iterators: ordinary arrays, std::array and string. Ordinary arrays and std::arrays can be containers of almost anything we like, but strings are containers of chars.

First, suppose that we want to write a function that would add up all of the values in a std::vector of doubles:

double sum (std::vector<double>& v) {
   double result = 0.0;
   for (std::vector<double>::iterator it = v.begin(); it != v.end(); ++it)
      result += *it;
   return result;
}

or

double sum (std::vector<double>& v) {
   double result = 0.0;
   std::vector<double>::iterator it = v.begin(); 
   while (it != v.end())
     {
      result += *it;
      ++it;
     }
   return result;
}

Now, those loops do exactly the same thing as these next ones:

double sum (std::vector<double>& v) {
   double result = 0.0;
   for (int i = 0; i < v.size(); ++i)
      result += v[i];
   return result;
}

or

double sum (std::vector<double>& v) {
   double result = 0.0;
   int i =0;
   while (i < v.size())
     {
      result += v[i];
      ++i;
     }
   return result;
}

So why should we bother with the iterator forms? Well, for arrays, std::arrays, and std::vectors, I might not use the iterator forms, because the loops based on an integer counter and the [ ] operator may be simpler and easier to read.

But shortly we will begin working with container types that don’t provide a nice convenient [ ] operator. The iterators will be our only choice then. Also, because all of the std containers support iterators, we can write some algorithms using iterators that will work on many different kinds of containers.
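As a preview, std::list is one such container. Here is a sketch of the same summing loop written for a list, where the iterator form is the only one available (sumList is just a name chosen for this example).

```cpp
#include <cassert>
#include <list>

double sumList(std::list<double>& values)
{
    double result = 0.0;
    // values[i] would not compile: std::list has no [ ] operator.
    for (std::list<double>::iterator it = values.begin(); it != values.end(); ++it)
        result += *it;
    return result;
}

void sumListExample()
{
    std::list<double> values = {1.0, 2.5, 4.0};
    assert(sumList(values) == 7.5);
}
```

Apart from the container type, the loop is line-for-line the same as the vector version above.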

Another reason to use the iterator style is we can generalize this function to work on regular arrays, std::arrays, std::vectors, and, in fact, to work on almost any container at all.

2.3.1 Iterators and Templates

First, though, suppose that we had written this function for a std::array rather than a vector

template <size_t N>
double sum (std::array<double,N>& v)
{
   double result = 0.0;
   for (int i = 0; i < v.size(); ++i)
      result += v[i];
   return result;
}

and that we wanted to slightly generalize it so that, instead of adding up every number in the array, it only added up the numbers in some range of positions. We could do that by providing a starting and ending position.

template <size_t N>
double sum (std::array<double,N>& v, int start, int stop)
{
   double result = 0.0;
   for (int i = start; i < stop; ++i)
      result += v[i];
   return result;
}

I’ve followed the usual C++ convention of interpreting the ending position as being the position just after the last data element that we will actually use.

So if we wanted to add up all of the elements in an array, we would say

double theSum = sum(myArray, 0, myArray.size());

but if we wanted to get separate sums for the first and second halves of the array, we could do so:

double firstHalfSum = sum(myArray, 0, myArray.size()/2);
double secondHalfSum = sum(myArray, myArray.size()/2, myArray.size());

Now, let’s consider replacing the start and stop integer positions with iterators:

template <size_t N, typename Iterator>
double sum (std::array<double,N>& v, Iterator start, Iterator stop)
{
   double result = 0.0;
   for (Iterator i = start; i != stop; ++i)   // != works for every kind of iterator; < does not
      result += *i;
   return result;
}

Now, an interesting thing has happened. The container v itself is no longer used anywhere inside the body of the function. We don’t need it, and can drop it from the parameter list and then drop the size N from the template parameter list:

template <typename Iterator>
double sum (Iterator start, Iterator stop)
{
   double result = 0.0;
   for (Iterator i = start; i != stop; ++i)   // != works for every kind of iterator; < does not
      result += *i;
   return result;
}

Now, if we wanted to add up all of the elements of a std::array, we could do

std::array<double, 100> myArray;
  ⋮
double theSum = sum(myArray.begin(), myArray.end());

But we could also use this function with a vector:

std::vector<double> myVector;
  ⋮
double theSum = sum(myVector.begin(), myVector.end());

And if we wanted to get separate sums for the first and second halves of a conventional array, we could do so:

double* arr = new double[N];
  ⋮
double firstHalfSum = sum(arr, arr+N/2);
double secondHalfSum = sum(arr+N/2, arr+N);

The sum function template really doesn’t care what kind of container the data is in, so long as that container provides working iterators.


Designing functions to work on a range of data positions by passing a pair of iterators rather than the container from which those iterators are obtained is very common in C++. It allows us to write function templates that can work on many different std:: containers and on ordinary arrays.
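The standard library itself follows this convention. As a brief illustration, std::accumulate (from <numeric>) and std::find (from <algorithm>) both take a start and stop iterator; the example function name below is my own.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

void standardRangeExample()
{
    std::vector<double> v = {1.0, 2.0, 3.0, 4.0};

    // Whole container: from begin() up to, but not including, end()
    double total = std::accumulate(v.begin(), v.end(), 0.0);
    assert(total == 10.0);

    // Second half only: start two positions past the beginning
    double secondHalf = std::accumulate(v.begin() + v.size() / 2, v.end(), 0.0);
    assert(secondHalf == 7.0);

    // std::find returns the stop position when the value is absent
    assert(std::find(v.begin(), v.end(), 3.0) == v.begin() + 2);
    assert(std::find(v.begin(), v.end(), 9.0) == v.end());
}
```

So the sum template developed above is, in effect, a simplified version of std::accumulate.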

2.4 iterator vs. const_iterator

Now let’s look a little bit more closely at those example functions.

We probably would not have actually declared them as

double sum (std::vector<double>& v);

A problem with this is that it suggests that v is both an input and an output parameter.

But we really don’t expect the operation of computing a sum to change the values in the vector. To help catch buggy code that might inadvertently make such changes, the principle of const-correctness says that we should pass that vector by copy (which would be potentially quite expensive) or as a const reference:

double sum (const std::vector<double>& v);

But then our code

double sum (const std::vector<double>& v) {
   double result = 0.0;
   for (std::vector<double>::iterator it = v.begin(); it != v.end(); ++it)
      result += *it;
   return result;
}

would not compile. We would get a compilation error near the call v.begin() because, when v is const, begin() and end() return const_iterators, not iterators. So we would be trying to initialize an iterator variable it from a const_iterator (v.begin()). That conversion is not legal, since it would let us change data through a const reference. (The opposite conversion, from an iterator to a const_iterator, is permitted by the standard containers.)


The fix is to replace our mentions of “iterator” by “const_iterator”:

double sum (const std::vector<double>& v) {
   double result = 0.0;
   for (std::vector<double>::const_iterator it = v.begin(); it != v.end(); ++it)
      result += *it;
   return result;
}
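A related convenience: since C++11 the standard containers also provide cbegin() and cend(), which return const_iterators even when the container itself is not const. The sketch below (sumReadOnly is a name invented for this example) uses them to get read-only access to a vector that was passed by non-const reference.

```cpp
#include <cassert>
#include <vector>

double sumReadOnly(std::vector<double>& v)   // note: v is NOT const here
{
    double result = 0.0;
    // cbegin()/cend() hand back const_iterators regardless of v's constness
    for (std::vector<double>::const_iterator it = v.cbegin(); it != v.cend(); ++it)
        result += *it;                       // "*it = 0.0;" would be rejected
    return result;
}

void cbeginExample()
{
    std::vector<double> v = {1.5, 2.5};
    assert(sumReadOnly(v) == 4.0);
    assert(v[0] == 1.5 && v[1] == 2.5);      // the data is untouched
}
```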

2.5 auto

C++11 introduced a feature that can help here. When you are declaring variables or parameters, you don’t have to write out the data type if the declaration provides enough information for the compiler to deduce the type. Instead, you simply use the keyword “auto”. For example, you could declare an integer variable like this

auto k = 12;

because the compiler can tell from the initial value being assigned what the data type of k would be.

You cannot, however, do the same here:

auto m;
cin >> m;
int k = m + 1;

because there are no clues within the declaration of m as to what it should be.


Now, when we call a container’s begin() or end() function, the compiler can look at the container type, determine whether it is a const container or not, and determine what data type would be returned by the begin() call. So we can rewrite our last example as

double sum (const std::vector<double>& v) {
   double result = 0.0;
   for (auto it = v.begin(); it != v.end(); ++it)
      result += *it;
   return result;
}

which is definitely an improvement.


auto can, when used properly, reduce errors and enhance the readability of your code. It can be a bit of a trap for beginning programmers, however, by hiding the “real” type of a variable.

So if you, as a programmer, use auto because you don’t want to type out a long awkward type name, that’s fine. But if you use it as a crutch because you don’t know what the data type really is, you’re not helping yourself because you won’t know what you can actually do with that variable or expression.
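To see what auto is hiding, we can ask the compiler directly. The following sketch uses static_assert with std::is_same to confirm which iterator type was deduced in each case; the function name is my own.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

void autoDeductionExample()
{
    std::vector<double> v = {1.0, 2.0};
    const std::vector<double>& cv = v;    // read-only view of the same vector

    auto it = v.begin();                  // deduced as iterator
    auto cit = cv.begin();                // deduced as const_iterator

    static_assert(std::is_same<decltype(it),
                               std::vector<double>::iterator>::value,
                  "auto deduced an iterator");
    static_assert(std::is_same<decltype(cit),
                               std::vector<double>::const_iterator>::value,
                  "auto deduced a const_iterator");

    *it = 5.0;                            // fine: it is an iterator
    // *cit = 5.0;                        // would not compile: cit is a const_iterator
    assert(*cit == 5.0);                  // both denote the same first element
}
```

The two declarations look identical, yet only one of the variables can be used to change the data. That is the kind of detail you still need to know even when auto saves you the typing.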

3 Example: Adding Iterators to the Book Interface

We can think of our Book class as a container of Authors.

Earlier, we dealt with the problem of providing access to individual authors by using integer indices:

class Book {
public:
    ⋮
    int getNumberOfAuthors() const;
    Author getAuthor (int authorNumber) const;
    ⋮

Now, there’s nothing really wrong with getNumberOfAuthors. It’s a perfectly useful function in its own right. But, as we have noted earlier, the get-author-by-integer-index operation biases the possible implementations of Book in a way that we don’t like. We will relieve this bias by replacing getAuthor with an iterator:

class Book {
public:
    ⋮
  typedef ... iterator;
  typedef ... const_iterator;
    ⋮
  int getNumberOfAuthors() const;

  iterator begin();
  const_iterator begin() const;
  iterator end();
  const_iterator end() const;
    ⋮

Now, we have to choose or create a suitable data structure to use for the iterators, and implement the begin and end functions using that data type.

As always, when faced with the need to choose a data structure, we should consider options such as

  1. Reuse a data type or ADT that provides the behavior we want.
  2. Adapt an ADT that almost provides the behavior we want via inheritance.
  3. Create an appropriate ADT “from scratch”.

3.1 Creating an Iterator Class

As it happens, we won’t need to create our Book::iterator from scratch. But if we did, we would start by introducing class names for that purpose:

#include "authorIterator.h"

class Book {
public:
    ⋮
  typedef AuthorIterator iterator;
  typedef AuthorConstIterator const_iterator;
    ⋮
  int getNumberOfAuthors() const;

  iterator begin();
  const_iterator begin() const;
  iterator end();
  const_iterator end() const;
    ⋮
private:
    ⋮
};

Then, to actually declare the new iterator classes, we would fall back to the pattern for iterator classes discussed earlier:

class AuthorIterator
{
public:
  typedef std::forward_iterator_tag iterator_category;
  typedef Author                     value_type;
  typedef ptrdiff_t                  difference_type;
  typedef value_type*                pointer;
  typedef value_type&                reference;

  AuthorIterator();

  // Get the data element at this position
  reference operator*() const;
  pointer operator->() const;

  // Move position forward 1 place
  AuthorIterator& operator++();
  AuthorIterator operator++(int);

  // Comparison operators
  bool operator== (const AuthorIterator&) const;
  bool operator!= (const AuthorIterator&) const;
private:
  ⋮
};

class AuthorConstIterator
{
public:
  typedef std::forward_iterator_tag    iterator_category;
  typedef Author                       value_type;
  typedef ptrdiff_t                    difference_type;
  typedef const value_type*  pointer;
  typedef const value_type&  reference;

  AuthorConstIterator();

  // Get the data element at this position
  reference operator*() const;
  pointer operator->() const;

  // Move position forward 1 place
  AuthorConstIterator& operator++();
  AuthorConstIterator operator++(int);

  // Comparison operators
  bool operator== (const AuthorConstIterator&) const;
  bool operator!= (const AuthorConstIterator&) const;
private:
  ⋮
};

The const and non-const iterator declarations are identical except for the class names and the const qualifiers in the pointer and reference typedefs.

As previously discussed, some containers cannot safely allow code to change the data via iterators. In that case, we would define the const_iterator as above and then simply reuse that type for the iterator:

typedef const_iterator iterator;

The implementation of these classes depends on the underlying data structure used to hold the authors within a book. For example, if we were using a dynamic array:

class Book {
public:
  ⋮
private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  static const int maxAuthors = 12;
  Author* authors; // array of authors
  std::string isbn;
};

then we could use pointers to individual array elements as our underlying data structure for the iterator:

class AuthorIterator
{
public:
    ⋮
private:
  Author* position;
  friend class Book;
};

The “friend” declaration means that the Book class will have access to the private members of the AuthorIterator class.

Then in Book, we implement begin() and end():

Book::iterator Book::begin()
{
    AuthorIterator b;
    b.position = authors;
    return b;
}

Book::iterator Book::end()
{
    AuthorIterator b;
    b.position = authors+numAuthors;
    return b;
}

and the operations for AuthorIterator are pretty simple:

AuthorIterator::reference AuthorIterator::operator*() const
{
    return *position;
}

AuthorIterator::pointer AuthorIterator::operator->() const
{
    return position;
}

// Move position forward 1 place
AuthorIterator& AuthorIterator::operator++()
{
    ++position;
    return *this;
}
AuthorIterator AuthorIterator::operator++(int)
{
  AuthorIterator saved = *this;
  position++;
  return saved;
}

// Comparison operators
bool AuthorIterator::operator== (const AuthorIterator& p) const
{
   return position == p.position;
}

bool AuthorIterator::operator!= (const AuthorIterator& p) const
{
   return position != p.position;
}
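Putting those pieces together, here is a compact, self-contained sketch that can actually be compiled. It makes several simplifying assumptions: Author is reduced to a stand-in struct with a single name field, Book is trimmed to its author list and uses a fixed-size array instead of a dynamic one, and joinAuthorNames is a client function invented for the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

struct Author { std::string name; };      // hypothetical stand-in for the real Author

class AuthorIterator {
public:
    typedef std::forward_iterator_tag iterator_category;
    typedef Author                    value_type;
    typedef std::ptrdiff_t            difference_type;
    typedef Author*                   pointer;
    typedef Author&                   reference;

    AuthorIterator() : position(nullptr) {}
    reference operator*() const { return *position; }
    pointer operator->() const { return position; }
    AuthorIterator& operator++() { ++position; return *this; }
    AuthorIterator operator++(int) { AuthorIterator saved = *this; ++position; return saved; }
    bool operator==(const AuthorIterator& p) const { return position == p.position; }
    bool operator!=(const AuthorIterator& p) const { return position != p.position; }
private:
    Author* position;
    friend class Book;                    // lets Book set position directly
};

class Book {                              // trimmed to the author list only
public:
    typedef AuthorIterator iterator;
    Book() : numAuthors(0) {}
    void addAuthor(const Author& au) {
        assert(numAuthors < maxAuthors);
        authors[numAuthors] = au;
        ++numAuthors;
    }
    iterator begin() { iterator b; b.position = authors; return b; }
    iterator end() { iterator e; e.position = authors + numAuthors; return e; }
private:
    static const int maxAuthors = 12;
    Author authors[maxAuthors];           // fixed array, to keep the sketch short
    int numAuthors;
};

// Client code sees only the iterator interface, never the array.
std::string joinAuthorNames(Book& b)
{
    std::string result;
    for (Book::iterator it = b.begin(); it != b.end(); ++it) {
        if (!result.empty()) result += ", ";
        result += it->name;
    }
    return result;
}
```

If Book later switched to a linked list, joinAuthorNames would not need to change at all; only the private parts of AuthorIterator and Book would.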

3.2 Reusing Existing Data Types as Iterators

Often, however, once we have decided on the underlying data structure, we will find that it already provides a data type that supports the *, ->, ==, !=, and ++ operations with behaviors identical to what we need for an iterator. In that case, we would not need to create a new class.

For example, if we were using a dynamic array to hold our authors, we could use simple pointers to array elements as our iterators

class Book {
public:
  typedef Author*       iterator;
  typedef const Author* const_iterator;
  ⋮
  ⋮
private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  static const int maxAuthors = 12;
  Author* authors; // array of authors
  std::string isbn;
};

and then merely need to implement the Book begin and end operations:

Book::iterator Book::begin()
{
    return authors;
}

Book::iterator Book::end()
{
    return authors+numAuthors;
}

On the other hand, if we used a std::array to hold our authors, we could use its iterators as our Book’s iterators.

class Book {
private:
  static const int maxAuthors = 12;  // must be declared before the typedefs that use it
public:
  typedef std::array<Author,maxAuthors>::iterator       iterator;
  typedef std::array<Author,maxAuthors>::const_iterator const_iterator;
  ⋮
  ⋮
private:
  std::string title;
  Publisher* publisher;
  int numAuthors;
  std::array<Author,maxAuthors> authors;
  std::string isbn;
};

Book::iterator Book::begin()
{
    return authors.begin();
}

Book::iterator Book::end()
{
    // Stop after the last author actually stored, not at the end of the array
    return authors.begin() + numAuthors;
}
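One pleasant consequence of providing begin() and end() is that the class can then be used in a range-based for loop, which the compiler translates into calls to those two functions. Here is a minimal sketch of that, using a hypothetical Scores class (not part of the Book example) that wraps a std::array in the same way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

class Scores {                            // hypothetical container for illustration
public:
    typedef std::array<int, 12>::const_iterator const_iterator;
    Scores() : data(), count(0) {}
    void add(int s) {
        assert(count < data.size());
        data[count] = s;
        ++count;
    }
    const_iterator begin() const { return data.begin(); }
    const_iterator end() const { return data.begin() + count; }  // not data.end()
private:
    std::array<int, 12> data;
    std::size_t count;                    // number of slots actually in use
};

int total(const Scores& scores)
{
    int sum = 0;
    for (int s : scores)                  // uses scores.begin() and scores.end()
        sum += s;
    return sum;
}
```

Because end() stops at the last slot in use, the loop never sees the unused tail of the array.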

4 Example: Cleaning Up the Book Constructors

Another problem with our older interface for Book was the awkwardness of initializing a book with varying numbers of authors.

Our old version had been:

  Book (std::string theTitle, const Publisher* thePubl,
           int numberOfAuthors, Author* theAuthors,
           std::string theISBN);

  Book (std::string theTitle, const Publisher* thePubl,
       const Author& theAuthor,
       std::string theISBN);

but we had suggested that instead of creating temporary arrays every time we want to create a new Book:

Author textAuthors[] = {budd};
Book text361 ("Data Structures in C++", macmillan, 1, 
           textAuthors, "0-201-10758");
Book ootext ("Introduction to Object Oriented Programming", macmillan,
          1, textAuthors, "0-201-12967");

Author recipeAuthors[] = {doe, smith};
Book recipes ("Cooking with Gas", 2, recipeAuthors, "0-124-46821");

we would like to be able to say things like

Book text361 ("Data Structures in C++", macmillan,
           {budd}, "0-201-10758");
Book ootext ("Introduction to Object Oriented Programming", macmillan,
          {budd}, "0-201-12967");

Book recipes ("Cooking with Gas", {doe, smith}, "0-124-46821");

in cases where the number of authors is known at compile-time, and would like to be able to use nearly any container of authors in cases where the authors are determined at run-time.

4.1 initializer_list

What is the data type of an expression like {budd} or {doe, smith}?

That is an initializer_list (not to be confused with the initialization lists that are part of the constructor syntax).

initializer_list is a container provided in the header <initializer_list>. It’s rather like a limited form of the std::array in that it is a template providing a simple sequence of values.

In fact, the only member functions provided by initializer_list, beyond its constructor, are size(), begin(), and end().


So we can add an ability to use initializer lists to Book with a constructor

Book (std::string theTitle, const Publisher* publ,
      std::initializer_list<Author> theAuthors,
      std::string theISBN);

which would be implemented as

Book::Book (std::string theTitle, const Publisher* publ,
            std::initializer_list<Author> theAuthors,
            std::string theISBN)
 : title(theTitle), publisher(publ),
   numAuthors(0), authors(new Author[maxAuthors]),
   isbn(theISBN)
{
  for (auto it = theAuthors.begin(); it != theAuthors.end(); ++it)
  {
    authors[numAuthors] = *it;
    ++numAuthors;
  }
}

Note the use of the initializer_list to supply a list of authors, and the iterator-style loop to copy the authors into the Book.
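If you would like to experiment with this technique without the rest of the Book class, here is a stripped-down sketch. Roster is a hypothetical class of my own that stores plain strings, but its constructor follows the same pattern as the Book constructor above.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

class Roster {                            // hypothetical stand-in for Book
public:
    Roster(std::initializer_list<std::string> theNames) : count(0)
    {
        assert(theNames.size() <= maxNames);
        for (auto it = theNames.begin(); it != theNames.end(); ++it) {
            names[count] = *it;           // copy each value out of the list
            ++count;
        }
    }
    std::size_t size() const { return count; }
    const std::string* begin() const { return names; }
    const std::string* end() const { return names + count; }
private:
    static const std::size_t maxNames = 12;
    std::string names[maxNames];
    std::size_t count;
};

void rosterExample()
{
    Roster pair = {"doe", "smith"};       // the braces build an initializer_list
    Roster solo = {"budd"};
    assert(pair.size() == 2);
    assert(solo.size() == 1 && *solo.begin() == "budd");
}
```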

4.2 Constructors with Start-Stop Ranges

For cases where the number of authors is not known at compile time, we can use a common pattern in C++ programming:

Pass ranges of data to function templates as a pair of iterators, denoting a starting and ending position.

This allows us to pass a whole range of values that could be stored in any container type that provides iterators (and, by modern C++ convention, most of them, even containers that are not part of the std library, do provide iterators).

We can use this pattern to allow a Book to be constructed from a range of sequential positions from almost any container:

template <typename Iterator>
Book (std::string theTitle, const Publisher* publ,
      Iterator startAuthors, Iterator stopAuthors,
      std::string theISBN);

implemented as

template <typename Iterator>
Book::Book (std::string theTitle, const Publisher* publ,
            Iterator startAuthors, Iterator stopAuthors,
            std::string theISBN)
 : title(theTitle), publisher(publ),
   numAuthors(0), authors(new Author[maxAuthors]),
   isbn(theISBN)
{
  for (auto it = startAuthors; it != stopAuthors; ++it)
  {
    authors[numAuthors] = *it;
    ++numAuthors;
  }
}

This would allow us to write code like:

Book readBook (std::istream& in, Publisher& pub)
{
   // Book stores only a pointer to its publisher, so the Publisher object
   // is supplied by the caller and must outlive the returned Book.
   std::string title, isbn;
   in >> title >> isbn >> pub;
   int numAuthors;
   in >> numAuthors;
   std::vector<Author> temp;
   for (int i = 0; i < numAuthors; ++i)
   {
      Author au;
      in >> au;
      temp.push_back(au);
   }
   Book b (title, &pub, temp.begin(), temp.end(), isbn);
   return b;
}
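The payoff of the start-stop pattern is that a single constructor accepts data from many different sources. The sketch below demonstrates this with a hypothetical NameList class (a stand-in for Book that stores strings), constructed in turn from a vector, a list, and an ordinary array.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

class NameList {                          // hypothetical stand-in for Book
public:
    template <typename Iterator>
    NameList(Iterator start, Iterator stop) : count(0)
    {
        for (Iterator it = start; it != stop; ++it) {
            assert(count < maxNames);     // refuse to overflow the array
            names[count] = *it;
            ++count;
        }
    }
    std::size_t size() const { return count; }
    const std::string* begin() const { return names; }
    const std::string* end() const { return names + count; }
private:
    static const std::size_t maxNames = 12;
    std::string names[maxNames];
    std::size_t count;
};

void rangeConstructorExample()
{
    std::vector<std::string> fromVector = {"doe", "smith"};
    std::list<std::string> fromList = {"jones"};
    std::string fromArray[] = {"budd", "jones", "doe"};

    NameList a(fromVector.begin(), fromVector.end());
    NameList b(fromList.begin(), fromList.end());
    NameList c(fromArray, fromArray + 3); // pointers serve as iterators
    assert(a.size() == 2 && b.size() == 1 && c.size() == 3);
    assert(*c.begin() == "budd");
}
```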

For comparison and for the sake of completeness, here is the std::array version of our Book class, with the same changes made to the constructors.

abook.h
#ifndef Book_H
#define Book_H

#include <initializer_list>
#include <string>
#include <array>
#include "author.h"

class Publisher;



class Book {
private:
	std::string title;
	const Publisher* publisher;
	int numAuthors;
	std::string isbn;
	static const int MaxAuthors = 12;
	std::array<Author,MaxAuthors> authors;

public:
	typedef std::array<Author,MaxAuthors>::iterator iterator;
	typedef std::array<Author,MaxAuthors>::const_iterator const_iterator;

	Book();

	Book (std::string theTitle, const Publisher* publ,
	      std::initializer_list<Author> theAuthors,
	      std::string theISBN);

	template <typename Iterator>
	Book (std::string theTitle, const Publisher* publ,
	      Iterator startAuthors, Iterator stopAuthors,
	      std::string theISBN);

	std::string getTitle() const {return title;}
	void setTitle(std::string theTitle) {title = theTitle;}

	int getNumberOfAuthors() const;

    iterator begin();
    const_iterator begin() const;
    iterator end();
    const_iterator end() const;

	void addAuthor (Author);
	void removeAuthor (Author);

	const Publisher* getPublisher() const {return publisher;}
	void setPublisher(const Publisher* publ) {publisher = publ;}

	std::string getISBN() const {return isbn;}
	void setISBN(std::string id) {isbn = id;}

};

template <typename Iterator>
Book::Book (std::string theTitle, const Publisher* publ,
      Iterator startAuthors, Iterator stopAuthors,
      std::string theISBN)
: title(theTitle), publisher(publ), numAuthors(0),
  isbn(theISBN)
{
	while (startAuthors != stopAuthors)
	{
		authors[numAuthors] = *startAuthors;
		++numAuthors;
		++startAuthors;
	}
}



#endif
abook.cpp
/*
 * book.cpp
 *
 *  Created on: May 23, 2018
 *      Author: zeil
 */

#include "abook.h"
#include <cassert>
#include <algorithm>

using namespace std;


Book::Book()
: title(), publisher(nullptr), numAuthors(0),
  isbn()
{

}

Book::Book (std::string theTitle, const Publisher* publ,
      std::initializer_list<Author> theAuthors,
      std::string theISBN)
: title(theTitle), publisher(publ), numAuthors(theAuthors.size()),
  isbn(theISBN)
{
	int i = 0;
	for (const Author& au: theAuthors)
	{
		authors[i] = au;
		++i;
	}
}



int Book::getNumberOfAuthors() const
{
	return numAuthors;
}

Book::iterator Book::begin()
{
	return authors.begin();
}
Book::const_iterator Book::begin() const
{
	return authors.begin();
}
Book::iterator Book::end()
{
	return authors.begin() + numAuthors;
}
Book::const_iterator Book::end() const
{
	return authors.begin() + numAuthors;
}

void Book::addAuthor (Author au)
{
	assert(numAuthors < MaxAuthors);
	authors[numAuthors] = au;
	++numAuthors;
}
void Book::removeAuthor (Author au)
{
    auto pos = find(begin(), end(), au);
    if (pos != end())
    {
    	copy (pos+1, end(), pos);
    	--numAuthors;
    }
}


5 Iterator Variations

There are a number of common variations on the basic iterator interface.

5.1 Forward Iterators

forwardIterator.h
#include <cstddef>
#include <iterator>

template <typename T>
class MyIterator
{
public:
  typedef std::forward_iterator_tag iterator_category;
  typedef T                          value_type;
  typedef ptrdiff_t                  difference_type;
  typedef T*                         pointer;
  typedef T&                         reference;
  ⋮

  // Get the data element at this position
  reference operator*() const;
  pointer operator->() const;

  // Move position forward 1 place
  MyIterator& operator++();
  MyIterator operator++(int);

  // Comparison operators
  bool operator== (const MyIterator&) const;
  bool operator!= (const MyIterator&) const;
private:
  ⋮
};
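
As a rough sketch of how such an interface might be filled in, here is a forward iterator over a hand-built singly linked chain of nodes. The names Node, NodeIterator, and sumNodes are hypothetical and exist only for this example; a null pointer is assumed to play the role of the end position.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical singly linked list node, used only for this illustration.
template <typename T>
struct Node {
  T data;
  Node* next;
};

// A forward iterator whose "position" is simply a pointer to a node.
template <typename T>
class NodeIterator {
public:
  typedef std::forward_iterator_tag iterator_category;
  typedef T                         value_type;
  typedef std::ptrdiff_t            difference_type;
  typedef T*                        pointer;
  typedef T&                        reference;

  explicit NodeIterator(Node<T>* p = nullptr) : current(p) {}

  // Get the data element at this position (not valid at the end position)
  reference operator*() const { assert(current != nullptr); return current->data; }
  pointer operator->() const { assert(current != nullptr); return &(current->data); }

  // Move position forward 1 place
  NodeIterator& operator++() { current = current->next; return *this; }
  NodeIterator operator++(int) { NodeIterator old = *this; current = current->next; return old; }

  // Comparison operators
  bool operator==(const NodeIterator& other) const { return current == other.current; }
  bool operator!=(const NodeIterator& other) const { return current != other.current; }

private:
  Node<T>* current;  // nullptr plays the role of the end position
};

// Add up the values in a chain of nodes using only the iterator operations.
inline int sumNodes(Node<int>* first) {
  int total = 0;
  for (NodeIterator<int> it(first); it != NodeIterator<int>(); ++it)
    total += *it;
  return total;
}
```

Because a singly linked node has no link to its predecessor, there is no cheap way to offer -- here, which is why forward is the natural category for this iterator.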

5.2 Bidirectional Iterators

An obvious extension is to allow iterator applications to move backwards as well. The operator -- provides this capability.

biIterator.h
template <typename T>
class MyIterator
{
public:
  typedef std::bidirectional_iterator_tag iterator_category;
  typedef T                          value_type;
  typedef ptrdiff_t                  difference_type;
  typedef T*                         pointer;
  typedef T&                         reference;
  ⋮

  // Get the data element at this position
  reference operator*() const;
  pointer operator->() const;

  // Move position forward 1 place
  MyIterator& operator++();
  MyIterator operator++(int);

  // Move position backward 1 place
  MyIterator& operator--();
  MyIterator operator--(int);

  // Comparison operators
  bool operator== (const MyIterator&) const;
  bool operator!= (const MyIterator&) const;
private:
  ⋮
};

These are called bidirectional iterators.

The std::list container provides bidirectional iterators.
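
A small sketch of -- in action: the function below walks a std::list from back to front. The function name backwards is made up for this example.

```cpp
#include <cassert>
#include <list>
#include <string>

// Build a string of the list's elements from last to first, using only
// the operations a bidirectional iterator provides (--, *, and !=).
inline std::string backwards(const std::list<char>& chars) {
  std::string result;
  auto it = chars.end();
  while (it != chars.begin()) {
    --it;            // step back before reading: end() holds no data
    result += *it;
  }
  assert(result.size() == chars.size());
  return result;
}
```

Note the order of operations inside the loop. Because end() denotes the position just after the last element, we must move backward first and only then look at the data.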

5.3 Random Access Iterators

The ++ and -- operators allow us to move only one position at a time. Some containers provide “random access” iterators that can move any integer number of places forward or back.

randomIterator.h
template <typename T>
class MyIterator
{
public:
  typedef std::random_access_iterator_tag iterator_category;
  typedef T                          value_type;
  typedef ptrdiff_t                  difference_type;
  typedef T*                         pointer;
  typedef T&                         reference;
  ⋮

  // Get the data element at this position
  reference operator*() const;
  pointer operator->() const;

  // Move position forward 1 place
  MyIterator& operator++();
  MyIterator operator++(int);

  // Move position backward 1 place
  MyIterator& operator--();
  MyIterator operator--(int);


  // Random access operations

  ptrdiff_t operator- (const MyIterator&) const;
    // how many positions apart are these iterators?

  MyIterator operator+ (ptrdiff_t k) const;
    // Get a new iterator k positions past this one.

  MyIterator operator- (ptrdiff_t k) const;
    // Get a new iterator k positions before this one.

  // Comparison operators
  bool operator== (const MyIterator&) const;
  bool operator!= (const MyIterator&) const;
  bool operator<  (const MyIterator&) const;
  bool operator<= (const MyIterator&) const;
  bool operator>  (const MyIterator&) const;
  bool operator>= (const MyIterator&) const;
private:
  ⋮
};

The std::array, std::initializer_list, and std::vector classes provide random-access iterators. Pointers into arrays are considered to be random-access iterators as well.

5.3.1 Random Access Iterator Ops

With a random access iterator, it, we can get a position 5 places past it by computing it+5. We can get a position 3 places in front of it by the expression it-3 or it + (-3).

Subtracting one random-access iterator from another yields the number of positions apart that they are: (it+5) - it == 5.

Random access iterators are needed when an algorithm needs to “jump around” in a container rather than plod methodically from one end of the container to the other. For example, an iterator-based rewrite of the binary search algorithm would compute the next place to look this way:

int low, high;
  ⋮
int mid = ( low + high ) / 2;
RandomAccessIterator midPos = start + mid;

Random-access iterators also support the relational operators <, <=, >, and >= for determining whether one position comes before or after another within the container.
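
The operations described in this section can be sketched in a few lines. The helper name middle is hypothetical; it uses iterator subtraction, iterator-plus-integer, and a relational comparison, so it should work for vector or array iterators and for plain pointers, but not for list iterators.

```cpp
#include <cassert>
#include <vector>

// Return the position halfway between start and stop, computed with
// random-access iterator arithmetic.
template <typename RandomAccessIterator>
RandomAccessIterator middle(RandomAccessIterator start,
                            RandomAccessIterator stop) {
  assert(start <= stop);           // relational operators on positions
  auto distance = stop - start;    // iterator - iterator yields a count
  return start + distance / 2;     // iterator + integer yields a position
}
```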

5.4 Input Iterators

Input iterators are like forward iterators except that they can only be used to examine data, not to store it.

Input iterators are most commonly used to walk through the contents of an input stream, e.g., cin:

    istream_iterator<string> in (cin);
    while (in != istream_iterator<string>()) {
       cout << "Next string in input is " << *in << endl;
       ++in;
    }

The restrictions of an input iterator are appropriate here because one can read from an input stream but cannot store data in it.
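
Here is a minimal sketch of the same idea packaged as a function that can be tried on any input stream, including a std::istringstream. The function name countWords is an assumption made for this example.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Count the whitespace-separated words remaining in a stream by stepping
// an input iterator from the current position to end-of-stream.
inline int countWords(std::istream& in) {
  int count = 0;
  std::istream_iterator<std::string> pos(in);
  std::istream_iterator<std::string> endOfStream;  // default-constructed = end marker
  while (pos != endOfStream) {
    ++count;
    ++pos;   // reads the next word from the stream
  }
  // The iterator only becomes equal to the end marker after a read has failed.
  assert(in.fail());
  return count;
}
```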

5.5 Output Iterators

Similarly, we have output iterators, which are like forward iterators except that they can only be used to store data, not to examine it.

Output iterators are most commonly used to insert data into successive positions of an output stream, e.g., cout:

    ostream_iterator<int> out (cout, "\n");
    for (int i = 1; i < 100;  i *= 2) {   // start at 1: doubling 0 would never advance
       *out = i;  // writes an int to cout followed by newline
       ++out;
    }

The restrictions of an output iterator are appropriate here because one can write into an output stream but cannot read data from it.
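
Because an algorithm written against an output iterator uses only *out = value and ++out, the same code can feed quite different destinations. The sketch below is an illustration under that assumption; writeSquares is a made-up name, and std::back_inserter (from <iterator>) is a standard library output iterator, not otherwise discussed here, that appends to a container.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <vector>

// Store the squares 1*1, 2*2, ..., n*n at successive positions starting
// at out. Only output-iterator operations are used: *out = value and ++out.
template <typename OutputIterator>
OutputIterator writeSquares(OutputIterator out, int n) {
  assert(n >= 0);
  for (int i = 1; i <= n; ++i) {
    *out = i * i;
    ++out;
  }
  return out;
}
```

Calling writeSquares(std::ostream_iterator<int>(cout, " "), 4) would send the values to cout, while writeSquares(std::back_inserter(v), 4) would append them to a vector v.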

5.6 Reverse Iterators

Some containers (including vectors and lists) provide reverse iterators, which behave like the container’s normal iterators except that they start at the last element and ++ moves them toward the beginning of the container.

For example:

vector<int> v {1, 2, 3, 4};
for (vector<int>::iterator i = v.begin();
     i != v.end(); ++i)
{
    cout << *i << ' ';
}      
cout << endl;
for (vector<int>::reverse_iterator i = v.rbegin();
     i != v.rend(); ++i)
{
    cout << *i << ' ';
}      
cout << endl;

would print

1 2 3 4
4 3 2 1

Some functions, such as vector::insert, will accept an iterator as a parameter but will not accept a reverse iterator. Luckily, a reverse iterator can be converted to an ordinary (non-reverse) iterator via base(). Note, however, that base() returns the position just after the position denoted by the reverse iterator.

vector<int>::reverse_iterator rit1
    = myVector.rbegin(); // points to last element in myVector
vector<int>::iterator it1 = rit1.base(); // == myVector.end()

vector<int>::reverse_iterator rit2
    = myVector.rend(); // points just before first element in myVector
vector<int>::iterator it2 = rit2.base(); // == myVector.begin()

Technically, reverse iterators are not a category of iterator in the same sense as forward, bidirectional, etc. A reverse iterator itself can be bidirectional or random access, depending on the container that provides it. The “reverse” simply refers to which end of the container it starts from and the direction in which ++ moves it.
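
The off-by-one behavior of base() is easy to trip over, so here is a small sketch that has to account for it. The function findLast is hypothetical; it searches backward with reverse iterators and then converts the result to an ordinary iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Return an ordinary iterator to the last occurrence of key in v,
// or v.end() if key does not occur.
inline std::vector<int>::iterator findLast(std::vector<int>& v, int key) {
  auto rpos = std::find(v.rbegin(), v.rend(), key);
  if (rpos == v.rend())
    return v.end();
  // base() denotes the position just after the one rpos denotes,
  // so step back one place to land on the same element.
  auto pos = rpos.base() - 1;
  assert(*pos == key);
  return pos;
}
```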

6 Example: Searching via Iterator Variants

In this example, let’s develop an efficient utility function for searching through data that is being kept in ascending order.

/**
 * Search through a range of data in positions [start..stop) (i.e.,
 * positions beginning at start and going up to, but not including, stop)
 * looking for the value key.   All data in the array being searched
 * should be in non-descending order (i.e., for any two adjacent values x
 * and y, x <= y).
 *
 * Return the position containing key or, if key is not in the range, the
 * position at which key should be inserted in order to maintain the
 * non-descending order of the data.
 *
 * @param start  position of the first data element to be considered.
 * @param stop   position just after the last data element to be considered.
 * @param key    the value we are searching for
 * @return  The position containing key or, if key is not in the range,
 *          the position at which key could be inserted to preserve the
 *          data ordering.
 */
template <typename Iterator, typename Value>
Iterator lower_bound (Iterator start, Iterator stop, const Value& key)

For example, if we had

std::array<string, 5> names = {"Adams", "Baker", "Clarke", "Davis", "Evans"};
int arr[] = {0, 2, 4, 8};

then

lower_bound (names.begin(), names.end(), "Baker");

would return the position of the second string, and

lower_bound (arr, arr+4, 3)

would return the position in arr currently occupied by the number 4.

lower_bound (names.begin(), names.end(), "Zeil");

would return names.end(), and

lower_bound (arr, arr+4, 10)

would return the position arr+4.

A simple way to search data that is ordered is to start at the beginning of the sequence, and move forward one step at a time until we either find the data we are looking for or we encounter a value larger than what we are looking for. We can call this an ordered search. For example,

template <typename Comparable>
int seqOrderedSearch(const Comparable list[], int listLength, 
               Comparable key)
{
    int loc = 0;

    while (loc < listLength && list[loc] < key)
      {
       ++loc;
      }
    return loc; 
}

This code finds the integer position of key within an array or the position of the first item larger than key. We can rewrite that into iterator style, so that it can work on any container, like this:

template <typename Iterator, typename Value>
Iterator orderedSearch (Iterator start, Iterator stop, const Value& key)
{
    while ((start != stop) && (*start < key))
       ++start;
    return start;
}

You can see that this is the same underlying algorithm, just using the [start..stop) range convention instead of passing the array and using *start in place of indexing into the array.
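
Since the iterator version needs nothing beyond !=, *, and ++, it should work on containers that have no indexing at all. The sketch below repeats orderedSearch so that it stands alone and applies it to a std::list; the helper insertInOrder is a made-up name for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>

// Repeated from the text above so that this example stands alone.
template <typename Iterator, typename Value>
Iterator orderedSearch(Iterator start, Iterator stop, const Value& key) {
  while ((start != stop) && (*start < key))
    ++start;
  return start;
}

// Insert name into an already-sorted list, keeping it sorted.
inline void insertInOrder(std::list<std::string>& names,
                          const std::string& name) {
  assert(std::is_sorted(names.begin(), names.end()));  // precondition
  auto pos = orderedSearch(names.begin(), names.end(), name);
  names.insert(pos, name);   // list::insert places name just before pos
}
```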

Although fairly simple, this function can be slow when applied to large amounts of data. For example, if we applied this to an array of 10000 items, on average it would look through 5000 of them before stopping. If we applied this to an array of 100000 items, on average it would look through 50000 of them before stopping.

For larger amounts of data, a binary search works better. The binary search works by considering a range [low..high] of possible positions where the key might be located.

* At each step we look at the value in the middle of that range. If that middle value is the key, we are done.
* If the key is smaller than the middle value, then we know the key can only be found in the lower half of the [low..high] range.
* If the key is larger than the middle value, then we know the key can only be found in the upper half of the [low..high] range.

Either way, we can immediately cut the range we are considering in half.


/**
 * Performs the standard binary search using two comparisons per level.
 * Returns index where item is found or the index where it could
 * be inserted  if not found
 *
 * From Weiss,  Data Structures and Algorithm Analysis, 4e
 * ( modified SJ Zeil)
 */
template <typename Comparable>
int binarySearch( const Comparable* a, int size, const Comparable & x )
{
    int low = 0, high = size - 1;

    while( low <= high )
    {
       int mid = ( low + high ) / 2;

       if( a[ mid ] < x )
         low = mid + 1;
       else if( a[ mid ] > x )
         high = mid - 1;
       else
         return mid;   // Found
    }
    return low;
}

Suppose, for example, that we apply this function to an array of 10000 items. Each pass through the loop cuts the number of positions still under consideration roughly in half:

Pass #    Possible positions
   1      10000
   2       4999
   3       2499
   4       1249
   5        624
   6        312
   7        156
   8         78
   9         39
  10         19
  11          9
  12          4
  13          2
  14          1

We might, of course, get lucky and find the key earlier, but with this algorithm we can search through 10000 elements while examining at most 14 of them. That’s clearly much faster than the simpler ordered search.
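
The pass count can be checked with a short calculation. The function below is a sketch written for this purpose (worstCasePasses is a hypothetical name). It assumes the worst case, in which the key is never at the middle position and the search always continues in the larger remaining half, which holds at most n/2 of the n positions.

```cpp
#include <cassert>

// Count how many passes through the binary search loop are needed, in the
// worst case, for a range of n positions.
inline int worstCasePasses(int n) {
  assert(n >= 0);
  int passes = 0;
  while (n > 0) {
    ++passes;
    n = n / 2;   // middle position examined; at most n/2 positions remain
  }
  return passes;
}
```

For 10000 items this yields 14 passes, and for 100000 items only 17, compared with average costs of 5000 and 50000 comparisons for the ordered search.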


We can write this to work with iterators:

template <typename RandomAccessIterator, typename Value>
RandomAccessIterator binarySearch (RandomAccessIterator start,
                       RandomAccessIterator stop,
                       const Value& key)
{
    auto high = stop - start - 1;                   // ➀
    decltype(high) low = 0;   // same integer type as high

    while( low <= high )
    {
       auto mid = ( low + high ) / 2;
       RandomAccessIterator midPos = start + mid;   // ➁
       if( *midPos < key )
         low = mid + 1;
       else if( key < *midPos )
         high = mid - 1;
       else
         return midPos;   // Found
    }
    return start + low;                             // ➂
}

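Now, this code makes use of two special properties that are only available on random access iterators: subtracting one iterator from another to find out how many positions apart they are (➀), and adding an integer to an iterator to jump directly to a new position (➁ and ➂).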

So this code will only work on ranges of data that are described using random access iterators. But I’m OK with that restriction. The very nature of binary search is to jump around within the range of data being searched. We can’t expect that to work well (or at all) with iterators that only allow us to move one step at a time.

6.2 Letting the Compiler Choose

The rest of this example begins to delve into some real C++ wizardry. Consider the remainder of this section to be optional reading.

However, this does not get us our all-purpose lower_bound function that we described at the start of this example. Instead, we have two different functions to choose from.

We could leave this choice up to the programmer, but, interestingly enough, it is possible to have the compiler choose the appropriate search function by looking at what type of iterator we are giving it.

The key here is in the idea of an “iterator category”. When we looked at the various data types provided by the typical iterator ADT interface, one of the data types we provided was:

template <typename T>
class MyIterator
{
public:
  typedef std::forward_iterator_tag iterator_category;

We declared the “category” of MyIterator to be std::forward_iterator_tag. Now, std::forward_iterator_tag is a class type. It does not provide any data or function members of particular interest. It simply serves as an identification of what kind of iterator we are providing. There are several of these tag types:

Iterator Variant          Tag                                Operations provided
input iterator            std::input_iterator_tag            examine data, ++
output iterator           std::output_iterator_tag           store data, ++
forward iterator          std::forward_iterator_tag          read/store data, ++
bidirectional iterator    std::bidirectional_iterator_tag    read/store data, ++, --
random access iterator    std::random_access_iterator_tag    read/store data, ++, --, -, + int

All of these tags are data types. The values of these types are pretty much useless, but the types themselves can be used just like other types. In particular, suppose that I were to require programmers who wanted to use our search function to supply a pointer to one of these tag values as a parameter. I could write different, overloaded versions of our search function associated with these different tags:

template <typename Iterator, typename Value>
Iterator lower_bound (Iterator start, Iterator stop, const Value& key,
           const std::random_access_iterator_tag*)
{
    return binarySearch(start, stop, key);
}

template <typename Iterator, typename Value>
Iterator lower_bound (Iterator start, Iterator stop, const Value& key,
           const std::input_iterator_tag*)
{
    return orderedSearch(start, stop, key);
}

template <typename Iterator, typename Value>
Iterator lower_bound (Iterator start, Iterator stop, const Value& key,
           const std::forward_iterator_tag*)
{
    return orderedSearch(start, stop, key);
}

template <typename Iterator, typename Value>
Iterator lower_bound (Iterator start, Iterator stop, const Value& key,
           const std::bidirectional_iterator_tag*)
{
    return orderedSearch(start, stop, key);
}

So, if I were to do

std::array<string, 5> names = {"Adams", "Baker", "Clarke", "Davis", "Evans"};
auto position = lower_bound (names.begin(), names.end(), "Baker",
                        new std::random_access_iterator_tag);

the compiler would choose the version of lower_bound that uses binarySearch, but if I change that call to

auto position = lower_bound (names.begin(), names.end(), "Baker",
                        new std::forward_iterator_tag);

the compiler would choose one of the lower_bound functions that uses orderedSearch.

OK, that use of the tag is getting ugly, but I’m not done yet. Since we don’t actually use the tag value for anything, any old pointer will do. It doesn’t even need to point at a real object:

std::array<string, 5> names = {"Adams", "Baker", "Clarke", "Davis", "Evans"};
auto position = lower_bound (names.begin(), names.end(), "Baker",
                        (std::random_access_iterator_tag*)nullptr);

The null pointer will do the job, so long as we cast it to the correct tag type.

Now comes the clever bit. We don’t need to have the programmer supply the tag type at all. The compiler can do that for us as well. We just define the 3-parameter version of lower_bound, the thing that we wanted all along, as:

template <typename Iterator, typename Value>
Iterator lower_bound (Iterator start, Iterator stop, const Value& key)
{
    typedef std::iterator_traits<Iterator> traits;   // ➀
    return lower_bound (start, stop, key,
                   (typename traits::iterator_category*)nullptr); // ➁
}

So now our programmer can write

auto position = lower_bound (names.begin(), names.end(), "Baker");

This 3-parameter version of lower_bound will compile into a call to one of our 4-parameter versions of lower_bound. The compiler will choose the one that matches the traits for whatever iterator we are working with.

The net result is that the 3-parameter lower_bound function will choose to do binary search when it can, and fall back to ordered search when given anything other than a random access iterator.

The lower_bound function that we have just described, that switches between binary search and ordered search based on the iterator category, is already in the std:: library in the <algorithm> header.
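
For readers who want to experiment, here is a compact, self-contained sketch of the same dispatching idea. It differs from the development above in one respect: the tag is passed by value rather than as a pointer. Because the standard tag classes inherit from one another, a single fallback overload taking std::input_iterator_tag then covers the input, forward, and bidirectional cases. The name findPosition is hypothetical, chosen to avoid clashing with std::lower_bound.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Fallback: ordered search for anything at least as capable as an input
// iterator. Forward and bidirectional tags convert to input_iterator_tag.
template <typename Iterator, typename Value>
Iterator findPosition(Iterator start, Iterator stop, const Value& key,
                      std::input_iterator_tag) {
  while ((start != stop) && (*start < key))
    ++start;
  return start;
}

// Exact match for random access iterators: binary search.
template <typename Iterator, typename Value>
Iterator findPosition(Iterator start, Iterator stop, const Value& key,
                      std::random_access_iterator_tag) {
  assert(start <= stop);
  typename std::iterator_traits<Iterator>::difference_type low = 0;
  auto high = stop - start - 1;
  while (low <= high) {
    auto mid = low + (high - low) / 2;
    Iterator midPos = start + mid;
    if (*midPos < key)
      low = mid + 1;
    else if (key < *midPos)
      high = mid - 1;
    else
      return midPos;   // Found
  }
  return start + low;
}

// The 3-parameter version: build a tag object of the iterator's category
// and let overload resolution pick the search strategy.
template <typename Iterator, typename Value>
Iterator findPosition(Iterator start, Iterator stop, const Value& key) {
  typedef typename std::iterator_traits<Iterator>::iterator_category Category;
  return findPosition(start, stop, key, Category());
}
```

With a std::list the binary search overload could not even compile, since list iterators have no subtraction, so a successful call on a list shows that the fallback was selected.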

7 Range-based for Loops

Perhaps my favorite change enabled by the use of iterators is a simplified form of for loop called range-based for loops. (In other programming languages, these are called for-each loops.)

These loops take advantage of the fact that all std containers support iterators, that most C++ programmers who design their own containers (e.g. our Book as a container of authors) will provide iterators following the same convention, and that containers/iterators look like arrays/pointers.

If you have an array or an iterator-providing container and if you want to loop through all of the elements of that array or container, you can write:

for (ElementType variableName: containerOrArray)
  {
    ⋮
  }

For example, we can rewrite our earlier example:

double sum (const std::vector<double>& v) {
   double result = 0.0;
   for (auto it = v.begin(); it != v.end(); ++it)
      result += *it;
   return result;
}

as

double sum (const std::vector<double>& v) {
   double result = 0.0;
   for (double d: v)
      result += d;
   return result;
}

which is really quite nice.
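
The same loop form should work for our own classes, provided they supply begin() and end(). As a sketch, here is a tiny stand-in for a class like Book. NameList and joinNames are hypothetical names, and the capacity of 4 is arbitrary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a class like Book: a fixed-capacity
// collection that exposes begin() and end() for the part in use.
class NameList {
public:
  typedef std::array<std::string, 4>::const_iterator const_iterator;

  void add(const std::string& name) {
    assert(count < names.size());
    names[count] = name;
    ++count;
  }

  const_iterator begin() const { return names.begin(); }
  const_iterator end() const { return names.begin() + count; }

private:
  std::array<std::string, 4> names;
  std::size_t count = 0;
};

// The range-based for loop calls begin() and end() behind the scenes.
inline std::string joinNames(const NameList& list) {
  std::string result;
  for (const std::string& name : list) {
    if (!result.empty())
      result += ", ";
    result += name;
  }
  return result;
}
```

Declaring the loop variable as const std::string& rather than std::string avoids copying each element as the loop visits it.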


It’s not a universal replacement for all for loops, however.

Sometimes we need the iterator because we are really interested in the position information.

There’s no way, for example, to use this new loop in

template <typename T, size_t N>
typename std::array<T,N>::const_iterator search (const std::array<T,N>& v, T x)
{
  for (auto it = v.begin(); it != v.end(); ++it)
    {
     if (x == *it)
       return it;
    }
  return v.end();
}

because if we tried to write the loop as

  for (T y: v)
    {
     if (x == y)
        return it; // oops!
    }

we would no longer have the position it, which is the thing that we actually wanted to return.