Maps and MultiMaps

Steven J. Zeil

Last modified: Oct 26, 2023

Contents:

1 Interface

1.1 The template header

1.2 Internal type names

1.3 Insert & Erase

1.4 Access

2 Maps versus Sequences and Sets

2.1 Converting to map

2.2 Supplying iterators

3 An Extended Example: Literary Style Emulator

3.1 The Program

3.2 Once More, with MultiMaps

4 Sets, MultiSets, Maps, and MultiMaps: all in the family

4.1 From Set to Map

4.2 From Map to MultiSet

4.3 From Map to MultiMap

A map can be thought of as

- a lookup table, or as
- a generalization of an array/vector in which the index need not be numeric.

typedef map<string, int, less<string> >  ZipTable;
ZipTable zips;

zips["Jones"] = 23529;
zips["Zeil"] = 23452;
cout << zips["Zeil"];

A map associates a key type Key with an associated data type T. In the example above, the key type is string. The associated data is int.
The map allows you to store (key,data) pairs (e.g., the two assignment statements in the example above) and to retrieve the data previously stored with some key (e.g., the cout output statement).
Like sets, a map may only contain a single copy of any given Key value
- But a multimap can contain multiple copies of the same key.
- Like sets, you must supply a comparison functor class (for the Key type) or accept the less<Key> default.
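To make the single-copy rule concrete, here is a small sketch contrasting the two containers. The function names are invented for this illustration; only std::map and std::multimap themselves come from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Returns how many entries a map holds after the same key is stored twice.
std::size_t mapSizeAfterDuplicateKey()
{
    std::map<std::string, int> zips;
    zips["Zeil"] = 23452;
    zips["Zeil"] = 23529;              // same key: replaces the old data
    assert(zips["Zeil"] == 23529);
    return zips.size();                // still only one entry
}

// Returns how many pairs a multimap keeps for a key inserted twice.
std::size_t multimapCountAfterDuplicateKey()
{
    std::multimap<std::string, int> zips;
    zips.insert(std::make_pair(std::string("Zeil"), 23452));
    zips.insert(std::make_pair(std::string("Zeil"), 23529));
    return zips.count("Zeil");         // both pairs are kept
}
```

The first function yields 1 and the second yields 2.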

1 Interface

The code shown here presents a version of map that is pretty close to the standard.

#ifndef MAP_H
#define MAP_H

#include <utility>

template <class Key, class T, class Compare = less<Key> >
class map {
public:

// typedefs:

  typedef Key key_type;
  typedef T data_type;
  typedef pair<const Key, T> value_type;
  typedef Compare key_compare;
    

  typedef ... pointer;
  typedef ... const_pointer;
  typedef ... reference;
  typedef ... const_reference;
  typedef ... iterator;
  typedef ... const_iterator;
  typedef ... reverse_iterator;
  typedef ... const_reverse_iterator;
  typedef ... size_type;
  typedef ... difference_type;

  // allocation/deallocation

  map();
  explicit map(const Compare& comp);

  template <class InputIterator>
  map(InputIterator first, InputIterator last);

  template <class InputIterator>
  map(InputIterator first, InputIterator last, const Compare& comp);

  map(const map<Key, T, Compare>& x);
  map<Key, T, Compare>& operator=(const map<Key, T, Compare>& x);

  // accessors:

  key_compare key_comp() const;
  value_compare value_comp() const;

  iterator begin();
  const_iterator begin() const;
  iterator end();
  const_iterator end() const;

  reverse_iterator rbegin();
  const_reverse_iterator rbegin() const;
  reverse_iterator rend();
  const_reverse_iterator rend() const;

  bool empty() const;
  size_type size() const;
  size_type max_size() const;

  T& operator[](const key_type& k);
  T& at(const key_type& k);
  const T& at(const key_type& k) const;

  void swap(map<Key, T, Compare>& x);

  // insert/erase

  pair<iterator,bool> insert(const value_type& x);
  iterator insert(iterator position, const value_type& x);

  template <class InputIterator>
  void insert(InputIterator first, InputIterator last);

  void erase(iterator position);
  size_type erase(const key_type& x);
  void erase(iterator first, iterator last);
  void clear();

  // map operations:

  iterator find(const key_type& x);
  const_iterator find(const key_type& x) const;

  size_type count(const key_type& x) const;

  iterator lower_bound(const key_type& x);
  const_iterator lower_bound(const key_type& x) const;

  iterator upper_bound(const key_type& x);
  const_iterator upper_bound(const key_type& x) const;
  
  pair<iterator,iterator> equal_range(const key_type& x);
  pair<const_iterator,const_iterator> equal_range(const key_type& x) const;

};

template <class Key, class T, class Compare>
bool operator==(const map<Key, T, Compare>& x, 
                const map<Key, T, Compare>& y);

template <class Key, class T, class Compare>
inline bool operator<(const map<Key, T, Compare>& x, 
                      const map<Key, T, Compare>& y);

#endif

The map interface is styled very similarly to that of set, so we’ll concentrate on the “surprises” that map has in store. Keep in mind that everything we say for maps applies to multimaps as well, except for the limitation to having only a single copy of any given key value.

1.1 The template header

template <class Key, class T, class Compare = less<Key>>
class map

You will most often create a map by instantiating the template on the two data types that you wish to use for the key and for the associated data:

map<string, int> myStringToIntMap;

But, as with set, there is another parameter, Compare, used to compare key values. The Compare parameter defaults to less<Key>.
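As a rough illustration of what the Compare parameter changes, the sketch below fills two maps with the same keys, one with the default less<Key> and one with std::greater. The helper keysInOrder and the sample keys are made up for this example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Returns the keys of a map in iteration order, separated by spaces.
template <class Map>
std::string keysInOrder(const Map& m)
{
    std::string result;
    for (typename Map::const_iterator p = m.begin(); p != m.end(); ++p) {
        if (!result.empty()) result += ' ';
        result += p->first;
    }
    return result;
}

std::string ascendingKeys()
{
    std::map<std::string, int> m;      // Compare defaults to less<string>
    m["pear"] = 1; m["apple"] = 2; m["fig"] = 3;
    return keysInOrder(m);
}

std::string descendingKeys()
{
    std::map<std::string, int, std::greater<std::string> > m;
    m["pear"] = 1; m["apple"] = 2; m["fig"] = 3;
    assert(m.begin()->first == "pear");   // largest key now comes first
    return keysInOrder(m);
}
```

The comparison functor determines both how the map searches and the order in which its iterators visit the keys.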

1.2 Internal type names

// typedefs:

  typedef Key key_type;
  typedef T data_type;
  typedef pair<const Key, T> value_type;
  typedef Compare key_compare;
    

  typedef ... pointer;
  typedef ... const_pointer;
  typedef ... reference;
  typedef ... const_reference;
  typedef ... iterator;
  typedef ... const_iterator;
  typedef ... reverse_iterator;
  typedef ... const_reverse_iterator;
  typedef ... size_type;

Let’s look at the type names declared inside each map. As usual, we see const_iterator, iterator and size_type.

Like set, the map template defines type names for key_type, giving the data type of the keys in the container, and value_type, giving the data type that describes what we insert into the container and what is returned whenever we dereference (apply operator* to) an iterator.

For set and multiset, the value_type is the same as the key_type, but we can see that this is not true here. Instead the value_type is a pair. The first element of the pair is a key value, the second element a data value.

Notice also that the first element of the value_type pair is marked const. What this tells us is that, if we have a map iterator:

map<string,int>::iterator p = myStringToIntMap.find("foo");

we can use that iterator to change the data associated with this key:

(*p).second = 42;

but not to change the key at that location:

(*p).first = "bar"; // compiler flags this as an error

The exact form of value_type seems to vary among compilers, but it is always a key-data std::pair and it should always prohibit changing the key but allow changing the data.

So I would recommend not writing code like this:

pair<const Key, T> currentData = *mapIterator;

but suggest instead that you take advantage of the more convenient name that every map provides:

map<Key, T>::value_type currentData = *mapIterator;

or, in C++11,

auto currentData = *mapIterator;
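One thing to keep in mind is that each of the declarations above copies the pair, so changes made to the copy do not reach the map. A possible sketch of updating the data in place, binding a reference to the pair instead, follows. The Scores typedef and the function names are assumptions made for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, int> Scores;

// Adds bonus to every data value, leaving the keys untouched.
void addBonus(Scores& scores, int bonus)
{
    for (Scores::iterator p = scores.begin(); p != scores.end(); ++p) {
        Scores::value_type& entry = *p;   // a reference, not a copy
        entry.second += bonus;            // changing the data is allowed
        // entry.first = "x";             // would not compile: the key is const
    }
}

int bonusDemo()
{
    Scores scores;
    scores["ann"] = 10;
    scores["bob"] = 20;
    addBonus(scores, 5);
    assert(scores["ann"] == 15);
    return scores["ann"] + scores["bob"];
}
```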

1.3 Insert & Erase

// insert/erase

  pair<iterator,bool> insert(const value_type& x);
  iterator insert(iterator position, const value_type& x);

  template <class InputIterator>
  void insert(InputIterator first, InputIterator last);

  void erase(iterator position);
  size_type erase(const key_type& x);
  void erase(iterator first, iterator last);

Maps support the same kind of insert operations that we saw for sets.

You need to keep in mind, however, that you are inserting a key-data pair:

typedef map<PersonnelRecord, 
            string, 
            CompareByNameAddress>
    Departments;
Departments depts;
PersonnelRecord keyes, maly, zeil;
const string cs = "Computer Science";
const string math = "Mathematics";
  ⋮
depts.insert (Departments::value_type(keyes, math));
depts.insert (Departments::value_type(maly, cs));

value_type is a type name declared inside every std:: container and describes the data type that is inserted into the container and that is retrieved when we dereference an iterator. For the containers we have seen earlier, value_type was not all that useful, because we pretty much knew what kind of data we were inserting.

For maps and multimaps, however, value_type is a std::pair. pairs have a constructor that takes the two elements to be composed into the pair, which is what you are seeing in the expressions Departments::value_type(keyes, math) and Departments::value_type(maly, cs) above.

Another way to do much the same thing is with the std:: template make_pair:

depts.insert (make_pair(keyes, math));
depts.insert (make_pair(maly, cs));

As with sets, we can supply a position as a hint of where to store things. If we’re consistently right, we get an amortized $O(1)$ insertion time.

depts.insert (depts.end(), Departments::value_type(zeil, cs));
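The single-value form of insert also reports whether anything was actually added. The sketch below uses a simplified table keyed by name strings rather than PersonnelRecord, so the typedef and the helper assign are assumptions for illustration only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Simplified stand-in for the Departments table: name -> department.
typedef std::map<std::string, std::string> Departments;

// Tries to record a department; returns true only if the name was new.
bool assign(Departments& depts, const std::string& name, const std::string& dept)
{
    std::pair<Departments::iterator, bool> result =
        depts.insert(Departments::value_type(name, dept));
    return result.second;   // false: key already present, map left unchanged
}

std::string insertDemo()
{
    Departments depts;
    bool first = assign(depts, "maly", "Computer Science");
    bool second = assign(depts, "maly", "Mathematics");   // ignored
    assert(first && !second);
    return depts["maly"];
}
```

Note that, for a map, insert never overwrites the data of a key that is already present.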

A simpler way to insert is to use the indexing notation (available for maps, but not for multimaps). We could write the same set of insertions this way:

depts[keyes] = math;
depts[maly] = cs;
depts[zeil] = cs;

which certainly looks a lot more attractive.

This shorter form can be a bit inefficient however. Here’s how it works:

The expression on the left of the assignment is evaluated first.
- The depts map’s operator[] is invoked.
- This searches through the depts map for any key-data pair having, in the first assignment, keyes as its key part.
  - If keyes is not in the map yet, a new pair is added to the map with keyes as its key and the default constructor for the data type (string()) used to create the initial value of the data part.
- A reference (string&) to the data part of the key-data pair is returned from operator[].
The expression on the right of the assignment is evaluated, yielding a string.
The string from the right side is assigned to the string& returned on the left, overwriting the value already there.

Now, only a single search through the table is done, but two different strings get put in there, the second replacing the first. So if our data type was one in which the default constructor was particularly expensive (the default constructor for string simply produces an empty string "", so that’s no big deal), then we might want to avoid the extra data value creation and use the insert function instead.
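One way to observe the extra data value is to use a data type that counts its own default constructions. The Tracked type below is purely hypothetical and exists only for this experiment.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical data type that counts how often it is default-constructed.
struct Tracked {
    static int defaultConstructions;
    std::string text;
    Tracked() { ++defaultConstructions; }
    explicit Tracked(const std::string& t) : text(t) {}
};
int Tracked::defaultConstructions = 0;

int defaultsUsedByIndexing()
{
    Tracked::defaultConstructions = 0;
    std::map<std::string, Tracked> m;
    m["keyes"] = Tracked("Mathematics");   // default-construct, then assign
    assert(m["keyes"].text == "Mathematics");
    return Tracked::defaultConstructions;
}

int defaultsUsedByInsert()
{
    Tracked::defaultConstructions = 0;
    std::map<std::string, Tracked> m;
    m.insert(std::make_pair(std::string("keyes"), Tracked("Mathematics")));
    return Tracked::defaultConstructions;
}
```

The indexing version performs one default construction for the new key, while the insert version performs none.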

1.4 Access

1.4.1 find

iterator find(const key_type& x);
const_iterator find(const key_type& x) const;

size_type count(const key_type& x) const;

We can search maps for key values in much the same way that we do sets:

typedef map<PersonnelRecord, string, CompareByNameAddress>
    Departments;
Departments depts;
  ⋮
Departments::iterator i;
i = depts.find(maly);
if (i != depts.end())
  cout << maly.name() << " is in the "
       << i->second << " department."
       << endl;
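The same find-then-test pattern can be wrapped in a small helper. This is only a sketch: the string-keyed Departments typedef and the function departmentOf are assumptions, not part of the example above.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::string> Departments;

// Returns the department recorded for name, or fallback if there is none.
// The map is never modified.
std::string departmentOf(const Departments& depts, const std::string& name,
                         const std::string& fallback)
{
    Departments::const_iterator i = depts.find(name);
    return (i != depts.end()) ? i->second : fallback;
}

void findDemo()
{
    Departments depts;
    depts.insert(Departments::value_type("maly", "Computer Science"));
    assert(departmentOf(depts, "maly", "unknown") == "Computer Science");
    assert(departmentOf(depts, "zeil", "unknown") == "unknown");
    assert(depts.size() == 1);   // the failed search added nothing
}
```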

1.4.2 operator[] and at()

Again, for maps (but not multimaps), we can use the more convenient indexing notation:

typedef map<PersonnelRecord, string, CompareByNameAddress>
    Departments;
Departments depts;
  ⋮
cout << maly.name() << " is in the "
     << depts[maly] << " department."
     << endl;

But this code isn’t really the same. Remember, the operator[] for a map searches the map for the given key, and, if it doesn’t find it, adds it to the map. So if maly wasn’t already in the map, we would see the following output:

Maly is in the  department.

(The new entry for maly would be created using the string default constructor for the data.)
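A quick way to convince yourself of this side effect is to compare the map's size before and after such a "read". The function name below is made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Returns how much the map grows when a missing key is merely read with [].
std::size_t growthFromReadingMissingKey()
{
    std::map<std::string, std::string> depts;
    depts["zeil"] = "Computer Science";
    std::size_t before = depts.size();

    std::string d = depts["maly"];   // maly is absent: an empty entry is added
    assert(d.empty());

    return depts.size() - before;
}
```

The function returns 1: the lookup itself added an entry.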

The following hybrid code, which I often see in student programs, is particularly inefficient:

typedef map<PersonnelRecord, string, CompareByNameAddress>
    Departments;
Departments depts;
  ⋮
Departments::iterator i = depts.find(maly);
if (i != depts.end())
  cout << maly.name() << " is in the "
       << depts[maly] << " department."
       << endl;

because it actually searches the map twice for the same value. Even though searching a map is only $O(\log(\mbox{size()}))$, there’s no point to doubling the execution time like this.

In a similar vein, the fact that map access looks so much like accessing an array often leads people to write code like:

map<string, int> wordCounts;
  ⋮
wordCounts[word] = wordCounts[word] + 1;

If wordCounts were an array, most compilers would, at least on their highest optimization settings, recognize that the two indexing expressions access the same address and would avoid doing the calculation twice. But no compiler will perform that optimization when wordCounts is a map, so the above code does two searches through the map. Better would be:

++wordCounts[word];

or

int& count = wordCounts[word];
++count;

or, if word is known to be in the map already (otherwise find returns end(), which must not be dereferenced),

map<string,int>::iterator p = wordCounts.find(word);
++(p->second);

or, in C++11,

auto p = wordCounts.find(word);
++(p->second);
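Putting the single-search form to work, a complete word counter might look like the following sketch. The function countWords and its whitespace-based splitting are assumptions chosen for brevity.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Counts how often each whitespace-separated word occurs in text.
std::map<std::string, int> countWords(const std::string& text)
{
    std::map<std::string, int> wordCounts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        ++wordCounts[word];   // one search; a new word starts from int(), i.e. 0
    }
    return wordCounts;
}

void countWordsDemo()
{
    std::map<std::string, int> counts = countWords("the cat and the hat");
    assert(counts.size() == 4);
    assert(counts["the"] == 2);
    assert(counts["cat"] == 1);
}
```

Here the insert-on-miss behavior of operator[] is exactly what we want, because a newly seen word should start with a count of zero.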

If we want access similar to [ ] but without running the risk of adding new entries to the map, we can use the at function.

typedef map<PersonnelRecord, string, CompareByNameAddress>
    Departments;
Departments depts;
  ⋮
cout << maly.name() << " is in the "
     << depts.at(maly) << " department."
     << endl;

The difference between operator[] and at() is what happens when the thing we are looking for is not in the map. Given a declaration

map<Key,Data> myMap;
Key x;  
Data y;
 ⋮

| | (`x`,`y`) is in `myMap` | `x` is not a key in `myMap` |
|---|---|---|
| `myMap[x]` | returns `y` | inserts (`x`,`Data()`) and returns `Data()` |
| `myMap.at(x)` | returns `y` | `out_of_range` exception |

If we try to use at to retrieve a key that is not in the map, an exception (run-time error) is signaled.
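If a missing key is a real possibility, the exception can be caught. The sketch below assumes a string-keyed table and an invented helper named describe. Note also that at, unlike operator[], can be called on a const map.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Looks up name with at(); reports a missing key instead of inserting it.
std::string describe(const std::map<std::string, std::string>& depts,
                     const std::string& name)
{
    try {
        return name + " is in the " + depts.at(name) + " department.";
    } catch (const std::out_of_range&) {
        return name + " is not in the table.";
    }
}

void atDemo()
{
    std::map<std::string, std::string> depts;
    depts["maly"] = "Computer Science";
    assert(describe(depts, "maly") == "maly is in the Computer Science department.");
    assert(describe(depts, "zeil") == "zeil is not in the table.");
    assert(depts.size() == 1);   // at() did not add an entry for zeil
}
```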

1.4.3 lower_bound, upper_bound, & equal_range

iterator lower_bound(const key_type& x);
const_iterator lower_bound(const key_type& x) const;

iterator upper_bound(const key_type& x);
const_iterator upper_bound(const key_type& x) const;

pair<iterator,iterator> equal_range(const key_type& x);
pair<const_iterator,const_iterator> equal_range(const key_type& x) const;

Searching for a range of positions – most useful with multimaps.

If we are using a multimap, we can have multiple data values associated with the same key. So a find(...) operation that returns only a single position is probably not what we need.

For example, suppose we were implementing an appointment calendar and had:

multimap<Date, string> events;
  ⋮
events.insert(make_pair(Date("9/21/2016"), "project meeting"));
events.insert(make_pair(Date("9/21/2016"), "status report due"));
events.insert(make_pair(Date("9/21/2016"), "seminar"));
events.insert(make_pair(Date("9/22/2016"), "guest lecture"));
events.insert(make_pair(Date("9/24/2016"), "assignment due"));

We can envision this multimap as a sequence of pairs (even though we expect that it’s really stored in a binary search tree or similar data structure).

events: [(9/21/2016, "project meeting"), (9/21/2016, "status report due"),
         (9/21/2016, "seminar"), (9/22/2016, "guest lecture"),
         (9/24/2016, "assignment due")]

Now, if we do

auto pos = events.find(Date("9/21/2016"));

then we will get an iterator pos that points to one of the three events for 9/21, but we have no guarantee which of the three we will see. This would not be a problem for an ordinary map, because there could only be one pair with that key in an ordinary map, but you can see this might be less than ideal for a multimap.

lower_bound returns the position of the first pair with a given key. upper_bound gives the position just after the last pair with that key. So we can list all of the events for 9/21 with the following code:

Date d921 = Date("9/21/2016");
for (auto pos = events.lower_bound(d921);
     pos != events.upper_bound(d921); ++pos)
{
  cout << pos->second << endl;
}

Note that upper_bound is not the same as end(). end() gives us the end of the entire container. upper_bound gives us the end of a range of positions for a specific key. In the example we have been working with, upper_bound(Date("9/21/2016")) is the same position as lower_bound(Date("9/22/2016")), because the first 9/22 event immediately follows the last 9/21 event.

This code has one flaw, however. It actually searches the multimap twice, once to get the lower bound and once to get the upper bound. equal_range lets us get both in a single search, returning a pair of iterators:

auto events921 = events.equal_range(Date("9/21/2016"));
for (auto pos = events921.first;
     pos != events921.second; ++pos)
{
  cout << pos->second << endl;
}
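For a version that can be compiled on its own, the sketch below stores dates as plain strings in place of the Date class, which is not shown in these notes. The Calendar typedef and the function eventsOn are assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Dates are plain strings here as a stand-in for the Date class.
typedef std::multimap<std::string, std::string> Calendar;

// Collects every event stored under the given date, using a single search.
std::vector<std::string> eventsOn(const Calendar& events, const std::string& date)
{
    std::pair<Calendar::const_iterator, Calendar::const_iterator> range =
        events.equal_range(date);
    std::vector<std::string> found;
    for (Calendar::const_iterator pos = range.first; pos != range.second; ++pos) {
        found.push_back(pos->second);
    }
    return found;
}

void calendarDemo()
{
    Calendar events;
    events.insert(Calendar::value_type("9/21/2016", "project meeting"));
    events.insert(Calendar::value_type("9/21/2016", "status report due"));
    events.insert(Calendar::value_type("9/21/2016", "seminar"));
    events.insert(Calendar::value_type("9/22/2016", "guest lecture"));
    assert(eventsOn(events, "9/21/2016").size() == 3);
    assert(eventsOn(events, "9/23/2016").empty());   // empty range, not an error
}
```

When the key is absent, equal_range returns two equal iterators, so the loop simply does nothing.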

2 Maps versus Sequences and Sets

As we did with sets, we might want to consider the circumstances in which we would prefer to use a map rather than a sequence (or a set).

What gets stored
- When we create a sequence, it is a collection of one kind of value.
- When we create a set, it is a collection of one kind of value.
- When we create a map, it is a collection of pairs of possibly different types, a key and an associated value type.
Inserting
- When we insert into a sequence, we insert at a specific position. Our code must locate the proper position at which to insert. If we don’t want duplicates, it is up to us to test to see if the value to be inserted is already in place.
- When we insert into a set, we allow the set to determine the location at which to do the insertion. Inserted values are ordered automatically. Duplicates are handled automatically.
- When we insert into a map, we allow the map to determine the location at which to insert the key and associated value. Inserted data is ordered by key automatically. Duplicates are handled automatically.
Searching
- When we use a sequence, we must write (or use) a separate search function to scan the sequence for a desired value.
- When we use a set, we use a built-in function to quickly search for a value.
- When we use a map, we use a built-in function to search for a key and retrieve the associated value.
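The difference in searching can be seen in a small side-by-side sketch. The Entry record, the two lookup functions, and the use of -1 to signal "not found" are all assumptions made for this comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record type used only for this comparison.
struct Entry {
    std::string name;
    int zip;
};

// Sequence: we scan it ourselves, looking at each element in turn.
int zipFromVector(const std::vector<Entry>& v, const std::string& name)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i].name == name) return v[i].zip;
    return -1;
}

// Map: the container searches by key and hands back the associated data.
int zipFromMap(const std::map<std::string, int>& m, const std::string& name)
{
    std::map<std::string, int>::const_iterator p = m.find(name);
    return (p != m.end()) ? p->second : -1;
}

void searchDemo()
{
    Entry zeil = {"Zeil", 23452};
    std::vector<Entry> v(1, zeil);
    std::map<std::string, int> m;
    m[zeil.name] = zeil.zip;
    assert(zipFromVector(v, "Zeil") == zipFromMap(m, "Zeil"));
    assert(zipFromVector(v, "Jones") == -1 && zipFromMap(m, "Jones") == -1);
}
```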

Let’s revisit the publisher example and see how these ideas play out.

We start with this:

publisher0.h

/*
 * publisher.h
 *
 *  Created on: May 23, 2018
 *      Author: zeil
 */

#ifndef PUBLISHER_H_
#define PUBLISHER_H_

#include <iostream>
#include <set>
#include <string>
#include <vector>
#include "author.h"
#include "book.h"

class Publisher
{
public:
	typedef std::vector<Author>::iterator author_iterator;
	typedef std::vector<Author>::const_iterator const_author_iterator;
    typedef std::set<Book>::iterator book_iterator;
	typedef std::set<Book>::const_iterator const_book_iterator;

  Publisher (std::string theName = std::string());

  std::string getName() const        {return name;}
  void setName (std::string theName) {name = theName;}

  int numberOfBooks() const;
  book_iterator begin() {return books.begin();}
  book_iterator end() {return books.end();}
  const_book_iterator begin() const {return books.begin();}
  const_book_iterator end() const {return books.end();}

  void addBook (Book& b);

  int numberOfAuthors() const;
  author_iterator begin_authors() {return authors.begin();}
  author_iterator end_authors() {return authors.end();}
  const_author_iterator begin_authors() const {return authors.begin();}
  const_author_iterator end_authors() const {return authors.end();}

  author_iterator getAuthor(std::string name);
  const_author_iterator getAuthor(std::string name) const;

  void addAuthor (const Author& au);

  bool operator== (const Publisher& right) const;
  bool operator< (const Publisher& right) const;


private:
  std::string name;

  std::set<Book> books;

  std::vector<Author> authors;

};


std::ostream& operator<< (std::ostream& out, const Publisher& publ);

#endif /* PUBLISHER_H_ */

The author-related portions of this interface (the author iterator typedefs, begin_authors, end_authors, getAuthor, addAuthor, and the authors data member) make it clear that the Publisher is, among other things, a collection of Authors. The private data and the author_iterator type declarations make it clear that we are currently storing those authors in a sequence (a vector, to be precise).

But the getAuthor declarations show that, when we search, we don’t give the publisher an entire Author to search for, only the author’s name. The fact that our searches involve supplying one type to retrieve another suggests that a map may be useful. That’s not to say that we cannot use a sequence or set for this purpose. The existing code already manages that using a sequence:

Publisher::const_author_iterator Publisher::getAuthor(std::string name) const
{
	return find_if (authors.begin(), authors.end(),
			[&] (const Author& au) {return name == au.getName();});
}

But this seems to be the kind of job that maps were designed to perform.

2.1 Converting to `map`

The basic conversion to a map is not all that difficult.

We replace the vector with a map in the data declaration:

class Publisher
{
public:
    ⋮
private:
  std::string name;

  std::set<Book> books;

  typedef std::map<std::string, Author> AuthorMap;
  AuthorMap authors;

};

We use a map with strings for the keys because we know that, when we search, we search by author name. We use Author for the associated value because what the Publisher is really holding here is a collection of Authors.

We can now use the map’s built-in functions to search. Instead of using the find_if sequential search that we saw above, we can now write:

Publisher::const_author_iterator Publisher::getAuthor(std::string name) const
{
	return authors.find(name);
}

which is definitely simpler. And it will be faster as well.

With the vector, our insertion function looked like this:

void Publisher::addAuthor (const Author& au)
{
	auto pos = find(authors.begin(), authors.end(), au);
	if (pos == authors.end())
	{
		authors.push_back(au);
	}
}

Now we can use the map’s indexing operator instead:

void Publisher::addAuthor (const Author& au)
{
	authors[au.getName()] = au;
}

This is, however, not as efficient as we might like. If this is the first time we have encountered this author, then the expression authors[au.getName()] will create a slot in the map, initializing it with the default value Author(). Then we immediately copy over that with the following assignment operator, ... = au;.

A better approach is

void Publisher::addAuthor (const Author& au)
{
	authors.insert(AuthorMap::value_type(au.getName(), au));
}

which puts the desired value au into the slot immediately.
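The insert form has two further properties worth noting: it leaves an existing entry alone, much as the vector version declined to add a duplicate, and it does not require the data type to have a default constructor. The sketch below uses a simplified stand-in for Author, so SimpleAuthor and the free function addAuthor are assumptions rather than the classes from the publisher example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for Author: note that it has no default constructor.
class SimpleAuthor {
public:
    explicit SimpleAuthor(const std::string& n) : name(n) {}
    std::string getName() const { return name; }
private:
    std::string name;
};

typedef std::map<std::string, SimpleAuthor> AuthorMap;

// Adds au unless an author with the same name is already present.
// Returns true if the author was added.
bool addAuthor(AuthorMap& authors, const SimpleAuthor& au)
{
    // authors[au.getName()] = au;   // would not compile: needs SimpleAuthor()
    return authors.insert(AuthorMap::value_type(au.getName(), au)).second;
}

void addAuthorDemo()
{
    AuthorMap authors;
    assert(addAuthor(authors, SimpleAuthor("Carroll")));
    assert(!addAuthor(authors, SimpleAuthor("Carroll")));   // duplicate ignored
    assert(authors.size() == 1);
}
```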

2.2 Supplying iterators

There is one significant complication, however. We cannot simply replace our iterators:

	typedef std::vector<Author>::iterator author_iterator;
	typedef std::vector<Author>::const_iterator const_author_iterator;
	typedef std::set<Book>::iterator book_iterator;
	typedef std::set<Book>::const_iterator const_book_iterator;

by iterators over the map:

	typedef std::map<std::string, Author>::iterator author_iterator;
	typedef std::map<std::string, Author>::const_iterator const_author_iterator;
	typedef std::set<Book>::iterator book_iterator;
	typedef std::set<Book>::const_iterator const_book_iterator;

or even

	typedef AuthorMap::iterator author_iterator;
	typedef AuthorMap::const_iterator const_author_iterator;
	typedef std::set<Book>::iterator book_iterator;
	typedef std::set<Book>::const_iterator const_book_iterator;

The reason for this is that the AuthorMap is a collection of (string, Author) pairs, so the iterators it supplies give us access to (string, Author) pairs. But we want the Publisher’s author iterators to give us access to a series of Authors.

The fix for this is not difficult, though it is a bit tedious.

We need to create our own iterator class.
Our iterator will hold an AuthorMap iterator inside it.
But when we try to dereference that iterator, it will only give us the Author part of the (string,Author) pair.

Here is the declaration for our new iterator class:

authorIterator.h

/*
 * authorIterator.h
 *
 *  Created on: June 30, 2020
 *      Author: zeil
 */

#ifndef AUTHORITERATOR_H_
#define AUTHORITERATOR_H_

#include <iterator>
#include <map>
#include <string>
#include "author.h"

class Publisher;

class AuthorIterator {
public:
	using iterator_category = std::bidirectional_iterator_tag;
	using value_type = Author;
	using difference_type = std::ptrdiff_t;
	using pointer = const Author*;
	using reference = const Author&;

	AuthorIterator() {}

	// Get the data element at this position
	reference operator*() const;
	pointer operator->() const;

	// Move position forward 1 place
	AuthorIterator& operator++();
	AuthorIterator operator++(int);

	// Move position backward 1 place
	AuthorIterator& operator--();
	AuthorIterator operator--(int);

	// Comparison operators
	bool operator== (const AuthorIterator& it) const;
	bool operator!= (const AuthorIterator& it) const;
private:
	typedef std::map<std::string, Author> Container;
	Container::const_iterator position;

	AuthorIterator (Container::const_iterator pos)
	: position(pos) {}

	friend class Publisher;
};



#endif

It’s pretty much the standard interface for an iterator.

The most interesting parts are in the private: area.

The only data member is itself an iterator from the kind of map used in the Publisher.
However, we have a private constructor for building AuthorIterators that store a specific position within the map.
Then Publisher is named as a “friend” of this class. Friends are allowed access to private members, so Publisher can use that private constructor to convert begin() and end() positions in the map into AuthorIterator values.

The implementation is nearly all one-liners. For example, the operator* looks like:

// Get the data element at this position
AuthorIterator::reference AuthorIterator::operator*() const
{
	return position->second;
}

When we want to get the value at an AuthorIterator, we get the second part of the (string, Author) element inside the map.

Moving forward and backward are no harder, e.g.,

// Move position forward 1 place
AuthorIterator& AuthorIterator::operator++()
{
	++position;
	return *this;
}

Returning to the Publisher class, then we have little to do except to declare the appropriate iterator types:

class Publisher
{
public:
	typedef AuthorIterator author_iterator;
	typedef AuthorIterator const_author_iterator;

Because sets order themselves, they do not provide iterators that allow one to alter their elements directly. For a set, both the const_iterator and the iterator behave in a “const iterator” fashion – you can use them to look at the data but not to change the data. A map’s iterator is a little more permissive, since it allows changing the data part of a pair (though never the key), but our AuthorIterator wraps a map const_iterator and so offers read-only access, just as a set’s iterators do.

We adopt the same idea here, which means that we can replace several pairs of functions by just one. So the old declarations

  author_iterator begin_authors() {return authors.begin();}
  author_iterator end_authors() {return authors.end();}
  const_author_iterator begin_authors() const {return authors.begin();}
  const_author_iterator end_authors() const {return authors.end();}

  author_iterator getAuthor(std::string name);
  const_author_iterator getAuthor(std::string name) const;

now collapse down to just:

  const_author_iterator begin_authors() const {
      return AuthorIterator(authors.begin());}
  const_author_iterator end_authors() const {
      return AuthorIterator(authors.end());}

  const_author_iterator getAuthor(std::string name) const;

Note the use of the AuthorIterator private constructor to build an AuthorIterator from a map iterator.
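The wrapping idea can be tried out in isolation with a cut-down sketch. Everything here is a simplification for illustration: SimpleAuthor stands in for Author, DataIterator for AuthorIterator, and the constructor is public rather than private with a friend, so that the example needs no Publisher class.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Minimal stand-in for the Author class.
struct SimpleAuthor {
    std::string name;
    std::string getName() const { return name; }
};

typedef std::map<std::string, SimpleAuthor> AuthorMap;

// Walks an AuthorMap but exposes only the data half of each pair.
class DataIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = SimpleAuthor;
    using difference_type = std::ptrdiff_t;
    using pointer = const SimpleAuthor*;
    using reference = const SimpleAuthor&;

    DataIterator() {}
    explicit DataIterator(AuthorMap::const_iterator pos) : position(pos) {}

    reference operator*() const { return position->second; }
    pointer operator->() const { return &(position->second); }
    DataIterator& operator++() { ++position; return *this; }
    DataIterator operator++(int) { DataIterator saved(*this); ++position; return saved; }
    bool operator==(const DataIterator& it) const { return position == it.position; }
    bool operator!=(const DataIterator& it) const { return position != it.position; }
private:
    AuthorMap::const_iterator position;
};

// Joins the author names in key order, touching only SimpleAuthor objects.
std::string listNames(const AuthorMap& authors)
{
    std::string result;
    for (DataIterator it(authors.begin()); it != DataIterator(authors.end()); ++it) {
        if (!result.empty()) result += ", ";
        result += it->getName();
    }
    return result;
}

void iteratorDemo()
{
    AuthorMap authors;
    SimpleAuthor zeil = {"Zeil"};
    SimpleAuthor carroll = {"Carroll"};
    authors.insert(AuthorMap::value_type(zeil.getName(), zeil));
    authors.insert(AuthorMap::value_type(carroll.getName(), carroll));
    assert(listNames(authors) == "Carroll, Zeil");   // visited in key order
}
```

Code that uses the wrapper never sees a pair, only the authors, which is the effect we want from Publisher's author iterators.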

The complete code for the converted publisher is:

publisher.h

/*
 * publisher.h
 *
 *  Created on: June 30, 2020
 *      Author: zeil
 */

#ifndef PUBLISHER_H_
#define PUBLISHER_H_

#include <iostream>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>
#include "author.h"
#include "authorIterator.h"
#include "book.h"


class Publisher
{
public:
	typedef AuthorIterator author_iterator;
	typedef AuthorIterator const_author_iterator;
	typedef std::set<Book>::iterator book_iterator;
	typedef std::set<Book>::const_iterator const_book_iterator;

  Publisher (std::string theName = std::string());

  std::string getName() const        {return name;}
  void setName (std::string theName) {name = theName;}

  int numberOfBooks() const;
  book_iterator begin() {return books.begin();}
  book_iterator end() {return books.end();}
  const_book_iterator begin() const {return books.begin();}
  const_book_iterator end() const {return books.end();}

  void addBook (Book& b);

  int numberOfAuthors() const;
  const_author_iterator begin_authors() const {return AuthorIterator(authors.begin());}
  const_author_iterator end_authors() const {return AuthorIterator(authors.end());}

  const_author_iterator getAuthor(std::string name) const;

  void addAuthor (const Author& au);

  bool operator== (const Publisher& right) const;
  bool operator< (const Publisher& right) const;


private:
  std::string name;

  std::set<Book> books;

  typedef std::map<std::string, Author> AuthorMap;
  AuthorMap authors;

};


std::ostream& operator<< (std::ostream& out, const Publisher& publ);

#endif /* PUBLISHER_H_ */

authorIterator.h

/*
 * authorIterator.h
 *
 *  Created on: June 30, 2020
 *      Author: zeil
 */

#ifndef AUTHORITERATOR_H_
#define AUTHORITERATOR_H_

#include <iterator>
#include <map>
#include <string>
#include "author.h"

class Publisher;

class AuthorIterator {
public:
	using iterator_category = std::bidirectional_iterator_tag;
	using value_type = Author;
	using difference_type = std::ptrdiff_t;
	using pointer = const Author*;
	using reference = const Author&;

	AuthorIterator() {}

	// Get the data element at this position
	reference operator*() const;
	pointer operator->() const;

	// Move position forward 1 place
	AuthorIterator& operator++();
	AuthorIterator operator++(int);

	// Move position backward 1 place
	AuthorIterator& operator--();
	AuthorIterator operator--(int);

	// Comparison operators
	bool operator== (const AuthorIterator& it) const;
	bool operator!= (const AuthorIterator& it) const;
private:
	typedef std::map<std::string, Author> Container;
	Container::const_iterator position;

	AuthorIterator (Container::const_iterator pos)
	: position(pos) {}

	friend class Publisher;
};



#endif

publisher.cpp

/*
 * publisher.cpp
 *
 *  Created on: May 10, 2020
 *      Author: zeil
 */

#include "publisher.h"
#include "author.h"
#include "book.h"

#include <algorithm>
#include <cassert>

using namespace std;

Publisher::Publisher (std::string theName)
: name(theName)
{}


int Publisher::numberOfBooks() const
{
   return books.size();
}


void Publisher::addBook (Book& b)
{
	b.setPublisher(*this);
	books.insert(b);
}

int Publisher::numberOfAuthors() const
{
   return authors.size();
}



Publisher::const_author_iterator Publisher::getAuthor(std::string name) const
{
	return authors.find(name);
}



void Publisher::addAuthor (const Author& au)
{
	// authors[au.getName()] = au;
	authors.insert(AuthorMap::value_type(au.getName(), au));
}

bool Publisher::operator== (const Publisher& right) const
{
   return name == right.name;
}

bool Publisher::operator< (const Publisher& right) const
{
   return name < right.name;
}

std::ostream& operator<< (std::ostream& out, const Publisher& publ)
{
	out << publ.getName() << ": \n";
	out << "  authors:";
	for (auto it = publ.begin_authors(); it != publ.end_authors();  ++it)
		out << ' ' << it->getName();
	out << "\n  books:";
	for (const Book& book: publ)
		out << ' ' << book.getTitle() << ';';
	return out;
}

authorIterator.cpp

/*
 * authorIterator.cpp
 *
 *  Created on: May 10, 2020
 *      Author: zeil
 */

#include "authorIterator.h"
#include "publisher.h"
#include "author.h"

#include <algorithm>

using namespace std;


// Get the data element at this position
AuthorIterator::reference AuthorIterator::operator*() const
{
	return position->second;
}

AuthorIterator::pointer AuthorIterator::operator->() const
{
	return &(position->second);
}

// Move position forward 1 place
AuthorIterator& AuthorIterator::operator++()
{
	++position;
	return *this;
}

AuthorIterator AuthorIterator::operator++(int)
{
	AuthorIterator saved (position);
	++position;
	return saved;
}

// Move position backward 1 place
AuthorIterator& AuthorIterator::operator--()
{
	--position;
	return *this;
}

AuthorIterator AuthorIterator::operator--(int)
{
	AuthorIterator saved (position);
	--position;
	return saved;
}

// Comparison operators
bool AuthorIterator::operator== (const AuthorIterator& it) const
{
	return position == it.position;
}

bool AuthorIterator::operator!= (const AuthorIterator& it) const
{
	return position != it.position;
}

3 An Extended Example: Literary Style Emulator

In those cases, we can copy work? The standard says that the functor can occur at all, or group of am associative containers, ones that is a different from the same key value and the course have data value (In some ordering on its destination positions of the map - The name and trust the object that we are called just ask if is somewhat oversimplified. The text’s discussion, especially more mathematical one, the key values are checked out by “multiset”) - The set of the of keys in a multiset of that default: or - Now, let’s write functors for later.

No, the above paragraph was not supposed to make sense. In fact, I did not write it. It was written by feeding the text of the previous lecture notes on sets into a simple “style emulator” program that attempts to generate text in the same “style” as a sample document it has been given. It does this by two simple rules:

It begins its output with a word that was used to begin a sentence in the sample document.
After producing any word w, it chooses its next word from among those that occurred after the same word w in the sample document.

The program produces gibberish, but gibberish that draws on the same vocabulary and phrasing as the sample document’s author.

The differences in output, given different sample documents, can be quite striking. Here is an output from the program after feeding it a copy of Shakespeare’s Hamlet:

Pooh! you hear me, he so; At his meed he’s the earth which dearest father grossly, full of majesty should murder sanctuarize; Revenge should do a piece of this.– There is this should be as level to tell us so and less than another? Why, look like a more willingly part them, sir? They are embark’d: farewell: And, as grace, As e’er my blood, And he beseech’d me packing: I’ll teach me. The tedious old man. My fault is to be at thy distemper Sprinkle cool patience! Whereon the carriages, sir, but reserve thy brain,

and after feeding it the U.S. Constitution:

But in the United States; To promote the Constitution of Marque and punish its own Members, and of all Crimes, except as under this Union, according to the several States, and establish. The President of the Appointment of the United States; and Water; To promote the Contrary notwithstanding. The Privilege of three Years after it shall be as to which shall be eligible to pay Duties of Impeachment shall be divided as may, on the Legislature, which shall be President of Commerce or being disapproved by the United States. A Person holding any State, the Senators from any State, the

and, finally, after giving it a copy of Lewis Carroll’s Alice Through the Looking Glass:

’Mind the word. ‘It’s very ignorant of you,’ the bow, not being punished: and all what I go on, turning from day was going on: ‘for I think about. They looked round them into this time that the silence, with a pause in fact, the Knight said “if”–’ ’But I’ve had eyes and one of puzzling, was, before she has the house! Well then, when he’s happy. His name them, ’just in bright thought that needed any lions or other, rolled down there, they’ve each other White King was a little further, ‘to be a battle?’ Tweedledum spied a Queen,

3.1 The Program

Now let’s look at how this program can be written.

Here is the high-level pseudocode for the main routine.

int main() {
  get sample doc filename and N, the # of 
    words to generate, from command-line args;
  read the sample document, collecting all
    sentence-starting words and all consecutive
    word pairs;
  generate N words of text from the collected
    sentence-starting words and consecutive pairs;
}

We’ll tackle this as a top-down design. Starting with this pseudocode, we could choose to expand any of these statements. For now, let’s just assume that the last two steps will be handled by separate functions.

3.1.1 main() calls functions for bulk of processing

int main() {
  get sample doc filename and N, the # of 
    words to generate, from command-line args;

  // A set of words that have been used to start new sentences.
  ??? startingWords;

  // For each word appearing in the document, a vector of all the
  // words that have immediately followed it.
  ??? consecutiveWords;

  readDocument (filename of sample doc, 
                startingWords, consecutiveWords);
  generateEmulatedText (startingWords, consecutiveWords, N);

  return 0;
}

Getting the command line arguments is one of those things that may appear mysterious at first, but because we do it for almost every program we write, it can eventually be done almost by rote.

3.1.2 main(): Command-Line Arguments

int main(int argc, char** argv) {
  if (argc != 3)
    {
      cerr << "Usage: " << argv[0] << " document N\n"
           <<  "  where document is a plain-text document and\n"
           <<  "  N is the number of words of output desired." << endl;
      return -1;
    }

  srand (time(0));

  int N;
  { // convert parameter string to an int
    istringstream Nin (argv[2]);
    Nin >> N;
  }

  // A set of words that have been used to start new sentences.
  ??? startingWords;

  // For each word appearing in the document, a vector of all the
  // words that have immediately followed it.
  ??? consecutiveWords;

  readDocument (argv[1], startingWords, consecutiveWords);
  generateEmulatedText (startingWords, consecutiveWords, N);

  return 0;
}

We also throw in a use of srand to initialize the random number generator, as we will clearly be making random selections within this program.
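Returning to the conversion of argv[2] for a moment: the sketch below shows one way that step might be made a little more defensive, using std::istringstream (the current replacement for the deprecated istrstream). The helper name parseWordCount and its convention of returning -1 on bad input are assumptions made for illustration, not part of the original program.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: convert a command-line argument to a non-negative
// word count. Returns -1 if the text is not a valid non-negative integer.
int parseWordCount (const std::string& arg)
{
  std::istringstream in (arg);
  int n;
  if (!(in >> n) || n < 0)
    return -1;          // not a number at all, or negative
  char leftover;
  if (in >> leftover)
    return -1;          // trailing junk, as in "12abc"
  return n;
}
```

With something like this in place, main could print the usage message whenever parseWordCount(argv[2]) comes back negative, instead of silently generating from an uninitialized N.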

Let’s look next at the process of generating the emulated text. (As a personal preference, I usually work on designing the “heart” of a program before worrying about its input/output routines, because I believe that you often discover important I/O requirements when you get into the “core” algorithms.)

3.1.3 generateEmulatedText()

Here’s a reasonable start, but it has some problems.

void generateEmulatedText
   (const ???& startingWords,
    const ???& consecutiveWords,
    int N)
{
  randomly select and print one word 
    from startingWords;
  do this N-1 times {
     Look up the last-printed word in
       consecutiveWords, getting all the 
       words that followed that one in the
       sample document.
     Randomly select one word from among those 
       followers and print it;
  }      
}

Suppose that the sample document ended with some word that doesn’t appear anywhere else in the document. If we ever select that word for printing, we will then be stuck because its list of followers would be empty.
We really don’t want all the output appearing on a single line, so we need to insert line breaks at appropriate places.

3.1.4 generateEmulatedText(): expand the loop body

Fixing those problems:

generateEmulatedText0.listing

void generateEmulatedText
   (const ???& startingWords,
    const ???& consecutiveWords,
    int N)
{
  randomly select and print one word 
    from startingWords;
  do this N-1 times {
     Look up the last-printed word in
       consecutiveWords, getting all the 
       words that followed that one in the
       sample document.
     if at least one such word exists
       randomly select one word from among those 
         followers;
     else
       randomly select one word 
         from startingWords;


     if selected word would make the
       current output line 80 
       or more characters wide 
         print a line break;
     print the selected word;

  }      
}

3.1.5 generateEmulatedText(): simplified loop

We can simplify this a bit with a little transformation of the loop.

void generateEmulatedText
   (const ???& startingWords,
    const ???& consecutiveWords,
    int N)
{
  word = "";
  do this N times {
     Look up word in
       consecutiveWords, getting all the 
       words that followed that one in the
       sample document.
     if at least one such follower exists
       word = random selection from among those 
         followers;
     else
       word = random selection
         from startingWords;

     if word would make the
       current output line 80 
       or more characters wide 
         print a line break;
     print word;
  }      
}

We’ve got enough detail here to start filling in some real C++ now.

3.1.6 generateEmulatedText(): line filling

void generateEmulatedText
   (const ???& startingWords,
    const ???& consecutiveWords,
    int N)
{
  const int MAXLINELENGTH = 80;

  string word = "";
  int lineLength = 0;

  for (int wordCount = 0; wordCount < N; ++wordCount)
    {
     Look up word in
       consecutiveWords, getting all the 
       words that followed that one in the
       sample document.
     if at least one such follower exists
       word = random selection from among those 
         followers;
     else
       word = random selection
         from startingWords;

     if (word.length() + lineLength > MAXLINELENGTH)
       {
         cout << endl;
         lineLength = 0;
       }
     if (lineLength > 0)
       cout << ' ';
     cout << word;
     lineLength += word.length();
    }      
}

Now, let’s consider the issue of making a random selection from a collection of words. If we have a sequence of K words, we could generate a random integer in the range 0…K-1 and use that to index into the sequence and access the selected word.

The data structures we have that allow retrieval via numeric indices are arrays, vectors, and deques. A vector would seem the most likely choice.

3.1.7 randomChoice()

A function like this should then do nicely for making a random selection.

string randomChoice (const vector<string>& choices)
{
  int k = rand() % choices.size();
  return choices[k];
}

Returning to our generateEmulatedText function, then, it would seem that the startingWords collection should be a vector<string>, so that we can apply randomChoice to it.

3.1.8 generateEmulatedText(): data structure for parameters

generateEmulatedText1.listing

void generateEmulatedText
   (const vector<string>& startingWords,
    const ???& consecutiveWords,
    int N)
{
  const int MAXLINELENGTH = 80;

  string word = "";
  int lineLength = 0;

  for (int wordCount = 0; wordCount < N; ++wordCount)
    {
     Look up word in
       consecutiveWords, getting all the 
       words that followed that one in the
       sample document.
     if at least one such follower exists
       word = random selection from among those 
         followers;
     else
       word = randomChoice(startingWords);

     if (word.length() + lineLength > MAXLINELENGTH)
       {
         cout << endl;
         lineLength = 0;
       }
     if (lineLength > 0)
       cout << ' ';
     cout << word;
     lineLength += word.length();
    }      
}

Now, what about consecutiveWords? The “look up” suggests a map or multimap. Because the thing we are looking up is a string, we know that the key type must be string. But what should the data type be?

3.1.9 This time, let’s try a map

generateEmulatedText2.listing

void generateEmulatedText
   (const vector<string>& startingWords,
    const map<string, vector<string> >& consecutiveWords,
    int N)
{
  const int MAXLINELENGTH = 80;

  string word = "";
  int lineLength = 0;

  for (int wordCount = 0; wordCount < N; ++wordCount)
    {
     vector<string> followers = consecutiveWords[word];
     if (followers.size() > 0) 
       word = randomChoice(followers);

     else
       word = randomChoice(startingWords);

     if (word.length() + lineLength > MAXLINELENGTH)
       {
         cout << endl;
         lineLength = 0;
       }
     if (lineLength > 0)
       cout << ' ';
     cout << word;
     lineLength += word.length();
    }      
}

This is shaping up pretty nicely. But if we try to compile this code, we will quickly find that the compiler complains about our trying to use operator[] on a const map.

Remember that, when you do something like myMap[key], the map is searched for the key value. If that key isn’t in the map yet, a new (key,value) pair is created and placed into the map.

That means that the [ ] operator is not const and so can only be used with maps that you are allowed to change.
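The inserting behavior of operator[] is easy to observe directly. The following sketch counts how many entries a single "lookup" adds to a map; the typedef FollowerTable and the function name entriesAddedByLookup are made up for this illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > FollowerTable;

// Returns how many entries a lookup with [] adds to the table.
// The table is passed by value, so the caller's map is left untouched.
std::size_t entriesAddedByLookup (FollowerTable table, const std::string& key)
{
  std::size_t before = table.size();
  table[key];                    // a "read" that inserts if key is missing
  return table.size() - before;
}
```

For a key that is already present the function returns 0; for a missing key it returns 1, because the lookup itself created a (key, empty vector) pair.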

3.1.10 Improved map access

generateEmulatedText3.listing

void generateEmulatedText
   (const vector<string>& startingWords,
    const map<string, vector<string> >& consecutiveWords,
    int N)
{
  const int MAXLINELENGTH = 80;

  string word = "";
  int lineLength = 0;

  for (int wordCount = 0; wordCount < N; ++wordCount)
    {
     map<string, vector<string> >::const_iterator followers
         = consecutiveWords.find(word);
     if (followers != consecutiveWords.end()
         && followers->second.size() > 0)
        word = randomChoice(followers->second);
     else
        word = randomChoice(startingWords);

     if (word.length() + lineLength > MAXLINELENGTH)
       {
         cout << endl;
         lineLength = 0;
       }
     if (lineLength > 0)
       cout << ' ';
     cout << word;
     lineLength += word.length();
    }      
}

So operator[] can only be applied to non-const maps, and we are receiving consecutiveWords as a const reference. That means that we must fall back on the clumsier find approach.
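If the find idiom starts to feel repetitive, it can be wrapped up once. The sketch below is one possible packaging, assuming a helper named followersOf (a hypothetical name) that hands back a shared empty vector when the word was never seen.

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > FollowerTable;

// Hypothetical helper: return the followers recorded for word, or a shared
// empty vector if word was never seen. Never modifies the table.
const std::vector<std::string>& followersOf (const FollowerTable& table,
                                             const std::string& word)
{
  static const std::vector<std::string> none;
  FollowerTable::const_iterator pos = table.find(word);
  return (pos == table.end()) ? none : pos->second;
}
```

Because the result is a const reference, no vector is copied. In C++11 and later, at() is another option for const maps, but it throws std::out_of_range for a missing key, which would be awkward here since missing keys are an expected case.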

That finishes the generateEmulatedText function. Next we can turn our attention to the problem of reading the sample document.

3.1.11 readDocument()

This appears to be a good starting point.

void readDocument (const char* docFileName,
                   vector<string>& startingWords,
                   map<string, vector<string> >& wordPairs)
{
  open a stream to read the file named docFileName;

  while (we can read another word)
    {
      if the word before this one had ended in
        a sentence-ending punctuation mark
          add word to startingWords;

      Add (previous word, word) to wordPairs
    }
}

Something worth noting is that we don’t want to filter these words the way we might for a spell checker, reducing everything to lowercase and discarding all punctuation. In this application, punctuation is our friend! It makes the generated text more natural-looking. We retain upper/lower-case distinctions as well. In most sample documents, every sentence will begin with a capitalized word. That means that, in our wordPairs map, words ending with ‘.’, ‘?’, etc., will only have capitalized words among their followers. Therefore whenever we happen to print a word that looks like it ends a sentence, the next word selected will wind up being capitalized. Again, this adds to the appearance of the generated text.

Similarly, we aren’t concerned about weeding out duplicate occurrences of the same word. If an author tends to use certain words over and over again, that is part of the author’s “style” and we want to emulate that pattern of word selection.
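To see what "words" look like when nothing is filtered, here is a small sketch that splits a string on whitespace with the stream extraction operator, which is the reading technique assumed for the rest of this example. The function name splitIntoWords is invented for the illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split text on whitespace with operator>>, keeping punctuation and
// capitalization attached to each word and keeping duplicates.
std::vector<std::string> splitIntoWords (const std::string& text)
{
  std::istringstream in (text);
  std::vector<std::string> words;
  std::string word;
  while (in >> word)
    words.push_back(word);
  return words;
}
```

Given "Hello there.  How are you?", the second word comes back as "there." with its period, and the third as "How" with its capital letter, which is exactly the information we want to preserve.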

With this in mind, we start coding this function by noting that we will need variables to track the current word, the previous word, and the last character in the previous word.

3.1.12 readDocument(): setting up the variables

Here those variables are in place. Next, we can take care of the file handling …

readDocument1.listing

void readDocument (const char* docFileName,
                   vector<string>& startingWords,
                   map<string, vector<string> >& wordPairs)
{
  open a stream to read the file named docFileName;

  char lastChar = '.';
  string lastWord;
  string word;


  while (we can read another word)
    {
      if the word before this one had ended in
        a sentence-ending punctuation mark
          add word to startingWords;

      Add (previous word, word) to wordPairs
      lastWord = word;
      lastChar = word[word.length()-1];

    }
}

3.1.13 readDocument(): file I/O

With the file open, adding the word to startingWords isn’t too much of a challenge …

readDocument2.listing

void readDocument (const char* docFileName,
                   vector<string>& startingWords,
                   map<string, vector<string> >& wordPairs)
{
  ifstream docIn (docFileName);
  char lastChar = '.';
  string lastWord;
  string word;

  while (docIn >> word)
    {
      if the word before this one had ended in
        a sentence-ending punctuation mark
          add word to startingWords;

      if (lastWord != "")
        Add (previous word, word) to wordPairs
      lastWord = word;
      lastChar = word[word.length()-1];

    }
}

3.1.14 readDocument(): adding a starting word

readDocument3.listing

void readDocument (const char* docFileName,
                   vector<string>& startingWords,
                   map<string, vector<string> >& wordPairs)
{
  ifstream docIn (docFileName);
  char lastChar = '.';
  string lastWord;
  string word;

  while (docIn >> word)
    {
      if (lastChar == '.' || lastChar == '?' || lastChar == '!')
        startingWords.push_back(word);

      
      if (lastWord != "")
        Add (previous word, word) to wordPairs

      lastWord = word;
      lastChar = word[word.length()-1];
    }
}

This may look familiar. It’s almost identical to the example we looked at in the sets lecture notes, but I have opted for a vector instead of a set because I don’t want to lose duplicate values.

At this point, we are left only with the problem of adding a pair of words to our map.

More precisely, our map should give us access to a vector of following words for lastWord. We then want to add word to the end of that vector.

3.1.15 adding a word pair

This is one way to do it.

readDocument4.listing

void readDocument (const char* docFileName,
                   vector<string>& startingWords,
                   map<string, vector<string> >& wordPairs)
{
  ifstream docIn (docFileName);
  char lastChar = '.';
  string lastWord;
  string word;

  while (docIn >> word)
    {
      if (lastChar == '.' || lastChar == '?' || lastChar == '!')
        startingWords.push_back(word);
      
      if (lastWord != "")
        {
          vector<string> wordFollowers = wordPairs[lastWord];
          wordFollowers.push_back(word);
          wordPairs[lastWord] = wordFollowers;
        }
      lastWord = word;
      lastChar = word[word.length()-1];
    }
}

We start by getting the current vector of followers out of the map. (If we have never seen lastWord before, then this will put an empty vector into the map and return a copy of that empty vector.)

Notice that what we are getting is a copy of the vector stored in the map.
We then add our new word onto the end of the vector.

Because this is a copy of the vector actually in the map, this push_back changes only our local copy, not the one that’s actually in the map.
Therefore we update the map by inserting the modified vector.

This code isn’t exactly thrilling. We are copying an entire vector of words out of the map, then copying an entire vector back into the map.

3.1.16 Using a reference variable to avoid copying

It is possible to do this without actually copying the vectors in and out of the map. The change required to accomplish this is subtle, but important.

readDocument5.listing

void readDocument (const char* docFileName,
                   vector<string>& startingWords,
                   map<string, vector<string> >& wordPairs)
{
  ifstream docIn (docFileName);
  char lastChar = '.';
  string lastWord;
  string word;

  while (docIn >> word)
    {
      if (lastChar == '.' || lastChar == '?' || lastChar == '!')
        startingWords.push_back(word);
      
      if (lastWord != "")
        {
          vector<string>& wordFollowers = wordPairs[lastWord];
          wordFollowers.push_back(word);
        }
      lastWord = word;
      lastChar = word[word.length()-1];
    }
}

By making wordFollowers not a vector but a reference to a vector, we actually extract from the map the “address of” its own copy of the vector. The push_back in the next line is therefore directly modifying the vector inside the map.
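The difference between the two declarations can be checked with a pair of tiny functions. This is only a sketch; the names appendViaCopy and appendViaReference, and the FollowerTable typedef, are assumptions for the demonstration.

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > FollowerTable;

// Appends through a copy: the vector stored in the map is unchanged.
void appendViaCopy (FollowerTable& table, const std::string& key,
                    const std::string& follower)
{
  std::vector<std::string> followers = table[key];   // copies the vector
  followers.push_back(follower);                     // changes only the copy
}

// Appends through a reference: the vector stored in the map grows.
void appendViaReference (FollowerTable& table, const std::string& key,
                         const std::string& follower)
{
  std::vector<std::string>& followers = table[key];  // names the map's own vector
  followers.push_back(follower);
}
```

After appendViaCopy the map still holds an empty vector for the key; after appendViaReference it holds the new word. The only textual difference is the single & in the declaration.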

3.1.17 Summary

At this point, we are pretty much finished. Put together our readDocument, generateEmulatedText and randomChoice functions, together with the main function we started with, slap on a few #includes to cover the map and vector data structures we’ve employed, and we’re ready to go.
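As a rough picture of how the pieces fit together, here is a condensed sketch of the three functions. It departs from the listings above in two labeled ways, both assumptions made so the code can be exercised without files: readDocument takes an already-open input stream instead of a file name, and generateEmulatedText writes to a caller-supplied output stream instead of cout. It also returns early if the sample yielded no starting words.

```cpp
#include <cstdlib>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > FollowerTable;

std::string randomChoice (const std::vector<std::string>& choices)
{
  return choices[std::rand() % choices.size()];
}

// Assumption: reads from an already-open stream rather than a file name.
void readDocument (std::istream& docIn,
                   std::vector<std::string>& startingWords,
                   FollowerTable& wordPairs)
{
  char lastChar = '.';
  std::string lastWord;
  std::string word;
  while (docIn >> word)
    {
      if (lastChar == '.' || lastChar == '?' || lastChar == '!')
        startingWords.push_back(word);
      if (lastWord != "")
        wordPairs[lastWord].push_back(word);
      lastWord = word;
      lastChar = word[word.length()-1];
    }
}

// Assumption: writes to a caller-supplied stream instead of cout.
void generateEmulatedText (std::ostream& out,
                           const std::vector<std::string>& startingWords,
                           const FollowerTable& consecutiveWords,
                           int N)
{
  const std::string::size_type MAXLINELENGTH = 80;
  if (startingWords.empty())
    return;                    // empty sample document: nothing to emulate

  std::string word = "";
  std::string::size_type lineLength = 0;
  for (int wordCount = 0; wordCount < N; ++wordCount)
    {
      FollowerTable::const_iterator followers = consecutiveWords.find(word);
      if (followers != consecutiveWords.end()
          && followers->second.size() > 0)
        word = randomChoice(followers->second);
      else
        word = randomChoice(startingWords);

      if (word.length() + lineLength > MAXLINELENGTH)
        {
          out << '\n';
          lineLength = 0;
        }
      if (lineLength > 0)
        out << ' ';
      out << word;
      lineLength += word.length();
    }
  out << '\n';
}
```

A main function would open an ifstream on argv[1], pass it to readDocument, and then call generateEmulatedText with cout. For quick experiments, a std::istringstream holding a sentence or two works just as well as a file.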

3.2 Once More, with MultiMaps

The use of a map< ... , vector< ... > > structure may strike you as being a bit convoluted. After all, aren’t multimaps expressly designed for situations where a given key value might map onto multiple associated data values?

That’s certainly true. In fact, to my mind, a multimap<T,U> is pretty much equivalent to map<T, list<U> > or map<T, vector<U> >. Whether you prefer a multimap or a map that returns a container is a judgment call, depending on what you’re comfortable with and the kind of manipulation you might need to make on the whole collection of values mapped by a given key.
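One way to convince yourself of that rough equivalence is to convert one form into the other. The sketch below regroups the entries of a multimap into a map onto vectors; the function name groupByKey is hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: regroup a multimap's entries into a map onto vectors.
std::map<std::string, std::vector<std::string> >
groupByKey (const std::multimap<std::string, std::string>& pairs)
{
  std::map<std::string, std::vector<std::string> > grouped;
  for (std::multimap<std::string, std::string>::const_iterator p = pairs.begin();
       p != pairs.end(); ++p)
    grouped[p->first].push_back(p->second);
  return grouped;
}
```

For every key, the number of entries reported by the multimap's count(key) matches the size of the vector stored under that key in the result. Nothing is lost in the translation; only the shape of the container changes.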

Let’s look at what would happen to our program if we opted for a multimap instead of a map onto vectors.

3.2.1 Tracking word pairs

Inserting a word pair is both simpler and messier.

readDocument6.listing

typedef multimap<string, string> FollowersMap;

void readDocument (const char* docFileName,
                   vector<string>& startingWords,
                   FollowersMap& wordPairs)
{
  ifstream docIn (docFileName);
  char lastChar = '.';
  string lastWord;
  string word;

  while (docIn >> word)
    {
      if (lastChar == '.' || lastChar == '?' || lastChar == '!')
        startingWords.push_back(word);
      
      if (lastWord != "")
        {
          wordPairs.insert(FollowersMap::value_type(lastWord, word));
        }
      lastWord = word;
      lastChar = word[word.length()-1];
    }
}

We don’t have to worry about extracting out a vector and performing vector operations. We just insert the pair into the multimap. But we do have to deal with the fact that the multimap insert function takes a pair of values (key and data). The exact data type for that pair is given to us by the multimap as its value_type.
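For comparison, here are three ways of writing that insertion, all of which add the same (key, data) pair. This is only a sketch: the function addThreeWays exists purely for the demonstration, and the emplace call assumes a C++11 or later compiler.

```cpp
#include <map>
#include <string>
#include <utility>

typedef std::multimap<std::string, std::string> FollowersMap;

// Three equivalent ways to add a (key, data) pair to a multimap.
// Called once, this adds three entries under the same key.
void addThreeWays (FollowersMap& wordPairs,
                   const std::string& lastWord, const std::string& word)
{
  wordPairs.insert(FollowersMap::value_type(lastWord, word));
  wordPairs.insert(std::make_pair(lastWord, word));   // converted to value_type
  wordPairs.emplace(lastWord, word);                  // C++11: built in place
}
```

Because a multimap accepts duplicate keys, and even duplicate pairs, every one of these calls adds a new entry rather than replacing an old one.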

3.2.2 generateEmulatedText - retrieving from a multimap

The changes to generateEmulatedText (shown here in the previous, map-of-vector version) will be fairly extensive, because we can no longer extract a vector of following words directly.

generateEmulatedText4.listing

void generateEmulatedText
   (const vector<string>& startingWords,
    const FollowersMap& consecutiveWords,
    int N)
{
  const int MAXLINELENGTH = 80;

  string word = "";
  int lineLength = 0;

  for (int wordCount = 0; wordCount < N; ++wordCount)
    {
     // Previous map-of-vector lookup: this must change, because find()
     // on a multimap yields a single (key,data) pair, not a vector.
     map<string, vector<string> >::const_iterator followers
         = consecutiveWords.find(word);
     if (followers != consecutiveWords.end()
         && followers->second.size() > 0)
        word = randomChoice(followers->second);
     else
       word = randomChoice(startingWords);

     if (word.length() + lineLength > MAXLINELENGTH)
       {
         cout << endl;
         lineLength = 0;
       }
     if (lineLength > 0)
       cout << ' ';
     cout << word;
     lineLength += word.length();
    }      
}

3.2.3 generateEmulatedText - alternate retrieval from a multimap

We could accomplish the switch-over to a multimap most easily by getting the range of positions in the multimap corresponding to the key word (via the equal_range function).

generateEmulatedText5.listing

void generateEmulatedText
   (const vector<string>& startingWords,
    const FollowersMap& consecutiveWords,
    int N)
{
  const int MAXLINELENGTH = 80;

  string word = "";
  int lineLength = 0;

  for (int wordCount = 0; wordCount < N; ++wordCount)
    {
     pair<FollowersMap::const_iterator,
          FollowersMap::const_iterator> followersp
            = consecutiveWords.equal_range(word);
     if (followersp.first != followersp.second) 
       {
        vector<string> followers;
        for (FollowersMap::const_iterator p = followersp.first;
             p != followersp.second; ++p)
          followers.push_back ((*p).second);
        word = randomChoice(followers);
       }
     else
       word = randomChoice(startingWords);

     if (word.length() + lineLength > MAXLINELENGTH)
       {
         cout << endl;
         lineLength = 0;
       }
     if (lineLength > 0)
       cout << ' ';
     cout << word;
     lineLength += word.length();
    }      
}

Then we can loop through those positions, copying the data values (each position p points to a (key,data) pair, which is why we actually copy (*p).second).

Not exactly pretty. I think I liked the map-onto-vector version better.

This version looks inefficient because of the loop used for copying. But it’s worth noting that we’re not actually copying any more strings here than we did with the map-onto-vector version. In each case, we were copying out the entire vector just so we could do a random selection from it.

Could we eliminate this copy by randomly selecting a position from the range inside the map, and then only extracting the single data element we really want?

3.2.4 randomChoice

We can do that by generalizing our randomChoice function to handle any kind of position (iterator).

template <class Iterator>
Iterator randomChoice (Iterator start, Iterator finish)
{
  int nChoices = finish - start;
  return start + rand() % nChoices;
}

This template would work just fine when used with our startingWords vector:

word = randomChoice (startingWords.begin(),
                     startingWords.end());

but it can’t be used, as it is, with the positions we get from our multimap.

The reason for the difference is that operations like subtracting one iterator from another (to determine the number of positions between them) or adding an integer to an iterator are valid only for random access iterators. vector provides random access iterators. multimap does not.
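The category of an iterator is recorded in its type, so the distinction can be inspected at compile time. The sketch below, which assumes C++11 for std::is_same, asks std::iterator_traits which category an iterator type advertises. The function name isRandomAccess is invented here.

```cpp
#include <iterator>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

typedef std::multimap<std::string, std::string> FollowersMap;
typedef std::vector<std::string> WordList;

// True when Iterator advertises itself as a random access iterator,
// i.e., when finish - start and start + k are supported directly.
template <class Iterator>
bool isRandomAccess ()
{
  typedef typename std::iterator_traits<Iterator>::iterator_category Category;
  return std::is_same<Category, std::random_access_iterator_tag>::value;
}
```

isRandomAccess<WordList::const_iterator>() is true, while isRandomAccess<FollowersMap::const_iterator>() is false, since multimap iterators are only bidirectional. The standard library uses this same tag to pick the fast or the slow strategy inside functions such as distance.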

3.2.5 generic randomChoice

This sort of problem comes up often enough that the standard library provides some functions for dealing with it.

template <class Iterator>
Iterator randomChoice (Iterator start, Iterator finish)
{
  int nChoices = distance(start, finish);
  advance(start, rand() % nChoices);
  return start;
}

distance computes the number of positions between two iterators. When distance(first,last) is called with a pair of random access iterators, distance does its work in $O(1)$ time by simply computing last-first. But if first and last are not random access, then distance(first,last) simply repeatedly applies ++ to first until it becomes equal to last, counting the number of steps required to do this. So, for non-random access iterators, distance(first,last) is $O(\mbox{distance(first,last)})$ - it runs in time proportional to the size of its answer.

Similarly, advance(start,k) moves the iterator start forward by $k$ positions. It modifies start in place rather than returning a new iterator. If start is random access, this is done in $O(1)$ time. If not, the operation is $O(k)$.

With this version, the multimap solution may be easier to read than the map-onto-vector, and should run somewhat faster because we have eliminated a lot of pointless copying of vectors and strings.
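To watch distance and advance at work without any randomness getting in the way, here is a deterministic sketch that fetches the k-th follower of a word from a multimap. The helper kthFollower and its convention of returning an empty string when k is out of range are assumptions for this illustration.

```cpp
#include <iterator>
#include <map>
#include <string>
#include <utility>

typedef std::multimap<std::string, std::string> FollowersMap;

// Hypothetical helper: the k-th follower of word (counting from 0), or ""
// if word has k or fewer followers.
std::string kthFollower (const FollowersMap& table,
                         const std::string& word, int k)
{
  std::pair<FollowersMap::const_iterator,
            FollowersMap::const_iterator> range = table.equal_range(word);
  // Multimap iterators are not random access, so this counts step by step.
  int n = static_cast<int>(std::distance(range.first, range.second));
  if (k < 0 || k >= n)
    return "";
  FollowersMap::const_iterator pos = range.first;
  std::advance(pos, k);       // moves pos in place; returns nothing
  return pos->second;
}
```

Replacing k with rand() % n gives the random selection used in the style emulator. Only the one selected string is copied.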

3.2.6 Summary

generateEmulatedText6.listing

void generateEmulatedText
   (const vector<string>& startingWords,
    const FollowersMap& consecutiveWords,
    int N)
{
  const int MAXLINELENGTH = 80;

  string word = "";
  int lineLength = 0;

  for (int wordCount = 0; wordCount < N; ++wordCount)
    {
     pair<FollowersMap::const_iterator,
          FollowersMap::const_iterator> followersp
            = consecutiveWords.equal_range(word);

     if (followersp.first != followersp.second) 
       word = randomChoice
                 (followersp.first,
                  followersp.second)->second;

     else
       word = *randomChoice(startingWords.begin(),
                            startingWords.end());


     if (word.length() + lineLength > MAXLINELENGTH)
       {
         cout << endl;
         lineLength = 0;
       }
     if (lineLength > 0)
       cout << ' ';
     cout << word;
     lineLength += word.length();
    }      
}

4 Sets, MultiSets, Maps, and MultiMaps: all in the family

We’ve already looked at how to implement sets. Given an efficient implementation of set, the others could be built on top of that with little trouble.

4.1 From Set to Map

To implement a map using a set, we start by remembering what the value_type of a map looks like:

template <class Key, class T, class Compare=less<Key> >
class map
{
  ⋮
  typedef pair<const Key, T> value_type;

Since the operator* of our map iterators will need to return references to value_types, we know we need to actually use these pairs to store the keys and data.

It won’t do us any good though, to store the map data in a set<value_type>, because we can’t change the elements in a set, and we do need to be able to change the data portion of a map. So, instead, we will use a set of pointers to value_types.

template <class Key, class T, class Compare=less<Key> >
class map {
public:
  typedef pair<const Key, T> value_type;
private:
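
  struct VTComparison
  {
    Compare comp;   // a nested struct cannot see the map's data members,
                    // so the comparison carries its own copy of the functor
    bool operator() (const value_type* left,
                     const value_type* right) const
    {return comp(left->first, right->first);}
  };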

  typedef set<value_type*, VTComparison> reptype;
  reptype data;
  ⋮

We’ll illustrate how this works by looking at the operator[] for our new map.

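template <class Key, class T, class Compare>
T& map<Key,T,Compare>::operator[] (const Key& key)
{
  value_type temp (key, T());
  // The set holds pointers, so we search with the address of temp.
  typename reptype::iterator p = data.find(&temp);
  if (p == data.end())
    // insert returns a (position, bool) pair; the map's destructor
    // must eventually delete the pair allocated here.
    p = data.insert (new value_type(temp)).first;
  return (*p)->second;
}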

All we do is to search the underlying set. If we don’t find a match, we create a new key-data pair. Finally, we return a reference to the data portion of the appropriate key-data pair.

(It is possible to do this without creating the dummy T() value unless we need to insert it, but we don’t really need to see the complications.)
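For the curious, the usual trick looks something like the following. This sketch is written against std::map rather than our set-of-pointers representation, purely to keep it short and testable; the function name findOrInsert is made up. The idea is to use lower_bound to locate where the key belongs, and to build the default data value only if the key is not already there.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> CountTable;

// Sketch of operator[]-style access that builds the default data value
// only when the key is actually missing.
int& findOrInsert (CountTable& table, const std::string& key)
{
  // First position whose key is not less than the one we want.
  CountTable::iterator pos = table.lower_bound(key);
  if (pos == table.end() || table.key_comp()(key, pos->first))
    // key is absent; pos marks where it belongs, so pass it as a hint
    pos = table.insert(pos, CountTable::value_type(key, int()));
  return pos->second;
}
```

When the key is found, no int() is ever constructed. For int that saving is trivial, but for an expensive data type such as a large vector it could matter.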

4.2 From Map to MultiSet

template <class Key, class Compare=less<Key> >
class multiset {
public:
private:
  map <Key, int, Compare> data;

To implement a multiset<T>, we could use a map<T,int>, where the int indicates how many copies of an element are in the multiset.

template <class Key, class Compare>
typename multiset<Key,Compare>::iterator 
  multiset<Key,Compare>::insert (const Key& key)
{
  auto pos = data.find(key);
  if (pos == data.end())
     // This key is not in the multiset yet.
     // insert returns a (position, bool) pair; we want the position.
     return data.insert (typename map<Key, int>::value_type(key, 1)).first;
  else
     { // This key is already in the multiset.
       ++(pos->second);
       return pos;
     }
}

If we are adding a new key to the multiset, we add it with count 1. If we are adding a key that is already in the multiset, we simply increment the existing key’s count.
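The counting idea can be tried out in miniature. The class below is a much-reduced sketch, not a real multiset: it is fixed to strings, offers no iterators, and its name CountingBag and its member functions are all invented for this illustration.

```cpp
#include <map>
#include <string>

// Hypothetical, much-reduced multiset of strings built on a map of counts.
class CountingBag {
public:
  // Add one copy of key; a missing key starts from a count of 0.
  void insert (const std::string& key) { ++counts[key]; }

  // Number of copies of key currently stored.
  int count (const std::string& key) const
  {
    std::map<std::string, int>::const_iterator pos = counts.find(key);
    return (pos == counts.end()) ? 0 : pos->second;
  }

  // Remove one copy of key, if any; drop the entry when its count hits 0.
  void eraseOne (const std::string& key)
  {
    std::map<std::string, int>::iterator pos = counts.find(key);
    if (pos != counts.end() && --(pos->second) == 0)
      counts.erase(pos);
  }

  // Total number of elements, counting duplicates.
  int size () const
  {
    int total = 0;
    for (std::map<std::string, int>::const_iterator p = counts.begin();
         p != counts.end(); ++p)
      total += p->second;
    return total;
  }
private:
  std::map<std::string, int> counts;
};
```

One limitation of this design is worth noticing: it stores a single copy of each key, so it only behaves like a true multiset when equivalent keys are fully interchangeable.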

4.3 From Map to MultiMap

A multimap<Key,T> can be formed from a map<Key, list<T> > so that each key can be mapped onto an entire list of related data values. The only tricky part in doing this is implementing the multimap iterators, as we need to record both a position within the map and within the most recently-accessed list.
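The need for two positions shows up even in a plain traversal. In the sketch below, which assumes string keys and data and uses the invented names ListMap and flatten, the outer loop variable is the position within the map and the inner one is the position within the current list. A multimap iterator built on this representation would have to carry both.

```cpp
#include <list>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::map<std::string, std::list<std::string> > ListMap;
typedef std::vector<std::pair<std::string, std::string> > PairList;

// Visit every (key,data) pair in the order a multimap iterator would,
// tracking two positions: one in the map and one in the current list.
PairList flatten (const ListMap& rep)
{
  PairList pairs;
  for (ListMap::const_iterator keyPos = rep.begin();
       keyPos != rep.end(); ++keyPos)
    for (std::list<std::string>::const_iterator dataPos = keyPos->second.begin();
         dataPos != keyPos->second.end(); ++dataPos)
      pairs.push_back(std::make_pair(keyPos->first, *dataPos));
  return pairs;
}
```

An iterator's operator++ would do the work of one step of these nested loops: advance dataPos, and when it runs off the end of its list, advance keyPos and restart dataPos at the beginning of the next list.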

The point of this exercise has been to show that there are no new concepts, just “grunt work” programming involved in implementing multiset, map, and multimap once we have a suitable set type.
