Generic Programming

Steven J. Zeil

Last modified: May 12, 2016

When we combine iterators with template functions, we get a powerful tool for writing programs. Because the iterator interface is the same no matter what kind of container the data is really in, many algorithms can be written as function templates to work with data in almost any kind of container.

This combination is called generic programming.

1 Iterators + Templates = Generic Programming

One benefit of designing our own classes to follow the “standard” form of iterators is that the C++ standard library is packed with small function templates for using iterators to do common tasks. These can be found in the header file <algorithm>.

For example, we can search any range of data for a particular element using std::find:

#include <algorithm>
⋮
pos = find(startingPosition, stoppingPosition, x);

This searches a sequence of data, beginning at startingPosition, up to but not including stoppingPosition, for the value x.

If it finds x, it returns the position where it was found. If it doesn’t find it, it returns stoppingPosition (which, I was careful to note, is not one of the positions actually searched), so we can unambiguously determine whether we found x or not:

#include <algorithm>
  ⋮
pos = find(startingPosition, stoppingPosition, x);
if (pos != stoppingPosition)
  {
    cout << "Found it!" << endl;
  }
else
  {
    cout << "It's not in there." << endl;
  }

Now, pos, startingPosition, and stoppingPosition are all iterators of some kind. They must all be of the same iterator type, and that iterator type has to denote positions of data of whatever type x is.

The std::find function will work with iterators taken from an array, a vector, a list, … , or whatever.

How does this happen? std::find is implemented as a template:

template <typename Iterator, typename T>
Iterator find (Iterator start, Iterator stop, T x)
{
    while (start != stop && !(x == *start))
        ++start;
    return start;
}

This template makes no assumptions about the iterators passed to it, except that they support the operations !=, *, and ++, which all iterators, no matter what container they come from, are supposed to support.
It also makes minimal assumptions about the type of x. It simply assumes that x will be from a data type that supports comparison via ==.

So this template can be applied to iterators from arrays, vectors, lists, Books, PersonnelRecords, MyFavoriteDataTypeNumber241, or whatever. As long as we can give a starting position and a stopping position, the code in find is valid.

One of the hallmarks of the generic style of programming is that we always try to work on ranges of positions (iterators) with no explicit references to the container that those positions were drawn from.
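As a small sketch of that style, the same search can be run over positions taken from an array, a vector, and a list. The helper name contains and the function findInThreeContainers are made up for this illustration; only std::find comes from the library.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Hypothetical helper (not part of std): reports whether x occurs
// in the range [start, stop), using nothing but std::find.
template <typename Iterator, typename T>
bool contains (Iterator start, Iterator stop, const T& x)
{
  return std::find(start, stop, x) != stop;
}

void findInThreeContainers()
{
  int arr[5] = {3, 1, 4, 1, 5};
  std::vector<int> vec (arr, arr+5);
  std::list<int> lst (arr, arr+5);

  // The same call works no matter where the positions came from.
  assert (contains(arr, arr+5, 4));
  assert (contains(vec.begin(), vec.end(), 4));
  assert (contains(lst.begin(), lst.end(), 4));
  assert (!contains(lst.begin(), lst.end(), 9));
}
```

Note that contains never mentions the container itself, only a pair of positions.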

We’ve already seen another such generic function template when we looked at Searching via Iterator Variants:

template <typename Iterator, typename Value>
Iterator lower_bound (Iterator start, Iterator stop, const Value& key);

as well as a handful of other useful function templates that are, strictly speaking, not “generic” because they are not based on ranges of iterator positions.

In this lesson, we want to look at more such generic functions and get a little better feel for how they can influence C++ programming style. If you have taken CS333 Principles of Programming Languages or a similar course, you may also recognize that generic programming shares a lot of ideas with “functional programming” as well.

2 Copying

All of our std:: containers support copying via copy constructors or assignment. But those cases involve copying between two containers of exactly the same type, e.g., a vector<int> to another vector<int>, or perhaps a list<string> to another list<string>.

But what if we wanted to copy a vector of strings to a list of strings? We can use std::copy for that.

For example, we can copy one container into another this way:

vector<string> ws(50);
std::string str[50];
  ⋮
copy (ws.begin(), ws.end(), str);

(copies ws into str)

This works because copy is written entirely in terms of iterator operations, and iterators can be applied to almost any container.

template <class InputIterator,
          class OutputIterator>
OutputIterator copy(InputIterator first, 
                    InputIterator last,
                    OutputIterator result)
{
  while (first != last)
    {
      *result = *first;
      result++; first++;
    }
  return result;
}

Notice how we have two template parameters, InputIterator and OutputIterator, that get replaced when copy is used. Of course, the names for these parameters are arbitrary. We could just as well have called them George and Martha instead of InputIterator and OutputIterator (at least, if we ignore documentation quality). How, then, does the compiler know that this copy algorithm is supposed to work with iterators?

It doesn’t, really. But the copy operation is written in terms of operator*, operator++ and operator!=, all of which are part of the conventional iterator interface. So the compiler will allow us to use copy with any data type that is sufficiently iterator-like to provide those operations.
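To see how little "iterator-like" has to mean, here is a sketch in which the input positions do not come from a container at all. The class CountingPosition is invented for this illustration, and the copy template is renamed myCopy so that it cannot be confused with std::copy (which may demand more of its iterator types than these three operations).

```cpp
#include <cassert>

// The copy template shown above, renamed to avoid clashing with std::copy.
template <class InputIterator, class OutputIterator>
OutputIterator myCopy (InputIterator first, InputIterator last,
                       OutputIterator result)
{
  while (first != last)
    {
      *result = *first;
      ++result; ++first;
    }
  return result;
}

// Hypothetical iterator-like class: a position in the sequence of integers.
// It provides only the operations that myCopy applies to its input positions.
class CountingPosition {
  int value;
public:
  explicit CountingPosition (int v) : value(v) {}
  int operator* () const {return value;}
  CountingPosition& operator++ () {++value; return *this;}
  bool operator!= (const CountingPosition& other) const
    {return value != other.value;}
};

void copyFromCountingPositions()
{
  int a[4];
  // "Copies" the integers 10, 11, 12, 13 into the array.
  int* stop = myCopy (CountingPosition(10), CountingPosition(14), a);
  assert (stop == a+4);
  assert (a[0] == 10 && a[3] == 13);
}
```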

2.1 Using std::copy

So, for example, in one version of Book (using a dynamically allocated array), we implemented the Book constructor and assignment operator as shown here:

oldBook.cpp

class Book {
public:
  Book (int nAuthors, Author* a, 
        string theTitle, string theID);
  Book (const Book& b);
   ⋮
private:
  std::string title;
  int numAuthors;
  Author* authors;  // dynamic array of authors
  std::string identifier;
};

   ⋮

Book::Book (int nAuthors, Author* a, 
            string theTitle, string theID)
{
  title = theTitle;
  authors = new Author[nAuthors];
  numAuthors = nAuthors;
  identifier = theID;
  for (int i = 0; i < numAuthors; ++i)
    authors[i] = a[i];
}

Book& Book::operator= (const Book& b)
{
  delete [] authors;
  title = b.title;
  authors = new Author[b.numAuthors];
  numAuthors = b.numAuthors;
  identifier = b.identifier;
  for (int i = 0; i < numAuthors; ++i)
    authors[i] = b.authors[i];
  return *this;
}

With the standard copy function, this can be written:

bookCopy.cpp

Book::Book (int nAuthors, Author* a, 
            string theTitle, string theID)
{
  title = theTitle;
  authors = new Author[nAuthors];
  numAuthors = nAuthors;
  identifier = theID;
  copy (a, a+nAuthors, authors);
}

Book& Book::operator= (const Book& b)
{
  delete [] authors;
  title = b.title;
  authors = new Author[b.numAuthors];
  numAuthors = b.numAuthors;
  identifier = b.identifier;
  copy (b.authors, b.authors+b.numAuthors, authors);
  return *this;
}

OK, big deal. We saved two whole lines of code. Is that worth worrying about?

Well, that’s really not the point. What we did gain is the instant recognition that what is going on is a “copy”.

Someone reading the original version would need to study the loop and recognize the pattern of a copy.

That might only save a few seconds, but multiply that by all the places in a real program where copies and similar common, trivial programming patterns occur. The total gain in readability may be substantial.

And, as you gain experience using these standard algorithms, you will probably find that you avoid a lot of diddly little programming mistakes that you would have made in endlessly rewriting the more detailed explicit loops.

2.1.1 Generics and Clean Coding

This use of std functions is very much in keeping with one of the practices that make up Clean Coding: A function should do only one thing. Robert Martin, the guru of Clean Coding, suggests that

“The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that.”

Martin Fowler explains the reasoning for this:

“If you have to spend effort into looking at a fragment of code to figure out what it’s doing, then you should extract it into a function and name the function after that ‘what’. That way when you read it again, the purpose of the function leaps right out at you”

At its extreme, this practice of Clean Coding suggests that functions should rarely have nested control flow constructs – no nested loops, no ifs inside loops or vice versa. If you can’t look at a loop body or the then- or else-part of an if and say what it does in a simple phrase, do you really understand it? Can you possibly understand what the nested construct that includes that block of code does? And if you can describe that block of code in a simple phrase, why not pull it out into a separate (private) function named with that descriptive phrase?

2.2 copy and I/O iterators

We can use the standard template ostream_iterator to get an output iterator that stores items in an output stream. All iterators represent a position within a container. In this case, the container is the output stream and the “position” is the place where we are set to write our next output. In effect, then, copying to that position will let us write a whole series of items.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

using namespace std;

int main ()
{
  int v[5] = {-1, 5, 5, 5, 8};
  string s[4];
  s[0] = "zero";
  s[1] = "one";
  s[2] = "two";
  s[3] = "three";

  copy (v, v+5, ostream_iterator<int>(cout, "\n")); 
  // writes -1 5 5 5 8, each number on a separate line

  copy (s, s+4, ostream_iterator<string>(cout, "!=")); 
  // writes zero!=one!=two!=three!=

  return 0;
}

Similarly, there is an input iterator called istream_iterator, which can be used to read from an input stream. Try compiling and running the program shown here.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

using namespace std;

int main ()
{
  copy (istream_iterator<string>(cin), // input iterator reading strings from cin
        istream_iterator<string>(),    // end-of-file position
        ostream_iterator<string>(cout, "\n") // output iterator writing to cout
       );

  return 0;
}

Note that it doesn’t quite exactly copy its input to its output. Can you figure out why?

Answer

The obvious difference is that we are separating words by newline characters on output. In the input, they could be separated by any whitespace characters. Hence the input

abc def
ghi

will generate the output

abc
def
ghi

The istream_iterator uses ordinary >> to read things. One of the characteristics of >> is that it skips over leading whitespace before it begins actually processing characters. So another, more subtle difference is that “extra” spaces and other whitespace will disappear.

The input

   abc     def


ghi

will generate the output

abc
def
ghi

2.3 Copying, Overwriting, and Inserting

This is dangerous:

int a[5] = {1, 2, 3, 4, 5};
int b[3];
copy (a, a+5, b);

It is the generic equivalent of

int a[5] = {1, 2, 3, 4, 5};
int b[3];
for (int i = 0; i < 5; ++i)
   b[i] = a[i];

which “obviously” writes past the end of the array b.

For similar reasons, this does not work:

int a[5] = {1, 2, 3, 4, 5};
vector<int> v;
copy (a, a+5, v.begin());

copy writes data into existing positions - you need to be sure that the data slots actually exist.

Pretty much the same is true for the output range of any generic function - they all assume that the data positions you name as the output range already exist.

2.3.1 Making room

This works:

vector<int> v;
  ⋮
int a[5] = {1, 2, 3, 4, 5};
v.resize(5);   
copy (a, a+5, v.begin());

because resize actually creates the data positions.

This does not:

vector<int> v;
  ⋮
int a[5] = {1, 2, 3, 4, 5};
v.reserve(5);   
copy (a, a+5, v.begin());

because reserve merely sets things up so that, if we later try to create some data slots, we won’t need to allocate new memory for them. The data slots may exist in the sense that the space for them has been allocated, but they have not actually been initialized and the vector’s size() indicates that those positions are still unused.
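The difference shows up directly in what size() and capacity() report. This is only a sketch; the function name resizeVersusReserve is invented for the illustration.

```cpp
#include <cassert>
#include <vector>

void resizeVersusReserve()
{
  std::vector<int> reserved;
  reserved.reserve(5);
  assert (reserved.size() == 0);       // no elements exist yet
  assert (reserved.capacity() >= 5);   // but memory for 5 has been set aside

  std::vector<int> resized;
  resized.resize(5);
  assert (resized.size() == 5);        // five elements now exist
  assert (resized[4] == 0);            // each one value-initialized to 0
}
```

Only the resized vector has positions that begin() can legitimately lead a copy through.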

2.3.2 copying and expanding

resize is a useful trick, but it’s specific to vectors. There’s a more general solution to this problem — special iterators that are used to create new data slots as they are being filled.

The special iterators are
- back_inserter
- front_inserter
- inserter
All three are declared in the header <iterator>.

2.3.3 `back_inserter`

back_inserter(container) returns an iterator on that container.

This is an output iterator – you can assign to it but not look at it.
Each assignment to the element at this iterator results in a push_back() call on the container.

int a[5] = {1, 2, 3, 4, 5};
vector<int> v;
assert (v.size() == 0);
copy (a, a+5, back_inserter(v));
assert (v.size() == 5); // Five push_back's were done
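back_inserter is particularly handy when we cannot know in advance how many items will arrive, for example when reading with an istream_iterator. The following is a sketch under that assumption; readInts and readFromString are hypothetical names, and a string stream stands in for a real input file.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical helper: reads whitespace-separated ints until input runs out.
std::vector<int> readInts (std::istream& in)
{
  std::vector<int> result;
  std::copy (std::istream_iterator<int>(in),  // start reading here
             std::istream_iterator<int>(),    // end-of-input position
             std::back_inserter(result));     // one push_back per item read
  return result;
}

void readFromString()
{
  std::istringstream in ("  3 1\n4   1 5");
  std::vector<int> v = readInts(in);
  assert (v.size() == 5);
  assert (v[0] == 3 && v[4] == 5);
}
```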

2.3.4 `front_inserter`

front_inserter(container) returns an iterator on that container.

This is an output iterator - you can assign to it but not look at it.
Each assignment to the element at this iterator results in a push_front() call on the container.

int a[5] = {1, 2, 3, 4, 5};
list<int> alist;
assert (alist.size() == 0);
copy (a, a+5, front_inserter(alist));
assert (alist.size() == 5);
assert (alist.front() == 5);
assert (alist.back() == 1); // note that the order of the data is reversed

Obviously, this only works for containers that actually provide the push_front operation, so we can’t do this, for example, with vectors.

2.3.5 `inserter`

inserter(container,iter) returns an iterator on that container denoting the same position as iter.

This is an output iterator - you can assign to it but not look at it.
Each assignment of foo to the element at this iterator results in an insert(iter,foo) call on the container.

int a[5] = {1, 2, 3, 4, 5};
list<int> alist (3, 0);
list<int>::iterator pos = alist.begin();
++pos;
copy (a, a+5, inserter(alist, pos));
  // alist contains 0 1 2 3 4 5 0 0

3 Other Useful Generic Functions in the std:: Library

3.1 equal

Another useful function is equal, which tests to see if corresponding elements in two position ranges are equal:

int v[5] = {-1, 5, 5, 5, 8};
int w[3] = {5, 5, 5};

assert (!equal(w, w+3, v));
assert (equal(w, w+3, v+1));
assert (equal(v+1, v+4, w));

There are actually two forms of equal. The three parameter form

equal (start1, stop1, start2)

checks to see if the elements in the range of positions start1...stop1 are equal to the same number of elements in positions starting at start2.

The four parameter form

equal (start1, stop1, start2, stop2)

checks to see if the two ranges start1...stop1 and start2...stop2 contain the same number of elements and if corresponding elements in the two ranges are equal. (The four-parameter form was added in C++14.)
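A short sketch of how the two forms can disagree, assuming a compiler that supports C++14 (needed for the four-parameter form). The function name compareEqualForms is invented for the illustration.

```cpp
#include <algorithm>
#include <cassert>

void compareEqualForms()
{
  int longer[4]  = {5, 5, 5, 8};
  int shorter[3] = {5, 5, 5};

  // Three-parameter form: only the first 3 elements of longer are examined.
  assert (std::equal(shorter, shorter+3, longer));

  // Four-parameter form: the lengths differ (3 vs. 4), so the ranges are unequal.
  assert (!std::equal(shorter, shorter+3, longer, longer+4));

  // Same length and same contents.
  assert (std::equal(shorter, shorter+3, longer, longer+3));
}
```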

For example, suppose that we wanted to know if two books have the same list of authors. We could have written it this way:

bool sameAuthors (const Book& left, const Book& right)
{
  if (left.numberOfAuthors() == right.numberOfAuthors())
    {
     auto lpos = left.begin();
     auto rpos = right.begin();
        
     for (int i = 0; i < left.numberOfAuthors(); ++i)
       {
         if (*lpos != *rpos)
           return false;
         ++lpos; ++rpos;
       }
     return true;
    }
  else
    return false;
}

but we can simplify this to

bool sameAuthors (const Book& left, const Book& right)
{
  if (left.numberOfAuthors() == right.numberOfAuthors())
    {
     return equal(left.begin(), left.end(), right.begin());
    }
  else
    return false;
}

or, folding the if into a single boolean expression, to

bool sameAuthors (const Book& left, const Book& right)
{
  return (left.numberOfAuthors() == right.numberOfAuthors())
    &&  equal(left.begin(), left.end(), right.begin());
}

or, using the four-parameter form of equal, to

bool sameAuthors (const Book& left, const Book& right)
{
  return equal(left.begin(), left.end(), right.begin(), right.end());
}

I might prefer the next-to-last version as being slightly faster. It’s often a good practice to do the cheap $O(1)$ test first before diving in to a test that requires looping through the whole container.

Similarly, we could easily implement a comparison operator for sorting books by author lists via the function lexicographical_compare. See the references at the end of these notes for details.
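As a rough sketch of the idea, with a plain vector of names standing in for a book's author list (the typedef AuthorList and the function names are assumptions made for this illustration, not part of the Book class):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a book's author list: a vector of names.
typedef std::vector<std::string> AuthorList;

// True if left should come before right when sorting by author list.
bool authorsPrecede (const AuthorList& left, const AuthorList& right)
{
  return std::lexicographical_compare (left.begin(), left.end(),
                                       right.begin(), right.end());
}

void checkAuthorOrdering()
{
  AuthorList a;  a.push_back("Adams");
  AuthorList b;  b.push_back("Adams"); b.push_back("Baker");
  AuthorList c;  c.push_back("Clark");

  assert (authorsPrecede(a, b));   // a is a proper prefix of b
  assert (authorsPrecede(b, c));   // "Adams" < "Clark" decides it
  assert (!authorsPrecede(c, a));
  assert (!authorsPrecede(a, a));  // equal lists: neither precedes the other
}
```

The comparison works like dictionary ordering of words, with whole elements playing the role of letters.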

3.2 find and lower_bound

The find function performs an unordered sequential search for a value in some range of positions.

p = find(ws.begin(), ws.end(), "foobar");

searches a container for the indicated string.

If find cannot locate the indicated string, it returns the end position of the search range (e.g., ws.end() in the example above). Remember that iterator ranges are always inclusive on the starting position, exclusive on the ending position, so the end position of a search range could not possibly be returned by a successful search.

We can provide the equivalent of an ordered insert for containers that support the insert operations using another standard template function:

Container<Element> container;
   ⋮
cin >> x;
Container<Element>::iterator p = lower_bound (container.begin(), container.end(), x);
container.insert (p, x);

lower_bound returns the first location where x could be inserted if the collection is being maintained in sorted order.

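What makes lower_bound a really nice function is that it adapts to the iterators it is given. With random-access iterators (e.g., from a std::vector or an array) it performs a true binary search in $O(\log n)$ time. With merely forward or bi-directional iterators (e.g., from a std::list) it still makes only $O(\log n)$ comparisons, but it has to step through the range one position at a time, so its total time is $O(n)$, like a sequential search.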

There is a related function, upper_bound, which is called the same way, that returns the last position where key could be inserted. For example, suppose we had a container with 3 copies of key already in it. Then lower_bound would point to the first of these three, and upper_bound would point to the position just after the third copy.
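Here is that three-copies scenario as a sketch. The function name boundsAroundRepeatedKey and the sample data are invented for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

void boundsAroundRepeatedKey()
{
  int data[7] = {1, 3, 5, 5, 5, 7, 9};   // already sorted
  std::vector<int> v (data, data+7);

  std::vector<int>::iterator lo = std::lower_bound (v.begin(), v.end(), 5);
  std::vector<int>::iterator hi = std::upper_bound (v.begin(), v.end(), 5);

  assert (lo - v.begin() == 2);  // first of the three 5s
  assert (hi - v.begin() == 5);  // just after the third 5
  assert (hi - lo == 3);         // number of copies of the key

  // Inserting the key at lo (or at hi) keeps the vector sorted.
  v.insert (lo, 5);
  assert (v.size() == 8);
  assert (std::is_sorted(v.begin(), v.end()));
}
```

Both functions assume that the range is already sorted; on unsorted data the positions they return are meaningless.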

Sometimes we don’t need the position – we just want to know if the value is in there or not. If we know that we have a random access iterator, we can use binary_search, which looks like lower_bound but returns a boolean.

Container<Element> container;
   ⋮
cin >> x;
if (binary_search (container.begin(), container.end(), x))
{
    cout << "We found " << x << "!" << endl;
}

3.3 count

std::string a[50];
  ⋮
int n = count(a, a+50, "xxx");

counts the number of occurrences of "xxx" in the array a.

3.4 fill

std::string a[50];
fill_n (a, 5, "Hello");

fills the first 5 positions of a with "Hello".

fill(a+5, a+50, "GoodBye");

Fills the remaining positions of a with "GoodBye".
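The two fills can be checked with count. This is a small sketch; the function name fillThenCount is made up for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

void fillThenCount()
{
  std::string a[50];
  std::fill_n (a, 5, "Hello");         // positions 0..4
  std::fill (a+5, a+50, "GoodBye");    // positions 5..49

  assert (std::count(a, a+50, "Hello") == 5);
  assert (std::count(a, a+50, "GoodBye") == 45);
  assert (std::count(a, a+50, "xxx") == 0);
}
```

Note that fill_n takes a starting position and a count, while fill takes a range of positions.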

4 std:: Library Generics That Take Functions

4.1 Passing Functions as Parameters

In C++, we can pass functions as parameters to other functions. For example, this is legal (though a bit silly):

typedef int FunctionType (int);   // FunctionType is declared as a type name
                                  // for the set of all functions that
                                  // take 1 int and return an int.
int doItTwice(FunctionType f, int i)
{
  return f(f(i));
}

int mult2 (int x)  {return 2*x;}
  ⋮
int Twelve = doItTwice(mult2, 3);

The function doItTwice will actually call mult2 twice, passing the result of the first call as the parameter of the second.

Functions can be useful as parameters in various applications. They are often used in conjunction with templates.
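For instance, doItTwice could itself be written as a template, so that no typedef for the function type is needed. This is a sketch of one way to do it; addOne and callDoItTwice are names invented for the illustration.

```cpp
#include <cassert>

// Template variant of doItTwice: Function can be any type that can be
// called with one int and whose result can be passed back in.
template <typename Function>
int doItTwice (Function f, int i)
{
  return f(f(i));
}

int mult2 (int x)  {return 2*x;}
int addOne (int x) {return x+1;}

void callDoItTwice()
{
  assert (doItTwice(mult2, 3) == 12);   // 2 * (2 * 3)
  assert (doItTwice(addOne, 3) == 5);   // (3 + 1) + 1
}
```

This is the same arrangement that the std generics use when they accept a function as a parameter.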

4.2 for_each

Some of the most commonly used generics (in my own coding, only copy gets used more often) are those for applying a function to every element in a range.

There’s an old proverb that “if the only tool a man has is a hammer, then every problem looks like a nail.”

for_each is the “hammer” of generic functions. Given a sufficiently complicated function for its third parameter, you can use it to replace any loop and to do anything you might do with almost any of the other generic functions in std.

But that doesn’t mean you should wield this hammer on every problem you see. The purpose of using generics is to express the code in a way that makes that “instant recognition” of the purpose of a loop possible. So if the computation being performed is really a “copy”, or a search (“find”), or a transform, or a selective erasure, or any of the other more specialized kinds of iterations provided as std generics, then you should use those more specific functions rather than for_each.

When in doubt, choose the most expressive option, the one that gives the most information to the person reading your code.

for_each applies a unary function (i.e., a function taking a single parameter) to each element in a range.

void printLength(string s) 
{
  cout << s << " is of length " 
       << s.length() << endl;
}
  ⋮
for_each (ws.begin(), ws.end(), printLength);

In this case, if ws contains the words [“aardvarks”, “are”, “furry”], then the output would be:

aardvarks is of length 9
are is of length 3
furry is of length 5

4.3 transform

for_each applies a function to every element in a range and disregards the return values, if any.

transform, on the other hand, applies a function to every element in a range and collects the returned values by copying them into an output range.

#include <string>
  ⋮
int v[5] = {-1, 5, 5, 5, 8};
string s[5];
string (*intToString)(int) = to_string; // to_string is overloaded; select the int version
transform (v, v+5, s, intToString);

In this example, each of the five v values will be passed, one at a time, to to_string (a function that converts numbers to strings) and the five resulting string values, [“-1”, “5”, “5”, “5”, “8”], stored in s.

The first two parameters give the input range, and the third parameter is the beginning of the output range. We don’t have to specify the end of the output range, because there will be just as many outputs as there are input values. The final parameter is, of course, the function we want to apply to each element.

transform can be used to replace each element in a range by some function of itself. To do this, we simply make the output range the same as the input range.

For example, suppose we had an array myNumbers of N floating point numbers and were really interested in working with the absolute value of all those numbers. We could write

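double (*absoluteValue)(double) = fabs; // fabs is overloaded; select the double version (assuming an array of doubles)
transform(myNumbers, myNumbers+N, myNumbers, absoluteValue);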

and thereby replace every element in myNumbers by its absolute value (the fabs function).
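The output range of transform can also be an inserter, so that the results are collected in a container that grows as needed. The following is a sketch; lengthOf and lengths are hypothetical names chosen for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: the function applied to each element.
std::string::size_type lengthOf (const std::string& s)
{
  return s.size();
}

// Collects the length of every string in ws, in order.
std::vector<std::string::size_type> lengths (const std::vector<std::string>& ws)
{
  std::vector<std::string::size_type> result;
  std::transform (ws.begin(), ws.end(), std::back_inserter(result), lengthOf);
  return result;
}

void lengthsOfWords()
{
  std::vector<std::string> ws;
  ws.push_back("aardvarks"); ws.push_back("are"); ws.push_back("furry");
  std::vector<std::string::size_type> n = lengths(ws);
  assert (n.size() == 3);
  assert (n[0] == 9 && n[1] == 3 && n[2] == 5);
}
```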

4.4 "_if" Generics and Predicates

Many of the generic functions that do searching or comparisons of some kind (including some we have already discussed) have an alternate "_if" version that can take a function that is used in place of the obvious defaults.

For example, we earlier looked at this example of find:

list<string> ws;
    ⋮
p = find(ws.begin(), ws.end(), "foobar");

to search a container for a specific string.

Suppose, however, that we were interested in searching for a string that contained “foobar”. We could do this by supplying the appropriate test as a function:

bool containsFoobar (const string& s)
{
  return s.find("foobar") != string::npos;
}
    ⋮
list<string> ws;
    ⋮
p = find_if(ws.begin(), ws.end(), containsFoobar);

The function passed to find_if must return a bool, for reasons that should be apparent. This kind of function is sometimes called a predicate.

Or, suppose that we wanted to find out if any of the strings in our container were exactly one character long:

bool oneCharLong (const string& s)
{
  return s.size() == 1;
}
    ⋮
list<string> ws;
    ⋮
p = find_if(ws.begin(), ws.end(), oneCharLong);

There is also a useful variation find_if_not.

Similarly,

int k = count_if (ws.begin(), ws.end(), oneCharLong);

would count how many strings in the container ws are one character long.

We can use copy_if to copy selected strings to a new sequence:

list<string> shortStrings;
copy_if (ws.begin(), ws.end(), 
         back_inserter(shortStrings),
         oneCharLong);

would copy all of the 1-character strings from ws into a list.

And, the rather quixotically named

remove_copy_if (ws.begin(), ws.end(), 
                ostream_iterator<string>(cout, "\n"),
                oneCharLong);

copies all the words in ws except for the single-character ones, copying them to the output stream (i.e., writing them on cout, one per line).

There is also a remove_if function, but its name is misleading because it doesn’t actually “remove” anything. It just shifts the elements that are to be kept toward the front of the range and returns the position just past the last of them. The leftover positions at the end of the container can then be removed in one erase call.
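In practice, remove_if is almost always paired with the container's own erase function. Here is a sketch of that pairing for a vector of strings; dropOneCharStrings and checkDropOneCharStrings are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

bool oneCharLong (const std::string& s)
{
  return s.size() == 1;
}

// Removes every one-character string from ws.
void dropOneCharStrings (std::vector<std::string>& ws)
{
  // remove_if packs the kept elements at the front and returns the
  // position just past the last kept element ...
  std::vector<std::string>::iterator newEnd =
    std::remove_if (ws.begin(), ws.end(), oneCharLong);
  // ... and erase then discards everything from there to the old end.
  ws.erase (newEnd, ws.end());
}

void checkDropOneCharStrings()
{
  std::vector<std::string> ws;
  ws.push_back("a"); ws.push_back("bc"); ws.push_back("d"); ws.push_back("efg");
  dropOneCharStrings (ws);
  assert (ws.size() == 2);
  assert (ws[0] == "bc" && ws[1] == "efg");  // relative order is preserved
}
```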

4.5 all_of, any_of, none_of

These functions offer the equivalent of the logical quantifiers $\forall$, $\exists$, and $\not{\exists}$.

These take a range of positions to examine, and a unary predicate (“unary” == one parameter, “predicate” == function with a bool return type)

bool containsFoobar (const string& s)
{
  return s.find("foobar") != string::npos;
}
    ⋮
list<string> ws;
    ⋮
if (all_of(ws.begin(), ws.end(), containsFoobar))
    cout <<  "Every element in ws contains 'foobar'." << endl;
if (any_of(ws.begin(), ws.end(), containsFoobar))
    cout <<  "At least one element in ws contains 'foobar'." << endl;
if (none_of(ws.begin(), ws.end(), containsFoobar))
    cout <<  "No element in ws contains 'foobar'." << endl;

Of course, we could do these same tests using find_if and checking the position it returned, for example:

// These two tests are the same.
bool test1 = any_of(ws.begin(), ws.end(), containsFoobar);
bool test2 = find_if(ws.begin(), ws.end(), containsFoobar) != ws.end();

// These two tests are the same.
bool test3 = none_of(ws.begin(), ws.end(), containsFoobar);
bool test4 = find_if(ws.begin(), ws.end(), containsFoobar) == ws.end();

// These two tests are the same.
bool test5 = all_of(ws.begin(), ws.end(), containsFoobar);
bool test6 = find_if_not(ws.begin(), ws.end(), containsFoobar) == ws.end();

But the all_of, any_of, and none_of functions are easier to read and convey the programmer’s intent more clearly.

5 C++11 Lambda Expressions

One thing that you may have noticed is that using generics that take function parameters works only if we already have a suitable function or are willing to write one.

The need to provide these small functions can result in an explosion of short, used-only-one-time functions in our code. And, because C++ does not allow functions to be nested within other functions, these small functions will be separated from the generic function call that uses them. This can impair the readability of the code.

To address this problem, C++ allows you to write an anonymous description of a short function right in the place where you would call it or, more often, pass it to other functions. This description is called a lambda expression.

One of the most common mistakes that I see students make when trying to work with generics is to try and take a short-cut by simply writing an expression in place of a “proper” function. For example, instead of

bool oneCharLong (const string& s)
{
  return s.size() == 1;
}
    ⋮
list<string> ws;
    ⋮
p = find_if(ws.begin(), ws.end(), oneCharLong);

I often see students do

list<string> ws;
    ⋮
p = find_if(ws.begin(), ws.end(), s.size() == 1); // No!

which doesn’t work because “s” is undeclared.

In effect, the lambda expression provides a legal way to do what those students were attempting.

A lambda expression has three components:

capture-description parameter-list function-body

Of these, the parameter-list and the function-body are pretty much the same as they would be in an ordinary, non-member function.

Here’s our “find a single-character string” search using a lambda expression:

list<string> ws;
    ⋮
p = find_if(ws.begin(), ws.end(), 
           [] (const string& s) { return s.size() == 1; });

The capture-description explains what to do about variables that are “captured” – used in the function body but not passed as parameters. There are several options for what to do here, but the most common are likely to be:

- [] if the function body uses no variables other than its own parameters (such as s above) and its own local variables,
- [&] if the function should capture variables as references to identically named variables in the current scope,
- [this] for functions that should capture the this pointer of the current scope (in effect turning the lambda expression into a member function).
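As a sketch of the [&] option, here the lambda body refers to maxLength, which is a parameter of the enclosing function rather than of the lambda itself. The names countShort and checkCountShort are invented for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>

// Counts the strings in ws that are no longer than maxLength.
int countShort (const std::list<std::string>& ws,
                std::string::size_type maxLength)
{
  // [&] lets the body refer to maxLength from the enclosing function.
  // With [] instead, the use of maxLength would not compile.
  return static_cast<int>(
    std::count_if (ws.begin(), ws.end(),
      [&] (const std::string& s) { return s.size() <= maxLength; }));
}

void checkCountShort()
{
  std::list<std::string> ws;
  ws.push_back("a"); ws.push_back("tree"); ws.push_back("elephant");
  assert (countShort(ws, 4) == 2);
  assert (countShort(ws, 1) == 1);
  assert (countShort(ws, 0) == 0);
}
```

An ordinary function passed to count_if would have no way to receive maxLength, since count_if calls it with exactly one argument.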

What you don’t see in a lambda expression is a name for the function, because the whole point is to use these for one-shot functions that aren’t going to be referenced anywhere else in the program. Nor will you usually see a description of the function’s return type, because the compiler will deduce this from examining the return statements in the function body.

Here are some of the examples from the previous section, redone using lambda expressions:

before

void printLength(string s) 
{
  cout << s << " is of length " 
       << s.length() << endl;
}
  ⋮
for_each (ws.begin(), ws.end(), printLength);

with lambda

for_each (ws.begin(), ws.end(), 
  [] (string s) {
      cout << s << " is of length " 
       << s.length() << endl;
  });

before

string convertToString(int i) 
{
  ostringstream obuffer;  // declared in <sstream>
  obuffer << i;
  return obuffer.str();
}
  ⋮
int v[5] = {-1, 5, 5, 5, 8};
string s[5];
transform (v, v+5, s, convertToString);

lambda

int v[5] = {-1, 5, 5, 5, 8};
string s[5];
transform (v, v+5, s,
  [] (int i) {
  ostringstream obuffer;  // declared in <sstream>
  obuffer << i;
  return obuffer.str();
});

before

bool containsFoobar (const string& s)
{
  return s.find("foobar") != string::npos;
}
    ⋮
list<string> ws;
    ⋮
p = find_if(ws.begin(), ws.end(), 
       containsFoobar);

lambda

list<string> ws;
    ⋮
p = find_if(ws.begin(), ws.end(), 
      [] (const string& s) {
        return s.find("foobar") 
                  != string::npos;
});

Lambda expressions are something of an acquired taste. Some people use them all the time. Others feel that they complicate the code, arguing, as we did earlier, that it makes more sense to pull that code out into a separate function so that the function name can serve as documentation of what it actually does.

6 Functors

Function types seem a bit awkward, and there are times when we want to pass a “behavior” to a function or to save a “behavior” in a data structure, but a true function or a lambda expression just won’t do. That takes us into the strange and peculiar idiom of programming called “functors”.

A functor is an object that is created to simulate a function. Why would we want to do that? Well, objects can do things that functions can’t. Objects can store information, and can be stored in other data structures. Functions can’t do these, at least not with the same ease and flexibility. A functor can often have the best of both worlds.

6.1 Example: Functors and User Interface Programming

An example of functors can be found in many windowing libraries. Think of the problem of building a menu bar, like the one you see across the top of most windows in user interfaces. A menu bar is probably just an ordered collection of menus:

class MenuBar {
  ⋮
  vector<Menu> menus;
  ⋮
};

A menu would have a name (e.g., “File”, “Edit”), but would itself contain a number of menu items.

class Menu {
  ⋮
  string menuName;
  vector<MenuItem> items;
  ⋮
};

And MenuItems? Well, they certainly have names, but they also will typically have a place to store an object or a pointer to an object that actually performs the desired function.

class MenuItemAction {
public:
   virtual void perform() {/* by default, do nothing */}
};

class MenuItem {
  string itemName;
  MenuItemAction* action;  // a pointer, so that subclass actions are not sliced
public:
  MenuItem (string name, MenuItemAction& act)
    : itemName (name), action(&act)  {}
  void setAction (MenuItemAction& act) {action = &act;}
  void itemWasSelected () {action->perform();}
};

When a menu is being built, the MenuItems are created with appropriate actions:

class FileReader: public MenuItemAction
{
   void perform() 
   {
    ⋮
       ...code to read from a file ...
    ⋮
   }
};
class FileSaver: public MenuItemAction
{
   void perform() 
   {
    ⋮
       ...code to write to a file ...
    ⋮
   }
};
class Quitter: public MenuItemAction
{
   void perform() 
   {
    ⋮
       ...code to close window and shut down program ...
    ⋮
   }
};
  ⋮
FileReader rdr;
FileSaver svr;
Quitter quit;
// build a typical file menu
fileMenu.items.push_back (MenuItem("load", rdr));
fileMenu.items.push_back (MenuItem("save", svr));
fileMenu.items.push_back (MenuItem("exit", quit));

When a user actually selects one of these items from the menu, the windowing system calls the item’s itemWasSelected function, which in turn calls the perform() function of its action.

rdr, svr, and quit are examples of functors; they are objects created for the sole purpose of providing a single function, which in this case is called perform.
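The menu idea can be tried out without a real windowing library. The following is only a minimal sketch under some assumptions: LogAction is a hypothetical stand-in for FileReader, FileSaver, and Quitter that merely records a message, the name() accessor and simulateSelections are invented for the illustration, and each MenuItem is assumed to hold a pointer to its action so that the derived class's perform() is the one that runs.

```cpp
#include <string>
#include <vector>

// Base class for menu actions; by default, do nothing.
class MenuItemAction {
public:
  virtual ~MenuItemAction() {}
  virtual void perform() {}
};

// Hypothetical stand-in for FileReader, FileSaver, etc.:
// it only records what was requested.
class LogAction : public MenuItemAction {
  std::string message;
  std::vector<std::string>& log;
public:
  LogAction(const std::string& msg, std::vector<std::string>& theLog)
    : message(msg), log(theLog) {}
  void perform() { log.push_back(message); }
};

class MenuItem {
  std::string itemName;
  MenuItemAction* action;   // a pointer, so the derived perform() is called
public:
  MenuItem(const std::string& name, MenuItemAction& act)
    : itemName(name), action(&act) {}
  const std::string& name() const { return itemName; }
  void itemWasSelected() { action->perform(); }
};

// Simulates the user choosing "load" and then "exit",
// and returns the messages recorded by the actions.
std::vector<std::string> simulateSelections()
{
  std::vector<std::string> log;
  LogAction rdr("reading file", log);
  LogAction quit("shutting down", log);

  std::vector<MenuItem> items;
  items.push_back(MenuItem("load", rdr));
  items.push_back(MenuItem("exit", quit));

  items[0].itemWasSelected();
  items[1].itemWasSelected();
  return log;
}
```

The MenuItem code never needs to know which kind of action it holds; it just calls perform() and the functor does the rest.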

6.2 operator()

Now in the previous example, the functors are called by calling their perform function member. But C++ has special support for functors. We can write functors that are called just like regular functions. We do this by defining an operator().

Now, we’ve seen that you can define operators like <, ==, =, and *.

But it is a truly strange feature of C++ that () is considered an operator. It is a postfix operator (appearing to the right of the object that it operates on, e.g., x()). And, it can be defined to take any number of parameters of any legal C++ type, e.g., x(23,"abcdef").

So when you see something like w(z) written in C++, the only way to tell if you are looking at

a function w applied to a parameter z, or
a call to the operator() member of an object w

is to find the declaration of w and see if it really is a function or an object.
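To make the ambiguity concrete, here is a small sketch. The names twice, Scaler, and callBoth are invented for this illustration. The function and the object are called with exactly the same syntax, and the class overloads operator() for two different parameter lists.

```cpp
// An ordinary function.
int twice(int z) { return 2 * z; }

// A class whose objects can be called like functions.
class Scaler {
  int factor;
public:
  explicit Scaler(int f) : factor(f) {}

  // One-parameter call: w(z)
  int operator()(int z) const { return factor * z; }

  // Two-parameter call: w(a, b)
  int operator()(int a, int b) const { return factor * (a + b); }
};

int callBoth()
{
  Scaler w(3);
  int a = twice(5);   // an ordinary function call
  int b = w(5);       // really w.operator()(5)
  int c = w(1, 2);    // really w.operator()(1, 2)
  return a + b + c;
}
```

Nothing at the call sites twice(5) and w(5) reveals which is the function and which is the object; only the declarations do.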

6.3 A Predicate Functor

Let’s see how this works. When we introduced the notion of iterators, we looked briefly at the standard function template find_if.

The code shown here, for example, searches a vector for the first string containing no more than 4 characters.

vector<string> v;
  ⋮
bool isShort(const std::string& s)
{
  return (s.size() <= 4);
}
  ⋮
p = find_if(v.begin(), v.end(), isShort);

Now, let’s write the same thing replacing isShort by a functor.

vector<string> v;
  ⋮
class IsShort {
public:
  bool operator() (const std::string& s)
    {
     return (s.size() <= 4);
    }
};
IsShort isShort;
  ⋮
p = find_if(v.begin(), v.end(), isShort);

6.4 Why Do Functors Work Where Functions Are Expected?

The code for find_if looks like:

template <class InputIterator, class Predicate>
InputIterator find_if(InputIterator first, InputIterator last,
                      Predicate pred) {
  while (first != last && !pred(*first)) ++first;
  return first;
}

So when we call find_if( ... ,isShort), isShort is passed to find_if as the parameter pred, and the body of find_if calls pred(*first).

In the original version, isShort (and therefore pred) was a function, so pred(*first) was an ordinary function call.

Now, however, isShort is an object that happens to define an operator() taking a single string parameter, so pred(*first) is a call to pred’s operator().
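The following sketch shows both cases side by side. It uses a hand-written copy of the template, renamed myFindIf to avoid clashing with the standard one; isShortFn and the two helper functions are likewise names made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same logic as the find_if shown above, under a different name.
template <class InputIterator, class Predicate>
InputIterator myFindIf(InputIterator first, InputIterator last,
                       Predicate pred)
{
  while (first != last && !pred(*first)) ++first;
  return first;
}

// Predicate written as an ordinary function.
bool isShortFn(const std::string& s) { return s.size() <= 4; }

// The same predicate written as a functor.
class IsShort {
public:
  bool operator()(const std::string& s) const { return s.size() <= 4; }
};

// Index of the first short string; here Predicate is deduced
// to be a pointer-to-function type.
std::size_t firstShortViaFunction(const std::vector<std::string>& v)
{
  return static_cast<std::size_t>(
      myFindIf(v.begin(), v.end(), isShortFn) - v.begin());
}

// Same search; here Predicate is deduced to be the class IsShort.
std::size_t firstShortViaFunctor(const std::vector<std::string>& v)
{
  return static_cast<std::size_t>(
      myFindIf(v.begin(), v.end(), IsShort()) - v.begin());
}
```

The template body is compiled separately for each Predicate type, so pred(*first) becomes a function call in one instantiation and a call to operator() in the other.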

6.5 Functors Can Have Memory

OK, so what? What does the isShort functor do that the isShort function did not? Absolutely nothing.

But now, suppose that we’re not always interested in strings of length 4 or less. Sometimes we may want strings of length 2 or less, or 8 or less, …

vector<string> v;
  ⋮
int length;
cout << "What's the longest acceptable string length?" << flush;
cin >> length;
p = find_if(v.begin(), v.end(), ????);

There’s no good way to write an ordinary function that we can pass to find_if that would search for strings of variable lengths. But a functor can fill the bill very nicely, by storing the critical length inside the object.

vector<string> v;
  ⋮
class IsShort {
  int length;
public:
  IsShort (int len): length(len) {}

  bool operator() (const std::string& s)
    {
     return (s.size() <= length);
    }
};

vector<string> v;
  ⋮
int length;
cout << "What's the longest acceptable string length?" << flush;
cin >> length;
IsShort isShort (length);    ➀
p = find_if(v.begin(), v.end(), isShort);  ➁

Once we know the length we want to hunt for, we create an object that remembers that length (➀) and that, when its operator() is called with some string, compares that string’s length to the value it has saved.

Then we can pass that object to find_if to be applied to every element in the range we are searching (➁).

Finally, we note that we can actually do without the isShort variable by using the constructor to create a temporary functor object to pass to find_if.

vector<string> v;
  ⋮
class IsShort {
  int length;
public:
  IsShort (int len): length(len) {}

  bool operator() (const std::string& s)
    {
     return (s.size() <= length);
    }
};

vector<string> v;
  ⋮
int length;
cout << "What's the longest acceptable string length?" << flush;
cin >> length;

p = find_if(v.begin(), v.end(), IsShort(length));
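Here is a self-contained sketch of the same idea that can be compiled and tested without console input. The helper functions firstShort and firstShortLambda are hypothetical names introduced for this example, and the length is stored as std::size_t (an assumption made here to avoid comparing signed and unsigned values). The second helper shows that a C++11 lambda with a capture can play the same role; such a lambda is, in effect, a functor that the compiler writes for us.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Functor that remembers the longest acceptable length.
class IsShort {
  std::size_t length;
public:
  explicit IsShort(std::size_t len) : length(len) {}
  bool operator()(const std::string& s) const { return s.size() <= length; }
};

// Returns the first string with at most maxLen characters,
// or an empty string if there is none.
std::string firstShort(const std::vector<std::string>& v, std::size_t maxLen)
{
  std::vector<std::string>::const_iterator p =
      std::find_if(v.begin(), v.end(), IsShort(maxLen));
  return (p != v.end()) ? *p : std::string();
}

// The same search using a lambda that captures maxLen by value.
std::string firstShortLambda(const std::vector<std::string>& v,
                             std::size_t maxLen)
{
  auto p = std::find_if(v.begin(), v.end(),
      [maxLen](const std::string& s) { return s.size() <= maxLen; });
  return (p != v.end()) ? *p : std::string();
}
```

Calling firstShort with different values of maxLen reuses the same class; only the data stored inside the temporary functor changes.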

6.6 std Functors for Comparisons

The C++ standard library provides a number of functors. The most commonly used are functors for comparing pairs of objects using the relational operators.

Suppose we have a vector of strings that we want to sort into ascending order. The standard sort function takes three parameters. The first two are iterators denoting the range of items to be sorted. The third is a comparison function or functor that is used to compare two objects and return “true” if the first object should come before the second in the desired sorted order.

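We want the strings arranged into ascending order, so we would like to use the ordinary < for comparisons. We might be tempted to write this:

sort(v.begin(), v.end(), operator<);

taking advantage of the “real” name of the less-than operator. This is fragile at best. operator< names a whole family of overloaded functions, so the compiler generally cannot tell which one we mean (this is the case for strings, whose operator< is a function template), and it cannot work at all for data types for which operator< is a member function.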

A safer alternative, which works for any data type that provides an operator< (whether as a member function or as a standalone function), is the standard library’s less<T>, so we could write:

sort (v.begin(), v.end(), less<string>());

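less is not particularly complicated. A traditional implementation looks like this (the binary_function base class is a legacy helper that was deprecated in C++11 and removed in C++17):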

template <class T>
struct less : public binary_function<T, T, bool> {
  bool operator()(const T& x, const T& y) const 
  { return x < y; }
};

As you can see, the less class simply provides an operator() that uses the < operator on its parameters.

In addition to less, the standard library provides equal_to, not_equal_to, greater, greater_equal, and less_equal, all declared in the header <functional>.
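As a quick illustration, the sketch below sorts the same strings in both directions simply by swapping the comparison functor. The function names sortedAscending and sortedDescending are invented for this example; each takes its vector by value so the caller's data is left untouched.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Sort a copy of v using < (via std::less).
std::vector<std::string> sortedAscending(std::vector<std::string> v)
{
  std::sort(v.begin(), v.end(), std::less<std::string>());
  return v;
}

// Sort a copy of v using > (via std::greater), giving descending order.
std::vector<std::string> sortedDescending(std::vector<std::string> v)
{
  std::sort(v.begin(), v.end(), std::greater<std::string>());
  return v;
}
```

The sorting algorithm itself is identical in both cases; only the functor passed as the third argument differs.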

A bit of a challenge: what does this function do, and can you think of a better name for it?

template <typename T>
void mysteryFunction (list<T>& aList, const T& x)
{
  typename list<T>::iterator pos = find_if (aList.begin(), aList.end(),
                                            bind2nd(greater<T>(), x));
  aList.insert (pos, x);
}

bind2nd turns a two-parameter functor into a one-parameter functor by supplying a fixed value for the second parameter.
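A binder of this kind is itself just a functor with memory. The sketch below is a simplified, hypothetical version written for illustration (SecondBound, bindSecond, and countBelow are made-up names, and this is not the library's actual implementation). It avoids the standard binder itself, which was deprecated in C++11 and removed in C++17, so it should compile with current compilers.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Holds a two-parameter functor plus a fixed value for its second parameter.
template <class Op, class T>
class SecondBound {
  Op op;
  T second;
public:
  SecondBound(Op theOp, const T& value) : op(theOp), second(value) {}

  // Called with one argument; supplies the remembered second argument.
  template <class U>
  bool operator()(const U& first) const { return op(first, second); }
};

// Helper so that callers need not spell out the template arguments.
template <class Op, class T>
SecondBound<Op, T> bindSecond(Op op, const T& value)
{
  return SecondBound<Op, T>(op, value);
}

// Counts the elements e for which less<int>()(e, limit) is true,
// that is, the elements smaller than limit.
std::ptrdiff_t countBelow(const std::vector<int>& v, int limit)
{
  return std::count_if(v.begin(), v.end(),
                       bindSecond(std::less<int>(), limit));
}
```

In modern code the same effect is usually obtained with a lambda expression that captures the fixed value.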

If you change “list” to “vector”, the code would still compile and would run correctly. Why would that be a bad idea?

6.7 Substituting Your Own Comparison Functions

Sometimes none of the standard relations will do. In those cases, we just define our own functors.

Suppose you were keeping a vector of PersonnelRecord, but there is no < for entire PersonnelRecords.

You need to pick some appropriate key, a data member or group of members that uniquely identifies each record.

For example, we might use a combination of name and address.

Provide a functor that compares the keys:

class CompareByNameAddress {
public:
   // const, so the functor can be used through a const comparison object
   bool operator()
    (const PersonnelRecord& p1,
     const PersonnelRecord& p2) const
   {return (p1.name() < p2.name())
        || ((p1.name() == p2.name())
            && (p1.address() < p2.address()));}
};

set<PersonnelRecord, CompareByNameAddress> employees;

Then

sort (v.begin(), v.end(), CompareByNameAddress() );

would sort your personnel records into order by name, with any people with the same name being sorted by address.

The () following “CompareByNameAddress” in the call above are important. CompareByNameAddress is a class, but we don’t pass classes as parameters to functions, we pass objects. So what is CompareByNameAddress()? It’s a call to the default constructor for the CompareByNameAddress class, which returns an object of that type which, in turn, we pass to the sort function.
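To try this out, we need some PersonnelRecord class. The one below is a hypothetical, minimal stand-in with just the name() and address() accessors that the comparison uses; sortedByNameAddress and countDistinct are likewise helper names invented for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical minimal record: only the two key fields.
class PersonnelRecord {
  std::string theName;
  std::string theAddress;
public:
  PersonnelRecord(const std::string& n, const std::string& a)
    : theName(n), theAddress(a) {}
  const std::string& name() const { return theName; }
  const std::string& address() const { return theAddress; }
};

// Orders records by name, breaking ties by address.
class CompareByNameAddress {
public:
  bool operator()(const PersonnelRecord& p1,
                  const PersonnelRecord& p2) const
  {
    return (p1.name() < p2.name())
        || ((p1.name() == p2.name())
            && (p1.address() < p2.address()));
  }
};

// Passes a temporary functor object to sort.
std::vector<PersonnelRecord> sortedByNameAddress(std::vector<PersonnelRecord> v)
{
  std::sort(v.begin(), v.end(), CompareByNameAddress());
  return v;
}

// Passes the functor's class as a template argument to set, which
// treats two records as duplicates when neither compares before the other.
std::size_t countDistinct(const std::vector<PersonnelRecord>& v)
{
  std::set<PersonnelRecord, CompareByNameAddress> employees(v.begin(), v.end());
  return employees.size();
}
```

Note the difference between the two uses: sort receives an object, CompareByNameAddress(), while set receives the class name as a template argument and constructs its own comparison object.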

7 References

This has not been an exhaustive list of all the generic functions in the C++ standard library. There are many others, but this should be enough to give you a taste. Others are scattered through your textbook.

For a more compact listing, look at this summary sheet.
