Comparing Data

Steven J. Zeil

Last modified: Oct 26, 2023

Most C++ classes will need to provide for comparisons: Are these two things equal? Does one of them come before the other?

Comparisons fall into the set of functions that you should consider providing even if you don’t think your application needs them.

1 operator==

After assignment operators and I/O, the most commonly programmed operators would be the relational operators, especially == and <.

class Address
{
   ⋮
  bool operator== (const Address&) const;
   ⋮
 private:
     std::string street;
     std::string city;
     std::string state;
     std::string zip;
};

The trickiest thing about providing these operators is making sure we understand just what they should mean for each individual ADT. For example, if I were to write

if (address1 == address2)

what would I expect to be true of two Addresses that passed this test? Probably that the two addresses would have the same street, city, state, and zip - in other words, that all the data fields should themselves be equal. In that case, this would be a reasonable implementation:

bool Address::operator== (const Address& right) const
{
  return (street == right.street)
      && (city   == right.city)
      && (state  == right.state)
      && (zip    == right.zip);
}

We could do something similar for Author:

class Author {
   ⋮
  bool operator== (const Author&) const;
   ⋮
 private:
   std::string name;
   Address address;
};

bool Author::operator== (const Author& right) const
{
  return (name       == right.name)
      && (address    == right.address);
}

which, interestingly, makes use of the Address::operator== that we have just defined.
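
As a minimal, self-contained sketch of this member-by-member pattern, the version below uses simplified structs with public members and aggregate initialization. Both are assumptions made to keep the example short, and the address values are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for the classes above. Public members and
// aggregate initialization are assumptions of this sketch.
struct Address {
    std::string street, city, state, zip;

    bool operator==(const Address& right) const {
        return street == right.street && city == right.city
            && state == right.state && zip == right.zip;
    }
};

struct Author {
    std::string name;
    Address address;

    bool operator==(const Author& right) const {
        // The address comparison delegates to Address::operator==.
        return name == right.name && address == right.address;
    }
};

// Two authors with the same name but different cities should not be equal.
void checkAuthorEquality() {
    Author a{"Twain, Mark", {"123 Main St", "Norfolk", "VA", "23529"}};  // made-up address
    Author b = a;
    assert(a == b);
    b.address.city = "Richmond";
    assert(!(a == b));
}
```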

In both of these examples, we have decided that two objects were equal if all of their data members were equal. But that isn’t a universal law. Sometimes we have data members that we don’t want to employ in such comparisons. Sometimes we have data members that simply don’t need to be checked in such comparisons.

For example, consider a Book:

class Book {
   ⋮
  bool operator== (const Book&) const;
   ⋮
 private:
   std::string title;
   static const int MaxAuthors = 12;
   const int numAuthors;
   Author* authors; // pointer to array
   std::string isbn;
};

We could certainly follow the same patterns as the first two examples of comparing every data member (complicated, slightly, by the fact that one of our data members is a pointer to an array, so we need a loop to compare each of the authors):

bool Book::operator== (const Book& right) const
{
  if (title != right.title || isbn != right.isbn || numAuthors != right.numAuthors)
     return false;
  for (int i = 0; i < numAuthors; ++i)
     if (!(authors[i] == right.authors[i]))
         return false;
  return true;
}

(We don’t need to compare MaxAuthors because it is static. There is only a single MaxAuthors value that is shared by all Books.)

However, in the case of books, a little knowledge of the real world can simplify things. The ISBN (International Standard Book Number) is a unique[1] identifier assigned to a book by its publisher. That means that we can actually tell if two Book values are intended to represent the same book by simply comparing the ISBNs:

bool Book::operator== (const Book& right) const
{
  return isbn == right.isbn;
}
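
One consequence of this design choice is worth seeing in action: two Book values whose other fields differ still compare equal when their ISBNs match. The sketch below assumes a simplified Book with public members, and the ISBN strings are placeholders rather than real ISBNs.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for Book. Public members are an assumption of this sketch.
struct Book {
    std::string title;
    std::string isbn;

    bool operator==(const Book& right) const { return isbn == right.isbn; }
};

void checkIsbnEquality() {
    // Placeholder ISBN strings, not real ISBNs.
    Book a{"Tom Sawyer", "000-0000000000"};
    Book b{"The Adventures of Tom Sawyer", "000-0000000000"};
    Book c{"Tom Sawyer", "000-0000000001"};

    assert(a == b);     // same ISBN, different title: treated as the same book
    assert(!(a == c));  // same title, different ISBN: treated as different books
}
```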

2 operator<

C++ class designers provide operator< whenever possible, and many code libraries assume that this operator is available.

Again, just what this should mean depends upon the uses we intend to make of our ADT, but there are a few hard and fast rules:

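We should always design operator< so that it is consistent with operator==: for any two values a and b, exactly one of a < b, a == b, and b < a should be true.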

In practical terms, this means that an implementation of operator< needs to look at exactly the same data members as the accompanying operator==. Most implementations of < follow a strategy of

  1. Deciding on the order in which to check the data members.
  2. Comparing two data members at a time, in that order. If they are equal, then we have a “tie” and move on to the next data member to serve as the “tie breaker”. If we have checked all of the data members, and they were all equal, then both objects are equal and we return false. (Remember: we are implementing <, and we have just discovered that the object on the left is not less than the object on the right.)
  3. On finding two data members that are not equal, comparing them again to see if the one from the object on the left is less than the one from the object on the right, and returning that result.

Thus we can get:

bool Address::operator< (const Address& right) const
{
  if (street != right.street)
      return street < right.street;
  else if (city != right.city)
      return city < right.city;
  else if (state  != right.state)
      return state < right.state;
  else
      return zip < right.zip;
}

bool Author::operator< (const Author& right) const
{
  if (name != right.name)
      return name < right.name;
  else
      return address < right.address;
}

bool Book::operator< (const Book& right) const
{
  return isbn < right.isbn;
}
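
The tie-breaking strategy can also be written more compactly. The sketch below places a hand-written comparison next to one built on std::tie from <tuple>, which compares tuples of references member by member in the order listed. The std::tie version is a standard-library alternative, not the approach used in these notes. The free-function names and the struct with public members are assumptions of this sketch, and the addresses are made up.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Simplified stand-in for Address. Public members are an assumption.
struct Address {
    std::string street, city, state, zip;
};

// Hand-written tie-breaking: the first unequal member decides the result.
bool lessByHand(const Address& left, const Address& right) {
    if (left.street != right.street) return left.street < right.street;
    if (left.city   != right.city)   return left.city   < right.city;
    if (left.state  != right.state)  return left.state  < right.state;
    return left.zip < right.zip;
}

// The same ordering via std::tie, which compares lexicographically.
bool lessByTie(const Address& left, const Address& right) {
    return std::tie(left.street, left.city, left.state, left.zip)
         < std::tie(right.street, right.city, right.state, right.zip);
}

void checkOrderingsAgree() {
    Address a{"1 Elm St", "Norfolk", "VA", "23529"};
    Address b{"1 Elm St", "Richmond", "VA", "23529"};  // tie on street, city decides
    Address c{"2 Oak Ave", "Albany", "NY", "12207"};   // street decides

    assert(lessByHand(a, b) && lessByTie(a, b));
    assert(lessByHand(a, c) && lessByTie(a, c));
    assert(!lessByHand(a, a) && !lessByTie(a, a));     // equal objects: not less
}
```

Either form checks the same members in the same order, so the two functions should always agree.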

2.1 operator< and sorting

If it strikes you as strange to ask whether one book, or address, or author is “less than” another, it may help to think of < as meaning “comes before”. We can think of putting books in alphabetic order by title, in which case it certainly makes sense that we could ask whether one book should appear before the other in our ordering.

But we might also want to put books in order by their authors, using the title to break ties.

There is a useful utility function, sort, in <algorithm>, that sorts an array of items. To do that, it uses operator< to decide which values should come before which other values.

Suppose we had the following code:

Author poe ("Poe, Edgar Allen");
Author twain ("Twain, Mark");
Book heart ("The Tell-Tale Heart");
heart.setISBN("978-1592181667");
heart.addAuthor(poe);
Book masque ("The Masque of the Red Death");
masque.addAuthor(poe);
masque.setISBN("978-9387779709");
Book sawyer ("Tom Sawyer");
sawyer.addAuthor(twain);
sawyer.setISBN("979-8749478112");
Book yankee ("A Connecticut Yankee in King Arthur's Court");
yankee.addAuthor(twain);
yankee.setISBN("978-0486415918");

Book books[4] = {heart, masque, sawyer, yankee};

sort (books, books+4); // uses Book::operator<

for (int i = 0; i < 4; ++i)
    cout << books[i] << endl;

Depending upon the output formatting chosen for the Book output operator, if our comparison operators were:

bool Book::operator== (const Book& right) const
{
  return isbn == right.isbn;
}

bool Book::operator< (const Book& right) const
{
  return isbn < right.isbn;
}

we might then see this output:

Twain, Mark        A Connecticut Yankee in King Arthur's Court 978-0486415918
Poe, Edgar Allen   The Tell-Tale Heart                         978-1592181667
Poe, Edgar Allen   The Masque of the Red Death                 978-9387779709
Twain, Mark        Tom Sawyer                                  979-8749478112

The books are in ascending order by ISBN.
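
For readers who want to try this, here is a minimal, self-contained sketch of the same sort. It assumes a stripped-down Book holding only a title and an ISBN, with public members; the function name is hypothetical. The titles and ISBN strings are the ones used in the example above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stripped-down stand-in for Book: only the members this example needs.
struct Book {
    std::string title;
    std::string isbn;

    bool operator<(const Book& right) const { return isbn < right.isbn; }
};

// Sorts the four example books with std::sort and returns their titles in order.
std::vector<std::string> titlesSortedByIsbn() {
    Book books[4] = {
        {"The Tell-Tale Heart", "978-1592181667"},
        {"The Masque of the Red Death", "978-9387779709"},
        {"Tom Sawyer", "979-8749478112"},
        {"A Connecticut Yankee in King Arthur's Court", "978-0486415918"}
    };

    std::sort(books, books + 4);  // uses Book::operator<
    assert(std::is_sorted(books, books + 4));

    std::vector<std::string> titles;
    for (const Book& b : books)
        titles.push_back(b.title);
    return titles;
}
```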

Changing the < operator implementation would change the order. For example, if we wrote:

bool Book::operator== (const Book& right) const
{
  return isbn == right.isbn;
}

bool Book::operator< (const Book& right) const
{
  return isbn > right.isbn;
}

then we are saying that one book comes before another if its ISBN is greater, and the output changes to

Twain, Mark        Tom Sawyer                                  979-8749478112
Poe, Edgar Allen   The Masque of the Red Death                 978-9387779709
Poe, Edgar Allen   The Tell-Tale Heart                         978-1592181667
Twain, Mark        A Connecticut Yankee in King Arthur's Court 978-0486415918

And if we change our comparison operators to use the other data members,

bool Book::operator== (const Book& right) const
{
  if (title != right.title || isbn != right.isbn || numAuthors != right.numAuthors)
     return false;
  for (int i = 0; i < numAuthors; ++i)
     if (!(authors[i] == right.authors[i]))
         return false;
  return true;
}

bool Book::operator< (const Book& right) const
{
  if (title != right.title)
      return title < right.title;
  for (int i = 0; i < min(numAuthors, right.numAuthors); ++i)
      if (!(authors[i] == right.authors[i]))
          return authors[i] < right.authors[i];
  if (numAuthors != right.numAuthors)
      return numAuthors < right.numAuthors;
  return isbn < right.isbn;
}

the output changes to

Twain, Mark        A Connecticut Yankee in King Arthur's Court 978-0486415918
Poe, Edgar Allen   The Masque of the Red Death                 978-9387779709
Poe, Edgar Allen   The Tell-Tale Heart                         978-1592181667
Twain, Mark        Tom Sawyer                                  979-8749478112

Because titles are being compared first, the sort orders the books by title.

And if we compare authors before titles:

bool Book::operator< (const Book& right) const
{
  for (int i = 0; i < min(numAuthors, right.numAuthors); ++i)
      if (!(authors[i] == right.authors[i]))
          return authors[i] < right.authors[i];
  if (numAuthors != right.numAuthors)
      return numAuthors < right.numAuthors;
  if (title != right.title)
      return title < right.title;
  return isbn < right.isbn;
}

the output would be

Poe, Edgar Allen   The Masque of the Red Death                 978-9387779709
Poe, Edgar Allen   The Tell-Tale Heart                         978-1592181667
Twain, Mark        A Connecticut Yankee in King Arthur's Court 978-0486415918
Twain, Mark        Tom Sawyer                                  979-8749478112

Now the books are ordered by authors, with “ties” broken by looking at the titles.
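
The authors-first ordering can be tried out with the sketch below. It is a simplification under stated assumptions: authors are represented by their names in a std::vector<std::string> rather than by an array of Author objects, members are public, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for Book. Author names replace the Author array.
struct Book {
    std::string title;
    std::vector<std::string> authors;
    std::string isbn;

    bool operator<(const Book& right) const {
        // Compare authors position by position over the shared prefix.
        std::size_t n = std::min(authors.size(), right.authors.size());
        for (std::size_t i = 0; i < n; ++i)
            if (authors[i] != right.authors[i])
                return authors[i] < right.authors[i];
        // Shared authors all match: the book with fewer authors comes first.
        if (authors.size() != right.authors.size())
            return authors.size() < right.authors.size();
        // Same authors: break the tie by title, then by ISBN.
        if (title != right.title)
            return title < right.title;
        return isbn < right.isbn;
    }
};

std::vector<std::string> titlesSortedByAuthorThenTitle() {
    std::vector<Book> books = {
        {"The Tell-Tale Heart", {"Poe, Edgar Allen"}, "978-1592181667"},
        {"The Masque of the Red Death", {"Poe, Edgar Allen"}, "978-9387779709"},
        {"Tom Sawyer", {"Twain, Mark"}, "979-8749478112"},
        {"A Connecticut Yankee in King Arthur's Court", {"Twain, Mark"}, "978-0486415918"}
    };

    std::sort(books.begin(), books.end());
    assert(std::is_sorted(books.begin(), books.end()));

    std::vector<std::string> titles;
    for (const Book& b : books)
        titles.push_back(b.title);
    return titles;
}
```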

3 The Other Relational Operators

You might wonder why we put all of our focus on the == and < operators and ignored the others (>, !=, <= and >=).

Did you notice, in our earlier code, the slight inconsistency in our treatment of authors?

bool Book::operator< (const Book& right) const
{
  if (title != right.title)
      return title < right.title;
  for (int i = 0; i < min(numAuthors, right.numAuthors); ++i)
      if (!(authors[i] == right.authors[i]))
          return authors[i] < right.authors[i];
  if (numAuthors != right.numAuthors)
      return numAuthors < right.numAuthors;
  return isbn < right.isbn;
}

Obviously, saying !(x == y) is logically equivalent to saying x != y, but we actually could not say authors[i] != right.authors[i] because we had not defined an operator!= function for the class Author.

The authors of the C++ standard library have, in general, been very careful to use only == and < in functions like sort, so that programmers designing new classes need to provide only those two operations. But if you feel that your code would be easier to read and write with a full complement of relational operators, that’s easy enough to do. There is a set of function templates in <utility> that define each of the remaining four relational operators in terms of < and ==. These are kept in a special namespace, std::rel_ops (deprecated as of C++20). So we could have written:

bool Book::operator< (const Book& right) const
{
  using namespace std::rel_ops;

  if (title != right.title)
      return title < right.title;
  for (int i = 0; i < min(numAuthors, right.numAuthors); ++i)
      if (authors[i] != right.authors[i])
          return authors[i] < right.authors[i];
  if (numAuthors != right.numAuthors)
      return numAuthors < right.numAuthors;
  return isbn < right.isbn;
}

Not a big deal, but it is sometimes convenient.
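
To see how four operators can be derived from just == and <, the sketch below hand-writes equivalent templates in a hypothetical namespace, sketch_rel_ops. The definitions mirror the way std::rel_ops expresses each operator, but the namespace name and the simplified Author struct are assumptions of this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical namespace: each remaining operator is expressed
// using only == and <, as std::rel_ops does.
namespace sketch_rel_ops {
    template <class T> bool operator!=(const T& a, const T& b) { return !(a == b); }
    template <class T> bool operator> (const T& a, const T& b) { return b < a; }
    template <class T> bool operator<=(const T& a, const T& b) { return !(b < a); }
    template <class T> bool operator>=(const T& a, const T& b) { return !(a < b); }
}

// Simplified Author that provides only == and <.
struct Author {
    std::string name;

    bool operator==(const Author& right) const { return name == right.name; }
    bool operator< (const Author& right) const { return name < right.name; }
};

void checkDerivedOperators() {
    using namespace sketch_rel_ops;

    Author poe{"Poe, Edgar Allen"};
    Author twain{"Twain, Mark"};

    assert(poe != twain);
    assert(twain > poe);
    assert(poe <= twain);
    assert(poe <= poe);   // equal values satisfy <= and >=
    assert(twain >= poe);
}
```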


1: Actually, this stopped being true when Amazon opened the floodgates to self-publishing and began selling books that lacked an ISBN. Many authors who self-publish do not bother paying the fee required to obtain a unique ISBN.