Analysis of Algorithms: Motivation

Steven J. Zeil

Last modified: Oct 26, 2023

An important theme throughout this semester will be viewing the development of software as an engineering process. Engineers in traditional disciplines, civil engineers, electrical engineers, and the like, face trade-offs in developing a new product: trade-offs among cost, performance, and quality. One of the jobs of an engineer is to look at competing proposals for the design of a new product and to estimate, ahead of time, the cost, the speed, the strength, the quality, and so on of the products that would result from those competing designs.

Software developers face the same kinds of choices. Early on, you may have several alternative designs and need to decide which of them to actually pursue. It’s no good waiting until the program has already been implemented in code; by then you’ve committed to one design and invested significant resources in it.

How do we make these kinds of choices? In this course, we’ll be looking at mathematical techniques for analyzing algorithms to predict how fast they will run. It will be important that we can do this both for real algorithms already written in code and for proposed algorithms that have been given a much sketchier description, probably written in “pseudocode”.

1 Case study: A Spell Checker

Suppose that we work for a company that produces word processors and other text-manipulation programs. The company has decided to add an automatic spell-check feature to the product line. Our designers have considered the process of checking a document for spelling errors (i.e., flagging any words not in a “dictionary” of known words) and have proposed two different algorithms for finding the set of misspelled words within a target file.

1.1 Version 1: Check every word from the document

collectMisspelledWords (/* inputs */ targetFile, dictionaryFile, 
                        /* outputs */ misspellings)
{
  read dictionaryFile into dictionary;
  open targetFile;
  misspellings = empty;
  while more words in targetFile {
     read word w from targetFile;
     if w is not in dictionary {
       add w to misspellings;
     }
  }
  close targetFile;
}

In the first alternative, we read words, one at a time, from the target file. Each word that is not in the dictionary gets added to the set of misspellings.
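
For concreteness, here is one way Version 1 might be realized in real code. This is a minimal C++ sketch under assumptions of our own, not the designers’ actual implementation: words are whitespace-delimited, case and punctuation are ignored, and the dictionary fits in memory.

#include <fstream>
#include <set>
#include <string>

// Version 1 sketch: look up every word of the document, one at a time.
void collectMisspelledWords(const std::string& targetFileName,
                            const std::string& dictionaryFileName,
                            std::set<std::string>& misspellings)
{
    // Read the entire dictionary into a searchable in-memory set.
    std::set<std::string> dictionary;
    std::ifstream dictIn(dictionaryFileName);
    std::string word;
    while (dictIn >> word)
        dictionary.insert(word);

    // Check each document word against the dictionary as it is read.
    std::ifstream target(targetFileName);
    while (target >> word)
        if (dictionary.count(word) == 0)
            misspellings.insert(word);
}

Note that every word of the document, including the hundredth occurrence of “the”, triggers its own search of the dictionary, which is exactly the objection the designers raise next.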

Some of the designers, however, have objected that the first algorithm will waste time by repeatedly looking up common words like “a”, “the”, etc., in the dictionary.

They suggest an alternative algorithm.

1.2 Version 2: Build a Concordance

collectMisspelledWords (/* inputs */ targetFile, dictionaryFile, 
                        /* outputs */ misspellings)
{
  misspellings = empty;
  concordance = empty;
  open targetFile;
  while more words in targetFile {
     read word w from targetFile;
     add w to concordance;
  }
  close targetFile;

  open dictionaryFile;
  for each word w in the concordance {
    while (more words in dictionaryFile and last word read from dictionaryFile < w) {
      read another word from the dictionaryFile;
    }
    if (w != last word read from dictionaryFile) {
       add all occurrences of w to misspellings;
    }
  }
  close dictionaryFile;
}

This works by first collecting all words from the document to form a concordance, an index of all the words taken from a document together with the locations where they were found. Then each word is checked just once against the dictionary, no matter how many times that word actually occurs within the target document.


The check of each word is also faster. Because the dictionary and the concordance will (presumably) be sorted, we can compare them in a single pass through both sets of words.
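
Again for concreteness, here is a matching C++ sketch of Version 2 under the same assumptions as before; the std::map plays the role of the concordance, recording each distinct word and the positions where it occurred, and the dictionary file is assumed to be sorted.

#include <fstream>
#include <map>
#include <set>
#include <string>
#include <vector>

// Version 2 sketch: build a concordance, then sweep it against the
// sorted dictionary file in a single pass.
void collectMisspelledWords(const std::string& targetFileName,
                            const std::string& dictionaryFileName,
                            std::set<std::string>& misspellings)
{
    // Phase 1: collect every distinct word together with the
    // positions where it occurred. A std::map keeps the words sorted.
    std::map<std::string, std::vector<long>> concordance;
    std::ifstream target(targetFileName);
    std::string word;
    long position = 0;
    while (target >> word)
        concordance[word].push_back(position++);

    // Phase 2: merge the sorted concordance against the sorted
    // dictionary. Both sequences are traversed forward only.
    std::ifstream dictIn(dictionaryFileName);
    std::string dictWord;
    bool more = static_cast<bool>(dictIn >> dictWord);
    for (const auto& entry : concordance) {
        while (more && dictWord < entry.first)   // catch the dictionary up
            more = static_cast<bool>(dictIn >> dictWord);
        if (!more || dictWord != entry.first)
            misspellings.insert(entry.first);    // flags all occurrences at once
    }
}

The crucial property is in phase 2: the dictionary stream only ever moves forward, so the dictionary file is read at most once no matter how many distinct words the document contains.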

For example, the concordance words for the paragraph:

This works by first collecting all words from the document
to form a concordance, an index of all the words taken from a
document together with the locations where they were found.
Then each word is checked just once agianst the dictionary,
no matter how many times that word actually occurs within the
target document.

would be:

a actually agianst all an by checked collecting concordance dictionary document each first form found from how index is just locations many matter no occurs of once taken target that the then they this times to together were where with within word words works


Comparing the start of this concordance against the start of a sorted dictionary, the two lists line up like this:

Concordance     Dictionary
a               a
actually        aardvark
agianst         aardvark’s
all             aardvarks
an              abaci

When we have a match, we accept the document word as correctly spelled and advance both lists. When the dictionary word is alphabetically smaller, we advance the dictionary alone; if the dictionary passes a document word without matching it (as it eventually will for “agianst”), that word is not in the dictionary, and we add it to the misspellings.


So we can check the concordance against the dictionary in a single pass through both.

In particular, we never need to search the entire dictionary for any single document word.

2 Comparison of Spellcheck Solutions

So, which of these algorithms is likely to run faster overall?

We can make plausible arguments in either direction:

- In favor of Version 2: it looks up each distinct word only once, however often that word occurs, and its single-pass sweep never searches the whole dictionary for any one word.
- In favor of Version 1: it never pays the time and memory cost of building and storing the concordance, and it can start reporting misspellings immediately instead of waiting until the whole document has been read.
Overall, it’s not obvious which is faster. Thinking more deeply about the question, we might ask:

- How long does a single dictionary lookup take in each version?
- How much time does building the concordance cost, and how much does checking each distinct word only once save?
- Do the answers change as the documents and the dictionary grow larger?
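
As a preview of the kind of answer we are after, here is a rough, back-of-the-envelope cost model. Everything in it is an assumption of ours, not something we have established: let $n$ be the number of words in the document, $k$ the number of distinct words, and $D$ the number of dictionary entries, and suppose a lookup in a sorted collection of size $m$ costs about $\log_2 m$ comparisons. Then, very roughly,

\[
  T_{\text{v1}} \approx n \log_2 D
  \qquad\text{versus}\qquad
  T_{\text{v2}} \approx n \log_2 k + k + D
\]

where Version 2 pays $n \log_2 k$ to build the concordance and then $k + D$ for the single merged pass. Which expression is smaller depends entirely on how $n$, $k$, and $D$ compare, and that is precisely the kind of question the lessons that follow will equip us to answer.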

In the lessons that follow, we will develop the mathematical tools for answering these kinds of questions.

3 Why Not Just Time the Programs?

So, why the fuss? Why don’t we simply sit down with a stopwatch, run both programs on some test data, time them, and adopt the one that runs faster?

Of course, if we’re still at the design level, we can’t time the programs because we haven’t written them yet. But even if we actually had the code for both programs in hand, a simple timing experiment might yield different results depending on who ran it and how.


Why should there be such a big difference? Well, it turns out that the results we get from timing experiments like that vary considerably because of

- the machine the programs are run on,
- the compiler, and the optimization settings, used to build them,
- the particular test documents and dictionaries chosen, and
- whatever else the machine happens to be doing while the tests run.

The sketch that follows shows just how little a single number from such an experiment pins down.
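
This is a minimal C++ timing harness of the sort such an experiment would use; it assumes we already had runnable implementations of both versions, and the names are ours.

#include <chrono>

// Hypothetical timing harness: measure one run of a spell checker.
// The number it returns depends on the machine, the compiler and its
// optimization flags, the input files chosen, and the system load at
// the moment of the test, none of which are properties of the algorithm.
template <typename Checker>
double secondsForOneRun(Checker check)
{
    auto start = std::chrono::steady_clock::now();
    check();   // one complete run of a candidate spell checker
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

Two people running this on different hardware, with different compilers, or on different test documents can easily reach opposite conclusions about which version is faster.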

3.1 Better than Timing

Now, how are we going to overcome these problems? We will analyze the algorithms themselves, mathematically, counting the steps they perform as a function of the size of their inputs. That abstracts away the machine, the compiler, and the particular test run, and it works just as well on pseudocode as on finished programs.


In order to make our choices, we tend to use worst-case more often than average-case analysis because

- the worst case gives a guarantee: if the worst case is fast enough, then every case is;
- worst-case behavior is usually easier to determine; and
- average-case analysis requires knowing how likely each possible input is, which we rarely do in practice.