String Manipulation & Comparison

Thomas J. Kennedy

1 Strings!

The Python string (i.e., str) provides quite a few methods. While an exhaustive list is a good cheat sheet to have… we can always bookmark the official str docs. For this lecture… we will start with:

str.startswith - returns True if a string starts with a specified prefix string.
str.endswith - returns True if a string ends with a specified suffix string.
str.capitalize - returns a new string with the first letter capitalized and the remaining letters in lowercase.
str.title - returns a new string with each word capitalized.
str.upper - returns a new string with each letter in uppercase.
str.lower - returns a new string with each letter in lowercase.
str.split - split a string at each occurrence of a given substring.
str.splitlines - split a string at every line ending.

There are quite a few more methods, but these are the ones that we will use most often.
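A quick way to get a feel for these methods is to call each one on a sample string. The snippet below is a minimal sketch. The sample text and variable names are made up for illustration.

```python
# Illustrative sample text; any string would work here.
title = "strings are FUN"

print(title.startswith("strings"))  # True
print(title.endswith("fun"))        # False: the comparison is case-sensitive
print(title.capitalize())           # Strings are fun
print(title.title())                # Strings Are Fun
print(title.upper())                # STRINGS ARE FUN
print(title.lower())                # strings are fun

# split returns a list of the pieces found between each separator.
words = title.split(" ")
print(words)                        # ['strings', 'are', 'FUN']

# splitlines breaks the text at every line ending.
lines = "first line\nsecond line".splitlines()
print(lines)                        # ['first line', 'second line']
```

Note that none of these calls change title. Python strings are immutable, so every method returns a new value.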

2 A Quick First Example

I often find myself outputting headings in my example code. In C++ and Java… I have dedicated utilities modules. However, Python does not require us to write a dedicated module just for headings.

Suppose that we wanted to output a heading

    text = "Strings Are Fun!"

that is:

72 characters wide
centered
preceded by a border consisting of dashes (i.e., -)
followed by a border consisting of dashes (i.e., -)

Centering a string can be done with…

    width = 72
    text = "Strings Are Fun!".center(width)

Take note of the width variable. Since we will be using the value 72 in multiple places… we want “what 72 is” to be unambiguous.

If you come from C++ or Java… your first instinct might be to create a new string or use some array trickery. However, this is Python…

    border = "-" * width

Python allows us to repeat a string by multiplying the string by an integer. (I prefer Rust’s .repeat syntax.)
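Repetition and centering are easy to try out with a smaller width. The following is only a sketch. The width of 12 and the sample text are arbitrary choices made to keep the output short.

```python
width = 12  # small width chosen only to keep the output readable

border = "-" * width
print(border)            # ------------

# Multi-character strings repeat as well.
print("=-" * 3)          # =-=-=-

# center pads with spaces by default; an optional second argument
# supplies a different single-character fill.
centered = "Fun!".center(width)
print(repr(centered))    # '    Fun!    '

starred = "Fun!".center(width, "*")
print(starred)           # ****Fun!****
```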

Well… now we can put everything together.

def header_demo():

    width = 72
    border = "-" * width

    text = "Strings Are Fun!".center(width)

    print(border)
    print(text)
    print(border)


if __name__ == "__main__":
    header_demo()

Although… I would refactor this code into a reusable function.

def get_heading(text: str, width: int, divider: str = "-") -> str:
    # Build the border from the divider argument rather than a hard-coded dash.
    border = divider * width

    return "\n".join(
        (
            border,
            text.center(width),
            border
        )
    )


def main():
    heading = get_heading(text="Strings Are Fun!", width=72)
    print(heading)


if __name__ == "__main__":
    main()

This is a perfect time to introduce the .join method. The method takes a collection of strings (e.g., a list or tuple) and places the specified string (e.g., \n or ,) between them.
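A few small calls show how .join behaves. This is a minimal sketch with made-up sample data.

```python
words = ["red", "green", "blue"]

print(", ".join(words))   # red, green, blue
print("\n".join(words))   # one word per line

# join expects strings, so other values must be converted first.
numbers = [1, 2, 3]
joined = "-".join(str(number) for number in numbers)
print(joined)             # 1-2-3
```

The separator only appears between items. Nothing is added before the first item or after the last one.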

You may be wondering if the same result can be achieved with an f-string. Yes… it is possible. However, a single f-string only centers the title correctly if the title fits on one line.
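For the curious, an f-string version might look something like the sketch below. Both function names (get_heading_fstring and get_multiline_heading) are hypothetical and exist only for this illustration. The second function shows one possible way to handle a title that spans several lines by centering each line separately.

```python
def get_heading_fstring(text: str, width: int, divider: str = "-") -> str:
    """Build a heading with a single f-string (one-line titles only)."""
    border = divider * width

    # "^" in the format spec centers text within a field of the given width.
    return f"{border}\n{text:^{width}}\n{border}"


def get_multiline_heading(text: str, width: int, divider: str = "-") -> str:
    """Center every line of the title, then join the pieces with newlines."""
    border = divider * width
    centered_lines = [line.center(width) for line in text.splitlines()]

    return "\n".join([border, *centered_lines, border])


if __name__ == "__main__":
    print(get_heading_fstring("Strings Are Fun!", width=72))
    print(get_multiline_heading("Strings\nAre Fun!", width=72))
```

When the padding cannot be split evenly, the format spec and str.center may place the extra space on different sides, so the two approaches are not guaranteed to produce identical output for every title.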

3 A Second Example

Suppose that we are implementing a crude answer checker for a fill-in-the-blank question. Consider the following question.

A class defines the structure of a type of thing (e.g., Book) while an ______ is an actual thing (e.g., a book on a shelf).

We might start off with a function in the form…

def check_fill_in_the_blank(correct_answer: str, supplied_answer: str) -> bool:
    if supplied_answer == correct_answer:
        return True
    else:
        return False

and then rewrite it as…

def check_fill_in_the_blank(correct_answer: str, supplied_answer: str) -> bool:
    if supplied_answer == correct_answer:
        return True

    return False

before finally settling on…

def check_fill_in_the_blank(correct_answer: str, supplied_answer: str) -> bool:
    return supplied_answer == correct_answer

A naive string equality check would work. However, we know that inconsistent capitalization and misspellings are common on exams and quizzes. Let us start by converting both answers to lowercase.

def check_fill_in_the_blank(correct_answer: str, supplied_answer: str) -> bool:
    return supplied_answer.lower() == correct_answer.lower()

Let us shorten the variable names and grab the length of the correct answer with len.

def check_fill_in_the_blank(correct_answer: str, supplied_answer: str) -> bool:
    supplied = supplied_answer.lower()
    correct = correct_answer.lower()

    correct_length = len(correct)

    return supplied == correct

How about we set the criteria as:

Grab the length of the correct answer.
Compare the first length // 2 letters of the correct answer and student answer.
Compare the last length // 2 letters of the correct answer and student answer.
Award credit if the first or last length // 2 letters match.

def check_fill_in_the_blank(correct_answer: str, supplied_answer: str) -> bool:
    supplied = supplied_answer.lower()
    correct = correct_answer.lower()

    num_required_chars = len(correct) // 2

    if supplied.startswith(correct[:num_required_chars]):
        return True

    if supplied.endswith(correct[-num_required_chars:]):
        return True

    return False

The slice syntax will be covered when we get to lists in a later lecture. However, for now let us note that…

correct[:num_required_chars] - starts at zero (0) and grabs every character up to (but not including) num_required_chars
correct[-num_required_chars:] - starts at -num_required_chars and grabs everything up through the end of correct
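The two slices are easier to follow with concrete values. The sketch below uses “object” as the correct answer. The one-letter answer at the end is a made-up edge case worth keeping in mind, since it results in zero required characters.

```python
correct = "object"
num_required_chars = len(correct) // 2   # 6 // 2 == 3

print(correct[:num_required_chars])      # obj
print(correct[-num_required_chars:])     # ect

# Edge case: a one-letter answer yields zero required characters.
short = "a"
zero = len(short) // 2                   # 1 // 2 == 0

print(repr(short[:zero]))                # '' (an empty string)
print(repr(short[-zero:]))               # 'a' because -0 is just 0
print("anything".startswith(""))         # True: every string starts with ""
```

In other words, with a one-letter correct answer the prefix check would accept any supplied answer. A stricter checker might fall back to plain equality for very short answers.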

We should also account for the penchant of students to get carried away on such questions (e.g., write a full sentence when only a word or two was needed).

def check_fill_in_the_blank(correct_answer: str, supplied_answer: str) -> bool:
    supplied = supplied_answer.lower()
    correct = correct_answer.lower()

    num_required_chars = len(correct) // 2

    if supplied.startswith(correct[:num_required_chars]):
        return True

    if supplied.endswith(correct[-num_required_chars:]):
        return True

    if correct in supplied:
        return True

    return False

This handles the case where we expected “object” as an answer, but the student wrote something along the lines of “The correct answer is object.”

Note the use of in to check for the occurrence of a substring.
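To see how the final version behaves, we can run it against a handful of answers. The function is repeated so that the example runs on its own. The sample answers are made up for illustration.

```python
def check_fill_in_the_blank(correct_answer: str, supplied_answer: str) -> bool:
    # Same logic as the final version above, repeated so the example runs alone.
    supplied = supplied_answer.lower()
    correct = correct_answer.lower()

    num_required_chars = len(correct) // 2

    if supplied.startswith(correct[:num_required_chars]):
        return True

    if supplied.endswith(correct[-num_required_chars:]):
        return True

    if correct in supplied:
        return True

    return False


samples = [
    "Object",                          # differs only in capitalization
    "objct",                           # misspelled, but starts with "obj"
    "The correct answer is object.",   # full sentence containing the answer
    "objective",                       # wrong word that still starts with "obj"
    "instance",                        # shares no prefix or suffix with "object"
]

for answer in samples:
    print(f"{answer!r}: {check_fill_in_the_blank('object', answer)}")
```

Every sample except “instance” is accepted. That includes “objective”, which is a reminder that this checker is deliberately crude and will award credit for some wrong answers.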

4 Is That It?

We have reached a good stopping point for now. We will see more string manipulation when we discuss working with files.