Clean Coding

There are certain best practices that “good” programmers follow, maybe without even thinking about them, or maybe because they have learned, to their sorrow, that doing otherwise will cost them, even if only in time and effort.

Good code is

Readable
Trusted
- Has it been tested?
  - How well?
  - Does it pass?
- Does it smell?
  
  A code smell is a section of code that exhibits a practice that the programming community considers ill-advised or even dangerous.
Maintainable

Does the code throw barriers in the way of people trying to fix it or change it? e.g.,
- Is the code highly coupled so that changes in one place directly affect many others?
- Is code duplicated, so that fixes/changes need to be applied simultaneously to many different places?

1.1 Perspectives

(from Martin, ch 1)

Grady Booch

Clean code is simple and direct. Clean code reads like well-written prose. Clean code never obscures the designer’s intent but rather is full of crisp abstractions and straightforward lines of control.

Dave Thomas (OTI)

Clean code can be read, and enhanced by a developer other than its original author. It has unit and acceptance tests. It has meaningful names. It provides one way rather than many ways for doing one thing. It has minimal dependencies, which are explicitly defined, and provides a clear and minimal API. Code should be literate since depending on the language, not all necessary information can be expressed clearly in code alone.

Ron Jeffries

In priority order, simple code:

Runs all the tests;

Contains no duplication;

Expresses all the design ideas that are in the system;

Minimizes the number of entities such as classes, methods, functions, and the like.

Ward Cunningham

You know you are working on clean code when each routine you read turns out to be pretty much what you expected. You can call it beautiful code when the code also makes it look like the language was made for the problem.

2 Principles

Tests should always pass.
Code should be expressive.
- maximize cohesion
- Keep classes and methods small.
- Work at one level of abstraction.
Keep it small, keep it simple
- Minimize coupling
  - Law of Demeter
- Avoid duplication.
- Do not create or maintain unnecessary code.

Closely related: SOLID design in OOP

SOLID is an acronym for a collection of design principles related to good object-oriented design.

Single Responsibility
Open-Closed Principle
Liskov Substitution
Interface Segregation
Dependency Inversion

The overall theme of SOLID is to control the ways in which different classes depend on each other. SOLID design tries to find class interfaces that are robust even in the face of change.

We won’t delve into SOLID in this course, because it requires a certain grasp of object-oriented programming that is covered in CS330, and not everyone in this course will have taken CS330.

But, if you are interested:

Martin: Agile Principles, Patterns, and Practices in C# chapters 8–12
SOLID Design (from CS330)

2.1 Code must Express its Intent

Martin relates a story about replaying an editing session via an editor that captures the entire history of edits.

reveal

    Bob enters the module.
    He scrolls down to the function needing change.
    He pauses, considering his options.
    Oh, he’s scrolling up to the top of the module to
       check the initialization of a variable.
    Now he scrolls back down and begins to type.
    Ooops, he’s erasing what he typed!
    He types it again.
    He erases it again!
    He types half of something else but then erases that!
    He scrolls down to another function that calls the function
        he’s changing to see how it is called.
    He pops up another window and looks at a subclass. Is that
        function overridden?

Notice how much time is spent reading rather than writing.
How much worse would this be if Bob had not touched this code in weeks, or months?
- Or if Bob was not the original author of the code?

Code should be expressive of its intent.

2.1.1 Refactor to Improve

Refactoring is the process of making changes to code that make it more expressive, but that should not alter its behavior in any way.

Common examples of refactoring include:

renaming entities
replacing literal constants by named constant variables
reformatting code
encapsulating public data members
extracting blocks of code into separate functions.

Many IDEs (including Eclipse) have build-in support for these and other refactoring activities.

For example, renaming a function from the Refactor menu not only changes the name of the function in the declaration, but also of all calls to that function in all source code files where it is used.

2.2 Tests should always pass.

We do not write large amounts of code and then get around to testing it (or claiming we had no time to test it).
We do not even write large amounts of code without pausing to rerun our tests every few minutes.

We’ll pass on the details for now, as we will be discussing automated unit testing and test driven development in great detail later.

2.3 Keep it small, keep it simple

Minimize the number of classes and members
Avoid duplication
Each function should “do one thing”.

3 Practices

3.1 Meaningful names

int nCstCnt; // number of customers

int numberOfCustomers;

There was a time when programmers could be forgiven for writing the code on the left.

Early programming languages limited variable names to a 5, 6, or 8 characters.
FORTRAN used the first character to declare the type of a variable:
- Names beginning with “I..N” were integers.
- Everything else was floating point.
The mechanics of punched cards encouraged the use of statements no more than 72 characters long.
Keypunches and early text editors lacked support for automatic completion of partial variable names.

None of that is true any more!

So please stop writing code like I had to back in the late 1970’s – S. Zeil

3.1.1 Good Names should…

Be expressive of the intention.
- If you need a comment to explain what this thing is, you need a better name.
Be pronounceable
Draw meaningful distinctions.
- Avoid tacking meaningless noise words like “Info”, “Data”, “Object”, …
  
  What’s the difference between classes named Book and BookData?
  - Doesn’t every class actually provide data about something?
Avoid encoding type information.
- Avoid variable names like songList.
  - Better to describe the intent as album, playlist, or discography.
Use appropriate parts of speech
- Class, variable and constant names should be noun phrases.
- Method/function names should be verb phrases.
  Avoid “nounifying” verbs.
  - generateSummaryReport is a perfectly good function name.
  - It’s a terrible name for a variable or class.
    - And SummaryReportGenerator is hardly better.
    - Better:
```
class SummaryReport {
   ⋮
   void generateAndPrint(OutputDevice);
```

3.2 Limit your Scope

The scope of a declared name is the range of code in which it can be legally referenced.

Place your declarations in as limited a scope as possible.¹

This prevents accidental re-uses of the same variable name, including many uses that may interfere with one another.

3.2.1 Example (conditionals)

Don’t do this:

String firstName;
String lastName;
int pos = fullName.find(','); 
if (pos >= 0) {
    lastName = fullName.substr(0, pos);
    firstName = fullName.substr(pos+1);
    println ("Hello " + firstName + ' ' + lastName);
} else {
    println ("Hello " + fullName);
}

The scopes of firstName and lastName continue to run through the rest of this code block, till the end of the enclosing { }.

Do this instead:

int pos = fullName.find(','); 
if (pos >= 0) {
    String lastName = fullName.substr(0, pos);
    String firstName = fullName.substr(pos+1);
    println ("Hello " + firstName + ' ' + lastName);
} else {
    println ("Hello " + fullName);
}

Not only is there less chance of an accidental re-use of the names later, but there is also never a period of time in which the variables are uninitialized.

3.2.2 Example (for loops)

Don’t do this:

vector<Author>::iterator iter = authorList.begin();
while (iter != authorList.end())
{
    doSomethingWith(*iter);
    ++iter;
}

The scope of iter continues to run through the rest of this code block, till the end of the enclosing { }.

Do this instead:

for (vector<Author>::iterator iter = authorList.begin();
     iter != authorList.end(); ++iter)
{
    doSomethingWith(*iter);
}

The scope of iter runs only to the end of the loop body.

Of course, since C++ 2014 you can do even better:

for (Author& author: authorList)
{
    doSomethingWith(author);
}

but the same rule applies whenever you are attempted to use a while loop with some kind of counter or external position marker.

3.3 Functions should do One Thing

This helps to keep them small, and understandable.

Bad: Suppose a Library class had this:

expand

public void addBookToCollection (Book book)
{
    List<Author> authors = book.getAuthors();
    Title title = book.getTitle();
    List<SubjectKeyword> subjects = new List<>();
    try {
        subjects = book.getSubjects();
    } catch (SubjectsUnassigned ex) {
        requestReviewOfSubjects(book);
    }   
    try {
        DecimalClassification dewey = oclc.getClassificationCode(book.getISBN());
        ShelfLocation intendedLocation = getShelfFor(dewey);
        book.setLocation (intendedLocation);
    } catch (OCLCServiceUnavailable ex) {
        log.error("Error contacting OCLC about " + book.toString(), ex);
    }   
    try {
        for (Author author: authors) {
            authorIndex.add(author, book);
        }
        titleIndex.add (title, book);
        for (SubjectKeyword subject: subjects) {
            subjectIndex.add(subject, book);
        }
     } catch (CatalogingError ex) {
          log.error ("Error cataloging " + book.toString(), ex); 
     }
}

It’s long. You might not think this is particularly long, but it’s too long to be expressive.
It spans multiple levels of abstraction. Embedded in here are decisions about library structure
- how many catalogs exist and what their indexing scheme is
- what classification scheme we use (DeweyDecimal) and who provides that classification service (OCLC).
and about the ADT interfaces used to manipulate the individual catalogs and classification service.

Cleaner:

expand

public void addBookToCollection (Book book)
{
  addBookToCatalogs(book);
  moveBookToIntendedLocation(book);
}

private void addBookToCatalogs(book) {
    addBookToAuthorCatalog(book);
    addBookToTitleCatalog(book);
    addBookToSubjectCatalog(book);
}

private void addBookToAuthorCatalog(Book book) {
    List<Author> authors = book.getAuthors();
    try {
        for (Author author: authors) {
            authorIndex.add(author, book);
        }
     } catch (CatalogingError ex) {
          log.error ("Error cataloging " + book.toString(), ex); 
     }
}

private void addBookToTitleCatalog(Book book) {
    Title title = book.getTitle();
    try {
        titleIndex.add (title, book);
     } catch (CatalogingError ex) {
          log.error ("Error cataloging " + book.toString(), ex); 
     }
}

private void addBookToSubjectCatalog (Book book) {
    try {
        List<SubjectKeyword> subjects = book.getSubjects();
        for (SubjectKeyword subject: subjects) {
            subjectIndex.add(subject, book);
        }
     } catch (CatalogingError ex) {
          log.error ("Error cataloging " + book.toString(), ex); 
     }
     } catch (SubjectsUnassigned ex) {
        requestReviewOfSubjects(book);
     }   
}


private void moveBookToIntendedLocation (Book book) {
    try {
        ShelfLocation intendedLocation = getIntendedLocation(book);
        book.setLocation (intendedLocation);
    } catch (OCLCServiceUnavailable ex) {
        log.error("Error contacting OCLC about " + book.toString(), ex);
    }   
}

private ShelfLocation getIntendedLocation (Book book) 
      throws OCLCServiceUnavailable 
{
    DecimalClassification dewey = oclc.getClassificationCode(book.getISBN());
    return getShelfFor(dewey);
}

Look, in particular, at the top two functions.

Aren’t they clearly more expressive of what needs to happen?
Is there anything in there that took you out of the Library-level of abstraction and forced you to simultaneously think about how more detailed components of the library were designed to work?

3.4 Error Handling is “One Thing”

Many of those functions were complicated by the pattern of

Try to do something
Handle the errors can arise if the attempt fails

But those are two different “things”…

expand

public void addBookToCollection (Book book)
{
  addBookToCatalogs(book);
  attemptToMoveBookToIntendedLocation(book);
}

private void addBookToCatalogs(book) {
    attemptToAddBookToAuthorCatalog(book);
    attemptToAddBookToTitleCatalog(book);
    attemptToAddBookToSubjectCatalog(book);
}

private void attemptToAddBookToAuthorCatalog(Book book) {
    try {
        addBookToAuthorCatalog(book);
     } catch (CatalogingError ex) {
          log.error ("Error cataloging " + book.toString(), ex); 
     }
}

private void addBookToAuthorCatalog(Book book) {
    List<Author> authors = book.getAuthors();
    for (Author author: authors) {
        authorIndex.add(author, book);
    }
}

private void attemptToAddBookToTitleCatalog(Book book) {
    try {
        addBookToTitleCatalog (book);
     } catch (CatalogingError ex) {
          log.error ("Error cataloging " + book.toString(), ex); 
     }
}

private void addBookToTitleCatalog(Book book) {
    Title title = book.getTitle();
    titleIndex.add (title, book);
}

private void attemptToAddBookToSubjectCatalog(Book book) {
    try {
        addBookToSubjectCatalog (book);
     } catch (CatalogingError ex) {
          log.error ("Error cataloging " + book.toString(), ex); 
     }
}


private void addBookToSubjectCatalog(Book book) throws CatalogingError {
    List<SubjectKeyword> subjects = getSubjectListOrRequestReview(book);
    for (SubjectKeyword subject: subjects) {
        subjectIndex.add(subject, book);
    }
}


private List<SubjectKeyword> getSubjectListOrRequestReview(Book book) {
    try {
        return book.getSubjects();
    } catch (SubjectsUnassigned ex) {
        requestReviewOfSubjects(book);
        return new Collections<SubjectKeyWord>.emptyList();
    }   
}


private void moveBookToIntendedLocation (Book book) {
    try {
        ShelfLocation intendedLocation = getIntendedLocation(book);
        book.setLocation (intendedLocation);
    } catch (OCLCServiceUnavailable ex) {
        log.error("Error contacting OCLC about " + book.toString(), ex);
    }   
}

private ShelfLocation getIntendedLocation (Book book) 
      throws OCLCServiceUnavailable 
{
    DecimalClassification dewey = oclc.getClassificationCode(book.getISBN());
    return getShelfFor(dewey);
}

We have a lot more functions now, but each one is almost trival to read.

3.5 Avoid Duplication

Duplicated code is code that needs to be fixed/changed multiple times when fixing one bug or adding one feature.
Implicitly couples blocks of code in a way not visible from the interfaces.
Makes it hard to hide (“information hiding”) a design decision.

In our library example, the decision on how to handle cataloging errors has encoded three times:

private void addBookToCatalogs(book) {
    attemptToAddBookToAuthorCatalog(book);
    attemptToAddBookToTitleCatalog(book);
    attemptToAddBookToSubjectCatalog(book);
}

private void attemptToAddBookToAuthorCatalog(Book book) {
    try {
        addBookToAuthorCatalog(book);
     } catch (CatalogingError ex) {
          log.error ("Error cataloging " + book.toString(), ex); 
     }
}

private void attemptToAddBookToTitleCatalog(Book book) {
    try {
        addBookToTitleCatalog (book);
     } catch (CatalogingError ex) {
          log.error ("Error cataloging " + book.toString(), ex); 
     }
}

private void attemptToAddBookToSubjectCatalog(Book book) {
    try {
        addBookToSubjectCatalog (book);
     } catch (CatalogingError ex) {
          log.error ("Error cataloging " + book.toString(), ex); 
     }
}

Better is to collect that common decision either like this:

private void addBookToCatalogs(book) {
    attemptToAddBookToAuthorCatalog(book);
    attemptToAddBookToTitleCatalog(book);
    attemptToAddBookToSubjectCatalog(book);
}

private void handleCatalogingError (Book book, CatalogingError ex)
{
    log.error ("Error cataloging " + book.toString(), ex);
}

private void attemptToAddBookToAuthorCatalog(Book book) {
    try {
        addBookToAuthorCatalog(book);
     } catch (CatalogingError ex) {
          handleCatalogingError (book, ex);
     }
}

private void attemptToAddBookToTitleCatalog(Book book) {
    try {
        addBookToTitleCatalog (book);
     } catch (CatalogingError ex) {
          handleCatalogingError (book, ex);
     }
}

private void attemptToAddBookToSubjectCatalog(Book book) {
    try {
        addBookToSubjectCatalog (book);
     } catch (CatalogingError ex) {
          handleCatalogingError (book, ex);
     }
}

or like this:

private void attemptToAddBookToCatalogs(book) {
    try {
        addBookToCatalogs(book);
     } catch (CatalogingError ex) {
        log.error ("Error cataloging " + book.toString(), ex);
     }
}

private void addBookToCatalogs(book) {
    addBookToAuthorCatalog(book);
    addBookToTitleCatalog(book);
    addBookToSubjectCatalog(book);
}

3.6 Distinguish between data structures and “true” objects

Some classes/structs exist purely to specify a data structure, e.g.,

struct AuthorLinkedListNode {
    Author data;
    AuthorLinkedListNode* next;

    AuthorLinkedListNode (const Author& author,
                          AuthorLinkedListNode* nextNode = nullptr)
        : data(author), next(nextNode)
    {}

};

This data type is not intended to be seen outside of Book:

class Book {
private:
    AuthorLinkedListNode* first;
    AuthorLinkedListNode* last;
  ⋮
public:

so there’s really little reason to encapsulate its data members.

In some cases we might even enforce that idea

class Book {
private:
    struct AuthorLinkedListNode {
       Author data;
       AuthorLinkedListNode* next;

       AuthorLinkedListNode (const Author& author,
                     AuthorLinkedListNode* nextNode = nullptr)
         : data(author), next(nextNode)
       {}

    };
    AuthorLinkedListNode* first;
    AuthorLinkedListNode* last;
  ⋮
public:

although this is not always possible.

3.7 The Law of Demeter

“Talk to friends, not strangers.”

The law of Demeter: a method f of a class C should only call the methods of:

C

an object created by f

an object passed as an argument to f

an object held in a (data structure of a) data member of C

This law is intended to discourage code like this:

class Library {
        ⋮
    void checkOutBookToPatron (Book book, Patron patron)
    {
       if (patron.getBranch().getLendingRecord(patron).getNumCheckedOut()
           < patron.getBranch().getLendingRecord(patron).getCheckOutLimit()) {
           Date dueDate = Calendar.today().addDays(book.checkOutTerm());
           patron.getBranch().getLendingRecord(patron)
              .addCheckout(book, dueDate); 
           book.checkOutTo(patron, dueDate);
       } else {
           throw new CheckoutLimitReached(patron);
       }
    }

Some people find the chained calls hard to read.

Perhaps more to the point, this style hides a coupling of this function to the interface of classes LibraryBranch and LendingRecord.

Making the coupling explicit might be an improvement:

class Library {
        ⋮
    void checkOutBookToPatron (Book book, Patron patron)
    {
       LibraryBranch patronsHomeBranch = patron.getBranch();
       LendingRecord patronLendingRecord = branch.getLendingRecord(patron);
       if (patronLendingRecord.getNumCheckedOut()
           < patronLendingRecord.getCheckOutLimit()) {
           Date dueDate = Calendar.today().addDays(book.checkOutTerm());
           patronLendingRecord.addCheckout(book, dueDate); 
           book.checkOutTo(patron, duedate);
       } else {
           throw new CheckoutLimitReached(patron);
       }
    }

but it doesn’t make the coupling go away.

The highlighted calls violate the Law of Demeter
Look at the other calls. Do you see why they don’t violate that law?

If anything, it makes it clearer that we are not sticking to a uniform level of abstraction here. So, better still:

class Library {
        ⋮
    void checkOutBookToPatron (Book book, Patron patron)
    {
       if (patron.canCheckOutMoreBooks()) {
           Date dueDate = Calendar.today().addDays(book.checkOutTerm());
           patron.addBookToCheckedOutList(book, dueDate);
           book.checkOutTo(patron, duedate);
       } else {
           throw new CheckoutLimitReached(patron);
       }
    }


class Patron {
       ⋮
    public void addBookToCheckedOutList (Book book, Date dueDate) {
       LibraryBranch patronsHomeBranch = branch;
       LendingRecord patronLendingRecord = branch.getLendingRecord(patron);
       patronLendingRecord.addCheckout(book, dueDate);
    }

    public boolean canCheckOutMoreBooks() {
       LibraryBranch patronsHomeBranch = branch;;
       LendingRecord patronLendingRecord = branch.getLendingRecord(patron);
       return patronLendingRecord.getNumCheckedOut()
           < patronLendingRecord.getCheckOutLimit()       
    }

which still has multiple violations, so we need to keep going:

class Patron {
       ⋮
    public void addBookToCheckedOutList (Book book, Date dueDate) {
       LibraryBranch patronsHomeBranch = branch;
       branch.addBookToCheckedOutList(patron, book, dueDate);
    }

    public boolean canCheckOutMoreBooks() {
       LibraryBranch patronsHomeBranch = branch;;
       return branch.getNumCheckedOut(patron)
           < branch.getCheckOutLimit(patron);      
    }

class LibraryBranch {
       ⋮
    public void addBookToCheckedOutList (Patron patron, 
                                         Book book, Date dueDate) {
       LendingRecord patronLendingRecord = lendingRecords.get(patron);
       patronLendingRecord.addCheckout(book, dueDate);
    }

    public boolean canCheckOutMoreBooks() {
       LendingRecord patronLendingRecord = lendingRecords.get(patron);
       return patronLendingRecord.getNumCheckedOut()
           < patronLendingRecord.getCheckOutLimit();       
    }

Apply this “law” with caution.
- Personally, I like the 2nd version better.
- Martin also argues against functions taking more than one or two parameters, but this kind of refactoring for Demeter can result in longer parameter lists.

3.8 Error Handling

Favor exceptions rather than special return codes, returning null, etc.
Martin argues for unchecked exceptions rather than checked to avoid percolating throws declarations (and their accompanying coupling) throughout the code.

3.9 Classes should be small!

Martin argues strongly against large classes.

I.e., the number of responsibilities should be small.
- Roughly speaking, “responsibilities” are measured in terms of public members.
  - get/set pairs probably count as only one.
But refactoring to accomodate some of his other recommendations (e.g., “functions should do one thing”) tends to make the number of function members explode.
- The trick is that the added members are generally private, not public.

Cohesion

When is the list of public members too big?
When you cannot assign a simple overall description of what they all do in the class.
- The class is not cohesive if its public members perform wildly different tasks.
- The Single Responsibility Principle (SRP, part of SOLID) states that every class should have one overall responsibility.

Martin: We should also be able to write a brief description of the class in about 25 words, without using the words “if,” “and,” “or,” or “but.”

1 : This practice is not commonly included in writing about Clean Coding, but I think that’s because it is generally taken for granted that “every” programmer does this. But I see lots of violations of this practice in student coding every semester. – SJZ