Clean Coding
Steven J Zeil
Abstract
Clean coding is a set of principles and practices aimed at producing code that is readable, trusted, and maintainable.
1 What is Clean Coding?
In the end, we write code.
There are certain best practices that “good” programmers follow, maybe without even thinking about them, or maybe because they have learned, to their sorrow, that doing otherwise will cost them, even if only in time and effort.
Good code is
-
Readable
-
Trusted
-
Has it been tested?
- How well?
- Does it pass?
-
Does it smell?
A code smell is a section of code that exhibits a practice that the programming community considers ill-advised or even dangerous.
-
-
Maintainable
Does the code throw barriers in the way of people trying to fix it or change it? e.g.,
-
Is the code highly coupled so that changes in one place directly affect many others?
-
Is code duplicated, so that fixes/changes need to be applied simultaneously to many different places?
-
1.1 Perspectives
(from Martin, ch 1)
Clean code is simple and direct. Clean code reads like well-written prose. Clean code never obscures the designer’s intent but rather is full of crisp abstractions and straightforward lines of control.
Dave Thomas (OTI)
Clean code can be read, and enhanced by a developer other than its original author. It has unit and acceptance tests. It has meaningful names. It provides one way rather than many ways for doing one thing. It has minimal dependencies, which are explicitly defined, and provides a clear and minimal API. Code should be literate since depending on the language, not all necessary information can be expressed clearly in code alone.
.
In priority order, simple code:
- Runs all the tests;
- Contains no duplication;
- Expresses all the design ideas that are in the system;
- Minimizes the number of entities such as classes, methods, functions, and the like.
You know you are working on clean code when each routine you read turns out to be pretty much what you expected. You can call it beautiful code when the code also makes it look like the language was made for the problem.
2 Principles
- Tests should always pass.
- Code should be expressive.
- maximize cohesion
- Keep classes and methods small.
- Work at one level of abstraction.
- Keep it small, keep it simple
- Minimize coupling
- Law of Demeter
- Avoid duplication.
- Do not create or maintain unnecessary code.
- Minimize coupling
Closely related: SOLID design in OOP
SOLID is an acronym for a collection of design principles related to good object-oriented design.
- Single Responsibility
- Open-Closed Principle
- Liskov Substitution
- Interface Segregation
- Dependency Inversion
The overall theme of SOLID is to control the ways in which different classes depend on each other. SOLID design tries to find class interfaces that are robust even in the face of change.
We won’t delve into SOLID in this course, because it requires a certain grasp of object-oriented programming that is covered in CS330, and not everyone in this course will have taken CS330.
But, if you are interested:
-
Martin: Agile Principles, Patterns, and Practices in C# chapters 8–12
-
SOLID Design (from CS330)
2.1 Code must Express its Intent
Martin relates a story about replaying an editing session via an editor that captures the entire history of edits.
- Notice how much time is spent reading rather than writing.
- How much worse would this be if Bob had not touched this code in weeks, or months?
-
Or if Bob was not the original author of the code?
-
Code should be expressive of its intent.
2.1.1 Refactor to Improve
Refactoring is the process of making changes to code that make it more expressive, but that should not alter its behavior in any way.
Common examples of refactoring include:
- renaming entities
- replacing literal constants by named constant variables
- reformatting code
- encapsulating public data members
- extracting blocks of code into separate functions.
Many IDEs (including Eclipse) have build-in support for these and other refactoring activities.
- For example, renaming a function from the
Refactor
menu not only changes the name of the function in the declaration, but also of all calls to that function in all source code files where it is used.
2.2 Tests should always pass.
-
We do not write large amounts of code and then get around to testing it (or claiming we had no time to test it).
-
We do not even write large amounts of code without pausing to rerun our tests every few minutes.
We’ll pass on the details for now, as we will be discussing automated unit testing and test driven development in great detail later.
2.3 Keep it small, keep it simple
- Minimize the number of classes and members
- Avoid duplication
- Each function should “do one thing”.
3 Practices
3.1 Meaningful names
int nCstCnt; // number of customers
int numberOfCustomers;
There was a time when programmers could be forgiven for writing the code on the left.
- Early programming languages limited variable names to a 5, 6, or 8 characters.
- FORTRAN used the first character to declare the type of a variable:
- Names beginning with “I..N” were integers.
- Everything else was floating point.
- The mechanics of punched cards encouraged the use of statements no more than 72 characters long.
- Keypunches and early text editors lacked support for automatic completion of partial variable names.
None of that is true any more!
So please stop writing code like I had to back in the late 1970’s – S. Zeil
3.1.1 Good Names should…
-
Be expressive of the intention.
- If you need a comment to explain what this thing is, you need a better name.
-
Be pronounceable
-
Draw meaningful distinctions.
-
Avoid tacking meaningless noise words like “Info”, “Data”, “Object”, …
What’s the difference between classes named
Book
andBookData
?- Doesn’t every class actually provide data about something?
-
-
Avoid encoding type information.
-
Avoid variable names like
songList
.- Better to describe the intent as
album
,playlist
, ordiscography
.
- Better to describe the intent as
-
-
Use appropriate parts of speech
- Class, variable and constant names should be noun phrases.
- Method/function names should be verb phrases.
Avoid “nounifying” verbs.
generateSummaryReport
is a perfectly good function name.- It’s a terrible name for a variable or class.
- And
SummaryReportGenerator
is hardly better. - Better:
class SummaryReport { ⋮ void generateAndPrint(OutputDevice);
- And
3.2 Limit your Scope
The scope of a declared name is the range of code in which it can be legally referenced.
Place your declarations in as limited a scope as possible.1
This prevents accidental re-uses of the same variable name, including many uses that may interfere with one another.
3.2.1 Example (conditionals)
Don’t do this:
String firstName;
String lastName;
int pos = fullName.find(',');
if (pos >= 0) {
lastName = fullName.substr(0, pos);
firstName = fullName.substr(pos+1);
println ("Hello " + firstName + ' ' + lastName);
} else {
println ("Hello " + fullName);
}
The scopes of firstName
and lastName
continue to run through the rest of this code block, till the end of the enclosing { }
.
Do this instead:
int pos = fullName.find(',');
if (pos >= 0) {
String lastName = fullName.substr(0, pos);
String firstName = fullName.substr(pos+1);
println ("Hello " + firstName + ' ' + lastName);
} else {
println ("Hello " + fullName);
}
Not only is there less chance of an accidental re-use of the names later, but there is also never a period of time in which the variables are uninitialized.
3.2.2 Example (for loops)
Don’t do this:
vector<Author>::iterator iter = authorList.begin();
while (iter != authorList.end())
{
doSomethingWith(*iter);
++iter;
}
The scope of iter
continues to run through the rest of this code block, till the end of the enclosing { }
.
Do this instead:
for (vector<Author>::iterator iter = authorList.begin();
iter != authorList.end(); ++iter)
{
doSomethingWith(*iter);
}
The scope of iter
runs only to the end of the loop body.
Of course, since C++ 2014 you can do even better:
for (Author& author: authorList)
{
doSomethingWith(author);
}
but the same rule applies whenever you are attempted to use a while
loop with some kind of counter or external position marker.
3.3 Functions should do One Thing
This helps to keep them small, and understandable.
Bad: Suppose a Library
class had this:
- It’s long. You might not think this is particularly long, but it’s too long to be expressive.
- It spans multiple levels of abstraction. Embedded in here are decisions about library structure
- how many catalogs exist and what their indexing scheme is
- what classification scheme we use (DeweyDecimal) and who provides that classification service (OCLC).
Cleaner:
Look, in particular, at the top two functions.
- Aren’t they clearly more expressive of what needs to happen?
- Is there anything in there that took you out of the Library-level of abstraction and forced you to simultaneously think about how more detailed components of the library were designed to work?
3.4 Error Handling is “One Thing”
Many of those functions were complicated by the pattern of
- Try to do something
- Handle the errors can arise if the attempt fails
But those are two different “things”…
We have a lot more functions now, but each one is almost trival to read.
3.5 Avoid Duplication
-
Duplicated code is code that needs to be fixed/changed multiple times when fixing one bug or adding one feature.
-
Implicitly couples blocks of code in a way not visible from the interfaces.
-
Makes it hard to hide (“information hiding”) a design decision.
In our library example, the decision on how to handle cataloging errors has encoded three times:
private void addBookToCatalogs(book) {
attemptToAddBookToAuthorCatalog(book);
attemptToAddBookToTitleCatalog(book);
attemptToAddBookToSubjectCatalog(book);
}
private void attemptToAddBookToAuthorCatalog(Book book) {
try {
addBookToAuthorCatalog(book);
} catch (CatalogingError ex) {
log.error ("Error cataloging " + book.toString(), ex);
}
}
private void attemptToAddBookToTitleCatalog(Book book) {
try {
addBookToTitleCatalog (book);
} catch (CatalogingError ex) {
log.error ("Error cataloging " + book.toString(), ex);
}
}
private void attemptToAddBookToSubjectCatalog(Book book) {
try {
addBookToSubjectCatalog (book);
} catch (CatalogingError ex) {
log.error ("Error cataloging " + book.toString(), ex);
}
}
Better is to collect that common decision either like this:
private void addBookToCatalogs(book) {
attemptToAddBookToAuthorCatalog(book);
attemptToAddBookToTitleCatalog(book);
attemptToAddBookToSubjectCatalog(book);
}
private void handleCatalogingError (Book book, CatalogingError ex)
{
log.error ("Error cataloging " + book.toString(), ex);
}
private void attemptToAddBookToAuthorCatalog(Book book) {
try {
addBookToAuthorCatalog(book);
} catch (CatalogingError ex) {
handleCatalogingError (book, ex);
}
}
private void attemptToAddBookToTitleCatalog(Book book) {
try {
addBookToTitleCatalog (book);
} catch (CatalogingError ex) {
handleCatalogingError (book, ex);
}
}
private void attemptToAddBookToSubjectCatalog(Book book) {
try {
addBookToSubjectCatalog (book);
} catch (CatalogingError ex) {
handleCatalogingError (book, ex);
}
}
or like this:
private void attemptToAddBookToCatalogs(book) {
try {
addBookToCatalogs(book);
} catch (CatalogingError ex) {
log.error ("Error cataloging " + book.toString(), ex);
}
}
private void addBookToCatalogs(book) {
addBookToAuthorCatalog(book);
addBookToTitleCatalog(book);
addBookToSubjectCatalog(book);
}
3.6 Distinguish between data structures and “true” objects
Some classes/structs exist purely to specify a data structure, e.g.,
struct AuthorLinkedListNode {
Author data;
AuthorLinkedListNode* next;
AuthorLinkedListNode (const Author& author,
AuthorLinkedListNode* nextNode = nullptr)
: data(author), next(nextNode)
{}
};
This data type is not intended to be seen outside of Book
:
class Book {
private:
AuthorLinkedListNode* first;
AuthorLinkedListNode* last;
⋮
public:
so there’s really little reason to encapsulate its data members.
In some cases we might even enforce that idea
class Book {
private:
struct AuthorLinkedListNode {
Author data;
AuthorLinkedListNode* next;
AuthorLinkedListNode (const Author& author,
AuthorLinkedListNode* nextNode = nullptr)
: data(author), next(nextNode)
{}
};
AuthorLinkedListNode* first;
AuthorLinkedListNode* last;
⋮
public:
although this is not always possible.
3.7 The Law of Demeter
“Talk to friends, not strangers.”
The law of Demeter: a method
f
of a classC
should only call the methods of:
C
- an object created by
f
- an object passed as an argument to
f
- an object held in a (data structure of a) data member of
C
This law is intended to discourage code like this:
class Library {
⋮
void checkOutBookToPatron (Book book, Patron patron)
{
if (patron.getBranch().getLendingRecord(patron).getNumCheckedOut()
< patron.getBranch().getLendingRecord(patron).getCheckOutLimit()) {
Date dueDate = Calendar.today().addDays(book.checkOutTerm());
patron.getBranch().getLendingRecord(patron)
.addCheckout(book, dueDate);
book.checkOutTo(patron, dueDate);
} else {
throw new CheckoutLimitReached(patron);
}
}
Some people find the chained calls hard to read.
Perhaps more to the point, this style hides a coupling of this function to the interface of classes LibraryBranch
and LendingRecord
.
Making the coupling explicit might be an improvement:
class Library {
⋮
void checkOutBookToPatron (Book book, Patron patron)
{
LibraryBranch patronsHomeBranch = patron.getBranch();
LendingRecord patronLendingRecord = branch.getLendingRecord(patron);
if (patronLendingRecord.getNumCheckedOut()
< patronLendingRecord.getCheckOutLimit()) {
Date dueDate = Calendar.today().addDays(book.checkOutTerm());
patronLendingRecord.addCheckout(book, dueDate);
book.checkOutTo(patron, duedate);
} else {
throw new CheckoutLimitReached(patron);
}
}
but it doesn’t make the coupling go away.
- The highlighted calls violate the Law of Demeter
- Look at the other calls. Do you see why they don’t violate that law?
If anything, it makes it clearer that we are not sticking to a uniform level of abstraction here. So, better still:
class Library {
⋮
void checkOutBookToPatron (Book book, Patron patron)
{
if (patron.canCheckOutMoreBooks()) {
Date dueDate = Calendar.today().addDays(book.checkOutTerm());
patron.addBookToCheckedOutList(book, dueDate);
book.checkOutTo(patron, duedate);
} else {
throw new CheckoutLimitReached(patron);
}
}
class Patron {
⋮
public void addBookToCheckedOutList (Book book, Date dueDate) {
LibraryBranch patronsHomeBranch = branch;
LendingRecord patronLendingRecord = branch.getLendingRecord(patron);
patronLendingRecord.addCheckout(book, dueDate);
}
public boolean canCheckOutMoreBooks() {
LibraryBranch patronsHomeBranch = branch;;
LendingRecord patronLendingRecord = branch.getLendingRecord(patron);
return patronLendingRecord.getNumCheckedOut()
< patronLendingRecord.getCheckOutLimit()
}
which still has multiple violations, so we need to keep going:
class Patron {
⋮
public void addBookToCheckedOutList (Book book, Date dueDate) {
LibraryBranch patronsHomeBranch = branch;
branch.addBookToCheckedOutList(patron, book, dueDate);
}
public boolean canCheckOutMoreBooks() {
LibraryBranch patronsHomeBranch = branch;;
return branch.getNumCheckedOut(patron)
< branch.getCheckOutLimit(patron);
}
class LibraryBranch {
⋮
public void addBookToCheckedOutList (Patron patron,
Book book, Date dueDate) {
LendingRecord patronLendingRecord = lendingRecords.get(patron);
patronLendingRecord.addCheckout(book, dueDate);
}
public boolean canCheckOutMoreBooks() {
LendingRecord patronLendingRecord = lendingRecords.get(patron);
return patronLendingRecord.getNumCheckedOut()
< patronLendingRecord.getCheckOutLimit();
}
- Apply this “law” with caution.
- Personally, I like the 2nd version better.
- Martin also argues against functions taking more than one or two parameters, but this kind of refactoring for Demeter can result in longer parameter lists.
3.8 Error Handling
- Favor exceptions rather than special return codes, returning null, etc.
- Martin argues for unchecked exceptions rather than checked to avoid percolating
throws
declarations (and their accompanying coupling) throughout the code.
3.9 Classes should be small!
-
Martin argues strongly against large classes.
I.e., the number of responsibilities should be small.
- Roughly speaking, “responsibilities” are measured in terms of public members.
-
get/set pairs probably count as only one.
-
- Roughly speaking, “responsibilities” are measured in terms of public members.
-
But refactoring to accomodate some of his other recommendations (e.g., “functions should do one thing”) tends to make the number of function members explode.
-
The trick is that the added members are generally private, not public.
-
Cohesion
- When is the list of public members too big?
- When you cannot assign a simple overall description of what they all do in the class.
- The class is not cohesive if its public members perform wildly different tasks.
- The Single Responsibility Principle (SRP, part of SOLID) states that every class should have one overall responsibility.
Martin: We should also be able to write a brief description of the class in about 25 words, without using the words “if,” “and,” “or,” or “but.”
1: This practice is not commonly included in writing about Clean Coding, but I think that’s because it is generally taken for granted that “every” programmer does this. But I see lots of violations of this practice in student coding every semester. – SJZ