Protocol vs Abstract Base Class
Thomas J. Kennedy
1 A Clear Difference
I spent quite a bit of time reading Python PEP 544 after the Typing Module documentation. Let us take a quick look at the Rationale and Goals section of PEP 544.
Example 1: PEP 544 - Rationale and Goals
This passage was retrieved from https://peps.python.org/pep-0544/#rationale-and-goals.
Currently, PEP 484 and the typing module [typing] define abstract base classes for several common Python protocols such as Iterable and Sized. The problem with them is that a class has to be explicitly marked to support them, which is unpythonic and unlike what one would normally do in idiomatic dynamically typed Python code. For example, this conforms to PEP 484:
from typing import Sized, Iterable, Iterator

class Bucket(Sized, Iterable[int]):
    ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[int]: ...
The same problem appears with user-defined ABCs: they must be explicitly subclassed or registered. This is particularly difficult to do with library types as the type objects may be hidden deep in the implementation of the library. Also, extensive use of ABCs might impose additional runtime costs.
The intention of this PEP is to solve all these problems by allowing users to write the above code without explicit base classes in the class definition, allowing Bucket to be implicitly considered a subtype of both Sized and Iterable[int] by static type checkers using structural [wiki-structural] subtyping:
from typing import Iterator, Iterable

class Bucket:
    ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[int]: ...

def collect(items: Iterable[int]) -> int: ...
result: int = collect(Bucket())  # Passes type check
Note that ABCs in typing module already provide structural behavior at runtime, isinstance(Bucket(), Iterable) returns True. The main goal of this proposal is to support such behavior statically. The same functionality will be provided for user-defined protocols, as specified below. The above code with a protocol class matches common Python conventions much better. It is also automatically extensible and works with additional, unrelated classes that happen to implement the required protocol.
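Stepping outside the quoted passage for a moment, the runtime behavior it mentions can be checked directly. The following is a minimal sketch, assuming a hypothetical Bucket that stores its items in a list (the PEP leaves the class body elided). It imports the ABCs from collections.abc, which is where the classes behind typing.Iterable and typing.Sized live.

```python
from collections.abc import Iterable, Iterator, Sized


class Bucket:
    """Hypothetical container; note that it declares no base classes."""

    def __init__(self, items: list[int]) -> None:
        self._items = list(items)

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[int]:
        return iter(self._items)


bucket = Bucket([1, 2, 3])

# Both checks succeed at runtime because these ABCs look for the
# required methods (__iter__, __len__) rather than for inheritance.
is_iterable = isinstance(bucket, Iterable)
is_sized = isinstance(bucket, Sized)
print(is_iterable, is_sized)
```

What PEP 544 adds is the static half: a type checker can reach the same conclusion without running the code.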
Now for the important question… How would we explain this passage in casual conversation? Well… let us find out!
2 What Do We Know?
We know that, given a base class (e.g., Shape), any derived class (e.g., Rectangle) must explicitly inherit from the base class. Consider the following Shape class.
from typing import Protocol


class Shape(Protocol):
    """
    Shape in a 2-D Cartesian Plane
    """

    def name(self) -> str:
        raise NotImplementedError()

    def area(self) -> float:
        raise NotImplementedError()

    def perimeter(self) -> float:
        raise NotImplementedError()
Take note of the three methods:
name - which requires a class to return the name of the shape.
area - which requires a class to implement an area computation.
perimeter - which requires a class to implement a perimeter computation.
A Rectangle class could be written as…
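from dataclasses import dataclass


@dataclass
class Rectangle:
    length: float = 1
    width: float = 1

    def name(self) -> str:
        return "Rectangle"

    def area(self) -> float:
        # Area of a rectangle: length times width
        return self.length * self.width

    def perimeter(self) -> float:
        # Perimeter of a rectangle: twice the sum of its two sides
        return 2 * (self.length + self.width)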
Okay… What is the difference? The first line of the Rectangle class differs:
class Rectangle(Shape):
versus
class Rectangle:
We no longer need to explicitly mark Rectangle as inheriting from Shape.
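To make the point concrete, here is a minimal, self-contained sketch. The describe function is a hypothetical helper introduced only for illustration; it is annotated with Shape, yet it accepts a Rectangle that never names Shape in its class header. A static checker that supports PEP 544 (mypy, for example) should accept the call.

```python
from dataclasses import dataclass
from typing import Protocol


class Shape(Protocol):
    def name(self) -> str: ...
    def area(self) -> float: ...
    def perimeter(self) -> float: ...


@dataclass
class Rectangle:  # no Shape in the class header
    length: float = 1.0
    width: float = 1.0

    def name(self) -> str:
        return "Rectangle"

    def area(self) -> float:
        return self.length * self.width

    def perimeter(self) -> float:
        return 2 * (self.length + self.width)


def describe(shape: Shape) -> str:
    """Hypothetical helper that relies only on the Shape protocol."""
    return (
        f"{shape.name()}: area={shape.area():.1f}, "
        f"perimeter={shape.perimeter():.1f}"
    )


summary = describe(Rectangle(length=2.0, width=3.0))
print(summary)
```

Rectangle matches Shape purely because it provides the three methods with compatible signatures.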
3 What is the Benefit?
We can write code along the lines of…
def get_largest_area_by_shape_name(shapes: list[Shape]) -> tuple[str, float]:
    """
    Given a list of shapes:

    1. Identify a set of shape names
    2. Find the largest area for each name
    """
    pass
The function does not need to know about the specific classes involved, just that those classes provide name and area.
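One possible way to fill in such a function is sketched below. Several things here are assumptions made for illustration: the function is named largest_area_by_name, it returns a dict that maps each name to its largest area (rather than the tuple in the stub above), Shape is trimmed to the two methods the function uses, and Square is a hypothetical second shape.

```python
from dataclasses import dataclass
from typing import Protocol


class Shape(Protocol):
    # Trimmed to the two methods this function actually calls.
    def name(self) -> str: ...
    def area(self) -> float: ...


@dataclass
class Rectangle:
    length: float
    width: float

    def name(self) -> str:
        return "Rectangle"

    def area(self) -> float:
        return self.length * self.width


@dataclass
class Square:  # hypothetical second shape
    side: float

    def name(self) -> str:
        return "Square"

    def area(self) -> float:
        return self.side * self.side


def largest_area_by_name(shapes: list[Shape]) -> dict[str, float]:
    """Map each shape name to the largest area seen for that name."""
    largest: dict[str, float] = {}
    for shape in shapes:
        key = shape.name()
        area = shape.area()
        if key not in largest or area > largest[key]:
            largest[key] = area
    return largest


shapes: list[Shape] = [Rectangle(1.0, 2.0), Rectangle(2.0, 3.0), Square(2.0)]
result = largest_area_by_name(shapes)
print(result)
```

Nothing in largest_area_by_name mentions Rectangle or Square; adding a new shape class requires no change to the function.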
4 That Sounds Like an Abstract Base Class
That is an accurate assessment. However, a Protocol can be defined independently of the class definitions (e.g., you could define a Protocol and have it apply to classes written as part of a separate module or library).
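As a rough sketch of that idea, the example below defines a hypothetical Closeable protocol and applies it to io.StringIO, a standard-library class that we did not write and cannot edit. The protocol name and the close_all helper are assumptions made for illustration. The runtime_checkable decorator from typing allows an isinstance check, which only verifies that the method exists, not its signature.

```python
import io
from collections.abc import Iterable
from typing import Protocol, runtime_checkable


@runtime_checkable
class Closeable(Protocol):
    """Hypothetical protocol: anything that provides close()."""

    def close(self) -> None: ...


def close_all(resources: Iterable[Closeable]) -> None:
    """Close every resource, whatever its concrete class is."""
    for resource in resources:
        resource.close()


buffer = io.StringIO("some text")

# io.StringIO never heard of Closeable, yet it satisfies the protocol.
buffer_matches = isinstance(buffer, Closeable)
# An int has no close() method, so it does not.
int_matches = isinstance(42, Closeable)

close_all([buffer])
print(buffer_matches, int_matches, buffer.closed)
```

With an abstract base class, the same result would require subclassing or calling register for each library type.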
Alex Martelli (Fellow of the Python Software Foundation) summarizes the notion in a post on Google Groups.
In other words, don’t check whether it IS-a duck: check whether it QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with. If the argument fails this specific-ducklyhood-subset-test, then you can shrug, ask “why a duck?” (at least, you can if you’re a Marx Brothers fan and have memorized “Cocoanuts”’ script; Monty Python one-true-wayists will have to find their own simile here), and move on to the next set of tests (why-a-no- chicken immediately comes to mind, but then one would have to ask why it crosses the road, so I think we’d better snip it).
…
Besides, “explicit is better than implicit”, goes one of Python’s mantras. Just let the client-code explicitly TELL you which kind of argument they are passing you (and doing so through a named argument is simple and readable), and your work drops to zero, while removing no useful functionality whatever from the client. As a little vig, you also avoid trouble in case what the client wants to pass is some tricky object that behaves EITHER as a file connection OR as a db connection (etc, etc) – not all that likely, but, who knows.
This gets us closer to Interface Segregation (the ‘I’ in SOLID). It is much better to define the specific behavior we want than to check for a specific type of thing.
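A small sketch of what that can look like in practice: instead of asking for a full Shape, a function asks only for the one behavior it uses. The HasArea protocol, the Field class, and total_area are all hypothetical names introduced for this illustration.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Protocol


class HasArea(Protocol):
    """Narrow protocol: the only behavior total_area depends on."""

    def area(self) -> float: ...


@dataclass
class Field:
    """Hypothetical class with an area but no name() or perimeter()."""

    square_meters: float

    def area(self) -> float:
        return self.square_meters


def total_area(items: Iterable[HasArea]) -> float:
    """Sum the areas of anything that can report an area."""
    return sum(item.area() for item in items)


total = total_area([Field(10.0), Field(5.5)])
print(total)
```

Field would not satisfy the full Shape protocol, but it does not need to; the function only asks for what it uses.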