Building a Linked List - Part 2
Thomas J. Kennedy
1 What is the Goal?
In the previous lecture… we left off with a `LinkedList` class and an initial test suite. However, we never identified an end goal. Why build a Linked List class?
- We need a case study for interfaces with an emphasis on abstract base classes.
- One should always know how to build an iterator.
- Contrived examples are never fun… we need a non-trivial example to discuss how to add support for `extend` functionality and the `in` keyword.

The contrived examples note refers to the cliché polymorphism “animals eat” inheritance hierarchy that is common to most OOP texts.
Let us state our goal succinctly…

We want a `LinkedList` class that can be used as a drop-in replacement for the built-in Python `list`.
2 Okay… What Comes First?
An `Iterator` is the first piece of the puzzle. Our current implementation allows us to store data, but not retrieve that data. In fact… `__str__` is the only way to see the stored data.
Let us start with two abstract base classes from the Python `collections.abc` module:

- `Iterator` - this abstract base class (ABC) is used to traverse a data structure without having to worry about the specific data structure. Think of this as a general notion of position.
- `Iterable` - this ABC indicates that a class provides an iterator.
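The relationship between the two ABCs can be checked directly with the built-in `list`. This short sketch is not part of the `LinkedList` code. It shows that a `list` is `Iterable` but not an `Iterator`, while the object returned by `iter` is an `Iterator` that simply returns itself when passed to `iter` again.

```python
import collections.abc

data = [2, 3, 5, 7]
data_iter = iter(data)

# A list can provide an iterator, but it does not track a position itself
print(isinstance(data, collections.abc.Iterable))  # True
print(isinstance(data, collections.abc.Iterator))  # False

# The object returned by iter() tracks the current position
print(isinstance(data_iter, collections.abc.Iterator))  # True

# Every Iterator is also Iterable: iter() on an iterator returns the iterator itself
print(iter(data_iter) is data_iter)  # True
```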
We will need to

- tweak our code to include `import collections.abc`
- tweak the beginning of the class definition to `class LinkedList(collections.abc.Iterable):`
- add an inner `Iterator` class after `Node`: `class Iterator(collections.abc.Iterator): pass`
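One practical benefit of inheriting from these ABCs is that Python refuses to create an object until the required abstract methods exist (`__iter__` for `Iterable`, `__next__` for `Iterator`). The stripped-down skeleton below (no `Node`, no `append`) is only a sketch to illustrate this; the exact error message differs between Python versions.

```python
import collections.abc


class LinkedList(collections.abc.Iterable):
    # Skeleton only: __iter__ has not been written yet

    class Iterator(collections.abc.Iterator):
        pass


# Both ABCs declare abstract methods, so neither class can be instantiated yet
for cls in (LinkedList, LinkedList.Iterator):
    try:
        cls()
    except TypeError as err:
        print(f"{cls.__name__}: {err}")
```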
3 Implementing an Iterator
Let us put everything together, starting with the `Iterator`.
```python
class Iterator(collections.abc.Iterator):
    def __init__(self, starting_node=None):
        self.current_node = starting_node

    def __next__(self):
        raise StopIteration()
```
You probably noticed the earlier typo (i.e., `__next` instead of `__next__`). That was the first fix.

Take note of the class itself. The `__init__` method takes a single argument, `starting_node`, which defaults to `None`. The `self.current_node` data member will be used to keep track of the position as we move through a list.
```python
def __next__(self):
    raise StopIteration()
```
When an `Iterator` runs out of entries (i.e., reaches the end of a collection), it raises a `StopIteration` exception. That exception ends a loop (e.g., `for val in collection`) or signals to code calling `next` that no more data can be retrieved from the container.
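Roughly speaking, a `for` loop calls `iter` once and then calls `next` repeatedly until `StopIteration` is raised. The sketch below spells that out by hand using a built-in `list`; it is a simplified model of what the interpreter does, not its exact implementation.

```python
# A for loop such as
#     for val in collection:
#         ...
# behaves roughly like the explicit loop below.

collection = [2, 3, 5]
seen = []

it = iter(collection)
while True:
    try:
        val = next(it)
    except StopIteration:
        # The iterator has run out of entries; a for loop would end here
        break
    seen.append(val)

print(seen)  # [2, 3, 5]

# Calling next on an exhausted iterator raises StopIteration again,
# unless a default value is supplied
print(next(it, "no more data"))  # no more data
```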
In our case…

```python
def __next__(self):
    if not self.current_node:
        raise StopIteration()
```
The first step is to check whether `self.current_node` is `None`. If it is… there is no more data left. Otherwise we need to:

- Create a temporary reference to the data within the current node
- Move to the next node
- Return the temporary reference
```python
def __next__(self):
    if not self.current_node:
        raise StopIteration()

    value = self.current_node.data

    # Move to the next node
    self.current_node = self.current_node.next_node

    return value
```
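The finished `Iterator` can be exercised on its own before it is wired into `LinkedList`. This sketch assumes a minimal, hypothetical `Node` with a `Node(data, next_node=None)` constructor; only the `data` and `next_node` attribute names come from the code above. Because `collections.abc.Iterator` supplies `__iter__` (returning `self`), the iterator can be passed straight to `list`.

```python
import collections.abc


class Node:
    # Hypothetical minimal Node: only the data/next_node attributes used above
    def __init__(self, data, next_node=None):
        self.data = data
        self.next_node = next_node


class Iterator(collections.abc.Iterator):
    def __init__(self, starting_node=None):
        self.current_node = starting_node

    def __next__(self):
        if not self.current_node:
            raise StopIteration()

        value = self.current_node.data

        # Move to the next node
        self.current_node = self.current_node.next_node

        return value


# Build 2 -> 3 -> 5 by hand and walk it
head = Node(2, Node(3, Node(5)))
print(list(Iterator(starting_node=head)))  # [2, 3, 5]

# An iterator that starts at None has no data at all
print(list(Iterator()))  # []
```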
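Once the `LinkedList.Iterator.__next__` implementation is complete, the `LinkedList.__iter__` method is a one-liner.

```python
# Naming LinkedList in an annotation inside its own class body relies on
# postponed evaluation, e.g., `from __future__ import annotations` at the top of the module
def __iter__(self) -> LinkedList.Iterator:
    return LinkedList.Iterator(starting_node=self.head)
```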
I have opted to use the explicit keyword argument here… for readability.
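With `__iter__` in place, built-ins that accept an iterable (`list`, `sum`, `max`, …) work with `LinkedList`. The sketch below pairs the `Iterator` with a hypothetical `Node` and a simple tail-based `append`; both are assumptions, since the previous lecture's versions are not shown here. Because `__iter__` returns a new `Iterator` on every call, two nested loops over the same list do not interfere with each other.

```python
from __future__ import annotations

import collections.abc


class Node:
    # Hypothetical minimal Node (attribute names taken from the lecture code)
    def __init__(self, data, next_node=None):
        self.data = data
        self.next_node = next_node


class LinkedList(collections.abc.Iterable):
    class Iterator(collections.abc.Iterator):
        def __init__(self, starting_node=None):
            self.current_node = starting_node

        def __next__(self):
            if not self.current_node:
                raise StopIteration()
            value = self.current_node.data
            self.current_node = self.current_node.next_node
            return value

    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def append(self, val) -> None:
        # Assumed tail-based append; the lecture's own version may differ
        new_node = Node(val)
        if self.tail is None:
            self.head = new_node
        else:
            self.tail.next_node = new_node
        self.tail = new_node
        self.length += 1

    def __iter__(self) -> LinkedList.Iterator:
        return LinkedList.Iterator(starting_node=self.head)


ll = LinkedList()
for val in (2, 3, 5, 7):
    ll.append(val)

print(list(ll))  # [2, 3, 5, 7]
print(sum(ll))   # 17
print(max(ll))   # 7

# Each call to __iter__ creates a fresh Iterator, so nested loops work
pairs = [(a, b) for a in ll for b in ll]
print(len(pairs))  # 16
```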
4 The Iterator was the Key
The next few updates to `LinkedList` will focus on functionality that was either:

- difficult to implement without a `LinkedList.Iterator`
- near impossible to implement without a basic understanding of iterators.
4.1 Rewriting Dunder str
Now that we have an `Iterator`, the `__str__` method can be changed from a `while` loop…

```python
output_str = ""
idx = 0
it = self.head
while it:
    output_str += f"Node #{idx} contains {it.data}\n"
    it = it.next_node
    idx += 1

return output_str
```

to a `for` loop…

```python
output_str = ""
for idx, data in enumerate(self):
    output_str += f"Node #{idx} contains {data}\n"

return output_str
```
We can replace this with a generator expression combined with a call to `"\n".join`. This change solves two issues: repeated `+=` concatenation may copy the growing string on every iteration, and every entry (including the last) ends with `\n`, which leaves a trailing newline.

```python
return "\n".join(
    f"Node #{idx} contains {data}" for idx, data in enumerate(self)
)
```

The final implementation of `__str__` can (and should) be done in one line.
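Since the `join` expression relies only on iteration, it can be tried on any iterable. The helper name `node_summary` is hypothetical; it applies the same expression to a plain `list` to confirm that no trailing newline is produced and that an empty collection yields an empty string.

```python
def node_summary(data) -> str:
    # Same expression as the one-line __str__, applied to any iterable
    return "\n".join(
        f"Node #{idx} contains {datum}" for idx, datum in enumerate(data)
    )


print(node_summary([2, 3, 5]))
# Node #0 contains 2
# Node #1 contains 3
# Node #2 contains 5

print(repr(node_summary([])))  # ''
```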
4.2 Comparing for Equality
We can now add a `__eq__` method to compare two `LinkedList`s for equality. You will note that this `LinkedList` discussion has relaxed the "pydoc documentation for every method" rule. Remember…
- practicality beats purity - the purpose of most functions is captured by their names and arguments (e.g., `append`, which mirrors the `list.append` method).
- readability counts - additional redundant documentation can make code less readable, especially if there happen to be typos.
Just like `__str__`, `__eq__` needs explicit documentation to capture not the purpose of the function, but the actual criteria used to check for equality.

```python
def __eq__(self, rhs: LinkedList) -> bool:
    """
    Compare two LinkedList objects for equality based on the elements in
    each list. The two lists must:

      1. Have the same number of elements
      2. Contain identical elements
      3. Contain the identical elements in the same order
    """
```
We will start with two checks…

```python
if not isinstance(rhs, LinkedList):
    return False
```
Let us restrict the comparison to two `LinkedList` objects. If `rhs` is another type… it cannot be equal to a `LinkedList`. (While it is possible to relax this requirement… we will not do so here.)
```python
if len(self) != len(rhs):
    return False

return False
```
The next check is the length. Two `LinkedList` objects (i.e., `self` and `rhs`) cannot be equal if they contain different numbers of elements. The final `return False` is a placeholder for the remaining check.

If we make it past the length check… we know that the two lists contain the same number of elements. Since we have the `LinkedList.Iterator`… we can use `zip` to step through both lists simultaneously.
```python
for lhs_datum, rhs_datum in zip(self, rhs):
    if lhs_datum != rhs_datum:
        return False

return True
```
Note the loop… as soon as we encounter a pair of elements that are not equal… we can stop:

- To confirm that two lists are not equal… we need only find a single pair of entries that differ. While there may be more… it does not matter if one value differs or multiple values differ.
- To confirm that two lists are equal… every pair of values must be equal.
The built-in `any` and `all` functions can both be used to simplify the loop.

- If `any` pair of values (i.e., `lhs_datum, rhs_datum`) is not equal, return `False`.

```python
if any(lhs_datum != rhs_datum for lhs_datum, rhs_datum in zip(self, rhs)):
    return False

return True
```

- Return whether `all` pairs of values are equal.

```python
return all(lhs_datum == rhs_datum for lhs_datum, rhs_datum in zip(self, rhs))
```

My preference, in this case, is for the latter option (i.e., `all`).
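The length check is more than an optimization. `zip` stops as soon as the shorter input is exhausted, so comparing only the zipped pairs would treat a list as equal to any longer list that starts with the same elements. Plain `list` objects are enough to demonstrate the pitfall:

```python
lhs = [2, 3, 5]
rhs = [2, 3, 5, 7]

# zip stops at the end of the shorter input, so the 7 is never compared
print(list(zip(lhs, rhs)))  # [(2, 2), (3, 3), (5, 5)]

# Comparing the zipped pairs alone would wrongly report the lists as equal
pairs_equal = all(a == b for a, b in zip(lhs, rhs))
print(pairs_equal)  # True

# Checking the length first gives the correct answer
lists_equal = len(lhs) == len(rhs) and all(a == b for a, b in zip(lhs, rhs))
print(lists_equal)  # False
```

Python 3.10 added `zip(..., strict=True)`, which raises `ValueError` on a length mismatch instead of silently truncating. For `__eq__`, an explicit length check that returns `False` is arguably the better fit, since unequal lengths are an expected outcome rather than an error.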
4.3 Now for Copying
Let us now implement the `__deepcopy__` method. We would like the ability to create an identical copy of a list with a separate copy of all data.

```python
# Requires `import copy` at the top of the module
def __deepcopy__(self, memo) -> LinkedList:
    list_copy = LinkedList()

    for entry in self:
        # Forward memo so an object shared between entries is copied only once
        list_copy.append(copy.deepcopy(entry, memo))

    return list_copy
```
Now that we have our own iterator… we can write functions that examine or retrieve data with far less effort (as you may have noticed with `__str__` and `__eq__`).

Note that immutable values (such as `int` and `str`) do not actually get copied; `copy.deepcopy` hands back the original object. All references to a `7` will reference the same `7`. The copy logic matters for mutable objects (e.g., `list`, or user-defined classes/objects).
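The behaviour described above can be checked directly with `copy.deepcopy` on built-in objects. The last lines show the job of the `memo` dictionary: an object referenced twice is copied once, so the copy keeps the same sharing as the original. Passing `memo` along in a custom `__deepcopy__` (i.e., `copy.deepcopy(entry, memo)`) is what lets a container preserve that behaviour for its elements.

```python
import copy

# Immutable values: deepcopy hands back the very same object
seven = 7
word = "linked"
print(copy.deepcopy(seven) is seven)  # True
print(copy.deepcopy(word) is word)    # True

# Mutable values: deepcopy builds independent copies
rows = [[1, 2], [3, 4]]
rows_copy = copy.deepcopy(rows)
rows_copy[0].append(99)
print(rows)       # [[1, 2], [3, 4]]
print(rows_copy)  # [[1, 2, 99], [3, 4]]

# The memo dictionary: an object referenced twice is copied only once,
# so the copy keeps the same sharing as the original
shared = [0]
pair = [shared, shared]
pair_copy = copy.deepcopy(pair)
print(pair_copy[0] is pair_copy[1])  # True
print(pair_copy[0] is shared)        # False
```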
5 Adding Data with “extend”
The `list` class provides an `extend` method that allows multiple values to be added at the same time, instead of one at a time with multiple calls to `append`.

```python
def extend(self, collection: collections.abc.Iterable) -> None:
    """
    Take every value in collection, create a new Node, and append it to
    this list
    """

    for value in collection:
        self.append(value)
```

Yes… the `extend` method is using a `for` loop and calling `append` within the loop. We will discuss how to optimize this function in the next lecture.
Did you notice how `collection` is an `Iterable`? It does not matter where the data comes from (e.g., a generator or a `list`). If we can iterate over the data… our loop can handle it.
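A quick way to see this is to feed `extend` a `list`, a `range`, a generator, and another `LinkedList`. The sketch below reuses a hypothetical `Node` and tail-based `append` (assumptions, as before). It also adds one guard that is not in the lecture code: when a list is extended with itself, the loop can keep reaching the nodes it just appended and never finish, so the sketch snapshots the data first. The built-in `list.extend` handles that case, which matters for a drop-in replacement.

```python
import collections.abc


class Node:
    # Hypothetical minimal Node (attribute names taken from the lecture code)
    def __init__(self, data, next_node=None):
        self.data = data
        self.next_node = next_node


class LinkedList(collections.abc.Iterable):
    class Iterator(collections.abc.Iterator):
        def __init__(self, starting_node=None):
            self.current_node = starting_node

        def __next__(self):
            if not self.current_node:
                raise StopIteration()
            value = self.current_node.data
            self.current_node = self.current_node.next_node
            return value

    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def append(self, val) -> None:
        # Assumed tail-based append
        new_node = Node(val)
        if self.tail is None:
            self.head = new_node
        else:
            self.tail.next_node = new_node
        self.tail = new_node
        self.length += 1

    def extend(self, collection: collections.abc.Iterable) -> None:
        # Assumption (not in the lecture code): snapshot the data when a list
        # is extended with itself so the loop cannot chase its own new nodes
        if collection is self:
            collection = list(collection)

        for value in collection:
            self.append(value)

    def __iter__(self):
        return LinkedList.Iterator(starting_node=self.head)


ll = LinkedList()
ll.extend([2, 3])                 # a list
ll.extend(range(5, 8, 2))         # a range: 5, 7
ll.extend(n * n for n in (3, 4))  # a generator: 9, 16
print(list(ll))  # [2, 3, 5, 7, 9, 16]

other = LinkedList()
other.extend(ll)     # another LinkedList
other.extend(other)  # extending with itself relies on the guard above
print(list(other))  # [2, 3, 5, 7, 9, 16, 2, 3, 5, 7, 9, 16]
```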
6 Adding Debugging Output
`LinkedList` provides a `__str__` for human-readable (i.e., production) output. We need a `__repr__` that provides debugging output… ideally this output takes the form of Python code that can recreate an identical object.
Consider…

```python
ll = LinkedList()
ll.append(2)
ll.append(3)
ll.append(5)
ll.append(7)

print(f"{ll!r}")
```

This code should generate, as output…

Expected Output

```
LinkedList(2, 3, 5, 7)
```
Let us add `__repr__` to `LinkedList`.

```python
def __repr__(self) -> str:
    inner_data_str = ", ".join(f"{datum!r}" for datum in self)

    return f"LinkedList({inner_data_str})"
```
However, our constructor does not support arguments… and we want to supply multiple arguments. First… let us make a subtle change…
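```python
def __repr__(self) -> str:
    inner_data_str = ", ".join(f"{datum!r}" for datum in self)

    # A one-element tuple needs a trailing comma: (2,) is a tuple, (2) is just 2
    if len(self) == 1:
        inner_data_str += ","

    return f"LinkedList(({inner_data_str}))"
```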
Take note of the double parens in `LinkedList((...))`. We are going to update `__init__` to accept an `Iterable`… and a `tuple` is iterable!

Updated Output

```
LinkedList((2, 3, 5, 7))
```
7 Updating dunder init
The `__init__` signature will mirror that of `extend`:

```python
def __init__(self, initial_data: collections.abc.Iterable = None):
    self.head: Node = None
    self.tail: Node = None
    self.length: int = 0

    # Use extend to add any starting data
    if initial_data:
        self.extend(initial_data)
```

The only addition to the function body is a conditional block that calls `extend` if starting data was provided.
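With `__init__` accepting an iterable, the goal stated for `__repr__` can be tested: `eval(repr(ll)) == ll`. The sketch below combines the pieces from this lecture with a hypothetical `Node`, an assumed tail-based `append`, and a `__len__` based on `self.length`. One deviation is labeled in the code: it builds the tuple with `tuple(self)`, whose `repr` already includes the trailing comma a one-element tuple needs, instead of the `join`-based string.

```python
from __future__ import annotations

import collections.abc


class Node:
    # Hypothetical minimal Node (attribute names taken from the lecture code)
    def __init__(self, data, next_node=None):
        self.data = data
        self.next_node = next_node


class LinkedList(collections.abc.Iterable):
    class Iterator(collections.abc.Iterator):
        def __init__(self, starting_node=None):
            self.current_node = starting_node

        def __next__(self):
            if not self.current_node:
                raise StopIteration()
            value = self.current_node.data
            self.current_node = self.current_node.next_node
            return value

    def __init__(self, initial_data: collections.abc.Iterable = None):
        self.head: Node = None
        self.tail: Node = None
        self.length: int = 0

        if initial_data:
            self.extend(initial_data)

    def append(self, val) -> None:
        # Assumed tail-based append
        new_node = Node(val)
        if self.tail is None:
            self.head = new_node
        else:
            self.tail.next_node = new_node
        self.tail = new_node
        self.length += 1

    def extend(self, collection: collections.abc.Iterable) -> None:
        for value in collection:
            self.append(value)

    def __iter__(self) -> LinkedList.Iterator:
        return LinkedList.Iterator(starting_node=self.head)

    def __len__(self) -> int:
        return self.length

    def __eq__(self, rhs: LinkedList) -> bool:
        if not isinstance(rhs, LinkedList):
            return False
        if len(self) != len(rhs):
            return False
        return all(lhs_datum == rhs_datum for lhs_datum, rhs_datum in zip(self, rhs))

    def __repr__(self) -> str:
        # Variant (assumption): tuple(self)!r adds the trailing comma for one element
        return f"LinkedList({tuple(self)!r})"


for original in (LinkedList(), LinkedList([2]), LinkedList((2, 3, 5, 7))):
    # Prints LinkedList(()), then LinkedList((2,)), then LinkedList((2, 3, 5, 7))
    print(repr(original))

    # The round trip only works because __init__ now accepts an iterable
    assert eval(repr(original)) == original
```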
8 The Code So Far…
The current (complete) `LinkedList` class from this example can be found (along with a test suite) in Module-10-Linked-List/Example-2 in the Example Code Repository.

Now that we have a reasonably complete `LinkedList`… we can discuss some refactoring and the Python protocol mechanic. But… that will be the next lecture.