Exceptions & File Parsing
Thomas J. Kennedy
1 A Less Than Cliché Problem
Suppose that we have an input file that contains information about two-dimensional shapes.
Example 1: inputShapes.txt
Triangle; 4 4 4
Right Triangle; 4 5
Right Triangle; 4 five
Equilateral Triangle; 3
Equilateral Triangle 3
Equilateral Triangle; 3 3 3
Square; 9
Circle; 5
1337 Haxor; 1 lol i broke it
Ellipse;
Let us write a quick program to parse the file.
from typing import TextIO
import sys


def parse_shape_file(input_file: TextIO) -> list[tuple[str, list[float]]]:
    for line in input_file:
        # Remove leading and trailing whitespace
        line = line.strip()

        if not line:
            continue

        print(line)

    return []


def main():
    with open(sys.argv[1], "r") as shape_file:
        shape_data = parse_shape_file(shape_file)


if __name__ == "__main__":
    main()
Take note of how the main function retrieves the input filename from sys.argv[1]. I am not happy with the current main function…
def main():
    with open(sys.argv[1], "r") as shape_file:
        shape_data = parse_shape_file(shape_file)
If the user…
- Supplies an invalid filename (e.g., to a file that does not exist), we will encounter a FileNotFoundError exception.
- Does not supply a filename, we will encounter an IndexError exception.
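Both failure modes are easy to reproduce in isolation. The sketch below is only an illustration: the fake_argv list and the filename are made-up stand-ins (the file is assumed not to exist), not part of the example program.

```python
# A stand-in for sys.argv when the user forgets the filename:
# only the script name is present.
fake_argv = ["exception_ex_5.py"]

try:
    filename = fake_argv[1]
except IndexError as err:
    # Indexing past the end of a list raises IndexError
    index_error_message = str(err)

# Opening a path that does not exist raises FileNotFoundError.
# (Hypothetical filename, assumed to be absent from the current directory.)
try:
    with open("this_file_should_not_exist_12345.txt", "r") as missing_file:
        missing_file.read()
except FileNotFoundError as err:
    # The exception object carries the OS error number and the filename
    missing_errno = err.errno
    missing_name = err.filename
```

The attributes errno and filename are what produce the "[Errno 2] … 'name'" text when the exception is printed.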
Neither of these exceptions is handled. Let us start with the IndexError.
try:
    shape_filename = sys.argv[1]

except IndexError as _err:
    print("Usage: exception_ex_5.py INPUT_FILENAME")
    sys.exit(1)

with open(shape_filename, "r") as shape_file:
    shape_data = parse_shape_file(shape_file)
That is much better. We first attempt to retrieve the filename within a try-except block. If no filename was provided… we output a usage message and exit. However, we still have the invalid file issue, e.g.,
python3.11 exception_ex_5.py doesNotExist.txt
Traceback (most recent call last):
  File "exception_ex_5.py", line 33, in <module>
    main()
  File "exception_ex_5.py", line 28, in main
    with open(shape_filename, "r") as shape_file:
FileNotFoundError: [Errno 2] No such file or directory: 'doesNotExist.txt'
A quick second try-except block can handle the issue…
def main():
    try:
        shape_filename = sys.argv[1]

    except IndexError as _err:
        print("Usage: exception_ex_5.py INPUT_FILENAME")
        sys.exit(1)

    try:
        with open(shape_filename, "r") as shape_file:
            shape_data = parse_shape_file(shape_file)

    except FileNotFoundError as err:
        print(err)
        sys.exit(2)
Note how err is printed. At the end of the day… the final line of the stack trace is what we want. We can just output the FileNotFoundError’s message.
python3.11 exception_ex_6.py doesNotExist.txt
[Errno 2] No such file or directory: 'doesNotExist.txt'
The files exception_ex_5.py and exception_ex_6.py contain the updated code with the addition of the first and second try-except blocks, respectively.
2 What About Parsing the File
Do not worry… we will start parsing the input file shortly. Keep in mind that these first two exceptions (i.e., IndexError and FileNotFoundError) are two of the most common exceptions when dealing with files and command line arguments. However, too many programmers rush to the fun part and say “I will write that error handling later.” That error handling is often never written.
Yes… I have worked quite hard to rid myself of this very habit. Learn from my mistakes.
Let us take a moment to review the entire file (after all the updates).
from typing import TextIO
import sys


def parse_shape_file(input_file: TextIO) -> list[tuple[str, list[float]]]:
    for line in input_file:
        # Remove leading and trailing whitespace
        line = line.strip()

        if not line:
            continue

        print(line)

    return []


def main():
    try:
        shape_filename = sys.argv[1]

    except IndexError as _err:
        print("Usage: exception_ex_5.py INPUT_FILENAME")
        sys.exit(1)

    try:
        with open(shape_filename, "r") as shape_file:
            shape_data = parse_shape_file(shape_file)

    except FileNotFoundError as err:
        print(err)
        sys.exit(2)


if __name__ == "__main__":
    main()
I think that main is good enough for us to focus on parse_shape_file.
3 The Actual Parsing
Let us start by changing the return type. A list of tuples is probably not the best choice. Once the shape names and corresponding numbers are retrieved there will probably be follow-up validation (e.g., checking that the numbers are non-negative). Let us switch from a function that returns a list…
def parse_shape_file(input_file: TextIO) -> list[tuple[str, list[float]]]:
to a function that yields values as a Generator.
def parse_shape_file(input_file: TextIO) -> Generator[tuple[str, list[float]], None, None]:
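Note that Generator must be imported (e.g., from typing) before it can be used in the annotation. To illustrate why a generator suits follow-up validation, here is a minimal sketch of a second generator that consumes records one at a time. The name only_non_negative, the ShapeRecord alias, and the sample records are all hypothetical; they are not part of the course example.

```python
from typing import Generator, Iterable

# Hypothetical alias for the (name, numbers) tuples discussed above
ShapeRecord = tuple[str, list[float]]


def only_non_negative(records: Iterable[ShapeRecord]) -> Generator[ShapeRecord, None, None]:
    """Hypothetical follow-up check: skip records that contain a negative number."""
    for name, numbers in records:
        if all(number >= 0 for number in numbers):
            yield (name, numbers)


# Made-up records standing in for the output of parse_shape_file
raw_records = [("Square", [9.0]), ("Circle", [-5.0]), ("Ellipse", [])]

valid_records = list(only_non_negative(raw_records))
```

Each record flows through the check as soon as it is produced. No intermediate list is built until the final call to list.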
And… let us add some pydoc documentation…
def parse_shape_file(input_file: TextIO) -> Generator[tuple[str, list[float]], None, None]:
    """
    Take each line from a given file (or file-like object) and split it into a
    tuple in the form

        (name, [val_1, val_2, ...])

    If a line is invalid (e.g., contains non-numeric values after the
    semicolon)... skip the line.
    """
Note that this does require a change in main…
with open(shape_filename, "r") as shape_file:
    shape_data = parse_shape_file(shape_file)
must be rewritten as…
with open(shape_filename, "r") as shape_file:
    shape_data = list(parse_shape_file(shape_file))
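The call to list matters because a generator does no work until it is consumed. If the generator were consumed after the with block, the file would already be closed. The sketch below demonstrates this with io.StringIO standing in for a real file. The stripped_lines function is a simplified, hypothetical stand-in for parse_shape_file.

```python
import io
from typing import Generator, TextIO


def stripped_lines(input_file: TextIO) -> Generator[str, None, None]:
    """Hypothetical stand-in for parse_shape_file: yield each non-blank line."""
    for line in input_file:
        line = line.strip()

        if line:
            yield line


# Consumed inside the with block: every line is read before the file closes.
with io.StringIO("Square; 9\nCircle; 5\n") as shape_file:
    eager = list(stripped_lines(shape_file))

# Only created inside the with block: nothing has been read when the file closes.
with io.StringIO("Square; 9\nCircle; 5\n") as shape_file:
    lazy = stripped_lines(shape_file)

try:
    too_late = list(lazy)
except ValueError:
    # Reading from a closed file (or file-like object) raises ValueError
    too_late = None
```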
3.1 The Initial Loop
Let us start with a basic loop. We want to:
- Grab a line
- Strip (remove) leading and trailing whitespace from the line
- If the line is blank (i.e., empty)… skip it and continue to the next line
for line in input_file:
    # Remove leading and trailing whitespace
    line = line.strip()

    if not line:
        continue
We need to split each line at the semicolon (i.e., ‘;’). If the line does not contain a semicolon… it is malformed. In that case the two-name unpacking below raises a ValueError, which we catch.
    try:
        name, the_rest = line.split(";")

    except ValueError as _err:
        print(f"Missing ';' -> \"{line}\" is malformed.", file=sys.stderr)
        continue
Take note of how the output is written to sys.stderr. In practice this is a recoverable error. If we were using a logger (as production code should) this message would be output at the Warning level. It is an error… but an error from which we can recover (by skipping the line). We want to separate error output from actual output.
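For a rough idea of what the logger version might look like, here is a minimal sketch using the standard logging module. The logger name, the message format, and the in-memory log_capture stream are assumptions made for illustration. A real program would more likely let the handler write to sys.stderr (the default for StreamHandler).

```python
import io
import logging

# Hypothetical logger name; the course example does not define a logger.
logger = logging.getLogger("shape_parser_sketch")
logger.setLevel(logging.WARNING)
logger.propagate = False

# Capture log output in memory so that it can be inspected afterwards.
log_capture = io.StringIO()
handler = logging.StreamHandler(log_capture)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)

line = "Equilateral Triangle 3"

try:
    name, the_rest = line.split(";")

except ValueError:
    # Recoverable problem -> Warning level (the caller would skip the line)
    logger.warning("Missing ';' -> \"%s\" is malformed.", line)

captured = log_capture.getvalue()
```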
Now… we want to parse the rest of the line (i.e., the_rest). Let us…
- Remove any leading whitespace.
    the_rest = the_rest.lstrip()
- Split on whitespace to get a list.
    the_rest = the_rest.split()
- Try to convert every list entry into a float.
    numbers = [float(val) for val in the_rest]
- Handle the case where a val is not a number.
    try:
        numbers = [float(val) for val in the_rest]

    except ValueError as err:
        print(f"{err} -> \"{line}\" is malformed.", file=sys.stderr)
        continue
- Yield (rather than return) name and numbers as a tuple.
    yield (name, numbers)
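The conversion step is all-or-nothing: a single bad token causes the entire list comprehension to fail. The sketch below isolates that behavior. The helper name to_numbers is hypothetical and exists only for this illustration.

```python
def to_numbers(text: str) -> list[float]:
    """Hypothetical helper: convert whitespace-separated tokens into floats."""
    return [float(token) for token in text.split()]


# Every token is numeric -> a list of floats
good = to_numbers(" 4 4 4")

# Nothing after the semicolon (e.g., "Ellipse;") -> an empty list, not an error
empty = to_numbers("")

# One non-numeric token -> ValueError for the whole line
try:
    bad = to_numbers(" 1 lol i broke it")
except ValueError as err:
    bad = None
    bad_message = str(err)
```

Note that str.split with no arguments already ignores leading and trailing whitespace, so the result is the same with or without the preceding lstrip.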
4 Revising the Loop?
Let us take a look at the complete parse_shape_file function.
def parse_shape_file(input_file: TextIO) -> Generator[tuple[str, list[float]], None, None]:
    """
    Take each line from a given file (or file-like object) and split it into a
    tuple in the form

        (name, [val_1, val_2, ...])

    If a line is invalid (e.g., contains non-numeric values after the
    semicolon)... skip the line.
    """

    for line in input_file:
        # Remove leading and trailing whitespace
        line = line.strip()

        if not line:
            continue

        try:
            name, the_rest = line.split(";")

        except ValueError as _err:
            print(f"Missing ';' -> \"{line}\" is malformed.", file=sys.stderr)
            continue

        the_rest = the_rest.lstrip()
        the_rest = the_rest.split()

        try:
            numbers = [float(val) for val in the_rest]

        except ValueError as err:
            print(f"{err} -> \"{line}\" is malformed.", file=sys.stderr)
            continue

        yield (name, numbers)
One might be tempted to combine the two try-except blocks. Even though the two exceptions are both ValueErrors… they occur for different reasons. Our goal is to maintain context that can be logged (even though logging is a topic for a future module).
Let us leave the loop “as is” for now.
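To see how different the two ValueErrors are, the sketch below triggers each one separately and keeps the message. The sample strings are taken from or modeled on the input file. The line with two semicolons is a made-up extra case and is not in inputShapes.txt.

```python
# Case 1: no semicolon -> split returns one piece, so unpacking into two names fails
try:
    name, the_rest = "Equilateral Triangle 3".split(";")
except ValueError as err:
    unpack_message = str(err)

# Case 2: semicolon present, but one token is not a number -> float() fails
try:
    numbers = [float(val) for val in "4 five".split()]
except ValueError as err:
    convert_message = str(err)

# Case 3 (made-up input): two semicolons -> three pieces, unpacking fails again
try:
    name, the_rest = "Square; 9; 9".split(";")
except ValueError as err:
    extra_message = str(err)
```

Case 3 lands in the same except block as Case 1, so the parser as written would report it as a missing semicolon. That is one more reason to keep the context of each failure separate.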
5 Returning to “main”
Let us tweak main a little more. While parse_shape_file is intended for use in a larger program… we can still take a quick look at the output. Let us add a single print statement at the end of main.
    try:
        with open(shape_filename, "r") as shape_file:
            shape_data = list(parse_shape_file(shape_file))

    except FileNotFoundError as err:
        print(err)
        sys.exit(2)

    print(shape_data)
Take note of where the print statement is located. It is outside the
- try-except block because we only have data if shape_file was opened successfully
- with context manager because we are done with the input file.
This organization clearly documents that we only output shape_data after we have finished reading the input file.
Unfortunately, the output leaves much to be desired. The list version of repr does output the list of tuples in a form that captures the content, but everything ends up on a single line.
Example 2: Output with repr
could not convert string to float: 'five' -> "Right Triangle; 4 five" is malformed.
Missing ';' -> "Equilateral Triangle 3" is malformed.
could not convert string to float: 'lol' -> "1337 Haxor; 1 lol i broke it" is malformed.
[('Triangle', [4.0, 4.0, 4.0]), ('Right Triangle', [4.0, 5.0]), ('Equilateral Triangle', [3.0]),
('Equilateral Triangle', [3.0, 3.0, 3.0]), ('Square', [9.0]), ('Circle', [5.0]), ('Ellipse', [])]
The last two lines are actually output as a single line. I added a line break for readability. Let us switch to a PrettyPrinter from the pprint module. Let us start with the import statement.
import pprint as pp
The convention is to refer to the pprint module with the pp abbreviation. We can then replace
print(shape_data)
with
pp.pprint(shape_data, indent=2, width=72)
Take note of the two keyword arguments:
- indent is the number of spaces to indent each level of the list
- width is the maximum width of a line before output moves to the next line
The pprint output is much more readily parsed (by a human).
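To experiment with these two arguments without printing anything, pprint also provides pformat, which returns the formatted text as a string. The sketch below uses a shortened, hand-written copy of the shape data (an assumption made to keep the example small).

```python
import pprint as pp

# A shortened copy of the parsed data, written by hand for this sketch
shape_data = [
    ("Triangle", [4.0, 4.0, 4.0]),
    ("Right Triangle", [4.0, 5.0]),
    ("Ellipse", []),
]

# pformat returns the text that pprint would have printed
formatted = pp.pformat(shape_data, indent=2, width=72)

# The one-line repr is wider than 72 characters, so each tuple gets its own line
formatted_lines = formatted.splitlines()
```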
Example 3: Output with pprint
could not convert string to float: 'five' -> "Right Triangle; 4 five" is malformed.
Missing ';' -> "Equilateral Triangle 3" is malformed.
could not convert string to float: 'lol' -> "1337 Haxor; 1 lol i broke it" is malformed.
[ ('Triangle', [4.0, 4.0, 4.0]),
  ('Right Triangle', [4.0, 5.0]),
  ('Equilateral Triangle', [3.0]),
  ('Equilateral Triangle', [3.0, 3.0, 3.0]),
  ('Square', [9.0]),
  ('Circle', [5.0]),
  ('Ellipse', [])]
I think that is a good stopping point. The final example (exception_ex_7) can be accessed in the course example repository.