Chapter 10 Error Handling

“When Mister Safety Catch Is Not On, Mister Crossbow Is Not Your Friend.”

— Terry Pratchett

We live in an imperfect world. People will give our programs options that aren’t supported or ask those programs to read files that don’t exist. Our code will also inevitably contain bugs, so we should plan from the start to catch and handle errors. In this chapter, we will explore how errors are represented in programs and what we should do with them. The Zipf’s Law project should now include:

zipf/
├── .gitignore
├── CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE.md
├── Makefile
├── README.md
├── bin
│   ├── book_summary.sh
│   ├── collate.py
│   ├── countwords.py
│   ├── plotcounts.py
│   ├── plotparams.yml
│   ├── script_template.py
│   └── utilities.py
├── data
│   ├── README.md
│   ├── dracula.txt
│   └── ...
└── results
    ├── dracula.csv
    ├── dracula.png
    └── ...

10.1 Exceptions

Most modern programming languages use exceptions for error handling. As the name suggests, an exception is a way to represent an exceptional or unusual occurrence that doesn’t fit neatly into the program’s expected operation. The code below uses exceptions to report attempts to divide by zero:

for denom in [-5, 0, 5]:
    try:
        result = 1/denom
        print(f'1/{denom} == {result}')
    except:
        print(f'Cannot divide by {denom}')
1/-5 == -0.2
Cannot divide by 0
1/5 == 0.2

try/except looks like if/else and works in a similar fashion. If nothing unexpected happens inside the try block, the except block isn’t run (Figure 10.1). If something goes wrong inside the try, on the other hand, the program jumps immediately to the except. This is why the print statement inside the try doesn’t run when denom is 0: as soon as Python tries to calculate 1/denom, it skips directly to the code under except.

Exception Control Flow

Figure 10.1: Exception Control Flow

We often want to know exactly what went wrong, so Python and other languages store information about the error in an object (which is also called an exception). We can catch an exception and inspect it as follows:

for denom in [-5, 0, 5]:
    try:
        result = 1/denom
        print(f'1/{denom} == {result}')
    except Exception as error:
        print(f'{denom} has no reciprocal: {error}')
1/-5 == -0.2
0 has no reciprocal: division by zero
1/5 == 0.2

We can use any variable name we like instead of error; Python will assign the exception object to that variable so that we can do things with it in the except block.

Python also allows us to specify what kind of exception we want to catch. For example, we can write code to handle out-of-range indexing and division by zero separately:

numbers = [-5, 0, 5]
for i in [0, 1, 2, 3]:
    try:
        denom = numbers[i]
        result = 1/denom
        print(f'1/{denom} == {result}')
    except IndexError as error:
        print(f'index {i} out of range')
    except ZeroDivisionError as error:
        print(f'{denom} has no reciprocal: {error}')
1/-5 == -0.2
0 has no reciprocal: division by zero
1/5 == 0.2
index 3 out of range

Exceptions are organized in a hierarchy: for example, FloatingPointError, OverflowError, and ZeroDivisionError are all special cases of ArithmeticError, so an except that catches the latter will catch all three of the former, but an except that catches an OverflowError won’t catch a ZeroDivisionError. The Python documentation describes all of the built-in exception types; in practice, the ones that people handle most often are:

  • ArithmeticError: something has gone wrong in a calculation.
  • IndexError and KeyError: something has gone wrong indexing a list or lookup something up in a dictionary.
  • OSError: thrown when a file is not found, the program doesn’t have permission to read it, and so on.

So where do exceptions come from? The answer is that programmers can raise them explicitly:

for number in [1, 0, -1]:
    try:
        if number < 0:
            raise ValueError(f'negative values not supported: {number}')
        print(number)
    except ValueError as error:
        print(f'exception: {error}')
1
0
exception: negative values not supported: -1

We can define our own exception types, and many libraries do, but the built-in types are enough to cover common cases.

One final note is that exceptions don’t have to be handled where they are raised. In fact, their greatest strength is that they allow long-range error handling. If an exception occurs inside a function and there is no except for it there, Python checks to see if whoever called the function is willing to handle the error. It keeps working its way up through the call stack until it finds a matching except. If there isn’t one, Python takes care of the exception itself. The example below relies on this: the second call to sum_reciprocals tries to divide by zero, but the exception is caught in the calling code rather than in the function.

def sum_reciprocals(values):
    result = 0
    for v in values:
        result += 1/v
    return result

numbers = [-1, 0, 1]
try:
    one_over = sum_reciprocals(numbers)
except ArithmeticError as error:
    print(f'Error trying to sum reciprocals: {error}')
Error trying to sum reciprocals: division by zero

This behavior is designed to support a pattern called “throw low, catch high”: write most of your code without exception handlers, since there’s nothing useful you can do in the middle of a small utility function, but put a few handlers in the uppermost functions of your program to catch and report all errors.

We can now go ahead and add error handling to our Zipf’s Law code. Some is already built in: for example, if we try to read a file that does not exist, the open function throws a FileNotFoundError:

python bin/collate.py results/none.csv results/dracula.csv
Traceback (most recent call last):
  File "bin/collate.py", line 27, in <module>
    main(args)
  File "bin/collate.py", line 17, in main
    with open(file_name, 'r') as reader:
FileNotFoundError: [Errno 2] No such file or directory: 'results/none.csv'

But what happens if we try to read a file that exists, but was not created by countwords.py?

$ python bin/collate.py Makefile
Traceback (most recent call last):
  File "bin/collate.py", line 27, in <module>
    main(args)
  File "bin/collate.py", line 18, in main
    update_counts(reader, word_counts)
  File "bin/collate.py", line 10, in update_counts
    for word, count in csv.reader(reader):
ValueError: not enough values to unpack (expected 2, got 1)

This error is hard to understand, even if we are familiar with the code’s internals. Our program should therefore check that the input files are CSV files, and if not, raise an error with a useful explanation to what went wrong. We could achieve this by wrapping the call to open in a try/except clause:

for file_name in args.infiles:
    try:
        with open(file_name, 'r') as reader:
            update_counts(reader, word_counts)
    except ValueError as e:
        print(f'{file_name} is not a CSV file.')
        print(f'ValueError: {e}')
$ python bin/collate.py Makefile
Makefile is not a CSV file.
ValueError: not enough values to unpack (expected 2, got 1)

This is definitely more informative than before. However, all ValueErrors that are raised when trying to open a file will result in this error message, including those raised when we actually do use a CSV file as input. A more precise approach in this case would be to throw an exception only if some other kind of file is specified as an input:

for file_name in args.infiles:
    if file_name[-4:] != '.csv':
        raise OSError(f'{file_name} is not a CSV file.')
    with open(file_name, 'r') as reader:
        update_counts(reader, word_counts)
$ python bin/collate.py Makefile
Traceback (most recent call last):
  File "bin/collate.py", line 29, in <module>
    main(args)
  File "bin/collate.py", line 18, in main
    raise OSError(f'{file_name} is not a valid CSV file of word counts.')
OSError: Makefile is not a CSV file.

This approach is still not perfect: we are checking that the file’s suffix is .csv instead of checking the content of the file and confirming that it is what we require. What we should do is check that there are two columns separated by a comma, that the first column contains strings, and that the second is numerical.

10.2 Kinds of Errors

The “if then raise” approach is sometimes referred to as “look before you leap”, while the try/except approach obeys the old adage that “it’s easier to ask for forgiveness than permission”. The first approach is more precise, but has the shortcoming that programmers can’t anticipate everything that can go wrong when running a program, so there should always be an except somewhere to deal with unexpected cases.

Generally speaking, we should distinguish between internal errors, such as calling a function with None instead of a list, and external errors, such as trying to read a file that doesn’t exist. Internal errors should be prevented by doing unit testing (Chapter 11), but software is always used in new ways in the real world, and those new ways can trigger unanticipated bugs. When an internal error occurs, the only thing we can do in most cases is report it and halt the program. If a function has been passed None instead of a valid list, for example, the odds are good that one of our data structures is corrupted. We can try to guess what the problem is and take corrective action, but our guess will often be wrong and our attempt to correct the problem might actually make things worse.

External errors, on the other hand, are usually caused by interactions between the program and the outside world: a user may mis-type a filename, the network might be down, and so on. Section 11.5 describes some ways to test that software will do the right thing when this happens, but we still need to catch and handle these errors when they arise. For example, if a user mis-types her password, prompting her to try again would be friendlier than requiring her to restart the program.

The one rule we should always follow is to check for errors as early as possible so that we don’t waste the user’s time. Few things are as frustrating as being told at the end of an hour-long calculation that the program doesn’t have permission to write to an output directory. It’s a little extra work to check things like this up front, but the larger your program or the longer it runs, the more useful those checks will be.

10.3 Writing Useful Error Messages

The error message shown in Figure 10.2 is not helpful:

An Unhelpful Error Message

Figure 10.2: An Unhelpful Error Message

Having collate.py print the message below would be equally unfriendly:

OSError: Something went wrong, try again.

This message doesn’t provide any information on what went wrong, so it is difficult to know what to change for next time. A slightly better message would be:

OSError: Unsupported file type.

This tells us the problem is with the type of file we’re trying to process, but it still doesn’t tell us what file types are supported, which means we have to rely on guesswork or read the source code. Telling the user “filename is not a CSV file” (as we did in the previous section) makes it clear that the program only works with CSV files, but since we don’t actually check the content of the file, this message could confuse someone who has comma-separated values saved in a .txt file. An even better message would therefore be:

OSError: The filename must end in `.csv`.

This message tells us exactly what the criteria are to avoid the error.

Error messages are often the first thing people read about a piece of software, so they should therefore be the most carefully written documentation for that software. A web search for “writing good error messages” turns up hundreds of hits, but recommendations are often more like gripes than guidelines and are usually not backed up by evidence. What research there is gives us the following rules (Becker et al. 2016):

  1. Tell the user what they did, not what the program did. Putting it another way, the message shouldn’t state the effect of the error, it should state the cause.

  2. Be spatially correct, i.e., point at the actual location of the error. Few things are as frustrating as being pointed at line 28 when the problem is really on line 35.

  3. Be as specific as possible without being or seeming wrong from a user’s point of view. For example, “file not found” is very different from “don’t have permissions to open file” or “file is empty”.

  4. Write for your audience’s level of understanding. For example, error messages should never use programming terms more advanced than those you would use to describe the code to the user.

  5. Do not blame the user, and do not use words like fatal, illegal, etc. The former can be frustrating—in many cases, “user error” actually isn’t—and the latter can make people worry that the program has damaged their data, their computer, or their reputation.

  6. Do not try to make the computer sound like a human being. In particular, avoid humor: very few jokes are funny on the dozenth re-telling, and most users are going to see error messages at least that often.

  7. Use a consistent vocabulary. This rule can be hard to enforce when error messages are written by several different people, but putting them all in one module makes review easier.

That last suggestion deserves a little elaboration. Most people write error messages directly in their code:

if file_name[-4:] != '.csv':
    raise OSError(f'{file_name}: The filename must end in `.csv`')

A better approach is to put all the error messages in a dictionary:

ERROR_MESSAGES = {
    'not_csv_file_suffix' : '{file_name}: The filename must end in `.csv`',
    'config_corrupted' : 'Configuration file {config_name} corrupted',
    # ...more error messages...
}

and then only use messages from that dictionary:

if file_name[-4:] != '.csv':
    raise OSError(ERROR_MESSAGES['not_csv_file_suffix'].format(file_name=file_name))

Doing this makes it much easier to ensure that messages are consistent. It also makes it much easier to give messages in the user’s preferred language:

ERROR_MESSAGES = {
    'en' : {
        'not_csv_file_suffix' : '{file_name}: The filename must end in `.csv`',
        'config_corrupted' : 'Configuration file {config_name} corrupted',
        # ...more error messages in English...
    },
    'fr' : {
        'not_csv_file_suffix' : '{file_name}: Le nom du fichier doit se terminer par `.csv`',
        'config_corrupted' : f'Fichier de configuration {config_name} corrompu',
        # ...more error messages in French...
    }
    # ...other languages...
}

The error report is then looked up and formatted as:

ERROR_MESSAGES[user_language]['not_csv_file_suffix'].format(file_name=file_name)

where user_language is a two-letter code for the user’s preferred language.

10.4 Reporting Errors

Programs should report things that go wrong; they should also sometimes report things that go right so that people can monitor their progress. Adding print statements is a common approach, but removing them or commenting them out when the code goes into production is tedious and error-prone.

A better approach is to use a logging framework, such as Python’s logging library. This lets us leave debugging statements in our code and turn them on or off at will. It also lets us send output to any of several destinations, which is helpful when our data analysis pipeline has several stages and we are trying to figure out which one contains a bug.

To understand how logging frameworks work, suppose we want to turn print statements in our collate.py program on or off without editing the program’s source code. We would probably wind up with code like this:

if LOG_LEVEL >= 0:
    print('Processing files...')
for file_name in args.infiles:
    if LOG_LEVEL >= 1:
        print(f'Reading in {file_name}...')
    if file_name[-4:] != '.csv':
        raise OSError(ERROR_MESSAGES['not_csv_file_suffix'].format(file_name=file_name))
    with open(file_name, 'r') as reader:
        if LOG_LEVEL >= 1:
            print(f'Computing word counts...')
        update_counts(reader, word_counts)

LOG_LEVEL acts as a threshold: any debugging output at a lower level than its value isn’t printed. As a result, the first log message will always be printed, but the other two only in case the user has requested more details by setting LOG_LEVEL higher than zero.

A logging framework combines the if and print statements in a single function call and defines standard names for the logging levels. In order of increasing severity, the usual levels are:

  • DEBUG: very detailed information used for localizing errors.
  • INFO: confirmation that things are working as expected.
  • WARNING: something unexpected happened, but the program will keep going.
  • ERROR: something has gone badly wrong, but the program hasn’t hurt anything.
  • CRITICAL: potential loss of data, security breach, etc.

Each of these has a corresponding function: we can use logging.debug, logging.info, etc. to write messages at these levels. By default, only WARNING and above are displayed; messages appear on standard error so that the flow of data in pipes isn’t affected. The logging framework also displays the source of the message, which is called root by default. Thus, if we run the small program shown below, only the warning message appears:

import logging


logging.warning('This is a warning.')
logging.info('This is just for information.')
WARNING:root:This is a warning.

Rewriting the collate.py example above using logging yields code that is less cluttered:

import logging


logging.info('Processing files...')
for file_name in args.infiles:
    logging.debug(f'Reading in {file_name}...')
    if file_name[-4:] != '.csv':
        raise OSError(ERROR_MESSAGES['not_csv_file_suffix'].format(file_name=file_name))
    with open(file_name, 'r') as reader:
        logging.debug('Computing word counts...')
        update_counts(reader, word_counts)

We can also configure logging to send messages to a file instead of standard error using logging.basicConfig. (This has to be done before we make any logging calls—it’s not retroactive.) We can also use that function to set the logging level: everything at or above the specified level is displayed.

import logging


logging.basicConfig(level=logging.DEBUG, filename='logging.log')

logging.debug('This is for debugging.')
logging.info('This is just for information.')
logging.warning('This is a warning.')
logging.error('Something went wrong.')
logging.critical('Something went seriously wrong.')
DEBUG:root:This is for debugging.
INFO:root:This is just for information.
WARNING:root:This is a warning.
ERROR:root:Something went wrong.
CRITICAL:root:Something went seriously wrong.

By default, basicConfig re-opens the file we specify in append mode; we can use filemode='w' to overwrite the existing log data. Overwriting is useful during debugging, but we should think twice before doing in production, since the information we throw away often turns out to be exactly what we need to find a bug.

Many programs allow users to specify logging levels and log file names as command-line parameters. At its simplest, this is a single flag -v or --verbose that changes the logging level from WARNING (the default) to DEBUG (the noisiest level). There may also be a corresponding flag -q or --quiet that changes the level to ERROR, and a flag -l or --logfile that specifies a log file name. To log messages to a file while also printing them, we can tell logging to use two handlers simultaneously:

import logging


logging.basicConfig(
    level=logging.DEBUG,
    handlers=[
        logging.FileHandler("logging.log"),
        logging.StreamHandler()])

logging.debug('This is for debugging.')

The string 'This is for debugging' is both printed to standard error and appended to logging.log.

Libraries like logging can send messages to many destinations; in production, we might send them to a centralized logging server that collates logs from many different systems. We might also use rotating files so that the system always has messages from the last few hours but doesn’t fill up the disk. We don’t need any of these when we start, but the data engineers and system administrators who eventually have to install and maintain your programs will be very grateful if we use logging instead of print statements, because it allows them to set things up the way they want with very little work.

Logging Configuration

Chapter 9 explained why and how to save the configuration that produced a particular result. We clearly also want this information in the log, so we have three options:

  1. Write the configuration values into the log one at a time.

  2. Save the configuration as a single record in the log (e.g., as a single entry containing JSON).

  3. Write the configuration to a separate file and save the filename in the log.

Option 1 usually means writing a lot of extra code to reassemble the configuration. Option 2 also often requires us to write extra code (since we need to be able to save and restore configurations as JSON as well as in whatever format we normally use), so on balance we recommend option 3.

10.5 Summary

Most programmers spend as much time debugging as they do writing new code, but most courses and textbooks only show working code, and never discuss how to prevent, diagnose, report, and handle errors. Raising our own exceptions instead of using the system’s, writing useful error messages, and logging problems systematically can save us and our users a lot of needless work.

10.6 Exercises

This chapter suggested several edits to collate.py, such that the script now reads:

"""Combine multiple word count CSV-files into a single cumulative count."""
import csv
import argparse
from collections import Counter
import logging
import utilities


ERROR_MESSAGES = {
    'not_csv_file_suffix' : '{file_name}: The filename must end in `.csv`',
}

def update_counts(reader, word_counts):
    """Update word counts with data from another reader/file."""
    for word, count in csv.reader(reader):
        word_counts[word] += int(count)

def main(args):
    """Run the command line program."""
    word_counts = Counter()
    logging.info('Processing files...')
    for file_name in args.infiles:
        logging.debug(f'Reading in {file_name}...')
        if file_name[-4:] != '.csv':
            raise OSError(ERROR_MESSAGES['not_csv_file_suffix'].format(file_name=file_name))
        with open(file_name, 'r') as reader:
            logging.debug('Computing word counts...')
            update_counts(reader, word_counts)
    utilities.collection_to_csv(word_counts, num=args.num)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('infiles', type=str, nargs='*', help='Input file names')
    parser.add_argument('-n', '--num', type=int, default=None,
                        help='Limit output to N most frequent words')
    args = parser.parse_args()
    main(args)

Some of the following exercises will ask you to make further edits to collate.py.

10.6.1 Set the logging level

Define a new command line flag for collate.py called --verbose (or -v) that changes the logging level from WARNING (the default) to DEBUG (the noisiest level).

Hint: the following command changes the logging level to DEBUG:

logging.basicConfig(level=logging.DEBUG)

Once finished, running collate.py with and without the -v flag should produce the following output:

$ python bin/collate.py results/dracula.csv results/moby_dick.csv -n 5
the,22559
and,12306
of,10446
to,9192
a,7629
$ python bin/collate.py results/dracula.csv results/moby_dick.csv -n 5 -v
INFO:root:Processing files...
DEBUG:root:Reading in results/dracula.csv...
DEBUG:root:Computing word counts...
DEBUG:root:Reading in results/moby_dick.csv...
DEBUG:root:Computing word counts...
the,22559
and,12306
of,10446
to,9192
a,7629

10.6.2 Send the logging output to file

In Exercise 10.6.1, logging information is printed to the screen when the verbose flag is activated. This is problematic if we want to re-direct the output from collate.py to a CSV file, because the logging information will appear in the CSV file as well as the words and their counts.

  1. Edit collate.py so that the logging information is sent to a log file called collate.log instead. (HINT: logging.basicConfig has an argument called filename.)

  2. Create a new command line option -l or --logfile so that the user can specify a different name for the log file if they don’t like the default name of collate.log.

10.6.3 Handling exceptions

  1. Modify the script collate.py so that it catches any exceptions that are raised when it tries to open files and records them in the log file. When you are finished, the program should collate all the files it can rather than halting as soon as it encounters a problem.
  2. Modify your first solution to handle nonexistent files and permission problems separately.

10.6.4 Error catalogs

In Section 10.3 we started to define an error catalog called ERROR_MESSAGES.

  1. Read Appendix K.1 and explain why we have used capital letters for the name of the catalog.
  2. Python has three ways to format strings: the % operator, the str.format method, and f-strings (where the “f” stands for “format”). Look up the documentation for each and explain why we have to use str.format rather than f-strings for formatting error messages in our catalog/lookup table.
  3. There’s a good chance we will eventually want to use the error messages we’ve defined in other scripts besides collate.py. To avoid duplication, move ERROR_MESSAGES to the utilities module that was first created in Section 4.8.

10.6.5 Tracebacks

Run the following code:

try:
    1/0
except Exception as e:
    help(e.__traceback__)
  1. What kind of object is e.__traceback__?
  2. What useful information can you get from it?

10.7 Key Points

  • Signal errors by raising exceptions.
  • Use try/except blocks to catch and handle exceptions.
  • Python organizes its standard exceptions in a hierarchy so that programs can catch and handle them selectively.
  • “Throw low, catch high”, i.e., raise exceptions immediately but handle them at a higher level.
  • Write error messages that help users figure out what to do to fix the problem.
  • Store error messages in a lookup table to ensure consistency.
  • Use a logging framework instead of print statements to report program activity.
  • Separate logging messages into DEBUG, INFO, WARNING, ERROR, and CRITICAL levels.
  • Use logging.basicConfig to define basic logging parameters.