Testing Python Code

You think of an idea, you write code for it, and the code works as intended — right?

Nope. The world doesn’t work like that.

That should be enough motivation for testing your code, so let’s get right to it.

In this post, we finally get to some actual code! This is available at https://github.com/sea-bass/python-testing-ci.


Software Testing Crash Course

Code verification activities can broadly be broken down into two sets of two categories.

  • Manual or Interactive testing: A human inspects or runs the code
  • Automated testing: A human writes code to automatically test the code (inception?)

and

  • Static analysis: Analyzing code without running it
  • Dynamic analysis: Running code and then analyzing the results

Static analysis in its simplest form involves checking files against coding standards — you might see this being called linting in many places, including later in this post. There is also a whole field of formal verification which seeks to systematically analyze code to prove the absence or presence of errors.

Dynamic analysis can provide insight on many questions, such as:

  • Did the code build correctly and run without errors?
  • Is the code functionally correct? That is, did it provide the results you expected based on the design specification?
  • How did the code perform? This usually involves comparing to a baseline run and tracking changes in execution speed, memory usage, etc.

Functional testing is the most common form of dynamic analysis, because the most important thing is making sure the code does what it’s supposed to. It’s almost always — if not always — preferable for your code to be inefficient or to fail with errors than to give incorrect results.

Sometimes a change that seems harmless, or even an improvement, can affect speed, resource usage, and/or the actual results — in other words, it introduces a bug. So, it’s useful to run your code through examples that best represent the problems you expect your tools to handle. As you tweak your code base, you can compare against previous test runs to see if anything is slipping. This is known as regression testing.

Here are some final takeaways before we get into code:

  1. Tests are only as good as the test cases you write. You could easily miss important issues if you are not testing the right things.
  2. No single approach is perfect. Often, the best way to test your code is through some judicious combination of all the methods above.

Unit Testing

For the rest of this post, I will use a simple example to demonstrate testing. Our code will use NumPy version 1.17.0 (for a purely contrived reason) to add two matrices. Surprisingly, you can stretch this to show a lot of interesting things.

Popular tools for unit testing in Python include unittest, pytest, and nose. In this post, we will be using pytest with a few extra plugins as you will soon see.
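
If you want to follow along, all of the tools used in this post are available from PyPI. The exact list below is just a convenience suggestion; the sample repository may specify its own requirements.

# Pinned NumPy version matches the one used in this post
pip install numpy==1.17.0 pytest pytest-html pytest-cov flake8 flake8-html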

Creating Simple Unit Tests

Let’s say we decide on two tests:

  1. Is the recommended NumPy version installed?
  2. Does matrix addition produce correct results?

Let’s create a file named test_matrix_operations_basic.py that tests the above. One of the simplest ways to write tests is with assertions. These are Boolean expressions that, if false, will raise an AssertionError.

import numpy as np

def test_numpy_version():
    """ Checks for the correct NumPy version as per specification """
    np_ver = np.__version__
    assert(np_ver == "1.17.0")

def test_addition():
    """ Tests the addition of 2 matrices """
    a = np.array([[0, 1, 2],
                  [3, 4, 5],
                  [6, 7, 8]])
    b = np.array([[8, 7, 6],
                  [5, 4, 3],
                  [2, 1, 0]])
    expected = np.array([[8, 8, 8],
                         [8, 8, 8],
                         [8, 8, 8]])
    actual = a + b
    assert((expected == actual).all())

You can run this file as-is, but with pytest you get a few convenient behaviors, such as

  • Automatically discovering test files — by default, pytest searches every Python file under the current folder whose name starts with test_ or ends with _test, but there is a whole set of discovery rules.
  • Providing detailed information on failed assertions — both by displaying them in the command line and, as we’ll see, by generating reports on test results.

Now go to your shell and type pytest -v (for verbose, just to make things easier to illustrate). You should see results like the following. Yay — the tests passed!

Great success!

Parametrizing Tests

We verified that we could correctly add two matrices, but is there any indication that this will work for any two matrices we pass in? Spoiler alert: It won’t.

Let’s say we want to try more sets of matrices. The naive way would be to copy-paste the test we wrote and change the input and expected output definition for each copy. As we know, copy-pasting the same code in multiple places is a top recipe for software disaster, so let’s avoid that by parametrizing our tests.

pytest provides a handful of decorators to help with typical testing activities like this one, and in our case we will use @pytest.mark.parametrize. Let’s make a new test file named test_matrix_operations_param.py and set it up to add 2 carefully selected pairs of matrices.

Notice that once you start using pytest-specific functionality like this, you also have to import pytest in your file.

import pytest
import numpy as np

# Test Case 0
a0 = np.array([[0, 1, 2],
               [3, 4, 5],
               [6, 7, 8]])
b0 = np.array([[8, 7, 6],
               [5, 4, 3],
               [2, 1, 0]])
expected0 = np.array([[8, 8, 8],
                      [8, 8, 8],
                      [8, 8, 8]])

# Test Case 1
a1 = np.array([[0, 1, 2],
               [3, 4, 5],
               [6, 7, 8]])
b1 = np.array([[3*0.1, 2,  3],
               [0,     0,  0],
               [-1,   -2, -3]])
expected1 = np.array([[0.3, 3, 5],
                      [3,   4, 5],
                      [5,   5, 5]])

@pytest.mark.parametrize("a,b,expected",[(a0,b0,expected0),(a1,b1,expected1)])
def test_addition(a,b,expected):
    """ Tests the addition of 2 matrices by exact comparison """
    actual = a + b
    assert((expected == actual).all())

oh no 🙁

Dealing with Failure

… and our tests above failed. It seems that comparing the sum of the second pair of matrices against a seemingly correct expected output somehow did not work.

Is it bad code, or a bad test?

In this case, it’s actually a bad test! With floating-point arithmetic, small numerical errors can cause an exact comparison of two numbers to fail even when it seemingly shouldn’t. That is what’s happening with the top-left matrix value, where the test input computes it as 3*0.1 but the expected output defines it as 0.3… which should be equal.

Taking a quick detour into Python, we can show this problem is indeed numerical error:

>>> 0.1 + 0.1 + 0.1 == 0.3
False
>>> (0.1 + 0.1 + 0.1) - 0.3
5.551115123125783e-17

To handle this common situation, NumPy provides an allclose function to check whether two matrices are close to each other within tolerance. We can rewrite our test function as follows and the tests will pass.

@pytest.mark.parametrize("a,b,expected",[(a0,b0,expected0),(a1,b1,expected1)])
def test_addition(a,b,expected):
    """
    Tests the addition of 2 matrices by checking if they are close within some tolerance
    """
    actual = a + b
    assert(np.allclose(actual,expected, rtol=1e-05, atol=1e-08))

I’ll let you try this for yourself, but the sample code provides both implementations and marks the exact comparison test with the @pytest.mark.skip decorator so it gets skipped. I recommend looking at the documentation for other ways to handle tests that you know won’t work (besides deleting them).
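
For instance, besides @pytest.mark.skip, pytest also provides @pytest.mark.xfail for tests that are expected to fail. Here is a minimal sketch; the test bodies are placeholders rather than code from the sample repository.

import pytest

@pytest.mark.skip(reason="Not using exact matrix comparison")
def test_addition_exact():
    # This test is not run at all and shows up as "skipped" in the report
    pass

@pytest.mark.xfail(reason="Exact floating-point comparison is known to be unreliable")
def test_addition_exact_expected_failure():
    # This test is run, but its failure is reported as "xfailed" rather than "failed"
    assert 0.1 + 0.1 + 0.1 == 0.3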

Towards More Realistic Testing

In a more realistic code base, you (hopefully) won’t be writing pointless tests for built-in functionality in other packages like adding two matrices. So let’s consider the following differences as a tiny step towards the real world.

  1. Your code is now inside a package named matrix_tools, with a module named basic_utils that contains our functions for checking NumPy version and adding matrices.
  2. The new and improved matrix addition function now has input validation code that checks whether you’ve passed in 2-D matrices that have the same shape (a sketch of such a module follows this list).
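
Here is a minimal sketch of what matrix_tools/basic_utils.py might contain, reconstructed from how the functions are used in the tests below; the actual implementation in the sample repository may differ.

import numpy as np

class SizeError(Exception):
    """ Custom exception for invalid or mismatched matrix shapes """
    pass

def get_numpy_version():
    """ Returns the installed NumPy version string """
    return np.__version__

def add_matrices(a, b):
    """ Adds two 2-D matrices after validating their shapes """
    if a.ndim != 2 or b.ndim != 2:
        raise SizeError("Both inputs must be 2-D matrices")
    if a.shape != b.shape:
        raise SizeError("Input matrices must have the same shape")
    return a + b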

An example is below, which you can find in the sample code as test_matrix_tools.py. Here I’ve decided to implement the tests inside a class for the typical benefits of object-oriented programming, such as grouping related tests and sharing data between them. pytest will automatically discover the class (its name starts with Test) and its test methods — in our case, all the methods that start with test_.

Notice the new test added at the end, which intentionally passes in invalid input (a 2-by-2 and a 3-by-3 matrix) and uses pytest.raises() to check that our function raises an exception of type SizeError, a custom exception type in our module.

import pytest
import numpy as np
from matrix_tools.basic_utils import get_numpy_version, add_matrices, SizeError

class TestMatrixTools:

    # Define inputs and expected outputs
    # SKIPPING FOR BREVITY ...

    # Define test methods
    def test_numpy_version(self):
        """
        Checks for the correct NumPy version as per specification
        """
        np_ver = get_numpy_version()
        assert(np_ver == "1.17.0")
        print("Correct NumPy version found: " + np_ver)
        

    @pytest.mark.skip(reason="Not using exact matrix comparison")
    @pytest.mark.parametrize("a,b,expected",[(a0,b0,expected0),(a1,b1,expected1)])
    def test_addition_exact(self,a,b,expected):
        """
        Tests the addition of 2 matrices by exact comparison
        """
        actual = add_matrices(a,b)
        assert((expected == actual).all())
        print("Matrices are exactly equal")


    @pytest.mark.parametrize("a,b,expected",[(a0,b0,expected0),(a1,b1,expected1)])
    def test_addition_close(self,a,b,expected):
        """
        Tests the addition of 2 matrices by checking if they are close within some tolerance
        """
        actual = add_matrices(a,b)
        assert(np.allclose(actual,expected,rtol=1e-05,atol=1e-08))
        print("Matrices are equal within specified tolerance")


    def test_shape_mismatch(self):
        """
        Tests the shape mismatch exception handling
        """
        # Define test inputs with mismatching shapes
        a = np.array([[1, 2],
                      [3, 4]])
        b = np.array([[0, 9, 8],
                      [7, 6, 5],
                      [4, 3, 2]])

        # Run the test expecting a SizeError
        with pytest.raises(SizeError):
            output = add_matrices(a,b)
 

Let’s take a quick breather …


Generating Reports and Saving Options

Viewing the test results as standard output is fine for smaller code bases like ours so far, but it quickly becomes evident why you may want to write test results to a file that is more than a plain text dump.

Conveniently, there is a pytest-html package for exactly this. Once you’ve installed it, the pytest command gets supplemented with a few extra options. Let’s crank up the verbosity and write to a self-contained HTML file.

pytest -vv --html=latest_test_results.html --self-contained-html

Alternatively, if you plan to use the same options every time, you can stick them all into a file named pytest.ini. This way, you only need to type pytest and the options will be picked up from there automatically. To use the same options as above, our file would look like this:

[pytest]
addopts = -vv --html=latest_test_results.html --self-contained-html

You can then open the generated HTML report — in our case, latest_test_results.html — in your favorite Web browser and you’ll see an interactive document like this.

Isn’t this much better to look at?

Collecting Test Coverage

Another topic I find interesting is measuring code coverage. A generic definition of coverage is: did your tests exercise all the code that was written? If so, we say the tests achieved full coverage. There are many measures of coverage, and this page from Guru99 serves as a nice introduction.

There are Python tools for measuring test coverage, including the pytest-cov package. This is a fairly thin wrapper around the coverage package that plays well with pytest. By default, both of these tools only collect statement (or line) coverage, though they can also be configured for branch coverage.

You can add coverage collection and reporting options either when running pytest or in your pytest.ini file. For example:

[pytest]
addopts = -vv --html=latest_test_results.html --self-contained-html
              --cov=matrix_tools --cov-config=.coveragerc --cov-report html:coverage_results

Notice we are using a separate configuration file called .coveragerc where we are omitting the tests folder (why get coverage for the tests themselves?) and adding branch coverage to our collected output.

[run]
omit = tests/*
branch = True

Again, you can look at the HTML report generated for coverage. There is no self-contained option here, so go to the folder where the files were generated (in our case, coverage_results), find the index.html file, and navigate from there.

Notice that we do not have full coverage, because we tested the case where the matrices are 2-D but have mismatched shapes, but not the cases where either of the individual matrices is not 2-D. As homework, add tests for that and you should see full test coverage!
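
If you want a head start on that homework, a sketch of such tests might look like the following. This assumes add_matrices raises SizeError for non-2-D inputs as well, which you should verify against the actual implementation.

import pytest
import numpy as np
from matrix_tools.basic_utils import add_matrices, SizeError

def test_first_input_not_2d():
    """ Tests that a 1-D first input raises a SizeError """
    a = np.array([1, 2, 3])
    b = np.eye(3)
    with pytest.raises(SizeError):
        add_matrices(a, b)

def test_second_input_not_2d():
    """ Tests that a 1-D second input raises a SizeError """
    a = np.eye(3)
    b = np.array([1, 2, 3])
    with pytest.raises(SizeError):
        add_matrices(a, b)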


Linting Your Code

Now that we’ve built up to writing our own module for adding matrices, it’s time to put our source code through some static checks.

pylint and Flake8 are the two most popular linting tools in Python. You can learn more at this awesome post by luminousmen.

For this post I will use Flake8, which partly gets its name from the fact that it checks against the PEP 8 style guide for Python code. You can also install flake8-html to generate HTML reports (see a pattern here?).

The commands to lint your code would look as follows.

# Prints to shell
flake8 matrix_tools

# Generates HTML report (requires flake8-html)
flake8 --format=html --htmldir=flake-report matrix_tools

# In the command above, the report gets saved to the flake-report folder

If we look at the output, we find some actionable coding-standard fixes for our source file matrix_tools/basic_utils.py. Fix them if you’d like, run the linter again, and check the new results.

The generated HTML report from Flake8.
Pedantic? Yes. Useful? Absolutely.
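
If some of the reported checks don’t match your own conventions, Flake8 can also be configured through a [flake8] section in a setup.cfg, tox.ini, or .flake8 file. The values below are arbitrary examples, not what the sample repository uses:

[flake8]
max-line-length = 100
exclude = .git,__pycache__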

Summary — What Next?

The code in this post is available at https://github.com/sea-bass/python-testing-ci.

Here we only did what is known as unit testing — that is, testing one function at a time — mostly because we only really wrote one function! However, real software usually has a whole system architecture of components that depend on each other. You will often see terms like integration testing, system testing, and acceptance testing as you go up the chain towards verifying your whole project. This post from Segue Technologies puts it nicely.

Finally, there is the big topic of continuous integration — in other words, can you automate a process wherein every time you change the code (or at some regular schedule), your tests are run automatically so you can analyze results and find issues before it’s too late? Expect a post on continuous integration soon!
