2020-05-20

Why I still write low-level tests

I've seen a few places online where people recommend against writing "unit tests". [1 2 3 4]

At first blush that seems like a strange recommendation: writing tests is a pillar of modern software engineering, isn't it? But when you read on you find that they have various well-thought-out, if rather more specific, recommendations.1

To take the TDD argument for instance: you are working to meet a set of predefined tests that tell you whether your program is working. In other words, the tests that are driving the design are (by their nature) acceptance tests. Such tests (should!) specify the user's view of what the program does, but not the low-level implementation details. Testing the implementation details locks you into a particular implementation, and you don't want that.

And people have other good reasons for their advice, too. Traditional low-level tests give you very little coverage per test, don't test the interactions of components, and don't mimic user behavior. Put into less inflammatory language, the recommendations are more like
  • Don't lock yourself into a particular implementation with your tests.
  • Write tests with non-trivial coverage.
  • Write tests that mirror actual use patterns.
All of which effectively push against testing low-level code units.


Why I'm not going to stop writing low-level tests


I am employed as a scientific programmer, which means that the codes I work on tend to have fiddly requirements for reproducibility and precision in floating point calculations, tend to execute moderately complex decision making,2 and are often pretty deeply layered. None of that is ideal, but the domain drives codes in that direction.
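
To make that concrete, here is a minimal sketch (Python with pytest; nothing here comes from a real project) of the kind of low-level numerical test I mean. The kahan_sum routine and the tolerances are hypothetical stand-ins for whatever domain utility and accuracy requirement you actually have.

    import pytest

    def kahan_sum(values):
        """Compensated summation: loses less to round-off than a naive sum."""
        total = 0.0
        compensation = 0.0
        for x in values:
            y = x - compensation
            t = total + y
            compensation = (t - total) - y
            total = t
        return total

    def test_easy_input_matches_the_exact_answer():
        assert kahan_sum([1.0, 2.0, 3.0]) == pytest.approx(6.0)

    def test_precision_on_ill_conditioned_input():
        # Many tiny terms added to one large one: a naive sum drops most of
        # them to round-off, a compensated sum should not.
        values = [1.0e8] + [1.0e-8] * 100_000
        assert kahan_sum(values) == pytest.approx(1.0e8 + 1.0e-3, rel=1e-15)

    def test_empty_input_sums_to_zero():
        assert kahan_sum([]) == 0.0

None of this is clever; the point is that the precision requirement is written down somewhere a later change can trip over it.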

So, a common enough scenario in my end of the business is
  1. Write a set of utility routines handling a domain specific computation.
  2. Start using them to solve parts of your main problem. End up with several different bits of code that depend on the utility code and seem to work.
  3. Sometime later, make a change in one of the upper layers and a bug emerges.
  4. Waste a lot of time looking at the new code without luck.
  5. Eventually trace the issue to a poorly handled corner or edge case in the utility code.
  6. Bang head on desk.
Much of this pain can be avoided if you can reason about the high-level code knowing that the low-level code does what it says on the can. Which you accomplish by having a pretty exhaustive set of tests for the utility code.
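
A sketch of what "pretty exhaustive" means here (again Python and pytest; bracket_index is a made-up stand-in for a real utility routine): the tests deliberately poke at the corners where step 5 tends to bite, namely the ends of the range, values exactly on a boundary, and degenerate input.

    import pytest

    def bracket_index(grid, x):
        """Return i such that grid[i] <= x <= grid[i + 1] for a sorted grid."""
        if len(grid) < 2:
            raise ValueError("grid needs at least two points")
        if x < grid[0] or x > grid[-1]:
            raise ValueError(f"{x} is outside [{grid[0]}, {grid[-1]}]")
        for i in range(len(grid) - 1):
            if grid[i] <= x <= grid[i + 1]:
                return i

    GRID = [0.0, 1.0, 2.5, 4.0]

    def test_interior_point():
        assert bracket_index(GRID, 1.7) == 1

    def test_value_exactly_on_a_grid_point():
        assert bracket_index(GRID, 2.5) in (1, 2)  # either bracket is acceptable

    def test_endpoints_count_as_inside():
        assert bracket_index(GRID, 0.0) == 0
        assert bracket_index(GRID, 4.0) == len(GRID) - 2

    def test_values_outside_the_grid_raise():
        with pytest.raises(ValueError):
            bracket_index(GRID, -0.1)
        with pytest.raises(ValueError):
            bracket_index(GRID, 4.1)

    def test_degenerate_grid_is_rejected():
        with pytest.raises(ValueError):
            bracket_index([1.0], 1.0)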


Re-framing


As above, I'm going to dig into the arguments related to TDD as an example.

In the framing of some TDD proponents, that would be testing an "implementation detail" and therefore a bad thing. I assert that the problem isn't testing low-level details; it's treating the tests of low-level details as having the same immutable character as the acceptance tests (that is, treating them as enforcing a particular implementation instead of understanding that they test a particular implementation and will be replaced if you change the implementation).

We can frame this in a couple of ways:
  1. We can maintain a categorical separation between acceptance tests, which state what we have to accomplish, and implementation tests, which provide assurance that the-way-we-are-doing-it-right-now is working. The former are non-negotiable; the latter are fungible and exist only as long as the related implementation exists.
  2. We can conceive of each sub-module as its own TDD project with its own tests, but be prepared to swap underlying modules if we want to (which means swapping the tests along the way, because the tests go with the module).
Fundamentally both points of view accomplish the same thing. Which framing is most appropriate depends a bit on the structure of your code and how your build system works.3

Either way I end up with two kinds of tests.
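
One way to keep that separation honest in practice is to label the two kinds explicitly so the build system can tell them apart. A minimal sketch, assuming pytest; the marker names are my own convention, not anything built in.

    # conftest.py
    def pytest_configure(config):
        # "acceptance" and "implementation" are labels invented for this
        # sketch; registering them keeps pytest from warning about unknown
        # markers.
        config.addinivalue_line(
            "markers",
            "acceptance: non-negotiable; states what the program must do")
        config.addinivalue_line(
            "markers",
            "implementation: fungible; tied to the current implementation")

With that in place, pytest -m acceptance runs only the non-negotiable tests, and when a module is replaced its @pytest.mark.implementation tests can be deleted along with it without touching anything the clients care about.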

And when I look at the other arguments I end up with the same conclusions:
I still need my low-level tests, I just need some other tests as well.

And we have names for them. I've already used "acceptance tests", which are typically end-to-end or nearly so, generally have relatively large coverage per test, and often mimic actual use patterns. "Integration tests" check that components play well together and test a lot more code in one go than traditional "unit" tests. "Regression tests" help you audit changes in your user-visible output, tend to have high coverage, and are generally driven by use cases.
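
For the flavor of a regression test in this setting, here is a sketch. The run_case entry point, the input deck, and the baseline file are all hypothetical; the baseline would be output captured from a previous run that the users signed off on.

    import json

    import pytest

    from mycode import run_case  # hypothetical driver for a realistic use case

    def test_standard_case_matches_baseline():
        result = run_case("inputs/standard_case.ini")  # hypothetical input deck
        with open("baselines/standard_case.json") as f:
            baseline = json.load(f)
        # Exact equality is usually too strict for floating-point output;
        # compare field by field with a tight tolerance instead.
        for key, expected in baseline.items():
            assert result[key] == pytest.approx(expected, rel=1e-12)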

In any case "don't write low-level tests" is the wrong lesson. The lesson you should take here is that tests come in multiple kinds with multiple purposes. Some tests are for the clients, some are for the developers, and some are for both. You need to figure out which are appropriate for your project and include all that qualify.


1 The authors can be forgiven for phrasing the title in so jarring a way. We all hate click-bait but we do want to draw the audience in.

2 Really complex algorithms are usually limited to a few places and almost always exist in libraries rather than being written bespoke for each project.

3 I suppose that if your Process-with-a-capital-P is Test-Driven Development you want the build system to enforce the distinction for you, so that you can't just wipe away an acceptance test thinking it belongs to an implementation detail.
