# Testing, or not
Software testing is one of those things that nobody can seem to agree on: everything
from what to call different scopes of tests (what *is* a "unit" anyways?) to whether to
do it in the first place. Fortunately, consensus has *largely* formed that writing and
maintaining tests is valuable for most applications, even if plenty of opinions still
differ on the details. As I've gotten further into my career, my focus and opinions on
what and how to test have changed. This post serves as a brief summary of my approach
and methodology to software testing. It is **not** intended to be prescriptive of what
others should do; I likely don't work on the same kinds of projects as you do, and my
methods may not translate to every environment. This gets to the core of my
methodology:
## It's a tool, not a ceremony, but ceremony is ok
Software development is full of ceremony. Some of it might even be valuable, especially
for junior engineers who haven't had the time to develop the *sense* for what is
appropriate and when. Testing is a very good example of this. When I was a wee junior
engineer, I tested nearly everything: every function, every branch, and oftentimes even
larger, *scarier* **"integration tests"**. I didn't know *what* was worth testing, and my
lack of experience meant I was both more likely to make dumb mistakes (I still am, of
course) and unsure of *why* tests were valuable in the first place. I was just told that
they were, and it made logical sense after all. Tests ensure code is correct! ... right?
*Ceremony is the practice of the naive, but not without value.*
## I shackled myself to a veneer of correctness for fear that I might blunder
After writing thousands and thousands of lines of tests, I got burned one too many times.
One Friday evening I was 7 hours deep into a high-priority fix; production had
hit a snag on some poorly formatted data, and the team who primarily used the application
was waiting on a fix. Well past when I would normally call it quits for the night, I had
fixed the core issue, but there was one problem: over 80 tests needed changes before the
CI pipeline would pass and the change could meet the requirements for review and
approval. These tests were not providing any value; in fact, they were providing
*negative* value. They were creating a barrier to progress. Now, most of the issues were
small things like adjusting a type name or a function signature, not massive overhauls
to what the tests were validating, though some required more invasive changes.
Ultimately, many of those tests were thrown out, not that night, but later on after a
few more cases like this.
The problem with an *overly* comprehensive test suite, among a suite of other issues
like CI pipeline times asymptotically approaching infinity, is that much of it does not
really test anything important. It's not that those tests weren't testing something, or
that they were necessarily incorrect, but the logic they covered was not critical to the
behavior of the system, and they added *drag* to further changes in those modules.
*Tests are not a good fix for a poorly designed system.*
Sometimes, there are better tools for the job, such as static type systems and a little
defensive programming.
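As a rough illustration of what I mean (the incident itself isn't reproduced here; the record shape and names below are hypothetical), validating untrusted data at the boundary with an explicit, typed structure can catch the "poorly formatted data" class of failure before any test suite gets involved:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Typed shape of one incoming row (hypothetical fields)."""
    user_id: int
    amount_cents: int


def parse_record(raw: dict) -> Record:
    """Validate untrusted input where it enters the system.

    Malformed data fails loudly here, with context, instead of propagating
    and blowing up somewhere far from the source on a Friday evening.
    """
    try:
        return Record(
            user_id=int(raw["user_id"]),
            amount_cents=int(raw["amount_cents"]),
        )
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"malformed record: {raw!r}") from exc
```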
## Who's going to test the tests?
Years later I was working with a large Python code base. This code base was well tested,
enough that we had beaten our CI pipeline's execution time back down to under 15 minutes
for the second time. Python, being a much more... *dynamic* language, meant we had to
rely heavily on testing to validate that the code would even run. Ultimately, this
worked well enough, but the maintenance burden was intense. If there is any single
lesson I learned from this, it's that dynamic languages are the wrong choice for most
large, complex systems. The other lesson is that complicated tests are more likely not
to be testing what they claim to, or not to be validating anything at all.
Around this time I was fixing a failing test for an internal command-line utility. This
utility was *not* simple, but it had a larger test suite than any command line utility I
have ever seen. The tests that I was in the process of fixing used an output capture
feature of pytest to check that certain information was displayed to the user under a
failure condition. This seemed unusual to me at the time, considering the primary
consumer of this information was a human actively looking at the output of the utility,
and doubly so because it was effectively testing that `print` worked properly. After
digging further into this test, I discovered that it didn't really validate anything at
all: the failure condition it was simulating was implemented improperly. The real
condition raised a different exception type, *and* the simulated failure was triggered
at a different point in the code, higher in the call stack, where a broad `except` block
would have caught it and made the condition impossible in the first place. But the test
was *complicated*, so it mostly evaded scrutiny until I got confused about why it was
still passing despite my substantial changes to what it claimed to be checking. It was
deleted shortly after.
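A distilled, hypothetical sketch of the pattern (the module, names, and failure mode below are invented for illustration, not the real utility):

```python
import sys

_THIS_MODULE = sys.modules[__name__]


def send_file(path: str) -> None:
    """Stand-in for the real work; in production the failure was a FileNotFoundError."""
    raise FileNotFoundError(path)


def handle_upload(path: str) -> None:
    try:
        send_file(path)
    except Exception:  # broad except swallows everything raised below this frame
        print("upload failed, will retry later")


def test_prints_error_when_upload_fails(capsys, monkeypatch):
    # Simulate the failure with a *different* exception type than production
    # would raise, then assert on captured stdout. The broad `except` above
    # means the distinction never matters; the test mostly proves that
    # print() works, not that the failure handling is correct.
    def fake_send(path: str) -> None:
        raise ValueError("simulated failure")

    monkeypatch.setattr(_THIS_MODULE, "send_file", fake_send)
    handle_upload("report.csv")
    assert "upload failed" in capsys.readouterr().out
```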
*Tests should be simpler than the code they validate.*
## I was wrong then, and I'm wrong now
Requirements change. Some other team discovers a constraint that was missed, some
assumption turns out to be incorrect, or the business simply has different priorities
and needs now. Either way, software is powerful because it is flexible. An earlier
section of this post went into what can happen in an overly tested application, but
here's another reason to hold back: some code simply needs to remain flexible. Tests are
the opposite of flexibility; they are an assertion that something should remain static,
that behavior should **not** change. That is oftentimes appropriate. Tests themselves
can be flexible and can be changed, but with too many of them it starts hurting
"agility". As with many things, it takes years of built-up intuition to know where that
balance lies.
*Test coverage should be inversely proportional to the current and future rate of
change, and proportional to your confidence in the requirements and the criticality of
the code.*
## What really matters is what others see
Coding is inherently abstraction: making decisions ahead of time so that a future user
can perform some task with minimal steps and decision making, usually via programming
language constructs such as functions, types, interfaces, and too many others to list.
Whatever your abstraction tool of choice, a well-designed abstraction makes testing it
obvious and easy. "Test the API" is a common mantra, and it's true. At a minimum,
testing the behavior that a user *relies* on is the most important thing to get right,
and it's the hardest thing to change after the fact (especially for library authors).
*Testing your interfaces is easy, but building good interfaces to test is hard.*
Every function is an interface. It has semantics, inputs, outputs, and may even throw
exceptions. Not every function need be tested, though. Some are not meant to be used outside of their immediate context. It's *sometimes* worth writing tests for these, but typically only when they're particularly critical or tricky.
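A small sketch of what that looks like in practice (the function names here are made up): pin down the public function users actually call, and let the internal helper be covered indirectly.

```python
import re


def _collapse_whitespace(text: str) -> str:
    """Internal helper: not part of the public contract, free to be refactored."""
    return re.sub(r"\s+", " ", text).strip()


def slugify(title: str) -> str:
    """Public interface: the behavior users rely on and the thing worth pinning down."""
    cleaned = _collapse_whitespace(title.lower())
    return re.sub(r"[^a-z0-9]+", "-", cleaned).strip("-")


def test_slugify_produces_url_safe_slugs():
    # Exercise the interface users depend on; _collapse_whitespace is covered
    # indirectly and can change shape without this test ever noticing.
    assert slugify("  Hello,   World!  ") == "hello-world"
```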
## The Rest
I haven't written the rest of this, but come back later for more.