Are Characterization (Golden Master/Snapshot) tests supposed to be human readable?

Question

I am trying to write characterization tests in order to quickly get a legacy system under a test harness.
I was not able to find many examples with production code. The examples I find are small programs.
An example by an author of Working Effectively with Legacy Code implements them as unit tests.
It's nice because it's human-readable and helps you understand the code. However, I think that approach works for the small programs in the example, whereas the system I am trying to test is a pretty complex API client.
Other examples store the inputs and outputs in files and read them to compare the results with the 'snapshots'.
Some examples are VCR, Approval Tests, and Golden Master Testing.
I feel like this allows the test input to be generated, and is suitable for testing a large set of inputs.
However, for some reason it feels more high-level than unit testing, and not very human-readable.
Are these tests supposed to be part of your unit test suite, or are they supposed to be complements to unit tests?
In other words, should I sample some characteristic test inputs and write characterization tests as unit tests, but using the actual output from the code to 'lock down' the existing behaviors, and trying to make it readable? Or should I treat characterization tests as a complement to unit tests? If so, what should I focus on in my unit tests?

jonrsharpe · Answer

Generally, characterisation tests are not an end state. They're a way to pin the current behaviour of untested legacy code, so that you can start making changes towards maintainability and testability with a reasonable level of confidence that the overall picture hasn't changed. They're not good tests, though; they don't:

tell you any of the behaviour is actually correct;

tell you where the problem is when a test starts failing (they're just detecting changes); or

tell you which parts of the current implementation are important.

So readability isn't particularly important in this case; once you have explored the behaviour more thoroughly, found the overlaps between the test cases (and which ones shouldn't have passed) and written higher quality tests that give you real confidence, you can get rid of most if not all of them.
The other use for characterisation tests is where the alternatives would be less readable, e.g. when you have code that outputs a report and you don't want to have to exhaustively check each part of it. Instead you generate the report, manually check it and then compare future reports to it to make sure nothing changes. Although this "snapshot"-style testing can be harmful at lower levels, enforcing "change detection" that actually prevents refactoring, at this level it's useful.
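The report case could look roughly like the following sketch, which uses only the Python standard library. `render_report` is a made-up report generator and `check_golden` a hypothetical helper; tools such as Approval Tests offer a more complete version of the same idea, and this is not their API.

```python
import difflib
from pathlib import Path


def render_report(orders):
    """Hypothetical report generator standing in for the real code."""
    lines = ["ORDER REPORT", "============"]
    for name, amount in orders:
        lines.append(f"{name:<10}{amount:>8.2f}")
    lines.append(f"{'TOTAL':<10}{sum(a for _, a in orders):>8.2f}")
    return "\n".join(lines) + "\n"


def check_golden(actual, golden_path):
    """Return a unified diff against the approved file ('' means no change)."""
    golden_path = Path(golden_path)
    if not golden_path.exists():
        # First run: save the output so a human can review and approve it.
        golden_path.write_text(actual)
        return ""
    expected = golden_path.read_text()
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="approved",
        tofile="actual",
    )
    return "".join(diff)
```

A test would then assert that `check_golden(render_report(orders), "report.approved.txt")` returns an empty string. The approved file is checked by hand once and committed alongside the tests, so the readable artefact is the report itself, not the test code.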
