Terminology in testing - making a mockery of you
The trouble with test terminology
I've written before about my views on avoiding mocks in your unit tests. The other day I stumbled across a Reddit post in the PHP forum where someone had shared another blog post on this topic, Stop mocking about: Event Dispatcher. One thing which really stood out to me in the comments was the level of debate, uncertainty and confusion around the everyday terminology of automated testing.
Mock versus test double. Unit versus class method. Method call versus feature. Unit test versus integration test, integration test versus functional test, functional test versus acceptance test, acceptance test versus end-to-end test.
For each of these terms, you probably have a specific and quite narrow definition in mind when you use them. There are probably clear boundaries delineated in your understanding of these terms and reflected in the structure of your test suites and the style of your individual tests. This may even come from or be in alignment with the documentation of whatever particular application and test framework you tend to favour.
So it might surprise you to learn that regardless of what you mean by those terms, it's almost certainly not what many other people mean when they use the same terminology. And it's not that one of you is right and the other is wrong, it's that these terms aren't as strictly defined or narrow in scope as we might like them to be. They might be strictly scoped within an organisation, or a framework or your code base, but as numerous web forums and discussion sites about this topic demonstrate, people can't even really agree on whether instantiating two objects in the same test makes it a unit test or some other classification of test.
I'd be surprised if we can all broadly agree on what "test" means at this point, frankly.
Picking apart the threads
Let's start with the concept of a mock. When I say my (at least broad) advice for writing unit tests is to avoid mocking as much as possible, what's a mock exactly, what am I referring to there?
Some people use mock to refer to any test double. I would loosely define a test double as an object or function which exists only for the purpose of acting as a substitute for a dependency of another object or function in a test, usually to provide pre-defined controlled behaviour, mimic state or external resources and remove complexity. In PHP in particular, this broad equivalence of mocks and test doubles comes partly from the fact that PHPUnit's API allows you to use the getMockBuilder() functions to build any kind of test double.
When I say mock, I am referring to the specific type of test double which acts as an observer and validator of expectations about how the double will be used. That is, things like what methods on the double will be called, how many times, in what order, with what parameters - where these expectations act as assertions in the test and will fail if not met in the course of test execution.
So I use mock specifically to differentiate that type of double from other classifications of double, such as dummy, stub, spy or fake.
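To make that distinction concrete, here is a minimal sketch in plain PHP (8.0 or later), with hand-rolled doubles rather than PHPUnit's builder. Everything named here (Mailer, WelcomeService, StubMailer, MockMailer) is hypothetical and invented purely for illustration. The stub only supplies a canned answer; the mock carries its own expectations and fails if they are not met.

```php
<?php
declare(strict_types=1);

// Hypothetical dependency, invented for this illustration.
interface Mailer
{
    public function send(string $to, string $subject): bool;
}

// Hypothetical unit under test.
final class WelcomeService
{
    public function __construct(private Mailer $mailer) {}

    public function welcome(string $email): bool
    {
        return $this->mailer->send($email, 'Welcome!');
    }
}

// Stub: returns a pre-defined answer and verifies nothing about its usage.
final class StubMailer implements Mailer
{
    public function send(string $to, string $subject): bool
    {
        return true;
    }
}

// Mock: observes how it is used and fails if its expectations are not met.
final class MockMailer implements Mailer
{
    private int $calls = 0;

    public function __construct(private string $expectedTo) {}

    public function send(string $to, string $subject): bool
    {
        $this->calls++;
        if ($to !== $this->expectedTo) {
            throw new LogicException("Unexpected recipient: {$to}");
        }
        return true;
    }

    // Called at the end of the test to check the call count expectation.
    public function verify(): void
    {
        if ($this->calls !== 1) {
            throw new LogicException("Expected exactly 1 call to send(), got {$this->calls}");
        }
    }
}

// With a stub, the test asserts on what the unit under test returns.
$result = (new WelcomeService(new StubMailer()))->welcome('a@example.com');
if ($result !== true) {
    throw new LogicException('welcome() should report success');
}

// With a mock, the expectations on the double are themselves the assertions.
$mock = new MockMailer('a@example.com');
(new WelcomeService($mock))->welcome('a@example.com');
$mock->verify();
```

Notice that the mock-based test is coupled to how WelcomeService talks to its dependency, while the stub-based test only cares about the outcome.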
Most of the time. Even with that definition in mind, I'll still sometimes casually say mock as a shorthand for any test double in conversation. And just to make it even more potentially confusing, although it's mocks specifically I consider to be a test smell, I'd still say my broad advice is actually to avoid test doubles wherever you can in tests.
Of course you'll inevitably have to use them a lot - just don't start with using a test double for any dependency as your default. Use a double if using a real object would be awkward for the test - and by awkward, I mean things like would introduce stateful side effects, rely on external or remote resources unavailable to the test environment, or be unacceptably slow to execute where you want the tests to be fast.
Whether these things are "awkward" is very much context-dependent; in your highest level of system tests, you probably do want to test against real databases, real APIs, real state, etcetera, and you might not mind if those tests are quite slow to run. In your low-level library/unit tests, on the other hand, these things probably are "awkward" and call for test doubles.
Which leads me on to my next point.
What's in a unit, anyway?
The main place we will use test doubles is in what are generally referred to as "unit tests". I sometimes find it more helpful, at least in explaining the concept, to refer to these as "library-level tests", as opposed to "application-level tests". Why? Because sometimes people use the phrase "unit testing" to mean automated testing in general. And that's okay - unit testing, like all of the phrases I listed at the top of this post, is actually more of a wishy-washy, ill-defined term than you might think.
For me, unit tests (or library-level tests) are the tests we write against individual components ("units") of our code. Cue the obvious question: what's a unit, what's a component? Is a one-line private function a unit? What about a public function? What about an entire class? What about a function in a class which has to call a function on another object outside that class (i.e. a dependency)?
The answer is: that's up to you. A unit is whatever makes the most sense for your project to deliver usefulness and test value, to give you confidence in the probable correctness of changes to your code. For me, when I write PHP, units are usually but not necessarily the public methods on a class. If units are something else for you, that's fine.
Think about this: why do we bother classifying our tests into different categories in the first place? It's because we have different criteria for them. But the criteria that are important to you might not be important to someone else. The criteria that are important in one of your projects might not be the same criteria that are important in a different one.
These criteria tend to be - when will the tests execute? Several times a day on your local machine as you write code, to check you haven't broken the bit you're working on? On every commit? When a pull request is opened, when something is merged into master/main, when a release is prepared for deployment? What resources will the tests need in order to run? A file system, a database, a headless web browser, just RAM? Do we need to bootstrap some initial data as a starting point? Do we expect these tests to be able to run independently of each other, to be arbitrarily repeatable or to run in parallel? Or do we expect them to run a series of actions in order as a sequence, preserving state in between?
Does it matter, then, to be hung up on terminology and trivial details when deciding what to call a test? We have meaningful criteria to determine in what test suite - whatever we've named it - a test should be placed. What value is gained from a team internally debating whether using a real but artificially instantiated Request object to test a ContentNegotiator service makes it an "integration test" when you would consider it a "unit test" if only the Request object was mocked?
None. What matters is: does this test meet the criteria for belonging to your unit test suite? If those criteria are, for example, that all the tests should be independent, self-contained, stateless, able to be run in parallel, fast, in-memory, validated on every pushed commit and require no resources external to the test harness, then those criteria have been met. Stick it under unit tests; it's where it belongs.
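As a rough sketch of what such a test might look like, here is a real but artificially instantiated Request passed straight to a ContentNegotiator in plain PHP. Both classes are hypothetical stand-ins written for this example (your framework's request class and your own negotiation logic will differ), and the negotiation deliberately ignores q-values to keep things short.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal Request; a real project would more likely use its
// framework's own request class.
final class Request
{
    /** @var array<string, string> */
    private array $headers;

    /** @param array<string, string> $headers */
    public function __construct(array $headers = [])
    {
        // Normalise header names so lookups are case-insensitive.
        $this->headers = array_change_key_case($headers, CASE_LOWER);
    }

    public function header(string $name, string $default = ''): string
    {
        return $this->headers[strtolower($name)] ?? $default;
    }
}

// Hypothetical service: returns the first supported media type listed in the
// Accept header, falling back to the first supported type.
// Assumes $supported is non-empty; q-values are ignored in this sketch.
final class ContentNegotiator
{
    /** @param list<string> $supported */
    public function __construct(private array $supported) {}

    public function negotiate(Request $request): string
    {
        foreach (explode(',', $request->header('Accept')) as $candidate) {
            // Strip any parameters such as ";q=0.9" before comparing.
            $type = trim(explode(';', $candidate)[0]);
            if (in_array($type, $this->supported, true)) {
                return $type;
            }
        }
        return $this->supported[0];
    }
}

// The "test": a real Request, no doubles, nothing external, all in memory.
$request = new Request(['Accept' => 'text/html;q=0.9, application/json']);
$negotiator = new ContentNegotiator(['application/json', 'application/xml']);
$chosen = $negotiator->negotiate($request);

if ($chosen !== 'application/json') {
    throw new LogicException("Expected application/json, got {$chosen}");
}
```

This test is independent, stateless, fast and in-memory, so by the criteria above it sits comfortably in a unit test suite regardless of the fact that two real objects are involved.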
And broadly for me, those are the criteria for what I call unit tests, in my projects. It's okay if those aren't your criteria. I've literally seen people argue over whether it's okay to touch the file system in a unit test. Well, if I was writing a test which included a component writing and parsing 2GB of XML and I know it's going to be slow, I'm probably not going to stick that in my unit test suite. But am I going to take the time to laboriously construct some kind of complex set of interfaces and virtual file system just to avoid writing 2 kilobytes to the system tmp directory as part of whatever a unit test is testing? Hell, no - because whether or not a test reads or writes a file at all isn't (usually) one of my criteria for what goes in that test suite. You wouldn't call that a unit test? Fine. Let's call it a sufficiently fast and useful test and move on.
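For a sense of how little ceremony the tmp directory approach needs, here is a minimal sketch in plain PHP. The saveLines() function is a hypothetical unit under test made up for this example; tempnam(), sys_get_temp_dir(), file_put_contents(), file_get_contents() and unlink() are standard PHP functions.

```php
<?php
declare(strict_types=1);

// Hypothetical unit under test: writes each line to the given path and
// returns the number of bytes written.
function saveLines(string $path, array $lines): int
{
    $bytes = file_put_contents($path, implode("\n", $lines) . "\n");
    if ($bytes === false) {
        throw new RuntimeException("Could not write to {$path}");
    }
    return $bytes;
}

// Create a uniquely named scratch file in the system tmp directory.
$path = tempnam(sys_get_temp_dir(), 'unit_');
if ($path === false) {
    throw new RuntimeException('Could not create a temporary file');
}

try {
    $bytes = saveLines($path, ['alpha', 'beta']);
    $contents = file_get_contents($path);

    if ($contents !== "alpha\nbeta\n" || $bytes !== strlen($contents)) {
        throw new LogicException('Unexpected file contents');
    }
} finally {
    // Clean up so the test leaves no state behind, pass or fail.
    unlink($path);
}
```

The unique file name and the cleanup in finally are what keep a test like this independent and safe to run in parallel with others.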
Focus on test quality
The thing I want people to take away from this post is that all the tests we write as code are automated tests. Focus on writing useful, high-quality automated tests and you're doing your job right.
What you call those tests is the least important detail in any of it. Group similar tests together and call the test suite whatever the hell you want.
Fast, stateless, in-memory tests? Great, group 'em together, call 'em unit tests, call 'em integration tests, call 'em commit tests, it doesn't matter.
Tests on the high level interaction of your database, logging and user classes? Great, group 'em together. Call them integration tests, call them functional tests, call them "requires a real database tests".
Tests which fire up a web server, headless Chrome and then use WebDriver to load pages, click buttons and introspect headings? Group them together, call them functional tests, call them acceptance tests, call them end-to-end tests, call them "needs a browser tests", it doesn't matter. What matters is all the tests in each group have common properties which make them distinct from your other tests.
My advice: don't get hung up on coming up with some narrow definition of terminology which will inevitably be at odds with what other people mean by the same words anyway, and don't spend too much time worrying about your test pyramid or what kinds of tests you're focusing on. You're building software to try and deliver value and you're writing tests to have confidence in that value.
As long as you have clear boundaries between your test suites and your expectations of how they behave, what they do and what they're telling you, all you need to remember is to write useful automated tests which meet your testing goals.
By constraining ourselves to very narrow definitions of what tests are and what they should and should not include, all we achieve is to make it harder on ourselves to write good automated tests.
Don't let your tests make a mockery of you.