Test Driven Development (TDD): How, What & Why

A word about automated testing

Why do we do automated testing? Why do we write unit tests, functional tests, e2e tests, API tests, contract tests, acceptance tests? What benefits do we get out of doing this?

When we think about automated tests, the benefits which usually come to mind are fewer bugs (new features work correctly and existing features haven't regressed), better quality code and a reduction in the manual labour involved in testing software.

I'm going to go against the grain immediately by suggesting you ought not to think about automated testing in this manner. I'm not saying automated tests are useless at (partially) achieving these goals, but to me these are not the primary benefit of testing.

Fewer bugs

Let's get this one out of the way first. A bug is an unintended behaviour in software. If you've been relying on automated tests to catch your bugs, or labouring under the belief that if you have sufficient automated test coverage and your test suite is fully passing, it means you don't have any bugs, I have bad news for you: neither of these things are necessarily true.

See, the only real measure of whether software is buggy or not is whether or not it does what it's supposed to do in the real world. It's entirely possible to have a high level of code coverage against a fully passing suite and still have bugs in your software, because you didn't properly understand the real world requirements of what you were building, or because you are testing the wrong things.
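To make this concrete, here's a hypothetical sketch (the method, fixture and figures are invented for illustration): suppose the real business rule is "orders over £100 get a 10% discount", but the developer misreads it as "orders of £100 or more". The code and the test share the same misunderstanding, so coverage is complete and the suite is green - yet the software is wrong.

public function discountedTotal(float $total): float
{
    // Bug relative to the real-world rule: should be > 100, not >= 100
    return $total >= 100 ? $total * 0.9 : $total;
}

// Written from the same misreading, so it passes and coverage looks perfect
public function testOrdersOfExactlyOneHundredGetTheDiscount()
{
    $this->assertSame(90.0, $this->calculator->discountedTotal(100.0));
}

No test in this suite can catch the bug, because the misunderstanding lives in the requirements, not in the code.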

Conversely, it is also possible to have a failing test in your suite and not have a corresponding bug in the software, where the correct course of action is to change or delete the failing test.

Better quality code

Mmm, kinda-sorta. Generally we do see at least a loose positive correlation between the things we use to measure code quality (and there are a ton of different metrics) and comprehensive test coverage. Code which can achieve a high proportion of test coverage is more likely to score better on factors like flexibility, maintainability, clean architecture, freedom from security vulnerabilities, etc. But again, it is very much possible to write tests over poor quality code and for the false sense of security this creates to disguise that poor quality. It is also possible to write both poor quality code and poor quality tests.

Reducing labour in manual testing

Some, yes. You might not need humans to test everything with as much granularity, you might be able to save humans some time performing repetitive tasks in their testing cycle and you might be able to automate certain assurances about your software behaviour which never needed a human seal of approval in the first place.

But there are some judgements about the quality or correctness of software which can only be made by a human. Imagine we're building a computer game, a car racing game or something. We can automate tests around the mathematical details of the rendering and the simulated physics, sure. But we can't automate how exciting the game is to play, or how realistic the car looks as it swerves around a corner. These judgements are by definition human judgements.

Even setting these sorts of judgements aside, there will always be things, no matter how technically correct the software, which can only be caught by people using it in the real world for its intended use case: small edge cases in combinations we didn't and wouldn't think of, or sequences of user interactions nobody anticipated.

Bottom line: human end-to-end testing is important, and even after release your real-world users will still find and report things you just didn't anticipate. Our aim is to keep that "real world testing" to a minimum; we can't get rid of it.

Okay, so why bother?

The key word for me in automated testing is confidence. That's what writing tests really gives us; we can have better confidence in the quality of our code, in the correctness of our features and the reliability of our software.

We don't know that new features always work as intended in every scenario, and we don't know we definitely haven't broken any existing features, but the more robust and comprehensive our automated test suites, the more confidence we can have that we are closer to these goals.

Why a test-first approach?

Regardless of whether you agree with my perceptions on testing, we probably all (mostly all) agree there are benefits to having automated tests and that broadly, the more comprehensive coverage the better. That is, we should aim to have test coverage throughout the different types of testing - the test pyramid, as it's often depicted.

Test driven development is a methodology for writing software whereby we write the tests before we write the code which is being tested. We always start any new feature with tests which fail because there is no code yet to do what they are testing.

As much as I argue the primary benefit of testing is confidence in the correctness of software, so too I'd say the primary benefit of TDD is confidence in the correctness of our tests.

The thing people tend to forget is that programming is hard and humans make mistakes. If we can write buggy code, we can and will write buggy tests. What TDD gives us is an upfront assurance that what we are testing is what we think we are testing - and this is something which is easy to miss if you write your tests after the fact. It's all too easy to write tests which rather than testing a feature are testing an implementation. We end up testing code instead of behaviour, but it's behaviour we care about.

A quick example of a good versus bad test

To illustrate the point above, once I was doing a peer review of someone's code and part of the job being reviewed involved parsing a configuration file in PHP's ini format, using the parse_ini_file function.

It was some years ago now, so I can't remember the exact use case or precisely what the code looked like, but I do remember the corresponding unit test was something along the lines of the following:

public function testLoadConfigurationFileReturnsConfigArray()
{
    // ... (test setup elided: create a temporary ini file at $pathToTempFile)
    $config = $theObjectBeingTested->loadConfigurationFile($pathToTempFile);
    $iniSettings = file_get_contents($pathToTempFile);
    $expectedConfig = parse_ini_string($iniSettings);
    $this->assertEquals($expectedConfig, $config);
}

You can see why the author would have thought this test, written after they'd implemented the loadConfigurationFile method, was reasonable. They already knew the method loaded the config from an ini file using parse_ini_file, so they decided to test that, given a known example of such a file, the output matches the result of grabbing the file's contents and running it through parse_ini_string. And the test passes, because why wouldn't it?

The problem is the test doesn't really say anything or test anything about the expected behaviour, beyond "is it the same as calling parse_ini_string on the contents of the same file, regardless of what parse_ini_string does."

But there's a crucial difference between that and what we should be testing. The business specification of the code being tested is "given a file in a format with the following particulars, parse it into an array and return the array", not "do whatever a function named parse_ini_string does, regardless of what that is."

And it might seem like a pedantic difference, but it's not; it's huge. This test doesn't specify or know anything about what parse_ini_string does, or whether it does anything. Maybe it deletes the contents of your file system, maybe it returns an empty array regardless of input - or maybe it does actually parse an ini format file into an array and return the result. The point is, the test doesn't know, because it isn't testing that.

In testing this way, what the author's really done is effectively implement their code twice, in two slightly different ways, and then check the results are equivalent. What they should have done is test the business parameters of the feature, in which case their test would look something more like this:

public function testLoadConfigurationFileReturnsConfigArray()
{
    $iniData = <<<DATA
    foo = 1
    bar = false
    baz = "/home/some/path"
    DATA;

    // Write this data to a temp file at $pathToTempFile
    // ... (setup elided)

    $expectedConfig = [
        'foo' => 1,
        'bar' => false,
        'baz' => "/home/some/path".
    ];

    $config = $theObjectBeingTested->loadConfigurationFile($pathToTempFile);
    $this->assertSame($expectedConfig, $config);
}

Now we are testing what we think we are testing - that a particular string is converted to a particular array.

When we adopt a test-first methodology, this is one of the most powerful things we get out of it. We define our test cases according to the feature - the business logic - we are trying to build, and in doing so we have much better confidence that we are testing the right thing.

But there are secondary benefits to TDD too. One is that our test coverage will end up as close to 100% as is practicable; another is that we will inherently write our implementation code to be more testable, because every line is written towards the goal of making an existing test case pass.

We can also quickly spike, prototype, or sanity-check design ideas that we have in our head; if you would struggle to write test cases for an API, you would struggle a lot more to actually build it.
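For instance, here's a sketch with invented names (UrlShortener and its methods are hypothetical, and assertMatchesRegularExpression assumes PHPUnit 9 or later): simply drafting the test cases for an imagined class forces its design questions to surface before any implementation exists.

public function testShortenReturnsASixCharacterCodeForAValidUrl()
{
    $shortener = new UrlShortener();
    $code = $shortener->shorten('https://example.com/some/very/long/path');
    $this->assertMatchesRegularExpression('/^[a-zA-Z0-9]{6}$/', $code);
}

public function testExpandReturnsTheOriginalUrlForAKnownCode()
{
    $shortener = new UrlShortener();
    $code = $shortener->shorten('https://example.com/some/very/long/path');
    $this->assertSame('https://example.com/some/very/long/path', $shortener->expand($code));
}

Merely writing these raises the questions that matter: where do the mappings live, what should expand do with an unknown code, should shorten be deterministic? If the test cases are a struggle to write, the API design probably isn't clear yet.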

What about drawbacks?

There are indeed a couple of cons to TDD. The first is that it generally requires a greater investment of time and effort upfront; the second is that it can - especially in more complex systems - lead to ever increasing layers of abstraction and architectural bloat, which if you're not careful can create the very technical debt, poor maintainability and lower quality that TDD aims to prevent.

What is the TDD cycle?

TDD is a very short but continuous development cycle which looks like the following:

  • Write a test defining a behaviour
  • See the test fail
  • Write the minimum code required to make the test pass
  • Refactor
  • Repeat

A slightly elaborated description of these stages might be:

  1. We translate some requirements – be it a new feature or a bug report – into a docblock or function name for a test. We describe in simple terms what it is we are testing.
  2. We write a test that matches the description. This test will only test a single aspect of functionality.
  3. We run the test against our unmodified code and see that it fails. If it doesn't fail, we know that our test is not correct.
  4. We make the minimal changes to our code required to make the test pass.
  5. We run the test again and verify that it does pass.
  6. We refactor the code we've written to improve its design and ensure our test still passes.
  7. We then write the docblock for the next test and repeat the cycle.
  8. We can write multiple tests before doing any code. We'll know our development is complete and the requirements met when all the tests go from failing to passing.

💡️ One thing I think is important to mention is this methodology is called test-driven development, not unit test-driven development. Contrary to common misconceptions, it does not dictate that the type of test you must start with is a unit test.

How to do TDD - a trivial example

My blog is powered by Markdown. One of the things I do in the code is extend this notation slightly to support a small set of Unicode emojis I frequently use. This means I can write something like @smile@ and it's automatically converted to 😀️.

I want to implement this feature as a method on an existing class via TDD, so I'll start by defining a test name which describes what I want the unit of code I'll be writing to do.

public function testParseEmojisReplacesSymbolNamesWithHtmlUnicodeEntities()
{

}

Now I'll fill in this test skeleton by adding a simple case.

public function testParseEmojisReplacesSymbolNamesWithHtmlUnicodeEntities()
{
    $parser = new MarkdownParser();
    $input = "Hello Dave @smile@";
    $expected = "Hello Dave &#x1F600;&#xFE0F;";
    $this->assertEquals($expected, $parser->parseEmojis($input));
}

The first time I run this test, it fails with an error, because the function I'm testing is new and doesn't exist yet:

Error: Call to undefined method Gebler\MarkdownParser::parseEmojis()

I then create an absolutely minimal, do-nothing implementation of the parseEmojis method on my class:

    public function parseEmojis(string $text): string
    {
        return '';
    }

The next time I run my test, it still fails, but now it's because my method doesn't fulfil my test criteria:

   Failed asserting that two strings are equal.
   --- Expected
   +++ Actual
   -'Hello Dave &#x1F600;&#xFE0F;'
   +''

I can now write the minimal, most simple implementation of the parseEmojis method which makes the test pass:

    public function parseEmojis(string $text): string
    {
        return str_replace('@smile@', '&#x1F600;&#xFE0F;', $text);
    }

I run my test again and it passes:

 ✔ Parse emojis replaces symbol names with html unicode entities

Time: 00:00.010, Memory: 4.00 MB

OK (1 test, 1 assertion)

Now I want to support more emojis, so I refactor my test to cover more cases, perhaps using a dataProvider.

public function emojiTestCases()
{
    return [
        'smile' => ['smile', '1F600'],
        'lightbulb' => ['note', '1F4A1'],
        'exclamation' => ['warning', '26A0'],
    ];
}

/**
 * @dataProvider emojiTestCases
 */
public function testParseEmojisReplacesSymbolNamesWithHtmlUnicodeEntities($symbol, $entity)
{
    $parser = new MarkdownParser();
    $input = "Hello Dave @{$symbol}@";
    $expected = "Hello Dave &#x{$entity};&#xFE0F;";
    $this->assertEquals($expected, $parser->parseEmojis($input));
}

Now I have three test cases, but two of them don't pass because my implementation was only written to minimally fulfil the first test case. So I need to refactor that too:

    public function parseEmojis(string $text): string
    {
        $emojis = [
            '@smile@' => '&#x1F600;',
            '@note@' => '&#x1F4A1;',
            '@warning@' => '&#x26A0;',
        ];
        foreach ($emojis as $placeholder => $entity) {
            $text = str_replace($placeholder, $entity.'&#xFE0F;', $text);
        }
        return $text;
    }

And I see my tests pass:

Markdown Parser (Gebler\MarkdownParser)
 ✔ Parse emojis replaces symbol names with html unicode entities with data set "smile"
 ✔ Parse emojis replaces symbol names with html unicode entities with data set "lightbulb"
 ✔ Parse emojis replaces symbol names with html unicode entities with data set "exclamation"

Time: 00:00.014, Memory: 6.00 MB

OK (3 tests, 3 assertions)

Now all that's left is to refactor the implementation to make it a bit better, without making the test fail. Perhaps I end up with something like the following:

    class MarkdownParser {
        //...
        private static $emojis = [
            'smile' => '1F600',
            'note' => '1F4A1',
            'warning' => '26A0',
            //...etc.
        ];

        // U+FE0F is the Unicode variation selector which requests the
        // emoji-style rendering of the preceding character
        private static $unicodeEmoji = '&#xFE0F;';

        public function parseEmojis(string $text): string
        {
            return preg_replace_callback('/@([a-zA-Z0-9]+)@/', function ($matches) {
                $placeholder = $matches[1];
                if (isset(self::$emojis[$placeholder])) {
                    return '&#x' . self::$emojis[$placeholder] . ';' . self::$unicodeEmoji;
                }
                // Unknown placeholders pass through unchanged
                return $matches[0];
            }, $text);
        }
        //...
    }
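As a quick usage sketch (the input strings here are invented, and the pass-through behaviour for unknown placeholders assumes the isset check above):

$parser = new MarkdownParser();

echo $parser->parseEmojis('Deployed @smile@ - but check the logs @warning@');
// Deployed &#x1F600;&#xFE0F; - but check the logs &#x26A0;&#xFE0F;

echo $parser->parseEmojis('A handle like @dave@ is not an emoji, so it is left alone');
// A handle like @dave@ is not an emoji, so it is left alone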

In conclusion

This is a fairly trivial example, but that's it, that's the TDD cycle. We write a test, we see it fail, we provide the smallest possible implementation to make the test pass, we refactor, we repeat. We do this for each feature we want to add, bug we want to fix - any change at all we want to make to the implementation of our software starts with a failing test.

In doing this, we will write code which is inherently testable and have better confidence not just in the correctness of our code but also in the correctness of our tests - that we are testing the right behaviour. This helps us avoid the trap of buggy or misleading tests; it gives us, in the tests, living, evolving documentation of what our software should do; and it brings us a step closer to a cleaner, more flexible and more maintainable system.

