On the limits of TDD, and the limits of studies of TDD

Greg Wilson writes:

This painstaking study is the latest in a long line to find that test-driven development (TDD) has little or no impact on development time or code quality. Here, the authors repeated an earlier study with a couple of new wrinkles, and then blinded their data before giving it to someone else to analyze to remove any possibility of bias. The result: no significant difference between TDD and iterative test-last (ITL) development.

I think it’s really important to pay attention to studies like this. Which is why I’m so glad that Greg is out there drawing attention to empirical science being done on software engineering.

It’s also important to keep in mind that science is always limited by the questions being asked. In this case, my eye was drawn to the experimental design:

The baseline experiment utilized a single experimental object: the Bowling Scorekeeper (BSK) kata. The task required to participants to implement an API for calculating the score of a player in a bowling game. The development of a GUI was not required. The task was divided into 13 user stories of incremental difficulty, each building on the results of the previous one. An example, in terms of input and expected output, accompanied the description of each user story. An Eclipse project, containing a stub of the expected API signature (51 Java SLOC); also an example JUnit test (9 Java SLOC) was provided together with the task description.

(Emphasis mine.)

A commenter on Greg’s blog already noted that this is an exceptionally tiny example coding problem, and questioned whether results on such a small, easy-to-conceptualize program scale meaningfully to real-world software projects. I think that’s a valid criticism.

But I’m more interested in just how well-defined the problem is.

For me, perhaps the greatest value in practicing Test-Driven Development has always been getting over the blank-page brain-freeze towards the beginning of writing a software component. And how, at the same time, TDD forces me to tightly define the problem before addressing it. My TDD process has always been dominated by these questions:

  1. Am I done yet?
  2. …well, do the tests pass?
  3. …and if they do, do the tests describe a completed solution?

This discipline has done more than any other I’ve tried to keep me focused, to help me whittle down the problem statement to specific essentials, and to avoid superfluous tangents.

In the experimental design quoted above, most of that mental work has already been done.

I also quipped on Twitter:

…which may seem flippant, but I’m actually kinda serious about it. Research, understandably, tends to focus on questions that are easier to ask. But a lot of the most important questions (in my opinion) that need to be asked about software today have to do with difficult-to-measure externalities. Technical debt is one such tough-to-measure externality. But even more difficult, and more vital, to ask are questions like: how much of our developers’ happiness, wellbeing, and calm are we burning to achieve these easily-measured productivity/quality results? What state are we leaving developers in for their next project?

I’m glad research like the study cited above is happening. We need to be mindful of it. But we also need to be aware of the questions that aren’t being asked.