A little over a decade after it was introduced, there is a general consensus that test-driven development (TDD) works. In other words, developers who use TDD tend to complete tasks more quickly, have fewer defects, and deliver code that can be enhanced and changed more easily than developers using a more linear “traditional” approach.
However, over time, unit tests shift from an enabler to a maintenance burden.
This article is about shifting them back, based on my experiences as a member of the technical staff at Socialtext, working on an application developed continuously since 2002, employing Agile methods like pair programming and test-driven development since 2005.
How the shift happens
New features mean changes to code, and that means old assumptions about how things "should" work can become wrong. As a result, over time, we found that unit tests were functioning more like “change detectors” than tests. In many cases a half-dozen failures weren't actually failures at all, and the developer had to go back and “green” the tests after a run -- changing the tests to match the new system.
At the same time, we split the tests up by minor feature (a file) and major feature (a directory). This means the programmer can get the fast-feedback feel of TDD again by running one file at a time -- but runs the risk of making a change that causes some other unit test somewhere else to fail.
We catch those failures later in the process with continuous integration, but that raises an ugly question: Just what is the value that unit tests give us? If the only value were in a true 'test' suite, we might do well to dump the process, as false errors and the maintenance burden of 'greening' the tests introduce new costs. Is the cost too high? Certainly not! There are other benefits to TDD, including helping developers with design, documenting the system, and letting the programmers know when they are “done” with a component. Our programmers found those benefits valuable enough to continue the practice. I've found that teams tend either to shift their expectations and focus to these benefits, or to abandon the practice entirely.
Continuous integration (CI) and test greening
By CI I mean a piece of software that checks out the entire codebase, either periodically or after every commit, builds the software, runs all the tests, and either “promotes” the build in case of success or sends some sort of alarm in case of failure.
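As a rough illustration of that cycle, here is a minimal Python sketch. It is not how testrunner or any particular CI server is implemented; the names `run_ci_cycle`, `BuildResult`, `promote`, and `alarm` are hypothetical, and each step is just a callable that returns True on success.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


@dataclass
class BuildResult:
    promoted: bool
    failed_step: str = ""  # empty when every step passed


def run_ci_cycle(
    steps: List[Step],
    promote: Callable[[], None],
    alarm: Callable[[str], None],
) -> BuildResult:
    """Run the steps in order (checkout, build, test, ...).

    The first failing step raises the alarm and stops the cycle;
    the build is promoted only if every step succeeds.
    """
    for name, action in steps:
        if not action():
            alarm(name)
            return BuildResult(promoted=False, failed_step=name)
    promote()
    return BuildResult(promoted=True)


if __name__ == "__main__":
    events = []
    # Stand-in steps; a real server would shell out to version
    # control, the build tool, and the test harness here.
    steps = [
        ("checkout", lambda: True),
        ("build", lambda: True),
        ("test", lambda: False),  # simulate a failing test run
    ]
    result = run_ci_cycle(
        steps,
        promote=lambda: events.append("promoted"),
        alarm=lambda step: events.append("alarm: " + step),
    )
    print(result, events)
```

Note what the sketch leaves out: the alarm says which step failed, not which check-in caused it. That gap is what makes long runs painful.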
At Socialtext we have a webserver called testrunner that does this; the problem is that testrunner currently takes three hours to run. After three hours, it may not be obvious to any given developer which check-in caused what failure. To keep things green, we have a tradition toward the end of each iteration of “test greening,” a habit where developers swarm on tests until the iteration lead or ScrumMaster judges the branch stable.
You could think of test greening as a human being checking the “change detectors” one by one to make sure everything is okay, and updating the documentation, which just happens to be executable code.
Prune mercilessly and consider the cloud
Three hours to see a test failure is a long time, and that's if your commit comes right before a new build. If the programmer adds the code immediately after a build kicks off, it might be six hours to see results. Two ways to fight this problem are to prune old tests that don't add value, and to break your test suite into several mini-suites that can run at the same time.
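The three-hour and six-hour figures follow from simple arithmetic, sketched below. The sketch assumes builds run back to back and every build takes the same time; the function name `hours_until_results` is only illustrative.

```python
def hours_until_results(build_hours: float, hours_since_build_started: float) -> float:
    """Time from a commit until the first build that includes it finishes.

    Assumes builds run back to back, so the commit waits for the
    build already in progress and then for one full build of its own.
    """
    if not 0 <= hours_since_build_started <= build_hours:
        raise ValueError("commit must land during the build in progress")
    wait_for_current_build = build_hours - hours_since_build_started
    return wait_for_current_build + build_hours


if __name__ == "__main__":
    # Commit lands just as the three-hour build in progress finishes.
    print(hours_until_results(3, 3))  # 3
    # Commit lands just after a three-hour build kicked off.
    print(hours_until_results(3, 0))  # 6
```

Under the same assumptions, a twenty-minute build would put the worst case at forty minutes.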
Beyond Socialtext, in my consulting and conference speaking work, four hours seems to be a sort of magic number. When tests take longer than four hours to run, it becomes likely the programmer will not see the results of his change in the same business day -- and possible that the results are out of date. As a result, some companies will turn off CI.
For this reason, I suggest that you don't let your CI runs go over three hours. We're at that limit now at Socialtext, even after pruning tests that don't provide much information. We are currently exploring ways to run CI tests in parallel using tools like Jenkins, which allow us to segment the tests into different categories and then run the segments in parallel. We are constantly tuning our prototype server and have had build/test runs as quick as twenty minutes.
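One way to decide which tests go into which segment is to balance the segments by running time. The Python sketch below uses a simple greedy rule: take the longest test group first and hand it to the segment with the least work so far. This is an assumption about how one might do the split, not a description of what Jenkins or our setup does; the function names, group names, and durations are all made up for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def split_into_segments(durations: Dict[str, float], workers: int) -> List[List[str]]:
    """Assign test groups to `workers` segments, longest group first,
    always adding to the segment with the smallest total so far."""
    if workers < 1:
        raise ValueError("need at least one worker")
    # Heap entries are (total minutes so far, segment index).
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(workers)]
    heapq.heapify(heap)
    segments: List[List[str]] = [[] for _ in range(workers)]
    # Sort by duration descending, then by name so ties are deterministic.
    for name, minutes in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, index = heapq.heappop(heap)
        segments[index].append(name)
        heapq.heappush(heap, (load + minutes, index))
    return segments


def wall_clock_minutes(durations: Dict[str, float], segments: List[List[str]]) -> float:
    """A parallel run takes as long as its slowest segment."""
    return max((sum(durations[n] for n in seg) for seg in segments), default=0.0)


if __name__ == "__main__":
    # Hypothetical test groups and their running times in minutes.
    durations = {"wiki": 60, "search": 50, "email": 40, "signals": 30}
    segments = split_into_segments(durations, workers=2)
    print(segments)
    print(sum(durations.values()), "minutes serial")
    print(wall_clock_minutes(durations, segments), "minutes in parallel")
```

The greedy rule does not guarantee the best possible split, but it is cheap and usually close. It also needs recent timings for each group, which a CI server typically has from earlier runs.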
As cloud computing continues to mature and prices continue to drop, I expect to see more large projects doing continuous integration on multiple machines simultaneously.
Testing the GUI and customer-facing tests
You'll notice I said the testing value of TDD decreases over time, and it's true: we've found that test failures don't always mean a buggy build, nor does their absence mean the build is solid. So while TDD remains a sound engineering practice, over the years we have built a customer-facing, traditional testing “release cadence” into our software before every release. This article focuses on developer-facing test techniques, but Chapter 16 of "Beautiful Testing" contains a case study of the other half of the story, customer-facing testing at Socialtext.
In this article I discussed steps to keep a long-term TDD project running.
Instead of an all-encompassing theory, I have presented a single data point, grounded in reality. By using test-driven development and continuous integration effectively, you can ensure not only that your application is high quality, but that your unit tests are high quality as well.