Wouldn't it be great to sit down with one of the best performance testers at Google or Facebook, or with a performance test expert at a big insurance company or bank, and ask how they do performance testing? Better yet, how about getting them in the same room and having them debate? You'd leave with not just a list of tools, but ideas, details, challenges, and solutions.
It turns out that is exactly what Software Test Professionals offered at their recent conference: a hands-on performance testing course with Dan Downing, consulting tester at Mentora Group, and Goranka Bjedov, a capacity planner at Facebook (formerly of Google). Instead of looking for specific tools to come out of the workshop, I wanted more general guidelines and rules of thumb. I was especially interested in the difference in background between Downing, who works mostly with large IT organizations, and Bjedov, who has more experience with customer-facing web applications.
Here are five of the nuggets I pulled out of that goldmine, served up for you on a platter:
Guideline #1: Enable earlier releases... but be prepared to roll back quickly
On one end of the spectrum, we have the idea of releasing software in perpetual beta, which can lead to a tolerance for defects. At the same time, techniques like production monitoring and rollback can limit a defect's time in production to a matter of hours, or even minutes.
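To make "monitoring plus rollback" a little more concrete, here is a minimal sketch of the kind of automated check that could trigger a rollback. It assumes a hypothetical monitoring feed that reports request and error counts per interval for the newly deployed release; the `HealthSample` shape and the thresholds (2% errors, at least 100 requests) are illustrative assumptions, not numbers from the workshop.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    """One monitoring interval for a freshly deployed release (hypothetical shape)."""
    requests: int
    errors: int


def should_roll_back(samples, max_error_rate=0.02, min_requests=100):
    """Return True if the newest release looks unhealthy enough to roll back.

    Pools the recent intervals and compares the overall error rate to a threshold.
    With too little traffic there is not enough evidence yet, so we do not roll back.
    """
    total_requests = sum(s.requests for s in samples)
    total_errors = sum(s.errors for s in samples)
    if total_requests < min_requests:
        return False
    return total_errors / total_requests > max_error_rate


# Hypothetical data: the second interval shows errors climbing after a deploy.
recent = [HealthSample(requests=500, errors=4), HealthSample(requests=480, errors=31)]
print("roll back?", should_roll_back(recent))
```

A real pipeline would also watch latency and business metrics, but the principle is the same: a machine decides quickly that something is wrong, and a human figures out why.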
On the other end of the spectrum, developers are increasingly writing automated tests and ratcheting up their quality levels.
Now think about the combination of these two realities: if customers will settle for "good enough," and developers can deliver "good enough" enough of the time (and when they don't, we can roll back), what do we need testers for, again, exactly?
That's not going to be a fun conversation.
The first takeaway here is that your company should make a conscious decision about whether such a strategy will work for it, and whether to pursue it.
Bjedov further pointed out that when this transformation happens, test teams that take a "guardians of quality" stance could easily become irrelevant or redundant. She suggested that to survive, test teams need to change focus, from being blockers and protectors, to being enablers. In other words, "thanks to the test team, we were able to ship earlier with less risk."
Come to think of it, that might be a good stance to take in general.
Guideline #2: A fool with a tool is... still a fool
Even if quality is dead, you’ll still likely want to performance test multi-user, time-intensive applications. While you could test by having the team (or the company) bang away at a keyboard, most demanding applications will require some sort of automated performance testing. Bjedov introduced us to JMeter (more about that later), and Downing showed us BadBoy, a record/playback tool that can be used to help JMeter support secure sockets tests.
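This is not how JMeter works internally, but a minimal Python sketch of what any load generator does: run many concurrent "virtual users," each timing its own transactions, and collect the response times for a human to examine. `fake_transaction` is a hypothetical stand-in for a real request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transaction():
    """Hypothetical stand-in for a real request; replace with an actual HTTP call."""
    time.sleep(0.01)


def run_load(transaction, users=10, iterations_per_user=5):
    """Run `users` concurrent virtual users, each executing `transaction`
    `iterations_per_user` times, and return every response time in seconds."""

    def virtual_user():
        timings = []
        for _ in range(iterations_per_user):
            start = time.perf_counter()
            transaction()
            timings.append(time.perf_counter() - start)
        return timings

    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(virtual_user) for _ in range(users)]
        # Flatten the per-user timing lists into one list of samples.
        return [t for f in futures for t in f.result()]


latencies = run_load(fake_transaction, users=10, iterations_per_user=5)
print(len(latencies), "samples; slowest:", round(max(latencies), 3), "s")
```

Generating the load is the easy part; the output is just a pile of numbers that still needs interpreting.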
Yet they also warned that 'tool-itis,' the desire to automate the thinking process, can be the undoing of a project. For example, the idea that evaluating performance test results can be ‘automated away’ with tools simply would not work. Bjedov compared that desire to saying ‘since we have compilers, we should not need programmers.’
So while they emphasized tools to generate load, both suggested having a human look at results.
Guideline #3: Test early, test often
While we typically think of performance testing as something that happens at the end of a project, Ms. Bjedov suggested teams build a minimal system early and develop performance tests “along the way.” Her example was a nightly performance test run against the daily build. If a tester notices a change in performance characteristics, they can take the problem to the development team.
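Bjedov didn't prescribe how to decide that performance has "changed." One minimal sketch, assuming the nightly job keeps a previous run's response times as a baseline and treats a slowdown of more than 10% at the 90th percentile as worth a conversation with developers. The threshold, the function names, and the sample numbers are all hypothetical.

```python
import statistics


def p90(samples_ms):
    """90th percentile via statistics.quantiles (default 'exclusive' method)."""
    return statistics.quantiles(samples_ms, n=10)[8]


def compare_to_baseline(baseline_ms, tonight_ms, tolerance=0.10):
    """Return (regressed, relative_change) for tonight's build vs. the baseline.

    A build counts as regressed when its 90th-percentile response time is more
    than `tolerance` (0.10 = 10%) slower than the baseline's.
    """
    before = p90(baseline_ms)
    after = p90(tonight_ms)
    change = (after - before) / before
    return change > tolerance, change


# Hypothetical response times (ms) from last night's and tonight's runs.
baseline_ms = [120, 125, 130, 128, 122, 135, 140, 126, 131, 129]
tonight_ms = [121, 127, 160, 158, 124, 170, 175, 129, 165, 130]

regressed, change = compare_to_baseline(baseline_ms, tonight_ms)
print("regressed:", regressed, f"({change:+.0%} at p90)")
```

The script only raises a flag; deciding whether the slowdown matters, and which change caused it, is still a job for the tester and the developer who made the change.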
This has two benefits: first, the programmer who introduced the problem is likely to remember what he or she actually changed; second, the team will not discover fundamental architecture problems at the end of the project.
Guideline #4: You're probably going to have to know a little math ... and your tools may not help
Most performance tools churn out numbers: median, mean, and standard deviation are three popular ones. We tend to read these numbers as if the middle meant something, which assumes the user experience follows a “normal” distribution, or bell curve.
Often, performance results do not follow a bell curve. One common result is a “bimodal” curve, much like a camel with two humps. Consider, for example, a case where icons typically load quickly but degrade massively under heavy load. If the heavy load happens only during peak hours, it will skew production monitoring results. Also note that context matters: for certain internal applications, icons failing to load while the screen remains usable might be just fine, or at least not worth spending more money to fix.
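A small worked example with made-up numbers shows why the summary statistics mislead here. The data below is hypothetical, shaped like the icon-load case: most loads are fast, while a peak-hour minority is very slow.

```python
import statistics

# Hypothetical icon-load times in milliseconds: two humps, no bell curve.
off_peak = [40, 42, 45, 41, 44, 43, 46, 39, 42, 44, 41, 43]
peak = [900, 950, 980, 920, 1010, 940]
samples = off_peak + peak

mean = statistics.mean(samples)
median = statistics.median(samples)
stdev = statistics.stdev(samples)
print(f"mean={mean:.0f} ms  median={median:.0f} ms  stdev={stdev:.0f} ms")

# How many real requests came anywhere near the "average" experience?
near_mean = [s for s in samples if abs(s - mean) < 100]
print("requests within 100 ms of the mean:", len(near_mean))
```

The mean works out to 345 ms, a load time no user actually experienced, while the median of 44 ms hides the fact that a third of the loads took close to a second.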
While a human can ‘get’ this in an instant, a computer evaluating averages is very likely to force a bell-curve interpretation onto the results. So have your performance tester graph or otherwise analyze the results every time, then look for patterns.
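A graph doesn't have to be fancy. Here is a minimal sketch of a plain-text histogram, assuming integer response times in milliseconds and fixed-width buckets (the 100 ms bucket size and the sample data are hypothetical). Printing empty buckets is deliberate: the gap between the two humps is exactly what a human should notice.

```python
from collections import Counter


def text_histogram(samples_ms, bucket_ms=100, width=40):
    """Return one text line per bucket, with bar length proportional to the count.

    Empty buckets between the fastest and slowest samples are included, so gaps
    in the distribution stay visible.
    """
    counts = Counter(int(s // bucket_ms) * bucket_ms for s in samples_ms)
    biggest = max(counts.values())
    lines = []
    for start in range(min(counts), max(counts) + bucket_ms, bucket_ms):
        n = counts.get(start, 0)
        bar = "#" * round(width * n / biggest)
        lines.append(f"{start:>5}-{start + bucket_ms - 1:<5} ms | {bar} {n}")
    return lines


# Hypothetical response times (ms) with a fast majority and a slow peak-hour group.
samples = [40, 42, 45, 41, 44, 43, 46, 39, 42, 44, 41, 43,
           900, 950, 980, 920, 1010, 940]
for line in text_histogram(samples):
    print(line)
```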
As an alternative, Ms. Bjedov suggested looking at 70/80/90% distribution numbers (that is, the time within which 70%, 80%, or 90% of transactions completed), or drawing your own graphs and histograms and, as a human, trying to figure out what is going on.
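The 70/80/90% numbers are percentiles. A minimal sketch using the nearest-rank definition, which is only one of several common percentile definitions (the workshop didn't say which one its tools used); the sample data is hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least pct%
    of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds.
response_ms = [40, 42, 45, 41, 44, 43, 46, 39, 42, 44, 41, 43,
               900, 950, 980, 920, 1010, 940]

for pct in (70, 80, 90):
    print(f"{pct}% of transactions completed in {percentile(response_ms, pct)} ms or less")
```

Because two thirds of these samples are fast and the rest are slow, the 70th percentile already lands in the slow hump, which is the kind of jump a human reading the numbers should stop and ask about.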
Guideline #5: Performance tests need to be realistic
Finally, performance tests should model the actual use of the system, both the activities and the data. The best way to do this is probably to capture actual usage data in such a way that it can be “played back”; second best is to have the logs leave a “trail of breadcrumbs” that developers or testers can analyze to find out customer usage patterns.
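Following the "trail of breadcrumbs" might look something like this minimal sketch. It assumes a hypothetical access log with one `<time> <user> <action>` entry per line, derives the observed mix of actions, and then generates virtual-user scripts with the same proportions. The log format, action names, and functions are illustrative assumptions, not any particular tool's API.

```python
import random
from collections import Counter

# Hypothetical access-log excerpt: "<time> <user> <action>" per line.
LOG = """\
09:00:01 alice search
09:00:02 bob search
09:00:04 alice view_item
09:00:05 carol search
09:00:07 bob view_item
09:00:09 alice checkout
09:00:10 dave search
09:00:12 carol view_item
09:00:15 dave search
09:00:20 carol checkout
"""


def usage_mix(log_text):
    """Return each action's share of all logged actions (fractions summing to 1)."""
    actions = [line.split()[2] for line in log_text.splitlines() if line.strip()]
    counts = Counter(actions)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}


def virtual_user_script(mix, steps, rng):
    """Draw a sequence of actions whose proportions follow the observed mix."""
    actions = list(mix)
    return rng.choices(actions, weights=[mix[a] for a in actions], k=steps)


mix = usage_mix(LOG)
print(mix)
print(virtual_user_script(mix, 8, random.Random(42)))
```

This reproduces only the proportions of activity; true playback would also preserve ordering, timing, and the data each user touched.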
Without some sort of monitoring or evaluation, a performance test is essentially a shot in the dark. Likewise, if performance will degrade as the data grows, you'll want to test with a large, fully loaded, reasonably realistic database.
Where to go for more
Now that we've covered some ideas on how to think about performance testing, you likely want to dive right in. If so, you might enjoy Mike Kelly's tutorial on using JMeter to create a test, or his follow-up on using JMeter's record/playback capabilities. You might not be able to spin up a thousand servers at the push of a button, but Downing and Bjedov also stressed that many race conditions and concurrency defects are as likely to occur with four users as with four thousand.
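To see why four users can be enough, consider a minimal sketch of a classic check-then-act race in a hypothetical seat-booking service. The `threading.Barrier` is an artificial device that forces the bad interleaving on every run (every user checks availability before anyone books); in real life the same interleaving happens only occasionally, which is what makes it hard to catch.

```python
import threading


class SeatInventory:
    """Hypothetical booking service used to show a check-then-act race."""

    def __init__(self, seats, users):
        self.seats = seats
        self.confirmed = []
        self.lock = threading.Lock()
        # Only here to make the race deterministic for the demonstration.
        self.barrier = threading.Barrier(users)

    def book_unsafe(self, user):
        seat_free = len(self.confirmed) < self.seats  # check
        self.barrier.wait()                           # everyone now holds a stale answer
        if seat_free:
            self.confirmed.append(user)               # act on the stale answer

    def book_safe(self, user):
        with self.lock:                               # check and act as one atomic step
            if len(self.confirmed) < self.seats:
                self.confirmed.append(user)


def simulate(method_name, users=4, seats=1):
    """Run `users` concurrent bookings against `seats` seats; return confirmed bookings."""
    inventory = SeatInventory(seats, users)
    book = getattr(inventory, method_name)
    threads = [threading.Thread(target=book, args=(f"user{i}",)) for i in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inventory.confirmed


print("unsafe:", len(simulate("book_unsafe")), "bookings for 1 seat")
print("safe:  ", len(simulate("book_safe")), "booking for 1 seat")
```

With the unsafe version, four concurrent users are enough to sell one seat four times; no server farm is required to expose the defect.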
Or, instead of testing, you could 'just' get to work on hot-rollback capability and deploy continuously. Either way, the next few years will be a fascinating time for advances in engineering and in the components we use to develop and test.
Yet until our systems become self-aware, we’ll need a smart human evaluating the results after the fact, especially for performance testing.