Reliability testing is an often-overlooked aspect of performance testing. The key to reliability testing is to measure how long the application will run under reasonable load. The approach I take on all of my teams is a four-step process. First, I work with product management to determine what a typical day's worth of load might look like. This is described as a number of transactions, with the percentage of transactions broken down across application features or pages. Second, I extrapolate from that the number of transactions the site would process over the uptime we want to prove, measured in weeks or months. Third, knowing the application's peak throughput (transactions per second or transactions per minute), I calculate how long my test needs to run at 80%, 85%, or 90% of that throughput in order to achieve the reliability goal. Fourth, I execute my test.
A real-world example might help. Let's pretend I'm the test manager for Google's search page, google.com (and my numbers are all fictitious). Let's also pretend that Google performs 100 searches per day with 5 transactions per search, and that Google's maximum throughput is 10 transactions per minute (TPM). Finally, let's assume we agree as a company that we want to prove the site can run for 30 consecutive days. Here are my calculations, then:
100 searches per day x 5 transactions per search x 30 days = 15,000 total transactions
80% of 10 TPM = 8 TPM
15,000 transactions / 8 TPM = 1,875 minutes
So I can conclude a couple of things. First, I can simulate 30 days of traffic for my site at 80% of capacity in 1,875 minutes (a little over 31 hours), and second, my site had better run 1,875 minutes or longer at this load.
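For anyone who wants to rerun this arithmetic with their own figures, here is a minimal Python sketch of the same calculation. The function name and parameters are my own invention for illustration, and the inputs are the fictitious numbers from the example above (100 searches per day, 5 transactions per search, 30 days, a 10 TPM peak). It simply divides the total transaction count by the target throughput, and it loops over the 80%, 85%, and 90% load levels mentioned earlier.

```python
def required_test_minutes(transactions_per_day, days, peak_tpm, load_fraction):
    """Return how many minutes a test must run at a fraction of peak
    throughput to replay `days` worth of typical traffic.

    All parameter names are hypothetical; peak_tpm is the measured peak
    throughput in transactions per minute.
    """
    if not 0 < load_fraction <= 1:
        raise ValueError("load_fraction must be in the range (0, 1]")
    total_transactions = transactions_per_day * days
    target_tpm = peak_tpm * load_fraction
    return total_transactions / target_tpm


# Fictitious inputs from the example: 100 searches/day, 5 transactions each.
searches_per_day = 100
transactions_per_search = 5
daily_transactions = searches_per_day * transactions_per_search

for load in (0.80, 0.85, 0.90):
    minutes = required_test_minutes(daily_transactions, 30, 10, load)
    print(f"{load:.0%} of peak: {minutes:.0f} minutes ({minutes / 60:.1f} hours)")
```

Running the test at a higher fraction of peak shortens the required duration, at the cost of pushing the application closer to its limit for the whole run.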
One final thought: Reliability testing is part proving and part testing. I always aim to prove my site can handle the expected duration, but once that has been proven, I tend to leave my tests running in order to probe the limits. I'm looking to establish what the longest expected service window could be, for instance.
In the end, reliability testing is a core, but often overlooked, component of testing. Most operations teams benefit, however, from an understanding of the service window they can expect with an application. More importantly, proving an application's reliability is a key component of proving whether an application is ready for release.
This was first published in August 2009