While there's always going to be an issue or two that sneaks by from time to time, if you're seeing a pattern of missed bugs, you're right to step back and reevaluate your performance testing strategy. It's difficult to tell you what to add without looking at your application and the tests that you run today. However, I can share some examples of what I've done in the past in similar situations.
One trap I often see teams fall into is spending most of their performance testing time confirming that performance requirements are met, rather than looking for performance issues. When you're looking for potential issues, the tests you run will often look different, as will the nature and pace of the work. Scott Barber outlines many of these differences in his article "Investigation vs. Validation."
One of the first things I look at when I'm engaging with a team that's noticing performance issues is who's involved. I've worked on a couple of projects where all the performance-test execution and monitoring was done by the performance tester. While I know several performance testers who have the knowledge required to work effectively solo, it's a challenge for anyone to dig into everything at once: app logs, database analysis reports, JVM performance, physical hardware performance, network devices, test tool logs and results, and so on. It's a lot.
If your application is in any way complex, you'll want multiple people involved, with each person digging into a different area of the application to uncover potential problems. This often also brings a more diverse set of monitoring tools to the table. Another common issue is that the only numbers that get looked at are the numbers generated by the test tool. Doing this glosses over a lot of other information and can leave potential problems lurking behind the scenes. Make sure you're looking at the problem from multiple angles using a variety of tools.
Another common mistake I see is that teams become afraid to deviate from their baseline workload models. When you're looking for issues, you often want to break out of these models or stretch them in interesting ways. The team doing the investigation should meet on a regular basis and review a prioritized list of what-if scenarios. Use the information you learn from each test run to make better decisions about what the next best test to run might be. If you just did a round of tests focused on potential long-running queries and that didn't turn up anything concerning, then use the next test to look at something else, such as high levels of session abandonment. Or, if the test for long-running queries yields some concerning numbers or errors, then dig in with more tests focused on the database.
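As a rough illustration of that review loop, here's a minimal Python sketch of a what-if backlog that gets reprioritized after each run. Everything in it is an assumption made for the example: the scenario names, the areas, the priority numbers, and the boost value are hypothetical, and a real team would make this call in conversation rather than in code.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    area: str          # part of the system the scenario probes (hypothetical labels)
    priority: int      # higher numbers run sooner
    done: bool = False


def build_backlog():
    """Hypothetical starting backlog; replace with your team's own what-if list."""
    return [
        Scenario("long-running queries", "database", 5),
        Scenario("high session abandonment", "sessions", 4),
        Scenario("connection pool exhaustion", "database", 2),
    ]


def record_result(backlog, name, concerning, boost=10):
    """Mark a scenario as run; if it looked concerning, promote related scenarios."""
    finished = next(s for s in backlog if s.name == name)
    finished.done = True
    if concerning:
        for scenario in backlog:
            if not scenario.done and scenario.area == finished.area:
                scenario.priority += boost


def next_scenario(backlog):
    """Return the highest-priority scenario that has not been run yet, or None."""
    pending = [s for s in backlog if not s.done]
    return max(pending, key=lambda s: s.priority) if pending else None


if __name__ == "__main__":
    backlog = build_backlog()
    first = next_scenario(backlog)
    # Assume the first run turned up concerning numbers.
    record_result(backlog, first.name, concerning=True)
    print("Next test to run:", next_scenario(backlog).name)
```

The point of the sketch is only that the order of tests isn't fixed up front: a clean result moves the team on to a different area, while a concerning one pulls related tests forward.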
A further place to look might be your test data or environments. I've seen a lot of teams skimp on setting up test data or leave out environment components (like a load balancer) because they can be hard to create, set up, or maintain. This can often hide problems that you'll only find when you get to production. Make sure you have enough (and enough variety of) data to push your system like it will be pushed in production. Look at the number of user IDs you have, the number of records in your databases, or the number of files you'll have to serve up. Make sure your environment is set up the same way as production, too. Look at how network devices are configured, how servers are clustered, or perhaps even where your load is coming from geographically (which you might need to simulate with your load generators).
Finally, test at different layers within the application and at different points in the development process. There's a lot of performance testing that can happen at the unit/component/service layer, and much of that can be done early in an iteration. In addition, it's often easier to stress an individual component of a system directly than it is to stress it by pushing large numbers of virtual users through a user workload model.
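To give a sense of what a component-level test can look like, here's a minimal Python sketch that calls a single component concurrently and summarizes the latencies. It is an illustration under assumptions, not a replacement for a load testing tool: `fake_lookup` is a hypothetical stand-in for whatever component you want to stress, and the call counts, worker counts, and nearest-rank percentile are choices made for the example.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("no samples to summarize")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies(component, calls=200, workers=8):
    """Call `component` `calls` times across `workers` threads; return seconds per call."""
    def timed_call(_):
        start = time.perf_counter()
        component()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_call, range(calls)))


def fake_lookup():
    """Hypothetical stand-in for a real component, such as a service or DAO call."""
    time.sleep(0.001)


if __name__ == "__main__":
    samples = measure_latencies(fake_lookup, calls=100, workers=4)
    print(f"median: {percentile(samples, 50) * 1000:.2f} ms")
    print(f"95th percentile: {percentile(samples, 95) * 1000:.2f} ms")
```

Because a check like this needs no virtual users or full environment, it can run early in an iteration, and raising `workers` or `calls` is a cheap way to stretch the load on one component.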
Hopefully that's enough to get you started down the road of finding more issues before you get to production. Just recognize when you plan the work that the time it takes to investigate is less predictable than the time it takes to validate. Over time, you'll find that the team that does the investigation will get more efficient as they develop both rhythm and a plan of attack for the types of issues you're finding in production.