Application performance is often a series of tradeoffs that occur throughout the application lifecycle. Many times, we don't even know we're making a tradeoff. For example, if we cut the schedule for the development of a particular service, we don't recognize that logging or monitoring functionality might be the code that ends up not getting written. If we allow for emergent design, we may not recognize the need for a focus on low-level performance metrics early in the project. Or perhaps we miss implicit tradeoffs, such as a design that maximizes usability from a capability perspective but kills usability in certain performance scenarios.
This is why application performance testing and monitoring can be so important for some projects. As teams work to test pre-production, and monitor post-production, they are often looking to tune their application to an ever-changing operating environment with an evolving user population. Tuning application performance isn't unlike sound-mixing – where you're asking people with some specialized skills to "listen" to your application and move a bunch of knobs, sliders, and dials to obtain optimal performance.
In this article, we explore some simple strategies for helping manage some of the tradeoffs associated with application performance testing and monitoring. We talk about testing and monitoring separately, not because monitoring isn't a subset of testing (it is), but because monitoring often extends well past the testing phase of a project into production. Testing is about developing an understanding of application performance limits and providing evidence to support conclusions. Monitoring helps with that, but it's also how we identify issues in real time. Armed with this real-time information, we can hopefully respond to issues quickly enough to avoid outages, lost revenue, or other production mishaps.
Tying performance metrics back to business metrics
Late last year, Stoyan Stefanov came out with an excellent presentation titled The performance business pitch. In the presentation, Stefanov looks at key business metrics for some popular websites and shows how application performance can correlate to both top-line and bottom-line business results. One fantastic example from his presentation is Shopzilla, which by redesigning its site was able to drop page load times by 80% and subsequently increase revenue by 7 to 12%, with corresponding increases in page views and paid search traffic. Another is Netflix, which, by enabling gzip compression for plain text components, noticed a sudden drop in outbound network traffic. This drop lowered their bandwidth bill by 43%.
It's my experience that few teams take the time to really understand how application performance can translate into key business metrics like those Stefanov illustrates. Without this understanding, performance testing and monitoring efforts can become large exercises in capturing a lot of data, but then not knowing how to make sense of it. When you're testing, tuning, or monitoring – if you want your efforts to be meaningful and focused then you need to have some clear context for what's important to the business and why.
Performance testing guru Scott Barber often talks about understanding "intent" when doing performance testing. In a large part, this is what he's talking about. By developing a working understanding of the business drivers for application performance, a skilled performance tester can better make sense of the performance goals and objectives, and develop a suite of tests that can bring to light some of the tradeoffs in play for a project. This often also makes test and monitoring results more actionable. Because results are phrased relative to the business drivers for the system or application, they create a more compelling story for change.
Take the time to define both top-line and bottom-line application business metrics for your project. What do the project stakeholders care about? And how much do they care about it? Work with them to get a prioritized list, in the language of the business – not technology, of what they care about and why. If possible, try doing a short couple-hour workshop on the topic and document the findings for their review. As you prioritize the metrics, you might consider creating different tiers. Sometimes it's easier to classify goals using broad buckets than it is to force rank them. This can also allow different stakeholders, who might have conflicting goals, to more easily communicate relative importance.
Identify what processes and transactions will affect those key metrics. This mapping of business processes to technology processes is sometimes very straightforward if your application has a focused purpose. But it can be difficult if you have a diverse user population or range of functionality. Often customer advocates, analysts, architects, DBAs, and UI designers can help with this mapping. The goal is to figure out, for each metric, what you would change about the application's performance if you wanted to affect that metric. If you can't identify any performance implications for a metric, perhaps that means it doesn't belong on your list. At the end of this exercise, you might have a taxonomy of different aspects of your system and how they might affect key business metrics.
You want to get to a level of understanding where, instead of asking the question "How fast is fast enough?" you're instead asking "If we want better performance for metric X, how might we obtain it and what other business metrics might be impacted by making that change?" Not only does this help you better understand what you need to test, but it also provides insight into what you might need to be monitoring and what types of monitoring might be helpful.
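As a rough illustration of what that taxonomy might look like once it's written down, here is a minimal Python sketch. The metric names, tiers, and transaction names are all hypothetical; the point is only that once metrics are mapped to transactions, you can ask both "which metrics have no performance implications?" and "which metrics might move if I change this transaction?"

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BusinessMetric:
    name: str
    tier: int  # 1 = most important bucket; tiers instead of a forced ranking
    transactions: List[str] = field(default_factory=list)


# Hypothetical metrics and transaction names, for illustration only.
METRICS = [
    BusinessMetric("checkout conversion rate", 1, ["add_to_cart", "submit_order"]),
    BusinessMetric("search-driven revenue", 1, ["search", "add_to_cart"]),
    BusinessMetric("bandwidth cost", 2, ["static_asset_fetch"]),
    BusinessMetric("brand awareness", 3, []),
]


def unmapped_metrics(metrics: List[BusinessMetric]) -> List[str]:
    """Metrics with no identified transactions; candidates to drop from the list."""
    return [m.name for m in metrics if not m.transactions]


def metrics_by_transaction(metrics: List[BusinessMetric]) -> Dict[str, List[str]]:
    """Reverse index: for each transaction, the metrics a change might affect."""
    index: Dict[str, List[str]] = {}
    for metric in metrics:
        for transaction in metric.transactions:
            index.setdefault(transaction, []).append(metric.name)
    return index


if __name__ == "__main__":
    print(unmapped_metrics(METRICS))
    print(metrics_by_transaction(METRICS)["add_to_cart"])
```

A spreadsheet works just as well. What matters is that the mapping is explicit enough to be queried in both directions.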
Using information like this, teams can often use performance targets and goals to drive down to specific service level agreements (SLAs) around performance. By defining low-level SLAs across the application, each team or developer knows how fast their code needs to be. It also means they know performance is now their business, not someone else's. They can't (reasonably) put off that testing until the end, because they know what they're going to be held accountable to. And in my experience, most teams want to know this information. They don't like the surprise later on.
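To make the idea of cascading SLAs a bit more concrete, here is a small Python sketch. It assumes a hypothetical end-to-end target of 2,000 ms and four made-up components that are called one after another, so their budgets simply add up. Real systems with parallel calls or caching would need a different model.

```python
# Hypothetical end-to-end target and component budgets, in milliseconds.
END_TO_END_TARGET_MS = 2000

COMPONENT_BUDGETS_MS = {
    "web_tier": 300,
    "order_service": 500,
    "inventory_service": 400,
    "database": 600,
}


def budget_headroom_ms(budgets, target_ms):
    """Time left over after every component spends its full budget.

    Assumes the components run sequentially, so budgets are additive.
    A negative result means the budgets cannot meet the target.
    """
    return target_ms - sum(budgets.values())


def sla_violations(measured_ms, budgets):
    """Map each measured component that exceeded its budget to the overage in ms."""
    return {
        name: measured_ms[name] - budget
        for name, budget in budgets.items()
        if name in measured_ms and measured_ms[name] > budget
    }


if __name__ == "__main__":
    print(budget_headroom_ms(COMPONENT_BUDGETS_MS, END_TO_END_TARGET_MS))
    measured = {"web_tier": 250, "order_service": 480, "database": 750}
    print(sla_violations(measured, COMPONENT_BUDGETS_MS))
```

Keeping some headroom unallocated is a judgment call. It gives you room for network time and for the components nobody remembered to budget.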
Correlating user behavior with performance metrics
It's not uncommon for a single application to have multiple performance profiles. Some of those might be predictable. For example, many financial applications see heavier usage around end-of-quarter or end-of-year processing. Or you might see seasonal variance in the types or volume of transactions people process. Other usage profile changes might not be as predictable. Go-live might look different than steady state. Or a large advertising campaign or new product announcement might change the way people use your site.
Not understanding or anticipating the different usage models for your application can lead to a false sense of security around your test coverage. You might be passing all your tests, just to fail on go-live. Or perhaps worse, you might be running along for months in production before serious problems occur. If you know about a potential problem in advance, either through modeling or proactive monitoring, you have time to design creative solutions to deal with the issues.
Usage modeling isn't always just about trying to test different worst-case scenarios either. It can also mean looking at different ways to utilize excess capacity in the "off season" or testing to see if dynamic provisioning will support certain growth scenarios. You might also try different usage models just so you can build a taxonomy of different load profiles. Later on, when you're monitoring production, you can go back and correlate what you're seeing in production to different load patterns from your testing phase. Sometimes that can provide insight into potential problems. "Oh yeah, I remember seeing that before. Last time we saw that we were…."
In a previous article on analyzing performance plateaus and stress areas, I provided an overview of how you can use various UCML diagrams to do this type of analysis. In his article on the User Community Modeling Language (UCML) Scott Barber outlines how UCML can help visually depict complex workloads and performance scenarios. In Barber's own words, UCML can "serve to represent the workload distributions, operational profiles, pivot tables, matrixes and Markov chains that performance testers often employ to determine what activities are to be included in a test and with what frequency they'll occur."
The power behind a modeling approach like this is that it's intuitive to developers, users, managers and testers alike. That means faster communication, clearer requirements and better tests. When performing this type of modeling I start with the end user in mind. What will the user do with the software? What types of transactions do they care about? What time of day will they do it? What needs to be set up in the systems for them to be successful? The list of questions goes on and on.
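Once you have answers to questions like these, one simple way to carry them into a test is as a weighted transaction mix per usage profile. The Python sketch below is not UCML itself, just a plain workload distribution. The profile names, transaction names, and weights are invented for illustration.

```python
import random

# Hypothetical usage profiles: each maps a transaction to its share of traffic.
PROFILES = {
    "steady_state": {"browse": 0.70, "search": 0.20, "checkout": 0.10},
    "end_of_quarter": {"browse": 0.30, "search": 0.20, "run_report": 0.50},
}


def validate_profile(profile):
    """Raise ValueError unless the transaction weights sum to 1."""
    total = sum(profile.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total}, expected 1.0")


def sample_transactions(profile, count, seed=0):
    """Draw `count` transaction names according to the profile's weights.

    A fixed seed makes the sequence repeatable, so two test runs
    exercise the same mix in the same order.
    """
    validate_profile(profile)
    rng = random.Random(seed)
    names = list(profile)
    weights = [profile[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


if __name__ == "__main__":
    schedule = sample_transactions(PROFILES["end_of_quarter"], 10, seed=42)
    print(schedule)
```

Most load testing tools have their own way of expressing a mix like this. The value of writing it down separately is that the same profile definitions can later be compared against what you observe in production.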
Test and build out monitoring as you're writing the code
The best time to be thinking about possible performance issues is when you're actively designing and writing code. You're still close to the code. You know what it does. You know how it works. For each day away from the code, the fuzzier it gets in your head. The more developers committing to the codebase, the less you really know about the code you're interacting with.
All of this points to thinking about performance issues while you're writing code. That's also the best time to be thinking about how you're going to answer questions about performance in production. Aside from delaying issues until later in the project, waiting until the end to test for performance also makes it much more difficult to make sweeping changes if they are needed. That type of feedback often needs to happen up-front. If you need to come up with a creative way to log a critical piece of real-time information, or if you need to speed things up by an order of magnitude, you would like to know that while you're writing the code. Not days, weeks, or even months later.
For most projects today, it's relatively easy to test close to the code. While that's not true for all languages or deployment platforms (think mainframe or some embedded systems), for most of us, we could likely spend three times as long testing our code as we did writing it if we wanted to. For example, for Java, tools like JUnitPerf, JMeter, and an assortment of JVM monitoring tools have made it quite easy to capture unit or component-level performance. If you did a good job cascading SLAs down to the various components, then each developer (or team of developers) should be able to measure component-level performance against those goals before integration begins.
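The same idea works in any language. As a minimal sketch (in Python rather than Java, and using only the standard library rather than any of the tools named above), the following times a component repeatedly and checks a percentile against its SLA. The function under test, the 50 ms SLA, and the choice of the 95th percentile are all assumptions made for the example.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_ms(func, runs=50):
    """Call func `runs` times and return each call's wall-clock duration in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000)
    return samples


def meets_sla(samples_ms, sla_ms, pct=95):
    """True when the chosen percentile of the samples is within the SLA."""
    return percentile(samples_ms, pct) <= sla_ms


def price_order():
    # Hypothetical stand-in for the component being measured.
    return sum(i * 1.07 for i in range(1000))


if __name__ == "__main__":
    samples = measure_ms(price_order, runs=100)
    print("p95 ms:", round(percentile(samples, 95), 3))
    print("meets 50 ms SLA:", meets_sla(samples, sla_ms=50))
```

Checking a percentile rather than the average is a deliberate choice here: a handful of very slow calls can hide behind a healthy mean. Timings on a developer machine are only indicative, so treat a check like this as an early warning rather than a verdict.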
Similarly, monitoring and alerts can also be addressed at this level. In some cases, you can simply instrument using a company-wide tool (like Introscope, OpenNMS, or New Relic), or you can leverage some simple logging (log4j, log4r, or log4net), or if the need calls for it, you can build out something more elaborate. Even if you can't implement the proper level of monitoring in your development environment, you can flag the monitoring task as a deployment or post-deployment activity.
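For the "simple logging" option, a sketch of what that instrumentation might look like is below. It uses Python's standard `logging` module as a stand-in for the log4j family, and the logger name, operation name, and 200 ms threshold are hypothetical. Each call is logged with its duration, and calls that exceed the threshold are logged as warnings so an alerting tool has something to key on.

```python
import functools
import logging
import time

logger = logging.getLogger("perf")  # hypothetical logger name


def timed(operation, threshold_ms):
    """Log how long the wrapped call took; warn when it exceeds threshold_ms."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the call returned or raised, so failures are timed too.
                elapsed_ms = (time.perf_counter() - start) * 1000
                level = logging.WARNING if elapsed_ms > threshold_ms else logging.INFO
                logger.log(
                    level,
                    "%s took %.1f ms (threshold %s ms)",
                    operation,
                    elapsed_ms,
                    threshold_ms,
                )
        return wrapper
    return decorator


@timed("lookup_inventory", threshold_ms=200)
def lookup_inventory(sku):
    # Stand-in for a real service or database call.
    return {"sku": sku, "in_stock": True}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    lookup_inventory("A-100")
```

Note that the timing itself has a cost, which is one more reason to run your performance tests with this kind of instrumentation switched on.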
To help identify where monitoring and alerting might be handy, I've developed the habit of doing sequence diagram walkthroughs. I start by asking the following two questions for each exchange between participants:
- What would happen if that failed, and how would I know?
- How would I know if that were running fast enough, and what would the implications be if it slowed down?
Putting these tools in place, and actively monitoring your application while doing your performance testing, does a couple of things for you. First, it tells you if your monitoring is set up correctly. Second, it provides visibility into how any logging or monitoring you've implemented might impact application performance. Third, and most importantly, it provides a set of "second opinions" on your performance testing results.
Relying only on a single tool to provide your performance metrics can be a dangerous approach. Often, tests are complicated and require a lot of subtle configurations. Environments can be tricky to get full visibility into, especially virtual environments. By leveraging your monitoring alongside your performance testing tool(s) you can develop the habit of validating numbers using multiple methods.
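One lightweight way to build that habit is to line up the numbers from two sources and flag the transactions where they disagree. The Python sketch below assumes you can export average response times per transaction from both your load tool and your monitoring. The transaction names, figures, and 10% tolerance are invented. Some gap is expected (a load tool typically includes network time that server-side monitoring doesn't see), so the tolerance is something to tune, not a rule.

```python
def cross_check(load_tool_ms, monitoring_ms, tolerance=0.10):
    """Return transactions whose two measurements differ by more than `tolerance`.

    The difference is relative to the larger of the two values. Only
    transactions present in both sources are compared.
    """
    disagreements = {}
    for name in sorted(load_tool_ms.keys() & monitoring_ms.keys()):
        a, b = load_tool_ms[name], monitoring_ms[name]
        larger = max(a, b)
        difference = 0.0 if larger == 0 else abs(a - b) / larger
        if difference > tolerance:
            disagreements[name] = round(difference, 3)
    return disagreements


if __name__ == "__main__":
    # Hypothetical averages, in milliseconds, from the same test run.
    load_tool = {"login": 200, "search": 480, "checkout": 900}
    monitoring = {"login": 205, "search": 310, "checkout": 880}
    print(cross_check(load_tool, monitoring))
```

A disagreement doesn't tell you which tool is wrong. It tells you where to go look before you trust either number.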
Managing focus while testing
With all this focus on figuring out the right metrics to care about, usage models to use, which tools to use for monitoring and component-level performance testing, you might be asking yourself where writing and running a more traditional load test comes in. Well, in my experience that's not where teams fall down when it comes to managing the application lifecycle. However, I'll leave you with this last set of tips for managing focus while writing and running your tests.
Often, teams and testers can develop tunnel vision when it comes to creating and running their tests. They become overly focused on one aspect of performance: the end user experience, how long specific transactions take, or what server profiles might look like while under load. There's nothing wrong with focusing in on where you think the risk is, but just make sure you don't do it to the exclusion of everything else.
One-sided focus means incomplete test coverage, which can lead to false confidence and issues waiting to happen when you get to production. Balancing focus means you're less likely to overlook a process not in the critical path or in the common scenarios. It can help test many of the assumptions you've made around what you think is happening when you run a test.
To help protect against this pathology, on a regular basis, for one or two tests take the time to track a transaction through every level of the system (database, services, batch jobs, etc.) and get individual timings under load. Look at the profile of the servers while they are processing that transaction. Understand what the impact might be to the end user. Try to become the world's foremost expert on what's going on in the system, for that thin sliver of time, for that small transaction.
These "deep dives" into your testing are also fantastic ways to better understand what's happening each time you run a test. It's an easy thing to tell someone to do, but for any system of even moderate complexity, it's incredibly difficult. Believe me when I say it's time well spent. Not only will it make your testing better, but if you involve the rest of the application team while you're doing it, it's almost like practicing a fire drill for a production issue. Teams that can do this analysis well can respond to production issues well. It's almost like a capstone project, tying together all your models, metrics, and logging into one neat package.
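When you do one of these walkthroughs, it helps to write the timings down in a form that shows what you can't yet explain. The Python sketch below assumes you've collected the time a single transaction spent in each tier, plus the end-to-end time the user saw. The tier names and numbers are made up, and it assumes the per-tier times don't overlap.

```python
def tier_breakdown(tier_ms, end_to_end_ms):
    """Summarize where one transaction's time went.

    Returns (rows, unaccounted_ms). Each row is (tier, ms, share of the
    end-to-end time), slowest tier first. unaccounted_ms is whatever the
    per-tier timings fail to explain.
    """
    if end_to_end_ms <= 0:
        raise ValueError("end_to_end_ms must be positive")
    rows = sorted(
        ((tier, ms, ms / end_to_end_ms) for tier, ms in tier_ms.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    unaccounted_ms = end_to_end_ms - sum(tier_ms.values())
    return rows, unaccounted_ms


if __name__ == "__main__":
    # Hypothetical timings for one transaction under load, in milliseconds.
    timings = {"web": 40, "service": 110, "database": 230, "batch_queue": 20}
    rows, unaccounted = tier_breakdown(timings, end_to_end_ms=450)
    for tier, ms, share in rows:
        print(f"{tier:12s} {ms:5d} ms  {share:6.1%}")
    print(f"{'unaccounted':12s} {unaccounted:5d} ms")
```

The unaccounted number is often the most interesting one. A large gap usually means there's a hop, a queue, or a network segment you haven't instrumented yet.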
For more on application lifecycle performance testing and monitoring, take a look at Matt Heusser's Just-enough application lifecycle management (ALM). Also worth a glance is David Johnson's Building a Performance Assurance Center of Excellence tutorial. Finally, for a more in-depth look at just performance testing, there is a series of links in a short answer I gave for a question on application performance testing across company networks.