I absolutely love emergent design and architecture. My experience has been that the solutions that come from working in this fashion tend to be far better than those I’ve seen “planned” up front.
Performance issues emerge as the application grows
If you’ve worked on an Agile project, then you’re likely familiar with the following pattern:
- Sprint 1: We lay some foundational code, getting scaffolding in place, basic templates, and we get tools like CI and basic code coverage in place.
- Sprint 2: We start to lay in some core features. We’re focused on getting enough functionality that we can release to production and start the “release to prod at the end of every sprint” cycle.
- Sprint N: We deploy our first usable code to production, and we get users on it so we can get feedback.
- Sprint N+1: We’re iterating on customer feedback and stories for new features.
- Sprint N+5: We find that some core piece of functionality (our homepage, search, or some other feature) is dog slow for some reason. It happened over time, but it’s finally gotten to the point where the team writes a story to crush the problem.
- Sprint N+6: We’re rolling smooth again in production, on to more stories.
I’ve seen this pattern unfold a number of times now. Sometimes a slowdown doesn’t happen until later in the project, and sometimes it’s sudden and not something that happens slowly over a number of sprints. In either case, it’s often part of working in the Agile landscape where you’re moving in tight iterations, releasing to production with each sprint, and allowing both design and architecture to emerge over time.
Getting visibility on the problem
In early sprints, operational concerns take a backseat to getting core functionality in place. That’s just the natural order of backlog prioritization. However, at some point in the project there’s a tipping point, and operational concerns quickly move up the list in priority. Once you have “enough” foundational functionality in place, user experience quickly climbs the list of priorities and stories focused on improving or protecting that experience start to get pulled into sprints.
When this tipping point occurs, that's often when I start to really think about what the team's strategy is for managing application performance. Managing performance isn't just testing; it's understanding how you're performing in production, where your most taxing queries are, and which areas of the application are most threatened by iterative architectural or design decisions. To help get visibility on some of those aspects, I find it helpful to get the following two stories into the backlog:
- Getting production performance monitoring in place (with alerting) for key site features/services where there might be performance concerns. These could be client facing (like measuring homepage load time every 30 minutes) or they could be monitors running in the background (like alerting on any database query taking over X seconds).
- Getting performance metrics from production into the regular lineup of metrics we review as a team. If we regularly monitor things like new user signups/cancelations, user time in session, or transaction counts for key features, then I want a set of regular performance metrics listed alongside those so we can notice trends over time.
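As a rough sketch of what the first story might produce, here's a minimal client-facing monitor in Python. The URL and the two-second budget are hypothetical; the real values would come out of your team's conversations about what "fast enough" means, and a real monitor would page someone or post to a dashboard rather than print:

```python
import time
import urllib.request

# Hypothetical values -- the real URL and budget come from your team's
# agreed performance expectations, not from this sketch.
HOMEPAGE_URL = "https://example.com/"
BUDGET_SECONDS = 2.0

def over_budget(elapsed_seconds, budget_seconds=BUDGET_SECONDS):
    """True when a measurement exceeds the agreed performance budget."""
    return elapsed_seconds > budget_seconds

def measure_load_time(url=HOMEPAGE_URL):
    """Fetch the page and return elapsed wall-clock seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url) as response:
        response.read()
    return time.monotonic() - start

def check_homepage():
    """Run one measurement; alert (here, just print) when over budget."""
    elapsed = measure_load_time()
    if over_budget(elapsed):
        print(f"ALERT: homepage took {elapsed:.2f}s "
              f"(budget {BUDGET_SECONDS}s)")
    return elapsed
```

A scheduler (cron, or your monitoring tool of choice) would run `check_homepage` every 30 minutes; the same shape works for background monitors like slow-query alerts.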
Each of those will take a bit of tool selection, setup, and coding. And they both come with an obligation to act on the data they provide. Issues identified by either need to be added to the backlog along with anything else, and prioritized along with other stories and bugs on a sprint-to-sprint basis.
In addition to monitoring and trending, I also find it helpful to make sure there are some basic diagrams in place showing key infrastructure components, configurations, services, and data stores. These could be simple sequence and network diagrams. The goal isn’t comprehensive documentation -- it’s to create a crib sheet highlighting some of the nuances that you’re going to have forgotten three sprints after you’ve set something up in production.
“Done” includes performance
Most teams have a shared definition of what it means to be “done” with a story. Often this includes some level of testing, code maintenance, and customer acceptance. To make sure iterative design and architecture don’t become unmanageable problems, I often advocate that the team add the following:
- Diagrams: Confirmation that any core diagrams (the ones just mentioned above) are updated as appropriate based on the changes in this story.
- Monitors: Confirmation that any needed production monitoring/trending items have been completed along with this story, or new tasks have been added to the backlog to ensure they don’t get forgotten.
- Performance Tests: Confirmation that any required new performance tests have been completed. This can range from creating unit-level performance tests for the CI environment to running full-blown integration or system-level performance tests, and being prepared to show the results of those tests as part of the sprint demo.
In many cases, all three of these can be summarized into one higher-level task called “story-level performance management.” Keep in mind that for most stories, there will be nothing to do related to this task. In my experience, most stories don’t have the potential for deep performance impact. However, for stories where performance management is an issue, it’s a safety net to ensure these simple tasks don’t get forgotten in the day-to-day pressures of getting ready for your sprint demo.
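To make the unit-level idea concrete, here's a minimal sketch of a performance test that could run in CI. Everything in it is hypothetical: the `search` function is a stand-in for your actual feature, and the 50 ms budget is a made-up number your team would negotiate through the "how fast is fast enough?" conversation:

```python
import time

# Hypothetical budget: the team agrees search should complete in under
# 50 ms against a small fixture data set. A number like this comes out
# of sprint-demo conversations, not out of thin air.
SEARCH_BUDGET_SECONDS = 0.050

def search(records, term):
    """Stand-in for the feature under test -- a simple linear scan."""
    return [r for r in records if term in r]

def test_search_stays_within_budget():
    """Fails the build if search regresses past its agreed budget."""
    records = [f"record {i}" for i in range(10_000)]
    start = time.perf_counter()
    results = search(records, "record 42")
    elapsed = time.perf_counter() - start
    assert results, "search should find matches in the fixture data"
    assert elapsed < SEARCH_BUDGET_SECONDS, f"search took {elapsed:.3f}s"
```

A test like this runs with the rest of the suite on every commit, which is what turns "performance is part of done" from a checklist item into something the build actually enforces.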
If a story has performance implications, make sure that you demo the results of your performance testing after you demo the functionality of the feature. This is your opportunity to open up the conversation to the client for “How fast is fast enough?” In many cases, these discussions lead to new stories for the backlog, but they also result in fewer mismatched expectations around current-state performance and what priority performance should have in the short and long term.
At times, the team will decide to engage in full-blown performance test cycles. This may be in response to a production incident, or, if the team has the foresight, at key points in the product roadmap where they know they have the potential to really impact performance. This impact could come from a technical change, or it could come from an anticipated influx of new users due to marketing or partnerships.
When I describe these test cycles as “full-blown,” I’m simply trying to imply that there’s formal structure around the team’s approach. Instead of taking a story-by-story approach, they are instead tackling the problem from a systems perspective. These tests will often model multiple parts of the system interacting at the same time, and in many cases you’re not just testing to prove “it works,” but instead you’re also testing to see to what degree “it works.”
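As a sketch of that systems-level approach, the following Python fragment drives many simulated users at once and summarizes the response-time distribution. The `simulated_request` placeholder is hypothetical; a real test cycle would use a dedicated load-testing tool and real traffic models, but the shape is the same: drive concurrency, collect timings, report percentiles so you can see to what degree "it works":

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_request(user_id):
    """Stand-in for one user exercising a feature end to end."""
    start = time.perf_counter()
    time.sleep(0.01)  # placeholder for real work (HTTP call, DB query, ...)
    return time.perf_counter() - start

def run_load_test(concurrent_users=20, requests_per_user=5):
    """Run many simulated users concurrently; summarize response times."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        timings = list(pool.map(
            simulated_request,
            range(concurrent_users * requests_per_user),
        ))
    timings.sort()
    return {
        "requests": len(timings),
        "median_s": statistics.median(timings),
        "p95_s": timings[int(len(timings) * 0.95) - 1],
    }
```

Reporting a median and a high percentile, rather than a single average, is what makes these results useful in a sprint demo: the tail is usually where the conversation about priorities starts.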
These performance test cycles might only happen occasionally, but when they do, you should have a mixed set of action items/stories resulting from them:
- Bugs that need to be fixed
- Stories for possible tuning/optimization activities
- Stories for possible additional testing activities
- Stories for possible new production monitoring events
If you have an incredibly performance-dependent system or rapid growth in new users/transactions, then you might very quickly build a team whose full-time job is this testing and the tuning/optimization work that follows. If that's the case, those teams often work out of their own backlog and will have their own ideas about what might be added to the "done" checklist for new features. If you're lucky enough to have the problem of this type of growth, then you're likely past the stage of a handful of developers trying to manage performance in addition to all the other aspects of software quality and user experience.
On an Agile project, performance is everyone's responsibility: developers, testers, and the product owner. Managing performance requires communication, an informed understanding of where you're at today, and an idea of what the major concerns will be down the road based on the product roadmap. Developing techniques for getting visibility on possible problems, for preventing them, and for ensuring you're talking about them regularly is the best way to ensure that while your design and architecture change over time -- your focus on what's important to test and monitor does too.
About the Author
Mike Kelly is currently a partner at DeveloperTown, a venture development firm. He regularly writes and speaks about topics in software testing, but spends most of his time working with teams to deliver ridiculously high-quality solutions faster than they could without rigorous testing practices. You can learn more about Mike and his other works on his website, www.MichaelDKelly.com.
This was first published in July 2011