With several agile teams I’ve worked with, thinking about operations has been an afterthought. Even when operations personnel are captured as stakeholders upfront, many times the hand-off from development to production is … well, “less than coordinated.”
I think this happens for a handful of natural reasons:
- The operations profile (interactions with other systems, infrastructure requirements, technology dependencies, configuration requirements and options, etc…) emerges as development unfolds over multiple iterations. That makes it difficult to look at the intermediate iterations, where the software is possibly going through rapid changes in direction, and invest in that type of documentation at the time. That means that the operations documentation often isn’t created until later iterations, for example a transition sprint in scrum, where it’s a mad dash to remember everything that was built out, capture it in one place, and deal with the realization that not much upfront thought was put into how this software would live and breath in production.
- Some aspects of operations are emergent. They depend on the architecture and technology decisions that are made as the project unfolds. They require extensive testing to measure. For example, providing details around system requirements (how much processing power, memory and disk space you’ll need), characteristics of performance/availability/reliability, and troubleshooting guidelines all emerge naturally from the testing process. Some aspects of operations often aren’t requirements you can just choose, you have to figure them out through iterations of testing and experimentation.
- There are many aspects of operations most programmers on the team just won’t know much about. When should backups be done, and how? What needs to be done for monitoring, alarming and alerting? How will that integrate with the technologies currently being used for those activities? Where will log files go and how often will they be checked, stored or cleared?
I call those natural reasons because all of them will necessarily happen to some degree no matter how much planning you do or how you structure the team. In any project, at early stages where you want to document aspects like this the most, that’s when decisions are most likely to change. Understanding operational criteria is something that’s always going to require testing and experimentation, which means it’s going to happen later in the project. And it’s unrealistic to expect programmers to be experts at programming and to be experts at knowing how to run data centers and administer systems.
So what can we do?
I think there are a couple of things that can be done:
- As mentioned earlier, many teams invest in the idea of a transition iteration. A period of time dedicated to pulling it all together. They collect the test results, inventory the system requirements, sit down with the operations folks and perform any last-minute tweaks to make the software manageable in the target environment.
- Other teams work with the operations team upfront to get stories for what their requirements are for an application from an operations perspective. While they still likely won’t get done until later iterations, it can reduce the surprise and can ensure that any big features are part of the technical road map for the product.
- I also think it’s helpful for the programming team to have exposure to operations in the form of production support for the applications that get deployed. Through needing to support the products in the wild, they develop an appreciation for the common issues around troubleshooting and maintaining the software. This can be done a number of ways. Some teams create separate production support teams and have programmers rotate through that team. Others have the same team that develops the product support it in production. What’s important is that the team gets the gritty exposure to the problems.