The DevOps movement is making great strides in repairing crippled deployment processes. On the surface, the solution is for developers to think more like ops and vice versa. However, that surface-level idea only goes so far. It takes concrete, actionable steps to overcome technical debt and approach the effectiveness of an application development organization such as Amazon -- where, by the way, the average time between deployments was 11.6 seconds as of May.
At Agile 2013, keynoter Gene Kim presented a workable framework for making actionable change in the application development and deployment process. Kim is the creator of Tripwire and was the chief technology officer of the company up until the summer of 2010. Now, three years later, Kim has published a novel, The Phoenix Project, which explains how to bring flexibility to a stagnant development organization. The novel dramatizes three major concepts Kim learned in his 14 years studying the DevOps phenomenon.
According to Kim, organizations that have been engaged in a DevOps program for 12 months or more display significant added efficiencies over those that have not begun a DevOps program. Kim said that high-performing DevOps teams are more Agile (with 30 times more frequent deployments and 8,000 times faster cycle times) and are more reliable (with twice as many successful changes and a fraction of the mean time to response).
Achieving a high-performing DevOps team takes three main ingredients, Kim said. In his recent novel, Kim illustrates each ingredient as one of "three ways" presented to the protagonist by his mysterious guru. The three ways boil down to process flow, feedback and continual learning.
Way No. 1: The flow
In building process flow, Kim outlined five key rules. First, understand the flow of work. Without understanding, any changes will have random effects. Second and third, always increase the flow and never pass defects downstream. If you imagine the application lifecycle management process as a river, it's easy to see how moving projects back upstream for corrections is difficult, especially when the process flows rapidly.
Fourth, never allow local optimization to cause global degradation. This is tough in large organizations with internal development silos. Each team wants its own piece to be the star of the show. Keeping everything in balance, however, is much more beneficial for the whole system. Fifth and finally, achieve profound understanding of the entire system. This one links back to the first rule. You have to understand every nuance of the parts and how they fit together in order to keep them all in balance.
From these rules, Kim found one major way to improve deployment times. The trick is to find and break up bottlenecks in the process. Kim quoted one of his favorite lines from Eliyahu Goldratt's The Goal, saying, "Any improvement from changes anywhere but at the bottleneck is just an illusion." In this way, the entire deployment process chain is only as fast as its slowest link.
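Goldratt's point can be shown with a little arithmetic. The sketch below is only an illustration: the stage names echo the bottleneck areas Kim listed, but the throughput numbers (changes per day) are invented for the example and do not come from the talk. It treats the pipeline's overall throughput as the rate of its slowest stage.

```python
def pipeline_throughput(stages):
    """Return (bottleneck stage, end-to-end throughput) for a serial pipeline.

    `stages` maps a stage name to how many changes per day it can handle.
    A serial pipeline can move no faster than its slowest stage.
    """
    bottleneck = min(stages, key=stages.get)
    return bottleneck, stages[bottleneck]


# Hypothetical numbers, chosen only to illustrate the idea.
stages = {"environment creation": 2, "code deployment": 10, "testing": 5}

# Making a non-bottleneck stage ten times faster changes nothing overall.
faster_deploy = {**stages, "code deployment": 100}

# Doubling the bottleneck's capacity doubles overall throughput.
faster_envs = {**stages, "environment creation": 4}

for label, s in [("baseline", stages),
                 ("faster deploys", faster_deploy),
                 ("faster environments", faster_envs)]:
    print(label, pipeline_throughput(s))
```

In this toy model, only the change made at the bottleneck moves the end-to-end number, which is the illusion Goldratt warns about.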
These are the six areas Kim said are most likely to be bottlenecks in large software development organizations: environment creation, code deployment, setting up and running tests, overly tight architecture, development, and product managers. Kim said the two most common links among organizations that show consistent speed and quality are infrastructure version control (89% use version control) and an automated deployment process (82% automate their deploys), according to a 2012 Puppet Labs survey.
Way No. 2: Consistent feedback
The second way is feedback, and it has four important rules. First, understand and respond to the needs of all customers, both internal and external. Second, shorten and amplify all feedback loops. Kim said this rule is modeled after the Toyota Lean manufacturing line, where any employee can stop the line immediately if they see any defect. Third, create quality at the source. This means proactively building quality into everything. There's no need for developers to rely on QA to find their errors. Fourth and finally, create and embed knowledge where it is needed.
Kim suggested returning fragile services to the developers and making them responsible for service stability. To demonstrate why that works, Kim quoted Patrick Lightbody, the CEO of BrowserMob, who found, "When we woke up the developers at 2 a.m., we found the defects got fixed faster than ever." Of course, they weren't just waking up developers in the middle of the night to irritate them. The developers only had to get up at 2 a.m. if their code was breaking. This naturally encouraged developers to create higher-quality code. "It comes down to shared pain," Kim explained.
Kim also suggested setting up simple automated monitors on a variety of metrics. For example, developers might create simple incremental counters to track both successful and failed login attempts. This monitor would give the team feedback on how well the authentication services are running. It could also spot a spike in malicious activity.
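A login counter of the kind described above might look something like the following minimal sketch. The class name, the alert threshold and the minimum sample size are all assumptions made for illustration; Kim did not prescribe a specific implementation, and a production system would more likely send these counts to a metrics service than keep them in memory.

```python
from collections import Counter


class LoginMonitor:
    """Hypothetical in-memory counters for login outcomes."""

    def __init__(self, alert_failure_ratio=0.5, min_attempts=10):
        # Both thresholds are illustrative defaults, not recommendations.
        self.counts = Counter()
        self.alert_failure_ratio = alert_failure_ratio
        self.min_attempts = min_attempts

    def record(self, success):
        """Increment the counter for one login attempt."""
        self.counts["success" if success else "failure"] += 1

    def failure_ratio(self):
        """Share of attempts that failed, or 0.0 if nothing was recorded."""
        total = self.counts["success"] + self.counts["failure"]
        return self.counts["failure"] / total if total else 0.0

    def should_alert(self):
        """Flag a possible outage or attack once enough attempts are seen."""
        total = self.counts["success"] + self.counts["failure"]
        return (total >= self.min_attempts
                and self.failure_ratio() >= self.alert_failure_ratio)


monitor = LoginMonitor()
for outcome in [True, True, False, True]:
    monitor.record(outcome)
print(monitor.counts, monitor.failure_ratio(), monitor.should_alert())
```

The same two counters serve both purposes mentioned above: a rising failure ratio can mean the authentication service is unhealthy, or that someone is guessing passwords.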
Way No. 3: Continual learning
Kim's third way is to foster a culture that encourages experimentation (even taking risks), rewards successes, learns from failures, and recognizes repetition as the prerequisite to mastery. Kim said that successful organizations couple a culture that keeps pushing into the danger zone with the habits that enable them to survive the danger.
Kim cited Adrian Cockcroft, cloud architect at Netflix. "Do painful things more frequently so you can make it less painful," Cockcroft advised. He said that his developers don't balk at the painful tasks because they know it will make the deployment process much smoother for everyone. Cockcroft's methods were proven in April 2011, when EC2 outages took out Reddit and Quora, but did not severely impact Netflix -- even though Netflix was equally tied into EC2.
Cockcroft's team developed a testing tool called Chaos Monkey. The purpose of Chaos Monkey is to randomly disable services in order to test the independence of all the other services. This is a somewhat extreme example, but it really demonstrates the value of failing early and failing often. Doing so lets the development organization grow to handle anything that can (and therefore probably will) go wrong once the application goes into production.
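The idea of disabling a service at random and watching what else breaks can be sketched in a few lines. To be clear, this is not Netflix's Chaos Monkey or its API. It is a toy simulation with hypothetical service names, assuming each service simply lists the other services it cannot run without and that those dependencies contain no cycles.

```python
import random


class Service:
    """Hypothetical service with a list of hard dependencies."""

    def __init__(self, name, depends_on=()):
        self.name = name
        self.depends_on = list(depends_on)
        self.enabled = True


def is_healthy(service, registry):
    """Healthy means enabled, with every hard dependency also healthy."""
    return service.enabled and all(
        is_healthy(registry[dep], registry) for dep in service.depends_on
    )


def chaos_round(registry, rng):
    """Disable one random service, list the others that fail, then restore it."""
    victim = rng.choice(sorted(registry))
    registry[victim].enabled = False
    try:
        collateral = sorted(
            name for name, svc in registry.items()
            if name != victim and not is_healthy(svc, registry)
        )
    finally:
        registry[victim].enabled = True
    return victim, collateral


registry = {
    "auth": Service("auth"),
    "catalog": Service("catalog"),
    "checkout": Service("checkout", depends_on=["auth"]),
}
rng = random.Random(0)  # seeded so repeated runs pick the same victims
for _ in range(3):
    print(chaos_round(registry, rng))
```

An empty collateral list means the other services were independent of the one that was disabled. A non-empty list points to a hard dependency worth loosening before production finds it first.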
Kim presented three recommendations for sticking to the third way. First, enforce consistency everywhere -- in the code itself, in the environment and in configurations. Second, use assert statements in the code to find misconfigurations and enforce best practices. Finally, use static code analysis along with automated continuous integration and testing.
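As a rough illustration of the second recommendation, the sketch below uses assert statements to catch a misconfiguration at startup. The configuration keys and the rules themselves are hypothetical, chosen only to show the pattern. One caveat applies: Python removes assert statements when run with the -O flag, so a team relying on this approach would need to avoid that flag or raise explicit exceptions instead.

```python
ALLOWED_ENVIRONMENTS = {"dev", "staging", "production"}


def check_config(config):
    """Fail fast if the (hypothetical) configuration breaks a house rule."""
    env = config.get("environment")
    assert env in ALLOWED_ENVIRONMENTS, f"unknown environment: {env!r}"

    pool = config.get("db_pool_size")
    assert isinstance(pool, int) and pool > 0, "db_pool_size must be a positive integer"

    # Example of enforcing a best practice rather than just a valid value.
    if env == "production":
        assert not config.get("debug", False), "debug must be off in production"

    return config


config = check_config(
    {"environment": "production", "db_pool_size": 5, "debug": False}
)
print("configuration accepted for", config["environment"])
```

Running a check like this at application startup, and again in continuous integration, turns a misconfiguration into an immediate, readable failure instead of a production incident.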