Andreas Grabner, performance advocate at Dynatrace, spends a lot of time traveling the world talking to development teams about the importance of DevOps metrics. His message is simple: The data is out there and groups can use it to improve performance and to come up with better ideas. But the key is to actually use the data, not just guess randomly. Grabner will be taking his message to the JavaOne Conference this year in a session entitled "Metrics-Driven Delivery to Stop Bad Code and Bad Ideas Early." Senior technology editor Valerie Silverthorne asked him a few questions in advance of his presentation.
You've spent a lot of time talking about DevOps metrics, and you make a great case, but obviously you feel the case needs to be made. Why, if there are so many tools/options out there, is every company NOT doing this? What's the hold up?
Andreas Grabner: Believe me -- I am surprised and shocked as well each time I see eyes open up when I remind teams of the power of metrics.
After a DevOps talk at DevOpsDays in Boston, a team from a very large U.S. airline approached me. They are going through a transformation right now but haven't thought at all about monitoring -- especially early and throughout the pipeline. The same thing just happened on my current trip to South America. I met three major banks that all went through a digital transformation and are embracing DevOps. None of them had thought about the power of metrics, even for basic production monitoring.
I think people are too focused on getting a new project out the door faster than they developed software in the past. In the beginning, everyone wants to prove that DevOps leads to faster innovation. Metrics, however, are your safety net: They tell you whether what you are doing actually makes sense, has the right quality and delivers value.
"Stopping bad ideas earlier" -- great concept, but in a real world DevOps team, or a team trying to move to DevOps, what does it really take to accomplish that?
Grabner: It is simple. Each app or feature team needs to sit down and answer this question: What are the criteria for success of this feature, and how do we measure them? Is it the number of users using it within the first week? Is it lowering the hardware requirements? Is it about scalability?
If the feature team gets these answers through DevOps metrics right after they've deployed a new feature/idea, they can draw the right conclusions: Did we build the right thing that people love, or is nobody using it? Does the new feature/app run efficiently (CPU, memory, disk, storage ...), or does it cost us more to run than we actually gain?
Are we in danger, with all of these fab tools and, thus, metrics, of getting into a "too much information" situation when it comes to software development and deployment? And if that happens, what's the answer?
Grabner: Yes and no. If you just dump thousands of metrics on teams, you overwhelm them. That's why I believe teams need to start by answering the simple question I raised above: How can we measure the success of this new feature/idea/app?
Success, however, means two things: a) the number of users that use the new feature and are, therefore, creating business value and b) the cost of running that feature. That can be measured with the number of server instances [or] the number of bytes transferred or rows stored in the database. Especially when running in cloud/virtual environments, this becomes very important, as you have to calculate the cost per user/feature in these dynamic and elastic environments.
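Grabner's cost-per-user point can be sketched with a few lines of arithmetic. The figures below are purely hypothetical, invented for illustration -- not real cloud pricing or usage data:

```python
# Hypothetical monthly figures for one feature -- all numbers are assumptions.
instance_hours = 720      # e.g., one server instance running for a 30-day month
hourly_rate = 0.10        # assumed cloud price in USD per instance-hour
active_users = 4_500      # users who actually used the feature this month

monthly_cost = instance_hours * hourly_rate        # total cost of running the feature
cost_per_user = monthly_cost / active_users        # cost per user who created business value

print(f"${monthly_cost:.2f} total, ${cost_per_user:.4f} per active user")
```

In an elastic environment, `instance_hours` changes as instances scale up and down, which is exactly why the per-user number has to be tracked continuously rather than estimated once.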
Another solution to "data overload" is to apply smart analytics on metrics. That's what tool vendors -- including Dynatrace -- have been doing in the last year. Instead of showing you raw metrics, we highlight problems and deliver the root cause, e.g., our tool spits out, "Hey Andi, your last code change just introduced the N+1 Query Pattern on your new back-end microservice. If you keep pushing this out, you will need 50% more Amazon Instances considering the load you expect."
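The N+1 query pattern Grabner mentions is easy to reproduce. Here is a minimal sketch using an in-memory SQLite database with an invented users/orders schema -- the first function issues one query per user (N extra round trips), while the second gets the same answer with a single aggregated query:

```python
import sqlite3

# Hypothetical schema for illustration: users and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 9.5), (2, 1, 3.0), (3, 2, 7.25);
""")

def totals_n_plus_1(conn):
    """N+1 pattern: one query for the users, then one more query per user."""
    totals = {}
    for user_id, name in conn.execute("SELECT id, name FROM users"):
        row = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        totals[name] = row[0]
    return totals

def totals_single_query(conn):
    """Fix: a single joined, aggregated query -- cost no longer grows with N."""
    rows = conn.execute("""
        SELECT u.name, COALESCE(SUM(o.total), 0)
        FROM users u LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id
    """)
    return dict(rows)

# Same result, but the second version makes one round trip instead of N+1.
assert totals_n_plus_1(conn) == totals_single_query(conn)
```

With two users the difference is invisible; under production load, the per-user round trips are what an analytics-driven monitoring tool flags before the code change ships.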