In Chuck Price's first 34 work days at TD Ameritrade, the company had 35 critical system failures. Who could blame the CIO for wondering, "What have I gotten myself into?"
Plus, IT was hearing from users "all the time," Price said. "Our help desk had special processes just to manage the volume of calls. I used to say, 'We've got to stop making our users be our monitors.' They told us things were bad."
That was in 2005. Now the online brokerage firm has maybe one system failure per quarter, Price said, and user calls are "infrequent." He attributes that largely to the use of an application performance and availability monitoring solution from Redwood City, Calif.-based Nimsoft Inc. that enables the organization to be proactive about performance issues.
TD Ameritrade was not unique. According to the Aberdeen Group's report on application performance management (APM) released in March, the top performance challenges companies face are the inability to identify performance issues before users are impacted (61%), the inability to measure the business impact of issues with application performance (49%), and lack of visibility into the end-user experience (46%).
Key to helping TD Ameritrade was the ability to use Nimsoft proactively. "Originally, the company had red and green reports every morning -- things were either alive or dead," Price said. "I said, 'What happens when they get unhealthy?'" By setting different performance thresholds, he said, "you know when something is becoming unhealthy and you can mitigate the risk before it becomes a business impact."
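The article does not describe how Nimsoft's thresholds are configured, but the idea of grading health instead of reporting only "alive or dead" can be sketched in a few lines. The following is a minimal, hypothetical Python example, assuming a single response-time metric with two made-up cutoffs (`warn_ms`, `crit_ms`); the names and states are illustrative and are not Nimsoft's API.

```python
from enum import Enum
from typing import Optional


class Health(Enum):
    # Hypothetical graded states; only DOWN matches the old red/green "dead".
    HEALTHY = "healthy"
    WARNING = "warning"     # degrading: time to act before users notice
    CRITICAL = "critical"   # badly degraded: business impact is imminent
    DOWN = "down"           # no response at all


def classify(response_ms: Optional[float], warn_ms: float, crit_ms: float) -> Health:
    """Map one response-time sample onto graded health states.

    response_ms is None when the probe timed out or got no answer.
    warn_ms and crit_ms are assumed, illustrative thresholds.
    """
    if warn_ms >= crit_ms:
        raise ValueError("warn_ms must be lower than crit_ms")
    if response_ms is None:
        return Health.DOWN
    if response_ms >= crit_ms:
        return Health.CRITICAL
    if response_ms >= warn_ms:
        return Health.WARNING
    return Health.HEALTHY


if __name__ == "__main__":
    # With a 500 ms warning and 2,000 ms critical threshold,
    # an 800 ms response is flagged as unhealthy long before it "dies".
    print(classify(800, warn_ms=500, crit_ms=2000))
```

The middle states are what give operators room to "mitigate the risk before it becomes a business impact," in Price's words.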
Bojan Simic, research analyst for Aberdeen and author of the APM report, said the ability to anticipate application performance problems before they occur is "one key story of the report."
"Two to three years ago a lot of IT departments worried about mean time to repair and time to troubleshoot. The key metric now is the ability to troubleshoot before the problem occurs, before users start picking up the phone," Simic said.
Organizations surveyed also reported that application performance issues are having "a significant impact on key business goals, but nearly half of these organizations are not able to quantify that impact."
Price said that TD Ameritrade uses Nimsoft and custom scripts to "drill into the application and execute and test transactions, so we exercise our business-facing and customer-facing applications end to end, measuring the business perspective experience. From a technology perspective, it's interesting if [the problem is] you haven't got [enough] memory, but business users don't care. We use Nimsoft to simulate the customer-based experience of our products, to make sure they're performing like we expect them to."
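TD Ameritrade's custom scripts are not shown in the article, but the general technique Price describes, a synthetic transaction that exercises a business flow end to end and judges it by the user's experience rather than by low-level resources, might look roughly like the hypothetical Python sketch below. The step names, the `budget_ms` parameter and the stop-on-first-failure rule are all assumptions for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    elapsed_ms: float
    error: str = ""


def run_synthetic_transaction(
    steps: Sequence[Tuple[str, Callable[[], object]]],
    budget_ms: float,
) -> Tuple[bool, List[StepResult]]:
    """Run scripted steps in order, timing each, and judge the whole flow.

    The transaction passes only if every step succeeds and the total time
    stays within an assumed end-to-end budget (budget_ms).
    """
    results: List[StepResult] = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
            ok, err = True, ""
        except Exception as exc:  # any failure counts against the user experience
            ok, err = False, str(exc)
        elapsed_ms = (time.perf_counter() - start) * 1000
        results.append(StepResult(name, ok, elapsed_ms, err))
        if not ok:
            break  # later steps usually depend on earlier ones (e.g. no quote without login)
    total_ms = sum(r.elapsed_ms for r in results)
    passed = (
        len(results) == len(steps)
        and all(r.ok for r in results)
        and total_ms <= budget_ms
    )
    return passed, results


if __name__ == "__main__":
    # Hypothetical steps; a real script would call the customer-facing application.
    flow = [("login", lambda: None), ("get_quote", lambda: None), ("preview_order", lambda: None)]
    passed, results = run_synthetic_transaction(flow, budget_ms=3000)
    print(passed, [(r.name, round(r.elapsed_ms, 3)) for r in results])
```

Judging the flow as a whole is the point Price makes: a memory shortfall matters only if it shows up as a slow or failed customer transaction.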
Performance is not solely the domain of operations, according to Price. The development team drives the requirements for Nimsoft, he said. "They're part of that fix and monitoring process. They talk to the monitoring team and tell them, 'We know we're going to do this [to fix a problem], and we've put checks in place, but can you have Nimsoft monitor this thing for us?'"
Price explained that after a fix is made to an application, monitoring is put in place to make sure the fix worked and the problem doesn't recur. For example, he said, there was an incident in which files weren't being transferred to clients when they were supposed to be, and the team discovered it occurred only under certain conditions. "The right conditions needed to exist [for that problem to occur], so developers fixed the problem, and we put a monitor in place to alert us. It's almost like automated regression testing."
Originally Price planned to use Nimsoft to monitor just the IT infrastructure, but the team has been able to integrate it with various point solutions to monitor the physical infrastructure as well, such as the diesel generator. "I said to the extent we can, let's use Nimsoft as a single pane of glass, and where we need a point solution let's integrate it with Nimsoft. We connect Nimsoft through SNMP. Nimsoft has all the routing logic for alerting, and we want that all managed through Nimsoft."
Price said Nimsoft also has a lower total cost of ownership and is faster to implement and easier to manage than the company's previous application performance management solution, HP OpenView.
Price also said users are happier with performance. "Our customer satisfaction index scores have gone steadily up as a result of application performance. When we first had advisory board meetings with our top customers, their thing was [for us] to make the technology stable. At our recent meeting in early March, they were saying we've got a completely different technology department now. They don't worry about it anymore."
Price chalks that up to moving from "a reactive monitoring environment to a proactive monitoring environment, mitigating risks when they're unhealthy, not when they die."
For other organizations looking to implement an APM solution, Price offered this advice: "Manage the thresholds aggressively; if you don't, you can get overloaded with alerts and false positives. That's something we've had to become more disciplined about." Also, he said, invest in some training. Although one of Price's staff members got up to speed quickly with Nimsoft without training, "six months later we sent him to training and he knew lots more things. He's like a new person; I wish I'd done that to begin with."
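Price doesn't say how his team tunes its thresholds. One common way to cut false positives, shown here as a hypothetical Python sketch rather than anything Nimsoft-specific, is to debounce a threshold: raise an alert only after several consecutive breaches, and clear it only after several consecutive normal samples. The `trip_after` and `clear_after` counts are assumed, illustrative values.

```python
from dataclasses import dataclass


@dataclass
class DebouncedAlert:
    """Threshold alert that ignores one-off spikes (hypothetical sketch)."""

    threshold: float
    trip_after: int = 3    # consecutive breaches needed before alerting
    clear_after: int = 3   # consecutive normal samples needed before clearing
    active: bool = False
    _breaches: int = 0
    _normals: int = 0

    def observe(self, value: float) -> bool:
        """Feed one sample; return True only when a new alert fires."""
        if value > self.threshold:
            self._breaches += 1
            self._normals = 0
            if not self.active and self._breaches >= self.trip_after:
                self.active = True
                return True
        else:
            self._normals += 1
            self._breaches = 0
            if self.active and self._normals >= self.clear_after:
                self.active = False
        return False


if __name__ == "__main__":
    alert = DebouncedAlert(threshold=500, trip_after=3, clear_after=2)
    samples = [900, 100, 900, 900, 900, 950, 100, 100]
    # The isolated spike at the start is ignored; only the sustained run alerts, once.
    print([alert.observe(v) for v in samples])
```

The trade-off is latency: each extra required breach delays a genuine alert by one sampling interval, so the counts have to be tuned per metric.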
Aberdeen's Simic also had some advice: "Look for tools that provide the visibility to identify and resolve problems before the business processes get disrupted." Also, he said, determine whether the solution provides visibility into each transaction. "Within applications, some transactions may be having a stronger impact on business performance than others."
Finally, he said, the product should show which segments of the infrastructure are causing performance problems, such as the server, the application or the end-user device. "This is pretty consistent with what organizations doing well [with APM] are putting in place."