Service level agreements (SLAs) pertaining to application performance management (APM) are challenging enough in a normal application production environment, but even more so in the cloud. This is because the cloud’s architecture leverages shared and dynamic systems, making metrics and measurements difficult, if not impossible, to collect. In addition, many cloud vendors lack expertise in performance management, which creates unnecessary delays. And finally, APM solutions can be extremely expensive to implement within a cloud environment, causing some cloud vendors to forgo the offering.
Given these challenges, what can an organization do to ensure that its cloud vendors offer adequate APM for its applications, and how does an organization protect its customers from performance degradation when its applications are moved into the cloud? One way is to carefully construct and manage the SLA between the organization and the cloud vendor. This article offers ideas and insight into what makes a strong SLA for APM in the cloud.
Negotiating APM in cloud SLAs
IT organizations are moving their infrastructure, applications and services to the cloud for various reasons, such as lowered IT costs, decreased support complexity, and increased scalability of systems. Business executives support IT in this move as the cloud is perceived as a method for retaining and attracting customers, thus, increasing
However, if an organization does not negotiate solid SLA terms for APM in its cloud agreement, the move to the cloud may actually result in lowered customer satisfaction and increased overall IT costs. This generally happens when an organization does not understand how APM in the cloud differs from APM in its current physical environment, and so accepts SLA terms that do not meet its requirements.
APM in physical environments
As discussed in my tip about APM ownership challenges in the cloud, a physical server environment that is located in an organization’s data center and is completely controlled by the IT organization makes APM relatively easy to manage. For example, metrics and measurements can be obtained by installing APM tools and applications in the physical environment. In addition, the owner of a service request is quickly identified thanks to strong internal communications among the network, operations, and application teams, which reduces unnecessary delays. Customers also have a direct line into the IT organization by submitting a service request or calling the service desk. Physical environments make it easier to monitor APM thresholds because there are few or no shared services. Metrics can be collected and analyzed because the IT organization has direct access to the data. Together, these factors create a strong application performance measurement environment, where problems can be detected before they impact the customer, resulting in higher customer satisfaction and retention.
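To illustrate what monitoring a threshold so that problems are detected before they impact the customer can look like, here is a minimal sketch. It assumes a warning level set below the level at which the customer would be affected; the metric name, the numbers, and the `Threshold` and `classify` names are hypothetical and are not taken from any particular APM tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical two-level threshold for one monitored metric."""
    metric: str
    warn: float    # early-warning level, set below the breach level
    breach: float  # level at which the customer is assumed to be affected


def classify(value: float, threshold: Threshold) -> str:
    """Return 'ok', 'warn', or 'breach' for a single measurement."""
    if value >= threshold.breach:
        return "breach"
    if value >= threshold.warn:
        return "warn"
    return "ok"


# Example: response time in seconds, with illustrative limits.
response_time = Threshold(metric="response_time_s", warn=1.5, breach=2.0)
status = classify(1.6, response_time)  # above warn, below breach
```

The gap between the warning level and the breach level is what gives the IT organization time to act before the customer notices.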
APM in cloud environments
APM in the cloud environment is very different from what was described above. Operating systems, firewalls, and other resources are shared among several applications and are allocated dynamically to applications and services, making the collection of data and metrics difficult, if not impossible. In addition, most cloud vendors will not allow organizations access to their infrastructure, making it impossible for an organization to monitor or troubleshoot the underlying infrastructure. This means the organization must rely on the vendor to identify and fix the problem using the APM found in the vendor’s cloud, which may or may not be adequate or well managed by the cloud technical team. Because APM solutions are difficult to implement and maintain, cloud vendors often do not have experienced APM technicians on hand, which delays problem resolution.
And what if the cloud vendor has other incidents with higher priorities and cannot get to the organization’s incident within the customer SLA’s time frame? Who is held responsible when the customer’s outage exceeds the metrics agreed upon with the organization? And what if the organization owes SLA credits to its customer when it fails to meet the contractual SLA metrics? Can the cloud vendor be held responsible for the SLA credits if the delay was due to its failure to respond? Most likely the answer is no, unless this has been agreed to in the APM section of the SLA.
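The credit question can be made concrete with a small worked example. The sketch below assumes a simple per-minute credit model and a pass-through clause that makes the vendor liable only for the excess outage attributable to its own response delay. The figures, the `Incident` fields, and the function names are all hypothetical; real SLAs define credits in many different ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical record of one customer-facing outage."""
    outage_minutes: float        # total outage seen by the customer
    vendor_delay_minutes: float  # time lost past the vendor's response target


def customer_credit(incident: Incident, allowed_minutes: float,
                    credit_per_minute: float) -> float:
    """Credit the organization owes its customer for outage beyond the allowance."""
    excess = max(0.0, incident.outage_minutes - allowed_minutes)
    return excess * credit_per_minute


def vendor_share(incident: Incident, allowed_minutes: float,
                 credit_per_minute: float, pass_through_agreed: bool) -> float:
    """Portion of that credit recoverable from the cloud vendor.

    Without a pass-through clause in the SLA, nothing is recoverable.
    """
    if not pass_through_agreed:
        return 0.0
    excess = max(0.0, incident.outage_minutes - allowed_minutes)
    attributable = min(excess, incident.vendor_delay_minutes)
    return attributable * credit_per_minute


incident = Incident(outage_minutes=90, vendor_delay_minutes=20)
owed = customer_credit(incident, allowed_minutes=60, credit_per_minute=10.0)
recovered = vendor_share(incident, 60, 10.0, pass_through_agreed=True)
unrecovered = vendor_share(incident, 60, 10.0, pass_through_agreed=False)
```

In this example the organization owes its customer 300.0 in credits. With the clause it recovers 200.0 from the vendor; without it, it absorbs the full amount.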
In a cloud environment, the organization must rely on the cloud vendor to supply data from its virtual environment in a manner that is meaningful to the organization. To feel comfortable with the data being supplied by the cloud vendor, IT organizations need to identify and document in the SLA exactly what data is to be collected and reported, along with how the data is collected. The IT organization also needs to understand what APM tools and applications the vendor is using, as not all APM tools and applications work well in the cloud environment.
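One way to make "exactly what data is to be collected and reported" checkable is to write the agreed metrics down in a structured form and compare each vendor report against it. The following is a minimal sketch under that assumption. The metric names, collection methods, and report layout are hypothetical examples and do not reflect any vendor's actual reporting format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    """One metric the SLA requires the vendor to collect and report."""
    name: str
    unit: str
    collection_method: str        # how the SLA says the data is gathered
    report_interval_minutes: int  # how often it must be reported


def unmet_specs(specs: list[MetricSpec], report: dict[str, dict]) -> list[str]:
    """Return names of agreed metrics that are missing from the report
    or were collected by a method other than the one documented."""
    problems = []
    for spec in specs:
        entry = report.get(spec.name)
        if entry is None or entry.get("collection_method") != spec.collection_method:
            problems.append(spec.name)
    return problems


agreed = [
    MetricSpec("response_time", "ms", "synthetic transaction", 5),
    MetricSpec("availability", "percent", "uptime probe", 5),
]
vendor_report = {
    "response_time": {"value": 420, "collection_method": "synthetic transaction"},
}
gaps = unmet_specs(agreed, vendor_report)  # availability was not reported
```

A check like this only confirms that the agreed data arrived in the agreed way; judging whether the values themselves are acceptable is a separate step.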
And before moving existing APM tools and applications into the cloud, the organization needs to make sure that the APM vendor fully supports them there. Often they will not be supported because of the differences between physical and virtual environments.
APM solutions can be expensive and time-intensive. However, a cloud vendor can spread this cost over the entire cloud environment, so organizations need to push back on vendors who say they cannot afford adequate APM solutions as part of the overarching SLA.
This article is meant to help organizations begin thinking about what their SLA with a cloud vendor should contain regarding APM. There are many variations of what the SLA can contain, especially depending on whether the cloud is SaaS (Software as a Service) or IaaS (Infrastructure as a Service). Cloud agreements can be lengthy, sometimes running up to ten years, so it is very important to identify an organization’s APM requirements and place them into the SLA before signing on with a cloud vendor. And if a cloud vendor pushes back too hard on the APM requirements, maybe it’s time to consider another vendor.
This was first published in July 2011