Definition

software resilience testing

Sarah Lewis

Software resilience testing is a method of software testing that focuses on ensuring that applications will perform well in real-life or chaotic conditions. In other words, it tests an application’s resiliency, or its ability to withstand stressful or challenging factors. Resilience testing is one part of non-functional software testing, which also includes compliance, endurance, load and recovery testing. This form of testing is sometimes also referred to as software resilience engineering, application resilience testing or chaos engineering.

Since failures can never be entirely avoided, resilience testing helps ensure that software can continue performing core functions and avoid data loss even when under stress. As customer expectations rise and downtime becomes increasingly detrimental to an organization’s success, it is crucial to minimize disruptions and prepare for unwanted scenarios. Resilience testing can be considered one part of an organization’s business continuity plan (BCP).

How resilience testing works

Resilience testing is part of the software development lifecycle and starts with an application that needs to be tested. A test environment is then set up so the resilience test can be performed. In general, conducting a resilience test involves the following steps:

Determining metrics: Developers must choose which metrics to measure in order to reflect the performance of the software. These could include input and output rates, throughput, time to recovery, latency and the relationships between metrics.
Identifying the performance baseline: Next, the team gathers a baseline for the maximum load the software can handle without experiencing performance issues. This establishes the regular variance in performance and serves as a reference point for comparing metrics during testing.
Introducing and measuring disruptions: This is the step where challenges are introduced to try to break the system. Breaking a system can be done in a variety of ways, such as disrupting communication with external dependencies, injecting malicious input, manipulating traffic control, constraining bandwidth, shutting down interfacing systems, deleting data sources and consuming system resources. After these scenarios are complete, metrics should be measured and plotted to show how each disruption affected performance.
Drawing conclusions and responding to results: Finally, the results should be used to start discussions, fix the software and assess the development team's practices. Teams should also use these findings to improve later testing scenarios.
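
The four steps above can be sketched as a small Python harness. This is a minimal simulation under stated assumptions, not a real chaos-engineering tool: `FlakyDependency`, `handle_request` and `run_phase` are hypothetical names, the "service" is a function that retries a simulated dependency once, and disruptions are injected by raising the dependency's failure rate and adding latency rather than by touching real infrastructure. Latencies are generated numbers, not measured wall-clock time.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class PhaseMetrics:
    """Step 1: the metrics recorded for one phase of the test."""
    requests: int
    failures: int
    p50_latency_ms: float
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.failures / self.requests if self.requests else 0.0


class FlakyDependency:
    """Hypothetical external dependency whose behavior the test can degrade."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.failure_rate = 0.0      # probability that a single call fails
        self.extra_latency_ms = 0.0  # delay added to every successful call

    def call(self) -> float:
        """Return a simulated latency in ms, or raise an injected failure."""
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected dependency failure")
        return self.rng.uniform(5.0, 15.0) + self.extra_latency_ms


def handle_request(dep: FlakyDependency, retries: int = 1) -> float:
    """Service under test: retries the dependency before giving up."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return dep.call()
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError("all retries failed") from last_error


def run_phase(dep: FlakyDependency, n: int = 1000) -> PhaseMetrics:
    """Send n requests and summarize latency percentiles and failures."""
    latencies, failures = [], 0
    for _ in range(n):
        try:
            latencies.append(handle_request(dep))
        except ConnectionError:
            failures += 1
    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return PhaseMetrics(n, failures, cuts[49], cuts[94])


dep = FlakyDependency(seed=42)
baseline = run_phase(dep)             # Step 2: performance baseline

dep.failure_rate = 0.3                # Step 3: inject disruptions
dep.extra_latency_ms = 50.0
disrupted = run_phase(dep)

dep.failure_rate = 0.0                # Remove the disruption and re-measure
dep.extra_latency_ms = 0.0
recovered = run_phase(dep)

# Step 4: compare each phase against the baseline
for name, m in (("baseline", baseline), ("disrupted", disrupted), ("recovered", recovered)):
    print(f"{name:>9}: error rate {m.error_rate:.1%}, "
          f"p50 {m.p50_latency_ms:.1f} ms, p95 {m.p95_latency_ms:.1f} ms")
```

The single retry in `handle_request` is one illustrative design choice. With a 30% per-call failure rate, a request fails only if both attempts fail, so the disrupted error rate should land around 9% on average rather than 30%. Comparing the "recovered" phase with the baseline shows whether performance returns to its regular variance once the disruption is removed.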

Importance of resilience testing

Conducting resilience tests helps minimize failures and security issues when a challenge arises. Examples of challenges that resilience testing helps defend against include power outages, system crashes, downtime and natural disasters. Additionally, resilience testing can help assess conformance to standards and best practices, privacy issues and scalability.

Resilience testing is especially important in multi-tier, multi-environment infrastructures. One way to improve the resilience of software is to migrate it to the cloud in order to minimize the chance of internal system failure. While disruptions can occur in the cloud, providers tend to have advanced recovery systems in place.

Reliability vs. resilience

Two terms that often get confused when applied to software are reliability and resilience. Resilience is the ability to regain an ideal state or recover rapidly after undergoing a challenge. Reliability is the target that developers aim for: a system that operates perfectly, with no downtime. When testing for resilience, reliability is the planned outcome. Resiliency is also known as recoverability.
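
One way to see the difference is to compute both properties from the same kind of outage record. The sketch below is illustrative only: the two systems and their outage durations are hypothetical, availability (fraction of time up) stands in for reliability, and mean time to recovery (average outage length) stands in for resilience.

```python
from dataclasses import dataclass
from statistics import mean

PERIOD_MINUTES = 30 * 24 * 60  # a 30-day observation window (43,200 minutes)


@dataclass
class OutageLog:
    """Hypothetical outage history for one system over PERIOD_MINUTES."""
    name: str
    outage_minutes: list  # duration of each outage, in minutes

    def availability(self) -> float:
        """Fraction of the window the system was up (a reliability view)."""
        return 1 - sum(self.outage_minutes) / PERIOD_MINUTES

    def mean_time_to_recovery(self) -> float:
        """Average outage length (a resilience or recoverability view)."""
        return mean(self.outage_minutes) if self.outage_minutes else 0.0


system_a = OutageLog("A", [120.0])       # one long outage
system_b = OutageLog("B", [10.0] * 12)   # many short outages

for s in (system_a, system_b):
    print(f"{s.name}: availability {s.availability():.4%}, "
          f"MTTR {s.mean_time_to_recovery():.0f} min")
```

Both hypothetical systems lose 120 minutes in total, so their availability is identical (about 99.72%). System B, however, recovers from each disruption in 10 minutes rather than 120, which is the kind of difference resilience testing is meant to surface.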

This was last updated in June 2019

Continue Reading About software resilience testing

xMatters open-sources its Chaos Engineering tool to help organizations build resilient, self-healing systems

Chaos engineering enters mainstream QA, drills down to apps

Make your pitch for chaos engineering practices

Why software resilience should be the real goal of DevOps

Software resilience engineering helps teams quash chaos
