Here's a software QA riddle: What kind of test sometimes fails and sometimes passes? A bad one.
Flaky tests, also referred to as nondeterministic tests, are automated tests that pass or fail seemingly without cause, even when run with the same configuration and without changes to the software's code, data or environment. While flaky tests are not necessarily indicative of a defect, they are problematic and difficult to diagnose.
In fact, developers and QA professionals consider identifying and fixing flaky tests among the most tedious tasks in automated test suite maintenance. Flaky tests, which typically appear in integration- and GUI-level tests, can diminish confidence in the entire automation suite. When one occurs, the team might ignore it altogether, rather than sink time into investigating whether the bug is real.
Testers can struggle to find the root cause of a flaky test in a timely manner. However, flaky tests do have some common causes. QA teams can take practical approaches to identify flaky tests and fix the underlying issues.
What causes flaky tests?
Often, tests end up flaky because of how they are written. UI tests frequently load dynamic data through asynchronous calls; when testers rely on fixed sleep functions to wait for that data, the tests become flaky and take longer to run. Instead, use callbacks or polling, which mitigate both issues.
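As a minimal sketch of the polling approach, the helper below (the `wait_until` name and the simulated load delay are illustrative, not from any particular framework) repeatedly checks a condition instead of guessing a fixed sleep duration:

```python
import time

def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Returns the condition's value on success; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout} seconds")

# Simulated async UI state: the data "arrives" shortly after the test starts.
loaded_at = time.monotonic() + 0.3

def data_is_loaded():
    return time.monotonic() >= loaded_at

# A fixed sleep would have to guess the delay; polling adapts to it and
# returns as soon as the condition holds.
assert wait_until(data_is_loaded, timeout=2.0)
```

Compared with a hardcoded `time.sleep(2)`, the polling version finishes as soon as the data loads and only fails when the timeout genuinely expires, which makes both the pass and the fail deterministic.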
Stale data from caching can also cause flaky tests, as can setup and cleanup issues, such as not returning the environment to its original state after testing. Tests that depend on the current time, or that gather events throughout the day, can become flaky when they run in different time zones, so consider time scenarios when designing them.
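To illustrate the time zone pitfall, here is a small sketch (the `start_of_day_utc` helper is a hypothetical example, not a standard function): a test that relies on the machine's local date can flip results near midnight, while pinning to UTC, or injecting a fixed clock, makes the computation identical everywhere:

```python
from datetime import datetime, timezone

# Flaky: "today" depends on the machine's local time zone, so a test
# running near midnight can see a different date on CI than locally.
local_today = datetime.now().date()

# Deterministic alternative: pin the test to UTC so every environment
# computes the same value.
utc_today = datetime.now(timezone.utc).date()

def start_of_day_utc(now=None):
    """Return midnight UTC for the given aware datetime (defaults to now)."""
    now = now or datetime.now(timezone.utc)
    return now.replace(hour=0, minute=0, second=0, microsecond=0)

# Injecting a fixed clock makes the assertion reproducible anywhere.
fixed = datetime(2024, 6, 1, 15, 30, tzinfo=timezone.utc)
assert start_of_day_utc(fixed) == datetime(2024, 6, 1, tzinfo=timezone.utc)
```

Passing the clock in as a parameter, rather than calling `datetime.now()` inside the code under test, is what lets the test control time entirely.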
Flakiness is not always a result of how a developer codes a test. Infrastructure issues, such as node failures, unreliable network connections, database failures and bugs in the automation framework, also throw off test results. Testing with third-party systems often contributes to flaky tests, as these environments are not under your control. If possible, stub the integrating systems to ensure the initial tests run deterministically before you run them against a third-party environment.
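A stub can be as simple as a mock object standing in for the third-party client. The sketch below assumes a hypothetical payments gateway interface (`charge_customer` and the `gateway.charge` method are illustrative names) and uses Python's standard `unittest.mock`:

```python
from unittest.mock import Mock

def charge_customer(gateway, customer_id, amount_cents):
    """Code under test: in production, `gateway` would call a real
    third-party payments API; here we depend only on its interface."""
    response = gateway.charge(customer_id, amount_cents)
    return response["status"] == "succeeded"

# Stub the third-party gateway so the test is deterministic: no network,
# no rate limits, no outages in an environment you don't control.
stub_gateway = Mock()
stub_gateway.charge.return_value = {"status": "succeeded"}

assert charge_customer(stub_gateway, "cust_42", 1999) is True
stub_gateway.charge.assert_called_once_with("cust_42", 1999)
```

Once the logic passes deterministically against the stub, a smaller set of tests can run against the real third-party environment to verify the integration itself.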
Fix flaky tests
Once you identify flaky tests, isolate them from your reliable ones and quarantine them in a separate test suite. Just one flaky test can potentially contaminate the entire suite, especially if tests are not independent of one another. Flaky tests also cause bottlenecks in the CI pipeline if they're not removed.
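In pytest, one way to implement this quarantine is a custom marker (the `quarantine` marker name and the test names below are illustrative, not a built-in convention):

```python
import pytest

# Hypothetical marker; register it in pytest.ini under `markers` to
# avoid unknown-marker warnings.
quarantine = pytest.mark.quarantine

@quarantine
def test_checkout_spinner_disappears():
    # Known-flaky UI check, kept out of the main CI gate until fixed.
    assert True

def test_checkout_total_is_correct():
    # Reliable test; stays in the main suite.
    assert 1999 + 500 == 2499
```

The gating pipeline can then run `pytest -m "not quarantine"` while a separate, non-blocking job runs `pytest -m quarantine`, so the flaky tests still execute and report results without holding up releases.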
This separation -- without eliminating the flaky test entirely -- helps ensure that, when a test fails, the team will investigate the result as a potential defect. You simply can't ignore the results of flaky tests, as quarantined but unfixed tests cause gaps in regression coverage. As it is time-consuming to investigate and fix flaky tests, prioritize analysis and remediation efforts. Consider the business value of the test. For example, does this test validate critical business workflows? And how does it impact the customer experience when you miss a bug in these workflows?
Once you understand the business value of each nondeterministic test, you can focus efforts on making the most important tests work first. Eliminate the most obvious external causes of nondeterministic results, and rerun the test with a clean environment and system state. Then, stub third-party applications. Once you eliminate external issues, examine the automation script for issues with concurrency, time and asynchrony; update as needed. A dedicated tool such as DeFlaker, or a pytest plugin such as pytest-rerunfailures, can accelerate the analysis process.
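The "rerun with a clean environment" step can be sketched with a context manager that guarantees setup and teardown around every attempt (the `clean_environment` helper and its dictionary state are stand-ins for real environment provisioning, not a specific tool):

```python
from contextlib import contextmanager

@contextmanager
def clean_environment():
    """Set up a fresh, known state and guarantee teardown even on failure."""
    state = {"cache": {}, "orders": []}   # stand-in for real environment setup
    try:
        yield state
    finally:
        state["cache"].clear()            # restore the original state
        state["orders"].clear()

# Rerun a suspected-flaky check several times, each against a clean state.
for _ in range(3):
    with clean_environment() as env:
        env["orders"].append("order-1")
        assert env["orders"] == ["order-1"]   # no leakage between runs
```

If the test passes consistently under repeated clean-state reruns, the flakiness likely came from shared state or the environment; if it still fails intermittently, the script itself is the next suspect.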