Twenty years ago, it was unprofessional to test code in production. Instead, a smart team used a test environment that was isolated from live servers.
Today, testing in production is all the rage among software QA professionals. Modern approaches don't simply throw code over the wall from development into the live IT environment; live test strategies include ways to manage and mitigate risk to users. There is no error-proof way to test code in production, a practice also called shift-right testing because tests move from the middle of the code creation and deployment process toward the end of it. Nuanced, skilled approaches can balance speedy deployment against the risks of unchecked code.
For organizations ready to embrace shift-right, methods include synthetic setups, intensive monitoring and continuous deployment with gates. Develop a strategy around the methods that best suit the application under test.
Synthetic accounts and transactions
Testers can access a production website on different devices and browsers to look for problems in live applications. For example, a tester might try a user-facing feature, such as "Find a store near you" or "Product search," to gauge its performance. Other components, however, such as checkout and order insurance, don't lend themselves to easy end-to-end testing, as they require the tester to conduct real business.
Synthetic accounts, also called test accounts, solve this problem. Synthetic account transactions look real, but they do not create an actual order and cannot be seen by production users.
A synthetic account has a login and other features that mimic production accounts, minus credit card processing and order fulfillment. Programmers write conditional code for the step where the test account's order is submitted to a processing system and the transaction is charged. The conditional can be a simple if statement that reroutes the order to write an entry in a log instead of executing. The most realistic scenario for a synthetic transaction is for the system to fulfill the order end to end and then, based on the account information, write a log entry and discard the transaction rather than charge it.
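A minimal sketch of that conditional, assuming a hypothetical `submit_order` checkout step and a lookup table of synthetic account IDs (a real system would store the synthetic flag on the account record):

```python
import logging

logger = logging.getLogger("synthetic-orders")

# Hypothetical set of synthetic (test) account IDs; in practice this
# would be a database lookup or an attribute on the account itself.
SYNTHETIC_ACCOUNTS = {"qa-001", "qa-002"}

def charge_and_fulfill(account_id: str, order: dict) -> str:
    # Placeholder for real payment processing and order fulfillment.
    return "charged"

def submit_order(account_id: str, order: dict) -> str:
    """Route a checkout order to the payment processor, unless the
    account is synthetic, in which case log the order and discard it."""
    if account_id in SYNTHETIC_ACCOUNTS:
        # Synthetic transaction: record it for test metrics, never charge.
        logger.info("synthetic order %s for %s discarded", order["id"], account_id)
        return "logged"
    # Real transaction: hand off to the processing system.
    return charge_and_fulfill(account_id, order)
```

The key design point is that the branch happens at the last possible step, so the synthetic order exercises every system the real order does, short of the charge itself.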
To test in production with synthetic transactions, establish compliance rules and prevent regression in live accounts. This preparation takes a bit of effort, especially for financial transactions. Once established, synthetic accounts let testers exercise the great majority of system capabilities in production.
Once these accounts exist, testers can run a useful sequence of user-like steps through them continuously, on a loop. For example, the automated sequence could create an account, log in, perform a search, add an item to the cart and check out, then repeat. As the test program runs these processes, it tracks every system error, page load time, API call time and other relevant metrics. The program might even capture images for testers to evaluate alongside these other measures of user experience (UX). Then, when customers complain about an application's performance or features not working, testers have a baseline of UX data logged for troubleshooting.
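That loop can be sketched in a few lines, assuming a hypothetical base URL and journey; the fetch step is injectable so the same code works against a stub in a unit test:

```python
import time
import urllib.request
import urllib.error

BASE_URL = "https://example.com"  # hypothetical application under test

# One user-like journey: (step name, path) pairs.
JOURNEY = [
    ("login", "/login"),
    ("search", "/search?q=widgets"),
    ("add_to_cart", "/cart/add/123"),
    ("checkout", "/checkout"),
]

def http_status(url: str) -> int:
    """Fetch a URL and return its HTTP status, treating errors as data."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 404s and 500s are exactly what we want to record

def run_journey(fetch=http_status):
    """One pass through the journey, recording status and latency per step."""
    results = []
    for step, path in JOURNEY:
        start = time.monotonic()
        status = fetch(BASE_URL + path)
        results.append({"step": step,
                        "status": status,
                        "seconds": time.monotonic() - start})
    return results

# A real monitor would loop forever with a pause between passes, e.g.:
#   while True: record(run_journey()); time.sleep(60)
```

Each pass produces one baseline data point per step, which is exactly the UX history the article describes for later troubleshooting.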
The gift that gives twice
To accomplish the same goal as synthetic accounts on a smaller scale, consider using gift cards so that testers can complete transactions. This method comes in handy during product launches, for example, or in advance of holiday site traffic.
While this method of testing in production yields powerful performance data, a diverse base of customers will still encounter UX problems that the tests miss. For example, a customer in Alaska or Abu Dhabi will experience different network delays than the synthetic test accounts, which run close to the servers hosting the application.
This kind of continuous user-like feedback is a quick and easy way to begin tests in production. Organizations can graph the results of automated synthetic transactions to monitor the application. "Sufficiently advanced monitoring is indistinguishable from testing," as Ed Keyes, a senior software engineer at Google, argued at a Google Test Automation Conference over a decade ago.
As the synthetic account programs run, gather all the production results, including common URLs, 500 errors, 404 errors, database errors and time-to-serve requests, and visualize them on a production wall for continuous monitoring.
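A simple aggregation like the following, assuming a hypothetical access-log format of `METHOD path status millis`, is enough to feed such a production wall:

```python
from collections import Counter

def summarize(log_lines):
    """Tally HTTP status codes and compute average response time per URL,
    suitable for graphing on a monitoring dashboard."""
    statuses = Counter()
    timings = {}
    for line in log_lines:
        method, path, status, millis = line.split()
        statuses[status] += 1                      # counts 200s, 404s, 500s...
        timings.setdefault(path, []).append(int(millis))
    avg_ms = {path: sum(t) / len(t) for path, t in timings.items()}
    return statuses, avg_ms
```

Real deployments usually hand this job to a log pipeline or monitoring product, but the underlying computation is no more than this.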
Once monitoring is in place, consider continuous deployment, where every change that makes it through development rolls out to production.
Feature flags, also called configuration flags, enable new code to reach only a select group of live users. The simplest approach to configuration flags is a function, something like if_turned_on(flagname, userid), that returns a Boolean true/false. The function looks up the usertype for the userid, then checks the database to see if the flag is on for that user. To flip feature flags on and off, the IT group simply updates a database or a text file kept in version control for that purpose. To test code in production, the programmer puts the old behavior on the bottom of the if statement and the new behavior on the top, then turns the feature on for the target first users. A code update can go live for testers, then company employees at large, then the entire customer base, for example, with updates to the flags.
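One possible shape for that function, with in-memory dictionaries standing in for the database or version-controlled flag file the article describes (all names here are hypothetical):

```python
# Hypothetical flag table: which user types each flag is turned on for.
# Real teams keep this in a database or a version-controlled text file,
# so flipping a flag requires no redeploy.
FLAGS = {
    "new_checkout": {"tester", "employee"},
}

# Hypothetical usertype lookup, normally backed by the accounts database.
USER_TYPES = {
    "alice": "tester",
    "bob": "customer",
}

def if_turned_on(flagname: str, userid: str) -> bool:
    """Return True if the feature flag is enabled for this user's type."""
    usertype = USER_TYPES.get(userid, "customer")
    return usertype in FLAGS.get(flagname, set())

def new_checkout() -> str:
    return "new"

def old_checkout() -> str:
    return "old"

def checkout(userid: str) -> str:
    if if_turned_on("new_checkout", userid):
        return new_checkout()   # new behavior on top of the if
    return old_checkout()       # old behavior on the bottom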
Be careful with continuous deployment gated by feature flags for testing in production. A programmer's simple if mistake can send the new code live to all users but keep it hidden from the designated tester group. Likewise, the flags themselves can be set incorrectly. Noah Sussman, principal software engineer at Medidata Solutions and an early adopter of configuration flags, warns that code heavy with configuration flags looks messy, with a great deal of if or switch statements thrown in and nested within each other. One project I worked on did this, with a -o flag to indicate old behavior. Engineers eventually faced a crossroads where the conditionals made the code difficult to maintain, but taking them out would slow progress.
Rather than use feature flags, app dev teams can roll out new code incrementally by servers or groups of servers, an approach known as sharding, phased or rolling deployment, canary deployment and similar terms. The incremental server rollout method differs from configuration flags because it uses multiple versions of production code, instead of one codebase with if statements.
To phase new code into production, use roles to direct users to servers hosting either new or the current code version. Some organizations create roles for employees, beta users, power users, the late majority and the laggards. If a deployment is sufficiently large, say with more than a dozen web servers, deploy the change to one web server at a time, with a small delay between upgrades to enable monitoring tools to pick up on negative results from the new code. If the deployed code generates a spike in problems, the team can roll the software back to a previous state. Consider combining phased deployment with configuration flags to make rollbacks incredibly simple.
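The one-server-at-a-time rollout described above can be sketched as a loop with a pause and a health gate; the `deploy` and `healthy` functions here are placeholders for whatever deployment tooling and monitoring a team actually uses:

```python
import time

WEB_SERVERS = [f"web-{n:02d}" for n in range(1, 13)]  # hypothetical 12-server fleet

def deploy(server: str, version: str) -> None:
    """Placeholder for the real deployment step (SSH, an agent, an orchestrator)."""
    pass

def healthy(server: str) -> bool:
    """Placeholder health check; real monitoring would watch error rates
    and latency on the upgraded server."""
    return True

def rolling_deploy(version: str, pause_seconds: int = 300):
    """Upgrade one server at a time, pausing between upgrades so monitoring
    tools can pick up on negative results before the change spreads."""
    done = []
    for server in WEB_SERVERS:
        deploy(server, version)
        time.sleep(pause_seconds)
        if not healthy(server):
            # Stop the rollout; the servers in `done` need rolling back.
            return done, server
        done.append(server)
    return done, None
```

The return value identifies exactly which servers carry the new code, which is what a rollback procedure needs to undo a bad release.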
If easy rollback is most important, a blue/green deployment might be a good fit, though it comes with the tradeoff of larger code exposure. Blue/green deployments require two identical web server farms: The blue deployment hosts the known good code, and the green hosts the update. When the team is ready for the update to go live, the router switches to direct traffic to that green deployment. If monitoring shows a negative user experience, the router switch will bring all traffic back to the blue deployment, while the testers and developers determine the problem and create a new version.
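The essence of blue/green is that the router switch is the entire deployment and the entire rollback. A toy model, with hypothetical server pools, makes the symmetry obvious:

```python
# Hypothetical blue/green setup: two identical web server farms,
# only one receiving traffic at a time.
POOLS = {
    "blue": ["blue-01", "blue-02"],    # known good code
    "green": ["green-01", "green-02"], # the update
}

class Router:
    """Minimal stand-in for the load balancer or router in front of the farms."""

    def __init__(self):
        self.active = "blue"

    def cut_over(self):
        """Send all traffic to the green deployment."""
        self.active = "green"

    def roll_back(self):
        """Instant rollback: flip traffic back to the blue deployment."""
        self.active = "blue"

    def backends(self):
        return POOLS[self.active]
```

Because no code moves during the switch, rollback takes exactly as long as the cutover did, which is the tradeoff the article notes: simplicity of rollback in exchange for exposing the whole user base at once.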
Develop a strategy
Some of these tactics to test code in production might not suit your software project; some might even seem unattainable. For example, highly regulated industries and government IT organizations might ban synthetic test accounts because of the possibility of their use as security backdoors.
In addition, testing in production doesn't expose every bug; not all of them manifest as 500 or 404 errors or slow web performance. The testing team still needs other effective means to find problems. If significant bugs exist in an application, affect a large number of users and are hard to find in production, the team might not want to jump to shift-right testing. Instead, focus on ways to drive out defects earlier in the code lifecycle.