It's a question on the minds of plenty of software professionals these days: When will AI eliminate testing jobs? Five years? Ten years? Twenty?
To get at these big questions about how software testing using AI will change jobs and processes, we first need to talk about which parts of testing AI can cover. This is what matters, much more so than whether AI will eventually take over entirely.
What may come as a surprise is that less-expensive testing could result in increased demand for software testing.
Lower costs = more spending
In the 1860s, the English economist William Stanley Jevons observed that as steam engines burned coal more efficiently, making coal-powered work cheaper, England's coal consumption rose rather than fell. People felt more willing to throw another lump on the fire and keep themselves a little warmer. Plus, less-costly coal meant that projects previously deemed uneconomical -- a train line from point A to point B, for example -- suddenly became viable business opportunities. This has become known as the Jevons paradox.
Something similar happened with software development. As programming languages became more powerful, applications quickly moved beyond payroll and business reporting. Today, anyone with a little free time can write a hobby application to track merit badges for a scouting troop. Someday that hobby could become their day job.
You can see this with artificial intelligence (AI) and machine learning. As data becomes available and tools grow more accessible, projects that once seemed impossible or overly expensive start to look reasonable. That means we will see more of them.
Those projects will need to be tested. The advent of AI and machine learning will lead to more testing -- or at least different testing -- and not less.
Parts, not the whole
At a testing conference in 2004, the buzz was all about test-driven development and the end of the nontechnical tester. Fifteen years later, we've come to a more nuanced position, namely a conversation about which parts of the process to automate.
It's similar with AI and machine learning. You can't simply point AI at software and say, "Figure out if this works." You still have the classic problem of knowing what it means for that thing to work; only then can the test tool determine whether expectations are being met.
In his iconic black box software testing course, engineering professor Cem Kaner gives the example of testing an open source spreadsheet product. If the business rule for evaluating a cell is "do the math the same way Microsoft Excel would," it is possible to generate a random formula, have both Excel and the software under test evaluate it, and then check that the results match. To use this strategy, the tester needs the correct answer and therefore needs Microsoft Excel. In testing terms, Excel serves as the oracle: the mechanism used to decide whether a suspected bug is, in fact, a bug. It is possible, in some scenarios, to get AI to act as this oracle. Even in that case, the AI will not be able to find issues with security, usability or performance.
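The random-formula comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `reference_eval` stands in for the trusted oracle (Excel, in Kaner's example), `buggy_eval` is a hypothetical product under test with a deliberately seeded defect, and formulas are simple nested tuples rather than real spreadsheet syntax.

```python
import random

def random_formula(rng, depth=3):
    """Build a random arithmetic formula as a nested tuple, or a plain integer."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randint(1, 9)
    op = rng.choice(["+", "-", "*"])
    return (op, random_formula(rng, depth - 1), random_formula(rng, depth - 1))

def reference_eval(node):
    """Stand-in for the trusted oracle (an assumption for this sketch)."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = reference_eval(left), reference_eval(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a * b

def buggy_eval(node):
    """Hypothetical product under test; it gets negative differences wrong."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = buggy_eval(left), buggy_eval(right)
    if op == "+":
        return a + b
    if op == "-":
        return abs(a - b)  # seeded defect: the sign is lost
    return a * b

def differential_test(oracle, under_test, runs=1000, seed=0):
    """Evaluate random formulas with both evaluators and collect disagreements."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(runs):
        formula = random_formula(rng)
        expected, actual = oracle(formula), under_test(formula)
        if expected != actual:
            mismatches.append((formula, expected, actual))
    return mismatches
```

Calling `differential_test(reference_eval, buggy_eval)` returns the formulas on which the two evaluators disagree. The check is only as good as the oracle, and it says nothing about security, usability or performance.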
Any serious look at AI and machine learning will ask where to apply the technology. So let's take a closer look.
Scenarios for software testing using AI
One strict definition of artificial intelligence is the use of any sort of abstract logic to simulate human intelligence. With that definition, our spreadsheet comparison is artificial intelligence. When most people use the term, however, they typically mean the ability to learn based on data -- lots and lots of data.
In the best cases, there will be a few hundred thousand examples of training data, each paired with its correct answer. Once the software reads in the examples, it can run through them again, trying to predict each answer and comparing the prediction with the actual one, and it continues to run until the predictions are good enough. The simplest example of this is probably the online version of twenty questions.
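As a rough illustration of that predict, compare and repeat loop, here is a minimal perceptron in Python. It is a toy stand-in for a real learning algorithm, not a description of how any particular product works; the function name, the 0-or-1 labels and the accuracy threshold are all assumptions made for the sketch.

```python
def train_until_good_enough(examples, target_accuracy=0.95, max_passes=1000):
    """Train a perceptron on (features, label) pairs, where label is 0 or 1.

    Repeats passes over the examples until the share of correct predictions
    in a pass reaches target_accuracy, or max_passes is exhausted.
    """
    feature_count = len(examples[0][0])
    weights = [0.0] * feature_count
    bias = 0.0

    def predict(features):
        total = sum(w * x for w, x in zip(weights, features)) + bias
        return 1 if total > 0 else 0

    accuracy = 0.0
    for _ in range(max_passes):
        correct = 0
        for features, label in examples:
            error = label - predict(features)  # compare prediction to the answer
            if error == 0:
                correct += 1
            else:
                # Nudge the model toward the correct answer.
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
        accuracy = correct / len(examples)
        if accuracy >= target_accuracy:
            break  # predictions are good enough
    return predict, accuracy
```

Given four labeled examples of a logical AND, such as `((1, 1), 1)` and `((0, 1), 0)`, the loop stops after a handful of passes. Real training data would be far larger and would be split so that accuracy is measured on examples the model has not already seen.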
This approach has a few direct applications for software testing using AI.
Automated oracles of expert systems. With enough examples, machine learning systems can predict the answer. Consider a website that takes symptoms and renders a medical diagnosis. It is tested by comparing what medical professionals would diagnose to what the software finds. The amazing thing about this approach is that the test software will eventually come to mimic the behavior of the software that it is testing -- as it has to find the correct answer for the comparison.
This leads to the possibility of a self-correcting system. That is, three different AI systems are given the same set of symptoms and asked to predict the right answer, while a fourth system compares their answers to make sure they match. NASA, for instance, has used this approach to check its software models. The primary difference will be getting this technology to learn from real data instead of from algorithms programmed by humans.
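The comparing system can be as simple as a vote. The sketch below assumes each predictor is a function that takes a set of symptoms and returns a diagnosis; the name `compare_predictions` and the result fields are hypothetical, chosen only for illustration.

```python
from collections import Counter

def compare_predictions(predictors, symptoms):
    """Ask every predictor for an answer and report how well they agree."""
    answers = [predict(symptoms) for predict in predictors]
    winner, votes = Counter(answers).most_common(1)[0]
    return {
        "answers": answers,
        # A consensus requires a strict majority; otherwise there is none.
        "consensus": winner if votes > len(answers) / 2 else None,
        "unanimous": votes == len(answers),
    }
```

Any result that is not unanimous is worth a human look: either one system has a defect, or the case is ambiguous. Agreement alone does not prove correctness, since systems trained on the same data can share the same blind spot.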
Test data generation. Using live data, such as customer records, as test data provides a close approximation of real-world conditions, giving the best chance of finding real problems. On the other hand, testing with real data could be objectionable or even a security risk; where personal privacy or health information is involved, laws may make it illegal to do so. Using fake data runs the risk of missing large categories of defects you could have found easily with real data.
This is the paradox of test data. Machine learning and AI can look at real data -- customer data, log files and the like -- and then generate data that is not real, but is real enough to matter. This might be fake customer information, credit card information, purchase orders, insurance claims and so on.
Once you've generated the test data, you may need to find the correct answer -- for example, should the fake, generated claim be paid? Given a large enough data set, machine learning may be able to tackle that as well.
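A very simple version of "learn from real records, then generate fake ones" is sketched below. It is a statistical profile, not machine learning proper, and it rests on several assumptions: records are flat dictionaries, numeric fields are non-negative amounts, and only low-cardinality fields (state, plan type) are included. Identifying fields such as names or card numbers should be excluded or replaced before profiling, because this sketch reuses observed categorical values as-is. The function names are hypothetical.

```python
import random
import statistics

def fit_profile(records):
    """Summarize each field: mean and spread for numbers, observed values otherwise."""
    profile = {}
    for field in records[0]:
        values = [record[field] for record in records]
        if all(isinstance(v, (int, float)) for v in values):
            profile[field] = ("numeric", statistics.mean(values), statistics.pstdev(values))
        else:
            # Keeping the full list preserves how often each value occurs.
            profile[field] = ("categorical", values)
    return profile

def generate_records(profile, count, seed=0):
    """Draw synthetic records field by field from the fitted profile."""
    rng = random.Random(seed)
    rows = []
    for _ in range(count):
        row = {}
        for field, spec in profile.items():
            if spec[0] == "numeric":
                _, mean, stdev = spec
                # Assumption: amounts cannot be negative, so clamp at zero.
                row[field] = round(max(0.0, rng.gauss(mean, stdev)), 2)
            else:
                row[field] = rng.choice(spec[1])
        rows.append(row)
    return rows
```

Because each field is sampled independently, correlations between fields are lost (for example, premium plans having larger claims). Capturing those relationships is where a learned model would earn its keep over a sketch like this one.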
False-error correction. One of the biggest problems with running end-to-end tests over the long term is that the software will change. After all, the job of a programmer is to create change in software. So the software did one thing yesterday and does something else today, which the test software registers as an error. The simplest example of this is a UI element that moved. Suddenly, the test software cannot find the search button or the shopping cart icon, simply because it has moved or changed.
Tools already exist that can guess and try to find a way through the user interface, even after it has changed. Others, including test.ai, can be taught to recognize things, such as a shopping cart image, even if the image looks different. This isn't unlike what Facebook or a photo app does in recognizing faces.
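One simple heuristic for finding an element after the UI has changed is to record several of its attributes and accept the closest match on the new page. The sketch below is not how test.ai or any specific tool works, since those rely on trained models; it assumes page elements are available as plain dictionaries, and the attribute names and threshold are hypothetical.

```python
def similarity(recorded, candidate):
    """Fraction of the recorded attributes that the candidate still matches."""
    matches = sum(1 for key, value in recorded.items() if candidate.get(key) == value)
    return matches / len(recorded)

def find_element(recorded, page_elements, threshold=0.5):
    """Return the best-matching element, or None if nothing is close enough."""
    best = max(page_elements, key=lambda el: similarity(recorded, el), default=None)
    if best is None or similarity(recorded, best) < threshold:
        return None
    return best
```

If a search button keeps its tag and label but gets a new `id` and CSS class, it still matches half of its recorded attributes and is found. The threshold is the trade-off: set it too low and the test clicks the wrong element, set it too high and the false errors return.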
Test idea generation. Often in testing, we face a combinatorial problem: We have 10 possible values for one condition, 10 for another, 10 for a third and 10 for a fourth -- leading to 10,000 possible test ideas. If automation is not economical, testing all 10,000 will not be either. Testing each of the 40 individual values at least once, called all singles, is probably not enough. Tools exist that aim for high coverage with few tests by making sure every pair of values appears together at least once, a technique called pairwise testing. Finding these pairs does not require learning, but it is a sort of artificial intelligence.
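A greedy pairwise generator fits in a short Python function. This is a sketch of one common approach, not the algorithm used by any particular tool: it repeatedly picks the full combination that covers the most not-yet-covered pairs, so the result covers every pair but is not guaranteed to be the smallest possible set. The function name and the dictionary input format are assumptions.

```python
from itertools import combinations, product

def pairwise_tests(parameters):
    """Pick tests greedily until every pair of values appears in some test.

    parameters maps each condition name to its list of (hashable) values.
    """
    names = list(parameters)
    uncovered = {
        ((a, value_a), (b, value_b))
        for a, b in combinations(names, 2)
        for value_a in parameters[a]
        for value_b in parameters[b]
    }
    candidates = [dict(zip(names, combo)) for combo in product(*parameters.values())]

    def pairs_of(test):
        return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}

    tests = []
    while uncovered:
        # Choose the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda test: len(pairs_of(test) & uncovered))
        tests.append(best)
        uncovered -= pairs_of(best)
    return tests
```

For four conditions with 10 values each, any pairwise set needs at least 100 tests, because two conditions alone have 100 value pairs; that is still a small fraction of the 10,000 full combinations. Scanning every candidate on each round is slow for large inputs, which is one reason dedicated tools use smarter search.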
Plenty of reasonably priced online courses are available to teach machine learning, and many people recommend just diving in. But is that the smartest approach?
The challenge with software testing using AI and machine learning is to find the use cases that will actually work -- along with the oracles. The pieces of that are simple: the test idea, the data and the expected result. AI can help in all of those cases, but it cannot magically and seamlessly fit them together. For the next decade or more, that work will remain that of the human tester.