
Managing test data in complex systems

Managing data in complex systems can pose a challenge for testers, because the data under test depends on variable and time-based information held within the system, and a change to that data affects other aspects of the application. In this tip, software expert Chris McMahon explains how to handle and test complex software systems.

Chris McMahon

Depending on the type of system you're testing, you will need to carefully consider the data you'll be using in your tests. In this article, I'll explain stateful time-based systems and show you how APIs can be used to generate the dynamic data needed to properly test your system.

Stateful and time-based systems
There is a class of software application for which managing test data is extremely difficult. Such a system manages a complex set of stateful transactions and also relies on historical data for the outcome of current operations. For example, consider an academic courseware application, where students are enrolled in sections of classes in a particular semester taught by instructors on particular days. Student grades rely on previous actions by the student, like taking tests; manipulating sections of courses for testing purposes depends on which semester is current and on which instructors are available for that particular semester. The data model is complex, and the operation of the system depends not only on what is in the system now (stateful) but also on operations that occurred in the past (time-based). Altering past data for test purposes will affect the current operation of the system.

One way to handle test data for stateful systems that are not time-based is to load the test environment with a snapshot of data from the production system. This approach guarantees that the test data will be in a valid state, without having to manipulate sets of artificial test data. An example of an application that is stateful but not time-based might be a banking system, where transactions are accomplished and recorded but do not affect future transactions, and calculations are performed based solely on the state of the data (for instance, an interest rate or an account balance) that exists at the time the calculation is made.

But that does not solve the test data problem for systems that are both stateful and time-based. In a stateful time-based system, eventually time will overtake the current set of test data, causing the test environment to become less and less accurate. For example, consider the courseware application, where the test environment contains information for a current semester. When the passage of time closes the current semester and opens a new one, the nature of all of the transactions in the past semester changes (many become invalid), and the data from the past semester now informs the operation of the current semester. Consider grade point average (GPA), which depends on the student's behavior for all enrollments in all courses.
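The way time overtakes a snapshot can be sketched in a few lines. This is only an illustration under stated assumptions: the `Semester` class, the `SNAPSHOT` list, and the dates are all hypothetical and do not come from any real courseware system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Semester:
    """Hypothetical semester record, as it might appear in a snapshot."""
    name: str
    start: date
    end: date

    def is_current(self, today: date) -> bool:
        return self.start <= today <= self.end


# Hypothetical snapshot loaded into the test environment once.
SNAPSHOT = [Semester("Fall", date(2023, 9, 1), date(2023, 12, 15))]


def open_semesters(snapshot, today):
    """Return the semesters in the snapshot that are open on the given day."""
    return [s for s in snapshot if s.is_current(today)]


# Shortly after the snapshot is taken, enrollment tests have a semester to use.
print(len(open_semesters(SNAPSHOT, date(2023, 10, 1))))  # 1
# A few months later the very same data offers no open semester at all.
print(len(open_semesters(SNAPSHOT, date(2024, 2, 1))))   # 0
```

Nothing in the snapshot changed between the two calls; only the calendar moved, which is why a static data set cannot keep a time-based test environment valid.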

Dynamic data handling
In order to test a stateful time-based system, the test framework must keep the state of the test data entirely separate from the transactions performed using that data. For example, a naive test would check that student John Smith may enroll in Section 10 of English 101. The test will pass as long as John Smith is an active student, and as long as Section 10 of English 101 is offered. But eventually John Smith will graduate and the test will fail, or the English Department will only offer Sections 1 through 9 of English 101.

Instead, a stateful time-based system demands a test framework in which "a current student" may enroll in "an available section" of "a course being offered."
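One way to picture the difference is a test that selects its data by role rather than by name. The sketch below uses a toy in-memory model; `Student`, `Section`, `enroll`, and `check_current_student_can_enroll` are hypothetical names invented for illustration, not part of any real framework.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    name: str
    active: bool


@dataclass
class Section:
    course: str
    number: int
    offered: bool
    roster: list = field(default_factory=list)


def enroll(student, section):
    """Toy enrollment rule: only active students, only offered sections."""
    if not student.active:
        raise ValueError("student is not active")
    if not section.offered:
        raise ValueError("section is not offered")
    section.roster.append(student.name)


def check_current_student_can_enroll(students, sections):
    # Pick "a current student" and "an available section" by role, not by name.
    student = next(s for s in students if s.active)
    section = next(s for s in sections if s.offered)
    enroll(student, section)
    assert student.name in section.roster
    return student, section


# John Smith has graduated and Section 10 is gone, yet the check still passes.
students = [Student("John Smith", active=False), Student("Sally Jones", active=True)]
sections = [
    Section("English 101", 10, offered=False),
    Section("Introduction to Geology", 3, offered=True),
]
student, section = check_current_student_can_enroll(students, sections)
print(student.name, "->", section.course, "section", section.number)
```

A test hard-coded to John Smith and Section 10 would raise `ValueError` against this same data, even though the enrollment transaction itself works correctly.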

For a unit test, such test data may be quickly set up and torn down. But for a UI-level test that might itself take only one second, creating a valid student in a valid section of a valid course in the current semester, and then deleting everything at the end of the test, quickly becomes prohibitively expensive.

One valid approach to retrieving dynamic test data is to query the database for valid data at the time the test runs. This approach has been effective in the past, when databases were relatively simple and relatively stable. But in a world where new kinds of databases are replacing SQL, in a world where Object Relational Mapping (ORM) systems may alter data relationships without human intervention, querying databases directly becomes more expensive and less attractive.
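As a rough sketch of the query-at-run-time approach, the example below uses Python's built-in `sqlite3` module with an in-memory database. The table names, columns, and the `find_enrollable_pair` helper are assumptions made up for this illustration; a real courseware schema would be far more involved.

```python
import sqlite3

# Hypothetical, heavily simplified schema standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
CREATE TABLE sections (id INTEGER PRIMARY KEY, course TEXT,
                       number INTEGER, semester TEXT);
INSERT INTO students (name, active) VALUES ('John Smith', 0), ('Sally Jones', 1);
INSERT INTO sections (course, number, semester) VALUES
    ('English 101', 10, 'past'),
    ('Introduction to Geology', 3, 'current');
""")


def find_enrollable_pair(conn):
    """Ask the database, at test time, for any valid student and section."""
    student = conn.execute(
        "SELECT name FROM students WHERE active = 1 LIMIT 1"
    ).fetchone()
    section = conn.execute(
        "SELECT course, number FROM sections WHERE semester = 'current' LIMIT 1"
    ).fetchone()
    return student, section


student, section = find_enrollable_pair(conn)
print(student, section)
```

The weakness is visible even here: the test now knows the column names and the meaning of `active` and `semester`, so any schema change breaks the test's queries rather than the application.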

APIs for dynamic test data
As the world becomes more interconnected, more applications are exposing Application Programming Interfaces (APIs). Some well-known API styles are Remote Procedure Call (RPC) systems such as CORBA, the Common Object Request Broker Architecture; SOAP, which was once an abbreviation for Simple Object Access Protocol (SOAP stopped being simple a long time ago, and now SOAP is just a word); and REST, Representational State Transfer. APIs for the system under test are often excellent sources of dynamic test data. And since creating APIs is relatively inexpensive in modern code bases, it is often far less expensive to create custom APIs for testing than to maintain SQL queries or loadable test data for stateful time-based systems.

While any API protocol will serve to retrieve or generate dynamic test data, REST is of particular interest because it is so simple and so universal. In a RESTful API, a client issues an HTTP request to a web server using one of the standard HTTP methods: GET, PUT, POST, or DELETE. The server will either manipulate the state of the system in the way the client has requested, or will return a standard HTTP error. The difference between a RESTful API and a normal web interface is that a REST client uses a Uniform Resource Identifier (URI) to address a service provided by the system, instead of a Uniform Resource Locator (URL) to address a document.
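The request-and-response behavior described here can be tried out with nothing but the Python standard library. The sketch below starts a throwaway server on the local machine and issues two GET requests against it. The `/students` path, the `STUDENTS` data, and the `get` helper are assumptions for illustration only; they are not the API of any real courseware product.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

STUDENTS = [{"name": "John Smith"}, {"name": "Sally Jones"}]  # hypothetical data


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/students":  # hypothetical resource path
            body = json.dumps(STUDENTS).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)  # a standard HTTP error for unknown resources

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


# Bypass any configured proxy so the request goes straight to the local server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def get(url):
    """Issue a GET and return (status code, parsed JSON body or None)."""
    request = urllib.request.Request(url, method="GET")
    try:
        with opener.open(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as error:
        return error.code, None


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS choose
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

ok = get(base + "/students")
missing = get(base + "/no-such-resource")
server.shutdown()
server.server_close()
print(ok)
print(missing)
```

The first call yields a 200 status with the JSON list; the second yields a 404, which is the kind of standard HTTP error a test framework can check for without knowing anything about the server's internals.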

So to implement the example, the test framework might issue a GET request to, and would receive from the service a list of current students from which it would extract "John Smith". (REST APIs typically use XML or JSON data formats.) Then the test might issue a GET request for, from which it might extract "English 101." With that information in hand, the test framework could issue a PUT or POST request (depending on how the service is implemented) for, with a payload containing "English 101," so that the service knows that the framework needs a current section for a particular course. Now the test framework is in a position to validate that John Smith may enroll in Section 10 of English 101.

But this is dynamic data. The next time the test runs it might enroll Sally Jones in Section 3 of Introduction to Geology, and still pass. Using APIs, the nature of the test data is completely separate from the transaction being tested.
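The sequence of calls described above might look something like the following sketch. To keep it runnable without a network, an in-memory class stands in for the service; `FakeCoursewareApi`, its method names, and `run_enrollment_test` are all hypothetical, and a real framework would issue the corresponding HTTP requests instead.

```python
import random


class FakeCoursewareApi:
    """In-memory stand-in for a hypothetical courseware test API."""

    def __init__(self):
        self._students = ["John Smith", "Sally Jones"]
        self._sections = {"English 101": [10], "Introduction to Geology": [3]}
        self.enrollments = set()

    def get_students(self):
        # Stands in for a GET that lists current students.
        return list(self._students)

    def get_courses(self):
        # Stands in for a GET that lists courses being offered.
        return list(self._sections)

    def request_section(self, course):
        # Stands in for a PUT or POST whose payload names the course.
        return self._sections[course][0]

    def enroll(self, student, course, section):
        # The transaction under test.
        self.enrollments.add((student, course, section))
        return True


def run_enrollment_test(api, rng=random):
    """Enroll whichever valid student and section the API hands back."""
    student = rng.choice(api.get_students())
    course = rng.choice(api.get_courses())
    section = api.request_section(course)
    assert api.enroll(student, course, section)
    assert (student, course, section) in api.enrollments
    return student, course, section


api = FakeCoursewareApi()
print(run_enrollment_test(api))
print(run_enrollment_test(api))
```

The two runs may or may not pick the same student and course. The assertions do not care, because they check the enrollment transaction rather than any particular name.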

Systems with complex data, like stateful time-based systems such as this courseware example, demand a thoughtful approach to managing test data. Creating APIs for test purposes is an excellent approach to managing such test data. And since more systems are becoming available to other systems via APIs, it may very well be that using APIs to manage test data for such systems is not only the most reliable approach but also less expensive than alternatives such as direct access to the database.

About the author: Chris McMahon is a software tester and former professional bass player. His background in software testing is both deep and wide, having tested systems from mainframes to web apps, from the deepest telecom layers and life-critical software to the frothiest eye candy. Chris has been part of the greater public software testing community since about 2004, both writing about the industry and contributing to open source projects like Watir, Selenium, and FreeBSD. His recent work has been to start the process of prying software development from the cold, dead hands of manufacturing and engineering into the warm light of artistic performance. A dedicated agile telecommuter on distributed teams, Chris lives deep in the remote Four Corners area of the U.S. Luckily, he has email: 
