
How to create test data that yields accurate QA results

Testers need valid data to prevent app defects, which means they must mimic real data types and structures as closely as possible. Don't fake it when you make it.

If you want valid test results, you have to use valid data. But this is an element of software testing that QA teams typically bypass. Many of these teams struggle with automated and manual test execution, because both rely on unreliable data.

Why does dummy test data yield unreliable results? If you don't create test data that simulates the type and structure of the real thing, there will be significant gaps in testing accuracy. You won't discover hidden defects in application code until it hits production and executes tasks with real data -- and that's not when you want to find a defect.

Secured, but realistic, test data also helps QA teams spot errors in application functionality more definitively. This informed way of working increases test execution speed and accuracy. When you create realistic test data up front rather than improvising examples during testing, you free up more time to focus on vetting the actual user functions of an application.

While it's not easy to procure reliable, valid test data, if you have sound data in these three areas, you'll have accurate, efficient QA tests.

User demographic data

Application users typically provide information for a user profile, such as address, phone number, email address, gender and date of birth. For security reasons, the company must scrub or scramble that personal data when it is outside the confines of the application stack. Live data is off-limits for QA testing. But QA testers need the same data types, input in a way that mimics a user or automated action, to ensure the application functions as intended.
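One way to keep the data type and format while removing the personal values is to scramble each field in place. The following is a minimal sketch, not a vetted anonymization scheme: the function names `scramble_digits` and `scramble_email`, the salt value and the sample phone number are all hypothetical and chosen only for illustration.

```python
import hashlib


def scramble_digits(value: str, salt: str = "qa-salt") -> str:
    """Replace every digit with a deterministic substitute digit.

    Punctuation, spacing and length are preserved, so the scrambled value
    still looks like a phone number or ZIP code to the application.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    out = []
    used = 0
    for ch in value:
        if ch.isdigit():
            # Map one hex character of the digest to a digit 0-9.
            out.append(str(int(digest[used % len(digest)], 16) % 10))
            used += 1
        else:
            out.append(ch)
    return "".join(out)


def scramble_email(email: str, salt: str = "qa-salt") -> str:
    """Swap the address for a stable fake one on the reserved example.com domain."""
    token = hashlib.sha256((salt + email).encode("utf-8")).hexdigest()[:8]
    return f"user_{token}@example.com"


phone = "(555) 010-4477"
masked_phone = scramble_digits(phone)
masked_email = scramble_email("jane.doe@company.test")
```

Because the substitution is deterministic, the same source value always maps to the same scrambled value, which keeps related records consistent across a refreshed data set.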

Each data type includes behavioral characteristics that react differently within an application. For example, dates are a special data type, because they are written in multiple formats, the use of which can dictate their values and display order. Even simple numeric data types, such as ZIP codes and phone numbers, have distinct formatting to handle data variations and localization display differences. If the test systems do not use the same data definitions as the live production system, you will miss defects that stem from differences in the data type. Some of these bugs can be fixed with simple code changes, while others require large recoding efforts.
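A short Python sketch shows how the same input behaves differently depending on how the data type is defined. The sample values are invented for illustration; the point is only that a mismatch between test and production definitions changes the result.

```python
from datetime import datetime

# The same text is a different date depending on the format definition.
raw_date = "03/04/2018"
us_date = datetime.strptime(raw_date, "%m/%d/%Y").date()  # month first
eu_date = datetime.strptime(raw_date, "%d/%m/%Y").date()  # day first

# A ZIP code looks numeric, but storing it as a number drops the leading zero.
zip_text = "02134"
zip_as_int = int(zip_text)
zip_round_trip = str(zip_as_int)
```

Here `us_date` falls in March and `eu_date` in April, and `zip_round_trip` is `"2134"` rather than `"02134"`. A test system that stores ZIP codes as integers, while production stores them as text, would never surface that defect.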

QA testers should ask these questions about the data:

  • Where does it come from?
  • How is it defined in the coding language?
  • Is it entered directly into the system by the end user, or is it automatically loaded?

All of this information influences the data's behavior, which makes the answers critical for accurate testing.

Along with data type definitions, you'll need a variety of test data. Use data that caused defects in the past, not just clean data. Also, add negative testing data to help identify defects before release. While this makes for more work, realistic demographic data is a foundation for a high-quality user experience.
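As a rough illustration of mixing clean and negative data, the sketch below runs one hypothetical function, `parse_birth_date`, against both kinds of input. The function, the fixed `TODAY` value and the case lists are assumptions made for this example, not part of any particular application.

```python
from datetime import date, datetime


def parse_birth_date(text: str, today: date) -> date:
    """Hypothetical function under test: accept YYYY-MM-DD, reject future dates."""
    parsed = datetime.strptime(text, "%Y-%m-%d").date()  # ValueError on bad input
    if parsed > today:
        raise ValueError("date of birth is in the future")
    return parsed


TODAY = date(2018, 12, 1)  # fixed so the test data never goes stale

# Clean data, including a leap day and a boundary value.
VALID_CASES = ["1990-05-17", "2000-02-29", "2018-12-01"]

# Negative data: empty input, impossible day, wrong format, future date, junk.
INVALID_CASES = ["", "1990-02-30", "17/05/1990", "2019-01-01", "not a date"]


def run_cases() -> list:
    """Return a description of every case that behaved unexpectedly."""
    failures = []
    for text in VALID_CASES:
        try:
            parse_birth_date(text, TODAY)
        except ValueError:
            failures.append(f"rejected valid input {text!r}")
    for text in INVALID_CASES:
        try:
            parse_birth_date(text, TODAY)
            failures.append(f"accepted invalid input {text!r}")
        except ValueError:
            pass  # rejection is the expected outcome
    return failures


failures = run_cases()
```

Inputs that caused defects in earlier releases would be appended to these lists, so each past failure stays covered in later test runs.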

Critical data

Financial and medical data is more complex than user demographic data, and apps scramble or scrub it as well. QA testers must also replicate this data type for testing. Applications that perform financial calculations or display accounting data, though, might not always use plain numeric data types. This data might have additional definitions that perform differently based on how the applications use input data. For example, U.S. dollars display and calculate differently from euros or other monetary types.
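The calculation side of this can be sketched in Python by comparing binary floating-point values with the standard library's `decimal.Decimal` type. This assumes, purely for illustration, that production stores money as a decimal type while the test data was loaded as floats; the helper `round_money` is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent most decimal fractions exactly.
float_total = 0.10 + 0.20
decimal_total = Decimal("0.10") + Decimal("0.20")


def round_money(amount: Decimal) -> Decimal:
    """Round to two decimal places, with halves rounded up."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


float_rounded = round(2.675, 2)                   # float stored slightly below 2.675
decimal_rounded = round_money(Decimal("2.675"))   # exact decimal value
```

With these values, `float_total` does not equal `0.30` while `decimal_total` equals `Decimal("0.30")`, and the two rounding calls disagree by a cent. Test data defined with the wrong type would hide or invent exactly this kind of discrepancy.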

Accounting apps depend on valid data to generate accurate calculations that have a profound impact on the business. Businesses rely on those results to file taxes, among other tasks, and application defects that lead to calculation errors could create a domino-like failure that results in business fines or fees.

Similarly, applications often display medical data in charts or graphs for analytical reasons. If you create test data to verify these functions and it's not the same data type that the application uses, you can't reliably detect defects. These app functions might display and process the test data without errors and then fail in production due to a distinguishing data type characteristic. Many medical apps rely on calculations to prescribe the correct care to patients. These apps use medical data such as a patient's weight, height and date of birth to calculate a safe medication dosage. The right data can mean life or death.

Refreshable, resettable data

Finally, when you create test data for a QA test server or system, make sure you can refresh or reset it to a defined baseline. If you automate tests, the data must return to the pretest state before each execution. Otherwise, the QA team wastes countless hours trying to manipulate the automated tests, either by coding in the available data or creating new data.
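A minimal sketch of a resettable baseline, using Python's built-in `sqlite3` module with an in-memory database. The `users` table, its columns and the two baseline rows are hypothetical; a real QA system would restore its own schema and data set in the same spirit.

```python
import sqlite3

# Hypothetical baseline: the known state every test run should start from.
BASELINE_ROWS = [(1, "Ada", "02134"), (2, "Grace", "10001")]


def reset_to_baseline(conn: sqlite3.Connection) -> None:
    """Drop and rebuild the table so the data returns to the pretest state."""
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, zip TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO users (id, name, zip) VALUES (?, ?, ?)", BASELINE_ROWS
    )
    conn.commit()


def fetch_users(conn: sqlite3.Connection) -> list:
    """Return all rows in a stable order for comparison."""
    return conn.execute("SELECT id, name, zip FROM users ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
reset_to_baseline(conn)

# A test changes the data...
conn.execute("DELETE FROM users WHERE id = 1")
after_test = fetch_users(conn)

# ...and the next run starts from the same baseline again.
reset_to_baseline(conn)
after_reset = fetch_users(conn)
```

Calling `reset_to_baseline` before each automated run means no test depends on what an earlier test left behind.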

Alternatively, if the QA team performs manual tests, a refreshable data set provides a faster, easier way to recognize errors. Rather than try to determine if the data is accurate, a manual tester can use the same data over and over to spot changes or errors in application functionality with more efficiency. It's easier to spot a defect in a known data set than with constantly changing data.

This was last published in December 2018
