Manage Learn to apply best practices and optimize your operations.

How to test a data warehouse

Testing a data warehouse is not very different from a typical testing project. Baher Malek advises testers on creating robust test cases and basing test design and execution on context.

This testing tip was prompted by a reader question about testing a data warehouse. Asked the reader, "How do you relate requirements and quality versus testing in a data warehouse environment?"

As much as I would want you to believe that testing a data warehouse is a wondrous and mysterious process, it's really not that different than any other testing project. The basic system analysis and testing process still applies. Let's review a few of these steps and how they fit within a data warehouse context:

Analyze source documentation
As with many other projects, when testing a data warehouse implementation, there is typically a requirements document of some sort. These documents can be useful for basic test strategy development, but often lack the details to support test development and execution. Many times there are other documents, known as source-to-target mappings, which provide much of the detailed technical specifications. These source-to-target documents specify where the data is coming from, what should be done to the data, and where it should get loaded. If you have it available, additional system-design documentation can also serve to guide the test strategy.

Develop strategy and test plans
As you analyze the various pieces of source documentation, you'll want to start to develop your test strategy. I've found that from a lifecycle and quality perspective it's often best to seek an incremental testing approach when testing a data warehouse. This essentially means that the development teams will deliver small pieces of functionality to the test team earlier in the process. The primary benefit of this approach is that it avoids an overwhelming "big bang" type of delivery and enables early defect detection and simplified debugging. In addition, this approach serves to set up the detailed processes involved in development and testing cycles. Specific to data warehouse testing this means testing of acquisition staging tables, then incremental tables, then base historical tables, BI views and so forth.

Another key data warehouse test strategy decision is the analysis-based test approach versus the query-based test approach. The pure analysis-based approach would put test analysts in the position of mentally calculating the expected result by analyzing the target data and related specifications. The query-based approach involves the same basic analysis but goes further to codify the expected result in the form of a SQL query. This offers the benefit of setting up a future regression process with minimal effort. If the testing effort is a one time effort, then it may be sufficient to take the analysis-based path since that is typically faster. Conversely, if the organization will have an ongoing need for regression testing, then a query-based approach may be appropriate.

Testing resources
How to do integration testing

Test coverage: Finding all the defects in your application 

How to design test cases

Test development and execution
Depending on the stability of the upstream requirements and analysis process it may or may not make sense to do test development in advance of the test execution process. If the situation is highly dynamic, then any early tests developed may largely become obsolete. In this situation, an integrated test development and test execution process that occurs in real time can usually yield better results. In any case, it is helpful to frame the test development and execution process with guiding test categories. For example, a few data warehouse test categories might be:

  • record counts (expected vs. actual)
  • duplicate checks
  • reference data validity
  • referential integrity
  • error and exception logic
  • incremental and historical process
  • control column values and default values

In addition to these categories, defect taxonomies like Larry Greenfield's may also be helpful.

An often forgotten aspect of execution is an accurate status reporting process. Making sure the rest of the team understands your approach, the test categories and testing progress will ensure that the rest of the team is clear on the testing status. With some careful planning, follow-through and communication, a data warehouse testing process can be set up to guide the project team to a successful release.

About the author: Baher Malek is currently a principal quality and test engineer for a Fortune 100 company where he helps traditional project teams adopt agile approaches to software testing. Baher is also a frequent attendee and speaker at the Indianapolis Workshops on Software Testing and the Workshop on Open Certification for Software Testers. You can contact Baher via email at [email protected]

Dig Deeper on Topics Archive