Most decision-support projects are all data, all the time. Typical work includes creating the logical and physical structure of a data warehouse and populating it using extract-transform-load (ETL) processes, creating fixed- or variable-content reports, and structuring and populating data marts or target files for ad hoc business intelligence and analytics activities.
These data-centric systems sound straightforward, but they seldom are. You know this if you’ve been assailed at the end of a project by business users who are unhappy with the data you’ve delivered. Business users commonly dictate which specific data elements they want and then, once you provide exactly that data, change their minds. In this tip, you will learn ways to improve your success with data-centric projects by exploring how the business people will use the data to make decisions.
Primary techniques for requirements analysis
From a requirements and analysis standpoint, the primary techniques used in these projects are data definition and modeling, data quality sampling, and source-to-target mapping. The guiding principle is that any and all uses of the data will be possible as long as the data structures are optimal and the data itself is as clean as it can be. Consequently, thinking about how the business people will use the data to make decisions is usually considered out of scope.
It's risky to focus on data without considering how it will be used. Data and the way it is processed and used are not separate, siloed disciplines; each informs the other. If you’ve done a perfect job loading, cleaning, and providing access to the data but haven’t considered the context in which it will be used, you may still fail to meet the business needs that motivated the project in the first place.
The key is to work with your business stakeholders and subject matter experts (SMEs) to identify additional data needs by looking at how the data will be used.
Identify the business use cases enabled by the project
On the surface, the use cases for data-centric projects seem trivial. For example, almost every data-centric project would have a use case called “Transform Data.” It would have one generic step: “Move source to target. See ‘Attachment A,’” where “Attachment A” contains the mapping rules. Of course, such use cases have very limited value. It’s better to focus on the use cases that reflect what the business users are going to do with the data. For data-centric projects, these business use cases often have a theme related to research or evaluation. Here are examples:
- Determine products to offer
- Determine product pricing
- Define marketing campaigns
- Determine cross-sell and up-sell strategy
- Improve service rep productivity
- Determine service level agreements
- Determine which customers to retain
- Detect fraud (internal or external)
When you work with your business colleagues to understand the data context, do not write detailed versions of these use cases. Instead, simply name them and give each one a very brief description. Then use them as a starting point for further analysis and project planning.
Create detailed scenarios to represent concrete examples of how the data will be used
Suppose you want to write a detailed scenario for “Detect Fraud” in an insurance company; you might write, “Detect potential fraud when a customer files a claim for more than x dollars and changes address within y days.” Scenarios like this one will help you identify additional data that is needed and rules that need to be considered. Here are examples of what you can learn by working through this kind of detailed scenario:
- Additional data elements are needed and are available from the sources.
- Business rules are needed to define thresholds or limits (like the “x” and “y” in the example) or to specify other kinds of calculations needed as part of making the data available.
- To be of value for decision making, data elements must be available concurrently. Imagine that the sample scenario is very important for the insurance company. If the team members don’t analyze how the data is actually used, they might decide to release data by subject area (in this case, claim data and customer data) and deliver claim data in the first release and customer data in the next. If they do that, the company will not be able to detect this kind of fraud until the second release, because the scenario requires both kinds of data.
- Data comes from multiple sources, and you need to clarify which is the authoritative source.
- Data needs to be more clearly defined or mapped.
- Data is missing from the sources. When you discover that data is missing, the business stakeholders must decide whether they are willing to fund efforts to obtain the data, or whether it is acceptable to build the application without the data.
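To make the fraud scenario concrete, here is a minimal Python sketch of the rule. The record types, field names, and the `flag_potential_fraud` function are hypothetical, and the parameters `x_dollars` and `y_days` stand in for the x and y thresholds the business must define. Writing the rule down also surfaces questions for SMEs: this sketch assumes that "within y days" means either before or after the filing date, which is exactly the kind of business rule that needs confirming.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record types; the article does not define a schema.
@dataclass
class Claim:
    claim_id: str
    customer_id: str
    amount: float
    filed_on: date


@dataclass
class AddressChange:
    customer_id: str
    changed_on: date


def flag_potential_fraud(claims, address_changes, x_dollars, y_days):
    """Return IDs of claims above x_dollars whose customer changed address
    within y_days of the filing date (before or after: an assumption)."""
    # Index address-change dates by customer so each claim can be matched.
    changes_by_customer = {}
    for change in address_changes:
        changes_by_customer.setdefault(change.customer_id, []).append(change.changed_on)

    flagged = []
    for claim in claims:
        if claim.amount <= x_dollars:
            continue  # at or below the business-defined threshold x
        for changed_on in changes_by_customer.get(claim.customer_id, []):
            if abs((changed_on - claim.filed_on).days) <= y_days:
                flagged.append(claim.claim_id)
                break  # one qualifying address change is enough
    return flagged


# Illustrative data only.
claims = [
    Claim("C1", "A", 12_000, date(2024, 3, 1)),
    Claim("C2", "B", 15_000, date(2024, 3, 1)),
    Claim("C3", "A", 500, date(2024, 3, 5)),
]
changes = [
    AddressChange("A", date(2024, 3, 10)),
    AddressChange("B", date(2024, 6, 1)),
]
print(flag_potential_fraud(claims, changes, x_dollars=10_000, y_days=30))  # ['C1']
```

The function cannot run at all unless both claim records and customer address-change records are available, which is the concurrency point above expressed in code.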
If you prefer, you can start directly with the scenarios and eventually roll them up into the business use cases. To prioritize the data, you will need both the use cases and the scenarios.
Prioritize the business usage represented by the use cases and scenarios
Prioritizing the business use cases helps you manage stakeholders’ expectations. In the insurance company example, the stakeholders may need to decide whether delivering data to support fraud detection is more pressing than delivering data to support product definition. Prioritizing the detailed scenarios then helps you decide which data to deliver in each release.
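One way to connect priorities to release planning is sketched below in Python. The scenario names, use cases, priority numbers, and subject areas are illustrative assumptions, as are the helper functions `data_for_release` and `scenarios_by_use_case`. The idea is that a release must include every subject area needed by the scenarios it promises to support, and that detailed scenarios can be rolled up under their business use cases.

```python
from dataclasses import dataclass


# Hypothetical scenario record; names, priorities, and subject areas are illustrative.
@dataclass(frozen=True)
class Scenario:
    name: str
    use_case: str
    priority: int            # 1 = most pressing, as agreed with stakeholders
    data_needed: frozenset   # subject areas the scenario needs at the same time


def data_for_release(scenarios, max_priority):
    """Subject areas a release must contain so that every scenario with
    priority <= max_priority is usable in that release."""
    needed = set()
    for scenario in scenarios:
        if scenario.priority <= max_priority:
            needed |= scenario.data_needed
    return needed


def scenarios_by_use_case(scenarios):
    """Roll detailed scenarios up into their business use cases."""
    rollup = {}
    for scenario in scenarios:
        rollup.setdefault(scenario.use_case, []).append(scenario.name)
    return rollup


scenarios = [
    Scenario("Large claim plus recent address change", "Detect fraud", 1,
             frozenset({"claim", "customer"})),
    Scenario("Uptake of new products by region", "Determine products to offer", 2,
             frozenset({"policy", "customer"})),
]
print(sorted(data_for_release(scenarios, max_priority=1)))  # ['claim', 'customer']
print(scenarios_by_use_case(scenarios))
```

With only the priority-1 fraud scenario in scope, claim and customer data have to ship together rather than being split into separate releases by subject area.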
Use scenarios to define acceptance tests for the usage of the data
In an operational system, user acceptance testing validates whether expected results are obtained for a given set of conditions. For reporting, business intelligence, and analytics activities, the results are rarely known in advance. That’s probably another reason why data-centric projects usually restrict their scope and their tests to proving that the project has delivered clean copies of the requested data. On the other hand, if you build acceptance tests based on the detailed scenarios, the business users can provide feedback on whether they can work with the data to discover the results, whatever they may be.
What does this latter approach mean for testing? You still need the classic data quality and mapping tests that are mandatory for these projects, but the business stakeholders and SMEs also need to test the detailed scenarios. You probably won’t be able to cover every possible scenario, so focus on the ones that will let the business provide the feedback needed to steer the project to a successful conclusion.
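For contrast, the classic checks can be sketched as follows. This Python example assumes a hypothetical mapping document expressed as one rule per target column (`MAPPING`), a hypothetical list of required columns, and source and target rows that line up one to one in the same order; real projects usually match rows on keys instead. It reconciles row counts, verifies each mapping rule, and checks that required columns are populated.

```python
# Hypothetical mapping document: target column -> rule applied to a source row.
MAPPING = {
    "customer_id": lambda row: row["CUST_NO"].strip(),
    "postal_code": lambda row: row["ZIP"].strip().upper(),
}
REQUIRED_COLUMNS = ["customer_id"]


def check_load(source_rows, target_rows):
    """Return failure messages; an empty list means all checks passed.
    Assumes source and target rows are in the same order."""
    failures = []
    # Reconciliation: every source row should arrive in the target.
    if len(source_rows) != len(target_rows):
        failures.append(
            f"row count mismatch: {len(source_rows)} source vs {len(target_rows)} target"
        )
    # Mapping: each target value should equal its rule applied to the source row.
    for i, (src, tgt) in enumerate(zip(source_rows, target_rows)):
        for column, rule in MAPPING.items():
            expected = rule(src)
            if tgt.get(column) != expected:
                failures.append(f"row {i}: {column} expected {expected!r}, got {tgt.get(column)!r}")
    # Data quality: required columns must be populated.
    for i, tgt in enumerate(target_rows):
        for column in REQUIRED_COLUMNS:
            if not tgt.get(column):
                failures.append(f"row {i}: required column {column} is empty")
    return failures


source = [{"CUST_NO": " 101 ", "ZIP": "ab1 2cd"}]
target = [{"customer_id": "101", "postal_code": "AB1 2CD"}]
print(check_load(source, target))  # []
```

Passing these checks shows that the data was copied and cleaned as specified; it says nothing about whether a business user can actually work through a scenario such as fraud detection.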
The Given/When/Then format used in Behavior-Driven Development (BDD) is a powerful way to structure user acceptance tests. Writing these tests can also uncover additional data needs and help define the data selection criteria (queries) that business users apply when they work with the data.
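As an illustration, the fraud scenario can be written as Given/When/Then acceptance tests. The sketch below uses plain Python functions with Given/When/Then comments rather than a BDD tool; tools such as Cucumber or behave express the same structure as Gherkin text. The `select_suspicious_claims` function stands in for a business user's hypothetical selection query, the record fields are assumptions, and each customer record is assumed to carry a single address-change date for simplicity.

```python
from datetime import date


def select_suspicious_claims(claims, customers, x_dollars, y_days):
    """Hypothetical selection criteria: claims above x_dollars whose customer
    changed address within y_days of filing (either direction, an assumption).
    Assumes each customer record carries a single address-change date."""
    changed_on_by_customer = {c["customer_id"]: c["address_changed_on"] for c in customers}
    flagged = []
    for claim in claims:
        changed_on = changed_on_by_customer.get(claim["customer_id"])
        if (
            claim["amount"] > x_dollars
            and changed_on is not None
            and abs((changed_on - claim["filed_on"]).days) <= y_days
        ):
            flagged.append(claim["claim_id"])
    return flagged


def test_large_claim_with_recent_address_change_is_flagged():
    # Given a customer who changed address 10 days after filing a 20,000 claim
    customers = [{"customer_id": "A", "address_changed_on": date(2024, 5, 11)}]
    claims = [{"claim_id": "C1", "customer_id": "A", "amount": 20_000,
               "filed_on": date(2024, 5, 1)}]
    # When the business user selects claims with x = 10,000 and y = 30
    flagged = select_suspicious_claims(claims, customers, x_dollars=10_000, y_days=30)
    # Then the claim appears in the results
    assert flagged == ["C1"]


def test_claim_below_threshold_is_not_flagged():
    # Given the same address change but a claim of only 5,000
    customers = [{"customer_id": "A", "address_changed_on": date(2024, 5, 11)}]
    claims = [{"claim_id": "C2", "customer_id": "A", "amount": 5_000,
               "filed_on": date(2024, 5, 1)}]
    # When the business user selects claims with x = 10,000 and y = 30
    flagged = select_suspicious_claims(claims, customers, x_dollars=10_000, y_days=30)
    # Then no claim is returned
    assert flagged == []


test_large_claim_with_recent_address_change_is_flagged()
test_claim_below_threshold_is_not_flagged()
print("scenario tests passed")
```

Each Given step doubles as a specification of the data the test needs, which is how writing these tests can expose data that is missing or not yet scheduled for delivery.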