DATA QUALITY STRATEGY: A STEP-BY-STEP APPROACH
“Strategy is a cluster of decisions centered on goals that determine what actions to take and how
to apply resources.”
Certainly a cluster of decisions – in this case concerning six specific factors
– will need to be made to effectively improve the data. Corporate goals will determine
how the data is used and the level of quality needed. Actions are the processes improved and invoked
to manage the data. Resources are the people, systems, financing, and the data itself.
We can therefore apply the selected definition in the context of data, and arrive at the definition
of data quality strategy:
“A cluster of decisions centered on organizational data quality goals that determine the data
processes to improve, solutions to implement, and people to engage.”
EXECUTIVE SUMMARY
This paper will discuss:
• Goals that drive a data quality strategy
• Six factors that should be considered when building a strategy
• Decisions within each factor
• Actions stemming from those decisions
• Resources affected by the decisions and needed to support the actions.
You will see how these six factors, when combined in different ways, show that people, process,
and technology are the integral and fundamental elements of information quality.
GOALS OF DATA QUALITY
Goals drive strategy. Data quality goals must support ongoing functional operations, data management
processes, or other initiatives such as the implementation of a new data warehouse (DW), CRM
application, or loan processing system.
THE SIX FACTORS OF DATA QUALITY
When creating a data quality strategy, there are six factors, or aspects of an organization’s operations,
that must be considered. Those six factors are:
• Context — the type of data being cleansed and the purposes for which it is used
• Storage — where the data resides
• Data Flow — how the data enters and moves through the organization
• Work Flow — how work activities interact with and use the data
• Stewardship — people responsible for managing the data
• Continuous Monitoring — processes for regularly validating the data
Figure 1 depicts the six factors centered on the goals of a data quality initiative, and shows that each
factor requires decisions to be made, actions to be carried out, and resources to be allocated.
TYING IT ALL TOGETHER
In order for any strategy framework to be useful and effective it must be scalable. The strategy framework
provided here is scalable from a simple one-field update such as validating gender codes of male and
female, to an enterprise-wide initiative where 97 ERP systems need to be cleansed and consolidated into 1
system. To ensure the success of the strategy, and hence the project, each of the six factors must be
evaluated. The size (number of records/rows) and scope (number of databases, tables, and columns)
determine the depth to which each factor is evaluated.
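
At the small end of that scale, the one-field gender code update could look something like the
sketch below. It is only an illustration: the accepted codes, the field name "gender", and the
function name are assumptions made for the example, not part of the strategy framework.

```python
# Hypothetical set of accepted codes; a real project would take these
# from its own data standard.
VALID_GENDER_CODES = {"M", "F"}


def validate_gender_codes(records, field="gender"):
    """Split records into (valid, invalid) lists based on the accepted codes.

    Valid records are returned with the code trimmed and upper-cased;
    invalid records are returned unchanged so they can be reviewed.
    """
    valid, invalid = [], []
    for record in records:
        raw = record.get(field)
        code = raw.strip().upper() if isinstance(raw, str) else None
        if code in VALID_GENDER_CODES:
            valid.append({**record, field: code})
        else:
            invalid.append(record)
    return valid, invalid


# Illustrative sample data, invented for this example.
sample = [
    {"id": 1, "gender": "m"},
    {"id": 2, "gender": "F "},
    {"id": 3, "gender": "X"},
    {"id": 4, "gender": None},
]
valid, invalid = validate_gender_codes(sample)
print(len(valid), "valid;", len(invalid), "need review")
```

Even for a change this small, the six factors still apply: someone has to decide which codes are
acceptable (Context), where the field lives (Storage), and who reviews the rejected records
(Stewardship).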
Taken all together or in smaller groups, the six factors act as operands in data quality strategy formulas.
• Context by itself = The type of cleansing algorithms needed
• Context + Storage + Data Flow + Work Flow = The types of cleansing and monitoring
technology implementations needed
• Stewardship + Work Flow = Near-term personnel impacts
• Stewardship + Work Flow + Continuous Monitoring = Long-term personnel impacts
• Data Flow + Work Flow + Continuous Monitoring = Changes to processes
Using these formulas, people come to understand that information quality truly is the
integration of people, process, and technology in the pursuit of deriving value from information assets.
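
One way to make the formulas above concrete is to treat each one as a set of operand factors
mapped to an outcome. The sketch below is a reading aid only; the names FORMULAS and
answerable_questions are hypothetical, and the outcomes are copied from the list above.

```python
# Each formula: the set of factors it combines -> what that combination determines.
FORMULAS = {
    frozenset({"Context"}):
        "Type of cleansing algorithms needed",
    frozenset({"Context", "Storage", "Data Flow", "Work Flow"}):
        "Types of cleansing and monitoring technology implementations needed",
    frozenset({"Stewardship", "Work Flow"}):
        "Near-term personnel impacts",
    frozenset({"Stewardship", "Work Flow", "Continuous Monitoring"}):
        "Long-term personnel impacts",
    frozenset({"Data Flow", "Work Flow", "Continuous Monitoring"}):
        "Changes to processes",
}


def answerable_questions(evaluated_factors):
    """Return the outcomes whose operand factors have all been evaluated."""
    evaluated = set(evaluated_factors)
    return [
        outcome
        for operands, outcome in FORMULAS.items()
        if operands <= evaluated  # subset test: every operand is available
    ]


# Example: a team that has so far evaluated only three of the six factors.
partial = answerable_questions({"Context", "Stewardship", "Work Flow"})
for outcome in partial:
    print(outcome)
```

Read this way, the formulas also show which factors still need to be evaluated before a given
question, such as long-term personnel impact, can be answered.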
To help the practitioner employ the data quality strategy methodology,
the core practices have been extracted from the factors and listed here.
a) A statement of the goals driving the project
b) A list of data sets and elements that support the goal
c) A list of data types and categories to be cleansed(1)
d) A catalog, schema or map of where the data resides(2)
e) A discussion of cleansing solutions per category of data(3)
f) Dataflow diagrams of applicable existing dataflows
g) Work flow diagrams of applicable existing work flows
h) A plan for when and where the data is accessed for cleansing(4)
i) A discussion of how the dataflow will change after project implementation
j) A discussion of how the work flow will change after project implementation
k) A list of stakeholders affected by the project
l) A plan for educating stakeholders as to the benefits of the project
m) A plan for training operators and users
n) A list of data quality measurements and metrics to monitor
o) A plan for when and where to monitor(5)
p) A plan for initial and then regularly scheduled cleansing
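
Items n) and o) can be illustrated with a small monitoring sketch. The two metrics shown
(completeness and validity), the field name "state", the allowed values, and the thresholds are
all assumptions chosen for the example; an actual project would define its own measurements.

```python
def completeness(records, field):
    """Fraction of records in which the field is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def validity(records, field, allowed):
    """Fraction of populated values that fall within the allowed set."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for v in values if v in allowed) / len(values)


def check_thresholds(metrics, thresholds):
    """Return the metrics whose value is below the agreed minimum."""
    return {
        name: value
        for name, value in metrics.items()
        if value < thresholds.get(name, 0.0)
    }


# Illustrative snapshot of a data set, invented for this example.
snapshot = [
    {"id": 1, "state": "WI"},
    {"id": 2, "state": ""},
    {"id": 3, "state": "ZZ"},
    {"id": 4, "state": "MN"},
]
metrics = {
    "state_completeness": completeness(snapshot, "state"),
    "state_validity": validity(snapshot, "state", {"WI", "MN"}),
}
breaches = check_thresholds(
    metrics, {"state_completeness": 0.9, "state_validity": 0.5}
)
print(breaches)
```

Running a check like this on a schedule, at the points in the data flow chosen in item o), is one
simple way to turn the monitoring plan into a repeatable process.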
Credit goes to Frank Dravis
https://pdfs.semanticscholar.org/5b5b/a15e8ea1bd89fe4d14d5e97ce456436291e0.pdf