There are many definitions of data quality in circulation. Some start from how the data will be used and what its users expect of it. Others take a scorecard approach that assesses data on its apparent technical merits. I suggest a blend of both approaches.
Data quality is about the data meeting user expectations. Some components of cleanliness can be derived from a robust data model: adherence to referential integrity rules, adherence to cardinalities and the like. DBMS-enforced referential integrity is usually turned off in the warehouse, so how is your programmatic referential integrity doing? Are the cardinalities indicated on the data model that the data warehouse/mart was implemented from still holding true within the data, and how many violations are there? If you have specialization/generalization in the model, are those rules (such as an employee must be either a "full-time employee" or a "contractor") holding up in the actual implementation? A robust logical data model is very important to data quality in data warehousing. Please don't start your data warehouse implementation by sitting down and typing the words "CREATE TABLE."
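As a rough illustration of the model-derived checks described above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (dim_customer, fact_sales, employee, full_time_employee, contractor) and the sample rows are hypothetical, invented only for this example. It also assumes the employee rule means "exactly one subtype"; your own model may allow overlap.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory warehouse with no DBMS-enforced foreign keys.

    All tables and rows are hypothetical sample data for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY,
                                 customer_key INTEGER, amount REAL);
        CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE full_time_employee (employee_id INTEGER);
        CREATE TABLE contractor (employee_id INTEGER);

        INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
        -- sale 12 points at customer_key 7, which does not exist
        INSERT INTO fact_sales VALUES (10, 1, 99.0), (11, 2, 15.5), (12, 7, 42.0);
        INSERT INTO employee VALUES (100, 'Ann'), (101, 'Bob'), (102, 'Cy');
        -- Bob has no subtype row; Cy appears in both subtype tables
        INSERT INTO full_time_employee VALUES (100), (102);
        INSERT INTO contractor VALUES (102);
        """
    )
    return conn


def orphan_fact_rows(conn: sqlite3.Connection) -> list:
    """Programmatic referential integrity: fact rows with no parent dimension row."""
    rows = conn.execute(
        """
        SELECT f.sale_id
        FROM fact_sales AS f
        LEFT JOIN dim_customer AS d ON d.customer_key = f.customer_key
        WHERE d.customer_key IS NULL
        ORDER BY f.sale_id
        """
    ).fetchall()
    return [sale_id for (sale_id,) in rows]


def subtype_violations(conn: sqlite3.Connection) -> list:
    """Employees that are not in exactly one subtype table (assumed rule)."""
    rows = conn.execute(
        """
        SELECT employee_id
        FROM (
            SELECT e.employee_id AS employee_id,
                   (SELECT COUNT(*) FROM full_time_employee AS ft
                     WHERE ft.employee_id = e.employee_id)
                 + (SELECT COUNT(*) FROM contractor AS c
                     WHERE c.employee_id = e.employee_id) AS subtype_count
            FROM employee AS e
        )
        WHERE subtype_count <> 1
        ORDER BY employee_id
        """
    ).fetchall()
    return [employee_id for (employee_id,) in rows]


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    print("Orphan fact rows:", orphan_fact_rows(warehouse))
    print("Subtype rule violations:", subtype_violations(warehouse))
```

Counts like these, tracked over time, are one way to turn the rules on the logical data model into scorecard entries. A cardinality check (for example, "every order has at least one line item") could follow the same pattern with a GROUP BY and HAVING clause.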
There are also measures associated with the appropriateness of data values. Are data columns being used for multiple meanings? Are there reasonable domains for values, especially for numeric data? Finally, does the data conform to the expected set of "clean" values for the column?
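The value-level questions above could feed a simple scorecard along the following lines. This is only a sketch: the column names (order_qty, status), the 1 to 1000 domain, the list of "clean" status values and the sample rows are all assumptions made up for the example. In practice they would come from what your users expect of the data.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical expectations; real ones should come from the data's users.
CLEAN_STATUS_VALUES = {"OPEN", "SHIPPED", "CLOSED"}
QTY_MIN, QTY_MAX = 1, 1000


def qty_in_domain(value: object) -> bool:
    """Numeric domain check: an integer quantity within the agreed range."""
    return isinstance(value, int) and QTY_MIN <= value <= QTY_MAX


def status_is_clean(value: object) -> bool:
    """Clean-value check: the value is one of the expected codes."""
    return value in CLEAN_STATUS_VALUES


def score_rule(rows: List[Row], column: str,
               is_valid: Callable[[object], bool]) -> float:
    """Return the fraction of rows whose column value passes the rule."""
    if not rows:
        return 1.0  # nothing to violate the rule
    passed = sum(1 for row in rows if is_valid(row.get(column)))
    return passed / len(rows)


def build_scorecard(rows: List[Row]) -> Dict[str, float]:
    """Score each value-level rule as a pass rate between 0.0 and 1.0."""
    return {
        "order_qty in domain": score_rule(rows, "order_qty", qty_in_domain),
        "status is clean": score_rule(rows, "status", status_is_clean),
    }


# Hypothetical sample data with a negative quantity, an implausibly large
# quantity and a lower-case status code.
SAMPLE_ROWS: List[Row] = [
    {"order_qty": 5, "status": "OPEN"},
    {"order_qty": -1, "status": "shipped"},
    {"order_qty": 20, "status": "CLOSED"},
    {"order_qty": 100000, "status": "OPEN"},
]

if __name__ == "__main__":
    for rule, pass_rate in build_scorecard(SAMPLE_ROWS).items():
        print(f"{rule}: {pass_rate:.0%}")
```

Expressing each measure as a pass rate keeps the scorecard comparable across columns. A column used for multiple meanings often shows up here as a cluster of values that fail every reasonable domain rule.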
Form your data quality scorecard based on user expectations
William McKnight answers your questions on Business Intelligence in searchCRM's Ask the Experts.
This was first published in October 2001