Acting on data quality in the data warehouse

Data quality can be hard to pin down. Many data warehouse programs are labeled as having bad data quality, yet such claims tend to lack detail, leaving the data warehouse build team wondering what to improve.

You can't improve what you can't measure. So, we need a means for measuring the quality of our data warehouse. Abstracting quality into a set of agreed data rules and measuring the occurrences of quality violations provides the measurement.
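As a rough illustration of measuring violations against agreed rules, here is a minimal Python sketch. The rule names, the record fields (`customer_id`, `state`, `amount`) and the list of valid states are all assumptions made up for this example; real rules would come from agreement with the business.

```python
# Hypothetical data rules: each name maps to a check that returns True
# when a record passes. None of these rules come from the article.
VALID_STATES = {"Texas", "Ohio", "Utah"}

RULES = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "state_is_known": lambda r: r.get("state") in VALID_STATES,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def count_violations(records, rules=RULES):
    """Return the number of records that fail each rule."""
    counts = {name: 0 for name in rules}
    for record in records:
        for name, passes in rules.items():
            if not passes(record):
                counts[name] += 1
    return counts


sample = [
    {"customer_id": "C1", "state": "Texas", "amount": 10.0},
    {"customer_id": "", "state": "Texus", "amount": 5.0},
    {"customer_id": "C3", "state": "Ohio", "amount": -2.0},
]
violations = count_violations(sample)
```

Tracking these counts per load gives a number that can be compared over time, which is the point of turning quality into rules.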

This approach can also help management understand how much the cleanliness of the data feeding the data warehouse matters to overall data quality. "Garbage in, garbage out", to a degree.

So, what can you do with data quality violations as they are picked up for movement into the data warehouse? There are three basic actions:

  1. "HELD OUT": Record(s) are held out of the main DW tables due to gross rule violation and placed into "holding" tables for manual inspection and action.
  2. "REPORTED": Data quality violation is reported on but data is loaded and will remain in the table.
  3. "CHANGE DATA": Data is transformed to a value in a master set of "good" values (e.g., "Texus" is changed to "Texas").
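The three actions above could be sketched in a load routine along these lines. This is only an illustration: which violation triggers which action is an assumption chosen for the example, as are the field names, the `STATE_FIXES` mapping and the `load_record` function.

```python
# Hypothetical master mapping of known-bad values to "good" values.
STATE_FIXES = {"Texus": "Texas"}


def load_record(record, main_table, holding_table, violation_log):
    """Apply the three actions to one record (illustrative rules only)."""
    # HELD OUT: a gross violation (here, a missing key) keeps the record
    # out of the main table and parks it for manual inspection.
    if not record.get("customer_id"):
        holding_table.append(record)
        violation_log.append(("HELD OUT", "missing customer_id"))
        return

    row = dict(record)  # copy so the caller's record is not mutated

    # CHANGE DATA: replace a known-bad value with its master value.
    state = row.get("state")
    if state in STATE_FIXES:
        row["state"] = STATE_FIXES[state]
        violation_log.append(("CHANGE DATA", f"state {state!r} corrected"))

    # REPORTED: log the violation but load the data unchanged.
    amount = row.get("amount")
    if amount is not None and amount < 0:
        violation_log.append(("REPORTED", "negative amount"))

    main_table.append(row)


main_table, holding_table, violation_log = [], [], []
for rec in [
    {"customer_id": "C1", "state": "Texus", "amount": 10.0},
    {"customer_id": None, "state": "Ohio", "amount": 5.0},
    {"customer_id": "C3", "state": "Ohio", "amount": -2.0},
]:
    load_record(rec, main_table, holding_table, violation_log)
```

In this sketch the first record is corrected and loaded, the second is held out, and the third is loaded with its violation logged.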

One tip about changing data for the data warehouse: bring the "bad" data into the data warehouse as well, and label it "source" data. This way you can trace back to operational data, which is something many business users require, even if they use a different, "cleansed" set of data for analytical functions.
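One way to keep the source value next to the cleansed one is an extra column per cleansed field. The sketch below assumes a `_source` suffix for that column and a hypothetical `STATE_MASTER` mapping; neither naming choice comes from the article.

```python
# Hypothetical master mapping of known-bad values to "good" values.
STATE_MASTER = {"Texus": "Texas"}


def cleanse_with_source(record, column="state", fixes=STATE_MASTER):
    """Return a copy of the record with both source and cleansed values."""
    row = dict(record)
    original = record.get(column)
    # Keep the operational value under "<column>_source" for traceability.
    row[f"{column}_source"] = original
    # Store the cleansed value (or the original if no fix is known).
    row[column] = fixes.get(original, original)
    return row


row = cleanse_with_source({"customer_id": "C1", "state": "Texus"})
```

Analytical queries can then read `state`, while anyone reconciling against the operational system reads `state_source`.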

In the next few tips, we'll explore the types of data quality violations and the appropriate actions.

For more information, check out SearchCRM's Best Web Links on Data Quality.


This was first published in May 2002
