
Data quality management: Handling violations, part 1

07 May 2002 | William McKnight, Contributor


A workable way to approach data quality management is to make it the "absence of intolerable defects." Here's a look at some potential intolerable defects in your environment and what can be done about them.

Data types constrain the values in a column ... to a degree. Any mix of up to 20 characters can go into a character (20) column. However, if this is a Name field, there are some characters that you would not expect to find in the column, such as % and $. These would be "red flags" that the field contains inappropriate data. There are also numerous misspellings and incorrect alternative spellings of last names. Often, a manual review of the column contents, with counts of each unique value, will bring to light the one correct spelling.
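As a rough illustration of that kind of review, the sketch below counts each unique value in a name column and lists the values that contain unexpected characters. The function name, the sample data and the set of "allowed" characters are all assumptions made for the example; a real name column may legitimately contain accented letters or other characters that this simple whitelist would flag.

```python
import re
from collections import Counter

# Assumption: letters, spaces, periods, apostrophes and hyphens are the only
# characters we expect in a name. Anything else is treated as a red flag.
UNEXPECTED = re.compile(r"[^A-Za-z .'\-]")

def profile_names(values):
    """Return (count of each unique value, sorted values with red-flag characters)."""
    counts = Counter(values)
    red_flags = sorted(v for v in counts if UNEXPECTED.search(v))
    return counts, red_flags

# Hypothetical column contents.
sample = ["McKnight", "McKnight", "McKnight", "McNight", "Smith$", "100% Jones"]
counts, red_flags = profile_names(sample)

# Most frequent spellings first; rare variants are candidates for review.
for value, n in counts.most_common():
    print(f"{n:3d}  {value}")
print("Red flags:", red_flags)
```

In this sample the lone "McNight" stands out against three occurrences of "McKnight", which is the sort of pattern a manual review is looking for.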

There are two approaches to handling the violations, and a combination is usually best. The first is to "generalize" into rules the various formatting errors that are found in the field. Typical formatting errors found in name columns include:

  • Space in front of name
  • Two spaces between first and last name and/or middle initial
  • No period after middle initial
  • Inconsistent use of middle initial (sometimes used, sometimes not)
  • Use of all caps
  • Use of "&" instead of "and" when indicating plurality
  • Use of slash instead of hyphen

On and on it goes, especially in environments where original data entry is "free form," unconstrained and without the use of master data as a reference.
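Several of the formatting errors listed above can be turned into simple rules. The following is a minimal sketch, not a complete cleansing routine: the function name and the order of the rules are assumptions, and it deliberately leaves out inconsistent use of the middle initial, which a formatting rule cannot repair.

```python
import re

def normalize_name(raw):
    """Apply a handful of generalized formatting rules to a name string."""
    name = raw.strip()                       # space in front of name
    if name.isupper():                       # use of all caps
        name = name.title()
    name = name.replace("&", " and ")        # "&" instead of "and"
    name = name.replace("/", "-")            # slash instead of hyphen
    name = re.sub(r"\s+", " ", name).strip() # two (or more) spaces between parts
    tokens = name.split(" ")
    # A single letter that is neither the first nor the last token is
    # assumed to be a middle initial and gets a period.
    for i in range(1, len(tokens) - 1):
        if len(tokens[i]) == 1 and tokens[i].isalpha():
            tokens[i] += "."
    return " ".join(tokens)

for raw in [" William  P McKnight", "JOHN & JANE SMITH", "Mary Smith/Jones"]:
    print(repr(raw), "->", repr(normalize_name(raw)))
```

One caveat: converting all caps with title casing cannot restore internal capitals, so "MCKNIGHT" would come out as "Mcknight". Values like that still need the review described next.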

It is not possible to generalize into rules things like use of initials and misspellings (e.g., William McNight instead of William McKnight), so they need to be handled separately. This is the second approach: map the incorrect data to the correct data in your data warehouse's staging area. As new data is discovered (e.g., is Bill McKnight the same as William McKnight?), it is held out until it has been reviewed, after which it can be confirmed as correct or mapped to the correct value and re-routed through the ETL process.
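The mapping and hold-out step might look something like the sketch below. The mapping table is shown as an in-memory dictionary purely for illustration; in practice it would be a table in the staging area, and the names NAME_MAP and route are hypothetical.

```python
# Hypothetical mapping table: reviewed source value -> correct value.
# Values already reviewed and found correct map to themselves.
NAME_MAP = {
    "William McKnight": "William McKnight",
    "William McNight": "William McKnight",
}

def route(names, name_map):
    """Split incoming names into (loadable, held_out).

    Reviewed values are replaced by their mapped value and can be loaded.
    Values never seen before are held out for review.
    """
    loadable, held_out = [], []
    for name in names:
        if name in name_map:
            loadable.append(name_map[name])
        else:
            held_out.append(name)
    return loadable, held_out

incoming = ["William McNight", "Bill McKnight", "William McKnight"]
loadable, held_out = route(incoming, NAME_MAP)
print("Load now:", loadable)
print("Held for review:", held_out)

# After review, the decision is recorded and the held-out data is re-routed.
NAME_MAP["Bill McKnight"] = "William McKnight"
reloaded, still_held = route(held_out, NAME_MAP)
print("Loaded after review:", reloaded)
```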

If you adopt this approach, make sure the reviews are held quickly, because data will be held out of the data warehouse until it is accounted for. The mapping would take place after the rules are applied.

With either or both approaches, I recommend bringing the "bad" value over as well, since users will often want to know what the source data actually had in it.
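One simple way to do that is to carry both values on the same record. The sketch below is only an illustration; the NameRecord type, its field names and the stand-in cleansing function are assumptions, not a prescribed warehouse design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NameRecord:
    source_name: str  # the value exactly as it arrived from the source system
    clean_name: str   # the value after rules and mapping have been applied

    @property
    def was_changed(self):
        return self.source_name != self.clean_name

def build_record(source_name, clean):
    """Keep the original value next to the cleansed one."""
    return NameRecord(source_name=source_name, clean_name=clean(source_name))

# The lambda is a stand-in for whatever rules and mapping are actually in use.
record = build_record("  WILLIAM SMITH", lambda s: s.strip().title())
print(record, record.was_changed)
```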

Read part two of this tip.

For more information, check out this Learning Guide for Data Quality.

All Rights Reserved, Copyright 2000 - 2008, TechTarget