Data quality management: Handling violations, part 1

SearchCRM.com

A workable way to approach data quality management is to define data quality as the "absence of intolerable defects." Here's a look at some potential intolerable defects in your environment and what can be done about them.

Data types constrain the values in a column ... to a degree. Any mix of 0 to 20 characters can go into a character(20) column. However, if this is a Name field, there are some characters you would not expect to find in it, such as % and $. These are "red flags" that the field contains inappropriate data. There are also numerous misspellings and incorrect alternative spellings of last names. Often, a manual review of column contents, with counts of each unique value, will bring to light the one correct spelling.
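As a rough illustration of that review, the sketch below counts each unique value in a name column and lists values containing unexpected characters. The function name `profile_names`, the sample data, and the set of "red flag" characters are all assumptions made for this example; a real list would come from inspecting your own data.

```python
import re
from collections import Counter

# Assumption: these characters (and digits) should never appear in a name.
# Only % and $ come from the article; the rest are illustrative.
RED_FLAG_CHARS = re.compile(r"[%$#@*=+<>0-9]")

def profile_names(values):
    """Return (count of each unique value, sorted values with red-flag characters)."""
    counts = Counter(values)
    flagged = sorted(v for v in counts if RED_FLAG_CHARS.search(v))
    return counts, flagged

# Hypothetical column contents.
sample = ["McKnight", "McKnight", "McNight", "McKnight", "Sm%th", "Jone$"]
counts, flagged = profile_names(sample)

# Most frequent values first: rare variants of a common name stand out.
for value, n in counts.most_common():
    print(f"{n:>3}  {value}")
print("Red flags:", flagged)
```

A spelling that appears far less often than a near-identical one is a candidate misspelling, though frequency alone does not prove which spelling is correct.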

There are two approaches to handling the violations; usually a combination is best. The first is to "generalize" into rules the various formatting errors that are found in the field. Typical formatting errors found in name columns include:

  • Space in front of name
  • Two spaces between first and last name and/or middle initial
  • No period after middle initial
  • Inconsistent use of middle initial (sometimes used, sometimes not)
  • Use of all caps
  • Use of "&" instead of "and" when indicating plurality
  • Use of slash instead of hyphen

On and on it goes, especially in environments where original data entry is "free form," unconstrained and without the use of master data as a reference.
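Several of the formatting errors listed above lend themselves to simple rules. The following is a minimal sketch, not a complete cleansing routine; the function name `normalize_name` and the order of the rules are assumptions made for illustration.

```python
import re

def normalize_name(raw):
    """Apply a few generalized formatting rules to a free-form name."""
    name = raw.strip()                          # space in front of name
    name = re.sub(r"\s+", " ", name)            # two (or more) spaces between parts
    if name.isupper():                          # use of all caps
        # Caveat: title() yields "Mcknight", not "McKnight"; internal
        # capitalization cannot be recovered by a rule.
        name = name.title()
    name = re.sub(r"\s*&\s*", " and ", name)    # "&" instead of "and"
    name = re.sub(r"\s*/\s*", "-", name)        # slash instead of hyphen
    # No period after middle initial: a single letter between two spaces.
    name = re.sub(r"(?<= )([A-Za-z])(?= )", r"\1.", name)
    return name

print(normalize_name(" William  J McKnight"))
print(normalize_name("JOHN & JANE SMITH"))
print(normalize_name("Mary Smith/Jones"))
```

Inconsistent use of a middle initial is deliberately left out: deciding whether a name should carry one requires reference data, not a formatting rule.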

Things like use of initials and misspellings (e.g., William McNight instead of William McKnight) cannot be generalized into rules, so they need to be handled separately. This is the second approach: you map the incorrect data to the correct data in your data warehouse's staging area. As new data is discovered (e.g., is Bill McKnight the same as William McKnight?), it is held out until it is reviewed, after which it can be marked as correct, or mapped to its correct value, and re-routed through the ETL process.

If adopting this approach, make sure that procedurally the reviews happen quickly, because data will be held out of the data warehouse until it is accounted for. The mapping takes place after the rules are applied.
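The mapping and hold-out flow might look something like the sketch below. Everything here is hypothetical: `NAME_MAP` and `KNOWN_GOOD` stand in for staging-area tables, `apply_rules` is a placeholder for whatever formatting rules you use, and `route` and `resolve` are names invented for the example.

```python
# Hypothetical staging-area mapping: known incorrect value -> correct value.
NAME_MAP = {"William McNight": "William McKnight"}
# Hypothetical set of values already reviewed and confirmed correct.
KNOWN_GOOD = {"William McKnight"}

def apply_rules(raw):
    """Placeholder for the formatting rules (here: whitespace cleanup only)."""
    return " ".join(raw.split())

def route(raw_names):
    """Split a batch into values ready to load and values held for review."""
    loadable, held = [], []
    for raw in raw_names:
        cleaned = apply_rules(raw)        # rules are applied first
        if cleaned in NAME_MAP:           # then the mapping
            loadable.append(NAME_MAP[cleaned])
        elif cleaned in KNOWN_GOOD:
            loadable.append(cleaned)
        else:
            held.append(cleaned)          # unseen value: hold out until reviewed
    return loadable, held

def resolve(reviewed_value, correct_value):
    """Record a reviewer's decision so held values can be re-routed."""
    if reviewed_value == correct_value:
        KNOWN_GOOD.add(correct_value)
    else:
        NAME_MAP[reviewed_value] = correct_value

batch = ["William  McNight", "William McKnight", "Bill McKnight"]
loadable, held = route(batch)

# After review decides that Bill McKnight is William McKnight:
resolve("Bill McKnight", "William McKnight")
reloaded, still_held = route(held)
```

In this sketch the held values never reach the load list until `resolve` has been called for them, which is why slow reviews translate directly into missing warehouse rows.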

With either or both approaches, I recommend bringing the "bad" value over as well, since users will often want to know what the source data actually contained.
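One way to carry the original value along is to keep it next to the cleaned value in the same record. The class and field names below (`NameRecord`, `source_value`, `cleaned_value`) are assumptions for illustration, not a prescribed warehouse schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NameRecord:
    source_value: str    # exactly what the source system held, untouched
    cleaned_value: str   # value after rules and mapping were applied

    @property
    def was_changed(self) -> bool:
        """True when cleansing altered the source value."""
        return self.source_value != self.cleaned_value

rec = NameRecord(source_value=" WILLIAM McNight", cleaned_value="William McKnight")
print(rec.was_changed, repr(rec.source_value))
```

Keeping both values lets users audit what cleansing did and lets you reprocess from the source value if a rule or mapping later turns out to be wrong.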

Read part two of this tip.

For more information, check out this Learning Guide for Data Quality.
