A common objective of ETL jobs is to enforce data quality on data extracted from operational systems while that data is being transformed. A major problem is that the same value is often stored in different formats, e.g.:

  1. Prof., Prof, Professor
  2. IBM, I.B.M., International Business Machines
  3. Mr, Mr. / Mrs, Mrs. / Dr, Dr., etc.
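One way to handle variants like these is to record them as data rather than hard-coding them. Each variant becomes a row in the three-column, tab-delimited parameter file described in the steps below: column number, bad value, good value. The sketch below generates such a file. It assumes titles sit in column 2 of the extract and company names in column 3. The file name `params.tsv` and the chosen canonical forms are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical parameter file for the variants listed above.
# Layout: column number <TAB> bad value <TAB> good value.
# Assumed: column 2 of the extract holds titles, column 3 holds company names;
# the canonical forms chosen here are illustrative only.
# printf reuses its format for every group of three arguments.
printf '%s\t%s\t%s\n' \
  2 'Prof'   'Professor' \
  2 'Prof.'  'Professor' \
  2 'Mr'     'Mr.' \
  2 'Mrs'    'Mrs.' \
  2 'Dr'     'Dr.' \
  3 'I.B.M.' 'IBM' \
  3 'International Business Machines' 'IBM' > params.tsv
```

Adding a new variant is then a one-line change to the parameter file, with no change to any script.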

The same data can appear in a wide variety of forms. It may also move between platforms, for example from SQL Server operational systems to Oracle on Solaris. One solution is to use scripts (Korn Shell and AWK), either run directly or invoked as exits from ETL tools.

The steps are:

  1. Extract the data to a flat file using BCP (out mode).
  2. FTP the file from the operational system server to the Solaris data warehouse server.
  3. Build a parameter file (also a tab-delimited flat file) with one line per correction. Each line contains three columns: the number of the column to be compared, the bad data (i.e., the value to change from), and the good data (i.e., the value to change to).
  4. Develop an AWK script that transforms the data from bad to good.
  5. Build a Korn Shell script that runs the AWK script. This generalizes the process so that it works on any extracted flat file and parameter file.
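Steps 3 to 5 can be sketched as a single Korn Shell wrapper that embeds the AWK transform. This is not the author's script. It is a minimal illustration under stated assumptions:

- The script name `cleanse.ksh` and the function `cleanse` are hypothetical.
- The parameter file is tab-delimited.
- The extract is tab-delimited unless a delimiter is passed.
- A "bad" value must match a whole field exactly.

The AWK program reads the parameter file first, building a map from (column, bad value) to good value. It then rewrites matching fields in the data file.

```shell
#!/bin/ksh
# cleanse.ksh: a hypothetical, generalized data-cleansing wrapper.
# Usage: cleanse.ksh data_file param_file [delimiter]
#   param_file: tab-delimited lines of  column <TAB> bad value <TAB> good value
#   delimiter:  field separator of data_file (defaults to a tab; an assumption)
# Written in POSIX sh syntax so it also runs under bash or sh.

cleanse() {
  data_file=$1
  param_file=$2
  delim=${3:-$(printf '\t')}

  if [ ! -r "$data_file" ] || [ ! -r "$param_file" ]; then
    echo "usage: cleanse data_file param_file [delimiter]" >&2
    return 1
  fi

  awk -v FS="$delim" -v OFS="$delim" '
    # Pass 1 (parameter file): map (column, bad value) -> good value.
    FILENAME == ARGV[1] {
      if (split($0, p, "\t") == 3)
        fix[p[1], p[2]] = p[3]
      next
    }
    # Pass 2 (data file): replace every field whose exact value is listed
    # for that column; lines with no matches are printed unchanged.
    {
      for (i = 1; i <= NF; i++)
        if ((i, $i) in fix)
          $i = fix[i, $i]
      print
    }
  ' "$param_file" "$data_file"
}

# Run as a script only when arguments are given (lets the function be sourced).
if [ "$#" -gt 0 ]; then
  cleanse "$@"
fi
```

A typical call might be `ksh cleanse.ksh extract.txt params.tsv > extract_clean.txt`. Matching whole fields per column, rather than substituting substrings, keeps a rule such as `Dr` to `Dr.` from rewriting text like "Drive" or values in unrelated columns.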

On Thursday, I will share the shell script code.

For more information, check out SearchCRM's Best Web Links on Business Intelligence and Data Analysis.

Have a question about this strategy? Ask William now.


This was first published in April 2002
