The shell script

For context, see Tuesday's tip, "Coding data cleansing in AWK."



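    #!/bin/ksh
    ##########################################################################
    #
    # Data File      : data.dat
    # Data Scrubber  : chg.awk
    # Parameter File : param.dat
    ##########################################################################
    echo "Starting....$(date '+%T')..."
    echo
    # Each line of param.dat holds: column number, old value, new value
    while read -r col from to
    do
        echo "doing for $col $from $to ....$(date '+%T')"
        echo
        # Split data.dat into lines that contain the old value and lines that
        # do not. -F treats $from as a fixed string, so the dots in "i.b.m."
        # are not regex wildcards.
        grep -F -- "$from" data.dat >| temp_with
        grep -v -F -- "$from" data.dat >| temp_without
        # Rewrite the given column, but only in the lines that contain the value
        awk -f chg.awk COL="$col" FROM="$from" TO="$to" temp_with >| temp_with_clear
        # Reassemble data.dat. Changed lines are written first, so the
        # original line order is not preserved.
        cat temp_with_clear temp_without >| data.dat
        echo "Ending $col $from $to ....$(date '+%T')"
        echo
    done < param.dat
    echo "ending....$(date '+%T')..."
    rm -f temp_with temp_without temp_with_clear

The AWK Script [chg.awk]:

    BEGIN { FS = "\t" }    # data.dat is tab-delimited
    {
        if ($COL == FROM) {
            # Rebuild the record, substituting TO in column COL
            for (i = 1; i <= NF; i++) {
                if (i == COL)
                    printf("%s", TO)
                else
                    printf("%s", $i)
                # Tab between fields, newline after the last one
                if (i != NF)
                    printf("\t")
                else
                    printf("\n")
            }
        } else {
            print $0    # column does not match: pass the line through unchanged
        }
    }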
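The script above rescans and rewrites data.dat once per parameter line. Because matching lines are written first, it also changes the order of the records. An alternative technique, which is not the author's code, uses a single awk pass: it loads all the rules from the parameter file up front and applies them to each record in turn, which keeps the original line order.

The sketch makes several assumptions:

- The data file is tab-delimited.
- Each parameter line has the form "column old new", with no spaces inside either value.
- The function name scrub_all is hypothetical.

The sketch also compares fields as strings. The original chg.awk uses awk's default comparison, which can treat numeric-looking values such as 01 and 1 as equal.

```shell
#!/bin/sh
# scrub_all PARAM_FILE DATA_FILE: hypothetical single-pass variant of the
# scrubber (not the original tip's code). Writes the cleaned data to stdout.
# Assumes: DATA_FILE is tab-delimited; each PARAM_FILE line is
# "column old new" separated by whitespace, with no spaces inside old or new.
scrub_all() {
    awk -v pfile="$1" '
        BEGIN {
            FS = OFS = "\t"
            # Load every rule up front, keeping the order of the file
            while ((getline line < pfile) > 0) {
                if (split(line, f, " ") < 3) continue   # skip blank or short lines
                n++
                col[n] = f[1]; from[n] = f[2]; to[n] = f[3]
            }
            close(pfile)
        }
        {
            # Apply the rules in order, as the original per-rule loop does
            for (k = 1; k <= n; k++) {
                c = col[k] + 0
                if (c < 1) continue                      # ignore bad column numbers
                # Exact match on the whole field, compared as strings
                if ((($c) "") == (from[k] "")) $c = to[k] # rebuilds the record with OFS
            }
            print
        }
    ' "$2"
}
```

For example, `scrub_all param.dat data.dat > data.new && mv data.new data.dat`. Writing to a new file first avoids truncating data.dat while awk is still reading it.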
The parameter file [param.dat]:

    3 i.b.m. IBM
    2 profs. Prof.
    1 professor Prof.
    4 ibm IBM
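[Each line describes one transformation: the column number, the value to look for, and its replacement. For example, the first line says that any record in data.dat whose third (tab-separated) column is exactly 'i.b.m.' should have that column changed to 'IBM'. Fields in the parameter file are separated by whitespace, so the value to look for cannot contain spaces.]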
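Each parameter line is split on whitespace, and its first field is used as a column number. A mistyped line can therefore fail silently. A non-numeric column matches nothing, and a missing replacement value replaces the column with an empty string.

A small pre-flight check can catch such lines before the scrubber runs. The sketch below assumes that a valid line has a positive integer column followed by at least two more fields. check_params is a hypothetical name and is not part of the original tip.

```shell
#!/bin/sh
# check_params FILE: hypothetical sanity check for a scrubber parameter file.
# Prints each line whose column is not a positive integer or that has fewer
# than three whitespace-separated fields. Exits with status 1 if any are found.
check_params() {
    awk '
        NF == 0 { next }                          # ignore blank lines
        NF < 3 || $1 !~ /^[1-9][0-9]*$/ {
            printf("line %d: %s\n", NR, $0)
            bad = 1
        }
        END { exit bad }                          # 0 when every line looked valid
    ' "$1"
}
```

For example, run `check_params param.dat` before the scrubber and stop if it reports anything.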

This was first published in April 2002
