Here are a few points to remember for the initial data load of a snowflake data warehouse:
- After developing the source-to-target map for the initial load, load the top-level dimensions first, then work down through the intermediate-level dimensions, and load the fact tables last.
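The load order described above is just a topological sort of the schema's foreign-key dependencies. A minimal sketch, using hypothetical table names for a small snowflake schema:

```python
from graphlib import TopologicalSorter

# Hypothetical snowflake schema: each table lists the tables it
# references via foreign keys, and so must be loaded after them.
depends_on = {
    "dim_country": [],                           # top-level dimension
    "dim_region":  ["dim_country"],              # intermediate dimension
    "dim_store":   ["dim_region"],               # lowest-level dimension
    "dim_product": [],
    "fact_sales":  ["dim_store", "dim_product"], # fact loads last
}

# static_order() yields each table only after all of its dependencies.
load_order = list(TopologicalSorter(depends_on).static_order())
print(load_order)
```

Any ordering the sorter produces guarantees that every dimension is populated before a fact row tries to reference it.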
- Have a staging area where source data lands before moving into production. The staging area may hold a copy of the operational data model plus supplementary lookup tables, used both to resolve surrogate keys through joins and to capture metadata from the ETL jobs.
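Resolving surrogate keys in staging typically means joining staged rows (which carry natural keys) against the dimension's key map. A small sketch with an in-memory SQLite database; the table and column names are assumptions for illustration:

```python
import sqlite3

# Staged source rows carry natural keys; the dimension maps each
# natural key (product_code) to its surrogate key (product_sk).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY,
                              product_code TEXT);
    CREATE TABLE stg_sales   (product_code TEXT, qty INTEGER);

    INSERT INTO dim_product VALUES (1, 'A100'), (2, 'B200');
    INSERT INTO stg_sales   VALUES ('A100', 5), ('B200', 3);

    -- The join swaps natural keys for surrogate keys before the
    -- rows ever reach the production fact table.
    CREATE TABLE fact_sales AS
    SELECT d.product_sk, s.qty
    FROM stg_sales s
    JOIN dim_product d ON d.product_code = s.product_code;
""")
rows = conn.execute(
    "SELECT product_sk, qty FROM fact_sales ORDER BY product_sk").fetchall()
print(rows)  # [(1, 5), (2, 3)]
```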
- The metadata for the initial load may include:
- Job Name
- Package / Procedure Name
- Source database and Table
- Destination Database and Table
- Number of rows loaded
- Number of rows rejected
- Time consumed
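The metadata fields above can be captured as one audit record per job run. A minimal sketch; the field names are assumptions, not a standard:

```python
from dataclasses import dataclass, asdict
from datetime import timedelta

# One audit row per ETL job run, carrying the metadata listed above.
@dataclass
class EtlJobAudit:
    job_name: str
    package_name: str        # package / procedure name
    source_table: str        # source database and table
    destination_table: str   # destination database and table
    rows_loaded: int
    rows_rejected: int
    time_consumed: timedelta

audit = EtlJobAudit("initial_load", "load_dim_product",
                    "src_db.products", "dw.dim_product",
                    rows_loaded=10000, rows_rejected=12,
                    time_consumed=timedelta(minutes=4))
# asdict() flattens the record for logging or inserting into an audit table.
print(asdict(audit)["rows_loaded"])  # 10000
```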
- For ETL jobs, an e-mail service or a proper message-escalation mechanism keeps track of the process and alerts users and other concerned parties to any mishaps. These messages may include the metadata collected from the ETL job.
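Such a message can be assembled directly from the job metadata. A sketch that composes (but does not send) an alert; the addresses and the "any rejects means failure" rule are assumptions:

```python
from email.message import EmailMessage

def build_alert(job_name, rows_loaded, rows_rejected):
    # Compose an alert e-mail carrying the ETL job metadata.
    msg = EmailMessage()
    msg["From"] = "etl-monitor@example.com"   # assumed addresses
    msg["To"] = "dw-team@example.com"
    status = "FAILED" if rows_rejected > 0 else "OK"
    msg["Subject"] = f"[ETL {status}] {job_name}"
    msg.set_content(
        f"Job: {job_name}\n"
        f"Rows loaded: {rows_loaded}\n"
        f"Rows rejected: {rows_rejected}\n"
    )
    return msg

alert = build_alert("load_dim_product", 10000, 12)
print(alert["Subject"])  # [ETL FAILED] load_dim_product
```

Handing the finished message to an SMTP relay (e.g. via `smtplib`) is then a one-line step in the job's error handler.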
- If a tool is being used for the ETL, it is advisable to push as much work as possible into SQL rather than an external language. Set-based SQL generally wins on performance and reliability, and it is easier to modify later.
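The point is to let the database do set-based work instead of looping over rows in the host language. A small illustration with SQLite (the staging table name is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO stg_sales VALUES (?, ?)",
                 [("east", 10.0), ("east", 5.0), ("west", 7.0)])

# One set-based statement replaces fetching every row and
# accumulating totals in application code.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM stg_sales GROUP BY region"))
print(totals)  # {'east': 15.0, 'west': 7.0}
```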
- Data cleansing, transformation, and data-quality checks should be implemented in the staging area, not on the production servers of the operational systems.
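In practice that means cleansing statements run against the staging tables, with failing rows quarantined before anything reaches production. A sketch with assumed table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_customer (name TEXT, country TEXT)")
conn.executemany("INSERT INTO stg_customer VALUES (?, ?)",
                 [("  alice ", "us"), ("BOB", None)])

# Cleanse in staging: standardize names, then quarantine rows that
# fail a quality check instead of loading them silently.
conn.executescript("""
    UPDATE stg_customer SET name = UPPER(TRIM(name));
    CREATE TABLE stg_customer_rejects AS
        SELECT * FROM stg_customer WHERE country IS NULL;
    DELETE FROM stg_customer WHERE country IS NULL;
""")
clean = conn.execute("SELECT name FROM stg_customer").fetchall()
print(clean)  # [('ALICE',)]
```

The production load then reads only the cleansed staging table; the operational source system is never touched.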
- The ETL processes should be flexible enough to cope with regular, ongoing updates. Using global variables and parameters, rather than hard-coded values, makes the process easier to adapt.
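One common way to parameterize a job is to read run settings from the environment with sensible defaults, so the same code serves both the initial load and later incremental runs. A sketch with assumed parameter names:

```python
import os

# Defaults for the initial load; a scheduler can override either
# value per run without touching the job's code.
DEFAULTS = {"LOAD_DATE": "1900-01-01", "BATCH_SIZE": "10000"}

def get_param(name):
    # Environment value wins over the default.
    return os.environ.get(name, DEFAULTS[name])

os.environ["BATCH_SIZE"] = "500"   # a per-run override
batch_size = int(get_param("BATCH_SIZE"))
load_date = get_param("LOAD_DATE")
print(batch_size, load_date)  # 500 1900-01-01
```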
- Document everything well for future use and maintenance.
For more information, check out SearchCRM's Best Web Links on Data Warehousing.
This was first published in April 2002