Most of the time, we do not have options for which systems to source data from. We need to get the data from the systems that generate the relevant data. However, the source system analysis is a critical part of determining the amount of time the ETL will take –- and therefore, how long the entire data warehouse project will take.
Source system analysis can be performed when certain information (next tip) can be gathered about the source system. Usually this takes place on a system currently in production that can be observed and potentially tested with. Products that are not in-house, or not even built, require vendor interface and the analysis can be based on speculation, rather than fact. Any source system analysis that is performed before this period of observation and hands-on interface occurs should be reevaluated after the system is in house.
Non-built source systems may provide an opportunity to influence the build of the system such that it is more data warehouse compatible. Time-stamping new and changed records and partitioning source databases are examples of this. Time-stamping gives the ability to more easily source only those records that have changed since the last extract. Partitioning source databases increases the availability possibilities for the source database while extract is occurring, thus increasing the chances that real-time, non-impacting data warehouse extracts can be performed someday.
For more information,
This was first published in July 2002