This tip addresses the place of replication in data warehousing. Data movement in a data warehouse is normally handled through extract, transform and load (ETL). Some replication vendors missed the mark when they failed to add robust transformations to their product sets. DBMS vendors, meanwhile, added replication functions to their products much as they are adding ETL functionality now.
Is there still a place for replication in data warehousing? There is one. If your staging area is a separate tier (meaning a separate physical DBMS instance) from your data warehouse, it often acts as a dropping ground for the operational data and as a data cleansing server. If you can live with the data in this first tier of the data warehouse architecture initially having exactly the same format as the operational data, replication can be used as a low-cost ETL for this first tier.
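To make the idea concrete, here is a minimal sketch of what "replication as low-cost ETL" amounts to: a straight copy of an operational table into a separate staging database, with the format left untouched. It uses Python's built-in sqlite3 module purely for illustration; the two in-memory databases stand in for separate DBMS instances, and the `customer` table, its columns and the `replicate_table` helper are hypothetical names, not part of any replication product.

```python
import sqlite3


def replicate_table(source, staging, table):
    """Copy every row of `table` from source to staging with no transformation."""
    # Reuse the source table definition so staging has exactly the same format.
    ddl = source.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()[0]
    staging.execute(f"DROP TABLE IF EXISTS {table}")
    staging.execute(ddl)

    rows = source.execute(f"SELECT * FROM {table}").fetchall()
    if rows:
        placeholders = ", ".join("?" * len(rows[0]))
        staging.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    staging.commit()
    return len(rows)


# Hypothetical operational system and staging tier as two separate databases.
operational = sqlite3.connect(":memory:")
staging = sqlite3.connect(":memory:")

operational.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)
operational.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, " alice ", "ne"), (2, "BOB", "SW")],
)
operational.commit()

copied = replicate_table(operational, staging, "customer")
```

Note that the untidy values (stray spaces, inconsistent case) arrive in staging exactly as they were in the source. Cleansing is deliberately left to a later step.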
Be careful not to yield to the temptation to just replicate your operational data and call that a data warehouse. The transformation layer is very important and you need ETL. You may simply choose to go light on the transformations (or skip them altogether) in the first ETL step, when the goal is to get only the changed data out of the source system and into a manageable environment as soon as possible.
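The two-step pattern might be sketched as follows, again with Python's sqlite3 module standing in for real DBMS instances. The first hop copies only changed rows, untransformed; the second hop applies the actual transformations on the way into the warehouse. Everything here is an assumption made for illustration: the `updated_at` column used to detect changes, the table names `customer` and `dim_customer`, and the trivial cleansing rules. A real replication product would typically detect changes by other means, such as reading the database log.

```python
import sqlite3


def replicate_changes(source, staging, last_sync):
    """First hop: copy only rows changed since last_sync, with no transformation."""
    rows = source.execute(
        "SELECT id, name, region, updated_at FROM customer WHERE updated_at > ?",
        (last_sync,),
    ).fetchall()
    staging.executemany("INSERT OR REPLACE INTO customer VALUES (?, ?, ?, ?)", rows)
    staging.commit()
    # Return the new high-water mark for the next run.
    return max((row[3] for row in rows), default=last_sync)


def load_warehouse(staging, warehouse):
    """Second hop: the real transformation step, run away from the source system."""
    rows = staging.execute("SELECT id, name, region FROM customer").fetchall()
    cleaned = [(i, name.strip().title(), region.upper()) for i, name, region in rows]
    warehouse.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", cleaned)
    warehouse.commit()


# Hypothetical source, staging and warehouse databases.
source = sqlite3.connect(":memory:")
staging = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

customer_ddl = (
    "CREATE TABLE customer "
    "(id INTEGER PRIMARY KEY, name TEXT, region TEXT, updated_at TEXT)"
)
source.execute(customer_ddl)
staging.execute(customer_ddl)  # same format as the operational table
warehouse.execute(
    "CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)

source.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, " alice ", "ne", "2001-11-01"), (2, "BOB", "sw", "2001-11-03")],
)
source.commit()

# Only the row changed after the previous sync is pulled from the source.
watermark = replicate_changes(source, staging, "2001-11-02")
load_warehouse(staging, warehouse)
warehouse_rows = warehouse.execute("SELECT * FROM dim_customer").fetchall()
```

The point of the split is that the source system is touched only by the cheap first hop, while the heavier cleansing work runs against the staging copy.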
One drawback of this approach is that you are not exercising your true ETL process in this step. Some have found this to be an acceptable tradeoff.
For more information, check out searchCRM's Best Web Links on Data Warehousing.
This was first published in November 2001