I received a number of excellent questions at guru sessions last week at the Data Warehousing Institute World Conference. I will share some of the discussion in the tips this week. The first question had to do with backing up the data warehouse.
Full backups of production data warehouses are practically impossible to achieve. The intrusion of such backups into warehouse available windows is just overwhelmed by the volume of data. Daily incremental backups are more common and recommended. If backing up the entire day's activity across all tables is still a challenge to desired availability, you could also segment the load schedule by cycling tables through the backup schedule such that in the course of 7 days, each table (or partition) receives a full backup. A strategy mixing incremental and full backups usually desires the best, balanced results.
Some DBMS, such as Sybase, require backups to be at the complete database level. For their backups, there is mirroring at the hardware level that is offered by SAN vendors. The mirrored copy can be detached and backed up "offline" of the data warehouse. Ongoing changes to the warehouse will accumulate and "catch up" to the mirrored version in a matter of hours after reattachment, usually during a batch window.
Some take advantage of the automatic creation of this "other" data warehouse and use it for maintaining continuous availability. While one is being loaded, the other remains available
The impact to performance of this approach is actually hardly noticeable. The main drawback is the cost of the hardware mirroring. Regardless, keep your load files and logs available for weeks just in case.
For more information, check out SearchCRM's Best Web Links on Data Warehousing.
This was first published in May 2002