To achieve real-time data warehousing, defined loosely as no more than five minutes from an event's occurrence to its appearance in the data warehouse, shops must have a plan for dealing with two main issues. The first is sourcing the data with the cooperation of, and without intruding on, the applications being sourced. Many legacy applications are fragile, and expensive and unwieldy to update. Many are non-relational and ill-prepared to support extracts of any kind except during nightly batch windows. These limitations are driving many "operational" functions into the warehouse environment. I know several shops that list the inability to source in real time for the data warehouse as a key driver for the proposed replacement of those applications.
Of course, once source systems are upgraded, numerous reporting and real-time access capabilities usually accompany the upgrade, diminishing the need for real-time data warehousing.
The second impediment is the impact on usage of the data warehouse itself. It stands to reason that if real-time sourcing and loading is a requirement, data access will be rather active as well. Use of the warehouse should not (cannot) be disrupted by loading activity. If you cannot achieve real-time data warehousing without taking the warehouse down or severely restricting its usage throughout the day, keep trying. Usually partitioning strategies, whether manual or automatic/DBMS-driven, are sufficient for achieving this.
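As a rough illustration of the partitioning idea, the sketch below builds a replacement partition off to the side and then swaps it in, so queries keep running against the old data until the new data is complete. It is a toy in-memory model, not any particular DBMS's partition-exchange feature; the class name `PartitionedTable`, its methods, and the `(event_id, amount)` row shape are all assumptions made for the example, and it assumes a single loader process.

```python
import threading
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (event_id, amount): hypothetical row shape


class PartitionedTable:
    """Toy in-memory stand-in for a partitioned fact table (hypothetical)."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[Row]] = {}
        self._lock = threading.Lock()

    def query_total(self) -> float:
        # Readers grab references to the current partitions under a brief lock.
        # Partitions are never modified in place, so a reader never sees a
        # half-loaded partition.
        with self._lock:
            snapshot = list(self._partitions.values())
        return sum(amount for part in snapshot for _, amount in part)

    def load_partition(self, key: str, new_rows: List[Row]) -> None:
        # Assumes a single loader. Copy the existing partition and append the
        # new rows off to the side, without blocking readers.
        with self._lock:
            existing = list(self._partitions.get(key, []))
        staged = existing + new_rows
        # The swap itself is one quick reference assignment.
        with self._lock:
            self._partitions[key] = staged


table = PartitionedTable()
table.load_partition("2002-10-01", [("e1", 10.0)])
table.load_partition("2002-10-01", [("e2", 5.0)])
table.load_partition("2002-10-02", [("e3", 1.5)])
print(table.query_total())
```

The point of the design is that the expensive work (staging the rows) happens outside the lock, and the only moment readers and the loader contend is the brief swap.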
Overcoming these two impediments, whether with a more exercised and frequently scheduled "batch" ETL or with the inclusion of EAI in the architecture, provides a rich reward to the successful program: a real-time data warehouse.
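One way to picture the frequently scheduled "batch" option is an incremental extract that remembers how far it got on the previous run and pulls only newer rows. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for both systems; the table names `events` and `fact_events`, the function `run_micro_batch`, and the use of an increasing `id` as the high-water mark are assumptions for illustration, not something the approach requires.

```python
import sqlite3


def run_micro_batch(source: sqlite3.Connection,
                    warehouse: sqlite3.Connection,
                    last_seen_id: int) -> int:
    """Copy source rows newer than last_seen_id; return the new high-water mark."""
    # Hypothetical source table: events(id INTEGER, payload TEXT).
    rows = source.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    if rows:
        # The connection context manager commits the batch as one transaction,
        # or rolls it back if an insert fails.
        with warehouse:
            warehouse.executemany(
                "INSERT INTO fact_events (id, payload) VALUES (?, ?)", rows
            )
        last_seen_id = rows[-1][0]
    return last_seen_id
```

A scheduler would call `run_micro_batch` at an interval comfortably inside the five-minute target and persist the returned mark between runs. This narrow, indexed query is the kind of light touch a fragile source can sometimes tolerate, though that depends entirely on the source system.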
This was first published in October 2002