One of the golden rules to modeling a data warehouse is that the model should be business oriented. It should be built based on the business needs and incorporate the operational transaction flow. The most common problems to matching the existing data and the defined business are incompleteness, inconsistency and understanding the desired value set for the data. The problem becomes a major one when the model introduces any entity that is not present in the operational systems but related data exists in the systems. This type of problem can be solved by the implementation of Business Object inheritance in the warehouse model. Lets take an example.
In the operational systems, we have 'Contact', 'Customer', 'User', 'Member' and 'Employee'. These categories may have their own subcategories like 'Contacts' may be grouped into 'General' and 'Potential' and 'Employee' may be grouped into 'InService', 'Retired' and 'Customer' may be grouped into 'Old' and 'New'. In a conventional way they become respective areas in the warehouse as dimensions but this potentially increases the size and volume of the warehouse.
If there is a 'Region' entity stating the geographical location, it should be referenced to all the stated entities, which, in turn, will create a 'referential mesh' and will increase simultaneous query time for the entities and will make the model more complex to work with, as well. Lets introduce the Business Object concept here to make this
We may introduce one 'master' entity -- 'Person'. All the stated entities will be derived from that. The master entity, 'Person' in this case, will contain all the common attributes of all the entities ('Customer', 'Employee', 'Member' etc) like ID, Name, Address, Date of Birth, Nationality, Visa Status, etc. The classification will begin in the derived entity.
'Employee' is a type of person, a person who will use all the attributes of the master along with its own attributes like Date of Join, Emp Number, etc. For a more organized approach we may derive another step from 'Employee'. 'InService' and 'Retired', will contain 'salary', 'date of retirement' and 'pension', 'date retired' respectively. The diagram in the next tip will illustrate the approach.
For more information, check out SearchCRM's Best Web Links on Data Warehousing.
This was first published in April 2002