Many people are not aware of the significant differences between DBMS architectures. One architecture that most are unfamiliar with is the column-oriented DBMS. Such a DBMS stores all the values of a column together, as opposed to storing all the columns of a row together.
For example, instead of storing each row's values together:
John Smith 123 Main St. Anywhere, CA USA
Jane Doe 456 Elm St. Anywhere, CA USA
the values of each column are stored together, so the street address column is laid out as:
123 Main St. 456 Elm St.
Single-column linear (aggregate) functions such as AVG, MIN, MAX, and SUM perform well under this storage approach because all the data they need is stored together, which means fewer I/Os. Extraneous columns not relevant to the function do not need to be "skipped over" in this architecture. Multi-column retrievals and joins, by contrast, are better suited to a row-oriented storage approach, where predictable linear function needs can be pre-calculated.
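The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration using in-memory lists, not how any particular DBMS lays out its pages; the numeric `balance` column and the helper names `to_columns`, `avg_row_store`, and `avg_column_store` are assumptions added for the example and do not come from the text above.

```python
# Hypothetical table: the two example rows plus a made-up numeric "balance"
# column (an assumption, added so there is something to average).
COLUMNS = ("name", "street", "city", "state", "country", "balance")
rows = [
    ("John Smith", "123 Main St.", "Anywhere", "CA", "USA", 120.0),
    ("Jane Doe", "456 Elm St.", "Anywhere", "CA", "USA", 80.0),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def avg_row_store(rows, column):
    """Row layout: every full row is visited and unrelated fields are skipped."""
    idx = COLUMNS.index(column)
    values = [row[idx] for row in rows]
    return sum(values) / len(values)


def avg_column_store(columns, column):
    """Column layout: only the one list holding the needed values is read."""
    values = columns[column]
    return sum(values) / len(values)


columns = to_columns(rows)
print(columns["street"])
print(avg_row_store(rows, "balance"), avg_column_store(columns, "balance"))
```

Both functions return the same answer. The point of the sketch is that the column version touches one list only, while the row version has to walk through every field of every row to reach the one it needs.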
Additionally, this storage method facilitates compression, especially for low-cardinality columns, because values are likely to repeat from one row to the next, as with CA and USA in the example above.
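One simple way to exploit repeated adjacent values is run-length encoding. The text above does not name a specific compression scheme, so treat the following as a sketch of one possible technique rather than what any given column-oriented product does; the `state_column` data and the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of consecutive equal values into (value, run_length)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical low-cardinality column, stored contiguously as in a column store.
state_column = ["CA", "CA", "CA", "NY", "NY", "CA"]
encoded = rle_encode(state_column)
print(encoded)
```

The fewer distinct values a column has, and the longer its runs, the fewer pairs are needed. In a row-oriented layout the same values are interleaved with the other fields of each row, so runs like these do not appear in the stored byte stream.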
On balance, however, all things being equal, this approach is not conducive to the active, mixed-workload environments that make up today's best-practice data warehouses.
This was first published in July 2002