Friday, May 30, 2008

Junk dimensions

OLTP tables that are full of flag fields and yes/no attributes, many of which are used for operational support and have no documentation except for the column names.
There are three conventional ways to deal with these attributes: discard all of the miscellaneous attributes, eliminating them from the dimensional design; incorporate the miscellaneous attributes into the fact table; or make each miscellaneous attribute a separate dimension. However, all of these options are less than ideal.
Discarding the data can be dangerous because the miscellaneous values, flags, and yes/no fields might contain valuable business data.
Including the miscellaneous attributes in the fact table could cause the fact table to swell to alarming proportions, especially if you have more than just a few miscellaneous attributes. The increased size of the fact table could cause serious performance problems because of the reduced number of records per physical I/O. Even if you tried to index these fields to minimize the performance problems, you still wouldn’t gain anything because so many of the miscellaneous fields contain flag values such as 0 and 1; Y and N; or open, pending, and closed.
And if you make each miscellaneous attribute a separate dimension, it will most likely result in a complicated dimensional designyou’re looking at a much more complicated star schema and associated cube.
Column names such as Completed, Packed, Shipped, Received, Delivered, and Returned (each with yes/no data values) are very common, and they do have business value. Instead of discarding flag fields and yes/no attributes, I suggest placing them all into a junk dimension that’s organized
The junk dimension shown in Figure 1 represents an order-fulfillment system; the column headers show some of the possible statuses an item that has been ordered can have. Row 1 indicates that the item ordered has been picked out of the warehouse, packed for shipment, shipped, delivered, received, returned for a refund, and restocked in the warehouse. Row 9 shows an item on order that’s waiting to begin the order-fulfillment process. The rows in between indicate items that are in various stages of the orderfulfillment process. This example is very simple because the process is so linear and sequential,

No comments: