Friday, May 30, 2008

Junk dimensions

OLTP tables that are full of flag fields and yes/no attributes, many of which are used for operational support and have no documentation except for the column names.
There are three conventional ways to deal with these attributes: discard all of the miscellaneous attributes, eliminating them from the dimensional design; incorporate the miscellaneous attributes into the fact table; or make each miscellaneous attribute a separate dimension. However, all of these options are less than ideal.
Discarding the data can be dangerous because the miscellaneous values, flags, and yes/no fields might contain valuable business data.
Including the miscellaneous attributes in the fact table could cause the fact table to swell to alarming proportions, especially if you have more than just a few miscellaneous attributes. The increased size of the fact table could cause serious performance problems because of the reduced number of records per physical I/O. Even if you tried to index these fields to minimize the performance problems, you still wouldn’t gain anything because so many of the miscellaneous fields contain flag values such as 0 and 1; Y and N; or open, pending, and closed.
And if you make each miscellaneous attribute a separate dimension, it will most likely result in a complicated dimensional designyou’re looking at a much more complicated star schema and associated cube.
Column names such as Completed, Packed, Shipped, Received, Delivered, and Returned (each with yes/no data values) are very common, and they do have business value. Instead of discarding flag fields and yes/no attributes, I suggest placing them all into a junk dimension that’s organized
The junk dimension shown in Figure 1 represents an order-fulfillment system; the column headers show some of the possible statuses an item that has been ordered can have. Row 1 indicates that the item ordered has been picked out of the warehouse, packed for shipment, shipped, delivered, received, returned for a refund, and restocked in the warehouse. Row 9 shows an item on order that’s waiting to begin the order-fulfillment process. The rows in between indicate items that are in various stages of the orderfulfillment process. This example is very simple because the process is so linear and sequential,

Thursday, May 29, 2008

Degenerate dimension

Degenerate dimension is not one that lacks moral structure or integrity. Instead, a degenerate dimension is a dimension that doesn’t exist as a table but is represented in the data warehouse.
Data warehouse dimensional design requires you to include control documents such as invoices, orders, and warranties. Each of these control documents has a control number such as the invoice number, the order number, or the serial number of the item under warranty.

Degenerate dimensions are simply control numbers that are stored in the fact table of a data warehouse. These control numbers look like keys, but they don’t act like keys; they have no associated dimension to join with. Control numbers provide a way to identify which line items in the fact table were generated as a part of the same order or invoice. Let’s take a look at how to map control numbers from the OLTP database to the fact table in the data warehouse and associate them with each line item