Data Warehousing >
Concepts >
Fact Table Granularity
Granularity
The first step in designing a fact table is to determine the
granularity of the fact table. By granularity, we mean the
lowest level of information that will be stored in the fact table. This
constitutes two steps:
- Determine which dimensions will be included.
- Determine where along the hierarchy of each dimension the
information will be kept.
The determining factors usually goes back to
the requirements.
Which Dimensions To Include
Determining which dimensions to include is usually a straightforward
process, because business processes will often dictate clearly what are
the relevant dimensions.
For example, in an off-line retail world,
the dimensions for a sales fact table are usually time, geography,
and product. This list, however, is by no means a complete list for all
off-line retailers. A supermarket with a Rewards Card program, where
customers provide some personal information in exchange for a rewards
card, and the supermarket would offer lower prices for certain items for
customers who present a rewards card at checkout, will also have the
ability to track the customer dimension. Whether the data warehousing
system includes the customer dimension will then be a decision that needs to
be made.
What Level Within Each Dimensions To Include
Determining which part of hierarchy the information is stored along each
dimension is a bit more tricky. This is where user requirement (both stated
and possibly future) plays a major role.
In the above example, will the
supermarket wanting to do analysis along at the hourly level? (i.e.,
looking at how certain products may sell by different hours of the day.) If so,
it makes sense to use 'hour' as the lowest level of granularity in the time
dimension. If daily analysis is sufficient, then 'day' can be used as the
lowest level of granularity. Since the lower the level of detail, the larger
the data amount in the fact table, the granularity exercise is in essence
figuring out the sweet spot in the tradeoff between detailed level of analysis
and data storage.
Note that sometimes the users will not specify certain requirements, but
based on the industry knowledge, the data warehousing team may foresee
that certain requirements will be forthcoming that may result in the need of
additional details. In such cases, it is prudent for the data warehousing
team to design the fact table such that lower-level information is included.
This will avoid possibly needing to re-design the fact table in the future.
On the other hand, trying to anticipate all future requirements is
an impossible and hence futile exercise, and the data warehousing team needs
to fight the urge of the "dumping the lowest level of detail into the data
warehouse" symptom, and only includes what is practically needed. Sometimes
this can be more of an art than science, and prior experience will become
invaluable here.