Types of Facts
There are three types of facts:- Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
- Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
- Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
Date |
Store |
Product |
Sales_Amount |
Say we are a bank with the following fact table:
Date |
Account |
Current_Balance |
Profit_Margin |
Types of Fact Tables
Based on the above classifications, there are two types of fact tables:- Cumulative: This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product by store by day. The facts for this type of fact tables are mostly additive facts. The first example presented here is a cumulative fact table.
- Snapshot: This type of fact table describes the state of things in a particular instance of time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.
What is a Junk Dimension in Datawarehousing
The
junk dimension is simply a structure that provides a convenient place to store
the junk attributes. It is just a collection of random transactional codes,
flags and/or text attributes that are unrelated to any particular dimension.
In
OLTP tables that are full of flag fields and yes/no attributes, many of which
are used for operational support and have no documentation except for the
column names and the memory banks of the person who created them. Not only do
those types of attributes not integrate easily into conventional dimensions
such as Customer, Vendor, Time, Location, and Product, but you also don’t want
to carry bad design into the data warehouse.However, some of the miscellaneous
attributes will contain data that has significant business value, so you have
to do something with them.
This scenario is especially common in legacy systems and databases that were created without solid, underlying design principles. Column names such as Completed, Packed, Shipped, Received, Delivered, and Returned (each with yes/no data values) are very common, and they do have business value.These miscellaneous indicators and flags that don't logically belong to the core dimension tables. They are either too valuable to ignore or exclude.Often the meaning of the flags and text attributes is obscure. This situation leaves the designer with a number of bad alternatives
Designers sometimes want to treat them as Fact or make it into numerous small Dimensional tables. However, all of these options are less than ideal. Discarding the data can be dangerous because the miscellaneous values, flags, and yes/no fields might contain valuable business data. Including the miscellaneous attributes in the fact table could cause the fact table to swell to alarming proportions, especially if you have more than just a few miscellaneous attributes. The increased size of the fact table could cause serious performance problems because of the reduced number of records per physical I/O. Even if you tried to index these fields to minimize the performance problems, you still wouldn’t gain anything because so many of the miscellaneous fields contain flag values such as 0 and 1; Y and N; or open, pending, and closed.
A
third, less obvious but preferable, solution is to incorporate a Junk
Dimension as a holding place for these flags and indicators.
Advantage
of junk dimension:
- It provides a recognizable location for related codes, indicators and their descriptors in a dimensional framework.
- This avoids the creation of multiple dimension tables.
- Provide a smaller, quicker point of entry for queries compared to performance when these attributes are directly in the fact table.
- An interesting use for a junk dimension is to capture the context of a specific transaction. While our common, conformed dimensions contain the key dimensional attributes of interest, there are likely attributes about the transaction that are not known until the transaction is processed.
Above
figure shows a junk dimension. As in any dimensional design, each of the rows
in the fact table will be associated with a row in this junk dimension.
Simple Data warehouse - Junk Dimension
You
want to keep the data warehouse design as simple and straightforward as
possible, so that users will be able to access data easily. Miscellaneous
attributes that contain business value are a challenge to include in your data
warehouse design because they don’t fit neatly into conventional dimensions,
and if improperly handled, can cause the data warehouse to swell in size and
perform suboptimally. By placing miscellaneous attributes into junk dimensions,
you can circumvent both of these problems.