Types of Facts
There are three types of facts:
- Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
- Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
- Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
Let us use examples to illustrate each of the three types of facts. The first example assumes that we are a retailer, and we have a fact table with the following columns:
| Date |
| Store |
| Product |
| Sales_Amount |
The purpose of this table is to record the sales amount for each product in each store on a daily basis. Sales_Amount is the fact. In this case, Sales_Amount is an additive fact, because you can sum up this fact along any of the three dimensions present in the fact table -- date, store, and product. For example, the sum of Sales_Amount for all 7 days in a week represent the total sales amount for that week.
Say we are a bank with the following fact table:
| Date |
| Account |
| Current_Balance |
| Profit_Margin |
The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day. Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes sense to add them up for all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add them up through time (adding up all current balances for a given account for each day of the month does not give us any useful information). Profit_Margin is a non-additive fact, for it does not make sense to add them up for the account level or the day level.
Types of Fact Tables
Based on the above classifications, there are two types of fact tables:
- Cumulative: This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product by store by day. The facts for this type of fact tables are mostly additive facts. The first example presented here is a cumulative fact table.
- Snapshot: This type of fact table describes the state of things in a particular instance of time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.
What is a Junk Dimension in Datawarehousing
The
junk dimension is simply a structure that provides a convenient place to store
the junk attributes. It is just a collection of random transactional codes,
flags and/or text attributes that are unrelated to any particular dimension.
In
OLTP tables that are full of flag fields and yes/no attributes, many of which
are used for operational support and have no documentation except for the
column names and the memory banks of the person who created them. Not only do
those types of attributes not integrate easily into conventional dimensions
such as Customer, Vendor, Time, Location, and Product, but you also don’t want
to carry bad design into the data warehouse.However, some of the miscellaneous
attributes will contain data that has significant business value, so you have
to do something with them.
This
scenario is especially common in legacy systems and databases that were created
without solid, underlying design principles. Column names such as Completed,
Packed, Shipped, Received, Delivered, and Returned (each with yes/no data
values) are very common, and they do have business value.These miscellaneous
indicators and flags that don't logically belong to the core dimension
tables. They are either too valuable to ignore or exclude.Often the meaning
of the flags and text attributes is obscure. This situation leaves the designer
with a number of bad alternatives
Designers
sometimes want to treat them as Fact or make it into numerous small Dimensional
tables. However, all of these options are less than ideal. Discarding the data
can be dangerous because the miscellaneous values, flags, and yes/no fields
might contain valuable business data. Including the miscellaneous attributes in
the fact table could cause the fact table to swell to alarming proportions,
especially if you have more than just a few miscellaneous attributes. The
increased size of the fact table could cause serious performance problems
because of the reduced number of records per physical I/O. Even if you tried to
index these fields to minimize the performance problems, you still wouldn’t
gain anything because so many of the miscellaneous fields contain flag values
such as 0 and 1; Y and N; or open, pending, and closed.
A
third, less obvious but preferable, solution is to incorporate a Junk
Dimension as a holding place for these flags and indicators.
Advantage
of junk dimension:
- It provides a recognizable
location for related codes, indicators and their descriptors in a
dimensional framework.
- This avoids the creation of
multiple dimension tables.
- Provide a smaller, quicker point
of entry for queries compared to performance when these attributes are
directly in the fact table.
- An interesting use for a junk
dimension is to capture the context of a specific transaction. While
our common, conformed dimensions contain the key dimensional attributes of
interest, there are likely attributes about the transaction that are not
known until the transaction is processed.
Above
figure shows a junk dimension. As in any dimensional design, each of the rows
in the fact table will be associated with a row in this junk dimension.
Simple Data warehouse - Junk Dimension
You
want to keep the data warehouse design as simple and straightforward as
possible, so that users will be able to access data easily. Miscellaneous
attributes that contain business value are a challenge to include in your data
warehouse design because they don’t fit neatly into conventional dimensions,
and if improperly handled, can cause the data warehouse to swell in size and
perform suboptimally. By placing miscellaneous attributes into junk dimensions,
you can circumvent both of these problems.