A fact is a measure which is being analyzed in context of dimensions. In Data warehousing fact table refers to a table where all facts are stored.
e.g. in Monthly Revenue by State report “Revenue” is fact and it will be stored in fact table.
In above example Fact_Sales is a fact table where all facts are stored.
Points to note:
- In Data Warehousing schema , Fact table is a central table to which dimensions are joined using 1-n relationship.
- Fact table generally contains numerical facts which are additive however its not always true e.g. consider fact less fact table.
- Combination of foreign keys in fact table forma unique key and fact table does not require unique primary key unless for auditing purpose.
- Its the biggest table in data warehousing in terms of volume of data.
Dimension is a textual attribute against which facts or measures are analyzed. Dimension provides the analysis context to the measures.
e.g. consider a report which shows Monthly Revenue by State.
If you carefully read the statement without the “State”, “Monthly” , “Revenue” is just a number and does not carry any meaning.
If you simply remove the state and month from above example you will see revenue is just a number and does not carry any meaning if dimensions are not used to analyze it.
In Data warehouse dimensions are stored in a table called as Dimension Table which is joined with fact table using one-to-many relationship.
In above image, product, store, date are dimension table which contains dimension attributes or simply dimensions.
Important points to Note about dimension tables.
- Dimension table contain attributes against which facts/measures are analyzed.
- Generally dimension table contains textual attributes however it may contain numerical value which is dimension attribute.
- Dimension table may contain one or more hierarchies.
- Dimension table contains natural key as well as surrogate key to uniquely identify the row.
Hierarchy represents the relationship between data elements in a such a way that they can relate to each other as one above another, one below another. Basically it represents the how each attribute each related to each other in terms of level.
Classic example of hierarchy which you would see in almost every data warehouse design in Date hierarchy where which forms a relationship as Year->Month->Day-Week etc.
Generally end users like to browse the data by hierarchy as it helps him to explore data from of less detailed to more detailed way.
e.g. one might start exploring a Revenue by Year and after seeing a revenue drop in particular year he. She may click on a year and go further drill down to see which quarter is worst quarter in that year in terms of revenue drop.
Points to note about Hierarchies.
- You can have multiple hierarchies in a table
e.g. Calendar data hierarchy and Fiscal Date hierarchy in Date dimension.
- Hierarchies are useful to define the drill down/drill up navigation path in BI tools for reporting purpose
- There are three main types of hierarchies in a general dimensional data warehouse design.
- Balanced Hierarchy.
- Unbalanced Hierarchy.
- Ragged Hierarchy.
If you are working in data warehousing project or going to work on data warehouse project, you will often hear these two names and differences in their methodologies towards data warehousing. These two people are the guru’s of data warehousing methodologies and they have different methodologies towards data warehousing. Lets understand the basic difference between Ralph Kimball and Bill Inmon approaches towards data warehouse.
[click to continue…]
In data warehouse environment each dimension table should have primary key which uniquely identifies dimension record. We can use natural keys or business keys as a primary key in dimension table however these keys are not recommended to be used as primary key in dimension table due to following reasons.
[click to continue…]