Tuesday, June 28, 2011

Type 4 Slowly Changing Dimensions Design - Toolbox for IT Groups

Type 4 Slowly Changing Dimensions Design - Toolbox for IT Groups
I was just wondering in a type 4 SCD does the surrogate key ever change between the current table and the history table.
My response to this question is:

First of all, a good understanding of what a "surrogate key" really is is needed. For that, see my blog site at http://itpro420.blogspot.com/2011/06/what-is-difference-between-surrogate.html.
As I state in my blog, there really is no such thing as a "surrogate key"; the term is used to describe a Primary Key (PK) that uses a surrogate value.
If you follow C. J. Date's rules, every table would have a PK using a surrogate value, ergo a "surrogate key". But each table would also have a "natural key", or a column or group of columns that uniquely identify a row (enforced by a unique index).

If the source database system uses a "surrogate key" (PK in the source "current database") that is brought into the data warehouse (DW - a "temporal database"), that column's value will never change. However, the row in the DW should also have a PK that uses a surrogate value. In order to keep track of changes to specific columns in that row, audit columns must be used such as "start date", "end date", "current".
Read a good, IMHO, definition of "surrogate key" at http://en.wikipedia.org/wiki/Surrogate_key.
Now, an understanding of Slowly Changing Dimensions is needed. Since you mentioned Type 4 (SCD4), let me explain that this is nothing more than a Type 3 (SCD3) kept in a separate table. Therefore, SCD4 requires a "current dimension table" that contains only currently valid records and separate "historical dimension table" that contains the history of changes; SCD3 does this in one table. I have rarely seen SCD4 used because it requires more database space, management, and views to use by users to find the value that is either current or in use at a previous time. But, in either case, the DW PK that uses a surrogate value and the source's PK using a surrogate value will never change.

No comments:

Post a Comment