Introduction
There is no difference to the meanings below due to the data model; these are relational database management system (RDBMS) concepts. Whether you are using a normalized model or a dimensional model, the "primary key", "surrogate key", "natural key", and "alternate key" terms all have the same meanings.A Google search turned up lots of places to read up on keys, including these:
- http://en.wikipedia.org/wiki/Unique_key
- http://en.wikipedia.org/wiki/Surrogate_key
- http://www.dbdebunk.com/page/page/622337.htm
- http://databases.about.com/cs/administration/g/primarykey.htm
- Methods for primary and secondary key selection (Data base management)
Primary Key
A Primary Key (PK) in a database table uniquely identifies a row in the table. There can be only one PK per table. Depending on the database engine being used, a PK may require the explicit creation of a unique index or the database engine may automatically create such an index for the column or columns identified as the PK column(s).A primary key can consist of a single column using a "surrogate" value or be made up of one or more columns in the table that will uniquely identify a row (also called a "natural" or "meaningful" key).
Surrogate Key, aka Meaningless Key
The term "Surrogate Key" describes a particular type of Primary Key (PK). There really is no such thing as a "Surrogate Key". There are Primary Keys that use surrogate values. A PK that uses a surrogate value always consists of only one column.The surrogate value is generally a system-generated value that nobody would know what it is to search for from a query. This PK may then be used as a foreign key (FK) in other tables and used for join conditions.
A Primary Key that uses a surrogate value is also known as a "meaningless" key. A table with only a "surrogate key" (primary key using an arbitrary value) must have an alternate key (AK) (see below). Without an AK, it is very likely that there will be "duplicate" rows in the table. An explanation of this follows below.
Natural Key, aka Meaningful Key
The other type of PK is called a "natural key". A natural (or meaningful) primary key is made up of one or more columns in a row that uniquely identify that row. The column or columns used for a natural key contain values that have meaning to the business.If the natural key is the PK, then there can be no surrogate key. If the natural key is not the PK, it is an alternate key.
Alternate Key
While there can be only one primary key in a table, there may be more than one "alternate" keys in a table. These alternate keys are controlled by one or more unique indexes. A table with only a "surrogate key" (primary key using an arbitrary number) and no unique index for an alternate key is useless - you are doomed to have duplicates in rows for every column EXCEPT the surrogate key which does not help you find a unique row at all.Generally, the columns used for the alternate key(s) are the columns that people will use in their queries to find a unique row, but the joins to other tables will be done using the single PK column comprising the "meaningless key" with the PK column being used as a FK column in those other tables.
The use of surrogate key values prevents the need to duplicate multiple columns in "child" tables just to be able to find the children of the "parent" record in a JOIN (or vice versa).
Impact of Primary Key Design
Let's examine the impact and implications of designing a table's primary key. Table_1, below, is a table that has a PK made up of a surrogate value, hence "Surrogate Key", and no AK.T1_PK_ID | T1_COL1 | T1_COL2 | T1_COL3 |
---|---|---|---|
1 | A | B | C |
2 | A | B | C |
3 | A | D | E |
While the three rows above are unique in the PK value (PK_ID column), rows 1 and 2 are really not unique rows, are they.
A table with only a "surrogate key" (PK using an arbitrary value) and no unique index for an alternate key (or the natural key) is useless - you are doomed to have duplicates in rows for every column EXCEPT the surrogate key which does not help you find a unique row at all.
If Table_1 were created with a unique index on the columns that make up an alternate key, it would look like Table_2 below.
T2_PK_ID | T2_AK1 | T2_AK2 | T2_COL3 |
---|---|---|---|
1 | A | B | C |
2 | A | D | E |
A primary key can consist of a single "surrogate" value or be made up of one or more columns in the table that will uniquely identify a row (also called a meaningful key). If Table_1 were recreated using the natural key, it would look like Table_3 below.
T3_PK1 | T3_PK2 | T3_COL3 |
---|---|---|
A | B | C |
A | D | E |
As an example of what I stated above, let's assume the following table structure.
T4_PK | T4_AK1 | T4_AK2 | T4_COL3 | T3_PK1 | T3_PK2 |
---|---|---|---|---|---|
100 | AA | BA | CA | A | B |
101 | AA | BB | CB | A | D |
SELECT T4.T4_PK, T4.T4_AK1, T4.T4_AK2, T4.T4_COL3, T3.T3_COL3 FROM Table_4 "T4" INNER JOIN Table_3 "T3" ON T4.T3_PK1 = T3.T3_PK1 AND T4.T3_PK2 = T3.T3_PK2 WHERE T4.T4_AK1 = 'AA' AND T4.T4_AK2 = 'BA'The result from the query above would be:
T4_PK T4_AK1 T4_AK2 T4_COL3 T3_COL3 ---------------------- ------ ------ ------- ------- 100 AA BA CA CIf this same query were modified to be run against Table_1, assuming T1_COL1 is the same as T3_PK1 and T1_COL2 is the same as T3_PK2, the query would look like this:
SELECT T4.T4_PK, T4.T4_AK1, T4.T4_AK2, T4.T4_COL3, T1.T1_COL3 FROM Table_4 "T4" INNER JOIN Table_1 "T1" ON T4.T3_PK1 = T1.T1_COL1 AND T4.T3_PK2 = T1.T1_COL2 WHERE T4.T4_AK1 = 'AA' AND T4.T4_AK2 = 'BA'The result from the query above would be:
T4_PK T4_AK1 T4_AK2 T4_COL3 T1_COL3 ---------------------- ------ ------ ------- ------- 100 AA BA CA C 100 AA BA CA CInstead of a single row being returned, we now get two rows.
Remember we created a modified version of Table_1 giving it an AK as Table_2. If we run the same query above against Table_2, changing the table and column names in the JOIN condition, we see that the AK in Table_2 prevented the "duplicate" row and we get only one row returned.
SELECT T4.T4_PK, T4.T4_AK1, T4.T4_AK2, T4.T4_COL3, T2.T2_COL3 FROM Table_4 "T4" INNER JOIN Table_2 "T2" ON T4.T3_PK1 = T2.T2_AK1 AND T4.T3_PK2 = T2.T2_AK2 WHERE T4.T4_AK1 = 'AA' AND T4.T4_AK2 = 'BA' Results: T4_PK T4_AK1 T4_AK2 T4_COL3 T2_COL3 ---------------------- ------ ------ ------- ------- 100 AA BA CA CThe table below, Table_5, has a surrogate primary key (T5_PK), a natural key (T5_AK1, T5_AK2), and foreign key columns needed to join this table to Table_1 and Table_4. Since T1_PK_ID is a surrogate (meaningless) key, it is not expected that one would know what it is. Therefore, searches for a row would be based on other columns thought to uniquely identify a row, in this case T1_COL1, T1_COL2, and T1_COL3.
T5_PK | T5_AK1 | T5_AK2 | T5_COL4 | T4_PK | T1_COL1 | T1_COL2 | T1_COL3 |
---|---|---|---|---|---|---|---|
100 | AA | BA | CA | 100 | A | B | C |
101 | AA | BB | CB | 101 | A | D | E |
SELECT Table_1.T1_PK_ID FROM Table_5 INNER JOIN Table_4 ON Table_5.T4_PK = Table_4.T4_PK INNER JOIN table_1 ON Table_5.t1_col1 = table_1.t1_col1 AND Table_5.t1_col2 = Table_1.t1_col2 WHERE Table_5.t1_col1 = 'A' AND Table_5.t1_col2 = 'B' Results: T1_PK_ID ---------------------- 1 2If we wanted to get the value of Table_1.T1_COL3 for the "unique" values in the supposed "key" values in T1_COL1 and T1_COL2 using the SQL below, we will get an error.
SELECT T1_COL3 FROM Table_1 WHERE t1_pk_id = (select t1_pk_id from table_1 where t1_col1 = 'A' AND t1_col2 = 'B') Error starting at line 1 in command: SELECT T1_COL3 FROM Table_1 WHERE t1_pk_id = (select t1_pk_id from table_1 where t1_col1 = 'A' AND t1_col2 = 'B') Error report: SQL Error: ORA-01427: single-row subquery returns more than one row 01427. 00000 - "single-row subquery returns more than one row" *Cause: *Action:
Conclusion
While this started out with a question asking about the difference between a "surrogate key" and a "primary key", it evolved into a dissertaion on primary keys and how they can be designed. You should have learned:- There really is no such thing as a "surrogate key; there are primary keys using surrogate values."
- A "primary key" can be made up from
- a single column using a surrogate value
- one or more columns that have unique values and known business meanings
- A table with a PK using a single column with a surrogate value MUST have an alternate key (AK)
- There are impacts to SQL writing if the PK is designed wrong!
No comments:
Post a Comment