Surrogate Key In Database Design

In database design, it is a good practice to have a primary key for each table. There are two ways to specify a primary key: The first is to use part of the data as the primary key. For example, a table that includes information on employees may use Social Security Number as the primary key. This type of key is called a natural key. The second is to use a new field with artificially-generated values whose sole purpose is to be used as a primary key. This is called a surrogate key.

A surrogate key has the following characteristics:

1) It is typically an integer.

2) It has no meaning. You will not be able to know the meaning of that row of data based on the surrogate key value.

3) It is not visible to end users. End users should not see a surrogate key in a report.

Surrogate keys can be generated in a variety of ways, and most databases offer ways to generate surrogate keys. For example, Oracle uses SEQUENCE, MySQL uses AUTO_INCREMENT, and SQL Server uses IDENTITY.

Surrogate keys are often used in data warehousing systems, as the high data volume in a data warehouse means that optimizing query speed becomes important. Using a surrogate key is advantageous because it is quicker to join on a numeric field rather than a non-numeric field. This does come at a price — when you insert data into a table, whether via an ETL process or via an “INSERT INTO” statement, the system needs to take more resources to generate the surrogate key.

There are no hard rules on when to employ a surrogate key as opposed to using the natural key. Often the data architect would need to look at the nature of the data being modeled and stored and consider any possible performance implications. The following are examples of when it makes sense to use a surrogate key:

1) When different source systems use different keys for the same record. When we integrate the systems, instead of picking one set of keys, it is often better to use a surrogate key.

2) When we have Type 2 Slowly Changing Dimensions. In those cases, we’ll want to use the surrogate key to ensure that we keep the history of the change.