29th Apr 2021 6 minutes read

Why Do You Need a Primary Key in a Database Table?

data engineering

Table of Contents

What Is a Primary Key?
How Primary Keys Help Ensure Data Consistency
Primary Keys and Indexes
Next Steps with SQL Primary Keys

What is a primary key and why is it so important in SQL? Find out in this article.

Primary keys are a central element in SQL databases. They provide a unique value that can identify a specific row in a table, which opens up a lot of other benefits. This article covers what a primary key is, how it works, and how to use it in an SQL database.

Are you ready to learn the basics of primary keys and grow your SQL skills? Then let’s get started.

What Is a Primary Key?

In SQL databases, every table should have a primary key. The primary key (PK) is formed by one or several columns and is used to uniquely identify every record stored in the table. For example, let’s suppose we have the invoice table, as shown below:

invoice_number	invoice_date	customer_id	total_amount	payment_status
1000	2021-02-24	24	34.70	Paid
1001	2021-03-04	87	52.10	Paid
1002	2021-03-08	24	16.70	Paid

The most suitable column to identify a record in the table invoice is invoice_number, because every value in invoice_number can identify at most one record; in other words, we can’t have two records with the same invoice_number.

We can also easily see that the other table columns are not good candidates for the primary key. We can’t identify a record using customer_id because we can have many invoices from the same customer. The same is true for the date, total amount, and payment status.

Let's show how to implement the table invoice in SQL.

CREATE TABLE invoice (
    invoice_number  integer PRIMARY KEY,
    invoice_date    date,
    customer_id     integer,
    total_amount    numeric,
    payment_status  varchar(20)
);

In the above SQL code, we create the invoice table and define the column invoice_number as its primary key. Once a column is defined as a primary key, a unique constraint is imposed on it: duplicate values are not allowed. If we try to insert a new invoice record with an existing invoice_number, the new record will be rejected. Most databases, following the SQL standard, also impose a NOT NULL constraint on primary key columns, meaning that the database will also reject any record with a NULL value in the primary key. A few, such as SQLite in some configurations, do not, so it is worth checking how your database behaves.
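To see the NOT NULL behavior in practice, here is a minimal sketch. It assumes PostgreSQL and the invoice table created above; the inserted values are made up for illustration.

```sql
-- Assumes PostgreSQL and the invoice table defined above.

-- Accepted: the primary key value is present and not yet in use.
INSERT INTO invoice (invoice_number, invoice_date, customer_id, total_amount, payment_status)
VALUES (1003, '2021-03-10', 24, 20.00, 'Paid');

-- Rejected in PostgreSQL: a primary key column cannot hold NULL.
INSERT INTO invoice (invoice_number, invoice_date, customer_id, total_amount, payment_status)
VALUES (NULL, '2021-03-11', 87, 15.50, 'Paid');
```

Listing the column names explicitly in the INSERT keeps each value attached to the intended column, even if the table definition changes later.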

How does the unique constraint work when we have multi-column primary keys? The combination of all the values in the primary key columns must be unique. Let’s see an example using the items_sold table, where we record what products are included in each invoice:

invoice_number	item_number	product_id	quantity_sold
1000	1	544	2
1000	2	126	1
1001	1	544	1
1001	2	431	3
1002	1	544	1

The primary key of items_sold is a multi-column primary key formed by the invoice_number and item_number columns. If we look at the items_sold data, we can see repeated values in invoice_number and in item_number when each column is taken on its own, but every combination of invoice_number and item_number is unique. When a table has no single column suitable for the primary key (as is the case for items_sold), we should consider a multi-column primary key.

CREATE TABLE items_sold (
    invoice_number  integer,
    item_number     integer,
    product_id      integer,
    quantity_sold   integer,
    PRIMARY KEY (invoice_number, item_number)
);

In the above SQL, we can see how to create a multi-column PK. Note that the syntax is different from the one used in the first example: there, PRIMARY KEY was written next to a single column, while here it is a separate clause that lists the key columns. The syntax used in this last example works for both multi-column and single-column primary keys.
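As a sketch of both points, the statements below first declare a single-column primary key with the separate PRIMARY KEY clause, then show how the multi-column key of items_sold treats new rows. The table name invoice_alt is hypothetical, chosen only so the statement does not clash with the invoice table created earlier. The INSERT statements assume the sample rows shown above are already loaded in items_sold.

```sql
-- Single-column primary key declared with the separate PRIMARY KEY clause.
-- invoice_alt is a hypothetical name used only for this illustration.
CREATE TABLE invoice_alt (
    invoice_number  integer,
    invoice_date    date,
    customer_id     integer,
    total_amount    numeric,
    payment_status  varchar(20),
    PRIMARY KEY (invoice_number)
);

-- Accepted: invoice 1000 has no item number 3 yet,
-- so the combination (1000, 3) is new.
INSERT INTO items_sold (invoice_number, item_number, product_id, quantity_sold)
VALUES (1000, 3, 431, 1);

-- Rejected: the combination (1000, 1) already exists,
-- even though the other column values are different.
INSERT INTO items_sold (invoice_number, item_number, product_id, quantity_sold)
VALUES (1000, 1, 126, 5);
```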

For additional examples, I suggest the article What is a Primary Key in SQL, where you can learn more about primary key implementation. I’d also like to suggest our course on The Basics of Creating Tables in SQL, where you can get experience working with SQL database objects like tables, primary keys, and foreign keys.

How Primary Keys Help Ensure Data Consistency

Maintaining data consistency is a critical task. One of the most common data consistency problems is duplicate records; thanks to their unique constraint, primary keys help prevent this. If we try to insert a new record that uses an existing value in the primary key, the record will be rejected. We can see this in the following example:

INSERT INTO invoice VALUES (1001, now()::date, 101, 32.45, 'Paid');

ERROR:  duplicate key value violates unique constraint "invoice_pkey"
********** Error **********
SQL state: 23505
DETAIL:  Key (invoice_number)=(1001) already exists.

Primary keys also help enforce referential integrity, another important part of data consistency. Let’s explain referential integrity with an example. We have the column customer_id in the table invoice. We need to be sure that each value in customer_id is a valid customer identification value. In other words, a record with a matching id value must exist in the table customer for every value stored in invoice.customer_id.

Referential integrity is checked by using foreign key constraints. Foreign keys are a kind of constraint where a column in a table (in our example, the invoice.customer_id column) is associated with (usually) a primary key in another table (in our example, the customer table). The foreign key column in the referencing table can only have values that exist in the primary key of the referenced table.
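The following is a minimal sketch of such a constraint, assuming PostgreSQL and the invoice table created earlier. The article only tells us that the customer table has an id column, so the name column, the sample customer names, and the constraint name invoice_customer_fk are assumptions made for illustration.

```sql
-- Hypothetical customer table: only the id column comes from the article.
CREATE TABLE customer (
    id    integer PRIMARY KEY,
    name  varchar(100)          -- assumed column, for illustration only
);

-- The customers referenced by the sample invoices must exist first;
-- otherwise adding the foreign key below fails.
INSERT INTO customer (id, name) VALUES (24, 'Example Customer A');
INSERT INTO customer (id, name) VALUES (87, 'Example Customer B');

-- Associate invoice.customer_id with the primary key of customer.
ALTER TABLE invoice
    ADD CONSTRAINT invoice_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customer (id);

-- Accepted: customer 24 exists in the customer table.
INSERT INTO invoice (invoice_number, invoice_date, customer_id, total_amount, payment_status)
VALUES (1004, '2021-03-12', 24, 18.90, 'Paid');

-- Rejected with a foreign key violation: there is no customer 999.
INSERT INTO invoice (invoice_number, invoice_date, customer_id, total_amount, payment_status)
VALUES (1005, '2021-03-12', 999, 10.00, 'Paid');
```

The constraint also works in the other direction: while invoices still reference a customer, the database will by default refuse to delete that customer row.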

If you want to go deeper into foreign keys, try the article What is a Foreign Key in SQL. It explains foreign keys, how to implement them, and how they are related to primary keys.

Primary Keys and Indexes

Every time a primary key is created, the database also creates an index on the table. The index key is composed of the same column(s) as the primary key, and the index is created implicitly as part of the primary key definition. In most databases, this index is a unique index, which is how duplicate primary key values are detected and rejected. At the same time, any query whose search criteria are based on the primary key columns (which happens frequently) can be resolved very quickly using the index. For example:

SELECT customer_id, invoice_date
FROM   invoice
WHERE  invoice_number = 1001;

The above query searches for records based on the primary key column invoice_number; thus, the index on invoice_number can be used. When a table has many records, an index lets the database answer queries much faster. However, an index can only be used when the query's search criteria match the index key. In the previous example, they do: both the WHERE condition and the index key are on invoice_number.
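One way to explore this is sketched below, assuming PostgreSQL and the invoice table from earlier. The index name invoice_customer_id_idx is hypothetical. Keep in mind that the query planner decides whether an index is worth using; on a table with only a handful of rows, it may prefer to read the whole table even when a matching index exists.

```sql
-- List the indexes on the invoice table. In PostgreSQL, the implicit
-- primary key index is normally named invoice_pkey.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'invoice';

-- The search criteria match the primary key index (invoice_number).
EXPLAIN
SELECT customer_id, invoice_date
FROM   invoice
WHERE  invoice_number = 1001;

-- The search criteria do NOT match the primary key index:
-- customer_id is not part of it, so that index cannot help here.
EXPLAIN
SELECT invoice_number, invoice_date
FROM   invoice
WHERE  customer_id = 24;

-- If searches by customer_id are frequent, a separate index can be added.
-- The index name is an assumption chosen for this example.
CREATE INDEX invoice_customer_id_idx ON invoice (customer_id);
```

EXPLAIN shows the execution plan without running the query, which makes it a convenient way to check whether a given WHERE condition can take advantage of an index.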

If you want to learn more about indexes, check out the article Learn SQL Track of the Season: Creating Database Structure, which covers indexes (among other topics, like data engineering and SQL constraints).

Next Steps with SQL Primary Keys

In this article, we’ve covered primary keys, explaining what they are, how to implement them, and how they help maintain data consistency.

Before closing, a note for readers who are interested in a data engineering career: it’s one of the careers with the most potential, so it’s worth checking out. I suggest the article New Learning Path: Data Engineering to learn what a data engineer needs to know.
