22nd Oct 2020 8 minutes read

What is a Primary Key in SQL?

Data Engineering

Table of Contents

How to Identify Rows in a Table
What Is a Primary Key?
How to Create a Primary Key
Non-Numeric Primary Keys
Primary Keys With More Than One Column
Next Steps With Primary Keys

Primary keys are an important concept in SQL databases. They provide a unique ID for every row in a database table. As an SQL developer, you should know what primary keys are, what data values are allowed in them, and how to create them. Read this article to learn the ABCs of primary keys in 10 minutes.

How to Identify Rows in a Table

Tables are the main objects in an SQL database, and as you probably know, tables store records or rows. In order to identify each row of a table, we need to find a column in the table that has a different value for every row.

For example, if we have a table of United States citizens, we can use the column social_security_number to identify each row in the table. As another example, if we have a table of savings accounts at a bank, the column account_number can be used to uniquely identify rows.

Not all tables have a natural identifier like this though. We’ll talk about this later in the article.
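Before relying on a column as an identifier, it can help to check that its values really are different in every row and never missing. The queries below are a minimal sketch of such a check; the table citizen_import and its sample rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical staging table without a primary key (illustrative names and data).
CREATE TABLE citizen_import (
    social_security_number integer,
    last_name              varchar(40)
);

INSERT INTO citizen_import VALUES
    (721071426, 'Doe'),
    (211271298, 'Smith'),
    (721071426, 'Kant');   -- repeats the first number on purpose

-- List every value that appears in more than one row.
SELECT social_security_number, COUNT(*) AS occurrences
FROM citizen_import
GROUP BY social_security_number
HAVING COUNT(*) > 1;

-- Count the rows where the candidate column is missing.
SELECT COUNT(*) AS null_rows
FROM citizen_import
WHERE social_security_number IS NULL;
```

If the first query returns no rows and the second returns 0, the column identifies every row in the current data. With the sample rows above, the first query reports 721071426 with 2 occurrences, so this column could not serve as an identifier until the duplicate is resolved.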

What Is a Primary Key?

Once you identify a column that has a different value for each row in the table, you can create a primary key using this column. You can view the primary key as the “main identifier” for every row in the table based on a certain column.

Primary keys implement a technical mechanism to check if every row has a unique, non-empty value in the primary key column. For example, if you try to insert a new row with a duplicate value in the primary key column, this row will be rejected by the primary key.

Another feature of primary keys is a control to avoid NULL values. NULL values are not allowed in the primary key column.

In the case of the table citizen, a row with a NULL value in the social_security_number column will be rejected by the primary key. In other words, primary keys guarantee all rows in the table have unique and non-null values in the primary key column.

Question: What is a SQL Primary Key?

In SQL databases, a primary key is a unique identifier for each table row; it does not allow duplicate or NULL values and guarantees the distinctness of each row in that table.

How to Create a Primary Key

Suppose we want to create the table citizen as seen below:

social_security_number	last_name	first_name	born_date
721071426	Doe	John	06-25-1912
211271298	Smith	Mary	01-23-1903

Now, let’s talk about the SQL syntax for creating a primary key. If we want to define a primary key for the table citizen, we can create the table in the following way:

CREATE TABLE citizen (
	social_security_number	integer PRIMARY KEY,
	last_name 			varchar(40),
	first_name 			varchar(40),
	born_date			date
);

We used the PRIMARY KEY clause in the CREATE TABLE statement to define which column is the primary key. After defining the primary key, if we try to insert a new record with a duplicate value in the social_security_number column, the INSERT will fail. The exact wording of the error depends on the database; in PostgreSQL, for example, it looks like this:

INSERT INTO citizen VALUES (721071426, 'Kant','Peter','1920-09-22');

ERROR:  duplicate key value violates unique constraint "citizen_pkey"
DETAIL:  Key (social_security_number)=(721071426) already exists.
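The other half of the primary key guarantee, the ban on NULL values, can be tried out the same way. The following is a small sketch that assumes PostgreSQL-style behavior; citizen_null_demo is a hypothetical scratch copy of the citizen table, used so that the example runs on its own.

```sql
-- Scratch copy of the citizen table (hypothetical name) so the example is self-contained.
CREATE TABLE citizen_null_demo (
    social_security_number integer PRIMARY KEY,
    last_name              varchar(40),
    first_name             varchar(40),
    born_date              date
);

-- Expected to fail: the primary key column cannot be NULL.
INSERT INTO citizen_null_demo (social_security_number, last_name, first_name, born_date)
VALUES (NULL, 'Kant', 'Peter', '1920-09-22');

-- Also expected to fail: leaving the column out makes it NULL,
-- because no default value is defined for it.
INSERT INTO citizen_null_demo (last_name, first_name, born_date)
VALUES ('Kant', 'Peter', '1920-09-22');
```

Both statements should be rejected, and the table stays empty. Some engines treat integer primary keys specially and may fill in a value instead, so it is worth running this against the database you actually use.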

At this point, I want to share a question that I have heard several times. Is it mandatory to have a primary key in each table? No, it is not. You can create an SQL table without defining a primary key.

However, database modeling best practices suggest creating a primary key for every table in the database. If you have a primary key, you can be sure you will not have duplicate or NULL values in the primary key column.
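If a table was created without a primary key, one can usually be added afterwards. Here is a minimal sketch using ALTER TABLE; the bank_account table is hypothetical (it borrows the account_number idea from earlier in the article), and the syntax shown is the form accepted by PostgreSQL and MySQL.

```sql
-- Hypothetical table that was created without a primary key.
CREATE TABLE bank_account (
    account_number integer,
    owner_name     varchar(40)
);

-- Promote account_number to primary key after the fact.
-- The statement fails if the column already holds duplicate or NULL values.
ALTER TABLE bank_account
    ADD PRIMARY KEY (account_number);
```

Some databases require the column to be declared NOT NULL before it can become a primary key, so check the documentation of your database if the statement is rejected.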

Before continuing to the next section, I would like to suggest the article How to Create a Table in SQL where you can learn about data types and NULL constraints among other important concepts.

Non-Numeric Primary Keys

In previous examples, we showed only numeric primary keys. However, it is possible to have primary keys of other data types.

For example, let’s suppose we have a table that stores all the airports in the world. Some of the columns are airport_name, airport_city, airport_country, and airport_code. The airport code is a unique three-letter code assigned to each airport by an international aviation organization. We can use this non-numeric code as the primary key.

airport_code	airport_name	airport_city	airport_country
JFK	John Kennedy	New York	USA
LGA	La Guardia	New York	USA
CDG	Charles de Gaulle	Paris	France
BCN	El Prat	Barcelona	Spain

The following SQL command creates the table airport with a primary key in the column airport_code:

CREATE TABLE airport (
	airport_code 	char(3) PRIMARY KEY,
	airport_name 	varchar(40),
	airport_city 	varchar(40),
	airport_country 	varchar(40)
);

Note that we use a char(3) data type for the airport_code to store the three-letter code.

Another example is a table with all of the registered properties in a specific state. Each property is identified by a two-letter property ID code followed by an eight-digit number, like BV-11234134. The property_id column can be the primary key.

CREATE TABLE property (
	property_id			char(11) PRIMARY KEY,
	property_district	 	varchar(40),
	property_owner 		varchar(40),
	property_value 		numeric(10,2)
);

Because property_id is a combination of letters and numbers, we use a char(11) data type to store it.

Another example of a non-numeric primary key is a table of flights of an airline company. Usually, flight numbers are a combination of letters and numbers of a fixed length. So, in this case, we will reserve 10 positions for the letters and numbers in the flight number.

CREATE TABLE flight (
	flight_number			char(10) PRIMARY KEY,
	airport_code_origin 		char(3),
	airport_code_destination 	char(3),
	flight_duration			interval
);

Although non-numeric primary keys are perfectly valid, numeric primary keys tend to perform better: comparing two integers is generally cheaper than comparing two strings, and integer keys are usually more compact, which keeps indexes smaller. When performance is critical, consider adding a numeric column to be used as an artificial primary key.

For example, if we decide to add a numeric column to the flight table in order to use it as a primary key, we would write the following command:

CREATE TABLE flight (
	flight_id				integer PRIMARY KEY,
	flight_number			char(10),
	airport_code_origin 		char(3),
	airport_code_destination 	char(3),
	flight_duration			interval
);
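With an artificial key, someone has to supply a new number for every row. Many databases can do this automatically. The sketch below assumes a database that supports SQL-standard identity columns (PostgreSQL 10 or later, for instance; MySQL uses AUTO_INCREMENT instead). The table name flight_with_id, the flight number XX1234, and the duration are made up for illustration.

```sql
CREATE TABLE flight_with_id (
    -- The database assigns flight_id automatically on insert.
    flight_id                integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    -- UNIQUE and NOT NULL keep the natural identifier free of duplicates and gaps.
    flight_number            char(10) NOT NULL UNIQUE,
    airport_code_origin      char(3),
    airport_code_destination char(3),
    flight_duration          interval
);

-- flight_id is omitted from the column list; the database fills it in.
INSERT INTO flight_with_id
    (flight_number, airport_code_origin, airport_code_destination, flight_duration)
VALUES
    ('XX1234', 'JFK', 'BCN', INTERVAL '7 hours 30 minutes');
```

Keeping a UNIQUE constraint on flight_number is a design choice worth considering: once the primary key moves to flight_id, nothing else stops the same flight number from being entered twice.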

Primary Keys With More Than One Column

In some cases, there is not a natural one-column primary key. In these cases, the primary key is usually composed of two or more columns. These kinds of primary keys are called multi-column primary keys, or composite primary keys, and are very common.

For example, let’s suppose we have the table reservation for a restaurant with columns customer_name, reservation_day, reservation_time, and number_of_people. We cannot use customer_name alone as a primary key because the customer might have more than one reservation.

In order to solve this problem, we need to add an additional column to the primary key to be sure the value of the primary key is unique. So, we add the column reservation_day to the primary key, expecting that the combination of the values of customer_name and reservation_day will be unique.

But what if a customer wants to make two reservations for the same day: one for lunch and one for dinner? The database will reject the second reservation record due to a duplicate primary key value. To avoid this situation, we can add a third column to the primary key: reservation_time.

The excerpt of the table reservation below shows two rows that could not both be stored if the primary key were formed only by customer_name and reservation_day.

customer_name	reservation_day	reservation_time	number_of_people
John Doe	2020-09-20	11:30 AM	2
John Doe	2020-09-20	8:00 PM	2

The syntax for a multi-column primary key is different from the syntax for a single-column primary key. The multi-column primary key is defined separately from the columns. You list the primary key columns in parentheses after the PRIMARY KEY keyword:

CREATE TABLE reservation (
	customer_name	varchar(40),
	reservation_day	date,
	reservation_time	time,
	number_of_people	integer,
	PRIMARY KEY (customer_name, reservation_day, reservation_time)
);
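To see the three-column key at work, here is a short sketch that inserts the two rows from the excerpt above and then tries a genuine duplicate. The table name reservation_demo is hypothetical, chosen so the example does not clash with the reservation table just created.

```sql
CREATE TABLE reservation_demo (
    customer_name    varchar(40),
    reservation_day  date,
    reservation_time time,
    number_of_people integer,
    PRIMARY KEY (customer_name, reservation_day, reservation_time)
);

-- Same customer and day, different times: both rows are accepted.
INSERT INTO reservation_demo VALUES ('John Doe', '2020-09-20', '11:30', 2);
INSERT INTO reservation_demo VALUES ('John Doe', '2020-09-20', '20:00', 2);

-- Expected to fail: all three key columns match the first row.
INSERT INTO reservation_demo VALUES ('John Doe', '2020-09-20', '11:30', 4);
```

Uniqueness is checked on the combination of the key columns, so individual columns may repeat as long as the full combination does not.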

In general, tables used to connect two other tables are good candidates for multi-column primary keys. Suppose we have a database for a university, where we have the tables course and student. To represent the enrollment of students in courses, we have a table called enrollment with columns student_id, course_id, and start_date among others.

If you analyze the table enrollment, you will find we need a multi-column primary key based (at least) on the columns course_id and student_id. Why? We cannot use course_id alone because we can have several students enrolled in the same course. Similarly, we cannot use student_id alone because the same student can be enrolled in several courses.

The following SQL command creates the table enrollment:

CREATE TABLE enrollment (
	course_id 	integer,
	student_id 	integer,
	start_date	date,
	PRIMARY KEY (course_id, student_id)
);
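A table-level primary key can also be given an explicit name with the CONSTRAINT keyword, which tends to make error messages and later schema changes easier to follow. The variant below is only a sketch; the names enrollment_named and enrollment_named_pk, and the sample IDs and dates, are invented for illustration.

```sql
CREATE TABLE enrollment_named (
    course_id  integer,
    student_id integer,
    start_date date,
    -- Explicit constraint name instead of a database-generated one.
    CONSTRAINT enrollment_named_pk PRIMARY KEY (course_id, student_id)
);

-- Accepted: student 1 enrolls in course 101.
INSERT INTO enrollment_named VALUES (101, 1, '2020-09-01');

-- Expected to fail: the same student enrolling in the same course again.
INSERT INTO enrollment_named VALUES (101, 1, '2020-10-01');
```

When the second INSERT is rejected, most databases mention the constraint name in the error, so a descriptive name helps to locate the rule that was violated.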

If you want to go deeper and learn more about how to create tables, views, and indexes, I suggest the course Creating Database Structure which has many examples and exercises.

Next Steps With Primary Keys

In this article, I went over what a primary key is, the main purposes of primary keys, the different types of primary keys, and the syntax for creating primary keys in SQL.

There is another concept in SQL where primary keys play an important role: foreign keys. If you want to learn about foreign keys, I suggest the course The Basics of Creating Tables in SQL where you can learn more about primary keys and how they relate to foreign keys.

If you are interested in advancing in your SQL career, I suggest these two articles about a new field in IT called Data Engineering: What is Data Engineering? and LearnSQL.com’s New Learning Path: Data Engineering. Remember, when you increase your SQL skills, you increase your assets!
