What is a Primary Key in SQL?

Ignacio L. Bisso
22nd Oct 2020
8 minutes read
Data Engineering

Table of Contents
    How to Identify Rows in a Table
    What Is a Primary Key?
    How to Create a Primary Key
    Non-Numeric Primary Keys
    Primary Keys With More Than One Column
    Next Steps With Primary Keys

Primary keys are an important concept in SQL databases. They provide a unique ID for every row in a database table. As an SQL developer, you should know what primary keys are, what data values are allowed in them, and how to create them. Read this article to learn the ABCs of primary keys in 10 minutes.

How to Identify Rows in a Table

Tables are the main objects in an SQL database, and as you probably know, tables store records or rows. In order to identify each row of a table, we need to find a column in the table that has a different value for every row.

For example, if we have a table of United States citizens, we can use the column social_security_number to identify each row in the table. As another example, if we have a table of savings accounts at a bank, the column account_number can be used to uniquely identify rows.

Not all tables have a natural identifier like this, though. We’ll talk about this later in the article.

What Is a Primary Key?

Once you identify a column that has a different value for each row in the table, you can create a primary key using this column. You can view the primary key as the “main identifier” for every row in the table based on a certain column.

Primary keys implement a technical mechanism that checks that every row has a unique, non-empty value in the primary key column. For example, if you try to insert a new row with a duplicate value in the primary key column, this row will be rejected by the primary key.

Another feature of primary keys is a control to avoid NULL values. NULL values are not allowed in the primary key column.
In the case of the table citizen, a row with a NULL value in the social_security_number column will be rejected by the primary key. In other words, primary keys guarantee that all rows in the table have unique and non-NULL values in the primary key column.

Question: What is a SQL Primary Key?
In SQL databases, a primary key is a unique identifier for each table row; it does not allow duplicate or NULL values and guarantees the distinctness of each row in that table.

How to Create a Primary Key

Suppose we want to create the table citizen as seen below:

    social_security_number  last_name  first_name  born_date
    721071426               Doe        John        06-25-1912
    211271298               Smith      Mary        01-23-1903

Now, let’s talk about the SQL syntax for creating a primary key. If we want to define a primary key for the table citizen, we can create the table in the following way:

    CREATE TABLE citizen (
        social_security_number integer PRIMARY KEY,
        last_name varchar(40),
        first_name varchar(40),
        born_date date
    );

We used the PRIMARY KEY clause in the CREATE TABLE statement to define which column is the primary key. After defining the primary key, if we try to insert a new record with a duplicate value in the social_security_number column, the INSERT will fail with an error like the following (the exact wording depends on the database):

    INSERT INTO citizen VALUES (721071426, 'Kant','Peter','1920-09-22');

    ERROR: duplicated key violation for primary key citizen_pkey
    DETAIL: Already existing key (social_security_number)=(721071426).

At this point, I want to share a question that I have heard several times. Is it mandatory to have a primary key in each table? No, it is not. You can create an SQL table without defining a primary key. However, database modeling best practices suggest creating a primary key for every table in the database. If you have a primary key, you can be sure you will not have duplicates or NULL values in the primary key column.
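The NULL check works the same way as the duplicate check shown above. The following is a minimal sketch, assuming the citizen table from the CREATE TABLE statement above and a database that follows the standard behavior described in this article; all inserted values (the names, dates, and the number 305114872) are invented for illustration.

```sql
-- Assumes the citizen table defined above already exists.
-- All values below are made up for illustration.

-- Expected to be rejected: the primary key column cannot be NULL.
INSERT INTO citizen (social_security_number, last_name, first_name, born_date)
VALUES (NULL, 'Kant', 'Peter', '1920-09-22');

-- Expected to succeed: the key value is non-NULL and not yet in the table.
INSERT INTO citizen (social_security_number, last_name, first_name, born_date)
VALUES (305114872, 'Kant', 'Peter', '1920-09-22');

-- Expected to be rejected: 305114872 already exists in the key column,
-- even though every other column holds different data.
INSERT INTO citizen (social_security_number, last_name, first_name, born_date)
VALUES (305114872, 'Hume', 'Anna', '1931-04-02');
```

Error messages differ between database systems, so none are shown here. Some systems also deviate from this behavior in specific cases; SQLite, for example, treats a NULL inserted into an INTEGER PRIMARY KEY column as a request to generate a value automatically.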
Before continuing to the next section, I would like to suggest the article How to Create a Table in SQL, where you can learn about data types and NULL constraints, among other important concepts.

Non-Numeric Primary Keys

In the previous examples, we showed only numeric primary keys. However, it is possible to have primary keys of other data types. For example, let’s suppose we have a table that stores all the airports in the world. Some of the columns are airport_name, airport_city, airport_country, and airport_code. The airport code is a unique three-letter code assigned to each airport by the International Air Transport Association (IATA). We can use this non-numeric code as the primary key.

    airport_code  airport_name       airport_city  airport_country
    JFK           John Kennedy       New York      USA
    LGA           La Guardia         New York      USA
    CDG           Charles de Gaulle  Paris         France
    BCN           El Prat            Barcelona     Spain

The following SQL command creates the table airport with a primary key in the column airport_code:

    CREATE TABLE airport (
        airport_code char(3) PRIMARY KEY,
        airport_name varchar(40),
        airport_city varchar(40),
        airport_country varchar(40)
    );

Note that we use a char(3) data type for the airport_code to store the three-letter code.

Another example is a table with all of the registered properties in a specific state. Each property is identified by a two-letter code, a hyphen, and an eight-digit number, like BV-11234134. The property_id column can be the primary key.

    CREATE TABLE property (
        property_id char(11) PRIMARY KEY,
        property_district varchar(40),
        property_owner varchar(40),
        property_value numeric(10,2)
    );

Because property_id is an 11-character combination of letters, a hyphen, and digits, we use a char(11) data type to store it.

Another example of a non-numeric primary key is a table of flights of an airline company. Usually, flight numbers are a combination of letters and numbers of a fixed length. So, in this case, we will reserve 10 positions for the letters and numbers in the flight number.
    CREATE TABLE flight (
        flight_number char(10) PRIMARY KEY,
        airport_code_origin char(3),
        airport_code_destination char(3),
        flight_duration interval
    );

Although non-numeric primary keys are technically feasible, numeric primary keys tend to perform better, because comparing and indexing integers is generally cheaper than comparing strings. In some cases, when performance is critical, you should consider adding a numeric column to be used as an artificial primary key. For example, if we decide to add a numeric column to the flight table in order to use it as a primary key, we would write the following command:

    CREATE TABLE flight (
        flight_id integer PRIMARY KEY,
        flight_number char(10),
        airport_code_origin char(3),
        airport_code_destination char(3),
        flight_duration interval
    );

Primary Keys With More Than One Column

In some cases, there is no natural one-column primary key. In these cases, the primary key is usually composed of two or more columns. These kinds of primary keys are called multi-column primary keys, or composite primary keys, and are very common.

For example, let’s suppose we have the table reservation for a restaurant with columns customer_name, reservation_day, reservation_time, and number_of_people. We cannot use customer_name alone as a primary key because the customer might have more than one reservation. To solve this problem, we need to add another column to the primary key to be sure the value of the primary key is unique.

So, we add the column reservation_day to the primary key, expecting that the combination of the values of customer_name and reservation_day will be unique. However, what if a customer wants to make two reservations for the same day: one for lunch and one for dinner? The database will reject the second reservation record due to a duplicate primary key value. To avoid this situation, we can add a third column to the primary key: reservation_time.
The excerpt of the table reservation below shows two rows that could not both be stored if the primary key consisted only of customer_name and reservation_day.

    customer_name  reservation_day  reservation_time  number_of_people
    John Doe       2020-09-20       11:30 AM          2
    John Doe       2020-09-20       8:00 PM           2

The syntax for adding a multi-column primary key is different from the syntax for a single-column primary key. The multi-column primary key is defined separately from the columns. You list the primary key columns in parentheses after the PRIMARY KEY keyword:

    CREATE TABLE reservation (
        customer_name varchar(40),
        reservation_day date,
        reservation_time time,
        number_of_people integer,
        PRIMARY KEY (customer_name, reservation_day, reservation_time)
    );

In general, tables used to connect two other tables are good candidates for multi-column primary keys. Suppose we have a database for a university, where we have the tables course and student. To represent the enrollment of students in courses, we have a table called enrollment with columns student_id, course_id, and start_date, among others.

If you analyze the table enrollment, you will find we need a multi-column primary key based (at least) on the columns course_id and student_id. Why? We cannot use course_id alone because we can have several students enrolled in the same course. Similarly, we cannot use student_id alone because the same student can be enrolled in several courses. The following SQL command creates the table enrollment:

    CREATE TABLE enrollment (
        course_id integer,
        student_id integer,
        start_date date,
        PRIMARY KEY (course_id, student_id)
    );

If you want to go deeper and learn more about how to create tables, views, and indexes, I suggest the course Creating Database Structure, which has many examples and exercises.

Next Steps With Primary Keys

In this article, I went over what a primary key is, the main purposes of primary keys, the different types of primary keys, and the syntax for creating primary keys in SQL.
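As a recap of how a composite primary key behaves, here is a minimal sketch that assumes the enrollment table defined in the previous section. The course IDs, student IDs, and dates are invented for illustration. The point is that uniqueness is checked on the combination (course_id, student_id), not on each column separately.

```sql
-- Assumes the enrollment table from the previous section already exists.
-- All IDs and dates are made up for illustration.

-- Student 1 enrolls in course 101: accepted.
INSERT INTO enrollment (course_id, student_id, start_date)
VALUES (101, 1, '2020-09-01');

-- Same course, different student: accepted, because the pair (101, 2) is new.
INSERT INTO enrollment (course_id, student_id, start_date)
VALUES (101, 2, '2020-09-01');

-- Same student, different course: accepted, because the pair (102, 1) is new.
INSERT INTO enrollment (course_id, student_id, start_date)
VALUES (102, 1, '2020-09-08');

-- Same pair (101, 1) again: expected to be rejected as a duplicate key,
-- even though start_date differs, since start_date is not part of the key.
INSERT INTO enrollment (course_id, student_id, start_date)
VALUES (101, 1, '2021-02-01');
```

If a student should be allowed to take the same course more than once, start_date would have to be added to the primary key, in the same way reservation_time was added to the reservation table.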
There is another concept in SQL where primary keys play an important role: foreign keys. If you want to learn about foreign keys, I suggest the course The Basics of Creating Tables in SQL, where you can learn more about primary keys and how they relate to foreign keys.

If you are interested in advancing in your SQL career, I suggest these two articles about a new field in IT called Data Engineering: What is Data Engineering? and LearnSQL.com’s New Learning Path: Data Engineering. Remember, when you increase your SQL skills, you increase your assets!