What Is the Role of DISTINCT in SQL?

The DISTINCT keyword in SQL plays an important role in removing duplicate values from your results. This article provides examples showing exactly how DISTINCT works.

By default, SQL queries show all the returned rows, including duplicate rows, in the result set. The DISTINCT keyword in the SELECT clause is used to eliminate duplicate rows and display a unique list of values. In other words, the DISTINCT keyword retrieves unique values from a table.

The basic syntax of the DISTINCT keyword in SQL is:

SELECT DISTINCT
  column_1,
  column_2,
  …,
  column_n
FROM table_name

(If you find the above syntax confusing, consider LearnSQL.com’s SQL Basics course. It covers the fundamentals of SQL, including the DISTINCT keyword.)

Note that the DISTINCT keyword must be placed before the columns you specify. If you list more than one column, DISTINCT applies to the combination of all the listed columns, and SQL returns each unique combination of values once. In other words, SQL DISTINCT looks for distinct rows, not for distinct values in an individual column.
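
A common misreading is that parentheses can limit DISTINCT to a single column. The sketch below uses a small made-up data set to show that they cannot. The people CTE and its three rows are hypothetical and are not part of the table used in the rest of this article. It assumes a database that supports CTEs and SELECT without FROM (for example PostgreSQL, MySQL 8+, SQLite, or SQL Server).

```sql
-- Hypothetical three-row data set, built inline for illustration only
WITH people AS (
  SELECT 'Karen' AS first_name, 'Bower' AS last_name
  UNION ALL SELECT 'Karen', 'Bower'
  UNION ALL SELECT 'Karen', 'Payne'
)
-- The parentheses are just an expression grouping; DISTINCT still
-- compares the whole (first_name, last_name) row
SELECT DISTINCT (first_name), last_name
FROM people;
-- Returns two rows: (Karen, Bower) and (Karen, Payne)
```

Only the exact duplicate (Karen, Bower) is removed. The first name Karen still appears twice because the two remaining rows differ in last_name.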

Let’s see the DISTINCT keyword in action with a practical example. First, though, we should familiarize ourselves with the data we’ll be working with. Imagine we have a table called employee_info, which contains details about the employees at our company:

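employee_id | first_name | last_name | department      | age
------------|------------|-----------|-----------------|-----
127374      | Joshua     | Rees      | Sales           | 26
293421      | Karen      | Bower     | Sales           | 45
244813      | Luke       | Payne     | NULL            | 42
463281      | Max        | Paige     | Human Resources | 53
374241      | Julia      | Glover    | Human Resources | 29
568421      | Rose       | Harris    | Human Resources | 43
NULL        | NULL       | NULL      | NULL            | NULL
839211      | Olivia     | Baker     | Sales           | 32
293493      | Karen      | Bower     | Sales           | 45
463281      | Max        | Paige     | Human Resources | 53
244813      | Luke       | Payne     | NULL            | 42
933821      | Bernadette | Parsons   | Sales           | NULL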
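
If you would like to run the queries in this article yourself, the following script is one way to recreate the table. The column types are assumptions (the article does not specify them), and the table deliberately has no primary key, since it must hold duplicate and NULL employee_id values. The multi-row INSERT syntax should work in PostgreSQL, MySQL, SQLite, and SQL Server; other databases may need one INSERT per row.

```sql
-- Assumed column types; adjust them to your database if needed
CREATE TABLE employee_info (
  employee_id INTEGER,
  first_name  VARCHAR(50),
  last_name   VARCHAR(50),
  department  VARCHAR(50),
  age         INTEGER
);

-- The same 12 rows as the table above, duplicates and NULLs included
INSERT INTO employee_info (employee_id, first_name, last_name, department, age)
VALUES
  (127374, 'Joshua',     'Rees',    'Sales',           26),
  (293421, 'Karen',      'Bower',   'Sales',           45),
  (244813, 'Luke',       'Payne',   NULL,              42),
  (463281, 'Max',        'Paige',   'Human Resources', 53),
  (374241, 'Julia',      'Glover',  'Human Resources', 29),
  (568421, 'Rose',       'Harris',  'Human Resources', 43),
  (NULL,   NULL,         NULL,      NULL,              NULL),
  (839211, 'Olivia',     'Baker',   'Sales',           32),
  (293493, 'Karen',      'Bower',   'Sales',           45),
  (463281, 'Max',        'Paige',   'Human Resources', 53),
  (244813, 'Luke',       'Payne',   NULL,              42),
  (933821, 'Bernadette', 'Parsons', 'Sales',           NULL);
```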

SQL DISTINCT on a Single Column

Let’s see the effect of the DISTINCT keyword when we try to SELECT the employee_id column. First, here is the query before applying the DISTINCT keyword:

SELECT employee_id
FROM employee_info

This results in the following:

employee_id
127374
293421
244813
463281
374241
568421
NULL
839211
293493
463281
244813
933821

Executing this query returns one employee_id value for each of the 12 rows in the employee_info table, duplicates included. On closer inspection, we can see that two values are duplicated: 244813 and 463281 each appear twice.

Let’s add the DISTINCT keyword to address these duplicate values. Remember, we add the DISTINCT keyword before we specify the columns:

SELECT DISTINCT employee_id 
FROM employee_info

This results in the following:

employee_id
127374
293421
244813
463281
374241
568421
NULL
839211
293493
933821

Our result set is shorter: 10 rows instead of 12. There are several things we can learn from this result.

SQL DISTINCT Removes Duplicate Values

If we contrast the original table to our result, we can see that the duplicates were successfully filtered out. Using the DISTINCT keyword is just one method for removing duplicate values, as shown by this article on the different ways to remove duplicates in SQL.

The employee_info table contained two occurrences of the following values:

employee_id
244813
463281

However, our result now only shows these values once. The DISTINCT keyword has had the desired effect.
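
To confirm which values DISTINCT collapsed, you can count the occurrences of each employee_id and keep only those that appear more than once. This is a sketch of one common approach using GROUP BY and HAVING; the occurrences alias is an illustrative name.

```sql
-- List every employee_id that appears more than once, with its count
SELECT
  employee_id,
  COUNT(*) AS occurrences
FROM employee_info
GROUP BY employee_id
HAVING COUNT(*) > 1;
-- Returns two rows: 244813 with 2 occurrences and 463281 with 2 occurrences
```

The NULL employee_id is not listed because it appears in only one row.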

Pay Attention to the Columns Used

When you first view the employee_info table, the following two rows appear to be duplicates:

employee_id | first_name | last_name | department | age
------------|------------|-----------|------------|-----
293421      | Karen      | Bower     | Sales      | 45
293493      | Karen      | Bower     | Sales      | 45

On closer inspection, you can see that the value of employee_id is different. Since we applied the DISTINCT keyword to this employee_id column, both of these values are returned.

The Inclusion of NULL Values

You’ve probably noticed the NULL value in the result. DISTINCT treats NULL like any other value: it is included in the result, and if a column contains several NULLs, they are collapsed into a single NULL. If you would like to remove NULL values from the result, you could write the following query:

SELECT DISTINCT employee_id
FROM employee_info
WHERE employee_id IS NOT NULL
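
The department column is a good place to see how DISTINCT handles repeated NULLs, because it is NULL in three rows of employee_info (both Luke Payne rows and the all-NULL row). The queries below are a small illustration against the article's table:

```sql
-- The three NULL departments collapse into a single NULL
SELECT DISTINCT department
FROM employee_info;
-- Returns three rows (in no guaranteed order): Sales, Human Resources, NULL

-- Filtering on the same column removes the NULL entry
SELECT DISTINCT department
FROM employee_info
WHERE department IS NOT NULL;
-- Returns two rows: Sales and Human Resources
```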

SQL DISTINCT with a Subquery

You can also use the DISTINCT keyword as part of a subquery. However, it is worth mentioning that an IN, ANY, or EXISTS condition is true as soon as at least one row returned by the subquery satisfies it. Duplicate rows in the subquery's output cannot change that outcome, which makes using DISTINCT in an IN, ANY, or EXISTS subquery unnecessary.
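
As an illustration, the two queries below return the same rows. The filter age > 50 is an arbitrary condition chosen for this sketch; in employee_info it matches the two Max Paige rows, so the subquery without DISTINCT produces 'Human Resources' twice.

```sql
-- Without DISTINCT: the subquery returns 'Human Resources' twice
SELECT first_name, last_name
FROM employee_info
WHERE department IN (
  SELECT department
  FROM employee_info
  WHERE age > 50
);

-- With DISTINCT: the subquery returns 'Human Resources' once,
-- but the outer query selects exactly the same rows
SELECT first_name, last_name
FROM employee_info
WHERE department IN (
  SELECT DISTINCT department
  FROM employee_info
  WHERE age > 50
);
```

Note that DISTINCT inside the subquery does not deduplicate the outer query. Both versions still list Max Paige twice, because that row is duplicated in the table itself.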

SQL DISTINCT with Multiple Columns

The columns you specify as DISTINCT play a key role in what data gets returned. Look at the following query:

SELECT DISTINCT
  employee_id,
  first_name,
  last_name 
FROM employee_info

We are looking for unique combinations of these three columns. Executing this query yields the following result:

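employee_id | first_name | last_name
------------|------------|----------
127374      | Joshua     | Rees
293421      | Karen      | Bower
244813      | Luke       | Payne
463281      | Max        | Paige
374241      | Julia      | Glover
568421      | Rose       | Harris
NULL        | NULL       | NULL
839211      | Olivia     | Baker
293493      | Karen      | Bower
933821      | Bernadette | Parsons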

When scanning this result set, you may be expecting to see completely unique names along with their employee_id. For example, look at this data:

employee_id | first_name | last_name
------------|------------|----------
293421      | Karen      | Bower
293493      | Karen      | Bower

The employee_id values differ between these two rows, so the rows are considered unique. If we changed the query to the following:

SELECT DISTINCT
  first_name,
  last_name 
FROM employee_info

... employee_id is no longer considered; now we’re looking for unique combinations of names. Thus, only one row for “Karen Bower” is shown:

first_name | last_name
-----------|----------
Joshua     | Rees
Karen      | Bower
Luke       | Payne
Max        | Paige
Julia      | Glover
Rose       | Harris
NULL       | NULL
Olivia     | Baker
Bernadette | Parsons

If you are struggling to achieve your desired results when using the DISTINCT keyword, pay special attention to the columns you have included. If this specific example is still causing you confusion, check out this article on how to count distinct values in SQL.
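
If you only need the number of unique name combinations rather than the list itself, one portable approach is to wrap the DISTINCT query in a derived table and count its rows. This is a sketch; the unique_names and name_combinations aliases are illustrative names.

```sql
-- Count the unique (first_name, last_name) combinations
SELECT COUNT(*) AS name_combinations
FROM (
  SELECT DISTINCT
    first_name,
    last_name
  FROM employee_info
) unique_names;
-- Returns 9, which includes the (NULL, NULL) combination
```

Add a WHERE first_name IS NOT NULL filter to the inner query if the all-NULL row should not be counted.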

SQL DISTINCT with Aggregate Functions

The DISTINCT keyword can be used along with aggregate functions. The most common aggregate functions are:

  • MIN() - Returns the smallest (minimal) value.
  • MAX() - Returns the largest (maximal) value.
  • SUM() - Returns the sum of values specified.
  • AVG() - Returns the average of the specified values (e.g. in a column).
  • COUNT() - Returns the number of items (e.g. in a column).

When DISTINCT is used with SUM(), AVG(), and COUNT(), it eliminates duplicate values before the sum, average, or count is calculated. While you can use the DISTINCT keyword with MIN() and MAX(), it is pointless to do so; it will have no impact on the results.
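
The query below sketches this difference using the age column of the employee_info table. Summing ages is not a meaningful business metric; it is used here only because the column contains repeated values (42, 45, and 53 each appear twice). The column aliases are illustrative names.

```sql
SELECT
  SUM(age)          AS sum_all,       -- 410: every non-NULL age is added
  SUM(DISTINCT age) AS sum_distinct,  -- 270: each age value is added once
  MIN(age)          AS min_all,       -- 26
  MIN(DISTINCT age) AS min_distinct   -- 26: DISTINCT changes nothing here
FROM employee_info;
```

Be careful with this pattern: DISTINCT compares only the values of the aggregated column. The two Karen Bower rows have different employee_id values but the same age, so SUM(DISTINCT age) counts that age only once, even if the rows describe two different people.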

Let’s apply this information to our previous example involving the employee_info table. If we wanted to count the number of employees currently working at our company while taking care to remove duplicate entries, we’d write the following query:

SELECT COUNT(DISTINCT employee_id)
FROM employee_info

The result of this query is:

COUNT(DISTINCT employee_id)
9

This would be the simplest example of how the DISTINCT keyword can be used in conjunction with aggregate functions. Executing this query without the DISTINCT keyword would include duplicate values in the count:

SELECT COUNT(employee_id) 
FROM employee_info

The result of this query is:

COUNT(employee_id)
11

If you think your table may contain duplicate records, using DISTINCT is an excellent way to ensure your reporting is accurate and remains unaffected by duplicate values. If this use case isn’t clear to you, I recommend reading this article on counting distinct values in SQL. There are many other use cases for COUNT(); for additional useful examples, check out this article explaining the differences between common applications of the COUNT() function.
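
The three most common forms of COUNT() can be compared side by side in a single query. The sketch below runs against the employee_info table; the column aliases are illustrative names.

```sql
SELECT
  COUNT(*)                    AS all_rows,      -- 12: every row, NULLs included
  COUNT(employee_id)          AS non_null_ids,  -- 11: rows with a non-NULL employee_id
  COUNT(DISTINCT employee_id) AS unique_ids     -- 9: unique non-NULL employee_id values
FROM employee_info;
```

Unlike SELECT DISTINCT, which keeps one NULL in its result, COUNT(DISTINCT employee_id) ignores NULL entirely. That is why SELECT DISTINCT employee_id returned 10 rows while this count is 9.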

DISTINCT with Aggregate Functions and GROUP BY

Let’s use an aggregate function with the DISTINCT keyword to help us solve a common problem. We want to find the number of potential duplicate employee_id entries for each name pairing in our table. We want to isolate the case where the employee’s name is exactly the same, but with a different employee_id. Here’s the query:

SELECT
  first_name,
  last_name,
  COUNT(DISTINCT employee_id) 
FROM employee_info 
GROUP BY first_name, last_name

And the result:

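first_name | last_name | COUNT(DISTINCT employee_id)
-----------|-----------|----------------------------
NULL       | NULL      | 0
Bernadette | Parsons   | 1
Joshua     | Rees      | 1
Julia      | Glover    | 1
Karen      | Bower     | 2
Luke       | Payne     | 1
Max        | Paige     | 1
Olivia     | Baker     | 1
Rose       | Harris    | 1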

We can see from these results that we have two different entries for Karen Bower’s employee_id. This is an excellent piece of information to have. If there are not two different individuals named Karen Bower working at our company, we know that this information was entered in error and can be considered a duplicate to be removed.
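
On a larger table, scanning the full list for counts above 1 is impractical. A possible refinement, sketched below, is to add a HAVING clause so that only the name pairings with more than one employee_id are returned. The id_count alias is an illustrative name.

```sql
-- Keep only the names associated with more than one employee_id
SELECT
  first_name,
  last_name,
  COUNT(DISTINCT employee_id) AS id_count
FROM employee_info
GROUP BY first_name, last_name
HAVING COUNT(DISTINCT employee_id) > 1;
-- Returns one row: Karen, Bower, 2
```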

Learning about DISTINCT in SQL Is Just the Start!

The ability to isolate and exclude duplicate data from your query results is an important skill for all SQL users. The DISTINCT keyword plays an important role in your ability to do this effectively. If you enjoyed this article and want to continue learning about SQL, consider our SQL Fundamentals learning track. It will teach you basic SQL statements like WHERE, GROUP BY, ORDER BY, and HAVING. You’ll also learn how to JOIN tables and add, modify, or remove data from a database.