Back to cookbooks list Articles Cookbook

How to Eliminate Duplicate Rows in SQL

  • DISTINCT

Table of Contents

Problem

Duplicate rows in your SQL query results can make data harder to understand and reduce the accuracy of your analysis. When the same data appears more than once, it clutters your output and complicates reports. Solving this issue is important for maintaining clean, easy-to-read, and reliable datasets that lead to better insights and decisions.

Example

Our database has a table named clothes with data in the following columns: id, name, color, and year_produced.

idnamecoloryear_produced
1T-shirtyellow2015
2jacketblue2016
3jeansblack2015
4jacketblue2015
5jacketgreen2016
6jacketyellow2017
7hatyellow2017

Let’s get the non-repeated names and colors of clothes produced before 2017.

Solution

SELECT DISTINCT name, color
FROM clothes
WHERE year_produced < 2017;

Here’s the result:

namecolor
T-shirtyellow
jacketblue
jeansblack
jacketgreen

Notice that the blue appears only once in this result set, even though it appears twice in the original table. This is because we specified that we want to select only those rows for which the name-color combination is unique.

Discussion

Simply use the DISTINCT keyword after SELECT if you want to select only non-repeated rows. This keyword forces the query to discard any duplicate rows, based only on the columns you listed.

Here’s an example of selecting only the rows for which the name of the item is unique:

SELECT DISTINCT name
FROM clothes;

Unlike the previous query, this one returns only three records:

name
T-shirt
jacket
jeans

While there are four jeans in the original table (two blue, one green, and one yellow), the item appears only once in this result set thanks to the DISTINCT keyword.

Recommended courses:

Recommended articles:

See also: