How to Eliminate Duplicate Rows in SQL

Database:

Operators:

DISTINCT

Table of Contents

Problem
Example
Solution
Discussion

Problem

Duplicate rows in your SQL query results can make data harder to understand and reduce the accuracy of your analysis. When the same data appears more than once, it clutters your output and complicates reports. Solving this issue is important for maintaining clean, easy-to-read, and reliable datasets that lead to better insights and decisions.

Example

Our database has a table named clothes with data in the following columns: id, name, color, and year_produced.

id	name	color	year_produced
1	T-shirt	yellow	2015
2	jacket	blue	2016
3	jeans	black	2015
4	jacket	blue	2015
5	jacket	green	2016
6	jacket	yellow	2017
7	hat	yellow	2017

Let’s get the non-repeated names and colors of clothes produced before 2017.

Solution

SELECT DISTINCT name, color
FROM clothes
WHERE year_produced < 2017;

Here’s the result:

name	color
T-shirt	yellow
jacket	blue
jeans	black
jacket	green

Notice that the blue appears only once in this result set, even though it appears twice in the original table. This is because we specified that we want to select only those rows for which the name-color combination is unique.

Discussion

Simply use the DISTINCT keyword after SELECT if you want to select only non-repeated rows. This keyword forces the query to discard any duplicate rows, based only on the columns you listed.

Here’s an example of selecting only the rows for which the name of the item is unique:

SELECT DISTINCT name
FROM clothes;

Unlike the previous query, this one returns only three records:

name
T-shirt
jacket
jeans

While there are four jeans in the original table (two blue, one green, and one yellow), the item appears only once in this result set thanks to the DISTINCT keyword.

Problem

Example

Solution

Discussion

Recommended courses:

Recommended articles:

See also: