Articles Cookbook
Back to list
Standard SQL

How to Find Duplicate Rows in SQL?

Database:

Operators:

HAVING, COUNT, GROUP BY

Problem:

You have duplicate rows in your table, with only the IDs being unique. How do you find those duplicate entries?

Example:

Our database has a table named product with data in the following columns: id, name, and category.

idnamecategory
1steakmeat
2cakesweets
3d>steakmeat
4porkmeat
5cakesweets
6cakesweets

Let’s find duplicate names and categories of products. You can find duplicates by grouping rows, using the COUNT aggregate function, and specifying a HAVING clause with which to filter rows.

Solution:

SELECT  name,
 category,
FROM product
GROUP BY name, category
HAVING COUNT(id) >1;

This query returns only duplicate records—ones that have the same product name and category:

namecategory
steakmeat
cakesweets

There are two duplicate products in our table: steak from the meat category and cake from the sweets category. The first product is repeated two times in the table, while the second appears three times.

Discussion:

To select duplicate values, you need to create groups of rows with the same values and then select the groups with counts greater than one. You can achieve that by using GROUP BY and a HAVING clause.

The first step is to create groups of records with the same values in all non-ID columns (in our example, name and category). You do this with a GROUP BY clause. After the GROUP BY keyword, you put the names of the columns you want to use for grouping. We exclude the id column because it’s our table’s primary key; by definition, each row will have a different value under that column. If we were to include it, then we would not be able to detect duplicates!

We want to find groups with more than one row; such groups must contain a duplicate by definition, as long as we’ve grouped on the correct columns. To do this, we use a HAVING clause. The condition we specify is that the number of elements in the group—COUNT(id)—must be greater than one: COUNT(id) > 1. Remember that HAVING allows you to filter groups; WHERE is for filtering individual rows.

Recommended courses:

go to top