Back to cookbooks list Articles Cookbook

How to Delete Duplicate Rows in a Table in SQL Server

  • CTE
  • ROW_NUMBER()

Problem:

You want to delete duplicate rows from an existing table in SQL Server.

Example:

A company has its employee data stored in the table employees with the columns empName and dept.

empNamedept
Jack RusselSales
Jan KowalskiHR
John DoeSales
Jack RusselSales
John DoeSales
Marta WilsonHR
Jack RusselSales

Let’s say you want to delete duplicates from this table and leave only one record among the duplicates. For example, Jack Russel from the Sales department has multiple records that are the same.

Solution:

One way to approach this problem in SQL Server is to use a CTE and the ROW_NUMBER() function. Let’s look at the query:

WITH duplicates (name, dept, duplicate_count) AS (
  SELECT
	name,
	dept,
	ROW_NUMBER() OVER(PARTITION BY name, dept ORDER BY name)
  FROM employees
)
DELETE from duplicates
WHERE duplicate_count > 1

Here are the contents of the employees table after you run this query.

empNamedept
Jack RusselSales
Jan KowalskiHR
John DoeSales
Marta WilsonHR

Discussion:

First, we create a CTE called duplicates with a new column called duplicate_count, which stores all records from the table employees. The new column stores the count of repetitions for each record in the table. The first occurrence of “Jack Russel - Sales” gets duplicate_count 1, the second one gets 2, and so on. Here’s what the contents of the duplicates CTE look like:

empNamedeptduplicate_count
Jack RusselSales1
Jack RusselSales2
Jack RusselSales3
Jan KowalskiHR1
John DoeSales1
John DoeSales2
Marta WilsonHR1

We now have a CTE with information about which rows are duplicated. Those that have duplicate_count of 2 or more are duplications. With that information, we may safely delete all records that are duplications. The query keeps only the rows that have duplicate_count equal to 1.

Recommended courses:

Recommended articles:

See also: