21st May 2020 8 minutes read

How to Number Rows in an SQL Result Set

window functions

Table of Contents

A Frequent Request: Could You Number the Records?
Going Deeper: The Partition By and Order By Clauses
Using ROW_NUMBER to Remove Duplicates
Create a Ranking Report with ROW_NUMBER

Have you ever needed to add a sequential number to the records in the result of an SQL query? It’s not as simple as it looks! Find out how to do it correctly in this article.

To number rows in a result set, you have to use an SQL window function called ROW_NUMBER(). This function assigns a sequential integer number to each result row. However, it can also be used to number records in different ways, such as by subsets. You can even use it to number records for other interesting purposes, as we will see.

To learn how window functions work, what functions there are, and how to apply them to real-world problems, it’s best to take the Window Functions course. You can find all about it here. It’s interactive, there are 218 exercises, and you only need a web browser and some basic SQL knowledge.

A Frequent Request: Could You Number the Records?

Suppose you work for a car sales company and you want to produce the following report. Note that the first column (row_num), is not a table column; we generate it using ROW_NUMBER() in the query.

row_num	Article_code	Article_name	Branch	Units_sold
1	101	Katan 2.3 Lux	New York	23
2	102	Katan 1.8 Std	New York	17
3	102	Katan 1.8 Std	San Francisco	18
4	101	Katan 2.3 Lux	San Francisco	15
5	103	Katan Gold	New York	3

Result table

The query to obtain the report is:

SELECT ROW_NUMBER() OVER () AS row_num,
       article_code,
       article_name,
       branch,
       units_sold
FROM  Sales
WHERE article_code IN ( 101, 102, 103 )

In the above query, the syntax of the ROW_NUMBER() function is very simple: we use an empty OVER clause. This means that we want to number all the records in the result set using just one sequence of numbers, assigning numbers to records without any order. This is the simplest way to use the ROW_NUMBER() function:

ROW_NUMBER() OVER () AS row_num

However, there are other ways to use ROW_NUMBER(). We can add a PARTITION BY and/or an ORDER BY clause to the OVER, as we’ll see in the next section. The PARTITION BY clause allows us to number several groups of records independently, while the ORDER BY clause allows us to number the records in specific order. In the next section, we will see some examples.

Before continuing with ROW_NUMBER(), we should say a few words about SQL window functions. As we mentioned previously, ROW_NUMBER() is a window function. There are many other window functions that you can use in your queries, like AVG(), MAX(), LEAD(), LAG() and FIRST_VALUE(). If you want to go into details, I suggest LearnSQL’s Window Functions course. It’s a step-by-step tutorial that takes you through SQL window functions using examples and exercises.

Going Deeper: The Partition By and Order By Clauses

In the previous section, we covered the simplest way to use the ROW_NUMBER() window function, i.e. just numbering all records in the result set in no particular order. In the next paragraphs, we will see three examples with some additional clauses, like PARTITION BY and ORDER BY.

In our first example, we will number the records using a different sequence for each company branch, which will be ordered by the units sold at that branch. In the next query, the PARTITION BY branch clause groups the records that have the same value in branch, assigning a different ROW_NUMBER sequence to each group/branch. (Each group has a different color in the image below.) The ORDER BY units_sold clause defines the order in which we process the rows inside the partition. In this case, the rows belonging to each partition will be ordered by unit_sold in descending order.

SELECT 
   ROW_NUMBER() OVER (PARTITION BY branch ORDER BY units_sold DESC) AS row_num,
   article_code,
   article_name,
   branch,
   units_sold
FROM  Sales
WHERE article_code IN ( 101, 102, 103 )

Note that in this query, sequences are assigned per branch – in the image below, each group of records has a different color – and ordered by units_sold. The clause we used is:

ROW_NUMBER() OVER (PARTITION BY branch ORDER BY units_sold DESC)

This clause allows us to number groups of records that have the same value in the branch column. In our example, there are two groups of records: New York (red) and San Francisco (blue). Now each group of records will be numbered (the ORDER BY clause) based on the units_sold column. The values are shown in descending order.

row_num	Article_code	Article_name	Branch	Units_sold
1	101	Katan 2.3 Lux	New York	23
2	102	Katan 1.8 Std	New York	17
3	103	Katan Gold	New York	3
1	102	Katan 1.8 Std	San Francisco	18
2	101	Katan 2.3 Lux	San Francisco	15

Result table

The OVER, PARTITION BY, and ORDER BY clauses are very common in window functions; if you want to go into details, I suggest you read the article Window Function Examples, where you will find several window functions examples explained in detail.

Using ROW_NUMBER to Remove Duplicates

Another interesting use case for the ROW_NUMBER() function is when we have full duplicated records in a table. Full duplicated records happen when the table has more than one record with the same values in all their columns (usually due to a previous fail). We will show some SQL code to fix this situation; moreover, this code can be adapted to any case of full duplicated records.

First of all, let’s insert some full duplicated records into the Sales table. Suppose we don’t have a primary key in the Sales table and an SQL developer mistakenly executes the following INSERT statement:

INSERT INTO sales 
SELECT * FROM sales WHERE branch = 'San Francisco';

After the INSERT execution, the table Sales looks like this. The last two rows are full duplicates:

Article_code	Article_name	Branch	Units_sold	Period
101	Katan 2.3 Lux	New York	23	Q1-2020
102	Katan 1.8 Std	New York	17	Q1-2020
102	Katan 1.8 Std	San Francisco	18	Q1-2020
101	Katan 2.3 Lux	San Francisco	15	Q1-2020
103	Katan Gold	New York	3	Q1-2020
102	Katan 1.8 Std	San Francisco	18	Q1-2020
101	Katan 2.3 Lux	San Francisco	15	Q1-2020

Table: Sales

To remove the duplicate records, we will add a new column called row_num and we’ll populate it with the following INSERT that uses the ROW_NUMBER() function. Note that we PARTITION BY all columns in the table. Here is the SQL code:

ALTER TABLE sales ADD COLUMN row_num INTEGER;
INSERT INTO sales 
SELECT 
 article_code, article_name, branch, units_sold, period, 
 ROW_NUMBER() OVER (PARTITION BY article_code,article_name,branch, units_sold,period) 
FROM sales ;

Then, after adding the new column and populating it with ROW_NUMBER(), our table looks like this:

Article_code	Article_name	Branch	Units_sold	Period	row_num
101	Katan 2.3 Lux	New York	23	Q1-2020	NULL
102	Katan 1.8 Std	New York	17	Q1-2020	NULL
102	Katan 1.8 Std	San Francisco	18	Q1-2020	NULL
101	Katan 2.3 Lux	San Francisco	15	Q1-2020	NULL
103	Katan Gold	New York	3	Q1-2020	NULL
102	Katan 1.8 Std	San Francisco	18	Q1-2020	NULL
101	Katan 2.3 Lux	San Francisco	15	Q1-2020	NULL
101	Katan 2.3 Lux	New York	23	Q1-2020	1
102	Katan 1.8 Std	New York	17	Q1-2020	1
102	Katan 1.8 Std	San Francisco	18	Q1-2020	1
101	Katan 1.8 Lux	San Francisco	15	Q1-2020	1
103	Katan Gold	New York	3	Q1-2020	1
102	Katan 1.8 Std	San Francisco	18	Q1-2020	2
101	Katan 2.3 Lux	San Francisco	15	Q1-2020	2

Table: Sales

It’s easy to see we need to remove all records with a NULL or a 2 in the column row_num. Let’s do it with the DELETE command. After that, we need to remove the column row_num. Here’s the code:

DELETE FROM sales WHERE row_rank IS NULL OR row_rank = 2;
ALTER TABLE sales DROP COLUMN row_rank;

After running the DELETE and ALTER statements, the table Sales is fixed, with no duplicate records.

There is a disadvantage with this approach that we must clarify. At some point in the process, the number of records in the table Sales will be doubled. That can make this method inefficient, especially with large tables. Thus, we recommend using it only on small and medium tables.

Create a Ranking Report with ROW_NUMBER

In this section, we will use the ROW_NUMBER() function to create a ranking. We will see that there are better functions for ranking, like RANK and DENSE_RANK; however, we can build a pretty good ranking report by using ROW_NUMBER().

Let’s suppose that once a year our car sales company gives three bonuses to their salespeople: one bonus is for the person who sold the most units, another bonus is for the person who made the most revenue, and the third bonus is for the person who made the most profit. If any bonus category is won by two sales reps, then both reps receive 50% of the bonus. We will use the table Sellers_2019 to obtain the rankings and define the winner of each bonus.

Seller_name	Units_sold	Revenue	Profit
John Doyle	123	834.000	38%
Mary Smith	121	914.000	39%
Susan Graue	123	874.000	39%
Simon Doe	117	824.000	42%
Henry Savosky	120	813.000	35%

Table: Sellers_2019

The following query will return the rankings we need to define the bonus winners. These rankings will be on the columns units_ranking, revenue_ranking, and profit_ranking.

SELECT  seller_name,
	  ROW_NUMBER() OVER (ORDER BY units_sold desc) units_ranking,
	  ROW_NUMBER() OVER (ORDER BY revenue desc) revenue_ranking,
	  ROW_NUMBER() OVER (ORDER BY profit desc) profit_ranking
FROM sellers;

The next image shows the results of the previous query with the ROW_NUMBER() function. Initially, the ranking values seem to be correct. But if we review them in detail, we can find a mistake with Susan Graue in Units_ranking. Susan (and John Doyle) won the units sold bonus with 123 units. However, the ROW_NUMBER function assigns position 1 to John and position 2 to Susan, which is not correct.

Seller_name	Units_ranking	Revenue_ranking	Profit_ranking
John Doyle	1	3	4
Mary Smith	3	1	2
Susan Graue	2	2	3
Simon Doe	5	4	1
Henry Savosky	4	5	5

Result table

So, in this case the ROW_NUMBER() function is not the best choice for ranking calculations. Fortunately, SQL provides two functions specifically for ranking purposes: RANK() and DENSE_RANK(). The next query uses the RANK() function instead of the ROW_NUMBER():

SELECT  seller_name,
	  RANK() OVER (ORDER BY units_sold desc) units_ranking,
	  RANK() OVER (ORDER BY revenue desc) revenue_ranking,
	  RANK() OVER (ORDER BY profit desc) profit_ranking
FROM sellers;

In the next image, we can see the results of the RANK() query. We can verify that the issue with Susan’s rank is solved.

Seller_name	Units_ranking	Revenue_ranking	Profit_ranking
John Doyle	1	3	4
Mary Smith	3	1	2
Susan Graue	1	2	2
Simon Doe	5	4	1
Henry Savosky	4	5	5

Result table

If you are interested in the RANK() and DENSE_RANK() window functions, I suggest you read the article How to use RANK functions. It will give you several examples and queries.

Tags:

window functions

A Frequent Request: Could You Number the Records?

Going Deeper: The Partition By and Order By Clauses

Using ROW_NUMBER to Remove Duplicates

Create a Ranking Report with ROW_NUMBER

You may also like