How to Select the First Row in Each GROUP BY Group

Database:

Operators:

ROW_NUMBER
PARTITION BY
WITH

Problem:

You've grouped your data with GROUP BY and would like to display only the first row from each group.

Example:

Our database has a table named exam_results with data in the following table:

first_name	last_name	year	result
John	Klein	2020	40
Edith	Black	2020	43
Mark	Johnson	2019	32
Laura	Summer	2020	35
Kate	Smith	2019	41
Jacob	Black	2019	44
Tom	Bennett	2020	38
Emily	Kelly	2020	43

For each year, let's find the student with the best result. If there are two students tied for the best in a group, we'll arbitrarily select one of them to display.

Learn advanced SQL with out interactive SQL courses

Solution:

WITH added_row_number AS (
  SELECT
    *,
    ROW_NUMBER() OVER(PARTITION BY year ORDER BY result DESC) AS row_number
  FROM exam_results
)
SELECT
  *
FROM added_row_number
WHERE row_number = 1;

The result is:

first_name	last_name	year	result	row_number
Jacob	Black	2019	44	1
Emily	Kelly	2020	43	1

Discussion:

First, you need to write a CTE in which you assign a number to each row within each group. To do that, you can use the ROW_NUMBER() function. In OVER(), you specify the groups into which the rows should be divided (PARTITION BY) and the order in which the numbers should be assigned to the rows (ORDER BY).

Take a look at the result of the inner query:

SELECT
  *,
  ROW_NUMBER() OVER(PARTITION BY year ORDER BY result DESC) AS row_number
FROM exam_results;

first_name	last_name	year	result	row_number
Jacob	Black	2019	44	1
Kate	Smith	2019	41	2
Mark	Johnson	2019	32	3
Emily	Kelly	2020	43	1
Edith	Black	2020	43	2
John	Klein	2020	40	3
Tom	Bennett	2020	38	4
Laura	Summer	2020	35	5

You assign the row numbers within each group (i.e., year). Each row has a row number based on the value of the result column. The rows are sorted in the descending order because of the DESC keyword after ORDER BY result. Even if there are multiple rows within a group that have the same value of result, the rows are still given different numbers. Here, Edith Black and Emily Kelly have the same result but different row numbers. To change this behavior and assign the same row number for the same result within a group, use RANK() or DENSE_RANK() instead of ROW_NUMBER().

In the outer query, you select all data from the CTE (added_row_number) and use a WHERE condition to specify which row to display from each group. Here, we want to display the first row, so the condition is row_number = 1.

Note that you can easily modify the solution to get, for example, the second row of each group.

WITH added_row_number AS (
  SELECT
    *,
    ROW_NUMBER() OVER(PARTITION BY year ORDER BY result DESC) AS row_number
  FROM exam_results
)
SELECT
  *
FROM added_row_number
WHERE row_number = 2;

Here's the result:

first_name	last_name	year	result	row_number
Kate	Smith	2019	41	2
Edith	Black	2020	43	2

On the other hand, if you want to get the row(s) with the second highest value of result within each group, you should use the DENSE_RANK() function. While the ROW_NUMBER() function creates consecutive numbers for each row in a group, resulting in different values assigned to the rows with the same result, the DENSE_RANK() function gives the same number to the rows with the same result.

WITH added_dense_rank AS (
  SELECT
    *,
    DENSE_RANK() OVER(PARTITION BY year ORDER BY result DESC) AS rank
  FROM exam_results
)
SELECT
  *
FROM added_dense_rank
WHERE rank = 2;

first_name	last_name	year	result	rank
Kate	Smith	2019	41	2
John	Klein	2020	40	2

You can see that John Klein has the second highest value of result (40) for the year 2020. John Klein is actually the third person in the group, but the first two students have the same result and they both have rank = 1.

Problem:

Example:

Solution:

Discussion:

Recommended courses:

Recommended articles:

See also: