Articles Cookbook
Back to list
Standard SQL

How to Select the First Row in Each GROUP BY Group

Database:

Operators:

ROW_NUMBER, PARTITION BY, WITH

Problem:

You've grouped your data with GROUP BY and would like to display only the first row from each group.

Example:

Our database has a table named exam_results with data in the following table:

first_namelast_nameyearresult
JohnKlein202040
EdithBlack202043
MarkJohnson201932
LauraSummer202035
KateSmith201941
JacobBlack201944
TomBennett202038
EmilyKelly202043

For each year, let's find the student with the best result. If there are two students tied for the best in a group, we'll arbitrarily select one of them to display.

Solution:

WITH added_row_number AS (
  SELECT
    *,
    ROW_NUMBER() OVER(PARTITION BY year ORDER BY result DESC) AS row_number
  FROM exam_results
)
SELECT
  *
FROM added_row_number
WHERE row_number = 1;

The result is:

first_namelast_nameyearresultrow_number
JacobBlack2019441
EmilyKelly2020431

Discussion:

First, you need to write a CTE in which you assign a number to each row within each group. To do that, you can use the ROW_NUMBER() function. In OVER(), you specify the groups into which the rows should be divided (PARTITION BY) and the order in which the numbers should be assigned to the rows (ORDER BY).

Take a look at the result of the inner query:

SELECT
  *,
  ROW_NUMBER() OVER(PARTITION BY year ORDER BY result DESC) AS row_number
FROM exam_results;
first_namelast_nameyearresultrow_number
JacobBlack2019441
KateSmith2019412
MarkJohnson2019323
EmilyKelly2020431
EdithBlack2020432
JohnKlein2020403
TomBennett2020384
LauraSummer2020355

You assign the row numbers within each group (i.e., year). Each row has a row number based on the value of the result column. The rows are sorted in the descending order because of the DESC keyword after ORDER BY result. Even if there are multiple rows within a group that have the same value of result, the rows are still given different numbers. Here, Edith Black and Emily Kelly have the same result but different row numbers. To change this behavior and assign the same row number for the same result within a group, use RANK() or DENSE_RANK() instead of ROW_NUMBER().

In the outer query, you select all data from the CTE (added_row_number) and use a WHERE condition to specify which row to display from each group. Here, we want to display the first row, so the condition is row_number = 1.

Note that you can easily modify the solution to get, for example, the second row of each group.

WITH added_row_number AS (
  SELECT
    *,
    ROW_NUMBER() OVER(PARTITION BY year ORDER BY result DESC) AS row_number
  FROM exam_results
)
SELECT
  *
FROM added_row_number
WHERE row_number = 2;

Here's the result:

first_namelast_nameyearresultrow_number
KateSmith2019412
EdithBlack2020432

On the other hand, if you want to get the row(s) with the second highest value of result within each group, you should use the DENSE_RANK() function. While the ROW_NUMBER() function creates consecutive numbers for each row in a group, resulting in different values assigned to the rows with the same result, the DENSE_RANK() function gives the same number to the rows with the same result.

WITH added_dense_rank AS (
  SELECT
    *,
    DENSE_RANK() OVER(PARTITION BY year ORDER BY result DESC) AS rank
  FROM exam_results
)
SELECT
  *
FROM added_dense_rank
WHERE rank = 2;
first_namelast_nameyearresultrank
KateSmith2019412
JohnKlein2020402

You can see that John Klein has the second highest value of result (40) for the year 2020. John Klein is actually the third person in the group, but the first two students have the same result and they both have rank = 1.

Recommended courses:

Recommended articles:

See also:

go to top