# How to Select the First Row in Each GROUP BY Group

Database:

Operators:

Table of Contents

## Problem:

You've grouped your data with `GROUP BY`

and would like to display only the first row from each group.

## Example:

Our database has a table named

with data in the following table:**exam_results**

first_name | last_name | year | result |
---|---|---|---|

John | Klein | 2020 | 40 |

Edith | Black | 2020 | 43 |

Mark | Johnson | 2019 | 32 |

Laura | Summer | 2020 | 35 |

Kate | Smith | 2019 | 41 |

Jacob | Black | 2019 | 44 |

Tom | Bennett | 2020 | 38 |

Emily | Kelly | 2020 | 43 |

For each year, let's find the student with the best `result`

. If there are two students tied for the best in a group, we'll arbitrarily select one of them to display.

## Solution:

WITH added_row_number AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY year ORDER BY result DESC) AS row_number FROM exam_results ) SELECT * FROM added_row_number WHERE row_number = 1;

The result is:

first_name | last_name | year | result | row_number |
---|---|---|---|---|

Jacob | Black | 2019 | 44 | 1 |

Emily | Kelly | 2020 | 43 | 1 |

## Discussion:

First, you need to write a CTE in which you assign a number to each row within each group. To do that, you can use the `ROW_NUMBER()`

function. In `OVER()`

, you specify the groups into which the rows should be divided (`PARTITION BY`

) and the order in which the numbers should be assigned to the rows (`ORDER BY`

).

Take a look at the result of the inner query:

SELECT *, ROW_NUMBER() OVER(PARTITION BY year ORDER BY result DESC) AS row_number FROM exam_results;

first_name | last_name | year | result | row_number |
---|---|---|---|---|

Jacob | Black | 2019 | 44 | 1 |

Kate | Smith | 2019 | 41 | 2 |

Mark | Johnson | 2019 | 32 | 3 |

Emily | Kelly | 2020 | 43 | 1 |

Edith | Black | 2020 | 43 | 2 |

John | Klein | 2020 | 40 | 3 |

Tom | Bennett | 2020 | 38 | 4 |

Laura | Summer | 2020 | 35 | 5 |

You assign the row numbers within each group (i.e., year). Each row has a row number based on the value of the `result`

column. The rows are sorted in the descending order because of the `DESC`

keyword after `ORDER BY result`

. Even if there are multiple rows within a group that have the same value of `result`

, the rows are still given different numbers. Here, Edith Black and Emily Kelly have the same `result`

but different row numbers. To change this behavior and assign the same row number for the same result within a group, use `RANK()`

or `DENSE_RANK()`

instead of `ROW_NUMBER()`

.

In the outer query, you select all data from the CTE (

) and use a **added_row_number**`WHERE`

condition to specify which row to display from each group. Here, we want to display the first row, so the condition is `row_number = 1`

.

Note that you can easily modify the solution to get, for example, the **second row** of each group.

WITH added_row_number AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY year ORDER BY result DESC) AS row_number FROM exam_results ) SELECT * FROM added_row_number WHERE row_number = 2;

Here's the result:

first_name | last_name | year | result | row_number |
---|---|---|---|---|

Kate | Smith | 2019 | 41 | 2 |

Edith | Black | 2020 | 43 | 2 |

On the other hand, if you want to get the row(s) with the **second highest value** of `result`

within each group, you should use the `DENSE_RANK()`

function. While the `ROW_NUMBER()`

function creates consecutive numbers for each row in a group, resulting in different values assigned to the rows with the same result, the `DENSE_RANK()`

function gives the same number to the rows with the same result.

WITH added_dense_rank AS ( SELECT *, DENSE_RANK() OVER(PARTITION BY year ORDER BY result DESC) AS rank FROM exam_results ) SELECT * FROM added_dense_rank WHERE rank = 2;

first_name | last_name | year | result | rank |
---|---|---|---|---|

Kate | Smith | 2019 | 41 | 2 |

John | Klein | 2020 | 40 | 2 |

You can see that John Klein has the second highest value of `result (40)`

for the year 2020. John Klein is actually the third person in the group, but the first two students have the same `result`

and they both have `rank = 1`

.