Back to articles list Articles Cookbook
9 minutes read

Six Examples Using MySQL Window Functions

Window functions are an advanced SQL feature available in most popular databases. MySQL had not supported them for a long time, but that changed in Version 8.0. They are helpful not only for analysts and people who create reports, but also for other professionals who use databases to select data needed. In this article, we explain the syntax of some popular window functions with practical examples.

What Is a Window Function?

Window functions operate on a window frame, or a set of rows that are somehow related to the current row. They are similar to GROUP BY, because they compute aggregate values for a group of rows. However, unlike GROUP BY, they do not collapse rows; instead, they keep the details of individual rows.

A very useful feature especially for data analysts, marketers, and financial specialists, most popular relational databases support window functions today. MySQL had not supported them until it finally caught up starting in Version 8.0, when they were added as a new advanced feature.

For people who have worked exclusively in MySQL, window functions may be something completely new. If you’re a MySQL specialist who has been using it since before Version 8.0, learn this new feature! Find out why you should in the article “Why Should I Learn SQL Window Functions?”.

How do window functions work? Consider the table sale shown below, with columns product ID (ID), price (value), and year of sale (sale_year). Suppose we want to add the column prev_value for each product, containing the value of the previous year. See the picture below, which shows how the window function LAG() works:

Six Examples Using MySQL Window Functions

The query looks like this:

SELECT ID, value, sale_year,
  LAG(value) 
    OVER(
      PARTITION BY ID 
      ORDER BY year
    ) AS prev_value
FROM sales;

The query divides the result set into partitions: one for ID 1 and the other for ID 2. Within each partition, the sale value from the previous year is obtained. Note that the year 2016 for ID 2 is the first row of the partition and therefore has no previous row within the partition; consequently, the prev_value column is NULL. For the same ID, the year 2017 is the second row in the partition; the prev_value is 80.00, which is the value from the year 2016.

As you can see, we specify an OVER clause after naming the window function. Within this clause, we use PARTITION BY to define the partitions and ORDER BY to define the sorting within each partition. Other window functions work much in the same way.

Some window functions are ranking functions like RANK(), DENSE_RANK(), and ROW_NUMBER(), while others are analytic functions like LAG() and LEAD().

Ranking functions return a rank value for each row within each partition; in contrast, analytic functions point to preceding or subsequent rows within each partition. If no partitions are defined, all rows are treated as one big partition.

Let’s look at some examples!

Example 1: The RANK() Function

The RANK() function is used mainly to create reports. It computes the rank for each row in the result set in the order specified.

The ranks are sequential numbers starting from 1. When there are ties (i.e., multiple rows with the same value in the column used to order), these rows are assigned the same rank. In this case, the rank of the next row will have skipped some numbers according to the quantity of the tied rows. For this reason, the values returned by RANK() are not necessarily consecutive numbers.

Let’s see an example. Consider the following table, product:

idnamecategoryranking_score
1Sofa Alanliving room3422
2Desk Mirianoffice1777
3Sofa Frankliving room1777
4Armchair Ivoliving room1201
5Cabinet AWEoffice4547
6Armchair Alexliving room1201

The query below displays the columns name, category, and ranking_score. It also ranks the rows using the RANK() function in the order defined by the column ranking_score:

SELECT 
RANK() OVER(ORDER BY ranking_score) AS rank_number, 
name, category, ranking_score 
FROM product;

Here is the result of the query:

rank_numbernamecategoryranking_score
1Armchair Ivoliving room1201
1Armchair Alexliving room1201
3Desk Mirianoffice1777
3Sofa Frankliving room1777
5Sofa Alanliving room3422
6Cabinet AWEoffice4547

After the RANK(), we have an OVER() clause with an ORDER BY. The ORDER BY is mandatory for ranking functions. Here, the rows are sorted in ascending order according to the column ranking_score. The order is ascending by default; you may use ASC at the end of the ORDER BY clause to clarify the ascending order, but it is not necessary.

The first two products, Armchair Ivo and Armchair Alex, both have ranking_score equal to 1201, so they are tied at rank number 1. The rank for the next value of ranking_score, 1777, is 3 and not 2, because there are 2 rows with the rank 1. The ranks start at 1, but the next rank has skipped according to the repeated ranks of the previous rows.

Example 2: The DENSE_RANK() Function

DENSE_RANK() is similar to RANK(), but it does not allow gaps in ranks in the way RANK() does. Let’s see an example.

SELECT 
DENSE_RANK() OVER(ORDER BY ranking_score DESC) AS dense_rank_number, 
name, category, ranking_score 
FROM product;

This query displays the same columns as in the previous example, but with dense_rank_number defined by DENSE_RANK() instead of rank_number defined by RANK(). The rows are still sorted by the column ranking_score, but this time in descending order denoted by DESC at the end of the ORDER BY clause.

The Desk Mirian and Sofa Frank have the same ranking_score (1777) and are assigned a dense_rank_number of 3. DENSE_RANK() returns 4 next and doesn't account for the repeated ranks as does the RANK() function. The article “Overview of Ranking Functions in SQL” explains the difference between these functions.

The result of this query looks like this:

dense_rank_numbe rnamecategoryranking_score
1Cabinet AWEoffice4547
2Sofa Alanliving room3422
3Desk Mirianoffice1777
3Sofa Frankliving room1777
4Armchair Ivoliving room1201
4Armchair Alexliving room1201

Example 3: The ROW_NUMBER() Function

Another popular ranking function used in databases is ROW_NUMBER(). It simply assigns consecutive numbers to each row in a specified order. Let’s see an example:

SELECT 
ROW_NUMBER() OVER(ORDER BY ranking_score) AS row_number, 
name, category, ranking_score 
FROM product;

The query first orders the rows by ranking_score in ascending order. It then assigns row numbers consecutively starting with 1. The rows with ties in ranking_score are assigned different row numbers, effectively ignoring the ties.

The result of this query looks like this:

row_numbernamecategoryranking_score
1Armchair Ivoliving room1201
2Armchair Alexliving room1201
3Desk Mirianoffice1777
4Sofa Frankliving room1777
5Sofa Alanliving room3422
6Cabinet AWEoffice4547

Ranking Within Each Partition

You can assign ranks separately within each partition with RANK(), DENSE_RANK(), or ROW_NUMBER(). This is done by dividing rows into partitions by a column or a combination of columns then assigning ranks independently within each partition.

Here is an example:

SELECT 
RANK() OVER(PARTITION BY category ORDER BY ranking_score) AS rank_number, 
name, category, ranking_score 
FROM product;

This query defines ranks separately within each category. For the category “living room,” the rank starts at 1 and ends at 4; in the “office” category, the rank starts at 1 and ends at 2.

Here is the result of this query:

rank_numbernamecategoryranking_score
1Armchair Ivoliving room1201
1Armchair Alexliving room1201
3Sofa Frankliving room1777
4Sofa Alanliving room3422
1Desk Mirianoffice1777
2Cabinet AWEoffice4547

The rank numbering is defined separately for each category, since each category is a partition (PARTITION BY category). The records are sorted by ranking_score (ORDER BY ranking_score) within each partition (category), and the ranks are calculated within each partition.

Example 4: The LEAD() Function

Analytical functions are a different type of window functions. They are used to compute a value within a group of rows, returning some value from a preceding row or from a subsequent row within the row group. Analytical functions are useful for finding differences in a particular column between a given row and some preceding or subsequent row. For example, you can compare the difference of a particular column between the current row and the adjacent row.

The first analytic function we will examine is LEAD(). In its simplest form, LEAD() takes the value of a given column stored in the next row.

Consider the table below, toys_sale:

idtoy_namemonthsale_value
1robot323455
2robot412345
3robot523000
4kite36890
5kite47600
6kite59120
7ball345123
8ball442000
9ball520300
10puzzle567000

and the following query:

SELECT 
   toy_name, month, sale_value,
   LEAD(sale_value) OVER(PARTITION BY toy_name ORDER BY month) 
     AS next_month_value
FROM toys_sale;

The LEAD() function returns the value of sale_value in the next row. We have partitions in this example—the name of the toy—so only the rows within the same partition are considered. In contrast to ranking functions, LEAD() takes an argument—the name of the column whose value it will return from the next row.

Here is the result of the query:

toy_namemonthsale_valuenext_month_value
ball34512342000
ball44200020300
ball520300NULL
kite368907600
kite476009120
kite59120NULL
puzzle567000NULL
robot32345512345
robot41234523000
robot523000NULL

For the last row in the partition, the next value is always NULL. Look at Month 5 for all of the toys. This is because Month 5 is the last month in the partition. There is no month that follows Month 5 in the partition, and the next sale_value does not exist.

Example 5: The LAG() Function

LAG() is similar to LEAD(). Both LEAD() and LAG() can be used to compute the difference between a given row and another row. The difference is that LAG() returns the value from a preceding row, whereas LEAD() returns the value from a subsequent row.

Take a look at the following query:

SELECT toy_name, month, sale_value, 
  LAG(sale_value) OVER(PARTITION BY toy_name ORDER BY month) AS prev_month_value, 
  LAG(sale_value) OVER(PARTITION BY toy_name ORDER BY month)- sale_value as difference 
FROM toys_sale;

Each toy name is a partition. Within each partition, the query returns the sale value from the previous month and stores it in the column prev_month_value. For Month 5, prev_month_value contains the sale value from Month 4. It is NULL for Month 3, because there is no previous month within the partition.

In addition, this query calculates the difference in sale values between the current month and the previous month and stores the difference in the column difference. Here is the result:

toy_namemonthsale_valueprev_month_valuedifference
ball345123NULLNULL
ball442000451233123
ball5203004200021700
kite36890NULLNULL
kite476006890-710
kite591207600-1520
puzzle567000NULLNULL
robot323455NULLNULL
robot4123452345511110
robot52300012345-10655

Example 6: The Running Total

You can also calculate running totals in MySQL using the SUM() function.

Take a look at this query:

SELECT toy_name, month, sale_value, 
SUM(sale_value) OVER(PARTITION BY toy_name ORDER BY month) AS total_toy_value 
FROM toys_sale;

It calculates the cumulative sum of the sale values for each toy and saves it into a new column, total_toy_sale. For Month 3, it is simply the sale value of that month, because it is the first row in the partition. However, for Month 4, it is the sum of sale values from Months 3 and 4; for Month 5, it is the sum of sale values from Months 3, 4, and 5. This calculation is called the running total, because the sum is updated with each successive row.

toy_namemonthsale_valuetotal_toy_value
ball34512345123
ball44200087123
ball520300107423
kite368906890
kite4760014490
kite5912023610
puzzle56700067000
robot32345523455
robot41234535800
robot52300058800

Using Window Functions in MySQL

We have seen several examples of window functions. Most relational databases today support them, and MySQL finally caught up in Version 8.0. If you are not familiar with them, you should really consider learning them. They are useful especially for marketers and data analysts.

If you are interested in learning more about window functions, our interactive course “Window Functions” and our “SQL Window Functions Cheat Sheet” on LearnSQL.com are good resources. I hope this article has whetted your appetite to search and learn more about SQL window functions!