8th Sep 2020 Updated: 7th Nov 2024 9 minutes read

Seven Examples Using MySQL Window Functions

Table of Contents

What Is a Window Function?
How do Window Functions Work
Example 1: The RANK() Function
Example 2: The DENSE_RANK() Function
Example 3: The ROW_NUMBER() Function
Example 4: Ranking Within Partition
Example 5: The LEAD() Function
Example 6: The LAG() Function
Example 7: The Running Total
Using Window Functions in MySQL

Window functions are an advanced SQL feature available in most popular databases. MySQL had not supported them for a long time, but that changed in Version 8.0. They are helpful not only for analysts and people who create reports, but also for other professionals who use databases to select data needed. In this article, we explain the syntax of some popular window functions with practical examples.

What Is a Window Function?

Window functions operate on a window frame, that is: a set of rows that are somehow related to the current row. They are similar to GROUP BY, because they compute aggregate values for a group of rows. However, unlike GROUP BY, they do not collapse rows. Instead, they keep the details of individual rows.

This is a very useful feature especially for data analysts, marketers, and financial specialists, most popular relational databases support window functions today. MySQL had not supported them until Version 8.0, when they were added as a new advanced feature.

For people who have worked exclusively in MySQL, window functions may be something completely new. If you’re a MySQL specialist who has been using it since before Version 8.0, learn this new feature!

How do Window Functions Work

Consider the table sale shown below, with columns product ID (ID), price (value), and year of sale (sale_year). Suppose we want to add the column prev_value for each product, containing the value of the previous year. See the picture below, which shows how the window function LAG() works:

Six Examples Using MySQL Window Functions

The query looks like this:

SELECT ID, value, sale_year,
  LAG(value) 
    OVER(
      PARTITION BY ID 
      ORDER BY year
    ) AS prev_value
FROM sales;

The query divides the result set into partitions: one for ID 1 and the other for ID 2. Within each partition, the sale value from the previous year is obtained. Note that the year 2016 for ID 2 is the first row of the partition and therefore has no previous row within the partition; consequently, the prev_value column is NULL. For the same ID, the year 2017 is the second row in the partition; the prev_value is 80.00, which is the value from the year 2016.

As you can see, we specify an OVER clause after naming the window function. Within this clause, we use PARTITION BY to define the partitions and ORDER BY to define the sorting within each partition. Other window functions work much in the same way.

Let’s look at some examples!

Example 1: The RANK() Function

The RANK() function is used mainly to create reports. It computes the rank for each row in the result set in the order specified.

The ranks are sequential numbers starting from 1. When there are ties (i.e., multiple rows with the same value in the column used to order), these rows are assigned the same rank. In this case, the rank of the next row will have skipped some numbers according to the quantity of the tied rows. For this reason, the values returned by RANK() are not necessarily consecutive numbers.

Let’s see an example. Consider the following table, product:

id	name	category	ranking_score
1	Sofa Alan	living room	3422
2	Desk Mirian	office	1777
3	Sofa Frank	living room	1777
4	Armchair Ivo	living room	1201
5	Cabinet AWE	office	4547
6	Armchair Alex	living room	1201

The query below displays the columns name, category, and ranking_score. It also ranks the rows using the RANK() function in the order defined by the column ranking_score:

SELECT 
  RANK() OVER(ORDER BY ranking_score) AS rank_number, 
  name, 
  category, 
  ranking_score 
FROM product;

Here is the result of the query:

rank_number	name	category	ranking_score
1	Armchair Ivo	living room	1201
1	Armchair Alex	living room	1201
3	Desk Mirian	office	1777
3	Sofa Frank	living room	1777
5	Sofa Alan	living room	3422
6	Cabinet AWE	office	4547

After the RANK(), we have an OVER() clause with an ORDER BY. The ORDER BY is mandatory for ranking functions. Here, the rows are sorted in ascending order according to the column ranking_score. The order is ascending by default; you may use ASC at the end of the ORDER BY clause to clarify the ascending order, but it is not necessary.

The first two products, Armchair Ivo and Armchair Alex, both have ranking_score equal to 1201, so they are tied at rank number 1. The rank for the next value of ranking_score, 1777, is 3 and not 2, because there are 2 rows with the rank 1. The ranks start at 1, but the next rank has skipped according to the repeated ranks of the previous rows.

Example 2: The DENSE_RANK() Function

DENSE_RANK() is similar to RANK(), but it does not allow gaps in ranks in the way RANK() does. Let’s see an example.

SELECT 
  DENSE_RANK() OVER(ORDER BY ranking_score DESC) AS dense_rank_number, 
  name, 
  category, 
  ranking_score 
FROM product;

This query displays the same columns as in the previous example, but with dense_rank_number defined by DENSE_RANK() instead of rank_number defined by RANK(). The rows are still sorted by the column ranking_score, but this time in descending order denoted by DESC at the end of the ORDER BY clause.

The Desk Mirian and Sofa Frank have the same ranking_score (1777) and are assigned a dense_rank_number of 3. DENSE_RANK() returns 4 next and doesn't account for the repeated ranks as does the RANK() function. The article “Overview of Ranking Functions in SQL” explains the difference between these functions.

The result of this query looks like this:

dense_rank_number	name	category	ranking_score
1	Cabinet AWE	office	4547
2	Sofa Alan	living room	3422
3	Desk Mirian	office	1777
3	Sofa Frank	living room	1777
4	Armchair Ivo	living room	1201
4	Armchair Alex	living room	1201

Example 3: The ROW_NUMBER() Function

Another popular ranking function used in databases is ROW_NUMBER(). It simply assigns consecutive numbers to each row in a specified order. Let’s see an example:

SELECT 
  ROW_NUMBER() OVER(ORDER BY ranking_score) AS row_number, 
  name, 
  category, 
  ranking_score 
FROM product;

The query first orders the rows by ranking_score in ascending order. It then assigns row numbers consecutively starting with 1. The rows with ties in ranking_score are assigned different row numbers, effectively ignoring the ties.

The result of this query looks like this:

row_number	name	category	ranking_score
1	Armchair Ivo	living room	1201
2	Armchair Alex	living room	1201
3	Desk Mirian	office	1777
4	Sofa Frank	living room	1777
5	Sofa Alan	living room	3422
6	Cabinet AWE	office	4547

Example 4: Ranking Within Partition

You can assign ranks separately within each partition with RANK(), DENSE_RANK(), or ROW_NUMBER(). This is done by dividing rows into partitions by a column or a combination of columns then assigning ranks independently within each partition.

Here is an example:

SELECT 
  RANK() OVER(PARTITION BY category ORDER BY ranking_score) AS rank_number, 
  name, 
  category, 
  ranking_score 
FROM product;

This query defines ranks separately within each category. For the category “living room,” the rank starts at 1 and ends at 4; in the “office” category, the rank starts at 1 and ends at 2.

Here is the result of this query:

rank_number	name	category	ranking_score
1	Armchair Ivo	living room	1201
1	Armchair Alex	living room	1201
3	Sofa Frank	living room	1777
4	Sofa Alan	living room	3422
1	Desk Mirian	office	1777
2	Cabinet AWE	office	4547

The rank numbering is defined separately for each category, since each category is a partition (PARTITION BY category). The records are sorted by ranking_score (ORDER BY ranking_score) within each partition (category), and the ranks are calculated within each partition.

Example 5: The LEAD() Function

Analytical functions are a different type of window functions. They are used to compute a value within a group of rows, returning some value from a preceding row or from a subsequent row within the row group. Analytical functions are useful for finding differences in a particular column between a given row and some preceding or subsequent row. For example, you can compare the difference of a particular column between the current row and the adjacent row.

The first analytic function we will examine is LEAD(). In its simplest form, LEAD() takes the value of a given column stored in the next row.

Consider the table below, toys_sale:

id	toy_name	month	sale_value
1	robot	3	23455
2	robot	4	12345
3	robot	5	23000
4	kite	3	6890
5	kite	4	7600
6	kite	5	9120
7	ball	3	45123
8	ball	4	42000
9	ball	5	20300
10	puzzle	5	67000

and the following query:

SELECT 
  toy_name, 
  month, 
  sale_value,
  LEAD(sale_value) OVER(PARTITION BY toy_name ORDER BY month) 
    AS next_month_value
FROM toys_sale;

The LEAD() function returns the value of sale_value in the next row. We have partitions in this example—the name of the toy—so only the rows within the same partition are considered. In contrast to ranking functions, LEAD() takes an argument—the name of the column whose value it will return from the next row.

Here is the result of the query:

toy_name	month	sale_value	next_month_value
ball	3	45123	42000
ball	4	42000	20300
ball	5	20300	NULL
kite	3	6890	7600
kite	4	7600	9120
kite	5	9120	NULL
puzzle	5	67000	NULL
robot	3	23455	12345
robot	4	12345	23000
robot	5	23000	NULL

For the last row in the partition, the next value is always NULL. Look at Month 5 for all of the toys. This is because Month 5 is the last month in the partition. There is no month that follows Month 5 in the partition, and the next sale_value does not exist.

Example 6: The LAG() Function

LAG() is similar to LEAD(). Both LEAD() and LAG() can be used to compute the difference between a given row and another row. The difference is that LAG() returns the value from a preceding row, whereas LEAD() returns the value from a subsequent row.

Take a look at the following query:

SELECT 
  toy_name, 
  month, 
  sale_value, 
  LAG(sale_value) OVER(PARTITION BY toy_name ORDER BY month) 
    AS prev_month_value, 
  LAG(sale_value) OVER(PARTITION BY toy_name ORDER BY month) - sale_value 
    AS difference 
FROM toys_sale;

Each toy name is a partition. Within each partition, the query returns the sale value from the previous month and stores it in the column prev_month_value. For Month 5, prev_month_value contains the sale value from Month 4. It is NULL for Month 3, because there is no previous month within the partition.

In addition, this query calculates the difference in sale values between the current month and the previous month and stores the difference in the column difference. Here is the result:

toy_name	month	sale_value	prev_month_value	difference
ball	3	45123	NULL	NULL
ball	4	42000	45123	3123
ball	5	20300	42000	21700
kite	3	6890	NULL	NULL
kite	4	7600	6890	-710
kite	5	9120	7600	-1520
puzzle	5	67000	NULL	NULL
robot	3	23455	NULL	NULL
robot	4	12345	23455	11110
robot	5	23000	12345	-10655

Example 7: The Running Total

You can also calculate running totals in MySQL using the SUM() function.

Take a look at this query:

SELECT 
  toy_name, 
  month, 
  sale_value, 
  SUM(sale_value) OVER(PARTITION BY toy_name ORDER BY month) 
    AS total_toy_value 
FROM toys_sale;

It calculates the cumulative sum of the sale values for each toy and saves it into a new column, total_toy_sale. For Month 3, it is simply the sale value of that month, because it is the first row in the partition. However, for Month 4, it is the sum of sale values from Months 3 and 4; for Month 5, it is the sum of sale values from Months 3, 4, and 5. This calculation is called the running total, because the sum is updated with each successive row.

toy_name	month	sale_value	total_toy_value
ball	3	45123	45123
ball	4	42000	87123
ball	5	20300	107423
kite	3	6890	6890
kite	4	7600	14490
kite	5	9120	23610
puzzle	5	67000	67000
robot	3	23455	23455
robot	4	12345	35800
robot	5	23000	58800

Using Window Functions in MySQL

We have seen several examples of window functions. Most relational databases today support them, and MySQL finally caught up in Version 8.0. If you are not familiar with them, you should really consider learning them. They are useful especially for marketers and data analysts.

If you are interested in learning more about window functions in MySQL, our interactive course “Window Functions in MySQL” is a great resource. It contains over 200 hands-on practical exercises to help you build you window functions skills. I hope this article has whetted your appetite to learn more about SQL window functions!

Tags: