Back to articles list Articles Cookbook
17 minutes read

A Comprehensive Introduction to Window Functions in MySQL

Do a deep dive into the working principles, syntax, and applications of various MySQL window functions.

There are many open source databases you could choose; MySQL is consistently one of the most popular. But until 2018, there was no provision for including window functions in MySQL. Fortunately, all that has changed – starting from MySQL 8.0, SQL window functions were now available to MySQL users.

Please note that window functions are available only in MySQL 8.0. As of the time of this writing, there is no plan to add SQL window functions to MySQL’s earlier releases.

Why is the advent of window functions in MySQL so important? They are a powerful data analysis and reporting tool. It’s essential to learn window functions, especially if you work with earlier releases of MySQL and are planning to upgrade to MySQL 8.0.

In this article, we’ll go through MySQL’s available window functions and explain their working principles. Next, we’ll compare window functions and aggregate functions. Finally, we’ll examine various sample queries (including their input and output tables) so you can see for yourself how window functions work.

Overview of MySQL Window Functions

The table presents the available window functions in MySQL along with a brief explanation of each:

Window FunctionDescription
RANK()Returns the rank of the current row within a defined partition. If one or more rows share the same ranking value, some rank numbers will be omitted from the sequence (e.g. if there are two rows tied for second rank, the rank sequence will be 1, 2, 2, 4…).
DENSE_RANK()Returns the rank of the current row within a defined partition. The difference between DENSE_RANK() and RANK() is that DENSE_RANK() does not skip any sequence numbers. (e.g. if there are two rows tied for second rank, the rank sequence will be 1, 2, 2, 3, 4…).
ROW_NUMBER()Returns the number of the current row within a defined partition.
PERCENT_RANK()Returns a number between 0 and 1 (both inclusive) that indicates the percentage of the current partition values that are less than the current row value of that particular partition.
LEAD()Looks up a subsequent row value for the specified column within the current partition.
LAG()Looks up a previous row value for a specified column within the current partition.
FIRST_VALUE()Returns the value of a specified column for the first row within the current partition.
LAST_VALUE()Returns the value of a specified column for the last row within the current partition.
NTH_VALUE()Returns the value of a specified column for the nth row within the current partition, where n is defined by the user.
NTILE()Divides rows within the current partition into buckets. The number of buckets is specified by the user. The NTILE() function then assigns the number of the bucket to each row.
CUME_DIST()Returns the cumulative distribution of a value within a defined partition.

For a quick overview of SQL window functions, please see our Window Functions Cheat Sheet.

MySQL Window Function Syntax

Window function syntax is as follows:

Window function syntax

The window function (red box) is declared first, followed by the OVER clause (blue box). The OVER clause determines how the rows are arranged and then processed by the window function.

Inside the OVER clause are the PARTITION BY and ORDER BY clauses (green boxes). The optional PARTITION BY clause divides window columns into groups (partitions), as you’ll see in subsequent examples. The ORDER BY clause – also optional – orders the resulting table by the values in a user-selected column.

Depending on the particular window function you choose, you’ll use either PARTITION BY, ORDER BY, neither, or both.

Finally, there’s the closing OVER bracket (blue box).

Here’s an example of the above syntax at work:

syntax

Next, we’ll compare the window functions with the GROUP BY clause. This will help you get a better general understanding of the concept behind window functions.

Window Functions vs. GROUP BY

To truly understand window functions, let’s briefly compare window functions and aggregate functions.

In aggregate functions, input table rows are summarized, so several (or many) rows are collapsed into one summary row. Let’s see an example. To do this, we'll need the table CARS_SOLD:

SaleIdCarCountrySold
1AudiGermany120
2AudiUSA110
3AudiJapan100
4BMWGermany250
5BMWUSA200
6BMWJapan200
7FordGermany260
8FordUSA300
9FordJapan200

When running this query, which uses GROUP BY with the aggregate function SUM(), the number of rows in the result table is less than in the original input table. This is because the rows are treated as the operands in the addition operation. It returns one summary row for each car type.

SELECT 
Car, 
SUM(Sold)
FROM CARS_SOLD
GROUP BY Car;

And the result table:

CarSUM(Sold)
Audi330
BMW650
Ford760

On the other hand, the number of rows in the input and output tables is the same when we use a window function:

SELECT 
Car, 
Country, 
Sold, 
SUM(Sold) 
OVER (PARTITION BY Car) as SoldBrand 
FROM CARS_SOLD;

The result is:

CarCountrySoldSoldBrand
AudiGermany120330
AudiUSA110330
AudiJapan100330
BMWGermany250650
BMWUSA200650
BMWJapan200650
FordGermany260760
FordUSA300760
FordJapan200760

The summary is returned in the SoldBrand column, where the total value of sold cars per brand is presented. Thus, the returned number of records equals the total number of rows in the input table.

You may notice that the PARTITION BY clause in this example plays the same role as the GROUP BY clause in the previous query: it groups rows that have the same values in the Car column.

For more information on the value of window functions, see our article Why Should I Learn SQL Window Functions? or check out our course on Window Functions. Be sure to read our article SQL Course of the Month - Window Functions first to find out what to expect from the course.

Now that we’ve seen the difference between window and aggregate functions, let’s start exploring MySQL window functions.

Learn MySQL Window Functions by Example

This section will introduce you to the power of MySQL window functions. We’ll examine each function on its own and explain what it does. The tables used throughout the examples are presented below.

Table CONTINENTS

ContIdContinent
1America
2Europe
3Asia

Table COUNTRIES

CountryIdCountryNameContinentTotalCountrySales
1USA190
2Germany295
3France280
4Japan360
5Italy275
6Sweden285

Table CAR_MAKERS

IdMakerProductionYearCountryPrice
2Volkswagen2015270000
3BMW20152100000
5Ford Motor Company2015180000
7Citroen2015350000
8Nissan Motors2017450000
9Fiat2017540000
11Honda2017440000
13Daimler Benz20172110000
14Opel2017265000
15Peugeot2019355000
16Renault2019360000
19Toyota2019445000
21Volvo2019675000

Example 1: The Ranking Window Functions

Let’s look at some examples presenting the ranking window functions.

RANK() Window Function

The RANK() window function assigns a ranking value to each row within the defined partition. It reinitializes the rank to start from 1 when the partition is switched.

RANK() skips sequence numbers if the row value is repeated, i.e. the same rank is given to rows with the same values.

It’s easier to understand the working principle of RANK() when we look at the example below.

SELECT 
Maker, 
Price, 
RANK() 
OVER (ORDER BY Price) as RankValue
FROM CAR_MAKERS;
MakerPriceRankValue
Fiat400001
Honda400001
Toyota450003
Citroen500004
Nissan Motors500004
Peugeot550006
Renault600007
Opel650008
Volkswagen700009
Volvo7500010
Ford Motor Company8000011
BMW10000012
Daimler Benz11000013

In this example, we are ordering the result using the Price column and treating all table rows as one partition. The first row is assigned to the rank value 1. In the second row, the rank value is 1 again because the Price column value is the same for the first and second row.

In the third row, the rank value is 3 (because of the skipped number due to the repeated Price column values); the value is 3 because this is the third row. In the fourth row, the rank value is 4 as expected. Once again, two rows share the same Price value, and thus the rank 4 is repeated. This principle is applied throughout all the rows in the example.

Next, let’s look at an example of the RANK() window function that partitions the result by the ProductionYear column and then orders each partition according to the Country column:

SELECT 
Maker, 
ProductionYear, 
Price, 
RANK() 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as RankValue
FROM CAR_MAKERS;
MakerProductionYearPriceRankValue
Citroen2015500001
Volkswagen2015700002
Ford Motor Company2015800003
BMW20151000004
Fiat2017400001
Honda2017400001
Nissan Motors2017500003
Opel2017650004
Daimler Benz20171100005
Toyota2019450001
Peugeot2019550002
Renault2019600003
Volvo2019750004

We placed the PARTITION BY clause inside the OVER() clause. This divides the result into 3 groups:

  • Group 1 with a ProductionYear of 2015.
  • Group 2 with a ProductionYear of 2017.
  • Group 3 with a ProductionYear of 2019.

Similar partitioning by the ProductionYear column takes place in the upcoming examples of this article.

In this example, each group (partition) has its rank values started from 1. The rank values are assigned using the same working principle as described in the previous example.

DENSE_RANK() Window Function

The DENSE_RANK() window function is very similar to the RANK() function. The only difference is that it does not skip any numbers in the rank sequence.

Let’s examine the example below to see the difference:

SELECT 
Maker, 
ProductionYear, 
Price, 
DENSE_RANK() 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as DenseRankValue
FROM CAR_MAKERS;
MakerProductionYearPriceDenseRankValue
Citroen2015500001
Volkswagen2015700002
Ford Motor Company2015800003
BMW20151000004
Fiat2017400001
Honda2017400001
Nissan Motors2017500002
Opel2017650003
Daimler Benz20171100004
Toyota2019450001
Peugeot2019550002
Renault2019600003
Volvo2019750004

In the seventh row of the above output table, the DenseRankValue column value is 2 – although it is in the third row of its partition. The DENSE_RANK() function does not skip the rank value 2 , which was not used in the second row due to row value repetitions. On the other hand, RANK() would use the rank value 3 in the third row of the red partition.

ROW_NUMBER() Window Function

The ROW_NUMBER() window function does exactly what its name says. It assigns a row number to each record within the partition; it reinitializes row numbers to start from 1 when the partition is switched.

First, let’s look at an example that does not use the PARTITION BY clause within the OVER() clause. Please note that although the OVER() clause is empty, you still have to define it with the window function:

SELECT 
CountryName, 
Continent,
ROW_NUMBER() OVER () as RowNumberValue
FROM COUNTRIES;
CountryNameContinentRowNumberValue
USA11
Germany22
France23
Japan34
Italy25
Sweden26

As we do not partition the result set, the output is very straightforward. The rows are numbered from 1 to 6.

Once we partition the output table, there will be more than one set of row numbers. Please note that the ORDER BY clause is optional in both of these examples.

SELECT 
Maker, 
ProductionYear, 
Price, 
ROW_NUMBER() 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as RowNumberValue
FROM CAR_MAKERS;
MakerProductionYearPriceRowNumberValue
Citroen2015500001
Volkswagen2015700002
Ford Motor Company2015800003
BMW20151000004
Fiat2017400001
Honda2017400002
Nissan Motors2017500003
Opel2017650004
Daimler Benz20171100005
Toyota2019450001
Peugeot2019550002
Renault2019600003
Volvo2019750004

Notice that rows are numbered from 1 for each of the partitioned groups.

PERCENT_RANK() Window Function

The PERCENT_RANK() window function returns the value from 0 to 1 (both inclusive), which indicates the percentage of current partition rows with a value less than the current row value. This will become clearer with an example. Have a look:

SELECT 
Maker, 
ProductionYear, 
Price, 
PERCENT_RANK() 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as PercentValue
FROM CAR_MAKERS;
MakerProductionYearPricePercentValue
Citroen2015500000
Volkswagen2015700000.3333333333333333
Ford Motor Company2015800000.6666666666666666
BMW20151000001
Fiat2017400000
Honda2017400000
Nissan Motors2017500000.5
Opel2017650000.75
Daimler Benz20171100001
Toyota2019450000
Peugeot2019550000.3333333333333333
Renault2019600000.6666666666666666
Volvo2019750001

Let’s go through the output table starting from the first row. In the first partition, there is no Price column value that is smaller than 50000. Hence, the PercentValue column value is 0, i.e. 0% of the current partition’s Price values are smaller than the current row’s Price value.

In the case of the second row, 1/3 of the Price values in the current partition are smaller than 70000. So, the PercentValue value of this row is 0.333(or 33%).

In the last row of this partition, we see that all the Price values are smaller than 100000. Thus, the PercentValue value is 1, i.e. 100% of the current partition’s Price values are smaller than the current Price value.

You can apply the same working principle to go through the remaining two partitions of the above example.

Want to know more about ranking window functions? The article Overview of Ranking Functions in SQL will help you out.

Example 2: The Value Window Functions

Below are some usage examples of MySQL’s value window functions, which are very useful for data analysis.

LEAD() Window Function

The LEAD() window function allows us to look up the values of subsequent rows in the current partition. It is commonly used to calculate the difference between current and following row values.

First, let’s look at the example that simply uses the LEAD() function to output the value of the row after the current one:

SELECT 
Maker, 
ProductionYear, 
Price, 
TotalCountrySales,
LEAD(TotalCountrySales, 1, 0) 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as LeadValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;
MakerProductionYearPriceTotalCountrySalesLeadValue
Citroen2015500008095
Volkswagen2015700009590
Ford Motor Company2015800009095
BMW2015100000950
Honda2017400006075
Fiat2017400007560
Nissan Motors2017500006095
Opel2017650009595
Daimler Benz2017110000950
Toyota2019450006080
Peugeot2019550008080
Renault2019600008085
Volvo201975000850

The LEAD() window function takes three arguments:

  • The column value to be returned – in this case, it is the TotalCountrySales column.
  • The row number (relative to the current row) to be looked up – here, LEAD() looks up the value of the row immediately after the current row.
  • A default value that’s returned if there is no subsequent row – in the current example, the LeadValue in the fourth row is 0 because there are no more rows in this partition.

In the output, the first row has a LeadValue equal to 95 – the TotalCountrySales of the second row. The second row has a LeadValue of 90, the TotalCountrySales of the third row, and so on. As I already mentioned, the LeadValue of the fourth row is 0 (the default value we defined); although there are subsequent rows in the table, the fourth row is the last row in the current partition.

This process is repeated for each partition of the resulted table.

Now let’s try to calculate the TotalCountrySales difference between the current and subsequent rows.

SELECT 
Maker, 
ProductionYear, 
Price, 
TotalCountrySales,
LEAD(TotalCountrySales , 1, 0) 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as LeadValue,
TotalCountrySales - (LEAD(TotalCountrySales , 1, 0) 
OVER (PARTITION BY ProductionYear ORDER BY Price)) 
as Difference
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;
MakerProductionYearCountryTotalCountrySalesLeadValueDifference
Citroen2015500008095-15
Volkswagen20157000095905
Ford Motor Company2015800009095-5
BMW201510000095095
Fiat201740000756015
Honda20174000060600
Nissan Motors2017500006095-35
Opel20176500095950
Daimler Benz201711000095095
Toyota2019450006080-20
Peugeot20195500080800
Renault2019600008085-5
Volvo20197500085085

Except for the Difference column, this example is similar to the previous one. This column stores the difference between the TotalCountrySales value of the current row (column TotalCountrySales) and the TotalCountrySales value of the next row (column LeadValue).

Please note that the LEAD() window function is not limited to getting the value from the immediate next row. Let’s see an example where LEAD() is used to get a value from the third row after the current row. In this example, we do not partition the result table.

SELECT 
Maker, 
ProductionYear, 
Price, 
CountrySalesIndicator,
LEAD(TotalCountrySales, 3) 
OVER (ORDER BY Price) 
as LeadValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;
MakerProductionYearPriceTotalCountrySalesLeadValue
Fiat2017400007580
Honda2017400006060
Toyota2019450006080
Citroen2015500008080
Nissan Motors2017500006095
Peugeot2019550008095
Renault2019600008085
Opel2017650009590
Volkswagen2015700009595
Volvo2019750008595
Ford Motor Company2015800009080
BMW20151000009585
Daimler Benz2017110000950

Starting from the first row of the result table, each row gets a value of the 3rd row following it. Please note that the last 3 rows values are null because there is no value for the 3rd row following them.

LAG() Window Function

The LAG() window function is the opposite of the LEAD() function. It returns the value from a previous row.

Let’s see it in action. The following example returns a LagValue and calculates the difference between the current row and the previous row:

SELECT 
Maker, 
ProductionYear, 
Price, 
TotalCountrySales,
LAG(TotalCountrySales, 1, 0) 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as LagValue,
TotalCountrySales - (LAG(TotalCountrySales, 1, 0) 
OVER (PARTITION BY ProductionYear ORDER BY Price)) 
as Difference
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;
MakerProductionYearPriceTotalCountrySalesLagValueDifference
Citroen20155000080080
Volkswagen201570000958015
Ford Motor Company2015800009095-5
BMW201510000095905
Fiat20174000075075
Honda2017400006075-15
Nissan Motors20175000060600
Opel201765000956035
Daimler Benz201711000095950
Toyota20194500060060
Peugeot201955000806020
Renault20196000080800
Volvo20197500085805

The example should speak for itself, as it is analogous to the example used for LEAD(). Here the values are looked up from the previous row; this is opposite of the LEAD() function, where the values are looked up from the subsequent row.

Like LEAD(), the LAG() window function can fetch the value of any row preceding it – not just the immediately preceding row.

FIRST_VALUE() Window Functions

This window function outputs the first value of the current partition. This will be clear after looking at the example below.

SELECT 
Maker, 
ProductionYear, 
Price,
TotalCountrySales,
FIRST_VALUE(TotalCountrySales) 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as FirstValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;
MakerProductionYearPriceTotalCountrySalesFirstValue
Citroen2015500008080
Volkswagen2015700009580
Ford Motor Company2015800009080
BMW20151000009580
Fiat2017400007575
Honda2017400006075
Nissan Motors2017500006075
Opel2017650009575
Daimler Benz20171100009575
Toyota2019450006060
Peugeot2019550008060
Renault2019600008060
Volvo2019750008560

Let’s analyze the first partition in this output table. The FirstValue column contains an 80 for all rows; this is the TotalCountrySales column value of the first row in this partition.

Example 3: Using Aggregate Functions with OVER()

Our next example demonstrates the usage of the aggregate functions SUM(), COUNT(), and AVG() with the OVER() clause. Using these functions with OVER() turns them into window functions.

Please note that the OVER() clause is mandatory for all window functions, regardless of whether it is empty or not.

SELECT 
Maker, 
ProductionYear, 
Country, 
TotalCountrySales,
SUM(TotalCountrySales) 
OVER (PARTITION BY ProductionYear) as SumValue,
COUNT(TotalCountrySales) 
OVER (PARTITION BY ProductionYear) as CountValue,
AVG(TotalCountrySales) 
OVER (PARTITION BY ProductionYear) as AvgValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;
MakerProductionYearCountryTotalCountrySalesSumValueCountValueAvgValue
Volkswagen2015295360490.0000
BMW2015295360490.0000
Ford Motor Company2015190360490.0000
Citroen2015380360490.0000
Nissan Motors2017460385577.0000
Fiat2017575385577.0000
Honda2017460385577.0000
Daimler Benz2017295385577.0000
Opel2017295385577.0000
Peugeot2019380305476.2500
Renault2019380305476.2500
Toyota2019460305476.2500
Volvo2019685305476.2500

In the above example, the sum, count, and average values are calculated using the values in the current partition. It is also worth mentioning that they are calculated for each partition separately.

Let’s analyze the first partition to see what’s going on:

  • The SumValue column stores the sum of all TotalCountrySales values in the current partition (i.e. 95+95+90+80=360).
  • The CountValue column stores the number of records in the current partition (i.e. 4).
  • The AvgValue column stores the average of TotalCountrySales column values in the current partition (i.e. (95+95+90+80)/4=90).

A good example of ‘aggregate function + OVER() clause’ usage is the SQL running total. If you want to learn more, please see the article What Is a SQL Running Total and How Do You Compute It?.

Example 4: NTILE() and CUME_DIST() Window Functions

The NTILE() Window Function

The NTILE() window function divides partition rows into buckets. It takes the parameter n that the user places in the function brackets and creates that number of buckets. Thus, n groups/buckets are created for each partition and the appropriate bucket number is assigned to each row.

Like a few of the other concepts we’ve talked about, this will be clearer after you see an example. Suppose you want to divide car makers into two groups for each production year. This is the query you’d use:

SELECT 
Maker, 
ProductionYear, 
NTILE(2) 
OVER (PARTITION BY ProductionYear ORDER BY Maker) 
as NtileValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;
MakerProductionYearNtileValue
BMW20151
Citroen20151
Ford Motor Company20152
Volkswagen20152
Daimler Benz20171
Fiat20171
Honda20171
Nissan Motors20172
Opel20172
Peugeot20191
Renault20191
Toyota20192
Volvo20192

The column NtileValue contains the bucket number – which is either 1 or 2 – for each row in each partition.

The NTILE window function is used to divide the result table rows into groups. In the example above, we partition the result table into 3 partitions and each partition’s rows are divided into 2 groups because the query uses NTILE(2).

The CUME_DIST() Window Function

The CUME_DIST() window function – as its name indicates – calculates the cumulative distribution value for each row in a partition.

Cumulative distribution is the number of rows that have a value that’s less than or equal to the current row value divided by the number of all rows within the partition.

The example below will help clear up this concept:

SELECT 
Maker, 
ProductionYear, 
Country, 
TotalCountrySales,
CUME_DIST() 
OVER (PARTITION BY ProductionYear ORDER BY TotalCountrySales) as CumeDistValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;
MakerProductionYearCountryTotalCountrySalesCumeDistValue
Citroen20153800.25
Ford Motor Company20151900.5
Volkswagen20152951
BMW20152951
Nissan Motors20174600.4
Honda20174600.4
Fiat20175750.6
Daimler Benz20172951
Opel20172951
Toyota20194600.25
Peugeot20193800.75
Renault20193800.75
Volvo20196851

Let’s analyze the first partition as an example. It includes TotalCountrySales column values of 80, 90, 95, and 95. The CumeDistValue column values of the first four rows can be explained as follows:

  • The first row has a TotalCountrySales column value of 80. This is the smallest TotalCountrySales value in the current partition; hence, 1/4=0.25.
  • The second row has a TotalCountrySales value of 90. There are two values less than or equal to 90; hence, 2/4=0.5.
  • The third and fourth rows have a TotalCountrySales value of 95. This is the highest in this partition, so 4/4=1.

The CUME_DIST window function returns the cumulative distribution value. It is used mostly for data analysis.

Going Deeper into MySQL Window Functions

SQL window functions are similar to aggregate functions, but with a major difference. Aggregate functions create one summary row for a group of rows, while window functions assign summary values to each row. MySQL window functions are very useful for thorough data analysis: they allow the user to analyze each data group without losing the details in each row.

This article was meant to be an introduction to MySQL window functions. To really learn them, you should check out the LearnSQL.com course on Window Functions. It’ll give you a detailed explanation of each function and lots of opportunities to practice all of them.