2nd Oct 2020 17 minutes read

A Comprehensive Introduction to Window Functions in MySQL

window functions

Table of Contents

Overview of MySQL Window Functions
MySQL Window Function Syntax
Window Functions vs. GROUP BY
Learn MySQL Window Functions by Example
Going Deeper into MySQL Window Functions

Do a deep dive into the working principles, syntax, and applications of various MySQL window functions.

There are many open source databases you could choose; MySQL is consistently one of the most popular. But until 2018, there was no provision for including window functions in MySQL. Fortunately, all that has changed – starting from MySQL 8.0, SQL window functions were now available to MySQL users.

Please note that window functions are available only in MySQL 8.0. As of the time of this writing, there is no plan to add SQL window functions to MySQL’s earlier releases.

Why is the advent of window functions in MySQL so important? They are a powerful data analysis and reporting tool. It’s essential to learn window functions, especially if you work with earlier releases of MySQL and are planning to upgrade to MySQL 8.0.

In this article, we’ll go through MySQL’s available window functions and explain their working principles. Next, we’ll compare window functions and aggregate functions. Finally, we’ll examine various sample queries (including their input and output tables) so you can see for yourself how window functions work.

Overview of MySQL Window Functions

The table presents the available window functions in MySQL along with a brief explanation of each:

Window Function	Description
RANK()	Returns the rank of the current row within a defined partition. If one or more rows share the same ranking value, some rank numbers will be omitted from the sequence (e.g. if there are two rows tied for second rank, the rank sequence will be 1, 2, 2, 4…).
DENSE_RANK()	Returns the rank of the current row within a defined partition. The difference between `DENSE_RANK()` and `RANK()` is that `DENSE_RANK()` does not skip any sequence numbers. (e.g. if there are two rows tied for second rank, the rank sequence will be 1, 2, 2, 3, 4…).
ROW_NUMBER()	Returns the number of the current row within a defined partition.
PERCENT_RANK()	Returns a number between 0 and 1 (both inclusive) that indicates the percentage of the current partition values that are less than the current row value of that particular partition.
LEAD()	Looks up a subsequent row value for the specified column within the current partition.
LAG()	Looks up a previous row value for a specified column within the current partition.
FIRST_VALUE()	Returns the value of a specified column for the first row within the current partition.
LAST_VALUE()	Returns the value of a specified column for the last row within the current partition.
NTH_VALUE()	Returns the value of a specified column for the n^th row within the current partition, where n is defined by the user.
NTILE()	Divides rows within the current partition into buckets. The number of buckets is specified by the user. The NTILE() function then assigns the number of the bucket to each row.
CUME_DIST()	Returns the cumulative distribution of a value within a defined partition.

For a quick overview of SQL window functions, please see our Window Functions Cheat Sheet.

MySQL Window Function Syntax

Window function syntax is as follows:

The window function (red box) is declared first, followed by the OVER clause (blue box). The OVER clause determines how the rows are arranged and then processed by the window function.

Inside the OVER clause are the PARTITION BY and ORDER BY clauses (green boxes). The optional PARTITION BY clause divides window columns into groups (partitions), as you’ll see in subsequent examples. The ORDER BY clause – also optional – orders the resulting table by the values in a user-selected column.

Depending on the particular window function you choose, you’ll use either PARTITION BY, ORDER BY, neither, or both.

Finally, there’s the closing OVER bracket (blue box).

Here’s an example of the above syntax at work:

Next, we’ll compare the window functions with the GROUP BY clause. This will help you get a better general understanding of the concept behind window functions.

Window Functions vs. GROUP BY

To truly understand window functions, let’s briefly compare window functions and aggregate functions.

In aggregate functions, input table rows are summarized, so several (or many) rows are collapsed into one summary row. Let’s see an example. To do this, we'll need the table CARS_SOLD:

SaleId	Car	Country	Sold
1	Audi	Germany	120
2	Audi	USA	110
3	Audi	Japan	100
4	BMW	Germany	250
5	BMW	USA	200
6	BMW	Japan	200
7	Ford	Germany	260
8	Ford	USA	300
9	Ford	Japan	200

When running this query, which uses GROUP BY with the aggregate function SUM(), the number of rows in the result table is less than in the original input table. This is because the rows are treated as the operands in the addition operation. It returns one summary row for each car type.

SELECT 
Car, 
SUM(Sold)
FROM CARS_SOLD
GROUP BY Car;

And the result table:

Car	SUM(Sold)
Audi	330
BMW	650
Ford	760

On the other hand, the number of rows in the input and output tables is the same when we use a window function:

SELECT 
Car, 
Country, 
Sold, 
SUM(Sold) 
OVER (PARTITION BY Car) as SoldBrand 
FROM CARS_SOLD;

The result is:

Car	Country	Sold	SoldBrand
Audi	Germany	120	330
Audi	USA	110	330
Audi	Japan	100	330
BMW	Germany	250	650
BMW	USA	200	650
BMW	Japan	200	650
Ford	Germany	260	760
Ford	USA	300	760
Ford	Japan	200	760

The summary is returned in the SoldBrand column, where the total value of sold cars per brand is presented. Thus, the returned number of records equals the total number of rows in the input table.

You may notice that the PARTITION BY clause in this example plays the same role as the GROUP BY clause in the previous query: it groups rows that have the same values in the Car column.

For more information on the value of window functions, see our article Why Should I Learn SQL Window Functions? or check out our course on Window Functions. Be sure to read our article SQL Course of the Month - Window Functions first to find out what to expect from the course.

Now that we’ve seen the difference between window and aggregate functions, let’s start exploring MySQL window functions.

Learn MySQL Window Functions by Example

This section will introduce you to the power of MySQL window functions. We’ll examine each function on its own and explain what it does. The tables used throughout the examples are presented below.

Table CONTINENTS

ContId	Continent
1	America
2	Europe
3	Asia

Table COUNTRIES

CountryId	CountryName	Continent	TotalCountrySales
1	USA	1	90
2	Germany	2	95
3	France	2	80
4	Japan	3	60
5	Italy	2	75
6	Sweden	2	85

Table CAR_MAKERS

Id	Maker	ProductionYear	Country	Price
2	Volkswagen	2015	2	70000
3	BMW	2015	2	100000
5	Ford Motor Company	2015	1	80000
7	Citroen	2015	3	50000
8	Nissan Motors	2017	4	50000
9	Fiat	2017	5	40000
11	Honda	2017	4	40000
13	Daimler Benz	2017	2	110000
14	Opel	2017	2	65000
15	Peugeot	2019	3	55000
16	Renault	2019	3	60000
19	Toyota	2019	4	45000
21	Volvo	2019	6	75000

Example 1: The Ranking Window Functions

Let’s look at some examples presenting the ranking window functions.

RANK() Window Function

The RANK() window function assigns a ranking value to each row within the defined partition. It reinitializes the rank to start from 1 when the partition is switched.

RANK() skips sequence numbers if the row value is repeated, i.e. the same rank is given to rows with the same values.

It’s easier to understand the working principle of RANK() when we look at the example below.

SELECT 
Maker, 
Price, 
RANK() 
OVER (ORDER BY Price) as RankValue
FROM CAR_MAKERS;

Maker	Price	RankValue
Fiat	40000	1
Honda	40000	1
Toyota	45000	3
Citroen	50000	4
Nissan Motors	50000	4
Peugeot	55000	6
Renault	60000	7
Opel	65000	8
Volkswagen	70000	9
Volvo	75000	10
Ford Motor Company	80000	11
BMW	100000	12
Daimler Benz	110000	13

In this example, we are ordering the result using the Price column and treating all table rows as one partition. The first row is assigned to the rank value 1. In the second row, the rank value is 1 again because the Price column value is the same for the first and second row.

In the third row, the rank value is 3 (because of the skipped number due to the repeated Price column values); the value is 3 because this is the third row. In the fourth row, the rank value is 4 as expected. Once again, two rows share the same Price value, and thus the rank 4 is repeated. This principle is applied throughout all the rows in the example.

Next, let’s look at an example of the RANK() window function that partitions the result by the ProductionYear column and then orders each partition according to the Country column:

SELECT 
Maker, 
ProductionYear, 
Price, 
RANK() 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as RankValue
FROM CAR_MAKERS;

Maker	ProductionYear	Price	RankValue
Citroen	2015	50000	1
Volkswagen	2015	70000	2
Ford Motor Company	2015	80000	3
BMW	2015	100000	4
Fiat	2017	40000	1
Honda	2017	40000	1
Nissan Motors	2017	50000	3
Opel	2017	65000	4
Daimler Benz	2017	110000	5
Toyota	2019	45000	1
Peugeot	2019	55000	2
Renault	2019	60000	3
Volvo	2019	75000	4

We placed the PARTITION BY clause inside the OVER() clause. This divides the result into 3 groups:

Group 1 with a ProductionYear of 2015.
Group 2 with a ProductionYear of 2017.
Group 3 with a ProductionYear of 2019.

Similar partitioning by the ProductionYear column takes place in the upcoming examples of this article.

In this example, each group (partition) has its rank values started from 1. The rank values are assigned using the same working principle as described in the previous example.

DENSE_RANK() Window Function

The DENSE_RANK() window function is very similar to the RANK() function. The only difference is that it does not skip any numbers in the rank sequence.

Let’s examine the example below to see the difference:

SELECT 
Maker, 
ProductionYear, 
Price, 
DENSE_RANK() 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as DenseRankValue
FROM CAR_MAKERS;

Maker	ProductionYear	Price	DenseRankValue
Citroen	2015	50000	1
Volkswagen	2015	70000	2
Ford Motor Company	2015	80000	3
BMW	2015	100000	4
Fiat	2017	40000	1
Honda	2017	40000	1
Nissan Motors	2017	50000	2
Opel	2017	65000	3
Daimler Benz	2017	110000	4
Toyota	2019	45000	1
Peugeot	2019	55000	2
Renault	2019	60000	3
Volvo	2019	75000	4

In the seventh row of the above output table, the DenseRankValue column value is 2 – although it is in the third row of its partition. The DENSE_RANK() function does not skip the rank value 2 , which was not used in the second row due to row value repetitions. On the other hand, RANK() would use the rank value 3 in the third row of the red partition.

ROW_NUMBER() Window Function

The ROW_NUMBER() window function does exactly what its name says. It assigns a row number to each record within the partition; it reinitializes row numbers to start from 1 when the partition is switched.

First, let’s look at an example that does not use the PARTITION BY clause within the OVER() clause. Please note that although the OVER() clause is empty, you still have to define it with the window function:

SELECT 
CountryName, 
Continent,
ROW_NUMBER() OVER () as RowNumberValue
FROM COUNTRIES;

CountryName	Continent	RowNumberValue
USA	1	1
Germany	2	2
France	2	3
Japan	3	4
Italy	2	5
Sweden	2	6

As we do not partition the result set, the output is very straightforward. The rows are numbered from 1 to 6.

Once we partition the output table, there will be more than one set of row numbers. Please note that the ORDER BY clause is optional in both of these examples.

SELECT 
Maker, 
ProductionYear, 
Price, 
ROW_NUMBER() 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as RowNumberValue
FROM CAR_MAKERS;

Maker	ProductionYear	Price	RowNumberValue
Citroen	2015	50000	1
Volkswagen	2015	70000	2
Ford Motor Company	2015	80000	3
BMW	2015	100000	4
Fiat	2017	40000	1
Honda	2017	40000	2
Nissan Motors	2017	50000	3
Opel	2017	65000	4
Daimler Benz	2017	110000	5
Toyota	2019	45000	1
Peugeot	2019	55000	2
Renault	2019	60000	3
Volvo	2019	75000	4

Notice that rows are numbered from 1 for each of the partitioned groups.

PERCENT_RANK() Window Function

The PERCENT_RANK() window function returns the value from 0 to 1 (both inclusive), which indicates the percentage of current partition rows with a value less than the current row value. This will become clearer with an example. Have a look:

SELECT 
Maker, 
ProductionYear, 
Price, 
PERCENT_RANK() 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as PercentValue
FROM CAR_MAKERS;

Maker	ProductionYear	Price	PercentValue
Citroen	2015	50000	0
Volkswagen	2015	70000	0.3333333333333333
Ford Motor Company	2015	80000	0.6666666666666666
BMW	2015	100000	1
Fiat	2017	40000	0
Honda	2017	40000	0
Nissan Motors	2017	50000	0.5
Opel	2017	65000	0.75
Daimler Benz	2017	110000	1
Toyota	2019	45000	0
Peugeot	2019	55000	0.3333333333333333
Renault	2019	60000	0.6666666666666666
Volvo	2019	75000	1

Let’s go through the output table starting from the first row. In the first partition, there is no Price column value that is smaller than 50000. Hence, the PercentValue column value is 0, i.e. 0% of the current partition’s Price values are smaller than the current row’s Price value.

In the case of the second row, 1/3 of the Price values in the current partition are smaller than 70000. So, the PercentValue value of this row is 0.333(or 33%).

In the last row of this partition, we see that all the Price values are smaller than 100000. Thus, the PercentValue value is 1, i.e. 100% of the current partition’s Price values are smaller than the current Price value.

You can apply the same working principle to go through the remaining two partitions of the above example.

Want to know more about ranking window functions? The article Overview of Ranking Functions in SQL will help you out.

Example 2: The Value Window Functions

Below are some usage examples of MySQL’s value window functions, which are very useful for data analysis.

LEAD() Window Function

The LEAD() window function allows us to look up the values of subsequent rows in the current partition. It is commonly used to calculate the difference between current and following row values.

First, let’s look at the example that simply uses the LEAD() function to output the value of the row after the current one:

SELECT 
Maker, 
ProductionYear, 
Price, 
TotalCountrySales,
LEAD(TotalCountrySales, 1, 0) 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as LeadValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;

Maker	ProductionYear	Price	TotalCountrySales	LeadValue
Citroen	2015	50000	80	95
Volkswagen	2015	70000	95	90
Ford Motor Company	2015	80000	90	95
BMW	2015	100000	95	0
Honda	2017	40000	60	75
Fiat	2017	40000	75	60
Nissan Motors	2017	50000	60	95
Opel	2017	65000	95	95
Daimler Benz	2017	110000	95	0
Toyota	2019	45000	60	80
Peugeot	2019	55000	80	80
Renault	2019	60000	80	85
Volvo	2019	75000	85	0

The LEAD() window function takes three arguments:

The column value to be returned – in this case, it is the TotalCountrySales column.
The row number (relative to the current row) to be looked up – here, LEAD() looks up the value of the row immediately after the current row.
A default value that’s returned if there is no subsequent row – in the current example, the LeadValue in the fourth row is 0 because there are no more rows in this partition.

In the output, the first row has a LeadValue equal to 95 – the TotalCountrySales of the second row. The second row has a LeadValue of 90, the TotalCountrySales of the third row, and so on. As I already mentioned, the LeadValue of the fourth row is 0 (the default value we defined); although there are subsequent rows in the table, the fourth row is the last row in the current partition.

This process is repeated for each partition of the resulted table.

Now let’s try to calculate the TotalCountrySales difference between the current and subsequent rows.

SELECT 
Maker, 
ProductionYear, 
Price, 
TotalCountrySales,
LEAD(TotalCountrySales , 1, 0) 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as LeadValue,
TotalCountrySales - (LEAD(TotalCountrySales , 1, 0) 
OVER (PARTITION BY ProductionYear ORDER BY Price)) 
as Difference
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;

Maker	ProductionYear	Country	TotalCountrySales	LeadValue	Difference
Citroen	2015	50000	80	95	-15
Volkswagen	2015	70000	95	90	5
Ford Motor Company	2015	80000	90	95	-5
BMW	2015	100000	95	0	95
Fiat	2017	40000	75	60	15
Honda	2017	40000	60	60	0
Nissan Motors	2017	50000	60	95	-35
Opel	2017	65000	95	95	0
Daimler Benz	2017	110000	95	0	95
Toyota	2019	45000	60	80	-20
Peugeot	2019	55000	80	80	0
Renault	2019	60000	80	85	-5
Volvo	2019	75000	85	0	85

Except for the Difference column, this example is similar to the previous one. This column stores the difference between the TotalCountrySales value of the current row (column TotalCountrySales) and the TotalCountrySales value of the next row (column LeadValue).

Please note that the LEAD() window function is not limited to getting the value from the immediate next row. Let’s see an example where LEAD() is used to get a value from the third row after the current row. In this example, we do not partition the result table.

SELECT 
Maker, 
ProductionYear, 
Price, 
CountrySalesIndicator,
LEAD(TotalCountrySales, 3) 
OVER (ORDER BY Price) 
as LeadValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;

Maker	ProductionYear	Price	TotalCountrySales	LeadValue
Fiat	2017	40000	75	80
Honda	2017	40000	60	60
Toyota	2019	45000	60	80
Citroen	2015	50000	80	80
Nissan Motors	2017	50000	60	95
Peugeot	2019	55000	80	95
Renault	2019	60000	80	85
Opel	2017	65000	95	90
Volkswagen	2015	70000	95	95
Volvo	2019	75000	85	95
Ford Motor Company	2015	80000	90	80
BMW	2015	100000	95	85
Daimler Benz	2017	110000	95	0

Starting from the first row of the result table, each row gets a value of the 3rd row following it. Please note that the last 3 rows values are null because there is no value for the 3rd row following them.

LAG() Window Function

The LAG() window function is the opposite of the LEAD() function. It returns the value from a previous row.

Let’s see it in action. The following example returns a LagValue and calculates the difference between the current row and the previous row:

SELECT 
Maker, 
ProductionYear, 
Price, 
TotalCountrySales,
LAG(TotalCountrySales, 1, 0) 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as LagValue,
TotalCountrySales - (LAG(TotalCountrySales, 1, 0) 
OVER (PARTITION BY ProductionYear ORDER BY Price)) 
as Difference
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;

Maker	ProductionYear	Price	TotalCountrySales	LagValue	Difference
Citroen	2015	50000	80	0	80
Volkswagen	2015	70000	95	80	15
Ford Motor Company	2015	80000	90	95	-5
BMW	2015	100000	95	90	5
Fiat	2017	40000	75	0	75
Honda	2017	40000	60	75	-15
Nissan Motors	2017	50000	60	60	0
Opel	2017	65000	95	60	35
Daimler Benz	2017	110000	95	95	0
Toyota	2019	45000	60	0	60
Peugeot	2019	55000	80	60	20
Renault	2019	60000	80	80	0
Volvo	2019	75000	85	80	5

The example should speak for itself, as it is analogous to the example used for LEAD(). Here the values are looked up from the previous row; this is opposite of the LEAD() function, where the values are looked up from the subsequent row.

Like LEAD(), the LAG() window function can fetch the value of any row preceding it – not just the immediately preceding row.

FIRST_VALUE() Window Functions

This window function outputs the first value of the current partition. This will be clear after looking at the example below.

SELECT 
Maker, 
ProductionYear, 
Price,
TotalCountrySales,
FIRST_VALUE(TotalCountrySales) 
OVER (PARTITION BY ProductionYear ORDER BY Price) 
as FirstValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;

Maker	ProductionYear	Price	TotalCountrySales	FirstValue
Citroen	2015	50000	80	80
Volkswagen	2015	70000	95	80
Ford Motor Company	2015	80000	90	80
BMW	2015	100000	95	80
Fiat	2017	40000	75	75
Honda	2017	40000	60	75
Nissan Motors	2017	50000	60	75
Opel	2017	65000	95	75
Daimler Benz	2017	110000	95	75
Toyota	2019	45000	60	60
Peugeot	2019	55000	80	60
Renault	2019	60000	80	60
Volvo	2019	75000	85	60

Let’s analyze the first partition in this output table. The FirstValue column contains an 80 for all rows; this is the TotalCountrySales column value of the first row in this partition.

Example 3: Using Aggregate Functions with OVER()

Our next example demonstrates the usage of the aggregate functions SUM(), COUNT(), and AVG() with the OVER() clause. Using these functions with OVER() turns them into window functions.

Please note that the OVER() clause is mandatory for all window functions, regardless of whether it is empty or not.

SELECT 
Maker, 
ProductionYear, 
Country, 
TotalCountrySales,
SUM(TotalCountrySales) 
OVER (PARTITION BY ProductionYear) as SumValue,
COUNT(TotalCountrySales) 
OVER (PARTITION BY ProductionYear) as CountValue,
AVG(TotalCountrySales) 
OVER (PARTITION BY ProductionYear) as AvgValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;

Maker	ProductionYear	Country	TotalCountrySales	SumValue	CountValue	AvgValue
Volkswagen	2015	2	95	360	4	90.0000
BMW	2015	2	95	360	4	90.0000
Ford Motor Company	2015	1	90	360	4	90.0000
Citroen	2015	3	80	360	4	90.0000
Nissan Motors	2017	4	60	385	5	77.0000
Fiat	2017	5	75	385	5	77.0000
Honda	2017	4	60	385	5	77.0000
Daimler Benz	2017	2	95	385	5	77.0000
Opel	2017	2	95	385	5	77.0000
Peugeot	2019	3	80	305	4	76.2500
Renault	2019	3	80	305	4	76.2500
Toyota	2019	4	60	305	4	76.2500
Volvo	2019	6	85	305	4	76.2500

In the above example, the sum, count, and average values are calculated using the values in the current partition. It is also worth mentioning that they are calculated for each partition separately.

Let’s analyze the first partition to see what’s going on:

The SumValue column stores the sum of all TotalCountrySales values in the current partition (i.e. 95+95+90+80=360).
The CountValue column stores the number of records in the current partition (i.e. 4).
The AvgValue column stores the average of TotalCountrySales column values in the current partition (i.e. (95+95+90+80)/4=90).

A good example of ‘aggregate function + OVER() clause’ usage is the SQL running total. If you want to learn more, please see the article What Is a SQL Running Total and How Do You Compute It?.

Example 4: NTILE() and CUME_DIST() Window Functions

The NTILE() Window Function

The NTILE() window function divides partition rows into buckets. It takes the parameter n that the user places in the function brackets and creates that number of buckets. Thus, n groups/buckets are created for each partition and the appropriate bucket number is assigned to each row.

Like a few of the other concepts we’ve talked about, this will be clearer after you see an example. Suppose you want to divide car makers into two groups for each production year. This is the query you’d use:

SELECT 
Maker, 
ProductionYear, 
NTILE(2) 
OVER (PARTITION BY ProductionYear ORDER BY Maker) 
as NtileValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;

Maker	ProductionYear	NtileValue
BMW	2015	1
Citroen	2015	1
Ford Motor Company	2015	2
Volkswagen	2015	2
Daimler Benz	2017	1
Fiat	2017	1
Honda	2017	1
Nissan Motors	2017	2
Opel	2017	2
Peugeot	2019	1
Renault	2019	1
Toyota	2019	2
Volvo	2019	2

The column NtileValue contains the bucket number – which is either 1 or 2 – for each row in each partition.

The NTILE window function is used to divide the result table rows into groups. In the example above, we partition the result table into 3 partitions and each partition’s rows are divided into 2 groups because the query uses NTILE(2).

The CUME_DIST() Window Function

The CUME_DIST() window function – as its name indicates – calculates the cumulative distribution value for each row in a partition.

Cumulative distribution is the number of rows that have a value that’s less than or equal to the current row value divided by the number of all rows within the partition.

The example below will help clear up this concept:

SELECT 
Maker, 
ProductionYear, 
Country, 
TotalCountrySales,
CUME_DIST() 
OVER (PARTITION BY ProductionYear ORDER BY TotalCountrySales) as CumeDistValue
FROM CAR_MAKERS cm JOIN COUNTRIES c 
ON cm.Country = c.CountryID;

Maker	ProductionYear	Country	TotalCountrySales	CumeDistValue
Citroen	2015	3	80	0.25
Ford Motor Company	2015	1	90	0.5
Volkswagen	2015	2	95	1
BMW	2015	2	95	1
Nissan Motors	2017	4	60	0.4
Honda	2017	4	60	0.4
Fiat	2017	5	75	0.6
Daimler Benz	2017	2	95	1
Opel	2017	2	95	1
Toyota	2019	4	60	0.25
Peugeot	2019	3	80	0.75
Renault	2019	3	80	0.75
Volvo	2019	6	85	1

Let’s analyze the first partition as an example. It includes TotalCountrySales column values of 80, 90, 95, and 95. The CumeDistValue column values of the first four rows can be explained as follows:

The first row has a TotalCountrySales column value of 80. This is the smallest TotalCountrySales value in the current partition; hence, 1/4=0.25.
The second row has a TotalCountrySales value of 90. There are two values less than or equal to 90; hence, 2/4=0.5.
The third and fourth rows have a TotalCountrySales value of 95. This is the highest in this partition, so 4/4=1.

The CUME_DIST window function returns the cumulative distribution value. It is used mostly for data analysis.

Going Deeper into MySQL Window Functions

SQL window functions are similar to aggregate functions, but with a major difference. Aggregate functions create one summary row for a group of rows, while window functions assign summary values to each row. MySQL window functions are very useful for thorough data analysis: they allow the user to analyze each data group without losing the details in each row.

This article was meant to be an introduction to MySQL window functions. To really learn them, you should check out the LearnSQL.com course on Window Functions. It’ll give you a detailed explanation of each function and lots of opportunities to practice all of them.

Tags:

window functions