# A Comprehensive Introduction to Window Functions in MySQL

*Do a deep dive into the working principles, syntax, and applications of various MySQL window functions. *

There are many open source databases you could choose; MySQL is consistently one of the most popular. But until 2018, there was no provision for including window functions in MySQL. Fortunately, all that has changed – starting from MySQL 8.0, SQL window functions were now available to MySQL users.

**Please note** that window functions are available **only** in MySQL 8.0. As of the time of this writing, there is no plan to add SQL window functions to MySQL’s earlier releases.

Why is the advent of window functions in MySQL so important? They are a powerful data analysis and reporting tool. It’s essential to learn window functions, especially if you work with earlier releases of MySQL and are planning to upgrade to MySQL 8.0.

In this article, we’ll go through MySQL’s available window functions and explain their working principles. Next, we’ll compare window functions and aggregate functions. Finally, we’ll examine various sample queries (including their input and output tables) so you can see for yourself how window functions work.

## Overview of MySQL Window Functions

The table presents the available window functions in MySQL along with a brief explanation of each:

Window Function | Description |
---|---|

RANK() | Returns the rank of the current row within a defined partition. If one or more rows share the same ranking value, some rank numbers will be omitted from the sequence (e.g. if there are two rows tied for second rank, the rank sequence will be 1, 2, 2, 4…). |

DENSE_RANK() | Returns the rank of the current row within a defined partition. The difference between `DENSE_RANK()` and `RANK()` is that `DENSE_RANK()` does not skip any sequence numbers. (e.g. if there are two rows tied for second rank, the rank sequence will be 1, 2, 2, 3, 4…). |

ROW_NUMBER() | Returns the number of the current row within a defined partition. |

PERCENT_RANK() | Returns a number between 0 and 1 (both inclusive) that indicates the percentage of the current partition values that are less than the current row value of that particular partition. |

LEAD() | Looks up a subsequent row value for the specified column within the current partition. |

LAG() | Looks up a previous row value for a specified column within the current partition. |

FIRST_VALUE() | Returns the value of a specified column for the first row within the current partition. |

LAST_VALUE() | Returns the value of a specified column for the last row within the current partition. |

NTH_VALUE() | Returns the value of a specified column for the n^{th} row within the current partition, where n is defined by the user. |

NTILE() | Divides rows within the current partition into buckets. The number of buckets is specified by the user. The NTILE() function then assigns the number of the bucket to each row. |

CUME_DIST() | Returns the cumulative distribution of a value within a defined partition. |

For a quick overview of SQL window functions, please see our Window Functions Cheat Sheet.

## MySQL Window Function Syntax

Window function syntax is as follows:

The window function (red box) is declared first, followed by the OVER clause (blue box). The `OVER`

clause determines how the rows are arranged and then processed by the window function.

Inside the OVER clause are the `PARTITION BY`

and `ORDER BY`

clauses (green boxes). The optional `PARTITION BY`

clause divides window columns into groups (partitions), as you’ll see in subsequent examples. The `ORDER BY`

clause – also optional – orders the resulting table by the values in a user-selected column.

Depending on the particular window function you choose, you’ll use either PARTITION BY, ORDER BY, neither, or both.

Finally, there’s the closing OVER bracket (blue box).

Here’s an example of the above syntax at work:

Next, we’ll compare the window functions with the GROUP BY clause. This will help you get a better general understanding of the concept behind window functions.

## Window Functions vs. GROUP BY

To truly understand window functions, let’s briefly compare window functions and aggregate functions.

In aggregate functions, input table rows are summarized, so several (or many) rows are collapsed into one summary row. Let’s see an example. To do this, we'll need the table

:**CARS_SOLD**

SaleId | Car | Country | Sold |
---|---|---|---|

1 | Audi | Germany | 120 |

2 | Audi | USA | 110 |

3 | Audi | Japan | 100 |

4 | BMW | Germany | 250 |

5 | BMW | USA | 200 |

6 | BMW | Japan | 200 |

7 | Ford | Germany | 260 |

8 | Ford | USA | 300 |

9 | Ford | Japan | 200 |

When running this query, which uses GROUP BY with the aggregate function `SUM()`

, the number of rows in the result table is less than in the original input table. This is because the rows are treated as the operands in the addition operation. It returns one summary row for each car type.

SELECT Car, SUM(Sold) FROM CARS_SOLD GROUP BY Car;

And the result table:

Car | SUM(Sold) |
---|---|

Audi | 330 |

BMW | 650 |

Ford | 760 |

On the other hand, the number of rows in the input and output tables is the same when we use a window function:

SELECT Car, Country, Sold, SUM(Sold) OVER (PARTITION BY Car) as SoldBrand FROM CARS_SOLD;

The result is:

Car | Country | Sold | SoldBrand |
---|---|---|---|

Audi | Germany | 120 | 330 |

Audi | USA | 110 | 330 |

Audi | Japan | 100 | 330 |

BMW | Germany | 250 | 650 |

BMW | USA | 200 | 650 |

BMW | Japan | 200 | 650 |

Ford | Germany | 260 | 760 |

Ford | USA | 300 | 760 |

Ford | Japan | 200 | 760 |

The summary is returned in the `SoldBrand`

column, where the total value of sold cars per brand is presented. Thus, the returned number of records equals the total number of rows in the input table.

You may notice that the `PARTITION BY`

clause in this example plays the same role as the `GROUP BY`

clause in the previous query: it groups rows that have the same values in the `Car`

column.

For more information on the value of window functions, see our article Why Should I Learn SQL Window Functions? or check out our course on Window Functions. Be sure to read our article SQL Course of the Month - Window Functions first to find out what to expect from the course.

Now that we’ve seen the difference between window and aggregate functions, let’s start exploring MySQL window functions.

## Learn MySQL Window Functions by Example

This section will introduce you to the power of MySQL window functions. We’ll examine each function on its own and explain what it does. The tables used throughout the examples are presented below.

Table **CONTINENTS**

ContId | Continent |
---|---|

1 | America |

2 | Europe |

3 | Asia |

Table **COUNTRIES**

CountryId | CountryName | Continent | TotalCountrySales |
---|---|---|---|

1 | USA | 1 | 90 |

2 | Germany | 2 | 95 |

3 | France | 2 | 80 |

4 | Japan | 3 | 60 |

5 | Italy | 2 | 75 |

6 | Sweden | 2 | 85 |

Table **CAR_MAKERS**

Id | Maker | ProductionYear | Country | Price |
---|---|---|---|---|

2 | Volkswagen | 2015 | 2 | 70000 |

3 | BMW | 2015 | 2 | 100000 |

5 | Ford Motor Company | 2015 | 1 | 80000 |

7 | Citroen | 2015 | 3 | 50000 |

8 | Nissan Motors | 2017 | 4 | 50000 |

9 | Fiat | 2017 | 5 | 40000 |

11 | Honda | 2017 | 4 | 40000 |

13 | Daimler Benz | 2017 | 2 | 110000 |

14 | Opel | 2017 | 2 | 65000 |

15 | Peugeot | 2019 | 3 | 55000 |

16 | Renault | 2019 | 3 | 60000 |

19 | Toyota | 2019 | 4 | 45000 |

21 | Volvo | 2019 | 6 | 75000 |

### Example 1: The Ranking Window Functions

Let’s look at some examples presenting the ranking window functions.

*RANK() Window Function*

The `RANK()`

window function assigns a ranking value to each row within the defined partition. It reinitializes the rank to start from 1 when the partition is switched.

`RANK()`

skips sequence numbers if the row value is repeated, i.e. the same rank is given to rows with the same values.

It’s easier to understand the working principle of `RANK()`

when we look at the example below.

SELECT Maker, Price, RANK() OVER (ORDER BY Price) as RankValue FROM CAR_MAKERS;

Maker | Price | RankValue |
---|---|---|

Fiat | 40000 | 1 |

Honda | 40000 | 1 |

Toyota | 45000 | 3 |

Citroen | 50000 | 4 |

Nissan Motors | 50000 | 4 |

Peugeot | 55000 | 6 |

Renault | 60000 | 7 |

Opel | 65000 | 8 |

Volkswagen | 70000 | 9 |

Volvo | 75000 | 10 |

Ford Motor Company | 80000 | 11 |

BMW | 100000 | 12 |

Daimler Benz | 110000 | 13 |

In this example, we are ordering the result using the `Price`

column and treating all table rows as one partition. The first row is assigned to the rank value 1. In the second row, the rank value is 1 again because the `Price`

column value is the same for the first and second row.

In the third row, the rank value is 3 (because of the skipped number due to the repeated `Price`

column values); the value is 3 because this is the third row. In the fourth row, the rank value is 4 as expected. Once again, two rows share the same `Price`

value, and thus the rank 4 is repeated. This principle is applied throughout all the rows in the example.

Next, let’s look at an example of the `RANK()`

window function that partitions the result by the `ProductionYear`

column and then orders each partition according to the `Country`

column:

SELECT Maker, ProductionYear, Price, RANK() OVER (PARTITION BY ProductionYear ORDER BY Price) as RankValue FROM CAR_MAKERS;

Maker | ProductionYear | Price | RankValue |
---|---|---|---|

Citroen | 2015 | 50000 | 1 |

Volkswagen | 2015 | 70000 | 2 |

Ford Motor Company | 2015 | 80000 | 3 |

BMW | 2015 | 100000 | 4 |

Fiat | 2017 | 40000 | 1 |

Honda | 2017 | 40000 | 1 |

Nissan Motors | 2017 | 50000 | 3 |

Opel | 2017 | 65000 | 4 |

Daimler Benz | 2017 | 110000 | 5 |

Toyota | 2019 | 45000 | 1 |

Peugeot | 2019 | 55000 | 2 |

Renault | 2019 | 60000 | 3 |

Volvo | 2019 | 75000 | 4 |

We placed the `PARTITION BY`

clause inside the `OVER()`

clause. This divides the result into 3 groups:

- Group 1 with a ProductionYear of 2015.
- Group 2 with a ProductionYear of 2017.
- Group 3 with a ProductionYear of 2019.

Similar partitioning by the `ProductionYear`

column takes place in the upcoming examples of this article.

In this example, each group (partition) has its rank values started from 1. The rank values are assigned using the same working principle as described in the previous example.

*DENSE_RANK() Window Function*

The `DENSE_RANK()`

window function is very similar to the `RANK()`

function. The only difference is that it does not skip any numbers in the rank sequence.

Let’s examine the example below to see the difference:

SELECT Maker, ProductionYear, Price, DENSE_RANK() OVER (PARTITION BY ProductionYear ORDER BY Price) as DenseRankValue FROM CAR_MAKERS;

Maker | ProductionYear | Price | DenseRankValue |
---|---|---|---|

Citroen | 2015 | 50000 | 1 |

Volkswagen | 2015 | 70000 | 2 |

Ford Motor Company | 2015 | 80000 | 3 |

BMW | 2015 | 100000 | 4 |

Fiat | 2017 | 40000 | 1 |

Honda | 2017 | 40000 | 1 |

Nissan Motors | 2017 | 50000 | 2 |

Opel | 2017 | 65000 | 3 |

Daimler Benz | 2017 | 110000 | 4 |

Toyota | 2019 | 45000 | 1 |

Peugeot | 2019 | 55000 | 2 |

Renault | 2019 | 60000 | 3 |

Volvo | 2019 | 75000 | 4 |

In the seventh row of the above output table, the `DenseRankValue`

column value is 2 – although it is in the third row of its partition. The `DENSE_RANK()`

function does not skip the rank value 2 , which was not used in the second row due to row value repetitions. On the other hand, `RANK()`

would use the rank value 3 in the third row of the red partition.

*ROW_NUMBER() Window Function*

The `ROW_NUMBER()`

window function does exactly what its name says. It assigns a row number to each record within the partition; it reinitializes row numbers to start from 1 when the partition is switched.

First, let’s look at an example that does not use the `PARTITION BY`

clause within the `OVER()`

clause. Please note that although the `OVER()`

clause is empty, you still have to define it with the window function:

SELECT CountryName, Continent, ROW_NUMBER() OVER () as RowNumberValue FROM COUNTRIES;

CountryName | Continent | RowNumberValue |
---|---|---|

USA | 1 | 1 |

Germany | 2 | 2 |

France | 2 | 3 |

Japan | 3 | 4 |

Italy | 2 | 5 |

Sweden | 2 | 6 |

As we do not partition the result set, the output is very straightforward. The rows are numbered from 1 to 6.

Once we partition the output table, there will be more than one set of row numbers. Please note that the `ORDER BY`

clause is optional in both of these examples.

SELECT Maker, ProductionYear, Price, ROW_NUMBER() OVER (PARTITION BY ProductionYear ORDER BY Price) as RowNumberValue FROM CAR_MAKERS;

Maker | ProductionYear | Price | RowNumberValue |
---|---|---|---|

Citroen | 2015 | 50000 | 1 |

Volkswagen | 2015 | 70000 | 2 |

Ford Motor Company | 2015 | 80000 | 3 |

BMW | 2015 | 100000 | 4 |

Fiat | 2017 | 40000 | 1 |

Honda | 2017 | 40000 | 2 |

Nissan Motors | 2017 | 50000 | 3 |

Opel | 2017 | 65000 | 4 |

Daimler Benz | 2017 | 110000 | 5 |

Toyota | 2019 | 45000 | 1 |

Peugeot | 2019 | 55000 | 2 |

Renault | 2019 | 60000 | 3 |

Volvo | 2019 | 75000 | 4 |

Notice that rows are numbered from 1 for each of the partitioned groups.

*PERCENT_RANK() Window Function*

The `PERCENT_RANK()`

window function returns the value from 0 to 1 (both inclusive), which indicates the percentage of current partition rows with a value less than the current row value. This will become clearer with an example. Have a look:

SELECT Maker, ProductionYear, Price, PERCENT_RANK() OVER (PARTITION BY ProductionYear ORDER BY Price) as PercentValue FROM CAR_MAKERS;

Maker | ProductionYear | Price | PercentValue |
---|---|---|---|

Citroen | 2015 | 50000 | 0 |

Volkswagen | 2015 | 70000 | 0.3333333333333333 |

Ford Motor Company | 2015 | 80000 | 0.6666666666666666 |

BMW | 2015 | 100000 | 1 |

Fiat | 2017 | 40000 | 0 |

Honda | 2017 | 40000 | 0 |

Nissan Motors | 2017 | 50000 | 0.5 |

Opel | 2017 | 65000 | 0.75 |

Daimler Benz | 2017 | 110000 | 1 |

Toyota | 2019 | 45000 | 0 |

Peugeot | 2019 | 55000 | 0.3333333333333333 |

Renault | 2019 | 60000 | 0.6666666666666666 |

Volvo | 2019 | 75000 | 1 |

Let’s go through the output table starting from the first row. In the first partition, there is no `Price`

column value that is smaller than 50000. Hence, the `PercentValue`

column value is 0, i.e. 0% of the current partition’s `Price`

values are smaller than the current row’s `Price`

value.

In the case of the second row, 1/3 of the `Price`

values in the current partition are smaller than 70000. So, the `PercentValue`

value of this row is 0.333(or 33%).

In the last row of this partition, we see that all the `Price`

values are smaller than 100000. Thus, the `PercentValue`

value is 1, i.e. 100% of the current partition’s `Price`

values are smaller than the current `Price`

value.

You can apply the same working principle to go through the remaining two partitions of the above example.

Want to know more about ranking window functions? The article Overview of Ranking Functions in SQL will help you out.

### Example 2: The Value Window Functions

Below are some usage examples of MySQL’s value window functions, which are very useful for data analysis.

*LEAD() Window Function*

The `LEAD()`

window function allows us to look up the values of subsequent rows in the current partition. It is commonly used to calculate the difference between current and following row values.

First, let’s look at the example that simply uses the `LEAD()`

function to output the value of the row after the current one:

SELECT Maker, ProductionYear, Price, TotalCountrySales, LEAD(TotalCountrySales, 1, 0) OVER (PARTITION BY ProductionYear ORDER BY Price) as LeadValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;

Maker | ProductionYear | Price | TotalCountrySales | LeadValue |
---|---|---|---|---|

Citroen | 2015 | 50000 | 80 | 95 |

Volkswagen | 2015 | 70000 | 95 | 90 |

Ford Motor Company | 2015 | 80000 | 90 | 95 |

BMW | 2015 | 100000 | 95 | 0 |

Honda | 2017 | 40000 | 60 | 75 |

Fiat | 2017 | 40000 | 75 | 60 |

Nissan Motors | 2017 | 50000 | 60 | 95 |

Opel | 2017 | 65000 | 95 | 95 |

Daimler Benz | 2017 | 110000 | 95 | 0 |

Toyota | 2019 | 45000 | 60 | 80 |

Peugeot | 2019 | 55000 | 80 | 80 |

Renault | 2019 | 60000 | 80 | 85 |

Volvo | 2019 | 75000 | 85 | 0 |

The `LEAD()`

window function takes three arguments:

- The column value to be returned – in this case, it is the
`TotalCountrySales`

column. - The row number (relative to the current row) to be looked up – here,
`LEAD()`

looks up the value of the row immediately after the current row. - A default value that’s returned if there is no subsequent row – in the current example, the
`LeadValue`

in the fourth row is 0 because there are no more rows in this partition.

In the output, the first row has a `LeadValue`

equal to 95 – the `TotalCountrySales`

of the second row. The second row has a `LeadValue`

of 90, the `TotalCountrySales`

of the third row, and so on. As I already mentioned, the `LeadValue`

of the fourth row is 0 (the default value we defined); although there are subsequent rows in the table, the fourth row is the last row in the current partition.

This process is repeated for each partition of the resulted table.

Now let’s try to calculate the `TotalCountrySales`

difference between the current and subsequent rows.

SELECT Maker, ProductionYear, Price, TotalCountrySales, LEAD(TotalCountrySales , 1, 0) OVER (PARTITION BY ProductionYear ORDER BY Price) as LeadValue, TotalCountrySales - (LEAD(TotalCountrySales , 1, 0) OVER (PARTITION BY ProductionYear ORDER BY Price)) as Difference FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;

Maker | ProductionYear | Country | TotalCountrySales | LeadValue | Difference |
---|---|---|---|---|---|

Citroen | 2015 | 50000 | 80 | 95 | -15 |

Volkswagen | 2015 | 70000 | 95 | 90 | 5 |

Ford Motor Company | 2015 | 80000 | 90 | 95 | -5 |

BMW | 2015 | 100000 | 95 | 0 | 95 |

Fiat | 2017 | 40000 | 75 | 60 | 15 |

Honda | 2017 | 40000 | 60 | 60 | 0 |

Nissan Motors | 2017 | 50000 | 60 | 95 | -35 |

Opel | 2017 | 65000 | 95 | 95 | 0 |

Daimler Benz | 2017 | 110000 | 95 | 0 | 95 |

Toyota | 2019 | 45000 | 60 | 80 | -20 |

Peugeot | 2019 | 55000 | 80 | 80 | 0 |

Renault | 2019 | 60000 | 80 | 85 | -5 |

Volvo | 2019 | 75000 | 85 | 0 | 85 |

Except for the `Difference`

column, this example is similar to the previous one. This column stores the difference between the `TotalCountrySales`

value of the current row (column `TotalCountrySales`

) and the `TotalCountrySales`

value of the next row (column `LeadValue`

).

Please note that the `LEAD()`

window function is not limited to getting the value from the immediate next row. Let’s see an example where `LEAD()`

is used to get a value from the third row after the current row. In this example, we do not partition the result table.

SELECT Maker, ProductionYear, Price, CountrySalesIndicator, LEAD(TotalCountrySales, 3) OVER (ORDER BY Price) as LeadValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;

Maker | ProductionYear | Price | TotalCountrySales | LeadValue |
---|---|---|---|---|

Fiat | 2017 | 40000 | 75 | 80 |

Honda | 2017 | 40000 | 60 | 60 |

Toyota | 2019 | 45000 | 60 | 80 |

Citroen | 2015 | 50000 | 80 | 80 |

Nissan Motors | 2017 | 50000 | 60 | 95 |

Peugeot | 2019 | 55000 | 80 | 95 |

Renault | 2019 | 60000 | 80 | 85 |

Opel | 2017 | 65000 | 95 | 90 |

Volkswagen | 2015 | 70000 | 95 | 95 |

Volvo | 2019 | 75000 | 85 | 95 |

Ford Motor Company | 2015 | 80000 | 90 | 80 |

BMW | 2015 | 100000 | 95 | 85 |

Daimler Benz | 2017 | 110000 | 95 | 0 |

Starting from the first row of the result table, each row gets a value of the 3rd row following it. Please note that the last 3 rows values are `null`

because there is no value for the 3rd row following them.

*LAG() Window Function*

The `LAG()`

window function is the opposite of the `LEAD()`

function. It returns the value from a **previous** row.

Let’s see it in action. The following example returns a `LagValue`

and calculates the difference between the current row and the previous row:

SELECT Maker, ProductionYear, Price, TotalCountrySales, LAG(TotalCountrySales, 1, 0) OVER (PARTITION BY ProductionYear ORDER BY Price) as LagValue, TotalCountrySales - (LAG(TotalCountrySales, 1, 0) OVER (PARTITION BY ProductionYear ORDER BY Price)) as Difference FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;

Maker | ProductionYear | Price | TotalCountrySales | LagValue | Difference |
---|---|---|---|---|---|

Citroen | 2015 | 50000 | 80 | 0 | 80 |

Volkswagen | 2015 | 70000 | 95 | 80 | 15 |

Ford Motor Company | 2015 | 80000 | 90 | 95 | -5 |

BMW | 2015 | 100000 | 95 | 90 | 5 |

Fiat | 2017 | 40000 | 75 | 0 | 75 |

Honda | 2017 | 40000 | 60 | 75 | -15 |

Nissan Motors | 2017 | 50000 | 60 | 60 | 0 |

Opel | 2017 | 65000 | 95 | 60 | 35 |

Daimler Benz | 2017 | 110000 | 95 | 95 | 0 |

Toyota | 2019 | 45000 | 60 | 0 | 60 |

Peugeot | 2019 | 55000 | 80 | 60 | 20 |

Renault | 2019 | 60000 | 80 | 80 | 0 |

Volvo | 2019 | 75000 | 85 | 80 | 5 |

The example should speak for itself, as it is analogous to the example used for `LEAD()`

. Here the values are looked up from the previous row; this is opposite of the `LEAD()`

function, where the values are looked up from the subsequent row.

Like `LEAD()`

, the `LAG()`

window function can fetch the value of any row preceding it – not just the immediately preceding row.

*FIRST_VALUE() Window Functions*

This window function outputs the first value of the current partition. This will be clear after looking at the example below.

SELECT Maker, ProductionYear, Price, TotalCountrySales, FIRST_VALUE(TotalCountrySales) OVER (PARTITION BY ProductionYear ORDER BY Price) as FirstValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;

Maker | ProductionYear | Price | TotalCountrySales | FirstValue |
---|---|---|---|---|

Citroen | 2015 | 50000 | 80 | 80 |

Volkswagen | 2015 | 70000 | 95 | 80 |

Ford Motor Company | 2015 | 80000 | 90 | 80 |

BMW | 2015 | 100000 | 95 | 80 |

Fiat | 2017 | 40000 | 75 | 75 |

Honda | 2017 | 40000 | 60 | 75 |

Nissan Motors | 2017 | 50000 | 60 | 75 |

Opel | 2017 | 65000 | 95 | 75 |

Daimler Benz | 2017 | 110000 | 95 | 75 |

Toyota | 2019 | 45000 | 60 | 60 |

Peugeot | 2019 | 55000 | 80 | 60 |

Renault | 2019 | 60000 | 80 | 60 |

Volvo | 2019 | 75000 | 85 | 60 |

Let’s analyze the first partition in this output table. The `FirstValue`

column contains an 80 for all rows; this is the `TotalCountrySales`

column value of the first row in this partition.

*Example 3: Using Aggregate Functions with OVER() *

Our next example demonstrates the usage of the aggregate functions `SUM()`

, `COUNT()`

, and `AVG()`

with the `OVER()`

clause. Using these functions with `OVER()`

turns them into window functions.

Please note that the `OVER()`

clause is mandatory for all window functions, regardless of whether it is empty or not.

SELECT Maker, ProductionYear, Country, TotalCountrySales, SUM(TotalCountrySales) OVER (PARTITION BY ProductionYear) as SumValue, COUNT(TotalCountrySales) OVER (PARTITION BY ProductionYear) as CountValue, AVG(TotalCountrySales) OVER (PARTITION BY ProductionYear) as AvgValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;

Maker | ProductionYear | Country | TotalCountrySales | SumValue | CountValue | AvgValue |
---|---|---|---|---|---|---|

Volkswagen | 2015 | 2 | 95 | 360 | 4 | 90.0000 |

BMW | 2015 | 2 | 95 | 360 | 4 | 90.0000 |

Ford Motor Company | 2015 | 1 | 90 | 360 | 4 | 90.0000 |

Citroen | 2015 | 3 | 80 | 360 | 4 | 90.0000 |

Nissan Motors | 2017 | 4 | 60 | 385 | 5 | 77.0000 |

Fiat | 2017 | 5 | 75 | 385 | 5 | 77.0000 |

Honda | 2017 | 4 | 60 | 385 | 5 | 77.0000 |

Daimler Benz | 2017 | 2 | 95 | 385 | 5 | 77.0000 |

Opel | 2017 | 2 | 95 | 385 | 5 | 77.0000 |

Peugeot | 2019 | 3 | 80 | 305 | 4 | 76.2500 |

Renault | 2019 | 3 | 80 | 305 | 4 | 76.2500 |

Toyota | 2019 | 4 | 60 | 305 | 4 | 76.2500 |

Volvo | 2019 | 6 | 85 | 305 | 4 | 76.2500 |

In the above example, the sum, count, and average values are calculated using the values in the current partition. It is also worth mentioning that they are calculated for each partition separately.

Let’s analyze the first partition to see what’s going on:

- The
`SumValue`

column stores the sum of all`TotalCountrySales`

values in the current partition (i.e. 95+95+90+80=360). - The
`CountValue`

column stores the number of records in the current partition (i.e. 4). - The
`AvgValue`

column stores the average of`TotalCountrySales`

column values in the current partition (i.e. (95+95+90+80)/4=90).

A good example of ‘aggregate function + `OVER()`

clause’ usage is the SQL running total. If you want to learn more, please see the article What Is a SQL Running Total and How Do You Compute It?.

### Example 4: NTILE() and CUME_DIST() Window Functions

*The NTILE() Window Function*

The `NTILE()`

window function divides partition rows into buckets. It takes the parameter *n* that the user places in the function brackets and creates that number of buckets. Thus, *n* groups/buckets are created for each partition and the appropriate bucket number is assigned to each row.

Like a few of the other concepts we’ve talked about, this will be clearer after you see an example. Suppose you want to divide car makers into two groups for each production year. This is the query you’d use:

SELECT Maker, ProductionYear, NTILE(2) OVER (PARTITION BY ProductionYear ORDER BY Maker) as NtileValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;

Maker | ProductionYear | NtileValue |
---|---|---|

BMW | 2015 | 1 |

Citroen | 2015 | 1 |

Ford Motor Company | 2015 | 2 |

Volkswagen | 2015 | 2 |

Daimler Benz | 2017 | 1 |

Fiat | 2017 | 1 |

Honda | 2017 | 1 |

Nissan Motors | 2017 | 2 |

Opel | 2017 | 2 |

Peugeot | 2019 | 1 |

Renault | 2019 | 1 |

Toyota | 2019 | 2 |

Volvo | 2019 | 2 |

The column `NtileValue`

contains the bucket number – which is either 1 or 2 – for each row in each partition.

The `NTILE`

window function is used to divide the result table rows into groups. In the example above, we partition the result table into 3 partitions and each partition’s rows are divided into 2 groups because the query uses `NTILE(2)`

.

*The CUME_DIST() Window Function*

The `CUME_DIST()`

window function – as its name indicates – calculates the cumulative distribution value for each row in a partition.

Cumulative distribution is the number of rows that have a value that’s less than or equal to the current row value divided by the number of all rows within the partition.

The example below will help clear up this concept:

SELECT Maker, ProductionYear, Country, TotalCountrySales, CUME_DIST() OVER (PARTITION BY ProductionYear ORDER BY TotalCountrySales) as CumeDistValue FROM CAR_MAKERS cm JOIN COUNTRIES c ON cm.Country = c.CountryID;

Maker | ProductionYear | Country | TotalCountrySales | CumeDistValue |
---|---|---|---|---|

Citroen | 2015 | 3 | 80 | 0.25 |

Ford Motor Company | 2015 | 1 | 90 | 0.5 |

Volkswagen | 2015 | 2 | 95 | 1 |

BMW | 2015 | 2 | 95 | 1 |

Nissan Motors | 2017 | 4 | 60 | 0.4 |

Honda | 2017 | 4 | 60 | 0.4 |

Fiat | 2017 | 5 | 75 | 0.6 |

Daimler Benz | 2017 | 2 | 95 | 1 |

Opel | 2017 | 2 | 95 | 1 |

Toyota | 2019 | 4 | 60 | 0.25 |

Peugeot | 2019 | 3 | 80 | 0.75 |

Renault | 2019 | 3 | 80 | 0.75 |

Volvo | 2019 | 6 | 85 | 1 |

Let’s analyze the first partition as an example. It includes `TotalCountrySales`

column values of 80, 90, 95, and 95. The `CumeDistValue`

column values of the first four rows can be explained as follows:

- The first row has a
`TotalCountrySales`

column value of 80. This is the smallest`TotalCountrySales`

value in the current partition; hence, 1/4=0.25. - The second row has a
`TotalCountrySales`

value of 90. There are two values less than or equal to 90; hence, 2/4=0.5. - The third and fourth rows have a
`TotalCountrySales`

value of 95. This is the highest in this partition, so 4/4=1.

The `CUME_DIST`

window function returns the cumulative distribution value. It is used mostly for data analysis.

## Going Deeper into MySQL Window Functions

SQL window functions are similar to aggregate functions, but with a major difference. Aggregate functions create one summary row for a group of rows, while window functions assign summary values to each row. MySQL window functions are very useful for thorough data analysis: they allow the user to analyze each data group without losing the details in each row.

This article was meant to be an introduction to MySQL window functions. To really learn them, you should check out the LearnSQL.com course on Window Functions. It’ll give you a detailed explanation of each function and lots of opportunities to practice all of them.