*Window functions are one of the most powerful features in modern SQL. In this article, you will learn why window functions are so great. I will list you the most useful window functions and briefly explain when and how you can use them.*

Simple `SELECT`

statements, in combination with `WHERE`

, `GROUP BY`

, and `HAVING`

, are sufficient for many analyses at work. I used standard SQL for many years before I realized there is something more.

A few years ago, I started learning about a great “modern” feature: window functions. And I have been using them ever since. Once you master the syntax and realize how great window functions are, you will use them often.

In this article, I'm going to show you a list of window functions, why they’re so great, and how you can use them. Afterward, I will dive deep into examples.

## What Are Window Functions?

Window functions are a special feature in SQL that allows you to do calculations across a set of rows. They are also called `OVER`

functions or analytics functions. You might be thinking, “Hey, this is similar to a `GROUP BY`

statement.” Well, there is a huge difference between them.

When using a basic select-group, we do calculations and display them on a group level. The rows are collapsed and presented in the result set.

When using a window function, there is no collapsing. Each record gets its own calculation. For each row in a table, we define its window frame and do a specific calculation.

Let’s explain this with an example. Below is the

table that contains information about salesmen’s KPI (key performance indicator), a measure for success, each quarter: **kpi_new_clients**

salesman | period | new_clients |
---|---|---|

Olivia Smith | Q1 | 9 |

Olivia Smith | Q2 | 7 |

Olivia Smith | Q3 | 12 |

Olivia Smith | Q4 | 10 |

Lily Jones | Q1 | 7 |

Lily Jones | Q2 | 5 |

Lily Jones | Q3 | 8 |

Lily Jones | Q4 | 6 |

Alfred Brown | Q1 | 4 |

Alfred Brown | Q2 | 6 |

Alfred Brown | Q3 | 11 |

Alfred Brown | Q4 | 9 |

Sonny Lee | Q1 | 10 |

Sonny Lee | Q2 | 10 |

Sonny Lee | Q3 | 11 |

Sonny Lee | Q4 | 11 |

The table contains three columns:

— first and last name of the salesmen**salesman**

— Q1, Q2, Q3, and Q4 represent each quarter in the year**period**

— number of new clients that each salesman acquired in one quarter**new_clients**

Now, imagine you are the boss and you want to see an additional column, benchmark, that shows the best result achieved by the salesmen in each quarter. Once you have a benchmark assigned to each record, you can easily see how close each salesman was. This calculation can be easily done using window functions:

SELECT *, MAX(new_clients) OVER (PARTITION BY period) AS benchmark FROM kpi_new_clients;

Once you run this query, the SQL engine will display an entire dataset with the additional column, benchmark. This column represents the maximum value of new clients for a specific quarter:

Using a window function for the benchmark calculation.

Let’s briefly go through the syntax:

`OVER`

denotes that this is a window function. Due to this keyword, sometimes window functions are also called OVER functions.`PARTITION BY`

tells us how the rows are grouped into logical chunks/groups. In our example, the rows are grouped on a period level, which means that for each quarter, we will separately calculate a maximum value.

Now that you are familiar with the syntax, I will briefly describe why window functions are so useful and go over the most commonly used ones.

## Why Are Window Functions Useful?

By using window functions, you can perform certain complex business calculations with only a few lines of code. This is especially helpful with time series data analysis and calculations like moving averages, running totals, rankings, etc. This is nicely described in our articles When to use SQL window functions and SQL Course of the Month – Window Functions.

Also, by using window functions, you will have much cleaner and more readable code that is easier to maintain. I will show you what I mean with several examples later on.

Now, let's dive deep into the window functions list.

## Window Functions List

Before I introduce the window functions list, I want to explain the main types of window functions:

**Aggregate**functions — These are regular aggregate functions that you have probably used with`GROUP BY`

. However, they can also be used with`OVER()`

. Unlike regular aggregations used in combination with`GROUP BY`

, when they are used with`OVER()`

, rows are not collapsed. Each record gets its own calculated values. This group of functions represents`sum`

,`avg`

,`min`

,`max`

, and`count`

.**Ranking**window functions — These are used to assign a rank or row number to each record inside a partition. The most famous functions in this group are`rank()`

,`dense_rank()`

, and`row_number()`

.**Positional**window functions — Functions like`first_value`

,`last_value`

,`lead`

, and`lag`

return a single value from a particular row in each window frame (there are no aggregations). This “value” can be the value of the first/last record in each window frame, or it can return a value from the previous row or from the next row (lead/lag).**Distribution**functions — In this group, there are two famous functions:`cume_dist`

and`percent_rank`

. Both calculate where each row value stands in a group of other values inside the same group/partition/window frame.

Okay, now that you are aware of the function types, it is time to introduce the most famous window functions. Below is the window function list:

Okay, now that you are aware of the function types, it is time to introduce the most famous window functions. Below is the window function list:

Function | Category | Description |
---|---|---|

SUM() | AGGREGATION | returns total sum of all input values for each partition/window frame |

AVG() | AGGREGATION | returns average value of all input values for each partition/window frame |

MIN(), MAX() | AGGREGATION | returns min or max value among all input values for each partition/window frame |

COUNT() | AGGREGATION | returns total number of input values for each partition/window frame |

CUME_DIST() | DISTRIBUTION | returns cumulative distribution of input values for each partition/window frame (it is calculated by the formula: total number of rows that are less or equal to current value divided by the total number of rows in the partition/window frame) |

PERCENT_RANK() | DISTRIBUTION | returns relative percent rank of a given row (similar to cume_dist, based on the number of rows in the group that have a lower value than the current row) |

ROW_NUMBER() | RANKING | assigns a sequential integer to each row within the partition of a result set |

RANK() | RANKING | assigns a rank to each value within the group/partition (values need to be ordered so that ORDER BY can define an order; if two values are the same, they receive the same rank) |

DENSE_RANK() | RANKING | Assigns the same rank to rows with equal values for the ranking criteria (very similar to rank function) |

NTILE() | RANKING | Assigns to each value in a list a number that represents a bucket or group (you specify a number of buckets, and ntile() determines what belongs to which bucket) |

FIRST_VALUE() | POSITIONAL | returns a value of the expression for the first row according to the given order (it's different from max) |

LAST_VALUE() | POSITIONAL | returns the value of the expression for the last row according to the given order |

LEAD() | POSITIONAL | assigns a value to each row in a table that is stored in a row after the current one (next row value) |

LAG() | POSITIONAL | assigns a value to each row in a table that is stored in a row before the current one (previous row value) |

Now, it is time to see window functions in action. Let's go through several examples.

## The Total Number and Mean of New Clients for Each Salesman

We could display the total number of new clients and mean value for each salesman in one year (Q1, Q2, Q3, and Q4) by using GROUP BY syntax:

SELECT salesman, avg(new_clients) AS average, sum(new_clients) AS total FROM kpi_new_clients group by salesman;

You are probably now asking yourself, “Why should I use window functions if I can do the calculation by using GROUP BY? Why not stick with the regular syntax?” Well, in this case, window functions are great for adding averages and total sums to each input row as an addi`tional attribute without collapsing.`

With `GROUP BY`

, you will get collapsed results:

salesman | average | total |
---|---|---|

Alfred Brown | 7.5 | 30 |

Lily Jones | 6.5 | 26 |

Olivia Smith | 9.5 | 38 |

Sonny Lee | 10.5 | 42 |

You got four rows. If you would like to see the averages and totals as additional values, use window functions. See the query below:

SELECT *, new_clients, avg(new_clients) OVER (PARTITION BY salesman) AS average, sum(new_clients) OVER (PARTITION BY salesman) AS total FROM kpi_new_clients;

Once you run this query, you will see the whole table with two additional columns:

With a result presented like this, you can easily check in which quarter each salesman was below or above his/her average and answer similar questions. Our data set is now enriched with additional values that give us better insights.

## Salesman Ranking per Each Quarter

We could analyze our dataset further by ranking our salesmen. Who was the best in each quarter? Who has acquired the most and who has not been successful in acquiring new clients?

To answer these questions, you can use window ranking functions—`row_number`

, `rank`

, and `dense_rank`

. These analytics functions work similarly— rank is assigned to each record in a dataset by ordering records according to some value.

In this case, we are interested in who acquired the most clients. So, it is important that the column `new_clients`

is ordered in descending order. Here is the `SELECT`

statement:

SELECT *, ROW_NUMBER() OVER (PARTITION BY period ORDER BY new_clients desc) AS rownumber_salesman, RANK() OVER (PARTITION BY period ORDER BY new_clients desc) AS rank_salesman, dense_rank() OVER (PARTITION BY period ORDER BY new_clients desc) AS dense_rank_salesman FROM kpi_new_clients;

Once you run this statement, the following output will be displayed on the screen:

salesman | period | new_clients | rownumber_salesman | rank_salesman | dense_rank_salesman |
---|---|---|---|---|---|

Sonny Lee | Q1 | 10 | 1 | 1 | 1 |

Olivia Smith | Q1 | 9 | 2 | 2 | 2 |

Lily Jones | Q1 | 7 | 3 | 3 | 3 |

Alfred Brown | Q1 | 4 | 4 | 4 | 4 |

Sonny Lee | Q2 | 10 | 1 | 1 | 1 |

Olivia Smith | Q2 | 7 | 2 | 2 | 2 |

Alfred Brown | Q2 | 6 | 3 | 3 | 3 |

Lily Jones | Q2 | 5 | 4 | 4 | 4 |

Olivia Smith | Q3 | 12 | 1 | 1 | 1 |

Alfred Brown | Q3 | 11 | 2 | 2 | 2 |

Sonny Lee | Q3 | 11 | 3 | 2 | 2 |

Lily Jones | Q3 | 8 | 4 | 4 | 3 |

Sonny Lee | Q4 | 11 | 1 | 1 | 1 |

Olivia Smith | Q4 | 10 | 2 | 2 | 2 |

Alfred Brown | Q4 | 9 | 3 | 3 | 3 |

Lily Jones | Q4 | 6 | 4 | 4 | 4 |

Here is a short code explanation:

- In this example, we created three additional columns for the results of the window functions: rownumber_salesman, rank_salesman, and dense_rank_salesman.
- The rows are grouped into partitions by quarters so that each ranking will be done on each partition separately.
`row_number`

,`rank`

, and`dense_rank`

assigned 1 for the salesman with the highest number of new clients in a given period, two for the second-highest, and so on.- We used
`OVER`

in a combination with`ORDER BY`

.`ORDER BY`

is used for partition ordering (we sorted the records inside each partition in a specified order). In our example, each partition was sorted by the column`new_clients`

in descending order. After, row ordering ranking was applied.

Notice that all three analytics functions worked in almost the same way. The only difference lies in the records that have the same value in `new_clients`

.

Let's take a look at the numbers from the third quarter:

Let's take a look at the numbers from the third quarter:

salesman | period | new_clients | rownumber_salesman | rank_salesman | dense_rank_salesman |
---|---|---|---|---|---|

Olivia Smith | Q3 | 12 | 1 | 1 | 1 |

Alfred Brown | Q3 | 11 | 2 | 2 | 2 |

Sonny Lee | Q3 | 11 | 3 | 2 | 2 |

Lily Jones | Q3 | 8 | 4 | 4 | 3 |

In the third quarter, Alfred and Sonny acquired the same number of clients?: 11. Olivia did better, and Lily acquired the least. How do we want to rank if more records have the same value in `new_clients`

?

If we want Alfred and Sonny to get the same rank, we should use rank or `dense_rank`

. These two functions assign the same rank to equal values. If we want each salesman to get a different rank, we should use

.`row_number`

What about Lily if Alfred and Sonny share second place? Do we want to give her third or fourth place? If third, we should use `dense_rank`

. If fourth, we should use rank.

This is the main difference between rank and `dense_rank`

. `dense_rank`

does not skip any ranks if there is a tie in the preceding records.

More details about ranking functions can be found in our article: Overview of Ranking Functions in SQL. Take a look if you are keen to learn more about ranking.

## Grouping Salesmen by Their Performance

Another useful function is `NTILE`

. This window function breaks ordered records into a specified number of equal groups.

So, we could form groups or buckets of salesmen by their performance. We could do this as follows:

WITH total_by_quarter AS ( SELECT salesman, sum(new_clients) AS new_clients FROM kpi_new_clients GROUP BY salesman ) SELECT *, NTILE(2) OVER (ORDER BY new_clients desc) AS bucket_performance FROM total_by_quarter;

Here is the code explanation:

- We used CTEs (common table expressions).
`total_by_quarter`

is the temporary result set that stores the total number of acquired clients in the year (Q1, Q2, Q3, and Q4 merged) for each salesman. - Each salesman belongs to one bucket/group. This is stored in the column
`bucket_performance`

which is created by using`NTILE`

. `NTILE`

takes one argument?: ?the number of buckets. In our case, this argument is set to 2, which means that two salesmen groups are created.

The result of our query looks like this:

Sonny and Olivia acquired more new clients, so they are in bucket 1. Alfred and Lily are in bucket 2. The groups are equal in size (they both have two salesmen), and they are formed according to the ordered `new_clients`

variable.

## Number of Acquired Clients From Previous Quarter

In the last example, I’m going to introduce a positional function. We are going to use lag to return the value of `new_clients`

from the previous quarter for each salesman separately.

Here is the code:

SELECT *, LAG(new_clients) OVER (PARTITION BY salesman ORDER BY period) AS previous_period_new_clients FROM kpi_new_clients;

Keep this in mind:

- Inside the
`OVER`

clause, we used`PARTITION BY`

(we want to group the records for each salesman separately). For each salesman, the rows are ordered by period column. - To each current row, we assigned the value of the previous one (the
`lag`

function does this). - Some rows are populated with
`null`

. These are rows for which the previous row value is unknown.

Similarly, the `lead`

function can be used as well. `lead`

takes values from the following row.

## Summary

In this article, you learned about window functions — why they are useful and when you could use them. I gave you a list of window functions, and through several examples, I explained the syntax and how you can use them at work.

If you want to learn more, I suggest taking the Window Functions course available on our LearnSQL.com platform. This course is interactive, which means it contains a lot of examples for you to practice with.

If you are working with MS SQL or PostgreSQL, I suggest Window Functions in MS SQL Server or Window Functions in PostgreSQL. All three courses mentioned above contain the same content, except that they are for different SQL engines.

There is also a cool cheat sheet that you can use while practicing: SQL Window Functions Cheat Sheet. It is a good resource for learning window functions.