21st May 2024 15 minutes read

SQL Window Functions Guide

window functions

Table of Contents

The Syntax of SQL Window Functions
Common SQL Window Functions
SQL Window Functions Examples
Window Functions vs. GROUP BY and Aggregate Functions
Advanced SQL Window Function Usage
Common Problems with SQL Window Functions
Additional Resources for Practicing SQL Window Functions

Every data analyst can benefit from knowing SQL window functions. They up your data analysis game and open doors to analyzing data on an entirely new level. This article is your starting point for the world of SQL window functions.

SQL window functions allow you to show all the data rows and their aggregate values at the same time. Sounds a bit like GROUP BY on steroids, doesn’t it? It doesn’t stop there. As window functions perform various calculations across the set of rows, they also allow datasets to be divided into subsets. They help you easily rank data, aggregate it, calculate differences between different periods (e.g., profit increase/decrease between two months/quarters/years), and find cumulative sums, running totals, moving averages, etc.

Window functions are a complex topic, but learning them is much easier with our Window Functions course. It’s an interactive course with 218 exercises that provide systematic learning with plenty of coding. You’ll learn all about essential window function clauses – e.g. OVER(), ORDER BY, and PARTITION BY – and what a window frame is. Then, you’ll learn how to use all that in the context of different window functions.

After learning, it’s time for practice (which is also learning). Our Window Functions Practice Set gives you 100 additional interactive exercises so you can really get comfortable putting your knowledge to work.

The Syntax of SQL Window Functions

Window functions get their name from a window frame, which is a set of rows related to the current row.

To perform a window function operation on the window frame, you need to know general window function syntax:

SELECT column_1,
       column_2,
	<window_function> OVER(PARTITION BY … ORDER BY … <window_frame>) AS column_alias
FROM table;

There are several essential parts of this syntax that require explaining:

OVER clause: This is the mandatory clause required to define a window frame. For example, you would recognize SUM(order_value) OVER() as a SUM() window function. Without OVER(), this is just a regular SUM() aggregate function.
PARTITION BY: This is an optional clause for partitioning the dataset, i.e. dividing it into subsets. This lets you apply a window function to each partition separately. If this clause is omitted, then the whole result set is one partition.
ORDER BY: This optional clause (for some window functions) is used for specifying the order of rows in a window frame. If you omit this clause, then the order of the rows in the window frame will be arbitrary.
<window_frame>: This defines the upper and lower bounds of a window frame. Two important clauses used for that are ROWS and RANGE. ROWS defines the number of rows that precede and follow the current row. The RANGE clause defines the range of rows based on their value compared to the current row. You can learn more in our article about the differences between ROWS and RANGE. This part of the syntax is usually omitted, as the default window frame is what users most often need.

The default window frame, in this case, depends on whether you use the ORDER BY clause in OVER(). If you do, then the frame is all the current row and all the rows before it in the current partition. If you don’t specify ORDER BY, the window frame is the current row and all the rows preceding and following it in the current partition. In the second case, the window frame is basically the whole dataset – or the whole partition, if you’re also using PARTITION BY.

Common SQL Window Functions

There are many different window functions. Here’s an overview of some of the most common:

Window Function Category	Window Function	Description	Further Reading
Ranking Functions	ROW_NUMBER()	• Returns a unique row number for each row within a window frame. • Tied row values get different row numbers.	• How to Number Rows in an SQL Result Set • How to Use ROW_NUMBER OVER() in SQL to Rank Data
RANK()	• Ranks the rows within a window frame. • Tied row values get the same rank, with a gap in the ranking.	• What Is the RANK() Function in SQL, and How Do You Use It? • How to Rank Rows in SQL: A Complete Guide • How to Use the SQL RANK OVER (PARTITION BY)
DENSE_RANK()	• Ranks the rows within a window frame • Tied row values get the same rank, with no gap in the ranking.	• Overview of Ranking Functions in SQL • What’s the Difference Between RANK and DENSE_RANK in SQL?
Aggregate Functions	SUM()	• Calculates the sum of values within the window frame.	• How to Use SUM() with OVER(PARTITION BY) in SQL
AVG()	• Calculates the average values within the window frame.
COUNT()	• Counts the values of rows within the window frame.	• COUNT OVER PARTITION BY: An Explanation with 3 Examples
MIN()	• Finds the minimum value within the window frame.
MAX()	• Finds the maximum value within the window frame.
Analytic Functions	NTILE()	• Divides the window frame into n groups. If possible, each group will have the same number of rows. • Each row is assigned its group number.	• 6 Examples of NTILE() Function in SQL \| LearnSQL.com
LEAD()	• Gets the data from a row that is a defined number of rows after the current one.	• The LAG Function and the LEAD Function in SQL
LAG()	• Gets the data from a row that is a defined number of rows before the current one.	• The LAG Function and the LEAD Function in SQL
FIRST_VALUE()	• Gets the value of the first row within the window frame.
LAST_VALUE()	• Gets the value of the last row within the window frame.

SQL Window Functions Examples

Now that we have gone through the basics, it’s time to show several practical window function examples.

In all examples, I’ll use the same table. It’s entitled exchange_rates and contains the April 2024 European Central Bank (ECB) exchange rates for three currency pairs: EUR vs. USD, EUR vs. CHF, and EUR vs. JPY.

Here’s a partial snapshot of the data.

id	date	currency_pair	ecb_rate
1	2024-04-02	EUR_USD	1.0749
2	2024-04-02	EUR_JPY	163.01
3	2024-04-02	EUR_CHF	0.9765
4	2024-04-03	EUR_USD	1.0783
5	2024-04-03	EUR_JPY	163.66
6	2024-04-03	EUR_CHF	0.9792
7	2024-04-04	EUR_USD	1.0852
8	2024-04-04	EUR_JPY	164.69
9	2024-04-04	EUR_CHF	0.9846

Ranking Window Function Example

This example will show you how DENSE_RANK() works. The other two ranking window functions can be used in the same manner; they could (depending on the data) return slightly different results.

The code below ranks the data in the table from the highest to the lowest exchange rate:

SELECT date,
	 currency_pair,
	 ecb_rate, 
	 DENSE_RANK() OVER (ORDER BY ecb_rate DESC) AS rank_ecb_rate
FROM exchange_rates;

I select the date, currency pair, and rate. Now, I’ve chosen DENSE_RANK() to rank the data. This is just in case there are the same exchange rates (highly unlikely, but still…); I want them to be ranked the same, and I don’t want gaps in the ranking.

The DENSE_RANK() function is followed by the OVER() clause that defines the window function. In the clause parentheses, I use another window function clause—ORDER BY. This way, I’m telling the DENSE_RANK() window function to rank the data by the ECB rate in descending order.

This is what I get as output:

date	currency_pair	ecb_rate	rank_ecb_rate
2024-04-09	EUR_JPY	164.9700	1
2024-04-10	EUR_JPY	164.8900	2
2024-04-04	EUR_JPY	164.6900	3
…	…	…	...
2024-04-02	EUR_JPY	163.0100	9
2024-04-09	EUR_USD	1.0867	10
2024-04-10	EUR_USD	1.0860	11
2024-04-04	EUR_USD	1.0852	12
…	…	…	…
2024-04-12	EUR_USD	1.0652	18
2024-04-04	EUR_CHF	0.9846	19
2024-04-09	EUR_CHF	0.9819	20
2024-04-10	EUR_CHF	0.9810	21
…	…	…	…
2024-04-12	EUR_CHF	0.9716	27

As you can see, each row has been ranked according to its rate value. In case of the same rates, DENSE_RANK() would allocate the same ranking and wouldn’t skip the rank. RANK() would do the same, only it would skip the rank sequence. ROW_NUMBER() would assign consecutive ranking, even if some rows have the same exchange rate.

Further Reading:

Aggregate Window Function Example

Here’s a nice example of how you can use the AVG() window function to calculate the average rate for each pair of currencies:

SELECT date,
	 currency_pair,
	 ecb_rate,
	 AVG(ecb_rate) OVER (PARTITION BY currency_pair) AS avg_rate_by_currency
FROM exchange_rates
ORDER BY currency_pair, date;

Like in the regular aggregate AVG() function, you must write the column you want the average of. To make it a window function, use the OVER() clause. This time, I use PARTITION BY in the column currency_pair. By doing this, I’m dividing the data into subsets according to the currency pair.

Simply put, I’m calculating the average rate for each currency pair separately.

Have a look at the code output. It’s been sorted by currency pairs and date:

date	currency_pair	ecb_rate	avg_rate_by_currency
2024-04-02	EUR_CHF	0.9765	0.9793
2024-04-03	EUR_CHF	0.9792	0.9793
2024-04-04	EUR_CHF	0.9846	0.9793
…	…	…	…
2024-04-02	EUR_JPY	163.0100	164.1211
2024-04-03	EUR_JPY	163.6600	164.1211
2024-04-04	EUR_JPY	164.6900	164.1211
…	…	…	…
2024-04-02	EUR_USD	1.0749	1.0795
2024-04-03	EUR_USD	1.0783	1.0795
2024-04-0	EUR_USD	1.0852	1.0795

The EUR vs. CHF average rate is 0.9793, and this value is repeated for every EUR vs. CHF row. When the window function reaches the next currency pair, the average resets and is calculated anew; for EUR vs. JPY, it’s 164.1211. Finally, the average for EUR vs. USD is 1.0795.

This window function allowed me to calculate averages separately and show the average values without collapsing individual rows. In other words, I can see every daily rate alongside the average for that currency pair.

Further Reading:

Analytic Window Function Examples

In this section, I’ll show three examples of different analytic SQL window functions.

LAG()

The LAG() example will demonstrate how to calculate daily change. This function is used for accessing the value from the preceding rows. Another analytic window function is LEAD(), which does exactly the opposite – it fetches the data from the following rows. Both functions have basically the same syntax; you just change the name of the function.

In this example, I want to calculate the daily change in exchange rates:

SELECT date,
	 currency_pair,
	 ecb_rate,
	 ecb_rate - LAG(ecb_rate) OVER (PARTITION BY currency_pair ORDER BY date) AS daily_rate_change
FROM exchange_rates;

LAG() is a function for accessing the value(s) in the row(s) preceding the current row. In the above example, I specify the column ecb_rate in LAG(). This means I want to access the value of the ECB rate. I don’t explicitly specify the number of rows I want to go back to, so it will be the default value of one row.

OVER() comes after the function name. Here, I partition the dataset by currency pair since I want to calculate the daily rate change for each currency pair separately.

I also use ORDER BY to sort the data within the partitions. Since the logic is to go back one day, the data must be ordered ascendingly by date.

So, the LAG() part of the code represents the previous day's exchange rate. To get the daily difference, I simply subtract this value from the current exchange rate (ecb_rate - LAG(ecb_rate)).

Here’s the output:

date	currency_pair	ecb_rate	daily_rate_change
2024-04-02	EUR_CHF	0.9765	NULL
2024-04-03	EUR_CHF	0.9792	0.0027
2024-04-04	EUR_CHF	0.9846	0.0054
2024-04-05	EUR_CHF	0.9793	-0.0053
…	…	…	…
2024-04-02	EUR_JPY	163.0100	NULL
2024-04-03	EUR_JPY	163.6600	0.6500
2024-04-04	EUR_JPY	164.6900	1.0300
2024-04-05	EUR_JPY	164.1000	-0.5900
…	…	…	…
2024-04-02	EUR_USD	1.0749	NULL
2024-04-03	EUR_USD	1.0783	0.0034
2024-04-04	EUR_USD	1.0852	0.0069
2024-04-05	EUR_USD	1.0841	-0.0011
…	…	…	…

The first row is NULL because there is no previous date, so the difference can’t be calculated. In the next row, the daily rate change is 0.9792-0.9765 = 0.0027. The same principle of taking the previous row value and subtracting it from the current is continued with every row.

Since the dataset is partitioned by currency pair, the calculation is reset once it reaches other currency pairs, i.e. EUR vs. JPY and EUR vs. USD.

Further Reading:

• The LAG Function and the LEAD Function in SQL

FIRST_VALUE()

The FIRST_VALUE() window function can be used on our data to calculate the differences between the current and the first exchange rate of the month.

I can do that because FIRST_VALUE() returns the first value within the partition. Its opposite is LAST_VALUE(), which returns the last value in the partition. Both functions have basically the same syntax; only the name of the function is different (and the outcome, of course!).

SELECT date,
	 currency_pair,
	 ecb_rate,
	 ecb_rate - FIRST_VALUE(ecb_rate) OVER (PARTITION BY currency_pair ORDER BY date) AS difference_current_first
	  FROM exchange_rates
ORDER BY currency_pair, date;

I want the rate value; that’s why there’s the column ecb_rate in FIRST_VALUE(). The data is again partitioned by currency pair, as I want a calculation for each pair.

OK, so FIRST_VALUE() is used to fetch the value from the first row in the partition. But what happens if I order the data in the partition ascendingly by date? That’s right; the first row is the one with the rate from the month’s first day. In our case, it’s the first exchange rate in April.

Now, subtract this from the current exchange rate. Since we know our data is only for April, we get the difference between the current and first rates for that month.

date	currency_pair	ecb_rate	difference_current_lowest
2024-04-02	EUR_CHF	0.9765	0.0000
2024-04-03	EUR_CHF	0.9792	0.0027
…	…	…	…
2024-04-12	EUR_CHF	0.9716	-0.0049
2024-04-02	EUR_JPY	163.0100	0.0000
2024-04-03	EUR_JPY	163.6600	0.6500
…	…	…	…
2024-04-12	EUR_JPY	163.1600	0.1500
2024-04-02	EUR_USD	1.0749	0.0000
2024-04-03	EUR_USD	1.0783	0.0034
…	…	…	…
2024-04-12	EUR_USD	1.0652	-0.0097

When the difference is 0, the current and the earliest rates are the same. For EUR vs. CHF, the first rate is 0.9765. Let’s check the first two rows: 0.9765 - 0.9765 = 0.0000; 0.9792 - 0.9765 = 0.0027.

The same principle is applied to the other two currency pairs.

NTILE()

The last example I’ll show is the NTILE() function, which divides the window (or partition) into groups. The argument in the NTILE() function parentheses specifies the number of groups you want your dataset to be divided into.

The splitting will be done chronologically by ordering the data ascendingly by date:

SELECT date,
	 currency_pair,
	 ecb_rate,
	 NTILE(3) OVER (ORDER BY date) AS group_number
FROM exchange_rates;

And here’s the result:

date	currency_pair	ecb_rate	group_number
2024-04-02	EUR_USD	1.0749	1
2024-04-02	EUR_JPY	163.0100	1
…	…	…	…
2024-04-04	EUR_CHF	0.9846	1
2024-04-05	EUR_USD	1.0841	2
2024-04-05	EUR_JPY	164.1000	2
…	…	…	…
2024-04-09	EUR_CHF	0.9819	2
2024-04-10	EUR_USD	1.0860	3
2024-04-10	EUR_JPY	164.8900	3
…	…	…	…
2024-04-12	EUR_CHF	0.9716	3

The data is split into three groups. Since there are 27 rows of data in total, it could be split into equal groups of nine rows.

Further Reading:

6 Examples of NTILE() Function in SQL | LearnSQL.com

Window Functions vs. GROUP BY and Aggregate Functions

Aggregate window functions, as you can assume from their name, are used to aggregate data. But what about ‘regular’ aggregate functions and GROUP BY? They, too, are used to aggregate data. So, how are the window functions different?

The main difference is that aggregate window functions (and window functions in general) don’t collapse individual rows while showing the aggregated value. On the other hand, GROUP BY and aggregate functions can only show the aggregated values; they collapse individual rows.

In simple words, window functions let you show the analytical and aggregate data at the same time.

Further Reading:

Advanced SQL Window Function Usage

Window functions have extensive usage in data analytics, as they can solve many business reporting requirements.

Here’s an overview of some of the more complex window function usages.

Ranking

Yes, I already discussed ranking and showed you an example of how to do it. There are three window functions for ranking data: ROW_NUMBER(), RANK(), and DENSE_RANK(). They don’t necessarily return the same result, as they all have slightly different ways of ranking data. Which one you’ll use depends on your data and what you want to achieve.

Further Reading:

Running Totals and Moving Averages

Both these calculations are usually used in time series analysis. Time series are data showing values at certain points of time. Analyzing such data is, well, time series analysis. Its purpose is to unveil trends in data and find possible causes of significant deviations from the trend.

The running total (or cumulative sum) is the sum of the current row and all the preceding rows’ values. As you move to the future, the time frame size increases by one row/data point and the value is added to the previous row’s running total.

Moving averages are the average value of the last n periods. As you move to the future, the time frame moves – but its size remains the same. This is heavily used in the financial industry, e.g. a 5-day moving average in stock price analysis. That way, the average price is continuously updated, and the impact of significant short-term changes to a stock price is neutralized.

The rolling total is a mixture of the running total and the moving average. It is the sum of the last n periods. For example, a 7-day rolling total is the cumulative sum over the period of 7 previous days. That way, you can see if the trends in the weekly sum without typical day of the week fluctuations.

Difference Between Two Rows or Time Periods

The window function used for calculating a difference between two rows is LAG(), which allows you to access values from the preceding rows. The difference between two time periods is basically the same thing; it only refers to finding differences when working with time series. In the examples section, I showed you how to do this.

Analyzing Time Series

Window functions work like a charm when you need to analyze time series. There’s not only the LAG() function to do that, but many others.

Further Reading:

Common Problems with SQL Window Functions

There are several issues everybody using window functions runs into sooner or later:

Confusing window functions with aggregate functions and GROUP BY, which we already discussed.
Trying to use window functions in WHERE. This can’t be done because SQL processes WHERE conditions before the window functions.
Trying to use window functions in GROUP BY, which is also not allowed because of SQL’s order of the operations: window functions are executed after GROUP BY.

Additional Resources for Practicing SQL Window Functions

SQL window functions are among the most useful SQL tools data analysts have. This is especially true when you go beyond basic reports and require sophisticated calculations and the ability to show analytical and aggregated data simultaneously.

All the topics I covered in this article require further practice on practical examples, which you can find in the following articles and courses:

Remember – practice makes perfect! So don’t just read the articles; make sure to get some hands-on coding practice as well. Happy learning!

Tags:

window functions

The Syntax of SQL Window Functions

Common SQL Window Functions

SQL Window Functions Examples

Ranking Window Function Example

Aggregate Window Function Example

Analytic Window Function Examples

LAG()

FIRST_VALUE()

NTILE()

Window Functions vs. GROUP BY and Aggregate Functions

Advanced SQL Window Function Usage

Ranking

Running Totals and Moving Averages

Difference Between Two Rows or Time Periods

Analyzing Time Series

Common Problems with SQL Window Functions

Additional Resources for Practicing SQL Window Functions

You may also like