28th Nov 2023 22 minutes read

What Are SQL Window Functions?

window functions

Table of Contents

A Brief History of SQL Window Functions
What’s a Window?
SQL Window Function Syntax
Dataset and Syntax Examples
What Are the Most Common Window Functions?
Real-World SQL Window Function Examples
SQL Window Functions: A Window to Better Data Analysis

Need to up your data analysis game? Learn these SQL window functions here and you’ll take your analysis skills to the next level. Practice exercises and detailed explanations included!

When I first heard about SQL window functions, I thought it was some strange marriage between SQL and Windows. I was wrong. The SQL window functions have nothing to do with that famous operating system. They are SQL functions that do computations on the set of rows related to the current row. This set of rows is called a window or window frame – hence the function name.

You might also hear about windowing functions in SQL, analytical functions, or the OVER() functions. These are all just alternative names for SQL window functions – an extremely helpful set of tools for data analysis.

In this article, we’ll show you what you can do with window functions and how. I’ll start with a short history of SQL window functions and explain why they are called that. Then, I’ll guide you through the syntax and show you how it works with several examples. After practicing syntax, we’ll be ready for real-life window function examples from a data analyst’s life. And here’s the juiciest part: after each example, there’s an exercise for you to solve and learn through coding.

However, the main source of your knowledge on this subject should be our Window Functions course. Its 218 interactive exercises cover SQL window functions in detail. In other words, you’ll learn about window frames and the OVER(), PARTITION BY, and ORDER BY clauses. All this is necessary to aggregate, rank, and analyze data using window functions.

A Brief History of SQL Window Functions

Window functions were first introduced in the Oracle8i Database, which was released in 1998. However, they were included in the SQL standard five years later with SQL:2003.

Then Microsoft included them in SQL Server 2005. Other database management systems (DBMS) followed; PostgreSQL has supported them since PostgreSQL 8.4 was released in 2009; MariaDB included them with the 10.2 version (2016), and MySQL added them to version 8 in 2018.

Window functions are a rather new feature in SQL. Because of that, they are not part of the usual SQL curriculum. By learning them, you’ll be ahead of the curve compared to many SQL users.

What’s a Window?

A set of rows related to the current row is called a window or a window frame. Hence, the name of these functions: their result is based on a sliding window frame.

For example, you can calculate a cumulative sum as shown below:

date	sales	cumulative_sum
2023-10-01	4,241	4,241
2023-10-02	2,389	6,630
2023-10-03	1,580	8,210
2023-10-04	3,395	11,605
2023-10-05	1,265	12,870

The window for the 2023-10-04 cumulative sum is highlighted in green. It includes the current row (for 2023-10-04) and all the previous rows. So the cumulative sum is calculated as the sum of all the previous and current sales: 4,241 + 2,389 + 1,580 + 3,395 = 11,605. (Note that the row outlined in red dots is not included in the window or the sum.)

When we move to the next row, the window also moves: it will now include all the previous rows (green) and the current row (red dotted). Now the cumulative sum is 4,241 + 2,389 + 1,580 + 3,395 + 1,265 = 12,870.

So the window is the set of rows related to the current row that are used in computations for this row. The window changes (slides) as we move across the rows; thanks to these images of a sliding window, we get the name of these functions.

SQL Window Function Syntax

The syntax for window functions is:

SELECT column_1,
       column_2,
	 <window_function> OVER(PARTITION BY … ORDER BY … ) AS column_alias
FROM table;

Here’s what each part does:

<window_function> – Specifies the function to apply to that window.
OVER() – Defines the window (set of rows) and indicates that this is a window function; without this clause, it’s not a window function.
<window_frame> – Defines the window frame size (optional).
PARTITION BY – Divides the window into smaller groups called partitions (optional); if omitted, the whole result set is one partition.
ORDER BY – Sorts rows within the window frame (optional), i.e., decides in which order the window operation will be performed; if omitted, the order of rows within the partition is arbitrary.

Additional clauses can further define the window. Their syntax is:

[<ROWS or RANGE clause> BETWEEN <lower_bound> AND <upper_bound>]

The ROWS clause defines the window in terms of the fixed number of rows in relation to the current row.

The RANGE clause does the same. But it also takes into calculation all the rows with the same values in the columns specified in the ORDER BY clause as the current row.

The window bounds can be defined as

UNBOUNDED PRECEDING – All the rows before the current row.
n PRECEDING – A defined number of rows before the current row.
CURRENT ROW – Includes the current row.
n FOLLOWING – A defined number of rows after the current row.
UNBOUNDED FOLLOWING – All the rows after the current row.

Let’s now see how this works in practice.

Dataset and Syntax Examples

We’ll use the table album_catalogue in all these examples. You can create it yourself using this script. A data snapshot is shown below:

id	album_title	album_length	album_genre	artist	copies_sold	sales_period
1	Wednesday Morning, 3 A.M	0:31:38	Folk	Simon & Garfunkel	1043	2022_1Q
2	EnRoute: John Scofield Trio LIVE	1:13:48	Jazz	John Scofield Trio	512	2022_1Q
3	Nasty Gal	0:39:15	Funk	Betty Davis	809	2022_1Q
4	The New Folk Sound of Terry Callier	0:37:41	Folk	Terry Callier	903	2022_1Q
5	In a Silent Way	0:38:08	Jazz	Miles Davis	428	2022_1Q

The dataset is a list of albums with their length, genre, artist, and sales data, including the number of copies sold and the period (quarters). The data goes all the way to the third quarter of 2023.

I’ll first show you several examples, explaining each crucial part of the windowing functions’ syntax along the way.

Syntax Example #1: OVER ()

You can use the SUM() window function with only the OVER() clause to get the total sales in the fourth quarter of 2022:

SELECT sales_period,
 album_title,
	 artist,
	 copies_sold,
	 SUM (copies_sold) OVER() AS sold_in_4Q_2022
FROM album_catalogue
WHERE sales_period = '2022_4Q';

I want the sum of the copies sold, so I specify this column in SUM(). The OVER() clause is mandatory. If you want to use OVER() without any of the optional clauses, just leave the parentheses empty.

I use WHERE to output only data from the desired quarter.

When you write a query like this – with an empty OVER() – the whole result set (selected columns, applied filters, etc.) is taken into account when performing the window function calculations. Here, the result shows the individual sales of every album sold in the fourth quarter of 2022. It also shows the total sales of all albums sold in that period.

sales_period	album_title	artist	copies_sold	sold_in_4q_2022
2022_4Q	Wednesday Morning, 3 A.M	Simon & Garfunkel	809	7,403
2022_4Q	EnRoute: John Scofield Trio LIVE	John Scofield Trio	612	7,403
2022_4Q	Nasty Gal	Betty Davis	369	7,403
2022_4Q	The New Folk Sound of Terry Callier	Terry Callier	214	7,403
2022_4Q	In a Silent Way	Miles Davis	65	7,403
2022_4Q	Cold Sweat	James Brown	209	7,403
2022_4Q	The Freewheelin' Bob Dylan	Bob Dylan	246	7,403
2022_4Q	My Favorite Things	John Coltrane	377	7,403
2022_4Q	A Whole New Thing	Sly and the Family Stone	816	7,403
2022_4Q	Five Leaves Left	Nick Drake	400	7,403
2022_4Q	Head Hunters	Herbie Hancock	409	7,403
2022_4Q	In the Right Place	Dr. John	912	7,403
2022_4Q	Blue	Joni Mitchell	412	7,403
2022_4Q	Concierto	Jim Hall	612	7,403
2022_4Q	Dirty Mind	Prince	941	7,403

With the help of SUM() and OVER(), I’m able to show each individual album’s sales and quarterly total.

Syntax Example #2: OVER (ORDER BY)

You can add additional clauses inside the OVER() clause to change the definition of the window frame. One such clause is ORDER BY. The ORDER BY clause defines the sorting of rows within a window frame: the rows can be processed by the window function in a given order.

Let’s see an example. You can calculate the cumulative sum by adding ORDER BY to the previous query. In this example, I want to see how the album ‘In the Right Place’ sells over time and the cumulative number of albums sold up to a given period of time. Here’s the query:

SELECT sales_period,
	 album_title,
	 artist,
	 copies_sold,
	 SUM (copies_sold) OVER(ORDER BY sales_period ASC) AS cumulative_sum
FROM album_catalogue
WHERE album_title = 'In the Right Place';

The same column copies_sold is in SUM(). This time, OVER() contains the ORDER BY clause. You want to show the cumulative sales from the earliest to the latest quarter. That’s why you need sales_period and ASC in ORDER BY.

sales_period	album_title	artist	copies_sold	cumulative_sum
2022_1Q	In the Right Place	Dr. John	222	222
2022_2Q	In the Right Place	Dr. John	208	430
2022_3Q	In the Right Place	Dr. John	94	524
2022_4Q	In the Right Place	Dr. John	912	1436
2023_1Q	In the Right Place	Dr. John	912	2348
2023_2Q	In the Right Place	Dr. John	56	2404
2023_3Q	In the Right Place	Dr. John	562	2966

In each row, you can see the sales for each quarter and the cumulative sum, i.e., the sum of the current and all the previous quarters. For instance, the album sold 94 copies in the third quarter of 2022. The total sales in 2022 up until then (or in three quarters) is: 222 + 208 + 94 = 524.

Syntax Example #3: OVER (PARTITION BY)

Another clause you can use in OVER() is PARTITION BY. PARTITION BY is used to divide the window into smaller segments based on some criteria. For example, you can list the albums, their sales data for the fourth quarter of 2022, and the sales by genre for that quarter:

SELECT album_title,
	 artist,
	 copies_sold,
	 album_genre,
	 SUM (copies_sold) OVER(PARTITION BY album_genre) AS sales_by_genre
FROM album_catalogue
WHERE sales_period = '2022_4Q';

Again, we use the same window function SUM(). This time, though, we use PARTITION BY to divide the window into smaller segments based on album genre. Everything else stays the same.

The query returns the result below. It’s an analysis of album sales data by genre for the last quarter of 2022.

album_title	artist	copies_sold	album_genre	sales_by_genre
Wednesday Morning, 3 A.M	Simon & Garfunkel	809	Folk	2,081
The Freewheelin' Bob Dylan	Bob Dylan	246	Folk	2,081
Five Leaves Left	Nick Drake	400	Folk	2,081
The New Folk Sound of Terry Callier	Terry Callier	214	Folk	2,081
Blue	Joni Mitchell	412	Folk	2,081
Dirty Mind	Prince	941	Funk	3,247
Nasty Gal	Betty Davis	369	Funk	3,247
Cold Sweat	James Brown	209	Funk	3,247
A Whole New Thing	Sly and the Family Stone	816	Funk	3,247
In the Right Place	Dr. John	912	Funk	3,247
Head Hunters	Herbie Hancock	409	Jazz	2,075
EnRoute: John Scofield Trio LIVE	John Scofield Trio	612	Jazz	2,075
In a Silent Way	Miles Davis	65	Jazz	2,075
Concierto	Jim Hall	612	Jazz	2,075
My Favorite Things	John Coltrane	377	Jazz	2,075

For instance, the cumulative sum for the folk albums is 809 + 246 + 400 + 214 + 412 = 2,081.

Syntax Example #4: OVER (ORDER BY PARTITION BY)

You can also use both PARTITION BY and ORDER BY in OVER(). The rows are divided into segments with PARTITION BY and processed in a given order by ORDER BY.

Using the query below, I can show all the album analytical data and calculate the cumulative sum for each album separately:

SELECT sales_period,
	 album_title,
	 artist,
	 copies_sold,
	 SUM (copies_sold) OVER(PARTITION BY album_title ORDER BY sales_period ASC) AS cumulative_sum_by_album
FROM album_catalogue;

I calculate this with the help of the SUM() window function, as I did earlier. I partitioned the window by album. This means that the sum will be cumulated until the function reaches the last row of a particular album. When it reaches another album, it resets and starts accumulating the sum from the beginning.

I also use ORDER BY to instruct the function to cumulate the sum from the earliest to the latest quarter.

sales_period	album_title	artist	copies_sold	cumulative_sum_by_album
2022_1Q	A Whole New Thing	Sly and the Family Stone	674	674
2022_2Q	A Whole New Thing	Sly and the Family Stone	257	931
2022_3Q	A Whole New Thing	Sly and the Family Stone	666	1,597
2022_4Q	A Whole New Thing	Sly and the Family Stone	816	2,413
2023_1Q	A Whole New Thing	Sly and the Family Stone	816	3,229
2023_2Q	A Whole New Thing	Sly and the Family Stone	302	3,531
2023_3Q	A Whole New Thing	Sly and the Family Stone	123	3,654
2022_1Q	Blue	Joni Mitchell	589	589
2022_2Q	Blue	Joni Mitchell	184	773
2022_3Q	Blue	Joni Mitchell	256	1,029
2022_4Q	Blue	Joni Mitchell	412	1,441
2023_1Q	Blue	Joni Mitchell	412	1,853
2023_2Q	Blue	Joni Mitchell	99	1,952
2023_3Q	Blue	Joni Mitchell	995	2,947
…	…	…	…	…
2022_1Q	Wednesday Morning, 3 A.M	Simon & Garfunkel	1,043	1,043
2022_2Q	Wednesday Morning, 3 A.M	Simon & Garfunkel	437	1,480
2022_3Q	Wednesday Morning, 3 A.M	Simon & Garfunkel	184	1,664
2022_4Q	Wednesday Morning, 3 A.M	Simon & Garfunkel	809	2,473
2023_1Q	Wednesday Morning, 3 A.M	Simon & Garfunkel	809	3,282
2023_2Q	Wednesday Morning, 3 A.M	Simon & Garfunkel	325	3,607
2023_3Q	Wednesday Morning, 3 A.M	Simon & Garfunkel	612	4,219

You can see that the cumulative sum for ‘A Whole New Thing’ is 3,654. The next album (‘Blue’) starts with the next row, so the cumulation is reset: the cumulative sum is the same as the individual sales of the album in the first quarter of 2022. Then, it accumulates until it reaches the next album. The result goes all the way until the last album, which is ‘Wednesday Morning, 3 A.M.’ in our case.

I’ve shown you the most common ways of defining window frames with these examples. But these are not the only ways. You can also use the ROW or RANGE clauses with the syntax and bounds we explained earlier. Don’t worry. You’ll see the practical use of this in real-world examples.

I used only one function, SUM(), throughout these examples. This is one of many window functions; let’s quickly review some of the others.

What Are the Most Common Window Functions?

The most common window functions can be divided into three categories:

Aggregate Window Functions:

COUNT() – Counts the number of rows within a window.
SUM() – Totals given values within a window.
AVG() – Calculate the average of given values within a window.
MIN() – Finds the smallest value within a window.
MAX() – Finds the largest value within a window.

Ranking Window Functions:

ROW_NUMBER() – Ranks values sequentially, with different ranks for the tied values.
RANK() – Ranks values using the same rank for tied values; skips the next rank after the ties (e.g. 1, 2, 2, 4).
DENSE_RANK() – Ranks values using the same rank for tied values; doesn’t skip the next rank after the ties (e.g. 1,2,2,3,4).

Analytic Window Functions:

LEAD() – Gets data from a defined offset (i.e. a stated number of rows) after the current row.
LAG() – Gets data from a defined offset (i.e. a stated number of rows) before the current row.

There are more window functions you could find useful. Check them out in our free SQL Window Functions Cheat Sheet.

Real-World SQL Window Function Examples

So far, I’ve focused more on the syntax of SQL window functions. I’ll now show you the most common practical uses of window functions and how they can help data analysts in their work.

These examples will use the same dataset as earlier.

Example #1: Percentage of Total

Let’s show info about each album and its sales in the first quarter of 2023. Additionally, we will show quarterly sales by each genre. Then, let’s calculate how much each album (as a percentage) contributes to genre sales.

SELECT album_title,
	 artist,
	 copies_sold,
	 album_genre,
	 SUM(copies_sold) OVER (PARTITION BY album_genre) AS sales_by_genre,
	 (copies_sold*1.0/ SUM(copies_sold) OVER (PARTITION BY album_genre))*100.0 AS percent_of_genre_sales
FROM album_catalogue
WHERE sales_period = '2023_1Q'
ORDER BY album_genre, copies_sold DESC;

To get the sales by genre, I again use the SUM() window function. In the OVER() clause, I use only PARTITION BY. That way, I can partition the window by the album genre.

In the next code line, I divide the copies sold (of each album) and divide by the genre sales. To do that, simply copy the calculation from the previous line. Then, multiply the quotient by 100 to get the percentage. You’ll notice that I also multiplied copies_sold with 1.0. This is for converting integers to decimal values.

Filter the desired quarter using WHERE. Finally, order the output alphabetically by genre and then descendingly by copies sold.

Here’s the result:

album_title	artist	copies_sold	album_genre	sales_by_genre	percent_of_genre_sales
Wednesday Morning, 3 A.M	Simon & Garfunkel	809	Folk	2,081	38.88
Blue	Joni Mitchell	412	Folk	2,081	19.80
Five Leaves Left	Nick Drake	400	Folk	2,081	19.22
The Freewheelin' Bob Dylan	Bob Dylan	246	Folk	2,081	11.82
The New Folk Sound of Terry Callier	Terry Callier	214	Folk	2,081	10.28
Dirty Mind	Prince	941	Funk	3,247	28.98
In the Right Place	Dr. John	912	Funk	3,247	28.09
A Whole New Thing	Sly and the Family Stone	816	Funk	3,247	25.13
Nasty Gal	Betty Davis	369	Funk	3,247	11.36
Cold Sweat	James Brown	209	Funk	3,247	6.44
EnRoute: John Scofield Trio LIVE	John Scofield Trio	612	Jazz	2,075	29.49
Concierto	Jim Hall	612	Jazz	2,075	29.49
Head Hunters	Herbie Hancock	409	Jazz	2,075	19.71
My Favorite Things	John Coltrane	377	Jazz	2,075	18.17
In a Silent Way	Miles Davis	65	Jazz	2,075	3.13

Let’s check the calculation for the first row. The album by Simon & Garfunkel sold 809 copies. The total folk album sales for that quarter were 2,081. So the percentage of the individual sales in total genre sales is 809/2,081*100 = 38.88%.

The sum of percentages for each genre should be 100%. Let’s check this on a folk genre: 38.88% + 19.80% + 19.22% + 11.82% + 10.28 % = 100%.

Window Function vs. Aggregate Function vs. GROUP BY

I’m again using the aggregate function as a window function. I could’ve used a simple aggregate SUM() function with GROUP BY to get sales by each genre for the specified quarter. What’s the difference, then?

A window function allows you to show both analytical and aggregate data (individual sales with sales by genre and the quotient of these values), while an aggregate function used with GROUP BY would collapse the individual row and show only the aggregate value (the sum of sales for the quarter).

Solve This Exercise for Practice

Using window functions, rewrite the above query so that it shows the average sale by genre. Also, show how much each album's sales are above or below the genre average (as a percentage). Show only sales from the third quarter of 2023. Show the album title, artist, copies sold, and the album genre. Sort the output ascendingly by genre and individual album sales.

Solution:

SELECT album_title,
	 artist,
	 copies_sold,
	 album_genre,
	 AVG(copies_sold) OVER (PARTITION BY album_genre) AS average_sales_by_genre,
	   ((copies_sold/AVG(copies_sold) OVER (PARTITION BY album_genre))-1)*100 AS pct_from_average	   
FROM album_catalogue
WHERE sales_period = '2023_3Q'
ORDER BY album_genre, copies_sold;

Output:

album_title	artist	copies_sold	album_genre	average_sales_by_genre	pct_from_average
The New Folk Sound of Terry Callier	Terry Callier	283	Folk	561.6	-49.61
Five Leaves Left	Nick Drake	321	Folk	561.6	-42.84
The Freewheelin' Bob Dylan	Bob Dylan	597	Folk	561.6	6.30
Wednesday Morning, 3 A.M	Simon & Garfunkel	612	Folk	561.6	8.97
Blue	Joni Mitchell	995	Folk	561.6	77.17
A Whole New Thing	Sly and the Family Stone	123	Funk	533.4	-76.94
Dirty Mind	Prince	169	Funk	533.4	-68.32
In the Right Place	Dr. John	562	Funk	533.4	5.36
Nasty Gal	Betty Davis	808	Funk	533.4	51.48
Cold Sweat	James Brown	1005	Funk	533.4	88.41
Concierto	Jim Hall	263	Jazz	464	-43.32
My Favorite Things	John Coltrane	302	Jazz	464	-34.91
EnRoute: John Scofield Trio LIVE	John Scofield Trio	404	Jazz	464	-12.93
Head Hunters	Herbie Hancock	542	Jazz	464	16.81
In a Silent Way	Miles Davis	809	Jazz	464	74.35

Example #2: Rank Data

In this example, I’ll use a window function to rank data. I want to show each distinct album title and its length and rank them by length. The longest album will be ranked first.

SELECT *, 
	 RANK() OVER (ORDER BY album_length DESC) AS album_length_rank
FROM (SELECT DISTINCT album_title,
	       album_length
	FROM album_catalogue) AS distinct_album;

Let’s start by explaining the subquery: we use it to select distinct albums and their lengths.

Then, we use the main query to select all the data from the subquery. Now, use the RANK() window function to rank albums. You can also use other ranking functions, depending on your data and tasks.

For the ranking to work the way you want to, use the ORDER BY clause in OVER(). Specify the column by which you want to rank and in what order. In this case, it’s descendingly by length.

Here’s the ranking:

album_title	album_length	album_length_rank
EnRoute: John Scofield Trio LIVE	1:13:48	1
The Freewheelin' Bob Dylan	0:44:14	2
Head Hunters	0:41:52	3
Five Leaves Left	0:41:43	4
My Favorite Things	0:40:25	5
Nasty Gal	0:39:15	6
In a Silent Way	0:38:08	7
Concierto	0:38:02	8
A Whole New Thing	0:38:01	9
The New Folk Sound of Terry Callier	0:37:41	10
Blue	0:36:15	11
Cold Sweat	0:33:43	12
In the Right Place	0:33:22	13
Wednesday Morning, 3 A.M	0:31:38	14
Dirty Mind	0:30:14	15

Solve This Exercise for Practice

Rank each unique album by its sales within its genre. Show only data for the first quarter of 2023. Show the album title, its sales, genre, and rank. If there are albums with the same number of sales, rank them equally and don't skip the next rank.

Solution:

SELECT *, 
	 DENSE_RANK() OVER (PARTITION BY album_genre ORDER BY copies_sold DESC) AS album_sales_rank
FROM (SELECT DISTINCT album_title,
	       copies_sold,
	       album_genre
	FROM album_catalogue
	WHERE sales_period = '2023_1Q') AS distinct_album;

Output:

album_title	copies_sold	album_genre	album_sales_rank
Wednesday Morning, 3 A.M	809	Folk	1
Blue	412	Folk	2
Five Leaves Left	400	Folk	3
The Freewheelin' Bob Dylan	246	Folk	4
The New Folk Sound of Terry Callier	214	Folk	5
Dirty Mind	941	Funk	1
In the Right Place	912	Funk	2
A Whole New Thing	816	Funk	3
Nasty Gal	369	Funk	4
Cold Sweat	209	Funk	5
EnRoute: John Scofield Trio LIVE	612	Jazz	1
Concierto	612	Jazz	1
Head Hunters	409	Jazz	2
My Favorite Things	377	Jazz	3
In a Silent Way	65	Jazz	4

Example #3: Running Total

In this example, I’ll show a particular album’s sales period, title, artist, and the copies sold. I’ll also add a running total of copies sold that will include three rows: the current row and the two previous. The sum should be calculated from the earliest to the latest quarter.

SELECT sales_period, 
	 album_title,
	 artist,
	 copies_sold,
	 SUM(copies_sold) OVER (ORDER BY sales_period ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sales_running_total
FROM album_catalogue
WHERE album_title = 'In a Silent Way';

I again use the SUM() window function. Then there’s an ORDER BY in OVER() to order the sales so we can sum them ascendingly.

Next, I need to define the moving window frame. The running total should include the current row and the previous two rows. These are the lower and upper bounds specified in the ROWS clause. The lower bound is two previous rows, i.e., BETWEEN 2 PRECEDING. The upper bound is CURRENT ROW. The two bounds are stitched together in a window frame using the keyword AND.

I want to show the calculation for Miles Davis’s album ‘In a Silent Way’, so I filter data using WHERE.

Here are the running totals:

sales_period	album_title	artist	copies_sold	sales_running_total
2022_1Q	In a Silent Way	Miles Davis	428	428
2022_2Q	In a Silent Way	Miles Davis	1,053	1,481
2022_3Q	In a Silent Way	Miles Davis	19	1,500
2022_4Q	In a Silent Way	Miles Davis	65	1,137
2023_1Q	In a Silent Way	Miles Davis	65	149
2023_2Q	In a Silent Way	Miles Davis	218	348
2023_3Q	In a Silent Way	Miles Davis	809	1,092

Let’s check the result and explain what a running total is.

A running total is similar to a cumulative total (or sum), but they’re not the same. The cumulative total will give you the sum of the current row and all the previous rows, i.e., the window frame increases with each row. A running total is a sum within a defined window frame that stays the same size but moves with each row. In our case, the window is defined as the current row and the two previous rows.

Take a look at the highlighted values. The running total for the first quarter in 2022 is 428, the same as the individual sale. There are no previous rows, so the running total includes only the current row.

The next running total is 428 + 1,053 = 1,481. It sums the current and the previous row, as there is only one previous row.

The running total for the third quarter in 2022 is 428 + 1,053 + 19 = 1,500. This is the first time you get the whole window, i.e., the current row and the two previous rows.

As you go to the next row, the window will move but its size will remain the same. The running total for the following quarter is 428 + 1,053 + 19 + 65 = 1,137. It, again, involves the current row and the two previous rows – but different ones compared to the quarter.

Solve this exercise for practice

Rewrite the above query so it calculates the running total for the album ‘The New Folk Sound of Terry Callier’. The running total should be calculated from the earliest to the latest quarter. It should include four quarters: the two previous, the current quarter, and the one following. Also, show the sales period, album title, artist, and the number of copies sold.

Solution:

SELECT sales_period, 
	 album_title,
	 artist,
	 copies_sold,
	 SUM(copies_sold) OVER (ORDER BY sales_period ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS sales_running_total
FROM album_catalogue
WHERE album_title = 'The New Folk Sound of Terry Callier';

Output:

sales_period	album_title	artist	copies_sold	sales_running_total
2022_1Q	The New Folk Sound of Terry Callier	Terry Callier	903	2,575
2022_2Q	The New Folk Sound of Terry Callier	Terry Callier	418	2,789
2022_3Q	The New Folk Sound of Terry Callier	Terry Callier	1,254	3,003
2022_4Q	The New Folk Sound of Terry Callier	Terry Callier	214	2,641
2023_1Q	The New Folk Sound of Terry Callier	Terry Callier	214	2,506
2023_2Q	The New Folk Sound of Terry Callier	Terry Callier	541	1,252
2023_3Q	The New Folk Sound of Terry Callier	Terry Callier	283	1,038

Example #4: Quarter-To-Quarter Difference

In this last example, I’ll show how to use the window functions to calculate the sales difference between quarters:

SELECT *,
	 LAG(quarterly_copies_sold) OVER (ORDER BY sales_period) AS previous_quarter_sales,
	 quarterly_copies_sold - LAG(quarterly_copies_sold) OVER (ORDER BY sales_period) AS quarterly_sales_difference
FROM (SELECT sales_period,
	SUM(copies_sold) AS quarterly_copies_sold
FROM album_catalogue
GROUP BY sales_period) AS quarterly_sales;

I first write a subquery that calculates the total sales for each quarter. I use the SUM() aggregate function and group the results by the sales period.

Next, I select all the data from the subquery in the main query.

Now I need to get the previous quarter's sales. I’ll write the LAG() window function, which is used to access values from the previous rows. The value I want to access is specified in the function. In this case, it’s the quarterly copies sold from the subquery. By defining the offset argument, the function allows me to define how far back I want to go. I didn’t define it, so the default offset is one. In other words, the function will get the data from the previous row/quarter. But if you want to go two rows/quarters back, then you would write LAG(quarterly_copies_sold, 2).

I also use ORDER BY in OVER() to make sure the values within the frame are sorted from the oldest to the latest quarter.

This use of the window function is so that it’s clearer what I’ll do in the next code line. This is where the actual calculation of comparing the current and previous quarter’s sales is done. It’s simple now: subtract the window function defined above from the column quarterly_copies_sold.

Here’s the output:

sales_period	quarterly_copies_sold	previous_quarter_sales	quarterly_sales_difference
2022_1Q	9,519	NULL	NULL
2022_2Q	7,581	9,519	-1,938
2022_3Q	4,273	7,581	-3,308
2022_4Q	7,403	4,273	3,130
2023_1Q	7,403	7,403	0
2023_2Q	4,956	7,403	-2,447
2023_3Q	7,795	4,956	2,839

There are no previous values for 2022_1Q, as there is no previous quarter. The quarterly sales for 2022_2Q are 7,581. The sales in the previous quarter were 9,519. The calculation shows that the current sales are 1,938 copies (7,581 - 9,519) below the previous quarter's sales.

You can analyze the rest of the output the same way.

Solve This Exercise for Practice

Rewrite the above query so it shows the difference between quarterly sales on a year-over-year basis – e.g., compare the first quarter of 2023 with the first quarter of 2022. Show the sales period, copies sold in the quarter, sales for the same quarter in the previous year, and the year-over-year difference between the quarters.

Solution:

SELECT *,
	 LAG(quarterly_copies_sold, 4) OVER (ORDER BY sales_period) AS year_over_year_sales,
	 quarterly_copies_sold - LAG(quarterly_copies_sold, 4) OVER (ORDER BY sales_period) AS year_over_year_difference
FROM (SELECT sales_period,
	   SUM(copies_sold) AS quarterly_copies_sold
FROM album_catalogue
GROUP BY sales_period) AS quarterly_sales;

Output:

sales_period	quarterly_copies_sold	year_over_year_sales	year_over_year_difference
2022_1Q	9,519	NULL	NULL
2022_2Q	7,581	NULL	NULL
2022_3Q	4,273	NULL	NULL
2022_4Q	7,403	NULL	NULL
2023_1Q	7,403	9,519	-2,116
2023_2Q	4,956	7,581	-2,625
2023_3Q	7,795	4,273	3,522

If you want more, here are additional window function examples. For practice materials, take a look at these 11 SQL window functions exercises.

SQL Window Functions: A Window to Better Data Analysis

This turned out to be a pretty comprehensive article about SQL window functions. You learned the window functions and how each crucial part of their syntax works.

You also know there are several categories of window functions. The most commonly used are aggregate, ranking, and analytical window functions. The practical examples showed you how window functions can be used in common data analysis tasks.

Hopefully, you didn’t skip the exercises in the article. If you did, I once again recommend that you solve them. Only through practice can you really bring home what SQL window functions are about.

The richest resource for learning and practicing is our Window Functions course. It is an interactive course that has over 200 hands-on exercises and covers the full syntax of window functions. If you have job interviews lined up, make sure you go through these SQL window functions interview questions. Good luck, and keep on learning SQL!

Tags:

window functions

A Brief History of SQL Window Functions

What’s a Window?

SQL Window Function Syntax

Dataset and Syntax Examples

Syntax Example #1: OVER ()

Syntax Example #2: OVER (ORDER BY)

Syntax Example #3: OVER (PARTITION BY)

Syntax Example #4: OVER (ORDER BY PARTITION BY)

What Are the Most Common Window Functions?

Real-World SQL Window Function Examples

Example #1: Percentage of Total

Window Function vs. Aggregate Function vs. GROUP BY

Solve This Exercise for Practice

Example #2: Rank Data

Solve This Exercise for Practice

Example #3: Running Total

Solve this exercise for practice

Example #4: Quarter-To-Quarter Difference

Solve This Exercise for Practice

SQL Window Functions: A Window to Better Data Analysis

You may also like