The SQL OVER() Clause Explained
This is a detailed guide to the OVER() clause in SQL window functions. I will focus specifically on this clause, so you should already have at least a general idea of how window functions work. To make the best of this article, take a look at the SQL Window Functions Cheat Sheet. Feel free to keep it by your side as we go.
What Is the SQL OVER() Clause?
Let’s start by answering the first question that springs to mind: what is an OVER() clause? It is what makes a function a window function.
The OVER() clause lets you define the windows for window functions. It also lets you define the order in which a given window function is executed.
The SQL OVER() Clause Syntax
The SQL syntax of this clause is as follows:
```sql
SELECT
  <column name>,
  <window function> OVER (
    [PARTITION BY <column names>]
    [ORDER BY <column names>]
    [<ROWS or RANGE clause>]
  )
FROM table;
```
The three distinct parts of the OVER() clause syntax are:
- PARTITION BY
- ORDER BY
- The window frame (ROWS or RANGE clause)
I’ll walk through each of these.
PARTITION BY
This SQL clause lets you define the window. It partitions a dataset into smaller segments called windows.
When PARTITION BY is omitted, the window function is applied to the whole dataset. Let’s use a small example dataset to see how it works:
id | date | sales | product |
---|---|---|---|
1 | 2021-12-06 | 1,357.22 | Ax |
2 | 2021-12-06 | 2,154.88 | Hammer |
3 | 2021-12-07 | 1,984.88 | Ax |
4 | 2021-12-07 | 3,147.11 | Hammer |
If I want to see total sales by product, using GROUP BY comes to mind. I can write a query like this:
```sql
SELECT
  product,
  SUM(sales) AS sales_per_product
FROM product_sales
GROUP BY product;
```
The result is:
product | sales_per_product |
---|---|
Ax | 3,342.10 |
Hammer | 5,301.99 |
GROUP BY works fine if I only want to see aggregate values for each group. However, if I want to see aggregate values for each group while preserving row-level data, I need to use PARTITION BY. For example:
```sql
SELECT
  product,
  date,
  sales,
  SUM(sales) OVER (PARTITION BY product) AS sales_per_product
FROM product_sales;
```
This query is just to show you what PARTITION BY does. Don’t worry; later on, I’ll show you example queries and explain them in detail. For now, take a look at the result of the query above:
product | date | sales | sales_per_product |
---|---|---|---|
Ax | 2021-12-06 | 1,357.22 | 3,342.10 |
Ax | 2021-12-07 | 1,984.88 | 3,342.10 |
Hammer | 2021-12-06 | 2,154.88 | 5,301.99 |
Hammer | 2021-12-07 | 3,147.11 | 5,301.99 |
Here, I have sales per product as in the previous example, but I also have sales data for every product and date. The partition in this example is the product. This means the ax is one window partition, while the hammer is another window partition. The window function sums all sales values for the ax; it then goes to the next window partition, resets, and sums all sales values for the hammer.
This illustrates the main difference between using GROUP BY and PARTITION BY: GROUP BY collapses rows and returns aggregate values in one row only, while PARTITION BY preserves all rows and returns aggregate values in as many rows as there are in the partition.
If you’d like to dig deeper into this topic, read our article dedicated to the difference between GROUP BY and PARTITION BY.
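As a side note, you can even recover the GROUP BY-style output from a window function. Here’s a minimal sketch, using the same product_sales table:

```sql
-- Deduplicating the partition-level values reproduces the GROUP BY result:
SELECT DISTINCT
  product,
  SUM(sales) OVER (PARTITION BY product) AS sales_per_product
FROM product_sales;
```

This works because DISTINCT is evaluated after the window functions.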
ORDER BY
You can probably guess what this SQL clause does. Yes, it orders something. But what?
Unlike the regular ORDER BY, the ORDER BY in an OVER() clause does not sort the query result. When used in window functions as part of an OVER() clause, ORDER BY defines how window functions are executed.
Both ASC and DESC may be used with it. ASC means the window function is executed in ascending order, e.g., A-Z, from the lowest to the highest number, from the oldest to the newest date, etc. If you use DESC, the window function is executed in the reverse order. If ORDER BY is omitted, the window function is executed in arbitrary order.
Whichever way you use it, keep in mind that ORDER BY in an OVER() clause has nothing to do with how the query output is sorted. You may still sort the result even when using window functions; you just have to write another ORDER BY clause at the end of your query, like you usually do.
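Here’s a minimal sketch of what that looks like, again using the product_sales table; note that the two ORDER BY clauses play completely different roles:

```sql
SELECT
  product,
  date,
  sales,
  -- This ORDER BY controls the order in which the window function
  -- is applied within each partition...
  SUM(sales) OVER (PARTITION BY product ORDER BY date) AS running_total
FROM product_sales
-- ...while this ORDER BY only sorts the final output.
ORDER BY sales DESC;
```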
The Window Frame
A window frame is a set of rows that are related to the current row. The current row is where the window function is being executed.
Window frames are defined using the ROWS or RANGE keyword. These keywords define the upper and lower bounds of a window frame, thus defining the frame itself. The difference between these two keywords is explained in our article about the RANGE clause.
The Default Window Frame
The default window frame depends on whether ORDER BY is omitted or not. There are two possible scenarios when ORDER BY is omitted:
- Using OVER (), omitting both PARTITION BY and ORDER BY.
- Using OVER (PARTITION BY …), omitting only ORDER BY.
In both cases, the default window frame is ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This means the window frame is all the rows of the partition (all rows before the current row, the current row, and all rows after the current row).
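In other words, these two window functions always return the same result; a sketch using the product_sales table from earlier:

```sql
-- Without ORDER BY, the frame defaults to the entire partition:
SELECT
  product,
  sales,
  SUM(sales) OVER (PARTITION BY product) AS implicit_frame,
  SUM(sales) OVER (
    PARTITION BY product
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS explicit_frame  -- always equal to implicit_frame
FROM product_sales;
```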
In addition, there are two possible scenarios when ORDER BY is not omitted:
- OVER (ORDER BY …): PARTITION BY is omitted.
- OVER (PARTITION BY … ORDER BY …): neither PARTITION BY nor ORDER BY is omitted.
In these two cases, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This means all values (not rows!) before the current row and the current row itself; rows that share the current row’s ORDER BY value (its peers) are also included in the frame.
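Again, spelling out the default makes the equivalence visible; a sketch with the product_sales table:

```sql
-- With ORDER BY, the frame defaults to RANGE ... AND CURRENT ROW:
SELECT
  product,
  date,
  sales,
  SUM(sales) OVER (PARTITION BY product ORDER BY date) AS implicit_frame,
  SUM(sales) OVER (
    PARTITION BY product
    ORDER BY date
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS explicit_frame  -- always equal to implicit_frame
FROM product_sales;
```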
Meaning of Window Frame Abbreviations
While I’m at it, here’s a list of all possible upper and lower bounds for the window frame and what they mean:
Abbreviation | Meaning |
---|---|
UNBOUNDED PRECEDING | BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
n PRECEDING | BETWEEN n PRECEDING AND CURRENT ROW |
CURRENT ROW | BETWEEN CURRENT ROW AND CURRENT ROW |
n FOLLOWING | BETWEEN CURRENT ROW AND n FOLLOWING |
UNBOUNDED FOLLOWING | BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING |
If you want to see this in an example, here’s an article about defining a window frame in SQL window functions.
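To give you a quick taste here, the n PRECEDING bound is what a classic moving average is built on. A minimal sketch, assuming the product_sales table again:

```sql
-- Three-row moving average: the current row plus the two rows before it.
SELECT
  date,
  sales,
  AVG(sales) OVER (
    ORDER BY date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_avg
FROM product_sales;
```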
Now that we have gone through the technical stuff, it’s time to explain the OVER() clause with examples.
The SQL OVER() Clause Examples
To understand the examples, you have to become familiar with the data first.
The Dataset Used
All four examples use the same dataset, which consists of three tables:
- stores
- products
- inventory
The table stores is a list of the stores selling tea. Here’s what it looks like:
id | store_name |
---|---|
1 | Zaandam |
2 | Amsterdam |
3 | Rotterdam |
4 | Utrecht |
The next table lists the tea producers and the products they make; it’s the products table:
id | product_brand | product_name |
---|---|---|
1 | Mr. Wellington | Earl Grey |
2 | Karnataka Tea Company | Kashmir Tchai |
3 | Fukuyama & Co. | Matcha |
4 | Fukuyama & Co. | Sencha |
The table inventory shows the quantities of tea available in every store. It has much more data than the first two tables, so here are just the first several rows:
id | inventory_date | opening_quantity | closing_quantity | product_id | store_id |
---|---|---|---|---|---|
1 | 2021-12-06 | 13 | 8 | 1 | 1 |
2 | 2021-12-07 | 8 | 1 | 1 | 1 |
3 | 2021-12-08 | 52 | 50 | 1 | 1 |
4 | 2021-12-09 | 50 | 44 | 1 | 1 |
5 | 2021-12-10 | 44 | 32 | 1 | 1 |
6 | 2021-12-06 | NULL | NULL | 2 | 1 |
7 | 2021-12-07 | NULL | NULL | 2 | 1 |
8 | 2021-12-08 | 28 | 20 | 2 | 1 |
9 | 2021-12-09 | 20 | 17 | 2 | 1 |
10 | 2021-12-10 | 17 | 17 | 2 | 1 |
As you can see, there are NULL values. They mean the tea was not available when the store opened or closed. In the data above, product_id = 2 in store_id = 1 wasn’t available for two days: 2021-12-06 and 2021-12-07.
Generally, the closing_quantity of a given day is the opening_quantity of the following day. If it’s not, it means a new order was delivered and accounted for before the store opened. This is the case, for example, between the rows with id = 2 and id = 3: a closing quantity of 1 on 2021-12-07 becomes an opening quantity of 52 on 2021-12-08.
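If you want to follow along in your own database, here’s a minimal sketch of the schema. The column types are my assumption; the dataset itself doesn’t specify them:

```sql
-- Hypothetical DDL for the example dataset; types are assumed.
CREATE TABLE stores (
  id         integer PRIMARY KEY,
  store_name varchar(50)
);

CREATE TABLE products (
  id            integer PRIMARY KEY,
  product_brand varchar(50),
  product_name  varchar(50)
);

CREATE TABLE inventory (
  id               integer PRIMARY KEY,
  inventory_date   date,
  opening_quantity integer,  -- NULL if unavailable at store open
  closing_quantity integer,  -- NULL if unavailable at store close
  product_id       integer REFERENCES products (id),
  store_id         integer REFERENCES stores (id)
);
```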
Ready for some OVER() clause examples? I bet you are!
Example 1: With PARTITION BY but Without ORDER BY
In this example, the task is to show the inventory date, the quantity at store open, and the quantity at store close. We also want the highest quantity at store close for each day regardless of the store or the product.
To do that, I’ve written this query:
```sql
SELECT
  inventory_date,
  opening_quantity,
  closing_quantity,
  MAX(closing_quantity) OVER (PARTITION BY inventory_date) AS daily_max
FROM inventory;
```
This query uses only PARTITION BY in the OVER() clause. What does this query do?
First, it selects three columns from the table inventory. Then I use the MAX() aggregate function to find the highest product quantity at store close. I use the OVER() clause to make this aggregate function a window function. In it, I partition the window by the column inventory_date. This means I get the highest product quantity at store close for each inventory date, across all stores and all products.
The table below shows all the data for the first date (December 6) and the first row of the second date (December 7) for simplicity. Of course, the query returns much more data.
inventory_date | opening_quantity | closing_quantity | daily_max |
---|---|---|---|
2021-12-06 | 128 | 51 | 397 |
2021-12-06 | NULL | NULL | 397 |
2021-12-06 | 87 | 87 | 397 |
2021-12-06 | 22 | 21 | 397 |
2021-12-06 | 13 | 8 | 397 |
2021-12-06 | 247 | 200 | 397 |
2021-12-06 | 42 | 39 | 397 |
2021-12-06 | 33 | 33 | 397 |
2021-12-06 | 44 | 41 | 397 |
2021-12-06 | 73 | 71 | 397 |
2021-12-06 | 97 | 94 | 397 |
2021-12-06 | 412 | 397 | 397 |
2021-12-06 | 39 | 33 | 397 |
2021-12-06 | 62 | 58 | 397 |
2021-12-06 | 49 | 44 | 397 |
2021-12-06 | 79 | NULL | 397 |
2021-12-07 | 58 | 54 | 327 |
The table lists the quantities at store open and store close for every store-product combination. The highest quantity at store close is in the column daily_max. As you can see, it is 397 for the date 2021-12-06. You can verify this is the highest closing value for that date; it appears in the row with an opening quantity of 412. The NULL values are shown, but they don’t impact the result.
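You can verify this NULL behavior with a quick standalone query (PostgreSQL syntax):

```sql
-- MAX() simply skips NULL inputs:
SELECT MAX(q) FROM (VALUES (8), (NULL), (397), (200)) AS t(q);
-- returns 397
```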
The data is presented the same way for every date in the dataset. You can see the next date is 2021-12-07, and the highest product quantity at store close for this date is 327. The same principle applies to the rest of the table.
Example 2: With Both PARTITION BY and ORDER BY
In this example, I’ll show you how to use both PARTITION BY and ORDER BY in an OVER() clause. To do that, I write a query that displays the product brand, the product name, and the inventory date. I also want to show the daily quantity sold. In addition, I want to see the daily cumulative quantity sold per brand across all stores.
You need to know how to calculate a running total for this example. Before showing you the query, let’s make sure you understand what a running total is. Here it is with the example data:
id | date | sales | running_total | product |
---|---|---|---|---|
1 | 2021-12-06 | 1,357.22 | 1,357.22 | Ax |
2 | 2021-12-06 | 2,154.88 | 3,512.10 | Hammer |
3 | 2021-12-07 | 1,984.88 | 5,496.98 | Ax |
4 | 2021-12-07 | 3,147.11 | 8,644.09 | Hammer |
A running total is the sum of the current row plus all the preceding rows. In the first row, it’s 1,357.22 + 0 = 1,357.22 because there are no preceding rows. For the next row, it is 2,154.88 + 1,357.22 = 3,512.10. The same logic applies to all remaining rows: 1,984.88 + 3,512.10 = 5,496.98 and 3,147.11 + 5,496.98 = 8,644.09.
Now let’s calculate the running total in SQL:
```sql
SELECT
  product_brand,
  product_name,
  inventory_date,
  opening_quantity - closing_quantity AS daily_sales,
  SUM(opening_quantity - closing_quantity) OVER (
    PARTITION BY product_brand
    ORDER BY inventory_date
  ) AS cumulative_sales_per_brand
FROM products p
JOIN inventory i ON p.id = i.product_id;
```
The query selects certain columns from the tables products and inventory. Then, I calculate the difference between the quantities at store open and store close as the column daily_sales. After that, this difference is used in the SUM() window function to get the sum of daily sales. The window is partitioned by the column product_brand because I want to see the sum of daily sales at the brand level. The ordinary SUM() aggregate function now becomes a window function that calculates the cumulative sum.
Finally, the window function is executed by inventory date in ascending order. Had I not used ORDER BY, the SUM() function would have been executed in arbitrary order, which is not much of a cumulative sum/running total.
Note: If neither ASC nor DESC is specified, ASC is the default.
The query returns the result below. Again, I’m only showing a handful of rows:
product_brand | product_name | inventory_date | daily_sales | cumulative_sales_per_brand |
---|---|---|---|---|
Fukuyama & Co. | Matcha | 2021-12-06 | 5 | 27 |
Fukuyama & Co. | Sencha | 2021-12-06 | 1 | 27 |
Fukuyama & Co. | Matcha | 2021-12-06 | 0 | 27 |
Fukuyama & Co. | Sencha | 2021-12-06 | 0 | 27 |
Fukuyama & Co. | Sencha | 2021-12-06 | NULL | 27 |
Fukuyama & Co. | Matcha | 2021-12-06 | 3 | 27 |
Fukuyama & Co. | Sencha | 2021-12-06 | 15 | 27 |
Fukuyama & Co. | Matcha | 2021-12-06 | 3 | 27 |
Fukuyama & Co. | Matcha | 2021-12-07 | 4 | 224 |
Fukuyama & Co. | Sencha | 2021-12-07 | 98 | 224 |
Fukuyama & Co. | Matcha | 2021-12-07 | 3 | 224 |
Fukuyama & Co. | Matcha | 2021-12-07 | 11 | 224 |
Fukuyama & Co. | Sencha | 2021-12-07 | 1 | 224 |
Fukuyama & Co. | Sencha | 2021-12-07 | 70 | 224 |
Fukuyama & Co. | Matcha | 2021-12-07 | 5 | 224 |
Fukuyama & Co. | Sencha | 2021-12-07 | 5 | 224 |
According to the result, the quantity sold of Fukuyama & Co. products is 27 for 2021-12-06. The total quantity sold on 2021-12-07 is 197. However, the quantity in the column cumulative_sales_per_brand is 224. Why is that? Because it’s a cumulative sum: it shows the total sold on that day plus the totals of all prior days. In other words, you get 224 if you sum all the daily_sales values for both dates (27 + 197 = 224).
This is done the same way for all other dates and brands.
Example 3: With PARTITION BY, ORDER BY, and a Window Frame
Now, say you want to show the store name, the inventory date, the brand, and the product name. Again, we will show the daily quantity sold, but this time with the average daily quantity sold by each store. This is a cumulative average that changes with every passing day and from the lowest to the highest daily quantity sold.
The general principles of calculating a cumulative average are shown in the table below:
id | date | sales | cumulative_average | product |
---|---|---|---|---|
1 | 2021-12-06 | 1,357.22 | 1,357.22 | Ax |
2 | 2021-12-06 | 2,154.88 | 1,756.05 | Hammer |
3 | 2021-12-07 | 1,984.88 | 1,832.33 | Ax |
4 | 2021-12-07 | 3,147.11 | 2,161.02 | Hammer |
A moving average is the average of a subset of rows. Generally, a moving average includes the current row, a certain number of rows or values preceding it, and/or a certain number of rows or values following it. In the example above, I calculate an average that includes the current row and all the rows preceding it. This is a type of moving average, but I’ll call it a cumulative average to make the distinction.
The cumulative average for the first row is 1,357.22. This is the same as the sales value of the row, because there are no rows before the first one. The calculation for the second row looks like this: (1,357.22 + 2,154.88)/2 = 1,756.05; for the next one: (1,357.22 + 2,154.88 + 1,984.88)/3 = 1,832.33. The last cumulative average is calculated by following the same principle: (1,357.22 + 2,154.88 + 1,984.88 + 3,147.11)/4 = 2,161.02.
Now, the cumulative average in SQL:
```sql
SELECT
  store_name,
  inventory_date,
  product_brand,
  product_name,
  opening_quantity - closing_quantity AS daily_sales,
  AVG(opening_quantity - closing_quantity) OVER (
    PARTITION BY store_name
    ORDER BY inventory_date, opening_quantity - closing_quantity
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_avg_sales
FROM stores s
JOIN inventory i ON s.id = i.store_id
JOIN products p ON p.id = i.product_id;
```
I select several columns from all three tables. Then, I calculate the daily sales the same way as in Example 2. I use the AVG() aggregate function to calculate the average daily sales. I need the average by store, so I partition the window by the column store_name.
Then, I order the data in the partition by inventory date and daily sales using an ORDER BY clause. This means the window function calculates the average from the oldest to the newest date and, within a date, from the lowest to the highest daily sales. Why? Because I want to see my cumulative averages as they change.
Finally, the window frame is defined as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The average includes the current row and all the preceding rows (within the defined window partition, of course).
Take a look at the resulting table:
store_name | inventory_date | product_brand | product_name | daily_sales | cumulative_avg_sales |
---|---|---|---|---|---|
Zaandam | 2021-12-06 | Fukuyama & Co. | Matcha | 0 | 0.00 |
Zaandam | 2021-12-06 | Fukuyama & Co. | Sencha | 1 | 0.50 |
Zaandam | 2021-12-06 | Mr. Wellington | Earl Grey | 5 | 2.00 |
Zaandam | 2021-12-06 | Karnataka Tea Company | Kashmir Tchai | NULL | 2.00 |
Zaandam | 2021-12-07 | Fukuyama & Co. | Sencha | 1 | 1.75 |
Zaandam | 2021-12-07 | Mr. Wellington | Earl Grey | 7 | 2.80 |
Zaandam | 2021-12-07 | Fukuyama & Co. | Matcha | 11 | 4.17 |
Zaandam | 2021-12-07 | Karnataka Tea Company | Kashmir Tchai | NULL | 4.17 |
Zaandam | 2021-12-08 | Fukuyama & Co. | Matcha | 0 | 3.57 |
Zaandam | 2021-12-08 | Fukuyama & Co. | Sencha | 1 | 3.25 |
Zaandam | 2021-12-08 | Mr. Wellington | Earl Grey | 2 | 3.11 |
Zaandam | 2021-12-08 | Karnataka Tea Company | Kashmir Tchai | 8 | 3.60 |
Zaandam | 2021-12-09 | Fukuyama & Co. | Sencha | 2 | 3.45 |
Zaandam | 2021-12-09 | Karnataka Tea Company | Kashmir Tchai | 3 | 3.42 |
Zaandam | 2021-12-09 | Mr. Wellington | Earl Grey | 6 | 3.62 |
Zaandam | 2021-12-09 | Fukuyama & Co. | Matcha | 24 | 5.07 |
Zaandam | 2021-12-10 | Karnataka Tea Company | Kashmir Tchai | 0 | 4.73 |
Zaandam | 2021-12-10 | Fukuyama & Co. | Matcha | 0 | 4.44 |
Zaandam | 2021-12-10 | Fukuyama & Co. | Sencha | 1 | 4.24 |
Zaandam | 2021-12-10 | Mr. Wellington | Earl Grey | 12 | 4.67 |
Let me prove the calculation is correct by using the same principle you already know. The cumulative average for the first row is 0.00 because the value of this row is 0 and there are no preceding rows.
How about the second row? (0 + 1)/2 = 0.50. So far, so good! The third row: (0 + 1 + 5)/3 = 2.00.
That’s easy. It seems the same should work for the next row: treating the NULL as 0 gives (0 + 1 + 5 + 0)/4 = 1.5. But the table above says the cumulative average is 2.00, not 1.5.
Ha, SQL has made an error, right? No, you have made an error if you treated the NULL value as 0. When calculating averages, SQL simply ignores NULL values. So the calculation for this row is again (0 + 1 + 5)/3 = 2.00, which is exactly what the table shows.
How about the next row? You skip the row with the NULL value and calculate the cumulative average this way: (0 + 1 + 5 + 1)/4 = 1.75.
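If you want to double-check how AVG() treats NULLs, here’s a quick standalone sketch (PostgreSQL syntax):

```sql
-- AVG() ignores NULLs: it divides by the count of non-NULL values (3), not 4.
SELECT AVG(x) FROM (VALUES (0), (1), (5), (NULL)) AS t(x);
-- returns 2.00
```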
Example 4: Without PARTITION BY but With ORDER BY DESC
This time, I’ll show you how to use a window function without the PARTITION BY clause but with the data in the window sorted in descending order.
I want to rank the data by the quantity at store open in descending order. In addition to the rank, I want the result to show the following columns: inventory_date, product_brand, product_name, and opening_quantity.
Take a look:
```sql
SELECT
  inventory_date,
  product_brand,
  product_name,
  opening_quantity,
  DENSE_RANK() OVER (ORDER BY opening_quantity DESC) AS rank
FROM inventory i
JOIN products p ON i.product_id = p.id;
```
First, there are several columns from the tables inventory and products. I use the DENSE_RANK() function to rank the values. You could also use the RANK() or ROW_NUMBER() function, but the result may be slightly different.
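To see how they differ, here’s a standalone sketch (with hypothetical values) comparing the three ranking functions on tied values:

```sql
-- With two rows tied at 87:
--   ROW_NUMBER() numbers them differently (ties broken arbitrarily),
--   RANK() gives both the same rank but leaves a gap afterwards,
--   DENSE_RANK() gives both the same rank with no gap.
SELECT
  q,
  ROW_NUMBER() OVER (ORDER BY q DESC) AS row_num,
  RANK()       OVER (ORDER BY q DESC) AS rnk,
  DENSE_RANK() OVER (ORDER BY q DESC) AS dense_rnk
FROM (VALUES (412), (87), (87), (52)) AS t(q);
```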
In the OVER() clause, there’s an ORDER BY used with the DESC keyword, so the ranking goes from the highest to the lowest quantity at store open. No partition is defined, so the ranking is applied to the whole dataset.
Here are several rows from the data the query returns:
inventory_date | product_brand | product_name | opening_quantity | rank |
---|---|---|---|---|
2021-12-06 | Karnataka Tea Company | Kashmir Tchai | NULL | 1 |
2021-12-07 | Karnataka Tea Company | Kashmir Tchai | NULL | 1 |
2021-12-09 | Fukuyama & Co. | Sencha | NULL | 1 |
2021-12-08 | Karnataka Tea Company | Kashmir Tchai | 820 | 2 |
2021-12-09 | Karnataka Tea Company | Kashmir Tchai | 809 | 3 |
2021-12-09 | Karnataka Tea Company | Kashmir Tchai | 704 | 4 |
2021-12-06 | Fukuyama & Co. | Sencha | 412 | 5 |
2021-12-07 | Fukuyama & Co. | Sencha | 397 | 6 |
2021-12-08 | Fukuyama & Co. | Sencha | 327 | 7 |
As you can see, all NULL values are ranked 1. This is because I’m using PostgreSQL, which puts NULL values at the top when data is sorted in descending order. Since the first three rows have the same value, they share the same rank.
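If you’d rather have the NULLs at the bottom of the ranking, PostgreSQL lets you override this default explicitly:

```sql
-- NULLS LAST overrides the NULLS FIRST default implied by DESC:
SELECT
  inventory_date,
  product_brand,
  product_name,
  opening_quantity,
  DENSE_RANK() OVER (ORDER BY opening_quantity DESC NULLS LAST) AS rank
FROM inventory i
JOIN products p ON i.product_id = p.id;
```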
The quantity 820 is ranked 2, while the quantity 809 is ranked 3. This goes on in the same way throughout the dataset.
To show that the same rank is allocated not only when the values are NULL, here are two rows you don’t see above:
inventory_date | product_brand | product_name | opening_quantity | rank |
---|---|---|---|---|
2021-12-06 | Fukuyama & Co. | Matcha | 87 | 19 |
2021-12-07 | Fukuyama & Co. | Matcha | 87 | 19 |
As you see, the rows with duplicate values are ranked the same.
Ranking functions can be very useful. I recommend you learn how to use all the ranking functions with PARTITION BY, too.
Learning the SQL OVER() Clause Is Not Yet Over
My explanation of the OVER() clause is done, but that doesn’t mean your learning and practice are over. Now that you understand PARTITION BY, ORDER BY, and the window frame, continue to build on that knowledge.
If you like reading about window functions, here’s a list of the best articles about window functions. If practice is your thing, here’s an article showing you several window function examples.
The best combination of learning through reading and practicing is our Window Functions Course. It’s part of the Advanced SQL Track, which, aside from window functions, covers topics such as GROUP BY extensions and recursive queries. Check it out!