
The SQL OVER() Clause Explained

Here is a detailed explanation of the OVER() clause in SQL window functions.

This is a detailed guide on using the OVER() clause in SQL window functions. I will focus specifically on this clause, so you should already have at least a general idea of how window functions work. To get the most out of this article, take a look at the SQL Window Functions Cheat Sheet. Feel free to keep it by your side as we go.

What Is the SQL OVER() Clause?

Let’s start by answering the first question that springs to mind: what is an OVER() clause? It’s what turns an ordinary function into a window function.

The OVER() clause lets you define the windows for window functions. It also lets you define the order in which a given window function is executed.

The SQL OVER() Clause Syntax

The SQL syntax of this clause is as follows:

SELECT
  <column name>,
  <window function> OVER([PARTITION BY <column names>] [ORDER BY <column names>]  [ <ROWS or RANGE clause> ])
FROM table;

The three distinct parts of the OVER() clause syntax are:

  • PARTITION BY
  • ORDER BY
  • The window frame (ROWS or RANGE clause)

I’ll walk through each of these.

PARTITION BY

This SQL clause lets you define the window. It partitions a dataset into smaller segments called windows.

When PARTITION BY is omitted, the window function is applied to the whole dataset. Let’s use a small example dataset to see how it works:

id | date       | sales    | product
1  | 2021-12-06 | 1,357.22 | Ax
2  | 2021-12-06 | 2,154.88 | Hammer
3  | 2021-12-07 | 1,984.88 | Ax
4  | 2021-12-07 | 3,147.11 | Hammer

If I want to see total sales by product, using GROUP BY comes to mind. I can write a query like this:

SELECT	product,
		SUM(sales) AS sales_per_product
FROM product_sales
GROUP BY product;

The result is:

product | sales_per_product
Ax      | 3,342.10
Hammer  | 5,301.99

GROUP BY works fine if I only want to see aggregate values for each group. However, if I want to see aggregate values for each group while preserving row-level data, I need to use PARTITION BY. For example:

SELECT	product,
		date,
		sales,
		SUM(sales) OVER (PARTITION BY product) AS sales_per_product
FROM product_sales;

This query is just to show you what PARTITION BY does. Don’t worry; later on, I’ll show you example queries and explain them in detail. For now, take a look at the result of the query above:

product | date       | sales    | sales_per_product
Ax      | 2021-12-06 | 1,357.22 | 3,342.10
Ax      | 2021-12-07 | 1,984.88 | 3,342.10
Hammer  | 2021-12-06 | 2,154.88 | 5,301.99
Hammer  | 2021-12-07 | 3,147.11 | 5,301.99

Here, I have sales per product as in the previous example, but I also have sales data for every product and date. The partition in this example is the product. This means the ax is one window partition, while the hammer is another window partition. The window function sums all sales values for the ax; it then goes to the next window partition, resets, and sums all sales values for the hammer.

This illustrates the main difference between using GROUP BY and PARTITION BY: GROUP BY collapses rows and returns aggregate values in one row only, while PARTITION BY preserves all rows and returns aggregate values in as many rows as there are in the partition.

If you’d like to clarify this topic further, read an article dedicated to the difference between GROUP BY and PARTITION BY.

ORDER BY

You can probably guess what this SQL clause does. Yes, it orders something. But what?

Unlike the regular ORDER BY, the ORDER BY in an OVER() clause does not sort the query result. When used in window functions as part of an OVER() clause, ORDER BY defines the order in which the window function processes the rows of each window.

Both ASC and DESC may be used with it. ASC means the window function is executed in ascending order, e.g., A-Z, from the lowest to the highest number, from the oldest to the newest date, etc. If you use DESC, the window function is executed in reverse order compared to ASC. If ORDER BY is omitted, the window function is executed in arbitrary order.

Whichever way you use it, keep in mind ORDER BY in an OVER() clause does not have to do with how the query output is sorted. You may still sort the result even when using window functions; you just have to write another ORDER BY clause at the end of your query like you usually do.
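To make the difference tangible, here’s a minimal sketch using the product_sales table from earlier. The ORDER BY inside the OVER() clause controls how the running sum is calculated; the ORDER BY at the end only sorts the rows the query returns:

SELECT	product,
		date,
		sales,
		SUM(sales) OVER (PARTITION BY product ORDER BY date) AS running_sales
FROM product_sales
ORDER BY sales DESC;  -- sorts the output only; the running sum is unaffected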

The Window Frame

A window frame is a set of rows that are related to the current row. The current row is where the window function is being executed.

Window frames are defined using the RANGE or ROWS keyword. These keywords define the upper and lower bounds of a window frame, thus defining the frame itself. The difference between these two keywords is explained in an article about the RANGE clause.
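To give you a quick taste of that difference before you read it, here’s a sketch on the product_sales table. Since there are two rows for each date, ROWS and RANGE give different running totals: ROWS counts physical rows up to the current one, while RANGE also includes every row that shares the current row’s date.

SELECT	date,
		sales,
		SUM(sales) OVER (ORDER BY date
			ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_running_total,
		SUM(sales) OVER (ORDER BY date
			RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_running_total
FROM product_sales;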

The Default Window Frame

The default window frame depends on whether ORDER BY is omitted or not. There are two possible scenarios when ORDER BY is omitted:

  • Using OVER (), omitting both PARTITION BY and ORDER BY.
  • Using OVER (PARTITION BY …), omitting only ORDER BY.

In both cases, the default window frame is ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This means the window frame is the entire partition: all rows before the current row, the current row, and all rows after it.

In addition, there are two possible scenarios when ORDER BY is not omitted:

  • OVER (ORDER BY …): PARTITION BY is omitted.
  • OVER (PARTITION BY … ORDER BY …): neither PARTITION BY nor ORDER BY is omitted.

In these two cases, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This means all rows from the start of the partition up to and including the current row and its peers, i.e., the rows with the same value in the ORDER BY column; RANGE works with values, not physical rows.
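If you want to see this for yourself, here’s a minimal sketch on the product_sales table that spells out both defaults explicitly; each pair of columns should return identical values:

SELECT	product,
		date,
		sales,
		-- Without ORDER BY, the next two columns are equivalent:
		SUM(sales) OVER (PARTITION BY product) AS default_frame,
		SUM(sales) OVER (PARTITION BY product
			ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS explicit_frame,
		-- With ORDER BY, the next two columns are equivalent:
		SUM(sales) OVER (PARTITION BY product ORDER BY date) AS default_frame_ordered,
		SUM(sales) OVER (PARTITION BY product ORDER BY date
			RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS explicit_frame_ordered
FROM product_sales;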

Meaning of Window Frame Abbreviations

While I’m at it, here’s a list of all possible upper and lower bounds for the window frame and what they mean:

Abbreviation        | Meaning
UNBOUNDED PRECEDING | BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
n PRECEDING         | BETWEEN n PRECEDING AND CURRENT ROW
CURRENT ROW         | BETWEEN CURRENT ROW AND CURRENT ROW
n FOLLOWING         | BETWEEN CURRENT ROW AND n FOLLOWING
UNBOUNDED FOLLOWING | BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

If you want to see this in an example, here’s an article about defining a window frame in SQL window functions.
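As a quick illustration of the shorthand, here’s a sketch of a three-row moving sum on the product_sales table; the short and the full form of the frame produce the same result:

SELECT	date,
		sales,
		SUM(sales) OVER (ORDER BY date ROWS 2 PRECEDING) AS short_form,
		SUM(sales) OVER (ORDER BY date
			ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS full_form
FROM product_sales;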

Now that we have gone through the technical stuff, it’s time to explain the OVER() clause with examples.

The SQL OVER() Clause Examples

To understand the examples, you have to become familiar with the data first.

The Dataset Used

All four examples use the same dataset, which consists of three tables:

  • stores
  • products
  • inventory

The table stores is a list of the stores selling tea. Here’s what it looks like:

id | store_name
1  | Zaandam
2  | Amsterdam
3  | Rotterdam
4  | Utrecht

The next table shows the list of tea producers and the product they make; it’s the products table:

id | product_brand         | product_name
1  | Mr. Wellington        | Earl Grey
2  | Karnataka Tea Company | Kashmir Tchai
3  | Fukuyama & Co.        | Matcha
4  | Fukuyama & Co.        | Sencha

The table inventory shows the quantities of tea available for every store. It has much more data than the first two tables, so here are just the first several rows:

id | inventory_date | opening_quantity | closing_quantity | product_id | store_id
1  | 2021-12-06     | 13               | 8                | 1          | 1
2  | 2021-12-07     | 8                | 1                | 1          | 1
3  | 2021-12-08     | 52               | 50               | 1          | 1
4  | 2021-12-09     | 50               | 44               | 1          | 1
5  | 2021-12-10     | 44               | 32               | 1          | 1
6  | 2021-12-06     | NULL             | NULL             | 2          | 1
7  | 2021-12-07     | NULL             | NULL             | 2          | 1
8  | 2021-12-08     | 28               | 20               | 2          | 1
9  | 2021-12-09     | 20               | 17               | 2          | 1
10 | 2021-12-10     | 17               | 17               | 2          | 1

As you can see, there are NULL values. They mean the tea was not available when the store opened or closed. In the above, product_id = 2 in store_id = 1 wasn’t available for two days: 2021-12-06 and 2021-12-07.

Generally, the closing_quantity of a given day is the opening_quantity of the following day. If it’s not, it means a new order was delivered and accounted for before the store opened. This is the case, for example, in rows 2 and 3 above: the closing quantity of 1 on 2021-12-07 jumps to an opening quantity of 52 on 2021-12-08.
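If you wanted to list those delivery days yourself, a sketch like the one below would do it. It uses the LAG() window function (not covered in this article) to fetch the previous day’s closing quantity for the same product and store, then keeps only the rows where it doesn’t match the opening quantity:

SELECT *
FROM (
	SELECT	store_id,
			product_id,
			inventory_date,
			opening_quantity,
			LAG(closing_quantity) OVER (PARTITION BY store_id, product_id
				ORDER BY inventory_date) AS previous_closing
	FROM inventory
) AS daily
WHERE opening_quantity <> previous_closing;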

Ready for some OVER() clause examples? I bet you are!

Example 1: With PARTITION BY but Without ORDER BY

In this example, the task is to show the inventory date, the quantity at store open, and the quantity at store close. We also want the highest quantity at store close for each day regardless of the store or the product.

To do that, I’ve written this query:

SELECT 	inventory_date,
		opening_quantity,
		closing_quantity,
		MAX(closing_quantity) OVER (PARTITION BY inventory_date) AS daily_max
FROM inventory;

This query uses only PARTITION BY in the OVER() clause. What does this query do?

First, it selects three columns from the table inventory. Then I use the MAX() aggregate function to find the highest product quantity at store close. I use the OVER() clause to make this aggregate function a window function. In it, I partition the window by the column inventory_date. This means I get the highest product quantity at store close by inventory date across all stores and across all products.

The table below shows all the data for the first date (December 6) and the first row of the second date (December 7) for simplicity. Of course, the query returns much more data.

inventory_date | opening_quantity | closing_quantity | daily_max
2021-12-06     | 128              | 51               | 397
2021-12-06     | NULL             | NULL             | 397
2021-12-06     | 87               | 87               | 397
2021-12-06     | 22               | 21               | 397
2021-12-06     | 13               | 8                | 397
2021-12-06     | 247              | 200              | 397
2021-12-06     | 42               | 39               | 397
2021-12-06     | 33               | 33               | 397
2021-12-06     | 44               | 41               | 397
2021-12-06     | 73               | 71               | 397
2021-12-06     | 97               | 94               | 397
2021-12-06     | 412              | 397              | 397
2021-12-06     | 39               | 33               | 397
2021-12-06     | 62               | 58               | 397
2021-12-06     | 49               | 44               | 397
2021-12-06     | 79               | NULL             | 397
2021-12-07     | 58               | 54               | 327

The table lists the quantities at store open and store close for every store-product combination. The highest quantity at store close is in the column daily_max. As you can see, it is 397 for the date 2021-12-06. You can verify this is the highest value: it appears as the closing quantity in the row with an opening quantity of 412. The NULL values are shown, but they don’t impact the result.

The data is presented the same way for every date in the dataset. You can see the next date is 2021-12-07, and the highest product quantity at store close for this date is 327. The same principle applies to the rest of the table.
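By the way, if I wanted the daily maximum for each store instead of across all stores, I could simply add another column to the partition. A quick sketch of that variation:

SELECT	inventory_date,
		store_id,
		opening_quantity,
		closing_quantity,
		MAX(closing_quantity) OVER (PARTITION BY inventory_date, store_id) AS daily_max_per_store
FROM inventory;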

Example 2: With Both PARTITION BY and ORDER BY

In this example, I’ll show you how to use both PARTITION BY and ORDER BY in an OVER() clause. To do that, I write a query that displays the product brand, the product name, and the inventory date. I also want to show the daily quantity sold. In addition, I want to see the daily cumulative quantity sold per brand across all stores.

You need to know how to calculate a running total for this example. Before showing you the query, let’s make sure you understand what a running total is. Here it is with the example data:

id | date       | sales    | running_total | product
1  | 2021-12-06 | 1,357.22 | 1,357.22      | Ax
2  | 2021-12-06 | 2,154.88 | 3,512.10      | Hammer
3  | 2021-12-07 | 1,984.88 | 5,496.98      | Ax
4  | 2021-12-07 | 3,147.11 | 8,644.09      | Hammer

A running total is the sum of the current row plus all the preceding rows. In the first row, it’s 1,357.22 + 0 = 1,357.22 because there are no preceding rows. For the next row, it is 2,154.88 + 1,357.22 = 3,512.10. The same logic applies to all remaining rows: 1,984.88 + 3,512.10 = 5,496.98 and 3,147.11 + 5,496.98 = 8,644.09.

Now let’s calculate the running total in SQL:

SELECT 	product_brand,
		product_name,
		inventory_date,
		opening_quantity - closing_quantity AS daily_sales,
		SUM(opening_quantity - closing_quantity) OVER (PARTITION BY product_brand ORDER BY inventory_date) AS cumulative_sales_per_brand
FROM products p
JOIN inventory i
ON p.id = i.product_id;

The query selects certain columns from the tables products and inventory. Then, I calculate the difference between the quantities at store open and store close as the column daily_sales. After that, this difference is used in the SUM() window function to get the cumulative sum of daily sales.

The window is partitioned by the column product_brand because I want to see the sum of daily sales at the brand level. The ordinary SUM() aggregate function now becomes a window function for calculating the cumulative sum.

Finally, the window function is executed by inventory date in ascending order. Had I not used ORDER BY, the SUM() function would have been executed in arbitrary order, which is not much of a cumulative sum/running total.

Note: If no ASC or DESC is specified, ASC is the default value.

The query returns the result below. Again, I’m only showing a handful of rows:

product_brand  | product_name | inventory_date | daily_sales | cumulative_sales_per_brand
Fukuyama & Co. | Matcha       | 2021-12-06     | 5           | 27
Fukuyama & Co. | Sencha       | 2021-12-06     | 1           | 27
Fukuyama & Co. | Matcha       | 2021-12-06     | 0           | 27
Fukuyama & Co. | Sencha       | 2021-12-06     | 0           | 27
Fukuyama & Co. | Sencha       | 2021-12-06     | NULL        | 27
Fukuyama & Co. | Matcha       | 2021-12-06     | 3           | 27
Fukuyama & Co. | Sencha       | 2021-12-06     | 15          | 27
Fukuyama & Co. | Matcha       | 2021-12-06     | 3           | 27
Fukuyama & Co. | Matcha       | 2021-12-07     | 4           | 224
Fukuyama & Co. | Sencha       | 2021-12-07     | 98          | 224
Fukuyama & Co. | Matcha       | 2021-12-07     | 3           | 224
Fukuyama & Co. | Matcha       | 2021-12-07     | 11          | 224
Fukuyama & Co. | Sencha       | 2021-12-07     | 1           | 224
Fukuyama & Co. | Sencha       | 2021-12-07     | 70          | 224
Fukuyama & Co. | Matcha       | 2021-12-07     | 5           | 224
Fukuyama & Co. | Sencha       | 2021-12-07     | 5           | 224

According to the result, the quantity sold for Fukuyama & Co. products is 27 for 2021-12-06. The total quantity sold on 2021-12-07 is 197. However, the quantity in the column cumulative_sales_per_brand is 224. Why is that? Because it’s a cumulative sum: it shows the total sold on that day plus the totals of all prior days. In other words, you get 224 if you sum all the daily_sales values shown above (27 for 2021-12-06 plus 197 for 2021-12-07).

This is done the same way for all other dates and brands.
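If you wanted the plain (non-cumulative) daily total per brand instead, you could add the date to the partition and drop ORDER BY. This is just a sketch of that variation:

SELECT	product_brand,
		product_name,
		inventory_date,
		opening_quantity - closing_quantity AS daily_sales,
		SUM(opening_quantity - closing_quantity) OVER (
			PARTITION BY product_brand, inventory_date) AS daily_sales_per_brand
FROM products p
JOIN inventory i
ON p.id = i.product_id;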

Example 3: With PARTITION BY, ORDER BY, and a Window Frame

Now, say you want to show the store name, the inventory date, the brand, and the products. Again, we will show the daily quantity sold, but we will also show the average daily quantity sold by each store. This is a cumulative average that changes with every passing day and from the lowest to the highest daily quantity sold.

The general principles of calculating a cumulative average are shown in the table below:

id | date       | sales    | cumulative_average | product
1  | 2021-12-06 | 1,357.22 | 1,357.22           | Ax
2  | 2021-12-06 | 2,154.88 | 1,756.05           | Hammer
3  | 2021-12-07 | 1,984.88 | 1,832.33           | Ax
4  | 2021-12-07 | 3,147.11 | 2,161.02           | Hammer

A moving average is calculated as the average of a subset of the data. Generally, moving averages take into account the current row, a certain number of rows or values preceding the current row, and/or a certain number of rows or values following the current row. In the example above, I calculate an average that includes the current row and all rows preceding it. This results in a type of moving average, but I’ll refer to it as a cumulative average to make a distinction.

The cumulative average for the first row is 1,357.22. This is the same as the sales value of the row, because there are no rows before the first one. The calculation for the second row looks like this: (1,357.22 + 2,154.88)/2 = 1,756.05; for the next one: (1,357.22 + 2,154.88 + 1,984.88)/3 = 1,832.33. The last cumulative average is calculated by following the same principle: (1,357.22 + 2,154.88 + 1,984.88 + 3,147.11)/4 = 2,161.02.

Now, the cumulative average in SQL:

SELECT 	store_name,
		inventory_date,
		product_brand,
		product_name,
		opening_quantity - closing_quantity AS daily_sales,
		AVG(opening_quantity - closing_quantity) OVER (
			PARTITION BY store_name
			ORDER BY inventory_date, opening_quantity - closing_quantity
			ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
		) AS cumulative_avg_sales
FROM stores s
JOIN inventory i
ON s.id = i.store_id
JOIN products p
ON p.id = i.product_id;

I select several columns from all three tables. Then, I calculate the daily sales the same way as in Example 2. I use the AVG() aggregate function to calculate the average daily sales. I need the average by store, so I partition the window by the column store_name.

Then, I order the data in the partition by inventory date and daily sales using an ORDER BY clause. It means the window function calculates the average from the oldest to the newest date and from the lowest to the highest daily sales. Why? I want to see my cumulative averages as they change.

Finally, the window frame is defined as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The average includes the current row and all the preceding rows – within the defined window partition, of course.

Take a look at the resulting table:

store_name | inventory_date | product_brand         | product_name  | daily_sales | cumulative_avg_sales
Zaandam    | 2021-12-06     | Fukuyama & Co.        | Matcha        | 0           | 0.00
Zaandam    | 2021-12-06     | Fukuyama & Co.        | Sencha        | 1           | 0.50
Zaandam    | 2021-12-06     | Mr. Wellington        | Earl Grey     | 5           | 2.00
Zaandam    | 2021-12-06     | Karnataka Tea Company | Kashmir Tchai | NULL        | 2.00
Zaandam    | 2021-12-07     | Fukuyama & Co.        | Sencha        | 1           | 1.75
Zaandam    | 2021-12-07     | Mr. Wellington        | Earl Grey     | 7           | 2.80
Zaandam    | 2021-12-07     | Fukuyama & Co.        | Matcha        | 11          | 4.17
Zaandam    | 2021-12-07     | Karnataka Tea Company | Kashmir Tchai | NULL        | 4.17
Zaandam    | 2021-12-08     | Fukuyama & Co.        | Matcha        | 0           | 3.57
Zaandam    | 2021-12-08     | Fukuyama & Co.        | Sencha        | 1           | 3.25
Zaandam    | 2021-12-08     | Mr. Wellington        | Earl Grey     | 2           | 3.11
Zaandam    | 2021-12-08     | Karnataka Tea Company | Kashmir Tchai | 8           | 3.60
Zaandam    | 2021-12-09     | Fukuyama & Co.        | Sencha        | 2           | 3.45
Zaandam    | 2021-12-09     | Karnataka Tea Company | Kashmir Tchai | 3           | 3.42
Zaandam    | 2021-12-09     | Mr. Wellington        | Earl Grey     | 6           | 3.62
Zaandam    | 2021-12-09     | Fukuyama & Co.        | Matcha        | 24          | 5.07
Zaandam    | 2021-12-10     | Karnataka Tea Company | Kashmir Tchai | 0           | 4.73
Zaandam    | 2021-12-10     | Fukuyama & Co.        | Matcha        | 0           | 4.44
Zaandam    | 2021-12-10     | Fukuyama & Co.        | Sencha        | 1           | 4.24
Zaandam    | 2021-12-10     | Mr. Wellington        | Earl Grey     | 12          | 4.67

Let me prove the calculation is correct by using the same principle you already know. The cumulative average for the first row is 0.00 because the value of this row is 0 and there are no preceding rows.

How about the second row? (0 + 1)/2 = 0.50. So far, so good! The third row: (0 + 1 + 5)/3 = 2.00.

That’s easy. The same with the next row: (0 + 1 + 5 + NULL)/4 = 1.5. But the above table says the cumulative average is 2.00, not 1.5.

Ha, SQL has made an error, right? No, you have made an error if you have treated the NULL values as 0. When calculating averages, SQL simply ignores NULL values. So, the calculation for this row is again (0 + 1 + 5)/3 = 2.00. This is exactly what is shown in the table.

How about the next row? You skip the row with the NULL value and calculate the cumulative average this way: (0 + 1 + 5 + 1)/4 = 1.75.
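If you wanted the days with no stock to count as zero sales instead of being ignored, you could wrap the expression in COALESCE(). Here’s a sketch that computes both versions side by side so you can compare them:

SELECT	store_name,
		inventory_date,
		product_name,
		AVG(opening_quantity - closing_quantity) OVER (
			PARTITION BY store_name
			ORDER BY inventory_date, opening_quantity - closing_quantity
			ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS avg_ignoring_nulls,
		AVG(COALESCE(opening_quantity - closing_quantity, 0)) OVER (
			PARTITION BY store_name
			ORDER BY inventory_date, opening_quantity - closing_quantity
			ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS avg_nulls_as_zero
FROM stores s
JOIN inventory i
ON s.id = i.store_id
JOIN products p
ON p.id = i.product_id;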

Example 4: Without PARTITION BY but With ORDER BY DESC

This time, I’ll show you how to use a window function without the PARTITION BY clause but with the rows ordered in descending order.

I want to rank the data by quantity at store open in descending order. In addition to the rank, I want the result to show the following columns: inventory_date, product_brand, product_name, and opening_quantity.

Take a look:

SELECT	inventory_date,
		product_brand,
		product_name,
		opening_quantity,
		DENSE_RANK() OVER (ORDER BY opening_quantity DESC) AS rank
FROM inventory i
JOIN products p
ON i.product_id = p.id;

First, there are several columns from the tables inventory and products. I use the DENSE_RANK() function to rank the values. You may also use the RANK() or the ROW_NUMBER() function, but the result may be slightly different.

In the OVER() clause, there’s an ORDER BY used with a DESC keyword, so the ranking goes from the highest to the lowest quantity at store open. No partition is defined, so the ranking is applied to the whole dataset.

Here are several rows from the data the query returns:

inventory_date | product_brand         | product_name  | opening_quantity | rank
2021-12-06     | Karnataka Tea Company | Kashmir Tchai | NULL             | 1
2021-12-07     | Karnataka Tea Company | Kashmir Tchai | NULL             | 1
2021-12-09     | Fukuyama & Co.        | Sencha        | NULL             | 1
2021-12-08     | Karnataka Tea Company | Kashmir Tchai | 820              | 2
2021-12-09     | Karnataka Tea Company | Kashmir Tchai | 809              | 3
2021-12-09     | Karnataka Tea Company | Kashmir Tchai | 704              | 4
2021-12-06     | Fukuyama & Co.        | Sencha        | 412              | 5
2021-12-07     | Fukuyama & Co.        | Sencha        | 397              | 6
2021-12-08     | Fukuyama & Co.        | Sencha        | 327              | 7

As you can see, all NULL values are ranked 1. This is because I’m using PostgreSQL, which puts NULL values at the top when data is in descending order. Since the first three rows have the same value, they have the same rank.

The quantity 820 is ranked 2, while the quantity 809 is ranked 3. This goes on in the same way throughout the dataset.
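If you’d rather have the NULL quantities at the bottom of the ranking, PostgreSQL lets you add NULLS LAST to the ORDER BY inside the OVER() clause. A quick sketch:

SELECT	inventory_date,
		product_brand,
		product_name,
		opening_quantity,
		DENSE_RANK() OVER (ORDER BY opening_quantity DESC NULLS LAST) AS rank
FROM inventory i
JOIN products p
ON i.product_id = p.id;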

To show that the same rank is assigned not only to NULL values, here are two rows you don’t see above:

inventory_date | product_brand  | product_name | opening_quantity | rank
2021-12-06     | Fukuyama & Co. | Matcha       | 87               | 19
2021-12-07     | Fukuyama & Co. | Matcha       | 87               | 19

As you see, the rows with duplicate values are ranked the same.
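Ties are exactly where DENSE_RANK(), RANK(), and ROW_NUMBER() behave differently, which is why I said earlier that the result may vary slightly. Here’s a sketch that puts all three side by side so you can compare them on the same data:

SELECT	inventory_date,
		product_name,
		opening_quantity,
		DENSE_RANK() OVER (ORDER BY opening_quantity DESC) AS dense_rank,	-- ties share a rank; no gaps afterwards
		RANK() OVER (ORDER BY opening_quantity DESC) AS rank,				-- ties share a rank; gaps afterwards
		ROW_NUMBER() OVER (ORDER BY opening_quantity DESC) AS row_number	-- ties get distinct, arbitrary numbers
FROM inventory i
JOIN products p
ON i.product_id = p.id;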

Ranking functions can be very useful. I recommend you learn how to use all the ranking functions with PARTITION BY, too.

Learning the SQL OVER() Clause Is Not Yet Over

My explanation of the OVER() clause is done, but that doesn’t mean your learning and practice are over. Now that you understand PARTITION BY, ORDER BY, and the window frame, continue to build on that knowledge.

If you like reading about window functions, here’s a list of the best articles about window functions. If practice is your thing, here’s an article showing you several window function examples.

The best combination of learning through reading and practicing is in our Window Functions Course. It’s a part of the Advanced SQL Track, which, aside from window functions, covers topics such as GROUP BY extensions and recursive queries. Check it out!