What Is the OVER() Clause in SQL?
Window functions are one of SQL’s most powerful resources, but they are not frequently used by the average SQL developer. In this article, we will explain how you can define different kinds of window frames using the OVER
clause.
The OVER
clause is essential to SQL window functions. Like aggregation functions, window functions perform calculations based on a set of records – e.g. finding the average salary across a group of employees.
The best way to learn window functions is our interactive Window Functions course. There are 218 exercises that will teach you how window functions work, what functions there are, and how to apply them to real-world problems. You only need a web browser and some basic SQL knowledge.
In some cases, aggregate functions cannot be used because they collapse all the individual records into a group; this makes it impossible to refer to specific values (like one employee’s salary out of the group). In these situations, window functions are preferred because they do not collapse rows; you can refer to a column value at the row level as well as the aggregate value.
There are other scenarios where window functions are useful. For example, we may need to do arithmetic involving an individual column and a calculation based on a set of rows. A real-world instance of this is calculating the difference between the average department salary and the salary of each employee in the department.
When using window functions, the definition of the set of records where the function will be calculated is critical. This set of records is called the window frame; we define it using the SQL OVER
clause.
Throughout this article, we will demonstrate SQL queries using the database of a small luxury watch company. The company stores their sales information in a table called sales
:
sale_day | sale_month | sale_time | branch | article | quantity | revenue |
---|---|---|---|---|---|---|
2021-08-11 | AUG | 11:00 | New York | Rolex P1 | 1 | 3000.00 |
2021-08-14 | AUG | 11:20 | New York | Rolex P1 | 2 | 6000.00 |
2021-08-17 | AUG | 10:00 | Paris | Omega 100 | 3 | 4000.00 |
2021-08-19 | AUG | 10:00 | London | Omega 100 | 1 | 1300.00 |
2021-07-17 | JUL | 09:30 | Paris | Cartier A1 | 1 | 2000.00 |
2021-07-11 | JUL | 10:10 | New York | Cartier A1 | 1 | 2000.00 |
2021-07-10 | JUL | 11:40 | London | Omega 100 | 2 | 2600.00 |
2021-07-15 | JUL | 10:30 | London | Omega 100 | 3 | 4000.00 |
The window frame is a set of rows that depends on the current row; thus, the set of rows could change for each row processed by the query. We define window frames using the OVER
clause. The syntax is:
OVER ([PARTITION BY columns] [ ORDER BY columns]) |
The PARTITION BY
subclause defines the criteria that the records must satisfy to be part of the window frame. In other words, PARTITION BY
defines the groups into which the rows are divided; this will be clearer in our next example query. Finally, the ORDER BY
clause defines the order of the records in the window frame.
Let’s see the SQL OVER
clause in action. Here’s a simple query that returns the total quantity of units sold for each article.
SELECT sale_day, sale_time, branch, article, quantity, revenue, SUM (quantity) OVER (PARTITION BY article) AS total_units_sold FROM sales |
This query will show all the records of the sales
table with a new column displaying the total number of units sold for the relevant article. We can obtain the quantity of units sold using the SUM
aggregation function, but then we couldn’t show the individual records.
In this query, the OVER PARTITION BY
article subclause indicates that the window frame is determined by the values in the article
column; all records with the same article
value will be in one group. Below, we have the result of this query:
sale day | sale time | branch | article | quantity | revenue | total units sold |
---|---|---|---|---|---|---|
2021-07-11 | 10:10 | New York | Cartier A1 | 1 | 2000.00 | 2 |
2021-07-17 | 9:30 | Paris | Cartier A1 | 1 | 2000.00 | 2 |
2021-08-19 | 10:00 | London | Omega 100 | 1 | 1300.00 | 9 |
2021-07-15 | 10:30 | London | Omega 100 | 3 | 4000.00 | 9 |
2021-08-17 | 10:00 | Paris | Omega 100 | 3 | 4000.00 | 9 |
2021-07-10 | 11:40 | London | Omega 100 | 2 | 2600.00 | 9 |
2021-08-11 | 11:00 | New York | Rolex P1 | 1 | 3000.00 | 3 |
2021-08-14 | 11:20 | New York | Rolex P1 | 2 | 6000.00 | 3 |
The total_units_sold
column of the report was obtained by the expression:
SUM (quantity) OVER (PARTITION BY article) total_units_sold |
For those readers who want to go deeper into the subject, I suggest the following two articles: What Is The Difference Between GROUP BY and PARTITION BY and Window Functions in SQL Server: Part One: The OVER() Clause
The SQL OVER Clause In Action
For each article, suppose we want to compare the total quantity of this article sold in each month of 2021 with the total quantity of this article sold in the whole year. To do this, we will create a simple report with the columns article
, month
, units_sold_month
and units_sold_year
. The query is:
SELECT DISTINCT article, EXTRACT( 'month' FROM sale_day) AS month , SUM (quantity) OVER (PARTITION BY article, sale_month) AS units_sold_month, SUM (quantity) OVER (PARTITION BY article) AS units_sold_year FROM sales WHERE EXTRACT( 'YEAR' FROM sale_day) = 2021 ORDER BY article, month |
The results of the query are:
article | month | units_sold_month | units_sold_year |
---|---|---|---|
Cartier A1 | 7 | 2 | 2 |
Omega 100 | 7 | 5 | 9 |
Omega 100 | 8 | 4 | 9 |
Rolex P1 | 8 | 3 | 3 |
Here, we calculated the total units sold using two different grouping granularities: month and year. The first OVER
clause ...
OVER (PARTITION BY article, sale_month) |
… allows us to obtain the number of units of each article sold in a month. The second OVER
clause ...
OVER (PARTITION BY article) |
… allows us to compute the total number of units of a given article sold in the whole year.
In the next query, we will just add the column month_percentage
to show the percentage that a specific month occupies within the yearly total. We can calculate it using the following query:
SELECT DISTINCT article, EXTRACT( 'month' FROM sale_day) as month , SUM (quantity) OVER (PARTITION BY article, sale_month) AS units_sold_month, SUM (quantity) OVER (PARTITION BY article) AS units_sold_year, ( ( SUM (quantity) OVER (PARTITION BY article, sale_month):: decimal / SUM (quantity) OVER (PARTITION BY article):: decimal ) * 100 ) AS month_percentage FROM sales WHERE extract( 'YEAR' FROM sale_day) = 2021 ORDER BY article, month |
Note that in the previous query, we didn’t use different OVER
clauses; we just reused the same window functions and calculated a percentage. You can see the results below:
article | month | units_sold_month | units_sold_year | month_percentage |
---|---|---|---|---|
Cartier A1 | 7 | 2 | 2 | 100.00 |
Omega 100 | 7 | 5 | 9 | 55.55 |
Omega 100 | 8 | 4 | 9 | 45.44 |
Rolex P1 | 8 | 3 | 3 | 100.00 |
We will now create a different report that looks at how various branches are performing. We want to see the columns branch
and month
. We also need calculations to obtain the:
- Total revenue for that month.
- Revenue grouped by branch and month.
- Monthly average branch revenue.
- Difference between each branch’s revenue and the monthly average revenue.
SELECT DISTINCT branch, EXTRACT( 'month' FROM sale_day) AS month , SUM (revenue) OVER (PARTITION BY sale_month) AS total_revenue_month, SUM (revenue) OVER (PARTITION BY branch, sale_month) AS branch_revenue_month, -- Next column is the branch average revenue in the current month ( SUM (revenue) OVER (PARTITION BY sale_month):: decimal / ( SELECT COUNT ( DISTINCT branch) FROM sales):: decimal ) AS average_month_branch, -- Next column is the difference between branch revenue and average branch revenue SUM (revenue) OVER (PARTITION BY branch, sale_month) - ( SUM (revenue) OVER (PARTITION BY sale_month):: decimal / ( SELECT COUNT ( DISTINCT branch) FROM sales):: decimal ) AS gap_branch_average FROM sales WHERE extract( 'YEAR' from sale_day) = 2021 ORDER BY branch, month |
Once again we used only two OVER
clauses, but we used different arithmetic expressions to obtain certain values. We used ...
SUM (revenue) OVER (PARTITION BY sale_month) |
… to calculate the total revenue of the month, but we also used it in an arithmetic expression to obtain the monthly average branch revenue.
We used ...
SUM (revenue) OVER (PARTITION BY branch, sale_month) |
… to calculate the monthly branch revenue and the difference between that branch’s monthly revenue and the average.
The next table is the query result. Notice that the gap_branch_average
column can contain positive or negative numbers. A negative number indicates this branch’s monthly revenue was less than the average revenue.
Branch | Month | total_revenue_month | branch_revenue_month | average_month_branch | gap_branch_average |
---|---|---|---|---|---|
London | 7 | 10600 | 6600 | 3533.33 | 3066.66 |
London | 8 | 14300 | 1300 | 4766.66 | -3466.66 |
New York | 7 | 10600 | 2000 | 3533.33 | -1533.33 |
New York | 8 | 14300 | 9000 | 4766.66 | 4233.33 |
Paris | 7 | 10600 | 2000 | 3533.33 | -1533.33 |
Paris | 8 | 14300 | 4000 | 4766.66 | -766.66 |
For additional information on window functions in SQL, I suggest SQL Window Function Example With Explanations, an entry level-article on window functions. For more advanced readers, How to Rank Rows Within a Partition in SQL shows how to create rankings in your reports using the RANK()
window function.
The OVER Clause and Analytical Window Functions
In the previous queries, we used window functions to compare monthly numbers (revenue and units sold, respectively) with yearly figures. In this section, we will use ordered window frames, which allows us to pick a record in the frame based on its position. For example, we can choose the first record in the window frame, or the record previous to the current record, or the record after the current record. These analytical window functions provide great expressive power to SQL.
In the following query, we will show the revenue increase/decrease for the same branch in two contiguous months. To do that, we need to calculate the difference between the current month’s revenue and the previous month's revenue. This requires the analytic window function LAG()
, which can obtain a column value from a row previous to the current row.
WITH branch_month_sales AS ( SELECT DISTINCT branch, EXTRACT( 'MONTH' FROM sale_day) AS month , SUM (revenue) OVER (PARTITION BY branch, sale_month ) AS revenue FROM sales ) SELECT branch, month , revenue AS revenue_current_month, LAG(revenue) OVER (PARTITION BY branch ORDER BY month ) AS revenue_prev_month, revenue - LAG(revenue) OVER (PARTITION BY branch ORDER BY month ) AS revenue_delta FROM branch_month_sales ORDER BY branch, month |
In this query, we used a common table expression (CTE) called branch_month_sales
to store the total revenue of each branch and month. Then we wrote a second query that uses the LAG()
window function to obtain the revenue of the previous month (using the info from branch_month_sales
). Note that the window frame is ordered by month.
Here are the results:
Branch | Month | revenue_current_month | revenue_prev_month | revenue_delta |
---|---|---|---|---|
London | 7 | 6600 | null | null |
London | 8 | 1300 | 6600 | -5300 |
New York | 7 | 2000 | null | null |
New York | 8 | 9000 | 2000 | 7000 |
Paris | 7 | 2000 | null | null |
Paris | 8 | 4000 | 2000 | 2000 |
In all the queries shown in this article, we’ve used just a few window functions. There are quite a few other window functions in SQL. Here’s a list of each one:
function | syntax | return value |
---|---|---|
AVG() | AVG(expression) | The average within the OVER partition. |
COUNT() | COUNT() | The number of rows within the OVER partition. |
MAX() | MAX(expression) | The maximum value of a column or expression for each partition. |
MIN() | MIN(expression) | The minimum value of a column or expression for each partition. |
SUM() | SUM(expression) | The total of all values in a column within a partition. |
ROW_NUMBER() | ROW_NUMBER() | Assigns a unique number to each row within a partition. Rows with identical values are given row different numbers. |
RANK() | RANK() | Ranks rows by column values within a partition. Gaps and tied rankings are permitted. |
DENSE_RANK() | DENSE_RANK() | Ranks row by column values within a partition. There are no gaps in the ranking, but tied rankings are permitted. |
PERCENT_RANK() | PERCENT_RANK() | Assigns a percentile ranking number to each row in a partition. To calculate a value in the [0, 1] interval, we use (rank - 1) / (total number of rows - 1). |
CUME_DIST() | CUME_DIST() | Shows the cumulative distribution of a value within a group of values, i.e. the number of rows with values less than or equal to the current row’s value divided by the total number of rows. |
LEAD() | LEAD(expr, offset, default) | The value of the row n number of rows after the current row. The offset and default arguments are optional; it will return the next row value by default. |
LAG() | LAG(expr, offset, default) | The value of the row n number of rows before the current row. The offset and default arguments are optional; it will return the previous row value by default. |
NTILE() | NTILE(n) | Divides rows within a partition into n groups and assigns each row a group number. |
FIRST_VALUE() | FIRST_VALUE(expr) | The value for the first row within the window frame. |
LAST_VALUE() | LAST_VALUE(expr) | The value for the last row within the window frame. |
NTH_VALUE() | NTH_VALUE(expr, n) | The value for the n-th row within the window frame. |
Learn More About the SQL OVER Clause
In this article, we covered the OVER
clause and the PARTITION BY
and ORDER BY subclauses.
If you want to continue learning about window functions, there are a pair of articles I would like to share. The first one is 8 Best SQL Window Function Articles, which will point you to other great articles. The second is a window function cheat sheet that includes syntax, examples, and images; it’s my favorite article about window functions.
For those who want to go deeper, I suggest our interactive Window Functions SQL course. If you want to increase your SQL skills in general, try the track SQL From A to Z. It’s a comprehensive look at everything you need to know to work effectively with SQL.