30th Sep 2021 11 minutes read

What Is the OVER() Clause in SQL?

Window Functions

Table of Contents

The SQL OVER Clause In Action
The OVER Clause and Analytical Window Functions
Learn More About the SQL OVER Clause

Window functions are one of SQL’s most powerful resources, but they are not frequently used by the average SQL developer. In this article, we will explain how you can define different kinds of window frames using the OVER clause.

The OVER clause is essential to SQL window functions. Like aggregation functions, window functions perform calculations based on a set of records – e.g. finding the average salary across a group of employees.

The best way to learn window functions is our interactive Window Functions course. There are 218 exercises that will teach you how window functions work, what functions there are, and how to apply them to real-world problems. You only need a web browser and some basic SQL knowledge.

In some cases, aggregate functions cannot be used because they collapse all the individual records into a group; this makes it impossible to refer to specific values (like one employee’s salary out of the group). In these situations, window functions are preferred because they do not collapse rows; you can refer to a column value at the row level as well as the aggregate value.

There are other scenarios where window functions are useful. For example, we may need to do arithmetic involving an individual column and a calculation based on a set of rows. A real-world instance of this is calculating the difference between the average department salary and the salary of each employee in the department.

When using window functions, the definition of the set of records where the function will be calculated is critical. This set of records is called the window frame; we define it using the SQL OVER clause.

Throughout this article, we will demonstrate SQL queries using the database of a small luxury watch company. The company stores their sales information in a table called sales:

sale_day	sale_month	sale_time	branch	article	quantity	revenue
2021-08-11	AUG	11:00	New York	Rolex P1	1	3000.00
2021-08-14	AUG	11:20	New York	Rolex P1	2	6000.00
2021-08-17	AUG	10:00	Paris	Omega 100	3	4000.00
2021-08-19	AUG	10:00	London	Omega 100	1	1300.00
2021-07-17	JUL	09:30	Paris	Cartier A1	1	2000.00
2021-07-11	JUL	10:10	New York	Cartier A1	1	2000.00
2021-07-10	JUL	11:40	London	Omega 100	2	2600.00
2021-07-15	JUL	10:30	London	Omega 100	3	4000.00

The window frame is a set of rows that depends on the current row; thus, the set of rows could change for each row processed by the query. We define window frames using the OVER clause. The syntax is:

  OVER ([PARTITION BY columns] [ORDER BY columns])

The PARTITION BY subclause defines the criteria that the records must satisfy to be part of the window frame. In other words, PARTITION BY defines the groups into which the rows are divided; this will be clearer in our next example query. Finally, the ORDER BY clause defines the order of the records in the window frame.

Let’s see the SQL OVER clause in action. Here’s a simple query that returns the total quantity of units sold for each article.

SELECT sale_day, sale_time, 
       branch, article, quantity, revenue,
       SUM(quantity) OVER (PARTITION BY article) AS total_units_sold
FROM   sales

This query will show all the records of the sales table with a new column displaying the total number of units sold for the relevant article. We can obtain the quantity of units sold using the SUM aggregation function, but then we couldn’t show the individual records.

In this query, the OVER PARTITION BY article subclause indicates that the window frame is determined by the values in the article column; all records with the same article value will be in one group. Below, we have the result of this query:

sale day	sale time	branch	article	quantity	revenue	total units sold
2021-07-11	10:10	New York	Cartier A1	1	2000.00	2
2021-07-17	9:30	Paris	Cartier A1	1	2000.00	2
2021-08-19	10:00	London	Omega 100	1	1300.00	9
2021-07-15	10:30	London	Omega 100	3	4000.00	9
2021-08-17	10:00	Paris	Omega 100	3	4000.00	9
2021-07-10	11:40	London	Omega 100	2	2600.00	9
2021-08-11	11:00	New York	Rolex P1	1	3000.00	3
2021-08-14	11:20	New York	Rolex P1	2	6000.00	3

The total_units_sold column of the report was obtained by the expression:

SUM(quantity) OVER (PARTITION BY article) total_units_sold

For those readers who want to go deeper into the subject, I suggest the following two articles: What Is The Difference Between GROUP BY and PARTITION BY and Window Functions in SQL Server: Part One: The OVER() Clause

The SQL OVER Clause In Action

For each article, suppose we want to compare the total quantity of this article sold in each month of 2021 with the total quantity of this article sold in the whole year. To do this, we will create a simple report with the columns article, month, units_sold_month and units_sold_year. The query is:

SELECT DISTINCT article,
       EXTRACT('month' FROM sale_day) AS month,
       SUM(quantity) OVER (PARTITION BY article, sale_month) AS units_sold_month,
       SUM(quantity) OVER (PARTITION BY article) AS units_sold_year
FROM  sales 
WHERE EXTRACT('YEAR' FROM sale_day) = 2021
ORDER BY article, month

The results of the query are:

article	month	units_sold_month	units_sold_year
Cartier A1	7	2	2
Omega 100	7	5	9
Omega 100	8	4	9
Rolex P1	8	3	3

Here, we calculated the total units sold using two different grouping granularities: month and year. The first OVER clause ...

OVER (PARTITION BY article, sale_month)

… allows us to obtain the number of units of each article sold in a month. The second OVER clause ...

OVER (PARTITION BY article)

… allows us to compute the total number of units of a given article sold in the whole year.

In the next query, we will just add the column month_percentage to show the percentage that a specific month occupies within the yearly total. We can calculate it using the following query:

SELECT DISTINCT article,
       EXTRACT('month' FROM sale_day) as month,
       SUM(quantity) OVER (PARTITION BY article, sale_month) AS units_sold_month,
       SUM(quantity) OVER (PARTITION BY article) AS units_sold_year,
       ( ( SUM(quantity) OVER (PARTITION BY article, sale_month)::decimal /
           SUM(quantity) OVER (PARTITION BY article)::decimal ) * 100
       ) AS month_percentage
FROM sales 
WHERE extract('YEAR' FROM sale_day) = 2021
ORDER BY article, month

Note that in the previous query, we didn’t use different OVER clauses; we just reused the same window functions and calculated a percentage. You can see the results below:

article	month	units_sold_month	units_sold_year	month_percentage
Cartier A1	7	2	2	100.00
Omega 100	7	5	9	55.55
Omega 100	8	4	9	45.44
Rolex P1	8	3	3	100.00

We will now create a different report that looks at how various branches are performing. We want to see the columns branch and month. We also need calculations to obtain the:

Total revenue for that month.
Revenue grouped by branch and month.
Monthly average branch revenue.
Difference between each branch’s revenue and the monthly average revenue.

SELECT DISTINCT branch,
       EXTRACT('month' FROM sale_day) AS month,
       SUM(revenue) OVER (PARTITION BY sale_month) AS total_revenue_month,
       SUM(revenue) OVER (PARTITION BY branch, sale_month) AS branch_revenue_month,
      
 -- Next column is the branch average revenue in the current month
	 ( 
		  SUM(revenue) OVER (PARTITION BY sale_month)::decimal 
		   / 
		  (SELECT COUNT(DISTINCT branch) FROM sales)::decimal 
	 ) AS average_month_branch,
 
 -- Next column is the difference between branch revenue and average branch revenue

        SUM(revenue) OVER (PARTITION BY branch, sale_month) -
	 ( 
		  SUM(revenue) OVER (PARTITION BY sale_month)::decimal 
		   / 
		  (SELECT COUNT(DISTINCT branch) FROM sales)::decimal 
	 ) AS gap_branch_average

FROM sales 
WHERE extract('YEAR' from sale_day) = 2021
ORDER BY branch, month

Once again we used only two OVER clauses, but we used different arithmetic expressions to obtain certain values. We used ...

SUM(revenue) OVER (PARTITION BY sale_month)

… to calculate the total revenue of the month, but we also used it in an arithmetic expression to obtain the monthly average branch revenue.

We used ...

SUM(revenue) OVER (PARTITION BY branch, sale_month)

… to calculate the monthly branch revenue and the difference between that branch’s monthly revenue and the average.

The next table is the query result. Notice that the gap_branch_average column can contain positive or negative numbers. A negative number indicates this branch’s monthly revenue was less than the average revenue.

Branch	Month	total_revenue_month	branch_revenue_month	average_month_branch	gap_branch_average
London	7	10600	6600	3533.33	3066.66
London	8	14300	1300	4766.66	-3466.66
New York	7	10600	2000	3533.33	-1533.33
New York	8	14300	9000	4766.66	4233.33
Paris	7	10600	2000	3533.33	-1533.33
Paris	8	14300	4000	4766.66	-766.66

For additional information on window functions in SQL, I suggest SQL Window Function Example With Explanations, an entry level-article on window functions. For more advanced readers, How to Rank Rows Within a Partition in SQL shows how to create rankings in your reports using the RANK() window function.

The OVER Clause and Analytical Window Functions

In the previous queries, we used window functions to compare monthly numbers (revenue and units sold, respectively) with yearly figures. In this section, we will use ordered window frames, which allows us to pick a record in the frame based on its position. For example, we can choose the first record in the window frame, or the record previous to the current record, or the record after the current record. These analytical window functions provide great expressive power to SQL.

In the following query, we will show the revenue increase/decrease for the same branch in two contiguous months. To do that, we need to calculate the difference between the current month’s revenue and the previous month's revenue. This requires the analytic window function LAG(), which can obtain a column value from a row previous to the current row.

WITH branch_month_sales AS (
	SELECT    DISTINCT
		    branch,
		    EXTRACT('MONTH' FROM sale_day) AS month,
		    SUM(revenue) OVER (PARTITION BY branch, sale_month ) AS revenue
	FROM sales
)
SELECT branch, 
	month,
	revenue AS revenue_current_month,
	LAG(revenue) OVER (PARTITION BY branch ORDER BY month) AS revenue_prev_month,
	revenue - LAG(revenue) OVER (PARTITION BY branch ORDER BY month) AS revenue_delta
FROM branch_month_sales
ORDER BY branch, month

In this query, we used a common table expression (CTE) called branch_month_sales to store the total revenue of each branch and month. Then we wrote a second query that uses the LAG() window function to obtain the revenue of the previous month (using the info from branch_month_sales). Note that the window frame is ordered by month.

Here are the results:

Branch	Month	revenue_current_month	revenue_prev_month	revenue_delta
London	7	6600	null	null
London	8	1300	6600	-5300
New York	7	2000	null	null
New York	8	9000	2000	7000
Paris	7	2000	null	null
Paris	8	4000	2000	2000

In all the queries shown in this article, we’ve used just a few window functions. There are quite a few other window functions in SQL. Here’s a list of each one:

function	syntax	return value
AVG()	AVG(expression)	The average within the OVER partition.
COUNT()	COUNT()	The number of rows within the OVER partition.
MAX()	MAX(expression)	The maximum value of a column or expression for each partition.
MIN()	MIN(expression)	The minimum value of a column or expression for each partition.
SUM()	SUM(expression)	The total of all values in a column within a partition.
ROW_NUMBER()	ROW_NUMBER()	Assigns a unique number to each row within a partition. Rows with identical values are given row different numbers.
RANK()	RANK()	Ranks rows by column values within a partition. Gaps and tied rankings are permitted.
DENSE_RANK()	DENSE_RANK()	Ranks row by column values within a partition. There are no gaps in the ranking, but tied rankings are permitted.
PERCENT_RANK()	PERCENT_RANK()	Assigns a percentile ranking number to each row in a partition. To calculate a value in the [0, 1] interval, we use (rank - 1) / (total number of rows - 1).
CUME_DIST()	CUME_DIST()	Shows the cumulative distribution of a value within a group of values, i.e. the number of rows with values less than or equal to the current row’s value divided by the total number of rows.
LEAD()	LEAD(expr, offset, default)	The value of the row n number of rows after the current row. The offset and default arguments are optional; it will return the next row value by default.
LAG()	LAG(expr, offset, default)	The value of the row n number of rows before the current row. The offset and default arguments are optional; it will return the previous row value by default.
NTILE()	NTILE(n)	Divides rows within a partition into n groups and assigns each row a group number.
FIRST_VALUE()	FIRST_VALUE(expr)	The value for the first row within the window frame.
LAST_VALUE()	LAST_VALUE(expr)	The value for the last row within the window frame.
NTH_VALUE()	NTH_VALUE(expr, n)	The value for the n-th row within the window frame.

Learn More About the SQL OVER Clause

In this article, we covered the OVER clause and the PARTITION BY and ORDER BY subclauses.

If you want to continue learning about window functions, there are a pair of articles I would like to share. The first one is 8 Best SQL Window Function Articles, which will point you to other great articles. The second is a window function cheat sheet that includes syntax, examples, and images; it’s my favorite article about window functions.

For those who want to go deeper, I suggest our interactive Window Functions SQL course. If you want to increase your SQL skills in general, try the track SQL From A to Z. It’s a comprehensive look at everything you need to know to work effectively with SQL.

Tags:

Window Functions

The SQL OVER Clause In Action

The OVER Clause and Analytical Window Functions

Learn More About the SQL OVER Clause

You may also like