30th Sep 2021 11 minutes read What Is the OVER() Clause in SQL? Ignacio L. Bisso Window Functions Table of Contents The SQL OVER Clause In Action The OVER Clause and Analytical Window Functions Learn More About the SQL OVER Clause Window functions are one of SQL’s most powerful resources, but they are not frequently used by the average SQL developer. In this article, we will explain how you can define different kinds of window frames using the OVER clause. The OVER clause is essential to SQL window functions. Like aggregation functions, window functions perform calculations based on a set of records – e.g. finding the average salary across a group of employees. The best way to learn window functions is our interactive Window Functions course. There are 218 exercises that will teach you how window functions work, what functions there are, and how to apply them to real-world problems. You only need a web browser and some basic SQL knowledge. In some cases, aggregate functions cannot be used because they collapse all the individual records into a group; this makes it impossible to refer to specific values (like one employee’s salary out of the group). In these situations, window functions are preferred because they do not collapse rows; you can refer to a column value at the row level as well as the aggregate value. There are other scenarios where window functions are useful. For example, we may need to do arithmetic involving an individual column and a calculation based on a set of rows. A real-world instance of this is calculating the difference between the average department salary and the salary of each employee in the department. When using window functions, the definition of the set of records where the function will be calculated is critical. This set of records is called the window frame; we define it using the SQL OVER clause. Throughout this article, we will demonstrate SQL queries using the database of a small luxury watch company. The company stores their sales information in a table called sales: sale_day sale_month sale_time branch article quantity revenue 2021-08-11 AUG 11:00 New York Rolex P1 1 3000.00 2021-08-14 AUG 11:20 New York Rolex P1 2 6000.00 2021-08-17 AUG 10:00 Paris Omega 100 3 4000.00 2021-08-19 AUG 10:00 London Omega 100 1 1300.00 2021-07-17 JUL 09:30 Paris Cartier A1 1 2000.00 2021-07-11 JUL 10:10 New York Cartier A1 1 2000.00 2021-07-10 JUL 11:40 London Omega 100 2 2600.00 2021-07-15 JUL 10:30 London Omega 100 3 4000.00 The window frame is a set of rows that depends on the current row; thus, the set of rows could change for each row processed by the query. We define window frames using the OVER clause. The syntax is: OVER ([PARTITION BY columns] [ORDER BY columns]) The PARTITION BY subclause defines the criteria that the records must satisfy to be part of the window frame. In other words, PARTITION BY defines the groups into which the rows are divided; this will be clearer in our next example query. Finally, the ORDER BY clause defines the order of the records in the window frame. Let’s see the SQL OVER clause in action. Here’s a simple query that returns the total quantity of units sold for each article. SELECT sale_day, sale_time, branch, article, quantity, revenue, SUM(quantity) OVER (PARTITION BY article) AS total_units_sold FROM sales This query will show all the records of the sales table with a new column displaying the total number of units sold for the relevant article. We can obtain the quantity of units sold using the SUM aggregation function, but then we couldn’t show the individual records. In this query, the OVER PARTITION BY article subclause indicates that the window frame is determined by the values in the article column; all records with the same article value will be in one group. Below, we have the result of this query: sale day sale time branch article quantity revenue total units sold 2021-07-11 10:10 New York Cartier A1 1 2000.00 2 2021-07-17 9:30 Paris Cartier A1 1 2000.00 2 2021-08-19 10:00 London Omega 100 1 1300.00 9 2021-07-15 10:30 London Omega 100 3 4000.00 9 2021-08-17 10:00 Paris Omega 100 3 4000.00 9 2021-07-10 11:40 London Omega 100 2 2600.00 9 2021-08-11 11:00 New York Rolex P1 1 3000.00 3 2021-08-14 11:20 New York Rolex P1 2 6000.00 3 The total_units_sold column of the report was obtained by the expression: SUM(quantity) OVER (PARTITION BY article) total_units_sold For those readers who want to go deeper into the subject, I suggest the following two articles: What Is The Difference Between GROUP BY and PARTITION BY and Window Functions in SQL Server: Part One: The OVER() Clause The SQL OVER Clause In Action For each article, suppose we want to compare the total quantity of this article sold in each month of 2021 with the total quantity of this article sold in the whole year. To do this, we will create a simple report with the columns article, month, units_sold_month and units_sold_year. The query is: SELECT DISTINCT article, EXTRACT('month' FROM sale_day) AS month, SUM(quantity) OVER (PARTITION BY article, sale_month) AS units_sold_month, SUM(quantity) OVER (PARTITION BY article) AS units_sold_year FROM sales WHERE EXTRACT('YEAR' FROM sale_day) = 2021 ORDER BY article, month The results of the query are: article month units_sold_month units_sold_year Cartier A1 7 2 2 Omega 100 7 5 9 Omega 100 8 4 9 Rolex P1 8 3 3 Here, we calculated the total units sold using two different grouping granularities: month and year. The first OVER clause ... OVER (PARTITION BY article, sale_month) … allows us to obtain the number of units of each article sold in a month. The second OVER clause ... OVER (PARTITION BY article) … allows us to compute the total number of units of a given article sold in the whole year. In the next query, we will just add the column month_percentage to show the percentage that a specific month occupies within the yearly total. We can calculate it using the following query: SELECT DISTINCT article, EXTRACT('month' FROM sale_day) as month, SUM(quantity) OVER (PARTITION BY article, sale_month) AS units_sold_month, SUM(quantity) OVER (PARTITION BY article) AS units_sold_year, ( ( SUM(quantity) OVER (PARTITION BY article, sale_month)::decimal / SUM(quantity) OVER (PARTITION BY article)::decimal ) * 100 ) AS month_percentage FROM sales WHERE extract('YEAR' FROM sale_day) = 2021 ORDER BY article, month Note that in the previous query, we didn’t use different OVER clauses; we just reused the same window functions and calculated a percentage. You can see the results below: article month units_sold_month units_sold_year month_percentage Cartier A1 7 2 2 100.00 Omega 100 7 5 9 55.55 Omega 100 8 4 9 45.44 Rolex P1 8 3 3 100.00 We will now create a different report that looks at how various branches are performing. We want to see the columns branch and month. We also need calculations to obtain the: Total revenue for that month. Revenue grouped by branch and month. Monthly average branch revenue. Difference between each branch’s revenue and the monthly average revenue. SELECT DISTINCT branch, EXTRACT('month' FROM sale_day) AS month, SUM(revenue) OVER (PARTITION BY sale_month) AS total_revenue_month, SUM(revenue) OVER (PARTITION BY branch, sale_month) AS branch_revenue_month, -- Next column is the branch average revenue in the current month ( SUM(revenue) OVER (PARTITION BY sale_month)::decimal / (SELECT COUNT(DISTINCT branch) FROM sales)::decimal ) AS average_month_branch, -- Next column is the difference between branch revenue and average branch revenue SUM(revenue) OVER (PARTITION BY branch, sale_month) - ( SUM(revenue) OVER (PARTITION BY sale_month)::decimal / (SELECT COUNT(DISTINCT branch) FROM sales)::decimal ) AS gap_branch_average FROM sales WHERE extract('YEAR' from sale_day) = 2021 ORDER BY branch, month Once again we used only two OVER clauses, but we used different arithmetic expressions to obtain certain values. We used ... SUM(revenue) OVER (PARTITION BY sale_month) … to calculate the total revenue of the month, but we also used it in an arithmetic expression to obtain the monthly average branch revenue. We used ... SUM(revenue) OVER (PARTITION BY branch, sale_month) … to calculate the monthly branch revenue and the difference between that branch’s monthly revenue and the average. The next table is the query result. Notice that the gap_branch_average column can contain positive or negative numbers. A negative number indicates this branch’s monthly revenue was less than the average revenue. Branch Month total_revenue_month branch_revenue_month average_month_branch gap_branch_average London 7 10600 6600 3533.33 3066.66 London 8 14300 1300 4766.66 -3466.66 New York 7 10600 2000 3533.33 -1533.33 New York 8 14300 9000 4766.66 4233.33 Paris 7 10600 2000 3533.33 -1533.33 Paris 8 14300 4000 4766.66 -766.66 For additional information on window functions in SQL, I suggest SQL Window Function Example With Explanations, an entry level-article on window functions. For more advanced readers, How to Rank Rows Within a Partition in SQL shows how to create rankings in your reports using the RANK() window function. The OVER Clause and Analytical Window Functions In the previous queries, we used window functions to compare monthly numbers (revenue and units sold, respectively) with yearly figures. In this section, we will use ordered window frames, which allows us to pick a record in the frame based on its position. For example, we can choose the first record in the window frame, or the record previous to the current record, or the record after the current record. These analytical window functions provide great expressive power to SQL. In the following query, we will show the revenue increase/decrease for the same branch in two contiguous months. To do that, we need to calculate the difference between the current month’s revenue and the previous month's revenue. This requires the analytic window function LAG(), which can obtain a column value from a row previous to the current row. WITH branch_month_sales AS ( SELECT DISTINCT branch, EXTRACT('MONTH' FROM sale_day) AS month, SUM(revenue) OVER (PARTITION BY branch, sale_month ) AS revenue FROM sales ) SELECT branch, month, revenue AS revenue_current_month, LAG(revenue) OVER (PARTITION BY branch ORDER BY month) AS revenue_prev_month, revenue - LAG(revenue) OVER (PARTITION BY branch ORDER BY month) AS revenue_delta FROM branch_month_sales ORDER BY branch, month In this query, we used a common table expression (CTE) called branch_month_sales to store the total revenue of each branch and month. Then we wrote a second query that uses the LAG() window function to obtain the revenue of the previous month (using the info from branch_month_sales). Note that the window frame is ordered by month. Here are the results: Branch Month revenue_current_month revenue_prev_month revenue_delta London 7 6600 null null London 8 1300 6600 -5300 New York 7 2000 null null New York 8 9000 2000 7000 Paris 7 2000 null null Paris 8 4000 2000 2000 In all the queries shown in this article, we’ve used just a few window functions. There are quite a few other window functions in SQL. Here’s a list of each one: function syntax return value AVG() AVG(expression) The average within the OVER partition. COUNT() COUNT() The number of rows within the OVER partition. MAX() MAX(expression) The maximum value of a column or expression for each partition. MIN() MIN(expression) The minimum value of a column or expression for each partition. SUM() SUM(expression) The total of all values in a column within a partition. ROW_NUMBER() ROW_NUMBER() Assigns a unique number to each row within a partition. Rows with identical values are given row different numbers. RANK() RANK() Ranks rows by column values within a partition. Gaps and tied rankings are permitted. DENSE_RANK() DENSE_RANK() Ranks row by column values within a partition. There are no gaps in the ranking, but tied rankings are permitted. PERCENT_RANK() PERCENT_RANK() Assigns a percentile ranking number to each row in a partition. To calculate a value in the [0, 1] interval, we use (rank - 1) / (total number of rows - 1). CUME_DIST() CUME_DIST() Shows the cumulative distribution of a value within a group of values, i.e. the number of rows with values less than or equal to the current row’s value divided by the total number of rows. LEAD() LEAD(expr, offset, default) The value of the row n number of rows after the current row. The offset and default arguments are optional; it will return the next row value by default. LAG() LAG(expr, offset, default) The value of the row n number of rows before the current row. The offset and default arguments are optional; it will return the previous row value by default. NTILE() NTILE(n) Divides rows within a partition into n groups and assigns each row a group number. FIRST_VALUE() FIRST_VALUE(expr) The value for the first row within the window frame. LAST_VALUE() LAST_VALUE(expr) The value for the last row within the window frame. NTH_VALUE() NTH_VALUE(expr, n) The value for the n-th row within the window frame. Learn More About the SQL OVER Clause In this article, we covered the OVER clause and the PARTITION BY and ORDER BY subclauses. If you want to continue learning about window functions, there are a pair of articles I would like to share. The first one is 8 Best SQL Window Function Articles, which will point you to other great articles. The second is a window function cheat sheet that includes syntax, examples, and images; it’s my favorite article about window functions. For those who want to go deeper, I suggest our interactive Window Functions SQL course. If you want to increase your SQL skills in general, try the track SQL From A to Z. It’s a comprehensive look at everything you need to know to work effectively with SQL. Tags: Window Functions