30th Jun 2022 12 minutes read

Learn the OVER() Clause in SQL with 7 Examples

window functions

Table of Contents

The OVER() Clause in SQL
Ready to Practice the SQL OVER() Clause?

Learn how to use the OVER() clause in your SQL queries and you will take a step forward in your SQL knowledge.

Most data analysts and SQL developers have a good handle on the SELECT, FROM, WHERE, GROUP BY, and ORDER BY clauses. However, these clauses are only a limited subset of the SQL language. There are other clauses, such as OVER(), which allow us to create flexible SQL reports and add an enormous expressive power to the SQL language.

In this article, we are going to explain the OVER() clause by demonstrating several examples of its usage. Let’s get started!

The OVER() Clause in SQL

SQL has a fairly easy learning curve; anyone can learn how to create simple queries in SQL. In fact, the average IT pro or SQL developer handles the basic SELECT, FROM, WHERE, GROUP BY and ORDER BY clauses with ease. These clauses were available in SQL-92, which is 30 years old! However, the SQL language does not end there; many new elements have been added since 1992. One of them is the OVER() clause, which allows us to use window functions in SQL queries.

In SQL, window functions are similar to GROUP BY in that they work on a group of rows. However, window functions are based on a data window, or a set of rows related to the current row. Unlike GROUP BY, window functions do not collapse rows; they keep the details of individual rows intact.

For those people who want to go deeper into OVER() and window functions in SQL, I suggest our Window Functions online course. It contains plenty of examples using different window functions.

Ok, now let’s shift back to the OVER() function. First, let’s look at our data.

Introducing the Data

Our example queries will be based on the following data.

Suppose we have a seafood restaurant on the Mediterranean coast. We also have a database table where we store the history of the restaurant’s activity. Our database has a table called restaurant_activity with these columns:

table_number
waiter_name
start_date
start_time
served_time
end_time
total_diners
amount_payment
total_tips.

Below is an example of the data:

Restaurant_activity

table_number	waiter_name	start_date	start_ time	served_time	end_time	total_diners	amount_payment	total_tips
1	John	5/5/2022	11:03	11:17	11:45	3	350.00	37
2	Peter	5/5/2022	11:10	11:32	13:10	4	430.50	50
3	Mary	5/5/2022	11:30	12:05	12:40	2	260.35	20
1	John	5/5/2022	12:00	12:38	13:10	4	670.12	30
3	Mary	5/5/2022	12:43	13:12	13:50	3	320.50	20
2	Peter	6/5/2022	11:10	11:21	11:40	5	560.75	60
3	Mary	6/5/2022	11:40	11:53	12:40	3	240.10	25
1	John	6/5/2022	11:30	11:53	12:30	1	150.00	10
3	Mary	6/5/2022	14:10	14:20	14:40	1	240.10	25
1	Mary	6/5/2022	14:30	14:35	14:50	2	150.00	30

This should all be self-explanatory, but let’s quickly run through a few columns. The start_date and start_time is the date and time when a meal was ordered by the customers at table_number; served_time is when the meal was served, and end_time is when the customers asked for their bill. The amount_payment is the cost of the meal, not including the tip (which is stored as total_tips).

Example Query #1: A Simple Use of OVER()

Let’s start by using the OVER() clause to calculate each order’s percentage of the total daily revenue on 5/5/2022. We can calculate the total revenue of the restaurant on this day (excluding tips) using OVER() combined with the SUM() window function. The OVER() clause is always associated with a window function; window functions calculate a value based on a set of records defined by the OVER() clause. The query is:

SELECT 
  start_date AS date,
  SUM(amount_payment) OVER () AS daily_revenue,
  amount_payment AS total_order,
  (amount_payment / SUM(amount_payment) OVER ())*100 AS order_percentage
FROM   restaurant_activity
WHERE  start_date = ’5/5/2022’

The previous query calculates the daily_revenue for 5/5/2022 by adding up each payment amount for this day. The OVER() clause is empty, meaning that the window of records used to calculate the SUM() function is the complete set of records returned by the query. In other words, the window is composed of all records for this date.

In the calculation of column order_percentage, we divide the individual order amount by the total daily revenue to obtain the percentage; this is a central point in the query, because we are combining row-level columns with window function results in one expression.

Understanding what records are part of the window is central to understanding how window functions work; we’ll return to this point later in the article. For now, let’s look at the results:

date	daily_revenue	total_order	order_percentage
2022-05-05	2031.47	350.00	17.23
2022-05-05	2031.47	430.50	21.19
2022-05-05	2031.47	260.35	12.82
2022-05-05	2031.47	670.12	32.99
2022-05-05	2031.47	320.50	15.78

Before going into more complex OVER() examples, I’d like to suggest our Window Functions Practice Set. If you want to really learn how to use OVER(), these 100 interactive exercises will do the trick.

Example Query #2: Using the PARTITION BY Subclause

In this example, we will use the PARTITION BY subclause; it works with OVER() to define data windows.

Let’s suppose we want a report similar to the previous one, but expanded to any day of May 2022. For each day, we want to see the total daily revenue, the individual order amount for all orders, and what percentage of the daily revenue is attributed to each order. Here is the query:

SELECT start_date AS date,
  		SUM(amount_payment) OVER (PARTITION BY start_date) AS daily_revenue,
  		amount_payment AS total_order,
  		(amount_payment / SUM(amount_payment) OVER (PARTITION BY start_date)) * 100 AS
    order_percentage
FROM restaurant_activity
WHERE start_date BETWEEN ’5/1/2022’ AND ’5/31/2022’

We mentioned that window functions work based on a window of records (or a set of records) related to the current record. The previous query uses the subclause PARTITION BY start_date to define which records belong in each window. In our example, all records with the same start_date value as the current row will be part of the window. As we have only two unique values for start_date, we have only two different windows. They are shown in green and red in the result below:

date	daily_revenue	total_order	order_percentage
5/5/2022	2031.47	350.00	17.23
5/5/2022	2031.47	430.50	21.19
5/5/2022	2031.47	260.35	12.82
5/5/2022	2031.47	670.12	32.99
5/5/2022	2031.47	320.50	15.78
5/6/2022	1340.95	560.75	41.82
5/6/2022	1340.95	240.10	17.91
5/6/2022	1340.95	150.00	11.19
5/6/2022	1340.95	240.10	17.91
5/6/2022	1340.95	150.00	11.19

Example Query #3: Using the ORDER BY Subclause

Now let’s introduce the ORDER BY subclause, which allows you to put rows in a specific order within the window.

Let’s suppose we want a simple report with the five most expensive orders along with the waiter, date, and table number. The query is:

WITH ranking AS (
	SELECT 
table_number, 
amount_payment, 
waiter_name, 
start_date, 
	       RANK() OVER (ORDER BY amount_payment DESC) AS position
	FROM restaurant_activity
)
SELECT 
  amount_payment, 
  waiter_name, 
  start_date, 
  table_number, 
  position
FROM ranking
WHERE position <= 5
ORDER BY position

Here, we used a CTE (or Common Table Expression) to build the ranking. In the CTE, we calculated the position of each order by using the window function RANK() combined with the following OVER() clause:

RANK() OVER (ORDER BY amount_payment DESC)

The above clause defines a window of records which is formed by all records in the table, so all orders of the restaurant are included. This window is sorted by amount_payment in descending order: the largest amount is the first record in the window, and so on. The RANK() function returns the position of the current row in the ordered window, storing this value in the position column of the CTE.

After we build the ranking CTE, the rest of the query uses ranking like any other table. We filter records to obtain only the first 5 ranks, then we order the results by position. This is what we get:

amount_payment	waiter_name	start_date	table_number	position
670.12	John	2022-05-05	1	1
560.75	Peter	2022-05-06	2	2
430.50	Peter	2022-05-05	2	3
350.00	John	2022-05-05	1	4
320.50	Mary	2022-05-05	3	5

At this point, I would like to suggest the article SQL Window Functions vs. GROUP BY: What's the Difference?. It explains the differences between the GROUP BY clause and window functions through several SQL query examples. Another interesting article with additional details is What Is the OVER() Clause in SQL?.

Example Query #4: Computing Table Free Time with OVER() and LAG()

Window functions include many ways to make complex calculations easier. One of them is the LAG() function, which returns a value from any column in the previous row related to the current row of the window.

The owner of the restaurant wants to know how much time the tables are free, e.g. the time that elapses between customers. We can do this by creating a report with the columns table_number, date, free_start, free_end, and free_time_duration.

To calculate the free time, we need to access two rows. We need the end_time of the previous occupation and the start_time of the next occupation; then we can calculate the elapsed time between the two. Here is where the LAG() window function comes in, because LAG() allows access to any column in the previous record. Here’s the query we’d use:

    SELECT 
start_date AS date,
table_number,
-- ending time of the previous occupation 
COALESCE(LAG(end_time) OVER (PARTITION BY start_date, table_number 
                             ORDER BY start_time),'11:00') AS start_free_time,
-- starting time of current occupation
start_time AS end_free_time,
-- calculating the free time when the table was unoccupied
start_time - 
COALESCE(LAG(end_time) OVER (PARTITION BY start_date, table_number 
                             ORDER BY start_time),'11:00') AS free_time_duration
    FROM   restaurant_activity

The key in the previous query is the LAG() window function. We used it to obtain the end_time of the previous occupation. The clause …

OVER (PARTITION BY start_date, table_number ORDER BY start_time)

… defines one window (or set of rows) for each distinct pair of <start_date, table_number>, and each of these windows are ordered by start_time. Then LAG(end_time) returns the ending time of the previous occupation of the table.

You’ve probably noticed we used LAG() two times. The first one is used to obtain the starting time of the free period, and the second is to calculate the duration of the free time using the following expression:

start_time - 
coalesce(LAG(end_time) OVER (PARTITION BY start_date, table_number 
                             ORDER BY start_time),'11:00')

The results of the query are:

date	table_number	start_free_time	end_free_time	free_time_duration
5/5/2022	1	11:00:00	11:03:00	00:03:00
5/5/2022	1	11:45:00	12:00:00	00:15:00
5/5/2022	2	11:00:00	11:10:00	00:10:00
5/5/2022	3	11:00:00	11:30:00	00:30:00
5/5/2022	3	12:40:00	12:43:00	00:03:00
6/5/2022	1	11:00:00	11:30:00	00:30:00
6/5/2022	1	12:30:00	14:30:00	02:00:00
6/5/2022	2	11:00:00	11:10:00	00:10:00
6/5/2022	3	11:00:00	11:40:00	00:40:00
6/5/2022	3	12:40:00	14:10:00	01:30:00

Before going to the next section, I suggest the following articles for more details:

Example Query #5: Computing Rankings with the OVER() Clause

In this section, we will cover a business situation where the SQL OVER() clause can be applied to create a ranking like the top 10 salespeople or 5 best-selling products. You can use the OVER() clause combined with the RANK() window function to obtain this kind of report. Let’s see an example query that will return the two largest tips of the day and the waiter receiving them:

SELECT  *
FROM (
  		SELECT 	waiter_name,
  			start_date AS date,
			total_tips AS tip_amount,
			RANK() OVER(PARTITION BY start_date ORDER BY total_tips DESC) AS ranking
  	  	FROM restaurant_activity
     	     ) AS ranking_table
WHERE ranking <= 2;

We used a subquery in the FROM clause to create a temporary “table” called ranking_table. It uses the column ranking to store the position of the tip in the daily tip ranking. The position in the ranking is calculated with the RANK() window function. Other columns in the temporary table are waiter_name, date, and tip_amount.

In the outer query, we filter only the tips ranked 1 and 2. The result of the query is shown below:

waiter_name	date	tip_amount	ranking
John	5/5/2022	50	1
John	5/5/2022	37	2
Peter	6/5/2022	60	1
Mary	6/5/2022	30	2

Example Query #6: Computing Differences from a Previous Period

Another interesting possibility is showing the difference in value from the previous time period and the current one. The OVER() clause combined with window functions like LEAD() and LAG() are used to create this kind of report. For additional background, see How to Calculate the Difference Between Two Rows in SQL.

Let’s suppose we want to see each waiter’s daily revenue alongside the difference from the previous day. In the same row, we also want to see the difference expressed as a percentage. Here’s the query:

SELECT	
  waiter_name,
  date,
  today_revenue,
  -- revenue variation ----------------------------------------------
  LAG(today_revenue) OVER (
    PARTITION BY waiter_name ORDER BY date) yesterday_revenue,
  today_revenue - LAG(today_revenue) OVER (
    PARTITION BY waiter_name ORDER BY date) AS revenue_variation,
  -- -----------------------------------------------------------------
  -- revenue variation percentage ------------------------------------------------
  round((today_revenue - 
   LAG(today_revenue) OVER (PARTITION BY waiter_name ORDER BY date)) /
   LAG(today_revenue) OVER (PARTITION BY waiter_name ORDER BY date),2) * 100 
AS revenue_variation_percentage
  -- -----------------------------------------------------------------------------		
FROM (
  SELECT  DISTINCT 
    start_date::date AS "date",
    waiter_name,
    SUM(total_tips) OVER ( 
      PARTITION BY waiter_name, start_date::date ORDER BY start_date)
AS today_revenue
  FROM 	restaurant_activity
) AS daily_revenue_per_waiter;

We’ve created a subquery in the FROM clause called daily_revenue_per_waiter that has the waiter_name, date, and the total revenue made by this waiter on this day. In the outer query, we use the LAG() window function to obtain the revenue in the previous day, then we obtain the variation of the revenue between yesterday and today and the percentage variation. The result is:

waiter_name	date	today_revenue	yesterday_revenue	revenue_variation	revenue_variation_percentage
John	2022-05-05	67	null	null	null
John	2022-05-06	10	67	-57	-85.00
Mary	2022-05-05	40	null	null	null
Mary	2022-05-06	80	40	40	100.00
Peter	2022-05-05	50	null	null	null
Peter	2022-05-06	60	50	10	20.00

Example Query #7: Moving Average

Moving averages are a metric that’s frequently used to smooth out short-term fluctuations. You can learn more in the article What a Moving Average Is and How to Compute it in SQL; let’s see how we can build one using the OVER() clause.

As an example, let’s suppose the restaurant owner wants to know the average amount paid per person on the last table served at any given time. He uses this average as a metric to know how much money the diners are paying and to activate some promotions or discounts. We can easily calculate this average with the expression amount_payment/total_diners; however, the owner has realized that this metric has major fluctuations, so he decided to use the average amount paid per person on the last 3 and 6 tables served. The query to calculate this report is:

SELECT start_date AS "date",
start_time AS "time",
table_number,
amount_payment AS total_amount,
total_diners,
       	ROUND(amount_payment/total_diners,2) AS diner_avg,
ROUND(AVG(amount_payment/total_diners) 
     OVER (ORDER BY start_date, start_time ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),2) 
     AS diner_moving_avg_last_3_tables_served,
	ROUND(AVG(amount_payment/total_diners) 
     OVER (ORDER BY start_date, start_time ROWS BETWEEN 5 PRECEDING AND CURRENT ROW),2) 
     AS diner_moving_avg_last_6_tables_served
FROM   restaurant_activity

The previous query calculates 3 different averages. The first one is a simple average based on the expression:

ROUND(amount_payment/total_diners,2)

The second average is the moving average for the last 3 tables served; the third average is just the same, but for the last 6 tables served:

ROUND(AVG(amount_payment/total_diners) 
     OVER (ORDER BY start_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),2)
…
OVER (ORDER BY start_date, start_time ROWS BETWEEN 5 PRECEDING AND CURRENT ROW),2)

The term “ROWS BETWEEN 2 PRECEDING AND CURRENT ROW” specifies that the average is calculated based on 3 rows: the current row and the 2 rows immediately preceding it. The window is ordered by table starting time. Here’s the result:

date	time	table_number	total_amount	diners	diner_avg	diner_moving_avg_last_3_tables_served	diner_moving_avg_last_6_tables_served
2022-05-05	11:03	1	350.00	3	116.67	116.67	116.67
2022-05-05	11:10	2	430.50	4	107.63	112.15	112.15
2022-05-05	11:30	3	260.35	2	130.18	118.16	118.16
2022-05-05	12:00	1	670.12	4	167.53	135.11	130.50
2022-05-05	12:43	3	320.50	3	106.83	134.85	125.77
2022-05-06	11:10	2	560.75	5	112.15	128.84	123.50
2022-05-06	11:40	3	240.10	3	80.03	99.67	117.39
2022-05-06	11:30	1	150.00	1	150.00	114.06	124.45
2022-05-06	14:10	3	240.10	1	240.10	156.71	142.77
2022-05-06	14:30	1	150.00	2	75.00	155.03	127.35

Other business uses of the OVER() clause include computing running totals (useful in all kinds of financial analysis scenarios) and calculating the length of a data series.

Ready to Practice the SQL OVER() Clause?

We’ve demonstrated several queries featuring the SQL OVER() clause. As OVER() must be used combined with a window function, we covered a few of those, too: SUM(), AVG(), LAG() and RANK().

If you want to put your new knowledge of the SQL OVER() clause into practice, I recommend our interactive Window Functions course followed by our Window Functions Practice Set. You can read about the course in the article SQL Course of the Month – Window Functions. You can also get a copy of our free SQL Window Functions Cheat Sheet to help you on your learning path. Develop your skills and increase your assets!

Tags:

window functions