Back to articles list Articles Cookbook
10 minutes read

How to Use the SQL PARTITION BY With OVER

At the heart of every window function call is an OVER clause that defines how the windows of the records are built. Within the OVER clause, there may be an optional PARTITION BY subclause that defines the criteria for identifying which records to include in each window. Read on and take an important step in growing your SQL skills!

What Is the PARTITION BY Clause in SQL?

The SQL PARTITION BY expression is a subclause of the OVER clause, which is used in almost all invocations of window functions like AVG(), MAX(), and RANK(). As many readers probably know, window functions operate on window frames which are sets of rows that can be different for each record in the query result. This is where the SQL PARTITION BY subclause comes in: it is used to define which records to make part of the window frame associated with each record of the result.

This article explains the SQL PARTITION BY and its uses with examples. Since it is deeply related to window functions, you may first want to read some articles on window functions, like “SQL Window Function Example With Explanations” where you find a lot of examples. If you want to learn more about window functions, there is also an interesting article with many pointers to other window functions articles.

The first thing to focus on is the syntax. Here’s how to use the SQL PARTITION BY clause:

SELECT
    ,
     OVER(PARTITION BY  [ORDER BY ])
FROM table;

Let’s look at an example that uses a PARTITION BY clause. We will use the following table called car_list_prices:

car_makecar_modelcar_typecar_price
FordMondeopremium18200
RenaultFuegosport16500
CitroenCactuspremium19000
FordFalconlow cost8990
FordGalaxystandard12400
RenaultMeganestandard14300
CitroenPicassopremium23400

For each car, we want to obtain the make, the model, the price, the average price across all cars, and the average price over the same type of car (to get a better idea of how the price of a given car compared to other cars). Here’s the query:

SELECT
    car_make,
    car_model,
    car_price,
    AVG(car_price) OVER() AS "overall average price",
    AVG(car_price) OVER (PARTITION BY car_type) AS "car type average price"
FROM car_list_prices

The result of the query is the following:

car_makecar_modelcar_priceoverall average pricecar type average price
FordMondeo1820016112.858990.00
RenaultFuego1650016112.8520200.00
CitroenCactus1900016112.8520200.00
FordFalcon899016112.8520200.00
FordGalaxy1240016112.8516500.00
RenaultMegane1430016112.8513350.00
CitroenPicasso2340016112.8513350.00

The above query uses two window functions. The first is used to calculate the average price across all cars in the price list. It uses the window function AVG() with an empty OVER clause as we see in the following expression:

AVG(car_price) OVER() AS "overall average price"

The second window function is used to calculate the average price of a specific car_type like standard, premium, sport, etc. This is where we use an OVER clause with a PARTITION BY subclause as we see in this expression:

AVG(car_price) OVER (PARTITION BY car_type) AS "car type average price"

The window functions are quite powerful, right? If you’d like to learn more by doing well-prepared exercises, I suggest the course Window Functions, where you can learn about and become comfortable with using window functions in SQL databases.

Going Deep With the SQL PARTITION BY Clause

The GROUP BY clause groups a set of records based on criteria. This allows us to apply a function (for example, AVG() or MAX()) to groups of records to yield one result per group.

As an example, say we want to obtain the average price and the top price for each make. Use the following query:

SELECT car_make,
       AVG(car_price) AS average_price,
       MAX(car_price) AS top_price
FROM   car_list_prices
GROUP BY car_make

Here is the result of this query:

car_makeaverage_pricetop_price
Ford1319618200
Renault1540016500
Citroen2120023400

Compared to window functions, GROUP BY collapses individual records into a group. As a consequence, you cannot refer to any individual record field; that is, only the columns in the GROUP BY clause can be referenced.

For example, say you want to create a report with the model, the price, and the average price of the make. You cannot do this by using GROUP BY, because the individual records of each model are collapsed due to the clause GROUP BY car_make. For something like this, you need to use window functions, as we see in the following example:

SELECT car_make,
       car_model,
       car_price,
       AVG(car_price) OVER (PARTITION BY car_make) AS average_make
FROM   car_list_prices

The result of this query is the following:

car_makecar_modelcar_priceaverage_make
CitroenPicasso2340021200
CitroenCactus1900021200
FordGalaxy1240013196
FordFalcon899013196
FordMondeo1820013196
RenaultMegane1430015400
RenaultFuego1650015400

For those who want to go deeper, I suggest the article ““What Is the Difference Between a GROUP BY and a PARTITION BY?” with plenty of examples using aggregate and window functions.

In addition to the PARTITION BY clause, there is another clause called ORDER BY that establishes the order of the records within the window frame. Some window functions require an ORDER BY. For example, the LEAD() and the LAG() window functions need the record window to be ordered since they access the preceding or the next record from the current record.

A window frame is composed of several rows defined by the criteria in the PARTITION BY clause. However, we can specify limits or bounds to the window frame as we see in the following image:

How to Use the SQL PARTITION BY With OVER

The lower and upper bounds in the OVER clause may be:

  • UNBOUNDED PRECEDING
  • n PRECEDING
  • CURRENT ROW
  • n FOLLOWING
  • UNBOUNDED FOLLOWING

When we do not specify any bound in an OVER clause, its window frame is built based on some default boundary values. They depend on the syntax used to call the window function. The following table shows the default bounds of the window frame.

Syntax usedFirst Row in WindowLast Row in Window
Just empty OVER() clauseUNBOUNDED PRECEDINGUNBOUNDED FOLLOWING
OVER(PARTITION BY …)UNBOUNDED PRECEDINGUNBOUNDED FOLLOWING
OVER(PARTITION BY … ORDER BY …)UNBOUNDED PRECEDINGCURRENT ROW

There is a detailed article called “SQL Window Functions Cheat Sheet” where you can find a lot of syntax details and examples about the different bounds of the window frame.

The SQL PARTITION BY Clause in Action

In this section, we show some examples of the SQL PARTITION BY clause. All are based on the table paris_london_flights, used by an airline to analyze the business results of this route for the years 2018 and 2019. Here’s a subset of the data:

aircraft_makeaircarft_modelflight_numberscheduled_departurereal_departurescheduled_arrivalnum_of_passengerstotal_revenue
Boeing757 300FLP0032019-01-30 15:00:002019-01-30 15:00:002019-01-30 15:00:0026082630.10
Boeing737 200FLP0032019-02-01 15:00:002019-02-01 15:10:002019-02-01 15:55:0019558459.34
AirbusA500FLP0032019-02-01 15:00:002019-02-01 15:03:002019-02-01 15:03:5531291570.87
AirbusA500FLP0012019-10-28 05:00:002019-10-28 05:04:002019-10-28 05:55:0029887943.00
Boeing737 200FLP0022019-10-28 09:00:002019-10-28 09:00:002019-10-28 09:55:0017856342.45

Example 1

The first query generates a report including the flight_number, aircraft_model with the quantity of passenger transported, and the total revenue. The query is below:

SELECT DISTINCT
       flight_number,
       aircraft_model,
	SUM(num_of_passengers) OVER (PARTITION BY flight_number, aircraft_model)
                                                            AS total_passengers,
	SUM(total_revenue) OVER (PARTITION BY flight_number, aircraft_model)
                                                            AS total_revenue
FROM paris_london_flights
ORDER BY flight_number, aircraft_model;

Since the total passengers transported and the total revenue are generated for each possible combination of flight_number and aircraft_model, we use the following PARTITION BY clause to generate a set of records with the same flight number and aircraft model:

OVER (PARTITION BY flight_number, aircraft_model)

Then, for each set of records, we apply window functions SUM(num_of_passengers) and SUM(total_revenue) to obtain the metrics total_passengers and total_revenue shown in the next result set.

flight_numberaircraft_modeltotal_passengerstotal_revenue
FLP001737 200204816016060.82
FLP001757 300183895361126.23
FLP001Airbus A5005387215892165.58
FLP002737 200216606297197.71
FLP002757 300168694951475.86
FLP002Airbus A5005462716004812.16
FLP003737 200200985874892.44
FLP003757 300157084573379.28
FLP003Airbus A5005753316712475.04

Example 2

In the next query, we show how the business evolves by comparing metrics from one month with those from the previous month. We create a report using window functions to show the monthly variation in passengers and revenue.

WITH year_month_data AS (
  SELECT DISTINCT
	   EXTRACT(YEAR FROM scheduled_departure) AS year,
	   EXTRACT(MONTH FROM scheduled_departure) AS month,
	   SUM(number_of_passengers)
              OVER (PARTITION BY EXTRACT(YEAR FROM scheduled_departure),
                                  EXTRACT(MONTH FROM scheduled_departure)
                   ) AS passengers
   FROM  paris_london_flights
  ORDER BY 1, 2
)
SELECT  year,
        month,
	 passengers,
	 LAG(passengers) OVER (ORDER BY year, month) passengers_previous_month,
	 passengers - LAG(passengers) OVER (ORDER BY year, month) AS passengers_delta
FROM year_month_data;

In the query above, we use a WITH clause to generate a CTE (CTE stands for common table expressions and is a type of query to generate a virtual table that can be used in the rest of the query). We populate data into a virtual table called year_month_data, which has 3 columns: year, month, and passengers with the total transported passengers in the month.

Then, the second query (which takes the CTE year_month_data as an input) generates the result of the query. The column passengers contains the total passengers transported associated with the current record. With the LAG(passenger) window function, we obtain the value of the column passengers of the previous record to the current record. We ORDER BY year and month:

LAG(passengers) OVER (ORDER BY year, month)
passengers_previous_month

It obtains the number of passengers from the previous record, corresponding to the previous month. Then, we have the number of passengers for the current and the previous months. Finally, in the last column, we calculate the difference between both values to obtain the monthly variation of passengers.

yearmonthpassengerspassengers_previous_monthpassengers_delta
20181211469nullnull
20191247231146913254
201922253624723-2187
2019324994225362458
201942440824994-586
201952399824408-410
201962379323998-205
2019724816237931023
201982433424816-482
201992371924334-615
20191024989237191270
2019112437124989-618
201912108724371-23284

Example 3

For our last example, let’s look at flight delays. We want to obtain different delay averages to explain the reasons behind the delays.

We use a CTE to calculate a column called month_delay with the average delay for each month and obtain the aircraft model. Then in the main query, we obtain the different averages as we see below:

WITH paris_london_delays AS (
  SELECT DISTINCT
	   aircraft_model,
	   EXTRACT(YEAR FROM scheduled_departure) AS year,
	   EXTRACT(MONTH FROM scheduled_departure) AS month,
	   AVG(real_departure - scheduled_departure) AS month_delay
   FROM  paris_london_flights
   GROUP BY 1, 2, 3
)
SELECT  DISTINCT
     aircraft_model,
     year,
     month,
     month_delay AS monthly_avg_delay,
     AVG(month_delay) OVER (PARTITION BY aircraft_model, year) AS year_avg_delay,
     AVG(month_delay) OVER (PARTITION BY year) AS year_avg_delay_all_models,
     AVG(month_delay) OVER (PARTITION BY aircraft_model, year 
                               ORDER BY month
                               ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
                            ) AS rolling_average_last_4_months

FROM paris_london_delays
ORDER BY 1,2,3

This query calculates several averages. The first is the average per aircraft model and year, which is very clear. The second is the average per year across all aircraft models. Note we only use the column year in the PARTITION BY clause. The third and last average is the rolling average, where we use the most recent 3 months and the current month (i.e., row) to calculate the average with the following expression:

AVG(month_delay) OVER (PARTITION BY aircraft_model, year
                               ORDER BY month
                               ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
                           ) AS rolling_average_last_4_months

The clause ROWS BETWEEN 3 PRECEDING AND CURRENT ROW in the PARTITION BY restricts the number of rows (i.e., months) to be included in the average: the previous 3 months and the current month. You can see a partial result of this query below:

aircraft_modelyearmonthmonth_delayyear_avg_delayyear_avg_delay_all_modelsrolling_average_last_4_months
737 20020181200:02:13.8400:02:13.8400:03:13.7000:02:13.84
737 2002019100:02:16.8000:02:36.5900:02:34.1200:02:16.80
737 2002019200:02:35.0000:02:36.5900:02:34.1200:02:25.90
737 2002019300:01:38.4000:02:36.5900:02:34.1200:02:10.06
737 2002019400:04:00.0000:02:36.5900:02:34.1200:02:37.55
737 2002019500:03:12.7200:02:36.5900:02:34.1200:02:51.53
737 2002019600:02:21.4200:02:36.5900:02:34.1200:02:48.13

The article “The RANGE Clause in SQL Window Functions: 5 Practical Examples” explains how to define a subset of rows in the window frame using RANGE instead of ROWS, with several examples. Another interesting article is “Common SQL Window Functions: Using Partitions With Ranking Functions” in which the PARTITION BY clause is covered in detail.

The Power of Window Functions and the SQL PARTITION BY

Window functions are a very powerful resource of the SQL language, and the SQL PARTITION BY clause plays a central role in their use. In this article, we have covered how this clause works and showed several examples using different syntaxes.

Before closing, I suggest an Advanced SQL course, where you can go beyond the basics and become a SQL master. If you want to read about the OVER clause, there is a complete article about the topic: “How to Define a Window Frame in SQL Window Functions.” Improve your skills and grow your assets!