8th Nov 2022 9 minutes read

How to Use the PARTITION BY Clause in SQL

Table of Contents

PARTITION BY Syntax
PARTITION BY Examples
When to Use PARTITION BY
PARTITION BY Must’ve Tickled Your Curiosity

We’ll be dealing with the window functions today. Specifically, we’ll focus on the PARTITION BY clause and explain what it does.

PARTITION BY is one of the clauses used in window functions. In SQL, window functions are used for organizing data into groups and calculating statistics for them. It sounds awfully familiar, doesn’t it? Even though they sound similar, window functions and GROUP BY are not the same; window functions are more like GROUP BY on steroids. Why? Because window functions keep the details of individual rows while calculating statistics for the row groups. GROUP BY can’t do that!

PARTITION BY is crucial for that distinction; this is the clause that divides a window function result into data subsets or partitions. In a way, it’s GROUP BY for window functions. You’ll soon learn how it works.

There’s a much more comprehensive (and interactive) version of this article – our Window Functions course. It covers everything we’ll talk about and plenty more. Through its interactive exercises, you will learn all you need to know about window functions. You’ll go through the OVER(), PARTITION BY, and ORDER BY clauses and learn how to use ranking and analytics window functions. The course also gives you 47 exercises to practice and a final quiz. If you’re indecisive, here’s why you should learn window functions.

Now, let’s talk about PARTITION BY!

PARTITION BY Syntax

The syntax for the PARTITION BY clause is:

SELECT column_name,
  	 window_function (expression) OVER (PARTITION BY column name)
FROM table;

In the window_function part, you put the specific window function.

The OVER() clause is a mandatory clause that makes the window function work. It virtually defines the window function.

The PARTITION BY subclause is followed by the column name(s). The column(s) you specify in this clause will be the partitions/groups into which the window function results will be grouped.

The following examples will make this clearer. We know you can’t memorize everything immediately, so feel free to keep our SQL Window Functions Cheat Sheet nearby as we go through the examples. It’s a handy reminder of different window functions and their syntax.

PARTITION BY Examples

The example dataset consists of one table, employees. Here are its columns:

id – The employee’s ID.
first_name – The employee’s first name.
last_name – The employee’s last name.
job_title – The employee’s job title.
department – The employee’s department.
date_of_employment – The date when the employee’s employment started.
salary – The employee’s salary.

Have a look at the table data before we start writing the code:

id	first_name	last_name	job_title	department	date_of_employment	salary
1	Bob	Mendelsohn	Data Analyst	Risk Management	2020-09-25	5,412.47
2	Frances	Jackson	Data Analyst	Marketing	2020-04-24	4,919.34
3	Franck	Monteblanc	Data Scientist	Marketing	2021-03-18	7,519.34
4	Patricia	King	Data Scientist	Risk Management	2020-03-05	7,871.69
5	Willie	Hayes	Statistician	Risk Management	2021-07-09	6,995.87
6	Simone	Hill	Statistician	Marketing	2021-05-09	6,815.67
7	Walter	Tyson	Database Administrator	IT	2022-08-12	7,512.14
8	Ines	Owen	Database Administrator	IT	2021-09-15	8,105.41
9	Carolina	Oliveira	Data Engineer	IT	2022-09-15	8,410.57
10	Sean	Rice	System Analyst	IT	2022-01-19	6,518.22

If you wish to follow along by writing your own SQL queries, here’s the code for creating this dataset.

Using OVER (PARTITION BY)

Now it’s time that we show you how PARTITION BY works on an example or two.

Example #1

In the first example, the goal is to show the employees’ salaries and the average salary for each department. If you were paying attention, you already know how PARTITION BY can help us here:

SELECT first_name,
	 last_name,
	 job_title,
	 department,
	 salary,
	 AVG(salary) OVER (PARTITION BY department) AS average_salary_by_department
FROM employees;

To calculate the average, you need to use the AVG() aggregate function. Write the column salary in the parentheses. This is, for now, an ordinary aggregate function. To make it a window aggregate function, write the OVER() clause.

Now, remember that we don’t need the total average (i.e. for the whole company) but the average by department. To have this metric, put the column department in the PARTITION BY clause.

Does this return the desired output? Let’s see!

first_name	last_name	job_title	department	salary	average_salary_by_department
Carolina	Oliveira	Data Engineer	IT	8,410.57	7,636.59
Ines	Owen	Database Administrator	IT	8,105.41	7,636.59
Walter	Tyson	Database Administrator	IT	7,512.14	7,636.59
Sean	Rice	System Analyst	IT	6,518.22	7,636.59
Simone	Hill	Statistician	Marketing	6,815.67	6,418.12
Frances	Jackson	Data Analyst	Marketing	4,919.34	6,418.12
Franck	Monteblanc	Data Scientist	Marketing	7,519.34	6,418.12
Bob	Mendelsohn	Data Analyst	Risk Management	5,412.47	6,760.01
Willie	Hayes	Statistician	Risk Management	6,995.87	6,760.01
Patricia	King	Data Scientist	Risk Management	7,871.69	6,760.01

You can see that the output lists all the employees and their salaries. For the IT department, the average salary is 7,636.59. This value is repeated for all IT employees.

When we arrive at employees from another department, the average changes. In this case, it’s 6,418.12 in Marketing. Following this logic, the average salary in Risk Management is 6,760.01.

As you can see, PARTITION BY instructed the window function to calculate the departmental average.

How does this differ from GROUP BY? Let’s see what happens if we calculate the average salary by department using GROUP BY.

department	average_salary_by_department
Risk Management	6,760.01
Marketing	6,418.12
IT	7,636.59

As you can see, you can get all the same average salaries by department. However, one huge difference is you don’t get the individual employee’s salary. You can expand on this difference by reading an article about the difference between PARTITION BY and GROUP BY.

Example #2

Now we want to show all the employees’ salaries along with the highest salary by job title.

The query is very similar to the previous one. The only two changes are the aggregate function and the column in PARTITION BY.

SELECT first_name,
	 last_name,
	 job_title,
	 department,
	 salary,
	 MAX(salary) OVER (PARTITION BY job_title) AS max_salary_by_job_title
FROM employees;

This time, we use the MAX() aggregate function and partition the output by job title.

Here’s the result:

first_name	last_name	job_title	department	salary	max_salary_by_job_title
Bob	Mendelsohn	Data Analyst	Risk Management	5,412.47	5,412.47
Frances	Jackson	Data Analyst	Marketing	4,919.34	5,412.47
Carolina	Oliveira	Data Engineer	IT	8,410.57	8,410.57
Patricia	King	Data Scientist	Risk Management	7,871.69	7,871.69
Franck	Monteblanc	Data Scientist	Marketing	7,519.34	7,871.69
Ines	Owen	Database Administrator	IT	8,105.41	8,105.41
Walter	Tyson	Database Administrator	IT	7,512.14	8,105.41
Simone	Hill	Statistician	Marketing	6,815.67	6,995.87
Willie	Hayes	Statistician	Risk Management	6,995.87	6,995.87
Sean	Rice	System Analyst	IT	6,518.22	6,518.22

Take a look at the first two rows. Bob Mendelsohn and Frances Jackson are data analysts working in Risk Management and Marketing, respectively. The table shows their salaries and the highest salary for this job position. It’s 5,412.47, Bob Mendelsohn’s salary.

The same logic applies to the rest of the results. Of course, when there’s only one job title, the employee’s salary and maximum job salary for that job title will be the same. That’s the case for the data engineer and the system analyst.

This example can also show the limitations of GROUP BY.

The code below will show the highest salary by the job title:

SELECT job_title,
	 MAX(salary) AS max_salary_by_job_title
FROM employees
GROUP BY job_title;

And here’s the output:

job_title	max_salary_by_job_title
Data Scientist	7,871.69
Statistician	6,995.87
System Analyst	6,518.22
Data Engineer	8,410.57
Data Analyst	5,412.47
Database Administrator	8,105.41

Yes, the salaries are the same as with PARTITION BY. But with this result, you have no idea what every employee’s salary is and who has the highest salary.

Using OVER (ORDER BY)

The ORDER BY clause is another window function subclause. It orders data within a partition or, if the partition isn’t defined, the whole dataset.

When we say order, we don’t mean the output. When used with window functions, the ORDER BY clause defines the order in which a window function will perform its calculation.

ORDER BY can be used with or without PARTITION BY.

Let’s first see how it works without PARTITION BY. We’ll use it to show employees’ data and rank them by their employment date. The ranking will be done from the earliest to the latest date.

SELECT first_name,
	 last_name,
	 job_title,
	 department,
	 date_of_employment,
	 RANK() OVER (ORDER BY date_of_employment ASC) AS employment_date_rank
FROM employees;

The window function we use now is RANK(). It’s one of the functions used for ranking data. Again, the OVER() clause is mandatory.

The ORDER BY clause tells the ranking function to assign ranks according to the date of employment in descending order.

Run the query and you’ll get this output:

first_name	last_name	job_title	department	date_of_employment	employment_date_rank
Patricia	King	Data Scientist	Risk Management	2020-03-05	1
Frances	Jackson	Data Analyst	Marketing	2020-04-24	2
Bob	Mendelsohn	Data Analyst	Risk Management	2020-09-25	3
Franck	Monteblanc	Data Scientist	Marketing	2021-03-18	4
Simone	Hill	Statistician	Marketing	2021-05-09	5
Willie	Hayes	Statistician	Risk Management	2021-07-09	6
Ines	Owen	Database Administrator	IT	2021-09-15	7
Sean	Rice	System Analyst	IT	2022-01-19	8
Walter	Tyson	Database Administrator	IT	2022-08-12	9
Carolina	Oliveira	Data Engineer	IT	2022-09-15	10

All the employees are ranked according to their employment date. The first person employed ranks first and the last ranks tenth.

Using OVER (PARTITION BY ORDER BY)

As we already mentioned, PARTITION BY and ORDER BY can also be used simultaneously. Let’s look at a few examples.

Example #1

Imagine you have to rank the employees in each department according to their salary. How would you do that?

Here’s the solution:

SELECT first_name,
	 last_name,
	 job_title,
	 department,
	 salary,
	 RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;

We again use the RANK() window function. In the OVER() clause, data needs to be partitioned by department. To sort the employees, use the column salary in ORDER BY and sort the records in descending order.

Let’s see what this query does:

first_name	last_name	job_title	department	salary	salary_rank
Carolina	Oliveira	Data Engineer	IT	8,410.57	1
Ines	Owen	Database Administrator	IT	8,105.41	2
Walter	Tyson	Database Administrator	IT	7,512.14	3
Sean	Rice	System Analyst	IT	6,518.22	4
Franck	Monteblanc	Data Scientist	Marketing	7,519.34	1
Simone	Hill	Statistician	Marketing	6,815.67	2
Frances	Jackson	Data Analyst	Marketing	4,919.34	3
Patricia	King	Data Scientist	Risk Management	7,871.69	1
Willie	Hayes	Statistician	Risk Management	6,995.87	2
Bob	Mendelsohn	Data Analyst	Risk Management	5,412.47	3

In the IT department, Carolina Oliveira has the highest salary. Then come Ines Owen and Walter Tyson, while the last one is Sean Rice. They are all ranked accordingly.

When the window function comes to the next department, it resets and starts ranking from the beginning. So, Franck Monteblanc is paid the highest, while Simone Hill and Frances Jackson come second and third, respectively.

The same is done with the employees from Risk Management.

Example #2

Let’s practice this on a slightly different example. We still want to rank the employees by salary. This time, not by the department but by the job title.

Here’s how to do it.

SELECT first_name,
	 last_name,
	 job_title,
	 department,
	 salary,
	 RANK() OVER (PARTITION BY job_title ORDER BY salary DESC) AS salary_rank
FROM employees;

The data is now partitioned by job title. The ORDER BY clause stays the same: it still sorts in descending order by salary.

This is the query’s output:

first_name	last_name	job_title	department	salary	salary_rank
Bob	Mendelsohn	Data Analyst	Risk Management	5,412.47	1
Frances	Jackson	Data Analyst	Marketing	4,919.34	2
Carolina	Oliveira	Data Engineer	IT	8,410.57	1
Patricia	King	Data Scientist	Risk Management	7,871.69	1
Franck	Monteblanc	Data Scientist	Marketing	7,519.34	2
Ines	Owen	Database Administrator	IT	8,105.41	1
Walter	Tyson	Database Administrator	IT	7,512.14	2
Willie	Hayes	Statistician	Risk Management	6,995.87	1
Simone	Hill	Statistician	Marketing	6,815.67	2
Sean	Rice	System Analyst	IT	6,518.22	1

The logic is the same as in the previous example. In this example, there is a maximum of two employees with the same job title, so the ranks don’t go any further.

Bob Mendelsohn is the highest paid of the two data analysts. Then there is only rank 1 for data engineer because there is only one employee with that job title. The rest of the data is sorted with the same logic.

You can find more examples in this article on window functions in SQL. And if knowing window functions makes you hungry for a better career, you’ll be happy that we answered the top 10 SQL window functions interview questions for you.

When to Use PARTITION BY

We answered the ‘how’. The second important question that needs answering is when you should use PARTITION BY.

There are two main uses. The first use is when you want to group data and calculate some metrics but also keep the individual rows with their values.

The second use of PARTITION BY is when you want to aggregate data into two or more groups and calculate statistics for these groups.

PARTITION BY Must’ve Tickled Your Curiosity

PARTITION BY is a wonderful clause to be familiar with. Not only does it mean you know window functions, it also increases your ability to calculate metrics by moving you beyond the mandatory clauses used in window functions.

Do you want to satisfy your curiosity about what else window functions and PARTITION BY can do? The Window Functions course is waiting for you!

Tags:

PARTITION BY Syntax

PARTITION BY Examples

Using OVER (PARTITION BY)

Example #1

Example #2

Using OVER (ORDER BY)

Using OVER (PARTITION BY ORDER BY)

Example #1

Example #2

When to Use PARTITION BY

PARTITION BY Must’ve Tickled Your Curiosity

You may also like