8th Nov 2022 9 minutes read How to Use the PARTITION BY Clause in SQL Tihomir Babic sql partition by Table of Contents PARTITION BY Syntax PARTITION BY Examples Using OVER (PARTITION BY) Example #1 Example #2 Using OVER (ORDER BY) Using OVER (PARTITION BY ORDER BY) Example #1 Example #2 When to Use PARTITION BY PARTITION BY Must’ve Tickled Your Curiosity We’ll be dealing with the window functions today. Specifically, we’ll focus on the PARTITION BY clause and explain what it does. PARTITION BY is one of the clauses used in window functions. In SQL, window functions are used for organizing data into groups and calculating statistics for them. It sounds awfully familiar, doesn’t it? Even though they sound similar, window functions and GROUP BY are not the same; window functions are more like GROUP BY on steroids. Why? Because window functions keep the details of individual rows while calculating statistics for the row groups. GROUP BY can’t do that! PARTITION BY is crucial for that distinction; this is the clause that divides a window function result into data subsets or partitions. In a way, it’s GROUP BY for window functions. You’ll soon learn how it works. There’s a much more comprehensive (and interactive) version of this article – our Window Functions course. It covers everything we’ll talk about and plenty more. Through its interactive exercises, you will learn all you need to know about window functions. You’ll go through the OVER(), PARTITION BY, and ORDER BY clauses and learn how to use ranking and analytics window functions. The course also gives you 47 exercises to practice and a final quiz. If you’re indecisive, here’s why you should learn window functions. Now, let’s talk about PARTITION BY! PARTITION BY Syntax The syntax for the PARTITION BY clause is: SELECT column_name, window_function (expression) OVER (PARTITION BY column name) FROM table; In the window_function part, you put the specific window function. The OVER() clause is a mandatory clause that makes the window function work. It virtually defines the window function. The PARTITION BY subclause is followed by the column name(s). The column(s) you specify in this clause will be the partitions/groups into which the window function results will be grouped. The following examples will make this clearer. We know you can’t memorize everything immediately, so feel free to keep our SQL Window Functions Cheat Sheet nearby as we go through the examples. It’s a handy reminder of different window functions and their syntax. PARTITION BY Examples The example dataset consists of one table, employees. Here are its columns: id – The employee’s ID. first_name – The employee’s first name. last_name – The employee’s last name. job_title – The employee’s job title. department – The employee’s department. date_of_employment – The date when the employee’s employment started. salary – The employee’s salary. Have a look at the table data before we start writing the code: idfirst_namelast_namejob_titledepartmentdate_of_employmentsalary 1BobMendelsohnData AnalystRisk Management2020-09-255,412.47 2FrancesJacksonData AnalystMarketing2020-04-244,919.34 3FranckMonteblancData ScientistMarketing2021-03-187,519.34 4PatriciaKingData ScientistRisk Management2020-03-057,871.69 5WillieHayesStatisticianRisk Management2021-07-096,995.87 6SimoneHillStatisticianMarketing2021-05-096,815.67 7WalterTysonDatabase AdministratorIT2022-08-127,512.14 8InesOwenDatabase AdministratorIT2021-09-158,105.41 9CarolinaOliveiraData EngineerIT2022-09-158,410.57 10SeanRiceSystem AnalystIT2022-01-196,518.22 If you wish to follow along by writing your own SQL queries, here’s the code for creating this dataset. Using OVER (PARTITION BY) Now it’s time that we show you how PARTITION BY works on an example or two. Example #1 In the first example, the goal is to show the employees’ salaries and the average salary for each department. If you were paying attention, you already know how PARTITION BY can help us here: SELECT first_name, last_name, job_title, department, salary, AVG(salary) OVER (PARTITION BY department) AS average_salary_by_department FROM employees; To calculate the average, you need to use the AVG() aggregate function. Write the column salary in the parentheses. This is, for now, an ordinary aggregate function. To make it a window aggregate function, write the OVER() clause. Now, remember that we don’t need the total average (i.e. for the whole company) but the average by department. To have this metric, put the column department in the PARTITION BY clause. Does this return the desired output? Let’s see! first_namelast_namejob_titledepartmentsalaryaverage_salary_by_department CarolinaOliveiraData EngineerIT8,410.577,636.59 InesOwenDatabase AdministratorIT8,105.417,636.59 WalterTysonDatabase AdministratorIT7,512.147,636.59 SeanRiceSystem AnalystIT6,518.227,636.59 SimoneHillStatisticianMarketing6,815.676,418.12 FrancesJacksonData AnalystMarketing4,919.346,418.12 FranckMonteblancData ScientistMarketing7,519.346,418.12 BobMendelsohnData AnalystRisk Management5,412.476,760.01 WillieHayesStatisticianRisk Management6,995.876,760.01 PatriciaKingData ScientistRisk Management7,871.696,760.01 You can see that the output lists all the employees and their salaries. For the IT department, the average salary is 7,636.59. This value is repeated for all IT employees. When we arrive at employees from another department, the average changes. In this case, it’s 6,418.12 in Marketing. Following this logic, the average salary in Risk Management is 6,760.01. As you can see, PARTITION BY instructed the window function to calculate the departmental average. How does this differ from GROUP BY? Let’s see what happens if we calculate the average salary by department using GROUP BY. departmentaverage_salary_by_department Risk Management6,760.01 Marketing6,418.12 IT7,636.59 As you can see, you can get all the same average salaries by department. However, one huge difference is you don’t get the individual employee’s salary. You can expand on this difference by reading an article about the difference between PARTITION BY and GROUP BY. Example #2 Now we want to show all the employees’ salaries along with the highest salary by job title. The query is very similar to the previous one. The only two changes are the aggregate function and the column in PARTITION BY. SELECT first_name, last_name, job_title, department, salary, MAX(salary) OVER (PARTITION BY job_title) AS max_salary_by_job_title FROM employees; This time, we use the MAX() aggregate function and partition the output by job title. Here’s the result: first_namelast_namejob_titledepartmentsalarymax_salary_by_job_title BobMendelsohnData AnalystRisk Management5,412.475,412.47 FrancesJacksonData AnalystMarketing4,919.345,412.47 CarolinaOliveiraData EngineerIT8,410.578,410.57 PatriciaKingData ScientistRisk Management7,871.697,871.69 FranckMonteblancData ScientistMarketing7,519.347,871.69 InesOwenDatabase AdministratorIT8,105.418,105.41 WalterTysonDatabase AdministratorIT7,512.148,105.41 SimoneHillStatisticianMarketing6,815.676,995.87 WillieHayesStatisticianRisk Management6,995.876,995.87 SeanRiceSystem AnalystIT6,518.226,518.22 Take a look at the first two rows. Bob Mendelsohn and Frances Jackson are data analysts working in Risk Management and Marketing, respectively. The table shows their salaries and the highest salary for this job position. It’s 5,412.47, Bob Mendelsohn’s salary. The same logic applies to the rest of the results. Of course, when there’s only one job title, the employee’s salary and maximum job salary for that job title will be the same. That’s the case for the data engineer and the system analyst. This example can also show the limitations of GROUP BY. The code below will show the highest salary by the job title: SELECT job_title, MAX(salary) AS max_salary_by_job_title FROM employees GROUP BY job_title; And here’s the output: job_titlemax_salary_by_job_title Data Scientist7,871.69 Statistician6,995.87 System Analyst6,518.22 Data Engineer8,410.57 Data Analyst5,412.47 Database Administrator8,105.41 Yes, the salaries are the same as with PARTITION BY. But with this result, you have no idea what every employee’s salary is and who has the highest salary. Using OVER (ORDER BY) The ORDER BY clause is another window function subclause. It orders data within a partition or, if the partition isn’t defined, the whole dataset. When we say order, we don’t mean the output. When used with window functions, the ORDER BY clause defines the order in which a window function will perform its calculation. ORDER BY can be used with or without PARTITION BY. Let’s first see how it works without PARTITION BY. We’ll use it to show employees’ data and rank them by their employment date. The ranking will be done from the earliest to the latest date. SELECT first_name, last_name, job_title, department, date_of_employment, RANK() OVER (ORDER BY date_of_employment ASC) AS employment_date_rank FROM employees; The window function we use now is RANK(). It’s one of the functions used for ranking data. Again, the OVER() clause is mandatory. The ORDER BY clause tells the ranking function to assign ranks according to the date of employment in descending order. Run the query and you’ll get this output: first_namelast_namejob_titledepartmentdate_of_employmentemployment_date_rank PatriciaKingData ScientistRisk Management2020-03-051 FrancesJacksonData AnalystMarketing2020-04-242 BobMendelsohnData AnalystRisk Management2020-09-253 FranckMonteblancData ScientistMarketing2021-03-184 SimoneHillStatisticianMarketing2021-05-095 WillieHayesStatisticianRisk Management2021-07-096 InesOwenDatabase AdministratorIT2021-09-157 SeanRiceSystem AnalystIT2022-01-198 WalterTysonDatabase AdministratorIT2022-08-129 CarolinaOliveiraData EngineerIT2022-09-1510 All the employees are ranked according to their employment date. The first person employed ranks first and the last ranks tenth. Using OVER (PARTITION BY ORDER BY) As we already mentioned, PARTITION BY and ORDER BY can also be used simultaneously. Let’s look at a few examples. Example #1 Imagine you have to rank the employees in each department according to their salary. How would you do that? Here’s the solution: SELECT first_name, last_name, job_title, department, salary, RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank FROM employees; We again use the RANK() window function. In the OVER() clause, data needs to be partitioned by department. To sort the employees, use the column salary in ORDER BY and sort the records in descending order. Let’s see what this query does: first_namelast_namejob_titledepartmentsalarysalary_rank CarolinaOliveiraData EngineerIT8,410.571 InesOwenDatabase AdministratorIT8,105.412 WalterTysonDatabase AdministratorIT7,512.143 SeanRiceSystem AnalystIT6,518.224 FranckMonteblancData ScientistMarketing7,519.341 SimoneHillStatisticianMarketing6,815.672 FrancesJacksonData AnalystMarketing4,919.343 PatriciaKingData ScientistRisk Management7,871.691 WillieHayesStatisticianRisk Management6,995.872 BobMendelsohnData AnalystRisk Management5,412.473 In the IT department, Carolina Oliveira has the highest salary. Then come Ines Owen and Walter Tyson, while the last one is Sean Rice. They are all ranked accordingly. When the window function comes to the next department, it resets and starts ranking from the beginning. So, Franck Monteblanc is paid the highest, while Simone Hill and Frances Jackson come second and third, respectively. The same is done with the employees from Risk Management. Example #2 Let’s practice this on a slightly different example. We still want to rank the employees by salary. This time, not by the department but by the job title. Here’s how to do it. SELECT first_name, last_name, job_title, department, salary, RANK() OVER (PARTITION BY job_title ORDER BY salary DESC) AS salary_rank FROM employees; The data is now partitioned by job title. The ORDER BY clause stays the same: it still sorts in descending order by salary. This is the query’s output: first_namelast_namejob_titledepartmentsalarysalary_rank BobMendelsohnData AnalystRisk Management5,412.471 FrancesJacksonData AnalystMarketing4,919.342 CarolinaOliveiraData EngineerIT8,410.571 PatriciaKingData ScientistRisk Management7,871.691 FranckMonteblancData ScientistMarketing7,519.342 InesOwenDatabase AdministratorIT8,105.411 WalterTysonDatabase AdministratorIT7,512.142 WillieHayesStatisticianRisk Management6,995.871 SimoneHillStatisticianMarketing6,815.672 SeanRiceSystem AnalystIT6,518.221 The logic is the same as in the previous example. In this example, there is a maximum of two employees with the same job title, so the ranks don’t go any further. Bob Mendelsohn is the highest paid of the two data analysts. Then there is only rank 1 for data engineer because there is only one employee with that job title. The rest of the data is sorted with the same logic. You can find more examples in this article on window functions in SQL. And if knowing window functions makes you hungry for a better career, you’ll be happy that we answered the top 10 SQL window functions interview questions for you. When to Use PARTITION BY We answered the ‘how’. The second important question that needs answering is when you should use PARTITION BY. There are two main uses. The first use is when you want to group data and calculate some metrics but also keep the individual rows with their values. The second use of PARTITION BY is when you want to aggregate data into two or more groups and calculate statistics for these groups. PARTITION BY Must’ve Tickled Your Curiosity PARTITION BY is a wonderful clause to be familiar with. Not only does it mean you know window functions, it also increases your ability to calculate metrics by moving you beyond the mandatory clauses used in window functions. Do you want to satisfy your curiosity about what else window functions and PARTITION BY can do? The Window Functions course is waiting for you! Tags: sql partition by