25 Advanced SQL Query Examples
Table of Contents
- Table Employee
-
25 Advanced SQL Query Examples with Explanations
- Example 1. Ranking Rows Based on a Specific Ordering Criteria
- Example 2. List The First 5 Rows of a Result Set
- Example 3. List the Last 5 Rows of a Result Set
- Example 4. List The Second Highest Row of a Result Set
- Example 5. List the Second Highest Salary By Department
- Example 6. List the First 50% Rows in a Result Set
- Example 7. List the Last 25% Rows in a Result Set
- Example 8. Number the Rows in a Result Set
- Example 9. List All Combinations of Rows from Two Tables
- Example 10. Join a Table to Itself
- Example 11. Show All Rows with an Above-Average Value
- Example 12. Employees with Salaries Higher Than Their Departmental Average
- Example 13. Obtain All Rows Where a Value Is in a Subquery Result
- Example 14. Find Duplicate Rows in SQL
- Example 15. Count Duplicate Rows
- Example 16. Find Common Records Between Tables
- Example 17. Grouping Data with ROLLUP
- Example 18. Conditional Summation
- Example 19. Group Rows by a Range
- Example 20. Compute a Running Total in SQL
- Example 21. Compute a Moving Average in SQL
- Example 22. Compute a Difference (Delta) Between Two Columns on Different Rows
- Example 23. Compute a Year-Over-Year Difference
- Example 24. Use Recursive Queries to Manage Data Hierarchies
- Example 25. Find the Length of a Series Using Window Functions
- Practice Advanced SQL with LearnSQL.com Courses
One of the best ways to learn advanced SQL is by studying example queries. In this article, we'll show 25 examples of advanced SQL queries from medium to high complexity. You can use them to refresh your knowledge of advanced SQL or to review before a SQL interview.
Table Employee
Many of the examples in this article will be based on the following employee
table. Only a few examples will be based on other tables; in these cases, the tables will be explained along with the example.
employee_id | first_name | last_name | dept_id | manager_id | salary | expertise |
---|---|---|---|---|---|---|
100 | John | White | IT | 103 | 120000 | Senior |
101 | Mary | Danner | Account | 109 | 80000 | junior |
102 | Ann | Lynn | Sales | 107 | 140000 | Semisenior |
103 | Peter | O'connor | IT | 110 | 130000 | Senior |
106 | Sue | Sanchez | Sales | 107 | 110000 | Junior |
107 | Marta | Doe | Sales | 110 | 180000 | Senior |
109 | Ann | Danner | Account | 110 | 90000 | Senior |
110 | Simon | Yang | CEO | null | 250000 | Senior |
111 | Juan | Graue | Sales | 102 | 37000 | Junior |
Even for persons with SQL expertise, a good online interactive SQL course can be a real help. You can find the most complete set of interactive SQL courses in our SQL from A to Z track. It contains 7 interactive SQL courses with over 850(!) exercises logically arranged to take you from a complete beginner to an advanced SQL user. The beginner courses cover the foundations of SQL and are a perfect way to review and refresh your basic SQL knowledge. The advanced SQL courses will teach you concepts like window functions, recursive queries, and complex SQL reports.
Ok, let’s dig into our advanced SQL queries!
25 Advanced SQL Query Examples with Explanations
Question: What is a SQL Query?
A SQL query is a command written in Structured Query Language (SQL) that's used to retrieve, filter, combine, manage, or manipulate data in a database.
Example 1. Ranking Rows Based on a Specific Ordering Criteria
Sometimes we need to create a SQL query to show a ranking of rows based on a specific order criteria. In this example query, we will show a list of all employees ordered by salary (highest salary first). The report will include the position of each employee in the ranking.
Here’s the code:
SELECT employee_id, last_name, first_name, salary, RANK() OVER ( ORDER BY salary DESC ) as ranking FROM employee ORDER BY ranking |
In the above query, we use the function RANK()
. It is a window function that returns each row’s position in the result set, based on the order defined in the OVER
clause (1 for the highest salary, 2 for the second-highest, and so on). We need to use an ORDER BY
ranking clause at the end of the query to indicate the order on which the result set will be shown.
If you want to know more about ranking functions in SQL, I recommend our article What Is the RANK() Function in SQL, and How Do You Use It?
Example 2. List The First 5 Rows of a Result Set
The next SQL query creates a report with the employee data for the top 5 salaries in the company. This kind of report must be ordered based on a given criteria; in our example, the order criteria will again be salary DESC
:
WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, RANK() OVER ( ORDER BY salary DESC ) as ranking FROM employee ) SELECT employee_id, last_name, first_name, salary FROM employee_ranking WHERE ranking <= 5 ORDER BY ranking |
The WITH
clause in the previous query creates a CTE called employee_ranking
, which is a kind of virtual table that’s consumed in the main query. The subquery in the CTE uses the function RANK()
to obtain the position of each row in the ranking. The clause OVER (ORDER BY salary DESC)
indicates how the RANK()
value must be calculated. The RANK()
function for the row with the highest salary will return 1, and so on.
Finally, in the WHERE
of the main query we ask for those rows with a ranking value smaller or equal than 5. This lets us obtain only the top 5 rows by ranking value. Again, we use an ORDER BY
clause to show the result set, which is ordered by rank ascending.
Example 3. List the Last 5 Rows of a Result Set
This query is similar to the top 5 query, but we want the last 5 rows. We only need to introduce a change in the type of order, i.e. using ASC
instead of DESC
. In the CTE, we will create a ranking column based on an ascendent order of salary (lowest salary first):
WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, RANK() OVER ( ORDER BY salary ASC ) as ranking FROM employee ) SELECT employee_id, last_name, first_name, salary FROM employee_ranking WHERE ranking <= 5 ORDER BY ranking |
In the main query, we use WHERE ranking <= 5
to filter the rows with the 5 lowest salaries. After that, we use ORDER BY ranking
to order the rows of the report by ranking value.
Example 4. List The Second Highest Row of a Result Set
Let’s suppose we’d like to obtain the data of the employee with the second highest salary in the company. We can apply a similar approach to our previous query:
WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, RANK() OVER ( ORDER BY salary DESC ) as ranking FROM employee ) SELECT employee_id, last_name, first_name, salary FROM employee_ranking WHERE ranking = 2 |
The WHERE
condition ranking = 2
is used to filter the rows with the salary in position 2. Note that we can have more than one employee in position 2 if they have the same salary.
At this point, it is important to understand the behavior of the RANK()
function as well as other available functions like ROW_NUMBER()
and DENSE_RANK()
. This topic is covered in detail in our Overview of Ranking Functions in SQL. I strongly recommend reading this article if you need to work with different kinds of rankings.
Example 5. List the Second Highest Salary By Department
Let’s add a variation to the previous SQL query. As each of our employees belongs to a department, we now want a report showing the department ID and the employee name with the second highest salary in this department. We want one record for each department in the company. Here’s the query:
WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, dept_id RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC ) as ranking FROM employee ) SELECT dept_id, employee_id, last_name, first_name, salary FROM employee_ranking WHERE ranking = 2 ORDER BY dept_id, last_name |
The main change introduced in this query is the PARTITION BY dept_id
clause in OVER
. This clause groups rows with the same dept_id
, ordering the rows in each group by salary DESC
. Then the RANK()
function is calculated for each department.
In the main query, we return the dept_id
and the employee data for those employees in position 2 of their departmental ranking.
You can read more about combining RANK()
with PARTITION BY
in How to Use the SQL RANK OVER (PARTITION BY). For those readers who want to find out more about finding the Nth highest row in a group, I recommend the article How to Find the Nth-Highest Salary by Department with SQL.
Example 6. List the First 50% Rows in a Result Set
In some cases, we could be interested in obtaining the first 50% of the result set, (or any other percentage). For this kind of report, there is a SQL function called NTILE()
which receives an integer parameter indicating the number of subsets into which we want to divide the entire result set. For example NTILE(2)
divides the result set into 2 subsets with the same quantity of elements; for each row, it returns a 1 or a 2 depending on the subset where the row is located.
Here’s the query:
WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, NTILE(2) OVER ( ORDER BY salary ) as ntile FROM employee ) SELECT employee_id, last_name, first_name, salary FROM employee_ranking WHERE ntile = 1 ORDER BY salary |
The above query returns only the rows in the first half of a report of employees ordered by salary in ascending order. We use the condition ntile = 1
to filter only those rows in the first half of the report. If you are interested in the NTILE()
window function, see our article on the NTILE Function.
Example 7. List the Last 25% Rows in a Result Set
As with the previous query, in this example we will use NTILE(4)
to divide the result set into 4 subsets; each subset will have 25% of the total result set. Using the NTILE()
function, we will generate a column called ntile
with the values 1, 2, 3, and 4:
WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, NTILE(4) OVER ( ORDER BY salary) as ntile FROM employee ) SELECT employee_id, last_name, first_name, salary FROM employee_ranking WHERE ntile = 4 ORDER BY salary |
The WHERE ntile = 4
condition filters only the rows in the last quarter of the report. The last clause ORDER BY salary
orders the result set to be returned by the query, while OVER (ORDER BY salary)
orders the rows before dividing them into 4 subsets using NTILE(4)
.
Example 8. Number the Rows in a Result Set
Sometimes we want to create a ranking that assigns each row a number indicating the position of that row in the ranking: 1 to the first row, 2 to the second one, and so on. SQL provides a few ways to do this. If we want a simple sequence of numbers from 1 to N, we can use the ROW_NUMBER()
function. However if we want a ranking that allows two rows in the same position (i.e. because they share the same value) we can use the RANK()
or DENSE_RANK()
function. The following query creates a report where each row has a position value:
SELECT employee_id, last_name, first_name, salary, ROW_NUMBER() OVER ( ORDER BY employee_id) as ranking_position FROM employee |
If you want to learn about different advanced ranking functions, I recommend the article Overview of Ranking Functions in SQL.
Example 9. List All Combinations of Rows from Two Tables
In some cases, we could need a join that includes all the possible combinations of rows from two tables. Suppose we have a food company where we sell 3 kinds of cereal: corn flakes, sugared corn flakes, and rice flakes. All these cereals are sold in 3 different packaging sizes: 1 pound, 3 pounds, and 5 pounds. As we offer 3 products in 3 different package sizes, then we offer nine different combinations.
We have a product
table with 3 records (corn flakes, sugared corn flakes and rice flakes) and another table called box_size
with 3 records one for 1 pound and two records for 3 and 5 pounds, respectively. If we want to create a report with the price list for our nine combinations, we can use the following query:
SELECT grain.product_name, box_size.description, grain.price_per_pound * box_size.box_weight FROM product CROSS JOIN box_sizes |
The query result will be:
product | package_size | price |
---|---|---|
Corn flake | 1 pound box | 2.43 |
Corn flake | 3 pound box | 7.29 |
Corn flake | 5 pound box | 12.15 |
Sugared corn flake | 1 pound box | 2.85 |
Sugared corn flake | 3 pound box | 8.55 |
Sugared corn flake | 5 pound box | 14.25 |
Rice flake | 1 pound box | 1.98 |
Rice flake | 3 pound box | 5.94 |
Rice flake | 5 pound box | 9.90 |
The CROSS JOIN
clause without any condition produces a table with all row combinations from both tables. Note we calculate the price based on the per-pound price stored in the product
table and the weight from box_sizes
with the expression:
grain.price_per_pound * box_size.box_weight |
A deep dive into the CROSS JOIN
can be found in A Guide to the SQL CROSS JOIN.
Example 10. Join a Table to Itself
In some cases, we need to join a table to itself. Think about the employee
table. Every row has a column called manager_id
with the ID of the manager supervising this employee. Using a self-join we can obtain a report with the columns employee_name
and manager_name
; this will show us who manages each employee. Here is the query:
SELECT e1.first_name || ' ' || e1.last_name AS manager_name, e2.first_name || ' ' || e2.last_name AS employee_name FROM employee e1 JOIN employee e2 ON e1.employee_id = e2.manager_id |
In the above query, we can see the table employee
is referenced twice as e1
and e2
, and the join condition is e1.employee_id = e2.manager_id
. This condition links each employee row with the manager row. The article What Is a Self Join in SQL? An Explanation With Seven Examples will give you more ideas about when you can apply self joins in your SQL queries.
Example 11. Show All Rows with an Above-Average Value
We need a report showing all employees with a salary that’s higher than the company average. We can first create a subquery to obtain the average salary of the company, and then compare the salary of every employee with the subquery result. This is shown in the following example:
SELECT first_name, last_name, salary FROM employee WHERE salary > ( SELECT AVG (salary) FROM employee ) |
You can see the subquery that obtains the average salary in the WHERE
clause. In the main query, we select the employee name and salary. You can read more about subqueries in the article SQL Subquery Practice: 15 Exercises with Solutions.
Example 12. Employees with Salaries Higher Than Their Departmental Average
Let’s suppose we want to obtain records for employees with salaries higher than the average salary in their departments. This query is different than the previous one because now we need a subquery to obtain the average salary for the current employee’s department rather than the entire company This is called a correlated subquery because there’s a reference to a column in the current row of the main table of the query within the subquery.
Here’s the code:
SELECT first_name, last_name, salary FROM employee e1 WHERE salary > ( SELECT AVG (salary) FROM employee e2 WHERE e1.departmet_id = e2.department_id) |
In the subquery, we can see a reference to the column e1.department_id, which is a column referenced in the main query. The condition e1.departmet_id = e2.department_id
is the key in the subquery because it allows us to obtain the average of all employees in the department of the current row. Once we obtain the departmental average salary, we compare it with the employee salary and filter accordingly.
Example 13. Obtain All Rows Where a Value Is in a Subquery Result
Suppose that John Smith manages several departments and we want to obtain a list of all the employees in those departments. We’ll use a subquery to obtain IDs of the departments managed by John Smith. Then we’ll use the IN
operator to find the employees working in those departments:
SELECT first_name, last_name FROM employee e1 WHERE department_id IN ( SELECT department_id FROM department WHERE manager_name=‘John Smith’) |
The previous subquery is a multi-row subquery: it returns more than one row. In fact, it will return several rows because John Smith manages many departments. When working with multi-row subqueries, you need to use specific operators (like IN
) in the WHERE
condition involving the subquery.
Example 14. Find Duplicate Rows in SQL
If a table has duplicate rows, you can find them with SQL. Use a query with a GROUP BY
clause including all the columns in the table and a HAVING
clause to filter rows that appear more than one time. Here’s an example:
SELECT employee_id, last_name, first_name, dept_id, manager_id, salary FROM employee GROUP BY employee_id, last_name, first_name, dept_id, manager_id, salary HAVING COUNT (*) > 1 |
The rows that are not duplicated will have a COUNT(*)
equal to 1, but those rows that exist many times will have a COUNT(*)
returning the number of times that the row exists. I suggest the article How to Find Duplicate Values in SQL if you want to find more details about this technique.
Example 15. Count Duplicate Rows
If you want to count duplicate rows, you can use the following query. It’s similar to the previous one, but we add a COUNT(*)
in the SELECT
list to show how many times each duplicate row appears in the table:
SELECT employee_id, last_name, first_name, dept_id, manager_id, salary, COUNT (*) AS number_of_rows FROM employee GROUP BY employee_id, last_name, first_name, dept_id, manager_id, salary HAVING COUNT (*) > 1 |
Again, you can find valuable information about how to manage duplicate records in the article How To Find Duplicate Records in SQL.
Example 16. Find Common Records Between Tables
If you have two tables with the same schema or if two tables have a subset of columns in common, you can obtain the rows that appear in both tables with the set operator INTERSECT
. Let’s suppose we have a snapshot of the table employee
taken in Jan 2020 called employee_2020_jan
and we want to obtain the list of employees that exist in both tables. We can do that with this query:
SELECT last_name, first_name FROM employee INTERSECT SELECT last_name, first_name FROM employee_2020_jan |
As a result, we will obtain a list of employees that appear in both tables. Perhaps they’ll have different values on the columns like salary
or dept_id
. In other words, we are obtaining those employees who worked for the company in Jan 2020 and who are still working for the company.
If you are interested in finding more about set operators, I suggest the article Introducing SQL Set Operators: Union, Union All, Minus, and Intersect.
Example 17. Grouping Data with ROLLUP
The GROUP BY
clause in SQL is used to aggregate rows in groups and apply functions to all the rows in the group, returning a single result value. For example, if we want to obtain a report with the total salary amount per department and expertise level, we can do the following query:
SELECT dept_id, expertise, SUM (salary) total_salary FROM employee GROUP BY dept_id, expertise |
The GROUP BY
has the optional clause ROLLUP
, which allows it to include additional groupings in one query. Adding the ROLLUP
clause to our example could give us the total sum of salaries for each department (no matter what expertise level the employee has) and the total sum of salaries for the whole table (no matter the employee’s department and expertise level). The modified query is:
SELECT dept_id, expertise, SUM (salary) total_salary FROM employee GROUP BY ROLLUP (dept_id, expertise) |
And the result will be:
dept_id | expertise | total_salary |
---|---|---|
Account | Senior | 90000 |
Account | Junior | 80000 |
Account | NULL | 170000 |
CEO | Senior | 250000 |
CEO | NULL | 250000 |
IT | Senior | 250000 |
IT | NULL | 250000 |
Sales | Junior | 110000 |
Sales | Semisenior | 140000 |
Sales | Senior | 180000 |
Sales | NULL | 430000 |
NULL | NULL | 1100000 |
The rows in the result set with a NULL
are the extra rows added by the ROLLUP
clause. A NULL
value in the column expertise
means a group of rows for a specific value of dept_id
but without a specific expertise
value. In other words, it is the total amount of salaries for each dept_id
. In the same way, the last row of the result having a NULL
for columns dept_id
and expertise
means the grand total for all departments in the company.
If you want to learn more about the ROLLUP
clause and other similar clauses like CUBE
, the article Grouping, Rolling, and Cubing Data has lots of examples.
Example 18. Conditional Summation
In some cases, we need to summarize or count values based on some condition(s). For example, if we want to obtain the total salaries in the Sales and Human Resources departments combined and in the IT and Support departments combined, we can execute the following query:
SELECT SUM ( CASE WHEN dept_id IN ( 'SALES' , 'HUMAN RESOURCES' ) THEN salary ELSE 0 END ) AS total_salary_sales_and_hr, SUM ( CASE WHEN dept_id IN ( 'IT' , 'SUPPORT' ) THEN salary ELSE 0 END ) AS total_salary_it_and_support FROM employee |
The query returns a single row with two columns. The first column shows the total salary for the Sales and Human Resources departments. This value is calculated using the SUM()
function on the salary
column – but only when the employee belongs to the Sales or Human Resources department. A zero is added to the sum when the employee belongs to any other department. The same idea is applied for the total_salary_it_and_support
column.
The article How to Use CASE WHEN with SUM() in SQL provides more details about this technique.
Example 19. Group Rows by a Range
In the next example query, we will create the salary ranges low
, medium
, and high
. Then we will count how many employees are in each salary range:
SELECT CASE WHEN salary <= 750000 THEN 'low' WHEN salary > 750000 AND salary <= 100000 THEN 'medium' WHEN salary > 100000 THEN 'high' END AS salary_category, COUNT (*) AS number_of_employees FROM employee GROUP BY CASE WHEN salary <= 750000 THEN 'low' WHEN salary > 750000 AND salary <= 100000 THEN 'medium' WHEN salary > 100000 THEN 'high' END |
In this query, we use CASE
to define the salary range for each employee. You can see the same CASE
statement twice. The first one defines the ranges, as we just said; the second one in the GROUP BY
aggregates records and applies the COUNT(*)
function to each group of records. You can use the CASE
statement in the same way to compute counts or sums for other custom-defined levels.
How to Use CASE in SQL explains other examples of CASE
statements like the one used in this query.
Example 20. Compute a Running Total in SQL
A running total is a very common SQL pattern, one that’s used frequently in finance and in trend analysis.
When you have a table that stores any daily metric, such as a sales
table with the columns day
and daily_amount
, you can calculate the running total as the cumulative sum of all previous daily_amount
values. SQL provides a window function called SUM()
to do just that.
In the following query, we’ll calculate the cumulative sales for each day:
SELECT day , daily_amount, SUM (daily_amount) OVER ( ORDER BY day ) AS running_total FROM sales |
The SUM()
function uses the OVER()
clause to define the order of the rows; all rows previous to the current day are included in the SUM()
. Here’s a partial result:
day | daily_amount | running_total |
---|---|---|
Jan 30, 2023 | 1000.00 | 1000.00 |
Jan 31, 2023 | 800.00 | 1800.00 |
Feb 1, 2023 | 700.00 | 2500.00 |
The first two columns day and daily_amount
are values taken directly from the table sales
. The column running_total
is calculated by the expression:
SUM (daily_amount) OVER ( order by day ) |
You can clearly see how the running_total
is the accumulated sum of the previous daily_amounts
.
If you wish to go deeper on this topic, I suggest the article What Is a SQL Running Total and How Do You Compute It?, which includes many clarifying examples.
Example 21. Compute a Moving Average in SQL
A moving average is a time series technique for analyzing trends in data. It is calculated as the average of the current value and a specified number of immediately preceding values for each point in time. The main idea is to examine how these averages behave over time instead of examining the behavior of the original or raw data points.
Let’s calculate the moving average for the last 7 days using the sales
table from the previous example:
SELECT day , daily_amount, AVG (daily_amount) OVER ( ORDER BY day ROWS 6 PRECEDING) AS moving_average FROM sales |
In the above query, we use the AVG()
window function to calculate the average using the current row (today) and the previous 6 rows. As the rows are ordered by day, the current row and the 6 previous rows defines a period of 1 week.
The article What a Moving Average Is and How to Compute it in SQL goes into detail about this subject; check it out if you want to learn more.
Example 22. Compute a Difference (Delta) Between Two Columns on Different Rows
There’s more than one way to calculate the difference between two rows in SQL. One way to do it is by using the window functions LEAD()
and LAG()
, as we will do in this example.
Let’s suppose we want to obtain a report with the total amount sold on each day, but we also want to obtain the difference (or delta) related to the previous day. We can use a query like this one:
SELECT day , daily_amount, daily_amount - LAG(daily_amount) OVER ( ORDER BY day ) AS delta_yesterday_today FROM sales |
The key expression in this query is:
daily_amount - LAG(daily_amount) OVER ( ORDER BY day ) |
Both elements of the arithmetic difference come from different rows. The first element comes from the current row and LAG(daily_amount)
comes from the previous day row. LAG()
returns the value of any column from the previous row (based on the ORDER BY
specified in the OVER
clause).
If you want to read more about LAG()
and LEAD()
, I suggest the article How to Calculate the Difference Between Two Rows in SQL.
Example 23. Compute a Year-Over-Year Difference
Year-over-year (YOY) or month-to-month comparisons are a popular and effective way to evaluate the performance of several kinds of organizations. You can calculate the comparison as a value or as a percentage.
In this example, we will use the sales
table, which has data in a daily granularity. We first need to aggregate the data to the year or month, which we will do by creating a CTE with amounts aggregated by year. Here’s the query:
WITH year_metrics AS ( SELECT extract( year from day ) as year , SUM (daily_amount) as year_amount FROM sales GROUP BY year ) SELECT year , year_amount, LAG(year_amount) OVER ( ORDER BY year ) AS revenue_previous_year, year_amount - LAG(year_amount) OVER ( ORDER BY year ) as yoy_diff_value, ((year_amount - LAG(year_amount) OVER ( ORDER BY year ) ) / LAG(year_amount) OVER ( ORDER BY year )) as yoy_diff_perc FROM year_metrics ORDER BY 1 |
The first expression to analyze is the one used to calculate yoy_diff_value
:
year_amount - LAG(year_amount ) OVER ( ORDER BY year ) |
It is used to calculate the difference (as a value) between the amount of the current year and the previous year using the LAG()
window function and ordering the data by year.
In the next expression, we calculate the same difference as a percentage. This calculation is a little more complex because we need to divide by the previous year’s amount. (Note: We use the previous year as the base for percentage calculation, so the previous year is 100 percent.)
((year_amount-LAG(year_amount ) OVER( ORDER BY year ))/LAG(year_amount ) OVER( ORDER BY year )) |
In the article How to Compute Year-Over-Year Differences in SQL, you can find several examples of calculating year-to-year and month-to-month differences.
Example 24. Use Recursive Queries to Manage Data Hierarchies
Some tables in SQL can have an implicit kind of data hierarchy. As an example, our employee
table has a manager_id
for each employee. We have a manager who is in charge of other managers, who in turn have other employees under their charge, and so on.
When we have this sort of organization, we can have a hierarchy of various levels. In each row, the column manager_id
refers to the row on the immediate upper level in the hierarchy. In these cases, a frequent request is to obtain a list of all employees reporting (directly or indirectly) to the CEO of the company (who, in this case, has the employee_id
of 110
). The query to use is:
WITH RECURSIVE subordinate AS ( SELECT employee_id, first_name, last_name, manager_id FROM employee WHERE employee_id = 110 -- id of the top hierarchy employee (CEO) UNION ALL SELECT e.employee_id, e.first_name, e.last_name, e.manager_id FROM employee e JOIN subordinate s ON e.manager_id = s.employee_id ) SELECT employee_id, first_name, last_name, manager_id FROM subordinate ; |
In this query, we created a recursive CTE called subordinate
. It’s the key part of this query because it traverses the data hierarchy going from one row to the rows in the hierarchy immediately below it.
There are two subqueries connected by a UNION ALL
; the first subquery returns the top row of the hierarchy and the second query returns the next level, adding those rows to the intermediate result of the query. Then the second subquery is executed again to return the next level, which again will be added to the intermediate result set. This process is repeated until no new rows are added to the intermediate result.
Finally, the main query consumes the data in the subordinate
CTE and returns data in the way we expect. If you want to learn more about recursive queries in SQL, I suggest the article How to Find All Employees Under Each Manager in SQL.
Example 25. Find the Length of a Series Using Window Functions
Suppose we have a table with user registration data. We store information about how many users registered on each date. We define a data series as the sequence of consecutive days when users registered. A day when no user registers breaks the data series. For each data series, we want to find its length.
The table below shows data series:
id | day | Registered users |
---|---|---|
1 | Jan 25 2023 | 51 |
2 | Jan 26 2023 | 46 |
3 | Jan 27 2023 | 41 |
4 | Jan 30 2023 | 59 |
5 | Jan 31 2023 | 73 |
6 | Feb 1 2023 | 34 |
7 | Feb 2 2023 | 56 |
8 | Feb 4 2023 | 34 |
There are 3 different data series shown in different colors. We are looking for a query to obtain the length of each data series. The first data series starts on Jan 25 and has a length of 3 elements, the second one starts on Jan 30 and its length is 4, and so on.
The query is as follows:
WITH data_series AS ( SELECT RANK() OVER ( ORDER BY day ) AS row_number, day , day - RANK() OVER ( ORDER BY day ) AS series_id FROM user_registration ) SELECT MIN ( day ) AS series_start_day, MAX ( day ) AS series_end_day, MAX ( day ) - MIN ( day ) + 1 AS series_length FROM data_series GROUP BY series_id ORDER BY series_start_date |
In the previous query, the CTE has the column series_id
, which is a value intended to be used as an ID for the rows in the same data series. In the main query, the GROUP BY series_id
clause is used to aggregate rows of the same data series. Then we can obtain the start of the series with MIN(day)
and its end with MAX(day)
. The length of the series is calculated with the expression:
MAX ( day ) - MIN ( day ) + 1 |
If you want to go deeper with this topic, the article How to Calculate the Length of a Series with SQL provides a detailed explanation of this technique.
Practice Advanced SQL with LearnSQL.com Courses
SQL is an easy-to-learn and powerful language. In this article, we showed 25 examples of advanced SQL queries. All of them can be explained in about 5 minutes, showing that SQL is an accessible language even when you need to do complex reports or queries.
If you want to continue learning SQL, I suggest our Advanced SQL track. The track contains Window Functions course, Recursive Queries, and GROUP BY Extensions in SQL. All of them cover complex areas of the SQL language in simple words and with plenty of examples. Sign up today for free!