28th Feb 2023 Updated: 31st Oct 2024 24 minutes read

25 Advanced SQL Query Examples

advanced sql

Table of Contents

Table Employee
25 Advanced SQL Query Examples with Explanations
Practice Advanced SQL with LearnSQL.com Courses

One of the best ways to learn advanced SQL is by studying example queries. In this article, we'll show 25 examples of advanced SQL queries from medium to high complexity. You can use them to refresh your knowledge of advanced SQL or to review before a SQL interview.

Table Employee

Many of the examples in this article will be based on the following employee table. Only a few examples will be based on other tables; in these cases, the tables will be explained along with the example.

employee_id	first_name	last_name	dept_id	manager_id	salary	expertise
100	John	White	IT	103	120000	Senior
101	Mary	Danner	Account	109	80000	junior
102	Ann	Lynn	Sales	107	140000	Semisenior
103	Peter	O'connor	IT	110	130000	Senior
106	Sue	Sanchez	Sales	107	110000	Junior
107	Marta	Doe	Sales	110	180000	Senior
109	Ann	Danner	Account	110	90000	Senior
110	Simon	Yang	CEO	null	250000	Senior
111	Juan	Graue	Sales	102	37000	Junior

Even for persons with SQL expertise, a good online interactive SQL course can be a real help. You can find the most complete set of interactive SQL courses in our SQL from A to Z track. It contains 7 interactive SQL courses with over 850(!) exercises logically arranged to take you from a complete beginner to an advanced SQL user. The beginner courses cover the foundations of SQL and are a perfect way to review and refresh your basic SQL knowledge. The advanced SQL courses will teach you concepts like window functions, recursive queries, and complex SQL reports.

Ok, let’s dig into our advanced SQL queries!

25 Advanced SQL Query Examples with Explanations

Question: What is a SQL Query?

A SQL query is a command written in Structured Query Language (SQL) that's used to retrieve, filter, combine, manage, or manipulate data in a database.

Example 1. Ranking Rows Based on a Specific Ordering Criteria

Sometimes we need to create a SQL query to show a ranking of rows based on a specific order criteria. In this example query, we will show a list of all employees ordered by salary (highest salary first). The report will include the position of each employee in the ranking.

Here’s the code:

SELECT 	
  employee_id, 
  last_name, 
  first_name, 
  salary, 
  RANK() OVER (ORDER BY salary DESC) as ranking
FROM employee
ORDER BY ranking

In the above query, we use the function RANK(). It is a window function that returns each row’s position in the result set, based on the order defined in the OVER clause (1 for the highest salary, 2 for the second-highest, and so on). We need to use an ORDER BY ranking clause at the end of the query to indicate the order on which the result set will be shown.

If you want to know more about ranking functions in SQL, I recommend our article What Is the RANK() Function in SQL, and How Do You Use It?

Example 2. List The First 5 Rows of a Result Set

The next SQL query creates a report with the employee data for the top 5 salaries in the company. This kind of report must be ordered based on a given criteria; in our example, the order criteria will again be salary DESC:

WITH employee_ranking AS (
  SELECT 
    employee_id, 
    last_name, 
    first_name, 
    salary,
    RANK() OVER (ORDER BY salary DESC) as ranking
  FROM employee
)
SELECT
  employee_id, 
  last_name, 
  first_name, 
  salary
FROM employee_ranking
WHERE ranking <= 5
ORDER BY ranking

The WITH clause in the previous query creates a CTE called employee_ranking, which is a kind of virtual table that’s consumed in the main query. The subquery in the CTE uses the function RANK() to obtain the position of each row in the ranking. The clause OVER (ORDER BY salary DESC) indicates how the RANK() value must be calculated. The RANK() function for the row with the highest salary will return 1, and so on.

Finally, in the WHERE of the main query we ask for those rows with a ranking value smaller or equal than 5. This lets us obtain only the top 5 rows by ranking value. Again, we use an ORDER BY clause to show the result set, which is ordered by rank ascending.

Example 3. List the Last 5 Rows of a Result Set

This query is similar to the top 5 query, but we want the last 5 rows. We only need to introduce a change in the type of order, i.e. using ASC instead of DESC. In the CTE, we will create a ranking column based on an ascendent order of salary (lowest salary first):

WITH employee_ranking AS (
  SELECT 
    employee_id, 
    last_name, 
    first_name, 
    salary,
    RANK() OVER (ORDER BY salary ASC) as ranking
  FROM employee
)
SELECT
  employee_id, 
  last_name, 
  first_name, 
  salary
FROM employee_ranking
WHERE ranking <= 5
ORDER BY ranking

In the main query, we use WHERE ranking <= 5 to filter the rows with the 5 lowest salaries. After that, we use ORDER BY ranking to order the rows of the report by ranking value.

Example 4. List The Second Highest Row of a Result Set

Let’s suppose we’d like to obtain the data of the employee with the second highest salary in the company. We can apply a similar approach to our previous query:

WITH employee_ranking AS (
  SELECT 
    employee_id, 
    last_name, 
    first_name, 
    salary,
    RANK() OVER (ORDER BY salary DESC) as ranking
  FROM employee
)
SELECT 
  employee_id, 
  last_name, 
  first_name, 
  salary
FROM employee_ranking
WHERE ranking = 2

The WHERE condition ranking = 2 is used to filter the rows with the salary in position 2. Note that we can have more than one employee in position 2 if they have the same salary.

At this point, it is important to understand the behavior of the RANK() function as well as other available functions like ROW_NUMBER() and DENSE_RANK(). This topic is covered in detail in our Overview of Ranking Functions in SQL. I strongly recommend reading this article if you need to work with different kinds of rankings.

Example 5. List the Second Highest Salary By Department

Let’s add a variation to the previous SQL query. As each of our employees belongs to a department, we now want a report showing the department ID and the employee name with the second highest salary in this department. We want one record for each department in the company. Here’s the query:

WITH employee_ranking AS (
  SELECT 
    employee_id, 
    last_name, 
    first_name, 
    salary, 
    dept_id
    RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) as ranking
  FROM employee
)
SELECT
  dept_id, 
  employee_id, 
  last_name, 
  first_name, 
  salary
FROM employee_ranking
WHERE ranking = 2
ORDER BY dept_id, last_name

The main change introduced in this query is the PARTITION BY dept_id clause in OVER. This clause groups rows with the same dept_id, ordering the rows in each group by salary DESC. Then the RANK() function is calculated for each department.

In the main query, we return the dept_id and the employee data for those employees in position 2 of their departmental ranking.

You can read more about combining RANK() with PARTITION BY in How to Use the SQL RANK OVER (PARTITION BY). For those readers who want to find out more about finding the Nth highest row in a group, I recommend the article How to Find the Nth-Highest Salary by Department with SQL.

Example 6. List the First 50% Rows in a Result Set

In some cases, we could be interested in obtaining the first 50% of the result set, (or any other percentage). For this kind of report, there is a SQL function called NTILE() which receives an integer parameter indicating the number of subsets into which we want to divide the entire result set. For example NTILE(2) divides the result set into 2 subsets with the same quantity of elements; for each row, it returns a 1 or a 2 depending on the subset where the row is located.

Here’s the query:

WITH employee_ranking AS (
  SELECT 
    employee_id, 
    last_name, 
    first_name, 
    salary,
    NTILE(2) OVER (ORDER BY salary ) as ntile
  FROM employee
)
SELECT 
  employee_id, 
  last_name, 
  first_name, 
  salary
FROM employee_ranking
WHERE ntile = 1
ORDER BY salary

The above query returns only the rows in the first half of a report of employees ordered by salary in ascending order. We use the condition ntile = 1 to filter only those rows in the first half of the report. If you are interested in the NTILE() window function, see our article on the NTILE Function.

Example 7. List the Last 25% Rows in a Result Set

As with the previous query, in this example we will use NTILE(4) to divide the result set into 4 subsets; each subset will have 25% of the total result set. Using the NTILE() function, we will generate a column called ntile with the values 1, 2, 3, and 4:

WITH employee_ranking AS (
  SELECT 
    employee_id, 
    last_name, 
    first_name, 
    salary,
    NTILE(4) OVER (ORDER BY salary) as ntile
  FROM employee
)
SELECT 
  employee_id, 
  last_name, 
  first_name, 
  salary
FROM employee_ranking
WHERE ntile = 4
ORDER BY salary

The WHERE ntile = 4 condition filters only the rows in the last quarter of the report. The last clause ORDER BY salary orders the result set to be returned by the query, while OVER (ORDER BY salary) orders the rows before dividing them into 4 subsets using NTILE(4).

Example 8. Number the Rows in a Result Set

Sometimes we want to create a ranking that assigns each row a number indicating the position of that row in the ranking: 1 to the first row, 2 to the second one, and so on. SQL provides a few ways to do this. If we want a simple sequence of numbers from 1 to N, we can use the ROW_NUMBER() function. However if we want a ranking that allows two rows in the same position (i.e. because they share the same value) we can use the RANK() or DENSE_RANK() function. The following query creates a report where each row has a position value:

SELECT
  employee_id, 
  last_name, 
  first_name, 
  salary,
  ROW_NUMBER() OVER (ORDER BY employee_id) as ranking_position
FROM employee

If you want to learn about different advanced ranking functions, I recommend the article Overview of Ranking Functions in SQL.

Example 9. List All Combinations of Rows from Two Tables

In some cases, we could need a join that includes all the possible combinations of rows from two tables. Suppose we have a food company where we sell 3 kinds of cereal: corn flakes, sugared corn flakes, and rice flakes. All these cereals are sold in 3 different packaging sizes: 1 pound, 3 pounds, and 5 pounds. As we offer 3 products in 3 different package sizes, then we offer nine different combinations.

We have a product table with 3 records (corn flakes, sugared corn flakes and rice flakes) and another table called box_size with 3 records one for 1 pound and two records for 3 and 5 pounds, respectively. If we want to create a report with the price list for our nine combinations, we can use the following query:

SELECT
  grain.product_name,
  box_size.description, 
  grain.price_per_pound * box_size.box_weight
FROM product
CROSS JOIN box_sizes

The query result will be:

product	package_size	price
Corn flake	1 pound box	2.43
Corn flake	3 pound box	7.29
Corn flake	5 pound box	12.15
Sugared corn flake	1 pound box	2.85
Sugared corn flake	3 pound box	8.55
Sugared corn flake	5 pound box	14.25
Rice flake	1 pound box	1.98
Rice flake	3 pound box	5.94
Rice flake	5 pound box	9.90

The CROSS JOIN clause without any condition produces a table with all row combinations from both tables. Note we calculate the price based on the per-pound price stored in the product table and the weight from box_sizes with the expression:

    grain.price_per_pound * box_size.box_weight

A deep dive into the CROSS JOIN can be found in A Guide to the SQL CROSS JOIN.

Example 10. Join a Table to Itself

In some cases, we need to join a table to itself. Think about the employee table. Every row has a column called manager_id with the ID of the manager supervising this employee. Using a self-join we can obtain a report with the columns employee_name and manager_name; this will show us who manages each employee. Here is the query:

SELECT 	
  e1.first_name ||' '|| e1.last_name AS manager_name,
  e2.first_name ||' '|| e2.last_name AS employee_name
FROM employee e1
JOIN employee e2 
ON e1.employee_id = e2.manager_id

In the above query, we can see the table employee is referenced twice as e1 and e2, and the join condition is e1.employee_id = e2.manager_id. This condition links each employee row with the manager row. The article What Is a Self Join in SQL? An Explanation With Seven Examples will give you more ideas about when you can apply self joins in your SQL queries.

Example 11. Show All Rows with an Above-Average Value

We need a report showing all employees with a salary that’s higher than the company average. We can first create a subquery to obtain the average salary of the company, and then compare the salary of every employee with the subquery result. This is shown in the following example:

SELECT 
  first_name, 
  last_name, 
  salary
FROM employee  
WHERE salary > ( SELECT AVG(salary) FROM employee )

You can see the subquery that obtains the average salary in the WHERE clause. In the main query, we select the employee name and salary. You can read more about subqueries in the article SQL Subquery Practice: 15 Exercises with Solutions.

Example 12. Employees with Salaries Higher Than Their Departmental Average

Let’s suppose we want to obtain records for employees with salaries higher than the average salary in their departments. This query is different than the previous one because now we need a subquery to obtain the average salary for the current employee’s department rather than the entire company This is called a correlated subquery because there’s a reference to a column in the current row of the main table of the query within the subquery.

Here’s the code:

SELECT
  first_name, 
  last_name, 
  salary
FROM employee e1 
WHERE salary > 
    (SELECT AVG(salary) 
     FROM employee e2 
     WHERE e1.departmet_id = e2.department_id)

In the subquery, we can see a reference to the column e1.department_id, which is a column referenced in the main query. The condition e1.departmet_id = e2.department_id is the key in the subquery because it allows us to obtain the average of all employees in the department of the current row. Once we obtain the departmental average salary, we compare it with the employee salary and filter accordingly.

Example 13. Obtain All Rows Where a Value Is in a Subquery Result

Suppose that John Smith manages several departments and we want to obtain a list of all the employees in those departments. We’ll use a subquery to obtain IDs of the departments managed by John Smith. Then we’ll use the IN operator to find the employees working in those departments:

SELECT 	
  first_name, 
  last_name
FROM employee e1 
WHERE department_id IN (
   SELECT department_id 
   FROM department
   WHERE manager_name=‘John Smith’)

The previous subquery is a multi-row subquery: it returns more than one row. In fact, it will return several rows because John Smith manages many departments. When working with multi-row subqueries, you need to use specific operators (like IN) in the WHERE condition involving the subquery.

Example 14. Find Duplicate Rows in SQL

If a table has duplicate rows, you can find them with SQL. Use a query with a GROUP BY clause including all the columns in the table and a HAVING clause to filter rows that appear more than one time. Here’s an example:

SELECT 	
  employee_id, 
  last_name, 
  first_name, 
  dept_id, 
  manager_id, 
  salary
FROM employee
GROUP BY 	
  employee_id, 
  last_name, 
  first_name, 
  dept_id, 
  manager_id, 
  salary
HAVING COUNT(*) > 1

The rows that are not duplicated will have a COUNT(*) equal to 1, but those rows that exist many times will have a COUNT(*) returning the number of times that the row exists. I suggest the article How to Find Duplicate Values in SQL if you want to find more details about this technique.

Example 15. Count Duplicate Rows

If you want to count duplicate rows, you can use the following query. It’s similar to the previous one, but we add a COUNT(*) in the SELECT list to show how many times each duplicate row appears in the table:

SELECT 	
  employee_id, 
  last_name, 
  first_name, 
  dept_id, 
  manager_id, 
  salary, 
  COUNT(*) AS number_of_rows
FROM employee
GROUP BY
  employee_id, 
  last_name, 
  first_name, 
  dept_id, 
  manager_id, 
  salary
HAVING COUNT(*) > 1

Again, you can find valuable information about how to manage duplicate records in the article How To Find Duplicate Records in SQL.

Example 16. Find Common Records Between Tables

If you have two tables with the same schema or if two tables have a subset of columns in common, you can obtain the rows that appear in both tables with the set operator INTERSECT. Let’s suppose we have a snapshot of the table employee taken in Jan 2020 called employee_2020_jan and we want to obtain the list of employees that exist in both tables. We can do that with this query:

SELECT 
  last_name, 
  first_name 
FROM employee
INTERSECT
SELECT 
  last_name, 
  first_name 
FROM employee_2020_jan

As a result, we will obtain a list of employees that appear in both tables. Perhaps they’ll have different values on the columns like salary or dept_id. In other words, we are obtaining those employees who worked for the company in Jan 2020 and who are still working for the company.

If you are interested in finding more about set operators, I suggest the article Introducing SQL Set Operators: Union, Union All, Minus, and Intersect.

Example 17. Grouping Data with ROLLUP

The GROUP BY clause in SQL is used to aggregate rows in groups and apply functions to all the rows in the group, returning a single result value. For example, if we want to obtain a report with the total salary amount per department and expertise level, we can do the following query:

SELECT 	
  dept_id, 
  expertise, 
  SUM(salary) total_salary
FROM	employee
GROUP BY dept_id, expertise

The GROUP BY has the optional clause ROLLUP, which allows it to include additional groupings in one query. Adding the ROLLUP clause to our example could give us the total sum of salaries for each department (no matter what expertise level the employee has) and the total sum of salaries for the whole table (no matter the employee’s department and expertise level). The modified query is:

SELECT 
  dept_id, 
  expertise, 
  SUM(salary) total_salary
FROM employee
GROUP BY ROLLUP (dept_id, expertise)

And the result will be:

dept_id	expertise	total_salary
Account	Senior	90000
Account	Junior	80000
Account	NULL	170000
CEO	Senior	250000
CEO	NULL	250000
IT	Senior	250000
IT	NULL	250000
Sales	Junior	110000
Sales	Semisenior	140000
Sales	Senior	180000
Sales	NULL	430000
NULL	NULL	1100000

The rows in the result set with a NULL are the extra rows added by the ROLLUP clause. A NULL value in the column expertise means a group of rows for a specific value of dept_id but without a specific expertise value. In other words, it is the total amount of salaries for each dept_id. In the same way, the last row of the result having a NULL for columns dept_id and expertise means the grand total for all departments in the company.

If you want to learn more about the ROLLUP clause and other similar clauses like CUBE, the article Grouping, Rolling, and Cubing Data has lots of examples.

Example 18. Conditional Summation

In some cases, we need to summarize or count values based on some condition(s). For example, if we want to obtain the total salaries in the Sales and Human Resources departments combined and in the IT and Support departments combined, we can execute the following query:

SELECT 
  SUM (CASE
    WHEN dept_id IN ('SALES','HUMAN RESOURCES') 
    THEN salary
    ELSE 0 END) AS total_salary_sales_and_hr,
  SUM (CASE
    WHEN dept_id IN ('IT','SUPPORT') 
    THEN salary
    ELSE 0 END) AS total_salary_it_and_support
FROM employee

The query returns a single row with two columns. The first column shows the total salary for the Sales and Human Resources departments. This value is calculated using the SUM() function on the salary column – but only when the employee belongs to the Sales or Human Resources department. A zero is added to the sum when the employee belongs to any other department. The same idea is applied for the total_salary_it_and_support column.

The article How to Use CASE WHEN with SUM() in SQL provides more details about this technique.

Example 19. Group Rows by a Range

In the next example query, we will create the salary ranges low, medium, and high. Then we will count how many employees are in each salary range:

SELECT 
  CASE
    WHEN salary <= 750000 THEN 'low'
    WHEN salary > 750000 AND salary <= 100000 THEN 'medium'
    WHEN salary > 100000 THEN 'high'
  END AS salary_category, 
  COUNT(*) AS number_of_employees
FROM	employee
GROUP BY 
  CASE
    WHEN salary <= 750000 THEN 'low'
    WHEN salary > 750000 AND salary <= 100000 THEN 'medium'
    WHEN salary > 100000 THEN 'high'
END

In this query, we use CASE to define the salary range for each employee. You can see the same CASE statement twice. The first one defines the ranges, as we just said; the second one in the GROUP BY aggregates records and applies the COUNT(*) function to each group of records. You can use the CASE statement in the same way to compute counts or sums for other custom-defined levels.

How to Use CASE in SQL explains other examples of CASE statements like the one used in this query.

Example 20. Compute a Running Total in SQL

A running total is a very common SQL pattern, one that’s used frequently in finance and in trend analysis.

When you have a table that stores any daily metric, such as a sales table with the columns day and daily_amount, you can calculate the running total as the cumulative sum of all previous daily_amount values. SQL provides a window function called SUM() to do just that.

In the following query, we’ll calculate the cumulative sales for each day:

SELECT 
  day,
  daily_amount,
  SUM (daily_amount) OVER (ORDER BY day) AS running_total
FROM sales

The SUM() function uses the OVER() clause to define the order of the rows; all rows previous to the current day are included in the SUM(). Here’s a partial result:

day	daily_amount	running_total
Jan 30, 2023	1000.00	1000.00
Jan 31, 2023	800.00	1800.00
Feb 1, 2023	700.00	2500.00

The first two columns day and daily_amount are values taken directly from the table sales. The column running_total is calculated by the expression:

SUM (daily_amount) OVER (order by day)

You can clearly see how the running_total is the accumulated sum of the previous daily_amounts.

If you wish to go deeper on this topic, I suggest the article What Is a SQL Running Total and How Do You Compute It?, which includes many clarifying examples.

Example 21. Compute a Moving Average in SQL

A moving average is a time series technique for analyzing trends in data. It is calculated as the average of the current value and a specified number of immediately preceding values for each point in time. The main idea is to examine how these averages behave over time instead of examining the behavior of the original or raw data points.

Let’s calculate the moving average for the last 7 days using the sales table from the previous example:

SELECT 
  day,
  daily_amount,
  AVG (daily_amount) OVER (ORDER BY day ROWS 6 PRECEDING)
    AS moving_average
FROM sales

In the above query, we use the AVG() window function to calculate the average using the current row (today) and the previous 6 rows. As the rows are ordered by day, the current row and the 6 previous rows defines a period of 1 week.

The article What a Moving Average Is and How to Compute it in SQL goes into detail about this subject; check it out if you want to learn more.

Example 22. Compute a Difference (Delta) Between Two Columns on Different Rows

There’s more than one way to calculate the difference between two rows in SQL. One way to do it is by using the window functions LEAD() and LAG(), as we will do in this example.

Let’s suppose we want to obtain a report with the total amount sold on each day, but we also want to obtain the difference (or delta) related to the previous day. We can use a query like this one:

SELECT 
  day,
  daily_amount,
  daily_amount - LAG(daily_amount) OVER (ORDER BY day)
    AS delta_yesterday_today
FROM sales

The key expression in this query is:

daily_amount - LAG(daily_amount) OVER (ORDER BY day)

Both elements of the arithmetic difference come from different rows. The first element comes from the current row and LAG(daily_amount) comes from the previous day row. LAG() returns the value of any column from the previous row (based on the ORDER BY specified in the OVER clause).

If you want to read more about LAG() and LEAD(), I suggest the article How to Calculate the Difference Between Two Rows in SQL.

Example 23. Compute a Year-Over-Year Difference

Year-over-year (YOY) or month-to-month comparisons are a popular and effective way to evaluate the performance of several kinds of organizations. You can calculate the comparison as a value or as a percentage.

In this example, we will use the sales table, which has data in a daily granularity. We first need to aggregate the data to the year or month, which we will do by creating a CTE with amounts aggregated by year. Here’s the query:

WITH year_metrics AS (
  SELECT 
    extract(year from day) as year,
    SUM(daily_amount) as year_amount
  FROM sales 
  GROUP BY year)
SELECT 
  year, 
  year_amount,
  LAG(year_amount) OVER (ORDER BY year) AS revenue_previous_year,
  year_amount - LAG(year_amount) OVER (ORDER BY year) as yoy_diff_value,
  ((year_amount - LAG(year_amount) OVER (ORDER BY year) ) /
     LAG(year_amount) OVER (ORDER BY year)) as yoy_diff_perc
FROM year_metrics
ORDER BY 1

The first expression to analyze is the one used to calculate yoy_diff_value:

year_amount - LAG(year_amount ) OVER (ORDER BY year)

It is used to calculate the difference (as a value) between the amount of the current year and the previous year using the LAG() window function and ordering the data by year.

In the next expression, we calculate the same difference as a percentage. This calculation is a little more complex because we need to divide by the previous year’s amount. (Note: We use the previous year as the base for percentage calculation, so the previous year is 100 percent.)

((year_amount-LAG(year_amount ) OVER(ORDER BY year))/LAG(year_amount ) OVER(ORDER BY year))

In the article How to Compute Year-Over-Year Differences in SQL, you can find several examples of calculating year-to-year and month-to-month differences.

Example 24. Use Recursive Queries to Manage Data Hierarchies

Some tables in SQL can have an implicit kind of data hierarchy. As an example, our employee table has a manager_id for each employee. We have a manager who is in charge of other managers, who in turn have other employees under their charge, and so on.

When we have this sort of organization, we can have a hierarchy of various levels. In each row, the column manager_id refers to the row on the immediate upper level in the hierarchy. In these cases, a frequent request is to obtain a list of all employees reporting (directly or indirectly) to the CEO of the company (who, in this case, has the employee_id of 110). The query to use is:

WITH RECURSIVE subordinate AS (
 SELECT  
   employee_id,
   first_name,
   last_name,
   manager_id
  FROM employee
  WHERE employee_id = 110 -- id of the top hierarchy employee (CEO)
 
  UNION ALL
 
  SELECT  
    e.employee_id,
    e.first_name,
    e.last_name,
    e.manager_id
  FROM employee e 
  JOIN subordinate s 
  ON e.manager_id = s.employee_id
)
SELECT 	
  employee_id,
  first_name,
  last_name,
  manager_id
FROM subordinate ;

In this query, we created a recursive CTE called subordinate. It’s the key part of this query because it traverses the data hierarchy going from one row to the rows in the hierarchy immediately below it.

There are two subqueries connected by a UNION ALL; the first subquery returns the top row of the hierarchy and the second query returns the next level, adding those rows to the intermediate result of the query. Then the second subquery is executed again to return the next level, which again will be added to the intermediate result set. This process is repeated until no new rows are added to the intermediate result.

Finally, the main query consumes the data in the subordinate CTE and returns data in the way we expect. If you want to learn more about recursive queries in SQL, I suggest the article How to Find All Employees Under Each Manager in SQL.

Example 25. Find the Length of a Series Using Window Functions

Suppose we have a table with user registration data. We store information about how many users registered on each date. We define a data series as the sequence of consecutive days when users registered. A day when no user registers breaks the data series. For each data series, we want to find its length.

The table below shows data series:

id	day	Registered users
1	Jan 25 2023	51
2	Jan 26 2023	46
3	Jan 27 2023	41
4	Jan 30 2023	59
5	Jan 31 2023	73
6	Feb 1 2023	34
7	Feb 2 2023	56
8	Feb 4 2023	34

There are 3 different data series shown in different colors. We are looking for a query to obtain the length of each data series. The first data series starts on Jan 25 and has a length of 3 elements, the second one starts on Jan 30 and its length is 4, and so on.

The query is as follows:

WITH data_series AS (
  SELECT  	
    RANK() OVER (ORDER BY day) AS row_number,
    day, 
    day - RANK() OVER (ORDER BY day) AS series_id
 FROM	user_registration )
SELECT	
  MIN(day) AS series_start_day,
  MAX(day) AS series_end_day,
  MAX(day) - MIN (day) + 1 AS series_length
FROM	data_series
GROUP BY series_id
ORDER BY series_start_date

In the previous query, the CTE has the column series_id, which is a value intended to be used as an ID for the rows in the same data series. In the main query, the GROUP BY series_id clause is used to aggregate rows of the same data series. Then we can obtain the start of the series with MIN(day) and its end with MAX(day). The length of the series is calculated with the expression:

      MAX(day) - MIN (day) + 1

If you want to go deeper with this topic, the article How to Calculate the Length of a Series with SQL provides a detailed explanation of this technique.

Practice Advanced SQL with LearnSQL.com Courses

SQL is an easy-to-learn and powerful language. In this article, we showed 25 examples of advanced SQL queries. All of them can be explained in about 5 minutes, showing that SQL is an accessible language even when you need to do complex reports or queries.

If you want to continue learning SQL, I suggest our Advanced SQL track. The track contains Window Functions course, Recursive Queries, and GROUP BY Extensions in SQL. All of them cover complex areas of the SQL language in simple words and with plenty of examples. Sign up today for free!

Tags:

advanced sql