# 25 Advanced SQL Query Examples

*One of the best ways to learn advanced SQL is by studying example queries. In this article, we'll show 25 examples of advanced SQL queries from medium to high complexity. You can use them to refresh your knowledge of advanced SQL or to review before a SQL interview. *

Many of the examples in this article will be based on the following ** employee** table. Only a few examples will be based on other tables; in these cases, the tables will be explained along with the example.

employee_id | first_name | last_name | dept_id | manager_id | salary | expertise |
---|---|---|---|---|---|---|

100 | John | White | IT | 103 | 120000 | Senior |

101 | Mary | Danner | Account | 109 | 80000 | junior |

102 | Ann | Lynn | Sales | 107 | 140000 | Semisenior |

103 | Peter | O'connor | IT | 110 | 130000 | Senior |

106 | Sue | Sanchez | Sales | 107 | 110000 | Junior |

107 | Marta | Doe | Sales | 110 | 180000 | Senior |

109 | Ann | Danner | Account | 110 | 90000 | Senior |

110 | Simon | Yang | CEO | null | 250000 | Senior |

111 | Juan | Graue | Sales | 102 | 37000 | Junior |

Even for persons with SQL expertise, a good online interactive SQL course can be a real help. You can find the most complete set of interactive SQL courses in our SQL from A to Z track. It contains **7 interactive SQL** courses with over 850(!) exercises logically arranged to take you from a complete beginner to an advanced SQL user. The beginner courses cover the foundations of SQL and are a perfect way to review and refresh your basic SQL knowledge. The advanced SQL courses will teach you concepts like window functions, recursive queries, and complex SQL reports. Create a free LearnSQL.com account and try our interactive courses without having to spend any money. Then, if you like what you’re learning, you can buy full access.

Ok, let’s dig into our advanced SQL queries!

## 25 Advanced SQL Query Examples with Explanations

- Example #1 - Ranking Rows Based on a Specific Ordering Criteria
- Example #2 - List The First 5 Rows of a Result Set
- Example #3 - List the Last 5 Rows of a Result Set
- Example #4 - List The Second Highest Row of a Result Set
- Example #5 - List the Second Highest Salary By Department
- Example #6 - List the First 50% Rows in a Result Set
- Example #7 - List the Last 25% Rows in a Result Set
- Example #8 - Number the Rows in a Result Set
- Example #9 - List All Combinations of Rows from Two Tables
- Example #10 – Join a Table to Itself
- Example #11 – Show All Rows with an Above-Average Value
- Example #12 – Employees with Salaries Higher Than Their Departmental Average
- Example #13 – Obtain All Rows Where a Value Is in a Subquery Result
- Example #14 – Find Duplicate Rows in SQL
- Example #15 – Count Duplicate Rows
- Example #16 – Find Common Records Between Tables
- Example #17 – Grouping Data with ROLLUP
- Example #18 – Conditional Summation
- Example #19 – Group Rows by a Range
- Example #20 – Compute a Running Total in SQL
- Example #21 – Compute a Moving Average in SQL
- Example #22 – Compute a Difference (Delta) Between Two Columns on Different Rows
- Example #23 – Compute a Year-Over-Year Difference
- Example #24 – Use Recursive Queries to Manage Data Hierarchies
- Example #25 – Find the Length of a Series Using Window Functions

### Example #1 - Ranking Rows Based on a Specific Ordering Criteria

Sometimes we need to create a SQL query to show a ranking of rows based on a specific order criteria. In this example query, we will show a list of all employees ordered by salary (highest salary first). The report will include the position of each employee in the ranking.

Here’s the code:

SELECT employee_id, last_name, first_name, salary, RANK() OVER (ORDER BY salary DESC) as ranking FROM employee ORDER BY ranking

In the above query, we use the function `RANK()`

. It is a window function that returns each row’s position in the result set, based on the order defined in the `OVER`

clause (1 for the highest salary, 2 for the second-highest, and so on). We need to use an `ORDER BY`

ranking clause at the end of the query to indicate the order on which the result set will be shown.

If you want to know more about ranking functions in SQL, I recommend our article What Is the RANK() Function in SQL, and How Do You Use It?

### Example #2 - List The First 5 Rows of a Result Set

The next SQL query creates a report with the employee data for the **top 5 **salaries in the company. This kind of report must be ordered based on a given criteria; in our example, the order criteria will again be `salary DESC`

:

WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, RANK() OVER (ORDER BY salary DESC) as ranking FROM employee ) SELECT employee_id, last_name, first_name, salary FROM employee_ranking WHERE ranking <= 5 ORDER BY ranking

The `WITH`

clause in the previous query creates a CTE called `employee_ranking`

, which is a kind of virtual table that’s consumed in the main query. The subquery in the CTE uses the function `RANK()`

to obtain the position of each row in the ranking. The clause `OVER (ORDER BY salary DESC)`

indicates how the `RANK()`

value must be calculated. The `RANK()`

function for the row with the highest salary will return 1, and so on.

Finally, in the `WHERE`

of the main query we ask for those rows with a ranking value smaller or equal than 5. This lets us obtain only the top 5 rows by ranking value. Again, we use an ORDER BY clause to show the result set, which is ordered by rank ascending.

### Example #3 - List the Last 5 Rows of a Result Set

This query is similar to the top 5 query, but we want the **last 5** rows. We only need to introduce a change in the type of order, i.e. using ASC instead of DESC. In the CTE, we will create a ranking column based on an ascendent order of salary (lowest salary first):

WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, RANK() OVER (ORDER BY salary ASC) as ranking FROM employee ) SELECT employee_id, last_name, first_name, salary FROM employee_ranking WHERE ranking <= 5 ORDER BY ranking

In the main query, we use `WHERE ranking <= 5`

to filter the rows with the 5 lowest salaries. After that, we use` ORDER BY ranking`

to order the rows of the report by ranking value.

### Example #4 - List The Second Highest Row of a Result Set

Let’s suppose we’d like to obtain the data of the employee with the second highest salary in the company. We can apply a similar approach to our previous query:

WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, RANK() OVER (ORDER BY salary DESC) as ranking FROM employee ) SELECT employee_id, last_name, first_name, salary FROM employee_ranking WHERE ranking = 2

The `WHERE`

condition `ranking = 2`

is used to filter the rows with the salary in position 2. Note that we can have more than one employee in position 2 if they have the same salary.

At this point, it is important to understand the behavior of the `RANK()`

function as well as other available functions like `ROW_NUMBER()`

and `DENSE_RANK()`

. This topic is covered in detail in our Overview of Ranking Functions in SQL. I strongly recommend reading this article if you need to work with different kinds of rankings.

### Example #5 - List the Second Highest Salary By Department

Let’s add a variation to the previous SQL query. As each of our employees belongs to a department, we now want a report showing the department ID and the employee name with the second highest salary in this department. We want one record for each department in the company. Here’s the query:

WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, dept_id RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) as ranking FROM employee ) SELECT dept_id, employee_id, last_name, first_name, salary FROM employee_ranking WHERE ranking = 2 ORDER BY dept_id, last_name

The main change introduced in this query is the `PARTITION BY`

`dept_id`

clause in `OVER`

. This clause groups rows with the same `dept_id`

, ordering the rows in each group by salary `DESC`

. Then the `RANK()`

function is calculated for each department.

In the main query, we return the `dept_id`

and the employee data for those employees in position 2 of their departmental ranking.

For those readers who want to find out more about finding the *N*th highest row in a group, I recommend the article How to Find the Nth-Highest Salary by Department with SQL.

### Example #6 - List the First 50% Rows in a Result Set

In some cases, we could be interested in obtaining the first 50% of the result set, (or any other percentage). For this kind of report, there is a SQL function called `NTILE()`

which receives an integer parameter indicating the number of subsets into which we want to divide the entire result set. For example `NTILE(2)`

divides the result set into 2 subsets with the same quantity of elements; for each row, it returns a 1 or a 2 depending on the subset where the row is located.

Here’s the query:

WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, NTILE(2) OVER (ORDER BY salary ) as ntile FROM employee ) SELECT employee_id, last_name, first_name, salary FROM employee_ranking WHERE ntile = 1 ORDER BY salary

The above query returns only the rows in the first half of a report of employees ordered by salary in ascending order. We use the condition `ntile = 1`

to filter only those rows in the first half of the report. If you are interested in the `NTILE()`

window function, see the article Common SQL Window Functions: Using Partitions With Ranking Functions.

### Example #7 - List the Last 25% Rows in a Result Set

As with the previous query, in this example we will use `NTILE(4)`

to divide the result set into 4 subsets; each subset will have 25% of the total result set. Using the `NTILE()`

function, we will generate a column called `ntile`

with the values 1, 2, 3, and 4:

WITH employee_ranking AS ( SELECT employee_id, last_name, first_name, salary, NTILE(4) OVER (ORDER BY salary) as ntile FROM employee ) SELECT employee_id, last_name, first_name, salary FROM employee_ranking WHERE ntile = 4 ORDER BY salary

The `WHERE ntile = 4`

condition filters only the rows in the last quarter of the report. The last clause `ORDER BY salary`

orders the result set to be returned by the query, while `OVER (ORDER BY salary)`

orders the rows before dividing them into 4 subsets using `NTILE(4)`

.

### Example #8 - Number the Rows in a Result Set

Sometimes we want to create a ranking that assigns each row a number indicating the position of that row in the ranking: 1 to the first row, 2 to the second one, and so on. SQL provides a few ways to do this. If we want a simple sequence of numbers from 1 to *N,* we can use the `ROW_NUMBER()`

function. However if we want a ranking that allows two rows in the same position (i.e. because they share the same value) we can use the `RANK()`

or `DENSE_RANK()`

function. The following query creates a report where each row has a position value:

SELECT employee_id, last_name, first_name, salary, ROW_NUMBER() OVER (ORDER BY employee_id) as ranking_position FROM employee

If you want to learn about different advanced ranking functions, I recommend the article Overview of Ranking Functions in SQL.

### Example #9 - List All Combinations of Rows from Two Tables

In some cases, we could need a join that includes all the possible combinations of rows from two tables. Suppose we have a food company where we sell 3 kinds of cereal: corn flakes, sugared corn flakes, and rice flakes. All these cereals are sold in 3 different packaging sizes: 1 pound, 3 pounds, and 5 pounds. As we offer 3 products in 3 different package sizes, then we offer nine different combinations.

We have a ** product** table with 3 records (corn flakes, sugared corn flakes and rice flakes) and another table called

**with 3 records one for 1 pound and two records for 3 and 5 pounds, respectively. If we want to create a report with the price list for our nine combinations, we can use the following query:**

`box_size`

SELECT grain.product_name, box_size.description, grain.price_per_pound * box_size.box_weight FROM product CROSS JOIN box_sizes

The query result will be:

product | package_size | price |
---|---|---|

Corn flake | 1 pound box | 2.43 |

Corn flake | 3 pound box | 7.29 |

Corn flake | 5 pound box | 12.15 |

Sugared corn flake | 1 pound box | 2.85 |

Sugared corn flake | 3 pound box | 8.55 |

Sugared corn flake | 5 pound box | 14.25 |

Rice flake | 1 pound box | 1.98 |

Rice flake | 3 pound box | 5.94 |

Rice flake | 5 pound box | 9.90 |

The `CROSS JOIN`

clause without any condition produces a table with all row combinations from both tables. Note we calculate the price based on the per-pound price stored in the ** product** table and the weight from

**with the expression:**

`box_sizes`

grain.price_per_pound * box_size.box_weight

A deep dive into the `CROSS JOIN`

can be found in An Illustrated Guide to the SQL CROSS JOIN.

### Example #10 – Join a Table to Itself

In some cases, we need to join a table to itself. Think about the ** employee** table. Every row has a column called

`manager_id`

with the ID of the manager supervising this employee. Using a self-join we can obtain a report with the columns `employee_name`

and `manager_name`

; this will show us who manages each employee. Here is the query:SELECT e1.first_name ||' '|| e1.last_name AS manager_name, e2.first_name ||' '|| e2.last_name AS employee_name FROM employee e1 JOIN employee e2 ON e1.employee_id = e2.manager_id

In the above query, we can see the table ** employee** is referenced twice as

`e1`

and `e2`

, and the join condition is `e1.employee_id = e2.manager_id`

. This condition links each employee row with the manager row. The article What Is a Self Join in SQL? An Explanation With Seven Examples will give you more ideas about when you can apply self joins in your SQL queries.### Example #11 – Show All Rows with an Above-Average Value

We need a report showing all employees with a salary that’s higher than the company average. We can first create a subquery to obtain the average salary of the company, and then compare the salary of every employee with the subquery result. This is shown in the following example:

SELECT first_name, last_name, salary FROM employee WHERE salary > ( SELECT AVG(salary) FROM employee )

You can see the subquery that obtains the average salary in the WHERE clause. In the main query, we select the employee name and salary. You can read more about subqueries in the article How to practice SQL subqueries.

### Example #12 – Employees with Salaries Higher Than Their Departmental Average

Let’s suppose we want to obtain records for employees with salaries higher than the average salary in their departments. This query is different than the previous one because now we need a subquery to obtain the average salary for the current employee’s department rather than the entire company This is called a correlated subquery because there’s a reference to a column in the current row of the main table of the query within the subquery.

Here’s the code:

SELECT first_name, last_name, salary FROM employee e1 WHERE salary > (SELECT AVG(salary) FROM employee e2 WHERE e1.departmet_id = e2.department_id)

In the subquery, we can see a reference to the column e1.department_id, which is a column referenced in the main query. The condition `e1.departmet_id = e2.department_id`

is the key in the subquery because it allows us to obtain the average of all employees in the department of the current row. Once we obtain the departmental average salary, we compare it with the employee salary and filter accordingly.

### Example #13 – Obtain All Rows Where a Value Is in a Subquery Result

Suppose that John Smith manages several departments and we want to obtain a list of all the employees in those departments. We’ll use a subquery to obtain IDs of the departments managed by John Smith. Then we’ll use the `IN`

operator to find the employees working in those departments:

SELECT first_name, last_name FROM employee e1 WHERE department_id IN ( SELECT department_id FROM department WHERE manager_name=‘John Smith’)

The previous subquery is a multi-row subquery: it returns more than one row. In fact, it will return several rows because John Smith manages many departments. When working with multi-row subqueries, you need to use specific operators (like IN) in the WHERE condition involving the subquery.

### Example #14 – Find Duplicate Rows in SQL

If a table has duplicate rows, you can find them with SQL. Use a query with a `GROUP BY`

clause including all the columns in the table and a `HAVING`

clause to filter rows that appear more than one time. Here’s an example:

SELECT employee_id, last_name, first_name, dept_id, manager_id, salary FROM employee GROUP BY employee_id, last_name, first_name, dept_id, manager_id, salary HAVING COUNT(*) > 1

The rows that are not duplicated will have a `COUNT(*)`

equal to 1, but those rows that exist many times will have a `COUNT(*)`

returning the number of times that the row exists. I suggest the article How to Find Duplicate Values in SQL if you want to find more details about this technique.

### Example #15 – Count Duplicate Rows

If you want to count duplicate rows, you can use the following query. It’s similar to the previous one, but we add a `COUNT(*)`

in the `SELECT`

list to show how many times each duplicate row appears in the table:

SELECT employee_id, last_name, first_name, dept_id, manager_id, salary, COUNT(*) AS number_of_rows FROM employee GROUP BY employee_id, last_name, first_name, dept_id, manager_id, salary HAVING COUNT(*) > 1

Again, you can find valuable information about how to manage duplicate records in the article How To Find Duplicate Records in SQL.

### Example #16 – Find Common Records Between Tables

If you have two tables with the same schema or if two tables have a subset of columns in common, you can obtain the rows that appear in both tables with the set operator `INTERSECT`

. Let’s suppose we have a snapshot of the table ** employee** taken in Jan 2020 called

**and we want to obtain the list of employees that exist in both tables. We can do that with this query:**

`employee_2020_jan`

SELECT last_name, first_name FROM employee INTERSECT SELECT last_name, first_name FROM employee_2020_jan

As a result, we will obtain a list of employees that appear in both tables. Perhaps they’ll have different values on the columns like `salary`

or `dept_id`

. In other words, we are obtaining those employees who worked for the company in Jan 2020 and who are still working for the company.

If you are interested in finding more about set operators, I suggest the article Introducing SQL Set Operators: Union, Union All, Minus, and Intersect.

### Example #17 – Grouping Data with ROLLUP

The `GROUP BY`

clause in SQL is used to aggregate rows in groups and apply functions to all the rows in the group, returning a single result value. For example, if we want to obtain a report with the total salary amount per department and expertise level, we can do the following query:

SELECT dept_id, expertise, SUM(salary) total_salary FROM employee GROUP BY dept_id, expertise

The `GROUP BY`

has the optional clause `ROLLUP`

, which allows it to include additional groupings in one query. Adding the `ROLLUP`

clause to our example could give us the total sum of salaries for each department (no matter what expertise level the employee has) and the total sum of salaries for the whole table (no matter the employee’s department and expertise level). The modified query is:

SELECT dept_id, expertise, SUM(salary) total_salary FROM employee GROUP BY ROLLUP (dept_id, expertise)

And the result will be:

dept_id | expertise | total_salary |
---|---|---|

Account | Senior | 90000 |

Account | Junior | 80000 |

Account | NULL | 170000 |

CEO | Senior | 250000 |

CEO | NULL | 250000 |

IT | Senior | 250000 |

IT | NULL | 250000 |

Sales | Junior | 110000 |

Sales | Semisenior | 140000 |

Sales | Senior | 180000 |

Sales | NULL | 430000 |

NULL | NULL | 1100000 |

The rows in the result set with a `NULL`

are the extra rows added by the `ROLLUP`

clause. A `NULL`

value in the column `expertise`

means a group of rows for a specific value of `dept_id`

but without a specific `expertise`

value. In other words, it is the total amount of salaries for each `dept_id`

. In the same way, the last row of the result having a `NULL`

for columns `dept_id`

and `expertise`

means the grand total for all departments in the company.

If you want to learn more about the `ROLLUP`

clause and other similar clauses like `CUBE`

, the article Grouping, Rolling, and Cubing Data has lots of examples.

### Example #18 – Conditional Summation

In some cases, we need to summarize or count values based on some condition(s). For example, if we want to obtain the total salaries in the Sales and Human Resources departments combined and in the IT and Support departments combined, we can execute the following query:

SELECT SUM (CASE WHEN dept_id IN ('SALES','HUMAN RESOURCES') THEN salary ELSE 0 END) AS total_salary_sales_and_hr, SUM (CASE WHEN dept_id IN ('IT','SUPPORT') THEN salary ELSE 0 END) AS total_salary_it_and_support FROM employee

The query returns a single row with two columns. The first column shows the total salary for the Sales and Human Resources departments. This value is calculated using the `SUM()`

function on the `salary`

column – but only when the employee belongs to the Sales or Human Resources department. A zero is added to the sum when the employee belongs to any other department. The same idea is applied for the `total_salary_it_and_support`

column.

The article How to Use CASE WHEN with SUM() in SQL provides more details about this technique.

### Example #19 – Group Rows by a Range

In the next example query, we will create the salary ranges `low`

, `medium`

, and `high`

. Then we will count how many employees are in each salary range:

SELECT CASE WHEN salary <= 750000 THEN 'low' WHEN salary > 750000 AND salary <= 100000 THEN 'medium' WHEN salary > 100000 THEN 'high' END AS salary_category, COUNT(*) AS number_of_employees FROM employee GROUP BY CASE WHEN salary <= 750000 THEN 'low' WHEN salary > 750000 AND salary <= 100000 THEN 'medium' WHEN salary > 100000 THEN 'high' END

In this query, we use CASE to define the salary range for each employee. You can see the same CASE statement twice. The first one defines the ranges, as we just said; the second one in the GROUP BY aggregates records and applies the COUNT(*) function to each group of records. You can use the CASE statement in the same way to compute counts or sums for other custom-defined levels.

How to Use CASE in SQL explains other examples of CASE statements like the one used in this query.

### Example #20 – Compute a Running Total in SQL

A running total is a very common SQL pattern, one that’s used frequently in finance and in trend analysis.

When you have a table that stores any daily metric, such as a `sales`

table with the columns `day`

and `daily_amount`

, you can calculate the running total as the cumulative sum of all previous `daily_amount`

values. SQL provides a window function called `SUM()`

to do just that.

In the following query, we’ll calculate the cumulative sales for each day:

SELECT day, daily_amount, SUM (daily_amount) OVER (ORDER BY day) AS running_total FROM sales

The `SUM()`

function uses the `OVER()`

clause to define the order of the rows; all rows previous to the current day are included in the `SUM()`

. Here’s a partial result:

day | daily_amount | running_total |
---|---|---|

Jan 30, 2023 | 1000.00 | 1000.00 |

Jan 31, 2023 | 800.00 | 1800.00 |

Feb 1, 2023 | 700.00 | 2500.00 |

The first two columns day and `daily_amount`

are values taken directly from the table

. The column **sales**`running_total`

is calculated by the expression:

SUM (daily_amount) OVER (order by day)

You can clearly see how the `running_total`

is the accumulated sum of the previous `daily_amounts`

.

If you wish to go deeper on this topic, I suggest the article What Is a SQL Running Total and How Do You Compute It?, which includes many clarifying examples.

### Example #21 – Compute a Moving Average in SQL

A moving average is a time series technique for analyzing trends in data. It is calculated as the average of the current value and a specified number of immediately preceding values for each point in time. The main idea is to examine how these averages behave over time instead of examining the behavior of the original or raw data points.

Let’s calculate the moving average for the last 7 days using the ** sales** table from the previous example:

SELECT day, daily_amount, AVG (daily_amount) OVER (ORDER BY day ROWS 6 PRECEDING) AS moving_average FROM sales

In the above query, we use the `AVG()`

window function to calculate the average using the current row (today) and the previous 6 rows. As the rows are ordered by day, the current row and the 6 previous rows defines a period of 1 week.

The article What a Moving Average Is and How to Compute it in SQL goes into detail about this subject; check it out if you want to learn more.

### Example #22 – Compute a Difference (Delta) Between Two Columns on Different Rows

There’s more than one way to calculate the difference between two rows in SQL. One way to do it is by using the window functions `LEAD()`

and `LAG()`

, as we will do in this example.

Let’s suppose we want to obtain a report with the total amount sold on each day, but we also want to obtain the difference (or delta) related to the previous day. We can use a query like this one:

SELECT day, daily_amount, daily_amount - LAG(daily_amount) OVER (ORDER BY day) AS delta_yesterday_today FROM sales

The key expression in this query is:

daily_amount - LAG(daily_amount) OVER (ORDER BY day)

Both elements of the arithmetic difference come from different rows. The first element comes from the current row and `LAG(daily_amount)`

comes from the previous day row. `LAG()`

returns the value of any column from the previous row (based on the `ORDER BY`

specified in the `OVER`

clause).

If you want to read more about `LAG()`

and `LEAD()`

, I suggest the article How to Calculate the Difference Between Two Rows in SQL.

### Example #23 – Compute a Year-Over-Year Difference

Year-over-year (YOY) or month-to-month comparisons are a popular and effective way to evaluate the performance of several kinds of organizations. You can calculate the comparison as a value or as a percentage.

In this example, we will use the ** sales** table, which has data in a daily granularity. We first need to aggregate the data to the year or month, which we will do by creating a CTE with amounts aggregated by year. Here’s the query:

WITH year_metrics AS ( SELECT extract(year from day) as year, SUM(daily_amount) as year_amount FROM sales GROUP BY year) SELECT year, year_amount, LAG(year_amount) OVER (ORDER BY year) AS revenue_previous_year, year_amount - LAG(year_amount) OVER (ORDER BY year) as yoy_diff_value, ((year_amount - LAG(year_amount) OVER (ORDER BY year) ) / LAG(year_amount) OVER (ORDER BY year)) as yoy_diff_perc FROM year_metrics ORDER BY 1

The first expression to analyze is the one used to calculate `yoy_diff_value`

:

year_amount - LAG(year_amount ) OVER (ORDER BY year)

It is used to calculate the difference (as a value) between the amount of the current year and the previous year using the `LAG()`

window function and ordering the data by year.

In the next expression, we calculate the same difference as a percentage. This calculation is a little more complex because we need to divide by the previous year’s amount. (Note: We use the previous year as the base for percentage calculation, so the previous year is 100 percent.)

((year_amount-LAG(year_amount ) OVER(ORDER BY year))/LAG(year_amount ) OVER(ORDER BY year))

In the article How to Compute Year-Over-Year Differences in SQL, you can find several examples of calculating year-to-year and month-to-month differences.

### Example #24 – Use Recursive Queries to Manage Data Hierarchies

Some tables in SQL can have an implicit kind of data hierarchy. As an example, our ** employee** table has a

`manager_id`

for each employee. We have a manager who is in charge of other managers, who in turn have other employees under their charge, and so on.When we have this sort of organization, we can have a hierarchy of various levels. In each row, the column `manager_id`

refers to the row on the immediate upper level in the hierarchy. In these cases, a frequent request is to obtain a list of all employees reporting (directly or indirectly) to the CEO of the company (who, in this case, has the `employee_id`

of `110`

). The query to use is:

WITH RECURSIVE subordinate AS ( SELECT employee_id, first_name, last_name, manager_id FROM employee WHERE employee_id = 110 -- id of the top hierarchy employee (CEO) UNION ALL SELECT e.employee_id, e.first_name, e.last_name, e.manager_id FROM employee e JOIN subordinate s ON e.manager_id = s.employee_id ) SELECT employee_id, first_name, last_name, manager_id FROM subordinate ;

In this query, we created a recursive CTE called `subordinate`

. It’s the key part of this query because it traverses the data hierarchy going from one row to the rows in the hierarchy immediately below it.

There are two subqueries connected by a `UNION ALL`

; the first subquery returns the top row of the hierarchy and the second query returns the next level, adding those rows to the intermediate result of the query. Then the second subquery is executed again to return the next level, which again will be added to the intermediate result set. This process is repeated until no new rows are added to the intermediate result.

Finally, the main query consumes the data in the `subordinate`

CTE and returns data in the way we expect. If you want to learn more about recursive queries in SQL, I suggest the article How to Find All Employees Under Each Manager in SQL.

### Example #25 – Find the Length of a Series Using Window Functions

Suppose we have a table with user registration data. We store information about how many users registered on each date. We define a data series as the sequence of consecutive days when users registered. A day when no user registers breaks the data series. For each data series, we want to find its length.

The table below shows data series:

id | day | Registered users |
---|---|---|

1 | Jan 25 2023 | 51 |

2 | Jan 26 2023 | 46 |

3 | Jan 27 2023 | 41 |

4 | Jan 30 2023 | 59 |

5 | Jan 31 2023 | 73 |

6 | Feb 1 2023 | 34 |

7 | Feb 2 2023 | 56 |

8 | Feb 4 2023 | 34 |

There are 3 different data series shown in different colors. We are looking for a query to obtain the length of each data series. The first data series starts on Jan 25 and has a length of 3 elements, the second one starts on Jan 30 and its length is 4, and so on.

The query is as follows:

WITH data_series AS ( SELECT RANK() OVER (ORDER BY day) AS row_number, day, day - RANK() OVER (ORDER BY day) AS series_id FROM user_registration ) SELECT MIN(day) AS series_start_day, MAX(day) AS series_end_day, MAX(day) - MIN (day) + 1 AS series_length FROM data_series GROUP BY series_id ORDER BY series_start_date

In the previous query, the CTE has the column `series_id`

, which is a value intended to be used as an ID for the rows in the same data series. In the main query, the `GROUP BY series_id`

clause is used to aggregate rows of the same data series. Then we can obtain the start of the series with `MIN(day)`

and its end with `MAX(day)`

. The length of the series is calculated with the expression:

MAX(day) - MIN (day) + 1

If you want to go deeper with this topic, the article How to Calculate the Length of a Series with SQL provides a detailed explanation of this technique.

## Practice Advanced SQL with LearnSQL.com Courses

SQL is an easy-to-learn and powerful language. In this article, we showed 25 examples of advanced SQL queries. All of them can be explained in about 5 minutes, showing that SQL is an accessible language even when you need to do complex reports or queries.

If you want to continue learning SQL, I suggest our advanced SQL courses: Window Functions course, Recursive Queries, and GROUP BY Extensions in SQL. All of them cover complex areas of the SQL language in simple words and with plenty of examples. Increase your skill and invest in yourself with SQL!