Back to articles list Articles Cookbook
18 minutes read

11 SQL Common Table Expression Exercises

In this article, we offer you 11 practice exercises that put your knowledge of common table expressions (CTEs) into practice. Each CTE exercise is accompanied by a solution and a detailed explanation.

As the saying goes, “A little progress each day adds up to big results”. And that is undoubtedly true for SQL proficiency. Just as you should regularly go to the gym to keep your muscles toned, you should do common table expressions exercises often to keep your querying skills toned for heavy data analysis work.

CTEs, or common table expressions, are powerful statements in SQL. They allow you to temporarily define a subquery within a SQL query and assign a name to it. This means the CTE can then be referred to by its name within the query – just as if it were a table.

The Value of CTEs

In addition to names, there are other important differences between CTEs and subqueries. CTEs are handy for structuring complex queries, making them easier to read, understand, and debug. CTEs can also be used repeatedly within the same query, making it more concise. These qualities of CTEs make them ideal for data analysis work, as it is common for data analysis queries to mix simple data SELECTs with grouped SELECTs with totals, averages, and other aggregate functions. Without CTEs, complex queries could become virtually impossible to read or require the creation of temporary tables or views that unnecessarily populate the database with lots of objects.

CTEs are also very useful when performing highly complex data analysis without resorting to other programming languages. Using recursion to find critical paths or to traverse abstract data types (such as trees and graphs) are examples that demonstrate CTEs’ usefulness. So is the possibility of creating nested queries, which gradually reduce the complexity of a problem until it becomes a simple SELECT.

CTEs’ ability to make queries more readable and concise is something you will appreciate if you need to review a long query years after writing it. Fortunately, most modern relational database management systems (RDBMSs) – including PostgreSQL, MySQL, SQL Server, and Oracle – allow the use of CTEs.

All the exercises compiled in this article were taken from our Recursive Queries course. It’s an in-depth tutorial on common table expressions in data analysis. The 114 interactive exercises cover simple CTEs, nested CTEs, and recursive CTEs in a total estimated time of 18 hours. You can also learn more about CTEs by reading CTEs Explained with Examples.

Another important topic that every data analyst should master is SQL window functions. You can check out this set of SQL window function practice exercises to put your skills to the test.

Now, let’s get started with our common table expression exercises. We’ll start with simple CTEs and then move on to nested and recursive CTEs.

Simple CTE Exercises

For these CTE exercises, we will use a database designed to manage crowdfunding projects. This schema is composed of three tables:

  1. supporter contains information about supporters, who are those who donate money for the projects.
  2. project contains information about the projects that receive donations from supporters.
  3. donation records the donations from supporters to the projects.

The supporter table stores the id, first_name, and last_name of each supporter in the system. Let’s see some of its rows:

idfirst_namelast_name
1MarleneWagner
2LonnieGoodwin
3SophiePeters
4EdwinPaul
5HughThornton

The project table stores id, category, author_id, and the minimal_amount needed to start each project. These are some of its rows:

idcategoryauthor_idminimal_amount
1music11677
2music521573
3traveling24952
4traveling53135
5traveling28555

The data in the author_id column links each project in the project table with a row of the supporter table. Each supporter related to a project by the column author_id is the author of that project.

Finally, the donation table contains id, supporter_id, the amount of the donation, and the column donated, showing the date each donation was made.

idproject_idsupporter_idamountdonated
144928.402016-09-07
2818384.382016-12-16
3612367.212016-01-21
4219108.622016-12-29
51020842.582016-11-30

Exercise 1: Using a CTE to Get Totalized Data

Exercise: Obtain the project ID, minimal amount, and total donations for projects that have received donations over the minimum amount.

Solution:

WITH project_revenue AS (
  SELECT
    project_id,
    SUM(amount) AS sum_amount
  FROM donation
  GROUP BY project_id
)
SELECT project.id, minimal_amount, sum_amount
FROM project_revenue
INNER JOIN project ON
project.id = project_revenue.project_id
WHERE sum_amount >= minimal_amount;

Explanation: To solve this exercise, we use a CTE called project_revenue that totals the donations of each project. This CTE has two columns: id and sum_amount, the latter being the calculated sum of donations for each project_id. After the CTE definition, we use a SELECT statement that joins the project table with the CTE. For each project that has received donations, the CTE returns its id, minimal_amount, and the total donations (sum_amount) it received.

The CTE project_revenue only includes rows for projects that have received donations because it gets data from the donation table. The SELECT below the CTE definition also shows only projects that received donations because of the INNER JOIN between the CTE and the project table. And the WHERE condition ensures that we will get only projects for which the amount donated exceeds the minimum amount.

If you need practice in grouping data in SQL, check out this set of 10 GROUP BY exercises. Try these advanced SQL practice exercises to speed your way to SQL proficiency.

Exercise 2: Using Multiple CTEs in the Same Statement

Exercise: Select supporters who donated more than $200 total or who donated at least twice.

Solution:

WITH rich AS (
  SELECT
	s.id,
	first_name,
	last_name
  FROM supporter s
  JOIN donation d
	ON d.supporter_id = s.id
  GROUP BY s.id, first_name, last_name
  HAVING SUM(amount) > 200
),
frequent AS (
  SELECT
	s.id,
	first_name,
	last_name
  FROM supporter s
  JOIN donation d
	ON d.supporter_id = s.id
  GROUP BY s.id, first_name, last_name
  HAVING COUNT(d.id) > 1
)
SELECT
  id,
  first_name,
  last_name
FROM rich
UNION ALL
SELECT
  id,
  first_name,
  last_name
FROM frequent;

Explanation: This exercise asks us to combine two different results that we must obtain by retrieving information from the donation and supporter tables: supporters whose total donations exceed $200 and supporters who made more than one donation. This situation is ideal to solve by writing two CTEs, one to obtain the first set of data (rich) and the other to obtain the second set (frequent).

SQL syntax supports writing multiple CTEs within the same command, which we took advantage of to solve this exercise. By placing each subquery in a different CTE, the final SELECT is simply the union of two simple SELECTs – each of which fetches data directly from one CTE.

Nested CTEs

Although no RDBMS allows the creation of a CTE within another CTE, what they do allow is nested CTEs; this is when a CTE refers to a previously-defined CTE as if it were a table. In this way, CTEs create different levels of abstraction. This makes the final query a simple and concise SELECT.

For our nested CTE exercises, we will use a table schema of a door-to-door sales company. This schema has three tables: salesman, daily_sales, and city. The salesman table includes the id, first_name, last_name, and city_id for each salesman. These are some of its rows:

idfirst_namelast_namecity_id
1FrederickWebster1
2CaseySantiago2
3CindyFields3
4TimothyPratt4
5SusanRose5

The daily_sales table represents the totalized sales per day and salesman. It has the columns day, salesman_id, items_sold, amount_earned, distance, and customers. The latter two columns show the distance traveled and the number of customers served by each salesman each day. These are a few of its rows:

daysalesman_iditems_soldamount_earneddistancecustomers
2017-01-15101673.203020
2017-01-152162288.4913613
2017-01-153171232.7812914
2017-01-15421496.882512
2017-01-155221384.1334018

Finally, we have the city table that stores the id, name, country, and region of each city:

idnamecountryregion
1ChicagoUSAAmericas
2New YorkUSAAmericas
3Mexico CityMexicoAmericas
4Rio de JaneiroBrasilAmericas
5ParisFranceEurope

Exercise 3: Using Nested CTEs to Move Incrementally Toward a Result

Exercise: Get the date, city ID, city name, and total amount of all daily sales – grouped by date and city – that exceed the average daily sales for all cities and all days.

Solution:

WITH earnings_per_day_city AS (
  SELECT
	ds.day,
	c.id,
	c.name,
	SUM(amount_earned) AS total_earnings
  FROM salesman s
  JOIN daily_sales ds
	ON s.id = ds.salesman_id
  JOIN city c
	ON s.city_id = c.id
  GROUP BY ds.day, c.id, c.name
),
overall_day_city_avg AS (
  SELECT
	AVG(total_earnings) AS avg_earnings
  FROM earnings_per_day_city
)
SELECT
  day,
  id,
  name,
  total_earnings
FROM earnings_per_day_city, overall_day_city_avg
WHERE total_earnings > avg_earnings;

Explanation: Nested CTEs let us break a problem into parts and gradually approach the solution. In this exercise, we first need to total the sales per day and per city. We do that with the first CTE, earnings_per_day_city.

Then we need to get an average of all totalized sales per day and per city. We achieve this with the overall_day_city_avg CTE, which in turn uses the results previously totaled by the earnings_per_day_city CTE. This second CTE will return a single row with the average sales for all days and all cities.

In the final SELECT, we simply take the data from the two CTEs (no need to combine them with a JOIN, since earnings_per_day_city returns a single row) and add the WHERE condition that the total sales of the day and the city must be greater than the overall average.

Exercise 4: Combining Nested Queries and Subqueries

Exercise: Obtain the date on which the average number of customers served per region was the lowest of all, displaying this average along with the date.

 

Solution:

WITH sum_region AS (
  SELECT
	day,
	region,
	SUM(customers) AS sum_customers
  FROM salesman s
  JOIN daily_sales ds
	ON s.id = ds.salesman_id
  JOIN city c
	ON s.city_id = c.id
  GROUP BY day, region
),
avg_region AS (
  SELECT
	day,
	AVG(sum_customers) AS avg_region_customers
  FROM sum_region
  GROUP BY day
)
SELECT
  day,
  avg_region_customers
FROM avg_region
WHERE avg_region_customers = (SELECT
  MIN(avg_region_customers)
  FROM avg_region);

Explanation: To solve this query we use the same successive approach to the solution as in the previous exercise, creating first a CTE to obtain the total number of customers served per day and per region and then another CTE based on the previous one to obtain the daily averages of customers served per day. Then, in the final SELECT, we use a subquery to get the minimum of the average number of customers per day and use it in the WHERE clause as the comparison value, so that the query returns the day that corresponds to that minimum.

If you wanted to further break up the final SELECT, you could add a third CTE instead of a subquery. This way, the final SELECT gets even simpler. Here’s the new (third) CTE and the outer SELECT:

min_avg_region as (
  SELECT
    MIN(avg_region_customers) as min_avg_region_customers
  FROM avg_region
  )
SELECT
  day,
  avg_region_customers
FROM avg_region, min_avg_region
WHERE avg_region_customers = min_avg_region_customers;

Exercise 5: Using Nested CTEs to Compute Complex Statistics

Exercise: For each city, calculate the average total distance traveled by each salesperson. Also calculate an overall average of all city averages.

Solution:

WITH distance_salesman_city AS (
  SELECT
	city_id,
	salesman_id,
	SUM(distance) AS sum_distance
  FROM daily_sales d
  JOIN salesman s
	ON d.salesman_id = s.id
  GROUP BY city_id, salesman_id
),
city_average AS (
  SELECT
	city_id,
	AVG(sum_distance) AS city_avg
  FROM distance_salesman_city
  GROUP BY city_id
)
SELECT AVG(city_avg)
FROM city_average;

Explanation: The benefits of nested CTEs are noticeable when you need to perform statistical calculations composed of several successive steps.

In this case, the final result is a total average of the averages per city of the sums of distances for each city and salesperson. That is an average of averages of sums. CTEs allow us to make a stepwise approximation to the result, analogous to how a data scientist would do it with statistical formulae.

Exercise 6: Using Nested CTEs to Compare Sets of Elements

Exercise: Compare the average sales of all salespersons in the USA with the average sales of all salespersons in the rest of the world.

Solution:

WITH cities_categorized AS (
  SELECT
    id AS city_id,
    CASE WHEN country = 'USA' THEN country ELSE 'Rest of the World' END AS category
  FROM city
),
sales_category AS (
  SELECT
    category,
    salesman_id,
    SUM(items_sold) total_sales
  FROM daily_sales ds
  JOIN salesman s
    ON s.id = ds.salesman_id
  JOIN cities_categorized
    ON cities_categorized.city_id = s.city_id
  GROUP BY category, salesman_id
)

SELECT
  category,
  AVG(total_sales)
FROM sales_category
GROUP BY category;

Explanation: In the first CTE (cities_categorized), we separated cities into two groups: cities in the USA and cities in the rest of the world.

In the second CTE, we combine information from the cities_categorized CTE with daily_sales and salesman to obtain sales totals grouped by the two categories of cities and by salesman.

In the final SELECT, we simply group by city category and get the average sales for each of the two categories (US cities and Rest of the World cities).

Recursive CTEs

In SQL programming, recursive CTEs are common table expressions that reference themselves. Like recursive functions used in other programming languages, recursive CTEs are based on the principle of taking the data resulting from a previous run, adding to it or modifying it, and passing the results to the next run. We keep doing the same until a stop condition is met, which is when the final result is obtained.

Recursive CTEs must have the word RECURSIVE after the word WITH. The best way to understand the operation of recursive CTEs is by using a simple example, as in the following exercise.

Exercise 7: Using Recursive CTEs to Generate Sequences

Exercise: Use recursion to list all integers from 1 to 10.

Solution:

WITH RECURSIVE ten_numbers(prev_number) AS (
  SELECT 1
  UNION ALL
  SELECT ten_numbers.prev_number + 1
  FROM ten_numbers
  WHERE prev_number < 10
)

SELECT *
FROM ten_numbers;

Explanation: This query adopts the PostgreSQL recursive CTEs notation, which has four parts:

  1. Anchor member: This is where we define the recursion’s starting point. This part of the query must be able to be solved autonomously, without the need to use results from previous iterations of the same CTE.
  2. Recursive member: This part is repeated as many times as necessary, using the results of the previous iteration as a basis .
  3. Termination condition: This condition is evaluated after each repetition of the recursive member; when it’s met, the recursive loop will end. If this condition were not present or always yielded a true result, the recursion would continue indefinitely.
  4. Invocation: The main difference between this SELECT query and other CTE main queries is that this SELECT acts as a trigger for the cycle of recursive executions.

In this exercise, the anchor member simply returns a row with the number 1. The recursive member takes the row(s) from the previous execution and appends (via the UNION clause) a new row with the previous value incremented by 1. The termination condition states that the query will continue iterating until the value obtained equals 10.

Exercise 8: Using a Recursive CTE to Traverse a Hierarchical Data Structure (Tree)

For this exercise, we will use the employee table, which has the columns id, first_name, last_name, and superior_id. Its rows contain the following data:

idfirst_namelast_namesuperior_id
1MadelineRaynull
2VioletGreen1
3AltonVasquez1
4GeoffreyDelgado1
5AllenGarcia2
6MarianDaniels2


Exercise: Show all data for each employee, plus a text showing the path in the organization’s hierarchy that separates each employee from the top boss (identified by the literal value 'Boss)'.

Solution:

WITH RECURSIVE hierarchy AS (
  SELECT
    id,
    first_name,
    last_name,
    superior_id,
    'Boss' AS path
  FROM employee
  WHERE superior_id IS NULL
  UNION ALL
  SELECT
    employee.id,
    employee.first_name,
    employee.last_name,
    employee.superior_id,
    hierarchy.path || '->' || employee.last_name
  FROM employee JOIN hierarchy
  ON employee.superior_id = hierarchy.id
)

SELECT *
FROM hierarchy;

Explanation: The data in the employee table represents a hierarchical or tree structure, where each row has a column that relates it to its superior (another row in the same table). The row that corresponds to the company boss (the root node of the tree) is the one that has a null value in the superior_id column. Therefore, that is our anchor member to build this recursive CTE. The path of this anchor member simply carries the literal value 'Boss'.

Then, the recursive query member joins the previous iteration of hierarchy with employee, setting the condition that the bosses of the employees of the current iteration (superior_id) are already in the hierarchy. This means that, for each iteration, we add another layer to the hierarchy. This layer is formed by the subordinates of the employees that were added in the previous iteration. Hence the join condition is employee.superior_id = hierarchy.id.

The path of each employee is assembled by concatenating the path of their boss (hierarchy.path, which shows all the way to 'Boss') with the last name of the employee of the current iteration, joined by a string representing an arrow (hierarchy.path || '->' || employee.last_name).

Exercise 9: Show the Path of a Tree Starting from the Root Node

Exercise: Display a list containing the first and last name of each employee (including the top boss), together with a text (the path field) showing the path of the tree between each employee and the top boss. In the case of the top boss, the path column should show the last_name of the boss.

Solution:

WITH RECURSIVE hierarchy AS (
  SELECT
	first_name,
	last_name,
	CAST(last_name AS text) AS path
  FROM employee
  WHERE superior_id IS NULL
  UNION ALL
  SELECT
	employee.first_name,
	employee.last_name,
	hierarchy.path || '->' || employee.last_name AS path
  FROM employee, hierarchy
  WHERE employee.superior_id = hierarchy.id
)

SELECT *
FROM hierarchy;

Explanation: The solution of this exercise is very similar to the previous exercise, with the only exception that the path value for the root node is not a literal TEXT-type value; it’s a last_name value in the employee table. This forces us to perform a data conversion to avoid getting an error when we run this query.

Since the CTE makes a UNION between the data returned by the anchor component and the data returned by the recursive component, it is imperative that both result sets have the same number of columns and that the data types of the columns match.

The last_name column of the employee table (referred as path in the CTE anchor member) is a VARCHAR type, while the concatenation hierarchy.path || '->' || employee.last_name (referred as path in the recursive member) automatically yields a TEXT column. For UNION not to cause a type mismatch error, it is necessary to CAST(last_name AS text) in the anchor member. In this way, the path columns of both parts of the CTE will be TEXT.

Exercise 10: Using Multiple Recursive Columns

Exercise: List all the data for each employee plus the path in the hierarchy until the top boss is reached, Include a column called distance that shows the number of people in the hierarchy from the top boss to the employee. For the top boss, the distance is 0; for their subordinates, it is 1; for the subordinates of their subordinates, it is 2, and so on.

Solution:

WITH RECURSIVE hierarchy AS (
  SELECT
	id,
	first_name,
	last_name,
	superior_id,
	'Boss' AS path,
	0 AS distance
  FROM employee
  WHERE superior_id IS NULL
  UNION ALL
  SELECT
	employee.id,
	employee.first_name,
	employee.last_name,
	employee.superior_id,
	hierarchy.path || '->' || employee.last_name,
	hierarchy.distance + 1
 FROM employee, hierarchy
 WHERE employee.superior_id = hierarchy.id
)
SELECT *
FROM hierarchy;

Explanation: As there are two recursive columns in the CTE, it is necessary to indicate an initial value for each of them in the anchor member. In this case, the path column has the initial value 'Boss' (as in exercise 8) and the distance column has the value 0. Then, in the recursive member, the distance value results from adding 1 to the distance of the previous iteration.

Exercise 11: Using Recursive CTEs to Traverse Network-type Data Structures

For this exercise, we use two tables: a destination table (consisting of the columns id and name) and a ticket table (consisting of the city_from, city_to, and cost columns). The destination table contains the IDs and names of a group of cities, while the ticket table indicates the cost of tickets between the pairs of cities in the destination table (where such connections exist).

This is sample data from both tables (first destination, then ticket):

idname
1Warsaw
2Berlin
3Bucharest
4Prague
city_fromcity_tocost
12350
1380
14220
23410
24230
32160
34110
42140
4375

Exercise: Find the cheapest route to travel between all cities in the destination table, starting from Warsaw. The query must show the following columns:

  • path – The names of the cities on the path, separated by '->'.
  • last_id – The id of the final city in this trip.
  • total_cost – The sum of the costs of the tickets.
  • count_places – The number of cities visited. This must be equal to the total number of cities in destination, i.e. 4.

Solution:

WITH RECURSIVE travel(path, last_id,
	total_cost, count_places) AS (
  SELECT
	CAST(name as text),
	Id,
	0,
	1
  FROM destination
  WHERE name = 'Warsaw'
  UNION ALL
  SELECT
	travel.path || '->' || c2.name,
	c2.id,
	travel.total_cost + t.cost,
	travel.count_places + 1
  FROM travel
  JOIN ticket t
	ON travel.last_id = t.city_from
  JOIN destination c1
	ON c1.id = t.city_from
  JOIN destination c2
	ON c2.id = t.city_to
  WHERE position(c2.name IN travel.path) = 0
)
SELECT *
FROM travel
WHERE count_places = 4
ORDER BY total_cost ASC;

Explanation: The method for solving this exercise is similar to the previous exercise. In this case, however, there is no direct order to the relationship between the elements in the same table. Instead, the relationships between the elements of the destination table are expressed in the ticket table, linking each pair of connected cities.

The first row of the results of the above query shows the lowest-cost route. This is possible because the outer SELECT of the query sorts the results in ascending order by total_cost. In turn, the outer SELECT makes sure that the number of cities traveled is 4 by setting the condition that count_places equals 4.

The recursive member ensures that each iteration adds a new city to the route by setting the condition that the name of the city is not already in the path (position(c2.name IN travel.path) = 0).

Since the starting point of the trip is the city of Warsaw, the anchor member of the recursive CTE is the row of the destination table where the name is equal to 'Warsaw'. Note that we convert the name column to the TEXT data type (as in Exercise 9) so that the data type matches the corresponding column in the recursive CTE member.

Need More Common Table Expression Exercises?

If you've been working through the SQL exercises in this article, you now know how to use common table expressions . And you know how useful CTEs can be. These exercises come from our Recursive Queries course, and there are even more exercises like these in the full course.

To get really good at data analysis with SQL, think about joining LearnSQL.com. We recommend checking out the All Forever SQL Package. It gives you lifetime access to all the SQL courses we offer now, plus any new ones we add later. This way, you can keep learning new skills forever. Join LearnSQL.com today and start building a great future in data analysis!