9th Mar 2021 8 minutes read

Why the SQL WITH Clause Is Awesome

Table of Contents

Why Is the WITH Clause So Useful?
Time to Practice WITH Clauses!

If you’re not using WITH clauses yet, it’s definitely time to start! SQL WITH clauses, or common table expressions, help improve the structure of SQL queries by making them more readable. That’s already a lot, but WITH clauses have many more benefits. Let’s see together!

The WITH clause was introduced in SQL:1999 to define views that are only valid for the query they belong to. Also known as common table expressions (CTEs), WITH clauses allow us to improve the structure of an SQL statement without polluting the database namespace.

Using WITH clauses, we can create named subqueries that are referenced in the main query. The basic syntax is:

WITH subquery1_name AS
(SELECT … subquery1...),
subquery2_name AS
(SELECT … subquery2...)
SELECT … main query ...

As you can see, we can have multiple CTEs in one query. Moreover, one common table expression can reference another CTE (i.e. nested CTEs) or even itself (recursive CTEs). This gives us a whole bunch of interesting options, as we’ll see later in this article.

If you are completely new to CTEs and subqueries, I recommend this beginner's guide to CTEs. You can also get hands-on practice writing WITH clauses in LearnSQL.com’s interactive Recursive Query course.

Now, let’s go through several examples to see how we can benefit from common table expressions in real-world scenarios.

Why Is the WITH Clause So Useful?

WITH clauses are great at improving the structure of complex SQL queries that contain multiple computations. Let’s see how they work in practice!

In our examples, we are going to analyze the San Francisco Bay area real estate market – arguably the most expensive place to live! So, here we have a table with houses available for sale.

houses
id	address	area	city	price
1	28 Sofia St	1,565	San Francisco	998,000
2	25 Jack St	1,680	San Francisco	1,850,000
3	13 Rose St	1,148	Redwood City	265,000
4	35 Mary Ave	3,800	San Francisco	5,500,000
5	54 Sara St	1,098	Redwood City	1,375,000
6	12 John St	1,910	San Francisco	1,195,000
7	32 Nice Ave	1,340	Mountain View	349,000
8	12 Flower Ave	1,234	Redwood City	1,048,000
9	34 Park St	1,505	Mountain View	2,400,000
10	78 Paul Ave	1,870	Mountain View	1,658,000

Now it’s time to demonstrate the benefits of the WITH clause.

Improving Readability

When using WITH syntax, you name parts of the query as you write them. Unlike subqueries, the name comes before the query itself. Thus, when reading the query, you already know from the name what to expect. Furthermore, a well-written WITH clause is self-documenting.

Let’s go through an example to see how WITH syntax makes the query more readable. To facilitate the evaluation of houses available for sale, we want to compare the price of each house with the minimum, average, and maximum house prices in its city.

We can get the necessary output by using three CTEs that calculate the (1) minimum, (2) average, and (3) maximum prices for all cities in the table. Then, we can simply join our original table with the outputs of these three queries to display the required data:

 WITH min_price_city AS
  (SELECT city, MIN(price) AS minimum
   FROM houses
   GROUP BY city),
avg_price_city AS
  (SELECT city, AVG(price) AS average
   FROM houses
   GROUP BY city),
max_price_city AS
  (SELECT city, MAX(price) AS maximum
   FROM houses
   GROUP BY city)
SELECT 
  h.id, 
  h.address, 
  h.city, 
  h.price, 
  min.minimum, 
  avg.average, 
  max.maximum
FROM houses h
JOIN min_price_city min
ON h.city = min.city
JOIN avg_price_city avg
ON h.city = avg.city
JOIN max_price_city max
ON h.city = max.city;

id	address	city	price	minimum	average	maximum
10	78 Paul Ave	Mountain View	1,658,000	349,000	1,469,000	2,400,000
9	34 Park St	Mountain View	2,400,000	349,000	1,469,000	2,400,000
7	32 Nice Ave	Mountain View	349,000	349,000	1,469,000	2,400,000
6	12 John St	San Francisco	1,195,000	998,000	2,385,750	5,500,000
4	35 Mary Ave	San Francisco	5,500,000	998,000	2,385,750	5,500,000
2	25 Jack St	San Francisco	1,850,000	998,000	2,385,750	5,500,000
1	28 Sofia St	San Francisco	998,000	998,000	2,385,750	5,500,000
8	12 Flower Ave	Redwood City	1,048,000	265,000	896,000	1,375,000
5	54 Sara St	Redwood City	1,375,000	265,000	896,000	1,375,000
3	13 Rose St	Redwood City	265,000	265,000	896,000	1,375,000

In the query above, you can easily distinguish the three common table expressions. From their names, we can also quickly understand that the first clause calculates minimum prices in each city, the second one calculates the average prices, and the third one gives us the maximum prices. We also see how the result set of each CTE is used in the main query – they are referenced by name. WITH syntax makes queries easier to write and to read.

Breaking Computations into Parts

Another great benefit of WITH syntax is that it helps break computations into parts. In each WITH clause, you prepare a temporary table with the results of some computations; in the main query, you simply merge the results of all the preliminary computations.

For example, let’s say we want to calculate how many houses in each city are expensive (i.e. priced above $1.5M) and how many are relatively cheap (i.e. below $500K). For this purpose, we’ll have two WITH clauses that separately calculate the number of (1) expensive and (2) cheap houses in each city. For convenience, we’ll also have a third WITH clause to get the list of cities. In the main query, we’ll join the results of these three subqueries to display the requested information:

WITH cities AS
   (SELECT city
    FROM houses
    GROUP BY city),
  expensive AS
  (SELECT city, COUNT(*) AS expensive_houses
   FROM houses
   WHERE price > 1500000
   GROUP BY city),
  cheap AS
  (SELECT city, COUNT(*) AS cheap_houses
   FROM houses
   WHERE price < 500000
   GROUP BY city)
SELECT 
  c.city, 
  e.expensive_houses, 
  ch.cheap_houses
FROM cities c
FULL JOIN expensive e
ON c.city = e.city
FULL JOIN cheap ch
ON c.city = ch.city;

Here is the result of the query. In this case, the NULL values reflect that there are no houses above $1.5M in Redwood City and no houses below $500K in San Francisco:

city	expensive_houses	cheap_houses
Mountain View	2	1
San Francisco	2	NULL
Redwood City	NULL	1

As you see, WITH clauses make creating complex reports much easier. For more examples of applying WITH to complex analytical tasks, check out this article explaining CTEs by example.

Nesting Computations

Considering that the WITH syntax allows nesting, you can have several ‘layers’ of computations. For example, you can use one WITH clause to compute one aggregate function (like AVG()), then use another CTE to aggregate the result set of the first WITH clause (i.e. calculate the minimum average).

Let’s demonstrate this with our house example. Our task will be to compute the average house price for each city and then compare this price with the minimum and maximum average price across the cities of the San Francisco Bay area.

WITH avg_per_city AS (
    SELECT city, AVG(price) AS average_price
    FROM houses
    GROUP BY city),
min_price_city AS (
    SELECT MIN (average_price) AS min_avg_price_city
    FROM avg_per_city),
max_price_city AS (
    SELECT MAX (average_price) AS max_avg_price_city
    FROM avg_per_city)    
SELECT 
  ac.city, 
  ac.average_price, 
  min.min_avg_price_city,
  max.max_avg_price_city
FROM avg_per_city ac
CROSS JOIN min_price_city min
CROSS JOIN max_price_city max;

As you see, the first WITH clause computes the average house price for each city. The other two WITH clauses reference the first one to compute the minimum and maximum average prices across different cities. In the main query, we join the results of these CTEs to get the required output:

city	average_price	min_avg_price_city	max_avg_price_city
Mountain View	1469000	896000	2385750
San Francisco	2385750	896000	2385750
Redwood City	896000	896000	2385750

WITH clauses help build complex computations in one query. Excited? Wait until you hear about the amazing opportunities that recursive CTEs bring into play!

Processing Hierarchical Structures

The WITH syntax allows recursion. A recursive query is a query that refers to itself. Recursive CTEs are awesome for processing graphs, trees, and other hierarchical structures.

Let’s say we have a company with a hierarchy where each employee has exactly one superior and superiors may have multiple subordinates:

We can store these hierarchical relations in a table. In the employees table below, each employee is represented by a record that includes the employee’s ID, first name, last name, and the ID of their immediate superior.

employees
id	first_name	last_name	superior_id
1	Mariya	Stevens	NULL
2	Bob	White	1
3	Sarah	Grey	1
4	Kate	Williams	2
5	Tanya	Smith	2
6	Marat	Simpson	3
7	Jack	Sailor	3
8	Sophia	Barrel	3
9	Tom	Star	3

Our task is to show the level of each employee in the company’s organizational structure, i.e. level 1 for the big boss, level 2 for her direct subordinates, and so on.

Recursive queries let us accomplish this task in one query:

WITH RECURSIVE levels AS (
  SELECT
    id,
    first_name,
    last_name,
    superior_id,
    1 AS level
  FROM employees
  WHERE superior_id IS NULL
  UNION ALL
  SELECT
    employees.id,
    employees.first_name,
    employees.last_name,
    employees.superior_id,
    levels.level + 1
  FROM employees, levels
  WHERE employees.superior_id = levels.id
)
SELECT *
FROM levels;

id	first_name	last_name	superior_id	level
1	Mariya	Stevens	NULL	1
2	Bob	White	1	2
3	Sarah	Grey	1	2
4	Kate	Williams	2	3
5	Tanya	Smith	2	3
6	Marat	Simpson	3	3
7	Jack	Sailor	3	3
8	Sophia	Barrel	3	3
9	Tom	Star	3	3

In this example, we have created a recursive query called levels that refers to itself in the query body. The syntax of recursive CTEs is quite complicated. For a detailed explanation, I recommend reading this article that explains how recursive queries assist in processing hierarchical structure.

Time to Practice WITH Clauses!

Now you know that WITH clauses are very handy for creating complex reports and processing hierarchical structures. However, just reading about them is not enough. This is a complicated topic, especially when it comes to recursion. Practice is key to success here!

I recommend starting with LearnSQL.com’s interactive course on Recursive Queries. It includes 114 coding challenges that cover all types of common table expressions, including simple CTEs, nested CTEs, and recursive CTEs. The WITH syntax is usually not taught in a typical SQL course. So, this course is quite unique.

If you want to upgrade your skills in building complex, multilevel reports in SQL, consider also taking the SQL Reporting track. It includes three interactive courses that cover simple SQL reports, revenue trend analysis, and customer behavior analysis in SQL.

Thanks for reading, and happy learning!

Tags: