15th Jul 2021 10 minutes read

5 Reasons Why You Should Use CTEs Instead of Subqueries

Table of Contents

Common Table Expressions vs. Subqueries
5 Reasons to Choose CTEs
5 Reasons to Use CTEs Rather Than Subqueries
Let’s Practice CTEs!

Common Table Expressions, or CTEs, were introduced in SQL:1999 to handle cases where the output of one query is used within another query. But didn’t we already have subqueries for this? In this article, I’ll demonstrate with multiple examples why CTEs are better than subqueries for the structure and readability of your SQL queries.

Let’s start by reminding ourselves what CTEs and subqueries are and how they differ.

Common Table Expressions vs. Subqueries

A subquery is a query nested inside another query. It can be placed anywhere in a query, even within another subquery. The syntax is very simple – you just put your subquery inside parentheses and insert it into the main query, wherever it is needed. Note that subqueries are executed before the main query (also called the parent query or the outer query) so that their results can be used by the main query. It’s often quite tricky to read an SQL query with multiple subqueries because even though they are executed first, they are defined somewhere in the main query. You can read more about SQL subqueries in this introductory guide.

A Common Table Expression (CTE), also referred to as a WITH clause, is a temporary named result set that you can reference anywhere in your query. In contrast to subqueries, which are inserted exactly where you need them, all CTEs are defined before the main query and are then referenced in the query using the assigned name. In the examples below, we’ll see how this significantly improves the structure and readability of an SQL query. You first define all temporary result sets you need and then use them in the main query when necessary.

A detailed introduction to common table expressions can be found here. Learn more about the difference between SQL CTEs and subqueries in this overview article.

5 Reasons to Choose CTEs

SQL learners often wonder whether they should use a common table expression or a subquery when writing a complex SQL query. Let’s go through several examples to explain why you should prefer a CTE over a subquery. For these examples, I’ll be using the following table that summarizes the performance of employees from several departments.

performance
id	name	department	years_experience	kpi
11	Marta Stewart	Sales	1	68.8
12	Cathrine Black	Sales	1	91.7
13	Julian Lee	Operations	6	95.5
14	Oliver Henrikson	Sales	5	78.1
15	Julia Grey	Sales	4	73.2
16	Penelope Green	Operations	3	82.7
17	William Smith	Sales	2	97.4
18	Gabriel Garcia	Operations	1	90.1
19	Nikita Moore	Operations	1	75.2
20	Lucy Miller	Finance	3	79.9
21	Sheldon Cooper	Finance	3	98.9
22	Leonard Hofstadter	Finance	2	93.4

For each employee, we have their ID, full name, department, years of experience, and a KPI completion score on a scale from 0 to 100.

Let’s see how CTEs vs. subqueries may assist us in analyzing this data.

5 Reasons to Use CTEs Rather Than Subqueries

#1. CTEs Use Meaningful Names

You can give meaningful names to CTEs that specify your intention and make the query more readable. For example, let’s say we want to compare each employee’s performance with the average KPI completion in their respective departments. To add the corresponding column to our table, we first need to calculate the average KPI for each department. Our entire query with a subquery is as follows:

SELECT p.name, p.department, p.kpi, avg.average_dep_kpi
FROM performance p
JOIN 
  (SELECT department, AVG(kpi) AS average_dep_kpi
   FROM performance
   GROUP BY department) AS avg
ON p.department = avg.department;

name	department	kpi	average_dep_kpi
Marta Stewart	Sales	68.8	81.84
Cathrine Black	Sales	91.7	81.84
Julian Lee	Operations	95.5	85.875
Oliver Henrikson	Sales	78.1	81.84
Julia Grey	Sales	73.2	81.84
Penelope Green	Operations	82.7	85.875
William Smith	Sales	97.4	81.84
Gabriel Garcia	Operations	90.1	85.875
Nikita Moore	Operations	75.2	85.875
Lucy Miller	Finance	79.9	90.733
Sheldon Cooper	Finance	98.9	90.733
Leonard Hofstadter	Finance	93.4	90.733

You can get the same output using a common table expression:

WITH avg_department_kpi AS
  (SELECT department, AVG(kpi) AS average_dep_kpi
   FROM performance
   GROUP BY department)
SELECT p.name, p.department, p.kpi, avg.average_dep_kpi
FROM performance p
JOIN avg_department_kpi avg
ON p.department = avg.department;

CTEs and subqueries give the same output, and the queries are more or less of the same length. But note that the readability of the CTE query is better. What a complex query means might be obvious to you when you’re looking at your own code, but it may not be at all clear to your colleagues. When reading the CTE version:

You see the query in the order it will be executed: first the subquery and then the main query.
You can determine the purpose of the subquery based on its name. In our case, avg_department_kpi refers to the CTE that outputs the average KPI for each department.

At the same time, you may have noticed that the CTE query is a bit longer than the subquery version. That is not always the case, especially when we need the same subquery several times in our main query.

If you are new to WITH syntax, you can practice common table expressions in our interactive course. For now, let’s move to the second reason to prefer CTEs over subqueries.

#2. CTEs Are Reusable Within a Query

In contrast to subqueries, you don’t have to repeat a CTE definition each time you need it in the query. You define it only once, at the beginning of your query, and then reference it when necessary.

Let’s say we want to compare the performance of different departments. Specifically, we want to see the average KPI in every department along with the minimum and maximum average KPI across departments. We could address this problem using an SQL query with three subqueries:

To find the average KPI for each department.
To find the minimum average KPI across departments.
To find the maximum average KPI across departments.

Note that the last two subqueries will both need the result of the first one. Thus, if using subqueries, our solution will include subqueries defined within other subqueries. The whole thing looks quite confusing:

SELECT 
  ad.department, 
  ad.average_kpi, 
  min.min_avg_kpi_department,
  max.max_avg_kpi_department
FROM
    (SELECT department, AVG(kpi) AS average_kpi
    FROM performance
    GROUP BY department) AS ad
CROSS JOIN
    (SELECT MIN (average_kpi) AS min_avg_kpi_department
     FROM
       (SELECT department, AVG(kpi) AS average_kpi
        FROM performance
        GROUP BY department) AS ad) AS min
CROSS JOIN
    (SELECT MAX (average_kpi) AS max_avg_kpi_department
     FROM
       (SELECT department, AVG(kpi) AS average_kpi
        FROM performance
        GROUP BY department) AS ad) AS max;

In contrast, when using common table expressions, we define our three CTEs at the beginning of the query, reference them in the main query when needed, and avoid multiple nested subqueries:

WITH avg_per_department AS (
    SELECT department, AVG(kpi) AS average_kpi
    FROM performance
    GROUP BY department),

    min_kpi_department AS (
    SELECT MIN (average_kpi) AS min_avg_kpi_department
    FROM avg_per_department),

    max_kpi_department AS (
    SELECT MAX (average_kpi) AS max_avg_kpi_department
    FROM avg_per_department)
SELECT 
  ad.department, 
  ad.average_kpi, 
  min.min_avg_kpi_department,
  max.max_avg_kpi_department
FROM avg_per_department ad
CROSS JOIN min_kpi_department min
CROSS JOIN max_kpi_department max;

department	average_kpi	min_avg_kpi_department	max_avg_kpi_department
Finance	90.733	81.84	90.733
Operations	85.875	81.84	90.733
Sales	81.84	81.84	90.733

In this case, you can observe that WITH queries make your code not only more readable but also much shorter.

#3. CTEs Divide Complex Computations into Parts

CTEs bring clarity to the computation process. When subqueries are used for computations, the query often turns into a tangled mess of subqueries. CTEs, on the other hand, show the computation process more clearly.

Let’s say we want to know how many junior and experienced employees are in each department. We’ll consider those employees with two or fewer years of experience as junior and those with over two years of experience as experienced.

We basically need two subqueries here:

To calculate the number of junior employees in each department.
To calculate the number of experienced employees in each department.

Here’s how we can approach this with subqueries:

SELECT 
  e.department, 
  e.experienced_employees, 
  j.junior_employees
FROM (SELECT department, COUNT(*) AS experienced_employees
   FROM performance
   WHERE years_experience > 2
   GROUP BY department) AS e
JOIN (SELECT department, COUNT(*) AS junior_employees
   FROM performance
   WHERE years_experience <= 2
   GROUP BY department) AS j
ON e.department = j.department;

department	experienced_employees	junior_employees
Finance	2	1
Operations	2	2
Sales	2	3

This example is not very complicated, so you’re probably able to follow the above query. However, notice that:

You need to guess the purpose of each subquery.
It’s not immediately clear what tables are joined.
The SELECT clause lists the fields to be displayed in the output, but these fields are defined only later within subqueries.

When calculations get more complicated, it’s really a challenge to follow the computation process through all the subqueries. In contrast, the computation process using CTEs is much more readable:

WITH experienced AS
  (SELECT department, COUNT(*) AS experienced_employees
   FROM performance
   WHERE years_experience > 2
   GROUP BY department),

  junior AS
  (SELECT department, COUNT(*) AS junior_employees
   FROM performance
   WHERE years_experience <= 2
   GROUP BY department)

SELECT 
  e.department, 
  e.experienced_employees, 
  j.junior_employees
FROM experienced e
JOIN junior j
ON e.department = j.department;

Here, we start by defining all the temporary result sets necessary for the computations. Then we just write a simple query to get the needed output. Everything is neat and clear.

Believe me, you’ll really appreciate the structure that common table expressions provide when working with more complex computations. To prove my point, I want to suggest yet another example with nested computations.

#4. Nested CTEs Are Really Cool

Considering the reusability of CTEs and their ability to clarify the computation process, it is not surprising that WITH clauses are perfect for nested computations.

Let’s say we want to calculate how many employees in each department have a KPI score that’s (1) above average or (2) below average in their respective department. Specifically, for each department, we want to show the average KPI score, the number of above-average-KPI employees, and the number of below-average-KPI employees.

We basically need three subqueries here:

To get the average KPI for each department.
To get the number of employees that have an above-average KPI score.
To get the number of employees that have a below-average KPI score.

However, when using subqueries, you’ll need to add the first subquery three times, including two times when it will be nested within the other two subqueries. The resulting query looks quite messy:

SELECT 
  avg.department, 
  avg.average_kpi, 
  aa.employees_above_average, 
  ba.employees_below_average
FROM
    (SELECT department, AVG(kpi) AS average_kpi
    FROM performance
    GROUP BY department) AS avg
JOIN
    (SELECT p.department, count(*) AS employees_above_average
     FROM performance p
     JOIN
       (SELECT department, AVG(kpi) AS average_kpi
        FROM performance
        GROUP BY department) AS avg
     ON p.department = avg.department
     WHERE kpi > average_kpi
     GROUP BY p.department) AS aa
ON avg.department = aa.department
JOIN
    (SELECT p.department, count(*) AS employees_below_average
     FROM performance p
     JOIN
       (SELECT department, AVG(kpi) AS average_kpi
        FROM performance
        GROUP BY department) AS avg
     ON p.department = avg.department
     WHERE kpi < average_kpi
     GROUP BY p.department) AS ba
ON avg.department = ba.department;

department	average_kpi	employees_above_average	employees_below_average
Finance	90.733	2	1
Operations	85.875	2	2
Sales	81.84	2	3

The query worked and we got the output we wanted, but notice how difficult it is to follow the multiple nested subqueries. In the real world, it often gets even more complicated.

In contrast, see how neat the same query looks when using CTEs rather than subqueries:

WITH avg_kpi_department AS
    (SELECT department, AVG(kpi) AS average_kpi
    FROM performance
    GROUP BY department),

    above_average AS
    (SELECT p.department, count(*) AS employees_above_average
     FROM performance p
     JOIN avg_kpi_department avg
     ON p.department = avg.department
     WHERE kpi > average_kpi
     GROUP BY p.department),

     below_average AS
     (SELECT p.department, count(*) AS employees_below_average
     FROM performance p
     JOIN avg_kpi_department avg
     ON p.department = avg.department
     WHERE kpi < average_kpi
     GROUP BY p.department)

SELECT 
  avg.department, 
  avg.average_kpi, 
  aa.employees_above_average, 
  ba.employees_below_average
FROM avg_kpi_department avg
JOIN above_average aa
ON avg.department = aa.department
JOIN below_average ba
ON avg.department = ba.department;

You can clearly follow how the three temporary result sets are defined. You can understand their purpose from the assigned names. You don’t need to repeat the same subquery several times within the same main query. When you get to the main query, all the preparatory work is done and you can simply join the CTEs that have been already defined.

#5. CTEs Allow Recursion

Last but not least, CTEs are great at processing graphs, trees, and other hierarchical structures. This is because the WITH syntax can process recursion. A recursive query is a query that refers to itself.

For example, if we have a typical organizational structure, where each employee has a superior and superiors have multiple subordinates, recursive CTEs can help us analyze this data efficiently. With just one SQL query and individual-level data, we can calculate the total salary for each of the departments and sub-departments or the total number of unused vacation days in each of the organizational units.

The syntax of recursive CTEs is quite complicated. For a detailed explanation, I recommend reading this article that explains how recursive queries process hierarchical structures. You may also want to check out these examples of applying recursive queries to graph structures.

Let’s Practice CTEs!

Now that you realize the benefits that CTEs bring into SQL queries, it’s time to practice! I recommend starting with our Recursive Queries interactive course (114 coding challenges), which covers all kinds of common table expressions, including simple CTEs, nested CTEs, and recursive CTEs. Get more details in this overview article.

Thanks for reading, and happy learning!

Tags: