22nd May 2020 11 minutes read

Subquery vs. CTE: A SQL Primer

Table of Contents

What Is a Subquery?
What Is CTE?
No Difference at All...?
Subquery vs CTE: What’s the Difference?
But Subqueries Are Sometimes Irreplaceable
One More Difference: CTEs Must Be Named
Subqueries vs CTEs – Which Is Better?

Have you ever wondered about the differences between a subquery and a common table expression (CTE) in SQL? The concepts seem very similar, but knowing the difference – and when to use each one – will help you write efficient and readable queries.

First, we’ll explain the concepts underlying subqueries and CTEs. Then we’ll look at a few examples and finally analyze the pros and cons of each technique.

What Is a Subquery?

A subquery is a query within a query. We can use it in multiple ways: in the FROM clause, for filtering, or even as a column. To use a subquery, simply add parentheses and put the query inside them.

In our first examples, we’ll work with data on the results of a long jump competition. We’re given two tables:

participant – Stores information about contest participants:

id – A unique ID for each participant.
first_name – The participant’s first name.
last_name – The participant’s last name.

jump – Stores information about the jumps made by the participants:

id – The ID of each jump.
participant_id – The ID of the participant who made the jump.
contest_id – The ID of the contest in which the jump was made.
length – Jump length, in centimeters.

participant

id	first_name	last_name
1	Amisha	Brown
2	Jamaal	Sanford
3	Hiba	Cameron

jump

id	participant_id	contest_id	length
1	1	1	667
2	1	2	745
3	1	3	723
4	2	1	736
5	2	2	669
6	2	3	508
7	3	1	664
8	3	2	502
9	3	3	739

Since you know the data we’re using, take a look at the following subquery examples:

SELECT
  first_name,
  last_name,
  length
FROM participant
JOIN jump
  ON jump.participant_id = participant.id
WHERE length > (
  SELECT
    AVG(length)
  FROM jump
);

This query shows participants with their jumps that were longer than the average jump length. In the WHERE condition, we’ve used a subquery to get the average jump length. Since there’s only one value returned by this subquery, we can easily compare column data to it.

Next, another example:

SELECT
  MAX(avg_length) AS max_length
FROM (
  SELECT
    first_name,
    last_name,
    AVG(length) AS avg_length
  FROM participant
  JOIN jump
    ON jump.participant_id = participant.id
  GROUP BY first_name,
    last_name
) AS average_lengths;

In this query, we’re showing the largest average jump length by any participant. To get the result, we first compute the average jump length for each participant. We’ve done this by using a subquery inside the FROM clause. Then, we simply use MAX() to return the largest average length.

These are only two examples of subqueries. It’s a broad topic – even though the use cases are rather simple – and there are too many concepts to be described in this article. A deep overview of subqueries can be found in our course SQL Subqueries. It is an interactive course with over 80 hands-on exercises designed to help you practice subqueries.

You can also watch episodes of our We Learn SQL series on Youtube. Several of them have been dedicated to SQL Subqueries. Remember to subscribe to our channel.

What Is CTE?

A common table expression (called CTE for short) is a query which we create before writing the main query. Then, we can simply use it as a regular table inside our code.

Look at the following example. Once again, we’re using the data from the long jump contest:

WITH average_lengths AS (
  SELECT
    first_name,
    last_name,
    AVG(length) AS avg_length
  FROM participant
  JOIN jump
    ON jump.participant_id = participant.id
  GROUP BY first_name,
    last_name
)

SELECT
  MAX(avg_length) AS max_length
FROM average_lengths;

This returns exactly the same result as the previous example with the subquery: the largest average jump length by any participant. However, instead of writing our query inside the FROM clause, we’ve put it in the WITH clause (which comes before the main query). We’ve called our CTE average_length and used it in the FROM clause in the main query.

Of course, CTEs can be far more complicated than this example. But we won’t discuss that here. If you’d like to learn about CTEs, check out the Recursive Queries course at LearnSQL.com.

No Difference at All...?

At first, you might think that there’s almost no difference between subqueries and CTEs. We’ve used both a subquery and a CTE in the FROM clause and the syntax was only a little different. However, don’t forget the first example – we used a subquery in the WHERE clause there. You couldn’t use a CTE there, and that’s not the only difference!

Subquery vs CTE: What’s the Difference?

Of course, this doesn’t mean that CTEs are inferior to subqueries. Let’s examine the differences between the two, starting with CTEs.

Difference #1: CTEs can be recursive

Let’s take a look at the first advantage of CTEs. CTEs allow you to use a powerful concept: recursion. Thanks to recursion, SQL is now Turing complete – every program which can be written in any programming language can also be written in SQL. (If you have doubts that SQL is a programming language, check out Is SQL a programming language? on the LearnSQL.com blog.)

How does the recursion work in SQL? It allows your CTE to call itself until a specified ending condition is met. In each step, the query expands itself and changes the data it has. Let’s see an example.

We’ll work with the following data for some fictional company employees. It’s stored in the employee table, which contains the following columns:

id – A unique ID for each employee.
first_name – The employee’s first name.
last_name – The employee’s last name.
manager_id – The ID of that employee’s manager.

employee

id	first_name	last_name	manager_id
1	Maisy	Bloom	NULL
2	Caine	Farrow	1
3	Waqar	Jarvis	2
4	Lacey-Mai	Rahman	2
5	Merryn	French	3

Now, we’d like to show the chain of management from the CEO (a person without a value in the manager_id column) to each employee. The query below will solve this problem. Take a look:

WITH RECURSIVE employee_chain AS (
  SELECT
    id,
    first_name,
    last_name,
    first_name || ' ' || last_name AS chain
  FROM employee
  WHERE manager_id IS NULL
  UNION ALL
  SELECT
    employee.id,
    employee.first_name,
    employee.last_name,
    chain || '->' || employee.first_name || ' ' || employee.last_name
  FROM employee_chain
  JOIN employee
    ON employee.manager_id = employee_chain.id
)

SELECT
  first_name,
  last_name,
  chain
FROM employee_chain;

The result will look like this:

first_name	last_name	chain
Maisy	Bloom	Maisy Bloom
Caine	Farrow	Maisy Bloom->Caine Farrow
Waqar	Jarvis	Maisy Bloom->Caine Farrow->Waqar Jarvis
Lacey-Mai	Rahman	Maisy Bloom->Caine Farrow->Lacey-Mai Rahman
Merryn	French	Maisy Bloom->Caine Farrow->Waqar Jarvis->Merryn French

We have written a query that can easily create a whole chain of relations. You might think that this could be achieved with subqueries, but as the management chain goes deeper and deeper, you’d have to write more and more code. The amount of code you’d have to write would depend on the depth of the chain – and that can only be checked with a recursive CTE.

How does this query work? It starts by running the first part (before the UNION ALL) and selects an employee without a manager (i.e. Maisy Bloom). Then the part beneath the UNION ALL selects the employee(s) directly managed by Maisy (Caine Farrow). Since the query is calling itself, it then runs the same part again and selects all the employees managed by Caine (Waqar Jarvis and Lacey-Mai Rahman). It repeats this operation as long as it has rows to join. After traversing the whole management chain, the query halts.

If this is your first encounter with recursion in SQL, it might be a little hard to understand. And that’s totally normal. Check out Do it in SQL: Recursive SQL Tree Traversal for a more detailed explanation.

Difference #2: CTEs are reusable

A huge advantage of CTEs is that they can be used multiple times in a query. You don’t have to copy the whole CTE code – you simply put the CTE name.

Using the data from the previous section, we’d like to 1) filter out the employees who don’t have a manager and then 2) show each employee with their manager – but only if they have a manager. The result will look like this:

first_name	last_name	first_name	last_name
Waqar	Jarvis	Caine	Farrow
Lacey-Mai	Rahman	Caine	Farrow
Merryn	French	Waqar	Jarvis

Now, let’s see how a CTE would solve this task:

WITH not_null_manager AS (
  SELECT
    *
  FROM employee
  WHERE manager_id IS NOT NULL
)

SELECT
  nnm1.first_name,
  nnm1.last_name,
  nnm2.first_name,
  nnm2.last_name
FROM not_null_manager AS nnm1
JOIN not_null_manager AS nnm2
  ON nnm1.manager_id = nnm2.id;

Now let’s see how a subquery would achieve the same result:

SELECT
  nnm1.first_name,
  nnm1.last_name,
  nnm2.first_name,
  nnm2.last_name
FROM (
  SELECT
    *
  FROM employee
  WHERE manager_id IS NOT NULL
) AS nnm1
JOIN (
  SELECT
    *
  FROM employee
  WHERE manager_id IS NOT NULL
) AS nnm2
  ON nnm1.manager_id = nnm2.id;

As you can see, the CTE query has less code. It’s more readable, too: you simply repeat the CTE name (not_null_manager) instead of a whole chunk of code.

There’s not really much difference in the performance efficiency between these two queries. Even though you only declare the CTE once, the execution time is almost the same.

Difference #3: CTEs can be more readable

So, you know that you can write less code using CTEs. What about code organization? Here’s another example that focuses on the FROM clause.

Do you still remember the first examples? The ones where returning the largest average jump length? If not, here’s a quick review.

This one uses a subquery:

SELECT
  MAX(avg_length) AS max_length
FROM (
  SELECT
    first_name,
    last_name,
    AVG(length) AS avg_length
  FROM participant
  JOIN jump
    ON jump.participant_id = participant.id
  GROUP BY first_name,
    last_name
) AS average_lengths;

And this one uses a CTE:

WITH average_lengths AS (
  SELECT
    first_name,
    last_name,
    AVG(length) AS avg_length
  FROM participant
  JOIN jump
    ON jump.participant_id = participant.id
  GROUP BY first_name,
    last_name
)

SELECT
  MAX(avg_length) AS max_length
FROM average_lengths;

Which one do you think is more readable? I’d say the second one (CTE) is a bit better. The code is less nested and there’s a small layer of abstraction. You know at first glance that this query selects the maximal average length.

CTEs’ readability is shown even more in the examples from the previous section. And when your queries grow bigger and bigger, using CTEs will significantly increase the readability of your code. And readability is key to the efficient development of code.

If you want to learn more about using CTEs to improve the readability of your code, you should definitely check out the article How to Organize SQL Queries with CTEs.

But Subqueries Are Sometimes Irreplaceable

So far, you’ve learnt quite a few differences between subqueries and CTEs. To be honest, I was trying to convince you that CTEs are much better than subqueries. But in this section, you’ll learn why subqueries can be invaluable.

Filtering with a subquery

This article’s first example used a subquery in the WHERE clause. I didn’t show a similar example in the CTE section. That’s because only subqueries can be used in the WHERE clause!

What’s more, there are quite a few keywords you can use in the WHERE condition – e.g. ALL, ANY, EXISTS, and a few more! Unfortunately, I can’t explain them here; it’d take too much time. Instead, I recommend checking out the Subqueries section of our SQL Basics course. You’ll not only learn about these keywords, you’ll solve a few problems using them! Or check out the article SQL Subqueries in our blog if you want a brief explanation.

Subqueries can act like columns

You can also use subqueries as you would a column. The only constraint is that the subquery must return only one value. Take a look:


SELECT DISTINCT
  contest_id,
  (
    SELECT
      COUNT(length)
    FROM jump AS inside_jump
    WHERE inside_jump.contest_id = outside_jump.contest_id
      AND inside_jump.length > 600
  ) AS longer_jumps,
  (
    SELECT
      COUNT(length)
    FROM jump AS inside_jump
    WHERE inside_jump.contest_id = outside_jump.contest_id
      AND inside_jump.length <= 600
  ) AS shorter_jumps
FROM jump AS outside_jump;

For each contest, this query returns the number of jumps longer than 600cm (computed in the first subquery) and the number of jumps shorter than or equal to 600cm (computed in the second subquery). The result will look like this:

contest_id	longer_jumps	shorter_jumps
1	3	0
2	2	1
3	2	1

Correlated subqueries

Look at the above example again. Did you notice that I used a reference to a table in the outer query inside the subquery? I even referred to the value of the current row from that table. This is called a “correlated subquery”. It allows you to use values from the outer query inside the subquery.

This is a very handy technique, but it’s also quite complicated; we won’t explain it in this article. However, feel free to check out Correlated Subquery in SQL: A Beginner’s Guide in our blog for an explanation.

One More Difference: CTEs Must Be Named

The last difference between CTEs and subqueries is in the naming. CTEs must always have a name. On the other hand, in most database engines, subqueries don’t require any name (the only exception is the FROM clause in my favorite database engine, PostgreSQL).

It’s a good practice to name subqueries placed in the FROM or SELECT clauses, but it’s not a requirement. And, to be precise, you can’t name the subqueries you use in the WHERE clause.

You might think that naming is not a big difference and won’t affect you much. However, you may have to quickly check something in the database. In this case, the easier syntax might be your choice. Even if the query is less readable, keep this in mind – such queries are rarely read after they’re used.

Subqueries vs CTEs – Which Is Better?

You’ve learnt a lot about the differences between CTEs and subqueries. So, then, which is better? The answer is neither, either, or it depends – both subqueries and CTEs have pros and cons. Each query should be analyzed and the choice between these two decided on a case-by-case basis. But to do this, you’ll have to thoroughly learn both concepts.

To learn more about subqueries, I can suggest the SQL Subqueries course. If you’d like to learn more about CTEs, the Recursive Queries course is your best bet. These courses will help you quickly learn these concepts. Thus, you’ll be able to decide which queries benefit from CTEs and which call for subqueries.

Tags: