16th Jul 2020 11 minutes read

When Should I Use a Common Table Expression (CTE)?

Table of Contents

What Is a Common Table Expression?
The CTE Syntax
Using More Than one CTEs in a Query
When to Use CTEs
Getting the Hang of how to Use CTEs?

Interested in hearing more about common table expressions or CTEs? Would you like to know when CTEs are useful? Read on—we’ll discuss in this article.

If you’ve heard of common table expressions, you’ve probably wondered what they do. Even if you have not, it’s good that you’re here! CTEs can be very useful, especially if you have already mastered the basics of SQL, such as selecting, ordering, filtering data, and joining tables. Let’s get to the point right away and see what CTEs are as well as when and how to use them.

What Is a Common Table Expression?

A common table expression (CTE) is a relatively new SQL feature. It was introduced in SQL:1999, the fourth SQL revision, with ISO standards issued from 1999 to 2002 for this version of SQL.

CTEs were first introduced in SQL Server in 2005, then PostgreSQL made them available starting with Version 8.4 in 2009. MySQL waited a little bit longer and made them available in 2018, starting with Version 8.0. Simply put, it’s a temporary data set returned by a query, which is then used by another query. It’s temporary because the result is not stored anywhere; it exists only when the query is run.

There are two types of CTEs:

non-recursive
recursive

I’ll discuss only the non-recursive CTE in this article, and only mention the recursive CTE briefly at the end.

The basic syntax of the (non-recursive) CTE is as follows:

WITH expression_name AS (CTE definition)

As you can see, it is done using a WITH statement. For this reason, CTEs are also called WITH queries. After the WITH, you define a CTE in parenthesis. Defining CTE simply means writing a SELECT query which will give you a result you want to use within another query.

As you can see, it is done using a WITH statement. For this reason, CTEs are also called WITH queries. After the WITH, you define a CTE in parenthesis. Defining CTE simply means writing a SELECT query which will give you a result you want to use within another query.

SELECT ... 
FROM expression_name

You define your SELECT query and then reference your CTE, using it as you would any other table after the FROM clause.

If you want to read more about CTEs before going on to the examples, here is an article that explains them nicely.

The CTE Syntax

Now, let’s see how the CTE syntax works in practice. Suppose there is a database containing various data from the university with the following three tables:

students
subjects
exams

The table students has the following columns:

id: the ID of the student, a primary key
first_name: the student’s first name
last_name: the student’s last name

The next table is the subjects table containing the data:

id: the ID of the subject, a primary key
subject_name: the name of the subject

The third table is the exams table that stores the following data:

id: the ID of the exam given
exam_date: the date when the exam was given
subject_id: the ID of the subject, a foreign key from the table subjects
student_id: the ID of the student who took the exam, a foreign key from the table students

Your task is to calculate the average grade for the students. Then for every student with the average grade above 8.5, you need to show their first name, last name, and their average grade, and label them as “exceptional” students. How would you do this using a CTE?

The code that will give you the desired result can be written like this:

WITH grade_average AS (
SELECT	s.id,
		s.first_name,
		s.last_name,
		AVG (e.grade) AS average_grade 
FROM students s JOIN exams e ON s.id = e.student_id
GROUP BY s.id, s.first_name, s.last_name
)

SELECT	first_name,
		last_name,
		average_grade,
		'exceptional' AS tag
FROM grade_average
WHERE average_grade>8.5;

First, you need to define your CTE. As you’ve already learned, this is done using a WITH statement. It is followed by the name of the CTE, which is grade_average in this case. A CTE query is defined in between the parentheses. When you look at it on its own, it’s not complicated; it is a rather regular looking SELECT query. It selects id, first_name, and the last_name from the table students. It also calculates the average grade, using the column grade from the table exams. The result is shown in the new column average_grade. The tables students and exams are joined on the appropriate student ID column from each table. The result is grouped by the columns id, first_name, and last_name from the table students. The records are grouped, since you need to obtain the result by student.

After the CTE is defined, there’s another SELECT query that uses the CTE. This query selects the columns first_name, last_name, and average_grade from the CTE, grade_average. It also assigns the value “exceptional.” There is a WHERE clause at the end to show only those students with an average grade above 8.5.

Running the query will result in the names of three exceptional students.

first_name	last_name	average_grade	tag
John	Cheese	9.00	exceptional
Rowan	Chatkinson	9.50	exceptional
Petunia	Opportunia	8.67	exceptional

Using More Than one CTEs in a Query

It is possible to define and use more than one CTEs in a query. You do so by separating every CTE with a comma, and you use a WITH statement only when defining the first CTE.

Let me show you an example. With the same tables from the previous example, you have the following task: show the name of the subjects and their respective average and minimum grades, but only for those subjects in which everybody passed the exam, i.e. their mark is 5 or above.

To get the desired result, your query should look like this:

WITH subject_average AS (
SELECT	su.id,
su.subject_name,
		AVG (e.grade) AS subject_average_grade
FROM subjects su JOIN exams e ON su.id = e.subject_id
GROUP BY su.id, su.subject_name
),

min_grade AS (
SELECT	su.id,
		su.subject_name,
		MIN (e.grade) AS subject_min_grade
FROM subjects su JOIN exams e ON su.id = e.subject_id
GROUP BY su.id, su.subject_name
HAVING MIN (e.grade) > 5
)

SELECT	sa.id,
		sa.subject_name,
		sa.subject_average_grade
FROM subject_average sa JOIN min_grade m ON sa.id =m.id;

First, a CTE called subject_average is defined. It selects columns id and subject_name from the table subjects. Next, it calculates the average grades using data from the table exams and assigns the results in the new column subject_average_grade. Then, it groups the data to get the result by subject.

Now, you define the second CTE. Remember what I said earlier—you separate CTEs with commas and write the second CTE omitting the WITH statement. The second CTE here is named min_grade. It too selects id and subject_name from the table subjects then calculates the minimum grades, showing the result in the new column subject_min_grade. It groups the data as was done in the first CTE. Since you need the result only for the subjects in which everybody passed, you use a HAVING clause to select only the subjects in which the minimum grade is 5 or above.

Finally, you write the SELECT query which will show the subject ID, the subject name, and the average grade for each subject that meets the criteria. There are only two such subjects:

id	subject_name	subject_average_grade
5	Monetary Policy	7.40
6	Tax	8.00

Once you learn the basics of CTEs, there’s the Recursive Queries course with plenty more examples where you can practice writing the syntax.

When to Use CTEs

CTEs allow you to perform multi-level aggregations. What are they?

Let’s go back to the tables we used in the previous examples. The task now is to calculate the average minimum grade and the average maximum grade by subject.

Where would you start? If you think logically, you should first find the minimum and maximum grades per subject then find the average of the results by subject. It’s straightforward—the code looks like this:

SELECT		su.id,
		MIN (e.grade) AS min_grade,
		MAX (e.grade) AS max_grade,
		AVG (MIN (e.grade)) AS avg_min_grade,
		AVG (MAX (e.grade)) AS avg_max_grade
FROM subjects su JOIN exams e ON su.id = e.subject_id
GROUP BY su.id, su.subject_name;

Logically speaking, this tries to calculate the minimum grade and the maximum grade by subject first, then the average of those values. Voila! You now run the code, and get a message that looks like this:

Msg 130, Level 15, State 1, Line 16
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.

Hm, not the outcome you hoped for? That’s because SQL doesn’t allow constructions such as AVG (MIN (e.grade)). Your thoughts were correct, but you have to use a CTE to translate them into a SQL code. Here’s how to do it:

WITH min_max_grade AS (
SELECT		su.id,
		MIN (e.grade) AS min_grade,
		MAX (e.grade) AS max_grade
FROM subjects su JOIN exams e ON su.id = e.subject_id
GROUP BY su.id, su.subject_name
)

SELECT		AVG (min_grade) AS avg_min_grade,
		AVG (max_grade) AS avg_max_grade
FROM min_max_grade;

The CTE is named min_max_grade. In it, there is a SELECT statement that calculates the minimum and the maximum grades by subject, as I intended to do in the query that returned the error message. The result is shown in the new columns min_grade and max_grade. A CTE now helps translate your logic into code.

After defining the CTE, you write a SELECT statement that calculates the average of min_grade and max_grade from the CTE. The result will be shown in the new columns avg_min_grade and avg_max_grade. Now that you see it, it’s easy, right?

avg_min_grade	avg_max_grade
4.166666	9.833333

CTEs are also very helpful when you need to organize long and complex queries. Using CTEs will improve the readability of your code, since it breaks down the code nicely into separate steps. It becomes easier to change the code or correct errors. If you were to insist on not using CTEs, your code could look like this:

SELECT		AVG (min_grade) AS avg_min_grade,
		AVG (max_grade) AS avg_max_grade
FROM (
SELECT	su.id,
		su.subject_name,
		MIN (e.grade) AS min_grade,
		MAX (e.grade) AS max_grade
	FROM subjects su JOIN exams e ON su.id = e.subject_id
	GROUP BY su.id, su.subject_name
) AS min_max;

Compared to the solution using a CTE, this seems a bit all over the place and harder to read. Reading subqueries can be difficult, because you have to first think about what each subquery does, then go back to the main query, and somehow connect them all in your head. Besides, using subqueries like this goes counter to how your mind works logically and how you think about the steps that would get you to the solution. Recall that you broke down the problem into two steps: first, calculate the minimum and the maximum grades for every subject, then calculate the average of the minimums and the maximums. The CTE code reflects this order exactly.

The logic in the code with a subquery is the opposite of how you thought about the solution. Here, we first write that you want some average of the grades, then we specify in the subquery that you want the averages to be of the minimum and the maximum grades. When you use a subquery, how you write the code generally goes counter to how you think of the logic.

And if the code with a subquery is less readable and harder to understand than the code with a CTE in this simple example, imagine what it would be like if you had to write more complex queries! You would be scratching your head, trying really hard just to understand what every part of the code does. Having difficulties understanding a code can be very frustrating. This is where CTEs can help you.

You’ve probably noticed that CTEs are a lot like subqueries. Maybe you were wondering why I am using CTEs when everything I did could have been done with subqueries. That’s true, but aside from being more readable, CTEs have one big advantage over subqueries: the results from a CTE can be used more than once in a query. If you’re interested in more on this topic, I recommend reading about more differences between CTEs and subqueries.

I mentioned earlier that CTEs may be non-recursive or recursive. So far, we’ve looked at only non-recursive CTEs. Recursive CTEs are CTEs that reference themselves; by doing so, they return the sub-result and repeat the process until they return the final result. Using recursive CTEs really unlocks the power of CTEs; they are useful when processing hierarchical structures, such as trees and graphs.

Getting the Hang of how to Use CTEs?

We’ve covered some of the basics of the CTE in this article. You’ve learned what a CTE is, understood its syntax, and reviewed some simple examples to give you a feel for what CTEs can do. I also pointed out some common uses of CTEs to help you find a way to use them in your study or work. I hope to have given you some good directions; now it’s your turn to put what you’ve learned into practice.

If you have any questions or comments, let me know in the comments section!

Tags:

What Is a Common Table Expression?

The CTE Syntax

Using More Than one CTEs in a Query

When to Use CTEs

Getting the Hang of how to Use CTEs?

You may also like