Back to articles list July 15, 2021 - 10 minutes read 5 Reasons Why You Should Use CTEs Instead of Subqueries Kateryna Koidan Kateryna is a data science writer from Kyiv, Ukraine. She worked for BNP Paribas, the leading European banking group, as an internal auditor for more than 6 years. More recently, she decided to pursue only the favorite part of her job—data analysis. Now she is continuing her self-education with deep-learning courses, enjoys coding for data analysis and visualization projects, and writes on the topics of data science and artificial intelligence. Kateryna is also a proud mother of two lovely toddlers, who make her life full of fun. Tags: sql learn sql CTE subqueries Common Table Expressions, or CTEs, were introduced in SQL:1999 to handle cases where the output of one query is used within another query. But didn’t we already have subqueries for this? In this article, I’ll demonstrate with multiple examples why CTEs are better than subqueries for the structure and readability of your SQL queries. Let’s start by reminding ourselves what CTEs and subqueries are and how they differ. Common Table Expressions vs. Subqueries A subquery is a query nested inside another query. It can be placed anywhere in a query, even within another subquery. The syntax is very simple – you just put your subquery inside parentheses and insert it into the main query, wherever it is needed. Note that subqueries are executed before the main query (also called the parent query or the outer query) so that their results can be used by the main query. It’s often quite tricky to read an SQL query with multiple subqueries because even though they are executed first, they are defined somewhere in the main query. You can read more about SQL subqueries in this introductory guide. A Common Table Expression (CTE), also referred to as a WITH clause, is a temporary named result set that you can reference anywhere in your query. In contrast to subqueries, which are inserted exactly where you need them, all CTEs are defined before the main query and are then referenced in the query using the assigned name. In the examples below, we’ll see how this significantly improves the structure and readability of an SQL query. You first define all temporary result sets you need and then use them in the main query when necessary. A detailed introduction to common table expressions can be found here. Learn more about the difference between SQL CTEs and subqueries in this overview article. 5 Reasons to Choose CTEs SQL learners often wonder whether they should use a common table expression or a subquery when writing a complex SQL query. Let’s go through several examples to explain why you should prefer a CTE over a subquery. For these examples, I’ll be using the following table that summarizes the performance of employees from several departments. performance idnamedepartmentyears_experiencekpi 11Marta StewartSales168.8 12Cathrine BlackSales191.7 13Julian LeeOperations695.5 14Oliver HenriksonSales578.1 15Julia GreySales473.2 16Penelope GreenOperations382.7 17William SmithSales297.4 18Gabriel GarciaOperations190.1 19Nikita MooreOperations175.2 20Lucy MillerFinance379.9 21Sheldon CooperFinance398.9 22Leonard HofstadterFinance293.4 For each employee, we have their ID, full name, department, years of experience, and a KPI completion score on a scale from 0 to 100. Let’s see how CTEs vs. subqueries may assist us in analyzing this data. 5 Reasons to Use CTEs Rather Than Subqueries #1. CTEs Use Meaningful Names You can give meaningful names to CTEs that specify your intention and make the query more readable. For example, let’s say we want to compare each employee’s performance with the average KPI completion in their respective departments. To add the corresponding column to our table, we first need to calculate the average KPI for each department. Our entire query with a subquery is as follows: SELECT p.name, p.department, p.kpi, avg.average_dep_kpi FROM performance p JOIN (SELECT department, AVG(kpi) AS average_dep_kpi FROM performance GROUP BY department) AS avg ON p.department = avg.department; namedepartmentkpiaverage_dep_kpi Marta StewartSales68.881.84 Cathrine BlackSales91.781.84 Julian LeeOperations95.585.875 Oliver HenriksonSales78.181.84 Julia GreySales73.281.84 Penelope GreenOperations82.785.875 William SmithSales97.481.84 Gabriel GarciaOperations90.185.875 Nikita MooreOperations75.285.875 Lucy MillerFinance79.990.733 Sheldon CooperFinance98.990.733 Leonard HofstadterFinance93.490.733 You can get the same output using a common table expression: WITH avg_department_kpi AS (SELECT department, AVG(kpi) AS average_dep_kpi FROM performance GROUP BY department) SELECT p.name, p.department, p.kpi, avg.average_dep_kpi FROM performance p JOIN avg_department_kpi avg ON p.department = avg.department; CTEs and subqueries give the same output, and the queries are more or less of the same length. But note that the readability of the CTE query is better. What a complex query means might be obvious to you when you’re looking at your own code, but it may not be at all clear to your colleagues. When reading the CTE version: You see the query in the order it will be executed: first the subquery and then the main query. You can determine the purpose of the subquery based on its name. In our case, avg_department_kpi refers to the CTE that outputs the average KPI for each department. At the same time, you may have noticed that the CTE query is a bit longer than the subquery version. That is not always the case, especially when we need the same subquery several times in our main query. If you are new to WITH syntax, you can practice common table expressions in our interactive course. For now, let’s move to the second reason to prefer CTEs over subqueries. #2. CTEs Are Reusable Within a Query In contrast to subqueries, you don’t have to repeat a CTE definition each time you need it in the query. You define it only once, at the beginning of your query, and then reference it when necessary. Let’s say we want to compare the performance of different departments. Specifically, we want to see the average KPI in every department along with the minimum and maximum average KPI across departments. We could address this problem using an SQL query with three subqueries: To find the average KPI for each department. To find the minimum average KPI across departments. To find the maximum average KPI across departments. Note that the last two subqueries will both need the result of the first one. Thus, if using subqueries, our solution will include subqueries defined within other subqueries. The whole thing looks quite confusing: SELECT ad.department, ad.average_kpi, min.min_avg_kpi_department, max.max_avg_kpi_department FROM (SELECT department, AVG(kpi) AS average_kpi FROM performance GROUP BY department) AS ad CROSS JOIN (SELECT MIN (average_kpi) AS min_avg_kpi_department FROM (SELECT department, AVG(kpi) AS average_kpi FROM performance GROUP BY department) AS ad) AS min CROSS JOIN (SELECT MAX (average_kpi) AS max_avg_kpi_department FROM (SELECT department, AVG(kpi) AS average_kpi FROM performance GROUP BY department) AS ad) AS max; In contrast, when using common table expressions, we define our three CTEs at the beginning of the query, reference them in the main query when needed, and avoid multiple nested subqueries: WITH avg_per_department AS ( SELECT department, AVG(kpi) AS average_kpi FROM performance GROUP BY department), min_kpi_department AS ( SELECT MIN (average_kpi) AS min_avg_kpi_department FROM avg_per_department), max_kpi_department AS ( SELECT MAX (average_kpi) AS max_avg_kpi_department FROM avg_per_department) SELECT ad.department, ad.average_kpi, min.min_avg_kpi_department, max.max_avg_kpi_department FROM avg_per_department ad CROSS JOIN min_kpi_department min CROSS JOIN max_kpi_department max; departmentaverage_kpimin_avg_kpi_departmentmax_avg_kpi_department Finance90.73381.8490.733 Operations85.87581.8490.733 Sales81.8481.8490.733 In this case, you can observe that WITH queries make your code not only more readable but also much shorter. #3. CTEs Divide Complex Computations into Parts CTEs bring clarity to the computation process. When subqueries are used for computations, the query often turns into a tangled mess of subqueries. CTEs, on the other hand, show the computation process more clearly. Let’s say we want to know how many junior and experienced employees are in each department. We’ll consider those employees with two or fewer years of experience as junior and those with over two years of experience as experienced. We basically need two subqueries here: To calculate the number of junior employees in each department. To calculate the number of experienced employees in each department. Here’s how we can approach this with subqueries: SELECT e.department, e.experienced_employees, j.junior_employees FROM (SELECT department, COUNT(*) AS experienced_employees FROM performance WHERE years_experience > 2 GROUP BY department) AS e JOIN (SELECT department, COUNT(*) AS junior_employees FROM performance WHERE years_experience <= 2 GROUP BY department) AS j ON e.department = j.department; departmentexperienced_employeesjunior_employees Finance21 Operations22 Sales23 This example is not very complicated, so you’re probably able to follow the above query. However, notice that: You need to guess the purpose of each subquery. It’s not immediately clear what tables are joined. The SELECT clause lists the fields to be displayed in the output, but these fields are defined only later within subqueries. When calculations get more complicated, it’s really a challenge to follow the computation process through all the subqueries. In contrast, the computation process using CTEs is much more readable: WITH experienced AS (SELECT department, COUNT(*) AS experienced_employees FROM performance WHERE years_experience > 2 GROUP BY department), junior AS (SELECT department, COUNT(*) AS junior_employees FROM performance WHERE years_experience <= 2 GROUP BY department) SELECT e.department, e.experienced_employees, j.junior_employees FROM experienced e JOIN junior j ON e.department = j.department; Here, we start by defining all the temporary result sets necessary for the computations. Then we just write a simple query to get the needed output. Everything is neat and clear. Believe me, you’ll really appreciate the structure that common table expressions provide when working with more complex computations. To prove my point, I want to suggest yet another example with nested computations. #4. Nested CTEs Are Really Cool Considering the reusability of CTEs and their ability to clarify the computation process, it is not surprising that WITH clauses are perfect for nested computations. Let’s say we want to calculate how many employees in each department have a KPI score that’s (1) above average or (2) below average in their respective department. Specifically, for each department, we want to show the average KPI score, the number of above-average-KPI employees, and the number of below-average-KPI employees. We basically need three subqueries here: To get the average KPI for each department. To get the number of employees that have an above-average KPI score. To get the number of employees that have a below-average KPI score. However, when using subqueries, you’ll need to add the first subquery three times, including two times when it will be nested within the other two subqueries. The resulting query looks quite messy: SELECT avg.department, avg.average_kpi, aa.employees_above_average, ba.employees_below_average FROM (SELECT department, AVG(kpi) AS average_kpi FROM performance GROUP BY department) AS avg JOIN (SELECT p.department, count(*) AS employees_above_average FROM performance p JOIN (SELECT department, AVG(kpi) AS average_kpi FROM performance GROUP BY department) AS avg ON p.department = avg.department WHERE kpi > average_kpi GROUP BY p.department) AS aa ON avg.department = aa.department JOIN (SELECT p.department, count(*) AS employees_below_average FROM performance p JOIN (SELECT department, AVG(kpi) AS average_kpi FROM performance GROUP BY department) AS avg ON p.department = avg.department WHERE kpi < average_kpi GROUP BY p.department) AS ba ON avg.department = ba.department; departmentaverage_kpiemployees_above_averageemployees_below_average Finance90.73321 Operations85.87522 Sales81.8423 The query worked and we got the output we wanted, but notice how difficult it is to follow the multiple nested subqueries. In the real world, it often gets even more complicated. In contrast, see how neat the same query looks when using CTEs rather than subqueries: WITH avg_kpi_department AS (SELECT department, AVG(kpi) AS average_kpi FROM performance GROUP BY department), above_average AS (SELECT p.department, count(*) AS employees_above_average FROM performance p JOIN avg_kpi_department avg ON p.department = avg.department WHERE kpi > average_kpi GROUP BY p.department), below_average AS (SELECT p.department, count(*) AS employees_below_average FROM performance p JOIN avg_kpi_department avg ON p.department = avg.department WHERE kpi < average_kpi GROUP BY p.department) SELECT avg.department, avg.average_kpi, aa.employees_above_average, ba.employees_below_average FROM avg_kpi_department avg JOIN above_average aa ON avg.department = aa.department JOIN below_average ba ON avg.department = ba.department; You can clearly follow how the three temporary result sets are defined. You can understand their purpose from the assigned names. You don’t need to repeat the same subquery several times within the same main query. When you get to the main query, all the preparatory work is done and you can simply join the CTEs that have been already defined. #5. CTEs Allow Recursion Last but not least, CTEs are great at processing graphs, trees, and other hierarchical structures. This is because the WITH syntax can process recursion. A recursive query is a query that refers to itself. For example, if we have a typical organizational structure, where each employee has a superior and superiors have multiple subordinates, recursive CTEs can help us analyze this data efficiently. With just one SQL query and individual-level data, we can calculate the total salary for each of the departments and sub-departments or the total number of unused vacation days in each of the organizational units. The syntax of recursive CTEs is quite complicated. For a detailed explanation, I recommend reading this article that explains how recursive queries process hierarchical structures. You may also want to check out these examples of applying recursive queries to graph structures. Let’s Practice CTEs! Now that you realize the benefits that CTEs bring into SQL queries, it’s time to practice! I recommend starting with our Recursive Queries interactive course (114 coding challenges), which covers all kinds of common table expressions, including simple CTEs, nested CTEs, and recursive CTEs. Get more details in this overview article. Thanks for reading, and happy learning! Tags: sql learn sql CTE subqueries You may also like Subquery vs. CTE: A SQL Primer SQL subqueries and CTEs seem similar, but even pros may not know all their quirks. Learn their similarities, differences, and best use cases. Read more SQL CTE Explained Common Table Expressions are a powerful SQL tool. Learn how to improve code readability, break down complex subqueries, and traverse hierarchical data. Read more Where Can I Find Good SQL CTE Exercises? SQL Common Table Expressions (CTEs) are widely used to improve the performance and readability of SQL code. Learn about where you can find good SQL CTE exercises and resources. Read more What’s the Difference Between SQL CTEs and Views? Learn the similarities and differences between SQL CTEs and views to finally understand when to use which. Examples provided. Read more What are SQL CTE Best Practices? When is it best to use SQL CTEs, and what naming conventions should you have in mind. Find out with minimal theory, plenty of code. Read more Subscribe to our newsletter Join our weekly newsletter to be notified about the latest posts.