16th Jul 2024 12 minutes read

PostgreSQL CTE: What It Is and How to Use It

Table of Contents

Common Table Expressions in PostgreSQL
Nested Queries Using PostgreSQL CTE
PostgreSQL CTE in Data Manipulation Language
Recursive Queries
Learn More About PostgreSQL CTEs

CTEs, or Common Table Expressions, are a powerful PostgreSQL tool that’s often ignored. This article looks at various PostgreSQL CTEs – including nested and recursive CTEs – and what you can do with them.

If you write complex queries in SQL, you’ll soon find that your code becomes cluttered and hard to read. CTEs – also known as WITH clauses – are primarily a way of simplifying queries. However, they also allow you to use recursion. Recursion, among other uses, lets you easily navigate hierarchical structures.

PostgreSQL CTEs (or Common Table Expressions) are very similar to subqueries; the difference is that CTEs are named and defined at the top of your code. This lets you break a large query down into small sections.

In this article, I’ll work through several examples of PostgreSQL CTEs. I’m assuming that you’re already familiar with writing queries in PostgreSQL. If not, our PostgreSQL Cheat Sheet is well worth downloading.

If you think PostgreSQL CTEs will help you in your job, you may like to look at our interactive Common Table Expressions in PostgreSQL course. This course is designed for those who are already familiar with basic SQL. You’ll get lots of practice using PostgreSQL CTEs, thanks to over a hundred interactive exercises.

Let’s dig into common table expressions in Postgres!

Common Table Expressions in PostgreSQL

CTE in PostgreSQL Syntax

Let’s now have a closer look at CTE syntax. In its simplest form, it looks like this:

WITH cte_name AS (query_1)
query_2;

cte_name is the name you assign to the CTE. You can refer to this name in your main query or in subqueries, just as you would a table.
query_1 is any valid SELECT
query_2 is a valid SQL statement. It could be a SELECT, an UPDATE, an INSERT or a DELETE.

The results of query_1 will be available as if they were a table. The table name will be the name you specified as cte_name. You can use it in the rest of your query in the same way you use other tables.

CTE in PostgreSQL Example

Let’s look at an example. Alpha Sales is an online retailer. They want to know if their latest marketing strategy has been effective, and what type of customer has best responded to it.

Here is a sample of their order_summary table, which holds the value of each order made in April, May, and June of 2024.

order_id	customer_id	order_date	value
1	1	2024-06-05	700
2	1	2024-04-18	400
3	1	2024-05-15	500
4	2	2024-04-25	200
5	88	2024-05-04	700
6	88	2024-06-18	500
7	88	2024-05-25	150
8	345	2024-04-02	250
9	345	2024-06-25	450
10	345	2024-06-19	300
11	657	2024-05-25	900
12	657	2024-06-25	200

As the first step in analyzing the success of their marketing campaign, the company leaders want to compare the June sales by customer to the average monthly sales by customer for April and May and calculate the percentage change.

Of course, you could achieve this using subqueries, but the code would be quite complex. You want to see the previous month average on the report, but also use it in the percentage change calculation.

Using a CTE, the query would look like this:

WITH april_may_sales AS
(SELECT 
   customer_id, 
   SUM(value) / 2 AS prev_avg
 FROM order_summary 
 WHERE EXTRACT (MONTH FROM order_date) in (4,5)
 GROUP BY customer_id;
)
SELECT 
  order_summary.customer_id, 
  prev_avg, 
  SUM(value) AS jun_total, 
  (SUM(value) - prev_avg) * 100 / prev_avg AS percent_change
FROM order_summary  
JOIN april_may_sales
ONapril_may_sales.customer_id = order_summary.customer_id
WHERE EXTRACT (MONTH FROM order_date) = 6
GROUP BY order_summary.customer_id, prev_avg
ORDER BY customer_id;

This query uses the WITH clause to create a virtual table named april_may_sales. It extracts the total sales by customer for the months of April and May, divides the result by 2 to get a monthly average, and stores that information in a column named prev_avg.

This table is joined to the order_summary table in the main query so we can look at June’s total alongside the average for April and May.

The query produces the following result set:

customer_id	prev_avg	jun_total	percent_change
1	450.00	700.00	55.56
88	425.00	500.00	17.65
345	125.00	750.00	500.00
657	450.00	200.00	-55.56

Defining CTE Column Names

Optionally, you can specifically define the column names for the CTE table using the following syntax:

WITH cte_name (column_name_list)
AS (query_1)
query_2;

Here, column_name_list is a list of column names separated by commas.

Changing the previous example to use this syntax gives us the following query:

WITH april_may_sales (customer_id, prev_avg)
AS (
  SELECT 
 	customer_id, 
      SUM(value) /2 
  FROM order_summary 
  WHERE EXTRACT (MONTH FROM order_date) in (4,5)
  GROUP BY customer_id
)
SELECT 
order_summary.customer_id,
prev_avg, 
SUM(value) AS jun_total, 
(SUM(value) - prev_avg) * 100/prev_avg AS percent_change
FROM order_summary  
JOIN april_may_sales
ON april_may_sales.customer_id = order_summary.customer_id
WHERE EXTRACT (MONTH FROM order_date) = 6
GROUP BY order_summary.customer_id, prev_avg
ORDER BY customer_id;

This makes no difference to the query’s output, which remains the same as the previous sample query. It does, however, make it easier for someone else to understand your query.

Nested Queries Using PostgreSQL CTE

You can define two or more CTEs using one WITH keyword in PostgreSQL. You simply begin by using the WITH keyword and then specify each CTE separated by commas. The syntax looks like this:

WITH 
cte_name_1 AS (query_1),
cte_name_2 AS (query_2)
query_3;

Each CTE has its own name and select statement. Each CTE can refer to any previously-defined CTE to pick up any data it needs. Notice that you don’t repeat the WITH keyword: you just list the CTEs separated by commas.

Let’s look at this in action. Suppose Alpha Sales now wants to take this analysis a stage further. They would like to extract demographics for those customers who bought more in June than their average purchases for April and May.

To do this, they need to combine the data extracted in the previous query with data from their customer table. Here’s a sample of the data:

customer_id	prev_avg	jun_total	percent_change
1	450.00	700.00	55.56
88	425.00	500.00	17.65
345	125.00	750.00	500.00
657	450.00	200.00	-55.56

To do this, you can:

Move the previous main query to the front as a nested CTE. This effectively creates a virtual table holding the customer_id, the previous average, the June total, and the percentage change.
Write a new main query that joins this table with the customer table to calculate the customer’s age and extract his state.

The new query looks like this:

WITH april_may_sales AS
  (SELECT 
     customer_id, 
     SUM(value) / 2 AS prev_avg
   FROM order_summary 
   WHERE EXTRACT (MONTH FROM order_date) in (4,5)
   GROUP BY customer_id
),
comparison AS
  (
    SELECT 
      order_summary.customer_id, 
      prev_avg, 
      SUM(value) AS jun_total, 
      (SUM(value) - prev_avg) * 100/prev_avg AS percent_change
    FROM order_summary  
    JOIN april_may_sales
    ON april_may_sales.customer_id = order_summary.customer_id
    WHERE EXTRACT (MONTH FROM order_date) = 6
    GROUP BY order_summary.customer_id, prev_avg
  )
SELECT 
  customer.customer_id,
  name,
  EXTRACT(YEAR from CURRENT_DATE) - 
		EXTRACT(YEAR from date_of_birth) AS age,
  state,
  prev_avg, 
  jun_total,
  percent_change
FROM customer
JOIN comparison 
    ON comparison.customer_id = customer.customer_id
WHERE percent_change > 0;

As before, the query defines the CTE named april_may_sales as a virtual table containing the average sales for April and May.

It then defines a new CTE named comparison, which contains a comparison of the June totals by customer against the contents of april_may_sales.

Finally, the main query combines the data in the comparison virtual table with data from the customer table.

The result set looks like this:

customer_id	name	age	state	prev_avg	jun_total	percent_change
1	John Smith	30	KY	450.00	700.00	55.56
88	Tinashe Mpofu	50	ID	425.00	500.00	17.65
345	Jennifer Perry	26	HI	125.00	750.00	500.00

PostgreSQL CTE in Data Manipulation Language

Let’s now look at data manipulation statements like INSERT, UPDATE, and DELETE.

One of the limitations of CTEs is that you can’t use them directly in place of a value in an UPDATE statement in the same way that you can with a subquery.

Let’s say you wanted to update the balance in the customer table by adding in the value of all the June orders. With ordinary subqueries, you can do something like this:

UPDATE customer 
SET balance = balance + 
(select SUM(value) FROM order_summary 
WHERE customer.customer_id = order_summary.customer_id
   AND EXTRACT (MONTH from order_date) = 6);

You can’t do that with a CTE. What you can do, however, is use the following syntax:

WITH cte_name AS (select_statement)
UPDATE tablename 
SET column_name_1 = column_name_2
FROM cte_name 
WHERE join_clause;

cte_name is the name you’ll use to refer to the ‘table’ created by the CTE.
select_statement is the statement you’ll use to populate the CTE.
column_name_1 is the column name in the main table that you want updated.
column_name_2 is the column name in your CTE that you’ll use to set the new value.
join_clause specifies the condition you’ll use to join the two tables.

The following query adds the total June orders from the order_summary table to the balance in the customer table:

WITH june_total AS
(SELECT 
   customer_id, 
   SUM(value) AS jun_tot
 FROM order_summary WHERE EXTRACT(MONTH FROM order_date) = 6
 GROUP BY customer_id
)
UPDATE customer
SET balance = balance + jun_tot
FROM june_total 
WHERE customer.customer_id = june_total.customer_id;

First, the WITH clause creates a pseudo-table named june_total. It contains totals by customer_id of orders where the month of order_date is 6.

Next, the column jun_tot from this table is used to increase the balance where customer_id matches between the two tables.

The table customer now contains the following data:

customer_id	name	date_of_birth	state	balance
1	John Smith	5/7/1994	KY	1000
2	Shamila Patel	14/3/2006	CT	1000
88	Tinashe Mpofu	17/4/1974	ID	500
345	Jennifer Perry	21/10/1998	HI	850
657	Sarah Jones	25/4/1984	KY	570

You can also use CTEs to insert or delete rows in the same way.

Recursive Queries

Recursive queries are a feature of CTEs. These queries let you loop from a single base query to repetitively carry out a specified task. They are especially useful for querying hierarchical data such as organizational structures and bills of materials.

A full cover of recursive queries is beyond the scope of this article. We’ll just look at the syntax and a simple example. For more details, have a look at What Is a Recursive CTE in SQL, which gives a full explanation and several examples.

The syntax of recursive queries in PostgreSQL is:

WITH RECURSIVE cte_name AS 
(query_1 UNION query_2)
query_3;

The keyword RECURSIVE indicates that this is a recursive query.
query_1 is the base query, or starting point. For example, suppose you’re working with an organizational structure. In that case, query_1 could be a query that selects the top-level manager from an employee file.
query_2 is the recursive query. This query will be repeated until no more rows meet the criteria specified in the WHERE It can reference the last row added to the result set to pick up data. This could be used to find all employees who report to a manager.
UNION combines the results. If you use UNION ALL, duplicates will be retained; otherwise they will be omitted.
query_3 is used to return the final result set. It can reference the virtual table created by the CTE.

Let’s think about an example of an employee table where employee records have a field identifying the manager they report to. What actually happens if you use a recursive query to navigate this hierarchy?

The results of the base query are added to the virtual table. The base query extracts the employee record of the CEO. The database engine then uses this row to find all rows that match the criteria of the recursive part of the query. This will be all employees who report directly to the top-level manager.

For each of these records in turn, the engine will find all employees who report to that person. This is repeated until there are no more employees who meet the condition.

Let’s look at a simple example. A firm of IT consultants has several ongoing projects, and their policy is to schedule weekly progress meetings for each project. A table named projects holds details of new projects. A sample from this table looks like this:

proj_name	start_date	end_date	meet_day	meet_time
Online Shopping	2024-05-01	2024-08-29	2	09:00
Customer Migration	2024-04-01	2024-05-16	4	15:00

The firm wants to create details of scheduled meetings in a table named meetings; this information will be used to send out reminders and book a venue each week. The column meet_day holds the day of the week when meetings will be scheduled. It’s stored as a day number within the week, where 0 represents Sunday.

You could achieve this with the following recursive query:

WITH RECURSIVE date_list
   (proj_name, meet_date, end_date, meet_day, meet_time)
AS (
    SELECT proj_name, start_date, end_date, meet_day, meet_time
	FROM projects
    UNION ALL
    SELECT 
proj_name, 
meet_date + 1,
	end_date, 
meet_day, 
meet_time
    FROM date_list
    WHERE meet_date + 1 <= end_date
	
)
INSERT INTO meetings
SELECT proj_name, meet_date, meet_time
FROM date_list 
WHERE meet_day = EXTRACT (DOW from meet_date)
ORDER BY proj_name, meet_date;

After the query has run, the table meetings holds the following data:

proj_name	meet_date	meet_time
Customer Migration	2024-04-03	15:00:00
Customer Migration	2024-04-10	15:00:00
Customer Migration	2024-04-17	15:00:00
Customer Migration	2024-04-24	15:00:00
Customer Migration	2024-05-01	15:00:00
Customer Migration	2024-05-08	15:00:00
Customer Migration	2024-05-15	15:00:00
Online Shopping	2024-05-07	09:00:00
Online Shopping	2024-05-14	09:00:00
Online Shopping	2024-05-21	09:00:00
Online Shopping	2024-05-28	09:00:00
Online Shopping	2024-06-04	09:00:00
Online Shopping	2024-06-11	09:00:00
Online Shopping	2024-06-18	09:00:00
Online Shopping	2024-06-25	09:00:00
Online Shopping	2024-07-02	09:00:00
Online Shopping	2024-07-09	09:00:00
Online Shopping	2024-07-16	09:00:00
Online Shopping	2024-07-23	09:00:00
Online Shopping	2024-07-30	09:00:00
Online Shopping	2024-08-06	09:00:00
Online Shopping	2024-08-13	09:00:00
Online Shopping	2024-08-20	09:00:00
Online Shopping	2024-08-27	09:00:00

Let’s break the query up and look at what it actually does.

First, it defines the columns that will be included in the CTE date_list:

WITH RECURSIVE date_list
   (proj_name, meet_date, end_date, meet_day, meet_time)

Next it establishes the base data for the recursion, which is the contents of the projects table:

AS (
    SELECT proj_name, start_date, end_date, meet_day, meet_time
	from projects

It then specifies what data must be included in each recursion, with a condition that ensures the recursion terminates when complete:

    UNION ALL
    SELECT proj_name, 
	meet_date + 1,
	end_date, meet_day, meet_time
	FROM date_list
    WHERE meet_date + 1 <= end_date

Finally, the main query inserts the results held in the virtual table into the table meetings.

Does this sound useful? You can learn more about recursive queries and practice some real-world examples if you take our online CTE in PostgreSQL course.

Learn More About PostgreSQL CTEs

Although CTEs in PostgreSQL may not improve the performance of your queries, they certainly make complex queries easier to write and easier to understand. By splitting a long query into component parts, you can organize your thoughts and keep your coding simple. CTEs also facilitate working with hierarchical structures using the RECURSIVE clause.

This article specifically uses PostgreSQL syntax and examples, but CTEs work in much the same way for other SQL dialects like MS SQL Server.

If you want to get comfortable using CTEs, LearnSQL’s Common Table Expressions in PostgreSQL course has over 100 hands-on practice exercises that will help you really understand this tool.

If you’d like some extra practice, try these free 11 SQL Common Table Expression Exercises. Each exercise gives you the kind of challenge you’d face in the real world, and solutions and explanations are included. And if you’re preparing for an interview, here’s some sample interview CTE questions and answers.

I hope this article has given you a good idea of what PostgreSQL CTE can do for you. If you’d like to learn some other advanced PostgreSQL concepts, this article is a good place to start.

Now it’s up to you! Remember, practice makes perfect, so check out our Advanced SQL Practice learning track for more hands-on practice of advanced SQL features!

Tags: