Back to articles list Articles Cookbook
5 minutes read

Learn to Write a SQL Correlated Subquery in 5 Minutes

If you’re familiar with the famous Russian nesting doll, then SQL correlated subqueries should be a peace of cake to understand—subqueries are just queries nested within queries. An SQL subquery is often called an “inner” query; the main query is usually called the “outer” query. This article covers everything you need to know about correlated subqueries.

What Exactly is a SQL Correlated Subquery?

A correlated SQL subquery is just a subquery that is executed many times—once for each record (row) returned by the outer (main) query. In other words, the outer query returns a table with multiple rows; the inner query then runs once for each of those rows. If your outer query returns 10 rows, then the inner query will run 10 times. And if your outer query returns 100 rows, the inner query will run 100 times.

When Should You Use a SQL Correlated Subquery?

Some data questions can only be answered with correlated subqueries. This is particularly true when asking negative data questions.

Negative data questions arise when we search for records that don’t satisfy a particular condition. An example of a simple negative data question is: “Obtain the names of all movies not produced by Steven Spielberg.”

We’ll solve an example of a negative data question in this article. Before we do so, let's first review the tables involved. We have two tables: the employee table and the payment_history table. The second table has a column named payment_type that denotes whether a payment is part of a regular salary, a bonus, or an award. Let's see some sample data from the tables we will use:

employee

employee_id payment_type amount_paid payment_date
100 salary 2000.00 2018-Mar-02
101 salary 1800.00 2018-Mar-02
102 salary 1900.00 2018-Mar-02
101 award 500.00 2018-Mar-08
102 adjustment 124.70 2018-Mar-10

payment_history

employee_id payment_type amount_paid payment_date
100 salary 2000.00 2018-Mar-02
101 salary 1800.00 2018-Mar-02
102 salary 1900.00 2018-Mar-02
101 award 500.00 2018-Mar-08
102 adjustment 124.70 2018-Mar-10

Here’s the negative data question we’d like to answer:

“Obtain the names of employees who never received an award”

The SQL query that answers the previous question is the following:

SELECT last_name, first_name
FROM    employee e1
WHERE NOT EXISTS (SELECT ph.last_name 
                                      FROM payment_history ph 
                                      WHERE ph.employee_id = e1.employee_id 
                                      AND ph.payment_type = 'award')

SQL Correlated Subqueries Vs. Simple Subqueries

The main difference between a SQL correlated subquery and a simple subquery is that a SQL correlated subquery references columns from the table of the outer query.

In the above example, ph.employee_id = e1.employee_id is a reference to the outer subquery table (e1). To identify a correlated subquery, just look for these kinds of references. If you find at least one, you have a correlated subquery!

The negative part of a data question is often solved in a SQL correlated subquery by using the NOT EXISTS operator in the WHERE clause. EXISTS is an operator always followed by a subquery. If the subquery returns at least one record, then EXISTS evaluates to TRUE. If the subquery returns an empty set, then EXISTS evaluates to FALSE. Note that we use NOT EXISTS, which is simply the opposite of EXISTS.

The result of the previous query is:

first_name last_name
John Smith
Alice Johnson

Another Correlated Subquery Example

In this example, we’ll try to obtain the names of all employees who earned higher salaries in March 2018 than their average monthly salaries for all previous months using a SQL correlated subquery. Here’s the query we’ll run:

SELECT    first_name, last_name 
FROM       employee e1, payment_history ph
WHERE    e1.employee_id = ph.employee_id 
     AND     amount_paid > = (
                               SELECT AVG(amount_paid) FROM payment_history ph2
                               WHERE ph2.employee_id = e1.employee_id
                                     AND ph2.payment_date  < '01/03/2018'
                                     AND ph2.payment_type = 'salary' 
                               )
     AND    month(ph.payment_date) =3 
     AND    year(ph.payment_date) = 2018 
     AND    ph.payment_type ='salary'

Enough Negativity. What About Positive Data Questions?

Must we use a SQL correlated subquery to answer a positive data question? No, you don’t have to. You can still do so if you want to, though. For positive questions, we can usually just use a JOIN condition or a relationship between two tables.

Let's change our previous question to a positive one and solve it with a JOIN instead a correlated subquery. The question becomes: “Obtain the names of employees who received award payments.” And the SQL query (without correlated subqueries) that answers this question is:

SELECT    first_name, last_name 
FROM       employee e1 
                 JOIN payment_history ph ON  ph.employee_id = e1.employee_id 
WHERE    ph.payment_type =award'

The result is:

first_name last_name
Kate Miller

Warning: Your SQL Correlated Subquery Is Likely to Be Slow

I’d just like to mention that we try not to overuse SQL correlated subqueries, if possible. Recall that a correlated subquery runs once for each record returned by the outer query. If the outer query returns thousands upon thousands of records, you can imagine how quickly your query will slow down in performance. In general, you should only use a SQL correlated subquery if it’s absolutely necessary.

Try a SQL Correlated Subquery Yourself!

We’ve seen that correlated subqueries are an important part of the SQL language and can help us answer different data questions, especially negative ones. We also explained how to recognize a SQL correlated subquery and why we should generally try to avoid correlated subqueries, if possible, due to performance reasons.

To improve your subquery skills, try LearnSQL.com’s SQL Basics course. We have a specific section devoted to subqueries, with plenty of practical exercises and correlated subqueries examples to help you master the content.