Back to articles list Articles Cookbook
5 minutes read

SQL Subqueries

The article describes what a subquery is and what these useful statements look like. We will cover basic examples with the IN, EXISTS, ANY, and ALL operators, look at subqueries in FROM and WHERE clauses, and explore the difference between correlated and nested subqueries.

First, let’s start with an example database. To present some of these statements we need to have an example table and fill it with some data.

 

What is a Subquery?

A subquery is a SELECT statement with another SQL statement, like in the example below.

SELECT *
FROM product
WHERE id IN (
  SELECT product_id
  FROM provider_offer
  WHERE provider_id = 156
);

Question: What is an SQL Subquery?

A SQL subquery is a query within another query; this structure allows complex data retrieval across multiple tables in a single command. Subqueries are used in WHERE, FROM, and SELECT clauses, simplifying queries and supporting in-depth data analysis.

Subqueries are further classified as either a correlated subquery or a nested subquery. They are usually constructed in such a way to return:

  1. a table
    SELECT MAX(average.average_price)
    FROM (
      SELECT
        product_category,
        AVG(price) AS average_price
      FROM product
      GROUP BY product_category
    ) average;
    
  2. or a value
    SELECT id
    FROM purchase
    WHERE value > (
      SELECT AVG(value)
      FROM purchase
    );
    

Want to know more about SQL Subqueries? Check out our We Learn SQL series on Youtube. Remember to subscribe to our channel.

Nested Subqueries

Nested subqueries are subqueries that don’t rely on an outer query. In other words, both queries in a nested subquery may be run as separate queries.

This type of subquery could be used almost everywhere, but it usually takes one of these formats:

SELECT
FROM
WHERE [NOT] IN (subquery)
SELECT *
FROM client
WHERE city IN (
  SELECT city
  FROM provider
);

The example subquery returns all clients that are FROM the same city as the product providers.
The IN operator checks if the value is within the table and retrieves the matching rows.

SELECT
FROM
WHERE expression comparison_operator [ANY| ALL] (subquery)

Subquery With ALL Operator

The ALL operator compares a value to every value FROM the result table.

For example, the following query returns all of the models and producers of bikes that have a price greater than the most expensive headphones.

SELECT producer, model
FROM product
WHERE product_category = 'bike'
  AND price > ALL(
    SELECT price
    FROM product
    WHERE product_category = 'headphones'
  );

Similar subquery but with ANY operator:

Subquery With ANY Operator

The ANY operator compares a value to each value in a table and evaluates whether or not the result of an inner query contains at least one row.

The following query returns all of the models and producers of bikes that have a price greater than at least one of the headphones.

SELECT producer, model
FROM product
WHERE product_category = 'bike'
  AND price > ANY(
    SELECT price
    FROM product
    WHERE product_category = 'headphones'
  );

You can also nest a subquery in another aubquery. For example:

Subquery Nested in Another Subquery Using IN Operator

This query returns producers and models of bikes that exist in provider’s offers FROM the USA.

SELECT producer, model
FROM product
WHERE product_category = 'bike'
  AND id IN (
    SELECT distinct product_id
    FROM provider_offer
    WHERE provider_id IN (
      SELECT id
      FROM provider
      WHERE country = 'USA'
    )
  );

The same could be done using joins.

SELECT product.producer, product.model
FROM product, provider_offer, provider
WHERE provider_offer.product_id = product.id
  AND provider_offer.provider_id = provider.id
  AND product_category = 'bike'
  AND provider.country = 'USA';

Correlated Subqueries

Subqueries are correlated when the inner and outer queries are interdependent, that is, when the outer query is a query that contains a subquery and the subquery itself is an inner query. Users that know programming concepts may compare it to a nested loop structure.

Let’s start with a simple example.

The inner query calculates the average value and returns it. In the outer query’s WHERE clause, we filter only those purchases which have a value greater than the inner query’s returned value.

Subquery Correlated in WHERE Clause

SELECT id
FROM purchase p1
WHERE date > '2013-07-15'
  AND value > (
    SELECT AVG(value)
    FROM purchase p2
    WHERE p1.date = p2.date
  );

The query returns purchases after 15/07/2014 with a total price greater than the average value FROM the same day.

The equivalent example, but with joining tables.

SELECT  p1.id
FROM purchase p1, purchase p2
WHERE p1.date = p2.date
  AND p1.date > '2013-07-15'
GROUP BY p1.id
HAVING p1.value > AVG(p2.value);

This example can also be written as a SELECT statement with a subquery correlated in a FROM clause.

The subquery returns the table that contains the average value for each purchase for each day. We join this result with the Purchase table on column ‘date’ to check the condition date > ’15/07/2014′.

SELECT id
FROM
  purchase,
  (
    SELECT date, AVG(value) AS average_value
    FROM purchase
    WHERE date > '2013-07-15'
    GROUP BY date
  ) average
WHERE purchase.date  = average.date
  AND purchase.date  > '2013-07-15'
  AND purchase.value > average.average_value;

Usually, this kind of subquery should be avoided because indexes can’t be used on a temporary table in memory.

Subquery With EXISTS

SELECT
FROM
WHERE [NOT] EXISTS (subquery)

The EXISTS operator checks if the row FROM the subquery matches any row in the outer query. If there’s no data matched, the EXISTS operator returns FALSE.

This Query returns all clients that ordered after 10/07/2013.

SELECT id, company_name
FROM client
WHERE EXISTS(
  SELECT *
  FROM purchase
  WHERE client.id = purchase.client_id
  WHERE date > '2013-07-10'
);

When a subquery is used, the query optimizer performs additional steps before the results FROM the subquery are used. If a query that contains a subquery can be written using a join, it should be done this way. Joins usually allow the query optimizer to retrieve the data in a more efficient way.

Subquery Limitations

There are some limitations of using subqueries:

  • In Oracle you can nest up to 255 levels of subqueries in a WHERE clause.
  • In SQL Server you can nest up to 32 levels.