Back to articles list Articles Cookbook
13 minutes read

10 Correlated Subquery Exercises with Solutions

Correlated subqueries are a powerful SQL feature essential for advanced data analysis. This article provides 10 practical exercises to help you master correlated subqueries.

SQL is a fundamental skill for anyone working with data, whether as a data analyst, a SQL developer, a data engineer, or any other related profession. Mastering SQL involves more than just understanding the basics. It also requires learning advanced SQL features, such as subqueries.

A subquery is a SQL query nested inside a larger query. There are many different types of subqueries. A correlated subquery is a type of subquery that refers to the outer query and it cannot be executed independently. Subqueries, and particularly correlated subqueries, can be a challenging topic for SQL learners.

Previously, we provided a collection of subquery exercises in our article: SQL Subquery Practice: 15 Exercises with Solutions. In this article, we want to give you a set of practice exercises specifically on the topic of correlated subqueries. The exercises in this article are taken from our interactive Basic SQL Practice: Store and Basic SQL Practice: University courses. Both of these courses are part of "SQL Practice" track, which offers multiple SQL practice courses. All courses in the track are based on real-world scenarios and are organized by SQL topics to help you focus on your area of interest.

Exercises on a Store Database

The first set of exercises is based on the database of an online store. These exercises are taken from our course Basic SQL Practice: Store. We’ll start small and first work with only two tables: product and category.

The product table contains a list of products available in the store.

  • product_id – the ID of the product.
  • product_name – the name of the product.
  • category_id – the ID of the product's category. It helps you connect with the category table.
  • quantity_per_unit – the quantity of product items in one unit.
  • unit_price – the price of the product
  • discontinued – the information about whether the product is still available in the store (the FALSE value), or if it has been discontinued (TRUE).

The category table contains information about the categories of the products:

  • category_id – the ID of the category.
  • name – the category name
  • description – the optional description of the category

Exercise 1: Expensive products in each category

Exercise: Find products that are more expensive than the average price of products in their own category. Include the category name, product name, and unit price in the result.

Solution:

SELECT
  c.name,
  p.product_name,
  p.unit_price
FROM category AS c
JOIN product AS p
  ON c.category_id = p.category_id
WHERE p.unit_price > (
  SELECT AVG(unit_price)
  FROM product
  JOIN category
    ON product.category_id = category.category_id
  WHERE category.category_id = c.category_id
  );

Solution explanation:

In the solution, we use a correlated subquery to calculate the average unit price for products within the same category as the product being processed by the outer query. This subquery references the category, denoted as c, from the outer query. It identifies products in this category and computes their average price. Products that cost more than the average for their category are filtered out using this average. The outer query then retrieves and displays the category name, product name, and unit price of these products.

Exercise 2: The number of expensive products in each category

Exercise: Show the names of categories and the number of products from this category with a unit price greater than the average price of a product in this category. Show only the categories that have such product(s). Display two columns: name (the name of the category), and expensive_products (the number of products that cost more than the average product in this category).

Solution:

SELECT
  c.name,
  COUNT(*) AS expensive_products
FROM category AS c
JOIN product AS p
  ON c.category_id = p.category_id
WHERE p.unit_price > (
  SELECT
    AVG(unit_price)
  FROM product
  JOIN category
    ON product.category_id = category.category_id
  WHERE category.category_id = c.category_id
  )
GROUP BY
  c.category_id,
  c.name;

Solution explanation:

This exercise is similar to the previous one. We also need to find products with unit price higher than the average price for their category. However, this time we want to count products in each category.

In the solution, we use the same correlated subquery as before: we calculate the average unit price for products in the same category as the product being processed by the outer query. This average is then used in the main query to filter only those products whose unit price is higher than the category's average. The main query counts these products using COUNT(*) and groups the results by category with GROUP BY. The final result displays the category name and the count of products.

Exercise 3: Discontinued, continued, and all products in a category

Exercise: For each category, display its name, the number of discontinued products in this category (discontinued_products), the number of continued products in this category (continued_products), and the number of all products in this category (all_products).

Solution:

SELECT
  c.name,
  (SELECT COUNT(*) FROM product WHERE category_id = c.category_id AND discontinued IS TRUE) AS discontinued_products,
  (SELECT COUNT(*) FROM product WHERE category_id = c.category_id AND discontinued IS FALSE) AS continued_products,
  (SELECT COUNT(*) FROM product WHERE category_id = c.category_id) AS all_products
FROM category c;

Solution explanation:

Here we use three correlated subqueries in the SELECT statement to get counts of products based on their continued/discontinued status for each category. Each subquery correlates with the main query through the category_id so that the counts are correct for each category.

The first subquery counts the number of products in a category where the discontinued flag is TRUE. This gives us a total count of all products that are no longer available under each category. The second subquery counts the number of products that are still active in each category, using a similar method. The third subquery simply counts the number of all products in a category.

This exercise could alternatively be solved without subqueries, by using a combination of CASE WHEN with SUM and GROUP BY. As an alternative exercise: can you solve this exercise both ways?

More Tables in the Store Database: Customer and Purchase Tables

In the following exercises, we’ll work with additional tables in the store database related to purchases: customer, purchase, and purchase_item.

The customer table contains the information about the customers. It has the following columns:

  • customer_id – the ID of the customer.
  • contact_name – the full name of the customer.
  • contact_email – the customer email.

The purchase table contains the information about each order:

  • purchase_id – the ID of the purchase.
  • customer_id – the ID of the customer.
  • total_price – the total price of the order.
  • purchase_date – the timestamp of the purchase.

The purchase_item table connects the purchases with the products. The table contains the following columns:

  • purchase_id – the purchase ID.
  • product_id – the ID of the purchased product.
  • unit_price – the price of one unit of a product.
  • quantity – the number of purchased units of a product.

Exercise 4: Products in categories

Exercise: For each purchased product, display the product name, the maximum quantity in which it was purchased, and the number of such purchases (of this product in this maximum quantity). Display three columns: product_name, quantity, and purchases_number.

Solution:

SELECT 
  product_name,
  quantity,
  COUNT(purchase_id) AS purchases_number
FROM purchase_item pi
JOIN product p
  ON pi.product_id = p.product_id 
WHERE quantity = (SELECT MAX(quantity) FROM purchase_item WHERE product_id = pi.product_id)
GROUP BY
  pi.product_id,
  product_name,
  quantity;

Solution explanation:

Here we want to find the maximum quantity in which each product was purchased and how many times such purchases occurred. We use a correlated subquery and GROUP BY to achieve this result.

First, we use a correlated subquery in WHERE to determine the maximum quantity in which each product has been purchased. Then in the outer query we use this value to find purchases where the quantity is equal to this maximum quantity. Finally, the outer query groups the results by product_id, product_name, and quantity and uses the aggregate function COUNT(purchase_id) to compute the number of purchases for each product at the maximum quantity.

Exercise 5: Percentage of money spent by the customer on the purchase

Exercise: For each customer who made any purchase, display the ID of each purchase made by this customer, the percentage of the money spent on that purchase relative to all the money spent by that customer. Round the percentages to integers. Show three columns: contact_name, purchase_id and percentage.

Solution:

SELECT
  contact_name,
  purchase_id,
  ROUND(total_price * 100.0 / (SELECT SUM(total_price) FROM purchase WHERE customer_id = p.customer_id)) AS percentage
FROM purchase p
JOIN customer 
  ON p.customer_id = customer.customer_id;

Solution explanation:

In the outer query we list each purchase done by each customer. We join purchase and customer tables to display the customer contact name, and the purchase ID. We use a correlated subquery to find out the total amount of money spent by the current customer. We use the amount found by the subquery to compute what percentage of the total spending that the current purchase represents.

Exercise 6: Customers with purchases above their average purchase amount

Exercise: Find customers whose last purchase total was higher than their average purchase amount. Display the customer name and last purchase total.

Solution:

SELECT 
  c.contact_name, 
  p.total_price AS last_purchase_total
FROM purchase p
JOIN customer c 
ON p.customer_id = c.customer_id
WHERE p.purchase_date = (
    SELECT MAX(lp.purchase_date)
    FROM purchase lp
    WHERE lp.customer_id = p.customer_id
  )
AND p.total_price > (
    SELECT AVG(ap.total_price)
    FROM purchase ap
    WHERE ap.customer_id = p.customer_id
);

Solution explanation:

This exercise needs two subqueries. The first subquery is used to find the date of the most recent purchase for each customer, similar to the subquery in Exercise 4, but focusing on the transaction date rather than quantities. The second subquery calculates the average total price of the customer’s purchases and is used to filter out purchases where the price is higher than the average. Both subqueries are combined in the WHERE clause using an AND operator.

Exercises on University Data Model

The second set of exercises in this article is based on the database of a university. These exercises are taken from our course Basic SQL Practice: University. The university database has 4 tables.

The course table contains information about the courses offered at the university and the learning paths they are associated with:

  • id – A unique ID for each course.
  • title – The name of the course.
  • lecture_hours – Total number of lecture hours in the course.
  • tutorial_hours – Total number of tutorial hours in the course.

The student table contains all the information about each student attending the university:

  • id – A unique ID for each student.
  • first_name – The student's first name.
  • last_name – The student's last name.

The course_edition table contains information on which lecturers teach each course in each semester:

  • id – The ID of the course edition.
  • course_id – The ID of the course.
  • academic_semester – The calendar year and term (fall or spring) of the semester
  • lecturer – the name of the lecturer who teaches the course.

The course_enrollment table contains the information about the student enrolled in a course edition:

  • course_edition_id – The ID of the course edition.
  • student_id – the ID of the student.
  • midterm_grade – The midterm grade the student received.
  • final_grade – The final grade the student received.
  • course_letter_grade – The course grade, in the form of a letter (A+, A, A-, B+, B, B-, C+, C, C-, D+, D, D-, or F).
  • passed – Whether the student passed (TRUE) or failed (FALSE) the course.

If a student has not taken their exam yet, you may see some NULL fields in the table.

Exercise 7: Students with grade above the average

Exercise: Find students who got an above average final grade in any of the course editions they attended. Show: student first and last name, course title, and their final_grade. Only show students whose final grade in this course edition was higher than the average final grade in this course edition.

Solution:

SELECT
  first_name,
  last_name,
  title,
  final_grade
FROM course_enrollment AS c_en
JOIN student AS s
  ON c_en.student_id = s.id
JOIN course_edition AS c_ed
  ON c_en.course_edition_id = c_ed.id
JOIN course AS c
  ON c_ed.course_id = c.id
WHERE final_grade > (
  SELECT AVG(final_grade)
  FROM course_enrollment AS c_e
  WHERE c_e.course_edition_id = c_en.course_edition_id
);

Solution explanation:

The solution uses a single correlated query. In the correlated subquery we find the average final grade for the course edition processed in the main query. We then use the average found in the subquery to filter course enrollments where the student’s final grade was above the average. In the main query we join tables course_enrollment, student, course_edition, and course to combine the first and last name of the student with the course title and the grade they received.

Exercise 8: Number of students with same grades or above

Exercise:For each final grade in the course_enrollment table, count the number of students who got exactly that grade or above it. Show two columns: final_grade and students_number.

Solution:

SELECT DISTINCT
  final_grade,
  (SELECT COUNT (student_id)
   FROM course_enrollment
   WHERE final_grade >= c.final_grade) AS students_number
FROM course_enrollment c;

Solution explanation:

The main query selects each unique (distinct) final grade from the course_enrollment table. For each of these grades, the correlated subquery counts the number of students whose final grades are equal to or higher than the grade being processed by the outer query. The result is displayed in two columns: final_grade, which shows the grade under consideration, and students_number, which indicates the number of students who achieved that grade or a higher one.

Exercise 9: Students who passed and those who didn’t

Exercise: Divide the students from each course edition into two groups: those who passed the course and those who didn't. For each group, display the following columns:

  • course_edition_id
  • passed
  • average_final_grade – The average for students in this group (either passed or not passed), rounded to 2 decimal places.
  • average_edition_grade – The overall average for students in this course edition, rounded to 2 decimal places.

Solution:

SELECT
  course_edition_id,
  passed,
  ROUND(AVG(final_grade), 2) AS average_final_grade,
  (SELECT ROUND(AVG(final_grade), 2)
         FROM course_enrollment
         WHERE course_edition_id = c_e.course_edition_id) AS average_edition_grade
FROM course_enrollment c_e
GROUP BY 
  course_edition_id, 
  passed;

Solution explanation:

The outer query selects the course edition ID, passed value, and the average final grade for this group of students (either 'passed' or 'not passed') using the regular GROUP BY clause. It also uses a correlated subquery to find the overall average final grade for the course edition processed in the outer query and adds this average to the result of the query. This way, you can compare the final grade among each group with the overall average.

Exercise 10: Averages again

Exercise: For each course edition, show the following data:

  • title
  • average_result – The average final grade for students with a final grade higher than the course edition average. Round the result to 0 decimal places (i.e. an integer).
  • results_better_than_average – The number of students whose final grade is greater than the average final grade for this course edition.

Solution:

SELECT
  c.title,
  ROUND(AVG(final_grade)) AS average_result,
  COUNT (student_id) AS results_better_than_average
FROM course_enrollment AS c_en
JOIN course_edition AS c_ed
  ON c_en.course_edition_id = c_ed.id
JOIN course AS c
  ON c_ed.course_id = c.id
WHERE final_grade > (
  SELECT AVG(final_grade)
  FROM course_enrollment AS c_en2
  WHERE c_en2.course_edition_id = c_en.course_edition_id
)
GROUP BY 
  c.id, 
  c.title;

Solution explanation:

In the correlated subquery we compute the average final grade for the course edition processed by the outer query. We then used this value to filter students whose final grade in this course edition was above the average. In the outer query we display the title of the course, the number of students with the final grade higher than the average, and the average final grade for this group of students.

Conclusion and Next Steps

In this article, we've explored several correlated subquery exercises to improve your understanding of this important SQL feature. For those looking to further develop their SQL skills, we recommend our "SQL Practice" track. It includes 10 different SQL practice courses. Each course is filled with hands-on exercises based on real-world scenarios and is organized by SQL topics to help you focus on your area of interest

Additionally, we recommend checking out our All Forever SQL Package – the ultimate deal for SQL learners. This one-time payment offer provides lifetime access to all current and future courses in our catalog, including everything in the 'SQL Practice' track. It’s an invaluable resource for anyone serious about becoming a SQL expert.