15th Oct 2024 20 minutes read 19 Aggregate Function Exercises Ekre Ceannmor Aggregate Functions SQL Practice Online Practice Table of Contents What Are Aggregate Functions? Why Practice SQL Aggregations? The Dataset Practice These SQL Aggregate Functions Exercise 1: Books in the System Exercise 2: Non-Returned Books Exercise 3: Books by Genre Exercise 4: Authors by Country Exercise 5: Page Ranges per Genre Exercise 5: Page Ranges per Genre Exercise 6: Genres of Big Books Exercise 7: Modern Genres Exercise 8: Books with Multiple Authors Exercise 9: Each Book’s Latest Loan Exercise 10: Book Loans per Month Exercise 11: Popular Books Exercise 12: Overdue Books Exercise 13: Average Authors per Genre Exercise 14: Number of Pages Read by Patrons Exercise 15: Patrons Without Borrowed Books Exercise 16: Authors and Audiences Exercise 17: The Oldest Books Exercise 18: Most Active Patrons Exercise 19: The Most Productive Author Hungry for More SQL Aggregate Function Practice? Solve these 19 SQL aggregate function exercises and sharpen your SQL skills! Practice using aggregate functions with GROUP BY, HAVING, subqueries, and more. Includes a solution and detailed explanation for each exercise. Aggregate functions are an important part of SQL. They allow you to calculate different statistics and generate reports that you would not have been able to with single-row operations like filtering. It’s important to practice aggregate functions often, as they are a crucial part of your SQL skill set. Let’s start! This article includes 19 brand-new aggregate functions exercises using our library database. We’ll cover grouping with GROUP BY, filtering data with HAVING, using aggregate functions within subqueries, and multi-level aggregation. We’ll also refresh your knowledge of working with JOINs and NULLs and on ordering data by several statistics. Once you’re ready to apply these concepts in real-world scenarios, check out our Basic SQL Reporting course! It covers all the aggregate function concepts you will practice in this article, plus other important topics like using CASE WHEN. We also recommend trying out our monthly challenges in our Monthly SQL Practice Track. It’s a collection of practice exercises – with new ones published every month – specifically designed to keep your SQL skills sharp and up to date. What Are Aggregate Functions? Aggregate functions perform calculations on a set of values and return a single value as a result. Common aggregate functions are: SUM() – Computes the sum of all values in each group. AVG() – Calculates the average value of all the values in each group. COUNT() – Returns the number of values in each group. MIN() and MAX() – Return the smallest and largest values (respectively) in each group. These functions are especially useful when creating reports where you need to calculate various metrics. Need a more detailed explanation? Check out our SQL Aggregate Functions Cheat Sheet, which covers all the aggregate functions, their use cases, and their interactions with GROUP BY. Why Practice SQL Aggregations? Regular practice helps you keep your skills sharp, allowing you to analyze and manipulate data more quickly and efficiently. As you practice more, you will be able to solve more complex problems and better optimize your queries. Practicing SQL is vital if you are pursuing a career as a data analyst, database developer, or any other position that deals with a lot of data. See our GROUP BY exercises article to further refine your reporting skills. The Dataset Let’s take a look at the dataset we will be working with for these practice questions. The dataset consists of five tables: book, author, book_author, patron, and book_loan. Here is the schema: Information about books is stored in the table book. It has the following columns: book_id - A unique ID for each book and the primary key of the table. title - The title of the book publication_year - The year when the book was published. Can be NULL if this is unknown. genre - The genre of the book, e.g., ‘Fantasy’ or ‘Mystery’. pages - The number of pages in the book. Here is a snapshot of the data in the table: book_idtitlepublication_yeargenrepages 119841949Political Fiction328 2Animal Farm1945Political Fiction112 3The Hobbit1937Fantasy310 4The Fellowship of the Ring1954Fantasy423 Information about authors is stored in the table author. It has the following columns: author_id - A unique ID for each author and the primary key of the table. author_name - The author’s full name or pseudonym. country - The author’s country. Here is some of the data from the table: author_idauthor_namecountry 1George OrwellUnited Kingdom 2J.R.R. TolkienUnited Kingdom 3Isaac AsimovUnited States 4Agatha ChristieUnited Kingdom Data about people who borrow books from the library is stored in the table patron. It has the following columns: patron_id - A unique ID for each patron and the primary key of the table. patron_name - The patron’s full name. registration_date - The date when the patron registered in the library system. Here is some of the data in the table: patron_idpatron_nameregistration_date 1Alice Johnson2024-01-15 2Bob Smith2024-03-22 3Charlie Brown2024-05-10 4David Wilson2024-06-01 The many-to-many relationship between authors and the books they wrote is stored in the book_author table. It has the following columns: book_author_id - A unique ID for each book-author pair and the primary key of the table. author_id - The ID of the author. book_id - The ID of the book the author wrote. Here is some of the data in the table: book_author_idauthor_idbook_id 111 212 323 424 The many-to-many relationship between patrons and the books they have borrowed is stored in the book_loan table. It has the following columns: loan_id - A unique ID for each loan and the primary key of the table. book_id - The ID of the book loaned. patron_id - The ID of the patron who borrowed the book. loan_date - The date when the book loan was issued. due_date - The date when the book must be returned. return_date - The actual date when the book was returned. Here is some of the data from the table: loan_idbook_idpatron_idloan_datedue_datereturn_date 1112024-01-202024-02-202024-02-15 2812024-02-012024-03-012024-02-28 3322024-02-102024-03-102024-03-05 4432024-03-152024-04-102024-04-15 Return to this section if you forget table or column names when solving the exercises. If you need syntax hints, have our SQL For Data Analysis Cheat Sheet at hand. It covers all the tools that might help you solve these exercises. You can even download it in PDF format and print it, so it helps you with your future exercises! Practice These SQL Aggregate Functions Solve the exercises on your own, then check out the solutions below each exercise. There are also further explanations for each solution if you get stuck. Exercise 1: Books in the System Exercise: Count the number of books recorded in the database. Solution: SELECT COUNT(book_id) FROM book; Explanation: We use the COUNT() function to get the number of rows in the book table. Note two things. First, the query has no WHERE clause, so no rows are filtered; all rows in the table are counted. Second, the query has no GROUP BY clause. When you use an aggregate function without GROUP BY, all rows are put into one group and the function is applied to all rows in this group. Thus our query counts all rows in the book table. Exercise 2: Non-Returned Books Exercise: Count how many books have not yet been returned (i.e., books that do not have a return date). Solution: SELECT COUNT(loan_id) FROM book_loan WHERE return_date IS NULL; Explanation: Books that have not been returned will not have a return_date in the book_loan table; their return_date is NULL. We use this condition in the WHERE clause to only select the books that have not yet been returned. We use the COUNT() function to count the number of rows in the resulting dataset. Keep in mind that the argument used with the COUNT() function is important. Here we put loan_id as the argument, which tells the database to count all the values in the loan_id columns. Alternatively, we could have used COUNT(*) and simply count all rows in the result set, giving the same result. However, COUNT(return_date) would not be appropriate: the result would be 0. All return_dates in the result are NULL because of the return_date IS NULL condition. Again, there is no GROUP BY in this query so the COUNT() function will count all loans where the return date is empty. Exercise 3: Books by Genre Exercise: For each genre, show the genre name and the number of books in that genre. Solution: SELECT genre, COUNT(book_id) FROM book GROUP BY genre; Explanation: This is the most basic GROUP BY exercise. From the table book, we select the genre. To make sure that the COUNT() function returns a separate result for each genre, we split the dataset into groups using GROUP BY genre. This will create groups based on values in the genre column; books with the same value in genre go into the same group. The COUNT() function will work on each group separately, counting the number of books in each group. If you need a refresher on how to work with GROUP BY and aggregate functions, check out our Complete Overview of GROUP BY and Aggregate Functions. Exercise 4: Authors by Country Exercise: For each country, show its name and the number of authors associated with it. Solution: SELECT country, COUNT(author_id) FROM author GROUP BY country; Explanation: This is another basic GROUP BY exercise. We select data from the table author and group it by the values in the country column. Then we apply COUNT(author_id) to each group to count authors coming from this country. Exercise 5: Page Ranges per Genre Exercise: For each genre, show four columns: the genre name, the minimum and maximum number of pages for books in that genre, and the difference between the largest and the smallest number of pages in each book. Solution: SELECT genre, MIN(pages), MAX(pages), MAX(pages) - MIN(pages) AS difference FROM book GROUP BY genre; Explanation: This is another basic GROUP BY exercise. We select data from the table author and group it by the values in the country column. Then we apply COUNT(author_id) to each group to count authors coming from this country. Exercise 5: Page Ranges per Genre Exercise: For each genre, show four columns: the genre name, the minimum and maximum number of pages for books in that genre, and the difference between the largest and the smallest number of pages in each book. Solution: Explanation: To obtain statistics for each genre, group the data from the book table by the genre column. Use aggregate functions MIN(pages) and MAX(pages) to calculate the minimum and maximum number of pages. In the third column, use MIN(pages) - MAX(pages) to calculate the difference for each group. Finally, rename the last column to difference using AS. Exercise 6: Genres of Big Books Exercise: For each genre, show the average number of pages for all books in that genre. Only show genres where the average book has 250+ pages. Name the average pages column avg_pages. Solution: SELECT genre, AVG(pages) AS avg_pages FROM book GROUP BY genre HAVING AVG(pages) >= 250; Explanation: This exercise is similar to the previous one: we group the books by genre and compute the average number of books in each genre using AVG(). However, there’s one new element here: HAVING. HAVING is used to filter groups and find groups for which an aggregate function satisfies a certain condition. In our case, we look for groups (genres) where the average number of pages is higher than or equal to 250. Remember that HAVING works differently than WHERE. WHERE is used to filter individual rows before grouping, while HAVING is used to filter rows after grouping. You can read about the difference between HAVING and WHERE in our article HAVING vs. WHERE in SQL: What You Should Know. Exercise 7: Modern Genres Exercise: Show the average publication year for each genre of books. Round the year to an integer. Only show genres where the average publication year is after 1940. Solution: SELECT genre, ROUND(AVG(publication_year)) FROM book GROUP BY genre HAVING ROUND(AVG(publication_year)) > 1940; Explanation: This exercise is similar to the previous one: we group books by genre and compute the average publication year using AVG(publication_year). Then we filter for genres with an average aggregation year higher than 1940 using HAVING. Exercise 8: Books with Multiple Authors Exercise: For books that have been written by more than one author, show each book’s title and number of authors. Solution: SELECT title, COUNT(author_id) FROM book b JOIN book_author ba ON b.book_id = ba.book_id GROUP BY b.book_id HAVING COUNT(author_id) > 1; Explanation: First, we have to find the authors for each book. To that end, we join the tables book and book_author on their common book_id. This will combine data for each book with the data for its authors: one row for each book-author combination. We then group rows by book_id: all rows related to the same book are in the same group. Thus, all authors for each book will be in the same group. We then apply the function COUNT(author_id) to count the authors in each group. Finally, we filter out any books with just one author by using HAVING COUNT(author_id) > 1. Exercise 9: Each Book’s Latest Loan Exercise: For each book, show its title and the most recent date it was loaned. Name the second column last_loaned. Show NULL in the second column for all books that have never been loaned. Solution: SELECT book.title, MAX(book_loan.loan_date) AS last_loaned FROM book LEFT JOIN book_loan ON book.book_id = book_loan.book_id GROUP BY book.book_id, book.title; Explanation: Use a LEFT JOIN to join the book and book_loan tables to make sure that books that have never been loaned are also included in the result set. Group the results by book_id and book_title. Note that you should not just group the result by the title; if two books have the same title, they would be mistakenly put in the same group. Group by book_id (since it uniquely identifies each book) and title (because SQL throws an error if an unaggregated column in SELECT is not put in GROUP BY). You can read about it in our article 7 Common GROUP BY Errors. To get the latest loan_date, use MAX(loan_date). The later dates are treated as “bigger”. If there are no book loans for this book, all of its loan dates will be NULL and the MAX() function will return NULL for this book. Exercise 10: Book Loans per Month Exercise: Show how many book loans were issued each month of each year. Show three columns: The year and month part of the loan_date as numbers in the first two columns. Name them loan_year and loan_month A column counting how many books were loaned out that month. Order the result by the year and then the month, showing older dates first. Solution: SELECT EXTRACT(YEAR FROM loan_date) AS loan_year, EXTRACT(MONTH FROM loan_date) AS loan_month, COUNT(loan_id) FROM book_loan GROUP BY EXTRACT(MONTH FROM loan_date), EXTRACT(YEAR FROM loan_date) ORDER BY loan_year, loan_month; Explanation: We use EXTRACT(YEAR FROM loan_date) and EXTRACT(MONTH FROM loan_date) to get the year and month parts from the loan_date. We again use EXTRACT() in GROUP BY to group loans from the same months together. We use the COUNT() function to calculate the number of loans done in each month. Finally, we order the results by the loan_year and loan_month. Note that you can use the column aliases in the ORDER BY statement. It operates on the final result set when the column names are known. In the GROUP BY, however, you still have to use the functions; when this clause is processed, the EXTRACT() function (and thus the new columns) have not yet been defined. You can read more about this topic in our article SQL Order of Operations. Exercise 11: Popular Books Exercise: For each book, show its title, the number of times it has been loaned, and the number of different patrons who have borrowed the book. Name the last two columns times_loaned and different_patrons. Solution: SELECT title, COUNT(loan_id) AS times_loaned, COUNT(DISTINCT patron_id) AS different_patrons FROM book b LEFT JOIN book_loan bl ON b.book_id = bl.book_id GROUP BY b.title, b.book_id; Explanation: To find book loans for each book, you have to join tables book and book_loan. Use a LEFT JOIN to make sure books that have never been loaned out also show up in the result. We want to group loans for each book together, so we have to group by both book_id and the book title (for the same reason as we discussed in Exercise 9). We want to count the number of times the book was loaned and the number of different patrons borrowed the book. To do this, we have to use the COUNT() function twice. First, we use COUNT(loan_id) to count the number of loans for the book. The second usage of COUNT() is more interesting: we want to count the different patrons who borrowed the book. If someone borrowed the same book multiple times, we want to count them only once. Thus we use COUNT(DISTINCT patron_id). Using DISTINCT will ensure that even if one patron borrowed the same books several times, their ID will only be counted once. You can read more about this by reading What is the Difference Between COUNT(*), COUNT(1), COUNT(column), and COUNT(DISTINCT)? Exercise 12: Overdue Books Exercise: For each patron, show their name and the amount of books they have (had) overdue (i.e. with a return date after the due date). Solution: SELECT patron_name, COUNT(book_id) AS overdue_books FROM patron p LEFT JOIN book_loan bl ON p.patron_id = bl.patron_id AND return_date > due_date GROUP BY p.patron_id, patron_name; Explanation: Join patron and book_loan using a LEFT JOIN to make sure that patrons who don’t have any overdue book loans are also included in the result. To only select loans where the return date is after the due date, use a combined joining condition: ON p.patron_id = bl.patron_id AND return_date > due_date. The first part will only join the rows that are actually related. The second part is used as an additional filter to only join in places where return_date > due_date. Note that this is different from using a WHERE clause later in the query. The WHERE clause will discard any rows where loan_id IS NULL. However, we want to keep those rows to include patrons who don’t have any overdue books. We group rows by the patron_id and patron_name (for the same reasons as in Exercise 9). Finally, we use COUNT(book_id) to count overdue books for each patron. The COUNT() will return 0 for patrons who have never made any loans and for patrons who always returned their books on time. Exercise 13: Average Authors per Genre Exercise: For each genre, show its name and the average number of authors books of that genre have. Name the second column average_authors_per_book Solution: WITH number_of_authors AS ( SELECT book_id, COUNT(author_id) AS author_count FROM book_author GROUP BY book_id ) SELECT genre, AVG(author_count) AS average_authors_per_book FROM number_of_authors na JOIN book b ON na.book_id = b.book_id GROUP BY genre; Explanation: Here we use a construction called a common table expression (CTE). You can read more about CTEs in our Guide to Common Table Expressions. In short, a CTE allows you to create a named temporary result set you can use in the query. You create a CTE using this syntax: WITH <cte_name> AS (query) Any query within the parenthesis will act as a virtual table named cte_name and will be accessible to the main query (the SELECT statement after the CTE’s closing parenthesis). In the CTE, we compute the number of authors for each book. We select the book_id and the count of authors. This is a similar query to the one in Exercise 8. In the outer query, we join our number_of_authors CTE with the book table to display the genre of each book. Then we use AVG(author_count) and GROUP BY genre to get the final result. If you don’t want to use a CTE, you can get the same result using a subquery: SELECT genre, AVG(author_count) AS average_authors_per_book FROM (SELECT book_id, COUNT(author_id) AS author_count FROM book_author GROUP BY book_id) AS na JOIN book b ON na.book_id = b.book_id GROUP BY genre; Exercise 14: Number of Pages Read by Patrons Exercise: For each patron, show their name and the total number of pages they’ve read (i.e. the page count from all the books that they have borrowed). We’re assuming they read each book entirely. Include all books, even those they have not yet returned. Only show results for patrons that have read more than 1,000 pages. Solution: SELECT patron_name, SUM(pages) AS total_pages_read FROM book b JOIN book_loan bl ON b.book_id = bl.book_id JOIN patron p ON p.patron_id = bl.patron_id GROUP BY p.patron_id, p.patron_name HAVING SUM(pages) > 1000; Explanation: Join three tables using a regular JOIN: patron, book_loan, and book. Use SUM(pages) to sum up the number of pages across all books that the patron has borrowed. Filter with HAVING SUM(pages) > 1000 to only show patrons who have read more than 1000 pages. Note: Since we want to show the patrons who have read more than 1,000 pages, there’s no need to use a LEFT JOIN or a FULL JOIN here. The patrons who read 0 pages will be filtered out with our HAVING condition anyway. Exercise 15: Patrons Without Borrowed Books Exercise: Show the total number of patrons who have never borrowed books. Solution: SELECT COUNT(p.patron_id) FROM patron p WHERE NOT EXISTS ( SELECT * FROM book_loan WHERE patron_id = p.patron_id ); Explanation: To only show patrons who have never borrowed books, filter the result with a WHERE NOT EXISTS clause. For any patron who has never borrowed books, there will not exist a book_loan entry with that patron’s id. Use a subquery to find a set of book loans for each patron, then use the result of that subquery in the WHERE NOT EXISTS clause. This will make sure that all patrons in the resulting set do not have any corresponding book_loans. Finally, use the COUNT() function to count the selected patron_ids. Exercise 16: Authors and Audiences Exercise: For each author, show how many different patrons have borrowed their book. Solution: SELECT a.author_name, COUNT(DISTINCT patron_id) AS distinct_patrons FROM author a JOIN book_author ba ON a.author_id = ba.author_id JOIN book b ON b.book_id = ba.book_id LEFT JOIN book_loan bl ON bl.book_id = b.book_id GROUP BY a.author_id, a.author_name; Explanation: Join four tables: author, book_author, book, and book_loan. Use a regular JOIN with the first three tables and a LEFT JOIN between book and book_loan. The LEFT JOIN will ensure that even if the book has never been loaned, it will still be shown in the result. SELECT the author’s name and use COUNT(DISTINCT patron_id) to count all the different patrons that have borrowed books. If the author’s books were never borrowed, COUNT() will return 0. Group the result by the author’s ID and name to avoid the errors we talked about earlier. Exercise 17: The Oldest Books Exercise: Find the oldest book(s) in the database (i.e. the book(s) with the oldest publication_year). Show only two columns: title and publication_year. Remember that there may be more than one book with the oldest publication year. Solution: SELECT title, publication_year FROM book WHERE publication_year = ( SELECT MIN(publication_year) FROM book ); Explanation: Use a subquery to find the oldest books. Only select the title and publication year of books that have the publication_year equal to the lowest publication year in the system. You can find the earliest publication_year with MIN(publication_year). Use this expression within a subquery, then compare the publication_year of each book to the result of the subquery. Exercise 18: Most Active Patrons Exercise: Find the names of all patrons who have borrowed an above-average number of books. Show the number of books they borrowed together with their name. Solution: SELECT patron_name, COUNT(*) AS loan_count FROM patron JOIN book_loan ON patron.patron_id = book_loan.patron_id GROUP BY patron_name HAVING COUNT(*) > ( SELECT COUNT(*) FROM book_loan ) / ( SELECT COUNT(*) FROM patron ); Explanation: Join the patron and book_loan tables and group the results by the patron’s name and ID. To only show patrons that have borrowed an above-average number of books, use the HAVING clause comparing the current patron’s loan count with the average number of books borrowed per patron. Find that average by dividing the total number of loans by the total number of patrons. Because / in SQL is integer division (meaning the remainder is discarded) use > (greater than), not >= (greater or equal) to compare the values in the HAVING clause. Exercise 19: The Most Productive Author Exercise: Find the author that wrote the most books. Solution: WITH authors_books_count AS ( SELECT author_id, COUNT(*) AS book_count FROM book_author GROUP BY author_id ) SELECT author_name, book_count FROM author JOIN authors_books_count abc ON author.author_id = abc.author_id WHERE book_count = ( SELECT MAX(book_count) FROM authors_books_count ); Explanation: In the CTE authors_books_count, we find the number of books each author has written using that author’s id. In the outer query, we select the authors whose book count is the same as the maximum book count. We use a subquery and the MAX() function to select the maximum book count from the CTE and compare it to the book_count for each author. Hungry for More SQL Aggregate Function Practice? And that marks the end of this SQL aggregate function practice set. Now your aggregate function skills are up to date! You can check your theoretical knowledge with these GROUP BY Interview Questions. We’ve covered different aggregate functions, GROUP BY, HAVING and so much more! We’ve practiced different types of JOINs, subqueries, and working with NULLs. Hungry for more? Check out the exercises in our huge SQL Practice Track; it has more than 100 hours of SQL practice! Or try our Monthly SQL Practice, which provides a new SQL practice course each month! Tags: Aggregate Functions SQL Practice Online Practice