11th Nov 2021 13 minutes read What Are Aggregate Functions in SQL, and How Do I Use Them? Martyna Sławińska sql learn sql aggregate functions Table of Contents How Aggregate Functions Work The * Argument to Aggregate Functions The DISTINCT Keyword The CASE Statement What Happens to NULLs? The Role of HAVING and GROUP BY With Aggregate Functions Let’s Practice! Sample Database Examples With COUNT() Examples With SUM() Examples With AVG() Examples With MAX() and MIN() SQL Aggregate Functions as an Essential Tool in Data Science Data is your source of knowledge. And thanks to SQL aggregate functions, you can extract the precise knowledge you need from your data efficiently. Read along to find out more. The core SQL aggregate functions are the following: COUNT(column_name | *) returns the number of rows in a table. SUM(column_name) returns the sum of the values of a numeric column. AVG(column_name) returns the average value of a numeric column. MIN(column_name) returns the minimum value of a selected column. MAX(column_name) returns the maximum value of a selected column. In this article, we’ll discuss each of these with examples. You’ll find out what happens to NULLs and duplicates when subjected to aggregate functions. Furthermore, we'll explain *, the DISTINCT keyword, and the CASE statement. Let’s get started! How Aggregate Functions Work SQL aggregate functions accumulate data from multiple rows into a single summary row. The accumulated value is based on the values from the column passed as an argument. We can group the rows using a GROUP BY clause and further filter them using a HAVING clause. A standard example is finding the number of rows in a table. Here, we aggregate all rows of the Books table into one row. SELECT * FROM Books; IdAuthorTitlePrice 234Anthony MolinaroSQL Cookbook20.00 235Alan BeaulieuLearning SQL25.00 236Donald KnuthThings a Computer Scientist Rarely Talks About25.00 237Donald KnuthThe Art of Computer Programming27.00 SELECT COUNT(*) AS NumberOfBooks FROM Books; NumberOfBooks 4 Easy, right? Using COUNT(*), you can count the number of all rows. You can also have a look at another articles on SQL aggregate functions here. The * Argument to Aggregate Functions If you ask me what * stands for in SQL, my response is that it stands for all. It is commonly used with a SELECT statement when querying all the columns of a given table. Take, for example, SELECT * FROM Books, as above. The * argument can also be used with the COUNT() aggregate function. It then counts all rows of a table. Take, for example, SELECT COUNT(*) as NumberOfBooks FROM Books, as above. You can group data by some column or even by many columns. Take a look at the following example: SELECT Author, COUNT(*) AS NumberOfBooks FROM Books GROUP BY Author; AuthorNumberOfBooks Anthony Molinaro1 Alan Beaulieu1 Donald Knuth2 This counts the number of books by author. The * argument applies only to the COUNT() aggregate function. For other aggregate functions, a specific column, or a combination of columns, is required as an argument. The DISTINCT Keyword The DISTINCT keyword tells the database that we don't want to consider duplicate values. For example, COUNT(Author) gets us the number of all authors present in a table. But if the same author appears many times in a column, the author is counted many times. Take a look at this: SELECT COUNT(Author) AS NumberOfAuthors FROM books; NumberOfAuthors 4 See? It counts four authors because Donald Knuth is counted twice. What happens if we add the DISTINCT keyword? SELECT COUNT(DISTINCT Author) AS NumberOfAuthors FROM Books; NumberOfAuthors 3 This time, we use a DISTINCT keyword. Now, Donald Knuth is counted only once. When using the DISTINCT keyword, COUNT() must take a specific column as an argument. It returns the number of unique values stored in that column. Similarly, we can use the DISTINCT keyword with arguments of SUM() and AVG() aggregate functions. Below, we compare the results of executing the SUM() function with and without the DISTINCT keyword. SELECT SUM(DISTINCT Price) AS TotalDistinctPrice FROM Books; TotalDistinctPrice 72 SELECT SUM(Price) AS TotalPrice FROM Books; TotalPrice 97 As you can see, when you use a DISTINCT keyword, the books with the same price are considered only once in SUM(). In this case, it makes more sense to use the SUM() function without the DISTINCT keyword. Similarly, when calculating an average price, it's better not to use the DISTINCT keyword; we should consider each price as many times as it appears in the column. See what happens with AVG(): SELECT AVG(DISTINCT Price) AS TotalDistinctAvg FROM Books; TotalDistinctAvg 24 SELECT AVG(Price) AS TotalAvg FROM Books; TotalAvg 24.25 For the MIN() and MAX() aggregate functions, the DISTINCT keyword doesn’t make a difference. But it doesn’t cause any errors either. Why? Let’s consider a set of numbers {1, 2, 2, 3, 4, 5, 5, 6}. Its maximum and minimum values are 6 and 1, respectively. With the DISTINCT keyword, this set becomes {1, 2, 3, 4, 5, 6}, so the maximum and minimum values are still the same. The CASE Statement The CASE statement categorizes and filters data. It is like a gatekeeper for the argument of an aggregate function, deciding which values to let in. Let’s look at some examples to illustrate this concept. In the following query, we use a CASE statement as an argument for the COUNT() function. It counts only the books whose price is higher than $20.00. SELECT COUNT(CASE WHEN Price > 20 THEN Price END) AS NumberOfExpensiveBooks FROM Books; NumberOfExpensiveBooks 3 A CASE statement can be used as an argument for other aggregate functions as well. In the query below, we sum the prices of the books that cost exactly $25.00. The CASE statement inside of the SUM() function allows only books with a price of $25.00 to be included in the sum. SELECT SUM(CASE WHEN Price = 25 THEN Price END) AS BooksSum FROM Books; BooksSum 50 Now, we average the prices of books that cost below $26.00 in the next query. The CASE statement inside the AVG() function allows only books with a price below $26.00 to be included in the average. The AVG() function is an argument to the ROUND() function, so the output of the AVG() function is rounded to two decimal places. SELECT ROUND(AVG(CASE WHEN Price < 26 THEN Price END), 2) AS BooksAvg FROM Books; BooksAvg 23.33 In the next query, we find the minimum price of the books about SQL that cost below $26.00. The CASE statement inside of the MIN() function allows only books with a price below $26.00 to be included in the set. SELECT MIN(CASE WHEN Price < 26 AND Title LIKE '%SQL%' THEN Price END) AS BooksMin FROM Books; BooksMin 20 Next, we find the maximum price of the books that cost below $25.00. The CASE statement inside the MAX() function allows only books with a price below $25.00 to be included in the set. SELECT MAX(CASE WHEN Price < 25 THEN Price END) AS BooksMax FROM Books; BooksMax 20 I’m sure you can already figure out the output of these queries! What Happens to NULLs? The answer is simple. SQL aggregate functions ignore NULL values. Let’s consider an updated Books table. This time we have a NULL price. SELECT * FROM Books; IdAuthorTitlePrice 234Anthony MolinaroSQL Cookbook20.00 235Alan BeaulieuLearning SQL25.00 236Donald KnuthThings a Computer Scientist Rarely Talks About25.00 237Donald KnuthThe Art of Computer ProgrammingNULL COUNT(Price) now returns 3, not 4, and SUM(Price) returns 70.00. NULLs are ignored in both cases. You can also use aggregate functions with JOINs! Take a look at our article on Using SQL Aggregate Functions with JOINs to learn more. The Role of HAVING and GROUP BY With Aggregate Functions It is straightforward to understand what a HAVING clause does if you are familiar with the WHERE clause. A HAVING clause filters the output values of aggregate functions. A GROUP BY clause lets you divide your data into groups and find an aggregate value for each group. Let’s look at an example. SELECT Author, AVG(Price) AS AvgBookPrice FROM Books GROUP BY Author HAVING AVG(Price) > 20; AuthorAvgBookPrice Alan Beaulieu25 Donald Knuth26 We group the data by the Author column using a GROUP BY clause. Then, we restrict the values of AVG(Price) to be greater than 20 using a HAVING clause. We can try using the WHERE and HAVING clauses together to see the difference between them. SELECT Author, AVG(Price) AS AvgBookPrice FROM Books WHERE Author LIKE 'A%' GROUP BY Author HAVING AVG(Price) > 20; AuthorAvgBookPrice Alan Beaulieu25 The HAVING clause is often confused with the WHERE clause. Remember you can't use aggregate functions in a WHERE clause. Make sure to get enough practice and check out our SQL HAVING Tutorial. Also, check out our article on Using GROUP BY in SQL to get even more insight into the GROUP BY clause. SQL aggregate functions or SQL window functions? Or maybe both? Check out our article on similarities and differences between the two! Let’s Practice! It isn't enough to just read. SQL requires a great deal of practice. Let's get you started on some examples here so you can continue on your own! Before we jump into the examples, make sure you are clear on all SQL Fundamentals and Standard SQL Functions! Sample Database Below is the blueprint of the database. We'll use this database in our examples below. Let’s analyze the database blueprint, starting from the left. The Customers table stores data about the customers. Its primary key is the CustomerId column. The Customers and Orders tables are linked using the CustomerId column. The Orders table stores the order date and the ID of the customer who placed the order. Its primary key is the OrderId column. The link between the Customers and Orders tables defines the relationship between them. One customer can have zero or more orders, but an order can be assigned to only one customer. The Orders and OrderDetails tables are linked using the OrderId column. The Products and OrderDetails tables are linked using the ProductId column. The primary key of the OrderDetails table consists of the OrderId and ProductId columns. One order can consist of one or more products. Hence, one row from the Orders table can be related to one or more rows from the OrderDetails table. Also, one product can be in zero or more orders. Hence, one row from the Products table can be related to zero or more rows from the OrderDetails table. Now, let’s insert some data into our tables. The Customers table: CustomerIdFirstNameLastNameStreetHouseNoCityCountryEmailPhoneNo 1RachelHartleyBicetown Road602New YorkUSArh@email.com0123456789 2CaitlynRayFriedrichstrasse44BerlinGermanycr@email.com0987654321 3AndrewDuncanLairg Road38LondonNULLad@email.com0567432678 4TaylorJenkinsPark Row106EdinburghUKNULL0876345123 5BenHollandWilliams Avenue252Los AngelesUSAbh@email.com0987456789 The Orders table (the date column is in the format DD-MM-YYYY): OrderIdCustomerIdOrderDate 45110-10-2021 46211-12-2020 47305-05-2021 48409-08-2021 495NULL 50102-06-2021 51207-07-2021 The OrderDetails table: OrderIdProductIdQuantity 451002 451013 461001 471024 481013 481035 491042 501003 511011 The Products table: ProductIdNameUnitPriceAvailableInStock 100Keyboard30.00300 101USB Drive20.00450 102Mouse20.00500 103Screen100.00450 104Laptop600.00200 We are now ready to start with the examples. Examples With COUNT() We start with the Customers table. Let’s find out how many customers there are by country. SELECT Country, COUNT(CustomerId) AS NumberOfCustomers FROM Customers GROUP BY Country; CountryNumberOfCustomers NULL1 Germany1 UK1 USA2 We have selected all distinct values from the Country column, including the NULL value. The NumberOfCustomers column stores the number of customers for each value of the Country column. What happens if we use the Email column as an argument to the COUNT() function? SELECT Country, COUNT(Email) AS NumberOfCustomers FROM Customers GROUP BY Country; CountryNumberOfCustomers NULL1 Germany1 UK0 USA2 The value of the NumberOfCustomers column for the country “UK” becomes zero. This is because the Email column in the Customers table is NULL for this customer. Now, let’s look at an example that uses the GROUP BY and HAVING clauses. SELECT Country, COUNT(Email) AS NumberOfCustomersWithEmail FROM Customers WHERE Country IS NOT NULL GROUP BY Country HAVING COUNT(Email) > 1; CountryNumberOfCustomersWithEmail USA2 As before, we select the values from the Country column and obtain the count of customers with emails by country. In the WHERE clause, we state we don’t consider NULL values for the Country column. Next, we group our data by Country. Finally, we restrict the NumberOfCustomersWithEmail column values to be greater than 1 with a HAVING clause. Examples With SUM() Let’s check how much all the available products are worth. SELECT SUM(UnitPrice * AvailableInStock) AS AllProductsValue FROM Products; AllProductsValue 193000 Here, the SUM() aggregate function creates a value of UnitPrice * AvailableInStock for each row and then adds up all those values. Let’s say each order of value greater than $100.00 qualifies for a discount. We want to find out which orders qualify for the discount. SELECT OrderId, CASE WHEN OrderValue > 100 THEN 1 ELSE 0 END AS QualifiesForDiscount FROM ( SELECT aod.OrderId AS OrderId, SUM(aod.Quantity * ap.UnitPrice) AS OrderValue FROM Products ap JOIN OrderDetails aod ON ap.ProductId = aod.ProductId GROUP BY aod.OrderId ); OrderIdQualifiesForDiscount 451 460 470 481 491 500 510 The inner query selects all the OrderId column values and calculates the value of each order using the SUM() function. The outer query uses a CASE statement to decide whether the order qualifies for a discount (1) or not (0). Now, say we define all products with a unit price greater than $90.00 to be expensive. Let’s find out the total value of all expensive products in stock. SELECT SUM(CASE WHEN UnitPrice > 90 THEN UnitPrice * AvailableInStock END) AS ExpensiveProductsValue FROM Products; ExpensiveProductsValue 165000 We have passed a CASE statement as an argument to the SUM() function. This argument ensures that only the rows with the UnitPrice value greater than $90.00 are considered. Other than that, this example is quite similar to the first one in this section. You can find more examples of using SUM() with GROUP BY in our article How to Use SUM() with GROUP BY: A Guide with 8 Examples. Examples With AVG() Let’s check what the average price of an order is. SELECT AVG(OrderValue) AS AvgOrderValue FROM ( SELECT aod.OrderId AS OrderId, SUM(aod.Quantity * ap.UnitPrice) AS OrderValue FROM Products ap JOIN OrderDetails aod ON ap.ProductId = aod.ProductId GROUP BY aod.OrderId ); AvgOrderValue 300 The inner query outputs the total value of the order for each order. The outer query calculates the average value of an order. We can also find out the average quantity ordered by product. SELECT ROUND(AVG(Quantity), 2) AS AvgOrderQuantity FROM OrderDetails; AvgOrderQuantity 2.67 On average, our customers buy between 2 and 3 items of any given product in an order. Let’s see what changes when we consider only the unique values of the Quantity column. SELECT ROUND(AVG(DISTINCT Quantity), 2) AS AvgOrderQuantity FROM OrderDetails; AvgOrderQuantity 3 The output value changes, because we no longer consider duplicate values that appear in the Quantity column of the OrderDetails table. Examples With MAX() and MIN() Last but not least! The MAX() and MIN() functions are pretty straightforward. Let’s find out the oldest and the most recent orders. SELECT MIN(OrderDate) AS EarliestOrder, MAX(OrderDate) AS LatestOrder FROM Orders; EarliestOrderLatestOrder 11-12-202010-10-2021 The MIN() function returns the oldest date, and the MAX() function returns the most recent date. We can also identify the cheapest and the most expensive products. You can search the Products table to do this. SELECT MIN(UnitPrice) AS CheapestProductPrice, MAX(UnitPrice) AS MostExpensiveProductPrice FROM Products; CheapestProductPriceMostExpensiveProductPrice 20600 Let’s find out how many orders there are per customer, then get the minimum and the maximum number of orders per customer. SELECT MIN(NumberOfOrders) AS MinNumberOfOrders, MAX(NumberOfOrders) AS MaxNumberOfOrders FROM ( SELECT CustomerId, COUNT(OrderID) AS NumberOfOrders FROM Orders GROUP BY CustomerId ); MinNumberOfOrdersMaxNumberOfOrders 12 The inner query selects the CustomerId column and the total number of orders placed by a given customer. The COUNT(OrderId) function counts the number of orders per customer. Next, we group our data by the CustomerId column using a GROUP BY clause. In this step, the COUNT(OrderId) function counts the orders by customer and not for all customers together. The outer query selects the minimum and the maximum values of the NumberOfOrders column from the inner query. SQL Aggregate Functions as an Essential Tool in Data Science By using aggregate functions, we can easily find answers to specific questions, such as how many customers there are or what the average price of an order is. SQL aggregate functions let us analyze data efficiently. These core SQL aggregate functions are very helpful in data science. With them, you can organize data the way you want and extract the information you need. We have reviewed many examples with the COUNT(), SUM(), AVG(), MIN(), and MAX() aggregate functions. We have also covered examples of *, the DISTINCT keyword, and the CASE statement as arguments to aggregate functions. Now, you are ready to create your database and practice some more! Tags: sql learn sql aggregate functions