11th Nov 2021 13 minutes read

What Are Aggregate Functions in SQL, and How Do I Use Them?

Table of Contents

How Aggregate Functions Work
The Role of HAVING and GROUP BY With Aggregate Functions
Let’s Practice!
SQL Aggregate Functions as an Essential Tool in Data Science

Data is your source of knowledge. And thanks to SQL aggregate functions, you can extract the precise knowledge you need from your data efficiently. Read along to find out more.

The core SQL aggregate functions are the following:

COUNT(column_name | *) returns the number of rows in a table.
SUM(column_name) returns the sum of the values of a numeric column.
AVG(column_name) returns the average value of a numeric column.
MIN(column_name) returns the minimum value of a selected column.
MAX(column_name) returns the maximum value of a selected column.

In this article, we’ll discuss each of these with examples. You’ll find out what happens to NULLs and duplicates when subjected to aggregate functions. Furthermore, we'll explain *, the DISTINCT keyword, and the CASE statement.

Let’s get started!

How Aggregate Functions Work

SQL aggregate functions accumulate data from multiple rows into a single summary row. The accumulated value is based on the values from the column passed as an argument. We can group the rows using a GROUP BY clause and further filter them using a HAVING clause.

A standard example is finding the number of rows in a table. Here, we aggregate all rows of the Books table into one row.

SELECT * FROM Books;

Id	Author	Title	Price
234	Anthony Molinaro	SQL Cookbook	20.00
235	Alan Beaulieu	Learning SQL	25.00
236	Donald Knuth	Things a Computer Scientist Rarely Talks About	25.00
237	Donald Knuth	The Art of Computer Programming	27.00

SELECT COUNT(*) AS NumberOfBooks
FROM Books;

NumberOfBooks
4

Easy, right? Using COUNT(*), you can count the number of all rows.

You can also have a look at another articles on SQL aggregate functions here.

The * Argument to Aggregate Functions

If you ask me what * stands for in SQL, my response is that it stands for all. It is commonly used with a SELECT statement when querying all the columns of a given table. Take, for example, SELECT * FROM Books, as above.

The * argument can also be used with the COUNT() aggregate function. It then counts all rows of a table. Take, for example, SELECT COUNT(*) as NumberOfBooks FROM Books, as above.

You can group data by some column or even by many columns. Take a look at the following example:

SELECT Author, COUNT(*) AS NumberOfBooks
FROM Books
GROUP BY Author;

Author	NumberOfBooks
Anthony Molinaro	1
Alan Beaulieu	1
Donald Knuth	2

This counts the number of books by author.

The * argument applies only to the COUNT() aggregate function. For other aggregate functions, a specific column, or a combination of columns, is required as an argument.

The DISTINCT Keyword

The DISTINCT keyword tells the database that we don't want to consider duplicate values.

For example, COUNT(Author) gets us the number of all authors present in a table. But if the same author appears many times in a column, the author is counted many times. Take a look at this:

SELECT COUNT(Author) AS NumberOfAuthors
FROM books;

NumberOfAuthors
4

See? It counts four authors because Donald Knuth is counted twice.

What happens if we add the DISTINCT keyword?

SELECT COUNT(DISTINCT Author) AS NumberOfAuthors
FROM Books;

NumberOfAuthors
3

This time, we use a DISTINCT keyword. Now, Donald Knuth is counted only once. When using the DISTINCT keyword, COUNT() must take a specific column as an argument. It returns the number of unique values stored in that column.

Similarly, we can use the DISTINCT keyword with arguments of SUM() and AVG() aggregate functions. Below, we compare the results of executing the SUM() function with and without the DISTINCT keyword.

SELECT SUM(DISTINCT Price)
AS TotalDistinctPrice
FROM Books;

TotalDistinctPrice
72

	SELECT SUM(Price)
	AS TotalPrice
	FROM Books;

TotalPrice
97

As you can see, when you use a DISTINCT keyword, the books with the same price are considered only once in SUM(). In this case, it makes more sense to use the SUM() function without the DISTINCT keyword.

Similarly, when calculating an average price, it's better not to use the DISTINCT keyword; we should consider each price as many times as it appears in the column. See what happens with AVG():

SELECT AVG(DISTINCT Price)
AS TotalDistinctAvg
FROM Books;

TotalDistinctAvg
24

	SELECT AVG(Price)
	AS TotalAvg
	FROM Books;

TotalAvg
24.25

For the MIN() and MAX() aggregate functions, the DISTINCT keyword doesn’t make a difference. But it doesn’t cause any errors either. Why? Let’s consider a set of numbers {1, 2, 2, 3, 4, 5, 5, 6}. Its maximum and minimum values are 6 and 1, respectively. With the DISTINCT keyword, this set becomes {1, 2, 3, 4, 5, 6}, so the maximum and minimum values are still the same.

The CASE Statement

The CASE statement categorizes and filters data. It is like a gatekeeper for the argument of an aggregate function, deciding which values to let in. Let’s look at some examples to illustrate this concept.

In the following query, we use a CASE statement as an argument for the COUNT() function. It counts only the books whose price is higher than $20.00.

SELECT COUNT(CASE WHEN Price > 20 THEN Price END)
AS NumberOfExpensiveBooks
FROM Books;

NumberOfExpensiveBooks
3

A CASE statement can be used as an argument for other aggregate functions as well. In the query below, we sum the prices of the books that cost exactly $25.00. The CASE statement inside of the SUM() function allows only books with a price of $25.00 to be included in the sum.

SELECT SUM(CASE WHEN Price = 25 THEN Price END)
AS BooksSum
FROM Books;

BooksSum
50

Now, we average the prices of books that cost below $26.00 in the next query. The CASE statement inside the AVG() function allows only books with a price below $26.00 to be included in the average. The AVG() function is an argument to the ROUND() function, so the output of the AVG() function is rounded to two decimal places.

SELECT ROUND(AVG(CASE WHEN Price < 26 THEN Price END), 2)
AS BooksAvg
FROM Books;

BooksAvg
23.33

In the next query, we find the minimum price of the books about SQL that cost below $26.00. The CASE statement inside of the MIN() function allows only books with a price below $26.00 to be included in the set.

SELECT MIN(CASE WHEN Price < 26 AND Title LIKE '%SQL%' THEN Price END)
AS BooksMin
FROM Books;

BooksMin
20

Next, we find the maximum price of the books that cost below $25.00. The CASE statement inside the MAX() function allows only books with a price below $25.00 to be included in the set.

SELECT MAX(CASE WHEN Price < 25 THEN Price END)
AS BooksMax
FROM Books;

BooksMax
20

I’m sure you can already figure out the output of these queries!

What Happens to NULLs?

The answer is simple. SQL aggregate functions ignore NULL values.

Let’s consider an updated Books table. This time we have a NULL price.

SELECT * FROM Books;

Id	Author	Title	Price
234	Anthony Molinaro	SQL Cookbook	20.00
235	Alan Beaulieu	Learning SQL	25.00
236	Donald Knuth	Things a Computer Scientist Rarely Talks About	25.00
237	Donald Knuth	The Art of Computer Programming	NULL

COUNT(Price) now returns 3, not 4, and SUM(Price) returns 70.00. NULLs are ignored in both cases.

You can also use aggregate functions with JOINs! Take a look at our article on Using SQL Aggregate Functions with JOINs to learn more.

The Role of HAVING and GROUP BY With Aggregate Functions

It is straightforward to understand what a HAVING clause does if you are familiar with the WHERE clause. A HAVING clause filters the output values of aggregate functions. A GROUP BY clause lets you divide your data into groups and find an aggregate value for each group.

Let’s look at an example.

SELECT Author, AVG(Price) AS AvgBookPrice
FROM Books
GROUP BY Author
HAVING AVG(Price) > 20;

Author	AvgBookPrice
Alan Beaulieu	25
Donald Knuth	26

We group the data by the Author column using a GROUP BY clause. Then, we restrict the values of AVG(Price) to be greater than 20 using a HAVING clause.

We can try using the WHERE and HAVING clauses together to see the difference between them.

SELECT Author, AVG(Price) AS AvgBookPrice
FROM Books
WHERE Author LIKE 'A%'
GROUP BY Author
HAVING AVG(Price) > 20;

Author	AvgBookPrice
Alan Beaulieu	25

The HAVING clause is often confused with the WHERE clause. Remember you can't use aggregate functions in a WHERE clause. Make sure to get enough practice and check out our SQL HAVING Tutorial. Also, check out our article on Using GROUP BY in SQL to get even more insight into the GROUP BY clause.

SQL aggregate functions or SQL window functions? Or maybe both? Check out our article on similarities and differences between the two!

Let’s Practice!

It isn't enough to just read. SQL requires a great deal of practice. Let's get you started on some examples here so you can continue on your own!

Before we jump into the examples, make sure you are clear on all SQL Fundamentals and Standard SQL Functions!

Sample Database

Below is the blueprint of the database. We'll use this database in our examples below.

Let’s analyze the database blueprint, starting from the left.

The Customers table stores data about the customers. Its primary key is the CustomerId column. The Customers and Orders tables are linked using the CustomerId column. The Orders table stores the order date and the ID of the customer who placed the order. Its primary key is the OrderId column. The link between the Customers and Orders tables defines the relationship between them. One customer can have zero or more orders, but an order can be assigned to only one customer.

The Orders and OrderDetails tables are linked using the OrderId column. The Products and OrderDetails tables are linked using the ProductId column. The primary key of the OrderDetails table consists of the OrderId and ProductId columns.

One order can consist of one or more products. Hence, one row from the Orders table can be related to one or more rows from the OrderDetails table. Also, one product can be in zero or more orders. Hence, one row from the Products table can be related to zero or more rows from the OrderDetails table.

Now, let’s insert some data into our tables.

The Customers table:

CustomerId	FirstName	LastName	Street	HouseNo	City	Country	Email	PhoneNo
1	Rachel	Hartley	Bicetown Road	602	New York	USA	rh@email.com	0123456789
2	Caitlyn	Ray	Friedrichstrasse	44	Berlin	Germany	cr@email.com	0987654321
3	Andrew	Duncan	Lairg Road	38	London	NULL	ad@email.com	0567432678
4	Taylor	Jenkins	Park Row	106	Edinburgh	UK	NULL	0876345123
5	Ben	Holland	Williams Avenue	252	Los Angeles	USA	bh@email.com	0987456789

The Orders table (the date column is in the format DD-MM-YYYY):

OrderId	CustomerId	OrderDate
45	1	10-10-2021
46	2	11-12-2020
47	3	05-05-2021
48	4	09-08-2021
49	5	NULL
50	1	02-06-2021
51	2	07-07-2021

The OrderDetails table:

OrderId	ProductId	Quantity
45	100	2
45	101	3
46	100	1
47	102	4
48	101	3
48	103	5
49	104	2
50	100	3
51	101	1

The Products table:

ProductId	Name	UnitPrice	AvailableInStock
100	Keyboard	30.00	300
101	USB Drive	20.00	450
102	Mouse	20.00	500
103	Screen	100.00	450
104	Laptop	600.00	200

We are now ready to start with the examples.

Examples With COUNT()

We start with the Customers table. Let’s find out how many customers there are by country.

SELECT Country, COUNT(CustomerId) AS NumberOfCustomers
FROM Customers
GROUP BY Country;

Country	NumberOfCustomers
NULL	1
Germany	1
UK	1
USA	2

We have selected all distinct values from the Country column, including the NULL value. The NumberOfCustomers column stores the number of customers for each value of the Country column.

What happens if we use the Email column as an argument to the COUNT() function?

SELECT Country, COUNT(Email) AS NumberOfCustomers
FROM Customers
GROUP BY Country;

Country	NumberOfCustomers
NULL	1
Germany	1
UK	0
USA	2

The value of the NumberOfCustomers column for the country “UK” becomes zero. This is because the Email column in the Customers table is NULL for this customer.

Now, let’s look at an example that uses the GROUP BY and HAVING clauses.

SELECT Country, COUNT(Email) AS NumberOfCustomersWithEmail
FROM Customers
WHERE Country IS NOT NULL
GROUP BY Country
HAVING COUNT(Email) > 1;

Country	NumberOfCustomersWithEmail
USA	2

As before, we select the values from the Country column and obtain the count of customers with emails by country. In the WHERE clause, we state we don’t consider NULL values for the Country column. Next, we group our data by Country. Finally, we restrict the NumberOfCustomersWithEmail column values to be greater than 1 with a HAVING clause.

Examples With SUM()

Let’s check how much all the available products are worth.

SELECT SUM(UnitPrice * AvailableInStock) AS AllProductsValue
FROM Products;

AllProductsValue
193000

Here, the SUM() aggregate function creates a value of UnitPrice * AvailableInStock for each row and then adds up all those values.

Let’s say each order of value greater than $100.00 qualifies for a discount. We want to find out which orders qualify for the discount.

SELECT OrderId,
       CASE WHEN OrderValue > 100 THEN 1 ELSE 0 END
       AS QualifiesForDiscount
FROM
     ( SELECT aod.OrderId AS OrderId,
              SUM(aod.Quantity * ap.UnitPrice) AS OrderValue
       FROM Products ap
       JOIN OrderDetails aod
       ON ap.ProductId = aod.ProductId
       GROUP BY aod.OrderId );

OrderId	QualifiesForDiscount
45	1
46	0
47	0
48	1
49	1
50	0
51	0

The inner query selects all the OrderId column values and calculates the value of each order using the SUM() function. The outer query uses a CASE statement to decide whether the order qualifies for a discount (1) or not (0).

Now, say we define all products with a unit price greater than $90.00 to be expensive. Let’s find out the total value of all expensive products in stock.

SELECT
    SUM(CASE WHEN UnitPrice > 90 THEN UnitPrice * AvailableInStock END)
    AS ExpensiveProductsValue
FROM Products;

ExpensiveProductsValue
165000

We have passed a CASE statement as an argument to the SUM() function. This argument ensures that only the rows with the UnitPrice value greater than $90.00 are considered. Other than that, this example is quite similar to the first one in this section.

You can find more examples of using SUM() with GROUP BY in our article How to Use SUM() with GROUP BY: A Guide with 8 Examples.

Examples With AVG()

Let’s check what the average price of an order is.

SELECT AVG(OrderValue) AS AvgOrderValue
FROM
     ( SELECT aod.OrderId AS OrderId,
              SUM(aod.Quantity * ap.UnitPrice) AS OrderValue
       FROM Products ap
       JOIN OrderDetails aod
       ON ap.ProductId = aod.ProductId
       GROUP BY aod.OrderId );

AvgOrderValue
300

The inner query outputs the total value of the order for each order. The outer query calculates the average value of an order.

We can also find out the average quantity ordered by product.

SELECT ROUND(AVG(Quantity), 2) AS AvgOrderQuantity
FROM OrderDetails;

AvgOrderQuantity
2.67

On average, our customers buy between 2 and 3 items of any given product in an order.

Let’s see what changes when we consider only the unique values of the Quantity column.

SELECT ROUND(AVG(DISTINCT Quantity), 2) AS AvgOrderQuantity
FROM OrderDetails;

AvgOrderQuantity
3

The output value changes, because we no longer consider duplicate values that appear in the Quantity column of the OrderDetails table.

Examples With MAX() and MIN()

Last but not least! The MAX() and MIN() functions are pretty straightforward. Let’s find out the oldest and the most recent orders.

SELECT MIN(OrderDate) AS EarliestOrder,
       MAX(OrderDate) AS LatestOrder
FROM Orders;

EarliestOrder	LatestOrder
11-12-2020	10-10-2021

The MIN() function returns the oldest date, and the MAX() function returns the most recent date.

We can also identify the cheapest and the most expensive products. You can search the Products table to do this.

SELECT MIN(UnitPrice) AS CheapestProductPrice,
       MAX(UnitPrice) AS MostExpensiveProductPrice
FROM Products;

CheapestProductPrice	MostExpensiveProductPrice
20	600

Let’s find out how many orders there are per customer, then get the minimum and the maximum number of orders per customer.

SELECT MIN(NumberOfOrders) AS MinNumberOfOrders,
       MAX(NumberOfOrders) AS MaxNumberOfOrders
FROM
     ( SELECT CustomerId,
              COUNT(OrderID) AS NumberOfOrders
       FROM Orders
       GROUP BY CustomerId );

MinNumberOfOrders	MaxNumberOfOrders
1	2

The inner query selects the CustomerId column and the total number of orders placed by a given customer. The COUNT(OrderId) function counts the number of orders per customer. Next, we group our data by the CustomerId column using a GROUP BY clause. In this step, the COUNT(OrderId) function counts the orders by customer and not for all customers together. The outer query selects the minimum and the maximum values of the NumberOfOrders column from the inner query.

SQL Aggregate Functions as an Essential Tool in Data Science

By using aggregate functions, we can easily find answers to specific questions, such as how many customers there are or what the average price of an order is. SQL aggregate functions let us analyze data efficiently.

These core SQL aggregate functions are very helpful in data science. With them, you can organize data the way you want and extract the information you need.

We have reviewed many examples with the COUNT(), SUM(), AVG(), MIN(), and MAX() aggregate functions. We have also covered examples of *, the DISTINCT keyword, and the CASE statement as arguments to aggregate functions. Now, you are ready to create your database and practice some more!

Tags: