What Are Aggregate Functions in SQL, and How Do I Use Them?
Data is your source of knowledge. And thanks to SQL aggregate functions, you can extract the precise knowledge you need from your data efficiently. Read along to find out more.
The core SQL aggregate functions are the following:
COUNT(column_name | *)
returns the number of rows in a table.SUM(column_name)
returns the sum of the values of a numeric column.AVG(column_name)
returns the average value of a numeric column.MIN(column_name)
returns the minimum value of a selected column.MAX(column_name)
returns the maximum value of a selected column.
In this article, we’ll discuss each of these with examples. You’ll find out what happens to NULLs
and duplicates when subjected to aggregate functions. Furthermore, we'll explain *
, the DISTINCT
keyword, and the CASE
statement.
Let’s get started!
How Aggregate Functions Work
SQL aggregate functions accumulate data from multiple rows into a single summary row. The accumulated value is based on the values from the column passed as an argument. We can group the rows using a GROUP BY
clause and further filter them using a HAVING
clause.
A standard example is finding the number of rows in a table. Here, we aggregate all rows of the Books table into one row.
SELECT * FROM Books; |
Id | Author | Title | Price |
---|---|---|---|
234 | Anthony Molinaro | SQL Cookbook | 20.00 |
235 | Alan Beaulieu | Learning SQL | 25.00 |
236 | Donald Knuth | Things a Computer Scientist Rarely Talks About | 25.00 |
237 | Donald Knuth | The Art of Computer Programming | 27.00 |
SELECT COUNT (*) AS NumberOfBooks FROM Books; |
NumberOfBooks |
---|
4 |
Easy, right? Using COUNT(*)
, you can count the number of all rows.
You can also have a look at another articles on SQL aggregate functions here.
The * Argument to Aggregate Functions
If you ask me what * stands for in SQL, my response is that it stands for all. It is commonly used with a SELECT
statement when querying all the columns of a given table. Take, for example, SELECT * FROM Books
, as above.
The *
argument can also be used with the COUNT()
aggregate function. It then counts all rows of a table. Take, for example, SELECT COUNT(*) as NumberOfBooks FROM Books
, as above.
You can group data by some column or even by many columns. Take a look at the following example:
SELECT Author, COUNT (*) AS NumberOfBooks FROM Books GROUP BY Author; |
Author | NumberOfBooks |
---|---|
Anthony Molinaro | 1 |
Alan Beaulieu | 1 |
Donald Knuth | 2 |
This counts the number of books by author.
The *
argument applies only to the COUNT()
aggregate function. For other aggregate functions, a specific column, or a combination of columns, is required as an argument.
The DISTINCT Keyword
The DISTINCT
keyword tells the database that we don't want to consider duplicate values.
For example, COUNT(Author)
gets us the number of all authors present in a table. But if the same author appears many times in a column, the author is counted many times. Take a look at this:
SELECT COUNT (Author) AS NumberOfAuthors FROM books; |
NumberOfAuthors |
---|
4 |
See? It counts four authors because Donald Knuth is counted twice.
What happens if we add the DISTINCT
keyword?
SELECT COUNT ( DISTINCT Author) AS NumberOfAuthors FROM Books; |
NumberOfAuthors |
---|
3 |
This time, we use a DISTINCT
keyword. Now, Donald Knuth
is counted only once. When using the DISTINCT
keyword, COUNT()
must take a specific column as an argument. It returns the number of unique values stored in that column.
Similarly, we can use the DISTINCT
keyword with arguments of SUM()
and AVG()
aggregate functions. Below, we compare the results of executing the SUM()
function with and without the DISTINCT
keyword.
SELECT SUM ( DISTINCT Price) AS TotalDistinctPrice FROM Books; |
TotalDistinctPrice |
---|
72 |
SELECT SUM (Price) AS TotalPrice FROM Books; |
TotalPrice |
---|
97 |
As you can see, when you use a DISTINCT
keyword, the books with the same price are considered only once in SUM(). In this case, it makes more sense to use the SUM()
function without the DISTINCT
keyword.
Similarly, when calculating an average price, it's better not to use the DISTINCT
keyword; we should consider each price as many times as it appears in the column. See what happens with AVG()
:
SELECT AVG ( DISTINCT Price) AS TotalDistinctAvg FROM Books; |
TotalDistinctAvg |
---|
24 |
SELECT AVG (Price) AS TotalAvg FROM Books; |
TotalAvg |
---|
24.25 |
For the MIN()
and MAX()
aggregate functions, the DISTINCT
keyword doesn’t make a difference. But it doesn’t cause any errors either. Why? Let’s consider a set of numbers {1, 2, 2, 3, 4, 5, 5, 6}. Its maximum and minimum values are 6 and 1, respectively. With the DISTINCT
keyword, this set becomes {1, 2, 3, 4, 5, 6}, so the maximum and minimum values are still the same.
The CASE Statement
The CASE
statement categorizes and filters data. It is like a gatekeeper for the argument of an aggregate function, deciding which values to let in. Let’s look at some examples to illustrate this concept.
In the following query, we use a CASE
statement as an argument for the COUNT()
function. It counts only the books whose price is higher than $20.00.
SELECT COUNT ( CASE WHEN Price > 20 THEN Price END ) AS NumberOfExpensiveBooks FROM Books; |
NumberOfExpensiveBooks |
---|
3 |
A CASE
statement can be used as an argument for other aggregate functions as well. In the query below, we sum the prices of the books that cost exactly $25.00. The CASE
statement inside of the SUM()
function allows only books with a price of $25.00 to be included in the sum.
SELECT SUM ( CASE WHEN Price = 25 THEN Price END ) AS BooksSum FROM Books; |
BooksSum |
---|
50 |
Now, we average the prices of books that cost below $26.00 in the next query. The CASE
statement inside the AVG()
function allows only books with a price below $26.00 to be included in the average. The AVG()
function is an argument to the ROUND()
function, so the output of the AVG()
function is rounded to two decimal places.
SELECT ROUND( AVG ( CASE WHEN Price < 26 THEN Price END ), 2) AS BooksAvg FROM Books; |
BooksAvg |
---|
23.33 |
In the next query, we find the minimum price of the books about SQL that cost below $26.00. The CASE
statement inside of the MIN()
function allows only books with a price below $26.00 to be included in the set.
SELECT MIN ( CASE WHEN Price < 26 AND Title LIKE '%SQL%' THEN Price END ) AS BooksMin FROM Books; |
BooksMin |
---|
20 |
Next, we find the maximum price of the books that cost below $25.00. The CASE
statement inside the MAX()
function allows only books with a price below $25.00 to be included in the set.
SELECT MAX ( CASE WHEN Price < 25 THEN Price END ) AS BooksMax FROM Books; |
BooksMax |
---|
20 |
I’m sure you can already figure out the output of these queries!
What Happens to NULLs?
The answer is simple. SQL aggregate functions ignore NULL
values.
Let’s consider an updated Books
table. This time we have a NULL
price.
SELECT * FROM Books; |
Id | Author | Title | Price |
---|---|---|---|
234 | Anthony Molinaro | SQL Cookbook | 20.00 |
235 | Alan Beaulieu | Learning SQL | 25.00 |
236 | Donald Knuth | Things a Computer Scientist Rarely Talks About | 25.00 |
237 | Donald Knuth | The Art of Computer Programming | NULL |
COUNT(Price)
now returns 3, not 4, and SUM(Price)
returns 70.00. NULLs
are ignored in both cases.
You can also use aggregate functions with JOINs! Take a look at our article on Using SQL Aggregate Functions with JOINs to learn more.
The Role of HAVING and GROUP BY With Aggregate Functions
It is straightforward to understand what a HAVING
clause does if you are familiar with the WHERE
clause. A HAVING
clause filters the output values of aggregate functions. A GROUP BY
clause lets you divide your data into groups and find an aggregate value for each group.
Let’s look at an example.
SELECT Author, AVG (Price) AS AvgBookPrice FROM Books GROUP BY Author HAVING AVG (Price) > 20; |
Author | AvgBookPrice |
---|---|
Alan Beaulieu | 25 |
Donald Knuth | 26 |
We group the data by the Author column using a GROUP BY
clause. Then, we restrict the values of AVG(Price)
to be greater than 20 using a HAVING
clause.
We can try using the WHERE
and HAVING
clauses together to see the difference between them.
SELECT Author, AVG (Price) AS AvgBookPrice FROM Books WHERE Author LIKE 'A%' GROUP BY Author HAVING AVG (Price) > 20; |
Author | AvgBookPrice |
---|---|
Alan Beaulieu | 25 |
The HAVING
clause is often confused with the WHERE
clause. Remember you can't use aggregate functions in a WHERE
clause. Make sure to get enough practice and check out our SQL HAVING Tutorial. Also, check out our article on Using GROUP BY in SQL to get even more insight into the GROUP BY clause.
SQL aggregate functions or SQL window functions? Or maybe both? Check out our article on similarities and differences between the two!
Let’s Practice!
It isn't enough to just read. SQL requires a great deal of practice. Let's get you started on some examples here so you can continue on your own!
Before we jump into the examples, make sure you are clear on all SQL Fundamentals and Standard SQL Functions!
Sample Database
Below is the blueprint of the database. We'll use this database in our examples below.

Let’s analyze the database blueprint, starting from the left.
The Customers
table stores data about the customers. Its primary key is the CustomerId
column. The Customers
and Orders
tables are linked using the CustomerId
column. The Orders table stores the order date and the ID of the customer who placed the order. Its primary key is the OrderId
column. The link between the Customers
and Orders
tables defines the relationship between them. One customer can have zero or more orders, but an order can be assigned to only one customer.
The Orders
and OrderDetails
tables are linked using the OrderId
column. The Products
and OrderDetails
tables are linked using the ProductId
column. The primary key of the OrderDetails
table consists of the OrderId
and ProductId
columns.
One order can consist of one or more products. Hence, one row from the Orders
table can be related to one or more rows from the OrderDetails
table. Also, one product can be in zero or more orders. Hence, one row from the Products
table can be related to zero or more rows from the OrderDetails
table.
Now, let’s insert some data into our tables.
The Customers
table:
CustomerId | FirstName | LastName | Street | HouseNo | City | Country | PhoneNo | |
---|---|---|---|---|---|---|---|---|
1 | Rachel | Hartley | Bicetown Road | 602 | New York | USA | rh@email.com | 0123456789 |
2 | Caitlyn | Ray | Friedrichstrasse | 44 | Berlin | Germany | cr@email.com | 0987654321 |
3 | Andrew | Duncan | Lairg Road | 38 | London | NULL | ad@email.com | 0567432678 |
4 | Taylor | Jenkins | Park Row | 106 | Edinburgh | UK | NULL | 0876345123 |
5 | Ben | Holland | Williams Avenue | 252 | Los Angeles | USA | bh@email.com | 0987456789 |
The Orders
table (the date column is in the format DD-MM-YYYY):
OrderId | CustomerId | OrderDate |
---|---|---|
45 | 1 | 10-10-2021 |
46 | 2 | 11-12-2020 |
47 | 3 | 05-05-2021 |
48 | 4 | 09-08-2021 |
49 | 5 | NULL |
50 | 1 | 02-06-2021 |
51 | 2 | 07-07-2021 |
The OrderDetails
table:
OrderId | ProductId | Quantity |
---|---|---|
45 | 100 | 2 |
45 | 101 | 3 |
46 | 100 | 1 |
47 | 102 | 4 |
48 | 101 | 3 |
48 | 103 | 5 |
49 | 104 | 2 |
50 | 100 | 3 |
51 | 101 | 1 |
The Products
table:
ProductId | Name | UnitPrice | AvailableInStock |
---|---|---|---|
100 | Keyboard | 30.00 | 300 |
101 | USB Drive | 20.00 | 450 |
102 | Mouse | 20.00 | 500 |
103 | Screen | 100.00 | 450 |
104 | Laptop | 600.00 | 200 |
We are now ready to start with the examples.
Examples With COUNT()
We start with the Customers
table. Let’s find out how many customers there are by country.
SELECT Country, COUNT (CustomerId) AS NumberOfCustomers FROM Customers GROUP BY Country; |
Country | NumberOfCustomers |
---|---|
NULL | 1 |
Germany | 1 |
UK | 1 |
USA | 2 |
We have selected all distinct values from the Country
column, including the NULL
value. The NumberOfCustomers
column stores the number of customers for each value of the Country
column.
What happens if we use the Email
column as an argument to the COUNT()
function?
SELECT Country, COUNT (Email) AS NumberOfCustomers FROM Customers GROUP BY Country; |
Country | NumberOfCustomers |
---|---|
NULL | 1 |
Germany | 1 |
UK | 0 |
USA | 2 |
The value of the NumberOfCustomers
column for the country “UK” becomes zero. This is because the Email
column in the Customers
table is NULL
for this customer.
Now, let’s look at an example that uses the GROUP BY
and HAVING
clauses.
SELECT Country, COUNT (Email) AS NumberOfCustomersWithEmail FROM Customers WHERE Country IS NOT NULL GROUP BY Country HAVING COUNT (Email) > 1; |
Country | NumberOfCustomersWithEmail |
---|---|
USA | 2 |
As before, we select the values from the Country column and obtain the count of customers with emails by country. In the WHERE
clause, we state we don’t consider NULL
values for the Country
column. Next, we group our data by Country
. Finally, we restrict the NumberOfCustomersWithEmail
column values to be greater than 1 with a HAVING
clause.
Examples With SUM()
Let’s check how much all the available products are worth.
SELECT SUM (UnitPrice * AvailableInStock) AS AllProductsValue FROM Products; |
AllProductsValue |
---|
193000 |
Here, the SUM()
aggregate function creates a value of UnitPrice * AvailableInStock
for each row and then adds up all those values.
Let’s say each order of value greater than $100.00 qualifies for a discount. We want to find out which orders qualify for the discount.
SELECT OrderId, CASE WHEN OrderValue > 100 THEN 1 ELSE 0 END AS QualifiesForDiscount FROM ( SELECT aod.OrderId AS OrderId, SUM (aod.Quantity * ap.UnitPrice) AS OrderValue FROM Products ap JOIN OrderDetails aod ON ap.ProductId = aod.ProductId GROUP BY aod.OrderId ); |
OrderId | QualifiesForDiscount |
---|---|
45 | 1 |
46 | 0 |
47 | 0 |
48 | 1 |
49 | 1 |
50 | 0 |
51 | 0 |
The inner query selects all the OrderId
column values and calculates the value of each order using the SUM()
function. The outer query uses a CASE
statement to decide whether the order qualifies for a discount (1
) or not (0
).
Now, say we define all products with a unit price greater than $90.00 to be expensive. Let’s find out the total value of all expensive products in stock.
SELECT SUM ( CASE WHEN UnitPrice > 90 THEN UnitPrice * AvailableInStock END ) AS ExpensiveProductsValue FROM Products; |
ExpensiveProductsValue |
---|
165000 |
We have passed a CASE
statement as an argument to the SUM()
function. This argument ensures that only the rows with the UnitPrice
value greater than $90.00 are considered. Other than that, this example is quite similar to the first one in this section.
You can find more examples of using SUM() with GROUP BY in our article How to Use SUM() with GROUP BY: A Guide with 8 Examples.
Examples With AVG()
Let’s check what the average price of an order is.
SELECT AVG (OrderValue) AS AvgOrderValue FROM ( SELECT aod.OrderId AS OrderId, SUM (aod.Quantity * ap.UnitPrice) AS OrderValue FROM Products ap JOIN OrderDetails aod ON ap.ProductId = aod.ProductId GROUP BY aod.OrderId ); |
AvgOrderValue |
---|
300 |
The inner query outputs the total value of the order for each order. The outer query calculates the average value of an order.
We can also find out the average quantity ordered by product.
SELECT ROUND( AVG (Quantity), 2) AS AvgOrderQuantity FROM OrderDetails; |
AvgOrderQuantity |
---|
2.67 |
On average, our customers buy between 2 and 3 items of any given product in an order.
Let’s see what changes when we consider only the unique values of the Quantity
column.
SELECT ROUND( AVG ( DISTINCT Quantity), 2) AS AvgOrderQuantity FROM OrderDetails; |
AvgOrderQuantity |
---|
3 |
The output value changes, because we no longer consider duplicate values that appear in the Quantity
column of the OrderDetails
table.
Examples With MAX() and MIN()
Last but not least! The MAX()
and MIN()
functions are pretty straightforward. Let’s find out the oldest and the most recent orders.
SELECT MIN (OrderDate) AS EarliestOrder, MAX (OrderDate) AS LatestOrder FROM Orders; |
EarliestOrder | LatestOrder |
---|---|
11-12-2020 | 10-10-2021 |
The MIN()
function returns the oldest date, and the MAX()
function returns the most recent date.
We can also identify the cheapest and the most expensive products. You can search the Products
table to do this.
SELECT MIN (UnitPrice) AS CheapestProductPrice, MAX (UnitPrice) AS MostExpensiveProductPrice FROM Products; |
CheapestProductPrice | MostExpensiveProductPrice |
---|---|
20 | 600 |
Let’s find out how many orders there are per customer, then get the minimum and the maximum number of orders per customer.
SELECT MIN (NumberOfOrders) AS MinNumberOfOrders, MAX (NumberOfOrders) AS MaxNumberOfOrders FROM ( SELECT CustomerId, COUNT (OrderID) AS NumberOfOrders FROM Orders GROUP BY CustomerId ); |
MinNumberOfOrders | MaxNumberOfOrders |
---|---|
1 | 2 |
The inner query selects the CustomerId
column and the total number of orders placed by a given customer. The COUNT(OrderId)
function counts the number of orders per customer. Next, we group our data by the CustomerId
column using a GROUP BY
clause. In this step, the COUNT(OrderId)
function counts the orders by customer and not for all customers together. The outer query selects the minimum and the maximum values of the NumberOfOrders
column from the inner query.
SQL Aggregate Functions as an Essential Tool in Data Science
By using aggregate functions, we can easily find answers to specific questions, such as how many customers there are or what the average price of an order is. SQL aggregate functions let us analyze data efficiently.
These core SQL aggregate functions are very helpful in data science. With them, you can organize data the way you want and extract the information you need.
We have reviewed many examples with the COUNT()
, SUM()
, AVG()
, MIN()
, and MAX()
aggregate functions. We have also covered examples of *
, the DISTINCT
keyword, and the CASE
statement as arguments to aggregate functions. Now, you are ready to create your database and practice some more!