# What Are Aggregate Functions in SQL, and How Do I Use Them?

*Data is your source of knowledge. And thanks to SQL aggregate functions, you can extract the precise knowledge you need from your data efficiently. Read along to find out more.*

The core SQL aggregate functions are the following:

`COUNT(column_name | *)`

returns the number of rows in a table.`SUM(column_name)`

returns the sum of the values of a numeric column.`AVG(column_name)`

returns the average value of a numeric column.`MIN(column_name)`

returns the minimum value of a selected column.`MAX(column_name)`

returns the maximum value of a selected column.

In this article, we’ll discuss each of these with examples. You’ll find out what happens to `NULLs`

and duplicates when subjected to aggregate functions. Furthermore, we'll explain `*`

, the `DISTINCT`

keyword, and the `CASE`

statement.

Let’s get started!

## How Aggregate Functions Work

SQL aggregate functions accumulate data from multiple rows into a single summary row. The accumulated value is based on the values from the column passed as an argument. We can group the rows using a `GROUP BY`

clause and further filter them using a `HAVING`

clause.

A standard example is finding the number of rows in a table. Here, we aggregate all rows of the **Books** table into one row.

SELECT * FROM Books;

Id | Author | Title | Price |
---|---|---|---|

234 | Anthony Molinaro | SQL Cookbook | 20.00 |

235 | Alan Beaulieu | Learning SQL | 25.00 |

236 | Donald Knuth | Things a Computer Scientist Rarely Talks About | 25.00 |

237 | Donald Knuth | The Art of Computer Programming | 27.00 |

SELECT COUNT(*) AS NumberOfBooks FROM Books;

NumberOfBooks |
---|

4 |

Easy, right? Using `COUNT(*)`

, you can count the number of all rows.

You can also have a look at another articles on SQL aggregate functions here.

### The * Argument to Aggregate Functions

If you ask me what * stands for in SQL, my response is that it stands for *all*. It is commonly used with a `SELECT`

statement when querying all the columns of a given table. Take, for example, `SELECT * FROM Books`

, as above.

The `*`

argument can also be used with the `COUNT()`

aggregate function. It then counts all rows of a table. Take, for example, `SELECT COUNT(*) as NumberOfBooks FROM Books`

, as above.

You can group data by some column or even by many columns. Take a look at the following example:

SELECT Author, COUNT(*) AS NumberOfBooks FROM Books GROUP BY Author;

Author | NumberOfBooks |
---|---|

Anthony Molinaro | 1 |

Alan Beaulieu | 1 |

Donald Knuth | 2 |

This counts the number of books by author.

The `*`

argument applies only to the `COUNT()`

aggregate function. For other aggregate functions, a specific column, or a combination of columns, is required as an argument.

### The DISTINCT Keyword

The `DISTINCT`

keyword tells the database that we don't want to consider duplicate values.

For example, `COUNT(Author)`

gets us the number of all authors present in a table. But if the same author appears many times in a column, the author is counted many times. Take a look at this:

SELECT COUNT(Author) AS NumberOfAuthors FROM books;

NumberOfAuthors |
---|

4 |

See? It counts four authors because Donald Knuth is counted twice.

What happens if we add the `DISTINCT`

keyword?

SELECT COUNT(DISTINCT Author) AS NumberOfAuthors FROM Books;

NumberOfAuthors |
---|

3 |

This time, we use a `DISTINCT`

keyword. Now, `Donald Knuth`

is counted only once. When using the `DISTINCT`

keyword, `COUNT()`

must take a specific column as an argument. It returns the number of unique values stored in that column.

Similarly, we can use the `DISTINCT`

keyword with arguments of `SUM()`

and `AVG()`

aggregate functions. Below, we compare the results of executing the `SUM()`

function with and without the `DISTINCT`

keyword.

SELECT SUM(DISTINCT Price) AS TotalDistinctPrice FROM Books;

TotalDistinctPrice |
---|

72 |

SELECT SUM(Price) AS TotalPrice FROM Books;

TotalPrice |
---|

97 |

As you can see, when you use a `DISTINCT`

keyword, the books with the same price are considered only once in SUM(). In this case, it makes more sense to use the `SUM()`

function without the `DISTINCT`

keyword.

Similarly, when calculating an average price, it's better not to use the `DISTINCT`

keyword; we should consider each price as many times as it appears in the column. See what happens with `AVG()`

:

SELECT AVG(DISTINCT Price) AS TotalDistinctAvg FROM Books;

TotalDistinctAvg |
---|

24 |

SELECT AVG(Price) AS TotalAvg FROM Books;

TotalAvg |
---|

24.25 |

For the `MIN()`

and `MAX()`

aggregate functions, the `DISTINCT`

keyword doesn’t make a difference. But it doesn’t cause any errors either. Why? Let’s consider a set of numbers {1, 2, 2, 3, 4, 5, 5, 6}. Its maximum and minimum values are 6 and 1, respectively. With the `DISTINCT`

keyword, this set becomes {1, 2, 3, 4, 5, 6}, so the maximum and minimum values are still the same.

### The CASE Statement

The `CASE`

statement categorizes and filters data. It is like a gatekeeper for the argument of an aggregate function, deciding which values to let in. Let’s look at some examples to illustrate this concept.

In the following query, we use a `CASE`

statement as an argument for the `COUNT()`

function. It counts only the books whose price is higher than $20.00.

SELECT COUNT(CASE WHEN Price > 20 THEN Price END) AS NumberOfExpensiveBooks FROM Books;

NumberOfExpensiveBooks |
---|

3 |

A `CASE`

statement can be used as an argument for other aggregate functions as well. In the query below, we sum the prices of the books that cost exactly $25.00. The `CASE`

statement inside of the `SUM()`

function allows only books with a price of $25.00 to be included in the sum.

SELECT SUM(CASE WHEN Price = 25 THEN Price END) AS BooksSum FROM Books;

BooksSum |
---|

50 |

Now, we average the prices of books that cost below $26.00 in the next query. The `CASE`

statement inside the `AVG()`

function allows only books with a price below $26.00 to be included in the average. The `AVG()`

function is an argument to the `ROUND()`

function, so the output of the `AVG()`

function is rounded to two decimal places.

SELECT ROUND(AVG(CASE WHEN Price < 26 THEN Price END), 2) AS BooksAvg FROM Books;

BooksAvg |
---|

23.33 |

In the next query, we find the minimum price of the books about SQL that cost below $26.00. The `CASE`

statement inside of the `MIN()`

function allows only books with a price below $26.00 to be included in the set.

SELECT MIN(CASE WHEN Price < 26 AND Title LIKE '%SQL%' THEN Price END) AS BooksMin FROM Books;

BooksMin |
---|

20 |

Next, we find the maximum price of the books that cost below $25.00. The `CASE`

statement inside the `MAX()`

function allows only books with a price below $25.00 to be included in the set.

SELECT MAX(CASE WHEN Price < 25 THEN Price END) AS BooksMax FROM Books;

BooksMax |
---|

20 |

I’m sure you can already figure out the output of these queries!

### What Happens to NULLs?

The answer is simple. SQL aggregate functions ignore `NULL`

values.

Let’s consider an updated ** Books** table. This time we have a

`NULL`

price.SELECT * FROM Books;

Id | Author | Title | Price |
---|---|---|---|

234 | Anthony Molinaro | SQL Cookbook | 20.00 |

235 | Alan Beaulieu | Learning SQL | 25.00 |

236 | Donald Knuth | Things a Computer Scientist Rarely Talks About | 25.00 |

237 | Donald Knuth | The Art of Computer Programming | NULL |

`COUNT(Price)`

now returns 3, not 4, and `SUM(Price)`

returns 70.00. `NULLs`

are ignored in both cases.

You can also use aggregate functions with JOINs! Take a look at our article on Using SQL Aggregate Functions with JOINs to learn more.

## The Role of HAVING and GROUP BY With Aggregate Functions

It is straightforward to understand what a `HAVING`

clause does if you are familiar with the `WHERE`

clause. A `HAVING`

clause filters the output values of aggregate functions. A `GROUP BY`

clause lets you divide your data into groups and find an aggregate value for each group.

Let’s look at an example.

SELECT Author, AVG(Price) AS AvgBookPrice FROM Books GROUP BY Author HAVING AVG(Price) > 20;

Author | AvgBookPrice |
---|---|

Alan Beaulieu | 25 |

Donald Knuth | 26 |

We group the data by the Author column using a `GROUP BY`

clause. Then, we restrict the values of `AVG(Price)`

to be greater than 20 using a `HAVING`

clause.

We can try using the `WHERE`

and `HAVING`

clauses together to see the difference between them.

SELECT Author, AVG(Price) AS AvgBookPrice FROM Books WHERE Author LIKE 'A%' GROUP BY Author HAVING AVG(Price) > 20;

Author | AvgBookPrice |
---|---|

Alan Beaulieu | 25 |

The `HAVING`

clause is often confused with the `WHERE`

clause. Remember you can't use aggregate functions in a `WHERE`

clause. Make sure to get enough practice and check out our SQL HAVING Tutorial. Also, check out our article on Using GROUP BY in SQL to get even more insight into the GROUP BY clause.

SQL aggregate functions or SQL window functions? Or maybe both? Check out our article on similarities and differences between the two!

## Let’s Practice!

It isn't enough to just read. SQL requires a great deal of practice. Let's get you started on some examples here so you can continue on your own!

Before we jump into the examples, make sure you are clear on all SQL Fundamentals and Standard SQL Functions!

### Sample Database

Below is the blueprint of the database. We'll use this database in our examples below.

Let’s analyze the database blueprint, starting from the left.

The ** Customers** table stores data about the customers. Its primary key is the

`CustomerId`

column. The **and**

`Customers`

**tables are linked using the**

`Orders`

`CustomerId`

column. The **Orders**table stores the order date and the ID of the customer who placed the order. Its primary key is the

`OrderId`

column. The link between the **and**

`Customers`

**tables defines the relationship between them. One customer can have zero or more orders, but an order can be assigned to only one customer.**

`Orders`

The ** Orders** and

**tables are linked using the**

`OrderDetails`

`OrderId`

column. The **and**

`Products`

**tables are linked using the**

`OrderDetails`

`ProductId`

column. The primary key of the **table consists of the**

`OrderDetails`

`OrderId`

and `ProductId`

columns.One order can consist of one or more products. Hence, one row from the ** Orders** table can be related to one or more rows from the

**table. Also, one product can be in zero or more orders. Hence, one row from the**

`OrderDetails`

**table can be related to zero or more rows from the**

`Products`

**table.**

`OrderDetails`

Now, let’s insert some data into our tables.

The ** Customers** table:

CustomerId | FirstName | LastName | Street | HouseNo | City | Country | PhoneNo | |
---|---|---|---|---|---|---|---|---|

1 | Rachel | Hartley | Bicetown Road | 602 | New York | USA | rh@email.com | 0123456789 |

2 | Caitlyn | Ray | Friedrichstrasse | 44 | Berlin | Germany | cr@email.com | 0987654321 |

3 | Andrew | Duncan | Lairg Road | 38 | London | NULL | ad@email.com | 0567432678 |

4 | Taylor | Jenkins | Park Row | 106 | Edinburgh | UK | NULL | 0876345123 |

5 | Ben | Holland | Williams Avenue | 252 | Los Angeles | USA | bh@email.com | 0987456789 |

The

table (the date column is in the format DD-MM-YYYY):**Orders**

OrderId | CustomerId | OrderDate |
---|---|---|

45 | 1 | 10-10-2021 |

46 | 2 | 11-12-2020 |

47 | 3 | 05-05-2021 |

48 | 4 | 09-08-2021 |

49 | 5 | NULL |

50 | 1 | 02-06-2021 |

51 | 2 | 07-07-2021 |

The

table:**OrderDetails**

OrderId | ProductId | Quantity |
---|---|---|

45 | 100 | 2 |

45 | 101 | 3 |

46 | 100 | 1 |

47 | 102 | 4 |

48 | 101 | 3 |

48 | 103 | 5 |

49 | 104 | 2 |

50 | 100 | 3 |

51 | 101 | 1 |

The

table:**Products**

ProductId | Name | UnitPrice | AvailableInStock |
---|---|---|---|

100 | Keyboard | 30.00 | 300 |

101 | USB Drive | 20.00 | 450 |

102 | Mouse | 20.00 | 500 |

103 | Screen | 100.00 | 450 |

104 | Laptop | 600.00 | 200 |

We are now ready to start with the examples.

### Examples With COUNT()

We start with the ** Customers** table. Let’s find out how many customers there are by country.

SELECT Country, COUNT(CustomerId) AS NumberOfCustomers FROM Customers GROUP BY Country;

Country | NumberOfCustomers |
---|---|

NULL | 1 |

Germany | 1 |

UK | 1 |

USA | 2 |

We have selected all distinct values from the `Country`

column, including the `NULL`

value. The `NumberOfCustomers`

column stores the number of customers for each value of the `Country`

column.

What happens if we use the `Email`

column as an argument to the `COUNT()`

function?

SELECT Country, COUNT(Email) AS NumberOfCustomers FROM Customers GROUP BY Country;

Country | NumberOfCustomers |
---|---|

NULL | 1 |

Germany | 1 |

UK | 0 |

USA | 2 |

The value of the `NumberOfCustomers`

column for the country “UK” becomes zero. This is because the `Email`

column in the ** Customers** table is

`NULL`

for this customer.Now, let’s look at an example that uses the `GROUP BY`

and `HAVING`

clauses.

SELECT Country, COUNT(Email) AS NumberOfCustomersWithEmail FROM Customers WHERE Country IS NOT NULL GROUP BY Country HAVING COUNT(Email) > 1;

Country | NumberOfCustomersWithEmail |
---|---|

USA | 2 |

As before, we select the values from the Country column and obtain the count of customers with emails by country. In the `WHERE`

clause, we state we don’t consider `NULL`

values for the `Country`

column. Next, we group our data by `Country`

. Finally, we restrict the `NumberOfCustomersWithEmail`

column values to be greater than 1 with a `HAVING`

clause.

### Examples With SUM()

Let’s check how much all the available products are worth.

SELECT SUM(UnitPrice * AvailableInStock) AS AllProductsValue FROM Products;

AllProductsValue |
---|

193000 |

Here, the `SUM()`

aggregate function creates a value of `UnitPrice * AvailableInStock`

for each row and then adds up all those values.

Let’s say each order of value greater than $100.00 qualifies for a discount. We want to find out which orders qualify for the discount.

SELECT OrderId, CASE WHEN OrderValue > 100 THEN 1 ELSE 0 END AS QualifiesForDiscount FROM ( SELECT aod.OrderId AS OrderId, SUM(aod.Quantity * ap.UnitPrice) AS OrderValue FROM Products ap JOIN OrderDetails aod ON ap.ProductId = aod.ProductId GROUP BY aod.OrderId );

OrderId | QualifiesForDiscount |
---|---|

45 | 1 |

46 | 0 |

47 | 0 |

48 | 1 |

49 | 1 |

50 | 0 |

51 | 0 |

The inner query selects all the `OrderId`

column values and calculates the value of each order using the `SUM()`

function. The outer query uses a `CASE`

statement to decide whether the order qualifies for a discount (`1`

) or not (`0`

).

Now, say we define all products with a unit price greater than $90.00 to be expensive. Let’s find out the total value of all expensive products in stock.

SELECT SUM(CASE WHEN UnitPrice > 90 THEN UnitPrice * AvailableInStock END) AS ExpensiveProductsValue FROM Products;

ExpensiveProductsValue |
---|

165000 |

We have passed a `CASE`

statement as an argument to the `SUM()`

function. This argument ensures that only the rows with the `UnitPrice`

value greater than $90.00 are considered. Other than that, this example is quite similar to the first one in this section.

You can find more examples of using SUM() with GROUP BY in our article How to Use SUM() with GROUP BY: A Guide with 8 Examples.

### Examples With AVG()

Let’s check what the average price of an order is.

SELECT AVG(OrderValue) AS AvgOrderValue FROM ( SELECT aod.OrderId AS OrderId, SUM(aod.Quantity * ap.UnitPrice) AS OrderValue FROM Products ap JOIN OrderDetails aod ON ap.ProductId = aod.ProductId GROUP BY aod.OrderId );

AvgOrderValue |
---|

300 |

The inner query outputs the total value of the order for each order. The outer query calculates the average value of an order.

We can also find out the average quantity ordered by product.

SELECT ROUND(AVG(Quantity), 2) AS AvgOrderQuantity FROM OrderDetails;

AvgOrderQuantity |
---|

2.67 |

On average, our customers buy between 2 and 3 items of any given product in an order.

Let’s see what changes when we consider only the unique values of the `Quantity`

column.

SELECT ROUND(AVG(DISTINCT Quantity), 2) AS AvgOrderQuantity FROM OrderDetails;

AvgOrderQuantity |
---|

3 |

The output value changes, because we no longer consider duplicate values that appear in the `Quantity`

column of the ** OrderDetails** table.

### Examples With MAX() and MIN()

Last but not least! The `MAX()`

and `MIN()`

functions are pretty straightforward. Let’s find out the oldest and the most recent orders.

SELECT MIN(OrderDate) AS EarliestOrder, MAX(OrderDate) AS LatestOrder FROM Orders;

EarliestOrder | LatestOrder |
---|---|

11-12-2020 | 10-10-2021 |

The `MIN()`

function returns the oldest date, and the `MAX()`

function returns the most recent date.

We can also identify the cheapest and the most expensive products. You can search the ** Products** table to do this.

SELECT MIN(UnitPrice) AS CheapestProductPrice, MAX(UnitPrice) AS MostExpensiveProductPrice FROM Products;

CheapestProductPrice | MostExpensiveProductPrice |
---|---|

20 | 600 |

Let’s find out how many orders there are per customer, then get the minimum and the maximum number of orders per customer.

SELECT MIN(NumberOfOrders) AS MinNumberOfOrders, MAX(NumberOfOrders) AS MaxNumberOfOrders FROM ( SELECT CustomerId, COUNT(OrderID) AS NumberOfOrders FROM Orders GROUP BY CustomerId );

MinNumberOfOrders | MaxNumberOfOrders |
---|---|

1 | 2 |

The inner query selects the `CustomerId`

column and the total number of orders placed by a given customer. The `COUNT(OrderId)`

function counts the number of orders per customer. Next, we group our data by the `CustomerId`

column using a `GROUP BY`

clause. In this step, the `COUNT(OrderId)`

function counts the orders by customer and not for all customers together. The outer query selects the minimum and the maximum values of the `NumberOfOrders`

column from the inner query.

## SQL Aggregate Functions as an Essential Tool in Data Science

By using aggregate functions, we can easily find answers to specific questions, such as *how many customers there are *or *what the average price of an order is*. SQL aggregate functions let us analyze data efficiently.

These core SQL aggregate functions are very helpful in data science. With them, you can organize data the way you want and extract the information you need.

We have reviewed many examples with the `COUNT()`

, `SUM()`

, `AVG()`

, `MIN()`

, and `MAX()`

aggregate functions. We have also covered examples of `*`

, the `DISTINCT`

keyword, and the `CASE`

statement as arguments to aggregate functions. Now, you are ready to create your database and practice some more!