21st Nov 2023 Updated: 29th Oct 2024 10 minutes read

SQL Aggregate Functions: A Comprehensive Guide for Beginners

Table of Contents

Common SQL Aggregate Functions
What is Data Aggregation in SQL
Using Aggregate Functions with GROUP BY
Real-world Examples of Using Aggregate Functions
Where to Learn More

Are you curious about SQL aggregate functions and how they apply to real-world scenarios? In this article, we'll explore SQL aggregate functions, showcase real-world examples, and offer valuable insights on how to master them.

Data aggregation is the process of taking several rows of data and condensing them into a single result or summary. When dealing with large datasets, this is invaluable because it allows you to extract relevant insights without having to scrutinize each individual data point.

So, what exactly are SQL aggregate functions? They are specialized functions that perform calculations on groups of variables and return a single result. Unlike traditional functions, aggregate functions work on groups of rows of data. This allows you to efficiently compute statistics or generate summary information from a dataset.

In this article, we will look at the importance of SQL aggregate functions and how to use them. We’ll explain them using real-world examples. Sounds interesting, right? Then check out our course on Standard SQL Functions. It’s a great starting place for learning how to process data using SQL's most used functions!

Common SQL Aggregate Functions

In the world of SQL, aggregate functions are essential tools for efficient data summarization and analysis. They have a unique ability to distill large datasets into meaningful insights, to facilitate statistical analysis, and to simplify complex data structures.

The most frequently used SQL aggregate functions are:

SUM()
COUNT()
AVG()
MIN()
MAX()

Let's explore each of them below.

SUM()

The SUM() function returns the total of a numerical column. It is typically used when you need to find the total of values such as sales income, quantities, or expenses.

Imagine that you want to know your company's entire revenue from sales; you can do so by running the following query:

SELECT SUM(sales_amount) as total_revenue
FROM sales_data;

COUNT()

Next in line is the COUNT() function. The objective of this function is to count the number of rows in a table or the number of non-null values in a column.

Suppose you want to find out how many products are sold in your store; you can use the following query:

SELECT COUNT(*) as total_products
FROM products;

In this query, we employ the asterisk (*) to indicate all columns. And we’re counting all rows in the query. See here for more examples of using the COUNT() function

AVG()

When you need to calculate the average (mean) value of a numeric column, the AVG() function is your go-to. It is useful when looking for the average price, rating, units sold, and so on. This query, for example, would be beneficial if you run a subscription-based streaming service and want to discover the average length of user sessions on your platform:

SELECT AVG(session_duration) as average_session_duration
FROM user_sessions;

MIN()

The MIN() function returns the smallest value within a column. It's especially useful for locating the lowest or smallest value in a dataset.

Consider a manager attempting to identify the fastest response time to a customer question for support tickets; the following query can return the answer:

SELECT MIN(response_time) as shortest_response_time
FROM customer_support_tickets;

MAX()

On the other hand, the MAX() function returns the largest value within a column. It is useful for determining the highest value in a dataset.

Suppose you own an e-commerce website and want to know the highest order value placed by a customer in your online store. This is the query you’d write:

SELECT MAX(order_value) as max_order_value
FROM customer_orders;

For more examples of both the MIN() and MAX() functions, check out SQL MIN() and MAX() Functions Explained.

What is Data Aggregation in SQL

Data aggregation is a powerful technique for summarizing and analyzing information from a dataset. It enables you to gain useful insights by doing computations on groupings of linked data, removing the need for the time-consuming inspection of individual records.

To make this concept more concrete, consider the following real-world use cases:

Sales: Understanding how different product categories contribute to total sales is critical for a store owner. You can determine which product category makes the most money by aggregating sales data for each product category.
Student Grades: If you're a teacher, you might want to figure out the average grade for each subject in your class. By aggregating student grade data, you can determine subject-wise averages and identify areas for improvement.
Website Traffic: Suppose you run a website and want to know which pages get the most traffic. You can find the most popular pages and manage resources more effectively by aggregating website traffic statistics.
Employee Salaries: Imagine you're an HR manager in a company looking for insights into the salaries across different departments. SQL can help you aggregate and analyze the data for each department.

Using Aggregate Functions with GROUP BY

For the most part, using SQL’s aggregate functions means using the GROUP BY clause. This powerful combination allows you to execute aggregate computations on specific groupings of data in your dataset.

The SQL GROUP BY clause separates your dataset into groups based on matching values in one or more columns; this allows you to apply aggregate functions to each group independently. Below is an overview of the GROUP BY syntax:

SELECT 
  column1, 
  column2, 
  aggregate_function(column3)
FROM table_name
GROUP BY column1, column2;

In this query, data with matching values in column1 and column2 will become the groups; aggregate_function(column3) is the aggregate function you want to apply to the values of column3 within each group. Let's look at a few examples.

Example 1: Average Salary per City

Imagine you have an employee database that includes information on employees' names, cities, and salaries. Your goal is to find out what the average salary is in each city. You can do so with the following query:

SELECT 
  city, 
  AVG(salary) as average_salary
FROM employee_data
GROUP BY city;

Now, let's break down the query step by step:

Step 1: Grouping by City

The database kicks off the process by organizing employees into groups based on their city of residence. Each city group is visually represented below by a different color:

Group 1: New York (Yellow)
Group 2: San Francisco (Red)
Group 3: Los Angeles (Green)

employee_name	city	salary
John	New York	60000
Jane	New York	55000
Bob	San Francisco	70000
Alice	San Francisco	72000
Mike	Los Angeles	62000
Emily	Los Angeles	64000

Step 2: Calculating Average Salaries

Within each city group, the database calculates the average salary by applying the AVG() aggregate function to the salary column.

Final Result:

This SQL query yields the average salary for each city, as illustrated below. Executing this query provides valuable insights into compensation trends and disparities across the cities where your employees reside. These insights can be instrumental in making informed decisions regarding salary adjustments, cost-of-living considerations, and more.

city	average_sale
New York	57500
San Francisco	71000
Los Angeles	63000

Example 2: Total Sales per Product Category

Suppose you have a sales dataset with product categories and sales quantities. Your objective is to determine the total sales for each product category. Here's the query:

SELECT 
  product_category, 
  SUM(sales_amount) as total_sales
FROM sales_data
GROUP BY product_category;

This query groups sales data by product category and computes total sales amounts for each group. It provides useful insights into the most profitable product categories. More examples of this combination can be found in How to Use SUM() with GROUP BY: A Detailed Guide.

Example 3: Count Orders per Year and Month

Let's say you run an e-commerce platform with a database that includes order dates. You want to keep track of the amount of orders placed each year and month. The following query can achieve this:

SELECT 
  EXTRACT(YEAR FROM order_date) as order_year, 
  EXTRACT(MONTH FROM order_date) as order_month, 
  COUNT(*) as order_count
FROM orders
GROUP BY order_year, order_month
ORDER BY order_year, order_month;

This query groups orders by year and month, providing a timeline of order counts. It's quite useful for tracking seasonal changes and estimating monthly order numbers. Similar real-world examples may be found in How to Use COUNT() with GROUP BY.

In each of these examples, the GROUP BY clause splits the data into groups based on certain columns and the aggregate functions (AVG(), SUM(), and COUNT()) execute computations on each group. More examples can be found in our article Using GROUP BY in SQL.

Real-world Examples of Using Aggregate Functions

Now that we've covered the fundamentals of aggregate functions and the GROUP BY clause, let’s see how we can use them in additional real-world examples.

1. Latest Sale per Product Category

Imagine you have a sales database and want to find the most recent order date for each product category. To achieve this, we need to group the data by the column product_category and use the MAX() function to select the most recent (i.e. largest) order date:

SELECT 
  product_category, 
  MAX(order_date) as most_recent_order_date
FROM sales_data
GROUP BY product_category;

Analyzing the most recent order date for each product category aids in the identification of current market trends and product demand.

2. Sales per Year and Product Category

A more extensive analysis may necessitate grouping data by multiple columns. For example, suppose you need to know the total sales for each product category and year. Here’s the query:

SELECT 
  product_category, 
  EXTRACT(YEAR FROM order_date) as sales_year, 
  SUM(sales_amount) as total_sales
FROM sales_data
GROUP BY product_category, sales_year;

This query groups data by two columns, allowing us to analyze sales by both product category and year and thus providing more detailed and granular insights.

3. Product Categories with High Sales

Maybe you may need to filter groups depending on the output of an aggregate function. In such instances, the HAVING clause is what you need. Suppose you want to find product categories with total sales of more than $10,000:

SELECT 
  product_category, 
  SUM(sales_amount) as total_sales
FROM sales_data
GROUP BY product_category
HAVING SUM(sales_amount) > 10000;

The HAVING clause filters the grouped results, displaying those product categories with a total sales greater than the provided threshold.

It is critical to note that aggregate functions cannot be used in the WHERE clause. Aggregate functions act on data that has been grouped using the GROUP BY clause, whereas the WHERE clause filters individual rows before they are grouped.

4. Counting Unique Customers

Counting distinct values in a column is essential for various analyses. For example, you might want to find the number of unique customers:

SELECT 
  COUNT(DISTINCT customer_id) as unique_customers
FROM sales_data;

This query counts the number of unique customer IDs in the sales data, which might be useful when working with datasets that contain duplicates.

5. Advanced Usage of COUNT() Function

Understanding the difference between COUNT(*) and COUNT(expression) is important. COUNT(*) counts all rows in a group, whereas COUNT(expression) counts only the non-NULL values of the supplied expression within a group. NULL values represent missing or unknown data in a dataset.

Let’s look at an example:

SELECT 
  product_category,
  COUNT(*) as total_orders,
  COUNT(discount) as discounted_orders
FROM sales_data
GROUP BY product_category;

Here, COUNT(*) counts all orders in each product category, including those with NULL values in the discount column. COUNT(discount) on the other hand, counts only orders with non-NULL values in the discount column within each category. Check out our article What is the Difference Between COUNT(*), COUNT(1), COUNT(column name), and COUNT(DISTINCT column name)? for more examples of COUNT() function usage.

6. Sales Statistics for Product Categories

SQL allows you to use numerous aggregate functions in a single query to get a complete picture of your data. For example, let's find the total sales, average price, and minimum price for each product category:

SELECT 
  product_category,
  SUM(sales_amount) as total_sales,
  AVG(product_price) as avg_price,
  MIN(product_price) as min_price
FROM sales_data
GROUP BY product_category;

This query effectively extracts multiple insights from your data and at the same time provides a concise summary.

Where to Learn More

SQL aggregate functions are powerful tools for summarizing and evaluating data in relational databases. Whether you're new to SQL or an experienced user, knowing how to use aggregate functions effectively will help you extract useful insights from your data, make data-driven decisions, and ultimately improve your business performance.

Acquiring SQL skills can provide a competitive advantage in today's data-driven business market. If you want to dive deep into SQL, I recommend our All Forever SQL Package. This offer contains all current and future SQL courses in four different SQL dialects. Here are some other highlights from our offer:

SQL from A to Z. This track’s seven courses and almost 900 exercises offer a complete learning path for modern SQL, from the basics to advanced concepts.
SQL Practice. This track is for those looking for lots of interactive SQL practice. It contains nine interactive courses and over 800 hands-on exercises – and we keep adding more.
Monthly SQL Practice. Each month, we publish a new SQL practice course at a basic or advanced level.

Happy learning!

Tags: