15th Feb 2024 15 minutes read SQL HAVING Clause: A Tutorial for Beginners Maria Durkin GROUP BY Table of Contents The HAVING Clause Example Queries with HAVING Exercise 1: Customers with More Than One Purchase Exercise 2: Recognizing Loyal Customers Exercise 3: Top Products by Sales Exercise 4: Average Rating by Genre Exercise 5: Identifying Users with Extended Login Durations Common Use Cases for HAVING 1. Customer Segmentation 2. Employee Performance Analysis 3. Product Sales Monitoring 4. Event Attendance Tracking 5. Quality Control in Manufacturing Troubleshooting Tips for the HAVING Clause Order of SQL Statements Not Grouping Data Inconsistency of SELECT and GROUP BY Incorrect Use of HAVING Conditions WHERE vs. HAVING Excluding Aggregate Functions from HAVING Ready for Some SQL HAVING Clause Practice? So, you're new to SQL and have just stumbled upon the HAVING clause. But what does it do? Why is it essential? In this article, we'll delve into the basics of the HAVING clause in SQL. And we’ll walk through practical exercises to enhance your understanding of its functionality. Let's start by defining Structured Query Language (SQL). SQL is a powerful and standardized programming language designed specifically for managing and manipulating relational databases; simply put, it is how we communicate with the databases that hold all of our important data. We can perform a variety of tasks with these database systems, including querying, updating, and managing data. SQL is essential for effective data management and analysis. It allows us as users to retrieve specific information, manipulate data in databases, and create database structures. SQL offers a powerful set of tools for extracting relevant information and generating reports in data analytics. You can learn more about these databases in our article What Is an SQL Database? SQL has specific keywords and clauses (commands) that help us perform these actions, much like a written language's grammar and constructs. One such example is the HAVING clause. In SQL, the HAVING clause is useful for refining query results by filtering data groups based on specific conditions. In this article, we'll go over some beginner-friendly exercises with the HAVING clause to get you started; for a more in-depth explanation, see our article The SQL HAVING Clause Explained. If you're thinking about starting your SQL journey but don't know where to begin, our SQL Basics course is an excellent place to start. In this course, you will learn the fundamentals of SQL by retrieving data from databases and creating simple reports using interactive and real-world examples. So, with that said, let's take a deeper look at the SQL HAVING clause. The HAVING Clause The HAVING clause works in tandem with the GROUP BY clause to narrow down results based on aggregate functions. Aggregate functions are SQL tools that allow us to calculate values like the total (SUM), average (AVG), and lowest value (MIN) from groups of rows in our data. Let’s explain these clauses with an example. Imagine we have a sales table that stores our company’s sales reports. We can use the following query to find regions with total sales greater than $10,000: SELECT Region, SUM(TotalSales) as TotalSales FROM Sales GROUP BY Region HAVING SUM(TotalSales) > 10000; Here’s the breakdown: SELECT Region, SUM(TotalSales) as TotalSales: The SELECT clause displays both the column region and the total sum of sales. FROM Sales: The FROM clause specifies the source table – here, sales. GROUP BY Region: The GROUP BY clause groups the data based on the column Region. For example, if we have the values ‘France’, ‘Italy’, and ‘Spain’ in the Region column, we will have three groups. HAVING SUM(TotalSales) > 10000: The HAVING clause specifies that the aggregated data for that column value (i.e. all the sales for ‘Italy’) must have total sales greater than $10,000 to be included in our results set. If a region's total sales were less than this amount, it would not appear in our query results. With HAVING, we can refine query results by filtering aggregated (grouped) data based on the conditions we specify. This allows users to focus on subsets of data that meet specific criteria, allowing for the extraction of meaningful insights from data sets. Perhaps you've heard that the WHERE clause is used in SQL for filtering data. What's the difference between these two clauses? Which one should you use? In SQL, both the HAVING and WHERE clauses are important for filtering data, but they are used at different stages of the query. For a more in-depth look, see our article HAVING vs. WHERE in SQL: What You Should Know? For now, let’s look at some examples. Let's modify the previous query slightly. This time, we're calculating the total sales for each region in 2023: SELECT Region,SUM(TotalSales) as TotalSales FROM Sales WHERE YEAR(SaleDate) = 2023 GROUP BY Region; In this version, the WHERE clause filters individual rows before they are grouped to include only those with a sale date in 2023. Now, let’s include both the WHERE and HAVING clauses: SELECT Region, SUM(TotalSales) as TotalSales FROM Sales WHERE YEAR(SaleDate) = 2023 GROUP BY Region HAVING SUM(TotalSales) > 10000; Here the WHERE clause filters individual rows based on the sale date. After GROUP BY is applied, the HAVING clause ensures that only regions with total sales of more than $10,000 in 2023 are included in the final result. To summarize: WHERE is used to filter individual rows, narrowing down the data before aggregation. HAVING is used to filter groups of rows; it is applied to the groups made by aggregation. Example Queries with HAVING The best way to learn a language is to practice –so in this section, we will look at some practical examples to help explain the HAVING clause in context. If you are completely new to SQL queries, take a look at our SQL Basics Cheat Sheet to help you navigate the following queries. Exercise 1: Customers with More Than One Purchase Exercise: You have an online store's purchase data and the store’s analytics team wants to identify customers who have multiple purchases – i.e. to find the returning customers. Display the customer IDs and the number of purchases they have made. Solution: SELECT customer_id, COUNT(purchase_id) as purchase_count FROM purchases GROUP BY customer_id HAVING COUNT(purchase_id) > 1; Solution Explanation: SELECT customer_id, COUNT(purchase_id) as purchase_count: The SELECT clause displays the customer IDs and the number of purchases. FROM purchases: Specifies the source table as purchases. GROUP BY customer_id: Groups the purchase data by the customer IDs, creating one group for each customer. HAVING COUNT(purchase_id) > 1: The HAVING clause filters the result set, including only customers who have made more than one purchase. Output: customer_idpurchase_count 1035 1092 1113 This query highlights customers who have made more than one purchase – or returning customers. The HAVING clause is critical because it filters out one-and-done customers. Without HAVING, the query would return all customers. Exercise 2: Recognizing Loyal Customers Exercise: The manager of an online clothing store wishes to reward customers who made at least ten purchases in 2023. You have been asked to display the customer ID and number of orders for customers who placed more than 10 orders last year. Additionally, make sure that the customer with the most orders appears at the top of the results. Solution: SELECT customer_id, COUNT(order_id) as order_count FROM orders WHERE YEAR(order_date) = 2023 GROUP BY customer_id HAVING COUNT(order_id) > 10 ORDER BY OrderCount DESC; Solution Explanation: SELECT customer_id, COUNT(order_id) as OrderCount: The SELECT clause displays the customer’s ID and their number of orders. FROM orders: This specifies the source table as orders. WHERE YEAR(order_date) = 2023: This filters the rows to only include orders from 2023. GROUP BY customer_id: This groups the data by the customer ID, creating one group for each customer. HAVING COUNT(order_id) > 10: This clause restricts the result set to customers with more than ten orders. ORDER BY OrderCount DESC: This sorts the results by the number of orders in descending order (10-1) so that the customer with the highest number of orders goes first. Output: customer_idorder_count 121415 113514 115612 This query returns a list of customers who made at least ten purchases last year, with the customer with the most orders at the top of the results. The HAVING clause is important in this scenario because it narrows the results to only loyal customers who have placed more than ten orders. Without the HAVING clause, the query would return all customers, regardless of the number of orders. Exercise 3: Top Products by Sales Exercise: The sales manager of a furniture store wants to identify the top-grossing products. For products with sales greater than $10,000, display the product ID as well as the total amount sold. Also, make sure that the products with the highest sales are displayed at the top of the results. Solution: SELECT product_id, SUM(sales_amount) as total_sales FROM sales GROUP BY product_id HAVING SUM(sales_amount) > 10000 ORDER BY total_sales DESC; Solution Explanation: SELECT product_id, SUM(sales_amount) as total_sales: The SELECT clause displays both the product ID and the total sales. FROM sales: This specifies the source table as sales. GROUP BY product_id: This groups the data by the product ID, creating one group for each product. HAVING SUM(sales_amount) > 10000: The HAVING clause filters the result set to only include products with total sales greater than $10,000. ORDER BY total_sales DESC: This sorts the results by total sales in descending order, ensuring that the products with the most sales appear first. Output: product_idtotal_sales 10218300 10515600 16012200 This query returns a list of top-performing products, displaying products with sales greater than $10,000 and sorting the rows in descending order by total sales. The HAVING clause is required here because it allows us to filter the result set using aggregated data, specifically total sales. Without the HAVING clause, we would be unable to filter the results based on the total sales for each product. The WHERE clause is unsuitable for this task because it filters rows before grouping; we are interested in conditions applied to aggregated data after grouping. Exercise 4: Average Rating by Genre Exercise: A film critic wants to identify movie genres with high average ratings in 2020. You have been asked to display movie genres and average ratings for those genres rated above 4.0. Also, make sure that the genres with the highest average ratings appear at the top. Solution: SELECT genre, AVG(rating) as avg_rating FROM movies WHERE YEAR(release_date) = 2020 GROUP BY genre HAVING AVG(rating) > 4.0 ORDER BY avg_rating DESC; Explanation: SELECT genre, AVG(rating) as avg_rating: The SELECT clause displays the movie genre and the average rating. FROM movies: This specifies the source table as movies. WHERE YEAR(release_date) = 2020: This filters the rows to include only movies released in the year 2020. GROUP BY genre: This groups the data by genre, creating one group for each genre. HAVING AVG(rating) > 4.0: This filters the result set to include only genres with an average rating greater than 4.0. ORDER BY avg_rating DESC: This sorts the results by average rating in descending order, ensuring that the genres with the highest average ratings are at the top. Output: genreavg_rating Drama4.5 Comedy4.2 This query assists the movie critic in identifying exceptional movie genres with an average rating of more than 4.0 in 2020. The result set includes each genre and its average rating, which are sorted in descending order based on the average rating. The HAVING clause is essential because it allows us to filter genres based on aggregated data (the average rating). Exercise 5: Identifying Users with Extended Login Durations Exercise: Using a website's user login data, the IT security team seeks to identify users with unusually long single login durations; this allows them to perform comprehensive security monitoring and analysis. Display the users’ names, email addresses, and the length of their longest single login session. Consider only users whose login times exceed two hours (7200 seconds). Additionally, ensure that the users with the longest login durations appear at the top of the results. Solution: SELECT users.user_name, users.email_address, MAX(logins.login_duration) as longest_login_duration FROM users JOIN logins ON users.user_id = logins.user_id GROUP BY users.user_name, users.email_address HAVING MAX(logins.login_duration) > 7200 ORDER BY longest_login_duration DESC; Explanation: SELECT users.user_name, users.email_address, MAX(logins.login_duration) as longest_login_duration: The SELECT clause displays the user names, email addresses, and the duration of their longest single login session. FROM users: This specifies the source table as users. JOIN logins ON users.user_id = logins.user_id: Performs an inner join between the users and logins tables based on the user ID. GROUP BY users.user_name: This groups the data by the user names, creating one group for each user. HAVING MAX(logins.login_duration) > 7200: The HAVING clause filters the result set, including only users with login durations exceeding 2 hours (7200 seconds). ORDER BY longest_login_duration DESC: This orders the result by the longest login duration, in descending order. Output: user_nameemail_addresslongest_login_duration marty_89ff_7799@mail.com9067 scott_32scott_oc@mail.com7591 rachel_2003ray_hhh@mail.com7231 This query provides the IT security team with users’ names, email addresses, and the length of their longest single login sessions. It only shows those users with extended login durations for increased security scrutiny. In this case, the HAVING clause is important because it allows us to filter the result set based on aggregated data, specifically the maximum login duration achieved by each user. Because we want to identify users who have logged in for more than two hours, the HAVING clause allows us to impose this condition on the maximum login duration. Common Use Cases for HAVING The HAVING clause in SQL is a critical tool in real-world scenarios where data analytics, reporting, and business intelligence are important. Let's look at some common use cases that demonstrate the versatility of HAVING. 1. Customer Segmentation Businesses benefit from understanding different customer groups so they can target their marketing efforts. Creating customer segments allows for tailored strategies, which improves customer satisfaction. Using HAVING, we can filter high-value customers based on purchase thresholds: SELECT customer_id, SUM(purchase_amount) as total_purchase FROM purchases GROUP BY customer_id HAVING SUM(purchase_amount) > 500; 2. Employee Performance Analysis HR professionals can use SQL to help assess employees' performance. This facilitates fair and specific performance evaluations. Using the following query, we can find employees who have completed more than a certain number of projects: SELECT employee_id, COUNT(project_id) as completed_projects FROM projects GROUP BY employee_id HAVING COUNT(project_id) >= 5; 3. Product Sales Monitoring To ensure efficient inventory management, businesses need to track top-performing products. This can help them identify high-selling items and allocate resources more effectively. Using the following query, we can filter products with sales that exceed a certain threshold: SELECT product_id, SUM(sales_quantity) as total_sales FROM sales GROUP BY product_id HAVING SUM(sales_quantity) > 50; 4. Event Attendance Tracking Organizers of events may want to identify highly engaged attendees – i.e. those who have attended more than one event. Finding insights from these attendees can help with future event planning. Using HAVING in the following query enables us to identify highly engaged attendees based on their event count: SELECT attendee_id, COUNT(event_id) as attended_events FROM event_attendance GROUP BY attendee_id HAVING COUNT(event_id) > 3; 5. Quality Control in Manufacturing In manufacturing, quality control is essential for producing high-quality products. By specifying thresholds in their SQL queries, the team can ensure the quality of their products. Using HAVING in the following query, we can identify products with defect rates less than the specified limit: SELECT product_id, AVG(defect_rate) as avg_defect_rate FROM manufacturing_quality GROUP BY product_id HAVING AVG(defect_rate) > 0.02; Troubleshooting Tips for the HAVING Clause Using the HAVING clause in SQL can sometimes result in common errors, especially if you are just getting started. Here are some troubleshooting tips to address possible issues: Order of SQL Statements When first starting out with SQL queries, it is common to make mistakes with the order of the statements. Always ensure your query follows this order: SELECT column1, COUNT(column2) as count_column2 FROM my_table WHERE condition GROUP BY column1 HAVING COUNT(column2) > 10 ORDER BY column1; Not Grouping Data When using the HAVING clause, the data must first be grouped using the GROUP BY clause. The following query would result in an error because it doesn’t have GROUP BY: SELECT category, COUNT(product_id) as product_count FROM products HAVING COUNT(product_id) > 5; Inconsistency of SELECT and GROUP BY The nonaggregated column in the SELECT statement must be included in the GROUP BY clause. The example below demonstrates that because the category is unaggregated, it must be included in the GROUP BY clause. SELECT category, COUNT(product_id) as product_count FROM products GROUP BY category HAVING COUNT(product_id) > 5; Incorrect Use of HAVING Conditions The following query will execute without error, but it doesn’t follow best practices. The HAVING clause is used to filter results based on aggregated values, so including non-aggregated columns or incorrect conditions contradicts its intended purpose. SELECT category, COUNT(product_id) as product_count FROM products GROUP BY category HAVING category = 'Electronics'; Although the query executes successfully, it is recommended to use the WHERE clause for filtering non-aggregated columns. This complies with best practices and ensures the clarity of your SQL code. WHERE vs. HAVING Errors may occur as a result of confusion between WHERE and HAVING. To avoid this, remember that WHERE is for rows and HAVING is for groups. In the previous query with the error, we could more appropriately use WHERE rather than HAVING: SELECT category, COUNT(product_id) as product_count FROM products WHERE category = 'Electronics' GROUP BY category; Excluding Aggregate Functions from HAVING When filtering based on aggregated values, remember to include the aggregate function in the HAVING clause. In the following example, product_id should be COUNT(product_id) in the HAVING clause: SELECT category, COUNT(product_id) FROM products GROUP BY category HAVING product_id > 5; There are many more possible SQL syntax errors. The good news is that, with practice, you will quickly learn how to avoid them. When you encounter an error, pay attention to the error message, which can help you identify the specific problem. Ready for Some SQL HAVING Clause Practice? The SQL HAVING clause is a valuable tool for optimizing query results in data analysis, reporting, and Business Intelligence. Mastering this clause enables SQL users to extract valuable insights from aggregated data. As you progress through your SQL journey, don't be afraid to make mistakes while writing queries with the HAVING clause; it's all part of the learning process! The good news is, you're not alone. There is plenty of material available on LearnSQL.com to help you master the HAVING clause. For more examples, see our SQL HAVING Tutorial. To go even further, check out our SQL from A to Z track. It covers everything SQL has to offer, including the all-important HAVING clause. This track allows you to combine theoretical knowledge with hands-on practice by writing your own SQL queries in an interactive console and receiving instant feedback. So what are you waiting for? Let’s get started! Tags: GROUP BY