20th Dec 2022 8 minutes read How to Use WHERE with GROUP BY in SQL Nicole Darnley sql GROUP BY WHERE Table of Contents The WHERE Clause The GROUP BY Clause Using WHERE and GROUP BY Together WHERE and GROUP BY - Where to Go From Here In this article, we’ll discuss how to combine the WHERE and GROUP BY clauses in SQL. If you’re writing SQL on a daily basis, you will quickly realize how often both WHERE and GROUP BY clauses are used. WHERE is an essential part of most queries. It allows you to filter large datasets to only the pieces you are interested in. GROUP BY is one of the most powerful tools an analyst has when aggregating data. By the end of this article, you will understand how to effectively use both of these clauses while also avoiding common pitfalls. Upgrade your SQL skills with our SQL Practice track! This track consists of 9 SQL practice courses (and we keep adding more!) and offers over 900 hands-on practice exercises. Each course is interactive, and the exercises are based on real-world scenarios. Don't miss this opportunity to practice SQL in a structured and interactive way. When it comes to writing SQL, consider each clause (SELECT, WHERE, GROUP BY, etc.) as a separate tool. As you learn more SQL syntax, you add that tool to your tool kit. As you can imagine, a mechanic with only a few tools will not be very effective. I would not recommend you take your car to someone with only a handful of tools. It’s the same with data analytics. The more tools you have at your disposal, the more quickly and efficiently you can analyze different data sets. Let’s get started by taking an in-depth look at WHERE and GROUP BY separately. After we’ve established this foundational knowledge, we’ll combine the two and unleash their full power. The WHERE Clause As stated previously, the WHERE clause is used to filter a dataset and return only the records that match some specified criteria. Consider the following dataset, which includes country populations for 2022. For illustration purposes, we’ll say this data is housed in a table called world_populations and only includes the top 10 most populous countries. We’ll take a look at this data by running: SELECT * FROM world_populations; Here is the result: countrypopulationyearly_changenet_changemed_ageurban_pop_pctworld_share China1,439,323,7760.00395,540,09038.61.1847 India1,380,004,3850.009913,586,63128.35.1770 United States331,002,6510.00591,937,73438.83.0425 Indonesia273,523,615.01072,898,04730.56.0351 Pakistan220,892,340.024,327,02223.35.0283 Brazil212,559,41700721,509,89033.88.0273 Nigeria206,139,589.02585,175,99018.52.0264 Bangladesh164,689,383.01011,643,22228.39.0211 Russia145,934,462.000462,20640.74.0187 Mexico128,932,753.01061,357,22429.84.0165 OK, we’ve got our data. Now let’s add some filters. Example #1 In our first example, we only want to see countries where the population is greater than 200 million. We’ll accomplish this by running the following query: SELECT * FROM world_populations WHERE population > 200000000; And the result: countrypopulationyearly_changenet_changemed_ageurban_pop_pctworld_share China14393237760.0039554009038.61.1847 India13800043850.00991358663128.35.1770 United States3310026510.0059193773438.83.0425 Indonesia273523615.0107289804730.56.0351 Pakistan220892340.02432702223.35.0283 Brazil2125594170.72%15098903388%2.73% Nigeria2061395892.58%51759901852%2.64% We can see that the WHERE clause filtered out Bangladesh, Russia, and Mexico because their populations fall below the threshold established in our query. Example #2 The WHERE clause can also support multiple filters. Let’s only list countries that saw a positive yearly change in the number of migrants and where at least 80% of the population lived in urban areas. Note that we’ve used AND in the WHERE clause to connect these two conditions: SELECT * FROM world_populations WHERE migrants > 0 AND urban_pop_pct > .80; This query returns: countrypopulationyearly_changenet_changemed_ageurban_pop_pctworld_share United States3310026510.0059193773438.83.0425 Brazil2125594170.0072150989033.88.0273 There is no limit to the amount of filtering you can do in a WHERE clause. For more details, check out our Complete Guide to the SQL WHERE Clause The GROUP BY Clause To understand GROUP BY, we’ll use a different dataset called transactions. It contains transactional data for an online retailer: DescriptionProductCategoryQuantityUnitPriceCustomerID KNITTED UNION FLAG HOT WATER BOTTLEKitchen63.3917850 POPPY'S PLAYHOUSE BEDROOMToys62.117850 IVORY KNITTED MUG COSYKitchen61.6513047 BOX OF VINTAGE JIGSAW BLOCKSToys34.9513047 RED COAT RACK PARIS FASHIONClothing34.9513047 YELLOW COAT RACK PARIS FASHIONClothing34.9513047 BLUE COAT RACK PARIS FASHIONClothing34.9513047 Example #1 The GROUP BY operator is used to aggregate data by grouping records that share the same value in a specified field. To answer the question, “How many items were sold in each product category?” we would run the following query: SELECT productCategory, SUM(quantity) as quantity FROM transactions GROUP BY productCategory; Here are the results: ProductCategoryQuantity Kitchen12 Toys9 Clothing9 There are two products in the kitchen product category: KNITTED UNION FLAG HOT WATER BOTTLE and IVORY KNITTED MUG COSY. Each of these products has a quantity of 6; therefore, the total sum for the kitchen product category is 12. There are two products in the toy product category. POPPY'S PLAYHOUSE BEDROOM has a quantity of 6 and BOX OF VINTAGE JIGSAW BLOCKS has a quantity of 3, for a total sum of 9. Finally, there are three products in the clothing category, each with a quantity of 3. This brings the total sum of the clothing category to 9. We will not go through them all, but there are many types of aggregations available to the data analyst. These include SUM(), AVG(), COUNT(), MEDIAN(), MIN(), and MAX(). More information can be found in the article How to Use Aggregate Functions in the WHERE Clause. Example #2 What if we want to aggregate quantities for each product category and customer? We can GROUP BY multiple columns: SELECT customerId, productCategory, SUM(quantity) as quantity FROM transactions GROUP BY customerId, productCategory; And this is the result: CustomerIDProductCategoryQuantity 17850Kitchen6 17850Toys6 13047Kitchen6 13047Toys3 13047Clothing9 We can see that there’s a row for each combination of customer and product category; now we know how many items of each category every customer has purchased. For more information about GROUP BY, check out Using GROUP BY in SQL and GROUP BY in SQL Explained. Using WHERE and GROUP BY Together Now that we’ve laid the foundation, let’s combine WHERE and GROUP BY together. It’s important to remember that the WHERE clause is going to filter the dataset before the GROUP BY clause is evaluated. Also, the WHERE clause will always come before GROUP BY. If you put it after, the query will return an error. Example #1 Consider the same dataset and think about how we would answer the question “What is the average unit price of kitchen products and toys?” First we’ll need to filter for kitchen products and toys. Next, we’ll calculate the average unit price. Here’s the query: SELECT productCategory, AVG(UnitPrice) as AvgUnitPrice FROM transactions WHERE productCategory in ('Kitchen', 'Toys') GROUP BY productCategory; And the result is: ProductCategoryAvgUnitPrice Kitchen2.52 Toys3.05 The average unit price is computed only for kitchen and toy products. All other categories are filtered out of the query. Example #2 Now let’s consider how we would get the total quantity of products by category where the average unit price is greater than 3. To do this, we would run: SELECT productCategory, SUM(quantity) as quantity FROM transactions WHERE unitPrice > 3 GROUP BY productCategory; In this example, the aggregates are computed on the filtered rows. We get the following result: ProductCategoryQuantity Kitchen6 Toys3 Clothing9 Since only the products KNITTED UNION FLAG HOT WATER BOTTLE, BOX OF VINTAGE JIGSAW BLOCKS, RED COAT RACK PARIS FASHION, YELLOW COAT RACK PARIS FASHION, and BLUE COAT RACK PARIS FASHION have unit prices greater than 3, those are the only products included in the aggregation. If we were to exclude the WHERE clause, we would get the following results: ProductCategoryQuantity Kitchen12 Toys9 Clothing9 Here we see the quantities of all products, regardless of the unit price, are totaled. The total quantity is different for all products vs. for products with a unit price over 3. Example #3 What if we want to filter by an aggregated column? This is the job of the HAVING clause. You cannot put an aggregation in the WHERE clause. The HAVING clause is used instead of WHERE when filtering based on aggregate functions. We can illustrate this with another example. Let’s continue with the above example and filter the results by product categories where the average unit price is greater than 3. To accomplish this, we’d write: SELECT productCategory AVG(UnitPrice) as AvgUnitPrice FROM transactions WHERE productCategory in (‘Kitchen’, ‘Toys’) GROUP BY productCategory HAVING AVG(UnitPrice) > 3; Result: ProductCategoryAvgUnitPrice Toys3.05 The first thing that is happening is that the WHERE clause limits the data to just the kitchen and product categories. The GROUP BY then aggregates the average unit price for each category. Finally, the HAVING clause further filters the results to include only product categories with an average unit price greater than 3. If we were to remove the HAVING clause, we would see that both the kitchen and toy categories would be seen in the result. However, since the average unit price of kitchen items is less than 3, it is filtered out when we add the HAVING clause. For more examples, read our article on HAVING vs. WHERE in SQL. WHERE and GROUP BY - Where to Go From Here Great work! You’ve gained the foundational knowledge needed to combine WHERE and GROUP BY clauses in SQL. You’ve been introduced to the difference between WHERE and HAVING. More importantly, you can combine all three of these clauses together to filter and aggregate data to fit your needs. You’ve added two more tools to your toolkit and are ready to use them. A great way to reinforce these learnings is by taking our interactive SQL Basics course. Practice makes perfect! Also, be sure to check out all of our other great articles. Tags: sql GROUP BY WHERE