30th Mar 2023 9 minutes read How to Use CASE WHEN in GROUP BY Nicole Darnley GROUP BY CASE WHEN Table of Contents A Brief Review of CASE WHEN Using CASE WHEN with GROUP BY Example 1: Aggregations of Custom Categories Example 2: CASE WHEN With ELSE in GROUP BY Example 3: CASE WHEN Statements Inside Aggregations Example 4: CASE WHEN in GROUP BY What’s Next with CASE WHEN and GROUP BY? Learn how you can combine SQL CASE WHEN and GROUP BY to create custom categories in your SQL queries. Raw data, by its very nature, is not always human readable. Many times, the data you’re querying is in its most unformatted form. Examples of this include codes for different business departments or product SKUs that represent specific products. To the naked eye, these codes mean nothing, so pulling them into a report is not helpful for the person reading them. There are also situations when raw data should be rolled up to higher levels to make it more easily digestible. For example, a list of 50 products could be rolled up to 5 product categories, making your report much easier to read and understand. In these types of situations, we can use the CASE WHEN and GROUP BY statements to format data and add metadata to our original dataset. We will explore how these statements work throughout this article. For a more in-depth look into these concepts, take our Creating Basic SQL Reports, which will walk you through all the nuances of CASE and GROUP BY. It’s a great follow-up to this article. First, let’s review the CASE WHEN statement with a few examples. If you’d like more information on the CASE statement, be sure to read our article How to Use CASE in SQL. A Brief Review of CASE WHEN You can think of the CASE WHEN statement as if..then logic for your query. It evaluates conditions and if it finds the condition to be true, it will return a specific defined result. There are three important pieces to CASE statements in SQL: CASE WHEN, THEN, and END. Each of these is needed or your query will return an error. You begin the statement with CASE WHEN to define your logical condition. After that you use THEN to define the value if that condition is true. After your last THEN statement, you use END to close the clause. Another optional clause is the ELSE clause. If all logical conditions in the CASE WHEN statement fail, you can use ELSE to assign a value to this data. It is effectively a catch-all. If your data fails all your conditions and you do not use ELSE, the data will return a NULL value. This is what CASE WHEN syntax looks like: CASE WHEN product = ‘Shirt’ THEN ‘Clothing’ WHEN product = ‘Hat’ THEN ‘Accessories’ ELSE ‘Other’ END If the product is a shirt, the CASE WHEN assigns it to the Clothing category. If the product is a hat, the CASE WHEN assigns it to the Accessories category. Otherwise, the CASE WHEN assigns the product to the Other category. To illustrate further, consider the following data found in the cities table: citystateprice_to_income_ratiomortgage_as_pct_of_incomehomeowner_pctpopulation Santa BarbaraCA13.3103.753%88,000 BrooklynNY11.289.930%2,533,862 QueensNY11.191.345%2,271,000 New YorkNY10.485.924%8,468,000 OaklandCA9.477.541%433,800 SunnyvaleCA9.376.545%152,300 San DiegoCA8.266.354%1,382,000 San FranciscoCA9.273.238%815,200 Long BeachCA8.569.641%456,000 BuffaloNY6.55343%276,800 If we wanted to categorize each city’s population into low, medium, or high we would run: SELECT city, population, CASE WHEN population < 500000 THEN ‘Low’ WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ WHEN population >= 1500000 THEN ‘High’ END as population_level FROM cities In this query, we’re adding conditional logic based on the population field. If the population is less than 500,000, then we assign the value of Low. If the population is between 500,000 and 1,500,000, then we assign the value Medium. Finally, if the population is greater than 1,500,000, we assign the value High. We are then aliasing the column as population_level. The returned result looks like this: citypopulationpopulation_level Santa Barbara88,000Low Brooklyn2,533,862High Queens2,271,000High New York8,468,000High Oakland433,800Low Sunnyvale152,300Low San Diego1,382,000Medium San Francisco815,200Medium Long Beach456,000Low Buffalo276,800Low What if we wanted to find out the average population for each population_level? In this scenario, we can accomplish this by including an aggregation and GROUP BY clause. We’ll demonstrate this in the next section. Using CASE WHEN with GROUP BY Example 1: Aggregations of Custom Categories Now that we have our custom category of population_level, we can calculate different metrics for it. In this example, we’ll calculate the average population for each population_level. We’ll use the same CASE WHEN statement as above, add in an aggregation for average, and then GROUP BY using the same syntax. Let’s run this query: SELECT CASE WHEN population < 500000 THEN ‘Low’ WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ WHEN population >= 1500000 THEN ‘High’ END as population_level, AVG(population) as average_population FROM cities GROUP BY CASE WHEN population < 500000 THEN ‘Low’ WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ WHEN population >= 1500000 THEN ‘High’ END We’ve used the same population category as in the query above. We’ve added a calculation for the average population. Note that we have repeated the same CASE WHEN expression in SELECT and in GROUP BY. When this query is run, the data is first evaluated in the logical conditions of the CASE WHEN statement and assigned a value for population_level. Then the average is computed across each of these levels with the GROUP BY. The returned results look like this: population_levelaverage_population Low281,380 Medium1,098,600 High4,424,287 If you need a refresher on the GROUP BY clause, read this article on GROUP BY in SQL. Example 2: CASE WHEN With ELSE in GROUP BY Another way to write this query would be to utilize the ELSE clause. You would define the first two population levels and then use ELSE to bucket every other city into high. This would look like the following query: SELECT CASE WHEN population < 500000 THEN ‘Low’ WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ ELSE ‘High’ END as population_level, AVG(population) as average_population FROM cities GROUP BY CASE WHEN population < 500000 THEN ‘Low’ WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ WHEN population >= 1500000 THEN ‘High’ END We don’t necessarily need to define the logic for high-population cities; if the city fails the first two logic statements, then the population is greater than 1,500,000. Example 3: CASE WHEN Statements Inside Aggregations We can also place CASE WHEN statements inside of aggregate functions to count the number of records that meet certain conditions. Continuing with the same example, let’s consider how we would count the number of cities in each population_level. We will need three separate CASE WHEN statements for this example, one for each condition that we’ve defined in our CASE WHEN statement: low, medium, and high. SELECT SUM(CASE WHEN population < 500000 THEN 1 ELSE 0 END) as low_pop_ct, SUM(CASE WHEN population >= 500000 and population < 1500000 THEN 1 ELSE 0 END) as medium_pop_ct, SUM(CASE WHEN population >= 1500000 THEN 1 ELSE 0 END) as high_pop_ct FROM cities Let’s walk through this line by line. In the first line, we’re evaluating if the city has a population less than 500,000. If it does, we’re assigning the value of 1 to that city. If it does not, it is assigned 0 by the ELSE clause. Then we’ve wrapped that entire CASE WHEN statement in a SUM(). This effectively counts the rows for cities with low populations. We then repeat this pattern in the next two lines, using the same logical conditions we used earlier to assign the value of medium and high to cities based on their population. This query will return: low_pop_ctmedium_pop_cthigh_pop_ct 523 Now we have a count of the number of cities that fall into each category. Interested in learning more? Check out how to use CASE WHEN with SUM() for more examples. Example 4: CASE WHEN in GROUP BY Let’s take a look at a few more examples using CASE WHEN in GROUP BY. The data below is contained in the table products. skudescriptionpricestatus 978568952cowl neck sweater59in stock 978548759embroidered v neck blouse49in stock 978125698notched collar button down blazer79in stock 979156258oversized stripe shirt29sale 979145875polka dot maxi dress109back ordered 978457852rib knit t shirt19sale 978333562cropped denim jacket99back ordered 978142154sleeveless midi dress89in stock 979415858utility jumpsuit59sale 978112546scoop neck sweater49in stock Let’s enhance this data with a new column that contains a product_category. This will allow us to group the individual products into a higher-level category so we can then compute aggregations. In order to add a column for product_category, we’ll run: SELECT *, CASE WHEN description LIKE '%sweater%' OR description LIKE '%blazer%' OR description LIKE '%jacket%' THEN 'Outerwear' WHEN description LIKE '%dress%' OR description LIKE '%jumpsuit%' THEN 'Dresses' WHEN description LIKE '%shirt%' OR description LIKE '%blouse%' THEN 'Tops' END as product_category FROM products We’re using the LIKE operator to search the description field for each string in quotations. The % operator just means that there can be anything before or after. We’re effectively searching each description for any of the key words listed in the WHEN statements. The results are shown below: skudescriptionpricestatusproduct_category 978568952cowl neck sweater59in stockOuterwear 978548759embroidered v neck blouse49in stockTops 978125698notched collar button down blazer79in stockOuterwear 979156258oversized stripe shirt29saleTops 979145875polka dot maxi dress109back orderedDresses 978457852rib knit t shirt19saleTops 978333562cropped denim jacket99back orderedOuterwear 978142154sleeveless midi dress89in stockDresses 979415858utility jumpsuit59saleDresses 978112546scoop neck sweater49in stockOuterwear Now that we have our new product_category field, we can count the number of products in each category by utilizing GROUP BY: SELECT CASE WHEN description LIKE '%sweater%' OR description LIKE '%blazer%' OR description LIKE '%jacket%' THEN 'Outerwear' WHEN description LIKE '%dress%' OR description LIKE '%jumpsuit%' THEN 'Dresses' WHEN description LIKE '%shirt%' OR description LIKE '%blouse%' THEN 'Tops' END as product_category, COUNT(DISTINCT description) as number_of_products FROM products GROUP BY CASE WHEN description LIKE '%sweater%' OR description LIKE '%blazer%' OR description LIKE '%jacket%' THEN 'Outerwear' WHEN description LIKE '%dress%' OR description LIKE '%jumpsuit%' THEN 'Dresses' WHEN description LIKE '%shirt%' OR description LIKE '%blouse%' THEN 'Tops' END product_categorynumber_of_products Outerwear4 Tops3 Dresses3 In this example, we’ve used CASE WHEN in the SELECT clause, but this is not always needed. You could also just run the above query without it: SELECT COUNT(DISTINCT description) as number_of_products FROM products GROUP BY CASE WHEN description LIKE '%sweater%' OR description LIKE '%blazer%' OR description LIKE '%jacket%' THEN 'Outerwear' WHEN description LIKE '%dress%' OR description LIKE '%jumpsuit%' THEN 'Dresses' WHEN description LIKE '%shirt%' OR description LIKE '%blouse%' THEN 'Tops' END number_of_products 4 3 3 What’s Next with CASE WHEN and GROUP BY? As you can see, there are many different scenarios where combining the CASE WHEN and GROUP BY statements are extremely useful. They allow you to add business logic to your data and then compute metrics based on the newly-defined data fields. You can use CASE WHEN statements both outside and inside of aggregations; they follow the same syntax. Start your statement with CASE WHEN to define your conditional logic and then assign values with the THEN/ELSE statements. Finally, close it out with END. Be sure to check out our course on Creating Basic SQL Reports. You’ll learn all the nuances of CASE and GROUP BY and get real life practice problems to complete! Tags: GROUP BY CASE WHEN