Back to articles list Articles Cookbook
9 minutes read

How to Use CASE WHEN in GROUP BY

Learn how you can combine SQL CASE WHEN and GROUP BY to create custom categories in your SQL queries.

Raw data, by its very nature, is not always human readable. Many times, the data you’re querying is in its most unformatted form. Examples of this include codes for different business departments or product SKUs that represent specific products. To the naked eye, these codes mean nothing, so pulling them into a report is not helpful for the person reading them.

There are also situations when raw data should be rolled up to higher levels to make it more easily digestible. For example, a list of 50 products could be rolled up to 5 product categories, making your report much easier to read and understand.

In these types of situations, we can use the CASE WHEN and GROUP BY statements to format data and add metadata to our original dataset. We will explore how these statements work throughout this article. For a more in-depth look into these concepts, take our Creating Basic SQL Reports, which will walk you through all the nuances of CASE and GROUP BY. It’s a great follow-up to this article.

First, let’s review the CASE WHEN statement with a few examples. If you’d like more information on the CASE statement, be sure to read our article How to Use CASE in SQL.

A Brief Review of CASE WHEN

You can think of the CASE WHEN statement as if..then logic for your query. It evaluates conditions and if it finds the condition to be true, it will return a specific defined result.

There are three important pieces to CASE statements in SQL: CASE WHEN, THEN, and END. Each of these is needed or your query will return an error.

You begin the statement with CASE WHEN to define your logical condition. After that you use THEN to define the value if that condition is true. After your last THEN statement, you use END to close the clause.

Another optional clause is the ELSE clause. If all logical conditions in the CASE WHEN statement fail, you can use ELSE to assign a value to this data. It is effectively a catch-all. If your data fails all your conditions and you do not use ELSE, the data will return a NULL value.

This is what CASE WHEN syntax looks like:

CASE WHEN product = ‘Shirt’ THEN ‘Clothing’ 
     WHEN product = ‘Hat’ THEN ‘Accessories’ 
     ELSE ‘Other’
END 

If the product is a shirt, the CASE WHEN assigns it to the Clothing category. If the product is a hat, the CASE WHEN assigns it to the Accessories category. Otherwise, the CASE WHEN assigns the product to the Other category.

To illustrate further, consider the following data found in the cities table:

citystateprice_to_income_ratiomortgage_as_pct_of_incomehomeowner_pctpopulation
Santa BarbaraCA13.3103.753%88,000
BrooklynNY11.289.930%2,533,862
QueensNY11.191.345%2,271,000
New YorkNY10.485.924%8,468,000
OaklandCA9.477.541%433,800
SunnyvaleCA9.376.545%152,300
San DiegoCA8.266.354%1,382,000
San FranciscoCA9.273.238%815,200
Long BeachCA8.569.641%456,000
BuffaloNY6.55343%276,800

If we wanted to categorize each city’s population into low, medium, or high we would run:

SELECT 
  city,
  population, 
  CASE 
   WHEN population < 500000 THEN ‘Low’ 
   WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ 
   WHEN population >= 1500000 THEN ‘High’ 
  END as population_level
FROM cities

In this query, we’re adding conditional logic based on the population field. If the population is less than 500,000, then we assign the value of Low. If the population is between 500,000 and 1,500,000, then we assign the value Medium. Finally, if the population is greater than 1,500,000, we assign the value High. We are then aliasing the column as population_level. The returned result looks like this:

citypopulationpopulation_level
Santa Barbara88,000Low
Brooklyn2,533,862High
Queens2,271,000High
New York8,468,000High
Oakland433,800Low
Sunnyvale152,300Low
San Diego1,382,000Medium
San Francisco815,200Medium
Long Beach456,000Low
Buffalo276,800Low

What if we wanted to find out the average population for each population_level? In this scenario, we can accomplish this by including an aggregation and  GROUP BY clause. We’ll demonstrate this in the next section.

Using CASE WHEN with GROUP BY

Example 1: Aggregations of Custom Categories

Now that we have our custom category of population_level, we can calculate different metrics for it. In this example, we’ll calculate the average population for each population_level.

We’ll use the same CASE WHEN statement as above, add in an aggregation for average, and then GROUP BY using the same syntax.

Let’s run this query:

SELECT  
  CASE 
   WHEN population < 500000 THEN ‘Low’ 
   WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ 
   WHEN population >= 1500000 THEN ‘High’ 
  END as population_level, 
  AVG(population) as average_population
FROM cities
GROUP BY
  CASE 
   WHEN population < 500000 THEN ‘Low’ 
   WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ 
   WHEN population >= 1500000 THEN ‘High’ 
  END

We’ve used the same population category as in the query above. We’ve added a calculation for the average population. Note that we have repeated the same CASE WHEN expression in SELECT and in GROUP BY.

When this query is run, the data is first evaluated in the logical conditions of the CASE WHEN statement and assigned a value for population_level. Then the average is computed across each of these levels with the GROUP BY.

The returned results look like this:

population_levelaverage_population
Low281,380
Medium1,098,600
High4,424,287

If you need a refresher on the GROUP BY clause, read this article on GROUP BY in SQL.

Example 2: CASE WHEN With ELSE in GROUP BY

Another way to write this query would be to utilize the ELSE clause. You would define the first two population levels and then use ELSE to bucket every other city into high. This would look like the following query:

SELECT  
  CASE 
    WHEN population < 500000 THEN ‘Low’ 
    WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ 
    ELSE ‘High’ 
  END as population_level, 
  AVG(population) as average_population
FROM cities
GROUP BY 
  CASE 
   WHEN population < 500000 THEN ‘Low’ 
   WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ 
   WHEN population >= 1500000 THEN ‘High’ 
  END

We don’t necessarily need to define the logic for high-population cities; if the city fails the first two logic statements, then the population is greater than 1,500,000.

Example 3: CASE WHEN Statements Inside Aggregations

We can also place CASE WHEN statements inside of aggregate functions to count the number of records that meet certain conditions. Continuing with the same example, let’s consider how we would count the number of cities in each population_level.

We will need three separate CASE WHEN  statements for this example, one for each condition that we’ve defined in our CASE WHEN  statement: low, medium, and high.

SELECT  
  SUM(CASE 
      WHEN population < 500000 
      THEN 1 
      ELSE 0 END) as low_pop_ct,
  SUM(CASE 
      WHEN population >= 500000 and population < 1500000 
      THEN 1   
      ELSE 0 END) as medium_pop_ct, 
  SUM(CASE 
      WHEN population >= 1500000 
      THEN 1 
      ELSE 0 END) as high_pop_ct
FROM cities

Let’s walk through this line by line. In the first line, we’re evaluating if the city has a population less than 500,000. If it does, we’re assigning the value of 1 to that city. If it does not, it is assigned 0 by the ELSE clause. Then we’ve wrapped that entire CASE WHEN statement in a SUM(). This effectively counts the rows for cities with low populations.

We then repeat this pattern in the next two lines, using the same logical conditions we used earlier to assign the value of medium and high to cities based on their population.

This query will return:

low_pop_ctmedium_pop_cthigh_pop_ct
523

Now we have a count of the number of cities that fall into each category.

Interested in learning more? Check out how to use CASE WHEN with SUM() for more examples.

Example 4: CASE WHEN in GROUP BY

Let’s take a look at a few more examples using CASE WHEN in GROUP BY. The data below is contained in the table products.

skudescriptionpricestatus
978568952cowl neck sweater59in stock
978548759embroidered v neck blouse49in stock
978125698notched collar button down blazer79in stock
979156258oversized stripe shirt29sale
979145875polka dot maxi dress109back ordered
978457852rib knit t shirt19sale
978333562cropped denim jacket99back ordered
978142154sleeveless midi dress89in stock
979415858utility jumpsuit59sale
978112546scoop neck sweater49in stock

Let’s enhance this data with a new column that contains a product_category. This will allow us to group the individual products into a higher-level category so we can then compute aggregations.

In order to add a column for product_category, we’ll run:

SELECT *, 
  CASE WHEN description LIKE '%sweater%' 
         OR description LIKE '%blazer%' 
         OR description LIKE '%jacket%' THEN  'Outerwear'
       WHEN description LIKE '%dress%' 
         OR description LIKE '%jumpsuit%' THEN  'Dresses' 
       WHEN description LIKE '%shirt%' 
         OR description LIKE '%blouse%' THEN 'Tops'
  END as product_category 
FROM products

We’re using the LIKE operator to search the description field for each string in quotations. The % operator just means that there can be anything before or after. We’re effectively searching each description for any of the key words listed in the WHEN statements. The results are shown below:

skudescriptionpricestatusproduct_category
978568952cowl neck sweater59in stockOuterwear
978548759embroidered v neck blouse49in stockTops
978125698notched collar button down blazer79in stockOuterwear
979156258oversized stripe shirt29saleTops
979145875polka dot maxi dress109back orderedDresses
978457852rib knit t shirt19saleTops
978333562cropped denim jacket99back orderedOuterwear
978142154sleeveless midi dress89in stockDresses
979415858utility jumpsuit59saleDresses
978112546scoop neck sweater49in stockOuterwear

Now that we have our new product_category field, we can count the number of products in each category by utilizing GROUP BY:

SELECT
  CASE WHEN description LIKE '%sweater%' 
         OR description LIKE '%blazer%' 
         OR description LIKE '%jacket%' THEN  'Outerwear'
       WHEN description LIKE '%dress%' 
         OR description LIKE '%jumpsuit%' THEN  'Dresses' 
       WHEN description LIKE '%shirt%' 
         OR description LIKE '%blouse%' THEN 'Tops'
   END as product_category, 
   COUNT(DISTINCT description) as number_of_products
FROM products 
GROUP BY 
  CASE WHEN description LIKE '%sweater%' 
         OR description LIKE '%blazer%' 
         OR description LIKE '%jacket%' THEN  'Outerwear'
       WHEN description LIKE '%dress%' 
         OR description LIKE '%jumpsuit%' THEN  'Dresses' 
       WHEN description LIKE '%shirt%' 
         OR description LIKE '%blouse%' THEN 'Tops'
  END 
product_categorynumber_of_products
Outerwear4
Tops3
Dresses3

In this example, we’ve used CASE WHEN in the SELECT clause, but this is not always needed. You could also just run the above query without it:

SELECT COUNT(DISTINCT description) as number_of_products
FROM products 
GROUP BY 
  CASE WHEN description LIKE '%sweater%' 
         OR description LIKE '%blazer%' 
         OR description LIKE '%jacket%' THEN  'Outerwear'
       WHEN description LIKE '%dress%' 
         OR description LIKE '%jumpsuit%' THEN  'Dresses' 
       WHEN description LIKE '%shirt%' 
         OR description LIKE '%blouse%' THEN 'Tops'
  END 
number_of_products
4
3
3

What’s Next with CASE WHEN and GROUP BY?

As you can see, there are many different scenarios where combining the CASE WHEN and GROUP BY statements are extremely useful. They allow you to add business logic to your data and then compute metrics based on the newly-defined data fields. You can use CASE WHEN statements both outside and inside of aggregations; they follow the same syntax. Start your statement with CASE WHEN to define your conditional logic and then assign values with the THEN/ELSE statements. Finally, close it out with END.

Be sure to check out our course on Creating Basic SQL Reports. You’ll learn all the nuances of CASE and GROUP BY and get real life practice problems to complete!