How to Use CASE WHEN in GROUP BY
Learn how you can combine SQL CASE WHEN and GROUP BY to create custom categories in your SQL queries.
Raw data, by its very nature, is not always human readable. Many times, the data you’re querying is in its most unformatted form. Examples of this include codes for different business departments or product SKUs that represent specific products. To the naked eye, these codes mean nothing, so pulling them into a report is not helpful for the person reading them.
There are also situations when raw data should be rolled up to higher levels to make it more easily digestible. For example, a list of 50 products could be rolled up to 5 product categories, making your report much easier to read and understand.
In these types of situations, we can use the CASE WHEN
and GROUP BY
statements to format data and add metadata to our original dataset. We will explore how these statements work throughout this article. For a more in-depth look into these concepts, take our Creating Basic SQL Reports, which will walk you through all the nuances of CASE
and GROUP BY
. It’s a great follow-up to this article.
First, let’s review the CASE WHEN
statement with a few examples. If you’d like more information on the CASE
statement, be sure to read our article How to Use CASE in SQL.
A Brief Review of CASE WHEN
You can think of the CASE WHEN
statement as if..then logic for your query. It evaluates conditions and if it finds the condition to be true, it will return a specific defined result.
There are three important pieces to CASE
statements in SQL: CASE WHEN
, THEN
, and END
. Each of these is needed or your query will return an error.
You begin the statement with CASE WHEN
to define your logical condition. After that you use THEN
to define the value if that condition is true. After your last THEN
statement, you use END
to close the clause.
Another optional clause is the ELSE
clause. If all logical conditions in the CASE WHEN
statement fail, you can use ELSE
to assign a value to this data. It is effectively a catch-all. If your data fails all your conditions and you do not use ELSE
, the data will return a NULL value.
This is what CASE WHEN
syntax looks like:
CASE WHEN product = ‘Shirt’ THEN ‘Clothing’ WHEN product = ‘Hat’ THEN ‘Accessories’ ELSE ‘Other’ END |
If the product is a shirt, the CASE WHEN
assigns it to the Clothing
category. If the product is a hat, the CASE WHEN
assigns it to the Accessories
category. Otherwise, the CASE WHEN
assigns the product to the Other
category.
To illustrate further, consider the following data found in the cities
table:
city | state | price_to_income_ratio | mortgage_as_pct_of_income | homeowner_pct | population |
---|---|---|---|---|---|
Santa Barbara | CA | 13.3 | 103.7 | 53% | 88,000 |
Brooklyn | NY | 11.2 | 89.9 | 30% | 2,533,862 |
Queens | NY | 11.1 | 91.3 | 45% | 2,271,000 |
New York | NY | 10.4 | 85.9 | 24% | 8,468,000 |
Oakland | CA | 9.4 | 77.5 | 41% | 433,800 |
Sunnyvale | CA | 9.3 | 76.5 | 45% | 152,300 |
San Diego | CA | 8.2 | 66.3 | 54% | 1,382,000 |
San Francisco | CA | 9.2 | 73.2 | 38% | 815,200 |
Long Beach | CA | 8.5 | 69.6 | 41% | 456,000 |
Buffalo | NY | 6.5 | 53 | 43% | 276,800 |
If we wanted to categorize each city’s population into low, medium, or high we would run:
SELECT city, population, CASE WHEN population < 500000 THEN ‘Low’ WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ WHEN population >= 1500000 THEN ‘High’ END as population_level FROM cities |
In this query, we’re adding conditional logic based on the population
field. If the population is less than 500,000, then we assign the value of Low
. If the population is between 500,000 and 1,500,000, then we assign the value Medium
. Finally, if the population is greater than 1,500,000, we assign the value High
. We are then aliasing the column as population_level
. The returned result looks like this:
city | population | population_level |
---|---|---|
Santa Barbara | 88,000 | Low |
Brooklyn | 2,533,862 | High |
Queens | 2,271,000 | High |
New York | 8,468,000 | High |
Oakland | 433,800 | Low |
Sunnyvale | 152,300 | Low |
San Diego | 1,382,000 | Medium |
San Francisco | 815,200 | Medium |
Long Beach | 456,000 | Low |
Buffalo | 276,800 | Low |
What if we wanted to find out the average population for each population_level
? In this scenario, we can accomplish this by including an aggregation and GROUP BY
clause. We’ll demonstrate this in the next section.
Using CASE WHEN with GROUP BY
Example 1: Aggregations of Custom Categories
Now that we have our custom category of population_level
, we can calculate different metrics for it. In this example, we’ll calculate the average population for each population_level
.
We’ll use the same CASE WHEN
statement as above, add in an aggregation for average, and then GROUP BY
using the same syntax.
Let’s run this query:
SELECT CASE WHEN population < 500000 THEN ‘Low’ WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ WHEN population >= 1500000 THEN ‘High’ END as population_level, AVG (population) as average_population FROM cities GROUP BY CASE WHEN population < 500000 THEN ‘Low’ WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ WHEN population >= 1500000 THEN ‘High’ END |
We’ve used the same population category as in the query above. We’ve added a calculation for the average population. Note that we have repeated the same CASE WHEN
expression in SELECT
and in GROUP BY
.
When this query is run, the data is first evaluated in the logical conditions of the CASE WHEN
statement and assigned a value for population_level
. Then the average is computed across each of these levels with the GROUP BY
.
The returned results look like this:
population_level | average_population |
---|---|
Low | 281,380 |
Medium | 1,098,600 |
High | 4,424,287 |
If you need a refresher on the GROUP BY
clause, read this article on GROUP BY in SQL.
Example 2: CASE WHEN With ELSE in GROUP BY
Another way to write this query would be to utilize the ELSE
clause. You would define the first two population levels and then use ELSE
to bucket every other city into high
. This would look like the following query:
SELECT CASE WHEN population < 500000 THEN ‘Low’ WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ ELSE ‘High’ END as population_level, AVG (population) as average_population FROM cities GROUP BY CASE WHEN population < 500000 THEN ‘Low’ WHEN population >= 500000 and population < 1500000 THEN ‘Medium’ WHEN population >= 1500000 THEN ‘High’ END |
We don’t necessarily need to define the logic for high-population cities; if the city fails the first two logic statements, then the population is greater than 1,500,000.
Example 3: CASE WHEN Statements Inside Aggregations
We can also place CASE WHEN
statements inside of aggregate functions to count the number of records that meet certain conditions. Continuing with the same example, let’s consider how we would count the number of cities in each population_level
.
We will need three separate CASE WHEN
statements for this example, one for each condition that we’ve defined in our CASE WHEN
statement: low
, medium
, and high
.
SELECT SUM ( CASE WHEN population < 500000 THEN 1 ELSE 0 END ) as low_pop_ct, SUM ( CASE WHEN population >= 500000 and population < 1500000 THEN 1 ELSE 0 END ) as medium_pop_ct, SUM ( CASE WHEN population >= 1500000 THEN 1 ELSE 0 END ) as high_pop_ct FROM cities |
Let’s walk through this line by line. In the first line, we’re evaluating if the city has a population less than 500,000. If it does, we’re assigning the value of 1 to that city. If it does not, it is assigned 0 by the ELSE
clause. Then we’ve wrapped that entire CASE WHEN
statement in a SUM()
. This effectively counts the rows for cities with low populations.
We then repeat this pattern in the next two lines, using the same logical conditions we used earlier to assign the value of medium
and high
to cities based on their population.
This query will return:
low_pop_ct | medium_pop_ct | high_pop_ct |
---|---|---|
5 | 2 | 3 |
Now we have a count of the number of cities that fall into each category.
Interested in learning more? Check out how to use CASE WHEN with SUM() for more examples.
Example 4: CASE WHEN in GROUP BY
Let’s take a look at a few more examples using CASE WHEN
in GROUP BY
. The data below is contained in the table products
.
sku | description | price | status |
---|---|---|---|
978568952 | cowl neck sweater | 59 | in stock |
978548759 | embroidered v neck blouse | 49 | in stock |
978125698 | notched collar button down blazer | 79 | in stock |
979156258 | oversized stripe shirt | 29 | sale |
979145875 | polka dot maxi dress | 109 | back ordered |
978457852 | rib knit t shirt | 19 | sale |
978333562 | cropped denim jacket | 99 | back ordered |
978142154 | sleeveless midi dress | 89 | in stock |
979415858 | utility jumpsuit | 59 | sale |
978112546 | scoop neck sweater | 49 | in stock |
Let’s enhance this data with a new column that contains a product_category
. This will allow us to group the individual products into a higher-level category so we can then compute aggregations.
In order to add a column for product_category
, we’ll run:
SELECT *, CASE WHEN description LIKE '%sweater%' OR description LIKE '%blazer%' OR description LIKE '%jacket%' THEN 'Outerwear' WHEN description LIKE '%dress%' OR description LIKE '%jumpsuit%' THEN 'Dresses' WHEN description LIKE '%shirt%' OR description LIKE '%blouse%' THEN 'Tops' END as product_category FROM products |
We’re using the LIKE
operator to search the description field for each string in quotations. The %
operator just means that there can be anything before or after. We’re effectively searching each description for any of the key words listed in the WHEN
statements. The results are shown below:
sku | description | price | status | product_category |
---|---|---|---|---|
978568952 | cowl neck sweater | 59 | in stock | Outerwear |
978548759 | embroidered v neck blouse | 49 | in stock | Tops |
978125698 | notched collar button down blazer | 79 | in stock | Outerwear |
979156258 | oversized stripe shirt | 29 | sale | Tops |
979145875 | polka dot maxi dress | 109 | back ordered | Dresses |
978457852 | rib knit t shirt | 19 | sale | Tops |
978333562 | cropped denim jacket | 99 | back ordered | Outerwear |
978142154 | sleeveless midi dress | 89 | in stock | Dresses |
979415858 | utility jumpsuit | 59 | sale | Dresses |
978112546 | scoop neck sweater | 49 | in stock | Outerwear |
Now that we have our new product_category
field, we can count the number of products in each category by utilizing GROUP BY
:
SELECT CASE WHEN description LIKE '%sweater%' OR description LIKE '%blazer%' OR description LIKE '%jacket%' THEN 'Outerwear' WHEN description LIKE '%dress%' OR description LIKE '%jumpsuit%' THEN 'Dresses' WHEN description LIKE '%shirt%' OR description LIKE '%blouse%' THEN 'Tops' END as product_category, COUNT ( DISTINCT description) as number_of_products FROM products GROUP BY CASE WHEN description LIKE '%sweater%' OR description LIKE '%blazer%' OR description LIKE '%jacket%' THEN 'Outerwear' WHEN description LIKE '%dress%' OR description LIKE '%jumpsuit%' THEN 'Dresses' WHEN description LIKE '%shirt%' OR description LIKE '%blouse%' THEN 'Tops' END |
product_category | number_of_products |
---|---|
Outerwear | 4 |
Tops | 3 |
Dresses | 3 |
In this example, we’ve used CASE WHEN
in the SELECT
clause, but this is not always needed. You could also just run the above query without it:
SELECT COUNT ( DISTINCT description) as number_of_products FROM products GROUP BY CASE WHEN description LIKE '%sweater%' OR description LIKE '%blazer%' OR description LIKE '%jacket%' THEN 'Outerwear' WHEN description LIKE '%dress%' OR description LIKE '%jumpsuit%' THEN 'Dresses' WHEN description LIKE '%shirt%' OR description LIKE '%blouse%' THEN 'Tops' END |
number_of_products |
---|
4 |
3 |
3 |
What’s Next with CASE WHEN and GROUP BY?
As you can see, there are many different scenarios where combining the CASE WHEN
and GROUP BY
statements are extremely useful. They allow you to add business logic to your data and then compute metrics based on the newly-defined data fields. You can use CASE WHEN
statements both outside and inside of aggregations; they follow the same syntax. Start your statement with CASE WHEN
to define your conditional logic and then assign values with the THEN
/ELSE
statements. Finally, close it out with END
.
Be sure to check out our course on Creating Basic SQL Reports. You’ll learn all the nuances of CASE
and GROUP BY
and get real life practice problems to complete!