26th Jan 2023 9 minutes read

How to Group by Multiple Columns in SQL

GROUP BY

Table of Contents

GROUP BY 1 Column
GROUP BY 2 Columns
GROUP BY Multiple Columns
Other Ways of Using GROUP BY with Multiple Columns
- Using GROUP BY Multiple Columns: Grouping a Hierarchy
- Using GROUP BY Multiple Columns: Non-Hierarchical Grouping
GROUP BY with Multiple Columns Returns Faceted Information

When analyzing large data sets, you often create groupings and apply aggregate functions to find totals or averages. In these cases, using the GROUP BY clause with multiple columns unfolds its full potential.

GROUP BY is a clause of the SELECT command. It allows you to compute various statistics for a group of rows. For example, you can use GROUP BY with an employee table to know how many employees are of each gender. Or you can group by multiple columns to determine the average age of vehicles for each make and model in a vehicle_fleet table. In this article, we’ll examine in detail how grouping by multiple columns works.

This article assumes you already know how to use GROUP BY in an SQL query. Not familiar with GROUP BY? The best way to learn this and other basic SQL constructions is with our interactive SQL Basics course. It contains 129 hands-on practical exercises. In each exercise, you get a short explanation and task to solve. With each exercise completed, you build confidence in your SQL skills. This course is also a great way to review basic SQL features if your knowledge is a bit rusty.

Ok, let’s start with a refresher on a simple use case for GROUP BY.

GROUP BY 1 Column

Each combination of the values of column(s) specified in the GROUP BY clause constitutes a group; the SELECT command with a GROUP BY clause displays a single row for each group. It’s also good to note that GROUP BY allows you to apply aggregate functions on columns not included in the outstanding subset.

Let’s see an example. I have created a table called WorldWideFriends that stores data on my friends in different parts of the world:

FriendName	City	State	Country
María	Acapulco	Guerrero	México
Fernando	Caracas	Distrito Capital	Venezuela
Gerson	Medellín	Antioquía	Colombia
Mónica	Bogotá	Cundinamarca	Colombia
Paul	Bogotá	Cundinamarca	Colombia
Kevin	Lexington	Kentucky	USA
Cecilia	Godoy Cruz	Mendoza	Argentina
Pablo	Atlántida	Canelones	Uruguay
Andrea	Cdad. Mendoza	Mendoza	Argentina
Marlon	Sao Paulo	Sao Paulo	Brasil
Joao	Rio de Janeiro	Rio de Janeiro	Brasil
Andrés	Bariloche	Río Negro	Argentina
Mariano	Miami	Florida	USA

I would like to use the information in this table to do some research – e.g. to get a list of the countries where my friends live, including the number of friends living in each country.

If I wanted to know how many friends I have in each country, I’d use GROUP BY together with the COUNT() aggregate function:

SELECT 
  Country, 
  COUNT(*) AS HowMany
FROM WorldWideFriends
GROUP BY Country;

This query gives me a result set that condenses the rows with the same country into only one row, while COUNT(*) tells me how many repeated rows there are for each country:

Country	HowMany
Argentina	3
Venezuela	1
Colombia	3
Brasil	2
USA	2
México	1
Uruguay	1

The above query gives me the information I would need if, for example, I needed to choose which country to travel to in order to meet as many of my friends as possible. If you’d like to read more about the basic usage of GROUP BY, I recommend our articles on What Is GROUP BY in SQL and How to Use GROUP BY.

However, even if I travel to a country where many of my friends live, those friends may be located in different states. I may not have time to travel from one state to another to visit them all. So I need to refine my search a bit to find the geographic location where there is a higher concentration of my friends.

GROUP BY 2 Columns

So now I need to know how my friends are distributed by state as well as country. I can find this out by adding the column State to my previous GROUP BY Country (separating them with commas) and in the SELECT clause. The query looks like this:

SELECT 
  Country, 
  State, 
  COUNT(*) AS HowMany
FROM WorldWideFriends
GROUP BY Country, State;

Looking at the results of this query, we can see some of the countries that previously appeared in only one row now appear in several rows. The reason is that when we add the State field, the query must assemble the groups with the rows that have the same value in both Country and State.

In the previous query, the row corresponding to ‘Colombia’ had a 3 in the HowMany field. In this case, ‘Colombia’ appears in two rows with different values for State: one for 'Antioquia' and the other for 'Cundinamarca'. In the HowMany field, the row corresponding to 'Antioquia' indicates 1, while the row corresponding to 'Cundinamarca' indicates 2. This means that, in the disaggregated list, there are two rows with Country = 'Colombia' and State = 'Cundinamarca', and only one with Country = 'Colombia' and State = 'Antioquia'.

The sum of the HowMany values of these two rows logically matches the previous HowMany value for the row corresponding to 'Colombia'. The same will be true for any of the other countries that are divided into several rows with different states.

Country	State	HowMany
Argentina	Mendoza	2
Argentina	Río Negro	1
Venezuela	Distrito Capital	1
Colombia	Antioquía	1
Colombia	Cundinamarca	2
Brasil	Rio de Janeiro	1
Brasil	Sao Paulo	1
USA	Kentucky	1
USA	Florida	1
México	Guerrero	1
Uruguay	Canelones	1

GROUP BY Multiple Columns

Finally, if my intention is to make my trip as short as possible and still visit as many friends as possible, I just need to add the City column to my query – both in the SELECT and in the GROUP BY – to see which cities have the highest number of friends:

SELECT 
  Country, 
  State, 
  City, 
  COUNT(*) AS HowMany
FROM WorldWideFriends
GROUP BY Country, State, City;

When we add columns to GROUP BY, the number of rows in the result increases. This is because the number of possible value combinations grows. When I add the City column to the SQL GROUP BY, the size of the result grows considerably:

Country	State	City	HowMany
Argentina	Mendoza	Cdad. Mendoza	1
Argentina	Mendoza	Godoy Cruz	1
Argentina	Río Negro	Bariloche	1
Venezuela	Distrito Capital	Caracas	1
Colombia	Antioquía	Medellín	1
Colombia	Cundinamarca	Bogotá	2
Brasil	Rio de Janeiro	Rio de Janeiro	1
Brasil	Sao Paulo	Sao Paulo	1
USA	Kentucky	Lexington	1
USA	Florida	Miami	1
México	Guerrero	Acapulco	1
Uruguay	Canelones	Atlántida	1

In this case, I think it would be better to see only those cities where there are more than one of my friends. So to summarize the results, I will use the HAVING clause. This clause allows me to set a condition on the results of the aggregate functions when using GROUP BY. Here, the condition to apply will be that the count of friends is greater than 1 (COUNT(*) > 1). After incorporating the HAVING clause, the query looks like this:

SELECT Country, State, City, COUNT(*) AS HowMany
FROM WorldWideFriends
GROUP BY Country, State, City
HAVING COUNT(*) > 1;

And this way, the result of the query is reduced to a single row that shows me the only city where there is more than one of my friends:

Country	State	City	HowMany
Colombia	Cundinamarca	Bogotá	2

Other Ways of Using GROUP BY with Multiple Columns

It is common to use GROUP BY multiple columns when two or more of the columns in a query result form a hierarchy of classifications with several levels. Such hierarchies are found in many areas, such as:

Detailed sales data with the sale date divided into year, quarter, and month.
A manufacturer’s product catalog organized by family, brand, line, model.
The payroll of a company’s employees organized by management, sector, department.

In all these cases, different subsets of columns can be used in the GROUP BY to go from the general to the particular.

Using GROUP BY Multiple Columns: Grouping a Hierarchy

Let’s look at an example result set of sales data. Suppose you have a view called ViewSales that returns the following information:

Year	Quarter	Month	Date	Quantity	Unit_Price
2021	4	11	11/15/2021	5	16.08
2021	3	8	8/2/2021	1	17.06
2022	2	4	4/5/2022	2	19.48
2022	2	5	5/21/2022	1	17.06
2021	4	11	11/17/2021	2	18.50
2022	2	4	4/5/2022	1	18.08
2022	3	8	8/16/2022	5	15.26

It is easy to see that the first fields of this table form a hierarchy, with the year as the highest level and the date as the lowest level. Using GROUP BY and the SUM() function, we can obtain total sales amounts by Year, by Quarter, by Month or by Date. If you want to get the total units sold and the average unit price per Year and Quarter, you need to specify those two columns in the SELECT and in the GROUP BY:

SELECT 
  Year, 
  Quarter, 
  SUM(Quantity) AS TotalQty, 
  AVG(Unit_Price) as AvgUnit_Prc
FROM ViewSales
GROUP BY Year, Quarter;

The result will be:

Year	Quarter	TotalQty	AvgUnit_Prc
2021	4	7	17.29
2021	3	1	17.06
2022	2	4	18.21
2022	3	5	15.26

Please note that, although there is a hierarchical order, the data in the different grouping columns are independent of each other. This means that if you group only by Quarter instead of by Year plus Quarter, the aggregate calculations will combine the information from the same quarter for all years (i.e. all Q2s will have one row):

SELECT 
  Quarter, 
  SUM(Quantity) AS TotalQty,
  AVG(Unit_Price) as AvgUnit_Prc
FROM ViewSales
GROUP BY Quarter;

Quarter	TotalQty	AvgUnit_Prc
4	7	17.29
3	6	16.16
2	4	18.21

This is not a mistake; you just need to understand that the results express different insights. The latter query allows you to compare sales performance between different quarters regardless of the year (e.g. to detect seasonal factors affecting sales at the same time of every year), while the former compares sales for each particular year and quarter.

Using GROUP BY Multiple Columns: Non-Hierarchical Grouping

In the previous example, we saw that grouping by multiple columns allows us to go from the general to the particular when we have data sets with columns that form a data hierarchy. But in situations where a result set is made up of columns that do not form a hierarchy, using GROUP BY with multiple columns allows us to discover hidden truths in large data sets; it combines attributes that at first glance are unrelated to each other.

For example, let’s imagine we have a table named Downloads that stores information on people who have downloaded movies from a streaming service in the last two years. That table has one row for each download, and each row includes the following information about each person who downloaded a movie:

Age
Gender
Nationality

Each row also captures these attributes about each downloaded movie:

Genre
Year
Country

Using GROUP BY with several of these columns and the COUNT(*) function, we can detect correlations between the columns. For example, to find out movie genre preferences by age, we’d type:

SELECT 
  Age, 
  Genre, 
  COUNT(*) AS Downloads
FROM Downloads
GROUP BY Age, Genre

As results, we’d get something like this:

Age	Genre	Downloads
18	Horror	12,945
18	Comedy	15,371
19	Drama	25,902
19	Horror	11,038
21	Comedy	37,408
…	…	…

We could also use GROUP BY 3 columns, to find out (for example) genre preferences by gender and nationality:

SELECT 
  Gender, 
  Nationality, 
  Genre, 
  COUNT(*) AS Downloads
FROM Downloads
GROUP BY Gender, Nationality, Genre

And we would get something like this:

Gender	Nationality	Genre	Downloads
Male	French	Horror	102,044
Male	French	Comedy	149,290
Male	German	Horror	80,104
Female	French	Horror	91.668
Female	German	Comedy	50,103
Female	German	Drama	61,440
Other	French	Drama	77,993
Other	German	Comedy	25,484
…	…	…	…

GROUP BY with Multiple Columns Returns Faceted Information

GROUP BY is a powerful tool for extracting insights from large data sets that are difficult to manipulate in any other way. By using GROUP BY multiple columns, you can leverage its full potential to expose the truths of a dataset, allowing you to see different facets of it. To do this successfully, it is critical that you understand – and know how to explain – what an SQL result set grouped by multiple columns represents.

If you’re planning to do some serious data analysis work, then you should take our interactive SQL Basics course to learn about all the tools SQL can offer. Also, follow these links if you need further explanations of GROUP BY or want to see more examples of GROUP BY in SQL.

Tags:

GROUP BY