Back to articles list Articles Cookbook
9 minutes read

How to Group by Multiple Columns in SQL

When analyzing large data sets, you often create groupings and apply aggregate functions to find totals or averages. In these cases, using the GROUP BY clause with multiple columns unfolds its full potential.

GROUP BY is a clause of the SELECT command. It allows you to compute various statistics for a group of rows. For example, you can use GROUP BY with an employee table to know how many employees are of each gender. Or you can group by multiple columns to determine the average age of vehicles for each make and model in a vehicle_fleet table. In this article, we’ll examine in detail how grouping by multiple columns works.

This article assumes you already know how to use GROUP BY in an SQL query. Not familiar with GROUP BY? The best way to learn this and other basic SQL constructions is with our interactive SQL Basics course. It contains 129 hands-on practical exercises. In each exercise, you get a short explanation and task to solve. With each exercise completed, you build confidence in your SQL skills. This course is also a great way to review basic SQL features if your knowledge is a bit rusty.

Ok, let’s start with a refresher on a simple use case for GROUP BY.

GROUP BY 1 Column

Each combination of the values of column(s) specified in the GROUP BY clause constitutes a group; the SELECT command with a GROUP BY clause displays a single row for each group. It’s also good to note that GROUP BY allows you to apply aggregate functions on columns not included in the outstanding subset.

Let’s see an example. I have created a table called WorldWideFriends that stores data on my friends in different parts of the world:

FriendNameCityStateCountry
MaríaAcapulcoGuerreroMéxico
FernandoCaracasDistrito CapitalVenezuela
GersonMedellínAntioquíaColombia
MónicaBogotáCundinamarcaColombia
PaulBogotáCundinamarcaColombia
KevinLexingtonKentuckyUSA
CeciliaGodoy CruzMendozaArgentina
PabloAtlántidaCanelonesUruguay
AndreaCdad. MendozaMendozaArgentina
MarlonSao PauloSao PauloBrasil
JoaoRio de JaneiroRio de JaneiroBrasil
AndrésBarilocheRío NegroArgentina
MarianoMiamiFloridaUSA

I would like to use the information in this table to do some research – e.g. to get a list of the countries where my friends live, including the number of  friends living in each country.

If I wanted to know how many friends I have in each country, I’d use GROUP BY together with the COUNT() aggregate function:

SELECT 
  Country, 
  COUNT(*) AS HowMany
FROM WorldWideFriends
GROUP BY Country;

This query gives me a result set that condenses the rows with the same country into only one row, while COUNT(*) tells me how many repeated rows there are for each country:

CountryHowMany
Argentina3
Venezuela1
Colombia3
Brasil2
USA2
México1
Uruguay1

The above query gives me the information I would need if, for example, I needed to choose which country to travel to in order to meet as many of my friends as possible. If you’d like to read more about the basic usage of GROUP BY,  I recommend our articles on What Is GROUP BY in SQL and How to Use GROUP BY.

However, even if I travel to a country where many of my friends live, those friends may be located in different states. I may not have time to travel from one state to another to visit them all. So I need to refine my search a bit to find the geographic location where there is a higher concentration of my friends.

GROUP BY 2 Columns

So now I need to know how my friends are distributed by state as well as country. I can find this out by adding the column State to my previous GROUP BY Country (separating them with commas) and in the SELECT clause. The query looks like this:

SELECT 
  Country, 
  State, 
  COUNT(*) AS HowMany
FROM WorldWideFriends
GROUP BY Country, State;

Looking at the results of this query, we can see some of the countries that previously appeared in only one row now appear in several rows. The reason is that when we add the State field, the query must assemble the groups with the rows that have the same value in both Country and State.

In the previous query, the row corresponding to ‘Colombia’ had a 3 in the HowMany field. In this case, ‘Colombia’ appears in two rows with different values for State: one for 'Antioquia' and the other for 'Cundinamarca'. In the HowMany field, the row corresponding to 'Antioquia' indicates 1, while the row corresponding to 'Cundinamarca' indicates 2. This means that, in the disaggregated list, there are two rows with Country = 'Colombia' and State = 'Cundinamarca', and only one with Country = 'Colombia' and State = 'Antioquia'.

The sum of the HowMany values of these two rows logically matches the previous HowMany value for the row corresponding to 'Colombia'. The same will be true for any of the other countries that are divided into several rows with different states.

CountryStateHowMany
ArgentinaMendoza2
ArgentinaRío Negro1
VenezuelaDistrito Capital1
ColombiaAntioquía1
ColombiaCundinamarca2
BrasilRio de Janeiro1
BrasilSao Paulo1
USAKentucky1
USAFlorida1
MéxicoGuerrero1
UruguayCanelones1

GROUP BY Multiple Columns

Finally, if my intention is to make my trip as short as possible and still visit as many friends as possible, I just need to add the City column to my query  – both in the SELECT and in the GROUP BY – to see which cities have the highest number of friends:

SELECT 
  Country, 
  State, 
  City, 
  COUNT(*) AS HowMany
FROM WorldWideFriends
GROUP BY Country, State, City;

When we add columns to GROUP BY, the number of rows in the result increases. This is because the number of possible value combinations grows. When I add the City column to the SQL GROUP BY, the size of the result grows considerably:

CountryStateCityHowMany
ArgentinaMendozaCdad. Mendoza1
ArgentinaMendozaGodoy Cruz1
ArgentinaRío NegroBariloche1
VenezuelaDistrito CapitalCaracas1
ColombiaAntioquíaMedellín1
ColombiaCundinamarcaBogotá2
BrasilRio de JaneiroRio de Janeiro1
BrasilSao PauloSao Paulo1
USAKentuckyLexington1
USAFloridaMiami1
MéxicoGuerreroAcapulco1
UruguayCanelonesAtlántida1

In this case, I think it would be better to see only those cities where there are more than one of my friends. So to summarize the results, I will use the HAVING clause. This clause allows me to set a condition on the results of the aggregate functions when using GROUP BY. Here, the condition to apply will be that the count of friends is greater than 1 (COUNT(*) > 1). After incorporating the HAVING clause, the query looks like this:

SELECT Country, State, City, COUNT(*) AS HowMany
FROM WorldWideFriends
GROUP BY Country, State, City
HAVING COUNT(*) > 1;

And this way, the result of the query is reduced to a single row that shows me the only city where there is more than one of my friends:

CountryStateCityHowMany
ColombiaCundinamarcaBogotá2

Other Ways of Using GROUP BY with Multiple Columns

It is common to use GROUP BY multiple columns when two or more of the columns in a query result form a hierarchy of classifications with several levels. Such hierarchies are found in many areas, such as:

  • Detailed sales data with the sale date divided into year, quarter, and month.
  • A manufacturer’s product catalog organized by family, brand, line, model.
  • The payroll of a company’s employees organized by management, sector, department.

In all these cases, different subsets of columns can be used in the GROUP BY to go from the general to the particular.

Using GROUP BY Multiple Columns: Grouping a Hierarchy

Let’s look at an example result set of sales data. Suppose you have a view called ViewSales that returns the following information:

YearQuarterMonthDateQuantityUnit_Price
202141111/15/2021516.08
2021388/2/2021117.06
2022244/5/2022219.48
2022255/21/2022117.06
202141111/17/2021218.50
2022244/5/2022118.08
2022388/16/2022515.26

It is easy to see that the first fields of this table form a hierarchy, with the year as the highest level and the date as the lowest level. Using GROUP BY and the SUM() function, we can obtain total sales amounts by Year, by Quarter, by Month or by Date. If you want to get the total units sold and the average unit price per Year and Quarter, you need to specify those two columns in the SELECT and in the GROUP BY:

SELECT 
  Year, 
  Quarter, 
  SUM(Quantity) AS TotalQty, 
  AVG(Unit_Price) as AvgUnit_Prc
FROM ViewSales
GROUP BY Year, Quarter;

The result will be:

YearQuarterTotalQtyAvgUnit_Prc
20214717.29
20213117.06
20222418.21
20223515.26

Please note that, although there is a hierarchical order, the data in the different grouping columns are independent of each other. This means that if you group only by Quarter instead of by Year plus Quarter, the aggregate calculations will combine the information from the same quarter for all years (i.e. all Q2s will have one row):

SELECT 
  Quarter, 
  SUM(Quantity) AS TotalQty,
  AVG(Unit_Price) as AvgUnit_Prc
FROM ViewSales
GROUP BY Quarter;
QuarterTotalQtyAvgUnit_Prc
4717.29
3616.16
2418.21

This is not a mistake; you just need to understand that the results express different insights. The latter query allows you to compare sales performance between different quarters regardless of the year (e.g. to detect seasonal factors affecting sales at the same time of every year), while the former compares sales for each particular year and quarter.

Using GROUP BY Multiple Columns: Non-Hierarchical Grouping

In the previous example, we saw that grouping by multiple columns allows us to go from the general to the particular when we have data sets with columns that form a data hierarchy. But in situations where a result set is made up of columns that do not form a hierarchy, using GROUP BY with multiple columns allows us to discover hidden truths in large data sets; it combines attributes that at first glance are unrelated to each other.

For example, let’s imagine we have a table named Downloads that stores information on people who have downloaded movies from a streaming service in the last two years. That table has one row for each download, and each row includes the following information about each person who downloaded a movie:

  • Age
  • Gender
  • Nationality

Each row also captures these attributes about each downloaded movie:

  • Genre
  • Year
  • Country

Using GROUP BY with several of these columns and the COUNT(*) function, we can detect correlations between the columns. For example, to find out movie genre preferences by age, we’d type:

SELECT 
  Age, 
  Genre, 
  COUNT(*) AS Downloads
FROM Downloads
GROUP BY Age, Genre

As results, we’d get something like this:

AgeGenreDownloads
18Horror12,945
18Comedy15,371
19Drama25,902
19Horror11,038
21Comedy37,408

We could also use GROUP BY 3 columns, to find out (for example) genre preferences by gender and nationality:

SELECT 
  Gender, 
  Nationality, 
  Genre, 
  COUNT(*) AS Downloads
FROM Downloads
GROUP BY Gender, Nationality, Genre

And we would get something like this:

GenderNationalityGenreDownloads
MaleFrenchHorror102,044
MaleFrenchComedy149,290
MaleGermanHorror80,104
FemaleFrenchHorror91.668
FemaleGermanComedy50,103
FemaleGermanDrama61,440
OtherFrenchDrama77,993
OtherGermanComedy25,484

GROUP BY with Multiple Columns Returns Faceted Information

GROUP BY is a powerful tool for extracting insights from large data sets that are difficult to manipulate in any other way. By using GROUP BY  multiple columns, you can leverage its full potential to expose the truths of a dataset, allowing you to see different facets of it. To do this successfully, it is critical that you understand – and know how to explain – what an SQL result set grouped by multiple columns represents.

If you’re planning to do some serious data analysis work, then you should take our interactive SQL Basics course to learn about all the tools SQL can offer. Also, follow these links if you need further explanations of GROUP BY or want to see more examples of GROUP BY in SQL.