Back to articles list Articles Cookbook
10 minutes read

How to Use GROUP BY in SQL

In this article, we’ll give you a step-by-step guide on using GROUP BY in SQL. Uncover the nuances of the GROUP BY clause and learn different ways of using it.

GROUP BY is one of the most-used clauses in SQL. It moves you from simply selecting data from the database to grouping rows with the same column values into one group. When used with SQL’s aggregate functions, you can use GROUP BY to compute metrics like counting the number of instances or finding the total, average, minimum, or maximum value.

GROUP BY  is part of basic SQL knowledge; you need to feel really comfortable with it before you go on to more complicated concepts.

You already know that writing code is best learned through practice. Finding the opportunity to write SQL code regularly might be tricky if you’re not working with SQL daily. Hard, but not impossible. What makes it possible is the SQL Practice Set course. It 88 exercises offer plenty of opportunities to practice GROUP BY and some other SQL concepts, such as aggregate functions, JOINs, and subqueries.

How Does GROUP BY Work? An Example 

I’ll use the table typewriter_products to demonstrate the GROUP BY clause’s importance in SQL. It shows things you can buy from a fictional company selling typewriter-related products. For those who might wonder, this is a typewriter:

How to Use GROUP BY in SQL

Source: https://i.etsystatic.com/8429430/r/il/132716/1789166606/il_fullxfull.1789166606_1qnc.jpg

Almost like a computer without a screen. Who knows how many important books were written using a typewriter. Books, remember those?

(But enough already with the smugness! Talk about the table!)

Yes, the table. It  has the following columns:

  • id – The record’s unique ID.
  • product_name – The product’s name.
  • product_id – The product’s ID.
  • ribbon_brand – The brand of the typewriter ribbon.
  • typewriter_brand – The brand of typewriter the ribbon is made for.
  • ribbon_color – The typewriter ribbon’s color.
  • units – The product’s number of units available.
  • price – the product’s price per unit.

The data itself is shown below.

idproduct_nameproduct_idribbon_brandtypewriter_brandribbon_colorunitsprice
1typewriter ribbon1All You NeedOlympiaBlack8810.00
2typewriter ribbon1All You NeedOlympiaBlack + Red4810.00
3typewriter ribbon2Ribbons & UsOlympiaBlack9712.39
4typewriter ribbon2Ribbons & UsOlympiaBlack + Red14715.15
5typewriter ribbon3All You NeedUnderwoodBlack1424.74
6typewriter ribbon3All You NeedUnderwoodBlack + Red1325.17
7typewriter ribbon4Our RibbonUnderwoodBlack5425.00
8typewriter ribbon5Ribbons & UsUnderwoodBlack15730.47
9typewriter ribbon5Ribbons & UsUnderwoodBlack + Red1428.47
10typewriter ribbon6All You NeedAdlerBlack4420.00
11typewriter ribbon6All You NeedAdlerBlack + Red1630.00
12typewriter ribbon7Ribbons & UsAdlerBlack5424.69
13typewriter ribbon7Ribbons & UsAdlerBlack + Red2730.30

With this table, you could get several groups. For example, you can calculate the average price of a product by the typewriter brand.

Here’s the query that will do that:

SELECT typewriter_brand,
 AVG(price) AS average_price
FROM typewriter_products
GROUP BY typewriter_brand;

This query groups data by the typewriter brand and calculates the average price. The query will output this table:

typewriter_brandaverage_price
Adler26.25
Olympia11.89
Underwood26.77

The output shows three types of typewriter brands and the average product price for each typewriter.

Or you could find the number of ribbons available by color with this query:

SELECT ribbon_color,
	 SUM(units) AS sum_units
FROM typewriter_products
GROUP BY ribbon_color;

The group in this query is the ribbon color. I also use the SUM() aggregate function to sum the number of units per ribbon color. Here’s the result:

ribbon_colorsum_units
Black508
Black + Red265

Data is grouped into two rows: black ribbon and black + red ribbon. For each ribbon color, there’s a number of units available for sale.

This is a sneak peek of what GROUP BY does and how it works. As you can see, the GROUP BY syntax is relatively simple. If you need some more clarification on the GROUP BY syntax, take a look at this article.

How can you write your own queries and make use of the GROUP BY clause? Here are some tips to help you.

Steps to Use GROUP BY

How to use GROUP BY in SQL? I’ll give you a step-by-step recipe in this section. Let’s imagine I want to find the number of records for each product.

The first step in writing a query should be finding an adequate grouping column. In this case, it’s product_id. We put this column in the GROUP BY clause:

SELECT
…
GROUP BY product_id;

The second step is to choose the right aggregate function and use it in the SELECT statement. Since our goal is to find the number of records, we use the COUNT() function. Of course, you also need to specify the column in the FROM clause:

SELECT COUNT(*)
FROM typewriter_products
GROUP BY product_id;

Will this query run? Of course! It will return this output.

count
2
2
1
2
2
2
2

As you can see, this is not very helpful. All the products have two records except one, but what are these products? We don’t have a clue!

That’s why the third step is important. In this step, write the grouping column in the SELECT, too:

SELECT product_id,
	 COUNT(*)
FROM typewriter_products
GROUP BY product_id
ORDER BY product_id; 

Now, the query will show this output.

product_idcount
12
22
32
41
52
62
72

This is much more helpful, right? Now you know that the product with ID 4 has only one occurrence in the table. You probably noticed I used COUNT(*)  in the above queries. This is not the only way of using that aggregate function. You can also learn about all the options for using COUNT().

By comparing the two queries above, you can see it’s possible to use a column in the GROUP BY but not use it in SELECT; the grouping still works.

And what if you used the column in the SELECT statement but not in the GROUP BY? No, you can’t do that. The general rule is: If the column is in SELECT and is not used in an aggregate function, it must be listed in the GROUP BY clause. Here are some more details about this rule.

Other Tips for Using GROUP BY in SQL

While using GROUP BY seems to be rather easy (and it is!), there are some other tips and tricks that will make using it a much more comfortable experience.

Use a Unique Identifier

When choosing which column to group by, you should generally use the column that uniquely identifies the group. If you don’t do that, the result you get could be misleading or just plain wrong.

For example, let’s show the products by typewriter brand but try grouping the data by product name. The query …

SELECT product_name,
	 typewriter_brand,
	 COUNT(*) AS product_count
FROM typewriter_products
GROUP BY product_name, typewriter_brand
ORDER BY typewriter_brand;

… will output the following table:

product_nametypewriter_brandproduct_count
typewriter ribbonAdler4
typewriter ribbonOlympia4
typewriter ribbonUnderwood5

If you grouped data this way, you would get the wrong impression there’s only one product for each of the three typewriter brands – one product appearing four times for Adler and Olympia typewriters and five times for Underwood.

What if you used the column product_id instead of product_name?

SELECT product_id,
	 typewriter_brand,
	 COUNT(*) AS product_count
FROM typewriter_products
GROUP BY product_id, typewriter_brand
ORDER BY typewriter_brand;

Now you see the result is a little different:

product_idtypewriter_brandproduct_count
6Adler2
7Adler2
1Olympia2
2Olympia2
3Underwood2
4Underwood1
5Underwood2

There are still products for three typewriter brands. However, now you know there are two products for Adler and Olympia, both appearing twice for each typewriter. There are three products for Underwood. Not only that, but you also know which products they are.

Let’s look at this example:

SELECT product_id,
	 product_name,
	 typewriter_brand,
	 COUNT(*) AS product_count
FROM typewriter_products
GROUP BY product_id, product_name, typewriter_brand
ORDER BY typewriter_brand;

This is an extended version of the previous query. We again group by product_id. To get the number of product occurrences, we use the COUNT() function. We also want more labels, so we add the product name and the typewriter brand to the SELECT.

Since these columns appear in the SELECT, they must also appear in GROUP BY. Note that all the columns do so except the one with the aggregate function.

Have a look at the output:

product_idproduct_nametypewriter_brandproduct_count
6typewriter ribbonAdler2
7typewriter ribbonAdler2
1typewriter ribbonOlympia2
2typewriter ribbonOlympia2
3typewriter ribbonUnderwood2
4typewriter ribbonUnderwood1
5typewriter ribbonUnderwood2

You can see this output gives the clearest picture. There are seven different products, but they are all typewriter ribbons. These products are for three typewriter brands. All products have two occurrences, except product #4.

Grouping by Value

You don’t always have to group by the ID column. It’s also possible to group data by value.

For example, if you wanted to know how many units there are by the ribbon color, you’d use this query:

SELECT ribbon_color,
	 SUM(units) AS units_sum
FROM typewriter_products
GROUP BY ribbon_color;

Here we use ribbon color as the grouping criteria. The aggregate function SUM() will total the units in the following way:

ribbon_colorunits_sum
Black508
Black + Red265

The output shows there are 508 black ribbons and 265 black and red ribbons.

If you’re not familiar with summing the values, check out the SUM() function explained. The MIN() & MAX() functions are also commonly used with GROUP BY.

Two Columns as the Group Definer

The GROUP BY clause allows you to group data by two or more columns; you already saw that. But it’s also possible to use two columns as the unique identifier of a group. For example:

SELECT ribbon_color,
	 typewriter_brand,
	 SUM(units) AS units_sum
FROM typewriter_products
GROUP BY ribbon_color, typewriter_brand
ORDER BY typewriter_brand;

In this query, we use the columns ribbon_color and typewriter_brand to define a group. Each ribbon color for a specific typewriter will be shown only once. For such a defined group, we calculate the number of ribbons available:

ribbon_colortypewriter_brandunits_sum
BlackAdler98
Black + RedAdler43
BlackOlympia185
Black + RedOlympia195
BlackUnderwood225
Black + RedUnderwood27

Grouping by an Expression

Instead of grouping only by columns, it’s also possible to group data by an expression.

Let’s take a look at the following query:

SELECT ribbon_brand,
	 ribbon_color,
	 units*price AS product_value,
 SUM(units*price) AS product_value_sum
FROM typewriter_products
GROUP BY ribbon_brand, ribbon_color, units*price
ORDER BY ribbon_brand;

Here we select the ribbon brand and color. Additionally, we also calculate the value of each group by multiplying the unit by the price.

Then we group data by the columns ribbon_brand and ribbon_color. This is something you’re used to. But we also add the formula for calculating the value to the GROUP BY clause. We want to show only unique values by ribbon brand and color. If there are several of the same calculated values within the same group, they will be shown as one row.

To make the aggregation more obvious, I added the sum of product values. You’ll soon see why.

ribbon_brandribbon_colorproduct_valueproduct_value_sum
All You NeedBlack346.36346.36
All You NeedBlack880.001,760.00
All You NeedBlack + Red327.21327.21
All You NeedBlack + Red480.00960.00
Our RibbonBlack1,350.001,350.00
Ribbons & UsBlack1,201.831,201.83
Ribbons & UsBlack1,333.261,333.26
Ribbons & UsBlack4,783.794,783.79
Ribbons & UsBlack + Red398.58398.58
Ribbons & UsBlack + Red818.10818.10
Ribbons & UsBlack + Red2,227.052,227.05

It might look like this table isn’t grouped at all. But let’s take a closer look. If the data weren’t grouped, the brand All You Need would have appeared six times, the same as Ribbons & Us. But it appears only four times. Why? Because the values 880.00 and 480.00 appear two times each, so they’re grouped together.

This is what the column product_value_sum tells you in the colored rows. These are the only rows where this column is different from product_value. The green row has a sum of 1,760.00 because value 880.00 appears twice. The red row’s sum is 960.00 because 480.00 appears twice.

Steps to Writing a Query With GROUP BY and Aggregate Functions

You’ve already internalized GROUP BY through writing all the above queries. But I think it’s worth having this as a separate list – the steps checklist.

  • Step 1: Identify the grouping column(s), i.e. a column or columns you want to group the data by. After you’ve identified it, put it in the GROUP BY clause.
  • Step 2: Depending on the metric you want to calculate, choose the appropriate aggregate function and use it in the SELECT statement.
  • Step 3: Use the grouping column in the SELECT statement. That way, you’ll get data labels for each group, not only the aggregate function output.

No SQL User Can Avoid Using GROUP BY

No SQL user known to me has managed to live a successful SQL life and write queries without GROUP BY. It’s impossible! So why not practice GROUP BY and other SQL expressions and functions? The wise choice would be our SQL Practice Set. It offers you the opportunity to practice all the different ways of using GROUP BY covered in the article.

Some other useful resources for GROUP BY examples are available on our blog; the concept also often appears in SQL job interview questions.