16th Feb 2021 7 minutes read

What Is GROUP BY in SQL?

GROUP BY

Table of Contents

Why You Need GROUP BY
When You Need GROUP BY
GROUP BY Syntax
How GROUP BY Works
Multiple Calculations in a GROUP BY
GROUP BY With Multiple Columns
Deepen Your Knowledge of SQL GROUP BY

GROUP BY is a great SQL feature that allows you to perform powerful calculations on grouped data. Do you want to learn how to get powerful insights from your data? Would you like to know what GROUP BY does and when you should use it? If your answer is yes, keep reading. In this article, you will learn exactly that.

Why You Need GROUP BY

In today's digitized world – no matter what business you are in – database tables are usually huge, with millions of records. To get acquainted with such data, it isn't enough to know how to use SELECT statements. One of the first things that you should learn when dealing with massive amounts of data is how to use the SQL GROUP BY construction. This powerful feature enables you to group your data by some common attribute, after which you can perform calculations on each of these groups separately.

When You Need GROUP BY

So, you can use GROUP BY when you want to group records and then run some calculations on each group and display the results. The next question is when – in which situations or business scenarios – does it make sense to use the GROUP BY clause?

Well, SQL GROUP BY is widely used in different industries and areas of business. For example:

In sales, you can use GROUP BY to calculate the total number of products sold on a product or category level.
HR might ask you to calculate the average salary for each company department using a table that contains information about employees, their salaries, and their departments.
In the banking industry, the total number of active clients vs. non-active clients can be calculated by using the GROUP BY construction.
In public health or medicine, you can calculate the total number of confirmed COVID cases for each country and province with GROUP BY.

One thing's for sure – there are a lot of different business scenarios where GROUP BY is useful. No matter where you work as an SQL analyst, you will surely find yourself using GROUP BY every day.

Now it’s time to learn how to use the GROUP BY clause. Let’s start with its syntax.

GROUP BY Syntax

The syntax of GROUP BY is easiest to explain with an example, so let's start by taking a look at the movies table shown below:

id	title	director	production_year
1	Psycho	Alfred Hitchcock	1960
3	Sweet and Lowdown	Woody Allen	1993
4	Talk to Her	Pedro Almodovar	2002
2	Midnight in Paris	Woody Allen	2011
8	The Skin I Live In	Pedro Almodovar	2011

As you can see, each movie is represented by one record in the table. For each movie, the table contains information about its director and production year.

Some movies are made by the same director. If you would like to display the number of movies made by each director, you could use GROUP BY like this:

SELECT director,
      count(*) AS number_of_movies
FROM movies
GROUP BY director;

Here is a short explanation of what’s going on in this query:

This SQL statement groups records by the column director. This is denoted by the GROUP BY keyword followed by the name of the column by which records are grouped (in our case, the column is director).
The grouping column must also be listed after the SELECT keyword (i.e. inside the SELECT). This means that directors will be displayed on the output screen and specific calculations will be made for each group (for each director) separately.
There are three groups of records because we have three different directors in our movies table (Alfred Hitchcock, Woody Allen, and Pedro Almodovar).
The COUNT(*) function is run on each group. It will count all the rows inside each group.

The resulting table shows us the total number of movies for each director:

director	number_of_movies
Alfred Hitchcock	1
Woody Allen	2
Pedro Almodovar	2

Notice how the result rows are collapsed?—?each group is displayed in one row. That is why the resulting table contains three rows instead of five (the number of rows in the table); there are three groups.

How GROUP BY Works

Now you know what the SQL engine will display as a result of our GROUP BY. But how does GROUP BY actually work? Look at the animation below:

As you can see from this animation, the SQL engine first groups the rows per a defined key – in our case, the director column. Alfred Hitchcock movies are in one group, Woody Allen movies in another and Pedro Almodovar in a third. After grouping, a count is performed on each group separately and the result is displayed. In this case, the COUNT() function counts the total number of movies for each director.

Maybe you’ve come across the SQL ORDER BY clause and are unsure about the difference between ordering and grouping rows. If so, I suggest that you take a look at the Difference Between GROUP BY and ORDER BY in Simple Words.

Multiple Calculations in a GROUP BY

In our previous example, we counted records (movies) for each director separately. There are a lot of other statistics that you can use in a GROUP BY clause. For example, instead of counting individual rows, you can calculate the average value for each group using the AVG() function or you can calculate totals by using SUM(). You can also use the MIN() or MAX() functions to retrieve the smallest (min) or largest (max) values from each group. We call such functions aggregate functions because they do calculations on groups of values and return a single value as a result.

Let’s have one quick example of using GROUP BY with aggregate functions. In this query, you can display the earliest/latest film made by each director:

SELECT director,
      min(production_year) AS the_earliest_film_made,
      max(production_year) AS the_latest_film_made
FROM movies 
GROUP BY director;

Here we are using the MIN() and MAX() functions to retrieve the desired information. We used two functions? because we’re retrieving two different statistics for each group of records. This is another good thing to know about GROUP BY – you can do different calculations in one SQL statement.

Here is the result:

director	the_earliest_film_made	the_latest_film_made
Alfred Hitchcock	1960	1960
Woody Allen	1993	2011
Pedro Almodovar	2002	2011

Notice that Alfred Hitchcock’s minimum and maximum values are equal. This is because there is only one of his movies in our table; thus, MIN() and MAX() return the same value. Other directors in the movies table have more movies in their portfolio, so they have different values.

We don’t have room to delve into aggregate functions here; you can find a detailed explanation about aggregate functions and their usage with GROUP BY in our article SQL Aggregate Functions: A Comprehensive Guide for Beginners.

GROUP BY With Multiple Columns

In the previous example, we saw that it is possible to do more than one calculation in one GROUP BY statement. Similarly, it is also possible to group rows based on more than one column. This actually means that each specific group in the output is defined by more than one column value.

Suppose our movies table also has a genre attribute for each of the movies:

id	title	director	production_year	genre
1	Psycho	Alfred Hitchcock	1960	thriller
3	Sweet and Lowdown	Woody Allen	1993	comedy/drama
4	Talk to Her	Pedro Almodovar	2002	romance/drama
2	Midnight in Paris	Woody Allen	2011	comedy/drama
8	The Skin I Live In	Pedro Almodovar	2011	thriller/drama

Instead of grouping rows by director, we could group them by director and genre to calculate the number of films by each director in each genre:

SELECT director,
      genre,
      count(*) as number_of_movies 
FROM movies
GROUP BY director, genre;

Here is a short explanation of the above query:

Inside the GROUP BY, we have two columns: director and genre. Those two columns are also defined after the SELECT Remember, you must list all columns from the GROUP BY inside the SELECT.
In this example, each group is defined by one director and one genre. This means that we will distinguish Pedro Almodovar’s thriller/drama movies as one group and his romance/drama movies as another group. Thus, we will have four rows (not three) in the output:

director	genre	number_of_movies
Alfred Hitchcock	thriller	1
Woody Allen	comedy/drama	2
Pedro Almodovar	romance/drama	1
Pedro Almodovar	thriller/drama	1

th style="background-color:#37c591"

Neat, right? By now, you probably understand that GROUP BY is a very powerful and effective SQL feature. If you would like to see more GROUP BY examples, I suggest the article Examples of SQL GROUP BY. There you will find 5 additional examples which will help you to master GROUP BY.

Deepen Your Knowledge of SQL GROUP BY

Hopefully, these examples have shown you how easy it is to write a SQL GROUP BY clause. It is really useful in everyday business, as it helps you to get powerful data insights in just a few lines of code.

Now it’s time for you to practice. I suggest that you start with the SQL Basics course on LearnSQL.com. If you’ve already been through the fundamentals and want to hone your skills, try our SQL Practice Set. Remember, only by practicing can you improve your knowledge and move towards the title of SQL expert.

Tags:

GROUP BY