Back to articles list Articles Cookbook
7 minutes read

What Is GROUP BY in SQL?

GROUP BY is a great SQL feature that allows you to perform powerful calculations on grouped data. Do you want to learn how to get powerful insights from your data? Would you like to know what GROUP BY does and when you should use it? If your answer is yes, keep reading. In this article, you will learn exactly that.

Why You Need GROUP BY

In today's digitized world – no matter what business you are in – database tables are usually huge, with millions of records. To get acquainted with such data, it isn't enough to know how to use SELECT statements. One of the first things that you should learn when dealing with massive amounts of data is how to use the SQL GROUP BY construction. This powerful feature enables you to group your data by some common attribute, after which you can perform calculations on each of these groups separately.

When You Need GROUP BY

So, you can use GROUP BY when you want to group records and then run some calculations on each group and display the results. The next question is when – in which situations or business scenarios – does it make sense to use the GROUP BY clause?

Well, SQL GROUP BY  is widely used in different industries and areas of business. For example:

  • In sales, you can use GROUP BY to calculate the total number of products sold on a product or category level.
  • HR might ask you to calculate the average salary for each company department using a table that contains information about employees, their salaries, and their departments.
  • In the banking industry, the total number of active clients vs. non-active clients can be calculated by using the GROUP BY construction.
  • In public health or medicine, you can calculate the total number of confirmed COVID cases for each country and province with GROUP BY.

One thing's for sure – there are a lot of different business scenarios where GROUP BY is useful. No matter where you work as an SQL analyst, you will surely find yourself using GROUP BY every day.

Now it’s time to learn how to use the GROUP BY clause. Let’s start with its syntax.

GROUP BY Syntax

The syntax of GROUP BY is easiest to explain with an example, so let's start by taking a look at the movies table shown below:

idtitledirectorproduction_year
1PsychoAlfred Hitchcock1960
3Sweet and LowdownWoody Allen1993
4Talk to HerPedro Almodovar2002
2Midnight in ParisWoody Allen2011
8The Skin I Live InPedro Almodovar2011

As you can see, each movie is represented by one record in the table. For each movie, the table contains information about its director and production year.

Some movies are made by the same director. If you would like to display the number of movies made by each director, you could use GROUP BY like this:

SELECT director,
      count(*) AS number_of_movies
FROM movies
GROUP BY director;

Here is a short explanation of what’s going on in this query:

  • This SQL statement groups records by the column director. This is denoted by the GROUP BY keyword followed by the name of the column by which records are grouped (in our case, the column is director).
  • The grouping column must also be listed after the SELECT keyword (i.e. inside the SELECT). This means that directors will be displayed on the output screen and specific calculations will be made for each group (for each director) separately.
  • There are three groups of records because we have three different directors in our movies table (Alfred Hitchcock, Woody Allen, and Pedro Almodovar).
  • The COUNT(*) function is run on each group. It will count all the rows inside each group.

The resulting table shows us the total number of movies for each director:

directornumber_of_movies
Alfred Hitchcock1
Woody Allen2
Pedro Almodovar2

Notice how the result rows are collapsed?—?each group is displayed in one row. That is why the resulting table contains three rows instead of five (the number of rows in the table); there are three groups.

How GROUP BY Works

Now you know what the SQL engine will display as a result of our GROUP BY. But how does GROUP BY actually work?  Look at the animation below:

GROUP BY

As you can see from this animation, the SQL engine first groups the rows per a defined key – in our case, the director column. Alfred Hitchcock movies are in one group, Woody Allen movies in another and Pedro Almodovar in a third. After grouping, a count is performed on each group separately and the result is displayed. In this case, the COUNT() function counts the total number of movies for each director.

Maybe you’ve come across the SQL ORDER BY clause and are unsure about the difference between ordering and grouping rows. If so, I suggest that you take a look at the Difference Between GROUP BY and ORDER BY in Simple Words.

Multiple Calculations in a GROUP BY

In our previous example, we counted records (movies) for each director separately. There are a lot of other statistics that you can use in a GROUP BY clause. For example, instead of counting individual rows, you can calculate the average value for each group using the AVG() function or you can calculate totals by using SUM(). You can also use the MIN() or MAX() functions to retrieve the smallest (min) or largest (max) values from each group. We call such functions aggregate functions because they do calculations on groups of values and return a single value as a result.

Let’s have one quick example of using GROUP BY with aggregate functions. In this query, you can display the earliest/latest film made by each director:

SELECT director,
      min(production_year) AS the_earliest_film_made,
      max(production_year) AS the_latest_film_made
FROM movies 
GROUP BY director;

Here we are using the MIN() and MAX() functions to retrieve the desired information. We used two functions? because we’re retrieving two different statistics for each group of records. This is another good thing to know about GROUP BY – you can do different calculations in one SQL statement.

Here is the result:

directorthe_earliest_film_madethe_latest_film_made
Alfred Hitchcock19601960
Woody Allen19932011
Pedro Almodovar20022011

Notice that Alfred Hitchcock’s minimum and maximum values are equal. This is because there is only one of his movies in our table; thus, MIN() and MAX() return the same value. Other directors in the movies table have more movies in their portfolio, so they have different values.

We don’t have room to delve into aggregate functions here; you can find a detailed explanation about aggregate functions and their usage with GROUP BY  in our article SQL Aggregate Functions: A Comprehensive Guide for Beginners.

GROUP BY With Multiple Columns

In the previous example, we saw that it is possible to do more than one calculation in one GROUP BY statement. Similarly, it is also possible to group rows based on more than one column. This actually means that each specific group in the output is defined by more than one column value.

Suppose our movies table also has a genre attribute for each of the movies:

idtitledirectorproduction_yeargenre
1PsychoAlfred Hitchcock1960thriller
3Sweet and LowdownWoody Allen1993comedy/drama
4Talk to HerPedro Almodovar2002romance/drama
2Midnight in ParisWoody Allen2011comedy/drama
8The Skin I Live InPedro Almodovar2011thriller/drama

Instead of grouping rows by director, we could group them by director and genre to calculate the number of films by each director in each genre:

SELECT director,
      genre,
      count(*) as number_of_movies 
FROM movies
GROUP BY director, genre;

Here is a short explanation of the above query:

  • Inside the GROUP BY, we have two columns: director and genre. Those two columns are also defined after the SELECT Remember, you must list all columns from the GROUP BY inside the SELECT.
  • In this example, each group is defined by one director and one genre. This means that we will distinguish Pedro Almodovar’s thriller/drama movies as one group and his romance/drama movies as another group. Thus, we will have four rows (not three) in the output:
directorgenrenumber_of_movies
Alfred Hitchcockthriller1
Woody Allencomedy/drama2
Pedro Almodovarromance/drama1
Pedro Almodovarthriller/drama1

th style="background-color:#37c591"

Neat, right? By now, you probably understand that GROUP BY is a very powerful and effective SQL feature. If you would like to see more GROUP BY examples, I suggest the article Examples of SQL GROUP BY. There you will find 5 additional examples which will help you to master GROUP BY.

Deepen Your Knowledge of SQL GROUP BY

Hopefully, these examples have shown you how easy it is to write a SQL GROUP BY clause. It is really useful in everyday business, as it helps you to get powerful data insights in just a few lines of code.

Now it’s time for you to practice. I suggest that you start with the SQL Basics course on LearnSQL.com. If you’ve already been through the fundamentals and want to hone your skills, try our SQL Practice Set. Remember, only by practicing can you improve your knowledge and move towards the title of SQL expert.