Back to articles list February 16, 2021 - 7 minutes read What Is GROUP BY in SQL? Marija Ilic Marija works as a data scientist in the banking industry. She specializes in big data platforms (Cloudera and Hadoop) with software and technologies such as Hive/Impala, Python and PySpark, Kafka, and R. Marija has an extensive background in DWH/ETL development in the banking industry. Her main interests are predictive modeling, real-time decision-making, and social network analysis. Outside of work, Marija enjoys listening to her favorite LPs on her old gramophone—and never grows tired of its soothing crackle. Tags: sql learn sql GROUP BY GROUP BY is a great SQL feature that allows you to perform powerful calculations on grouped data. Do you want to learn how to get powerful insights from your data? Would you like to know what GROUP BY does and when you should use it? If your answer is yes, keep reading. In this article, you will learn exactly that. Why You Need GROUP BY In today's digitized world – no matter what business you are in – database tables are usually huge, with millions of records. To get acquainted with such data, it isn't enough to know how to use SELECT statements. One of the first things that you should learn when dealing with massive amounts of data is how to use the SQL GROUP BY construction. This powerful feature enables you to group your data by some common attribute, after which you can perform calculations on each of these groups separately. When You Need GROUP BY So, you can use GROUP BY when you want to group records and then run some calculations on each group and display the results. The next question is when – in which situations or business scenarios – does it make sense to use the GROUP BY clause? Well, SQL GROUP BY is widely used in different industries and areas of business. For example: In sales, you can use GROUP BY to calculate the total number of products sold on a product or category level. HR might ask you to calculate the average salary for each company department using a table that contains information about employees, their salaries, and their departments. In the banking industry, the total number of active clients vs. non-active clients can be calculated by using the GROUP BY construction. In public health or medicine, you can calculate the total number of confirmed COVID cases for each country and province with GROUP BY. One thing's for sure – there are a lot of different business scenarios where GROUP BY is useful. No matter where you work as an SQL analyst, you will surely find yourself using GROUP BY every day. Now it’s time to learn how to use the GROUP BY clause. Let’s start with its syntax. GROUP BY Syntax The syntax of GROUP BY is easiest to explain with an example, so let's start by taking a look at the movies table shown below: idtitledirectorproduction_year 1PsychoAlfred Hitchcock1960 3Sweet and LowdownWoody Allen1993 4Talk to HerPedro Almodovar2002 2Midnight in ParisWoody Allen2011 8The Skin I Live InPedro Almodovar2011 As you can see, each movie is represented by one record in the table. For each movie, the table contains information about its director and production year. Some movies are made by the same director. If you would like to display the number of movies made by each director, you could use GROUP BY like this: SELECT director, count(*) AS number_of_movies FROM movies GROUP BY director; Here is a short explanation of what’s going on in this query: This SQL statement groups records by the column director. This is denoted by the GROUP BY keyword followed by the name of the column by which records are grouped (in our case, the column is director). The grouping column must also be listed after the SELECT keyword (i.e. inside the SELECT). This means that directors will be displayed on the output screen and specific calculations will be made for each group (for each director) separately. There are three groups of records because we have three different directors in our movies table (Alfred Hitchcock, Woody Allen, and Pedro Almodovar). The COUNT(*) function is run on each group. It will count all the rows inside each group. The resulting table shows us the total number of movies for each director: directornumber_of_movies Alfred Hitchcock1 Woody Allen2 Pedro Almodovar2 Notice how the result rows are collapsed?—?each group is displayed in one row. That is why the resulting table contains three rows instead of five (the number of rows in the table); there are three groups. How GROUP BY Works Now you know what the SQL engine will display as a result of our GROUP BY. But how does GROUP BY actually work? Look at the animation below: As you can see from this animation, the SQL engine first groups the rows per a defined key – in our case, the director column. Alfred Hitchcock movies are in one group, Woody Allen movies in another and Pedro Almodovar in a third. After grouping, a count is performed on each group separately and the result is displayed. In this case, the COUNT() function counts the total number of movies for each director. Maybe you’ve come across the SQL ORDER BY clause and are unsure about the difference between ordering and grouping rows. If so, I suggest that you take a look at the Difference Between GROUP BY and ORDER BY in Simple Words. Multiple Calculations in a GROUP BY In our previous example, we counted records (movies) for each director separately. There are a lot of other statistics that you can use in a GROUP BY clause. For example, instead of counting individual rows, you can calculate the average value for each group using the AVG() function or you can calculate totals by using SUM(). You can also use the MIN() or MAX() functions to retrieve the smallest (min) or largest (max) values from each group. We call such functions aggregate functions because they do calculations on groups of values and return a single value as a result. Let’s have one quick example of using GROUP BY with aggregate functions. In this query, you can display the earliest/latest film made by each director: SELECT director, min(production_year) AS the_earliest_film_made, max(production_year) AS the_latest_film_made FROM movies GROUP BY director; Here we are using the MIN() and MAX() functions to retrieve the desired information. We used two functions? because we’re retrieving two different statistics for each group of records. This is another good thing to know about GROUP BY – you can do different calculations in one SQL statement. Here is the result: directorthe_earliest_film_madethe_latest_film_made Alfred Hitchcock19601960 Woody Allen19932011 Pedro Almodovar20022011 Notice that Alfred Hitchcock’s minimum and maximum values are equal. This is because there is only one of his movies in our table; thus, MIN() and MAX() return the same value. Other directors in the movies table have more movies in their portfolio, so they have different values. We don’t have room to delve into aggregate functions here; you can find a detailed explanation about aggregate functions and their usage with GROUP BY in our article A Beginner’s Guide to SQL Aggregate Functions. GROUP BY With Multiple Columns In the previous example, we saw that it is possible to do more than one calculation in one GROUP BY statement. Similarly, it is also possible to group rows based on more than one column. This actually means that each specific group in the output is defined by more than one column value. Suppose our movies table also has a genre attribute for each of the movies: idtitledirectorproduction_yeargenre 1PsychoAlfred Hitchcock1960thriller 3Sweet and LowdownWoody Allen1993comedy/drama 4Talk to HerPedro Almodovar2002romance/drama 2Midnight in ParisWoody Allen2011comedy/drama 8The Skin I Live InPedro Almodovar2011thriller/drama Instead of grouping rows by director, we could group them by director and genre to calculate the number of films by each director in each genre: SELECT director, genre, count(*) as number_of_movies FROM movies GROUP BY director, genre; Here is a short explanation of the above query: Inside the GROUP BY, we have two columns: director and genre. Those two columns are also defined after the SELECT Remember, you must list all columns from the GROUP BY inside the SELECT. In this example, each group is defined by one director and one genre. This means that we will distinguish Pedro Almodovar’s thriller/drama movies as one group and his romance/drama movies as another group. Thus, we will have four rows (not three) in the output: directorgenrenumber_of_movies Alfred Hitchcockthriller1 Woody Allencomedy/drama2 Pedro Almodovarromance/drama1 Pedro Almodovarthriller/drama1 th style="background-color:#37c591" Neat, right? By now, you probably understand that GROUP BY is a very powerful and effective SQL feature. If you would like to see more GROUP BY examples, I suggest the article Examples of SQL GROUP BY. There you will find 5 additional examples which will help you to master GROUP BY. Deepen Your Knowledge of SQL GROUP BY Hopefully, these examples have shown you how easy it is to write a SQL GROUP BY clause. It is really useful in everyday business, as it helps you to get powerful data insights in just a few lines of code. Now it’s time for you to practice. I suggest that you start with the SQL Basics course on LearnSQL.com. If you’ve already been through the fundamentals and want to hone your skills, try our SQL Practice Set. Remember, only by practicing can you improve your knowledge and move towards the title of SQL expert. Tags: sql learn sql GROUP BY You may also like Using GROUP BY in SQL Master the powerful SQL GROUP BY command. Learn to compute statistical summaries with aggregate functions to solve real-world Business Intelligence challenges. Read more 7 Common GROUP BY Errors This article explains the most common errors beginners make when using the SQL GROUP BY clause. Read more GROUP BY Clause: How Well Do You Know It? The GROUP BY clause is the most basic way to compute statistics in SQL. It can be quite tough for beginners but it is really powerful. Read more GROUP BY in SQL Explained Need to refresh your knowledge of SQL GROUP BY? Learn how GROUP BY works and when it can be useful. Examples provided. Read more SQL Window Functions vs. GROUP BY: What’s the Difference? Window functions and GROUP BY may seem similar at first, but they’re quite different. Learn how window functions differ from GROUP BY and aggregate functions. Read more NULL Values and the GROUP BY Clause We've already covered how to use the GROUP BY clause but how does SQL's GROUP BY clause work when NULL values are involved? Find out! Read more Subscribe to our newsletter Join our weekly newsletter to be notified about the latest posts.