15th Sep 2020 6 minutes read GROUP BY in SQL Explained Kateryna Koidan GROUP BY Table of Contents How Does SQL GROUP BY Work? GROUP BY in Action Using Aggregate Functions with GROUP BY GROUP BY Two Columns Ready to Use GROUP BY in SQL? The SQL GROUP BY statement is easy to use, but it can be hard to master. Learn what you can do with GROUP BY, how it works, and how to use it with aggregate functions. GROUP BY is one of the basic SQL tools. However, it might be difficult for beginners to understand how GROUP BY works and how it is used to compute statistics with different aggregate functions. In this article, I’ll use several examples to show exactly how GROUP BY organizes rows and how aggregate functions can be used to calculate statistics on one or more columns. So, let’s get started. How Does SQL GROUP BY Work? GROUP BY is an indispensable tool for any data analyst working with SQL. If you want to organize your data in groups and calculate some kind of aggregate statistics for these groups, the GROUP BY clause is what you need. So, what does GROUP BY do? Basically, it groups rows with the same value into one group or bucket. Let’s see how it works. Suppose we’re running a bookstore and want to know how many books of different genres we have in stock. Our database includes a table that lists the books’ titles, genres, and stock quantity. The visualization below shows how the GROUP BY clause creates groups from table data. We want to know the total quantity of books for each genre; thus, GROUP BY groups the books of the same genre and sums up the corresponding quantities. This creates a result table that lists genres and their total quantity of books in our stock. For a more detailed introduction to the GROUP BY topic, see this video tutorial from our “We Learn SQL” series. GROUP BY in Action Now it’s time for more specific examples of SQL queries with a GROUP BY clause. We’ll use the books table, which stores the ID, title, author, genre, language, price, and quantity of each novel we stock. idtitleauthorgenrelangpriceqty 1Les Trois MousquetairesAlexandre Dumasadventurefr11.904 2A Game of ThronesGeorge R.R. Martinfantasyen8.495 3Pride and PrejudiceJane Austenromanceen9.992 4Vampire AcademyRichelle Meadfantasyen7.993 5IvanhoeWalter Scottadventureen9.993 6ArmanceStendhalromancefr5.881 To calculate the total number of books of each genre, we’ll use the following query: SELECT genre, SUM(qty) AS total FROM books GROUP BY genre; Here, in the GROUP BY clause, we select our rows to be grouped by the column genre. Then the SUM(qty) function in the SELECT statement sums the qty values within each group (i.e. each book genre), with the result displayed in the corresponding total field: genretotal adventure7 fantasy8 romance3 Using Aggregate Functions with GROUP BY GROUP BY puts rows with the same value into one bucket. We usually want to compute some statistics for this group of rows, like the average value or the total quantity. To this end, SQL provides aggregate functions that combine values from a certain column into one value for the respective group. So far, we’ve only used SUM() as our aggregate function for grouping the book titles in stock. However, this is not the only aggregate function you can use with GROUP BY. SQL also offers: COUNT() to calculate the number of rows in each group. AVG() to find the average value for each group. MIN() to return the minimum value in each group. MAX() to return the maximum value in each group. Let’s see how the AVG() function works with GROUP BY. This time, we want to calculate the average price for books in each genre. We’ll start by visualizing the output we want to get. We’re again grouping our books by genre, but this time we want to calculate the average book price in each genre. The SQL query for this looks as follows: SELECT genre, AVG(price) AS avg_price FROM books GROUP BY genre; This query creates a table with two columns (genre and avg_price), where the average price is calculated by averaging the price values for the books of each genre: genreavg_price adventure10.945 fantasy8.24 romance7.935 Actually, we are not limited to using only one aggregate function with a GROUP BY clause. So, let’s add the information about the minimum and maximum price of the books in each genre: SELECT genre, MIN(price) AS min_price, AVG(price) AS avg_price, MAX(price) AS max_price FROM books GROUP BY genre; The result set now includes four columns: genre, min_price, avg_price, and max_price. genremin_priceavg_pricemax_price adventure9.9910.94511.90 fantasy7.998.248.49 romance5.887.9359.99 Note that when using GROUP BY the SELECT statement may only include fields that are either used in the aggregate function or listed in the GROUP BY clause. For example, in our case, we cannot add title or author to our result set. To do so makes no sense, as each row in our output table includes information on several books with different titles and authors. GROUP BY Two Columns In SQL, you can also group your data using several columns. For example, let’s say we want to group our books not only by genre but also by language. And we want to calculate the number of book titles in each category. Hint: The COUNT() aggregate function will assist us in calculating the number of book titles (i.e. rows). To compute the number of book titles by genre and language, we’ll use the following query: SELECT genre, lang, count(title) AS titles FROM books GROUP BY genre, lang; We’ll get an output table with three columns: genre, language (lang), and the number of titles in that category: genrelangtitles adventureen1 adventurefr1 fantasyen2 romanceen1 romancefr1 So, in our example dataset, we have: One English-language book title in the adventure genre (Ivanhoe). One French-language book title in the adventure genre (Les Trois Mousquetaires). Two English-language book titles in the fantasy genre (A Game of Thrones and Vampire Academy). One English-language book title in the romance genre (Pride and Prejudice). One French-language book title in the romance genre (Armance). As you see, there are no French-language fantasy books in our dataset. For more examples of GROUP BY applications, check out this article showing how GROUP BY can be leveraged in realistic business scenarios. Ready to Use GROUP BY in SQL? Now you have a better understanding of how GROUP BY can assist you with organizing and analyzing your data. However, if you want to handle GROUP BY like a pro, you need lots of practice. LearnSQL.com offers an SQL Basics course that includes 129 interactive exercises covering aggregation, grouping, JOINs, subqueries, and more. This course is a perfect opportunity to learn how to group rows and compute statistics with aggregate functions. Happy learning! Tags: GROUP BY