# Using GROUP BY in SQL

*Now that you know SQL’s core commands, power up your queries with the GROUP BY clause and aggregate functions. *

`GROUP BY`

is a SQL clause that arranges data into groups based on a common value (or values). It is widely used to obtain summary information for large datasets as well as for computing basic aggregate statistics.

Once you have grasped SQL’s basic commands like `SELECT`

, `WHERE`

, and `JOIN`

, mastering the `GROUP BY`

clause can help you implement your SQL knowledge in real-world business intelligence scenarios.

SQL `GROUP BY`

is most often used in combination with aggregate functions like `COUNT()`

, `MAX()`

, `MIN()`

, `SUM()`

, and `AVG()`

.It groups the result set by one or more columns, essentially creating summary rows for each group. These groups are usually based on categorical rather than continuous values.

For example, if you are using a dataset of global customers for your business, you can use the GROUP BY clause to summarize information such as the “total number of customers by each country” or the “total sales per customer”.

## GROUP BY Syntax

Now that we know what the `GROUP BY`

clause is, let's look at the syntax of a basic `GROUP BY`

query.

SELECT column_name(s) FROM table_name WHERE [condition] GROUP BY column_name1, column_name2 HAVING [condition] ORDER BY column_name

Here:

`SELECT`

is the standard SQL`SELECT`

query.`GROUP BY column_name1`

performs the grouping based on values in`column_name1`

.`column_name2`

is used when grouping is done on more than one column; you can group by any number of columns. This is optional.`HAVING [condition]`

and`WHERE [condition]`

are also optional; both are used to restrict the rows affected by the`GROUP BY`

clause. The key difference is thatand`WHERE`

filters the rows before grouping**HAVING filters the groups**themselves once they are formed. Important:`WHERE`

must precede the`GROUP BY`

clause and`HAVING`

must follow it.`ORDER BY [condition]`

is used after`GROUP BY`

to sort or order the data by a given`column_name`

.

At first, `GROUP BY`

and `ORDER BY`

appear to do the same thing – sort the data. But that is where their similarities end. As the name suggests, `ORDER BY`

simply orders the complete dataset in a defined order, while `GROUP BY`

groups the data as an aggregate output. Our earlier article, The Difference Between GROUP BY and ORDER BY in Simple Words, explains this in more detail; check it out if you need a refresher.

Now, let’s see how the `GROUP BY`

clause works with real queries.

## GROUP BY a Single Column

To help understand the effect of the `GROUP BY`

clause, let's execute a simple query on the

table, which has recorded 196 individual orders. Below, you can see the first five entries in the table:**Orders**

OrderID | CustomerID | EmployeeID | OrderDate | ShipperID |
---|---|---|---|---|

10248 | 90 | 5 | 1996-07-04 | 3 |

10249 | 81 | 6 | 1996-07-05 | 1 |

10250 | 34 | 4 | 1996-07-08 | 2 |

10251 | 84 | 3 | 1996-07-08 | 1 |

10252 | 76 | 4 | 1996-07-09 | 2 |

Using the

table, let’s find out how many unique customers have placed an order. Here’s the query:**Orders**

SELECT CustomerID FROM Orders GROUP BY CustomerID

RESULT

Number of Records: 74

CustomerID |
---|

2 |

3 |

4 |

5 |

7 |

This is an example of a simple `GROUP BY`

clause on a single column (`CustomerID`

); the output places all the rows with the same value for the `CustomerID`

column in the same group. It also returns the number of unique customers who have placed an order (74). Essentially, a `GROUP BY`

clause without an aggregate function behaves similarly to a `DISTINCT`

clause: it finds the unique values in a column.

Grouping can also be done by multiple columns. This is most useful when used with aggregate function(s).

## Grouping with Aggregate Functions

The practical application of SQL’s `GROUP BY`

command is summarizing statistical parameters for groups of data; this is usually achieved with aggregate functions. We’ll explore some examples now, using the

tables and the **Orders**

table, a sample of which is shown below:**OrderDetails**

OrderDetailID | OrderID | ProductID | Quantity |
---|---|---|---|

1 | 10248 | 11 | 12 |

2 | 10248 | 42 | 10 |

3 | 10248 | 72 | 5 |

4 | 10249 | 14 | 9 |

5 | 10249 | 51 | 40 |

## MAX(), MIN(), AVG(), SUM()

`GROUP BY`

is perfect for summarizing the statistical parameters of individual categories in your dataset. For example, let’s try to find the maximum, minimum, and average quantity of each `ProductID`

per order and the total number of each product ordered. We’ll use the

table and put the results in descending order by the sum quantities ordered for each product. This will shed light on the most popular **OrderDetails**`ProductID`

s. Here’s the query:

SELECT ProductID, MAX(Quantity), Min(Quantity), AVG(Quantity), SUM (Quantity) FROM OrderDetails GROUP BY ProductID Order By SUM(Quantity) DESC

RESULT

Number of Records: 77

ProductID | MAX(Quantity) | Min(Quantity) | AVG(Quantity) | SUM (Quantity) |
---|---|---|---|---|

31 | 70 | 4 | 32.714285714285715 | 458 |

60 | 80 | 6 | 35.833333333333336 | 430 |

35 | 100 | 4 | 41 | 369 |

59 | 70 | 6 | 24.714285714285715 | 346 |

2 | 60 | 7 | 31 | 341 |

The table below breaks down how some of these aggregate values are calculated for the `ProductID`

values 2 and 31. The `GROUP BY`

clause first groups the rows together for `ProductID`

2 and then calculates the `AVG()`

and `MAX()`

for the whole group. It does the same for each `ProductID`

for each of the aggregate functions in the query.

ProductID | Quantity | Average (?Quantity/# of ProductID) | Max (Quantity) |
---|---|---|---|

2 | 60 | 31 | 60 |

2 | 50 | ||

2 | 45 | ||

2 | 40 | ||

2 | 35 | ||

…… | …… | ||

31 | 70 | 32.714 | 70 |

31 | 60 | ||

31 | 56 | ||

31 | 42 | ||

31 | 40 | ||

…… | …… |

As you see, the `GROUP BY`

command is an extremely useful tool for finding statistical summaries for any class of variables.

## GROUP BY Multiple Columns with COUNT()

As mentioned earlier, you can use multiple columns in the `GROUP BY`

. In other words, you’re placing all rows with the same values in both `column_name1`

and `column_name2`

in one group.

Let’s consider an example where we want to count the number of times each employee has used each shipping service. Once again, we’ll use the

table.**Orders**

SELECT EmployeeID, ShipperID, COUNT(*) FROM Orders Group BY EmployeeID, ShipperID Order BY ShipperID, Count(*) DESC

RESULT

Number of Records: 26

EmployeeID | ShipperID | COUNT(*) |
---|---|---|

4 | 1 | 12 |

1 | 1 | 8 |

2 | 1 | 7 |

3 | 1 | 7 |

6 | 1 | 7 |

……. | ……. | ……. |

This query could be used, for example, to get feedback on shippers’ customer service from the employees who have used that service. As we can see above, there are 26 unique interaction pairs between employees and shippers!

How does grouping by two columns work behind the scenes? Rows with the same `EmployeeID`

and `ShipperID`

are paired into one group. The size of those groups (i.e. the number of records in the group) is calculated, as you can see below:

EmployeeID | ShipperID | Count |
---|---|---|

1 | 1 | |

1 | 1 | |

1 | 1 | 12 |

1 | 1 | |

1 | 1 | |

…… | ……. | |

4 | 1 | |

4 | 1 | |

4 | 1 | 8 |

4 | 1 | |

4 | 1 | |

…… | ……. |

The `GROUP BY`

command is also a powerful tool that can be combined with the `SQL HAVING`

and `WHERE`

clauses to define additional conditions for the returned results.

Keep in mind that there are certain subtleties when `GROUP BY`

is used with columns that have `NULL`

values.

## Mastering SQL GROUP BY

Needless to say, only practice will make you master this command! Practice is best done on real-world datasets, where you can visualize and relate to the business needs. The SQL Basics course includes a comprehensive set of interactive exercises. You’ll learn how to group rows and compute statistics with aggregate functions on practical datasets. Try it out, and good luck!