Back to articles list Articles Cookbook
6 minutes read

What Is the SQL GROUPING SETS Clause, and How Do You Use it?

GROUPING SETS are groups, or sets, of columns by which rows can be grouped together. Instead of writing multiple queries and combining the results with a UNION, you can simply use GROUPING SETS.

GROUPING SETS in SQL can be considered an extension of the GROUP BY clause. It allows you to define multiple grouping sets in the same query.

Let’s look at its syntax and how it can be equivalent to a GROUP BY with multiple UNION ALL clauses.

SQL GROUPING SETS Syntax

The general syntax of the GROUPING SETS is as follows:

SELECT
    aggregate_function(column_1)
    column_2,
    column_3,
FROM
    table_name
GROUP BY
    GROUPING SETS (
        (column_2, column_3),
        (column_2),
        (column_3),
        ()
);

You can see how we are grouping by the different sets.

This syntax is equivalent to the following lengthier query that uses GROUP BY with UNION ALL to combine the results:

SELECT SUM(column_1), column_2, column_3
FROM table_name
GROUP BY
    column_2,
    column_3

UNION ALL

SELECT SUM(column_1), column_2, NULL
FROM table_name
GROUP BY column_2

UNION ALL

SELECT SUM(column_1), NULL, column_3
FROM table_name
GROUP BY column_3

UNION ALL

SELECT SUM(column_1), NULL, NULL
FROM table_name

If you use the GROUP BY like this, you need multiple UNION ALL clauses to combine the data from different sources. UNION ALL also requires all result sets to have the same number of columns with compatible data types, so you need to adjust the queries by adding a NULL value where required.

Even though the query will work as you expect, it has two main problems:

  • It is lengthy and not very manageable.
  • It can lead to a performance issue, because SQL has to scan the sales table each time.

The GROUPING SETS clause addresses these problems. But how does it affect the output compared to a traditional GROUP BY clause? Time to look at an example!

SQL GROUPING SETS Example

We need some sample data. Let’s create a table called payments that contains all payments our company has received in January, February, and March for the past four years, 2018 to 2021. The exact store where the payment occurred is denoted by the store_id column.

To create this table, execute the following query:

CREATE TABLE payment (payment_amount decimal(8,2), payment_date date, store_id int);

INSERT INTO payment
VALUES
(1200.99, '2018-01-18', 1),
(189.23, '2018-02-15', 1),
(33.43, '2018-03-03', 3),
(7382.10, '2019-01-11', 2),
(382.92, '2019-02-18', 1),
(322.34, '2019-03-29', 2),
(2929.14, '2020-01-03', 2),
(499.02, '2020-02-19', 3),
(994.11, '2020-03-14', 1),
(394.93, '2021-01-22', 2),
(3332.23, '2021-02-23', 3),
(9499.49, '2021-03-10', 3),
(3002.43, '2018-02-25', 2),
(100.99, '2019-03-07', 1),
(211.65, '2020-02-02', 1),
(500.73, '2021-01-06', 3);

You can view the data using this simple SELECT clause:

SELECT * FROM payment ORDER BY payment_date;

Executing this query yields the result:

payment_amountpayment_datestore_id
1200.992018-01-181
189.232018-02-151
3002.432018-02-252
33.432018-03-033
7382.102019-01-112
382.922019-02-181
100.992019-03-071
322.342019-03-292
2929.142020-01-032
211.652020-02-021
499.022020-02-193
994.112020-03-141
500.732021-01-063
394.932021-01-222
3332.232021-02-233
9499.492021-03-103

You can see that there are multiple entries for some stores. Imagine we are preparing a report and we want to see one total for each store. The SUM() aggregate function can help us with this. We will also use the GROUP BY clause to group our results by year and store.

SELECT
  SUM(payment_amount),
  YEAR(payment_date) AS 'Payment Year',
  store_id AS 'Store'
FROM payment
GROUP BY YEAR(payment_date), store_id
ORDER BY YEAR(payment_date), store_id;
SUM(payment_amount)Payment YearStore
1390.2220181
3002.4320182
33.4320183
483.9120191
7704.4420192
1205.7620201
2929.1420202
499.0220203
394.9320212
13332.4520213

The results are aggregated by each unique combination of year and store.

However, we can’t see the total payments by year: the total payments for 2018, 2019, 2020, or 2021. We cannot see the totals by store either, which would be a useful metric to have. Using GROUPING SETS allows us to view these totals.

SELECT
  SUM(payment_amount),
  YEAR(payment_date) AS 'Payment Year',
  store_id AS 'Store'
FROM payment_new
GROUP BY GROUPING SETS (YEAR(payment_date), store_id)
ORDER BY YEAR(payment_date), store_id;
SUM(payment_amount)Payment YearStore
3079.89NULL1
14030.94NULL2
13864.90NULL3
4426.082018NULL
8188.352019NULL
4633.922020NULL
13727.382021NULL

Wow, our results changed drastically! Now, we see just the grand totals for each store along with the grand totals for each year.

For the columns by which the rows are not grouped, you see the NULL values.

Remember that you can include multiple GROUP BY clauses in your GROUPING SETS. Applying this to our query yields the following:

SELECT
  SUM(payment_amount),
  YEAR(payment_date) AS 'Payment Year',
  store_id AS 'Store'
FROM payment_new
GROUP BY GROUPING SETS
(
(YEAR(payment_date), store_id),
(YEAR(payment_date)),
(store_id)
)
ORDER BY YEAR(payment_date), store_id;
SUM(payment_amount)Payment YearStore
3079.89NULL1
14030.94NULL2
13864.90NULL3
4426.082018NULL
1390.2220181
3002.4320182
33.4320183
8188.352019NULL
483.9120191
7704.4420192
4633.922020NULL
1205.7620201
2929.1420202
499.0220203
13727.382021NULL
394.9320212
13332.4520213

Before ending this tutorial, we should mention two other SQL GROUP BY extensions that could prove useful for your particular project or scenario: ROLLUP and CUBE. These topics are covered in great detail in this Advanced SQL learning track from LearnSQL.com, which features window functions, GROUP BY extensions, and recursive queries.

SQL ROLLUP Example

Similar to GROUPING SETS, you can use the ROLLUP option in a single query to generate multiple grouping sets.

ROLLUP assumes a hierarchy among the input columns. For example, if the input columns are:

GROUP BY ROLLUP(column_1,column_2)

the hierarchy for this is column_1 > column_2, and ROLLUP generates the following grouping sets:

(column_1, column_2)
(column_1)
()

ROLLUP generates all grouping sets that make sense in this hierarchy. It generates a subtotal row every time the value of column_1 changes; this is the hierarchy we have provided. For this reason, we often use ROLLUP to generate subtotals and grand totals in reporting. The ordering of your columns in ROLLUP is very important.

Let's look at a query that uses ROLLUP:

SELECT
  SUM(payment_amount),
  YEAR(payment_date) AS 'Payment Year',
  store_id AS 'Store'
FROM payment
GROUP BY ROLLUP (YEAR(payment_date), store_id)
ORDER BY YEAR(payment_date), store_id
SUM(payment_amount)Payment YearStore
30975.73NULLNULL
4426.082018NULL
1390.2220181
3002.4320182
33.4320183
8188.352019NULL
483.9120191
7704.4420192
4633.922020NULL
1205.7620201
2929.1420202
499.0220203
13727.382021NULL
394.9320212
13332.4520213

The grand total is shown at the top of the result:

30975.73NULLNULL

The rest of the result is structured as follows. First, the yearly total is shown:

4426.082018NULL

This is followed by the totals by store by year:

1390.2220181
3002.4320182
33.4320183

As you can see, ROLLUP generates a subtotal row every time the value of Payment Year changes, since this is the hierarchy we provided. This example shows how useful ROLLUP can be for reporting purposes.

SQL CUBE Example

Similar to ROLLUP, CUBE is an extension of the GROUP BY clause. It allows you to generate subtotals for all combinations of the grouping columns specified in the GROUP BY clause.

The CUBE is like combining GROUPING SETS and ROLLUP. It shows the detailed output of both.

SELECT
  SUM(payment_amount),
  YEAR(payment_date) AS 'Payment Year',
  store_id AS 'Store'
FROM payment
GROUP BY CUBE (YEAR(payment_date), store_id)
ORDER BY YEAR(payment_date), store_id
SUM(payment_amount)Payment YearStore
30975.73NULLNULL
3079.89NULL1
14030.94NULL2
13864.90NULL3
4426.082018NULL
1390.2220181
3002.4320182
33.4320183
8188.352019NULL
483.9120191
7704.4420192
4633.922020NULL
1205.7620201
2929.1420202
499.0220203
13727.382021NULL
394.9320212
13332.4520213

The main difference in this output from the ROLLUP example is that the grand total for each store is also shown here.

3079.89NULL1
14030.94NULL2
13864.90NULL3

Apart from these rows, all the rows in this result are the same as the result of the ROLLUP.

This concludes our comparison of GROUPING SETS, ROLLUP, and CUBE! You can find more examples in this article about grouping, rolling and cubing data.

Group Your Data Effectively With SQL GROUP BY Extensions

Gaining mastery of the SQL GROUP BY extensions will take practice. Options like GROUPING SETS, ROLLUP, and CUBE allow you to manipulate the results of your queries in different ways. Knowing how to use these extensions effectively reduces the need for manually formatting your data before passing it on to relevant stakeholders.

To expand your knowledge further in this area, consider this GROUP BY extensions course from LearnSQL.com that covers GROUPING SETS, ROLLUP, and CUBE.