Back to articles list Articles Cookbook
8 minutes read

Top 9 SQL GROUP BY Interview Questions

Preparing for an SQL job interview? This guide will walk you through nine basic, intermediate, and advanced GROUP BY questions that you may have to answer.

There are common GROUP BY questions that pop up in a lot of SQL job interviews; in this article, we’ll explain how to answer them. If you are a data analyst or another data professional preparing for an interview, this article aims to help you out. Of course, anyone interested in SQL and the GROUP BY clause can read these questions, too. You’ll probably learn something new and interesting about this popular clause.

This article is divided into two sections: basic questions and intermediate/advanced questions. This gradual approach also reflects our courses that cover the GROUP BY clause, including our SQL Basics course. In 129 interactive exercises, it introduces important basic SQL topics, such as GROUP BY, JOINs, WHERE, and set operations (UNION, INTERSECT, EXCEPT).

The intermediate and advanced nuances of GROUP BY are covered in our Creating Basic SQL Reports course. Here you’ll learn how to aggregate data, use the CASE WHEN statement, create more complex metrics, ratios, and data comparisons, and work with Common Table Expressions (CTEs). If you want to learn more, take a look at this article on the Creating Basic SQL Reports course.

Let’s kick off this overview with some basic interview questions!

Basic GROUP BY Interview Questions

1: What does GROUP BY do?

The GROUP BY clause in SQL arranges query output into groups, with all the rows with the same value in a given column belonging to one group. For example, if you have a table that stores employee data, you can count employees by department or find the average salary for every department. GROUP BY is usually used with the aggregate functions such as COUNT(), SUM(), AVG(), MIN(), MAX().

You can find a more thorough overview of GROUP BY in our article How Does SQL GROUP BY Work?

2: Use the table movie_streaming to write three queries.

It’s typical for the interviewer to ask the candidate to write a query using the GROUP BY clause. We’ll demonstrate three examples showcasing common GROUP BY topics.

The table movie_streaming contains statistics on how many times a movie was streamed on each day.

12022-04-01FargoCoen brothers495
22022-04-01The Big LebowskiCoen brothers512
32022-04-01No Country for Old MenCoen brothers270
42022-04-01DogtoothYorgos Lanthimos157
52022-04-01The LobsterYorgos Lanthimos247
62022-04-01The Killing of a Sacred DeerYorgos Lanthimos320
72022-04-02FargoCoen brothers321
82022-04-02The Big LebowskiCoen brothers905
92022-04-02No Country for Old MenCoen brothers308
102022-04-02DogtoothYorgos Lanthimos233
112022-04-02The LobsterYorgos Lanthimos405
122022-04-02The Killing of a Sacred DeerYorgos Lanthimos109

2a: Find the total number of streams by date.

This question asks you to group data by only one column:

SELECT date,
	 SUM(number_of_streams) AS total_number_of_streams
FROM movie_streaming
GROUP BY date;

The code output will be:


The grouping is done on the column date, and the SUM() aggregate function is used to get the number of streams by date.

2b: Find the total number of streams by date and director.

To get an answer to this question, data needs to be grouped by two columns. To do this, the columns date and director have to be stated in GROUP BY and separated with a comma.

SELECT date,
	 SUM(number_of_streams) AS total_number_of_streams
FROM movie_streaming
GROUP BY date, director;

This is what the code will return:

2022-04-01Yorgos Lanthimos724
2022-04-02Coen brothers1,534
2022-04-02Yorgos Lanthimos747
2022-04-01Coen brothers1,277

2c: Find the total number of streams by date and director. Show only dates with a total number of streams above 740.

SELECT date,
	 SUM(number_of_streams) AS total_number_of_streams
FROM movie_streaming
GROUP BY date, director
HAVING SUM(number_of_streams) > 740;

Data is again grouped by the columns date and director. The HAVING clause and the SUM() function are used to show only the total streams above 740. (SUM() adds up the number of daily streams; HAVING filters out streams that don’t meet the minimum amount.)

Running the query will return:

2022-04-02Coen brothers1,534
2022-04-02Yorgos Lanthimos747
2022-04-01Coen brothers1,277

You can find more details about these concepts in this article explaining how GROUP BY works.

3: How do you filter groups in an SQL query?

Groups in an SQL query are filtered using the HAVING clause. HAVING cannot be used without GROUP BY ; it’s always written after GROUP BY, and its purpose is to filter data resulting from an aggregate function – e.g. finding groups with the count higher than a specified number, etc.

It’s important to note that the WHERE clause is also used for filtering in a SQL query. While HAVING is used to filter groups, WHERE filters rows. In other words, WHERE filters data before it is aggregated.

4: What is the difference between WHERE and HAVING in SQL?

Both WHERE and HAVING are used to filter data in an SQL query. The main difference is that WHERE is used on non-aggregated values, while HAVING is used on the aggregated ones. The order of execution is as follows: WHEREGROUP BYHAVING. This also means WHERE is written before GROUP BY, while HAVING comes after GROUP BY.

In practice, WHERE filters data first. This filtered data will be grouped and aggregated, and then HAVING will filter the grouped and aggregated data.

For example, you would use WHERE to filter only users that have been employed for more than one year. Then you can calculate the average salary for employees with 1+ years’ experience by department. Finally, you can show only departments with an average salary above a specific value using the HAVING clause. That way, you filter rows before and after the aggregate function.  Check out this article for a detailed explanation of the differences between WHERE and HAVING.

Intermediate and Advanced GROUP BY Interview Questions

5: What functions can be used with GROUP BY?

The GROUP BY clause is usually used with SQL’s aggregate functions. These take values from  individual rows, perform a calculation, and return a single value. Some of the most used aggregate functions are:

  • SUM() – Adds up all the row values.
  • COUNT() – Counts the number of rows.
  • AVG() – Returns the average value.
  • MIN() – Returns the smallest value.
  • MAX() – Returns the largest value.

6: What is the difference between COUNT(*), COUNT(expression), and COUNT(DISTINCT expression)?

COUNT(*) returns the number of rows, including duplicate rows and rows with NULL values.

This is different from COUNT(expression), which returns only the number of rows with non-NULL values. It also counts rows with duplicate values.

If you add the DISTINCT keyword, any duplicate rows will be ignored, as will rows with NULL values. That way, you get the number of unique or distinct rows.

This article on the three uses of the COUNT() function explains the differences in greater detail.

7: What happens to NULLs when you use GROUP BY?

The GROUP BY clause doesn’t ignore NULL values. Therefore, if you use it and there are  NULL values in the column, all the rows with NULL column values will be treated as one group.

How does this work in practice? Here’s a more detailed explanation of GROUP BY and NULLs, along with practical examples.

8: What will this code return?

SELECT department,
 AVG(salary) AS average_salary
FROM salaries
GROUP BY department;

This code will throw an error because the column month is not written in the GROUP BY. The error occurs because all the columns appearing in the SELECT statement must appear in the GROUP BY. This “not a GROUP BY expression” error has to be fixed.

For the code to work, it should be written like this:

SELECT department,
 	 AVG(salary) AS average_salary
FROM salaries
GROUP BY department, month;

Now this query will output the average salary by month and by department.

However, all the columns appearing in the GROUP BY clause don’t need to appear in the SELECT statement. Therefore, the above code could look like this and still not return an error:

SELECT department,
 	 AVG(salary) AS average_salary
FROM salaries
GROUP BY department, month;

While this code will work and will return the correct average values, not showing the month information means it’ll be impossible to decipher which average salary relates to which month.

9: What are GROUP BY extensions? What GROUP BY extensions do you know?

GROUP BY extensions enhance the possibilities of the GROUP BY clause, allowing the creation of more complex reports by way of more sophisticated data grouping.

These extensions are ROLLUP, CUBE, and GROUPING SETS. The ROLLUP extension is most often used in calculating subtotals in hierarchical data. It will assume the hierarchical relationship between data, creating subtotals for every level and grand total.

The CUBE extension is similar, but it creates all the possible data combinations and returns their subtotals and a grand total.

The third extension is GROUPING SETS.  It allows writing the multiple GROUP BY clauses in one GROUP BY clause. This comes from a definition of a grouping set, which is a group of columns by which you want to group your data.

To see how these extensions work in practice, you should see what the GROUP BY Extensions course offers.

Get Ready for Your SQL Job Interview! 

This has been an overview of the most common SQL interview questions focused on the GROUP BY clause.

GROUP BY is an important concept. Without using it like it’s your second nature, you won’t be able to impress on the job interview. This article gave you a good feel of what you can expect..

But it’s not enough to just know about concepts. You also need practice, and practice means writing a lot of code using the GROUP BY. Our SQL Basics and Creating Basic SQL Reports courses will give you a structured opportunity to do exactly that!