Back to articles list Articles Cookbook
13 minutes read

7 Common GROUP BY Errors

Are you making these errors with GROUP BY in SQL? Find out what they are, how to avoid them, and how to fix them.

SQL’s GROUP BY statement can quickly uncover powerful data insights. At first, using GROUP BY can seem easy – e.g. when creating basic SQL reports that you’ll present to business decision-makers. But while learning this powerful feature, you could become trapped in weird errors or get incorrect results caused by improperly written GROUP BY statements. If you’re feeling like things aren’t quite adding up with your use of GROUP BY, keep reading. In this article, I will explain the most common GROUP BY mistakes and how you can avoid them.

Stop Making These 7 Common GROUP BY Mistakes

1. Forgetting GROUP BY with Aggregate Functions

You use SELECT statements with the GROUP BY clause when you want to group and organize rows into specific groups and then perform a specific calculation of each group.

The most common GROUP BY error is forgetting to write GROUP BY inside the SELECT statement.

Here is one example. Imagine that you have the table recipes, which contains 100 records and six columns. This table stores the number of views (no_of_views) per each recipe published on a famous culinary website:

meal_categorynameauthorno_of_viewsno_of_views_lst_mthauthor_id
Cold appetizerMarinated CheeseMarta107104906211
SoupsPumpkin soupJohn68856693772
dessertsBanana CheesecakeAlly131944NULL3
drinksPaloma PicanteLuke72027713124
Bread and pastrySour Cream DoughnutsJohn50935527912
dessertsReal Strawberry CupcakesLisa17626811693911
Soupspotato soupMary64796643886
..................
..................
..................
Bread and pastryCider DoughnutsTim53896511608

recipe table

Here is a short description of the table’s columns:

  • meal_category – The recipe category (soup, drinks, desserts, etc.).
  • name – The recipe’s name.
  • author – The author’s name.
  • no_of_views – The number of views (total pages/recipes viewed) in the current month.
  • no_of_views_lst_mth – The number of views (total pages/recipes viewed) in the previous month.
  • author_id – The author’s unique ID number.

Let's say that you want to count the number of recipes in each meal category. If you write the statement like this (without GROUP BY at the end) ...

SELECT 
  meal_category,
  count(*) AS total_recipes 
FROM recipes ;

... most SQL engines will give you an error. Some engines, however, will output weird, unwanted results. I’m using MySQL and when I run this statement, I get this:

meal_categorytotal_recipes
Cold appetizer100

Result without GROUP BY

100 is the total count of all the recipes in the whole data set and the meal category ‘Cold appetizer’ is just one category out of ten. To correct this type of error, you need to add a GROUP BY meal_category at the end of the statement. (Otherwise, your result in MySQL just doesn't make sense.)

The correct SELECT looks like this:

SELECT 
  meal_category,
  count(*) AS total_recipes 
FROM recipes
GROUP BY meal_category ;

Here is a short explanation of what’s going on:

  • Records are merged based on meal category. For example, desserts are one group, soups another, main dishes yet another, etc. The column meal_category is specified after GROUP BY; it is also listed in SELECT.
  • For each group, we are using COUNT(*) to count the total number of recipes in that group.

I'm not going to dive deeply into the syntax here, but I would definitely suggest you read GROUP BY in SQL Explained or Using GROUP BY in SQL for more details.

As you can see, the result is like we expected:

meal_categorytotal_recipes
Bread and pastry7
Cold appetizer6
desserts20
drinks7
Main dishes20
Salads8
Side dishes12
Soups17
Warm appetizer3

Valid GROUP BY result

2. Confusing WHERE and HAVING

Maybe you’d like to see only those meal categories that have more than 10 recipes. A  lot of beginners would write this query:

SELECT 
  meal_category,
  count(*) AS total_recipes 
FROM recipes
WHERE count(*) > 10 
GROUP BY meal_category ;

This statement will return an error because you cannot use aggregate functions in a WHERE clause. WHERE is used with GROUP BY when you want to filter rows before grouping them.

In our example, we want to filter rows after grouping; in cases like this, we need to use the HAVING clause:

SELECT 
  meal_category,
  count(*) AS total_recipes 
FROM recipes
           GROUP BY meal_category
HAVING count(*) > 10  ;

This misunderstanding about the difference between HAVING and WHERE is the second most common error with GROUP BY.

Let's clarify this difference with two more examples.

Example 1 -  How to Display Meal Categories with 1M+ Views 

A statement that displays only categories with more than 1 million total page views can be written like this:

SELECT 
  meal_category,
  sum(no_of_views) AS total 
FROM recipes 
GROUP BY meal_category
HAVING sum(no_of_views) >1000000;

Here we are using HAVING because we want to filter records after they have been grouped. The result is presented below:

meal_categorytotal
desserts2969324
Main dishes1323981
Side dishes1662910
Soups1100911

Example with HAVING

Example 2 – John’s Performance in Each Meal Category

This query extracts only John's recipes and calculates his performance:

SELECT 
  meal_category, 
  sum(no_of_views) AS total 
FROM recipes 
WHERE author = ‘John’ 
GROUP BY meal_category;

We’re using WHERE because we need to filter records (so we only get John’s data) before we put the records into groups by meal category. Here is how the result looks:

meal_categorytotal
Bread and pastry50935
desserts301869
drinks147745
Main dishes279934
Salads88097
Side dishes415864
Soups393253
Warm appetizer85570

John’s KPIs

HAVING and WHERE are nicely described in our articles HAVING vs. WHERE in SQL: What You Should Know and 5 Examples of GROUP BY. If you would like to see more examples on this topic, I suggest starting there.

3. Listing a Column Inside SELECT but Not in GROUP BY

Now suppose you want to see the total number of views per meal_category and author. We can do that? – ?we only need to add the author column to our previous query:

SELECT 
  meal_category,
  author,
  sum(no_of_views) AS total 
FROM recipes 
GROUP BY meal_category;

Does this look okay to you?  Of course not; it will throw an error in most SQL engines. For example, Oracle will tell you error: Not a GROUP BY expression. Why this confusing error? What is missing here?

Well, the SQL engine doesn't know how to calculate the total for each author because we didn't include it in the GROUP BY clause; the attribute author is not listed inside the GROUP BY clause. This is another common error with GROUP BY.

Let's fix this query and run it one more time:

SELECT 
  meal_category,
  author,
  sum(no_of_views) AS total 
FROM recipes 
GROUP BY meal_category, author;

The result is:

meal_categoryauthortotal
Bread and pastryDino53789
Bread and pastryJohn50935
Bread and pastryMarta52998
Bread and pastryMary52904
Bread and pastryPatricia51451
Bread and pastryTim106226
.........
.........
.........
SoupsMary125731
SoupsMonte128356
SoupsPatricia255574
SoupsTim132532
Warm appetizerJohn85570
Warm appetizerLisa82960
Warm appetizerMary87560

Now this looks okay. Remember, unaggregated columns that are listed in SELECT must also be listed in GROUP BY.  In our case, the unaggregated columns are meal_category and author, which are now in SELECT and GROUP BY.

You don't list columns that are inside aggregate functions in GROUP BY. In our example, the column no_of_views is used in the aggregate function SUM() and thus is not listed in the GROUP BY clause.

Our article How to Fix a ‘Not a GROUP BY Expression’ Error gives more examples related to this type of error.

4. Not Grouping by a Unique Key

Now let’s try something else. Suppose we want to get the average number of page views for each recipe author. The following query calculates the average total number of page views for each author using the author name:

SELECT 
  author,
  avg(no_of_views) 
FROM recipes 
GROUP BY author;

When you look at the result, you will notice that Lisa averages 116101.5 page views:

authoravg(NO_OF_VIEWS)
Ally106545
Dino94667.9091
John88163.35
Lisa116101.5
Luke104591
Marta119789.1667
Mary101040.0588
Monte84794
Patricia81911.1333
Tim76185.375

GROUP BY author – but names are not unique

However, we actually have two authors named Lisa in our table. When we group the results by the author column, both Lisas are averaged together. Why? Because we are using a non-unique column in the GROUP BY. This means that not all the grouping values have to be unique. If we want to see each Lisa’s average separately, we should add author_id (a unique column) to the GROUP BY list:

SELECT 
  author, author_id
  avg(no_of_views) 
FROM recipes 
GROUP BY author, author_id;

Now we see how recipes from Lisa (id=11) are much more viewed than recipes by Lisa (id=5):

authorauthor_idavg(no_of_views)
Ally3106545
Dino794667.9091
John288163.35
Lisa585798
Lisa11146405
Luke4104591
Marta1119789.1667
Mary6101040.0588
Monte984794
Patricia1081911.1333
Tim876185.375

GROUP BY with author and author_id

It is important to always think about grouping keys. Grouping values should be unique and must represent each group in the desired way. Otherwise, you’ll get inaccurate, confusing results and possibly a GROUP BY error.

5. Confusing COUNT(distinct) and COUNT(*)

If you’re curious to see the total number of authors for each meal category, you can write a GROUP BY statement to calculate that. Let's use COUNT(*) and retrieve the number of authors in each category:

SELECT 
  meal_category, 
  count(*) 
FROM recipes 
GROUP BY meal_category;

Here’s the result – but it’s not what you expected, is it?

meal_categorycount(*)
Bread and pastry7
Cold appetizer6
desserts20
drinks7
Main dishes20
Salads8
Side dishes12
Soups17
Warm appetizer3

This is the total number of recipes in each category, not the total number of authors. Why is that? Well, COUNT(*) counts all the rows in each group. The table recipe contains information on a recipe level -  each record is one recipe. This query counts the recipes (rows) in each category, not the recipe authors.

One author can have many recipes in each category, so to get the information you want, you must count distinct authors (using COUNT(distinct author) instead of COUNT(*))  inside each group. This is a very common GROUP BY error.

So, when should you use COUNT(*), COUNT(expression) and COUNT(distinct expression)?

Let's take a look at an example:

SELECT 
  meal_category, 
  count(distinct author), 
  count(author),
  count(*) 
FROM recipes 
GROUP BY meal_category;
meal_categorycount(distinct author)count(author)count(*)
Bread and pastry677
Cold appetizer266
desserts82020
drinks577
Main dishes92020
Salads688
Side dishes81212
Soups61717
Warm appetizer333

The difference between COUNT(*) and COUNT(expression) is visible if we are doing calculations on a column that has some missing values. When missing values are present,  COUNT(*) will count all the records in a group and COUNT(expression) will count only non-null values.

In the above example, COUNT(*) and COUNT(author) give the exact same result because the author column doesn’t have any NULL values. 

COUNT(distinct author) gives us the number of distinct authors for each category, which is not the same as COUNT(*). For example, the cold appetizer meal category contains six recipes from two distinct authors. COUNT(*) counts the number of recipes (records) in each category, while COUNT(distinct author) counts the number of distinct authors.

So, if you would like to display the total number of distinct authors per each meal category, use COUNT(distinct author). Here is the correct query:

SELECT 
  meal_category, 
  count(distinct author)
FROM recipes 
GROUP BY meal_category;
GROUP BY meal_category;

For a more detailed explanation, see What is the Difference Between COUNT(*), COUNT(1), COUNT(column name), and COUNT(DISTINCT column name)?

6. Problems Using Aggregate Functions With NULLs

This is another ‘missing value’ problem. Let's say that you want to calculate the average total number of views from the previous month for each category. Your colleague calculated those figures, but they’d like you to double check the result.

Here is your query:

SELECT
      meal_category,
      avg(no_of_views_lst_mth) as average,
  FROM recipes 
GROUP BY meal_category;

And what you get is ...

meal_categoryaverage
Bread and pastry52274.8571
Cold appetizer95584.2
desserts144349.7222
drinks72551.7143
Main dishes61350.8889
Salads90798.875
Side dishes139765.25
Soups64978.8824
Warm appetizer78390.6667

The result looks okay and you are confident when it comes to the correctness of your query. However, your colleague got slightly different figures:

meal_categoryaverage
Bread and pastry52274.8571
Cold appetizer79653.5
desserts129914.75
drinks72551.7143
Main dishes55215.8
Salads90798.875
Side dishes139765.25
Soups64978.8824
Warm appetizer78390.6667

What just happened? Why the different results?

 In a nutshell, the differing results arise from different interpretations of missing values.

The column no_of_views_lst_mth represents the number of total page views in the previous month. If a recipe was created in the current month, this column will be NULL for that row.

For example, Ally’s Banana Cheesecake recipe was written in the current month, so there are no statistics for the previous month:

meal_categorynameauthorno_of_viewsno_of_views_lst_mthauthor_id
dessertsBanana CheesecakeAlly131944NULL3

Banana cheesecake was published in current month

Now, let's get back to those averages and their different results. Averages are calculated as the total sum of no_of_views_lst_mth divided by the total number of records. If you use the AVG() function and NULLs are present, the engine just ignores the NULLs and does calculations without them. This is what happened when you ran your query – the NULLs were omitted. In some cases, you’ll want to replace NULLs with 0 (because business logic dictates); this is what your colleague did, which produced slightly different figures. Here is your colleague’s query:

SELECT
      meal_category,
           avg(CASE WHEN no_of_views_lst_mth is null 
            THEN 0 
            ELSE no_of_views_lst_mth END) AS average
FROM recipes 
GROUP BY meal_category;

Notice how the averages from these two queries treat NULLs differently. For example, the ‘desserts’ category contains NULLs. Thus, the first query omits these rows and does not count them towards the total number of rows; this gives the value 144349.72. The second query replaces all NULLs with zero and counts these rows in the average, giving a smaller value  of 129914.75.

I would say that both queries could be valid, depending on how you want to calculate averages.

7. Using COUNT(*) with GROUP BY and a LEFT JOIN

Using GROUP BY with a LEFT JOIN statement can be quite confusing – especially with COUNT(). Let's see how COUNT(*) and COUNT(expression) function in a LEFT JOIN.

Let’s suppose that someone in marketing has the following table, recipes_campaigns. It contains information about the number of campaigns run on each meal category in the current month:

meal_categorycampaigns
Bread and pastry2
Cold appetizer1
desserts3
drinks0
Main dishes3
Salads1
Side dishes2
Soups3
Warm appetizer0
brunch1
sandwiches0

recipes_campaign

In addition to the data in recipes_campaigns, the marketer also wants to see the number of recipes for each meal category. For that, we’ll need information from the recipes table. So let’s left join these two tables and calculate the number of recipes using COUNT(*), like so:

SELECT 
      a.meal_category,
      count(*),
     FROM recipes_campaigns a 
LEFT JOIN recipes b ON a.meal_category=b.meal_category
GROUP BY a.meal_category;

Here’s the result:

meal_categorycount(*)
Bread and pastry7
brunch1
Cold appetizer6
desserts20
drinks7
Main dishes20
Salads8
sandwiches1
Side dishes12
Soups17
Warm appetizer3

This isn’t what we expected. The table recipe does not contain any recipes in the ‘brunch’ category, so why then did we get that 1 in the result? This happens because COUNT() is applied to the LEFT JOIN result! When you LEFT JOIN two tables, the ‘brunch’ category will be present in the output – even if there are no matching recipes or categories in the recipe table.

How can we fix this? If we use COUNT(expression) instead of COUNT(*), we’ll get the result we want:

SELECT 
      a.meal_category,
      count(author_id),
FROM recipes_campaigns a 
LEFT JOIN recipes b ON a.meal_category=b.meal_category
GROUP BY a.meal_category;

Once you run this, you get:

meal_categorycount(author_id)
Bread and pastry7
brunch0
Cold appetizer6
desserts20
drinks7
Main dishes20
Salads8
sandwiches0
Side dishes12
Soups17
Warm appetizer3

Here, COUNT(author_id) counts only the non-NULL values in author_id after the LEFT JOIN is performed. There is no author_id value for the ‘brunch’ category; in other words, it’s NULL and the result for that category is 0.

You Can Solve GROUP BY Errors!

Through several examples, we’ve explored GROUP BY and the most common errors that beginners often make. I hope that now you have a better sense of how GROUP BY works and what’s causing those weird errors or confusing results.

GROUP BY is really very important in report creation. If you want to learn how to construct good reports, I recommend our Creating Basic SQL Reports course. It has lots of interactive exercises that let you gain experience writing GROUP BY queries for reports. And more experience certainly reduces the possibility of mistakes!