*Are you making these errors with GROUP BY in SQL? Find out what they are, how to avoid them, and how to fix them.*

SQL’s `GROUP BY`

statement can quickly uncover powerful data insights. At first, using `GROUP BY`

can seem easy – e.g. when creating basic SQL reports that you’ll present to business decision-makers. But while learning this powerful feature, you could become trapped in weird errors or get incorrect results caused by improperly written `GROUP BY`

statements. If you’re feeling like things aren’t quite adding up with your use of GROUP BY, keep reading. In this article, I will explain the most common `GROUP BY`

mistakes and how you can avoid them.

## Stop Making These 7 Common GROUP BY Mistakes

### 1. Forgetting GROUP BY with Aggregate Functions

You use `SELECT`

statements with the `GROUP BY`

clause when you want to group and organize rows into specific groups and then perform a specific calculation of each group.

The most common `GROUP BY`

error is forgetting to write `GROUP BY`

inside the SELECT statement.

Here is one example. Imagine that you have the table **recipes***,* which contains 100 records and six columns. This table stores the number of views (`no_of_views`

) per each recipe published on a famous culinary website:

meal_category | name | author | no_of_views | no_of_views_lst_mth | author_id |
---|---|---|---|---|---|

Cold appetizer | Marinated Cheese | Marta | 107104 | 90621 | 1 |

Soups | Pumpkin soup | John | 68856 | 69377 | 2 |

desserts | Banana Cheesecake | Ally | 131944 | NULL | 3 |

drinks | Paloma Picante | Luke | 72027 | 71312 | 4 |

Bread and pastry | Sour Cream Doughnuts | John | 50935 | 52791 | 2 |

desserts | Real Strawberry Cupcakes | Lisa | 176268 | 116939 | 11 |

Soups | potato soup | Mary | 64796 | 64388 | 6 |

... | ... | ... | ... | ... | ... |

... | ... | ... | ... | ... | ... |

... | ... | ... | ... | ... | ... |

Bread and pastry | Cider Doughnuts | Tim | 53896 | 51160 | 8 |

*recipe table*

Here is a short description of the table’s columns:

`meal_category`

– The recipe category (soup, drinks, desserts, etc.).`name`

– The recipe’s name.`author`

– The author’s name.`no_of_views`

– The number of views (total pages/recipes viewed) in the current month.`no_of_views_lst_mth`

– The number of views (total pages/recipes viewed) in the previous month.`author_id`

– The author’s unique ID number.

Let's say that you want to count the number of recipes in each meal category. If you write the statement like this (without `GROUP BY`

at the end) ...

SELECT meal_category, count(*) AS total_recipes FROM recipes ;

... most SQL engines will give you an error. Some engines, however, will output weird, unwanted results. I’m using MySQL and when I run this statement, I get this:

meal_category | total_recipes |
---|---|

Cold appetizer | 100 |

*Result without GROUP BY*

100 is the total count of all the recipes in the whole data set and the meal category ‘Cold appetizer’ is just one category out of ten. To correct this type of error, you need to add a `GROUP BY meal_category`

at the end of the statement. (Otherwise, your result in MySQL just doesn't make sense.)

The correct SELECT looks like this:

SELECT meal_category, count(*) AS total_recipes FROM recipes GROUP BY meal_category ;

Here is a short explanation of what’s going on:

- Records are merged based on meal category. For example, desserts are one group, soups another, main dishes yet another, etc. The column
`meal_category`

is specified after`GROUP BY`

; it is also listed in`SELECT`

. - For each group, we are using
`COUNT(*)`

to count the total number of recipes in that group.

I'm not going to dive deeply into the syntax here, but I would definitely suggest you read GROUP BY in SQL Explained or Using GROUP BY in SQL for more details.

As you can see, the result is like we expected:

meal_category | total_recipes |
---|---|

Bread and pastry | 7 |

Cold appetizer | 6 |

desserts | 20 |

drinks | 7 |

Main dishes | 20 |

Salads | 8 |

Side dishes | 12 |

Soups | 17 |

Warm appetizer | 3 |

*Valid GROUP BY result*

### 2. Confusing WHERE and HAVING

Maybe you’d like to see only those meal categories that have more than 10 recipes. A lot of beginners would write this query:

SELECT meal_category, count(*) AS total_recipes FROM recipes WHERE count(*) > 10 GROUP BY meal_category ;

This statement will return an error because you cannot use aggregate functions in a `WHERE`

clause. `WHERE`

is used with `GROUP BY`

when you want to **filter rows before grouping them**.

In our example, we want to filter rows *after grouping*; in cases like this, we need to use the `HAVING`

clause:

SELECT meal_category, count(*) AS total_recipes FROM recipes GROUP BY meal_category HAVING count(*) > 10 ;

This misunderstanding about the difference between `HAVING`

and `WHERE`

is the second most common error with `GROUP BY`

.

Let's clarify this difference with two more examples.

#### Example 1 - How to Display Meal Categories with 1M+ Views

A statement that displays only categories with more than 1 million total page views can be written like this:

SELECT meal_category, sum(no_of_views) AS total FROM recipes GROUP BY meal_category HAVING sum(no_of_views) >1000000;

Here we are using `HAVING`

because we want to filter records after they have been grouped. The result is presented below:

meal_category | total |
---|---|

desserts | 2969324 |

Main dishes | 1323981 |

Side dishes | 1662910 |

Soups | 1100911 |

*Example with HAVING *

#### Example 2 – John’s Performance in Each Meal Category

This query extracts only John's recipes and calculates his performance:

SELECT meal_category, sum(no_of_views) AS total FROM recipes WHERE author = ‘John’ GROUP BY meal_category;

We’re using WHERE because we need to filter records (so we only get John’s data) before we put the records into groups by meal category. Here is how the result looks:

meal_category | total |
---|---|

Bread and pastry | 50935 |

desserts | 301869 |

drinks | 147745 |

Main dishes | 279934 |

Salads | 88097 |

Side dishes | 415864 |

Soups | 393253 |

Warm appetizer | 85570 |

*John’s KPIs*

`HAVING`

and `WHERE`

are nicely described in our articles What Is the Difference Between WHERE and HAVING Clauses in SQL? and 5 Examples of GROUP BY. If you would like to see more examples on this topic, I suggest starting there.

### 3. Listing a Column Inside SELECT but Not in GROUP BY

Now suppose you want to see the total number of views per `meal_category`

and `author`

. We can do that? – ?we only need to add the author column to our previous query:

SELECT meal_category, author, sum(no_of_views) AS total FROM recipes GROUP BY meal_category;

Does this look okay to you? Of course not; it will throw an error in most SQL engines. For example, Oracle will tell you “**error**: Not a GROUP BY expression”. Why this confusing error? What is missing here?

Well, the SQL engine doesn't know how to calculate the total for each author because we didn't include it in the GROUP BY clause; the attribute author is not listed inside the GROUP BY clause. This is another common error with GROUP BY.

Let's fix this query and run it one more time:

SELECT meal_category, author, sum(no_of_views) AS total FROM recipes GROUP BY meal_category, author;

The result is:

meal_category | author | total |
---|---|---|

Bread and pastry | Dino | 53789 |

Bread and pastry | John | 50935 |

Bread and pastry | Marta | 52998 |

Bread and pastry | Mary | 52904 |

Bread and pastry | Patricia | 51451 |

Bread and pastry | Tim | 106226 |

... | ... | ... |

... | ... | ... |

... | ... | ... |

Soups | Mary | 125731 |

Soups | Monte | 128356 |

Soups | Patricia | 255574 |

Soups | Tim | 132532 |

Warm appetizer | John | 85570 |

Warm appetizer | Lisa | 82960 |

Warm appetizer | Mary | 87560 |

Now this looks okay. Remember, **unaggregated columns that are listed in **`SELECT`

must also be listed in ** GROUP BY**. In our case, the unaggregated columns are

`meal_category`

and `author`

, which are now in `SELECT`

and `GROUP BY`

.You don't list columns that are inside aggregate functions in `GROUP BY`

. In our example, the column `no_of_views`

is used in the aggregate function `SUM()`

and thus is not listed in the `GROUP BY`

clause.

If you would like to learn more about this topic, check out our article GROUP BY Clause: How Well Do You Know It?. It explains why SELECTed columns need to appear in the GROUP BY clause. Also, How to Fix a ‘Not a GROUP BY Expression’ Error gives more examples related to this type of error.

### 4. Not Grouping by a Unique Key

Now let’s try something else. Suppose we want to get the average number of page views for each recipe author. The following query calculates the average total number of page views for each author using the author name:

SELECT author, avg(no_of_views) FROM recipes GROUP BY author;

When you look at the result, you will notice that Lisa averages 116101.5 page views:

author | avg(NO_OF_VIEWS) |
---|---|

Ally | 106545 |

Dino | 94667.9091 |

John | 88163.35 |

Lisa | 116101.5 |

Luke | 104591 |

Marta | 119789.1667 |

Mary | 101040.0588 |

Monte | 84794 |

Patricia | 81911.1333 |

Tim | 76185.375 |

*GROUP BY author – but names are not unique*

However, we actually have two authors named Lisa in our table. When we group the results by the author column, both Lisas are averaged together. Why? Because we are using a non-unique column in the `GROUP BY`

. This means that not all the grouping values have to be unique. If we want to see each Lisa’s average separately, we should add `author_id`

(a unique column) to the `GROUP BY`

list:

SELECT author, author_id avg(no_of_views) FROM recipes GROUP BY author, author_id;

Now we see how recipes from Lisa (*id=11*) are much more viewed than recipes by Lisa (*id=5*):

author | author_id | avg(no_of_views) |
---|---|---|

Ally | 3 | 106545 |

Dino | 7 | 94667.9091 |

John | 2 | 88163.35 |

Lisa | 5 | 85798 |

Lisa | 11 | 146405 |

Luke | 4 | 104591 |

Marta | 1 | 119789.1667 |

Mary | 6 | 101040.0588 |

Monte | 9 | 84794 |

Patricia | 10 | 81911.1333 |

Tim | 8 | 76185.375 |

*GROUP BY with author and author_id*

It is important to always think about grouping keys. Grouping values should be unique and must represent each group in the desired way. Otherwise, you’ll get inaccurate, confusing results and possibly a GROUP BY error.

### 5. Confusing COUNT(distinct) and COUNT(*)

If you’re curious to see the total number of authors for each meal category, you can write a `GROUP BY`

statement to calculate that. Let's use `COUNT(*)`

and retrieve the number of authors in each category:

SELECT meal_category, count(*) FROM recipes GROUP BY meal_category;

Here’s the result – but it’s not what you expected, is it?

meal_category | count(*) |
---|---|

Bread and pastry | 7 |

Cold appetizer | 6 |

desserts | 20 |

drinks | 7 |

Main dishes | 20 |

Salads | 8 |

Side dishes | 12 |

Soups | 17 |

Warm appetizer | 3 |

This is the total number of recipes in each category, **not the total number of authors**. Why is that? Well, `COUNT(*)`

counts all the rows in each group. The table **recipe **contains information on a

level - each record is one recipe. This query counts the recipes (rows) in each category, not the recipe authors.**recipe**

One author can have many recipes in each category, so to get the information you want, you must **count distinct authors** (using `COUNT(distinct author)`

instead of `COUNT(*)`

) inside each group. This is a very common `GROUP BY`

error.

So, when should you use `COUNT(*)`

, `COUNT(expression)`

and `COUNT(distinct expression)`

?

Let's take a look at an example:

SELECT meal_category, count(distinct author), count(author), count(*) FROM recipes GROUP BY meal_category;

meal_category | count(distinct author) | count(author) | count(*) |
---|---|---|---|

Bread and pastry | 6 | 7 | 7 |

Cold appetizer | 2 | 6 | 6 |

desserts | 8 | 20 | 20 |

drinks | 5 | 7 | 7 |

Main dishes | 9 | 20 | 20 |

Salads | 6 | 8 | 8 |

Side dishes | 8 | 12 | 12 |

Soups | 6 | 17 | 17 |

Warm appetizer | 3 | 3 | 3 |

The difference between `COUNT(*)`

and `COUNT(expression)`

is visible if we are doing calculations on a column that has some missing values. When missing values are present, `COUNT(*)`

will count all the records in a group and `COUNT(expression)`

will count only non-null values.

In the above example, `COUNT(*)`

and `COUNT(author)`

give the exact same result because the author column doesn’t have any NULL values.

`COUNT(distinct author)`

gives us the number of distinct authors for each category, which is not the same as `COUNT(*)`

. For example, the cold appetizer meal category contains six recipes from two distinct authors. `COUNT(*)`

counts the number of recipes (records) in each category, while `COUNT(distinct author)`

counts the number of distinct authors.

So, if you would like to display the total number of distinct authors per each meal category, use `COUNT(distinct author)`

. Here is the correct query:

SELECT meal_category, count(distinct author) FROM recipes GROUP BY meal_category; GROUP BY meal_category;

For a more detailed explanation, see What is the Difference Between COUNT(*), COUNT(1), COUNT(column name), and COUNT(DISTINCT column name)?

### 6. Problems Using Aggregate Functions With NULLs

This is another ‘missing value’ problem. Let's say that you want to calculate the average total number of views from the previous month for each category. Your colleague calculated those figures, but they’d like you to double check the result.

Here is your query:

SELECT meal_category, avg(no_of_views_lst_mth) as average, FROM recipes GROUP BY meal_category;

And what you get is ...

meal_category | average |
---|---|

Bread and pastry | 52274.8571 |

Cold appetizer | 95584.2 |

desserts | 144349.7222 |

drinks | 72551.7143 |

Main dishes | 61350.8889 |

Salads | 90798.875 |

Side dishes | 139765.25 |

Soups | 64978.8824 |

Warm appetizer | 78390.6667 |

The result looks okay and you are confident when it comes to the correctness of your query. However, your colleague got slightly different figures:

meal_category | average |
---|---|

Bread and pastry | 52274.8571 |

Cold appetizer | 79653.5 |

desserts | 129914.75 |

drinks | 72551.7143 |

Main dishes | 55215.8 |

Salads | 90798.875 |

Side dishes | 139765.25 |

Soups | 64978.8824 |

Warm appetizer | 78390.6667 |

What just happened? Why the different results?

In a nutshell, the differing results arise from different interpretations of missing values.

The column `no_of_views_lst_mth`

represents the number of total page views in the previous month. If a recipe was created in the current month, this column will be NULL for that row.

For example, Ally’s Banana Cheesecake recipe was written in the current month, so there are no statistics for the previous month:

meal_category | name | author | no_of_views | no_of_views_lst_mth | author_id |
---|---|---|---|---|---|

desserts | Banana Cheesecake | Ally | 131944 | NULL | 3 |

*Banana cheesecake was published in current month*

Now, let's get back to those averages and their different results. Averages are calculated as the total sum of `no_of_views_lst_mth`

divided by the total number of records. If you use the AVG() function and NULLs are present, the engine just ignores the NULLs and does calculations without them. This is what happened when you ran your query – the NULLs were omitted. In some cases, you’ll want to replace NULLs with 0 (because business logic dictates); this is what your colleague did, which produced slightly different figures. Here is your colleague’s query:

SELECT meal_category, avg(CASE WHEN no_of_views_lst_mth is null THEN 0 ELSE no_of_views_lst_mth END) AS average FROM recipes GROUP BY meal_category;

Notice how the averages from these two queries treat NULLs differently. For example, the ‘desserts’ category contains NULLs. Thus, the first query omits these rows and does not count them towards the total number of rows; this gives the value 144349.72. The second query replaces all NULLs with zero and counts these rows in the average, giving a smaller value of 129914.75.

I would say that both queries could be valid, depending on how you want to calculate averages.

## 7. Using COUNT(*) with GROUP BY and a LEFT JOIN

Using `GROUP BY`

with a `LEFT JOIN`

statement can be quite confusing – especially with `COUNT()`

. Let's see how `COUNT(*)`

and `COUNT(expression)`

function in a `LEFT JOIN`

.

Let’s suppose that someone in marketing has the following table, ** recipes_campaigns**. It contains information about the number of campaigns run on each meal category in the current month:

meal_category | campaigns |
---|---|

Bread and pastry | 2 |

Cold appetizer | 1 |

desserts | 3 |

drinks | 0 |

Main dishes | 3 |

Salads | 1 |

Side dishes | 2 |

Soups | 3 |

Warm appetizer | 0 |

brunch | 1 |

sandwiches | 0 |

*recipes_campaign*

In addition to the data in

, the marketer also wants to see the number of recipes for each meal category. For that, we’ll need information from the **recipes_campaigns**

table. So let’s left join these two tables and calculate the number of recipes using **recipes**`COUNT(*)`

, like so:

SELECT a.meal_category, count(*), FROM recipes_campaigns a LEFT JOIN recipes b ON a.meal_category=b.meal_category GROUP BY a.meal_category;

Here’s the result:

meal_category | count(*) |
---|---|

Bread and pastry | 7 |

brunch | 1 |

Cold appetizer | 6 |

desserts | 20 |

drinks | 7 |

Main dishes | 20 |

Salads | 8 |

sandwiches | 1 |

Side dishes | 12 |

Soups | 17 |

Warm appetizer | 3 |

This isn’t what we expected. The table

does not contain any recipes in the ‘brunch’ category, so why then did we get that 1 in the result? This happens because **recipe**`COUNT()`

is applied to the `LEFT JOIN`

result! When you `LEFT JOIN`

two tables, the ‘brunch’ category will be present in the output – even if there are no matching recipes or categories in the

table. **recipe**

How can we fix this? If we use `COUNT(expression)`

instead of `COUNT(*)`

, we’ll get the result we want:

SELECT a.meal_category, count(author_id), FROM recipes_campaigns a LEFT JOIN recipes b ON a.meal_category=b.meal_category GROUP BY a.meal_category;

Once you run this, you get:

meal_category | count(author_id) |
---|---|

Bread and pastry | 7 |

brunch | 0 |

Cold appetizer | 6 |

desserts | 20 |

drinks | 7 |

Main dishes | 20 |

Salads | 8 |

sandwiches | 0 |

Side dishes | 12 |

Soups | 17 |

Warm appetizer | 3 |

Here, `COUNT(author_id)`

counts only the non-NULL values in `author_id`

**after** the `LEFT JOIN`

is performed. There is no `author_id`

value for the ‘brunch’ category; in other words, it’s NULL and the result for that category is 0.

## You Can Solve GROUP BY Errors!

Through several examples, we’ve explored GROUP BY and the most common errors that beginners often make. I hope that now you have a better sense of how `GROUP BY`

works and what’s causing those weird errors or confusing results.

`GROUP BY`

is really very important in report creation. If you want to learn how to construct good reports, I recommend our Creating Basic SQL Reports course. It has lots of interactive exercises that let you gain experience writing `GROUP BY`

queries for reports. And more experience certainly reduces the possibility of mistakes!