28th Mar 2023 7 minutes read

How to Solve the “must appear in the GROUP BY clause or be used in an aggregate function” Error in PostgreSQL

Table of Contents

What Causes the “must appear in the GROUP BY clause or be used in an aggregate function” Error?
Bonus: PostgreSQL Optimizer, or Why This Error Doesn’t Always Appear
We About Learned Fixing Errors. Now It’s Time to Avoid Them.

Learn what causes one of the most common PostgreSQL GROUP BY errors and how you can fix it!

As a PostgreSQL user, you surely encounter many error messages when writing an SQL code. Sometimes they are quite unclear, but you won’t need an interpreter for this one: “must appear in the GROUP BY clause or be used in an aggregate function”.

This error message mentions GROUP BY and aggregate functions. These are fundamental PostgreSQL concepts you need to master to fix this error. Of course, fixing it also means not making this error too often when you move to advanced concepts, such as creating reports; using window functions, CTEs, and recursion; writing your own functions, or working with maps. All this is covered in our SQL from A to Z in PostgreSQL learning path. This comprehensive interactive course will make it easier for you to avoid PostgreSQL error messages. And when they appear, you’ll be able to solve them quickly.

As you’ll soon see, solving the error we mentioned in the article’s title is also relatively easy. It does, however, require an understanding of how GROUP BY in PostgreSQL works.

What Causes the “must appear in the GROUP BY clause or be used in an aggregate function” Error?

As always, we’ll be very practical and use SQL code to show you what causes the error and how to fix it.

Dataset

The dataset we’ll use consists of two tables. The first is freelancers:

id – The freelancer’s ID and the table’s primary key (PK).
first_name – The freelancer’s first name.
last_name – The freelancer’s last name.
email – The freelancer’s email.
country – The freelancer’s country.

id	first_name	last_name	email	country
1	Pete	Thompson	pthompson@gmail.com	UK
2	Nadine	Lopez	nlopez@gmail.com	Italy
3	Claudio	Stratos	cstratos@gmail.com	Italy
4	Miriam	Valetti	mvaletti@gmail.com	Italy

The CREATE TABLE query is available here.

The second table is named weekly_salary and contains data on how much freelancers’ are paid each week. The columns are:

id – The salary’s ID and the table’s primary key (PK).
freelancers_id – The freelancer’s ID and the table’s foreign key (FK) from the table freelancers.
week_start – The starting date for the salary calculation.
week_end – The ending date for the salary calculation.
paid_amount – The salary amount.

id	freelancers_id	week_start	week_end	paid_amount
1	1	2023-01-02	2023-01-08	623.56
2	1	2023-01-09	2023-01-15	987.41
3	1	2023-01-16	2023-01-22	874.54
4	1	2023-01-23	2023-01-29	354.78
5	1	2023-01-30	2023-02-05	478.65
6	2	2023-01-30	2023-02-05	1,457.17
7	3	2023-01-30	2023-02-05	1,105.94
8	1	2023-02-06	2023-02-12	3,418.95
9	2	2023-02-06	2023-02-12	1,547.98
10	3	2023-02-06	2023-02-12	1,549.36
11	4	2023-02-06	2023-02-12	415.78

Here’s the query for creating the table.

Throwing the Error

Let’s attempt to show the freelancers’ first and last names and the number of weekly salaries they have received so far:

SELECT
  first_name, 
  last_name,
  COUNT(freelancers_id) AS number_of_payments
FROM freelancers f
LEFT JOIN weekly_salary ws
ON f.id = ws.freelancers_id;

We LEFT JOIN the two tables so we can get all the required data.

That data is the first and the last name. Then we use the COUNT() aggregate function on the column freelancers_id to count how many times this ID appears. The counted number is equal to the number of salaries the freelancer received.

So the salaries count for each freelancer is…an error!

: must appear in the group by clause or be used in an aggregate function

The wording of this PostgreSQL message is quite clear, and it reflects the general rule: the columns listed in SELECT should appear in GROUP BY. If they don’t appear in GROUP BY, then they have to be used in the aggregate function.

As you can see, our query doesn’t have GROUP BY at all. We’re using the COUNT() aggregate function and we should define the groups for aggregation, but we didn’t do that.

Fixing the “must appear in the GROUP BY clause or be used in an aggregate function” Error

The usual approach to fixing this error is to simply write all the columns from SELECT in the GROUP BY clause. This excludes the columns that are the result of the aggregate function.

In our case, listing the columns first_name and last_name in GROUP BY will fix the error.

SELECT 
  first_name, 
  last_name,
  COUNT(freelancers_id) AS number_of_payments
FROM freelancers f
LEFT JOIN weekly_salary ws
ON f.id = ws.freelancers_id
GROUP BY first_name, last_name;

The only change from the previous query is that now we use GROUP BY. In it, we write all the columns from SELECT except the one used in the aggregate function. Aggregate functions are not allowed in GROUP BY – this would show a whole other Postgres error message.

first_name	last_name	number_of_payments
Miriam	Valetti	1
Claudio	Stratos	2
Nadine	Lopez	2
Pete	Thompson	6

The query returns the above output. It’s obvious we really did fix the error. This output shows Miriam Valetti got paid once, Claudio Stratos twice, and so on.

Bonus: PostgreSQL Optimizer, or Why This Error Doesn’t Always Appear

PostgreSQL uses an optimizer. It tries to “think” and do things you meant but maybe didn’t write explicitly.

The error we discussed won’t appear in PostgreSQL if you group by the primary key.

Take a look at this query:

SELECT 
  f.id,
  first_name, 
  last_name,
  SUM(paid_amount) AS total_paid_amount
FROM freelancers f
LEFT JOIN weekly_salary ws
ON f.id = ws.freelancers_id
WHERE country = 'Italy'
GROUP BY f.id;

It attempts to return the freelancers’ ID, first name, last name, and the total salary amount they’ve been paid so far (the SUM() function). The two tables are LEFT JOINed and the data is filtered to show only freelancers from Italy.

What seems strange is the GROUP BY. Didn’t we just say that all the columns in SELECT must also appear in GROUP BY?

This code should return an error. Let’s run it and see:

id	first_name	last_name	total_paid_amount
2	Nadine	Lopez	3,005.15
3	Claudio	Stratos	2,655.30
4	Miriam	Valetti	415.78

Surprisingly, the query doesn’t return an error! Why is that?

The PostgreSQL optimizer allows you to group by primary key (PK) and have the non-PK columns in SELECT. PostgreSQL understands that the PK columns determine the values of other columns in the same table.

In our case, column f.id is the PK. Having it in GROUP BY is enough; no need for including first_name and last_name (the non-PK columns) in GROUP BY.

However, you shouldn’t rely on that – the optimizer is not that smart! For instance, it can’t identify PKs for subqueries, CTEs, and views.

Let’s rewrite the above query with a subquery. The table you use might be much bigger, so you might want to filter only the freelancers from Italy immediately. You’ll do this in a subquery:

SELECT 
  ifr.id,
  first_name, 
  last_name,
  SUM(paid_amount) AS total_paid_amount
FROM (SELECT *
	FROM freelancers
	WHERE country = 'Italy') AS ifr 
LEFT JOIN weekly_salary ws
ON ifr.id = ws.freelancers_id
GROUP BY ifr.id;

The SELECT part is the same as earlier. But instead of using the whole freelancers table and joining it with weekly_salary, we write the ifr (as in ‘italy_freelancers’) subquery.

This subquery selects all the columns from the table freelancers and filters data by country in WHERE.

As we use the subquery instead of the whole freelancers table, we group the output by the column id from the subquery.

This should work, as the subquery is basically just a limited version of the freelancers table. However, PostgreSQL returns a familiar error:

We used this derived table as a subquery, so the PostgreSQL optimizer doesn’t recognize its PK. If you insist on having a subquery, then this error is fixed the same way as before: list all other columns from SELECT in GROUP BY.

SELECT 
  ifr.id,
  first_name, 
  last_name,
  SUM(paid_amount) AS total_paid_amount
FROM (SELECT *
	FROM freelancers
	WHERE country = 'Italy') AS ifr 
LEFT JOIN weekly_salary ws
ON ifr.id = ws.freelancers_id
GROUP BY ifr.id, first_name, last_name;

The query now works:

id	first_name	last_name	total_paid_amount
2	Nadine	Lopez	3,005.15
3	Claudio	Stratos	2,655.30
4	Miriam	Valetti	415.78

We About Learned Fixing Errors. Now It’s Time to Avoid Them.

Getting such error messages in PostgreSQL is very helpful, as they make you learn. Of course, even the most advanced PostgreSQL users will now and then see this (or any other) mistake, so you won’t avoid them completely.

It’s important that you know how to fix the errors. However, the goal is to see this GROUP BY error as rarely as possible. Of course, sometimes the PostgreSQL optimizer might save you. It’s a good tool to have, but you should rely more on your knowledge than the optimizer being able to read your mind. To ensure that, you need a lot of practice to make writing the columns in SELECT and then in GROUP BY automatic.

The SQL from A to Z in PostgreSQL will boost your knowledge and confidence in it.

There are also some additional ideas on how to practice PostgreSQL online. Use this to become your own code optimizer!

Tags: