Back to articles list Articles Cookbook
5 minutes read

NULL Values and the GROUP BY Clause

We've already covered how to use the GROUP BY clause and some aggregation functions like SUM(), AVG(), MAX(), MIN(), COUNT(). In this article, we will explain how the GROUP BY clause works when NULL values are involved. We will also explain about using NULLs with the ORDER BY clause.

The best way to master GROUP BY and NULL in SQL is through practice. I recommend the SQL Practice track at LearnSQL.com. It contains over 600 hands-on exercises to help you gain confidence in your skills.

In SQL, NULL is a special marker used to indicate that a data value does not exist in the database. For more details, check out Wikipedia's explanation of NULL in SQL.

We will use the following employee table to illustrate how the GROUP BY clause works with NULL values.

EMPLOYEE TABLE
EmplidNameDepartmentSalary
100John Smith IT 2000
101Jean Pellu NULL 2500
102Mary Popins FINANCES2000
103Blas MerrieuNULL NULL
104Joan Piquet IT 1000
105Jose Gomez IT NULL

The GROUP BY Clause and NULL Values

Let's start by executing a simple SQL query with both the GROUP BY clause and NULL values:

SELECT department 
FROM employee 
GROUP BY department;
RESULTS
department
1.	
2.	IT
3.	FINANCES

Note: I've added a numbered list here for clarity; usually the results would be shown as an unnumbered list.

We can see that the first result value is a NULL represented by an empty string (the empty line before the IT department). This empty space represents all the NULL values returned by the GROUP BY clause, so we can conclude that GROUP BY treats NULLs as valid values.

In the next query, we will count how many employees are in each department, including the "NULL" department:

SELECT department, count(*) 
FROM employee 
GROUP BY department;
RESULTS
department		count(*)
1. 			2
2. IT        		3
3. FINANCES 		1

Note: I've added a numbered list here for clarity; usually the results would be shown as an unnumbered list.

Analyzing the previous results from a "GROUP BY perspective", we can conclude that all NULL values are grouped into one value or bucket. This makes it look like NULL is one department with two employees. However, treating NULLs this way – grouping many NULLs into one bucket – does not align with the concept that a NULL value is not equal to any other value, even another NULL.

To explain why NULLs are grouped into one bucket, we need to review the SQL standard. SQL defines “any two values that are equal to one another, or any two NULLs”, as “not distinct”. This definition of "not distinct" allows SQL to group and sort NULLs when the GROUP BY clause (or other keywords that perform grouping) is used.

There is another confusing point in the previous result: the way the NULL is represented (by a blank line) is not clear. One interesting way to solve this issue is to use the COALESCE function, which converts NULLs to a specific value but leaves other values unchanged. Let's see the following query:

SELECT coalesce(department,'Unassigned department'), count(*) 
FROM employee 
GROUP BY 1;
RESULTS
department			count(*)
IT				3
Unassigned department	        2
FINANCES			1

Aggregate Functions and Null Values

Until now we've been working with the NULL values in the department column, and we've only used the GROUP BY clause. Let's try executing some queries using NULL values as parameters in aggregate functions. First, we'll use the COUNT() function:

SELECT COUNT(salary) as "Salaries"
FROM employee
RESULTS
Salaries
     4

Without the DISTINCT clause, COUNT(salary) returns the number of records that have non-NULL values (2000, 2500, 2000, 1000) in the salary column. So, we can conclude that COUNT doesn't include NULL values.

Let's try using the COUNT(distinct column) aggregate function, which counts all the different values in a column. How does this treat NULL values?

SELECT COUNT(distinct salary) as "Different Salaries"
FROM employee
RESULTS
Different Salaries
	3

The query returned a "3", but there are four different salaries: 2000, 2500, 1000, and NULL. Again, we can conclude that the NULL is not included in the resulting value.

Let's see another example, this time using the AVG() aggregate function:

SELECT coalesce(department,'Unassigned department'), AVG(salary) 
FROM employee 
GROUP BY 1
RESULTS
department			count(*)
Unassigned department    	2500
IT				1500
FINANCES			2000

Let's analyze if NULL values are included in the AVG() function. The IT department has three employees with the following salary values: 2000, 1000, and NULL. The AVG result for IT is 1500, so it is clear that the NULL value is not considered in the average calculation. (Because (1000 + 2000 ) / 2 = 1500.)

The conclusion is that averages are only calculated using non-NULL values. The general rule is NULL values are not considered in any aggregate function like SUM(), AVG(), COUNT(), MAX() and MIN(). The exception to this rule is the COUNT(*) function, which counts all rows, even those rows with NULL values. Here's an example:

SELECT COUNT(*) as "Total Records"
FROM employee
RESULTS
Total Records
         6

As we can see, COUNT(*) returns the total number of records in the "employee" table, even those records with NULL values in some or all fields.

The ORDER BY Clause and NULL Values

The SQL standard does not explicitly define a default sort order for NULLs. Some databases like Oracle and PostgreSQL use a NULLS FIRST or NULLS LAST specification to indicate the place of the NULL value. The following example shows this feature:

SELECT department, COUNT(*) as "Num of employees" , AVG(salary) as "Avg Dept. Salary"
FROM employee
GROUP BY department
ORDER BY department NULLS LAST
RESULTS
department			Num of employees		Avg Dept. Salary
FINANCES			1				2000
IT				3				1500
				2				2500

Boolean Expressions Involving NULLS

We normally see TRUE or FALSE as a Boolean result, but it is usual for expressions or conditions that include a NULL to return an UNKNOWN result. The UNKNOWN result is covered in detail in another article we previously published here on our blog.

TRY IT YOURSELF!

There are many relational database features and functions that produce a specific behavior whenever a NULL value is involved. You can learn more in the LearnSQL's Standard SQL Functions course. Try it out for free!