Back to articles list Articles Cookbook
7 minutes read

How to Tackle SQL NULLs: COALESCE function

It's inevitable that some data in the database has no value what in SQL is represented by the NULL keyword. "No value" here is different from zero, false, or an empty string (but with exceptions! In Oracle database, NULL is the same as string of zero length). During playing with the database, treating NULLs are becoming more and more troublesome, therefore MySQL standard provides help with some of the functions, like COALESCE.

Let's introduce the COALESCE() function:

Definition

COALESCE function returns a first non NULL argument from the passed list of arguments. Its syntax is as follows:

COALESCE(x, y, … , n)

What additionally is important, that:

  • It takes at least two arguments.
  • The expressions in the COALESCE function must evaluate to the same data type (For example the SQL statement SELECT COALESCE (1, 'aa', NULL, 23); will produce an error).
  • If all listed arguments evaluate to NULL, the function also returns NULL.

COALESCE is essentially a shorthanded CASE statement presented below:

CASE
WHEN x IS NOT NULL THEN x
WHEN y IS NOT NULL THEN y
WHEN ... IS NOT NULL THEN …
ELSE n
END

COALESCE – usage examples

Now, take a look at the presented below prototype of the university database model. I will show some examples that requires the COALESCE function that refers to tables from this model.

Using COALESCE with value concatenation

Take a look at the student table from the introduced university database model:
Strudent table -postgres coalesce

It may happen, that the particular person could not have the the middle name, therefore some records in the middle_name column may be NULLs like show the example data below.

pic-3

Let's try to concatenate student's first name, middle name, last name. This one works in PostgreSQL, Oracle.

SELECT
  first_name || ' ' || middle_name || ' ' || last_name AS full_name
FROM student;

The result table contains a column full_name with the concatenated strings.

pic-4

Notice that only one record has a proper full name. The rest of them are NULLs, because the middle name was also NULL (while concatenating strings with NULL value, the result is also NULL).

Such a result doesn't looks very nice. The result don't show the full name of students with no middle name. Here the COALESCE() function comes with the rescue. Using this we can replace NULL fields with for example an empty string. Now, the new one query looks as follows:

SELECT
  first_name || COALESCE(' ' || middle_name || ' ', ' ') || last_name AS full_name
FROM student;

The result is now complete 🙂

pic-5

MySQL, MS SQL Server uses for the concatenation CONCAT() function or ‘+‘ operator.

Similarly, the query in MySQL or MS SQL Server will look as follows:

SELECT
  CONCAT(first_name, ' ', middle_name, ' ', last_name)
FROM student;

… and the records that have NULLs will also have :

pic-4

With the COALESCE function, the NULLs that appear in the middle_name column will be replaced.

SELECT
  CONCAT(first_name, ' ', COALESCE(middle_name, ''), ' ', last_name)
FROM student;

Using PostgreSQL COALESCE with aggregate functions

Well, NULLs indeed can appear very problematic. Let's imagine different situation. I want to get to know what's the average grade for the course. It seems very easy, right? But, wait, wait… even if this is a simple query we can encounter some drawbacks. The issue concerns some of the SQL aggregate functions like SUM(), MAX() or MIN().

EXAMPLE: Using COALESCE with AVG function

Let's try an example with student_courses table.

pic-6

As you see below, students who are registered to the course of id 1 don't have a grade yet.

pic-7

The query, that returns the average grade  for the course looks as follows:

SELECT
  course_instance_id,
  AVG(grade) AS average_grade_in_course
FROM student_courses
GROUP BY course_instance_id;

For the AVG() function , the non-NULL fields are summed and the sum is divided by the number of non-NULL fields. So, if all the records have NULL in grade column, the average grade will be NULL as well.

The result table below:

pic-8

In such a case, we want to insert other value, like 0 (the grade can be a value from 2 to 5, so the average 0 can indicate for us that students don't have grades)

COALESCE() function comes with help in replacing a values. The same query with COALESCE will be:

SELECT
  course_instance_id,
  COALESCE(AVG(grade), 0) AS average_grade_in_course
FROM student_courses
GROUP BY course_instance_id;

And now we can get the result table as:

pic-9

EXAMPLE: Using COALESCE with SUM() function

This one is a similar example. Let's assume different situation. We want to count the total amount of the absence hours for each student.  Take a look at selected part of the model:

pic-10

The data in the table student_course_attendance looks as follows.

pic-11

While the  data student_courses is:

pic-12

Take a look at the data in those tables closely. As you see, two students are registered for the particular course, while only the one was missing classes.

The query, that will calculate the sum of the total number of the absence hours for a course_id = 1 will for each student is:

SELECT
  student_id,
  SUM(absence_hours) AS total_absence_hours
FROM student_courses sc
LEFT JOIN student_course_attendance sca
  ON sc.id = sca.student_courses_id
WHERE course_instance_id = 1
GROUP BY 1;

… and evaluates in a result table:

pic-13

The total absence hour was evaluated to NULL, because indeed, this student didn't miss any classes in this course instance. There weren't any record for this student in the student_course_attendance table, and the SUM() function returned NULL. We can secure from this situation of course using COALESCE function.

The new query is as follows:

SELECT
  student_id,
  COALESCE(SUM(absence_hours), 0)  AS total_absence_hours
FROM student_courses sc
LEFT JOIN student_course_attendance sca
  ON sc.id = sca.student_courses_id
WHERE course_instance_id = 1
GROUP BY 1;

And the result is now:

pic-14

Let's abandon now the university model and look at the different example.

Using COALESCE in creating PIVOT tables

This use case will be presented for the PostgreSQL database. Note, that other databases like Oracle and MS SQL Server have their equivalent functions for creating pivot tables.

Look at the table below. This is the result of some query, that shows what is the total sale for each brand in the particular month.

pic-15

I want to transponse the rows with columns (create a pivot table). The result table should have columns: brand_id, jan, feb, …, dec. Notice that not in the every month product from the particular brand was sold. Let's take a look at the Postgres example.

SELECT brand_id, jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec

FROM CROSSTAB (
  'SELECT year, month, qty FROM sales ORDER BY 1',
  'SELECT m FROM generate_series(1,12) m'
) AS (
   year int,
   "jan" int,
   "feb" int,
   "mar" int,
   "apr" int,
   "may" int,
   "jun" int,
   "jul" int,
   "aug" int,
   "sep" int,
   "oct" int,
   "nov" int,
   "dec" int
);

After executing such a query, we receive:

pic-16

Unfortunately, the sell is recorded only in some of the months and only in those fields, the number of total sales is pasted. As you probably expected, the rest of the fields contains NULLs.

To get rid of the unwanted NULLs we can use the COALESCE function, that when needed insert a 0 instead of NULL. The repaired query looks as follows:

SELECT 
  brand_id,
  COALESCE(jan, 0),
  COALESCE(feb, 0),
  COALESCE(mar, 0),
  COALESCE(apr, 0),
  COALESCE(may, 0),
  COALESCE(jun, 0),
  COALESCE(jul, 0),
  COALESCE(aug, 0),
  COALESCE(sep, 0),
  COALESCE(oct, 0),
  COALESCE(nov, 0),
  COALESCE(dec, 0)
FROM CROSSTAB(
  'SELECT year, month, qty FROM sales ORDER BY 1',
  'SELECT m FROM generate_series(1,12) m'
) AS (
   year int,
   "jan" int,
   "feb" int,
   "mar" int,
   "apr" int,
   "may" int,
   "jun" int,
   "jul" int,
   "aug" int,
   "sep" int,
   "oct" int,
   "nov" int,
   "dec" int
);
pic-17

Summary

NULLs can make life problematic. If you haven't experienced the bad side of the missing values yet, you will come across it for sure. It's a matter of time. For now, remember the COALESCE function, that will help you tackle the unwanted NULLs.