Back to articles list Articles Cookbook
9 minutes read

How to Find the Next Non-NULL Value in SQL

You need to find the next non-NULL value in a time series, but you don’t know how. Can you even do that in SQL? Yes, you can! This article will show you what to do.

If you work with SQL, you will sooner or later confront NULL values. Having NULLs in a database is almost unavoidable. However, sometimes you want to avoid them in your reports. This is quite often true when you’re analyzing time series data; NULL values mean there’s no data available. No data available in the data series usually means some event did or did not occur.

So, how do you find the next non-NULL value in a time series? Before answering how, let’s focus on when – as in ‘When would I need to find the next non-NULL value?’. Here’s a scenario that’ll provide an answer.

Scenario

Suppose you’re analyzing data for a freelance platform called NoBoss that connects companies with freelancers. The companies are looking for good freelancers; the freelancers are looking for good jobs (or gigs, as they say in the industry). You’re working with the log table, which contains freelancers’ data, their login data, and some activities. Here are the table columns:

  • id – The log record’s ID and the table’s primary key (PK).
  • user_id – The user’s ID.
  • first_name – The user’s first name.
  • last_name – The user’s last name.
  • login_start – The user’s login start time.
  • login_end – The user’s login end time.
  • job_id – The ID of the job the user applied to.
  • job_name – The name of the job the user applied to.
  • category_id – The job’s category ID.
  • category_name – The job’s category name.

Your task here is to find the users’ IDs and names. Also, you need to get the users’ login start and end times, together with the job the user applied to. Finally, you also need a new column named profile_category. A user’s profile category is determined by the category of the first job the user applies for (i.e. if the freelancer applies to a ‘Virtual Assistant’ gig, their category_name value is “Virtual Assistant”). That way, the NoBoss platform can provide its users with reports and statistics that compare them to other users within the same category.

Does this sound complicated? Maybe not at first. But take a look at the data from the log table and you’ll see why your task is not straightforward. Pay special attention to the highlighted column.

iduser_idfirst_namelast_namelogin_startlogin_endjob_idjob_namecategory_idcategory_name
1512SuzyCinque2021-01-01 4:05:062021-01-01 4:10:1256789Data analyst to scrub data1Data Analysis
2513RobertUpney2021-01-03 13:24:482021-01-03 13:49:48NULLNULLNULLNULL
3513RobertUpney2021-01-04 12:01:012021-01-04 12:25:45NULLNULLNULLNULL
4513RobertUpney2021-01-04 18:19:202021-01-04 18:29:29NULLNULLNULLNULL
5512SuzyCinque2021-01-04 18:20:202021-01-04 18:28:20NULLNULLNULLNULL
6514LauraGalsworthy2021-01-06 9:03:042021-01-06 10:30:55NULLNULLNULLNULL
7513RobertUpney2021-01-09 2:05:072021-01-09 2:15:1459874Content writer for musical blog13Writing
8514LauraGalsworthy2021-01-14 11:05:282021-01-14 11:52:18NULLNULLNULLNULL
9514LauraGalsworthy2021-01-15 11:22:492021-01-15 14:55:5562459Financial data analyst for a fintech company1Data Analysis
10512SuzyCinque2021-01-16 14:22:122021-01-16 14:38:5262499Ghostwriter for biography13Writing
11513RobertUpney2021-01-22 15:05:122021-01-22 16:00:0062512Write a pop song23Composer
12513RobertUpney2021-01-22 17:12:132021-01-22 17:49:1462515Dashboard expert for a startup1Data Analysis
13514LauraGalsworthy2021-01-23 11:08:032021-01-23 11:26:5562528PowerBI expert1Data Analysis
14514LauraGalsworthy2021-01-23 12:04:122021-01-23 13:00:0062600Design a logo47Design
15512SuzyCinque2021-01-30 14:08:082021-01-30 14:38:1262700Proofreader13Writing

Do you see what makes this so difficult? This table records every login. In other words, this table contains a time series, which means there can be log records where the user did not apply for any job. You need to exclude all the records with NULL values in the column job_id. This is the easier part.

You also need to show only the first job’s category. The user can be logged in and not apply to their first job for months. This means you’ll need to skip who-knows-how-many records with NULL values before reaching the non-NULL record that equals the first job.

Remember, to complete this task, you’ll have to find all the non-NULL values in the column job_id and only the first non-NULL value from the same column. To solve this problem, you’ll need SQL window functions; if you’re not familiar with them, check out our Window Functions course.

Example Solution: SQL Window Functions

Using  window functions is not the only way to solve this problem in SQL. That’s why the following code is only an example. However, using window functions is the most elegant way to find non-NULL values. So, here’s the code:

SELECT	user_id,
		first_name,
		last_name,
		login_start,
		login_end,
		job_name,
		FIRST_VALUE(category_name) OVER (PARTITION BY user_id ORDER BY job_id ASC) AS profile_category
FROM log
WHERE job_name IS NOT NULL;

What does this code do? It first selects several columns from the table log; nothing fancy. The exciting part comes from using the FIRST_VALUE() window function. This function returns the first value in a defined set of data. In this case, it will return the first value from the column category_name.

However, I don’t want the first job category ever chosen to be the category for every user; the users’ first job is used only for that particular user. To achieve that, I’ve used the PARTITION BY. This defines the window over which the FIRST_VALUE() function will work. This means it will find the first user’s first job; then it will move to the following user and find their first job, and so on.

Notice that this operation is performed over the column job_id in ascending order. Why is that? Job IDs are sequentially allocated to jobs. If you order the job IDs ascendingly, it means the lowest ID will be on top and the NULL values will be at the bottom. That way, you avoid getting a NULL as the first value in the result.

Finally, the WHERE clause excludes all the logins where the user didn’t apply for any job.

Let’s check the code to see if it returns the desired result:

user_idfirst_namelast_namelogin_startlogin_endjob_nameprofile_category
512SuzyCinque2021-01-01 4:05:062021-01-01 4:10:12Data analyst to scrub the dataData Analysis
512SuzyCinque2021-01-16 14:22:122021-01-16 14:38:52Ghostwriter for biographyData Analysis
512SuzyCinque2021-01-30 14:08:082021-01-30 14:38:12ProofreaderData Analysis
513RobertUpney2021-01-09 2:05:072021-01-09 2:15:14Content writer for musical blogWriting
513RobertUpney2021-01-22 15:05:122021-01-22 16:00:00Write a pop songWriting
513RobertUpney2021-01-22 17:12:132021-01-22 17:49:14Dashboard expert for a startupWriting
514LauraGalsworthy2021-01-15 11:22:492021-01-15 14:55:55Financial data analyst for a fintech companyData Analysis
514LauraGalsworthy2021-01-23 11:08:032021-01-23 11:26:55PowerBI expertData Analysis
514LauraGalsworthy2021-01-23 12:04:122021-01-23 13:00:00Design a logoData Analysis

The code seems to be working perfectly! This is only one aspect of analyzing time series in SQL. You can learn more by reading this article on time series analysis. You don’t need to know all the window functions and their syntax by heart, especially if you don’t use them very often. In such situations, our SQL Window Functions Cheat Sheet can be helpful. There you’ll find all the window functions, their syntax, and examples showing how they work.

Pay Attention When Adapting This Query

The above query gives you the general framework for the solution and succeeds at finding the next non-NULL value. You can easily adapt this code to the time series you’re analyzing. However, be careful and pay attention when you do!

The most important thing is to know your data. Understand if your data set contains the NULL values and what they mean. In the above example, it was necessary to know that the user can be logged in and never apply for any job. For this data, having NULL values is not a mistake; it’s information that leads you to certain conclusions.

There is another example of the importance of knowing your data. I knew that the job_id values are allocated sequentially, not randomly. I’ve used this knowledge to order the data and thus eliminate NULLs from the result. It’s also helpful to know how NULL values are treated when ordering the data. Depending on the order, they will appear either as the first or last values in the table.

It’s easy to think that you can use the LAST_VALUE() function instead of FIRST_VALUE() and get the same results by simply ordering the data differently. Let’s see if it works!

Common logic says I just need to take our previous query and write it this way:

SELECT	user_id,
		first_name,
		last_name,
		login_start,
		login_end,
		job_name,
		LAST_VALUE(category_name) OVER (PARTITION BY user_id ORDER BY job_id DESC ) AS profile_category
FROM log
WHERE job_name IS NOT NULL;

The only difference is in this line: LAST_VALUE(category_name) OVER (PARTITION BY user_id ORDER BY job_id DESC ) AS profile_category. Instead of the FIRST_VALUE(), I’ve used the LAST_VALUE() function. The other difference is that the function will be performed over job_id in descending order.

Let’s run the code and analyze the result for the user Suzy Cinque:

user_idfirst_namelast_namelogin_startlogin_endjob_nameprofile_category
512SuzyCinque2021-01-30 14:08:082021-01-30 14:38:12ProofreaderWriting
512SuzyCinque2021-01-16 14:22:122021-01-16 14:38:52Ghostwriter for biographyWriting
512SuzyCinque2021-01-01 4:05:062021-01-01 4:10:12Data analyst to scrub the dataData Analysis

There should only be “Data Analysis” in the profile_category. Why is that? It’s because Suzy Cinque first applied to the job “Data analyst to scrub the data”. However, this column now shows the “Writing” category too. Take a closer look. The other two jobs Suzy Cinque applied to are “Proofreader” and “Ghostwriter for biography”. They both belong to the “Writing” job category.

What this code gives you is only the category of the current job. A proofreader is a writing job. A ghostwriter is also a writing job. Data analyst is, well, a data analysis job. Why did this happen? It’s because the default window frame is RANGE UNBOUNDED PRECEDING when ORDER BY is used. It means the LAST_VALUE() will consider only values between the first row and the current row.

Don’t despair – there’s a way to make this work! The trick is to define the window frame correctly:

SELECT	user_id,
		first_name,
		last_name,
		login_start,
		login_end,
		job_name,
		LAST_VALUE(category_name) OVER (PARTITION BY user_id ORDER BY job_id DESC ROWS BETWEEN UNBOUNDED PRECEDING
      AND UNBOUNDED FOLLOWING ) AS profile_category
FROM log
WHERE job_name IS NOT NULL;

This changed code will now consider all the values between the first and last rows. It achieves that by the following clause: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Check out the new result for Suzy Cinque:

user_idfirst_namelast_namelogin_startlogin_endjob_nameprofile_category
512SuzyCinque2021-01-30 14:08:082021-01-30 14:38:12ProofreaderData Analysis
512SuzyCinque2021-01-16 14:22:122021-01-16 14:38:52Ghostwriter for biographyData Analysis
512SuzyCinque2021-01-01 4:05:062021-01-01 4:10:12Data analyst to scrub the dataData Analysis

Now the result is correct again. The first job Suzy Cinque applied to is “Data analyst to scrub the data”; that’s why her profile category is always going to be “Data Analysis”.

Finding the next non-NULL value is only one aspect of analyzing a time series. To get more familiar with both time series and window functions, try practicing on actual COVID-19 data like in this article.

Speaking of the time data, you might be required to calculate a time series’ length. Don’t worry, here’s an article that teaches you how to use window functions to calculate the length of a series.

Want More Help Working with Non-NULL Values?

This scenario with the NoBoss platform is only one example of how to find the next non-NULL values. Use this scenario and the code I explained as a basis for further learning and practice. Our Window Functions course will give you more structure and help you learn about all the other window functions. If you’re interested, here’s all the info about the Window Functions course you’ll need, provided by our Chief Content Officer.

Practice what you learned here, and good luck with finding your way through the non-NULL values in time series data!