14th Jul 2020 9 minutes read

How to Calculate the Difference Between Two Rows in SQL

window functions

Table of Contents

Find the Difference Between Two Values in the Same Row
Calculating the Difference Between Two Values in the Same Column
Calculating the Difference Between Date Values in SQL
Finding the Difference Between Non-Consecutive Records
More About SQL Calculations and Window Functions

Calculating the difference between two rows in SQL can be a challenging task. It is possible – and there’s more than one way to do it. In this article, I’ll explain how to use the SQL window functions LEAD() and LAG() to find the difference between two rows in the same table.

The best way to learn window functions is our interactive Window Functions course. It contains over 200 hands-on exercise that will teach you all window functions concepts, from simple OVER() clause to complex topics such as combining window functions and GROUP BY clause. If you already know window functions and want to refresh your skills, check out our Window Functions Practice Set with 100 hands-on exercises.

I like to think of this as a “happy hour" article – you can learn about two topics (calculating the difference between two rows and SQL window functions) by reading just one article. Let’s get right into it!

Find the Difference Between Two Values in the Same Row

To calculate any difference, you need two elements; to calculate a difference in SQL, you need two records. You can calculate the difference between two columns in the same record, as I’ll show in a moment. It’s very easy. However, I’ll mainly focus on finding the difference between two values of the same column in different records.

First, let’s talk about our data. We’ll use a database with two tables, as used by the government of an imaginary province to define some social programs. The first table is housing, which contains data about the number of people renting, owning, or needing a house. Have a look:

Table housing

City	Year	Total Population	Population Owning house	Population renting house	Population needing house
Goldpolis	2017	2250	1500	500	250
Silverpolis	2017	1750	1200	400	150
Bronzepolis	2017	1420	1000	300	120
Goldpolis	2018	2425	1600	550	275
Silverpolis	2018	1920	1300	410	210
Bronzepolis	2018	1730	1020	300	410
Goldpolis	2019	2610	1750	580	280
Silverpolis	2019	2110	1400	420	290
Bronzepolis	2019	2010	1050	300	660

To design a plan to build houses for people who need them, the government wants to obtain some metrics about the housing problems in different cities. Suppose the government wants to know how many people in each city don’t own a house; the query will be:

SELECT 	
    city,
    total_population, 
    total_population - population_owning_house AS people_not_owning_house
FROM housing

In this query, it’s clear you’re calculating a difference using two different columns in the same record. Next, you’ll calculate differences using two records.

Calculating the Difference Between Two Values in the Same Column

Usually, all the operations we do in a SQL query are related to the current record. But in this case, we need a second record. This is the tricky part. We’ll use a technique that employs the windows functions LAG() and LEAD() to obtain data from another record. If you want to go deeper, I suggest reading this window functions article that gives clear explanations of window functions with plenty of examples.

Let’s suppose that, for one specific city ( ‘Bronzepolis’ ), you want to obtain the variation of people needing a house related to the previous year. Let’s see the query:

SELECT 	
  city,
  year,
  population_needing_house,
  LAG(population_needing_house) AS previous_year,
  population_needing_house - LAG(population_needing_house)
    OVER (ORDER BY year ) AS difference_previous_year
FROM housing
WHERE city = ‘Bronzepolis’
ORDER BY year

In the blue text, you can see the calculation of the SQL delta between two rows. To calculate a difference, you need a pair of records; those two records are “the current record” and “the previous year’s record”. You obtain this record using the LAG() window function. This function allows you to obtain data from the previous record (based on an order criterion, which here is “ORDER BY year”).

Below is the result of this query. The yellow arrows show that the value of the LAG() function returns the same population_needing_house value as the previous year’s record. The column at the right shows the result of the difference between the current and previous years.

For the next query, we’ll expand the analysis of housing issues to all cities. For each city, we want the difference between the current and previous years for the population_needing_house column.

First, you’ll remove the condition city = ‘Bronzepolis’. You want to calculate values for each city, so you need a way to separate data into groups. This is why you add the PARTITION BY city clause to LAG(). PARTITION BY city allows you to process all the records for the same city in the same window.

You’ll use LAG() again to calculate the difference between the number of people needing a house between this year and the previous year. Let’s see the query:

SELECT 	
  city,
  year,
  population_needing_house,
  LAG(population_needing_house)
    OVER (PARTITION BY city ORDER BY year ) AS previous_year,
  population_needing_house - LAG(population_needing_house)
   OVER (PARTITION BY city ORDER BY year ) AS difference_previous_year
FROM housing
ORDER BY city, year

In blue text, you can see how the difference is calculated. Here’s a plain-English version of what’s going on:

difference_previous_year = 
  population_needing_house in current record 
  - population_needing_house in the previous year’s record

Where:

Population_needing_house in the current record is the column population_needing_house

and

Population_needing_house in the previous year’s record is obtained by the function LAG(population_needing_house) OVER (PARTITION BY city ORDER BY year )

City	Year	Population needing house	Previous Year	Difference previous year
Bronzepolis	2017	150	null	null
Bronzepolis	2018	410	150	290
Bronzepolis	2019	660	410	250
Goldpolis	2017	250	null	null
Goldpolis	2018	275	250	25
Goldpolis	2019	280	275	5
Silverpolis	2017	150	null	null
Silverpolis	2018	210	150	60
Silverpolis	2019	290	210	80

So, you've used the PARTITION clause to create a set with all the records for the same city. Then the ORDER BY clause is used to order all these records by year. Finally, the LAG() function is used to obtain the population_needing_house value from the previous record.

Another SQL window function, LEAD(), is similar to LAG(). However, it returns the next record in the set (in our example, this would be the next year’s record).

If you review the two previous queries, you’ll see that we use the same pattern to calculate the difference: subtracting the previous column value from the current value using LAG() (or from the next record using LEAD()).

SQL window functions are very powerful. For examples from different business areas, like finance, sales, and trading, check out this article on when to use SQL window functions.

Calculating the Difference Between Date Values in SQL

In the previous examples, you calculated the delta between two rows using numeric column values. Now I’ll demonstrate how to calculate the difference between two values of the date data type.

First, let’s consider the result’s data type. When you calculate the difference between two date values, the result is not a date. It is an interval that represents the number of days between the two dates.

Suppose there’s a table called hospital_statistics that stores statistics for the province’s three hospitals. Below is a sample of the table data:

Table hospital_statistics

Hospital name	day	illness	number of patients
Hospital of Bronzepolis	2017-03-22	MDLR	1
Hospital of Goldpolis	2017-12-03	MDLR	1
Hospital of Silverpolis	2018-08-03	MDLR	1
Hospital of Bronzepolis	2019-01-23	MDLR	1
Hospital of Goldpolis	2019-06-14	MDLR	1

Now let’s say there’s a rare illness called MDLR. The government wants to study how often a patient with MDLR is admitted to any of the hospitals. They’ve asked for a report with the columns day, hospital name, number of cases, and days_since_the_last_case. We’ll use SQL to build the report:

SELECT
  day,
  hospital_name,
  number_of_patients,
  day - LAG(day) OVER (ORDER BY day) 
     AS days_since_last_case
FROM hospital_statistics
WHERE illness_name = 'MDLR'
ORDER BY day

You can see the same pattern that we used before. Now, it’s being used to calculate the days since the last case. The only difference is that you’re calculating a difference between two dates instead of numeric values.

As you see in the results, the column days_since_last_case is an integer value, representing a number of days.

Day	Hospital Name	Illness	Patients	days_since_last_case
2017-03-22	Hospital of Bronzepolis	MDLR	1	null
2017-12-03	Hospital of Goldpolis	MDLR	1	256
2018-08-03	Hospital of Silverpolis	MDLR	1	243
2019-01-23	Hospital of Bronzepolis	MDLR	1	173
2019-06-14	Hospital of Goldpolis	MDLR	1	142

SQL also allows you to calculate differences between timestamp values. And there are other interesting date arithmetic operations you can use on date-related data types. If you want to go deeper, I suggest the article How to Analyze Time Series COVID 19 Data with SQL Window Functions.

Finding the Difference Between Non-Consecutive Records

Until now, you’ve calculated differences between contiguous records based on specific order criteria. In some cases, you’ll need to calculate the difference between non-contiguous records.

As an example, let’s go back to the first query. Suppose we want to add another column showing the number of cases in the last two years. Fortunately, LAG() and LEAD() has an optional parameter that specifies how many records to skip before/after the current record. By default, this parameter is 1 (i.e. “use the next/previous record”), but you can set it to another number. So, with this new parameter, the query will be:

SELECT 	
  city,
  year,
  population_needing_house,
  LAG(population_needing_house)
    OVER (PARTITION BY city ORDER BY year ) AS one_year_before,
  LAG(population_needing_house,2)
    OVER (PARTITION BY city ORDER BY year ) AS two_years_before,
  population_needing_house - LAG(population_needing_house)
    OVER (PARTITION BY city ORDER BY year ) AS difference_previous_year,
  population_needing_house - LAG(population_needing_house,2)
    OVER (PARTITION BY city ORDER BY year ) AS difference_last_two_years
FROM housing
ORDER BY 1, 2 DESC

In blue, you can see the calculation of the difference between two non-consecutive records. In the next image, we can see the new column Difference last two years only for the 2019 values; this is because 2019 is the only year that has data for two prior years. The red lines connect the LAG() function’s results (obtaining data for 2017 and showing the same data in the 2017 record). You can see that both lines have the same value.

We’ve shown how important LAG() and LEAD() are, but there are many more window functions in SQL. Read this article about window functions to learn about more of them.

More About SQL Calculations and Window Functions

In this article, you’ve learned how to calculate the difference between consecutive rows by applying SQL. You’ve also learned how to use LAG() and LEAD() to find the difference between non-consecutive rows. Although there are other ways to calculate this difference, such as using a self-join, these methods are very useful.

If you will be using windows functions (or indeed SQL) frequently – or if you just want to improve your knowledge – this complete Window Functions Cheat Sheet is an excellent resource. To learn window functions through practice, try our Window Functions course. You can read about the course in this article.

Tags:

window functions

Find the Difference Between Two Values in the Same Row

Calculating the Difference Between Two Values in the Same Column

Calculating the Difference Between Date Values in SQL

Finding the Difference Between Non-Consecutive Records

More About SQL Calculations and Window Functions

You may also like