Articles Cookbook
Back to articles list
- 8 minutes read

How to Calculate the Difference Between Two Rows in SQL

Calculating the difference between two rows in SQL can be a challenging task. It is possible – and there’s more than one way to do it. In this article, I’ll explain how to use the SQL window functions LEAD() and LAG() to find the difference between two rows in the same table.

I like to think of this as a “happy hour" article – you can learn about two topics (calculating the difference between two rows and SQL window functions) by reading just one article. Let’s get right into it!

Find the Difference Between Two Values in the Same Row

To calculate any difference, you need two elements; to calculate a difference in SQL, you need two records. You can calculate the difference between two columns in the same record, as I’ll show in a moment. It’s very easy. However, I’ll mainly focus on finding the difference between two values of the same column in different records.

First, let’s talk about our data. We’ll use a database with two tables, as used by the government of an imaginary province to define some social programs. The first table is housing, which contains data about the number of people renting, owning, or needing a house. Have a look:

Table housing

CityYearTotal PopulationPopulation Owning housePopulation renting housePopulation needing house
Goldpolis201722501500500250
Silverpolis201717501200400150
Bronzepolis201714201000300120
Goldpolis201824251600550275
Silverpolis201819201300410210
Bronzepolis201817301020300410
Goldpolis201926101750580280
Silverpolis201921101400420290
Bronzepolis201920101050300660

To design a plan to build houses for people who need them, the government wants to obtain some metrics about the housing problems in different cities. Suppose the government wants to know how many people in each city don’t own a house; the query will be:

SELECT 	
    city,
    total_population, 
    total_population - population_owning_house AS people_not_owning_house
FROM 	housing

In this query, it’s clear you’re calculating a difference using two different columns in the same record. Next, you’ll calculate differences using two records.

Calculating the Difference Between Two Values in the Same Column

Usually, all the operations we do in a SQL query are related to the current record. But in this case, we need a second record. This is the tricky part. We’ll use a technique that employs the windows functions LAG() and LEAD() to obtain data from another record. If you want to go deeper, I suggest reading this window functions article that gives clear explanations of window functions with plenty of examples.

Let’s suppose that, for one specific city ( ‘Bronzepolis’ ), you want to obtain the variation of people needing a house related to the previous year. Let’s see the query:

SELECT 	
city,
	year,
      population_needing_house,
      LAG(population_needing_house) AS previous_year
      population_needing_house - LAG(population_needing_house)
 	OVER (ORDER BY year ) AS difference_previous_year
FROM 	housing
WHERE city = ‘Bronzepolis’
ORDER BY year

In the blue text, you can see the calculation of the SQL delta between two rows. To calculate a difference, you need a pair of records; those two records are “the current record” and “the previous year’s record”. You obtain this record using the LAG() window function. This function allows you to obtain data from the previous record (based on an order criterion, which here is “ORDER BY year”).

LAG

Below is the result of this query. The red arrows show that the value of the LAG() function returns the same population_needing_house value as the previous year’s record. The column at the right shows the result of the difference between the current and previous years.

LAG

For the next query, we’ll expand the analysis of housing issues to all cities. For each city, we want the difference between the current and previous years for the population_needing_house column.

First, you’ll remove the condition city = ‘Bronzepolis’. You want to calculate values for each city, so you need a way to separate data into groups. This is why you add the PARTITION BY city clause to LAG(). PARTITION BY city allows you to process all the records for the same city in the same window.

You’ll use LAG() again to calculate the difference between the number of people needing a house between this year and the previous year. Let’s see the query:

SELECT 	
city,
	year,
      population_needing_house,
      LAG(population_needing_house)
      OVER (PARTITION BY city ORDER BY year ) AS previous_year
      population_needing_house - LAG(population_needing_house)
 	OVER (PARTITION BY city ORDER BY year ) AS difference_previous_year
FROM 	housing
ORDER BY city, year

In blue text, you can see how the difference is calculated. Here’s a plain-English version of what’s going on:

 difference_previous_year = population_needing_house in current record - population_needing_house in the previous year’s record

Where:

Population_needing_house in the current record is the column population_needing_house

and

Population_needing_house in the previous year’s record is obtained by the function LAG(population_needing_house) OVER (PARTITION BY city ORDER BY year )

CityYearPopulationPrevious YearDifference previous year
needing2019660410250
housePrevious YearDifference150290
previous2017150nullnull
year201824251600550
Bronzepolis2017150nullnull
Bronzepolis2018410150290
Bronzepolis2019660410250
Goldpolis2017250nullnull
Goldpolis201827525025
Goldpolis20192802755
Silverpolis2017150nullnull
Silverpolis201821015060
Silverpolis201929021080

So, you've used the PARTITION clause to create a set with all the records for the same city. Then the ORDER BY clause is used to order all these records by year. Finally, the LAG() function is used to obtain the population_needing_house value from the previous record.

Another SQL window function, LEAD(), is similar to LAG(). However, it returns the next record in the set (in our example, this would be the next year’s record).

If you review the two previous queries, you’ll see that we use the same pattern to calculate the difference: subtracting the previous column value from the current value using LAG() (or from the next record using LEAD()).

SQL window functions are very powerful. To learn more, check out this article on when to use SQL window functions. It provides examples from different business areas, like finance, sales, and trading.

Calculating the Difference Between Date Values in SQL

In the previous examples, you calculated the delta between two rows using numeric column values. Now I’ll demonstrate how to calculate the difference between two values of the date data type.

First, let’s consider the result’s data type. When you calculate the difference between two date values, the result is not a date. It is an interval that represents the number of days between the two dates.

Suppose there’s a table called hospital_statistics that stores statistics for the province’s three hospitals. Below is a sample of the table data:

Table hospital_statistics

Hospital namedayillnessnumber of patients
Hospital of Bronzepolis2017-03-22MDLR1
Hospital of Goldpolis2017-12-03MDLR1
Hospital of Silverpolis2018-08-03MDLR1
Hospital of Bronzepolis2019-01-23MDLR1
Hospital of Goldpolis2019-06-14MDLR1

Now let’s say there’s a rare illness called MDLR. The government wants to study how often a patient with MDLR is admitted to any of the hospitals. They’ve asked for a report with the columns day, hospital name, number of cases, and days_since_the_last_case. We’ll use SQL to build the report:

SELECT
	  day,
	  hospital_name,
	  number_of_patients,
	  day - LAG(day) OVER (ORDER BY day) 
                AS days_since_last_case
FROM    hospital_statistics
WHERE  illness_name = 'MDLR'
ORDER BY day

You can see the same pattern that we used before. Now, it’s being used to calculate the days since the last case. The only difference is that you’re calculating a difference between two dates instead of numeric values.

As you see in the results, the column days_since_last_case is an integer value, representing a number of days.

DayHospital NameIllnessPatientsdays_since_last_case
2017-03-22Hospital of BronzepolisMDLR1null
2017-12-03Hospital of GoldpolisMDLR1256
2018-08-03Hospital of SilverpolisMDLR1243
2019-01-23Hospital of BronzepolisMDLR1173
2019-06-14Hospital of GoldpolisMDLR1142

SQL also allows you to calculate differences between timestamp values. And there are other interesting date arithmetic operations you can use on date-related data types. If you want to go deeper, I suggest the article How to Analyze Time Series COVID 19 Data with SQL Window Functions.

Finding the Difference Between Non-Consecutive Records

Until now, you’ve calculated differences between contiguous records based on specific order criteria. In some cases, you’ll need to calculate the difference between non-contiguous records.

As an example, let’s go back to the first query. Suppose we want to add another column showing the number of cases in the last two years. Fortunately, LAG() and LEAD() has an optional parameter that specifies how many records to skip before/after the current record. By default, this parameter is 1 (i.e. “use the next/previous record”), but you can set it to another number. So, with this new parameter, the query will be:

SELECT 	
      city,
	year,
	population_needing_house,
	LAG(population_needing_house)
	    OVER (PARTITION BY city ORDER BY year ) AS one_year_before,
	LAG(population_needing_house,2)
	    OVER (PARTITION BY city ORDER BY year ) AS two_years_before,
	population_needing_house - LAG(population_needing_house)
	    OVER (PARTITION BY city ORDER BY year ) AS difference_previous_year,
	population_needing_house - LAG(population_needing_house,2)
	    OVER (PARTITION BY city ORDER BY year ) AS difference_last_two_years
FROM 	housing
ORDER BY 1 , 2 DESC

In blue, you can see the calculation of the difference between two non-consecutive records. In the next image, we can see the new column Difference last two years only for the 2019 values; this is because 2019 is the only year that has data for two prior years. The red lines connect the LAG() function’s results (obtaining data for 2017 and showing the same data in the 2017 record). You can see that both lines have the same value.

LAG

We’ve shown how important LAG() and LEAD() are, but there are many more window functions in SQL. I suggest the following article about window functions to get some background on them.

More About SQL Calculations and Window Functions

In this article, you’ve learned how to calculate the difference between consecutive rows by applying SQL. You’ve also learned how to use LAG() and LEAD() to find the difference between non-consecutive rows. Although there are other ways to calculate this difference, such as using a self-join, these methods are very useful.

If you will be using windows functions (or indeed SQL) frequently – or if you just want to improve your knowledge – this complete window functions cheat sheet is an excellent resource. Or, for an interactive learning experience, try our Window Functions course.

go to top