*Calculating the difference between two rows in SQL can be a challenging task. It is possible – and there’s more than one way to do it. In this article, I’ll explain how to use the SQL window functions LEAD() and LAG() to find the difference between two rows in the same table.*

I like to think of this as a “happy hour" article – you can learn about two topics (calculating the difference between two rows and SQL window functions) by reading just one article. Let’s get right into it!

## Find the Difference Between Two Values in the Same Row

To calculate any difference, you need two elements; to calculate a difference in SQL, you need two records. You can calculate the difference between two columns in the same record, as I’ll show in a moment. It’s very easy. However, I’ll mainly focus on finding the difference between two values of the same column in different records.

First, let’s talk about our data. We’ll use a database with two tables, as used by the government of an imaginary province to define some social programs. The first table is

, which contains data about the number of people renting, owning, or needing a house. Have a look: **housing**

Table **housing**

City | Year | Total Population | Population Owning house | Population renting house | Population needing house |
---|---|---|---|---|---|

Goldpolis | 2017 | 2250 | 1500 | 500 | 250 |

Silverpolis | 2017 | 1750 | 1200 | 400 | 150 |

Bronzepolis | 2017 | 1420 | 1000 | 300 | 120 |

Goldpolis | 2018 | 2425 | 1600 | 550 | 275 |

Silverpolis | 2018 | 1920 | 1300 | 410 | 210 |

Bronzepolis | 2018 | 1730 | 1020 | 300 | 410 |

Goldpolis | 2019 | 2610 | 1750 | 580 | 280 |

Silverpolis | 2019 | 2110 | 1400 | 420 | 290 |

Bronzepolis | 2019 | 2010 | 1050 | 300 | 660 |

To design a plan to build houses for people who need them, the government wants to obtain some metrics about the housing problems in different cities. Suppose the government wants to know how many people in each city don’t own a house; the query will be:

SELECT city, total_population, total_population - population_owning_house AS people_not_owning_house FROM housing

In this query, it’s clear you’re calculating a difference using two different columns in the same record. Next, you’ll calculate differences using two records.

## Calculating the Difference Between Two Values in the Same Column

Usually, all the operations we do in a SQL query are related to the current record. But in this case, we need a second record. This is the tricky part. We’ll use a technique that employs the windows functions `LAG()`

and `LEAD()`

to obtain data from another record. If you want to go deeper, I suggest reading this window functions article that gives clear explanations of window functions with plenty of examples.

Let’s suppose that, for one specific city ( ‘Bronzepolis’ ), you want to obtain the variation of people needing a house related to the previous year. Let’s see the query:

```
SELECT
city,
year,
population_needing_house,
LAG(population_needing_house) AS previous_year
population_needing_house - LAG(population_needing_house)
OVER (ORDER BY year ) AS difference_previous_year
FROM housing
WHERE city = ‘Bronzepolis’
ORDER BY year
```

In the blue text, you can see the calculation of the SQL delta between two rows. To calculate a difference, you need a pair of records; those two records are “the current record” and “the previous year’s record”. You obtain this record using the

window function. This function allows you to obtain data from the previous record (based on an order criterion, which here is “**LAG()**

”).**ORDER BY year**

Below is the result of this query. The red arrows show that the value of the `LAG()`

function returns the same population_needing_house value as the previous year’s record. The column at the right shows the result of the difference between the current and previous years.

For the next query, we’ll expand the analysis of housing issues to all cities. For each city, we want the difference between the current and previous years for the `population_needing_house`

column.

First, you’ll remove the condition `city = ‘Bronzepolis’`

. You want to calculate values for each city, so you need a way to separate data into groups. This is why you add the `PARTITION BY`

city clause to `LAG()`

. `PARTITION BY city`

allows you to process all the records for the same city in the same window.

You’ll use `LAG()`

again to calculate the difference between the number of people needing a house between this year and the previous year. Let’s see the query:

```
SELECT
city,
year,
population_needing_house,
LAG(population_needing_house)
OVER (PARTITION BY city ORDER BY year ) AS previous_year
population_needing_house - LAG(population_needing_house)
OVER (PARTITION BY city ORDER BY year ) AS difference_previous_year
FROM housing
ORDER BY city, year
```

In blue text, you can see how the difference is calculated. Here’s a plain-English version of what’s going on:

difference_previous_year = population_needing_house in current record - population_needing_house in the previous year’s record

Where:

**Population_needing_house in the current** record is the column **population_needing_house**

and

**Population_needing_house in the previous year’s record** is obtained by the function **LAG(population_needing_house) OVER (PARTITION BY city ORDER BY year )**

City | Year | Population | Previous Year | Difference previous year |
---|---|---|---|---|

needing | 2019 | 660 | 410 | 250 |

house | Previous Year | Difference | 150 | 290 |

previous | 2017 | 150 | null | null |

year | 2018 | 2425 | 1600 | 550 |

Bronzepolis | 2017 | 150 | null | null |

Bronzepolis | 2018 | 410 | 150 | 290 |

Bronzepolis | 2019 | 660 | 410 | 250 |

Goldpolis | 2017 | 250 | null | null |

Goldpolis | 2018 | 275 | 250 | 25 |

Goldpolis | 2019 | 280 | 275 | 5 |

Silverpolis | 2017 | 150 | null | null |

Silverpolis | 2018 | 210 | 150 | 60 |

Silverpolis | 2019 | 290 | 210 | 80 |

So, you've used the `PARTITION`

clause to create a set with all the records for the same city. Then the `ORDER BY`

clause is used to order all these records by year. Finally, the `LAG()`

function is used to obtain the `population_needing_house`

value from the previous record.

Another SQL window function, `LEAD()`

, is similar to `LAG()`

. However, it returns the **next** record in the set (in our example, this would be the next year’s record).

If you review the two previous queries, you’ll see that we use the same pattern to calculate the difference: subtracting the previous column value from the current value using `LAG()`

(or from the next record using `LEAD()`

).

SQL window functions are very powerful. To learn more, check out this article on when to use SQL window functions. It provides examples from different business areas, like finance, sales, and trading.

## Calculating the Difference Between Date Values in SQL

In the previous examples, you calculated the delta between two rows using numeric column values. Now I’ll demonstrate how to calculate the difference between two values of the date data type.

First, let’s consider the result’s data type. When you calculate the difference between two date values, the result is not a date. It is an interval that represents the number of days between the two dates.

Suppose there’s a table called

that stores statistics for the province’s three hospitals. Below is a sample of the table data:**hospital_statistics**

Table **hospital_statistics**

Hospital name | day | illness | number of patients |
---|---|---|---|

Hospital of Bronzepolis | 2017-03-22 | MDLR | 1 |

Hospital of Goldpolis | 2017-12-03 | MDLR | 1 |

Hospital of Silverpolis | 2018-08-03 | MDLR | 1 |

Hospital of Bronzepolis | 2019-01-23 | MDLR | 1 |

Hospital of Goldpolis | 2019-06-14 | MDLR | 1 |

Now let’s say there’s a rare illness called MDLR. The government wants to study how often a patient with MDLR is admitted to any of the hospitals. They’ve asked for a report with the columns `day`

, `hospital name`

, `number of cases`

, and `days_since_the_last_case`

. We’ll use SQL to build the report:

```
SELECT
day,
hospital_name,
number_of_patients,
day - LAG(day) OVER (ORDER BY day)
AS days_since_last_case
FROM hospital_statistics
WHERE illness_name = 'MDLR'
ORDER BY day
```

You can see the same pattern that we used before. Now, it’s being used to calculate the days since the last case. The only difference is that you’re calculating a difference between two dates instead of numeric values.

As you see in the results, the column `days_since_last_case`

is an integer value, representing a number of days.

Day | Hospital Name | Illness | Patients | days_since_last_case |
---|---|---|---|---|

2017-03-22 | Hospital of Bronzepolis | MDLR | 1 | null |

2017-12-03 | Hospital of Goldpolis | MDLR | 1 | 256 |

2018-08-03 | Hospital of Silverpolis | MDLR | 1 | 243 |

2019-01-23 | Hospital of Bronzepolis | MDLR | 1 | 173 |

2019-06-14 | Hospital of Goldpolis | MDLR | 1 | 142 |

SQL also allows you to calculate differences between `timestamp`

values. And there are other interesting date arithmetic operations you can use on date-related data types. If you want to go deeper, I suggest the article How to Analyze Time Series COVID 19 Data with SQL Window Functions.

## Finding the Difference Between Non-Consecutive Records

Until now, you’ve calculated differences between contiguous records based on specific order criteria. In some cases, you’ll need to calculate the difference between non-contiguous records.

As an example, let’s go back to the first query. Suppose we want to add another column showing the number of cases in the last **two** years. Fortunately, `LAG()`

and `LEAD()`

has an optional parameter that specifies how many records to skip before/after the current record. By default, this parameter is 1 (i.e. “use the next/previous record”), but you can set it to another number. So, with this new parameter, the query will be:

```
SELECT
city,
year,
population_needing_house,
LAG(population_needing_house)
OVER (PARTITION BY city ORDER BY year ) AS one_year_before,
LAG(population_needing_house,2)
OVER (PARTITION BY city ORDER BY year ) AS two_years_before,
population_needing_house - LAG(population_needing_house)
OVER (PARTITION BY city ORDER BY year ) AS difference_previous_year,
population_needing_house - LAG(population_needing_house,2)
OVER (PARTITION BY city ORDER BY year ) AS difference_last_two_years
FROM housing
ORDER BY 1 , 2 DESC
```

In blue, you can see the calculation of the difference between two non-consecutive records. In the next image, we can see the new column `Difference last two years`

only for the 2019 values; this is because 2019 is the only year that has data for two prior years. The red lines connect the `LAG()`

function’s results (obtaining data for 2017 and showing the same data in the 2017 record). You can see that both lines have the same value.

We’ve shown how important `LAG()`

and `LEAD()`

are, but there are many more window functions in SQL. I suggest the following article about window functions to get some background on them.

## More About SQL Calculations and Window Functions

In this article, you’ve learned how to calculate the difference between consecutive rows by applying SQL. You’ve also learned how to use `LAG()`

and `LEAD()`

to find the difference between non-consecutive rows. Although there are other ways to calculate this difference, such as using a self-join, these methods are very useful.

If you will be using windows functions (or indeed SQL) frequently – or if you just want to improve your knowledge – this complete window functions cheat sheet is an excellent resource. Or, for an interactive learning experience, try our Window Functions course.