11th Jun 2024 8 minutes read SQL LAG() Function Jill Thornhill window functions Table of Contents What Does the LAG() Function Do? LAG() Function Syntax Basic LAG() Function Example More Complex Examples Using LAG() in Calculations Using LAG() with an Offset Including a Default Using LAG() with PARTITION BY Using LAG() Results to Order a Report Where to Learn More About the SQL LAG() Function The LAG() function – one of SQL’s window functions – is an important tool for planning and trend analysis. In this article, I’ll demonstrate how to include SQL LAG() in your queries using a few real-world examples. SQL window functions, also known as analytical functions or OVER functions, add a new dimension to data analysis. They allow you to include aggregates or data from other rows alongside the current row. The LAG() function lets you look back ‘through the window’ at a previous row and include its data alongside the current row. A similar function, LEAD(), allows you to look forward to subsequent rows. Using the LEAD() and LAG() functions, you can easily include useful features such as year-on-year comparisons in your reports. If you’re serious about data analysis, you may like to have a look at LearnSQL.com’s Window Functions course. You’ll learn by solving 200+ guided interactive exercises using a real database you’ll access through your browser. The course takes about 20 hours to complete; when you’ve finished, you can use your new skills to supercharge your own data analysis projects. What Does the LAG() Function Do? This function allows you to include a column value from an earlier row alongside data from the current row. It’s particularly useful for analyzing trends over time, such as month-on-month and year-on-year comparisons. It can also be used to answer a variety of questions. How big is the gap between average results at School A and School B? How much does using a different raw material affect the lifespan of a component? LAG() Function Syntax In its simplest form, the syntax of the SQL LAG() function is … LAG(column_1) OVER (ORDER BY column_2) … where: column_1 is the name of the column you want to include from the earlier row. OVER indicates that you’re using a window function; this keyword is mandatory. ORDER BY is also mandatory when you’re using LAG(). The sequence of rows must be predictable, otherwise the function is meaningless. However, the order you choose needn’t be the same as in the final report. column_2 is the column you’re using to sequence the rows. You can specify more than one column here. SQL first sequences your data using values in column_2. On each row, it looks back to the previous row in this sequence and retrieves the value from column_1. This value is included alongside any other data you’ve requested from the current row. Basic LAG() Function Example Let’s look at a simple month-on-month comparison. A table named monthly_sales contains the following data: yearmonthsales_qtysales_value 20231210007380 202418005620 202429426945 2024312701745 2024415202048 2024514001890 If you wanted to see last month’s sales alongside this month’s sales, your query would look like this: SELECT year, month, sales_value, LAG(sales_value) OVER (ORDER BY year, month) AS last_month FROM monthly_sales; The result looks like this: yearmonthsales_valuelast_month 2023127380NULL 2024156207380 2024269455620 2024317456945 2024420481745 2024518902048 Notice that the sales value from the previous month has been brought down into the current month’s row: More Complex Examples Now that you’ve learned how to do a simple query using the SQL LAG() function, let’s look at some of the other ways you can use it in your analysis. Using LAG() in Calculations You’ll often want to use the results of LAG() in calculations. For example, you may want to see the difference and percentage change between months. Let’s amend the previous query to include these calculations. You can use the result of a function in calculations just as you would any other column. The query looks like this: SELECT year, month, sales_value, LAG(sales_value) OVER (ORDER BY year, month) AS last_month, sales_value - LAG(sales_value) OVER (ORDER BY year, month) AS change, ((sales_value - LAG(sales_value) OVER (ORDER BY year, month)) * 100) / (LAG(sales_value) OVER (ORDER BY year, month)) AS percentage FROM monthly_sales; The results are: yearmonthsales_valuelast_monthchangepercentage 2023127380NULLNULLNULL 2024156207380-1760-23.85 2024269455620132523.58 2024317456945-5200-74.87 202442048174530317.36 2024518902048-158-7.71 Using LAG() with an Offset In the examples we’ve looked at, the query picked up data from the previous row in the set. This is the default behaviour. However, you can look backwards by more than one row by specifying an offset. The syntax for specifying an offset is: LAG(column_1, offset) OVER (ORDER BY column_2) The offset is a whole number indicating how many rows the query should look backwards to find the data. If you don’t specify an offset, the database assumes an offset of 1. Let’s suppose you want to see this month’s sales, last month’s sales, and the previous month’s sales side by side. The query is: SELECT year, month, sales_value, LAG(sales_value) OVER (ORDER BY year, month) AS last_month, LAG(sales_value,2) OVER (ORDER BY year, month) AS prev_month FROM monthly_sales; The results are: yearmonthsales_valuelast_monthprev_month 2023127380NULLNULL 2024156207380NULL 20242694556207380 20243174569455620 20244204817456945 20245189020481745 Including a Default In the results we’ve looked at, the LAG() function returned NULL in the first row because there was no previous row. You may not always want it to do that. For example, suppose a company opened in December 2023. You may want to show this by having zero in last month’s sales for December so that it’s obvious that sales have increased from nothing to 7380 during that month. The zero is called a default – a value shown where no figures exist. To include a default in your query, the syntax is: LAG(column_1, offset, default) OVER (ORDER BY column_2) Notice that you always have to specify an offset if you include a default. The offset would be 1 if you want to look at the row immediately before the current row. The query for the above example is: SELECT year, month, sales_value, LAG(sales_value, 1, 0) OVER (ORDER BY year, month) AS last_month FROM monthly_sales; The results are: yearmonthsales_valuelast_month 20231273800 2024156207380 2024269455620 2024317456945 2024420481745 2024518902048 Using LAG() with PARTITION BY Let’s suppose the original data was expanded to include more than one department: departmentyearmonthsales_qtysales_value Electrical20231210007380 Electrical202418005620 Electrical202429426945 Electrical2024312701745 Electrical2024415202048 Electrical2024514001890 Hardware20231264009000 Hardware2024140006520 Hardware20242700010300 Hardware20243800012000 Hardware20244805014000 Hardware2024560009000 You would probably want each department to be kept separate in your comparison. This is known as partitioning. The syntax for splitting your results into partitions is … LAG(column_1) OVER (PARTITION BY column_2 ORDER BY column_3) … where column_2 is the column you want to use for partitioning. Let’s try a query that will split the report by department but still order rows by year and month within each department: SELECT department, year, month, sales_value, LAG(sales_value) OVER (PARTITION BY department ORDER BY year, month) AS last_month FROM monthly_sales; The results look like this: departmentyearmonthsales_valuelast_month Electrical2023127380NULL Electrical2024156207380 Electrical2024269455620 Electrical2024317456945 Electrical2024420481745 Electrical2024518902048 Hardware2023129000NULL Hardware2024165209000 Hardware20242103006520 Hardware202431200010300 Hardware202441400012000 Hardware20245900014000 Notice that last month’s sales reverts to NULL on the first row of the new department. SQL uses department as the primary sort key because it was specified in the PARTITION BY clause. Within department, it sequences the data by year and then by month, as per the ORDER BY clause. On each row, it looks back to see if there is a previous row belonging to the same department. If so, it retrieves the contents of the sales_value column from the previous row. This is included in the current row as last_month. If there was no previous row, last_month is set to a NULL value. Using LAG() Results to Order a Report The final order of the report doesn’t have to be the same as the sequence you use within the OVER clause. You can use the normal ORDER BY clause at the end of your query to specify a different sequence. Suppose you wanted to view the results sequenced by the increase in sales value between months. Your query could look like this: SELECT department, year, month, sales_value, sales_value - LAG(sales_value) OVER (PARTITION BY department ORDER BY year, month) AS change FROM monthly_sales ORDER BY department, sales_value - LAG(sales_value) OVER (PARTITION BY department ORDER BY year, month); The results look like this: departmentyearmonthsales_valuechange Electrical2023127380NULL Electrical202431745-5200 Electrical202415620-1760 Electrical202451890-158 Electrical202442048303 Electrical2024269451325 Hardware2023129000NULL Hardware202459000-5000 Hardware202416520-2480 Hardware20243120001700 Hardware20244140002000 Hardware20242103003780 This can be useful if you want to see when the company was doing badly and when it was doing well. Such information can help you trace the source of any problems. Where to Learn More About the SQL LAG() Function If you’d like to learn more about the LAG() function – and window functions in general – here are some articles that offer additional information: The LAG() and LEAD() Functions in SQL SQL LEAD() Function How to Compute Year-Over-Year Differences in SQL If you need to brush up on window functions, these resources may help: This Window Functions Cheat Sheet is great for keeping with you while you’re writing queries. When you’re new to the subject, looking at these Window Function Examples is really helpful. If you’re job-hunting, here are some Window Functions Interview Questions. Study them and they’ll help you ace the SQL interview. As I mentioned earlier, if you really want to become an expert, I’d recommend LearnSQL.com’s Window Functions course. You’ll learn exactly how to use all the power of SQL’s window functions. And you’ll gain confidence by solving lots of hands-on exercises. There’s nothing like practice for increasing your knowledge and skills. If you’d like some guided examples to work through on your own, here are some resources: The Window Functions Practice Exercises article has 11 exercises with solutions. This Window Functions Practice Set is a learning track containing over 100 practical exercises. You’ll work on three different databases to learn how to solve different types of problems. SQL’s LAG() function is an amazing tool for analyzing your data to quickly identify trends and keep your organization competitive. As a data analyst, it’s well worth spending the time to become a window function expert! Tags: window functions