31st Aug 2017

Common SQL Window Functions: Positional Functions

Table of Contents

Introducing SQL's Positional Window Functions
FIRST_VALUE
LAST_VALUE
LAG
Step 1
Step 2
LEAD
Conclusion

Positional SQL window functions deal with a row's position within an ordered set of rows. In this post, we explain LEAD, LAG, and other positional functions.

SQL window functions allow us to aggregate data while still using individual row values. We've already dealt with ranking functions and the use of partitions. In this post, we'll examine positional window functions, which are extremely helpful in reporting and summarizing data. Specifically, we'll look at LAG, LEAD, FIRST_VALUE and LAST_VALUE.

It is worth mentioning that LEAD mirrors LAG, and FIRST_VALUE mirrors LAST_VALUE. In some ways, these pairs can be used interchangeably: reversing the sort order of the window lets one function in a pair stand in for the other.

We're getting into a more complex area of SQL window functions now. As we go through this post, you'll notice that some problems are solved more easily with window functions than with complex combinations of nested subqueries. Don't worry if you feel a little overwhelmed at first. The topic isn't easy and it takes time to master. Just keep learning and practicing!

Introducing SQL's Positional Window Functions

Once again, we'll be working with the sales table. This table tracks basic sales data using the following columns:

id – The primary key of the table.
salesman_id – The ID of the person who made the sale.
sales_item – The ID of the item sold.
sales_num – The number of items sold.
sales_price – The price per individual item.
datetime – The date of the sale.

Now, let's start with the FIRST_VALUE function.

FIRST_VALUE

First, we are going to look at the analytic function FIRST_VALUE. As the name suggests, this function returns the first value in an ordered set of values.

Suppose we want to extract the first six months of data from our sales table and, next to every row, show the salesman who made the largest sale (by sales_num) in that month. To accomplish this, we use the FIRST_VALUE function with the following arguments:

first_value(salesman_id) over (partition by extract(MONTH FROM datetime) order by sales_num desc)

We pass salesman_id to FIRST_VALUE over a window covering one month, with the rows sorted by sales_num in descending order. We expect every row to show the ID of the salesman with the largest sale volume in that month.

SELECT 
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  extract(MONTH FROM datetime) month,
  first_value(salesman_id) over (partition by extract(MONTH FROM datetime) order by sales_num desc) as TOP_SALESMAN
FROM sales
WHERE extract(MONTH FROM datetime) <= 6
ORDER BY extract(MONTH FROM datetime);

Does the result live up to our expectations?

SALESMAN_ID	SALES_ITEM	SALES_NUM	SALES_PRICE	MONTH	TOP_SALESMAN
10	8	22	160.5	1	10
70	80	2	60.5	1	10
10	6	550	16.1	2	10
30	25	50	26.1	2	10
30	25	330	12.1	3	30
10	10	50	55.1	3	30
40	200	40	10.5	3	30
70	200	22	10.5	3	30
10	200	20	2.1	3	30
10	200	20	1.1	3	30
50	200	15	110	3	30
60	200	8	30.5	3	30
30	200	7	50	3	30
10	100	4	50.5	3	30
10	40	40	5.1	4	10
30	25	30	2.1	5	30
50	600	215	10	6	50

Try It Yourself: You can reproduce the result of FIRST_VALUE with other functions. See if you can find a way that doesn't use the RANK function.

LAST_VALUE

LAST_VALUE is the opposite of FIRST_VALUE: it returns the last value in an ordered set of values.

Let's take the previous example and simply replace FIRST_VALUE with LAST_VALUE:

SELECT 
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  extract(MONTH FROM datetime) month,
  last_value(salesman_id) over (partition by extract(MONTH FROM datetime) order by sales_num desc) as TOP_SALESMAN
FROM sales
WHERE extract(MONTH FROM datetime) <= 6
ORDER BY extract(MONTH FROM datetime);

We get this result:

SALESMAN_ID	SALES_ITEM	SALES_NUM	SALES_PRICE	MONTH	TOP_SALESMAN
10	8	22	160.5	1	10
70	80	2	60.5	1	70
10	6	550	16.1	2	10
30	25	50	26.1	2	30
30	25	330	12.1	3	30
10	10	50	55.1	3	10
40	200	40	10.5	3	40
70	200	22	10.5	3	70
10	200	20	2.1	3	10
10	200	20	1.1	3	10
50	200	15	110	3	50
60	200	8	30.5	3	60
30	200	7	50	3	30
10	100	4	50.5	3	10
10	40	40	5.1	4	10
30	25	30	2.1	5	30
50	600	215	10	6	50

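Notice that TOP_SALESMAN now simply repeats each row's own salesman_id. It does not show the salesman with the smallest associated sales_num. The cause is the window frame. When the OVER clause contains an ORDER BY, the default frame runs from the first row of the partition up to the current row (and any rows tied with it), so the last value in that frame is the current row's own value. To get the salesman with the smallest sale in each month, extend the frame to the whole partition by adding ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to the OVER clause.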
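The minimal sketch below compares the two frames side by side. It uses Python's built-in sqlite3 module (SQLite 3.25 or later) and loads the same 17 rows shown in the result table above. These simplifications are our assumptions, not the original schema: dates are stored as ISO strings, the id column is left out, and strftime('%m', ...) stands in for EXTRACT(MONTH FROM ...).

```python
import sqlite3

# The 17 rows from the first six months shown in this post:
# (salesman_id, sales_item, sales_num, sales_price, datetime)
ROWS = [
    (10, 40, 40, 5.1, "2017-04-14"), (10, 200, 20, 2.1, "2017-03-20"),
    (10, 10, 50, 55.1, "2017-03-14"), (10, 200, 20, 1.1, "2017-03-07"),
    (10, 100, 4, 50.5, "2017-03-01"), (10, 6, 550, 16.1, "2017-02-07"),
    (10, 8, 22, 160.5, "2017-01-15"), (30, 25, 30, 2.1, "2017-05-25"),
    (30, 25, 330, 12.1, "2017-03-25"), (30, 200, 7, 50, "2017-03-01"),
    (30, 25, 50, 26.1, "2017-02-03"), (40, 200, 40, 10.5, "2017-03-02"),
    (50, 600, 215, 10, "2017-06-03"), (50, 200, 15, 110, "2017-03-03"),
    (60, 200, 8, 30.5, "2017-03-04"), (70, 200, 22, 10.5, "2017-03-05"),
    (70, 80, 2, 60.5, "2017-01-05"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (salesman_id INTEGER, sales_item INTEGER, "
    "sales_num INTEGER, sales_price REAL, datetime TEXT)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", ROWS)

QUERY = """
SELECT
  salesman_id,
  CAST(strftime('%m', datetime) AS INTEGER) AS month,
  -- default frame: from the partition start up to the current row and its peers
  last_value(salesman_id) OVER (
    PARTITION BY strftime('%m', datetime)
    ORDER BY sales_num DESC
  ) AS default_frame,
  -- explicit frame: the whole month, so the real last row is visible
  last_value(salesman_id) OVER (
    PARTITION BY strftime('%m', datetime)
    ORDER BY sales_num DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS bottom_salesman
FROM sales
WHERE CAST(strftime('%m', datetime) AS INTEGER) <= 6
ORDER BY month
"""
results = conn.execute(QUERY).fetchall()

# With the default frame, LAST_VALUE echoes each row's own salesman_id.
print(all(sid == default for sid, _, default, _ in results))  # True

# With the full frame, every row of a month shows the salesman with the smallest sale.
bottom_by_month = {month: bottom for _, month, _, bottom in results}
print(bottom_by_month)  # {1: 70, 2: 30, 3: 10, 4: 10, 5: 30, 6: 50}
```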

Try It Yourself: LAST_VALUE and FIRST_VALUE are interchangeable, so you can get the result you'd expect from LAST_VALUE by using FIRST_VALUE instead. I'll leave it up to you to figure out how.

There, that wasn't too bad. Let's move on to something more complex: the LEAD and LAG functions. Like FIRST_VALUE and LAST_VALUE, these two are opposites, yet interchangeable. They are usually used as helper functions in calculations that show the difference between the current value and a leading or lagging value.

We'll start with the LAG function.

LAG

LAG lets you access data in another row without using a self-join. You simply specify the column (the value) and how many rows back that row is. Once again, we are using this function over a window of data.

Important: LAG returns the value from a row located before the current row in the specified sorting order.

The syntax of the function looks like this:

LAG(value, offset, default_value)

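value is the column (or expression) whose value we want to access
offset is how many rows before the current row to look; 1 (the default) means the immediately preceding row
default_value is what the function returns when there is no row at that offset, for example on the first row of a partition; if it is omitted, NULL is returned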
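The minimal sketch below shows the offset and default_value arguments in action. It uses Python's sqlite3 module (SQLite 3.25 or later) and loads only the rows for salesmen 30 and 70 from the result table of the example that follows. The trimmed-down table is our simplification: it keeps just (salesman_id, sales_num, datetime) and stores dates as ISO strings.

```python
import sqlite3

# Rows for salesmen 30 and 70 from the LAG example: (salesman_id, sales_num, datetime)
ROWS = [
    (30, 30, "2017-05-25"), (30, 330, "2017-03-25"),
    (30, 7, "2017-03-01"), (30, 50, "2017-02-03"),
    (70, 22, "2017-03-05"), (70, 2, "2017-01-05"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesman_id INTEGER, sales_num INTEGER, datetime TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT
  salesman_id,
  datetime,
  sales_num,
  -- offset 1, default 0: the previous row's value, or 0 when there is none
  lag(sales_num, 1, 0) OVER (PARTITION BY salesman_id ORDER BY datetime DESC) AS lag_1,
  -- offset 2, no default: NULL (None in Python) when fewer than two rows precede
  lag(sales_num, 2) OVER (PARTITION BY salesman_id ORDER BY datetime DESC) AS lag_2
FROM sales
ORDER BY salesman_id, datetime DESC
"""
results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```

The lag_1 values for these two salesmen match the LAG_SALES column in the table below. lag_2 shows that, without a default, the missing rows come back as NULL.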


Let's try a simple LAG function in our sales dataset:

SELECT 
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  datetime as DATE,
  lag(sales_num,1,0) over (partition by salesman_id order by datetime desc) as LAG_SALES
FROM sales
WHERE extract(MONTH FROM datetime) <= 6
ORDER BY salesman_id,datetime desc;

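Here the LAG_SALES column holds, for each row, the sales_num of the row before it in datetime DESC order, i.e. the salesman's next, more recent sale. Because the window is partitioned by salesman_id, the calculation restarts for every salesman, and each salesman's first row gets the default value 0.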

SALESMAN_ID	SALES_ITEM	SALES_NUM	SALES_PRICE	DATE	LAG_SALES
10	40	40	5.1	14-APR-17	0
10	200	20	2.1	20-MAR-17	40
10	10	50	55.1	14-MAR-17	20
10	200	20	1.1	07-MAR-17	50
10	100	4	50.5	01-MAR-17	20
10	6	550	16.1	07-FEB-17	4
10	8	22	160.5	15-JAN-17	550
30	25	30	2.1	25-MAY-17	0
30	25	330	12.1	25-MAR-17	30
30	200	7	50	01-MAR-17	330
30	25	50	26.1	03-FEB-17	7
40	200	40	10.5	02-MAR-17	0
50	600	215	10	03-JUN-17	0
50	200	15	110	03-MAR-17	215
60	200	8	30.5	04-MAR-17	0
70	200	22	10.5	05-MAR-17	0
70	80	2	60.5	05-JAN-17	22

We can try a more complex example. We know who our top salesmen are in terms of volume; now let's rank all sales by volume and explore the difference between each sale and the one ranked just above it. We'll use a six-month timeframe again.

We'll compute this in two steps.

Step 1

In this step, we calculate the lagging sales numbers: for each row, the sales_num value one row back in descending sales_num order. If there is no preceding row, we use the default value of 0.

SELECT 
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  extract(MONTH FROM datetime) month,
  lag(sales_num,1,0) over (order by sales_num desc) as LAG_SALES
FROM sales
WHERE extract(MONTH FROM datetime) <= 6
ORDER BY sales_num desc;

Here is the result:

SALESMAN_ID	SALES_ITEM	SALES_NUM	SALES_PRICE	MONTH	LAG_SALES
10	6	550	16.1	2	0
30	25	330	12.1	3	550
50	600	215	10	6	330
30	25	50	26.1	2	215
10	10	50	55.1	3	50
40	200	40	10.5	3	50
10	40	40	5.1	4	40
30	25	30	2.1	5	40
10	8	22	160.5	1	30
70	200	22	10.5	3	22
10	200	20	2.1	3	22
10	200	20	1.1	3	20
50	200	15	110	3	20
60	200	8	30.5	3	15
30	200	7	50	3	8
10	100	4	50.5	3	7
70	80	2	60.5	1	4

Step 2

Now we will compute the absolute difference between the volume of the current row and the volume of the previous row:

SELECT 
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  extract(MONTH FROM datetime) month,
  ABS(lag(sales_num,1,0) over (order by sales_num desc)-sales_num) as DIFFERENCE_SALES
FROM sales
WHERE extract(MONTH FROM datetime) <= 6
ORDER BY sales_num desc;

The result is:

SALESMAN_ID	SALES_ITEM	SALES_NUM	SALES_PRICE	MONTH	DIFFERENCE_SALES
10	6	550	16.1	2	550
30	25	330	12.1	3	220
50	600	215	10	6	115
30	25	50	26.1	2	165
10	10	50	55.1	3	0
40	200	40	10.5	3	10
10	40	40	5.1	4	0
30	25	30	2.1	5	10
10	8	22	160.5	1	8
70	200	22	10.5	3	0
10	200	20	2.1	3	2
10	200	20	1.1	3	0
50	200	15	110	3	5
60	200	8	30.5	3	7
30	200	7	50	3	1
10	100	4	50.5	3	3
70	80	2	60.5	1	2

Now we can continue our analysis on the new column and look for interesting trends. The top three sales (550, 330 and 215 items) are far larger than all the others, none of which exceeds 50 items. A zero means that a sale has the same volume as the one ranked just above it. Look at the resulting data set and try to come up with some insights of your own.

LEAD

LEAD is an analytic function that returns the value of a column from a row located a given number of rows after the current row within the window's partition. In other words, it looks forward (downstream) in the specified sort order.

Where LAG looks backward, LEAD returns the value from a row after the current row (for example, the next row, or the row two rows after the current one) in the specified sorting order.

So the only difference is the direction in which the offset is applied.

The syntax is the same as for LAG, apart from the keyword:

LEAD(value, offset, default_value)

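value is the column (or expression) whose value we want to access
offset is how many rows after the current row to look; 1 (the default) means the next row
default_value is what the function returns when there is no row at that offset, for example on the last row of a partition; if it is omitted, NULL is returned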
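Because LEAD and LAG differ only in direction, their relationship can be modeled in a few lines of plain Python. The lag and lead helpers below are hypothetical illustrations, not a database API. Each one works on a single partition whose values are already sorted, and takes the same value/offset/default_value arguments described above.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def lag(values: List[T], offset: int = 1, default: Optional[T] = None) -> List[Optional[T]]:
    """For each position, the value `offset` rows earlier, or `default` if there is none."""
    return [values[i - offset] if i - offset >= 0 else default for i in range(len(values))]


def lead(values: List[T], offset: int = 1, default: Optional[T] = None) -> List[Optional[T]]:
    """For each position, the value `offset` rows later, or `default` if there is none."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


# sales_num for salesman 30, already sorted by datetime DESC (from the LAG example)
sales = [30, 330, 7, 50]
print(lag(sales, 1, 0))   # [0, 30, 330, 7]
print(lead(sales, 1, 0))  # [330, 7, 50, 0]

# Mirroring: LEAD over the reversed order, reversed back, equals LAG over the original order.
assert lead(sales[::-1], 1, 0)[::-1] == lag(sales, 1, 0)
```

Reversing the list plays the role of reversing the ORDER BY direction. This is why a LEAD over one order can stand in for a LAG over the opposite order.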

To demonstrate this function, let's get the same result we got in Step 2 of the LAG example. Besides replacing the LAG keyword with LEAD, we will change one small thing.

This is the query:

SELECT 
  salesman_id,
  sales_item,
  sales_num,
  sales_price,
  extract(MONTH FROM datetime) month,
  ABS(lead(sales_num,1,0) over (order by sales_num asc)-sales_num) as DIFFERENCE_SALES
FROM sales
WHERE extract(MONTH FROM datetime) <= 6
ORDER BY sales_num desc;

Can you spot the difference?

(Hint: This was the challenge in the FIRST_VALUE / LAST_VALUE demonstration.)
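One way to confirm the answer is to run both versions and compare them. This minimal sketch uses Python's sqlite3 module (SQLite 3.25 or later). As a simplification, the sales table holds only an id and the sales_num values from this post's six-month results. Rows with equal sales_num can come back in either order, so the (sales_num, difference) pairs are sorted before comparing.

```python
import sqlite3

# sales_num of the 17 rows from the first six months in this post
SALES_NUMS = [40, 20, 50, 20, 4, 550, 22, 30, 330, 7, 50, 40, 215, 15, 8, 22, 2]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sales_num INTEGER)")
conn.executemany("INSERT INTO sales (sales_num) VALUES (?)", [(n,) for n in SALES_NUMS])


def differences(window_expr: str) -> list:
    """Sorted (sales_num, DIFFERENCE_SALES) pairs for the given window expression."""
    query = f"SELECT sales_num, ABS({window_expr} - sales_num) FROM sales"
    return sorted(conn.execute(query).fetchall())


# Step 2 from the LAG section: the previous row in descending order
lag_diffs = differences("lag(sales_num, 1, 0) OVER (ORDER BY sales_num DESC)")
# LEAD version: the next row in ascending order
lead_diffs = differences("lead(sales_num, 1, 0) OVER (ORDER BY sales_num ASC)")

print(lag_diffs == lead_diffs)  # True
```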

Conclusion

On your learning path, I encourage you to practice, practice, practice 😉. There is no better place for that than LearnSQL.com. Discover how you can practice advanced SQL with our platform.

These functions are best mastered through the excellent hands-on courses of LearnSQL. Practice coding as much as you can and use this article as a companion; that's the best way to learn advanced SQL.
