29th Sep 2020 11 minutes read

Six Window Function Templates to Save in Your Code Editor

Table of Contents

What Are Window Functions?
Template 1: Ranking the Rows
Template 2: Difference Between two Consecutive Periods
Template 3: Percent Difference Between two Consecutive Periods
Template 4: Calculating the Running Total
Template 5: Calculating the Moving Average
Template 6: Calculating the Last/First Value in Each Group
Interested in the Possibilities of the SQL Window Functions?

Do you use window functions often enough to know them well but not often enough to remember all the details? Here are six window function codes you should save as templates.

Window functions are a handy SQL option, but it doesn’t mean you use them all the time. Maybe you only use them occasionally; if that’s the case, you probably find yourself needing to refresh your memory every time you use them. It’s easy to forget the nuances of the functions, and you’re pulling your hair out and wondering why the code isn’t working. They say it’s always better to write something down, not remember it!

In the world of SQL, writing down equals saving your templates in the code editor. Why shouldn’t you make your life easier and have a cheat sheet for window functions?

Speaking of cheat sheets, there’s one already. You can print it and have it by your side; that way, you can quickly remember the syntax and how any specific function works. Everything in this cheat sheet, you can also learn and practice in our course on window functions.

Having a cheat sheet is good but having it in your code editor is even better.

What Are Window Functions?

You probably remember that window functions do calculations across a set of table rows. This set of rows is called a window or a window frame. All this and much more is explained in a very accessible way in this article.

A window function syntax looks like this:

window_function ([ALL] expression) OVER (
  [PARTITION BY partition_list] 
  [ORDER BY order_list] 
  [window_frame_clause]
)

You see that the window function syntax is rather complicated. It’s easy to forget the details if you’re not using it very often—especially the nuances that differentiate one function from another. Having templates in your code editor is a handy way to write the code much quicker.

I won’t go into further explanation of the syntax, but I won’t leave you in the lurch, either. Here’s an article explaining the window function syntax and all its elements.

Now, on to the templates!

Template 1: Ranking the Rows

By using the RANK() function, I can rank anything I want. You’ll probably use it pretty often; we live in a world of top lists. Just take a look at the title of this article. You just can’t escape the top five this, the best ten that, or the worst fifteen something. The business world is no different, so it’s best you have your ranking functions ready.

Ranking functions? Yes, plural. Aside from RANK(), there are three other ranking functions:

DENSE RANK
ROW_NUMBER
NTILE

More on these ranking functions can be found in an article dedicated solely to the ranking functions.

Here is the table rich_people, containing the ten wealthiest people in the world in no particular order. The table includes the following columns:

id: the ID of the person
first_name: the first name of the person
last_name: the last name of the person
net_worth: the net worth of the person in billions of dollars

How would you select only the five richest people and rank them from the richest to the least rich? Here’s how:

SELECT *
FROM (
	SELECT	RANK() OVER (ORDER BY net_worth DESC) AS row_rank,
			first_name,
			last_name,
			net_worth
	FROM rich_people
) AS net_worth_rank
WHERE row_rank <= 5;

In this code, I use a subquery to get the desired result. This subquery contains a RANK() function, ordering the rows in a descending order by the column net_worth. The rank of the rows will appear in the new column row_rank. It also selects the columns from the table rich_people. The subquery used in the FROM clause has to have an alias; here, it’s net_worth_rank. Since I want only the top five rich people, there is a condition WHERE row_rank <= 5.

Run the query to get the following result:

row_rank	first_name	last_name	net_worth
1	Naomi	Marken	644.58
2	Karel	Charapek	612.33
3	Jebediah	Vaultory	562.84
4	Jean-Jacques	Quichelorraine	499.51
5	Vicky Christina	Barcelona	425.37

Store this query in your code editor, and you’ll always have a ranking code ready to play with and adapt to your needs. I didn’t use PARTITION BY in this code, which you may need sometimes. If you’re not sure how to use it, this article talks about ranking and using PARTITION BY.

Template 2: Difference Between two Consecutive Periods

This is another common use of SQL window functions. It’s also common in any data-related job to calculate daily, monthly, quarterly, or any other differences between consecutive periods. I’m going to show you how to use window functions to do this easily.

There’s a table country_gdp with the following columns:

country: the name of the country
GDP: the GDP of the country in billions of dollars
year: the year of the GDP

Your task is to calculate the difference in GDP (increase/decrease) between the years. You could do that by using the window function LAG() like this:

SELECT	country,
		GDP,
		year,
		GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year) AS     
GDP_difference
FROM country_gdp;

The code selects all three columns from the table country_gdp. To calculate the difference, you need to take the GDP of one year and subtract from it the GDP of the previous year. The LAG() function helps you achieve that; it returns the value of the previous row. This means the part GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year) in the code is nothing more than the current GDP minus the previous GDP, since LAG() takes the GDP value from the previous row. We need to do this by country, hence PARTITION BY country. The calculation needs to be performed for every year starting from the earliest year to the most recent, i.e., in ascending order, so it’s ordered by the column year. The result will appear in the column GDP_difference.

And here’s the result:

country	GDP	year	GDP_difference
Quazistan	585.79	2017	NULL
Quazistan	584.59	2018	-1.2
Quazistan	562.55	2019	-22.04
Snowland	185.68	2017	NULL
Snowland	190.57	2018	4.89
Snowland	180.56	2019	-10.01
Sunnylandia	270.12	2017	NULL
Sunnylandia	276.56	2018	6.44
Sunnylandia	294.22	2019	17.66

We have a NULL value for every country in 2017, because this is the starting year. There is no previous row we can use to subtract from the 2017 value.

Template 3: Percent Difference Between two Consecutive Periods

This is a variation of the previous template. However, it’s nice to have them saved separately, because sometimes you just need the percentage. This way, there’s no need to tweak the previous query; you simply run the query below.

Using the same data from the example above, how would you calculate the percent difference in GDP? Let’s take Quazistan as an example.

country	GDP	year	GDP_difference
Quazistan	585.79	2017	NULL
Quazistan	584.59	2018	-1.2

In this case, the percent difference should be (584.59-585.79)/585.79*100. It’s written in SQL like this:

SELECT	country,
		GDP,
		year,
		((GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year))/LAG (GDP) OVER (PARTITION BY country ORDER BY year)*100) AS GDP_percent_difference
FROM country_gdp;

The code is almost identical to that of the previous example. Except this time, we don’t just subtract one value from another using the LAG() function. The difference is then divided by the GDP of the previous year, again using the LAG() function, after which the result is multiplied by 100 to get the percentage. This percentage shows up in the column GDP_percent_difference.

Run the code to get this result:

country	GDP	year	GDP_percent_difference
Quazistan	585.79	2017	NULL
Quazistan	584.59	2018	-0.204851568
Quazistan	562.55	2019	-3.770163704
Snowland	185.68	2017	NULL
Snowland	190.57	2018	2.633563119
Snowland	180.56	2019	-5.252663063
Sunnylandia	270.12	2017	NULL
Sunnylandia	276.56	2018	2.384125574
Sunnylandia	294.22	2019	6.385594446

Template 4: Calculating the Running Total

The running total is the same as the cumulative sum. While it’s not very advanced mathematics to say the least, you’ll often run into it when looking at data. Fortunately, this calculation becomes even easier by using window functions.

You have a table named subscribers, containing the monthly number of subscribers to your YouTube channel from three cities. These are the columns:

city: the name of the city
new_subscribers: the number of new subscribers
month: the month of subscription

How would you calculate the running total of your subscribers by city? Here’s how:

SELECT	city,
		new_subscribers,
		SUM (new_subscribers) OVER (PARTITION BY city ORDER BY month) AS 
            cumulative_subscribers,
		month
FROM subscribers;

Apart from selecting all the columns from the table subscribers, what does this code do? It calculates the running total and shows the result in the column cumulative_subscribers. To do that, I use the SUM() window function. This function summarizes the column new_subscribers, defined in the parentheses. Then, the regular part of the window function follows. We need the result by city, so this is specified by the PARTITION BY. The operation needs to be performed for each month starting from the oldest, which can be done by ordering the rows by the column month.

There you have it, a lovely little template giving you the following result:

city	new_subscribers	cumulative_subscribers	month
Amsterdam	412	412	04/2020
Amsterdam	241	653	05/2020
Amsterdam	312	965	06/2020
Warsaw	108	108	04/2020
Warsaw	91	199	05/2020
Warsaw	124	323	06/2020
Zagreb	28	28	04/2020
Zagreb	39	67	05/2020
Zagreb	50	117	06/2020

Template 5: Calculating the Moving Average

The moving average has several pseudonyms. If someone refers to the rolling mean, the rolling average, or the running average, don’t worry; it’s the same thing as the moving average. Different names, same calculation. The moving average is often used in analyzing time series data to determine trends. This article will help you understand what the moving average is for those of you who are not familiar with it.

Here’s an example of calculating the moving average in SQL. Feel free to store it in your code editor, adapt it to your needs, and use it whenever needed. The table for this example is eur_usd, which contains historical exchange rates from the EUR to the USD in the following columns:

date: the date of the exchange rate
exchange_rate: the value of the exchange rate

To calculate the three-day moving average of the exchange rate, you’ll need the following code:

SELECT	date,
		exchange_rate,
		AVG (exchange_rate) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_average
FROM eur_usd;

The code first selects the columns from the table eur_usd. There’s an AVG() function calculating the average of the exchange rate. This calculation needs to be performed sequentially, so I’ve ordered it by date. The steps to calculate the three-day moving average is to select the values from the current row and the previous two, sum them, then divide the sum by three, i.e., the number of rows taken into account. This is done by the part of the code: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW. You’ll see the result in the column moving_average; here it is:

date	exchange_rate	moving_average
2020-09-01	1.1921	1.1921
2020-09-02	1.1853	1.1887
2020-09-03	1.1847	1.187366
2020-09-04	1.1838	1.1846
2020-09-05	1.1838	1.1841
2020-09-06	1.1843	1.183966
2020-09-07	1.1816	1.183233

Template 6: Calculating the Last/First Value in Each Group

This is a convenient feature of window functions, allowing you to group the data and then find the row with the smallest or the largest value. It’s also possible to do that with a GROUP BY, but only if you’re using MySQL. What if you’re not? The functions LAST_VALUE and FIRST_VALUE are what you need.

How do they work? I’ll show you an example, using the product_sold table. It contains the following columns:

product: the name of the product
product_category: the category of the product
items_sold: the number of product items sold

SELECT	product,
		product_category,
		items_sold,
		LAST_VALUE (product) OVER (PARTITION BY product_category ORDER BY items_sold ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS bestseller
FROM product_sold;

What does this window function do? I want the best-selling product, that is, the largest number of items sold, so I define it by LAST_VALUE (product). Of course, this needs to be done by category, so I specify that with PARTITION BY product_category. What are the criteria for the product being the best-selling? It's by the number of items sold. That’s why we have ORDER BY items_sold; the value of the sold items in the last row represents the largest volume. After that, I define the frame by writing ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING; this means the frame starts at the first row and ends at the last row. This is the result of the query:

product	product_category	items_sold	bestseller
CD51	Cable	13	CD61
CD31	Cable	100	CD61
CD41	Cable	417	CD61
CD61	Cable	2541	CD61
MX001	Headphones	59	MX500
MX202	Headphones	658	MX500
MX420	Headphones	1586	MX500
MX500	Headphones	2587	MX500
BX1A	Microphone	354	BX1B
BX1D	Microphone	412	BX1B
BX1C	Microphone	587	BX1B
BX1B	Microphone	954	BX1B

The logic is similar if you want to calculate the product that sold the least in each category. In that case, you use FIRST_VALUE() function.

SELECT	product,
		product_category,
		items_sold,
		FIRST_VALUE (product) OVER (PARTITION BY product_category ORDER BY items_sold) AS worstseller
FROM product_sold;

This is almost identical to the previous code. Instead of LAST_VALUE(), there’s FIRST_VALUE(). The window is again defined by product_category, and the rows are ordered by items_sold. Here, you can omit the part ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. The result of this query looks like this:

product	product_category	items_sold	worstseller
CD51	Cable	13	CD51
CD31	Cable	100	CD51
CD41	Cable	417	CD51
CD61	Cable	2541	CD51
MX001	Headphones	59	MX001
MX202	Headphones	658	MX001
MX420	Headphones	1586	MX001
MX500	Headphones	2587	MX001
BX1A	Microphone	354	BX1A
BX1D	Microphone	412	BX1A
BX1C	Microphone	587	BX1A
BX1B	Microphone	954	BX1A

Interested in the Possibilities of the SQL Window Functions?

These six are probably the most commonly used window functions. This is only to show you how applicable they are in business and everyday life. There’s much more to window functions, which you can discover in the window functions course. If you want to learn about window functions and are interested in what this course can offer you, read the interview with the course creator herself.

Feel free to use the comments section! Let me know which window functions you have already saved in your code editor.

Tags: