Back to articles list Articles Cookbook
11 minutes read

Six Window Function Templates to Save in Your Code Editor

Do you use window functions often enough to know them well but not often enough to remember all the details? Here are six window function codes you should save as templates.

Window functions are a handy SQL option, but it doesn’t mean you use them all the time. Maybe you only use them occasionally; if that’s the case, you probably find yourself needing to refresh your memory every time you use them. It’s easy to forget the nuances of the functions, and you’re pulling your hair out and wondering why the code isn’t working. They say it’s always better to write something down, not remember it!

In the world of SQL, writing down equals saving your templates in the code editor. Why shouldn’t you make your life easier and have a cheat sheet for window functions?

Speaking of cheat sheets, there’s one already. You can print it and have it by your side; that way, you can quickly remember the syntax and how any specific function works. Everything in this cheat sheet, you can also learn and practice in our course on window functions.

Having a cheat sheet is good but having it in your code editor is even better.

What Are Window Functions?

You probably remember that window functions do calculations across a set of table rows. This set of rows is called a window or a window frame. All this and much more is explained in a very accessible way in this article.

A window function syntax looks like this:

window_function ([ALL] expression) OVER (
  [PARTITION BY partition_list] 
  [ORDER BY order_list] 
  [window_frame_clause]
)

You see that the window function syntax is rather complicated. It’s easy to forget the details if you’re not using it very often—especially the nuances that differentiate one function from another. Having templates in your code editor is a handy way to write the code much quicker.

I won’t go into further explanation of the syntax, but I won’t leave you in the lurch, either. Here’s an article explaining the window function syntax and all its elements.

Now, on to the templates!

Template 1: Ranking the Rows

By using the RANK() function, I can rank anything I want. You’ll probably use it pretty often; we live in a world of top lists. Just take a look at the title of this article. You just can’t escape the top five this, the best ten that, or the worst fifteen something. The business world is no different, so it’s best you have your ranking functions ready.

Ranking functions? Yes, plural. Aside from RANK(), there are three other ranking functions:

  • DENSE RANK
  • ROW_NUMBER
  • NTILE

More on these ranking functions can be found in an article dedicated solely to the ranking functions.

Here is the table rich_people, containing the ten wealthiest people in the world in no particular order. The table includes the following columns:

  • id: the ID of the person
  • first_name: the first name of the person
  • last_name: the last name of the person
  • net_worth: the net worth of the person in billions of dollars

How would you select only the five richest people and rank them from the richest to the least rich? Here’s how:

SELECT *
FROM (
	SELECT	RANK() OVER (ORDER BY net_worth DESC) AS row_rank,
			first_name,
			last_name,
			net_worth
	FROM rich_people
) AS net_worth_rank
WHERE row_rank <= 5;

In this code, I use a subquery to get the desired result. This subquery contains a RANK() function, ordering the rows in a descending order by the column net_worth. The rank of the rows will appear in the new column row_rank. It also selects the columns from the table rich_people. The subquery used in the FROM clause has to have an alias; here, it’s net_worth_rank. Since I want only the top five rich people, there is a condition WHERE row_rank <= 5.

Run the query to get the following result:

row_rankfirst_namelast_namenet_worth
1NaomiMarken644.58
2KarelCharapek612.33
3JebediahVaultory562.84
4Jean-JacquesQuichelorraine499.51
5Vicky ChristinaBarcelona425.37

Store this query in your code editor, and you’ll always have a ranking code ready to play with and adapt to your needs. I didn’t use PARTITION BY in this code, which you may need sometimes. If you’re not sure how to use it, this article talks about ranking and using PARTITION BY.

Template 2: Difference Between two Consecutive Periods

This is another common use of SQL window functions. It’s also common in any data-related job to calculate daily, monthly, quarterly, or any other differences between consecutive periods. I’m going to show you how to use window functions to do this easily.

There’s a table country_gdp with the following columns:

  • country: the name of the country
  • GDP: the GDP of the country in billions of dollars
  • year: the year of the GDP

Your task is to calculate the difference in GDP (increase/decrease) between the years. You could do that by using the window function LAG() like this:

SELECT	country,
		GDP,
		year,
		GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year) AS     
GDP_difference
FROM country_gdp;

The code selects all three columns from the table country_gdp. To calculate the difference, you need to take the GDP of one year and subtract from it the GDP of the previous year. The LAG() function helps you achieve that; it returns the value of the previous row. This means the part GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year) in the code is nothing more than the current GDP minus the previous GDP, since LAG() takes the GDP value from the previous row. We need to do this by country, hence PARTITION BY country. The calculation needs to be performed for every year starting from the earliest year to the most recent, i.e., in ascending order, so it’s ordered by the column year. The result will appear in the column GDP_difference.

And here’s the result:

countryGDPyearGDP_difference
Quazistan585.792017NULL
Quazistan584.592018-1.2
Quazistan562.552019-22.04
Snowland185.682017NULL
Snowland190.5720184.89
Snowland180.562019-10.01
Sunnylandia270.122017NULL
Sunnylandia276.5620186.44
Sunnylandia294.22201917.66

We have a NULL value for every country in 2017, because this is the starting year. There is no previous row we can use to subtract from the 2017 value.

Template 3: Percent Difference Between two Consecutive Periods

This is a variation of the previous template. However, it’s nice to have them saved separately, because sometimes you just need the percentage. This way, there’s no need to tweak the previous query; you simply run the query below.

Using the same data from the example above, how would you calculate the percent difference in GDP? Let’s take Quazistan as an example.

countryGDPyearGDP_difference
Quazistan585.792017NULL
Quazistan584.592018-1.2

In this case, the percent difference should be (584.59-585.79)/585.79*100. It’s written in SQL like this:

SELECT	country,
		GDP,
		year,
		((GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year))/LAG (GDP) OVER (PARTITION BY country ORDER BY year)*100) AS GDP_percent_difference
FROM country_gdp;

The code is almost identical to that of the previous example. Except this time, we don’t just subtract one value from another using the LAG() function. The difference is then divided by the GDP of the previous year, again using the LAG() function, after which the result is multiplied by 100 to get the percentage. This percentage shows up in the column GDP_percent_difference.

Run the code to get this result:

countryGDPyearGDP_percent_difference
Quazistan585.792017NULL
Quazistan584.592018-0.204851568
Quazistan562.552019-3.770163704
Snowland185.682017NULL
Snowland190.5720182.633563119
Snowland180.562019-5.252663063
Sunnylandia270.122017NULL
Sunnylandia276.5620182.384125574
Sunnylandia294.2220196.385594446

Template 4: Calculating the Running Total

The running total is the same as the cumulative sum. While it’s not very advanced mathematics to say the least, you’ll often run into it when looking at data. Fortunately, this calculation becomes even easier by using window functions.

You have a table named subscribers, containing the monthly number of subscribers to your YouTube channel from three cities. These are the columns:

  • city: the name of the city
  • new_subscribers: the number of new subscribers
  • month: the month of subscription

How would you calculate the running total of your subscribers by city? Here’s how:

SELECT	city,
		new_subscribers,
		SUM (new_subscribers) OVER (PARTITION BY city ORDER BY month) AS 
            cumulative_subscribers,
		month
FROM subscribers;

Apart from selecting all the columns from the table subscribers, what does this code do? It calculates the running total and shows the result in the column cumulative_subscribers. To do that, I use the SUM() window function. This function summarizes the column new_subscribers, defined in the parentheses. Then, the regular part of the window function follows. We need the result by city, so this is specified by the PARTITION BY. The operation needs to be performed for each month starting from the oldest, which can be done by ordering the rows by the column month.

There you have it, a lovely little template giving you the following result:

citynew_subscriberscumulative_subscribersmonth
Amsterdam41241204/2020
Amsterdam24165305/2020
Amsterdam31296506/2020
Warsaw10810804/2020
Warsaw9119905/2020
Warsaw12432306/2020
Zagreb282804/2020
Zagreb396705/2020
Zagreb5011706/2020

Template 5: Calculating the Moving Average

The moving average has several pseudonyms. If someone refers to the rolling mean, the rolling average, or the running average, don’t worry; it’s the same thing as the moving average. Different names, same calculation. The moving average is often used in analyzing time series data to determine trends. This article will help you understand what the moving average is for those of you who are not familiar with it.

Here’s an example of calculating the moving average in SQL. Feel free to store it in your code editor, adapt it to your needs, and use it whenever needed. The table for this example is eur_usd, which contains historical exchange rates from the EUR to the USD in the following columns:

  • date: the date of the exchange rate
  • exchange_rate: the value of the exchange rate

To calculate the three-day moving average of the exchange rate, you’ll need the following code:

SELECT	date,
		exchange_rate,
		AVG (exchange_rate) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_average
FROM eur_usd;

The code first selects the columns from the table eur_usd. There’s an AVG() function calculating the average of the exchange rate. This calculation needs to be performed sequentially, so I’ve ordered it by date. The steps to calculate the three-day moving average is to select the values from the current row and the previous two, sum them, then divide the sum by three, i.e., the number of rows taken into account. This is done by the part of the code: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW. You’ll see the result in the column moving_average; here it is:

dateexchange_ratemoving_average
2020-09-011.19211.1921
2020-09-021.18531.1887
2020-09-031.18471.187366
2020-09-041.18381.1846
2020-09-051.18381.1841
2020-09-061.18431.183966
2020-09-071.18161.183233

Template 6: Calculating the Last/First Value in Each Group

This is a convenient feature of window functions, allowing you to group the data and then find the row with the smallest or the largest value. It’s also possible to do that with a GROUP BY, but only if you’re using MySQL. What if you’re not? The functions LAST_VALUE and FIRST_VALUE are what you need.

How do they work? I’ll show you an example, using the product_sold table. It contains the following columns:

  • product: the name of the product
  • product_category: the category of the product
  • items_sold: the number of product items sold
SELECT	product,
		product_category,
		items_sold,
		LAST_VALUE (product) OVER (PARTITION BY product_category ORDER BY items_sold ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS bestseller
FROM product_sold;

What does this window function do? I want the best-selling product, that is, the largest number of items sold, so I define it by LAST_VALUE (product). Of course, this needs to be done by category, so I specify that with PARTITION BY product_category. What are the criteria for the product being the best-selling? It's by the number of items sold. That’s why we have ORDER BY items_sold; the value of the sold items in the last row represents the largest volume. After that, I define the frame by writing ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING; this means the frame starts at the first row and ends at the last row. This is the result of the query:

productproduct_categoryitems_soldbestseller
CD51Cable13CD61
CD31Cable100CD61
CD41Cable417CD61
CD61Cable2541CD61
MX001Headphones59MX500
MX202Headphones658MX500
MX420Headphones1586MX500
MX500Headphones2587MX500
BX1AMicrophone354BX1B
BX1DMicrophone412BX1B
BX1CMicrophone587BX1B
BX1BMicrophone954BX1B

The logic is similar if you want to calculate the product that sold the least in each category. In that case, you use FIRST_VALUE() function.

SELECT	product,
		product_category,
		items_sold,
		FIRST_VALUE (product) OVER (PARTITION BY product_category ORDER BY items_sold) AS worstseller
FROM product_sold;

This is almost identical to the previous code. Instead of LAST_VALUE(), there’s FIRST_VALUE(). The window is again defined by product_category, and the rows are ordered by items_sold. Here, you can omit the part ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. The result of this query looks like this:

productproduct_categoryitems_soldworstseller
CD51Cable13CD51
CD31Cable100CD51
CD41Cable417CD51
CD61Cable2541CD51
MX001Headphones59MX001
MX202Headphones658MX001
MX420Headphones1586MX001
MX500Headphones2587MX001
BX1AMicrophone354BX1A
BX1DMicrophone412BX1A
BX1CMicrophone587BX1A
BX1BMicrophone954BX1A

Interested in the Possibilities of the SQL Window Functions?

These six are probably the most commonly used window functions. This is only to show you how applicable they are in business and everyday life. There’s much more to window functions, which you can discover in the window functions course. If you want to learn about window functions and are interested in what this course can offer you, read the interview with the course creator herself.

Feel free to use the comments section! Let me know which window functions you have already saved in your code editor.