*Do you use window functions often enough to know them well but not often enough to remember all the details? Here are six window function codes you should save as templates.*

Window functions are a handy SQL option, but it doesn’t mean you use them all the time. Maybe you only use them occasionally; if that’s the case, you probably find yourself needing to refresh your memory every time you use them. It’s easy to forget the nuances of the functions, and you’re pulling your hair out and wondering why the code isn’t working. They say it’s always better to write something down, not remember it!

In the world of SQL, writing down equals saving your templates in the code editor. Why shouldn’t you make your life easier and have a cheat sheet for window functions?

Speaking of cheat sheets, there’s one already. You can print it and have it by your side; that way, you can quickly remember the syntax and how any specific function works. Everything in this cheat sheet, you can also learn and practice in our course on window functions.

Having a cheat sheet is good but having it in your code editor is even better.

## What Are Window Functions?

You probably remember that window functions do calculations across a set of table rows. This set of rows is called a window or a window frame. All this and much more is explained in a very accessible way in this article.

A window function syntax looks like this:

window_function ([ALL] expression) OVER ( [PARTITION BY partition_list] [ORDER BY order_list] [window_frame_clause] )

You see that the window function syntax is rather complicated. It’s easy to forget the details if you’re not using it very often—especially the nuances that differentiate one function from another. Having templates in your code editor is a handy way to write the code much quicker.

I won’t go into further explanation of the syntax, but I won’t leave you in the lurch, either. Here’s an article explaining the window function syntax and all its elements.

Now, on to the templates!

## Template 1: Ranking the Rows

By using the `RANK()`

function, I can rank anything I want. You’ll probably use it pretty often; we live in a world of top lists. Just take a look at the title of this article. You just can’t escape the top five this, the best ten that, or the worst fifteen something. The business world is no different, so it’s best you have your ranking functions ready.

Ranking functions? Yes, plural. Aside from `RANK()`

, there are three other ranking functions:

`DENSE RANK`

`ROW_NUMBER`

`NTILE`

More on these ranking functions can be found in an article dedicated solely to the ranking functions.

Here is the table

, containing the ten wealthiest people in the world in no particular order. The table includes the following columns:**rich_people**

`id`

: the ID of the person`first_name`

: the first name of the person`last_name`

: the last name of the person`net_worth`

: the net worth of the person in billions of dollars

How would you select only the five richest people and rank them from the richest to the least rich? Here’s how:

SELECT * FROM ( SELECT RANK() OVER (ORDER BY net_worth DESC) AS row_rank, first_name, last_name, net_worth FROM rich_people ) AS net_worth_rank WHERE row_rank <= 5;

In this code, I use a subquery to get the desired result. This subquery contains a `RANK()`

function, ordering the rows in a descending order by the column `net_worth`

. The rank of the rows will appear in the new column `row_rank`

. It also selects the columns from the table

. The subquery used in the **rich_people**`FROM`

clause has to have an alias; here, it’s `net_worth_rank`

. Since I want only the top five rich people, there is a condition `WHERE row_rank <= 5`

.

Run the query to get the following result:

row_rank | first_name | last_name | net_worth |
---|---|---|---|

1 | Naomi | Marken | 644.58 |

2 | Karel | Charapek | 612.33 |

3 | Jebediah | Vaultory | 562.84 |

4 | Jean-Jacques | Quichelorraine | 499.51 |

5 | Vicky Christina | Barcelona | 425.37 |

Store this query in your code editor, and you’ll always have a ranking code ready to play with and adapt to your needs. I didn’t use `PARTITION BY`

in this code, which you may need sometimes. If you’re not sure how to use it, this article talks about ranking and using `PARTITION BY`

.

## Template 2: Difference Between two Consecutive Periods

This is another common use of SQL window functions. It’s also common in any data-related job to calculate daily, monthly, quarterly, or any other differences between consecutive periods. I’m going to show you how to use window functions to do this easily.

There’s a table

with the following columns:**country_gdp**

`country`

: the name of the country`GDP`

: the GDP of the country in billions of dollars`year`

: the year of the GDP

Your task is to calculate the difference in GDP (increase/decrease) between the years. You could do that by using the window function `LAG()`

like this:

SELECT country, GDP, year, GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year) AS GDP_difference FROM country_gdp;

The code selects all three columns from the table

. To calculate the difference, you need to take the GDP of one year and subtract from it the GDP of the previous year. The **country_gdp**`LAG()`

function helps you achieve that; it returns the value of the previous row. This means the part `GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year)`

in the code is nothing more than the current GDP minus the previous GDP, since `LAG()`

takes the `GDP`

value from the previous row. We need to do this by country, hence `PARTITION BY`

country. The calculation needs to be performed for every year starting from the earliest year to the most recent, i.e., in ascending order, so it’s ordered by the column year. The result will appear in the column `GDP_difference`

.

And here’s the result:

country | GDP | year | GDP_difference |
---|---|---|---|

Quazistan | 585.79 | 2017 | NULL |

Quazistan | 584.59 | 2018 | -1.2 |

Quazistan | 562.55 | 2019 | -22.04 |

Snowland | 185.68 | 2017 | NULL |

Snowland | 190.57 | 2018 | 4.89 |

Snowland | 180.56 | 2019 | -10.01 |

Sunnylandia | 270.12 | 2017 | NULL |

Sunnylandia | 276.56 | 2018 | 6.44 |

Sunnylandia | 294.22 | 2019 | 17.66 |

We have a `NULL`

value for every country in 2017, because this is the starting year. There is no previous row we can use to subtract from the 2017 value.

## Template 3: Percent Difference Between two Consecutive Periods

This is a variation of the previous template. However, it’s nice to have them saved separately, because sometimes you just need the percentage. This way, there’s no need to tweak the previous query; you simply run the query below.

Using the same data from the example above, how would you calculate the percent difference in GDP? Let’s take Quazistan as an example.

country | GDP | year | GDP_difference |
---|---|---|---|

Quazistan | 585.79 | 2017 | NULL |

Quazistan | 584.59 | 2018 | -1.2 |

In this case, the percent difference should be (584.59-585.79)/585.79*100. It’s written in SQL like this:

SELECT country, GDP, year, ((GDP - LAG (GDP) OVER (PARTITION BY country ORDER BY year))/LAG (GDP) OVER (PARTITION BY country ORDER BY year)*100) AS GDP_percent_difference FROM country_gdp;

The code is almost identical to that of the previous example. Except this time, we don’t just subtract one value from another using the `LAG()`

function. The difference is then divided by the GDP of the previous year, again using the `LAG()`

function, after which the result is multiplied by 100 to get the percentage. This percentage shows up in the column `GDP_percent_difference`

.

Run the code to get this result:

country | GDP | year | GDP_percent_difference |
---|---|---|---|

Quazistan | 585.79 | 2017 | NULL |

Quazistan | 584.59 | 2018 | -0.204851568 |

Quazistan | 562.55 | 2019 | -3.770163704 |

Snowland | 185.68 | 2017 | NULL |

Snowland | 190.57 | 2018 | 2.633563119 |

Snowland | 180.56 | 2019 | -5.252663063 |

Sunnylandia | 270.12 | 2017 | NULL |

Sunnylandia | 276.56 | 2018 | 2.384125574 |

Sunnylandia | 294.22 | 2019 | 6.385594446 |

## Template 4: Calculating the Running Total

The running total is the same as the cumulative sum. While it’s not very advanced mathematics to say the least, you’ll often run into it when looking at data. Fortunately, this calculation becomes even easier by using window functions.

You have a table named

, containing the monthly number of subscribers to your YouTube channel from three cities. These are the columns:**subscribers**

`city`

: the name of the city`new_subscribers`

: the number of new subscribers`month`

: the month of subscription

How would you calculate the running total of your subscribers by city? Here’s how:

SELECT city, new_subscribers, SUM (new_subscribers) OVER (PARTITION BY city ORDER BY month) AS cumulative_subscribers, month FROM subscribers;

Apart from selecting all the columns from the table

, what does this code do? It calculates the running total and shows the result in the column **subscribers**`cumulative_subscribers`

. To do that, I use the `SUM()`

window function. This function summarizes the column `new_subscribers`

, defined in the parentheses. Then, the regular part of the window function follows. We need the result by city, so this is specified by the `PARTITION BY`

. The operation needs to be performed for each month starting from the oldest, which can be done by ordering the rows by the column month.

There you have it, a lovely little template giving you the following result:

city | new_subscribers | cumulative_subscribers | month |
---|---|---|---|

Amsterdam | 412 | 412 | 04/2020 |

Amsterdam | 241 | 653 | 05/2020 |

Amsterdam | 312 | 965 | 06/2020 |

Warsaw | 108 | 108 | 04/2020 |

Warsaw | 91 | 199 | 05/2020 |

Warsaw | 124 | 323 | 06/2020 |

Zagreb | 28 | 28 | 04/2020 |

Zagreb | 39 | 67 | 05/2020 |

Zagreb | 50 | 117 | 06/2020 |

## Template 5: Calculating the Moving Average

The moving average has several pseudonyms. If someone refers to the rolling mean, the rolling average, or the running average, don’t worry; it’s the same thing as the moving average. Different names, same calculation. The moving average is often used in analyzing time series data to determine trends. This article will help you understand what the moving average is for those of you who are not familiar with it.

Here’s an example of calculating the moving average in SQL. Feel free to store it in your code editor, adapt it to your needs, and use it whenever needed. The table for this example is

, which contains historical exchange rates from the EUR to the USD in the following columns:**eur_usd**

`date`

: the date of the exchange rate`exchange_rate`

: the value of the exchange rate

To calculate the three-day moving average of the exchange rate, you’ll need the following code:

SELECT date, exchange_rate, AVG (exchange_rate) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_average FROM eur_usd;

The code first selects the columns from the table

. There’s an **eur_usd**`AVG()`

function calculating the average of the exchange rate. This calculation needs to be performed sequentially, so I’ve ordered it by date. The steps to calculate the three-day moving average is to select the values from the current row and the previous two, sum them, then divide the sum by three, i.e., the number of rows taken into account. This is done by the part of the code: `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`

. You’ll see the result in the column `moving_average`

; here it is:

date | exchange_rate | moving_average |
---|---|---|

2020-09-01 | 1.1921 | 1.1921 |

2020-09-02 | 1.1853 | 1.1887 |

2020-09-03 | 1.1847 | 1.187366 |

2020-09-04 | 1.1838 | 1.1846 |

2020-09-05 | 1.1838 | 1.1841 |

2020-09-06 | 1.1843 | 1.183966 |

2020-09-07 | 1.1816 | 1.183233 |

## Template 6: Calculating the Last/First Value in Each Group

This is a convenient feature of window functions, allowing you to group the data and then find the row with the smallest or the largest value. It’s also possible to do that with a `GROUP BY`

, but only if you’re using MySQL. What if you’re not? The functions `LAST_VALUE`

and `FIRST_VALUE`

are what you need.

How do they work? I’ll show you an example, using the

table. It contains the following columns:**product_sold**

`product`

: the name of the product`product_category`

: the category of the product`items_sold`

: the number of product items sold

SELECT product, product_category, items_sold, LAST_VALUE (product) OVER (PARTITION BY product_category ORDER BY items_sold ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS bestseller FROM product_sold;

What does this window function do? I want the best-selling product, that is, the largest number of items sold, so I define it by `LAST_VALUE (product)`

. Of course, this needs to be done by category, so I specify that with `PARTITION BY product_category`

. What are the criteria for the product being the best-selling? It's by the number of items sold. That’s why we have `ORDER BY items_sold`

; the value of the sold items in the last row represents the largest volume. After that, I define the frame by writing `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`

; this means the frame starts at the first row and ends at the last row. This is the result of the query:

product | product_category | items_sold | bestseller |
---|---|---|---|

CD51 | Cable | 13 | CD61 |

CD31 | Cable | 100 | CD61 |

CD41 | Cable | 417 | CD61 |

CD61 | Cable | 2541 | CD61 |

MX001 | Headphones | 59 | MX500 |

MX202 | Headphones | 658 | MX500 |

MX420 | Headphones | 1586 | MX500 |

MX500 | Headphones | 2587 | MX500 |

BX1A | Microphone | 354 | BX1B |

BX1D | Microphone | 412 | BX1B |

BX1C | Microphone | 587 | BX1B |

BX1B | Microphone | 954 | BX1B |

The logic is similar if you want to calculate the product that sold the least in each category. In that case, you use `FIRST_VALUE()`

function.

SELECT product, product_category, items_sold, FIRST_VALUE (product) OVER (PARTITION BY product_category ORDER BY items_sold) AS worstseller FROM product_sold;

This is almost identical to the previous code. Instead of `LAST_VALUE()`

, there’s `FIRST_VALUE()`

. The window is again defined by `product_category`

, and the rows are ordered by `items_sold`

. Here, you can omit the part `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`

. The result of this query looks like this:

product | product_category | items_sold | worstseller |
---|---|---|---|

CD51 | Cable | 13 | CD51 |

CD31 | Cable | 100 | CD51 |

CD41 | Cable | 417 | CD51 |

CD61 | Cable | 2541 | CD51 |

MX001 | Headphones | 59 | MX001 |

MX202 | Headphones | 658 | MX001 |

MX420 | Headphones | 1586 | MX001 |

MX500 | Headphones | 2587 | MX001 |

BX1A | Microphone | 354 | BX1A |

BX1D | Microphone | 412 | BX1A |

BX1C | Microphone | 587 | BX1A |

BX1B | Microphone | 954 | BX1A |

## Interested in the Possibilities of the SQL Window Functions?

These six are probably the most commonly used window functions. This is only to show you how applicable they are in business and everyday life. There’s much more to window functions, which you can discover in the window functions course. If you want to learn about window functions and are interested in what this course can offer you, read the interview with the course creator herself.

Feel free to use the comments section! Let me know which window functions you have already saved in your code editor.