22nd Apr 2020 11 minutes read When Do I Use SQL Window Functions? Tihomir Babic window functions Table of Contents What Are Window Functions? Example 1: Calculate Average Salary and Compare Individual Salary to the Average Example 2: Calculate Average Number of Products Sold by Date and Region Example 3: Calculate Cumulative Sum of Products Sold by Region Example 4: Compare Stock Prices to Minimum and Maximum Prices on the Market Example 5: Calculate Price Change Percentage and Moving Averages Do You Think Learning Window Functions Can Help You in Your Job? Have you heard one of your colleagues bragging about using SQL window functions? Do you know basic SQL but not much, or anything, about window functions? If you want to learn about window functions and how to use them in a business context, you’re in the right place! This happens often. You reach a certain level of knowledge, and you feel like the king (or queen) of the world. Then, you overhear your colleagues talking about something you have never heard of (in this case, window functions). You immediately feel a bit ashamed that you don’t know window functions already. Does this sound familiar? This feeling is not pleasant. But, it can be a good thing if it drives you to ask what window functions are. So, you Google it and find some useful articles (like this one). Suddenly, the embarrassment will turn into empowerment when you realize that window functions are not mysterious or unlearnable. Again, you feel like the king of the world. Or queen. Maybe even both. Let's move towards this royal feeling! I’ll try to not bombard you with SQL. Instead, I’ll take the other way around. I’ll give several real-world business examples that will show you the practical use of window functions. What Are Window Functions? A window function is, simply put, a function that performs calculations across a set of table rows. The name comes from the fact that the set of rows is called a window or window frame. Here’s an example of what a window looks like. In the table below, the windows are marked by different colors. The cumulative sum is calculated for every region, so in this case, the windows are defined by region. dateregionproducts_soldcumulative_sum 2020-03-01Region 19999 2020-03-02Region 198197 2020-03-03Region 143240 2020-03-01Region 29696 2020-03-02Region 259155 2020-03-03Region 229184 2020-03-01Region 37676 2020-03-02Region 350126 2020-03-03Region 370196 The window function syntax is: window_function ([ALL] expression) OVER ([PARTITION BY partition_list] [ORDER BY order_list] [window_frame_clause]) The window function part of the syntax can be an aggregate function, such as SUM() or AVG(). It can also be another type of function, such as an analytic or ranking function. The window, or the number of rows over which certain calculations will be performed, is defined by the OVER() clause. PARTITION BY is an optional clause in window functions. It defines a partition on which the window function will perform the operation. If PARTITION BY is not defined, the window function will perform the operation on the entire table. You can learn more in this article. ORDER BY will sort the rows within each partition in the desired order. If this clause is not defined, then the clause will use the entire table. The window frame clause defines the number of rows over which the window function will operate using two keywords. One is ROWS, used to limit the number of rows by specifying the number of rows preceding or following the current row. The other one is RANGE, used to limit the number of rows by specifying a range of values with respect to the current row value. That’s why the window frame clause is also called the ROW or RANGE clause. To learn how to use this syntax and practice it, the LearnSQL.com course on window functions is the place to go. Also, if you're interested in further explanation of window functions, you can find it here, along with some examples. Let's now move on to the fun part, solving problems! Example 1: Calculate Average Salary and Compare Individual Salary to the Average This is a simple example of a common problem in the business world. Let’s say there’s a table employee, which contains data about employees' salaries. It consists of the following rows: id – unique id first_name – employee’s first name last_name – employee’s last name department – department of the employee salary – employee’s monthly salary Using this data, you first need to calculate the average salary for the whole company. Then, you need to calculate how much above or below the average salary each employee’s salary is. The following code will do that quickly: SELECT first_name, last_name, department, salary, AVG(salary) OVER() AS avg_salary, salary - AVG(salary) OVER() AS diff_salary FROM employee; Since you’re familiar with SQL, you probably recognize at least part of this code. It selects first_name, last_name, department, and salary from the table employee. Nothing new here. The line AVG(salary) OVER() AS avg_salary calculates the average salary in the window defined by OVER(). The result will be shown in the column avg_salary. The line salary - AVG(salary) OVER() AS diff_salary calculates the difference between each employee’s salary and the average salary, The result will be shown in the column diff_salary. Running this code will give a helpful table. See an excerpt of it below: first_namelast_namedepartmentsalaryavg_salarydiff_salary EvangelinaChesshireTraining1,0152,469-1,454 JudDunkerleyLegal3,579.32,4691,111 EssaOdoSupport786.82,469-1,682 SaudraBolducServices609.22,469-1,860 GarveyJefferysSales4,600.22,4692,132 MaryjaneDumbrellServices590.92,469-1,878 RicaSiburnEngineering4,353.82,4691,885 ArlindaKilminsterSales3,891.92,4691,423 VerenaDevinnResearch and Development1,093.52,469-1,375 GerdaLegendreServices3,863.92,4691,395 Example 2: Calculate Average Number of Products Sold by Date and Region Imagine you’re working at a company that operates in three regions. The management wants to know the average number of products sold in each region. Also, they want to know the average number of products sold overall for each date. Window functions will allow you to do that easily. In this example, the data is stored in the table sales which has three columns: date – date of the product sale region – name of the region products_sold – number of products sold Here is the code needed to create that report: SELECT date, region, products_sold, AVG(products_sold) OVER(PARTITION BY date) AS avg_date, AVG(products_sold) OVER(PARTITION BY region) AS avg_region FROM sales ORDER BY region, date; This code selects the columns date, region, and products_sold. Then, it calculates the average number of products sold on each date. This is defined by the PARTITION BY() clause. The result will be shown in the column avg_date. The next line also calculates the average number of products sold, this time in each region. The result will be shown in the column avg_region. Then, the result is ordered by the region and the date using the ORDER BY clause. See the results below: dateregionproducts_soldavg_dateavg_region 2020-03-01Region 19990.33333380 2020-03-02Region 1986980 2020-03-03Region 14347.33333380 2020-03-01Region 29690.33333361.333333 2020-03-02Region 2596961.333333 2020-03-03Region 22947.33333361.333333 2020-03-01Region 37690.33333365.333333 2020-03-02Region 3506965.333333 2020-03-03Region 37047.33333365.333333 Example 3: Calculate Cumulative Sum of Products Sold by Region The management was happy with your previous report! Now, they want you to calculate the cumulative sum (or running total) of products sold in each region. Window functions are useful for doing such calculations. The calculation will be performed on the table sales used in Example 2. This code will allow you to quickly deliver the required numbers: SELECT date, region, products_sold, SUM(products_sold) OVER(PARTITION BY region ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_region FROM sales ORDER BY region, date; The first part of the code selects the same columns as the code in Example 2. Then, it calculates the sum of the products by region. This is defined by the PARTITION BY() clause. Of course, you need the cumulative sum, not the total sum. That’s why the window is ordered by the date using the ORDER BY command. Now that everything is defined, you need to tell SQL to add the value of the current row to the sum of the previous rows within the window. This is done by ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The window function will sum all rows before the current row (that’s why UNBOUNDED PRECEDING) and no rows after the current row (that’s why CURRENT ROW) within the specified window. The result will be shown in the column cumulative_region. The result will be ordered by the region and the date, as with the previous example. After running the code, you will get the table below: dateregionproducts_soldcumulative_region 2020-03-01Region 19999 2020-03-02Region 198197 2020-03-03Region 143240 2020-03-01Region 29696 2020-03-02Region 259155 2020-03-03Region 229184 2020-03-01Region 37676 2020-03-02Region 350126 2020-03-03Region 370196 Example 4: Compare Stock Prices to Minimum and Maximum Prices on the Market Your company is monitoring stock prices on NASDAQ and NYSE. The management wants you to calculate the minimum and maximum stock prices on both markets in the last 365 days. They also want to see how the price of each stock traded in 2020 differs from the minimum and maximum price on the market. The table stockprice contains data from 3/15/2019 to 3/14/2020 and consists of the following columns: date – date of the trading stock_name – name of the stock stock_price – price of the stock stock_market – market on which the stock is being traded max_price – maximum price on the market in the last 365 days min_price – minimum price on the market in the last 365 days diff_max – stock price minus maximum price on the market in the last 365 days diff_min – stock price minus minimum price on the market in the last 365 days The code that will quickly create the requested report looks like this: SELECT date, stock_name, stock_price, stock_market, MAX(stock_price) OVER(PARTITION BY stock_market) as max_price, MIN(stock_price) OVER(PARTITION BY stock_market) as min_price, stock_price - MAX(stock_price) OVER(PARTITION BY stock_market) AS diff_max, stock_price - MIN(stock_price) OVER(PARTITION BY stock_market) AS diff_min FROM stockprice WHERE date > '2019-12-31' ORDER BY date; The first part of the code selects the original columns in the table: date, stock_name, stock_price, and stock_market. Then, the window function MAX() with the PARTITION BY clause calculates the maximum price for each market, NASDAQ and NYSE, separately. The result will be shown in the column max_price. The next line of code works the same way, except now it calculates the minimum price. The result will be shown in the column min_price. The following lines of code calculate the difference between the stock price and the maximum and minimum price, respectively, for each market. The results will be shown in the columns diff_max and diff_min. Since the report needs to only show data from 2020, I’ve used the WHERE clause. Finally, the resulting table is ordered by the date, which is logical for such a report. Here’s what the first several rows of the report look like: datestock_namestock_pricestock_marketmax_pricemin_pricediff_maxdiff_min 1.1.2020MYOS RENS Technology Inc.91.49NASDAQ99.7530.37-8.2660.39 1.1.2020Huron Consulting Group Inc.59.56NASDAQ99.7530.37-40.1928.46 1.1.2020Sensient Technologies Corporation82.05NYSE99.5430.37-17.4951.68 2.1.2020Regions Financial Corporation46.75NYSE99.5430.37-52.7916.38 2.1.2020Ottawa Bancorp, Inc.51.55NASDAQ99.7530.37-48.220.45 3.1.2020Vanguard Long-Term Government Bond ETF70.62NASDAQ99.7530.37-29.1339.52 3.1.2020Bruker Corporation52.99NASDAQ99.7530.37-46.7621.89 Example 5: Calculate Price Change Percentage and Moving Averages The management now knows that you can deliver reports with great accuracy and speed. They are impressed! They don’t realize that you’ve been learning SQL window functions and that what they asked is easy-peasy for you. Next, they ask for something that should take you much more time. But it won’t! You have a table price that contains all the price changes of one stock in 2020. Sometimes there is only one change daily, sometimes there are more. The table consists of the following columns: date – date of the price stock_price – price of the stock The management has asked you to send them a report that will take every price and compare it with the previous price. Additionally, they’ve asked you to calculate the moving average for the stock price. This is how you’re going to do it: SELECT date, stock_price, (stock_price/LAG(stock_price) OVER(ORDER BY date)) - 1 AS percent_change, AVG(stock_price) OVER(ORDER BY date ROWS BETWEEN 0 PRECEDING AND 4 FOLLOWING) AS moving_avg FROM price; So, what does this code do? First, it selects the original columns in the table: date and stock_price. The next line introduces a new window function, LAG(). This function accesses data from the previous row, which is ideal for this task. The stock_price is divided by the previous price (hence, the LAG() function). Then, 1 is subtracted from the result to get a percentage. The result will be shown in the column percent_change. The following line calculates the moving average. It uses the AVG() window function, which you already know. In the OVER() clause, the data is ordered by the date. The management didn’t specify how they want the moving average to be calculated. So, I’ve decided to calculate it using five price changes, which is specified in ROWS BETWEEN 0 PRECEDING AND 4 FOLLOWING. When calculating the moving average, the code will take into account the current row and four rows that follow, which is five in total. You can play with that part of the code since there's no defined number of data that should be taken into account when calculating moving averages. It depends on the amount and type of data, as well as individual preference. By changing the numbers that come before PRECEDING and FOLLOWING, you can easily and quickly change the calculation, depending on the methodology you decide to use. See the results below: datestock_pricepercent_changemoving_avg 1.1.201936.37NULL39.126 1.1.201937.890.041792637.922 1.1.201944.080.163367638.768 2.1.201930.43-0.309664340.84 3.1.201946.860.539927741.058 3.1.201930.35-0.352326143.3 3.1.201942.120.387808845.276 4.1.201954.440.292497648.452 5.1.201931.52-0.42101448.78 6.1.201958.070.842322352.822 7.1.201940.23-0.307215549.19 Do You Think Learning Window Functions Can Help You in Your Job? I’ve given real-world business examples as a starting point instead of pure window functions code analysis. All five examples are something I’ve dealt with in my career. This is the way I learned SQL. First, I got a problem to solve. Then, I tried to figure out how to do it using SQL. If you think window functions will be useful for your job, this LearnSQL.com course is a great way to learn more. If you found these examples interesting or want to share some of your real-world examples, feel free to comment below! Tags: window functions