21st May 2024 15 minutes read SQL Window Functions Guide Tihomir Babic window functions learn sql Table of Contents The Syntax of SQL Window Functions Common SQL Window Functions SQL Window Functions Examples Ranking Window Function Example Aggregate Window Function Example Analytic Window Function Examples LAG() FIRST_VALUE() NTILE() Window Functions vs. GROUP BY and Aggregate Functions Advanced SQL Window Function Usage Ranking Running Totals and Moving Averages Difference Between Two Rows or Time Periods Analyzing Time Series Common Problems with SQL Window Functions Additional Resources for Practicing SQL Window Functions Every data analyst can benefit from knowing SQL window functions. They up your data analysis game and open doors to analyzing data on an entirely new level. This article is your starting point for the world of SQL window functions. SQL window functions allow you to show all the data rows and their aggregate values at the same time. Sounds a bit like GROUP BY on steroids, doesn’t it? It doesn’t stop there. As window functions perform various calculations across the set of rows, they also allow datasets to be divided into subsets. They help you easily rank data, aggregate it, calculate differences between different periods (e.g., profit increase/decrease between two months/quarters/years), and find cumulative sums, running totals, moving averages, etc. Window functions are a complex topic, but learning them is much easier with our Window Functions course. It’s an interactive course with 218 exercises that provide systematic learning with plenty of coding. You’ll learn all about essential window function clauses – e.g. OVER(), ORDER BY, and PARTITION BY – and what a window frame is. Then, you’ll learn how to use all that in the context of different window functions. After learning, it’s time for practice (which is also learning). Our Window Functions Practice Set gives you 100 additional interactive exercises so you can really get comfortable putting your knowledge to work. The Syntax of SQL Window Functions Window functions get their name from a window frame, which is a set of rows related to the current row. To perform a window function operation on the window frame, you need to know general window function syntax: SELECT column_1, column_2, <window_function> OVER(PARTITION BY … ORDER BY … <window_frame>) AS column_alias FROM table; There are several essential parts of this syntax that require explaining: OVER clause: This is the mandatory clause required to define a window frame. For example, you would recognize SUM(order_value) OVER() as a SUM() window function. Without OVER(), this is just a regular SUM() aggregate function. PARTITION BY: This is an optional clause for partitioning the dataset, i.e. dividing it into subsets. This lets you apply a window function to each partition separately. If this clause is omitted, then the whole result set is one partition. ORDER BY: This optional clause (for some window functions) is used for specifying the order of rows in a window frame. If you omit this clause, then the order of the rows in the window frame will be arbitrary. <window_frame>: This defines the upper and lower bounds of a window frame. Two important clauses used for that are ROWS and RANGE. ROWS defines the number of rows that precede and follow the current row. The RANGE clause defines the range of rows based on their value compared to the current row. You can learn more in our article about the differences between ROWS and RANGE. This part of the syntax is usually omitted, as the default window frame is what users most often need. The default window frame, in this case, depends on whether you use the ORDER BY clause in OVER(). If you do, then the frame is all the current row and all the rows before it in the current partition. If you don’t specify ORDER BY, the window frame is the current row and all the rows preceding and following it in the current partition. In the second case, the window frame is basically the whole dataset – or the whole partition, if you’re also using PARTITION BY. Common SQL Window Functions There are many different window functions. Here’s an overview of some of the most common: Window Function Category Window Function Description Further Reading Ranking Functions ROW_NUMBER() • Returns a unique row number for each row within a window frame. • Tied row values get different row numbers. • How to Number Rows in an SQL Result Set • How to Use ROW_NUMBER OVER() in SQL to Rank Data RANK() • Ranks the rows within a window frame. • Tied row values get the same rank, with a gap in the ranking. • What Is the RANK() Function in SQL, and How Do You Use It? • How to Rank Rows in SQL: A Complete Guide • How to Use the SQL RANK OVER (PARTITION BY) DENSE_RANK() • Ranks the rows within a window frame • Tied row values get the same rank, with no gap in the ranking. • Overview of Ranking Functions in SQL • What’s the Difference Between RANK and DENSE_RANK in SQL? Aggregate Functions SUM() • Calculates the sum of values within the window frame. • How to Use SUM() with OVER(PARTITION BY) in SQL AVG() • Calculates the average values within the window frame. COUNT() • Counts the values of rows within the window frame. • COUNT OVER PARTITION BY: An Explanation with 3 Examples MIN() • Finds the minimum value within the window frame. MAX() • Finds the maximum value within the window frame. Analytic Functions NTILE() • Divides the window frame into n groups. If possible, each group will have the same number of rows. • Each row is assigned its group number. • 6 Examples of NTILE() Function in SQL | LearnSQL.com LEAD() • Gets the data from a row that is a defined number of rows after the current one. • The LAG Function and the LEAD Function in SQL LAG() • Gets the data from a row that is a defined number of rows before the current one. • The LAG Function and the LEAD Function in SQL FIRST_VALUE() • Gets the value of the first row within the window frame. LAST_VALUE() • Gets the value of the last row within the window frame. SQL Window Functions Examples Now that we have gone through the basics, it’s time to show several practical window function examples. In all examples, I’ll use the same table. It’s entitled exchange_rates and contains the April 2024 European Central Bank (ECB) exchange rates for three currency pairs: EUR vs. USD, EUR vs. CHF, and EUR vs. JPY. Here’s a partial snapshot of the data. iddatecurrency_pairecb_rate 12024-04-02EUR_USD1.0749 22024-04-02EUR_JPY163.01 32024-04-02EUR_CHF0.9765 42024-04-03EUR_USD1.0783 52024-04-03EUR_JPY163.66 62024-04-03EUR_CHF0.9792 72024-04-04EUR_USD1.0852 82024-04-04EUR_JPY164.69 92024-04-04EUR_CHF0.9846 Ranking Window Function Example This example will show you how DENSE_RANK() works. The other two ranking window functions can be used in the same manner; they could (depending on the data) return slightly different results. The code below ranks the data in the table from the highest to the lowest exchange rate: SELECT date, currency_pair, ecb_rate, DENSE_RANK() OVER (ORDER BY ecb_rate DESC) AS rank_ecb_rate FROM exchange_rates; I select the date, currency pair, and rate. Now, I’ve chosen DENSE_RANK() to rank the data. This is just in case there are the same exchange rates (highly unlikely, but still…); I want them to be ranked the same, and I don’t want gaps in the ranking. The DENSE_RANK() function is followed by the OVER() clause that defines the window function. In the clause parentheses, I use another window function clause—ORDER BY. This way, I’m telling the DENSE_RANK() window function to rank the data by the ECB rate in descending order. This is what I get as output: datecurrency_pairecb_raterank_ecb_rate 2024-04-09EUR_JPY164.97001 2024-04-10EUR_JPY164.89002 2024-04-04EUR_JPY164.69003 ………... 2024-04-02EUR_JPY163.01009 2024-04-09EUR_USD1.086710 2024-04-10EUR_USD1.086011 2024-04-04EUR_USD1.085212 ………… 2024-04-12EUR_USD1.065218 2024-04-04EUR_CHF0.984619 2024-04-09EUR_CHF0.981920 2024-04-10EUR_CHF0.981021 ………… 2024-04-12EUR_CHF0.971627 As you can see, each row has been ranked according to its rate value. In case of the same rates, DENSE_RANK() would allocate the same ranking and wouldn’t skip the rank. RANK() would do the same, only it would skip the rank sequence. ROW_NUMBER() would assign consecutive ranking, even if some rows have the same exchange rate. Further Reading: Overview of Ranking Functions in SQL How to Rank Rows in SQL: A Complete Guide What’s the Difference Between RANK and DENSE_RANK in SQL? What Is the RANK() Function in SQL, and How Do You Use It? How to Use the SQL RANK OVER (PARTITION BY) How to Number Rows in an SQL Result Set How to Use ROW_NUMBER OVER() in SQL to Rank Data Aggregate Window Function Example Here’s a nice example of how you can use the AVG() window function to calculate the average rate for each pair of currencies: SELECT date, currency_pair, ecb_rate, AVG(ecb_rate) OVER (PARTITION BY currency_pair) AS avg_rate_by_currency FROM exchange_rates ORDER BY currency_pair, date; Like in the regular aggregate AVG() function, you must write the column you want the average of. To make it a window function, use the OVER() clause. This time, I use PARTITION BY in the column currency_pair. By doing this, I’m dividing the data into subsets according to the currency pair. Simply put, I’m calculating the average rate for each currency pair separately. Have a look at the code output. It’s been sorted by currency pairs and date: datecurrency_pairecb_rateavg_rate_by_currency 2024-04-02EUR_CHF0.97650.9793 2024-04-03EUR_CHF0.97920.9793 2024-04-04EUR_CHF0.98460.9793 ………… 2024-04-02EUR_JPY163.0100164.1211 2024-04-03EUR_JPY163.6600164.1211 2024-04-04EUR_JPY164.6900164.1211 ………… 2024-04-02EUR_USD1.07491.0795 2024-04-03EUR_USD1.07831.0795 2024-04-0EUR_USD1.08521.0795 The EUR vs. CHF average rate is 0.9793, and this value is repeated for every EUR vs. CHF row. When the window function reaches the next currency pair, the average resets and is calculated anew; for EUR vs. JPY, it’s 164.1211. Finally, the average for EUR vs. USD is 1.0795. This window function allowed me to calculate averages separately and show the average values without collapsing individual rows. In other words, I can see every daily rate alongside the average for that currency pair. Further Reading: How to Use SUM() with OVER(PARTITION BY) in SQL COUNT OVER PARTITION BY: An Explanation with 3 Examples Analytic Window Function Examples In this section, I’ll show three examples of different analytic SQL window functions. LAG() The LAG() example will demonstrate how to calculate daily change. This function is used for accessing the value from the preceding rows. Another analytic window function is LEAD(), which does exactly the opposite – it fetches the data from the following rows. Both functions have basically the same syntax; you just change the name of the function. In this example, I want to calculate the daily change in exchange rates: SELECT date, currency_pair, ecb_rate, ecb_rate - LAG(ecb_rate) OVER (PARTITION BY currency_pair ORDER BY date) AS daily_rate_change FROM exchange_rates; LAG() is a function for accessing the value(s) in the row(s) preceding the current row. In the above example, I specify the column ecb_rate in LAG(). This means I want to access the value of the ECB rate. I don’t explicitly specify the number of rows I want to go back to, so it will be the default value of one row. OVER() comes after the function name. Here, I partition the dataset by currency pair since I want to calculate the daily rate change for each currency pair separately. I also use ORDER BY to sort the data within the partitions. Since the logic is to go back one day, the data must be ordered ascendingly by date. So, the LAG() part of the code represents the previous day's exchange rate. To get the daily difference, I simply subtract this value from the current exchange rate (ecb_rate - LAG(ecb_rate)). Here’s the output: datecurrency_pairecb_ratedaily_rate_change 2024-04-02EUR_CHF0.9765NULL 2024-04-03EUR_CHF0.97920.0027 2024-04-04EUR_CHF0.98460.0054 2024-04-05EUR_CHF0.9793-0.0053 ………… 2024-04-02EUR_JPY163.0100NULL 2024-04-03EUR_JPY163.66000.6500 2024-04-04EUR_JPY164.69001.0300 2024-04-05EUR_JPY164.1000-0.5900 ………… 2024-04-02EUR_USD1.0749NULL 2024-04-03EUR_USD1.07830.0034 2024-04-04EUR_USD1.08520.0069 2024-04-05EUR_USD1.0841-0.0011 ………… The first row is NULL because there is no previous date, so the difference can’t be calculated. In the next row, the daily rate change is 0.9792-0.9765 = 0.0027. The same principle of taking the previous row value and subtracting it from the current is continued with every row. Since the dataset is partitioned by currency pair, the calculation is reset once it reaches other currency pairs, i.e. EUR vs. JPY and EUR vs. USD. Further Reading: • The LAG Function and the LEAD Function in SQL FIRST_VALUE() The FIRST_VALUE() window function can be used on our data to calculate the differences between the current and the first exchange rate of the month. I can do that because FIRST_VALUE() returns the first value within the partition. Its opposite is LAST_VALUE(), which returns the last value in the partition. Both functions have basically the same syntax; only the name of the function is different (and the outcome, of course!). SELECT date, currency_pair, ecb_rate, ecb_rate - FIRST_VALUE(ecb_rate) OVER (PARTITION BY currency_pair ORDER BY date) AS difference_current_first FROM exchange_rates ORDER BY currency_pair, date; I want the rate value; that’s why there’s the column ecb_rate in FIRST_VALUE(). The data is again partitioned by currency pair, as I want a calculation for each pair. OK, so FIRST_VALUE() is used to fetch the value from the first row in the partition. But what happens if I order the data in the partition ascendingly by date? That’s right; the first row is the one with the rate from the month’s first day. In our case, it’s the first exchange rate in April. Now, subtract this from the current exchange rate. Since we know our data is only for April, we get the difference between the current and first rates for that month. datecurrency_pairecb_ratedifference_current_lowest 2024-04-02EUR_CHF0.97650.0000 2024-04-03EUR_CHF0.97920.0027 ………… 2024-04-12EUR_CHF0.9716-0.0049 2024-04-02EUR_JPY163.01000.0000 2024-04-03EUR_JPY163.66000.6500 ………… 2024-04-12EUR_JPY163.16000.1500 2024-04-02EUR_USD1.07490.0000 2024-04-03EUR_USD1.07830.0034 ………… 2024-04-12EUR_USD1.0652-0.0097 When the difference is 0, the current and the earliest rates are the same. For EUR vs. CHF, the first rate is 0.9765. Let’s check the first two rows: 0.9765 - 0.9765 = 0.0000; 0.9792 - 0.9765 = 0.0027. The same principle is applied to the other two currency pairs. NTILE() The last example I’ll show is the NTILE() function, which divides the window (or partition) into groups. The argument in the NTILE() function parentheses specifies the number of groups you want your dataset to be divided into. The splitting will be done chronologically by ordering the data ascendingly by date: SELECT date, currency_pair, ecb_rate, NTILE(3) OVER (ORDER BY date) AS group_number FROM exchange_rates; And here’s the result: datecurrency_pairecb_rategroup_number 2024-04-02EUR_USD1.07491 2024-04-02EUR_JPY163.01001 ………… 2024-04-04EUR_CHF0.98461 2024-04-05EUR_USD1.08412 2024-04-05EUR_JPY164.10002 ………… 2024-04-09EUR_CHF0.98192 2024-04-10EUR_USD1.08603 2024-04-10EUR_JPY164.89003 ………… 2024-04-12EUR_CHF0.97163 The data is split into three groups. Since there are 27 rows of data in total, it could be split into equal groups of nine rows. Further Reading: 6 Examples of NTILE() Function in SQL | LearnSQL.com Window Functions vs. GROUP BY and Aggregate Functions Aggregate window functions, as you can assume from their name, are used to aggregate data. But what about ‘regular’ aggregate functions and GROUP BY? They, too, are used to aggregate data. So, how are the window functions different? The main difference is that aggregate window functions (and window functions in general) don’t collapse individual rows while showing the aggregated value. On the other hand, GROUP BY and aggregate functions can only show the aggregated values; they collapse individual rows. In simple words, window functions let you show the analytical and aggregate data at the same time. Further Reading: Aggregate Functions vs. Window Functions: A Comparison SQL Window Functions vs. GROUP BY: What’s the Difference? Differences Between GROUP BY and PARTITION BY Advanced SQL Window Function Usage Window functions have extensive usage in data analytics, as they can solve many business reporting requirements. Here’s an overview of some of the more complex window function usages. Ranking Yes, I already discussed ranking and showed you an example of how to do it. There are three window functions for ranking data: ROW_NUMBER(), RANK(), and DENSE_RANK(). They don’t necessarily return the same result, as they all have slightly different ways of ranking data. Which one you’ll use depends on your data and what you want to achieve. Further Reading: Overview of Ranking Functions in SQL How to Rank Rows in SQL: A Complete Guide What’s the Difference Between RANK and DENSE_RANK in SQL? What Is the RANK() Function in SQL, and How Do You Use It? How to Use the SQL RANK OVER (PARTITION BY) How to Number Rows in an SQL Result Set How to Use ROW_NUMBER OVER() in SQL to Rank Data Running Totals and Moving Averages Both these calculations are usually used in time series analysis. Time series are data showing values at certain points of time. Analyzing such data is, well, time series analysis. Its purpose is to unveil trends in data and find possible causes of significant deviations from the trend. The running total (or cumulative sum) is the sum of the current row and all the preceding rows’ values. As you move to the future, the time frame size increases by one row/data point and the value is added to the previous row’s running total. Moving averages are the average value of the last n periods. As you move to the future, the time frame moves – but its size remains the same. This is heavily used in the financial industry, e.g. a 5-day moving average in stock price analysis. That way, the average price is continuously updated, and the impact of significant short-term changes to a stock price is neutralized. The rolling total is a mixture of the running total and the moving average. It is the sum of the last n periods. For example, a 7-day rolling total is the cumulative sum over the period of 7 previous days. That way, you can see if the trends in the weekly sum without typical day of the week fluctuations. Difference Between Two Rows or Time Periods The window function used for calculating a difference between two rows is LAG(), which allows you to access values from the preceding rows. The difference between two time periods is basically the same thing; it only refers to finding differences when working with time series. In the examples section, I showed you how to do this. Analyzing Time Series Window functions work like a charm when you need to analyze time series. There’s not only the LAG() function to do that, but many others. Further Reading: Analyze Time Series COVID-19 Data with Window Functions How to Calculate the Length of a Series with SQL How to Analyze a Time Series in SQL Common Problems with SQL Window Functions There are several issues everybody using window functions runs into sooner or later: Confusing window functions with aggregate functions and GROUP BY, which we already discussed. Trying to use window functions in WHERE. This can’t be done because SQL processes WHERE conditions before the window functions. Trying to use window functions in GROUP BY, which is also not allowed because of SQL’s order of the operations: window functions are executed after GROUP BY. Additional Resources for Practicing SQL Window Functions SQL window functions are among the most useful SQL tools data analysts have. This is especially true when you go beyond basic reports and require sophisticated calculations and the ability to show analytical and aggregated data simultaneously. All the topics I covered in this article require further practice on practical examples, which you can find in the following articles and courses: 11 SQL Window Functions Exercises with Solutions Top 10 SQL Window Functions Interview Questions SQL Window Functions Cheat Sheet Window Functions Course Window Functions Practice Set Remember – practice makes perfect! So don’t just read the articles; make sure to get some hands-on coding practice as well. Happy learning! Tags: window functions learn sql