Back to articles list October 12, 2017 - 5 minutes read SQL Window Functions By Explanation Ignacio L. Bisso Ignacio is a database consultant from Buenos Aires, Argentina. He’s worked for 15 years as a database consultant for IT companies like Informix and IBM. These days, he teaches databases at Sarmiento University and works as a PostgreSQL independent SQL consultant. A proud father of four kids with 54 years in his backpack, Ignacio plays soccer every Saturday afternoon, enjoying every match as if it’s his last one. Tags: how to in sql SQL basics SQL clauses sql queries window functions In our previous post, we explained how SQL window functions work by example. We started with some very simple, basic functions. Let's extend it by explaining subclauses in window functions. SQL window functions are a great way to compute results from a set of rows rather than a single row. As you know from our first article, the "window" in window function refers to the set of rows. We showed you some examples of simple window functions like RANK and LEAD. Today, we're going to use racecars to show you how to use the PARTITION BY and ORDER BY subclauses in window functions. That's right, we said racecars. Everyone, start your SQL engines! Getting Started Window functions can only be used in the SELECT or ORDER BY clauses of a SQL query. In a window function, all the rows returned by the SELECT are divided into windows (or frames). The criteria for deciding which rows belong to which window is defined by putting a PARTITION BY subclause inside the window function invocation. The window function then computes one result based on all the rows in one window. Let's see an example. The following table is a subset of Formula 1 racing results for 2016 and 2017. pilot_name circuit_name year time finish Alonso Monza 2016 1:57:06.32 true Hamilton Monza 2016 1:51:54.28 true Vetel Monza 2016 1:52:04.12 true Alonso Montecarlo 2016 0:43:14.73 false Hamilton Montecarlo 2016 1:12:09.12 true Vetel Montecarlo 2016 0:21:54.73 false Raikonen Montecarlo 2016 1:14:04.12 true Hamilton Monza 2017 1:13:16.97 true Vetel Monza 2017 1:11:39.12 true Raikonen Montecarlo 2017 0:43:14.73 false Alonso Montecarlo 2017 1:32:14.42 true Hamilton Montecarlo 2017 0:43:14.73 false Vetel Montecarlo 2017 1:33:04.12 true Suppose we want to create a report with the pilots' names, their time in every race (where they finished), the average time for that race, and the historical average time for that circuit. The next query will get this data for us: SELECT pilot_name, circuit_name, year, time, AVG(time) OVER (PARTITION BY circuit_name) avg_this_circuit, AVG(time) OVER (PARTITION BY circuit_name,year) avg_this_race FROM races WHERE finish = true ORDER BY year desc, circuit_name, time To understand how the avg_this_circuit column was obtained, let's look at what PARTITION BY did. It generated two windows: one for Monte Carlo and the other for Monza. Then the average was calculated based on all the rows in each set. Note that there are only two different values in the avg_this_circuit column. The avg_this_race column was generated in a similar way, except that the PARTITION BY divided the data into four windows: ("Montecarlo", 2017) ("Monza", 2017), ("Montecarlo", 2016) and ("Monza", 2016). Below, you can see the complete frames for each circuit and for each circuit-year: First Shall Also Be Last? For some window functions, it is important to order the rows within the window. You can do this using the ORDER BY subclause. Let's see how this works. First, we'll execute a query to obtain every pilot name, their time, their position in the race (via the RANK function), the time of the race winner (via the FIRST_VALUE function) and the delta time between this pilot and the winner. Here is the query: SELECT pilot_name, circuit_name, year, time pilot_time, RANK() OVER (PARTITION BY circuit_name, year ORDER BY time) as position, FIRST_VALUE(time) OVER (PARTITION BY circuit_name, year ORDER BY time) winner_time, time - FIRST_VALUE(time) OVER (PARTITION BY circuit_name, year ORDER BY time) difference_time FROM races WHERE finish = true ORDER BY year desc, circuit_name, time Suppose we now want to add the winner's name and the name of the last pilot to finish the race. This is easy: we just add two expressions to the query. (Note: We've shown these expressions in bold text). SELECT pilot_name, circuit_name, year, time pilot_time, RANK() OVER (PARTITION BY circuit_name, year ORDER BY time) as position, FIRST_VALUE(pilot_name) OVER (PARTITION BY circuit_name, year ORDER BY time) winner_pilot, LAST_VALUE(pilot_name) OVER (PARTITION BY circuit_name, year ORDER BY time) last_pilot FROM races WHERE finish = true ORDER BY year desc, circuit_name, time And here are the results: If we analyze the query results, we will find that the winner column is correct, but the last_pilot column is not. In fact, they show the same name: Vatel. How did this happen? Remember, the window function is calculated over a set of rows. It is dynamic, changing with every row returned by the SELECT. At every moment when the query is working, there is a current row in the result set and the window is formed by all the rows in the partition except any rows yet to be returned. When we calculate the last_pilot column, we are taking the last value in the window, which corresponds to the current row being added to the frame. The following image visualizes this scenario: Fixing the Problem The previous issue with last_pilot column is due to the FIRST_VALUE, LAST_VALUE, and NTH_VALUE functions. These consider only the rows within the window frame, which only contains rows up to the current row. Thus, LAST_VALUE does not return the correct value. In order to fix this, we need to add a frame specification (RANGE or ROWS) to the OVER clause. We can select a different range in the partition by using another subclause. Some of the most important options are: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (default) ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING We would add this subclause to the ORDER BY clause. Let's see how the first option works: SELECT pilot_name, circuit_name, year, time pilot_time, rank() OVER (PARTITION BY circuit_name, year ORDER BY time) as position, FIRST_VALUE(pilot_name) OVER (PARTITION BY circuit_name, year ORDER BY time) winner_pilot, LAST_VALUE(pilot_name) OVER (PARTITION BY circuit_name, year ORDER BY time ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) last_pilot FROM races WHERE finish = true ORDER BY year desc, circuit_name, time It worked! Learning More About SQL Window Functions This was just a short example of how we can use subclauses to power up SQL window functions. These might be one of the least-known parts of SQL, but they are powerful, flexible, and worth spending time to master. You can learn more about SQL window functions on our blog, or sign up for the Window Functions course to really dig into the subject! Tags: how to in sql SQL basics SQL clauses sql queries window functions You may also like SQL Window Functions Cheat Sheet This 2-page SQL Window Functions Cheat Sheet covers the syntax of window functions and a list of window functions. Download it in PDF or PNG format. Read more Why Window Functions Are Not Allowed in WHERE Clauses Window functions can only appear in SELECT and ORDER BY but not in WHERE clauses. The reason is the logical order in which SQL queries are processed. Read more You Want to Learn SQL? You've Come to the Right Place! If you want to learn SQL basics or enhance your SQL skills, check out LearnSQL.com for a wide range of SQL courses and tracks. Read more Referential Constraints and Foreign Keys in MySQL Foreign keys and referential constraints allow you to set relationships between tables and modify database engine’s actions. See how to use it in MySQL. Read more SQL Window Function Example With Explanations Interested in how SQL window functions work? Scroll down to see our SQL window function example with definitive explanations! Read more What Is Vertabelo’s SQL Cheat Sheet? Rock the SQL! You don’t have to be a programmer to master SQL. Download the SQL Cheat Sheet and find quick answers for the common problems with SQL queries. Read more SQL Window Functions vs. GROUP BY: What’s the Difference? Window functions and GROUP BY may seem similar at first, but they’re quite different. Learn how window functions differ from GROUP BY and aggregate functions. Read more When Do I Use SQL Window Functions? SQL window functions can help you quickly and accurately create useful reports and analyses. Learn more with real-world business examples. Read more Subscribe to our newsletter Join our weekly newsletter to be notified about the latest posts.