8th Apr 2020 6 minutes read SQL Window Functions vs. GROUP BY: What’s the Difference? Ignacio L. Bisso sql learn sql window functions Table of Contents Quick Review of GROUP BY GROUP BY vs Window Functions The Power of Window Functions Window Functions vs Aggregate Functions Closing the Window A very common misconception among SQL users is that there is not much difference between SQL window functions and aggregate functions or the GROUP BY clause. However, the differences are very significant. Perhaps the only similar point between GROUP BY and window functions is that both allow you to execute a function (such as AVG, MAX, MIN, or COUNT) on a group of records. I’d say that window functions’ special power is that they allow us to obtain results that otherwise would be almost impossible to achieve. In this article, we’ll go over the use of window functions vs GROUP BY and window functions vs aggregate functions. Quick Review of GROUP BY The GROUP BY clause allows us to group a set of records based on some criteria and apply a function (e.g. AVG or MAX) to each group, obtaining one result for each group of records. Let’s see an example. We have a table called employee with a total of five employees and three departments: Employee_NameDepartmentSalary John RobertsFinance2300 Peter HudsonMarketing1800 Sue GibsonFinance2000 Melinda BishopMarketing1500 Nancy HudsonIT1950 fig1: the employee table Suppose we want to obtain the average salary per department and the top salary for every department. We should use the following query: SELECT Department, avg(salary) as average, max(salary) as top_salary FROM employee GROUP BY department The image below shows the result: Departmentaveragetop_salary Marketing16501800 Finance21502300 IT19501950 GROUP BY vs Window Functions When comparing window functions and GROUP BY, it’s essential to remember that GROUP BY collapses the individual records into groups; after using GROUP BY, you cannot refer to any individual field because it is collapsed. Later, we will talk in depth about this topic. For now, we’ll just mention that window functions do not collapse individual records. So, if you want to create a report with an employee name, salary, and the top salary of the employee’s department, you can’t do it with GROUP BY. The individual records of each employee are collapsed by the GROUP BY department clause. For this kind of report, you need to use window functions, which is the topic of the next section. If you want to go deep into the nuances of SQL GROUP BY and reports, we recommend our interactive course Creating Basic SQL Reports. The Power of Window Functions Window functions are a powerful feature of SQL. They allow us to apply functions like AVG, COUNT, MAX, and MIN on a group of records while still leaving the individual records accessible. Since the individual records are not collapsed, we can create queries showing data from the individual record together with the result of the window function. This is what makes window functions so powerful. Suppose we want to obtain a list of employee names, salaries, and the top salary in their departments. SELECT employee_name, department, salary, max(salary) OVER (PARTITION BY department) as top_salary FROM employee The next image shows the result: Employee_NameDepartmentsalarytop_salary John RobertsFinance23002300 Peter HudsonMarketing18001800 Sue GibsonFinance20002300 Melinda BishopMarketing15001800 Nancy HudsonIT19501950 In the previous query, we used a window function: max(salary) OVER (PARTITION BY department) as top_salary The window function is MAX() and we applied it to the set of records defined by the clause OVER (PARTITION BY department), which are the records with the same value in the department field. Lastly, we renamed the column top_salary. In the query result, we have rows for individual employees. If we used GROUP BY instead of window functions, we would have rows for each department. Window functions have a rather verbose syntax; if you want to go into details, I suggest the “Window Functions” course, which is a step-by-step tutorial that takes you through SQL window functions using examples and exercises. Window Functions vs Aggregate Functions When we compare window functions and aggregate functions, we note a super powerful feature on the window functions side: positional functions. They allow us to obtain a column value from other records in the same window. This is a really amazing ability, allowing SQL users to create complex reports in just a few lines. Let’s briefly discuss two of these functions: LEAD() and LAG(). The LAG() function returns the column value of the previous record in the window, while LEAD() returns the column value from the next record in the window. It is very important to have the window ordered by the right column if you want to use these functions. Let’s see an example of how we can use these functions. Suppose we have a table that stores company shares with their market values at a point in time. The table might look like this: share_symboltimestampvalue OILBEST2020-03-05 10:00120 OILBEST2020-03-05 12:00123 OILBEST2020-03-05 15:00122 BANKWEB2020-03-05 10:0091 BANKWEB2020-03-05 12:0087 BANKWEB2020-03-05 15:0099 fig2: the share table Suppose we want a report showing every share’s value with its previous value and the variation percentage related to the previous value. We can do it by using the LEAD() function to obtain the share’s previous value. Note that we use ORDER BY timestamp when defining the partition (i.e. the window of records). We’ll go back to this point later. SELECT share_symbol, timestamp, value, LAG(value) OVER (PARTITION BY share_symbol ORDER BY timestamp ) AS previous_value, TRUNC(((value - (LAG(value) OVER (PARTITION BY share_symbol ORDER BY timestamp )))*100)/value,2) AS percentage_variation FROM share Notice that the columns previous_value and percentage_variation are calculated columns. They use values from different records in the same table. share_symboltimestampvalueprevious_valuepercentage_variation OILBEST2020-03-05 10:00120 OILBEST2020-03-05 12:001231202.43 OILBEST2020-03-05 15:00122123-0.81 BANKWEB2020-03-05 10:0091 BANKWEB2020-03-05 12:008791-4.59 BANKWEB2020-03-05 15:00998712.12 When we use positional functions, it is very important to put an ORDER BY clause along with the PARTITION clause (as we did in the previous query). If we don’t use the right ORDER BY, the results can be wrong. Why? Because positional functions work based on the order of the records in the window. Let’s examine this a little more. The FIRST_VALUE() function returns a column value from the first record in the window. LAG(), as we know, returns the column value of the previous record in the window. Having the right window order is crucial; imagine what you’d get from these functions otherwise! In our example, we want the previous chronological market value for a specific share. Thus, we used ORDER BY timestamp. If we omit the ORDER BY or we order by another column, the result would be wrong. In some specific cases, positional functions can return wrong values because of a partially populated window. And there are more window functions, like RANK(), NTH_VALUE() and LAST_VALUE(). We don’t have space to cover all this here, but I suggest checking out this article explaining window functions and these window function examples to learn more. Closing the Window In this article, we explored the differences between window functions and GROUP BY. We looked at examples with several aggregate and window functions. We also talked about an important limitation of the GROUP BY clause, namely the “collapse of records”. This limitation is not present on window functions, allowing SQL developers to combine record-level data with window function results in the same query. Another advantage of window functions is their capability to combine query values from different records (from the same window) in the same row of the result set. If you are interested in learning more about window functions, I suggest the Window Functions course, where you can learn SQL window functions using interactive exercises and detailed explanations. Tags: sql learn sql window functions