# The SQL OVER() Clause - When and Why to Use It

*The OVER() clause, or how to reach a whole new level of data analysis. *

Why all the commotion around the SQL `OVER()`

clause? I’ll tell you why: the `OVER()`

clause introduces window functions.

And what are window functions in SQL, you might ask? They are functions that operate on a window, i.e. a set of rows related to the current row. In that way, they are similar to the `GROUP BY`

clause and to aggregate functions.

So why does knowing `OVER()`

unlock a new level of data analysis? If you think about aggregate functions and GROUP BY, you’ll realize that they do aggregate data. But what you get is only the aggregated values. In other words, you lose all the individual rows.

With window functions, you can show individual row values and aggregate values at the same time. If you’re not familiar with window functions in SQL, this is a perfect moment to let you know about our Window Functions course with 218 interactive exercises. You have an internet browser, so all you need to enjoy the course is some basic SQL knowledge, like how to use `JOINs`

, `WHERE`

, `GROUP BY`

, and `HAVING`

.

For those of you who already use window functions, our Window Functions Practice Set will help maintain your coding form. Get your free Window Functions Cheat Sheet and you’re ready to go!

Let’s answer the question of when and why you should use the `OVER()`

clause in four practical examples. But first, let’s examine the data.

## Data for Showcasing the OVER() Clause

There’s a chain of bookstores selling, quite predictably, books. They record their sales daily using two rather simple tables. The first one is ** author**, which has three columns.

`id`

– The author’s ID and the table’s primary key (PK).`first_name`

– The author’s first name.`last_name`

– The author’s last name.

The other table is ** book_sales**:

`id`

- The sales ID and the table’s primary key (PK).`date`

– The sales date.`book_title`

– The book’s title.`sales`

– The number of book copies sold.`author_id`

– The author’s ID and the table’s foreign key (FK), referencing the table.`author`

To help you visualize what this data looks like, here’s what you get when joining these two tables:

id | first_name | last_name | id-2 | date | book_title | sales | author_id |
---|---|---|---|---|---|---|---|

1 | Hanya | Yanagihara | 1 | 2022-02-01 | A Little Life | 14 | 1 |

2 | Maya | Angelou | 2 | 2022-02-01 | I Know Why the Caged Bird Sings | 9 | 2 |

3 | Milan | Kundera | 3 | 2022-02-01 | The Unbearable Lightness of Being | 1 | 3 |

The above data tells you that:

- 14 copies of Hanya Yanagihara’s “A Little Life” were sold on 1 February 2022.
- On the same date, 9 copies of “I Know Why the Caged Bird Sings” by Maya Angelou were sold.
- Milan Kundera’s “The Unbearable Lightness of Being” was sold only once on that day.

Data is presented the same way for all other titles.

## Example 1: OVER() Without Additional Clauses

The easiest way of working with SQL window functions is using `OVER()`

without its optional clauses. I’ll demonstrate how this works by showing every book’s sales in relation to the total sales on that date.

Here’s how the code should be written:

SELECT date, first_name, last_name, book_title, sales, SUM(sales) OVER() AS total_daily_sales, sales::DECIMAL/SUM(sales) OVER()*100 AS percent_of_daily_sales FROM author JOIN book_sales ON author.id =book_sales.author_id WHERE date = '2022-02-01' GROUP BY date, first_name, last_name, book_title, sales;

After listing the date, author’s name, and book title, I selected the column `sales`

. It represents the daily sales of a book title.

The next step is to show the total sales on the desired date. To do this, an ordinary aggregate function has to become a window aggregate function. This transformation is achieved by writing the `OVER()`

clause and leaving the parentheses empty.

To get the sales for 1 February 2022, I need to filter data using the `WHERE`

clause. If I didn’t apply this filter, using the `OVER()`

clause would get me the sum of overall sales.

Now I need to divide the sales by the book with the total daily sales to get the sales percentage for each book. Since SQL doesn’t allow referencing the alias column of the calculation, I needed to write the above calculation again (or copy it, to be more honest). I also converted the data type into DECIMAL and multiplied the result by 100 to get a percentage. Data is fetched from both tables using the `JOIN`

clause.

In the end, data is grouped by the selected columns. Running the query will give you this result:

date | first_name | last_name | book_title | sales | total_daily_sales | percent_of_daily_sales |
---|---|---|---|---|---|---|

2022-02-01 | Hanya | Yanagihara | A Little Life | 14 | 80 | 17.5 |

2022-02-01 | Hanya | Yanagihara | To Paradise | 42 | 80 | 52.5 |

2022-02-01 | Maya | Angelou | Gather Together in My Name | 7 | 80 | 8.75 |

2022-02-01 | Maya | Angelou | I Know Why the Caged Bird Sings | 9 | 80 | 11.25 |

2022-02-01 | Milan | Kundera | The Book of Laughter and Forgetting | 7 | 80 | 8.75 |

2022-02-01 | Milan | Kundera | The Unbearable Lightness of Being | 1 | 80 | 1.25 |

You can interpret the result in the following way. The total daily sales for 1 February 2022 is 80. Of those 80, 14 copies sold refer to “A Little Life”, which is 17.5% of the total daily sales. You can go through the result and see the percentage for other titles.

## Example 2: OVER(ORDER BY)

The second example will show you how to rank sales using the DENSE_RANK() function (which does not omit ranking numbers) with `ORDER BY`

in the `OVER()`

clause.

The `ORDER BY`

clause tells the window function in which order to perform its calculations. It includes both the column name and the order of the calculation (descending (10-1) or ascending (1-10)).

SELECT date, first_name, last_name, book_title, sales, DENSE_RANK() OVER(ORDER BY sales DESC) FROM author JOIN book_sales ON author.id = book_sales.author_id;

The info I want to see is the sales date, author’s name, book title, and the number of copies sold on that day.

Remember, I want to rank these sales using the `DENSE_RANK()`

function. As in the previous example, I introduced the window function using the `OVER()`

clause. The parentheses aren’t empty this time: there’s `ORDER BY`

sales `DESC`

. What does that mean? That I want to rank data according to sales, from the highest to the lowest.

Have a look at the partial output:

date | first_name | last_name | book_title | sales | dense_rank |
---|---|---|---|---|---|

2022-02-02 | Hanya | Yanagihara | To Paradise | 67 | 1 |

2022-02-01 | Hanya | Yanagihara | To Paradise | 42 | 2 |

2022-02-03 | Hanya | Yanagihara | To Paradise | 34 | 3 |

2022-02-03 | Maya | Angelou | I Know Why the Caged Bird Sings | 21 | 4 |

2022-02-02 | Hanya | Yanagihara | A Little Life | 21 | 4 |

2022-02-03 | Hanya | Yanagihara | A Little Life | 19 | 5 |

The highest daily sales for any particular book was on 2 February 2022. It was Hanya Yanagihara’s latest title, “To Paradise”, which sold 67 copies. The same title holds the second and the third place. Two other titles sold 21 copies per day; these are “I Know Why the Caged Bird Sings” and “A Little Life”. They share fourth place, since the number of copies sold is the same. For information on why tied ranks are important and how different ranking window functions treat them, see this overview of SQL ranking functions.

## Example 3: OVER(PARTITION BY)

The second important clause in `OVER()`

is `PARTITION BY`

. This specifies the window of data over which the calculation will be executed. What that means will be more apparent after I show you how to find the daily sales and the maximum daily sales.

Here’s the code:

SELECT date, book_title, sales, MAX(sales) OVER(PARTITION BY date) AS highest_daily_sales FROM book_sales;

I want to output the date, book title, and the number of copies sold. There also needs to be info about the highest number of copies sold that day.

To achieve that, I use the `MAX()`

function. Again, the `OVER()`

clause is here to make it a window function. Inside the `OVER()`

, there’s `PARTITION BY date`

. This instructs the window function not to show the highest overall daily sales, but only the highest sales for that particular date.

The results might help you understand what that means:

date | book_title | sales | highest_daily_sales |
---|---|---|---|

2022-02-01 | A Little Life | 14 | 42 |

2022-02-01 | I Know Why the Caged Bird Sings | 9 | 42 |

2022-02-01 | The Unbearable Lightness of Being | 1 | 42 |

2022-02-01 | To Paradise | 42 | 42 |

2022-02-01 | Gather Together in My Name | 7 | 42 |

2022-02-01 | The Book of Laughter and Forgetting | 7 | 42 |

2022-02-02 | A Little Life | 21 | 67 |

2022-02-02 | I Know Why the Caged Bird Sings | 12 | 67 |

2022-02-02 | The Unbearable Lightness of Being | 18 | 67 |

2022-02-02 | To Paradise | 67 | 67 |

2022-02-02 | Gather Together in My Name | 8 | 67 |

2022-02-02 | The Book of Laughter and Forgetting | 12 | 67 |

This is, again, only part of the output. You see that the output lists all sales on 1 February 2022. It also shows the highest sales for that day was 42 copies. If you go a few rows lower, you’ll discover 42 copies is how many “To Paradise” sold, which makes it the best-selling book of that day.

On 2 February 2022, the highest sales were 67, which is again “To Paradise”.

Now it’s obvious what the `PARTITION BY`

clause does. It divides data into windows (in this case, according to date), performs the calculation for the first window, restarts the calculation at the second window (another date), and so on.

## Example 4: OVER(PARTITION BY ORDER BY)

Now, I’m going all in and showing you how the window functions work when you use both the `PARTITION BY`

and the `ORDER BY`

clauses in `OVER()`

.

The example asks you to calculate the cumulative sum of each title’s sales. As you’d imagine, I’ll use the `SUM()`

window function:

SELECT date, book_title, SUM(sales) OVER (PARTITION BY book_title ORDER BY date) AS cum_sum FROM book_sales;

I need only three rows: date, book title, and the cumulative sum of sales. The first two are easy, but how do I get this cumulative amount?

I use the `SUM()`

function on the sales column. Then I introduce the window function via the `OVER()`

clause. I want to calculate the cumulative sum for every book separately, so I partition data by book title.

The cumulative sum with the data I have would mean going through sales chronologically and adding the current day’s sales to the previous sales. The ORDER BY instructs the window function to sum data exactly that way: from the earliest date to the latest. When not specified, the order is ascending by default. Again, remember that the `ORDER BY`

in the `OVER()`

clause **does not** order the output. It orders data within the window frame, i.e., influences the order of the calculation.

And the full-scale output is:

date | book_title | cumulative_sum |
---|---|---|

2022-02-01 | A Little Life | 14 |

2022-02-02 | A Little Life | 35 |

2022-02-03 | A Little Life | 54 |

2022-02-01 | Gather Together in My Name | 7 |

2022-02-02 | Gather Together in My Name | 15 |

2022-02-03 | Gather Together in My Name | 17 |

2022-02-01 | I Know Why the Caged Bird Sings | 9 |

2022-02-02 | I Know Why the Caged Bird Sings | 21 |

2022-02-03 | I Know Why the Caged Bird Sings | 42 |

2022-02-01 | The Book of Laughter and Forgetting | 7 |

2022-02-02 | The Book of Laughter and Forgetting | 19 |

2022-02-03 | The Book of Laughter and Forgetting | 23 |

2022-02-01 | The Unbearable Lightness of Being | 1 |

2022-02-02 | The Unbearable Lightness of Being | 19 |

2022-02-03 | The Unbearable Lightness of Being | 23 |

2022-02-01 | To Paradise | 42 |

2022-02-02 | To Paradise | 109 |

2022-02-03 | To Paradise | 143 |

The output tells you that “A Little Life” sold 14 copies on 1 February 2022. The cumulative sum on 2 February is 35, while the cumulative sum on 3 February is 54, which is also the total number of copies sold. You can interpret the rest of the data in the same way.

I believe it’s clear how the cumulative sum (or running total) works. If it’s not, read this article on computing running totals in SQL.

## When Is OVER() Useful in Business?

The practical examples I showed you explain the uses of the `OVER()`

clause from a technical standpoint. Let’s see how this translates to practical business use!

### Creating Rankings

SQL window functions such as `ROW_NUMBER()`

, `RANK()`

, and `DENSE_RANK()`

are helpful when creating various rankings or in numbering rows. Some business examples include creating Top 5, Top 10, or Top *n* reports of the highest or the lowest selling products, highest or lowest-paid employees, highest or lowest streamed songs or movies, or the website/app users with the highest and lowest activity.

### Calculating Cumulative Sums

From the last example, you already learned how to calculate running totals (cumulative sums) using the `SUM()`

window function. Some examples of business use are calculating cumulative sales, costs, streams, website/app activities, etc.

### Calculating Moving Sums and Moving Averages

By applying the ROWS BETWEEN clause with the `SUM()`

and `AVG()`

window functions, you can calculate the moving sum and the moving average. They are cumulative sums and cumulative averages with set starting and ending time periods. This means you could, for instance, calculate the sum of the last three days or the average of the last seven days. In practice, this is used again on summing the revenue over time (day, week, month, etc.) or FX rates.

### Time Series Analysis

To perform a time series analysis in SQL, you should use the `LEAD()`

and LAG() window functions. They make it possible to go a certain number of rows back or forward compared to the current row. This is helpful when you prepare day-to-day, month-to-month, year-to-year and similar reports. It’s used to show the difference (delta) between the previous and current period or the percentage of increase or decrease. You could compare budgeted and/or actual sales, salaries, costs, or any data showing time series, such as COVID-19 data.

### Finding the *N*th Value

If you want to find the first or last values in the dataset, the `MIN()`

, `MAX()`

, `FIRST_VALUE()`

, and `LAST_VALUE()`

functions are the way to go. If you don’t want to limit yourself only to the first or last values, you should use the `NTH_VALUE()`

window function. It will get you the *n*th value (6th, 1,000th, etc.) from a dataset.

One popular business example of this is finding the highest/lowest paid employee in a department or company or finding the third, the fifth, or the *n*th highest/lowest paid employee. You could do the same with the product sales, product costs, loan applications, borrowed books, streams, etc.

## See for Yourself How the `OVER()`

Clause Works

You’ve learned how to use `OVER()`

alone and with any combination of its additional `ORDER BY`

and `PARTITION BY`

clauses. I think you realize how useful it is in elevating your data analysis skills and helping you in endless practical business situations.

Don’t let what you’ve learned here evaporate! Have a look at our Window Functions course to practice the `OVER()`

clause and discover even more situations for its practical use.