5th Oct 2023 14 minutes read

BigQuery Window Functions Explained

Table of Contents

What’s BigQuery?
What Are SQL Window Functions?
- Why Are They Called Window Functions?
Syntax of BigQuery Window Functions
SQL Window Functions Available in BigQuery
Practical Uses of BigQuery Window Functions
Next Steps with BigQuery Window Functions

Keeping up with data analysis trends gives your organization – and your CV – the cutting edge. In this article, we'll look at BigQuery window functions and how you can use them to gain deeper insights into your data.

Google's BigQuery, launched in 2010, is gaining traction as a popular choice with organizations needing to analyze large quantities of information quickly and to compare their own data against statistical data in the public domain.

Since Google aligned BigQuery's data retrieval language to conform with standard SQL— and included advanced features such as SQL window functions—its popularity has increased. Many organizations now include BigQuery skills as a must-have, and this means that SQL skills are more in demand than ever. This article explains why knowledge of SQL is essential to working with BigQuery

SQL continues to hold its place as the leading skill for anyone who needs to work with data. If you're not yet an SQL guru, you may be interested in our SQL from A to Z learning track. It includes 7 courses that take you from absolute beginner to SQL expert. The course has hundreds of real-world coding challenges and takes about 84 hours to complete. Since you can access the sample databases through your browser, you don't need to install any software to get started.

What’s BigQuery?

Google BigQuery is a high-speed data warehouse located in the Cloud. Designed specially to hold data used for analysis, it can process petabytes of data in minutes. If you haven't yet got your head around petabytes, a petabyte is one quadrillion bytes (or a million gigabytes).

You pay for what you use in BigQuery, so the cost of storing and analyzing huge quantities of data usually works out much lower than investing in lots of hard disks. And since all kinds of useful statistical data from governments and world organizations are stored publicly in BigQuery, you can access them to gain insights into how your organization's performance could be enhanced.

What Are SQL Window Functions?

Window functions are also known as analytical functions or OVER functions. They were added to the SQL standard in 2003, and most major database vendors began implementing them from 2010 onwards. They are thus a fairly new addition to SQL.

Before window functions were included in SQL, you could either list individual rows or calculate aggregates such as totals and averages. You couldn't easily do both in the same query unless you wrote complex—and probably slow and inefficient—subqueries.

This means you could either have a list like this …

Student ID	Subject	Score
1	Math	63
1	Science	50
2	Math	59

… or you could show aggregates like this:

Subject	Class Average
Math	52
Science	61
English	55
Overall Average	56

In this example, if you wanted to know how an individual student's score compared to the class average, you'd have to look at the two reports side by side. It would be much nicer if you could see the class average on the same row as a student's score, like this:

Student ID	Subject	Score	Class Average
1	Math	63	62
1	Science	50	61
2	Math	59	52

As I mentioned, you could achieve this using subqueries. But subqueries are notoriously slow to run and can make your query very complicated.

This is the kind of thing window functions allow you to do easily and efficiently: include aggregates alongside details on the same row.

Why Are They Called Window Functions?

They're called window functions because, while looking at an individual row, you can also "look through the window" and extract information from the entire dataset or rows related to the current row.

As an example, let's look at a table of students' grades:

Student ID	Teacher ID	Subject	Grade
1	1	Math	63
2	1	Math	80
3	2	Math	60
4	2	Math	45
5	1	Math	52
6	1	Math	70
7	2	Math	65
1	2	Science	70
2	2	Science	62
3	2	Science	90
4	2	Science	30
5	2	Science	53
1	5	English	59
3	5	English	70
5	5	English	45
6	5	English	62
1	12	History	55
3	12	History	67
4	12	History	58

As per the previous example, we want to show the class average next to each student, so that a single row looks like this:

Student ID	Subject	Grade	Class Average
4	Math	45	62

To do this, we need to look at the average of all the other students when we're extracting this row and show the result alongside the other data.

In SQL, window functions use a sliding window of rows to extract additional information from either the entire dataset, or a subset related to the current row.

In the above diagram, the current row is a math result, and the window includes all the math grades.

When we're extracting this row …

Student ID	Subject	Grade	Class Average
1	Science	70	62

… we'll need the window to "slide" so we're viewing all the science results in order to calculate the average for science.

Window functions have many of the same capabilities as the GROUP BY clause, but the difference is that they allow us to view aggregates and details side by side.

Syntax of BigQuery Window Functions

The OVER() Clause

The OVER() clause indicates that you're using a window function. You'd write your query as usual, and include the aggregates you want alongside the other column names. Each aggregate is identified with the OVER() clause. When you use this clause on its own, the 'window' is the entire dataset. I'll cover sliding windows a bit later in the article.

For example, if you wanted to extract all the math results and show the class average and the highest and lowest grade against each student’s grade, your query would look like this:

SELECT 
  student_id,
  grade,
  AVG(grade) OVER() AS average,
  MIN(grade) OVER() AS lowest,
  MAX(grade) OVER() AS highest
FROM exam_results
WHERE subject = 'Math';

Your results would then look like this:

student_id	grade	average	lowest	highest
1	63	62	45	80
2	80	62	45	80
3	60	62	45	80
4	45	62	45	80
5	52	62	45	80
6	70	62	45	80
7	65	62	45	80

The PARTITION BY Clause

This clause uses a sliding window. Instead of a window containing the entire dataset, it includes only a partition (or part) of the set.

In the previous example, I included only the math results by excluding all others using the WHERE clause. If you wanted a report that showed results for all subjects but calculated the average using only the rows where the subject matched the current row, you would use the PARTITION BY clause:

SELECT 
  student_id,
  subject,
  grade,
  AVG(grade) OVER(PARTITION BY subject) AS average
FROM exam_results;

Have a look at this color-coded copy of the students’ grades table to see how the partitions will work:

Student ID	Teacher ID	Subject	Grade
1	1	Math	63
2	1	Math	80
3	2	Math	60
4	2	Math	45
5	1	Math	52
6	1	Math	70
7	2	Math	65
1	2	Science	70
2	2	Science	62
3	2	Science	90
4	2	Science	30
5	2	Science	53
1	5	English	59
3	5	English	70
5	5	English	45
6	5	English	62
1	12	History	55
3	12	History	67
4	12	History	58

When processing each row, the rows included in the window change based on the value in the subject column. This means the average is calculated only for the partition of the dataset where the subject matches the current row. You can visualize it like this:

The results would look like this:

Student ID	Subject	Grade	Class Average
1	Math	63	62
2	Math	80	62
3	Math	60	62
4	Math	45	62
5	Math	52	62
6	Math	70	62
7	Math	65	62
1	Science	70	61
2	Science	62	61
3	Science	90	61
4	Science	30	61
5	Science	53	61
1	English	59	59
3	English	70	59
5	English	45	59
6	English	62	59
1	History	55	60
3	History	67	60
4	History	58	60

The ORDER BY Clause

The ORDER BY clause within the OVER() function uses a different type of sliding window. When you use OVER(ORDER BY column_name), the window includes only those rows where the value of the specified column is less than or equal to that column’s value in the current row.

The ORDER BY clause is useful for calculating running totals and moving averages. As an example, I'll use a table called monthly_transactions that contains bank account transactions:

account_id	tran_date	transaction	value
1	2023-09-01	Opening Balance	500.00
1	2023-09-03	Deposit	137.45
1	2023-09-12	Withdrawal	-200.00
1	2023-09-18	Withdrawal	-250.00
2	2023-09-01	Opening Balance	1200.00
2	2023-09-14	Deposit	900.00
2	2023-09-20	Purchase	-318.90

The query below will list the transactions for Account ID 1, showing a running balance.

SELECT 
  account_id,
  tran_date,
  transaction,
  value,
  SUM(value) OVER(ORDER BY tran_date) AS balance
FROM monthly_transactions
WHERE account_id = 1;

Including the ORDER BY clause within the OVER clause controls a sliding window.

If you wanted to, you could also use the usual ORDER BY clause at the end of the query to control the final order of the rows on the report. They don’t have to be shown in the original order.

By default, using the ORDER BY inside the OVER clause causes the window to slide so that it only views rows where the date is less than or equal to the date of the current row. There are other keywords that can change this default, but they’re a bit beyond the scope of this article.

You can visualize it like this:

The results would look like this:

account_id	tran_date	transaction	value	balance
1	2023-09-01	Opening Balance	500.00	500.00
1	2023-09-03	Deposit	137.45	637.45
1	2023-09-12	Withdrawal	-200.00	437.45
1	2023-09-18	Withdrawal	-250.00	187.45
2	2023-09-01	Opening Balance	1200.00	1200.00
2	2023-09-14	Deposit	900.00	2100.00
2	2023-09-20	Purchase	-318.90	1781.10

Using ORDER BY with the PARTITION BY Clause

If you want to show all accounts with their running balances, you can use PARTITION BY and ORDER BY together:

SELECT 
  account_id,
  tran_date,
  transaction,
  value,
  SUM(value) OVER(PARTITION BY account_id ORDER BY tran_date) AS balance
FROM monthly_transactions
ORDER BY account_id, tran_date

The PARTITION BY clause will cause the sliding window to only include rows where the account_id matches the current row. The ORDER BY clause will cause it to only include rows within that partition where the date is less than or equal to the date of the current row.

The results would be:

account_id	tran_date	transaction	value	balance
1	2023-09-01	Opening Balance	500.00	500.00
1	2023-09-03	Deposit	137.45	637.45
1	2023-09-12	Withdrawal	-200.00	437.45
1	2023-09-18	Withdrawal	-250.00	187.45
2	2023-09-01	Opening Balance	1200.00	1200.00
2	2023-09-14	Deposit	900.00	2100.00
2	2023-09-20	Purchase	-318.90	1781.10

Each account has its own separate running balance.

SQL Window Functions Available in BigQuery

You've seen how common SQL aggregate functions such as SUM(), AVG(), MIN(), and MAX() can be used in conjunction with the OVER clause to extract aggregates from a window of data.

Google BigQuery, in common with many other SQL dialects, has additional functions that can give deeper insights into the data. Here are some examples.

RANK()

This function ranks the dataset from highest to lowest on a specified column. It can answer questions like:

Where did each student place in the class, based on exam results?
Which products were most profitable?
Which customers spent the most money?
Which warehouse received the most complaints?

Using the sample table we looked at earlier, let’s rank the students by exam results using this query:

SELECT 
  student_id,
  subject,
  grade,
  RANK() OVER(PARTITION BY subject ORDER BY grade desc) AS place
FROM exam_results
ORDER BY subject, grade DESC;

The result would be:

student_id	subject	grade	place
3	English	70	1
6	English	62	2
1	English	59	3
5	English	45	4
3	History	67	1
4	History	58	2
1	History	55	3
2	Math	80	1
6	Math	70	2
7	Math	65	3
1	Math	63	4
3	Math	60	5
5	Math	52	6
4	Math	45	7
3	Science	90	1
1	Science	70	2
2	Science	62	3
5	Science	53	4
4	Science	30	5

When calculating these results, the PARTITION BY subject clause causes SQL to look at only the results for the same subject as the current row. The ORDER BY grade DESC clause sorts them in descending order of grade. The RANK() function then ranks the students in this order.

Since student 3 has the highest grade in English, his rank is 1; student 6, the next highest, is ranked 2 in this subject. Student 2 has the highest grade in Math and is ranked as 1.

DENSE_RANK()

DENSE_RANK() is used for the same purpose as RANK. The difference between them can best be explained by looking at these query results, which represent scores in a shooting competition.

Using RANK, the query would be:

SELECT 
  competitor_no,
  score,
  RANK() OVER(ORDER BY score desc) AS rank
FROM match_results
ORDER BY score DESC;

The results are:

Competitor No	Score	Rank
4	85	1
5	83	2
10	83	2
9	81	4
2	76	5
6	76	5
7	72	7
3	70	8
8	68	9
1	62	10

Competitors 5 and 10 tied for second place, and both received a ranking of 2. Competitor 9 is the next highest, and ranks as 4. Third place is left out.

Using DENSE_RANK, the query is:

SELECT 
  competitor_no,
  score,
  DENSE RANK() OVER(ORDER BY score desc) AS rank
FROM match_results
ORDER BY score DESC;

The results are:

Competitor No	Score	Rank
4	85	1
5	83	2
10	83	2
9	81	3
2	76	4
6	76	4
7	72	5
3	70	6
8	68	7
1	62	8

Competitors 5 and 10 are still both ranked as 2, but third place is not left out: Competitor 9 now has a ranking of 3.

Both functions have the same syntax. If we wanted to recode the previous students’ grades query using the DENSE_RANK() function, it would look like this:

SELECT 
  student_id,
  subject,
  grade,
  DENSE_RANK() OVER(PARTITION BY subject ORDER BY grade desc) AS class_place
FROM exam_results
ORDER BY subject, grade DESC

But because there are no tie values, the result would be the same.

ROW_NUMBER()

The ROW_NUMBER() function works in a similar way to the previous two functions, but the rows are simply numbered in order. If rows have the same value, they will be numbered consecutively, depending on which was encountered first. Here’s the results of the shooting competition query using ROW_NUMBER() instead of RANK() or DENSE_RANK():

Competitor No	Score	Rank
4	85	1
5	83	2
10	83	3
9	81	4
2	76	5
6	76	6
7	72	7
3	70	8
8	68	9
1	62	10

LAG()

This function allows you to compare data from the previous row in the result set to data in the current row. It's ideal for year-on-year comparisons, allowing you to discover trends and identify business performance issues.

LAG() and the related function LEAD() can only be used in conjunction with the OVER(ORDER BY) clause.

As an example, take the following table, which holds sales data for a small company:

year	sales_value	sales_quantity	profit
2019	54000	900	8000
2020	75000	1200	11000
2021	30000	450	1000
2022	60000	10000	7000

The sample query to compare year-on-year figures is:

SELECT 
  year,
  sales_value,
  sales_quantity,
  profit,
  LAG(sales_value) OVER(ORDER BY year) as ly_value,
  LAG(sales_quantity) OVER(ORDER BY year) as ly_qty,
  LAG(profit) OVER(ORDER BY year) as ly_profit,
  profit - LAG(profit) OVER(ORDER BY year) as inc_dec
FROM annual_sales
ORDER BY year;

The results of this query are:

year	sales_value	sales_quantity	profit	ly_value	ly_qty	ly_profit	inc_dec
2019	54000	900	8000	NULL	NULL	NULL	NULL
2020	75000	1200	11000	54000	900	8000	3000
2021	30000	450	1000	75000	1200	11000	-10000
2022	60000	10000	7000	30000	450	1000	6000

Let's look at this line of the query and see what it actually did:

LAG(sales_value) OVER(ORDER BY year) as ly_value

On this row of the result …

year	sales_value	sales_quantity	profit	ly_value	ly_qty	ly_profit	inc_dec
2020	75000	1200	11000	54000	900	8000	3000

… our current row is for the year 2020. The LAG() function in conjunction with ORDER BY year causes SQL to look at the row for the previous year (2019) and extract the sales value from it under the heading ly_value.

You will notice that in the first row, the columns calculated by the LAG() function contain a null value, since there is no previous record.

LEAD()

The LEAD() function is the inverse of LAG(): it obtains data from the row after the current row, instead of the one before it.

To compare the profit between the current year, the previous year, and the subsequent year using the same sample table, the query would be:

SELECT 
  year,
  profit,
  LAG(profit) OVER(ORDER BY year) as ly_profit,
  profit - LAG(profit) OVER(ORDER BY year) as ly_inc_dec,
  LEAD(profit) OVER(ORDER BY year) as ny_profit,
  LEAD(profit) OVER(ORDER BY year) - profit as ny_inc_dec
FROM annual_sales
ORDER BY year;

The results of this query would be:

year	profit	ly_profit	ly_inc_dec	ny_profit	ny_inc_dec
2019	8000	NULL	NULL	11000	3000
2020	11000	8000	3000	1000	-10000
2021	1000	11000	-10000	7000	6000
2022	7000	1000	6000	NULL	NULL

The columns ny_profit and ny_inc_dec use the LEAD() function, which causes SQL to look at the next row in sequence to extract these fields. When the current row is for 2019, these two fields will be taken from the row for 2020.

This should have given you an idea of some of the useful SQL window functions available in BigQuery. You'll find a full list of functions in the BigQuery SQL documentation.

For more examples of window functions, refer to the article SQL Window Function Examples. You can also find a quick reference guide for the syntax of SQL window functions in our Window Functions Cheat Sheet.

Practical Uses of BigQuery Window Functions

In the real world, there are many ways that BigQuery window functions can give you insights to help your organization perform better. They are very powerful and allow you to produce complex reports very quickly.

Here are a few ideas of how BigQuery window functions may be used:

Ranking your employees by performance to give motivational rewards. Check out this article for more information on ranking rows.
Finding how each product line performed compared to other similar items.
Learning how individual products affect the moving average profit. You can read more about computing moving averages in SQL here.
Comparing year-on-year data to discover trends. Find out more about preparing year-on-year comparisons in SQL in this article.
Using running totals to be able to see exactly how many sales had been made at a given point in time. Learn about computing running totals in SQL here.

Next Steps with BigQuery Window Functions

Now that you've seen what SQL window functions can do in BigQuery and other database management systems, it's time to think about upping your skills in this important area.

A good place to start is LearnSQL.com's SQL Window Functions course. You'll learn step by step how to make use of the techniques you've seen in this article, with a full explanation of each topic. You'll also get plenty of practice, with over 200 interactive exercises to make sure you know how to extract complex information in real-world situations. Help is available when you get stuck, and you will be able to access sample databases through your browser. The course takes about 20 hours to complete.

If you really want to become an expert, you can get more practice in solving complex problems by working through our Window Functions Practice Set. You will tackle 100 examples of complex reports using three different databases.

You can also read this article on how to practice window functions, and you can find out more about BigQuery syntax here.

If you're job-hunting for a top-level job in data analysis or data science, it's very likely that you'll be asked to demonstrate your knowledge of SQL window functions at the interview. To get an idea of what kind of questions you might be asked (and how to answer them), here's an article that discusses top SQL window functions interview questions.

Take the first steps today to move your data analysis skills to the next level!

Tags:

student_id	grade	average	lowest	highest
1	63	62	45	80
2	80	62	45	80
3	60	62	45	80
4	45	62	45	80
5	52	62	45	80
6	70	62	45	80
7	65	62	45	80

student_id	grade	average	lowest	highest
1	63	62	45	80
2	80	62	45	80
3	60	62	45	80
4	45	62	45	80
5	52	62	45	80
6	70	62	45	80
7	65	62	45	80