Back to articles list Articles Cookbook
14 minutes read

BigQuery Window Functions Explained

Keeping up with data analysis trends gives your organization – and your CV – the cutting edge. In this article, we'll look at BigQuery window functions and how you can use them to gain deeper insights into your data.

Google's BigQuery, launched in 2010, is gaining traction as a popular choice with organizations needing to analyze large quantities of information quickly and to compare their own data against statistical data in the public domain.

Since Google aligned BigQuery's data retrieval language to conform with standard SQL— and included advanced features such as SQL window functions—its popularity has increased. Many organizations now include BigQuery skills as a must-have, and this means that SQL skills are more in demand than ever. This article explains why knowledge of SQL is essential to working with BigQuery

SQL continues to hold its place as the leading skill for anyone who needs to work with data. If you're not yet an SQL guru, you may be interested in our SQL from A to Z learning track. It includes 7 courses that take you from absolute beginner to SQL expert. The course has hundreds of real-world coding challenges and takes about 84 hours to complete. Since you can access the sample databases through your browser, you don't need to install any software to get started.

What’s BigQuery?

Google BigQuery is a high-speed data warehouse located in the Cloud. Designed specially to hold data used for analysis, it can process petabytes of data in minutes. If you haven't yet got your head around petabytes, a petabyte is one quadrillion bytes (or a million gigabytes).

You pay for what you use in BigQuery, so the cost of storing and analyzing huge quantities of data usually works out much lower than investing in lots of hard disks. And since all kinds of useful statistical data from governments and world organizations are stored publicly in BigQuery, you can access them to gain insights into how your organization's performance could be enhanced.

What Are SQL Window Functions?

Window functions are also known as analytical functions or OVER functions. They were added to the SQL standard in 2003, and most major database vendors began implementing them from 2010 onwards. They are thus a fairly new addition to SQL.

Before window functions were included in SQL, you could either list individual rows or calculate aggregates such as totals and averages. You couldn't easily do both in the same query unless you wrote complex—and probably slow and inefficient—subqueries.

This means you could either have a list like this …

Student IDSubjectScore
1Math63
1Science50
2Math59

… or you could show aggregates like this:

SubjectClass Average
Math52
Science61
English55
Overall Average56

In this example, if you wanted to know how an individual student's score compared to the class average, you'd have to look at the two reports side by side. It would be much nicer if you could see the class average on the same row as a student's score, like this:

Student IDSubjectScoreClass Average
1Math6362
1Science5061
2Math5952

As I mentioned, you could achieve this using subqueries. But subqueries are notoriously slow to run and can make your query very complicated.

This is the kind of thing window functions allow you to do easily and efficiently: include aggregates alongside details on the same row.

Why Are They Called Window Functions?

They're called window functions because, while looking at an individual row, you can also "look through the window" and extract information from the entire dataset or rows related to the current row.

As an example, let's look at a table of students' grades:

Student IDTeacher IDSubjectGrade
11Math63
21Math80
32Math60
42Math45
51Math52
61Math70
72Math65
12Science70
22Science62
32Science90
42Science30
52Science53
15English59
35English70
55English45
65English62
112History55
312History67
412History58

As per the previous example, we want to show the class average next to each student, so that a single row looks like this:

Student IDSubjectGradeClass Average
4Math4562

To do this, we need to look at the average of all the other students when we're extracting this row and show the result alongside the other data.

BigQuery Window Functions Explained

In SQL, window functions use a sliding window of rows to extract additional information from either the entire dataset, or a subset related to the current row.

In the above diagram, the current row is a math result, and the window includes all the math grades.

When we're extracting this row …

Student IDSubjectGradeClass Average
1Science7062

… we'll need the window to "slide" so we're viewing all the science results in order to calculate the average for science.

Window functions have many of the same capabilities as the GROUP BY clause, but the difference is that they allow us to view aggregates and details side by side.

Syntax of BigQuery Window Functions

The OVER() Clause

The OVER() clause indicates that you're using a window function. You'd write your query as usual, and include the aggregates you want alongside the other column names. Each aggregate is identified with the OVER() clause. When you use this clause on its own, the 'window' is the entire dataset. I'll cover sliding windows a bit later in the article.

For example, if you wanted to extract all the math results and show the class average and the highest and lowest grade against each student’s grade, your query would look like this:

SELECT 
  student_id,
  grade,
  AVG(grade) OVER() AS average,
  MIN(grade) OVER() AS lowest,
  MAX(grade) OVER() AS highest
FROM exam_results
WHERE subject = 'Math';

Your results would then look like this:

student_idgradeaveragelowesthighest
163624580
280624580
360624580
445624580
552624580
670624580
765624580

The PARTITION BY Clause

This clause uses a sliding window. Instead of a window containing the entire dataset, it includes only a partition (or part) of the set.

In the previous example, I included only the math results by excluding all others using the WHERE clause. If you wanted a report that showed results for all subjects but calculated the average using only the rows where the subject matched the current row, you would use the PARTITION BY clause:

SELECT 
  student_id,
  subject,
  grade,
  AVG(grade) OVER(PARTITION BY subject) AS average
FROM exam_results;

Have a look at this color-coded copy of the students’ grades table to see how the partitions will work:

Student IDTeacher IDSubjectGrade
11Math63
21Math80
32Math60
42Math45
51Math52
61Math70
72Math65
12Science70
22Science62
32Science90
42Science30
52Science53
15English59
35English70
55English45
65English62
112History55
312History67
412History58

When processing each row, the rows included in the window change based on the value in the subject column. This means the average is calculated only for the partition of the dataset where the subject matches the current row. You can visualize it like this:

BigQuery Window Functions Explained

The results would look like this:

Student IDSubjectGradeClass Average
1Math6362
2Math8062
3Math6062
4Math4562
5Math5262
6Math7062
7Math6562
1Science7061
2Science6261
3Science9061
4Science3061
5Science5361
1English5959
3English7059
5English4559
6English6259
1History5560
3History6760
4History5860

The ORDER BY Clause

The ORDER BY clause within the OVER() function uses a different type of sliding window. When you use OVER(ORDER BY column_name), the window includes only those rows where the value of the specified column is less than or equal to that column’s value in the current row.

The ORDER BY clause is useful for calculating running totals and moving averages. As an example, I'll use a table called monthly_transactions that contains bank account transactions:

account_idtran_datetransactionvalue
12023-09-01Opening Balance500.00
12023-09-03Deposit137.45
12023-09-12Withdrawal-200.00
12023-09-18Withdrawal-250.00
22023-09-01Opening Balance1200.00
22023-09-14Deposit900.00
22023-09-20Purchase-318.90

The query below will list the transactions for Account ID 1, showing a running balance.

SELECT 
  account_id,
  tran_date,
  transaction,
  value,
  SUM(value) OVER(ORDER BY tran_date) AS balance
FROM monthly_transactions
WHERE account_id = 1;

Including the ORDER BY clause within the OVER clause controls a sliding window.

If you wanted to, you could also use the usual ORDER BY clause at the end of the query to control the final order of the rows on the report. They don’t have to be shown in the original order.

By default, using the ORDER BY inside the OVER clause causes the window to slide so that it only views rows where the date is less than or equal to the date of the current row. There are other keywords that can change this default, but they’re a bit beyond the scope of this article.

You can visualize it like this:

BigQuery Window Functions Explained

The results would look like this:

account_idtran_datetransactionvaluebalance
12023-09-01Opening Balance500.00500.00
12023-09-03Deposit137.45637.45
12023-09-12Withdrawal-200.00437.45
12023-09-18Withdrawal-250.00187.45
22023-09-01Opening Balance1200.001200.00
22023-09-14Deposit900.002100.00
22023-09-20Purchase-318.901781.10

Using ORDER BY with the PARTITION BY Clause

If you want to show all accounts with their running balances, you can use  PARTITION BY and ORDER BY together:

SELECT 
  account_id,
  tran_date,
  transaction,
  value,
  SUM(value) OVER(PARTITION BY account_id ORDER BY tran_date) AS balance
FROM monthly_transactions
ORDER BY account_id, tran_date

The PARTITION BY clause will cause the sliding window to only include rows where the account_id matches the current row. The ORDER BY clause will cause it to only include rows within that partition where the date is less than or equal to the date of the current row.

The results would be:

account_idtran_datetransactionvaluebalance
12023-09-01Opening Balance500.00500.00
12023-09-03Deposit137.45637.45
12023-09-12Withdrawal-200.00437.45
12023-09-18Withdrawal-250.00187.45
22023-09-01Opening Balance1200.001200.00
22023-09-14Deposit900.002100.00
22023-09-20Purchase-318.901781.10

Each account has its own separate running balance.

SQL Window Functions Available in BigQuery

You've seen how common SQL aggregate functions such as SUM(), AVG(), MIN(), and MAX() can be used in conjunction with the OVER clause to extract aggregates from a window of data.

Google BigQuery, in common with many other SQL dialects, has additional functions that can give deeper insights into the data. Here are some examples.

RANK()

This function ranks the dataset from highest to lowest on a specified column. It can answer questions like:

  • Where did each student place in the class, based on exam results?
  • Which products were most profitable?
  • Which customers spent the most money?
  • Which warehouse received the most complaints?

Using the sample table we looked at earlier, let’s rank the students by exam results using this query:

SELECT 
  student_id,
  subject,
  grade,
  RANK() OVER(PARTITION BY subject ORDER BY grade desc) AS place
FROM exam_results
ORDER BY subject, grade DESC;

The result would be:

student_idsubjectgradeplace
3English701
6English622
1English593
5English454
3History671
4History582
1History553
2Math801
6Math702
7Math653
1Math634
3Math605
5Math526
4Math457
3Science901
1Science702
2Science623
5Science534
4Science305

When calculating these results, the PARTITION BY subject clause causes SQL to look at only the results for the same subject as the current row. The ORDER BY grade DESC clause sorts them in descending order of grade. The RANK() function then ranks the students in this order.

Since student 3 has the highest grade in English, his rank is 1; student 6, the next highest, is ranked 2 in this subject. Student 2 has the highest grade in Math and is ranked as 1.

DENSE_RANK()

DENSE_RANK() is used for the same purpose as RANK. The difference between them can best be explained by looking at these query results, which represent scores in a shooting competition.

Using RANK, the query would be:

SELECT 
  competitor_no,
  score,
  RANK() OVER(ORDER BY score desc) AS rank
FROM match_results
ORDER BY score DESC;

The results are:

Competitor NoScoreRank
4851
5832
10832
9814
2765
6765
7727
3708
8689
16210

Competitors 5 and 10 tied for second place, and both received a ranking of 2. Competitor 9 is the next highest, and ranks as 4. Third place is left out.

Using DENSE_RANK, the query is:

SELECT 
  competitor_no,
  score,
  DENSE RANK() OVER(ORDER BY score desc) AS rank
FROM match_results
ORDER BY score DESC;

The results are:

Competitor NoScoreRank
4851
5832
10832
9813
2764
6764
7725
3706
8687
1628

Competitors 5 and 10 are still both ranked as 2, but third place is not left out: Competitor 9 now has a ranking of 3.

Both functions have the same syntax. If we wanted to recode the previous students’ grades query using the DENSE_RANK() function, it would look like this:

SELECT 
  student_id,
  subject,
  grade,
  DENSE_RANK() OVER(PARTITION BY subject ORDER BY grade desc) AS class_place
FROM exam_results
ORDER BY subject, grade DESC

But because there are no tie values, the result would be the same.

ROW_NUMBER()

The ROW_NUMBER() function works in a similar way to the previous two functions, but the rows are simply numbered in order. If rows have the same value, they will be numbered consecutively, depending on which was encountered first. Here’s the results of the shooting competition query using ROW_NUMBER() instead of RANK() or DENSE_RANK():

Competitor NoScoreRank
4851
5832
10833
9814
2765
6766
7727
3708
8689
16210

LAG()

This function allows you to compare data from the previous row in the result set to data in the current row. It's ideal for year-on-year comparisons, allowing you to discover trends and identify business performance issues.

LAG() and the related function LEAD() can only be used in conjunction with the OVER(ORDER BY) clause.

As an example, take the following table, which holds sales data for a small company:

yearsales_valuesales_quantityprofit
2019540009008000
202075000120011000
2021300004501000
202260000100007000

The sample query to compare year-on-year figures is:

SELECT 
  year,
  sales_value,
  sales_quantity,
  profit,
  LAG(sales_value) OVER(ORDER BY year) as ly_value,
  LAG(sales_quantity) OVER(ORDER BY year) as ly_qty,
  LAG(profit) OVER(ORDER BY year) as ly_profit,
  profit - LAG(profit) OVER(ORDER BY year) as inc_dec
FROM annual_sales
ORDER BY year;

The results of this query are:

yearsales_valuesales_quantityprofitly_valuely_qtyly_profitinc_dec
2019540009008000NULLNULLNULLNULL
2020750001200110005400090080003000
202130000450100075000120011000-10000
2022600001000070003000045010006000

Let's look at this line of the query and see what it actually did:

LAG(sales_value) OVER(ORDER BY year) as ly_value

On this row of the result …

yearsales_valuesales_quantityprofitly_valuely_qtyly_profitinc_dec
2020750001200110005400090080003000

… our current row is for the year 2020. The LAG() function in conjunction with ORDER BY year causes SQL to look at the row for the previous year (2019) and extract the sales value from it under the heading ly_value.

You will notice that in the first row, the columns calculated by the LAG() function contain a null value, since there is no previous record.

LEAD()

The LEAD() function is the inverse of LAG(): it obtains data from the row after the current row, instead of the one before it.

To compare the profit between the current year, the previous year, and the subsequent year using the same sample table, the query would be:

SELECT 
  year,
  profit,
  LAG(profit) OVER(ORDER BY year) as ly_profit,
  profit - LAG(profit) OVER(ORDER BY year) as ly_inc_dec,
  LEAD(profit) OVER(ORDER BY year) as ny_profit,
  LEAD(profit) OVER(ORDER BY year) - profit as ny_inc_dec
FROM annual_sales
ORDER BY year;

The results of this query would be:

yearprofitly_profitly_inc_decny_profitny_inc_dec
20198000NULLNULL110003000
202011000800030001000-10000
2021100011000-1000070006000
2022700010006000NULLNULL

The columns ny_profit and ny_inc_dec use the LEAD() function, which causes SQL to look at the next row in sequence to extract these fields. When the current row is for 2019, these two fields will be taken from the row for 2020.

This should have given you an idea of some of the useful SQL window functions available in BigQuery. You'll find a full list of functions in the BigQuery SQL documentation.

For more examples of window functions, refer to the article SQL Window Function Examples. You can also find a quick reference guide for the syntax of SQL window functions in our Window Functions Cheat Sheet.

Practical Uses of BigQuery Window Functions

In the real world, there are many ways that BigQuery window functions can give you insights to help your organization perform better. They are very powerful and allow you to produce complex reports very quickly.

Here are a few ideas of how BigQuery window functions may be used:

Next Steps with BigQuery Window Functions

Now that you've seen what SQL window functions can do in BigQuery and other database management systems, it's time to think about upping your skills in this important area.

A good place to start is LearnSQL.com's SQL Window Functions course. You'll learn step by step how to make use of the techniques you've seen in this article, with a full explanation of each topic. You'll also get plenty of practice, with over 200 interactive exercises to make sure you know how to extract complex information in real-world situations. Help is available when you get stuck, and you will be able to access sample databases through your browser. The course takes about 20 hours to complete.

If you really want to become an expert, you can get more practice in solving complex problems by working through our Window Functions Practice Set. You will tackle 100 examples of complex reports using three different databases.

You can also read this article on how to practice window functions, and you can find out more about BigQuery syntax here.

If you're job-hunting for a top-level job in data analysis or data science, it's very likely that you'll be asked to demonstrate your knowledge of SQL window functions at the interview. To get an idea of what kind of questions you might be asked (and how to answer them), here's an article that discusses top SQL window functions interview questions.

Take the first steps today to move your data analysis skills to the next level!