Google BigQuery SQL Syntax: A Comprehensive Guide
Jill Thornhill | 6th Jul 2023

What is Google BigQuery? What can it do for your organization? Is BigQuery SQL a valuable skill? Where can you learn it, and where can you get some practice?

Very few organizations today are not computerized. Many processes now make use of the IoT (Internet of Things), where all kinds of devices are networked and continually feed real-time data into computer systems. The result is a vast amount of data available for decision-making, often referred to as a data lake. The major challenges with this stream of data are how and where to store it and how best to make it available to decision-makers. Google's BigQuery is arguably one of the best answers to these challenges, and BigQuery SQL skills are an important addition to your portfolio.

If you want to learn BigQuery, where should you start? BigQuery supports the ANSI SQL standard, so the first step is to gain basic SQL skills. If you don't already know SQL, LearnSQL.com's SQL Basics course will get you up to speed very quickly. You'll learn important database concepts; by the end of the course, you'll be able to extract, aggregate, and analyze data from one or more tables to gain meaningful information. The course takes about 10 hours to complete and includes 129 interactive exercises that teach you to solve real-world problems with SQL.

What Is Google BigQuery?

Google BigQuery is a serverless, highly scalable data warehousing solution. It can analyze terabytes of data in seconds and petabytes of data in minutes. But what does that actually mean?

Data Warehouse

Data is usually stored for one of two purposes:

- To support a business's day-to-day operations.
- To support decision-making.

The first of these is known as operational data, and it is often stored by several different computer systems as they carry out various tasks needed by the organization. The second is known as analytical data. Ideally, this data should be stored in a central place where it's accessible to anyone who needs information. This central place is known as the data warehouse, and it should be optimized to search and analyze large volumes of data quickly. You'd use a process known as ETL (Extract, Transform, Load) to transfer operational data into the warehouse.

As a data warehouse, BigQuery has several advantages:

- It has excellent ETL tools.
- It uses the Colossus file system, which is designed for big data; more storage can easily be added when needed.
- It uses SQL as its query language, so it's compatible with top business intelligence (BI) tools like Google Data Studio, which allows data to be presented in many different ways.

Fast Searches

BigQuery uses the Dremel query engine to process large volumes of data quickly. As I mentioned earlier, BigQuery can scan petabytes (a petabyte is roughly equivalent to a million gigabytes) in minutes. How does it do this? Each time you run a query on BigQuery, thousands of worker threads split the task among them; this allows enormous amounts of data to be scanned concurrently. The results are then amalgamated and returned to the user.

Serverless Architecture

If you use BigQuery, you don't hire a server. This means you don't have to manage the infrastructure, worry about backups and security, or figure out whether you need more hard disks. Your data becomes part of the great data lake – or ocean! This digital data ocean is split across countless servers worldwide. If more space is needed, Google allocates and manages it. You only pay for what you use.

What about the processing power needed to carry out the computations? Google allocates processing power as and when it's needed. Google's high-speed data center network, Jupiter, is designed for fast communication between the machines doing the work; you only pay for the processing slots that you use.

And what about security? You can create views for your data and specify exactly who can see what. Since data is encrypted – both when stored and when in transit – it's safe from intruders.
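For example, a view can expose just the columns other people are allowed to query while the underlying table stays private. Here's a minimal sketch; the my-project, reporting, and staging names are made-up placeholders rather than anything you'll find in the public datasets used later in this article:

-- A hypothetical view that exposes only three columns of a source table.
-- Anyone allowed to query the view never sees the columns it leaves out.
CREATE VIEW `my-project.reporting.population_summary` AS
SELECT country_name, year, midyear_population
FROM `my-project.staging.midyear_population`

Typically you'd then share the dataset that holds the view rather than the one that holds the source table.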
Practicing BigQuery

If you already know some SQL, you can try out BigQuery for yourself right now. If you don't, you can still try out some of the examples in the next section to get a feel for what it can do. But to get the full benefit of BigQuery, it's worth upping your SQL skills by enrolling in a comprehensive SQL course. LearnSQL.com's SQL from A to Z learning track includes 7 courses that take you from beginner to expert.

Where and how can you practice these skills in BigQuery? Anyone can create a project on BigQuery for free; you only get charged when you want to load and process large amounts of data. The BigQuery sandbox lets you insert your own test data, or you can access a wide range of public datasets to play with.

To get started, visit Google Cloud Console's BigQuery page. If you're not already logged into a Google account, you may be asked to do so. The first time you access this console, you'll see a notice asking you to create a project. Click CREATE PROJECT and either give your project a name or accept Google's suggestion.

Now you'll need some data to practice on. Fortunately, BigQuery has lots of public data available. To access it, you'll need to add the public datasets to your SQL Workspace. Towards the top left of the page, click ADD. A pop-up screen will appear, giving you several choices; choose the option to star a project by name. You'll be asked for the project name. Enter bigquery-public-data and click STAR. You should see this project listed in the project explorer on the left of the screen.

And now you're all ready to start exploring. If you click the arrow next to bigquery-public-data, you'll see a list of all the datasets in this project. And if you click a dataset, you'll see a list of its tables. The examples in the next section of this article are taken from the census_bureau_international dataset.
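If you'd rather explore with a query than by clicking through the explorer, BigQuery's INFORMATION_SCHEMA views can list a dataset's tables for you. This is purely an optional shortcut; a query along these lines should show every table in the census dataset:

-- Lists the tables available in the census_bureau_international dataset.
SELECT table_name
FROM `bigquery-public-data.census_bureau_international`.INFORMATION_SCHEMA.TABLES
ORDER BY table_name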
If you've already had experience with SQL, that's all you need to get started. However, if you're an SQL newbie, you may like to get some practice in an environment where help is available and you're guided towards finding solutions to problems. In that case, I'd recommend LearnSQL.com's practice track, where you'll have the chance to solve over 80 real-world challenges.

BigQuery Syntax

Now that you know how to set yourself up with BigQuery, let's have a look at the BigQuery syntax. But first, as you may already know, SQL has many different dialects unique to different database products. You can learn more about the different SQL dialects here. The American National Standards Institute (ANSI) defines the recommended standards for SQL; each dialect may or may not implement all of these standards, and many dialects have their own additions to the language to allow more functionality.

BigQuery originally had a very non-standard version of SQL that was unique to its needs. This is now called Legacy SQL, and you'd only need to learn it if you wanted to work for a company that still uses it. BigQuery now uses Google SQL, which sticks very closely to the ANSI standard. If you're already familiar with another dialect (such as MS SQL Server or MySQL), you'll find very few differences when using BigQuery. The major difference is that BigQuery has some extra statistical features and supports complex data structures like JSON and arrays.
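To give a feel for the difference, here's the same simple query written both ways; this is only an illustration, and you won't need the legacy form for anything else in this article. The most visible change is how tables are referenced, and the console runs Google SQL by default, so you'd have to change its query settings to run the legacy version at all.

-- Legacy SQL: square brackets and a colon after the project name.
SELECT country_name FROM [bigquery-public-data:census_bureau_international.midyear_population] LIMIT 5

-- Google SQL: standard dot-separated names, as used throughout this article.
SELECT country_name FROM `bigquery-public-data.census_bureau_international.midyear_population` LIMIT 5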
Let's have a look at a few examples of running queries in the Cloud Console. If you've never used SQL before and you're finding the examples difficult to follow, you might find this article on the most important SQL commands helpful. You can find the full syntax of all the Google SQL commands here.

Running Commands in the BigQuery Cloud Console

To the right of the project explorer in the BigQuery Cloud Console, you'll see a window where you can run Google SQL commands. You type your query in the query window, then click RUN in the actions bar at the top. If you've typed a valid SQL command, you'll see the data you requested in the results window. If not, you'll see an error message instead.

Simple SQL Commands

Extracting and analyzing data uses only one SQL command: the SELECT statement. In its simplest form, the command is SELECT * FROM tablename, where tablename is the name of one of the tables in your database. The asterisk (*) tells it to return all the columns in the table.

In Google SQL – unless the data is coming from your own project – you prefix the table name with its project and dataset name. In the examples, I'm going to access data from the census_bureau_international dataset in the bigquery-public-data project. One of the tables I'll use is called midyear_population. I'll need to refer to this table as:

bigquery-public-data.census_bureau_international.midyear_population

For convenience, you can give this rather long name an alias so you don't have to keep typing it. In the example below, I'm giving it the alias midyear:

bigquery-public-data.census_bureau_international.midyear_population AS midyear

When you're testing on tables that may be very large, it's a good idea to put a limit on the number of rows that will be returned. Otherwise, you may exceed the limits of the free version of BigQuery. To see everything in the first 10 rows of this table, you'd type this command into the query window, then click RUN:

SELECT *
FROM bigquery-public-data.census_bureau_international.midyear_population AS midyear
LIMIT 10

Try it out for yourself. Unless you've made a typo, you should see the first 10 rows in the results window, and you can use the scroll bar to scroll through them.

Now let's look at a few of the optional features of the SELECT statement:

- The WHERE clause lets you choose only rows that meet certain criteria.
- Instead of showing all columns (i.e. using the asterisk), you can list the column names you want to see. In the results above, the column names (country_code, country_name, year, and midyear_population) are shown at the top of each column in the results window.
- You can specify the order of the rows in the result set using the ORDER BY clause.

Try running this query to see only country names and populations for 1975, sorted by country name:

SELECT midyear.country_name, midyear_population
FROM bigquery-public-data.census_bureau_international.midyear_population AS midyear
WHERE year = 1975
ORDER BY country_name

SQL has lots more features. Two of the most useful are:

- Aggregate functions, such as SUM() to find totals, AVG() to find averages, and MIN() and MAX() to find the smallest/largest values. Aggregates are usually used in conjunction with the GROUP BY clause, which defines the groups the aggregate functions apply to. Without grouping the data, you'll see the aggregates for the data set as a whole.
- Retrieving data from two or more tables, provided they have one or more columns in common that can be used to join them.

Let's try some sample queries. First:

SELECT midyear.country_name, AVG(midyear.midyear_population)
FROM bigquery-public-data.census_bureau_international.midyear_population AS midyear
GROUP BY country_name
ORDER BY country_name

This query shows the average midyear population for each country over the entire period.

The next query takes data from two tables: the midyear_population table and the birth_death_growth_rates table. You can then see the growth rate alongside the population. You'll need to join these on two matching columns: the country and the year. Here's the query, followed by its results:

SELECT midyear.country_name, midyear.midyear_population, growth.growth_rate
FROM bigquery-public-data.census_bureau_international.midyear_population AS midyear
JOIN bigquery-public-data.census_bureau_international.birth_death_growth_rates AS growth
  ON growth.country_name = midyear.country_name AND growth.year = midyear.year
WHERE midyear.year = 1975
ORDER BY country_name

For all these queries, BigQuery uses ANSI-standard syntax; the queries would work equally well in any SQL dialect.

Some Advanced SQL Commands

Going deeply into the syntax of complex SQL statements is beyond the scope of this article, but I'd like to include a couple of advanced queries that have the same syntax in Google SQL as they do in other dialects (such as MS SQL Server).

Window functions are used to view individual rows against aggregates from the entire dataset. This example ranks countries by population – the country with the highest population ranks as 1.

SELECT midyear.country_name, midyear.midyear_population,
  RANK() OVER (ORDER BY midyear_population DESC)
FROM bigquery-public-data.census_bureau_international.midyear_population AS midyear
WHERE year = 1975
ORDER BY country_name

Other advanced features include GROUP BY extensions, which allow you to show aggregates at more than one level. Again, BigQuery's syntax follows ANSI standards; if you've learned these features in MS SQL Server, you'll be able to use them in BigQuery. Here's an example that shows totals at the age and country levels:

SELECT m.country_name, m.age, SUM(m.population)
FROM bigquery-public-data.census_bureau_international.midyear_population_agespecific AS m
WHERE m.year = 1975
GROUP BY ROLLUP (m.country_name, m.age)
ORDER BY m.country_name, m.age

In the results, a null in the country_name or age column indicates that the row is a subtotal or total.

Additional Features of BigQuery

BigQuery has many features that are not found in most dialects. These make it particularly useful for dealing with big data coming from many different sources. It has several extra data types:

- Geography – Stores geographical coordinates and can be used to calculate the distance between two locations.
- Interval – Stores a time interval.
- JSON – A format used by many applications.
- Struct – A container to hold ordered fields.

BigQuery also allows you to work with arrays.
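As a quick taste of these features, the sketch below builds two GEOGRAPHY points and measures the distance between them, then totals an array literal. The coordinates and numbers are made-up illustrations rather than anything from the census dataset:

-- ST_GEOGPOINT(longitude, latitude) builds a GEOGRAPHY value;
-- ST_DISTANCE returns the distance between two points in meters.
-- UNNEST turns an array into rows so it can be aggregated.
SELECT
  ST_DISTANCE(ST_GEOGPOINT(-0.1276, 51.5072),   -- London
              ST_GEOGPOINT(2.3522, 48.8566))    -- Paris
    AS distance_in_meters,
  (SELECT SUM(x) FROM UNNEST([10, 20, 30]) AS x) AS array_total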
When working with very large datasets, it's particularly useful to be able to extract random samples of data. Here's an example of taking a random sample from the mortality_life_expectancy table:

SELECT AVG(m.life_expectancy)
FROM bigquery-public-data.census_bureau_international.mortality_life_expectancy AS m
TABLESAMPLE SYSTEM (10 PERCENT)

Instead of going through the entire table, which is large, this takes a statistical sample of 10% of the table and averages it. Since the sample is random, you might get slightly different results every time you run it.

You can learn more about the additional features of BigQuery SQL in the product documentation.

Working with Google BigQuery: The Next Step

The digital ocean grows larger every year, and more and more employers are looking for BigQuery skills. It's well worth becoming an expert in this area. But how?

Enhance your SQL skills by taking some courses. LearnSQL.com has a wide range of courses covering everything from beginner to advanced topics. You can search for specific courses or just browse what we have on offer. LearnSQL.com's practice track teaches you to be agile in solving all kinds of problems with SQL. You should also spend lots of time experimenting with the public datasets in Google Cloud Console. If you have your own business and would like to set up an SQL training program for your staff, we can help you do that, too.

BigQuery is here to stay, so start becoming an expert right now!