18th Aug 2022 12 minutes read

An Overview of SQL Text Functions in Google BigQuery

Table of Contents

Do You Know What a Function Is?
Data for the Examples That Use BigQuery Text Functions
Google BigQuery Text Functions
Aren't These SQL Text Functions Useful? There Are More!

Data analysts and Google BigQuery go hand in hand. Google's cloud data warehouse presents you with plenty of opportunities for using standard SQL text functions you can't avoid in your day-to-day work as a data analyst.

BigQuery uses Google Standard SQL, an ANSI-compliant SQL dialect. This means you can use standard SQL text functions in BigQuery without needing to learn a variant of a given function.

The Standard SQL Functions course is an excellent resource for learning those functions. As a prerequisite, you need to know SELECT, WHERE, JOINs, GROUP BY, and HAVING. If you do, the course teaches you how to work with text, numeric, and date and time functions. It also covers specific situations, such as dealing with NULL values, using aggregate functions, and writing CASE statements. To make sure you have enough examples to practice code writing, the course offers 211 interactive exercises.

Do You Know What a Function Is?

You probably have an idea of what functions are in SQL. As a reminder, an SQL function is a named predefined piece of code you can use in your SQL query.

If you use a function in a SELECT statement, this is what the syntax looks like:

SELECT BIGQUERY_FUNCTION(argument1, argument 2, …)
FROM table

In this example, the function is called BIGQUERY_FUNCTION. Every function is accompanied by parentheses that hold the arguments. Arguments tell a function what to do and how to do it.

Data analysts deal with text, not just numeric values. This is why you need text functions in SQL to write effective queries.

So, which popular Google BigQuery text functions do you need to know? Find out below where I explain each of them. The examples are all based on the same sample data.

Data for the Examples That Use BigQuery Text Functions

One table is enough for you to see how these text functions work. All the data is stored in the table orders.

The columns are:

id – the ID of the record.
order_id – the ID of the order.
customer – the customer ID.
shipping_address – the customer's address for delivering the order.
email – the customer's email address.

Here's the data, so that everything is out in the open.

id	order_id	customer	shipping_address	email
1	&&&4592022070101&&&	Jones & Son, Inc.	10-2 Parkson St, Boston, MA 02136, USA	s.jones@jones.com
2	&&&4592022070102&&&	Jones & Son, Inc.	10-2 Parkson St, Boston, MA 02136, USA	s.jones@jones.com
3	&&&4922022070101&&&	Stavros GmbH	Landsberger Alee 49, Berlin, 10249, Germany	jasmin.s@stavros.com
4	&&&4922022071201ć&&	Hendricks, Hendricks, Hoekstra	Anna Paulownastraat 23, Rotterdam, 3014 JA, Netherlands	cindy@hhh.com
5	&&&4822022071501&&&	Mckenna & Partners	2-8 Ingram St, Glasgow, G1 1HA, UK	sean.m@mckenna.com

Let's not waste any time – let's go directly to the examples.

Google BigQuery Text Functions

CONCAT()

This function concatenates multiple strings into one. In other words, it "glues" together two or more text values. There are two ways to do this: with the CONCAT() function or with the || operator. Let's see how each of these works in the following example.

SELECT customer,
       shipping_address,
       CONCAT(customer, ', ', shipping_address) AS full_address_concat,
       customer|| ', ' || shipping_address AS full_address_operator
FROM `orders.orders`;

First, I select the customer and his/her address. I want to show the name and the address in one column, which is the full address where the order will be sent. To achieve this, I use the CONCAT() function, and the columns customer and shipping_address are the arguments for the function. Also, I want the name and the address to be separated with a comma and a blank space after it, so this is also a function argument written in single quotes.

The || operator works similarly. The logic is that you put a || between every pair of strings you want to "glue" together. Compared to CONCAT(), this method does not call for a specific function, and the "arguments" are separated by a || instead of a comma.

Here's the code output.

customer	shipping_address	full_address_concat	full_address_operator
Jones & Son, Inc.	10-2 Parkson St, Boston, MA 02136, USA	Jones & Son, Inc. , 10-2 Parkson St, Boston, MA 02136, USA	Jones & Son, Inc. , 10-2 Parkson St, Boston, MA 02136, USA
Jones & Son, Inc.	10-2 Parkson St, Boston, MA 02136, USA	Jones & Son, Inc. , 10-2 Parkson St, Boston, MA 02136, USA	Jones & Son, Inc. , 10-2 Parkson St, Boston, MA 02136, USA
Stavros GmbH	Landsberger Alee 49, 10249, Berlin, Germany	Stavros GmbH, Landsberger Alee 49, 10249, Berlin, Germany	Stavros GmbH, Landsberger Alee 49, 10249, Berlin, Germany
Hendricks, Hendricks, Hoekstra	Anna Paulownastraat, 3014 JA, Rotterdam, Netherlands	Hendricks, Hendricks, Hoekstra, Anna Paulownastraat, 3014 JA, Rotterdam, Netherlands	Hendricks, Hendricks, Hoekstra, Anna Paulownastraat, 3014 JA, Rotterdam, Netherlands
Mckenna & Partners	2-8 Ingram St, Glasgow G1 1HA, UK	Mckenna & Partners, 2-8 Ingram St, Glasgow G1 1HA, UK	Mckenna & Partners, 2-8 Ingram St, Glasgow G1 1HA, UK

As you see, both ways of concatenating strings give the same result.

CHAR_LENGTH(), CHARACTER_LENGTH(), and BYTE_LENGTH()

The CHAR_LENGTH and CHARACTER_LENGTH() functions are the same: they both return the length of a string in numbers of characters.

BYTE_LENGTH(), as you may have imagined, does the same but in numbers of bytes rather than characters.

Let's apply all three functions to the order ID.

SELECT order_id,
       CHAR_LENGTH(order_id) AS order_id_CHAR,
       CHARACTER_LENGTH(order_id) AS order_id_CHARACTER,
       BYTE_LENGTH(order_id) AS order_id_BYTE
FROM `orders.orders`;

The syntax here is simple: call the function and write the column name.

Here are the text lengths.

order_id	order_id_CHAR	order_id_CHARACTER	order_id_BYTE
&&&4592022070101&&&	19	19	19
&&&4592022070102&&&	19	19	19
&&&4922022070101&&&	19	19	19
&&&4922022071201ć&&	19	19	20
&&&4822022071501&&&	19	19	19

The functions CHAR_LENGTH() and CHARACTER_LENGTH() both return the same result. As I have mentioned, BYTE_LENGTH() measures the length in bytes. The length in bytes is different from the character length if there are Unicode characters. That's why it shows a length of 20 bytes in the fourth row. The character "ć", a Unicode character, somehow got into the order ID.

UPPER(), LOWER(), and INITCAP()

The UPPER() function turns all letters of a string into uppercase. LOWER() does the opposite; all the letters become lowercase.

INITCAP() is something in between. The first letter of each word becomes uppercase, and all other letters become lowercase. Also, you can specify a delimiter. If you do so, then the first letter of each word after the delimiter becomes uppercase, and all others become lowercase.

Here's an example of these functions.

SELECT customer,
       UPPER(customer) AS customer_upper,
       LOWER(customer) AS customer_lower,
       INITCAP(customer, ',') AS customer_initial_upper
FROM `orders.orders`;

Again, the syntax is very simple: specify the column in the parentheses after the function, and you get the desired result.

For INITCAP(), the result is different only if you specify the delimiter, which I do here. My delimiter is the comma.

The code returns the following result.

customer	customer_upper	customer_lower	customer_initial_upper
Jones & Son, Inc.	JONES & SON, INC.	jones & son, inc.	Jones & son, inc.
Jones & Son, Inc.	JONES & SON, INC.	jones & son, inc.	Jones & son, inc.
Stavros GmbH	STAVROS GMBH	stavros gmbh	Stavros gmbh
Hendricks, Hendricks, Hoekstra	HENDRICKS, HENDRICKS, HOEKSTRA	hendricks, hendricks, hoekstra	Hendricks, hendricks, hoekstra
Mckenna & Partners	MCKENNA & PARTNERS	mckenna & partners	Mckenna & partners

All three functions work as described earlier. However, for the result of INITCAP(), you see the first letters after commas are still lowercase. Shouldn't it have been uppercased since we specified the comma as the delimiter?

Well, no, not in this case. This is because the first text values after the comma are spaces. For this reason, you do not see the effect of INITCAP().

SUBSTR() and SUBSTRING()

Again, two functions named slightly differently do exactly the same thing: return a part (a substring) of a text value.

Both SUBSTR() and SUBSTRING() allow you to define the string, the starting position of the substring, and its length.

SELECT order_id,
       SUBSTR(order_id, 7, 8) AS order_date_substr,
       SUBSTRING(order_id, 7, 8) AS order_date_substring
FROM `orders.orders`;

When extracting a substring, the first argument in the function is the string itself. In this example, it is the column order_id. The starting position of the substring is the second argument. If it's a positive integer, it counts from the left; if it's negative, it counts from the right. In the above example, the substring starts from the seventh character of the string, counting from the left. Finally, the length of the substring is an optional argument. Here, the substring consists of eight characters.

What do I get by doing this? Knowing the logic of the order IDs, I know that the order ID contains the date of the order. Yes, it consists of eight characters (year, month, date) and begins with the seventh character of the order ID.

order_id	order_date_substr	order_date_substring
&&&4592022070101&&&	20220701	20220701
&&&4592022070102&&&	20220701	20220701
&&&4922022070101&&&	20220701	20220701
&&&4922022071201ć&&	20220712	20220712
&&&4822022071501&&&	20220715	20220715

The result shows that the first order, for instance, was placed on 1 July 2022.

REPLACE()

As you may have guessed from its name, the REPLACE() function replaces one string of characters with another.

For example, the client Jones & Son, Inc. was recently renamed to Jones & Jones & Son, Inc. Here's how I rename their occurrences in the table.

SELECT id,
       customer,
       REPLACE(customer, 'Son', 'Jones & Son') AS customer_renamed
FROM `orders.orders`;

In the function, I first specify the column I want to change. Then I want to change "Son" to "Jones & Son". And there it is; the client name is changed in the table below.

id	customer	customer_renamed
1	Jones & Son, Inc.	Jones & Jones & Son, Inc.
2	Jones & Son, Inc.	Jones & Jones & Son, Inc.
3	Stavros GmbH	Stavros GmbH
4	Hendricks, Hendricks, Hoekstra	Hendricks, Hendricks, Hoekstra
5	Mckenna & Partners	Mckenna & Partners

TRIM()

The TRIM() function cleans your data. How does it do that? By specifying a string and a Unicode character you want to get rid of. It removes all the leading and trailing occurrences of that character.

Here's an example that cleans the order ID.

SELECT id,
       order_id,
       TRIM(order_id, '&') AS order_id_trimmed
FROM `orders.orders`;

The arguments in the function are the column order_id and the character "&", the latter of which appears before and after the order ID for some reason.

Here's how the cleaned data looks.

id	order_id	order_id_trimmed
1	&&&4592022070101&&&	4592022070101
2	&&&4592022070102&&&	4592022070102
3	&&&4922022070101&&&	4922022070101
4	&&&4922022071201ć&&	4922022071201ć
5	&&&4822022071501&&&	4822022071501

REVERSE()

This is another popular SQL text function that often comes in handy. It takes the string and reverses it so that the last character becomes the first, the second from the last becomes the second, and so on.

For instance, this may be useful if the order IDs have been entered completely backward. The structure of the order IDs has been such that the first three digits are the customer ID, then the order date is the next eight digits, and the last two digits are the control number. But due to changes in the system, now the new order IDs have to be reversed. It is easy to do so with the REVERSE() function.

SELECT id,
       order_id,
       REVERSE(order_id) AS order_id_reversed
FROM `orders.orders`;

The only thing I specify in the function is the column I want to reverse. The result is given below.

id	order_id	order_id_reversed
1	&&&4592022070101&&&	&&&1010702202954&&&
2	&&&4592022070102&&&	&&&2010702202954&&&
3	&&&4922022070101&&&	&&&1010702202294&&&
4	&&&4922022071201ć&&	&&ć1021702202294&&&
5	&&&4822022071501&&&	&&&1051702202284&&&

STRPOS()

This is a function for when you want to find out the character position of a specific substring inside a given text. This is useful if, for example, you need to find the position of "@" in the email address of every customer.

SELECT id,
       customer,
       email,
       STRPOS(email, '@') AS at_position
FROM `orders.orders`;

The first argument in the function takes, as usual, the column or the string value you want to search. Then you state the substring you want to search for. The function returns an integer value, i.e., the position of the substring from the left.

id	customer	email	at_position
1	Jones & Son, Inc.	s.jones@jones.com	8
2	Jones & Son, Inc.	s.jones@jones.com	8
3	Stavros GmbH	jasmin.s@stavros.com	9
4	Hendricks, Hendricks, Hoekstra	cindy@hhh.com	6
5	Mckenna & Partners	seckenna.com	7

Check the result, using the first email, "s.jones@jones.com", as an example. There are seven characters before the "@", so it is at the eighth position, which is precisely what the output shows.

SPLIT()

The SPLIT() function takes a string value and splits it into its components based on a delimiter. The function takes the string value as an argument. Also, the optional delimiter argument takes a character you want to use for splitting.

In the data, the column shipping_address is perfect for showcasing how the function works. Let's say I want to split the address into the following components: street name, city, postal code, and country. For this, I have the following code.

SELECT customer,
       shipping_address,
       SPLIT(shipping_address) AS address_split
FROM orders.orders;

The only argument in the function is the shipping address. I do not need to specify a delimiter because the comma is the default delimiter for string values, and that's exactly what I need here.

Here's the output.

customer	shipping_address	address_split
Jones & Son, Inc.	10-2 Parkson St, Boston, MA 02136, USA	10-2 Parkson St
		Boston
		MA 02136
		USA
Jones & Son, Inc.	10-2 Parkson St, Boston, MA 02136, USA	10-2 Parkson St
		Boston
		MA 02136
		USA
Stavros GmbH	Landsberger Alee 49, Berlin, 10249, Germany	Landsberger Alee 49
		Berlin
		10249
		Germany
Hendricks, Hendricks, Hoekstra	Anna Paulownastraat 23, Rotterdam, 3014 JA, Netherlands	Anna Paulownastraat 23
		Rotterdam
		3014 JA
		Netherlands
Mckenna & Partners	2-8 Ingram St, Glasgow, G1 1HA, UK	2-8 Ingram St
		Glasgow
		G1 1HA
		UK

As you see, the shipping address is split into the desired components. However, I admit the output format is not too table-friendly. It would be ideal if there were a way to show each address component in a separate column. Fortunately, this is possible in BigQuery. It offers two functions: ORDINAL() and SAFE_ORDINAL(). They return the same result except when the index is out of range: ORDINAL() returns an error while SAFE_ORDINAL() returns NULL.

Take a look at the code below. Don't worry; it's not difficult.

SELECT customer,
       shipping_address,
       SPLIT(shipping_address, ', ') [SAFE_ORDINAL (1)] AS street,
       SPLIT(shipping_address, ', ') [SAFE_ORDINAL (2)] AS city,
       SPLIT(shipping_address, ', ') [SAFE_ORDINAL (3)] AS postal_code,
       SPLIT(shipping_address, ', ') [SAFE_ORDINAL (4)] AS country
FROM orders.orders;

As you see, I use the SPLIT() function four times, once for each address component. This is because I want each component in a dedicated column. To do this, I just use SAFE_ORDINAL() after each SPLIT(). The integer in the SAFE_ORDINAL() function specifies the index, that is, the position of the value. In this example, index 1 returns the first value in the string, which is the street address, index 2 is the city, and so on.

This returns a much nicer output. It's perfect!

customer	shipping_address	street	city	postal_code	country
Jones & Son, Inc.	10-2 Parkson St, Boston, MA 02136, USA	10-2 Parkson St	Boston	MA 02136	USA
Jones & Son, Inc.	10-2 Parkson St, Boston, MA 02136, USA	10-2 Parkson St	Boston	MA 02136	USA
Stavros GmbH	Landsberger Alee 49, Berlin, 10249, Germany	Landsberger Alee 49	Berlin	10249	Germany
Hendricks, Hendricks, Hoekstra	Anna Paulownastraat 23, Rotterdam, 3014 JA, Netherlands	Anna Paulownastraat 23	Rotterdam	3014 JA	Netherlands
Mckenna & Partners	2-8 Ingram St, Glasgow, G1 1HA, UK	2-8 Ingram St	Glasgow	G1 1HA	UK

If you're also an MS SQL Server user, take a look at the most popular text functions it offers.

Aren't These SQL Text Functions Useful? There Are More!

These examples show there are plenty of situations when text functions are helpful in your day-to-day data manipulation. I have covered only the most popular ones; there are more of them.

If you're interested in what they are, consult the Google BigQuery documentation. It's a great source of reference.

However, if you want these functions (or at least some of them) to become second nature for you, you need to use them in practice. Going through the numerous examples in the Standard SQL Functions course is a perfect way to do so!

Tags: