Back to articles list Articles Cookbook
12 minutes read

An Overview of SQL Text Functions in MySQL

Do you work in MySQL as a data analyst? Then you should know how to use its text functions – data analysts don’t work only with numbers. Reporting requires computations, data classification, and label creation, and you can do all of that in MySQL. 

In this article, I’ll explain several of the most common and useful MySQL text functions by discussing what they do and showing how they do it.

In a way, this will be a condensed version of our Common Functions in MySQL course. It’s a great resource for intermediate users who want to become proficient in MySQL functions. The course uses MySQL 8, but anyone using MySQL 5 and above can take it; the functions available only in MySQL 8 are clearly marked. The course covers four topics:

Now, let’s get started by defining MySQL functions.

What Is a Function, Anyway?

Function this, function that. But what is a function in SQL? It’s a named and predefined piece of code that you can use in your query.  How does this translate to SQL syntax?

MYSQL_FUNCTION(argument1, argument 2, …)

The name of this example function is MYSQL_FUNCTION. The arguments (enclosed  in parentheses) tell the function what to do.

MySQL Text Function Examples

MySQL’s built-in text functions are important; they let you write queries effectively when you’re dealing with text (e.g. string, varchar) values. Some of these are standard SQL functions, so you can use the Standard SQL Functions Cheat Sheet to follow the article better.

First, let’s look at the sample data.

Sample Data

Below is a table for a car dealer network in France. It contains basic info about showroom cars at every dealer.  The table is named showroom_cars and it contains these columns:

  • id – The customer’s ID.
  • make – The car’s manufacturer.
  • model – The car model.
  • serial_number – The car’s serial number.
  • made_in – The country where the car was manufactured.
  • dealer – The name of the dealer that has the car.

Here’s a look at the data:

idmakemodelserial_numbermade_indealer
1RenaultClio2MRKL32020t894509###France###Fred Cars, Nantes
2FordFocus5irpo52022D568498###Germany###Auto Jacques, Bordeaux
3FordFiesta7kOhj71999T456055###Spain###Fred Cars, Nantes
4RenaultTwingo9weŽT92018t489337###Slovenia###Paris Cars, Paris
5MazdaCX-36PPRO42018M774848###Japan###Top Car, Marseille

Now, let’s start exploring the MySQL text functions.

CONCAT()

CONCAT() combines the arguments into one string. When you have several separate texts, this is the function to use if you want to turn them into one text.

A car’s make and model are ideal for this function. You want to show them in one column, with the make and model separated by space. This is how you’d do it:

SELECT id,
	 CONCAT(make, ' ', model) AS car_name
FROM showroom_cars;

The first argument in CONCAT() is the column make. The space is also a character, so the query states it explicitly as the second argument in the single quotation marks. The third argument is the car model.

That’s how you get a list of all the showroom cars.

idcar_name
1Renault Clio
2Ford Focus
3Ford Fiesta
4Renault Twingo
5Mazda CX-3

CHAR_LENGTH() and LENGTH()

Both functions return the length of the string that is the function’s argument. However, they are not the same functions. CHAR_LENGTH() returns the string length as the number of characters; LENGTH() returns the length in bytes.

A car’s serial number has to be 17 characters long. If it’s not, there’s some mistake in the data. Let’s use both CHAR_LENGTH() and LENGTH() to see if they return different results.

SELECT id,
	 serial_number,
       CHAR_LENGTH(serial_number) AS serial_number_length_char,
       LENGTH(serial_number) AS serial_number_length_bytes
FROM showroom_cars;

The query selects the car ID and the serial number. Then it checks the serial number’s length. The argument in the CHAR_LENGTH() and LENGTH() functions is the column serial_number.

The query will return this:

idserial_numberserial_number_length_charserial_number_length_bytes
12MRKL32020t8945091717
25irpo52022D5684981717
37kOhj71999T4560551717
49weŽT92018t4893371718
56PPRO42018M7748481717

All the serial numbers are OK because the column serial_number_length_char shows they are all 17 characters long.

Is the result the same when it comes to length in bytes? It seems it is…oh, no, it’s not! Look at row four! It says the serial number is 18 bytes long, instead of 17, like all the other serial numbers. But the previous column clearly shows this serial number consists of 17 characters.

What went wrong? Nothing. If you inspect this serial number a little closer, you’ll see the fourth character is the letter ‘Ž’. This letter is a Unicode character, which is usually encoded in two bytes. Thus, 16 characters in this serial number are taking up one byte each and the Unicode character takes two bytes. It’s correct to say the serial number’s length is 17 characters but 18 bytes.

UPPER() & LOWER()

The UPPER() function takes the text and turns all the letters into uppercase. LOWER() does the opposite: all the letters will become lowercase.

A closer inspection of data in the column serial_number shows that letters are sometimes the upper case and sometimes the lower case. It shouldn’t be this way! The people in charge still don’t know which way is correct, but they are sure that all the letters must be either uppercase or lowercase. They asked you to provide them with both versions.

You need the UPPER() and LOWER() functions to give them what they want:

SELECT id,
	 make,
       model,
       serial_number,
       UPPER(serial_number) AS serial_number_uppercase,
       LOWER(serial_number) AS serial_number_lowercase
FROM showroom_cars;

Here we select the columns id, make, model, and serial_number. Then we use the column serial_number as an argument in both the UPPER() and LOWER() functions.

The serial numbers are now formatted in the desired way:

idmakemodelserial_numberserial_number_uppercaseserial_number_lowercase
1RenaultClio2MRKL32020t8945092MRKL32020T8945092mrkl32020t894509
2FordFocus5irpo52022D5684985IRPO52022D5684985irpo52022d568498
3FordFiesta7kOhj71999T4560557KOHJ71999T4560557kohj71999t456055
4RenaultTwingo9werT92018t4893379WERT92018T4893379wert92018t489337
5MazdaCX-36PPRO42018M7748486PPRO42018M7748486ppro42018m774848

SUBSTRING(), RIGHT(), and LEFT()

The SUBSTRING() function will return a substring of a string. In other words, it returns part of the text that is the function’s argument.

The basic syntax is to define the argument and then the position of the substring, i.e. the character from which the string begins. The position is an integer, and counting starts from the left.

MySQL gives you some additional possibilities:

  • You can define the start of the substring and its length in the number of characters
  • You can use a negative integer as an argument to give the position of a substring from the right.

The RIGHT() function is a type of a SUBSTRING() function, but it works only from the right. In other words, it returns the defined number of the characters counting from the right. You can define the length of the output.  LEFT() is the opposite: it returns the number of characters from the left.

For a SUBSTRING() example, imagine that you have to extract a car’s year of manufacture from its serial number. Oh, yeah, it’s possible! The four numbers starting from the seventh character are the manufacture year. Just to make things a bit trickier, you also have to show the serial number's first and last six characters.

The following query uses SUBSTRING() in two ways: once counting characters from the left and once from the right. Then it uses the LEFT() and RIGHT() functions to get the six beginning and ending characters:

SELECT id,
  make,
  model,
  serial_number,
  SUBSTRING(serial_number, 7, 4) AS year_produced_left,
  SUBSTRING(serial_number, -11, 4) AS year_produced_right,
  LEFT(serial_number, 6) AS serial_number_six_left,
  RIGHT(serial_number, 6) AS serial_number_six_right
FROM showroom_cars;

The first SUBSTRING() in the above query counts from the left. The first argument is the column serial_number; this is the string you’ll extract the substring from. The first integer defines the start of the substring. The manufacture year takes up the seventh to eleventh characters; therefore, seven is the first integer. The second integer, four, is the length of the substring; the manufacture year consists of four characters.

The second SUBSTRING() does the same but from the right. The first argument is the same. The second one is the integer -11 because the manufacture year starts from the eleventh character if counted from the right. The length of the year is again four characters.

In the LEFT() function there is, again, the serial number. The integer six means the output will be six characters from the left. The arguments are the same in the RIGHT() function.

idmakemodelserial_numberyear_produced_leftyear_produced_rightserial_number_six_leftserial_number_six_right
1RenaultClio2MRKL32020t894509202020202MRKL3894509
2FordFocus5irpo52022D568498202220225irpo5568498
3FordFiesta7kOhj71999T456055199919997kOhj7456055
4RenaultTwingo9weŽT92018t489337201820189weŽT9489337
5MazdaCX-36PPRO42018M774848201820186PPRO4774848

As you can see, both uses of SUBSTRING() give the same result.

REPLACE()

The REPLACE() function replaces one or more characters in the text with other character(s). Therefore, the function allows you to define the string, the character that is to be replaced, and the character to be replaced with.

In the example data, the dealer's name and location are separated by a comma. You need to remove the comma because it’s there by mistake; the location is part of the dealer name.

SELECT dealer,
	 REPLACE(dealer, ',', '') AS dealer_formatted
FROM showroom_cars;

Here is the result:

dealerdealer_formatted
Fred Cars, NantesFred Cars Nantes
Auto Jacques, BordeauxAuto Jacques Bordeaux
Fred Cars, NantesFred Cars Nantes
Paris Cars, ParisParis Cars Paris
Top Car, MarseilleTop Car Marseille

The first column is the current string in the dealer column, shown as it was originally entered. The second column shows the corrected version. The formatting is done using REPLACE(). The first argument is the column dealer. The second is a comma in the single quote; this is the replaced character. It is replaced by nothing, with nothing written in the single quote.

There really are no commas anymore.

TRIM()

This is a function that returns the string without the specified leading or trailing characters (or both).

How can you use TRIM()? In our example, the manufacturing country in the example data is not shown clean. There are some characters that should be removed. I’ll show you all three ways of using the TRIM() function. Here’s the query:

SELECT id,
	 make,
       model,
       made_in,
       TRIM(BOTH '#' FROM made_in) AS country_formatted,
       TRIM(LEADING '#' FROM made_in) AS country_remove_leading,
       TRIM(TRAILING '#' FROM made_in) AS country_remove_trailing
FROM showroom_cars;

The first TRIM() function removes both leading and trailing characters. This is indicated by using the keyword BOTH. After this keyword, you define the character that you want to remove. Then the FROM keyword is used to define which string the character will be removed from.

The other two TRIM() functions are almost the same. The only difference is that the LEADING keyword means removing the leading characters, i.e., the ones appearing before the string. TRAILING will remove the trailing characters, i.e., the ones after the string.

And the results you can see for yourself:

idmakemodelmade_incountry_formattedcountry_remove_leadingcountry_remove_trailing
1RenaultClio###France###FranceFrance######France
2FordFocus###Germany###GermanyGermany######Germany
3FordFiesta###Spain###SpainSpain######Spain
4RenaultTwingo###Slovenia###SloveniaSlovenia######Slovenia
5MazdaCX-3###Japan###JapanJapan######Japan

REVERSE()

The REVERSE() function reverses the string so that the first character becomes last and vice versa. In other words, it shows the text backwards.

An inspection of our data  showed that the serial number was entered incorrectly. How it happened, nobody knows. But there are some indications that the angry data analyst who worked there before you messed with data before leaving the company. The word is, it was revenge for not getting a pay raise for five years.

Now you have to reverse the serial numbers again. This is not difficult, because you’re a data analyst and you know the REVERSE() function.

SELECT id,
	 make,
       model,
       serial_number,
       REVERSE(serial_number) AS serial_number_corrected
FROM showroom_cars;

The REVERSE() function is simple to use. All you need to do is specify the text, and it will do the rest.

idmakemodelserial_numberserial_number_corrected
1RenaultClio2MRKL32020t894509905498t02023LKRM2
2FordFocus5irpo52022D568498894865D22025opri5
3FordFiesta7kOhj71999T456055550654T99917jhOk7
4RenaultTwingo9weŽT92018t489337733984t81029TŽew9
5MazdaCX-36PPRO42018M774848848477M81024ORPP6

LOCATE()

Use this function when you want to find the location of a particular character. It will be returned as an integer, specifying the number of characters from the start of the string.

For example, if you want to locate the comma in the column dealer, this is what you’d use:

SELECT dealer,
	 LOCATE(',', dealer) AS comma_location
FROM showroom_cars;

In this example, the first argument in the function is the comma specified in the single quote. The second argument is the column dealer.

Here are the commas’ positions:

dealercomma_location
Fred Cars, Nantes10
Auto Jacques, Bordeaux13
Fred Cars, Nantes10
Paris Cars, Paris11
Top Car, Marseille8

How can this function be more useful? Well, you can use it with other functions, such as CHAR_LENGTH() and RIGHT(). For example, we can use it with the dealer column to show only the dealer’s city:

SELECT dealer,
	 LOCATE(',', dealer) AS comma_location,
       CHAR_LENGTH(dealer) AS dealer_length,
       RIGHT(dealer, CHAR_LENGTH(dealer)-LOCATE(',', dealer)-1) AS dealer_town
FROM showroom_cars;

The columns comma_location and dealer_length are there only to make it easier to understand how showing the dealer’s town works. They return the location of the comma (same as in the above example) and the length of the string.

Now, the RIGHT() function also takes the string from the column dealer. The second argument should be the output string length. And it is, but it’s comprised of the CHAR_LENGTH() and LOCATE() functions. The string length is the difference between these two functions’ output.

If you want to see the dealers’ cities only, you need to output the characters that come after the comma. The thing is, the number of characters before the comma is different for every dealer. That’s why the output string in the RIGHT() function will be the total length of the dealer name minus the number of characters before the comma. This difference is the number of characters that appear after the comma, i.e. the dealer’s city. Why then the minus one in the RIGHT() function? Because after a comma comes the space, so deducting one character accounts for that.

dealercomma_locationdealer_lengthdealer_town
Fred Cars, Nantes1017Nantes
Auto Jacques, Bordeaux1322Bordeaux
Fred Cars, Nantes1017Nantes
Paris Cars, Paris1117Paris
Top Car, Marseille818Marseille

The number of characters in every city name should be the difference between the dealer_length and comma_location columns – which use the functions CHAR_LENGTH() and LOCATE(), respectively – minus one. Let’s check it. There are six characters in ‘Nantes’, which the mathematics corroborates: 17-10-1 = 6. The same is true with the other rows; you can check if you want.

If these examples have only activated your appetite for the SQL text functions, don’t worry; there are more examples in our article An Overview of SQL Text Functions. If you want to read about MySQL date functions, check out our in-depth-guide.

Time to Practice MySQL Text Functions!

I tried to make these examples of MySQL text functions as relatable as possible; now you understand how these functions work. However, nothing beats writing code yourself in the Common Functions in MySQL course, so use it to develop what you learned here. If that leads to taking the complete SQL learning path or any other MySQL courses, even better for you!