22nd Dec 2014 Updated: 15th Mar 2023 2 minutes read

How Does a Database Sort Strings?

ORDER BY

Different languages have different alphabets and different ways to order letters within those alphabets. For example, a Polish character Ł comes right after L and before M. In Swedish, a letter Å comes almost at the end, right after Z. In French diacritics marks have no impact on the alphabetical order, so the letters À, Á and Â are treated as the letter A when sorting strings.

A collation is a set of rules that defines how to compare and sort character strings. A collation tells you what the order of characters is and which characters should be treated as the same.

Usually there are at least two collations for a language:

the case-insensitive collation – uppercase and lowercase version of a character are treated as the same character
the case-sensitive collation – uppercase and lowercase characters are not identified.

In most databases you can specify a default collation. If you need to, you can change the collation for a table, a column, or even at the query level.

In a database the collation influences:

the behavior of the string comparisons (>, <, >=, <=, = operators).

In a case-insensitive collation we would have:
```
select 'A' = 'a';
+-----------+
| 'A' = 'a' |
+-----------+
|         1 |
+-----------+
```

the behavior of the ORDER BY clause,

In a Polish collation you would have:

select * from names  order by name;
+----+------------+
| id | name       |
+----+------------+
|  5 | Leon       |
|  3 | Łukasz     |
|  4 | Marcin     |
+----+------------+

how LIKE statements behave.

In a French collation we have:

select * from names where name like '%i%';
+----+------------+
| id | name       |
+----+------------+
|  1 | Eveline    |
|  2 | Adélaïde   |
+----+------------+

The website: http://collation-charts.org lists collation for some popular database engines, such as MySQL, Oracle, or SQL Server.

You may also like