Back to articles list Articles Cookbook
2 minutes read

How Does a Database Sort Strings?

Different languages have different alphabets and different ways to order letters within those alphabets. For example, a Polish character Ł comes right after L and before M. In Swedish, a letter Å comes almost at the end, right after Z. In French diacritics marks have no impact on the alphabetical order, so the letters À, Á and  are treated as the letter A when sorting strings.

A collation is a set of rules that defines how to compare and sort character strings. A collation tells you what the order of characters is and which characters should be treated as the same.

Usually there are at least two collations for a language:

  • the case-insensitive collation – uppercase and lowercase version of a character are treated as the same character
  • the case-sensitive collation – uppercase and lowercase characters are not identified.

In most databases you can specify a default collation. If you need to, you can change the collation for a table, a column, or even at the query level.

In a database the collation influences:

  • the behavior of the string comparisons (>, <, >=, <=, = operators).

    In a case-insensitive collation we would have:

    select 'A' = 'a';
    | 'A' = 'a' |
    |         1 |

  • the behavior of the ORDER BY clause,

    In a Polish collation you would have:

    select * from names  order by name;
    | id | name       |
    |  5 | Leon       |
    |  3 | Łukasz     |
    |  4 | Marcin     |

  • how LIKE statements behave.

    In a French collation we have:

    select * from names where name like '%i%';
    | id | name       |
    |  1 | Eveline    |
    |  2 | Adélaïde   |

The website: lists collation for some popular database engines, such as MySQL, Oracle, or SQL Server.

Further reading: