Back to articles list Articles Cookbook
4 minutes read

Can SQL Help Solve Crossword Puzzles?

Everyone has solved crossword puzzles and has certainly had some problems finding an appropriate word. Thanks to SQL, it is ridiculously simple to quickly dispel your crossword doubts and give you the correct answers. Of course, Google is commonly known as a universal cure for many doubts, but handling the problem yourself is much more rewarding.

Recently I came across some simple and interesting examples from Andrew Cumming's book "SQL Hacks." These examples won't make a huge impression on those who are programming experts, but less-experienced code wranglers could find them interesting. In this article I will try to present a funny approach to solving casual problems using the power of SQL according to Andrew Cumming's book.

First of all, create a database and table containing the whole dictionary. You can easily download the dictionary from: dictionary.

To perform any searches using SQL, you have to have your dictionary in a table in the database.

To create a table in the database, type this simple statement:

CREATE TABLE dict (word VARCHAR(255))

... and fill the table with words.

Unfortunately, your dictionary text file contains each word in each line in the form provided above. We need to convert the file into the form:

INSERT INTO dict VALUES ('word');

Converting words into SQL INSERT statements

To convert the text into the statement INSERT INTO dict VALUES ('word'); you can use either sed, that is a stream editor for filtering and transforming text, or perl.

First, change all uppercase letters to lowercase. It isn't necessary, but in my opinion it makes the text more readable.

perl -p -i -e 'tr/A-Z/a-z/' words.txt
perl -pe "s/'/''/g;s/.*/INSERT INTO dict VALUES ('$&');/" words.txt > words.sql

After conversion, you can run the script in your database and start to play.

Let's start with a simple example. Suppose that we are looking for an eight-letter word, where the second letter is 'h', the fourth letter 'b' and seventh letter 'n'. MySQL provides SQL pattern matching that enables you to use "_" to match any single character and "%" to match an arbitrary number of characters including zero characters. To compare words, use the operator 'like'. Additionally, MySQL supports POSIX regular expressions.

FROM dict
WHERE word like '_h_b__n_'

The result: thebaine

...For those of you who are interested, thebaine is a poisonous alkaloid that is chemically similar to both morphine and codeine, but has stimulatory rather than depressant effects.

Now, we want to find an 8-letter word beginning with the character 'o'.

Typing the underscore character multiple times may cause a mistake. It's possible to do it using various string functions: 'locate' returns the position of the first occurrence of a substring and 'length' specifies the length of a string.

Sample crossword puzzle, SQL puzzle

FROM dict
where locate('o',word)=1
and length(word) = 8

We can do the same using operator RLIKE or REGEXP.

FROM dict
WHERE word RLIKE '^o.{7}$'

^ specifies beginning of a line
$ end of line
. match character
{n} match n times

Using this simple pattern we specified an 8-letter word beginning with 'o'.

In another situation, we want to find an eight-letter word with the same letters at the beginning and the end:

FROM dict
WHERE word like concat('%', SUBSTR(word,1,3))

one of the example result: underground

SUBSTR(word,1,3) returns first three characters from the word according to the following pattern: SUBSTR(string,position,length). Function 'concat' concatenates two parts. The first '%' presents an arbitrary number of characters, the second presents the last part of word.

How SQL function concat works, SQL Hacks

Knowing only the fourth, fifth and sixth letter in a word, we can search the substring in words.

FROM dict
where SUBSTR(word,4,3) = 'bai'
and length(word) = 8

or in the different way

FROM dict
WHERE word RLIKE '.{3}bai.{2}'

A regular expression is a powerful way of specifying a pattern for a complex search and is used almost everywhere. In certain situations, they are a technique leading to faster and cleaner code. However, using it in SQL could also be slow. These queries can't use indexes, so a full table scan is required. There are some things we probably shouldn't be doing in SQL. Sometimes it's hard to get rid of LIKE, which is considered to have some regex-like functionality. I quote here David Eisinger, who on his blog about "Regular Expressions in MySQL" writes "Don't use RLIKE when LIKE will suffice and be sure to benchmark your queries with datasets similar to the ones you'll be facing in production."