# Hierarchical Data and How to Query It in SQL

How do you recognize hierarchical data, and how do you query it? Two common SQL questions are about to be answered.

Querying hierarchical data is a rather common situation if you’re working with SQL and relational databases. Not that hierarchical data is something mystical and rare; on the contrary, it’s everywhere. So why is hierarchical data so challenging when it comes to relational databases? Mainly, the difficulties occur in translating hierarchical data to relational database principles.

When working with hierarchical data, the first step is to recognize it. I’ll start off by defining what hierarchical data is and giving you several everyday examples of it. Then I’ll move on to explaining how hierarchical data is usually stored in the databases. Finally, you’ll learn how to query such data using SQL.

What you learn in this article can be practiced in our interactive Recursive Queries course. It contains over 100 exercises that teach recursive queries starting with the basics and progressing to advanced topics like processing trees and graphs with SQL queries.

## What Is Hierarchical Data?

Hierarchical data is a specific kind of data, characterized by a hierarchical relationship between the data sets. What do you think of when you think about hierarchy? Probably of different levels: something is above, below, or at the same level as something else. In relational databases, a hierarchical relationship is also called a parent-child relationship. This means the child data has only one parent, while the parent data has one or more ‘children’.

It’s common to say that hierarchical data is recognized by its tree-like structure. You’ll see why in a moment as we look at common examples of hierarchical data.

## Hierarchical Data Examples

### Employee Hierarchies

One of the typical examples used for explaining hierarchical data are employee hierarchies. They’re represented by organizational charts like this one:

As you can see, this structure is narrow at the top and becomes wider further down – much like a pine tree. The president is at the top. His subordinates are two board members. Those two board members also have their subordinates. In the case of Jacqueline Managerovicz, those are Diane Drinkalot, the HR manager, and Rashawn Mangarello, the accounting manager. But they are also the president’s subordinates, even though they are indirect subordinates.

The other board member’s direct subordinates are Tony Workaholio, the sales manager, and Cassandra Ninetofiver, the IT manager. Paul Bossenheim, the company president, is their indirect superior too.

### Family Tree

A family tree is another common hierarchical data example. Its structure allows us to find ancestors and their descendants. A family tree could look something like this:

In this example, Mike Strongbow married Victoria Stromboli. They had two children, Florence and Claudio. Florence had one child (Valerie), and Claudio had two children (Art and Michelle). All three are children to their parents, but they are also grandchildren to Mike and Victoria. Mike and Victoria are also great-grandparents; their granddaughter had two children, Judy and James.

If you go to a bar, you’ve likely come across a hierarchical structure. I’m talking about the one you usually look at immediately after you take a seat: the drink menu, or card, or whatever you call it. For example, if you visit the Panthelya Bar, you’ll find it’s a very primitive bar that only offers beers and wines.

The bar offers two types of beers: ale and wheat beer. There are many beers you can order from each category. The wine category is divided into reds and whites, with four wines in each category.

### Living Things Taxonomy

Even you, the person reading this article, are a record in hierarchical data. As a human being, you have a certain position in the hierarchy of the taxonomy of living things.

The diagram above shows the hierarchy for humans, or homo sapiens. It would be too complicated to show the whole living things taxonomy. However, the principle is the same as in all the above diagrams. All humans belong to the Sapiens species, which is part of the genus Homo. This genus is part of the Hominidae family, one of the family under the order of Primates. Primates belong to the class of Mammalia, which is subordinate to the phylum, the kingdom, and, finally, the domain.

If you’re reading this article, there’s a good chance you’re doing it on your computer. Taking this into account, it’s entirely possible that your computer folders look something like this:

All your folders in this example (Learning and Freelancing) are on your D: disc. The Learning folder has two sub-folders: SQL, where you put all the interesting SQL-related articles like this one, and Python. Your freelancing folder contains three sub-folders: Jobs, Invoices, and Other documents.

Now that you know how to recognize hierarchical data, let’s see how it’s stored in the database and how to query it.

## Storing Hierarchical Data In a Database

A problem with hierarchical data usually surfaces when you try to save such data in a database. To do that, you need to cram all those multi-level data into a relatively flat format: a table. How do you convert hierarchical data to simple rows of data?

To store the hierarchical data in a database, there’s usually a column that refers to the same table. What does that mean? It’s probably best that I show you an example. The employee hierarchy seems very suitable for that!

### Querying Hierarchical Data Using a Self-Join

I’ll show you how to query an employee hierarchy. Suppose we have a table named `employee` with the following data:

• `employee_id` – The employee’s ID and the table’s primary key (PK).
• `first_name` – The employee’s first name.
• `last_name` – The employee’s last name.
• `reports_to` – The ID of this employee’s immediate supervisor or manager.

The `reports_to` is nothing but the `employee_id` column serving as a tool to show which employee reports to which employee. If the employee’s ID appears in the column `reports_to`, this employee is the boss of (at least some) other employees. Let me show you how it works:

employee_idfirst_namelast_namereports_to
1SharonSimon6
6MartinaNovakNULL

We see that Sharon Simon reports to the employee that has `employee_id = 6`, Martina Novak. In her case, the `reports_to` value is `NULL`. This means Martina Novak does not report to anyone. From that, we can conclude she’s at the top of the employee hierarchy.

This is the part where you’ll be self-joining the table. Don’t know what a self-join is? You can easily learn by reading this article with seven self-join examples. Remember, I’ve already mentioned that hierarchical data in a database usually has a column that refers to the same table. This is such an example. To get the direct subordinates from the table `employee`, you’ll need to write this query:

```SELECT
sub.employee_id AS subordinate_id,
sub.first_name AS subordinate_first_name,
sub.last_name AS subordinate_last_name,
sup.employee_id AS superior_id,
sup.first_name AS superior_first_name,
sup.last_name AS superior_last_name
FROM employee sub
JOIN employee sup
ON sub.reports_to = sup.employee_id
ORDER BY superior_id;
```

This query joins the `employee` table with itself. Let me explain how it works. When joining a table with itself, you must use clear aliases so that SQL knows which data comes from which table – and that you know which data is from which table. In the above query, one table alias is `sub`. This means it’s the table with the subordinates’ data. The other alias is `sup`, meaning the table with the superiors’ data. Even though this is the same table, we’re treating it like it’s two different tables.

So the above query first selects the columns `employee_id`, `first_name`, and the `last_name` from the `sub` table. Then it takes the same data from the table `sup`. Thus, the table `employee` is then joined with itself using the two aliases. The self-join is made where the column `reports_to` from the table `sub` equals the column `employee_id` from the table `sup`. The data is finally ordered by the column `superior_id`.

Here’s the result:

subordinate_idsubordinate_first_namesubordinate_last_namesuperior_idsuperior_first_namesuperior_last_name
4KimMagnus6MartinaNovak
1SharonSimon6MartinaNovak
5VincentTrafalgar6MartinaNovak
7VictorFonseca2PatriciaRooney
2PatriciaRooney1SharonSimon
3JamesPalin1SharonSimon

The table shows that Kim Magnus, Sharon Simon, and Vincent Trafalgar are direct subordinates to Martina Novak. Victor Fonseca reports to Patricia Rooney. In turn, Patricia Rooney is a direct subordinate to Sharon Simon, as is James Palin.

## Using Recursive Queries on Deep Hierarchical Data

In the above example, I’ve shown you how to find direct superiors/subordinates. This means you’ve learned how to look only one level above or below. While this is very useful, hierarchies can be very deep and have an immense number of levels. Before querying such data, you’ll need to learn your way around recursive queries. Let’s talk about recursive queries first; then I’ll show you how they work with an example or two.

### What Are Recursive Queries?

If you want to know the recursive queries, you’ll first have to learn about Common Table Expressions, or CTEs.

A CTE is a temporary data set returned by a query, which is then used by another query. It’s temporary because the result is not stored anywhere; it exists only when the query is run. CTEs can be non-recursive and recursive. I’ve already written about (non-recursive) CTEs and when to use them; feel free to check that out later for more info.
A recursive query is a query that references itself. By doing so, they return the sub-result and repeat the process until they return the final result. Following this logic, a recursive CTE is a CTE that references itself.

### The Recursive CTE Syntax

The general recursive CTE syntax looks like this and can be divided into three parts:

```WITH RECURSIVE cte_name AS (
cte_query_definition

UNION ALL

cte_query_definition
)

SELECT *
FROM cte_name;
```

CTEs are also called  “WITH queries”. If you take a look at the above syntax, you’ll see why; the CTE always starts with the `WITH` clause. If you want your CTE to be recursive, you have to follow it by the word `RECURSIVE`. After that, you define the CTE name.

Then you have to write the CTE query definition. This part of the query is called the anchor member. It is “connected” to the other CTE using the `UNION ALL`. This second CTE query definition is called the recursive member, and it references the CTE itself.

At the end comes the `SELECT` statement, which fetches the data from the CTE. This part of the query is called the invocation.

Syntax is always best learned when you see it in an example. So here’s your first recursive CTE example!

### Querying the Employee Hierarchy

I want to build on the example where you learned how to self-join the `employee` table. Now I’m going to use the same table, but this time we’ll use a recursive query. The task is to find every employee’s direct and indirect boss. This relationship between the employees will be shown as a path that leads from the boss at the top (the owner) to every employee in the table.

```WITH RECURSIVE employee_hierarchy AS (
SELECT	employee_id,
first_name,
last_name,
reports_to,
'Owner' AS path
FROM employee
WHERE reports_to IS NULL

UNION ALL

SELECT
e.employee_id,
e.first_name,
e.last_name,
e.reports_to,
employee_hierarchy.path || '->' || e.last_name
FROM employee e, employee_hierarchy
WHERE e.reports_to = employee_hierarchy.employee_id
)
SELECT *
FROM employee_hierarchy;
```

As you’ve already learned, you start writing a recursive CTE using `WITH RECURSIVE`. Then you name the CTE. It’s `employee_hierarchy`, in this case.

The anchor member of the CTE is the first `SELECT` statement. By doing this, you select the root of the hierarchy; it’s the basis on which the recursive query will work its magic and find all other levels of the hierarchy. This statement selects all the columns from the table `employee`. It also adds the new column `path`, with the value in it being `'Owner'`. The `WHERE` clause means this will be done only for the rows where the value in the column `reports_to` is `NULL`. Why is that? If there is a `NULL` value in the `reports_to` column, the employee does not report to anyone. It means this is the owner of the company.

The next step is to “connect” the anchor member with the recursive member of the CTE with `UNION ALL`. The important thing about using `UNION ALL `is that the `SELECT` statements you’re “connecting” have to have the same number of columns. Otherwise, `UNION ALL` won’t work.

The recursive member is the second `SELECT` statement. This statement again selects all the columns from the table `employee`. It also takes the value (which is `'Owner'`) from the column path from the `employee_hierarchy` CTE. It adds '->' to it, followed by the value from the column `last_name` from the table `employee`. (The || is a concatenation operator; it combines two or more values into one value.) This will be a path leading from the owner to every employee.

The table `employee` and the CTE `employee_hierarchy` are joined like any other two tables. This is done where the column `reports_to` equals the column `employee_id`. The recursive member serves as an extension of the anchor member. This means it extends a result that’s already been found (by the anchor member) with new results. Thus, the recursive member will perform everything described until it reaches the last employee.

Finally, the simple invocation part selects all the data from the CTE `employee_hierarchy`. And, voila! The result is:

employee_idfirst_namelast_namereports_topath
6MartinaNovakNULLOwner
1SharonSimon6Owner->Simon
4KimMagnus6Owner->Magnus
5VincentTrafalgar6Owner->Trafalgar
2PatriciaRooney1Owner->Simon->Rooney
3JamesPalin1Owner->Simon->Palin
7VictorFonseca2Owner->Simon->Rooney->Fonseca

If you look at, say, Victor Fonseca, you can see that the path from the owner to him leads through Sharon Simon and Patricia Rooney.

Let’s practice recursive queries on another example!

### Querying the Folder Hierarchy

Companies usually have network drives where the employees save all their work. This usually leads to a very branched tree structure of folders. The data about the folders is stored in the table `folder`. Its columns are:

• `id` – The folder’s ID and the table’s primary key (PK).
• `name` – The name of the folder.
• `subfolder_of` – The name of the folder one level up.

To find the path for all the folders, you’ll need the following query:

```WITH RECURSIVE folder_hierarchy AS (
SELECT	id,
name,
subfolder_of,
CAST (name AS text) AS path
FROM folder
WHERE subfolder_of IS NULL

UNION ALL

SELECT	folder.id,
folder.name,
folder.subfolder_of,
folder_hierarchy.path || '\' || folder.name
FROM folder, folder_hierarchy
WHERE folder.subfolder_of = folder_hierarchy.id
)
SELECT *
FROM folder_hierarchy;
```

The principle is the same as in the previous example. Again, you start with `WITH RECURSIVE` and the name: `folder_hierarchy`. The first `SELECT` statement selects all three columns from the table `folder`. The fourth column is path, which contains data from the column name cast as text values. The data is cast to match the data type from the recursive member of the CTE. Finally, the WHERE clause limits the data only to those with the `NULL` values in the column `subfolder_of`. Where there’s `NULL`, there’s the root folder (i.e. the one having no folders above it).

The `UNION ALL` is again used to “connect” the anchor and recursive members of the CTE. One additional thing to remember: the data types in both `SELECT` statements have to be the same for `UNION ALL` to work. Otherwise, the query will return an error.

The second `SELECT` statement again selects all the columns from the table `folder`. The value from the columns path and name are put together, with `'\'` separating the data.

Finally, all the data from the CTE is selected, which returns a nice table:

idnamesubfolder_ofpath
1F:NULLF:
2Reporting1F:\Reporting
4Budget2F:\Reporting\Budget
5KPI2F:\Reporting\KPI
6Financial Reports2F:\Reporting\Financial Reports