Back to articles list Articles Cookbook
8 minutes read

How to Get Descendants of a Parent in SQL

Want to learn how to handle family trees and find descendants of a parent? By reading this article, you’ll learn how to handle hierarchical data.

Finding descendants from a parent is a common problem in SQL. If you imagine a family tree, the basic building block that forms the relationships within it is the parent-child relationship. The parent-child relationship is precisely what defines all hierarchical data.

Another example of hierarchical data is the manager-employee relationship. You can find hierarchical data in numerous other situations, such as detailed data about project(s), their subprojects, and every task that can be broken down into subtasks. Another example of hierarchical data is when you’re building a classification of living things where you classify the creatures according to the kingdom, phylum, classes, order, families, genus, and species. Or maybe you are looking at a music genre and its subgenres. For example, blues music can be separated into subgenres such as Delta blues, country blues, jump blues, and Chicago blues. It influenced a new style, soul music. Soul music has its subgenres, such as Southern soul, Motown soul, Philly soul, progressive soul, etc., which in turn influenced the creation of other genres, such as funk or disco.

This article covers the topics you can learn in greater detail in the Recursive Queries course. There’s a guide describing how this knowledge can be beneficial for you; reading it can help you decide whether to learn recursive queries.

Design of a Hierarchical Table

You’ll notice one specific characteristic of tables holding data with parent-child relationships: two columns containing ID values. One column is the ID used for the unique identification of the data within the table. The other column contains the ID used to reference another row.

Let me show you what I mean by that. The table you’ll use is named family_tree, with the following columns:

  • id: The person’s ID and the primary key (PK) for the table.
  • first_name: The person’s first name.
  • last_name: The person’s last name.
  • parent_id: The person’s parent ID.

Here are some data from the table:

idfirst_namelast_nameparent_id
2JohnMcArthur1
5SamMcArthur2

You see that John McArthur’s ID is 2. Sam McArthur’s ID is 5, while his parent_id = 2. This means his parent has the ID 2; in this case, it’s John McArthur.

This data structure is characteristic of hierarchical data which is quite common in databases. To get data from a table like this, you’ll have to use hierarchical or recursive queries. If you want to know what recursive queries can do, jump to the article revealing their power. I’ll be more specific and show you how to structure a recursive query to get all descendants of a parent from the table above.

Recursive Query: Retrieving the Descendants

First, let me show you what the query looks like. Then I’ll break it down for you, analyze all the parts of the query, and explain their purpose.

Let’s go! The query that’ll get you all descendants of a parent is this one:

WITH RECURSIVE descendant AS (
	SELECT	id,
			first_name,
			last_name,
			parent_id,
			0 AS level
	FROM family_tree
	WHERE id = 1

	UNION ALL

	SELECT	ft.id,
			ft.first_name,
			ft.last_name,
			ft.parent_id,
			level + 1
	FROM family_tree ft
JOIN descendant d
ON ft.parent_id = d.id
)

SELECT	d.id AS descendant_id,
		d.first_name AS descendant_first_name,
		d.last_name AS descendant_last_name,
		a.id AS ancestor_id,
		a.first_name AS ancestor_first_name,
		a.last_name AS ancestor_last_name,
		d.level
FROM descendant d
JOIN family_tree a
ON d.parent_id = a.id
ORDER BY level, ancestor_id;

The query starts by defining the CTE. The three letters stand for Common Table Expression, and you’ll recognize a CTE when you see the word WITH. More theory behind the CTE and its syntax are explained in another article.

CTEs can be recursive or non-recursive. Wondering how to differentiate between those two? It’s more straightforward than you think; the ones starting with WITH RECURSIVE are recursive.

When you write a CTE, you have to specify its name. In my case, the CTE is named descendant by the following line of code: WITH RECURSIVE descendant AS. What follows after that is a SELECT statement in parentheses, which is stored as a CTE. In this SELECT statement, I select all columns from the table family_tree. I also add a new column named level which will contain the value 0. Bear with me; you’ll see the purpose of this step in a little bit. Finally, I use a WHERE clause to filter the results. I’m interested in finding all descendants of one Peter McArthur, whose id = 1.

Then I “merge” this SELECT statement with the next one by using UNION ALL. It is a command that helps you combine the results of two or more SELECT statements. To make the UNION ALL work, all SELECT statements have to have the same number of columns. The following SELECT statement again selects all columns from the table family_tree. Additionally, it takes the value from the column level (from the previous SELECT statement, 0) and adds 1 with every recursion. The data for this SELECT statement is fetched by joining the table family_tree with the CTE, treated here like any other table. Since the CTE data is the same as the data in the table family_tree, save for the new column level, this JOIN essentially joins the table family_tree with itself on the columns parent_id and id.

By closing the parentheses, you’re done with defining the CTE. Now comes the SELECT statement, which uses the CTE and returns all descendants for each parent. I first select the columns id, first_name, and last_name from the CTE descendant in this SELECT statement. The CTE is treated like a regular table, with d as the alias. I join it with the table family_tree with the alias a. I’ve chosen this alias, because I’m treating the CTE as the table for the descendant data and family_tree as the table containing the ancestor data. It’s just a simple trick that allows me not to mess up when writing a query. The columns selected from the table family_tree are again id, first_name, and last_name. The last column selected is level from the CTE.

At the end, the data is ordered by the columns level and ancestor_id. Here’s the result:

descendant_iddescendant_first_namedescendant_last_nameancestor_idancestor_first_nameancestor_last_namelevel
2JohnMcArthur1PeterMcArthur1
3SteveMcArthur1PeterMcArthur1
4StaceyGustaffson1PeterMcArthur1
5SamMcArthur2JohnMcArthur2
6PaulMcArthur2JohnMcArthur2
7Steve IIMcArthur3SteveMcArthur2
8JimiGustaffson4StaceyGustaffson2
9JanisMontignac4StaceyGustaffson2
10TracySarakopuolus4StaceyGustaffson2
11AlMcArthur5SamMcArthur3
12RobertMcArthur6PaulMcArthur3
13CarolMcArthur6PaulMcArthur3
14SabineMcArthur7Steve IIMcArthur3
15MichelleMcArthur7Steve IIMcArthur3
16JudiOswald7Steve IIMcArthur3

These are all descendants of Peter McArthur whose ID is 1. The people at Level 1 are Peter McArthur’s children. His grandchildren are at Level 2, with their parents shown in the table. Level 3 shows Peter McArthur’s great-grandchildren, with their parents also shown by their names.

By knowing how to write the query above, you can apply it to any other scenario in which there’s a hierarchical data structure.

Additional Tips

Writing Recursive CTEs in Microsoft SQL Server

If you try to run the query above in SQL Server, it will show an error. No need to panic; SQL Server does support CTEs and recursive queries. The only difference is you don’t need to write RECURSIVE in SQL Server to get a recursive query. Simply omit it, like this:

WITH descendant AS...

Everything else is the same; there is no need to change any other part of the query. If you’re using Oracle, here’s the article explaining the use of hierarchical queries.

Getting the NULL Values in the Table

If you take a closer look at the table showing Peter McArthur’s descendants, you’ll notice there’s no Peter McArthur himself. That’s not an error; it’s simply because he’s not his own descendant. However, maybe you’ll want to show him too, so that you have the complete family tree from the first ancestor to the last descendant.

To do that, you’ll first need to know why Peter did not show up in the resulting table. The reason is that I used JOIN to connect the tables. Since Peter McArthur is the starting point of the family tree, there’s a NULL value in the column parent_id. The JOIN doesn’t return NULL values.

 

There’s just one little thing you need to change in the query to get the NULL values. Instead of using JOIN, simply use the LEFT JOIN. Here’s the part of the code that changes; everything else remains the same:

...FROM descendant d LEFT JOIN family_tree a ON d.parent_id = a.id...

The result differs in one row, namely the first one:

descendant_iddescendant_first_namedescendant_last_nameancestor_idancestor_first_nameancestor_last_namelevel
1PeterMcArthurNULLNULLNULL0
2JohnMcArthur1PeterMcArthur1
3SteveMcArthur1PeterMcArthur1
4StaceyGustaffson1PeterMcArthur1
5SamMcArthur2JohnMcArthur2
6PaulMcArthur2JohnMcArthur2
7Steve IIMcArthur3SteveMcArthur2
8JimiGustaffson4StaceyGustaffson2
9JanisMontignac4StaceyGustaffson2
10TracySarakopuolus4StaceyGustaffson2
11AlMcArthur5SamMcArthur3
12RobertMcArthur6PaulMcArthur3
13CarolMcArthur6PaulMcArthur3
14SabineMcArthur7Steve IIMcArthur3
15MichelleMcArthur7Steve IIMcArthur3
16JudiOswald7Steve IIMcArthur3

Getting the Hang of Finding the Descendants of a Parent?

In this article, you’ve learned about the hierarchical data structure and about the parent-child relationship. I’ve given you several real-life examples where you can find such data structures. They are not the only ones, of course.

You’ve learned the characteristic structure of such data, in preparation for writing a query that allows you to get all descendants of an ancestor. From that example, you should be able to apply the knowledge in similar scenarios. To be even more proficient at writing recursive queries and recognizing when to use them, practicing with the Recursive Queries course is highly recommended. The article explaining when to use the CTEs can also help you in getting better with CTEs.