Back to articles list Articles Cookbook
11 minutes read

What Is a Self Join in SQL? Explaned With 7 Examples

The self join is a popular special case of the SQL JOIN. While most JOINs link two or more tables with each other to present their data together, a self join links a table to itself. This is usually done by joining a table to itself just once within a SQL query, but it is possible to do so multiple times within the same query.

Typically, each table in a database stores a specific type of information. So, often there are hundreds of tables related to each other in a database. This implies the need for joins. You can join different tables by their common columns using the JOIN keyword. It is also possible to join a table to itself, which is known as a self join. In this article, we will discuss what a self join is, how it works, and when you need it in your SQL queries.

To practice SQL JOIN, including self joins, I recommend our interactive SQL JOINs course. It contains over 90 exercises on differnt types of JOIN, including sections dedicated just to self joins.

What Is a Self Join in SQL?

The self join, as its name implies, joins a table to itself. To use a self join, the table must contain a column (call it X) that acts as the primary key and a different column (call it Y) that stores values that can be matched up with the values in Column X. The values of Columns X and Y do not have to be the same for any given row, and the value in Column Y may even be null.

Let’s take a look at an example. Consider the table Employees:

IdFullNameSalaryManagerId
1John Smith100003
2Jane Anderson120003
3Tom Lanon150004
4Anne Connor20000
5Jeremy York90001

Each employee has his/her own Id, which is our “Column X.” For a given employee (i.e., row), the column ManagerId contains the Id of his or her manager; this is our “Column Y.” If we trace the employee-manager pairs in this table using these columns:

  • The manager of the employee John Smith is the employee with Id 3, i.e., Tom Lanon.
  • The manager of the employee Jane Anderson is the employee with Id 3, i.e., Tom Lanon.
  • The manager of the employee Tom Lanon is the employee with Id 4, i.e., Anne Connor.
  • The employee Anne Connor does not have a manager; her ManagerId is null.
  • The manager of the employee Jeremy York is the employee with Id 1, i.e., John Smith.

This type of table structure is very common in hierarchies. Now, to show the name of the manager for each employee in the same row, we can run the following query:

SELECT
	employee.Id,
    	employee.FullName,
    	employee.ManagerId,
    	manager.FullName as ManagerName
FROM Employees employee
JOIN Employees manager
ON employee.ManagerId = manager.Id

which returns the following result:

IdFullNameManagerIdManagerName
1John Smith3Tom Lanon
2Jane Anderson3Tom Lanon
3Tom Lanon4Anne Connor
5Jeremy York1John Smith

The query selects the columns Id, FullName, and ManagerId from the table aliased employee. It also selects the FullName column of the table aliased manager and designates this column as ManagerName. As a result, every employee who has a manager is output along with his/her manager’s ID and name.

In this query, the Employees table is joined with itself and has two different roles:

  • Role 1: It stores the employee data (alias employee).
  • Role 2: It stores the manager data (alias manager).

By doing so, we are essentially considering the two copies of the Employees table as if they are two distinct tables, one for the employees and another for the managers.

You can find more about the concept of the self join in our article An Illustrated Guide to the SQL Self Join.

Table Aliases in Self Join

When referring to the same table more than once in an SQL query, we need a way to distinguish each reference from the others. For this reason, it is important to use aliases to uniquely identify each reference of the same table in an SQL query. As a good practice, the aliases should indicate the role of the table for each specific reference in a query.

The aliases are in red in the following query. You can see their declaration in the FROM and JOIN clauses.

SELECT
	employee.Id,
    	employee.FullName,
    	employee.ManagerId,
    	manager.FullName as ManagerName
FROM Employees employee
JOIN Employees manager
ON employee.ManagerId = manager.Id

The JOIN keyword connects two tables and is usually followed by an ON or USING clause that specifies the common columns used for linking the two tables. Here, we see that the two references to the Employees table are linked by conditioning on the employee’s ManagerId to match the employee ID of the manager.

Examples

Let’s go through some common scenarios that use the self join.

Scenario 1: Processing a Hierarchy in SQL

The self join is commonly used in processing a hierarchy. As we saw earlier, a hierarchy assigns a row in a table to another row within the same table. You might think of it as having parent and child rows.

Let’s go back to the example with the employees and their managers. Here’s the Employees table again:

IdFullNameSalaryManagerId
1John Smith100003
2Jane Anderson120003
3Tom Lanon150004
4Anne Connor20000
5Jeremy York90001

And the code to list every employee that has a manager with the name of his or her manager:

SELECT
	employee.Id,
    	employee.FullName,
    	employee.ManagerId,
    	manager.FullName as ManagerName
FROM Employees employee
JOIN Employees manager
ON employee.ManagerId = manager.Id

Here’s the result when you run the code:

IdFullNameManagerIdManagerName
1John Smith3Tom Lanon
2Jane Anderson3Tom Lanon
3Tom Lanon4Anne Connor
5Jeremy York1John Smith

This query uses the standard join, also known as the INNER JOIN. To read more about the INNER JOIN, please see our article An Illustrated Guide to the SQL INNER JOIN.

If we want to list all the employees whether or not they have managers, we can use a LEFT OUTER JOIN instead. The query below does this:

SELECT
	employee.Id,
    	employee.FullName,
    	employee.ManagerId,
    	manager.FullName as ManagerName
FROM Employees employee
LEFT OUTER JOIN Employees manager
ON employee.ManagerId = manager.Id

When you run this query, you get the following result:

IdFullNameManagerIdManagerName
1John Smith3Tom Lanon
2Jane Anderson3Tom Lanon
3Tom Lanon4Anne Connor
4Anne Connor  
5Jeremy York1John Smith

The difference between JOIN and LEFT OUTER JOIN becomes clear when we compare this with the result of the previous SQL query. In the result of the (inner) JOIN query, only the employees with managers are included. In contrast, the LEFT OUTER JOIN query returns all employees, with or without managers. To read more about the OUTER JOIN, please read our article An Illustrated Guide to the SQL OUTER JOIN.

Another example of a hierarchy is the relationship between parents and their children. Consider the Human table shown here:

IdNameAgeParentId
1Jonathan53
2Alexandra73
3Barbara30

In the query below, the children are assigned to their respective parents by joining the Human table to itself:

SELECT
	child.Id as ChildId,
    	child.FirstName as ChildFirstName,
    	child.Age as ChildAge,
    	child.ParentId,
    	parent.FirstName as ParentFirstName,
    	parent.age as ParentAge
FROM Human child
INNER JOIN Human parent
ON child.ParentId = parent.Id

Here’s the result of this query:

ChildIdChildFirstNameChildAgeParentIdParentFirstNameParentAge
1Jonathan53Barbara30
2Alexandra73Barbara30

The result of the query includes only the children who have parents. As was the case in the example of the employee-manager hierarchy, we could use a LEFT OUTER JOIN to include all rows from the table aliased child.

Here is yet another example of a hierarchy. Consider the table Category, shown below:

IdQuantityCategoryParentCategoryId
160Food
250Fruit1
340Apple2
420Granny Smith3
5100Milk1
660Soy Milk5
740Cow Milk5
830Whole Milk7
910Fat-Free Milk7

Let’s assign a parent category to each category wherever it is possible. Here’s a query to do that:

SELECT
	category.Id,
	category.Quantity,
    	category.Category,
    	category.ParentCategoryId,
    	parentcategory.Category as ParentCategory
FROM Category category
JOIN Category parentcategory
ON category.ParentCategoryId = parentcategory.Id

And here is the result:

IdQuantityCategoryParentCategoryIdParentCategory
250Fruit1Food
340Apple2Fruit
420Granny Smith3Apple
5100Milk1Food
660Soy Milk5Milk
740Cow Milk5Milk
830Whole Milk7Cow Milk
910Fat-Free Milk7Cow Milk

The first four columns of the result above come from the reference to the table aliased category. The last column comes from the table aliased parentcategory and contains the parent category name matched by the respective Id.

The Category table has two different roles as denoted by the two separate references. The column ParentCategoryId from the table aliased category is matched with Id from the table aliased parentcategory. The ON clause specifies that ParentCategoryId from category must equal Id from parentcategory to connect the corresponding rows.

Scenario 2: Listing Pairs Within a Table

You can use a self join to generate pairs of rows based on the condition in the ON clause. Let’s start with a simple example that generates all possible pairs among the colleagues. Consider the following table, Colleagues:

IdFullNameAge
1Bart Thompson43
2Catherine Anderson44
3John Burkin35
4Nicole McGregor29

Suppose we need to generate all possible pairs among the colleagues so that everyone has a chance to talk with everyone else at the company introductory evening. Here’s the SQL code:

SELECT
	teammate1.FullName as Teammate1FullName,
	teammate1.Age as Teammate1Age,
    	teammate2.FullName as Teammate2FullName,
	teammate2.Age as Teammate2Age
FROM Colleagues teammate1
CROSS JOIN Colleagues teammate2
ON teammate1.FullName <> teammate2.FullName

And here’s the result:

Teammate1FullNameTeammate1AgeTeammate2FullNameTeammate2Age
Catherine Anderson44Bart Thompson43
John Burkin35Bart Thompson43
Nicole McGregor29Bart Thompson43
Bart Thompson43Catherine Anderson44
John Burkin35Catherine Anderson44
Nicole McGregor29Catherine Anderson44
Bart Thompson43John Burkin35
Catherine Anderson44John Burkin35
Nicole McGregor29John Burkin35
Bart Thompson43Nicole McGregor29
Catherine Anderson44Nicole McGregor29
John Burkin35Nicole McGregor29

The result matches every person with every single person in the table. Since we don’t want anyone to be paired with himself or herself, we have the ON clause condition teammate1.FullName <> teammate2.FullName. This means that each person will be paired with three other colleagues, because there are four colleagues at this event.

Now, let’s look at a slightly more complicated example. Consider the Human table shown below. We want to match all the ancestors to each person wherever the data allows, where a person is an ancestor if he or she has a higher Id value.

Below is the Human table used in this example.

IdFirstNameAgeParentId
1Jonathan53
2Alexandra73
3Barbara304
4Tom506
5George556
6Amy807
7Josephine9935

Let’s find all the descendant-ancestor pairs in the above table. Here’s the SQL code:

SELECT
	descendant.Id,
    	descendant.FirstName,
    	descendant.Age,
    	descendant.ParentId,
    	ancestor.Id as AncestorId,
    	ancestor.FirstName as AncestorFirstName,
    	ancestor.Age as AncestorAge
FROM Human descendant
LEFT JOIN Human ancestor
ON descendant.ParentId <= ancestor.Id

And the result:

IdFirstNameAgeParentIdAncestorIdAncestorFirstNameAncestorAge
1Jonathan533Barbara30
1Jonathan534Tom50
1Jonathan535George55
1Jonathan536Amy80
1Jonathan537Josephine99
2Alexandra733Barbara30
2Alexandra734Tom50
2Alexandra735George55
2Alexandra736Amy80
2Alexandra737Josephine99
3Barbara3044Tom50
3Barbara3045George55
3Barbara3046Amy80
3Barbara3047Josephine99
4Tom5066Amy80
4Tom5067Josephine99
5George5566Amy80
5George5567Josephine99
6Amy8077Josephine99
7Josephine99

By specifying the ON clause condition descendant.ParentId <= ancestor.Id, we find all the ancestors of every person in the table where they exist; otherwise, the query returns null for the ancestor information.

The first four columns are taken from the table aliased descendant which contains information of the person for whom the ancestors are searched. The last three columns are taken from the table aliased ancestor and contain details about each ancestor.

Scenario 3: Self Join in Combination With Another Table

In SQL, it is possible to have a self join in combination with one or more different tables. While not a clean self join, this is very common in practice.

A real-life example of this is the flight information in airports, with an enormous amount of data each hour. Suppose we want to search for a flight identification number along with the details about their departure and destination airports. Consider the following tables:

Table Airport:

AirportIdCountryCity
1USANew York
2CanadaToronto
3GermanyFrankfurt
4FranceParis
5ItalyRome

Table Flight:

FlightIdAirplaneIdStartTimestampEndTimestampStartAirportIdEndAirportId
25558772020-01-14 13:00:002020-01-14 15:00:0034
32225362020-02-04 01:00:002020-02-04 16:00:0015
41117452020-02-15 09:00:002020-02-15 12:00:0054
57775242020-02-24 03:00:002020-02-24 19:00:0042
68885212020-03-25 10:00:002020-03-25 12:00:0021
74449372020-04-01 00:00:002020-04-01 17:00:0031
2431116542020-01-01 02:00:002020-01-01 04:00:0012

Here, note that the column AirportId of the Airport table is the foreign key to the columns StartAirportId and EndAirportId of the Flight table. We will join the Airport table to the Flight table two separate times as follows:

  • In the first JOIN, Airport takes the role of the table with the starting airports.
  • In the second JOIN, Airport takes the role of the table with the destination airports.

The query looks like this:

SELECT
	flight.FlightId,
    	flight.AirplaneId,
    	flight.StartAirportId,
    	startairport.Country as StartAirportCountry,
    	startairport.City as StartAirportCity,
    	flight.EndAirportId,
    	endairport.Country as EndAirportCountry,
    	endairport.City as EndAirportCity
FROM Flight flight
JOIN Airport startairport
ON flight.StartAirportId = startairport.AirportId
JOIN Airport endairport
ON flight.EndAirportId = endairport.AirportId

And the result of the query looks like this:

FlightIdAirplaneIdStartAirportIdStartAirportCountryStartAirportCityEndAirportIdEndAirportCountryEndAirportCity
11116541USANew York2CanadaToronto
25558773GermanyFrankfurt4FranceParis
32225361USANew York5ItalyRome
41117455ItalyRome4FranceParis
57775244FranceParis2CanadaToronto
68885212CanadaToronto1USANew York
74449373GermanyFrankfurt1USANew York

Let’s analyze the result. The first three columns come from a straightforward SELECT of the Flight table. The next two columns come from Airport in the role of the starting airport table; the rows are matched based on AirportId and StartAirportId from the Airport and Flight tables, respectively. This is followed by a column from the Flight table. The last two columns come from Airport in the role of the destination airport table this time; the rows are matched based on AirportId and EndAirportId from the Airport and Flight tables, respectively.

Still a bit confused about all the JOINs? There are many more articles for you to browse through for help. I especially recommend the article on How to Learn SQL JOINs. And if you need to start practicing, please visit our article on How to Practice SQL JOINs.

Self Join: A Special Case of the Join

As we have seen, the self join is an important special case of the join. We have seen examples of various applications of the self join, including processing a hierarchy in a table and pairing the rows within a table. We can join the same table multiple times, but it is important to give each reference an alias that indicates its role. These table aliases are used to fetch columns from this single table based on the role for which it is referenced.

Joins are a vital part of SQL and a very helpful, frequently used feature for combining different tables. They are everywhere – be sure to check out our course on SQL JOINs to master this powerful tool!