Back to articles list Articles Cookbook
10 minutes read

How to Join the Same Table Twice

JOIN is one of the most common statements in SQL. As you may know, it is used to join and combine data from two or more tables into one common data set. In this article, I’m going to discuss special types of joins? in which you combine the same table twice—including joining a table to itself, also known as the self join. When and why do you need to do this? How do you write it in SQL? Let’s find out.

Twins

Joins: A Quick Review

You are probably familiar with the joins in SQL. You have two tables, A and B, and you combine them by using a column common to both. Here is an example:

Tables

We have two tables: customer and city, with a common column named city_id.

Now, if you want to join them together to get the customers’ respective city names, you can do so with a join like this:

select customer.customer_id,
     customer.firstname,
     customer.lastname,
     customer.birthdate,
     customer.spouse_id,
     customer.city_id,
     city.name as city_name
from customer 
join city 
on customer.city_id = city.city_id;

In this JOIN statement, we match the records from customer and city by a key (city_id). We retrieve all 6 columns from the customer table and one column, name, from the city table. There are several types of joins in SQL; this example does an INNER JOIN.

I'm not going to dive deep into the JOIN syntax here. To learn more, check out our interactive SQL JOINs course. It contains 93 exercises designed to refresh SQL JOIN syntax and practice different JOIN types on real world problems.

The result of this join will be one table with all 6 fields from the customer table, plus an additional field from the city table:

customer_idfirstnamelastnamebirthdatespouse_idcity_idcity_name
1JohnMayer1983‑05‑1221London
2MaryMayer1990-07-3011London
3LisaRoss1989-04-1556Oxford
4AnnaTimothy1988-12-2664Leeds
5TimRoss1957-08-1536Oxford
6SteveDonell1967-07-0944Leeds
7DonnaTrapp1978-06-2302Manchester

Joining the Same Table Multiple Times

Now that we have done a quick review, let's look at more complex joins.

Sometimes you need to join the same table multiple times. Generally, this involves adding one or more columns to a result set from the same table but to different records or by different columns. We will examine two such scenarios: joining a table to itself and joining tables with multiple relationships.

Self Join: Joining a Table to Itself

A self join is a special case of the join. Instead of joining two different tables, you join one table to itself. Why would we want to do this?

In our example above, we wanted to add a column from the city table, the city name, to the customer table. So, we joined two different tables to each other. Doing a self join would mean, for instance, joining the customer table to itself.

Here’s the customer table as a reminder:

customer_idfirstnamelastnamebirthdatespouse_id
1JohnMayer1983-05-122
2MaryMayer1990-07-301
3LisaRoss1989-04-155
4AnnaTimothy1988-12-266
5TimRoss1957-08-153
6SteveDonell1967-07-094
7DonnaTrapp1978-06-23.

The spouse_id column stores the customer_id of the customer’s spouse. For example, Customers 1 and 2 (John and Mary) are spouses of each other, Customers 3 and 5 (Lisa and Tim) are spouses of each other, and so on. We can add the first name and the last name of the spouse to each record in the customer table. To do this, we need to perform a self join, that is, join the customer table to itself:

select cust.customer_id,
      cust.firstname,
      cust.lastname,
      cust.birthdate,
      cust.spouse_id,
      spouse.firstname as spouse_firstname,
      spouse.lastname as spouse_lastname
from customer cust 
join customer spouse
on cust.spouse_id = spouse.customer_id;

When you run this code, the result is the following:

customer_idfirstnamelastnamebirthdatespouse_idspouse_firstnamespouse_lastname
1JohnMayer1983‑05‑122MaryMayer
2MaryMayer1990-07-301JohnMayer
3LisaRoss1989-04-155TimRoss
4AnnaTimothy1988-12-266SteveDonell
5TimRoss1957-08-153LisaRoss
6SteveDonell1967-07-094AnnaTimothy

Now that you’ve seen an example use case for self joins, let's review its SQL syntax.

Self Join Syntax

The syntax for the self join is very similar to any other type of joins. Here is the code from our self join example:

select cust.customer_id,
      cust.firstname,
      cust.lastname,
      cust.birthdate,
      cust.spouse_id,
      spouse.firstname as spouse_firstname,
      spouse.lastname as spouse_lastname
from customer cust
join customer spouse
on cust.spouse_id = spouse.customer_id;

It is a JOIN statement in which the customer table is used twice.

The information about the spouse, such as his or her name, is stored in the same table as a separate customer with his or her own customer_id. Since spouse_id contains the customer_id of the spouse, we need to join the table with itself to get the name of the spouse.

You can think of a self join as a join between two copies of the same table. For each record with a non-null value in spouse_id, we search for the value of customer_id that matches it. When we find a match, the columns firstname and lastname are added to the resulting table.

Table aliases are required in a self join. The code does not work without them, since it would not know which copy of the table you are referring to. Here, I’m using the aliases cust and spouse.

Right before the FROM keyword, we choose columns we want to keep in the resulting table. We need to use the table aliases for column retrieval (cust.firstname, cust.lastname, spouse.firstname, etc.). We keep five columns from the customer table and append from the same table two columns that contain the name of the spouse.

This is an inner join, but you can use any type of join: LEFT JOIN, RIGHT JOIN, CROSS JOIN, etc. In our example, we have an inner join which returns only the matched records; that is, only the customers with spouses are returned. Donna doesn't have a spouse in the table, so Donna is not included in the resulting data set.

If you want to learn more about joins, I recommend our article “How to Learn SQL JOINs.” There is also an interactive SQL JOINs course available on our LearnSQL.com platform.

Examples of the Self Join

The above is just one example. Once you start to use SQL on a daily basis, you come across the need for self joins quite often.

A common use case for the self join is when there is a hierarchy among the records in a table. This type of data structure is called a tree structure, and you often need to join the table with itself in SQL. Here are some examples.

Example 1: Employee Hierarchy

Each employee has a manager, and a manager in turn has his or her manager, all in the same table. If you want to add the corresponding manager information to each record, you need to do a self join. We cover this example in the article “An Illustrated Guide to the SQL Self Join,” so take a look at it to see what this looks like.

Example 2: Department Hierarchy

Each department within an organization has a parent: for example, the Data Science department is under the IT department, the IT department is under Business Support, and Business Support is under the board.

Consider the following table, department, which looks like this:

department_idnameparent_department_id
1Board of directors.
2Operations1
3Control and risk1
4Administration1
5Corporate credit2
6Retail banking2
7Investment2
8Risk management3
9Finance3
10Internal audit3
11IT4
12Legal4
13General services4
14Human resources4

Now, if you want to append the parent name to each department, you need to write a self join:

select c.*,
   p.name as parent_name
from department c 
left join department p 
on c.parent_department_id=p.department_id;

Inside this SELECT, we join the department table with itself to get the parent department name as an additional field. Note that the record whose department_id is 1 doesn't have a parent (parent_department_id is NULL; it is not populated). This is because the board of directors are at the top of the tree structure. We want to display this record in the result, so we use a LEFT JOIN and not an INNER JOIN.

When you run this code, the resulting table looks like this:

department_idnameparent_department_idparent_name
1Board of directors..
2Operations1Board of directors
3Control and risk1Board of directors
4Administration1Board of directors
5Corporate credit2Operations
6Retail banking2Operations
7Investment2Operations
8Risk management3Control and risk
9Finance3Control and risk
10Internal audit3Control and risk
11IT4Administration
12Legal4Administration
13General services4Administration
14Human resources4Administration

You can easily see the parent to which each department belongs: IT is under Administration, Administration is under the Board, etc.

Example 3: Category Hierarchy

Take classified advertising—those popular web pages where you can rent, buy, or sell anything from real estate to miscellaneous products and services. To place an ad, you pick a category and a subcategory for your ad. For example, if you sell real estate, you would choose among subcategories like house, apartment, or land.

We have a table named category which contains information about these categories and subcategories as well as their relationships. The relationships among the categories and subcategories in this table are stored in a parent-child structure like this:

category_idcategory_nameparent_category_id
1Real estate.
2Apartments1
3Houses1
4Offices1
5Cars.
6Motorcycles5
7Personal cars5
8Oldtimer5
9Trucks5

All categories and subcategories are in this one table. Now, If you want to add information about the parent to each record, you will need to do a self join—join this table to itself:

select subcategory.*,
    main.category_name as parent_name
from category subcategory 
left join category main 
on subcategory.parent_category_id = main.category_id;

Here is the result from executing this SQL statement:

category_idcategory_nameparent_category_idparent_name
1Real Estate..
2Apartments1Real Estate
3Houses1Real Estate
4Offices1Real Estate
5Cars..
6Motorcycles5Cars
7Personal cars5Cars
8Oldtimer5Cars
9Trucks5Cars

Multiple Relationships Between two Tables

There are situations beside the self join in which you need to join the same table more than once. One is when you have multiple relationships between two different tables. This is where you join the same table twice but usually to some other table and not necessarily to itself.

Suppose that the customer table has two fields that contain city IDs. This is common if you have two different cities for each customer?—?for example, the city of residence (residence_city_id) and the city of the mailing address where the notices should be sent (notice_city_id):

customer_idfirstnamelastnamebirthdateresidence_city_idnotice_city_id
1JohnMayer1983-05-1216
2MaryMayer1990-07-3016
3LisaRoss1989-04-1567
4AnnaTimothy1988-12-2644
5TimRoss1957-08-1567
6SteveDonell1967-07-0944
7DonnaTrapp1978-06-2322

We also have city which has the city ID (city_id) and the name of the city (name), as seen earlier and shown below as a reminder:

city_idname
1London
2Manchester
3Liverpool
4Leeds
5Bristol
6Oxford
7Reading
8Brighton
9Sheffield
10York

Now, if you want to display the names of the cities, you will have to join the city table twice:

select cust.customer_id,
      cust.firstname,
      cust.lastname,
      cust.birthdate,
      cust.residence_city_id,
      cust.notice_city_id,
      residence_city.name as residence_city_name,
      notice_city.name as notice_city_name
from customer cust 
join city residence_city 
on cust.residence_city_id=residence_city.city_id
join city notice_city 
on cust.notice_city_id=notice_city.city_id;

Let’s break down what is happening in this code. First, we join customer and city with residence_city_id as the key. We get residence_city_name by matching it to city_id in the city table. A second join is performed between customer and city to get notice_city_name. The key used here is notice_city_id which also matches to city_id in the city table.

We use table aliases cust for customer, residence_city for the first copy of city to get the residence city name, and notice_city for the second copy of city to get the notice city name. We use the aliases to define the columns in the resulting table. Aliases are also used during the join to define the key columns. Again, aliases are required in order to distinguish the two copies of city.

When you run this code, you get the following result:

customer_idfirstnamelastnamebirthdateresidence_city_idnotice_city_idresidence_city_namenotice_city_name
1JohnMayer1983‑05‑1216LondonOxford
2MaryMayer1990-07-3016LondonOxford
3LisaRoss1989-04-1567OxfordReading
4AnnaTimothy1988-12-2644LeedsLeeds
5TimRoss1957-08-1567OxfordReading
6SteveDonell1967-07-0944LeedsLeeds
7DonnaTrapp1978-06-2322ManchesterManchester

We now have two additional columns with the corresponding city names.

We use simple (i.e., inner) joins here, but you can use any type of join as needed. If you are new to SQL joins and want to read more about their different types, I recommend the articles “How to learn joins” and “How to practice joins” which cover these topics. If you prefer to learn by watching videos. I highly recommend the episode that discusses joins.

Joining the Same Table Twice

In this article, we discussed when you need to join the same table twice in SQL and saw some common business use cases. We explained how to do so and what the SQL syntax looks like. Self joins with hierarchical data and multiple relationships between two tables are just two of the situations for which you need to join the same table twice. There are others; generally, they involve adding one or more columns to a result set from the same table in the same column.

If you want to learn more about joins, including self joins, I recommend our SQL JOINs interactive course available on our LearnSQL.com platform. When it comes to SQL, it is important to practice; our course is designed for just that!