Back to articles list October 9, 2020 - 10 minutes read How to Join the Same Table Twice Marija Ilic Marija works as a data scientist in the banking industry. She specializes in big data platforms (Cloudera and Hadoop) with software and technologies such as Hive/Impala, Python and PySpark, Kafka, and R. Marija has an extensive background in DWH/ETL development in the banking industry. Her main interests are predictive modeling, real-time decision-making, and social network analysis. Outside of work, Marija enjoys listening to her favorite LPs on her old gramophone—and never grows tired of its soothing crackle. Tags: sql learn sql joins JOIN is one of the most common statements in SQL. As you may know, it is used to join and combine data from two or more tables into one common data set. In this article, I’m going to discuss special types of joins? in which you combine the same table twice—including joining a table to itself, also known as the self join. When and why do you need to do this? How do you write it in SQL? Let’s find out. Joins: A Quick Review You are probably familiar with the joins in SQL. You have two tables, A and B, and you combine them by using a column common to both. Here is an example: We have two tables: customer and city, with a common column named city_id. Now, if you want to join them together to get the customers’ respective city names, you can do so with a join like this: select customer.customer_id, customer.firstname, customer.lastname, customer.birthdate, customer.spouse_id, customer.city_id, city.name as city_name from customer join city on customer.city_id = city.city_id; In this JOIN statement, we match the records from customer and city by a key (city_id). We retrieve all 6 columns from the customer table and one column, name, from the city table. There are several types of joins in SQL; this example does an INNER JOIN. I'm not going to dive deep into the JOIN syntax here. To learn more, check out our interactive course on SQL joins which you can find on LearnSQL.com. The result of this join will be one table with all 6 fields from the customer table, plus an additional field from the city table: customer_idfirstnamelastnamebirthdatespouse_idcity_idcity_name 1JohnMayer1983‑05‑1221London 2MaryMayer1990-07-3011London 3LisaRoss1989-04-1556Oxford 4AnnaTimothy1988-12-2664Leeds 5TimRoss1957-08-1536Oxford 6SteveDonell1967-07-0944Leeds 7DonnaTrapp1978-06-2302Manchester Joining the Same Table Multiple Times Now that we have done a quick review, let's look at more complex joins. Sometimes you need to join the same table multiple times. Generally, this involves adding one or more columns to a result set from the same table but to different records or by different columns. We will examine two such scenarios: joining a table to itself and joining tables with multiple relationships. Self Join: Joining a Table to Itself A self join is a special case of the join. Instead of joining two different tables, you join one table to itself. Why would we want to do this? In our example above, we wanted to add a column from the city table, the city name, to the customer table. So, we joined two different tables to each other. Doing a self join would mean, for instance, joining the customer table to itself. Here’s the customer table as a reminder: customer_idfirstnamelastnamebirthdatespouse_id 1JohnMayer1983-05-122 2MaryMayer1990-07-301 3LisaRoss1989-04-155 4AnnaTimothy1988-12-266 5TimRoss1957-08-153 6SteveDonell1967-07-094 7DonnaTrapp1978-06-23. The spouse_id column stores the customer_id of the customer’s spouse. For example, Customers 1 and 2 (John and Mary) are spouses of each other, Customers 3 and 5 (Lisa and Tim) are spouses of each other, and so on. We can add the first name and the last name of the spouse to each record in the customer table. To do this, we need to perform a self join, that is, join the customer table to itself: select cust.customer_id, cust.firstname, cust.lastname, cust.birthdate, cust.spouse_id, spouse.firstname as spouse_firstname, spouse.lastname as spouse_lastname from customer cust join customer spouse on cust.spouse_id = spouse.customer_id; When you run this code, the result is the following: customer_idfirstnamelastnamebirthdatespouse_idspouse_firstnamespouse_lastname 1JohnMayer1983‑05‑122MaryMayer 2MaryMayer1990-07-301JohnMayer 3LisaRoss1989-04-155TimRoss 4AnnaTimothy1988-12-266SteveDonell 5TimRoss1957-08-153LisaRoss 6SteveDonell1967-07-094AnnaTimothy Now that you’ve seen an example use case for self joins, let's review its SQL syntax. Self Join Syntax The syntax for the self join is very similar to any other type of joins. Here is the code from our self join example: select cust.customer_id, cust.firstname, cust.lastname, cust.birthdate, cust.spouse_id, spouse.firstname as spouse_firstname, spouse.lastname as spouse_lastname from customer cust join customer spouse on cust.spouse_id = spouse.customer_id; It is a JOIN statement in which the customer table is used twice. The information about the spouse, such as his or her name, is stored in the same table as a separate customer with his or her own customer_id. Since spouse_id contains the customer_id of the spouse, we need to join the table with itself to get the name of the spouse. You can think of a self join as a join between two copies of the same table. For each record with a non-null value in spouse_id, we search for the value of customer_id that matches it. When we find a match, the columns firstname and lastname are added to the resulting table. Table aliases are required in a self join. The code does not work without them, since it would not know which copy of the table you are referring to. Here, I’m using the aliases cust and spouse. Right before the FROM keyword, we choose columns we want to keep in the resulting table. We need to use the table aliases for column retrieval (cust.firstname, cust.lastname, spouse.firstname, etc.). We keep five columns from the customer table and append from the same table two columns that contain the name of the spouse. This is an inner join, but you can use any type of join: LEFT JOIN, RIGHT JOIN, CROSS JOIN, etc. In our example, we have an inner join which returns only the matched records; that is, only the customers with spouses are returned. Donna doesn't have a spouse in the table, so Donna is not included in the resulting data set. If you want to learn more about joins, I recommend our article “How to Learn SQL JOINs.” There is also an interactive SQL joins course available on our LearnSQL.com platform. Examples of the Self Join The above is just one example. Once you start to use SQL on a daily basis, you come across the need for self joins quite often. A common use case for the self join is when there is a hierarchy among the records in a table. This type of data structure is called a tree structure, and you often need to join the table with itself in SQL. Here are some examples. Example 1: Employee Hierarchy Each employee has a manager, and a manager in turn has his or her manager, all in the same table. If you want to add the corresponding manager information to each record, you need to do a self join. We cover this example in the article “An Illustrated Guide to the SQL Self Join,” so take a look at it to see what this looks like. Example 2: Department Hierarchy Each department within an organization has a parent: for example, the Data Science department is under the IT department, the IT department is under Business Support, and Business Support is under the board. Consider the following table, department, which looks like this: department_idnameparent_department_id 1Board of directors. 2Operations1 3Control and risk1 4Administration1 5Corporate credit2 6Retail banking2 7Investment2 8Risk management3 9Finance3 10Internal audit3 11IT4 12Legal4 13General services4 14Human resources4 Now, if you want to append the parent name to each department, you need to write a self join: select c.*, p.name as parent_name from department c left join department p on c.parent_department_id=p.department_id; Inside this SELECT, we join the department table with itself to get the parent department name as an additional field. Note that the record whose department_id is 1 doesn't have a parent (parent_department_id is NULL; it is not populated). This is because the board of directors are at the top of the tree structure. We want to display this record in the result, so we use a LEFT JOIN and not an INNER JOIN. When you run this code, the resulting table looks like this: department_idnameparent_department_idparent_name 1Board of directors.. 2Operations1Board of directors 3Control and risk1Board of directors 4Administration1Board of directors 5Corporate credit2Operations 6Retail banking2Operations 7Investment2Operations 8Risk management3Control and risk 9Finance3Control and risk 10Internal audit3Control and risk 11IT4Administration 12Legal4Administration 13General services4Administration 14Human resources4Administration You can easily see the parent to which each department belongs: IT is under Administration, Administration is under the Board, etc. Example 3: Category Hierarchy Take classified advertising—those popular web pages where you can rent, buy, or sell anything from real estate to miscellaneous products and services. To place an ad, you pick a category and a subcategory for your ad. For example, if you sell real estate, you would choose among subcategories like house, apartment, or land. We have a table named category which contains information about these categories and subcategories as well as their relationships. The relationships among the categories and subcategories in this table are stored in a parent-child structure like this: category_idcategory_nameparent_category_id 1Real estate. 2Apartments1 3Houses1 4Offices1 5Cars. 6Motorcycles5 7Personal cars5 8Oldtimer5 9Trucks5 All categories and subcategories are in this one table. Now, If you want to add information about the parent to each record, you will need to do a self join—join this table to itself: select subcategory.*, main.category_name as parent_name from category subcategory left join category main on subcategory.parent_category_id = main.category_id; Here is the result from executing this SQL statement: category_idcategory_nameparent_category_idparent_name 1Real Estate.. 2Apartments1Real Estate 3Houses1Real Estate 4Offices1Real Estate 5Cars.. 6Motorcycles5Cars 7Personal cars5Cars 8Oldtimer5Cars 9Trucks5Cars Multiple Relationships Between two Tables There are situations beside the self join in which you need to join the same table more than once. One is when you have multiple relationships between two different tables. This is where you join the same table twice but usually to some other table and not necessarily to itself. Suppose that the customer table has two fields that contain city IDs. This is common if you have two different cities for each customer?—?for example, the city of residence (residence_city_id) and the city of the mailing address where the notices should be sent (notice_city_id): customer_idfirstnamelastnamebirthdateresidence_city_idnotice_city_id 1JohnMayer1983-05-1216 2MaryMayer1990-07-3016 3LisaRoss1989-04-1567 4AnnaTimothy1988-12-2644 5TimRoss1957-08-1567 6SteveDonell1967-07-0944 7DonnaTrapp1978-06-2322 We also have city which has the city ID (city_id) and the name of the city (name), as seen earlier and shown below as a reminder: city_idname 1London 2Manchester 3Liverpool 4Leeds 5Bristol 6Oxford 7Reading 8Brighton 9Sheffield 10York Now, if you want to display the names of the cities, you will have to join the city table twice: select cust.customer_id, cust.firstname, cust.lastname, cust.birthdate, cust.residence_city_id, cust.notice_city_id, residence_city.name as residence_city_name, notice_city.name as notice_city_name from customer cust join city residence_city on cust.residence_city_id=residence_city.city_id join city notice_city on cust.notice_city_id=notice_city.city_id; Let’s break down what is happening in this code. First, we join customer and city with residence_city_id as the key. We get residence_city_name by matching it to city_id in the city table. A second join is performed between customer and city to get notice_city_name. The key used here is notice_city_id which also matches to city_id in the city table. We use table aliases cust for customer, residence_city for the first copy of city to get the residence city name, and notice_city for the second copy of city to get the notice city name. We use the aliases to define the columns in the resulting table. Aliases are also used during the join to define the key columns. Again, aliases are required in order to distinguish the two copies of city. When you run this code, you get the following result: customer_idfirstnamelastnamebirthdateresidence_city_idnotice_city_idresidence_city_namenotice_city_name 1JohnMayer1983‑05‑1216LondonOxford 2MaryMayer1990-07-3016LondonOxford 3LisaRoss1989-04-1567OxfordReading 4AnnaTimothy1988-12-2644LeedsLeeds 5TimRoss1957-08-1567OxfordReading 6SteveDonell1967-07-0944LeedsLeeds 7DonnaTrapp1978-06-2322ManchesterManchester We now have two additional columns with the corresponding city names. We use simple (i.e., inner) joins here, but you can use any type of join as needed. If you are new to SQL joins and want to read more about their different types, I recommend the articles “How to learn joins” and “How to practice joins” which cover these topics. If you prefer to learn by watching videos. I highly recommend the episode that discusses joins. Joining the Same Table Twice In this article, we discussed when you need to join the same table twice in SQL and saw some common business use cases. We explained how to do so and what the SQL syntax looks like. Self joins with hierarchical data and multiple relationships between two tables are just two of the situations for which you need to join the same table twice. There are others; generally, they involve adding one or more columns to a result set from the same table in the same column. If you want to learn more about joins, including self joins, I recommend our interactive course SQL Joins available on our LearnSQL.com platform. When it comes to SQL, it is important to practice; our course is designed for just that! Tags: sql learn sql joins You may also like How to Learn SQL JOINs Learn how to use SQL JOINs to effectively combine data across multiple tables and analyze sophisticated data sets. Read more SQL JOIN Cheat Sheet This 2-page SQL JOIN Cheat Sheet covers the syntax of different JOINs (even the rare ones!) Download it in PDF or PNG format. Read more How to Join 3 Tables (or More) in SQL Have you ever wondered how to join three tables in SQL? It's easy when you know the basics. Joining three tables can be as easy as joining two tables. Read more What's the Difference Between Having Multiple Tables in FROM and Using JOIN? What’s your approach to SQL JOINs? Do you use JOIN or simply list tables in FROM? Learn why SQL practitioners prefer the new syntax with the JOIN keyword. Read more Can you Join two Tables Without a Common Column? Do you need to combine two tables without a common column? Learn how the CROSS JOIN and the UNION operators can help you with this task. Read more Practical Examples of When to Use Non-Equi JOINs in SQL Do you only use equals in your JOIN conditions? Learn how non-equi JOINs can assist with listing duplicates and joining tables based on a range of values. Read more Subscribe to our newsletter Join our weekly newsletter to be notified about the latest posts.