15th Oct 2021 8 minutes read How to Join Only the First Row in SQL Kateryna Koidan JOIN Table of Contents The Problem 4 Ways to Join Only the Top Row in SQL Solution 1 Solution 2 Solution 3 Solution 4 Let’s Practice SQL JOINs! In your projects, you may encounter situations when you have many orders corresponding to one customer or many temperature observations corresponding to the same location, but you only need to join the first row with the most recent order or the most recently observed temperature to the corresponding record in another table. In this article, I’ll go through several ways to do this in SQL. The best way to practice basic and advanced SQL is our interactive SQL Practice Set course. It contains 88 hands-on exercises to help you refresh your SQL skills, starting with the basics and going to challenging problems. The Problem There are many different scenarios where you have a one-to-many relationship between two tables and you need to join only the first match from one table to the corresponding record in another. For example, you may be looking for: The most expensive item in each order. The most recently observed temperature for each location. The most experienced employee in each department. The most recent order for each customer. In all these cases, you may order the table with many corresponding records accordingly (e.g., by item price, observation date, etc.), and therefore, turn your problem into selecting the first, or the top, row. To demonstrate several possible solutions to this problem, we use the following tables that list the customers and their respective orders. customers idfirst_namelast_namephoneemail 11KateWhite+1 (415) 000 0000kate111111@gmail.com 12RoseParker+1 (415) 111 1111rose111111@gmail.com 13WilliamSpencer+1 (220) 222 2222bill111111@gmail.com 14JohnSmith+1 (220) 333 3333john111111@gmail.com orders idorder_datecustomer_idshipped_dateorder_status 1012021-10-01142021-10-02Completed 1022021-10-01112021-10-02Completed 1032021-10-02122021-10-03Completed 1042021-10-02112021-10-03Completed 1052021-10-0213NULLCanceled 1062021-10-03132021-10-05Completed 1072021-10-04122021-10-05Completed 1082021-10-04142021-10-06Completed 1092021-10-04132021-10-06Completed 1102021-10-04112021-10-06Completed 1112021-10-0511NULLAwaiting shipment 1122021-10-0512NULLAwaiting payment As you can see, every customer has several orders at our store. Let’s say for each customer, we want to know the date and the status of his/her most recent order. Here’s the output we are looking for: idfirst_namelast_nameorder_dateorder_status 11KateWhite2021-10-05Awaiting shipment 12RoseParker2021-10-05Awaiting payment 13WilliamSpencer2021-10-04Completed 14JohnSmith2021-10-04Completed The table lists the most recent order for each customer. No duplicates – each customer is mentioned only once, with the corresponding order that is the most recent according to the order date. Now let’s go through several possible ways to get this output from our initial tables. 4 Ways to Join Only the Top Row in SQL I’ll present four possible solutions to joining only the first row in SQL. Some of these solutions can be used with any database, while others work only with specific databases (e.g., PostgreSQL or MS SQL Server). Solution 1 If we know that the orders in our table are numbered sequentially, with a greater value of ID indicating a more recent order, we can use this column to define the latest record for each customer. Our step-by-step solution is the following: Define the greatest order ID for each customer. Assuming these IDs correspond to the most recent order for each customer, create a table that lists only the most recent orders. Join the customers table with this table of the most recent orders. This solution can be implemented using common table expressions (CTEs). WITH last_orders AS ( SELECT * FROM orders WHERE id IN ( SELECT MAX(id) FROM orders GROUP BY customer_id ) ) SELECT customers.id, customers.first_name, customers.last_name, last_orders.order_date, last_orders.order_status FROM customers JOIN last_orders ON customers.id = last_orders.customer_id ORDER BY customer_id; Alternatively, you can do the same using nested subqueries: SELECT customers.id, customers.first_name, customers.last_name, last_orders.order_date, last_orders.order_status FROM customers JOIN ( SELECT * FROM orders WHERE id IN ( SELECT MAX(id) FROM orders GROUP BY customer_id ) ) AS last_orders ON customers.id = last_orders.customer_id ORDER BY customer_id; In the queries above, we use one SELECT statement, or subquery, to find order IDs that correspond to the most recent order for each customer. We have another subquery to list these orders, and yet another query to join the table with the most recent orders with the table with customer information. I prefer to use CTEs in cases like these because, in my opinion, they have better structure and readability. If you want to learn more about CTEs or WITH clauses, check out this introductory article and this interactive Recursive Queries course that covers all kinds of CTEs. This solution gets us the output we need, but it relies on orders being indexed sequentially by when it was created. This may not always be the case. So, let’s move to the next solution that gives us more control over the output. Solution 2 If we cannot rely on the order ID to define the most recent order, we can add a column that does the job. Specifically, we can use a window function to number the rows of our orders table based on the order date, separately for each customer. Note that in our example, we use the order date without information on the exact order time for simplicity. This works in our case because we don’t have customers making multiple orders on the same day. However, you need to use the full timestamp to sort the orders in such cases. Our strategy in this solution is the following: Number the rows in the orders table so that the most recent order for each customer gets number 1. Select only the most recent order for each customer by filtering the records with row numbers equal to 1. Join the customers table with the table containing only the most recent orders. Again, we can implement the above strategy using CTEs: WITH numbered_orders AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date DESC ) AS row_number FROM orders ), last_orders AS ( SELECT * FROM numbered_orders WHERE numbered_orders.row_number = 1 ) SELECT customers.id, customers.first_name, customers.last_name, last_orders.order_date, last_orders.order_status FROM customers JOIN last_orders ON customers.id = last_orders.customer_id ORDER BY customer_id; or using nested subqueries: SELECT customers.id, customers.first_name, customers.last_name, last_orders.order_date, last_orders.order_status FROM customers JOIN ( SELECT * FROM ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date DESC ) AS row_number FROM orders ) AS numbered_orders WHERE numbered_orders.row_number = 1 ) AS last_orders ON customers.id = last_orders.customer_id ORDER BY customer_id; In the above SQL queries: We use the ROW_NUMBER() function to number the rows in the orders Note that before numbering the rows, we group them by customer ID with PARTITION BY and sort them by date in descending order to get the most recent order in the top row. We save the output of this subquery as numbered_orders. Next, we select the orders with the row number equal to 1 and save the result of this subquery as last_orders. Finally, we join the customers table with last_orders to get the required output. If you are new to window functions, learn more in this beginner-friendly guide and consider taking this interactive Window Functions course. For an overview of the syntax, check out the SQL Window Functions Cheat Sheet. Both of these solutions so far can be applied in all kinds of relational databases. Now, let’s move on to database-specific solutions. Solution 3 PostgreSQL allows the DISTINCT ON clause that can be of great value when we need to join only the first match in SQL: WITH last_orders AS ( SELECT DISTINCT ON (customer_id) * FROM orders ORDER BY customer_id, order_date DESC ) SELECT customers.id, customers.first_name, customers.last_name, last_orders.order_date, last_orders.order_status FROM customers JOIN last_orders ON customers.id = last_orders.customer_id ORDER BY customer_id; Instead of a separate subquery to number the rows or define the most recent order using order ID, we use DISTINCT ON (customer_id) to get only the first row corresponding to each customer. Also, in our CTE, we sort the rows by order date in descending order to ensure that the first row for each customer corresponds to the most recent order of this customer. The DISTINCT ON () clause is very convenient for cases like this, but unfortunately, it is available only in PostgreSQL. Solution 4 We can use the SQL toolkit for specifying the number of rows to be displayed in the output. This option is available in most SQL dialects, but the syntax can be different. Several SQL dialects (e.g., SQLite, MySQL, and PostgreSQL) use the LIMIT clause to specify the number of rows to be returned. You can use this option to select only the most recent order for each customer. You’ll need to sort the results by order date in descending order then limit the output to only one row: SELECT customers.id, customers.first_name, customers.last_name, orders.order_date, orders.order_status FROM customers JOIN orders ON orders.id = ( SELECT id FROM orders WHERE customer_id = customers.id ORDER BY order_date DESC LIMIT 1 ) ORDER BY customer_id; MS SQL Server doesn’t support the LIMIT clause, but it has another solution to join only the top row in SQL. You can use the TOP 1 clause in MS SQL Server to get only the most recent orders joined with the customers table: SELECT customers.id, customers.first_name, customers.last_name, orders.order_date, orders.order_status FROM customers JOIN orders ON orders.id = ( SELECT TOP 1 id FROM orders WHERE customer_id = customers.id ORDER BY order_date DESC ) ORDER BY customer_id; Like the previous solution, we order the rows by date in descending order to make sure that TOP 1 clause selects the most recent order for each customer. Let’s Practice SQL JOINs! I hope that these solutions have shown you how powerful and flexible SQL can be with various tasks. You can see how SQL JOINs can be used to join only the first row when there is a one-to-many relationship between two tables. There are many more use cases where SQL JOINs can help address non-trivial problems. To review and deepen your knowledge of SQL JOINs, I recommend this interactive course that includes 93 coding challenges. It covers INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN; filtering data with different kinds of JOINs; joining a table with itself; joining tables on non-key columns; and more. If you want to master advanced tools for data analysis with SQL, consider taking the Advanced SQL track that covers Window Functions, GROUP BY Extensions in SQL, and common table expressions (CTEs). Thanks for reading, and happy learning! Tags: JOIN