24th Nov 2021 12 minutes read Top 5 SQL CTE Interview Questions Tihomir Babic Common Table Expressions SQL Interview Questions Table of Contents Table for Questions 1, 2, and 3 Interview Question 1: Find the Average Salary by Department Solution Query Solution Query Explanation The Result Table The Result Explanation You can interpret the remaining rows in the result in the same way. Interview Question 2: Find the Highest Salary by Department Solution Query Solution Query Explanation The Result Table The Result Explanation Interview Question 3: Find All Employees Under a Specific Manager Solution Query Solution Query Explanation The Result Table The Result Explanation Tables Used for Questions 4 & 5 Interview Question 4: Find the Average Number of Orders Solution Query Solution Query Explanation The Result Table The Result Explanation Interview Question 5: Find the Number of Consecutive Days With Order Solution Query Solution Query Explanation The Result Table The Result Explanation When Are CTEs Useful? Want to Know More About SQL Common Table Expressions? Here are five questions (and solutions!) on CTEs you’ll (probably) be asked at an interview. I’m not saying you will get these SQL CTE interview questions at every job interview. But when you do, they are probably along the lines of the five I’m about to show here. Other than theoretical questions about CTEs, there are not that many variations in the CTE scenarios evaluated by interviewers. Go through these five examples, and you get a good foundation for acing your interview! For this, you have to know what SQL common table expressions are and how their syntax works. Ready to take a look at the questions? Let’s go! Table for Questions 1, 2, and 3 For the first set of questions, we’ll use the table employees. Here’s what it looks like: idfirst_namelast_namedepartmentsalarymanager_id 1AngelikaVoulesMarketing5,293.742 2RozelleSwynleyMarketing8,295.0818 3WarrenWilleyEngineering9,126.7219 4LynelleWhitenManagement Board10,716.15NULL 5ConsolataRomanLegal8,456.064 6HoebartBaldockResearch and Development4,817.3420 7StarleneWatkissAccounting6,541.484 8BardeRibbensMarketing4,852.872 9LornePhilipsenEngineering7,235.593 10PedroNaldrettResearch and Development5,471.6220 11BrinaDillingerMarketing6,512.172 12VerileSonleyResearch and Development4,574.4120 13NobleGeerlingResearch and Development8,391.1820 14GareyMacAdamAccounting3,829.887 15TheoSorrellEngineering6,441.673 16ErminieGellingResearch and Development8,590.7020 17LoralieKoopAccounting5,248.467 18CalAndreyManagement Board11,258.82NULL 19QuinceyGamellManagement Board11,366.52NULL 20JanithMcGiffieResearch and Development7,428.8319 What does the data tell you? The first row, for example, says Angelika Voules works in Marketing, and her salary is 5,293.74. Her boss has a manager ID of 2; look it up under the column id, and you see Angelika Voules’ boss is Rozelle Swynley. There are three rows with NULL values in the manager_id column: idfirst_namelast_namedepartmentsalarymanager_id 4LynelleWhitenManagement Board10,716.15NULL 18CalAndreyManagement Board11,258.82NULL 19QuinceyGamellManagement Board11,366.52NULL It means these three employees have no managers. This makes sense since all three of them are on the management board. Now, let’s see the interview questions. Interview Question 1: Find the Average Salary by Department Using the table employees, show all the employees, their departments, salaries, and the average salary in their respective department. Order the result by department. Solution Query WITH avg_salary AS ( SELECT AVG(salary) AS average_salary, department FROM employees GROUP BY department) SELECT e.first_name, e.last_name, e.department, e.salary, avgs.average_salary FROM employees e JOIN avg_salary avgs ON e.department = avgs.department ORDER BY department; Solution Query Explanation This query uses a CTE named avg_salary to calculate the average salary by department. The SELECT statement in the CTE groups the rows by department and uses the AVG() aggregate function to calculate the average for each department. Once we have that, we combine it with other columns from the table employees to complete the answer to the interview question. To do so, we join the table employees with the CTE as we would do with any two tables. We select the columns first_name, last_name, department, and salary from the table employees, and the column average_salary from the CTE. For convenience, we use aliases for the table and the CTE. Finally, we order the result by department. The Result Table Here’s what the result looks like: first_namelast_namedepartmentsalaryaverage_salary GareyMacAdamAccounting3,829.885,206.61 LoralieKoopAccounting5,248.465,206.61 StarleneWatkissAccounting6,541.485,206.61 WarrenWilleyEngineering9,126.727,601.33 LornePhilipsenEngineering7,235.597,601.33 TheoSorrellEngineering6,441.677,601.33 ConsolataRomanLegal8,456.068,456.06 CalAndreyManagement Board11,258.8211,113.83 QuinceyGamellManagement Board11,366.5211,113.83 LynelleWhitenManagement Board10,716.1511,113.83 AngelikaVoulesMarketing5,293.746,238.47 RozelleSwynleyMarketing8,295.086,238.47 BardeRibbensMarketing4,852.876,238.47 BrinaDillingerMarketing6,512.176,238.47 NobleGeerlingResearch and Development8,391.186,545.68 ErminieGellingResearch and Development8,590.706,545.68 VerileSonleyResearch and Development4,574.416,545.68 PedroNaldrettResearch and Development5,471.626,545.68 HoebartBaldockResearch and Development4,817.346,545.68 JanithMcGiffieResearch and Development7,428.836,545.68 The Result Explanation This table tells us that, for example, Starlene Watkiss’ salary is 6,541.48, and the average salary in her department (Accounting) is 5,206.61. first_namelast_namedepartmentsalaryaverage_salary StarleneWatkissAccounting6,541.485,206.61 You can interpret the remaining rows in the result in the same way. Interview Question 2: Find the Highest Salary by Department Find the employee with the highest salary in each department. Show their first and last names, salaries, and departments. Solution Query WITH highest_salary AS ( SELECT first_name, last_name, department, salary, RANK () OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank FROM employees) SELECT first_name, last_name, salary, department FROM highest_salary WHERE salary_rank = 1; Solution Query Explanation This time, the CTE is named highest_salary. We use it to rank employees within each department by salary. How? By using the window function RANK(). We partition the data by the column department. This means we rank the salaries only within each department, not across the company overall. The data is sorted according to the column salary in descending order because we want the highest salary in the department to have the rank of 1. We also select several columns from the table employees in the same CTE. We need data from this CTE in the next SELECT statement. And in that SELECT statement, we select the columns needed to answer the interview question. All the columns are from the CTE. We filter the result using a WHERE clause to get only the rows where the salary rank is 1, i.e., the rows with the highest salary. The Result Table first_namelast_namesalarydepartment StarleneWatkiss6,541.48Accounting WarrenWilley9,126.72Engineering ConsolataRoman8,456.06Legal QuinceyGamell11,366.52Management Board RozelleSwynley8,295.08Marketing ErminieGelling8,590.70Research and Development The Result Explanation It isn’t too difficult to interpret the result. Let’s take this employee as an example: first_namelast_namesalarydepartment ConsolataRoman8,456.06Legal The data shown tells us Consolata Roman has the highest salary in the Legal department at 8,456.06. Let’s move on to the third SQL CTE interview question. Interview Question 3: Find All Employees Under a Specific Manager Find all employees working directly or indirectly under the employee whose ID is 18. Solution Query WITH RECURSIVE subordinates AS ( SELECT id, first_name, last_name, manager_id FROM employees WHERE id = 18 UNION SELECT e.id, e.first_name, e.last_name, e.manager_id FROM employees e JOIN subordinates s ON e.manager_id = s.id ) SELECT * FROM subordinates WHERE id != 18; Solution Query Explanation In this example, it is not just a regular CTE – rather, we use a recursive CTE to get the result. The difference in syntax is WITH RECURSIVE instead of just WITH. The CTE is named subordinates, and then comes the SELECT statement just like in non-recursive CTEs. This statement selects certain columns from the table employees but only for the employee whose ID is 18. Then, we use the UNION operator to link the result of this SELECT statement with the result of another SELECT statement. For this, both SELECT statements need to have the same columns. The second SELECT statement in the CTE selects columns where manager_id (from the table employees) is equal to the id (from the CTE). We get all the data for the employee whose ID is 18. Then, we find the direct subordinates, and by recursion, we find the subordinates’ subordinates, until we go all the way down the hierarchy of the organization. Then we get the columns from the CTE subordinates and remove the employee whose ID is 18. That’s it! The Result Table idfirst_namelast_namemanager_id 2RozelleSwynley18 1AngelikaVoules2 8BardeRibbens2 11BrinaDillinger2 The Result Explanation Rozelle Swynley’s direct boss is the employee whose manager’s ID is 18. But Rozelle has subordinates, too. They are Angelika Voules, Barde Ribbens, and Brina Dillinger. We know this because the table shows their manager’s ID is 2, which is Rozelle Swynley’s ID. They are Rozelle Swynley’s direct subordinates; they are also indirect subordinates of the employee whose ID is 18. The employee whose ID is 18 is Cal Andrey, who is not shown in the result because we filter it out in the WHERE clause. Tables Used for Questions 4 & 5 For the last two interview questions, we’ll use the tables customers and orders. The table customers looks like this: idfirst_namelast_name 1SimonPaulson 2DylanBobson 3RebMackennack The orders table is a little bigger: idorder_dateorder_amountcustomer_id 12021-10-0142.123 22021-10-01415.631 32021-10-0284.992 42021-10-0228.963 52021-10-0254.311 62021-10-0374.261 72021-10-0377.772 82021-10-0355.703 92021-10-0416.943 102021-10-0451.441 112021-10-0541.583 122021-10-0695.001 This second table contains data about the orders placed. The column customer_id is a foreign key to the primary key of the first table, so we can identify which customer placed which order. Let’s take the first row as an example: idorder_dateorder_amountcustomer_id 12021-10-0142.123 It’s an order placed on October 1, 2021. The value of the goods ordered is 42.12, and the order was placed by the customer whose ID is 3. From the table customers, you see it is Reb Mackennack. Let’s see the tasks to be solved using these tables. Interview Question 4: Find the Average Number of Orders This interview question asks you to use a SQL common table expression to find the average number of orders per customer. Solution Query WITH orders_count AS ( SELECT customer_id, COUNT(*) AS no_of_orders FROM orders GROUP BY customer_id) SELECT AVG(no_of_orders) AS avg_no_of_orders FROM orders_count; Solution Query Explanation To get the result, you first count the number of orders by customer. We do this using the CTE orders_count and the COUNT() function in it. This function counts the number of rows in the table orders. Since we're interested not in the total count of orders but rather in the number of orders by customer, I group the result by the column customer_id. Now that we have the number of orders, it’s easy to calculate the average number. Simply use the AVG() function in the SELECT statement, and you get the answer to the interview question. The Result Table avg_no_of_orders 4 The Result Explanation A very small table requires a very short explanation. The above table simply shows the average number of orders per customer is four. Interview Question 5: Find the Number of Consecutive Days With Order In this CTE example, you have to calculate how many days in a row each customer places an order. It basically asks you to calculate the length of a contiguous series without gaps. Note that all the orders are within the same month. Solution Query WITH groupings_by_date AS ( SELECT c.id, c.first_name, c.last_name, RANK() OVER (PARTITION BY c.id ORDER BY o.order_date) AS row_number, o.order_date, EXTRACT(DAY FROM o.order_date) - RANK() OVER (PARTITION BY c.id ORDER BY o.order_date) AS date_group FROM customers c JOIN orders o ON c.id = o.customer_id ) SELECT id, first_name, last_name, COUNT(*) AS orders_in_row FROM groupings_by_date GROUP BY id, first_name, last_name, date_group; Solution Query Explanation This rather lengthy query starts with a CTE. It selects certain columns from the tables customers and orders. There’s a RANK() window function as in Question 2. This time, we use this function to allocate a row number to every order within the same customer. For that reason, we partition the data by the column id. The ranking is performed according to the order date (we want the ranking to be sequential). The other function used here is EXTRACT(). Its purpose is to extract the day part from the order date so that we can deduct the row number from it. Why are we doing this? We're simply giving a group of consecutive orders a common date_group. If you run just this CTE, the first two rows of the result look like these: idfirst_namelast_namerow_numberorder_datedate_group 1SimonPaulson12021-10-010 1SimonPaulson22021-10-020 As you can see, Simon Paulson placed orders on both October 1 and October 2, 2021. Since they were placed two days in a row, they belong to the same date_group. How do we get this? It’s the day extracted from the order_date minus the row_number. In the first row, the day from the date ‘2021-10-01’ is 1. The row number is also 1. So, it’s 1-1 = 0, which is also the value in the date_group. The second row is 2-2 = 0, which is the same date_group as above. The specific value of date_group doesn’t really matter! It matters only that the consecutive days have the same date_group value. This is a little trick to calculate the length of a series. It works because, if the orders are placed every day, then the difference between the number of days in the order_date and value in the row_number is always the same for the same customer. Note that this data does not appear anywhere. I’m just showing it to you for an easier explanation of what the CTE does here. It’s important to mention this trick works only if your data is all within the same month. If the series, say, starts on 2021-10-31 and goes to 2021-11-01, the trick does not work; these two days, although they’re one after another, won’t belong to the same date_group. So, you have to understand your data before you decide to use this trick to get the length of a series. Once you have the CTE, use it like a table in the SELECT statement. I’m going to use the COUNT() function to get the number of the orders in a row. I want to see the result for every customer, so I group the data by id, first_name, and last_name. I’m also grouping it by the column date_group, so the orders placed by the same customer get separated if there is a gap between them. The Result Table idfirst_namelast_nameorders_in_row 1SimonPaulson4 3RebMackennack5 2DylanBobson2 1SimonPaulson1 The Result Explanation This table shows Simon Paulson has placed four orders in a row. Reb Mackennack has done that five days in a row, while Dylan Bobson only has two orders in a row. Finally, there’s one additional order by Simon Paulson. When Are CTEs Useful? As you see in these examples, CTEs are very useful when you have to calculate something in at least two steps. You use a CTE for preparing the preliminary calculation, and then you simply use the SELECT statement, reference the CTE, and perform another level of calculation. In more complex calculations, you can also use several CTEs or even nested CTEs. It’s similar to subqueries, but CTEs make code more readable and easier to break down the calculation into steps. And with nested CTEs, you can write a query and immediately reference it like any other table. Some of these examples show you can also use aggregate and window functions. This makes CTEs more powerful. In addition, the CTEs are great for hierarchical data like organizational structures and for traversing a graph. However, you need to write a recursive CTE in these cases as I did in Question 3. Finally, if you want to write recursive queries and use them, the first step is to learn CTEs. If you need help, here’s a guide on how to approach learning CTEs. As you can see, CTEs have plenty of benefits and practical uses. These examples show you only a glimpse into solutions to practical problems that take much longer if it weren’t for CTEs. You want some more? No problem, take a look at some other practical uses of CTEs. Once you learn CTEs and start using them, try to think about the problem first. Break down the calculation steps, and then translate this logic into an SQL common table expression code. CTEs are suitable for this. Since they are a version of temporary tables, be clear when naming them: use a name that immediately tells what the CTE does and be consistent with naming conventions. Generally, you want to follow CTE best practices, which make your (and others!) code reading much easier. Want to Know More About SQL Common Table Expressions? Whether you’re preparing for an SQL job interview, want to be better at your current job, or simply want to learn something new in your free time, take a look at our Recursive Queries course. There, you find a systematic approach to CTEs, explaining the theory behind them, showing you the syntax, and giving you some more examples to practice. The whole course is part of the Advanced SQL track, which covers two additional topics: window functions (which we’ve used in this article) and the GROUP BY extensions. There are many ways in which you can practice these advanced SQL topics on our platform. Good luck! Tags: Common Table Expressions SQL Interview Questions