16th Apr 2024 19 minutes read

25 SQL Interview Questions for Data Analysts

Table of Contents

How to Prepare for a Data Analyst Interview
Basic SQL Interview Questions
Intermediate SQL Questions
Advanced SQL Questions
More Help with SQL Interview Questions for Data Analysts

You have an upcoming interview for a data analyst position and SQL is one of the skills that will be tested. Well, these SQL questions for data analysts are specifically tailored for you!

SQL is a must-have skill in every data analyst's toolkit. It simplifies data extraction and manipulation, allowing analysts to convert raw data into actionable insights. It enables analysts to perform ad hoc analyses, accelerating organizational decision-making processes. It also encourages collaboration between data analysts and database administrators, resulting in smoother data operations.

If you're not sure where to begin your SQL journey, look no further than our SQL for Data Analysis course. This comprehensive course covers all aspects of data analysis, including data retrieval and report generation.

However, you’re probably here to prepare for your data analyst interview. So, without further ado, let’s jump right in.

How to Prepare for a Data Analyst Interview

Setting yourself up for a successful data analyst interview requires a strategic combination of technical expertise and focused preparation. Here are some practical pointers and suggestions to help you stand out:

Review fundamental SQL concepts. Ensure you have a thorough understanding of fundamental SQL concepts – i.e. querying, filtering, joining, and aggregating data. Consider enrolling in our SQL Basics course, which is designed to reinforce your knowledge of core SQL.
Learn standard SQL functions. Improve your understanding of standard SQL functions, including numeric, text, and date/time functions. Learn how to effectively handle NULL values and use SQL functions to write more effective queries.
Practice SQL reporting. Begin with basic reports using the GROUP BY, ORDER BY, and HAVING Then you can progress to intermediate and advanced reports that use subqueries, window functions, and common table expressions (CTEs).
Take a skill assessment. Consider taking an SQL assessment to evaluate your level of proficiency. Our SQL Skills Assessment includes 18 interactive online tasks that cover six SQL competency areas.
Utilize learning resources. Use online courses and tutorials to improve your SQL. Our SQL courses include comprehensive learning materials and hands-on exercises designed to help you improve your data analysis and SQL queries. Alternatively, if you prefer books, check out our curated list of the Best SQL Books for Data Analysis.

Basic SQL Interview Questions

Now that we've discussed the importance of learning SQL for data analysis, let's look at some actual SQL interview questions tailored to different skill levels – basic, intermediate, and advanced. Keep our SQL for Data Analysis Cheat Sheet handy to help you with the challenges ahead.

Question 1: What is SQL?

SQL (Structured Query Language) is a standard programming language for managing and communicating with relational databases. It enables users to query data; insert, update, and delete records; create and modify database schemas, and manage access permissions.

Question 2: List the basic elements of an SQL query.

An SQL query typically contains a:

SELECT statement that retrieves data from one or more tables.
FROM clause that specifies the table(s) where the data resides.
WHERE clause (optional) that filters results based on specified criteria.
GROUP BY clause (optional) that groups results by column(s).
HAVING clause (optional) that filters grouped results.
ORDER BY clause (optional) that sorts the results in ascending (or descending) order.

Question 3: How do you select all the data from a table?

To retrieve all the data from a table, use:

SELECT * 
FROM employees;

This query fetches all rows of all columns from the employees table:

SELECT *: Selects all columns from the specified table; the asterisk (*) represents all columns in a table.
FROM employees: Specifies the table from which data will be retrieved.

Question 4: How do you select certain columns from a table?

To retrieve specific columns from a table, you must list the desired column names in the SELECT statement:

SELECT first_name, last_name, department 
FROM employees;

This query retrieves data only from the specified columns, providing a focused view of relevant information from the employees table.

SELECT first_name, last_name, department: This returns only the data in the first_name, last_name, and department
FROM employees: Specifies the table from which data will be retrieved.

We can also name selected columns using AS:

SELECT first_name, 
last_name AS surname, 
department 
FROM employees;

In the query results, the last_name column will be shown as surname. You can use this to differentiate between two columns with similar names or to add more context to a column. For example, renaming the number column to part_id_number allows readers to quickly identify what kind of data is stored in that column.

Question 5: Explain the syntax of the WHERE clause.

The WHERE clause filters rows to meet specific conditions. Here is an example of its syntax:

SELECT first_name, last_name
FROM employees
WHERE department = 'Sales';

In this query, WHERE limits the results to data for employees in the Sales department.

SELECT first_name, last_name: Specifies which columns to retrieve data from.
FROM employees: Specifies the table from which data will be retrieved.
WHERE department = 'Sales': Limits the results to the employees in the Sales department.

WHERE can be applied to different data types. When filtering by numbers or dates, you can use direct comparisons or operators like <,<=,>,>=, or =.

SELECT first_name, last_name
FROM employees
WHERE salary > 50000;

When filtering NULL values, the IS NULL or IS NOT NULL operators can be used:

SELECT * 
FROM employees 
WHERE manager_id IS NULL;

Question 6: How do you filter query results based on multiple conditions?

To filter results based on multiple conditions, logical operators such as AND, OR, and NOT are used in the WHERE clause.

The AND operator returns rows that meet all of the specified conditions.
The OR operator returns rows that meet at least one of the specified conditions.
The NOT operator returns rows that do not meet the condition(s).

Consider the following example:

SELECT first_name, last_name
FROM employees
WHERE department = 'Sales' AND salary > 50000;

This query will only return employees who work in sales and whose salary is greater than $50,000. If the employee works in any other department or works in sales and earns less than $50,000, they won’t be shown in the results.

SELECT first_name, last_name: Determines which columns to retrieve data from.
FROM employees: Specifies the table from which data will be retrieved.
WHERE department = 'Sales' AND salary > 50000: Filters the results to include only employees from the Sales department with a higher salary.

You can combine multiple conditions using parentheses:

SELECT first_name, last_name
FROM employees
WHERE (department = 'Sales' AND salary > 50000) 
  OR manager_id IS NULL;

In this query, only employees who don’t report to a manager (i.e. whose manager ID is NULL) or who work in sales and earn $50k+ will be included in the results.

Question 7: How do you filter results for a given text value?

To filter results based on a specific value in a text column, use the LIKE operator. You’ll also need the pattern matching symbols % (matches zero or more characters) and/or _ (matches any single character). Consider the following example:

SELECT product_name, description
FROM products
WHERE description LIKE '%apple%';

This query demonstrates using LIKE with pattern matching:

SELECT product_name, description: specifies which columns to retrieve data from.
FROM products: Specifies the table from which data will be retrieved.
WHERE description IS LIKE '%apple%': Filters the results to only show products that have the word "apple" in their description.

Another example could be:

SELECT * 
FROM employees 
WHERE first_name LIKE 'Mar_a';

Imagine you are looking for all employees named María or Maria. This query will return all employees whose first names begin with 'Mar', followed by any character in place of 'i', then an 'a'. This pattern can be used to match both 'Maria' and 'María' in a database where the name may not have the accent on the 'i'. (It would also return ‘Marta’, ‘Marla’, ‘Marja’, and any other names that fit the pattern.)

Question 8: How do you update rows in a database?

To update existing rows in a database table, use the UPDATE statement:

UPDATE employees
SET salary = 60000, department = 'Marketing'
WHERE employee_id = 101;

This query demonstrates how to update existing columns with new values for rows that satisfy the specified condition.

UPDATE employees: Specifies which table will be updated.
SET salary = 60000, department = 'Marketing': Specifies the columns and their new (updated) values.
WHERE employee_id = 101: Defines which rows will be updated – in this case, the row where the employee ID is 101. You can also use AND, OR, NOT, number filtering, etc. to specify more complex update conditions.

Question 9: How do you remove rows from a database?

To remove rows from a database table, use the DELETE FROM statement, followed by the table name and the condition indicating which rows to delete:

DELETE FROM employees
WHERE department = 'Human Resources';

This query shows how to delete rows from a database table based on a specified condition:

DELETE FROM employees: Indicates the table from which rows will be deleted.
WHERE department = 'Human Resources': Specifies the condition that determines which rows should be deleted – in this case, all employees in the Human Resources department. You can specify deletion conditions using standard WHERE

Question 10: How do you sort results from a SQL query?

Use the ORDER BY clause, followed by the column name(s) by which you want to sort the results:

SELECT product_name, price
FROM products
ORDER BY price DESC;

This query demonstrates how to sort results based on a specified column, allowing data to be presented in the desired order.

SELECT product_name, price: Determines which columns to retrieve data from.
FROM products: Defines the table where this data resides.
ORDER BY price DESC: Sorts the results (from highest to lowest, aka descending order) by the price SQL’s default sort order is ascending, with text values sorted from A to Z and numerical values sorted from lowest to highest. If DESC is not specified, the sorting will be done in ascending order.

Intermediate SQL Questions

Question 11: How do you combine data from multiple tables in one query?

SQL queries use JOIN clauses to combine results from multiple tables. JOINs allow you to retrieve data from related tables (i.e. tables with one or more columns in common) based on specific criteria. Here’s an example of JOIN:

SELECT orders.order_id, customers.customer_name
FROM orders
JOIN customers 
ON orders.customer_id = customers.customer_id;

This query retrieves the order ID and customer name from the orders and customers tables. It joins the information where records in both tables contain a matching customer ID. For more information on how JOINs work, see An Illustrated Guide to the SQL INNER JOIN. You can find more examples of JOIN interview questions in The Top 10 SQL JOIN Interview Questions and How to Answer Them.

Question 12: Describe the JOIN types.

JOIN types include several variations, each serving a specific purpose in combining results from multiple tables:

INNER JOIN: Retrieves only records where both tables have matching values based on the specified condition.
LEFT JOIN: Returns all records from the left table and any matching records from the right table. If no match is found, NULL values are returned for the right table’s columns.
RIGHT JOIN: Returns all records from the right table and any matching records from the left table. If no match is found, NULL values are returned for the left table's columns.
FULL JOIN: Returns all records when there is a match in either the left or right tables. If no match is found, the non-matching rows are returned with NULLs.
CROSS JOIN: Returns the Cartesian product of the two tables – i.e. all possible combinations of rows.

Although a self-JOIN uses the same syntax as other JOINs, it actually involves joining a table to itself. This allows for comparisons between rows within the same table.

Question 13: Describe LEFT JOIN and the difference between LEFT JOIN and INNER JOIN.

LEFT JOIN is a type of OUTER JOIN that returns all rows from the left table plus any matching rows from the right table. If there is no match, NULL values are returned for the columns in the right table.

The main distinction between LEFT JOIN and INNER JOIN is their behavior with unmatched rows:

LEFT JOIN returns all rows from the left table, regardless of whether a match exists in the right table. If there is no match in the right table, NULL values are assigned to those columns.
INNER JOIN returns only matching rows from both tables. Unmatched rows from either table are removed from the results set.

Imagine you have a table of students’ names and another containing their grades. Using a LEFT JOIN, you'd get a list of all students – including those who have NULL grades. In contrast, an INNER JOIN would only show students who have grades, excluding records with NULL grades.

Question 14: What does GROUP BY do?

The GROUP BY clause groups rows with the same values into summary rows. It is frequently combined with aggregate functions (such as SUM(), COUNT(), and AVG()) to perform calculations on grouped data.

SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department;

In this example, the query groups the rows based on the values in the department column and counts the employee_id values for each department. For more examples of similar queries, see our article Top 9 SQL GROUP BY Interview Questions.

Question 15: What are aggregate functions?

SQL aggregate functions calculate a set of values and return a single value. These functions are frequently used to summarize data and perform calculations on multiple rows.

The most common SQL aggregate functions are:

SUM(): Returns the sum of the numeric values in a set.
COUNT(): Counts the number of rows in a set, optionally including non-null values.
AVG(): Determines the average of numeric values in a set.
MIN(): Returns the set's minimum value.
MAX(): Returns the set's maximum value.

SELECT 
    AVG(salary) AS avg_salary,
    COUNT(*) AS total_employees,
    MAX(age) AS max_age
FROM 
    employees;

In this example, aggregate functions compute the average salary, count the total number of employees, and determine the greatest age among all employees in the employees table. Check out our SQL Aggregate Functions Cheat Sheet for more examples of these functions.

Question 16: What does HAVING do?

The HAVING clause filters the results of the GROUP BY clause. It is only used with aggregated data, allowing you to apply filtering criteria to grouped results.

Imagine you have a table of orders and you want to find the total sales amount for each customer. However, you only want to include customers with a total of $1,000 or more. This is the query you’d use:

SELECT customer_id, SUM(amount) AS total_sales
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 1000;

Here, the HAVING clause filters the groups (customers) based on the condition that the customer’s total sales amount exceeds $1,000. It is applied following the GROUP BY operation and allows you to filter the grouped results.

Question 17: What is the difference between WHERE and HAVING?

HAVING filters groups of rows and is applied after the GROUP BY statement. It is intended specifically for aggregated data and allows you to apply filtering criteria to grouped results. WHERE filters individual rows and is applied before GROUP BY.

Imagine you have a table of orders and want to calculate the total sales amount for each product, but only for products whose total sales exceed $10,000. Here's how you’d use the HAVING clause:

SELECT 
    product_id,
    SUM(quantity_sold * price_per_unit) AS total_sales
FROM 
    sales
GROUP BY 
    product_id
HAVING 
    SUM(quantity_sold * price_per_unit) > 10000;

Now consider the same scenario, but this time you only want need sales where the product’s price per unit is greater than $10 and the total sale is over $10,000. Some of this information is found in individual rows, so the rows must be filtered before grouping. We also have to filter for total sales, so the groups will be filtered too.

SELECT 
    product_id,
    SUM(quantity_sold * price_per_unit) AS total_sales
FROM 
    sales
WHERE
    price_per_unit > 10
GROUP BY 
    product_id
HAVING 
    SUM(quantity_sold * price_per_unit) > 10000;

This demonstrates how the WHERE clause filters individual rows based on conditions, whereas the HAVING clause filters groups of rows based on aggregated data. For a deeper dive into this subject, check out our article HAVING vs. WHERE in SQL: What You Should Know.

Question 18: Give an example of an SQL subquery.

A SQL subquery, also known as a nested query or inner query, is a query contained within another query. It is used in SELECT, INSERT, UPDATE, and DELETE statements to execute operations based on the results of another query. Consider the following example:

SELECT product_name
FROM products
WHERE category_id IN (SELECT category_id FROM categories WHERE category_name = 'Electronics');

In this query, the WHERE clause uses a subquery to retrieve category_id values from the categories table with the category_name 'Electronics'. The outer query then retrieves product names for electronics products based on the subquery results.

Advanced SQL Questions

Question 19: What is the difference between `UNION` and `UNION ALL`?

UNION removes duplicate rows from the combined result set. In contrast, UNION ALL keeps all rows, including duplicates.

Let's imagine we have two tables, employees and managers, both of which contain the column employee_id. We want to combine the employee IDs from the two tables.

SELECT employee_id FROM employees
UNION
SELECT employee_id FROM managers;

In this query, if employee_id appears in both employees and managers, UNION will remove the duplicate, but UNION ALL will keep both occurrences. You can read more about this in UNION vs. UNION ALL in SQL: What’s the Difference?

Question 20: What are SQL window functions and what are they used for?

SQL window functions perform calculations on a subset of table rows that are related to the current row. They are used on a window, which is a collection of rows related to the current row.

Window functions differ from GROUP BY queries because GROUP BY aggregates data based on specific columns, collapsing multiple rows into summary rows. In contrast, window functions perform calculations across a set of rows related to the current row within the query result, without collapsing the result set. You can read more in SQL Window Functions vs. GROUP BY: What’s the Difference?

To apply window functions, the OVER clause is used with the following syntax:

<function>(column) OVER (PARTITION BY partition_column ORDER BY order_column)

Where:

<function> indicates the window function being used (e.g., AVG(), SUM(), RANK()).
(column): The column or expression to which the window function is applied.
PARTITION BY partition_column: An optional clause that splits the result set into subsets based on one or more columns. The function is then applied to each partition individually.
ORDER BY order_column: An optional clause specifying the order of rows within each partition. This determines which rows appear in the window frame for each calculation.

Window functions are frequently used to compute aggregate values such as averages, totals, and counts over a set of rows. They are also used to compare values to other rows in the result set; this aids in determining rankings, row numberings, and maximum and minimum values within a window. To find out more, read When Do I Use SQL Window Functions?

Imagine we have a table called employees with the columns employee_id, department_id, and salary. We can calculate the average salary per department using the AVG() window function:

SELECT 
    employee_id, 
    department_id, 
    salary,
    AVG(salary) OVER (PARTITION BY department_id) AS avg_salary_per_department
FROM 
    employees;

SQL window functions are great for calculating aggregates over specific subsets of data – as shown in this example of calculating average salaries within departments. They are invaluable for data ranking, which can be useful in tasks like performance evaluation. These functions make it easier to perform analytical comparisons and identify trends, such as in quarterly revenue. We'll look at more examples in the following questions.

Question 21: How do you rank rows in SQL?

To rank rows in SQL, use the window functions RANK(), DENSE_RANK(), PERCENT_RANK() and ROW_NUMBER(). These functions assign a rank or row number to each row in the result set based on predefined criteria:

RANK(): Assigns a unique rank to each distinct row. Rows with the same value receive the same rank, but there are gaps in the ranking (e.g. 1,2,3,3,5).
DENSE_RANK(): Also assigns a unique rank to each row and gives tied rows the same rank, but it eliminates gaps in the ranking sequence. Instead, it assigns the next consecutive rank to the row following the tied rows (e.g. 1,2,3,3,4).
PERCENT_RANK() returns the ranking for each row in the partition/result set as a percentile (from 0 to 1).
ROW_NUMBER() assigns a unique row number to each row in the result set, regardless of duplicates or tie values.

Read our Overview of Ranking Functions in SQL for more differences between these functions.

Imagine we have a table called employees with the columns employee_id, employee_name, and salary. We can rank the employees based on their salary in descending order using the RANK() window function:

SELECT 
    employee_id, 
    employee_name, 
    salary,
    RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM 
    employees;

This query returns employee IDs, names, and salaries from the employees table. It then ranks each employee according to their salary, with higher salaries receiving lower ranks. The result includes the original columns as well as a new column, salary_rank, which indicates each employee's salary rank.

Question 22: What is a CTE? When do you use it?

A CTE, or common table expression, is a temporary named result set that can be used in a SELECT, INSERT, UPDATE, or DELETE statement. It allows you to divide complex queries into manageable and reusable chunks.

A CTE, introduced with the WITH keyword, allows you to create a temporary table-like structure that exists only during query execution. The temporary table can then be referred to in the same query. CTEs are especially useful for breaking down complex computations into multiple steps; this improves code organization and enables code reuse.

Assume we have a table called employees with the columns employee_id, employee_name, and salary. We can create a CTE to select high-paid employees who earn more than $100,000:

WITH high_paid_employees AS (
  SELECT 
      employee_id, 
      employee_name, 
      salary
  FROM 
      employees
  WHERE 
      salary > 100000
)
SELECT 
    * 
FROM 
    high_paid_employees;

Question 23: What is a running total and how do you compute it in SQL?

A running total, also known as a cumulative sum, is the sum of a value over a set of rows in a particular order. In SQL, a running total can be calculated using SUM() and the OVER() clause.

The following query computes the running total of order_amount over all rows in the orders table sorted by order_date:

SELECT 
    order_date, 
    order_amount,
    SUM(order_amount) OVER (ORDER BY order_date) AS running_total
FROM 
    orders;

Question 24: What is a moving average and how do you compute it in SQL?

A moving average represents the average of a set of values over a set of times, where the window "moves" along the data set. In SQL, you can compute a moving average using AVG() with OVER() and specifying a window frame.

The following query calculates the moving average of value for the current row and the two preceding rows, ordered by date.

SELECT 
    date, 
    value,
    AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_average
FROM 
    dataset;

Question 25: How do you compute the year-to-year difference in SQL?

To calculate the year-to-year difference in SQL, use functions like LAG() to compare values between consecutive rows.

The following query calculates the year-to-year difference in revenue by subtracting the current year's revenue from the previous year's revenue:

SELECT 
    year, 
    revenue,
    revenue - LAG(revenue, 1) OVER (ORDER BY year) AS year_to_year_difference
FROM 
    yearly_revenue;