Back to articles list Articles Cookbook
10 minutes read

Building a Custom 'Recommendation Engine' in 10 Lines of SQL

Recommendation engines don’t always start with AI or machine learning. In many cases, a few well-written SQL queries are enough to uncover products that customers frequently buy together. In this article, you’ll build a simple recommendation engine in about 10 lines of SQL using self-joins and aggregation.

When you hear “recommendation engine,” you probably think of machine learning, Python notebooks, and complex AI models. That association has become so common that we often forget how many recommendations in real systems are built on much simpler foundations. In practice, a large part of e-commerce and marketing recommendations starts with SQL and a clear analytical question.

One of the most common approaches is based on co-occurrence. If two products appear together in the same transaction often enough, that relationship is meaningful from a business perspective. This logic sits behind familiar patterns such as “Customers also bought” sections, product bundles, cross-sell suggestions, and even some marketing segmentation analyses. There is no prediction here in the machine-learning sense—just counting, comparing, and ranking what actually happened.

SQL is particularly well suited for this kind of work. With JOIN and GROUP BY, you can analyze relationships inside transactional data and turn raw purchase history into actionable insights. If you’re comfortable with these concepts, you already have most of what you need to build a basic recommendation engine.

In this article, you’ll build a basic recommendation engine using one idea:
customers who bought this also bought that.

We will produce real recommendations using analytical SQL alone, without Python or AI libraries. This type of thinking is exactly what intermediate and advanced SQL practice on LearnSQL.com is designed to develop.

The Business Question

Let’s make the problem more concrete.

Imagine you work for an online electronics store. One of your best-selling items is wireless headphones. You want to improve the product page by showing customers which other products are most often bought together with these headphones.

The question becomes:

For the product Wireless Headphones, which three other products are most frequently purchased in the same transaction?

This is a simple question, but a very practical one. The answer can be used directly on a product page, in cross-sell widgets, or by a marketing team planning bundles and promotions. For an e-commerce manager or a product analyst, this kind of insight is immediately actionable.

The Data Model

To keep the example realistic, assume you work with a typical e-commerce database. Orders, products, and purchased items are stored separately, which makes analytical queries both flexible and scalable.

We’ll use three tables.

transactions — This table stores information about each completed order.

transaction_id transaction_date customer_id total_amount
1001 2024-05-01 501 249.00
1002 2024-05-01 502 179.00
1003 2024-05-02 501 219.00

Each row represents a single transaction placed by a customer. In real systems, this table often includes payment status, shipping method, or currency, but those details are not relevant for this analysis.

productsThis table stores the product catalog.

product_id product_name category price
101 Wireless Headphones Audio 199.00
102 Phone Case Accessories 29.00
103 USB-C Charger Accessories 49.00
104 Screen Protector Accessories 19.00

This table gives us readable product names and allows further filtering by category or price if needed.

transaction_items — This table links products to transactions and is the core of our analysis.

transaction_id product_id quantity
1001 101 1
1001 102 1
1001 103 1
1002 101 1
1002 104 1
1003 101 1
1003 102 1

Each row represents a product included in a transaction. If a customer buys multiple products in one order, the transaction appears in multiple rows.

This table design is extremely common in production databases and is also the structure used in many LearnSQL.com practice exercises. If you want hands-on practice with exactly this kind of store data—working with orders, products, and transaction items—the Basic SQL Practice: Store course on LearnSQL.com is a natural next step.

The Core Idea: Self-Join

Before applying it to products, let’s look at a simple example of what a self-join actually does. A self-join is a join of a table with itself.

Imagine a table called employees with the following columns: employee_id, employee_name, manager_id. Each employee has a manager, and that manager is also stored in the same table. To display employees together with their managers’ names, you don’t join to a different table—you join the employees table to itself.

When you use the same table twice in a query, SQL requires that each copy has its own alias. Without aliases, SQL wouldn’t know which instance of the table you’re referring to. It’s best to choose aliases that reflect the role each copy plays in the query, not just short technical labels.

In this example:
e represents the employee
m represents the manager

Here’s the query:

SELECT
    e.employee_name,
    m.employee_name AS manager_name
FROM employees e
JOIN employees m
    ON e.manager_id = m.employee_id;

Let’s walk through how this works.

SQL reads the employees table twice. The first time, it treats the rows as employees (e). The second time, it treats the same rows as managers (m). The JOIN condition connects the two roles by matching the employee’s manager_id to the manager’s employee_id. As a result, each employee row is paired with the row that represents their manager.

If self-joins feel unintuitive, this exact pattern is practiced extensively in the SQL JOINs course on LearnSQL.com, including a dedicated section on self-joins.

Applying Self-Join to the Recommendation Problem

To find products bought together, we need to compare products within the same transaction. That means joining the transaction_items table to itself and giving each copy a clear role. One instance represents the product we’re analyzing, and the other represents the other products that appear in the same order. The shared transaction_id is what links them.

Conceptually, this looks like:
– the first instance represents the product we care about
– the second instance represents the other products in the same transaction

This is where a self-join becomes extremely useful. It allows SQL to reason about relationships inside the same dataset, using the same table viewed from two different perspectives.

Before writing the query, let’s outline the logic step by step:

  • Take all products from a transaction
  • Match each product with other products from the same transaction
  • Exclude matching a product with itself
  • Count how often each product pair appears
  • Sort the results by frequency
  • Return the top results

This step-by-step breakdown is exactly how analytical SQL problems should be approached. Syntax comes last. Thinking comes first.

The Query

Here is a complete query that does all of the above in roughly ten lines of SQL:

SELECT
    t1.product_id AS base_product,
    t2.product_id AS recommended_product,
    COUNT(*) AS times_bought_together
FROM transaction_items t1
JOIN transaction_items t2
    ON t1.transaction_id = t2.transaction_id
   AND t1.product_id <> t2.product_id
WHERE t1.product_id = 101
GROUP BY t1.product_id, t2.product_id
ORDER BY times_bought_together DESC
LIMIT 3;

This query finds the three products most frequently bought together with product 101.

Let’s break down what happens here, step by step.

The query starts by reading the transaction_items table twice. The first copy, t1, represents the base product we are analyzing. The second copy, t2, represents the other products that appear in the same transaction. Giving each copy a clear alias makes their roles explicit and keeps the query readable.

The JOIN condition connects rows from both copies using transaction_id. This ensures we are only comparing products that were bought in the same order. The additional condition t1.product_id <> t2.product_id prevents a product from being matched with itself, which would add meaningless pairs to the results.

The WHERE clause narrows the analysis to one specific product. In this case, t1.product_id = 101 means we are asking: when product 101 appears in a transaction, what other products appear alongside it? This is what turns a general co-occurrence query into a concrete recommendation engine for a single product.

Next, the GROUP BY groups the results by product pairs: the base product and the recommended product. For each unique pair, COUNT(*) counts how many times that combination appears across all transactions. This count represents the strength of the relationship between the two products.

The ORDER BY clause sorts these pairs from the most frequent to the least frequent, and the LIMIT clause keeps only the top three results. These are the three products most often bought together with product 101.

The result is simple but powerful: a ranked list of related products based entirely on real purchase behavior. No predictions, no models—just SQL expressing a clear analytical idea. That’s your recommendation engine.

How to Read the Result

The output of the query looks like this:

base_product recommended_product times_bought_together
101 102 2
101 103 1
101 104 1

Each row represents a recommendation for the base product.

In this example, product 101 (Wireless Headphones) was most often bought together with product 102 (Phone Case). The value in times_bought_together shows how many transactions contained both products. The higher the number, the stronger the relationship between the two products.

This result can be used directly in dashboards, reports, or application logic—for example, to power a “Customers also bought” section on a product page or to help a marketing team decide which products to bundle together.

How to Extend This with More SQL

Once you understand this pattern, extending it becomes a matter of asking better questions, not learning new tools. The core logic stays the same; you simply refine which transactions you look at and how you interpret the results.

A common first step is filtering transactions by time. You might want recommendations based only on the last 30 days or the current season, instead of all historical data. Adding a date condition lets you adapt recommendations to recent buying behavior, which is often more relevant for marketing and merchandising decisions.

You can also limit the analysis to specific customer segments. For example, you may want to see which products are commonly bought together by returning customers, high-value customers, or customers from a particular region. This turns a generic recommendation into something more targeted, without changing the overall structure of the query.

Ranking is another natural extension. Instead of relying only on ORDER BY and LIMIT, you can use window functions to rank recommendations per product or per segment. This is especially useful when you want to generate recommendations for many products at once, rather than analyzing one product at a time.

Finally, it’s often worth excluding low-volume noise. Products that were bought together only once or twice may not represent a meaningful relationship. Adding a minimum threshold helps ensure that recommendations are based on consistent patterns rather than random coincidences.

All of these extensions rely on analytical SQL techniques: JOINs, aggregations, filtering, and window functions. These are exactly the skills developed in LearnSQL.com intermediate and advanced courses, where similar patterns appear again and again in realistic business scenarios.

SQL Is Often the First Recommendation Engine

Before reaching for Python or AI tools, it’s worth pausing and asking a simpler question: can SQL already solve this problem? As this example shows, in many real business scenarios the answer is yes. With a clear question and a solid understanding of JOIN and aggregations, SQL is enough to turn raw transactional data into practical recommendations.

This is why strong SQL fundamentals matter. If you can reason about data relationships and express that logic clearly in SQL, you can build solutions that deliver real business value without unnecessary complexity. Developing that kind of confidence takes practice across many realistic scenarios—not just one query.

If you want to go further, the All Forever SQL Plan on LearnSQL.com gives you access to all current SQL courses and every new course added in the future. It’s a way to build and continuously expand the exact skills used in this article: analytical thinking, working with real datasets, and turning business questions into clear, effective SQL queries.