Back to articles list Articles Cookbook
10 minutes read

6 Examples of NTILE() Function in SQL

The SQL NTILE() function can greatly simplify your data analysis and reporting. Follow these six examples to learn what it is and when to use it.

Window functions are very useful, providing great expressive power to SQL. In this article, we discuss the NTILE() function, which allows us to divide a set of records into subsets of approximately equal size. This function is widely used in financial or economic calculations.

Before we get into the NTILE() function, let’s quickly review window functions in SQL.

What Are SQL Window Functions?

Window functions are functions that work on a group of rows called a window; they return a value based on that group. The NTILE(N) function receives an integer parameter (N) and divides the complete set of rows into N subsets. Each subset has approximately the same number of rows and is identified by a number between 1 and N. This ID number is what NTILE() returns.

If you want to learn SQL window functions, I recommend our interactive course. It contains 218 hands-on exercises to help you practice different window functions. At the end, you'll feel comfortable using this advanced SQL technique.

If you want to see window functions in action, check out our article SQL Window Function Example with Explanations. And if you want to explore the differences between GROUP BY  and window functions, read SQL Window Functions vs. GROUP BY.

Example Dataset: A Soccer Store

In the rest of the article, we will base our examples on the database of a fictional soccer store. We’ll focus on the sales table, which has the following structure and data:

customer_idsale_dateamountproduct_idproduct_categorysoccer team
1142024-01-2720.001083AccessoryRiver
1302023-12-18150.001002ShirtBarcelona
1192023-12-0115.001002AccessoryBarcelona
1072023-12-23145.001011ShirtManchester
1042023-12-1210.001003AccessoryPSG
1352023-12-24185.001002ShirtBarcelona
1232023-12-24135.001012ShirtBarcelona
1132023-12-24110.001022ShirtBarcelona

I think everything in this table is self-explanatory, so let’s move on to the examples.

Example #1: Dividing Rows into 2 Groups

We will start with a very simple example. We want to divide the rows in the sales table into two groups: group #1 and group #2. Then the following query does that:

SELECT NTILE(2) OVER() AS group, 
       sale_date, 
       product_id,
 soccer_team 
FROM sales;

In the query, the NTILE(2) OVER() expression returns 1 for the first 50% of the rows in the result set and 2 for the second 50% of the rows. The rows are assigned to each group in a non-deterministic way – i.e. there aren’t any criteria to assign rows to any particular group. Here’s a partial result of the query, showing each group in a different color:

groupsale_dateproduct_idsoccer_team
12024-01-121083River Plate
12023-12-181002Barcelona
12023-12-011002Barcelona
12023-12-231011Manchester
22023-12-121003PSG
22023-12-241002Barcelona
22023-12-241012Barcelona
22023-12-241022Barcelona

Example #2: Distribution of 2023 Sales

The table sales stores records for each completed sale. The marketing department is interested in analyzing the sales distribution based on the amount spent. They asked for a report grouping all the sales of the year 2023 into four groups of the same size (the number of sales in each group must be the same). Each sale must be assigned based on the sale amount.

The first group (sale_group #1) should have the sales with the lowest amount, and the last group (sale_group #4) should have sales with the highest amount. For each sale, the report should include the sale group number, the customer_id, the product_id, and the soccer_team. The query to obtain this result is the following:

  SELECT
      NTILE(4) OVER ( ORDER BY amount ) AS sale_group,
      product_id,
      product_category,
      soccer_team,
      amount as sales_amount
  FROM sales
  WHERE sale_date >= '2023-12-01' AND sale_date <= '2023-12-31';

This query uses a WHERE clause to filter for sales that occurred in 2023. Then the NTILE(4) function tries to create four groups of rows with the same number of rows each. We use “tries” because it isn’t always possible to create groups of the same size; some groups may have one row less than the others.

How do we define which group each row belongs to? The OVER (ORDER BY amount) clause indicates that, before rows are assigned groups, all rows must be ordered based on the sale amount. Once sorted, the NTILE(4) function will take the first quarter of the sales and assign them the value 1, then take the next quarter of the sales and assign them the value 2, and so on. Below is a partial result showing records in each sale_group:

sale_groupproduct_idproduct_categorysoccer_teamsale_amount
11003AccessoryPSG10.00
11002AccessoryBarcelona15.00
21083AccessoryRiver20.00
21022ShirtBarcelona110.00
31012ShirtBarcelona135.00
31011ShirtManchester145.00
41002ShirtBarcelona150.00
41002ShirtBarcelona185.00

We can see that in the cheapest group (1), we only have sales from the Accessories category. This is because accessory products are usually cheapest, while shirts are usually more expensive. We can also see that products for Barcelona are in all four sale groups, which suggests that this team has an offer at all price points.

Example #3: Monthly Sales for Each Category and Team

In the next query, we will create a report on monthly sales. The marketing department wants to divide monthly sales for each product category and team into four groups. The first group will have the product categories, soccer teams, and months with the lowest total sales. The next group will have the next level of total_sales, and so on. This way, the marketers will be able to analyze the distribution of sales in different months and categories.  Below is the query:

WITH monthly_sales_stats AS (
   SELECT
        EXTRACT(MONTH FROM sale_date) as month,
        product_category,
        soccer_team,
        SUM(amount) AS total_sales
   FROM sales
   WHERE sale_date >= '2023-01-01' and sale_date <= '2023-12-31'
   GROUP BY EXTRACT(MONTH FROM sale_date), product_category, soccer_team
)
SELECT
     NTILE(4) OVER ( ORDER BY total_sales ) AS group,
     month,
     product_category,
     soccer_team,
     total_sales
FROM monthly_sales_stats
ORDER BY group DESC, total_sales DESC;

The query has a common table expression (CTE) called monthly_sales_stats, which calculates the total sales for each combination of month, product_category, and soccer_team.

Note that we filter the rows of the year 2023 in the WHERE clause. The expression EXTRACT(MONTH FROM sale_date) is used to obtain the month value in the field sale_date.

In the main query, we call NTILE(4) OVER (ORDER BY total_sales) to assign to each row of the CTE a group number. Below is a partial result of the query showing the first three rows of each group:

groupmonthproduct_categorysoccer_teamtotal_sales
412ShirtBarcelona1158.00
49ShirtReal Madrid755.00
412ShirtManchester433.00
34ShirtReal Madrid225.00
312ShirtRiver220.00
33ShirtBarcelona210.00
22ShirtBarcelona115.00
22ShirtReal Madrid105.00
26ShirtRiver100.00
111AccessoryBarcelona30.00
16AccessoryReal Madrid30.00
19AccessoryBarcelona25.00

In the result, you can see the highest sales for each soccer team are in December, probably because of Christmas shopping. The category “Accessory” is at the end of the result table, because accessories are usually lower in price.

Example #4: Teams’ Lowest and Highest Sales

Like other window functions, you can use NTILE() with the PARTITION BY clause. Here’s an example.

The marketing team wants to investigate how sales are distributed within articles for each soccer team. The idea is to split each team’s sales into sets based on amount. Once again, we will put the lowest sales in the first set and then work up to the highest sales in the fourth set.

This is the query:

SELECT	soccer_team,
      	NTILE(4) OVER (PARTITION BY soccer_team ORDER BY amount) AS group_number,
product_id, 
product_category,
amount 
FROM sales ; 

The query is very simple; it has a SELECT with a list of columns and a FROM with a table name. The NTILE() function uses PARTITION BY and ORDER BY. The PARTITION BY soccer_team puts all the rows with the same value in soccer_team in the same partition. The ORDER BY amount orders the records in each team’s set, putting the ones with the lowest values first. Then NTILE(4) returns 1 for the first 25% of the rows in the group of rows, 2 for the second 25% of the rows in the group, and so on.

Below are some partial results:

soccer_teamgroup_numberproduct_idproduct_categoryamount
Barcelona11028Accessory10.00
Barcelona11027Accessory15.00
Barcelona11002Accessory15.00
Barcelona11025Accessory20.00
Barcelona21022Shirt100.00
Barcelona21023Shirt110.00
Barcelona21024Shirt115.00
Barcelona21023Shirt115.00
Barcelona31035Shirt115.00
Barcelona31032Shirt120.00
Barcelona31036Shirt120.00
Barcelona31026Shirt128.00
Barcelona41002Shirt150.00
Barcelona41004Shirt155.00
Barcelona41012Shirt170.00
Barcelona41013Shirt185.00
Manchester11028Accessory20.00
Manchester11025Accessory20.00
Manchester11024Accessory25.00
Manchester21022Shirt105.00
Manchester21032Shirt110.00
Manchester21035Shirt110.00
Manchester31024Shirt115.00
Manchester31022Shirt115.00
Manchester31023Shirt118.00
Manchester41033Shirt120.00
Manchester41011Shirt145.00
Manchester41012Shirt178.00

Example #5: Christmas Sales Behavior by Soccer Team

This example is very similar to Example 3, with the difference that the NTILE() function uses a PARTITION BY soccer_team subclause. This means that  NTILE() will create groups of rows for each soccer_team instead of creating groups from the complete result set (as in Example 3). As a result, each soccer team will have four sets.

The query is:

WITH monthly_sales_stats AS (
       SELECT
            EXTRACT(MONTH FROM sale_date) as month,
            product_category,
            soccer_team,
            SUM(amount) AS total_sales
      FROM sales
      WHERE sale_date >= '2023-01-01' and sale_date <= '2023-12-31'
      GROUP BY EXTRACT(MONTH FROM sale_date), product_category, soccer_team
)
  SELECT
     NTILE(4) OVER(PARTITION BY soccer_team ORDER BY total_sales DESC) AS group,
     month,
     product_category,
     soccer_team,
     total_sales
  FROM monthly_sales_stats
  ORDER BY total_sales DESC;

The CTE monthly_sales_stats is exactly the same as in the previous example. It has a GROUP BY clause that helps calculate the total amount of sales for each combination of month, product_category, and soccer_team.

After creating the CTE, we write a SELECT with the following NTILE() expression:

NTILE(4) OVER(PARTITION BY soccer_team ORDER BY total_sales DESC) AS group

This expression splits the CTE rows into sets that have the same value in the soccer_team field. For each soccer team set, NTILE(4) tries to create four equal subsets, assigning each row a value from 1 to 4.

The rest of the columns in the result come from the CTE. Below are partial results showing two rows for each group for the Barcelona and Manchester teams.

groupmonthproduct_categorysoccer_teamtotal_sales
112ShirtBarcelona1158.00
16ShirtBarcelona360.00
23ShirtBarcelona340.00
27ShirtBarcelona225.00
310ShirtBarcelona115.00
31ShirtBarcelona115.00
115.00
49AccessoryBarcelona25.00
410AccessoryBarcelona20.00
112ShirtManchester433.00
16ShirtManchester340.00
24ShirtManchester210.00
29ShirtManchester155.00
35ShirtManchester120.00
39ShirtManchester115.00
43AccessoryManchester30.00
411AccessoryManchester30.00

Example #6: Obtain Evenly Distributed Groups of Customers

Let’s suppose the marketing department wants to create three evenly distributed groups of customers to run three different marketing campaigns; each campaign will be targeted to one group. The customers in each group are chosen randomly. Then the marketing department will compare campaign results and evaluate which campaign is better.

To simulate a random selection of customers, one idea is to use the seconds of the last time each customer bought something. And ordered by the seconds, we will create three groups of customers. Let’s see the query:

WITH customer_last_transaction_timestamp AS (
  SELECT customer_id, 
       max(sales_date) AS last_ts
  FROM   sales
  GROUP BY customer_id
)
SELECT NTILE(3) OVER (ORDER BY EXTRACT(SECOND FROM last_ts)) AS group_number, 
       customer_id 
FROM   customer_last_transaction_timestamp 
ORDER BY group_number;

The previous query returns all the customers with a group_number from 1 to 3; this represents the marketing group to which the customer was assigned. The CTE customer_last_transaction_timestamp stores every customer with their last transaction timestamp (obtained with MAX(sales_date)).

The main query uses the NTILE(3) function to create three customer groups of approximately the same size:

NTILE(3) OVER (ORDER BY EXTRACT(SECOND FROM last_ts))

The above statement returns 1 for the 33% of the customers with a timestamp value in the lower third. It returns 2 for the next 33 percent, and so on. The subexpression EXTRACT(SECOND FROM last_ts) takes the seconds part (i.e. 22) from a timestamp (ie. ‘2023-03-30 10:30:22’). Below is a partial result:

group_numbercustomer_id
1111
1135
2123
2154
3108
3104

Continue Learning About NTILE() and Other SQL Window Functions

In this article, we showed several examples of using the NTILE() window function. We also demonstrated different OVER clauses. The OVER clause is common to all SQL’s window functions. If you want to get some hands-on experience with these commands, I suggest our interactive Window Functions course.

If you are a frequent user of window functions our free  SQL Window Functions Cheat Sheet is a very useful resource. In fact, I have it on the wall of my office, ready to use when I have doubts about syntax. I strongly recommend it.

Finally, the article Top 10 SQL Window Functions Interview Questions is great if you have a job interview and want to be prepared for SQL topics.

Happy learning, and keep progressing with SQL window functions!