27th Feb 2024 10 minutes read

6 Examples of NTILE() Function in SQL

Table of Contents

What Are SQL Window Functions?
Example Dataset: A Soccer Store
Example #1: Dividing Rows into 2 Groups
Example #2: Distribution of 2023 Sales
Example #3: Monthly Sales for Each Category and Team
Example #4: Teams’ Lowest and Highest Sales
Example #5: Christmas Sales Behavior by Soccer Team
Example #6: Obtain Evenly Distributed Groups of Customers
Continue Learning About NTILE() and Other SQL Window Functions

The SQL NTILE() function can greatly simplify your data analysis and reporting. Follow these six examples to learn what it is and when to use it.

Window functions are very useful, providing great expressive power to SQL. In this article, we discuss the NTILE() function, which allows us to divide a set of records into subsets of approximately equal size. This function is widely used in financial or economic calculations.

Before we get into the NTILE() function, let’s quickly review window functions in SQL.

What Are SQL Window Functions?

Window functions are functions that work on a group of rows called a window; they return a value based on that group. The NTILE(N) function receives an integer parameter (N) and divides the complete set of rows into N subsets. Each subset has approximately the same number of rows and is identified by a number between 1 and N. This ID number is what NTILE() returns.

If you want to learn SQL window functions, I recommend our interactive course. It contains 218 hands-on exercises to help you practice different window functions. At the end, you'll feel comfortable using this advanced SQL technique.

If you want to see window functions in action, check out our article SQL Window Function Example with Explanations. And if you want to explore the differences between GROUP BY and window functions, read SQL Window Functions vs. GROUP BY.

Example Dataset: A Soccer Store

In the rest of the article, we will base our examples on the database of a fictional soccer store. We’ll focus on the sales table, which has the following structure and data:

customer_id	sale_date	amount	product_id	product_category	soccer team
114	2024-01-27	20.00	1083	Accessory	River
130	2023-12-18	150.00	1002	Shirt	Barcelona
119	2023-12-01	15.00	1002	Accessory	Barcelona
107	2023-12-23	145.00	1011	Shirt	Manchester
104	2023-12-12	10.00	1003	Accessory	PSG
135	2023-12-24	185.00	1002	Shirt	Barcelona
123	2023-12-24	135.00	1012	Shirt	Barcelona
113	2023-12-24	110.00	1022	Shirt	Barcelona

I think everything in this table is self-explanatory, so let’s move on to the examples.

Example #1: Dividing Rows into 2 Groups

We will start with a very simple example. We want to divide the rows in the sales table into two groups: group #1 and group #2. Then the following query does that:

SELECT NTILE(2) OVER() AS group, 
       sale_date, 
       product_id,
 soccer_team 
FROM sales;

In the query, the NTILE(2) OVER() expression returns 1 for the first 50% of the rows in the result set and 2 for the second 50% of the rows. The rows are assigned to each group in a non-deterministic way – i.e. there aren’t any criteria to assign rows to any particular group. Here’s a partial result of the query, showing each group in a different color:

group	sale_date	product_id	soccer_team
1	2024-01-12	1083	River Plate
1	2023-12-18	1002	Barcelona
…
1	2023-12-01	1002	Barcelona
1	2023-12-23	1011	Manchester
2	2023-12-12	1003	PSG
2	2023-12-24	1002	Barcelona
…
2	2023-12-24	1012	Barcelona
2	2023-12-24	1022	Barcelona

Example #2: Distribution of 2023 Sales

The table sales stores records for each completed sale. The marketing department is interested in analyzing the sales distribution based on the amount spent. They asked for a report grouping all the sales of the year 2023 into four groups of the same size (the number of sales in each group must be the same). Each sale must be assigned based on the sale amount.

The first group (sale_group #1) should have the sales with the lowest amount, and the last group (sale_group #4) should have sales with the highest amount. For each sale, the report should include the sale group number, the customer_id, the product_id, and the soccer_team. The query to obtain this result is the following:

  SELECT
      NTILE(4) OVER ( ORDER BY amount ) AS sale_group,
      product_id,
      product_category,
      soccer_team,
      amount as sales_amount
  FROM sales
  WHERE sale_date >= '2023-12-01' AND sale_date <= '2023-12-31';

This query uses a WHERE clause to filter for sales that occurred in 2023. Then the NTILE(4) function tries to create four groups of rows with the same number of rows each. We use “tries” because it isn’t always possible to create groups of the same size; some groups may have one row less than the others.

How do we define which group each row belongs to? The OVER (ORDER BY amount) clause indicates that, before rows are assigned groups, all rows must be ordered based on the sale amount. Once sorted, the NTILE(4) function will take the first quarter of the sales and assign them the value 1, then take the next quarter of the sales and assign them the value 2, and so on. Below is a partial result showing records in each sale_group:

sale_group	product_id	product_category	soccer_team	sale_amount
1	1003	Accessory	PSG	10.00
1	1002	Accessory	Barcelona	15.00
…
2	1083	Accessory	River	20.00
2	1022	Shirt	Barcelona	110.00
…
3	1012	Shirt	Barcelona	135.00
3	1011	Shirt	Manchester	145.00
…
4	1002	Shirt	Barcelona	150.00
4	1002	Shirt	Barcelona	185.00
…

We can see that in the cheapest group (1), we only have sales from the Accessories category. This is because accessory products are usually cheapest, while shirts are usually more expensive. We can also see that products for Barcelona are in all four sale groups, which suggests that this team has an offer at all price points.

Example #3: Monthly Sales for Each Category and Team

In the next query, we will create a report on monthly sales. The marketing department wants to divide monthly sales for each product category and team into four groups. The first group will have the product categories, soccer teams, and months with the lowest total sales. The next group will have the next level of total_sales, and so on. This way, the marketers will be able to analyze the distribution of sales in different months and categories. Below is the query:

WITH monthly_sales_stats AS (
   SELECT
        EXTRACT(MONTH FROM sale_date) as month,
        product_category,
        soccer_team,
        SUM(amount) AS total_sales
   FROM sales
   WHERE sale_date >= '2023-01-01' and sale_date <= '2023-12-31'
   GROUP BY EXTRACT(MONTH FROM sale_date), product_category, soccer_team
)
SELECT
     NTILE(4) OVER ( ORDER BY total_sales ) AS group,
     month,
     product_category,
     soccer_team,
     total_sales
FROM monthly_sales_stats
ORDER BY group DESC, total_sales DESC;

The query has a common table expression (CTE) called monthly_sales_stats, which calculates the total sales for each combination of month, product_category, and soccer_team.

Note that we filter the rows of the year 2023 in the WHERE clause. The expression EXTRACT(MONTH FROM sale_date) is used to obtain the month value in the field sale_date.

In the main query, we call NTILE(4) OVER (ORDER BY total_sales) to assign to each row of the CTE a group number. Below is a partial result of the query showing the first three rows of each group:

group	month	product_category	soccer_team	total_sales
4	12	Shirt	Barcelona	1158.00
4	9	Shirt	Real Madrid	755.00
4	12	Shirt	Manchester	433.00
…
3	4	Shirt	Real Madrid	225.00
3	12	Shirt	River	220.00
3	3	Shirt	Barcelona	210.00
…
2	2	Shirt	Barcelona	115.00
2	2	Shirt	Real Madrid	105.00
2	6	Shirt	River	100.00
…
1	11	Accessory	Barcelona	30.00
1	6	Accessory	Real Madrid	30.00
1	9	Accessory	Barcelona	25.00
…

In the result, you can see the highest sales for each soccer team are in December, probably because of Christmas shopping. The category “Accessory” is at the end of the result table, because accessories are usually lower in price.

Example #4: Teams’ Lowest and Highest Sales

Like other window functions, you can use NTILE() with the PARTITION BY clause. Here’s an example.

The marketing team wants to investigate how sales are distributed within articles for each soccer team. The idea is to split each team’s sales into sets based on amount. Once again, we will put the lowest sales in the first set and then work up to the highest sales in the fourth set.

This is the query:

SELECT	soccer_team,
      	NTILE(4) OVER (PARTITION BY soccer_team ORDER BY amount) AS group_number,
product_id, 
product_category,
amount 
FROM sales ;

The query is very simple; it has a SELECT with a list of columns and a FROM with a table name. The NTILE() function uses PARTITION BY and ORDER BY. The PARTITION BY soccer_team puts all the rows with the same value in soccer_team in the same partition. The ORDER BY amount orders the records in each team’s set, putting the ones with the lowest values first. Then NTILE(4) returns 1 for the first 25% of the rows in the group of rows, 2 for the second 25% of the rows in the group, and so on.

Below are some partial results:

soccer_team	group_number	product_id	product_category	amount
Barcelona	1	1028	Accessory	10.00
Barcelona	1	1027	Accessory	15.00
Barcelona	1	1002	Accessory	15.00
Barcelona	1	1025	Accessory	20.00
…				…
Barcelona	2	1022	Shirt	100.00
Barcelona	2	1023	Shirt	110.00
Barcelona	2	1024	Shirt	115.00
Barcelona	2	1023	Shirt	115.00
…
Barcelona	3	1035	Shirt	115.00
Barcelona	3	1032	Shirt	120.00
Barcelona	3	1036	Shirt	120.00
Barcelona	3	1026	Shirt	128.00
…
Barcelona	4	1002	Shirt	150.00
Barcelona	4	1004	Shirt	155.00
Barcelona	4	1012	Shirt	170.00
Barcelona	4	1013	Shirt	185.00
…
Manchester	1	1028	Accessory	20.00
Manchester	1	1025	Accessory	20.00
Manchester	1	1024	Accessory	25.00
…
Manchester	2	1022	Shirt	105.00
Manchester	2	1032	Shirt	110.00
Manchester	2	1035	Shirt	110.00
…
Manchester	3	1024	Shirt	115.00
Manchester	3	1022	Shirt	115.00
Manchester	3	1023	Shirt	118.00
…
Manchester	4	1033	Shirt	120.00
Manchester	4	1011	Shirt	145.00
Manchester	4	1012	Shirt	178.00
…

Example #5: Christmas Sales Behavior by Soccer Team

This example is very similar to Example 3, with the difference that the NTILE() function uses a PARTITION BY soccer_team subclause. This means that NTILE() will create groups of rows for each soccer_team instead of creating groups from the complete result set (as in Example 3). As a result, each soccer team will have four sets.

The query is:

WITH monthly_sales_stats AS (
       SELECT
            EXTRACT(MONTH FROM sale_date) as month,
            product_category,
            soccer_team,
            SUM(amount) AS total_sales
      FROM sales
      WHERE sale_date >= '2023-01-01' and sale_date <= '2023-12-31'
      GROUP BY EXTRACT(MONTH FROM sale_date), product_category, soccer_team
)
  SELECT
     NTILE(4) OVER(PARTITION BY soccer_team ORDER BY total_sales DESC) AS group,
     month,
     product_category,
     soccer_team,
     total_sales
  FROM monthly_sales_stats
  ORDER BY total_sales DESC;

The CTE monthly_sales_stats is exactly the same as in the previous example. It has a GROUP BY clause that helps calculate the total amount of sales for each combination of month, product_category, and soccer_team.

After creating the CTE, we write a SELECT with the following NTILE() expression:

NTILE(4) OVER(PARTITION BY soccer_team ORDER BY total_sales DESC) AS group

This expression splits the CTE rows into sets that have the same value in the soccer_team field. For each soccer team set, NTILE(4) tries to create four equal subsets, assigning each row a value from 1 to 4.

The rest of the columns in the result come from the CTE. Below are partial results showing two rows for each group for the Barcelona and Manchester teams.

group	month	product_category	soccer_team	total_sales
1	12	Shirt	Barcelona	1158.00
1	6	Shirt	Barcelona	360.00
…
2	3	Shirt	Barcelona	340.00
2	7	Shirt	Barcelona	225.00
…
3	10	Shirt	Barcelona	115.00
3	1	Shirt	Barcelona	115.00
…				115.00
4	9	Accessory	Barcelona	25.00
4	10	Accessory	Barcelona	20.00
…
1	12	Shirt	Manchester	433.00
1	6	Shirt	Manchester	340.00
…
2	4	Shirt	Manchester	210.00
2	9	Shirt	Manchester	155.00
…
3	5	Shirt	Manchester	120.00
3	9	Shirt	Manchester	115.00
…
4	3	Accessory	Manchester	30.00
4	11	Accessory	Manchester	30.00
…

Example #6: Obtain Evenly Distributed Groups of Customers

Let’s suppose the marketing department wants to create three evenly distributed groups of customers to run three different marketing campaigns; each campaign will be targeted to one group. The customers in each group are chosen randomly. Then the marketing department will compare campaign results and evaluate which campaign is better.

To simulate a random selection of customers, one idea is to use the seconds of the last time each customer bought something. And ordered by the seconds, we will create three groups of customers. Let’s see the query:

WITH customer_last_transaction_timestamp AS (
  SELECT customer_id, 
       max(sales_date) AS last_ts
  FROM   sales
  GROUP BY customer_id
)
SELECT NTILE(3) OVER (ORDER BY EXTRACT(SECOND FROM last_ts)) AS group_number, 
       customer_id 
FROM   customer_last_transaction_timestamp 
ORDER BY group_number;

The previous query returns all the customers with a group_number from 1 to 3; this represents the marketing group to which the customer was assigned. The CTE customer_last_transaction_timestamp stores every customer with their last transaction timestamp (obtained with MAX(sales_date)).

The main query uses the NTILE(3) function to create three customer groups of approximately the same size:

NTILE(3) OVER (ORDER BY EXTRACT(SECOND FROM last_ts))

The above statement returns 1 for the 33% of the customers with a timestamp value in the lower third. It returns 2 for the next 33 percent, and so on. The subexpression EXTRACT(SECOND FROM last_ts) takes the seconds part (i.e. 22) from a timestamp (ie. ‘2023-03-30 10:30:22’). Below is a partial result:

group_number	customer_id
1	111
1	135
2	123
2	154
3	108
3	104

Continue Learning About NTILE() and Other SQL Window Functions

In this article, we showed several examples of using the NTILE() window function. We also demonstrated different OVER clauses. The OVER clause is common to all SQL’s window functions. If you want to get some hands-on experience with these commands, I suggest our interactive Window Functions course.

If you are a frequent user of window functions our free SQL Window Functions Cheat Sheet is a very useful resource. In fact, I have it on the wall of my office, ready to use when I have doubts about syntax. I strongly recommend it.

Finally, the article Top 10 SQL Window Functions Interview Questions is great if you have a job interview and want to be prepared for SQL topics.

Happy learning, and keep progressing with SQL window functions!

Tags: