7th Nov 2024 10 minutes read

SQL Pivot Tables: A Step-by-Step Tutorial

Data Analysis

Table of Contents

How SQL Pivot Tables Work in Data Analysis
The Ingredients of an SQL Pivot Table
Native Pivot Tables
Creating SQL Pivots Without Native PIVOT() Tables
A Final Word on SQL Pivot Tables

Are you a data analyst looking to better your abilities? Or perhaps you want to discover what tools you need to succeed in this role? Today, we're going to look at one of the most important tools for data analysts: the SQL pivot table.

An SQL pivot table is an essential tool for quickly arranging and summarizing data. In just a few lines of code, it helps analysts rapidly and effectively identify patterns and trends in data. This is achieved by pivoting or rotating the table's rows and columns and applying an aggregated calculation to the underlying data.

If you're new to creating SQL reports, check out our Creating Basic SQL Reports course. You will learn how to write complex queries in SQL. Among the many functions taught in this course is the CASE WHEN syntax, which is essential for building pivot tables.

Before we jump into pivot tables, make sure to keep our SQL for Data Analysis Cheat Sheet close. You can use it as a refresher on some of the functions we'll be discussing.

How SQL Pivot Tables Work in Data Analysis

SQL pivot tables are a useful tool for swiftly reorganizing, summarizing, and analyzing vast amounts of information. You may have already worked with tools like Microsoft Excel and Google Sheets; these offer built-in pivot table functionality, allowing you to easily perform such transformations.

Let's take an example. Imagine you're provided with a table that displays an artist's music streaming data across several music platforms and countries.

Table:

country	platform	streams
France	Spotify	1000
Ireland	Apple Music	800
Germany	SoundCloud	500
Ireland	Spotify	1000
Germany	SoundCloud	600

How this data is laid out makes comparison difficult, especially when the number of countries and platforms increases.

Now imagine if we could structure the data so that each country's streams are broken down by platform and totaled collectively. Then we would be able to answer questions like “Which platform is most popular in each country?” and “Which platform and country should we focus more effort on?” This new structure could look like this:

Result:

platform	France	Ireland	Germany
Spotify	1000	1000	-
Apple Music	-	800	-
SoundCloud	-	-	1100

The newly formatted table makes it much easier to compare the artist's streams across platforms and countries. You can now see at a glance which platform works best in each country – Spotify in France and Ireland, and SoundCloud in Germany. The new format enables faster analysis and insights, displaying geographical and platform-specific performance without having to manually dig through rows of data.

The Ingredients of an SQL Pivot Table

Let's first break down the building blocks of the pivot table by defining them with an accompanying example. We’ll use the same table as we did in the previous section to illustrate:

Row Identifier: This is the first ingredient; it is a column selected from your dataset to represent the pivot table's rows (i.e. the individual records). In the above example, we used the column platform as our row identifier, so each row will hold the details of one platform.
Column Identifier: The second ingredient is a column that is selected to be the pivot table’s vertical portion. In our example, we used the column country as the column identifier; the countries will be listed horizontally, like headers across the top of the result table.
Aggregation: Now that we have the vertical and horizontal part of our table, the aggregation is the ingredient that determines how the underlying data in the row and column identifier interact. Aggregation functions such as SUM(), AVG(), and MAX() are commonly used. In our example, we used SUM() to calculate the total streams for each platform in each of the countries.
Pivot Table: The pivot table is the final dish. It summarizes the inputted data by pivoting rows into columns with an aggregation function. By effectively summarizing the data, the pivot table helps us visualize what we are analyzing in a more structured form. In the example above, we see that the final output from our inputs displays the streams per each platform across the countries listed in our original table.

Native Pivot Tables

Depending on the database you are using, it may have a native pivot table. Essentially this just means that there’s a built-in function that the database uses to create SQL pivot tables. SQL Server is a great example of a database that offers a PIVOT() function.

Let's build a pivot table using the SQL Server PIVOT() function to summarize streaming data. Here’s the data we’ll use stored in a table called streams_table:

country	platform	streams
France	Spotify	1000
Ireland	Apple Music	800
Germany	SoundCloud	500
Ireland	Spotify	1000
Germany	SoundCloud	600

Query:

SELECT platform, France, Ireland, Germany
FROM
(
    SELECT platform, country, streams
    FROM streams_table
) AS SourceTable
PIVOT
(
    SUM(streams)
    FOR country IN (France, Ireland, Germany)
) AS PivotTable;

Let's break down this query. We’ll start with the subquery.

1. Define the Source Table

SELECT platform, country, streams
FROM streams_table

Lines 4–6 (immediately after the open parenthesis) define the SourceTable through a subquery. The source table has three columns stated in the SELECT statement (platform, country and streams) that come from the initial streams_table.

2. Reshaping the Data Using Pivot Operations

SUM(streams)
FOR country IN (France, Ireland, Germany)

Next – inside the PIVOT() function parentheses – we define our PivotTable by applying the SUM() aggregate function on the column streams. This sums the number of streams for each platform. Then we specify which column to pivot on: country.

Note that the country values are given without quotation marks. Note also that in SQL Server, column names that contain spaces or special characters or that begin with a number must be enclosed in square brackets. If we had a column containing years, it would look like this:

SUM(streams)
FOR stream_year IN ([2022], [2023])

3. Selecting the Final Columns

SELECT platform, France, Ireland, Germany

Finally, we head back to the start of the query and select the columns for our output. This includes the column platform and the newly formed columns for each country.

Result:

platform	France	Ireland	Germany
Spotify	1000	1000	-
Apple Music	-	800	-
SoundCloud	-	-	1100

Not all SQL languages have built-in PIVOT() functions. When this function isn’t available in your database, there is another straightforward technique we can apply to get around it. Let's explore it.

Creating SQL Pivots Without Native PIVOT() Tables

As previously mentioned, not all databases—including well-known ones like MySQL and PostgreSQL—have native pivot capabilities. You may remember that we touched on the CASE WHEN syntax in our introduction. By using this syntax, we can obtain the exact same result!

Since MySQL and PostgreSQL are widely used databases, the CASE WHEN technique is a widely used method for creating pivot tables. Let's take a look at some examples:

Example 1: Pivoting Streaming Data

We'll use the same example to demonstrate how the CASE WHEN approach can produce the same outcomes:

Query:

SELECT platform,
    SUM(CASE WHEN country = 'France' THEN streams END) AS france_streams,
    SUM(CASE WHEN country = 'Ireland' THEN streams END) AS ireland_streams,
    SUM(CASE WHEN country = 'Germany' THEN streams END) AS germany_streams
FROM streams_table
GROUP BY platform;

The resulting SQL pivot tables show total streams by platform for the countries listed.

Row Identifier: Since we select the column platform, each platform appears as a column in the output. You'll also notice we also include platform in our GROUP BY to separate our platforms into groups.
Column Identifier: To specify the countries as the vertical columns, we are using the structure CASE WHEN country = 'France'. If the stream is from France, the CASE WHEN statement returns the streams It is then summed with the SUM() function. If the stream is not from France, then the CASE WHEN returns NULL and the values are not counted in that sum and the query moves on to the next CASE WHEN. See our article on How to Use CASE WHEN in GROUP BY for a more detailed explanation.
Aggregation: Just like before, we use SUM() as the aggregating function. Our case statement's logic states that streams are accumulated if the country is the same as the given country; if not, nothing is accumulated.

Let's use the same table for demonstration, but reverse the row and column identifiers and see what happens. T

Query:

SELECT country,
    SUM(CASE WHEN platform = 'Apple Music' THEN streams END) AS apple_music_streams,
    SUM(CASE WHEN platform = 'SoundCloud' THEN streams END) AS soundcloud_streams,
    SUM(CASE WHEN platform = 'Spotify' THEN streams END) AS spotify_streams
FROM streams_table
GROUP BY country;

Result:

country	apple_music_streams	soundcloud_streams	spotify_streams
France	NULL	NULL	1000
Ireland	800	NULL	1000
Germany	NULL	1100	NULL

As you can see, the result of the aggregation remains the same. The only change is how the data is shown:

Row Identifier: This time, we use country as the row identifier, so each country appears as a horizontal row in the result. We also include country in the GROUP BY clause to ensure that the results are properly grouped by country, resulting in separate rows for each country's total streaming data.
Column Identifier: We modified our column identification to platform, so the platforms appear as vertical columns in the output. This is accomplished with the SUM(CASE WHEN platform =...)
Aggregation: Similar to the last query, the numerical outputs are the same because we used the same aggregate method (SUM()).

For more examples of the CASE WHEN with SUM() aggregation, check out our article How to Use CASE WHEN with SUM() in SQL.

Example 2: Pivoting Customer Data

Next, let's see how pivot tables can assist us in discovering target market segments.

The customer_table describes customer sales data with four columns: customer, age_category, country and purchases:

customer	age_category	country	purchases
Rachel	18-24	France	60
Harry	35-44	Spain	75
John	25-34	Italy	120
Fred	35-44	Spain	105
Mary	35-44	Italy	40

Query:

SELECT age_category,
    SUM(CASE WHEN country = 'France' THEN purchases END) AS France,
    SUM(CASE WHEN country = 'Spain' THEN purchases END) AS Spain,
    SUM(CASE WHEN country = 'Italy' THEN purchases END) AS Italy
FROM customer_table
GROUP BY age_category;

Result:

age_category	France	Spain	Italy
18-24	60	NULL	NULL
25-34	NULL	NULL	120
35-44	NULL	180	40

The resulting SQL pivot tables show total purchases by age category in France, Spain, and Italy where:

The row identifier is the column age_category.
The column identifier is country, specifying France, Spain and Italy.
The aggregate function is SUM().

Example 3: Pivoting Operations Data

Now, let's see an example of how using SQL pivot tables can uncover potential bottlenecks or faults in operations.

The table below, cake_baking_data, describes bakery data using four columns: order_number, stage, order_day and mins_taken:

Table:

order_number	stage	order_day	mins_taken
101	Mixing	Monday	10
101	Baking	Monday	30
101	Decorating	Monday	27
102	Mixing	Monday	15
102	Baking	Monday	32
102	Decorating	Monday	25
103	Mixing	Friday	12
103	Baking	Friday	30
103	Decorating	Friday	29

Suppose we want to find the average time taken by each task on each day. Here’s the query we’d use.

Query:

SELECT 
    order_day,
    AVG(CASE WHEN stage = 'Mixing' THEN mins_taken END) AS avg_mixing_time,
    AVG(CASE WHEN stage = 'Baking' THEN mins_taken END) AS avg_baking_time,
    AVG(CASE WHEN stage = 'Decorating' THEN mins_taken END) AS avg_decorating_time
FROM cake_baking_data
GROUP BY order_day;

Result:

order_day	avg_mixing_time	avg_baking_time	avg_decorating_time
Monday	12.5	31.0	26.0
Friday	12.0	30.0	29.0

The resulting pivot table shows the average times by order day for mixing, baking, and decorating where:

The row identifier is the column order_day.
The column identifier is stage, specifying mixing, baking, or decorating.
The aggregate function is AVG().

As you can see, the CASE WHEN workaround allows you to stay within the popular databases MySQL and PostgreSQL and still achieve the same SQL pivot tables.

A Final Word on SQL Pivot Tables

To summarize, it is clear why SQL pivot tables are so important. Being able to pivot data is an important tool in your data analysis toolbox. The ability to quickly and efficiently transform and summarize datasets lets you discover trends and make smarter judgments.

In this article, we’ve mainly focused on pivoting for data analysts. If a data analysis career interests you, I recommend checking out our article 25 SQL Interview Questions for Data Analysts. It will help you understand what you’d need to know to be successful in this field.

The best way to improve your SQL is to practice, practice, practice! We recommend our course Creating Basic SQL Reports for this; you'll not only be able to practice what we've learned today, you’ll also learn how to build meaningful reports. So check it out and keep building your SQL pivot table skills!

Tags:

Data Analysis

How SQL Pivot Tables Work in Data Analysis

The Ingredients of an SQL Pivot Table

Native Pivot Tables

1. Define the Source Table

2. Reshaping the Data Using Pivot Operations

3. Selecting the Final Columns

Creating SQL Pivots Without Native PIVOT() Tables

Example 1: Pivoting Streaming Data

Example 2: Pivoting Customer Data

Example 3: Pivoting Operations Data

A Final Word on SQL Pivot Tables

You may also like