Back to articles list May 19, 2016 - 6 minutes read Pivot Tables in PostgreSQL Using the Crosstab Function Maria Alcaraz Former Freelance Database Developer, Mother of 4 children Tags: PostgreSQL SQL Some years ago, when PostgreSQL version 8.3 was released, a new extension called tablefunc was introduced. This extension provides a really interesting set of functions. One of them is the crosstab function, which is used for pivot table creation. That's what we'll cover in this article. The simplest way to explain how this function works is using an example with a pivot table. First, we will explain our initial point from a practical perspective, then we'll define the pivot table we want. Our Initial Point: Raw Data As you read this article, imagine yourself as a teacher at a primary (elementary) school. We will assume that you teach every subject (language, music, etc). The school provides a system for you to record all evaluation or test results. The following SQL statement would show the evaluation results you've previously loaded into the system: SELECT * FROM evaluations StudentSubjectEvaluation_resultEvaluation_day Smith, JohnMusic7.02016-03-01 Smith, JohnMaths4.02016-03-01 Smith, JohnHistory9.02016-03-22 Smith, JohnLanguage7.02016-03-15 Smith, JohnGeography9.02016-03-04 Gabriel, PeterMusic2.02016-03-01 Gabriel, PeterMaths10.02016-03-01 Gabriel, PeterHistory7.02016-03-22 Gabriel, PeterLanguage4.02016-03-15 Gabriel, PeterGeography10.02016-03-04 Our Target: a Pivot Table The following grid could easily keep track of your students' progress. In computer science, we call this kind of grid a pivot table. If you analyze how the pivot table is built, you will find that we use values from raw data as column headers or field names (in this case, geography, history, maths, etc.) StudentGeographyHistoryLanguageMathsMusic Gabriel, Peter10.07.04.010.02.0 Smith, John9.09.07.04.07.0 Enabling the Crosstab Function As we previously mentioned, the crosstab function is part of a PostgreSQL extension called tablefunc. To call the crosstab function, you must first enable the tablefunc extension by executing the following SQL command: CREATE extension tablefunc; How the Crosstab Function Works The crosstab function receives an SQL SELECT command as a parameter, which must be compliant with the following restrictions: The SELECT must return 3 columns. The first column in the SELECT will be the identifier of every row in the pivot table or final result. In our example, this is the student's name. Notice how students' names (John Smith and Peter Gabriel) appear in the first column. The second column in the SELECT represents the categories in the pivot table. In our example, these categories are the school subjects. It is important to note that the values of this column will expand into many columns in the pivot table. If the second column returns five different values (geography, history, and so on) the pivot table will have five columns. The third column in the SELECT represents the value to be assigned to each cell of the pivot table. These are the evaluation results in our example. If we think of our pivot table as a two-dimensional array, then the first SELECT column is the first array dimension, the second SELECT column is the second dimension, and the third is the array element value.like grid [first_column_value, second_column_value] = third_column_value. In our example, the SELECT parameter will be: SELECT student, subject, evaluation_result FROM evaluations ORDER BY 1,2 The crosstab function is invoked in the SELECT statement's FROM clause. We must define the names of the columns and data types that will go into the final result. For our purposes, the final result is defined as: AS final_result(Student TEXT, Geography NUMERIC,History NUMERIC,Language NUMERIC,Maths NUMERIC,Music NUMERIC) Joining all these pieces, our final query will be: SELECT * FROM crosstab( 'select student, subject, evaluation_result from evaluations order by 1,2') AS final_result(Student TEXT, Geography NUMERIC,History NUMERIC,Language NUMERIC,Maths NUMERIC,Music NUMERIC); And we can see the result in here: StudentGeographyHistoryLanguageMathsMusic Gabriel, Peter10.07.04.010.02.0 Smith, John9.09.07.04.07.0 One Raw Data Set, Many Pivot Tables From a single data set, we can produce many different pivot tables. Let's continue with the teacher-and-class example as we look at a few of our options. Example 1: Monthly Evaluation Averages As teachers, we may also need a report for a student's evaluation results for the year to date. For instance, suppose we want to obtain the average evaluations for John Smith from March to July. In a grid like the following, the table would look like this: month textgeography numerichistory numericlanguage numericmaths numericmusic numeric 39.009.007.004.007.00 44.007.507.004.005.66 58.006.007.007.007.00 67.507.007.007.008.00 76.669.007.7510.006.00 The SQL for this pivot table is: SELECT * FROM crosstab( 'select extract(month from period)::text, subject.name, trunc(avg(evaluation_result),2) from evaluation, subject where evaluation.subject_id = subject.subject_id and student_id = 1 group by 1,2 order by 1,2') AS final_result(Month TEXT, Geography NUMERIC,History NUMERIC,Language NUMERIC,Maths NUMERIC,Music NUMERIC); Please compare this output with the previous table: MonthGeographyHistoryLanguageMathsMusic 39.009.007.004.007.00 44.007.507.004.005.66 58.006.007.007.007.00 67.507.007.007.008.00 76.669.007.7510.006.00 Example 2: Finding Incomplete Student Records We could also call this section 'One Limitation of Crosstab and How to Fix It'. Before we get into that, let's set the scene: Suppose you want to see if some students don't have an evaluation score for certain subjects. Perhaps you will try our previous query, adding a WHERE clause for July. The code would look like this: SELECT * FROM crosstab( 'select student, subject, evaluation_result from evaluations where extract (month from evaluation_day) = 7 order by 1,2') AS final_result(Student TEXT, Geography NUMERIC,History NUMERIC,Language NUMERIC,Maths NUMERIC,Music NUMERIC); The following pivot table is the result of this query. We can quickly see that we don't have any grades for language, maths, and music for Peter. StudentGeographyHistoryLanguageMathsMusic Gabriel, Peter10.06.07.00 Smith, John6.08.06.09.04.0 However, if we try a regular query to obtain Peter's grades in July ... SELECT * from evaluations where extract ( month from evaluation_day)=7 and student like 'Gabriel, Peter' ... we get different results. Here we have grades for geography and language: studentsubjectevaluation_resultevaluation_day Gabriel, PeterLanguage6.02016-07-15 Gabriel, PeterGeography10.02016-07-04 Of course, the second query is the correct one because it is showing raw data. The problem is in the pivot table building process – some categories are missing information. To fix this, we can use the crosstab function with a second parameter, which represents the complete list of categories. If there are missing values, the pivot table will still be built correctly. (I've highlighted the second parameter query in red.) SELECT * FROM crosstab( 'select student, subject, evaluation_result from evaluations where extract (month from evaluation_day) = 7 order by 1,2', 'select name from subject order by 1') AS final_result(Student TEXT, Geography NUMERIC,History NUMERIC,Language NUMERIC,Maths NUMERIC,Music NUMERIC); Now we have a correct pivot table with empty values in the proper places. StudentGeographyHistoryLanguageMathsMusic Gabriel, Peter10.0--6.0–-- Smith, John6.08.06.09.04.0 Exercise Pivot tables give us a different way to see our data. Moreover, we can create different pivot tables based on the same raw data by using the crosstab function. Try building a pivot table that shows the max temperature for each city and month based on the raw data in the table below. CREATE TABLE weather (city text, when timestamp, temperature float); CityWhenTemperature Miami2016-01-01 08:00:0068.6 Miami2016-01-21 08:00:0073.3 Orlando2016-01-01 08:00:0072.5 Miami2016-02-01 18:00:0058.6 Orlando2016-02-02 18:00:0062.5 Miami2016-03-03 08:00:0055.6 Orlando2016-03-03 08:00:0056.7 Miami2016-04-04 18:00:0050.6 Orlando2016-04-04 18:00:0061.5 The pivot table should have one row for each city and one column for each month. If you like, you can think of other pivot tables that could be made using the same data. Tags: PostgreSQL SQL You may also like Useful SQL Patterns: Pivoting The concept of the pivot in SQL refers to taking the data in table rows and making that data into columns. It’s one of the most useful SQL patterns. Read more SQL Window Functions Cheat Sheet This 2-page SQL Window Functions Cheat Sheet covers the syntax of window functions and a list of window functions. Download it in PDF or PNG format. Read more Converting Subqueries to Joins No more badly performed queries! Find out how you can convert SQL subqueries to joins and improve your query efficiency. Read more SQL Subqueries The article describes what a subquery is and what these useful statements look like. We will cover basic examples with the IN, EXISTS, ANY, and ALL operators, look at subqueries in FROM and WHERE clauses, and explore the difference between correlated and nested subqueries. Read more SQL Basics Cheat Sheet This 2-page SQL Basics Cheat Sheet will be a great value for beginners as well as for professionals. Download it in PDF or PNG format. Read more SQL Window Functions vs. SQL Aggregate Functions: Similarities and Differences What’s the difference between SQL window functions vs. SQL aggregate functions? We summarize their similarities and differences and explain when to use each one. Read more Subscribe to our newsletter Join our weekly newsletter to be notified about the latest posts.