Back to articles list Articles Cookbook
13 minutes read

Charting with SQL: A Guide For Beginners

SQL, often known as Structured Query Language, is a valuable tool for data analysts. It enables you to query, update, and organize your data. This is a critical precondition for making charts. In this article, we’ll explore how to prepare data with SQL for use in data visualizations.

Raw data is only valuable if you know how to clean and organize it. Thus, properly preparing data for charts is an important step in making captivating and informative graphs. In this article, we’ll discuss how to use SQL to clean data that you’ll use in a chart.

Charts are visual representations of data that include bar graphs, line graphs, pie charts, box plots, and many others. Their goal is to convert complex data into clear, actionable insights. This is especially important because data analysts must be able to communicate their insights to the rest of their team and to other stakeholders.

While SQL excels in data cleaning and preparation, it lacks a built-in charting functionality. However there are plenty of external resources for doing this, including Tableau, Looker Studio, Power BI, or Python libraries. That being said, the quality of your graphs is only as good as the data used to create the graphs themselves. If the data is not properly prepared, your graphs will not be able to express any meaningful message.

As a result, SQL remains extremely important. And what better time to begin studying than now, with our SQL for Data Analysis course? This course will teach you all the key skills for preparing data for graphs.

In this article we will show you how to use SQL to prepare your data for graphing using external tools. Examples and SQL queries will also be provided for you to practice along. So, with that said, let's get started!

Getting Ready to Chart with SQL

Data preparation is a vital first step for creating engaging data visualizations. Using SQL, we can efficiently organize, aggregate, and filter data with just a few lines of code. Let's take a look at some examples of charts you can make. (P.S. Here's our SQL for Data Analysis Cheat Sheet so you can review some of the functions if needed.)

1. Bar Chart

A bar chart is a type of graph which contains rectangular bars that represent different data categories. The height (for a vertical chart) or length (for a horizontal chart) of the bar represents the value of the related category.

Bar charts are useful for comparing data from different categories, such as sales by product, orders by region, and so on. You can identify the differences or trends within these categories at a glance.

Example Bar Chart

Here’s an example bar chart. We’ll create the SQL query for this chart later in the article.

Charting with SQL

What information do we need to create a bar chart?

  • Labels: We need the labels for each bar on the X (horizontal) axis. The query should return the labels as one of the columns in the SELECT clause.
  • Bar Heights: The height of the bars is according to our Y (vertical) axis. This is another column that should be computed in a SELECT The bar height will usually be computed using an aggregate function like SUM() or COUNT() and GROUP BY using the X-axis labels.
  • Sorting: Optionally, we may want to arrange the bars in the chart in a given order.

Product Category Sales Example

Imagine you want to analyze the 2023 total sales for each product category. With the help of SQL, you can prepare data for the bar chart using the following query:

SELECT 
    product_category, 
    SUM(sales_amount) AS total_sales
FROM 
    sales_data
WHERE 
    sale_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY 
    product_category
ORDER BY 
    total_sales DESC;

Explanation:

  • SELECT product_category, SUM(sales_amount) AS total_sales: First we select the desired columns and use the SUM aggregate function on sales_amount to aggregate sales per product category. This will be our X-axis label and the bar height.
  • WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31: We want to evaluate only 2023 sales, so this clause filters the data to contain only sales from 2023.
  • GROUP BY product_category: Here we group our data by product_category to summarize total sales.
  • ORDER BY total_sales DESC: Finally, we sort the data in descending order so the highest sales are first and the lowest sales are last.

2. Line Chart

A line chart is a type of graph that shows data points connected by a line. This graph is a great data visualization for showing data trends over time, such as tracking website activity over a month, analyzing sales trends over a year, and more.

Example Line Chart

Here’s an example line chart:

Charting with SQL

What information do we need to create a line chart?

  • Labels: We need labels (typically dates) for the X-axis.
  • Line Height: For each point on the line, we need a line height. This can be the value on this date, either selected from the database or calculated with an aggregate function. It can also be another value, such as a moving average, calculated with the help of window functions.
  • Sorting: In order to ensure our data is ordered correctly by date, we use ORDER

Mobile Phone Sales Example

Let's imagine we want to find the peak days for mobile phone sales by monitoring our website's monthly sales. To do this, we can calculate the rolling average of the sales data to smooth out fluctuations.

Firstly we will divide the query into two parts: a subquery that collects the required data and a main query that computes the rolling average using this data. Using SQL, we could prepare the subquery as follows:

SELECT 
    DATE(sale_date) AS sale_day, 
    COUNT(*) AS mobile_sales_count
FROM 
    sales_data
WHERE 
    device_type = 'mobile'
GROUP BY 
    sale_day
ORDER BY 
    sale_day ASC;

Explanation:

  • SELECT DATE(sale_date) AS sale_day, COUNT(*) AS mobile_sales_count: Extracts the date from sale_date and counts mobile sales for each day.
  • WHERE device_type = 'mobile': Filters the device_type column to only include mobile device sales.
  • GROUP BY sale_day: Groups data by each day.
  • ORDER BY sale_day ASC: Sorts the results by date in ascending order.

We then take this query one step further by utilizing a moving average as mentioned. A moving average smooths out short-term variations in data helping to minimize fluctuations. A fluctuation is a short-term variance in data, such as daily spikes or declines, that can hide larger trends.

By using a 7-day moving average, we can determine the average for the current day and the previous six days. As a result we smooth out daily variances and reveal weekly trends. Altering our previous query will let us do exactly that:

SELECT 
    sale_day, 
    AVG(mobile_sales_count) OVER (ORDER BY sale_day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg
FROM (
    SELECT 
        DATE(sale_date) AS sale_day, 
        COUNT(*) AS mobile_sales_count
    FROM 
        sales_data
    WHERE 
         device_type = 'mobile'
    GROUP BY 
        DATE(sale_date)
) daily_sales;

Explanation:

  • SELECT sale_day, AVG(mobile_sales_count) OVER (ORDER BY sale_day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg: This is a window function which calculates a 7-day moving average (current + 6 previous days) of mobile sales, ordered by sale_day.
  • FROM (...) daily_sales: Here we are adapting the previous query as a subquery, and naming it daily_sales.

Find more about window functions with our article SQL Window Function Example With Explanations.

3. Pie Chart

A pie chart is a graph that divides a circle into sections or slices, with each one representing a portion of the whole. This sort of graph is particularly useful for representing proportions or percentages, such as in demographic distribution, revenue source breakdowns, and more.

Example Pie Chart

An example pie chart may look like this:

Charting with SQL

What information do we need to create a pie chart?

  • Labels: We need labels for each slice of the pie chart.
  • Slice Size: We need to calculate the size of each slice.
  • Percentage Calculation: We need to calculate each group's proportionate contribution to the total, as a percentage.
  • Grouping: We usually need to group by chart labels.

Demographic Sales Breakdown Example

Let's imagine that we want to see the percentage of sales contributed by each age demographic segment for our business. We could use the following query to prepare our data:

SELECT 
    CASE 
        WHEN age < 18 THEN 'Below 18'
        WHEN age BETWEEN 18 AND 24 THEN '18-24'
        WHEN age BETWEEN 25 AND 34 THEN '25-34'
        WHEN age BETWEEN 35 AND 44 THEN '35-44'
        WHEN age BETWEEN 45 AND 54 THEN '45-54'
        WHEN age >= 55 THEN '55+'
    END AS age_group,
    SUM(sales_amount) AS total_sales, 
    SUM(sales_amount) * 100.0 / SUM(SUM(sales_amount)) OVER () AS pc_of_sales  
FROM 
    sales_data
WHERE 
    age IS NOT NULL  
GROUP BY 
    age_group; 

Explanation:

  • CASE ... END AS age_group: Using the CASE statement, we divide the ages of our customers into specified age groups. Each group will form one slice of our pie chart.
  • SUM(sales_amount) AS total_sales: Using the SUM() aggregate function, we calculate the total sales amount for each age group.
  • SUM(sales_amount) * 100.0 / SUM(SUM(sales_amount)) OVER () AS pc_of_sales: Using a window function, we calculate the percentage of total sales contributed by each age group.
  • WHERE age IS NOT NULL: To clean the data, we filter out rows with NULL values in the age column.
  • GROUP BY age_group: Finally we group the data by our defined age groups to summarize sales for each category.

4. Histogram

A histogram is a graph that shows the distribution or spread of data by grouping it into intervals. This is especially helpful when looking for patterns and trends in how spread out our data is. This is often the case with delivery time distributions, sales volume across price ranges, and more.

Example Histogram

Charting with SQL

What information do we need to create a histogram?

  • Labels: We need the start and end value for each bin. Here we use 10-minute intervals on the X-axis.
  • Bar Heights: For each bin, we need to calculate its height. This is usually done using an aggregate function with GROUP BY.
  • Sorting: Usually, we use ORDER BY so that bins are sorted from lowest to highest.

Delivery Time Distributions Example

Let's say you work for a food delivery business and want to find the variation in delivery times in Barcelona. We can prepare our data for this by running the following SQL query:

SELECT 
    FLOOR(delivery_duration / 10) * 10 AS bin_start, 
    (FLOOR(delivery_duration / 10) + 1) * 10 AS bin_end, 
    COUNT(*) AS no_of_orders
FROM 
    deliveries
WHERE
   region = ‘Barcelona’
GROUP BY 
    bin_start
ORDER BY 
    bin_start;

Explanation:

  • FLOOR(delivery_duration / 10) * 10 AS bin_start: First. we want to create our intervals. To do this, we group delivery_duration into 10-minute intervals. This calculates the lower bound of each bin (bin_start).
  • (FLOOR(delivery_duration / 10) + 1) * 10 AS bin_end: Next, we need to calculate the upper bound of each bin (bin_end) by adding 10 minutes to bin_start.
  • COUNT(*) AS no_of_orders: Now we count the number of deliveries falling within each 10-minute bin.
  • WHERE region = 'Barcelona': To limit the data to Barcelona, we filter our region column.
  • GROUP BY bin_start: We group deliveries by each 10-minute interval.
  • ORDER BY bin_start: Finally, we sort the groups in ascending order by their starting times.

5. Scatter Plot

Last but not least, let's take a look at the scatter plot. This graph can visualize the potential relationship between up to three variables by plotting data points. It effectively identifies correlations and links in data, such as advertising cost vs. sales or employee satisfaction vs. retention.

Example Scatter Plot

Charting with SQL

What information do we need to create a scatter plot?

  • X and Y coordinates for each point.
  • Label: Optionally, we need a label for each point to give each point a color (like in our example chart) or a label.

This data can be simply selected from a database or can be computed, like in our example below.

Advertising Effectiveness Example

Let's say we want to analyze the effectiveness of different forms of advertising over the past two years by observing their impact on sales. Using SQL, we can prepare the data as follows:

SELECT 
    DATE_FORMAT(sd.sale_date, '%Y-%m') AS year_month, 
    mc.advertising_type, 
    SUM(mc.advertising_spend) AS total_advertising_spend, 
    SUM(sd.sales) AS total_sales
FROM 
    marketing_campaigns mc
JOIN 
    sales_data sd ON mc.campaign_id = sd.campaign_id
WHERE 
    sd.sale_date BETWEEN '2022-01-01' AND '2023-12-31'
GROUP BY 
    month, mc.advertising_type
ORDER BY 
    year_month;

Explanation:

  • Each point on the plot corresponds to a month and advertising type:
    • DATE_FORMAT(sd.sale_date, '%Y-%m') AS year_month: First, we extract the year and month from sale_date.
    • mc.advertising_type: Here, we are selecting the advertising type, which will be used as a hue on our scatterplot. Hue is essentially a third layer we can add with color to differentiate categories in a plot.
  • SUM(mc.advertising_spend) AS total_advertising_spend: We aggregate the total advertising spend for each month and advertising type using SUM().
  • SUM(sd.sales) AS total_sales: Again we use SUM() to aggregate the total sales.
  • JOIN sales_data sd ON mc.campaign_id = sd.campaign_id: As our desired data is in two separate tables, we join the marketing_campaigns table with the sales_data table based on the common column campaign_id.
  • WHERE sd.sale_date BETWEEN '2022-01-01' AND '2023-12-31': To focus on data from 2022 and 2023, we filter the sales date.
  • GROUP BY year_month, mc.advertising_type: We group the data by our chart categories, giving us a total per month year per advertising type.
  • ORDER BY year_month: Finally, we order the data in ascending order by month.

If you want to learn even more tips and tricks for leveraging SQL for creating graphs, make sure to check out our article Data Visualization Using SQL: A Guide for Data Analysts.

Creating Charts with Data Exported from SQL

After our data is prepared, we must utilize specialized tools to create charts outside of SQL. Let's look at some of the alternatives and how you can import your SQL data to them.

1.    Python

Python provides libraries such as Matplotlib and seaborn for creating various charts. Using the Python library pandas, you can import and visualize the data from SQL. Let's go over how to do this.

import pandas as pd

# Load the CSV file
df = pd.read_csv('/path/to/sql_export.csv')

# Preview the DataFrame
print(df.head())
  • Step 3: Create a chart using a Python library: Once the data is imported, we can create our graph. In this example, we will create a bar chart using Matplotlib and seaborn:
import matplotlib.pyplot as plt
import seaborn as sns

# Create a bar chart
plt.figure(figsize=(10, 6))
sns.barplot(x='product_category', y='total_sales', data=df, palette='viridis')

# Customize the chart
plt.title('Total Sales by Product Category (2023)', fontsize=16)
plt.xlabel('Product Category', fontsize=12)
plt.ylabel('Total Sales', fontsize=12)
plt.xticks(rotation=45)
plt.tight_layout()

# Show the chart
plt.show()

2.    Tableau

Tableau is an easy-to-use Business Intelligence solution that connects directly to SQL databases and generates dynamic, interactive dashboards. It can even support your SQL queries directly using its custom SQL editor. There are also connectors for major databases like MySQL, PostgreSQL, and Snowflake.

3.    Power BI

Power BI is a popular Microsoft application which can be used for creating interactive reports and dashboards. It too includes native support for SQL Server and other databases, thus allowing you to perform direct SQL queries.

4.    Looker Studio

Looker Studio (formerly Google Data Studio) is a Cloud-based application for developing shareable, interactive dashboards. It is perfect for users who are already familiar with the Google environment. It is compatible with databases like BigQuery, MySQL, and PostgreSQL, and supports custom queries.

Learn SQL for Charting and Data Analysis

SQL is the foundation of data analysis; as such, it plays an important part in charting data. Although SQL lacks built-in charting tools, its strength resides in preparing and arranging data for data visualizations. Using SQL in conjunction with tools like Python, Tableau, Power BI, or Looker Studio improves your ability to extract actionable insights and effectively present them.

If you are new to SQL and want to improve your skills, now is the time to get started. Mastering SQL will not only speed up your data preparation process, it will also improve your ability to produce visually appealing charts and dashboards.

Begin learning SQL today with our course SQL for Data Analysis! You can also look into other skills that may be valuable for your career in our article Skills Every Data Analyst Should Have. Happy learning!