29th Oct 2024 17 minutes read SQL Project for Beginners: AdventureWorks Sales Dashboard Ekre Ceannmor SQL Project SQL Practice Table of Contents Exploring the AdventureWorks Database Scoping Your SQL Project What Will We Include in Our SQL Project Dashboard? Report 1: Monthly Sales Question: What is the company’s revenue for each month? Approach Report 2: Monthly Revenue by Country Question: What is the monthly revenue for each country? Approach Report 3: Bestselling Products Question: What Are Our Bestselling Products? Approach Report 4: Top-Performing Stores Question: What Are the Top 10 Stores by Sales for the Past 2 Months? Approach Report 5: Revenue Sources Question: How Does Online Revenue Compare with Offline Revenue? Approach Report 6: Average Order Size per Country Question: What Is the Average Order Size? Approach Report 6: Average Lifetime Customer Value per Region Question: What Is the Average Customer Lifetime Value in Each Region? Approach Create Your Own SQL Project for Beginners! Building an SQL project on your own is a great way to sharpen your skills and gain practical experience. This SQL project example for beginners will show you how to create a sales dashboard with SQL and the AdventureWorks sample database. If you're an SQL beginner who wants to take their skills beyond simple query exercises, it’s a good idea to tackle an SQL data analysis project. Not only will it challenge you to apply what you've learned, it will also give you a deeper understanding of how SQL works in real-world scenarios. This is your chance to move from basic practice to creating something tangible that showcases your growing expertise! In this article, we’ll walk you through the steps of building a SQL project using the AdventureWorks database. We’ll cover everything from the project idea to writing the final queries. You'll learn how to approach each step of the process, and the concepts that we’ll use can be transferred to any SQL project. Once you are finished, take a look at our SQL Databases for Practice course, which has 6 different databases for you to practice creating reports with. Alternatively, you can read up on how to find free datasets for your own SQL project and follow along with your own database! Now, let’s take a look at the database we will be using in this SQL example project. Exploring the AdventureWorks Database The AdventureWorks database is a sample database created by Microsoft for SQL Server. It has been ported to many databases – including PostgreSQL, which we’ll be using in this article. The database includes 68 tables that describe a fictional bicycle manufacturer and contains data about different types of transactions occurring during business operations. Because AdventureWorks is so big, it’s a perfect dataset for beginners who want to practice SQL in a real-world setting. The database is divided into 5 schemas, each representing a different field of operations: Production, Purchasing, Sales, HR, and Person. In this project, most of our data will come from the Sales schema. We will also use other schemas to get some additional information. Let’s look at the tables we’ll use the most: SalesOrderHeader: This is the biggest table in the database. It stores all information relating to an order as a whole. It will be the most common starting point for sales-related queries. Product: This stores extensive information about products offered by the company. SalesOrderDetail: This table connects the SalesOrderHeader and Product tables, storing information about the individual products that make up each order. ProductReview: This stores customers’ reviews for specific products. Store: This table stores basic information about each store. Most data is stored in the Demographics column in XML format; we will not be using it in this project. SalesTerritory and CountryRegion: We’ll use these two tables together to get the name of the country with which the order is associated. There are many other tables in the database, but don’t worry; we will be introducing them and their important columns as needed. Scoping Your SQL Project In this project, we want to build a sales dashboard for AdventureWorks. But how do we begin a project like this? A great starting point is to determine the scope of the project. Often, this means creating a list of questions you want to answer with the data. Our starting list of questions is: What are the total monthly sales? What are the monthly sales by country? Which products are our top sellers? Which stores perform the best? What is the average size of each order? What is the average customer lifetime value in each country? As we dive into the data and write our queries, we may refine these questions and adjust our queries accordingly. This is how you should start all your SQL projects: write a list of the questions you have for the data and then write the queries that will give you the answers. As you work through the queries and data, new questions will come up. Try to answer them with your queries too. Now we’re ready to start writing the queries for our sales dashboard! Keep our SQL Basics Cheat Sheet handy in case you need a quick refresher on syntax. What Will We Include in Our SQL Project Dashboard? Report 1: Monthly Sales Question: What is the company’s revenue for each month? The first report in our dashboard shows the revenue for all past months (across every year). This query can be used to plot trend lines of income, or it can be analyzed as-is. We want to view the data in chronological order, starting with the most recent dates. Approach First, we choose the table our query will be based on. We would like to show three columns: the year and month of sale and the total revenue for that month and year. We need the total order amount and the date of sale, so the SalesOrderHeader table from the Sales schema is an obvious choice. To display the year and month, we’ll use the EXTRACT function: EXTRACT(<MONTH / YEAR> FROM OrderDate) We will use these columns when grouping and ordering the result. To get the total revenue for that specific month, we can use SUM(TotalDue) together with grouping by the previous two columns. This will lead us to have separate SUM() results for each unique year-month pair. Here’s the final query: SELECT EXTRACT(YEAR FROM OrderDate) AS OrderYear, EXTRACT(MONTH FROM OrderDate) AS OrderMonth, ROUND(SUM(TotalDue), 2) AS TotalRevenue FROM Sales.SalesOrderHeader GROUP BY EXTRACT(YEAR FROM OrderDate), EXTRACT(MONTH FROM OrderDate) ORDER BY OrderYear DESC, OrderMonth DESC; Report 2: Monthly Revenue by Country Question: What is the monthly revenue for each country? We would like to see the total revenue for each month (of each year) for every country in the database. The result should be sorted in chronological order, with the more recent dates coming first. We can use this data to plot trend lines for each country or analyze the data as-is. Approach This query is similar to the previous query – we just have to add information about the country for each purchase. How can we do that? The SalesOrderHeader table has a column named TerritoryId, which refers to the SalesTerritory table. This table gives us access to the CountryRegionCode column. Country names are stored in the CountryRegion table of the Person schema. We can join this table to the SalesTerritory table using the region code. This leaves us with a clear way to expand our query. Joining the tables like this SalesOrderHeader -> SalesTerritory -> CountryRegion gives us access to the CountryRegion.Name column. We can add it as the first column in the SELECT statement and add it at the end of both the GROUP BY and ORDER BY. Take a look at the final query: SELECT cr.Name AS Country, EXTRACT(YEAR FROM OrderDate) AS OrderYear, EXTRACT(MONTH FROM OrderDate) AS OrderMonth, ROUND(SUM(TotalDue), 2) AS TotalRevenue FROM Sales.SalesOrderHeader soh JOIN Sales.SalesTerritory st ON soh.TerritoryId = st.TerritoryId JOIN Person.CountryRegion cr ON cr.CountryRegionCode = st.CountryRegionCode GROUP BY cr.Name, EXTRACT(YEAR FROM OrderDate), EXTRACT(MONTH FROM OrderDate) ORDER BY OrderYear DESC, OrderMonth DESC, Country; Here is a portion of the output: CountryOrderYearOrderMonthTotalRevenue Australia201469958.82 Canada2014611471.62 France201463660.88 Germany201463818.44 Report 3: Bestselling Products Question: What Are Our Bestselling Products? We would like to add a list of the bestselling products together with some relevant information like lifetime sales and customer reviews. The report should show reasonably recent data. Approach The SalesOrderDetail table, which holds the quantity of each item sold, will be our starting point for this query. We’ll build it incrementally. The main metric in this query is the total quantity sold for each product, which we can obtain by using the combination of SUM(OrderQty) and GROUP BY ProductId. However, the result of the query would be quite hard to read, as the products are only identifiable by their Id. The products’ names are stored in the Product table of the Production schema. Joining the two tables using ProductId and adding ProductName leaves us with this query: SELECT p.ProductId, p.Name AS ProductName, SUM(od.OrderQty) AS TotalUnitsSold FROM Sales.SalesOrderDetail od JOIN Production.Product p ON od.ProductID = p.ProductID GROUP BY p.Name, p.ProductId ORDER BY TotalUnitsSold DESC LIMIT 10; ProductIdProductNameTotalUnitsSold 712AWC Logo Cap8311 870Water Bottle - 30 oz.6815 This query is a viable dashboard element, but it can be improved. We have already laid down the foundation to work with products. Now, we want to add the ratings to our products. Let’s build it as a separate query, which we will later join with the main query. Product reviews are stored in the ProductReview table of the Production schema. We mainly care about the ProductId and the Rating left by the customer, so we’ll average the rating scores and group the data by ProductId. Now we have a neat one-to-one mapping between the product and its ratings. Let’s round the result to one decimal place, in the usual review fashion: SELECT ProductId, ROUND(AVG(Rating), 1) AS ProductRating FROM Production.ProductReview GROUP BY ProductId; ProductIdProductRating 9373.0 7985.0 7095.0 We can now wrap this query into a common table expression (CTE) to use together with our main query. CTEs are a way to create a temporary result set – like a virtual table that only exists within the context of the query. If you want a full explanation of how CTEs work, check out our comprehensive guide to CTEs in SQL. Now, here’s a possible hitch: there might not be reviews for all the products. So, when we join Product from the main query with the CTE, be sure to use a LEFT JOIN so as to not lose any data. It might be beneficial to only look at the most recent data. We can filter out old sales in the main query using the OrderDate column and subtract two months from the current date using INTERVAL ‘2 MONTHS’. Then we can filter the data to only show the orders placed after that date. If you decide to change the date range, modifying the INTERVAL part is quite easy. To learn more about the INTERVAL syntax, take a look at our Standard SQL Functions course. This is the full query: WITH ProductRating AS ( SELECT ProductId, ROUND(AVG(Rating), 1) AS ProductRating FROM Production.ProductReview GROUP BY ProductId ) SELECT p.ProductId, p.Name AS ProductName, SUM(od.OrderQty) AS TotalUnitsSold, pr.ProductRating AS ProductRating FROM Sales.SalesOrderDetail od JOIN Sales.SalesOrderHeader oh ON od.SalesOrderId = oh.SalesOrderId JOIN Production.Product p ON od.ProductID = p.ProductID LEFT JOIN ProductRating pr ON pr.ProductId = p.ProductId WHERE OrderDate > CURRENT_DATE - INTERVAL '2 MONTH' GROUP BY p.Name, p.ProductId, pr.ProductRating ORDER BY TotalUnitsSold DESC LIMIT 10; ProductIdProductNameTotalUnitsSoldProductRating 921Mountain Tire Tube187N/A 873Patch Kit / 8 Patches181N/A 870Water Bottle - 30 oz.168N/A 922Road Tire Tube160N/A 878Fender Set - Mountain107N/A 711Sport-100 Helmet, Blue96N/A 712AWC Logo Cap95N/A 708Sport-100 Helmet, Black93N/A 923Touring Tire Tube91N/A 871Mountain Bottle Cage84N/A Now this is a list worthy of being displayed on a representative’s dashboard! An understanding of sales trends is something every employee should have access to – and now we have a query that delivers it. Our dashboard is growing; let’s add some more metrics! Report 4: Top-Performing Stores Question: What Are the Top 10 Stores by Sales for the Past 2 Months? We’d like to show the top ten stores – including online stores – and their respective revenue gain for the past two months. A “top store” is defined as having the most revenue over the past 2 months. Treat all online stores as one entity. Approach As usual, we will be starting with the SalesOrderHeader table. To find out which store is responsible for which sale, let’s take a look at the Store table in the Sales schema. Each store can be linked to a sale via the SalesPersonId column. All sales where SalesPersonId is NULL also have their OnlineOrderFlag set to true. This will make the job of grouping and displaying online sales easier, as we can treat any sale where SalesPersonId is NULL as an online sale. To not lose data when joining the tables on SalesPersonId (which is sometimes NULL), use a LEFT JOIN between SalesOrderHeader and Store. In the SELECT statement, we care about two things: the name of the store, and the total sales made at that store. Don’t forget to wrap the Store.Name column with COALESCE(..., ‘Online’) to make sure that all online orders have a nice readable label. To filter out old data, use the same WHERE condition as in the last report: WHERE OrderDate > CURRENT_DATE - INTERVAL ‘2 MONTHS’. Order the results by the total sales amount (showing larger values first) and use LIMIT 10 to only return the top 10 stores by revenue in the selected period. Here’s the query: SELECT COALESCE(s.Name, 'Online') AS StoreName, ROUND(SUM(so.TotalDue), 2) AS TotalSalesAmount FROM Sales.SalesOrderHeader so LEFT JOIN Sales.Store s ON so.SalesPersonId = s.SalesPersonId WHERE OrderDate > CURRENT_DATE - INTERVAL '2 MONTHS' GROUP BY s.Name ORDER BY TotalSalesAmount DESC LIMIT 10; StoreNameTotalSalesAmount Online2419079.86 Friendly Bike Shop577136.64 Exceptional Cycles Services495918.62 Number One Bike Co.495918.62 Report 5: Revenue Sources Question: How Does Online Revenue Compare with Offline Revenue? For this dashboard metric, we would like to include data about how many orders were placed online vs. at a store. The final results should be broken down by month and year. Approach Once again, we will build this query iteratively. To split the sales into online and offline groups, we can use the OnlineOrderFlag from the Sales.SalesOrderHeader table. This table also conveniently includes the total price of the order, so this metric can be calculated from one table. However, the OnlineOrderFlag is of the BOOLEAN data type; using it in the GROUP BY clause will leave us with true and false groups. To make the data more readable, we can use a CASE WHEN expression to replace all the trues with “Online” and all the falses with “Store”. And since the OnlineOrderFlag already evaluates to BOOLEAN, we don’t need to modify it to use it as a condition. For a more in-depth explanation on how we are using CASE WHEN in this example, read our guide on how to use CASE WHEN with GROUP BY. Now that we have proper group names, we can select the other metrics that compare the revenue sources: the total number of deals closed and the total revenue from those deals. You can find the first one by simply counting all the SalesOrderIds and the second one by summing up all the TotalDue values. The grouping can be done using the bare OnlineOrderFlag and not the CASE WHEN expression; in this usage, we are simply mapping values. Here is how the current query looks: SELECT CASE WHEN OnlineOrderFlag THEN 'Online' ELSE 'Store' END AS OrderOrigin, COUNT(SalesOrderId) AS TotalSales, SUM(TotalDue) AS TotalRevenue FROM Sales.SalesOrderHeader GROUP BY OnlineOrderFlag ORDER BY TotalRevenue DESC; OrderOriginTotalSalesTotalRevenue Online276593244139 Store380690775446 We can clearly see a conclusion: Physical stores make almost 10 times less sales but produce 3 times as much revenue as the online store. We can make this data more meaningful by filtering it by year and month so we can see trends in the revenue streams. The Sales.SalesOrderHeader table includes the OrderDate column, which we can use to obtain just the YEAR and MONTH parts as we did in the first report. If we add this as extra columns and use them in the GROUP BY, we will see that both “Store” and “Online” groups have been split into different years and months. We can additionally filter the dates, like we did before with CURRENT_DATE and INTERVAL. Here’s the full query and a sample of the results: SELECT CASE WHEN OnlineOrderFlag THEN 'Online' ELSE 'Store' END AS OrderOrigin, EXTRACT(YEAR FROM OrderDate) AS OrderYear, EXTRACT(MONTH FROM OrderDate) AS OrderMonth, COUNT(SalesOrderId) AS TotalSales, SUM(TotalDue) AS totalRevenue FROM Sales.SalesOrderHeader WHERE OrderDate > CURRENT_DATE - INTERVAL '2 MONTHS' GROUP BY EXTRACT(YEAR FROM OrderDate), EXTRACT(MONTH FROM OrderDate), OnlineOrderFlag ORDER BY OrderOrigin, OrderYear DESC, OrderMonth DESC; OrderOriginOrderYearOrderMonthTotalSalesTotalRevenue Online2014693954151,48 Store20145125843850111,69 Online201454062156071,52 Store2014421428,61 Report 6: Average Order Size per Country Question: What Is the Average Order Size? We would like to display each country and the average order size for all orders coming from that country. Countries with bigger average order sizes should be shown first. Approach We have already handled getting the country name for each sale in one of the previous reports, so we can focus on the order sizes here. It’s important to note that we will have to aggregate two times in this query: First to get the order size for each order, and then to get the country average. We’ll start by getting the order sizes. Each order can be identified by an entry in the SalesOrderHeader table. It will have multiple respective entries in the SalesOrderDetail table corresponding to each product in the order. Summing up the OrderQtys from the SalesOrderDetail table for each SalesOrderId leaves us with neat order_number–order_size pairs. We’ll use a couple of JOINs to get the country names, forming a CTE like this: WITH OrderSizes AS ( SELECT sod.SalesOrderId, SUM(OrderQty) AS ProductCount, cr.Name AS Country FROM Sales.SalesOrderHeader soh JOIN Sales.SalesOrderDetail sod ON sod.SalesOrderId = soh.SalesOrderId JOIN Sales.SalesTerritory st ON soh.TerritoryId = st.TerritoryId JOIN Person.CountryRegion cr ON cr.CountryRegionCode = st.CountryRegionCode GROUP BY sod.SalesOrderId, cr.Name ) SELECT * FROM OrderSizes; Now it’s just a matter of getting the average order size from the CTE and grouping the result by country. Don’t forget to order the data to show the largest average order sizes first. Take a look at the complete query and its partial result: WITH OrderSizes AS ( SELECT sod.SalesOrderId, SUM(OrderQty) AS ProductCount, cr.Name AS Country FROM Sales.SalesOrderHeader soh JOIN Sales.SalesOrderDetail sod ON sod.SalesOrderId = soh.SalesOrderId JOIN Sales.SalesTerritory st ON soh.TerritoryId = st.TerritoryId JOIN Person.CountryRegion cr ON cr.CountryRegionCode = st.CountryRegionCode GROUP BY sod.SalesOrderId, cr.Name ) SELECT Country, ROUND(AVG(ProductCount), 2) AS AverageOrderSize FROM OrderSizes GROUP BY Country ORDER BY AverageOrderSize DESC; CountryAverageOrderSize United States12.80 Canada12.14 France7.45 United Kingdom6.24 Germany5.01 Australia2.67 Report 6: Average Lifetime Customer Value per Region Question: What Is the Average Customer Lifetime Value in Each Region? Customer Lifetime Value (CLV) is an important metric showing how much the company can expect a newly acquired customer to spend during their entire relationship with the brand. CLV is calculated by summing the revenue from all the customer’s purchases. This is most useful when calculating the budget for marketing campaigns; it’s best shown grouped by the region of operations. Approach Let’s first build the query that calculates the average CLV for each customer individually. We already know how CLV is calculated: by summing all the TotalDue values for each customer from the SalesOrderHeader table. This will give us an intermediate result. We want to group it later by country, so it is beneficial to include some of that data in here too. The Customer table in the same schema has a TerritoryId column, which we can use later to obtain more information. For now, let’s just add TerritoryId to the CLV query as an additional column. This requires a simple join and an expansion of the GROUP BY statement. Here is how the query looks so far, together with some sample results: SELECT cs.CustomerId, cs.TerritoryId, SUM(TotalDue) AS LifetimeRevenues FROM sales.Customer cs JOIN sales.SalesOrderheader ord ON cstm.CustomerId = ord.CustomerId GROUP BY cs.CustomerId, cs.TerritoryId; CustomerIdTerritoryIdLifetimeRevenue 26264434.56 30052421863.90 244161106.16 2674592135.37 Half of the problem is solved. What’s left is to group this at the country level and make the result more readable. Let’s use a CTE to save the results of this query. We can group the CTE by TerritoryId and show the average lifetime value of customers per territory. This gives a technically correct result, though the territories are unidentifiable. The actual names of countries are stored in the CountryRegion table from the Person schema. It is identifiable by the CountryRegionCode key. In the Sales schema, we can find that key in the SalesTerritory table. So, we will join the result of the CTE, through the SalesTerritory table, with the CountryRegion table. Now we can show the country’s name instead of its numeric Id. Don’t forget to edit the GROUP BY accordingly. Take a look at the final query and some of its results: WITH CustomerLifetimeRevenue AS ( SELECT cstm.CustomerId, ord.TerritoryId, SUM(TotalDue) AS LifetimeRevenue FROM Sales.Customer cstm JOIN Sales.SalesOrderHeader ord ON cstm.CustomerId = ord.CustomerId GROUP BY cstm.CustomerId, ord.TerritoryId ) SELECT cr.Name AS Country, ROUND(AVG(clr.LifetimeRevenue),2) AS AvgLifetimeCustomerValue FROM CustomerLifetimeRevenue clr JOIN Sales.SalesTerritory tr ON clr.TerritoryId = tr.TerritoryId JOIN Person.CountryRegion cr ON cr.CountryRegionCode = tr.CountryRegionCode GROUP BY cr.Name ORDER BY AvgLifetimeCustomerValue DESC, cr.Name; CountryAvgLifetimeCustomerValue Canada10971.34 United States8627.27 France4403.33 United Kingdom4394.69 Australia3259.14 Germany3024.18 Create Your Own SQL Project for Beginners! We have successfully created queries that can power an informative sales dashboard. However, this article is meant to inspire you to start your own SQL project. Let’s recap how to build an SQL project: Find an interesting dataset. Come up with a business problem for this dataset, like we did with this sales dashboard. Write questions related to the problem which the project will help answer. Build SQL queries for the questions you wrote. If you need a bit more structure, we recommend our mini-course SQL Databases for Practice. It contains 6 different databases that you can use in your own SQL project as well as some questions and suggestions to get you started. If you would like to see more SQL beginner project examples, check out our article about creating a Northwind Store Project for your portfolio. Hungry for more general practice? Check out our massive SQL Practice Track. Happy learning! Tags: SQL Project SQL Practice