19th May 2021 Updated: 30th May 2024 12 minutes read What Is Advanced SQL? Tihomir Babic Advanced SQL Table of Contents Advanced SQL Is Everywhere Embrace the Inconsistency What Might Advanced SQL Include? What Is Advanced SQL at LearnSQL.com? Window Functions Common Table Expressions (CTEs) GROUP BY Extensions Do You Consider Yourself an Advanced SQL User? Are you confused about advanced SQL skills? What are they? This article will explain what advanced SQL can mean, especially as we use it on LearnSQL.com. I’m sure you find the phrases ‘advanced SQL skills’ or ‘advanced SQL topics’ very often. You read one article about advanced SQL and you’re happy with how easy these advanced topics seem to be. Then you talk to someone and you see they consider everything you know as basic SQL knowledge. How do you define yourself? Do you consider yourself a basic, intermediate, or advanced SQL user? Advanced SQL Is Everywhere Advanced SQL is everywhere. Well, the word ‘advanced’ is, at least. It’s used very commonly by SQL learners and SQL users. You can find it in SQL course descriptions, in job ads, and in the job interview questions. It’s in the SQL literature. You hear it when colleagues are talking at work. It’s in numerous articles trying to define what advanced SQL is. An advanced SQL query often involves techniques such as case statements, the UNION operator, and aliasing tables and columns to perform conditional logic, combine results from multiple SELECT statements, and simplify code. Actually, I’m not trying to give you a definition of advanced SQL. I’m trying to tell you something else: There’s no single definition of advanced SQL! And you should stop looking for it. What should you do instead? Embrace the Inconsistency That’s right! You should accept that the term ‘advanced SQL’ is used inconsistently. It means different things depending on the context and who’s using the term. It’s only logical that advanced SQL would include one thing for someone who writes SQL reports and something entirely different for someone hiring a data analyst. A software developer will surely have yet another definition of what advanced SQL includes. You get the picture. Advanced SQL can’t have just one definition. When you’re reading about advanced SQL skills, you should always consider the context, who’s doing the talking, and their audience. What Might Advanced SQL Include? For instance, there’s a really interesting discussion about advanced SQL on Stack Overflow. The discussion was started by someone looking for an SQL job who noted that there are plenty of jobs requiring “advanced SQL knowledge”. The user is asking what is to be expected from this kind of job. What knowledge is considered advanced? The first answer gives a rather long code snippet as a measure of advanced knowledge. Even though it’s pretty long, it’s not that complicated. According to this reply, advanced SQL covers selecting columns, aggregate functions like MIN() and MAX(), the CASE WHEN statement, JOINs, the WHERE clause, GROUP BY, declaring variables, and subqueries. On the other hand, the following reply considers most of these topics to be basic or intermediate at best. This user believes advanced SQL topics include functions, stored procedures, hierarchical queries, triggers, indices, data modeling (normal forms, primary and foreign keys, table constraints), transactions, and much more. This is much closer to my definition of advanced SQL and what I was taught in SQL lectures. However, this was a program for database administrators; understandably, this knowledge is considered advanced. Some reporting specialists and data analysts may never need to use such things. It’s interesting to note that sometimes JOINs are considered advanced while writing stored procedures is still regarded as basic knowledge. I can understand why because one user hints at the problem with JOINs. Even though they are generally considered basic knowledge, many SQL users learn much more advanced topics before really understanding JOINs. This is how the basics easily become advanced knowledge. It’s not unusual to find someone using flashy functions, triggers, and whatnot – without knowing how to write a simple JOIN. What Is Advanced SQL at LearnSQL.com? Before explaining what advanced SQL is, it’s essential to know what it isn't. When you look at our courses and articles, basic/intermediate SQL is anything in SQL-92. (Here’s the history and details of SQL standards if you’re interested in finding out more.) This includes: All types of JOINs Aggregate functions GROUP BY HAVING Subqueries Set operations (UNION, UNION ALL, INTERSECT, MINUS) You must be familiar with these topics if you claim to know SQL. These are things you should understand before moving to more advanced topics. Generally, we consider three topics as ‘advanced SQL’: Window functions Common Table Expressions (CTEs) GROUP BY extensions (ROLLUP, CUBE, and GROUPING SETS) Anybody wanting to learn (or practice) all three topics should check out our Advanced SQL track. Of course, this is not the only advanced SQL course out there; we’ve already reviewed some excellent advanced SQL courses from other platforms. For now, let’s look at an example of each of these topics. Window Functions SQL window functions allow you to perform operations that are often required for creating reports, e.g. ranking data, calculating running totals and moving averages, finding the difference between rows, etc. Not only that, but you can also divide data into windows, which enables you to perform operations on data subsets rather than the data as a whole. You can learn much more about this in our Window Functions course. Let’s see an example. This code will show the difference in yearly numbers of cars sold, according to make (i.e. car brand): SELECT car_make, cars_sold, year, cars_sold - LAG(cars_sold) OVER (PARTITION BY car_make ORDER BY year) AS sales_diff FROM cars_sale; To get this information, you first have to select the columns you want in the result: car_make, cars_sold, year. To get the yearly difference, subtract the previous year’s sale from the current year’s sale: cars_sold - LAG(cars_sold) OVER (PARTITION BY car_make ORDER BY year) AS sales_diff. Here, cars_sold means the current year’s sales. The LAG() function allows you to fetch data from the previous row. The OVER clause signifies this is a window function. Then follows the PARTITION BY clause, which is used to define the window (data subset) we want to use. In this case, it’s the car_make; this means the function will calculate the sale difference only within a specific car make. When it runs into another car make, the function will reset and start calculating the sales difference all over again. Finally, the operation is ordered by year ascending. Why is that? The LAG() function will get us the data from the previous row. So, if this operation is performed for every year in ascending order, the “previous year” will mean the previous row. That’s exactly what we need. Take a look at the result: car_makecars_soldyearsales_diff Nissan459,6632015NULL Nissan312,4532016-147,210 Nissan541,2232017228,770 Nissan452,8442018-88,379 Nissan584,2562019131,412 Renault1,342,5582015NULL Renault17,251,456201615,908,898 Renault16,842,5522017-408,904 Renault1,425,8952018-15,416,657 Renault1,548,6982019122,803 Did you see the NULL value in the first row? That’s because 2015 is the first year; there’s no previous data that can be deducted from it. If you follow the results, you’ll see every row is the difference between the current row’s yearly sales and the previous row’s yearly sales. When you reach the row where Renault starts, there’s a NULL again. This is what window functions do; they work on data within a given window., I’ve defined the window according to the car_make, so the window function resets when we get a new value in this column. It’s only logical. Why would I deduct Renault sales from Nissan sales? I want to do that for every car make separately. Common Table Expressions (CTEs) CTEs will allow you to write complex queries without using subqueries keeping your code simple and straightforward. They give you the possibility to produce complex reports quickly and efficiently. They also enable you to make some calculations you wouldn’t be able to do otherwise. What is a common table expression, you might ask? It’s a temporary result you can use in the SELECT statement. It works like a temporary table – you can join it with other tables, other CTEs, or with itself. They can be helpful if you, for instance, have to report on time spent on a particular project. On one side, there’s a table containing data about the date when each employee worked on this project. There’s also the start time and end time. On the other side, there’s a table containing employee names. You have to produce a table showing every employee’s name and his or her average time spent on this project. Here’s how the CTE can help you: WITH time_worked AS ( SELECT employee_id, end_time - start_time AS time FROM project_timesheet ) SELECT e.first_name, e.last_name, AVG (tw.time) AS avg_time_worked FROM employee e LEFT JOIN time_worked tw ON e.id = tw.employee_id GROUP BY e.first_name, e.last_name; How does this CTE work? Every CTE opens with the WITH clause. Then you must name your CTE; in this case, it’s time_worked. Then you write a SELECT statement. Here, I’ll use the CTE to calculate how much time each employee worked every time they worked on the project. I need the CTE because I don’t have this information stated explicitly in the table; I only have the start_time and end_time. To calculate the average time worked, the first step is to get the time worked. That’s why this CTE deducts the start_time from the end_time and shows the result in the column time. The data is taken from the table project_timesheet. Now that I’ve written the CTE, I can use it in the next SELECT statement. First, I’ll get the first name and the last name from the table employee. Then I’ll use the AVG() function on the column time from the CTE time_worked. To do that, I’ve used the LEFT JOIN – and I’ve used it exactly like I would with any other table. Finally, the data is grouped by the employees’ first and last names. The result is a little table like this: first_namelast_nameavg_time_worked JanineRooney4:58:39 MikeWatson5:52:24 PeterMarcotti4:09:33 IngeOngeborg8:56:05 If CTEs have you interested, imagine what you’ll be able to do after finishing our Recursive Queries course. Oh, yeah – I didn’t mention that a CTE can be recursive, which means it references itself. By doing so, it returns the sub-result and repeats the process until it returns the final result. While CTEs can be non-recursive, there are no recursive queries that are non-CTE. If you want to learn recursive queries, knowing CTEs is a must. GROUP BY Extensions SQL’s GROUP BY extensions provide you with additional possibilities for grouping data. This, in return, can increase the complexity of your data analysis and the reports you create. There are three GROUP BY extensions: ROLLUP CUBE GROUPING SETS Unlike regular GROUP BY, ROLLUP lets you group the data into multiple data sets and aggregate results on different levels. Fancy talk, but simply put: you can use ROLLUP to calculate totals and subtotals, just like in Excel pivot tables. The CUBE extension is similar, but there’s one crucial difference. CUBE will generate subtotals for every combination of the columns specified. Finally, there are GROUPING SETs. A grouping set is a set of columns you use in the GROUP BY clause. You can connect different queries containing GROUP BY if you use UNION ALL. However, the more queries you have, the messier it gets. You can achieve the same result but with much neater queries by using GROUPING SETS. Let me show you how ROLLUP works. Suppose you’re working for a guitar store that has several locations. You’ll sometimes need to create a report showing the total number of guitars you have in stock. Here’s a query that will do that on a manufacturer, model, and store level: SELECT manufacturer, model, store, SUM(quantity) AS quantity_sum FROM guitars GROUP BY ROLLUP (manufacturer, model, store) ORDER BY manufacturer; This doesn’t look complicated. It’s a simple SELECT statement that will give you the columns manufacturer, model, and store from the table guitars. I’ve used the aggregate function SUM() to get the quantity. Then I wrote GROUP BY followed immediately by ROLLUP. The data will be grouped according to the columns in the parentheses. Finally, the result is ordered by the manufacturer. What will this query return? Have a look: manufacturermodelstorequantity_sum FenderJazzmasterAmsterdam9 FenderJazzmasterNew York32 FenderJazzmasterNULL41 FenderStratocasterAmsterdam102 FenderStratocasterNew York157 FenderStratocasterNULL259 FenderTelecasterAmsterdam80 FenderTelecasterNew York212 FenderTelecasterNULL292 FenderNULLNULL592 GibsonES-335Amsterdam4 GibsonES-335New York26 GibsonES-335NULL30 GibsonLes PaulAmsterdam21 GibsonLes PaulNew York42 GibsonLes PaulNULL63 GibsonSGAmsterdam32 GibsonSGNew York61 GibsonSGNULL93 GibsonNULLNULL186 NULLNULLNULL778 It should be easier to understand what I mean by different grouping levels. A little tip before I continue: Wherever you see a NULL value, this is a subtotal. Let’s have a look at the table. First, there are 9 Fender Jazzmasters in Amsterdam. Then there are 32 Fender Jazzmasters in New York. The total quantity is 41, which is what is shown in the row: manufacturermodelstorequantity_sum FenderJazzmasterNULL41 The NULL value means the data is grouped on a store level. This result reads “there are 41 Fender Jazzmasters in total, in both New York and Amsterdam”. The same calculation is done for every other Fender model, i.e. Stratocaster and Telecaster. Then there’s this row: manufacturermodelstorequantity_sum FenderNULLNULL592 What does it mean? It means there are in total 592 Fenders of all three models in both stores. The same principle is applied to Gibson. The quantity of guitars in Amsterdam and New York is first shown for the model. After this is done, there is a subtotal summing of the quantities from both stores. This is done for all three Gibson models: ES-335, Les Paul, and SG. Then there is a line showing the total number of all three Gibson guitar models in both stores (the same as with Fenders): manufacturermodelstorequantity_sum GibsonNULLNULL186 Finally, there’s a row showing the total number of guitars, no matter the store, guitar manufacturer, or model: manufacturermodelstorequantity_sum NULLNULLNULL778 I’m sure you now want to find out how CUBE and GROUPING SETS work. For that, I’d recommend having a look at the GROUP BY extensions course. These advanced topics are something data analysts will use very often. I’ve therefore prepared some SQL constructions for my fellow data analysts. If you’re into finance, here are some advanced SQL queries for financial analysis. Do You Consider Yourself an Advanced SQL User? How do you feel now? Did I raise your confidence? If you already know SQL window functions, CTEs, and the GROUP BY extensions, you can brag about your advanced SQL skills. If you're serious about proving your advanced SQL knowledge, try the SQL Skill Assessment at LearnSQL.com. It's a great way to validate your skills and show off your proficiency. Or maybe I did just the opposite? Perhaps I’ve shaken your confidence when you realized you don’t know anything about the advanced topics I’ve talked about in this article. Don’t worry! Whatever group you belong to, there are LearnSQL.com courses that will help you build your knowledge and your skills. Want to learn window functions? No problem – see our Window Functions course. Interested in CTEs? You can learn and practice them in our Recursive Queries course. Need to get more out of GROUP BY? Our GROUP BY Extensions in SQL course has you covered. All these courses are interactive, letting you learn by doing right in your browser. No need to install or set up anything – just jump in and start learning. Tags: Advanced SQL