Back to articles list Articles Cookbook
12 minutes read

What Is Advanced SQL?

Are you confused about advanced SQL skills? What are they? This article will explain what advanced SQL can mean, especially as we use it on LearnSQL.com.

I’m sure you find the phrases ‘advanced SQL skills’ or ‘advanced SQL topics’ very often. You read one article about advanced SQL and you’re happy with how easy these advanced topics seem to be. Then you talk to someone and you see they consider everything you know as basic SQL knowledge. How do you define yourself? Do you consider yourself a basic, intermediate, or advanced SQL user?

Advanced SQL Is Everywhere

Advanced SQL is everywhere. Well, the word ‘advanced’ is, at least. It’s used very commonly by SQL learners and SQL users. You can find it in SQL course descriptions, in job ads, and in the job interview questions. It’s in the SQL literature. You hear it when colleagues are talking at work. It’s in numerous articles trying to define what advanced SQL is.

Actually, I’m not trying to give you a definition of advanced SQL. I’m trying to tell you something else: There’s no single definition of advanced SQL! And you should stop looking for it. What should you do instead?

Embrace the Inconsistency

That’s right! You should accept that the term ‘advanced SQL’ is used inconsistently. It means different things depending on the context and who’s using the term.

It’s only logical that advanced SQL would include one thing for someone who writes SQL reports and something entirely different for someone hiring a data analyst. A software developer will surely have yet another definition of what advanced SQL includes.

You get the picture. Advanced SQL can’t have just one definition. When you’re reading about advanced SQL skills, you should always consider the context, who’s doing the talking, and their audience.

What Might Advanced SQL Include?

For instance, there’s a really interesting discussion about advanced SQL on Stack Overflow.

The discussion was started by someone looking for an SQL job who noted that there are plenty of jobs requiring “advanced SQL knowledge”. The user is asking what is to be expected from this kind of job. What knowledge is considered advanced?

The first answer gives a rather long code snippet as a measure of advanced knowledge. Even though it’s pretty long, it’s not that complicated. According to this reply, advanced SQL covers selecting columns, aggregate functions like MIN() and MAX(), the CASE WHEN statement, JOINs, the WHERE clause, GROUP BY, declaring variables, and subqueries.

On the other hand, the following reply considers most of these topics to be basic or intermediate at best. This user believes advanced SQL topics include functions, stored procedures, hierarchical queries, triggers, indices, data modeling (normal forms, primary and foreign keys, table constraints), transactions, and much more. This is much closer to my definition of advanced SQL and what I was taught in SQL lectures. However, this was a program for database administrators; understandably, this knowledge is considered advanced. Some reporting specialists and data analysts may never need to use such things.

It’s interesting to note that sometimes JOINs are considered advanced while writing stored procedures is still regarded as basic knowledge. I can understand why because one user hints at the problem with JOINs. Even though they are generally considered basic knowledge, many SQL users learn much more advanced topics before really understanding JOINs. This is how the basics easily become advanced knowledge. It’s not unusual to find someone using flashy functions, triggers, and whatnot – without knowing how to write a simple JOIN.

What Is Advanced SQL at LearnSQL.com?

Before explaining what advanced SQL is, it’s essential to know what it isn't. When you look at our courses and articles, basic/intermediate SQL is anything in SQL-92. (Here’s the history and details of SQL standards if you’re interested in finding out more.) This includes:

  • All types of JOINs
  • Aggregate functions
  • GROUP BY
  • HAVING
  • Subqueries
  • Set operations (UNION, UNION ALL, INTERSECT, MINUS)

You must be familiar with these topics if you claim to know SQL. These are things you should understand before moving to more advanced topics.

Generally, we consider three topics as ‘advanced SQL’:

  • Window functions
  • Common Table Expressions (CTEs)
  • GROUP BY extensions (ROLLUP, CUBE, and GROUPING SETS)

Anybody wanting to learn (or practice) all three topics should check out our Advanced SQL track.  Of course, this is not the only advanced SQL course out there; we’ve already reviewed some excellent advanced SQL courses from other platforms. For now, let’s look at an example of each of these topics.

Window Functions

SQL window functions allow you to perform operations that are often required for creating reports, e.g. ranking data, calculating running totals and moving averages, finding the difference between rows, etc.  Not only that, but you can also divide data into windows, which enables you to perform operations on data subsets rather than the data as a whole. You can learn much more about this in our Window Functions course.

Let’s see an example. This code will show the difference in yearly numbers of cars sold, according to make (i.e. car brand):

SELECT	car_make,
		cars_sold,
		year,
		cars_sold - LAG(cars_sold) OVER (PARTITION BY car_make ORDER BY year) AS sales_diff
FROM cars_sale;

To get this information, you first have to select the columns you want in the result: car_make, cars_sold, year. To get the yearly difference, subtract the previous year’s sale from the current year’s sale: cars_sold - LAG(cars_sold) OVER (PARTITION BY car_make ORDER BY year) AS sales_diff. Here, cars_sold means the current year’s sales. The LAG() function allows you to fetch data from the previous row. The OVER clause signifies this is a window function. Then follows the PARTITION BY clause, which is used to define the window (data subset) we want to use. In this case, it’s the car_make; this means the function will calculate the sale difference only within a specific car make. When it runs into another car make, the function will reset and start calculating the sales difference all over again.

Finally, the operation is ordered by year ascending. Why is that? The LAG() function will get us the data from the previous row. So, if this operation is performed for every year in ascending order, the “previous year” will mean the previous row. That’s exactly what we need.

Take a look at the result:

car_makecars_soldyearsales_diff
Nissan459,6632015NULL
Nissan312,4532016-147,210
Nissan541,2232017228,770
Nissan452,8442018-88,379
Nissan584,2562019131,412
Renault1,342,5582015NULL
Renault17,251,456201615,908,898
Renault16,842,5522017-408,904
Renault1,425,8952018-15,416,657
Renault1,548,6982019122,803

Did you see the NULL value in the first row? That’s because 2015 is the first year; there’s no previous data that can be deducted from it. If you follow the results, you’ll see every row is the difference between the current row’s yearly sales and the previous row’s yearly sales. When you reach the row where Renault starts, there’s a NULL again. This is what window functions do; they work on data within a given window., I’ve defined the window according to the car_make, so the window function resets when we get a new value in this column. It’s only logical. Why would I deduct Renault sales from Nissan sales? I want to do that for every car make separately.

Common Table Expressions (CTEs)

CTEs will allow you to write complex queries without using subqueries keeping your code simple and straightforward. They give you the possibility to produce complex reports quickly and efficiently. They also enable you to make some calculations you wouldn’t be able to do otherwise.

What is a common table expression, you might ask? It’s a temporary result you can use in the SELECT statement. It works like a temporary table – you can join it with other tables, other CTEs, or with itself.

They can be helpful if you, for instance, have to report on time spent on a particular project. On one side, there’s a table containing data about the date when each employee worked on this project. There’s also the start time and end time. On the other side, there’s a table containing employee names. You have to produce a table showing every employee’s name and his or her average time spent on this project.

Here’s how the CTE can help you:

WITH time_worked AS (
	SELECT	employee_id,
			end_time - start_time AS time
FROM project_timesheet
)
SELECT	e.first_name,
		e.last_name,
		AVG (tw.time) AS avg_time_worked
FROM employee e 
LEFT JOIN time_worked tw 
ON e.id = tw.employee_id
GROUP BY e.first_name, e.last_name;

How does this CTE work? Every CTE opens with the WITH clause. Then you must name your CTE; in this case, it’s time_worked. Then you write a SELECT statement. Here, I’ll use the CTE to calculate how much time each employee worked every time they worked on the project. I need the CTE because I don’t have this information stated explicitly in the table; I only have the start_time and end_time. To calculate the average time worked, the first step is to get the time worked. That’s why this CTE deducts the start_time from the end_time and shows the result in the column time. The data is taken from the table project_timesheet.

Now that I’ve written the CTE, I can use it in the next SELECT statement. First, I’ll get the first name and the last name from the table employee. Then I’ll use the AVG() function on the column time from the CTE time_worked. To do that, I’ve used the LEFT JOIN – and I’ve used it exactly like I would with any other table. Finally, the data is grouped by the employees’ first and last names.

The result is a little table like this:

first_namelast_nameavg_time_worked
JanineRooney4:58:39
MikeWatson5:52:24
PeterMarcotti4:09:33
IngeOngeborg8:56:05

If CTEs have you interested, imagine what you’ll be able to do after finishing our Recursive Queries course. Oh, yeah – I didn’t mention that a CTE can be recursive, which means it references itself. By doing so, it returns the sub-result and repeats the process until it returns the final result. While CTEs can be non-recursive, there are no recursive queries that are non-CTE. If you want to learn recursive queries, knowing CTEs is a must.

GROUP BY Extensions

SQL’s GROUP BY extensions provide you with additional possibilities for grouping data. This, in return, can increase the complexity of your data analysis and the reports you create.

There are three GROUP BY extensions:

  • ROLLUP
  • CUBE
  • GROUPING SETS

Unlike regular GROUP BY, ROLLUP lets you group the data into multiple data sets and aggregate results on different levels. Fancy talk, but simply put: you can use ROLLUP to calculate totals and subtotals, just like in Excel pivot tables.

The CUBE extension is similar, but there’s one crucial difference. CUBE will generate subtotals for every combination of the columns specified.

Finally, there are GROUPING SETs. A grouping set is a set of columns you use in the GROUP BY clause. You can connect different queries containing GROUP BY if you use UNION ALL. However, the more queries you have, the messier it gets. You can achieve the same result but with much neater queries by using GROUPING SETS.

Let me show you how ROLLUP works. Suppose you’re working for a guitar store that has several locations. You’ll sometimes need to create a report showing the total number of guitars you have in stock. Here’s a query that will do that on a manufacturer, model, and store level:

SELECT	manufacturer,
		model,
		store,
		SUM(quantity) AS quantity_sum
FROM guitars
GROUP BY ROLLUP (manufacturer, model, store)
ORDER BY manufacturer;

This doesn’t look complicated. It’s a simple SELECT statement that will give you the columns manufacturer, model, and store from the table guitars. I’ve used the aggregate function SUM() to get the quantity. Then I wrote GROUP BY followed immediately by ROLLUP. The data will be grouped according to the columns in the parentheses. Finally, the result is ordered by the manufacturer.

What will this query return? Have a look:

manufacturermodelstorequantity_sum
FenderJazzmasterAmsterdam9
FenderJazzmasterNew York32
FenderJazzmasterNULL41
FenderStratocasterAmsterdam102
FenderStratocasterNew York157
FenderStratocasterNULL259
FenderTelecasterAmsterdam80
FenderTelecasterNew York212
FenderTelecasterNULL292
FenderNULLNULL592
GibsonES-335Amsterdam4
GibsonES-335New York26
GibsonES-335NULL30
GibsonLes PaulAmsterdam21
GibsonLes PaulNew York42
GibsonLes PaulNULL63
GibsonSGAmsterdam32
GibsonSGNew York61
GibsonSGNULL93
GibsonNULLNULL186
NULLNULLNULL778

It should be easier to understand what I mean by different grouping levels. A little tip before I continue: Wherever you see a NULL value, this is a subtotal. Let’s have a look at the table. First, there are 9 Fender Jazzmasters in Amsterdam. Then there are 32 Fender Jazzmasters in New York. The total quantity is 41, which is what is shown in the row:

manufacturermodelstorequantity_sum
FenderJazzmasterNULL41

The NULL value means the data is grouped on a store level. This result reads “there are 41 Fender Jazzmasters in total, in both New York and Amsterdam”. The same calculation is done for every other Fender model, i.e. Stratocaster and Telecaster. Then there’s this row:

manufacturermodelstorequantity_sum
FenderNULLNULL592

What does it mean? It means there are in total 592 Fenders of all three models in both stores.

The same principle is applied to Gibson. The quantity of guitars in Amsterdam and New York is first shown for the model. After this is done, there is a subtotal summing of the quantities from both stores. This is done for all three Gibson models: ES-335, Les Paul, and SG. Then there is a line showing the total number of all three Gibson guitar models in both stores (the same as with Fenders):

manufacturermodelstorequantity_sum
GibsonNULLNULL186

Finally, there’s a row showing the total number of guitars, no matter the store, guitar manufacturer, or model:

manufacturermodelstorequantity_sum
NULLNULLNULL778

I’m sure you now want to find out how CUBE and GROUPING SETS work. For that, I’d recommend having a look at the GROUP BY extensions course.

These advanced topics are something data analysts will use very often. I’ve therefore prepared some SQL constructions for my fellow data analysts. If you’re into finance, here are some advanced SQL queries for financial analysis.

Do You Consider Yourself an Advanced SQL User?

How do you feel now? Did I raise your confidence? If you already know SQL window functions, CTEs, and the GROUP BY extensions, you can brag about your advanced SQL skills.

Or maybe I did just the opposite? Perhaps I’ve shaken your confidence when you realized you don’t know anything about the advanced topics I’ve talked about in this article.

Don’t worry! Whatever group you belong to, there are LearnSQL.com courses that will help you build your knowledge and your skills. Want to learn window functions? No problem – see our Window Functions course. Interested in CTEs? You can learn and practice them in our Recursive Queries course. Need to get more out of GROUP BY? Our GROUP BY Extensions in SQL course has you covered.