8th Aug 2023 7 minutes read What Does GROUP BY 1 and GROUP BY 1, 2, 3 Mean in SQL? Dominika Florczykowska GROUP BY Table of Contents Let’s Start with an Example! What About GROUP BY 1, 2, 3? Caution: Do Not Use The GROUP BY 1 Syntax Eager to Learn More About GROUP BY? Have you ever come across the unfamiliar GROUP BY 1 syntax in SQL queries and wondered what it's all about? Well, you're not alone! In this article, we'll dive into this syntax, explain its purpose, and make it easier to understand. Whether you are a regular SQL user or you’re just beginning to learn this language, you have probably worked with GROUP BY before. The GROUP BY clause in SQL is used to group rows from a table based on one or more columns. It allows you to aggregate data and perform calculations on subsets of the data. Usually, the GROUP BY clause is followed by the name of the column on which you want to group the data. But did you know that you can also use syntax like GROUP BY 1 in SQL? The syntax of such a query is as follows: SELECT column_name(s) FROM table_name GROUP BY 1; You might have encountered this syntax in some SQL queries and found yourself wondering what it means. By the end of this article, you will know all about it and understand how it works. The best way to learn SQL is through practice. If you’re looking for a range of practice exercises, we recommend our SQL Practice path, where you can learn through hands-on exercises. By utilizing the code editor and working with real datasets, you'll gain valuable skills in solving real-world problems. This path contains 9 courses and over 1,100 hands-on practical exercises. Let’s Start with an Example! Imagine that you have a table called employee with the following columns: id – The ID of the employee, which is also the primary key (PK) of the table. name – The name of the employee. branch – The location of the company branch where the employee works. department – The department where the employee works. job_title – The employee’s job title. salary – The employee’s annual salary in USD. Here you can see a handful of rows from the table: id name branch department job_title salary 1 John Doe Toronto Marketing Marketing Specialist 50000 2 Jane Smith New York Marketing Marketing Specialist 60000 3 David Johnson Toronto Finance Financial Analyst 55000 4 Sarah Lee Seattle IT Software Engineer 70000 5 Michael Brown New York HR HR Coordinator 45000 6 Emily Davis Toronto Operations Operations Manager 65000 7 Robert Wilson Seattle Customer Service Customer Support Representative 40000 8 Jennifer Anderson New York IT Software Engineer 75000 9 Christopher Thompson Toronto Operations Supply Chain Analyst 52000 10 Amy Roberts Toronto Marketing Marketing Coordinator 48000 What happens if we run the following query? SELECT department, ROUND(AVG(salary)) as avg_salary FROM employee GROUP BY 1; By specifying GROUP BY 1, we're instructing the database to group the results based on the first column in the SELECT list, which is department in this case. This means that the above query is equivalent to this one: SELECT department, ROUND(AVG(salary)) as avg_salary FROM employee GROUP BY department; The first query is a bit shorter, but you might have noticed that it's less readable. In the second query, the reader immediately knows which column is used for grouping and there is less room for error. This is the result of our query: department avg_salary Marketing 52667 Operations 58500 Finance 55000 IT 72500 HR 45000 Customer Service 40000 If the order of columns selected was different, you would have to use a different column number: SELECT ROUND(AVG(salary)) as avg_salary, department FROM employee GROUP BY 2; As you can see, using this syntax can be a bit confusing. What About GROUP BY 1, 2, 3? In our previous example, we grouped the data by department. Let’s say that now we would like to group it not only by department but also by branch and job title. To do that, we can use the following query: SELECT department, branch, job_title, ROUND(AVG(salary)) as avg_salary FROM employee GROUP BY 1, 2, 3; When using GROUP BY 1, 2, 3, we're instructing the database to group the results based on the first, second, and third columns in the SELECT list. This is equivalent to running the following query: SELECT department, branch, job_title, ROUND(AVG(salary)) as avg_salary FROM employee GROUP BY department, branch, job_title; Note that the column numbers in the GROUP BY clause do not have to be consecutive. You can also write a query like this: SELECT branch, department, job_title, ROUND(AVG(salary)) as avg_salary FROM employee GROUP BY 2, 1, 3; Now, we're instructing the database to group the results based on the second, first, and third column in the SELECT list. You can also mix column numbers and names in a single GROUP BY clause: SELECT branch, department, job_title, ROUND(AVG(salary)) as avg_salary FROM employee GROUP BY department, 1, 3; This time, we're grouping the results based on the department column and the first and third column in the SELECT list. All of the above queries return the same result: department branch job_title avg_salary IT New York Software Engineer 75000 IT Seattle Software Engineer 70000 Finance Toronto Financial Analyst 55000 Marketing Toronto Marketing Coordinator 48000 Marketing New York Marketing Specialist 60000 Marketing Toronto Marketing Specialist 50000 Customer Service Seattle Customer Support Representative 40000 Operations Toronto Operations Manager 65000 HR New York HR Coordinator 45000 Caution: Do Not Use The GROUP BY 1 Syntax Using the GROUP BY 1 and GROUP BY 1, 2, 3 syntax in SQL is generally considered a bad practice due to readability and maintainability concerns. While it may provide a shortcut for typing queries, it can also lead to unintended consequences. The main issue with this syntax is that it relies on the position of columns in the SELECT list rather than explicitly specifying the column names. If the order of columns changes or new columns are added, the query results may be unexpectedly affected. For example, let's say that you used the following query to get the average annual salary grouped by department, branch, and job title: SELECT ROUND(AVG(salary)) as avg_salary, department, branch, job_title FROM employee GROUP BY 2, 3, 4; Now, imagine that we added a new column called bonus to our employee table. This column represents the amount of the annual bonus the employee received at the end of the year. We'd like to include the average bonus in our previous query, so we modify it a bit: SELECT ROUND(AVG(salary)) as avg_salary, ROUND(AVG(bonus)) as avg_bonus, department, branch, job_title FROM employee GROUP BY 2, 3, 4; But after running, this query… it fails! This is because we tried to group our rows by the avg_bonus, department, and branch columns instead of department, branch, and job_title. The correct query would look like this: SELECT ROUND(AVG(salary)) as avg_salary, ROUND(AVG(bonus)) as avg_bonus, department, branch, job_title FROM employee GROUP BY 3, 4, 5; However, we can easily avoid making this mistake by using a GROUP BY clause with full column names: SELECT ROUND(AVG(salary)) as avg_salary, ROUND(AVG(bonus)) as avg_bonus, department, branch, job_title FROM employee GROUP BY department, branch, job_title; Now – even if we decide to modify or rearrange columns in the SELECT statement – we don't need to worry about the GROUP BY clause. As you can see, it is far better to explicitly mention the column names in the GROUP BY clause: it makes your query more readable and less error-prone. The GROUP BY 1 syntax should only be used when you’re exploring data on your own, as it speeds up typing. If you mean to reuse the query in any way, use the full syntax instead. Eager to Learn More About GROUP BY? That’s all for today! If you’d like to learn more about the GROUP BY clause, make sure to check out these articles on using GROUP BY in SQL and grouping by multiple columns. Additionally, if you’re preparing for an interview, don’t miss our Top 9 SQL GROUP BY Interview Questions guide. And for more hands-on practice, remember to check out our SQL Practice track! Tags: GROUP BY