SQL for Data Analysis Cheat Sheet
Welcome to our SQL for Data Analysis Cheat Sheet — a must-have resource for anyone looking to harness the power of SQL in the realm of data analysis. Whether you're a data enthusiast, a business analyst, or a seasoned data professional, this cheat sheet is designed to empower you to extract valuable insights from your datasets.
Inside, you'll find a carefully curated collection of SQL commands and tips that will elevate your data analysis game. We've distilled the most frequently employed SQL patterns in data analysis into concise references enabling you to quickly find the essential hints and syntax reminders you need. Whether you're modifying data, performing calculations, or crafting insightful reports, our cheat sheet has got you covered.
Download the cheat sheet in your preferred format:
- A4 Format - Best for printing on A4-sized paper.
- Letter Format - Ideal for printing on standard Letter-sized paper.
- Mobile Friendly Format - Optimized for viewing on mobile devices.
For visual enthusiasts:
If you prefer a more visual approach or need a digital-friendly version for various applications, we also offer the cheat sheet as a high-resolution PNG image. To download, right-click (for desktop users) or long tap (for mobile users) on the image.
SQL
SQL, or Structured Query Language, is a language for talking to databases. It lets you select specific data and build complex reports. Today, SQL is a universal language of data, used in practically all technologies that process data.
SELECT
Fetch the id
and name
columns from the product
table:
SELECT id, nameFROM product;
Concatenate the name and the description to fetch the full description of the products:
SELECT name || ' - ' || descriptionFROM product;
Fetch names of products with prices above 15:
SELECT nameFROM productWHERE price > 15;
Fetch names of products with prices between 50 and 150:
SELECT nameFROM productWHERE priceBETWEEN 50AND 150;
Fetch names of products that are not watches:
SELECT nameFROM productWHERE name != 'watch';
Fetch names of products that start with a 'P'
or end with an 's'
:
SELECT nameFROM productWHERE nameLIKE 'P%'OR nameLIKE '%s';
Fetch names of products that start with any letter followed by 'rain'
(like 'train'
or 'grain'
):
SELECT nameFROM productWHERE nameLIKE '_rain';
Fetch names of products with non-null prices:
SELECT nameFROM productWHERE priceIS NOT NULL ;
GROUP BY

Aggregate functions
Count the number of products:
SELECT COUNT( *) FROM product;
Count the number of products with non-null prices:
SELECT COUNT( price) FROM product;
Count the number of unique category
values:
SELECT COUNT( DISTINCT category) FROM product;
Get the lowest and the highest product price:
SELECT MIN( price) ,MAX( price) FROM product;
Find the total price of products for each category:
SELECT category,SUM( price) FROM productGROUP BY category;
Find the average price of products for each category whose average is above 3.0:
SELECT category,AVG( price) FROM productGROUP BY categoryHAVING AVG( price) > 3.0;
ORDER BY
Fetch product names sorted by the price
column in the default ASCending order:
SELECT nameFROM productORDER BY price[ ASC ] ;
Fetch product names sorted by the price
column in DESCending order:
SELECT nameFROM productORDER BY priceDESC ;
Computations
Use +
, -
, *
, /
to do basic math. To get the number of seconds in a week:
SELECT 60 * 60 * 24 * 7;-- result: 604800
Rounding numbers
Round a number to its nearest integer:
SELECT ROUND(1234.56789);-- result: 1235
Round a number to two decimal places:
SELECT ROUND(AVG(price), 2)FROM productWHERE category_id = 21;-- result: 124.56
Troubleshooting
Integer division
In PostgreSQL and SQL Server, the /
operator performs integer division for integer arguments. If you do not see the number of decimal places you expect, it is because you are dividing between two integers. Cast one to decimal:
123 / 2-- result: 61 CAST( 123 AS decimal) / 2-- result: 61.5
Division by 0
To avoid this error, make sure the denominator is not 0
. You may use the NULLIF()
function to replace 0
with a NULL
, which results in a NULL
for the entire expression:
count /NULLIF( count_all, 0)
JOIN
JOIN is used to fetch data from multiple tables. To get the names of products purchased in each order, use:
SELECT orders.order_date, product.nameAS product, amountFROM ordersJOIN productON product.id = orders.product_id;
INSERT, UPDATE, DELETE
INSERT
To insert data into a table, use the INSERT
command:
INSERT INTO categoryVALUES (1, 'Home and Kitchen'), (2, 'Clothing and Apparel');
You may specify the columns to which the data is added. The remaining columns are filled with predefined default values or NULL
s.
INSERT INTO category (name)VALUES ('Electronics');
UPDATE
To update the data in a table, use the UPDATE
command:
UPDATE categorySET is_active = true, name = 'Office'WHERE name = 'Ofice';
DELETE
To delete data from a table, use the DELETE
command:
DELETE FROM categoryWHERE nameIS NULL ;
Date and time
There are 3 main time-related types:
,
, and
. Time is expressed using a 24-hour clock, and it can be as vague as just hour and minutes (e.g., 15:30
– 3:30 p.m.) or as precise as microseconds and time zone (as shown below):

14:39:53.662522-05
is almost 2:40 p.m. CDT (e.g., in Chicago; in UTC it'd be 7:40 p.m.). The letters in the above example represent:
YYYY
– the 4-digit year.mm
– the zero-padded month (01
—January through 12
—December).dd
– the zero-padded day.HH
– the zero-padded hour in a 24-hour clock.MM
– the minutes.SS
– the seconds. Omissible.ssssss
– the smaller parts of a second – they can be expressed using 1 to 6 digits. Omissible.±TZ
– the timezone. It must start with either +
or -
, and use two digits relative to UTC. Omissible.Current date and time
Find out what time it is:
SELECT CURRENT_TIME ;
Get today's date:
SELECT CURRENT_DATE ;
In SQL Server:
SELECT GETDATE() ;
Get the timestamp with the current date and time:
SELECT CURRENT_TIMESTAMP ;
Creating date and time values
To create a date, time, or timestamp, write the value as a string and cast it to the proper type.
SELECT CAST( '2021-12-31'AS date) ;SELECT CAST( '15:31'AS time) ;SELECT CAST( '2021-12-31 23:59:29+02'AS timestamp) ;SELECT CAST( '15:31.124769'AS time) ;
Be careful with the last example – it is interpreted as 15 minutes 31 seconds and 124769 microseconds! It is always a good idea to write 00 for hours explicitly: '00:15:31.124769'
.
Sorting chronologically
Using ORDER BY
on date and time columns sorts rows chronologically from the oldest to the most recent:
SELECT order_date, product, quantityFROM salesORDER BY order_date;
order_date | product | quantity |
---|---|---|
2023-07-22 | Laptop | 2 |
2023-07-23 | Mouse | 3 |
2023-07-24 | Sneakers | 10 |
2023-07-24 | Jeans | 3 |
2023-07-25 | Mixer | 2 |
Use the DESCending order to sort from the most recent to the oldest:
SELECT order_date, product, quantityFROM salesORDER BY order_dateDESC ;
Comparing date and time values
You may use the comparison operators <
, <=
, >
, >=
, and =
to compare date and time values. Earlier dates are less than later ones. For example, 2023-07-05
is "less" than 2023-08-05
.
Find sales made in July 2023:
SELECT order_date, product_name, quantityFROM salesWHERE order_date >= '2023-07-01'AND order_date < '2023-08-01';
Find customers who registered in July 2023:
SELECT registration_timestamp, emailFROM customerWHERE registration_timestamp >= '2023-07-01'AND registration_timestamp < '2023-08-01';
Note: Pay attention to the end date in the query. The upper bound '2023-08-01'
is not included in the range. The timestamp '2023-08-01'
is actually the timestamp '2023-08-01 00:00:00.0'
. The comparison operator <
is used to ensure the selection is made for all timestamps less than '2023-08-01 00:00:00.0'
, that is, all timestamps in July 2023, even those close to the midnight of August 1, 2023.
Intervals
An interval measures the difference between two points in time. For example, the interval between 2023-07-04
and 2023-07-06
is 2 days.
To define an interval in SQL, use this syntax:
INTERVAL '1'DAY
The syntax consists of three elements: the INTERVAL
keyword, a quoted value, and a time part keyword. You may use the following time parts: YEAR
, MONTH
, DAY
, HOUR
, MINUTE
, and SECOND
.
Adding intervals to date and time values
You may use +
or -
to add or subtract an interval to date or timestamp values.
Subtract one year from 2023-07-05
:
SELECT CAST( '2023-07-05'AS TIMESTAMP) -INTERVAL '1'year ;-- result: 2022-07-05 00:00:00
Find customers who placed the first order within a month from the registration date:
SELECT idFROM customersWHERE first_order_date > registration_date +INTERVAL '1'month ;
Filtering events to those in the last 7 days
To find the deliveries scheduled for the last 7 days, use:
SELECT delivery_date, addressFROM salesWHERE delivery_date <=CURRENT_DATE AND delivery_date >=CURRENT_DATE -INTERVAL '7'DAY ;
Note: In SQL Server, intervals are not implemented – use the DATEADD()
and DATEDIFF()
functions.
Filtering events to those in the last 7 days in SQL Server
To find the sales made within the last 7 days, use:
SELECT delivery_date, addressFROM salesWHERE delivery_date <=GETDATE() AND delivery_date >=DATEADD(DAY , -7,GETDATE() ) ;
Extracting parts of dates
The standard SQL syntax to get a part of a date is
SELECT EXTRACT( YEARFROM order_date) FROM sales;
You may extract the following fields:YEAR
, MONTH
, DAY
, HOUR
, MINUTE
, and SECOND
.
The standard syntax does not work In SQL Server. Use the DATEPART(part, date)
function instead.
SELECT DATEPART( YEAR, order_date) FROM sales;
Grouping by year and month
Find the count of sales by month:
SELECT EXTRACT( YEARFROM order_date) AS year,EXTRACT( MONTHFROM order_date) AS month,COUNT( *) AS countFROM salesGROUP BY year, monthORDER BY year month;
year | month | count |
---|---|---|
2022 | 8 | 51 |
2022 | 9 | 58 |
2022 | 10 | 62 |
2022 | 11 | 76 |
2022 | 12 | 85 |
2023 | 1 | 71 |
2023 | 2 | 69 |
Note that you must group by both the year and the month. EXTRACT(MONTH FROM order_date)
only extracts the month number (1, 2, ..., 12). To distinguish between months from different years, you must also group by year.
CASE WHEN
CASE WHEN
lets you pass conditions (as in the WHERE
clause), evaluates them in order, then returns the value for the first condition met.
SELECT name,CASE WHEN price > 150THEN 'Premium'WHEN price > 100THEN 'Mid-range'ELSE 'Standard'END AS price_categoryFROM product;
Here, all products with prices above 150 get the Premium label, those with prices above 100 (and below 150) get the Mid-range label, and the rest receives the Standard label.
CASE WHEN and GROUP BY
You may combine CASE WHEN
and GROUP BY
to compute object statistics in the categories you define.
SELECT CASE WHEN price > 150THEN 'Premium'WHEN price > 100THEN 'Mid-range'ELSE 'Standard'END AS price_category,COUNT( *) AS productsFROM productGROUP BY price_category;
Count the number of large orders for each customer using CASE WHEN
and SUM()
:
SELECT customer_id,SUM( CASE WHEN quantity > 10THEN 1ELSE 0END ) AS large_ordersFROM salesGROUP BY customer_id;
... or using CASE WHEN
and COUNT()
:
SELECT customer_id,COUNT( CASE WHEN quantity > 10THEN order_idEND ) AS large_ordersFROM salesGROUP BY customer_id;
GROUP BY extensions (GROUPING SETS, ROLLUP, CUBE)
GROUPING SETS
GROUPING SETS
lets you specify multiple sets of columns to group by in one query.
SELECT region, product,COUNT( order_id) FROM salesGROUP BY GROUPING SETS ((region, product) ,() );

CUBE
CUBE
generates groupings for all possible subsets of the GROUP BY
columns.
SELECT region, product,COUNT( order_id) FROM salesGROUP BY CUBE (region, product);

ROLLUP
ROLLUP
adds new levels of grouping for subtotals and grand totals.
SELECT region, product,COUNT( order_id) FROM salesGROUP BY ROLLUP (region, product);

COALESCE
COALESCE
replaces the first NULL
argument with a given value. It is often used to display labels with GROUP BY
extensions.
SELECT region,COALESCE( product, 'All') ,COUNT( order_id) FROM salesGROUP BY ROLLUP (region, product);
region | product | count |
---|---|---|
USA | Laptop | 10 |
USA | Mouse | 5 |
USA | All | 15 |
UK | Laptop | 6 |
UK | All | 6 |
All | All | 21 |
Common Table Expressions
A common table expression (CTE) is a named temporary result set that can be referenced within a larger query. They are especially useful for complex aggregations and for breaking down large queries into more manageable parts.
WITH total_product_salesAS (SELECT product,SUM( profit) AS total_profitFROM salesGROUP BY product )SELECT AVG( total_profit) FROM total_product_sales;
Window functions
Window functions compute their results based on a sliding window frame, a set of rows related to the current row. Unlike aggregate functions, window functions do not collapse rows.
Computing the percent of total within a group
SELECT product, brand, profit, (100.0 * profit /SUM( profit) OVER( PARTITION BY brand) )AS percFROM sales;
product | brand | profit | perc |
---|---|---|---|
Knife | Culina | 1000 | 25 |
Pot | Culina | 3000 | 75 |
Doll | Toyze | 2000 | 40 |
Car | Toyze | 3000 | 60 |
Ranking
Rank products by price:
SELECT RANK() OVER( ORDER BY price) , nameFROM product;
Ranking functions
RANK
– gives the same rank for tied values, leaves gaps.DENSE_RANK
– gives the same rank for tied values without gaps.ROW_NUMBER
– gives consecutive numbers without gaps.
name | rank | dense_rank | row_number |
---|---|---|---|
Jeans | 1 | 1 | 1 |
Leggings | 2 | 2 | 2 |
Leggings | 2 | 2 | 3 |
Sneakers | 4 | 3 | 4 |
Sneakers | 4 | 3 | 5 |
Sneakers | 4 | 3 | 6 |
T-Shirt | 7 | 4 | 7 |
Running total
A running total is the cumulative sum of a given value and all preceding values in a column.
SELECT date, amount,SUM( amount) OVER( ORDER BY date) AS running_totalFROM sales;
Moving average
A moving average (a.k.a. rolling average, running average) is a technique for analyzing trends in time series data. It is the average of the current value and a specified number of preceding values.
SELECT date, price,AVG( price) OVER( ORDER BY dateROWS BETWEEN 2PRECEDING AND CURRENT ROW ) AS moving_avergeFROM stock_prices;
Difference between two rows (delta)
SELECT year, revenue,LAG( revenue) OVER( ORDER BY year) AS revenue_prev_year, revenue -LAG( revenue) OVER( ORDER BY year) AS yoy_differenceFROM yearly_metrics;