Back to articles list November 29, 2017 - 6 minutes read Digging Into Data: Explore and Analyze Survey Results With SQL Kacper Skawina Tags: data analysis how to in sql reporting with SQL SQL basics Excel is a powerful beast that lets you analyze complex data. Yet, operating on big chunks of data can sometimes be a daunting task. Let's take a look at how SQL can help. Today, we'll tackle a common problem with importing data to an SQL database, using a real-life example. Suppose your company conducted a survey on the most popular programming trends and preferences, striving to meet the expectations of its users. Your user base was overwhelmingly responsive, as the questionnaire was completed by more than 15,000 people! Unfortunately, someone has to analyze those results—someone like you. So, let's get right to the task! Among many others, users had to answer the following questions, with possible answers listed in parentheses: How old are you? (17, 19, 25, 42, 33) What is your gender? (M, F, N–prefer not to answer, O–other) What is your current employment status? (E–employed; U–unemployed) What programming language would you like to learn? (Python, Java, SQL, JavaScript, C, C#, etc.) What other topics would you like to learn? (Data science, Statistics, Data visualization, etc.) Do you have any experience in programming? (Yes, No) What is your preferred method of learning? (Course, Video, Articles, Webinar, Mix) We've stored our data in an Excel file called survey.xlsx. Here's what our table looks like: Idagegenderemp_statuspreferred_languageother_topic experiencepref_learn_way 1 17 M E Python Data scienceyes Video 2 21 F E SQL Statistics yes Course 3 24 F E Python Statistics no Course 4 38 N E Python Excel no Mix Of course, this is only a small sample of data from a file that has around 15,000 rows containing the results of our survey. If you already have some experience with SQL, you may know that you can insert existing data into a table with the INSERT INTO command: INSERT INTO survey VALUES('id', 'age', 'gender' ...); If you have no idea how to insert data into a table using SQL, I recommend you take our free Operating on data in SQL course before reading any further. The INSERT INTO command can come in handy if you need to manually create a few rows of data. However, it would be tedious to fill in a table with 100,000 rows of data this way unless you were using a generated script. Fortunately, many database systems already provide a means of importing data from a spreadsheet into a database. Save XLSX File as a CSV To make things work, we need to first prepare an .xlsx file and then change its file extension from .xlsx (or any other common spreadsheet extension) to .csv, which is short for comma-separated values (CSV). This file format is used to store tabular data in plaintext. In most spreadsheets, all you have to do is click Save As and choose the .csv extension. You may be asked to specify a field delimiter value when you do this—you can use any character you like, but it should strictly be a character that is not used in the spreadsheet itself. Create a Table Now that our file is ready, we can begin preparing the table that will store all our survey data. We'll create the table with the same column names as in our CSV file to avoid confusion. CREATE TABLE survey ( id int, age int, gender char(1), emp_status char(1), preferred_language varchar(16), other_topic varchar(32), experience boolean, pref_learn_way varchar(16) ); Regardless of their names, the columns must contain the correct data types and be defined in the order in which they appear in the CSV file. As you've already seen, each column of our CSV file stores numbers, single characters, or strings. In our case, we've opted to use the following data types: int for numbers char(n) for single-character responses, where 'n' signifies the exact number of characters that are to be stored in a given column's entry varchar(n) for strings, where the 'n' signifies the maximum number of characters that may be stored in a given column's entry. boolean for yes/no responses. Remember, the order of the columns in our table must match the order of the columns in the CSV file. Fill the Table With Data Finally, we can import data from our CSV into our database. We'll use the COPY command to do that, like so: COPY survey FROM '/home/user/pathToFile/survey.csv' CSV HEADER DELIMITER ','; What exactly does this query do? Basically, we tell our database to copy data from the given path (note the quotation marks) to the survey table. We then specified the following parameters: CSV: informs the database that the file we would like to copy from is a csv. HEADER: we indicated that the the first row of our file is a header and should be omitted. DELIMITER ',': informs our database that the delimiter should be set to a comma. However, keep in mind that the order in which we list these parameters after the path does not matter. We could've just as well written DELIMITER ',' HEADER CSV, and everything would have worked fine. You also have to remember that the COPY command appends data to the table. Every time you use it, the database adds the data to anything that already exists in a given table and does not replace anything. The NULL Parameter Sometimes, the dataset we'd like to import is incomplete, meaning responses are missing from some of the cells. By default, the COPY command assumes that nulls are represented as empty, unquoted strings in a CSV file—in other words, as just an empty field. Obviously, this is not always the case. Someone could leave the field empty, but another person may write "NO DATA" instead to indicate that a response was not provided. We can handle this by using NULL AS 'something' to inform our database what should be interpreted as a null value: COPY survey FROM '/home/user/pathToFile/employees.csv' CSV HEADER DELIMITER ',' NULL AS 'NO DATA'; In the above example, all CSV fields that have 'NO DATA' written inside them will be converted to nulls. Analyze Your Data Awesome! Our survey responses are now in the database! The only thing that remains is analyzing the information we've gathered. We can query almost anything we'd like. It all just depends on what you'd like to know. The following example is a simple demonstration of some SQL queries you could execute: WITH no_experience_python AS ( SELECT count(*) AS no_experience_python_users FROM survey WHERE experience IS FALSE AND preferred_language = 'Python' AND other_topic = 'Data Science' ), experienced_python AS ( SELECT count(*) AS experienced_python_users FROM survey WHERE experience IS TRUE AND preferred_language = 'Python' AND other_topic = 'Data Science' ) SELECT * FROM no_experience_python, experienced_python In the above query, we compared the number of users with no prior experience who want to learn Python and are interested in Data Science to users who have no prior experience, want to learn Python, and are also interested in Data Science. It's easy, readable, and simple! Conclusion SQL's quite a powerful and useful language to learn, especially if you plan to analyze data in your line of work. If you'd like to learn about data types, creating tables, and much more, check out our Creating Tables in SQL course. Or, if you're just getting started with SQL, you can learn the core principles of the language by signing up for our SQL Basics course. Tags: data analysis how to in sql reporting with SQL SQL basics You may also like SQL Statistical Analysis Part 1: Calculating Frequencies and Histograms If you wonder whether you can perform statistical analysis in SQL, the answer is ‘yes’. Read my article to learn how to do this! Read more SQL Statistical Analysis Part 2: Calculating Centers of Distribution In this article, we’ll focus on calculating centers of distribution. We’ll learn how to calculate the SQL median, SQL mode, and various types of mean. Read more SQL Statistical Analysis Part 3: Measuring Spread of Distribution In this article, we’ll explain how to find the spread of a distribution in SQL to take your SQL statistical analysis skills to the next level! Read more You Want to Learn SQL? You've Come to the Right Place! If you want to learn SQL basics or enhance your SQL skills, check out LearnSQL.com for a wide range of SQL courses and tracks. Read more How SQL supports data-driven organization Businesses face many decisions. Most of them are driven by intuition, but organizations that make the most business impact use data-driven decision-making. Read more How to Begin Running SQL Queries What do you need to run SQL queries? Jump right in to find out! We'll assist you in writing and executing your first SQL query. Read more How to Solve Capitalization Data Quality Issues Misspelled names, typos, and text data quality issues in your database? No problem! We can fix it with SQL string functions. Read more 15 Best SQL Articles for Beginners Published in 2017 If you're looking for articles on SQL for beginners, take a look at my comprehensive list of best SQL articles from 2017! Read more Introduction to Reporting with SQL — the Ultimate Tutorial for Business Professionals The right business decision-making is not based on intuition but data analysis. Learn how to perform simple SQL queries for data reporting. Read more Is SQL Worth Learning? Is learning SQL something to consider? Or is it yet another useless skill that employers don’t value? Read more SQL Window Functions Cheat Sheet This 2-page SQL Window Functions Cheat Sheet covers the syntax of window functions and a list of window functions. Download it in PDF or PNG format. Read more High Performance Statistical Queries –Skewness and Kurtosis In this article, I am explaining the third and fourth population moments, the skewness and the kurtosis, and how to calculate them. Read more High-Performance Statistical Queries: Dependencies Between Discrete Variables Contingency tables and the chi-squared test—which method is better to test dependencies that do not rely on numeric values? Read more Subscribe to our newsletter Join our weekly newsletter to be notified about the latest posts.