Author: Dejan Sarka, SQL Server Database and BI Trainer, Consultant and Developer

High-Performance Statistical Queries: Dependencies Between Discrete Variables
Tags: data analysis SQL for advanced statistical queries statistics
In my previous article, we looked at how you can calculate linear dependencies between two continuous variables with covariance and correlation. Both methods use the means of the two variables in their calculations. However, mean values and other population moments make no sense for categorical (nominal) variables. For instance, if you denote "Clerical" as 1 and "Professional" as 2 for an occupation variable, what does an average of 1.5 signify?

High Performance Statistical Queries – Skewness and Kurtosis
Tags: kurtosis skewness SQL for advanced statistics
In descriptive statistics, the first four population moments describe the center, spread, skewness, and kurtosis (peakedness) of a distribution. In this article, I explain the third and fourth population moments, skewness and kurtosis, and how to calculate them. The mean uses values raised to the first power in its calculation; therefore, it is the first population moment. Standard deviation uses squared values and is therefore the second population moment.

SQL Statistical Analysis Part 3: Measuring Spread of Distribution
Tags: aggregate functions sql analytic functions
Besides knowing the centers of a distribution in your data, you need to know how varied the observations are. In this article, we’ll explain how to find the spread of a distribution in SQL. Are you dealing with a very uniform population or a widely spread one? To really understand what the numbers are saying, you must know the answer to this question. In the second part of this series, we discussed how to calculate centers of distribution.

SQL Statistical Analysis Part 2: Calculating Centers of Distribution
Tags: calculating statistical queries statistics
My previous article explained how to calculate frequencies using T-SQL queries. Frequencies are used to analyze the distribution of discrete variables. Today, we’ll continue learning about statistics and SQL. In particular, we’ll focus on calculating centers of distribution. We’ll learn, for example, how to calculate the SQL median, which functions to use to calculate the SQL mode, and how to calculate various types of mean in SQL (geometric mean, harmonic mean and, of course, arithmetic mean).

SQL Statistical Analysis Part 1: Calculating Frequencies and Histograms
Database and Business Intelligence (BI) developers create huge numbers of reports on a daily basis, and data analyses are an integral part of them. If you wonder whether you can perform statistical analysis in SQL, the answer is ‘yes’. Read my article to learn how to do this! Statistics are very useful as an initial stage of a more in-depth analysis, i.e. for data overview and data quality assessment. However, the possibilities for statistical analysis in SQL are somewhat limited, as SQL Server offers few statistical functions.