# Window Functions: Part Two: Aggregating Data

In part one of this series on SQL Server window functions, the functionality of the OVER() clause was discussed. This article will focus on aggregate functions. The majority of these can be used in conjunction with OVER. Some, like SUM() and AVG(), are widely used. Others — including VARP() and CHECKSUM_AGG() — are not as well-known, but are potentially quite useful.

OVER defines the user-specified range on which a function is applied. This can be combined with many SQL functions, allowing easy separation of data into defined windows within a single SQL query.

In the examples below, we will use the same base data as in the first article. The attributes *PhysicalSkill* and *MentalSkill* will be aggregated, often based on the *House* group.

CREATE TABLE WindowTable ( WindowID INT IDENTITY, House Varchar(32), FullName Varchar(64), PhysicalSkill Decimal(4, 2), MentalSkill Decimal (4, 2) ) GO

INSERT WindowTable (House, FullName, PhysicalSkill, MentalSkill) VALUES ('Stark', 'Robb Stark', 6, 6), ('Stark', 'Ned Stark', 8, 7), ('Stark', 'Bran Stark', 2, 9), ('Stark', 'Arya Stark', 4, 6), ('Lannister', 'Jamie Lannister', 7, 5), ('Lannister', 'Tyrion Lannister', 3, 10), ('Lannister', 'Tywin Lannister', 3, 8) GO

## The Commoners: SUM, MIN, MAX, COUNT, and AVG

The aggregate functions SUM(), MIN(), MAX(), COUNT(), and AVG() are the most commonly seen in SQL. Let's look at each one in some detail.

### SUM

The most basic of aggregate functions, SUM operates only on numeric data. NULL values are automatically ignored. In Part 1 of this series, SUM was used in most examples due to its simplicity and intuitive nature.

In this example, SUM is partitioned by house. It will add up totals only within the window defined by that partition:

SELECT House, FullName, PhysicalSkill, SUM(PhysicalSkill) OVER (PARTITION BY House) AS TotalPhysicalSkillByHouse, MentalSkill, SUM(MentalSkill) OVER (PARTITION BY House) AS TotalMentalSkillByHouse FROM WindowTable

A lesser-known functionality of SUM is that it takes an additional argument: either ALL or DISTINCT. If this argument is not specified, ALL is used by default.

DISTINCT will cause the function to include distinct values only. So if a number is found twice in the dataset, it will only be included in the aggregation once.

SELECT House, FullName, PhysicalSkill, MentalSkill FROM WindowTable SELECT House, SUM( ALL PhysicalSkill) AS TotalPhysicalSkillByHouse , SUM( DISTINCT PhysicalSkill) AS TotalPhysicalSkillByHouseDistinct, SUM( ALL MentalSkill) AS TotalMentalSkillByHouse, SUM( DISTINCT MentalSkill) AS TotalMentalSkillByHouseDistinct FROM WindowTable GROUP BY House

In the above example, `PhysicalSkill`

contains '3' twice, and `MentalSkill`

contains '6' twice. when the DISTINCT argument is used with SUM, only one "3" and one "6" is included.

### MIN and MAX

Interestingly, MIN and MAX also take the DISTINCT and ALL arguments, although they serve no purpose in the following context:

SELECT House, FullName, PhysicalSkill, MentalSkill FROM WindowTable SELECT House, MIN( ALL PhysicalSkill) AS MinPhysicalSkillByHouse , MIN( DISTINCT PhysicalSkill) AS MinPhysicalSkillByHouseDistinct, MAX( ALL MentalSkill) AS MaxMentalSkillByHouse, MAX( DISTINCT MentalSkill) AS MaxMentalSkillByHouseDistinct FROM WindowTable GROUP BY House

### AVG, COUNT, and COUNT_BIG

All three of these functions also take the DISTINCT and ALL arguments. In this slightly more complicated example, `ROWS UNBOUNDED PRECEDING`

is used to generate a running total for each of the three functions:

SELECT House, FullName, MentalSkill, AVG(MentalSkill) OVER (PARTITION BY House ORDER BY House ROWS UNBOUNDED PRECEDING) AS MentalSkillRunningAverage, COUNT(MentalSkill) OVER (PARTITION BY House ORDER BY House ROWS UNBOUNDED PRECEDING) AS MentalSkillCountRunningTotal, COUNT_BIG(MentalSkill) OVER (PARTITION BY House ORDER BY House ROWS UNBOUNDED PRECEDING) AS MentalSkillCountBigRunningTotal FROM WindowTable

COUNT_BIG differs from COUNT only in the datatype returned (INT vs. BIGINT).

## The Uncommon Functions: VAR, VARP, STDEV, STDEVP, and CHECKSUM_AGG

In SQL Server, VAR and VARP calculate the variance of a dataset. VAR is a common mathematical function that measures the variance of a sample data set.

VARP measures the variance against a population. This function is used when you're working with the entire data set, rather than just a sample set. The internal calculation is slightly different.

STDEV calculates the standard deviation. The standard deviation and variance are tightly connected – STDEV is just the square root of the variance.

Like VARP, STDEVP calculates the standard deviation against a population. For more information on the math involved, see this link.

Since we know the entire population in our example, VARP and STDEVP can be used. With OVER, it's easy to see the average and standard deviation for each individual. We can also verify that the standard deviation is the square root of the variance using another SQL function, SQRT.

SELECT House, FullName, PhysicalSkill, AVG ( ALL PhysicalSkill ) OVER (PARTITION BY house) AS APartition ByVGPhysicalSkillByHouse, VARP ( ALL PhysicalSkill) OVER (PARTITION BY house) AS VarPPhysicalSkillByHouse, STDEVP( ALL PhysicalSkill) OVER (PARTITION BY house) AS StDevPPhysicalSkillByHouse , SQRT (VARP ( ALL PhysicalSkill) OVER (PARTITION BY house) ) AS SquareRootOfVariance FROM WindowTable

In SQL Server, CHECKSUM_AGG returns the checksum of all values in a group, ignoring NULLs. This is a simplified way to determine if the data in a row or group of rows has changed:

SELECT House, FullName, PhysicalSkill, CHECKSUM_AGG ( ALL (CAST (PhysicalSkill AS INT)) ) OVER (PARTITION BY house) AS CheckSumAGG FROM WindowTable UPDATE WindowTable SET PhysicalSkill = 7 WHERE FullName = 'Arya Stark' GO SELECT House, FullName, PhysicalSkill, CHECKSUM_AGG ( ALL (CAST (PhysicalSkill AS INT)) ) OVER (PARTITION BY house) AS CheckSumAGG FROM WindowTable

CHECKSUM_AGG is only compatible with the integer data type. HASHBYTES is a more powerful and flexible method for detecting changes in data, but it is not an aggregate function and cannot be used with OVER. Hashbytes requires the concatenation of the compared fields first.

Aggregate functions have been available since the original versions of SQL Server, and more were added in subsequent releases. The inclusion of OVER in 2008 provided a new level of functionality and usability.

In the next part of this series, we'll take a closer look at ranking functions.