How to Calculate the Length of a Series with SQL

August 25, 2020

Tihomir Babic

Tihomir is a financial and data analyst turned database designer from Zagreb, Croatia. He has extensive experience in the financial services industry, which helps him combine his finance background with his interest in data. When not tinkering with data, he likes to play the guitar in a band and pretend to be a musician. He also regularly writes columns for the Croatian LGBT news site CroL.hr and hopes to soon write his first novel.

Tags: sql, learn sql, window functions

What is a time series and why bother to calculate its length using SQL?

How do we calculate the length of a series in SQL? I can answer that in two words: window functions! Yes, you’ll need some knowledge of SQL window functions to calculate series length. But what is a time series, and why would you want to find out how long it is?

What Is a Series?

Although there is some variation in the definition of “time series”, it is basically a sequence of data points listed in time order. In a database, this is usually represented by events separated by equal time intervals. For example, user website logins on consecutive days are considered a series. The table below shows such login dates:

    id   date         consecutive logins
    1    2020-06-01   3 days
    2    2020-06-02
    3    2020-06-03
    4    2020-06-06   6 days
    5    2020-06-07
    6    2020-06-08
    7    2020-06-09
    8    2020-06-10
    9    2020-06-11
    10   2020-06-13   4 days
    11   2020-06-14
    12   2020-06-15
    13   2020-06-16
    14   2020-06-22   2 days
    15   2020-06-23

If I wanted to analyze the length of each series, I would count the number of consecutive logins. The first series’ length is three days, since the user logged in on 2020-06-01, 2020-06-02, and 2020-06-03. The second series’ length is six days (the user logged in every day between 2020-06-06 and 2020-06-11). Following the same logic, the next two series’ lengths are four and two days, respectively.

Why Calculate a Series Length?
Time series are used extensively, and there are many situations where you’d need to calculate a series’ length. Some examples include:

Measuring a login streak on Stack Overflow.
Seeing your activity streak on Duolingo.
Tracking how many days you’ve used a fitness app.
Analyzing a sales streak in an e-commerce site.
Finding the consecutive increase or decrease of a currency value.

Basically, anywhere you have a time series, you’ll probably need to calculate its length. The main question remains: How do you calculate a time series’ length, and how do you do it in SQL?

As the above table shows, a time series is not a difficult concept to grasp. The moment you saw this example, you intuitively knew how to get the series’ length. However, it’s one thing to find a series length manually across several lines of data; you can’t do that on a database with hundreds, thousands, or millions of rows. Luckily, SQL window functions are here to save the day!

If you need to refresh your understanding of window functions, try this window functions cheat sheet. It may be especially useful when we arrive at the calculation part. If you don’t even know what window functions are, I recommend that you read Why Should I Learn SQL Window Functions? before you go any further. The best way to really get a comprehensive knowledge of window functions is the LearnSQL.com Window Functions course.

How to Calculate the Length of a Series in SQL

For this example, let’s imagine you’re learning High Valyrian on Duolingo. The Duolingo platform has a thing called a streak. As the site explains, a streak is “the number of days in a row you have completed a lesson. Once you complete a lesson in the app or website, your streak will increase by one day. You will receive your daily reward when you meet your XP goal.” (XP is ‘experience points’, by the way.) Monitoring your streak allows you to receive specific platform incentives.
So how does Duolingo know how long your streak is? Your learning streak for July 2020 can be represented by the table lesson_completed. It contains the following columns:

id: The ID of the lesson.
date_completed: The date you completed the lesson.

Now let’s write a query to find your streak. I’ll use a Common Table Expression (CTE) to help organize this query. For those of you who are not familiar, our article on CTEs is a good starting point. Running this query will calculate the length of the series:

    WITH groups AS (
        SELECT
            RANK() OVER (ORDER BY date_completed) AS row_number,
            date_completed,
            DATEADD(day, -RANK() OVER (ORDER BY date_completed), date_completed) AS date_group
        FROM lesson_completed
    )
    SELECT
        COUNT(*) AS days_streak,
        MIN(date_completed) AS min_date,
        MAX(date_completed) AS max_date
    FROM groups
    GROUP BY date_group;

Let me explain what I did. The query can be divided into two parts:

Creating the CTE.
Selecting data from the CTE.

Creating the CTE

The part of the query that creates the CTE is given again below:

    WITH groups AS (
        SELECT
            RANK() OVER (ORDER BY date_completed) AS row_number,
            date_completed,
            DATEADD(day, -RANK() OVER (ORDER BY date_completed), date_completed) AS date_group
        FROM lesson_completed
    )

The CTE is defined by the WITH clause. I’ve decided the name of this CTE should be groups. Everything that is written in the parentheses after the AS keyword is just a fairly regular SELECT statement. Now, what does this do?

First, I’ve numbered the rows of the table. To do that, I’ve used the RANK() function. This is a window function, which is why it’s followed by the OVER() clause. I want the ranks to be assigned sequentially according to the dates, so this function’s result is ordered by the column date_completed. Since each date appears only once in this table, the ranks simply run 1, 2, 3, and so on, just like row numbers.

The CTE then selects the column date_completed.

Finally, I’ve used the DATEADD function (SQL Server) to deduct the row number from the date_completed. In this statement ...
    DATEADD(day, -RANK() OVER (ORDER BY date_completed), date_completed) AS date_group

... I had to define the interval that will be added (or deducted), which is the day. How many days do I want to deduct? As many days as the row number; that’s why I’ve simply copied the RANK() window function I had already defined and added the negative sign in front of it. From what do I want this to be deducted? From the date_completed, of course!

This part of the query will give the following result:

    row_number   date_completed   date_group
    1            2020-07-01       2020-06-30
    2            2020-07-02       2020-06-30
    3            2020-07-03       2020-06-30
    4            2020-07-04       2020-06-30
    5            2020-07-05       2020-06-30
    6            2020-07-08       2020-07-02
    7            2020-07-09       2020-07-02
    8            2020-07-10       2020-07-02
    9            2020-07-18       2020-07-09
    10           2020-07-19       2020-07-09
    11           2020-07-20       2020-07-09
    12           2020-07-21       2020-07-09
    13           2020-07-22       2020-07-09
    14           2020-07-23       2020-07-09
    15           2020-07-24       2020-07-09
    16           2020-07-25       2020-07-09
    17           2020-07-26       2020-07-09
    18           2020-07-28       2020-07-10
    19           2020-07-29       2020-07-10
    20           2020-07-30       2020-07-10
    21           2020-07-31       2020-07-10

Why do I need this? It will help me calculate the series length. Look at the table. Did you notice that the consecutive days belong to the same date group? Why is that? If the dates are consecutive and you deduct the row number from each of them, you’ll always get the same date. Look at this:

    row_number   date_completed   date_group
    1            2020-07-01       2020-06-30

Deduct the row number (1) from the date (2020-07-01) and you’ll get 2020-06-30. That’s precisely what I have in the table.

    row_number   date_completed   date_group
    1            2020-07-01       2020-06-30
    2            2020-07-02       2020-06-30

And then the next row. If you deduct the row number (2) from the date (2020-07-02), the result is 2020-06-30 again!

Now let’s see what happens when the dates are not consecutive.

    row_number   date_completed   date_group
    5            2020-07-05       2020-06-30
    6            2020-07-08       2020-07-02

For row 5, the result is still the same, 2020-06-30. That’s fine, but what about the next row? Deduct row number 6 from the date (2020-07-08), and what do you get? 2020-07-02.
This is now a new date group, which is not surprising since 2020-07-08 isn’t consecutive to 2020-07-05 in any calendar! A nice little trick, right?

As for the dates in the date_group column, it really doesn’t matter what dates you get. They only serve as group labels, whose rows will be counted in the second part of the query. Count how many times every group occurs, and you’ll have your series length!

Selecting Data from the CTE

The second part of the query selects the data from the CTE I’ve defined above:

    SELECT
        COUNT(*) AS days_streak,
        MIN(date_completed) AS min_date,
        MAX(date_completed) AS max_date
    FROM groups
    GROUP BY date_group;

This simple SELECT statement counts the number of rows and shows the results in the new column days_streak. Then it selects the minimum and maximum date in the column date_completed, with the results shown in the columns min_date and max_date, respectively. All this data is selected from the CTE named groups. Finally, the data has to be grouped by the date_group. Why? Because I don’t need the total number of rows; I want the number of rows for each date group.

    days_streak   min_date     max_date
    5             2020-07-01   2020-07-05
    3             2020-07-08   2020-07-10
    9             2020-07-18   2020-07-26
    4             2020-07-28   2020-07-31

That’s One Way to Calculate a Series Length ...

This little example should give you an idea of how to calculate the length of a series with SQL. However, there’s no single way to approach it. And, as you’ve seen, there’s no simple SQL function that will give you what you want. Your data and what you need to calculate will determine your SQL code. It usually requires a little trick; your SQL experience and knowledge will help you decide which one!

What you’ll almost certainly need to know when calculating the length of a series are SQL window functions. If you’re still not convinced you should learn window functions, read this interview with the course creator, who explains all the benefits of window functions.
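As one illustration of how the data shapes the query: the example above has exactly one row per date. If the same date could appear more than once (say, two lessons completed on one day), RANK() would skip numbers after the tie and split a streak into the wrong groups. Here is a minimal sketch of one possible adjustment, assuming SQL Server and a DATE column; the four sample rows are made up purely for illustration and stand in for the real lesson_completed table. It swaps in DENSE_RANK(), which does not skip numbers, and counts distinct dates instead of rows.

```sql
-- Hypothetical sample data (not from the article): 2020-07-01 appears twice.
WITH lesson_completed AS (
    SELECT v.id, CAST(v.date_completed AS date) AS date_completed
    FROM (VALUES
        (1, '2020-07-01'),
        (2, '2020-07-01'),   -- second lesson on the same day
        (3, '2020-07-02'),
        (4, '2020-07-04')
    ) AS v(id, date_completed)
),
groups AS (
    SELECT
        date_completed,
        -- DENSE_RANK() gives tied dates the same number and leaves no gaps,
        -- so consecutive dates still map to the same date_group.
        DATEADD(day, -DENSE_RANK() OVER (ORDER BY date_completed), date_completed) AS date_group
    FROM lesson_completed
)
SELECT
    COUNT(DISTINCT date_completed) AS days_streak,  -- count days, not rows
    MIN(date_completed) AS min_date,
    MAX(date_completed) AS max_date
FROM groups
GROUP BY date_group
ORDER BY min_date;
-- days_streak   min_date     max_date
-- 2             2020-07-01   2020-07-02
-- 1             2020-07-04   2020-07-04
```

With RANK() on the same four rows, the ranks would be 1, 1, 3, 4, so 2020-07-02 would land in its own group while 2020-07-04 would be lumped together with 2020-07-01. When every date is unique, as in the article’s table, RANK() and DENSE_RANK() give the same numbers and both queries return the same result.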
This article showed you only a glimpse of what time series are. If you’re interested in learning how to analyze this specific type of data, LearnSQL has some very detailed guidance. If this article helped you solve some time series problems, let me know in the comments section.