16th Mar 2023 10 minutes read A Detailed Guide to the SQL COUNT() Function Tihomir Babic sql count You can count on this guide to give you detailed explanations (backed with examples) of all the typical uses of the COUNT() function. Knowing data aggregation is a necessary stepping stone in your journey to becoming an SQL master. This includes the COUNT() function – one of SQL's most often used aggregate functions. Data aggregation still counts as basic-level SQL knowledge. Having firm foundations ensures you don’t have holes in your knowledge, which makes learning more complex SQL concepts easier. You can get this foundation and more in our SQL From A to Z learning track. Its seven interactive courses will give you a firm structure and make your learning systematic. You start with SQL basics, such as retrieving data and using standard SQL functions (which include COUNT() and other aggregate functions). Once you’ve got that down, it’s much easier to follow the more advanced concepts this track teaches you, such as window functions, common table expressions, and GROUP BY extensions. Once you learn all this, you’ll need some practice to really master it. For that purpose, there’s the SQL Practice Set with its 88 interactive exercises. This article will also be practical and show you different examples of using COUNT(). For that, we need a suitable dataset. Example Dataset We’ll be dealing with directors and their films. The first table is, unsurprisingly, named directors. Here’s the query you can use to replicate this table yourself. The data in the table looks like this: idfirst_namelast_namedate_of_birthdate_of_deathplace_of_birthcountry_of_birth 1IngmarBergman1918-07-142007-07-30UppsalaSweden 2LynneRamsay1969-12-05NULLGlasgowScotland 3AlejandroJodorowsky1929-02-07NULLTocophillaChile 4AgnesVarda1928-05-302019-03-29BrusselsBelgium 5PedroAlmodóvar1949-09-25NULLCalzada de CalatravaSpain 6ChloéZhao1982-03-31NULLBeijingChina 7JordanPeele1979-02-21NULLNew York CityUSA 8CélineSciamma1978-11-12NULLPontoiseFrance 9Jean-LucGodard1930-12-032022-09-13ParisFrance 10StanleyKubrick1928-07-261999-03-07New York CityUSA It’s a simple list of directors with some info about them. The second table is films. You can create this table, too, using this query. Here’s the data in the table: iddirector_idfilm_namerelease_datelanguage 17Get Out2017-01-23English 22We Need to Talk About Kevin2011-05-12English 32You Were Never Really Here2017-05-27English 45The Skin I Live In2011-05-19Spanish 57Us2019-03-08English 62Ratcatcher1999-05-13English 72Morvern Collar2002-11-01English 83El Topo1970-12-18Spanish 93The Holy Mountain1973-11-27Spanish 10NULLDog Day Afternoon1975-09-20English 11NULLThe Hater2020-03-06Polish 121Cries and Whispers1972-12-21English It’s a list of films connected to the table directors via the column director_id. Two director_id values are NULL. In the context of our dataset, it means there is data about these films in the table films. However, there’s no corresponding director in the table directors. In other words, we have all the info about ‘Dog Day Afternoon’ and ‘The Hater’, except who directed them. Now that we know the data, we can deal with COUNT(). What Is COUNT()? The answer is in the name: the COUNT() function in SQL is used for counting rows. It counts rows in the result set, not in the table. To be more precise, it will count rows in the table if your table is a result set – i.e. if you haven’t filtered data in any way. If you filter the data at all, COUNT() returns the number of rows in the filtered data. Here’s an example: SELECT COUNT(*) AS number_of_directors FROM directors; The asterisk (*) in the COUNT() function instructs it to count all rows. Since no filters are applied, the whole directors table will be the result set. The COUNT() function will, therefore, return the number of rows in the table: number_of_directors 10 The number of rows is ten, which, in this case, also represents the number of directors. If the result set is limited, COUNT(*) will return a different value. For example, suppose we wanted to show the number of dead directors. This means counting only directors that have a date in the column date_of_death. Those who have NULL values are still alive. Here’s the query: SELECT COUNT(*) AS number_of_dead_directors FROM directors WHERE date_of_death IS NOT NULL; We achieved what we wanted by filtering data using the WHERE clause. Here’s the result: number_of_dead_directors 4 Four directors are not alive anymore. COUNT() and GROUP BY Typically, the COUNT() function is used with the GROUP BY clause. To refresh your memory, GROUP BY is a clause that groups all the rows with the same value. Usually, groups are specified columns of the dataset. For more info, read this article on how to use GROUP BY. Here’s an example of GROUP BY’s usage with COUNT(*): SELECT country_of_birth, COUNT(*) AS number_of_directors FROM directors GROUP BY country_of_birth ORDER BY country_of_birth; We want to show all the countries and the number of directors born in them. We select the country and use COUNT(*) to show the number of directors. Then we specify the country_of_birth column in GROUP BY. Every column from the table that appears in the SELECT must also appear in GROUP BY. It’s logical, because you want to see the number of directors by country of birth and show those countries simultaneously. Finally, the output is sorted alphabetically by country using ORDER BY country_of_birth: country_of_birthnumber_of_directors Belgium1 Chile1 China1 France2 Scotland1 Spain1 Sweden1 USA2 There’s one director from every country except France and the USA. If you need some more examples, here’s the article that shows how to use GROUP BY with SQL aggregate functions. Using COUNT() With an Expression or Column You’re not limited to writing an asterisk in COUNT(). It can also be used with a column or an expression, such as the CASE WHEN statement. The difference is that COUNT(expression) counts only the non-NULL values of the expression. COUNT(*) will count the NULL values, too. As an example, let’s try to list all the film languages and the number of directors who made films in those languages. It seems we have all the data in the table films. What would happen if we use COUNT(*)? SELECT language, COUNT(*) AS number_of_directors FROM films GROUP BY language ORDER BY language; languagenumber_of_directors English8 Polish1 Spanish3 In total, this output shows all 12 films that appear in the table. Spoiler alert: This is wrong! The reason? Remember, COUNT(*) counts the number of rows in the dataset, including NULLs. So this output doesn’t represent the number of directors at all! Since we counted rows in the table films, it’s now obvious we counted the number of films, not directors! In other words, there are eight movies in English, one in Polish, and three in Spanish. This output has nothing to do with the number of directors! Would it be a better idea to use COUNT(director_id)? We should try it: SELECT language, COUNT(director_id) AS number_of_directors FROM films GROUP BY language ORDER BY language; We select the languages and count the directors via their identifier: director_id. The output is grouped by and sorted by language. languagenumber_of_directors English7 Polish0 Spanish3 The output differs from the previous one, so let’s analyze it. What we did well by counting director_id is that we got rid of the films with NULLs in director_id. This is especially obvious for the Polish language – the number of directors is zero. Why? Because there is only one Polish movie and it has a NULL in director_id, so it’s not counted. Less obviously, one such movie is also ‘missing’ in English. It’s Dog Day Afternoon, because it also has a NULL in director_id. In total, the result shows ten directors because there are ten records in the table films with non-NULL values in the column director_id. But if you go back to the table films, you could see some director IDs appear several times. So yes, we did show the number of directors, but we also included all the directors every time they appear in the table. In other words, we included duplicate values. So this result is closer to what we wanted to achieve, but still not completely right. Counting duplicate director IDs inflates the result, i.e. it shows an unrealistically high number of individual directors. What would solve this duplication problem is using COUNT() with DISTINCT. COUNT(expression) with DISTINCT Now that you’ve learned how to use COUNT() with a column name, it’s time to learn how to use it with DISTINCT. The DISTINCT clause removes duplicates. When it's used with COUNT(expression), it means the function will count only the unique instances of a column/expression. Let’s use the same example as above, but with DISTINCT: SELECT language, COUNT(DISTINCT director_id) AS number_of_directors FROM films GROUP BY language ORDER BY language; Notice that DISTINCT is written in the COUNT() function. It comes before the column you want to count. And here are the results: languagenumber_of_directors English3 Polish0 Spanish2 The result shows three directors with English movies, zero in Polish, and two in Spanish. Wait a second! Isn’t that a drastically different result from the one we got when we used COUNT(director_id) without DISTINCT? Here is the previous result: languagenumber_of_directors English7 Polish0 Spanish3 Do you realize what happened here? Without DISTINCT, we counted all the director_id values. So the correct way of finding the number of directors in this example is using COUNT() with DISTINCT. Caution: Using COUNT() With LEFT JOIN Finally, let’s use both our tables simultaneously. Imagine you want to output all the directors and the count of their films. You would (correctly) think that you need LEFT JOIN. That’s a good start! There may be directors in the table directors that don’t have any movies in our films table. Since we want to show the list of all directors, LEFT JOIN is the right choice. To count the number of films, you might feel the urge to use COUNT(*). Here’s the query: SELECT d.id, d.first_name, d.last_name, COUNT(*) AS number_of_films FROM directors d LEFT JOIN films f ON d.id = f.director_id GROUP BY d.id, d.first_name, d.last_name ORDER BY d.id; We selected the required columns and used COUNT(*). The two tables are joined on the column containing director IDs. The output is grouped by directors’ IDs and names and sorted by the ID. And the results: idfirst_namelast_namenumber_of_films 1IngmarBergman1 2LynneRamsay4 3AlejandroJodorowsky2 4AgnesVarda1 5PedroAlmodóvar1 6ChloéZhao1 7JordanPeele2 8CélineSciamma1 9Jean-LucGodard1 10StanleyKubrick1 It seems that every director has at least one movie. In total, that’s 15 movies. Hold on! That doesn’t feel right! There are only 12 films in the table films. Also, we know for a fact that there are no Stanley Kubrick movies in the table. How come the output shows he has one? Is the data for all other directors wrong, too? Yes. Using COUNT(*) is not the right solution in this case. LEFT JOIN will return a row for all directors, even the one that couldn’t be found in the table films. And COUNT(*) counts all those rows, even the ones which have no matching films. Instead of COUNT(*), use COUNT() with the column name. Which column should we count? The safest way is to count the unique identifiers, which is the column id from the table films. SELECT d.id, d.first_name, d.last_name, COUNT(f.id) AS number_of_films FROM directors d LEFT JOIN films f ON d.id = f.director_id GROUP BY d.id, d.first_name, d.last_name; The query is virtually the same as earlier, bar the different use of COUNT(). And this is what it returns: idfirst_namelast_namenumber_of_films 1IngmarBergman1 2LynneRamsay4 3AlejandroJodorowsky2 4AgnesVarda0 5PedroAlmodóvar1 6ChloéZhao0 7JordanPeele2 8CélineSciamma0 9Jean-LucGodard0 10StanleyKubrick0 That’s more like it! If you sum up the values, you’ll see there are ten films in total. Why not 12? Because two films are by directors that don’t exist in our dataset, i.e., they have NULLs in the column director_id in the table films. To consolidate what you learned here, check out some more examples of COUNT() uses. COUNT() Is as Easy as One, Two, Three! That wasn’t difficult, was it? These easy examples showed you all the variations of using COUNT(). The function itself is not difficult to understand. But, as you saw, there are several ways of using it, and each might return a different result. Choosing how to use COUNT() becomes easier with practice. The SQL Practice Set is designed for exactly that purpose. You could also try these seven examples of the COUNT() function. Aside from practice, it’s important that you know your data and what you want to achieve with it. When all this is clear to you, COUNT() really becomes an easy function. You can count on that! Pun intended. Tags: sql count