Stop using COUNT(id) to count rows

  • Published on Jul 8, 2024
  • 📚 Learn more about PlanetScale at planetscale.com/youtube.
    ------------------
    00:00 Intro
    01:04 Origins of the myth
    02:00 What does COUNT(*) mean?
    02:30 COUNT(*) example
    04:00 Primary and secondary indexes in MySQL
    05:20 COUNT(id) example
    ------------------
    💬 Follow PlanetScale on social media
    • Twitter: / planetscaledata
    • Discord: / discord
    • TikTok: / planetscale
    • Twitch: / planetscale
    • LinkedIn: / planetscale
  • Science & Technology

Comments • 609

  • @teej_dv
    @teej_dv 11 months ago +411

    What about counting in OurSQL?

    • @aarondfrancis
      @aarondfrancis 11 months ago +65

      I can always count on you! YouSQL

    • @AmitMerchant
      @AmitMerchant 11 months ago

      😆@@aarondfrancis

    • @dmsalomon
      @dmsalomon 11 months ago +45

      Chill comrade, we don't need to expropriate the database engine for the proletariat

    • @ejaz787
      @ejaz787 11 months ago +61

      in soviet Russia you don't count(*) , * counts you

    • @XmasApple
      @XmasApple 11 months ago +8

      Did you mean YQL used in YDB?

  • @DoubleM55
    @DoubleM55 11 months ago +185

    I had a strong feeling that something like this was going on in the DB engine, but the only "argument" I had was pretty weak: "It would be very stupid if the DB engine pulled all data from all columns just to pass it to the count() function and return a single int", and it would be very easy to implement such an optimization. So I was using count(*) while simply trusting the DB to do the right thing.
    Thanks to this great video, I now have confirmation and also know exactly how the DB decides how to count rows in the optimal way. Great video :)

    • @PlanetScale
      @PlanetScale 11 months ago +23

      Nice to have your gut feeling proven right! Glad you liked it.

    • @QWACHU
      @QWACHU 11 months ago

      This may have been true in the distant past, but count(*) was optimized in DB engines many years ago so that it does not read whole columns.

    • @SergeDuka
      @SergeDuka 10 months ago +3

      I suggest using ‘count(1)’ to specifically show that no data is used. It eliminates any confusion.

    • @timucinbahsi445
      @timucinbahsi445 9 months ago +3

      I think this is one of the most important skills in programming. Knowing you are not the only smart person around :) If it's so obvious to us, developers of the engine might have thought of it as well. Modesty W

  • @ErazerPT
    @ErazerPT 11 months ago +192

    The most important takeaway from the video is not about count(), but more generally that (if speed is critical) you should always review the execution plan. That is easier to do at the prototyping phase, but it can still be done in production; it just needs more testing and QA.

    • @meenstreek
      @meenstreek 10 months ago +26

      The problem is the execution plan the database will use while in development with very little data will be different to the execution plan in production when the query is now traversing millions of rows. So you just have to make educated guesses and use experience to know when a query will work or not and then iterate.

    • @wasd3108
      @wasd3108 10 months ago +10

      I don't think you got the main point. It's about shutting down your uncle when everyone is around.

    • @christianbarnay2499
      @christianbarnay2499 9 months ago

      If speed is critical, you should always write your queries in the simplest and easiest-to-understand way. Optimizing the execution plan is one of the main purposes of the DB engine. Just let it do its job without interfering. When performance degrades, the first response is to update statistics so the engine has an accurate view of the data. If the engine still fails to find the fastest path, rewrite your query in a better way. And give it directions (hints) only as a last resort.

    • @ErazerPT
      @ErazerPT 9 months ago

      @@christianbarnay2499 "If the engine still fails at finding the fastest path, rewrite your query in a better way". That's what I said: "Review the execution plan", not "optimize the execution plan". If you can see that the DB engine can't make heads or tails of what you want, THEN you need to think about what/where to change so that it can. And waiting for degradation is a bad idea, because it can take long enough that by the time you NEED to make changes you might find that you CAN'T make them anymore (without breaking stuff).
      Also, most people writing queries are software developers without much DB background. You can't expect them to write "excellent queries" or design an "excellent schema"... As a "dumb software developer" I count my blessings when I have really good DB people around, so I can show them my stuff and they can keep me from shooting myself in the foot :D

    • @christianbarnay2499
      @christianbarnay2499 9 months ago

      @@ErazerPT Databases are so central to software that I can't label someone a software developer if they can't write decent SQL queries. And SQL is so easy to understand that it only takes a couple of hours to get the basic knowledge that will cover 90% of your needs.
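
The "review the execution plan" advice in this thread can be tried without a MySQL server. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in engine, and the `todos` table and `idx_todos_done` index are hypothetical names. SQLite's planner is not MySQL's, so the plan text and the index chosen may differ from what the video shows; the point is the habit of asking the engine what it will do for each spelling of the count.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and index names are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT NOT NULL, done INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_todos_done ON todos (done)")
conn.executemany(
    "INSERT INTO todos (title, done) VALUES (?, ?)",
    [(f"todo {i}", i % 2) for i in range(1000)],
)


def show_plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


queries = (
    "SELECT COUNT(*) FROM todos",
    "SELECT COUNT(id) FROM todos",
    "SELECT COUNT(1) FROM todos",
)

plans = {}
counts = {}
for sql in queries:
    plans[sql] = show_plan(sql)
    counts[sql] = conn.execute(sql).fetchone()[0]
    # The plan wording varies between SQLite versions, so it is only printed.
    print(sql, "->", counts[sql], plans[sql])
```

In MySQL the equivalent check is to prefix the query with EXPLAIN and look at which key the optimizer reports.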

  • @dimasshidqiparikesit1338
    @dimasshidqiparikesit1338 11 months ago +281

    This is actually incredibly educational. Thanks, planetscale!

  • @bazcuda
    @bazcuda 10 months ago +53

    "todos" also means "everything" in Spanish. So, "Select * from todos" means "select everything from everything". Terribly inefficient but it'll be great fun writing the code to sort out what we want from the results 😜
    - a grizzled vet.

    • @CraftDownloads
      @CraftDownloads 10 months ago +2

      I get that, it also works in Portuguese 😂

  • @anothermouth7077
    @anothermouth7077 10 months ago +1

    Two thumbs up to you. My TL would sometimes bug me about this in reviews, even when I showed him the SQL docs saying that most optimisation is already done by MySQL, so there's no need to over-engineer the solution.

  • @GuilhermeMG
    @GuilhermeMG 11 months ago +46

    With count(1) you get the performance boost without all the confusion

    • @PlanetScale
      @PlanetScale 11 months ago +15

      Yup! Counting with a constant is totally a viable option. See 06:09.

    • @CottidaeSEA
      @CottidaeSEA 11 months ago +5

      It's what I always do, because I feel it is the most expressive.

    • @fano72
      @fano72 10 months ago +1

      I also like to do that. No data is needed from the rows to count them.

    • @TehKarmalizer
      @TehKarmalizer 10 months ago

      This is what I've been doing for ages. Using count(*) at least has been slower in some older or obscure database engines. And I've worked with several.

  • @binaryfire
    @binaryfire 11 months ago +10

    "You'll have won the argument, which is what the holidays are all about." 🤣

  • @kentlarsson1263
    @kentlarsson1263 7 months ago

    Great stuff! Love that you mix in a bit of fun with the content, it's what got me to subscribe!

  • @TravisTennies
    @TravisTennies 11 months ago +3

    Just subscribed. Hard to find people who actually talk real facts these days.

  • @user-yb6rd1fm5e
    @user-yb6rd1fm5e 10 months ago +62

    A couple of corrections:
    The COUNT() function can take an EXPRESSION or a wildcard as its parameter. That is to say, you could write COUNT(*), COUNT(1), COUNT(pepito), COUNT(id), or COUNT(99999), which will all give you the same result. The argument is treated as a wildcard; "*" is used as the wildcard by convention, but we could use any character because, by definition, it doesn't use information about any particular column (and the count includes rows that contain NULL values in any column).
    In the case you describe, COUNT(id) and COUNT(*) return the same result because "id" is treated as if it were a wildcard, so the behavior is the same and the server is free to optimize the process as you explained in the video.
    But if you really wanted to count the values of a field, the correct way would be to specify COUNT(ALL id). This expression does differ from COUNT(id), because it counts only the NON-NULL values in that field. In the video's example, COUNT(id) and COUNT(ALL id) should return the same result, since the "id" field, being a primary key, would never be empty, but the difference is that you would force the server to use the primary key index to execute COUNT(ALL id).
    Finally, while it is true that the server often saves us from ourselves, it is not exactly true that it always makes the best decisions. As a DBA with over 10 years of experience, I have found myself in several situations where, after checking the execution plan, I realized that the server was using a suboptimal index for the statement, and I had to direct it to the correct index. This is seen quite often in big-data queries.

    • @myonlynick
      @myonlynick 9 months ago +7

      I partially disagree with your second paragraph. I put it to the test: I downloaded the open source 'world' database for MySQL and ran two queries. The first, select count(*) from country; returns 239. The second, select count(IndepYear) from country; returns 192. IndepYear is not a primary key and has several NULL values. If you are wondering, select count(ALL IndepYear) from country; returns 192 as well. Hence, in MySQL 'ALL' is optional.

    • @LeeKao
      @LeeKao 9 months ago

      I felt my brain growing as I was reading your comment

    • @DmitriyYankin
      @DmitriyYankin 9 months ago +4

      Why is anyone liking this absolutely wrong answer? count() does depend on the particular column. count(id) counts only the NON-NULL values. COUNT(id) and COUNT(ALL id) are exactly the same expression, as count(id) is implicitly ALL.

    • @user-yb6rd1fm5e
      @user-yb6rd1fm5e 9 months ago +1

      @@DmitriyYankin Read the damn documentation before you say something. Also, obviously some databases work a bit differently from what I said; if you use another SQL database, read your own documentation 🙄.

    • @DmitriyYankin
      @DmitriyYankin 9 months ago

      @@user-yb6rd1fm5e Yep, read it yourself. dev.mysql.com: "COUNT(expr) Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement." ... "COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values." ...
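
The disagreement in this thread comes down to NULL handling, which is easy to check on a toy table. The sketch below is an illustration only: it uses Python's built-in sqlite3 module rather than MySQL, and the `country` table with its `indep_year` column is a hypothetical four-row miniature of the 'world' database mentioned above. It follows the documented rule quoted in the last reply: COUNT(*) counts rows, COUNT(expr) counts non-NULL values of expr.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the `country` table discussed above.
conn.execute("CREATE TABLE country (code TEXT PRIMARY KEY, indep_year INTEGER)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [("AAA", 1901), ("BBB", None), ("CCC", 1960), ("DDD", None)],
)

row = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(code), COUNT(indep_year) FROM country"
).fetchone()
count_star, count_one, count_pk, count_nullable = row

# COUNT(*) and COUNT(1) count rows; COUNT(column) skips NULLs in that column.
print(count_star, count_one, count_pk, count_nullable)  # prints: 4 4 4 2
```

The first three counts agree only because `code` has no NULLs; a nullable column gives a smaller number, which matches the 239 versus 192 result reported above.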

  • @lostcarpark
    @lostcarpark 10 months ago +9

    Thanks for this. I've had this argument so many times. I think some early SQL engines did look at all columns for count(*), but I believe pretty much all of them have this optimization at this point.

    • @user-ps7zt3vm9q
      @user-ps7zt3vm9q 10 months ago +2

      I believe this is the key point. Some early SQL engines did not have that optimization; now they do. Drawing a confident conclusion from today's behavior is not exactly right. So one should just do it automatically and not even spend time on it: always use `count(1)` and don't worry about performance or go looking for confirmation. That solution works everywhere.

  • @Metruzanca
    @Metruzanca 11 months ago +2

    This is the type of content I didn't know I wanted. More please.

  • @elkhayder
    @elkhayder 11 months ago

    Great video. Short, helpful, and straight to the point.

  • @spacemanmat
    @spacemanmat 9 months ago +1

    Always nice to see how the optimiser works under the covers. I've seen a few cases where the original programmer had done something dumb but the optimiser picked up the issue and optimised it away. It still makes me uneasy relying on it, though.

  • @pukkimi
    @pukkimi 10 months ago +52

    I might call myself a somewhat senior programmer. Query optimizers sometimes failed to use a clustered index (or any other index) on a table when running count(*). This happened at least on Oracle 8, and the workaround was to use count([indexed column]) where [indexed column] = something. Count(*) caused a full table scan in some cases, at least if the table contained LOBs. So there might really be a reason why some greybeards warn about count(*). When in doubt, check the execution plan.

    • @TheGreatAtario
      @TheGreatAtario 10 months ago +13

      I think the real takeaway here is that Oracle sucks

    • @pukkimi
      @pukkimi 10 months ago +9

      @@TheGreatAtario Sir, you are most certainly correct about Oracle, but as far as I know there are similar kinds of stupid behavior in almost every database. Not this particular fault, but many other, different ones. The real takeaway is to always check the execution plan :)

    • @alexU42k
      @alexU42k 9 months ago +2

      It is always a challenge to figure out if something was done on purpose or by lack of knowledge or any other reason

    • @DmitriyYankin
      @DmitriyYankin 9 months ago

      Why should count use an index?

    • @alexU42k
      @alexU42k 9 months ago +3

      @@DmitriyYankin It is up to the query optimizer whether to use an index, but in general it picks the cheaper option (fewer I/O operations).

  • @aaronmeder
    @aaronmeder 3 months ago

    Love it! Thanks guys for sharing

  • @mutatedllama
    @mutatedllama 11 months ago +1

    What a great video. Earned a subscribe. Looking forward to more!

  • @PaulSebastianM
    @PaulSebastianM 11 months ago +69

    Primary keys are clustered: they are physically aligned to disk clusters, so counting them means traversing the disk to gather the count, and if the order on disk is not adjacent, the index is fragmented and counting can take a long time. Non-primary keys (secondary indexes) are not clustered, meaning they don't need to follow the physical disk layout, so they are most often stored off-table in a much more compact data structure. Even when that structure gets fragmented, the data stays close together, because all it holds is index records rather than every row in the table, as a clustered index does.

    • @aarondfrancis
      @aarondfrancis 10 months ago

      Correct!

    • @jvapr27
      @jvapr27 10 months ago +5

      Very cool to know.
      FYI: I think for some databases this is not true though.
      Clustering can be set differently than primary keys.
      For db2, for example primary indexes are not by default clustered.
      For databases like snowflake, they do not index the primary key.
      Each DB may be different. Still very cool. Thanks!

    • @PaulSebastianM
      @PaulSebastianM 10 months ago +2

      @@jvapr27 correct, you can have one single index that can be clustered because that orders the records of that table physically. But that index doesn't have to be the primary key though some dbs might enforce that.

    • @debasishraychawdhuri
      @debasishraychawdhuri 10 months ago +3

      count(*) and count (id) are semantically equivalent, the database should not behave differently for those queries.

    • @PaulSebastianM
      @PaulSebastianM 10 months ago +1

      @@debasishraychawdhuri well, no. One is explicit, the other is implicit. Implicit means suggested but not expressly stated. Thus they are completely different. That is logical deduction.

  • @artemisamberdrive583
    @artemisamberdrive583 11 months ago +3

    for specific purposes (e.g. extremely large table, statistics, etc) i set up a count-table with a single row and column, holding the information of the count of rows of the "parent table". this requires setting up triggers on the parent table insert+delete procedures to increment+decrement the value of the count-table.
    keep in mind that this setup slows down the process of writing data, but data is usually read many times in contrast to written.

    • @GuruEvi
      @GuruEvi 11 months ago +5

      That seems extremely 'hacky' and you end up doing the same thing an auto_increment lock does (with more hoops) and if your system gets busy or needs to scale you basically lose any concurrency. Also makes your application a whole lot less portable and you could make a mistake (eg. most examples I've seen online, do not take into account that an INSERT or DELETE can target multiple rows, but the trigger only gets called once, so now you need looping logic or you have a bug). Not sure if you actually "need" a count if you're working with that many records, but most database engines can provide an estimate or perhaps, you may be able to use a different database system altogether that is better optimized for providing statistical information.

    • @SXsoft99
      @SXsoft99 10 months ago +1

      yes and at the same time you need to keep on separate columns all the conditions combinations for filtering, also good luck finding programmers that actually use things like triggers since it hides the application business logic in the database

    • @christianbarnay2499
      @christianbarnay2499 9 months ago

      I don't know about MySQL, but most DB engines already do that on their own. They have a "table information" table that contains all the metadata of each table, including the row count.
      select count(anything not nullable) without a where clause will automatically trigger a lookup in that table to get the current row count of the selected table.
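
The trigger-maintained count table described at the top of this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommendation: it uses Python's built-in sqlite3 module instead of MySQL, and `todos`, `todos_count`, and the trigger names are hypothetical. SQLite triggers fire once per affected row, so a multi-row DELETE is handled here; engines with statement-level triggers would need the extra logic mentioned in the replies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Single-row, single-column table holding the cached row count.
CREATE TABLE todos_count (n INTEGER NOT NULL);
INSERT INTO todos_count (n) VALUES (0);

-- Fires once per inserted row.
CREATE TRIGGER todos_after_insert AFTER INSERT ON todos
BEGIN
    UPDATE todos_count SET n = n + 1;
END;

-- Fires once per deleted row.
CREATE TRIGGER todos_after_delete AFTER DELETE ON todos
BEGIN
    UPDATE todos_count SET n = n - 1;
END;
""")

conn.executemany(
    "INSERT INTO todos (title) VALUES (?)",
    [(f"todo {i}",) for i in range(10)],
)
conn.execute("DELETE FROM todos WHERE id <= 3")  # removes three rows at once

cached = conn.execute("SELECT n FROM todos_count").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM todos").fetchone()[0]
print(cached, actual)  # prints: 7 7
```

As the replies point out, every write now also updates the single counter row, which serializes concurrent writers; the cached value trades write throughput for a constant-time read.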

  • @ahmad-murery
    @ahmad-murery 10 months ago

    I knew about count(*), but I didn't know how the optimizer decides which index to use.
    One learns something new every day.
    Thanks Aaron!

  • @lakhanpurohit6969
    @lakhanpurohit6969 11 months ago

    Thanks for sharing ur knowledge 😊

  • @rahulxcr
    @rahulxcr 11 months ago

    Great explanation. Thanks that's very helpful.

  • @adamtak3128
    @adamtak3128 11 months ago +2

    Please make more educational SQL content. This was fantastic.

  • @medilies
    @medilies 11 months ago

    Maaan, I didn't know this channel is posting such content :0 I liked it, subscribed and activated notifications.

  • @x364
    @x364 10 months ago

    Very nice! Thank you!

  • @SeraphPatrick
    @SeraphPatrick 11 months ago

    Great video, thanks!

  • @tomasma4896
    @tomasma4896 9 months ago

    Cool, I have been using MySQL for years but never heard of this one. More videos like this, please :)

  • @TheyCalledMeT
    @TheyCalledMeT 10 months ago +3

    which also means it needs a secondary non null index to function the way you indicated.
    redo the trial without such an index to see what it does

  • @Wangaruro
    @Wangaruro 10 months ago

    I actually learned something new today! Thanks!

  • @HishamElsayad
    @HishamElsayad 10 months ago

    Thanks so much for providing this great information.

  • @adamzaczek6342
    @adamzaczek6342 10 months ago

    Holy count, I just found an awesome channel to subscribe to. Love the humor at the end!

    • @aarondfrancis
      @aarondfrancis 10 months ago +1

      Holy count 😂

  • @Austin-ft8pn
    @Austin-ft8pn 11 months ago

    I love content like this, I'm going to have to check this out for myself!

  • @xavier.xiques
    @xavier.xiques 10 months ago

    Good video, thanks

  • @diegocardenas4522
    @diegocardenas4522 5 months ago

    Best ad ever, keep them coming

  • @cangurcan99
    @cangurcan99 10 months ago

    Wow, using mysql for decades and never knew this. Thanks man.

  • @jhonatanwen
    @jhonatanwen 11 months ago

    Incredible video!

  • @airbornesnail
    @airbornesnail 10 months ago +1

    "You'll have won the argument, which is what the holidays are all about" - best sentence I've ever heard. :D

  • @justinwduff
    @justinwduff 11 months ago

    Very interesting, thank you!

  • @GhiveciuMarian
    @GhiveciuMarian 7 months ago

    This makes sense when you count all rows in a table, but throw in a WHERE clause or any other filtering and the advantage might evaporate. The hardest thing I found was displaying results with filtering. Here, for example, you might want to show how many todos are done out of the total. This specific table does not have a 'done' field, but if it did, and that field were not part of an index, the query would result in a table scan.
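
The point about filtered counts can be illustrated with a small experiment. The sketch below is a hedged illustration, again using Python's built-in sqlite3 module as a stand-in for MySQL; the `done` column and the `idx_todos_done` index are hypothetical additions to the video's `todos` table. It captures the query plan for a filtered count before and after adding an index on the filter column, then computes "done out of total".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
    "done INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO todos (title, done) VALUES (?, ?)",
    [(f"todo {i}", 1 if i % 4 == 0 else 0) for i in range(100)],
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT COUNT(*) FROM todos WHERE done = 1"

plan_without_index = plan(query)  # no index on `done` exists yet
conn.execute("CREATE INDEX idx_todos_done ON todos (done)")
plan_with_index = plan(query)     # the planner can now use the new index

done_count = conn.execute(query).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM todos").fetchone()[0]

print("before:", plan_without_index)
print("after: ", plan_with_index)
print(done_count, "of", total)  # prints: 25 of 100
```

The exact plan wording depends on the SQLite version, and MySQL's EXPLAIN output looks different, but the comparison is the same: without an index on the filter column the engine has to examine every row.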

  • @JuanLuisEcheverria
    @JuanLuisEcheverria 10 months ago

    Hey dude, you explain this topic very well!! Congrats

    • @PlanetScale
      @PlanetScale 10 months ago

      Hey, thanks!

  • @Neakas
    @Neakas 9 months ago +1

    This is also true in MS-SQL. It will use the narrowest nonclustered index on a table. If there is none, the table's clustered index has to be scanned, which is slow. But I think in MS-SQL you can use sys.sysindexes to look up the row count even faster.

  • @ragsChannel
    @ragsChannel 11 months ago +35

    A nice gotcha indeed! One question : this "optimization" -- is it applicable ONLY to MySQL or is also the case with say, Postgresql ??

    • @romanstingler435
      @romanstingler435 11 months ago +3

      in postgres the star is not necessarily the fastest, it also depends if an index is used or just a scan is used, due to the amount of data in the table, and if auto vacuum was successful recently.

    • @Keelyn1984
      @Keelyn1984 9 months ago

      It also applies to Oracle. Count(*) counts the rows, not the data. But keep in mind that inline views or subselects still have to fetch data for the SQL to work at all.
      Using a constant instead of * (most often 1) is also a common, viable alternative.

  • @adriancs6455
    @adriancs6455 8 months ago

    thanks for this info

  • @djenning90
    @djenning90 10 months ago

    I love your style!

  • @michelprovencher8518
    @michelprovencher8518 10 months ago

    Like the EXISTS function where the "SELECT * FROM ..." is just a predicate for the function to work and only the WHERE clause is meaningful

  • @Sweenus987
    @Sweenus987 11 months ago +1

    Just curious, what if you copied the id field as a secondary key so whenever id gets a value this copy would also get a copy, just for this purpose?

  • @alexcoding99
    @alexcoding99 10 months ago

    very informative!

  • @harrytsang1501
    @harrytsang1501 11 months ago +37

    Yes, very often the problem you are trying to solve is more generic, and can be expressed in more generic terms.
    Using language features and trusting that smarter people have put more effort in optimizing the language itself is often more optimised than what we can come up with ourselves.
    One classic C trick was to not use multiply/divide and instead add or subtract bitshifted values. However, in modern systems that no longer take dozens of clock cycles to multiply, the compiler knows better and will just replace your whole expression with a multiply.

    • @Demonslay335
      @Demonslay335 11 months ago +18

      The compiler may even optimize your division into multiplication or bit shifts, and all other kinds of fun wizardry. It really is more important to have readable code in many cases nowadays.

    • @CottidaeSEA
      @CottidaeSEA 11 months ago +2

      @@Demonslay335 In some cases it doesn't do what you want it to do; but only then is it worth looking into optimization. We should always be aware of the performance impact our code has, but there's no need to go crazy about optimization before you even have any performance information to work with.

    • @hwstar9416
      @hwstar9416 6 months ago

      usually compiler isn't as smart as you think. It can do simple optimizations like the one you mentioned, but anything slightly more complex it fails at.

    • @harrytsang1501
      @harrytsang1501 6 months ago

      @@hwstar9416 That is because you are using more dynamic languages. With more strict rules for memory safety and type setting, newer languages like Rust and Zig are doing wonders. It doesn't save you from algorithm problems with Big O of n cube tho

    • @hwstar9416
      @hwstar9416 6 months ago

      ​@@harrytsang1501 I don't use dynamically typed langs, I use C/C++.
      People often overestimate how optimizing the compiler is, it's not as impressive as you think

  • @CodeKujo
    @CodeKujo 10 months ago +2

    The advice to avoid count(*) predates mysql. It may have even been true in mysql at some point or some obscure schema. As you said, count(*) relies on the optimizer to do the right thing. I use count(0) myself, but I wouldn't be surprised if sometimes you have to be more specific to get the right query plan. I think the biggest lesson in this video is not to rely on advice, but to check the plan and know that counting an index can be faster--which is great advice!

    • @ivanskyttejrgensen7464
      @ivanskyttejrgensen7464 9 months ago +1

      When I got a job that involved Oracle 7 in 1999 the DBA told me to use count(*) because the old hack with count(1) wasn't needed anymore. So it was presumably true at some point in the 90s.

    • @davidlean8674
      @davidlean8674 9 months ago +2

      I was teaching performance tuning on behalf of a database vendor in 1989. COUNT(*) was the recommended approach for SQL Server (both Microsoft & Sybase), Oracle, DB2, & Ingres.
      So yes, it predates MySQL. The syntax alternative was to specify a column name, but that was only if you wanted to find the count of non-null fields in that column or expression.
      Count(constant) was never necessary on any platform I've used. Yet lots of people suggested it. Most had minimal clue about DB internals or query optimisation.

    • @ABaumstumpf
      @ABaumstumpf 9 months ago +1

      @@ivanskyttejrgensen7464 I have used it on oracle6 so at least even back then the advice to use count(1) was already outdated. Likely it was something for pre-ansi sql.

    • @christianbarnay2499
      @christianbarnay2499 9 months ago +1

      But all queries rely on the optimizer. The optimizer is the core of the querying engine. As long as you don't mess with the execution plan by forcing a path through hints, the optimizer will always kick in and do its job.

    • @BlairdBlaird
      @BlairdBlaird 9 months ago

      @@christianbarnay2499 a big difference is that count(*) semantics are defined by the SQL standard itself, so its optimisation is a lot more likely than count(constant) being recognised as equivalent... to count(*).

  • @Zach2825
    @Zach2825 10 months ago

    Wow, thank you!

  • @wadecodez
    @wadecodez 11 months ago

    Your thanksgiving conversations sound interesting

  • @hieungo770
    @hieungo770 10 months ago

    I love how you explain things, is there any course from you that teach from the ground up

    • @PlanetScale
      @PlanetScale 10 months ago

      Check out our MySQL for Developers course: planetscale.com/learn/courses/mysql-for-developers/introduction/course-introduction

  • @devhaua
    @devhaua 9 months ago

    One of the best videos I have ever watched on SQL, tq

    • @PlanetScale
      @PlanetScale 9 months ago

      Thank you! Love hearing that

  • @odysseus655
    @odysseus655 10 months ago +2

    I stopped using count(*) decades ago when I ran into a problem with our database engine at the time where there was some catalog corruption and this was erroring out with "column not found error". Lately I've been using count(1). Likely not a great reason to continue not using it (and I'm expecting the execution plan to be the same in any case).

  • @mortona42yt
    @mortona42yt 10 months ago +2

    I knew the db optimizes the query, but didn't know details like this. I would be surprised if after 20+ years of development, it would interpret count(*) as "load everything from the table and count it". What about when you don't have a secondary index, or doing a join query? Probably still counts the returned rows, or some optimization with joined indexes?

  • @paulthomas2577
    @paulthomas2577 10 months ago

    This changes by database server. This is true for MySQL, but SELECT count(*) is much slower in SQL Server.
    In SQL Server, the way I learned to do it was SELECT count(1).

  • @jannickbreunis
    @jannickbreunis 11 months ago

    "When you're arguing with your family…" haha nice one.

  • @mariomario4676
    @mariomario4676 8 months ago

    thanks for the educational content

  • @smeedee
    @smeedee 8 months ago

    You could also use the „rows“ from the explain query. In some use cases this is already good enough ^^

  • @YOUdudex
    @YOUdudex 11 months ago

    Interesting, thanks ✌️

  • @nicolasguillenc
    @nicolasguillenc 10 months ago

    you are amazing at explaining things man

  • @greatestuff
    @greatestuff 9 months ago

    Great video

  • @svenroettjer
    @svenroettjer 10 months ago

    Thanx

  • @martymoo
    @martymoo 10 months ago

    I didn't know this. Thanks! Has it always been this way?

  • @hasenhirn1965
    @hasenhirn1965 9 months ago

    You never finish learning
    Great explanation 👍

  • @quintennn
    @quintennn 5 months ago

    "Tell your family on thanksgiving" It would take me about 5 winters to explain this to my family.

  • @nm6x
    @nm6x 11 months ago

    This video should have more views and likes ❤

  • @XaviSanz35
    @XaviSanz35 10 months ago +1

    If you remove the secondary indexes and run the count again, how long does it take?

  • @dennisdashkevich
    @dennisdashkevich 11 months ago +2

    Nice video, thank you!
    And what's the time complexity of this operation in MySQL? Is it linear? How would you go about counting records in a table with millions of rows?

    • @IARRCSim
      @IARRCSim 10 months ago

      O(n) but not all O(n) algorithms take the same time in practice. The constant factor of n is probably several times different.

  • @Im_Ninooo
    @Im_Ninooo 11 months ago +6

    I've been using COUNT(1) for a while now. I wonder if the behavior is the same in other databases such as CockroachDB

  • @arithex
    @arithex 10 months ago +1

    Is 'select count' still always going to be a table-scan query? Or are there internal optimizations, that maintain a count of active rows in a table.. maybe an in-memory cache that's updated atomically with insert/delete operations?

  • @DreanPetruza
    @DreanPetruza 10 months ago +1

    Is this in ANSI SQL too? or what would be the advantage of supplying the column name in COUNT()? doesn't COUNT() skip the rows that have NULL in the specified column?

  • @sadhakbj
    @sadhakbj 8 months ago

    Love this video. Love the way he teaches.

    • @PlanetScale
      @PlanetScale 8 months ago

      ❤️ thank you so much

  • @surgeon23
    @surgeon23 10 months ago

    I wonder if it's supposed to be the same for oracle because at one point we had a significant performance drop when using count(*) compared to count(id).
    Maybe just another of those optimizer issues.

  • @airjuri
    @airjuri 10 months ago

    You should also create index for columns that are used for whatever is usually in "where" ;)

  • @lq_yt
    @lq_yt 10 months ago

    on mysql

  • @dealloc
    @dealloc 9 months ago

    Even in SQLite, COUNT(*) takes fewer opcodes than COUNT(1) and COUNT(id), as it just reads a value that is already stored instead of having to aggregate.

  • @MohamedAmer-hn1tv
    @MohamedAmer-hn1tv 11 months ago +13

    Great video, It was incredibly well-presented.
    By any chance, would it be possible to remove the credit card requirement for creating free DB? Thanks a bunch!

  • @StEvUgnIn
    @StEvUgnIn 11 months ago

    Well said

  • @qkktech
    @qkktech 9 months ago

    COUNT(*) asks for the record count, and that is a table property. Using SELECT * in a columnar database is not optimal, since the columns are retrieved separately and the records are then put together. In columnar databases, the distinct count of a nullable column is also a column property, and sometimes min, max, etc. as well. Vertica, for example, has no indexes at all.

  • @SR-ti6jj
    @SR-ti6jj 11 months ago +18

    Does this mean adding a non-null secondary index will improve count performance on tables that don't already have one?

    • @parkamark
      @parkamark 10 months ago

      You could create a secondary index on the same column(s) as the primary index. That would then speed up counting, unless the DB engine is doing some clever stuff under the hood that means you don't necessarily have to do that. But given what he's said in this video, creating a secondary index on the same column(s) as the primary is certainly a good workaround. Maybe someone who knows more than I do could clarify this point.

    • @parkamark
      @parkamark 10 months ago

      To answer your question directly, having ANY index on a table is way better than none at all, both for searching and for counting. So having a simple non-null, non-unique index would be the minimal requirement for fast counting. It also matters whether the columns are fixed- or variable-length data types, e.g. int is fixed length, varchar is variable. If all columns are fixed, then the database can also do a fast count without any indices because it knows that the length of each row is fixed, and it knows the full size of the entire table, and thus what the number of rows must be.
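
One way to test this kind of claim instead of guessing is to ask the planner. The sketch below uses Python's built-in sqlite3 module, so it shows SQLite's planner and not MySQL's; the `people` table, the `idx_people_status` index, and the `plan` helper are all hypothetical. It prints the query plan for COUNT(*) before and after adding a narrow secondary index, so you can see whether the planner chooses to count through the index. On MySQL the equivalent check would be running EXPLAIN on the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, bio TEXT, "
    "status INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO people (name, bio, status) VALUES (?, ?, ?)",
    [(f"user{i}", "x" * 200, i % 3) for i in range(1000)],
)


def plan(sql):
    """Return the human-readable steps of SQLite's query plan for sql."""
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT COUNT(*) FROM people"
plan_before = plan(query)

# A narrow secondary index gives the planner a smaller structure to walk.
conn.execute("CREATE INDEX idx_people_status ON people (status)")
plan_after = plan(query)

total = conn.execute(query).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
print("rows:  ", total)
```

Whether the second plan mentions the index is the planner's decision and may differ by version, which is exactly why it is worth printing.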

  • @CirTap
    @CirTap 10 months ago

    Thank you for saving everyone's Thanksgiving! 😂

  • @nickzelenskyy7290
    @nickzelenskyy7290 10 months ago

    Queries with dynamic attributes, functions, etc. don't get cached at the DB engine level if you use caching there, so in that case it does affect performance.

    • @ABaumstumpf
      @ABaumstumpf 9 months ago

      That is not true. Not only does this behaviour depend on the DB, but also on which function is used: most DBs have some notion of pure functions (functions that always return the same result for the same input and have no outside side effects).
      Heck, we are using them. A couple thousand of those functions, even.

  • @rosieroti4063
    @rosieroti4063 10 months ago +1

    Great info. However, I'd also like some more insight into count(*) when the query has a WHERE clause.
    Since count(*) uses the smallest non-null secondary key, will it be "slower" if I'm counting rows where a column value is null (or perhaps some other WHERE clause combination that might include nulls)?

    • @imacomputer1234
      @imacomputer1234 9 months ago

      It won't be slower, but it will only count rows where that column isn't null!

  • @nskeip
    @nskeip 10 months ago +1

    I should create a database engine where COUNT(*) is "multiply number of columns by the number of rows", COUNT(/) is "divide rows by columns", COUNT(+) is ... (you get the idea)

  • @victornogueira2346
    @victornogueira2346 11 months ago

    More content about SQL, please!

  • @Petoj87
    @Petoj87 9 months ago +1

    It would be interesting to know if the same is true for other SQL engines like SQLite, Postgres, and SQL Server.

  • @jameslucas5590
    @jameslucas5590 10 months ago

    I'm a vet and will use *. I don't care if it bothers anyone, because I just want to move on. However, I love this video and the knowledge you share.

  • @therealcomment5622
    @therealcomment5622 6 months ago

    I can't wait to go to the next thanksgiving.

  • @Peter-Ja
    @Peter-Ja 9 months ago

    Looking forward to the next argument with my family about the performance of the SQL count operation. Everyone will be so excited.

    • @PlanetScale
      @PlanetScale  9 months ago

      Hopefully you win! "Happy Thanksgiving, y'all don't know anything!" - Peter, probably

  • @violin245
    @violin245 7 months ago

    How is someone talking about SQL this charming?

  • @ZanarkandStarplayer
    @ZanarkandStarplayer 11 months ago

    Now I'm ready for the holidays 💪🏽

    • @PlanetScale
      @PlanetScale  10 months ago

      Go get em!

  • @rumisbadforyou9670
    @rumisbadforyou9670 11 months ago +2

    I never got into using SQL databases, but having written a few on-disk and over-the-network data structures, I'd expect count(*) to be smart enough to use some cached "total_length" value, especially considering how much effort went into writing query optimizers. I guess people believe the myth because they lack the experience of writing a data store themselves?

    • @sohn7767
      @sohn7767 11 months ago

      That would be easy enough on a simple table query; in fact, count() and indexes probably do that. However, whenever you call a function, a view, or any script with at least a little logic, the final length is unknown.

    • @brdrnda3805
      @brdrnda3805 11 months ago +1

      I never wrote a data store, but worked with relational databases for almost 25 years. Still, assuming COUNT(*) wouldn't be fast and it would process the whole record sounds utterly absurd to me. (And, honestly, I never heard that myth)

  • @jeffmccloud905
    @jeffmccloud905 10 months ago +2

    Actually, WAY back in the day in Oracle (1990s), it was recommended to use SELECT COUNT(1) and not COUNT(*) because it actually did make a difference. They fixed that, but some grizzled old devs kept the convention.

    • @edbutler3
      @edbutler3 10 months ago +1

      Yeah, I've been using count(1) on Oracle for decades. I've suspected for a while that recent versions have a smart enough query optimizer to do the right thing with count(*), but I haven't taken the time to verify. And for MS SQL Server, the "culture" has always been to use count(*), so I've assumed it's ok there.

    • @PanduPoluan
      @PanduPoluan 10 months ago

      I always use COUNT(1). Because COUNT(*) depends totally on the engine's optimisation.

  • @alexpetrov8871
    @alexpetrov8871 8 months ago

    0:55 >optimized
    This optimization is specific to MySQL; in other DBs it may not hold, and it can even depend on the DB version. So for SQL in general, count(id) is better (if there is a suitable index, of course).

  • @wmafendi
    @wmafendi 10 months ago

    new thing for me. TQ

  • @jayakarthikreddy
    @jayakarthikreddy 6 months ago

    I checked the EXPLAIN plan of count(*) on a table, and the optimiser even picked a secondary index on a nullable column.

  • @Ceelbc
    @Ceelbc 11 months ago +5

    With a proper SQL implementation, this should not matter; the compiler should handle this.

    • @Ceelbc
      @Ceelbc 9 months ago

      @lawrencechiasson975 *which is part of the compiler.

  • @adamtretera273
    @adamtretera273 10 months ago

    Fire video ❤

    • @PlanetScale
      @PlanetScale  10 months ago

      Thanks 🔥