Stop using COUNT(id) to count rows

PlanetScale

มุมมอง 375 865

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 9 ม.ค. 2025

ความคิดเห็น • 611

@teej_dv ปีที่แล้ว ⁺⁴²³
What about counting in OurSQL?
@aarondfrancis ปีที่แล้ว ⁺⁶⁹
I can always count on you! YouSQL
@AmitMerchant ปีที่แล้ว
😆@@aarondfrancis
@dmsalomon ปีที่แล้ว ⁺⁴⁶
Chill comrade, we don't need to expropriate the database engine for the proletariat
@ejaz787 ปีที่แล้ว ⁺⁶⁴
in soviet Russia you don't count(*) , * counts you
@XmasApple ปีที่แล้ว ⁺⁸
Did you mean YQL used in YDB?
@DoubleM55 ปีที่แล้ว ⁺¹⁹¹
I had a strong feeling that something like this is going on in DB engine, but only "argument" I had is pretty weak: "It would be very stupid if DB engine pulled all data from all columns just to pass it to count() function and return single int", and it would be very easy to implement such optimization. So I was using select(*) while simply trusting the DB to do the right thing.
Thanks to this great video, now I have a confirmation and also know exactly how DB decises how to count rows in optimal way. Great video :)
@PlanetScale ปีที่แล้ว ⁺²³
Nice to have your gut feeling proven right! Glad you liked it.
@QWACHU ปีที่แล้ว
This was maybe true far far in a past, but many years ago count(*) was optimized in DB engines to not use whole columns.
@SergeDuka ปีที่แล้ว ⁺³
I suggest using ‘count(1)’ to specifically show that no data is used. It eliminates any confusion.
@timucinbahsi445 ปีที่แล้ว ⁺³
I think this is one of the most important skills in programming. Knowing you are not the only smart person around :) If it's so obvious to us, developers of the engine might have thought of it as well. Modesty W
@ErazerPT ปีที่แล้ว ⁺¹⁹⁸
The most important take from the video is not about count(), but generically that (if speed is critical) you should always review the execution plan. Easier to do while at the prototyping phase, but can still be accomplished in production, just needs more testing and QA.
@meenstreek ปีที่แล้ว ⁺²⁷
The problem is the execution plan the database will use while in development with very little data will be different to the execution plan in production when the query is now traversing millions of rows. So you just have to make educated guesses and use experience to know when a query will work or not and then iterate.
@wasd3108 ปีที่แล้ว ⁺¹⁰
I don't think you got the main point. It's about shutting down your uncle when everyone is around.
@christianbarnay2499 ปีที่แล้ว
If speed is critical you should always write your queries in the most simple and easy to understand way. Optimizing the execution plan is one of the main purposes of the DB engine. Just let it do its job without interfering. When performances degrade, the first response is to update statistics to let the engine have an accurate view of the data. If the engine still fails at finding the fastest path, rewrite your query in a better way. And ultimately give it directions as a last resort solution.
@ErazerPT ปีที่แล้ว
@@christianbarnay2499 "If the engine still fails at finding the fastest path, rewrite your query in a better way". That's what i said. "Review the execution plan" not "optimize the execution plan". If you can see that the DB engine can't make heads from tails of what you want, THEN you need to think on what/where to change so that it can. And waiting for degradation is a bad idea, because it can take long enough that when you NEED to make changes you might find out you CAN'T make them anymore (without breaking stuff).
Also, most people writing queries are software developers with not that much DB background. You can't expect them to write "excellent queries" or design "excellent schema"... As a "dumb software developer" i count my blessings when i have really good DB people around so i can show them my stuff and they can keep me from shooting my own feet :D
@christianbarnay2499 ปีที่แล้ว
@@ErazerPTDatabases are so central to software that I can't label someone a software developer if they can't write decent SQL requests. And SQL is so easy to understand it only takes a couple hours to get the basic knowledge that will fulfill 90% or your needs.
@dimasshidqiparikesit1338 ปีที่แล้ว ⁺²⁸¹
This is actually incredibly educational. Thanks, planetscale!
@binaryfire ปีที่แล้ว ⁺¹²
"You'll have won the argument, which is what the holidays are all about." 🤣
@ChristianPalominoRiquero ปีที่แล้ว ⁺⁶³
A couple of corrections:
The COUNT() instruction can receive as parameter an EXPRESSION or a wildcard, that is to say, you could write: COUNT(*) or COUNT(1), COUNT(pepito), COUNT(id) or COUNT(99999) which will give you the same, the inside is considered a wildcard, the "*" is used as a wildcard by convention , but we could use any character because by definition, it doesn't use information about any particular column (and including for the COUNT the rows that contains NULL values in any column).
In the case that you comment the COUNT(id) and the COUNT(*) bring the same result because the "id" is declared as if it was a wildcard so the behavior is the same and the server takes the license to optimize the process as you have explained in the video.
But, if you really wanted to count the values of a field, the correct way would be to specify COUNT(ALL id) and this expression does have a difference with respect to the COUNT(id), and it is because it will only consider for the count the NON NULL values inside that field In the case of the example of the video COUNT(id) and COUNT(ALL id) should return the same result, since the "id" field, being a primary key, would never be empty, but the difference would be that you would force the server to use the index of the primary key to execute the COUNT(ALL id).
Finally, while it is true that the server often saves us from ourselves, it is not exactly true that it always makes the best decisions, as a DBA with over 10 years of experience I have found myself in several situations where after checking the execution plan I realize that the server is taking a not so optimal index for the instruction that has been requested and you have to address it to use the correct index for some instruction, this is seen quite often in big data querys.
@myonlynick ปีที่แล้ว ⁺⁷
i partially disagree with your second paragraph. I Put it to the test. I Downloaded the open source 'world database' for mysql. I ran 2 queries. select count(*) from country; which gives a reply: 239. The second query is: select count(IndepYear) from country; which gives a reply: 192. Indepyear is not a primary key and has several NULL values. IF you are wondering: select count(ALL IndepYear); returns the value 192 as well. Hence, in mysql 'ALL' is optional.
@LeeKao ปีที่แล้ว
I felt my brain growing as I was reading your comment
@DmitriyYankin ปีที่แล้ว ⁺⁴
Why someone liking this absolutely wrong answer? count() depends on particular columns. count(id) will count only NON NULL ones. COUNT(id) and COUNT(ALL id) are absolutely the same exepression as count(id) is implicitely ALL.
@ChristianPalominoRiquero ปีที่แล้ว ⁺¹
@@DmitriyYankin read the damn documentation before you said something. Also, obviously some databases work a bit different from what I said, if you use other SQL database read you own documentation 🙄.
@DmitriyYankin ปีที่แล้ว
@@ChristianPalominoRiquero yep, read it yourself. dev mysql com: "COUNT(expr) Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement." ... "COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values." ...
@bazcuda ปีที่แล้ว ⁺⁵⁷
"todos" also means "everything" in Spanish. So, "Select * from todos" means "select everything from everything". Terribly inefficient but it'll be great fun writing the code to sort out what we want from the results 😜
- a grizzled vet.
@CraftDownloads ปีที่แล้ว ⁺²
I get that, also works in portuguese 😂
@GuilhermeMG ปีที่แล้ว ⁺⁴⁶
With count(1) you get the performance boost without all the confusion
@PlanetScale ปีที่แล้ว ⁺¹⁵
Yup! Counting with a constant is totally a viable option. See 06:09.
@CottidaeSEA ปีที่แล้ว ⁺⁵
It's what I always do, because I feel it is the most expressive.
@fano72 ปีที่แล้ว ⁺¹
I also like to do that. No data is needed from the rows to count them.
@TehKarmalizer ปีที่แล้ว
This is what I've been doing for ages. Using count(*) at least has been slower in some older or obscure database engines. And I've worked with several.
@pukkimi ปีที่แล้ว ⁺⁵³
I might call myself a somewhat senior programmer. Sometimes query optimizers did not realize to use a clustered or whatever else indexes of a table when using count (*). This happened at least on Oracle 8 and the workaround was to use count ([indexed column]) where [indexed column] = something. Count (*) caused full table scan in some cases, at least if table contained lobs. So there might really be a reason why some grey beards warn on count(*). When in doubt, check the execution plan.
@TheGreatAtario ปีที่แล้ว ⁺¹³
I think the real takeaway here is that Oracle sucks
@pukkimi ปีที่แล้ว ⁺⁹
@@TheGreatAtario Sir you are most certainly correct on Oracle, but there are same kind of stupid behaviors in almost every database as far as I know. Not this fault but many more and different. The real takeaway is that always check the execution plan :)
@alexU42k ปีที่แล้ว ⁺²
It is always a challenge to figure out if something was done on purpose or by lack of knowledge or any other reason
@DmitriyYankin ปีที่แล้ว
Why count should use index?
@alexU42k ปีที่แล้ว ⁺³
@@DmitriyYankinit is up to query optimizer to use index or not, but in general it uses cheaper solution (less IO operations)
@ragsChannel ปีที่แล้ว ⁺³⁵
A nice gotcha indeed! One question : this "optimization" -- is it applicable ONLY to MySQL or is also the case with say, Postgresql ??
@romanstingler435 ปีที่แล้ว ⁺³
in postgres the star is not necessarily the fastest, it also depends if an index is used or just a scan is used, due to the amount of data in the table, and if auto vacuum was successful recently.
@Keelyn1984 ปีที่แล้ว
It applies also to oracle. Count(*) counts the rows not the data. But keep in mind that inline-views or subselect still have to fetch data for the sql to even work.
Using a constant instead of * is also a common (most offen 1 is used) viable alternative.
@PaulSebastianM ปีที่แล้ว ⁺⁷⁰
primary keys are clustered, aligned to disk clusters, physically, so counting them means traversing the disk to gather the count, and if the order on disk is not adjacent, then the index is fragmented so counting can take a lot of time, while non primary keys or indexes are no clustered, meaning they don't need to follow the disk physical alignment so they are most often stored off-table in a much more compact data structure, which even when it gets fragmented, the data is still going to be close to each other because all the data structure holds is index records, not every row in the table, like what each clustered index follows.
@aarondfrancis ปีที่แล้ว
Correct!
@jvapr27 ปีที่แล้ว ⁺⁵
Very cool to know.
FYI: I think for some databases this is not true though.
Clustering can be set differently than primary keys.
For db2, for example primary indexes are not by default clustered.
For databases like snowflake, they do not index the primary key.
Each DB may be different. Still very cool. Thanks!
@PaulSebastianM ปีที่แล้ว ⁺²
@@jvapr27 correct, you can have one single index that can be clustered because that orders the records of that table physically. But that index doesn't have to be the primary key though some dbs might enforce that.
@debasishraychawdhuri ปีที่แล้ว ⁺³
count(*) and count (id) are semantically equivalent, the database should not behave differently for those queries.
@PaulSebastianM ปีที่แล้ว ⁺¹
@@debasishraychawdhuri well, no. One is explicit, the other is implicit. Implicit means suggested but not expressly stated. Thus they are completely different. That is logical deduction.
@SR-ti6jj ปีที่แล้ว ⁺¹⁸
Does this mean adding a non-null secondary index will improve count performance on tables that don't already have one?
@parkamark ปีที่แล้ว
You could create a secondary index on the same column(s) as the primary index. That would then speed up counting, unless the DB engine is doing some clever stuff under the hood that means you don't necessarily have to do that. But given what he's said in this video, creating a secondary index on the same column(s) as the primary is certainly a good workaround. Maybe someone who knows more than I do could clarify this point.
@parkamark ปีที่แล้ว
To answer your question directly, having ANY index on a table is way better than none at all, both in the case of searching and counting. So having a simple non-null non-unique index would be the minimal requirement for fast counting. It also matters if the columns are fixed or variable length datatypes, eg. int is fixed length, varchar is variable. If all columns are fixed, then the database can also do a fast count without any indices because it knows that the length of each row is fixed, and it knows the full size of the entire table, thus what the number of rows must be.
@artemisamberdrive583 ปีที่แล้ว ⁺³
for specific purposes (e.g. extremely large table, statistics, etc) i set up a count-table with a single row and column, holding the information of the count of rows of the "parent table". this requires setting up triggers on the parent table insert+delete procedures to increment+decrement the value of the count-table.
keep in mind that this setup slows down the process of writing data, but data is usually read many times in contrast to written.
@GuruEvi ปีที่แล้ว ⁺⁵
That seems extremely 'hacky' and you end up doing the same thing an auto_increment lock does (with more hoops) and if your system gets busy or needs to scale you basically lose any concurrency. Also makes your application a whole lot less portable and you could make a mistake (eg. most examples I've seen online, do not take into account that an INSERT or DELETE can target multiple rows, but the trigger only gets called once, so now you need looping logic or you have a bug). Not sure if you actually "need" a count if you're working with that many records, but most database engines can provide an estimate or perhaps, you may be able to use a different database system altogether that is better optimized for providing statistical information.
@SXsoft99 ปีที่แล้ว ⁺¹
yes and at the same time you need to keep on separate columns all the conditions combinations for filtering, also good luck finding programmers that actually use things like triggers since it hides the application business logic in the database
@christianbarnay2499 ปีที่แล้ว
I don't know for MySQL but most DB engines already do that on their own. They have a "table information" table that contains all the metadata of each table, including the row count.
select count(anything not nullable) without a where clause will automatically trigger a lookup in that table to get the current count of rows of the selected table.
@lostcarpark ปีที่แล้ว ⁺⁸
Thanks for this. I've had this argument so many times. I think some early SQL engines did look at all columns for count(*), but I believe pretty much all of them have this optimization at this point.
@АлександрС-ш5ь ปีที่แล้ว ⁺²
I believe it is a key point. Some early SQL engines did not have that optiomization, now they have it. To make a coclusion with a smart look now, is not exactly right. So man should make it automatically and even do not spend time, always use `count(1)` and not to worry about the peformance or search for confirmations. That solution works everywhere.
@TravisTennies ปีที่แล้ว ⁺³
Just subscribed. Hard to find people who actually talk real facts these days.
@andrewkamoha4666 ปีที่แล้ว ⁺¹
6:15 "Over Thanksgiving, when you're arguing with your family about whether count(*) or count(id) is faster"
I'm sure grandma will love to talk about that ...
@aarondfrancis ปีที่แล้ว
My dad was actually a DBA, so there's a non-zero chance it will come up
@TheyCalledMeT ปีที่แล้ว ⁺³
which also means it needs a secondary non null index to function the way you indicated.
redo the trial without such an index to see what it does
@harrytsang1501 ปีที่แล้ว ⁺³⁷
Yes, very often the problem you are trying to solve is more generic, and can be expressed in more generic terms.
Using language features and trusting that smarter people have put more effort in optimizing the language itself is often more optimised than what we can come up with ourselves.
One classic C trick was to not use multiply/divide and instead add or subtract bitshifted values. However, in modern systems that no longer take dozens of clock cycles to multiply, the compiler knows better and will just replace your whole expression with a multiply.
@Demonslay335 ปีที่แล้ว ⁺¹⁸
The compiler may even optimize your division into multiplication or bit shifts, and all other kinds of fun wizardry. It really is more important to have readable code in many cases nowadays.
@CottidaeSEA ปีที่แล้ว ⁺²
@@Demonslay335 In some cases it doesn't do what you want it to do; but only then is it worth looking into optimization. We should always be aware of the performance impact our code has, but there's no need to go crazy about optimization before you even have any performance information to work with.
@hwstar9416 ปีที่แล้ว
usually compiler isn't as smart as you think. It can do simple optimizations like the one you mentioned, but anything slightly more complex it fails at.
@harrytsang1501 ปีที่แล้ว
@@hwstar9416 That is because you are using more dynamic languages. With more strict rules for memory safety and type setting, newer languages like Rust and Zig are doing wonders. It doesn't save you from algorithm problems with Big O of n cube tho
@hwstar9416 ปีที่แล้ว
@@harrytsang1501 I don't use dynamically typed langs, I use C/C++.
People often overestimate how optimizing the compiler is, it's not as impressive as you think
@airbornesnail ปีที่แล้ว ⁺¹
"You'll have won the argument which what holidays are all about" - best sentence I've ever heard. :D
@quintennn 11 หลายเดือนก่อน
"Tell your family on thanksgiving" It would take me about 5 winters to explain this to my family.
@Im_Ninooo ปีที่แล้ว ⁺⁶
I've been using COUNT(1) for a while now. I wonder if the behavior is the same in other databases such as CockroachDB
@odysseus655 ปีที่แล้ว ⁺²
I stopped using count(*) decades ago when I ran into a problem with our database engine at the time where there was some catalog corruption and this was erroring out with "column not found error". Lately I've been using count(1). Likely not a great reason to continue not using it (and I'm expecting the execution plan to be the same in any case).
@Metruzanca ปีที่แล้ว ⁺²
This is the type of content I didn't know I wanted. More please.
@spaceemotion1 ปีที่แล้ว ⁺¹⁷
In a client project a few months back, we had to remove pagination from an admin dashboard, because COUNT(*) to get the no. of pages was timing out. even after hours of letting it run. The table we tried to scan was over 5TB in size. How would you count these kinds of tables in MySQL? (we couldn't easily migrate to something like Postgres for other reasons).
@PlanetScale ปีที่แล้ว ⁺¹⁷
In those scenarios, you'd probably be better off using something like cursor (or keyset) pagination. Check out this post for more info on that! planetscale.com/blog/mysql-pagination
@spaceemotion1 ปีที่แล้ว ⁺⁵
@@PlanetScale that's what we ended up with, but we still had no clue how many records there actually are 😅 (for internal analytics)
@aarondfrancis ปีที่แล้ว
@@spaceemotion1 This is totally the kind of thing PlanetScale Boost can solve for! th-cam.com/video/sJVzvBLDqqw/w-d-xo.html
@reikooters ปีที่แล้ว
If you're happy with an approximate number, in MSSQL we use: select rows from sys.partitions where object_id = Object_Id('my_table');
Based on a google search, you can do the same in MySQL: select table_rows from information_schema.tables where table_name = "my_table";
Or alternatively: select num_rows from information_schema.innodb_sys_tablestats.name = "my_table";
This uses a pre-calculated value and doesn't actually touch the table itself, so you get an answer immediately, although the value is not constantly updated every insert, hence approximate. I was actually surprised not to see this approach mentioned in the video, but I guess it was just a "myth buster" video - even though I swear I saw the thumbnail said "fastest way to count".
You could also keep your own separate table where you keep accurate counts every time you do an insert, and use that in your application when you want to show the total number of rows.
@SimonBuchanNz ปีที่แล้ว ⁺¹
I'd be interested to see if count(1) worked?
@anothermouth7077 ปีที่แล้ว ⁺¹
Two thumbs up to you. My TL was at times bugging me on some reviews. Even when I showed him SQL docs saying that most optimisation is already being done by the MySQL there's no need to over engineer the solution.
@marioomarti815 11 หลายเดือนก่อน ⁺¹
8. Conceder el rol creado a USU2 y USU3.
GRANT ROL_USU TO USU2, USU3
9. Conceder a USU4 privilegios sobre algunas tablas con la opción de poder concedérselos también
a otros usuarios.
GRANT INSERT, DELETE ON TABLA1 TO USU4 WITH GRANT OPTION;
GRANT INSERT, DELETE ON TABLA2 TO USU4 WITH GRANT OPTION;
@kentlarsson1263 ปีที่แล้ว
Great stuff! Love that you mix in a bit of fun with the content, it's what got me to subscribe!
@michelprovencher8518 ปีที่แล้ว
Like the EXISTS function where the "SELECT * FROM ..." is just a predicate for the function to work and only the WHERE clause is meaningful
@alexpetrov8871 ปีที่แล้ว
0:55 >optimized
This optimization is specifically MySQL case, in other DB it may not be so and can even depend on DB version. So generally for SQL count(id) is better (if there is suitable index of course).
@freibuis ปีที่แล้ว ⁺²
This was not always the case though. Count(*) was always slower. Not by a little.. by a lot. That saying i havent used my mysql in 10 years. But with my experience with postgres, it didnt matter
@spacemanmat ปีที่แล้ว ⁺¹
Always nice to see how the optimiser is working under the covers. I’ve seen a few cases where the original program had done something dumb but the optimiser picked up the issue an optimised the issue away. Still makes me uneasy relying on it though.
@XaviSanz35 ปีที่แล้ว ⁺¹
if you remove secondary indexes, and run count again, how long it takes?
@nskeip ปีที่แล้ว ⁺¹
I should create a database engine where COUNT(*) is "multiply number of columns by the number of rows", COUNT(/) is "divide rows by columns", COUNT(+) is ... (you get the idea)
@jeffmccloud905 ปีที่แล้ว ⁺²
actually, WAY back in the day in Oracle (1990s), it was recommended to use SELECT COUNT(1) and not COUNT(*) because it actually did make a difference. but they fixed that. but some grizzled old devs kept that convention.
@edbutler3 ปีที่แล้ว ⁺¹
Yeah, I've been using count(1) on Oracle for decades. I've suspected for a while that recent versions have a smart enough query optimizer to do the right thing with count(*), but I haven't taken the time to verify. And for MS SQL Server, the "culture" has always been to use count(*), so I've assumed it's ok there.
@PanduPoluan ปีที่แล้ว
I always use COUNT(1). Because COUNT(*) depends totally on the engine's optimisation.
@Petoj87 ปีที่แล้ว ⁺¹
Would be interesting if the same is true for other sql engines like litesql, postgres, sql server and postgres
@jacobstamm ปีที่แล้ว ⁺²
Doesn’t the compiler optimization demonstrated at 5:25 mean the recommended usage of count(*) makes no difference in practice?
@aarondfrancis ปีที่แล้ว ⁺¹
People use COUNT(id) because they think they are being helpful, when in reality they are not. If they are using COUNT(id) to mean COUNT(*), it's better to make that intent clear. Thanks for the comment!
@jannickbreunis ปีที่แล้ว
“When you’re arguing with you family…” haha nice one.
@lq_yt ปีที่แล้ว
on mysql
@TeslaTechNow ปีที่แล้ว ⁺¹
I felt like the video was not finished. I didn't see you run any thing. You just said it was faster, but I wanted to see the numbers.
@marioomarti815 11 หลายเดือนก่อน ⁺¹
7. Crea un rol que permita conectarse a la base de datos y hacer SELECT sobre las tablas creadas
anteriormente.
CREATE ROLE ROL_USU;
GRANT CREATE SISSION TO ROL_USU;
GRANT SELECT ON TABLA1 TO ROL_USU;
GRANT SELECT ON TABLA2 TO ROL_USU;
GRANT SELECT ON TABLA3 TO ROL_USU
@GhiveciuMarian ปีที่แล้ว
This make sense when you select all rows from table, but throw in there any WHERE clause, or any filtering then the advantage might evaporate. The hardest thing i found was to display results on filtering ... In this for example might you want to show how many todos are done from total. This specific table does not have the field 'done' but if its having done, and was not specified in a key it will result in a table scan.
@devhaua ปีที่แล้ว
One of the best videos I haveever watched on SQL, tq
@PlanetScale ปีที่แล้ว
Thank you! Love hearing that
@RockTheCage55 ปีที่แล้ว ⁺⁴
Granted it "might" be based on the sql engine: Oracle, Sql Server, Postgres, ect.... i usually do `select count(1) from MyTable;`. Whether that is faster or not i don't know & it might depend on the engine again.
@PlanetScale ปีที่แล้ว
Right. We are specifically talking about MySQL here.
@stLmpp ปีที่แล้ว
I also use count(1), don't know if it's slow or faster than count(*)
@HoSza1 ปีที่แล้ว
Why do you write "ect"? Is it a typo?
@brentfoo3938 ปีที่แล้ว ⁺¹
@@HoSza1 no that means i didn't list every database engine out there. There are many more :).
@jige1225 ปีที่แล้ว
@@brentfoo3938 then 'etc.' (and just one '.')
@medilies ปีที่แล้ว
Maaan, I didn't know this channel is posting such content :0 I liked it, subscribed and activated notifications.
@qkktech ปีที่แล้ว
count star is asking recordcount and that is table property. When select * is used in columnar database then it is not optimal since in case columnar databases columns are retrieved separate and then records are put together. In case of columnar databases for null allowed columns also distinct count is also column property sometimes also min and max etc. Vertica has for example no indexes at all.
@Neakas ปีที่แล้ว ⁺¹
This is also true in MS-SQL. it will use the most narrow Non Clustered Index on a Table. If there is non, the Tables clustered Index has to be checked, which is slow. but i think in MS-SQL you can use sys.sysindexes to look up the rowcount even faster
@cangurcan99 ปีที่แล้ว
Wow, using mysql for decades and never knew this. Thanks man.
@mortona42yt ปีที่แล้ว ⁺²
I knew the db optimizes the query, but didn't know details like this. I would be surprised if after 20+ years of development, it would interpret count(*) as "load everything from the table and count it". What about when you don't have a secondary index, or doing a join query? Probably still counts the returned rows, or some optimization with joined indexes?
@Ceelbc ปีที่แล้ว ⁺⁵
With a proper SQL implementation, this should not matter; the compiler should handle this.
@Ceelbc ปีที่แล้ว
@@lawrencechiasson975 *which is part of the compiler.
@tomasma4896 ปีที่แล้ว
Cools, I am using MySQL for years but never heard of this one. Please more videos like this :)
@CodeKujo ปีที่แล้ว ⁺²
The advice to avoid count(*) predates mysql. It may have even been true in mysql at some point or some obscure schema. As you said, count(*) relies on the optimizer to do the right thing. I use count(0) myself, but I wouldn't be surprised if sometimes you have to be more specific to get the right query plan. I think the biggest lesson in this video is not to rely on advice, but to check the plan and know that counting an index can be faster--which is great advice!
@ivanskyttejrgensen7464 ปีที่แล้ว ⁺¹
When I got a job that involved Oracle 7 in 1999 the DBA told me to use count(*) because the old hack with count(1) wasn't needed anymore. So it was presumably true at some point in the 90s.
@davidlean8674 ปีที่แล้ว ⁺²
I was teaching performance tuning on behalf of a database vendor in 1989. COUNT(*) was the recommended approach for SQL Server (both Mifcosoft & Sybase), Oracle, DB2, & Ingress.
So yes it predates mySQL. The syntax alternative was to specify a column name, But that was only if you wanted to find the count of non-null fields in that column or expression.
Count (constant) was never necessary in any platform I've used. Yet lots of people suggested it. Most had minimal clue about DB internals or query optimisations.
@ABaumstumpf ปีที่แล้ว ⁺¹
@@ivanskyttejrgensen7464 I have used it on oracle6 so at least even back then the advice to use count(1) was already outdated. Likely it was something for pre-ansi sql.
@christianbarnay2499 ปีที่แล้ว ⁺¹
But all queries rely on the optimizer. The optimizer is the core of the querying engine. As long as you don't mess with the execution plan by forcing a path through hints, the optimizer will always kick in and do its job.
@BlairdBlaird ปีที่แล้ว
@@christianbarnay2499 a big difference is that count(*) semantics are defined by the SQL standard itself, so its optimisation is a lot more likely than count(constant) being recognised as equivalent... to count(*).
@martymoo ปีที่แล้ว
I didn't know this. Thanks! Has it always been this way?
@adamzaczek6342 ปีที่แล้ว
Holy count, I just found an awesome channel to subscribe to. Love the humor at the end!
@aarondfrancis ปีที่แล้ว ⁺¹
Holy count 😂
@rumisbadforyou9670 ปีที่แล้ว ⁺²
I never got into using SQL databases, but having written a few on-disk and over-the-network data structures, I'd expect the count(*) to be smart enough to use some cached "total_length" value, especially considering that a lot of effort went into writing query optimizers. I guess, people would think that because of lack of experience of writing a data store yourself?
@sohn7767 ปีที่แล้ว
would be easy enough on a simple table query, in fact count() and indexes probably do do that. However whenever you call a function or a view or whatever script with at least a little logic, the final length is unknown
@brdrnda3805 ปีที่แล้ว ⁺¹
I never wrote a data store, but worked with relational databases for almost 25 years. Still, assuming COUNT(*) wouldn't be fast and it would process the whole record sounds utterly absurd to me. (And, honestly, I never heard that myth)
@diegocardenas4522 11 หลายเดือนก่อน
Best ad ever, keep them coming
@JuanLuisEcheverria ปีที่แล้ว
Hey dude, your way to explain this topic is very well !! Congrats
@PlanetScale ปีที่แล้ว
Hey, thanks!
@Runval2701 4 หลายเดือนก่อน
6:09 not understand ? somebody can explain how does select count(1) from todos will execute or work ?
@paulthomas2577 ปีที่แล้ว
This changes by database server. This is true for MySQL but SELECT count(*) is much slower in SQL Server.
In SQL Server the way I learned to to do it was SELECT count(1)
@hieungo770 ปีที่แล้ว
I love how you explain things, is there any course from you that teach from the ground up
@PlanetScale ปีที่แล้ว
Check out our MySQL for Developers course: planetscale.com/learn/courses/mysql-for-developers/introduction/course-introduction
@wadecodez ปีที่แล้ว
Your thanksgiving conversations sound interesting
@jameseze9469 ปีที่แล้ว ⁺²
So why didn't they just make the Count function parameterless.
Select Count() from customers
@AirelleInfo ปีที่แล้ว ⁺¹
Because there are good reasons to COUNT(): on a nullable column, COUNT() removes the NULL values and counts only valued rows for that column; or you may need to COUNT(DISTINCT ) to count non-NULL unique values.
@jameseze9469 ปีที่แล้ว
@@AirelleInfo ah shit!, I forgot the count(distinct ).
@mintx1720 ปีที่แล้ว
"Trust the compiler" is actually a learned skill.
@GrantGryczan ปีที่แล้ว
After seeing this channel's other video about how the YEAR function doesn't use indexes (where the compiler theoretically could totally make the optimization to use indexes), I do not trust the compiler... (new to SQL)
@ahmad-murery ปีที่แล้ว
I know about count(*) but I didn't know how the Optimizer decides about what index to use.
One learn new things everyday,
Thanks Aaron!
@dealloc ปีที่แล้ว
Even in SQLite COUNT(*) takes less opcodes compared to COUNT(1) and COUNT(id) as it will just read value that is stored already, instead of having to aggregate.
@IARRCSim ปีที่แล้ว ⁺²
If you count a primary key, why wouldn't the SQL vendor recognize the intent and treat it the same as * anyway? Can count(id) where id is the primary key ever lead to a different result than count(*)? I wish he tested with multiple SQL vendors(MySQL, Oracle, MS SQL, PostgreSQL, SQLite), with * and with id to see if all or just some are optimized this way. I wish he also timed some of them in milliseconds too to see how important the difference is. Not all SQL vendors optimize the same ways so I'd be surprised if this advice is applicable to every SQL vendor that exists.
If I submitted count(*) in a pull request and the reviewer didn't like it, he might not be convinced by this youtube video because it doesn't go into any depth on evidence with different SQL vendors or actually timing anything. It would be a lot more efficient in man hours if a video like this included enough evidence and detail than if the masses who tried to follow the advice did all this testing independently on their own.
@arithex ปีที่แล้ว ⁺¹
Is 'select count' still always going to be a table-scan query? Or are there internal optimizations, that maintain a count of active rows in a table.. maybe an in-memory cache that's updated atomically with insert/delete operations?
@aaronmeder 9 หลายเดือนก่อน
Love it! Thanks guys for sharing
@CFEF44AB1399978B0011 ปีที่แล้ว
It seems to me that the point of a declarative language is that you declare what you need, or tell the computer what you need. The computer figures out the best way to get it for you. Telling the computer how it should count seems to be defeating the purpose of having a declarative language where the computer will figure out the best way to do what you requested.
@rickdg ปีที่แล้ว ⁺¹
Me using count(1) like a caveman.
@willi1978 ปีที่แล้ว ⁺¹
All other dbs only return one field with count(*). Have never seen a table when running that query
@rael_gc ปีที่แล้ว
In Postgres, there is a difference between them. According the documentation, the COUNT(*) scan all rows, while COUNT(column) does not consider NULL values in the column.
@christianbarnay2499 ปีที่แล้ว ⁺¹
It's the same in MySQL and any other SQL standard compliant database. The argument of the count function is used to exclude nulls from the count.
This video is not accurate.
@adamtak3128 ปีที่แล้ว ⁺²
Please make more education SQL content. This was fantastic.
@violin245 ปีที่แล้ว
How is someone talking about SQL this charming
@Peter-Ja ปีที่แล้ว
Looking forward for the next argument with my family about the performance of the SQL count operation. Everyone will be so excited
@PlanetScale ปีที่แล้ว
Hopefully you win! "Happy Thanksgiving, y'all don't know anything!" - Peter, probably
@notoriouslycuriouswombat ปีที่แล้ว ⁺¹
maybe i am in the wrong circles but i've never worked with anyone who said don't use count(*) weird, did you just make this up? or do people actually say this?
@PlanetScale ปีที่แล้ว
Sounds like you're in the right circles, tbh
@DreanPetruza ปีที่แล้ว ⁺¹
Is this in ANSI SQL too? or what would be the advantage of supplying the column name in COUNT()? doesn't COUNT() skip the rows that have NULL in the specified column?
@chtomek ปีที่แล้ว
6:23 did creator trigger that rainbow around subscribe button or youtube detects subscribe words and does that?
@PlanetScale ปีที่แล้ว
I... have no idea. Must be TH-cam?
@anasouardini ปีที่แล้ว ⁺²
Why don't DBs store a count variable and save the day? Why don't they count ROWS by the rows themselves` instead of ROWS BY COLUMNS?
@nicolasguillenc ปีที่แล้ว
you are amazing at explaining things man
@aarondfrancis ปีที่แล้ว
Thank you!
@Wangaruro ปีที่แล้ว
I actually learned something new today! Thanks!
@mutatedllama ปีที่แล้ว ⁺¹
What a great video. Earned a subscribe. Looking forward to more!
@dennydharmawan4820 ปีที่แล้ว ⁺²
if count(id) and count(*) uses due-date in the end, then does it matter what you use?
@heiko3169 ปีที่แล้ว
Actually he explained more why using count(*) is not bad at all. :)
Although it would be interesting what would happen if there is no other (secondary) index at all on the table
@jacobstamm ปีที่แล้ว
I’m wondering this as well. Seems like the intelligence of the compiler makes the point somewhat moot. Then again, it’s good to know that I should always have an unclustered non-null index on my tables for this use case.
@heiko3169 ปีที่แล้ว
@@jacobstamm You dont need to. :)
I investigated that more for myself. It simply doesnt matter whether there is another index or not.
The COUNT() function is specialized right for that purpose, and using it with * is just fine and will be fast anyway.
@Kaspricci ปีที่แล้ว
When I was still coding DB scripts “select count(1) from table” was considered fastest.
@sathishkannan6600 ปีที่แล้ว ⁺²
What software are you using to see which column is read? Is there an extension for this postgres?
@BobChao87 ปีที่แล้ว ⁺¹
This isn't a special software at all! The EXPLAIN statement is part of MySQL itself and actually just returns a table with rows as seen in the video. PostgreSQL has a similar EXPLAIN statement, where you literally just put whatever you have as your and slap EXPLAIN in front, so "EXPLAIN " just like in the video will work similarly for you as well!
Most modern database systems should have EXPLAIN or something like it, and you can likely find it by searching for " EXPLAIN statement".
@sathishkannan6600 ปีที่แล้ว ⁺¹
@@BobChao87 Thank you so much. New to database world
@i3looi2 ปีที่แล้ว
It may have been that COUNT(*) was indeed slower in past times .. but since the PHP era, mysql got a lot of patches and upgrades.
@jameslucas5590 ปีที่แล้ว
I'm a vet and will use *. I don't care if it bothers anyone, because I want to just move on. However, I Love this video and the knowledge you share.
@smeedee ปีที่แล้ว
You could also use the „rows“ from the explain query. In some use cases this is already good enough ^^
@kireetibhyrovabhotla1763 8 หลายเดือนก่อน
Which editor are you using for your mySQL queries?
@teamRizek ปีที่แล้ว ⁺¹
Which database tool program did you use? Thanks
@PlanetScale ปีที่แล้ว ⁺¹
TablePlus!
@alexaneals8194 ปีที่แล้ว ⁺²
It's database engine dependent. Oracle, SQL Server, DB2, MySql and etc. all have different implementations. In SQL Server there are faster ways of getting a count then using select count(*) which incidentally puts locks on the table being counted if you are not careful.
@HishamElsayad ปีที่แล้ว
Thanks so much for providing this great information.
@RajveerSingh-vf7pr ปีที่แล้ว
That's the thing about Sql, Sql is not like a normal programming language,... It takes the query, analyse it, plans it, then runs it
@djenning90 ปีที่แล้ว
I love your style!
@aarondfrancis ปีที่แล้ว
Thank you!
@robbybankston4238 ปีที่แล้ว
I'd be interested to know if the same is true of Microsoft SQL Server or Oracle PL/SQL.
@ShivaprasadBhat-dm3kn ปีที่แล้ว
Great video! What is the tool you are using for query?
@PlanetScale ปีที่แล้ว
TablePlus!
@ShivaprasadBhat-dm3kn ปีที่แล้ว
@@PlanetScale thank you! installing now 😁
@jayakarthikreddy ปีที่แล้ว
I verified the explain plan of count(*) on a table, optimiser even picked a secondary index on a null column.
@djnormus 10 หลายเดือนก่อน
Hello,
what is your SQL software that you use in this video?
Thank you
@vihornis ปีที่แล้ว
Dos it apply to other SQLs, or it is MySQL specific?
@rayonsi ปีที่แล้ว
Great video about performance of counting, I am curious what is name your db client manager?
@PlanetScale ปีที่แล้ว
Table plus!
@Austin-ft8pn ปีที่แล้ว
I love content like this, I'm going to have to check this out for my self!

ต่อไป

เล่นอัตโนมัติ