fundamentals of database engineering course database.husseinnasser.com
We use clustered indexes heavily in our app. One drawback was using UUIDs and creating our own clustered index. Thanks, this video helped us avoid a bottleneck.
Day 1 of waiting for Hussein to make a video on consensus algorithms
I tried to read into them a few months ago and haven't picked up the pace.
Can we please get a video on secrets management? Love the breadth of topics you have covered on your channel (thank you so much!), but this topic seems to be missing, so I'd love to learn it from you!
Thank you so much for your insight every time :) I am learning so much from your videos.
IDK if my question is valid, but at 9:00 it's not clear why you assume that reading a range of IDs from the visible index would be faster than from the hidden index. Why are the chances of those IDs being in one page higher than with the hidden index? Doesn't this depend on how we are writing records? Why are writes to the visible index next to each other but random in the hidden one?!
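One way to reason about this question is a toy model, sketched below. It is not MongoDB code and every name in it (`pages_touched`, `docs_per_page`, and so on) is made up for illustration. It assumes that in a regular collection documents sit in recordId (insertion) order, while in a clustered collection they sit in _id order, and it counts how many pages a range of 50 consecutive _ids would touch in each layout.

```python
import random

def pages_touched(random_ids, num_docs=10_000, docs_per_page=100,
                  range_len=50, seed=0):
    """Return (regular, clustered): distinct pages read for one _id range."""
    rng = random.Random(seed)
    if random_ids:
        # UUID-like keys: arrival order has nothing to do with key order.
        ids = rng.sample(range(10**9), num_docs)
    else:
        # Monotonic keys: arrival order equals key order.
        ids = list(range(num_docs))

    # Regular collection (assumption): a document's position is its
    # recordId, i.e. the order in which it was inserted.
    record_pos = {key: pos for pos, key in enumerate(ids)}

    # Clustered collection (assumption): a document's position follows _id order.
    ordered = sorted(ids)
    cluster_pos = {key: pos for pos, key in enumerate(ordered)}

    # A range query over 50 consecutive _ids (consecutive in key order).
    wanted = ordered[1000:1000 + range_len]
    regular = {record_pos[k] // docs_per_page for k in wanted}
    clustered = {cluster_pos[k] // docs_per_page for k in wanted}
    return len(regular), len(clustered)

print("random ids:   ", pages_touched(random_ids=True))
print("monotonic ids:", pages_touched(random_ids=False))
```

Under these assumptions the answer to "doesn't this depend on how we are writing records?" is yes: with monotonic ids both layouts read the same single page, and the gap only appears when the ids arrive in random order.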
Great video.
It is a great addition to the database.
Quick question, why did they go with the recordid way in the first place?
+1 on the same question
If I had to guess, it's technical debt.
Because of their original model when they first shipped MMAPv1: they had a single B-tree with a DiskLoc pointer directly to disk. That model is simple but had a lot of problems, mainly the use of mmap and the lack of full ACID support and MVCC. In 2014 they bought WiredTiger, and that had the B-tree with the recordId. So it was easier to integrate by replacing the DiskLoc pointer with a recordId and keeping the rest of the architecture the same; otherwise it would have required a major rewrite.
It seems they made this big change in 5.3 as clustered collections.
@hnasr I see. That's interesting. Thank you for the answer.
Since B-trees are also stored in files and pages, does the DB fetch the entire B-tree when an index scan/seek has to be done?
another amazing video, love you man ❤
Awesome talk as always!
Regarding 18:00, why would you want to do a query with the _id and another filter, when the _id is unique? For a kind of "does it exist" query?
One example is a range query: give me all documents with an id between 10 and 50 where a certain field has a particular value. If that field is indexed, it will be preferred over the id.
Can you shard a clustered collection?
About the secondary index being preferred: I could imagine a composite index being more selective, where the extra I/O (more than 2) would cost less than the lost selectivity. Maybe more so in range queries. So I guess it depends on your query in the end (and if you wanted custom behaviour you could even go for $hint). What do you think?
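The selectivity trade-off can be illustrated with a rough in-memory sketch. This is not MongoDB's query planner; the data set, the `status` field, and the three "plans" are hypothetical. It only counts how many entries each access path would have to examine for a query of the form "_id in a range AND status equals some value".

```python
from bisect import bisect_left, bisect_right

# Hypothetical data set: 10,000 documents, one in every 500 is "rare".
docs = [{"_id": i, "status": "rare" if i % 500 == 0 else "common"}
        for i in range(10_000)]
lo, hi, wanted = 10, 5_000, "rare"

# Plan A: walk the _id range, then filter each document on status.
examined_by_id = sum(1 for d in docs if lo <= d["_id"] <= hi)

# Plan B: single-field index on status, then filter each match on _id.
examined_by_status = sum(1 for d in docs if d["status"] == wanted)

# Plan C: compound index on (status, _id), modelled as a sorted list of
# tuples; two binary searches bound exactly the matching slice.
compound = sorted((d["status"], d["_id"]) for d in docs)
first = bisect_left(compound, (wanted, lo))
last = bisect_right(compound, (wanted, hi))
examined_by_compound = last - first

print(examined_by_id, examined_by_status, examined_by_compound)
```

In this made-up data the status index is far more selective than the _id range, so paying an extra lookup per match can still be cheaper overall. Flip the selectivity (a narrow _id range, a common status) and the _id path wins instead, which fits the "it depends on your query" conclusion.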
Whenever possible, UUID strings should be converted to binary and stored as binary in the DB itself. This way each one takes 16 bytes, compared to 36 bytes when stored as a string.
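A quick way to check the 16 vs 36 figure, using only Python's standard `uuid` module. This sketch measures the raw value sizes only; how a particular driver or database wraps the binary value on disk is outside what it shows.

```python
import uuid

u = uuid.uuid4()

as_text = str(u)    # canonical form: 32 hex digits plus 4 hyphens
as_bytes = u.bytes  # the raw 128-bit value

print(len(as_text), len(as_bytes))

# The conversion is lossless: the binary form round-trips to the same UUID.
restored = uuid.UUID(bytes=as_bytes)
print(restored == u)
```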
Why won't the MongoDB team make the clustered index the default?
I envision it being the default in a few years, once they iron out the bugs and limitations, which will make it close to MySQL InnoDB.
great, thx
Where did your books and sword go :(
I moved office, they are on my side now 😄
Why is SQL so much faster than NoSQL?
Indexing, structured data etc.
That depends a lot on your workload. MongoDB can certainly outperform SQL by a huge margin, provided that you have designed a schema that suits NoSQL, and similarly there will be certain workloads where SQL runs faster. A big chunk of that performance also depends on the configuration and the type of deployment you are running.
You mean the other way around??? The most scalable databases on the planet use NoSQL: Vitess, Cassandra, ScyllaDB, etc.
In general it's the opposite: unless you're abusing NoSQL, it should outperform any SQL database due to having relaxed ACID guarantees. You'll find most big tech companies eventually had to migrate to a NoSQL database because SQL became a performance bottleneck at massive scale, e.g. Twitter, Facebook, Instagram, etc.
Of course it all depends on your domain. Some use cases require strong consistency guarantees with relational data, which doesn't leave you much choice but to use an RDBMS.