How do B-Tree Indexes work? | Systems Design Interview: 0 to 1 with Google Software Engineer
ฝัง
- เผยแพร่เมื่อ 19 มิ.ย. 2024
- You throw your data in B-Trees, I throw my data in B-holes, we are not the same.
Sorry about the poor explanation of node splitting on writes, here's an extra reference video if you need it:
• B-Tree Tutorial - An I... .
Again, please don't comment that the size of the B-tree page is in kilobytes, I get it now. - วิทยาศาสตร์และเทคโนโลยี
Nice work Jordan!
If I remember correctly, the relational databases use -tree because each of the nodes can be fit into a page on disk. This is efficient because disk data is loaded into memory page by page. The size of the node should then be tuned to fit into a page for the OS, which varies from OS to OS.
Sounds correct to me!
good stuff yet again! could you also include an overview of B+ tree storing the actual key:val pairs in the leaf nodes and implemented through linked lists in conjunction with perhaps how databases use the index from the B tree in some related, future content?
i just saw your latest vid on LSM tree + SSTable & appreciate the comparison table review.
I'm a bit confused here just because the data itself is actually stored in those leaf nodes!
@@jordanhasnolife5163 per my understanding the majority of relational dbs implement B+ trees rather than B trees. mongo db, which is a nosql one though actually uses a B tree which is a separate topic in itself.
the main benefits of B+ trees are that query ranges are much more performant than B trees where such a range were to span multiple nodes/pages. only the leaf nodes/pages contain the key:val data which are sequentially linked via doubly linked lists. therefore, you can store many more indexes in the non-leaf nodes of a B+ tree since you are not storing any of their associated values. the cost is duplicated keys in the non leaf nodes but that's a small cost relative to searching the valid range in your query.
potentially much less IO relative to traversing through a B tree when the range spans multiple nodes/pages as you may have to traverse up & down left to right.. each node of the B tree can store a smaller range itself relative to B+ tree nodes since they also persist the values rather than just the keys.
i came across this article that mentions combining both B trees to store the indexes which has pointers to the leaf nodes of the B+ trees so wondering if that is actually how most of the commercial dbs actually implement their dbs or not. i thought they only use only B+ trees over B trees.
dzone.com/articles/database-btree-indexing-in-sqlite
@@raysdev this article doesn't exist .. i think they removed it.
So basically we can have both the implementation for the hash maps, one on the memory (i.e. space constraints) and second on the disk right (but either case it doesn't support range queries) and that's why we need B-trees. Now B-trees are disk only right, because the size of the data could be huge, it solves the range query issue but the problem with it is that the write could be expensive if we have to perform a B-tree split (recursive) right?
This is what I understood so far, let me know if my understanding is correct. Also thanks a lot for this series, really helpful.
Just one more question I have is that, we generally maintain WAL when we are doing a memory implementation of any kind of index right? or we do we maintain the WAL irrespective?
I think just about all DB indexes will use some sort of write ahead logs. I think what you said is basically correct, but the main reason b-tree indexes are good for range queries is that similar keys are next to each other on disk.
Also, in reality, there's lots of in memory caching of B-tree pages that is done.
Hi Jordan, while performing writes in B-Tree, those are first written to WAL, and then it gets from WAL, to insert in its Btree like structure. So that's why writes are slower, as this must be synchronus operation. Am I right ?
Not sure what you mean by synchronous but basically a write must be fully committed in the WAL for it to count otherwise we throw it out
I am a little confused here - do B-trees store everything on disk (including the keys)? I read somewhere that they store keys in memory and the corresponding value is the disk address where the value is present (this is required since the actual values can be large and may eat up a lot of RAM space if kept in memory itself).
I imagine that it depends on implementation, and that there are various optimizations to use memory, but as far as I'm aware the original point of the B-Tree is to be easy to traverse on disk. Otherwise I feel like we could just use a Trie or something.
what does it load in the actual memory? for instance if I search for username bob, would it search for bob is the index (the b-tree) and find its reference in the heap. Does it only open that one page (assuming username is unique) from the heap in memory?
Depends on the implementation! This is disk here though, so not entirely sure what you mean by heap.
Depending on implementation, we may store a reference to the data on disk (more likely) or just the full row itself in the b-tree (probably less likely).
you didn't cover deletion eh? 😏
actually now that i think of it, b-trees remind me a bit of page tables. are all these database techniques inspired by the way OS manages virtual memory? (virtual memory also gets paged to the disk...)
Only thing I delete is beers - but good question on the OS part, I'd imagine yes but you'd perhaps want to look up some academia on them to see if they're mentioned in research
when there is no space in the B-tree available at all while inserting a new node, then instead of splitting the whole tree in two and check back with its parent , can't we create a new tree and then delete the old one , just wondering ?
We can but these trees can get large so making a whole new one can be expensive
Hi Jordan,
Could you please explain how b-tree on disk is better than hashmap on disk, if i don't want support for range queries? To find any key, hashmap will take O(1), time whereas B-tree will take O(log(n)) time .
Seems fairly reasonable to me. That being said, if you're building an index, you probably want range queries.
1. Linear probing and chaining as a collision strategy are horrible with disk seeks
2. Expanding and shrinking the hash map on disk requires a lot of disk seeks to recreate the new larger/smaller hash map.
3. Range queries are a big deal with B-trees
4. Multi-column index and prefix queries on B-trees
5. Database isolation for transactions are simpler and more mature in B-trees. E.g., "range locks" are needed for the highest level of database isolation -> to prevent phantom read anomalies. So to get truly isolated transactions, B-trees are the way to go (with hash indexes you likely have to do the much more complex method of doing predicate locks).
6. There is NO EXTRA COST to reading an entire disk block, which is usually 4KB. B-trees are optimized to use this fact, whereas with hash indexes, you end up needing only a part of a disk block, so you read the entire block and only use part of it.
@@slover4384 Great points thanks for sharing, yeah having to resize that map would be brutal lol
maaaaaaaaaaaaaaaaa maaaaaaaaaaaan
my guy