How do B-Tree Indexes work? | Systems Design Interview: 0 to 1 with Google Software Engineer

แชร์
ฝัง
  • เผยแพร่เมื่อ 19 มิ.ย. 2024
  • You throw your data in B-Trees, I throw my data in B-holes, we are not the same.
    Sorry about the poor explanation of node splitting on writes, here's an extra reference video if you need it:
    • B-Tree Tutorial - An I... .
    Again, please don't comment that the size of the B-tree page is in kilobytes, I get it now.
  • วิทยาศาสตร์และเทคโนโลยี

ความคิดเห็น • 25

  • @mann8737
    @mann8737 ปีที่แล้ว +3

    Nice work Jordan!

  • @strixhooligan2347
    @strixhooligan2347 4 หลายเดือนก่อน +2

    If I remember correctly, the relational databases use -tree because each of the nodes can be fit into a page on disk. This is efficient because disk data is loaded into memory page by page. The size of the node should then be tuned to fit into a page for the OS, which varies from OS to OS.

  • @raysdev
    @raysdev ปีที่แล้ว +2

    good stuff yet again! could you also include an overview of B+ tree storing the actual key:val pairs in the leaf nodes and implemented through linked lists in conjunction with perhaps how databases use the index from the B tree in some related, future content?
    i just saw your latest vid on LSM tree + SSTable & appreciate the comparison table review.

    • @jordanhasnolife5163
      @jordanhasnolife5163  ปีที่แล้ว

      I'm a bit confused here just because the data itself is actually stored in those leaf nodes!

    • @raysdev
      @raysdev ปีที่แล้ว

      ​@@jordanhasnolife5163 per my understanding the majority of relational dbs implement B+ trees rather than B trees. mongo db, which is a nosql one though actually uses a B tree which is a separate topic in itself.
      the main benefits of B+ trees are that query ranges are much more performant than B trees where such a range were to span multiple nodes/pages. only the leaf nodes/pages contain the key:val data which are sequentially linked via doubly linked lists. therefore, you can store many more indexes in the non-leaf nodes of a B+ tree since you are not storing any of their associated values. the cost is duplicated keys in the non leaf nodes but that's a small cost relative to searching the valid range in your query.
      potentially much less IO relative to traversing through a B tree when the range spans multiple nodes/pages as you may have to traverse up & down left to right.. each node of the B tree can store a smaller range itself relative to B+ tree nodes since they also persist the values rather than just the keys.
      i came across this article that mentions combining both B trees to store the indexes which has pointers to the leaf nodes of the B+ trees so wondering if that is actually how most of the commercial dbs actually implement their dbs or not. i thought they only use only B+ trees over B trees.
      dzone.com/articles/database-btree-indexing-in-sqlite

    • @ryan-bo2xi
      @ryan-bo2xi 7 หลายเดือนก่อน

      @@raysdev this article doesn't exist .. i think they removed it.

  • @user-ee7oi8qv7f
    @user-ee7oi8qv7f 16 วันที่ผ่านมา +1

    So basically we can have both the implementation for the hash maps, one on the memory (i.e. space constraints) and second on the disk right (but either case it doesn't support range queries) and that's why we need B-trees. Now B-trees are disk only right, because the size of the data could be huge, it solves the range query issue but the problem with it is that the write could be expensive if we have to perform a B-tree split (recursive) right?
    This is what I understood so far, let me know if my understanding is correct. Also thanks a lot for this series, really helpful.
    Just one more question I have is that, we generally maintain WAL when we are doing a memory implementation of any kind of index right? or we do we maintain the WAL irrespective?

    • @jordanhasnolife5163
      @jordanhasnolife5163  15 วันที่ผ่านมา

      I think just about all DB indexes will use some sort of write ahead logs. I think what you said is basically correct, but the main reason b-tree indexes are good for range queries is that similar keys are next to each other on disk.
      Also, in reality, there's lots of in memory caching of B-tree pages that is done.

  • @manishasharma-hy5mj
    @manishasharma-hy5mj 18 วันที่ผ่านมา +1

    Hi Jordan, while performing writes in B-Tree, those are first written to WAL, and then it gets from WAL, to insert in its Btree like structure. So that's why writes are slower, as this must be synchronus operation. Am I right ?

    • @jordanhasnolife5163
      @jordanhasnolife5163  17 วันที่ผ่านมา

      Not sure what you mean by synchronous but basically a write must be fully committed in the WAL for it to count otherwise we throw it out

  • @whereisinfinity
    @whereisinfinity 5 หลายเดือนก่อน +1

    I am a little confused here - do B-trees store everything on disk (including the keys)? I read somewhere that they store keys in memory and the corresponding value is the disk address where the value is present (this is required since the actual values can be large and may eat up a lot of RAM space if kept in memory itself).

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 หลายเดือนก่อน

      I imagine that it depends on implementation, and that there are various optimizations to use memory, but as far as I'm aware the original point of the B-Tree is to be easy to traverse on disk. Otherwise I feel like we could just use a Trie or something.

  • @shehrozmalik773
    @shehrozmalik773 4 หลายเดือนก่อน +1

    what does it load in the actual memory? for instance if I search for username bob, would it search for bob is the index (the b-tree) and find its reference in the heap. Does it only open that one page (assuming username is unique) from the heap in memory?

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 หลายเดือนก่อน

      Depends on the implementation! This is disk here though, so not entirely sure what you mean by heap.
      Depending on implementation, we may store a reference to the data on disk (more likely) or just the full row itself in the b-tree (probably less likely).

  • @firezdog
    @firezdog 2 หลายเดือนก่อน +1

    you didn't cover deletion eh? 😏
    actually now that i think of it, b-trees remind me a bit of page tables. are all these database techniques inspired by the way OS manages virtual memory? (virtual memory also gets paged to the disk...)

    • @jordanhasnolife5163
      @jordanhasnolife5163  หลายเดือนก่อน

      Only thing I delete is beers - but good question on the OS part, I'd imagine yes but you'd perhaps want to look up some academia on them to see if they're mentioned in research

  • @Unstoppable_gaur
    @Unstoppable_gaur 10 หลายเดือนก่อน +1

    when there is no space in the B-tree available at all while inserting a new node, then instead of splitting the whole tree in two and check back with its parent , can't we create a new tree and then delete the old one , just wondering ?

    • @jordanhasnolife5163
      @jordanhasnolife5163  10 หลายเดือนก่อน +1

      We can but these trees can get large so making a whole new one can be expensive

  • @anirbansaha7987
    @anirbansaha7987 หลายเดือนก่อน +1

    Hi Jordan,
    Could you please explain how b-tree on disk is better than hashmap on disk, if i don't want support for range queries? To find any key, hashmap will take O(1), time whereas B-tree will take O(log(n)) time .

    • @jordanhasnolife5163
      @jordanhasnolife5163  หลายเดือนก่อน

      Seems fairly reasonable to me. That being said, if you're building an index, you probably want range queries.

    • @slover4384
      @slover4384 20 วันที่ผ่านมา +1

      1. Linear probing and chaining as a collision strategy are horrible with disk seeks
      2. Expanding and shrinking the hash map on disk requires a lot of disk seeks to recreate the new larger/smaller hash map.
      3. Range queries are a big deal with B-trees
      4. Multi-column index and prefix queries on B-trees
      5. Database isolation for transactions are simpler and more mature in B-trees. E.g., "range locks" are needed for the highest level of database isolation -> to prevent phantom read anomalies. So to get truly isolated transactions, B-trees are the way to go (with hash indexes you likely have to do the much more complex method of doing predicate locks).
      6. There is NO EXTRA COST to reading an entire disk block, which is usually 4KB. B-trees are optimized to use this fact, whereas with hash indexes, you end up needing only a part of a disk block, so you read the entire block and only use part of it.

    • @jordanhasnolife5163
      @jordanhasnolife5163  18 วันที่ผ่านมา

      @@slover4384 Great points thanks for sharing, yeah having to resize that map would be brutal lol

  • @rick-kv1gl
    @rick-kv1gl 5 หลายเดือนก่อน +1

    maaaaaaaaaaaaaaaaa maaaaaaaaaaaan