Why do databases store data in B+ trees?

แชร์
ฝัง
  • เผยแพร่เมื่อ 21 ธ.ค. 2024

ความคิดเห็น • 76

  • @jatinrawat1047
    @jatinrawat1047 11 หลายเดือนก่อน +9

    was exploring mongodb internals and needed to understand why we use B+ trees.
    awesome explanation ,crisp and to the point !

  • @ViniciusdeSouzaFernandes-p8r
    @ViniciusdeSouzaFernandes-p8r ปีที่แล้ว +5

    Absolutely clear your explanation. Very simple approach making all B-tree propose understandable. I was reading a advanced book where I was getting lack of information and motivation about how to range from a leaf like containing 101 to a leaf containing 601. Your example is very realistic providing easier understanding. Congrats, buddy!

  • @shandoticwa
    @shandoticwa 3 วันที่ผ่านมา

    To be precise: Most implementations of a dynamic multilevel index use a variation of the B-tree data structure called a B+-tree. The leaf nodes have an entry for every value of the search field, along with a data pointer to the record (or to the block that contains this record) if the search field is a key(uniquely identifies each row) field. For a nonkey search field, the pointer points to a block containing pointers to the data file records, creating an extra level of indirection.

  • @arungowda
    @arungowda ปีที่แล้ว +27

    Now I am slightly confused with indexing and storing rows as B+ trees 😂

    • @RaghvendraKumar-rm7tv
      @RaghvendraKumar-rm7tv ปีที่แล้ว +9

      B+ Tree data structure is used to store both the row & index.
      Since every table has only one clustered index which determines how data is stored physically on the disk, in clustered index B+ tree will store the row data also.
      But in a non-clustered index, B+ tree will not have the actual row data but a pointer connecting it to the clustered index which has the actual data.

  • @namangarg3933
    @namangarg3933 ปีที่แล้ว +15

    Very well explained Arpit. Looking forward to more such videos.
    TH-cam is full of content and newbie creators.
    But, knowing the amount of research you put in your content coupled with your rich hands on experience, adds a tremendous confidence in whatever we learn from your channel.
    Thanks

    • @AsliEngineering
      @AsliEngineering  ปีที่แล้ว +9

      Thanks for resonating 🙌 I ensure the correctness and in-depth understanding before making the video on it.
      Happy that you saw the efforts I have put in. Thanks again 🙌 will continue putting out videos on things that matter.

  • @nishankgoyal8323
    @nishankgoyal8323 ปีที่แล้ว +3

    Nice video, reminds me of how documents are stored in shelfs at my home. Containers saying what those documents are for say ITR, followed by files mentioning year blocks say 2010-2020, individual files and then individual documents.

  • @aneksingh4496
    @aneksingh4496 ปีที่แล้ว +1

    its very very deeply explained just like a good book author explains it .... highly Appreciate your efforts

  • @sanjaysreedhar9162
    @sanjaysreedhar9162 2 หลายเดือนก่อน +1

    Good enough for interview preparations...thanks a lot

  • @yuktigirdhar
    @yuktigirdhar หลายเดือนก่อน +1

    You mentioned in the first approach that while updating a row we cannot automatically move beyond the the size of the width of the row but later as you mentioned we have a fixed row size so in what case we will be exceeding the width and overwriting other data for the naive file storage approach?

  • @amitrastogi1405
    @amitrastogi1405 10 หลายเดือนก่อน

    Beautifully explained...hats off to you!

  • @rahulsarkar4206
    @rahulsarkar4206 ปีที่แล้ว +3

    A video suggestion. Difference between MySQL and Postgres. How they index differently and how that affects read, insert, update and delete query.

  • @smritimittal4128
    @smritimittal4128 ปีที่แล้ว +4

    For the first approach (discussed before B+ tree) -
    - All rows stored in a file
    - Can we not have 4KB (=disk block) sections each in that file too
    - If there were 1000 rows, then, each block is 100 rows now
    - so, to go through all of 1000 rows, I do 10 disk reads
    - hence, findOne is not O(n) but, O(no. of total disk blocks) ?
    - Also, when inserting -
    - Again, can we not put these 10 disk reads in RAM
    - re-arrange and flush all back to the file, the disk (similar to what we did for B+ tree) ?

    • @AsliEngineering
      @AsliEngineering  ปีที่แล้ว +3

      1. that is exactly how disk io happens but how will you do random reads
      2. Today you are talking about 10, but you will not have just 40kb of data, it will go in GBs if not TBs. Then how will you do it?

  • @sanketh768
    @sanketh768 10 หลายเดือนก่อน +2

    Sir , I'm not able to imagine things when you say b tree is serialised and stored in the disk. Could someone please help me understand this better and visualize this?

    • @AsliEngineering
      @AsliEngineering  10 หลายเดือนก่อน +2

      each node is B+ tree is stored in one disk block. naive way to image each node in b+ tree as one separate file on the disk.

  • @rahulyadav8159
    @rahulyadav8159 ปีที่แล้ว +1

    One question here Arpit
    Leaf nodes of the tree contains the actural rows of table. So leafs nodes are efficiently utilizing the disk block size.
    But the internal nodes of the tree contains the child and their node information. So it seems internal nodes of the tree is not fully uitlizing the disk block size.
    So overall if we have more interanl nodes then we are un nessessarly wasting the disk block.
    Please let me know if I am thinking in right direction.

    • @Goku-xm1gq
      @Goku-xm1gq ปีที่แล้ว +1

      this is a trade off made in order to achieve better performance and efficient range searches

  • @swati12091993
    @swati12091993 5 หลายเดือนก่อน

    These videos are extremely helpful. Makes me curious to explore more on my own. Thanks!

    • @AsliEngineering
      @AsliEngineering  5 หลายเดือนก่อน

      Amazing. Thats precisely the attitude I wanted to percolate. Thank you for resonating 👍

  • @alejandrogarcia3004
    @alejandrogarcia3004 11 หลายเดือนก่อน

    loved your easy to understand explanation, thanks a lot for the new perspective on this!

  • @gomzysharma
    @gomzysharma 2 หลายเดือนก่อน

    So wonderful, very digestible content

  • @manukolathur4029
    @manukolathur4029 ปีที่แล้ว

    Simple and clear !!!!. Looking for more of these kinds...

  • @vishnuaggarwal1132
    @vishnuaggarwal1132 ปีที่แล้ว +1

    Can you share the Notes
    unable to download them from your website

  • @pankajvermacr7
    @pankajvermacr7 10 หลายเดือนก่อน

    Amazing, I really appreciate your effort, the way you teach us, is great Thanks man.

  • @abhishekpathania7093
    @abhishekpathania7093 หลายเดือนก่อน

    doubt,
    so now 1st leaf node (block) contains 3 rows,
    when we update its 1st row to an extent that it occupies whole space of the node leaf (block),
    will the rebalancing happen on the 1st leaf node & a new node will be created for 2nd & 3rd row ?
    how will this look like ?

    • @AsliEngineering
      @AsliEngineering  หลายเดือนก่อน +1

      Read more about B+ Trees, you will answers to all the questions you've asked.

  • @santoshbhatnagar2155
    @santoshbhatnagar2155 ปีที่แล้ว +1

    Hi @Arpit just asking out of curiosity can one see the internal implementation of storage of data in a database?

  • @vaibhavghadge4057
    @vaibhavghadge4057 27 วันที่ผ่านมา

    MongoDB does not use b+ tree
    In the b+ tree we have a duplicate node where the actual node is pointing to the leaf node from the leaf node we can directly get the value as well we can do range query as well

  • @unfiltered_with_yogi
    @unfiltered_with_yogi หลายเดือนก่อน

    but for the file case also we can't we do a similar operation of reading the entire file, bringing it into memory and then shifting the data around and finally flushing it back to disk.
    What advantage does B+ tree offer, I mean the only one which I can think if is that in case of B+ tress you wont have to touch the entire data but rather only those specific 100 rows, then can't have a design where we store 100 rows on file and the next 100 on another file and achieve the same output/performance ?

  • @punruthaan
    @punruthaan 2 หลายเดือนก่อน

    Hey Arpit - Help me to understand the case, when DB update and insert operation cannot take place in the given block due to limited space in the block then how DB manages insert and updates in such scenarios ?

    • @FullGardenStudent
      @FullGardenStudent 2 หลายเดือนก่อน

      how the dataBase manages that depends on how the database handles memory management. It could either reallocate bigger memory region and move the existing data over there or it might fill up any empty gaps caused by fragmentation within the existing memory region.

  • @deepeshtyagi8824
    @deepeshtyagi8824 หลายเดือนก่อน

    How can I get your notes

  • @darshanv1748
    @darshanv1748 ปีที่แล้ว +2

    I have a doubt , if i insert another row in row 195 then all the next rows should also move forward (you have to do the same in B Trees also right ?) , i don't get how btrees solve this problem

    • @snehagupta1044
      @snehagupta1044 7 หลายเดือนก่อน

      exactly tree will store the pointer to the data not the entire data

    • @codr0514
      @codr0514 3 หลายเดือนก่อน +2

      For this very reason, indexes, such as B+ Trees, perform poorly in write-heavy systems. Indexes are primarily designed to enhance lookup speeds (reads), not writes. This inefficiency arises because each insert, update, or delete operation may require rearranging the tree to maintain its balanced nature. This rearrangement often involves splitting or merging nodes and updating pointers, which can be time-consuming. It is crucial to carefully consider the required access patterns and create only the necessary minimum number of indexes on a table.

  • @32zim32
    @32zim32 2 หลายเดือนก่อน

    No need to do linear scan in sequential sorted rows. You can do binary search.

  • @AbhijeetBhattacharya-uh7dy
    @AbhijeetBhattacharya-uh7dy 6 หลายเดือนก่อน

    Awesome explanation! Thank you

  • @ronak6977
    @ronak6977 3 หลายเดือนก่อน

    Why AVL trees are not used as for them also time complexity is log(n)?

  • @nehagour6928
    @nehagour6928 2 หลายเดือนก่อน

    Arpit, great explanation!
    I have a few questions:
    1. Do you think the "1|101" information isn't fully utilizing the 4KB size?
    2. What happens if the 4KB space is full and we need to add more data to a leaf node?
    3. How does the system balance things if the leaf node (e.g., 1-100) is full and we need to add 8 more rows in between? Would we need to move data to another node?

    • @srinivassairamsampathirao5767
      @srinivassairamsampathirao5767 2 หลายเดือนก่อน

      don't know about 1 but
      2. a new node is added to the b+ tree based on the range it lies
      3. i think it will be handled in such a way that the no. of rows that are admissible to be put in a row shouldn't exceed 4kb size. ie sizeof(1|101)

  • @codepractices7
    @codepractices7 ปีที่แล้ว +2

    Can Clustered and Non-clustered index be related to B+ tree store of the database?
    Like for instance if we create a primary key then a clustered index is created on that, and inside the index, pointers to the block stored. So is that pointer pointing to the leaf node of the B+ tree?

    • @AsliEngineering
      @AsliEngineering  ปีที่แล้ว

      Indexes are implemented as B+ trees. Indexes are implicit tables with 2 columns (indexed value and row id). Storage is very similar to how a table is stored.

    • @arungowda
      @arungowda ปีที่แล้ว +1

      @@AsliEngineering so the numbers 100,200..500 you talked about in the video, are they row Ids?
      when we index on a column, does it store a mapping of that column to rowId in B+ trees? In which case it has to do the B+ tree search?
      Or is the index a mapping of column to physical address on the disk?

  • @vinitsunita
    @vinitsunita ปีที่แล้ว

    Awesome explaination

  • @siddhantchavan1370
    @siddhantchavan1370 10 หลายเดือนก่อน

    I wanna learn more about this, any good references?

  • @shashankhegde8365
    @shashankhegde8365 5 หลายเดือนก่อน

    Very insightful video. Could you please make a video explaining the internal functionality of graph databases..??

  • @mohdhaseeb9818
    @mohdhaseeb9818 ปีที่แล้ว +1

    Thank you sir!!

  • @cooldudecs
    @cooldudecs ปีที่แล้ว

    beautiful! This is so cool

  • @binitrajshah9354
    @binitrajshah9354 11 หลายเดือนก่อน

    In case of update still we can face same issue of overflow which implies time complexity to O(n). How B+ trees handle this situation?
    I mean new update might have larger value and my leaf node is already fully occupied.

  • @rahulmenon5149
    @rahulmenon5149 9 หลายเดือนก่อน

    do the leaf nodes contain data or pointers to actual data ?
    I read in other places - its pointers

    • @AsliEngineering
      @AsliEngineering  9 หลายเดือนก่อน

      No. It is data. Read about the clustered index and how tables store data in the clustered index.

  • @sachinjindal4921
    @sachinjindal4921 ปีที่แล้ว

    Great explanation Arpit👏

  • @jagannathprasaddas1636
    @jagannathprasaddas1636 ปีที่แล้ว

    Does a DBMS behave more intelligent than an os by bypassing the filesystem I/O? If yes what is the system call for reading writing to disk block from user space?

  • @sahazeerks2363
    @sahazeerks2363 ปีที่แล้ว

    Awesome explanation, loved it

  • @Jagrati
    @Jagrati 2 หลายเดือนก่อน

    Somehow, I am naturally very curious about how various kinds of databases/use cases store data at the lowest level i.e. disk level/SSD level.
    How various social websites/apps model and store list of followers/followees of a user? because after each friend request, this entire list changes.
    Sources of my confusion:
    (1) If we store this entire data in one field, that field will keep changing.Do relational databases even provide such variable length field(I know varchar,string etc, but that doesnt seem right data structure for this, to begin with)
    (2) instagram I read some time back uses postgres. How does postgres model such variable length field?
    (3) is there any social platform which uses nosql just to meet this criteria. That would sound funny though.
    (4) I am inclined to split the discussion into 2 parts :
    (a) When this field will be updated in-place like regular B+ tree, then a separate block might be needed to be allocated everytime, the new field size causes the block to expand.
    (b) if we use a database which uses sstable/memtable, and argue that anyway the field will be appended after every change, still we will cause intentional appending of this entire field(list of complete followers/followees after each new friend request) very frequently.
    (5) Some reference from real world use cases of how actual systems implement this in practice, will be very useful. Given your vast readings, I am sure you will be able to add value to it.
    (6) graph databases might be an option, but in real practice, are there social apps which use graph database for this use case?
    Thanks a lot!,
    If you find the doubt generic enough, you might make content for general consumption.
    Jagrati

  • @learnwithme7750
    @learnwithme7750 ปีที่แล้ว +1

    Hey Arpit
    Really good content..but one doubt in findById when we reach the leaf node of B+ tree then to search a particular should we need to traverse one by one or we can use binary search as well to reach a particular node?

    • @DurgeshYadav-bc5nm
      @DurgeshYadav-bc5nm 11 หลายเดือนก่อน

      It's not ordered, so the search has to be linear. As the nodes can be scattered anywhere.. like.. imagine linked lists

  • @gauravsharma3142
    @gauravsharma3142 10 หลายเดือนก่อน

    If a table has too many columns such as the data in each row exceeds 4kb of block size
    Will that block size will be of bigger value
    Or the databse will partition the row on basis of columns

  • @chandrachurmukherjeejucse5816
    @chandrachurmukherjeejucse5816 9 หลายเดือนก่อน

    Great content as always.

  • @prashantshubham
    @prashantshubham ปีที่แล้ว

    Great detailed video, thanks Arpit. Had one doubt if you can explain:
    At one point we're saying when we're finding one by Id = 3 it does 3 disk reads from root of the tree to leaf node, but isn't there caching involved and indexes are cached in memory so that the disk reads are avoided also is the caching involved only after 1st call is done pertaining to a row/table or are the indexes cached as soon as DB is ready to take reads/writes?

  • @itsthesecondchannel
    @itsthesecondchannel ปีที่แล้ว

    Thank you

  • @butterhalves4262
    @butterhalves4262 ปีที่แล้ว

    awesome.

  • @subee128
    @subee128 5 หลายเดือนก่อน

    Thanks

  • @BugMug1996
    @BugMug1996 ปีที่แล้ว +1

    Kya quality he bhaiya videos ki

  • @saifulhasan2532
    @saifulhasan2532 ปีที่แล้ว

    What if we want to search a name? How B+ tree will work?

    • @AsliEngineering
      @AsliEngineering  ปีที่แล้ว

      same way, strings are comparable.

    • @saifulhasan2532
      @saifulhasan2532 ปีที่แล้ว

      So tthey will have different indexing? And we would have another B+ tree@@AsliEngineering

  • @aayush7099
    @aayush7099 ปีที่แล้ว

    Sir, What will happen we use B trees instead of B+ trees.

    • @AsliEngineering
      @AsliEngineering  ปีที่แล้ว +8

      intermediate nodes bloat up (given data now resides there as well). This means it will take us to read more data from disk to reach rows present in the leaf.
      Performance will degrade.

  • @ankuragarwal9712
    @ankuragarwal9712 ปีที่แล้ว

    At th-cam.com/video/09E-tVAUqQw/w-d-xo.html -> are we assuming always that all B+ tree nodes will be part of one file therefore concluding that B+ tree leaf nodes contain offset of the next leaf node?

  • @rjarora
    @rjarora ปีที่แล้ว

    with all due respect, you should choose better topics for the videos