LSM Tree + SSTable Database Indexes | Systems Design Interview: 0 to 1 with Google Software Engineer

แชร์
ฝัง

ความคิดเห็น • 78

  • @SreekantShenoy
    @SreekantShenoy 2 หลายเดือนก่อน +17

    I took notes bro. great going!
    Indexing techniques:
    1) Hash Index
    - They are basic hashmaps in memory. (NOT disk - as we need random hashmap access instantly and not seek it)
    - Thus limited by memory space, expensive
    - Fast reads and writes
    - Bad for range queries
    - To add durability, we add WAL - write to disk as log (easy as pointer is always at the end) and then write to the hashmap
    2) B-Trees
    - Self balancing tree, entirely kept on disk
    - Hence NO limit on size
    - Advantage having lesser concern over durability
    - Slower reads than “Hash index” technique as - disk vs memory speed.
    - Reads are fast - O(LogN) to retrieve, but slow writes
    - Good for range queries (as stored in tree structure)
    - Durability can be ensured by NOT updating tree in place, rather updating the pages/block separately file and then updating the reference later on.
    3) LSM Tree + SSTable
    - In-memory tree table (AVL, red-black), self balanced
    - Little faster writes compared to B-Trees, but slower than hash index
    - Slower reads speeds than B-Trees, as it has to check all SStables (Also slower range queries)
    - Durability handled by WAL - which can be replayed later
    - Space concern of LSM trees as in-memory, However we have SStables which are outputted to disk
    - Hence No of keys limit not a concern (to store by space)
    - SSTable - sorted, immutable
    - Finding key - can go on taking time - it is first checked in tree (memtable) and only then across SSTable backwards from creation time
    - To delete, its implemented by tombstone value - indicating deleted. It is added to the latest SSTable entry
    LSM Optimizations
    Use Sparse index
    - Also known as index for SSTable
    - Take “certain” keys of SSTable and create a sparse index with corresponding stored memory location.
    - Can use binary search to find in-between keys
    Bloom Filters
    - Can vaguely tell, if certain “key” present in the SSTable
    Compaction
    - Merges SSTables in background - O(N) operation
    - Cons is extra CPU processing in background could be bad

  • @slover4384
    @slover4384 16 วันที่ผ่านมา +3

    LSM tree is the name industry gives to the entire structure (the in memory BST plus the different SST tables)
    The in memory BST part is called the "memtable"
    That is, memtable + SSTables == LSM Tree

  • @daisuke.ryomen
    @daisuke.ryomen 13 วันที่ผ่านมา +1

    Just started watching this series, till now it has been a lot of fun + a lot of learning!

  • @youssefhany6345
    @youssefhany6345 6 หลายเดือนก่อน +6

    This part was mentioned in the system design interview boob by Alex Xu in the end of chapter 6. Thank you for the video as it really clarifies this part with deep details.

    • @bezimienny5
      @bezimienny5 หลายเดือนก่อน

      I like interviewing boobs

  • @Anon-bt6dz
    @Anon-bt6dz 3 หลายเดือนก่อน +4

    Excellent video. Watched this to fill some gaps after reading about this topic on DDIA.

    • @atibhiagrawal6460
      @atibhiagrawal6460 2 หลายเดือนก่อน

      Same. This is excellent. This channel is a gem

  • @nintendolover82
    @nintendolover82 4 หลายเดือนก่อน +2

    This is the exact series I was looking for. Subbed

  • @ihsannuruliman4005
    @ihsannuruliman4005 7 หลายเดือนก่อน +4

    Boredom and depression are both our lovely friends.

  • @vimarshkoul930
    @vimarshkoul930 ปีที่แล้ว +3

    This video literally cleared all my doubts. Thanks

  • @Amonnuns
    @Amonnuns 4 หลายเดือนก่อน +1

    This is really great content, I just learned that from a book and wanted to make sure a captured everything. Thanks!

  • @mann8737
    @mann8737 ปีที่แล้ว +1

    Nice work as usual Jordan!

  • @MrCanokan
    @MrCanokan 4 หลายเดือนก่อน +1

    Great opening, Great Explanation, THANKS

  • @utkarshkapil9272
    @utkarshkapil9272 2 วันที่ผ่านมา +1

    bro is goat anywhere stuck come here

  • @RandomShowerThoughts
    @RandomShowerThoughts ปีที่แล้ว +5

    2:55 LMAOO this is when you gained a subscriber

  • @AbhijithVMohan
    @AbhijithVMohan 2 หลายเดือนก่อน +1

    Hi Jordan. Great series.
    DDIA mentions that binary search is not used to search in the sstable files. Rather the range from the sparse index is searched by a sequential scan. The reason for this is given as variable size of records. Also the block between 2 offsets in the sparse indeed would be compressed as a whole as a further optimization

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 หลายเดือนก่อน

      Thanks for sharing! I believe once you scan the offsets in the sparse index, you binary search from the offsets you care about, but we may be misunderstanding one another

  • @hassanharera
    @hassanharera 11 หลายเดือนก่อน +1

    Awesome explanation 👏

  • @Delicatamente
    @Delicatamente หลายเดือนก่อน +1

    14:16 LSM also faster on writes over B tree because every write operation on B tree will start index restructurization, which is not happening in LSM case.
    Thank you for your videos.

    • @jordanhasnolife5163
      @jordanhasnolife5163  หลายเดือนก่อน +1

      I think some percentage of them will, depends if the leaf node is empty but good point here

  • @vojindjukic2719
    @vojindjukic2719 9 หลายเดือนก่อน +1

    very well explained, thanks

  • @shamilgurban6439
    @shamilgurban6439 หลายเดือนก่อน +1

    More clear than DDIA to be honest. Book left some gaps.

    • @jordanhasnolife5163
      @jordanhasnolife5163  หลายเดือนก่อน

      It's too goated so I had to make videos I could understand

  • @dwivedys
    @dwivedys 2 หลายเดือนก่อน +1

    Boy oh boy! Beautiful this 👍👍👍👌👌👌

  • @vankram1552
    @vankram1552 ปีที่แล้ว +19

    This dude just making a youtube video for each chapter of "designing data intesive applications"

    • @jordanhasnolife5163
      @jordanhasnolife5163  ปีที่แล้ว +16

      He sure is - he also makes TH-cam videos on non DDIA stuff afterwards too

    • @staffeng
      @staffeng 10 หลายเดือนก่อน +4

      There's nothing wrong with that. DDIA gets so much into the weeds that you forget the broader points it's trying to make by the end of the chapter. Videos are faster, concise, and easier to revisit and revise. Good luck revising with DDIA.

    • @arshadhammad
      @arshadhammad 7 หลายเดือนก่อน

      @@jordanhasnolife5163 And we love him for it!

    • @MikeTypes
      @MikeTypes 5 หลายเดือนก่อน +2

      That’s a good thing, no?

    • @dwivedys
      @dwivedys วันที่ผ่านมา

      And what - if any - may be an issue with that?

  • @JinilSasidharan
    @JinilSasidharan 2 หลายเดือนก่อน +1

    Excellent, Thanks bro!

  • @ky747r0
    @ky747r0 ปีที่แล้ว +5

    As usual great video Jordan. I do have one suggestion, I believe that your powerpoint method of explaining was more effective. And you know what SWEs say "If it ain't broke, don't fix it" 🤪

    • @jordanhasnolife5163
      @jordanhasnolife5163  ปีที่แล้ว +5

      Well to be fair, I've done all this stuff on powerpoints haha, and I've gotten relatively feedback surrounding the iPad. I think that someone will always be a little less convenienced no matter what I do, so I'll probably stick with it for now! If it helps, the slides are linked in my channel description and you can use those to help understand these videos better!
      Thanks for the feedback :)

    • @ky747r0
      @ky747r0 ปีที่แล้ว +1

      @@jordanhasnolife5163 True that

  • @code_with_om
    @code_with_om 11 หลายเดือนก่อน +1

    Amazing video man ❤ed it

  • @yahorzabalotski
    @yahorzabalotski ปีที่แล้ว +2

    Hi Jordan! Great job! AFAIK key length may vary, something (e.g. special structure of each row) needs to be done additionally to support bsearch, right? Am I missing something?

    • @jordanhasnolife5163
      @jordanhasnolife5163  ปีที่แล้ว

      I'm a bit confused here. You're saying that when key lengths can vary normal binary search doesn't work? As long as they're sorted and there's a comparison function implemented that shouldn't be the case.

    • @yahorzabalotski
      @yahorzabalotski ปีที่แล้ว +1

      Of course it works, but you somehow need to know positions where to jump, right? e.g. additional file with mapping {"key_n" : "starts_at_pos_x_in_sstable"}

    • @jordanhasnolife5163
      @jordanhasnolife5163  ปีที่แล้ว +2

      @@yahorzabalotski Yeah that's what the sparse index is for

    • @7th_CAV_Trooper
      @7th_CAV_Trooper ปีที่แล้ว

      @@yahorzabalotski slotted pages probably works

  • @dibyanshupranjal7704
    @dibyanshupranjal7704 11 หลายเดือนก่อน +3

    so it the SS table just the structure in disk? I have got confused reading data intensive applications

    • @jordanhasnolife5163
      @jordanhasnolife5163  11 หลายเดือนก่อน +2

      It's literally just sorted data on disk yea

  • @moidrees4661
    @moidrees4661 4 หลายเดือนก่อน +1

    I think we store a part of B-Tree if not full in-memory for faster searches and perform write in-memory and do periodical flush to disk. If they are written in-memory then B-Tree ~ LSM Trees. Please correct me if I am wrong.

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 หลายเดือนก่อน

      There is definitely a lot of caching involved! It helps a lot to use a write back in memory cache of certain nodes within the b-tree.

    • @moidrees4661
      @moidrees4661 4 หลายเดือนก่อน +1

      @@jordanhasnolife5163 Can you share the ipad notes for this 2.0 playlist.

    • @jordanhasnolife5163
      @jordanhasnolife5163  4 หลายเดือนก่อน

      @@moidrees4661 Yup, will get to this when I finish my current series.

  • @Andron4iKTV
    @Andron4iKTV หลายเดือนก่อน +1

    What do u think about write amplification in LSM tree? A lot of people talk about it as big cons. BUt rel db still have this issue too. Do u know a db or ways how various db handle this?

    • @jordanhasnolife5163
      @jordanhasnolife5163  หลายเดือนก่อน

      Good question - I am not sure off of the top of my head, if you google this and come back with an answer definitely let us all know!

  • @stefanss2494
    @stefanss2494 3 หลายเดือนก่อน +2

    There are a couple of things which I didn't understand (I assumed that bloom filters and sparse Index are created per SSTable):
    1. So each SSTable has a bloom filter. Basically when we add a new entry to the memtable (the in memory balanced BST) we also update the bloom filter. When a certain threshold has been exceeded => memtable is flushed on disk (SSTable). Is the bloom filter also flushed on disk (and we have a mechanism of loading into memory the bloom filters and cache the most recent ones)?
    2.I assume that Sparse Index are also written on disk. If the bloom filter is an array of bits and for each key we apply k hash functions and set to 1 the respective bits (constant time) how is this sparse index build (how do we know the address on disk for a key)? Do we keep 2 BSTs in memory and we flush both of them ? If that is the case how to we store in the Sparse Index the address on disk of that key ?
    3. In case of compactions we would have to update both bloom filters and sparse Index right ?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 หลายเดือนก่อน

      1) Each SSTable has a bloom filter which is created when the SSTable is created (flushed from memory). Yep, you'd store the bloom filter on disk and load it into memory!

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 หลายเดือนก่อน

      2) I'm not sure what you mean by sparse index. Presumably you mean a secondary index, in which case yeah you basically have to replicate the whole LSM tree, SSTables, and bloom filters for whatever the key is for that index

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 หลายเดือนก่อน

      3) Again, assuming sparse index = secondary index. You have to update bloom filters on compaction, but I'd think that for the whole sparse index you can compact it separately from when you compact the primary index.

    • @stefanss2494
      @stefanss2494 3 หลายเดือนก่อน +1

      No that is not what I mean by sparse Index. In the video (minute 8:15) you showed 2 optimisations for the LSM tree, one was bloom filter and the other one was Sparse Index (about which you said it contains some of the keys of the SSTable and speeds up searching for a key in an SSTable).

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 หลายเดือนก่อน

      @@stefanss2494 OMG I totally forgot about the sparse index! Yes that would have to be rebuilt upon a merge because now the addresses change. In theory, you could probably "merge" the two bloom filters by just combining the ones, but that's effectively rebuilding them.

  • @itmatterstomepodcast
    @itmatterstomepodcast หลายเดือนก่อน +1

    Why would any search for a key/value go through multiple SSTables? Shouldn’t it search the tables in the order they were written to starting with the most recent. That way it knows the first time it finds the key/value then that is the most up to date data for it

    • @jordanhasnolife5163
      @jordanhasnolife5163  หลายเดือนก่อน

      That's what we do - multiple just means more than 1, what if our key isn't in the first SSTable file?

  • @andriidanylov9453
    @andriidanylov9453 ปีที่แล้ว +1

    👍✍

  • @Spreadlove5683
    @Spreadlove5683 7 หลายเดือนก่อน +1

    The tombstone just lives in memory right? Well, until you flush memory to disk when you run out of space, right?

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 หลายเดือนก่อน

      It actually does go to disk, because keep in mind we have multiple sstables, so you need the tombstone until they get compacted together

    • @Spreadlove5683
      @Spreadlove5683 7 หลายเดือนก่อน +1

      ​@@jordanhasnolife5163 thank you! when do they go to disk? Immediately upon creation? What I meant in my original comment is that I figured they live in memory until the memory gets flushed into an SS table (so, not talking about swap space), but I'm afraid I don't understand how things work if that's not the case. Memory is always the primary source of truth anyways, right? And SS tables are immutable you said, but I guess you could associate a tombstone with an SS table entry without modifying the SS table itself, depending on if you consider that a mutation.

    • @jordanhasnolife5163
      @jordanhasnolife5163  7 หลายเดือนก่อน +1

      Yep they first go to memory and then are flushed to disk like a normal write :)@@Spreadlove5683

    • @Spreadlove5683
      @Spreadlove5683 7 หลายเดือนก่อน

      ​@@jordanhasnolife5163thanks!!

  • @zuowang5185
    @zuowang5185 5 หลายเดือนก่อน +1

    Is nlogn really not acceptable when n is ok?

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 หลายเดือนก่อน

      Can you elaborate on this question? But generally, no - any increase to time complexity is pretty huge when you're working on large enough datasets.

  • @dibll
    @dibll ปีที่แล้ว +1

    Question - LSM trees have overhead of WAL(Disk) in additional to memory write while B Trees only write to Disk then why B Trees are slower in writes when compare to LSM? Also I believe B trees do not have a WAL as they are not in memory, right?

    • @jordanhasnolife5163
      @jordanhasnolife5163  ปีที่แล้ว +3

      Btrees still likely have a WAL if you are performing writes in place. This functions to make sure that the tree doesn't accidentally get in an inconsistent state if the db fails. Btree write is WAL plus disk tree traversal, LSM tree write is WAL plus in memory tree traversal

    • @7th_CAV_Trooper
      @7th_CAV_Trooper ปีที่แล้ว +1

      Btree are mutable so they require locked writes and they would probably use a page buffer that is flushed after WAL. Additionally btree on disk requires random access which is slower than sequential.

  • @Hjksivgx
    @Hjksivgx 5 หลายเดือนก่อน +1

    I love the condenced version of DDIA. Hand drawing looks very ugly though. Why not powerpoint or Excalidraw? I would say more professional diagrams will get you way further. Good luck!

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 หลายเดือนก่อน

      Wow not a fan of my artistic abilities??
      Understand your point with excalidraw, however it would take a me a lot longer to iterate there and I couldn't write on top of them, hence why I opted for hand drawings.
      I could hire an animator, but all of these videos aren't behind a paywall...so maybe that'll be version 3.0

    • @Hjksivgx
      @Hjksivgx 5 หลายเดือนก่อน +1

      Many of your early videos featured prewritten content that looked much better than your art, haha. I understand that you already had the content, and also, redrawing for every new video without existing content would take an additional 15 minutes, but I think it's still worth it. Having it pre-drawn would make your videos shorter and much more professional. Anyways, there isn't much content like yours on TH-cam, to be honest-so concise and focused solely on the information we need for SDI. Good job!

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 หลายเดือนก่อน

      @@Hjksivgx Appreciate the feedback!

  • @unknownman1
    @unknownman1 27 วันที่ผ่านมา +1

    leaving the course at 6:03

  • @prashlovessamosa
    @prashlovessamosa ปีที่แล้ว +1

    Hey yo