The Secret Sauce Behind NoSQL: LSM Tree

  • Published on 24 Dec 2024

Comments • 192

  • @Kxneki2433
    @Kxneki2433 10 months ago +63

    IMPORTANT: Don't forget the Memtable is stored in memory, so if the system crashes, that data will be lost. To avoid losing data, we can maintain a separate log file on disk. Every time we write to the Memtable, we'll also append that write to the log file (no need to sort it as we just use it to restore after a crash). Then once the Memtable contents get written out to a SSTable file, we can erase the log file. That way the log helps us avoid losing writes stuck in memory when a crash happens.
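
A minimal sketch of the scheme described in this comment, assuming a JSON-lines log file. The `Memtable` class, its method names, and the file layout are hypothetical and are not taken from any real database. It ignores a partially written last log line, which a real system would have to handle.

```python
import json
import os
import tempfile


class Memtable:
    """In-memory table backed by an append-only log (hypothetical sketch)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # recover writes that never reached an SSTable
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # Re-apply logged writes in their original order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                key, value = json.loads(line)
                self.data[key] = value

    def put(self, key, value):
        # Append to the log first (unsorted), then update memory.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def flush_to_sstable(self, sstable_path):
        # Write the contents in key order, then discard the log.
        with open(sstable_path, "w", encoding="utf-8") as f:
            for key in sorted(self.data):
                f.write(json.dumps([key, self.data[key]]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.log.close()
        os.remove(self.log_path)
        self.data = {}
        self.log = open(self.log_path, "a", encoding="utf-8")

    def close(self):
        self.log.close()
```

Opening a new `Memtable` on the same log path after a crash replays the logged writes, which is the recovery step the comment describes.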

    • @jerichaux9219
      @jerichaux9219 6 months ago +1

      This seems like a fairly important detail.

    • @sujithmenonp
      @sujithmenonp 5 months ago

      Kafka is used as the logging system by default in most scenarios

    • @xijinping5064
      @xijinping5064 2 months ago +2

      a write-ahead log.

    • @nguyennoiphap6989
      @nguyennoiphap6989 1 month ago

      That's the interesting information. Thank you so much.

  • @sealuke2724
    @sealuke2724 2 years ago +97

    Bruh, this is just awesome... keep going

  • @JoelBrubaker
    @JoelBrubaker 2 years ago +185

    This is the perfect amount of depth and overview I’m looking for. Great videos and visuals!!

  • @2p2hong
    @2p2hong 2 years ago +22

    This guy is better than my team lead in terms of explaining a NoSQL concept. Thank you, you made my day.

  • @maxil122
    @maxil122 2 years ago +16

    That's the best system design content I have ever seen on youtube ! This channel is absolutely amazing. It must be tough to squeeze all that valuable knowledge into less than a 10 minutes video. Keep up the excellent work!

  • @bazoo513
    @bazoo513 2 years ago

    This is the most information-dense video on what's so special about NoSQL databases I ever saw. Not a word was superfluous, but the key concepts were clearly transmitted.

  • @mr_nature
    @mr_nature 2 years ago +25

    I appreciate your efforts. Thanks for making system design more palatable than ever.

  • @nishantgoel769
    @nishantgoel769 10 months ago

    One of the most concise videos for understanding how Elastic/Lucene uses these techniques for fast writes and reads.
    Great work man!

  • @danielgospodinow
    @danielgospodinow 10 months ago

    I can't recall the last time I stumbled upon such great material! Fascinating work!

  • @rafaelacioly3252
    @rafaelacioly3252 2 years ago +1

    This channel is by far the best channel that I've found on yt about tech

  • @AminAramoon
    @AminAramoon 2 years ago +12

    These videos are superb man, keep up the good work

  • @arno.claude
    @arno.claude 1 year ago

    This channel is such a gold mine!

  • @BRBearUSA
    @BRBearUSA 1 year ago

    VERY informative without going into too much complexity. THANKS and congrats for a great video. I'm an MS SQL Server DBA, and the high level explanation you provided was awesome. Thanks again. Best, R.

  • @richsadowsky8580
    @richsadowsky8580 2 years ago

    Really awesome overview of how SQL and NoSQL differ. Agree with Joel below me just the right level of detail to provide value.

  • @frederickbarbarossa2746
    @frederickbarbarossa2746 1 year ago

    Besides the Bloom filter, a sparse index helps find a key quickly, so we only look into a small number of SSTables.
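
A rough sketch of the sparse-index idea, assuming an SSTable is modeled as a sorted in-memory list of `(key, value)` pairs. The function names and the `every` parameter are hypothetical. The index keeps only every n-th key, so a lookup binary-searches the small index and then scans a single block.

```python
from bisect import bisect_right


def build_sparse_index(sorted_records, every=4):
    """Keep the key and position of every `every`-th record."""
    return [
        (sorted_records[i][0], i)
        for i in range(0, len(sorted_records), every)
    ]


def lookup(sorted_records, index, key, every=4):
    """Find `key` by locating its block in the index, then scanning the block."""
    index_keys = [k for k, _ in index]
    slot = bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first record
    start = index[slot][1]
    for k, v in sorted_records[start:start + every]:
        if k == key:
            return v
    return None
```

On disk, the stored position would be a byte offset into the file rather than a list index, but the lookup has the same shape.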

  • @bardhan.abhirup
    @bardhan.abhirup 2 years ago +8

    These videos are incredible! Very well paced and presented!

  • @CppExpedition
    @CppExpedition 2 years ago +2

    LOVE THE DARK BACKGROUND! :D:D:D:D (Also the video of course!!)

  • @DK-ox7ze
    @DK-ox7ze 2 years ago +1

    Great explanation. Resolved all my doubts on how NoSql DBs work. However, I wanted to understand
    1) Whether the balanced tree and keys in sorted set is only the object key with pointer to data value or it also contains the actual data?
    2) Can a NoSql DB index multiple keys?
    3) Why can't SQL DB also implement flushing mechanism in order to speed up writes? I know that they are highly consistent so they need to persist data to disk, but they can simply append the entry in a log file just like NoSql DBs do, and in case of a network partition, first check the log file to sync data in actual database?

    • @bojandolinar1535
      @bojandolinar1535 1 year ago

      Re 3 afaik that's what they already do. But sooner or later it has to write them to b-tree, which I guess is the real bottleneck.

  • @arthursoares610
    @arthursoares610 2 years ago +12

    The dark mode was awesome. I think it could be the default from now on

    • @guptaanmol184
      @guptaanmol184 2 years ago

      I came here to say this! +1 for dark mode, finally!

  • @RajinderYadav
    @RajinderYadav 11 months ago

    I love these byte-sized videos; they help introduce new concepts.

  • @susingh2
    @susingh2 2 years ago

    I have never seen such a communicative video with such a simple and easy explanation. Thanks Alex. Please keep it up and upload more such videos. I have anyway bought both volumes of the System Design book. Thank you so much!!!

  • @Andrew-rc3vh
    @Andrew-rc3vh 2 years ago +11

    That's a cool trick. So what it is essentially doing is spreading out the computer's resources across time, where with a traditional database you will get lots of spikes in processing on the timeline when doing reads and writes. Mind you the attraction of SQL was that you could have multiple indexes and create custom views on the data in a highly relational way, but granted that was very expensive on resources. I think traditional databases were mostly optimised for the deficiencies of the mechanical hard disk drive. I think it may end up as redundant in future as we store our data more on chips.
    Thanks for the video. I like videos that don't waste your time with long BS intros.

    • @BRBearUSA
      @BRBearUSA 1 year ago +1

      You read my mind when it comes to "spreading the computer's resources across time"... Which for some use cases makes perfect sense, but not all use cases.
      Not sure I agree with the "deficiencies of mechanical HDDs" part of your comment though. But great comment overall.

  • @doxologist
    @doxologist 2 years ago

    Perhaps the best educational systems content on the whole of YouTube right now. Great stuff

  • @anilkaliya3375
    @anilkaliya3375 1 year ago

    Nicely explained. I read all this information in the book Designing Data-Intensive Applications, but a few topics like memtables and SSTables were still a bit unclear to me. Got the whole idea now. Great stuff. Keep going

  • @lechprotean
    @lechprotean 2 years ago +20

    great, you make it sound so simple, I'm writing my own nosql db this weekend ;)

  • @shoobidyboop8634
    @shoobidyboop8634 2 years ago

    Stuff like this is the future of many forms of education.

  • @bobdinitto
    @bobdinitto 2 years ago

    I've often wondered how NOSQL databases can achieve higher write throughput than relational DB. Thank you for sharing the techniques involved. Your explanation is clear and the graphics are excellent!

  • @LeoFuso
    @LeoFuso 2 years ago +1

    Great work! I wish I had watched this video before trying to learn the architecture behind RocksDB by only looking their documentation, hahaha! Awesome work!

  • @KostiantynKostin
    @KostiantynKostin 2 years ago +2

    6:56 correction regarding Bloom Filter
    If the Key doesn't exist, bloom filter will NOT return False Negative
    If the Key doesn't exist, bloom filter MAY return False Positive
    In other words.
    If bloom filter says there's NO key in page, there's definitely no key in page.
    If it says that there's the key in page, there MIGHT be the key in page with some probability.
    This probability data structure helps to reduce unnecessary reads.
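
A small sketch of those two guarantees, assuming SHA-256 with a per-hash prefix as the hash family. The `BloomFilter` class, its sizes, and the one-byte-per-bit storage are illustrative choices, not how any particular database implements it.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions from the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False: the key was definitely never added.
        # True: the key was probably added (other keys may have set the bits).
        return all(self.bits[pos] for pos in self._positions(key))
```

Every added key sets all of its bits, so `might_contain` can never return False for a key that was added. A True answer only means all the bits happen to be set, which is why the SSTable still has to be read to confirm.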

    • @Winnetou17
      @Winnetou17 5 months ago

      That's exactly what is said in the video

  • @carolinegr
    @carolinegr 2 years ago +4

    One additional thing to mention (around th-cam.com/video/I6jB0nM9SKU/w-d-xo.html perhaps) is that the writes are written to memory AND a transaction log to ensure durability. Otherwise whatever was in the MemTable will not persist after a crash. The transaction log can be replayed to rebuild the MemTable.

    • @ByteByteGo
      @ByteByteGo 2 years ago +5

      Thank you for the feedback.
      We did have that initially, but decided to take it out to focus on the LSM tree itself.
      We knew someone would bring it up, but didn't think it would take this long. 😂
      We are glad you did.

    • @ibgib
      @ibgib 2 years ago +1

      I was actually wondering if this were a difference with what I understand of relational dbs. Thanks for pointing it out.

  • @tomtomsiesie5436
    @tomtomsiesie5436 2 years ago +1

    Another amazing video! This format is so valuable.

  • @kwalter_6557
    @kwalter_6557 2 years ago +3

    You’ve been hitting it out of the park with these videos! Really enjoy the content.
    6:50 “(bloom filters) return a firm no if a key does not exist”. That’s not quite right. Bloom filters return a firm yes if a key exists. If a key does not exist, they might still return yes with a low probability.

    • @santoshjoshi3396
      @santoshjoshi3396 2 years ago +3

      He is correct .. false positive is possible but not false negative

    • @kwalter_6557
      @kwalter_6557 2 years ago +2

      That’s exactly right!
      The subtlety here is in his claim that
      1. Key does not exist -> returns no
      2. Key “might” exist -> probably returns yes
      Whereas Bloom filters really guarantee
      1. Returns no -> key does not exist
      2. Returns yes -> key probably exists
      i.e. “Bloom filters return a firm no (only) if a key does not exist”

    • @KostiantynKostin
      @KostiantynKostin 2 years ago

      You are correct. Video didn't communicate properties of Bloom Filter well.

    • @yan0kyan0
      @yan0kyan0 2 years ago

      in other words, a query returns either "possibly in set" or "definitely not in set".
      definitely not in set -> if the key does not exists, 100 % no answer.
      en.wikipedia.org/wiki/Bloom_filter#:~:text=A%20Bloom%20filter%20is%20a,a%20member%20of%20a%20set.

    • @Winnetou17
      @Winnetou17 5 months ago

      What is wrong with you people? That's exactly what's said in the video, and much simpler than what you try to explain. Am I missing something? There are other comments like this too, arguing that the video is somehow incorrect, when in fact it is correct, or at least what they say is the same as what the video says.
      Which is: if the Bloom filter says there's no key, you can trust that and skip to the next level.

  • @balaclava351
    @balaclava351 2 years ago

    Great video. I'm a junior dev that has to implement a chat feature in the next few weeks. This really helped me understand NoSQL. Thanks.

  • @pashachechehov3483
    @pashachechehov3483 1 year ago

    Great visualization of "Designing Data-Intensive Applications" book

  • @roct07
    @roct07 1 year ago

    This is so high quality. Thank you :)

  • @AndyThomasStaff
    @AndyThomasStaff 1 year ago +1

    Great explanation, thank you, this helped reinforce my learnings about LSM-trees from reading. The graphics were especially helpful

  • @Konservator69
    @Konservator69 2 years ago +3

    Good video. As a follow-up topic, it'd be interesting to see a comparison of the LSM tree with Redis RDB/AOF persistence.

  • @modolief
    @modolief 2 years ago +1

    Sublime, superb, excellent ... again.

  • @paramvirsingh5640
    @paramvirsingh5640 2 years ago

    Looking at how beautifully my tired brain understood this, this video deserves a Nobel Prize.

  • @quirkyquerty
    @quirkyquerty 7 months ago

    in their book, bloom filters are stored on disk, but here they're shown to be in memory. Hopefully we'll get some eventual consistency

  • @TheOnlyRealBreadIntheWorld
    @TheOnlyRealBreadIntheWorld 8 months ago

    amazing content, thank you for all the hard work you do!

  • @sampleshawn5380
    @sampleshawn5380 2 years ago +1

    Man this video is awesome, so much information loved it!

  • @alamelu85
    @alamelu85 2 years ago

    Alex - Your lectures and contents are great assets to software engineering community.

  • @faris_id_music
    @faris_id_music 2 years ago

    one of the best videos so far

  • @AllOfMyWat
    @AllOfMyWat 2 years ago

    I can't wait for my next systems design interview!

  • @shakedrosenblat1925
    @shakedrosenblat1925 1 year ago

    Thank you. great video, as always.
    I'd like if you guys could go into more detail

  • @singhsaubhik
    @singhsaubhik 2 years ago +2

    This is an awesome overview of the LSM tree. If someone wants to dig deeper, read "Designing Data-Intensive Applications".

  • @BiggRanger
    @BiggRanger 2 years ago

    Excellent presentation!

  • @aus10d
    @aus10d 1 year ago

    Very interesting! Loved this video

  • @peepeepoopoo2243
    @peepeepoopoo2243 2 years ago +1

    Great video!

  • @ibgib
    @ibgib 2 years ago

    Like many have said, this is a great dense video at a good beginner depth for people like me. I hope there will be a relational db video that complements this if at all possible, since the quality level is so high 🤞

  • @eliyahubasa9401
    @eliyahubasa9401 2 years ago +1

    Great Content, thank you very much.
    I'm already waiting for Tuesday for a new video, as much as I wait for a new One Piece episode.

  • @GyroCannon
    @GyroCannon 2 years ago

    Not at all a question I had, but glad I watched because I extensively use Mongo for my own app

  • @sahilchadha9621
    @sahilchadha9621 4 months ago

    Please create a video on size tiered and level compaction strategy as well

  • @jiamingliu192
    @jiamingliu192 2 years ago

    Absolutely amazing. Thank you for making the video!

  • @ThangNguyen-je8kv
    @ThangNguyen-je8kv 2 years ago

    The next step after watching this video is to … watch it again 😂. Awesome as always.

  • @mnchester
    @mnchester 2 years ago +1

    amazing video!

  • @arunkutube
    @arunkutube 9 months ago

    great explanation for beginners

  • @thelonearchitect
    @thelonearchitect 2 years ago +4

    Thanks for the video. Your explanation raises a concern for me: since the memtable is in memory, what happens if the server crashes before flushing?
    Is that memtable distributed or replicated ?

    • @lwfeagan
      @lwfeagan 2 years ago +2

      Cassandra, for example, still has a write ahead log.

    • @nitinagrawalbst
      @nitinagrawalbst 2 years ago +1

      Generally, a write-ahead log is maintained for the memtable. Once the memtable is flushed to create an SSTable, the write-ahead log is deleted. After a crash, the write-ahead log can be used to restore the memtable.

  • @plussin2760
    @plussin2760 9 months ago

    I was able to gain an understanding of the LSM tree and an overview of how it works. Thank you.

  • @MostafaZeinali
    @MostafaZeinali 2 years ago

    Thank you for the great video. Keep up the good work. ❤

  • @santozard
    @santozard 2 years ago +1

    Freaking intuitive talk.

  • @StephenGillie
    @StephenGillie 2 years ago

    Very cool. Could have
    - one process just taking DB writes and putting them in memory
    - another writes too-big variables to files on disk
    - next would go through files and flatten them (like continuous truncate/shrink on a transaction log)
    - last would take DB reads and go through the memory, bloom filters, and file structure to find and return the requested data.
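
The four roles listed above can be sketched in one small class. This is a single-threaded toy under stated assumptions: `TinyLSM` and `memtable_limit` are hypothetical names, SSTables are kept as in-memory sorted lists instead of files, and Bloom filters are left out to keep it short.

```python
class TinyLSM:
    """Toy LSM store: memtable writes, flush, compaction, newest-first reads."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Role 1: accept writes in memory.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Role 2: write a full memtable out as an immutable sorted table.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def compact(self):
        # Role 3: flatten all tables into one; newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []

    def get(self, key):
        # Role 4: check memory first, then tables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

In the comment's design each role would run as its own process; here they are plain methods so the data flow is easy to follow.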

  • @obedgennius1401
    @obedgennius1401 2 years ago +1

    I really appreciate these videos!! Thank you very much.
    I would like to know what software you are using to produce such presentations 🙏

  • @galeop
    @galeop 1 year ago

    Amazing video
    1:15 what is that "object key" ? The row key that is being edited/added to the keyspace ?

  • @aaronprindle385
    @aaronprindle385 2 years ago

    Awesome work, thanks for this!

  • @ziakhan-tk7rk
    @ziakhan-tk7rk 2 years ago +4

    What software do you use for animation in your videos?
    I am very curious

    • @wrondonparticual5113
      @wrondonparticual5113 2 years ago

      I want to know as well. It is perfect!!!

    • @biwer-r
      @biwer-r 1 year ago

      @@wrondonparticual5113 Maybe this is something :-) th-cam.com/video/H5GETOP7ivs/w-d-xo.html

  • @jrabelo_
    @jrabelo_ 2 years ago

    perfect explanation, thanks

  • @rbelatamas
    @rbelatamas 1 year ago

    great explanation ❤

  • @sophiiisticated
    @sophiiisticated 2 years ago

    I'd recommend leaving 30 seconds instead of 10 for the end credits, because the suggested next video (bottom left) covers the screen during the summary.

  • @Marcus-yc3ib
    @Marcus-yc3ib 2 months ago

    Thank you very much.

  • @seattle_bach
    @seattle_bach 2 years ago

    great explanation!

  • @willl0014
    @willl0014 2 years ago

    So much knowledge!!!

  • @axa993
    @axa993 2 years ago

    Awesome overview.

  • @thisisnotok2100
    @thisisnotok2100 2 years ago +1

    yeah I freakin love this channel

  • @caseyspaulding
    @caseyspaulding 1 year ago

    Thank you!

  • @jiankuang9890
    @jiankuang9890 4 months ago

    Question regarding the books: are volume 1 and volume 2 two separate editions of the same content, or the same edition with different content? Put another way: should I buy both volumes or just volume 2?

  • @pdteach
    @pdteach 2 years ago +1

    Simply best

  • @hamsalekhavenkatesh3440
    @hamsalekhavenkatesh3440 2 years ago +1

    Amazing!

  • @dmytrosolovei6025
    @dmytrosolovei6025 2 years ago

    Love your videos!

  • @javisartdesign
    @javisartdesign 2 years ago

    very well explained. Thanks

  • @DarknessGu1deMe
    @DarknessGu1deMe 2 years ago

    What would be a good example of an application benefiting from "fast write slow read" property of NoSQL DB? Based on what's presented, I'd say most user-facing application, like a typical service a startup would build (e.g. personal calendar organization, etc) doesn't sound like a good fit given reads are pretty important in user-facing traffic.

  • @allisonmachado
    @allisonmachado 2 years ago

    awesome video indeed! thank you

  • @raviv5109
    @raviv5109 2 years ago

    Thank you so much!

  • @junkahoolik
    @junkahoolik 2 years ago

    relational databases don't use btrees, they use b+ trees. the only db i know of that uses btrees is actually mongodb.

  • @paddyd7642
    @paddyd7642 2 years ago +1

    Thank you! When you say sorted, is it by some object id or time?

    • @big0bad0brad
      @big0bad0brad 2 years ago +2

      Object ID, unless you design the system to use a high precision timestamp as the id, which could maybe be an interesting idea. If the Object ID is a timestamp, then object creations are already sorted which could further boost write performance, though there is probably no advantage in the case of updates or deletes.

  • @noahgsolomon
    @noahgsolomon 1 year ago

    awesome explanation

  • @stackunderflow5951
    @stackunderflow5951 2 years ago

    The explanation of why SSTables have to be sorted is not sufficient. In my opinion, the reason is that keeping the SSTables sorted allows faster merging and allows the index for the table to be sparse. It doesn't really allow faster retrieval, since running a binary search on disk is not so efficient.
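
To illustrate the merging point, here is a sketch that merges sorted tables in one sequential pass with `heapq.merge` from the Python standard library. The `merge_sstables` function and the in-memory list representation are assumptions for illustration; each table is expected to be sorted with unique keys, and tables are passed oldest first.

```python
import heapq


def merge_sstables(tables):
    """Merge sorted (key, value) tables, oldest first; the newest value wins."""
    # Tag each record with a negative age so that, for equal keys,
    # the record from the newest table sorts first.
    tagged = [
        [(key, -age, value) for key, value in table]
        for age, table in enumerate(tables)
    ]
    merged = []
    for key, _, value in heapq.merge(*tagged):
        # The first record seen for a key is the newest; skip older duplicates.
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged
```

Because every input is already sorted, the merge only ever looks at the head of each table, so it can stream from disk without loading or re-sorting whole files.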

  • @adamaiken00
    @adamaiken00 1 year ago

    This is a great video. If I want to go further, is there any good reference for NoSQL?

  • @jhonsen9842
    @jhonsen9842 1 year ago

    Gold standard videos.

  • @mohawkgwai
    @mohawkgwai 2 years ago

    Cassandra also has Leveled Compaction Strategy so that slide comparing it to RocksDB is a little misleading

    • @raphaelcarvalho4288
      @raphaelcarvalho4288 2 years ago

      Cassandra initially had size tiered only and later borrowed leveled from RocksDB to solve the space amplification problem, so it's not completely misleading.

  • @geck1204
    @geck1204 2 years ago

    Wow this was great

  • @AmrishPandey
    @AmrishPandey 2 years ago

    This is an amazing video

  • @Chauhannitin
    @Chauhannitin 2 years ago

    Very good animation

  • @RandomNullpointer
    @RandomNullpointer 7 months ago

    Any idea how Domino databases work in this regard? Do they follow the same concepts? Is the compaction process "Tiering" or "Leveling"?

  • @wingforce8530
    @wingforce8530 1 year ago

    Actually, many modern SQL are also LSM Tree based, it's not limited to no-sql

  • @wave9303
    @wave9303 1 year ago

    Hi, can you do a video on how Splunk is used for DevOps and how it stores its data?

  • @chaoluncai4300
    @chaoluncai4300 2 years ago

    This is brilliant! I'm also wondering whether this level of knowledge of an advanced DS is enough for a tech/system design interview. Obviously I think the interviewer won't ask for an implementation, so I guess I'm trying to find out how much deeper we need to go than, e.g., this channel's few-minute videos?

  • @abhijit-sarkar
    @abhijit-sarkar 1 year ago

    How is a key-value actually stored and retrieved from disk? Although a SSTable is sorted within itself, since the keys in different SSTables are not related, how does a read request find a key? Surely not by trying all SSTables one by one, that'd be too slow.

  • @ethanmye-rs
    @ethanmye-rs 2 years ago

    Thanks! One of the things I find difficult to find good information on is structuring data: given that I want these x properties, how do I arrange the information to get them, and what technologies are required to do it?