System Design Interview - Top K Problem (Heavy Hitters)

  • Published Aug 20, 2024
  • Please check out my other video courses here: www.systemdesi...
    Topics mentioned in the video:
    - Stream and batch processing data pipelines.
    - Count-min sketch data structure.
    - MapReduce paradigm.
    - Various applications of the top k problem solution (Google/Twitter/YouTube trends, popular products, volatile stocks, DDoS attack prevention).
    Merge N sorted lists problem: leetcode.com/p...
    Inspired by the following interview questions:
    Amazon (www.careercup....)
    Facebook (www.careercup....)
    Google (www.careercup....)
    LinkedIn (www.careercup....)
    Twitter (www.careercup....)
    Yahoo (www.careercup....)

Comments • 646

  • @DickWu1111
    @DickWu1111 3 ปีที่แล้ว +448

    Jesus christ this guy's material is amazing... and each video is so compact. He basically never wastes a single word....

    • @antonfeng1434
      @antonfeng1434 2 ปีที่แล้ว +19

      I have to pause or rewind constantly, and watch every video twice to digest it.

    • @amitdubey9201
      @amitdubey9201 2 ปีที่แล้ว

      @@antonfeng1434 me too

    • @xordux7
      @xordux7 2 ปีที่แล้ว

      @@antonfeng1434 Same here

    • @jasonl412
      @jasonl412 4 หลายเดือนก่อน +1

      @@xordux7 Same here

  • @saurabhmaurya94
    @saurabhmaurya94 4 ปีที่แล้ว +605

    A summary of questions and answers asked in the comments below.
    1. Can we use hash maps but flush their content (after converting to a heap) to storage every few seconds instead of using CMS?
    For small scale it is totally fine to use hash maps. When scale grows, hash maps may become too big (use a lot of memory). To prevent this we can partition the data, so that only a subset of all the data comes to a Fast Processor service host. But that complicates the architecture. The beauty of CMS is that it consumes a limited (predefined) amount of memory and there is no need to partition the data. The drawback of CMS is that it calculates counts approximately. Tradeoffs, tradeoffs...
    2. How do we store count-min sketch and heap into database? Like how to design the table schema?
    A heap is just a one-dimensional array and a count-min sketch is a two-dimensional array, meaning that both can easily be serialized into a byte array, using either a language-native serialization API or a well-regarded serialization framework (Protocol Buffers, Thrift, Avro). We can store them in that form in the database.
    3. Count-min sketch is to save memory, but we still have n log k time to get top k, right?
    Correct. It is n log k (for the heap) + k log k (for sorting the final list). N is typically much larger than k, so n log k is the dominant term.
    4. If count-min sketch is only used for 1 min count, why wouldn't we directly use a hash table to count? After all the size of data set won't grow infinitely.
    For small to medium scale, a hash table solution may work just fine. But keep in mind that if we try to create a service that needs to find top K lists for many different scenarios, there may be many such hash tables and it will not scale well. For example, top K lists for the most liked/disliked videos, most watched (based on time) videos, most commented videos, videos with the highest number of exceptions during opening, etc. Similar statistics may be calculated at the channel level, per country/region and so on. Long story short, there may be many different top K lists we may need to calculate with our service.
    5. How to merge two top k lists of one hour to obtain top k for two hours?
    We need to sum up values for the same identifiers. In other words we sum up views for the same videos from both lists. And take the top K of the merged list (either by sorting or using a Heap). [This won't necessarily be a 100% accurate result though]
    6. How does count-min sketch work when there are different scenarios like you mentioned... most liked/disliked videos. Do we need to build multiple sketches? Do we need a designated hash for each of these categories? Either way, they need more memory, just like hash tables.
    Correct. We need a separate sketch to count each event type: video views, likes, dislikes, comment submissions, etc.
    7. Regarding the slow path, I am confused by the data partitioner. Can we remove the first Distributed Messaging System and the data partitioner? The API gateway would send messages directly to the 2nd Distributed Messaging System based on its partitions. For example, the API gateway would send all B messages to partition 1, all A messages to partition 2 and all C messages to partition 3. Why do we need the first Distributed Messaging System and data partitioner? If we use Kafka as the Distributed Messaging System, we can just create a topic for a set of message types.
    At large scale (e.g. YouTube scale), the API Gateway cluster will be processing a lot of requests. I assume these are thousands or even tens of thousands of CPU-heavy machines, with the main goal of serving video content and doing as little "other" work as possible. On such machines we usually want to avoid any heavy aggregation or logic. The simplest thing we can do is to batch video view requests together without doing any aggregation at all: create a single message that contains something like {A = 1, B = 1, C = 1} and send it for further processing. In the option you mentioned we still need to aggregate on the API Gateway side. We cannot afford to send a single message to the second DMS per video view request, due to the high scale. I mean we cannot have three messages like {A = 1}, {B = 1}, {C = 1}. As mentioned in the video, we want to decrease the request rate at every next stage.
    8. I have a question regarding the fast path through, it seems like you store the aggregated count min sketch in the storage system, but is that enough to calculate the top k? I felt like we would need to have a list of the websites and maintain a size k heap somewhere to figure out the top k.
    You are correct. In the Fast Processor we always keep two data structures: a count-min sketch and a heap. We use the count-min sketch to count, while the heap stores the top-k list. In the Storage service we may keep both, or the heap only. But the heap is always present.
    9. So in summary, we still need to store the keys...count-min sketch helps achieve savings by not having to maintain counts for keys individually...when one has to find the top k elements, one has to iterate thru every single key and use count-min sketch to find the top k elements...is this understanding accurate?
    We need to store the keys, but only K of them (or a bit more). Not all.
    When every key comes, we do the following:
    - Add it to the count-min sketch.
    - Get key count from the count-min sketch.
    - Check if the current key is in the heap. If it is present in the heap, we update its count value there. If it is not present, we check whether the heap is already full. If not full, we add this key to the heap. If the heap is full, we take the minimal heap element and compare its value with the current key's count. If the current key's count is greater, we remove the minimal element and add the current key.
    This way we only keep a predefined number of keys. This guarantees that we never exceed the memory, as both the count-min sketch and the heap have a limited size (see the sketch below).
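    A minimal Python sketch of the per-event procedure from point 9 (the hash scheme, sizes and sample data are illustrative assumptions, not taken from the video):
    ```python
    import hashlib
    import heapq


    class CountMinSketch:
        """A width x depth grid of counters; memory is fixed regardless of how many keys arrive."""

        def __init__(self, width=2048, depth=5):
            self.width, self.depth = width, depth
            self.table = [[0] * width for _ in range(depth)]

        def _buckets(self, key):
            # One illustrative way to derive `depth` independent hashes of the key.
            for row in range(self.depth):
                h = int(hashlib.md5(f"{row}:{key}".encode()).hexdigest(), 16)
                yield row, h % self.width

        def add(self, key, count=1):
            for row, col in self._buckets(key):
                self.table[row][col] += count

        def estimate(self, key):
            # Minimum over rows: may overestimate (collisions), never underestimates.
            return min(self.table[row][col] for row, col in self._buckets(key))


    class TopK:
        """Keeps at most k keys with their latest estimated counts (the procedure in point 9)."""

        def __init__(self, k):
            self.k = k
            self.counts = {}  # key -> estimated count, at most k entries

        def offer(self, key, count):
            if key in self.counts or len(self.counts) < self.k:
                self.counts[key] = count
                return
            min_key = min(self.counts, key=self.counts.get)
            if count > self.counts[min_key]:  # replace the current minimal element
                del self.counts[min_key]
                self.counts[key] = count

        def top(self):
            return heapq.nlargest(self.k, self.counts.items(), key=lambda kv: kv[1])


    # Per-event flow: add to the sketch, read the estimate, offer it to the top-k set.
    cms, topk = CountMinSketch(), TopK(k=3)
    for video in ["A", "B", "A", "C", "A", "B", "D"]:
        cms.add(video)
        topk.offer(video, cms.estimate(video))
    print(topk.top())  # [('A', 3), ('B', 2), ('C', 1)]
    ```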
    Video Notes by Hemant Sethi: tinyurl.com/qqkp274

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว +94

      Hi Saurabh. This is amazing! Thank you for collecting all these questions and answers in one place. I would like to find time to do something like this for other videos as well.
      I have pinned this comment to be at the top. Thank you once again!

    • @ananths5905
      @ananths5905 4 ปีที่แล้ว +1

      Thanks a lot for this Saurabh!

    • @saiprajeeth
      @saiprajeeth 4 ปีที่แล้ว +6

      Need more people like you. Thank you

    • @saurabhmaurya94
      @saurabhmaurya94 3 ปีที่แล้ว

      @@atibhiagrawal6460 glad it's helpful!
      Might be worth posting a link to your notes in a standalone comment too so that everyone can see it

    • @atibhiagrawal6460
      @atibhiagrawal6460 3 ปีที่แล้ว

      @@saurabhmaurya94 That is good idea ! Thank you :D

  • @alexbordon8886
    @alexbordon8886 3 ปีที่แล้ว +18

    Your accent was hard to understand initially, but now I have fallen in love with it.

  • @souravmojumder5563
    @souravmojumder5563 5 ปีที่แล้ว +83

    This is one of the best system design videos I have come across in a long time... keep up the good work!

  • @lysialee2897
    @lysialee2897 3 ปีที่แล้ว +20

    i feel bad that im not paying for this video! the quality is beyond amazing

    • @aman1893_
      @aman1893_ 2 ปีที่แล้ว +1

      You shouldn't feel bad. With this much knowledge, he must be getting at least $500k+ in his current job. And by now he must be looking beyond money and toward making a meaningful contribution to society.

    • @ST-pq4dx
      @ST-pq4dx 2 หลายเดือนก่อน

      He is staff at stripe, 1M plus easy. He is just sharing his knowledge

  • @itdepends5906
    @itdepends5906 ปีที่แล้ว +2

    THIS GUY is SO COOL. Who else feels that when he's speaking, he explains difficult concepts in the most concise way possible - and also touches on what we really need to hear about?!

  • @sumedhabirajdar
    @sumedhabirajdar 2 หลายเดือนก่อน +1

    Most system design interview answers contain high-level design decisions. These videos explain how the data flows, which is something I needed.

  • @balajipattabhiraman
    @balajipattabhiraman 2 ปีที่แล้ว +9

    As luck would have it, I had a similar question in a make-or-break round at Google and I nailed it, since I had watched this several times over before the interview. Got an L6 role offer at Google. Thanks for making my dream come true.

  • @MrAithal29688
    @MrAithal29688 10 หลายเดือนก่อน +3

    All videos on this channel are the best on YT in this category, even to this date. You can find many other channels which may give similar material divided into more than 5 videos with a lot of fluff. Mikhail's video touches upon every important part without beating around the bush and also gives great pointers for identifying what the interviewer may be looking for. Kudos to all the videos on this channel!

  • @soulysouly7253
    @soulysouly7253 8 หลายเดือนก่อน +1

    I'm devastated.
    I just got out of a last round interview, it was my first time ever being asked a system design question.
    I used this channel, among others, to study, and this video is the ONLY video I didn't have time to watch.
    My interview question was exactly this, word for word.
    I made up a functional and relatively scalable solution on the fly, and the interview felt conversational + it lasted 10 minutes more than it should have, so I think I did alright, but I still struggled a lot in the beginning and needed some help.
    Life is cruel sometimes.

  • @pulkitb4Mv
    @pulkitb4Mv 3 ปีที่แล้ว +6

    I love Mikhail's content, the video is so interactive that it looks like he is talking to you and he knows what is going inside your head :)

  • @HarkiratSaluja
    @HarkiratSaluja 3 ปีที่แล้ว +4

    How can someone even downvote this? This is just so amazing. Have not learnt so much in 30 minutes in my whole life.

  • @stefanlyew8821
    @stefanlyew8821 5 ปีที่แล้ว +26

    one of the best technical discussions I have seen

  • @deathbombs
    @deathbombs 3 ปีที่แล้ว +5

    19:05 slow path
    22:00 faster than map reduce but more accurate than countmin
    22:43 fast path
    25:38 Data partitioner is basically kafka that reads message(logs, processed logs w counts,etc..) and stores them to topics

  • @graysonchao9767
    @graysonchao9767 2 ปีที่แล้ว +2

    Great work. I am a senior engineer at a big tech company and I'm still learning a lot from your videos.

  • @admiringrubin2910
    @admiringrubin2910 2 ปีที่แล้ว +2

    couldn't solve this problem in an interview. found this gem of a video a month after. will get them next time!

  • @andrepinto7895
    @andrepinto7895 2 ปีที่แล้ว +10

    It is not enough to send the count min sketch matrix to storage only, you also need to send a list of all the event types that were processed, otherwise you have no way of moving from the matrix data to the actual values (before hashing). The only advantage over the map solution is that you don't need to keep all of it in memory at once, you can stream it as you go from disk for example.
    Calculating the min for each key is O(H), where H is the number of hash functions, and you need to do that for all E event types, so O(E*H). Then you use the priority queue to get the top K, O(E*log(K)), so the total time complexity is O(E*(H + log(K))).
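    A small sketch of that read path, assuming some estimate(key) function backed by the sketch (a plain dict stands in for it here):
    ```python
    import heapq

    def top_k_from_sketch(event_types, estimate, k):
        """Given the list of observed event types and a count estimator
        (e.g. a count-min sketch lookup), return the top-k (count, key) pairs.
        One estimate per key (O(H) hashes each) plus a bounded heap: O(E*(H + log k))."""
        return heapq.nlargest(k, ((estimate(key), key) for key in event_types))

    # Stand-in estimator: a plain dict instead of a real sketch, for illustration only.
    counts = {"A": 120, "B": 87, "C": 87, "D": 15}
    print(top_k_from_sketch(counts.keys(), counts.get, k=2))  # [(120, 'A'), (87, 'C')]
    ```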

    • @xiaopeiyi
      @xiaopeiyi ปีที่แล้ว +1

      Well, you are right. But I think the video is more about a general design for a single event type. We can then build on it based on the functional requirements.

  • @NikhilSharad
    @NikhilSharad 3 ปีที่แล้ว +18

    You're amazing, by far the most detailed and deeply analysed solution I've seen on any design channel. Please never stop making videos.

  • @supremepancakes4388
    @supremepancakes4388 4 ปีที่แล้ว +23

    I wish all system design interview tutorials were like yours, with so much information precisely and carefully explained in a clear manner, with different trade-offs and topics to discuss with interviewers along the way! Thank you so much

  • @sshks10
    @sshks10 3 ปีที่แล้ว +36

    Very clean explanation, which is rare nowadays, why did you stop ? It would be nice to see your new videos , good luck man!

    • @SharanyaVRaju
      @SharanyaVRaju 3 ปีที่แล้ว +1

      I agree. Can you please continue doing this?

    • @freeman-uq8xr
      @freeman-uq8xr 3 ปีที่แล้ว +1

      PLEASE MAKE MORE VIDEOS. WE WILL PAY FOR IT (ADD JOIN BUTTON)!

  • @radsimu
    @radsimu ปีที่แล้ว +4

    I think it is admirable that you explained all the inner workings. In a real interview you can probably skip the single-host solution with the heap; that's good for an explanation on YouTube. What I think is more valuable is to also propose some actual technologies for the various components, to make it clear that you are not proposing building this from scratch. I'm surprised that Kafka Streams was not mentioned. Also, for the slow path, it is worth discussing the option of storing the raw or pre-aggregated requests in an OLAP db like Redshift. The OLAP store can do the top k efficiently for you with a simple SQL query (all the MapReduce magic is handled under the hood), can act as the main storage, and will also give you flexibility for other analytics queries. It integrates directly with various dashboarding products, and one rarely wants to do just top k.
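    For illustration, the "simple SQL query" boils down to a GROUP BY with ORDER BY ... LIMIT. SQLite stands in for the OLAP store here, and the table and column names are made up:
    ```python
    import sqlite3

    # Toy stand-in for the OLAP store; schema and data are invented for this example.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE video_views (video_id TEXT, viewed_at TEXT)")
    db.executemany("INSERT INTO video_views VALUES (?, ?)",
                   [("A", "2024-08-20T10:01"), ("A", "2024-08-20T10:05"),
                    ("B", "2024-08-20T10:02"), ("C", "2024-08-20T10:03"),
                    ("B", "2024-08-20T10:30"), ("A", "2024-08-20T10:59"),
                    ("B", "2024-08-20T11:10")])

    # Top-k most-viewed videos in one hour, computed entirely by the database.
    top_k = db.execute("""
        SELECT video_id, COUNT(*) AS views
        FROM video_views
        WHERE viewed_at >= '2024-08-20T10:00' AND viewed_at < '2024-08-20T11:00'
        GROUP BY video_id
        ORDER BY views DESC
        LIMIT 2
    """).fetchall()
    print(top_k)  # [('A', 3), ('B', 2)]
    ```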

  • @jianlyu700
    @jianlyu700 3 หลายเดือนก่อน

    OMG, this is still the best system design video i've ever seen. it's not only for interview, but also for actual system solution design.

  • @niosocket1
    @niosocket1 4 ปีที่แล้ว +13

    So funny, found this channel yesterday and watched this video and been asked pretty much same question at my interview at LinkedIn today. Thanks a lot.

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว +2

      Funny, indeed )) This world is so small ))
      Thanks for sharing!

    • @niosocket1
      @niosocket1 4 ปีที่แล้ว +5

      Actually got an offer from Amazon, LinkedIn, Roku and probably Google as well. A lot of it because of this channel. Can’t recommend it enough! Thanks again!

    • @HieuNguyen-ty7vw
      @HieuNguyen-ty7vw 4 ปีที่แล้ว

      I was asked this same question at my interview last Friday and found out your video today :( Didn't nail it though, hope I can do better next time. Thank you Mikhail, hope you can spend time to create more video like this.

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว

      Wow, Sergey. You rock!
      And thank you for the praise.

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว +2

      Time will come, Hugh. Just keep pushing!

  • @sevenb1t
    @sevenb1t 3 ปีที่แล้ว +4

    I had an interview step with AWS a couple of days ago and they asked me exactly this question. Thank you for your videos.

  • @abysswatcher4907
    @abysswatcher4907 2 ปีที่แล้ว +3

    For people wondering why the heap complexity is O(n log(k)) for single-host top k: we do a simple optimization and pop the least frequent item once the heap size reaches k, so we have n operations each taking on the order of log(k).
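    A minimal sketch of that optimization, assuming the per-key frequencies have already been counted into a map:
    ```python
    import heapq
    from collections import Counter

    def top_k(frequencies, k):
        """Keep a min-heap of at most k (count, key) pairs; O(n log k) overall."""
        heap = []
        for key, count in frequencies.items():
            if len(heap) < k:
                heapq.heappush(heap, (count, key))
            elif count > heap[0][0]:              # beats the current minimum
                heapq.heapreplace(heap, (count, key))
        return sorted(heap, reverse=True)         # final k log k sort

    views = Counter(["A", "B", "A", "C", "A", "B", "D", "E", "E", "E"])
    print(top_k(views, k=2))  # [(3, 'E'), (3, 'A')]
    ```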

  • @FrequencyModulator
    @FrequencyModulator 4 ปีที่แล้ว +13

    This is the best explanation on system design I've ever seen. Thanks Mikhail, that helps A LOT!

  • @atabhatti6010
    @atabhatti6010 2 ปีที่แล้ว +4

    Excellent video. A key thing that you did at the end (and is very useful IMHO) is that you identified many other interview questions that are really the same problem in disguise. That is very good thinking that we all probably need to learn and develop. I encourage you to do that in your other design solutions as well. Thank you for another excellent video.

  • @VageeshUdupa
    @VageeshUdupa 2 ปีที่แล้ว +1

    Thank you very much!! I had gone over all your videos multiple times to understand it well. I had 2 interviews with FAANG in the last week and was offered a job in both! I have to say a lot of the credit goes to you!

  • @fasolplanetarium
    @fasolplanetarium 2 ปีที่แล้ว +8

    PLEASE come back and make videos again. There's no resource quite like this channel.

  • @HieuNguyen-ty7vw
    @HieuNguyen-ty7vw 4 ปีที่แล้ว +7

    The best system design answer I have seen on YouTube. Thank you!

  • @joyshu6264
    @joyshu6264 4 ปีที่แล้ว +9

    Hands down the best system design videos so far!! And I have watched lots of system design videos. Love how you start simple and work all the way up to a complex structure, and how it can apply to different situations.

  • @coolgoose8555
    @coolgoose8555 4 ปีที่แล้ว +1

    Ohhhh, why did I not find this channel before... The way you approach the problem and take it forward makes it so easy; otherwise the realm of system design concepts is huge... We need more videos like this... This is the design pattern of system design... Good job!!!!

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว

      Glad to have you aboard, coolgoose8555! Thank you for the feedback!

  • @SystemDesignInterview
    @SystemDesignInterview  5 ปีที่แล้ว +3

    For some reason YouTube hides valid comments. I can only see such comments in the inbox, but there is no way for me to reply. Let me re-post such comments on behalf of the people who submitted them.
    From @Howell Pan
    so essentially we need to maintain a series of top-K lists (heap, map) using count-min, each representing one minute (or a few minutes or an hour, depending on the granularity required). Then we can aggregate/merge these result sets in order to get the top K for the last minute, last hour, last day, last week, month etc.. is this the idea? I think there is an edge case where some key could have a count just below the top K for many time slots in a row, but added together it could make the top K. So we can't just maintain the top K for each list, we need to maintain the top (K+N) to handle this case, right?

    • @SystemDesignInterview
      @SystemDesignInterview  5 ปีที่แล้ว +1

      Yes, Howell, you got the idea correctly. And you are right, there is this edge case you mentioned. You actually explained what I asked in the video. To provide an example of why we need the whole data set to calculate top K list precisely (well done!). When we merge top K lists we may lose data. Keeping top (K + N) can help to reduce the total error, but will not help to eliminate the problem completely. That is why we rely on MapReduce jobs.

    • @Tony-cy2yr
      @Tony-cy2yr 4 ปีที่แล้ว

      @@SystemDesignInterview Will this N be the number of collisions? Will keeping the top (K + N) show more accurate results at the cost of including false positives? (Say we request the top 2 among A=3, B=2, c=1, d=1, and c & d have a collision causing the combined count-min sketch table to have c=2, and we return the top (2 + 1))

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว +1

      Keeping (K + N) elements makes the system more accurate. As you pointed out in your example, the same element (C in your case) may appear on several different machines. C may not be among top K on each of the machines. But it may be among top (K + N), and eventually appear among top K on the Storage machine.
      What this N should be? I do not know. Meaning that I have not seen any papers that describe how to properly choose the value of N, to guarantee the desired error.
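      To illustrate the merge being discussed (a sketch only; it assumes each machine ships its local top (K + N) list as a key -> count map, and the result is still approximate because anything outside a machine's local list is lost):
      ```python
      import heapq
      from collections import Counter

      def merge_local_top_lists(local_lists, k):
          """Sum counts for identical keys across machines, then take the global top k."""
          total = Counter()
          for local in local_lists:          # each local list: {key: count}
              total.update(local)
          return heapq.nlargest(k, total.items(), key=lambda kv: kv[1])

      machine_1 = {"A": 10, "C": 7, "B": 5}   # local top (K + N) from machine 1
      machine_2 = {"D": 9, "C": 6, "E": 4}    # local top (K + N) from machine 2
      # C is not #1 on either machine, but wins globally once the counts are summed.
      print(merge_local_top_lists([machine_1, machine_2], k=2))  # [('C', 13), ('A', 10)]
      ```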

  • @JR-zz8bw
    @JR-zz8bw 2 ปีที่แล้ว +1

    I think the open question on this video is how the fast path stores and retrieves data. It's not really answered clearly in any of the comments I could find.
    It seems like we are missing an "aggregator" component, which combines the count-mins/heaps from all the fast processors. The video seems to imply we'd have a single count-sketch / heap per time interval. But this will put a huge contention on the database - every fast processor will have to lock the count-sketch and heap, add its local count-sketch/update heap, and store it back. So we will have a large contention on the DB. In addition, like others pointed out, we need the list of all video IDs to do this - so we can rebuild the heap. But that becomes impractical at large volumes.
    Only things I can think of are :
    1) Each fast processor stores its heap into the db (local heap) for the time interval. On query, we aggregate all the local heaps for the interval and build a new global top K heap. The query component can then store this in a cache like redis, so it doesn't need to be recalculated. This approach however requires we partition by video_id all views that are sent to the fast processor. Otherwise we can't accurately merge the local Ks. The problem with this, though, is we can get hot videos and those video counts will be handled entirely by a single processor.
    2) Use a DB with built in topK support, like Redis. In this case, we don't need to partition views at all and can balance across all fast processor. Each fast processor then stores a local map of video counts for a short period of time (like 5s), and periodically flushes all the counts to Redis. Redis takes care of storing topK in its own probabilistic data structure. Redis should be able to handle 10k RPS like this. If we need to scale it further, then we have to partition Redis on video_id, for example. And again, our query component will have to aggregate on read all the partitioned local topKs and merge sort them.

    • @abhijit-sarkar
      @abhijit-sarkar 11 หลายเดือนก่อน

      For option 1, if fast processors sends their local top-k to the aggregator, that should be enough to calculate global top-k for 1-minute. I don't think there's any need to send CMS to the aggregator. The aggregator creates 1-minute top-k by merging the local heaps, and the query service can simply read the value.

  • @karthikmucheli7930
    @karthikmucheli7930 4 ปีที่แล้ว +6

    OMG. I love these videos. Thank you so much for creating these. Please write a book or open a course, it may fund you to focus much time on very helpful content like this. I am very happy today.

  • @supremepancakes4388
    @supremepancakes4388 4 ปีที่แล้ว +4

    I got an offer from an interview I did next day after binging all your videos (looking forward to your distributed counter video!) on top of studying and reviewing all my previous notes on networking and algo. This really bridges a gap of knowledge for some of us here who had some experience in specific areas but don't have enough to put a whole system together or think about it this way, and when i used yours as part of my review material I always found myself feeling mentally prepared and confident to be in the driver's seat!

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว +5

      Hi SupremePancakes. Really glad for you! Thanks for sharing. Always nice to hear feedback like this!
      Even though you passed the interview already, please come back to the channel from time to time. I want this channel not only to help with interviews but, even more importantly, to help improve the system design skills you use in your daily job.
      Helping someone to become a better engineer is what makes this all worthwhile for me.

    • @supremepancakes4388
      @supremepancakes4388 4 ปีที่แล้ว +1

      System Design Interview Of course!!! I look forward to more videos and how this channel grows

    • @tejvepa8521
      @tejvepa8521 4 ปีที่แล้ว

      System Design Interview In the fast path, how is the heap constructed from the count-min sketch table?

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว

      Hi Tej. Please take a look at this comment and let me know if more details are needed: th-cam.com/video/kx-XDoPjoHw/w-d-xo.html&lc=UgzcpyPR8nmCoaxTV3Z4AaABAg.8xFD1xe1cgU91u3EpZgosP

  • @OKJazzBro
    @OKJazzBro ปีที่แล้ว +3

    This is an excellent video, but I am left with these questions:
    1. Count min-sketch does not really keep track of video IDs in its cells. Each cell in the table could be from several collisions from different videos. So once we have our final aggregated min-sketch table, we pick the top k frequencies, but we can't tell which video ID each cell corresponds to. So how would it work? I haven't come up with an answer for this.
    2. What would be type of database used to store the top k lists?
    I would just use a simple MySql database since the number of rows would not be very large if we have to retain top k lists for a short window of time (say for 1 week) and k is not too big. We can always add new instances of the db for each week of data if we need preserve data for older weeks. We would have to create an index for the time range column to efficiently search.

  • @SystemDesignInterview
    @SystemDesignInterview  5 ปีที่แล้ว +6

    For some reason YouTube hides valid comments. I can only see such comments in the inbox, but there is no way for me to reply. Let me re-post such comments on behalf of the people who submitted them.
    From @sachinjp
    Very detailed and in-depth technical video on system design. Thanks for putting so much effort into this.

    • @SystemDesignInterview
      @SystemDesignInterview  5 ปีที่แล้ว +1

      Thank you for the feedback, @sachinjp. Hopefully this message will find you.

  • @arvind30
    @arvind30 3 ปีที่แล้ว +10

    One of the best system design channels I've come across! Great job! I particularly liked how you were able to describe a fundamental pattern that can be applied in multiple scenarios.

  • @sabaamanollahi5901
    @sabaamanollahi5901 ปีที่แล้ว +1

    I wish I could give this video a thousand likes instead of just 1 !!! these contents are fantastic!!!

  • @gameonline6769
    @gameonline6769 3 ปีที่แล้ว +1

    Thanks Mikhail. I can bet... this is the best channel on YouTube. Just binge-watch all the videos from this channel and you will learn so much.

  • @anderswb
    @anderswb 2 ปีที่แล้ว +2

    These are by far the best videos on system design for interviews. Thanks a lot for taking the time to make and publish these!

  • @alexanderyakovlev9311
    @alexanderyakovlev9311 4 ปีที่แล้ว +3

    This is yet another great System Design video in this channel! I have two thoughts that might help improve the solution: 1. The question of "top K frequent elements" does not require us to sort those top K elements, thus we can use "Quick Select" algorithm merely to find the kth element. The point is after we find the kth element using Quick Select, the array is partitioned such that the top K elements are in the first K positions (but not sorted). This gives the answer in log(n) time, which is a reduction from nlog(k); 2. When you really have a huge amount of data and counts to handle, why not partition the data simply using round-robin for each key? This way, each partition contains (about) the same data so we only need to calculate the result from one partition only. With this approach, we may consider all other partitions 'virtual' or imaginary (without actually using server nodes) so we save the design cost. What do you think?

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว +8

      Hi Alexander. Thank you for the feedback and great questions!
      Here are some of my thoughts:
      - Quick Select has O(n) best and average case time complexity. O(n*n) in the worst case. You are correct that it still may be a bit faster on the fixed-sized list of size n. But I cannot say the same for a streaming data, when new events keep coming and we need to calculate/update top K list when every new event arrives. Heap guarantees log(k) time complexity. Running Quick Select on already partially sorted array should be around the same time, but I cannot say what is guaranteed worst-case complexity in this case.
      - I believe when you say round-robin you mean hash-based, right? So that events for the same video always go to the same partition. Because a "classic" round-robin means "choose the next one in a sequence of machines", which may mean that events for the same video may go to different partitions. So, if you mean hash-based, you are correct, we can use this approach.
      Two notes, though
      a. Hash-based partitioning may lead to the "hot partitions" problem. I mention this in the video, and talk about it in a bit more detail in the latest (step by step interview guide) video.
      b. When we use a count-min sketch, we do not need to partition the data at all. Partitioning is needed to guarantee that only a limited amount of data is routed to a particular machine. But because both a count-min sketch and a heap use limited memory (independently of how much data is coming), partitioning is not required. This is true for the fast path only, when we calculate approximate results. To calculate accurate results we need to partition.
      Please keep sharing your thoughts!

  • @tusharahuja205
    @tusharahuja205 3 ปีที่แล้ว +3

    Very clear solution and something that can actually be used in an interview! Please keep making more of these.

  • @ahanjura
    @ahanjura 5 ปีที่แล้ว +4

    The system design video to beat. PERIOD!!!

  • @fredylg
    @fredylg 2 ปีที่แล้ว

    Sir, your videos are gold. I have no interview, but it's rare to find architecture so well explained. Thanks

  • @prateek_jesingh
    @prateek_jesingh 6 หลายเดือนก่อน

    This is one of the best system design videos on this topic I have come across. Thanks & keep up the great work, Mikhail!

  • @biaozhang1643
    @biaozhang1643 ปีที่แล้ว +1

    Wow! This is the best system design review video I've ever seen.

  • @MEhagar
    @MEhagar 3 ปีที่แล้ว +5

    My intuition for why we need the entire data set of video view events in order to calculate the top-k videos in an hour:
    Say k=2, and during the first 1-minute period the top three videos are A with 3 views, B with 3 views, and C with 2 views. In the second 1-minute period, the top three videos are D with 3 views, E with 3 views, and C with 2 views. When computing the top-k videos for a longer period, if we only had the per-minute top-2 data available, we would not have the data for video C, because we only stored data for the top two videos in each minute. However, over the 2-minute period, video C has actually been viewed the most (4 times).

    • @ymdcool
      @ymdcool 2 ปีที่แล้ว

      Good example to the question 11:16

  • @mohitnagpal8025
    @mohitnagpal8025 4 ปีที่แล้ว +1

    I have seen a lot of system design videos, but this content's quality is way above the rest. Really appreciate the effort. Please keep posting new topics. Or you could pick the top k heavy-hitter system design problem requests from the comments :)

  • @HeckscherHH
    @HeckscherHH ปีที่แล้ว +2

    I think your great coverage of the topic shows how well you really know and understand it, compared to other guys who just share what they read last night. Thank you

  • @manasdalai3934
    @manasdalai3934 4 ปีที่แล้ว +3

    This is some of the best system design content I have come across. Thanks a lot.

  • @arunmagar4836
    @arunmagar4836 2 ปีที่แล้ว +1

    These videos are a gem for a system design noob like me.

  • @boombasach
    @boombasach 10 หลายเดือนก่อน

    Among all the materials I have seen on YouTube, this is really the top one. Keep up the good work and thanks for sharing.

  • @drakezen
    @drakezen 5 ปีที่แล้ว +1

    Bonus on mentioning using Spark and Kafka as I was thinking that during the video. Great stuff as usual!

  • @andreytamelo1183
    @andreytamelo1183 3 ปีที่แล้ว +4

    Misha,
    Loved the structure as well as depth and breadth of the topics you touched on!

  • @dc5
    @dc5 3 ปีที่แล้ว +1

    Awesome videos Mikhail... thanks a lot for sharing! That last part showing other problems with similar solutions was the cherry on top.

  • @balajipattabhiraman
    @balajipattabhiraman 2 ปีที่แล้ว +2

    Phenomenal. We do something very similar with hot and cold paths at Microsoft. Instead of a count-min sketch we use HyperLogLog.

  • @abbasraza4991
    @abbasraza4991 4 ปีที่แล้ว +1

    Excellent video! Has depth and breadth that isn’t seen elsewhere. Keep it up!

  • @Gukslaven
    @Gukslaven 3 ปีที่แล้ว +2

    These are the best videos on system design I've seen, thanks so much!

  • @chuka231d8
    @chuka231d8 7 หลายเดือนก่อน

    This is the most tech intense 30min video I've ever seen :) Thank you!

  • @SystemDesignInterview
    @SystemDesignInterview  5 ปีที่แล้ว

    For some reason YouTube hides valid comments. I can only see such comments in the inbox, but there is no way for me to reply. Let me re-post such comments on behalf of the people who submitted them.
    From @Howell Pan
    if we send all the events to a Kafka event server, and on the subscriber end have a cluster of data processing services that aggregate and save the data/counts into a NoSQL db (essentially one billion key-value pairs), while at the same time maintaining/updating the top K in a cache, using the same logic (already in the cache -> update the value; add if we have not yet reached K in total; replace an existing key if a new key has a higher count), the read service will read from the cache directly. Wouldn't this be both fast and accurate?

    • @SystemDesignInterview
      @SystemDesignInterview  5 ปีที่แล้ว

      These all are very good considerations. But there are some challenges we will need to address along the way.
      The first such challenge is "hot" partitions. Kafka consumers will experience uneven load distribution. People search some keywords much more often than others. Or some video goes viral. The "hot partition" problem is much more common in practice than it might seem. And quite complex. That is one of the reasons why we introduced an additional layer - the Partitioner.
      Another reason: data partitioning helps each processor aggregate only a portion of the whole data set, as we cannot aggregate data for the whole data set on a single processor instance (except maybe for a very short period of time). So, the processing service you mentioned will need to take care of this. We need to wisely partition the data either before putting it into Kafka or right after consuming the data.
      Let's say we solved the problems mentioned above and the database contains frequency counts for all the items. How do we calculate the top K list accurately based on this data? The procedure you mentioned, where we keep the top K list in a cache and replace items, only works when all frequency counts are calculated. Why? You actually explained it already in your other comment: some item may never be part of the top K list for any single minute, but the sum over a longer period (e.g. 5 minutes) may allow this item to appear in the final top K list. So, if we want to count the top K list for a day, we first need to wait until the end of that day, accumulate all frequency counts in the database, iterate over the items in the database and calculate the top K list. And I am not really sure how a cache can help us speed things up.
      But I agree that cache can help if accuracy is not important and we can calculate a top K list approximately. Procedure you mentioned will help with this.
      Please let me know if I missed anything.

  • @prashub8707
    @prashub8707 2 ปีที่แล้ว +1

    So far I am loving it. Keeps me glued to ur channel. Fantastic job I must say

  • @DinkarGahoi
    @DinkarGahoi 4 ปีที่แล้ว +1

    This is by far the best content I have found on the System Design. I am addicted to this content.
    Keep up the good work, waiting for more videos .. :)

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว +1

      Glad you enjoy it, Dinkar! Sure, more videos to come. I feel very busy these days. But I try to use whatever time is left to work on more content.

  • @andrewsouthpaw
    @andrewsouthpaw 3 ปีที่แล้ว +2

    9:08 🤯 It's so cool seeing various algorithm problems actually applied to real-world designs!

  • @terigopula
    @terigopula 3 ปีที่แล้ว +2

    Your content is PURE GOLD. Hats off! :)

  • @xiaopeiyi
    @xiaopeiyi ปีที่แล้ว

    It's really helpful. I have already watched each video so many times, and I learned a lot. Initially, I was so frustrated with the accent (I am not a native English speaker either). But now I am okay watching it without CC.

  • @pradyumnakadapa5665
    @pradyumnakadapa5665 3 ปีที่แล้ว +1

    Nicely structured ! covering both depth and breadth of the concepts as much as possible.

  • @anshulgolu123
    @anshulgolu123 3 ปีที่แล้ว +1

    Please do more of them as your videos are very good from a content perspective :) Extremely informative ...

  • @theghostwhowalk
    @theghostwhowalk 3 ปีที่แล้ว +2

    Great video. I request you to cover a couple of popular system design questions when you get a chance: (1) celebrity recommendation on Instagram or song recommendation, (2) real-time coding competition displaying the top 10 winners.

  • @lolista
    @lolista 3 ปีที่แล้ว +1

    watched other videos before this.. so liking this before starting...

  • @shekharbhosale4234
    @shekharbhosale4234 8 หลายเดือนก่อน

    Excellent video and great explanation. One further improvement can be made in the slow processing path: instead of using Hadoop MapReduce, using Apache Spark will save more time, because Spark uses in-memory processing, i.e. it does not store intermediate stage results on HDFS (it keeps them in memory), which makes it faster than Hadoop.
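    For illustration, a minimal PySpark sketch of that slow-path aggregation (the input path and record format below are assumptions):
    ```python
    from pyspark import SparkContext

    sc = SparkContext(appName="top-k-heavy-hitters")

    # Assumed input: one video_id per line in the hourly event dump (path is made up).
    events = sc.textFile("hdfs:///events/2024-08-20/10/*")

    top_k = (events
             .map(lambda video_id: (video_id, 1))
             .reduceByKey(lambda a, b: a + b)            # frequency count
             .takeOrdered(10, key=lambda kv: -kv[1]))    # top 10 by count

    print(top_k)
    sc.stop()
    ```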

  • @AbdulSamadh-lg7lt
    @AbdulSamadh-lg7lt 2 หลายเดือนก่อน +1

    I believe this can explain the reason why we cannot use 60 one-minute stores of top-K elements to create a one-hour store of top-K elements.
    Imagine we only have 4 possible videos to watch on youtube(A, B, C, D) and we want to find top3 viewed videos for every 1-hour window given we have all the top3 video view counts for 60 1-min windows.
    Imagine in the first 59 1-min windows we get the following:
    10 views of A,
    10 views of B,
    10 views of C,
    9 views of D
    And in the 60th window we get the following:
    10 views of A,
    10 views of B,
    5 views of C,
    70 views of D.
    In one hour these were the actual views of the 4 videos:
    views of A = 10*60 = 600
    views of B = 10*60 = 600
    views of C = 10*59 + 5 = 595
    views of D = 9*59 + 70 = 601
    Top3 should return D,A and B
    But if we had only 60 1-min window information these are the views we can count:
    views of A = 10*60 = 600
    views of B = 10*60 = 600
    views of C = 10*59 = 590 (we don't count the 5 from the last 1-min window because the top3 in the last 1-min window was D, A and B)
    views of D = 70 (we don't count 9*59 from the first 59 1-min windows since the top3 there was A, B and C)
    So with this calculation we would incorrectly get A, B, C as the top3 instead of D, A, B.
    Hope this helped.

  • @VarunMittal-viralmutant
    @VarunMittal-viralmutant 2 ปีที่แล้ว +7

    29:48 If someone is wondering, like I was, why merging two 1-hr top-K lists will not give an accurate single 2-hr list, here is the explanation:
    Each top-K list in an hour is based on the data available for that hour only. That means the data is local to the 1-hr window and not cumulative. When we move to the next hour, all the previous data is discarded. So, while a video might be the most watched in some hour X, it might not be watched at all in hour X+1, but it will still be eligible as a candidate while creating the top-K list from 2K elements (2 * 1-hr top-K list elements).
    Simple example:
    X hour top 5 (k=5) = V1(10), V2(9), V3(8), V4(7), V5(6), where V1(10) means Video 1 was watched 10 times. Say a video V6 was watched 5 times, so it could not make it onto the list.
    X+1 hour top 5 = V8(11), V9(10), V10(9), V11(7), V12(6). V6 was watched 5 times again, but could not make it to the list
    But the interesting part is all the top videos V1-V5 in X hour were never watched in X+1 hour. Likewise, V8-V12 were never watched in X hour.
    If we create a 2-hour top-5 list from these two, V6 will not even be considered, even though it was watched a total of 10 times across hours X and X+1, and our final list would be:
    V8(11), V9(10), V1(10), V2(9), V10(9)

    • @vancouverbill
      @vancouverbill 2 ปีที่แล้ว

      Thanks, this is the questions and answer I was looking for

  • @harjos78
    @harjos78 3 ปีที่แล้ว +2

    This is pure Gem!.. Take a bow ....

  • @crescentcompe8289
    @crescentcompe8289 2 ปีที่แล้ว

    amazing, i was like wtf you talking about at the beginning. It all makes sense now after the data retrieval part.

  • @soubhagyasriguddum4983
    @soubhagyasriguddum4983 4 ปีที่แล้ว +1

    this channel has the best System design explanations ... thank you so much and keep up the good work!!

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว

      Thank you for the feedback, Soubhagyasri! Glad you like the channel!

  • @gpranjan
    @gpranjan 3 ปีที่แล้ว

    One of the main reasons for inaccurate results is that events from certain API gateways may be delayed, or arrive late because of congestion. This is the primary motivation for the slow path. Was not obvious till I started thinking

  • @ashwin4911
    @ashwin4911 ปีที่แล้ว +1

    Hi Mikhail, I was going over this video again. I am not clear how count min sketch will save memory. Even if we have a predefined size width and height. We still need to know all the videos like A, B, C, D... so we can calculate the different hash values for them before doing a min operation to find the count. So that means we need to persist this list of videos somewhere for the period of time we are accumulating the counts.

  • @diptikaushik8250
    @diptikaushik8250 3 ปีที่แล้ว +2

    This is just incredible! Please do publish more videos.

  • @artemgrygor476
    @artemgrygor476 5 ปีที่แล้ว +7

    Thank you for such a detailed explanation. Awesome as usual!

  • @rahulsahay19
    @rahulsahay19 ปีที่แล้ว +1

    Awesome. Simply awesome. You killed it completely!

  • @warnercooler4488
    @warnercooler4488 2 ปีที่แล้ว

    The amount of info you have covered here is amazing! Thank you so much!

  • @harishshankar682
    @harishshankar682 4 ปีที่แล้ว +2

    Cant thank you enough for your efforts in sharing such a high quality content for us!

  • @algorithmimplementer415
    @algorithmimplementer415 4 ปีที่แล้ว +4

    Amazing video. Thank you! The way you structured it is commendable.

  • @NoWarForever
    @NoWarForever 3 ปีที่แล้ว +1

    The best that I have seen so far!

  • @reprogram_myself
    @reprogram_myself 3 ปีที่แล้ว +1

    I would use Flink (with Apache Beam or not), which can substitute for the Lambda architecture, since it can handle both batch and stream processing and do precise calculations using windowing. Basically you use windowing for aggregations and triggers to output intermediate results when needed.

  • @sachin_getsgoin
    @sachin_getsgoin 3 ปีที่แล้ว

    Awesome video. The discussion of the various approaches (with code snippets) and their drawbacks is the highlight. Thanks a lot!

  • @user-uj8rr7nn9q
    @user-uj8rr7nn9q 5 ปีที่แล้ว +3

    That's awesome learning material! I hope you can keep publishing new video about system design

  • @saurabhchoudhary9260
    @saurabhchoudhary9260 5 ปีที่แล้ว +7

    Awesome and detailed explanation. Hats off

  • @unfiltered_motivation
    @unfiltered_motivation 3 หลายเดือนก่อน

    Please come back man, the community needs you.

  • @ravitiwari2160
    @ravitiwari2160 3 ปีที่แล้ว +1

    Hey, thank you so much for all your knowledge sharing. I am able to perform very well in all my interviews. Keep up the good work. More power to you.
    Keep rocking!!!

  • @nikhilkumarsaraf3290
    @nikhilkumarsaraf3290 4 ปีที่แล้ว +2

    All your videos are really amazing. I hope you would post it more often.

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว +1

      Thank you, Nikhil. I will surely come back with more regular video postings.

  • @zhouchong90
    @zhouchong90 4 ปีที่แล้ว +1

    For Count Min Sketch, if you use multiple nodes to process then merge, you can only get the global frequency of an item, but not the top-K. Because after you merge the count, all you have is a table of hashed integers, and you don't know what's the original data that produced this hash.
    For this one, I think you can only do a master-slave model, and use a single node to do the counting, and produce top k using a heap.

    • @zhouchong90
      @zhouchong90 4 ปีที่แล้ว

      www.cs.rice.edu/~as143/Papers/topkapi.pdf This is an interesting paper that solves merging the count min sketch matrix from multiple processor nodes.

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว +1

      Hi Chong Zhou. Thank you for the question and the reference to the paper. A good read!
      Please take a look at other comments under this video, we address some of your concerns there. Let me provide a summary:
      - Fast Processor machines keep both a count-min sketch and a heap. The count-min sketch stores the frequencies, while the heap stores the local top K (the original keys).
      - Each Fast Processor machine sends data to a single Storage machine. What data do we send? You are right, we need to send the original data. In other words, this is the data stored in the heap. We may or may not send the count-min sketch as well.
      - To avoid a single point of failure in the form of a single Storage machine, we replicate data across several Storage machines.
      Please let me know if you have more questions. Will be glad to clarify further.
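      To illustrate how partial sketches combine (a sketch only; it assumes both tables were built with the same width, depth and hash functions, so they merge by element-wise addition; the keys themselves still travel separately, e.g. as the heap contents):
      ```python
      def merge_sketches(a, b):
          """Element-wise sum of two count-min sketch tables.
          Only valid if both were built with the same width, depth and hash functions."""
          assert len(a) == len(b) and len(a[0]) == len(b[0])
          return [[x + y for x, y in zip(row_a, row_b)]
                  for row_a, row_b in zip(a, b)]

      # Two tiny 2x4 sketch tables from two Fast Processor hosts (toy numbers).
      host_1 = [[1, 0, 2, 0],
                [0, 3, 0, 0]]
      host_2 = [[0, 1, 1, 0],
                [2, 0, 0, 1]]
      print(merge_sketches(host_1, host_2))
      # [[1, 1, 3, 0], [2, 3, 0, 1]]
      ```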

    • @alirezaamedeo
      @alirezaamedeo 7 หลายเดือนก่อน

      Hi @SystemDesignInterview. I am wondering, why bother with CMS when you want to maintain a heap on a Fast Processor?
      I think it makes sense that we maintain a list of IDs on each node and send them along with the CMS tables to the merger node.
      But I cannot get why a heap would be needed.
      I appreciate your thoughts here.

  • @reprogram_myself
    @reprogram_myself 3 ปีที่แล้ว +1

    and huge thank you for all your videos! They are the best I could find on system design!

  • @ashwin4911
    @ashwin4911 4 ปีที่แล้ว +2

    I was asked this problem during an interview recently and this system design was very helpful. Thanks :).

  • @kevin8918
    @kevin8918 4 ปีที่แล้ว +1

    It will be great to have a system design related to metrics and how to handle percentile calculation

  • @cnanavat
    @cnanavat 5 ปีที่แล้ว +5

    Excellent explanation ! I really appreciate your work!

  • @howellPan
    @howellPan 5 ปีที่แล้ว +6

    In the top-K MapReduce, there is an edge case where a key's count would just miss being in the top K in each of the local top-K lists, but the total value can be within the top K, right? So your local top K needs to be the top (K+N), perhaps? E.g. local 1 has (A: 100, B: 80, C: 70, D: 60), local 2 (E: 110, F: 100, G: 80, D: 60), local 3 (X: 150, Y: 120, Z: 90, D: 60). If you take the top 3, D would be missing from each of the local top 3 lists, but together D has the highest count.

    • @SystemDesignInterview
      @SystemDesignInterview  5 ปีที่แล้ว +7

      I am really glad you see these edge cases! I myself got into the same logical trap.
      After the first MapReduce job calculates frequency counts, data for the same item always go to the same local mapper, it is never split between mappers. Let's take a look at your example.
      After the first MapReduce job (frequency count) we have:
      A 100
      B 80
      C 70
      D 180 (60 + 60 + 60) !!!
      E 110
      F 100
      G 80
      X 150
      Y 120
      Z 90
      Splitter will assign A, B, C to the first mapper. D, E, F - to the second. The rest - to the third mapper. So, value of D will never be split between multiple mappers.
      Once again, kudos to you for asking this insightful question.
      P.S. I have also replied to your other questions. Please check.

    • @yueliang171
      @yueliang171 4 ปีที่แล้ว

      @@SystemDesignInterview Following up on this question: generally, for any MapReduce job, if one key has too many items to fit on a machine, what can we do? I am trying to think of an example... this may not be a good one: an inverted index, if one word has too many pages associated with it. Is this situation real?

    • @SystemDesignInterview
      @SystemDesignInterview  4 ปีที่แล้ว

      Hi Yue Liang!
      It is possible to create a MapReduce job with too many keys landing on a single reducer machine. There are several mechanisms that help to improve the situation. Let me name a few:
      - Combiners: before passing data to the actual reducer we try to reduce on the mapper machine. As a result, more data is aggregated on the mapper side and fewer values are sent to the reducer (see the sketch below).
      - Let me call it a cascade of MapReduce jobs. When we have not a single MapReduce job, but a chain of them. One MapReduce job splits data into regions and calculates per-region output. Another job then calculates global output. And we define how large each region can be.
      - Sampling. When we intentionally drop some data on the floor, so that remaining data always fit a reducer.
      - Use data compression. So that large data sets can be transferred to the reducer. We then read and process values in chunks on a reducer machine.
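      A plain-Python simulation of the combiner idea mentioned above (not the real Hadoop API; it only shows how pre-aggregating on the mapper host shrinks what is sent to the reducer):
      ```python
      from collections import Counter

      def map_and_combine(lines):
          """Mapper emits (video_id, 1); the combiner pre-aggregates on the mapper host,
          so far fewer pairs travel to the reducer."""
          return Counter(lines)  # e.g. {"A": 3, "B": 1} instead of four (key, 1) pairs

      def reduce_counts(partial_counts):
          """Reducer: sum the partial counts from all mappers."""
          total = Counter()
          for partial in partial_counts:
              total.update(partial)
          return total

      split_1 = ["A", "A", "B", "A"]          # input split on mapper 1
      split_2 = ["B", "C", "A"]               # input split on mapper 2
      print(reduce_counts([map_and_combine(split_1), map_and_combine(split_2)]))
      # Counter({'A': 4, 'B': 2, 'C': 1})
      ```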

    • @jiongwu8119
      @jiongwu8119 3 ปีที่แล้ว +1

      @@SystemDesignInterview
      I have a similar question, but it applies at 29:39 when aggregating a 5-minute top-K from the 1-minute top-Ks.
      Using the same example:
      If we have a top-3 record list for every minute, like the following.
      1:01: (A: 100, B: 80, C: 70, ~D: 60~)
      1:02: (E: 110, F: 100, G: 80, ~D: 60~)
      1:03: (X: 150, Y: 120, Z: 90, ~D: 60~)
      If we want to get the top 3 for the 3 minutes (1:01 ~ 1:03), D would be missed, because it is not in the top 3 of any 1-minute slot.

    • @jiongwu8119
      @jiongwu8119 3 ปีที่แล้ว

      OK, never mind. You said in the video, it's approximate.

  • @shw4083
    @shw4083 2 ปีที่แล้ว

    This is really the best tutorial, and I hope there will be an article with this content!