Kafka Deep Dive w/ a Ex-Meta Staff Engineer

  • Published on 20 Nov 2024

Comments • 174

  • @scarlettc123
    @scarlettc123 3 months ago +44

    relied on your videos heavily while preparing for my system design interview and accepted my staff engineer offer today. you're doing the lord's work by not putting this content behind a paywall. will recommend your stuff whenever someone asks me for interview prep resources in the future. 🙏🙏

    • @hello_interview
      @hello_interview 3 months ago +3

      Amazing work! So happy to help and thanks for sharing your story with us.

    • @pavankalyan6547
      @pavankalyan6547 1 month ago +1

      Hello, what other resources did you use to prepare for System Design interviews?

  • @KolliMadhukar
    @KolliMadhukar 3 months ago +47

    1 Hello Interview video = 100 Exponent and Medium articles. Thanks a ton for these.

    • @hello_interview
      @hello_interview 3 months ago +1

      🤯

    • @tionx126
      @tionx126 26 days ago +1

      For real. It's equivalent to like 1 year of FANG working experience

  • @ediancomachio2783
    @ediancomachio2783 4 months ago +36

    You have amazing teaching skills! The World Cup example was incredibly good and entertaining to watch. I’ve paused my interview journey for now, but I always watch your videos for the pure knowledge they provide. Thank you so much!

    • @hello_interview
      @hello_interview 4 months ago +7

      Who doesn't love the world cup! 😍

  • @aghileslounis
    @aghileslounis 3 months ago +10

    It's not just for an interview at this point but a VERY high quality Kafka course!
    God, you're so talented 😲
    I don't know if it's just me, but it's EXACTLY how I like to learn new things, diagrams, some code and a high quality high level overview. The rest I'll figure it out easily.
    People will love your courses if you decide to make some, it's very rare for someone of your level to take time to explain that well

  • @XzcutioneR2
    @XzcutioneR2 3 months ago +7

    These are super cool! I would love to see more deep dives like this into topics like Elasticsearch, Flink, and distributed databases like CockroachDB.

    • @hello_interview
      @hello_interview 3 months ago +1

      Coming soon!

    • @anipendakur
      @anipendakur 2 months ago

      @@hello_interview Can't wait for Elastic Search!

  • @sidforreal
    @sidforreal 4 days ago

    Just went through the whole video, and loved the depth of the topics you explained.

  • @cineramafilms683
    @cineramafilms683 18 days ago

    This video is so so good! I am lucky to have found this before my interview!

  • @t.jihad96
    @t.jihad96 1 month ago

    This covers something that many other system design resources don't cover: the why, the when, and the trade-offs. Thank you so much.

  • @MrSnackysmorez
    @MrSnackysmorez 3 months ago +2

    I cannot thank you guys enough for putting these videos together! The way you lay the points out and provide the information really goes well with my learning style. Please keep them coming as I cannot get enough of these. Your content is the best out there in terms of teaching system design!

  • @杰-x2z
    @杰-x2z 27 days ago

    This is an amazing introduction and deep dive.
    Suggestion: it would be better to also explain why Kafka is high-throughput and low-latency:
    1. sendfile() system call -> zero-copy approach
    2. Sequential append for disk storage
    3. Distributed architecture and partitioning
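
To make the sequential-append point concrete, here is a minimal Python sketch of an append-only log, assuming a single length-prefixed segment file. `AppendOnlyLog` and its methods are hypothetical names for illustration only; this is not Kafka's actual storage format or API.

```python
import os
import struct


class AppendOnlyLog:
    """Toy segment file: records are only ever appended, never rewritten."""

    def __init__(self, path):
        # "a+b" creates the file if needed; every write lands at the end.
        self._file = open(path, "a+b")
        self._positions = []  # byte position of each record, indexed by offset

    def append(self, payload: bytes) -> int:
        """Write one record at the end of the file and return its offset."""
        self._file.seek(0, os.SEEK_END)
        self._positions.append(self._file.tell())
        # 4-byte big-endian length prefix, then the payload itself.
        self._file.write(struct.pack(">I", len(payload)) + payload)
        self._file.flush()
        return len(self._positions) - 1

    def read(self, offset: int) -> bytes:
        """Read the record stored at a logical offset."""
        self._file.seek(self._positions[offset])
        (length,) = struct.unpack(">I", self._file.read(4))
        return self._file.read(length)

    def close(self):
        self._file.close()
```

Because writes only ever go to the end of the file, the disk sees sequential I/O, and a reader's position is just an integer offset into the log.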

  • @viveksharma-tt5nj
    @viveksharma-tt5nj 12 days ago

    Thank you so much for such a detailed explanation! After watching, I got another perspective on handling failure scenarios using retries and a dead letter queue. Thanks again for such great content!

  • @bjugdbjk
    @bjugdbjk 26 days ago

    By far, this is the best. Explanation with a use case is the selling point for your channel subscription!
    Amazing effort, your knowledge sharing is highly appreciated. Thanks a ton, brother.
    The website is another very informative resource. Thanks again for keeping it free.

  • @karvinus
    @karvinus 4 months ago +2

    I always struggled to choose between queues, stream and pub-sub but this video makes it super easy to understand what to use and when to use.

  • @amikaichuang
    @amikaichuang 1 month ago

    The best system design video about Kafka I have ever seen.

  • @prashantmishra-yw4xu
    @prashantmishra-yw4xu 1 month ago

    Great explanation, you made all the intricacies of Kafka sound so simple. Thank you so much for making this video!

  • @SantoshKumar2
    @SantoshKumar2 4 months ago +2

    Most awaited topic. Thank you for the detailed and insightful video on Kafka. Your every video is a gold mine. 🙏🏻 ❤

  • @sumitgadi7633
    @sumitgadi7633 1 month ago

    Great appreciation for the knowledge you have shared. I am waiting for new videos on System design or deep dives.
    Thanks a ton..!!

  • @59sharmanalin
    @59sharmanalin 12 hours ago

    Love you brother, you are killin' it!! God sent person!

  • @anirudhheda9232
    @anirudhheda9232 26 days ago

    This is awesome. Answered all my questions within the first 15 minutes. The rest was just bonus xD. Would love to see more content!

  • @yankomirov4290
    @yankomirov4290 1 month ago

    Thank you so much for these, it's incredibly well-thought and easy to follow! Also love the practical example!

  • @nikolahuang1919
    @nikolahuang1919 1 month ago +2

    Your videos are super high quality!

  • @amitagarwal8006
    @amitagarwal8006 3 months ago +1

    Absolutely loved it! Especially the part explaining how different systems can utilise it. Waiting for more of these!!

  • @bjugdbjk
    @bjugdbjk 26 days ago

    A deep dive on Postgres and on MongoDB would be a great help!

  • @sushilprusty2766
    @sushilprusty2766 2 months ago

    You have saved me time with a short, beautiful, to-the-point answer.

  • @notrequired28
    @notrequired28 3 months ago +3

    Thanks Evan, nicely explained with enough depth. Can you consider adding a section on why Kafka is fast even though it is durable (disk vs in-memory)? Also, a common decision point is choosing between alternatives, for example Kafka vs Kinesis, Kafka vs RabbitMQ, etc. Can you add when not to use Kafka and look for alternatives instead?

  • @danielryu6527
    @danielryu6527 4 months ago

    I have a system design interview tomorrow and this is perfect timing to watch!

    • @hello_interview
      @hello_interview 4 months ago

      Good luck! You got this 💪

  • @prashantsalgaocar
    @prashantsalgaocar 4 months ago +2

    amazing discussions and pointers as always.. Evan... always look forward to your videos..

  • @goyalpankaj237
    @goyalpankaj237 3 months ago

    This is hands down the best Kafka explanation I have seen so far :)

  • @raederle9070
    @raederle9070 3 months ago

    This is great - keep them coming - if you produce it I'll consume it!

  • @JohnCF
    @JohnCF 3 months ago

    Great video! I'm glad to have found the HelloInterview YouTube channel. A lot more practically useful content and advice for actual system design interviews, compared to other channels on YouTube.

  • @aIgojoe
    @aIgojoe 1 month ago

    This was great! Thank you so much for giving this knowledge out!

  • @MrDianitaChan
    @MrDianitaChan 3 months ago

    Thanks so much for putting this video together. I love the way you explain everything. Keep up the good work!

  • @fadygamilmahrousmasoud5863
    @fadygamilmahrousmasoud5863 3 months ago

    Very, very, very insightful. Keep up the amazing work on this series.
    Thanks.

  • @adhirajbhattacharya8574
    @adhirajbhattacharya8574 3 months ago

    Your process for teaching is amazing. Diagrams and a perfect balance of high-level and low-level info, using the deep dives. Anyone interested in knowing more has enough base info to search for themselves.
    Please make more of the technology deep dives. Also, it would be great if you could do deep dives on some of the difficult core concepts.
    Elasticsearch, MongoDB, Cassandra, graph DBs, something detailed on available load balancers, rate limiters, and API gateway implementations. Geoindex or spatial index.

  • @ruleind
    @ruleind 3 months ago

    The mock interviews were very useful for me!
    Your content is the best! Do keep pushing out the content!

  • @alby_tho
    @alby_tho 4 months ago +1

    First!
    Shoutout to this channel! It really prepared me for all my system design interviews this cycle

  • @AvneetSingh_011
    @AvneetSingh_011 3 months ago

    Your videos are so informative and helpful. Would love to see more videos from you.

  • @mohammednisar1994
    @mohammednisar1994 2 months ago

    I feel like a pro already! nice job

  • @abhilashbandi3866
    @abhilashbandi3866 4 months ago +3

    Superb. As someone who has just theoretical knowledge of Kafka, this helped me understand the "topic" a little better. Request for a video on ZooKeeper (I think Kafka moved away from ZooKeeper to KRaft).

    • @hello_interview
      @hello_interview 4 months ago +1

      Yeah, exactly right re: KRaft. Consensus is something I maybe should have mentioned but, while key to the internals, it's not really necessary to know about in an interview.

    • @abhilashbandi3866
      @abhilashbandi3866 4 months ago

      @@hello_interview Thank you for the videos. Do interviews at staff/principal level focus on consensus? Or at least touch on it?

  • @YoungY2-uu9rj
    @YoungY2-uu9rj 4 months ago +1

    Thank you. Love every video published so far.

  • @SaurinShah1
    @SaurinShah1 4 months ago +2

    Thank you for all you guys do!

  • @Nnngao4231
    @Nnngao4231 1 month ago

    Thanks, this is very helpful! It would be great to also mention how deeply we should understand Kafka for different levels of system design interview; maybe not all the deep dives are required for E4?

  • @randymujica136
    @randymujica136 4 months ago +1

    Great explanation as always! I keep watching your videos over and over. As a previous commenter mentioned, a deep dive on ZooKeeper would be great. It's mentioned many times in orchestration/coordination scenarios along with consistent hashing, and I think it would be valuable to understand how it works.

    • @hello_interview
      @hello_interview 4 months ago +1

      Feel free to vote for what you want to see next here! www.hellointerview.com/learn/system-design/answer-keys/vote

    • @randymujica136
      @randymujica136 3 months ago

      @@hello_interview done

  • @narasimhanmb4703
    @narasimhanmb4703 1 month ago

    Great video! The takeaway for me was how Kafka can decouple producers and consumers. That was awesome!
    One question: isn't the "acknowledgement" setting more of a trade-off between consistency guarantees and latency, and not directly related to durability?

  • @Pockykuma
    @Pockykuma 3 months ago +5

    I feel like I am committing a crime to watch this for free. Keep it up, Evan!

  • @SunilKumar-jl6dl
    @SunilKumar-jl6dl 4 months ago +1

    I have used Kafka a lot, but this video really reinforced the nitty-gritty details. Great content!

  • @3rd_iimpact
    @3rd_iimpact 4 months ago

    Listening via AUX while I’m driving. Love it. Curious to see it visually later.

    • @hello_interview
      @hello_interview 4 months ago +5

      Hello interview podcast lol

  • @NikhilJain2013
    @NikhilJain2013 3 months ago

    Nicely explained, specifically the difference between topic and partition. Glad you are making videos on system design. I have a doubt.
    As per your explanation, we create a queue per consumer group for a topic, which we call a partition. To scale more, we split a single partition into more partitions on different brokers, and the same consumer group gets data from different brokers for those partitions. Please let me know if my understanding is correct:
    topic = events
    consumer groups = A, B
    partitions (queues) = events_A and events_B
    To scale more, we distribute events_A across 2 brokers, Broker 1 and Broker 2; some events go to Broker 1 and some go to Broker 2.
    Now consumer group A gets data from the events_A queue (partition on Broker 1) and the events_A queue (partition on Broker 2).
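
In Kafka, partitions belong to the topic rather than to a consumer group, and each consumer group tracks its own offset into every partition. A minimal sketch of that model, assuming a toy in-memory topic (`topic`, `offsets`, and `poll` are hypothetical names for illustration, not a Kafka client API):

```python
# Toy model: one topic ("events") with two partitions, each an ordered log.
# The partitions are shared by every consumer group that reads the topic.
topic = {0: ["e0", "e2", "e4"], 1: ["e1", "e3"]}

# Each consumer group keeps its own next-offset per partition.
offsets = {"A": {0: 0, 1: 0}, "B": {0: 0, 1: 0}}


def poll(group, partition):
    """Return the next unread record for this group, or None if caught up."""
    position = offsets[group][partition]
    log = topic[partition]
    if position >= len(log):
        return None
    offsets[group][partition] = position + 1
    return log[position]
```

Group A reading ahead in partition 0 does not move group B's position, so both groups see every event. There is no separate `events_A` or `events_B` queue in this model.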

  • @franciscoizaguirre9069
    @franciscoizaguirre9069 2 months ago

    absolute BANGER

  • @redheart97
    @redheart97 2 months ago

    Amazing content!! Would love to see more deep dives. Maybe into some common AWS tools used in system design interviews.

    • @hello_interview
      @hello_interview 2 months ago +1

      There’s a DynamoDB write-up on our website!

    • @redheart97
      @redheart97 19 hours ago

      @@hello_interview you guys are awesome, thank you for putting this all out there for free.

  • @RaviChoudhary_iitkgp
    @RaviChoudhary_iitkgp 3 months ago

    thanks for amazing explanation & deep dive into kafka :)

  • @nicolasanderson5881
    @nicolasanderson5881 1 month ago

    I saw this and I subscribed to your channel.

  • @shikharupadhyay7435
    @shikharupadhyay7435 3 months ago

    Great video Evan.......

  • @mohamedessammorsy967
    @mohamedessammorsy967 4 months ago

    Really amazing, you explained it really well.
    Thanks for the great effort :)

  • @fayezabusharkh3987
    @fayezabusharkh3987 3 months ago

    Thank you! Great explanation as always

  • @vadimbytsiv9461
    @vadimbytsiv9461 1 month ago

    Great vid! One thing about pub-sub. When multiple consumers consume same message, I believe that is called "broadcasting". It is still pub-sub though, as pub-sub is a more general term, meaning there is a producer-intermediate-consumer relation. When we have exactly one consumer per one message it is also a pub-sub. Please let me know if I messed up or not. Thanks

  • @anuragtiwari3032
    @anuragtiwari3032 4 months ago

    Liked even though I haven't watched the video. I know it will be a banger !

    • @hello_interview
      @hello_interview 4 months ago

      Don’t speak too soon haha

  • @fatemehrezaei3727
    @fatemehrezaei3727 4 months ago

    Thank you so much. Love your channel. Please provide a deep dive on Redis too. 🙏

    • @hello_interview
      @hello_interview 4 months ago +1

      Already got one! th-cam.com/video/fmT5nlEkl3U/w-d-xo.html

    • @kamalsmusic
      @kamalsmusic 4 months ago +1

      They did earlier

    • @fatemehrezaei3727
      @fatemehrezaei3727 4 months ago

      @@hello_interview Thanks a lot.

    • @fatemehrezaei3727
      @fatemehrezaei3727 4 months ago

      @@kamalsmusic Thanks a lot.

  • @Amin-wd4du
    @Amin-wd4du 2 months ago

    The most important part, the limitations around the number of consumers and partitions, was not covered.

  • @bangbang86
    @bangbang86 3 months ago +1

    ElasticSearch deep dive would be great

  • @sankalpmishra8313
    @sankalpmishra8313 2 months ago

    Hi, you are an amazing teacher and your knowledge is superb. I learned a lot! Thank you! One question: let's say a consumer commits the offset only after finishing the job. Isn't there a possibility that the Kafka cluster sends the same message to 2 consumers, not knowing up to which offset messages have been consumed?
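
Within a consumer group, each partition is assigned to one consumer at a time, so two live consumers in the same group do not normally receive the same message. Committing after processing does allow a redelivery when a consumer crashes between finishing the work and committing, which is the usual at-least-once behavior. A minimal simulation of that case, assuming an in-memory list as the partition (`consume` and `crash_before_commit_at` are hypothetical names, not a Kafka client API):

```python
def consume(log, committed, handler, crash_before_commit_at=None):
    """Process records starting at the committed offset.

    The offset is committed only after the handler returns, so a crash
    between processing and committing causes that record to be redelivered.
    Returns the last committed offset.
    """
    offset = committed
    while offset < len(log):
        handler(log[offset])  # do the work first
        if offset == crash_before_commit_at:
            return committed  # simulated crash: this commit never happens
        committed = offset + 1  # commit only after successful processing
        offset = committed
    return committed
```

If the first consumer crashes after handling offset 1 but before committing it, the replacement consumer resumes from committed offset 1 and handles that record again. This is why handlers in an at-least-once setup are usually made idempotent.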

  • @trueinviso1
    @trueinviso1 4 months ago

    Love these deep dives, thanks!

  • @rupeshjha4717
    @rupeshjha4717 4 months ago

    Good going, please keep continuing this series!
    I had a question regarding consumer concurrency, which is not discussed in this video.
    Let's say I have 1 consumer group with 2 consumers running and a topic with 8 partitions; each consumer will then be assigned 4 partitions when concurrency = 1. How is the consumer affected if consumer concurrency is now changed to 2?

    • @yiannig7347
      @yiannig7347 4 months ago

      Are you asking what happens if consumer threads are increased from 1 to 2 for a single consumer instance in a group? If so, the consumer is still a single client of the broker, like kafka-client-01 and kafka-client-02.
      With more threads, the consumer can process messages from its assigned partitions concurrently, improving throughput. However, it still handles the same number of partitions overall.

    • @hello_interview
      @hello_interview 4 months ago +1

      Thanks for the assist!
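
The 8-partition, 2-consumer split discussed in this thread can be sketched with a simple round-robin assignment. This is an illustration only: `assign` is a hypothetical helper, and Kafka's real assignors (range, round-robin, sticky) may distribute partitions differently.

```python
def assign(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    # 8 partitions over 2 consumers: 4 partitions each.
    print(assign(list(range(8)), ["consumer-1", "consumer-2"]))
    # More consumers than partitions: the extra consumer gets nothing.
    print(assign([0, 1], ["consumer-1", "consumer-2", "consumer-3"]))
```

The second call shows the limit several commenters ask about: a partition is read by at most one consumer in a group, so consumers beyond the partition count sit idle. Extra threads inside one consumer change how fast its assigned partitions are processed, not how many partitions it owns.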

  • @MyAeroMove
    @MyAeroMove 4 months ago

    Very well structured!

  • @hbhavsi
    @hbhavsi 3 months ago

    Amazing video, thanks so much for sharing! The person in the Redis video mentioned 5 key technologies that are either most common or one should know. Do you guys plan to cover the other 3 after Redis and Kafka? That would be AMAZING!! :) Which ones are those that you guys were referring to?

    • @hello_interview
      @hello_interview 3 months ago +1

      Planning content on ElasticSearch, Postgres, and Dynamo next. Some internal debate about #5 but you'll see those sometime in the coming weeks.

    • @hbhavsi
      @hbhavsi 3 months ago

      @@hello_interview amazing, thank you so much!!

  • @yipz
    @yipz 1 month ago

    Thanks for the deep dive! When would you use SQS with FIFO over Kafka?

    • @hello_interview
      @hello_interview 1 month ago +1

      If I need built-in support for retries or visibility timeouts, or if I'm already deep in the AWS ecosystem.

  • @TheSmashten
    @TheSmashten 2 days ago

    Awesome content!! Can you guys do a video on Zookeeper?

  • @WyattsDeBestDad
    @WyattsDeBestDad 4 months ago

    In terms of Horizontal Scaling, from the accompanying article:
    "Horizontal Scaling With More Brokers: The simplest way to scale Kafka is by adding more brokers to the cluster. This helps distribute the load and offers greater fault tolerance. Each broker can handle a portion of the traffic, increasing the overall capacity of the system. It's really important that when adding brokers you ensure that your topics have sufficient partitions to take advantage of the additional brokers. More partitions allow more parallelism and better load distribution. If you are under partitioned, you won't be able to take advantage of these newly added brokers."
    My understanding is that Kafka can be scaled horizontally dynamically, perhaps as a system sees an unanticipated increase in volume. If that's correct, does the above imply that partitions can be added dynamically too? In the example cited, the LeBron James campaign, I took that to mean that you'd add extra partitions for that campaign in anticipation of the additional traffic. In the case of hot partitions, can one of the prescribed techniques (say random salting or compound keys) be added on the fly? If this is non-trivial, can you maybe link to how this is achieved?
    Thanks so much!

    • @hello_interview
      @hello_interview 4 months ago

      In general, these are things handled by managed versions of Kafka, such as AWS MSK or Confluent Cloud. How they dynamically scale depends on each managed service. Typically, handling hot partitions is still not managed dynamically and requires conscious effort on the part of the developer.
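
The random-salting technique mentioned in this thread can be sketched as below. It is a minimal illustration under assumptions: `salted_key` and the `buckets` parameter are hypothetical names, and the producer is assumed to use the returned string as the record key.

```python
import random


def salted_key(key, buckets, rng=random):
    """Append a random bucket number so one hot key becomes several keys.

    The partitioner then hashes e.g. "lebron:0" .. "lebron:3" separately,
    which lets a single hot campaign spread over several partitions.
    """
    return f"{key}:{rng.randrange(buckets)}"


if __name__ == "__main__":
    rng = random.Random(7)  # seeded only to make the demo repeatable
    print([salted_key("lebron", 4, rng) for _ in range(5)])
```

The trade-off is that records for the original key no longer share one partition, so per-key ordering is lost and any consumer-side aggregation has to combine the results of all salted keys.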

  • @akshayjhamb1022
    @akshayjhamb1022 4 months ago

    To handle a Kafka consumer going down, we could turn on manual offset commits. There's also an auto-commit option, with a configurable interval for when offsets are auto-committed. Great video for Kafka revision, though. It would also have been great if you had mentioned the limit on the number of Kafka consumers based on the number of partitions.

  • @ItsMeIshir
    @ItsMeIshir 4 months ago

    Good video. Thanks for making it.

  • @mouleeswarkothandaraman7095
    @mouleeswarkothandaraman7095 2 months ago

    Hi Evan, great content! One question- how about using a time series DB like Influx or Prometheus for aggregation by time slices? Will that work?

  • @ninlar-codes
    @ninlar-codes 3 months ago

    Excellent job on this. This is so helpful. I'm familiar with Azure Service Bus and Azure Event Hubs, since we use the Azure Stack. With Azure Event Hubs, the consumers maintain their own bookmark or offset into each partition, so they can choose when to checkpoint and/or replay events / records / messages if needed. Does Kafka have something similar? If I commit the offset to Kafka, but I want to replay events due to data loss, can I reset my offset?

  • @ziake
    @ziake 3 months ago +1

    This is so great, thanks a lot. One question about the diagram at 43:24: is it actually possible to have the leader and followers of a partition on the same broker? I thought that with 2 brokers, as in the example, the max replication factor is 2, with the leader and follower on different brokers.
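
Kafka places each replica of a partition on a different broker, so the replication factor cannot exceed the number of brokers. A minimal sketch of that constraint, assuming a simple round-robin placement (`place_replicas` is a hypothetical helper, not Kafka's actual placement algorithm):

```python
def place_replicas(num_partitions, replication_factor, brokers):
    """Return {partition: [leader, follower, ...]} with distinct brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the broker count")
    placement = {}
    for partition in range(num_partitions):
        # Start each partition on a different broker, then wrap around,
        # so leaders are spread out and no broker appears twice per partition.
        placement[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return placement


if __name__ == "__main__":
    print(place_replicas(2, 2, ["broker-1", "broker-2"]))
```

With 2 brokers and a replication factor of 2, every partition gets one leader and one follower, each on a different broker, which matches the commenter's expectation.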

  • @shrishtigupta6902
    @shrishtigupta6902 4 months ago

    Thank you for this detailed video. But a quick question here - I'm still confused when to use RabbitMQ and when to use Kafka? Because both of them can be helpful for all the use cases

  • @chrisgu4121
    @chrisgu4121 3 months ago

    Great video! One question: how does Kafka handle exactly-once delivery? Is it good enough to enable idempotence on the producer to ensure that?

  • @harrylee27
    @harrylee27 3 months ago

    With SQS, you probably don't need a retry topic: the attempts are tracked on the main queue, and you can configure SQS to move a message to the DLQ once its retry attempts exceed some threshold. The consumer just tells SQS whether the message was processed; on failure or timeout, SQS makes the message visible again and increments the attempt count, and it is SQS, not the consumer, that moves the message to the DLQ if needed.

  • @ayosef
    @ayosef 2 months ago

    Thank you very much!
    Which tool are you using for whiteboard? Looks very clean!

  • @Anonymous-ym6st
    @Anonymous-ym6st 2 months ago

    Thanks for the great video. I am wondering: to handle a hot partition, can we also just use batching?

    • @hello_interview
      @hello_interview 2 months ago

      Yes, depending on your throughput requirements.

  • @aforty1
    @aforty1 4 months ago

    Thank you!

  • @lil_n_co
    @lil_n_co 3 months ago

    Excellent!

  • @CS-eh8eo
    @CS-eh8eo 1 month ago

    BTW, really enjoying the content!
    I think if you made a statement like "Kafka never goes down" in an interview, it would be a bit of an odd thing to hear as an interviewer... of course Kafka could go down, even if running a managed service. Your application should surely handle this scenario in such a way that the system can recover when Kafka comes back online. Hopefully, thanks to its durability and persistent storage of events, your application should be able to resume where it left off. Wdyt?
    Edit: I need to add precision to my statement here. Am I correct in saying Kafka is only as durable as you configure it to be, specifically how you configure replicas and how many replica acknowledgements producers wait for before treating a write as complete? If acks=1 is set, the producer only waits for acknowledgement from the leader broker, which could fail before the record is replicated to followers, and data could be lost. acks=all waits for acknowledgement from the leader and all in-sync replicas.
    So in summary, Kafka can certainly fail. I think a good answer to this question would detail how to configure it for durability, how that will impact performance, whether the operation needs such data consistency vs availability, how your application reacts to unavailability of the Kafka broker, and how it recovers when Kafka comes back online.
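
The acks=1 versus acks=all difference described in this comment can be illustrated with a toy simulation. Everything here is an assumption for illustration: `ToyPartition` models one leader and one in-sync follower in memory and is not a Kafka API; only the `acks` values "1" and "all" mirror real producer settings.

```python
class ToyPartition:
    """One partition with a leader log and a single in-sync follower log."""

    def __init__(self):
        self.leader = []
        self.follower = []

    def produce(self, record, acks):
        """Append a record and return True once the producer would be acked."""
        self.leader.append(record)
        if acks == "all":
            # The ack is held back until the in-sync follower has the record.
            self.follower.append(record)
        # With acks="1" the ack is sent after the leader write alone;
        # replication to the follower would happen later, asynchronously.
        return True

    def fail_leader(self):
        """Leader dies; the follower is promoted with whatever it had."""
        self.leader = list(self.follower)
```

With `acks="1"`, a record that was acknowledged can still vanish if the leader fails before the follower copies it. With `acks="all"`, the acknowledged record survives the failover, at the cost of waiting for replication on every write.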

  • @RezaZulfikarNaipospos-v4u
    @RezaZulfikarNaipospos-v4u 2 months ago

    Please create a use case for hybrid cloud architecture. For example, a mobile retail application (in the cloud) that connects to a branch system (the branch can run in offline mode too) :D

  • @firezdog
    @firezdog 3 months ago

    I tried to summarize the example, but I'm not convinced I'm getting to the crux of why events might be processed out of order:
    Example. Imagine we have a website covering a live event and we want to display up-to-date news as it occurs. There will be a producer (maybe a reporter on a keyboard) and a consumer. The producer will put updates on a queue and the consumer will process them and put them on the site. What happens if the number of events increases dramatically? For example, what if instead of covering one live event, our website wants to cover 10 live events?
    A single queue might not be enough to handle so many events (memory pressure), so our producer could start publishing to multiple queues. If we have a pattern of events
    A > B > C > D
    it's possible they will be distributed between the queues as
    Q1: A C
    Q2: B D
    In particular, the consumer might process A and B first. Network issues then delay the arrival of C so that it does not get to Q1 until after D arrives on Q2. Assuming the consumer isn't going to wait for Q1 to fill up and continues to work on Q2 (because that's the only source of work), the events are processed as
    A > B > D > C
    We can solve this problem by associating one queue with one event type. We may not publish updates between events in the order in which they occurred, but supposing D represented scoring a goal and C kicking a ball, we'll never publish events in an order that reverses the causality.
    At some point, though, we'll have so many events that a single consumer cannot keep up with them. But if we scale consumers, we need to be able to coordinate their work. We have to make sure, for example, that consumers don't process the same events. To fix this, we might try to distinguish our consumers into groups, each of which is responsible for handling a single queue (and incidentally preserving the ordering of items on that queue). (We need to guarantee events are processed at most and ideally exactly once.)
    The final problem we might run into is that dividing queues between specific live events might not give us a sufficient level of granularity if the events are of different kinds (soccer vs. football). In that case, we need to distinguish groups of queues as well according to *topic*.
    These examples show some of the use cases for Kafka: we want to ensure that a stream of heterogeneous events is processed in order while partitioning those events at different levels of granularity.
    * Note: the example depends on network issues and may seem a little bit contrived. In fact, the mere interleaving of events between queues is enough to produce out of order reporting if the consumer cannot coordinate the way in which it processes events with the producer. I think the main point comes across if we can convince ourselves that without some additional framework, the order in which the consumer consumes will only coincide with the order in which the producer produces if we're lucky.
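
The "one queue per live event" idea in this summary corresponds to keyed partitioning: records with the same key always land in the same partition, so their relative order is preserved. A minimal sketch, assuming an MD5-based hash as a stand-in (Kafka's Java client hashes keys with murmur2 by default; `partition_for` and `route` are hypothetical helpers):

```python
import hashlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition deterministically (stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def route(events, num_partitions):
    """Append each (key, payload) event to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


if __name__ == "__main__":
    events = [("match-1", "A"), ("match-2", "B"),
              ("match-1", "C"), ("match-1", "D")]
    for partition, records in sorted(route(events, 2).items()):
        print(partition, records)
```

Ordering is guaranteed only within a partition. Events for `match-1` are always read as A, C, D, but nothing orders them relative to `match-2` events that sit in a different partition.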

  • @nexovec
    @nexovec 1 month ago

    Control question: if there are two Kafka servers with partitions for topic A and a consumer subscribes to topic A, how does it get an ordered log out of the two partitions on those two separate servers?

  • @sigabort6787
    @sigabort6787 10 days ago

    @hello_interview
    Can you do a video on designing a CI/CD system?

  • @sandeshsinha732
    @sandeshsinha732 1 month ago

    Note: the replication factor is the number of followers and the leader combined.

  • @konstantinwilleke6292
    @konstantinwilleke6292 3 months ago

    Great resource!
    What’s the name of the drawing/diagram app?

  • @SujeetBanerjee-b9g
    @SujeetBanerjee-b9g 2 months ago

    [22:32] What's Flink? Is that an alternative to Redis? Is that a design for a scalable "leaderboard" type of application?

  • @Nick-lw7rj
    @Nick-lw7rj 4 months ago

    First off, thank you for these videos and resources, they are very valuable to anyone studying for interviews.
    I'm curious though, how would you improve the interview process as someone who's been on both sides of it for a number of years?
    I question the value of these interviews given that people are being asked to design massive systems, for billions of users, engineered by hundreds/thousands of people over a number of years, which were iteratively improved over time. They're expected to have a pretty ideal solution by having researched the problem or similar ones ahead of time, or much less often, having faced similar problems themselves. If someone was asked to actually design a system in a real production environment, they would spend ample time researching it ahead of time anyway, so I don't necessarily understand the value of them knowing it up front in an interview.
    I'm also curious how you would react if you were still interviewing people, and a candidate proposed a solution that's an identical or near-identical copy of yours. Would you pass them as long as they understood why each component is needed, and why certain technologies should be used over others? Would you have time to properly gauge that in a 45 minute interview once they've finished their design?

    • @hello_interview
      @hello_interview 4 months ago +1

      That's a big topic! One that likely requires a full blog post.
      I will say that, in general, we agree. The interview process within big tech is stuck in a local minimum and is in need of a facelift. But as long as the supply of engineers exceeds demand, there isn't much incentive for companies to change. Their hiring process may have poor recall, but if precision stays high, they don't really care.

    • @Nick-lw7rj
      @Nick-lw7rj 4 months ago

      @@hello_interview agreed about a needed facelift, until then, the grind continues :) thanks again for these

  • @maazshaikh7905
    @maazshaikh7905 3 months ago

    Can you kindly recommend any research papers about Kafka that students can use for academic work, covering the history and development of Kafka, some live case studies, and further improvements in the field?

  • @rjl-s5p
    @rjl-s5p 4 months ago

    In the section about using Kafka for messenger, how would the topics and partitions for a messaging application like Messenger be structured to achieve low latency and high throughput? For example, if there are 1 billion users on the platform, would there be one billion topics, or a single topic with a billion partitions, one for each user (which I don't think is possible since the recommendation is 4k partitions per broker and max of 200K per cluster)? Is there a different approach that could be considered? What are the tradeoffs for each option?
    And great video. Thank you for doing this.

    • @hello_interview
      @hello_interview 4 months ago

      Some alternatives discussed here: www.hellointerview.com/learn/system-design/answer-keys/whatsapp

  • @MASTERISHABH
    @MASTERISHABH 2 months ago

    Hey, I might be wrong, but batching by time and size is not possible in the kafkajs lib out of the box: every send waits for the configured ack before the rest of the code continues, so batched messages wouldn't get an ack and batching won't work this way in JS.
    It does support sendBatch separately, but if we have an API, batching isn't directly possible unless we write a custom function that stores messages as objects on the JS side and runs periodically to flush them to Kafka. Even then, size-based batching in JS won't be so easy, as per my understanding.
    Let me know if I'm missing something here.
    P.S.: Talking about 39:28
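
The buffer-and-flush approach described in this comment can be sketched as follows, in Python and independent of any client library. It is a minimal illustration: `Batcher`, `send_batch`, `max_size`, and `max_wait_s` are hypothetical names, `send_batch` stands in for whatever batch-send call the client offers, and nothing here is the kafkajs API.

```python
import time


class Batcher:
    """Buffer messages and flush on a size limit or a time limit."""

    def __init__(self, send_batch, max_size, max_wait_s, clock=time.monotonic):
        self._send_batch = send_batch  # callable that receives a list of messages
        self._max_size = max_size
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._buffer = []
        self._first_at = None  # time the oldest buffered message arrived

    def add(self, message):
        """Buffer one message; flush immediately if the size limit is hit."""
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(message)
        if len(self._buffer) >= self._max_size:
            self.flush()

    def tick(self):
        """Call periodically (e.g. from a timer) to enforce the time limit."""
        if self._buffer and self._clock() - self._first_at >= self._max_wait_s:
            self.flush()

    def flush(self):
        """Send everything buffered so far as one batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send_batch(batch)
```

An API handler would call `add` per request and a timer would call `tick`. The cost is that a message sitting in the buffer has not been acknowledged by the broker yet, so a process crash before the flush loses it.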

  • @RezaZulfikarNaipospos-v4u
    @RezaZulfikarNaipospos-v4u 2 months ago

    How do we monitor Kafka? Which metrics should we focus on for alerting?

  • @Richard-yw9if
    @Richard-yw9if หลายเดือนก่อน

    Topic A: why can the partition 1 follower and the partition 1 leader be on the same broker/server? What's the point of that?

  • @patrickshepherd1341
    @patrickshepherd1341 4 months ago

    I don't know if you'll see this, but please read if you do. I've been having a lot of trouble lately, and I would really value even just some very brief advice.
    I've been watching your videos a lot to prepare for an upcoming interview. I'm a PhD computer scientist, but I left professorship about a year ago to work in industry. I've been REALLY unsuccessful in landing anything, but there's never any feedback, so I don't know what I'm doing wrong. Just a lot of anonymous rejections. I'm literally facing bankruptcy. I have an interview on Wednesday, and so I'm learning all I can about modern system design. I'm hopeful, but trying not to get too excited. I'm really thankful for your videos and all the information you provide, though. It's really helping.
    Hypothetical question: if you became a single parent with no education at 24, but buckled down and raised your kid and took care of your sick mom and made it through an undergrad and a phd over the course of 10 years, and maybe don't have a huge professional footprint because your grad work + all the course materials you made don't add up to a very impressive github profile, what can you do to stand out more? Keeping a house running and raising a kid and taking care of a sick parent doesn't get many stars on your repos, but I think it speaks a lot to adaptability, perseverance, problem solving acumen, etc. But those aren't things you can typically bring up in a cover letter or resume, and you're not supposed to talk about it in interviews, so it just looks like I haven't done anything serious outside of my grad software. It feels like a catch 22.
    Can you give me any advice? I could wallpaper a house with all the rejects/ignores I've gotten.
    Thank you again for your great videos!

    • @hello_interview
      @hello_interview  4 months ago

      Really sorry to hear what you've been going through; it's not easy. We work with so many candidates lately who are similarly struggling to land a job in this market. I wish I had a silver bullet, but unfortunately not many novel insights I can offer here. Referrals help. I'd try to leverage connections as best you can if the main challenge is getting through the door to the first interview.
      Beyond that, look for companies with take-home assessments as the first round. This widens the number of candidates that get a shot and puts things back in your control; just need to crush the take-home.

  • @sergei5104
    @sergei5104 3 months ago

    I have a question: If I want to consume a message and then perform a long-lasting task (like web crawling) before committing the offset, does it mean that I need to have a configuration where the number of consumers is strictly equal to the number of partitions to avoid duplicate readings of the same message?

    • @hello_interview
      @hello_interview  3 months ago

      Nope, just have them all be part of the same consumer group.
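A rough way to see why the consumer count does not have to equal the partition count: within one consumer group, each partition is assigned to exactly one consumer, and a consumer may own several partitions. The sketch below is a simplified round-robin assignment for illustration only. `assign_partitions` is a hypothetical helper, and Kafka's built-in assignors (range, round-robin, sticky) are more involved.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give every partition to exactly one consumer in the group, round-robin."""
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```

With six partitions and two consumers, each consumer owns three partitions. With more consumers than partitions, the extra consumers simply receive nothing.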

  • @salahayman3513
    @salahayman3513 13 days ago

    Can we have the same for RMQ, and the use cases for each (Kafka and RMQ), please?

  • @damluar
    @damluar a month ago

    22:55 In the WhatsApp design by Stefan, using Kafka was marked as a bad solution, since Kafka doesn't scale well to millions of topics, and Redis pub/sub was recommended as a better alternative. Do you agree with this? It would be nice to have a section on when not to use Kafka :)

  • @SAHILMANIAR
    @SAHILMANIAR a month ago

    In batch consuming, from a batch of 100, I successfully processed 65 and then my service crashed. How is the commit / retry handled in this case?
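For the scenario in this question, the outcome depends on when offsets are committed, because a restarted consumer resumes from the last committed offset. The simulation below is a sketch under assumptions, not Kafka client code. `run_batch`, `Crash`, and `commit_every` are hypothetical names, and the log is just a Python list.

```python
from typing import Callable, List


class Crash(Exception):
    """Simulates the service dying partway through a batch."""


def run_batch(log: List[int], committed_offset: int, batch_size: int,
              handle: Callable[[int], None], commit_every: int) -> int:
    """Handle one batch starting at committed_offset; return the last committed offset.

    Offsets are committed after every `commit_every` handled messages and once
    more at the end of the batch. If `handle` raises Crash, anything handled
    since the last commit stays uncommitted and is redelivered on restart.
    """
    batch_end = min(committed_offset + batch_size, len(log))
    since_commit = 0
    try:
        for offset in range(committed_offset, batch_end):
            handle(log[offset])
            since_commit += 1
            if since_commit == commit_every:
                committed_offset = offset + 1  # next offset to read after a restart
                since_commit = 0
        committed_offset = batch_end  # commit the remainder of the batch
    except Crash:
        pass  # the process is gone; only earlier commits survive
    return committed_offset
```

If the crash hits after 65 of 100 messages and the commit only happens at the end of the batch, the function returns 0 and all 100 messages are read again. With `commit_every=10` it returns 60, so only messages 60 to 64 are processed twice. Either way the handler needs to tolerate duplicates.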

  • @刘天旻
    @刘天旻 3 months ago

    I've recently been working on a project where I want to process events asynchronously but in order. I am thinking of using Kafka/Kinesis. How do I ensure that two events are actually ingested into Kafka in order? What if ingestion of event A is delayed by a network issue, and event B, which happened later, gets ingested before A?