relied on your videos heavily while preparing for my system design interview and accepted my staff engineer offer today. you're doing the lord's work by not putting this content behind a paywall. will recommend your stuff whenever someone asks me for interview prep resources in the future. 🙏🙏
Amazing work! So happy to help and thanks for sharing your story with us.
Hello, what other resources did you use to prepare for System Design interviews?
1 Hello Interview video = 100 Exponent and Medium articles. Thanks a ton for these!
🤯
For real. It's equivalent to like 1 year of FAANG work experience
You have amazing teaching skills! The World Cup example was incredibly good and entertaining to watch. I’ve paused my interview journey for now, but I always watch your videos for the pure knowledge they provide. Thank you so much!
Who doesn't love the world cup! 😍
It's not just for an interview at this point but a VERY high quality Kafka course!
God, you're so talented 😲
I don't know if it's just me, but it's EXACTLY how I like to learn new things, diagrams, some code and a high quality high level overview. The rest I'll figure it out easily.
People will love your courses if you decide to make some, it's very rare for someone of your level to take time to explain that well
This covers something that many other system design resources don't: the why, the when, and the trade-offs. Thank you so much.
I cannot thank you guys enough for putting these videos together! The way you lay the points out and provide the information really goes well with my learning style. Please keep them coming as I cannot get enough of these. Your content is the best out there in terms of teaching system design!
🤗
Thanks Evan, nicely explained with enough depth. Can you consider adding a section why Kafka is fast even though it is durable (disk vs in-memory)? Also, a common decision point is to choose from different alternatives, for example in this case Kafka vs Kinesis, Kafka vs RabbitMQ etc; can you add when not to use Kafka and look for alternatives?
This is awesome. Answered all my questions within the first 15 minutes. The rest was just bonus xD. Would love to see more content!
Most awaited topic. Thank you for the detailed and insightful video on Kafka. Your every video is a gold mine. 🙏🏻 ❤
Got you 🫡
You have amazing teaching skills, simple and easy to understand, can you do one video on cassandra
By far, this is the best. Explanation with a use case is the selling point for your channel subscription!
Amazing effort, your knowledge sharing is highly appreciated. Thanks a ton, brother.
The website is another very informative resource. Thanks again for keeping it free.
This video is so so good! I am lucky to have found this before my interview!
These are super cool! I would love to see more of such deep dives into topics like Elastic Search, Flink, and Distributed Databases like Cockroach DB
Coming soon!
@@hello_interview Can't wait for Elastic Search!
Absolutely loved it! Especially the part explaining how different systems can utilise it. Waiting for more of these!!
Soon!
This is an amazing introduction and deep dive.
Suggestion: better to introduce why Kafka is high-throughput and low-latency:
1. sendfile() system call -> zero-copy approach
2. Sequential append for disk storage
3. Distributed architecture & partitioning
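The sequential-append point can be seen in a toy model of a partition. This is purely illustrative Python, nothing like Kafka's actual storage engine: the key idea is that a partition only ever grows at the end, so writes are sequential and reads by offset are direct lookups.

```python
# Toy append-only log: each partition is just a list that only grows,
# so writes are sequential (disk-friendly) and reads by offset are O(1).
# Illustration of the idea only, not Kafka's implementation.

class PartitionLog:
    def __init__(self):
        self._records = []

    def append(self, record):
        """Sequential append; returns the offset of the new record."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Consumers read by offset, never by search."""
        return self._records[offset]

log = PartitionLog()
assert log.append("a") == 0
assert log.append("b") == 1
assert log.read(0) == "a"
```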
Great explanation, you made all the intricacies of kafka sound so simple, thank you so much for making this video !!!
Great video! I'm glad to have found the YouTube channel of HelloInterview. A lot more practically useful content and advice for actual system design interviews, compared to other channels on YouTube.
Thank you so much for such a detailed explanation!! After watching, I got another perspective on handling failure scenarios using retries and a dead letter queue. Thanks again for such great content!!
This was great! Thank you so much for giving this knowledge out!
amazing discussions and pointers as always.. Evan... always look forward to your videos..
Glad you liked it!
Thank you so much for these, it's incredibly well-thought and easy to follow! Also love the practical example!
Thanks, this is very helpful! It would also be great to mention how deeply we should understand Kafka for different levels of system design interviews; maybe not all the deep dives are required for E4?
Great appreciation for the knowledge you have shared. I am waiting for new videos on System design or deep dives.
Thanks a ton..!!
The best system design video about Kafka I have ever seen.
First!
Shoutout to this channel! It really prepared me for all my system design interviews this cycle
Fast! 💨
The mock interviews were very useful for me!
Your content is the best! Do keep pushing out the content!
This is hands down the best Kafka explanation I have seen so far :)
Just went through the whole video, and loved the depth of the topics you explained
I always struggled to choose between queues, stream and pub-sub but this video makes it super easy to understand what to use and when to use.
Your videos are super high quality!
37:12 - The way idempotent producer mode works is that each producer is assigned a unique ID, and each message from a producer is assigned a monotonically increasing sequence number (per partition). The broker will discard a message if its sequence number is not greater than the latest it has seen from that producer.
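That dedup rule can be sketched in a few lines. Illustrative Python only, not broker code; in real Kafka the sequence numbers are tracked per producer per partition, and the producer IDs here are made up:

```python
# Toy broker-side dedup for idempotent producers: track the last
# sequence number accepted per producer ID and drop anything not newer,
# so a retried send of the same message is not appended twice.

class Broker:
    def __init__(self):
        self.last_seq = {}   # producer_id -> last sequence accepted
        self.log = []

    def produce(self, producer_id, seq, msg):
        if seq <= self.last_seq.get(producer_id, -1):
            return False     # duplicate (e.g. a retried send): discarded
        self.last_seq[producer_id] = seq
        self.log.append(msg)
        return True

b = Broker()
assert b.produce("p1", 0, "hello") is True
assert b.produce("p1", 0, "hello") is False  # retry of same message: dropped
assert b.produce("p1", 1, "world") is True
assert b.log == ["hello", "world"]
```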
Thanks so much for putting this video together. I love the way you explain everything. Keep up the good work
I have a system design interview tomorrow and this is perfect timing to watch!
Good luck! You got this 💪
very very very insightful, keep up the amazing work on this series.
Thanks.
underrated video, really good video
You have saved my time with short, beautiful, and to-the-point answers
Great explanation as always! I keep watching your videos over and over. As a previous commenter mentioned, it would be great a deep dive on ZooKeeper, it’s mentioned many times in Orchestration/Coordination scenarios along with Consistent Hashing and I think it would be valuable to understand how it works.
Feel free to vote for what you want to see next here! www.hellointerview.com/learn/system-design/answer-keys/vote
@@hello_interview done
Great video. Would love to hear your thoughts on using Kafka/streaming platforms when dealing with use-cases that require strong consistency.
Thank you. Love every video published so far.
Love you brother, you are killin' it!! God sent person!
Your videos are so informative and helpful , will love to see more videos from your side
Coming soon!
This is great - keep them coming - if you produce it I'll consume it!
This is so great, thanks a lot. One question about the diagram at @43:24. Is it actually possible to have the leader and followers of a partition on the same broker? I thought that with 2 brokers, as in the example, the max replication factor is 2, with the leader and follower on separate brokers
Superb. As someone who had just theoretical knowledge of Kafka, this helped me understand the "topic" a little better. Request for a video on ZooKeeper (I think Kafka moved away from ZooKeeper to KRaft)
Yeah, exactly right re: KRaft. Consensus is something I maybe should have mentioned, but while key to the internals, it's not really necessary to know about in an interview.
@@hello_interview Thank you for the videos. Do interviews at the staff/principal level focus on consensus, or at least touch on it?
Amazing content!! Would love to see more deep dives. Maybe into some common AWS tools used in system design interviews.
There’s a DynamoDB write-up on our website!
@@hello_interview you guys are awesome, thank you for putting this all out there for free.
[22:32] What's Flink? Is that an alternative to Redis? Is that a design for a scalable "leaderboard" type of application?
Your process for teaching is amazing. Diagrams and perfect balance of high-level and low-level info, using the deep dives. Anyone interested to know more has enough base info to search themselves.
Please make more of the technology deep dives. Also, if you could do some of difficult core concepts deep dives.
Elasticsearch, mongodb, cassandra, graph dbs, something detailed on available load balancers, rate limiter, api gateways implementations deep dive. Geoindex or spatial index.
Elasticsearch is next!
Thank you for all you guys do!
Nicely explained, specifically the difference between topic and partition. Glad you are making videos on system design. I have a doubt.
As per your explanation, we create a queue per consumer group for a topic, which we call a partition. To scale more, we split a single partition across different brokers, and the same consumer group will be getting data from different brokers for the partitions we created. Please let me know if my understanding is correct?
topic = events
consumer group = A, B
partitions (queues) = events_A and events_B
to scale more, we distribute events_A across 2 brokers, broker 1 and broker 2; some events will go to broker 1 and some will go to broker 2
now consumer group A will be getting data from the events_A queue (partition on broker 1) and the events_A queue (partition on broker 2)
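A toy sketch may help untangle this. Illustrative Python only, not real Kafka client code; the partition count and consumer names are made up. The key points: a message's key determines its partition, and within one consumer group each partition is owned by exactly one consumer.

```python
import zlib

# Illustrative only: a topic is split into partitions; a message's key
# picks its partition, and each partition is assigned to exactly one
# consumer within a consumer group.

NUM_PARTITIONS = 4
CONSUMERS = ["consumer-A1", "consumer-A2"]   # one consumer group "A"

def partition_for(key):
    # Same key -> same partition, so per-key ordering is preserved.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def consumer_for(partition):
    # Each partition is owned by exactly one consumer in the group.
    return CONSUMERS[partition % len(CONSUMERS)]

# All events for one key land on one partition, hence one consumer:
p = partition_for("user-42")
assert partition_for("user-42") == p
assert consumer_for(p) in CONSUMERS
```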
Great video! Takeaway for me was how Kafka can decouple producers and consumers. That was awesome!
One question: isn't the "acknowledgement" setting more of a trade-off between consistency guarantees and latency, and not directly related to durability?
Hi, you are an amazing teacher and your knowledge is superb. I learned a lot! Thank you! One question: let's say a consumer commits the offset after finishing the job; won't there be a possibility that the Kafka cluster sends the same message to 2 consumers, not knowing up to which offset messages were consumed?
This content is gold !!
Hey, I might be wrong, but that batch time and size is not possible in the kafkajs lib out of the box: every send works based on the provided ack and then continues with the rest of the code, so batched messages won't get an ack, and thus batching won't work in JS this way.
It does support sendBatch separately, but if we have an API, then batching is not directly possible unless we write a custom function to store messages as objects on the JS side and run it periodically to flush messages out to Kafka. Even then, size-based batching in JS won't be so easy, as per my understanding.
Let me know if I'm missing something here.
P.S: Talking about 39:28
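The client-side workaround described (buffer messages, flush on size or time) could be sketched like this. Illustrative Python, not kafkajs; all names and parameters are made up, and the time check here only runs on send, so a real version would also need a background timer:

```python
import time

class BatchingProducer:
    """Buffers messages and flushes when either the batch size or the
    max wait time is hit. `flush_fn` stands in for a real client send;
    nothing here is any library's actual API."""

    def __init__(self, flush_fn, max_batch=3, max_wait_s=5.0):
        self.flush_fn = flush_fn
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.buffer = []
        self.first_buffered_at = None

    def send(self, msg, now=None):
        now = time.monotonic() if now is None else now
        if not self.buffer:
            self.first_buffered_at = now
        self.buffer.append(msg)
        if (len(self.buffer) >= self.max_batch
                or now - self.first_buffered_at >= self.max_wait_s):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
            self.first_buffered_at = None

sent = []
p = BatchingProducer(sent.append, max_batch=3)
p.send("a"); p.send("b")
assert sent == []                 # under the size threshold: still buffered
p.send("c")
assert sent == [["a", "b", "c"]]  # size threshold hit: batch flushed
```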
well this is also for: Design a distributed Message Queue! good video!
Great vid! One thing about pub-sub. When multiple consumers consume the same message, I believe that is called "broadcasting". It is still pub-sub though, as pub-sub is a more general term, meaning there is a producer-intermediary-consumer relation. When we have exactly one consumer per message, it is also pub-sub. Please let me know if I messed up or not. Thanks
I feel like a pro already! nice job
Liked even though I haven't watched the video. I know it will be a banger !
Don’t speak too soon haha
A deep dive on Postgres and on Mongo would be a great help!
Excellent job on this. This is so helpful. I'm familiar with Azure Service Bus and Azure Event Hubs, since we use the Azure Stack. With Azure Event Hubs, the consumers maintain their own bookmark or offset into each partition, so they can choose when to checkpoint and/or replay events / records / messages if needed. Does Kafka have something similar? If I commit the offset to Kafka, but I want to replay events due to data loss, can I reset my offset?
thanks for amazing explanation & deep dive into kafka :)
Thank you for this detailed video. But a quick question here - I'm still confused when to use RabbitMQ and when to use Kafka? Because both of them can be helpful for all the use cases
Hi Evan, great content! One question- how about using a time series DB like Influx or Prometheus for aggregation by time slices? Will that work?
Great video! One question: how does Kafka handle exactly-once delivery? Is it good enough to set idempotence on the producer to ensure that?
Good going, please keep continuing this series!
I had a question regarding consumer concurrency, which is not discussed in this video.
Let's say I have 1 consumer group with 2 consumers running and a topic with 8 partitions; each consumer will be assigned 4 partitions when concurrency = 1. How is the consumer affected if consumer concurrency is changed to 2?
Are you asking what happens if consumer threads are increased from 1 to 2 for a single consumer instance in a group? If so, the consumer is still a single client of the broker, like kafka-client-01 and kafka-client-02.
With more threads, the consumer can process messages from its assigned partitions concurrently, improving throughput. However, it still handles the same number of partitions overall.
Thanks for the assist!
Love these deep dives, thanks!
♥️
Really amazing, you explained it really well.
Thanks for the great effort :)
Cheers!
Thank you so much. Love your channel. Please provide a deep dive on Redis too. 🙏
Already got one! th-cam.com/video/fmT5nlEkl3U/w-d-xo.html
They did earlier
@@hello_interview Thanks a lot.
@@kamalsmusic Thanks a lot.
absolute BANGER
‼️
Thank you! Great explanation as always
I have used Kafka a lot, but this video just enforced the nitty gritty details. Great content!
High praise from a pro!
Amazing video, thanks so much for sharing! The person in the Redis video mentioned 5 key technologies that are either most common or one should know. Do you guys plan to cover the other 3 after Redis and Kafka? That would be AMAZING!! :) Which ones are those that you guys were referring to?
Planning content on ElasticSearch, Postgres, and Dynamo next. Some internal debate about #5 but you'll see those sometime in the coming weeks.
@@hello_interview amazing, thank you so much!!
For handling a Kafka consumer going down, we could turn manual offset commits on. There's also an auto-commit offset option with a timer for when to auto-commit. Great video for Kafka revision. Also, it would have been great if you had mentioned the limit on the number of Kafka consumers based on the number of partitions.
Great thing to mention!
Really nice video. I have one question: if we are pushing messages to a partition with a custom key, what if we are getting millions of messages in that one partition only? How can we address this issue? One option could be to increase the consumers in the consumer group, but what if this partition is still not getting cleared? Could you please throw some light on this?
Thanks for the great video. I am wondering: to handle a hot partition, can we also just use batching?
Yes, depending on your throughput requirements
Thanks for the deep dive! When would you use SQS with FIFO over kafka?
Needing built-in support for retries and visibility timeouts, or already being deep in the AWS ecosystem, are two cases
The DLQ in Kafka can be connected to an S3 connector, and we can consume and store it in an S3 bucket to be checked later.
What happens when a consumer has consumed the event but hasn’t committed the offset? Do other consumers read that msg again? Or is 1 consumer assigned a set of partitions and can read from those partitions only?
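On the offset question: within a consumer group each partition is read by exactly one consumer at a time, but if that consumer dies before committing, the partition's next owner re-reads from the last committed offset, which is why processing should be idempotent. A toy sketch (illustrative Python only, not client code):

```python
# Toy model of at-least-once delivery: consumers poll everything after
# the last committed offset, so an uncommitted batch is redelivered to
# whichever consumer owns the partition next.

class Partition:
    def __init__(self, records):
        self.records = records
        self.committed = 0   # last committed offset (exclusive)

    def poll(self):
        # Deliver everything after the committed offset.
        return self.records[self.committed:]

    def commit(self, upto):
        self.committed = upto

part = Partition(["m0", "m1", "m2"])
assert part.poll() == ["m0", "m1", "m2"]   # consumer 1 reads all three...
# ...but crashes before committing. After the partition is reassigned:
assert part.poll() == ["m0", "m1", "m2"]   # consumer 2 sees them again
part.commit(3)
assert part.poll() == []
```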
Great video Evan.......
🩵
Thank you very much!
Which tool are you using for whiteboard? Looks very clean!
Excalidraw
Control question: if there are two Kafka servers with partitions for topic A and a consumer subscribes to topic A, how does it get an ordered log out of the two partitions on those two separate servers?
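Worth noting here that Kafka only guarantees ordering within a single partition; across partitions, the consumer sees an arbitrary interleaving, which is why order-sensitive keys should go to the same partition. A toy illustration (plain Python, no Kafka):

```python
# Two partitions, each internally ordered by offset:
p0 = ["a1", "a2", "a3"]   # partition 0
p1 = ["b1", "b2"]         # partition 1

# One possible consume order (round-robin polling):
consumed = ["a1", "b1", "a2", "b2", "a3"]

def subsequence(sub, seq):
    """True if `sub` appears in `seq` in order (not necessarily adjacent)."""
    it = iter(seq)
    return all(x in it for x in sub)

# Per-partition order is always preserved in what the consumer sees...
assert subsequence(p0, consumed) and subsequence(p1, consumed)
# ...but there is no single global order across the two partitions.
```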
In the section about using Kafka for messenger, how would the topics and partitions for a messaging application like Messenger be structured to achieve low latency and high throughput? For example, if there are 1 billion users on the platform, would there be one billion topics, or a single topic with a billion partitions, one for each user (which I don't think is possible since the recommendation is 4k partitions per broker and max of 200K per cluster)? Is there a different approach that could be considered? What are the tradeoffs for each option?
And great video. Thank you for doing this.
Some alternatives discussed here: www.hellointerview.com/learn/system-design/answer-keys/whatsapp
Awesome content !🙌🏿🙌🏿
Listening via AUX while I’m driving. Love it. Curious to see it visually later.
Hello interview podcast lol
22:55 In the Whatsapp design by Stefan, using Kafka was marked as a bad solution since Kafka doesn't scale well to millions of topics and Redis pub/sub was recommended as a better alternative. Do you agree with this? It would be nice to have a section on when not to use Kafka :)
I have a question: If I want to consume a message and then perform a long-lasting task (like web crawling) before committing the offset, does it mean that I need to have a configuration where the number of consumers is strictly equal to the number of partitions to avoid duplicate readings of the same message?
Nope, just have them as part of the same consumer group.
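To make the reply above concrete, here is a toy version of what a consumer group coordinator does: each partition is assigned to exactly one consumer in the group, so two consumers in the same group never process the same message twice (absent rebalances or failures). The consumer names are made up for illustration.

```python
from itertools import cycle

# Toy consumer-group assignment: every partition goes to exactly one
# consumer in the group, so you don't need consumers == partitions to
# avoid duplicate reads; you just need them in the same group.
def assign_partitions(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for p, c in zip(partitions, cycle(consumers)):
        assignment[c].append(p)
    return assignment

assignment = assign_partitions(range(6), ["crawler-1", "crawler-2"])
print(assignment)  # each partition appears under exactly one consumer

all_assigned = [p for ps in assignment.values() for p in ps]
assert sorted(all_assigned) == list(range(6))       # every partition covered
assert len(all_assigned) == len(set(all_assigned))  # no partition shared
```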
Can you kindly recommend any research papers about Kafka that students can use academically, to learn about the history/development of Kafka, some live case studies, and further improvements in the field?
How do we monitor Kafka? Which metrics should we focus on for alerting?
Topic A: why can the partition 1 follower and the partition 1 leader be on the same broker/server? What's the point of that?
I feel like I am committing a crime to watch this for free. Keep it up, Evan!
Best comment!
If we use a compound key of adId:userId, it will result in one partition per ad/user pair. Is there any concern with having too many partitions, each holding a small number of messages?
It’s consistent hashing on the partition key. So it’s not a new partition per ad:user pair.
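The reply above is the key point: the number of partitions is fixed per topic, and keys are hashed onto that fixed set. A minimal sketch, using md5 as an illustrative stand-in for Kafka's default murmur2 partitioner:

```python
import hashlib

NUM_PARTITIONS = 8  # fixed for the topic; keys do not create partitions

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Illustrative stand-in for Kafka's default partitioner (which uses
    murmur2); md5 is used here only for a stable, dependency-free demo."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# 10,000 adId:userId compound keys still land in only 8 partitions.
keys = [f"ad-{ad}:user-{user}" for ad in range(100) for user in range(100)]
used = {partition_for(k) for k in keys}
assert used <= set(range(NUM_PARTITIONS))
```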
Very well structured!
I am working on a project where I want to process events asynchronously but in order. I am thinking of using Kafka/Kinesis. How do I ensure that two events are actually ingested into Kafka in order? What if event A's ingestion is delayed by some network issue and event B, which happened later, gets ingested before A?
In terms of Horizontal Scaling, from the accompanying article:
"Horizontal Scaling With More Brokers: The simplest way to scale Kafka is by adding more brokers to the cluster. This helps distribute the load and offers greater fault tolerance. Each broker can handle a portion of the traffic, increasing the overall capacity of the system. It's really important that when adding brokers you ensure that your topics have sufficient partitions to take advantage of the additional brokers. More partitions allow more parallelism and better load distribution. If you are under partitioned, you won't be able to take advantage of these newly added brokers."
My understanding is that Kafka can be scaled horizontally and dynamically, perhaps as a system sees an unanticipated increase in volume. If that's correct, does the above imply that partitions can be added dynamically too? In the example cited, the LeBron James campaign, I took that to mean you'd add extra partitions for that campaign in anticipation of the additional traffic. In the case of hot partitions, can one of the prescribed techniques (say random salting or compound keys) be added on the fly? If this is non-trivial, can you maybe link to how this is achieved?
Thanks so much!
In general, these are things handled by managed versions of Kafka, such as AWS MSK or Confluent Cloud. How they dynamically scale depends on each managed service. Typically, handling hot partitions is still not managed dynamically and requires conscious effort on the part of the developer.
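One reason adding partitions on the fly takes conscious effort: changing the partition count changes where hashed keys land, so ordering guarantees for existing keys break. A small demonstration (md5 as an illustrative hash; Kafka's default partitioner uses murmur2, but the effect is the same):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # Illustrative key hashing; the mod by num_partitions is the point here.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

keys = [f"user-{i}" for i in range(1000)]
moved = sum(
    1 for k in keys
    if partition_for(k, 4) != partition_for(k, 6)  # before vs. after scaling
)
# Most keys remap when the partition count changes, which is why new messages
# for an existing key can end up behind old ones in a different partition.
print(f"{moved}/1000 keys changed partitions")
assert moved > 0
```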
0:29 What are the other 4?
Redis, Elasticsearch, Postgres, Cassandra/DynamoDB. All but Postgres are on our website
Great resource!
What’s the name of the drawing/diagram app?
Excalidraw
In batch consuming, from a batch of 100, I successfully processed 65 and then my service crashed. How is the commit / retry handled in this case?
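A common answer to the question above, sketched as a toy simulation (not real consumer code): with at-least-once semantics, the offset is committed only after the whole batch succeeds, so a crash after message 65 means the entire batch is redelivered and the first 65 messages are processed twice. That's why handlers need to be idempotent, or you commit smaller batches.

```python
# Toy model of at-least-once batch consumption: commit the offset only after
# the full batch succeeds; a crash mid-batch causes redelivery of the batch.
batch = list(range(100))
committed_offset = 0
processed = []

def process_batch(crash_at=None):
    global committed_offset
    start = committed_offset
    for i, msg in enumerate(batch[start:], start=start):
        if crash_at is not None and i == crash_at:
            raise RuntimeError("consumer crashed mid-batch")
        processed.append(msg)
    committed_offset = len(batch)  # commit only after full success

try:
    process_batch(crash_at=65)     # crash after processing 65 messages
except RuntimeError:
    pass                           # offset was never committed

process_batch()                    # restart: re-reads from last commit (0)

assert committed_offset == 100
assert processed.count(10) == 2    # messages before the crash ran twice
assert processed.count(80) == 1    # messages after it ran once
```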
With SQS, you probably don't need a retry topic; receive attempts are tracked on the main queue, and you can configure it so that when the retry attempts exceed some threshold, the message is moved to the DLQ. Also, the consumer just tells SQS whether the message got processed; if it failed or timed out, SQS makes the message visible again and increments the attempt count, and SQS, not the consumer, moves the message into the DLQ if needed.
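The mechanism described above is SQS's redrive policy. Here is a sketch of what constructing one might look like; the queue ARN and threshold are assumptions for illustration.

```python
import json

# Sketch of an SQS redrive policy: SQS tracks receive counts itself and
# moves a message to the DLQ once maxReceiveCount is exceeded.
# The ARN and threshold below are assumed values for illustration.
dlq_arn = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"
redrive_policy = {
    "deadLetterTargetArn": dlq_arn,
    "maxReceiveCount": 5,  # after 5 failed receives, SQS moves it to the DLQ
}

# With boto3 this would be applied roughly like:
#   sqs.set_queue_attributes(
#       QueueUrl=queue_url,
#       Attributes={"RedrivePolicy": json.dumps(redrive_policy)},
#   )
attributes = {"RedrivePolicy": json.dumps(redrive_policy)}
print(attributes)
```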
@hello_interview
Can you do a video on designing a CI/CD system?
Awesome content!! Can you guys do a video on Zookeeper?
First off, thank you for these videos and resources, they are very valuable to anyone studying for interviews.
I'm curious though, how would you improve the interview process as someone who's been on both sides of it for a number of years?
I question the value of these interviews given that people are being asked to design massive systems, for billions of users, engineered by hundreds/thousands of people over a number of years, which were iteratively improved over time. They're expected to have a pretty ideal solution by having researched the problem or similar ones ahead of time, or much less often, having faced similar problems themselves. If someone was asked to actually design a system in a real production environment, they would spend ample time researching it ahead of time anyway, so I don't necessarily understand the value of them knowing it up front in an interview.
I'm also curious how you would react if you were still interviewing people, and a candidate proposed a solution that's an identical or near-identical copy of yours. Would you pass them as long as they understood why each component is needed, and why certain technologies should be used over others? Would you have time to properly gauge that in a 45 minute interview once they've finished their design?
That's a big topic! One that likely requires a full blog post.
I will say that, in general, we agree. The interview process within big tech is stuck in a local minimum and is in need of a facelift. But as long as the supply of engineers exceeds demand, there isn't much incentive for companies. Their hiring process may have poor recall, but if precision stays high, they don't really care.
@@hello_interview agreed about a needed facelift, until then, the grind continues :) thanks again for these
Please create a use case for a hybrid cloud architecture. For example: a mobile retail application (in the cloud) that connects to a branch system (where the branch can run in offline mode too) :D