Kafka Interview questions and answers for 2024 for Experienced | Code Decode [ MOST ASKED ] | Part-1
- Published on 10 Dec 2023
- Top Kafka interview questions for Java developers, both freshers and experienced. We have covered the most-asked Kafka interview questions.
Udemy Course of Code Decode on Microservice k8s AWS CICD link:
openinapp.co/udemycourse
Course Description Video :
yt.openinapp.co/dmjvd
Vamsi’s LinkedIn : / vamsi-karuturi
What is Apache Kafka?
It's a durable, publish-subscribe messaging system for exchanging data between processes, applications, and servers.
Apache Kafka was created at LinkedIn by engineers who later founded Confluent, is written in Scala and Java, and became an open-source project in 2011. It started as a highly scalable messaging system and is now the core of the Confluent Platform, managing trillions of events daily. Many well-known companies trust and use Apache Kafka.
Managing data and logs at scale is a big task. It involves processing, reprocessing, analyzing, and handling information, often in real time. This is where Apache Kafka shines: it handles messages efficiently in a streaming environment. The principles behind Kafka's design focus on meeting the increasing demand for systems that can handle a large volume of data.
Architecture of Kafka
The architecture of Apache Kafka is designed for distributed, fault-tolerant, and scalable handling of streaming data.
Let's explore the key components and their interactions in detail:
Producer:
Producers are applications that publish messages to Kafka topics. They create a ProducerRecord for each message; the record names the target topic and carries the value, plus an optional key and partition.
Topic:
Topics are logical channels or categories to which messages are published. Topics can be divided into partitions for scalability and parallelism.
Partition:
Each topic is divided into partitions, which are the basic units of parallelism. Partitions allow Kafka to distribute and parallelize the processing of messages.
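As a rough illustration of how keyed messages end up in partitions: Kafka's default partitioner hashes the key bytes (with murmur2) and takes the result modulo the partition count, so the same key always lands in the same partition. The sketch below is a toy model using only the JDK. The class PartitionSketch and the method partitionFor are hypothetical names, and Arrays.hashCode is an assumed stand-in for the real hash, so the partition numbers will not match a real cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model only: not part of the Kafka client API.
final class PartitionSketch {

    // Maps a record key to a partition index in [0, numPartitions).
    // Assumption: Arrays.hashCode stands in for Kafka's murmur2 hash.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    public static void main(String[] args) {
        String[] keys = {"order-1", "order-2", "order-1"};
        for (String key : keys) {
            // The two "order-1" records always print the same partition.
            System.out.println(key + " -> partition " + partitionFor(key, 3));
        }
    }
}
```

Because all records with one key share a partition, ordering is preserved per key, while different keys can be processed in parallel on different partitions.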
Broker:
Responsibility: Kafka brokers are individual servers within the Kafka cluster. They store and manage data, handle producer and consumer requests, and participate in the replication and distribution of data.
ZooKeeper:
Responsibility: ZooKeeper is used for distributed coordination and management of the Kafka cluster. It maintains cluster metadata, supports leader election, and helps manage the overall state of the cluster.
Topic Partitions Replication:
Responsibility: Each partition has multiple replicas for fault tolerance. Replicas are distributed across different brokers. One replica is designated as the leader, and the others are followers.
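The leader/follower idea can be sketched with a small JDK-only model. It assumes a partition is described by an ordered list of in-sync replica broker ids, and that when the leader's broker fails, the next in-sync replica takes over. ReplicaSketch and its methods are hypothetical names for illustration; the real election is carried out by the Kafka controller and involves more state than shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: not part of the Kafka API.
final class ReplicaSketch {
    private final List<Integer> inSyncReplicas; // broker ids, in preference order
    private int leader;

    ReplicaSketch(List<Integer> replicas) {
        this.inSyncReplicas = new ArrayList<>(replicas);
        this.leader = replicas.get(0); // assumption: first replica starts as leader
    }

    int leader() {
        return leader;
    }

    // Every in-sync replica except the current leader.
    List<Integer> followers() {
        List<Integer> result = new ArrayList<>(inSyncReplicas);
        result.remove(Integer.valueOf(leader));
        return result;
    }

    // Simulates a broker crash; promotes a follower if the leader was lost.
    void brokerFailed(int brokerId) {
        inSyncReplicas.remove(Integer.valueOf(brokerId));
        if (brokerId == leader) {
            if (inSyncReplicas.isEmpty()) {
                throw new IllegalStateException("no in-sync replica left");
            }
            leader = inSyncReplicas.get(0);
        }
    }

    public static void main(String[] args) {
        ReplicaSketch partition = new ReplicaSketch(Arrays.asList(1, 2, 3));
        System.out.println("leader=" + partition.leader() + " followers=" + partition.followers());
        partition.brokerFailed(1);
        System.out.println("leader=" + partition.leader() + " followers=" + partition.followers());
    }
}
```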
Consumer:
Responsibility: Consumers subscribe to one or more topics and consume messages from partitions. Each consumer group can have multiple consumers, and each consumer within a group processes a specific subset of partitions.
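To make "each consumer within a group processes a specific subset of partitions" concrete, here is a simplified round-robin assignment written with the JDK only. GroupAssignmentSketch and assign are hypothetical names, and Kafka's built-in assignors handle more cases (multiple topics, rebalances), so treat this as a sketch of the idea rather than the actual algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not part of the Kafka API.
final class GroupAssignmentSketch {

    // Gives every partition to exactly one consumer of the group, round-robin.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two consumers share three partitions.
        System.out.println(assign(Arrays.asList("c1", "c2"), 3));
        // Four consumers, three partitions: the fourth consumer stays idle.
        System.out.println(assign(Arrays.asList("c1", "c2", "c3", "c4"), 3));
    }
}
```

The second call shows why adding more consumers than partitions to one group does not add throughput: the extra consumer gets nothing to read.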
Why did you use Kafka?
Scalability
Fault Tolerance
Durability
Real-time Stream Processing
Partitioning
High Throughput
Data Retention Policies
Dynamic Reconfiguration
Community and Support
What is the role of ZooKeeper in Kafka?
ZooKeeper acts as a centralized and reliable coordination service for Kafka, ensuring that the distributed components of a Kafka cluster can work together seamlessly.
It helps in managing the dynamic nature of Kafka clusters, providing fault tolerance, and enabling coordination among different components.
It's worth noting that Kafka has been reducing its dependency on ZooKeeper by replacing it with a self-managed metadata quorum (KRaft). KRaft arrived as early access in Kafka 2.8.0 and was declared production-ready for new clusters in 3.3. ZooKeeper mode is still supported in 3.x and is removed completely in 4.0.
Very important video. Kafka, Asynchronous messaging is forming the main part of backend Java developer interviews these days. Thanks team. Keep such content coming !
Thanks Sayan 😊
Working in Kafka for the past 2 years without knowing much but just configuring topics. Now u made me understand the whole. Thanks!
😊😊👍
Very useful. Please upload this video continuation too. Thanks a lot!
Very informative video on kafka ..always referring these videos with brilliant teacher to have us..thanks for come up with such videos.
Thank you for this video. Eagerly awaiting for the Part-2
I'm sure this channel will grow rapidly and extensively. Content quality is too good.
Wonderful video. Learn a lot in 30mins about kafka.
😊😊
I was asked recently in morgan stanley interview how consumer handles messages coming in sequence if the consumer fails. Wish i saw this video before. Useful content ❤
Thanks 😊👍
Awesome video. Saved my 2-3 hours of reading. Waiting for part-2
😊😊
I was waiting for this 😮 😊 please continue thanks for the knowledge 🙏
Sure thanks 😊😊
thank u so much for making this kind of video please make second part also
Great informative video, Do share the Part - 2
next level of explanation . want more videos on this👌👌
Sure we will create more on this
Hi, thanks to the Code Decode team. Can you please clarify the points below?
1. Can a producer publish data to followers as well, or only to the leader?
2. Is a message removed once a consumer has consumed its offset? If it is kept for some time, the partition will hold a huge amount of data, so what is the limit of a partition?
3. Can multiple consumers consume the same offset?
4. How do different consumers know from which offset they need to start consuming?
5. Can multiple consumers from the same group or from different groups consume the same offset?
6. How do we handle consumer lag in case a consumer fails to consume the data?
1. Only to the leader. Followers just replicate the data from the leader.
2. Consuming a message does not remove it. An offset is consumed only for a particular consumer group, so the message can still be consumed again. Retention policies decide when the data is deleted permanently.
3. Yes, as long as the consumers belong to different groups.
4. Within the same group, no: a partition's data is read by only one consumer of that group.
5. Fine-tune buffer sizes, rebalance consumers, add topic partitions, etc.
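A small JDK-only sketch may help with the points about offsets. It assumes a partition is an append-only list and that each consumer group keeps its own "next offset to read", which is roughly what Kafka stores as committed offsets per group. OffsetSketch, poll and lag are hypothetical names here and are not the Kafka consumer API; retention-based deletion is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not part of the Kafka API.
final class OffsetSketch {
    private final List<String> log = new ArrayList<>();              // the partition's append-only log
    private final Map<String, Integer> committed = new HashMap<>();  // groupId -> next offset to read

    void append(String message) {
        log.add(message);
    }

    // Returns the next unread message for this group, or null when caught up.
    // Reading advances only this group's offset; the message stays in the log.
    String poll(String groupId) {
        int offset = committed.getOrDefault(groupId, 0);
        if (offset >= log.size()) {
            return null;
        }
        committed.put(groupId, offset + 1);
        return log.get(offset);
    }

    // Consumer lag: how many messages this group has not read yet.
    int lag(String groupId) {
        return log.size() - committed.getOrDefault(groupId, 0);
    }

    public static void main(String[] args) {
        OffsetSketch partition = new OffsetSketch();
        partition.append("booking created");
        partition.append("payment received");
        System.out.println("g1 reads: " + partition.poll("g1"));
        System.out.println("g2 reads: " + partition.poll("g2")); // same message, different group
        System.out.println("g1 lag: " + partition.lag("g1"));
    }
}
```

Two groups read the same offset independently, and a group that falls behind simply shows a larger lag until it catches up.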
Hi Mam, Thanks for an amazing content. Requesting you to create more interview videos on Kafka.
Thanks a lot for helping us to prepare for the interview..
Also the baby from the background is also trying to teach us something..it is sad we can't understand 😅
🙊🙊🙊🙊 sorry for the inconvenience❤️ And thanks for understanding ,😘
Thanks for the video. Very informative.
You're welcome!
Thanks for great videos. Please create a video on kafka rebalancing, how to clear lag in kafka, how to decide how many partitions to chose for a kafka topic.
Noted👍
Awesome content. Thanks for sharing knowledge.
You are welcome
Thanks to the Code Decode team. I have been following your classes for quite some time and really appreciate them. About this particular video:
(1) What are the reasons a leader broker dies?
(2) How does leader election work? What is the algorithm behind it, and how does the ISR work when data is produced to a broker/partition?
(3) Why exactly 3 follower copies with the same set of data in partitions?
(4) Do we now have a working Kafka version without the ZooKeeper dependency? Which version is it, and how successful is it in the market?
Please reply with your insights and answers.
Thanks, keep doing this nice work and sharing knowledge so that aspirants like us can build successful IT careers. 👍🙏😊
1) A broker is nothing but a server that we start with a command. It can stop or crash at any time, just like any other server.
2) Using the epoch number. The most recent, most up-to-date ISR broker is chosen as the new leader.
3) It's a best practice to use at least 2 replicas for fault tolerance. You can have 0 or you can have 10.
4) We are on version 3.x as of now. 4.x will have ZooKeeper completely removed.
Thanks very useful. We need more interview questions on kafka.
Sure 👍👍
hi, it was a very helpful video. If you could please make more videos to make more questions on kafka, it will be extremely helpful.
You are the Best 🙏🙏
😊😊 Thanks 👍😊
How do we handle the scenario where a consumer is delayed in reading a message? For example, Consumer B reads the "booking successful" message before Consumer A reads the "payment failure" message, when the two messages are from different topics.
Great Video
Thanks 👍
Very beautiful explanation.
Thanks
Waiting for this long time, Thanks. First comment
Thanks for your comment Akash 😊👍
Awesome
Thanks
Awesome ❤, need other topics too
👍👍😊sure
Many thanks❤
😊😊👍👍
Your presentation is very good. 👍 There are so many videos on Udemy which don’t follow minimum standards. I really liked the way you explained the concept. I am not sure if you have your training courses on udemy as well. If not you can try that and you will be a star teacher over there. Just one suggestion though, don’t repeat a point of explanation multiple times, to make your video crisp and clear.
Thanks man. Means a lot 😊😊
Thanks for this good video.
☺️☺️
please make more practical videos on kafka, Thank you so much
Sure we will create it soon
Pls make part 2 and also pls explain clearly about the consumer groups in nxt video.
Sure we will create it soon
Hi, please clarify: 1. How can you ensure that Kafka consumes the messages sequentially? 2. Will the publisher publish duplicate messages? 3. How do we retain failed messages from the DLQ within 7 days?
how does zookeeper track brokers? and how it handles leader and follower? did you miss kafka controller?
Please make video on grafana and kibana and Zipkin as well
Very nice explanation can you please share the new coupon code for this course Master Spring Boot Microservice & Angular K8s CICD AWS??
👍
Please continue interview series.
Sure 😊👍👍
Can you please cover the remaining topics of Kafka..
🙏👍
😊😊
Pls make video on basis of cloud like clusters, pods, etc
Sure ekta we will create videos soon on kubernetes concept
Please upload more and more question
sure we will upload it soon
Nice video on Kafka, but there is a small mistake. You said that the producer writes to and the consumer reads from the leader partition, which is incorrect, as this would defeat parallelism when more than one consumer tries to read from the leader.
Generally, the leader status of the partition does not have any bearing on the consumer.
The leader status only applies to the producers and the broker, so the consumer is fine to read all its data from both the follower Partition and the leader Partition.
"The leader handles all read and write requests for the partition while the followers passively replicate the leader"
As per the documentation of Kafka itself.
kafka.apache.org/081/documentation.html
To answer your question on parallelism,
"Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster."
Again as per the documentation 😊😊
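As a rough illustration of the quoted sentence: leadership is per partition, not per topic, so a topic with several partitions can have its leaders on different brokers, and consumers of different partitions then talk to different brokers. By default consumers fetch from the partition leader; fetching from a follower exists only as an opt-in feature since Kafka 2.4 (KIP-392). The JDK-only sketch below assumes a simple round-robin placement, and LeaderSpreadSketch and assignLeaders are hypothetical names, not Kafka's actual placement logic.

```java
import java.util.Arrays;

// Toy model only: not part of the Kafka API.
final class LeaderSpreadSketch {

    // Returns leaders[p] = id of the broker leading partition p.
    // Assumption: leaders are spread round-robin across brokers 0..numBrokers-1.
    static int[] assignLeaders(int numPartitions, int numBrokers) {
        int[] leaders = new int[numPartitions];
        for (int partition = 0; partition < numPartitions; partition++) {
            leaders[partition] = partition % numBrokers;
        }
        return leaders;
    }

    public static void main(String[] args) {
        // Six partitions over three brokers: each broker leads two partitions
        // and would act as a follower for the others.
        System.out.println(Arrays.toString(assignLeaders(6, 3)));
    }
}
```

With a single partition there is only one leader, so every consumer group reads that partition from the same broker; adding partitions is what spreads the read load.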
@CodeDecode Thanks. I couldn't locate the info you quoted in the documentation.
But for simplicity, let's say we only have one topic, three brokers, and 3 consumers that belong to different consumer groups.
Now, per the documentation, only one broker can have the leader for this one topic and the other 2 are followers. (I assume we both agree on this.)
In this case, let's say reads happen only from the leader. Then all three consumers would read only from the leader for this topic, which stops parallelism, right?
So ideally, for Kafka to support parallelism here, one broker is assigned per consumer, so one of the consumers may read from the leader but the other 2 have to read from the followers. Otherwise, in your case, all consumers will read from the leader, which is on the same broker, and that disrupts parallelism.
so here's a more accurate representation:
"The leader handles all write requests for the partition, actively replicating the data to its followers. While the leader is responsible for coordinating writes, both leaders and followers can handle read requests. Consumers can read data from both the leader and followers, providing fault tolerance and parallelism."
please correct me if my understanding is not correct .. still learning, and thanks for the reply..
@MohaideenA you are correct, @CodeDecode got this one incorrect.
Your explanation seems reasonable to me too but am still confused as documentation is contradicting it. Give me some time. Will try to connect to open source team. Either they will revise the documentation or else our doubt will be cleared
@CodeDecode Thanks for getting back to me.
How much charge and duration ?
Sorry didn't get you question
Your pronounciation of "T" alphabet is so disturbing
very helpful mam. pls share contact
Thanks. Ucan connect with us on codedecodebusiness@gmail.com
Thank you for this video. Eagerly awaiting for the Part-2
Sure we will upload it soon