If you want to learn more, check out my Apache Kafka Series - Learn Apache Kafka for Beginners v2 course: links.datacumulus.com/apache-kafka-coupon
Great idea to store the consumer offset 👍
Nice explanation by Stephane. Remember, visual memories are the strongest, so he made sure to explain with diagrams.
I needed this!
I really got that diagram into my head... Thanks
4:04 Who takes care of committing the consumer offset? It looks like point 3 says the consumer does the commit. But if there is a bug in the code that prevents the consumer from committing, would it keep reading the same message again and again, or would it just keep waiting for new messages past the current offset?
I think it would be more accurate to say that Kafka stores the offset up to which a partition has been read by any consumer in a consumer group. It does not really care about the identity of a particular consumer.
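A rough sketch of what that commit looks like with the plain Kafka Java client (the broker address, topic name and group id below are made-up placeholders, and auto commit is turned off so the commit is explicit):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "demo-group");              // offsets are stored per group + partition
        props.put("enable.auto.commit", "false");         // the consumer commits explicitly below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));    // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // If a bug prevents this commit, the running consumer still polls forward from
                // its in-memory position, but the committed offset stays put, so the messages
                // are re-read after a restart or rebalance.
                consumer.commitSync();
            }
        }
    }
}
```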
Great video. Thank you for sharing.
If I just write one consumer, can I then run multiple instances of that consumer attached to different partitions? Also, if I attach a consumer to just a topic, is it bound to a single partition by default, or does it read from all partitions (since one consumer can read from multiple partitions)?
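That depends on which API the consumer uses. A minimal sketch in Java contrasting the two (topic name, partition number and connection settings are assumptions): subscribe() lets the group coordinator hand the instance whatever subset of partitions it decides, while assign() pins the consumer to explicit partitions.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SubscribeVsAssign {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "demo-group");              // hypothetical group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // subscribe(): the group coordinator decides which of the topic's partitions
        // this instance gets; it may end up with one, several, or all of them.
        KafkaConsumer<String, String> groupManaged = new KafkaConsumer<>(props);
        groupManaged.subscribe(List.of("demo-topic"));    // hypothetical topic

        // assign(): the consumer is pinned to explicit partitions and no group
        // assignment or rebalancing is involved.
        KafkaConsumer<String, String> selfManaged = new KafkaConsumer<>(props);
        selfManaged.assign(List.of(new TopicPartition("demo-topic", 0)));

        groupManaged.close();
        selfManaged.close();
    }
}
```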
I have a doubt: can two consumer groups read data from the same partition?
How can I create multiple consumer instances within a consumer group?
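One way to do it, sketched below with the Kafka Java client: start several KafkaConsumer instances (separate processes or, as here, threads) that all use the same group.id, and Kafka splits the topic's partitions among them. The broker address, group id and topic name are placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerGroupInstances {
    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {                      // 3 instances; separate processes work the same way
            final int id = i;
            new Thread(() -> runConsumer(id)).start();
        }
    }

    static void runConsumer(int id) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed local broker
        props.put("group.id", "my-app");                   // same group.id => partitions are shared out
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));      // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("instance %d read partition %d offset %d%n",
                            id, record.partition(), record.offset());
                }
            }
        }
    }
}
```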
thank you, you are amazing
Hi, I tried to set up a consumer. It worked, but after some time I got an error like "consumer instance not found". How can I troubleshoot this type of scenario?
Really great video, I finally watched it. I have a question, if I may: if I have listener applications with a given group id and 4 brokers, will each instance read from a different broker (partition) in parallel after autoscaling to 4 instances of my application?
How can I find the group id for a Kafka topic if no separate consumer groups were created?
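Consumer groups belong to consumers rather than to topics, but you can list every group the cluster knows about, along with its committed offsets, using the Kafka AdminClient. A small sketch, assuming a local broker at localhost:9092:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ConsumerGroupListing;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ListGroups {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed local broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Console consumers started without an explicit group show up here with an
            // auto-generated id (e.g. console-consumer-<number>).
            for (ConsumerGroupListing group : admin.listConsumerGroups().all().get()) {
                System.out.println("group: " + group.groupId());
                Map<TopicPartition, OffsetAndMetadata> offsets =
                        admin.listConsumerGroupOffsets(group.groupId())
                             .partitionsToOffsetAndMetadata().get();
                offsets.forEach((tp, om) ->
                        System.out.println("  " + tp + " -> committed offset " + om.offset()));
            }
        }
    }
}
```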
What is the number in __consumer_offsets-(number)?
Hi Stephane, I need your expertise here. I'm stuck on the last step of my assignment.
1) Create a Kafka consumer to consume messages from the topic 'topic-1'.
2) Store them in '/tmp/kafka-messages'.
Can you please assist with step 2? I appreciate your help. Thanks!
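One possible way to do step 2 with the Kafka Java client is to append every consumed record to the file as it arrives. A sketch, assuming a local broker and string messages (the group id is made up; 'topic-1' and '/tmp/kafka-messages' come from the assignment):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TopicToFile {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed local broker
        props.put("group.id", "file-writer");                // hypothetical group id
        props.put("auto.offset.reset", "earliest");          // read from the start if no offset is committed
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             BufferedWriter out = Files.newBufferedWriter(Paths.get("/tmp/kafka-messages"),
                     StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            consumer.subscribe(List.of("topic-1"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    out.write(record.value());               // one message per line
                    out.newLine();
                }
                out.flush();                                  // make messages visible as they arrive
            }
        }
    }
}
```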
Thank you for this nice video, Stephane! I understand that each consumer is a separate, independent process that reads from one or more partitions, and no two consumers from the same group will read the same partition. So, to achieve a parallelism of 5, should I create and run 5 consumers, all grouped under the same consumer group?
I think a parallelism of 5 can be achieved with 3 consumers too.
The offsets are nice, but what if a consumer app has just read a bunch of data and fails while it's processing it? How can we guarantee no data loss? Is there a mechanism to control read acknowledgement similar to write acknowledgement?
Messages are kept in the partition until their retention period expires. You could keep a persistent record of each partition's offset that has been "committed" by your consumer app. If your processing crashes, you can reset to the older offset and read the messages again, as long as they haven't expired.
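A common pattern along those lines, sketched here with the Kafka Java client, is to treat the commit as the read acknowledgement: disable auto commit, commit only after the batch has been processed, and on a processing failure seek back to the last committed offset so the same records are delivered again (at-least-once). The broker address, group id and topic name are placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed local broker
        props.put("group.id", "processing-app");           // hypothetical group id
        props.put("enable.auto.commit", "false");          // the commit acts as the "read acknowledgement"
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));      // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                try {
                    for (ConsumerRecord<String, String> record : records) {
                        process(record);                    // placeholder for your business logic
                    }
                    consumer.commitSync();                  // acknowledge only after processing succeeded
                } catch (Exception processingFailure) {
                    // Nothing was committed, so rewind each assigned partition to its last
                    // committed offset; the unprocessed records will be delivered again.
                    consumer.committed(consumer.assignment()).forEach((tp, committed) -> {
                        if (committed != null) {
                            consumer.seek(tp, committed.offset());
                        } else {
                            consumer.seekToBeginning(List.of(tp)); // nothing committed yet for this partition
                        }
                    });
                }
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) { /* placeholder */ }
}
```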
How does Kafka identify a single consumer? For example, a consumer goes down and comes back up 5 minutes later, so it's supposed to continue where it finished last time. But how does Kafka know which consumer it is, and whether it was even connected before? By IP? What if there are two consumers on a single machine, would it be IP + port?
It identifies them based on the consumer group and how many consumers are in it. If one fails, processing continues on the other two consumers in the same group. Once consumer 1 comes back, it will start processing from the last committed message in the partition, not necessarily the message that consumer 1 itself last committed, I think.
Does Zookeeper never come into the picture for taking care of these consumer offsets?
"Messages are read in order within partition, but they read parallel across partition"
Bro, you have to speak more clearly... open your mouth.