"If a consumer disconnects before committing the offset, Kafka will automatically send the data to another consumer." Actually, this part is a bit misleading, as Apache Kafka doesn't 'send' anything because it's a pull-based model which means when a new consumer is assigned to a partition, it will start reading data from the last saved offset for that partition. If the previous consumer didn't save its progress, the new consumer will start from the last saved point and might process some messages again.
I can see how this was misleading. In terms of data flow, the message moves to an available consumer, but yes the consumer first must poll for new records. Traditional queues work this way as well. Thanks for clarifying :)
@@interviewpen RMQ is push-based where the exchange assigns a consumer for processing a message. It is different from how kafka works.
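To make the offset discussion above concrete, here is a minimal sketch in plain Python (no Kafka client involved). The `Partition` class, the `"logging"` group name and the batch size are all hypothetical; the only thing it is meant to show is that a consumer which takes over a partition pulls from the last committed offset, so anything processed but not committed gets processed again.

```python
class Partition:
    """A toy append-only log with one committed offset per consumer group."""

    def __init__(self, records):
        self.records = list(records)
        self.committed = {}  # group id -> next offset to read

    def poll(self, group, max_records=2):
        # A consumer always resumes from the group's last committed offset.
        start = self.committed.get(group, 0)
        return start, self.records[start:start + max_records]

    def commit(self, group, offset):
        self.committed[group] = offset


def run_demo():
    partition = Partition(["m0", "m1", "m2", "m3"])
    processed = []

    # Consumer A polls, processes and commits the first batch.
    start, batch = partition.poll("logging")
    processed.extend(batch)
    partition.commit("logging", start + len(batch))

    # Consumer A polls and processes a second batch,
    # then disconnects before committing.
    start, batch = partition.poll("logging")
    processed.extend(batch)

    # Consumer B takes over and pulls from the last committed offset,
    # so it sees the uncommitted batch again.
    start, batch = partition.poll("logging")
    processed.extend(batch)
    partition.commit("logging", start + len(batch))
    return processed
```

In this sketch nothing is ever "sent" to consumer B; it simply asks for records starting at the committed position, which is why `m2` and `m3` show up twice.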
I feel like what's not talked about enough in this kind of comparison is the tradeoffs
1. kafka is much more expensive to run, especially because of the persistence
2. rabbitmq is generally slower at processing messages (hence unsuitable for streaming)
3. kafka has a lot more plugins for different backend services
4. rabbitmq is relatively simpler conceptually hence faster to integrate
Thanks for sharing :)
Not true. Rabbit is much faster until it hits 100%. It outruns Kafka very significantly, and it consumes fewer resources. So it's not true that Rabbit is slower.
Okay i think a few people have pointed it out already, but each consumer of a Kafka topic does _not_ get a copy of the same message. It is one consumer in each consumer group that gets a copy of the message. A consumer group is usually assigned to an application, and each instance of that application is a consumer within that group. If each consumer ended up receiving a copy of the message, then it would lead to a lot of duplicate processing.
The example the video used of Logs, Data Analysis etc. is sound. Each of those would be a consumer group though, and e.g. the logs service would have multiple instances, where each instance would be a consumer.
This is a common misunderstanding :) Every consumer assigned to a partition in Kafka gets a copy of every message sent to that partition. All consumer groups do is distribute consumers to different partitions so that each partition only has a single consumer. This causes difficulties for traditional queueing use cases where we want to load balance on the consumer side and/or need to handle consumer disconnects without duplicating messages.
@interviewpen Yes the job of the consumer group is to distribute the consumers under it so that each partition gets no more than one consumer attached to it. So in your example, "logging" would be a consumer group. E.g. if the topic has 4 partitions, and you have two instances of the logging application, then the consumer group would assign each instance (i.e each consumer) to two different partitions.
And a new application for, say, "analytics" that wants to listen to the topic would need a different consumer group to assign its instances (consumers) to partitions. And each message that lands in a topic goes to a single partition, and is then read by a single consumer from each consumer group.
So the fan out is happening across consumer groups, not individual consumers. Among the two instances of my logging application, only one instance receives the message. And since each instance is reading from two different partitions, we also get good load balancing. The caveat is to not have more instances than partitions.
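The 4-partition example above can be sketched in a few lines of plain Python. This is an illustration only: the instance names are made up, `zlib.crc32` is a stand-in for Kafka's real key hashing, and the modulo-based `assign` helper is a simplification of what a group coordinator does.

```python
import zlib

NUM_PARTITIONS = 4


def partition_for(key: str) -> int:
    # Stand-in for the producer's partitioner (an assumption, not Kafka's hash).
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def assign(consumers, num_partitions=NUM_PARTITIONS):
    """Map each partition to exactly one consumer of a group."""
    return {p: consumers[p % len(consumers)] for p in range(num_partitions)}


# Two independent consumer groups reading the same topic.
GROUPS = {
    "logging": assign(["logging-1", "logging-2"]),
    "analytics": assign(["analytics-1"]),
}


def receivers(key: str):
    """Return which consumer in each group ends up reading a message."""
    partition = partition_for(key)
    return {group: owners[partition] for group, owners in GROUPS.items()}
```

Calling `receivers("order-42")` returns one consumer per group: the message fans out across `logging` and `analytics`, but only one of the two logging instances reads it.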
Awesome explanation of the difference in use cases. I've worked with Kafka but am kind of new to RabbitMQ, and this video helped me get some brief insights into both. Thanks
Glad it helped. Thanks for watching!
You misunderstood the difference between a consumer group and a consumer in Kafka. In Kafka, a particular message can ONLY be delivered to one consumer within a group, the same as RabbitMQ. But consumers in different consumer groups can each consume the same message independently.
Hi! Consumer groups work around this issue, but at its core, Kafka is in fact fan-out. Two consumers assigned to the same partition will both receive the same messages. All consumer groups do is assign consumers to different partitions within a topic. This is a commonly misunderstood nuance :)
@@interviewpen Agree with @kaixu1026. Within a consumer group, you get a similar abstraction to RabbitMQ without the scalability ceiling. Audience will have a difficult time discerning that, however.
@@interviewpen you cannot assign two consumers in the same consumer group to the same partition for runtime consuming in Kafka. Right? Then, technically, it is impossible to consume the same message from 2 consumers if they belong to the same consumer group. There is no doubt Kafka is in fact fan-out. But your audience may be confused if they believe Kafka can assign 2 consumers to consume the same message without any restriction. The consumer group is a key core concept of Kafka in my mind. :)
There are so many things wrong with this video. Kafka doesn't directly fan out. Each consumer is assigned to one or more partitions based on the configuration of Kafka. Once a message is read, it isn't sent to all the consumers, since the consumers are all assigned to different partitions. This is just misleading. I don't know why people aren't calling it out. There is also another comment pointing out that "Kafka doesn't send anything, it's pull-based", which is wrongly stated in the video.
That’s not true. Each message delivered to a partition is fanned out to every consumer assigned to that partition. And yes, consumers poll for new messages, as with traditional queues. This might help: docs.confluent.io/platform/current/clients/consumer.html
I think there is some confusion here... There seems to be confusion around consumers and consumer groups..
As @ashutosh mentioned, consumers of the same consumer group are assigned one or more partitions of a topic.
Multiple consumer groups can read from the same topic. Ex: logging, notification, alerting etc can all be three different consumer groups reading from same topic and doing their own independent things.
@@adarsh.hatwar yes! This is not well explained. The video needed to introduce consumer groups and consumers. For those of us who know Kafka it's clear, but for a newcomer it's confusing.
Both systems are designed to decouple.
Both systems have routing/filtering/processing capabilities that are either built in or part of their ecosystem.
Main difference is that kafka is designed for events, rabbit is for commands.
And ordering/persistence associated with those.
what are events and commands?
@@asian1599
Both are messages.
Events are facts about something that has already happened, e.g. OrderPlacedEvent. Events can be internal/external, e.g. domain/integration; it depends on the scope, and they are usually part of systems that implement EDA (event-driven architecture).
Commands on the other hand, are messages that tell some service to do something, e.g ValidateCustomerCommand or ShipOrderCommand. Those can or can't be part of EDA systems, but certainly a part of Orchestration (saga).
Events are meant to be 1-many (1 event - many or none listeners), thus kafka streams. Commands are meant to be 1-1 (point to point), thus rabbit queues.
@@asian1599 Event: OrderPlaced, and you can push all the relevant details associated with it to any consumer that may be interested, say, to update total sales stats, inventories and whatnot. A command would be PlaceOrder: you are basically requesting your system to start taking the steps required to handle the order. Once every required step is done, you can push the aforementioned OrderPlaced event as a notification of something that has already taken place.
@@asian1599 I'd say that events are things that happened, and you want to notify "to whom it may concern". The receiver of the event decides if it has to do something or not based on the contents of the event.
Commands are things that you want to happen. So you put it in a backlog, and hopefully some other system picks it up and does the thing.
2:40 is kind of incorrect. RabbitMQ uses at-least-once delivery, as it requires messages to be acknowledged by the application (either automatically or manually after doing some work). This way a failure of an app instance which processes a message but doesn't ack it, or just acks it too late, may result in double processing. The consumer still has to have some level of idempotency to keep the resulting data in a consistent state.
Yes! That is true of both types of systems. Thanks for the insight :)
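As a rough illustration of the idempotency point above, here is a minimal sketch of a handler that tolerates redelivery. It assumes every message carries a unique id, which neither broker guarantees on its own; `IdempotentHandler` and the in-memory `seen_ids` set are hypothetical, and a real service would keep that set somewhere durable.

```python
class IdempotentHandler:
    """Applies each message at most once, even if it is delivered twice."""

    def __init__(self):
        self.seen_ids = set()  # ids of messages already applied
        self.total = 0         # the state we must not double-count

    def handle(self, message_id, amount):
        # A redelivered message (late or missing ack) is detected and skipped.
        if message_id in self.seen_ids:
            return False
        self.total += amount
        self.seen_ids.add(message_id)
        return True
```

With deliveries `("a", 10)`, `("b", 5)` and then `("a", 10)` again, the third call is skipped and the total stays at 15 instead of 25.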
Great video! Would like to have seen NATS included in the comparison.
Nice content. I was doing a microservice project where I had chosen Kafka over RabbitMQ without actual knowledge of the use of kafka. I needed the traditional RabbitMQ behaviour for my current microservice project, and I thought Kafka would be better for that as I heard it had high throughput. now I know the why's. gonna go with rabbitMQ.
Nice, glad we could help :)
Some parts were misleading and confusing unlike the other videos. I'd suggest to redo this video to match the quality with the other videos :) , thanks for all the great contents
Could you elaborate on what you thought was confusing?
Amazing video, thanks a lot ❤
Thanks for watching!
really nice :), thanks
could you explain please how different Kafka consumers are assigned to this or that consumer group? I guess this is the key moment in creating the balanced model with multiple consumers in Kafka
Sure, Kafka consumer groups just make sure that each partition has exactly one consumer (but a consumer can be connected to multiple partitions). There are various algorithms (such as round-robin) that can be used internally. Thanks!
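For anyone curious what a round-robin assignment looks like, here is a simplified sketch. It is not Kafka's actual assignor code; `round_robin_assign` is a hypothetical helper that just deals sorted partitions out to sorted consumers in turn.

```python
def round_robin_assign(partitions, consumers):
    """Deal partitions out to consumers one at a time, in a fixed order."""
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment
```

With 4 partitions and 2 consumers each consumer gets two partitions. With 2 partitions and 3 consumers the third consumer gets an empty list, which is the "don't run more instances than partitions" caveat mentioned elsewhere in the thread.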
Have you ever heard about Hermes, which is used by Allegro, one of the biggest e-commerce companies in Poland?
Kafka can be used as a queue though, by sharing the same consumer group and increasing the partitions to as many as the consumers needed to balance messages
It certainly can, but that doesn't mean you should...it's really not designed for that. There's a lot of strange issues that arise from doing this--you need lots and lots of partitions, group rebalances can cause message duplication, brokers will time out clients whose jobs run long, etc.
@@interviewpen What you are saying is misleading. Kafka is indeed built for both pub-sub and queue systems. Kafka is meant to handle loads of partitions, and in fact that's the whole purpose of Kafka. It depends on your configuration to increase group rebalancing and broker timeouts. There isn't any harm in doing these. Kafka configurations are very flexible to adapt to these models and they can cover the majority of cases; it's mostly up to how you configure it for your use cases.
Not sure what you mean by "group rebalancing can cause message duplication", if a message hasn't been committed, it can be picked up by another consumer after the group rebalancing, and if your implementation does not handle idempotency when processing messages, the issue here is the implementation not kafka's fault. Even if you have database records, you can still handle these cases with transactions. And the whole idea behind the consumer group is to act as a pub-sub and as a queue depending on how you want to use it.
Hope this helps!
Great breakdown of the queues and side-by-side comparison. And I can’t believe I’m the first one to comment here!
Thanks for watching!
RabbitMQ is push based model , consumer does not poll , rabbitMq tries to push as soon as it gets the message
Do consumers in Kafka poll to get the events, or is it a push-based model, where Kafka pushes the events to consumers?
For traditional queues, I know that it is a pull based model, where identical consumers (i.e replicas) do polling.
Kafka consumers poll for new records (same for traditional queues).
This is well articulated, 🙏
Thanks!
Kafka is like a Lake and Rabbit MQ is like a River
So for handling the stripe webhooks, i should use RabbitMQ but many times i have seen people using kafka in their system design
Yep, common misconception about the use cases of the two systems!
this is a great summary of them!
Thank you!
ooh damn, it was interesting
Thanks for watching!
strange... AMQP (and RabbitMQ) has topics that let you distribute messages to multiple listeners as well. It was never mentioned
Yes, this is done using exchanges--accomplishing this in RabbitMQ requires fanning out the message to multiple queues.
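Here is a toy model of that idea in plain Python, not the RabbitMQ client API. `FanoutExchange`, `bind`, `publish` and `get` are hypothetical names for this sketch; it only shows that the exchange copies each message into every bound queue, while consumers sharing one queue compete for its messages.

```python
from collections import deque


class FanoutExchange:
    """Copies every published message into each bound queue."""

    def __init__(self):
        self.queues = {}  # queue name -> pending messages

    def bind(self, queue_name):
        self.queues.setdefault(queue_name, deque())

    def publish(self, message):
        for pending in self.queues.values():
            pending.append(message)

    def get(self, queue_name):
        # Whichever consumer of this queue asks first takes the message.
        pending = self.queues[queue_name]
        return pending.popleft() if pending else None
```

If `logging` and `analytics` are both bound, one published message can be taken once from each queue. A second worker reading the `logging` queue gets nothing for that message.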
Give me a video on kafka and nats
Nice video
Thanks!
What exactly do you mean by "consumers in RabbitMQ have control over what messages they are consuming"? Whether they are bound to a direct queue or a topic, they always consume messages, don't they?
The difference we're pointing out is that RabbitMQ consumers are responsible for polling the queue and getting the next record, while in Kafka, the messages are already destined for a specific partition when they're produced.
Si think you use the "Topic" Terminology for Kafka
Amazing video 💯
Thanks!
5:01 Why is bursty data better handled by traditional queues instead of Kafka? Why do you say Kafka is better suited for "messages that take uniform(and short) time to process"?
Kafka tends to have a bit higher latency. The connection handling also works better for consistent throughput (Kafka will time out idle clients and whatnot)
@@interviewpen awesome, that makes sense. Thanks for answering!
I prefer..... NATS
🔥🔥🔥👌🏾
Thank you :)
I think it would be great to have a follow up video covering AWS IoT core
Ok, we’ll add it to the list.
Use NATS
thank you
Thanks for watching!
What software is he using to draw?
We're using GoodNotes on an iPad
What about emqtt
Excellent video. One request: can you also compare Solace Queue with RabbitMQ and Kafka?
Are we basically talking about queue vs topic?... Those are just implementation of each type no?
A Kafka “topic” is essentially an abstraction around a group of individual queues (called partitions) that allows producers to distribute messages easily. Hope that helps :)
@@interviewpen yeah, in the concept of Kafka, but in architecture, a “topic” is the role that Kafka serves as a whole here.
While a “queue” is what RabbitMQ is here.
So what is described here, is the opposition of a “topic” (of which kafka is an implementation, like SNS is too) and a “queue” (RabbitMQ, SQS…). No?
@@interviewpen but it allows producers to spread messages across multiple consumers, while making sure that each consumer can get each message at least once.
This kind of component, in software architecture, is called a "Topic".
RabbitMQ never had this ambition, because it is actually a "queue".
Haven't all moved to pulsar yet?
wait I can't imagine that's free content 😮
well explained 👏👏
Thanks!
Brilliant! Glad I chanced upon this gem. My current project has a requirement to process real-time location co-ords continuously and by different consumers. I now fully understand why the Architect chose Kafka over alternatives. 👏
Sweet. Thanks!
You say consumers when you should be saying consumer groups when talking about Kafka fanout.
Otherwise a nice explanation
Go with Kafka, Kafka is tastier! 😆
would you kindly do how to make a backend as a service system design
Yeah we might do that soon :)
why are you holding a pen
?
I really like your videos (and even bought your course because of that), but this one seems to be lower level than your usual standards.
You make it seem like the difference is mostly topological, or maybe API-wise.
But in reality the systems are very different in purpose and use cases.
RabbitMQ is an implementation of AMQP, full stop. Just go read John O'Hara's article "Toward a commodity enterprise middleware". Being an AMQP implementation, all it does, and nothing else, is deliver messages to the specified recipient. Again, full stop. As soon as the message is in the memory buffer, it's done its job.
Kafka, on the other hand, is a CDC on steroids. A distributed WAL. There are no messages, therefore, no addressee, there are only "data changes", which we usually call "events".
I could wax poetic for much longer, and do comparisons with this and that, but honestly, I don't see the point, as we live in the time when you can literally ask the person who created the system what they wanted to do ;) .
Thanks for the clarification
Yeah this is a good explanation of the implementation differences between the two systems. This video just covered the practical use cases and functionality, so thanks a lot for sharing this perspective!