Sometimes, using the same tool for different things simplifies development: you don't have to learn yet another tool, yet another configuration, yet another release cycle and patches and licensing model and yadda yadda. This outweighs the fact that we are distorting its intended usage, if, after all, it is flexible enough to allow it.
I always appreciate your point of view; you always provide appropriate context to topics marketed as solutions to problems they don't actually solve.
There's too much "what" and not enough "why".
Actually, Kafka's partitioning abilities (and consumer groups) are perfect for command consumption with parallel command handlers (which can even enforce sequence guarantees). This works, for example, after an event handler that translates events from other bounded contexts into commands on the current BC.
Events, commands and state events are all messages that can go to separate topics with separate permissions. Event sourcing with Kafka as the event store, with separate snapshot messages, is pretty efficient.
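A minimal sketch of the key-to-partition idea described above, assuming a hypothetical commands topic with four partitions and the customer ID as the message key. Kafka's default partitioner hashes the key with murmur2; `zlib.crc32` stands in here only because it ships with Python. The names `partition_for` and `route` are made up for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count for a hypothetical "commands" topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash.

    crc32 is a stand-in for Kafka's murmur2-based default partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(commands):
    """Append each (key, command) pair to the log of its partition."""
    partitions = defaultdict(list)
    for key, command in commands:
        partitions[partition_for(key)].append((key, command))
    return partitions


commands = [
    ("customer-1", "add-customer"),
    ("customer-2", "add-customer"),
    ("customer-1", "set-address"),
    ("customer-2", "set-address"),
    ("customer-1", "close-account"),
]

partitions = route(commands)

# One handler per partition can run in parallel. Within a partition, the
# commands for a given key keep the order in which they were produced.
for partition, log in sorted(partitions.items()):
    print(partition, log)
```

Because every command for the same key lands on the same partition, the sequence guarantee is per key, not across the whole topic.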
I'm running Kafka usually with between 300,000 and 400,000 topics like you say
The "Kafka doesn’t do CDC" point is technically correct, but if you look at commercial Kafka, like Confluent, that does do CDC, and it uses a tool like Debezium to perform the CDC log mining.
So I understand the pushback on using Kafka as a message queue and an event store, and I would agree. For CDC there is some grey area here.
True. I recorded the first part a couple of times and was terming it as "not entirely" or "not exactly". Some of that was based on my dislike for the many use cases of CDC. It has legit cases but I think it's entirely abused.
@@CodeOpinion it’s a means to an end and I have burned many an ear about planning for its obsolescence as you are putting it in.
Was just about to say this. Saying that, I appreciate this video a lot even if I think it came out of some initial hair splitting. I'm pretty sure what you're describing is what the infographic meant.
@@vulpixelful agree! Love @CodeOpinion videos. Direct many of my teams to them.
"Use Cases" does not mean Kafka is the only component in the solution. You kinda contradicted yourself on these by mentioning the use cases for Kafka in each scenario. There are certainly very high throughput systems out there using it in every one of those scenarios. Not my choice personally, but the graphic is not wrong. I think you just had a different interpretation of the term "Use Cases" than the author of the image.
I hear you. However, the issue with the advice from LinkedIn is that less experienced people look at such posts and take them as gospel truth. Rather than apply a tool according to the context, they’d use a tool for the sake of it. This video clarifies the fact that Kafka is a tool in the mix rather than the be-all and end-all. One mustn’t forget that Kafka is a complex tool that isn’t necessarily required everywhere.
I knew this was coming after the Linkedin comment 😂 Great content as always!
I had to, it struck a nerve. I often wonder if more time is spent on these infographics, making the colors and animations, than on actually understanding the content in them. My guess is someone searched for Kafka use cases, found a blog post that was equally incorrect, and used that as the basis to create it.
Lol when I saw the notification about the video I immediately thought about Derek's comment on LinkedIn 😂 He couldn't let it pass.
Jokes aside, thanks for clarifying this topic, Derek.
Since I need Kafka for distributed event streaming, I will also use it for messaging in passing, at basically no cost (in terms of infrastructure and mental load on developers). Using Kafka for messaging only becomes troublesome when you have exactly-once processing needs on messages. So while technically it is not the right tool for messaging, in nearly all the use cases that I know it is good enough.
While I agree with the premise of the video, it doesn't really explain the details around why Kafka fails when used in place of a message queue system (though in some use cases it works very well, like for a distributed work queue system).
The scenarios where it falls short are usually when a consumer is taking too long to perform its work and events are therefore backing up in the partition while other available consumers of the same topic are idling doing nothing. This is the competing consumers concept mentioned, but not explained, in the video. This can be addressed through clever use of keys, but like most clever solutions, that is the point where the system design will start to deteriorate. Add to this the fact that the number of partitions cannot easily be changed to allow for quick scaling of consumers and you start to see the shortcomings.
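A rough simulation of the backlog problem described above, not a benchmark. It assumes two consumers, made-up processing times in seconds, and round-robin placement of messages on two partitions; `partitioned_makespan` and `competing_makespan` are hypothetical helper names.

```python
import heapq


def partitioned_makespan(durations, num_consumers):
    """Each message is pinned to a partition (round-robin here) and each
    partition is owned by exactly one consumer, as in a consumer group."""
    busy_until = [0] * num_consumers
    for i, duration in enumerate(durations):
        busy_until[i % num_consumers] += duration
    return max(busy_until)


def competing_makespan(durations, num_consumers):
    """A single queue: whichever consumer is free first takes the next message."""
    free_at = [0] * num_consumers
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Hypothetical workload: one slow message (10s) followed by quick ones (1s each).
durations = [10, 1, 1, 1, 1, 1, 1, 1]

print(partitioned_makespan(durations, 2))  # 13: quick messages wait behind the slow one
print(competing_makespan(durations, 2))    # 10: the free consumer drains the quick ones
```

In the partitioned case the second consumer finishes its own partition after 4 seconds and then idles, because it cannot take work from the other partition.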
In your video "commands vs events" you showed an example where both can be used together. I wonder how that works: if a distributed log is used for events and a queue for commands, does it mean a message broker that supports both queuing and pub/sub should be used (e.g. RabbitMQ)?
The home of infographics?! 😂 "LinkedIn: we're like Pinterest, but 5 years ago, and greedier"
My feed is full of inspiration. I could make a video like this every day.
Kafka Connect does CDC
Hi Derek. I didn't get the part with message queuing. Could you provide a concrete example that shows the issue and a contrasting solution with some traditional queue like RabbitMQ for example?
My company insists on using Kafka for everything and it would be very useful to change that mindset.
Remember that adding a queue will add operational, learning and licence cost for your company. It is usual to insist on single technology when one is already in use. Same for databases
@@andreybraslavskiy522 we're talking about open-source solutions. Both Kafka and RabbitMQ are free.
@@andreybraslavskiy522 it may be usual but it will be neither easier nor cheaper in the long run.
@@andreybraslavskiy522 Precisely, Andrey. For many things in distributed architectures, Kafka is simply the best solution out there. Kafka may not be purpose-built for messaging, but it can do that well enough at no extra cost in infrastructure or employee training.
Glad you followed up that LinkedIn discussion with a video that hits things on the head, Derek!
You've motivated me to follow up with my own video to give some more event sourcing examples.
Go for it!
I am using Kafka in production for all the use cases that you mentioned, but it is very heavyweight on storage. I am migrating everything to NATS JetStream.
Hi Derek. Excellent as always. That's kind of how semantic diffusion occurs.
How about Kafka Connect?
This seems to suggest that Kafka is good to use for commands too - th-cam.com/video/7fkS-18KBlw/w-d-xo.html
That LinkedIn post wasn't for the technically apt. It was targeting the people who think they are technical but aren't anymore (or never were).
Like: did Kafka show up in your CDC example? Yes. Hence it's used for CDC....
The post is a perfect fit for the platform, then! A real microcosm - macrocosm thing 🤭
Whenever I see LinkedIn, I know it's dog shit
Regarding CDC with Debezium... I agree that if, for instance, you are using SQL Server as your primary database throughout your setup, then I would prefer to use the built-in data replication and distribution features in SQL Server as your data pipelines.
However, if you are using different database servers like MySQL, PostgreSQL, Redis, SQL Server etc. and you need to subscribe to data, then Debezium on top of Kafka sounds like a less code-heavy way of consuming and distributing data in real time between the different databases.
I would like to hear why that is a bad idea, and what alternative approaches are available that don't involve too much code.
You mentioned something like this a couple of years ago and I remember arguing that Kafka is perfectly usable for all this. Looking back at it I can clearly see I was wrong, and I can see how much I learned from you and I'm very appreciative. I've been a member on Patreon and here for years now, but all that said... I'm not clear on why it can't be used for CDC. An event log seems better than a queue for CDC... no?
Thanks for the comment. It can be used as a means to distribute events from a CDC tool such as Debezium. However, CDC in general is a slippery slope in my opinion. It has its use cases, like hard-to-integrate systems. My point was more that Kafka in itself isn't CDC.
Ugh, I'm sick every time developers pull in Kafka to use as a queue for their 20 messages/h because "it's fast and persistent".
Be good if you could compare it to cloud constructs like SNS, SQS, Kinesis etc
Wait what? 2:50 "generally with queues, you are interacting often with commands". Have you read Kleinrock at Uni? I think you are getting the abstraction layers wrong.
Semantics: a queue is for a task, one to one; topics are one to many.
What do you call this pattern using Kafka: (I have a lot of words to describe it but I don't want to be flagged and blacklisted)
- Send "customer-service: add-customer bob"!
- Wait for reply "master: customer-added 1234"...
- Send "customer-service: set-address xyz"!
- Wait for reply "master: address-updated"...
- Wait for reply "master: address-updated"...
- Wait for reply "master: address-updated"...
- Error, timeout!
Request/reply, maybe with orchestration, maybe with sagas.
Like the video suggests, these commands really shouldn't be using Kafka anyway though. A message broker like RabbitMQ would probably fit better.
@@georgehelyar Well, we (ops) have received this implementation for a new "microservice based centralized control plane" that sends about 100 messages per WEEK. It is implemented like this, and if some microservice sh*t somewhere is down for a few minutes while a user is making some changes in the central it will get "error: kafka reply timeout" and the operator has to retry the steps in the "wizard". FML!
@@ddanielsandberg oh, now I understand why you can't use the word to describe that without your comment getting censored
These are regular commands. What's weird is how the producer acts. The benefit of a queue is that you can reliably retry a particular part of the action chain. For example, the change of address might be retried by customer-service silently (and a queue-based retry could survive a restart of any service, and could keep trying for hours or days if needed), but your producer just drops the whole process and starts again.
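A minimal in-memory sketch of retrying only the failed step, as described above. It is not tied to any real broker: `FlakyAddressService`, `drain`, and the command fields are hypothetical, and a real queue would also persist the message and delay redelivery.

```python
from collections import deque


class FlakyAddressService:
    """Hypothetical downstream service that is down for the first few calls."""

    def __init__(self, failures_before_recovery):
        self.remaining_failures = failures_before_recovery

    def set_address(self, customer_id, address):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("service unavailable")
        return f"address-updated {customer_id}"


def drain(queue, service, max_attempts=5):
    """Process commands; on failure, requeue only the failed command."""
    results = []
    dead_letters = []
    while queue:
        command = queue.popleft()
        try:
            results.append(
                service.set_address(command["customer_id"], command["address"])
            )
        except ConnectionError:
            command["attempts"] += 1
            if command["attempts"] < max_attempts:
                queue.append(command)  # a real broker would delay redelivery
            else:
                dead_letters.append(command)  # give up and park the command
    return results, dead_letters


# The customer already exists, so only the set-address step is queued and retried.
queue = deque([{"customer_id": 1234, "address": "xyz", "attempts": 0}])
results, dead_letters = drain(queue, FlakyAddressService(failures_before_recovery=3))
print(results, dead_letters)
```

The earlier add-customer step is never repeated; the operator would not have to restart the whole wizard because one service was briefly down.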
"Debunking" is a pretty strong word to use, when actually all you're doing is just spending 10 minutes adding extra detail and nuance that would obviously have been impossible to include in an infographic.
It's not really a debunking. It is specifically your opinion on use cases and how you would personally like to use it
Disagree. Wikipedia on Apache Kafka: Apache Kafka is a distributed "event store" and stream-processing platform. You also need to read the Messaging section in the official Kafka documentation.
sounds very subtle
This video was very clear and easy to understand. Great job.
Wow, this is great content. I am personally struggling a lot with the lack of detail in the information around, and ChatGPT has just made things worse. What resources can I use to gain this deep understanding of architectural patterns?
Just keep digging around and finding content that doesn't seem superficial. You'll find sources that corroborate.
You can also ask ChatGPT for sources, and at least some of the time, it will give some good ones.
@@bc4198 it's not the quality of the sources, don't get me wrong, it's just the level of detail. I think this video does a fantastic job at explaining the difference between event streaming and a message queue. I have been using message passing for async communication with pub/sub protocols like MQTT for a while, but it always felt very wrong and I couldn't understand why... now I know.
Stop following me Derek!!!
Seriously man, how do you keep bringing out videos the same week - if not sometimes a day after?!!?! - on topics and concepts I'm trying to educate others on!?
Dude. This is getting creepy. ;)
We must follow the same things on LinkedIn/Twitter/YouTube etc...
That command vs event diagram is really good.