I am a beginner and I understood the concepts within 10 minutes. Very good explanation!
More lightboard videos on Flink would be so helpful :D
Well done. Love the SL Flink offering, now it makes sense how Kafka and Flink can coexist :)
Very engaging video, with just the right amount of information. Top effort!
Thanks for the video, it's really insightful. Could you explain, in a video or here, how Kafka and Flink fit into a real-time scenario and what their respective duties are? That would add more clarity.
Why would one need to use Flink when Kafka Streams and ksqlDB already exist?
Same question as you.
The answer starts at 3:10
Flink is a separate compute engine that can be more scalable and efficient than Kafka Streams / ksqlDB (because it does not rely directly on Kafka topics). Other benefits: support for multiple APIs (SQL, Java, Python), a unified API for streaming and batch, support for CEP (complex event processing, i.e., pattern matching), connectivity to multiple Kafka clusters in one query, etc.
Kafka Streams, on the other hand, is a very lightweight library that can be embedded into microservices (e.g., a Spring Boot application operated in its own Docker container). Very different sweet spot than Flink.
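To make the "lightweight library" point concrete, here is a toy sketch in plain Python of the kind of stateful per-key aggregation a Kafka Streams KTable maintains inside a microservice. This only mimics the idea; it is not the Kafka Streams API, and the event names are made up.

```python
# Toy per-key aggregation, mimicking what a Kafka Streams KTable does
# inside a microservice. NOT the real Kafka Streams API.
from collections import defaultdict

def count_by_key(events):
    """Fold a stream of (key, value) events into per-key counts."""
    state = defaultdict(int)
    for key, _value in events:
        state[key] += 1
    return dict(state)

# Hypothetical event stream of (topic-key, payload) pairs.
events = [("orders", 1), ("payments", 1), ("orders", 2)]
print(count_by_key(events))  # {'orders': 2, 'payments': 1}
```

In the real library, this state lives in an embedded RocksDB store backed by a Kafka changelog topic, which is what lets the application restart without losing the aggregate.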
Kafka Streams and ksqlDB don't support analytical jobs like Flink does.
@@RecaAtoz What exactly do you mean by "analytical job"?
Very informative. Thanks!
Apache Flink, a real-time stream processing framework, can be conceptually compared to the blackboard model of consciousness in the sense that both involve dynamic interactions and collaboration. In the blackboard model, different components (agents or processes) contribute to a shared memory or workspace (the "blackboard") to solve complex problems. Similarly, in Flink, multiple tasks or operators process and transform streams of data in a distributed environment, constantly interacting with the data streams.
Drafted by AI
Awesome video. Just a suggestion: look straight at the screen so it feels like you are explaining to the viewers.
Yes, agreed. The lightboard setup will be improved for future videos.
Can we use it with RabbitMQ instead?
One of the big differences between a message broker (like RabbitMQ) and Apache Kafka is that Kafka provides a persistence layer for the events. Hence, Flink applications can process events at the right pace depending on the use case (real-time, batch, traveling back in time for historical analysis, etc.). This is not possible with a push-based message broker.
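A toy illustration of that difference in plain Python (not the real Kafka API; the class and event values are made up): a push-based broker delivers each message once and forgets it, while a persistent log keeps events so any consumer can re-read from any offset at its own pace.

```python
# Toy append-only log, illustrating why persistence enables replay.
# NOT the Kafka API; names and events are illustrative only.

class Log:
    """Kafka-style persistent log: events remain after being read."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def read_from(self, offset):
        # Any consumer can (re)read from any offset, at its own pace.
        return self.events[offset:]

log = Log()
for e in ["sensor=21C", "sensor=22C", "sensor=35C"]:
    log.append(e)

# A real-time consumer reads only the latest event.
latest = log.read_from(len(log.events) - 1)

# Historical analysis "travels back in time" and replays everything.
replay = log.read_from(0)

print(latest)  # ['sensor=35C']
print(replay)  # ['sensor=21C', 'sensor=22C', 'sensor=35C']
```

With a push-based broker there is no `read_from(0)`: once a message has been delivered, it is gone, so replay and historical analysis are off the table.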
Brilliant...
Why not use Apache Spark Streaming with Kafka?
You can also use Spark Streaming together with Kafka. The fundamental difference is that Spark was built for batch and added streaming capabilities, while Flink was designed for streaming from the beginning. This fact, combined with some other advantages such as mature features for transactional workloads, complex event processing (CEP) capabilities, and much better open-source community adoption and growth (for streaming data, not for batch data), makes Flink the better choice for most data streaming projects.
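A rough sketch of that design difference in plain Python (toy code, not the real Spark or Flink APIs): Spark's original streaming model groups events into micro-batches before processing, while Flink handles each record as it arrives.

```python
# Toy contrast: micro-batching (Spark-Streaming-style) vs per-record
# processing (Flink-style). Function names are illustrative, not real APIs.

def micro_batch(stream, batch_size):
    """Group events into small batches, then process each batch as a unit."""
    batches = []
    for i in range(0, len(stream), batch_size):
        batches.append(stream[i:i + batch_size])
    return batches

def per_record(stream):
    """Handle every event individually, as soon as it arrives."""
    return [[event] for event in stream]

events = [1, 2, 3, 4, 5]
print(micro_batch(events, 2))  # [[1, 2], [3, 4], [5]]
print(per_record(events))      # [[1], [2], [3], [4], [5]]
```

The batch boundary in the first function is where micro-batching adds latency: an event waits until its batch is full (or a timer fires) before anything happens to it.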
Thanks for your answers. I've got a couple of questions:
- Is it possible to use it with RabbitMQ instead of Kafka (and if not, why not)?
- What would be a "Hello, World" project for this field (data streaming projects)?
@@GreatTaiwan One of the big differences between a message broker (like RabbitMQ) and Apache Kafka is that Kafka provides a persistence layer for the events. Hence, Flink applications can process events at the right pace depending on the use case (real-time, batch, traveling back in time for historical analysis, etc.). This is not possible with a push-based message broker.
@@GreatTaiwan "Hello World" projects can either be a (relatively simple) integration data pipeline for streaming ETL or a simple business application / business logic such as alerting if a threshold (e.g., of a sensor temperature measurement) is reached.
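The second "Hello World" idea could look roughly like this in plain Python, as a stand-in for a real Flink or Kafka Streams job. The threshold value and the sensor readings are made up for illustration.

```python
# Minimal threshold-alerting sketch: emit an alert whenever a sensor
# temperature reading exceeds a limit. A stand-in for a real streaming job.

THRESHOLD_C = 30.0  # made-up alerting threshold

def alerts(readings, threshold=THRESHOLD_C):
    """Yield an alert message for every reading above the threshold."""
    for sensor, temp in readings:
        if temp > threshold:
            yield f"ALERT: {sensor} at {temp}C exceeds {threshold}C"

# Hypothetical stream of (sensor-id, temperature) events.
stream = [("sensor-1", 21.5), ("sensor-2", 33.0), ("sensor-1", 29.9)]
fired = list(alerts(stream))
print(fired)  # ['ALERT: sensor-2 at 33.0C exceeds 30.0C']
```

In a real deployment, the `stream` would come from a Kafka topic and the alerts would go to another topic or a notification sink, but the business logic stays this small.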
Are we going back to overly complex "application servers" like the ones that got such a bad rap in the EJB days? I see a lot of love for Flink in other videos when all I can think is how overly complicated they've managed to make things. When I hear about snapshots being stored in the cloud, for instance, my over-complication radar sounds its alarm. I guess there are a lot of good use cases for it though, since all those big companies are using it. But Flink should not be the first thing that comes to mind for aggregating some Kafka events into some state (in my opinion, obviously!). Better to just write a streaming application (it is very easy). Scaling that is a simple matter of upping stream threads + partitions, and maybe the number of pods. Totally flexible. No need for snapshots or replays after downtime, since Kafka stores state when you do joins etc. In the right scenario, use Flink. But think before you act! :) That said, I am very intrigued and am currently looking at maybe introducing Flink at work.
Indeed, Flink is complex to operate. That's why a fully managed SaaS cloud service is the best choice. You don't have to worry about operations, scalability, etc. You just pay as you go with consumption-based pricing, elastic scale, etc. And yes, not every use case requires Flink. But in most cases (especially with a SaaS Flink), you are just a SQL query or Python script away from doing stream processing. If you need data consistency, low latency or reliable SLAs, then "just writing another streaming application" is definitely not easier.