The dude in the middle is brilliant! Asks the correct questions for the uninitiated!
He has his own podcast channel and it is fantastic: www.youtube.com/@DeveloperVoices
This was an excellent episode. KrisJ - I really like your host/interviewing style. This was an interesting topic and very well presented.
Thanks! Glad you enjoyed it. 😊
Oh, liked this one! For Kafka Streams/ksqlDB, *everything* is about Kafka: all input and all output moves through one single Kafka cluster. That has bitten me a few times, and Flink is more flexible there: you can read from one cluster and write to another. Or join data from different clusters. Or read data from a cluster you only have read access to.
Excellent video, Kris
Great debate, thanks for sharing.
Is there any way to clear CTAS data when joining 2 tables together?
Where do you persist that state? How easy is it to share that state when you move from one Kubernetes cluster to a new one? Currently, I persist state in Redis.
Kstreams is my favourite simply because of the deployment model, as long as I already have a Kafka cluster. If the ecosystem does not use Kafka and uses AWS Kinesis, I would choose Flink.
13:41 the big takeaway as to why/when Flink vs Kstreams
OMG, it really worked. Thank you so much!!
Nice Episode!
What about latency? Does it perform as well as the others available on the market, if not better, at a fraction of the cost? Could you provide some benchmark numbers relative to other candidate streaming languages/frameworks?
Targeted use case: streaming large financial datasets in many formats (text, integer, float, etc.). Your input is highly appreciated.
ksqlDB is the 🐐!
BEEST!!!
Flink is not as advanced a product as you present it. It is more like libraries and scripts for creating software than software itself. In Flink you cannot do many trivial things that you normally do with data. Flink also changes drastically from version to version and is not compatible with previous versions. The documentation is unclear. Flink disappointed me a lot.
Sorry, I think that's nonsense. While KStreams might be easier to use, you might hit a wall with a problem you could only solve in Flink, not the other way around. As they said in the video.
You are very, very uninformed.
well, tNice tutorials is going to take forever...
Using SQL syntax in streaming applications makes things even worse. How do you test ksql together with Kafka Streams? They just belong to two different worlds.
The idea of enabling non-Java developers to work with Kafka will fail in the end. If someone can't even write Java code, he is definitely not qualified for developing or handling the complexity in such streaming applications.
That is an "interesting" take on what qualifies someone to work with the complexity of streaming applications :)
Idk what you are talking about regarding "two different worlds". All ksql queries are converted to Kafka Streams processes afaik, so they are literally "the same world". SQL syntax is just domain-specific "code" that hides some complexity and implementation details behind abstractions. Also, I think pretty much all devs can or could write their stream code in Java; it's just a matter of preference or not wanting to add another language to your project.
@@NikolasHonnef Yes, ksql under the hood is just Kafka Streams. But just imagine you're building a microservice event sourcing system. One service needs to collect and transform multiple data sources. If Kafka Streams can cover everything within one Kafka Streams topology, why should I use ksql additionally, and how would you write your test code? You will also need a separate deployment if you're using Kubernetes. In the end, ksql will bring you more operational overhead.
Other frameworks like Flink or Spark all have a built-in SQL-like high-level API, and it combines well with the low-level part. For testing or deployment you only maintain one codebase instead of handling the SQL part separately. So based on that, I don't think it's only a matter of preference...
@@djl3009 I don't know either. But if someone can't even handle CSV or database ETL, they will definitely have more problems with streaming data, because they can't catch one record from the running stream, manipulate it manually and put it back into the stream ;-)