Comparing Kafka Streams, Akka Streams and Spark Streaming: what to use when | Rock the JVM

  • Published on 22 Jan 2025

Comments • 41

  • @alexandrutoma9187
    @alexandrutoma9187 3 years ago +1

    You are the best Scala instructor in the world :D
    It's great that you're on Udemy too and also have courses on your site.
    Keep it up, Daniel!

  • @abdulelahaljeffery6234
    @abdulelahaljeffery6234 3 years ago +1

    I really love how you lay out the pros and cons of each streaming API, and which one to use in which situation. Really great stuff, and I'm glad that I found your channel.
    I'd happily buy a membership to learn from your awesome courses.
    Cheers pro :)

    • @rockthejvm
      @rockthejvm  3 years ago

      So glad you like my work!

  • @Dr_Dude
    @Dr_Dude 4 years ago +1

    Nice, finally I know the difference and when to use what!!!! Well-done video, as always.

    • @rockthejvm
      @rockthejvm  4 years ago +1

      That's the goal - enjoy!

  • @carlosvaztec
    @carlosvaztec 1 year ago +1

    One thing I missed was STATE, how they compare in terms of managing aggregations. Great video thank you.

  • @chandrashekharkotekar8453
    @chandrashekharkotekar8453 4 years ago +2

    Thanks for this detailed video. Can you please make a similar video that compares Spark Streaming with Apache Flink and Apache Pulsar?

    • @rockthejvm
      @rockthejvm  4 years ago

      Noted!

    • @felipegutierrez7856
      @felipegutierrez7856 4 years ago +2

      Good request, I was going to say that. The video offers a very good explanation of the 3 stream libraries/frameworks. I would say that Flink offers lower latency than Spark, since Flink follows the tuple-at-a-time processing model and Spark uses micro-batching. Backpressure in Flink is per operator, while in Spark it is applied at the source. In Akka Streams it is also per operator, which is very good! I loved Akka Streams because I can change the strategy of one operator at runtime using Flows and Partition. To do that in Flink or Spark, I would need to save the state and restart the stream query.
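
The latency point in the reply above can be sketched as a toy simulation. In a micro-batch model a record waits until the batch containing it closes; in an idealized tuple-at-a-time model it is handed to the operator on arrival. This is plain Python, not Spark or Flink code: the function names, the arrival times and the 100 ms batch interval are invented for illustration, and processing time is ignored entirely.

```python
def micro_batch_latencies(arrivals_ms, batch_interval_ms):
    """Waiting time of each record until the micro-batch containing it closes.

    Assumption: batches close at multiples of batch_interval_ms, and a record
    arriving exactly on a boundary is picked up by the batch closing right then.
    """
    latencies = []
    for t in arrivals_ms:
        # Integer ceiling division: index of the first batch boundary >= t.
        batch_index = -(-t // batch_interval_ms)
        batch_close = batch_index * batch_interval_ms
        latencies.append(batch_close - t)
    return latencies


def tuple_at_a_time_latencies(arrivals_ms):
    """In the idealized per-record model nothing waits for a batch boundary."""
    return [0 for _ in arrivals_ms]


arrivals = [5, 40, 99, 100, 101, 250]  # hypothetical arrival times in ms
batched = micro_batch_latencies(arrivals, 100)
per_record = tuple_at_a_time_latencies(arrivals)
print(batched)     # waiting time added by batching, per record
print(per_record)  # no batching delay
```

Under these assumptions the added wait is always below one batch interval, which is why shrinking the interval (or the trigger delay) lowers latency but does not remove the batching step itself.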

  • @ElectricWound
    @ElectricWound 3 years ago +2

    A very nice high-level overview of the differences between the streaming libraries. I was especially looking for a description of when to use Kafka Streams instead of Akka Streams, and this was very helpful. There was one severe error in your description of Akka Streams, though: they are not "asynchronous by default". Most operators are actually synchronous, and you can introduce asynchronous boundaries into streams or invoke asynchronous operations with a given degree of parallelism. Consecutive synchronous operations are transparently fused ("baked") into a single actor on materialization to minimize message-passing overhead, so you have precise control over the concurrency of calculations.
    I also cannot fully agree with your position that Akka Streams is especially hard for beginners. Programmers with some Scala experience will quickly relate to the collections-like API and be up and running in no time, especially compared to setting up Kafka or Spark. I think that before anyone approaches streaming libraries at all, they are probably already knee-deep in hard-to-solve concurrency, dependency and performance problems, and may have sunk weeks into cracking each problem the hard way. Once you find Akka Streams, you can finally concentrate on your logic, get all the boilerplate out of the way, and write self-descriptive, concise code that does some incredibly complex stuff, nicely modularized in readable chunks that fit on a single screen. Discovering it was, for me, like finally coming home.
    I think the hardest parts are wrapping your head around the concept of materialized values, how to design stateful stream stages correctly, and when you need the Graph API at all. My next task is getting my hands dirty with Kafka.
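
The distinction drawn in the comment above, fused synchronous stages versus an explicit asynchronous boundary, can be illustrated with a small analogy. This is a sketch with plain Python threads and a bounded queue, not Akka's actor-based materializer; `fused`, `with_async_boundary` and `tag` are hypothetical names made up for this example. In Akka Streams the corresponding tools are the `.async` boundary marker and operators such as `mapAsync`.

```python
import queue
import threading


def fused(source, *stages):
    """Run every stage for each element on the caller's thread, one element at a time."""
    out = []
    for item in source:
        for stage in stages:
            item = stage(item)
        out.append(item)
    return out


_DONE = object()  # sentinel marking the end of the stream


def with_async_boundary(source, upstream_stage, downstream_stage, buffer_size=4):
    """Run upstream_stage on its own thread and hand results over a bounded buffer."""
    # Bounded buffer: when it is full, put() blocks, so a slow consumer slows the producer.
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in source:
            buffer.put(upstream_stage(item))
        buffer.put(_DONE)

    worker = threading.Thread(target=producer, name="upstream")
    worker.start()
    out = []
    while True:
        item = buffer.get()
        if item is _DONE:
            break
        out.append(downstream_stage(item))
    worker.join()
    return out


def tag(label):
    """Build a stage that appends (label, name of the thread it ran on) to a trace list."""
    def stage(trace):
        return trace + [(label, threading.current_thread().name)]
    return stage


fused_trace = fused([[]], tag("a"), tag("b"))[0]
async_trace = with_async_boundary([[]], tag("a"), tag("b"))[0]
print(fused_trace)  # both stages report the same thread
print(async_trace)  # stage "a" reports the "upstream" thread
```

In the fused variant there is no hand-off cost but also no parallelism between stages; the boundary variant pays for the queue hand-off in exchange for letting the two stages overlap, which is the trade-off the comment describes.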

  • @marekiwaniuk2399
    @marekiwaniuk2399 3 years ago +3

    Just wanted to leave a note on how the Reactive Manifesto and Reactive Streams are (not) related to each other. The first one describes "reactive systems", meaning the whole system, where all of its components cooperate in a resilient, elastic, fault-tolerant and message-driven manner. So it is a specification of how a system should behave as a whole. Reactive Streams, on the other hand, are just one piece of the puzzle in a reactive system. They can also be used separately, outside of a reactive system. The thing is, you can actually write an application that doesn't comply with the requirements of the Reactive Manifesto but still uses and leverages Reactive Streams. "Reactive" in systems refers to how the whole system reacts to volume, load, errors, etc.; "reactive" in streams means that you have a flow of data and you react asynchronously to the events in this flow. In the world of Akka those two terms might get blurred, because the Akka actor system actually enables you to build a reactive system. Nonetheless, I would say that Akka Streams might help you build a reactive system, but they won't make your system resilient, elastic, etc. straight away.
    Anyway, you have really good content on this channel, thanks a ton for that!

    • @rockthejvm
      @rockthejvm  3 years ago +1

      Thanks for adding the color there. I might make a video on this exact topic. Glad you like my work!

  • @danishamjad5807
    @danishamjad5807 7 months ago

    I am guessing ZIO Streams is analogous to Akka Streams w.r.t. usage, right?

  • @namanbhayani1016
    @namanbhayani1016 2 years ago +1

    Thank you very much Daniel :)

  • @cgmds1973
    @cgmds1973 4 years ago +2

    Awesome explanation, thank you!!

  • @lsitful
    @lsitful 4 years ago +2

    +1 for: why Flink is not here?

  • @iQwert789
    @iQwert789 4 years ago +4

    Good video. However, it would have been nice if you had also included Flink (as you are comparing streaming frameworks); it's generally 20% faster than Kafka Streams and Spark Streaming. Kafka Streams is probably the future, as Kafka's ecosystem is evolving, but syntax-wise Spark and Flink are much more intuitive in Scala.

  • @ziauddin5981
    @ziauddin5981 4 years ago +1

    Nice explanation. Could you also include a part on Apache Flink? I think Apache Flink also uses Akka under the hood (?), and it also provides good control over streams through low-level APIs, along with other benefits like the ones shown for Akka.

    • @rockthejvm
      @rockthejvm  4 years ago +1

      Will add something

  • @minshi1040
    @minshi1040 2 years ago

    Hi Daniel,
    Normally, how would you host a Scala application so that it runs as a long-running process if you use Kafka Streams?
    I know that if I use Spark Streaming, the dedicated cluster will keep it running and listen/react to the stream/data. I don't have a large amount of data.
    Kind regards

    • @rockthejvm
      @rockthejvm  2 years ago

      There are various cloud services for Kafka to help you with the Kafka cluster.

  • @stanislavg.7903
    @stanislavg.7903 4 years ago +1

    Cool. But now (since 2.3) Spark has .trigger(processingTime = "0 seconds") to minimize latency. A 0-second processing-time trigger tells Spark to start each micro-batch as fast as it can, with no delay.

    • @rockthejvm
      @rockthejvm  4 years ago

      Yep. Did that come into conflict with anything in the video?

  • @tai-hao-le
    @tai-hao-le 3 years ago

    Could you please clarify what you mean by fault tolerance in Akka Streams? I am used to working with big data frameworks (Kafka Streams, Spark Streaming and Flink), and they usually execute code on a flock of machines with exceptional horizontal scalability and fault tolerance. I lack information on the Akka Streams side. From your description (best for high-performance streams that are part of the business logic), I would assume that we embed Akka Streams into existing applications. That could give us superior vertical scalability (with concurrency backed by actors), but if that's just a single machine, then how on earth can we talk about fault tolerance? I must be missing something obvious :)

    • @rockthejvm
      @rockthejvm  3 years ago

      Maybe a subject for a future video

  • @IslombekToshev-p6t
    @IslombekToshev-p6t 3 months ago +1

    Useful video

  • @dimfatal7259
    @dimfatal7259 4 years ago

    Hey Daniel, I'm an absolute beginner and I have a question about the fs2 library, which is also used for some kind of streaming. My question is: could it be an alternative to some of the streaming libraries that you mentioned in this video?

    • @rockthejvm
      @rockthejvm  4 years ago +2

      FS2 is a streaming library that's better used for application logic rather than data processing. Also it's quite hard for beginners.

  • @Pl4sm4feresi
    @Pl4sm4feresi 4 years ago

    Is there any discount associated with your yearly full-access membership? Here in Brazil things are complicated. The dollar is worth almost 6 times our currency.

    • @ziauddin5981
      @ziauddin5981 4 years ago

      Hi Victor, try the RockTheJVM courses on Udemy.

    • @rockthejvm
      @rockthejvm  4 years ago +2

      In the process of creating some location-adjusted optional discounts because I know things are unequal across the world

  • @Pl4sm4feresi
    @Pl4sm4feresi 4 years ago +1

    I love your videos bro

  • @LucaSavoja
    @LucaSavoja 4 years ago +7

    Awesome video, as always. I'd love a course (on Udemy, not free!) on Kafka/Kafka Streams. The other ones on Udemy are not as good as yours.

  • @_slier
    @_slier 2 years ago

    But I hate JVM-related technology... so, do I have any other choices? Or just suck it up?

    • @rockthejvm
      @rockthejvm  2 years ago +2

      Don't use the JVM.

  • @menphalla
    @menphalla 4 years ago

    Hello. :-)