Kafka - Exactly once semantics with Matthias J. Sax

  • Published on Dec 26, 2024

Comments • 14

  • @imranfp
    @imranfp 3 months ago +2

    Well presented and well explained. The best part was when Matthias used the pointer to show exactly which part of the presentation was under discussion. Thanks 👏

  • @shyamagrawal6161
    @shyamagrawal6161 2 years ago +4

    One of the best practical sessions on Kafka. Thanks to both of you for your efforts and valuable time.

  • @JardaniJovonovich192
    @JardaniJovonovich192 2 years ago +5

    This is a brilliant video on EOS in Kafka. Thanks to you and Matthias!
    A couple of questions,
    1. I did not completely understand why a watermark is even required if a consumer can only read committed events. Could you please shed some light on what a watermark is and why exactly it is needed?
    2. If a microservice is spawned (and running) on 5 Kubernetes pods, each pod runs the same code. Assume this code also contains a Kafka producer with transactional semantics. In this case, how do I give a separate transaction_id to each of the pods, given that the code is one and the same?
    Do I have to provide `producer.transaction_id = random_generated_id`?

    • @TheGeekNarrator
      @TheGeekNarrator 2 years ago +2

      “Reply from Matthias (for some reason his comments are not visible)”
      Glad you liked the episode!
      1) We did not cover all details about the “last stable offset” (LSO) watermark. To give a brief answer: when sending a batch of messages to the consumer, the broker needs to attach some metadata that the consumer needs in order to filter out aborted messages. If the batch contained pending messages, the broker could not compute this metadata.
      2) Yes, you need a unique id per pod. Using a random one might work but could have some disadvantages. A better way might be to derive the id from some unique pod metadata (you would need to deploy as a StatefulSet). Check out www.confluent.io/kafka-summit-sf18/deploying-kafka-streams-applications/ for more details.
      Hope this helps.
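      For reference, a minimal sketch of what (2) could look like with the Java producer, assuming the pod name is exposed via the `HOSTNAME` environment variable (as it is for a StatefulSet); the broker address and the "orders-" id prefix are placeholders:

      ```java
      import java.util.Properties;

      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.ProducerConfig;
      import org.apache.kafka.common.serialization.StringSerializer;

      public class PodScopedTransactionalProducer {

          public static KafkaProducer<String, String> create() {
              // Stable per-pod name, e.g. "orders-service-0", "orders-service-1", ...
              // (a StatefulSet gives each pod a stable ordinal suffix).
              String podName = System.getenv("HOSTNAME");

              Properties props = new Properties();
              props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
              props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
              props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
              // Same code runs in every pod, but each pod gets a unique, stable transactional.id:
              props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-" + podName);

              KafkaProducer<String, String> producer = new KafkaProducer<>(props);
              producer.initTransactions();
              return producer;
          }
      }
      ```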

    • @JardaniJovonovich192
      @JardaniJovonovich192 2 years ago +1

      @@TheGeekNarrator
      Thanks, Kaivalya and Matthias, that makes sense now. One more question related to Kafka configs in general:
      Which of the parameters `max.partition.fetch.bytes`, `max.poll.interval.ms`, `max.poll.records`, and `fetch.max.bytes` need to be tuned to increase the throughput of a Kafka Streams consumer? Also, a brief description of each parameter would really help; the official documentation is a bit confusing to understand :(

    • @TheGeekNarrator
      @TheGeekNarrator 2 years ago +2

      Matthias is unable to comment because of some restrictions, so I am posting his reply:
      In general, I would recommend asking questions like this on StackOverflow, the Kafka user mailing list, or the Confluent channels (Developer Forum or Community Slack).
      To answer your question:
      If a broker hosts multiple partitions that a consumer reads from, the consumer sends a single fetch request to the broker, requesting data from all those partitions, and the broker sends back a single batch of data (which may contain messages from several of those partitions). The parameter `fetch.max.bytes` controls the maximum overall response size from the broker. The config `max.partition.fetch.bytes` controls how much data the broker can add to the response per partition, i.e., this second parameter can be used if you want to limit the data of a single partition per round-trip. In general, `fetch.max.bytes` is the parameter you might want to tune to increase throughput.
      `max.poll.interval.ms` defines how frequently `poll()` must be called, and has nothing to do with throughput. It basically affects the liveness check for the consumer, and it's also related to rebalancing. Lastly, `max.poll.records` controls how many records a single `poll()` call may return. Again, it has nothing to do with throughput. If a fetch request returns 1000 records in the response and you set `max.poll.records` to 100, you would need to call `poll()` 10 times to consume the full response, i.e., 9 of the `poll()` calls will be served from the consumer's in-memory buffer without fetching data from the broker. `max.poll.records` is related to `max.poll.interval.ms`: if you allow more records to be returned from `poll()`, you need more time to process the data and thus you will call `poll()` less frequently (i.e., you spend more time between two `poll()` calls). So either decreasing `max.poll.records` or increasing `max.poll.interval.ms` can help you to not drop out of the consumer group due to timeouts.
      Hope this helps.
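      For reference, a sketch of where those four settings live on a plain Java consumer; the broker address and group id are placeholders, and the numeric values are only illustrative, not recommendations:

      ```java
      import java.util.Properties;

      import org.apache.kafka.clients.consumer.ConsumerConfig;
      import org.apache.kafka.clients.consumer.KafkaConsumer;
      import org.apache.kafka.common.serialization.StringDeserializer;

      public class ThroughputTunedConsumer {

          public static KafkaConsumer<String, String> create() {
              Properties props = new Properties();
              props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
              props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
              props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
              props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

              // Max size of one fetch response across all partitions -- the main throughput knob.
              props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 50 * 1024 * 1024);
              // Cap on how much data a single partition may contribute to that response.
              props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 10 * 1024 * 1024);
              // How many records one poll() call hands back from the consumer's in-memory buffer.
              props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
              // Max allowed gap between poll() calls before the consumer drops out of the group.
              props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300_000);

              return new KafkaConsumer<>(props);
          }
      }
      ```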

    • @JardaniJovonovich192
      @JardaniJovonovich192 2 years ago +1

      @@TheGeekNarrator Thank you very much, Matthias and Kaivalya. This definitely helps.

  • @pietrogalassi1113
    @pietrogalassi1113 2 years ago +2

    Brilliant video!!! Some questions:
    1. If I have a long-running execution with external system calls (let's say from 2 minutes to 50 minutes), what is the best approach to handle it with Kafka Streams and EOS?
    2. Is EOS also applicable to a simple Kafka consumer/producer (they're used internally by Kafka Streams), and if so, how do you pair it with long-running executions?
    Thanks!

    • @TheGeekNarrator
      @TheGeekNarrator a year ago +2

      Thank you @Pietro Galassi. Regarding your questions, here’s my take:
      1) In my experience, using Kafka Streams for long-running executions isn't ideal (I'll let Matthias correct me), mainly because while you are executing a task (read: processing an event) the partition is blocked, waiting for the previous execution. I would go for a queuing system like SQS for such cases.
      2) IIUC, EOS is applicable using the Streams API only. Using the low-level consumer/producer APIs you can get idempotent behaviour.
      I will let Matthias know about the question.
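      For reference on (2), turning on exactly-once in Kafka Streams itself is a single config switch; a minimal sketch, where the application id, broker address, serdes, and topic names are placeholders and `exactly_once_v2` assumes a reasonably recent Kafka version:

      ```java
      import java.util.Properties;

      import org.apache.kafka.common.serialization.Serdes;
      import org.apache.kafka.streams.KafkaStreams;
      import org.apache.kafka.streams.StreamsBuilder;
      import org.apache.kafka.streams.StreamsConfig;

      public class EosStreamsApp {

          public static void main(String[] args) {
              Properties props = new Properties();
              props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-eos-app");
              props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
              props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.ByteArraySerde.class);
              props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.ByteArraySerde.class);
              // Turns on transactional producers and read_committed consumers internally.
              props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

              // Trivial copy topology, just to have something runnable.
              StreamsBuilder builder = new StreamsBuilder();
              builder.stream("input-topic").to("output-topic");

              new KafkaStreams(builder.build(), props).start();
          }
      }
      ```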

    • @TheGeekNarrator
      @TheGeekNarrator a year ago +1

      Reply from Matthias. Unfortunately he is unable to reply directly because of some issue. 👇
      The main point for EOS is, of course, that side effects are not covered. I.e., if the external call modifies some external state, those modifications might be applied more than once (if the TX fails and is retried). -- Also, there is the TX timeout: you would need to increase it, implying that you block downstream consumers because the LSO does not advance. So you should not combine external calls and EOS.
      For plain consumers/producers it's a big "it depends". For a "plain" consumer/producer there is no EOS, because it's impossible. `read_committed` implies you won't read aborted data, but you still might read committed data multiple times. For the producer, you can use the TX-API, but EOS is more than using TX. -- I actually submitted a Kafka Summit talk (for London 2023) about EOS and will cover the details there if the talk gets accepted.
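      To make the "EOS is more than using TX" point concrete, here is a rough sketch of the consume-transform-produce pattern with the TX-API: the consumed offsets are committed inside the producer transaction, so output and consumer progress move atomically. Topic names are placeholders, the clients are assumed to be configured elsewhere with `isolation.level=read_committed`, `enable.auto.commit=false`, and a `transactional.id`, and error handling is reduced to a bare abort:

      ```java
      import java.time.Duration;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      import org.apache.kafka.clients.consumer.ConsumerRecord;
      import org.apache.kafka.clients.consumer.ConsumerRecords;
      import org.apache.kafka.clients.consumer.KafkaConsumer;
      import org.apache.kafka.clients.consumer.OffsetAndMetadata;
      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.ProducerRecord;
      import org.apache.kafka.common.KafkaException;
      import org.apache.kafka.common.TopicPartition;

      public class ConsumeTransformProduceLoop {

          static void run(KafkaConsumer<String, String> consumer, KafkaProducer<String, String> producer) {
              consumer.subscribe(List.of("input-topic"));
              producer.initTransactions();

              while (true) {
                  ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                  if (records.isEmpty()) continue;

                  producer.beginTransaction();
                  try {
                      for (ConsumerRecord<String, String> record : records) {
                          producer.send(new ProducerRecord<>("output-topic", record.key(), transform(record.value())));
                      }
                      // The "more than TX" part: input offsets are committed atomically
                      // with the produced output, inside the same transaction.
                      producer.sendOffsetsToTransaction(nextOffsets(records), consumer.groupMetadata());
                      producer.commitTransaction();
                  } catch (KafkaException e) {
                      // Aborted output is filtered out by read_committed readers; fatal errors
                      // (e.g. a fenced producer) would require closing the producer instead.
                      producer.abortTransaction();
                  }
              }
          }

          static Map<TopicPartition, OffsetAndMetadata> nextOffsets(ConsumerRecords<String, String> records) {
              Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
              for (TopicPartition tp : records.partitions()) {
                  List<ConsumerRecord<String, String>> partRecords = records.records(tp);
                  long next = partRecords.get(partRecords.size() - 1).offset() + 1;
                  offsets.put(tp, new OffsetAndMetadata(next));
              }
              return offsets;
          }

          static String transform(String value) {
              return value.toUpperCase();
          }
      }
      ```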

  • @rajatahuja4720
    @rajatahuja4720 a year ago

    How do you implement an exactly-once Kafka consumer? A consumer can die before acknowledging to the broker, and when it comes back it can read the message again. How can we implement a Kafka consumer which is EOS?

    • @TheGeekNarrator
      @TheGeekNarrator a year ago +1

      Hi Rajat, I guess we discussed that scenario here, didn't we? With a transactional and idempotent producer and a read_committed consumer you achieve EOS. You can find more details in this blog post; let me know if you have specific questions: www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
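      For reference, a sketch of the config side of that answer; the broker address, transactional id, and group id are placeholders, and serializer/deserializer settings are omitted:

      ```java
      import java.util.Properties;

      import org.apache.kafka.clients.consumer.ConsumerConfig;
      import org.apache.kafka.clients.producer.ProducerConfig;

      public class EosClientConfigs {

          static Properties producerProps() {
              Properties props = new Properties();
              props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
              props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);     // no duplicates on internal retries
              props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-tx-id"); // enables the TX-API
              return props;
          }

          static Properties consumerProps() {
              Properties props = new Properties();
              props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
              props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
              props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed"); // never see aborted data
              props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);         // offsets go into the TX instead
              return props;
          }
      }
      ```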

  • @VJBHARADWAJ
    @VJBHARADWAJ a year ago +1

    👌👌👌