Martin Kleppmann | Kafka Summit London 2019 Keynote | Is Kafka a Database?

  • Published 25 Jul 2024
  • Martin Kleppmann is a distributed systems researcher at the University of Cambridge, and author of the acclaimed O’Reilly book “Designing Data-Intensive Applications” (dataintensive.net/). Previously he was a software engineer and entrepreneur, co-founding and selling two startups, and working on large-scale data infrastructure at LinkedIn.
    ABOUT CONFLUENT
    Confluent, founded by the creators of Apache Kafka®, enables organizations to harness the business value of live data. The Confluent Platform manages the barrage of stream data and makes it available throughout an organization. It provides various industries, from retail, logistics and manufacturing to financial services and online social networking, with a scalable, unified, real-time data pipeline that enables applications ranging from large-volume data integration to big data analysis with Hadoop to real-time stream processing. To learn more, please visit confluent.io
    #kafkasummit #apachekafka #database
  • Science & Technology

Comments • 29

  • @demokraken • 4 years ago +14

    Martin's book is a marvel; highly recommended for anyone interested in the design of distributed applications.

  • @kevinhock1041 • 5 years ago +7

    Really awesome talk, his book is great too

  • @harshitsinghai1395 • 3 years ago +3

    His book is my first tech book ever. I'm proud to have chosen it as my first. Totally worth it.

  • @el_chivo99 • 3 months ago

    ok i’ve actually asked myself this very question

  • @xinyuanliu1959 • 3 years ago +3

    Trying to make some notes here... By relying on the ordering of messages within a Kafka topic partition, we achieve serializable execution of this transaction, because the stream processor for each individual partition is just a single-threaded, sequential process. We get scalability by processing many partitions in parallel, partitioned by a partition key. A database transaction is broken down into a multi-stage streaming pipeline in Kafka. We can get better consistency than many real databases.
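
    A minimal sketch of that per-partition loop, assuming a hypothetical `transfers` topic keyed by account id and a toy in-memory balance map (topic name and event encoding are illustrative, not from the talk):

    ```java
    import java.time.Duration;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    // One single-threaded processor per partition: Kafka delivers a partition's
    // records in order, so applying them in a plain loop executes the transfers
    // serially for every account that hashes to this partition.
    public class PartitionProcessor {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "balance-processor");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            Map<String, Long> balances = new HashMap<>(); // toy state store

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("transfers")); // hypothetical topic
                while (true) {
                    for (ConsumerRecord<String, String> rec :
                            consumer.poll(Duration.ofMillis(500))) {
                        // key = account id, value = "debit:100" or "credit:100"
                        String[] op = rec.value().split(":");
                        long amount = Long.parseLong(op[1]);
                        balances.merge(rec.key(),
                            op[0].equals("debit") ? -amount : amount, Long::sum);
                    }
                }
            }
        }
    }
    ```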

    • @hl5768 • 2 years ago

      It's like using Lua in Redis.

  • @applerr22 • 2 years ago

    For achieving positive-account-balance consistency, the suggested model is not enough, as there is nothing stopping the credit event from being processed even if the debit fails. This could be handled with an additional check for the same event id before performing the credit event, but that would require a DB. Another way would be to generate the credit event only after the debit succeeds, but that has its own trade-offs.
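
    A sketch of the event-id check described in this comment, assuming a hypothetical `processed_events` table whose primary key is the event id; recording the id and applying the credit in one database transaction turns reprocessing into a safe no-op:

    ```java
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Hypothetical schema: CREATE TABLE processed_events (event_id VARCHAR PRIMARY KEY);
    public class IdempotentCredit {
        static boolean applyCreditOnce(Connection conn, String eventId,
                                       String account, long amount) throws SQLException {
            conn.setAutoCommit(false);
            try (PreparedStatement mark = conn.prepareStatement(
                     "INSERT INTO processed_events (event_id) VALUES (?)");
                 PreparedStatement credit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                mark.setString(1, eventId);
                mark.executeUpdate();   // fails if event_id was already recorded
                credit.setLong(1, amount);
                credit.setString(2, account);
                credit.executeUpdate();
                conn.commit();          // mark + credit commit atomically
                return true;
            } catch (SQLException e) {
                conn.rollback();        // duplicate event id: skip the credit
                return false;           // (a real impl would inspect the error code)
            }
        }
    }
    ```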

  • @sandeepkumarverma8754 • 4 years ago +1

    In the case of isolation, what if one consumer picks up the message to create user 'Jane', Kafka rebalances, and the same message is delivered to another consumer? Now both consumers try to create user 'Jane' in some database, and we again have the problem of two 'Jane' users being created.
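
    One common mitigation, sketched below, is to make the consumer's write idempotent so a redelivered message becomes a no-op; this uses Postgres-style upsert syntax, and the table and column names are made up:

    ```java
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // If Kafka redelivers "create user Jane" after a rebalance, the second
    // insert conflicts on the unique username and silently does nothing.
    public class IdempotentUserCreate {
        static void createUser(Connection conn, String username) throws SQLException {
            try (PreparedStatement stmt = conn.prepareStatement(
                    "INSERT INTO users (username) VALUES (?) " +
                    "ON CONFLICT (username) DO NOTHING")) {
                stmt.setString(1, username);
                stmt.executeUpdate();
            }
        }
    }
    ```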

  • @iavasilev • 4 years ago +2

    Link to the article from presentation: queue.acm.org/detail.cfm?id=3321612

  • @anurag870 • 5 years ago +5

    deja vu :)

  • @sumitstir • 3 years ago

    How do the scalability gains from having a partitioned message bus compare with directly partitioning a transactional database like MySQL? Given that we need to support the required write throughput regardless of whether Kafka sits in between, what exact advantage is Kafka providing here?

    • @rishabhgpt3 • 3 years ago +1

      Distributed transactions !!

  • @rajsaraogi • 4 years ago +1

    How about using change data capture to listen to the changes of our primary database and then apply them to update the others, like search indexes or caching DBs?
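
    A sketch of that fan-out, assuming a CDC connector (e.g. Debezium) has already written the table's change log to a hypothetical `users-changelog` topic, where a null value is a tombstone marking a deleted row:

    ```java
    import java.time.Duration;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    // Rebuilds a derived view (here a toy cache) from the database's change log.
    public class ChangelogToCache {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "cache-updater");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in cache

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("users-changelog")); // hypothetical topic
                while (true) {
                    for (ConsumerRecord<String, String> rec :
                            consumer.poll(Duration.ofMillis(500))) {
                        if (rec.value() == null) {
                            cache.remove(rec.key());           // tombstone = delete
                        } else {
                            cache.put(rec.key(), rec.value()); // upsert latest row
                        }
                    }
                }
            }
        }
    }
    ```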

    • @rajsaraogi • 4 years ago

      @@thebeckettgroup Yes, then which way to take: a log-based architecture or change data capture?

    • @HassanDibani • 4 years ago

      @@rajsaraogi CDC is essentially reading the database's log.

  • @fb-gu2er • 2 months ago

    Durability is loosely defined here. A durable record shouldn't disappear at some point after you've read it.

  • @metaocloudstudio2221 • 2 years ago

    The whole talk makes sense, but a while ago I heard the opposite, that "Kafka is not a database". So I am confused: why not use Kafka as a source of truth (SoT)?

  • @luzyoz143 • 4 years ago

    More ACID = Better Databases?

  • @MechanicalEI • 5 years ago +8

    So... Kafka is a database?

    • @Ayoub-adventures • 2 months ago

      Actually, he didn't project the concept of durability onto Kafka, which for me is what is missing for Kafka to be a database.
      The conclusion of the talk is that the ACID guarantees that are hard to implement in traditional databases are made easy using Kafka. But that's not a new idea, since most NoSQL databases use a commit log to achieve that.

  • @Rbcksqheclfy • 2 years ago

    Dear Confluent,
    What do you want to achieve here compared to the previous naive example? How can this be compared to a proper distributed transaction?
    th-cam.com/video/BuE6JvQE_CY/w-d-xo.html In this example, imagine that appending some event to Kafka succeeded and the index and cache updates were applied, but the event never got applied to the database; the data integrity between the index/cache and the database is now broken. The advantage of this approach is having an event log; I don't see anything about proper distributed transactions and atomicity for non-eventually-consistent systems.
    Please explain.

  • @Kingslyt • 4 years ago +4

    Great talk. I like the idea illustrated here and am not a fan of XA, but I wanted to point out a factual inaccuracy in this talk. It is not true that the read committed isolation level allows the scenario described at 16:28, which is a dirty read (neither read uncommitted nor phantom reads), i.e. reading what has not been committed yet. Even if one considers the stretched definition of atomicity in this talk together with the read committed isolation level, there won't be a scenario with relational databases where you would see account1 debited and account2 not credited.

    • @asn90436 • 3 years ago

      I think what he said is write skew, not dirty reads.

    • @gstraylz • 3 years ago +1

      It's not about dirty reads. Suppose you are selecting both accounts and you selected one before the commit and the second after.
      Read committed does allow that, although both Oracle and Postgres give somewhat stronger guarantees at their default level (statement-level snapshot reads), so for the provided example a single query won't see an inconsistent sum over the accounts in those databases.
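
      A small JDBC sketch of the timing in question: under READ COMMITTED each read sees only committed data, yet two reads in one transaction can straddle another transaction's commit and observe an inconsistent total (the connection setup and accounts table are assumed):

      ```java
      import java.sql.Connection;
      import java.sql.PreparedStatement;
      import java.sql.ResultSet;
      import java.sql.SQLException;

      public class ReadCommittedSkew {
          static long balance(Connection conn, String account) throws SQLException {
              try (PreparedStatement stmt = conn.prepareStatement(
                      "SELECT balance FROM accounts WHERE id = ?")) {
                  stmt.setString(1, account);
                  try (ResultSet rs = stmt.executeQuery()) {
                      rs.next();
                      return rs.getLong(1);
                  }
              }
          }

          static long total(Connection conn) throws SQLException {
              conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
              conn.setAutoCommit(false);
              long a = balance(conn, "12345"); // may read the pre-transfer balance
              // ...if a transfer from 12345 to 54321 commits right here...
              long b = balance(conn, "54321"); // ...this read sees the new balance
              conn.commit();
              return a + b; // can count the transferred amount twice
          }
      }
      ```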

    • @Rusebor • 3 years ago +2

      It is a HUGE mistake from Martin, which makes the whole talk not great at all.
      His example should have proved that Kafka and a relational database are the same thing, but it proved the opposite.
      Unfortunately he did not show what would happen if account 12345 had a zero balance.
      I assume that in that case we would have to emit a compensating event for account 54321, but we couldn't do this: we separated the original message (transaction) into two independent events.
      In his example we should have emitted the credit event for 54321 _only after_ we debited 12345 successfully.
      But even then it is not possible to do it in one step: we can't write to the database and to Kafka in the same transaction.
      Kafka is needless here.
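
      A sketch of the "credit only after a successful debit" ordering argued for above, with the transfer split into two pipeline stages; the topic, event encoding, and tryDebit helper are hypothetical:

      ```java
      import java.util.Properties;
      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.ProducerRecord;

      // The credit event is emitted only once the debit is known to have
      // succeeded, so a failed debit never leaves a dangling credit behind.
      public class DebitThenCredit {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put("bootstrap.servers", "localhost:9092");
              props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
              props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

              try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                  if (tryDebit("12345", 100)) {
                      producer.send(new ProducerRecord<>("credits", "54321", "credit:100"));
                  } // else: no credit event is produced, so nothing to compensate
              }
          }

          static boolean tryDebit(String account, long amount) {
              // placeholder: a real implementation would do a conditional update,
              // e.g. UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?
              return true;
          }
      }
      ```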

    • @sumitstir • 3 years ago

      @@Rusebor Yeah, what we really want in this situation is a transactional database with a CDC-based approach to update the cache and search index.

    • @sumitstir • 3 years ago

      @@gstraylz It's not the same: in that case, if the user refreshes the balance of the first account, they are guaranteed to see the updated value, while the same is not true with the Kafka approach, since the two events might be published to different partitions, and there is no guarantee of when events in different partitions get processed, due to lag.