Distributed Systems 8.1: Collaboration software

แชร์
ฝัง
  • เผยแพร่เมื่อ 27 พ.ย. 2024

ความคิดเห็น • 31

  • @zhou7yuan
    @zhou7yuan 3 ปีที่แล้ว +19

    Collaboration and conflict resolution [0:20]
    Families of algorithms [2:07]
    Conflict-free Replicated Data Types (CRDTs)
    Operational Transformation (OT)
    Conflicts due to concurrent updates [2:31]
    Operation-based map CRDT (algorithm) [3:35]
    Operation-based map CRDTs [8:08]
    Recall strong eventual consistency [8:32]
    CRDT algorithm implements this [9:00]
    State-based map CRDT (algorithm) [10:13]
    State-based CRDTs [13:54]
    Merge operator satisfy: communtative, associate, idenpotent
    State vs operation [14:30]
    op: smaller messages
    st: tolerate message loss/duplication
    Not necessarily uses broadcast [16:13]
    (google docs demo) [17:17]
    (disconnected) [17:48]
    Collaborative text editing: the problem [19:08]
    Operational transformation [22:00]
    Text editing CRDT [25:33]
    (character@float position)
    Operation-based text CRDT (1/2) [29:01]
    Operation-based text CRDT (2/2) [32:47]
    causal broadcast on insertion / deletion of a character

  • @davidespinosa1910
    @davidespinosa1910 ปีที่แล้ว +2

    Great explanation of OT: it transforms operations past (already executed) operations that are concurrent with them. That's the whole algorithm ! However, the whole point of OT is to use causal order instead of total order. If we require total order, OT is unnecessary.

  • @programming6881
    @programming6881 ปีที่แล้ว

    Awesome @Marin Klepmann, this is really good. Your explanations and the choice of examples are excellent. Thank you again. I am reading about distributed systems in the last few weeks, the lecture series increased my love for distributed systems.

  • @good-vibes-inc
    @good-vibes-inc 4 ปีที่แล้ว +8

    Maaan, I almost made it. Thanks for the lectures!

  • @1729abhishek
    @1729abhishek 9 หลายเดือนก่อน

    Say users 1 and 2 insert 'A' and 'B' at position 0. After broadcast and delivery, chars will converge to {(0, null, ⊢), (0.5, 1, 'A'), (0.5, 2, 'B'), (1, null, ⊣)} and 'A' will be visible on both devices. However, if either user attempts to delete 'A' they will see 'B' on their device because the request to delete the character at position 0 will only remove (0.5, 1, 'A') from chars. I found the lecture to be extremely helpful though.

  • @reubenkuhnert6870
    @reubenkuhnert6870 2 ปีที่แล้ว

    Excellent explanation, let me watch the whole series

  • @exoticcoder5365
    @exoticcoder5365 3 ปีที่แล้ว

    Text editing CRDT is very useful, thank you 🙏🏻

  • @fatihsenturk649
    @fatihsenturk649 3 ปีที่แล้ว +2

    well, what would happen if both two nodes insert a value at 0.875? for example, userA wanted to put F in between C and end-symbol, also concurrently userB wanted to put E at the same place. 28:57

    • @AlexeyPirogov
      @AlexeyPirogov 3 ปีที่แล้ว +4

      My understanding - this is a conflict and it can't be resolved 100% correct without human interaction.
      But this algorithm just uses nodeId in this case - sorts all characters by insert position and nodeId.

  • @mmfStudent
    @mmfStudent 2 ปีที่แล้ว +1

    Hey, do you have an example of practical usage of this algorithm with fractional indexes?

  • @yuanshaoqian
    @yuanshaoqian 11 หลายเดือนก่อน

    Nice lecture, awesome. Thanks for the work

  • @lidu007
    @lidu007 3 ปีที่แล้ว +1

    The statement at 25:06 was inaccurate. OT does not require total order broadcast. Reliable or even best-effort broadcast is enough. OT algorithms have an embedded lightweight mechanism to ensure causal delivery, e.g., by delivering an operation only after all causally preceding operations have been delivered according to their vector timestamps. Other than that, concurrent operations can be delivered in arbitrary order and are transformed against operations in the log to ensure convergence. That is, OT does not depend on a total order of delivery. OT achieves eventual consistency (convergence) after all operations are applied on all replicas, as long as causal relation is preserved.

    • @2tce
      @2tce 2 ปีที่แล้ว

      Total order broadcast is still eventually consistent right?

    • @1729abhishek
      @1729abhishek 9 หลายเดือนก่อน

      In the example itself, the delivery was not total order since the two inserts were delivered in different orders on the two nodes. However, the speaker qualified his statement with "generally". So, although weaker broadcasts such as causal may work with some OT data and operation models, there are also cases where total broadcast is needed. I am not sure which cases he is referring to but this is what I gathered from his statement.

  • @kirubhakaranmohan9811
    @kirubhakaranmohan9811 ปีที่แล้ว +1

    Hi,
    How complicated it would be with CRDT for Google Sheets ?
    Could you please post a video on that ?

  • @mykytapiskarov7291
    @mykytapiskarov7291 11 หลายเดือนก่อน

    Do I understand correctly that Operation-based map CRDTs works only if update entirely overwrites previous value?
    Because if value not entirely overwritten, e.g. value is string, and update operations is concatenation, then depend on order of updates - we'll get different result.

  • @chon-850
    @chon-850 3 ปีที่แล้ว

    love your book so much!

  • @kd6613
    @kd6613 ปีที่แล้ว

    Very few software that actually use Lamport or vector clocks?

  • @yangwang1577
    @yangwang1577 2 ปีที่แล้ว

    For the calendar use-case, don't we usually have a centralized storage for the calendar data? I always thought the sync would be between individual clients and the service and not between two clients (devices)...

    • @AndreiStreltsov
      @AndreiStreltsov 2 ปีที่แล้ว

      I think this does not change the algorithm. In addition to being a regular node, the server node is also tasked with broadcasting the events to each client. So essentially the server implements a mechanism for reliable broadcast.

    • @BharCode09
      @BharCode09 ปีที่แล้ว

      For the calendar use case, when the devices are in offline state, the changes are first saved locally (offline, in device storage). Now when the device comes online, the sync has to happen between devices with their locally applied changes and with centralized storage as well. When mutliple devices try to sync with Centralized storage, then there may be conflicts. That's what its explained here.

  • @AndreiStreltsov
    @AndreiStreltsov 2 ปีที่แล้ว

    I think the operation-based CRDT algorithm described in the video would not actually be suitable for a real calendar application.
    It uses Lamport clocks to impose ordering on the edit events, which means that any events that are concurrent with respect to the system itself are going to be essentially arbitrarily ordered, which may be inconsistent with causality from the user's perspective.
    For example say a user modifies the same key on both nodes in between syncs. From the system's perspective the modifications are concurrent, but for the user they are not, as the user known which edit happened before the other.
    I can think of only two options to overcome this. Either we have to use physical clocks, or hybrid vector+physical clock. When using a physical clock only, clock skew will lead to the loss of changes that originated on the lagging node even though logically many of the events could have been ordered correctly. By using a hybrid approach I think we could reduce the impact of clock skew by preferring logical timestamps when the events are not concurrent and falling back to physical timestamps only to order the truly concurrent events. Only vector clocks can be used for this, since Lamport timestamps don't differentiate between concurrent and non-concurrent events.

    • @2tce
      @2tce 2 ปีที่แล้ว

      I agree. This is similar to another comment I made elsewhere. See Google's Spanner. They use atomic clocks (physical)

    • @Ynno2
      @Ynno2 7 หลายเดือนก่อน

      I was very confused by this. If you had two devices in airplane mode which were synced to the same point and then made conflicting changes on the same field and used only Lamport clocks because we don't trust the wall clock time, I can't see how the results could be correct.
      I understand that because you could have some node ID that the Lamport clock timestamps would be unique and one could be guaranteed to clobber the other in some deterministic way and so in that sense is commutative and maybe "theoretically" correct. But for a real world perspective this would be a very bad way to resolve the conflict. It's LWW for some very perverse definition of "last". Or did I totally misunderstand this concept?

  • @陈卢鑫
    @陈卢鑫 2 ปีที่แล้ว

    What is the nodeId ? How to generate it ?

  • @sahibgrover3998
    @sahibgrover3998 2 ปีที่แล้ว

    Why does operation based crdt only need commutativity butstate based also need associativity

  • @Dake1989
    @Dake1989 2 ปีที่แล้ว

    I have a question about operation-based text CRDT (the one with the floating number positions). If the node id is used to break ties when merging, does that mean if userA inserts a sentence between some two initial characters and userB inserts a diffeent sentence between the same two initial characters, when the network heals, the merge process will interleave their edits character by character?
    XHello!X + XBye!X == XXHBeylel!o!X ??

    • @mikaylamaki4689
      @mikaylamaki4689 2 ปีที่แล้ว

      From what I understand, that’s exactly right! Martin’s other talk on CRDTs covers this and many other hairy edge cases: th-cam.com/video/x7drE24geUw/w-d-xo.html

    • @1729abhishek
      @1729abhishek 9 หลายเดือนก่อน

      Let's pick a smaller example. Say that a user types 'AB' on node 1 and another user types 'CD' on node 2, and now we must figure out how to merge the two strings. On node 1, insert(0,A) leads to chars = {(0, null, ⊢), (0.5, 1, A), (1, null, ⊣)}. Then, insert(1,B) leads to chars = {(0, null, ⊢), (0.5, 1, A), (0.75, 1, B), (1, null, ⊣)}. On node 2, the two insertions yield chars = {(0, null, ⊢), (0.5, 2, C), (0.75, 2, D), (1, null, ⊣)}. After broadcast and delivery, the chars on both nodes will converge to chars = {(0, null, ⊢), (0.5, 1, A), (0.5, 2, C), (0.75, 1, B), (0.75, 2, D), (1, null, ⊣)}. Since ElementAt gives precedence to node 1, the string `AB` will show for both users.