Not only do these videos teach you about distributed systems concepts and solutions, they also teach you mental models for thinking about them, including how to present them graphically. Awesome work, Martin!
Linearizability
strongest option
atomically [0:55]
as single copy [1:10]
Consequence: "strong consistency" [1:33]
also in shared-memory [2:06]
≠ serializability [3:08]
Read-after-write consistency revisited [3:43]
From the client's point of view [4:58]
not happens-before [6:35]
Operations overlapping in time [7:48]
Not linearizable, despite quorum reads/writes [8:58]
(client-only view) [11:17]
Making quorum reads/writes linearizable [12:17]
(client2 resend set) [12:56]
Linearizability for different types of operation [14:33]
compare-and-swap [15:30]
linearizable CAS distributed -> total order broadcast [16:09]
Linearizable compare-and-swap (CAS) (algorithm) [16:29]
thank you!
14:23 - Client 3 may still get v0, but now it is OK since Client 3's "get" started before Client 2's finished
Hi, it was just because of space constraints that he had to show it like that, but the whole idea is still the linearizability time dependency between the two get requests.
Thanks. I was wondering about this, too.
What about optimizing it by using a version? I mean, like using MySQL as the master, using MySQL's last update time as the version, and syncing slaves like Redis.
@14:30, client C is still not guaranteed to see v1 though? It will if it starts strictly after client B finishes. The good thing is that at least the system is linearizable.
Is this really a valid quorum? This does not seem to satisfy R+W>N (at 10:52).
Regarding using synchronous read repair to make the quorum approach linearizable:
From the book "Designing Data-Intensive Applications": "Interestingly, it is possible to make Dynamo-style quorums linearizable at the cost of reduced performance: a reader must perform read repair synchronously, before returning results to the application, and **a writer must read the latest state of a quorum of nodes before sending its writes**"
The last part, about the "writer must read before sending writes", is not mentioned in the lecture. I'm wondering why.
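For what it's worth, the two halves of that quote can be sketched roughly like this (purely illustrative Python, not the lecture's algorithm; the per-replica read/write methods and the versioning scheme are assumptions):

```python
# Rough sketch of the DDIA quote above, assuming each replica exposes
# read(key) -> (version, value) and write(key, version, value) -> bool.
# Illustrative only; not the algorithm shown in the lecture.

N, R, W = 3, 2, 2  # R + W > N, so read and write quorums overlap

def quorum_read(key, replicas):
    # Ask a read quorum and pick the newest (version, value) pair.
    version, value = max(rep.read(key) for rep in replicas[:R])
    # Synchronous read repair: push the newest value back to a write
    # quorum *before* returning, so a later reader cannot go back in time.
    acks = sum(rep.write(key, version, value) for rep in replicas)
    assert acks >= W, "read repair must reach a write quorum"
    return value

def quorum_write(key, new_value, replicas):
    # The part the comment asks about: the writer first reads a quorum to
    # learn the latest version, then writes with a higher version number,
    # so its write is ordered after everything that read quorum has seen.
    latest_version = max(rep.read(key)[0] for rep in replicas[:R])
    acks = sum(rep.write(key, latest_version + 1, new_value) for rep in replicas)
    assert acks >= W, "write must reach a write quorum"
```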
Another thing that confused me: in System Design Interview by Alex Xu, it's mentioned that in order to have strong consistency, we need to make sure every write operation is finished before any gets, which is why they chose eventual consistency.
It really seems to me that we only need a quorum. Is the book just wrong in this case?
I do not understand, how can client 1 get "OK" from only one node in a quorum of three at 10:23?
the "OK" from node A only means the write to node A is successful, the entire write operation(set(x, v1)) is not done yet, as the progress bar is long in the figure. I think the key insight here is that all the set and get operations are happening concurrently.
I have an ultimate question that has puzzled me for a long time, since I first went through these courses: if we can simply use vector clocks or even Lamport clocks (which provide TOB) plus read repair to achieve linearizability in leaderless replication, why do we bother with complicated algorithms such as Raft or Paxos just to achieve linearizability in single-leader replication?
In the DDIA book, there is a quote on page 350: "Linearizability is the same as total order broadcast? Not quite!". Would you say Linearizability is "stronger" than total order broadcast since the former can be built from the latter?
Fault Tolerant FIFO-Total Order Broadcast Distributed Datastore with Linearizability + Compare-and-set
Hi Martin, thank you for the lecture.
One question regarding quorums and linearizability. In your example of a non-linearizable quorum, the write was asynchronously replicated, which can technically violate the quorum safety condition, as the write wasn't acknowledged by a quorum of nodes.
What if the quorum is used with synchronous replication, such that the write is not committed before receiving such acknowledgement from the quorum? In that case, a read quorum will always find the most recent write.
Any thoughts?
Answering my own question:
I figured out that the operations are happening concurrently, so client 1 is still pending an ack from the quorum.
@@nadaralpenidze9549 Yes, these are concurrent, but is it fine to modify values on other nodes in between with read repair? The final operation can still end up failing. And say after you update with read repair, another client's get can see a quorum of reads that is not consistent with the actual outcome once the original write fails.
In the last slide, who do we return to?
Thanks Martin, this video is very helpful 🙏
Does this mean Raft is linearizable? Also, what is the point of two-phase commit if total order broadcast already guarantees linearizability?
Thanks.
However, something is really unclear at 11:16: what happens if client 2 reads from B, C instead of A, B? It will get v0 instead of v1. Is this still considered a strongly consistent system?
I think that in that case both Client 2 and Client 3 will get the same values, and this scenario appears to be valid. I believe the system is consistent if a read operation that follows the first read operation gets the same value, considering that we have no writes in between. Thus, the system changes its state from the clients' perspective either when the value is updated on (n+1)/2 nodes or when the clients detect the old value and update it.
The reads are considered concurrent with the write, because we haven't finished the quorum write yet.
So it's not a violation of linearizability to return the old value for any concurrent read.
But it would be a violation of linearizability if a read returned the new value, then a subsequent read returned the old value, because we have gone "back in time". Once any read sees the new value, every subsequent read must return it.
@@Ynno2 OK, so in either case, what if the write operation failed on the other nodes after they saw the v1 value? Would that still be right for consistency or linearizability?
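To make the rule from the reply above concrete ("once any read sees the new value, every subsequent read must return it"), here is a toy trace check. It only tests that one condition, not full linearizability, and the timestamps are made up for illustration:

```python
# Toy check of the "no going back in time" rule on a register that is
# being changed from v0 to v1. Each completed read is (start, end, value).
# Reads that overlap the write may legally return either value; the rule
# only forbids returning v0 after some read returning v1 has finished.

def went_back_in_time(reads, old="v0", new="v1"):
    finish_times_of_new_reads = [end for (start, end, val) in reads if val == new]
    if not finish_times_of_new_reads:
        return False
    earliest_new_finish = min(finish_times_of_new_reads)
    # Violation: a read of the old value that starts strictly after some
    # read of the new value has already finished.
    return any(val == old and start > earliest_new_finish
               for (start, end, val) in reads)

# The problematic schedule from the video: client 2 reads v1 and finishes,
# then client 3 starts later and still reads v0.
print(went_back_in_time([(3, 5, "v1"), (6, 8, "v0")]))  # True (violation)
# If client 3 had started before client 2 finished, either value is fine.
print(went_back_in_time([(3, 5, "v1"), (4, 8, "v0")]))  # False
```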
At 14:06, the quorum wouldn't become linearizable if client 2's request had ended up at B and C, right? Which basically means client 2 and client 3 would not see the change written by client 1.
Yes, I tend to agree with that. This sounds like "brute forcing" quorums to be linearizable. This is an example of how it can be linearizable, but your question is an example of how it can be non-linearizable. Generally, "linearizable" quorums are going to be O(n) broadcasts for one node, and if another quorum node also tries to broadcast a new value, you risk having O(n^2) writes.
Does any linearizable storage implicitly require a total order? Or is total order broadcast just one particular approach that makes use of a total order to solve the problem?
At 18:37, for Linearizable CAS, on delivering (CAS,x,old,new) the update only happens if the replica has localState[x] = "old".
What if the replica has very old data and has the value "older", where "older" was the value of x before it was set to "old" at some point in the past?
In this case the value on the replica is not going to be updated from "older" to "new".
This would never happen with a single leader. The single leader gets to write first, and for each write it broadcasts CAS(x, old = x, new).
@@2tce Yes, but what if the previous write (value = old) sent to this replica was lost and the replica still has the value "older" for x? There is no guarantee that writes will succeed, only that eventually they will converge.
@@QDem19 That will not happen, because unless a node executes operation X, it can't execute operation X+1. This is the delivery guarantee given by total order broadcast.
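For reference, the delivery handler discussed in this thread looks roughly like the sketch below (modelled on the algorithm at 18:37, with the total order broadcast layer faked as a single shared log so the snippet runs on its own; a real system would use an actual TOB/consensus protocol):

```python
# Sketch of linearizable compare-and-swap on top of total order broadcast,
# following the structure of the algorithm at 18:37. The broadcast layer
# is faked as one shared log that every replica applies in the same order.

class Replica:
    def __init__(self):
        self.state = {}                 # this replica's key-value store

    def deliver(self, msg):
        op, key, old, new = msg
        assert op == "CAS"
        # Every replica delivers the same messages in the same order, so a
        # replica cannot reach this CAS while still missing an earlier
        # write: that write is simply an earlier message in the log, which
        # is the delivery guarantee mentioned in the reply above.
        if self.state.get(key) == old:
            self.state[key] = new
            return True                 # CAS succeeded on this replica
        return False                    # value was not `old`; CAS fails

log = []                                # stand-in for total order broadcast

def cas(key, old, new, replicas):
    log.append(("CAS", key, old, new))  # "broadcast" the operation
    results = [r.deliver(log[-1]) for r in replicas]
    return results[0]                   # originator reports back to client

replicas = [Replica() for _ in range(3)]
print(cas("x", None, "v1", replicas))   # True: x was unset, now v1
print(cas("x", "v0", "v2", replicas))   # False: x is v1, not v0
```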
What other viewpoint is there other than the client's view? Does anything work, or do we just keep talking and finally say "I never said it would work"?
amazing presentation thank you
Don't we also have to guarantee atomic writes to ensure linearizability, even with quorum reads and writes? For example, if the quorum write operation failed because it only wrote to one node, and a subsequent quorum read just happened to pick up the value from that one node and conducted read repair, then the write actually failed, but the value of the write did take effect on the system.
I was about to ask the same question. Did you get any answer for this? I think it is not correct to start read repair before the full operation takes effect.
Is this because we don't have any leader in this case, and finally TOB would communicate the effects?
Thank you man. A cool video. So ZooKeeper is an eventually consistent system even if it says that it is linearizable?
Excellent explanations!
How about the case where the requests from client 2 and client 3 land on the mentioned replicas, but at the same time? Client 2's repair would not have finished, so client 3 would still get the outdated value, wouldn't it? Or am I missing something here?
If client3 started before client2 finished, whether client3 reads v0 or v1 doesn't really matter, they both satisfy linearizability. It only matters if client3 started after client2 has finished, and client3 must read v1 in this case to satisfy linearizability.
@@DanielQiu thanks for the explanation!
Excellent lectures but ads are really annoying
"not linearizable, despite quorum reads/writes" - I don't believe this is a valid quorum read in the example shown. Quorum read requires majority nodes to return the same view. Therefore, for client2 get(x) to return v1, the set(x, v1) from client1 must have completed on 2/3 nodes.
If this is the case, and client3 get(x) started after client2 get(x) completed, client3 would also see v1 in 2/3 nodes.
the key insight here is all the read and write operations are happening concurrently!
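On the R + W > N condition mentioned a few comments up: with N = 3 nodes, W = 2 and R = 2, every possible read quorum overlaps every possible write quorum, which is all the quorum argument needs. A quick brute-force check (illustrative only):

```python
# Brute-force check that every read quorum intersects every write quorum
# when R + W > N (here N = 3 nodes, W = 2, R = 2).
from itertools import combinations

nodes = {"A", "B", "C"}
R, W = 2, 2

always_overlap = all(set(write_q) & set(read_q)
                     for write_q in combinations(nodes, W)
                     for read_q in combinations(nodes, R))
print(always_overlap)  # True: each pair of quorums shares at least one node
```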
Blind writes have to be idempotent... don't they?
Anyone else noticed the sound effect at 5:55? :)
no ! I guess I am watching the video from another replica :P
@@OmarQunsul lol. hahaa