Got a job because of you... you changed my life... thank you
As soon as I find enough time I'm going to go through all the series. Thank you for making the effort.
Thanks for putting these lectures on YouTube. Education should be accessible to all.
I was reading your book and got tired at the beginning of chapter 8 then I found your TH-cam channel while trying to watch some videos before I dig into the chapter! Thanks for all your work in making this field more understandable.
Mr. Kleppmann , I love your book and the way you explain things in your videos. Thank you so much for creating this material.
Wow, I didn't know Martin has a YT channel. Instant subscribe.
Delighted to watch the series. Thanks for creating this. I am already grateful to you because of "DDIA"
Great lecture, straight to the point. Thanks for the effort put into it and the adequate way of explaining it.
"Consistency" [0:11]
ACID
Read-after-write-consistency (lecture 5)
Replication
Consistency model
Distributed transactions [2:26]
Atomic commit versus consensus [4:47]
consensus: one or more nodes may propose a value | atomic commit: every node votes whether to commit or abort
consensus: any one of the proposed values is decided | atomic commit: must commit if all vote to commit, abort otherwise
consensus: crashed nodes are tolerated | atomic commit: must abort if any participating node crashes
Two-phase commit (2PC) [6:33]
(key moment) [9:45]
The coordinator in two-phase commit [10:25]
Fault-tolerant two-phase commit (1/2) [12:58]
Fault-tolerant two-phase commit (2/2) [16:43]
Legend!! I'm passing this course because of this playlist. The whole of distributed systems in one day, thanks to you.
Come on now, Abdo.
@@iyadelwy1500 😂😂😂 I swear I left it there so you'd see it.
Crystal clear explanation. Hats off to you, Martin!
Great lectures, Great book, Great author 👍
super illustrative. Thank you!
Grateful for the amazing lecture! Finally I have some sense of how Raft works.
@martin Kleppmann, thanks for the interesting presentation all the way from Cambridge. I'd like to suggest that we could update the linearizable CAS to:
IF old = new THEN
success := true
There is no point comparing the old and new values if they are the same. :)
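To see what the suggestion above is getting at, here is a minimal sketch of a linearizable compare-and-swap register. The `Register`/`cas` names and the lock standing in for linearizability are illustrative assumptions, not from the lecture; the point is that when `old == new`, the CAS succeeds but the write is a no-op.

```python
import threading

class Register:
    """A single shared value with a linearizable compare-and-swap."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for linearizability here

    def cas(self, old, new):
        """Atomically set the value to `new` iff it currently equals `old`.
        Returns True on success, False otherwise."""
        with self._lock:
            if self._value == old:
                self._value = new  # when old == new, this write changes nothing,
                                   # which is the commenter's observation
                return True
            return False

r = Register(5)
assert r.cas(5, 7) is True    # value was 5, now 7
assert r.cas(5, 9) is False   # value is 7, so this CAS fails
assert r.cas(7, 7) is True    # old == new: succeeds but is a no-op
```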
you my guy are a gem of humanity
Very nice and detailed video. I would love to see your three-phase commit explanation.
Amazing lectures. Thank you so much. You are a god.
Hope you can make a video to explain three phase commit and how it improves fault tolerance.
What does the failed replica do when it comes up?
Big thanks
Atomic commitment is completely different from atomicity in ACID. For example, if students and classes are handled on different nodes, then after all participants have voted yes and the coordinator sends the commit messages, there will be a moment when the student has enrolled in a class but the class does not yet exist, or vice versa. This is completely different from "atomic" in ACID.
Hello, the video is very helpful, but I hope my question can be clarified.
Does the coordinator node care whether the other nodes have committed successfully? If it does, and a node fails to commit, does the coordinator make a second decision and send an abort to all the nodes?
Very clear explanation!
Hi Martin, you said the failure detector can run on any node. So my question is: what happens if the node on which the failure detector is running goes down or crashes? And then how do we detect how many other nodes have also crashed?
Am I right in understanding that we can use Raft to send total order broadcasts and elect new coordinators for node communication, and two-phase commit for committing data?
Question: Why is the "prepare" message necessary if replicas "ack" on the original transaction message?
Good explanation! Thank you!
Hi Martin, thanks for this amazing series. I have a question here. If for any replica there are conflicting answers (one sent by the replica itself and another sent by a different node on its behalf, suspecting the replica is down) at around the same time, shouldn't we take the later decision instead of the first one? If some other node said "no" on this replica's behalf, and then the replica recovers and says "yes", taking the later decision seems more logical. The same is true in the opposite case.
At first glance that approach is appealing, since it appears to be the safest, avoiding any confusion by taking the most conservative default position. However, it isn't actually necessary, by virtue of the way total order broadcast works. What matters is the relative delivery order of the slow or recovered replica's own "yes" vote and the "no" votes cast on its behalf: every node counts only the first vote delivered for each replica and ignores any later ones. So if the "no" votes on its behalf are delivered before the slow node's own "yes", every node counts the "no" and the transaction aborts; if the "yes" happens to be delivered first, every node counts the "yes". Because all nodes deliver the votes in the same order, they all reach the same decision either way.
What's not entirely clear from the video is precisely when consensus is considered to have been reached, and if or how this is then communicated among the nodes. Presumably, if all the other nodes have already settled on aborting before the "yes" vote is delivered from the slow one, that decision is not invalidated. The previous video in this series may expand on this.
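The first-vote rule from the lecture can be sketched as a small deterministic function over the delivery order. This is an illustrative model (the `decide` function, vote tuples, and replica names are my own naming, not from the video): because every node runs the same function over the same delivered sequence, every node reaches the same verdict.

```python
def decide(delivered_votes, replicas):
    """Commit iff the *first* delivered vote from every replica is 'yes'.
    Later votes for the same replica (e.g. a slow node's own vote arriving
    after a 'no' cast on its behalf) are ignored."""
    first_vote = {}
    for replica, vote in delivered_votes:
        first_vote.setdefault(replica, vote)  # keep only the first vote seen
    if len(first_vote) < len(replicas):
        return None  # still waiting for some replica's first vote
    return "commit" if all(v == "yes" for v in first_vote.values()) else "abort"

# Node C is slow; A votes 'no' on C's behalf, then C's own 'yes' arrives late.
votes = [("A", "yes"), ("B", "yes"), ("C", "no"), ("C", "yes")]
assert decide(votes, {"A", "B", "C"}) == "abort"  # the late 'yes' is ignored
```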
thank you
This is fantastic ! thank you so much :)
Does fault-tolerant 2PC mean the coordinator is redundant and can be removed?
Why does the client open a transaction simultaneously on two nodes in 2PC? Shouldn't the transaction be opened on the master node only?
What if the nodes reply to the coordinator that they can perform the transaction and send out the OK message (in response to prepare), but crash right after sending it? I assume these nodes replicate their data (via consensus), so even in the face of failure another leader will get elected. I understand how total order broadcast works via Raft, but I don't understand how the data is locked.
If the database went down after it had agreed to commit, what would you do?
What happens if one of the nodes has sent OK for prepare but crashes while waiting for all the OKs? The transaction will go forward on all the other nodes.
One potential solution to this problem is to have a recovery mechanism for the node when it comes back up.
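A minimal sketch of what such a recovery mechanism could look like, under assumptions of my own (the `recover` function, a per-transaction state log, and an `ask_peer_outcome` callback are all illustrative): a node that crashed between prepare and commit resolves any still-prepared transactions by asking a peer what was decided.

```python
def recover(node_log, ask_peer_outcome):
    """On restart, resolve any transaction left in the 'prepared' state in
    the local log by asking a peer (or the coordinator) for the decision."""
    for txid, state in node_log.items():
        if state == "prepared":  # crashed between prepare and commit/abort
            node_log[txid] = ask_peer_outcome(txid)  # 'commit' or 'abort'
    return node_log

# tx2 was prepared when the node crashed; the peer says it was committed.
log = {"tx1": "committed", "tx2": "prepared"}
resolved = recover(log, lambda txid: "commit")
assert resolved["tx2"] == "commit"
assert resolved["tx1"] == "committed"  # already-decided entries are untouched
```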
One potential solution is to have a recovery mechanism for the node when it comes back up.
Thank you for going over this.
I have a question regarding slide 2 of the fault-tolerant 2PC. Which node takes the decision on the fate of the transaction: is it the current term leader of the total order broadcast, or can it be any node participating in the transaction?
It seems like it should be the former, i.e. the current term leader, but I just wanted to be sure.
Each node can independently determine whether the distributed transaction failed: every node receives the same sequence of messages, and the algorithm used to determine the outcome is deterministic.
So all the nodes will reach the same conclusion without the need of a coordinator.
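To illustrate the point in this reply, here is a toy deterministic rule applied to the delivered message log (the `verdict` function and message tuples are assumptions of mine, not the lecture's exact algorithm): given the same log, any node computes the same result, so no coordinator is needed to announce the outcome.

```python
def verdict(messages):
    """Deterministic rule over the delivered message sequence: commit only
    if every participant's vote is 'yes'; otherwise abort."""
    votes = [m for m in messages if m[0] == "vote"]
    return "commit" if votes and all(v == "yes" for _, _, v in votes) else "abort"

log = [("vote", "n1", "yes"), ("vote", "n2", "yes"), ("vote", "n3", "no")]
# Total order broadcast delivers this same log to every node, so every node
# independently computes the same verdict:
assert verdict(log) == "abort"
assert verdict(log) == verdict(list(log))  # same input, same output, anywhere
```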
@@giorgiobuttiglieri5876 When you say each node receives the same sequence of messages — how is the "sequence" guaranteed to be the same on every node?
@@jainamm5307 For the proposed fault-tolerant version of the 2PC, we use total order broadcast as communication primitive.
So by definition all nodes receive the same messages in the same order.
If you are interested in how to achieve this, there are other videos on this channel that explain it very well.
So is the coordinator used for decision making on commits, and the total order broadcast system just a backup in case the coordinator crashes?
I still don't get how, with geographically distributed nodes (with different ping/latency to each other), total order broadcast can prevent a (very rare and unlikely) race condition where 5 of 10 nodes get the failure detector's abort message a fraction of a second before the sluggish node sends its vote to go ahead and commit, while the other 5 of 10 nodes see the opposite ordering.
If it happens at exactly the same time, due to network latency effects, you could have a split of the network (5 nodes with low ping to the failure detector, and 5 nodes with low ping to the sluggish node but high ping to the failure detector). So in that case, do you just go with majority rule and always have an odd total number of nodes to decide which is the truer version of history? But now we're into three phases, not two...
So is this like a shitty version of the Raft protocol that assumes zero network latency?
Total order broadcast requires consensus and if only 5/10 nodes have agreed then there's no quorum and no consensus. Neither event will be actionable until n/2+1 nodes have received it. If there is a 50%/50% split, neither side of the split will make any decisions (nothing will be committed and everything will grind to a halt) until the partition is resolved.
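The quorum arithmetic in the reply above can be spelled out in a few lines. This is a sketch of the standard majority-quorum rule; the function names are illustrative.

```python
def quorum(n):
    """Minimum number of nodes needed for a majority quorum of n nodes."""
    return n // 2 + 1

def actionable(acks, n):
    """An event is only actionable once a majority has received it; with a
    50/50 split, neither side can proceed."""
    return acks >= quorum(n)

assert quorum(10) == 6
assert actionable(5, 10) is False  # a 5/10 split cannot decide anything
assert actionable(6, 10) is True
```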
I thought consensus is used in databases, but it looks like consensus can't solve the atomic commit problem. Can anyone explain the real applications of consensus?
Consensus achieves total order broadcast, i.e. all nodes deliver messages/operations in the same order.
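A toy model of how consensus yields total order broadcast, under assumptions of my own (the class and method names are illustrative, and the consensus module that decides each slot is stubbed out): nodes agree on the message for each sequence slot, then deliver slots strictly in order, so every node sees the same sequence.

```python
class TotalOrderBroadcast:
    """Toy model: a consensus module (assumed, not shown) decides the message
    for each slot; every node delivers slots 0, 1, 2, ... in order."""
    def __init__(self):
        self.decided = {}   # slot -> message, filled in by consensus decisions
        self.next_slot = 0  # next slot this node is allowed to deliver

    def on_decide(self, slot, msg):
        self.decided[slot] = msg

    def deliver_ready(self):
        """Deliver every message whose slot is next in sequence."""
        out = []
        while self.next_slot in self.decided:
            out.append(self.decided[self.next_slot])
            self.next_slot += 1
        return out

tob = TotalOrderBroadcast()
tob.on_decide(1, "b")             # slot 1 decided first (out of order)
assert tob.deliver_ready() == []  # cannot deliver yet: slot 0 undecided
tob.on_decide(0, "a")
assert tob.deliver_ready() == ["a", "b"]  # delivered in slot order
```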
Discourse from "distributed systems" God himself.
Very nice 👌
"Reasonably simple way"... Yeah. That's what I thought
I have one more question. Do we wait for an "OK" message from all the replicas, or do we commit on a replica as soon as it sends "OK"? If we wait for all the replicas, that makes sense; but if we commit as soon as one replica says "OK", it may lead to inconsistency. For example, if one replica sends "OK" and we commit the change there, but another replica crashes and never sends "OK", the two replicas will be inconsistent.
Wouldn't it make more sense to use a queue as a helper for the coordinator?
First cut the fucking hair xD next recording...