Got a job because of you... you changed my life... thank you
As soon as I find enough time I'm going to go through all the series. Thank you for making the effort.
I was reading your book and got tired at the beginning of chapter 8, then I found your YouTube channel while trying to watch some videos before I dig into the chapter! Thanks for all your work in making this field more understandable.
Mr. Kleppmann , I love your book and the way you explain things in your videos. Thank you so much for creating this material.
Thanks for putting these lectures on YouTube; education should be accessible to all.
"Consistency" [0:11]
ACID
Read-after-write consistency (lecture 5)
Replication
Consistency model
Distributed transactions [2:26]
Atomic commit versus consensus [4:47]
Consensus | Atomic commit
one or more nodes propose a value | every node votes whether to commit or abort
any one of the proposed values is decided | must commit if all vote to commit, otherwise abort
crashes tolerated | must abort if any participating node crashes
Two-phase commit (2PC) [6:33]
(key moment) [9:45]
The coordinator in two-phase commit [10:25]
Fault-tolerant two-phase commit (1/2) [12:58]
Fault-tolerant two-phase commit (2/2) [16:43]
Delighted to watch the series. Thanks for creating this. I am already grateful to you because of "DDIA"
Great lecture, straight to the point. Thanks for the effort put into it and the adequate way of explaining it.
Legend!! I'm passing this course cuz of this playlist, the whole of distributed systems in 1 day thanks to you
Come on already, Abdo.
@@iyadelwy1500 😂😂😂 I swear I left it there just so you'd see it.
Crystal clear explanation. Hats off to you, Martin!
Great lectures, Great book, Great author 👍
Wow, I didn't know Martin has a YT channel. Instant subscribe.
you my guy are a gem of humanity
super illustrative. Thank you!
What does the failed replica do when it comes up?
Grateful for the amazing lecture! Finally getting some idea of how Raft works.
Hello, the video is very helpful, but I hope my question can be clarified.
Does the coordinator node care whether the other nodes have committed successfully or not? If it does, and a node failed to commit, does the coordinator make a second decision and send an abort to all the nodes?
Please correct me if I missed something. Before the client sends the commit message to the coordinator to start two-phase commit, it performs the normal transaction writes on the replicas. My confusion is: if a problem happens during that write, i.e. one replica performs the write but another fails, how does two-phase commit help? I thought the entire point of two-phase commit was to perform the write in the prepare phase and the commit in the commit phase, so why do we allow the normal update to both replicas before 2PC starts?
Am I right in understanding that we can use Raft to send total order broadcasts and elect new coordinators for node communication, and two-phase commit for committing data?
Question: Why is the "prepare" message necessary if replicas "ack" on the original transaction message?
Thank you sir, a grateful new subscriber.
Hi Martin, you said the failure detector can run on any node. So my doubt is: what happens if the specific node on which the failure detector is running goes down or crashes? And then how do we detect how many other nodes have also crashed?
@martin Kleppmann, thanks for the interesting presentation all the way from Cambridge. I'd like to suggest that we could update the Linearizable CAS to:
IF old = new THEN
success := true
There is no point in comparing the old and new values if they are the same. :)
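(For reference, here is a minimal sketch of compare-and-swap as it is usually defined; this is just my reading of the standard semantics, not necessarily the exact pseudocode from the slides. The comparison is between the register's current value and the caller's expected old value; new is only what gets written on success.)

# Sketch of a compare-and-swap register (standard semantics assumed,
# not necessarily the lecture's exact pseudocode).
class CASRegister:
    def __init__(self, value=None):
        self.value = value

    def cas(self, old, new):
        # Atomically: write `new` only if the current value equals `old`.
        if self.value == old:
            self.value = new
            return True    # success
        return False       # someone else changed the value first

# Example: the second CAS fails because the value is no longer 0.
r = CASRegister(0)
print(r.cas(0, 1))   # True
print(r.cas(0, 2))   # False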
What if the nodes reply to the coordinator that yes, we can perform this transaction, and send out the OK message (in response to the prepare), but after sending it they crash? I assume these nodes will replicate the data (via consensus), so even in the face of failure another leader will get elected. I do understand how total order broadcast works via Raft, but I am unable to see how the data is locked.
Amazing lectures. Thank you so much. You are a god.
Why is the client opening the transaction simultaneously on 2 nodes in 2PC? Shouldn't the transaction be opened on the master node only?
Hope you can make a video explaining three-phase commit and how it improves fault tolerance.
Does fault-tolerant 2PC mean the coordinator is redundant and can be removed?
Very nice and detailed video. I would love to see your three-phase commit explanation.
Very clear explanation!
Hi Martin, thanks for this amazing series. I have a question: if there are conflicting answers for a replica (one sent by the replica itself, and the other sent by another node on behalf of the replica, suspecting it is down) around the same time, shouldn't we take the later decision instead of the first one? If some other node said "No" on this replica's behalf, and then the actual replica recovers and says "Yes", taking the later decision looks more logical. The same is true in the opposite case.
At first glance, that approach is appealing, since it appears to be the safest, avoiding any confusion by taking the most conservative default position. However, it isn't actually necessary, by virtue of the way total order broadcast works. Everything comes down to the relative order in which the votes are delivered. The slow or recovered replica's own "Yes" and the "No" sent on its behalf are both broadcast, and every node delivers them in the same order; each node counts only the first vote it delivers for a given replica, so whichever of the two is delivered first determines the outcome, and all nodes reach the same decision.
What's not entirely clear from the video is precisely when consensus is considered to have been reached, and if or how this is then communicated among the nodes. Presumably, if all the other nodes have already settled on the decision not to proceed before the "Yes" vote from the slow node is delivered, then that decision is not invalidated. The previous video in this series may expand on this.
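To make the ordering argument concrete, here is a minimal sketch of the decision rule, assuming (as the fault-tolerant 2PC idea in the lecture suggests) that every vote goes through total order broadcast and that only the first vote delivered for each replica counts; the names and data shapes here are my own, not from the video.

# Sketch: deciding a transaction from a totally ordered stream of votes.
# Assumption: only the first delivered vote per replica counts, and total
# order broadcast delivers the votes in the same order at every node.
def decide(delivered_votes, replicas):
    # delivered_votes: list of (replica_id, vote) in delivery order,
    # where vote is 'commit' or 'abort'.
    first_vote = {}
    for replica_id, vote in delivered_votes:
        if replica_id not in first_vote:   # later/conflicting votes ignored
            first_vote[replica_id] = vote
    # Simplification: a real implementation would wait until every replica
    # has a counted vote before deciding.
    if len(first_vote) == len(replicas) and all(v == 'commit' for v in first_vote.values()):
        return 'commit'
    return 'abort'

# B's own late 'commit' is ignored because an 'abort' was delivered first
# on its behalf; every node sees the same order, so every node aborts.
votes = [('A', 'commit'), ('B', 'abort'), ('B', 'commit')]
print(decide(votes, ['A', 'B']))   # 'abort'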
If the database went down after it had agreed to commit, what would you do?
What happens if one of the nodes has sent OK for the prepare, but it crashes while waiting for all the OKs? The transaction will go forward on all the other nodes.
One potential solution to this problem is to have a recovery mechanism for the node when it comes back up.
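As a rough illustration of what such a recovery step could look like, here is a sketch under the assumption that the participant durably logged its 'prepared' state before replying OK, and can ask the coordinator (or the other participants) for each transaction's outcome; all the names here are hypothetical, not from the video.

# Sketch: crash recovery for a 2PC participant (hypothetical names).
# Assumption: 'prepared' was written to a durable log before replying OK.
def recover(durable_log, get_outcome, commit, roll_back):
    # durable_log: dict mapping txn_id -> 'prepared' | 'committed' | 'aborted'
    # get_outcome(txn_id): asks the coordinator or peers for the decision
    for txn_id, state in durable_log.items():
        if state == 'prepared':                  # in doubt: we voted OK but
            if get_outcome(txn_id) == 'commit':  # never saw the decision
                commit(txn_id)
            else:
                roll_back(txn_id)
        # 'committed' / 'aborted' entries were already resolved before the crash

# Example: one in-doubt transaction whose outcome turns out to be commit.
log = {'t1': 'prepared', 't2': 'committed'}
recover(log, lambda t: 'commit',
        lambda t: print(t, 'committed'),
        lambda t: print(t, 'rolled back'))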
Thank you for going over this.
I have a question regarding slide 2 of the fault-tolerant 2PC. Which node takes the decision on the fate of the transaction: is it the current term leader of the total order broadcast, or can it be any node participating in the transaction?
It seems like it should be the former, i.e. the current-term leader, but I just wanted to be sure.
Each node can independently work out whether the distributed transaction failed: each node receives the same sequence of messages, and the algorithm used to determine whether the transaction failed is deterministic.
So all the nodes will reach the same conclusion without the need for a coordinator.
@@giorgiobuttiglieri5876 When you say each node receives the same sequence of messages, how is the "sequence" guaranteed to be the same on every node?
@@jainamm5307 For the proposed fault-tolerant version of 2PC, we use total order broadcast as the communication primitive.
So by definition all nodes receive the same messages in the same order.
If you are interested in how this is achieved, there are other videos on this channel explaining it very well.
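A tiny sketch of why same-order delivery plus a deterministic algorithm is enough for every node to reach the same conclusion on its own (my own illustration, not code from the lecture):

# Sketch: replicas that apply the same operations in the same order end up
# in the same state; in a different order they may not.
def apply_all(ops, state=0):
    for op, amount in ops:
        state = state + amount if op == 'add' else state * amount
    return state

log = [('add', 5), ('mul', 3)]          # order fixed by total order broadcast
print(apply_all(log), apply_all(log))   # every replica computes 15
print(apply_all(list(reversed(log))))   # a different order would give 5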
Good explanation! Thank you!
Atomic commitment is completely different from atomicity in ACID. For example, if students and classes are handled on different nodes, then after all components have voted yes and the coordinator sends the commit messages, there will be a moment when the student has enrolled in a class but the class does not yet exist, or vice versa. This is completely different from "atomic" in ACID.
This is fantastic ! thank you so much :)
Big thanks
I still don't get how, with geographically distributed nodes (with different ping/latency to each other), total order broadcast can prevent a (very rare and unlikely) race condition where 5 of 10 nodes get the failure detector's abort message fractions of a second before the sluggish node sends its vote to go ahead and commit, while the other 5 of 10 nodes see the opposite ordering.
If it happens at exactly the same time, due to network latency effects, you could have a split of the network (5 nodes with low ping to the failure detector, and 5 nodes with low ping to the sluggish node but high ping to the failure detector). So in that case do you just go with majority rule, and always have an odd total number of nodes to decide which is the true(r) version of history? But now we are into 3 phases, not 2 phases...
So is this like a shitty version of the Raft protocol or something, where it assumes 0 network latency?
Total order broadcast requires consensus, and if only 5 of 10 nodes have agreed then there is no quorum and no consensus. Neither event will be actionable until n/2+1 nodes have received it. If there is a 50/50 split, neither side of the split will make any decisions (nothing will be committed and everything will grind to a halt) until the partition is resolved.
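A quick worked example of the quorum arithmetic behind that (just the usual majority rule, nothing specific to this video):

# Majorities of the same cluster always overlap, so two conflicting
# decisions cannot both reach a quorum.
n = 10
quorum = n // 2 + 1    # 6: any two majorities of 10 nodes share at least one node
print(quorum)          # 6
# A 5/5 split leaves no group of 6 nodes, so neither side can decide
# anything until the partition heals.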
So is the coordinator used for decision making on commits, and the total order broadcast system just a backup in case the coordinator crashes?
thank you
I thought consensus is used in databases, but it looks like consensus can't solve the atomic commit problem. Can anyone explain the real application of consensus?
Consensus achieves total order broadcast, i.e. all nodes deliver messages/operations in the same order.
Discourse from "distributed systems" God himself.
Very nice 👌
I have one more doubt. Do we wait to get an "OK" message from all the replicas, or do we commit to a specific replica as soon as we receive its "OK"? If we wait for all the replicas, that makes sense, but if we just commit after receiving one "OK" then it may lead to inconsistency. For example, if one replica sends "OK" and we commit the change to that replica, but the other replica crashes and does not send its "OK", then the two replicas will be inconsistent.
"Reasonably simple way"... Yeah. That's what I thought
Wouldn't it make more sense to use a queue as the coordinator, or as a helper for the coordinator?
First cut the fucking hair xD next recording...