That's a really great talk! Daniel must be a terrific teacher.
27:01 [slide 16] note that the C++ standard (C++23 and prior) doesn't provide any assurance that the atomic read-modify-write operations are wait-free; all one can portably check is whether an operation is lock-free (vs. blocking). In fact, popular C++ compilers implement some atomic RMW operations as a lock-free CAS loop even on x86 (e.g. fetch_or when the previous value is needed). This means that it might be impossible to implement portable wait-free algorithms in "pure" C++ at the moment.
Daniel addresses some of this in 1:01:09 and 1:03:37
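For the record, the only portable knob the standard library gives you is the lock-freedom query; a minimal illustration (standard facilities only, nothing from the talk):

    #include <atomic>
    #include <cstdint>
    #include <iostream>

    int main() {
        std::atomic<std::uint64_t> a{0};
        // The only portable query is about lock-freedom, not wait-freedom:
        std::cout << std::boolalpha
                  << std::atomic<std::uint64_t>::is_always_lock_free << '\n';
        // Nothing in the standard says whether these compile to a single
        // wait-free instruction or to a (lock-free but not wait-free) CAS loop:
        a.fetch_add(1, std::memory_order_relaxed);
        std::uint64_t old = a.fetch_or(0x2, std::memory_order_relaxed); // previous value needed
        std::cout << old << '\n';
    }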
I think we should distinguish between "wait-free" as a property of an atomic operation and as a property of an algorithm. Actually, no implementation of CAS is absolutely wait-free, because at the hardware level it involves synchronization between CPU cores to maintain cache coherence.
Pretty mind blowing and really creative solution!
Great talk. wish I could give it two likes.
Couldn't read() just return 1 if it sees that val is zero? Obviously, we can't return 0, because that would allow situations where the value switches from 1 to 0 to 1. But if we just lie and say that the decrement hasn't finished until it has committed the zero flag, read could return 1, and I think this would be entirely consistent for outside observers. One advantage of this is that we wouldn't need the helped flag anymore. Another advantage is that read always just reads and never has to write.
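Something like this is what I mean (just a sketch; the bit layout is my guess at the talk's, with the count in the low bits and the zero flag in the top bit):

    #include <atomic>
    #include <cstdint>

    constexpr std::uint64_t is_zero = 1ull << 63;  // assumed flag position

    std::uint64_t read_without_helping(const std::atomic<std::uint64_t>& counter) {
        std::uint64_t v = counter.load();
        if (v & is_zero)
            return 0;                     // zero has been committed and is permanent
        std::uint64_t count = v & ~is_zero;
        return count == 0 ? 1 : count;    // a decrement from 1 is still in flight:
                                          // pretend this read happened just before it
    }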
@@freax13 But how would we return 0 once the counter really is 0 and the other threads have finished their work?
@@freax13 I thought the same during the talk. I believe you are right, in this case the read linearizes before the decrement if the decrement isn't done yet. It should be perfectly valid.
@@freax13 I had the same thought. Anyone able to poke a hole in this?
Yeah, I wanted to comment the same thing. It seems a lot simpler. Maybe it's not correct somehow, but I don't see how.
If the counter is 0 then it is in the process of being set to zero... which, by the way, could be interrupted by an increment, and it seems weird and incorrect to let a read change the counter. It kinda violates const correctness, and there's another problem:
Let's say there are 3 threads/processes with 3 different priorities: H, M, L (High, Mid, Low).
L tries to decrement to 0, then
M interrupts and tries to increment (prevent setting it to 0), then
H reads. Now H "helps" a lower-priority thread decrement the value to 0. This seems like priority inversion. M should be able to increment the counter before it gets set to 0.
Which would be the case if read just returned 1 when value was 0. It was 1 before the decrement and could still be 1 after the decrement and parallel increment.
I agree! It looks like the bread and butter is not only helping, but lying first, and only helping if lying isn't possible!
4:23 [slide 4] note that the precondition for decrement seems suspicious in a concurrent world - what if the counter is 1 and 2 threads call decrement? The precondition is violated although neither of the threads could "protect itself" from the violation. In practice it's fine because the typical behavior is that every thread that calls decrement does so after having earlier called increment_if_not_zero() and received a true result. Daniel discusses it in 44:10
Great talk Daniel - definitely worthy of the big room 😂
The way to ensure that would be to allow a thread to call decrement iff it constructed the counter or it has previously succeeded at an increment.
@@ZeroPlayerGame Yep. I believe shared_ptr's constructor would construct a counter, weak_ptr::lock() performs increment_if_not_zero and shared_ptr's destructor performs decrement.
@@Roibarkan that's fine though, decrement would serialize after in this case
excellent talk, thanks!
such a great speaker
Talk starts at 3:27.
I can't imagine myself using this technique in a million years, but such an interesting talk!
@@rutabega306 it is the kind of magic we use hidden behind code smart people put into libraries for us :)
RE legality of calling decrement around 44:00, I think "you know the counter is not 0" is accurate but can be made more specific: the ways you know the counter is not 0 are (1) you constructed the counter and have not decremented it yet, or (2) you incremented the counter successfully and have not decremented it yet. If you're in such a code branch, you can decrement the counter.
The logic here can be perfectly static - or the legality of decrementing can be carried around in a variable - but it's almost automatic to get that right, if you've written a logical program.
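For instance, carrying it in an RAII guard (a sketch of my own, not anything from the talk; Counter stands for any type exposing the talk's increment_if_not_zero()/decrement() interface):

    // Remembers whether this thread is allowed to decrement.
    template <class Counter>
    class ref_guard {
    public:
        explicit ref_guard(Counter& c)
            : counter_(&c), owns_(c.increment_if_not_zero()) {}
        ref_guard(const ref_guard&) = delete;
        ref_guard& operator=(const ref_guard&) = delete;
        ~ref_guard() {
            if (owns_ && counter_->decrement()) {
                // last reference went away: destroy the shared object here
            }
        }
        explicit operator bool() const { return owns_; }  // did we get a reference?
    private:
        Counter* counter_;
        bool     owns_;
    };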
At 48:46, would it have been possible to just return 1 in the case where the counter is zero, but without the flags? Would saying that read linearizes before decrement work too? Because if the counter is at zero, then you know that at the beginning of decrement it was at one. If the decrement operation is not done, you could assume it's still one. If another thread increments in the meantime, then that 1 was true. Otherwise, it is as if decrement was called after the read, since it will finish after the load. Am I right to assume this?
I had the same idea and wonder if it would be possible. Could be faster as well.
I wonder if the timing is more consistent with the wait-free counter. I would guess it would be, and that can be useful for real-time systems.
Just want to say that a software engineer named Chip Hogg is epic. Cheers to you, Chip!
Isn't the problem just moving the goalposts, with the top bit now being the thing that needs to be verified? I have only reached 33:55 so I'm not sure if this is going to be addressed, but I suspect there'd be a way to 'elect' or support other threads so they all verify or agree that if one says the top bit is 0 (hence counter = 0), we all agree, and should end our threads or return that the counter is now zero.
Never mind, my comment makes no sense. The helping bit is a sweet touch: you execute code on behalf of another thread and let that thread know 'someone' did the work for it, so go ahead, take responsibility, and delete the shared object. Although I think someone from security should check whether this opens up a whole lot of ways for rogue threads to come in and delete objects when they shouldn't be deleted.
Wait-free algorithms were invented to make Lock-free look simple.
Interesting, I got the same results with my own testing, so I've gone with wait-free for a game development environment, but it looks like a tested swapping system might be best.
Nice talk!
13:22 OMG LIKE THE GAME
What is the specification of the return values of increment_if_not_zero() and decrement()? The lack of comment on this makes the presentation hard to understand. Maybe it's there at some point in the presentation; I would have to look back.
Great
This idea has been used in SQL DBs for like ever. Instead of doing 2 queries wrapped in a transaction:
BEGIN "SELECT ... FOR UPDATE" then "UPDATE ..." and COMMIT, you can do just one query: an UPSERT, i.e. "INSERT ... ON CONFLICT DO UPDATE". And SQL lets you do many things in case the value already existed or reached a certain limit. This video is not about SQL, but the idea here is similar: using the top 2 bits as a way of notifying the writer.
The wait-free implementation seems broken. With the mutex-based implementation we can do -> 1) decrement (returns true) -> 2) increment_if_not_zero (returns false) -> 3) decrement (returns false) -> 4) increment_if_not_zero (returns true). That's all good, and the fourth call correctly returns true since the counter is not zero. Now do the same sequence of 4 calls with the wait-free implementation. In the fourth call, although the counter is not zero, increment_if_not_zero returns false. (The uint64_t has wrapped around, flipping the leftmost bit.) Besides, the wait-free version already has broken invariants/semantics to begin with, since increment_if_not_zero() modifies the internal state of the struct and the actual member of interest (the counter) but returns false and behaves like a no-op.
The precondition (stated on an early slide but easily missed) is that you may only call decrement() if you know the counter can't possibly have reached 0 already. You typically would know this by you yourself having previously called increment() with a return value of "success". That's how refcounts of shared pointers work: you only access a shared object after you successfully incremented the refcount, and in that case when you're done accessing the shared object you decrement the refcount, but if you didn't successfully increment the refcount in the first place, then you wouldn't later try to decrement it. A correct use of this counter could not lead to your hypothetical sequence of calls. As you observe, different implementations of the counter might indeed behave differently under incorrect usage of the counter, but that's moot insofar as one really just needs to know whether different implementations behave the same (i.e., correctly) under correct usage, to then shift one's focus to comparing the different implementations via other metrics (e.g., performance).
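To make that contract concrete, here's a minimal sketch using a mutex-based stand-in counter like the one on the early slides (all names and details here are mine, not the talk's):

    #include <mutex>

    struct counter {
        bool increment_if_not_zero() {
            std::scoped_lock lk(m);
            if (count == 0) return false;   // already dead: refuse to resurrect
            ++count;
            return true;
        }
        bool decrement() {                  // precondition: caller holds a reference
            std::scoped_lock lk(m);
            return --count == 0;            // true => caller was the last owner
        }
    private:
        std::mutex m;
        unsigned long count = 1;            // constructing the counter takes the first reference
    };

    void use_shared(counter& refcount /*, T* shared_obj */) {
        if (!refcount.increment_if_not_zero())
            return;                         // object already released; never touch it
        /* ... safely use the shared object ... */
        if (refcount.decrement()) {
            // last reference gone: destroy the shared object here
        }
    }

Under this contract, decrement() is only ever reached after a successful increment_if_not_zero() (or by the thread that constructed the counter), so the hypothetical sequence above can't occur.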
Useful information in this video starts at 30:59; the entire presentation could be 10 minutes, not 1 hour. The speaker looks like a very young boy, but he is already suffering from the typical school teacher's occupational hazard: he treats the audience as little kids and tries hard to be impressive by building up drama, because he believes he is the smartest person in the class. It's so much better and more professional when the speaker just shares his own story: what problem he faced and how he solved it, and maybe asks the audience how they would solve it at the end.
@@ugood I learned a lot from the first 30 minutes
With an expert audience you could indeed skip the first half, but as the introduction makes clear, this talk is aimed at a more general audience. The title is literally "Introduction to Wait-free Algorithms". I think he did a really great job at explaining the essence of wait-free programming by first starting with a mutex based and CAS loop based implementation and discussing their respective shortcomings.
Keeping the mask and the counter in one variable is a bad idea - the solution is weird.
Moreover, the solution is incorrect; for example, the counter gets broken in the following scenario:
- thread #1 calls increment() - the counter now has a value != 0
- thread #1 calls decrement(): it decrements the value to 0 and is frozen just before the if statement "if (counter.comp_exch(e, is_zero)) {"
- thread #2 executes read() - it loads the value 0 and "helps" by adding the flags is_zero|helped to the counter
- thread #1 executes the if statement - compare_exch(e, is_zero) fails because thread #2 set both flags is_zero|helped. The thread is frozen before executing the next if statement "if ((e & helped) && counter.exch(is_zero)) ..."
- thread #3 executes increment() - the counter now has the two high bits is_zero|helped set, plus a count of 1.
- thread #1 executes the if statement - the e variable has the helped bit set because of the read() call and the compare_exch from the previous if statement, so it executes counter.exchange(is_zero) - the resulting counter value has the is_zero bit set!!!! (the value written by thread #3 is lost)
- thread #3 executes decrement() - the counter is broken.
Testing the solution just by running 1000x threads is also a bad idea - it doesn't guarantee anything....
Yes, I agree that "implementing a correct wait-free algorithm is a nontrivial and complex task".
When thread #3 calls increment(), the returned boolean flag informs thread #3 that the increment attempt was unsuccessful (because the counter is now considered stuck forever at 0, doesn't matter what the low bits happen to be). Since thread #3 knows it didn't successfully increment, it would not later call decrement(). Only threads who got success from increment() are allowed to call decrement().
Why bother with these abstractions, when you have pthreads?
Because pthreads isn't part of standard C++, so code that uses it isn't portable.
Because maybe pthreads isn't enough for you, or it isn't (commonly) available on your platform (see Windows), or you need to take control. This, to me, seems like it complements threads rather than replacing them; your dichotomy is false.
This comment doesn't make any sense... pthreads is equivalent to std::thread, std::mutex, std::condition_variable, etc. This talk is about wait-free programming with atomics.
Surprised to see Daniel using a single-use plastic water bottle.
It should be a glass bottle or his own reusable bottle.
The world is discouraging this and CppCon is encouraging it!!!! The environment is also important.
Great