Very well explained! 👏👏👏 Thank you Sir! 🙏
I have to say, I was kind of confused by the x86 sfence, lfence and mfence instructions. I know the empty inline assembly doesn't give any guarantee in a multithreaded or multiprocessor environment, but I fear that is not clear enough from this video.
My goal was to avoid talking about hardware memory reordering as much as possible because it's a separate issue (and warrants a dedicated discussion). If your instructions have already been scheduled in the wrong order by the compiler, it won't work even on a sequentially consistent machine, so you've lost the war before you've even executed your application.
I'll be doing a video on hardware memory reordering and hw barriers soon!
Cheers,
--Nick
@@NotesByNick Ahh, so you've actually foreseen this problem already. Of course you're miles ahead of me. I should've expected that by now. I just always seem to have to talk about what certain memory primitives do and, more importantly, what they do not do when talking to other programmers.
Memory models are a niche topic that unfortunately many people don't have a good foundation in. It's also a place where intuition can fail you.
Fortunately, x86 has a relatively strict memory model in which stores are not reordered with other stores, so the example is safe. On a platform like ARM, with a much weaker consistency model, you would need a barrier (e.g. the ARM Linux kernel spin lock uses the smp_mb() macro, which expands to the dmb instruction).
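To make the compiler-barrier versus hardware-barrier distinction concrete, here is a minimal sketch in standard C++. It is not code from the video: `payload`, `ready`, `writer`, `reader`, `compiler_only_barrier` and `run_publish_demo` are hypothetical names chosen for illustration. It assumes a C++11 compiler and uses the standard fences rather than inline assembly.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration; none of these come from the video.
int payload = 0;                 // plain data published by the writer
std::atomic<bool> ready{false};  // flag that announces the payload

// Standard counterpart of the empty inline assembly with a "memory" clobber.
// In practice it restrains only the compiler and emits no CPU fence
// instruction, so on its own it is not enough on weakly ordered hardware.
inline void compiler_only_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

void writer() {
    payload = 42;
    // Release fence: restrains the compiler and, where the hardware needs
    // it (e.g. ARM), also emits a barrier instruction.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int reader() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();  // be polite while spinning
    }
    // Acquire fence pairs with the release fence in writer().
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;  // the write of 42 happens-before this read
}

int run_publish_demo() {
    std::thread t(writer);
    const int seen = reader();
    t.join();
    assert(seen == 42);
    return seen;
}
```

On x86 the release and acquire fences typically compile to no instruction at all, which matches the point above that the example is safe there; on ARM the same source should produce a real barrier.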
Can you hold a session on reading binary files as buffers using the fread function? How do you predetermine the buffer structure size for a binary file? Thanks much
Thanks for the suggestion! I'll see if it fits in with any of the other topics I have planned.
Cheers,
--Nick
nice explanation
good stuff man
nice video, but why should we use volatile here?
Good question, I also want to know.
@@chenyu8553 technically, this is an incorrect use of volatile; I believe it should be an atomic with relaxed memory ordering. The thing you want to guarantee is that now_serving becomes visible to other cores. An atomic or volatile guarantees that reads and writes always go through memory and don't use a local register copy. The difference is that CPUs have visibility into other CPUs' memory operations (since memory is shared), whereas registers are private to a logical CPU (hyperthreads get their own copy of the register state but share the compute units with other hyperthreads).
The reason why volatile is not technically correct is a bit subtle and is the reason why C++11 introduced atomics in the first place. Probably the explanation would make for good content @CoffeeBeforeArch!
@@chenyu8553 volatile guarantees intra-thread ordering and inter-thread visibility, but not inter-thread ordering and, thus, not inter-thread synchronization. The order in which other threads see volatile memory accesses is not guaranteed to match the order in which a given thread makes them. This is why atomics and memory_order semantics are required to ensure the desired behavior actually occurs in program execution, as stated by @DoobooDomo below.
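As a rough illustration of what the replies above suggest, here is a sketch of a ticket lock that uses `std::atomic` instead of `volatile`. Only the name `now_serving` comes from the thread; `TicketLock`, `next_ticket`, `count_with_lock` and the counter workload are assumptions made for this example. The replies mention relaxed ordering; this sketch uses acquire on the spinning load and release on the unlock store instead, so that it protects the data without relying on any separate barrier.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Sketch of a ticket lock; only the name now_serving appears in the thread,
// everything else here is a hypothetical stand-in.
class TicketLock {
    std::atomic<unsigned> next_ticket{0};
    std::atomic<unsigned> now_serving{0};

public:
    void lock() {
        // Taking a ticket only needs atomicity, so relaxed is enough here.
        const unsigned my_ticket =
            next_ticket.fetch_add(1, std::memory_order_relaxed);
        // Acquire pairs with the release store in unlock().
        while (now_serving.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();  // avoid burning a core while waiting
        }
    }

    void unlock() {
        // Only the lock holder writes now_serving, so load + store is safe.
        const unsigned next = now_serving.load(std::memory_order_relaxed) + 1;
        now_serving.store(next, std::memory_order_release);
    }
};

// Increments a plain (non-atomic, non-volatile) counter under the lock.
long count_with_lock(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    TicketLock lock;
    long counter = 0;  // protected only by the lock
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `count_with_lock(2, 1000)` should return 2000 if the lock works, because every increment happens inside the critical section.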
Speaking only about software optimization, why don't you just declare shared_value as volatile?
it would be great if you have figured out why he didn't use the volatile for the shared_value as well ... cuz rn i am confused about it
Well, after looking into it in a bit more detail, I have come to the conclusion that declaring shared_value as volatile will not prevent instruction reordering. It would only force the compiler to load the value into a register before each use rather than reuse a previously loaded copy, if any.
LOVE IT. THANK YOU.
There's zero point in creating a software memory barrier without a hardware memory barrier. This talk is highly misleading. You're using undefined behavior of compilers to do this. If you're going to cover memory barriers, you should talk about the standard C++ barriers. Moreover, you're using "volatile" on now_serving incorrectly. Volatile is only for use with hardware IO, and has basically no use for anything outside embedded applications.
Not sure what your point is. The guy has started from the single concept of a software memory barrier and has another video about hardware memory barriers. It is common sense to move step by step for a novice audience. Just because you know this stuff doesn't mean everything should be told all at once. Now stay quiet and check all the videos.
@@267praveen Erg's tone is a little harsh, but he's correct. Even teaching to novices, it is probably a good idea to avoid teaching things that are actually wrong. This use of volatile is incorrect (it should be an atomic with relaxed memory ordering), but it is a common error because that's what people did before C++ had an explicit memory model with atomics etc. (I think this use of volatile is also correct in Java). I agree this stuff is subtle, and made-up (but important) concepts like the "abstract C++ model" make things more difficult for all learners.
To end on a positive note: I really enjoyed Coffee's treatment on different spinning policies in his spinlock playlist.
Didn't know Hearthstone was an advanced topic haha
Hey Nick could you check your email please? I would reallllllly appreciate your help with something. Thank you!!
You should take this video down. Using volatile in this way, shared between threads, is WRONG. It may or may not work, but basically it's undefined behaviour.