Is a memory fence required after the critical section and before setting the flag to false? Could the flag become false before the critical section has finished, due to reordering?
No fence is needed there. Setting flag[tid] to FALSE is a write, so on x86 it will not be reordered with respect to any earlier reads or writes in the critical section. The only reordering modern x86 processors allow is a write with a later read (to a different location) in program order. Reads are not reordered with later writes, and writes are not reordered with later writes (there are a few exceptions, such as non-temporal mov instructions, where writes can be reordered with respect to other writes, but those are not being used here). I go over the section of the Intel Software Developer's Manual that covers this in the previous video of the series. Cheers, --Nick
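For anyone who wants to try this outside the video, here is a minimal C++ sketch of a two-thread Peterson-style lock. It is not the code from the video: `lock`, `unlock`, `worker`, `run_demo`, `turn`, `counter`, and `inside` are all made-up names, and the use of `std::atomic` is an assumption about how one might write it portably. The entry path uses the default sequentially consistent ordering, which x86 compilers typically implement with an `xchg` or a `mov` followed by `mfence`. The exit path uses only a release store, which on x86 is typically a plain `mov` with no fence. The release ordering is still needed at the language level so that the compiler does not move critical-section accesses past the store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names throughout; a sketch, not the code from the video.
std::atomic<bool> flag[2];   // flag[t] == true means thread t wants to enter
std::atomic<int> turn{0};    // which thread gets priority when both want to enter
long counter = 0;            // plain (non-atomic) data protected by the lock
int inside = 0;              // 1 while some thread is in the critical section

void lock(int tid) {
    const int other = 1 - tid;
    // Default seq_cst ordering: the two stores must become visible before
    // the loads in the loop are performed. This is the write-then-read case
    // that x86 is allowed to reorder, so this is where the fence cost goes.
    flag[tid].store(true);
    turn.store(other);
    while (flag[other].load() && turn.load() == other) {
        std::this_thread::yield();  // be polite while spinning
    }
}

void unlock(int tid) {
    // A release store: earlier reads and writes may not move below it.
    // No full fence is requested here.
    flag[tid].store(false, std::memory_order_release);
}

void worker(int tid, int iters) {
    for (int i = 0; i < iters; ++i) {
        lock(tid);
        assert(inside == 0);  // nobody else should be in here
        inside = 1;
        ++counter;
        inside = 0;
        unlock(tid);
    }
}

// Runs both threads and returns the final value of the shared counter.
long run_demo(int iters) {
    counter = 0;
    inside = 0;
    flag[0].store(false);
    flag[1].store(false);
    std::thread a(worker, 0, iters);
    std::thread b(worker, 1, iters);
    a.join();
    b.join();
    return counter;
}
```

Calling `run_demo(n)` from a `main` should return `2 * n` if mutual exclusion holds. Weakening the stores in `lock` to `std::memory_order_release` and the loads to `std::memory_order_acquire` removes the store-to-load ordering guarantee, so the lock would no longer be reliable.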
Great video, but I think what it is really missing is a real example (it would be enough to reference one or link to it). The point is to emphasize that there is a certain set of real-world problems for which, if we want a reasonably decent solution (or sometimes any solution at all), we have to consider using mfence. Roughly, I would consider mfence a last option. That is, I believe you have to be careful when choosing it, because it has its consequences. As you stated (if I remember correctly) in your recent video, x86 CPUs use internal buffers for micro-operations and their data (which, by the way, are also sometimes used for predictions), and the contents of those buffers (more specifically, a queue) are then distributed between the execution blocks of a core (and that distribution may affect code performance by a lot). So an mfence instruction may clear some of the contents of those CPU buffers, and you need to be aware that, if used incorrectly, it may really hurt your code's performance.
Your channel is a goldmine! Just amazing.
hey man, we missed your videos. hope you get back soon.
Missing your content, Coffee. Hope all is well and you come back with the fire content soon!
thanks for all your channel
bro you're a beast! greetings from Mexico!