51:22 Why have an array of atomics? I presume the generated asm will use an atomic instruction for every float. Shouldn't it be much faster to batch the writes to a non-atomic array and guard that batch write?
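For readers following the question above, here is a minimal sketch of what an array of atomics with relaxed ordering can look like. The names `AtomicFloats`, `write_all`, `read_one`, `floats_are_lock_free`, and the size `kNumValues` are made up for illustration and are not taken from the talk. On many mainstream targets a relaxed load or store of a lock-free `std::atomic<float>` compiles to an ordinary move, but the standard does not guarantee that, so it is worth checking on your own platform.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical size, chosen only for this sketch.
constexpr std::size_t kNumValues = 8;

using AtomicFloats = std::array<std::atomic<float>, kNumValues>;

// Writer side: each element is stored atomically on its own.
// The array as a whole is NOT updated atomically, so a concurrent
// reader may observe a mix of old and new elements.
void write_all(AtomicFloats& dst, const std::array<float, kNumValues>& src) {
    for (std::size_t i = 0; i < kNumValues; ++i) {
        dst[i].store(src[i], std::memory_order_relaxed);
    }
}

// Reader side: a single element is read without tearing.
float read_one(const AtomicFloats& src, std::size_t i) {
    return src[i].load(std::memory_order_relaxed);
}

// Reports whether atomic<float> is lock-free on this target (C++17).
bool floats_are_lock_free() {
    return std::atomic<float>::is_always_lock_free;
}
```

Whether this beats a guarded batch write depends on the target and on whether the reader needs a consistent snapshot of all values, so measuring is the safer course.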
I agree one should employ good tooling and diagnostics to tackle the problems described here. So let's start by using a language that detects a lot of those problems at compile time.
44:50 explanation is very confusing, e.g. what if I reorder the lines with wpos and rpos, why is wpos atomic... But more importantly why not just recommend a library instead of encouraging people to write their own broken atomic code? Oh and obviously his clever atomic stuff does not matter since he will probably get false sharing between wpos and rpos.
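To make the false-sharing point concrete, here is a minimal sketch of a single-producer, single-consumer queue whose two indices sit on separate cache lines. It is not the code from the talk: only the names `wpos` and `rpos` come from the comment above, while `SpscQueue`, `kCacheLine`, and the 64-byte line size are assumptions for illustration. `wpos_` is atomic because the consumer reads it while the producer writes it, and the release store to it is what publishes the element written just before.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Sketch of an SPSC ring buffer. Holds at most N - 1 elements.
template <typename T, std::size_t N>
class SpscQueue {
public:
    // Called by the producer thread only.
    bool push(const T& v) {
        const std::size_t w = wpos_.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % N;
        if (next == rpos_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        buf_[w] = v;
        // Release: the element write above becomes visible before wpos_ moves.
        wpos_.store(next, std::memory_order_release);
        return true;
    }

    // Called by the consumer thread only.
    bool pop(T& out) {
        const std::size_t r = rpos_.load(std::memory_order_relaxed);
        if (r == wpos_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = buf_[r];
        // Release: the slot is handed back only after it has been read.
        rpos_.store((r + 1) % N, std::memory_order_release);
        return true;
    }

private:
    // Assumed cache-line size; 64 bytes is common but not universal.
    static constexpr std::size_t kCacheLine = 64;
    // Separate cache lines so producer and consumer do not invalidate
    // each other's line on every index update (false sharing).
    alignas(kCacheLine) std::atomic<std::size_t> wpos_{0};
    alignas(kCacheLine) std::atomic<std::size_t> rpos_{0};
    alignas(kCacheLine) std::array<T, N> buf_{};
};
```

For production use, a well-tested library queue is usually the safer choice; the sketch is only meant to show where the alignment and the acquire/release pairs go.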
Great talk!
9:14 I don't understand Sin #2. Isn't the underlying issue having UB in the code? I haven't finished the talk yet, I'm sorry if this will be addressed later.
That is what he said: UB led to an optimization that "broke" the program (it was already broken).
Yes, you are exactly right. The underlying issue is that there is UB in the code. The sin is that some people think that, even though it is UB, it will never have negative consequences on their platform. The argument goes something like this: "The C++ rules only mark concurrent access as UB because CPUs reorder loads/stores, have caches, etc. But my embedded microcontroller has no load/store reordering and no cache, so even though my code is UB, it will still work in practice on my CPU". I should have made this clearer in the talk.
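A small sketch of why the "it works on my CPU" argument is shaky: the compiler, not only the hardware, is allowed to assume there are no data races. The `Mailbox` type and `produce_and_consume` function below are hypothetical and not from the talk. If `ready` were a plain `bool`, the polling loop would be a data race, and an optimizer could legally read the flag once and spin forever, regardless of how simple the target CPU is. Making the flag a `std::atomic<bool>` removes the UB.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example type: one thread publishes a value, another waits for it.
struct Mailbox {
    int payload = 0;                 // plain data, published through the flag
    std::atomic<bool> ready{false};  // a plain `bool` here would be a data race (UB)
};

int produce_and_consume() {
    Mailbox box;

    std::thread producer([&box] {
        box.payload = 42;
        // Release: the payload write is visible to whoever sees ready == true.
        box.ready.store(true, std::memory_order_release);
    });

    // Acquire: each iteration really re-reads the flag; the compiler may not
    // hoist the load out of the loop as it could with a non-atomic bool.
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }

    const int seen = box.payload;  // safe: ordered after the acquire load
    producer.join();
    return seen;
}
```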