That is an interesting side effect of using a constant.
It might be worth testing with other types of constant (String or str, for example).
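A minimal sketch of such a test, assuming the effect comes from the compiler knowing the value at compile time. WORD, total_len_const and total_len_runtime are hypothetical names, not from the video. One loop reads the length of a const &str, and the other reads the length of a String built at runtime (std::hint::black_box hides its origin from the optimizer). Compare the two timings in release mode too, because in a dev build the optimizer barely runs.

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical constant for illustration; the video's actual constant is not shown here.
const WORD: &str = "constant";

/// Sums the length of the const string `iters` times. Its length is known
/// at compile time, so an optimized build may fold the loop into `iters * 8`.
fn total_len_const(iters: u64) -> u64 {
    let mut total = 0u64;
    for _ in 0..iters {
        total = total.wrapping_add(WORD.len() as u64);
    }
    total
}

/// Same loop, but the string is passed in, so its length is only known at runtime.
fn total_len_runtime(word: &str, iters: u64) -> u64 {
    let mut total = 0u64;
    for _ in 0..iters {
        total = total.wrapping_add(word.len() as u64);
    }
    total
}

fn main() {
    // black_box makes the String's contents and the loop count opaque to the optimizer.
    let runtime_word = String::from(black_box("constant"));
    let iters = black_box(1_000_000u64);

    let start = Instant::now();
    let a = black_box(total_len_const(iters));
    let t_const = start.elapsed();

    let start = Instant::now();
    let b = black_box(total_len_runtime(&runtime_word, iters));
    let t_runtime = start.elapsed();

    assert_eq!(a, b);
    println!("const &str: {t_const:?}, runtime String: {t_runtime:?}");
}
```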
@4:52 Strange... it looks like a compiler bug or some quirk of debuggable, non-optimised code.
Such timing variations in CPU-bound code are also strange, but they are real: I checked on my Windows machine, including with larger loop counts.
It needs more investigation; it may be some syscall overhead.
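Because a single timing of CPU-bound code is noisy (scheduling, frequency scaling, other processes), one way to check whether a variation is real is to time the same work several times and compare the minimum, median and maximum. This is a minimal sketch; workload and time_runs are hypothetical stand-ins, not the code from the video.

```rust
use std::time::{Duration, Instant};

/// Hypothetical CPU-bound workload: sums 0..n with wrapping arithmetic,
/// so large n cannot trigger an overflow panic in a debug build.
fn workload(n: u64) -> u64 {
    let mut acc: u64 = 0;
    for i in 0..n {
        acc = acc.wrapping_add(i);
    }
    acc
}

/// Runs `f` `runs` times; returns the durations sorted ascending
/// plus a checksum of all results so the work stays observable.
fn time_runs<F: FnMut() -> u64>(runs: usize, mut f: F) -> (Vec<Duration>, u64) {
    let mut times = Vec::with_capacity(runs);
    let mut checksum = 0u64;
    for _ in 0..runs {
        let start = Instant::now();
        checksum = checksum.wrapping_add(f());
        times.push(start.elapsed());
    }
    times.sort();
    (times, checksum)
}

fn main() {
    let (times, checksum) = time_runs(5, || workload(1_000_000));
    println!("min    {:?}", times[0]);
    println!("median {:?}", times[times.len() / 2]);
    println!("max    {:?}", times[times.len() - 1]);
    println!("checksum {checksum}");
}
```

If the minimum is stable but the maximum jumps around, the spread is more likely outside interference than a property of the generated code.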
It may be that running OBS affected it. I'll revisit this with latest Rust update and see if I can reproduce the results 👍
@@learning_rust I don't think it is OBS. I don't have it and I'm on a different OS, so it must be something in the generated machine code or in the timer (though I doubt it's the timer).
@@learning_rust I investigated a bit... cargo run builds with rustc at "opt-level=0", and the generated assembly is... ugh... full of calls; even the for loop includes a call to into_iter, etc. "-C opt-level=1" gives more reasonable code and doesn't optimise 80% of the code away, but then the single-threaded loop is optimised away :(
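One common way to time a loop in an optimized build (cargo run --release, or rustc -C opt-level=3) without the optimizer deleting it is std::hint::black_box, stable since Rust 1.66. A minimal sketch, assuming a simple sum-of-squares loop as a hypothetical stand-in for the video's code. Note that the standard library documents black_box as best-effort only, so it discourages these optimizations but does not guarantee anything.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical CPU-bound loop standing in for the single-threaded version.
fn sum_squares(n: u64) -> u64 {
    let mut acc: u64 = 0;
    for i in 0..n {
        // black_box on each term discourages LLVM from folding the whole
        // loop into a closed-form expression.
        acc = acc.wrapping_add(black_box(i.wrapping_mul(i)));
    }
    acc
}

fn main() {
    // black_box on the input hides the constant, so the call
    // cannot be evaluated at compile time.
    let n = black_box(1_000_000u64);
    let start = Instant::now();
    // black_box on the output marks the result as used.
    let result = black_box(sum_squares(n));
    let elapsed = start.elapsed();
    println!("sum_squares({n}) took {elapsed:?} (result {result})");
}
```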
Ah OK, so one idea is to use "inline" as well, and I think I may eventually have to look at PGO (profile-guided optimization) too. Interesting info: doc.rust-lang.org/rustc/profile-guided-optimization.html#community-maintained-tools - see also: clang.llvm.org/docs/UsersManual.html#id49
@@learning_rust I will check. Right now I only have a few moments to look at it, and I assumed most of this is controlled by the optimisation level. But I didn't expect level 0 to be "deoptimised" 😉
On my M2 MacBook it doesn't make any difference. But try it in release mode; it will be a surprise.
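When comparing results across machines, it helps to print which build is actually being timed. A minimal sketch, assuming Cargo's default profiles: debug_assertions is on for the dev profile (cargo run) and off for the release profile (cargo run --release). It is only a proxy, since a custom profile can set debug-assertions and opt-level independently. build_mode is a hypothetical helper name.

```rust
/// Hypothetical helper: reports the build flavour based on cfg!(debug_assertions),
/// which is on by default in Cargo's dev profile and off in its release profile.
fn build_mode() -> &'static str {
    if cfg!(debug_assertions) {
        "debug (dev profile)"
    } else {
        "release"
    }
}

fn main() {
    println!("built in {} mode", build_mode());
}
```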