For the question around 1:03:13, GCC has a warning option `-Wdisabled-optimization` to notify if it gave up on optimizing something due to it being too big or complex.
I have been wondering: what kind of information could we provide within our source code, for example through annotations, that would help the compiler optimize better or allow completely new optimizations? We could add new annotations as we find more information that is useful to compilers. What does the developer know about how their code should behave that the compiler could use?
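A few such annotations already exist as compiler extensions. The sketch below assumes GCC or Clang; the `HINT_*` macro names and the three functions are made up for illustration, and whether any of these hints changes the generated code depends on the compiler and flags.

```cpp
#include <cstddef>
#include <cstdlib>

// Assumption: GCC or Clang. __builtin_expect, __builtin_unreachable and
// __restrict are compiler extensions, not standard C++.
#if defined(__GNUC__) || defined(__clang__)
#define HINT_UNLIKELY(x) __builtin_expect(!!(x), 0)
#define HINT_UNREACHABLE() __builtin_unreachable()
#define HINT_RESTRICT __restrict
#else
#define HINT_UNLIKELY(x) (x)
#define HINT_UNREACHABLE() std::abort()
#define HINT_RESTRICT
#endif

// The developer knows dst and src never overlap. Saying so removes one
// obstacle to vectorizing the loop.
void add_arrays(int* HINT_RESTRICT dst, const int* HINT_RESTRICT src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}

// The developer knows a zero divisor is rare, so that branch is marked cold.
int checked_div(int a, int b) {
    if (HINT_UNLIKELY(b == 0)) {
        return 0;
    }
    return a / b;
}

// The developer knows kind is always 0, 1 or 2. Calling this with any other
// value is undefined behavior on GCC/Clang, which is what lets the compiler
// drop the range check.
int weight(int kind) {
    switch (kind) {
        case 0: return 1;
        case 1: return 10;
        case 2: return 100;
    }
    HINT_UNREACHABLE();
}
```

Each hint is a promise: if the promise is false (overlapping arrays, an out-of-range `kind`), the result is undefined behavior or wrong output rather than a diagnostic.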
WRT the "Can we visit better?" slide, you mentioned that the switch needs to be written manually. If we introduced a new language feature for compile-time unrolling of switches, e.g. "for switch(i){ v(*std::get_if(&x)); }" where min and max are constexpr, such that the loop is unwrapped at compile time into (max - min) components, that'd be pretty nifty. It'd also be useful to have a compile-time switch on types, and being able to generate it based on a function would be a godsend.
I'm actually doing some work now with Michael Park on improving the visit implementation; you can do some nice things with the language now (nicer than I realized!). So hopefully soon you'll be getting better visit code in the standard library.
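For reference, part of the "generate the switch at compile time" idea can be approximated in C++17 without a new language feature. The sketch below is not the implementation mentioned in the reply above, and it is not how any particular standard library implements `std::visit`; `visit_by_index` and `Describe` are hypothetical names. It expands into a chain of index comparisons, one per alternative, which an optimizer may or may not turn into a jump table.

```cpp
#include <cstddef>
#include <string>
#include <variant>

// Hypothetical helper: tests the active index one alternative at a time.
// Assumes the visitor returns the same type for every alternative and that
// the variant is not valueless_by_exception.
template <std::size_t I = 0, class Visitor, class... Ts>
auto visit_by_index(Visitor&& vis, const std::variant<Ts...>& var) {
    if constexpr (I + 1 == sizeof...(Ts)) {
        // Last alternative: no index check is left to make.
        return vis(*std::get_if<I>(&var));
    } else {
        if (var.index() == I) {
            return vis(*std::get_if<I>(&var));
        }
        return visit_by_index<I + 1>(vis, var);
    }
}

// Example visitor, also hypothetical: returns a small tag per alternative.
struct Describe {
    int operator()(int) const { return 1; }
    int operator()(double) const { return 2; }
    int operator()(const std::string&) const { return 3; }
};
```

Usage looks like `visit_by_index(Describe{}, v)` for a `std::variant<int, double, std::string> v`. Unlike `std::visit`, this sketch does not throw on a valueless variant.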
One important thing was missed: inlining is done per function call, not per function. The same function can get inlined in one place but not in another.
I thought we weren't supposed to "help the compiler"?
Oh well, I guess in 2023, we'll be seeing "Helping the compiler not help you help itself" XD
1. "OK, simple inline functions. All good."
2. * Looks away for a minute *
3. Holy .... Template hell
Timestamps? ;-). I didn't think there were any bad templates tbh.
@quickNir In retrospect it wasn't so bad. I often watch these videos at a late hour, and the assembly probably confused me to the point that the templates confused me as well.
GREAT! Now I understand why the performance of my std::variant code was so bad... Maybe I can fix it now.
@Nir Friedman Amazing talk. Can you tell which GCC version started optimizing reinterpret_cast code away due to UB? I work in HFT as well and want to avoid surprises :D Also, do you know which optimization flag might be responsible for that (e.g. will disabling strict aliasing help)?
I was surprised by the std::sort example. I dream of a tool that would show readable hints about missed compiler optimizations per function.
Check out `-fno-strict-aliasing`
@Nir Friedman You said you ran into a problem with reinterpret_cast. Could you show a piece of code where it actually hit you, so that it can be reproduced?
Unfortunately it's really hard to isolate, otherwise I would have reproduced the code. Sorry!
@quickNir Could you please explain why on slide 7.2 you don't just write "auto* h = reinterpret_cast(buffer);"? Why do you need a copy?
@1000sergez There is no copy made. h is a reference.
@phonlolol5153 OK, thanks, I mixed that up because later he shows an example with memcpy. I just don't get what's wrong with that. Where is the undefined behavior Nir is talking about?
@1000sergez It is undefined behavior to (re)interpret arbitrary storage as type T, as if an object of type T existed there, when no such object was ever created. For example, if I hand you a couple of bytes, you are not allowed to interpret them as integers if I did not put integers there in the first place. To get around this, you create an integer, copy the bytes into it, and then you can read its value. There are a few exceptions that the C++ language does allow (for example, access through byte/char types). This is known as the strict aliasing rule.
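The "create an integer, copy the bytes there" approach described above can be sketched as follows. `load_u32` is a hypothetical helper, not code from the talk; the commented-out line shows the kind of cast that the explanation warns about.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit value out of raw bytes.
std::uint32_t load_u32(const unsigned char* bytes) {
    // Problematic form: no std::uint32_t object was ever created at this
    // address, and the address may not be suitably aligned either.
    //   return *reinterpret_cast<const std::uint32_t*>(bytes);

    // Well-defined form: create the integer, then copy the bytes into it.
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

For a small fixed size like this, optimizing compilers typically turn the `memcpy` into a plain load, so the well-defined version usually costs nothing extra. The value read depends on the byte order of the machine.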
16:30: adding the condition line changes the behavior of the function for x values less than -m, so the two functions are not equivalent.
If you change to unsigned integers, there will still be a discrepancy in the generated assembly. So while this may be true, it's not the reason for the difference in the generated assembly (though the example could be improved).
Are the slides for this available online? I couldn't find them inside the CppCon2018 git repo.
- 10:23, wait a minute! I can believe that a lambda f() could be faster IF it has the contents within itself. However, that lambda has the same pointer-to-f() as the direct call, and it is still faster?! o-0 More code meaning less work?!
- 14:25, is this true for all types? I have a project where vectors are usually faster than arrays. I guess that is due to copying between the same type, since it has a main vector that "pretty much needs" to be a vector.
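On the 10:23 question, a minimal sketch of the two call styles may help; the function names here are invented for illustration and the slide's actual code may differ. The relevant language fact is that every lambda has its own unique closure type, while all functions with the same signature share one function pointer type.

```cpp
#include <algorithm>
#include <vector>

inline bool greater_than(int a, int b) { return a > b; }

// Comparator type is bool (*)(int, int). std::sort is instantiated once for
// that pointer type, and inside that instantiation the target is just a
// pointer value held at run time.
void sort_with_pointer(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), greater_than);
}

// Comparator type is a unique closure type. std::sort gets its own
// instantiation for it, and the call to greater_than inside the lambda body
// is a direct call that names the function.
void sort_with_lambda(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return greater_than(a, b); });
}
```

So the lambda does not hold a pointer to f() at all; it calls f() by name. Both versions sort identically. Whether the lambda version ends up faster depends on whether the compiler manages to inline through the function pointer, which varies with compiler, version, and flags.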
7.13 is missing break;
Yes, there are many places with missing breaks and other typos, like (int == 0)
I'll try to fix them on the slides, thanks.
@1000sergez Feel free to write a list here, or email me at quicknir@gmail.com, and I'll make corrections.
@quickNir The following places: 7.2 (int == 0), (int == 1); 7.3 (int == 1); 7.9 and 7.13 missing breaks. And thanks for your presentation!
It's always fun to watch Israelis lecture at cppcon (although, I'd dare to say your parents are Israeli, and you grew up in the states?)
Correct :-). (Well, Canada, not the states).