Man, this presentation made me tear up. What a humble guy to have made and provide such a great service and to ask so little for it.
Thanks Matt.
It's crazy how these talks are all free on YouTube
32:28
I'm sorry if this has been worked out by someone else in the comments before, but my old grad school instincts took "I haven't bothered to work it out" to mean "it's a good exercise to work out what this does," which is exactly what I did.
The assembly in the output is just (a
I love godbolting my code! I have learned so much from it!
Really enjoyed this, had no idea quite how clever compilers were getting
Small mistake @44:51. Matt says the problem is if you pass INT_MAX, but that will overflow in both cases. The actual edge case is close to sqrt(INT_MAX), which will overflow in the first case but not the second. I'm guessing that's the objection that was raised by the audience member.
The "int" in the signature is 32 bit and imul operates with 64 bit arguments so I'm pretty sure there's no overflow except for the INT_MAX.
Excellent, excellent talk! Every single piece of information was valuable and entertainingly delivered.
Yes, talking about execution of dynamically compiled code. A few years back I wrote some directory-traversal code, compiled it online, and managed to traverse the server's directory. I reported this vulnerability to the site owner.
@31:40 :D
Also, I'm getting flashbacks from optimizing multiplication by 40 on the C64 in machine code. shift, shift, shift, store, shift, shift, add. The factor 40 comes up a lot because there are 40 bytes of characters per row.
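For anyone who hasn't seen the trick: 40x = 32x + 8x, which is exactly what that shift/store/add sequence computes. A rough C++ equivalent (function name made up):

```cpp
#include <cstdint>

// 40 * x == 32 * x + 8 * x == (x << 5) + (x << 3).
// The C64 sequence: shift left 3 times (8x), store it,
// shift twice more (32x), then add the stored 8x back in.
uint16_t times40(uint16_t x) {
    uint16_t x8  = x << 3;  // shift, shift, shift; store
    uint16_t x32 = x8 << 2; // shift, shift
    return x32 + x8;        // add
}
```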
Fun & instructive, I'm glad I watched this. Terrific work, Matt.
What a blast from the past. The good old days of programming a TRS-80 in assembly. Great lecture.
1:06:00 I love that someone else but me still cares about "URLs live forever".
22:13 -- notice that GCC emits an extra instruction: two `ret`s, one after the other. The one on line 12 could simply be deleted.
Not necessarily. The compiler may add redundant instructions in order to align the instructions in the instruction cache.
@@jakobnissen5723 That is not the case here. The second-to-last `ret` on line 12 is already aligned; jumping to a redundant one after it is _completely_ pointless.
@@alexreinking I think Matt explains it here: th-cam.com/video/w0sz5WbS5AM/w-d-xo.html
Thank you for the effort, it's an amazing tool and presentation was fun :)
Glad it helped!
The bit halfway through made me wonder if there are any benchmarks where people recompile old code for old hardware on new compilers (and new machines to save time), and see if the old stuff runs faster afterwards.
Good point!
I'm sure a fresh compile would benefit a decade-old program, given how clever the compilers have become.
narutofan9999 that's not very helpful haha XD
That's actually a neat point of Java: just run the same code on another CPU (with a JVM that understands it) and you get to use the new instructions that CPU offers.
I saw that René compiled some basic program (gzip?) with the newest compiler on some ancient hardware: th-cam.com/users/MoreReneRebe
It was slower. I think it might have been after the mitigations, but larger code also might not play well with the smaller caches of back then.
I know it's been six years, but if you're still curious you should look into the efforts to improve the performance of Mario 64.
It has compiled your code!
Did he just call Google a 'small internet startup' there at 3:00?
It was in 1998.
Anyone have a link for the talk he mentions at 6:53?
Quite a few processors have a popcount instruction.
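And compilers will use it if you let them. A sketch (whether the loop idiom gets recognized depends on the compiler, version, and flags such as -mpopcnt, so treat this as illustrative):

```cpp
#include <cstdint>

// Classic bit-counting loop; modern GCC/Clang can recognize this
// idiom and, when the target supports it, emit a single popcnt.
int count_bits_loop(uint32_t v) {
    int count = 0;
    while (v) {
        v &= v - 1; // clear the lowest set bit
        ++count;
    }
    return count;
}

// Or ask for it directly; C++20 also has std::popcount in <bit>.
int count_bits_builtin(uint32_t v) {
    return __builtin_popcount(v);
}
```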
If you want to optimize virtual methods then use the "final" attribute on classes.
with LTCG/LTO it still works as Matt said :)
final does not guarantee in any way that there won't be virtual calls...
it's enough to have two final classes that both implement a shared interface; a call through the interface pointer still has to dispatch virtually.
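A small sketch of that distinction (class names made up):

```cpp
struct Interface {
    virtual int f() const = 0;
    virtual ~Interface() = default;
};

struct A final : Interface { int f() const override { return 1; } };
struct B final : Interface { int f() const override { return 2; } };

// Through a pointer to the final class the compiler knows the exact
// type, so this call can be devirtualized (often inlined to "return 1").
int via_final(const A* a) { return a->f(); }

// Through the interface the static type tells the compiler nothing:
// it could point at an A or a B, so `final` on the derived classes
// doesn't remove the virtual dispatch here (absent whole-program info).
int via_interface(const Interface* i) { return i->f(); }
```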
>Wrote a whole IRC client with UI and scripting support in assembly while in high school.
Amey Narkhede That seems like a huge undertaking.
@@pha1994 mostly drudgework
writing assembly is nowhere near as incomprehensible as reading compiled code listings or, worse, disassembly
Wish I had watched this a couple month ago!
Is there something similar but for the ARM 64-bit architecture?
Most of the same optimizations are done for most architectures
Brilliant!
36:30 next time someone tells me Ubuntu is rock solid with no bugs
47:28 "... making the code efficient and *_most_* *_often_* *_correct_* too." ROFL!!!
Once I heard you'll be using Intel syntax, I had to upvote.
He mentioned all the languages except for Rust, which has been supported for a pretty long time :(
Does anyone know if the other talk Matt kept talking about has been (or will be) uploaded?
The jsbeeb talk wasn't recorded, but a longer version is available here: th-cam.com/video/37jyHQT7fXQ/w-d-xo.html (if that's the one you meant)
Wow, I could make a lot of videos on "what has my compiler done AGAINST me"
LD_PRELOAD? Lame. Ptrace the thing instead and catch the syscalls no matter how they were reached.
Thanks for the idea; I'll check it out
No, use seccomp with a BPF filter. This is why seccomp exists.
I'm currently considering both 'firejail' and 'isolate' which both use bpf filters and namespaces. Thanks all!
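For anyone following along, the seccomp core those tools build on is small. A minimal allowlist sketch using libseccomp (link with -lseccomp; a real filter also has to allow whatever libc needs at startup and exit):

```cpp
#include <seccomp.h> // libseccomp, e.g. apt install libseccomp-dev
#include <unistd.h>

int main() {
    // Default action: kill the process on any syscall not allowed below.
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
    seccomp_load(ctx); // filter is now installed for good

    const char ok[] = "write() still works\n";
    write(STDOUT_FILENO, ok, sizeof ok - 1); // allowed
    // An open()/openat() here would get the process killed, not -EPERM.
    return 0; // exit_group is allowed, so we exit cleanly
}
```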
OMG. I thought I was so clever because I made a script that deletes unused functions and variables.
Dunning-Kruger effect.
31:05
Lol
JavaScript x86 emulators exist (e.g. v86), but they are not as complete as you'd like :D (oh nvm, it was pointed out in the Q&A)
This cleverness is starting to make template generics look like premature optimization. A C generic using (void*), (*)(), size, offset, and stride will become a compile-time generic due to cleverness. That is, if the parameters are constant, the compiler is just as happy to inline an indirect function call and unroll loops as if it were a template with T::dostuff(). The C generic, however, has the advantage that it is simultaneously a runtime generic which can be called from Python or whatnot. So I'm amused by the irony in saying 1. compilers are clever, while 2. advising the use of std:: template stuff. Said another way, templates tell the compiler to force compile-time parameters, when compilers are already clever enough to turn runtime parameters into compile-time parameters where possible and defer to runtime what they must.
If you're using templates for speed, you're using them wrong. You use them because they will yell at you when you screw up (e.g. you cast the state param wrong), or because you want to metaprogram on the types, for example using different parsing algorithms for different types automatically without the caller having to tell you which to use.
Also keep in mind that classically you lose all the interprocedural optimization when you call across compilation units, so remember to use global (link-time) optimization.
Also, it's generally trivial to get a C ABI version of a template by simply passing the instantiation parameters to it!
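To make that last point concrete, a rough sketch (all names made up): the C ABI version is just one instantiation with the template parameters pinned, callable from Python via ctypes or any other FFI:

```cpp
#include <cstddef>

// Template "generic": the comparator is a compile-time parameter,
// so each instantiation can inline it and unroll the loop.
template <typename T, bool (*Less)(const T&, const T&)>
const T& min_elem(const T* xs, std::size_t n) { // assumes n >= 1
    const T* best = xs;
    for (std::size_t i = 1; i < n; ++i)
        if (Less(xs[i], *best)) best = &xs[i];
    return *best;
}

bool int_less(const int& a, const int& b) { return a < b; }

// One instantiation re-exported behind a C ABI.
extern "C" int min_int(const int* xs, std::size_t n) {
    return min_elem<int, int_less>(xs, n);
}
```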
:D
so the main takeaway is to leave gcc and use clang instead? bit of a shame because compiling in gcc takes significantly less typing (g++ file.cpp) than compiling in clang (clang++ file.cpp) and that only gets you c++98 in clang
If the time you need to type to call the compiler is your bottleneck, just stay with g++ (or make a one-letter alias for your favorite compiler + arguments).
If you actually care about runtime performance, set up your build-system to compile with different compilers and do regular benchmarks.
i was making a joke in that second sentence, really. I'm a physics student so I mostly write small single-file prototype things and it's not worth bringing out the makefile for that. still, I'll probably do that aliasing thing, later, when i can be arsed
Fun fact: It is literally never worth bringing out the makefile. Makefiles and build systems of all kinds are shit.
Just alias g++ to clang++ like Apple does by default. :^)
I aliased my build system to "r" for run, beat me
First
Doomsayer_Hazel first to reply to first
I see no point in this lecture. Every year we have this talk multiple times.
That exact talk? Including the new stuff and Q&A? For example, I didn’t know about the JS-based x86 emulator.
He's not talking only to you. I hope you don't complain every time somebody says something you already know.
I've never listened to such a talk before, and I'm glad that it is not 5 years old.
BTW, many people still think that C++ abstractions have to be slower than C code and that assembly is the fastest you can get, while it is obvious that on any reasonably complex program, the compiler outperforms the best assembly writers. Still a lot of new tricks to teach old dogs, which you apparently aren't ;)
And as long as people don't learn, there shall be more!
I see no point in your comment. Every year we have this comment multiple times.
waste of time