Jason Turner, Herb Sutter, Scott Meyers, Venkat Subramaniam, Kate Gregory, and Dylan Beattie I think are some of the greatest technical speakers of our day. I thoroughly enjoy every talk I've seen by each of those people
Arthur O'Dwyer, Scott Meyers, Klaus Iglberger, Dmitri Nesteruk,
Jason's awesome - after watching his constexpr CppCon talk I'm now going through all his presentations;
totally dig his style & you're certain to take away a couple of things from *every* talk 👍
Arthur O'Dwyer is pretty good
Jason Turner has one of the best talks this year.
Yeah for sure. Hope he will do more next time.
Definitely agree on that.
yep
yes, and I also like your cppcast
13:30 "Don't Const!" .... I like the way he thinks, ima do more of that ;)
I love how he is honest about his findings. Most people would try to forget and never mention again.
Thank you Jason Turner for your talks. Considering the time I spend watching them versus what I get out of them, even when it is just repetition, really great! One of the best, even.
The C++17 C64 talk was awesome, and this one too
Here's a tip for programming language snippets inside of presentations:
Write out the code instead of taking a screenshot. Color that code as an editor would.
Sometimes you'll need to remove the slide title; that's fine, the point is the code example. The title can go on a title slide, and then you move on to the code slide.
Already love the first example. I wrote a performance/frame timer func, and obviously that *needs* to be as performant as possible so as not to burden the rest of... well, everything. Initially, I used Chrono. Too slow. Then I switched to filetime, which was much faster. But it still soaked up like 5% of my spent CPU, so, why? Well I checked out the ASM and saw that for some reason it was performing two [load effective], [copy]'s. I then realized, it's because I'm using both halves of the ULI I made to extract the full accuracy quadpart. I checked and, the low part works fine for comparisons for up to around 450s. A frame is 16.667ms or below. I dropped the upper part of the ULI along with the quadpart line, and the function was 50% faster.
Don't trust the gut, trust the ASM and benchmarking.
I didn't get the point at 40:00: why do many instantiations of shared_ptr affect compile RAM and time, while unique_ptr does not? Could someone please explain this point? Thank you.
I believe unique pointers just own the object: they can be moved but not copied, and they delete it when the owner goes out of scope. With shared pointers, the references have to be kept track of, incremented and decremented every time you copy one.
@13:30 "don't _const_"?
11:22 If you call val() on an uncached Int from several threads, you can end up calculating more than once if a context switch occurs after entering the if and before setting is_calculated.
Of course, it must be done with a mutex, exactly the same as in the Singleton pattern.
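A minimal sketch of the mutex approach suggested in this thread. The class name LazyInt and its members are hypothetical stand-ins for the Int class on the slide, not the speaker's code, and the lock is paid on every call, which is the cost other commenters worry about.

```cpp
#include <cstdlib>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical reconstruction of a lazily evaluated integer wrapper.
class LazyInt {
public:
    explicit LazyInt(std::string s) : s_(std::move(s)) {}

    int val() const {
        // The lock covers both the check and the update, so no thread can
        // slip in between "is it calculated?" and "mark it calculated".
        std::lock_guard<std::mutex> lock(m_);
        if (!calculated_) {
            value_ = std::atoi(s_.c_str());
            calculated_ = true;
        }
        return value_;
    }

private:
    std::string s_;
    mutable std::mutex m_;
    mutable bool calculated_ = false;
    mutable int value_ = 0;
};
```

Computing the value once in the constructor, as proposed elsewhere in the comments, avoids the flag and the lock altogether.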
@7:50 Jason showed a string with the const modifier and stated that "const" may increase performance. I didn't get that. I thought that during construction only the corresponding constructor is called, isn't it? And if so, how is const helpful here?
You are right, const is not helpful at all in this example. But maybe the example is too small and compiler sees that string without const is not modified anyway so it is const qualified internally anyway.
If the compiler knows that a variable is never going to be changed, it can do other optimisations in other circumstances
26:26 Why is the function that accepts Base by reference being passed a pointer?
He explained that was a bug.
at 12:34 there is still data race
Yes, atomics don't quite solve the race condition
I don't see it.
I usually think of a data race as a bug. In this case I'd say there's no problem with the code. Whatever we do to prevent the value being calculated multiple times is probably going to hurt performance.
You are right, but at this point it might be worth mentioning that using multiple atomics can still be dangerous.
I think there's a race condition (atoi() can be called multiple times), but no data race (no undefined behavior)
Great talk, but I still don't see how move/copy semantics can be useful in polymorphic classes. Like, if you want to move the polymorphic object, you pretty much just reassign the pointer; and for copies you'd need a virtual clone method. As the person @17:30 (is trying to) say, enabling copy/move just allows you to implicitly slice objects, hardly a good practice.
20:26 If the string 's' was const, then how would you move it?
You can move it, because it is being copied to the struct S via the constructor's parameter, which is copied by value.
Peterolen it's first copied and then moved from the copy
The fact is you can actually write two constructors, so that an lvalue argument is copied once and an rvalue argument is moved once:
struct A {
    std::string a;
    A(std::string const& a_) : a(a_) {}
    A(std::string&& a_) : a(std::move(a_)) {}
};
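To make the trade-off between the pass-by-value constructor from the 20:26 question and the two-constructor version concrete, here is a small counting sketch. Tracker, ByValue, TwoCtors and the two global counters are hypothetical names introduced only for illustration.

```cpp
#include <utility>

int g_copies = 0;  // illustration-only counters
int g_moves = 0;

// Hypothetical type that records how often it is copied or moved.
struct Tracker {
    Tracker() = default;
    Tracker(const Tracker&) { ++g_copies; }
    Tracker(Tracker&&) noexcept { ++g_moves; }
};

// Sink parameter taken by value, then moved into the member.
struct ByValue {
    Tracker t;
    explicit ByValue(Tracker x) : t(std::move(x)) {}
};

// One overload per value category, as in the struct A above.
struct TwoCtors {
    Tracker t;
    explicit TwoCtors(const Tracker& x) : t(x) {}
    explicit TwoCtors(Tracker&& x) : t(std::move(x)) {}
};
```

With an lvalue argument, ByValue costs one copy plus one move while TwoCtors costs one copy; with std::move(arg), ByValue costs two moves and TwoCtors one. The by-value form pays one extra move in exchange for a single constructor.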
Just one comment for 11:30: even though the values became atomic, the method itself should be atomic. The variable usages could get inconsistent because you didn't lock the whole function, if you know what I mean.
40:00 Can anyone explain to me why only one shared_ptr will be created? I thought the vector is constructed of shared pointers, and there are 4 objects.
The question was about the number of instantiations of the template. Since we use only one type, which is int, only one instantiation of the template will be created.
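The difference between "number of objects" and "number of instantiations" can be seen with a tiny class template. Counted is a hypothetical name, and the function-local static is only a trick to show that each distinct template argument gets its own copy of the generated code and data.

```cpp
// Hypothetical class template: every distinct T produces a separate
// instantiation with its own static counter.
template <typename T>
struct Counted {
    static int& constructed() {
        static int n = 0;  // one counter per instantiation, not per object
        return n;
    }
    Counted() { ++constructed(); }
};
```

Creating four Counted<int> objects bumps the single Counted<int> counter to four, while Counted<long> stays at zero until that type is actually used. In the same way, four std::shared_ptr<int> objects share one instantiation of the shared_ptr template.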
This reminded me of a few habits I ingrained so long ago that I don't even notice them any more.
That stuff about a Base/Derived class declaring the virtual destructor disabling the move operations, and the solution being to type all those defaulted operations out... ugh, I can't believe that is the solution.
C++ is actually surprisingly bad at being a fast language.
I thought that lambdas were implemented with std::function and binding under the hood, so how come they don't lead to any overhead?
I mainly use lambdas for storing callbacks
Nope. A lambda is most often implemented as a new anonymous class with an overloaded call operator. So each lambda has a unique type.
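A rough sketch of what that reply describes. AddOffset is a hypothetical hand-written equivalent of the closure type (the real one has no name), and call_with / call_erased are illustrative helpers, not anything from the talk.

```cpp
#include <functional>

// Roughly what the compiler generates for
//   auto add = [offset](int x) { return x + offset; };
struct AddOffset {
    int offset;
    int operator()(int x) const { return x + offset; }
};

// Taking the callable as a template parameter keeps its concrete type,
// so the call is an ordinary member function call that can be inlined.
template <typename F>
int call_with(F f, int x) { return f(x); }

// std::function erases the type; any overhead comes from this wrapper
// (indirect call, possible allocation), not from the lambda itself.
inline int call_erased(const std::function<int(int)>& f, int x) { return f(x); }
```

Storing callbacks usually does require type erasure such as std::function, so the cost shows up there rather than at the point where the lambda is written.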
Storing a shared_ptr in a class so that you can have a getter for a complicated data structure is the only time I ever use shared_ptr.
Wow this one took off at 3:00
What I learned from this talk: don't use red syntax highlighting on a gray background!
Practical Performance Practices - maybe you need to apply some principles to the title of the presentation as well :) Great talk!
Regarding unique_ptr vs shared_ptr (th-cam.com/video/uzF4u9KgUWI/w-d-xo.html):
I do not get comparable results in MSVC. shared_ptr ends up (slightly) faster to compile (with default compiler settings under debug). EXE is still bigger though, and I'm sure the performance implications still apply.
I thought std::make_shared is intrusive (1 allocation for both control block and the object)
this talk is super helpful for me
Very pleased to hear that the presentation was helpful!
- 9:00, wouldn't it be better coded as:
const std::string s (std::move (std::string ("long string is mod ") + ('0' + std::rand() % 4))); ?
- 13:25, couldn't you have inlined val() and the constructor?
- 25:40, is that valid for explicit casts with dynamic_cast too?
There is no point in moving a temporary because it is already an rvalue. On the contrary, using std::move can prevent copy elision and be slower.
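The effect of wrapping a temporary in std::move can be observed with a type that counts its move constructions. Noisy, make_noisy and g_move_count are hypothetical names for this sketch, and the counts in the note below assume C++17, where initializing from a prvalue is guaranteed not to move.

```cpp
#include <utility>

int g_move_count = 0;  // illustration-only counter

// Hypothetical type that records every move construction.
struct Noisy {
    Noisy() = default;
    Noisy(const Noisy&) = default;
    Noisy(Noisy&&) noexcept { ++g_move_count; }
};

// Returns a prvalue; the caller's object is constructed in place.
Noisy make_noisy() { return Noisy{}; }
```

Writing `Noisy a = make_noisy();` performs no move at all, whereas `Noisy b = std::move(make_noisy());` turns the temporary into an xvalue and forces one move construction. Some compilers warn about the second form as a pessimizing move.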
thanks
24:20 incrementing REFERENCE count by passing by VALUE? Not passing by REFERENCE?
yes
shared_ptr keeps a refcount which is just a number that is incremented atomically every time a new shared_ptr is created from copying another one and decremented when a shared_ptr is destructed. When the refcount reaches 0 the object owned by the pointer(s) is destroyed.
When you pass the shared_ptr by value you are copying it, so it increments the refcount. If you instead pass the shared_ptr by reference, you are essentially passing a pointer to the shared_ptr object, and that does not count as copying the shared_ptr.
Passing a shared_ptr by value means sharing the ownership of the underlying object; passing it by reference does not entail sharing, and it's usually an error. As shown in the slides, you most likely just want to .get() the raw pointer and pass it.
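The refcount behaviour described above can be checked with use_count(). The three function names are hypothetical and exist only to show the three ways of passing.

```cpp
#include <memory>

// By value: the parameter is a copy, so the count is one higher during the call.
long count_by_value(std::shared_ptr<int> p) { return p.use_count(); }

// By const reference: no copy is made, so the count is unchanged.
long count_by_ref(const std::shared_ptr<int>& p) { return p.use_count(); }

// Non-owning access: the callee only needs the int, so it takes a raw
// pointer and never touches the reference count.
int read_only(const int* p) { return *p; }
```

With `auto sp = std::make_shared<int>(7);`, count_by_value(sp) reports 2, count_by_ref(sp) reports 1, and read_only(sp.get()) reads the value without any refcount traffic.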
He should use `std::stoi` instead of `std::atoi(string.c_str())`
I'd prefer std::from_chars (since C++17).
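For comparison, a minimal sketch of the std::from_chars variant. parse_int is a hypothetical helper, and rejecting trailing characters is a choice made for this example, not something the talk prescribes.

```cpp
#include <charconv>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: parse the whole string as an int, or return nullopt.
std::optional<int> parse_int(const std::string& s) {
    int value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    // from_chars reports failure through an error code; it neither throws
    // nor allocates, and it ignores the current locale.
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;  // no digits, overflow, or trailing characters
    }
    return value;
}
```

Unlike std::atoi, which returns 0 when nothing can be converted, and std::stoi, which throws on failure, this makes the error case explicit in the return type.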
All these talks are fine, but I'm always struck by what's missing. What's missing is _diagnosis._ _How do you know what to fix?_
There's a bone-simple method I've used for half a century, and it resembles the _poor man's profiler._ I don't do it for economic reasons. I do it because _it actually works,_ where profilers don't. I'm not saying profilers don't measure stuff. I'm saying _they don't tell you much to fix,_ and if you're a typical programmer, you're happy to hear that.
All it requires is the ability to take a SMALL NUMBER of samples of the call stack, such as with a debugger. AHEM. That's CALL STACK, where you can see each line, not just PROGRAM COUNTER, which is all the optimizer cares about. Look for calls you could avoid.
You're looking for code that could be improved, which is kind of like a bug, but not a correctness bug. After you fix it, you realize it was a bug, but only in the sense that it was doing unnecessary stuff - a speed bug.
If a speed bug is big enough to be worth fixing, i.e. if the time that could be saved by fixing it, is more than 10% (typically it is 20%-90%), then you WILL SEE IT two or more times in a small number of samples. I usually take between 5 and 20 samples. If you see it just once, that could be by chance. But if you see it twice, you can be sure it's real.
And, there is never just one speed bug. Usually there are several, in a range of sizes. Here's the thing: _removing one magnifies the others, so it takes fewer samples to see them._ The minimum execution time you can get to is what's left over after they've all been removed, and that can be _orders of magnitude_ smaller than what you started with.
Just a few typical speed bugs are:
- Calling *new* or *delete* when a prior object could just be re-used.
- Calling *push_back* which deletes, reallocates, and copies an array just to make it 1 element bigger.
- I/O time formatting or writing lines to a log file that nobody reads.
- I/O time reading a DLL file to get a string translation resource, when the string doesn't need to be translated.
- Calling an array-indexing function to index an array, to make sure the index is within range, when you know the index cannot be out of range.
- Calling *==* between strings to check program state, when integers could be used.
... there is no limit to the ways time can be wasted.
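The sampling argument above can be put into numbers under a simple model. Assume, as a simplification, that each stack sample independently lands in the speed bug with probability f, the fraction of run time it costs; the chance of seeing it at least twice in n samples is then one minus the binomial probabilities of seeing it zero times or once. The function name is hypothetical.

```cpp
#include <cmath>

// Probability that a problem costing fraction f of the run time shows up
// in at least two of n independent stack samples (binomial model).
double prob_seen_at_least_twice(double f, int n) {
    const double none = std::pow(1.0 - f, n);
    const double once = n * f * std::pow(1.0 - f, n - 1);
    return 1.0 - none - once;
}
```

For f = 0.5 and n = 10 this gives about 0.99, and for f = 0.2 and n = 20 about 0.93, which fits the claim that a handful of samples is usually enough to catch anything worth fixing.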
Doesn't an IIFE have a cost? And should it be used just for const initialisation, given that const is only for code maintainability and has no performance gains?
Even at -O1 it should be inlined
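For readers who have not seen the pattern, a minimal sketch of an immediately invoked lambda used to initialize a const variable. describe_mod and its strings are made up for illustration and loosely follow the "long string is mod" example quoted in the comments.

```cpp
#include <string>

// Hypothetical example: the lambda is defined and called in one expression,
// so `suffix` can be const even though choosing it needs branching.
std::string describe_mod(int n) {
    const std::string suffix = [&]() -> std::string {
        switch (n % 4) {
            case 0: return "zero";
            case 1: return "one";
            case 2: return "two";
            case 3: return "three";
            default: return "negative";  // n % 4 is negative for negative n
        }
    }();
    return "long string is mod " + suffix;
}
```

Because the closure has a concrete type and is called exactly once at the point of definition, optimizers can typically inline it, as the reply above says.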
at 28:50 using '\n' instead of std::endl, that makes my code linux locked, because windows uses '\r\n'. Not something that will cause me problems most of the times, but something that should be mentioned anyways, because portability should be something one should keep in mind.
'\n' is portable on the commandline, AFAIK. When you're just printing to stdout, '\n' will always give you what you expect. You'll only have a problem with Notepad, but you can also open the stream in text mode in that case, which does platform-specific conversions.
Jason Turner mentioned this in one of his C++ weekly videos on youtube. '\n' is portable
Nope. Windows uses '\n' (LF); and Linux, "\r\n" (CRLF). I saw that a bunch of times, using hex editor.
If your app. expected to find '\n', it may crash if it was important.
You actually have those flipped (Windows uses CRLF, modern *NIX uses LF). Unless you're in a file, std::cout
iostreams are formatted IO. That means they do newline conversions, and possibly other conversions. Also on Windows the newline sequence is \r\n.
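A small check of what std::endl actually inserts may help settle this thread. The two function names are hypothetical; the sketch writes into a string stream so the resulting characters can be compared directly.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// std::endl inserts '\n' and then flushes the stream.
std::string with_endl() {
    std::ostringstream out;
    out << "line" << std::endl;
    return out.str();
}

// Writing '\n' inserts the same character without the flush.
std::string with_newline() {
    std::ostringstream out;
    out << "line" << '\n';
    return out.str();
}
```

Both functions return the same five characters, ending in a single '\n'. Any translation to "\r\n" on Windows happens below this level, when a stream opened in text mode writes to the file or console, so it applies equally to '\n' and std::endl.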
Tbh that 'interaction' thing makes this way longer than needed. Also comes across as a bit pedantic.
10:34 Don't do more than you have to
class Int {
public:
    // s is declared before value, so it is initialized first and can be read here
    Int(std::string t_s) : s(std::move(t_s)), value(std::atoi(s.c_str())) {}
    std::string s;
    int value;
};
If you calculate the value in the constructor, you don't have to declare the variable isCalculated,
and you don't need to call the val function. So the function val is obsolete.
Oops, I apologize.
You have already done it.
I'm a bit puzzled, because in a lecture about "Practical Performance Practices" people call out bugs on so many slides...
"So many slides" is actually a whopping two, and it does not in any way affect the message they are conveying.
So many bugs on your slide