This talk stands out as head-and-shoulders above quite a few other C++ talks I’ve watched lately. Clear, interesting, confidently and fluently presented. And of great relevance to the work I’m doing. Well done.
somebody get that guy a bottle of water
I had to turn it down otherwise I wouldn't have been able to watch it to the end
Wow! Morgan McGuire!! I'm so excited! Thanks CppCon!!
Great talk, I especially liked the const segment aliasing idea! Regarding the 32-byte benchmark, I expect it favors SIMDString because 32 bytes is too big for std::string's small string optimization.
Zander covered that in the Q&A portion: std::string's small string optimization is apparently not very optimal.
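To see where the SSO cutoff lies on a given toolchain, a small probe like the one below can help. It is a sketch that leans on an implementation detail rather than the standard: it assumes the small-string buffer lives inside the std::string object itself, which as far as I know holds for libstdc++'s default ABI, libc++ and MSVC's STL. `stored_inline` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <string>

// Returns true if the characters of `s` live in the inline SSO buffer inside the
// std::string object rather than in a separate heap allocation.
// Relies on an implementation detail, not on anything the standard guarantees.
bool stored_inline(const std::string& s) {
    const auto obj = reinterpret_cast<std::uintptr_t>(&s);
    const auto chars = reinterpret_cast<std::uintptr_t>(s.data());
    return chars >= obj && chars < obj + sizeof(std::string);
}
```

On those libraries `std::string().capacity()` is 15 (libstdc++, MSVC) or 22 (libc++ on 64-bit targets) characters, so `stored_inline(std::string(32, 'x'))` comes out false and a 32-character string takes the heap path either way.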
Although I like the idea of abusing read-only memory, I wonder how well this optimization works with DLLs. Can their const char* be considered read-only, and if so, what happens if you unload a DLL?
Christmas has come early! What a great talk!
Can we add the function at 14:50 to the C++ standard?
In rendering engines... we don't use strings for binding at runtime, and if you do, you're arguably doing it wrong.
It seems odd; the source code snippet used for motivation doesn't seem to need any dynamically allocated strings at all. It only uses arrays of characters known at compile time or combinations of them (assuming that m_shaderName is very limited in its domain), and as you said, using std::string there seems wrong.
If detecting compile-time strings is crucial, maybe using a sentinel to flag those strings would be more portable across architectures than hacky-looking address inspection.
Maybe, but you would still have to somehow figure out whether the const char* is actually a constant string. You could break std::string API conformance and pass a flag to the constructor, but then it's not a drop-in replacement anymore. Also, think about a third-party function that returns a std::string_view or a const char*, where you want to take ownership of its lifetime, so you create a std::string and pass it around. At that point you have no way of knowing where the pointer came from, so you don't know what to set the is_constant flag to, but the "hacky" approach can still identify the pointer as a constant string.
Great talk! I would like to try this class. Could you share the open-source version? I couldn't find anything reasonable on GitHub.
Very good talk, thank you.
Before using this class, it sounds like I need to know how big my strings are on average and decide whether to use the 16, 32, 64 or 128 variant. Otherwise, most of that allocated stack buffer may go unused.
You’ll probably find that doesn’t matter on most modern architectures. How close do you think you’re going to get to the end of the stack anyway?
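To make the trade-off in this exchange concrete, here is a minimal sketch of a string whose inline capacity is a template parameter. `InlineString` is a hypothetical class written for illustration, not SIMDString's actual layout. It shows the two costs being weighed: every instance carries the full N-byte buffer whether or not it is used, and anything that doesn't fit still goes to the heap.

```cpp
#include <cstddef>
#include <memory>
#include <string_view>

// Hypothetical string with an N-byte inline buffer: up to N - 1 chars (plus '\0')
// are stored inline; longer strings fall back to a heap allocation.
// Illustration only, not the SIMDString implementation from the talk.
template <std::size_t N>
class InlineString {
    static_assert(N > 0, "need room for at least the terminating null");

public:
    explicit InlineString(std::string_view s) : size_(s.size()) {
        char* dst = inline_buf_;
        if (size_ + 1 > N) {  // does not fit inline: spill to the heap
            heap_ = std::make_unique<char[]>(size_ + 1);
            dst = heap_.get();
        }
        s.copy(dst, size_);  // copies size_ chars, no terminator
        dst[size_] = '\0';
    }

    const char* c_str() const { return heap_ ? heap_.get() : inline_buf_; }
    std::size_t size() const { return size_; }
    bool is_inline() const { return !heap_; }

private:
    std::size_t size_;
    std::unique_ptr<char[]> heap_;  // null while the inline buffer is in use
    char inline_buf_[N];            // paid for by every instance, used or not
};
```

With a 20-character string, `InlineString<16>` spills to the heap while `InlineString<32>` does not, and `sizeof(InlineString<128>)` is correspondingly larger, which matters when strings sit in containers rather than only on the stack.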
I wonder how plausible it would be for OSes or executable formats to provide a canonical way to tell whether a string comes from the executable's const data segment. That would be a boon to optimizations like this.
**EDIT**: In a performant way, that is. I just saw the Windows OS calls that check this, and they would definitely lose the performance benefits here.
Not really losing performance benefits on Windows, as it only runs it once due to it being static.
Well, it's pretty easy to get the image header (__ImageBase), and from there it's not too tricky to find the segment headers and determine which segment you care about. That gets you an address and size you can store, and checking against them is nearly instant from then on.
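A sketch of what this could look like, under some assumptions: the lookup walks the PE section table starting from `__ImageBase` and treats `.rdata` as the section where MSVC places string literals (typical, but worth verifying for your toolchain and flags). The result is cached in a function-local static, which is the "only runs once" point made above, so each later check is a single unsigned comparison. `AddressRange`, `find_section` and `is_in_rodata` are hypothetical names, and the non-Windows branch is only a stub returning an empty range, so every pointer is treated as non-constant there.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#ifdef _WIN32
#include <windows.h>
extern "C" IMAGE_DOS_HEADER __ImageBase;  // linker-provided start of the current module
#endif

// A contiguous address range, e.g. the read-only data section of a module.
struct AddressRange {
    std::uintptr_t begin = 0;
    std::size_t size = 0;

    // One unsigned comparison: if p < begin, the subtraction wraps to a huge value.
    bool contains(const void* p) const {
        return reinterpret_cast<std::uintptr_t>(p) - begin < size;
    }
};

#ifdef _WIN32
// Walk this module's PE section headers and return the range of section `name`.
inline AddressRange find_section(const char* name) {
    auto* base = reinterpret_cast<const BYTE*>(&__ImageBase);
    auto* nt = reinterpret_cast<const IMAGE_NT_HEADERS*>(base + __ImageBase.e_lfanew);
    const IMAGE_SECTION_HEADER* sec = IMAGE_FIRST_SECTION(nt);
    for (WORD i = 0; i < nt->FileHeader.NumberOfSections; ++i, ++sec) {
        if (std::strncmp(reinterpret_cast<const char*>(sec->Name), name,
                         IMAGE_SIZEOF_SHORT_NAME) == 0) {
            return {reinterpret_cast<std::uintptr_t>(base + sec->VirtualAddress),
                    sec->Misc.VirtualSize};
        }
    }
    return {};
}
#endif

// The lookup runs once; later calls just read the cached range.
inline const AddressRange& rodata_range() {
#ifdef _WIN32
    static const AddressRange range = find_section(".rdata");
#else
    static const AddressRange range{};  // platform-specific lookup not shown here
#endif
    return range;
}

inline bool is_in_rodata(const char* p) { return rodata_range().contains(p); }
```

Note that `__ImageBase` refers to the module the code is linked into, so literals that live in a different DLL would not be recognized by this check.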
Maybe I'm confused, but the canonical way to tell whether a string comes from the const data segment is whether its representation is a valid key in your string table. Whole string objects usually aren't passed around at runtime that much, and strings known at compile time even less so.
I've tested this, and std::string beats it on an AMD CPU; on the other hand, it works very well on an Intel one.
A plain char array beats everything else, though, so if performance matters, that's what should be used.
std::string initialises the memory it allocates for the string. Future versions of C++ should allow you to construct it uninitialised.
Because you later need to know which string is uninitialized, you have to store some flag anyway, so the difference might not be significant.
@yokozombie No, you don't. If I create a string of 1024 bytes, it's pointless having those bytes default-initialised if I'm going to overwrite them later. No flag is necessary.
@iddn What he meant was: how are we going to tell an initialized string from an uninitialized one?
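For what it's worth, C++23 added `std::basic_string::resize_and_overwrite`, which speaks to both sides of this exchange: the buffer is not value-initialized first, and no "is initialized" flag is needed because the callback returns how many characters it actually wrote. A small sketch (the `to_decimal_string` name is made up for illustration), falling back to a plain `resize()` on standard libraries that don't provide it yet:

```cpp
#include <charconv>
#include <cstddef>
#include <string>

// Formats `value` into a std::string without zero-filling the buffer first,
// where the standard library supports resize_and_overwrite.
std::string to_decimal_string(int value) {
    constexpr std::size_t kMaxChars = 16;  // enough for any 32-bit int, sign included
    std::string out;
#if defined(__cpp_lib_string_resize_and_overwrite)
    out.resize_and_overwrite(kMaxChars, [value](char* buf, std::size_t n) {
        auto res = std::to_chars(buf, buf + n, value);
        return static_cast<std::size_t>(res.ptr - buf);  // final size = chars written
    });
#else
    out.resize(kMaxChars);  // pre-C++23: the buffer gets value-initialized here
    auto res = std::to_chars(out.data(), out.data() + out.size(), value);
    out.resize(static_cast<std::size_t>(res.ptr - out.data()));
#endif
    return out;
}
```

The returned count is the only bookkeeping needed: everything past it is simply discarded.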
Really great talk and very creative optimizations! It would be really awesome to be able to use hashes and a string table for a lot of this, though. Then we'd only have to pass around ints and convert to strings when we really have to.
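One possible shape for that idea is an interning table: each distinct string is stored once and referred to by a small integer ID, and the text is recovered only on demand. This is a minimal sketch with hypothetical names (`StringTable`, `intern`, `name`); using a compile-time hash such as FNV-1a as the ID is a common alternative that skips the table lookup but has to deal with collisions.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical interning table: hot code passes uint32_t IDs around and compares
// them cheaply; strings are stored exactly once.
class StringTable {
public:
    using Id = std::uint32_t;

    // Returns the existing ID for `s`, or assigns the next free one.
    Id intern(std::string_view s) {
        auto [it, inserted] = ids_.try_emplace(std::string(s), static_cast<Id>(names_.size()));
        if (inserted) {
            names_.push_back(&it->first);  // map nodes never move, so this pointer stays valid
        }
        return it->second;
    }

    // Converts back to text only when actually needed (logging, debugging, ...).
    std::string_view name(Id id) const { return *names_.at(id); }

private:
    std::unordered_map<std::string, Id> ids_;
    std::vector<const std::string*> names_;  // index = ID
};
```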
That rodata check makes no sense to me. Either 1) if you construct from a const char*, the string actually lies in rodata, or 2) if it doesn't, the memory it points to must stay valid for the lifetime of the string object. Just make that a precondition. Or else provide an extra constructor overload that copies the const char data, which in turn saves that extra branch (the check).
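A minimal sketch of the alternative proposed above, using a hypothetical tag type (`copy_t`, in the spirit of `std::in_place_t`); the class and tag names are made up for illustration. As noted earlier in the thread, this gives up drop-in std::string compatibility and moves the lifetime precondition onto every caller.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical tag type that selects the copying constructor.
struct copy_t {};
inline constexpr copy_t copy_tag{};

// Borrows by default (no rodata check, no branch), copies only on explicit request.
class BorrowOrOwnString {
public:
    // Precondition: `s` is null-terminated and outlives this object (e.g. a literal).
    BorrowOrOwnString(const char* s) noexcept : data_(s), size_(std::strlen(s)) {}

    // Explicit opt-in: copy the characters into storage this object owns.
    BorrowOrOwnString(copy_t, const char* s)
        : size_(std::strlen(s)), owned_(std::make_unique<char[]>(size_ + 1)) {
        std::memcpy(owned_.get(), s, size_ + 1);  // include the terminating '\0'
        data_ = owned_.get();
    }

    const char* c_str() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }
    bool owns_storage() const noexcept { return owned_ != nullptr; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
    std::unique_ptr<char[]> owned_;  // makes the type move-only in this sketch
};
```

Moving is safe because `data_` points at the heap buffer, which keeps its address when `owned_` changes hands.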
Since the requirements for computer graphics shown here are pretty much the requirements for most programming tasks, it would be interesting to know why the std::string implementations suck so much. Also, it would be nice to know what particular aspects of the std::string implementations cause its slowness relative to the SIMD::string. Perhaps it is mostly all about handling small strings faster.
It appears that this library is focused on manipulating short text strings as elements in a communications protocol (CPU => GPU) rather than as a more typical data type. So long as your string manipulations are bounded to combinatorics of data that's compiled into your program's data block, this is perfect. In a more general case, where the string type contains user-provided data, it's likely to be harmless at best. I work largely with REST APIs in my day-to-day, and I'd be interested to see if there's any remaining performance benefit for a typical JSON serialization use case. Sure, the curly braces and colons and quotes are all compiled into the program as constants, but the vast majority of content is going to be determined at runtime. If you hook SIMDString into a JSON serializer with a typical workload, is there any benefit? Something to experiment with!
@kyledrudy Well, he did mention the 2 MB string case; for the larger cases, the performance was practically equal to the standard string anyway.
The comparison shown is the performance of working with 32-byte strings using a type with 128-byte SSO (the speaker's implementation) vs 16-byte SSO (a typical std::string implementation). It doesn't demonstrate that std::string sucks; it only demonstrates the advantage of picking a string type that makes the right trade-offs for the specific length of string you are expecting. The way it's presented comes off as kind of disingenuous, because the graph basically just compares the performance of dynamically allocated strings with fixed-size 128-byte arrays.
8:28 "A lot of game engines use std::string."
No, they do not. Tools, maybe.