TLDR thread_local is like static but per thread, the rest of the video isn't strictly relevant
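To make that TL;DR concrete, here is a minimal sketch of "static, but per thread". The names `bump_counter` and `count_on_new_thread` are made up for illustration and do not come from the video.

```cpp
#include <thread>

// Hypothetical helper: counts calls made on the *calling* thread only.
int bump_counter() {
    thread_local int calls = 0;  // one independent instance per thread
    return ++calls;
}

// Calls bump_counter() n times on a fresh thread and returns the last value seen there.
int count_on_new_thread(int n) {
    int last = 0;
    std::thread worker([&] {
        for (int i = 0; i < n; ++i) last = bump_counter();
    });
    worker.join();  // join() synchronizes, so reading `last` afterwards is safe
    return last;
}
```

The new thread starts counting from zero no matter how often the calling thread has already used `bump_counter`, and the calling thread's own count is unaffected by the worker.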
I write "thread_local static", as I thought you should, and MSVC doesn't complain. It makes it obvious. Better to use it sparingly anyway; it is a kind of global variable.
@raymundhofmann7661 yeah, but at least errno was made thread_local in C11 (unlike before that)
Thank you so much for talking about earlier versions of C++. It helps beginners like me.
I use thread_local a lot when a function needs an internal buffer or string that stays pre-reserved for the next call. The capacity of the vector or string will stick to its maximum across all calls, so re-allocations will be rare. This can give a massive speedup.
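A rough sketch of that buffer-reuse pattern, assuming a hypothetical helper named `to_upper_scratch` (not from the video). It relies on `clear()` keeping the string's capacity, which the common standard library implementations do.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper: upper-cases `input` into a per-thread scratch buffer.
// The returned reference is only valid until the next call on the same thread.
const std::string& to_upper_scratch(std::string_view input) {
    thread_local std::string buffer;  // survives between calls on this thread
    buffer.clear();                   // size drops to 0, capacity is kept in practice
    for (char c : input) {
        buffer.push_back(static_cast<char>(std::toupper(static_cast<unsigned char>(c))));
    }
    return buffer;
}
```

After one call with a large input, later calls with smaller inputs should not need to allocate at all. The price is that the buffer's memory stays reserved for as long as the thread lives.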
fmt/format can print anything that can be printed to std::ostream but you need to include for it.
Lol, OK, never got to the good part of thread_local, so here at least is my main _use case_ for it (as I have used it and seen it used). If you have multiple threads and at some point you allocate memory in your threaded tasks (most of the time this happens through std::vector), you may notice that your threads block each other. That is because they all need to sip some heap memory, there is only one drinking fountain for that memory, and they all have to share it. But if you declare your vector as `thread_local std::vector foo`, you get your own drinking fountain and you don't block anyone else who needs to drink, and by that I mean allocate memory. Sorry for the dumb analogy. So thread_local lets you do allocation local to your thread, which is huge if, say, you made a CPU-side ray tracer with really high utilization of each thread, where each task on the thread needs to allocate. Without thread_local, memory management becomes very complex and very painful. But at the end of the day, just know that it prevents threads from blocking on memory allocation. OK, that's all :D
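Here is a small sketch of that idea with several workers, each reusing its own scratch vector across tasks. `sum_squares` and `run_workers` are hypothetical names chosen for the example. One caveat, stated as an assumption about the default allocator: the vector's storage still comes from the process-wide heap, so the gain comes from each thread allocating rarely (its capacity is reused), not from a separate heap per thread.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical task: sums the squares 1..n using a per-thread scratch vector.
long long sum_squares(int n) {
    thread_local std::vector<long long> scratch;  // one vector per thread
    scratch.clear();                               // keeps capacity from earlier tasks
    for (int i = 1; i <= n; ++i) scratch.push_back(1LL * i * i);
    long long total = 0;
    for (long long v : scratch) total += v;
    return total;
}

// Runs `tasks_per_thread` tasks on each of `thread_count` threads.
std::vector<long long> run_workers(std::size_t thread_count, int tasks_per_thread, int n) {
    std::vector<long long> results(thread_count);
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < thread_count; ++t) {
        threads.emplace_back([&results, t, tasks_per_thread, n] {
            // Each thread writes only its own slot, so no lock is needed.
            for (int k = 0; k < tasks_per_thread; ++k) results[t] = sum_squares(n);
        });
    }
    for (auto& th : threads) th.join();
    return results;
}
```

After the first task on a given thread, the later tasks of the same size run without growing the vector again.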
I really wish you had talked about the potential use cases for this too, since you mainly concentrated on what it does.
I think the main benefit of this is that you can be 100% sure that mutation of these variables is thread-safe.
You can use this for small caches, hash states, RNG states -- something that will be used very frequently, takes up a limited amount of space and doesn't have to be shared between threads
@yato3335 yeah it's great for RNG states, allowing deterministic multithreaded Monte Carlo computations for instance.
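A minimal sketch of per-thread RNG state, assuming each worker is seeded from its index so that results do not depend on scheduling. `thread_rng`, `draw_sum`, `run_seeded_workers`, and the seed offset 1000 are all illustrative choices, not anything from the video.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Per-thread engine: every thread gets its own state, so no locking is needed.
std::mt19937& thread_rng() {
    thread_local std::mt19937 engine;  // default-seeded until a worker reseeds it
    return engine;
}

// Draws `count` values from the calling thread's engine and sums them.
std::uint64_t draw_sum(int count) {
    std::uint64_t sum = 0;
    for (int i = 0; i < count; ++i) sum += thread_rng()();
    return sum;
}

// Hypothetical driver: worker w seeds its engine with 1000 + w, then draws.
std::vector<std::uint64_t> run_seeded_workers(std::size_t workers, int count) {
    std::vector<std::uint64_t> sums(workers);
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        threads.emplace_back([&sums, w, count] {
            // The seed depends only on the worker index, never on timing.
            thread_rng().seed(static_cast<std::mt19937::result_type>(1000 + w));
            sums[w] = draw_sum(count);
        });
    }
    for (auto& t : threads) t.join();
    return sums;
}
```

Because `std::mt19937` produces a fixed sequence for a given seed, two runs with the same worker count should return identical sums.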
You should have used thread local in the global namespace (file scope). That makes things even more tricky.
Yeah, the whole subject of storage class specifiers is scary complicated, though the fun really begins in projects with multiple translation units and inline definitions.
The example also demonstrates the delay of spinning up a new thread. That is why fork-join is not an optimal threading pattern.
5:03 Actually it seems that jthread was only introduced in C++20
Oops, I said 17 didn't I?
Good luck if you want to use this in a system with dynamically loaded DLLs on Windows.
That is weird. In C11, thread_local objects must have global storage duration, but in C++ they don't have to?
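As far as I understand it, the difference is smaller than it looks. C11 requires a block-scope `_Thread_local` to be combined with `static` or `extern`, while C++ treats a block-scope `thread_local` as implying `static`. Either way the object has thread storage duration, not automatic. A small sketch with made-up function names:

```cpp
// Both counters persist between calls on the same thread.
int next_implicit() {
    thread_local int n = 0;         // `static` is implied at block scope
    return ++n;
}

int next_explicit() {
    static thread_local int n = 0;  // same meaning, spelled out
    return ++n;
}
```

Both functions keep counting across calls, which would not happen if the variable were an ordinary local.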
6:05 Instead of stringstream, how about C++20's osyncstream?
github.com/lefticus/cpp_weekly/issues/435
How performant are these? Is it ok to use them in high frequency code?
A constant-initialized value is *always* better (think constexpr); then I'd probably aim for something like `std::atomic`, depending on the use case, then maybe thread_local. Test and measure. All of those things are almost guaranteed to be better than an explicit mutex.
The best performance, almost 100% of the time, comes from explicitly copying the data into the thread, mutating your copy, then returning the mutated copy. Not sharing data avoids all of the issues of sharing data.
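A minimal sketch of the copy-in, mutate, copy-out approach using `std::async`. The function names are hypothetical, and sorting just stands in for whatever mutation the task performs.

```cpp
#include <algorithm>
#include <future>
#include <vector>

// Takes its input by value, so the task owns a private copy.
std::vector<int> sorted_copy(std::vector<int> data) {
    std::sort(data.begin(), data.end());
    return data;  // handed back to the caller, nothing is shared
}

// Hypothetical wrapper: sorts a copy of `shared` on another thread.
std::vector<int> sort_in_background(const std::vector<int>& shared) {
    // std::async copies `shared` into the task, so the original is never touched.
    std::future<std::vector<int>> result =
        std::async(std::launch::async, sorted_copy, shared);
    return result.get();
}
```

No mutex, atomic, or thread_local is involved here, because the worker only ever sees its own copy.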
Would love a deeper dive into how and when thread_local objects get destroyed when a thread ends.
I'm always happy to have topic requests - they are tracked here: github.com/lefticus/cpp_weekly/issues/ Feel free to add your request and vote on the other topics!
We'll have to discuss this tomorrow
All I know is that its lifetime ends at the end of the thread, but before the end of static lifetimes. 😎
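A small sketch that records when a block-scope thread_local is constructed and destroyed relative to the thread body and `join()`. `Tracer`, `event_log`, and `run_traced_thread` are names invented for this example.

```cpp
#include <string>
#include <thread>
#include <vector>

// Hypothetical event log with static lifetime, so it outlives every thread_local.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

struct Tracer {
    Tracer() { event_log().push_back("construct"); }
    ~Tracer() { event_log().push_back("destroy"); }
};

void run_traced_thread() {
    std::thread worker([] {
        thread_local Tracer tracer;  // constructed when this line is first reached
        event_log().push_back("body done");
    });
    worker.join();  // the worker has fully finished, including thread_local cleanup
    event_log().push_back("joined");
}
```

On the implementations I am aware of, the recorded order is construct, body done, destroy, joined: the destructor runs as the thread exits, after its function returns and before `join()` returns.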
The thread_local makes me want constexpr string_view for separated threads. C'mon, "static thread_local constexpr" backed string_view should be a thing.
Usually I use thread locals for caching things like JNIEnv.
Why did puts solve the race condition? Is it just a quicker operation than cout?
No the difference is that only a single operator
ThreadLocal has long existed in Java; in the latest edition they changed it somewhat. What I like about the C++ implementation of jthread is that RAII also works with threads. I used ThreadLocal in implementations where each HTTP request has its own thread. Nice for per-thread config stuff, without spreading it around and copying the config everywhere. A singleton per thread. Obviously you have to know what you're doing, but that's the C++ mantra anyway.
I understood and used TLS before this video. After watching it, I have no idea what thread_local does. :)
I find the way the raw Windows API deals with TLS easier to understand. TlsAlloc creates a memory slot for you in every thread. What you store in this slot is unique per thread.
It tells pretty clearly what behavior you get from thread_local, it just doesn't tell you how it's done, as that depends on the platform.
So, when you have a mutable static variable (for a cache, for example) and somebody comes and tells you "this is not thread-safe", you simply add thread_local and you are done? I can't explain why, but I have the feeling that this is cheating and a bad idea in the long run. Does anybody agree or have a better global perspective on this?
As far as I know, that's it. Statics aren't inherently thread-safe, and this gives you individual thread-safe statics instead of shared memory that requires guards.
The only downside I can see is that it may not always be reliably static with dynamic libraries (as another comment pointed out with DLLs on Windows). But if you're _relying_ on a static being initialized on a call instead of using it as an optimization, that's already bad design, IMHO. That's reliance on side-effects, when you should be designing toward functional purity and only deviating from it when such an optimization yields high value. If it _needs_ to be initialized, it should be a parameter or wrapped in an object at the calling scope (e.g. a generator/iterator).
@bloodgain I know that some libraries (that I haven’t used) give “lightweight” threads (like HPX) and these don’t (can’t?) have thread_local variables. This is as much as I was able to tell about possible limitations of this feature.
@bloodgain Agreed, it is often the "right" answer. The main downside being that the value is initialized once per thread and depending on cost, that might be noticeable, but you avoid explicit locks (good).
Even better is just to avoid mutable globals and mutable state shared between threads...
Personally, I prefer more stringent management of threads instead of this floaty nonsense. Either it's running or it's not, and as the programmer I should damn well know which state it's in at any given time. I would hope that more programmers would agree with this sentiment because it's a scary landscape if not.