Single Producer Single Consumer Lock-free FIFO From the Ground Up - Charles Frasch - CppCon 2023

CppCon

มุมมอง 15 490

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 21 ก.พ. 2024
cppcon.org/
---
Single Producer Single Consumer Lock-free FIFO From the Ground Up - Charles Frasch - CppCon 2023
github.com/CppCon/CppCon2023
In this session we will construct a Single Producer Single Consumer Lock-free Wait-free Fifo from the ground up. But why write our own if we can get one from reliable sources such as Boost.Lockfree? There are a couple of answers to this question.
* Writing such a fifo is a fairly gentle introduction to lock free programming.
* There are some interesting performance optimizations that can be made.
* There may be some specific requirements that are not met in out-of-the box implementations.
In the presentation we will first develop a simple circular fifo. Next we will make it thread-safe as well as lock-free and wait-free. After that we will address issues associated with cache coherency and false sharing in particular. We will show a few additional optimizations that can be added as needed to meet specific requirements. Finally, the performance of our fifo will be compared with a few out of the box implementations. Along the way we will touch on subjects such as thread safety, data-races, false sharing, object lifetime, and relaxed atomics.
---
Charles Frasch
Charles Frasch is a Senior Core Developer with the IEX Group where he is working to re-platform their core trading infrastructure in modern c++. Charles has worked in the financial technology world for more than twenty years in areas such as high-frequency trading and low-latency, high-performance infrastructure.
__
Videos Filmed & Edited by Bash Films: www.BashFilms.com
TH-cam Channel Managed by Digital Medium Ltd: events.digital-medium.co.uk
---
Registration for CppCon: cppcon.org/registration/
#cppcon #cppprogramming #cpp
วิทยาศาสตร์และเทคโนโลยี

ความคิดเห็น • 36

@anon_y_mousse 3 หลายเดือนก่อน ⁺²⁸
I've probably given this tip to more people than any other, but if you're optimizing around an array of any variety, use a size that's a power of two. If you need to constrain an index into such an array, a `bitwise and` is the fastest way to do so. Never use division/modulus and oddly sized arrays, even when setting up the backing store for a hash table. Prime number sizes may yield better distribution, but it's always at the expense of performance. Also, with regards to hash tables, always store the unnormalized hash with the keys so that when you need to resize the table you don't need to query the hashing function again to rebuild the entire table.
@phusicus_404 3 หลายเดือนก่อน
Are there benchmarks which show that power of 2 size is better?
@Dominik-K 3 หลายเดือนก่อน ⁺¹
I very much agree with this advice. Especially for anything that is performance sensitive, bitwise and is a go-to means to restrict value ranges.
@user-mc9wh3df5v 3 หลายเดือนก่อน
While I completely agree regarding power of 2 + bitwise and on one minus the size, I would also not be surprised if the hardware didn't recognize this optimization. In my day job I constrain the size to be a power of 2.
@MenaceInc 3 หลายเดือนก่อน
Seems like he was going to say the same thing at 8:30 before he lost his train of thought
@MenaceInc 3 หลายเดือนก่อน
Ah, he gets it out by 9:50
@fabiogaluppo2635 3 หลายเดือนก่อน ⁺⁶
Great talk. We need more of this type of content in C++ talks!
@nathanieldoromal6904 3 หลายเดือนก่อน ⁺²
One of the best explanations for implementing a lock-free SPSC I've come across. Kudos!
@DedmenMiller 2 หลายเดือนก่อน ⁺¹
41:00 I had that issue too.
My solution was to not store objects in the queue, but just store pointers.
the new/delete are done outside. I used another fifo of "Available buffers", pre-allocated pointers that can be reused. Instead of new grab one from it, instead of delete push it back to be reused.
That way I get rid of the push/pop memcpy's, and ALSO of runtime allocations.
Problem is just that you have to handle when the push fails and the queue is full, you need to either throw the pointer back into the available buffers, or hold onto it until it can be pushed.
@user-mc9wh3df5v 6 หลายเดือนก่อน ⁺¹²
I completely misunderstood Roi's question about 1:03. Yes, I agree that if you grab multiple pushers or poppers in the same block they will be destructed in the reverse order in which they were acquired. And, if you do that you must carefully consider that fact. The pusher and popper destructors just advance the cursor. So, it is possible that at the end of the block the cursors are correct. But, my real answer is, if you need to read or write multiple values from and to the FIFO you should extend the API to add functions that do that correctly.
@szirsp 3 หลายเดือนก่อน
Well, yeah, they just advance the cursor... But because of that only the cursor values will be correct at the end!
The FIFO won't be! Because when you are getting multiple pushers in a row they are getting a reference to the same memory, and when one lifetime ends only the cursor position is changes the still existing pushers won't be, so they will overwrite the previous item and skip one item when they go out of scope.
Instead of advancing the cursor implicitly in the destructor I would use a commit() method. That way you can get a pusher, commit and then get another pusher in the same scope.
I'm not sure if I would actually implement pusher and popper at all, just use some kind of peek methods to get direct access the write and read location and use commit and free or similarly named methods to advance the write and read head.
@youtou252 3 หลายเดือนก่อน
how is your comment 3 months old when the video was published 2 days ago ?
@Jeanpierrec19 3 หลายเดือนก่อน
Wonderful talk. Have done a similar implementation but never needed the extra performance over fifo2 so never looked into how it worked. Your talk explained everything very well. Thank you.
@user-mc9wh3df5v 3 หลายเดือนก่อน ⁺¹
@@youtou252 I'm the presenter; they allowed me to preview the video.
@amaama4140 3 หลายเดือนก่อน ⁺⁴
Wow, what a marvelous presentation! I enjoyed every single second of it. I have always wanted some similar learning material to start with lock-free programming, and this one was just the right one. Thank you very much, Mr. Frasch.
@user-mc9wh3df5v 3 หลายเดือนก่อน ⁺²
You are welcome. The references in the slides are also very informative.
@starchsky 3 หลายเดือนก่อน
So many good things explained so well in one go. Bravo.
@dian-lunlin7349 3 หลายเดือนก่อน
Great talk! I enjoyed it a lot.
@ksvishvajith 12 วันที่ผ่านมา
That gentleman looks so much like the cousin of Bjourne, cool presentation
@yb9737 3 หลายเดือนก่อน ⁺⁴
Beautiful talk
@user-mc9wh3df5v 3 หลายเดือนก่อน ⁺¹
Thank you.
@LaserFur 3 หลายเดือนก่อน
One way to handle a pointer request when near the end of the FiFo wrap around point is to allocate extra space past the main circular buffer. Then the get pointer function can in the rare cases where the buffer wraps around to do a copy into this extra space so that the caller has a linear buffer. and then the release can copy data back into the FiFo buffer. The only problem is when the FiFo can auto expand the buffer. In that case a mutex is needed and all callers can't be holding pointers. Luckily in my case I don't need lock free and the code just grabs a mutex for any operation and also keeps the mutex held between the getting of the pointer and the release of the pointer. I also use RAII in that my get pointer returns a class to release the pointer and the mutex.
@szirsp 3 หลายเดือนก่อน
It's not clear to me what problem you are describing or trying to solve. But your proposal doesn't seem like a sound solution to me.
@user-ms2if2un7f 3 หลายเดือนก่อน
slide 28, in push function, shouldn't we load popCursor with acquire first, and then load pushCursor with relaxed, correct me if I misunderstand this.
@user-mc9wh3df5v 3 หลายเดือนก่อน
From a correctness point of view the load order shouldn't make any difference. In the push function and while the cursors are being loaded the fifo can only become emptier. If the cursor comparison shows full but the reader thread popped something off, the function will return full and noop. The next time through a slot will be obtained and the push will succeed. The full condition is already the slow path. From a performance point of view it might be a bit more efficient to start the load acquire first since it is a more expensive operation than load relaxed. OTOH, the processor's instruction scheduler show run the loads concurrently. Regardless, the comparison cannot proceed until both values are available.
@Justin_Wang 3 หลายเดือนก่อน
Great talk. I wonder how hardware works and how often do fifo1 with out atomic get error, but after 10,000 times of test in release mode, it still work correctly, do any one have any hint of it?
@Justin_Wang 3 หลายเดือนก่อน
I find out that x86 is a strongly-ordered system; everything is release-acquire by default. So, whether Fiio1 will pass the test or not is decided by how the program is compiled, not at runtime. Am I right about it?
@rocknroooollllll 3 หลายเดือนก่อน
Don't we have to compare/exchange in order to get thread safety here?
@user-mc9wh3df5v 3 หลายเดือนก่อน ⁺¹
You can certainly use atomic compare and exchange. It is a more complicated and expensive operation that the atomic load and stores I use.
@szirsp 3 หลายเดือนก่อน
12:07 While this would work for most, this is not correct [when capacity is not 2's power]. It relies on size_t not overflowing.
(Which is a safe bet when it's 64 bit since it would take hundreds of years, but it is NOT when it's 32 bit or less, for example when running on a 32 bit or 8 bit microcontroller, where it only takes minutes to overflow.)
To demonstrate the problem create a FIFO with 10 capacity and set the cursors to 0xfffffffaUL (or push and pop that many times), then push 10 items into the FIFO (compiled to a 32 bit platform). The write index will be 0,1,2,3,4,5,0,1,2,3. Meaning even though the FIFO was initially empty and had enough space to hold 10 items, it overwrites 4 of those items... and it will try to destruct some items twice.
Also I usually don't like using modulo operation for indexing. Because capacity is not a compile time constant modulo division won't be optimized which could be a problem (mostly on processors without hardware division unit) (even when capacity is 2's power). I'd rather limit the cursor to valid index range / capacity to make sure it at all times contains actual valid array index (which makes debugging easier). (of course you would need to differentiate between full and empty in another way)
I have written a number of lock free FIFOs for microcontrollers over the decades, so it seemed trivial to me that this is possible (even though the implementation might not be trivial, and may include memory barriers) I also realize that it might not be obvious for "modern"/young developers. I still wondered what could be said about them to fill out a 1 hour long presentation...
So I still watched it (on 2x speed) to see if he ever addresses this overflow issue, and the talk did contain some interesting pitfalls. This shows that multi-threading and modern processors are complicated, so nearly everyone should use standard libraries instead of rolling their own FIFO.
@user-mc9wh3df5v 3 หลายเดือนก่อน ⁺¹
That's right. The presentation serves its purpose as is. It would be far too complicated to to deal with ABA issues; perhaps I should have mentioned that.

ต่อไป

เล่นอัตโนมัติ

CppCon 2017: Fedor Pikus “C++ atomics, from basic to advanced. What do they really do?”