Iteration Revisited: A Safer Iteration Model for Cpp - Tristan Brindle - CppCon 2023

CppCon

มุมมอง 9 330

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 25 ส.ค. 2024
cppcon.org/
---
Iteration Revisited: A Safer Iteration Model for Cpp - Tristan Brindle - CppCon 2023
github.com/Cpp...
“Safety” is the word on everyone’s lips at the moment. Unfortunately for C++ programmers, one of our most fundamental abstractions - iterators - are fraught with danger. Prone to out-of-bounds memory accesses, unexpected invalidation and dangling, iterators are a UB minefield in which even experts can find themselves in trouble.
Fortunately there is something we can do about it.
In this talk we’ll look at an alternative, safer abstraction for iterating over sequences and introduce Flux, a new C++20 library implementing these ideas. We’ll see how Flux retains all of the power and flexibility of the existing STL, but greatly reduces the potential for UB through careful design and implementation choices - all while offering compatibility with existing code. We’ll also take a look at performance and examine the cost of universal bounds checking. Spoiler: it’s probably a lot less than you’d think. In fact, we’ll see how Flux code can actually outperform equivalent C++20 Ranges code in some common situations. Finally, we’ll take a quick look at some other quality-of-life improvements that Flux brings.
If you’re interested in the STL and the ranges and algorithms in C++20 and are looking for an easy-to-use way of making your codebase more resilient, then this is the talk for you.
---
Tristan Brindle
Tristan Brindle is a C++ consultant and trainer based in London. With over 15 years C++ experience, he started his career working in high-performance computing in the oil industry in Australia before returning home to his native UK in 2018. He is an active member of the ISO C++ Standards Committee (WG21) and the BSI C++ Panel. He is a regular speaker at C++ conferences around the world, and is a director of C++ London Uni, a non-profit organisation offering free introductory programming classes in London and online.
---
Videos Filmed & Edited by Bash Films: www.BashFilms.com
TH-cam Channel Managed by Digital Medium Ltd: events.digital...
---
Registration for CppCon: cppcon.org/reg...
#cppcon #cppprogramming #cpp

ความคิดเห็น • 51

@TechTalksWeekly 5 หลายเดือนก่อน ⁺¹
This talk has been featured in the #6 issue of Tech Talks Weekly newsletter. Congrats Tristan 👏👏👏
@AlfredoCorrea 5 หลายเดือนก่อน ⁺²
50:49 I think the code generation in this specific example is more telling about the unfortunate implementation/specification of std::filter than about the efficiency of flux.
@Heater-v1.0.0 5 หลายเดือนก่อน ⁺⁵
That is very cool. One of the attractions of Rust for me was that I find C++ syntax obnoxious especially all that ranges and iterating stuff, even after using C++ for decades. FLUX makes everything look very much nicer and simpler. Of course anything that improves safety in C++ by reducing the possibilities for writing UB is a huge and important bonus. Looks like with FLUX I won't be feeling so bad next time I have to do something in C++. Many thanks.
@tommaisey9069 5 หลายเดือนก่อน ⁺¹
It looks much nicer than C++20 ranges to me.
@andersama2215 5 หลายเดือนก่อน ⁺³
I don't write "safe" code. However since there's a growing obsession with the idea I'd like to point something out concerning pointers / iterators, namely "pointers" aren't really any different than indexes, they're just numbers after all. The key aspect that pointers lose that causes all these sorts of these issues is that they're not relative to the originating object that's the "dangling" part of your pointer or reference. There's always a way around to sort the problem with "iterators" because they're just a structure, they're often just pointers, they could just as easily be a combination of an index with a "safe" pointer to the original object.
Concerning safety, the thing I think is really important is how the "check" stage is performed / the semantic of what keeps you in the loop. I've always found it strange that with pointers and iterators the stage people are taught the pattern using != operations. I've always considered it fundamentally "unsafe" but not with respect to UB, but rather because the != operation means the states where taking the loop is considered correct is fundamentally a different thing. It will of course behave "correctly" strictly assuming everything was well behaved before-hand...and your step size is 1. It will not if for whatever reason you're "past" the end iterator (which can easily happen if your step size isn't 1). This is why "iterating" past the end is UB. It never had to be, if the iterator / pointer loop check used the < operation instead you'll be considerably "safer" if for no other reason than the range which the loop functions matches the valid range of addressable locations.
Also note, the index example makes it also clearer the behavior of the program is expected to do in a multithreaded environment, although it's an optimization to store the size before the loop and use it to compare against that choice is made explicit in how it's written. Things like the range for syntax can hide away this ever so slight difference and create problems similar to a dangling reference.
@davidsicilia5316 5 หลายเดือนก่อน ⁺²
I do like the choice of member functions instead of pipes, but it seems like the down side to that design is that it would prevent users from adding their own custom view adapters in a pipeline (without modifying Flux source code). Also, there is another elephant in the room that was not discussed: compile times... the view adapters in the range-v3 library have huge issues with compile times... would be useful to see how this library compares.
@tcbrindle 5 หลายเดือนก่อน ⁺⁵
As to compile times, real C++20 ranges are already much much better than Range-V3's concepts emulation. Flux should be somewhat faster still, as we use a different customisation approach which should result in fewer template instantiations.
@tcbrindle 5 หลายเดือนก่อน ⁺⁴
It looks like the spam filter ate my other comment as I included a CE link.... but anyway, using custom sequence adaptors is possible by saying e.g.
seq._(my_adaptor, args...)
In fact, the included operations can be used with that syntax, so you could say
flux::ref(vec)
._(flux::filter, flux::pred::even)
._(flux::map, [](int i) { return i * i; )
._(flux::sum);
it's just that the built-in adaptors have some syntactic sugar
@davidsicilia5316 5 หลายเดือนก่อน
@@tcbrindle thx for your responses
@driedurchin 5 หลายเดือนก่อน ⁺³
For modern processors the cost of bounds checking is probably none due to branch prediction, right? Intel assumes branch not taken IIRC so even the first time through the loop you don’t pay the runtime cost.
@japedr 5 หลายเดือนก่อน
As nearly always, there is no free lunch. The problem is that branch prediction cache is not infinite, so you are using some resources that could be used somewhere else, especially when those checked accesses are inclined (multiple copies of the same checking code), this could make some other unrelated code slower.
@Roibarkan 9 หลายเดือนก่อน ⁺⁵
18:19 Barry Revzin’s talk: th-cam.com/video/95uT0RhMGwA/w-d-xo.html
@ABaumstumpf 5 หลายเดือนก่อน ⁺⁷
10:25 .. we had that bug. nearly 100% of the time 3 element in the vector, capacity 4, and like once in a blue moon 4. And very rarely we needed to add another element. And even more rare (like once a week) it would happen that a custom "navigator" (bad wrapper around iterators that somebody thought we needed) was used to check the elements and some housekeeping.
it took us a week to get a single example of what happened and the problem was a "it shows strings instead of numbers and then a bit later crashes". The iterator ended up overwriting some memory used in interthread-communication.... sometimes. Often it just "worked".
Aaaand NO - push_back is specified as NOT invalidating iterators IF enough capacity was reserved beforehand.
C++ iterators are just a clunky mess and they are way too restrictive while not adding all that much over pointers. Why is it not allowed to increment/decrement iterators any further? Accessing its content is already UB. This also makes many tricks you'd want to use for high performance or high density storage technically UB. (And then there is the thing that without "auto" their type-names are just insane)
@FreeOfFantasy 5 หลายเดือนก่อน ⁺²
The end iterator already points past the last element. Depending on container type, what happens when you increment it further isn't really clear. While a vector iterator just does pointer arithmetic the iterator of a list would have to access the connecting structure, however there isn't one for the end iterator, or at least there isn't one for the next node, so the result would probably a NULL ptr deref.
@sanderbos4243 5 หลายเดือนก่อน ⁺¹
Amazing talk
@KX36 5 หลายเดือนก่อน ⁺¹
my only question is why is it called flux? I suspect the answer is because what_ranges_could_have_been_if_they_werent_designed_by_committee is too long.
@tcbrindle 5 หลายเดือนก่อน
For the record, the bulk of C++20 Ranges was almost entirely written by Eric Niebler and Casey Carter: the library wasn't "designed by committee". And Flux only exists because of the things I learned from studying, implementing and building upon Eric and Casey's work.
@SolidAir54321 5 หลายเดือนก่อน ⁺¹
Thanks! A very interesting talk.
Two questions:
1. What exactly is a cursor? (For an array it can just be an index of course. But for any container what is it?)
2. How do you access elements for standard library containers like lists and sets?
These can only be accessed by iterators and not indexes. So does it have to use iterators under the covers? Is that unsafe? I might imagine it can just grab the begin() iterator and then use indexes as offsets from that. Maybe safer but still using the standard library's iterator for access.
@Roibarkan 5 หลายเดือนก่อน ⁺¹
- I believe slide 26 aims to answer the first question: a cursor is an object that support the is_last(), read_at() and inc() operations.
- I believe Tristan mentioned that for node-based containers a generic way to implement a cursor is as an object that contains an iterator and ‘the age of the container’, where the idea is that every mutating operation on a container will increase/change its ‘age’, and every operation on the cursor will first check that the container hadn’t aged since the cursor was created.
- potentially one can implement other mechanisms, such as ones that use shared pointers and weak pointers.
@AlfredoCorrea 5 หลายเดือนก่อน
@@Roibarkanthis is where flux will start showing its complexity then. I don’t think this is easily generalizable, useful or efficient for anything other than array data structures. The value I see for flux is as a monadic-like interface for *arrays*.
@SolidAir54321 5 หลายเดือนก่อน
@Roibarkan So for containers like list or set I take it that it's a kind of wrapper that still uses iterators but protects you against bad use of the iterators.
Although I 'm not sure how the generation thing can work with standard containers like a linked list. You could modify the linked list independently. I don't know how Flux would know it was changed.
Yes I guess I could look at the Flux source code although it's pretty complicated. Or I can experiment with it to see what it does. I just didn't want to spend the time if someone had quick answers.
Thanks.
@multiHappyHacker 5 หลายเดือนก่อน
I think modern IDEs also have the free functions accepting that type as a first arg at the bottom of the list, when the intellisense dropdown opens up, try pressing the up arrow to see.
@TNothingFree 5 หลายเดือนก่อน ⁺⁴
It's a good talk,
I'd expect a better flowing presentation , overall the whole point of the talk is iteration by index and the C++ Flux library.
What I miss is:
- Flux vs other well known iteration/container libraries.
- Benchmarks.
@Roibarkan 5 หลายเดือนก่อน ⁺¹
I find slides 33-34 a good dictionary for translating flux to ranges and back, and slides 36-45 as ‘proof’ that flux is as performant as can be
@TNothingFree 5 หลายเดือนก่อน
@@Roibarkan
It is decent and well
But i think it's not enough
@multiHappyHacker 5 หลายเดือนก่อน
But I think the free functions w/ first-arg in intellisense dropdown requires the first arg to be a class type, like you can't get a list of free functions accepting an int as the first arg, but you could with struct Foo or class Bar.
@embeddor3023 5 หลายเดือนก่อน
So it's basically "reject modernity, embrace tradition" kinda of thing... I like it.
@funkmasterhexbyte1684 5 หลายเดือนก่อน ⁺¹
23:14: nit pick: bounds check doesn't guarantee program correctness/safety, if the index refers to a different element after a mutation than it intended to before then, your program just becomes wrong full-stop.
There's nothing that can "guarantee" protection from this class of error without compiler help (one of the talks from a few years ago covered this; there was a compiler check that could catch this exact class of error; it was one of Herb's safety talks)
@charlieandfedericabeattie3432 5 หลายเดือนก่อน
for_each_while with a predicate that modified what you are iterating will put you back where you were. But this *is* how I built all my containers before STL existed.
@Troyseph 5 หลายเดือนก่อน ⁺¹
If compilers are eliding all of the bounds checks, how is it safer?
@anon_y_mousse 5 หลายเดือนก่อน
The compiler sees the entirety of the loop and determines that the bounds will not be violated. So it doesn't need to check before each access.
@tcbrindle 5 หลายเดือนก่อน ⁺⁷
Compilers only elide the bounds checks when they can prove that there is no possible out-of-bounds access. But they're surprisingly good at working that out.
@LeDabe 5 หลายเดือนก่อน ⁺⁶
I'll be faire, while it is a great talk, idea, library, it is clearly a shame that such work is limited to C++20 and up. The rational, is that people on older standard are probably much more in need of such tools. That said, it might not be hard to rewrite or port to say, C++14.
@ZahrDalsk 5 หลายเดือนก่อน ⁺⁸
I don't think it's a shame, I think it's a good thing. Let the base language be hampered by backwards compatibility - fine, but libraries should move on.
@TaranovskiAlex 5 หลายเดือนก่อน ⁺²
watching this from Java world and I'm like... what the heck? is it 2024 outside, or is it 1995?
@KX36 5 หลายเดือนก่อน ⁺³
Seeing someone still in the Java world and thinking... what the heck? is it 2024 outside, or is it 1995?
@yaroslavpanych2067 5 หลายเดือนก่อน ⁺²
Well, transform is much much less problematic name than map. Map has more conflicting meanings than transform.
@KX36 5 หลายเดือนก่อน
it's just map-filter-reduce is a standard paradigm in algorithms and functional programming and most other modern languages use those names. I agree that it's awkward that C++ chose to use "map" and "unordered_map" for containers long ago but I don't think it would be much of a problem in practice.
@gfasterOS 5 หลายเดือนก่อน ⁺¹
What could C++ iterators express that Rust iterators couldn't? You can write a Rust iterator in terms of pointer iteration, so I would think if anything Rust iterators are more expressive.
@embeddor3023 5 หลายเดือนก่อน ⁺¹
does Rust have random access iterators?
@driedurchin 5 หลายเดือนก่อน ⁺²
Rust iterates aren’t random access, you can’t use them to do things like, say, swap two elements. That prevent writing an algorithm like bubble sort.
That being said, they weren’t meant to, so no fault to them.
@ChlorieHCl 5 หลายเดือนก่อน ⁺¹
There are more types of iterators in C++, namely input iterators (only single-pass is allowed), forward iterators (multi-pass allowed), bidirectional iterators (you can also go backwards), random-access iterators (you can skip elements), and contiguous iterators (like array slices). Many of the additional properties are needed for the generic algorithms to work, and for some algorithms more efficient implementations can be used for the iterators that provide more functionalities.
@kaarvan 5 หลายเดือนก่อน
If you can write A in terms of B, by default this means that B is at least as expressive, and frequently more expressive. And C++ iterators are basic "pointers in disguise". I don't know Rust so as to provide an example, but can you find the "distance" between two Rust iterators?
@gordonhenriksen3144 5 หลายเดือนก่อน ⁺⁷
Rust's iterator abstraction is forward-only; this design is incompatible with many STL algorithms (for example, std::sort, std::merge, std::make_heap, std::reverse, std::swap_ranges). Note that Rust's stdlib sort algorithm is provided for slices instead of iterators, for example. That's fine for Rust, but an abstraction that's not isomorphic with iterators would preclude Flux's bidirectional interoperation feature @ 40:20.
@anon_y_mousse 5 หลายเดือนก่อน
I've been writing my iteration for over 25 years using nothing more than integer indices and I've hated C++ iterators that whole time. The super clunky type names were also a big part of it, especially the std namespace. I've also known this whole time that it's more efficient to use an index than it is to use a pointer, most especially if you're iterating multiple arrays at the same time. I will never understand this drive that some people have to overcomplicate things. Also, I will never use .at() because the only time I need to index into a vector is when I'm iterating the whole thing or I've already checked the index because it came from an external source and it's no less safe to use the length as the barrier condition when doing something with it and should the vector change .size() then the loop exit condition will reflect that. Also also, if the contents don't change during the iteration and you know they won't, then the new form `for ( auto i : a )` is safe and easier to type. If on the other hand you need to either grow or shrink the array during iteration, then a flat index is easier to deal with than using some namespace::function_name( object ) methodology.
As for the current trend of "functionalization" of loops, I think it's absolutely stupid. Not just the function name choice, though `map` is an especially egregious choice, but also in the first example shown where the contents of the array were squared in the loop and then the maximum was determined at the very end was horrifically inefficient. The maximum would be the maximum whether you squared it first or at the very end and determining the maximum as you went would be far more efficient. It seems like functional programming is the new pointer tricks, only instead of getting code that crashes now it takes until the heat death of the universe and eats all your RAM.
Please take the time to write your own C standard library using pure C and no assembly and profile it at -O0. If you want the shortcut to learning, then start with implementing all of the mem*() and str*() functions along with qsort() and bsearch(). It shouldn't take you more than a few hours to implement just those functions. If you can implement qsort() iteratively, then we'll talk.
@anon_y_mousse 5 หลายเดือนก่อน
I'm adding this note for whoever is running the channel that you might not realize that the settings are such that my post gets hidden from public view. This is why I'm leaving a dislike, as this type of censorship by TH-cam is insidiously evil and whatever settings you have set are enabling it.
@Roibarkan 5 หลายเดือนก่อน ⁺²
@@anon_y_mousseI can see your comment 😊
@anon_y_mousse 5 หลายเดือนก่อน
@@Roibarkan Good, and I will change my like accordingly. I see that the newer post is also now visible. So clearly they found the setting.
@anon_y_mousse 5 หลายเดือนก่อน
@@Roibarkan And now I see that my other comments are still not showing up, including my first response to you. So clearly I was wrong that they found the setting.

ต่อไป

เล่นอัตโนมัติ

C++ Memory Model: from C++11 to C++23 - Alex Dathskovsky - CppCon 2023