I wish I had found this video 2 years ago when I was trying to figure out why my release version was spitting out a blank window Icon while my debug version worked fine.
I have a programming exercise that I give to interviewees, the preferred solution to which requires the use of two pointers to a mutable buffer. When I first started trying to learn Rust in earnest, this was one of the first things I tried to write, and discovered that Rust wouldn't let me do it. And I thought this was absurd, because having multiple pointers into mutable storage is quite common. "How do you get anything done in this language?" I thought. And then I remembered the classic solution to reversing the elements of a vector/array in place requires two pointers to mutable storage -- one starts at the beginning, the other starts at the end. You swap the elements at the pointers, then increment/decrement the respective pointers until they meet in the middle. A quick look at the Rust standard library showed that it has a std::vec::reverse() function, so I went and had a look at the source code to see how they achieved it... and discovered (at the time) it was doing exactly that -- using two mutable pointers -- in an unsafe block. The current version of reverse() is a bit more clever about obtaining the pointers -- basically dividing the buffer in half, creating a new reference to each half, then destroying the original reference -- but still requires an unsafe block to obtain those pointers.
You can easily reverse the elements of a vector in-place in safe Rust by doing a destructuring operation: (vec[i], vec[vec_size - i - 1]) = (vec[vec_size - i - 1], vec[i]), given that vec: &mut Vec. This requires only one mutable reference
@@francishubert2020 I've been around enough pessimizing compilers over the years to be concerned that the generated machine code from such an approach won't use pointers directly, but will instead recalculate the offsets from the array indices on every pass through the loop.
This was amazing. What I got , as a beginner to both C and Rust, is that: 1. Unsafe Rust may be trickier (not necessarily harder) and more restrictive to write than write C/C++ 2. The trade-off to above is that tools like Miri can detect Rust UB, while existing tools in C/C++ cannot detect UB (for very specific and obscure cases) 3. this shit is hard but its pretty fun
No not all. There's a list of limitations in the README here: github.com/rust-lang/miri. And in general I don't think it'll ever be able to detect UB caused by FFI calls into C or assembly.
Let me give you a 4.: In practice, it is VERY rarely necessary to actually write any unsafe rust unless you are writing your own low level collection types, thread synchronization mechanisms or similar stuff. And even in these cases, it's just tiny, hopefully well documented unsafe blocks which makes potential UB much easier to audit. Usually you can write entire programs in pure safe rust though, get the same efficiency and runtime speed you would get in C/C++ and never have to think about any of this nasty UB stuff - just let the extremely helpful compiler errors guide you.
Asan and ubsan are tools that can detect undefined behaviour, not sure what you mean by "in obscure cases" though. I'd personally argue that unsafe rust is significantly more complex than C, since you're interacting with a system that is more complex itself. It's true that you need way less of it compared to C where safety is guaranteed only by the programmer.
A way to explain that unsafe has all these restrictions is that unsafe does not enable you to bypass the rules of safe rust but rather shifts the enforcement of them on the programmer.
I really like the style of this video. You explicitly spell out the scope of the video (what we'll cover, what we won't) and worth through each detail very clearly. I could imagine having a video library of hundreds (thousands?) to thoroughly learn each language piece by piece. Very interesting to watch :D
33:40 MSVC always considers that aliasing is possible (even if you use *restrict) hence doesn't hard code pointers even while optimizing. So there is no "-fno-strict-aliasing" flag.
I've written a lot of C++ and C. Perhaps too much, because I understood the "pathological example" almost immediately - in the sense that I opened up the playground and every experiment I made came out exactly as predicted, which was a a funny experience because I've barely ever touched Rust, and never Unsafe. But if you're used to thinking about what is- and is-not UB in C/C++ and what optimizations you're hoping to achieve when you use the Restrict keyword (and how Restrict shenanigans can quickly turn UB) then the Rust example almost explains itself. I'm glad you showed it to us, this is really good stuff. EDIT: C.f. the bit about a "false positive" in the borrow-checker - I'm glad you illustrated that "I wish the borrow checker would allow this" is functionally equivalent to "I wish the compiler was less certain about pointer providence so it would be forced to write slower code."
So Rust assumes you don't understand the machine, but assumes you understand Rust... if you don't understand Rust, you get errors. C assumes the programmer understands both the machine and C, and if you don't you get errors. Given that C is obviously more simple than Rust, the question is: Is understanding implicit/explicit Rust abstractions simpler than understanding the hardware itself? Regardless, it would be waaaay harder to write C code if basically everything used restrict and const by default.
Great video, wonderfully clear explanation. Loved the four side by side rust/c examples with the mutable pointers. I kinda knew that rust optimises with its aliasing rules in mind, but I didn't think to connect that to UB!
I'm not a Rust programmer and I've presumed unsafe mostly turns off all the checks. I've heard it's not quite like that but wasn't clear on how. Already at 8:06 I presume a large part of the point here is made (I promise I'll watch more later). Rust unsafe just wraps unsafe actions. Not unsafely handled data or anything else. What I had imagined is some sort of clobber declaration. Like gcc extended asm with its clobbers. You declare what you've touched so it can retain checks for the rest of the code. This was much nicer.
Yes, this is a great description of what unsafe Rust is and isn't. If you'd like a flood of more details, you can also take a look at a type called MaybeUninit from the standard library. This type is how we represent possibly uninitialized values, which are illegal in most other contexts. The distinction between which methods are safe and which are unsafe on that type is very informative I think. Note that it's perfectly safe to construct it, but generally unsafe to do anything useful with it. It's also interesting to note that this type wasn't present in version 1.0, and it was added later once some of UB problems it's solving were better understood.
What unsafe changes in Rust is *only* allowing a few operations forbidden in safe code - otherwise it’s just as safe as safe Rust (ie. *nothing* changes in safe code wrapped with the unsafe keyword), and those are: - dereference raw pointers (you can create raw pointers in safe Rust, but to read anything from them you need unsafe), - call unsafe functions (those might be C FFI, or functions that need you to manually ensure some invariants aren’t broken to make calling them sound), - access fields of (untagged, C-like) unions (Rust enums are safe tagged union-types, but if you have a union without a tag marking which variant it’s in, you have to use unsafe to access its internals), - implement unsafe traits (which typically are “marker” traits, ie. it lets you for example manually mark your type as thread-safe even though it contains non-thread-safe fields), - mutate static objects (which can only ever be read in safe code), and that’s it. And all the “safe” types and interfaces are generally built on top of those, that is low-level safe functions will often use raw pointers or external C functions in their implementation, but will do that in a way making sure that no required invariants can be broken by their caller, and thus being completely safe to call from safe Rust.
@@benedyktjaworski9877 > mutate static objects (which can only ever be read in safe code), You cannot read mutable static in safe Rust, btw. In general, mutable statics are very hard in Rust.
@@ХузинТимур Yeah, you’re totally right! I was thinking about static objects in general (you can have immutable ones that you can safely read and cannot ever mutate) and oversimplified, my bad!
Late to the party. This was an amazing presentation. The _ = &mut x line breaking the code is super scary and I wonder what rule we are breaking in that case..
When I wrote that example, my understanding was that it violated the experimental "stacked borrows" rules that Miri was enforcing. But I've heard that Miri has changed since then, and I'm not sure what the current model is.
Re the pathological example around 24:00: This is a consequence of Stacked Borrows rules. Stacked Borrows is overtly strict, since it wants to allow the optimizer to always _add writes to mutable references._ Therefore, it does a "fake write" whenever such a reference is created, and this fake write means the shared reference must die (since otherwise adding an actual write is wrong). As you point out, this is often unintuitive, since no actual write happened. Tree Borrows, the alternative aliasing model for Rust (that we really need to add to the playground) is more lenient. There, your pathological example is not UB. But it loses some optimizations about writes. Which of these will be the official one in the end is as of yet undecided.
Oh wow, it's awesome to hear from you. I don't know much about exactly how Tree Borrows is different, but I have to turn it on to get Miri to accept casting a &T into a &ReadOnlyCell, so I'm excited about that :)
Very interesting video! And speaking of unsafe code in Rust, is there any tool that would help us to check memory leaks?. Let's say we wanna write CRust FFI and there is a "rule" that says "if u allocate memory in C u have to also free that memory in C, and same applies to Rust". So my questions is: Is there any tool that checks whether allocated memory was freed properly? ( Especially for rust, for C we can use something like Valgrind I believe).
I think if you compile both C and Rust with ASan turned on you might get something like what you're describing? But I haven't tried it myself. (Miri also includes some leak detection features, but I don't think it supports FFI.)
C and C++ are very different. I hate when people treat them as if they were the same, just because there is a large common subset. The "overlapping" example violates the same strict aliasing rule, by accessing storage that has int objects, but no Foo objects, through a Foo pointer. In most cases, it's UB to access an object through a pointer of different type. There are some exceptions. I'm coming from a C++ background, but as far as I'm aware, these aliasing rules are the same in C, except for unions.
In "a pathological example" I believe everything would be fine if you had written x_ptr.read() instead of *x_ptr. It's always seemed like a mistake to me that Rust pointers can be casually dereferenced since transient references are such a common source of UB in unsafe Rust code. Sure, you need ways to create references from pointers but they should be a lot more explicit and eye catching than something as casual and commonplace as the dereference operator, which just encourages the conceptual conflation of references and pointers. Even in cases where dereferencing would be safe, I always default to explicit read/write calls. The good news is that MIRI is reliable at catching this class of UB as long as you have full code coverage for unsafe code.
I'm pretty sure doing it with .read() is totally equivalent, at least in this case. Miri still thinks it's UB: play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=63f5f73daf0dc0413f7312af5cdb2d4e
It would probably be undefined behavior in C if you declared x and y as restrict pointers, that tells the compiler x and y cannot be aliased so it optimizes accordingly and always return 42
In 32:17, you say you think the casts [from `(int *)` to `(Foo *)`] are allowed. That's true, since any casts between data pointers are allowed (as long as there are no alignment or similar issues; let's ignore that part). However, I think the same rule of the previous example applies: You cannot access the data through that pointer. You must cast back to `(int *)` before dereferencing. So, the access to `foo1->y` is already illegal (access through a wrong type) even if you would remove the other (partially-overlapping) write.
What's the rule on "header" structs? Like if the first field of `struct foo` is of type `struct foo_header`, can I pointer cast between them in some cases? I'm not aware of any rule that says yes, but this also seems to be something that C projects (like CPython) do constantly, as sort of a simple kind of inheritance.
@@oconnor663 It's . That rule still requires that the structures are within a union. So, yes, most C programs that use such structures do trigger UB. For some reason, there's a misconception about this rule, and people seem to believe that just by sharing a first member, two structures are at least partially compatible. That idea probably comes from times _before_ ANSI C, and back then before standard C this was probably true in compilers of that era. There was a discussion about the sockaddr(3type) structures, which are a clear example of UB. Those APIs predate ISO C, and force the programmer and libc to cause UB. It was decided to add wording in POSIX.1-2024 to clarify that compilers must make sure that sockaddr(3type) structures don't cause UB even if they should according to the general rules of ISO C, by using any compiler-specific means (such as compiler attributes that make the types compatible even if they aren't).
@@oconnor663 It's 6.5.2.3p6. For some reason, youtube was removing my comments with a link. That rule still requires the use of a union. And yes, most programs violate that rule. That misconception that the common initial sequence allows seemingly unrelated structures to be compatible probably comes from the times before ANSI C. Back then, compilers allowed such uses. Now they don't anymore. It's interesting that sockaddr(3type) is a POSIX API which makes a bad use of that rule in the standard. It is a standard API that forced users and libc to invoke UB. The wording in POSIX.1-2024 was changed to say that implementations must use any compiler-specific features (such as attributes) to avoid invoking UB in uses of that API, which according to general ISO C aliasing rules is UB. So, it's basically a recognition that the rules conflict with pre-ISO C.
I tried the code in the minute 15:00 myself, and using compiler explorer like in the video found that this rust UB does not happen anymore from rustc 1.75.0 forward. Can anyone else confirm this as I dont seem to find the explanation specified in the 1.75.0 update change log.
Thanks for catching this. To reproduce the same bug on current Rust, I need to add #[inline(never)] above foo(). Here's an updated godbolt: godbolt.org/z/T6od4qsoz. Using cargo-bisect-rustc, it looks like the first nightly that "broke" the original example was 2023-10-19. Of the commits that went in that day, it looks like one of them was an LLVM update, which was probably the culprit: github.com/rust-lang/rust/pull/116840. I'll add this to the list of errata in the video description, and I'll add the updated godbolt to the list of links.
32:15 I donʼt know about C, but in C++ Iʼm almost certain that at least the cast on line 17 is not allowed. The one on line 16 used to not be but I think now *is*, although Iʼm not certain of that. Iʼd need to double check how providing storage for contained objects works again now.
@@oconnor663 This is exactly the kind of question I feel like ubsan can have bugs with, but maybe thatʼs a wrong impression because one of the first times I used it I ran into a (since fixed) bug, where the compiler compiled perfectly valid lambdas in a way that would be UB for the programmer to write even though it proceeded to compile that intermediate form in a way that did the right thing, and then UBSan came along and caused it to crash as UB. In all other cases Iʼve heard of it only gives false negatives not false positives.
One subtlety here is that we're not loading any B's (that is, Foo's) from the array. Instead we're only accessing the int members, which does match the array type. I *think* this makes each individual access ok according to the standard, but I could be totally wrong about that.
@@oconnor663 Imagine a CPU has a dedicated instruction `load-8byte-aligned ptr, offset`, which is faster than generic `load ptr, offset`, and the compiler decided that all structs must be 8-bytes aligned
Definitely agreed that this code will be broken if Foo has alignment other than 4. But if we imagine adding an assert that Foo alignment is 4 before the pointer cast, and we didn't create the second pointer, do you think accesses through the first pointer would still be per se UB?
@@exl5eq28 honestly, same. C really needs a tool like Miri that can (aspirationally) catch 100% of UB, at least in pure, deterministic programs like this one. But my impression that this is almost impossible when the standard isn't designed in advance to accommodate it.
Thanks for a very informative video - it certainly made me think about some things that I hadn't really considered too much before. Regarding around 34:00 the standard way to make sure C doesn't do the unwanted optimisations is to use the volatile keyword for the function inputs. That tells the compiler that the values referenced by the pointers can change at any moment, this makes the code behave as expected even with -O3.
Oh that's an interesting point. It does make sense that volatile would suppress this optimization. That said, volatile has a history of being abused for purposes that the standard doesn't actually guarantee it's good for (like atomic communication between threads), and I'd want to be careful about relying on this.
It's a combination of problems. A cheap new webcam, a new apartment lighting situation, and the fact that I still don't quite know what I'm doing. I'll get there.
42:10 wouldn't this assembly cause incorrect behavior if the int64_t pointer is pointing to two int32_t elements in the middle of the array? I guess that the compiler makes the assumption that the 64bit pointer can't point to the middle of the array because a pointer cast like that is undefined behavior (?) I don't know the rules to this kind of pointer casts, honestly
Yes it would certainly cause problems, but the way I like to think about it is that the "strict aliasing" rules define those problems to be the programmer's fault rather than the compiler's fault.
As much of the video mentions, the C standard specifies that compilers may assume pointers of different types don't overlap in memory. The uint64 is a different type to the uint32 array, so the compiler can make an optimisation that is broken if that rule is also broken. The uint32 pointer is the same type as the array - and by the rules laid down by the standard, the compiler cannot assume that it occupies a different location in memory, so it has to do the calculation the long way, in case the pointer is overwritten midway through. I imagine there is some logical justification to this in the sense that, generally speaking, in a well-formed program data of different types should not overlap in memory (and data of the same type shouldn't overlap at different addresses), but some common techniques like subsetting part of a struct with another struct probably run afoul of this more often, and is why things like Linux (which use that technique a lot) need strict aliasing to be turned off.
Why is the pointer cast at 27:55 unsafe? I get why the example produce the wrong result, but is the pointer cast *alone* always unsafe, or is it only for this example?
I don't think this pointer cast all by itself is UB. Like if you were to cast the pointer and then just throw away the result, that should be allowed (but pointless). However I think actually doing anything through the cast pointer is an "access through an incompatible type", which is UB according to the current standard. There's a footnote in the standard that reads "The intent of [this rule] is to specify those circumstances in which an object may or may not be aliased", and all the cases I'm aware of that produce the wrong result by breaking this rule do it by aliasing the same object with two pointers of different types, so it's fair to say that _in practice_ you have to write code that's kind of similar to this example to break things. But relying on the intent of the standard is a dangerous business, even when that intent is explicitly spelled out in a footnote. The letter of the law is that access through an incompatible type is UB, period, and I think that's sufficient grounds to say that it's "unsafe" to do anything at all with this cast pointer, whether it's similar to this example or not. That said, someone else pointed out in the comments that a proposal called "Improved Rules for Tag Compatibility" (N3003) has been adopted into C23, and that's going to change these rules somewhat. So this particular example may actually become legal, if you're using a very recent compiler.
@@oconnor663 It's still UB in C23. N3003 only allows structures with the same tag and contents to be compatible. The example uses a different tag, and so is UB.
Is it correct that you can use have ailaising mutable raw pointers *mut T but if and only if there are no other references to the same memory location?
Yes, *mut pointers are allowed to alias each other. However, you have to be super careful with what you do with them. For example, if you dereference one of them and call a &mut self method, it might not be ok for the other pointers to alias that implicit &mut self. The exact rules for this have been changing a lot over the past few years, and the only consistent recommendation I can give is to test whatever you're trying to do under Miri 😅
That's the video for all the ignoramuses that claim that Rust is overcomplicated. No, Rust is exactly as complicated as the problem that it's solving. Many people write undefined behaviour in C without ever knowing it, all it takes is one recast. Not to mention the absolutely ridiculous performance hit that comes from poor optimization coming from pessimistic aliasing analysis. Yes, you don't run into that sort of an issue often, but in larger code bases you inevitably will run into a problem like this. And it can literally take weeks to track down.
Not getting such warnings is kind of the point of assigning to _. It's not very intuitive because it's different than any other assignment. "_ does not bind" meaning that anything assigned to _ will be dropped immediately instead of at the end of the scope. That's useful in certain situations, like destructuring tuples "let (a, _) = something_returning_a_tuple();" or avoiding to keep something with #[must_use] (like Result) in memory.
unsafe rust is way worse than c. ive heard rust will be fixing unsafe some time though (i heard this a while ago and havent kept up so maybe it already happened?)
Yeah the distinction between "unsafe" and "unsound" is really important and not very clear from the names. That said, I can't think of a better keyword.
The statement about Rust programs being 95-99% safe code just isn't entirely correct. It depends on what kind of applications you are writing. Ash, Rust low-level wrapper for Vulkan, the one that can actually give you C++ levels of performance, assumes that most of your Vulkan code will be wrapped in unsafe blocks, including your main rendering loop. (and man, it does look like an ugly clunky cousin of a similar C++ code there with all those CStr::from(b"hello world") all over the place) The same will be true basically for any code working with hardware or low-level OS API calls, like memory mapping. Basically that's why I've noped out of the idea of using it for game development after some evaluation.
That's probably a byproduct of GNU libc dependency. There's this "crABI" proposal that aims to deprecate libc in favor of a universal ABI that multiple programming languages (and graphic APIs such as Vulkan and OpenGL) can bind to
@@JoJoModding That's why I stated that "it depends on the application". But games are simply the most obvious example. If YT would still be allowing posting links to other videos in comments, I would share a link to a video of one of those "Rust gurus" on YT introducing UB into his Cell container clone with 20 LoCs via "unsafe", having a second thought about it, and then dismissing it with "nah, compiler knows better". IMO Rust will make the code that produced by juniors - or even AI - safer, while giving those working on lower levels shiny new high caliber guns to aim at their feet. And just think of the possibilities Crates' centralized code base has for slipping obscure back doors into everyone's code in unsafe blocks somewhere deep down in the code.
No very special reason. It's just that when I need "an arbitrary example value with no special meaning" my first choice is 42 and my second choice is 99.
It could've been thirty seconds: 1. Rust doesn't have the "strict aliasing" rules from C and C++. 2. But all Rust references are effectively "restrict" pointers, so getting unsafe Rust right is harder in practice. 3. It would be nice never to have to worry about any of this, but it turns out that a lot of optimizations don't work without aliasing information.
Very minor correction at 16:16, Miri doesn't insert checks into your code. Rather, it interprets Rust's MIR to test for UB, memory leaks, etc.
Good catch! Added the first erratum :)
@@oconnor663 Thanks! Very good video! I'm a Rust programmer (and former C++ programmer) and learned a lot!
I wish I had found this video 2 years ago when I was trying to figure out why my release version was spitting out a blank window Icon while my debug version worked fine.
I have a programming exercise that I give to interviewees, the preferred solution to which requires the use of two pointers to a mutable buffer. When I first started trying to learn Rust in earnest, this was one of the first things I tried to write, and discovered that Rust wouldn't let me do it. And I thought this was absurd, because having multiple pointers into mutable storage is quite common. "How do you get anything done in this language?" I thought.
And then I remembered the classic solution to reversing the elements of a vector/array in place requires two pointers to mutable storage -- one starts at the beginning, the other starts at the end. You swap the elements at the pointers, then increment/decrement the respective pointers until they meet in the middle. A quick look at the Rust standard library showed that it has a std::vec::reverse() function, so I went and had a look at the source code to see how they achieved it... and discovered (at the time) it was doing exactly that -- using two mutable pointers -- in an unsafe block.
The current version of reverse() is a bit more clever about obtaining the pointers -- basically dividing the buffer in half, creating a new reference to each half, then destroying the original reference -- but still requires an unsafe block to obtain those pointers.
You can easily reverse the elements of a vector in-place in safe Rust by doing a destructuring operation: (vec[i], vec[vec_size - i - 1]) = (vec[vec_size - i - 1], vec[i]), given that vec: &mut Vec. This requires only one mutable reference
@@francishubert2020 I've been around enough pessimizing compilers over the years to be concerned that the generated machine code from such an approach won't use pointers directly, but will instead recalculate the offsets from the array indices on every pass through the loop.
@@ewhac Then maintain two indices i and j (incrementing i and decrementing j), and use them both to index into one &mut Vec
I'm not sure what this comment is even about. So what if it uses unsafe? That's literally one of the usecases for unsafe.
I'm curious. Could you please post the code of that exercise solution that requires the use of two pointers to mutable data? Thanks.
14:30 to be fair, I've written C code that has a memory exception in debug and passes in optimization.
This was amazing. What I got , as a beginner to both C and Rust, is that:
1. Unsafe Rust may be trickier (not necessarily harder) and more restrictive to write than write C/C++
2. The trade-off to above is that tools like Miri can detect Rust UB, while existing tools in C/C++ cannot detect UB (for very specific and obscure cases)
3. this shit is hard but its pretty fun
Can Miri detect all possible cases of UB?
No not all. There's a list of limitations in the README here: github.com/rust-lang/miri. And in general I don't think it'll ever be able to detect UB caused by FFI calls into C or assembly.
Let me give you a 4.:
In practice, it is VERY rarely necessary to actually write any unsafe rust unless you are writing your own low level collection types, thread synchronization mechanisms or similar stuff. And even in these cases, it's just tiny, hopefully well documented unsafe blocks which makes potential UB much easier to audit. Usually you can write entire programs in pure safe rust though, get the same efficiency and runtime speed you would get in C/C++ and never have to think about any of this nasty UB stuff - just let the extremely helpful compiler errors guide you.
this shit is pretty fun indeed
Asan and ubsan are tools that can detect undefined behaviour, not sure what you mean by "in obscure cases" though. I'd personally argue that unsafe rust is significantly more complex than C, since you're interacting with a system that is more complex itself. It's true that you need way less of it compared to C where safety is guaranteed only by the programmer.
A way to explain that unsafe has all these restrictions is that unsafe does not enable you to bypass the rules of safe rust but rather shifts the enforcement of them on the programmer.
I really like the style of this video. You explicitly spell out the scope of the video (what we'll cover, what we won't) and worth through each detail very clearly. I could imagine having a video library of hundreds (thousands?) to thoroughly learn each language piece by piece. Very interesting to watch :D
33:40 MSVC always considers that aliasing is possible (even if you use *restrict) hence doesn't hard code pointers even while optimizing. So there is no "-fno-strict-aliasing" flag.
I swear your videos on Rust vs C++ are amazing. I just need more of them xD
I've written a lot of C++ and C. Perhaps too much, because I understood the "pathological example" almost immediately - in the sense that I opened up the playground and every experiment I made came out exactly as predicted, which was a a funny experience because I've barely ever touched Rust, and never Unsafe.
But if you're used to thinking about what is- and is-not UB in C/C++ and what optimizations you're hoping to achieve when you use the Restrict keyword (and how Restrict shenanigans can quickly turn UB) then the Rust example almost explains itself. I'm glad you showed it to us, this is really good stuff.
EDIT: C.f. the bit about a "false positive" in the borrow-checker - I'm glad you illustrated that "I wish the borrow checker would allow this" is functionally equivalent to "I wish the compiler was less certain about pointer providence so it would be forced to write slower code."
So Rust assumes you don't understand the machine, but assumes you understand Rust... if you don't understand Rust, you get errors.
C assumes the programmer understands both the machine and C, and if you don't you get errors.
Given that C is obviously more simple than Rust, the question is:
Is understanding implicit/explicit Rust abstractions simpler than understanding the hardware itself?
Regardless, it would be waaaay harder to write C code if basically everything used restrict and const by default.
this is way way over my head...I'll come back here a year or two to make sure I gained more knowledge xD
awesome stuff!
Great video, wonderfully clear explanation. Loved the four side by side rust/c examples with the mutable pointers. I kinda knew that rust optimises with its aliasing rules in mind, but I didn't think to connect that to UB!
Really good video! Super well communicated and informative.
Thank you for the Playlist. At least I have lessons being learned in order. Focus guaranteed. Thankyou. Great work Sir. Michael.
This is incredibly approachable. Thanks so much for making this video.
I'm not a Rust programmer and I've presumed unsafe mostly turns off all the checks. I've heard it's not quite like that but wasn't clear on how.
Already at 8:06 I presume a large part of the point here is made (I promise I'll watch more later). Rust unsafe just wraps unsafe actions. Not unsafely handled data or anything else.
What I had imagined is some sort of clobber declaration. Like gcc extended asm with its clobbers. You declare what you've touched so it can retain checks for the rest of the code.
This was much nicer.
Yes, this is a great description of what unsafe Rust is and isn't. If you'd like a flood of more details, you can also take a look at a type called MaybeUninit from the standard library. This type is how we represent possibly uninitialized values, which are illegal in most other contexts. The distinction between which methods are safe and which are unsafe on that type is very informative I think. Note that it's perfectly safe to construct it, but generally unsafe to do anything useful with it. It's also interesting to note that this type wasn't present in version 1.0, and it was added later once some of UB problems it's solving were better understood.
What unsafe changes in Rust is *only* allowing a few operations forbidden in safe code - otherwise it’s just as safe as safe Rust (ie. *nothing* changes in safe code wrapped with the unsafe keyword), and those are:
- dereference raw pointers (you can create raw pointers in safe Rust, but to read anything from them you need unsafe),
- call unsafe functions (those might be C FFI, or functions that need you to manually ensure some invariants aren’t broken to make calling them sound),
- access fields of (untagged, C-like) unions (Rust enums are safe tagged union-types, but if you have a union without a tag marking which variant it’s in, you have to use unsafe to access its internals),
- implement unsafe traits (which typically are “marker” traits, ie. it lets you for example manually mark your type as thread-safe even though it contains non-thread-safe fields),
- mutate static objects (which can only ever be read in safe code),
and that’s it.
And all the “safe” types and interfaces are generally built on top of those, that is low-level safe functions will often use raw pointers or external C functions in their implementation, but will do that in a way making sure that no required invariants can be broken by their caller, and thus being completely safe to call from safe Rust.
@@benedyktjaworski9877
> mutate static objects (which can only ever be read in safe code),
You cannot read mutable static in safe Rust, btw.
In general, mutable statics are very hard in Rust.
@@ХузинТимур Yeah, you’re totally right!
I was thinking about static objects in general (you can have immutable ones that you can safely read and cannot ever mutate) and oversimplified, my bad!
Late to the party. This was an amazing presentation. The _ = &mut x line breaking the code is super scary and I wonder what rule we are breaking in that case..
When I wrote that example, my understanding was that it violated the experimental "stacked borrows" rules that Miri was enforcing. But I've heard that Miri has changed since then, and I'm not sure what the current model is.
@@oconnor663 thanks for the answer, have a great new year's eve!
works perfectly, thank you
you do great presentations dude, keep it up. i'm learning so much
Re the pathological example around 24:00:
This is a consequence of Stacked Borrows rules. Stacked Borrows is overtly strict, since it wants to allow the optimizer to always _add writes to mutable references._ Therefore, it does a "fake write" whenever such a reference is created, and this fake write means the shared reference must die (since otherwise adding an actual write is wrong). As you point out, this is often unintuitive, since no actual write happened.
Tree Borrows, the alternative aliasing model for Rust (that we really need to add to the playground) is more lenient. There, your pathological example is not UB. But it loses some optimizations about writes.
Which of these will be the official one in the end is as of yet undecided.
Full disclosure: I am a coauthor of the Tree Borrows rules. If you have any feedback/questions, just ask them here.
Oh wow, it's awesome to hear from you. I don't know much about exactly how Tree Borrows is different, but I have to turn it on to get Miri to accept casting a &T into a &ReadOnlyCell, so I'm excited about that :)
@@oconnor663 Here are some slides that explain it by example: perso.crans.org/vanille/share/satge/arpe/etaps.pdf
Thank you for providing this video. I wish that you can create a playlist for "The Rustonomicon". 👍
I loved this video, hard but pretty fun
that really was amazing. thank you
Very interesting video! And speaking of unsafe code in Rust, is there any tool that would help us to check memory leaks?. Let's say we wanna write CRust FFI and there is a "rule" that says "if u allocate memory in C u have to also free that memory in C, and same applies to Rust". So my questions is: Is there any tool that checks whether allocated memory was freed properly? ( Especially for rust, for C we can use something like Valgrind I believe).
I think if you compile both C and Rust with ASan turned on you might get something like what you're describing? But I haven't tried it myself. (Miri also includes some leak detection features, but I don't think it supports FFI.)
Thank you!
The nightly compiler has ASan support.
Valgrind works for Rust too, though I believe it shows some leaks that it shouldn't, last I heard.
C and C++ are very different. I hate when people treat them as if they were the same, just because there is a large common subset.
The "overlapping" example violates the same strict aliasing rule, by accessing storage that has int objects, but no Foo objects, through a Foo pointer. In most cases, it's UB to access an object through a pointer of different type. There are some exceptions. I'm coming from a C++ background, but as far as I'm aware, these aliasing rules are the same in C, except for unions.
Hooooo been waiting for another vid from you.
In "a pathological example" I believe everything would be fine if you had written x_ptr.read() instead of *x_ptr. It's always seemed like a mistake to me that Rust pointers can be casually dereferenced since transient references are such a common source of UB in unsafe Rust code. Sure, you need ways to create references from pointers but they should be a lot more explicit and eye catching than something as casual and commonplace as the dereference operator, which just encourages the conceptual conflation of references and pointers. Even in cases where dereferencing would be safe, I always default to explicit read/write calls.
The good news is that MIRI is reliable at catching this class of UB as long as you have full code coverage for unsafe code.
I'm pretty sure doing it with .read() is totally equivalent, at least in this case. Miri still thinks it's UB: play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=63f5f73daf0dc0413f7312af5cdb2d4e
@@oconnor663 Thanks, the playground was down earlier today or I would have checked myself. Just goes to show how easy it is to mess this up.
Awesome presentation, thanks!
What a great presentation!!
Great video! I learned a lot 😊
It would probably be undefined behavior in C if you declared x and y as restrict pointers, that tells the compiler x and y cannot be aliased so it optimizes accordingly and always return 42
Correct!
Too complicated for me, but you and your videos are kinda captivating to watch :)
In 32:17, you say you think the casts [from `(int *)` to `(Foo *)`] are allowed. That's true, since any casts between data pointers are allowed (as long as there are no alignment or similar issues; let's ignore that part). However, I think the same rule of the previous example applies: You cannot access the data through that pointer. You must cast back to `(int *)` before dereferencing. So, the access to `foo1->y` is already illegal (access through a wrong type) even if you would remove the other (partially-overlapping) write.
You'd need to use either memcpy(3) or a union to be allowed to do type punning.
What's the rule on "header" structs? Like if the first field of `struct foo` is of type `struct foo_header`, can I pointer cast between them in some cases? I'm not aware of any rule that says yes, but this also seems to be something that C projects (like CPython) do constantly, as sort of a simple kind of inheritance.
@@oconnor663 It's . That rule still requires that the structures are within a union.
So, yes, most C programs that use such structures do trigger UB. For some reason, there's a misconception about this rule, and people seem to believe that just by sharing a first member, two structures are at least partially compatible. That idea probably comes from times _before_ ANSI C, and back then before standard C this was probably true in compilers of that era.
There was a discussion about the sockaddr(3type) structures, which are a clear example of UB. Those APIs predate ISO C, and force the programmer and libc to cause UB. It was decided to add wording in POSIX.1-2024 to clarify that compilers must make sure that sockaddr(3type) structures don't cause UB even if they should according to the general rules of ISO C, by using any compiler-specific means (such as compiler attributes that make the types compatible even if they aren't).
@@oconnor663
@@oconnor663 It's 6.5.2.3p6. For some reason, youtube was removing my comments with a link. That rule still requires the use of a union. And yes, most programs violate that rule.
That misconception that the common initial sequence allows seemingly unrelated structures to be compatible probably comes from the times before ANSI C. Back then, compilers allowed such uses. Now they don't anymore.
It's interesting that sockaddr(3type) is a POSIX API which makes a bad use of that rule in the standard. It is a standard API that forced users and libc to invoke UB. The wording in POSIX.1-2024 was changed to say that implementations must use any compiler-specific features (such as attributes) to avoid invoking UB in uses of that API, which according to general ISO C aliasing rules is UB. So, it's basically a recognition that the rules conflict with pre-ISO C.
I tried the code in the minute 15:00 myself, and using compiler explorer like in the video found that this rust UB does not happen anymore from rustc 1.75.0 forward. Can anyone else confirm this as I dont seem to find the explanation specified in the 1.75.0 update change log.
Thanks for catching this. To reproduce the same bug on current Rust, I need to add #[inline(never)] above foo(). Here's an updated godbolt: godbolt.org/z/T6od4qsoz. Using cargo-bisect-rustc, it looks like the first nightly that "broke" the original example was 2023-10-19. Of the commits that went in that day, it looks like one of them was an LLVM update, which was probably the culprit: github.com/rust-lang/rust/pull/116840. I'll add this to the list of errata in the video description, and I'll add the updated godbolt to the list of links.
32:15 I donʼt know about C, but in C++ Iʼm almost certain that at least the cast on line 17 is not allowed. The one on line 16 used to not be but I think now *is*, although Iʼm not certain of that. Iʼd need to double check how providing storage for contained objects works again now.
I just wish there was a sanitizer we could run to answer this question.
@@oconnor663 This is exactly the kind of question I feel like ubsan can have bugs with, but maybe thatʼs a wrong impression because one of the first times I used it I ran into a (since fixed) bug, where the compiler compiled perfectly valid lambdas in a way that would be UB for the programmer to write even though it proceeded to compile that intermediate form in a way that did the right thing, and then UBSan came along and caused it to crash as UB.
In all other cases Iʼve heard of it only gives false negatives not false positives.
Hitchhikers Guide to the Galaxy... 42
Trying to watch this on a mobile phone or even a "sensibly sized" monitor makes me wish i had binoculars!
It looks not bad on big phone screen.
great video!
32:00 I can't find the exact statement in the doc, but accessing an array `A[]` via pointer `B*` is UB IIRC
One subtlety here is that we're not loading any B's (that is, Foo's) from the array. Instead we're only accessing the int members, which does match the array type. I *think* this makes each individual access ok according to the standard, but I could be totally wrong about that.
@@oconnor663 Imagine a CPU has a dedicated instruction `load-8byte-aligned ptr, offset`, which is faster than generic `load ptr, offset`, and the compiler decided that all structs must be 8-bytes aligned
Definitely agreed that this code will be broken if Foo has alignment other than 4. But if we imagine adding an assert that Foo alignment is 4 before the pointer cast, and we didn't create the second pointer, do you think accesses through the first pointer would still be per se UB?
@@oconnor663 I have no idea
@@exl5eq28 honestly, same. C really needs a tool like Miri that can (aspirationally) catch 100% of UB, at least in pure, deterministic programs like this one. But my impression that this is almost impossible when the standard isn't designed in advance to accommodate it.
Thanks for a very informative video - it certainly made me think about some things that I hadn't really considered too much before.
Regarding around 34:00 the standard way to make sure C doesn't do the unwanted optimisations is to use the volatile keyword for the function inputs. That tells the compiler that the values referenced by the pointers can change at any moment, this makes the code behave as expected even with -O3.
Oh that's an interesting point. It does make sense that volatile would suppress this optimization. That said, volatile has a history of being abused for purposes that the standard doesn't actually guarantee it's good for (like atomic communication between threads), and I'd want to be careful about relying on this.
Jack u really thirsty in this video
Lol it took me a long time to realize you meant literally drinking a lot of water and not www.urbandictionary.com/define.php?term=thirsty.
good stuff
Eyyy first. Love your talks. What happened to your camera's saturation though? Is there insufficient lighting?
It's a combination of problems. A cheap new webcam, a new apartment lighting situation, and the fact that I still don't quite know what I'm doing. I'll get there.
42:10 wouldn't this assembly cause incorrect behavior if the int64_t pointer is pointing to two int32_t elements in the middle of the array?
I guess that the compiler makes the assumption that the 64bit pointer can't point to the middle of the array because a pointer cast like that is undefined behavior (?)
I don't know the rules to this kind of pointer casts, honestly
Yes it would certainly cause problems, but the way I like to think about it is that the "strict aliasing" rules define those problems to be the programmer's fault rather than the compiler's fault.
As much of the video mentions, the C standard specifies that compilers may assume pointers of different types don't overlap in memory. The uint64 is a different type to the uint32 array, so the compiler can make an optimisation that is broken if that rule is also broken.
The uint32 pointer is the same type as the array - and by the rules laid down by the standard, the compiler cannot assume that it occupies a different location in memory, so it has to do the calculation the long way, in case the pointer is overwritten midway through.
I imagine there is some logical justification to this in the sense that, generally speaking, in a well-formed program data of different types should not overlap in memory (and data of the same type shouldn't overlap at different addresses), but some common techniques like subsetting part of a struct with another struct probably run afoul of this more often, and is why things like Linux (which use that technique a lot) need strict aliasing to be turned off.
Why is the pointer cast at 27:55 unsafe? I get why the example produce the wrong result, but is the pointer cast *alone* always unsafe, or is it only for this example?
I don't think this pointer cast all by itself is UB. Like if you were to cast the pointer and then just throw away the result, that should be allowed (but pointless). However I think actually doing anything through the cast pointer is an "access through an incompatible type", which is UB according to the current standard. There's a footnote in the standard that reads "The intent of [this rule] is to specify those circumstances in which an object may or may not be aliased", and all the cases I'm aware of that produce the wrong result by breaking this rule do it by aliasing the same object with two pointers of different types, so it's fair to say that _in practice_ you have to write code that's kind of similar to this example to break things. But relying on the intent of the standard is a dangerous business, even when that intent is explicitly spelled out in a footnote. The letter of the law is that access through an incompatible type is UB, period, and I think that's sufficient grounds to say that it's "unsafe" to do anything at all with this cast pointer, whether it's similar to this example or not.
That said, someone else pointed out in the comments that a proposal called "Improved Rules for Tag Compatibility" (N3003) has been adopted into C23, and that's going to change these rules somewhat. So this particular example may actually become legal, if you're using a very recent compiler.
I think your strict aliasing example will be legal in C23 (with the inclusion of N3003).
Oh that's super interesting!
Added to errata.
@@oconnor663 It's still UB in C23. N3003 only allows structures with the same tag and contents to be compatible. The example uses a different tag, and so is UB.
Ah thanks for the correction. As you can tell, I still don't understand these rules very well.
Is it correct that you can use have ailaising mutable raw pointers *mut T but if and only if there are no other references to the same memory location?
Yes, *mut pointers are allowed to alias each other. However, you have to be super careful with what you do with them. For example, if you dereference one of them and call a &mut self method, it might not be ok for the other pointers to alias that implicit &mut self. The exact rules for this have been changing a lot over the past few years, and the only consistent recommendation I can give is to test whatever you're trying to do under Miri 😅
Please share more videos, because to me they look better than Avatar The way of water
great video
That's the video for all the ignoramuses that claim that Rust is overcomplicated. No, Rust is exactly as complicated as the problem that it's solving. Many people write undefined behaviour in C without ever knowing it, all it takes is one recast. Not to mention the absolutely ridiculous performance hit that comes from poor optimization coming from pessimistic aliasing analysis. Yes, you don't run into that sort of an issue often, but in larger code bases you inevitably will run into a problem like this. And it can literally take weeks to track down.
the _ = &mut x could throw an "unused" or "useless" error in the future instead of just compiling
Are you sure that would be an error and not a warning?
@@oconnor663 a warning that would cause cascading errors, but something at least
Not getting such warnings is kind of the point of assigning to _.
It's not very intuitive because it's different than any other assignment. "_ does not bind" meaning that anything assigned to _ will be dropped immediately instead of at the end of the scope. That's useful in certain situations, like destructuring tuples "let (a, _) = something_returning_a_tuple();" or avoiding to keep something with #[must_use] (like Result) in memory.
💪
unsafe rust is way worse than c. ive heard rust will be fixing unsafe some time though (i heard this a while ago and havent kept up so maybe it already happened?)
Sa. TNice tutorials quarantine is making question my whole existence.
unsafe is a misleading keyword.
Yeah the distinction between "unsafe" and "unsound" is really important and not very clear from the names. That said, I can't think of a better keyword.
The statement about Rust programs being 95-99% safe code just isn't entirely correct. It depends on what kind of applications you are writing. Ash, Rust low-level wrapper for Vulkan, the one that can actually give you C++ levels of performance, assumes that most of your Vulkan code will be wrapped in unsafe blocks, including your main rendering loop. (and man, it does look like an ugly clunky cousin of a similar C++ code there with all those CStr::from(b"hello world") all over the place) The same will be true basically for any code working with hardware or low-level OS API calls, like memory mapping. Basically that's why I've noped out of the idea of using it for game development after some evaluation.
Yep, and that's why "Rust will replace C++" is just not true. Maybe in some cases, but not entirely for sure.
That's probably a byproduct of GNU libc dependency. There's this "crABI" proposal that aims to deprecate libc in favor of a universal ABI that multiple programming languages (and graphic APIs such as Vulkan and OpenGL) can bind to
To be fair, most programs are not game engines. Especially most programs where it's important not to crash.
@@JoJoModding That's why I stated that "it depends on the application". But games are simply the most obvious example. If YT would still be allowing posting links to other videos in comments, I would share a link to a video of one of those "Rust gurus" on YT introducing UB into his Cell container clone with 20 LoCs via "unsafe", having a second thought about it, and then dismissing it with "nah, compiler knows better". IMO Rust will make the code that produced by juniors - or even AI - safer, while giving those working on lower levels shiny new high caliber guns to aim at their feet. And just think of the possibilities Crates' centralized code base has for slipping obscure back doors into everyone's code in unsafe blocks somewhere deep down in the code.
11:22 Why chose 99?
Is this the way to turn the other magic number into SFW?
No very special reason. It's just that when I need "an arbitrary example value with no special meaning" my first choice is 42 and my second choice is 99.
@@oconnor663 I wonder how many people use 42 and not know it's origin. Similar to people not knowing where Hello World originally comes from.
@@cthutu I've definitely read Hitchhiker's Guide, but I don't actually know where hello world comes from 😅
This video could have been /3 the length it is if it got straight to the point
It could've been thirty seconds:
1. Rust doesn't have the "strict aliasing" rules from C and C++.
2. But all Rust references are effectively "restrict" pointers, so getting unsafe Rust right is harder in practice.
3. It would be nice never to have to worry about any of this, but it turns out that a lot of optimizations don't work without aliasing information.