You don't need to remember its name. You just need to remember that a certain magical keyword exists for when you need extra speed in a program with a lot of pointer arithmetic and no overlaps between the pointers. When you need it, just Google it.
Really enjoyed the video. I always found everything related to compiler optimizations very interesting. After seeing and understanding this, now I know why LLVM can occasionally optimize Rust code a little bit more than C code, it's because the "restrict" idea is baked into the rust borrow checker! Thanks for the video!
Although actually, this is still disabled in most cases because LLVM has/had lots of bugs related to this stuff because it's so rarely used in C and C++. Though all the bugs found so far have been fixed now so maybe this time around it will finally make it into the next stable Rust release, if we're lucky and they won't find another misscompilation in beta again like the last three times.
3 ปีที่แล้ว +8
@@1vader yeah, maybe that's also a good reason to avoid restrict in C until those things get ironed out.
Yeah, although in fairness, programmers don't really think about this result. Since only the & mut is guaranteed to be exclusive, I'm not sure what results that will have. I mean all the situations explained here should still always work with Rust, but I'm not certain if we missed something. Another thing where Rust is able to gain speed, btw, is because of its move semantics and/or call conventions.
3 ปีที่แล้ว +9
@@9SMTM6 not sure I understand. The issue with Rust is there are so many &mut llvm screws up. Or better said, it's issue with LLVM. Of course the proper fix is just fix those bugs but we need to have something working in the meantime.
Would love to hear what he has to say :) But for now, there is a great video about inline in the C++ weekly series from Jason Turner: th-cam.com/video/GldFtXZkgYo/w-d-xo.html
I'm just as interested in your C/C++ videos as I am in your python and algorithm videos. You are great at picking interesting and useful topics and condensing them in a short video.
There are currently 0 uses of the restrict keyword in numba, an opportunity for more speed perhaps? Though I'm not sure if numba is bottlenecked anywhere that this would help. Nonetheless thanks for sharing my video!
@@mCoding on that I'm not surprised. I don't think Numba itself is likely to be sped up by its use. But the specific issues around aliasing, overlapping memory, etc. that the restrict keyword deals with is something I know he spends a lot of time thinking about!
I dont know squat about C or C++, but some detailed insight about how compilers approach stuff like this and ways to think about code optimization is useful!
Even if it doesn't make sense or seem relevant now, one day it could just *click*. Happens to me all the time. Ohh yeah that thing from 10 years ago I totally get it now.
10:05 memmove actually doesn't need any temporary buffer. It works the exact same way as memcpy, except it checks the pointer overlap by comparing the addresses and reverses the copy order (copies from end to start) if the destination is located after the source in memory. This ensures the data it overwrites is either unsignificant -outside of the source range) or has already been moved.
I think there's also the nitpick that it can't use SIMD instructions in some cases; consider this memory chunk: 0x10 - 0x15: 00 01 02 03 04 05 We want to copy 5 bytes from 0x11 to 0x10, therefore we expect the following result: 0x10 - 0x15: 01 02 03 04 05 05 However, with (4-byte) SIMD, the first shift would result in this: 0x10 - 0x15: 00 02 03 04 05 05 There is one remaining byte to copy, from 0x11 to 0x10: 0x10 - 0x15: *02* 02 03 04 05 05 Whoops!!! A solution could be to also use a SIMD read, but... that would simply mean accessing some memory locations more times than necessary.
@@erikkonstas SIMD are not necessary here since ther's no computation to apply to the bytes. It's a simple copy, so just using a wider data type (registry) is enough. (I don't know how the compiler would optimize this, but using SIMD seems overkill since a normal registry can perform this task just as well). Basically, the single fact of reversing the copy order depending on the "direction" of the overlap is enough to ensure we get the expected result. In your example, we want to copy from 0x11 (source) to 0x10 (dest). Since the dest is "before" than the source, we just copy normally from left to right, which, starting with the same base memory chunk, gives something like this : 0x10 - 0x15: 00 01 02 03 04 05 First, since we want to copy 5 bytes, we can copy 4 at once with an unsigned 32 bits integer. We do so from 0x11 to 0x10 , which give this result: 0x10 - 0x15: 01 02 03 04 04 05 Then, since ther's only one byte left to copy, we can just use an 8 bits integer copy from 0x15 to 0x14: 0x10 - 0x15: 01 02 03 04 05 05 Now, if we were to copy from 0x10 to 0x13, we would have to do the same in reverse (copy from right to left), because destination (0x13) is biger than source (0x10) this time.
I love this. There are some many beginner programming videos out there. It’s nice to be able to watch something that helps me improve, even as a working professional.
That's my goal, something that even experienced professionals can enjoy! (Not something I achieve in every video, my viewers are very knowledgeable already)
@@mCoding 17 year professional here (Python/Scala/Java/C++), love the videos. I only wish you'd been doing something like the Python 3.10 preview videos (which were excellent) for C++ back when C++11/14 were coming out-- I hardly recognize the language anymore.
This was a good one, your explanation was good and easy to follow too. I was intrigued by restrict ever since I noticed it in some of the string.h function declarations. I would definitely like to see more content around compiler optimizations, C/C++ weirdness (or features) and maybe assembly too!
Today morning I was reading getline function there was restrict keyword used in parameter list. Searched it online was able to somewhat make sense out it. Your video very much cleared the useage. Thanks.
Aliasing is the main reason Fortran has historically been faster than C/C++. In Fortran aliasing is forbidden, it is illegal to pass the same argument twice to a function or overlapping regions of an array.
Benchmark, Benchmark, Benchmark!!! As you said at the end, in most cases you don't really need this keyword. It only makes sense when you have benchmarked your code and made sure that you would really benefit from the added mental overhead of using functions with restrict parameters. Really interesting as always!
in Rust aliasing mutable pointers is illegal and the compiler prevents it so when LLVM is more stable the Rust compiler will apply noalias optimizations to all pointers by default so that is very cool
Don't benchmark; profile. Benchmarking software seems to be related to hardware; profilers are what we use to measure the performance of software. The distinction may seem negligible on a high level, but there's this problem with people writing their own erroneous "benchmark" code, whereas they ought to save time using a profiler instead, and come to a correct conclusion (rather than an incorrect conclusion) as a result.
@@sebastianramadan8393 well, kinda. Both have their place. Benchmarks are certainly easier to reproduce and therefore will give you a solid A-B comparison. But it is easy to write an irellevant benchmark if you dont set it up like in production. Its also easy to just randomly slap useless optimizations everywhere by *just* profiling I'd say: Profile first to find bottlenecks -> try building a *reproducable* benchmark around those bottoenecks -> optimize -> benchmark again -> profile in production. Obviously, it depends on what you're doing and the complexity of your products, but that would be my approach.
I always wondered about `__restrict` you see in C standard library declarations (as seen when writing c++). Very informative. Also all the godbolting is very appreciated, thank you.
I didn’t know restrict was even a thing! You captured my attention nicely with the mysterious title, then proceeded with a very captivating technical explanation of what restrict is. I loved it! Video-production note for you: I would suggest having less noticeable cuts between sentences. I don’t know what your setup is, but maybe a teleprompter would help?
My sentences are so short because I'm speaking spontaneously, I don't have a script. I could probably up my production quality by making a script ahead of time and using a teleprompter to get longer shots 😅. Thx for the tip I will look into teleprompters.
it is one of the things the committee for whatever reason just fails to accept: Restrict is a very useful keyword. If they have problem defining what it would mean in relation to a "this" pointer - then just say it is not allowed or make it undefined behaviour - as they have done with tones of new features. It would also be less of a problem if they would not have block every single suggestion for strong type aliases: Often you have multiple parameters that are conceptually different types but fundamentally just integers. Say matrix-indices. If you could give them actual strong types then not only would it give rise to the strong aliasing rules and thus directly act like "restrict" does, it also would prevent a whole butload of bugs that happen due to mixing up parameter order. I had to deal with a function taking in 5 int-pointers for arrays for some simple statistics.... the genius who wrote that called them "param1" to "param5" ... have fun figuring out that the first and last are used as in-out-parameters while the other 3 are strictly for reading data.
Definitely worth making. I haven't looked at the C standard in years and I see that I should. Years ago I was someone who spent a lot of time answering questions about both the C and C++ standards. But I see they've made more changes than I expected.
I am interested in picking up C development again. The C17 and C23 additions are interesting. I enjoyed writing C code back in the day, but the majority of the industry moved on forcing me to adapt. I miss writing code for the machine rather than writing code for the abstractions. Somewhere things went off the rails with layering abstractions on top of abstractions, and we've forgotten that we're writing software for real world machines with real world limitations.
@@potato9832 It definitely had its upside. I worry that today, that us old school are few and far between. The upside of that is that people that actually know how things work are in demand. The majority don't actually understand how stuff works, because they don't have to.
Actually, C++ does have an analog for C's restrict keyword. When you make the parameters different types, the compiler uses the same behavior as if the restrict keyword was used. This is why you might see "strong type aliases" used in some cases, since they can be a negative-cost abstraction.
Hmmmmmmmm. I'm not sure if I consider the strict aliasing rule (which is what allows strong type aliases to enjoy restrict-like optimizations) to be equivalent to restrict since sometimes the types really should all be the same, or allowed to alias like void* or char*. But I did consider talking about it in this video so I guess they are closely related...
@@mCoding I guess the point of @Nicholas Bradon was that C++ with its capabilities of generic user-defined types allows you to enable the same compiler optimizations. For example you can make a source and a sink type as strong types which even convert into each other. But since they are actually different types as parameters, they don't alias and thus enable the same optimizations as the restrict keyword.
Regarding memove and temporary buffer: You can check at the begining where the areas overlap. It the dest starts at higher address than src then you could reverse the loop, not needing temporary buffer.
As I do almost exclusively competitive programming ("almost" because I also have university assignments), I never knew about this detail. I never had to use the 'restrict' keyword. This reminds me, that in fact, I never had to use many, many keywords. So here are a few questions, perhaps ideas for future videos: What the heck is extern? What does static actually mean? Is it meant to mean something along the lines of "global for this function / structure / class" or is it about being inaccessible by code from other files? What does volatile mean? What are the differences between structs and classes in C++? (I know there are several, but honestly I have no idea if they're interesting enough for a video) What are the different stages of creating an executable from source code? (Preprocessing, compiling, linking, and possibly a couple others which I have never heard about) What are the other questions I wanted to ask which I do not remember right now? Man, so many questions... Loved the video! Cheers!
I'm not an expert, but maybe I can roughly answer a few of those questions. What is extern? To answer this, we must first differentiate between declaration of a variable/function and definition of a variable/function. Declaration introduces a name and type of the variable in some scope and tells the relevant parts of code that the variable exists somewhere in memory. Definition actually specifies where the variable/function is defined and/or what value it has: void foo(); // declaration void foo(){} // definition Consider this global variable: int bar; This syntax declares the "bar" variable with type int and also defines it as a global variable, i.e. it lives globally in memory. But what if you actually didn't want to define "bar", just declare it? That's exactly what the extern keyword is used for: extern int bar; This declares "bar" of type int, but it doesn't define where it lives yet. That means you can't actually use it, if you try to set it or use its value, the compiler will complain that "bar" is declared, but not defined anywhere. The advantage here is that you can declare the same name multiple times in different modules, and then define it in one module and all the modules will refer to the same variable when they use it. It's the same concept as declaring functions in a .h file to include the same function in multiple modules. If you don't define the function, you also get a complaint from the compiler (or linker, I should say). Difference between class and struct? There is almost none. The only difference is that members of struct are public by default and members of class are private by default. It also conveys some semantic difference to a human reading your code, but to the compiler, they are the same thing. What does volatile mean? Volatile tells the compiler that it can't assume anything about the value of the variable declared as volatile. Consider: int i = 0; while (i==0); The compiler can notice that i doesn't change and so the while loop is infinite, i.e. it optimises out the i==0 check and just loops indefinitely. volatile int i = 0; while (i==0); i is declared as volatile, so the compiler can't assume anything about its value, so the check i==0 will not he optimised out, in case somebody changes the value of i from outside the loop. This keyword is commonly used in multithreaded programming, or also with memory mapped I/O. What does static mean? To my knowledge, static can mean different things based on the context where it is used in C. Function or global variable defined as static is considered local to the compilation unit, i.e. if you declare an extern int bar in unit A and define a static int bar in unit B, then bar in unit A is still undefined because bar in unit B is local to unit B. Similarly for a function. If you define a variable static inside a function, then that variable is understood to live globally, i.e. not on the stack of the function. This means that when the function is executed multiple times, the value of the static variable is retained, unlike a value of stack defined variables.
@@stewartzayat7526 Thanks a ton for the replies! Especially thanks for sharing knowledge about extern. That being said, I still have this feeling that struct and class are different from each other in several, very minute ways besides the default member visibility.
@@Maurycy5 I guess I forgot to mention that inheriting from struct uses public inheritance by default and inheriting from class uses private inheritance by default, besides that, however, there is no difference. There is a geeks for geeks article describing some other differences, but they are saying straight up plain nonsense there.
TL;DR of Stewart's great effort post above: extern - The function or value is defined in another object file. static - The function or value cannot be used to define an extern declaration in another object file. volatile - The compiler must never assume the value bound to this identifier. Maybe you have never worked on a project with multiple object files. This is where multiple source documents are compiled each using `gcc -c objN.c -o objN.o', and finally linked using `gcc obj0.o obj1.o obj2.o main.o -o binary_exec', where binary_exec is the executable run by the user. The linker will attempt to resolve definitions for any undefined identifiers declared with extern by looking in the other object files present on the command line. You might wonder what a reserved word like `__extern_always_inline' could mean. This tells the linker to attempt to inline the specified function where it is referenced using an extern declaration. This is part of a process called link-time optimisation (LTO).
I love the acknowledgment of the real world applications :) In many videos like this over YT those presenters act as if there's only the "pure standard" and the land of "undefined behavior" while in practice, when you are an actual paid programmer that ships products you know you need to bend the rules often.
The restrict keyword (specifically noalias) is a little bugged right now in LLVM and GCC and may emit unsound code even when used correctly. Additional test cases from the Rust project exposed some deficiencies in the original implementation
@@meowzerus yeah, it was "fixed" multiple times, but I haven't heard anything about it in a while, so either people stopped looking too closely or it really works now.
wow - I've been using C for 35 years and never noticed or used this. I've don't think I've ever seen this in code reviews either and I've worked in many different industries. Thanks for this. One of the things I wish I did more was to look at the assembly generated.
In C++, unique_ptr might go in a similar direction, while not allowing pointer arithmetic etc. But it's a way to enfoce via the compiler to disallow aliasing to a pointer.
Actually, unique_ptr does not come with semantics of disallowing aliasing. unique_ptr is about ensuring there is a unique *owner* of the object, i.e. the unique_ptr is uniquely in charge of deleting the object. It is fine and very common to have many non-owning raw pointers pointing to what a unique_ptr owns. As such, the compiler cannot assume a unique_pointer's underlying pointer is not aliased.
@@mCoding well yes, I assumed that one does not mix it with raw pointers. But I don't know whether the compiler uses unique_ptr for optimization purposes. Probably not, for the reasons you mention.
Stumbeled upon this video, and I must say I really enjoy this proper programming video in the sea of framework and loose concept videos that are out there.
0:00 - ok, this sounds like not just a welcome message, but more like "Hell, bit*hes, who said I only make Python videos? You wanna low programming! I'll give you low programming! you thought making stuff in pure C was the end? Prepare your sorry arses to some assembly!"
Side note: In Rust everything is restrict (I guess except unsafe pointers used for interfacing with C/ffi). Meaning you simply can't pass a reference to the same thing as mutable and immutable (or even as mutable twice) at the same time to a function. So in theory Rust can always apply those optimizations, but I know that in the past there where problems with LLVM that prevented Rust from enabling this. Meaning Rust is the first time this optimization feature of LLVM was used to that extreme extend and thus uncovered compiler bugs (miscompilations). I don't know what the current state of this is, if LLVM has fixed all the issues and it was re-enabled in Rust. I know there was a back and forth for some time where Rust enabled it, found bugs in LLVM and disabled it again until LLVM had fixed it and repeat.
Fair enough although there are other places where safety tradeoffs are made in Rust for example in array bounds checking. I think with both of these things the micro-optimization is not significant enough to outweigh other factors (and if you did need that level of performance you would be checking the output assembly anyway to ensure everything is as good as it can possibly be). EDIT: Nevermind, the vector_add example has convinced me this is probably more significant. Would be interested to know if any of these would stop working in the presence of bounds checking (probably not)
@@samuelallanviolin752 If you can do the stuff via iterators instead of array index you don't have a bound check (only the implicit loop condition). And even with bound checks branch prediction helps. But yes, it is a thing where it can reduce performance.
I really love your C/C++ tutorials. Can you make some guides about C++ debugging and development systems? There are not a lot of tutorials out there that cover topics such as debugger attachment or, for example, how to develop and debug a dynamic library that is being used by an already build or compiled program
I appreciate the gigantic text, which is unlike all other programming presentations where the host is unaware that an audience can’t read 12-point font on a 4K screen.
It’s very rare that i consider a TH-cam coding channel to actually be presenting any sort of correct or useful information. This channel is one of those rare cases
The closest I have come to working with c++ is some 40 line arduino code, but I believe learning about this will be beneficial in the future none the less.
The only thing I would change is make the overlapping function the default. Make the restricted function the longer named "opt in" function. That will help make sure future maintainers don't shoot themselves in the foot.
To add to what you said about having two versions of the function, one with restrict and one without: if the function is particularly complex, you might want to put the body in a separate file and #include it inside both functions, so you don't have to keep track of two copies of the same code. Or you can use a macro I guess, but it might look ugly having to put a \ after every line.
#define fubar(x) _Generic(x, default: (char*[]){"hello", "is there anybody out there?", "not if you can hear me", [sizeof x]="is there anyone home?"} fubar('a') struct relax { size_t information; char hurts[]; }; There's also something subtly different about the scope of nested declarations. If I recall correctly, in C, the declaration for `struct bar` is hoisted outside of `struct foo`: struct foo { struct bar { int x; } x; int y; } ... so that you can use `struct bar` elsewhere. C++ has some strange clause that renders a recursive `main` entry point undefined... Eh, there's still quite a few differences. char *blatant_cpp_error = malloc(0); // write C like this; don't cast to facilitate for C++ or you're enjoying the worst of both worlds
Request: would you please consider doing a video on all the different value types in C++ (i.e., glvalue, rvalue, lvalue, xvalue, and prvalue)? That is the thing I find the most confusing.
It's still there in C++. Its hidden inside several Libs in fact. Its also inside a BitLib that IBM developed and is called UNDERSCORE + restrict + UNDERSCORE and does the exact same thing with pointers being assigned.
Nice video as always. However, I would've enjoyed a side by side view with the Assembly with and without usage of the restrict keyword to find out which instructions are actually changing (or disappearing).
Fair enough, I kinda expected people interested enough to plug it into compiler explorer for themselves :). I definitely considered doing the side-by-side at one point but I don't recall why I decided against it.
This strongly reminds me of mutable references in Rust. In fact, I think a * restrict T is almost exactly like a &mut T in Rust. Which explains why there are so many potential dangers with restricted pointers.
btw rust always provides the same pointer aliasing guarantees as restrict, as there can't be more than one mutable reference, and you can't have any immutable ones while you have a mut one. so for example you can't pass add_sub(&x, &y, &mut x) which allows the compiler to safely and automatically apply these optimisations. this is one of the guarantees that makes rust sometimes, but very rarely, outperform C in benchmarks. Of course when compared with proper optimized C code (with these guarantees specified explicitly, e.g. the restrict keyword and friends) it's more or less equal or *slower.* (slower: mostly due to additional checks Rust does, for example array bounds checking, which is included with all array accesses by default, unless the value is guaranteed to fall into array's size (this behavior can be changed using unsafe code))
you should do a video on using the compiler, building, and make files. like deep meaning of all the flags and stuff. it seems their are a lot of tutorials on how to program in C++ but few on setting up, using, and modifying big make files. so to new people its kind of feels like RTFM but when I type in man g++ Manual assumes a lot of knowledge, the manuals are sometimes confusing, non existent, or maybe I am just not finding the manual for the job. but you explain things well so you should go threw compiling and make files.
Great video. Regarding function naming, give the special name to the function that has special requirements. I.e., 'vector_add_no_overlap' and 'vector_add' should be the correct naming scheme. You would not go looking for a special function in your autocomplete dropdown unless you know that you have a special situation and might want a function that can make use of it. If 'vector_add' has the requirement that inputs do not overlap, you will never think of that until you find out the hard way. Edit: By the way, that is why 'memcpy' and 'memmove' are really horrible names. How often have you seen people use memmove? Me, almost never. People want to copy memory, mostly do not think about the case of overlapping inputs, and just type mem... and up comes memcpy, that's what they use.
Nice video. Although it would've made sense to point out the example functions probably would have been inlined, with the compiler heuristic seeing that the values don't overlap even if the restrict qualifier isn't given.
The semantics of restrict are notoriously hard to define. But then, so are the semantics of pointers in a low level language, of unions (not sum types), etc.
Even if restrict is out of reach for your current projects, if you take one thing away from this video I hope it is that you can and should use compiler explorer to try small examples!
10:58 You can always code the functions needing restrict in C and call from C++. Totally standard. If you want to use __restrict and variants, use a macro like MY_RESTRICT so you can make it appropriate for the compiler (including being nothing if it doesn't support it).
Tyvm. Yeah i don't recommend using it willy nilly since it can result in such hard to find bugs and there are usually other things you can do to get performance gains.
Yaaaaaay, another C keyword that's important for micro-optimizations that I have to remember
lmao
Or just don't use it and you don't have to worry about whether you are using it properly or not.
You don't need to remember its name. You just need to remember that a certain magical keyword exists for when you need extra speed in a program with a lot of pointer arithmetic and no overlaps between the pointers. When you need it, just Google it.
Well, that optimization does not seem very micro to me
it's a safety feature too
Really enjoyed the video. I always found everything related to compiler optimizations very interesting. After seeing and understanding this, now I know why LLVM can occasionally optimize Rust code a little bit more than C code, it's because the "restrict" idea is baked into the rust borrow checker! Thanks for the video!
Very cool property of rust!
Although actually, this is still disabled in most cases because LLVM has/had lots of bugs related to this stuff because it's so rarely used in C and C++. Though all the bugs found so far have been fixed now so maybe this time around it will finally make it into the next stable Rust release, if we're lucky and they won't find another misscompilation in beta again like the last three times.
@@1vader yeah, maybe that's also a good reason to avoid restrict in C until those things get ironed out.
Yeah, although in fairness, programmers don't really think about this result. Since only the & mut is guaranteed to be exclusive, I'm not sure what results that will have.
I mean all the situations explained here should still always work with Rust, but I'm not certain if we missed something.
Another thing where Rust is able to gain speed, btw, is because of its move semantics and/or call conventions.
@@9SMTM6 not sure I understand. The issue with Rust is there are so many &mut llvm screws up. Or better said, it's issue with LLVM. Of course the proper fix is just fix those bugs but we need to have something working in the meantime.
It would be cool if you cover the details of C's inline (inline/extern inline/static inline) vs C++'s inline.
yes, that's a great video idea
What do you want to know?
Are you asking why this would be an interesting topic?
Now that's a really technical topic! We'll see, added it to my list of ideas 💡
Would love to hear what he has to say :)
But for now, there is a great video about inline in the C++ weekly series from Jason Turner: th-cam.com/video/GldFtXZkgYo/w-d-xo.html
@@mCoding And I thought restrict was Really Technical!
Thank you! The official docs on restrict are pretty dense and never simply state what it promises to the compiler like you have here so succinctly 👏
Glad it helped!
I'm just as interested in your C/C++ videos as I am in your python and algorithm videos. You are great at picking interesting and useful topics and condensing them in a short video.
I appreciate the kind words very much! Thanks for your support!
Yes. His toppic choices are amazing!
Very interesting! I passed it along to our Numba lead because he necessarily immerses himself in compiler intricacies.
There are currently 0 uses of the restrict keyword in numba, an opportunity for more speed perhaps? Though I'm not sure if numba is bottlenecked anywhere that this would help. Nonetheless thanks for sharing my video!
@@mCoding on that I'm not surprised. I don't think Numba itself is likely to be sped up by its use. But the specific issues around aliasing, overlapping memory, etc. that the restrict keyword deals with is something I know he spends a lot of time thinking about!
I dont know squat about C or C++, but some detailed insight about how compilers approach stuff like this and ways to think about code optimization is useful!
Even if it doesn't make sense or seem relevant now, one day it could just *click*. Happens to me all the time. Ohh yeah that thing from 10 years ago I totally get it now.
10:05 memmove actually doesn't need any temporary buffer. It works the exact same way as memcpy, except it checks the pointer overlap by comparing the addresses and reverses the copy order (copies from end to start) if the destination is located after the source in memory. This ensures the data it overwrites is either unsignificant -outside of the source range) or has already been moved.
I think there's also the nitpick that it can't use SIMD instructions in some cases; consider this memory chunk:
0x10 - 0x15: 00 01 02 03 04 05
We want to copy 5 bytes from 0x11 to 0x10, therefore we expect the following result:
0x10 - 0x15: 01 02 03 04 05 05
However, with (4-byte) SIMD, the first shift would result in this:
0x10 - 0x15: 00 02 03 04 05 05
There is one remaining byte to copy, from 0x11 to 0x10:
0x10 - 0x15: *02* 02 03 04 05 05
Whoops!!! A solution could be to also use a SIMD read, but... that would simply mean accessing some memory locations more times than necessary.
@@erikkonstas SIMD are not necessary here since ther's no computation to apply to the bytes. It's a simple copy, so just using a wider data type (registry) is enough.
(I don't know how the compiler would optimize this, but using SIMD seems overkill since a normal registry can perform this task just as well).
Basically, the single fact of reversing the copy order depending on the "direction" of the overlap is enough to ensure we get the expected result.
In your example, we want to copy from 0x11 (source) to 0x10 (dest).
Since the dest is "before" than the source, we just copy normally from left to right, which, starting with the same base memory chunk, gives something like this :
0x10 - 0x15: 00 01 02 03 04 05
First, since we want to copy 5 bytes, we can copy 4 at once with an unsigned 32 bits integer. We do so from 0x11 to 0x10 , which give this result:
0x10 - 0x15: 01 02 03 04 04 05
Then, since ther's only one byte left to copy, we can just use an 8 bits integer copy from 0x15 to 0x14:
0x10 - 0x15: 01 02 03 04 05 05
Now, if we were to copy from 0x10 to 0x13, we would have to do the same in reverse (copy from right to left), because destination (0x13) is biger than source (0x10) this time.
I love this. There are some many beginner programming videos out there. It’s nice to be able to watch something that helps me improve, even as a working professional.
That's my goal, something that even experienced professionals can enjoy! (Not something I achieve in every video, my viewers are very knowledgeable already)
@@mCoding 17 year professional here (Python/Scala/Java/C++), love the videos. I only wish you'd been doing something like the Python 3.10 preview videos (which were excellent) for C++ back when C++11/14 were coming out-- I hardly recognize the language anymore.
Enjoying the switch to C/C++ lately. That's kind of my focus for learning right now, so it's timely.
Any interesting 'quirk' about languages and compilers is a great topic
Although I mostly subscribed for the Python intricacies, other technical videos are still interesting.
Good because I can't help myself 😀
This was a good one, your explanation was good and easy to follow too. I was intrigued by restrict ever since I noticed it in some of the string.h function declarations.
I would definitely like to see more content around compiler optimizations, C/C++ weirdness (or features) and maybe assembly too!
Great to hear! I love C and C++ as much as Python so if my viewers do too there will be more of all of them!
Today morning I was reading getline function there was restrict keyword used in parameter list.
Searched it online was able to somewhat make sense out it.
Your video very much cleared the useage. Thanks.
Perfect timing for me! Glad it helped!
I love these more technical videos !!! Thank you
You're welcome! Thanks for watching!
Aliasing is the main reason Fortran has historically been faster than C/C++. In Fortran aliasing is forbidden, it is illegal to pass the same argument twice to a function or overlapping regions of an array.
Benchmark, Benchmark, Benchmark!!! As you said at the end, in most cases you don't really need this keyword. It only makes sense when you have benchmarked your code and made sure that you would really benefit from the added mental overhead of using functions with restrict parameters.
Really interesting as always!
Totally I should have emphasized this more in the video!
in Rust aliasing mutable pointers is illegal and the compiler prevents it so when LLVM is more stable the Rust compiler will apply noalias optimizations to all pointers by default so that is very cool
Don't benchmark; profile. Benchmarking software seems to be related to hardware; profilers are what we use to measure the performance of software. The distinction may seem negligible on a high level, but there's this problem with people writing their own erroneous "benchmark" code, whereas they ought to save time using a profiler instead, and come to a correct conclusion (rather than an incorrect conclusion) as a result.
@@sebastianramadan8393 well, kinda. Both have their place. Benchmarks are certainly easier to reproduce and therefore will give you a solid A-B comparison. But it is easy to write an irellevant benchmark if you dont set it up like in production. Its also easy to just randomly slap useless optimizations everywhere by *just* profiling
I'd say: Profile first to find bottlenecks -> try building a *reproducable* benchmark around those bottoenecks -> optimize -> benchmark again -> profile in production.
Obviously, it depends on what you're doing and the complexity of your products, but that would be my approach.
@@tacokoneko You're thinking of mutable references. Raw pointers exist in rust, and are allowed to alias memory
I always wondered about `__restrict` you see in C standard library declarations (as seen when writing c++). Very informative.
Also all the godbolting is very appreciated, thank you.
I really enjoy that type of technical content, thanks a lot! I hope I'll be seeing more such content from you :)
I've never heard of restrict keyword during 15+ years of writing C code daily. Thanks!
Happy to help! Glad to have an experienced C programmer in the audience to keep me honest!
I didn’t know restrict was even a thing!
You captured my attention nicely with the mysterious title, then proceeded with a very captivating technical explanation of what restrict is. I loved it!
Video-production note for you: I would suggest having less noticeable cuts between sentences. I don’t know what your setup is, but maybe a teleprompter would help?
My sentences are so short because I'm speaking spontaneously, I don't have a script. I could probably up my production quality by making a script ahead of time and using a teleprompter to get longer shots 😅. Thx for the tip I will look into teleprompters.
@@mCoding your videos are fine
it is one of the things the committee for whatever reason just fails to accept: Restrict is a very useful keyword. If they have problem defining what it would mean in relation to a "this" pointer - then just say it is not allowed or make it undefined behaviour - as they have done with tones of new features.
It would also be less of a problem if they would not have block every single suggestion for strong type aliases: Often you have multiple parameters that are conceptually different types but fundamentally just integers. Say matrix-indices. If you could give them actual strong types then not only would it give rise to the strong aliasing rules and thus directly act like "restrict" does, it also would prevent a whole butload of bugs that happen due to mixing up parameter order.
I had to deal with a function taking in 5 int-pointers for arrays for some simple statistics.... the genius who wrote that called them "param1" to "param5" ... have fun figuring out that the first and last are used as in-out-parameters while the other 3 are strictly for reading data.
Definitely worth making. I haven't looked at the C standard in years and I see that I should. Years ago I was someone who spent a lot of time answering questions about both the C and C++ standards. But I see they've made more changes than I expected.
I am interested in picking up C development again. The C17 and C23 additions are interesting. I enjoyed writing C code back in the day, but the majority of the industry moved on forcing me to adapt.
I miss writing code for the machine rather than writing code for the abstractions. Somewhere things went off the rails with layering abstractions on top of abstractions, and we've forgotten that we're writing software for real world machines with real world limitations.
@@potato9832 It definitely had its upside. I worry that today, that us old school are few and far between. The upside of that is that people that actually know how things work are in demand. The majority don't actually understand how stuff works, because they don't have to.
I love this kind of technical videos.the things we lack on this TH-cam space. :D
Actually, C++ does have an analog for C's restrict keyword. When you make the parameters different types, the compiler uses the same behavior as if the restrict keyword was used. This is why you might see "strong type aliases" used in some cases, since they can be a negative-cost abstraction.
Hmmmmmmmm. I'm not sure if I consider the strict aliasing rule (which is what allows strong type aliases to enjoy restrict-like optimizations) to be equivalent to restrict since sometimes the types really should all be the same, or allowed to alias like void* or char*. But I did consider talking about it in this video so I guess they are closely related...
@@mCoding I guess the point of @Nicholas Bradon was that C++ with its capabilities of generic user-defined types allows you to enable the same compiler optimizations. For example you can make a source and a sink type as strong types which even convert into each other. But since they are actually different types as parameters, they don't alias and thus enable the same optimizations as the restrict keyword.
For this we first would need actual strong type aliases - which the Committee has also always opposed -.-
the hilarious thing is that VB6 had the Assume No Aliasing compiler flag.
Actually you should avoid using pointers at all in C++. There are smart pointers for a reason.
Regarding memove and temporary buffer:
You can check at the begining where the areas overlap. It the dest starts at higher address than src then you could reverse the loop, not needing temporary buffer.
As I do almost exclusively competitive programming ("almost" because I also have university assignments), I never knew about this detail. I never had to use the 'restrict' keyword. This reminds me, that in fact, I never had to use many, many keywords. So here are a few questions, perhaps ideas for future videos:
What the heck is extern?
What does static actually mean? Is it meant to mean something along the lines of "global for this function / structure / class" or is it about being inaccessible by code from other files?
What does volatile mean?
What are the differences between structs and classes in C++? (I know there are several, but honestly I have no idea if they're interesting enough for a video)
What are the different stages of creating an executable from source code? (Preprocessing, compiling, linking, and possibly a couple others which I have never heard about)
What are the other questions I wanted to ask which I do not remember right now?
Man, so many questions...
Loved the video! Cheers!
I'm not an expert, but maybe I can roughly answer a few of those questions.
What is extern?
To answer this, we must first differentiate between declaration of a variable/function and definition of a variable/function. Declaration introduces a name and type of the variable in some scope and tells the relevant parts of code that the variable exists somewhere in memory. Definition actually specifies where the variable/function is defined and/or what value it has:
void foo(); // declaration
void foo(){} // definition
Consider this global variable:
int bar;
This syntax declares the "bar" variable with type int and also defines it as a global variable, i.e. it lives globally in memory. But what if you actually didn't want to define "bar", just declare it? That's exactly what the extern keyword is used for:
extern int bar;
This declares "bar" of type int, but it doesn't define where it lives yet. That means you can't actually use it, if you try to set it or use its value, the compiler will complain that "bar" is declared, but not defined anywhere. The advantage here is that you can declare the same name multiple times in different modules, and then define it in one module and all the modules will refer to the same variable when they use it. It's the same concept as declaring functions in a .h file to include the same function in multiple modules. If you don't define the function, you also get a complaint from the compiler (or linker, I should say).
Difference between class and struct?
There is almost none. The only difference is that members of struct are public by default and members of class are private by default. It also conveys some semantic difference to a human reading your code, but to the compiler, they are the same thing.
What does volatile mean?
Volatile tells the compiler that it can't assume anything about the value of the variable declared as volatile. Consider:
int i = 0;
while (i==0);
The compiler can notice that i doesn't change and so the while loop is infinite, i.e. it optimises out the i==0 check and just loops indefinitely.
volatile int i = 0;
while (i==0);
i is declared as volatile, so the compiler can't assume anything about its value, so the check i==0 will not he optimised out, in case somebody changes the value of i from outside the loop.
This keyword is commonly used in multithreaded programming, or also with memory mapped I/O.
What does static mean?
To my knowledge, static can mean different things based on the context where it is used in C. Function or global variable defined as static is considered local to the compilation unit, i.e. if you declare an extern int bar in unit A and define a static int bar in unit B, then bar in unit A is still undefined because bar in unit B is local to unit B. Similarly for a function. If you define a variable static inside a function, then that variable is understood to live globally, i.e. not on the stack of the function. This means that when the function is executed multiple times, the value of the static variable is retained, unlike a value of stack defined variables.
@@stewartzayat7526 Thanks a ton for the replies! Especially thanks for sharing knowledge about extern. That being said, I still have this feeling that struct and class are different from each other in several, very minute ways besides the default member visibility.
@@Maurycy5 I guess I forgot to mention that inheriting from struct uses public inheritance by default and inheriting from class uses private inheritance by default, besides that, however, there is no difference. There is a geeks for geeks article describing some other differences, but they are saying straight up plain nonsense there.
TL;DR of Stewart's great effort post above:
extern - The function or value is defined in another object file.
static - The function or value cannot be used to define an extern declaration in another object file.
volatile - The compiler must never assume the value bound to this identifier.
Maybe you have never worked on a project with multiple object files. This is where multiple source documents are compiled each using `gcc -c objN.c -o objN.o', and finally linked using `gcc obj0.o obj1.o obj2.o main.o -o binary_exec', where binary_exec is the executable run by the user. The linker will attempt to resolve definitions for any undefined identifiers declared with extern by looking in the other object files present on the command line.
You might wonder what a reserved word like `__extern_always_inline' could mean. This tells the linker to attempt to inline the specified function where it is referenced using an extern declaration. This is part of a process called link-time optimisation (LTO).
I love the acknowledgment of the real world applications :) In many videos like this over YT those presenters act as if there's only the "pure standard" and the land of "undefined behavior" while in practice, when you are an actual paid programmer that ships products you know you need to bend the rules often.
Crazy how I googled about this keyword earlier today at my work computer and now youtube recommends this to me on my personal phone.
Strong Jason Turner vibes in this video, I like.
He is a big inspiration for the channel!
I loooove this level of detail. Keep it up dude
Thanks a ton!
The restrict keyword (specifically noalias) is a little bugged right now in LLVM and GCC and may emit unsound code even when used correctly. Additional test cases from the Rust project exposed some deficiencies in the original implementation
That has been fixed since this comment was written!
@@anlumo1 God I hope so, I've seen a few "this time it's fixed" posts and then something else comes up x.x
@@meowzerus yeah, it was "fixed" multiple times, but I haven't heard anything about it in a while, so either people stopped looking too closely or it really works now.
wow - I've been using C for 35 years and never noticed or used this. I've don't think I've ever seen this in code reviews either and I've worked in many different industries. Thanks for this. One of the things I wish I did more was to look at the assembly generated.
Probably a good thing considering how easy it is to break code as soon as you enable -O3.
I love this channel. More C/C++ content for sure!
Thanks! Will do!
In C++, unique_ptr might go in a similar direction, while not allowing pointer arithmetic etc. But it's a way to enfoce via the compiler to disallow aliasing to a pointer.
Actually, unique_ptr does not come with semantics of disallowing aliasing. unique_ptr is about ensuring there is a unique *owner* of the object, i.e. the unique_ptr is uniquely in charge of deleting the object. It is fine and very common to have many non-owning raw pointers pointing to what a unique_ptr owns. As such, the compiler cannot assume a unique_pointer's underlying pointer is not aliased.
@@mCoding well yes, I assumed that one does not mix it with raw pointers. But I don't know whether the compiler uses unique_ptr for optimization purposes. Probably not, for the reasons you mention.
Useful knowledge! I knew what restrict did, and I knew it wasn't in C++, but I didn't know about the extensions.
Stumbeled upon this video, and I must say I really enjoy this proper programming video in the sea of framework and loose concept videos that are out there.
Thanks for the kind words, glad you enjoyed!
This is great! Please do more of these videos, you're really good at explaining these more technical topics 👀
0:00 - ok, this sounds like not just a welcome message, but more like "Hell, bit*hes, who said I only make Python videos? You wanna low programming! I'll give you low programming! you thought making stuff in pure C was the end? Prepare your sorry arses to some assembly!"
k
I have now acquired the knowledge of restrict usage in C and (lack thereof) in C++.
Side note: In Rust everything is restrict (I guess except unsafe pointers used for interfacing with C/ffi). Meaning you simply can't pass a reference to the same thing as mutable and immutable (or even as mutable twice) at the same time to a function. So in theory Rust can always apply those optimizations, but I know that in the past there where problems with LLVM that prevented Rust from enabling this. Meaning Rust is the first time this optimization feature of LLVM was used to that extreme extend and thus uncovered compiler bugs (miscompilations). I don't know what the current state of this is, if LLVM has fixed all the issues and it was re-enabled in Rust. I know there was a back and forth for some time where Rust enabled it, found bugs in LLVM and disabled it again until LLVM had fixed it and repeat.
Fair enough although there are other places where safety tradeoffs are made in Rust for example in array bounds checking. I think with both of these things the micro-optimization is not significant enough to outweigh other factors (and if you did need that level of performance you would be checking the output assembly anyway to ensure everything is as good as it can possibly be).
EDIT: Nevermind, the vector_add example has convinced me this is probably more significant. Would be interested to know if any of these would stop working in the presence of bounds checking (probably not)
@@samuelallanviolin752 If you can do the stuff via iterators instead of array index you don't have a bound check (only the implicit loop condition). And even with bound checks branch prediction helps. But yes, it is a thing where it can reduce performance.
At the 10:34 minute ... "But there was a lot of push_back."
Tell me its pun intended.
I really love your C/C++ tutorials.
Can you make some guides about C++ debugging and development systems? There are not a lot of tutorials out there that cover topics such as debugger attachment or, for example, how to develop and debug a dynamic library that is being used by an already build or compiled program
Excellent video as always. As a long term viewer I can safely say that you never disappoint. Keep up the good work, man!
Much appreciated! Glad to have you!
Perfectly examples of why the "const" does not mean constant, but read-only!
Really enjoyed this video, we need more C/C++ videos!
More to come!
I appreciate the gigantic text, which is unlike all other programming presentations where the host is unaware that an audience can’t read 12-point font on a 4K screen.
I don't even code in C and I found this fascinating, thanks.
Really helpful, I've always wondered what restrict really was!
Awesome! Finally I understand what the man is going on about with all the "overlapping is undefined" stuff for memcpy! :D
It’s very rare that i consider a TH-cam coding channel to actually be presenting any sort of correct or useful information. This channel is one of those rare cases
I am honored and thank you for the kind words!
The closest I have come to working with c++ is some 40 line arduino code, but I believe learning about this will be beneficial in the future none the less.
Holy shit, I had a bug about using memcpy with overlapping memory. And now you just explain this. Was so hard to find this
"Nice shooting, son. What's your name?"
"Murphy."
I've never seen it in production code. It takes a fair amount of thought to apply correctly inside inner loops where it might help.
You are the son I never had. 😊 I didn’t think people kept track of those things. Good. 👍
The only thing I would change is make the overlapping function the default. Make the restricted function the longer named "opt in" function. That will help make sure future maintainers don't shoot themselves in the foot.
To add to what you said about having two versions of the function, one with restrict and one without: if the function is particularly complex, you might want to put the body in a separate file and #include it inside both functions, so you don't have to keep track of two copies of the same code. Or you can use a macro I guess, but it might look ugly having to put a \ after every line.
Possibly, yeah. That's a tradeoff you could choose to make depending on if it makes sense for your codebase.
Things being modular like that makes my brain feel good :3
I havent written a single line of C in all my years as a programmer, but its still interesting to see how and why it does what it does.
Great video, I had no idea restrict was a thing. Very good explanation :)
I knew that C++ was not a strict superset of C but not how so, good to know one way in which they're different now :P
#define fubar(x) _Generic(x, default: (char*[]){"hello", "is there anybody out there?", "not if you can hear me", [sizeof x]="is there anyone home?"}
fubar('a')
struct relax { size_t information; char hurts[]; };
There's also something subtly different about the scope of nested declarations. If I recall correctly, in C, the declaration for `struct bar` is hoisted outside of `struct foo`: struct foo { struct bar { int x; } x; int y; } ... so that you can use `struct bar` elsewhere. C++ has some strange clause that renders a recursive `main` entry point undefined... Eh, there's still quite a few differences. char *blatant_cpp_error = malloc(0); // write C like this; don't cast to facilitate for C++ or you're enjoying the worst of both worlds
Request: would you please consider doing a video on all the different value types in C++ (i.e., glvalue, rvalue, lvalue, xvalue, and prvalue)? That is the thing I find the most confusing.
Got it! I've added value categories to my list of topics, that's one that I'm sure many C++ programmers struggle with.
@@mCoding awesome! Thanks so much. Can't wait.
12:00 - I would totally be interested in C + C++ content!
It's still there in C++. Its hidden inside several Libs in fact. Its also inside a BitLib that IBM developed and is called UNDERSCORE + restrict + UNDERSCORE and does the exact same thing with pointers being assigned.
Nice video as always.
However, I would've enjoyed a side by side view with the Assembly with and without usage of the restrict keyword to find out which instructions are actually changing (or disappearing).
Fair enough, I kinda expected people interested enough to plug it into compiler explorer for themselves :). I definitely considered doing the side-by-side at one point but I don't recall why I decided against it.
This strongly reminds me of mutable references in Rust. In fact, I think a * restrict T is almost exactly like a &mut T in Rust. Which explains why there are so many potential dangers with restricted pointers.
Great Video, I had never heard of your channel, but your explanation was great and understandable!
You can always add a macro to use __restrict only where it's supported, and it's defined as nothing when it's not.
Great video. Well worth it from end to end. Keep it up.
Every major C++ compiler supports it. So I defined it via a macro.
This is really high quality content, thank you!
Nice!
Really important for autovectorization.
Thanks for the vid pal, I appreciate your work
Thanks, learnt something new today. Not something I was seeking for, but someday perhaps it might come in handy...
btw rust always provides the same pointer aliasing guarantees as restrict, as there can't be more than one mutable reference, and you can't have any immutable ones while you have a mut one.
so for example you can't pass
add_sub(&x, &y, &mut x)
which allows the compiler to safely and automatically apply these optimisations.
this is one of the guarantees that makes rust sometimes, but very rarely, outperform C in benchmarks. Of course when compared with proper optimized C code (with these guarantees specified explicitly, e.g. the restrict keyword and friends) it's more or less equal or *slower.*
(slower: mostly due to additional checks Rust does, for example array bounds checking, which is included with all array accesses by default, unless the value is guaranteed to fall into array's size (this behavior can be changed using unsafe code))
Excellent and interesting video, James. So pedentically speaking, C++ is not a superset of C :D
Sorta sounds like a unique pointer.
I did not know about this at all, so I learned something. New to your channel. Anytime you want to do a C video that'd be fine with me!
Now ik why a lot of the standard library functions had that restrict keyword... Cool video.
Me more grow smart
Your explanations are really good
I enjoyed that, and I like technical stuff. This wasn't even that bad, from my perspective.
I love non beginners videos, thanks a lot!
I have never seen that keyword, but it's damn useful.
Thank you so much for making this explainer. Helped me a lot.
you should do a video on using the compiler, building, and make files. like deep meaning of all the flags and stuff. it seems their are a lot of tutorials on how to program in C++ but few on setting up, using, and modifying big make files. so to new people its kind of feels like RTFM but when I type in man g++ Manual assumes a lot of knowledge, the manuals are sometimes confusing, non existent, or maybe I am just not finding the manual for the job. but you explain things well so you should go threw compiling and make files.
Great video. Regarding function naming, give the special name to the function that has special requirements. I.e., 'vector_add_no_overlap' and 'vector_add' should be the correct naming scheme. You would not go looking for a special function in your autocomplete dropdown unless you know that you have a special situation and might want a function that can make use of it. If 'vector_add' has the requirement that inputs do not overlap, you will never think of that until you find out the hard way.
Edit: By the way, that is why 'memcpy' and 'memmove' are really horrible names. How often have you seen people use memmove? Me, almost never. People want to copy memory, mostly do not think about the case of overlapping inputs, and just type mem... and up comes memcpy, that's what they use.
Nice video. Although it would've made sense to point out the example functions probably would have been inlined, with the compiler heuristic seeing that the values don't overlap even if the restrict qualifier isn't given.
The only keyword that scares me, let's me create bugs I'm not smart enough to find.
The semantics of restrict are notoriously hard to define.
But then, so are the semantics of pointers in a low level language, of unions (not sum types), etc.
Very informative video. Well explained. Thanks.
Me who has just started w C++: Yeah... Uh huh... makes sense... mhm...
Love the vid though, great explanation
Even if restrict is out of reach for your current projects, if you take one thing away from this video I hope it is that you can and should use compiler explorer to try small examples!
Just imagine the optimisations you could do if you eliminated mutation entirely.
Please use Dark mode, it hurts my eyes at night to watch.
Great video tho, concise and thoughtful.
10:58 You can always code the functions needing restrict in C and call from C++. Totally standard.
If you want to use __restrict and variants, use a macro like MY_RESTRICT so you can make it appropriate for the compiler (including being nothing if it doesn't support it).
Not if you need to use some C++ features in the functions
I wouldn't say _Generic is used like templates. It's more used like function overloading. You can use macros kinda like templates. All quite hacky.
Even though I wouldn't use that, or any modern keywords, it is still interesting.
Oh, and helping the algorithm.
Tyvm. Yeah i don't recommend using it willy nilly since it can result in such hard to find bugs and there are usually other things you can do to get performance gains.
Yea this keyword seems like a disaster waiting to happen
Restrict all pointers and always build with -O3 -ffast-math -funroll-loops for maximum speed crashing!
Awesome level of expertise! Thanks!
Very welcome!
Thank you for this video, great channel!
Using restrict in things like memcpy exposes it to serious speedups like DMA and kernel-level memory mapping (copy on write).