What do you want to see next? 👇
@@hanspeterbestandig2054 Instead of just a comparison I'd like to see a video of cases where he has personally preferred one over the other, like what problem did this data structure solve. I've rarely found myself using lists, but when I have they have been invaluable. Rarely see myself using maps over unordered, priority_queues over deques, or stacks over deques. And lots of my usage of various data structures are just out of habit, but I chose them to solve a specific problem they might not be the best solution for.
A video about people who optimise their code by replacing list with vector, and how much time they spend on it compared to how long their app will ever run in this universe.
I think we should stop recommending c++ as a beginner language in this day and age. it's fine if you are looking for a job in it but in general it's not really that good of a language (like honestly I've used it)
Another thing I would add to this is to always mark your move constructor noexcept if you want the vector to use it. In this case it didn't cause problems, since the size was reserved up front and no vector reallocation occurred in this example. But if a reallocation did occur, the vector would probably use the copy constructor instead of the move constructor if the move constructor is not noexcept. So always mark your move constructor noexcept if you can.
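The noexcept point above can be checked with a small counting experiment. This is a minimal sketch, not code from the video: `PlainMove`, `NoexceptMove`, `Counts` and `grow_past_capacity` are hypothetical names, and it assumes `reserve(4)` leaves no spare capacity, which holds on the mainstream standard libraries.

```cpp
#include <vector>

// Hypothetical counters, one set per instrumented type.
struct Counts { int copies = 0; int moves = 0; };
Counts g_plain;     // operations on PlainMove
Counts g_noexcept;  // operations on NoexceptMove

struct PlainMove {
    PlainMove() = default;
    PlainMove(const PlainMove&) { ++g_plain.copies; }
    PlainMove(PlainMove&&) { ++g_plain.moves; }  // move may throw
};

struct NoexceptMove {
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++g_noexcept.copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++g_noexcept.moves; }
};

// Fills a vector without reallocating, then forces exactly one reallocation.
template <typename T>
void grow_past_capacity() {
    std::vector<T> v;
    v.reserve(4);
    for (int i = 0; i < 4; ++i) v.emplace_back();  // constructed in place
    v.reserve(v.capacity() + 1);  // the 4 existing elements must be relocated
}
```

After calling `grow_past_capacity` for both types, the type without noexcept should show four copies and no moves, while the noexcept type should show four moves and no copies. The vector falls back to copying because it has to keep the old buffer intact if relocating an element throws.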
6:08 - that is categorically wrong. The cost of using heap-allocation is the actual allocation. Once it is allocated there is no difference anymore. 9:57 - compile that with a not-ancient compiler and optimisation enabled: The result is most likely 0 allocations - the compiler is allowed to remove those. 16:35 - emplace_back would also be 0 allocations - that is mandated by the language. 19:05 - the reason it does not have a move-constructor is cause you disabled it by giving it a user-declared copy-constructor. had you not done that your class would be a simple aggregate-type, those operations would all be compiler-generated (with some other nice benefits) and you'd not see copies/moves either. With vector you only want to use reserve if you either know the exact number of elements already, or you have measured that there is a performance-problem and you have also measured that you can get a good enough heuristic that your preallocation actually is significantly faster. If you dont know then you can very easily end up with nearly the same number of allocations but a lot higher re-allocation and more memory-traffic.
Yeah, I was almost willing to forgive his giant "stack is faster than the heap" text hoping he would elaborate. Then he elaborated wrongly. I can see people duplicating values all over the stack just to avoid using the heap as a result of this advice.
I usually share his opinion, but coming from C++ Weekly knowledge he is nitpicking over stuff the compiler already does in most scenarios. It's oftentimes better to write expressive code than performant code, because they might work the same way after optimizations.
@@bestopinion9257 not necessarily, there’s a lot of cases where you know the minimum size but you still want the capability to expand the buffer without seg faulting.
@@bestopinion9257 No, it's not a contradiction. In some scenarios you know beforehand that you will be dealing with data in some range (between 100 and 10000 elements, for example). It is a good optimisation to preallocate some memory so the vector doesn't waste time increasing its capacity and copying the underlying array.
It is common to get something where you get an image line by line and get the metadata at the beginning to pre-allocate the size. Preallocating a maximum size image would not be as memory efficient.
Technically, std::array is stored wherever the memory you are using for it, is stored. If you have an object that has an std::array, that array is going to be stored wherever that object is. If you put that object on the heap, then the std array is stored on the heap. In your second example, if you make a static array to store the colors, that data would be stored in the .data section or .rdata section. (Or SOMEWHERE within the PE or ELF. I have also seen static data get stuck in the .text section. Haha) The important part is that it's not going to create any additional memory. Just minor nitpick though. :)
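The "stored wherever its owner is stored" point can be checked by comparing addresses. A rough sketch, assuming a hypothetical `Palette` struct and a hypothetical `lives_inside` helper (neither is from the video):

```cpp
#include <array>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical type that embeds a std::array directly.
struct Palette {
    std::array<int, 4> colors{};
};

// True if the address p falls within the bytes occupied by obj.
template <typename T>
bool lives_inside(const T& obj, const void* p) {
    const auto begin = reinterpret_cast<std::uintptr_t>(&obj);
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= begin && addr < begin + sizeof(T);
}

// The array's elements sit inside the Palette, on the stack or on the heap.
bool array_follows_its_owner() {
    Palette on_stack;
    std::unique_ptr<Palette> on_heap(new Palette());
    return lives_inside(on_stack, on_stack.colors.data()) &&
           lives_inside(*on_heap, on_heap->colors.data());
}

// A vector object only holds bookkeeping; its elements live in a separate buffer.
bool vector_points_elsewhere() {
    std::vector<int> v(4);
    return !lives_inside(v, v.data());
}
```

Both functions are expected to return true on the common standard library implementations: the array never gets its own allocation, while the vector's elements are always outside the vector object itself.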
The discussion in the second part is wrong in the sense that the only reason you got all those copies was because your instrumentation code forced them to be there -- overload resolution will prefer the manually-added copy constructor over the compiler-generated move constructor. Had you _not_ written the copy constructor, a move would've happened instead. That is, it's not true that you need to supply a move constructor yourself. In the vast majority of cases the compiler will write one for you and it'll usually be correct, particularly if what you have are just dumb structs (even if they contain more complicated types like vector).
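A small sketch of the point about compiler-generated moves, using two hypothetical structs. It relies on the guarantee that a moved-from `std::vector` is empty after move construction, so the size of the source reveals whether a real move or a copy happened.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// No special members declared: the compiler generates both copy and move.
struct Plain {
    std::vector<int> values;
};

// A user-declared copy constructor suppresses the implicit move constructor,
// so "moving" this type silently falls back to copying.
struct WithCopyCtor {
    std::vector<int> values;
    WithCopyCtor() = default;
    WithCopyCtor(const WithCopyCtor& other) : values(other.values) {}
};

// Returns how many elements the source still holds after std::move.
template <typename T>
std::size_t source_size_after_move() {
    T source;
    source.values = {1, 2, 3};
    T target(std::move(source));
    (void)target;  // only the state of the source matters here
    return source.values.size();
}
```

`source_size_after_move<Plain>()` yields 0 because the buffer was stolen, whereas `source_size_after_move<WithCopyCtor>()` yields 3 because the copy constructor ran.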
As for whether one should still use push_back, IMO yes. It's true that emplace_back subsumes the same functionality, so in principle there's no loss of expressivity if you just use emplace_back everywhere. However, 1. using push_back signals intent and, more importantly, 2. it's not a template, so error messages will happen at the call site instead of deep in the standard library in xmemory or some other implementation-specified header. push_back should also compile faster and lead to a smaller binary, for the same reason.
To solve this issue, C++26 will add a std::inplace_vector, a "dynamically-resizable, fixed capacity, inplace contiguous array". It has the benefits of both an array and a vector: its capacity is fixed but its size is dynamic, and its elements are stored inline (on the stack, if that is where the object lives).
Dunno why it took them so long, using std::array with a manual counter is kinda annoying. I do find a hybrid model more convenient though. A vector that has two allocators, one in-place with fixed capacity and then if that exceeds, it will fallback to heap allocated storage. Useful for when the array usually doesn't exceed a certain size but can.
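For readers who have not seen the "std::array with a manual counter" pattern mentioned above, here is a minimal sketch. `FixedVec` is a hypothetical helper written for illustration only; it is not `std::inplace_vector` and does not mirror its interface.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-capacity, dynamic-size container with inline storage.
template <typename T, std::size_t Capacity>
class FixedVec {
public:
    // Returns false instead of growing when the storage is full.
    bool push_back(const T& value) {
        if (size_ == Capacity) return false;
        items_[size_++] = value;
        return true;
    }

    std::size_t size() const { return size_; }
    const T& operator[](std::size_t i) const { return items_[i]; }

private:
    std::array<T, Capacity> items_{};  // requires a default-constructible T
    std::size_t size_ = 0;             // the "manual counter"
};
```

A `FixedVec<int, 2>` accepts two elements and rejects the third. The hybrid model described above would replace the `return false` branch with a fallback to heap storage.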
You also could have mentioned std::span, which is a meant as a view into a contiguous buffer (like std::vector/std::array) similar to std::string_view is a non-owning view into a std::string (or any contiguous buffer of char)
As someone who works with Rust a lot, I'm sad to see std::span mentioned so rarely. We at least have people using std::string_view nowadays, but it's unfortunate that many people don't know about similar generalized concepts. In fairness though, I have colleagues who take arguments as &Vec in Rust and don't think to just change it to &[T], so i suppose this problem is language agnostic.
@@coarse_snad I ended up implementing my own span, but I call it *view* because it's more intuitive, so I have a list of defined types, such as: view view view ... They derive from view and implement their own operations, such as arithmetic operations for integers and string operations for char/wchar, etc. span really is a class that allows you to write very concise code.
Great video! Just a side note... 8 calls to malloc (or the standard C++ new operator) do not necessarily request memory from the OS 8 times. Memory is obtained from the OS in pages, not on every call. But I totally got the point of the video, which is awesome btw!
The operating system gives memory to the process in pages; allocators (like malloc) then break those pages into chunks and give the chunks to the programmer. In order to reuse chunks you need to keep track of them. That means you need at least 2 pools of chunks: one for unused chunks, one for occupied chunks. Plus, it makes sense to keep small and big chunks together respectively to minimize memory fragmentation. Even if the call to malloc (or new) looks very simple, it's actually not very simple at all.
Reserve has a fun pitfall of not following the geometric growth c: this may cause more allocations if e.g. you have a 1k element vector and reserve 100 elements; the reserve would allocate for 1.1k elements, while using push/emplace_back would allocate for 1.5k elements. If the reserve is done multiple times, it can cause real performance issues.
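The pitfall is easy to reproduce by counting capacity changes. A rough sketch with a hypothetical `count_reallocations` helper; the exact numbers depend on the standard library, since `reserve` is permitted to allocate more than requested.

```cpp
#include <cstddef>
#include <vector>

// Appends `batches` batches of `batch_size` ints and counts how many times
// the vector's capacity changed, i.e. how many reallocations took place.
std::size_t count_reallocations(bool reserve_each_batch,
                                std::size_t batches,
                                std::size_t batch_size) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    auto note_growth = [&]() {
        if (v.capacity() != capacity) {
            capacity = v.capacity();
            ++reallocations;
        }
    };
    for (std::size_t b = 0; b < batches; ++b) {
        if (reserve_each_batch) {
            v.reserve(v.size() + batch_size);  // "just enough" for this batch
            note_growth();
        }
        for (std::size_t i = 0; i < batch_size; ++i) {
            v.push_back(static_cast<int>(i));
            note_growth();
        }
    }
    return reallocations;
}
```

On implementations where `reserve` allocates exactly what was asked for, `count_reallocations(true, 100, 10)` reallocates once per batch, while `count_reallocations(false, 100, 10)` only reallocates a logarithmic number of times thanks to geometric growth.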
Reserve is allowed to overallocate, and I believe that using reserve on most implementations will follow the normal geometric growth of that implementation. On the other hand, resize has this issue on all implementations, IIRC. But in either case, caution is needed, as it is very easy to do worse than just letting the container manage its own size. You are right, and Cherno should be more cautious about recommending manually handling the vector's size. That works best if you know you want exactly N elements and will never change it, but in most cases it is better to just let vector handle it, at least until you have a performance profile showing that it is suboptimal.
@@oracleoftroy That's why I wrote my own vector, I can control the type of growth in compile-time, for a vector that will deal with blocks or buckets, growing exactly is more sensible because of the size of the chunks...
I'm pretty sure the push back function is an amortized constant so preallocating just halves the copies. Misusing the resize can also make it go from a constant to O(n) insertion.
Sorry I don't understand what you're saying. If I'm inserting n elements into a vector I don't understand how it could possibly be done faster than O(n)
A std::array is not a general replacement for std::vector. If you return a std::array from a function by value, all its elements will be copied. Returning a vector by value is a lot cheaper. (Sure, in most cases we have RVO.) Another important point, though, is that stack space is limited. Your program might work for small test examples, but will crash if you suddenly use it with larger std::arrays. This does not happen with std::vector. In certain cases it might be a lot smarter to use a vector to future-proof your program for larger inputs. I’m also not sure about performance differences in accessing arrays or vectors. This might be true if you only store 5 ints, but for larger sizes the overhead of the initial indirection is negligible. Caches don’t play much of a role for performance comparisons if you iterate over a hundred structs or more. You should only make sure that you don’t continuously reallocate a vector by calling push_back (like you have mentioned). But don’t initialize the vector with a size or resize it. (Almost) always use reserve() instead, because default-constructing objects is a performance killer (for ints this is just zero-filling, but class types run a constructor per element). If you always use reserve() you don’t have to think twice. And finally, only use emplace_back if you want to construct the object in place. Otherwise the general consensus is to use push_back to avoid nasty errors. If you move an already existing object into the vector with push_back, it is not slower than emplace_back. But it is safer, and you actually want to push back an object in this case and cannot emplace it.
15:00 The question of reserve or resize is basically: Is the element cheap to default-construct and to copy-assign? Yes? Then use resize. Otherwise use push_back. The difference can be massive, as every push_back requires checking and setting the capacity, plus writing the data. A copy-assignment just writes the data. The default-initialization for trivial data types is cheaper than a capacity check-and-decrement. If the size won’t increase, use a std::unique_ptr and write in all the elements. A little less overhead than std::vector. In C++20, there is std::make_unique_for_overwrite to create a unique_ptr with uninitialized elements.
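The reserve versus resize trade-off discussed above comes down to whether elements get default-constructed before they receive their real value. A minimal sketch with a hypothetical `Tile` type and a hypothetical global counter:

```cpp
#include <cstddef>
#include <vector>

int g_default_constructed = 0;  // hypothetical instrumentation counter

struct Tile {
    int x = 0;
    Tile() { ++g_default_constructed; }
    explicit Tile(int x_) : x(x_) {}
};

// reserve: allocates once, constructs each element directly with its value.
std::vector<Tile> fill_with_reserve(std::size_t n) {
    std::vector<Tile> tiles;
    tiles.reserve(n);
    for (std::size_t i = 0; i < n; ++i) tiles.emplace_back(static_cast<int>(i));
    return tiles;
}

// resize: allocates once, default-constructs n elements, then overwrites them.
std::vector<Tile> fill_with_resize(std::size_t n) {
    std::vector<Tile> tiles;
    tiles.resize(n);
    for (std::size_t i = 0; i < n; ++i) tiles[i] = Tile(static_cast<int>(i));
    return tiles;
}
```

Both functions allocate exactly once. The reserve version runs no default constructors, while the resize version runs one per element; whether that extra work outweighs the per-call capacity check in `emplace_back` depends on how expensive the default constructor is, so it is worth measuring.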
Thank you so much for making this video! It’s super helpful for someone like myself who’s self-taught and doesn’t have a good grasp of the inner workings. I really appreciate how you explain the thought process, the different approaches, and show how you can dig in deeper to verify for yourself. You’re a great teacher!
BTW, in the meantime the owner (Adam) of this Tetris project was kind enough to accept my pull request, in which I explained these issues as part of a fork of his project. This means the latest version of the discussed code no longer makes these repeated copies of the vectors. The storage is now accessed through a *const reference* that refers to the cached data when needed. Hence the code now performs lazy allocation of the resources upon first access and stores (caches) them *at a single point*. (BTW, it's not only the vector of colors that is managed within a std::vector.)
You mentioning the code review episode which inspired this vid reminds me of a teacher looking at your paper during an exam, then reminding the whole class not to make some very specific mistake 😅
Good for beginners, but here’s a few comments. The C++ standard library is no longer called STL, it was years ago but now it’s just “the standard library”. std::vector is one of the very few standard collections which is OK even for performance-critical stuff. The only few times when I replaced it recently when I wanted to bypass malloc/free C heap i.e. I know my data is very large and I wanna page aligned zero initialized memory directly from the OS kernel i.e. VirtualAlloc or mmap. Modern compilers are smart enough to eliminate temporaries caused by push_back. For simple elements like your example they often automatically inline everything at least in release builds, compiling push/emplace into equivalent machine code.
this is so interesting! I knew all this in theory, but all of these look to me, eyeballing it, like the compiler should be able to get rid of the extraneous allocations on its own. I've been told before things like "portability isn't the only reason to pick C over assembly - you're also probably not cleverer than the compiler!" but I had no idea that this doesn't actually even begin to translate to the jump from C to C++, not at all!
Correction: stack memory is not faster than heap. It is literally the same memory. It is only the allocation step that is slower for heap ( in fact it is undetermined ). The reason why stack might be faster in some occasions is the fact it is pre-allocated and hard to cache miss.
in order to access stack data the cpu will use the value in a stack register and an offset to get the address. In order to access a heap data the cpu will use the value in stack register and an offset to get the address to a pointer, and dereference that to get the address. After getting the base address to something like an array then yes, stack and heap memory are just as fast, but getting the address of the data is one more step for heap.
@@petarpetrov3591 for one big task like that, sure. If you have many small lookups it's the other way around. Consider it irrelevant or not it's wrong to say it's only the allocation step that's different. I can agree that it's mostly irrelevant though
Stack and heap memory are not necessarily the same memory. In an embedded environment, memory may be implemented in separate fast and slow physical memory devices with the stack and heap configured to live in either one.
In the tetris code, when it iterates the vector it is also making a copy of Position. He could avoid that by changing that for to "for (const Position &item : tiles)"
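The difference between the two loop forms can be made visible by counting copies. A small sketch with a hypothetical `Position` struct; the real Tetris code is not reproduced here.

```cpp
#include <vector>

int g_position_copies = 0;  // hypothetical instrumentation counter

struct Position {
    int row = 0;
    int column = 0;
    Position() = default;
    Position(const Position& other) : row(other.row), column(other.column) {
        ++g_position_copies;
    }
};

// Each iteration copies one element into `item`.
int sum_rows_by_value(const std::vector<Position>& tiles) {
    int sum = 0;
    for (Position item : tiles) sum += item.row;
    return sum;
}

// Each iteration binds a reference to the existing element: no copies.
int sum_rows_by_reference(const std::vector<Position>& tiles) {
    int sum = 0;
    for (const Position& item : tiles) sum += item.row;
    return sum;
}
```

For a vector of five positions, the by-value loop invokes the copy constructor five times and the by-reference loop not at all. For a struct this small an optimizing compiler may well remove the difference when the copy constructor has no side effects.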
Great video! I've known about this for a while, and it's surprising how many programmers overlook the importance of understanding how many copies and movements are happening in deep memory when using std::vector. You absolutely nailed it in explaining why this matters. It's not just about using the right tools, but knowing how they work under the hood. Thanks for shedding light on this important topic!
There is nothing wrong with returning a vector from a function. Move semantics or the compiler's return value optimisation means the returned value is not copied. Also, containers such as std::vector have a "reserve" function to avoid multiple dynamic allocations and copies. But in general I agree with you that C++ has a big disadvantage in comparison to C: it requires paying much more attention to the performance of the application.
This is why I made arrays first-class objects in my language. If you use a question mark as part of the array size when declaring the array, then it's dynamic and on the heap, otherwise it'll be static and on the stack. For most operations the compiler attempts to optimize usage of dynamic arrays as views, like I do for strings. In fact, the default string type is a string view in my language. Of course, someone will invariably wonder how you control the allocator when these types are built-in, and for that you can use the template syntax to substitute your own allocator.
I understand that "on the stack" and "on the heap" is mostly true, but being as pedantic as I am, I would say that `std::array` is "inline", and `std::vector` is an "owning pointer to a heap allocation".
keep in mind you dont always want to stack allocate arrays, especially if they are large. you only get 1mb of stack on windows and 8mb on linux, which is fine for small stuff but past a point you want to keep it fairly freed up
I think there is an important idea to keep in mind, because we often add elements to vectors but sometimes we need to remove them. If we don't have a LIFO or something, we sometimes need to remove elements in the middle of a vector. If the order does not matter, the correct way to remove an element is to swap it with the last element and THEN remove the last element, so the removal is constant time and not linear. Actually, I think this kind of algorithmic optimization is more important than machine-level optimization, which is a super complex topic in the end.
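The swap-and-pop removal described above fits in a few lines. `unordered_erase` is a hypothetical helper name; the sketch overwrites the removed slot with the last element rather than doing a full swap, which has the same effect.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Removes v[index] in constant time without preserving element order.
// Precondition: index < v.size().
template <typename T>
void unordered_erase(std::vector<T>& v, std::size_t index) {
    if (index + 1 != v.size()) {
        v[index] = std::move(v.back());  // last element fills the hole
    }
    v.pop_back();  // drop the now redundant last slot
}
```

Erasing index 1 from `{10, 20, 30, 40}` leaves `{10, 40, 30}`. By contrast, `v.erase(v.begin() + index)` keeps the order but has to shift every later element.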
Thanks Cherno, please teach custom allocator, arena allocator next from scratch. Also speak about template specialization using std forward and more about templated classes Thanks a ton in advance
In C++, "struct is class" although trying to use the compiler option -fno-rtti that is not good for the enormous classes taxonomy. For cleaner design, v.push_back(Data(i)) may not give same performance as v.push_back(i), so that the former idea maybe unoptimized.
There are cases where push_back is more suitable, and the distinction is quite simple: if you have an object of type Data already constructed, use push_back (preferably with std::move), and if you're constructing the Data object as you're inserting it into the vector, use emplace_back with the constructor arguments. If you pass emplace_back an already constructed object, you call the copy/move constructor just as push_back would, so you gain nothing.
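The rule of thumb above can be checked by counting constructor calls. A minimal sketch with a hypothetical instrumented `Data` type, modelled loosely on the kind of struct used in the video; each helper reserves first so that reallocation does not add noise.

```cpp
#include <utility>
#include <vector>

// Hypothetical counters for the three kinds of construction.
struct OpCounts { int constructed = 0; int copied = 0; int moved = 0; };
OpCounts g_ops;

struct Data {
    int value;
    explicit Data(int v) : value(v) { ++g_ops.constructed; }
    Data(const Data& other) : value(other.value) { ++g_ops.copied; }
    Data(Data&& other) noexcept : value(other.value) { ++g_ops.moved; }
};

// Constructing on insertion: pass the constructor arguments to emplace_back.
OpCounts run_emplace_args() {
    g_ops = OpCounts{};
    std::vector<Data> v;
    v.reserve(1);
    v.emplace_back(7);
    return g_ops;
}

// Object already exists: push_back with std::move.
OpCounts run_push_existing() {
    g_ops = OpCounts{};
    std::vector<Data> v;
    v.reserve(1);
    Data d(7);
    v.push_back(std::move(d));
    return g_ops;
}

// emplace_back with a temporary: no better than push_back.
OpCounts run_emplace_temporary() {
    g_ops = OpCounts{};
    std::vector<Data> v;
    v.reserve(1);
    v.emplace_back(Data(7));
    return g_ops;
}
```

Only `run_emplace_args` avoids the extra move. The other two each construct one object and then move it into the vector's storage.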
What about huge data structures arrays? Say you need a million elements, should you use the stack or the heap? Should you use , or just an array? I'm aware the stack has a fixed size and it's not that big, but I'm also aware you can change the size of the stack so... Which one to use?
Fun fact - allocating more than you asked for also happens in other languages. I remember asking on SO why a supposedly empty dictionary has such a large size when I first discovered this.
Make a habit of typing `auto const &` any time you're "cloning" something on the right of the `=`, and when you realize "oh, I need to change (mutate) this thing", you just have to decide if you need a copy (remove the `const &`) or change the original (remove just the `const`). Start strict and loosen only when necessary. Likewise, take broad types as parameters, but return as specific a type as is reasonable. It's good practice in general, but when you do reach the point you realize it's a useful generic abstraction, often all you need to do is replace the type with the template type in the function specification -- at least assuming you've been smart about using `auto` in your declarations like I mentioned above. With modern types like std::span (or gsl::span) and std::string_view that can even take non-hierarchical types, C++ is supporting even better API designs.
Some of us are super old school and were around when STL was the "new kid on the block". It wasn't part of the C++ standard until 1998. These were days when you had a dog eared copy of Knuth's algorithms book (TAOCP) on your desk for quick reference ;) Anyway, the reason one is heap based and the other stack is pretty simple. The std::array is basically just a decorator for an array with some sugar thrown on like iterators and range checking. A std::vector, on the other side basically does a malloc to create a buffer. It will then exponentially grow that buffer as it needs to. Also, heap has always been slower than stack. Unless something has changed. In the world of x64 it does require a few extra instructions, but heap memory is not contiguous, so there is overhead with the memory manager (which I also wrote low level back in the day using Borland C++ and Assembler :P ). Stack is contiguous. In terms of complexity they are both O(1) for access, but the std::vector will add overhead for insertion. If it needs to resize that operation will take O(N), so in the long run std::vector is going to be slower. It's really not something you would typically optimize for - remember Knuth's warning: Premature optimization is the root of all evil. Use them for what they are used for. If you have a fixed buffer and you know it will never change, use a std::array, if you need a dynamic buffer use std::vector. These are patterns that should be familiar to all developers regardless of what frameworks or libraries they are using. Just saying. Now let me get off my X Gen soap box :P
💡One thing that could have been highlighted (though you have briefly mentioned it), is that moves are not free. When a move is done on a temporary, both objects need to be created first, before the temporary is moved (and also destroyed). This is precisely why emplace_back() which forwards the arguments is good 👍
I tried to write the same code in CLion and I'm glad that Clang-Tidy linter highlighted me these issues and recommended to use `emplace_back` instead of `push_back` and highlighted unnecessary temporary object creation when I tried `emplace_back(Data(i))`. Not even saying about `const T&` stuff
push_back can actually be better when you have a vector of some small data type (below 8 bytes on 64-bit platforms, and below 4 bytes on 32-bit platforms), ESPECIALLY for built-in types like int, size_t, and other numerical types, because compilers have special optimizations for copying those values rather than passing a reference to them. This is not ALWAYS true, but 99% of the time copying a number is better than copying a reference/pointer to it, especially when it's smaller than 8/4 bytes (again, depending on the platform). PS: these optimizations don't really matter unless you're going to move around millions or billions of numbers per second.
So for the example of a function which returns a vector, I'm of the opinion that if you are intending to return a dynamically sized chunk of memory, it is better to use the heap. If you want to use the stack, you're stuck between two options, copying the structure up the stack in the return line of the function, or requiring that the caller knows how much memory you're going to need and passing in a reference. Having to allocate and deallocate space in the heap is not that big of a performance hit if done correctly. The blanket advice of "avoid heap" isn't nuanced enough in my opinion. What is more important is your emphasis that people understand how these data structures work. But of course, let's not forget that the compiler can often do the work for you and writing maintainable code is arguably more important than optimized code in those examples.
You missed talking about amortized constant time for push_back despite all the allocations. I think that's important to cover because it explains why vector chooses to increase the size by a multiplicative factor each time.
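The amortized argument can be illustrated by counting how rarely the capacity actually changes. A small sketch with a hypothetical `reallocations_for` helper; the exact count depends on the growth factor the implementation uses (commonly 1.5 or 2).

```cpp
#include <cstddef>
#include <vector>

// Pushes n ints without reserving and counts how often the capacity changed.
std::size_t reallocations_for(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != capacity) {  // the buffer was replaced
            capacity = v.capacity();
            ++reallocations;
        }
    }
    return reallocations;
}
```

With a growth factor $g > 1$, inserting $n$ elements triggers roughly $\log_g n$ reallocations, and the total number of elements copied across all of them is bounded by a geometric series proportional to $n$. That is why 100000 push_back calls cause only a few dozen reallocations and why each push_back is constant time on average.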
While I believe it's good to show the differences, I think the comparison isn't entirely fair. By having the copy and move constructor modify global state, you're introducing side effects, which effectively prevents copy/move elision from happening. Remove those side effects and the copies/moves created by push_back(Data(i)) and emplace_back(Data(i)) will actually be elided.
Before C++20 emplace_back didn't work with aggregate initialization, so you had no choice but to have it invoke the move constructor: emplace_back(Aggregate{x}) But, when you start doing that, push_back and emplace_back pretty much do the same thing. One other thing worth noting about emplace_back is that it returns a reference to the constructed element since C++17, which may be useful in certain situations.
One of the things that I think that wasn't mentioned between the uses of std::array, std::vector isn't just in knowing how many items you'll need or have within your containers, but also the lifetimes of those objects.
Great show and tell of how something as simple as std::vector isn't the silver bullet of data collections that a lot of people think it is, unless thought and planning are applied first.
This video is A++. Seriously. This will save a lot of time and headache for a lot of people The title, content and timing of release. I was just in the middle of going back and benchmarking + studying the different containers. Thanks for this fr because emplace_back is so important lol
Headache? No. This is functioning code and it will work. It’s just a lack of awareness of the language and performance. It’s usually a problem of discipline and knowledge. BTW, in production, those issues will usually be caught by a static analyzer.
@@michaeljackson1147 For me, headaches are synonymous with bugs. Some mistake that breaks something… therefore, I said it’s not helpful in preventing a headache. But anyway, it is a very useful video.
@@malekith6522 Ah, but couldn't you say lack of language awareness and performance can contribute to "headaches", regardless? lol I see what you mean though really. Further more, bugs can be included in the entire knack, in regards to discipline and knowledge. Just picking back at you though at this point :P
Interesting to see your take on the STL. I know that Chromium, which I'd say falls into the category of performance-sensitive real-time applications, actually uses as much of it as possible. They have the Abseil library for cases where STL implementations are lackluster, but they generally try to avoid homegrown solutions.
without knowing much about arena, it can't reach identical performance of stack right since the CPU has registers specifically for the stack pointers. You'd have to put those in memory or something instead I imagine to emulate it with Arena
@@shiinondogewalker2809 Generic memory allocation may require jumping through many hoops, several thousand code statements: branching, failed branch prediction, merging memory holes. An arena, even if not backed by registers, saves some of that trouble. Once an array has been allocated, its address can go into a local variable, and a local variable can be mapped to a register. That means an additional register for every array allocated in the arena, and if the CPU is out of registers, local variables on the stack store the pointers to arrays in the arena.
5:35 Whoops! 😳 Why is std::array stored on the *stack*? Its storage *depends on* the context/location *it lives in*! For example: if you use it within a local (stack-managed) scope, then you’re right. But if one uses it globally, then it *does not* use the stack but the global space, which is usually the bss section within the program. If a std::array is part of a class or struct and one creates an instance by using new, then the contained array is also part of this memory, which is the heap! For std::vector you’re totally right: this container *likely* uses the heap for its storage, because since it is dynamic it will (likely) employ new to allocate its storage space, and new means heap! However, this depends on the implementation. Funny side note: in my career I stumbled over a quite clever implementation that attempts to optimize small allocations by reserving a certain space for its storage as a fixed array (aka intrinsic storage). If this is not sufficient, it starts to use new/delete to expand this space upon larger storage demands. This design decision was clever for this (embedded) software, because the application code was designed not to exceed these limits, to keep the performance high. Furthermore, embedded software should avoid dynamic memory allocation due to the risk of memory fragmentation… But this is another story…
I think he uses "stack" interchangeably with "allocated depending on location" (which he should probably clarify, to be fair). As for std::vector always using the heap, is this really a requirement? I.e. is it part of the spec? Genuine question, as I would assume it could possibly have a small "stack" allocation (more specifically, storage dependent on location, like an array) for small vectors, no? Something like std::string does, if I'm not mistaken.
4:20, whether array is the most used, I don't know. But std::vector is certainly more comfortable, and almost the same speed as array. Even nowadays, when clang seems to achieve more performance for array, it's only ~5%, according to my benchmarks. Meaning a game (depending on vector performance) running at 57 FPS would run at 60 with array. That also means vector is being deployed on stack - otherwise it would never reach this performance. vector is also more comfortable to use because push_back makes the size grow proportionally, without the worry of a seg fault when reaching the stack's limits. But of course, I always use vector::reserve first, to avoid new "allocations" at every push_back.
@@TheCherno Are you saying that std::array is slower than C-array? I read somewhere that the standard granted same speed, by keeping its inner structure.
@@MrAbrazildo That's not what he's trying to say. He means "across all programming languages, the most used data structure is the one that stores a set of objects in contiguous memory". The name "Array" is the most common name for this data structure. Both std::vector and std::array are "Arrays" in this sense.
@@ABaumstumpf Let's say a function takes a vector and a collection of elements to be pushed to it. If the function reserves the number of elements beforehand, then calling that function in a loop will force the vector to grow linearly. If the function doesn't reserve, then calling that function in a loop will make the vector grow exponentially. In both cases the function knew how many elements needed to be pushed beforehand, but the function that reserves causes far more reallocations than the function that doesn't reserve.
As far as all this is true just wanted to add, don't blindly use emplaces instead of push or moving data. If you already have data copy/move it, move if it's no longer used in the scope you want it, copy otherwise.
Usually most of your videos go over my head as I only took like two C++ classes but I could understand this one and I really enjoyed it, 10/10 and also shared with a CS C++ graduate friend
Wow, I had never thought about configuring the capacity of a dynamic array before. I kind of figured that this is redundant, considering that it is a requirement of a static array and so you could just use a static array. But I see the value of automatic resizability after configuring the capacity.
Step 0: Consider if the code you are writing is performance critical. If it is not you can maybe sometimes prefer the simplicity of just using the vector 'wrong'. However often code that is not considered performance critical may become it later, so most of the time just aim to optimize early.
It's good to understand the reliable baseline. Using std::array for fixed size, and emplace_back and reserve in std::vector, isn't premature optimization, it's having a good baseline approach. Using std::list when you don't know it'll outperform a vector is bad because it's not a good baseline to use and is premature optimization for example.
@borealis75 you CANNOT know if this or that part of the code is or will be performance critical - and once you set it in stone, it is usually very hard, if impossible to undo - not to mention there simply is no manpower to do it. One should always write code which is at least OPTIMIZABLE and non-pessimized. This whole 'dont optimize' mentality from the early 2000's needs to die already. Compute is cheap, but it is not free.
Amazing! Beyond the Tetris example, can you explain what this memory management change might look like in a larger application? Maybe not an air traffic control system but something where larger consequences can play out.
I am just trying to learn std::pmr (polymorphic memory resource). This gives more control over where the memory comes from and can help make code that uses std::pmr::vector (and other STL datatypes) more robust and faster. This gives you one solution to store a vector inside stack memory that you allocated with std::array. I just haven't found a good explanation yet, and I'm currently very confused about it.
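For anyone else confused by this, a minimal sketch of the idea (C++17). The function name and the 256-byte buffer size are arbitrary choices for illustration; `std::pmr::monotonic_buffer_resource` and `std::pmr::null_memory_resource` are the standard facilities.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <memory_resource>
#include <vector>

// Builds a pmr vector on top of a local std::array buffer and reports
// whether the vector's element storage really lies inside that buffer.
bool pmr_vector_uses_local_buffer() {
    alignas(std::max_align_t) std::array<std::byte, 256> buffer{};

    // null_memory_resource() as upstream: if the buffer runs out, allocation
    // throws std::bad_alloc instead of silently falling back to the heap.
    std::pmr::monotonic_buffer_resource pool(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());

    std::pmr::vector<int> values(&pool);
    values.reserve(8);  // one allocation, served from `buffer`
    for (int i = 0; i < 8; ++i) {
        values.push_back(i);
    }

    const std::byte* first = buffer.data();
    const std::byte* last = buffer.data() + buffer.size();
    const std::byte* storage = reinterpret_cast<const std::byte*>(values.data());
    return std::less_equal<>{}(first, storage) && std::less<>{}(storage, last);
}
```

Note that a monotonic resource never reuses memory it has handed out, so a vector that keeps growing will burn through the buffer faster than its final size suggests.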
For any task where size matters ( giggty ) - where vector sizes reach thousands of elements - the allocation cost incurred by the incremental push_back probably does not matter. Storing objects with a non-trivial copy semantics in a vector is asking for trouble either way - as a push-back ( or emplace ) that forces a reallocation will move / copy the content to a new location - incurring a pretty hefty cost. For games such usually means frame drops. The main benefit of std::array is that it is 'constexpr' able, which may mean 0 allocations, 0 moves and 0 copies.
What would the std::array example look like at the end? Since it doesn't have the emplace_back function that std::vector has, would you be forced to use the struct's copy constructor?
Can you similarly talk about other STL features and how to write better code(performance), and one suggestion when doing some optimization talk about the tradeoffs rather than completely discarding an option, and it would be more engaging if you actually show the runtime after showing the number of allocations, using counters,tracers and profilers offered by kernel
Very useful, I'll keep this in mind. I still like std::vector because it's convenient for clean code. I may think about vector.reserve or vector(presize) if I know the size.
EASTL is pretty great, but I do have one question regarding it you may or may not know the answer to: What is the correct way to override the new operator? They mention briefly that you need to and give an example but don't talk further about it.
The real solution should be to create a compile time array, with statically known size. That way, you can add more items to the array and still have even 0 stack allocations. The value will just be loaded into memory during load time.
The same behavior applies to std::string. Qt has a better vector implementation: its objects are copy-on-write. If you create a copy of the object, only a reference counter is incremented, so only a management structure of a few bytes is copied. If you modify one copy, the class creates a new, separate copy of the data. So the data management there is better than in the STL.
The reason Qt had to do this was that when Qt was created there were no modern features like move constructors in C++. It was way before C++ 11 was released.
@@robertvetter1011 It's also possible to extend the normal vector with COW. class CowVector { private: std::vector * internalData; size_t * linkCounter; }; On construction of the class, set up internalData and set linkCounter to 1. If linkCounter equals 1, you can read and write internalData. If linkCounter is bigger than 1, internalData is read-only. If you want to write, you create a new copy of internalData. The new structure then has linkCounter = 1 and is read-write. The copy constructor has to copy both pointers and increment the value of linkCounter, and the destructor has to decrement linkCounter. If linkCounter reaches 0, the destructor has to delete both pointers. So you can theoretically extend the existing vector with COW without using C++11 features. Be careful: the example doesn't contain the public functions. Furthermore, my example isn't thread-safe.
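A rough sketch of that scheme, with one deliberate swap: instead of the raw pointer plus manual link counter described above, it uses `std::shared_ptr` (so it does rely on C++11). The element type `int` and the member names are assumptions for illustration, and like the original idea it is not thread-safe.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Copy-on-write wrapper: copies share one buffer until somebody writes.
// Not thread-safe, and only a few member functions are shown.
class CowVector {
public:
    CowVector() : data_(std::make_shared<std::vector<int>>()) {}

    // Read access never copies.
    int at(std::size_t i) const { return data_->at(i); }
    std::size_t size() const { return data_->size(); }

    // Write access detaches first if the buffer is shared.
    void push_back(int value) {
        detach();
        data_->push_back(value);
    }
    void set(std::size_t i, int value) {
        detach();
        data_->at(i) = value;
    }

    // True while this object and `other` still use the same buffer.
    bool shares_buffer_with(const CowVector& other) const {
        return data_ == other.data_;
    }

private:
    // Makes a private deep copy if anyone else holds the buffer.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<int>>(*data_);
        }
    }

    std::shared_ptr<std::vector<int>> data_;
};
```

Copying a `CowVector` only bumps the shared pointer's reference count; the deep copy happens on the first `push_back` or `set` after that.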
I was going to comment on the fact that the case made in this video actually demonstrated how good the vector is since the cost of allocation is amortized which is explained a bit later. Also, the copy and move constructor is irrelevant to the topic in using the vector since other data structures will behave similarly to how your customized data type set up copy and move constructors. However, the process of analyzing allocations here is solid and in fact, lacking in most developers I've seen in other language users, so kudos.
@@just_smilez The point I'm making is that in the video the Cherno mentions that there are several allocations when moving from 0 to 1 to 2 to 3 to 4 elements - which are elided in Rust.
I like that video. There is one thing however that I don't get. The function that hooks into the allocation prints a message and increases the counter. But when executed it show way less messages in console than the counter. For example at 10:48 it shows 1 message printed and 5 allocations. I'd expect it to show 5 lines with "Allocated...". Is this just scrolled or what's going on?
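For reference, an allocation hook of the kind discussed here can be sketched as below. This is not the code from the video; `g_allocations` and `allocations_for_pushes` are hypothetical names, and the counts can differ with optimisation enabled because compilers are allowed to remove some allocations.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Global counter bumped by every call to the replaced operator new.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) {
        return p;
    }
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many heap allocations it took to push `n` ints.
std::size_t allocations_for_pushes(int n, bool reserve_first) {
    const std::size_t before = g_allocations;
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(static_cast<std::size_t>(n));
    }
    for (int i = 0; i < n; ++i) {
        v.push_back(i);
    }
    return g_allocations - before;
}
```

Comparing `allocations_for_pushes(100, true)` with `allocations_for_pushes(100, false)` shows the effect of `reserve` without printing a line per allocation.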
Yes, very easy to forget that memory allocation is expensive and doing unneeded copies of course is expensive too (but very modern in functional coding style).
The reserve + push back pattern is quite a lot slower than resize + set for simple types. Ifs are slow, and there's an if to check for capacity. With resize + set you also get better compiler optimizations like loop unrolling that will utilize memory bandwidth better and the superscalar nature of the processor. I've seen a 5x difference. If you do resize + set it might even generate a bit of SIMD. It's really a no-brainer. Imagine inlining the push back implementation: you get so much ugly code in the inner body of your loop. Keep that simple.
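The two patterns being compared look roughly like this. The function names are made up, and the sketch only shows that both produce the same result; it makes no claim about the speed difference, which would need measuring on your own compiler and data.

```cpp
#include <cstddef>
#include <vector>

// Pattern 1: reserve, then push_back (capacity is checked on every call).
std::vector<int> squares_push_back(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i * i));
    }
    return v;
}

// Pattern 2: size the vector up front, then assign by index (plain stores).
std::vector<int> squares_resize_set(std::size_t n) {
    std::vector<int> v(n);  // value-initializes n ints to 0
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<int>(i * i);
    }
    return v;
}
```

The trade-off is that pattern 2 writes every element twice (once to zero it, once to set it), which is cheap for ints but not for types with expensive default constructors.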
I see the standard template library used a ton in a lot of performance and memory intensive projects, but it's only ever going to be a "standard library" meaning general purpose.
Just implement a copy constructor that fails if used and you'll no longer accidentally copy anything. I would prefer that you had to call explicit clone() or copy() whenever you actually need to copy something. Always move or give reference instead. Of course, when you take that to extreme, you should probably be using Rust instead of C++ already.
That's not going to happen. Rust will not replace C++. One of the biggest uses for C++ is games, and the instrumentation and support for game development in Rust is lightyears behind C++.
A comparison between std::vector and std::list ? 😉 …and then std::set, std::map and the differences of std::unordered_set, std::unordered_map … 😏
@@hanspeterbestandig2054 Instead of just a comparison I'd like to see a video of cases where he has personally preferred one over the other, like what problem did this data structure solve. I've rarely found myself using lists, but when I have they have been invaluable. Rarely see myself using maps over unordered, priority_queues over deques, or stacks over deques. And lots of my usage of various data structures are just out of habit, but I chose them to solve a specific problem they might not be the best solution for.
A video about Memory Orders. Did you use any lock-free structure in your game engine? Does it bring any performance to your engine?
Memory safety. Shortcuts you can take so that you can get a product out faster. How to make code easily readable.
A video about people who optimise their code by replacing list with vector, and how much time they spend on it compared to how long their app will ever run in this universe.
Please don’t forget to add this to the C++ series playlist. A lot of beginners need to see this
He says literally the same things in the vector video in the playlist
... and some Rusty guys who see only disadvantages in C++ :)
@@danielmilyutin9914 rust has the same thing tho idk what they're seeing
@@danielmilyutin9914 Rusty guys, nice lol
I think we should stop recommending c++ as a beginner language in this day and age. it's fine if you are looking for a job in it but in general it's not really that good of a language (like honestly I've used it)
Another thing I would add to this is to always mark your move constructor noexcept if you want the vector to use it. In this case it didn't cause problems, since the size was reserved and no vector resize occurred in this example. But if a resize did occur, the vector probably would use the copy constructor instead of the move constructor if it's not noexcept. So always mark your move constructor noexcept if you can.
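A small sketch of the noexcept point. The types `MayThrowOnMove` and `NoexceptMove` and the helper `grow_once` are invented for illustration; the behaviour shown is what the common standard library implementations do when a vector reallocates.

```cpp
#include <vector>

struct Counters {
    int copies = 0;
    int moves = 0;
};

// Move constructor is NOT noexcept: vector growth falls back to copying.
struct MayThrowOnMove {
    static Counters counters;
    MayThrowOnMove() = default;
    MayThrowOnMove(const MayThrowOnMove&) { ++counters.copies; }
    MayThrowOnMove(MayThrowOnMove&&) { ++counters.moves; }
};
Counters MayThrowOnMove::counters;

// Move constructor IS noexcept: vector growth can move the elements.
struct NoexceptMove {
    static Counters counters;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++counters.copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++counters.moves; }
};
Counters NoexceptMove::counters;

// Forces exactly one reallocation of a 4-element vector and reports
// how the existing elements were transferred to the new buffer.
template <typename T>
Counters grow_once() {
    std::vector<T> v(4);           // default-constructs 4 elements
    T::counters = Counters{};      // start counting from here
    v.reserve(v.capacity() + 1);   // must reallocate
    return T::counters;
}
```

`grow_once<MayThrowOnMove>()` reports four copies and no moves, while `grow_once<NoexceptMove>()` reports four moves and no copies.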
this really helped me. Thank you.
6:08 - that is categorically wrong. The cost of using heap-allocation is the actual allocation. Once it is allocated there is no difference anymore.
9:57 - compile that with a not-ancient compiler and optimisation enabled: The result is most likely 0 allocations - the compiler is allowed to remove those.
16:35 - emplace_back would also be 0 allocations - that is mandated by the language.
19:05 - the reason it does not have a move-constructor is cause you disabled it by giving it a user-declared copy-constructor. had you not done that your class would be a simple aggregate-type, those operations would all be compiler-generated (with some other nice benefits) and you'd not see copies/moves either.
With vector you only want to use reserve if you either know the exact number of elements already, or you have measured that there is a performance-problem and you have also measured that you can get a good enough heuristic that your preallocation actually is significantly faster.
If you don't know, then you can very easily end up with nearly the same number of allocations but a lot higher re-allocation and more memory-traffic.
Yeah, I was almost willing to forgive his giant "stack is faster than the heap" text hoping he would elaborate. Then he elaborated wrongly.
I can see people duplicating values all over the stack just to avoid using the heap as a result of this advice.
Of course it will be 0 allocations for a simple program. But for a more complex one? I wouldn't be so sure about that.
I usually share his opinion, but coming from C++ Weekly knowledge, he is nitpicking over stuff the compiler already does in most scenarios.
It's often times better to write expressive code than performant code. Because they might work the same way after optimizations.
I think accessing the heap is still slightly slower due to indirection. You first need to read the address of the heap block from the pointer variable.
@@justinzhao9831 "You first need to read the address of the heap block from the pointer variable."
That is the same with the stack.
Sonic pro tip: preallocate memory beforehand
If you know how much memory you need, you do not need a vector.
@@bestopinion9257 not necessarily, there’s a lot of cases where you know the minimum size but you still want the capability to expand the buffer without seg faulting.
@@RagePower2000 That's a contradiction. It is either fixed size or unknown expandable.
@@bestopinion9257 No, it's not a contradiction. In some scenarios you know beforehand that you will be dealing with data in some range that goes into the thousands (between 100 and 10000 elements, for example). It would be a good optimisation to preallocate some memory so you don't waste time increasing capacity and copying the underlying arrays.
It is common to get something where you get an image line by line and get the metadata at the beginning to pre-allocate the size. Preallocating a maximum size image would not be as memory efficient.
Technically, std::array is stored wherever the memory you are using for it, is stored.
If you have an object that has an std::array, that array is going to be stored wherever that object is. If you put that object on the heap, then the std array is stored on the heap.
In your second example, if you make a static array to store the colors, that data would be stored in the .data section or .rdata section. (Or SOMEWHERE within the PE or ELF. I have also seen static data get stuck in the .text section. Haha)
The important part is that it's not going to create any additional memory.
Just minor nitpick though. :)
If it’s a global variable then it will get stored in .data section
@@e22z6 does const makes them reside in .rdata? (checked in dumpbin, yes it does)
You are mixing up RAM memory storage with sections in the executable file. Both are different things.
@@robertvetter1011 no they aren't. The executable sections are literally stored in RAM. They are only stored elsewhere when on disk.
The discussion in the second part is wrong in the sense that the only reason you got all those copies was because your instrumentation code forced them to be there -- overload resolution will prefer the manually-added copy constructor over the compiler-generated move constructor. Had you _not_ written the copy constructor, a move would've happened instead. That is, it's not true that you need to supply a move constructor yourself. In the vast majority of cases the compiler will write one for you and it'll usually be correct, particularly if what you have are just dumb structs (even if they contain more complicated types like vector).
As for whether one should still use push_back, IMO yes. It's true that emplace_back subsumes the same functionality so in principle there's no loss of expressivity if you just use emplace_back everywhere. However, 1. using push_back signals intent and more importantly 2. it's not a template, so error messages will happen at the call site instead of deep in the standard library in xmemory or some other implementation-specified header. push_back should also compile faster and lead to a smaller binary, for the same reason.
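The point about the user-declared copy constructor can be sketched as follows. `Plain` and `Logged` are hypothetical stand-ins, not the struct from the video.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// No special members declared: the compiler generates copy and move.
struct Plain {
    std::string name;
    std::vector<int> values;
};

// Declaring a copy constructor suppresses the implicit move constructor,
// so "moving" a Logged object silently falls back to copying.
struct Logged {
    static int copies;
    std::string name;
    Logged() = default;
    Logged(const Logged& other) : name(other.name) { ++copies; }
};
int Logged::copies = 0;

static_assert(std::is_nothrow_move_constructible<Plain>::value,
              "Plain gets an implicit noexcept move constructor");

// Moving a Plain steals the vector's buffer and leaves the source empty.
bool move_leaves_source_empty() {
    Plain a{"tetromino", {1, 2, 3}};
    Plain b = std::move(a);
    return b.values.size() == 3 && a.values.empty();
}

// Counts the copies made when a Logged object is "moved".
int copies_when_moving_logged() {
    Logged::copies = 0;
    Logged a;
    a.name = "x";
    Logged b = std::move(a);  // binds to the copy constructor
    (void)b;
    return Logged::copies;
}
```

So instrumenting only the copy constructor changes the behaviour being measured: `copies_when_moving_logged()` reports one copy even though the call site asked for a move.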
Exactly this. There is no need to optimize copies of integer sized PODs (and even 16 integers copying is still OK).
@@isodoubIet Was looking for this comment ^_^ correct!
Does every compiler generate the move constructor?
@@aarong2374 Yes, it's required by the standard. If it doesn't it's a bug.
To solve this issue, C++26 will add a std::inplace_vector, a "dynamically-resizable, fixed capacity, inplace contiguous array". It has the benefits of both an array and a vector: its capacity is fixed but its size is dynamic, and it's stored on the stack.
I can't tell if this is satire or not.
@@Kaptime it’s not
Dunno why it took them so long, using std::array with a manual counter is kinda annoying.
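For context, the "std::array with a manual counter" workaround looks roughly like this. `FixedVector` is a made-up name and a deliberately tiny sketch: unlike std::inplace_vector it needs T to be default-constructible and constructs all Capacity elements up front.

```cpp
#include <array>
#include <cstddef>

// Fixed-capacity, dynamically sized container stored inline (no heap).
template <typename T, std::size_t Capacity>
class FixedVector {
public:
    // Returns false instead of growing when the storage is full.
    bool push_back(const T& value) {
        if (size_ == Capacity) {
            return false;
        }
        items_[size_++] = value;
        return true;
    }

    std::size_t size() const { return size_; }
    static constexpr std::size_t capacity() { return Capacity; }

    const T& operator[](std::size_t i) const { return items_[i]; }

    // Iterate only over the elements that are in use.
    const T* begin() const { return items_.data(); }
    const T* end() const { return items_.data() + size_; }

private:
    std::array<T, Capacity> items_{};
    std::size_t size_ = 0;  // the "manual counter"
};
```

Returning a bool from `push_back` is just one way to handle overflow; throwing or asserting would be equally reasonable choices.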
I do find a hybrid model more convenient though. A vector that has two allocators, one in-place with fixed capacity and then if that exceeds, it will fallback to heap allocated storage.
Useful for when the array usually doesn't exceed a certain size but can.
Guys! Take a look at boost::container::small_vector ahhahahhhhhhhhhhhh
@@Kaptime Why would it be satire? I am sorry if I made any mistakes, this is just my interpretation of what's said on cppreference
You also could have mentioned std::span, which is meant as a view into a contiguous buffer (like std::vector/std::array), similar to how std::string_view is a non-owning view into a std::string (or any contiguous buffer of char)
As someone who works with Rust a lot, I'm sad to see std::span mentioned so rarely. We at least have people using std::string_view nowadays, but it's unfortunate that many people don't know about similar generalized concepts. In fairness though, I have colleagues who take arguments as &Vec in Rust and don't think to just change it to &[T], so i suppose this problem is language agnostic.
@@coarse_snad that &Vec issue should be pretty easy to detect using clippy if not a compiler warning
@@coarse_snad I ended up implementing my own span, but I call it *view* because it's more intuitive, so I have a list of defined types, such as:
view
view
view
...
They derive from view and implement their own operations, such as arithmetic operations for integers and string operations for char/wchar, etc. span really is a class that allows you to write very concise code.
@@ensuretime Haskell, which is where this concept was inspired from, originally called it view too, I think
Great video!
Just a side note... 8 calls to malloc (or the standard C++ new operator) do not necessarily heap allocate 8 times. It allocates in pages of memory, not on every call. But I totally got the point of the video, which is awesome btw!
The operating system gives memory to the process in pages; allocators (like malloc) then break those pages into chunks and give the chunks to the programmer. In order to reuse chunks you need to keep track of the chunks. That means you need at least 2 pools of chunks: one for unused chunks, one for occupied chunks. Plus, it makes sense to keep small and big chunks together respectively to minimize memory fragmentation. Even if the call to malloc (or new) looks very simple, it's actually not very simple at all
That's true. But even with that, allocations are slow.
@@ohwow2074 Yes yes, totally should be avoided if possible of course.
It uses placement new operator?
Reserve has a fun pitfall of not following the geometric growth c: this may cause more allocations if e.g. you have a 1k element vector and reserve 100 elements; the reserve would allocate for 1.1k elements, while using push/emplace_back would allocate for 1.5k elements. If the reserve is done multiple times, it can cause real performance issues.
Reserve is allowed to overallocate, and I believe that using reserve on most implementations will follow the normal geometric growth of that implementation. On the other hand, resize has this issue on all implementations, IIRC.
But in either case, caution is needed as it is very easy to do worse than just letting the container manage its own size. You are right and Cherno should be more cautious recommending manually handling the vector's size. That works best if you know you want exactly N elements and will never change it, but in most cases it is better to just let vector handle it, at least until you have a performance profile showing that it is suboptimal.
@@oracleoftroy That's why I wrote my own vector, I can control the type of growth in compile-time, for a vector that will deal with blocks or buckets, growing exactly is more sensible because of the size of the chunks...
I'm pretty sure the push back function is an amortized constant so preallocating just halves the copies. Misusing the resize can also make it go from a constant to O(n) insertion.
Sorry I don't understand what you're saying. If I'm inserting n elements into a vector I don't understand how it could possibly be done faster than O(n)
@@Kurushimi1729 I was talking about complexity per insertion
@@Kurushimi1729 Meaning: inserting n elements normally is O(n) amortised. Misusing resize *can* make it O(n²)
@@xugro ah I got it thanks
A std::array is not a general replacement for std::vector. If you return a std::array from a function by value, all its elements will be copied. Returning a vector by value is a lot cheaper. (Sure, in most cases we have RVO.) Another important point, though, is that stack space is limited. Your program might work for small test examples, but will crash if you suddenly use it with larger std::arrays. This does not happen with std::vector. In certain cases it might be a lot smarter to use a vector to future proof your program for larger inputs.
I’m also not sure about performance differences in accessing arrays or vectors. This might be true if you only store 5 ints, but for larger sizes the overhead of the initial indirection is negligible. Caches don’t play much of a role for performance comparisons if you iterate over a hundred structs or more. You should only make sure that you don’t continuously resize a vector by calling push_back (like you have mentioned). But don’t initialize the vector with a size or resize it. (Almost) always use reserve() instead, because default-constructing objects (as opposed to ints, which are merely zeroed) is a performance killer. If you always use reserve() you don’t have to think twice.
And finally, only use emplace_back if you want to construct the object in place. Otherwise the general consensus is to use push_back to avoid nasty errors. If you move an already existing object into the vector with push_back it is not slower than emplace_back. But, it is safer and you actually want to push back an object in this case and cannot emplace it.
15:00 The question of reserve or resize is basically: Is the element cheap to default-construct and to copy-assign? Yes? Then use resize. Otherwise use push_back. The difference can be massive, as every push_back requires checking and setting the capacity, plus writing the data. A copy-assignment just writes the data. The default-initialization for trivial data types is cheaper than a capacity check-and-decrement. If the size won’t increase, use a std::unique_ptr and write in all the elements. A little less overhead than std::vector. In C++20, there is std::make_unique_for_overwrite to create a unique_ptr with uninitialized elements.
My ex was an STD vector.
She spread them all over the town!
will always love this one
lmao
Thank you so much for making this video! It’s super helpful for someone like myself who’s self-taught and doesn’t have a good grasp of the inner workings. I really appreciate how you explain the thought process, the different approaches, and show how you can dig in deeper to verify for yourself. You’re a great teacher!
BTW, in the meantime the owner (Adam) of this Tetris project was kind enough to accept my pull request, made from a fork of his project, in which I explained these issues to him. This means that the latest version of the discussed code no longer repeatedly copies the vectors. The data is now stored once and accessed through a *const reference* that refers to the cached data when needed. Hence the code now performs lazy allocation of the resources upon the first access and stores (caches) them *at a single point*. BTW, it's not only the vector of colors that is managed within a std::vector.
You mentioning the code review episode which inspired this vid reminds me of a teacher looking at your paper during an exam, then reminding the whole class not to make some very specific mistake 😅
For the use of std::array, it's good to mention also that the size needs to be known at COMPILE TIME, otherwise it's not useful.
Good for beginners, but here’s a few comments.
The C++ standard library is no longer called STL, it was years ago but now it’s just “the standard library”.
std::vector is one of the very few standard collections which is OK even for performance-critical stuff. The only few times I replaced it recently were when I wanted to bypass the malloc/free C heap, i.e. I knew my data was very large and I wanted page-aligned, zero-initialized memory directly from the OS kernel, i.e. VirtualAlloc or mmap.
Modern compilers are smart enough to eliminate temporaries caused by push_back. For simple elements like your example they often automatically inline everything at least in release builds, compiling push/emplace into equivalent machine code.
this is so interesting! I knew all this in theory, but all of these look to me, eyeballing it, like the compiler should be able to get rid of the extraneous allocations on its own. I've been told before things like "portability isn't the only reason to pick C over assembly - you're also probably not cleverer than the compiler!" but I had no idea that this doesn't actually even begin to translate to the jump from C to C++, not at all!
Correction: stack memory is not faster than heap. It is literally the same memory. It is only the allocation step that is slower for heap ( in fact it is undetermined ). The reason why stack might be faster in some occasions is the fact it is pre-allocated and hard to cache miss.
in order to access stack data the cpu will use the value in a stack register and an offset to get the address. In order to access a heap data the cpu will use the value in stack register and an offset to get the address to a pointer, and dereference that to get the address. After getting the base address to something like an array then yes, stack and heap memory are just as fast, but getting the address of the data is one more step for heap.
@@shiinondogewalker2809 True but irrelevant IMO. 1-2-3 access for register versus million access for RAM via segments.
@@petarpetrov3591 for one big task like that, sure. If you have many small lookups it's the other way around. Consider it irrelevant or not it's wrong to say it's only the allocation step that's different. I can agree that it's mostly irrelevant though
Stack and heap memory are not necessarily the same memory. In an embedded environment, memory may be implemented in separate fast and slow physical memory devices with the stack and heap configured to live in either one.
I think it is also worth mentioning that if we use a vector that may be resized, it is good to mark the move constructor of the objects the vector stores as noexcept
In the Tetris code, when it iterates over the vector it is also making a copy of each Position. He could avoid that by changing that for loop to "for (const Position &item : tiles)"
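A quick sketch of the difference between the two loops. The `Position` struct here is a guess with made-up fields (`row`, `column`) plus a copy counter; it is not the actual type from the Tetris project.

```cpp
#include <vector>

struct Position {
    static int copies;
    int row;
    int column;
    Position(int r, int c) : row(r), column(c) {}
    Position(const Position& other) : row(other.row), column(other.column) {
        ++copies;  // count every copy for the comparison below
    }
};
int Position::copies = 0;

// Iterates over four tiles either by value or by const reference and
// returns how many Position copies the loop itself made.
int copies_during_iteration(bool by_value) {
    std::vector<Position> tiles;
    tiles.reserve(4);
    for (int i = 0; i < 4; ++i) {
        tiles.emplace_back(i, i);  // constructed in place, no copies
    }

    Position::copies = 0;
    int sum = 0;
    if (by_value) {
        for (Position item : tiles) {  // copies each element
            sum += item.row;
        }
    } else {
        for (const Position& item : tiles) {  // no copies
            sum += item.row;
        }
    }
    (void)sum;
    return Position::copies;
}
```

For a struct of two ints the copy is cheap either way, so this matters more as a habit than as a measurable win in this particular case.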
Great video! I've known about this for a while, and it's surprising how many programmers overlook the importance of understanding how many copies and movements are happening in deep memory when using std::vector. You absolutely nailed it in explaining why this matters. It's not just about using the right tools, but knowing how they work under the hood. Thanks for shedding light on this important topic!
Thanks, this is a good reminder on letting emplace_back do the construction, vs calling std::move(data)
Since you touched the subject, it would be nice to make videos about the inline vectors of some libraries and the pmr::vector of the standard library
You're a natural teacher. And your editing is spot on too: Short and quick. Love it! Fantastic video.
There is nothing wrong with returning a vector from a function. Move semantics or the compiler's return value optimisation avoids copying the returned value. Also, std::vector (like some other containers) has a "reserve" function to avoid multiple dynamic allocations/copies in a container. But in general I agree with you that C++ has a big disadvantage in comparison to C: it requires paying much more attention to the performance of the application.
This is why I made arrays first-class objects in my language. If you use a question mark as part of the array size when declaring the array, then it's dynamic and on the heap, otherwise it'll be static and on the stack. For most operations the compiler attempts to optimize usage of dynamic arrays as views, like I do for strings. In fact, the default string type is a string view in my language. Of course, someone will invariably wonder how you control the allocator when these types are built-in, and for that you can use the template syntax to substitute your own allocator.
I understand that "on the stack" and "on the heap" is mostly true, but being as pedantic as I am, I would say that `std::array` is "inline", and `std::vector` is an "owning pointer to a heap allocation".
keep in mind you dont always want to stack allocate arrays, especially if they are large. you only get 1mb of stack on windows and 8mb on linux, which is fine for small stuff but past a point you want to keep it fairly freed up
I think there is an important idea here: we often add elements to vectors, but sometimes we need to remove them. If we don't have a LIFO or something similar, we sometimes need to remove elements from the middle of a vector. If the order does not matter, the correct way to remove an element is to swap it with the last element and THEN remove the last element, so it is constant time and not linear. Actually, I think this kind of algorithmic optimization is more important than machine-level optimization, which is a super complex topic in the end.
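The removal trick reads like this in code. `unordered_erase` is a made-up helper name, and it move-assigns the last element over the removed one rather than doing a full swap, which has the same effect for this purpose.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Removes v[index] in constant time by overwriting it with the last
// element. Does NOT preserve order. Precondition: index < v.size().
template <typename T>
void unordered_erase(std::vector<T>& v, std::size_t index) {
    if (index + 1 != v.size()) {
        v[index] = std::move(v.back());  // skip self-move for the last element
    }
    v.pop_back();
}
```

For example, erasing index 1 from {10, 20, 30, 40} leaves {10, 40, 30}, whereas `v.erase(v.begin() + 1)` would keep the order but shift every later element.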
Thanks Cherno, please teach custom allocator, arena allocator next from scratch.
Also speak about template specialization using std forward and more about templated classes
Thanks a ton in advance
In C++, "struct is class" although trying to use the compiler option -fno-rtti that is not good for the enormous classes taxonomy. For cleaner design, v.push_back(Data(i)) may not give same performance as v.push_back(i), so that the former idea maybe unoptimized.
reserve(N) doesn't grow to N, but to _at least_ N. It's an important distinction as capacity() will usually be higher than what you just reserved.
There are cases where push_back is more suitable and the distinction is quite simple: if you have an object of type Data already constructed, use push_back (preferably with std::move), and if you're constructing the Data object as you're inserting it into the vector use emplace_back.
If you're using emplace_back for objects that were already constructed you're unnecessarily calling the copy/move constructor.
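To put numbers on that rule of thumb, here is a sketch with an instrumented `Data` type. The struct and the `tally_for` helper are illustrative only, not the code from the video.

```cpp
#include <utility>
#include <vector>

struct Data {
    static int constructions, copies, moves;
    int value;
    explicit Data(int v) : value(v) { ++constructions; }
    Data(const Data& other) : value(other.value) { ++copies; }
    Data(Data&& other) noexcept : value(other.value) { ++moves; }
};
int Data::constructions = 0;
int Data::copies = 0;
int Data::moves = 0;

struct Tally {
    int constructions;
    int copies;
    int moves;
};

enum class Insert { PushTemporary, EmplaceArgs, PushMovedExisting };

// Performs one insertion of the requested kind and reports what it cost.
Tally tally_for(Insert how) {
    std::vector<Data> v;
    v.reserve(1);      // rule out reallocation noise
    Data existing(7);  // an already-constructed object
    Data::constructions = Data::copies = Data::moves = 0;  // count from here

    switch (how) {
        case Insert::PushTemporary:
            v.push_back(Data(1));  // builds a temporary, then moves it in
            break;
        case Insert::EmplaceArgs:
            v.emplace_back(1);     // builds the element in place
            break;
        case Insert::PushMovedExisting:
            v.push_back(std::move(existing));  // just one move
            break;
    }
    return Tally{Data::constructions, Data::copies, Data::moves};
}
```

With this setup, pushing a temporary costs one construction plus one move, emplacing from arguments costs one construction, and pushing an existing object with std::move costs one move.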
And here I am, basking in the C# bliss of using List all the time with no idea of how it affects performance lol
What about huge data structures arrays? Say you need a million elements, should you use the stack or the heap? Should you use , or just an array? I'm aware the stack has a fixed size and it's not that big, but I'm also aware you can change the size of the stack so... Which one to use?
Fun fact - allocating more than you asked for also happens in other languages. I remember asking on SO why a supposedly empty dictionary has such a large size when I first discovered this.
Moving amounts to a shallow struct copy, and happens when the source operand is known to be expiring.
Make a habit of typing `auto const &` any time you're "cloning" something on the right of the `=`, and when you realize "oh, I need to change (mutate) this thing", you just have to decide if you need a copy (remove the `const &`) or change the original (remove just the `const`). Start strict and loosen only when necessary.
Likewise, take broad types as parameters, but return as specific a type as is reasonable. It's good practice in general, but when you do reach the point you realize it's a useful generic abstraction, often all you need to do is replace the type with the template type in the function specification -- at least assuming you've been smart about using `auto` in your declarations like I mentioned above. With modern types like std::span (or gsl::span) and std::string_view that can even take non-hierarchical types, C++ is supporting even better API designs.
Some of us are super old school and were around when STL was the "new kid on the block". It wasn't part of the C++ standard until 1998. These were days when you had a dog eared copy of Knuth's algorithms book (TAOCP) on your desk for quick reference ;)
Anyway, the reason one is heap based and the other stack is pretty simple. The std::array is basically just a decorator for an array with some sugar thrown on like iterators and range checking. A std::vector, on the other side, basically does a malloc to create a buffer. It will then exponentially grow that buffer as it needs to.
Also, heap has always been slower than stack. Unless something has changed. In the world of x64 it does require a few extra instructions, but heap memory is not contiguous, so there is overhead with the memory manager (which I also wrote low level back in the day using Borland C++ and Assembler :P ). Stack is contiguous. In terms of complexity they are both O(1) for access, but the std::vector will add overhead for insertion. If it needs to resize that operation will take O(N), so in the long run std::vector is going to be slower.
It's really not something you would typically optimize for - remember Knuth's warning: Premature optimization is the root of all evil. Use them for what they are used for. If you have a fixed buffer and you know it will never change, use a std::array, if you need a dynamic buffer use std::vector. These are patterns that should be familiar to all developers regardless of what frameworks or libraries they are using. Just saying. Now let me get off my Gen X soap box :P
What a clever and simple way to find these "leaks" in the code. I will definitely try to use this.
💡One thing that could have been highlighted (though you have briefly mentioned it), is that moves are not free. When a move is done on a temporary, both objects need to be created first, before the temporary is moved (and also destroyed).
This is precisely why emplace_back() which forwards the arguments is good 👍
I tried to write the same code in CLion, and I'm glad the Clang-Tidy linter highlighted these issues for me: it recommended using `emplace_back` instead of `push_back` and flagged the unnecessary temporary object when I tried `emplace_back(Data(i))`. Not to mention the `const T&` stuff.
Push back can actually be better when you have a vector of some small data type (below 8 bytes on 64-bit platforms, and below 4 bytes on 32-bit platforms)
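For anyone who wants to see the difference in numbers, here is a small sketch that counts constructions. `Tracked` and the three helper functions are hypothetical names, not the code from the video, and `static inline` members assume C++17.

```cpp
#include <cassert>
#include <vector>

// Hypothetical type that counts how its instances come into existence.
struct Tracked {
    static inline int constructed = 0;
    static inline int copied = 0;
    static inline int moved = 0;
    int value;

    explicit Tracked(int v) : value(v) { ++constructed; }
    Tracked(const Tracked& other) : value(other.value) { ++copied; }
    Tracked(Tracked&& other) noexcept : value(other.value) { ++moved; }

    static void reset() { constructed = copied = moved = 0; }
};

struct Counts {
    int constructed;
    int copied;
    int moved;
};

// push_back with a temporary: construct the temporary, then move it in.
Counts via_push_back_temporary() {
    Tracked::reset();
    std::vector<Tracked> v;
    v.reserve(3);  // rule out reallocation so only the insertions are counted
    for (int i = 0; i < 3; ++i) v.push_back(Tracked(i));
    return {Tracked::constructed, Tracked::copied, Tracked::moved};
}

// emplace_back with constructor arguments: construct directly in the vector.
Counts via_emplace_back_args() {
    Tracked::reset();
    std::vector<Tracked> v;
    v.reserve(3);
    for (int i = 0; i < 3; ++i) v.emplace_back(i);
    return {Tracked::constructed, Tracked::copied, Tracked::moved};
}

// emplace_back with a temporary: same work as push_back, nothing is saved.
Counts via_emplace_back_temporary() {
    Tracked::reset();
    std::vector<Tracked> v;
    v.reserve(3);
    for (int i = 0; i < 3; ++i) v.emplace_back(Tracked(i));
    return {Tracked::constructed, Tracked::copied, Tracked::moved};
}

void emplace_demo() {
    assert(via_push_back_temporary().moved == 3);
    assert(via_emplace_back_args().moved == 0);
    assert(via_emplace_back_temporary().moved == 3);
}
```

Only the call that forwards the constructor arguments skips the move, which is what the linter warning is about.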
ESPECIALLY for built-in types like int, size_t, and other numerical types, because those have special optimizations around them in compilers when copying them, rather than the reference to it
This is not ALWAYS true, but 99% of the time, copying a number is better than copying a reference/pointer to it, especially when it's smaller than 8/4 bytes (again, depending on the platform)
PS: these optimizations don't really matter unless you're gonna move around millions or billions of numbers per second.
So for the example of a function which returns a vector, I'm of the opinion that if you are intending to return a dynamically sized chunk of memory, it is better to use the heap. If you want to use the stack, you're stuck between two options, copying the structure up the stack in the return line of the function, or requiring that the caller knows how much memory you're going to need and passing in a reference. Having to allocate and deallocate space in the heap is not that big of a performance hit if done correctly. The blanket advice of "avoid heap" isn't nuanced enough in my opinion. What is more important is your emphasis that people understand how these data structures work. But of course, let's not forget that the compiler can often do the work for you and writing maintainable code is arguably more important than optimized code in those examples.
You missed talking about amortized constant time for push_back despite all the allocations. I think that's important to cover because it explains why vector chooses to increase the size by a multiplicative factor each time.
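A quick way to see the amortized behaviour is to count how often the capacity changes while pushing. This is only a sketch, `count_reallocations` is a made-up name, and the exact counts depend on the standard library's growth factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many push_back calls changed the capacity, i.e. reallocated.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t before = v.capacity();
        v.push_back(static_cast<int>(i));
        if (v.capacity() != before) ++reallocations;
    }
    return reallocations;
}

void growth_demo() {
    // With geometric growth the count rises with log(n), not with n:
    // a thousand times more elements costs only a handful more reallocations.
    assert(count_reallocations(1000) < 50);
    assert(count_reallocations(1000000) < 100);
}
```

With a growth factor g > 1, the element moves over all reallocations add up to a geometric series bounded by roughly n · g / (g − 1), so the average cost per push_back stays constant.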
While I believe it's good to show the differences, I think the comparison isn't entirely fair.
By having the copy and move constructor modify global state, you're introducing side effects, which effectively prevents copy/move elision from happening.
Remove those side effects and the copies/moves created by push_back(Data(i)) and emplace_back(Data(i)) will actually be elided.
Before C++20 emplace_back didn't work with aggregate initialization, so you had no choice but to have it invoke the move constructor: emplace_back(Aggregate{x})
But, when you start doing that, push_back and emplace_back pretty much do the same thing.
One other thing worth noting about emplace_back is that it returns a reference to the constructed element since C++17, which may be useful in certain situations.
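A minimal sketch of using that returned reference, assuming C++17. `Player` and `add_and_boost` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type.
struct Player {
    std::string name;
    int score;
    Player(std::string n, int s) : name(std::move(n)), score(s) {}
};

// Since C++17 emplace_back returns a reference to the new element,
// so there is no need for a follow-up call to back().
int add_and_boost(std::vector<Player>& players) {
    Player& added = players.emplace_back("hypothetical", 10);
    added.score += 5;  // modifies the element stored in the vector
    return players.back().score;
}

void emplace_reference_demo() {
    std::vector<Player> players;
    assert(add_and_boost(players) == 15);
}
```

The reference is only valid until the next reallocation, so it is best used right away.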
for fixed-size arrays, the C Arrays (with type[size]) are usually faster than the std::array because they have less overhead
One of the things that I think that wasn't mentioned between the uses of std::array, std::vector isn't just in knowing how many items you'll need or have within your containers, but also the lifetimes of those objects.
Great show and tell of how something as simple as std::vector isn't the silver bullet of data collections that a lot of people think it is, unless thought and planning are applied first.
This video is A++.
Seriously. This will save a lot of time and headache for a lot of people
The title, content and timing of release. I was just in the middle of going back and benchmarking + studying the different containers. Thanks for this fr because emplace_back is so important lol
It's c++ actually
Headache? No. This is functioning code and it will work. It’s just a lack of awareness of the language and performance. It’s usually a problem of discipline and knowledge.
BTW, in production, those issues will usually be caught by a static analyzer.
@@malekith6522 Not sure what you're arguing about with the headache part but the point of the comment is that the video is helpful lol pretty simple.
@@michaeljackson1147 For me, headaches are synonymous with bugs. Some mistake that breaks something… therefore, I said it’s not helpful in preventing a headache. But anyway, it is a very useful video.
@@malekith6522 Ah, but couldn't you say lack of language awareness and performance can contribute to "headaches", regardless? lol I see what you mean though really. Furthermore, bugs can be included in the entire knack, in regards to discipline and knowledge. Just picking back at you though at this point :P
Interesting to see your take on the STL. I know that Chromium, which I'd say falls in the category of performance-sensitive real-time applications, actually uses as much of it as possible. They have their own abseil library for cases where STL implementations are lackluster, but they generally try to avoid homegrown solutions.
If performance of stack is desired, one can consider using arena. Arena is like user defined additional stack
without knowing much about arena, it can't reach identical performance of stack right since the CPU has registers specifically for the stack pointers. You'd have to put those in memory or something instead I imagine to emulate it with Arena
@@shiinondogewalker2809 Generic memory allocation may require jumping through many hoops, several thousand code statements: branching, failed branch prediction, merging memory holes. An arena, even if not backed by registers, saves some of that trouble.
Once array has been allocated, its address can go to local variable, and local variable can be mapped to a register. Additional register for every array allocated on arena, and if CPU is out of registers, then local variables on stack for storing pointers to arrays in arena.
5:35 Hoppla! 😳 Why is std::array stored on the *stack*? It uses storage that *depends* on the context/location it lives in! For example: if you use it within a local (stack-managed) scope, then you're right. But if one uses it globally, then it *does not* use the stack but the global space, which is usually the bss section of the program. If a std::array is part of a class or struct and one creates an instance by using new, then the contained array is also part of this memory, which is the heap! For std::vector you're totally right: this container *likely* uses the heap for its storage, because since it is dynamic it will (likely) employ new to allocate its storage space, and new means heap!
However, this depends on the implementation.
Funny side note: in my career I stumbled over a quite clever implementation that attempts to optimize small allocations by reserving a certain space for its storage as a fixed array (aka intrinsic storage). If this is not sufficient, it starts to use new/delete to expand this space upon larger storage demands. This design decision was clever for this (embedded) software, because the application code was designed not to exceed these limits to keep the performance high. Furthermore, embedded software should avoid dynamic allocation of memory due to the risk of memory fragmentation… But this is another story…
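The "storage depends on where the object lives" point can be checked with a small sketch. `lives_inside` and the other names are hypothetical, and the std::vector result reflects what mainstream implementations do rather than a hard guarantee.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// True when `p` points inside the bytes occupied by `object` itself.
template <typename T>
bool lives_inside(const T& object, const void* p) {
    const auto begin = reinterpret_cast<std::uintptr_t>(&object);
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= begin && addr < begin + sizeof(T);
}

// Static storage duration: neither stack nor heap.
std::array<int, 4> global_array{};

// A std::array created with make_unique lives on the heap, elements included.
bool array_elements_are_inline_on_heap() {
    auto heap_array = std::make_unique<std::array<int, 4>>();
    return lives_inside(*heap_array, heap_array->data());
}

// A std::vector object holds pointers; its elements sit in a separate buffer
// obtained from the allocator (the heap by default).
bool vector_elements_are_inline() {
    std::vector<int> v(4);
    return lives_inside(v, v.data());
}

void storage_demo() {
    std::array<int, 4> local_array{};  // automatic storage: the stack
    assert(lives_inside(local_array, local_array.data()));
    assert(lives_inside(global_array, global_array.data()));
    assert(array_elements_are_inline_on_heap());
    assert(!vector_elements_are_inline());
}
```

In every case the std::array elements are part of the object, wherever that object happens to be placed.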
I think he uses "stack" interchangeably with being allocated depending on location (which he should probably clarify, to be fair).
As for std::vector always using the heap, is this really a requirement? I.e. is it part of the spec? Genuine question, as I would assume it could possibly have a small "stack" allocation (more specifically, storage dependent on location, like an array) for small vectors, no? Something like std::string does, if I'm not mistaken.
@@lengors7327 Exactly! You got the point! Thanks for this valuable explanation! 👍👏👏👏
You should cover something like "How to approach to optimisation" or Optimisation to an existing codebase in general. (of course for beginners)
This video sort of confirmed my suspicions of a lot of what I see with Vector usage, e.g. misuse of the ->data() method (underlying pointer).
4:20, if array is the most used, I don't know. But std::vector is certainly more comfortable, and almost the same speed as array. Even nowadays, when clang seemed to achieve more performance for array, it's only ~5%, according to my benchmarks. Meaning a game (depending on vector performance) with 57 FPS would run at 60 with array. That also means vector is being deployed on stack - otherwise it would never reach this performance.
vector is also more comfortable to use because pushing things back makes the size grow proportionally, without the worry of a seg fault when reaching the stack's limits.
But of course, I always use vector::reserve 1st, to avoid new "allocations" at every push_back.
std::vector is also an array, I wasn’t talking about std::array specifically
@@TheCherno Are you saying that std::array is slower than C-array? I read somewhere that the standard granted same speed, by keeping its inner structure.
@@MrAbrazildo That's not what he's trying to say. He means "across all programming languages, the most used data structure is the one that stores a set of objects in contiguous memory". The name "Array" is the most common name for this data structure. Both std::vector and std::array are "Arrays" in this sense.
i just wrote an implementation of it, it was super fun (and it helped me fix a lot of bugs in my linked list and memory allocator implementations)
Thank you so much for this one ❤ I love your C++ tutors so much.
reserving memory beforehand can actually backfire sometimes
Yeah - you want to do that if you know the exact size or at least a rough order of magnitude.
@@ABaumstumpf not really
if you do that in a loop, it can make the growth of vectors linear instead of exponential
@@hpsmash77 "not really"
how?
"if you do that in a loop"
Ah - so when you do NOT know the size.
@@ABaumstumpf let's say a function takes a vector, and a collection of elements to be pushed to it
if the function reserves the number of elements beforehand, then calling that function in a loop will force the vector to grow linearly
if the function doesn't reserve, then calling that function in a loop will make the vector grow exponentially
in both cases the function knew how many elements needed to be pushed beforehand, but the function that reserves has exponentially more reallocations than the function that doesn't reserve
@@hpsmash77 "in both the cases the function knew how many elements needed to be pushed beforehand"
No it did not - obviously not.
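The scenario being argued about can be sketched like this. `append_batch` and `reallocations_over_calls` are hypothetical names, and the result for the reserving variant assumes an implementation where reserve allocates exactly the requested capacity, which the common standard libraries do but the standard does not require.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends `batch` to `out`. With `reserve_first` it
// reserves exactly the space this one call needs before pushing.
// Returns true if the call changed the capacity, i.e. reallocated.
bool append_batch(std::vector<int>& out, const std::vector<int>& batch,
                  bool reserve_first) {
    const std::size_t before = out.capacity();
    if (reserve_first) out.reserve(out.size() + batch.size());
    for (int x : batch) out.push_back(x);
    return out.capacity() != before;
}

// Calls append_batch repeatedly and counts the calls that reallocated.
std::size_t reallocations_over_calls(std::size_t calls, bool reserve_first) {
    const std::vector<int> batch{1, 2, 3, 4};
    std::vector<int> out;
    std::size_t count = 0;
    for (std::size_t i = 0; i < calls; ++i) {
        if (append_batch(out, batch, reserve_first)) ++count;
    }
    return count;
}

void reserve_in_loop_demo() {
    // Without reserve, geometric growth keeps the count logarithmic.
    assert(reallocations_over_calls(1000, false) < 50);
    // With an exact reserve per call, growth can degrade to one
    // reallocation per call.
    assert(reallocations_over_calls(1000, true) >=
           reallocations_over_calls(1000, false));
}
```

So each call knows its own batch size, but not the final size across all calls, and that final size is what reserve would actually need.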
As far as all this is true just wanted to add, don't blindly use emplaces instead of push or moving data. If you already have data copy/move it, move if it's no longer used in the scope you want it, copy otherwise.
Memory access doesn't know whether it's accessing stack or heap. So same performance.
You can also use a stack-backed arena allocator and plug it into std::vector with pmr.
Usually most of your videos go over my head as I only took like two C++ classes but I could understand this one and I really enjoyed it, 10/10 and also shared with a CS C++ graduate friend
Wow, I had never thought about adding capacity config for a dynamic array before. I kind of figured that this is kind of redundant, considering that this is a requirement of a static array, so you could just use a static array. But I see the value of automatic resizability after capacity config.
Step 0: Consider if the code you are writing is performance critical.
If it is not you can maybe sometimes prefer the simplicity of just using the vector 'wrong'. However often code that is not considered performance critical may become it later, so most of the time just aim to optimize early.
Slow code is bad code. This mentality is why software is slow. See Mike Acton’s talk “Data Oriented design and C++”
Premature optimization is the root of all evil.
It's good to understand the reliable baseline. Using std::array for fixed size, and emplace_back and reserve in std::vector, isn't premature optimization, it's having a good baseline approach.
Using std::list when you don't know it'll outperform a vector is bad because it's not a good baseline to use and is premature optimization for example.
@@sutsuj6437 Stop misquoting
@borealis75 you CANNOT know if this or that part of the code is or will be performance critical - and once you set it in stone, it is usually very hard, if impossible to undo - not to mention there simply is no manpower to do it. One should always write code which is at least OPTIMIZABLE and non-pessimized.
This whole 'dont optimize' mentality from the early 2000's needs to die already. Compute is cheap, but it is not free.
Amazing! Beyond the Tetris example, can you explain what this memory management change might look like in a larger application? Maybe not an air traffic control system but something where larger consequences can play out.
I am just trying to learn std::pmr (polymorphic memory resource). This gives more control over where the memory comes from and can help make code that uses std::pmr::vector (and other STL datatypes) more robust and faster.
This gives you one solution to store a vector inside stack memory that you allocated with std::array. I just haven't found a good explanation yet, and I'm currently very confused about it.
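In case it helps, here is a minimal sketch of that setup, assuming C++17 and a standard library that ships `<memory_resource>`. The function name, the buffer size and the element count are arbitrary choices for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <vector>

// Returns true if the vector's elements ended up inside the local buffer.
bool pmr_vector_uses_stack_buffer() {
    std::array<std::byte, 256> buffer{};  // lives on this function's stack

    // null_memory_resource as upstream: if the buffer runs out, allocation
    // throws std::bad_alloc instead of silently falling back to the heap.
    std::pmr::monotonic_buffer_resource arena(buffer.data(), buffer.size(),
                                              std::pmr::null_memory_resource());

    std::pmr::vector<int> numbers(&arena);
    numbers.reserve(16);  // one allocation, carved out of `buffer`
    for (int i = 0; i < 16; ++i) numbers.push_back(i);

    const auto first = reinterpret_cast<std::uintptr_t>(numbers.data());
    const auto begin = reinterpret_cast<std::uintptr_t>(buffer.data());
    return first >= begin && first < begin + buffer.size();
}

void pmr_demo() {
    assert(pmr_vector_uses_stack_buffer());
}
```

A monotonic resource never reuses freed blocks, so every reallocation of the vector eats fresh buffer space. Reserving once up front, as above, avoids that.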
This is a great video and explanation. With a hot take like the one you had I wasn't too sure it would be useful. Love to be surprised!
7:37 What stands out the most to me is not darta vs dayta, but the way you pronounce here... heeya.
For any task where size matters ( giggty ) - where vector sizes reach thousands of elements - the allocation cost incurred by the incremental push_back probably does not matter. Storing objects with a non-trivial copy semantics in a vector is asking for trouble either way - as a push-back ( or emplace ) that forces a reallocation will move / copy the content to a new location - incurring a pretty hefty cost. For games such usually means frame drops.
The main benefit of std::array is that it is 'constexpr' able, which may mean 0 allocations, 0 moves and 0 copies.
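A small sketch of the constexpr point, assuming C++17. `make_squares`, `squares` and `square_of` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Builds a table of squares; usable in constant expressions since C++17.
template <std::size_t N>
constexpr std::array<int, N> make_squares() {
    std::array<int, N> table{};
    for (std::size_t i = 0; i < N; ++i) {
        table[i] = static_cast<int>(i * i);
    }
    return table;
}

// Evaluated by the compiler: no run-time allocation, copy or move.
constexpr auto squares = make_squares<8>();
static_assert(squares[3] == 9, "table is computed at compile time");

// Run-time lookup into the precomputed table.
int square_of(std::size_t i) {
    assert(i < squares.size());
    return squares[i];
}
```

The static_assert only compiles because the whole table exists before the program runs.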
Say what you will about rust, but moving by default solves so many of these issues.
What would the std::array example look like at the end? Since it doesn't have the emplace_back function that std::vector has, would you be forced to use the struct's copy constructor?
Can you similarly talk about other STL features and how to write better code (performance)? One suggestion: when doing an optimization, talk about the tradeoffs rather than completely discarding an option. It would also be more engaging if you actually showed the runtime after showing the number of allocations, using counters, tracers and profilers offered by the kernel.
maybe a before/after benchmark would have been nice to show the impact
Very useful, I'll keep this in mind. I still like std::vector because it's convenient for clean code. I may think about vector.reserve or vector(presize) if I know the size.
EASTL is pretty great, but I do have one question regarding it you may or may not know the answer to: What is the correct way to override the new operator? They mention briefly that you need to and give an example but don't talk further about it.
The real solution should be to create a compile time array, with statically known size. That way, you can add more items to the array and still have even 0 stack allocations. The value will just be loaded into memory during load time.
Can we see some numbers on how much difference this stuff makes in program performance? How much faster does it run, how much memory do you save?
This is the best explanation of emplace right here.
I avoided this video after seeing it multiple times but I am glad I watched it. Thank you Cherno.
Will this also apply to std::vector or a struct containing std::string?
Yes, but string also has move()
The same behavior is for std::string.
A better implementation of a vector is in Qt. There the objects are copy-on-write. If you create a copy of the object, only a reference counter is incremented, so only a management structure of a few bytes is copied. If you modify one copy, the class creates a new separate copy of the data. So the management of the data there is better than in the STL.
The reason Qt had to do this was that when Qt was created there were no modern features like move constructors in C++. It was way before C++ 11 was released.
@@robertvetter1011 It's also possible to extend the normal vector with COW.
template <typename T>
class CowVector {
private:
    std::vector<T>* internalData;
    size_t* linkCounter;
};
When the class is constructed, it sets up internalData and initializes linkCounter to 1.
If linkCounter equals 1, you can read and write internalData.
If linkCounter is greater than 1, internalData is read-only. If you want to write, you create a new copy of internalData. The new structure then has linkCounter = 1 and is read-write.
The copy constructor has to copy both pointers and increment the value of linkCounter.
And the destructor has to decrement linkCounter. If linkCounter == 0, the destructor has to delete both pointers.
So you can theoretically extend the existing vector with COW without using C++11 features. Be careful: the example doesn't contain the public functions. Furthermore, my example isn't thread-safe.
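Here is a slightly fuller sketch of the same idea. It deliberately swaps the manual link counter for std::shared_ptr (so it does need C++11), and the public functions are hypothetical picks just to show the read and write paths.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Copy-on-write wrapper around std::vector. The shared_ptr reference count
// plays the role of the link counter. Not safe for concurrent writers, and a
// moved-from CowVector must not be used again.
template <typename T>
class CowVector {
public:
    CowVector() : data_(std::make_shared<std::vector<T>>()) {}

    // Read access never copies the elements.
    const T& at(std::size_t i) const { return data_->at(i); }
    std::size_t size() const { return data_->size(); }

    // Write access detaches from other owners first.
    void push_back(const T& value) {
        detach();
        data_->push_back(value);
    }
    void set(std::size_t i, const T& value) {
        detach();
        data_->at(i) = value;
    }

    bool shares_buffer_with(const CowVector& other) const {
        return data_ == other.data_;
    }

private:
    // If the buffer is shared, replace it with a private deep copy.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<T>>(*data_);
        }
    }

    std::shared_ptr<std::vector<T>> data_;
};

void cow_demo() {
    CowVector<int> original;
    original.push_back(1);
    CowVector<int> copy = original;  // cheap: only the pointer is copied
    assert(copy.shares_buffer_with(original));
    copy.set(0, 42);                 // first write triggers the deep copy
    assert(!copy.shares_buffer_with(original));
    assert(original.at(0) == 1);
}
```

The compiler-generated copy constructor and destructor handle the counting, which removes most of the manual bookkeeping described above.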
I was going to comment on the fact that the case made in this video actually demonstrated how good the vector is since the cost of allocation is amortized which is explained a bit later. Also, the copy and move constructor is irrelevant to the topic in using the vector since other data structures will behave similarly to how your customized data type set up copy and move constructors.
However, the process of analyzing allocations here is solid and in fact, lacking in most developers I've seen in other language users, so kudos.
In Rust the std vector initially starts with 4 elements and then grows to avoid the millions of cases of extra allocations for tiny arrays.
In c++ it doubles capacity when it needs to grow for the same reason.
@@just_smilez The point I'm making is that in the video the Cherno mentions that there are several allocations when moving from 0 to 1 to 2 to 3 to 4 elements - which are elided in Rust.
@@just_smilez it's times 1.5 actually, but close
@@just_smilez In msvc++ stl grows by 1.5x
@@lumek4513 this is only with msvc and clang tho, gcc doubles the capacity.
I like that video. There is one thing however that I don't get. The function that hooks into the allocation prints a message and increases the counter. But when executed it show way less messages in console than the counter. For example at 10:48 it shows 1 message printed and 5 allocations. I'd expect it to show 5 lines with "Allocated...". Is this just scrolled or what's going on?
Yes, very easy to forget that memory allocation is expensive and doing unneeded copies of course is expensive too (but very modern in functional coding style).
I love your takes and insight. Never gets old.
The reserve + push back pattern is quite a lot slower than resize + set for simple types. Ifs are slow, and there's an if to check for capacity. With resize + set you also get better compiler optimizations like loop unrolling that will utilize memory bandwidth better and the superscalar nature of the processor. I've seen a 5x difference.
If you do resize + set it might even generate a bit of SIMD.
It's really a no brainer. Imagine inlining the push back implementation, you get so much ugly code in the inner body of your loop. Keep that simple.
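For reference, the two patterns side by side as a minimal sketch. The function names are made up, and no timing is done here, so the speed difference is something to measure on your own compiler and data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pattern 1: reserve, then push_back. Each push_back still has to check
// the capacity before writing the element.
std::vector<int> fill_with_push_back(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i) * 2);
    }
    return v;
}

// Pattern 2: resize, then assign by index. resize value-initializes all n
// ints to 0 first; the loop body is then a plain store.
std::vector<int> fill_with_resize(std::size_t n) {
    std::vector<int> v;
    v.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<int>(i) * 2;
    }
    return v;
}

void fill_demo() {
    // Both produce the same contents; only the work per iteration differs.
    assert(fill_with_push_back(100) == fill_with_resize(100));
}
```

Note that the resize variant writes every element twice, which only pays off for cheap types like int.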
Even though it's already a few years old, your C++ playlist is still a gem!
I see the standard template library used a ton in a lot of performance and memory intensive projects, but it's only ever going to be a "standard library" meaning general purpose.
Well this was mindblowing thanks a lot
we need more videos like this !!!
I'm working in C (making my own embedded lang), and I've made my own array structure that works from the heap...
In that example with std::array, won't it generate copies or moves?
Just implement a copy constructor that fails if used and you'll no longer accidentally copy anything. I would prefer that you had to call explicit clone() or copy() whenever you actually need to copy something. Always move or give reference instead.
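A sketch of this idea, with one swap: instead of a copy constructor that fails when used, it deletes the copy constructor so an accidental copy is a compile error. `Buffer` and `clone_twice` are hypothetical names, and the unnamed static_assert assumes C++17.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical type: copying does not compile, cloning has to be spelled out.
class Buffer {
public:
    explicit Buffer(std::vector<int> values) : values_(std::move(values)) {}

    Buffer(const Buffer&) = delete;             // no accidental copies
    Buffer& operator=(const Buffer&) = delete;
    Buffer(Buffer&&) noexcept = default;        // moves stay cheap and allowed
    Buffer& operator=(Buffer&&) noexcept = default;

    // The only way to duplicate the data, visible at every call site.
    Buffer clone() const { return Buffer(values_); }

    const std::vector<int>& values() const { return values_; }

private:
    std::vector<int> values_;
};

static_assert(!std::is_copy_constructible_v<Buffer>);
static_assert(std::is_nothrow_move_constructible_v<Buffer>);

// Takes by reference, so nothing is copied unless clone() is called.
Buffer clone_twice(const Buffer& source) {
    Buffer first = source.clone();
    Buffer second = first.clone();
    assert(second.values() == source.values());
    return second;
}
```

A vector of `Buffer` still works, since std::vector only needs the move constructor for growth.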
Of course, when you take that to extreme, you should probably be using Rust instead of C++ already.
That's not going to happen. Rust will not replace C++. One of the biggest uses for C++ is games, and the instrumentation and support for game development in Rust is lightyears behind C++.