+Arun Prasath You should use a tree. An (a, b)-tree (with a big b if you're mainly inserting) would be good here. But if you have to implement it yourself and don't want to put a lot of time into it, just use a binary tree. Trees have insertion ∈ O(log(n)), where n is the number of elements you already inserted. Both of his stupid approaches are ∈ O(n) per insertion (much, much, much worse), with roughly a constant factor between them. Btw, he claimed you needed a doubly linked list to do the insertion in an arbitrary place, which of course is plain wrong.
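If you don't want to hand-roll a tree, here is a minimal sketch of the same idea using the standard library's balanced tree (std::multiset). The element count is arbitrary and this only shows the insertion half of the talk's experiment:

#include <cstdlib>
#include <iostream>
#include <set>

int main() {
    const int N = 100000;       // arbitrary element count, just for illustration
    std::multiset<int> s;       // balanced tree: O(log n) insertion in sorted position
    for (int i = 0; i < N; ++i)
        s.insert(std::rand());  // no element shifting, no linear pointer chase to find the spot
    std::cout << "size: " << s.size() << '\n';
}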
zerphase So you still have to look up the address of the next block? What happens when something is deleted? A better solution to your problem would be an unbounded array.
+Christoph Michelbach you shouldn't need to. It's a stack allocator. You can delete in the order things were placed on it, or roll back to the beginning of the stack. Personally, I'm used to doing game programming and keeping as close as possible to C for performance and compile times.
The linked list has to be the most overly hyped data structure in the history of computer science! Linked lists are WORTHLESS! You can do *everything* that you need to do by using a vector/array.
If we store very large objects that frequently get writes in the middle and need frequent re-sorting, then a linked list will perform better than a vector, because the vector has to physically move/copy the elements and the size of the objects becomes a constraint. This is not a problem in a linked list: to swap two large resource-owning objects we can simply relink the nodes :) unlike in a vector, where we need to keep them contiguous and hence need a costly copy/move operation.
Performance-wise, I can see how a vector can be better than a list. But I'm curious what the comparison is memory-wise. Does a vector require far more memory during resizing, or is there some crazy trick that the runtime library uses to shift data around and reallocate in place? The latter seems infeasible to me. I would imagine that if I had a 2KB vector inside 4KB of RAM and then tried to add one more element to the vector, I would get an out-of-memory exception.
Apoorv Kumar memmove() is for shuffling elements over on insert & delete, but when the allocation for the vector needs to grow, there's no way around the fact that you have to make a new allocation & copy (or move) elements from the old allocation to the new one, and memory usage will spike up during that operation. With a list, you don't have to do this re-size operation, but on most machines the list is going to use an extra 16 bytes per stored element (in addition to the per-allocation overhead which comes from the memory allocator). Quite often the stored elements are 8 bytes in size, so the list is at least 3x the allocated memory on a continuous basis, whereas the vector occasionally spikes up to 2x the memory requirement. So vector wins on memory usage in the common case of machine-word-sized elements, even when we're talking about maximum memory use as opposed to continuous. Another thing to mention: when you use a vector, whenever possible you want to call reserve() when you begin an operation that will add a number of elements (even if you only have a rough upper bound on that). Calling reserve() actually helps code readability, in addition to helping performance by reducing the number of re-size operations.
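A minimal sketch of that reserve() advice; the count here stands in for whatever rough upper bound you have, and the function name is made up:

#include <cstddef>
#include <vector>

std::vector<int> build(std::size_t approx_count) {
    std::vector<int> v;
    v.reserve(approx_count);                   // one allocation up front instead of repeated regrowth
    for (std::size_t i = 0; i < approx_count; ++i)
        v.push_back(static_cast<int>(i));      // no reallocation until capacity is exceeded
    return v;
}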
okay, I know I'm a little late here... but can someone tell me whether this really holds true for arrays too? i.e. will an array implementation be faster than a list?
Yes, but raw arrays cannot grow dynamically, so the whole point about insertions and removals is moot. Everything you do with a raw array is constant time, searching aside, which can be done exactly the same way as with a vector. A vector uses an array behind the scenes anyway; it's basically a resizable array.
Sure (I hope you're being sarcastic)! Even if you're serious about it, there's nothing wrong with Visual Basic (6 or .NET). I've done Visual Basic 6 in combination with C/C++ (writing the DLL), with VB6 calling those C declarations at a high level. There was not much difference in speed, since the binary DLL was doing the hard yards. My point is that you won't get C/C++-like performance from other languages except assembly, especially if there is a lot of floating-point data, e.g. some high-order differential equation.
If that were true, why are Google, Facebook, Microsoft, and other very large and successful companies investing so heavily in C++ (and, in fact, sending their employees to GoingNative 2012 to listen to Bjarne Stroustrup)?
The Windows kernel is definitely written in ANSI C with a fair bit of ASM code (I assume); parts of Windows could be written in Visual Basic, for all we know. Low-level drivers are usually written in assembly, some parts in C or even Java. C++ is being replaced quite fast (as an application dev language) by Java (Android, Windows, web dev), C# (Windows), Objective-C (if you're developing for Apple products), and a lot more languages in web dev. These are all object oriented and use linked lists.
But a cache miss is still a linear operation. The linked list algorithm should still not go exponential. Strictly speaking, a cache miss IS a linear memory access. The processor misses the cache so it accesses main memory. 50 to 100 times seems a little high unless you think it needs to access the page directory and page table every time. But disabling the cache outright should only cause a factor of four slowdown.
Video title should be: Why you should avoid **Doubly** Linked Lists Or sub-titled: Stroustrup learns how L1 Cache usage is critical for performance sensitive code. "What every programmer should know about memory" * www.akkadia.org/drepper/cpumemory.pdf
Julien Cossette I was simply stating the **facts:** Stroustrup was not aware of how modern cache usage affects performance. He is not the only one who lacks _pragmatic_ knowledge about Data-Oriented Design: * research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf This is OOP's biggest weakness -- it does not scale to high performance due to programmers thinking about the uncommon case (one) instead of the common case (many). There was no ad hominem insinuated or implied.
Sorry, I just perceived it that way. But you're right, programmers are getting worse and worse at understanding the real impact on the hardware of whatever they are coding.
They have good amortized performance. They take a performance penalty when they regrow, and usually use more space, since they regrow by a factor of 1.6 to 2: when they get full, they allocate a new region 1.6-2.0x bigger, copy all elements over, and free the old region. They do not have the "pointer chasing" problem because there's at most one pointer, which points to a contiguous region. That holds for almost all implementations of queues/stacks, since they use an array/vector underneath. If you use a linked list as the backbone of a regrowable stack/queue, then it will have the same problems pointed out in the video (too much pointer chasing while iterating the container).
Why not have a mixture of array and linked list by preallocating an array and when the array is full, allocate a new array and have a pointer to it? Could you get the best of both worlds?
Matthew Mitchell Regrowable stacks/queues/arrays work more or less like this: they allocate a chunk of heap memory and keep a pointer/reference to it. When it's full, they allocate a new, bigger chunk of heap memory, copy the contents of the old region into the new one, and point the pointer/reference at the new region. The linked list's theoretical advantage is in mid-sequence insertion/removal, because you iterate to the position and then just change 2 pointers to remove/insert. With contiguous memory (array/vector), to insert/remove an element at a middle position you iterate to it, but then have to shift several elements right/left: if you want to insert a new element at position 6, you have to shift right all elements from 6 until the end of the array/vector (that is, assuming you want to preserve order). The thing is, in practice the pointer-chasing penalty of linked lists destroys, in most cases, any advantage they have. There are several good data structures that perform better than both vectors and linked lists, depending on your use case. A heap, normally implemented using an array/vector, has O(log N) insertion/removal and can be used if you have logically ordered elements.
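For what it's worth, a tiny sketch of that array-backed heap: std::priority_queue sits on a std::vector by default, and the values here are just for illustration.

#include <cstdio>
#include <initializer_list>
#include <queue>

int main() {
    std::priority_queue<int> heap;   // max-heap stored in a contiguous vector
    for (int x : {5, 1, 9, 3})
        heap.push(x);                // O(log N) insert, no per-node allocation
    while (!heap.empty()) {
        std::printf("%d ", heap.top());
        heap.pop();                  // O(log N) removal of the current largest element
    }
    std::printf("\n");
}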
Wait... like what? Which languages are both higher level (or more convenient) and faster than C++? Also, you know that Google uses C/C++ as its main programming language, right? Until people actually start to seriously invest time into runtime optimization (which is kind of harder to do for C/C++/Java/C# than for languages like Python/Ruby), C/C++ will continue to be the standard for speed.
I find this really misleading and bad. His usage of the linked list has the same asymptotic complexity as the vector because of the linear search. A linked list's insert is better only in situations where you already have the iterator to the insertion place available. Additionally, neither of these data structures is the best for this task. Binary search trees would greatly outperform both, and they do not store data in contiguous memory. More here: github.com/det/random_insert
That code is supposed to give you credibility in this discussion? Your example is very contrived, cherry-picked even. Going that route, you can even prove Java to be faster than C++ ;)
This comment is missing the point. Stroustrup is talking about cache misses, not about saving a few bytes of pointers. Why is a cache miss problematic? Basically, every time you traverse to the next linked-list element, you kiss your cache goodbye. That's latency. Learn about the instruction pipeline, L1, L2, and hardware cache prefetching.
This is a really bad example. (For large values of N) neither vectors nor linked-lists are a good choice for the scenario he proposes. Not to mention that the curve he describes as "exponential" is certainly only quadratic.
No, I got the point. My problem is that a large fraction of the programmers who watch this video (or went to this talk) will think "If I have a problem that requires random access including insertions and removals in a sequence then Bjarne told me that a vector is the best data structure to use, so I'll use that!"
Levi Burton His spatial locality argument fails. Say I have his linked list of 100,000 elements. I insert in the middle of it. As he says, I have to traverse to the middle, risking a lot of cache misses. Conversely, using his vector of 100,000 elements & a binary search, I'll hit fewer than 20 elements scattered over the vector, getting a few cache misses. Then when I shove 50,000 elements over by 1, they'll *all* get cache misses. Either way, I'm looking at cache misses for (on average) 1/2 of the list for every insertion and deletion. And his vector requires a read *and* write for all those elements. In any case, his whole argument is predicated on random access to the list/vector, which is not what linked lists are best for. They're great for a queue, and vectors are horrible for that. In short, use the right tool for the job.
jursamaj Actually, the number of cache misses for the vector is more like 50,000/L, where L is the number of elements per cache line. Usually L is quite big, which is what makes the vector faster. But you're right; neither a linked list nor a vector is the right tool for this job.
The problem here is that Stroustrup completely ignores the problem/cost of memory allocation. Sure, you can push items around in an array really fast, as long as you can pre-allocate a big enough linear chunk of memory in advance to hold that array. But when you can't predict the size in advance and you find yourself needing to expand the array, it can get hugely expensive. The other problem is the assumption that linear access is the norm. I seldom need to do that; usually I want a specific list item via a pointer.
Uhh, preallocation is a solved problem. std::vector simply doubles the allocation size every time it runs out. And, yeah, it recopies everything - but this is still faster than whatever list does because it's a linear access (=the prefetcher does a great job) and it can use SSE operations (=16 bytes per instruction) and read and write on the same cycle.
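A rough way to watch that growth policy in action; note the exact growth factor is implementation-defined (commonly 1.5x or 2x), so the printed capacities are illustrative, not guaranteed:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_cap = 0;
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_cap) {   // a reallocation (and linear copy/move) just happened
            last_cap = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last_cap << '\n';
        }
    }
}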
As someone who uses Python a LOT: whenever he says "list", I'm thinking "array"... so an array is just as good as another array. Trolling aside, this is quite a nice talk.
This talk only explains why linked lists suck at handling random access when you have enough contiguous free memory to allocate an array of the right size. Random access memory is designed for fast random access using pointers, and the fastest general-purpose structure in that environment is the array. So anything that can be done with an array should be done with an array. All other structures (lists and trees) are performance-trading fallback solutions for situations where it's impossible to use an array, mainly when lots of allocations and deallocations have happened and memory is so heavily fragmented that you can't find a single contiguous free block large enough to hold your array.
Memory fragmentation is not really an issue anymore. The MMU can map the 4 KB pages in any order necessary (and allocators use fixed-size buckets for smaller allocations), so you can always build a contiguous block of virtual address space for your array (unless you're doing no-MMU embedded stuff, or you're doing 32-bit builds instead of 64-bit and you exhaust the address space).
Pointers => cache misses, so you get bad performance traversing your objects anyway. And if you're dealing with large data structures, you aren't doing pure number crunching anyway.
@@coshvjicujmlqef6047 Unless you program in C and not C++. I program in C. My linked lists work just fine. If you're pushing and pulling from a stack type of vector, then you're using a type of linked list where you have to push and pull in order. No different.
Oh, I'm sorry, I can't hear you over my computer running Ubuntu that's written in C++, with a game engine I wrote in C++ that is not a text RPG, and the numerous programs I wrote with wxWidgets, which are written in C++.
This is only correct as long as you can keep vectors from growing. Like an array, expanding a vector is catastrophic and a bad idea. Random access simply violates the design of a linked list, so he would have done better to title his video "Never use random access operations (or random access iterators) on linked lists". In a scenario where the emphasis lies on inserting a lot of objects, e.g. to buffer them, rather than doing random access at specific positions, a linked list does much better. So, as always, it depends on what you intend to do with it. This is why I dislike absolute statements like "never do this" or "bad practice". They are almost always only true under certain conditions.
Nope. They grow; it takes very little time to allocate new memory and use it. They are *still* better in every way, even when you need to copy the data to new memory. Memory is great at copying linear data. So you go ahead and time that bad boy, and you'll find the linked list still sucks compared to a constantly growing vector (which only reallocates a logarithmic number of times, since it doubles in size each time it runs out of space).
"Imagine this is a graph"
He handled that well! I felt badly for him. Good talk.
Nevertheless, it would have been a much better talk if he hadn't accidentally deleted his slide.
I'd argue that by making us visualize the graph he cemented the ideas further: instead of just looking at the graph for a moment, we had to store it in our heads based on his description, and that increases retention.
Does anyone have the graph which he was trying to present? Just to confirm whether I got it right.
you can find it here : www.stroustrup.com/Software-for-infrastructure.pdf
lol, tbh though, most comp sci assignments aren't about real world applications. They're more there for you to learn how a computer works. The programming experience is a plus.
Love how I'm searching for videos trying to understand linked lists and here's a video of the creator himself saying not to use them... motivating lol
:))
Vishal Nishad sorry **creator of the language itself
I stumbled upon this same video, and I was wondering if I should share it on my class forum... slap in the face to the Professor :D.
noobenstein, you have the IQ of an average gerbil.
Linked lists were developed in 1955-1956 by Allen Newell, Cliff Shaw and Herbert A. Simon at RAND Corporation as the primary data structure for their Information Processing Language.
Optimizing cache accesses and memory traversal patterns is by far the most important optimization you can make (beyond, of course, choosing the lowest-complexity algorithm). And this is more true today than ever before. You should look into the latest research on "cache-oblivious data structures and algorithms". In my personal experience, reducing indirections, favoring memory locality, and making traversal patterns predictable will typically improve speed by 10x to 100x on a modern arch.
always allocate a 5 GB array, that way you don't have to use linked lists
What? Have you heard of vectors?
@@KP-md3oe ok
this is some big brain stuff right here
@GeklmintendontofAwesome what? How is it going to be used if it's not allocated?
Yes, preallocating all the memory that your program needs at once should be the default way of programming. There are exceptions, but they need to be proven first.
I have experienced this in real life while optimizing frame caches for video editing and tile sorting for tiled rendering. Switching to arrays made both operations immensely faster. For tile sorting, an operation that took half an hour was reduced to a few seconds.
Holy shit, that's something!
Vendicar Decarian What?
Linked lists are NOT good at linear access. Theoretically they should be fine. However, due to CPUs being orders of magnitude faster at processing memory that can reside in the cache and doesn't need to be fetched, linked lists fail to perform in a real-world environment. Unless you only ever insert into a linked list, it's probably never the right collection to use. I can't remember the last time I only ever randomly inserted into, but never read or iterated, data from a collection.
@noobenstein Exactly. I use them as intended and they're great.
and that's why you use the right tool for the right job. all data structures have pros and cons
One time, for a hashmap implementation (that I needed to be really fast, faster than the STL hashmap), I created a vector of linked lists as the buckets. HOWEVER, that was fairly slow to both create and delete and the data was spread out, so instead I packed all the linked lists into a single vector, where each element of the vector is a node of the linked list (KeyHash, Key, Element, Next).
And another vector, which keeps the beginning element for each bucket (this one is resized manually when the buckets become too full (>75%) and the other one is resized whenever it decides)
This sped up the creation/deletion of the hashmap by a lot and (probably) reduced the cache misses due to all the elements being stored in 2 vectors
(the largest that the hashmap got was around 150 million keys, the task was "determinization of finite automata", mathematically solving that task is of the order of 2^n, but i needed a fairly quick solution for a university project)
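A very rough sketch of that "all the list nodes packed into one vector" layout. The names, the fixed bucket count, and the types are made up for illustration; the original resized its bucket vector at >75% load instead of keeping it fixed:

#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Node {
    std::uint64_t hash;   // cached hash of the key
    std::string   key;
    int           value;
    std::int64_t  next;   // index of the next node in the same bucket, -1 = end of chain
};

struct PackedHashMap {
    std::vector<std::int64_t> buckets = std::vector<std::int64_t>(1024, -1); // head index per bucket
    std::vector<Node>         nodes;  // every chain lives in this one contiguous vector

    void insert(std::uint64_t h, std::string key, int value) {
        std::size_t b = h % buckets.size();
        nodes.push_back({h, std::move(key), value, buckets[b]});   // new node points at the old head
        buckets[b] = static_cast<std::int64_t>(nodes.size()) - 1;  // and becomes the new head
    }

    const Node* find(std::uint64_t h, const std::string& key) const {
        for (std::int64_t i = buckets[h % buckets.size()]; i != -1; i = nodes[i].next)
            if (nodes[i].hash == h && nodes[i].key == key)
                return &nodes[i];
        return nullptr;
    }
};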
Word graphs - I prefer them - I can choose my own colours
I just love how he explained that imaginary graph. 😂
When my C++ professor asks me why I didn't complete the project on linked lists,
I'll show him this video
You should still know how to create them though.
@@valizeth4073 stfu
@@valizeth4073 Only idiots try to learn how to create a linked list or red-black tree. Useless data structures; they should be avoided like the plague.
@@coshvjicujmlqef6047 You don't have a clue what you're talking about kid.
@@NeilRoy I have destroyed all your rants with rational arguments. You are a loser sorry.
People don't seem to understand that O(1) is not FASTER than O(n), it just SCALES better; there's a breaking point. Imagine a "hashcode as a service" rest call vs doing it in memory: it'd still scale better, but it'd take an awfully large 'n' to get there.
I see people missing this in PRs: streaming/looping over an array is often faster with small arrays than something like a `HashSet`. And in practice, it's really unlikely either makes a tangible difference in most use cases.
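A toy C++ analogue of that small-n point (the function names and the size 8 are made up; the crossover point depends entirely on the machine and the element count, so measure before believing either version is faster):

#include <algorithm>
#include <array>
#include <unordered_set>

// O(n) scan of a tiny contiguous array: the whole thing fits in a cache line or two.
bool in_small_array(const std::array<int, 8>& a, int x) {
    return std::find(a.begin(), a.end(), x) != a.end();
}

// O(1) amortized lookup, but it pays for hashing plus at least one pointer indirection.
bool in_hash_set(const std::unordered_set<int>& s, int x) {
    return s.count(x) != 0;
}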
And the lessons learned:
1. linked lists make shitty associative arrays.
2. if you need ordered associative array, just use the fucking thing.
Linked lists are great! Pointers are great too! When I got my first c++ compiler in the 1980s I said what should I write? My conclusion was to rewrite my c linked list code as a c++ linked list object. I have used that object uncountable times, but never once have I needed to keep a list of integers. Real life applications often involve managing data records of 1000 bytes or larger. Mr. Stroustrup was also correct in pointing out that a lot of linked lists aren't large. For example, a patient's chart does not typically exceed 1000 records. While I still try to program efficiently as if I only had 640K of memory, I don't worry about the overhead of pointers and their speed because today's computers are so fast and c++ is so efficient that everything done in memory takes an instant. It is displaying stuff, accessing files, and doing communications that slows things down.
>I don't worry about the overhead of pointers and their speed because today's computers are so fast and c++ is so efficient that everything done in memory takes an instant.
5 years later: software has become even slower and buggier than ever before.
@@youtubeenjoyer1743 Microsoft's original DOS was like 100K or less if you deleted utilities. Now Windows is hundreds of megabytes maybe even gigabytes. You have to wonder the level of inefficiency and kludge they used to produce so much code to do so little, with a lot of bugs. A lot of the code is just alternate proprietary approaches to lock you into Windows. The bugs are really a management and not software problem. Today's computers and the Internet are so fast that the only way you can be slow is to build in slow technology.
6:43 "True OO" = ArrayList in Java : )
It's not a bad example to illustrate a largely fictitious dilemma. But it's pitting two of about a dozen possible solutions against one another as if they were the only options.
But it does somewhat show that a lot of optimisation comes down to 'know your hardware'.
What are the things to avoid with linked lists?
- Apparently, having a system with a cache. (This is near universal today, but it was actually the exception, not the norm, in the 80's.)
- Having to search the list for specific entries.
- Having a trivial amount of data per node, such that the link pointers themselves dominate the storage requirements.
Obviously, searching is a very big issue in general.
Of course, in terms of teaching things, linked lists have several important relatives, most notably the tree.
A tree has a lot in common with a linked list on some level, but because the organisation is different, the result is also quite different.
Again though, know your problem space, know your hardware.
Do *not* pay attention to this advice if you are developing for CPU(s) without cache. The linked list was invented in 1955-1956, when no CPU had cache. Without CPU cache, a vector shoving each of its elements after the insertion/deletion point over by one location is very costly, and the random memory access itself of the traversal of the linked list incurs no performance hit versus incremental access.
I'm not even sure that CPUs without cache exist. But even if they did, 99% of CPUs have cache, so I'm not sure what your comment is supposed to mean.
@@ayaanqui Most CPU models have no cache. Almost all the CPU models I develop for have no cache.
@@RetroDawn false. The majority have cache.
@@RetroDawn I will have to disagree with you on that, most CPUs (at least modern ones not sure about older ones) have many cache layers. Maybe the ones you work on don't but the majority of them certainly have cache
@@ayaanqui Smaller microcontrollers are cacheless, as are many but not all embedded controllers. I use everything from 9S08 (8-bit), 9S12 (16-bit), PIC16, PIC18, ARM, PowerPC. Only the ARM and two of the three PowerPCs I use have caches. There are actually three timing constraints: one is the presence or absence of an instruction cache, another is the presence or absence of a data cache, and the other depends on the memory type. Smaller systems like embedded controllers that run in place (code is in flash) often have static RAM, so there is a small latency for page changes. Larger RAM has a precharge/rewrite cycle, and crossing a boundary incurs a very large delay, often 70ns or more.
Bjarne looks like my uncle lol. Interesting and informative video!
is your uncle also smart :) ?
This is from "GoingNative 2012 - Day 1 - C++11 style" at 0:45:00 if you want the full lecture. th-cam.com/video/m0H5bUPfwn8/w-d-xo.html
Just when I thought I'd found the Holy Grail by using linked lists and their child structures, this video popped up on the screen :) Most of the tutorial sites teaching beginners like me the basics of C++ apparently get this point wrong.
That missing chart was too much of a tease. I had to repeat his experiment. See the charts and the source code here. marketmovers.blogspot.com/2016/04/how-to-make-c-code-run-90x-faster.html
Sorry, I know what you mean. That popup comes from a 3rd party library, and sometimes it goes crazy. I'll ask our web people if they can tweak the settings on that window.
I'm glad someone on this thread has a functional brain and takes performance seriously. Also good work on providing the missing slide.
Thanks for providing this. I want to point out two things. Firstly, the formatting on your page is messing up the code. It's inserted angled quotes where there should just be normal quotes. I'm not complaining, I just figured you'd want to know. Secondly, after seeing this code, I'm kinda dumbfounded. This one's not on you, so please don't feel insulted. Randomly iterating through a linked list and deleting the node you land on? Of course performance is going to be exponentially worse than an array! That's not what people use lists for! lol... Lists (hopefully) aren't used in contexts where you need to randomly access data. They're used when you have a sequence of data that you want to iterate through, applying some function on all entries of the list, one after the other, in order, not randomly. Stroustrup should have known better. Maybe he's just an old man, and with one of his slides missing, he was flustered and forgot to mention this detail. I just hope he doesn't believe this is in any way proof that vectors are always better than lists. Certainly, they often are. On certain CPUs, maybe they even always are. Computerphile made a decent video comparing the two datastructures: th-cam.com/video/DyG9S9nAlUM/w-d-xo.html
This makes sense after seeing the video; all the branch predictors in a modern CPU will fail with random access. Good talk!
Did he ever post that graph somewhere?
in the full video he basically says Linked list should not be your default data structure.
I am just starting out in C, C++, and Python, and I hope to be this educated about programming in a few years.
Well, how are you now?
In fact, how are you by now?
When you get to the point where you are starting to worry about insertion, deletion, and search time, it's time to use a B+ tree, which by its very nature is indexed.
Of course use B+ tree. Linked lists and RB tree are harmful
"My graph disappeared!" Must be a really fast implementation of a data structure whatever it was. ;-)
they must have not been using linked lists
Shame the graph didn't show up. The slides must have been stored as a linked list.
Thought-provoking video and comments. Very edifying for me, still learning the ropes with data structures, but savvy enough with performance analysis & architecture to follow and judge the arguments (both in the video and in the comments) fairly & profitably. Especially worthwhile (for me at least) are a few sage comments on how stl gets stuff done.
This dude seems to know what he is talking about. He should create his own programming language or give courses on udemy
Fuck!!! Bro he created the c++ programming language 😂😂😅
@@samarthtandale9121 u can't get jokes can u?
@@Person-hb3dv Oh ! sometimes I guess ... lol 😅
So I'd just like to know if I understand this right, and I'd appreciate it if anyone would let me know if I got it wrong, but: arrays should be used mainly for structuring numbers that will soon be used in calculations, and a linked list should hold a list of elements, such as a list of arrays of numbers, arrays of strings, or some data in general, which you iterate through to do said calculations. Thanks in advance for the clarifications.
The linked list's advantage is, theoretically, in mid-sequence inserting/removing, because you iterate to the position and then, to remove/insert, you just change 2 pointers.
With contiguous memory (array/vector), to insert/remove an element at a middle position you iterate to it, but you then have to shift several elements right/left: if you want to insert a new element at position 6, you have to shift right all elements from 6 until the end of the array/vector. (That is, assuming you want to preserve order.)
The thing is, in practice the pointer-chasing penalty of linked lists destroys, in most cases, any advantage they have.
On a contiguous memory data structure (vector/array) the processor will be very fast iterating doing: "get value, get value, get value" while on a heavy pointer structure it will be a lot slower on iterating doing: "get pointer, follow pointer, get value, get pointer, follow pointer, get value, get pointer, follow pointer, get value".
This will be still worse because the processor has a small "super ultra fast memory" (the cache) where on the contiguous memory case it will do "get value, get value, HMM - I SEE! HE NEEDS THIS REGION OF MEMORY - LET ME MOVE IT TO THE CACHE!".
While on linked lists each value is scattered across memory, with pointers pointing to distant parts of memory, and the processor will be "OMG - I CAN'T MAKE SENSE OF THIS" and will not be capable of using the cache.
There's a great write-up about this, applied to games, here: gameprogrammingpatterns.com/data-locality.html
Both this video, and that write-up is saying: Careful when using pointer-heavy data structures (like linked-lists) because they might have very bad performance due to pointer chasing.
Thank you!
@@Aleh_Loup Wonderful summary, thanks Alessandro 🙏
Great summary
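If anyone wants to see that pointer-chasing difference for themselves, here's a minimal sketch; the container size, the function name, and the timing method are arbitrary, so build with optimizations and expect the gap to vary by machine:

#include <chrono>
#include <iostream>
#include <list>
#include <numeric>
#include <vector>

template <typename Container>
void timed_sum(const char* label, const Container& c) {
    auto t0 = std::chrono::steady_clock::now();
    long long sum = std::accumulate(c.begin(), c.end(), 0LL);  // plain forward traversal
    auto t1 = std::chrono::steady_clock::now();
    std::cout << label << ": "
              << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()
              << " us (sum = " << sum << ")\n";
}

int main() {
    const int N = 1000000;
    std::vector<int> v(N, 1);                 // contiguous: "get value, get value, get value"
    std::list<int>   l(v.begin(), v.end());   // scattered nodes: "follow pointer, get value" each step
    timed_sum("vector", v);
    timed_sum("list  ", l);
}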
The claim is deceptive. First it sounds like he is telling why you shouldn't use linked lists at all, but then he only really explains that you shouldn't use linked lists if you need to search random elements out of it. You know, there are situations where you only need to traverse through the list linearly without searching any particular element, because something has to be done to all of the elements, but along the way you might also want to remove some of the elements out of list smoothly.
Data locality and cache-friendliness also matters (a lot), which is the point of this video imho. When you access a position in an array, all values that sit in the same block are brought to the cache if they're not there already.
If using a vector, these are the next values you would be looking at next in a linear search, so all you get are cache hits. When using a linked list they probably aren't though, and this means cache fails = bad performance. So linear search in a vector will usually be much faster than linear search in a linked list, due to data compactness.
In his slide he says he didn't have to use linear search for the vector example (of course he didn't have to; random access would have been much faster), but he did, just to be fair when comparing performance with the linked list.
Thank you Mr. Stroustrup! I learned a lot from your lecture!
@jqbtube Ooh, stop crying.
Yeah, and I assume all Call of Duty games are written in the resource-hogging Java too? Hell no. Minecraft would be better had it been written in C++.
How about skip lists? Those seem like a reasonable compromise for fairly large data sets.
@@josephp.3341 It's a trade-off, of course, depending on data size etc. I myself prefer hash approaches: really don't like B-trees for two reasons. One, stuff has to get moved from place to place all the time, which is unnecessary data (& even IO) traffic. And two, the code is always a lot more complex, which leads to a much higher probability of bugs. I HATE complicated code...
then why not store pointers in a vector?
nice video. link for the full talk, anyone?
This is the keynote for Going Native 2012, the full video can be found here: th-cam.com/video/OB-bdWKwXsU/w-d-xo.html (the above part can be found at the 47 minute mark).
The Linux kernel is written in C and assembly. Applications can be written in whatever as long as they produce an executable binary.
Linked Lists. Why you should avoid Bjarne Stroustrup.
No linked list. Why we should avoid garbage programmers (who do not understand computer architecture) like you.
So when exactly should I use Linked list ?
Why are caches good at shifting elements? I couldn't find a good explanation for this via Google.
It has to do with temporal and spatial locality--if you are traversing sequential memory in a predictable pattern as in a vector's memory layout, the cache can prefetch large chunks of this memory and access it with little overhead. Meanwhile, a linked list operation involves following pointers to random locations with no locality. So Stroustrup is arguing that even if the forward shift for a middle-of-vector insertion or deletion may look bad from a time complexity standpoint, it's not that bad thanks to hardware optimizations that linked lists are less able to exploit.
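For context, a rough reconstruction of the kind of experiment discussed in the talk (not Stroustrup's actual code; N, the function names, and the random source are placeholders). Both versions deliberately use a linear search for the insertion point, which is the point of the comparison:

#include <cstdlib>
#include <list>
#include <vector>

// Insert n random integers into a vector, keeping it sorted.
void fill_sorted_vector(std::vector<int>& v, int n) {
    for (int i = 0; i < n; ++i) {
        int x = std::rand();
        auto it = v.begin();
        while (it != v.end() && *it < x) ++it;  // linear search over contiguous, prefetch-friendly memory
        v.insert(it, x);                        // shifts the tail, but the shift is one linear sweep
    }
}

// Same workload with a doubly linked list.
void fill_sorted_list(std::list<int>& l, int n) {
    for (int i = 0; i < n; ++i) {
        int x = std::rand();
        auto it = l.begin();
        while (it != l.end() && *it < x) ++it;  // linear search, roughly one cache miss per node
        l.insert(it, x);                        // O(1) pointer swap once the position is found
    }
}

int main() {
    std::vector<int> v;
    std::list<int> l;
    fill_sorted_vector(v, 10000);   // time these two with your favorite profiler
    fill_sorted_list(l, 10000);
}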
From this video I learned: you get a lot of salty comments from confused people when you misrepresent the content of your videos.
what do you mean?
If performance is your biggest concern, you say "hello, assembler." As a practical matter, I would expect both graphs to be linear (for inserting/deleting 1 element in a structure that has N) or quadratic (for inserting N elements starting from 0.) The algorithm on an array is dominated by moving over the elements (which comes to a linear operation.) And the algorithm on a linked list is dominated by the linear search (also a linear operation.)
What if you have complex heap allocated objects stored in the containers? How does it compare then?
Same, but both will be slower and the difference less significant. The vector would at least load all your pointers to the heap objects in fewer cache lines, so you get to the indirections faster.
I wish people would avoid linked lists, namely, Facebook.
I wanna know who gave advice to the creator of cpp to write more oop style...
The inventors of Simula; specifically, Kristen Nygaard. He was also part of the inspiration for the original C with Classes, which was Bjarne's project preceding C++.
Oh, these artificial examples with linked lists that only contain one integer. If you have a more typical data structure, the pointers are a much smaller percentage of the allocated memory. Also, the memory blocks in the heap are usually in large blocks, and are close together, so the caching is usually not that big of a factor. From a programming standpoint, having to shuffle things around in order to insert something in the middle is much more likely to have overruns because people won't test all the edge conditions. So, i am unconvinced by the arguments in this video.
For large types, the overhead is mitigated substantially, especially if they're cache-line aligned along with the list pointers. Also, serious uses of linked lists like in the Linux kernel back them with efficient allocators like free lists which tend to allocate the lion's share of nodes contiguously.
Also if you care about order, you adapt the list to a tree so finding the insertion/deletion point becomes a binary search. If you don't care you can usually use FIFO/LIFO.
It is true though that modern processors are astonishingly fast at sequential memory accesses because of cache pre-fetching and should be able to pipeline most array processing, so initial implementation ought be simple to save developer time.
unrolled linked lists :)
nice to hear him explain that 'true OO style' isn't a good thing, with all the extra pointer chasing killing modern computers
Surely, linked lists are better when you have big elements that need to be inserted and deleted often, right?
But then again, sometimes you need only a queue data structure, and what can be more efficient than a linked list for that particular purpose?
If we can make assumptions about maximum size then Ring buffer (array based) would be faster. Especially if it is power-of-2 sized, you only need a single AND operation to wrap the indices.
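For what it's worth, here's a minimal sketch of such a power-of-two ring buffer; the class name, capacity, and API are made up for illustration:

```cpp
#include <array>
#include <cstddef>
#include <cstdio>

// Fixed-capacity ring buffer. The capacity N must be a power of two so
// that "index & (N - 1)" replaces the usual "index % N" when wrapping.
template <typename T, std::size_t N>
class RingBuffer {
    static_assert(N != 0 && (N & (N - 1)) == 0, "capacity must be a power of two");
    std::array<T, N> data_{};
    std::size_t head_ = 0;   // next element to pop
    std::size_t tail_ = 0;   // next free slot

public:
    bool push(const T &v) {
        if (tail_ - head_ == N) return false;    // full
        data_[tail_++ & (N - 1)] = v;            // single AND to wrap the index
        return true;
    }
    bool pop(T &out) {
        if (head_ == tail_) return false;        // empty
        out = data_[head_++ & (N - 1)];
        return true;
    }
};

int main() {
    RingBuffer<int, 8> q;
    for (int i = 0; i < 5; ++i) q.push(i);
    int x;
    while (q.pop(x)) std::printf("%d ", x);      // prints: 0 1 2 3 4
    std::printf("\n");
}
```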
Linear search to insert on a linked list? Well, there's your issue. I sped something up going from a vector to a linked list. Insertion sort (even using a binary search) works until the data becomes too large. Then you use merge sort.
Title is incorrect. It should be "WHEN you should avoid linked lists".
The graph was just at the end of a dangling reference
i got you fam, have a like
I didn't fully watch the vid, but from Stack Overflow: In general, use vector when you don't care what type of sequential container you're using, but if you're doing many insertions or erasures to and from anywhere in the container other than the end, you're going to want to use list. Or if you need random access, then you're going to want vector, not list. Other than that, there are naturally instances where you're going to need one or the other based on your application, but in general, those are good guidelines.
In practice 99.9% of the time you want a vector. Random access is a way more common case than random insertion. And even the cases that need random insertion usually end up needing std::map anyways.
For small collections (up to 5-10 elements) lists can do better since their structure (to be initialized) is very simple. If your language of choice makes use of that often then linked lists are a way to go. Besides that, it all depends on application. If random access is important use vectors. If sequential access is a fav then use lists.
@@randomseed I would never use a dynamic data structure like a list for such a small amount of data.
Basically, Stroustrup is saying:
If you use Linked Lists with a Schlemiel the Painter algorithm then they suck.
Well, he is right about that. But the argument is a strawman. You don't have to use linked lists like that, and you shouldn't.
He fails to prove that Linked Lists are bad, he only proves that certain ways of using them are bad.
This is why you should avoid 360p videos
To those saying he chose a terrible example, remember his stated point of the example. Lots of people see this problem and think "this is a good use case for linked lists". That's it.
See my other reply, if you are interested in my response. And "my way of thinking" about hardware limitations seems to be the general trend of thought for the last 50 years - else why all the effort to make bigger, faster machines?
It's great to avoid indirection, sure. This is as true on my AMD64 as it was on my 8080A. But there are many tasks for which the indirection itself avoids many more cycles, such as inserting into an array as opposed to a list. Bjarne is ignoring the overall task.
Is this the guy behind C++... Man, I remember cheating on my exam just to write the founder of C++.
wait what
@@Lastrevio there was a question on his exam about who created C++. He didn't know it, so he cheated.
this video terminates before Bjarne finishes.
But then how can I even reverse one?
Sir, instead of a linked list, what should I use? May I use arrays? Anyone please explain.
+Arun Prasath Elements in a linked list are scattered through memory, and to access the nth element of a linked list you have to follow n pointers.
Elements in an array (or vector), on the other hand, sit contiguously in memory, so you don't need to keep following pointers; you can access them directly.
Thanks to cache memory, an array (or vector) is a lot more efficient: when you access one element, its neighbours get pulled into the cache along with it. This can't happen with linked lists, since the elements don't sit together in a contiguous region of memory.
Note that a linked list is different from the List abstract data type. A List is just an abstraction for a sequence of elements; you can implement it with a fixed-size array, an auto-growable array (a vector), or by linking the elements with pointers (a linked list).
Python's list, Java's ArrayList, and C++'s vector are all the same data structure: an auto-growable array, which is a great data structure.
Linked lists, on the other hand, have very niche uses (though they do have their uses) and most of the time are better replaced by arrays/vectors.
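A toy sketch of the difference just described; the struct and function names are made up for illustration:

```cpp
#include <cstddef>
#include <vector>

struct Node { int value; Node *next; };

// Linked list: reaching the nth element means following n pointers,
// each of which may land on a different cache line.
int nth_in_list(Node *head, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) head = head->next;
    return head->value;
}

// Vector/array: the nth element is at a computed address in one step,
// and its neighbours come along on the same cache lines.
int nth_in_vector(const std::vector<int> &v, std::size_t n) {
    return v[n];
}
```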
+Arun Prasath You should use a tree. An (a, b)-tree (with a big b if you're mainly inserting) would be good here. But if you have to implement it yourself and don't want to put a lot of time into it, just use a binary tree. Trees have insertion ∈ O(log(n)), where n is the number of elements you've already inserted. Both of his approaches are ∈ O(n) (much, much, much worse), with only a linear factor between them. Btw, he claimed you needed a doubly linked list to do the insertion in an arbitrary place, which of course is plain wrong.
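If you don't want to hand-roll a tree, a rough stand-in from the standard library is std::multiset, which is typically a balanced red-black tree rather than an (a, b)-tree; the point is the same: ordered O(log n) insertion with no element shifting and no linear search. A minimal sketch with arbitrary values:

```cpp
#include <cstdio>
#include <random>
#include <set>

int main() {
    std::mt19937 rng(1);
    std::uniform_int_distribution<int> dist(0, 1'000'000);
    std::multiset<int> s;                                     // keeps elements ordered
    for (int i = 0; i < 100000; ++i) s.insert(dist(rng));     // O(log n) per insert
    std::printf("size = %zu, min = %d\n", s.size(), *s.begin());
}
```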
+Christoph Michelbach just write a custom allocator that allocates contiguously with the linked list. A char * would work.
zerphase So you still have to look up the address of the next block? What happens when something is deleted? A better solution to your problem would be an unbounded array.
+Christoph Michelbach you shouldn't need to. It's a stack allocator. You can delete in the order things were placed on it, or roll back to the beginning of the stack.
Personally, I'm used to doing game programming and keeping as close as possible to C for performance and compile times.
The linked list has to be the most overly hyped data structure in the history of computer science! Linked lists are WORTHLESS! You can do *everything* that you need to do by using a vector/array.
Cool. Now implement a cactus stack using vectors/arrays.
Is the rest of this talk available somewhere?
He used a linked list for inserting that graph; that's why it disappeared!
If we store very large objects that frequently get writes in the middle and need frequent sorting, then a linked list will perform better than a vector, because in a vector we need to physically move/copy the elements, and the size of the objects becomes a constraint. This is not a problem in a linked list.
To swap two large resource-owning objects in a linked list we can simply relink them :) unlike in a vector, where they need to be stored contiguously and hence require a costly copy/move operation.
What about a vector of unique pointers that point to structs? It holds small amounts of data even if it has 1000 elements, so re-allocations are fast.
Performance-wise, I can see how a vector be better than a list. But I'm curious about what the comparison is memory-wise. Does a vector require far more memory during resizing, or is there some crazy trick that the runtime library uses to shift around data and reallocate in place? The latter seems unfeasible to me. I would imagine if I had a 2KB vector inside 4KB of ram, and then I tried to add 1 more element to vector, I would get an out of memory exception.
There is a neat trick for that. See: man7.org/linux/man-pages/man3/memmove.3.html . Gives you the illusion that you have sufficient temp store.
Apoorv Kumar memmove() is for shuffling elements over on insert & delete, but when the allocation for the vector needs to grow, there's no way around making a new allocation and copying (or moving) the elements from the old allocation to the new one, and memory usage will spike during that operation. With a list you don't have this re-size operation, but on most machines the list uses an extra 16 bytes per stored element (on top of the per-allocation overhead that comes from the memory allocator). Quite often the stored elements are 8 bytes in size, so the list needs at least 3x the allocated memory on a continuous basis, whereas the vector only occasionally spikes up to 2x the memory requirement. So vector wins on memory usage in the common case of machine-word-sized elements, even when we're talking about maximum memory use as opposed to continuous use. Another thing to mention: when you use a vector, whenever possible you want to call reserve() when you begin an operation that will add a number of elements (even if you only have a rough upper bound on that). Calling reserve() actually helps code readability, in addition to helping performance by reducing the number of re-size operations.
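A tiny sketch of that reserve() advice; the function name and count are made up for illustration:

```cpp
#include <cstddef>
#include <vector>

// If you have even a rough upper bound on how many elements you're about
// to append, reserving first avoids repeated reallocate-and-copy cycles.
std::vector<int> build(std::size_t expected_count) {
    std::vector<int> v;
    v.reserve(expected_count);              // one allocation up front
    for (std::size_t i = 0; i < expected_count; ++i)
        v.push_back(static_cast<int>(i));   // no reallocation along the way
    return v;
}
```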
vonkruel - Right. I answered out of the context of a vector. Thanks for the correction :)
Is there the entire talk somewhere?
Okay, I know I'm a little late here... but can someone tell me whether this really holds true for arrays too? i.e., will an array implementation be faster than a list?
Yes, but raw arrays cannot grow dynamically, so the whole point about insertions and removals is moot. Everything you do with a raw array is in constant time, searching aside, which can be done exactly the same as with a vector. A vector uses an array behind the scenes anyway; it's basically a resizable array.
@@casperes0912 Of course a linked list has that advantage. But at least now I can use arrays in competitive coding more often!? Thank you for replying though!
Sure (I hope you are being sarcastic)! Even if you are serious about it, there's nothing wrong with Visual Basic (6 or .NET).
I have done Visual Basic 6 in combination with C/C++ (making the DLL), where VB6 calls those C declarations at a high level. Not much difference in speed, since the binary DLL is doing the hard yards.
My point is you won't get C/C++-level performance from other languages except assembly, especially with huge floating-point workloads, e.g. some high-order differential equation.
If that were true, why are Google, Facebook, Microsoft, and other very large and successful companies investing so heavily in C++? (and in fact, sending their employees to GoingNative 2012 to listen to Bjarne Stroustrup?)
Did he ever post the missing slide?
The Windows kernel is definitely written in ANSI C with a fair bit of ASM code (I assume); parts of Windows could be written in Visual Basic, for all we know. Low-level drivers are usually written in assembly, some parts in C or even Java.
C++ is being replaced quite fast (as an application dev language) by Java (Android, Windows, web dev), C# (Windows), Objective-C (if you're developing for Apple products), and a lot more languages in web dev. These are all object-oriented and use linked lists.
Agree with you. But I think Linus Torvalds wrote the Linux kernel in C.
But a cache miss is still a linear operation. The linked list algorithm should still not go exponential.
Strictly speaking, a cache miss IS a linear memory access. The processor misses the cache so it accesses main memory. 50 to 100 times seems a little high unless you think it needs to access the page directory and page table every time. But disabling the cache outright should only cause a factor of four slowdown.
Video title should be: Why you should avoid **Doubly** Linked Lists
Or sub-titled: Stroustrup learns how L1 Cache usage is critical for performance sensitive code.
"What every programmer should know about memory"
* www.akkadia.org/drepper/cpumemory.pdf
There was no need for the veiled ad-hominems.
Nice link tho thx for that.
Julien Cossette I was simply stating the **facts:** Stroustrup was not aware of how modern cache usage affects performance. He is not the only one who lacks _pragmatic_ knowledge about Data-Oriented Design:
* research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf
This is OOP's biggest weakness -- it does not scale to high performance, because programmers end up thinking about the uncommon case (one) instead of the common case (many).
There was no ad hominem insinuated or implied.
Sorry, I just perceived it that way. But you're right, programmers are getting worse and worse at understanding the real impact on the hardware of whatever they are coding.
What about arbitrarily scalable queues and stacks?
They have good amortized performance. There's a performance penalty when they regrow, and they usually use more space, since they regrow by a factor of roughly 1.6 to 2; that is, when they get full they allocate a new region 1.6-2.0x bigger, copy all the elements over, and free the old region.
They do not have the "pointer chasing" problem because there's at most one pointer, which points to a contiguous region.
That is, on almost all implementations of queues/stacks, since they use an array/vector underneath. If you use a linked list as the backbone for a regrow-able stack/queue, then it will have the same problems pointed out in the video (too much pointer chasing while iterating the container).
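You can watch this regrowth happen with a few lines of C++; the exact growth factor you see depends on your standard library, and this demo is just an illustration, not something from the talk:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Push elements one at a time and print the capacity whenever it changes.
// Typical implementations grow by a factor of 1.5-2, so reallocations get
// exponentially rarer and appends are amortized O(1).
int main() {
    std::vector<int> v;
    std::size_t last_cap = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_cap) {
            last_cap = v.capacity();
            std::printf("size %4zu -> capacity %4zu\n", v.size(), last_cap);
        }
    }
}
```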
Why not have a mixture of array and linked list by preallocating an array and when the array is full, allocate a new array and have a pointer to it? Could you get the best of both worlds?
Matthew Mitchell regrow-able stacks/queues/arrays work more or less like this. They allocate an amount of heap memory, and have a pointer/reference to it. When it's full they allocate a new, bigger, amount of heap memory, copy the contents of the old region to the new one, and change the pointer/reference to this new region.
The linked list's theoretical advantage is in mid-sequence insertion/removal: you iterate to the position, and removing/inserting is just changing 2 pointers. With contiguous memory (array/vector), to insert/remove an element at a middle position you iterate to it, but you then have to shift several elements right/left: if you want to insert a new element at position 6, you have to shift right all elements from 6 to the end of the array/vector (that is, assuming you want to preserve order). The thing is, in practice the pointer-chasing penalty of linked lists destroys, in most cases, any advantage they have.
There are several good data structures that have better performance than both vectors and linked lists, depending on your use case. A heap, normally implemented using an array/vector, has O(log N) insert/remove, and can be used if you have logically ordered elements.
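For example, a small sketch using the standard library's array-backed heap adapter, std::priority_queue over std::vector (the values here are arbitrary): the storage stays contiguous while push and pop are O(log n).

```cpp
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

int main() {
    // Min-heap backed by a std::vector.
    std::priority_queue<int, std::vector<int>, std::greater<int>> min_heap;
    for (int v : {42, 7, 19, 3, 25}) min_heap.push(v);   // O(log n) each
    while (!min_heap.empty()) {
        std::printf("%d ", min_heap.top());              // 3 7 19 25 42
        min_heap.pop();                                  // O(log n) each
    }
    std::printf("\n");
}
```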
If I want fast search, insertions and deletions I use a tree structure.
Matthew Mitchell While I agree that searching a tree can be very fast (especially B trees), inserting and deleting is not fast...at all
Ah! The time-tested art of the multimedia handpuppets!
Why didn't he use Paint?
Wait... like what? Which languages are both higher level (or more convenient) and faster than C++? Also, you know that Google uses C/C++ as their main programming language, right? Until people seriously start investing time in runtime optimization (which is kind of harder to do in C/C++/Java/C# than in languages like Python/Ruby), C/C++ will continue to be the standard for speed.
made a big change in my homework solution thanks :)
I find this really misleading and bad. His usage of the linked list has the same asymptotic complexity as the vector because of the linear search. A linked list's insert is better only in situations where you already have an iterator to the insertion place. Additionally, neither of these data structures is the best for this task. Binary search trees would greatly outperform both, and they do not store data in contiguous memory. More here: github.com/det/random_insert
That code is supposed to give you credibility in this discussion? Your example is very contrived, cherry-picked even. Going that route, you can even prove Java to be faster than C++ ;)
Thanks for upload!!!
MFW the binary search, for comparisons this cheap, is actually a pessimization over a linear search of the vector.
This comment is missing the point. Stroustrup is talking about cache misses, not about saving a few bytes of pointers. Why is a cache miss problematic? Basically, every time you follow a pointer to the next linked-list element, you kiss your cache goodbye. That's latency. Learn about the instruction pipeline, L1, L2, and hardware cache prefetching.
This is a really bad example. (For large values of N) neither vectors nor linked-lists are a good choice for the scenario he proposes. Not to mention that the curve he describes as "exponential" is certainly only quadratic.
I think you missed the point. This is about spatial locality.
No, I got the point. My problem is that a large fraction of the programmers who watch this video (or went to this talk) will think "If I have a problem that requires random access including insertions and removals in a sequence then Bjarne told me that a vector is the best data structure to use, so I'll use that!"
Pat Morin It would appear I missed YOUR point. =)
Levi Burton His spatial locality argument fails. Say I have his linked list of 100,000 elements. I insert in the middle of it. As he says, I have to traverse to the middle, risking a lot of cache misses.
Conversely, using his vector of 100,000 elements & a binary search, I'll hit fewer than 20 elements scattered across the vector, getting a few cache misses. Then when I shove 50,000 elements over by 1, they'll *all* get cache misses.
Either way, I'm looking at cache misses for (on average) 1/2 of the list for every insertion and deletion. And his vector requires reads *and* writes for all those elements.
In any case, his whole argument is predicated on random access to the list/vector, which is not what linked lists are best for. They're great for a queue, and vectors are horrible for that. In short, use the right tool for the job.
jursamaj Actually, the number of cache misses for the vector is more like 50,000/L, where L is the number of elements that fit in a cache line. Usually L is quite big, which is what makes the vector faster.
But you're right; neither a linked lists nor a vector is the right tool for this job.
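To put rough, illustrative numbers on that (assuming 4-byte ints and 64-byte cache lines, so about 16 elements per line): shifting 50,000 vector elements touches roughly 50,000 / 16 ≈ 3,125 cache lines, all read and written sequentially, so the prefetcher hides most of the latency; walking 50,000 list nodes, by contrast, can miss on nearly every one of the 50,000 hops.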
i love all 3 pixels in this video
The problem here is that Stroustrup completely ignores the problem/cost of memory allocation.
Sure, you can push items around in an array really fast, as long as you can pre-allocate a big enough linear chunk of memory in advance to hold that array.
But when you can't predict the size in advance and you find yourself needing to expand the array, it can get hugely expensive.
The other problem is the assumption that linear access is the norm. I seldom need to do that; usually I want a specific list item via a pointer.
Uhh, preallocation is a solved problem. std::vector simply doubles the allocation size every time it runs out. And, yeah, it recopies everything - but this is still faster than whatever list does because it's a linear access (=the prefetcher does a great job) and it can use SSE operations (=16 bytes per instruction) and read and write on the same cycle.
I believe he's the creator of C++
As someone who uses python a LOT. Whenever he says list, I am thinking about array..... so an array is just as good as another array. Trolling aside, this is quite a nice talk.
A list in Python isn't a linked list, though; CPython's list is actually a dynamic array of pointers under the hood. (And with numpy you get a plain contiguous array of the values themselves.)
Chad Vector vs Virgin List
Google chrome, Firefox and Safari are mainly written in C++. I guess those web browsers are, as you like to call it "noobs".
This talk only explains why linked lists suck at handling random access when you have enough contiguous free memory to allocate an array of the right size.
Random access memory is designed for fast random access using pointers and the fastest general purpose structure in that environment is the array.
So anything that can be done with an array should be done with an array.
All other structures (lists and trees) are performance trading fallback solutions to get around situations where it's impossible to use an array.
Mainly when lots of allocations and deallocations have happened and memory is so heavily fragmented that you can't find a single contiguous free memory block large enough to hold your array.
Memory fragmentation is not really an issue anymore. The MMU can map the 4 KB pages in any order necessary (and allocators use fixed-size buckets for smaller allocations), so you can always get a contiguous block of virtual memory for your array (unless you're doing no-MMU embedded stuff, or you're doing 32-bit builds instead of 64-bit and you exhaust the address space).
that improv got me!!
I guess the slides of that presentation were implemented as linked lists :-)
Pointers => cache misses, so you get bad performance traversing your objects anyway. And if you're dealing with large data structures, you aren't number crunching anyway.
ITT: People who can't use linked lists properly.
The proper use of Linked Lists is not using them.
Linked lists are meant for linear access. Not random. If you're using a linked list for random access, you're using the wrong tool for the job.
@@NeilRoy If you want linear access, you should still use std::vector. The creator of the linked list should be put in a gulag.
@@coshvjicujmlqef6047 Unless you program in C and not C++. I program in C. My linked lists work just fine. If you're pushing and pulling from a stack type of vector than you're using a type of linked list where you have to push and pull in order. No different.
@@NeilRoy C programmers have no idea how computer architecture works. TBH
great explanation!
So... imagine this to be a graph...
Oh, I'm sorry, I can't hear you over my computer running Ubuntu that's written in C++, with a game engine I wrote in C++ that is not a text RPG, and the numerous programs I wrote with wxWidgets that are written in C++.
This is only correct as long as you can keep vectors from growing. Like an array, expanding a vector is catastrophic and a bad idea.
Random access simply violates the design of a linked list, so he would have done better to title his video "Never use random access operations (or random access iterators) on linked lists".
In a scenario where the emphasis is on inserting a lot of objects in order, e.g. to buffer them, rather than doing random access at specific positions, a linked list does much better. So, as always, it depends on what you intend to do with it.
This is why I dislike absolute statements like "never do this" or "bad practice". They are almost always only true under certain conditions.
Nope. They grow; it takes very little time to allocate new memory and use it. They are *still* better in every way, even when you need to copy the data to new memory. Memory is great at copying linear data. So go ahead and time that bad boy, and you'll find the linked list still sucks compared to a constantly growing vector (which only needs a logarithmic number of reallocations, since it doubles in size each time it runs out of space).
David Olsen Amortized constant time is still not theta(1) time. Appending to a linked list should outperform a vector.
That's in theory, in practice linked lists have a lot of cache misses
That's why you would use a vector of pointers