Bjarne Stroustrup is one of the best engineers to emulate. he always comes off as a guy focused on solving problems and not about having pointless opinions.
@@shlokbhakta2893you don't always get a choice what you work on. I did C++ for several years. It's one of those interesting cases where lots of individual decisions make sense but the final result has problems. You end up with lots of features you don't want and not getting things other languages have had until years later.
Big O notation is based on limits in calculus. In theory O(1) is always faster than O(n) but only as n approaches infinity. This is why built in sort functions in well implemented programming languages will often select an algorithm based on the number of elements you pass in.
@@bigpest That particular statement is inaccurate in a couple ways. First, everything that's O(1) is also O(n), or O(n²) etc, just the bound isn't tight. That means O(1) is not necessarily faster than O(n) for any n. Maybe that's pedantic, so let's switch to tight bounds throughout (I'll use Θ instead of O, for clarity). Secondly, we use the limit notation to indicate the asymptotic behavior as n "approaches infinity" but it's important to understand that what this actually means is that there's some _finite_ value at which Θ(1) is faster than Θ(n). We're not dealing with some weird mathematical statement that's only valid exactly on the limit, like 0.9999... = 1. Given a _large enough_ n, Θ(1) will always beat Θ(n).
This. I said in my own comment using big O for speed is like measuring the number of lanes on a road to tell you how fast the cars can driving on it are going.
In very high level languages, all bets are off. Just use whatever data structure is most convenient until you encounter performance issues. Then measure to find out which part of the code is the culprit. Then repeat the measure, tweak, measure cycle.
Exactly! I’d rather use a set to get the unique elements in a small collection even if slower as it’s more expressive. If it turns into a bottleneck then optimize.
I agree. When I do anything in more modern architectures I don't even think about clock cycles. On 8 bit micros though? Time to whip out the opcode chart
It’s cache misses that causes the slow down. The list will perform badly because it’s generating cache misses all over the place. The array is guaranteed to have contiguous memory, so as you don’t have to go to main memory as frequently. This is the meaning of compact and predictable access of memory.
It's not always the cache misses. There are also pipeline stalls from dependent loads: the next address to fetch in the list is in the data you are currently fetching so even with speculative execution those loads have to be entirely serial. Cache misses are indeed overwhelmingly likely to be the culprit for performance issues and hence it's a good rule of thumb to optimize for them first, but they aren't the only effect.
@@zoeherriot cache misses take place for array, too. However, there are prefetchers that guesses the next elemnt and prefetches it to make it cache hit. Also, when there is a cache miss, cache fetches 1 cacheline of data to cache which is mostly 64B. So, it brings several elements of the array which again reduces number of cache misses.
@@EmreHepsag yeah, I get that. My comment was because both Bjarne and Scott Myers both did talks where they explicitly covered this topic - and the point was - almost always choose arrays over linked lists. I wasn’t trying to get into the weeds, I just assumed everyone had seen those talks.
To be fair, also presumes that you're iterating the list, if you aren't, they aren't as bad, but in most cases when you make a container of elements that isn't like a map/dictionary, you're usually iterating it, hence why lists tend not to be a great container to resort to.
One does not add constants to big O notation, because it is not intended to indicate performance, which varies between processors and data. Instead big O notation is an indication of how the performance scales with regards to input, agnostic to the processor and data. It's possible for an O(n*log(n)) algorithm to perform significantly worse than an O(n^2) counterpart on small inputs and significantly better on larger inputs. When optimizing, benchmark for your specific data and platform. The reason why linked lists generally don't perform well is due to cache misses and speed of main memory vs CPU cache. So while it might be a single step in the algorithm the CPU will be idle for 1000s of cycles as it waits for main memory to return the value. Predictable memory lookup (Such as with with a vector/array) often leads to the CPU prefetching the data before it actually needs to be read.
It also depends on the architecture. For example if your using slow ass RAM such as SPI on embedded devices, you'll get wildly different results with the same code as opposed to running on a PC or a IBM Power RISC. Always fish some results early. Benchmark for your needed dataset size.
11:30, during my measurements of "if-block vs switch vs array (fixed-size) vs bitfields", for different configurations of these data structures, I found that they have this increasing speed order, on average. But I also found that some switches are slower than ifs, and others beat the "lesser" array configurations, as well as the faster array ones beat some "slower" bitfields.
My first question when approaching this stuff is "How big is the element, and how many elements are there?' I can usually rewrite a function that inserts/removes n elements into/from a vector length m from O(m*n) (move tail after every insertion/removal) to O(m+n) bundling all the operations into one pass through the vector (move tail by m, then subsequently move elements by less as elements are encountered). I like to use lists to store big almost-singletons. Stuff you typically expect to have one of, but in rare cases you'll want more. Say, a logging service with connection to the logging server. In 99.9% of the cases you'll have one logging server, one connection, period. Rarely you'll have 2. Almost never 3. The object itself is quite heavy, containing a sizeable, fixed size buffer for the logs. I don't want a vector to pre-allocate space for several of these. I absolutely don't allow moving them on resize, 'cause other stuff already has pointers to the existing instance. But traversing the 1-3 element list to visit all the elements takes a blink of an eye comparing to time spent performing the object's actual job, and no RAM is wasted.
Interesting. I was just doing tests like this in C# yesterday morning... I found if you're inserting multiple 100s of thousands or more elements near or at the front, it's worth converting a list to a linked list for the insertions, and then back. This gets better as the original size of the list and number of insertions increases. I'm not sure I've ever seen a scenario in production for this kind of thing... but if it comes up I'll be ready for the rare time a LL will be worth it! lol (and that's only if the memory isn't a concern for the app at that point)
Not sure about the current state, but List used to be just a cute wrapper around T[]. If it still is, your scenario would make perfect sense as the container array would have to be allocated>copied>disposed every time you exceed it's size with the insertions. And obviously, an insert (or removal) near/at the beginning will need A LOT more element shifting than the same operation at the end. Should be easy to test. Allocate a List then start inserting items one by one at the end and timing the insertion. It should be (near) constant until you hit the limit, then it will have an "outlier" and go back to the same (near) constant on the next insertion, with the "outliers" appearing at regular intervals because of the allocate>copy cycle. But C# is a really nasty thing when it comes to optimization. For example [y*stride+x],[y][x] and [y,x] should be pretty close, but they are anything but. And if you pick the best one (for the moment), you might get surprised if you later change your CLR and it becomes worse than before. Or to put it simply, if performance is critical, C# is not it, write a dll/so in C++/C/somethingpredictable with the performance critical parts in it. If you can live with sub-optimal, by all means, let it all live in C# (for example [][] is not always the best, but is never the worse ;) ).
Ya if performance was truly critical, we'd probably want some sort of optimized/custom initializing and resizing logic instead of the what C# provides with its array wrapper for List, like you said. But as I mentioned, I haven't even needed to stray from List in any project for performance reasons, as the C# hasn't come close to network times. C# actually just released a bunch of performance upgrades with .Net 8 today! Still not the language you choose for hardcore performance, but there's some nice improvements for JIT and AOT compilation. This was just a fun, quick experiment (similar to what The Primagen was writing) to see if I should have been considering LinkedList instead of List in my previous projects. @@ErazerPT
@@Average-Lizard Can totally understand you. On some project we had to interface with a Sharepoint server that lived in a network that can only be described as either dysfunctional, schizo or both. Add both things and you get a slug on Valium. We gave up on optimization because we could at best shave some dozens of milis out of seconds on the external calls...
If a usecase were heavy on front insertions you could probably just go with a double ended array. Or just an array where you store the items in reverse by convention, so front insertions get added to the back
@@ErazerPT Step 1 in my plan to optimize C# applications I deal with. Use async. I've seen C# written purely synchronously more often than you'd think. Heck, I've written more synchronous C# than I'm proud of. When the upper level is synchronous, it's just easier to stay that way. Slower, but easier.
One thing to keep in mind, for many things code readability and maintainability, while subjective, can be more important that small performance gains. If a Map makes your code crystal clear, use it
What about smartass compilers? Let's say the compiler knows I'm doing 5 deletions, instead of deleting + compacting 5 times, it could try to do all deletions at once maybe? Same goes for other crazy operations on vectors that it can see optimizable or not?
That's why you always test. There may be compiler options that will speed up the performance, but they may also slow down the performance. Same goes with tuning queries in the database. Just throwing indexes at something does not always work.
I used merge sorting linked lists in C and it beat using arrays. Insertion sorting doesn't scale no matter how you you store the data. Binary searching your linked list will beat linear searching, he seems to overlook that. I didn't need the data sorted all the time, there are always tradeoffs.
@@phill6859 How does binary search on linked list function? You have to traverse the list to get to the checked element. Is the "check element" function that demanding?
@@Kycilak yes. Doing a strcmp on each item takes time and if your linked list pointers are allocated from a block they will end up in the cache. If you linearly compare all the keys then you will thrash the cache. This was 25 years ago, caches were smaller. I had plenty of people telling me I was wrong at the time. It worked and it was faster, so I just ignored them.
@@egor.okhterovnot a skip list, going to an index in a linked list is faster than strcmp linearly because of cache usage. 50% of the time you will be going in the opposite direction that you just travelled and so that is in the cache too. You can cache more pointers than you can data.
When remove an element from a doubly linked list you also have to update the next and previous links (pointers) in the next and previous location as well.
I had an application once that processed some data, correlating two lists of data by using one list as keys and the other list as a lookup table; the lists were two different representations of the same dataset. Changing that lookup list to a hash table provided a speedup of over 16000x in the biggest dataset we had of over 50k entries. Point is: time your application before you optimize; you don't know when the constants trump the asymptotics.
It's not that 0(1) is really O(C*1) - it's that O(C(n)) means that operations are less than or equal to C*f(n) for some constant C and sufficiently large x. For example, 5x^3 + 3x^2 + x + 18 < C*x^3 when x is >= 1000 and C is 6, since 6(1000)^3 > 5(1000)^3 + 3(1000)^2 + 1000 + 18, and for all similar expressions where x > 1000. It's the same concept as a limit going to infinity, how only the largest piece matters once x gets large enough. idk if this is actually what O stands for, but in my head I always say 'order of f(n)', which is what it is.
it is actually what big-O stands for. But in computer science we are used to the fact that n approaches infinity, because n is like some number of 'basic operations', so algorithms are always scaling to infinity. But to be clear, 'n' in f(n) might be some other value, even negative, or even a vector of values
wait a sec, I just realised that the insertion time that I made for my arraylist datastructure library in C is actually faster than a linked list, and probably so too because a linked list is getting a ton of cache misses. I made my arraylist in C kind of a ring buffer so that it tricks you into thinking that inserting front will actually insert at front but it actually inserts at the back, and because of some math this is O(1) and then inserting means just instantly going to the element and then either moving all elements to front or to the back depending on which path is faster. so that O(n) but my insertion can always calculate the fastest path. the only thing that would take more time would be doing a resize which is O(nn) but tbh, this can easily be prevented with my arraylist too by just having a big enough capacity for the job from the start or something like that. but then again yeah Idk if the resizing is that bad honestly.
All software development is about tradeoffs. If the problems caused by abstracting memory away are net less expensive than the solutions that the abstraction provides - then it is a good abstraction.
The biggest reason to use std::list over std::vector is the difference in iterator-invalidation. Iterators are/maybe invalidated by inserting into/removing from a std::vector, but iterators for a std::list remain valid unless the element your iterator is pointing to was removed.
That's just a C++ thing only. In general, the best use linked lists have is to allocate a list of different size and types elements, filled with the equivalent of void pointers in C that later get a defined size and type.
12:00 Doesn't mean much when not telling which method turned out to be faster. For me it is obvious the array is, its just a tiny bit of continuous memory that can be processed at full processor speed without memory latency messing things up. If the list turned out to be faster, now that would really surprise me and I would certainly expect that to be because of some data skew making the comparison invalid.
A more telling example: If you write a function that always returns an array of size 1000000, this function has a O(1) time and space complexity. Another function, that always returns an array of size (n) where n is the input, will be an O(N) function, but will be faster up until 1000000.
@@vasiliigulevich9202 No, that's just not true and a gross oversimplification. An array of uninitialized memory is *probably* (modulo weird allocators) O(1). A zeroed array might be O(1) or O(N) depending on your allocator, OS, hardware etc.. Generating an array of random numbers is almost surely O(N). Generally speaking the time complexity can be arbitrarily high because computing any element of the array can be arbitrarily large - and together with that we can also bump the space complexity arbitrarily high.
Majority of the time a vector (dynamic array) with a pool, SBO, and/or static buffer are just as good as a fixed size array, unless you have some reason you need to avoid heap allocations.
@@Spartan322 Pros and cons. If I ever expect the number of elements to change I'm almost always going to go with std::vector over std::array. Assuming we're talking about member variables, the memory layout between the two is pretty different, even if both are on the heap. Most of the time it doesn't mater, but that extra pointer dereference can cause cache misses. It's also a massive difference in how copy, move, and serialize/deserialize operate. With the fun part being moves are almost always much faster when using std::vector.
@@arthurmoore9488 Usually the case for which I'll consider array over vector is when doing constexpr/consteval stuff, as Apple's version of Clang still doesn't like allocations in constexpr/consteval even in C++20, template metaprogramming can get pretty gnarly with everything we have in C++20. (that's not entirely supported by any if not all the compilers) Course you need to never let allocations escape the constant eval environment regardless so you have to return fixed sized objects, but most of the time I just resort to vector if I need a contiguous container unless I need some type of stack advantage, its rarely been useful in my experience to deal with arrays aside from those cases.
@@Spartan322 Directly taking pointer to elements in an array can be very useful. Explicitly setting a hard limit on something makes stress testing trivial. Fixed length arrays are more portable, not every platform has malloc (i.e. wasm)
@@pokefreak2112 Pretty sure emscriptem has dlmalloc and emmalloc, so how would wasm not support malloc? Like guess you could set it to run without malloc, but that's not really the same as saying it does not support it, malloc is literally standard C, if it can't use malloc, its not standard C regardless, that's like saying well defined behavior does not need to follow the standard, then you're simply not using C, let alone C++. I can't think of a single platform that doesn't support malloc that can claim to support C.
Here's an even faster vector based semi-compact "list": don't move elements over, keep another vector of indexes to determine where gaps are, only fill in the gaps on specific call to do so, such as after you're done inserting/removing whatever, that will be undisputably faster than removing one element and shifting entire block of data at a time. You can even not ever move elements anywhere at all, simply maintain second vector that holds correct permutation of first's indices, and it will still be faster than list, because when it matters, you can swap elements according to permutation before doing any heavy work. There's practically zero real world usecases for linked lists.
Linked lists force memory lookups for every node, where a vector has all of the info in contiguous memory. That might might be obvious, and most people also know that memory lookups are extremely slow... But often a vector's elements can fit within a cpu register which is blazingly fast, and the cpu is also blazingly fast at manipulating data within it's own registers. std::vector also has optimizations for certain removals, so many operations can be performed with a single 'walk'' of the array where that can be much harder with a linked list. Linked lists do have use-cases, but I think it's relatively rare and it's usually a trade-off for a gain somewhere else; they seem closer to an educational tool than anything else lol.
I don't like words like "fast" and "slow" in this context. "Blazingly fast" and "extremely slow" makes me cringe even harder. I could accept "faster" and "slower" though :)
@@egor.okhterov The difference is large enough to use "blazingly" and "extremely lol. The number I find from a Google search is cpu registers are generally 10-100 times faster than RAM, but it's kind of an ambiguous value.
@@Hazanko83 the context is king. What matters is "observable" difference. For example if I sort a file in 1ms or 100ms I will not observable the difference, so that's basically the same speed for me. If I am a web page user and a page loads faster then I blink all the speeds faster then I can move my eyelids are equivalently blazingly fast 😀
@@egor.okhterov That kind of logic only works for one-off situations. a 1ms difference compared to a 100ms difference, could mean thousands or millions of dollars in server costs for a website. Or in a game it could mean it running at 1fps or 100fps if every function is running 100x slower than it could be. The context is real world applications vs someone's toy project.
this is why unit testing, benchmarking, and pure functions are so powerful. you can pump out dog sh** code, and easily fix/optimize things later, in my experience.
Arrays are only faster when you have cache. Singly linked lists are faster when you don't have cache. Lisp, a programming language of linked lists, functions best in 36 bit word size. 32 bit size is not optimal for Lisp. Lisp, LISt Processing, treats everything as data. Functions are themselves lists of data. Because Lisp code can be processed as data, you can write Lisp code which will write Lisp code. Something that Bjorn Strousup can't do in C++. Christian Schafmeister, a software engineer and chemical engineer, designs 3D molecular modeling software, switched from C++ to Lisp. He spent 10 years on his project, after hitting a development wall in C++, found Lisp, and wishing he had started his project in Lisp.
Good thing there's no computer in the wild that doesn't have cache AND has a lot of memory. Microcontrollers have no cache but don't have enough memory to accomodate each element wasting 2+ bytes because pointers.
It'd be interesting to see how the performance changes if you're using arena allocation for the linked list, or some sort of data structure that better represents the problem at hand, that'll cost you development time though and there's no guarantees that'll be faster.
it's called a std::vector or a c-style array lol. The whole point of a linked list is that you can have exit/begin pointers to 'neighboring' nodes BECAUSE you don't know where they are in memory, if you're expecting data to be allocated within the same area of the ''arena'' or to be contiguous, then you'd just use an offset on the object pointer to get to previous/next nodes(which is literally a raw array). std::vector essentially is it's own ''arena allocator'' in the sense that it manages the allocations and 'garbage collection' for you, instead of calling into your own custom memory allocator. For example, the struct below and it's function works, and is ''valid c++''. Although I would not recommend designing anything like this lol... struct SillyLinky { inline SillyLinky* get_next() { return (this+1); }; };
You still have to store pointers and that overhead adds up, your cache lines will have less data, and you will have more cache misses. Each insertion requires allocation, writing node data at the advancing arena pointer. Depending on the problem, your linked list nodes may become randomly spread out on the arena, and a simple traversal will cause unpredictable memory access patterns.
@@youtubeenjoyer1743 We're talking about small memory optimizations, so the size of the array might not be a concern. Making the arena larger but not so much as to allowing it to fit into a cache line could be possible. Also I'm not talking about a linked list on top of an arena in the "use this allocator instead" c++ way, I'm talking about maybe a vector with index pointers, you'll be jumping arround the vector instead of the entire heap so probably that changes things.
@@Hazanko83 You took the worst possible interpretation of what I said lol. I'm not proposing an arena allocator in the c++ way, I meant it in the sense of treating the vector like a linked list, storing pairs of (element, index of next element). This way you still have the constant search performance advantage. Would still need to test it though.
@@user-hk3ej4hk7m Vector IS a constant time, random access container though?... Which is as good as you can get... You can do exactly what I showed with that struct thing, with it being inside of a vector, without needing a pointer variable as a member or needing anything initialized(it's just a bad idea). When you use operator[] on vector, it's simply pointer arithmetic on a pointer. After having data contiguous in memory, and increasing cache locality, clever algorithms is really all that's left... not more misdirection.
There is also the thing than CPUs have special instructions to work with arrays, strings and more higher concepts nowdays. While they can't really do much to improve linked lists. However, it doesn't mean lists are never useful, but i do prefer arrays for contiguous lists of data, but trees which are just multi dimensional lists are so nice for many other things.
I get that array based lists are compact, but how is adding/removing from their wrapper counterparts such as vector or ArrayList/List faster? Just because they are contiguous and are accessed via an offset? I assume this means that accessing an element of the linked list through its reference/pointer member is slower overall.
I'm not really experienced enough to speak on the matter but I think it depends a bit on hardware. If you're doing embedded development for a single architecture, and the machine's concept of allocation is "Here's 64k of RAM, that's all you get" I'd rather just use arrays
Genuinely asking, for the problem in the beginning why would you intuitively reach for the linked list? It says that you remove the elements by random positions, in the LL implementation you'd have to traverse till you get to that position where as you have direct access in the vector/array implementation.
It’s because when you remove an element in a vector you have to shift every element over by one. Here’s the original array: std:vector = {0, 1, 4, 6, 8}; and if you want to remove 4, you have to shift over 6 and 8 so that the new vector looks like {0, 1, 6, 8} If not for shifting, the vector would look like: {0, 1, GARBAGE, 6, 8}. The reason you don’t need to create a new vector is because you just change the iterators pointing to the last element.
Because linked lists are better designed to delete elements in the middle of the list (you just relink the previous to the next and the next to the previous), whereas vectors will realocate as the elements must be contiguous in memory.
@@jaredteaches894 Thanks for the reply. So essentially it's just a tradeoff between taking the time to search the linked list to find your target and shifting the memory? Is it more intuitive because it's more ergonomic from a developer POV?
@@Lukeisun7 It's "intuitive" because it's often taught that insertions and deletions are efficient in a linked list, which is true; they are efficient. So, when you want to have a data structure where you plan on doing a lot of insertions and deletions, you will naturally choose a linked list. The author is suggesting that you need to measure the cost of operations. The article ends by emphasizing the importance of locality -- both spatial and temporal. The general gist of caching is the most common case will be efficient. Things closer to X will probably be used again so hardware will exploit that. This is not necessarily true in a linked list. Nodes on a list don't have to be near each other in memory. In other words, the addresses of nodes might not be cached (within the TLB) and this extra memory references will be incurred.
The whole O(N) being actually O(C*N) is interesting and important, but when it comes to cache locality issues, it's a different issue and much worse. The average cost of a memory access is equal to the cost of a cache miss times the probability of getting a cache miss. The probability of getting a cache miss on a random memory location grows linearly with N (memory). Also, the cost of individual cache misses increases with N because the data has to be fetched from even further away. In my own measurements, years ago, the overall cost of cache misses for random jumps in memory of size N was around O(N^1.5), meaning that something like a linked-list traversal ends up being around O(N^2.5). Ultimately, the problem is that big-O analysis assumes that all memory access whenever and wherever always cost O(1). The real world is not like that. There is a whole field of research in CS about trying to model the cost of memory accesses and incorporating that into the overall big-O analysis, but as you can imagine, it gets really complicated.
I was literally watching your 4 month old video about this mere minutes ago, since I wanted to know more about the use case of Linked Lists, what're the chances of a new upload 2 hours ago!
One other thing that rarely gets brought up is that in higher level languages with runtimes usually have highly optimized native implementations of Data Structures. So, if your in something like Java, C#, or PHP, its basically impossible to write faster data structure than what is implemented natively without writing a language extension. This because those native implementations are usually binaries that are written in something lower level like C or C++. Those implementation have better access to low level functionality abd memory management than what you can write in these higher level languages. So use whats already available to you! Of course, this is a different story if you're working with C or Rust.
No, it’s precisely the opposite. In these higher level languages data structures always store pointers to tables of pointers to data and functions, instead of just storing the data. All this indirection “levels the playing field” by trashing your cache no matter what your program is doing. When you remove the benefit of contiguous memory and cpu cache, suddenly all asymptotic notations become useful.
1:49 Whaaaat? No you dont care about other side. You take the pointer of the n+1 element and adjust the point of n-1 element to point to n+1 element, right? And let the garbaage collector sort it out.
Yeah, we program in a heap only languages with no worries, until we stumble on a problem where we need to build, use, and discard a highly efficient lookup structure, and suddenly memory access and method dispatch is costing you precious microseconds on operations that you need to run hundreds of per web request. I really yearned for a C struct and a vector in Ruby a month ago. Still do.
The velocity comment wasn't too far off. Big O is an indicator of acceleration. A wimpy engine (high constant) will always lose because shedding ounces won't affect the acceleration.
The actual lesson that everyone should draw is that BTrees are the god tier data structure, but sadly not empathized enough. The debate is not necessarily just between lists and vectors, BTrees with a node size close to the page size are often better than either. In Python I like to use the SortedContainers library for this. (yes, my applications are often disk bound)
BTrees are only useful in databases as far as I know. Why would I use BTree for memory-only when sorted vector will outperform it in binary search anyway?
11 หลายเดือนก่อน
Hash code generation is what makes sets slower than arrays with small numbers of elements.
If your collection is already sorted, bubblesort will spank quicksort when it comes to performance. So if your array have 1-2 things out of place, and a size of $HUGE, it might very well beat qs.
@@williamdrum9899 True, but real data in the wild is often "almost sorted" (for a given variety of almost), so if we want to be boring there's a reason why merge-variants like Timsort are used in libraries.
"because we've been taught this our whole life" yes, because professors teaching software engineering have no idea what they are doing, couldn't program themselves out of a paper bag, and their previous profession was "student". Cache locality is huge, linked lists are *almost* always the devil, but always measure regardless.
I think that even better than vector, deque would probably shine for that problem, as deletion and insertion towards the start would be slightly faster yet still very predictable
The difference between continuous memory and linked memory should be explained as such: If you have a library with 30 rooms and ask people to find and move books per request: is it easier if the books are in 3 adjacent rooms that need to be accessed and moved? or will it be easier to find books (with a little note showing what room has the next book) within any of the 30 rooms that may be on opposite ends of the library? (It's a little hyperbolic, but it should make sense why linked memory isn't always better) Of course libraries prefer categorized shelves of continuous memory.
All of the various algorithms that can be applied to all of the various tables and or data structures in many cases do have overlapping complexity curves in both time execution and memory footprint. These overlapping curves or more aptly put is the intersection between said curves. When it comes to BIG O notation the first thing to remember is that it's usually represented by Worst Case Scenario, and as is mentioned in the video the constants or coefficients that are dropped from the representation. Big O is not precisely but more relatively, approximately, or generally speaking. So generally speaking, you have Constant, Log, Linear, Log Linear, Quadratic, Cubic, Polynomial and Exponential: O(1), O(log n), O(n), O(n log n), O(n^2), O(n^3), O(n^k), O(2^n) respectively and each algorithm is categorized by the rate of change of its runtime into relation with the change in its input as it grows in size and the category is always by worse case scenario. There are times where one classified algorithm on a specific data set would be assumed that it would run faster than another where one is commonly known to be say O(log n) and the other is O(n). Yet in some circumstances which could be depending on other factors, looping structure, branching with the branch predictor, data size or size of data structure within the provided container causing a bunch of cache misses, could end up causing the O(n) algorithm to outperform the O(log n) algorithm. Here the O(log N) algorithm for some reason is running at its absolute worse, and the O(N) algorithm in this same case might even be optimized by the compiler, OS, drivers, assemblers, etc, no issues with branch predictor, rarely any cache misses, it could be internally vectorized through SIMD, etc... and could end up running at an O (log N) or even better almost approaching constant time... So basing your Code on Big O Notation isn't always the most suitable. Knowing how various algorithms work on various containers with said data structures and data sets internally, and know what to expect from them in various different situations I think is more important. For example if you know the size your data's container is never going to change, then a simple raw array will suffice but one has to be careful with invalid indexing and invalid pointer arithmetic. If you know the container's length or size is going to change periodically but not all too often then either a vector a list or set would suffice which could also depend on how you into to use or access it. If you're going to need direct element access via indexing then vector is more appropriate. If you need faster insertion and deletion rates then perhaps a list. If you know each element is going to be unique then a set. If you have a specific preferred ordering them perhaps a queue or deque such as a priority list, if you need a FIFO, FILO or LIFO, LILO, type of structure than maybe a stack or their variants. The type of data, the size of the data, the amount of elements within the container, the type of algorithms that are going to be performed on the data itself, on the data structure, and even on its container are just some of the factors in choosing the appropriate container / algorithm combo. There might be times where you have one section of code that doesn't need to be super fast and in fact you're running it in its own separate thread in parallel to another thread that is crunching through a huge loop and you want them to finish approximate at the same time. It's okay if the smaller thread isn't fast but you don't want it to end too early and waiting for too long for the threads to be joined... So maybe you choose a slower algorithm instead because you want both sections of code to complete in say approximately 50ms. So there are many factors as to why, when, where and how you might use said components. Big O might be a helpful guide at comparing various algorithms/containers but it gives you absolutely nothing in terms or regards of how and why you might use one over another. Many times those choices could be situation dependent or could be programmer preference choice... I've been listening to Bjarne Stroustrup on and off throughout the years, nearly 20 years now, and one of my favorite videos was when he was in a discussion with Jason Turner a young and brilliant Modern C++ developer, and a few others, some who are even on the committee or board for the drafting of the Language Standard.
@@SVVV97 my point is that you can do the whole "remove the gap created by deleting an element" in one syscall. yes, it will likely be more expensive than just adjusting a few pointers and free()ing some memory. But with so much cheaper traversal due to cache locality and such, you are better off with the memmove in a lot of cases, as long as the moved memory isn't too large. Amortizing is quite powerful.
Tried linked lists and vectors in C++. Linked list was always slower... ALWAYS. iterating is slower, adding is slower, removing was sometimes faster but considering you have to iterate to find what you want to remove... it ended up being slower. This was with containers with over a million elements. Cache misses are MASSIVE time sinks, as in theory your cpu can do around 4000 floating point multiplications (assuming avx512, 1000 with sse2, don't quote me on exact numbers though) in the same amount of time it takes to recover from level 3 cache miss. Just something to think about. And you have a cache miss per element. On the other hand, copying is really fast.
always measure the perf. One time I thought it would be clever to replace a hash map with a continuous sorted memory and use binary search. And for a large input size (like 1M elements) the binary search on a presorted memory is more than 1000 times slower than hash map 🤯
@@iMagUdspEllr it depends on implementation of the hash-map. There are 2 kinds of hash-maps: sorted and unsorted(bucket based). Sorted maps implemented with red-black trees (with log(n) access complexity) and work best with small (
"For a big enough 1 and a small enough n, O(1) is slower than O(n)" Clever, but this is wrong. Because Big O is not a measure of speed. This is akin to me measuring the number of lanes on a road to tell you how fast the cars can driving on it are going.
C++ will never be great. You have to start over and never add any bad language features like two flavours of pointers, two colours of struct, methods, private/public/protected, inheritance, constexpr, RAII, exceptions, and a ton of undefined behavior. Also the standard library is in a very sad state, but it’s the cruft that c++ will have to carry until the end.
There are actually 3 types of asymptotic notations: 1. Big O(n) 2. Small o(n) 3. Theta Θ(n) Each of them is a set/class of functions {f1, f2, f3, ...} that are all indistinguishable between each other when we look at them through a certain lense.
@@egor.okhterov As mathematical objects, sure, as concepts, not necessarily. O(n) really just tells you how fast the execution time grows, not what its base speed is. So in that sense, it's as good an analogy for O(n) as speed is for execution time. We use fast, speedy, quick, and other adjectives for programs, despite them not physically moving and having no measurable speed. But they figuratively move from A to B at a set speed.
@@Exilum now I better understand this perspective, thank you for clarification. This is not how I think about it though. I am thinking of an estimate for the number of atomic operations instead of thinking about time. Speed, velocity and acceleration are all tightly coupled and based on the concept of time. The time has nothing to do with O(n) notation, but rather it is the story about problem configuration space. Surely, in the end when we are actually starting to walk across that constructed space the time manifests itself in the real world as the result, but the time could change even if the configuration space stays the same.
@@egor.okhterov Yeah, I am not saying it is the best analogy, and that's not what my original comment was meant to convert either. I pretty much just wanted to point what the chatter probably meant in chat when they said "velocity". That's pretty much why I said "could work", it's not perfect, but pretty good for teaching purposes. While it doesn't convey the structure, it conveys the relationship between the O notation and the "result" it "approximates" (a lot of quotes there).
Interesting. This article makes me appreciate the pragmatism of Lua’s design even more. It doesn’t just “use vectors (dynamic arrays) by default”, it uses them (tables) for _everything._ 🤔
@@BoardGameMaker4108 Lua "table" is a hybrid of a dynamic array, and a dictionary/map/lookup-table/whatever-you-call-it, under the hood. That is why there exists separate ipair() and pair() in the first place. If you use Lua table with integer indices, it switches to the underlying dynamic array; if with keys, then a dictionary.
Lua does some interesting things to make tables that work as arrays as well as hash maps. You start to see some of the issues with the approach if you've ever tried to use ipairs on a table that wasn't only written to using table.insert(). I actually think it'd be better if Lua had arrays and tables as separate data structures instead of the one type of table that you can use incorrectly.
@@MSheepdog, I really like the _brutishly_ simple design of Lua. It’s _so_ easy to embed, which is really what it’s meant for. I know people write entire _apps_ in Lua, but… if you’re mostly exposing entry points into another language and writing some glue code to customize policy & business logic, I think the “everything’s a table” design is reasonable. I’m aware there are pitfalls of ‘mixing paradigms’ with tables (i.e., treating the same table as both an array _and_ a dictionary)-but, the way I’m using Lua-I haven’t personally been bitten by them.
Please note that "List vs Vector" is not really a valid debate in most "modern" languages, since the default/preferred list implementation will usually be backed by an array and pretty close to a C++ Vector. like Java's ArrayList. Though the "only reach for a linked list/hash set/tree set/any specialized structure when you know you need it" advice is valid in almost any language. And please don't use Java's Vector class, it's deprecated, slow, and should have been called ArrayList anyway.
The debate is valid in any programming language, you just need to know that “list” means linked list, and “vector” means dynamic array (well, vector by itself was clear enough).
There do exist fast in-place merge sorts. Batcher's odd-even merge sort comes to mind. But I would still test this with real data before using it, of course. Radix sort is still the go-to for ordering billions of values.
@@minor12828 It's been a while since I've looked at DualPivotQuicksort.java, but it now indeed does use more than just Insertion and Merge sort. A lot has changed since 2009. I stand corrected. I didn't look before the dual pivot change, but I vaguely remember it was bubble sort and single pivot quicksort at some point.
The difference that you're trying to explain is between the instruction function, such as f(n)=5n^2+2n+12, which encodes the time the algorithm will run given some input (of course, length of collection is an obvious factor and the default example most times), and the asymptotic classes of functions that it belongs to. Here, the obvious answer is that f is in O(n^2), but that also means it's in O(2n^2), O((n^2)/2), O(en^5), and O(n^7+16n^3) because all that Big O means is that the function is bounded above by the function in the O multiplied by a constant k or greater (technically, this is for all n > m, for some constant m dependent upon that coefficient). All that is to say that Big O does have it's use in that moving an algorithm from O(n^5) to O(n log(n)) time complexity will make the algorithm faster over the majority of the domain, but it is still crucial for one to know what part of the domain one's input space covers whether one uses Big O, Big Omega, and Big Theta or the instruction function itself. The advice for the rest of what appears in the video is essentially measure unknowns and don't concretize too early.
The assumption I always see with this argument is this is running for a single user/client. The entire argument relies on a specific context, but I suspect will fail when it's scaled by concurrent users/clients. Being memory greedy just means you'll hit the barrier where you're not getting cache hits faster. I'd say front end should be fine with the "default to vector" idea. Backend should think twice and embedded should probably ignore. Everyone should be re-thinking using C++ at all, moreover.
There is a C++ talk about library implementation. The standard sort use a threshold and use a different sorting method depending on the number of element
This whole video is silly. Anyone who understand what is happening in a computer knows that cache hits are much slower then moving even large chunks of memory. All Bjarne is doing is pointing out that lists cause a lot of cache hits. As for what collection to use in a program abstract the collection so you can change it later.
I only disagree with Bjarne slightly. My advice is to use whichever container makes your code the most readable by default unless you have a good reason not to. And then even after that you should probably use an adaptor (which gets optimized away by the compiler) so that the code uses the readable use paradigm, but the faster algorithm. For example if a map makes logically the most sense, use a map or hashmap. If you later find out through benchmarks that a contiguous, sorted storage is better for performance, then use an adapter like flat_map so you don't have to compromise the clarity of representation of your business logic to take advantage of a faster algorithm. If you have to implement it yourself, follow the YAGNI principle and only implement the methods that you need, but with almost certainty someone has already implemented what you want in a library already.
Stroustrup: This is example is NOT about how big-O can be deceiving. Big-O-magen: So, anyway, let me spend the next 10 minutes explaining how big-O notation can be deceiving.
I'm afraid there are some dumb people who will use this video to justify writing the most ridiculous N^(100-theirIQ) algorithm and argue ferociously that they are virtuous.
O(1) cannot be bigger than O(n). but yeah it can be slower. edit: it doesn't matter that there is a c constant there will also be a c in the O(n) then... O(1) < O(cn)
@@MrLeDiazz 100000 can be bigger or smaller than any arbitrary n. but O(n) < O(1) as a definition can not make any sense what you are talking about is a real number 100000 being greater than any arbitrary other number [n] . I'm just saying that towards the end of the video there are some takes that seem to be disproving some common myth or something. there is nothing to disprove in that particular case because O(1) is always* smaller than O(n).
@@brod515 I'm not sure what your point is. Is O(1) always smaller than O(n) from the strict definition of the notation? Ok, maybe, but we're not talking about theory here. We're talking about practice, with a finite input. The problem is that a lot of people seem to think that an _algorithm_ with O(1) runtime always runs faster than an algorithm with O(n) runtime. That is definitely not true. O-notation doesn't describe how fast something is. It describes _how it scales._ O(1) will always outpace O(n) when the input grows large enough, but if you only run small inputs the O(n) option might easily be faster. That is important to understand (!) but for some reason a lot of programmers don't.
O(1) is a set. O(n) is also a set. You cannot use < or > for these objects. What you should use instead is set operations like: O(1) ∪ O(n) - union of sets O(1) ∩ O(n) - intersection of sets O(1) ⊂ O(n) - statement saying that O(1) is a subset of O(n). That statement is true by the way.
Bjarne Stroustrup is one of the best engineers to emulate. he always comes off as a guy focused on solving problems and not about having pointless opinions.
very subtle
And somehow he has contributed to c++ been the most bloated language with providing solutions to problems that c++ created.
@@georgeokello8620 c++ is not bloated, you include what you want to use. it's that simple
@@georgeokello8620 just dont use it then lol
@@shlokbhakta2893you don't always get a choice what you work on.
I did C++ for several years. It's one of those interesting cases where lots of individual decisions make sense but the final result has problems. You end up with lots of features you don't want and not getting things other languages have had until years later.
Big O notation is based on limits in calculus. In theory O(1) is always faster than O(n) but only as n approaches infinity. This is why built in sort functions in well implemented programming languages will often select an algorithm based on the number of elements you pass in.
Thank you for stating what the O notation actually is.
This. For small arrays (i mean, like 10 elements) an insertion sort (hell, even a bubble) can be WAAAY faster that a qsort().
"In theory O(1) is always faster than O(n) but only as n approaches infinity."
That's not right either.
@@isodoubIet How so? I’m guessing we’re gonna say constant-time with a large constant can be more expensive than linear-time with a tiny constant?
@@bigpest That particular statement is inaccurate in a couple ways. First, everything that's O(1) is also O(n), or O(n²) etc, just the bound isn't tight. That means O(1) is not necessarily faster than O(n) for any n. Maybe that's pedantic, so let's switch to tight bounds throughout (I'll use Θ instead of O, for clarity). Secondly, we use the limit notation to indicate the asymptotic behavior as n "approaches infinity" but it's important to understand that what this actually means is that there's some _finite_ value at which Θ(1) is faster than Θ(n). We're not dealing with some weird mathematical statement that's only valid exactly on the limit, like 0.9999... = 1. Given a _large enough_ n, Θ(1) will always beat Θ(n).
I'm surprised to see how many people miss the fact that big O notation is about scalability of an algorithm and not absolute performance
This.
I said in my own comment using big O for speed is like measuring the number of lanes on a road to tell you how fast the cars can driving on it are going.
In very high level languages, all bets are off. Just use whatever data structure is most convenient until you encounter performance issues. Then measure to find out which part of the code is the culprit. Then repeat the measure, tweak, measure cycle.
fax
Exactly! I’d rather use a set to get the unique elements in a small collection even if slower as it’s more expressive. If it turns into a bottleneck then optimize.
I agree. When I do anything in more modern architectures I don't even think about clock cycles. On 8 bit micros though? Time to whip out the opcode chart
It’s cache misses that causes the slow down. The list will perform badly because it’s generating cache misses all over the place. The array is guaranteed to have contiguous memory, so as you don’t have to go to main memory as frequently.
This is the meaning of compact and predictable access of memory.
It's not always the cache misses. There are also pipeline stalls from dependent loads: the next address to fetch in the list is in the data you are currently fetching so even with speculative execution those loads have to be entirely serial.
Cache misses are indeed overwhelmingly likely to be the culprit for performance issues and hence it's a good rule of thumb to optimize for them first, but they aren't the only effect.
@@OMGclueless that's a great point. Thanks for the clarification.
@@zoeherriot cache misses take place for array, too. However, there are prefetchers that guesses the next elemnt and prefetches it to make it cache hit. Also, when there is a cache miss, cache fetches 1 cacheline of data to cache which is mostly 64B. So, it brings several elements of the array which again reduces number of cache misses.
@@EmreHepsag yeah, I get that. My comment was because both Bjarne and Scott Myers both did talks where they explicitly covered this topic - and the point was - almost always choose arrays over linked lists. I wasn’t trying to get into the weeds, I just assumed everyone had seen those talks.
To be fair, also presumes that you're iterating the list, if you aren't, they aren't as bad, but in most cases when you make a container of elements that isn't like a map/dictionary, you're usually iterating it, hence why lists tend not to be a great container to resort to.
One does not add constants to big O notation, because it is not intended to indicate performance, which varies between processors and data.
Instead big O notation is an indication of how the performance scales with regards to input, agnostic to the processor and data.
It's possible for an O(n*log(n)) algorithm to perform significantly worse than an O(n^2) counterpart on small inputs and significantly better on larger inputs.
When optimizing, benchmark for your specific data and platform.
The reason why linked lists generally don't perform well is due to cache misses and speed of main memory vs CPU cache.
So while it might be a single step in the algorithm the CPU will be idle for 1000s of cycles as it waits for main memory to return the value.
Predictable memory lookup (Such as with with a vector/array) often leads to the CPU prefetching the data before it actually needs to be read.
It also depends on the architecture. For example if your using slow ass RAM such as SPI on embedded devices, you'll get wildly different results with the same code as opposed to running on a PC or a IBM Power RISC. Always fish some results early. Benchmark for your needed dataset size.
11:30, during my measurements of "if-block vs switch vs array (fixed-size) vs bitfields", for different configurations of these data structures, I found that they have this increasing speed order, on average. But I also found that some switches are slower than ifs, and others beat the "lesser" array configurations, as well as the faster array ones beat some "slower" bitfields.
compiler magic sir
@@ea_naseer Compilers differ too. GCC's performance is more unpredictable than Clang's. I think that's due to make more use of jumps.
My first question when approaching this stuff is "How big is the element, and how many elements are there?' I can usually rewrite a function that inserts/removes n elements into/from a vector length m from O(m*n) (move tail after every insertion/removal) to O(m+n) bundling all the operations into one pass through the vector (move tail by m, then subsequently move elements by less as elements are encountered).
I like to use lists to store big almost-singletons. Stuff you typically expect to have one of, but in rare cases you'll want more. Say, a logging service with connection to the logging server. In 99.9% of the cases you'll have one logging server, one connection, period. Rarely you'll have 2. Almost never 3. The object itself is quite heavy, containing a sizeable, fixed size buffer for the logs. I don't want a vector to pre-allocate space for several of these. I absolutely don't allow moving them on resize, 'cause other stuff already has pointers to the existing instance. But traversing the 1-3 element list to visit all the elements takes a blink of an eye comparing to time spent performing the object's actual job, and no RAM is wasted.
Bjarne is epitome of pragmatism trumps all. That's probably why C++ is how it is, but still one of the most widely used language.
Interesting. I was just doing tests like this in C# yesterday morning...
I found if you're inserting multiple 100s of thousands or more elements near or at the front, it's worth converting a list to a linked list for the insertions, and then back. This gets better as the original size of the list and number of insertions increases.
I'm not sure I've ever seen a scenario in production for this kind of thing... but if it comes up I'll be ready for the rare time a LL will be worth it! lol (and that's only if the memory isn't a concern for the app at that point)
Not sure about the current state, but List used to be just a cute wrapper around T[]. If it still is, your scenario would make perfect sense as the container array would have to be allocated>copied>disposed every time you exceed it's size with the insertions. And obviously, an insert (or removal) near/at the beginning will need A LOT more element shifting than the same operation at the end.
Should be easy to test. Allocate a List then start inserting items one by one at the end and timing the insertion. It should be (near) constant until you hit the limit, then it will have an "outlier" and go back to the same (near) constant on the next insertion, with the "outliers" appearing at regular intervals because of the allocate>copy cycle.
But C# is a really nasty thing when it comes to optimization. For example [y*stride+x],[y][x] and [y,x] should be pretty close, but they are anything but. And if you pick the best one (for the moment), you might get surprised if you later change your CLR and it becomes worse than before.
Or to put it simply, if performance is critical, C# is not it, write a dll/so in C++/C/somethingpredictable with the performance critical parts in it. If you can live with sub-optimal, by all means, let it all live in C# (for example [][] is not always the best, but is never the worse ;) ).
Ya if performance was truly critical, we'd probably want some sort of optimized/custom initializing and resizing logic instead of the what C# provides with its array wrapper for List, like you said.
But as I mentioned, I haven't even needed to stray from List in any project for performance reasons, as the C# hasn't come close to network times.
C# actually just released a bunch of performance upgrades with .Net 8 today! Still not the language you choose for hardcore performance, but there's some nice improvements for JIT and AOT compilation.
This was just a fun, quick experiment (similar to what The Primagen was writing) to see if I should have been considering LinkedList instead of List in my previous projects.
@@ErazerPT
@@Average-Lizard Can totally understand you. On some project we had to interface with a Sharepoint server that lived in a network that can only be described as either dysfunctional, schizo or both. Add both things and you get a slug on Valium. We gave up on optimization because we could at best shave some dozens of milis out of seconds on the external calls...
If a usecase were heavy on front insertions you could probably just go with a double ended array. Or just an array where you store the items in reverse by convention, so front insertions get added to the back
@@ErazerPT Step 1 in my plan to optimize C# applications I deal with. Use async. I've seen C# written purely synchronously more often than you'd think. Heck, I've written more synchronous C# than I'm proud of. When the upper level is synchronous, it's just easier to stay that way. Slower, but easier.
There are people out there who argue with the creator of C++ about performance? Wow.
One thing to keep in mind, for many things code readability and maintainability, while subjective, can be more important that small performance gains.
If a Map makes your code crystal clear, use it
What about smartass compilers? Let's say the compiler knows I'm doing 5 deletions, instead of deleting + compacting 5 times, it could try to do all deletions at once maybe?
Same goes for other crazy operations on vectors that it can see optimizable
or not?
That's why you always test. There may be compiler options that will speed up the performance, but they may also slow down the performance. Same goes with tuning queries in the database. Just throwing indexes at something does not always work.
Merge sort also has a good use case: external sorting. If your data set doesn't fit in memory, merge sort is a totally valid strategy.
I used merge sorting linked lists in C and it beat using arrays. Insertion sorting doesn't scale no matter how you you store the data. Binary searching your linked list will beat linear searching, he seems to overlook that. I didn't need the data sorted all the time, there are always tradeoffs.
@@phill6859 How does binary search on linked list function? You have to traverse the list to get to the checked element. Is the "check element" function that demanding?
@Kycilak maybe he meant skip-list, but anyway that guy seems delusional and needs to be taken to infirmary
@@Kycilak yes. Doing a strcmp on each item takes time and if your linked list pointers are allocated from a block they will end up in the cache. If you linearly compare all the keys then you will thrash the cache. This was 25 years ago, caches were smaller. I had plenty of people telling me I was wrong at the time. It worked and it was faster, so I just ignored them.
@@egor.okhterovnot a skip list, going to an index in a linked list is faster than strcmp linearly because of cache usage. 50% of the time you will be going in the opposite direction that you just travelled and so that is in the cache too. You can cache more pointers than you can data.
When remove an element from a doubly linked list you also have to update the next and previous links (pointers) in the next and previous location as well.
I had an application once that processed some data, correlating two lists of data by using one list as keys and the other list as a lookup table; the lists were two different representations of the same dataset. Changing that lookup list to a hash table provided a speedup of over 16000x in the biggest dataset we had of over 50k entries. Point is: time your application before you optimize; you don't know when the constants trump the asymptotics.
It's not that 0(1) is really O(C*1) - it's that O(C(n)) means that operations are less than or equal to C*f(n) for some constant C and sufficiently large x. For example, 5x^3 + 3x^2 + x + 18 < C*x^3 when x is >= 1000 and C is 6, since 6(1000)^3 > 5(1000)^3 + 3(1000)^2 + 1000 + 18, and for all similar expressions where x > 1000. It's the same concept as a limit going to infinity, how only the largest piece matters once x gets large enough. idk if this is actually what O stands for, but in my head I always say 'order of f(n)', which is what it is.
it is actually what big-O stands for. But in computer science we are used to the fact that n approaches infinity, because n is like some number of 'basic operations', so algorithms are always scaling to infinity. But to be clear, 'n' in f(n) might be some other value, even negative, or even a vector of values
wait a sec, I just realised that the insertion time that I made for my arraylist datastructure library in C is actually faster than a linked list, and probably so too because a linked list is getting a ton of cache misses. I made my arraylist in C kind of a ring buffer so that it tricks you into thinking that inserting front will actually insert at front but it actually inserts at the back, and because of some math this is O(1) and then inserting means just instantly going to the element and then either moving all elements to front or to the back depending on which path is faster. so that O(n) but my insertion can always calculate the fastest path. the only thing that would take more time would be doing a resize which is O(nn) but tbh, this can easily be prevented with my arraylist too by just having a big enough capacity for the job from the start or something like that. but then again yeah Idk if the resizing is that bad honestly.
I feel that abstracting memory away has created more problems than that it has solved 😅
it all started with new and delete
All software development is about tradeoffs. If the problems caused by abstracting memory away are net less expensive than the solutions that the abstraction provides - then it is a good abstraction.
@@youtubeenjoyer1743 GCs famously didn't exist before Javascript, right?
I mean, to be fair, the memory is already abstracted for you by the hardware.
@@gagagero It's not about gc or javascript.
The biggest reason to use std::list over std::vector is the difference in iterator-invalidation. Iterators are/maybe invalidated by inserting into/removing from a std::vector, but iterators for a std::list remain valid unless the element your iterator is pointing to was removed.
That's just a C++ thing only. In general, the best use linked lists have is to allocate a list of different size and types elements, filled with the equivalent of void pointers in C that later get a defined size and type.
12:00 Doesn't mean much when not telling which method turned out to be faster. For me it is obvious the array is, its just a tiny bit of continuous memory that can be processed at full processor speed without memory latency messing things up. If the list turned out to be faster, now that would really surprise me and I would certainly expect that to be because of some data skew making the comparison invalid.
A more telling example:
If you write a function that always returns an array of size 1000000, this function has a O(1) time and space complexity. Another function, that always returns an array of size (n) where n is the input, will be an O(N) function, but will be faster up until 1000000.
Both are O(1)
Yeah, but that doesn't mean the first one won't be faster. @@vasiliigulevich9202
@@vasiliigulevich9202depends on the precise array and implementation details. We can't tell either way a-priori
@@SVVV97 no. Allocation/initialization of array of primitives - O(1) array of objects - O(N). Easy as that, can't be misread.
@@vasiliigulevich9202 No, that's just not true and a gross oversimplification. An array of uninitialized memory is *probably* (modulo weird allocators) O(1). A zeroed array might be O(1) or O(N) depending on your allocator, OS, hardware etc.. Generating an array of random numbers is almost surely O(N).
Generally speaking the time complexity can be arbitrarily high because computing any element of the array can be arbitrarily large - and together with that we can also bump the space complexity arbitrarily high.
I believe we should use fixed size arrays by default, it's hands down the best data structure most of the time.
Majority of the time a vector (dynamic array) with a pool, SBO, and/or static buffer are just as good as a fixed size array, unless you have some reason you need to avoid heap allocations.
@@Spartan322 Pros and cons. If I ever expect the number of elements to change I'm almost always going to go with std::vector over std::array. Assuming we're talking about member variables, the memory layout between the two is pretty different, even if both are on the heap. Most of the time it doesn't mater, but that extra pointer dereference can cause cache misses. It's also a massive difference in how copy, move, and serialize/deserialize operate. With the fun part being moves are almost always much faster when using std::vector.
@@arthurmoore9488 Usually the case for which I'll consider array over vector is when doing constexpr/consteval stuff, as Apple's version of Clang still doesn't like allocations in constexpr/consteval even in C++20, template metaprogramming can get pretty gnarly with everything we have in C++20. (that's not entirely supported by any if not all the compilers) Course you need to never let allocations escape the constant eval environment regardless so you have to return fixed sized objects, but most of the time I just resort to vector if I need a contiguous container unless I need some type of stack advantage, its rarely been useful in my experience to deal with arrays aside from those cases.
@@Spartan322 Directly taking pointer to elements in an array can be very useful. Explicitly setting a hard limit on something makes stress testing trivial. Fixed length arrays are more portable, not every platform has malloc (i.e. wasm)
@@pokefreak2112 Pretty sure emscriptem has dlmalloc and emmalloc, so how would wasm not support malloc? Like guess you could set it to run without malloc, but that's not really the same as saying it does not support it, malloc is literally standard C, if it can't use malloc, its not standard C regardless, that's like saying well defined behavior does not need to follow the standard, then you're simply not using C, let alone C++. I can't think of a single platform that doesn't support malloc that can claim to support C.
Here's an even faster vector based semi-compact "list": don't move elements over, keep another vector of indexes to determine where gaps are, only fill in the gaps on specific call to do so, such as after you're done inserting/removing whatever, that will be undisputably faster than removing one element and shifting entire block of data at a time.
You can even not ever move elements anywhere at all, simply maintain second vector that holds correct permutation of first's indices, and it will still be faster than list, because when it matters, you can swap elements according to permutation before doing any heavy work.
There's practically zero real world usecases for linked lists.
Linked lists force memory lookups for every node, where a vector has all of the info in contiguous memory.
That might might be obvious, and most people also know that memory lookups are extremely slow... But often a vector's elements can fit within a cpu register which is blazingly fast, and the cpu is also blazingly fast at manipulating data within it's own registers. std::vector also has optimizations for certain removals, so many operations can be performed with a single 'walk'' of the array where that can be much harder with a linked list.
Linked lists do have use-cases, but I think it's relatively rare and it's usually a trade-off for a gain somewhere else; they seem closer to an educational tool than anything else lol.
I don't like words like "fast" and "slow" in this context. "Blazingly fast" and "extremely slow" makes me cringe even harder.
I could accept "faster" and "slower" though :)
@@egor.okhterov The difference is large enough to use "blazingly" and "extremely lol. The number I find from a Google search is cpu registers are generally 10-100 times faster than RAM, but it's kind of an ambiguous value.
@@Hazanko83 the context is king. What matters is "observable" difference. For example if I sort a file in 1ms or 100ms I will not observable the difference, so that's basically the same speed for me. If I am a web page user and a page loads faster then I blink all the speeds faster then I can move my eyelids are equivalently blazingly fast 😀
@@egor.okhterov That kind of logic only works for one-off situations. a 1ms difference compared to a 100ms difference, could mean thousands or millions of dollars in server costs for a website. Or in a game it could mean it running at 1fps or 100fps if every function is running 100x slower than it could be.
The context is real world applications vs someone's toy project.
this is why unit testing, benchmarking, and pure functions are so powerful.
you can pump out dog sh** code, and easily fix/optimize things later, in my experience.
Arrays are only faster when you have cache. Singly linked lists are faster when you don't have cache.
Lisp, a programming language of linked lists, functions best in 36 bit word size. 32 bit size is not optimal for Lisp.
Lisp, LISt Processing, treats everything as data. Functions are themselves lists of data.
Because Lisp code can be processed as data, you can write Lisp code which will write Lisp code. Something that Bjorn Strousup can't do in C++.
Christian Schafmeister, a software engineer and chemical engineer, designs 3D molecular modeling software, switched from C++ to Lisp.
He spent 10 years on his project, after hitting a development wall in C++, found Lisp, and wishing he had started his project in Lisp.
Good thing there's no computer in the wild that doesn't have cache AND has a lot of memory.
Microcontrollers have no cache but don't have enough memory to accomodate each element wasting 2+ bytes because pointers.
@@shinobuoshino5066 uLisp then. ;-D
It'd be interesting to see how the performance changes if you're using arena allocation for the linked list, or some sort of data structure that better represents the problem at hand, that'll cost you development time though and there's no guarantees that'll be faster.
it's called a std::vector or a c-style array lol. The whole point of a linked list is that you can have exit/begin pointers to 'neighboring' nodes BECAUSE you don't know where they are in memory, if you're expecting data to be allocated within the same area of the ''arena'' or to be contiguous, then you'd just use an offset on the object pointer to get to previous/next nodes(which is literally a raw array). std::vector essentially is it's own ''arena allocator'' in the sense that it manages the allocations and 'garbage collection' for you, instead of calling into your own custom memory allocator.
For example, the struct below and it's function works, and is ''valid c++''. Although I would not recommend designing anything like this lol...
struct SillyLinky
{
inline SillyLinky* get_next()
{ return (this+1); };
};
You still have to store pointers and that overhead adds up, your cache lines will have less data, and you will have more cache misses. Each insertion requires allocation, writing node data at the advancing arena pointer. Depending on the problem, your linked list nodes may become randomly spread out on the arena, and a simple traversal will cause unpredictable memory access patterns.
@@youtubeenjoyer1743 We're talking about small memory optimizations, so the size of the array might not be a concern. Making the arena larger but not so much as to allowing it to fit into a cache line could be possible. Also I'm not talking about a linked list on top of an arena in the "use this allocator instead" c++ way, I'm talking about maybe a vector with index pointers, you'll be jumping arround the vector instead of the entire heap so probably that changes things.
@@Hazanko83 You took the worst possible interpretation of what I said lol. I'm not proposing an arena allocator in the c++ way, I meant it in the sense of treating the vector like a linked list, storing pairs of (element, index of next element). This way you still have the constant search performance advantage. Would still need to test it though.
@@user-hk3ej4hk7m Vector IS a constant time, random access container though?... Which is as good as you can get... You can do exactly what I showed with that struct thing, with it being inside of a vector, without needing a pointer variable as a member or needing anything initialized(it's just a bad idea). When you use operator[] on vector, it's simply pointer arithmetic on a pointer.
After having data contiguous in memory, and increasing cache locality, clever algorithms is really all that's left... not more misdirection.
There is also the thing than CPUs have special instructions to work with arrays, strings and more higher concepts nowdays.
While they can't really do much to improve linked lists.
However, it doesn't mean lists are never useful, but i do prefer arrays for contiguous lists of data, but trees which are just multi dimensional lists are so nice for many other things.
Probably I will invent America for you, but you could store tree in array.
Look up: Fenwick tree or Heap tree
I get that array based lists are compact, but how is adding/removing from their wrapper counterparts such as vector or ArrayList/List faster? Just because they are contiguous and are accessed via an offset? I assume this means that accessing an element of the linked list through its reference/pointer member is slower overall.
I'm not really experienced enough to speak on the matter but I think it depends a bit on hardware. If you're doing embedded development for a single architecture, and the machine's concept of allocation is "Here's 64k of RAM, that's all you get" I'd rather just use arrays
Genuinely asking, for the problem in the beginning why would you intuitively reach for the linked list? It says that you remove the elements by random positions, in the LL implementation you'd have to traverse till you get to that position where as you have direct access in the vector/array implementation.
It’s because when you remove an element in a vector you have to shift every element over by one.
Here’s the original array:
std:vector = {0, 1, 4, 6, 8};
and if you want to remove 4, you have to shift over 6 and 8 so that the new vector looks like
{0, 1, 6, 8}
If not for shifting, the vector would look like:
{0, 1, GARBAGE, 6, 8}.
The reason you don’t need to create a new vector is because you just change the iterators pointing to the last element.
Because linked lists are better designed to delete elements in the middle of the list (you just relink the previous to the next and the next to the previous), whereas vectors will realocate as the elements must be contiguous in memory.
@@jaredteaches894 Thanks for the reply. So essentially it's just a tradeoff between taking the time to search the linked list to find your target and shifting the memory? Is it more intuitive because it's more ergonomic from a developer POV?
Sometimes you already have the element you want to remove, so you don't need to transverse
@@Lukeisun7 It's "intuitive" because it's often taught that insertions and deletions are efficient in a linked list, which is true; they are efficient. So, when you want to have a data structure where you plan on doing a lot of insertions and deletions, you will naturally choose a linked list.
The author is suggesting that you need to measure the cost of operations. The article ends by emphasizing the importance of locality -- both spatial and temporal. The general gist of caching is the most common case will be efficient. Things closer to X will probably be used again so hardware will exploit that. This is not necessarily true in a linked list. Nodes on a list don't have to be near each other in memory. In other words, the addresses of nodes might not be cached (within the TLB) and this extra memory references will be incurred.
The whole O(N) being actually O(C*N) is interesting and important, but when it comes to cache locality issues, it's a different issue and much worse. The average cost of a memory access is equal to the cost of a cache miss times the probability of getting a cache miss. The probability of getting a cache miss on a random memory location grows linearly with N (memory). Also, the cost of individual cache misses increases with N because the data has to be fetched from even further away. In my own measurements, years ago, the overall cost of cache misses for random jumps in memory of size N was around O(N^1.5), meaning that something like a linked-list traversal ends up being around O(N^2.5).
Ultimately, the problem is that big-O analysis assumes that all memory access whenever and wherever always cost O(1). The real world is not like that. There is a whole field of research in CS about trying to model the cost of memory accesses and incorporating that into the overall big-O analysis, but as you can imagine, it gets really complicated.
I was literally watching your 4 month old video about this mere minutes ago, since I wanted to know more about the use case of Linked Lists, what're the chances of a new upload 2 hours ago!
One other thing that rarely gets brought up is that in higher level languages with runtimes usually have highly optimized native implementations of Data Structures.
So, if your in something like Java, C#, or PHP, its basically impossible to write faster data structure than what is implemented natively without writing a language extension. This because those native implementations are usually binaries that are written in something lower level like C or C++. Those implementation have better access to low level functionality abd memory management than what you can write in these higher level languages. So use whats already available to you!
Of course, this is a different story if you're working with C or Rust.
No, it’s precisely the opposite. In these higher level languages data structures always store pointers to tables of pointers to data and functions, instead of just storing the data. All this indirection “levels the playing field” by trashing your cache no matter what your program is doing. When you remove the benefit of contiguous memory and cpu cache, suddenly all asymptotic notations become useful.
@@youtubeenjoyer1743
I stand corrected.
1:49 Whaaaat? No you dont care about other side. You take the pointer of the n+1 element and adjust the point of n-1 element to point to n+1 element, right? And let the garbaage collector sort it out.
A C++ developer definitely has the tools to think like this. I use Java which handles the memory allocation for me.
Is there a link to the video he mentioned in the beginning? About inserting random numbers into a sorted sequence?
0:35 what video ia he talking about?
this one: th-cam.com/video/cvZArAipOjo/w-d-xo.html
th-cam.com/video/cvZArAipOjo/w-d-xo.htmlsi=piPX_a1S3wOqpwFn
5:39 identifier 'count' is undef? unless its global?
Yeah, we program in a heap only languages with no worries, until we stumble on a problem where we need to build, use, and discard a highly efficient lookup structure, and suddenly memory access and method dispatch is costing you precious microseconds on operations that you need to run hundreds of per web request.
I really yearned for a C struct and a vector in Ruby a month ago. Still do.
Why do you care about hundreds of microseconds for a web page? :)
What if objects are not strings? just big enough objects, so in this case using vactor may be not so effective
The velocity comment wasn't too far off. Big O is an indicator of acceleration. A wimpy engine (high constant) will always lose because shedding ounces won't affect the acceleration.
No, it's not about acceleration.
O(n^2) is a set of functions. For example, a function f(n): 3n + 7 belongs to that set 😂
The actual lesson that everyone should draw is that BTrees are the god tier data structure, but sadly not empathized enough. The debate is not necessarily just between lists and vectors, BTrees with a node size close to the page size are often better than either. In Python I like to use the SortedContainers library for this.
(yes, my applications are often disk bound)
BTrees are only useful in databases as far as I know. Why would I use BTree for memory-only when sorted vector will outperform it in binary search anyway?
Hash code generation is what makes sets slower than arrays with small numbers of elements.
I love that prime will never highlight the end and beginning characters of a text section
12:05
It's probably because of memory locality, cache misses, something like that.
[Edit: would love to see more C++ on this channel] Advent of code maybe
I just made a linked list in c today. Guess i'll make a vector tomorrow
If your collection is already sorted, bubblesort will spank quicksort when it comes to performance. So if your array have 1-2 things out of place, and a size of $HUGE, it might very well beat qs.
Problem is you can't tell how close to sorted an array is until after you sorted it
@@williamdrum9899
True, but real data in the wild is often "almost sorted" (for a given variety of almost), so if we want to be boring there's a reason why merge-variants like Timsort are used in libraries.
9:40 Does he know about parallel merge sort?
"because we've been taught this our whole life" yes, because professors teaching software engineering have no idea what they are doing, couldn't program themselves out of a paper bag, and their previous profession was "student". Cache locality is huge, linked lists are *almost* always the devil, but always measure regardless.
09:47 but quicksort doesn't preserve order of equal elements
Why would you care?
This was great, do more like these, good coding article, showing some code etc. and making us think along with the video!
I think that even better than vector, deque would probably shine for that problem, as deletion and insertion towards the start would be slightly faster yet still very predictable
Except in most cases, std::vector beats std::deque.
The difference between continuous memory and linked memory should be explained as such: If you have a library with 30 rooms and ask people to find and move books per request: is it easier if the books are in 3 adjacent rooms that need to be accessed and moved? or will it be easier to find books (with a little note showing what room has the next book) within any of the 30 rooms that may be on opposite ends of the library?
(It's a little hyperbolic, but it should make sense why linked memory isn't always better)
Of course libraries prefer categorized shelves of continuous memory.
You can do an inline merge sort.
If in doubt, vector it out. Until such time as you can make an informed decision.
All of the various algorithms that can be applied to all of the various tables and or data structures in many cases do have overlapping complexity curves in both time execution and memory footprint. These overlapping curves or more aptly put is the intersection between said curves. When it comes to BIG O notation the first thing to remember is that it's usually represented by Worst Case Scenario, and as is mentioned in the video the constants or coefficients that are dropped from the representation. Big O is not precisely but more relatively, approximately, or generally speaking. So generally speaking, you have Constant, Log, Linear, Log Linear, Quadratic, Cubic, Polynomial and Exponential: O(1), O(log n), O(n), O(n log n), O(n^2), O(n^3), O(n^k), O(2^n) respectively and each algorithm is categorized by the rate of change of its runtime into relation with the change in its input as it grows in size and the category is always by worse case scenario. There are times where one classified algorithm on a specific data set would be assumed that it would run faster than another where one is commonly known to be say O(log n) and the other is O(n). Yet in some circumstances which could be depending on other factors, looping structure, branching with the branch predictor, data size or size of data structure within the provided container causing a bunch of cache misses, could end up causing the O(n) algorithm to outperform the O(log n) algorithm. Here the O(log N) algorithm for some reason is running at its absolute worse, and the O(N) algorithm in this same case might even be optimized by the compiler, OS, drivers, assemblers, etc, no issues with branch predictor, rarely any cache misses, it could be internally vectorized through SIMD, etc... and could end up running at an O (log N) or even better almost approaching constant time... So basing your Code on Big O Notation isn't always the most suitable. Knowing how various algorithms work on various containers with said data structures and data sets internally, and know what to expect from them in various different situations I think is more important.
For example if you know the size your data's container is never going to change, then a simple raw array will suffice but one has to be careful with invalid indexing and invalid pointer arithmetic. If you know the container's length or size is going to change periodically but not all too often then either a vector a list or set would suffice which could also depend on how you into to use or access it. If you're going to need direct element access via indexing then vector is more appropriate. If you need faster insertion and deletion rates then perhaps a list. If you know each element is going to be unique then a set. If you have a specific preferred ordering them perhaps a queue or deque such as a priority list, if you need a FIFO, FILO or LIFO, LILO, type of structure than maybe a stack or their variants. The type of data, the size of the data, the amount of elements within the container, the type of algorithms that are going to be performed on the data itself, on the data structure, and even on its container are just some of the factors in choosing the appropriate container / algorithm combo. There might be times where you have one section of code that doesn't need to be super fast and in fact you're running it in its own separate thread in parallel to another thread that is crunching through a huge loop and you want them to finish approximate at the same time. It's okay if the smaller thread isn't fast but you don't want it to end too early and waiting for too long for the threads to be joined... So maybe you choose a slower algorithm instead because you want both sections of code to complete in say approximately 50ms. So there are many factors as to why, when, where and how you might use said components. Big O might be a helpful guide at comparing various algorithms/containers but it gives you absolutely nothing in terms or regards of how and why you might use one over another. Many times those choices could be situation dependent or could be programmer preference choice... I've been listening to Bjarne Stroustrup on and off throughout the years, nearly 20 years now, and one of my favorite videos was when he was in a discussion with Jason Turner a young and brilliant Modern C++ developer, and a few others, some who are even on the committee or board for the drafting of the Language Standard.
You are my favorite tech slop ❤
Use a linked list and keep a separate binary search tree list, get the best of both worlds
Specifically with the array and deleting an entry and moving the rest up, can't that be done with a simple memmove?
Yes, but what's your point? memmove can be quite expensive
@@SVVV97 my point is that you can do the whole "remove the gap created by deleting an element" in one syscall.
yes, it will likely be more expensive than just adjusting a few pointers and free()ing some memory. But with so much cheaper traversal due to cache locality and such, you are better off with the memmove in a lot of cases, as long as the moved memory isn't too large. Amortizing is quite powerful.
The devil lies in constants
A new version of "GOTO considered toxic"?
Tried linked lists and vectors in C++. Linked list was always slower... ALWAYS. iterating is slower, adding is slower, removing was sometimes faster but considering you have to iterate to find what you want to remove... it ended up being slower. This was with containers with over a million elements. Cache misses are MASSIVE time sinks, as in theory your cpu can do around 4000 floating point multiplications (assuming avx512, 1000 with sse2, don't quote me on exact numbers though) in the same amount of time it takes to recover from level 3 cache miss. Just something to think about. And you have a cache miss per element. On the other hand, copying is really fast.
As someone who's been programming in C++ for 8 years, what's a list?
std::list is a doubly linked list implementation in standard template library
So should I be using MongoDB? Or something else.
Yes
Yes
As always the correct answer is to measure first
Finger trees are kind of dope
always measure the perf. One time I thought it would be clever to replace a hash map with a continuous sorted memory and use binary search. And for a large input size (like 1M elements) the binary search on a presorted memory is more than 1000 times slower than hash map 🤯
A hash map has O(1) access. The hash map should always beat O(log(n)). Why would an array beat a hash map?
@@iMagUdspEllr it depends on implementation of the hash-map. There are 2 kinds of hash-maps: sorted and unsorted(bucket based). Sorted maps implemented with red-black trees (with log(n) access complexity) and work best with small (
I remember the original video. It's an eye-opener.
i cant find it, whats the link ?
@@onground330 I wouldn't be able to find it either. 8 months ago is when I saw it.
Eyeppenheimer
We all know that a fully connected graph is the superior data structure.
in dutch there is the saying : " Meten is weten" = Measuring is knowing
In german, there is the proverb "wer misst, misst mist". Who measures, measures bullshit.
You mentioned that everything in JS is on the heap, but what about Python?
It's on the heap heap
@@batatanna Wut
@@heavymetalmixer91 hooray
Gotem
You can safely assume most of these interpreted languages do (almost) everything in the heap.
"For a big enough 1 and a small enough n, O(1) is slower than O(n)"
Clever, but this is wrong.
Because Big O is not a measure of speed.
This is akin to me measuring the number of lanes on a road to tell you how fast the cars can driving on it are going.
If c++ introduces ownership and borrowing, will it be great again?
It never stopped.
C++ will never be great. You have to start over and never add any bad language features like two flavours of pointers, two colours of struct, methods, private/public/protected, inheritance, constexpr, RAII, exceptions, and a ton of undefined behavior. Also the standard library is in a very sad state, but it’s the cruft that c++ will have to carry until the end.
@@youtubeenjoyer1743If you want to complain about C++ you should probably not give a list of some of its awesome features.
2:35 everytime when Prime goes technical I feel like a computer with some missing drivers.
I think the fact people who know O notation don't understand this point is ultimately because they really don't understand O notation.
You really understand what O(n) is when you also understand what o(n) and Θ(n) are.
I often disagree with Stroustrup, but he is a very smart person and this article shows that.
There are actually 3 types of asymptotic notations:
1. Big O(n)
2. Small o(n)
3. Theta Θ(n)
Each of them is a set/class of functions {f1, f2, f3, ...} that are all indistinguishable between each other when we look at them through a certain lense.
I mean, velocity is the wrong word, as it just means speed, but acceleration could work.
Velocity is a vector. Acceleration is a vector. Both are wrong analogies when thinking about O(n)
@@egor.okhterov As mathematical objects, sure, as concepts, not necessarily. O(n) really just tells you how fast the execution time grows, not what its base speed is. So in that sense, it's as good an analogy for O(n) as speed is for execution time. We use fast, speedy, quick, and other adjectives for programs, despite them not physically moving and having no measurable speed. But they figuratively move from A to B at a set speed.
@@Exilum now I better understand this perspective, thank you for clarification.
This is not how I think about it though. I am thinking of an estimate for the number of atomic operations instead of thinking about time. Speed, velocity and acceleration are all tightly coupled and based on the concept of time. The time has nothing to do with O(n) notation, but rather it is the story about problem configuration space. Surely, in the end when we are actually starting to walk across that constructed space the time manifests itself in the real world as the result, but the time could change even if the configuration space stays the same.
@@egor.okhterov Yeah, I am not saying it is the best analogy, and that's not what my original comment was meant to convert either. I pretty much just wanted to point what the chatter probably meant in chat when they said "velocity". That's pretty much why I said "could work", it's not perfect, but pretty good for teaching purposes. While it doesn't convey the structure, it conveys the relationship between the O notation and the "result" it "approximates" (a lot of quotes there).
Try sorting 1TB file without merge sort
Interesting. This article makes me appreciate the pragmatism of Lua’s design even more. It doesn’t just “use vectors (dynamic arrays) by default”, it uses them (tables) for _everything._ 🤔
yes there's a name for that... I've forgotten what it was
Tables are probably maps, which are much slower than vectors (unless they use some kind of wizardry underneath).
@@BoardGameMaker4108 Lua "table" is a hybrid of a dynamic array, and a dictionary/map/lookup-table/whatever-you-call-it, under the hood. That is why there exists separate ipair() and pair() in the first place. If you use Lua table with integer indices, it switches to the underlying dynamic array; if with keys, then a dictionary.
Lua does some interesting things to make tables that work as arrays as well as hash maps.
You start to see some of the issues with the approach if you've ever tried to use ipairs on a table that wasn't only written to using table.insert().
I actually think it'd be better if Lua had arrays and tables as separate data structures instead of the one type of table that you can use incorrectly.
@@MSheepdog, I really like the _brutishly_ simple design of Lua. It’s _so_ easy to embed, which is really what it’s meant for. I know people write entire _apps_ in Lua, but… if you’re mostly exposing entry points into another language and writing some glue code to customize policy & business logic, I think the “everything’s a table” design is reasonable. I’m aware there are pitfalls of ‘mixing paradigms’ with tables (i.e., treating the same table as both an array _and_ a dictionary)-but, the way I’m using Lua-I haven’t personally been bitten by them.
Please note that "List vs Vector" is not really a valid debate in most "modern" languages, since the default/preferred list implementation will usually be backed by an array and pretty close to a C++ Vector. like Java's ArrayList. Though the "only reach for a linked list/hash set/tree set/any specialized structure when you know you need it" advice is valid in almost any language.
And please don't use Java's Vector class, it's deprecated, slow, and should have been called ArrayList anyway.
The debate is valid in any programming language, you just need to know that “list” means linked list, and “vector” means dynamic array (well, vector by itself was clear enough).
@@YoTengoUnLCD True. I'm just targetting my fellow Javaist degenerates that'd try to use Vector thinking all lists are linked lists. :D
First TH-cam live and now Frontend Masters Workshop...Man you are so delicated into the Matrix 🥺🫂✨
There do exist fast in-place merge sorts. Batcher's odd-even merge sort comes to mind. But I would still test this with real data before using it, of course. Radix sort is still the go-to for ordering billions of values.
The amount of people in comments that misunderstand undergrad-level CS fundamentals is concerning.
You could make a few hypothesis from this observation that could make you money
Well that's one of the downsides of abstraction: people don't know how the computer actually works
Java uses merge and quick for their Aarrays inplementation
Java uses neither. See: dual pivot quicksort. (which uses parts of insertion sort and quick sort )
@@joejoesoft dual, tripple is quicksort. It also uses merge but this is not the channel to discuss specifics. 🤗
@@joejoesoft amd for the record the term is timsort. Go check the specifics if you wish to.
@@minor12828 It's been a while since I've looked at DualPivotQuicksort.java, but it now indeed does use more than just Insertion and Merge sort. A lot has changed since 2009. I stand corrected. I didn't look before the dual pivot change, but I vaguely remember it was bubble sort and single pivot quicksort at some point.
I feel heepy.
I feel unhappy
I feel eepy
The difference that you're trying to explain is between the instruction function, such as f(n)=5n^2+2n+12, which encodes the time the algorithm will run given some input (of course, length of collection is an obvious factor and the default example most times), and the asymptotic classes of functions that it belongs to. Here, the obvious answer is that f is in O(n^2), but that also means it's in O(2n^2), O((n^2)/2), O(en^5), and O(n^7+16n^3) because all that Big O means is that the function is bounded above by the function in the O multiplied by a constant k or greater (technically, this is for all n > m, for some constant m dependent upon that coefficient). All that is to say that Big O does have it's use in that moving an algorithm from O(n^5) to O(n log(n)) time complexity will make the algorithm faster over the majority of the domain, but it is still crucial for one to know what part of the domain one's input space covers whether one uses Big O, Big Omega, and Big Theta or the instruction function itself.
The advice for the rest of what appears in the video is essentially measure unknowns and don't concretize too early.
The assumption I always see with this argument is this is running for a single user/client. The entire argument relies on a specific context, but I suspect will fail when it's scaled by concurrent users/clients. Being memory greedy just means you'll hit the barrier where you're not getting cache hits faster.
I'd say front end should be fine with the "default to vector" idea. Backend should think twice and embedded should probably ignore. Everyone should be re-thinking using C++ at all, moreover.
Prove it
@@egor.okhterovI stated I suspect it, therefor I have proven that I suspect it.
0:01 Yes
There is a C++ talk about library implementation. The standard sort use a threshold and use a different sorting method depending on the number of element
This whole video is silly. Anyone who understand what is happening in a computer knows that cache hits are much slower then moving even large chunks of memory. All Bjarne is doing is pointing out that lists cause a lot of cache hits.
As for what collection to use in a program abstract the collection so you can change it later.
By who 😂😂😂?
I only disagree with Bjarne slightly. My advice is to use whichever container makes your code the most readable by default unless you have a good reason not to. And then even after that you should probably use an adaptor (which gets optimized away by the compiler) so that the code uses the readable use paradigm, but the faster algorithm. For example if a map makes logically the most sense, use a map or hashmap. If you later find out through benchmarks that a contiguous, sorted storage is better for performance, then use an adapter like flat_map so you don't have to compromise the clarity of representation of your business logic to take advantage of a faster algorithm. If you have to implement it yourself, follow the YAGNI principle and only implement the methods that you need, but with almost certainty someone has already implemented what you want in a library already.
Stroustrup: This is example is NOT about how big-O can be deceiving.
Big-O-magen: So, anyway, let me spend the next 10 minutes explaining how big-O notation can be deceiving.
I'm afraid there are some dumb people who will use this video to justify writing the most ridiculous N^(100-theirIQ) algorithm and argue ferociously that they are virtuous.
Lists are clearly the work of the devil
No
O(1) cannot be bigger than O(n). but yeah it can be slower.
edit: it doesn't matter that there is a c constant there will also be a c in the O(n) then... O(1) < O(cn)
100000 > n² + n + 1 until n is big enough, that what big O(1) can be bigger than O(n²) means.
@@MrLeDiazz 100000 can be bigger or smaller than any arbitrary n.
but O(n) < O(1) as a definition can not make any sense what you are talking about is a real number 100000 being greater than any arbitrary other number [n] .
I'm just saying that towards the end of the video there are some takes that seem to be disproving some common myth or something. there is nothing to disprove in that particular case because O(1) is always* smaller than O(n).
@@brod515 I'm not sure what your point is. Is O(1) always smaller than O(n) from the strict definition of the notation? Ok, maybe, but we're not talking about theory here. We're talking about practice, with a finite input.
The problem is that a lot of people seem to think that an _algorithm_ with O(1) runtime always runs faster than an algorithm with O(n) runtime. That is definitely not true. O-notation doesn't describe how fast something is. It describes _how it scales._ O(1) will always outpace O(n) when the input grows large enough, but if you only run small inputs the O(n) option might easily be faster. That is important to understand (!) but for some reason a lot of programmers don't.
O(1) is a set. O(n) is also a set. You cannot use < or > for these objects. What you should use instead is set operations like:
O(1) ∪ O(n) - union of sets
O(1) ∩ O(n) - intersection of sets
O(1) ⊂ O(n) - statement saying that O(1) is a subset of O(n).
That statement is true by the way.