CppCon 2014: Chandler Carruth "Efficiency with Algorithms, Performance with Data Structures"

แชร์
ฝัง
  • เผยแพร่เมื่อ 4 ธ.ค. 2024

ความคิดเห็น • 114

  • @coder0xff
    @coder0xff 7 ปีที่แล้ว +124

    My take away: cache locality is everything.

  • @fritsgerms3565
    @fritsgerms3565 3 ปีที่แล้ว +42

    I watch nearly all talks on cppcon + meeting. This is my favourite subject. This talk encapsulates 70% of what it is to write efficient code. to cover 70% is incredible for an hour. Best talk to date on the subject.

  • @cosmicallyderived
    @cosmicallyderived 7 ปีที่แล้ว +79

    26:16 Very good chandler,
    "Why do I bring up these simple examples, I bring up these simple examples because it is critical that you always do these things, ok. Not some of the time, not after a profile discovers a problem, you wont find a profile that points to these problems, that's part of what makes them really tricky, ok. The key here is that you want to always do these things because they don't actually make your code worse."
    Essentially we're talking about efficiency death by a thousand performance hits, microoptimiziation zealotry called to battle, I like it.

    • @infinitesimotel
      @infinitesimotel 6 ปีที่แล้ว +10

      Indeed itis the unnoticed minutia that add up in the aggregate that cause most of the drag than the one-shot-celebrity problem. Then again it doesn't show up on the radar as brightly. Lesson of life: small things cause the biggest problems. We live in an ironic universe.

    • @nickwilson3499
      @nickwilson3499 2 ปีที่แล้ว +2

      The people that constantly harp about premature optimization need to hear this

  • @soulstudiosmusic
    @soulstudiosmusic 10 ปีที่แล้ว +46

    Excellent talk and even more accessible than Mike Acton's Data-Driven design talk (also good) without being so low-level oriented.

  • @eyedl
    @eyedl ปีที่แล้ว +3

    Thanks! This talk is still very useful in 2023, though it could be 2 times shorter. The gist is: be cache-friendly, keep your data locality high, use contiguous containers. Instead of std::map you can often use std::vector + housekeeping, and sort when necessary. Lists are evil. Earlier, I saw that, in synthetic examples, when not much else is going on, the performance of std::vector and std::map is essentially the same when the sizes of the containers are below ~500. The tricky part here is the "when not much is going on". So I decided to carry out a real life experiment. Recently I wrote a lightweight node profiler for a tree of CPU-heavy nodes of a size ~400. I used maps to sort measurements by two distinct properties. It was ok, but in debug builds the performance hit due to profiling was noticeable, similar to 1-2 heavy nodes. I spent 2 hours rewriting the code to std:vector plus some additional housekeeping.The gathering and sorting of data in debug builds became ~3.5x faster, and filtering of data became ~8-10 times faster. Surprizingly, the code became 1.5x smaller. Now it is entirely possible that the first version of my code was simply shit. But anyway, now my profiler is really a lightweight one.

  • @echosystemd
    @echosystemd 7 ปีที่แล้ว +13

    I love the table of memory latency at 37:15

  • @joshramer7
    @joshramer7 5 ปีที่แล้ว +6

    Absolutely a wizard.

  • @morbo3000
    @morbo3000 9 ปีที่แล้ว +5

    No, he didn't say to simply replace map with vector. He said that a sorted vector (where find() would become a binary search) has the same functionality as map but much higher performance.

  • @sparkloweb
    @sparkloweb 9 ปีที่แล้ว +7

    "Performant" is very much a valid word in marketing pidgin.

    • @GodOfMacro
      @GodOfMacro 7 ปีที่แล้ว +1

      i'm french, it's a word in french I tought it would be a word in english too hehe

  • @paulpach
    @paulpach 4 ปีที่แล้ว +8

    24:39. That code is NOT linear. Yes, you hash the key 4 times, but you always hash it 4 times regardless of how many entries are in the cache. If it was linear you would do more work with bigger caches. That code may not be efficient, but it is still O(1)

    • @jacobschmidt
      @jacobschmidt 2 ปีที่แล้ว

      it's linear in that you have to compute it every time you call it, if you called it 10'000 times it would be 10'000•k where k is the time to perform the lookup. you can make it O(1) lookups per use by just looking it up once. pretty sloppy terminology but that's what is meant

    • @rutabega306
      @rutabega306 2 ปีที่แล้ว +5

      Yeah I don't understand why he said that. Even if he was talking about some worst case performance of the hashmap, the fix he talks about in that section doesn't change the runtime complexity...

    • @Ehal256
      @Ehal256 8 หลายเดือนก่อน

      It's linear because the lookup in the hash table, after the key is computed, is worst-case linear.
      "If it was linear you would do more work with bigger caches."
      You do, at least in the worst case :). std::unordered_map is implemented via bucketing, (compute bucket via hash, linear search over bucket) but since hash collisions are possible, you could have everything in one bucket, hence, worst case linear.

  • @copium8269
    @copium8269 10 ปีที่แล้ว +5

    Great talk as usual by Chandler!

  • @pecholt
    @pecholt 10 ปีที่แล้ว +1

    he was talking about sorted vectors and in that case you can use std::lower_bound which will be faster then map::find

  • @gast128
    @gast128 4 ปีที่แล้ว +5

    It is pity that too much constraints in the C++ specification forces std libraries to specific implementations. It seems that we miss already out out on sbo vector (like Boost's small_vector) and a hash map with faster lookup. Perhaps that one also solves the creation cost of unordered containers which can be pretty heavy due to allocations.

  • @ingeborgschmidt-huser9299
    @ingeborgschmidt-huser9299 8 ปีที่แล้ว +10

    gotta love this guy.

  • @cosmicallyderived
    @cosmicallyderived 7 ปีที่แล้ว +5

    52:36 "...we are subject to some worse is better problems in the real world. There are going to be cases where the exact opposite of everything you know or want to believe will be true. For example, bubble sort is the fastest sort algorithm, for small data sets. Despite being quadratic algorithmically and being a terrible idea most of the time." I think Chandler would do well to abstract this divergence from the ideal processor/system where the algorithmic constraints are unhindered vs the real world cache hierarchy memory locality constraints. What properties would that ideal computer have to present for the algorithmic concerns to dominate in isolation? Is it just a matter of instant random access to any physical memory? Is that the bottom line of why "everything we want to believe is true" is actually only conditionally true?
    At the university we were definitely taught in a theoretical space about complexity, and these extra notions would have obscured the theoretical points being made. It's good to have a reality check though, thanks so much Chandler.

  • @robbie_
    @robbie_ 6 ปีที่แล้ว +13

    Very interesting but I'm confused... why don't the CPP committee give us a range of useful vector-based data structures that we know will be faster in practice (given some bounds)?

  • @FougaFrancois
    @FougaFrancois 5 ปีที่แล้ว +8

    The real sentence is "Rush to idle", Copyright Mooly Eden ... ;)

  • @auffiechumi
    @auffiechumi 9 ปีที่แล้ว +9

    The compiler can do return-value optimization if you return by value. But a move, which is not exactly free, cannot be elided.

  • @Kavukamari
    @Kavukamari 7 ปีที่แล้ว +10

    i dislike camera work where they jump away from the diagrams, like the cpu one, cause I like to look longer and study (obviously I can pause though, so its kinda moot anyhow)

  • @_TheDudeAbides_
    @_TheDudeAbides_ 4 ปีที่แล้ว +4

    The comparison of java and c++ prformance he does is good but i think the problem is his conclusion that it is hard to get java to perfoem better than c++ but in reality it is a bit difficult to make a c++ program run in an optimized way. So if you are not super skilld it is a bit of a dice roll. So if you simplify the situation you could say C++ is like rolling a die with the numbers 1-20 on it and using java is like rolling a die with the numbers.. lets say 5-15 because higher amount of control also means higher risk of messing up.

  • @jacob_90s
    @jacob_90s 8 หลายเดือนก่อน +1

    My definition of "performant" would be reaching the point at which continuing to improve the performance of a program does not justify the cost it would require to improve it.

  • @ABaumstumpf
    @ABaumstumpf 3 ปีที่แล้ว +9

    There are a few funny points about this:
    Caches - at the time of the talk intel Haswell was already released and they had one of the best experimental changes Intel did in that long 4-core time: They added an L4 cache. For some applications (my work included) that was a huge speedup (~20% in many instances). Sadly they did not further pursue that.
    Java - i mostly had the opposite experience: People claiming it to be oh so slow. Kinda funny when you can then show them that it is one of the most used languages for programs running on supercomputers. Sure, Java was extremely slow in the first version where is was basically an interpreted language. And even nowadays there are scenarios were it is crawling along like a sloth but that is in specific scenarios only as it has a rather long startup-time for the JVM so starting a small program thousands of times from commandline-arguments will be really slow.
    On the other hand writing java is easy and when putting the same amount of work into writing a program in C++ and Java then the Java-Version is very likely to perform better as it tends to be easier to write java and harder to write high performance C++.
    And performance:
    Well yeah, sadly programs are objectively becoming less and less efficient and slower at completing the same objectives - generally speaking.
    More time and resources is focused on other aspects and performance is treated like "Speed? We got faster hardware" very often. Specially with smaller games that is very noticeable. When 2D side-scrollers with maybe a few docent objects on screen will not run smoothly on a 10 year old CPU while looking worse than the games that came out 15 years ago....
    I have seen extremes like Sim-City clones that have less functionality and less going on would fully utilise 1 core of a current CPU and still only compute about 10 timesteps a second.
    And at my work it is also rather strange.
    Yes, i do use std::unordered_map for a few things as it made the algorithm easier. Funnily enough i initially used a normal std::map and during code-review it was mandated that i'd switch to unordered-map for better performance..... just that the change gave me exactly 0 improvement in execution-time cause for our software most of the time is wasted on an incredible archaic and inefficient tracing-tool. Changing the format in which i traced the data (constructing and formatting the string on my own) gave me a 3x speedup.
    The whole base-system has so many problems that the execution-time of the actual businesslogic is not even 1/10th the total.
    Best thing ever was the request to improve the interface i had written for communicating with a particular hardware-system.
    It was a block of ~2500 bytes, with the first couple bytes being some base-information that was always required and some control-info telling us which fields were filled. Reading everything in one go took about 1 second. The initial request was for me to read those few marking-bytes and then only read the individual fields that were actually useful - and that increased the time to over 2 seconds cause that hardware only polled incoming requests like 50 times a second. Changing it to read it opportunistic with a large initial block reduced it to 500 ms on average.
    And of those 500 ms i checked that my code was only running for ~10 ms, 9ms were for the slow tracing,

    • @LemonChieff
      @LemonChieff 3 ปีที่แล้ว +2

      Programming languages aren't fast. (Ok, maybe Awk)
      You can write slow code in any language. In fact, it's easier to do so in C++ than any other language.
      People don't measure before they cut. They add pointers everywhere, then it's a cache miss simulator.
      Food for thought: Java can never be faster than C++, it's written in C++.

    • @jan-lukas
      @jan-lukas 2 ปีที่แล้ว +1

      @@LemonChieff Java can be faster than C++, because if that C++ optimizes the Java code well enough that it has to do less than the C++ code would have to, it's faster than the C++

  • @georganatoly6646
    @georganatoly6646 2 ปีที่แล้ว

    good talks don't age! e: reflecting more about the LL/maps points, I found myself tending towards making basically 3D arrays and providing the API as if it were a map/graph/deque w/e just because I'm lazy and it was quicker/simpler to implement but then testing it runs faster as well

  • @kamilkarwacki9590
    @kamilkarwacki9590 5 ปีที่แล้ว +2

    Well you can compare the perfrmance of your code to the theoretical peak performance of your CPU. Nothing crazy about that. TPP will tell you how many operstions your CPU is able to do per second. Linear algebra libraries will be the programs that come closest to TPP

  • @informatiquereseaux7542
    @informatiquereseaux7542 2 ปีที่แล้ว

    Very informative talk, thanks !

  • @sgsfak
    @sgsfak 4 ปีที่แล้ว +1

    Although I agree on the cache locality affect on performance, I don't really buy the argument using vectors against map and lists. The problem is that you normally need to traverse (almost) random locations in the memory in order to do anything with the objects you reference in your collection, unless these objects are actually contained in the vector itself. In every other case that you put (smart or otherwise) pointers in the vector, and you want for example to locate an object (e.g. with binary search) then cache misses will be frequent.

    • @ABaumstumpf
      @ABaumstumpf 3 ปีที่แล้ว +2

      Even in the case you are storing pointer in the vector - that is then just 1 memory access vs 2 for a list. As he explained Lists are vastly slower in NEARLY all instances, just for some specific applications they can be nice.
      If you want to search for a specific object then only if there is no heuristics you can use for your vector and you are storing them via pointer, then a list will have similar performance.
      Store the object directly, or your data is sorted, or you know that it is more likely to be in a specific range? Vector will be faster most of the time.

  • @bokwoon
    @bokwoon 4 ปีที่แล้ว +6

    26:13 accessing a map 4 times is O(n) but accessing a map once and storing the result suddenly makes it O(1)? What? If you're going to count the hashing and pointer chasing as O(n), whether you do it 4 times or 1 time is not going to change the fact that it is O(n). Correct me if I'm wrong.

    • @karma6746
      @karma6746 4 ปีที่แล้ว

      Related to that, is his 'fixed' code forgetting to create a new element in the map when it does not already exist, which the original code was doing? Or am I mistaken?

    • @itsjosh3948
      @itsjosh3948 4 ปีที่แล้ว +1

      I don't think he said it stops being linear, of course it's still linear (O(4n) = O(n)). The two things he says about it are that it's an efficiency improvement (because it does less work, which he established early on is what he means by efficiency) and that it's faster because of this. Also, I'm pretty sure the fixed code creates the new element because of the reference (just tested and it does).

    • @darkengine5931
      @darkengine5931 4 ปีที่แล้ว +2

      I think the point he was getting at is that even if a hash table lookup is considered to be constant time complexity, it is not necessarily cheap to look up something like a string key since hashing a string key requires a linear-time loop through the string's characters. I don't think he meant to imply the reducing the number of lookups by a constant factor changes the algorithmic complexity, but simply that it is flawed to write code as though it assumes it's so cheap that it can be done 4 times more than necessary.

  • @akasilov
    @akasilov 10 ปีที่แล้ว +4

    interesting talk!

  • @kozlovskyi
    @kozlovskyi 4 ปีที่แล้ว

    This speech is sooo cool

  • @keke8273
    @keke8273 6 ปีที่แล้ว +2

    40:39 Why is one cycle of a 3Ghz 1ns? Shouldnt it be 0.33333ns? How many CPU clock cycles are needed to access the L1 cache? Half a cycle? Is this correct?

    • @Bing0197
      @Bing0197 3 ปีที่แล้ว

      Maybe he made a mistake

  • @sirhenrystalwart8303
    @sirhenrystalwart8303 10 หลายเดือนก่อน

    The compulsion to return return a vector by value is very suspect. Returning by value in this case forces the user to always re-allocate a new vector, even though it's often cheaper to re-use a vector you already had, especially if the new vector will have the same size or be smaller than the old vector. But return by value prevents that, because RVO doesn't work for re-assignment.

  • @colin398
    @colin398 ปีที่แล้ว

    at 23:00 isnt it a way bigger optimization (if X is expensive to copy) to move the X's into the vector or emplace back with constructor arguments? Isnt that going to speed up the code a lot more than reserve would? I understand that reserve will make the code more efficient, but Im confused why he chose to talk about that and not this other optimization that seems more obvious to me.

  • @neo1017
    @neo1017 ปีที่แล้ว

    This is a great talk !! Do we have something similar for Python as well ? Because Python does really need performant approaches. Any talk that covers some basics and /or advanced stuff?

  • @sampajano
    @sampajano 10 ปีที่แล้ว +1

    Awesome talk.

  • @tristunalekzander5608
    @tristunalekzander5608 3 ปีที่แล้ว

    idk I tested using output pointer vs return value and in my simple case they were about the same speed with the output pointer being slightly faster.

  • @mytech6779
    @mytech6779 2 ปีที่แล้ว

    The twist is that transistor density has become so high that lighting up all the transistors at once will cause the silicon to melt in a very short time. (seconds?)

  • @AnastasisGrammenos
    @AnastasisGrammenos 7 ปีที่แล้ว +7

    what a diverse crowd! :D

  • @TimTeatro
    @TimTeatro 9 ปีที่แล้ว +1

    @ +aziz as ---- Linked lists aren't bad (in general). Traversing lists on modern hardware is slow because of the way the caching mechanism works. Because it isn't generally stored contiguously in memory, the processor will keep requiring lookups in slower memory rather than loading a contiguous block into the cache.

    • @mortvald
      @mortvald 7 ปีที่แล้ว +2

      Still traversing a binary tree is much more efficient in term of computation required, comes with a trade off but worth it if you have to search often in a massive data pool.

  • @BobbyAlter
    @BobbyAlter 7 ปีที่แล้ว

    Great talk

  • @nexovec
    @nexovec 2 ปีที่แล้ว

    23:00 Doesn't that make like 2 copies of every single thing in the vector? I thought things like this ran a copy constructor...

    • @MsJavaWolf
      @MsJavaWolf 2 ปีที่แล้ว

      You mean you should use emplace_back instead?

  • @IvanovPlamen
    @IvanovPlamen 10 ปีที่แล้ว

    Good talk.

  • @gbizzotto
    @gbizzotto 5 ปีที่แล้ว

    The missing frames are driving me nuts

  • @MrAbrazildo
    @MrAbrazildo 8 ปีที่แล้ว +1

    - So, is it always better to do all the work locally, and return a copy?
    - If a f() has more than 1 return way, it tends to be slower?
    - Never use 'std::move()'?
    - If an array/vector must have 64 bytes to fit in the L1 cache, does that mean that the array should be divided into several ones, 64 bytes each?

    • @AttilaSzenczi
      @AttilaSzenczi 7 ปีที่แล้ว +2

      Depends on your code, the more complex it is, the less likely the optimizer can make NRVO.
      Tho i think you have to work really hard for it, because the optimizer can even change the order of your object creations to make NRVO possible.
      I have to mention that NRVO is not guarantueed by the standard, only RVO is. The major compilers does pretty good job with NRVO tho.
      std::move: If you return by value and the compiler doesn't do RVO/NRVO, then it's implicitly moved. It's guarantueed by the standard. But if you write std::move it's prohibited to RVO because of the type mismatch.
      Rule of thumbs: Never use move in return statements.

    • @robinmartens4662
      @robinmartens4662 6 ปีที่แล้ว

      - If the compiler uses return value optimization, you don't actually copy anything. The actual objects will be constructed directly where the copy would end up.
      - If there is more than 1 return way, then return value optimization is less likely to be used.
      - std::move gets in the way of return value optimization, but generally moves are faster than copies (although Scott Meyers says "Assume that move operations are not present, not cheap, and not used")
      -No need to divide arrays into chunks, just keep your operations as spatially local as possible, i.e. deal with your left neighbor or right neighbor in memory next, don't jump 100,000 bytes ahead or back

  • @iyalovecky
    @iyalovecky 5 ปีที่แล้ว

    It's not amortized but expected 54:11

  • @victornoagbodji
    @victornoagbodji 10 ปีที่แล้ว +3

    great talk : )

  • @Yupppi
    @Yupppi 11 หลายเดือนก่อน

    Imagine if the data centers sold the cooling water as heating water that we're regularly spending in the northern hemisphere.

  • @himanipku22
    @himanipku22 6 ปีที่แล้ว +1

    39:10 What about L3 cache?

  • @QuickNETTech
    @QuickNETTech 3 ปีที่แล้ว +4

    I love when people say ignorant things like "[word which is indeed a word] isn't a word" and are being serious, it just shows they don't understand how language works. Dictionaries describe the usage of words by the speaker of a language, they do not determine what is and is not a word and they do not determine the objective meaning of words.

    • @ABaumstumpf
      @ABaumstumpf 3 ปีที่แล้ว

      The only difference are specific technical terms - they are defined to mean one specific thing for a reason.
      a red-black-tree is one specific thing and nothing else, but there is always the option to add new terminology.

  • @morbo3000
    @morbo3000 9 ปีที่แล้ว

    It's bad in all languages because each node is allocated separately by definition.

  • @feraudyh
    @feraudyh 2 ปีที่แล้ว +1

    I for one am guilty of using linked lists excessively in the past.

  • @Shoudibola
    @Shoudibola 10 ปีที่แล้ว +1

    One thing was not clear. If I am not supposed to use std::map, then making a std::find on a std::vector is faster than std::map find method?

    • @MrAbrazildo
      @MrAbrazildo 7 ปีที่แล้ว +1

      Yes, because it's a contiguous memory, instead of a generic linked list.

    • @jaredmulconry
      @jaredmulconry 7 ปีที่แล้ว

      Augusto César Martins Gil The short answer is that it depends.
      Depending on the time cost of comparing against your key and the number of elements stored, map::find may come out faster due to the Log(N) number of comparisons. If, however, your vector data is also in sorted order, you can use std::lower_bound for your search and best map almost every time.
      It's way more complex than I have described, so I recommend testing some of this for yourself.

  • @aleksay2142
    @aleksay2142 6 ปีที่แล้ว

    26:09 example says:
    X* getX(std::string key, std::unordered_map& cache)
    {
    std::unique_ptr& entry = cache[key];
    if(entry)
    {
    return entry.get();
    }
    entry = make_unique();
    // cache[key] = entry; //

  • @frankistcd
    @frankistcd 9 ปีที่แล้ว

    Why not use move instead of returning by value?

    • @nameguy101
      @nameguy101 8 ปีที่แล้ว +8

      Moving prevents NRVO

    • @MrAbrazildo
      @MrAbrazildo 8 ปีที่แล้ว

      +Nameguy ??

    • @PaweSkalczynski
      @PaweSkalczynski 8 ปีที่แล้ว +6

      If you are asking about that vector example: en.cppreference.com/w/cpp/language/copy_elision . TL;DR compiler is doing copy elision what eliminates copying that vector and it returns that one instance directly. Move would be better than copying, but still slower that copy-elision. :)

  • @xealit
    @xealit ปีที่แล้ว +2

    it feels a bit underwhelming. The whole talk could be condensed to 20 minutes, saying: memory access is the main bottleneck, hence cache is important and data access pattern must have the locality, use std::vector and nothing else, and call reserve when you know the size ahead. In fact, the most useful part of it is all those bits in the discussion, about the return value optimization (which "we are not supposed to focus on in this talk"!) and that std::map is actually a linked list, and that there is google's "dense hash map" floating somewhere in open source projects on the internet. Which is ironic, because everything important is implicit, like always in C++.

  • @edwardyang8254
    @edwardyang8254 3 ปีที่แล้ว

    I really don't see his rant against map. Maybe it is not useful to the apps that he's seen at Google. But to think he's seen them all at Google is no less than sheer hubris. Also his rant against std::unordered_map seems only on its implementation, not the data structure itself.

  • @47Mortuus
    @47Mortuus 4 ปีที่แล้ว +10

    "99.9999% you want to run programs faster, to save battery life"
    the games industry does NOT - I repeat - does NOT exist

    • @ladislavdobrovsky8826
      @ladislavdobrovsky8826 4 ปีที่แล้ว

      games are more about efficiency because of the added complexity and detail, if you do things efficiently

  • @KX36
    @KX36 2 ปีที่แล้ว

    but I liked unordered_map.. goddamnit..

  • @auffiechumi
    @auffiechumi 9 ปีที่แล้ว +1

    It seems to be a French word.

  • @jesuschrist7037
    @jesuschrist7037 5 ปีที่แล้ว

    Dropped

  • @brucevayding9144
    @brucevayding9144 9 ปีที่แล้ว +1

    allocation in loop hole
    allotcation in loop hole

  • @sirhenrystalwart8303
    @sirhenrystalwart8303 10 หลายเดือนก่อน

    RVO and NRVO have too many caveats to be trusted. Sometimes it's required, and sometimes it's not. Sometimes you get the optimization anyway, abut sometimes you don't. Even if the compiler optimized it away today, nothing guarantees it still will tomorrow. Nobody has time to constantly check the assembly output and there is no way to assert your got RVO. It's much simpler and more reliable to just use an out parameter.

    • @jahrgang91
      @jahrgang91 5 หลายเดือนก่อน

      In which cases besides "large" stack objects does this really apply where move semantics won't help you out?

  • @tommylebron5593
    @tommylebron5593 8 ปีที่แล้ว

    what he said

  • @givememorebliss
    @givememorebliss 8 ปีที่แล้ว +19

    I wish I could marry him.

    • @Novous
      @Novous 7 ปีที่แล้ว +32

      I'd rather just hire him.

  • @QuentinUK
    @QuentinUK 8 ปีที่แล้ว

    Making code fast.

  • @x86me75
    @x86me75 5 ปีที่แล้ว

    I think that this wouldn't apply for interpreted languages like Python, because Python under the hood is working with buckets of memory, and every data type is an object(int, str, list, tuple, true or false, etc.)

    • @ABaumstumpf
      @ABaumstumpf 3 ปีที่แล้ว

      Even for Python (which is very slow in comparison) it matters. And everything that requires performance in Python is then no longer handled by lists/buckets but very often just good old c-libraries.

    • @rutabega306
      @rutabega306 2 ปีที่แล้ว

      While not directly relevant to python, libraries like numpy and pandas are hopefully making good use of the cache (and other optimizations discussed in this talk) in vectorized functions.
      while you don't need to know this stuff to use the libraries, it helps motivate why one would use vectorization over a native for loop.

  • @fungiside
    @fungiside 10 ปีที่แล้ว +2

    The overall message of this talk seems to be that STD data structures are in general pretty terrible.

    • @babgab
      @babgab 7 ปีที่แล้ว +11

      No, this applies to data structures broadly. A linked list is still a linked list even if it isn't the standard library implementation.

  • @stephenry
    @stephenry 7 ปีที่แล้ว +21

    In summary: linked lists and red-black trees, although fundamental data structures in computer science, are terrible and should never be used for anything because "caches"... got it.

    • @DarthShad
      @DarthShad 6 ปีที่แล้ว +11

      caches are everything these days my dude

    • @MrOhioutod
      @MrOhioutod 6 ปีที่แล้ว +18

      Yes. You are correct. I'm not sure what the point of your scare quotes was -- as if the idea of cache locality is a myth? The answer, as always, is benchmark. I've run tests on my code where linear search kicked binary search's ass -- and on large collections. Cache locality is key.
      The problem isn't that our idea of algorithms is wrong. It's that the standard computational model we use to analyze algorithmic performance assumes that each operation takes exactly 1 unit of time. As the data sheets from CPUs show us, that's not true -- certain operations (computes) take 1 unit of time, and other operations (memory loads from RAM) take 100 units of time. There needs to be some branch of algorithmic analysis that fixes its computational model to take cache misses into account.

  • @davidpike766
    @davidpike766 6 ปีที่แล้ว +2

    these c++ dudes are really passive-salty

  • @SumoCumLoudly
    @SumoCumLoudly 6 ปีที่แล้ว +1

    Im looking for hot girls, is this a good place to find them?

  • @DonCorleone1234
    @DonCorleone1234 8 ปีที่แล้ว +1

    This is ridiculous, and insulting to generations of computer scientists.

    • @DonCorleone1234
      @DonCorleone1234 8 ปีที่แล้ว +1

      +Yesudeep Mangalapilly (येसुदीप मंगलापिल्ली) I am not exaggerating at all, and a lot of what the presenter is saying should be common knoweldge (which surprises that he is presenting that in such a conferece). There are applications for all the known data structures. Then, there are efficient implementations for all such data structures. Then, we have the applications we are trying to optimize. Yes, in many cases - we will only need vector (or array) because the properties of vector are suitable for most applications. Yet, once we start get a bet deeper with the applications we design, we will realize that we need good data structures other than just vector.

    • @DonCorleone1234
      @DonCorleone1234 8 ปีที่แล้ว

      And who said lists are bad? it is the way we allocate objects that is bad. The solution is not to say: "Oh lists are bad, we should use vectors", No. the solution should be in a better implementation of lists (e.g., use object pools). Oh but wait, most object pools are implemented as lists, trees, and vectors!

    • @jirihavel9766
      @jirihavel9766 8 ปีที่แล้ว +27

      Object pools don't solve the main problem, which is incoherent memory access. Hardware prefetcher doesn't like lists.

    • @babgab
      @babgab 7 ปีที่แล้ว +10

      >a lot of what the presenter is saying should be common knoweldge (which surprises that he is presenting that in such a conferece)
      Yes, it should be. But sadly, it isn't. Even the undergrads who get taught this stuff nowadays in their systems' programming courses have a tendency to forget it once they get into the "real world," or even immediately after the exam. Or they dismiss these things as "microoptimizations" - the presenter even alludes to this.

    • @cosmicallyderived
      @cosmicallyderived 7 ปีที่แล้ว +9

      I thought it was a great talk to concretely break the myth about the "ideal low algorithmic complexity == fast" in the absolute sense one is taught in university CS courses. All that only holds true under some ideal computing device assumptions not the reality of cache hierarchy and cache locality (modern processors as he refers to them as) which sound introduce factors that can wind up dominating compute cycle percentages when cache misses are too frequent. I don't recall hearing anything remotely to this at the university or maybe was too green to appreciate a subtle caveat even if it was offered. The focus was on algorithmic complexity alone in isolation. This talk carries this point some more constrained to very specific hardware th-cam.com/video/rX0ItVEVjHc/w-d-xo.html