C++Now 2018: You Can Do Better than std::unordered_map: New Improvements to Hash Table Performance

  • Published on Jun 5, 2024
  • cppnow.org
    -
    Presentation Slides, PDFs, Source Code and other presenter materials are available at: cppnow.org/history/2018/talks/
    -
    The hash table is probably the most important data structure. Because of that importance, there is a large zoo of possible implementations with design trade-offs. The standard hash table, std::unordered_map, traded off performance in order to get backwards compatibility with std::map. Which was probably a good choice, but it does leave us with a lot of hash tables that are slower than necessary, while also using more memory than necessary.
    This talk is about replacements for std::unordered_map. How they work, why they are faster, and when you should choose which. Topics include linear probing with Robin Hood Hashing, Google's new trick of using SIMD instructions to look at 16 elements at a time, and optimizations for node based containers, because they can actually be really fast.
    I will also talk about recent improvements to hash table performance. Little tricks that shave nanoseconds from table lookup times. In an environment that's already had decades worth of micro-optimizations, it's fascinating to watch as people come up with inventive new ways to keep pushing performance.
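
    The description mentions Google's trick of using SIMD instructions to look at 16 elements at a time. Below is a minimal sketch of that kind of control-byte filter, assuming SSE2 and a one-byte-per-slot metadata array; the names (kEmpty, match_group) are made up for illustration and this is not Google's actual code.

    ```cpp
    // Store one control byte per slot: 7 bits of the hash, or a sentinel for "empty".
    // Comparing 16 control bytes at once narrows the search to a few candidate slots.
    #include <emmintrin.h>  // SSE2
    #include <cstdint>

    constexpr std::uint8_t kEmpty = 0x80;  // assumed sentinel value for an empty slot

    // Returns a 16-bit mask: bit i is set if control[i] equals the 7-bit hash fragment h2.
    inline int match_group(const std::uint8_t* control, std::uint8_t h2)
    {
        __m128i group = _mm_loadu_si128(reinterpret_cast<const __m128i*>(control));
        __m128i match = _mm_cmpeq_epi8(group, _mm_set1_epi8(static_cast<char>(h2)));
        return _mm_movemask_epi8(match);
    }
    // Each set bit marks a slot whose full key still has to be compared, but in the
    // common case only zero or one of the 16 candidates survives this filter.
    ```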
    -
    Malte Skarupke
    Malte is an AI programmer at Avalanche Studios in New York. In his free time he likes to optimize algorithms. He blogs at www.probablydance.com
    -
    Videos Filmed & Edited by Bash Films: www.BashFilms.com
  • Science & Technology

Comments • 48

  • @willw2596
    @willw2596 2 months ago +1

    I'm 15 minutes in and find this to be one of the best technical talks I've ever seen.

    • @BoostCon
      @BoostCon  2 months ago

      Thank you so much for your appreciative comment!

  • @OmarChida
    @OmarChida 2 years ago +14

    One of the best talks I have ever seen.
    I quite like his scientific approach.

  • @TheOnlyFeanor
    @TheOnlyFeanor 5 years ago +15

    This was an excellent talk, thank you.

  • @amitm1157
    @amitm1157 5 years ago +3

    Nice explanation about cache hits and misses while using hash tables.

  • @conradomartinezp
    @conradomartinezp 1 year ago +2

    As long as you have a way to resolve ties in Robin Hood that depends only on the identity of the keys, e.g. the smaller key stays at its slot and the larger continues probing, the layout of the hash table after N inserts will be exactly the same, independent of the order of insertion. If ties are resolved by time of insertion, then a cluster of synonyms (equal hash value) will be ordered by time of insertion. Robin Hood does not reduce the average time for a successful search, but it gives a big reduction in the variance (even in almost-full tables) and it also reduces the expected longest probe sequence from O(log n) to O(log log n). It's nice to see that it also does well in practice 😉
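
    A minimal sketch of the insertion rule described in this comment, with ties broken by key value so the final layout does not depend on insertion order. It is only an illustration (my own names, table size assumed a power of two, resizing omitted), not Malte's implementation.

    ```cpp
    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <utility>
    #include <vector>

    struct Slot { bool occupied = false; std::uint64_t key = 0; };

    void robin_hood_insert(std::vector<Slot>& table, std::uint64_t key)
    {
        const std::size_t mask = table.size() - 1;   // capacity is a power of two
        std::size_t index    = std::hash<std::uint64_t>{}(key) & mask;
        std::size_t distance = 0;  // how far `key` has probed from its home slot
        for (;;)
        {
            Slot& slot = table[index];
            if (!slot.occupied) { slot.occupied = true; slot.key = key; return; }
            if (slot.key == key) return;  // already present

            const std::size_t resident_distance =
                (index - (std::hash<std::uint64_t>{}(slot.key) & mask)) & mask;
            // Steal the slot if the resident is "richer" (shorter probe distance),
            // or on a tie if the incoming key is the smaller one -- the larger key
            // always continues probing, so layout is independent of insert order.
            if (resident_distance < distance ||
                (resident_distance == distance && key < slot.key))
            {
                std::swap(slot.key, key);     // the evicted resident probes onward
                distance = resident_distance;
            }
            index = (index + 1) & mask;
            ++distance;
        }
    }
    ```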

  • @TheLavaBlock
    @TheLavaBlock 5 years ago +3

    Great talk! Thanks

  • @dennisrkb
    @dennisrkb 4 years ago +4

    You explained Google's map much better than they did themselves.

  • @wiadroman
    @wiadroman 5 years ago

    Great material, thanks a lot.

  • @origamibulldoser1618
    @origamibulldoser1618 5 months ago

    This is mad fascinating.

  • @user-ze6fs5ui6h
    @user-ze6fs5ui6h several months ago

    Thank you! Best talk I have seen on this topic😊

    • @BoostCon
      @BoostCon  several months ago

      Thank you! So pleased to hear that you found the presentation informative.

  • @KilgoreTroutAsf
    @KilgoreTroutAsf 5 years ago +14

    300 nanoseconds amortized sounds atrocious for a cache miss on modern hardware.
    Could it actually be 300 CPU cycles instead? That would fit the typical 50-80 ns latency for DDR4 and a 3-4 GHz CPU frequency.
    Either that, or each miss results in roughly 4 accesses to RAM, with the cost of the CPU instructions themselves being too small (~0.25 ns per cycle) to matter.
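
    Back-of-the-envelope arithmetic behind that question, using assumed figures (a 3.5 GHz clock and ~70 ns DDR4 latency; real numbers vary by machine):

    ```cpp
    constexpr double clock_ghz = 3.5;    // assumed CPU frequency
    constexpr double lookup_ns = 300.0;  // the amortized figure quoted above
    constexpr double dram_ns   = 70.0;   // assumed DDR4 load-to-use latency

    constexpr double lookup_cycles = lookup_ns * clock_ghz;  // ~1050 cycles
    constexpr double misses_worth  = lookup_ns / dram_ns;    // ~4 serialized misses
    // So 300 ns per lookup is only plausible if each lookup pays for several
    // dependent cache misses -- or if the 300 is really cycles, i.e. roughly 85 ns.
    ```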

  • @lenkite
    @lenkite 4 years ago +7

    Wonderful and enlightening talk, especially the zooms at various N levels. So std::map is great at < 400 elements! I only wish the size of the element were also shown.

    • @leleo53000
      @leleo53000 4 years ago +1

      It's using `std::map`, so the size of the element is probably 4.

    • @excitedbox5705
      @excitedbox5705 3 years ago

      @leleo53000 I think he said 8 later on, but that might not apply to that.

  • @77oxf
    @77oxf 3 years ago +7

    Why is std::map faster for fewer elements?

  • @FindecanorNotGmail
    @FindecanorNotGmail 1 year ago +1

    In the final algorithm, to evict an element inside a linked list you'd have to unlink it by adjusting the link to it so that it points to the element the evictee was pointing to, and then re-link the evictee at the end of the list. (Or am I missing something?)
    But that means that to be able to evict any element, the sum of the jump distance _to_ it and _from_ it would also need to be in jump_distances[].
    If not, you _could_ grow the table, but would you be guaranteed that it would fit _then_?

    • @jacksonallan954
      @jacksonallan954 9 months ago +1

      I haven't checked Malte's code to see how he does it, but I did implement a similar hash table. You are imagining that the jump-distance value is the jump distance from the *current* bucket to the bucket containing the next element in the chain. However, I think the jump distance values have to be relative to the chain's *home* bucket, i.e. the bucket to which all the elements in the chain belong and that contains the first element in the chain. That way, the problem you identified disappears. We can always re-link elements existing in a chain.
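
      A tiny sketch of the scheme this reply describes (hypothetical jump table and names; not checked against Malte's actual code): each element of a chain stores an index into a fixed table of jump distances, interpreted relative to the chain's home bucket rather than the previous element.

      ```cpp
      #include <cstddef>
      #include <cstdint>

      // Hypothetical fixed table of allowed jump distances.
      constexpr std::size_t jump_distances[] = {1, 2, 3, 5, 8, 13, 21, 34, 55, 89};

      // Bucket reached by jump `jump_index`, measured from the chain's home bucket.
      // With home-relative jumps, re-linking an evicted element only needs *some*
      // table entry that lands on its new slot as seen from the home bucket.
      inline std::size_t chain_bucket(std::size_t home_bucket, std::uint8_t jump_index,
                                      std::size_t capacity /* power of two */)
      {
          return (home_bucket + jump_distances[jump_index]) & (capacity - 1);
      }
      ```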

  • @ratchet1freak
    @ratchet1freak 6 years ago +6

    I'm pretty sure you can significantly improve the Robin Hood adaptation of Google's map if the approach is slightly different:
    treat the 16-element bucket as the target of the hash, instead of the individual elements within the bucket. Then the Robin Hood variant can use just 2 bits for the distance metadata (bucket distance 0-2, with 3 meaning empty) and keep 6 bits for the hash pre-filter.
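
    A small sketch of the metadata byte this comment proposes (my own packing, purely illustrative): the low 2 bits hold the bucket distance (0-2, with 3 meaning empty) and the high 6 bits hold a fragment of the hash for pre-filtering.

    ```cpp
    #include <cstdint>

    constexpr std::uint8_t kEmptyDistance = 3;  // distance value reserved for "empty"

    inline std::uint8_t pack_meta(std::uint8_t bucket_distance, std::uint64_t hash)
    {
        // low 2 bits: distance (or kEmptyDistance), high 6 bits: hash fragment
        return static_cast<std::uint8_t>(((hash & 0x3F) << 2) | (bucket_distance & 0x3));
    }
    inline std::uint8_t meta_distance(std::uint8_t meta)  { return meta & 0x3; }
    inline std::uint8_t meta_hash_bits(std::uint8_t meta) { return meta >> 2; }
    ```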

  • @Voy2378
    @Voy2378 6 years ago +16

    Benchmarking only on int keys is deceptive. If you have std::string or any other type with an expensive operator==, a ridiculously high load factor will kill you.

    • @KilgoreTroutAsf
      @KilgoreTroutAsf 5 years ago +5

      A good implementation for large or variable-length keys should never store keys and hashes together, because that completely negates cache locality and memory-alignment trickery.
      The better solution is two levels of indirection: use the hash table to store (hash, pointer) or (hash, offset) pairs, where the offsets point into a flat array or other memory pool for the keys and are just linearly incremented after each addition.
      Also, the whole point of hash tables is to use a good universal hash function that digests the object into a fixed-length string of enough bits to guarantee a very low collision probability. The final test for true equality should rarely need to happen more than once.
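
      A rough sketch of the layout suggested here (the names and the linear-probing loop are mine): the table holds fixed-size (hash, offset) pairs, the variable-length keys live in a separate flat pool, and the expensive comparison runs only when the stored hash already matches.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <string>
      #include <vector>

      struct Slot
      {
          std::uint64_t hash   = 0;      // full hash of the key
          std::uint32_t offset = 0;      // index into the key pool
          bool          used   = false;
      };

      struct StringSet
      {
          std::vector<Slot>        slots;     // capacity assumed a power of two
          std::vector<std::string> key_pool;  // keys appended linearly, never moved

          bool contains(const std::string& key, std::uint64_t hash) const
          {
              const std::size_t mask = slots.size() - 1;
              for (std::size_t i = hash & mask; slots[i].used; i = (i + 1) & mask)
                  if (slots[i].hash == hash && key_pool[slots[i].offset] == key)
                      return true;  // operator== runs only on a full hash match
              return false;
          }
      };
      ```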

    • @excitedbox5705
      @excitedbox5705 3 years ago +3

      Insert your strings into an array and use the indices as your int keys. Then when you need the string back you can use stringArray[key], which only adds minor overhead.
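
      A small sketch of that interning idea (my own names): store each distinct string once, hand out its index as an int id, and key the fast int map on that id; the string comes back with a plain array lookup.

      ```cpp
      #include <string>
      #include <unordered_map>
      #include <vector>

      struct StringInterner
      {
          std::vector<std::string>             strings;  // id -> string ("stringArray")
          std::unordered_map<std::string, int> ids;      // string -> id

          int intern(const std::string& s)
          {
              auto it = ids.find(s);
              if (it != ids.end()) return it->second;   // already interned
              int id = static_cast<int>(strings.size());
              strings.push_back(s);
              ids.emplace(s, id);
              return id;                                // use this id as the map key
          }
      };
      ```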

    • @andrewgrant788
      @andrewgrant788 1 year ago +1

      Agreed. The comparison between std::map and std::unordered_map performance for small values of N is very misleading when benchmarked with int keys. Because a std::map is a tree, std::map::find() does a lot of < comparisons of keys, which is very cheap for ints and much more expensive for strings or more complex keys. unordered_map first hashes the key to determine the bucket and then compares only the keys in that bucket. Keys can be pre-hashed to further optimize performance.
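
      A minimal sketch of the pre-hashing idea mentioned at the end (my own wrapper, not a standard facility): compute the expensive string hash once, carry it with the key, and give the map a hasher that just returns the stored value.

      ```cpp
      #include <cstddef>
      #include <functional>
      #include <string>
      #include <unordered_map>

      struct PrehashedKey
      {
          std::string key;
          std::size_t hash;

          explicit PrehashedKey(std::string k)
              : key(std::move(k)), hash(std::hash<std::string>{}(key)) {}

          bool operator==(const PrehashedKey& other) const { return key == other.key; }
      };

      struct PrehashedHasher
      {
          std::size_t operator()(const PrehashedKey& k) const { return k.hash; }
      };

      using PrehashedMap = std::unordered_map<PrehashedKey, int, PrehashedHasher>;
      ```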

  • @roman9509
    @roman9509 5 years ago

    cool

  • @Sychonut
    @Sychonut 1 year ago +3

    The content was great and much appreciated, but damn, at several points throughout it felt like a nerd fest with people trying to show off to one another how smart they are. It's just not a good look.

  • @ABaumstumpf
    @ABaumstumpf 4 years ago +6

    If only the real world were that simple - a hashmap from int to int, gosh, I would like that.
    Instead we normally have things like maps of maps of vectors and such.

    • @blarghblargh
      @blarghblargh 3 years ago +4

      His benchmarks are open source. Go prove that his map works worse for those, and show us you actually know what you're talking about.

    • @muonline2067
      @muonline2067 1 year ago

      @blarghblargh My question is: where can I download his unordered_map for testing, man?

  • @joestevenson5568
    @joestevenson5568 8 days ago

    Great talk, terrible audience

  • @zyxyuv1650
    @zyxyuv1650 5 years ago +1

    Obvious German

    • @seditt5146
      @seditt5146 5 years ago +1

      Really? Because I thought he sounded as though he had a slight Irish accent, not German.

    • @pruibiebehastoet1914
      @pruibiebehastoet1914 2 years ago

      Because of the Gründlichkeit?