Faster than Rust and C++: the PERFECT hash table

แชร์
ฝัง
  • เผยแพร่เมื่อ 20 พ.ย. 2024

ความคิดเห็น • 2K

  • @strager_
    @strager_  ปีที่แล้ว +207

    Click 'read more' for performance tips with timestamps 👇
    Source code of all of the techniques (for educational purposes) (warning: poorly organized):
    github.com/strager/perfect-hash-tables
    Production C++ implementation of my hash table:
    github.com/quick-lint/quick-lint-js/blob/9388f93ddfeca54f73fab398320fdcb8b3c76a66/src/quick-lint-js/fe/lex-keyword-generated.cpp
    My JavaScript/TypeScript compiler project:
    quick-lint-js.com/
    Rust reimplementation of my hash table:
    github.com/YangchenYe323/keyword-match-experiment
    Performance tips:
    04:16 1. Try it
    08:18 2. Make assumptions about your data
    16:24 3. See how other people did it
    20:08 4. Improve your data
    20:28 5. Keep asking questions
    21:17 6. Smaller isn't always better
    24:57 7. Use multiple profiling tools
    25:13 8. Benchmark with real data
    29:46 9. Keep trying new ideas
    32:42 10. Talk about your optimizations

    • @Makotom
      @Makotom ปีที่แล้ว

      Please fix this 😁

    • @strager_
      @strager_  ปีที่แล้ว +4

      Fix what? What's broken?

    • @Makotom
      @Makotom ปีที่แล้ว +1

      @@strager_ Fix your comment on top😁

    • @strager_
      @strager_  ปีที่แล้ว +1

      What is broken about it?

    • @omgtallmonster
      @omgtallmonster ปีที่แล้ว

      Is this meant to be a pinned comment? This shows up as a normal comment to me.

  • @spokesperson_usa
    @spokesperson_usa ปีที่แล้ว +1672

    I can't believe I watched a 34 minute video on hashing without getting bored and wandering off, this was an amazingly interesting talk.

    • @enisten
      @enisten ปีที่แล้ว +72

      He optimized his presentation, too. :)

    • @bittikettu
      @bittikettu ปีที่แล้ว +5

      It is because this wasn't a tryon haul. 😂

    • @enisten
      @enisten ปีที่แล้ว +3

      @@bittikettu The first rule of try on hauls is: you do not talk about try on hauls.

    • @NoxWings
      @NoxWings ปีที่แล้ว +17

      wait what? I just realized the video was 30+ minutes long because of this comment o.O

    • @Tawnos_
      @Tawnos_ ปีที่แล้ว +3

      The video was only 17 minutes long, but I watch at 2x speed ;)

  • @ijlijl
    @ijlijl ปีที่แล้ว +2541

    Extreme optimization can be an addicting game. Very cool to see it documented.

    • @voidwalker7774
      @voidwalker7774 ปีที่แล้ว +114

      you could have typed that faster ;)

    • @nextlifeonearth
      @nextlifeonearth ปีที่แล้ว +49

      My colleagues get mad if I do it, but it's so much fun.

    • @user-dh8oi2mk4f
      @user-dh8oi2mk4f ปีที่แล้ว +43

      @@voidwalker7774 u coudof usd fewer wrds:)

    • @YasserS19
      @YasserS19 ปีที่แล้ว +1

      @@user-dh8oi2mk4f ucudvusdfwrltrz

    • @Yutaro-Yoshii
      @Yutaro-Yoshii ปีที่แล้ว

      @@Zex-4729 vry tru jpns inefcnt

  • @BuRRak
    @BuRRak ปีที่แล้ว +1593

    This video was incredibly easy to watch, you had my attention the whole time. I really hope the algorithm blesses you, because you deserve it.

    • @strager_
      @strager_  ปีที่แล้ว +106

      Thank you!

    • @creaturekaspar
      @creaturekaspar ปีที่แล้ว +36

      Just appeared in my recommended, seems like a good sign!

    • @opayke980
      @opayke980 ปีที่แล้ว +14

      @@strager_ honestly the thumbnail is what grabbed my attention lol

    • @decky1990
      @decky1990 ปีที่แล้ว +2

      I’m with Burrak on this one - such a pleasant video to stay engaged with.
      Please do more videos.

    • @genehenson8851
      @genehenson8851 ปีที่แล้ว

      I think it’s happening!

  • @Pence128
    @Pence128 ปีที่แล้ว +181

    The 'c' in _mm_cmpestrc is because it returns the carry flag. What the carry flag actually means depends on the control byte.

  • @robrick9361
    @robrick9361 ปีที่แล้ว +806

    This guy is every programmer from each generation wrapped into one. I love this dude.

    • @strager_
      @strager_  ปีที่แล้ว +146

      👶👦👨🧔‍♂️👨‍🦳👴

    • @yrtepgold
      @yrtepgold ปีที่แล้ว +35

      He is the 1985 model of Dennis Ritchie

    • @Luuncho
      @Luuncho ปีที่แล้ว +49

      Not every generation.. unless he’s wearing programming socks 🥹

    • @2ndfloorsongs
      @2ndfloorsongs ปีที่แล้ว +16

      ​@@Luuncho 😁 Maybe that's why he's never shown us his toes.
      P.S. No socks, but I have an Intel 4004 t-shirt.

    • @acex222
      @acex222 ปีที่แล้ว

      ​@@Luunchocreepy.

  • @ClayMurray8
    @ClayMurray8 ปีที่แล้ว +641

    Knowing your data is such a HUGE programming concept that isn’t talked about nearly as much as it should be.

    • @styleisaweapon
      @styleisaweapon ปีที่แล้ว +46

      Its because CS education has been simplified to the point of stupidity and dogma - they are very big on commenting your code, right? - originally it was the variables and data flow that were asked to be documented and that specifically the code should _not_ be commented - it was understood that the code should merely be written in a way that documents itself (easy to read) because the code is the authority on what the code does - but somewhere between the 1980s and now, all we get is dogmatic crap from "expert programmers" that still write code with comments like "// increment the index" and "// test for null" because its really easy for a CS professor to grade junk like that, while its problematic to grade people on variable and function names and types, the actual stuff that needs to be very clear.

    • @strager_
      @strager_  ปีที่แล้ว +114

      Can you share an example of an expert programmer who writes comments like "// increment the index"?

    • @knight_kazul
      @knight_kazul ปีที่แล้ว

      @@styleisaweapon Wow, someone's really butthurt about comments out of all things.
      While i do agree that the "Clean Code" principles do more harm than good ( th-cam.com/video/tD5NrevFtbU/w-d-xo.html ) i do strongly disagree with comments being absolutely needless. I leave tons of comments in my code. If you look at the code i write it's essentially 50% comments. Why? Because i like to be nice to me and to any potential person who looks at my code because i write code once and then immediately forget what it does and why it's there. Sure i could try to understand it's purpose but you know what's easier for a human to read than source code? Human language. It's far easier for me to read a comment than it is to decipher a chunk of code with 0 comments explaining what it does. And that's how i'd like to think others experience it as well. And why should i not make my own life and the life of others easier by adding in comments? It doesn't hurt anyone and is a nice guidance for those who don't know as much as i do about programming.

    • @styleisaweapon
      @styleisaweapon ปีที่แล้ว +3

      @@knight_kazul you arent a good programmer if your implementation needs a second companion description of what the code does - meanwhile all that data you still havent documented while insisting you need to document the code - thats your problem you've been trold cant help you further

    • @knight_kazul
      @knight_kazul ปีที่แล้ว +39

      @@styleisaweapon My code doesn't NEED comments. i provide it because i want to. Because i can benefit from it later on in development when i have long forgotten about that code segment, what it does and why it does things the way it does.
      Not sure what you mean by the data that still isn't documented...

  • @nilusnilus6365
    @nilusnilus6365 ปีที่แล้ว +33

    clicked for the thumbnail, stayed for hash tables

  • @daleryanaldover6545
    @daleryanaldover6545 ปีที่แล้ว +303

    You would not really feel this is a 33 minutes video. The brief discussion in earlier sections made it easy to digest the things you talked in the upcoming sections. It's hard not to imagine you were rubber ducking while making the video because of how you explain is so relatable.

    • @strager_
      @strager_  ปีที่แล้ว +55

      I've spent years teaching people on my Twitch stream (twitch.tv/strager), so I've had practice. 🙂

    • @Walter_
      @Walter_ ปีที่แล้ว +13

      @@strager_ wow youtube thinks parenthesis are part of a regular twitch url

  • @linkertv
    @linkertv ปีที่แล้ว +139

    This guy clearly has a knack for teaching. I have not seen an explanation this clear and engaging in a long time.

    • @rembautimes8808
      @rembautimes8808 10 หลายเดือนก่อน

      It’s a privilege to watch an expert talk on his field

  • @nonchip
    @nonchip ปีที่แล้ว +599

    fun fact, the "easiest hash function being the length" is actually why PHP has so many weirdly long function names. in early versions that's actually how the hashmap for globals would work, and the developers thought it was easier (at least early one) to just come up with lots of different length names, than replace the underlying structure (which they did later anyway, but at that point didn't want to rename everything)

    • @strager_
      @strager_  ปีที่แล้ว +155

      I totally forgot about that! I would have mentioned it in the video. 😀

    • @sullivan3503
      @sullivan3503 ปีที่แล้ว +123

      Wow. That's... horrible.

    • @nonchip
      @nonchip ปีที่แล้ว +18

      ​@@styleisaweapon i mean, is there really a better option though? a _general_ purpose hashmap that is meant to be "usually fast" on 64bit systems can't really be any faster than "single 64bit integer compare" can it? *without* any application specific assumptions, that is. and also, yeah, as long as its source was 64bit there can't be any collisions (not sure what you mean by "see how hash has multiple meanings", as in "do you mean the hash function or any specific hash", but it doesn't matter either way, for either interpretation of 'meanings'. either "the identity function" which is as well-defined as you can get; or "maps things only onto itself", which, again, can't possibly collide).
      i mean yeah sure with e.g.

    • @styleisaweapon
      @styleisaweapon ปีที่แล้ว +9

      @@nonchip spoken like someone that doesnt know what properties algorithms need from their hashing function, just like those terribly wrong math nerds

    • @oxey_
      @oxey_ ปีที่แล้ว +2

      that's incredible lmao

  • @Spookyhoobster
    @Spookyhoobster ปีที่แล้ว +269

    24:38 - I nearly lost it when you showed off those numbers. It's incredible to see what someone with in-depth knowledge can achieve. Appreciate the demonstration and advice!

    • @strager_
      @strager_  ปีที่แล้ว +55

      I was also surprised how much cmov improved performance, especially after seeing Linus Torvalds' 2007 email about cmov: yarchive.net/comp/linux/cmov.html

    • @casperes0912
      @casperes0912 ปีที่แล้ว +31

      ​@@strager_ I think something that has changed since Linus' comments is that A) the throughput difference of CMOV and MOV has been reduced as pipelining has been improved.
      B) The penalty of a full branch prediction miss is greater because correct prediction is faster relative to incorrect prediction compared to 2007, and pipelining everything with cmov may still be slower than 100% perfect prediction but slower than a small number of mispredictions.
      These are guesses, but yeah - as I saw you comment in another comment-thread as well; The environment can change and micro-architectures have changed a lot since 2007

    • @styleisaweapon
      @styleisaweapon ปีที่แล้ว +10

      @@casperes0912 On the most modern cpus there will never be a branch misprediction penalty in forward conditional code like that - modern pipelines will speculate both sides of the condition - the penalty you get is instead in always using unnecessary resources which may have otherwise done something else that was useful - the best version was the lea instruction, and its objectively strictly better in every way than both cmov and the forward conditional branching - the lea instruction uses the very highly optimized addressing mode logic built into the cpu - there exists no instruction on x86/AMD64 with any objectively better performance metric or available execution units - whenever lea can do it, you probably should have used lea.

    • @magne6049
      @magne6049 ปีที่แล้ว +1

      @@casperes0912 *but _faster_ than a small number of mispredictions

  • @10e999
    @10e999 ปีที่แล้ว +78

    "Your data is never random"
    I just want to say Bravo !
    What a pleasure to watch a Data structure video from a real-world perspective!
    Subscribe.

    • @nikkiofthevalley
      @nikkiofthevalley ปีที่แล้ว +3

      ​@jankrynickyHow do you know that some program doesn't use hash tables a bunch to the point where this type of optimization _would_ help? You don't. You might not need that kind of optimization, but someone else might.

  • @Galakyllz
    @Galakyllz ปีที่แล้ว +180

    This video was truly amazing. Wow. I really appreciate all of the work you've put into this. I've always known that I could write my own assembly, but I've figured that the compiler would always optimize better than I could. Seeing how you were able to profile and optimize your code made me realize that I would have eventually come to some of the same conclusions - so I should probably at least give it a shot myself.
    Thank you for making this video. This should be a required viewing for any computer science college students at some point in their education.
    PS. I was slightly annoyed that you didn't dig into why the binary search was slower at the very beginning of the video until you kept going deeper and deeper into the different inner workings of all of these different hash algorithms and optimizations, making me realize that this is meant to be a densely-packed hash table video, not a binary search video. Thanks again.

    • @strager_
      @strager_  ปีที่แล้ว +25

      > this is meant to be a densely-packed hash table video, not a binary search video.
      Actually, it's a video on general performance tips. The perfect hash table is just a case study!
      > I was slightly annoyed that you didn't dig into why the binary search was slower at the very beginning of the video
      You might be interested in this talk: th-cam.com/video/1RIPMQQRBWk/w-d-xo.html

    • @brunoais
      @brunoais ปีที่แล้ว +16

      This kinda reminds me how I optimized a hashtable with >2M entries into 2 arrays and increased its performance 20x. It was a performance critical program which was already using 32 GB of RAM (and I reduced to 8GB!).

  • @andydataguy
    @andydataguy ปีที่แล้ว +27

    That was insane. I've never seen someone walk through such a deep optimization. Thanks so much for this video!

  • @cmilkau
    @cmilkau ปีที่แล้ว +92

    Smaller memory footprint *does* matter for performance, because caches. In this example, the table is just too small already to see that effect.

    • @skilz8098
      @skilz8098 ปีที่แล้ว +19

      It's not just the cache hits and cache misses ratio, nor just the branch predictors at length, but also data alignment and page boundaries can come into play. Other things that can become a factor can come from the OS of the system itself such as with their process schedulers, v-tables, and more... but the latter are for the most part out of the hands of the basic user because this would then involve modifying the actual underlying kernels of the system and how they manage system memory access via read/write ops.

    • @WeirdDuck781
      @WeirdDuck781 11 หลายเดือนก่อน +1

      Small enough to fit on a CPU L-cache ?

  • @osamaaj
    @osamaaj ปีที่แล้ว +8

    This video has been popping up in my suggestions for the past two months and I'm so glad I've finally watched it. I can see how these concepts still apply, even if you don't end up going down the rabbit hole and writing assembly instructions. Thanks a lot!

  • @Rose-ec6he
    @Rose-ec6he ปีที่แล้ว +95

    This is amazing! I never expected you'd be able to get more than 10x faster than builtin hashes. I didn't expect how wildly the performance would vary from modifying your algorithm, assumptions about the input data and assembly instruction optimisations. Seeing the numbers go up and up with your thorough explanations of each modification you made tickled my brain. I'll definitely be coming back here for more!

    • @CuriousSpy
      @CuriousSpy ปีที่แล้ว +11

      It is not all purpose algorithm. No need to compare them to built in

    • @strager_
      @strager_  ปีที่แล้ว +26

      Most people would use the language-provided hash table, so it's completely fair to compare my solution with those hash tables.

    • @marcossidoruk8033
      @marcossidoruk8033 ปีที่แล้ว +8

      ​​@@strager_ No. That doesn't mean it is a fair comparison at all 🤦🏼‍♂️.
      Its like comparing a Ferrari to a Fiat to get from point A to B, the fiat does the job at a fraction of the cost yes, but that comparison misses the entire point of the existence of the Ferrari.
      Just because "most" people would use for convenience the builtin hash table doesn't mean it is comparable, in fact if you think it like that your point is totally self defeating, most people use the builtin because it is easier and more convenient, one solution seeks convenience and generality and the other pointless optimization for a single case, not comparable.

    • @strager_
      @strager_  ปีที่แล้ว +14

      ​@@marcossidoruk8033 You can compare the two in terms of feature set, ergonomics, performance, and security.

    • @Rose-ec6he
      @Rose-ec6he ปีที่แล้ว +18

      I don't see the problem with comparing them. It's solves the same problem as other tools just in a way that's fastest for this use case. It's not general purpose like other tools sure but they are still different solutions to the same problem of storing data and fetching it from a large group with low overhead. The structure of one is just more coupled to the the data. It's an improvement in speed over what may have otherwise been used. Sure it's not a drop-in replacement for every project but it doesn't need to be. I'm sure it has it's tradeoffs too but what was being measured and compared here was the amount of lookups in a given time. Sure it may be an unnecessary optimisation and it won't be useful in every use case but that does not make the performance comparison in this use case invalid. The topic of this video was optimising the hashset data structure for use in his compiler. In terms of speed in his specific use case under conditions he expects, his solution was faster. Does that mean everyone should use it? No. Does it mean it's perfect for every situation? Does it mean it's always faster for every dataset? No. But that's not the topic of the video. It was not a pros and cons list for using his hashset in any program. It's how fast can he get with the data he needs to process en masse.

  • @furl_w
    @furl_w ปีที่แล้ว +38

    The other thing that I think makes calling memcmp slow is that the branch predictor gets thrown off inside memcmp because memcmp is called from other places in the program with other data; with the single character check the branch predictor has the luxury of only seeing that branch used for the keyword lookup and nothing else.

    • @strager_
      @strager_  ปีที่แล้ว +8

      I haven't looked in a while, but I thought the only branch in memcmp was a page boundary check. You might be right though.

  • @_Escaflowne_
    @_Escaflowne_ 2 หลายเดือนก่อน +2

    This is by far the most advanced optimization video I've seen and so captivating too

  • @mihailmilev9909
    @mihailmilev9909 ปีที่แล้ว +3

    I just loved seeing the bars go up and hearing about the different approaches going deeper and deeper.

  • @markzuckerbread1865
    @markzuckerbread1865 ปีที่แล้ว +80

    Amazing video, hash tables are such an underrated concept, everyone uses them but few ever mention them or how they work, let alone do that in depth, you're awesome!

    • @strager_
      @strager_  ปีที่แล้ว +6

      Hash tables are dope. 🚬

    • @spacelem
      @spacelem ปีที่แล้ว +1

      There's a talk by Raymond Hettinger on dictionaries, with an in depth look at how they've changed and improved since the 60s, and eventually returned to a very similar layout to that of the 60s. I think watching that is why TH-cam suggested this video to me!

    • @juliap.5375
      @juliap.5375 ปีที่แล้ว

      wtf? 😮 How they works everyone study from beginning. Both self-educated or university-educated, it is base knowledge in any book.

  • @bloodgain
    @bloodgain ปีที่แล้ว +19

    Wow, great talk! You're a clever programmer and a good teacher.
    We shouldn't, as a habit, go about trying to optimize everything to the last bit, but you cover exactly the way one should go about it properly: profile the performance! That goes for our larger programs as a whole, too. That is, don't go optimizing your hash function or writing your own hash table until you've determined that's where your software needs optimizing. Then when you do, iterate on that process: profile it and work inward. And of course, always check to see if there's a faster, well-tested option already available before rolling your own!

    • @strager_
      @strager_  ปีที่แล้ว +3

      > That is, don't go optimizing your hash function or writing your own hash table until you've determined that's where your software needs optimizing .
      Profiling first can be misleading. The keyword recognition code in my parser didn't show up in the profiler, yet optimizing it improved performance by 5%. (Maybe this is just a bug in the profiler?)

    • @strager_
      @strager_  ปีที่แล้ว +7

      > And of course, always check to see if there's a faster, well-tested option already available before rolling your own!
      I suggest understanding the problem well (e.g. by writing your own implementation) before picking an off-the-shelf solution. How do you know that their solution is good until you've thought about it carefully yourself?

    • @bloodgain
      @bloodgain ปีที่แล้ว +5

      @@strager_ By testing, which you'd have to do with your own solution, anyway. Though in many cases, lots of data on performance is already available unless you have a very special use case.
      In production code, there are so many good reasons to avoid writing complex libraries until you can prove they don't meet your needs that not doing so is a named anti-pattern: Not Invented Here. Much has been written about it better than I could do here.
      You're not required to agree with me, of course. I'd want a guy like you working on core libraries, anyway, and in those cases, your attitude is usually the right one. But I wouldn't want the guys working on the rest of the system trying to outsmart/out-optimize the core library team, at least not unless they've found the bottleneck and submit a pull request after!

    • @strager_
      @strager_  ปีที่แล้ว +5

      > in many cases, lots of data on performance is already available unless you have a very special use case.
      I disagree. The assumptions I can make about my data are not the same as the assumptions you can make about your data, even if we both could use a common solution (like a string->number hash table).
      > I wouldn't want the guys working on the rest of the system trying to outsmart/out-optimize the core library team
      Why not? Why can't anyone contribute to the core library? A workplace where only privileged programmers can touch the core library sounds like a toxic workplace.

    • @bloodgain
      @bloodgain ปีที่แล้ว +3

      @@strager_ You left out the second half of that sentence where I said, "unless they've identified the bottleneck and submit a pull request after". The point being if they're off rolling their own brand new solution everything because they think they "can do better" (a common attitude, and the one that leads to "Not Invented Here" syndrome), that's a problem.
      It seems like you're just looking for something to argue about. My original comment was positive and supportive, and you still looked for ways to pick it apart.

  • @leddoo
    @leddoo ปีที่แล้ว +23

    loved the intuitive explanations!
    and very solid performance advice. people often forget that you can adjust your input data to, for example, get rid of bounds checks, vectorize, or unroll.

    • @strager_
      @strager_  ปีที่แล้ว +14

      > people often forget that you can adjust your input data
      Yup. This is one problem with abstractions: they hide details! Sometimes it's nice to peel back the abstractions to see what you're *really* dealing with.

  • @endunry2
    @endunry2 ปีที่แล้ว +6

    Started this video thinking i know a lot of stuff about hashing, i mean i knew about collision etc. But i didnt just learn a lot more about creating actual hash-functions that are fitted to my data but also the whole branching stuff was insightful. Really really nice to watch thanks.

    • @strager_
      @strager_  ปีที่แล้ว +2

      I'm glad you learned something!

  • @link99912
    @link99912 ปีที่แล้ว +7

    I've been working on and off on a simulation where I could use every scrap of performance I can find. I'm not even close to the optimizing stage yet, but I find this video really inspiring. It makes me think, "Hey, maybe I can make this 50x faster", and gets me excited to think of ways to speed it up.

    • @strager_
      @strager_  ปีที่แล้ว +2

      It's fun!

  • @candrewlee14
    @candrewlee14 ปีที่แล้ว +8

    Phenomenal video! And thanks for talking about your missteps, that’s rarely talked about in videos like these. You kept viewers expectations of the optimization process much more realistic. Keep it up!

  • @KadenCartwright
    @KadenCartwright ปีที่แล้ว +13

    Excellent video! I have absolutely no use for this in my day job but I still had a lot of fun learning about it regardless. Can’t wait to see more videos from you in the future!

  • @oscarmvl
    @oscarmvl ปีที่แล้ว +7

    I agree with everyone else mentioning that this didn’t feel like a half an hour video, very (and surprisingly) entertaining and insightful!

    • @hashi856
      @hashi856 ปีที่แล้ว +1

      It especially doesn't feel like half an hour when you watch it at 2x speed

  • @denalijah6790
    @denalijah6790 ปีที่แล้ว +1

    Thanks for starting from the beginning. I didn't need all of it, but it was a nice intro to make sure everyone was on the same page.

  • @johnmccardle
    @johnmccardle ปีที่แล้ว +4

    pleasantly surprised to see Dave Churchill come up. feels like a crossover episode

  • @MisterDan
    @MisterDan ปีที่แล้ว +4

    This was excellent! I spent a solid month on hashing at one point -- the tips in here would have saved me days of time. Thank you for the knowledge.

  • @BenWills_Com
    @BenWills_Com ปีที่แล้ว +3

    Here are some things I've done to solve the same problem...
    But first, I want to say that I really appreciate this video. I work on stuff like this "for fun" a lot, and it's pretty difficult to find other folks who engage/approach these things similarly.
    1 - Instead of checking the first character, analyze all keys to find which character has the most variance. eg: if you have 32 keys in your table, maybe key[1] has more variance than key[0]. Additionally, this can help you escape your bounds check for single byte lookups.
    2 - Instead of modifying the seed to the hash funciton, create a very minimal hash function - mostly rotates and xors, maybe additions - to hash the key as quickly as possible. Then have the hash function generator run through all of the permutations (of shifts, xors, and adds) to find the smallest table generated with the fewest instructions. eg: you may have two permutations of operations that result in a table size of x, but maybe one of them doesn't need a rotate.
    3 - Trying to find the smallest table is most useful when you can fit the table into the L1 cache. Your resulting table sizes may not have fit into L1. Using the technique in #2 above, I've been able to shrink table sizes considerably. And then it's worth checking to see if a modulo/table size that isn't a power of 2 is worth the extra latency/throughput required by a non-power-of-2 modulo, which may compensate for going to L2 or L3.
    Thanks again for the video. I learned some good stuff in here, and really appreciate you putting this together.

    • @strager_
      @strager_  ปีที่แล้ว +1

      #2 is an interesting idea which I have never explored before. I never looked into making my own hash function from scratch; I always relied on someone else's work.

    • @strager_
      @strager_  ปีที่แล้ว +1

      #3: At the end, here's the space usage when I used a separate table for the strings:
      Hash table: 4 bytes per entry * 512 entries = 2 KiB
      String table: 420 bytes
      Total data size: < 3 KiB
      Code: github.com/strager/perfect-hash-tables/blob/0569f3580499036ad3372e3cda6c4e015fa006a8/generated/pht--table-size-pot_--characters-15_--hasher-lehmer128_--string-compare-ptest_--cmov_--keyword-size-in-entry_--no-early-bounds-check_--string-table.cpp
      The data all fit in L1 cache, which is probably why shrinking the table didn't improve performance (28:03).

    • @BenWills_Com
      @BenWills_Com ปีที่แล้ว

      @@strager_ Ah. Yeah, that will definitely fit in L1.

  • @G-3-A-R-Z
    @G-3-A-R-Z ปีที่แล้ว +3

    THIS IS SO HELPFUL. Thanks. This really helps us self-taught programmers like crazy. It's always good to learn about fundamentals like this, that are useful for many languages. Kind of opens your mind to what is possible.

  • @devcybiko
    @devcybiko ปีที่แล้ว +1

    brilliant video… This is one of my favorite interview questions when I seek new programmers… I start them with a simple on ordered array and asked him to search it and we walk from linear search to binary search to binary search trees to hash tables… Most understand how to use hash tables very few understand how they work… I hope more programmers watch this video and understand how hash tables work on the inside… Because there's a difference between knowing how do use something and knowing why it works

  • @jomohogames
    @jomohogames ปีที่แล้ว

    Instant subscription. This type of content is why I still love TH-cam. You packed years or even decades of experience into a short and densely packed lesson that I'll remember probably for the rest of my life. Thanks so much!

  • @verybasic1836
    @verybasic1836 ปีที่แล้ว +5

    I truly love your presentation. It's very efficient, just like your code lmao
    Some helpful visuals go a long way in illustrating concepts and keeping attention. Your video not only taught me about hash tables, but also about presentation. GJ!

  • @vkottler
    @vkottler ปีที่แล้ว +6

    Customizing / designing hash-table interfaces based on contextual knowledge about the underlying data is a great topic. Very true that such an intuitive technology (mapping some key to some element) is often overlooked and not well understood.
    Awesome video!

  • @ZelenoJabko
    @ZelenoJabko ปีที่แล้ว +119

    The problem with doing all this is that possibly with the next version of the compiler, different operating system, or a new model of CPU, any of these optimizations might backfire (actually cause performance loss).

    • @strager_
      @strager_  ปีที่แล้ว +119

      Yup. Maybe Zen 4's cmpestrc isn't so slow!
      But I doubt the new hardware and compilers would make my solution slower than C++'s std::unordered_map.
      If a compiler makes my code slower on the same hardware, that's probably a fixable compiler bug.
      You bring up a good point though. If I *really* cared about the performance, I should keep my benchmark suite around for the future to check my work when the environment changes.

    • @peterfireflylund
      @peterfireflylund ปีที่แล้ว +8

      Just having the compiler put the basic blocks in a different order or just having the linker out the sections in a different order can change the performance. Different layouts of the stack and heap can do the same.
      Emery Berger has a great talk about it + he co-wrote a paper “STABILIZER: Statistically Sound Performance Evaluation” about it.

    • @criticalhits7744
      @criticalhits7744 ปีที่แล้ว +5

      The compiler might slow it down a bit, but unless there's a compiler bug you're never going to lose the order of magnitude in performance that you gained. And more importantly, you should build for your target platform, not some theoretical future platform.

    • @ZelenoJabko
      @ZelenoJabko ปีที่แล้ว +8

      @@criticalhits7744 there is no target platform, people will be compiling and running the code on hundreds of different processor models, maybe even different compilers (GCC, clang, ICC). Next, I am not arguing all optimizations would go bad, just some. For example, in this video, he has made 10 optimizations that were all beneficial. In some other future scenario, 7 might still be beneficial, 1 becomes neutral, and 2 harmful (worsening performance). You have to countuously keep testing each in isolation

    • @criticalhits7744
      @criticalhits7744 ปีที่แล้ว +1

      @@ZelenoJabko Sure, that's kind of what I mean though, youll always want to benchmark at least on the compiler *you* are shipping with, assuming you're shipping software and not an SDK/API. If you're shipping software you'll always want to know what your range of platforms/environment s are now, and not necessarily consider what *might* change for the platforms/environments in the future.

  • @TheXenoEnder
    @TheXenoEnder ปีที่แล้ว +4

    As someone who has had the joy of micro-optimizing workloads before, this was a fun journey to take with you!

  • @birenpatel1838
    @birenpatel1838 ปีที่แล้ว +1

    Excellent video & an excellent teaching style. I went down the same rabbit hole a couple of years ago (but for database joins, not compilers) and ultimately switched back to gperf from simd because my laptop would get uncomfortably hot on my lap!

  • @m.minkov
    @m.minkov ปีที่แล้ว +5

    This was awesome! This video format was really informative, it's like walking through the socratic method for performance.

    • @strager_
      @strager_  ปีที่แล้ว +2

      > it's like walking through the socratic method for performance.
      Thank you!

  • @magicpigfpv6989
    @magicpigfpv6989 ปีที่แล้ว +6

    I think perf tip 6 is why the karatsuba method is more performant than typical matrix multiplication approach taught in school

  • @bustamove361
    @bustamove361 ปีที่แล้ว +10

    This is some high-quality content. Thanks for taking the time to explain it so clearly

  • @weiSane
    @weiSane ปีที่แล้ว +1

    Thanks to TH-cam for recommending this video. I live for these technical deep dives. This video was exquisite 😂.

    • @strager_
      @strager_  ปีที่แล้ว +1

      You're welcome! I'm glad you liked it.

  • @jamessteiner5023
    @jamessteiner5023 ปีที่แล้ว

    Subscribed. Rare combination of a pleasant voice, good recording quality, lively delivery, and concise, fluff-free information. MMM.

  • @DarenC
    @DarenC ปีที่แล้ว +24

    Loved this. Reminded me of the days when I was a C programmer back in the 80s/90s, rather than just a PHP scripter like I am today 🤣

    • @dutchdykefinger
      @dutchdykefinger ปีที่แล้ว +1

      i actually started with php, then went to java/c to learn about typecasting lol
      nowadays i even do more java/ecmascript than php, but i still have my php CLI toolkit for all kinds of shit,
      i refuse to use python lol

    • @JacobAsmuth-jw8uc
      @JacobAsmuth-jw8uc ปีที่แล้ว +6

      Writing PHP in 2023? That's brutal

    • @tehbing
      @tehbing ปีที่แล้ว +4

      @@JacobAsmuth-jw8uc Still a good language for fast prototyping and general-purpose web development! Lots of great changes have been brought into the language in the past few years. The usefulness of every tool depends on the task at hand, but maybe you had a really bad experience with PHP. Can't fault you for that :)

    • @pepegrillo9722
      @pepegrillo9722 ปีที่แล้ว +3

      @@JacobAsmuth-jw8uc Still talking down on PHP in 2023? That's so 2010 😆

  • @0xCAFEF00D
    @0xCAFEF00D ปีที่แล้ว +8

    This format is excellent. I learned a lot thank you.

  • @workingjubilee
    @workingjubilee ปีที่แล้ว +29

    You should do comparisons against Rust's FxHashMap. The reason SipHash is used as a default is because we assume it might be in a position to suffer a HashDoS attack, but everyone using Rust who wants to actually make their HashMap go fast, including the actual Rust compiler, uses FxHash or another simpler hash function as their hasher.

    • @strager_
      @strager_  ปีที่แล้ว +15

      I did make a custom hash function. I plugged it into C++'s std::unordered_map to show an improvement (8:02). Switching the hash function is something both Rust's and C++'s hash tables support, so I just picked one to demonstrate the idea.
      The point of this video wasn't to make Rust's HashMap fast. It was to show how to think about improving performance.

    • @kallehed6330
      @kallehed6330 ปีที่แล้ว +8

      This is very correct. Would be interesting to see C++ unordered_map compared to Rust with a better hash function.

  • @quexpix
    @quexpix ปีที่แล้ว +2

    First video of yours I've seen and it's an instant subscribe. Highly technical information presented in a very clear and understandable way, with no filler, and plenty of food for thought. Thank you.

  • @20dareason09
    @20dareason09 ปีที่แล้ว +1

    This simply amazing! From brainstorming to presentation and everything in-between. Thanks for sharing your hashing journey.

  • @yapet
    @yapet ปีที่แล้ว +6

    Nice vid 👍. I remember a talk by Chandler Carruth at cppcon, where he brought a skylake 2U server with him onto the stage, and did a bunch of perf shenanigans, trying to optimise the code at hand.
    He works on LLVM team, and spent a lot of time trying to make clang produce that cmov. In the end, he made compiler emit branchless code, and it performed worse!
    Turns out the relative performance of cmov vs branches is highly dependent on CPU architecture (in his case, it was a skylake machine). Compilers use a bunch of heuristics and just setting -march=native is enough to make clang emit a branch on the same codebase (which was faster for that case on that micro architecture).
    All that got me wondering about what different instructions are better generated for newer / older CPUs. This can be pinned down if you are running on the known hardware (servers, consoles, micro-controllers), but something like compilers run on everything. Looks like there is a point in gathering stats to determine what is the most common -march for the majority of users.

    • @strager_
      @strager_  ปีที่แล้ว +2

      You make a great point about cmov's performance being CPU-specific!
      I did try -march=native on both of my machines (x86_64 (Zen 3, 5950X) and AArch64 (Apple M1)). Clang generated cmov (csel) consistently for AArch64, but neither GCC nor Clang generated cmov consistently for x86_64. I did not test on older or newer architectures.
      All of the benchmarking was done on a 5950X on Linux with GCC (I think version 11).
      One thing I did not research (but should have) is using profile-guided optimizations to see if that would help compilers generate cmov. The compilers would be able to see the lookup hit rate. I suspect they would still try to branch on x86_64 though...

    • @JohnDlugosz
      @JohnDlugosz ปีที่แล้ว +2

      There's also the case of finding the sweet spot. E.g. the shipped exe runs on Haswell or newer, and if you really want to run on old hardware you can get an alternative compiled binary.
      That is, at some point the new instructions and optimization assumptions make a critical difference, and beyond that it's just incremental improvements.
      But compiling for the absolute worst case of old hardware will perform rather poorly on modern hardware.

  • @AlanTheBeast100
    @AlanTheBeast100 ปีที่แล้ว +5

    You are one of the best programming presenters I've ever run across.

  • @slimhazard
    @slimhazard ปีที่แล้ว +15

    31:00 on the optimization of checking the first character. Calling a function has some overhead, just to push registers, jump to the address, clear some stack space, pop the registers again when it‘s done, and so forth. Also memcmp() in particular does what it can to do wider compares, which is almost always a big win, but since the data isn‘t necessarily aligned for that, it does some byte-for-byte compares at the beginning and the end of the memory region - the wide compares tend to be in the middle. The one-byte comparison for the first character is a single instruction, so when the first character differs, all of that overhead can be avoided. It‘s a win when the first character differs most of the time, less so if they‘re frequently the same.

    • @_fudgepop01
      @_fudgepop01 ปีที่แล้ว +2

      Oh so I was right! Nice!~
      However, I looked it up and found that in some compilers (gcc) memcmp is recognized as a compiler “builtin” that is inlined if it can be, so I’m still wondering if it could be something else

    • @gianni50725
      @gianni50725 ปีที่แล้ว

      slight correction, on x86_64 you shouldnt expect any registers to be pushed or popped since the arguments will be stored directly in the registers

  • @mctielpresidente
    @mctielpresidente 8 หลายเดือนก่อน

    Wow. I learned so much from this vid that I didn't know I NEEDED to know about hash maps. I went from hashes and hash maps being more advance topics slightly outside my current knowledge in Comp Science to feeling like I know the subject inside and out.
    I've even learned a lot about assembly and compilers while watching this random vid that came across my timeline!
    You, sir, are an excellent teacher. You have gained yourself a subscriber, Sensei! 🙇‍♂️

  • @artificemm
    @artificemm ปีที่แล้ว +4

    Wow! this was the perfect summary of all your work on stream, easy to follow and really clear and had me to question a few things about my code. I appreciate your hard work Strager, let's celebrate with a cold mexican coca cola. Cheers!

  • @ruffianeo3418
    @ruffianeo3418 ปีที่แล้ว +8

    One idea might be: You could do your thing for each group of same-length keywords. One benefit might be, that the tables for each length become very small and the search for the hash functions becomes much quicker/better.
    A variation of that idea could be to "seed in" the length of the symbol you want to test if it is a keyword (as if it were 2 extra characters in front or something along that line).
    "let" becomes "\3let" is what I mean with the last paragraph. Compare to the (e.g. in Common Lisp) "let*" vs "let" "\4let*" vs "\3let".

    • @strager_
      @strager_  ปีที่แล้ว +4

      > You could do your thing for each group of same-length keywords.
      This is a great idea! I actually had this idea but wrote it poorly at 30:00.

    • @strager_
      @strager_  ปีที่แล้ว +4

      > A variation of that idea could be to "seed in" the length of the symbol you want to test if it is a keyword
      I actually did try this, but only picking characters was easier to explain for the video and performed faster on average.

  • @1Adamrpg
    @1Adamrpg ปีที่แล้ว +15

    Idea for your perfect hash table:
    Make a separate hash table for 1-character keys and 2+-character keys, and use a branchless approach (or mask) that selects the appropriate value. This of course involves 2 different approaches, but could avoid the memory waste, and an even more optimized table for 1-character keys.

    • @strager_
      @strager_  ปีที่แล้ว +9

      I didn't have any one-character keys, but if I did, this sounds like a reasonable thing to try.

  • @nooan79
    @nooan79 ปีที่แล้ว +1

    Fantastic and addictive video with plenty of ideas. The burp was a nice touch too. You've gained a new subscriber !

  • @DDlol01
    @DDlol01 ปีที่แล้ว +2

    I was playing a game ... and i did not notice I stopped playing and loosing to a boss until I died, while watching.
    This video helped me ALOT. Thank you!

    • @strager_
      @strager_  ปีที่แล้ว +1

      🤣
      At least you weren't driving a car!

  • @jahbini
    @jahbini ปีที่แล้ว +3

    Love this stuff. Mind candy with extra goodness. For the limited domain of a fixed set of keywords, I might look at all two letters in the set, select the two that have maximum entropy and use those two as first guess. If you have a pair that give no collisions for the set, speed ensues.

  • @TheWumbologistt
    @TheWumbologistt ปีที่แล้ว +7

    Incredible video, didn’t even realize it was a 34 minute video until it was over. While explaining the hash table implementation, you also implicitly made a compelling argument for taking assembly seriously. I learned a lot. Thank you!

  • @JoshuaMaciel
    @JoshuaMaciel ปีที่แล้ว +6

    Fantastic video! Even though I don't know much about low level stuff you made it very easy to understand and follow and got me even more motivated to study more

    • @strager_
      @strager_  ปีที่แล้ว +2

      > got me even more motivated to study more
      That's what I like to hear!

  • @grandpotato
    @grandpotato ปีที่แล้ว

    I love your tip "winners don't give up" is one of your last ones.
    I love it more that it is (coincidentally?) symbolised with the dimming outside light as the video progresses. It does feel like you had to struggle through the process.
    Great video

  • @broccoloodle
    @broccoloodle ปีที่แล้ว +2

    Never expect this kind of details on a free platform like youtube

  • @silentdebugger
    @silentdebugger ปีที่แล้ว +4

    Oh man, I am actually deep in this middle of high-performance hashing right now (to build an internalizer that maps known strings into unique integers, very similar to what you're doing). My solution currently looks like this:
    Rather than hash the entire string and index into a linear hash table, I constructed a decision tree. It's a lot like the game Guess Who: "Does your person have glasses? Does your person have long hair? Is this your person?". Each node in the tree would represent a pivot decision such as "is string length < 5?", "is char[3] greater than 'h'?". You would then iterate down this tree - which if constructed ideally the lookup would be O(log 2 N) - and at the end of the line the tree would point to some char data "Is this your string?", which you would strcmp on, in addition to the unique ordinal for the string which is your result. Works incredibly well for longer strings and skips the need for a hash function entirely.

    • @strager_
      @strager_  ปีที่แล้ว +1

      How big is your data set?
      I can see your solution being more memory-efficient with large data sets, but I suspect the pointer chasing is expensive with small data sets like mine.

    • @silentdebugger
      @silentdebugger ปีที่แล้ว +2

      @@strager_ about 500 strings with lengths between 1 and 30. I figured if the tree is packed into a linear struct array then the cache locality makes iterating down it fairly fast. And if the data set included much longer strings, my approach should be even faster as it only needs to compare a few characters rather than hashing the entire string.

    • @strager_
      @strager_  ปีที่แล้ว +2

      > my approach should be even faster as it only needs to compare a few characters rather than hashing the entire string.
      My approach doesn't hash the entire string. I only hash the first two and last two characters. (This specific approach might not work for your data, but there might be a similar approach that does work.)

  • @talhaakram
    @talhaakram ปีที่แล้ว +3

    I am so happy I discovered this channel. This is such an awesome video, I really learned a lot, Thank you!

  • @blank001
    @blank001 ปีที่แล้ว +6

    Awesome content, really liked it but way more than I can process in one go.
    This is the first youtube channel that I have subscribed in just one video.

  • @AnimeKing-m2t
    @AnimeKing-m2t 3 หลายเดือนก่อน

    TH-cam is full of so much good education and learning material that we can be masters in a subject, just saw this video and feel AMAZED by the way this instructor is teaching...🙂

  • @fredesch3158
    @fredesch3158 ปีที่แล้ว +1

    I don't know why but it was so entertaining seeing your thought process, and I even learned some stuff. Very good content

  • @ronanru
    @ronanru ปีที่แล้ว +16

    WHAT A THUMBNAIL WOW

  • @ThomasRobinson-k7c
    @ThomasRobinson-k7c ปีที่แล้ว +15

    A man in a faded microsoft T-shirt with his Christmas tree out in March wants to teach me algorithms? You best bet I'm going to listen.

  • @wyfyj
    @wyfyj ปีที่แล้ว +5

    Glorious thumbnail

  • @MickGardner-vc4us
    @MickGardner-vc4us ปีที่แล้ว +1

    I'll be stealing this lol. Some really cool math behind it as well. Thank you for your work strager!

  • @phitc4242
    @phitc4242 ปีที่แล้ว

    probably one of the most concise video on youtube about hashing... very cool and interesting

  • @hirisraharjo
    @hirisraharjo ปีที่แล้ว +4

    Now this is a quality content! Thanks a lot for sharing your knowledge with us!

  • @czolus
    @czolus ปีที่แล้ว +3

    Wrt to the speedup by looking at the first character: You're avoiding both the call overhead (if not inlined), and the memcmp loop setup/teardown (even if inlined) in the typical case

  • @RogerValor
    @RogerValor ปีที่แล้ว +3

    slow clap, thunderous applause, awe, top, i loved the burp.

  • @newmonengineering
    @newmonengineering ปีที่แล้ว +2

    Cool video. I have done some of these methods you mentioned on many slow programs. If there is a lesson I learned, it is stop, think about it and try another approach. There are so many different ways to approach the same issue in programming, but certain ones definitely are way faster. I made my own encryption concept and stumbled on performance issues the same way. But my latest version is waster than most off the shelf encryption algorithms. You have to keep trying and look at other people's code as well to some things so you can get some inspiration. So many times I have seen a code snippet and said wow that is a unique approach. I think regularly reading some people's code snippets is a great way to find inspiration. Especially if you look at code that fixes performance issues.

  • @RupertBruce
    @RupertBruce ปีที่แล้ว +1

    Thanks for the profiler tips and a very enjoyable exploration of the optimization game. I want to go work on my hash function now! I love that hash seed search trick!

  • @EraYaN
    @EraYaN ปีที่แล้ว +4

    It also a lot of fun to try profilers like coz and tools like stabilizer (although this one has maybe one fork left that is somewhat active?). But it can really help you not run into dead ends because of random layout changes.

  • @sarahclark9256
    @sarahclark9256 ปีที่แล้ว +8

    One nit: JavaScript’s Object shouldn’t be used as a general-purpose hash. You don’t get the performance guarantees and keys are limited to 7-bit ASCII. That’s why we have the Map class now. (Edit: The key limitation is if you have to run on ES2015 and earlier, which is surprisingly common if you’re shipping to a global audience.)

    • @strager_
      @strager_  ปีที่แล้ว +14

      > You don’t get the performance guarantees
      It's JavaScript... 🤣
      > keys are limited to 7-bit ASCII
      No, they aren't:
      $ node
      Welcome to Node.js v18.6.0.
      Type ".help" for more information.
      > {'\u2603': true}
      { '☃': true }

    • @sarahclark9256
      @sarahclark9256 ปีที่แล้ว +2

      @@strager_ True, I sometimes have to write code that works with javascript before 2016 (which is why the ASCII restriction on Object keys - your example would fail there.) Re: runtime guarantees, the Map specification requires implementations to run in sublinear time relative to the number of keys, so not necessarily O(1), but less than O(n).

  • @dominickrass8473
    @dominickrass8473 ปีที่แล้ว +7

    Bro that Thumbnail 💀💀💀

  • @right_jehoon
    @right_jehoon ปีที่แล้ว +1

    I didn't make a single blink while watching this god-like presentation video.
    Thank you so much for sharing this helpful information and tips!
    you should deserve mooooore(not moore lol) subs :)

  • @tylovset
    @tylovset ปีที่แล้ว +1

    Hats off for the engaging presentation and the well paced progression in the video.

  • @fantasdeck
    @fantasdeck ปีที่แล้ว +10

    How about Gödel numbering, sorted and compressed down? I feel like it would generate massive numbers, and so the real challenge would involve the compression bit.

    • @strager_
      @strager_  ปีที่แล้ว +9

      I only learned about the Gödel numbering trick yesterday (by coincidence; unrelated to hashing).
      I suspect it won't turn out well (because it would require multiplying four possibly-big prime numbers), but I didn't try it, so maybe I'm wrong!

  • @pdxholmes
    @pdxholmes ปีที่แล้ว +8

    Fun project, with some great insights, but I feel like there is a bit of "don't let perfect be the enemy of good enough" going on here. As a matter of pure learning and fun, it's great....but for most projects people should just use gperf until they've truly squeezed every ounce of optimization out of other parts of the code. Especially for newer/younger engineers, it's so easy to get caught down a rabbit hole like this and not realize you've spent precious time optimizing for 5% because you just had to replace gperf, instead of optimizing for 20 or 30% in parts of the code that are actually unique to what you're writing and where there is no "good enough" commodity solution in play. If you've truly squeezed all the juice out of the lemon elsewhere, then sure, go around replacing the "good enough" solutions with "perfect" ones.

    • @strager_
      @strager_  ปีที่แล้ว +3

      > for most projects people should just use gperf until they've truly squeezed every ounce of optimization out of other parts of the code
      I got a 4-5% speedup over gperf with a week or two of work. I'd say that was worth it. (Also I learned some things along the way.) I think keyword recognition is one of the bottlenecks that matter in my program.
      Some of my techniques could be incorporated into gperf too.
      > Especially for newer/younger engineers, it's so easy to get caught down a rabbit hole like this
      Yes, this is a problem. However, engineers need to grow an understanding of algorithms and how their machine works. They need to practice their optimization muscles on easy, self-contained problems, otherwise they can't tackle harder problems.
      The point of my video was not "drop down to assembly as soon as possible". The performance tips I listed are not specific to assembly, and most apply to far-from-the-machine languages like Ruby and Scala.

  • @BhargavaMan
    @BhargavaMan ปีที่แล้ว +5

    I instantly subscribed for the thumbnail butts are the coolest

  • @oglothenerd
    @oglothenerd 9 หลายเดือนก่อน +2

    I am making a compiled programming language with the safety of Rust, and the absurd control of Zig, and I am taking notes! Thank you!

  • @MrHaggyy
    @MrHaggyy ปีที่แล้ว +2

    Great video. For creating a hash table, or in performance in general, it's worth knowing how many entries you will have. A smaller table will usually be faster, in embedded it's crucial to hit the memory limitations, and on CPU's it helps with cashing.
    Length is such an important piece of information. In string compare, if the length is not equal it can't be the same, if the first letter is not the same it can't be the same.
    The holy grail would be if you can proof that several (if) checks, like length, first letter, last letter are sufficent. Then you don't need any memcopy at all.

  • @kennethbeal
    @kennethbeal ปีที่แล้ว +3

    Blast from the past: "Hash table? That's where I store my named pipes!" :)

  • @rakeshravindran3147
    @rakeshravindran3147 ปีที่แล้ว +3

    Hey, just wanted to say, hands down, one of the best deep dives I have seen in over 12 years. Awesome work man. Looking forward to this channel.
    In the “Shiftless table indexing” I think, the reason why you don’t see much difference after removing 2 LEAs is that most likely that the pipeline bottleneck there is determined by the LDDQU (128 bit load). The pipeline is already pinned by dependency on RAX by the ADD (add rax, rdx) instruction. So i think question is, can the ADD finish computing RAX and the load use RAX immediately ?
    My guess is that, if you add some pipeline fill instructions before or after that Add (but before lddqu), you shouldn’t notice any big difference.

    • @strager_
      @strager_  ปีที่แล้ว +1

      I didn't look at the code under llvm-mca, but I should!

    • @rakeshravindran3147
      @rakeshravindran3147 ปีที่แล้ว +1

      @@strager_ It might be an interesting topic in itself, if you ever get any time

  • @ItzAnameOk
    @ItzAnameOk ปีที่แล้ว +7

    I can't believe the butt was actually added. TY Strager. It's still Saturday where I am butt my weekend is now complete.

    • @strager_
      @strager_  ปีที่แล้ว +5

      You can thank Raticide on Twitch for the butt suggestion!

    • @NostraDavid2
      @NostraDavid2 ปีที่แล้ว

      Thanks Raticide, very cool

  • @discontinuity7526
    @discontinuity7526 4 หลายเดือนก่อน

    This is frankly unbelievable. Holy cow man the amount of knowledge you are dropping.

  • @maksymiliank5135
    @maksymiliank5135 ปีที่แล้ว

    This is one of the best videos about performance and low level programming I have ever seen. Usually trying to understand these concepts takes a lot of effort and I sometimes feel tired after a while. You managed to make this video easy to follow, interesting and entertaining. Great stuff.

  • @atiedebee1020
    @atiedebee1020 ปีที่แล้ว +3

    This was a great video, I loved the content and it was really concise while still keeping enough information!

  • @thokismail1
    @thokismail1 ปีที่แล้ว +3

    Really cool video. Amazing that a 10x speed increase was possible.

    • @strager_
      @strager_  ปีที่แล้ว +2

      I was hoping for 100x, but I'll take what I can get. 😜

  • @royler8848
    @royler8848 ปีที่แล้ว +17

    Can't believe this video has less than 1K views. It is so incredibly well made, really enjoyed it!.

    • @peezieforestem5078
      @peezieforestem5078 ปีที่แล้ว +1

      now it has more

    • @jma42
      @jma42 ปีที่แล้ว +1

      because most "programmers" nowadays are all about web development and npm installing stuff

    • @peezieforestem5078
      @peezieforestem5078 ปีที่แล้ว +2

      @@jma42 what are you talking about, it's been 20 hrs and the video has 11 kviews.

    • @jma42
      @jma42 ปีที่แล้ว

      ​@@peezieforestem5078 thats not my point, and besides, its too little when compared to most typical web dev content.
      What I meant is that the topic is not that interesting to the most programmers nowadays cuz they dont do computer science shit.

    • @strager_
      @strager_  ปีที่แล้ว +4

      ​@@jma42 I don't consider what I did 'computer science shit'. I consider my work closer to engineering. But I guess that's just semantics.
      There is plenty of content like this video on TH-cam. For example:
      www.youtube.com/@CppCon has lots of deep-dive talks.
      www.youtube.com/@DaveChurchill has good beginner-friendly talks which aren't just "install this library and you're done!".
      www.youtube.com/@MollyRocket has a popular series called Handmade Hero which covers making a game from scratch line-by-line.
      www.youtube.com/@fasterthanlime has educational content similar to mine.
      You say it's "too little when compared to" something else. But I don't think I'm competing for the same audience as npm bootcamp tutorial videos, and I think TH-cam's algorithm treats our content separately.

  • @CiY3
    @CiY3 ปีที่แล้ว +1

    incredible video! i have never really cared about hashes, but i was completely invested the whole time! cant wait for more videos like this one.

  • @fv1100
    @fv1100 ปีที่แล้ว +1

    This is a truly fantastic video! What I would love to see next is a comparison of the memory cost of each approach, since they handle array sparseness very differently