Designing for Programmer Delight - Richard Feldman - Software You Can Love 2022

แชร์
ฝัง
  • เผยแพร่เมื่อ 25 ส.ค. 2024

ความคิดเห็น • 25

  • @Verrisin
    @Verrisin ปีที่แล้ว +9

    From a talk about Python3 dicts: IndexMap is actually *faster* because the hash table can be compressed (only a few bits per index, for reasonably sized dicts; and you need rehashing anyway as you grow, so you just use bigger index types then) - Because the slowest thing tends to be memory access, compressing the table makes lookup faster.

  • @edgeeffect
    @edgeeffect 3 หลายเดือนก่อน

    Even though I'm fairly sure that Roc is not the language for me... I'm still always pleased to head a talk from Richard Feldman.

  • @pengain4
    @pengain4 ปีที่แล้ว +2

    I like the level of attention to details they are bringing to Roc.

  • @trondenver5017
    @trondenver5017 ปีที่แล้ว +2

    Another Feldman banger

  • @MaksimVolkau
    @MaksimVolkau ปีที่แล้ว +2

    Great topic and coverage of the principled design.

  • @juliuskoskela452
    @juliuskoskela452 ปีที่แล้ว +6

    Great overview of the topic! Personally I would've chosen to provide at least two abstraction to fulfill the design goals for the dictionary, a hash map and an ordered tree. In my c-library I just call these map_t and order_t. The hash map would prioritize insertion and lookup performance. I have implemented map_t with open addressing (as opposed to bucket implementation). I have gotten the best results with simple linear probing. As for the hash function, most x86 machines have SIMD so a simple hasher that uses the 128-bit aesenc function can be used as a default hasher on most machines (with hand written fallback). This will automatically yield 2 64-bit keys which can be used for indexing (so we get 2 indexes for the price on 1) and since aesenc takes a round_key, the rk could be randomized with for example one byte that is saved in the hash map structure if we wanted to make it ddos resistant.
    However I totally understand that the index map design is probably a pretty good compromise in many ways. It's just that sometimes you want to at least feel like you are using the "optimal tool for the job" or maybe it's just me...

    • @anderdrache8504
      @anderdrache8504 ปีที่แล้ว +2

      I guess since Roc should feel simple and safe by default, only one should be enough.
      Of course more efficient special case implementations can be provided with libraries and with a package manager that shouldn't be too much of a problem.

    • @patrolin
      @patrolin ปีที่แล้ว

      what is the use case for an "ordered tree" - is that just a priority queue?

    • @juliuskoskela452
      @juliuskoskela452 ปีที่แล้ว

      @@anderdrache8504 Yeah it's probably a reasonable tradeoff. I wonder how important it is to provide hash function genericity nowadays, since we have instruction-level support for hashing. I know that's how it's usually done and I have it so in my library, but can we reasonably expect a better solution than the 3 clock-cycle aesenc instruction? I feel it's getting a bit redundant.

  • @Verrisin
    @Verrisin ปีที่แล้ว +1

    13:07 - I would say merging dicts is incredibly common and useful.

  • @terragame5836
    @terragame5836 8 หลายเดือนก่อน

    Hyrum's law probably shouldn't be a driving factor for design decision. People may hack around as much as they want, but that should reasonably imply waiving the right to stability. Maintaining some property you haven't promised to anyone seems likely to get you stuck with a bad design. I get the point about delight, but I'm not sure how much a language may trade off for it...

  • @Verrisin
    @Verrisin ปีที่แล้ว

    17:30 - I would say that only makes sense within the "axioms" of the model. They don't have to be exactly the same values, only equivalent in respect to definitions. Any undefined behavior is free to vary.
    - EDIT: Again, I admit it's nicer for delight.

  • @TravisTennies
    @TravisTennies ปีที่แล้ว

    I would try to convince you to consider having an unordered hash AND an ordered hash. The first to guarantee speed and the other to guarantee ordered traversal. Some programmers don't know how to work around unordered.... I would love to work on your team. My brain loves low level but I have no experience.

  • @patrolin
    @patrolin ปีที่แล้ว +1

    AFAICT IndexMap still relies on an (inner) unordered map, so it doesn't actually fix hash flooding

    • @IgnacioLosiggio
      @IgnacioLosiggio ปีที่แล้ว +2

      The idea is that you can do basically anything that fixes hash flooding on that unordered map w/o changing iteration order.

    • @Verrisin
      @Verrisin ปีที่แล้ว

      @@IgnacioLosiggio without changing any observable behavior

  • @Verrisin
    @Verrisin ปีที่แล้ว +1

    13:50 - No. If the traversal order was undefined, if anyone was relying on it, they don't deserve any consideration. You have not created any bug for them they didn't already have.
    - Fair enough. It's more delightful to not have the possibility of such a bug. Still, anyone who relies on random hash ordering needs to seek help.

    • @Verrisin
      @Verrisin ปีที่แล้ว

      for flood mitigation: I think IndexMap is safe to use with random hashing, as the hash table is never exposed.

    • @rtfeldman
      @rtfeldman ปีที่แล้ว +1

      @@Verrisin Yes, we're definitely considering this!

  • @JobvanderZwan
    @JobvanderZwan ปีที่แล้ว

    Why not have two remove methods, one fast and one that preserves order at the cost of speed? Although I suppose the latter could easily introduce Accidentally Quadratic behavior so that's probably a bad idea

  • @AGeekTragedy
    @AGeekTragedy ปีที่แล้ว

    What does it do in the case that the program contains (apologising for not knowing roc):
    let new_dict = insert(old_dict, x)
    and new_dict and old_dict both stay in scope, so it can't (obviously) mutate opportunistically ? Does it copy the whole thing? I

    • @rtfeldman
      @rtfeldman ปีที่แล้ว +2

      Yeah, if opportunistic mutation isn't available, it deep-clones the whole thing. Our bet is that in performance-critical paths, you can always arrange for it to be unique (and therefore eligible for in-place mutation), and in non-performance-critical paths you won't notice an extra clone here or there. We'll see how it works out in practice!

  • @Verrisin
    @Verrisin ปีที่แล้ว

    what about python3 dict? - keeps insertion order, and has fast compressed hash table (which can even be shared across different dicts with the same keys)
    - EDIT 19:40 - Never mind. XD

    • @Verrisin
      @Verrisin ปีที่แล้ว

      The question with IndexedMap is : How will you deal with persistent versions, in case a pointer is kept to an old version of the data structure?

    • @brendanhansknecht4650
      @brendanhansknecht4650 ปีที่แล้ว

      @@Verrisin you mean if someone inserts a value into a dict that has another reference. So now you have one reference to the dict without the new key and one reference to the dict with the new key?
      In that case, it would copy the entire dictionary.