Modern Dictionaries by Raymond Hettinger

แชร์
ฝัง
  • เผยแพร่เมื่อ 25 ก.ค. 2024
  • Abstract
    Python's dictionaries are stunningly good. Over the years, many great ideas have combined together to produce the modern implementation in Python 3.6. This fun talk is given by Raymond Hettinger, the Python core developer responsible for the set implementation and who designed the compact-and-ordered dict implemented in CPython for Python 3.6 and in PyPy for Python 2.7. He will use pictures and little bits of pure python code to explain all of the key ideas and how they evolved over time. He will also include newer features such as key-sharing, compaction, and versioning. This talk is important because it is the only public discussion of the state of the art as of Python 3.6. Even experienced Python users are unlikely to know the most recent innovations.
    Who and Why (Audience):
    This talk is for all Python programmers. It is designed to be fully understandable for a beginner (it starts from first principles) but to have new information even for Python experts (how key-sharing works, how the compact-ordered patch works, how dict versioning works). At the end of this talk, you can confidently say that you know how modern Python dictionaries work and what it means for your code.
    Bio
    Raymond Hettinger has also served as a director of the Python Software Foundation, and has mentored many people over the years on their contributions to the python-dev community. He is also well known for his contributions to the Python Cookbook, and shares many pieces of Python wisdom on Twitter. He is a frequent keynote speaker at Python Conferences around the world and has received the Distinguished Service Award at PyCon 2014 for his exceptional contributions to the python community.
    Other info:
    This talk is delivered at SF Python's 2nd Annual Holiday Party for Python Devs in SF Bay Area, CA. In you are in San Francisco area looking to meet other python devs, please check our schedule for meetups on sfpython.org
  • วิทยาศาสตร์และเทคโนโลยี

ความคิดเห็น • 95

  • @aliqandil
    @aliqandil 5 ปีที่แล้ว +95

    Guido actually changed his mind and ordered dict is now (3.7) officially part of the python language specs! xD

    • @Asdayasman
      @Asdayasman 4 ปีที่แล้ว +3

      > he was the "Benevolent dictator for life" (BDFL) until he stepped down from the position in July 2018
      per wiki

    • @thichquang1011
      @thichquang1011 3 ปีที่แล้ว +4

      @@Asdayasman Because of 3.8 walrus operator :=

    • @Asdayasman
      @Asdayasman 3 ปีที่แล้ว +3

      @@thichquang1011
      >>> if b := len("abc") == 3:
      ... print(b)
      ...
      True
      >>> if (c := len("abc")) == 3:
      ... print(c)
      ...
      3
      >>> b
      True
      >>> c
      3
      >>>
      Oh man this is gross as fuck, confusing order of operations by default, no block scoping, reuse of a character that holds special meaning in Python already (the colon)...

    • @fridyair
      @fridyair 3 ปีที่แล้ว +1

      @@Asdayasman boo hoo, cant use fucking parantheses.

    • @Asdayasman
      @Asdayasman 3 ปีที่แล้ว

      @@fridyair Way to spend ten months not understanding my point at all.

  • @msunardi
    @msunardi 7 ปีที่แล้ว +21

    Excellent talk. I learned a lot from this. Thank you SF Python and Raymond Hettinger. He's an excellent speaker.

  • @jacobs-h398
    @jacobs-h398 7 ปีที่แล้ว +35

    I love this guy

  • @hcubill
    @hcubill 7 หลายเดือนก่อน

    I still listen to this and find it awesome.

  • @YouLilalas
    @YouLilalas 4 ปีที่แล้ว +17

    When he said “Thanks Microsoft!”, I couldn’t really tell if he was sarcastic or not.

  • @tripkendall
    @tripkendall 6 ปีที่แล้ว +32

    Who learned something new?

  • @liesdamnlies3372
    @liesdamnlies3372 7 ปีที่แล้ว +4

    Great talk. For me this really drove home the old adage, "Necessity is the mother of invention." A long time ago, they didn't have memory to waste. Reminds me of a talk by Mike Acton in 2014 (he used a set of key-value pairs to illustrate how to think about minimizing cache misses).

  • @nagra11
    @nagra11 7 ปีที่แล้ว +1

    Excellent

  • @unperrier5998
    @unperrier5998 4 ปีที่แล้ว +2

    He was right, Guido did change his mind: now ordered dict is the default in CPython :)

  • @workflowinmind
    @workflowinmind 7 ปีที่แล้ว +3

    Great presentation

  • @RonJohn63
    @RonJohn63 7 ปีที่แล้ว +15

    31:20 This looks a hell of a lot like the original database format, but with an added hash key.

    • @simonmasters3295
      @simonmasters3295 5 ปีที่แล้ว +13

      Thats the point!

    • @Verrisin
      @Verrisin 3 ปีที่แล้ว +1

      and the "added hash key" is just an alternative index

  • @MrMamanDon
    @MrMamanDon 2 ปีที่แล้ว +1

    Sometimes I think that in comparison to older times, we haven't really invented any great stuff as people did back then rather we just made the same things more efficient and proper, and that too because we had more time to observe the best way to do so.

  • @vladimirkraus1438
    @vladimirkraus1438 6 ปีที่แล้ว +4

    Raymond Hettinger and Scott Meyres are the best speakers in the whole IT business...

    • @kasozivincent8685
      @kasozivincent8685 2 ปีที่แล้ว

      Have you listened to Martin Fowler and Martin Roberts?

    • @vladimirkraus1438
      @vladimirkraus1438 2 ปีที่แล้ว

      @@kasozivincent8685 Martin Fowler is amazing speaker too. I do not know Martin Roberts.

  • @agranero6
    @agranero6 5 หลายเดือนก่อน

    Being excited about having the highest score in mini golf remembers me of an episode of Curious George.

  • @BlindPigBluesBand
    @BlindPigBluesBand 5 ปีที่แล้ว +2

    The size of the dictionaries using Python v3.6.2 :5 fd33b5 is 68 - 5:45

  • @muckvix
    @muckvix 7 ปีที่แล้ว +14

    slides dl.dropboxusercontent.com/u/3967849/sfmu2/_build/html/index.html

  • @Verrisin
    @Verrisin 3 ปีที่แล้ว +2

    I personally really like trie dicts, as used in Clojure.
    (I would want to implement them slightly differently, but maybe it would be worse, who knows)
    - They can also be persistent (immutable + fast update)
    - and they just make sense to me...

    • @Verrisin
      @Verrisin 3 ปีที่แล้ว

      the compact hashtable is nice, though

    • @Verrisin
      @Verrisin 3 ปีที่แล้ว

      This is obviously faster, though. I wonder if this approach could be made immutable... - All the data is already shared, one just needs to somehow have half-weak references in the 'table' and only needs to copy the hash-index (which could be made of parts (very wide trie), for really big dicts)

    • @Verrisin
      @Verrisin 3 ปีที่แล้ว

      ok, so all data was not shared. It was only for object instance dicts (key sharing). - One would have to deal with missing keys of the 'actual' dict when iterating, and such... and you cannot easily share different values for the same key...
      - Tries are likely still better for persistent... but who knows.

  • @atifadib
    @atifadib 5 ปีที่แล้ว +2

    when you do sys.getsizeof on an empty python3.6 dictionary it still return 240, why?

    • @BladeOfLight16
      @BladeOfLight16 5 ปีที่แล้ว +3

      Probably a default size. An empty dict is usually created to add items to it.

    • @skepticmoderate5790
      @skepticmoderate5790 4 ปีที่แล้ว

      @Yochem 2311 Did you even watch the video?

  • @mansamusa-el4705
    @mansamusa-el4705 5 ปีที่แล้ว +7

    *So is it advisable to use dict as a database?*

  • @user-zi2zv1jo7g
    @user-zi2zv1jo7g 4 หลายเดือนก่อน

    All the auto resizing is great but why not allow fixed sizes for optimization

  • @timdiekmann5751
    @timdiekmann5751 6 ปีที่แล้ว +2

    Excellent talk, would love to see the slides, dropbox link is dead (seriously? dropbox? *sigh*)

  • @grailfinder201
    @grailfinder201 7 ปีที่แล้ว +20

    There must be a better way!

  • @12345charliebrown
    @12345charliebrown 3 ปีที่แล้ว

    This is essential Python. The only way to learn to master 3.x

  • @janjanacek8089
    @janjanacek8089 ปีที่แล้ว +1

    Can you actually implemet python3.6+ dictionaries in python2.7? I would assume it is possible if it is all just python algo, could it be used to speed up python2 runtime?

    • @versacebroccoli7238
      @versacebroccoli7238 ปีที่แล้ว

      It was implemented to 2.7 using pypy. I think the python core team abandoned python 2 entirely.

  • @Verrisin
    @Verrisin 3 ปีที่แล้ว +1

    Why store all the hashes? Faster reallocation? that seems like the only usage...
    Edit: I see. The way they implement hashing, they are checking possibly other keys, because of collisions. Thus, checking the hashes makes lookup faster.

  • @mrmagnetic927
    @mrmagnetic927 7 ปีที่แล้ว +1

    How does someone setup Pypy? Online documentation seems lacking and would love to experiment.

  • @thinkingaloud1833
    @thinkingaloud1833 4 ปีที่แล้ว +1

    Such a beautiful narrative, rarely find this kind of tech talk.
    My question is, it seems that compact dict is mainly used for Classes, where the number of fields is fixed, but the number of instance can vary. So when talking about the insertion and deletion, it should be inserting or deleting columns that matters. All the examples focus on rows. So how does it work when an instance is created or garbage collected?

    • @sweepingdenver
      @sweepingdenver 4 ปีที่แล้ว +1

      I came to the comments to ask the exact same question. Googled a bit and the answer can mostly be found by reading PEP 412. Yes, shared keys are only used for instance dicts. And the values aren't really stored they way they are shown in the slides here liked a compacted table. Each instance has an independent new dict object created, but that object simply points back to a common list of shared keys.

    • @thinkingaloud1833
      @thinkingaloud1833 4 ปีที่แล้ว

      @@sweepingdenver thanks. So it's like EnumMap in Java?

  • @Jacklawi
    @Jacklawi 7 ปีที่แล้ว +1

    Why is signed char / short / long for the indices?
    Why not use unsigned, and instead of using negative 1 as your "None", just use the largest number in the range (255 for char).

    • @cyndrthenerd9865
      @cyndrthenerd9865 5 ปีที่แล้ว +9

      it's implemented like that, but explaining a small implementation detail like that isn't really relevant to the topic of the talk would've just wasted time.

    • @skepticmoderate5790
      @skepticmoderate5790 4 ปีที่แล้ว

      That's actually a really good question. Doesn't that use up more of the address space, meaning that the maximum number of slots is only 128 instead of 255?

  • @TheClonerx
    @TheClonerx 3 ปีที่แล้ว

    Can we have timestamps?

  • @SumoCumLoudly
    @SumoCumLoudly 6 ปีที่แล้ว +2

    Mayor Adam West knows a lot about dicts

  • @yvrelna
    @yvrelna 5 ปีที่แล้ว +4

    So, the hash table people discovered tables and indexes, just as the database people discovered key-value stores? Next stop, implement MVCC-powered dict.

  • @onehungrygeek
    @onehungrygeek 5 ปีที่แล้ว

    Why does he pronounce Guido as Kwee-doh? I thought it would be Gee-do.

    • @Alcamore
      @Alcamore 5 ปีที่แล้ว +1

      According to Wikipedia the pronunciation originated in two different areas, Italy and then later eastern Europe. He's using the Italian pronunciation, Gwee-doh. The German pronunciation is Gee-do. I didn't know about the German pronunciation until just now, so that was a cool find.

    • @philbest
      @philbest 5 ปีที่แล้ว +3

      @@Alcamore Actually he's using the Dutch pronunciation: /ˈɣido/

  • @copperfield42
    @copperfield42 7 ปีที่แล้ว +1

    6:20 I get 140 in my python 2.7.11...

  • @minyakonga8897
    @minyakonga8897 8 หลายเดือนก่อน

    Cache lines, i don't understand it.

  • @josephantoniou3778
    @josephantoniou3778 3 ปีที่แล้ว

    Really fun talk.I do not understand why the hashes are stored in the compact ordered dict implementation. Surely they are only needed to resolve the index? Once that is done, hash is derived from key on lookup so I'm not sure what value there is in storing it..

    • @xycno
      @xycno 3 ปีที่แล้ว

      Your key can be an object, and computing the hash can be a tangible operation. By storing the hash, it speeds up identifying hits during data retrieval.

  • @Verrisin
    @Verrisin 3 ปีที่แล้ว +1

    32:10 I don't understand how he puts all the values in the same tuple for all the dicts... That makes no sense to me. - At least values must be per-dict (per instance)...
    34:57 - This. So weird... I don't see how it makes sense...

    • @Verrisin
      @Verrisin 3 ปีที่แล้ว

      key sharing is only for instance dicts? 47:40

    • @IPClimber
      @IPClimber 3 ปีที่แล้ว

      I also can't wrap my head around the fact that they're a lot of dicts with absolutely different keysets, and that must be a huge waste of space for sparsity.
      Did you sort that out for yourself?

  • @hemanth6951
    @hemanth6951 3 ปีที่แล้ว +1

    He look like teddy in two and half men

  • @splendorman7922
    @splendorman7922 3 ปีที่แล้ว

    this guy looks more like an fbi agent than a coder.

  • @keshavnemeli
    @keshavnemeli 4 ปีที่แล้ว +1

    Loves Microsoft. Uses a Macbook.

  • @michaelstahn2945
    @michaelstahn2945 7 ปีที่แล้ว +2

    Is he...drunken? (;

    • @themeeman
      @themeeman 6 ปีที่แล้ว +6

      Nah, he is just enthusiastic. He believes in what he does.

  • @thorbergson
    @thorbergson 7 ปีที่แล้ว +28

    You know you're getting older when you patiently suffer what an earlier version of yourself would've called an unbearably condescending and unpleasant talk, all for the wisdom of it.

    • @privetvastutnestoyalo2339
      @privetvastutnestoyalo2339 6 ปีที่แล้ว +18

      > patiently suffer
      That's just means you're still an easily offended kid.

    • @leeritenour
      @leeritenour 6 ปีที่แล้ว +1

      privet vastutnestoyalo, I believe he was saying "getting older" instead of getting really old.

    • @kennys1881
      @kennys1881 5 ปีที่แล้ว +12

      idk how u got condascended or anything but i specifically searched this guy bc his explanations are fun and informative.. I never had the feeling of condascence or anything

    • @jamieg2427
      @jamieg2427 5 ปีที่แล้ว +1

      I didn't sense any condescension . . .

    • @JeremyAndersonBoise
      @JeremyAndersonBoise 5 ปีที่แล้ว +1

      They call that wisdom.

  • @xappppp
    @xappppp 6 ปีที่แล้ว

    Kinda verbose, but a lot good stuff.

    • @simonmasters3295
      @simonmasters3295 5 ปีที่แล้ว +1

      try() saying it more concisely and taking your audience with you?

  • @jonklopfer9827
    @jonklopfer9827 5 ปีที่แล้ว

    TL;DR
    like Cache invalidation == to cache... 57min in-- mentioned about Linux users. As previously working for a eCommerce "tech import from China" company- Linux users (of which as company we were) would return products because the chipset was not what we said it was. The first 10,000 prouducts were the chipset as advertised- we would test, but we did not test every device. We got duds on an entire order or box at time. Linux users were the best. They'd submit reproduable issue- how they discoved what they did and how they did it. Honestly, the best of our clients. If something did now work they were not igornet, they understood (before we did) our newest order was effed up. Naturally as par company code- we paid to reship working and ate the cost ourselves.
    The linux community is great. Shit-- you ordered one- it worked you ordered 1,000- we thought we were giving you idetical product (we were lead to believe) you as a community understood. And we ate the cost which is what we should and did do.
    Compared to someone buyung a 6$ wifi dongle and calling us scammers because it "did not give them internet" and then trying to explain the dongle is a wifi dongle and you have to have a wifi network to use it... linux and linux people rock. We get error logs from them-- We actually blocked all IE explorere users from direct orders, because 90% of our customrs who clogged up our customer service were using IE-- we directed to a "download firefox or chrome" page. Since business side speaking was better. If someone was using IE and tried to make 10,000$ purchase, and using IE, we'd be like no.