Guido actually changed his mind, and ordered dicts are now (3.7) officially part of the Python language spec! xD
> he was the "Benevolent dictator for life" (BDFL) until he stepped down from the position in July 2018
per wiki
@@Asdayasman Because of 3.8 walrus operator :=
@@thichquang1011
>>> if b := len("abc") == 3:
... print(b)
...
True
>>> if (c := len("abc")) == 3:
... print(c)
...
3
>>> b
True
>>> c
3
>>>
Oh man this is gross as fuck, confusing order of operations by default, no block scoping, reuse of a character that holds special meaning in Python already (the colon)...
@@Asdayasman boo hoo, cant use fucking parantheses.
@@fridyair Way to spend ten months not understanding my point at all.
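A minimal sketch of the precedence difference the transcript above shows (Python 3.8+): `:=` binds looser than `==`, so parentheses decide what gets assigned.

```python
# Without parentheses, := grabs the whole comparison:
# b is assigned the result of (len("abc") == 3).
if b := len("abc") == 3:
    pass
assert b is True

# With parentheses, only the call result is assigned,
# and the comparison happens afterwards.
if (c := len("abc")) == 3:
    pass
assert c == 3
```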
Excellent talk. I learned a lot from this. Thank you SF Python and Raymond Hettinger. He's an excellent speaker.
In 3.13.1, the size of each dict at 7:28 is 280 bytes. I am not sure why that is the case.
In 3.11.9 it's the same thing, 280 bytes.
I still listen to this and find it awesome.
Great talk. For me this really drove home the old adage, "Necessity is the mother of invention." A long time ago, they didn't have memory to waste. Reminds me of a talk by Mike Acton in 2014 (he used a set of key-value pairs to illustrate how to think about minimizing cache misses).
31:20 This looks a hell of a lot like the original database format, but with an added hash key.
That's the point!
and the "added hash key" is just an alternative index
I love this guy
He was right, Guido did change his mind: now ordered dict is the default in CPython :)
When he said “Thanks Microsoft!”, I couldn’t really tell if he was sarcastic or not.
Raymond Hettinger and Scott Meyers are the best speakers in the whole IT business...
Have you listened to Martin Fowler and Martin Roberts?
@@kasozivincent8685 Martin Fowler is amazing speaker too. I do not know Martin Roberts.
5:45 - The size of the dictionaries using Python v3.6.2 (5fd33b5) is 68.
Who learned something new?
Sometimes I think that, in comparison to older times, we haven't really invented any great stuff the way people did back then; rather, we just made the same things more efficient and proper, and that only because we had more time to observe the best way to do so.
I personally really like trie dicts, as used in Clojure.
(I would want to implement them slightly differently, but maybe it would be worse, who knows)
- They can also be persistent (immutable + fast update)
- and they just make sense to me...
the compact hashtable is nice, though
This is obviously faster, though. I wonder if this approach could be made immutable... - All the data is already shared, one just needs to somehow have half-weak references in the 'table' and only needs to copy the hash-index (which could be made of parts (very wide trie), for really big dicts)
OK, so not all the data was shared. It was only for object instance dicts (key sharing). - One would have to deal with missing keys of the 'actual' dict when iterating, and such... and you cannot easily share different values for the same key...
- Tries are likely still better for persistent... but who knows.
Can you actually implement Python 3.6+ dictionaries in Python 2.7? I would assume it is possible if it is all just a Python algorithm. Could it be used to speed up the Python 2 runtime?
It was implemented for 2.7 in PyPy. I think the Python core team abandoned Python 2 entirely.
@@versacebroccoli7238 Lol, even Python 3.8 is abandoned today. Python 3.9 will receive security patches for the next year, and in one year you can say goodbye to 3.9 too.
*So is it advisable to use dict as a database?*
NoSQL is similar to dict
Yeah, actually good and really fast.
All the auto-resizing is great, but why not allow fixed sizes for optimization?
Excellent talk, would love to see the slides, dropbox link is dead (seriously? dropbox? *sigh*)
Why store all the hashes? Faster reallocation? that seems like the only usage...
Edit: I see. The way they implement hashing, they may have to check other keys because of collisions. Thus, storing the hashes makes lookup faster.
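A toy pure-Python sketch (my own illustration, using linear probing for simplicity; not CPython's actual probing or resizing logic) of the compact layout being discussed: a small sparse index table plus a dense entry list of `(hash, key, value)`. The stored hash lets a lookup reject colliding slots with a cheap integer compare before ever calling `__eq__`:

```python
class CompactDict:
    def __init__(self, size=8):
        self.indices = [-1] * size          # sparse table, -1 = empty
        self.entries = []                   # dense [(hash, key, value)]

    def _probe(self, key):
        h = hash(key)
        n = len(self.indices)
        for i in range(n):                  # linear probing for simplicity
            slot = (h + i) % n
            idx = self.indices[slot]
            if idx == -1:
                return slot, None, h        # empty slot found
            stored_hash, stored_key, _ = self.entries[idx]
            # cheap integer compare first; __eq__ only on a hash match
            if stored_hash == h and stored_key == key:
                return slot, idx, h
        raise RuntimeError("table full (no resizing in this sketch)")

    def __setitem__(self, key, value):
        slot, idx, h = self._probe(key)
        if idx is None:
            self.indices[slot] = len(self.entries)
            self.entries.append((h, key, value))
        else:
            self.entries[idx] = (h, key, value)

    def __getitem__(self, key):
        _, idx, _ = self._probe(key)
        if idx is None:
            raise KeyError(key)
        return self.entries[idx][2]

    def __iter__(self):                     # iteration is insertion-ordered
        return (key for _, key, _ in self.entries)
```

Because iteration walks the dense entry list, insertion order falls out for free, which is the side effect the talk highlights.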
Such a beautiful narrative, rarely find this kind of tech talk.
My question is, it seems that compact dict is mainly used for Classes, where the number of fields is fixed, but the number of instance can vary. So when talking about the insertion and deletion, it should be inserting or deleting columns that matters. All the examples focus on rows. So how does it work when an instance is created or garbage collected?
I came to the comments to ask the exact same question. Googled a bit and the answer can mostly be found by reading PEP 412. Yes, shared keys are only used for instance dicts. And the values aren't really stored the way they are shown in the slides here, like a compacted table. Each instance has an independent new dict object created, but that object simply points back to a common list of shared keys.
@@sweepingdenver thanks. So it's like EnumMap in Java?
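To make the PEP 412 answer above concrete, here's a toy sketch (my own illustration, not CPython internals) of key-sharing: one keys table shared by all sibling instance dicts, with each instance holding only its own values array indexed by the shared table.

```python
class SharedKeys:
    """One keys table shared by all instances of a class."""
    def __init__(self):
        self.slots = {}                 # key -> column index

    def index(self, key):
        if key not in self.slots:
            self.slots[key] = len(self.slots)
        return self.slots[key]

class SharedKeyDict:
    """A per-instance dict that stores values only."""
    def __init__(self, shared):
        self.shared = shared            # shared with sibling instances
        self.values = []                # per-instance values only

    def __setitem__(self, key, value):
        i = self.shared.index(key)
        while len(self.values) <= i:
            self.values.append(None)    # None = "unset" (a sketch-level
        self.values[i] = value          # simplification; real code needs
                                        # a distinct missing-value marker)
    def __getitem__(self, key):
        i = self.shared.slots[key]      # KeyError if key never seen
        if i >= len(self.values) or self.values[i] is None:
            raise KeyError(key)
        return self.values[i]
```

The memory win: a thousand instances with the same five attributes store the keys (and their hashes) once, not a thousand times.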
32:10 I don't understand how he puts all the values in the same tuple for all the dicts... That makes no sense to me. - At least values must be per-dict (per instance)...
34:57 - This. So weird... I don't see how it makes sense...
key sharing is only for instance dicts? 47:40
I also can't wrap my head around the fact that there are a lot of dicts with absolutely different keysets; that must be a huge waste of space due to sparsity.
Did you sort that out for yourself?
When you do sys.getsizeof on an empty Python 3.6 dictionary it still returns 240. Why?
Probably a default size. An empty dict is usually created to add items to it.
@Yochem 2311 Did you even watch the video?
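A hedged note on those numbers: the exact figure depends on the CPython version. 3.6 preallocated a small 8-slot table even for an empty dict (hence 240 bytes), while more recent versions defer allocating the table until the first insertion, so an empty dict reports much smaller. Only the relative ordering below should hold everywhere:

```python
import sys

# Sizes vary by CPython version and platform; an empty dict is
# small, and a filled dict grows with its table allocations.
empty = {}
filled = {i: str(i) for i in range(1000)}

print(sys.getsizeof(empty), sys.getsizeof(filled))
assert sys.getsizeof(filled) > sys.getsizeof(empty)
```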
Great presentation
slides dl.dropboxusercontent.com/u/3967849/sfmu2/_build/html/index.html
thx a lot
link dead
Why use signed char / short / long for the indices?
Why not use unsigned, and instead of using negative 1 as your "None", just use the largest number in the range (255 for char)?
It's implemented like that, but explaining a small implementation detail like that isn't really relevant to the topic of the talk and would've just wasted time.
That's actually a really good question. Doesn't that use up more of the address space, meaning that the maximum number of slots is only 128 instead of 255?
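On the signed-index question: the negative values are sentinels, not wasted range. CPython reserves `DKIX_EMPTY = -1` for empty slots and `DKIX_DUMMY = -2` for deleted ones, and once the dict outgrows the one-byte index range the table is simply upgraded to 16-bit (then 32/64-bit) indices, so the "only 128 slots" ceiling never bites. A small illustration using Python's `array` module:

```python
from array import array

# Sentinels matching CPython's convention for the sparse table.
EMPTY, DELETED = -1, -2

# 'b' = signed char: one byte per slot, range -128..127, so
# entries 0..127 are addressable alongside the two sentinels.
indices = array('b', [EMPTY] * 8)
indices[3] = 0          # slot 3 points at dense entry 0
indices[5] = DELETED    # a tombstone left behind by a deletion

assert indices.itemsize == 1
assert indices[0] == EMPTY
```

With an unsigned byte you'd instead burn 254 and 255 as magic values, gaining nothing, while losing the natural `idx < 0` check for "no entry here".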
Being excited about having the highest score in mini golf reminds me of an episode of Curious George.
There must be a better way!
*Thuds fist on the table*
Excellent
Really fun talk. I do not understand why the hashes are stored in the compact ordered dict implementation. Surely they are only needed to resolve the index? Once that is done, the hash is derived from the key on lookup, so I'm not sure what value there is in storing it.
Your key can be an object, and computing its hash can be a costly operation. By storing the hash, lookups can identify hits more quickly during data retrieval.
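One concrete payoff of storing the hash alongside each entry: in CPython, a resize reinserts entries using the stored hash, so `__hash__` runs once per insertion no matter how many times the table grows. A small demonstration with a call-counting key class of my own:

```python
class CountingKey:
    """A hashable key that counts how often __hash__ is invoked."""
    calls = 0

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        CountingKey.calls += 1
        return hash(self.n)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.n == other.n

# Inserting 1000 distinct keys forces several table resizes,
# yet (in CPython) the hash is computed exactly once per insert
# because resizes reuse the stored hashes.
d = {CountingKey(i): i for i in range(1000)}
assert len(d) == 1000
assert CountingKey.calls == 1000
```

The stored hash also lets collision probing skip `__eq__` calls entirely whenever the hashes differ, which is the lookup speedup mentioned above.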
This is essential Python. The only way to learn to master 3.x
So, the hash table people discovered tables and indexes, just as the database people discovered key-value stores? Next stop, implement MVCC-powered dict.
How does someone setup Pypy? Online documentation seems lacking and would love to experiment.
Cache lines, I don't understand them.
Why does he pronounce Guido as Kwee-doh? I thought it would be Gee-do.
According to Wikipedia the pronunciation originated in two different areas, Italy and then later eastern Europe. He's using the Italian pronunciation, Gwee-doh. The German pronunciation is Gee-do. I didn't know about the German pronunciation until just now, so that was a cool find.
@@Alcamore Actually he's using the Dutch pronunciation: /ˈɣido/
Can we have timestamps?
6:20 I get 140 in my python 2.7.11...
Mayor Adam West knows a lot about dicts
lol
this guy looks more like an FBI agent than a coder.
He looks like Teddy in Two and a Half Men.
Loves Microsoft. Uses a Macbook.
Is he... drunk? (;
Nah, he is just enthusiastic. He believes in what he does.
You know you're getting older when you patiently suffer what an earlier version of yourself would've called an unbearably condescending and unpleasant talk, all for the wisdom of it.
> patiently suffer
That just means you're still an easily offended kid.
privet vastutnestoyalo, I believe he was saying "getting older" instead of getting really old.
idk how you got condescended to or anything, but I specifically searched for this guy because his explanations are fun and informative... I never had the feeling of condescension or anything.
I didn't sense any condescension . . .
They call that wisdom.
Kinda verbose, but a lot of good stuff.
try() saying it more concisely and taking your audience with you?
TL;DR
Like cache invalidation == to cache... 57 min in, he mentioned Linux users. I previously worked for an eCommerce "tech import from China" company, and Linux users (which, as a company, we were too) would return products because the chipset was not what we said it was. The first 10,000 products had the chipset as advertised; we would test, but we did not test every device. We got duds an entire order or box at a time. Linux users were the best. They'd submit a reproducible issue: how they discovered what they did and how they did it. Honestly, the best of our clients. If something did not work, they were not ignorant; they understood (before we did) that our newest order was messed up. Naturally, as per company policy, we paid to reship working units and ate the cost ourselves.
The Linux community is great. Shit, you ordered one, it worked; you ordered 1,000, and we thought we were giving you identical product (we were led to believe so). You as a community understood, and we ate the cost, which is what we should do and did do.
Compare that to someone buying a $6 wifi dongle and calling us scammers because it "did not give them internet", and then having to explain that the dongle is a wifi dongle and you have to have a wifi network to use it... Linux and Linux people rock. We actually got error logs from them. We ended up blocking all IE users from direct orders, because 90% of the customers who clogged up our customer service were using IE, and we directed them to a "download Firefox or Chrome" page. From the business side, that worked better. If someone tried to make a $10,000 purchase using IE, we'd be like, no.