Watch out for this (async) generator cleanup pitfall in Python

mCoding

Views 22,059

Published on 24 Nov 2024

Comments • 73

@amarasa2567 2 months ago ⁺⁷³
So basically, don't open and close resources in your generators, but give handles to those resources to your generators?
Like don't let your generator open a file itself, but give it a file handle?
@PeterZaitcev 2 months ago ⁺⁶
Exactly.
@rask004 2 months ago ⁺¹⁰
So... Dependency injection?
@antifa_communist 2 months ago
@@rask004 no
@benshapiro9731 2 months ago
@@rask004 more like passing a reference
@nel_tu_ 2 months ago ⁺⁶
@@benshapiro9731 So... Dependency injection?
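A minimal sketch of the "give it a file handle" idea from this thread. The function name `numbered_lines` is hypothetical, and `io.StringIO` stands in for a real file so the example is self-contained:

```python
import io
from typing import Iterator, TextIO


def numbered_lines(handle: TextIO) -> Iterator[str]:
    """Yield numbered lines from an already-open handle; never opens or closes it."""
    for number, line in enumerate(handle, start=1):
        yield f"{number}: {line.rstrip()}"


# The caller owns the resource, so its lifetime is visible at the call site.
with io.StringIO("alpha\nbeta\ngamma\n") as handle:
    gen = numbered_lines(handle)
    first = next(gen)  # stop early; the generator itself has nothing to clean up

# The handle is closed here whether or not `gen` was exhausted.
closed_after_with = handle.closed
```

Because the generator holds no `with` or `finally` of its own, abandoning it early leaves no cleanup waiting on garbage collection.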
@marckiezeender 2 months ago ⁺⁹⁵
How about instead of "don't use generators" we just say "don't use 'with' or 'finally' blocks inside of generators"
Edit: what I mean is that you should avoid context and resource management inside of generators. If the generator needs access to, say, a file, then the generator's caller should be responsible for opening and closing that file.
@AbstractObserver 2 months ago
He explains why that's not a good way to do that in the video
@themartdog 2 months ago ⁺¹¹
@@AbstractObserver at what timestamp? I didn't hear him say that. Even if he did, I think it's a good idea to avoid with/finally in these contexts
@AJMansfield1 2 months ago ⁺³
DI is definitely the way to handle this, I agree -- but I'd say the real _problem_ it solves is the fact that the code has a sequencing requirement on the release order of those resources, but isn't written in a way that expresses that requirement.
@themartdog 2 months ago ⁺¹
@@AJMansfield1 that's a good point. LSPs should start warning devs when they do something that has undefined behavior like this
@odosmatthews664 2 months ago ⁺²
Perfect answer. Generators are good for saving memory but bad at resource management.
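For readers who have not seen the pitfall being discussed, here is a small sketch of it. The generator `read_items` and the `events` list are made up for illustration. The reference to the generator is kept alive on purpose, so the result does not depend on when garbage collection happens:

```python
events = []


def read_items():
    """A generator that manages its own 'resource' with try/finally."""
    try:
        events.append("open")
        yield 1
        yield 2
    finally:
        events.append("close")


gen = read_items()
for item in gen:
    break  # abandon iteration early while still holding a reference

after_break = list(events)  # the finally block has not run yet

gen.close()  # explicitly finish the generator, which runs the finally block
after_close = list(events)
```

Leaving the loop early does not run the `finally` block by itself. It runs only when the generator is closed, either explicitly as here or whenever the interpreter finalizes the abandoned generator object.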
@aaronm6675 2 months ago ⁺²⁸
Wild
@endoflevelboss 2 months ago ⁺⁷
James Murphy for new BDFL 🎉
@danwellington3571 2 months ago ⁺³⁴
Obligatory Rust comment that completely ignores the 30-odd years of design choices Python has to do its best to not break
@DrGreenGiant 2 months ago ⁺²
Works fine in assembly too
@youtubeenjoyer1743 1 month ago
"Design choices". Like the lack of block scope. Or dynamic typing. Or the module system. Or ...
@danwellington3571 1 month ago ⁺³
@@youtubeenjoyer1743 Yes those are design choices
@DrGreenGiant 1 month ago
@@youtubeenjoyer1743 assembly doesn't have block scope, dynamic typing, or modules! Long live assembly!
@youtubeenjoyer1743 1 month ago
@@danwellington3571 Terrible, uninformed, ignorant, incompetent design choices. Even 30 years ago.
@lpt_7 1 month ago ⁺⁶
python developers trying to not depend on the implementation details:
@motbus3 2 months ago ⁺⁶
You can also use a context manager which encloses both generator and resources
@motbus3 2 months ago
But that's annoying
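One possible shape for a context manager that encloses both the generator and its resource, as a sketch only. The names `open_lines` and `log` are hypothetical, and `io.StringIO` stands in for a real file:

```python
import io
from contextlib import contextmanager

log = []


@contextmanager
def open_lines(text):
    """Own both the resource and the generator that reads from it."""
    handle = io.StringIO(text)  # stand-in for opening a real file

    def lines():
        for line in handle:
            yield line.rstrip()

    gen = lines()
    try:
        yield gen
    finally:
        # Cleanup order is spelled out: generator first, then the resource.
        gen.close()
        log.append("generator closed")
        handle.close()
        log.append("handle closed")


with open_lines("a\nb\nc\n") as lines:
    first = next(lines)  # stop early; cleanup still happens at the end of the block
```

The extra wrapper is the "annoying" part, but it makes the release order explicit at one place instead of leaving it to the generator's finalization.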
@idoben-yair429 2 months ago ⁺¹⁸
The ACTUAL solution is to write your code such that it doesn't rely on exactly when resources get cleaned up, and if the exact time of the cleanup is important, then express it EXPLICITLY in the code by calling a cleanup method. RAII-style context managers are not the right tool when the order and timing of cleanup is important. Generators are not the issue here. The engineering approach is the issue.
To elaborate: RAII means Resource Acquisition Is Initialization. The converse, i.e., Reference Loss Is Resource Disposal, does NOT hold. This is an important implication of the reference counting style of resource management, unlike C++-style RAII where you can engineer your classes to enforce both RAII and RLIRD.
@ilmuoui 2 months ago
Sounds like Rust with extra added steps
@idoben-yair429 2 months ago ⁺⁶
@@ilmuoui every language has different tools, stop worshipping languages, they're just that: tools
@themartdog 2 months ago ⁺¹
@@ilmuoui rust itself is programming with extra steps
@RogerValor 2 months ago ⁺¹
@@ilmuoui I love Rust, but after years with Python and GC languages (Java, C#, ...), I never expect the GC to release things in time anyway. That is something I always notice as a topic with C and C++ programmers. So the codebase avoids relying on this, and the resulting code is the simple, readable version of the problem, unless you really need it, like writing to a file you just read, or deleting, in which case the code has to use the explicit methods shown here. The with saves me from having to explicitly close the file; that is what I want from it.
To be honest, most Rust apps live inside frameworks anyway, with the resources in some injection system, or (or rather, and) simply have some central static structs with Arc-mutexes as a backbone. With async, it is not unusual to actually move into controlling it once you reach the complexity where it causes problems with an async main, but you clearly see which variable is temporary and which one is part of a system like that.
And from experience, in comparison, you reach the point where you have to build complex type maintenance faster, simply because you have to be strict about the type system and express so much with it. However, you cannot simply forget those things in Python either, even if you are implicit, so you usually avoid singletons, and act like memory is an ocean and abstraction is free.
Python is expected to be quickly fixable, quickly deployable, and error-robust at runtime, while Rust is expected to be efficient, error-robust at compile time, and usable as compiled system code. However, both are usually memory safe, and in that regard Python and Rust feel weirdly close (of course in Python you get an exception, but you will still not write into the wrong memory).
So I totally get what you see as extra steps (which Zig and Go really solve well with defer), but I think each language, and the mindset behind it, has its own extra steps.
@darthwalsh1 2 months ago
Instead of using a context manager, you would use an explicit try/finally that calls a cleanup? Wouldn't that have the same semantics as with?
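One way to read "express it explicitly" on the caller's side is to close the generator yourself. The sketch below, with a hypothetical `ticker` generator, puts an explicit try/finally next to `contextlib.closing`, which calls `close()` on exit from the `with` block:

```python
from contextlib import closing

log = []


def ticker(name):
    """A generator whose finally block records when cleanup happens."""
    try:
        for i in range(3):
            yield i
    finally:
        log.append(f"{name} cleaned up")


# Explicit form: the caller decides exactly when cleanup runs.
g = ticker("explicit")
try:
    next(g)
finally:
    g.close()

# Same effect using contextlib.closing around the generator.
with closing(ticker("closing")) as g2:
    next(g2)
```

In both forms the cleanup runs at a known point in the caller's code and does not wait for the generator object to be collected.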
@PeterZaitcev 2 months ago ⁺⁵
What *could* be done with the mentioned PEP is to add support for these iterclose methods but not activate them in any default classes, and instead provide a wrapper in contextlib that you can attach to your generator functions (or classes), which adds the missing method.
@陈宇轩-o1g 1 month ago
yea, maybe they can add a function that does something like def xxx(gen):gen.__on_iter_end__=gen.__del__;return gen🤔
and i still don't know why they did not add "yield from" and "return" in async generators. in pep-0525 they said they could, but it is too hard to do so....
but i found a snippet in pep-0380 that shows how "yield from" works. i think maybe they could just use that snippet to replace every "yield from" in the AST, to make async generators support "yield from"?
@PeterZaitcev 1 month ago
@@陈宇轩-o1g You misinterpret what "yield from" does. It's not just an alias for "for x in it: yield x", it's a statement for passing control - it is required for generators that communicate, and "async def" functions are among those.
@陈宇轩-o1g 1 month ago
@@PeterZaitcev i know "yield from" can pass send/throw in to the inner side. that's why i think py should add it, instead of letting users type for x in obj:yield x.
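A small sketch of the difference being discussed here, using plain (non-async) generators. The function names are made up for illustration:

```python
def inner():
    received = yield "ready"
    yield f"inner got {received}"


def delegating():
    # yield from forwards send() (and throw()/close()) to the inner generator.
    yield from inner()


def looping():
    # A plain loop only forwards values outward; sent values are dropped.
    for value in inner():
        yield value


d = delegating()
next(d)                           # advance to the first yield
via_yield_from = d.send("hello")  # the value reaches inner()

l = looping()
next(l)
via_loop = l.send("hello")        # the value stops at looping(); inner() sees None
```

With `yield from`, the sent value arrives inside `inner()`. With the loop, `inner()` is advanced by a plain `next()` and receives `None`.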
@xelaxander 2 months ago ⁺⁸
The resource existing at least as long as the generator seems to me like the natural solution. Imagine if it wasn't, i.e. the generator was accessible while the resource isn't. Suddenly it's even more cursed, since you can try to get an element from the generator, but it indeterminately fails to yield.
@xelaxander 2 months ago ⁺¹
Resources not being garbage-collected immediately is a choice necessary to allow alternative Python implementations with different optimization goals.
That said, at least in CPython, you can also just del g, and then gc.collect(). Although I'm not sure if the Python spec guarantees cleanup.
@DrGreenGiant 2 months ago ⁺¹
@@xelaxander yeah my question would be, is it in the spec or is it an implementation detail. If the latter then I'd happily be that guy who says it sounds like a non-CPython implementer's problem.
@marckiezeender 2 months ago
@@DrGreenGiant It's considered an implementation detail.
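A sketch of the `del g` plus `gc.collect()` approach mentioned above. This assumes CPython, where dropping the last reference normally finalizes the generator right away; as noted in the replies, that is an implementation detail, so other interpreters may behave differently. The names `resource_gen` and `log` are hypothetical:

```python
import gc

log = []


def resource_gen():
    """A generator with cleanup in a finally block."""
    try:
        yield "item"
    finally:
        log.append("cleanup")


g = resource_gen()
next(g)       # start the generator so it is suspended inside the try block

del g         # CPython: the reference count hits zero and the generator is finalized
gc.collect()  # extra nudge for cases where reference cycles keep it alive

cleaned = "cleanup" in log
```

An explicit `g.close()` gives the same cleanup without relying on how a particular interpreter collects objects.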
@aaliboyev 2 months ago ⁺⁴
As a conclusion "Wrap all your with statements with for loop inside into separate function which only holds this statement to guarantee gc"
@marckiezeender 2 months ago
Having it in a separate function doesn't guarantee gc collection in the python spec, only in cpython.
@RogerValor 2 months ago ⁺²
Not sure how often I care about when resources are cleaned up, as long as they are cleaned up. But I guess this is good to know for the case when I encounter it. So thx!
@janhwillems10000 2 months ago
Thanks a lot James!
@fyaa23 2 months ago ⁺³
Why do you expect garbage collection to behave like RAII in other languages? The point is that you don't have to take care to clean it up, but you can hardly control when it happens.
@marckiezeender 2 months ago ⁺²
The issue is that the explicit cleanup of resources from the context manager or finally statement becomes tied to the gc (which is implementation dependent) when used inside a generator.
@tc14hd23 2 months ago ⁺²⁵
No problem was ever solved by adding 'async' in front of it
@jenaf4208 1 month ago
I'd have the generator need the resource to be constructed, and have the context manager outside of the generator
@black-snow 1 month ago
And then there's the annoying bug with async generators and context vars. I feel like aGens are in such bad shape that I don't dare to use them.
@anon_y_mousse 2 months ago
Do you think async in general is a good idea? Is it actually better than just using proper threads?
@dvhh 2 months ago ⁺²
Unfortunately in python yes, and even in general, most developers are not comfortable enough with dealing with common multi-thread programming pitfalls.
Multiprocess on the other hand tough.
@anon_y_mousse 2 months ago
@@dvhh Do you think it would be better for a new language to have it, or to promote using proper threads and perhaps have a tutorial for using them as part of the examples? I have seen that a lot of the examples for languages which use async still have to demonstrate how it's used for newbies. Would it not be better to just promote proper threads?
@IanEpperson 2 months ago ⁺²
Async and threads are not the same thing. You could use them in a similar way, but os threads and cooperative multitasking have very different characteristics.
@anon_y_mousse 2 months ago
@@IanEpperson And? Did you have anything to actually contribute or did you actually think I don't know what async functions are?
@dvhh 2 months ago ⁺¹
@@anon_y_mousse true, but I don't think most programming languages have pulled off actually providing a good tutorial/guide for concurrent programming with multi-threading. maybe python is better off with multi-processing and/or async, leaving most of the multi-threading nastiness to "lower-level" languages.
@tophat593 2 months ago ⁺¹²
Sorry, beyond tired and ever so slightly drunk. Want to listen but need to sleep and tomorrow I won't remember this video exists.
@exploited410 2 months ago ⁺²³
Answering this so you have a notification tomorrow and will see this video with a clear mind
@MarianoBustos-i1f 2 months ago ⁺²
wake up and watch the video.
@gustavomendez2891 2 months ago ⁺²
wake up bro
@leetdavid 2 months ago ⁺²
You snooze you lose! Wake up!
@MrAlanCristhian 2 months ago ⁺⁷
That is the type of behavior that is detected with unit testing.
@austinnar4494 2 months ago ⁺¹³
That's not true, the async case is literally non-deterministic. So you could have application code that expects the resource to be cleaned up, the unit tests pass, but then it fails in production.
@MrAlanCristhian 2 months ago
@@austinnar4494 No, in that case the test will randomly fail. Because the test suite is supposed to run multiple times before deployment. The test suite runs every time a change is made.
@themartdog 2 months ago ⁺¹¹
@@MrAlanCristhian but it could randomly fail only .01% of the time...
@austinnar4494 2 months ago ⁺⁸
@@MrAlanCristhian non-deterministic in this sense does not necessarily mean "randomly." In an async context especially, the particular time that the cleanup code is run is highly sensitive to what other async code is being run and what else is scheduled on the event loop. In unit tests, the async generator could be the only thing being tested, so it always works as expected and passes. But in production, there could be a large number of async tasks on the event loop, so the cleanup code does not run immediately, causing a failure
@youtubeenjoyer1743 1 month ago
asyncio runtime is deterministic. If you order your unit tests in a certain way, you might never catch the issue.
@SpeedingFlare 2 months ago ⁺¹
Just use C and do if(error) goto cleanup; where cleanup cleans up anything that's non-null in whatever order you want.
For async, I don't know
@dirtdart81 2 months ago
I'm only up to 8:30 but why can't you just do an `async for x in contextlib.aclosing(gen()):` ?