At first, I wasn't really fond of the new video thumbnails. But I can say, they grew on me 🙂
Glad to see your paint game has improved !
I'm glad to see how simple the config parsing part of flake8 is. Pylint, on the other hand, has it split into so many parts it's hard to even keep in your head.
There are actually two different config parsing modules: one that uses argparse, and an old one that uses optparse. (Granted, they'll be releasing 3.0 soon, which removes the old one.)
it's most of what I've worked on the last few years
I should have watched this video like a week earlier. it would have saved me all this rework. Thanks for the video. Great explanation.
Thanks! I never understood the difference, now it's pretty clear :)
Thanks Anthony, great explanation
MacOS is only compatible with ancient versions of POSIX, so...
Plus, do you run Paint in VM or in Wine? :D
Great video as always!
How does fork’s COW (copy-on-write) memory interact with reference counting or other intricacies of Python memory management?
in general, poorly -- an update to the reference count of an object pages in the whole PyObject* structure. there are ways to help this with `gc.freeze()` which I plan to cover in another video. I've also seen an alternative Python implementation optimized for fork-based workloads (think like uwsgi / apache) which moves the refcounts off of the objects and into a reserved area of memory -- though this trades off refcount lookup speed for memory, and memory is usually cheaper.
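A minimal sketch of the `gc.freeze()` idea from the reply above, assuming a POSIX system where `os.fork()` exists and Python 3.9+. The function name `fork_with_frozen_heap` is made up for illustration. Freezing only keeps the cyclic garbage collector away from the frozen objects; ordinary refcount updates in the child can still dirty pages, so treat this as a mitigation, not a fix.

```python
import gc
import os


def fork_with_frozen_heap():
    """Freeze GC-tracked objects, fork once, return (frozen_count, child_exit_code)."""
    was_enabled = gc.isenabled()
    gc.disable()   # avoid a collection punching holes in memory right before the fork
    gc.freeze()    # move every tracked object into the permanent generation
    frozen_count = gc.get_freeze_count()

    pid = os.fork()
    if pid == 0:
        # child: collections from here on ignore the frozen objects
        gc.enable()
        gc.collect()
        os._exit(0)

    # parent: wait for the child, then return to normal collection
    _, status = os.waitpid(pid, 0)
    gc.unfreeze()
    if was_enabled:
        gc.enable()
    return frozen_count, os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(fork_with_frozen_heap())
```

In a real prefork server the parent would stay frozen while it keeps forking workers; the `gc.unfreeze()` here is only so the sketch cleans up after itself.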
@anthonywritescode makes sense, thanks! I did a test and found that with fork(), memory would grow over time. Not all the way to the memory used by spawn(), but enough that I would rather pay the upfront cost and deal with higher but stable memory use than rely on fork() preserving memory.
On Windows, we're used to being slow.
hey, at 5:04 could you explain why the OOM issue happens with fork() and not with spawn()? spawning also requires a whole new process to start, right? and the memory is limited.
when using fork the original process is copied, with spawn it starts from 0
thanks, @anthonywritescode.
I also want to know how `__name__` works with multiprocessing,
because I assume that the spawned process also gets `__name__` set to `__main__`.
then shouldn't the child process recursively spawn another child process, and so on?
are you observing that? I think you can answer your own question
@anthonywritescode interestingly, that (recursive spawning) is not happening, and I was wondering why. perhaps you could help me with it.
clearly it doesn't set name to `__main__`
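A small sketch to check this yourself, assuming the file is saved and run as a script. The helper names `send_module_name` and `child_module_name` are made up for illustration. With spawn, the child re-imports the main file under the name `__mp_main__`, so the `if __name__ == "__main__":` block does not run again there; with fork, the child inherits the parent's module as-is and still sees `__main__`.

```python
import multiprocessing as mp


def send_module_name(conn):
    # __name__ is whatever name this file was loaded under in *this* process
    conn.send(__name__)
    conn.close()


def child_module_name(method):
    """Start one child with the given start method; return the __name__ it saw."""
    ctx = mp.get_context(method)
    parent_conn, child_conn = ctx.Pipe(duplex=False)  # (receive end, send end)
    proc = ctx.Process(target=send_module_name, args=(child_conn,))
    proc.start()
    child_conn.close()  # parent keeps only the receiving end
    try:
        return parent_conn.recv()
    except EOFError:
        return None  # child exited without reporting anything
    finally:
        proc.join()


if __name__ == "__main__":
    for method in ("fork", "spawn"):
        if method in mp.get_all_start_methods():
            print(method, child_module_name(method))
```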
Hi, I have to use spawn in one of my projects (because of CUDA). I need to spawn multiple processes, but when the first process is spawned it runs through main and terminates and no other process is spawned. I don't really get why this happens. Why could this be the case?
it's impossible to know without seeing your code -- maybe stop by the discord with a minimal example? th-cam.com/video/ritp4gAqNMI/w-d-xo.html
@anthonywritescode yeah definitely
What about in fork, if you modify the global variable between entering the Pool context and running the map? (i.e. I do `global glob; glob = 3` before calling print(list(p.map(...))))
I would guess, based on the paint explanation, that the child process is still watching the memory of the parent process, so it will print [4,5,6]. But when I tried it I still got [3,4,5].
Thanks!
once forked the memory spaces are separate -- writes in the parent won't be reflected in the child
Ohh no! Thumbnail and video title are off by one! (491 vs 492)
oops at least it was just the title being wrong
love the video
from multiprocessing import Process

global_variable = 10

def modify_global():
    global global_variable
    global_variable += 5
    print(f"Child process: Modified global_variable to {global_variable}")

if __name__ == "__main__":
    global_variable = 4
    print(f"Parent process: Original global_variable is {global_variable}")
    child_process = Process(target=modify_global)
    child_process.start()
    child_process.join()
    print(f"Parent process: After child process, global_variable is still {global_variable}")
In the above code, the child process can still see the global_variable assignment made inside the `if __name__ == "__main__":` block.
I thought the child process only cares about the program state at the line where it is spawned, plus the module-level globals. Here the assignment happens before the child is started, and the child still sees the modified value.
it depends whether you're using fork or spawn. the default depends on your python version and operating system
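A sketch that makes the difference visible by choosing the start method explicitly instead of relying on the default, assuming the file is saved and run as a script. The helper names `send_global` and `child_sees` are made up for illustration. As far as I know, the default is spawn on Windows and macOS, and on Linux it was fork up to Python 3.13 and forkserver from 3.14.

```python
import multiprocessing as mp

global_variable = 10  # the value every fresh import of this file starts with


def send_global(conn):
    conn.send(global_variable)
    conn.close()


def child_sees(method):
    """Start one child with the given start method; return the global it sees."""
    ctx = mp.get_context(method)
    parent_conn, child_conn = ctx.Pipe(duplex=False)  # (receive end, send end)
    proc = ctx.Process(target=send_global, args=(child_conn,))
    proc.start()
    child_conn.close()  # parent keeps only the receiving end
    try:
        return parent_conn.recv()
    except EOFError:
        return None  # child exited without reporting anything
    finally:
        proc.join()


if __name__ == "__main__":
    global_variable = 4  # only runs in the process started as the script
    print("default start method:", mp.get_start_method())
    for method in ("fork", "spawn"):
        if method in mp.get_all_start_methods():
            print(method, "child sees", child_sees(method))
```

Run as a script, the fork child should report 4 because it is a copy of the parent after the reassignment, while the spawn child should report 10 because it re-imports the file and skips the `__main__` block.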
Wholesome.
nice