I have a lingering doubt about concurrency (especially with the threading library): the GIL. I can't quite wrap my head around the internals that don't allow Python to achieve true parallelism via multithreading. People often reach for multiprocessing in the name of the GIL (I suspect many who think this way are also somewhat lacking in the fundamentals here). You clearly stated that we should use multiprocessing for CPU-bound tasks and threading for I/O-bound tasks. My question is: does this split between multiprocessing and threading carry over to other languages as well (say, Java)? And what exactly is the GIL, and why does it hamper "true parallelism while threading" (when it... shouldn't? At least that was my understanding)?
As always, I've been loving your videos lately!
You hit on a really interesting (and contentious) topic with the GIL!
So there is a lot of history behind the GIL and its implementation. It largely came from a time when most people didn't have multiple CPU cores but still wanted to run work concurrently. It actually does a good job of stopping most people from tripping themselves up, but it imposes some limitations, namely preventing threads from running Python code in true parallel.
Now, there are a few reasons why I haven't gone in depth on the GIL (it could use its own video), and one of the biggest is that it may be going away in future versions of Python. We're already seeing a step in that direction with Python 3.12 and the per-interpreter GIL (PEP 684).
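But to answer the biggest of your questions, the GIL is effectively just a lock, much like the one we used in the Threading video. CPython is designed so that a thread must hold that lock to execute Python bytecode or touch interpreter state such as object reference counts. All threads in a process share one interpreter and therefore one GIL, so only one thread can run Python code at a time, no matter how many cores you have: while one Python thread is doing something, it holds the lock, and no other thread can acquire it until it is released. The lock is released around blocking operations such as file and network I/O, though, which is why threading still helps with I/O-bound tasks, while CPU-bound work needs separate processes, each with its own interpreter and its own GIL.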
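To make the "only one thread runs Python code at a time" point concrete, here is a minimal sketch (not code from the video). It assumes a standard CPython build with the GIL. `time.sleep` stands in for blocking I/O, during which the GIL is released, and a pure-Python countdown stands in for CPU-bound work, which needs the GIL for every step. The helper names (`blocking_io`, `countdown`, `io_demo`, `cpu_demo`) are hypothetical.

```python
import threading
import time


def blocking_io(results, i):
    # time.sleep stands in for a blocking I/O call (network, disk);
    # CPython releases the GIL while a thread is blocked like this
    time.sleep(0.2)
    results[i] = i


def countdown(n):
    # pure-Python CPU work: the thread needs the GIL for every step
    while n > 0:
        n -= 1


def run_threads(target, args_list):
    """Start one thread per args tuple, wait for all, return elapsed seconds."""
    threads = [threading.Thread(target=target, args=args) for args in args_list]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def io_demo(n_threads=4):
    results = [None] * n_threads
    elapsed = run_threads(blocking_io, [(results, i) for i in range(n_threads)])
    return results, elapsed


def cpu_demo(n_threads=4, n=1_000_000):
    # the same total work, first one after another, then in threads
    start = time.perf_counter()
    for _ in range(n_threads):
        countdown(n)
    sequential = time.perf_counter() - start
    threaded = run_threads(countdown, [(n,) for _ in range(n_threads)])
    return sequential, threaded


if __name__ == "__main__":
    results, elapsed = io_demo()
    print(f"I/O-bound: {results} in {elapsed:.2f}s (one after another would be ~0.8s)")
    sequential, threaded = cpu_demo()
    print(f"CPU-bound: sequential {sequential:.2f}s, threaded {threaded:.2f}s")
```

On a standard build, expect the four sleeping threads to finish in roughly 0.2 s total, while the threaded countdown takes about as long as the sequential one (sometimes longer, due to lock hand-offs). On an experimental free-threaded build the CPU-bound numbers would likely differ.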
This is a giant subject, but if you think a standalone video on the GIL would be useful, then I'll add it to my list. I hope this helps and keep up the great questions!
@JakeCallahan Interesting! Would definitely love a video about that sometime. Wouldn't removing the GIL also have repercussions for the language design? The solution to this problem could definitely use its own video...
Thanks for the clarification!
I'll add a GIL deep dive to my list, maybe even dig into the CPython source for examples.
Very informative and well structured. Learnt a few things.
And you have a perfect voice, fit for NatGeo-style documentary narration.
Cheers
Thank you for this video! You deserve more likes and follows! People are missing out!
Thank you for (another) great video! My question is about the final example, on line 25: does this loop need to complete before the tasks start? It seems that "all tasks submitted" prints before the tasks start. If we were submitting many more tasks, would it still behave this way?
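Great question! The tasks are handed to the executor as soon as they are submitted, and each one starts as soon as a worker is available to run it; submitting doesn't wait for the loop to finish. The reason it (almost) always prints that the tasks have been submitted first is that each worker process takes a little time to start up. If you submit more tasks than there are workers, the extra tasks simply wait in the executor's queue until a worker frees up.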
Try this experiment: add a time.sleep(1) between submitting the tasks and the print statement, then see what happens. Play around with the sleep time as well.
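Here is a rough sketch of that experiment. It is not the video's code: the `task` and `run_experiment` names are hypothetical, and it swaps in `ThreadPoolExecutor` for the process pool so it runs anywhere without an `if __name__ == "__main__":` guard. Thread workers start much faster than processes, so with a process pool the "submitted first" effect is more pronounced, but the mechanism is the same: `submit()` returns immediately and a worker picks the task up right away.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def task(n, log):
    # hypothetical task: record the moment it actually begins running
    log.append(("start", n))
    time.sleep(0.1)  # stand-in for real work
    return n * n


def run_experiment(pause, n_tasks=4, workers=4):
    log = []
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # submit() does not block; each task can start before the loop ends
        futures = [executor.submit(task, n, log) for n in range(n_tasks)]
        time.sleep(pause)  # the suggested experiment: wait before announcing
        log.append(("all submitted", None))
        print("All tasks submitted")
        results = [f.result() for f in futures]
    return log, results


if __name__ == "__main__":
    for pause in (0, 0.3):
        log, results = run_experiment(pause)
        print(pause, log)
```

With a long enough pause, every "start" entry lands before "all submitted". With `pause=0`, the announcement often comes first or interleaves with the starts, depending on how quickly the workers get going.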
Very good video.
Please keep making more videos, you're awesome! Thank you...
On threading, race conditions...
Celery workers, distributed computing...
Real-world, production-grade Python coding practices, etc....
I'm a quant.
Thank you for the kind words. I do have a video on threading that covers locks and race conditions.
There are also many more planned, including most of what you mentioned. Stay tuned!
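For anyone who hasn't seen that video yet, here is a minimal sketch of the lock pattern it is about (my own illustration, not the video's code; `increment` and `run` are hypothetical names). Several threads bump a shared counter, and a `threading.Lock` makes each read-modify-write step atomic so no update gets lost.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        # "counter += 1" is a read, an add and a write; without the lock,
        # two threads can read the same value and one update gets lost
        with counter_lock:
            counter += 1


def run(n_threads=4, times=100_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=increment, args=(times,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())  # with the lock held, this is always n_threads * times
```

Note that the GIL alone does not make `counter += 1` safe: a thread can be switched out between the read and the write, which is exactly the race the explicit lock prevents.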
Thanks a lot for sharing your knowledge, dude! In the 2nd example, where you find prime numbers, I didn't know that way of testing whether a number is prime (what's it called?), so I coded the classical, academic definition of a prime. When I ran it, what a surprise: 20 min., 25 min., still not finished, and I asked myself WTF I was doing wrong. While I was washing my clothes, I realized it: the classical definition performs an absurdly large number of divisions when testing whether a number is prime. I reviewed the code and searched for the best optimized prime test until I found a video from a YouTube channel called GeeksForGeeks whose approach is exactly like yours but coded differently (your code also handles negative numbers, which is good). Finally, I kept following your explanation, and I was very glad to relive that feeling of getting stuck on a task and solving it through my own effort plus your video and the other sources. Greetings from Mexico, my friend!
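A sketch of why the classical definition is so slow, and of one common fix. This is not necessarily the method used in the video: `is_prime` below is plain trial division that skips even numbers and stops at the square root of n, based on the fact that any composite n must have a factor no larger than its square root. Both function names are hypothetical.

```python
import math


def is_prime_naive(n):
    # classical definition: try every divisor from 2 up to n - 1
    if n < 2:
        return False
    return all(n % d for d in range(2, n))


def is_prime(n):
    if n < 2:
        return False  # 0, 1 and negative numbers are not prime
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    # a composite n has a factor <= sqrt(n), so only odd divisors up to
    # isqrt(n) need checking
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


if __name__ == "__main__":
    print([n for n in range(30) if is_prime(n)])
```

For a prime near 10**12, the naive version tries nearly a trillion divisors, while the square-root version tries about half a million.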
That's my favorite part of programming, developing and testing possible solutions to interesting problems. Glad you enjoyed it!
Hello. How can I calculate the overhead cost of multiprocessing?
Interesting and tricky question! This is likely something you'll need external tooling for. The basic idea would be to snapshot your system's Python processes before and after splitting off into multiprocessing.
This is one way, but depending on what type of overhead you want to measure, there are other ways to measure it.
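If the overhead you care about is wall-clock time rather than memory, one rough approach (an assumption on my part, not the only way) is to time a near-free function run inline versus through a pool. Because the work itself costs almost nothing, the difference approximates the cost of starting workers and, for processes, pickling arguments and results. `work` and `measure_overhead` are hypothetical names; memory overhead would still need the process-snapshot approach described above.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def work(n):
    # hypothetical tiny task: cheap enough that overhead dominates
    return n * n


def measure_overhead(executor_cls, n_tasks=100, workers=4):
    inputs = range(n_tasks)

    start = time.perf_counter()
    serial = [work(n) for n in inputs]
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        pooled = list(executor.map(work, inputs))
    pooled_time = time.perf_counter() - start

    assert pooled == serial  # same results, only the timing differs
    # the gap is roughly worker startup plus task dispatch and collection
    return pooled_time - serial_time, pooled_time, serial_time


if __name__ == "__main__":
    # the guard matters: with the "spawn" start method, child processes
    # re-import this module and must not start pools of their own
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        overhead, pooled, serial = measure_overhead(cls)
        print(f"{cls.__name__}: ~{overhead * 1000:.1f} ms overhead")
```

Expect the process pool to show noticeably more overhead than the thread pool, since each worker is a separate interpreter that has to start up and every argument and result crosses a process boundary.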