threading vs multiprocessing in python

  • Published on 19 Jun 2024
  • A comparative look between threading and multiprocessing in python.
    I will show activity plots of 4, 8 and 16 threads vs 4, 8 and 16 processes and discuss the differences between the two modules.
    In summary: threads in Python (CPython) are concurrent but not parallel. Because of the Global Interpreter Lock (GIL), no two threads can execute Python bytecode at the same time. The way around this is to use the standard-library module multiprocessing and spawn child Python processes that each run work in parallel.
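A minimal sketch of the comparison summarized above, assuming standard CPython with the GIL enabled: the same pure-Python, CPU-bound function is run sequentially, on a thread pool and on a process pool via `concurrent.futures`. The function names and workload sizes are illustrative choices, not the video's code.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_down(n: int) -> int:
    """Pure-Python CPU-bound loop; it holds the GIL for its whole run."""
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total


def run_sequential(jobs: list[int]) -> list[int]:
    return [count_down(n) for n in jobs]


def run_threads(jobs: list[int], workers: int = 4) -> list[int]:
    # Threads share one interpreter; under the GIL only one runs bytecode at a time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_down, jobs))


def run_processes(jobs: list[int], workers: int = 4) -> list[int]:
    # Each worker is a separate interpreter process with its own GIL,
    # so the OS can run them on separate cores at the same time.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_down, jobs))


def timed(label: str, fn, jobs: list[int]) -> list[int]:
    start = time.perf_counter()
    result = fn(jobs)
    print(f"{label:<11} {time.perf_counter() - start:.2f} s")
    return result


if __name__ == "__main__":  # required so child processes don't re-run this block
    jobs = [2_000_000] * 4
    timed("sequential", run_sequential, jobs)
    timed("threads", run_threads, jobs)
    timed("processes", run_processes, jobs)
```

On a multi-core machine the thread-pool timing is expected to stay close to the sequential one (possibly a little slower, from switching overhead), while the process pool can approach a speed-up bounded by the core count, minus process start-up cost. Actual numbers depend on the machine, so none are quoted here.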

Comments • 330

  • @sorcerer_of_supreme
    @sorcerer_of_supreme 2 years ago +627

    I can't imagine the effort and time you have invested in making this video. Very informative.

    • @DavesSpace
      @DavesSpace 2 years ago +32

      Thank you very much!

    • @ErikS-
      @ErikS- 1 year ago +17

      I fully agree.
      It is one of the best videos showing the issues with multithreading and how it compares to multiprocessing. It really deserves a higher priority in YouTube search.

    • @anonfourtyfive
      @anonfourtyfive 10 months ago

      @@ErikS- Yup, 18:24 cleared it up for me.
      I was always wondering about its real impact.
      Damn, I was all for using threading, but now I understand the utility of multiprocessing.
      +1
      🦾

    • @mohammadjavadebrahimi5895
      @mohammadjavadebrahimi5895 7 months ago

      Where can we get the full code? Can you give a GitHub link, please? Also, please explain any risks of using multiprocessing.

    • @jarrodhroberson
      @jarrodhroberson 7 months ago +1

      Too bad this video is incorrect on almost every level.

  • @EbonySeraphim
    @EbonySeraphim 2 years ago +177

    Very nice full presentation. The short of it is that "Python" doesn't support parallel execution. For most programmers, when you talk about having multiple threads, the assumption is that those threads can and will execute in parallel. Unfortunately, Python was designed with single-core CPUs in mind, so even though the idea of threads had existed for a while in computing, code wasn't likely to run on a multithreaded/multicore/multi-CPU machine to do anything in parallel. It was just the operating system giving out small slices of time to execute one thread or another, and it was perceived as if both were happening at the same time, very much like your graphs show.
    Python, like most interpreted languages, cannot get over this problem because of the synchronization and locking needed to share access to data across threads, so it inherently allows only one "Python interpreted" thread to run at a time. Only library implementations in C can get around this under the hood by spawning real threads on Python's behalf to do work. Or there is this "multiprocess" approach, which creates a new process and an independent Python interpreter with entirely separate program state and memory. This approach isn't really a Python solution, because any programming language can spawn a new OS process (provided a library is available to access the fork() and exec*() system calls), and the OS will then execute that process in parallel on a multicore machine. But the thing about multiple processes is that it's harder and slower to share and synchronize data between processes than between threads. It may not be an issue in cases where not much synchronization is needed (e.g. when only an end result matters after the parallel work), but it can be a severe limitation.
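To illustrate the "entirely separate program state and memory" point, a sketch using the standard `multiprocessing` module: a child process appends to a module-level list, the parent's copy stays empty, and the value only reaches the parent because it is sent back through a `multiprocessing.Queue`. The names `counter`, `child` and `run_demo` are hypothetical.

```python
import multiprocessing as mp

counter: list[int] = []  # module-level state; every process has its own copy


def child(q) -> None:
    counter.append(42)    # changes only the child's copy
    q.put(list(counter))  # data must be sent back explicitly (it gets pickled)


def run_demo() -> tuple[list[int], list[int]]:
    q = mp.Queue()
    p = mp.Process(target=child, args=(q,))
    p.start()
    from_child = q.get()  # read before join() so a full pipe can't block the child
    p.join()
    return from_child, list(counter)


if __name__ == "__main__":
    from_child, in_parent = run_demo()
    print("child saw:", from_child)   # [42]
    print("parent sees:", in_parent)  # []
```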
    The last thing I'll say is that IO-driven or IO-heavy applications often don't really need the performance boost of true parallel execution. Waiting for IO (disk and network, for example) is so slow compared to CPU execution that most threads would be waiting for IO anyway. With a proper async IO setup (kqueue, select, epoll, IO completion ports) you can use a single thread to handle and dispatch thousands of IO requests and still be bottlenecked by IO. This is how and why people can still write "performance-intensive" applications in interpreted languages and compete with languages like C or C++. Maximizing IO efficiency is simply an area where C/C++ sometimes offers no benefit, so much "slower" languages appear to be just as fast.
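A sketch of the single-threaded, event-loop style described above, using Python's standard `asyncio` (whose event loops sit on selectors such as epoll or kqueue, or on IOCP with the Windows proactor loop). The "requests" are simulated with `asyncio.sleep`, an assumption standing in for real network or disk waits; `fake_request` and `handle_many` are hypothetical names.

```python
import asyncio
import time


async def fake_request(i: int, delay: float) -> int:
    # Stand-in for a network call: the coroutine yields to the event loop while waiting.
    await asyncio.sleep(delay)
    return i


async def handle_many(n: int, delay: float) -> list[int]:
    # All n waits are in flight at once, on a single thread.
    return await asyncio.gather(*(fake_request(i, delay) for i in range(n)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(handle_many(1000, 0.5))
    print(len(results), f"requests in {time.perf_counter() - start:.2f} s")
```

Because every wait overlaps on one thread, 1,000 half-second waits are expected to finish in roughly half a second rather than 500 seconds.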

    • @sylvianblade75
      @sylvianblade75 2 years ago +2

      I was able to combine multithreading with multiprocessing using ThreadPoolExecutor and ProcessPoolExecutor, thinking I could achieve true parallelism. But you're right: 99% of the time there is no real benefit to multiprocessing if your program is IO-bound. The extra overhead and slowdown of spawning processes is simply not worth it.

    • @alessandropolidori9895
      @alessandropolidori9895 1 year ago +3

      What I can't understand from the video is why multithreading (even if not parallel) should help, in theory, in IO-heavy applications. Can you help me?

    • @sephirot7581
      @sephirot7581 1 year ago +2

      @@alessandropolidori9895 When it comes to IO, the operating system does the work in the background. This means you can schedule another thread while the OS keeps working, and when your thread is scheduled again, the OS may have finished its work and you can continue.
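A sketch of that answer, with `time.sleep` standing in for real blocking I/O (an assumption): in CPython, blocking calls such as `time.sleep` or a socket read release the GIL while the OS waits, so other threads can run in the meantime. `blocking_io` and `overlap_waits` are hypothetical names.

```python
import threading
import time


def blocking_io(delay: float, results: list, i: int) -> None:
    time.sleep(delay)  # the GIL is released while this thread waits
    results[i] = i


def overlap_waits(n: int, delay: float) -> tuple[float, list]:
    results: list = [None] * n
    threads = [threading.Thread(target=blocking_io, args=(delay, results, i)) for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results


if __name__ == "__main__":
    elapsed, _ = overlap_waits(10, 0.5)
    print(f"10 waits of 0.5 s took {elapsed:.2f} s")
```

Ten 0.5-second waits should take roughly 0.5 seconds in total, not 5, even though the threads never run Python code in parallel.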

    • @janekschleicher9661
      @janekschleicher9661 1 year ago +4

      @@alessandropolidori9895 In theory, one place where it can help is when the hot data being processed can mostly be kept in a CPU core's L1 or L2 cache. An example could be a Redis-like implementation with a Bloom filter (a very small structure that can definitely say when data is not in the slow data store behind it, and is 99% or so sure when it is). In such a scenario it's helpful if each data store is handled on a different CPU core, so that its Bloom filter is already in the ultrafast L1 or L2 cache. To be honest, for Python this is a bit far off, as you would usually implement such things in a systems language like C, C++, Rust, Ada, or even Go (there is indeed a Redis clone). Go is an example of a language that still has its own runtime and garbage collection, but is optimized for such tasks.
      The more practical example is that in IO-heavy tasks, some individual tasks will block (the classic example is fetching data from an SQL database or a URL). You certainly don't want all other tasks to wait for them. The modern approach is async, but this is relatively new in Python (something like 4 years "young"), and multithreading was the answer before the async implementations were available and production-ready.
      It's also nowadays a simple way (though slightly less performant than async implementations in most cases) to alter code and get this (mostly) non-blocking behaviour if you don't want to or can't refactor the implementation.
      In general, nowadays I'd recommend optimizing the program either to run on one CPU core or to run on all of them, nothing in between. You can't really mix the use cases and still be performant anyway. In Python, you'd end up fighting a lot with the GIL (Global Interpreter Lock), and if you have to put both use cases into one application, I'd suggest having two different programs that communicate asynchronously, e.g. via a message queue. I remember a lot of headaches with machine-learning-optimized implementations (e.g. spaCy) in combination with a web server (running with WSGI). Short story: don't do it, separate them 🙂
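The point about making existing blocking code non-blocking without refactoring can be sketched with `asyncio.to_thread` (Python 3.9+), which runs an unchanged blocking function in a worker thread so the event loop stays responsive. `legacy_fetch` is a hypothetical stand-in for something like a synchronous database driver call.

```python
import asyncio
import time


def legacy_fetch(key: str) -> str:
    # Hypothetical existing blocking function that we don't want to rewrite.
    time.sleep(0.3)
    return key.upper()


async def fetch_all(keys: list[str]) -> list[str]:
    # Each call runs in the loop's default thread pool, so the waits overlap.
    return await asyncio.gather(*(asyncio.to_thread(legacy_fetch, k) for k in keys))


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["a", "b", "c"])))  # ['A', 'B', 'C']
```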

    • @DR_1_1
      @DR_1_1 7 months ago

      I notice a 20-50% slower file copy with Python compared to the system, for example with shutil.move() on Windows. I'm running the file copy on a separate PyQt thread, and it's just renaming the file, which should not take any noticeable processing time.
      Do you think another method might be faster? I'm asking because I'd expect C++ to be as fast as the system in this case, not 20-50% slower...
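Not a diagnosis of the slowdown, but one thing worth checking: according to the `shutil.move` documentation, a move within the same filesystem uses `os.rename`, while a move to a different filesystem copies the data and then removes the source. If the intent is only a rename on the same volume, calling `os.replace` directly makes that explicit and raises `OSError` instead of falling back to a copy. A sketch, with `rename_in_place` as a hypothetical helper:

```python
import os
import tempfile
from pathlib import Path


def rename_in_place(src: Path, new_name: str) -> Path:
    """Rename within the same directory; raises OSError rather than copying across devices."""
    dst = src.with_name(new_name)
    os.replace(src, dst)  # overwrites dst if it exists; atomic on POSIX
    return dst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        f = Path(d) / "old.txt"
        f.write_text("data")
        print(rename_in_place(f, "new.txt").name)  # new.txt
```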

  • @calloq1035
    @calloq1035 7 months ago +3

    What an incredible video. I've just been blindly picking one or the other, not sure of the differences between them, but this makes everything so clear. I'm so glad I found it!!

  • @NONAME-ey6qs
    @NONAME-ey6qs 1 year ago +1

    This is hands down the most thorough video on a topic. And YouTube shows me this exactly 1 year after I desperately needed it.
    Better late than never, I guess.

  • @MrHvfan
    @MrHvfan 1 year ago +4

    Thank you Dave, I'm so glad YouTube's algorithm brought your video to my daily feed. A really fascinating insight into threads and processes, and the presentation style was perfect. Best wishes.

  • @falwk
    @falwk 5 months ago +2

    Absolutely the best video on YouTube describing how threading works in Python, with concise demonstrations and a well-thought-out script and presentation. 10/10, subscribed

  • @DChoi5815
    @DChoi5815 1 year ago +6

    This is by far the most comprehensive and easily consumable video on any CS topic I've ever seen. Great job! Giving you a sub for sure.
    Keep it up Dave!

  • @CherifRahal
    @CherifRahal 3 months ago

    The amount of work done here is unbelievable. Thank you so much

  • @ninjahkz4078
    @ninjahkz4078 2 years ago +96

    this video is amazing, honestly one of the best I've ever seen, thank you from the bottom of my heart for dedicating so much time to creating it❤️

  • @aramshojaei8490
    @aramshojaei8490 1 year ago +11

    This is the most comprehensive video I've ever seen on multithreading and multiprocessing. Great job!

  • @neelshah1943
    @neelshah1943 5 months ago +4

    An excellent explanation of one of the most complicated questions I have ever come across in an interview setting. Even though this is in the aftermath, I am super glad to learn it with such clarity of thought. This is how you become fearless!!! Thank you Dave ❤!

  • @aeronesto
    @aeronesto 1 month ago

    This is the very best explanation of threading vs multiprocessing that I have ever seen. Well done!

  • @nj6553
    @nj6553 1 year ago

    2 minutes into this I already understand it better than all other readings I did online. Nice!

  • @TON-vz3pe
    @TON-vz3pe 1 year ago +3

    Massively underrated video. Saved it in my library. Thank you sir.

  • @javedalam7383
    @javedalam7383 2 years ago +4

    Brilliant representation of the concept. Thanks for all your effort.

  • @yeahthisismyhandleyouknow
    @yeahthisismyhandleyouknow 7 months ago +1

    Yay, it's so interesting to see a visual representation of something I have been figuring out in my work with threading/multiprocessing for a few years.
    You understand it by instinct, but not in such a visual and vivid way.

  • @rampage_sl
    @rampage_sl 1 year ago +3

    Brilliant work!! Best video on multithreading/processing I've seen in a while

  • @mehulaggarwal7776
    @mehulaggarwal7776 2 years ago +3

    Literally the best video I've seen yet on this topic. Keep posting, man!

  • @maply007
    @maply007 1 year ago +1

    The best video on YouTube explaining the concept! Thanks

  • @meme-ge8tq
    @meme-ge8tq 1 year ago +24

    This video was so informative, even for someone who is unfamiliar with the concept
    You deserve a lot more recognition

  • @AmanKumar-tu2og
    @AmanKumar-tu2og 1 year ago +33

    The best I have ever watched on multiprocessing v/s threading!! The visualizations were a complete treat ❤

  • @jerrylu532
    @jerrylu532 2 years ago +3

    This channel is a hidden gem!

  • @trustytrojan
    @trustytrojan 1 year ago +3

    Fantastic data visualization with the activity charts, I will be checking out more of your videos

  • @retrogamessocietybrasil3372
    @retrogamessocietybrasil3372 1 year ago +6

    One of the best lectures on multiprocessing and threading that I ever saw. Thanks for the guide and info, this will help me improve my own lectures on the subject

  • @Matlockization
    @Matlockization 1 year ago +4

    I am impressed with your use of visual aids in explaining how all this works. It definitely makes a lot more sense.

  • @rustyelectron
    @rustyelectron 2 years ago +1

    Woah, just found your channel. This is truly a goldmine.

  • @alexandrepv
    @alexandrepv 2 years ago +15

    Very well explained :) I can see your number of subscribers growing at a steady pace mate. Keep it up! Good stuff!

    • @DavesSpace
      @DavesSpace 2 years ago +1

      Thank you very much! Yes steady growth is encouraging 😊

  • @kedardeshmukh1168
    @kedardeshmukh1168 1 year ago +2

    This is sooooooo great.... probably the best explanation on YouTube

  • @Adamreir
    @Adamreir 2 years ago +13

    I really, really don’t understand why you don’t have more followers. Keep up the good work. This is really well done! Informative and straight out fun!

  • @MultiMojo
    @MultiMojo 1 year ago

    Incredible video and crystal clear explanations. Hope to see more !

  • @sarthaknarayan2159
    @sarthaknarayan2159 1 year ago

    Hands down best video on python multithreading and multiprocessing.

  • @Nielsx
    @Nielsx 1 year ago

    Great video. The best multiprocessing vs threading graphical explanation on the whole internet. Thanks for the dedication. New subscriber.

  • @thahrimdon
    @thahrimdon 8 months ago

    As someone who does data analysis and plotting with Python, thank you. So much.

  • @Subbestionix
    @Subbestionix 7 months ago

    Awesome, no pressure and yet informative! Good work! Thanks a lot! Although I knew the topic well from uni, I could deepen my understanding with this!

  • @jylpah
    @jylpah 7 months ago

    Outstanding video. “Like” is an understatement. So clear and informative.

  • @weiao7276
    @weiao7276 2 years ago +2

    This is the first time I've really understood multiprocessing and threading in Python. Thanks a lot.

  • @burnthewitch7286
    @burnthewitch7286 1 year ago

    This is the best video explanation on this topic, WOW

  • @ivankudinov4153
    @ivankudinov4153 11 months ago

    This video is a marvellous piece of craftsmanship

  • @tinkeringengr
    @tinkeringengr 1 year ago

    Thanks for this -- looking forward to more of your work!

  • @marzhanzhylkaidarova3994
    @marzhanzhylkaidarova3994 6 months ago

    Thank you so much for your content! very useful and I really enjoyed the way you structure and visualize your video! Thank you!

  • @user-jt5nd3yq4u
    @user-jt5nd3yq4u 3 months ago

    Excellent work, very informative! Thanks a ton for your time!

  • @daymaker_bybit
    @daymaker_bybit 6 months ago

    This was a super quality educational video, thanks so much!

  • @ajblondell5853
    @ajblondell5853 7 months ago +3

    This video is amazing! I don't usually go to YouTube for programming content because it's all just copy-paste. This is one of the most informative and useful vids I've come across in a long time. I love the graphics/visuals. I don't know how you managed to make multithreading and multiprocessing so engaging, but bravo! 👏 Keep up the great work and thank you for the content!

  • @daveys
    @daveys 6 months ago

    Excellent video, superbly made. Thanks for posting.

  • @Exce11ent22
    @Exce11ent22 1 year ago

    I wish your channel growth. A very informative video with amazing visualization. I hope there will be more of this in my recommendations.

  • @abhinav_mzn
    @abhinav_mzn 10 months ago

    One of the best videos I have seen on the internet... This video forced me to subscribe to this channel.

  • @mengisi
    @mengisi 1 year ago

    Unbelievable that I found this video! It opened my mind about Python! Please make more videos like this!

  • @robmoore423
    @robmoore423 1 year ago +1

    This was incredibly helpful!

  • @karaca_ahmet
    @karaca_ahmet 6 months ago

    It was a very good and impressive presentation. Listening to it made me feel as if David Attenborough were describing the lyrebird, like in the BBC documentary. :) Thank you for your effort...

  • @kemoxplus
    @kemoxplus 7 months ago

    Great explanation! Thanks for clarifying.

  • @SubhamSharma-ei3vs
    @SubhamSharma-ei3vs 2 years ago +2

    Very nice explanation. Keep up the good work.

  • @TurboLoveTrain
    @TurboLoveTrain 6 months ago

    You can run parallel threads using PdP (Parallel distributed Processing) if you have a process that can run non serial...obviously there is networking overhead. Great video--lots of ground to cover.

  • @etienneboutet7193
    @etienneboutet7193 2 years ago +2

    Very informative video. Thanks a lot !

  • @daniiltroshkov6081
    @daniiltroshkov6081 2 months ago

    Excellent video! Thank you!

  • @brpawankumariyengar4227
    @brpawankumariyengar4227 9 months ago

    Awesome video and so very well explained. Thank you so very much. It was excellent.

  • @NileGold
    @NileGold 1 year ago

    I love this video, the explanation is perfect

  • @azmatullah2847
    @azmatullah2847 3 months ago

    Thanks for the really great information.❤

  • @ravithejaburugu8926
    @ravithejaburugu8926 1 year ago

    Thanks for the detailing. Excellent

  • @Kattemageren
    @Kattemageren 1 year ago

    This is a brilliant video, thank you

  • @giladfuchs2377
    @giladfuchs2377 6 months ago

    amazing explanation!!
    thank you!!

  • @Mrslykid1992
    @Mrslykid1992 2 years ago +1

    HOLY CRAP THIS IS A GREAT USE CASE!

  • @Simorenarium
    @Simorenarium 1 year ago +4

    That explains the one intern I had, who wouldn't want to believe that threads are simultaneous. He said he had some python experience, but we use java.

  • @Schlumpfpirat
    @Schlumpfpirat 1 year ago

    Knew all of that already (wish it was more tl;dw - like 2mins) but think it's super extensive + informative for a beginner.

  • @hassaanshah9819
    @hassaanshah9819 1 year ago

    Awesome attention to details 😀

  • @-_Nuke_-
    @-_Nuke_- 4 months ago

    Thank you so much for this!

  • @greob
    @greob 7 months ago

    Thanks for sharing this nice presentation!

  • @aRWorldDJ
    @aRWorldDJ 6 months ago +2

    This is a masterpiece, honestly. Content-wise is very informative, but the way you represent everything is like watching a sci-fi movie.

  • @alexengineering3754
    @alexengineering3754 1 year ago

    Good explanation, next time I'll know exactly which one is better for my purpose.

  • @user-is5vn8ie5v
    @user-is5vn8ie5v 8 months ago

    Great job and thank you so much !

  • @fabricio_patrocinio
    @fabricio_patrocinio 1 year ago +3

    Congratulations, this is one of the most didactic videos I've seen about Python. Good work, and I will certainly watch more of your videos!

  • @ImSidgr
    @ImSidgr 2 years ago +1

    Very high quality!

  • @jcashion123
    @jcashion123 1 year ago +1

    Just a few weeks ago I went through this discovery myself when writing a wordle solver in python. This video would have been very helpful at that time. Everything explained here is spot on.

  • @romangaranin2675
    @romangaranin2675 8 months ago

    Amazing video! Thanks a lot!

  • @a.for.arun_
    @a.for.arun_ 2 years ago

    Awesome video. Those visuals are helpful. Thank you

    • @DavesSpace
      @DavesSpace 2 years ago

      Glad you like them!

  • @roark45
    @roark45 1 year ago

    Wow! Really well explained

  • @felixfourcolor
    @felixfourcolor 7 months ago +2

    PEP 703 go brrr! I'm excited to try it on Python 3.13

  • @DrGreenGiant
    @DrGreenGiant 4 months ago

    I'd be curious whether there are differences between Python implementations when it comes to threading and multiprocessing. It might be interesting to see if PyPy, for example, is more performant in spawning tasks.
    This is the first video from you I've seen and really enjoyed it!

  • @AmitKB00
    @AmitKB00 10 months ago

    Great description!

  • @Haax06
    @Haax06 5 months ago +1

    Great video! One question, how did you create the time series visualizations of threads and processes?
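The video doesn't say how its charts were produced, so this is only one plausible approach, not the author's: have each worker record `(name, start, end)` intervals with `time.perf_counter`, then draw them as horizontal bars in any plotting library. `busy_work` and `record_activity` are hypothetical names.

```python
import threading
import time


def busy_work(n: int) -> int:
    total = 0
    for i in range(n):
        total += i
    return total


def record_activity(n_threads: int, chunks: int, n: int) -> list[tuple[str, float, float]]:
    """Return (thread name, start, end) for every chunk of work, in seconds since t0."""
    intervals: list[tuple[str, float, float]] = []
    lock = threading.Lock()
    t0 = time.perf_counter()

    def worker() -> None:
        name = threading.current_thread().name
        for _ in range(chunks):
            start = time.perf_counter() - t0
            busy_work(n)
            end = time.perf_counter() - t0
            with lock:  # guard the shared list
                intervals.append((name, start, end))

    threads = [threading.Thread(target=worker, name=f"T{i}") for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return intervals


if __name__ == "__main__":
    for name, start, end in sorted(record_activity(4, 3, 200_000), key=lambda r: r[1]):
        print(f"{name}: {start:.3f} -> {end:.3f}")
```

Each interval is wall-clock time, which includes time a thread spent waiting for the GIL, so with coarse chunks the bars will overlap. Chunks shorter than `sys.getswitchinterval()` (5 ms by default) give a finer picture of which thread was actually running.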

  • @emersontavera9362
    @emersontavera9362 1 year ago

    thank you so much, it was a great video

  • @osogrande4999
    @osogrande4999 7 months ago

    Fantastic video.

  • @ttuurrttlle
    @ttuurrttlle 1 year ago +2

    Thanks for this video. I think this will help me out at work. Some idiot is sending us a database table output where each row of data is a json file of each column header and 1 row of data. And we're getting upwards of 50k files a day which could just be a single file that's only a few MB. I thought that multiprocessing was the way to go for this thinking that that would help with all the io calls, but it looks like multithreading is it.
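A sketch of that use case under stated assumptions: the exact file layout isn't known, so each file is assumed to hold one flat JSON object (one row). The files are read concurrently with a thread pool and merged into a single JSON Lines file; `load_all` and `write_sample_files` are hypothetical helpers. Whether threads beat a plain loop here depends on the disk and the OS cache, so it's worth timing both.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_one(path: Path) -> dict:
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def load_all(folder: Path, workers: int = 16) -> list[dict]:
    paths = sorted(folder.glob("*.json"))
    # File reads spend much of their time in the OS, so threads can overlap them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_one, paths))  # map keeps the sorted order


def write_sample_files(folder: Path, n: int) -> None:
    for i in range(n):
        row = {"id": i, "name": f"item{i}"}
        (folder / f"row_{i:05d}.json").write_text(json.dumps(row), encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        folder = Path(d)
        write_sample_files(folder, 500)
        rows = load_all(folder)
        # One combined file instead of thousands of tiny ones.
        (folder / "combined.jsonl").write_text("\n".join(json.dumps(r) for r in rows), encoding="utf-8")
        print(len(rows), "rows merged")
```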

  • @TheRailroad99
    @TheRailroad99 6 months ago +2

    A few things to note:
    The GIL (and therefore sequential thread execution within a process) is only an issue in CPython, not in (most) other Python interpreters.
    Jython, for example, has truly parallel threads, as do most other languages. This is mostly a Python problem.
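To check which interpreter and GIL mode a script actually runs under, a sketch using `platform.python_implementation()`. Since CPython 3.13 there is also an optional free-threaded build (PEP 703). `sys._is_gil_enabled()` only exists from 3.13 and is a private, underscore-prefixed function, so it is looked up defensively, and the `Py_GIL_DISABLED` build variable is read via `sysconfig`. `interpreter_info` is a hypothetical helper.

```python
import platform
import sys
import sysconfig


def interpreter_info() -> dict:
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)  # CPython 3.13+ only
    return {
        "implementation": platform.python_implementation(),  # e.g. "CPython", "PyPy"
        "version": platform.python_version(),
        # True only for a CPython built with free-threading support (PEP 703)
        "free_threaded_build": bool(sysconfig.get_config_var("Py_GIL_DISABLED")),
        # None when the interpreter doesn't expose the runtime check
        "gil_enabled": is_gil_enabled() if is_gil_enabled else None,
    }


if __name__ == "__main__":
    print(interpreter_info())
```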

  • @thenoseplays2488
    @thenoseplays2488 1 year ago +1

    This has been the best explanation of the differences between the two I have seen.
    My only gripe is that I really wanted to see this same data with an extra column for true single-threaded work, with no threading or multiprocessing enabled. How much lower than the 2 million is it?
    That would have been helpful to see. Otherwise this was excellent and helped clarify what I need to use when. Thanks so much.

    • @OMGclueless
      @OMGclueless 1 year ago +2

      You should expect it is more than 2 million, not less. Threading has some overhead, and since none of the operations in this example are i/o bound you never get that overhead back.

  • @jayaganthan1
    @jayaganthan1 10 months ago

    Awesome video. Thanks

  • @peterstark9381
    @peterstark9381 1 year ago +2

    Is there a preferred way to have the OS do the multiprocessing for you? Meaning, not using one control process of python to kick-off all processes and waiting for them, but rather starting them loosely (e.g. using os.fork(), os.setsid, function() and then sys.exit)? I want to avoid the controlling process to get stuck waiting for the threads/processes.
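One standard-library way to start a process "loosely", sketched under assumptions: `subprocess.Popen` with `start_new_session=True` calls `setsid()` in the child (POSIX only; the flag is ignored on Windows), and `Popen` returns immediately, so the parent never has to block in `wait()`. `launch_detached` is a hypothetical helper, and the child here is just another Python interpreter.

```python
import subprocess
import sys


def launch_detached(code: str) -> subprocess.Popen:
    """Start a Python child in its own session and return without waiting for it."""
    return subprocess.Popen(
        [sys.executable, "-c", code],
        start_new_session=True,  # POSIX: setsid() in the child; ignored on Windows
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


if __name__ == "__main__":
    proc = launch_detached("import time; time.sleep(2)")
    print("parent continues; child pid:", proc.pid)
```

If the parent keeps running for a long time, it is still worth calling `poll()` on the handle occasionally so a finished child is reaped; if the parent exits first, the OS reparents the child.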

  • @Shontushontu
    @Shontushontu 2 years ago +1

    I love your channel :) You are a 3Blue1Brown in the making, if not better

  • @TusharPal93
    @TusharPal93 1 year ago

    Really nice explanation.

  • @AlexandreSiedschlag
    @AlexandreSiedschlag 1 year ago

    First-class work, congrats

  • @gastonarevalo1237
    @gastonarevalo1237 3 months ago

    Really informative video!! I struggled a bit with the accent and speed, but it's really good!

  • @Master_of_Chess_Shorts
    @Master_of_Chess_Shorts 3 months ago

    Nice work, would be nice to have more examples of comparison by types of operations. For example image processing, machine learning cross validation, audio processing, etc. I find it is not always simple to pick between threading or pooling as most applications are a mixture of cpu and data calls. I guess it could be broken into smaller steps but, I'd love to see how to structure that efficiently.
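One way to structure a mixed workload, sketched under assumptions: keep the I/O-bound stage on a thread pool and hand each result to a process pool for the CPU-bound stage as soon as it arrives, using `concurrent.futures.as_completed`. `fetch` and `crunch` are hypothetical stand-ins (a sleep for I/O, a pure-Python loop for computation), not a recipe for any particular image, audio or ML library.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def fetch(i: int) -> int:
    time.sleep(0.1)  # stand-in for a download or database call (I/O-bound)
    return 200_000 + i


def crunch(n: int) -> int:
    total = 0  # stand-in for heavy computation (CPU-bound)
    for k in range(n):
        total += k * k
    return total


def pipeline(items: list[int]) -> list[int]:
    with ThreadPoolExecutor(max_workers=8) as io_pool, ProcessPoolExecutor() as cpu_pool:
        io_futures = [io_pool.submit(fetch, i) for i in items]
        # As each fetch finishes, start its CPU work right away in another process.
        cpu_futures = [cpu_pool.submit(crunch, f.result()) for f in as_completed(io_futures)]
        return sorted(f.result() for f in cpu_futures)


if __name__ == "__main__":  # required so worker processes don't re-run this block
    print(len(pipeline(list(range(16)))), "items processed")
```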

  • @vishwanathbondugula4593
    @vishwanathbondugula4593 7 months ago

    Really enjoyed the video all along. I have been doing multiprocessing and multithreading for a long time, but I never knew that Python's default multithreading runs only one thread at a time. I always did parallel programming in C and used the fork() syscall to spawn a new process, but then I switched to pthreads since cloning the parent process introduced a significant amount of overhead in my case, so pthreads worked for me. But I think C threads using the pthread library run in parallel, unlike the Python threading library. Let me know if I am wrong about pthreads.
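On the pthreads question: CPython's `threading.Thread` objects are backed by real OS threads (pthreads on Linux and macOS), which the OS is free to schedule on different cores; what the GIL prevents is more than one of them executing Python bytecode at the same moment. A sketch showing that each Python thread has its own kernel-level ID via `threading.get_native_id()` (Python 3.8+); `native_ids` is a hypothetical helper.

```python
import threading


def native_ids(n: int) -> list[int]:
    ids: list[int] = []
    lock = threading.Lock()
    barrier = threading.Barrier(n)  # keep all threads alive together so no ID is reused

    def worker() -> None:
        with lock:
            ids.append(threading.get_native_id())  # ID assigned by the OS kernel
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


if __name__ == "__main__":
    print("main:", threading.get_native_id(), "workers:", native_ids(4))
```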

  • @VorpalForceField
    @VorpalForceField 4 months ago

    excellent info .. Thank You .. Cheers :)

  • @maxbezrukov7711
    @maxbezrukov7711 2 years ago +1

    Perfect visualisation and well presented content. Thank you for your efforts!

    • @DavesSpace
      @DavesSpace 2 years ago

      Glad you enjoyed it!

  • @blazingentertainment5420
    @blazingentertainment5420 11 months ago

    I appreciate this work

  • @linuxguy1199
    @linuxguy1199 3 months ago

    One major improvement I've found is taking your CPU intensive Python code and writing it in this language called "C". Joking aside, great video!

  • @michaelmueller9635
    @michaelmueller9635 11 months ago

    This video is completely underrated.

  • @lovebroman9335
    @lovebroman9335 9 months ago

    Great video!

  • @iainelder7607
    @iainelder7607 1 year ago +1

    Why did you use "arrays" instead of stock lists? I didn't recognize the syntax or understand the reason for the dt variable. Thank you for the visual explanation. What are you using to draw the charts?
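We can't see which `array` type the video used or what `dt` stood for, so this is only a guess at one common reason: in multiprocessing code, a plain list is copied into each child, while `multiprocessing.Array` allocates a typed buffer in shared memory that every process can write into. A sketch, with the type code `'d'` (C double) and the helpers `fill_slot` and `run` as illustrative choices:

```python
import multiprocessing as mp


def fill_slot(shared, index: int) -> None:
    # Each process writes a different slot; the default Array wrapper also locks every access.
    shared[index] = index * 1.5


def run(n: int) -> list[float]:
    shared = mp.Array("d", n)  # n C doubles in shared memory, initialised to 0.0
    procs = [mp.Process(target=fill_slot, args=(shared, i)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return list(shared[:])


if __name__ == "__main__":
    print(run(4))  # [0.0, 1.5, 3.0, 4.5]
```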

  • @robinmaurer2645
    @robinmaurer2645 1 year ago

    It took me until 7:50 to realize that the voice is AI-generated. Good job. Also thanks for the helpful in-depth video!