I absolutely love the way you present your tutorials, in a no-nonsense manner - providing all the salient points without unnecessary info! A fantastic style and very impactful/memorable - thanks and keep it up!
Thanks for the kind words and thanks for watching!
Incredible tutorials!
Never seen something better than this, in any subject.
Keep up the good work, and thanks!
With your channel I learned almost everything I needed to know to get started with Julia. You will be in the acknowledgments of my master's thesis.
That's amazing! I'm honored!
I love your videos! I always recommend them to my friends in the JuliaBrasil group.
I'm very grateful! Much love to JuliaBrasil!
Dude I am soooo glad I found your channel! Your way of teaching is entertaining and informative!
Welcome to the channel! Enjoy exploring the content!
The timing of this video is perfect! I was starting to consider parallelizing my code at work, but wasn't sure where to start.
Now I can get started *and* not get bitten by a data-race right away 😌
Thanks! 🙏
Cool! Good luck!
For those of you wondering why the speedup of the parallel sum was so small (despite the large number of inputs) - this is due to false sharing. It happens when threads frequently read and write adjacent elements of memory (like in an array), which hurts performance because each write invalidates the shared cache line for the other cores.
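A minimal sketch of the pattern being described, with a common workaround (the function names here are illustrative, not from the video; the "bad" version also leans on `threadid()`, which assumes tasks stay pinned to their thread):

```julia
using Base.Threads

# Problematic pattern: every thread accumulates into adjacent slots of
# `partial`, so the slots share a cache line and each write invalidates
# it for the other cores (false sharing).
function sum_shared_slots(xs)
    partial = zeros(eltype(xs), nthreads())
    @threads for i in eachindex(xs)
        partial[threadid()] += xs[i]  # per-element write to a shared cache line
    end
    return sum(partial)
end

# One way around it: accumulate in a thread-local variable and touch the
# shared array only once per chunk.
function sum_local_acc(xs)
    n = nthreads()
    partial = zeros(eltype(xs), n)
    @threads for t in 1:n
        lo = (t - 1) * length(xs) ÷ n + 1
        hi = t * length(xs) ÷ n
        acc = zero(eltype(xs))
        for i in lo:hi
            acc += xs[i]
        end
        partial[t] = acc              # single write per chunk
    end
    return sum(partial)
end
```

The second version does the same total work but keeps the hot accumulator in a register instead of a contended cache line.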
This is a great video. Thank you for making it!
You're welcome!
I know it was just meant to be a toy example, but for simple reductions like summing a bunch of elements, foldxt from the Transducers library is great.
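For anyone curious, a small sketch of what that could look like (assuming Transducers.jl is installed; not the video's own code):

```julia
using Transducers  # external package: ] add Transducers

xs = collect(1.0:1_000_000.0)

# foldxt is a thread-parallel fold: it splits `xs` across tasks, reduces
# each chunk with `+`, and combines the partial results. Map(identity)
# is the (trivial) transducer applied to each element.
s = foldxt(+, Map(identity), xs)
```

Because the library handles the chunking and combining, there are no shared accumulators to worry about.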
Thanks!
Thank YOU! I really appreciate it!
Great tutorial! Very well explained 👍
Thanks!
Great video!! Do/Will you have a tutorial on GPU computing or heterogeneous (CPU+GPU) computing?
I have 3 videos on GPGPU using CUDA.jl (06x10, 06x11 & 06x12): [06x10] th-cam.com/video/VpbMiCG2Tz0/w-d-xo.html [06x11] th-cam.com/video/YwHGnHI5UxA/w-d-xo.html [06x12] th-cam.com/video/4PmcxUKSRww/w-d-xo.html
Very nice video. One of the best tutorials on parallel computing. This is the video that I was looking for. I really appreciate your tutorial. Many many thanks .🙇♂
Moreover, to compare the timing of the parallel program, I used the "Happy Birthday" problem from the YouTube video "Parallel Computing on Your Own Machine", where I used threading inside the birthday_distribution() function. I add this just to help people who are new to parallel programming.
Thanks for the kind words! For anyone interested in watching "Parallel Computing on Your Own Machine", here's a link to Prof. Edelman's video: th-cam.com/video/dczkYlOM2sg/w-d-xo.html
I was able to understand threads and how to use them
Thank you
You're welcome!
BTW: On my computer, the multithreaded abxy function was slightly faster (about 5%) than the built-in BLAS function when using the @inbounds macro (on an M1 Pro processor).
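For readers who haven't seen the video: a hypothetical axpy-style kernel in the same spirit (the video's actual abxy function may differ), showing where `@inbounds` goes:

```julia
using Base.Threads

# Computes z[i] = a*x[i] + b*y[i] across threads. `@inbounds` skips
# bounds checks inside the hot loop - safe here because
# eachindex(z, x, y) guarantees the arrays share valid indices.
function abxy!(z, a, x, b, y)
    @threads for i in eachindex(z, x, y)
        @inbounds z[i] = a * x[i] + b * y[i]
    end
    return z
end
```

Removing the bounds checks is the kind of micro-optimization that can close (or flip) a small gap against a tuned BLAS routine on short enough loops.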
Very interesting. Thanks for sharing your results!
Thank you for the great video! Could you please cover the Distributed package as well in the future? Personally, I think multi-processing is much more interesting than multi-threading. I used 6 to 10 threads to solve 4 similar optimization problems with Ipopt. The speedup from multi-threading is only about 20%. In contrast, using 4 cores with the Distributed package, it becomes almost 3 times faster.
You must be a mind-reader! The next video will be on Multi-Processing. It's available now for channel members, but it will be released to the public tomorrow (Sunday, February 12th). That's interesting to hear about your experience. Thanks for sharing!
@doggodotjl That is great! Looking forward to the video
Really nice video
Thank you! And thanks for being a channel member!
10:25 How do you get Julia output directly in the VS Code editor? Sorry, I haven't been following your videos for quite some time.
In VS Code Settings, search for "Julia Execution Result Type". There's a drop-down menu where you can choose to view the results, either in the REPL, inline or both. I'm using "both" as my option. After selecting an option, place your cursor at the end of the line of code and then hit Shift+Enter to execute that line. Hope that helps!
my right ear enjoyed this video
Much cores with this one
LOL! Ah, the classic Doge meme...
What irritates me is having to run both the serial and parallel versions at least 3 times, then running them again at least 3 more times, to be able to compare them fairly.
On the first run, Julia compiles the code, so that run usually takes longer.
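One way to sidestep the manual warm-up runs is BenchmarkTools.jl, which excludes compilation and collects many samples automatically (sketch, assuming the package is installed):

```julia
using BenchmarkTools  # external package: ] add BenchmarkTools

xs = rand(10^6)

# @belapsed (like @btime) runs the expression repeatedly and returns the
# minimum time in seconds, after a warm-up that absorbs compilation.
# Interpolating with $ keeps the global-variable lookup out of the
# measurement.
t = @belapsed sum($xs)
println("minimum time: $t s")
```

Comparing the `@btime`/`@belapsed` numbers of the serial and parallel versions replaces the "run each 3 times" ritual with a single statistically sampled measurement per version.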
Thanks!
Thank YOU!