Threads On Multicore Systems

  • Published on Feb 5, 2025
  • This video was sponsored by Brilliant.
    To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/.... You’ll also get 20% off an annual premium subscription.
    Join CodeCrafters and learn by creating your own: Redis, Git, HTTP server, Interpreter, Grep... in your favorite programming language:
    app.codecrafte...
    In this video we explore the role of threads on multi-core systems.
    Questions and business contact:
    contact.coredumped@gmail.com
    Sponsor my work on Github:
    github.com/jdv...
    Join our discord server:
    / discord
    Follow me on twitter:
    twitter.com/c...
    Twitch:
    / coredumpped

Comments • 392

  • @CoreDumpped 1 month ago +29

    For those wondering about my computer: x.com/coredumpped/status/1868128051361354032?s=46
    This video was sponsored by Brilliant.
    To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/CoreDumped. You’ll also get 20% off an annual premium subscription.

  • @Newb1eProgrammer 1 month ago +348

    No channel has every single video worth watching, well, except for the lord Core Dumped.
    Can we have more computer architecture videos?

    • @CoreDumpped 1 month ago +57

      Yes, sir!

    • @Xeros08 1 month ago +4

      Yes! Please

    • @Taented_9 1 month ago +3

      Realest 🔥💯

  • @DevWolf31 1 month ago +99

    This channel is like a gold mine. Here you can find all those concepts that seem complicated, in a form that almost everyone can understand.

  • @CaptTerrific 1 month ago +38

    We haven't even touched on synchronization and atomics yet, but I'm already stoked to see how that + this video + your IPC video come together to allow us to finally have insight into true parallel scalability! This journey has been so consistently clear yet thorough every step of the way, I can't wait to watch it culminate :)

    • @poopick9258 1 month ago

      Atomics? I'm scared to even google this.
      I will though.

  • @SlackwareNVM 1 month ago +89

    10:11 You can still chunk the array into 4 segments, find the minimum in each, and then take the minimum from those 4 results as you join the threads.
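
    A minimal Rust sketch of this chunk-and-reduce idea (the function name and the chunking are illustrative, not from the video): each thread finds the minimum of its segment, and the main thread takes the minimum of the per-segment results as it joins them.

    ```rust
    use std::thread;

    fn parallel_min(data: &[i32], n_threads: usize) -> Option<i32> {
        // Split the array into roughly equal segments, one per thread.
        let chunk = (data.len() + n_threads - 1) / n_threads.max(1);
        thread::scope(|s| {
            let handles: Vec<_> = data
                .chunks(chunk.max(1))
                .map(|seg| s.spawn(move || seg.iter().copied().min()))
                .collect();
            // Joining reduces the per-segment minima to the global minimum.
            handles
                .into_iter()
                .filter_map(|h| h.join().unwrap())
                .min()
        })
    }
    ```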

    • @no1ofinterst 1 month ago +11

      Yep! Forkjoin parallelism to the rescue!

    • @leongtengman6261 1 month ago +11

      We may split, map and reduce if the operation is associative like sum, min, max.

    • @EvanMildenberger 1 month ago

      It's good because a monad is just a monoid in the category of endofunctors! ;)
      I've finally realized that that statement just refers to things that are not nested in a meaningful way. What @leongtengman6261 is referring to about associativity is that a data structure that's associative for some operation can be thought of as "flat" and easily "chunkable" when trying to apply that operation to it. Trees can be represented as nested tuples whereas something list-like doesn't need parentheses to distinguish the structure. The challenge is that applying a map function to a tree requires knowledge of the internal structure and isn't easily scalable.
      But something without structure like lists, arrays, etc. (aka anything "monoidal", think "mono" for one possible internal structure) can easily have a map function applied to its parts in chunks, as long as the overall ordering of the chunks is preserved (since list-like things still imply an ordering property, compared to something like a bag or set). So monoids can be trivially fanned out / split into chunks as small as you want, have a function applied to all the items in each chunk, and then be fanned back in to a single data structure. Then the restriction on time is the overhead of fanning out and in (or "forkjoin" as @no1ofinterst phrases it), as well as the number of available resources.

    • @TheRealZitroX 1 month ago +2

      That's what I thought. It works when you have billions of numbers and split them.

    • @omri9325 1 month ago +1

      I have a feeling he was leading up to it in a follow-up example, but the video ended before it happened; maybe in a future video.

  • @rohithpokala 1 month ago +121

    Babe wake up. Core Dumped released his new video within 15 days.

    • @Newb1eProgrammer 1 month ago

      Yup

    • @sursmansurs9717 1 month ago

      Yeah

    • @CaptTerrific 1 month ago

      Babe wake up. @rohithpokala is about to win a subscription to CodeCrafters!

  • @stephanbylkov8215 1 month ago +6

    I'm a CS graduate and I think your videos are perfect for software engineers who did not have the chance to attend good CS basics classes. Your videos are simple, yet very informative and well structured.
    For those who also like these videos, I would recommend reading the book "The Elements of Computing Systems". It's quite simple and gives a lot of hands-on exercises, building things from the CPU parts up to a game.
    I would also like to see a video about CPU-GPU communication.
    PS: messed up a computer? seems like someone uses arch btw...

    • @Newb1eProgrammer 1 month ago

      @stephanbylkov8215 I'm not even in college yet but want to learn about system-level concepts, and this has been especially useful.
      I'll be sure to check out the book too.

  • @nickalexander2505 1 month ago +3

    Still can't believe I'm learning more about circuits and operation load from you than from my professors. I don't know how involved in the subject you are, but getting into the materials science and electrical engineering side of computing would be marvelous to have.
    Thanks a ton and keep up the quality content!

  • @stachowi 1 month ago +31

    This channel is GOAT.

  • @edvinberhan3842 1 month ago +4

    Amazed at the quality of content you are putting out! Illuminating animations combined with concrete examples from the Linux kernel is such a great way to communicate these concepts.
    Few other channels give me as many “aha” moments per unit of time! Keep up your style!

  • @olivierjoris3201 1 month ago +12

    To answer the question asked in the video at 9:45, think of it like baking a cake. You have two parts to the task: mixing the ingredients and baking the cake. Mixing can be done quickly and shared among friends, so the more friends you have, the faster you can mix. But once the cake is mixed, it has to go into the oven, and that part can only be done one at a time.
    So, even if you have four friends helping you mix, you still have to wait for the oven to bake the cake. This means that while adding more friends speeds up the mixing, the overall time is limited by the baking time. The more time-consuming the baking (which is the sequential part), the less benefit you get from having more friends (or processors).
    This is related to Amdahl's Law, which teaches us that not all tasks can be sped up just by adding more resources, especially if there are parts that must be done one after the other. It's a great reminder of the limits of parallel processing. More info here: en.wikipedia.org/wiki/Amdahl%27s_law.
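
    As a worked instance of Amdahl's law (the 80% figure below is an assumed example, not from the video):

    ```latex
    S(n) = \frac{1}{(1 - P) + P/n}, \qquad
    S(4) = \frac{1}{(1 - 0.8) + 0.8/4} = \frac{1}{0.4} = 2.5
    ```

    So if 80% of the work parallelizes, 4 cores give only a 2.5x speedup, and even infinitely many cores cap out at 1/(1 - P) = 5x.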

  • @pyikyaw3741 1 month ago +4

    Please also include hyper-threading and the way it works in future episodes. Does it really provide parallelism, or does it just improve concurrency? I've been reading around the internet, but I would love a CoreDumpped-level explanation. Thank you for all the great videos so far.

  • @oscarleon736 1 month ago +8

    9:40
    My thoughts are as follows: one might think that with 4 cores we'd achieve a 4x speedup, but the reality is more complex. I like to use this analogy: it's like trying to cook faster using several pans. While you have more "processing power," there are other factors to consider:
    Communication overhead: The cores need to coordinate with each other.
    Shared resources: RAM and cache are shared (like when multiple cooks share utensils).
    System competition: Other programs also need CPU time.
    From my experience with parallelism, using 4 cores typically yields a 2-3x speedup. It’s a significant improvement, but it doesn’t reach the theoretical 4x due to these limitations.
    I found it really interesting to see in the video benchmarks how different parts of the array took varying amounts of time to process. It makes sense: some numbers require more calculations to determine if they're prime.

  • @ignaciosavi7739 1 month ago +1

    Fantastic video! Your explanation of the difference between concurrency and parallelism was incredibly clear; I've never seen it laid out so simply before. The examples of data parallelism vs. task parallelism really helped solidify my understanding.
    On the topic of performance gains from parallelism, it might be interesting to dive into **Amdahl's Law** in a future episode. It highlights how the potential speedup of a task is limited by the sequential portion of the task that can't be parallelized. Understanding this could help viewers grasp why adding more cores doesn't always result in linear performance gains.
    Keep up the great work! Your videos make complex concepts accessible and engaging. Looking forward to the next one!

  • @anshumanupadhyay8073 1 month ago +2

    This is one of the most knowledgeable channels I have ever found on YouTube

  • @axelnava2800 1 month ago

    With this, now I can understand more about threads and parallelism.
    Please continue making more videos like this, and take the time that you need to do it. Thanks, Core Dumped

  • @Eiarlk 1 month ago +4

    One of the best things about this channel is that it uses the Rust language to explain concepts

  • @mehmet2247 1 month ago

    I have watched every single video on this channel to learn about the inner secrets of CPU and memory interaction. It has worked well for me so far. Thank you.

  • @johnmesfin625 1 month ago +1

    Your videos have been a lifesaver for my computer architecture studies. Winning the CodeCrafters subscription would be a huge boost to my learning journey. Thank you for creating such amazing and insightful content!

  • @_edoyt 1 month ago

    This is awesome! I have an upcoming OS exam, and seeing these concepts come to life through your animations has made grasping the underlying theory so much more intuitive! Much love from Italy 🙏

  • @michakowalski4578 1 month ago +1

    10:50 In this example the best way would be to create one loop in one thread, but add variables there for min, max, etc.; it can all be done in one loop.
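
    A tiny Rust sketch of that single-pass idea (assumes a non-empty array; the name is illustrative):

    ```rust
    // One loop tracks min, max, the sum (for the mean), and a
    // "contains 101" flag at the same time.
    fn single_pass(data: &[i64]) -> (i64, i64, f64, bool) {
        let (mut min, mut max, mut sum, mut has_101) = (i64::MAX, i64::MIN, 0i64, false);
        for &x in data {
            min = min.min(x);
            max = max.max(x);
            sum += x;
            has_101 |= x == 101;
        }
        (min, max, sum as f64 / data.len() as f64, has_101)
    }
    ```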

  • @KoDeCllan 1 month ago +1

    Great video! The upper limit for parallel speedup can be described with Amdahl's law. The law states that the sequential part of a program (for instance, synchronization points) has a tremendous impact on the achievable parallel speedup. You can think of it literally like a bottleneck: even though the majority of the bottle is pretty wide, the speed at which the fluid flows out of the bottle depends on the smallest area.

  • @YellowFlash469 1 month ago

    Your video really helped me understand the concepts of OS, especially threads, concurrency, and parallelism, which were tough to grasp before. It even helped me in my exams! As a student, learning about how an OS works and the basic principles of programming feels so much easier and more fun now.
    Your channel is such an incredible source of knowledge for students like me who are eager to learn but often struggle with complex topics. Thank you for creating such amazing and free content! Please keep making more videos like this; you're inspiring the next generation of developers. 🙌

  • @nousquest 1 month ago +8

    My cop-out for OS exam studying is watching your videos

  • @sattwyk 1 month ago +2

    This channel is slowly becoming one of my favourites.

  • @crystal4372 1 month ago

    Please keep it up; you are an amazing channel providing content that makes really hard-to-grasp and hard-to-research knowledge truly accessible.

  • @md.mohaiminulislam9644 19 days ago

    By the power vested in me as a YouTube commenter, I declare you the 3Blue1Brown of OS and Computer Architecture, since your videos are so pleasing to watch! In terms of visuals/animation, content, and delivery, all are great!

  • @sidreddy7030 1 month ago

    I learn something useful every time I watch your videos, and I watch them on TV cuz I love them so much lol.
    I think the state of the system at a given point in time changes the execution times, despite repeating the same task multiple times. You might have another background thread using one of the cores, making the running process get access to the core much later, and so on.

  • @chk1414 1 month ago +37

    Ben Eater disappeared when the world needed him the most. But I believe Core Dumped could save the world.

    • @CoreDumpped 1 month ago +19

      My only regret is not being able to record my own voice because some people seem to hate TTS to the point where they won't even watch the content but just comment "AI slop"

    • @Carmenifold 1 month ago +19

      @CoreDumpped Personally I love the joke of you actually being an LLM; it's a fun spin on the channel.

    • @breafoga 1 month ago +18

      @CoreDumpped Well, I gotta say the TTS voice is an *appeal* of the channel to me. It's memorable, comfy, puts me into a listening and study mindset, and I've just come to think of you like this. I legit quite like it, man; you're doing an incredible job with these videos.

    • @lauriekimani 1 month ago +7

      @CoreDumpped AI or not, your work can't be described as "slop". Every video seems to have a lot of thought put in.

    • @nightfox6738 1 month ago +11

      @CoreDumpped What those people don't realize is that just because the narration is AI, it doesn't mean the content wasn't extremely well thought out and high quality; it certainly wasn't AI generated like the voice. It could be worth trying your own voice, but honestly it's really not that big a deal. The people who are actually interested in the content won't care.

  • @deepjyotipaul1532 1 month ago

    You are a gift to humanity 🙏🏽
    Please share your knowledge - there are millions of people who would greatly benefit

  • @matyi1997en 1 month ago

    The visualisations help out a ton in understanding these concepts! Great job and thank you!

  • @ojasgupta1189 1 month ago

    Your explanations are awesome.
    Please make more videos.
    Also, can you share some of the resources you are learning these concepts from?

  • @andreyx2057 1 month ago +3

    An interesting note about threads on multicore systems is that although they generally improve performance, there are cases where performance can suffer compared to a single-threaded program, such as poor implementations where a thread has to wait for another to complete its task.

  • @Dremth 1 month ago +1

    It's worth mentioning that, depending on your data size, the order in which you process your data between threads may matter for cache performance, and the way in which your threads can sleep and be rescheduled may also impact performance due to the cache. This is one reason we sometimes might lock a thread to be scheduled on a specific core. The cache is typically tiered, meaning there will be a larger cache shared by all cores, and smaller caches exclusive to each core.
    For instance, at 10:25 with the example of task parallelism, some of the tasks are starting at the end, some at the beginning. If your data set is sufficiently large that it cannot fit within the shared cache, then your tasks could be competing for space in that cache as it tries to fetch from different places in memory, rather than using what another thread already fetched. Of course, caches can be complicated, especially when considering cache access for multiple simultaneous threads, and sometimes things don't work as you expect, so this is all hypothetical. All this is to say that if you want maximum performance, sometimes the order in which data is processed among different threads can be important, and profiling your code with these different scenarios is essential to finding the optimal performance, if that is important to you.

  • @TrollFunMineMafia 1 month ago

    Awesome content! It would be interesting to see the differences between dynamic dispatch and generics, and how they work behind the scenes, if the topic allows for it, ofc.

  • @wijionemaster1875 1 month ago

    Hello, your videos are amazing; you have a talent for explaining such difficult concepts in very simple words! The visualizations are extremely helpful!
    Could you please also create a video about compiled and interpreted languages? How do interpreters work, and how do they manage memory?
    For example, in JavaScript we don't specify a variable type; does that mean that only the identifiers themselves are stored on the stack while the actual values are stored in memory?

  • @azmah1999 1 month ago +7

    I haven't looked at the comments yet, so I don't know if it's been said, but I don't agree with you that it doesn't make sense to split the problems here (10:10 ish) into subsets. If you have the minimum of each subset, it's very easy to find the global minimum. Same thing with the maximum, and with whether or not the list contains 101 (even though I admit that in the last case you are worsening your best possible performance, if the value was at the beginning of the list). It's also easy for the mean: either compute the mean in each thread and then do a weighted average of the means, or just compute the sum in each thread and only do the division on the main thread.
    Apart from this nitpick, amazing work as always!
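
    A small Rust sketch of the "sum per thread, divide once" variant for the mean described above (names and chunking are illustrative; assumes a non-empty array):

    ```rust
    use std::thread;

    fn parallel_mean(data: &[f64], n_threads: usize) -> f64 {
        let chunk = (data.len() + n_threads - 1) / n_threads.max(1);
        let total: f64 = thread::scope(|s| {
            let handles: Vec<_> = data
                .chunks(chunk.max(1))
                .map(|seg| s.spawn(move || seg.iter().sum::<f64>()))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).sum()
        });
        total / data.len() as f64 // the only sequential step is one division
    }
    ```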

    • @Bornerdogge2 1 month ago +1

      I was about to comment the same, you are correct!

    • @IvanToshkov 1 month ago +1

      Exactly! This is basically the map/reduce architecture.

    • @rafaelfv3362 1 month ago +2

      Yes, and in the case of the contains it is even possible to cancel the remaining threads early when one of the threads returns because it has found the value.

  • @zsombornyiro1156 1 month ago

    We always love a CoreDumpped video. Thank you George😊

  • @RandomDude-ry6wt 1 month ago

    Your videos are helping me get through my course 😄

  • @joni5028 1 month ago

    Amazing content. What a wild time to be alive and have free access to such quality educational videos. Thank you very much.

  • @GamingfreakPC 1 month ago

    I read the book Modern Operating Systems by Tanenbaum twice to get the concepts, but I never understood them in such detail. This is by far the best YouTube channel for truly understanding this topic. I have recommended this channel since the first video. Do you have a Patreon?

  • @RepChris 1 month ago

    10:10 @CoreDumpped A small correction:
    All of those can be parallelized for a significant performance improvement. For min, max, and mean you can just split the data into n subsets, calculate the result for each of them, and then compute the final value from those n partial results. This is still roughly an n-fold increase in performance (asymptotically it is exactly n-fold), but of course not as much as for a fully parallelizable task.
    Likewise, for contains you can run the check on n subsets instead, and once one of them finds 101, that process sets the output to true (and optionally tells all the other processes to stop looking). This is in fact perfectly parallelizable, since no actual communication between the processes is needed beyond a single shared output variable in memory. The speedup here is EXACTLY n-fold (assuming no overhead for thread creation, etc.).
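
    A Rust sketch of that early-exit `contains` (names, chunking, and the use of a relaxed atomic flag are illustrative):

    ```rust
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::thread;

    fn parallel_contains(data: &[i32], target: i32, n_threads: usize) -> bool {
        let found = AtomicBool::new(false);
        let found = &found; // the single shared output variable
        let chunk = (data.len() + n_threads - 1) / n_threads.max(1);
        thread::scope(|s| {
            for part in data.chunks(chunk.max(1)) {
                s.spawn(move || {
                    for &x in part {
                        // Stop early if another thread already found the target.
                        if found.load(Ordering::Relaxed) {
                            return;
                        }
                        if x == target {
                            found.store(true, Ordering::Relaxed);
                            return;
                        }
                    }
                });
            }
        });
        found.load(Ordering::Relaxed)
    }
    ```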

  • @Zhiqian-r9g 1 month ago

    Love the way the knowledge is organized and presented. This is what is called "wisdom".

  • @coldball5578 1 month ago

    Amazing video! Multithreading, if done right, does wonders for performance. It works especially well when you are doing anything that is I/O bound, because I/O is very slow.
    I had a scenario where I had to process an array of tasks; each task fetched data from 4 different sources, then combined the results and stored them elsewhere. Completing 100 tasks took about 3 minutes, whereas introducing threads cut this time down to 15 seconds.
    While developing this, I also learned that you really need to be careful with multithreading. The tasks were assigned to a fixed-size thread pool so as to not spawn too many threads. Well, it turns out that if you do not handle exceptions correctly, the thread suspends. Now you have 1 less thread in the thread pool. Have it happen enough times and now you have hardly 1-2 threads trying to do all the tasks. The debugging of this issue was a roller coaster of emotions that I won't ever forget.

  • @nollix 1 month ago

    PERFECT visualizations. You really understand what makes concepts click and how they should be visualized.

  • @brocolli3096 1 month ago

    Interesting to see another viewpoint on thread architecture; I always thought it's better to have a thread per physical core, so threads don't overwrite each other's local data and cause cache misses. I hope you explore more in that area. Nice video!

  • @lowkeygaming4716 1 month ago

    Thank you for this. As an IT grad, this kind of deep dive into programming concepts was not discussed this thoroughly at our university. That's why, when I got to work with CS grads, I found out that I'm actually missing a lot of programming fundamentals. 🥺

  • @MT-cf2ms 1 month ago +1

    I'm new here, but this channel seems like one of the best of its kind. Good luck to the winner!

  • @vartan_babayan_4388 1 month ago

    I wish I had seen all your videos during my BSc.
    Hell, I'm in love with your animations.
    Would love to see more, especially about CPU scheduling.

  • @emilianoenriquez5637 1 month ago

    It's impressive how far we've gotten. At first, it was more like electronics, but now we're starting to dive into computing; I love it

  • @mruhul4 1 month ago

    I love this channel. Every time, I learn something.

  • @andrewallbright 1 month ago

    Haven’t had this all explained so simply before. Great video!

  • @abc21000 1 month ago

    Superb Explanation.
    Thank You.
    Please post more such videos.

  • @Mr.Tom9te 1 month ago +1

    Speaking from Brazil; I've been a big fan of yours since the video "Why is the Stack so fast". Very good video ❤

  • @Eeatch 1 month ago +1

    You're the best George!
    Thank you for your videos!

  • @will... 1 month ago

    Just discovered this channel and it's great! Much more digestible than my university courses.

  • @JamesRuga 1 month ago +4

    This channel is a gold mine! 🥳

  • @BruceDuncan 1 month ago +4

    I'm not sure there is a better example, but all four of the operations you mentioned can be parallelised :) min(array) is the same as min([min(subarray) for subarray in chunked(array)]). `contains` is best because the serial phase depends on the number of threads, not on the size of the data! There are even algorithms like partial sums which are really important in data parallelism. [sorry for the bad pseudocode!]

    • @CoreDumpped 1 month ago +3

      Yes, but I never said they couldn't be. What I said is that to get the result of those operations you need the whole dataset anyway, so assigning each task to a different thread makes more sense.
      Now, whether that's the best solution is out of the scope of the video.

  • @pablodevitoria 1 month ago +1

    Hi. I understand that you speak Spanish. Thank you for the Castilian Spanish audio option. I'm going to recommend your channel to my students so they can visually understand all these Operating Systems concepts.
    Thank you very much.

  • @heavyhauldude9914 1 month ago

    One of the only channels where I immediately like the video, because I already know I will love it!

  • @manoharnalluri7 1 month ago +2

    His explanations are god-tier ❤

  • @Timo-Epis 1 month ago

    Man, this guy's videos are so intuitive and so well explained.

  • @AbdulRahman-118 1 month ago

    This deserves its own award category. 🏆🌟

  • @vishal-rj4zh 13 days ago

    This channel's videos are just 🔥❤️❤️❤️❤️. Plz provide the next parts soon.

  • @nynirf975 1 month ago +1

    Can you make a video about floating-point numbers: how to convert them to decimal, arithmetic operations with them, and some other stuff?

  • @khachikhatsagortsyan7027 1 month ago

    Thanks man, you're always on top! Everyone just needs to be cautious when working with chunks on separate threads: make sure not to mutate the data or array you use, otherwise the program becomes unpredictable.

  • @MohanadSaid-u8x 1 month ago

    I'm your first supporter if you decide to make a website like Brilliant, but for low-level / computer science.
    I really would love to see that; at the same time, I feel like it's a missed opportunity that no other website has really taken, maybe because everyone is already familiar with and searches for conventional learning. But then again, the potential is apparent in your vids.
    Anyway, keep up the quality content!

  • @MrLinkos 1 month ago

    Beautifully animated, makes the subject x10 easier to digest!

  • @m0r4ag 1 month ago +2

    The video is very cool; I really love the visuals and the animations used to explain the concepts. What do you use, or how do you make the animations?

  • @IvanKravchenko-k5y 1 month ago

    Your videos are so informative and valuable, thank you🙏

  • @GalaxyHomeA9 1 month ago +1

    CoreDumped releases the video that I was excited about. Thanks!

  • @Xeros08 1 month ago +2

    I think of the usefulness of parallelism as being proportional to the consistency and complexity of the task.
    If the task is computationally simple, it will parallelize well. For example, adding up all the numbers in a big list, or sorting an array with merge sort.
    But things like calculating prime numbers, which is not a linear-time problem, don't parallelize so well, as you'll be waiting for the slowest thread.
    This brings me to GPUs. GPUs are excellent for parallelization, great at vector math, but for them to be great, you should have every thread (aka fragment) finish executing as close to each other as possible.
    Parallelization is still a great tool, though, but it shouldn't be abused. The limiting factor will be the slowest thread, and you should scale the number of threads accordingly.
    Edit: As always, great video ❤

  • @sainagaviswachaitanyabasav8945 1 month ago +1

    Regarding the point raised at 9:51, I think factors like synchronization overhead, cache coherency issues, and imperfect load balancing reduce efficiency. Additionally, physical constraints such as heat generation and power limits can lead to throttling, further hindering performance scaling. These challenges, combined with the inherent difficulty of fully parallelizing tasks, prevent linear scaling even in isolated environments

  • @kayakMike1000 18 days ago

    In the RISC-V ecosystem there are "harts", which are hardware-supported threads. A core can be constructed with extra hardware that makes one core operate on two threads, as in hyper-threading.

  • @howitwork-4u 1 month ago

    The answer to the question asked in the video depends on how much of the process/task/program (sorry, I'm not clear about these terms) is parallel, and on how much the number of cores increases.
    Then, by applying Amdahl's law:
    speedup = 1 / ((1 - P) + P/n)
    P = fraction of the program that can be parallelized, e.g. 0.4 for 40%.
    n = number of cores.
    From this, we can find the speedup.
    To calculate the performance increase with respect to time:
    Tp = Ts / speedup
    where Tp is the execution time after parallelizing, and Ts is the sequential time on a single core, before parallelizing.
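
    A direct translation of these formulas into Rust (the 40% and 1000 ms figures are illustrative assumptions):

    ```rust
    fn speedup(p: f64, n: f64) -> f64 {
        1.0 / ((1.0 - p) + p / n) // Amdahl's law
    }

    fn main() {
        let s = speedup(0.4, 4.0);         // P = 0.4 (40% parallelizable), n = 4 cores
        println!("speedup = {s:.2}x");     // ~1.43x
        let ts = 1000.0;                   // Ts: sequential time in ms (assumed)
        println!("Tp = {:.0} ms", ts / s); // Tp = Ts / speedup, ~700 ms
    }
    ```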

  • @Tarakiiht 1 month ago

    Great explanation!! Please create a video on kernel internals.

  • @saifbenzamit5980 1 month ago +2

    I'm putting faith in you guys, upvote ❤

  • @nevokrien95 1 month ago

    The 4 ops you gave actually allow for a very elegant data split, since they are all reductions.
    For instance, we can compute the max for each of the 4 slices; then the maximum of those maxima is the true max.

  • @RalphWu55 1 month ago +1

    9:41 Here the workload is distributed by splitting the data into 4 even subsets. Instead it could be distributed using a queue, where each thread pops the next number from the queue (or a small chunk of numbers if each pop is costly). This would result in a more even utilisation of all cores.
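
    A Rust sketch of that dynamic scheme, using a shared atomic cursor instead of a literal queue (the chunk size, names, and prime test are illustrative): fast threads naturally claim more work.

    ```rust
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::thread;

    fn is_prime(n: u64) -> bool {
        n >= 2 && (2..).take_while(|d| d * d <= n).all(|d| n % d != 0)
    }

    fn count_primes_dynamic(data: &[u64], n_threads: usize, chunk: usize) -> usize {
        let chunk = chunk.max(1); // guard against a zero chunk size
        let cursor = AtomicUsize::new(0);
        let cursor = &cursor; // shared claim index
        thread::scope(|s| {
            let handles: Vec<_> = (0..n_threads)
                .map(|_| {
                    s.spawn(move || {
                        let mut local = 0;
                        loop {
                            // Claim the next small chunk of work.
                            let start = cursor.fetch_add(chunk, Ordering::Relaxed);
                            if start >= data.len() {
                                break;
                            }
                            let end = (start + chunk).min(data.len());
                            local += data[start..end].iter().filter(|&&n| is_prime(n)).count();
                        }
                        local
                    })
                })
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).sum()
        })
    }
    ```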

  • @bornmad 1 month ago

    How much we can get out of multiple cores depends on a lot of factors. Take the example from the video of checking for prime numbers: if we have a sorted data set, we can get well-balanced results by distributing the data modulo-wise. With 4 cores: the 1st number to core 1, the 2nd to core 2, the 3rd to core 3, the 4th to core 4, then the 5th to core 1, the 6th to core 2, the 7th to core 3, the 8th to core 4. This balances very well, but it needs pre-computation or sorted data. I also want to point out that if one wants to find the min, we can find the min in each core's share of the data and take the min of those; this is also efficient.
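
    A sketch of that modulo distribution in Rust (thread t takes elements t, t + C, t + 2C, ...; names and the prime test are illustrative):

    ```rust
    use std::thread;

    fn is_prime(n: u64) -> bool {
        n >= 2 && (2..).take_while(|d| d * d <= n).all(|d| n % d != 0)
    }

    fn count_primes_strided(data: &[u64], n_threads: usize) -> usize {
        let n_threads = n_threads.max(1);
        thread::scope(|s| {
            let handles: Vec<_> = (0..n_threads)
                .map(|t| {
                    s.spawn(move || {
                        // Stride through the data so "big" numbers spread across cores.
                        (t..data.len())
                            .step_by(n_threads)
                            .filter(|&i| is_prime(data[i]))
                            .count()
                    })
                })
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).sum()
        })
    }
    ```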

  • @ValentinLezhnin 1 month ago

    Brilliant as usual. Looking forward to a new episode with a code example.

  • @fabriai 1 month ago

    Great animation and excellent educational video. Thanks.

  • @BenoîtBurnichon 1 month ago

    This channel should have a lot more stars. Getting a new video is like Christmas arriving before the actual date.

  • @RaviRanjan-fr2bt 9 days ago

    Core Dumped has changed the way I look at computer science; my university didn't teach it this way ❤

  • @i.umarov5245 29 days ago

    We also need a video on how having several cores increases overall CPU performance in different ways. I believe multithreading is just one way of utilizing several cores.

  • @etienneperron7658 1 month ago

    I don't know if you take suggestions, but a video on file descriptors (or file pointers) could be entertaining, like how file paging works and how processes get access to files (and shared memory space).

  • @spacebuddy5339 1 month ago

    I'm early!! Bro, I love your videos so much. I'm subscribed and I like all of them. Please, can you do a video on containers like Docker? An in-depth explanation please, thank you.

  • @Chinmay_Rath 1 month ago

    Amazing explanation and video. Glad I came across this. Can you please also explain the difference between a CPU core and a virtual core, why it's needed, and what the difference is at the hardware level? I mean, when we say things like "This Intel chip has 4 physical cores and 8 virtual cores, i.e., two virtual cores per physical core", what does this mean? How does this work? And how does it relate to the threads in this video? I'd be very grateful if you explained this.

  • @ShinSpiegel 1 month ago

    Another amazing video, as always. For me, this video is a Christmas gift :)

  • @iqueirozgo 1 month ago

    That's the kind of content that I like on YouTube

  • @iloveparallelism 1 month ago

    I love your videos, man! Somehow your videos cover almost exactly my class exam topics lol.
    Regarding the performance gain though: to me, there will always be overhead in parallelism (distributing/broadcasting, then gathering), and I would have to ask myself whether the parallelism is worth it. Of course, there's also the problem of distributing the work/data equally.
    A simple example my prof used to give: imagine you have to count 10 candies. It would be much faster to count them yourself than to ask 10 people to count them and then summarize the result :)).

  • @ivankudinov4153 1 month ago +2

    I know it's a bit more complicated, but man, if you ever need a rest from dropping all these gems, to achieve better quality/results/polish in return, please take it. Your videos will still be played years later (yes, they're that good), and we can wait a couple of weeks.

  • @mrobledo7197 1 month ago

    According to my university classes on computer architecture, the maximum theoretical speed-up you can achieve by parallelizing a problem is limited by Amdahl's Law. This depends on the fraction of the problem that can be parallelized. For instance, if a problem is 100% parallelizable, the theoretical maximum speed-up would be N, where N is the number of processors in the system. However, in practice, the actual speed-up is lower due to additional factors, such as the synchronization required for communication between processes, contention for shared resources (like data in memory), and the overhead of creating and managing threads or processes. These practical limitations often prevent parallelization from reaching its theoretical maximum.

  • @web-dev-spider 1 month ago

    This video made me incredibly excited for your next video.

  • @gFamWeb 1 month ago +6

    Really good video! Honestly, though, data vs. task parallelism, to me, starts jumping into more arbitrary distinctions (as it doesn't actually affect what the CPU does), but perhaps the next video will make it clearer. Looking forward to it!

    • @Dremth 1 month ago

      It becomes more distinct when you have threads that depend on other threads, or if you have any writes to shared data. Basically, data parallelization can often imply that the underlying data isn't going to change, so the tasks are more isolated, and thus it's a much more straightforward approach to scaling wide with cores. With task parallelization, those tasks may have side effects that alter the underlying data and/or access a shared resource that can only be safely accessed by one thread at a time, meaning it's more important that these methods involve locking mechanisms to avoid corrupting things. When you have to use locking mechanisms, that's when it does have a tangible effect on what the CPU is doing.

  • @John-zz6fz 1 month ago +1

    I'd love to see you cover GPU architecture when you finish up the CPU and OS series!

  • @edjohnson5875 1 month ago +3

    9:45 How much performance gain can we achieve from parallel operations? Let's try (I put some icons to reference the formulas 🔶️🔺️) -->
    Things to consider :
    - Time cost of parallelization P : thread creation, data splitting and reconstruction (or result merging) time
    - Size of the data we manipulate N
    - Number of cores C
    - Each split has a size S = N/C
    - The executed algo (prime test here): the bigger the number, the longer it takes to test (looking for divisors below its square root), in this specific case
    Tbase : execution time with no parallelization
    Tpara : execution time with parallelization
    ⭐ Having a gain means Tpara < Tbase 🔺️
    Tpara = P + Tslowest_thread 🔶️ (700ms here)
    We have this slow thread here because of the algo executed and the fact that the first split contains more big numbers than the others.
    If we randomly shuffle the input for each execution during the test, getting a fairer distribution of big numbers on average, we have Tpara = P + Tavg_thread 🔶️. I think this is the intuitive theory ->
    Tavg_thread =~ Tbase / C
    -> Tpara = P + Tbase / C 🔶️
    -> we want P + Tbase / C < Tbase to have a gain 🔺️
    -> P < Tbase ( 1 - 1/C ) 🔺️
    C = 1 (one core) : no parallelization
    C = 2 : the parallelization cost in time has to be less than Tbase/2 or 50%Tbase to have a gain (P < 50%Tbase)
    C = 4 : P < 75%Tbase
    P has to be lower than a bound (a percentage of Tbase) that grows with the number of cores. In other words, the gain seems easier and easier to achieve as C increases. ⚠️ BUT the number of cores used (C) impacts the parallelization time cost (P). P = f(C)
    Pause ! What am I doing ?! First, that was not the question, second I am not an expert and third, George will explain this way better. Let's go back to the point.
    ⭐ Gain = Tbase - Tpara
    Using 🔶️ -->
    Gain = Tbase - ( P + Tbase / C )
    Gain = Tbase * (1 - 1/C) - P 💥
    If we force the equal distribution of big numbers in each split, the time of execution of this should be added to P.
    I would be glad to be corrected if I made any mistake.
    PS : George @Core Dumped is fantastic ! 🔥
    The 3Blue1Brown of low-level software engineering. I am binge-watching and I love it.
    Any chance to know which software you use ?
    Thank you for the amazing job !

  •  1 month ago

    Keep up the high quality content

  • @bruceu526 1 month ago

    This channel is gold. Makes me want to study Computer Engineering.

  • @gamingdude7959 1 month ago

    "hi friends!, my name is george" is soo calmly said by TTS :))
    Thank you george