Nvidia CUDA in 100 Seconds

  • Published Jan 31, 2025

Comments • 1.4K

  • @Fireship
    @Fireship  11 months ago +1595

    Shoutout to Nvidia for hooking me up with an RTX 4090 to run the code in this video. Get the CUDA toolkit here: nvda.ws/3SF2OCU

    • @universaltoons
      @universaltoons 11 months ago +28

      🥇

    • @light-gray
      @light-gray 11 months ago +49

      ZLUDA be like:

    • @TuxikCE
      @TuxikCE 11 months ago +408

      yes mom, I need a 4090 to run CUDA.

    • @u_j134s
      @u_j134s 11 months ago +150

      Damn you really put that rtx4090 through hell

    • @HolyRamanRajya
      @HolyRamanRajya 11 months ago +81

      So this is sponsored?

  • @tigerseye1202
    @tigerseye1202 11 months ago +10080

    Little known fact: CUDA is actually so fast that it can bend spacetime and make 100 seconds last 3 minutes and 12 seconds. Truly revolutionary.

    • @killerdroid99
      @killerdroid99 11 months ago +518

      Underrated comment

    • @JJGlyph
      @JJGlyph 11 months ago +634

      He ran the seconds in parallel with Cuda.

    • @sarimsalman2698
      @sarimsalman2698 11 months ago +275

      Serious question, why are these videos never 100 seconds?

    • @NigerianWeeb
      @NigerianWeeb 11 months ago +251

      Because it's just the name of the series. A catchy title, really. I don't think anyone cares if they're exactly 100s.

    • @Clarity-808
      @Clarity-808 11 months ago +183

      To be fair, he explained it in 90 seconds, the rest is building an app.

  • @mjiii
    @mjiii 11 months ago +2529

    The #1 computing platform for vendor lock-in

    • @PRIMARYATIAS
      @PRIMARYATIAS 11 months ago +143

      And so is Apple.

    • @AchwaqKhalid
      @AchwaqKhalid 11 months ago +73

      Dell in the server space too

    • @turolretar
      @turolretar 11 months ago +73

      Cisco as well

    • @anonymouscommentator
      @anonymouscommentator 11 months ago +77

      yall forgetting about aws? 😂

    • @ps3guy22
      @ps3guy22 11 months ago +101

      No, Nvidia is an open computing platform dedicated to the development of democratized development and open standa--- Pfff 🤣🤣🤣 hahdahha!!

  • @meh3lp
    @meh3lp 11 months ago +1299

    0:36 this just taught me matrix multiplication, thanks

    • @alvinbontuyan8083
      @alvinbontuyan8083 11 months ago +91

      The best thing that ever happened to me was figuring out what matrices actually represent (a linear transformation); I've been able to do matrix multiplication without any memorizing simply because it's intuitive now. Try this too, because schooling has failed us

    • @_rshiva
      @_rshiva 11 months ago +16

      I think that is taken from @3blue1brown, @Fireship ??

    • @goddamnit
      @goddamnit 11 months ago

      @@alvinbontuyan8083 can you give a quick example of what you mean by this? I'm not that smart, thanks!

    • @AiSponge2
      @AiSponge2 11 months ago +16

      lmao fr, those 3 seconds are extremely helpful

    • @DanielMaixner
      @DanielMaixner 11 months ago +2

      I was thinking the same thing. I couldn't understand it from teachers, and a 3-second animation made it make sense
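
  For anyone who wants that animation as code, here is a minimal sketch of the same row-times-column rule written as a naive CUDA kernel; the names and the square, row-major layout are illustrative assumptions, not code from the video:

      // One thread per output element of C = A * B (square N x N matrices).
      // Each thread computes the dot product of one row of A with one
      // column of B, which is exactly what the animation shows.
      __global__ void matmul(const float* A, const float* B, float* C, int N) {
          int row = blockIdx.y * blockDim.y + threadIdx.y;
          int col = blockIdx.x * blockDim.x + threadIdx.x;
          if (row < N && col < N) {
              float sum = 0.0f;
              for (int k = 0; k < N; ++k)
                  sum += A[row * N + k] * B[k * N + col];  // row dot column
              C[row * N + col] = sum;
          }
      }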

  • @mrgalaxy396
    @mrgalaxy396 11 months ago +3810

    I've done a bit of CUDA in uni for a class on parallelism. Let me tell you, writing truly parallel code is a pain in the ass. Ain't no way all those scientists are writing CUDA code; it's probably some Python abstraction that uses C++ and CUDA underneath.

    • @acoupleofschoes
      @acoupleofschoes 11 months ago +651

      Like PyTorch and Tensorflow

    • @Imperial_Squid
      @Imperial_Squid 11 months ago

      model.to("cuda:0") is the only CUDA you need to know unless you're developing new algorithms or doing something truly wacky

    • @MaeLSTRoM1997
      @MaeLSTRoM1997 11 months ago +89

      some (x) mostly (o)

    • @oksowhat
      @oksowhat 11 months ago +254

      yeah, that's why PyTorch and TensorFlow exist. I have parallelism and HPC both this sem, writing OpenMP and MPI code, truly a PITA

    • @CraftingCake
      @CraftingCake 11 months ago +492

      There are a few geniuses who write libraries and then there are thousands of devs who build products out of them....

  • @WolfPhoenix0
    @WolfPhoenix0 11 months ago +241

    I did some CUDA programming assignments for my college Parallel Computing class.
    That course was the second hardest CS course I've ever taken (the hardest one is Compilers, but that's in its own league). Human brains really weren't designed to think in parallel.

    • @DK-ox7ze
      @DK-ox7ze 11 months ago +3

      Which college and course?

    • @skyhappy
      @skyhappy 11 months ago

      The teacher probably sucked, like most academic teachers. If you had Fireship it would be a hundred times easier.

    • @KoaIa200
      @KoaIa200 11 months ago +23

      I would argue that people were not really "designed" to think in any specific way... neuroplasticity for the win... same way most programmers can think in code. Practice makes perfect.

    • @KoaIa200
      @KoaIa200 11 months ago +4

      @@duckbuster1572 It's common for it to be a course in your last year of undergrad... I don't see why it would be horrific.

    • @khSoraya01
      @khSoraya01 10 months ago

      Which kind of projects? I'm looking for some project ideas

  • @0seele
    @0seele 11 months ago +635

    Seeing "Hi Mom!" continue to be in your videos is such a beautiful thing. Hope you're holding up well

    • @FengHuang13
      @FengHuang13 11 months ago +18

      Yes, my eyes got wet when I saw that

    • @forhadrh
      @forhadrh 11 months ago +23

      Mom be like: I am proud of you, my son

    • @kamikaze9271
      @kamikaze9271 11 months ago

      Wait, where?

    • @forhadrh
      @forhadrh 11 months ago

      Where? What did you watch in this video then, lol. @@kamikaze9271
      Here: 1:45, 2:53

    • @depralexcrimson
      @depralexcrimson 10 months ago

      @@kamikaze9271 2:52

  • @theycallmerye3
    @theycallmerye3 11 months ago +113

    ngl, I'm really loving how often these videos are being uploaded. It's often, but not so often that I feel overwhelmed, and just spaced out enough that I feel a little excited when a new one comes out!

    • @YOTUBE8848
      @YOTUBE8848 11 months ago +3

      wait until he drops some existential crisis type content lol

  • @8XN72Hw_xK
    @8XN72Hw_xK 11 months ago +138

    Wrote CUDA at university... getting the indices, blocks, etc. right... that was fun (also since thread count depends on the actual GPU model). For the final project, we were allowed to use libraries such as thrust, which made my life a ton easier by abstracting away most of the fun stuff.

    • @KoaIa200
      @KoaIa200 11 months ago +4

      Thread count does not depend on the GPU model (max 1024 threads per block); the total number of blocks and cores depends on the number of SMs and the CUDA compute capability.

    • @Brahvim
      @Brahvim 11 months ago +2

      Sounds like the "fun" was actually "fun boilerplate but it's still just boilerplate". Correct? Or... are you being _purely_ sarcastic?

    • @8XN72Hw_xK
      @8XN72Hw_xK 11 months ago

      @@Brahvim Both, actually. It was fun in the beginning, but with more complex projects/tasks it became harder to understand how to use it correctly (especially kernel launch configs with the dimensions, etc.). Maybe, with more experience, it would be easier for me today than it was at that time.
      But don't get me wrong: they also showed how to do the same thing with OpenCL, and the amount of boilerplate code needed to get it running was way more than with CUDA.
      And when they allowed using thrust for the final project, most of the boilerplate code was gone because thrust abstracts that away. It was more fun to work with an API that offers host and device vectors and a standard library for common tasks. But thrust also abstracts away the launch configurations for kernels etc., so you lose control (which was fine for me because I struggled with the more advanced concepts). And I guess you lose some speed/memory efficiency, like with all abstractions.

    • @8XN72Hw_xK
      @8XN72Hw_xK 11 months ago +1

      @@KoaIa200 you are right, I am sorry. The more advanced kernel launch configs with block size etc. were quite hard for me, and I haven't used CUDA in years now. But I remember struggling with the concepts after the initial easy tasks.

    • @8XN72Hw_xK
      @8XN72Hw_xK 10 months ago +1

      @@Brahvim No, it actually was fun, but it is also hard. And if you compare it to OpenCL, it is actually much, much less boilerplate code.
      In the beginning the exercises were quite easy, but with more complex tasks it became much harder. For the final project we were allowed to just use thrust, which is a library that makes things much easier. E.g. it provides host and device vectors, and it also handles all the boilerplate stuff. However, you lose control because it is an abstraction, and probably some speed. But today, if I needed to do CUDA again, it would be with thrust (at least in the beginning).

  • @smx75
    @smx75 11 months ago +387

    0:45 IEEE 754 moment

    • @cloudytheconqueror6180
      @cloudytheconqueror6180 11 months ago +3

      When you use TFLOPs, is it single precision or double precision? Because I see double precision here.

    • @adialwaysup8184
      @adialwaysup8184 11 months ago +21

      Gives me PTSD from my master's thesis. Had to modify 4 flags in clang to get acceptable results. Took me a while to figure out.

    • @Temari_Virus
      @Temari_Virus 11 months ago +11

      @@cloudytheconqueror6180 Single precision. Double precision is often much slower, though the RTX 4090 is just able to get into the teraflop range for f64
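
  (Rough arithmetic for context, assuming Nvidia's published Ada figures: the RTX 4090 is rated at roughly 82.6 TFLOPS in single precision, and Ada's FP64 rate is 1/64 of FP32, so about 82.6 / 64 ≈ 1.3 TFLOPS in double precision, consistent with "just able to get into the teraflop range".)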

  • @imWaytooRad
    @imWaytooRad 11 months ago +21

    Thanks! I was having this discussion with my coworkers the other day about what separates a GPU from a CPU, and this was an excellent explanation!

  • @ucantSQ
    @ucantSQ 11 months ago +19

    Whoa, my universes are operating in parallel. I just learned about CUDA this morning for the first time, and here's a new fireship video about it.

  • @bartlx
    @bartlx 11 months ago +41

    Nice to see a video touching C++'s ecosystem for a change. Now make one about SYCL, so even people who don't find free RTX 4090 cards in their mailbox can get into high-performance parallel computing using modern ISO C++ instead of custom CUDA syntax.

    • @vladislavakm386
      @vladislavakm386 11 months ago

      yeah, Nvidia dominates in parallel computing because software engineers only know CUDA.

    • @TheRealFFS
      @TheRealFFS 10 months ago +5

      @@vladislavakm386 You got that backwards, but ok.

    • @JohnUrbanic-m3q
      @JohnUrbanic-m3q 8 months ago

      SYCL is needlessly low-level. Use OpenMP, with GPU targets.

  • @Julzaa
    @Julzaa 11 months ago +684

    1:09 still day zero of not mentioning AI

    • @2099EK
      @2099EK 11 months ago +77

      AI is definitely worth mentioning.

    • @rkvkydqf
      @rkvkydqf 11 months ago +58

      @@2099EK Please, can we just not? Physics models (for example) are much more interesting (in my opinion) than curve fitting on steroids. (Just a matter of avoiding a cliché and showing a greater range of GPU computing applications.)

    • @thecutepika
      @thecutepika 11 months ago

      Why? Fitting such complex curves, ones that reflect reality, is indeed worth mentioning @@rkvkydqf

    • @devrim-oguz
      @devrim-oguz 11 months ago +1

      It’s more like zero minutes 😂

    • @mechadeka
      @mechadeka 11 months ago +43

      @@anon8510 You're literally on a technology channel, you Twitter drone.

  • @gagd7351
    @gagd7351 9 months ago +3

    As a programmer I absolutely love your series on programming languages and tools! It couldn't be clearer, and it's full of knowledge. Thank you. It also refreshes common knowledge, as the C video did!

  • @petrsehnal7990
    @petrsehnal7990 11 months ago +195

    Man, you are a genius. I wrote my master's thesis on CUDA, and there's no way I would be able to explain this in 100 seconds.
    Respect! 🎉

    • @klekaelly
      @klekaelly 11 months ago +7

      Can I read your master's thesis?

    • @Real-Hg-Mercury
      @Real-Hg-Mercury 11 months ago

      same, LMK when you get it @@klekaelly

    • @maymayman0
      @maymayman0 11 months ago +17

      Could you do it in 192 seconds??

    • @noanyobiseniss7462
      @noanyobiseniss7462 11 months ago +6

      Really? I thought OpenCL does this just fine. Funny thing is,
      ALL GPUs are designed to be parallel computers, and
      AMD is actually more massively parallel than Ngreedia.
      He didn't describe anything that is CUDA-specific; did you really not get that when writing your thesis?

    • @petrsehnal7990
      @petrsehnal7990 11 months ago +4

      @klekaelly thank you, but it was on CUDA version 1.0, which is really outdated from both software and hardware perspectives. Furthermore, it is not in English. But I really appreciate your interest!

  • @MaxoticsTV
    @MaxoticsTV 11 months ago +14

    Funny, I had to install NVIDIA CUDA for a thing I'm doing and forgot what CUDA does, searched it, and found this video that was just posted an hour ago! WHAT TIMING!!!

  • @4RILDIGITAL
    @4RILDIGITAL 11 months ago

    Impressive explanation of how we can harness the power of the GPU using Nvidia's CUDA for more than just gaming. The practical demonstration really showed the potential of parallel computing.

  • @Rohinthas
    @Rohinthas 11 months ago +13

    Not using or planning to use CUDA but man did this just help me make sense of some terms I see being thrown around! Awesome!

  • @Munto-Z
    @Munto-Z 11 months ago +1238

    Bruh, are you my FBI agent? I just looked CUDA up a few hours ago.

    • @guinea_horn
      @guinea_horn 11 months ago +130

      Yeah man, he monitored your web traffic, saw that you wanted to learn about cuda, and then made this video as fast as he could since he knew you would watch it.

    • @MrMudbill
      @MrMudbill 11 months ago +41

      Now I'm scared about tomorrow's video

    • @ABZein
      @ABZein 11 months ago +11

      I was thinking of learning about CUDA. He is a mind reader

    • @gosnooky
      @gosnooky 11 months ago +8

      That's classified.

    • @soufianenajari8900
      @soufianenajari8900 11 months ago +5

      literally doing a homework assignment in CUDA rn

  • @scapegoat079
    @scapegoat079 11 months ago +1

    Yo, I just wanted to say thank you for making this kind of stuff so interesting and digestible. You take these extremely complex, time-intensive languages, APIs, tools, etc., and make them incredibly approachable.
    Love your content. Cheers.

  • @somerandomdudemc6201
    @somerandomdudemc6201 10 months ago +3

    Hello sir,
    Today is my high school IT exam.
    Thank you for giving me so much knowledge over these years.
    Thank you, sir

  • @boredofeducation-sb6kr
    @boredofeducation-sb6kr 11 months ago +1

    I loved the animations and the explanation... I just finished a CUDA course for my master's, so it was mind-blowing to see a whole week's worth of lectures effortlessly compressed into... 100 seconds

    • @khSoraya01
      @khSoraya01 10 months ago

      Can I see the course?

  • @lucasgasparino6141
    @lucasgasparino6141 11 months ago +5

    Hey, that was nice! I use both CUDA and OpenACC EXTENSIVELY to build CFD applications, and the performance on GPUs is really fantastic... when done well xD I strongly recommend against managed memory for complex production codes, if only because it seems to disable device-to-device DMA comms when using MPI. For anyone thinking about porting to GPUs, I recommend not half-arsing it: just make all data available to the devices. Host/device exchanges can be brutally costly and will likely eat up all your gains. Finally, it works with C and Fortran as well, for anyone curious about it :) Fireship, it'd be nice to see a "beyond 100 seconds" on this, covering OpenACC and offloaded OpenMP as well 😊

    • @jaiveersingh5538
      @jaiveersingh5538 11 months ago

      Which CFD software has CUDA acceleration? Just Ansys Fluent right now, right?

    • @lucasgasparino6141
      @lucasgasparino6141 11 months ago

      @adialwaysup8184 not really, we performed some testing on A100s and H100s, and offloaded OMP was WAY slower. Sure it's portable, but ACC is still getting love. It's also syntactically easier and cleaner in my opinion.

    • @lucasgasparino6141
      @lucasgasparino6141 11 months ago

      @jaiveersingh5538 take a look at research codes. Nek5000 uses CUDA, and NekRS as well if I remember right. Our own code started as CUDA Fortran, but we eventually moved to OpenACC; easier to use and explain to other users. Quite a few libraries behind research software also use CUDA, or even OpenCL. For matrix-free SEM methods, CUDA might be a bit hard to implement, but it's as fast as it gets.

    • @adialwaysup8184
      @adialwaysup8184 10 months ago

      @@lucasgasparino6141 For us, OMP was performing 2% slower than ACC and 6-8% slower than CUDA. Though the performance was much worse on clang than on nvhpc.

    • @adialwaysup8184
      @adialwaysup8184 10 months ago

      @@lucasgasparino6141 In my experience there's currently a major discrepancy in how well compilers optimize code for accelerators. This is doubly important when it comes to Nvidia, since the nvptx backend is far from perfect. But if the same tests are done on Nvidia, say with nvhpc, I found an overall 2-3% gap between OpenMP and OpenACC. I do agree with your second point: OpenACC is much cleaner to write and integrates well, but at that point you're backing yourself into a corner with Nvidia's hardware. OpenACC might be an open standard, but no one except Nvidia gives it serious consideration. If you're going all-in with Nvidia anyway, why bother with OpenACC? Just move to CUDA.

  • @n.w.4940
    @n.w.4940 11 months ago +13

    Aside from this being a very informative video... heartwarming that you put in that "Hi mom" message.
    Probably one of the most concise videos on this topic.

  • @KorruFreez
    @KorruFreez 10 months ago +46

    Sometimes I regret my career choices

    • @xt-cj7jg
      @xt-cj7jg 6 months ago +6

      There's always time. Learning never stops, so why should you?

    • @vkhs562
      @vkhs562 5 months ago +1

      @@xt-cj7jg yeah, exactly

    • @gabrielaleactus9932
      @gabrielaleactus9932 5 months ago +4

      What happened bud

    • @vkhs562
      @vkhs562 4 months ago

      @@KorruFreez Did you choose VLSI?

    • @prozac1127
      @prozac1127 a month ago +1

      No. All fields are interconnected. Just find out how your expertise will fit into the current trend.

  • @otakuotaku6774
    @otakuotaku6774 11 months ago +39

    Bro, can you do more hardware videos, just like this?

    • @recursion.
      @recursion. 10 months ago +5

      Hardware videos 💀

  • @Officialjadenwilliams
    @Officialjadenwilliams 11 months ago +41

    Surprised that it took this long to get a CUDA in 100 seconds. 😆

    • @scapegoat079
      @scapegoat079 10 months ago +7

      I did not expect this...
      I'm calling Miguel.

    • @jacobgames3412
      @jacobgames3412 4 months ago +1

      Same

  • @StefanoBorini
    @StefanoBorini 11 months ago +8

    Interesting little factoid: if you're doing parallel CUDA programming and have to compute on a subset of a large block of memory, it's often faster to operate on the whole block and simply ignore the additional data, without checking for actual boundaries. If conditions kill performance in CUDA kernels, to the point that it often pays off to just compute garbage and discard it at the end, rather than prevent it from being computed.

    • @9SMTM6
      @9SMTM6 11 months ago +1

      If conditions are usually translated to compute-and-discard anyway.
      But they give the appearance of branching, and if the condition itself is difficult to compute, that adds to the runtime cost.

    • @KoaIa200
      @KoaIa200 11 months ago

      Warp divergence does not matter if the other threads are doing nothing in the first place... just don't have if/else and you are fine.

    • @janisir4529
      @janisir4529 10 months ago

      Better add those bounds checks, don't want to crash with access violations...
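
  A minimal sketch of the trade-off this thread describes, for an elementwise operation on n floats; the padding variant is an illustrative assumption, and as the replies note, which option wins depends on the kernel:

      // Option A: guard with an if, so the spare threads in the last block
      // do nothing. Safe for any n, but adds a branch to every thread.
      __global__ void scale_checked(float* x, int n, float s) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n) x[i] *= s;
      }

      // Option B: pad the allocation up to a multiple of the block size and
      // drop the check; the extra lanes compute "garbage" nobody ever reads.
      // Only safe because of the padding assumption.
      __global__ void scale_padded(float* x, float s) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          x[i] *= s;
      }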

  • @wywarren
    @wywarren 11 months ago +4

    The SDK has already gotten a lot more convenient in the last 5-6 years. Memory used to require manually copying back and forth. From what I remember, manual copying is still available, but in my DLI course when I was trying it out, having memory be auto-managed was slower than manually moving it all into device memory first and then running the operation. Managed memory improves the developer experience significantly, but on each access, if the memory block hasn't been copied yet, I believe the managed system will still need to move it over on demand. To meet the passing criteria on my CUDA DLI exam, I opted to copy manually in one of the steps. One can only dream of the day we have unified memory architectures and don't have to deal with the copies.

    • @niamhleeson3522
      @niamhleeson3522 11 months ago +5

      Yeah, you can probably keep on dreaming about that. Memory management is the primary contradiction you must resolve if you want your CUDA program to go fast. Either you can get all of the data into the register file / shared memory, or you have Too Much Data, have to do horrible things, and maybe even keep some of that data out of core, and it will go much slower than it could. There's no cache coherence protocol, so if you need coherence you have to move things around manually and do some synchronization. Fun stuff.
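
  For readers following along, a rough sketch of the two styles this thread contrasts; the kernel is a hypothetical one assumed to be defined elsewhere, and error checking is omitted:

      #include <cuda_runtime.h>
      #include <cstdlib>

      __global__ void kernel(float* data, int n);      // hypothetical kernel

      void managed_style(int n) {
          float* data;
          cudaMallocManaged(&data, n * sizeof(float)); // visible to CPU and GPU
          // ... fill data on the CPU ...
          kernel<<<(n + 255) / 256, 256>>>(data, n);   // pages migrate on demand
          cudaDeviceSynchronize();
          cudaFree(data);
      }

      void explicit_style(int n) {
          float* h = (float*)std::malloc(n * sizeof(float));
          float* d;
          cudaMalloc(&d, n * sizeof(float));
          // ... fill h on the CPU ...
          cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice); // one bulk copy up front
          kernel<<<(n + 255) / 256, 256>>>(d, n);
          cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
          cudaFree(d);
          std::free(h);
      }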

  • @neuronscale
    @neuronscale 11 months ago +1

    Great presentation of the topic of CUDA architecture and Nvidia GPUs in such a compact and fast form. As always, brilliant video!

  • @arinahomuleba4165
    @arinahomuleba4165 11 months ago +31

    You just explained parallel computing in 100s better than my lecturer did in more than 100 days🔥

    • @noanyobiseniss7462
      @noanyobiseniss7462 11 months ago +6

      Yet he misses the fact that this is NOT CUDA-specific.

    • @bakedbeings
      @bakedbeings 11 months ago +1

      Or your lecturer set you up well to follow this very basic, high-speed summary. Like how a reader of the LotR books can see meaning in the film series' long, dreary shots.

  • @sachethana
    @sachethana 11 months ago +1

    CUDA is awesome! I did my thesis on parallel processing in 2016, using CUDA for super-fast blood cell segmentation. Then I used CUDA for mining crypto on the GPU.

  • @sepro5135
    @sepro5135 11 months ago +4

    I'm using CUDA for fluid simulation, it's a real game changer in terms of speed

  • @BattlewarPenguin
    @BattlewarPenguin 11 months ago +1

    Awesome video! Thank you for the heads-up about the conference!

  • @davidf6592c
    @davidf6592c 11 months ago +29

    I'll admit, I tear up a little every time I see the "Hi Mom" in your vids.

  • @zard0y
    @zard0y 11 months ago +2

    This channel should go down in history as the greatest work done by humanity. Absolutely legendary introductions & quality level

  • @TheHackysack
    @TheHackysack 11 months ago +107

    1:39 Complier :D

    • @YuriG03042
      @YuriG03042 11 months ago +2

      no, complier

    • @Sarfarazzamani
      @Sarfarazzamani 11 months ago +1

      Gotcha moment😀

    • @incognito3678
      @incognito3678 10 months ago

      Marcomplier

  • @h3lpkey
    @h3lpkey 11 months ago +1

    Many thanks for every video on your channel; you are doing very big and cool work

  • @bnaZan6550
    @bnaZan6550 11 months ago +47

    You didn't explain what CUDA does; you explained what a GPU does...
    CUDA just has special optimizations over normal GPU parallelism.
    Your example will work fine on every GPU and doesn't require CUDA to be parallel.
    All GPUs calculate pixels using multithreading and multiple cores.

    • @Aoredon
      @Aoredon 11 months ago +4

      I mean, he explained how to get started with it and clarified how it's different from programming on the CPU. Also, I'm pretty sure the <<<...>>> syntax is specific to CUDA, so you wouldn't be able to just run this anywhere. And GPUs in graphics are usually just dealing with essentially a 2D array of pixels rather than 3D like here.

    • @HoloTheDrunk
      @HoloTheDrunk 11 months ago +13

      @@Aoredon AMD's ROCm also uses the <<<...>>> syntax, and I kinda agree with OP; this would've been good if it was titled "GPUs in 100 seconds", but as things stand it's hardly anything CUDA-specific

    • @oghidden
      @oghidden 11 months ago +1

      This is a summary channel, not overly detailed.

    • @noanyobiseniss7462
      @noanyobiseniss7462 11 months ago +2

      Correct and well said!

    • @julesoscar8921
      @julesoscar8921 11 months ago +3

      The extension of the file was .cu tho
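
  For reference, the triple-chevron launch syntax the replies are referring to (YouTube's HTML escaping tends to mangle it to ">"), shown on a generic, illustrative kernel; only nvcc (or hipcc for ROCm/HIP) will compile a .cu file containing it:

      __global__ void add(const float* a, const float* b, float* c, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
          if (i < n) c[i] = a[i] + b[i];                  // one element per thread
      }

      // Launch: add<<<numBlocks, threadsPerBlock>>>(a, b, c, n);
      // e.g.    add<<<(n + 255) / 256, 256>>>(a, b, c, n);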

  • @AO-ek9qw
    @AO-ek9qw 11 months ago +2

    0:36 this matrix multiplication animation is really REALLY good!!!!!

  • @ren3105
    @ren3105 11 months ago +6

    dam bro, I have my linear algebra exam next week and you just taught me how to multiply matrices at 0:36 (my teacher took 3 classes to explain it)

  • @BingleBangleBungle
    @BingleBangleBungle 11 months ago +2

    This is a very slick advert for Nvidia 😅 I didn't realize it was an ad until the end.

  • @copyrightstrike7637
    @copyrightstrike7637 26 days ago

    Multi-threading and distributed data parallelism are two of the most powerful tools for a modern scientist to have under their belt today if they write code that needs to be fast. I started writing a paper that utilized optical flow algorithms deployed in a multi-GPU environment. I had beginner-to-intermediate knowledge of Python and had never touched this stuff before. I spent 2 weeks trying to figure out how to get a massive TensorFlow model to fit across multiple GPUs. I still don't know what I am doing, but it works!

  • @augustinmichez8874
    @augustinmichez8874 11 months ago +13

    0:46 truly a masterpiece from our beloved GPU

    • @augustinmichez8874
      @augustinmichez8874 11 months ago +1

      @@starsandnightvision not a native speaker but ty for pointing it out

  • @D.u.d.e.r
    @D.u.d.e.r 2 months ago

    Nicely explained, thank u! This is why your channel is special👍👍

  • @Serizon_
    @Serizon_ 11 months ago +19

    Julia can run directly on the GPU, btw.

    • @TrendNipper
      @TrendNipper 11 months ago +3

      Really? Can you run it on any GPU? Or is it locked to NVIDIA?

    • @mjiii
      @mjiii 11 months ago

      @@TrendNipper There is CUDA.jl for NVIDIA/CUDA, AMDGPU.jl for AMD/ROCm, and Metal.jl for Apple/Metal. So not *any* GPU, but most of the recent mainstream GPUs.

    • @turolretar
      @turolretar 11 months ago

      how is that possible

    • @bepamungkas
      @bepamungkas 11 months ago +2

      @@TrendNipper almost all GPUs (including Apple silicon) are supported.

    • @DFPercush
      @DFPercush 11 months ago

      @@turolretar I guess they have SPIR-V as a compilation target.

  • @GazziFX
    @GazziFX 24 days ago +1

    CUDA is not the only platform that allows GPGPU; there are also OpenCL, DirectCompute, and FireStream.

  • @demonfedor3748
    @demonfedor3748 11 months ago +7

    Just recently saw the news about Nvidia banning the use of translation layers for CUDA software, like ZLUDA for AMD. This video's right on time.

    • @noanyobiseniss7462
      @noanyobiseniss7462 11 months ago +5

      Which is what he should be making a video on, but you don't get free 4090s for that content.

    • @demonfedor3748
      @demonfedor3748 11 months ago +4

      @@noanyobiseniss7462 NVIDIA doesn't wanna let go of that sweet, sweet monopoly-type proprietary stuff.

    • @noanyobiseniss7462
      @noanyobiseniss7462 11 months ago +3

      @@demonfedor3748 A pretty anti-competitive company that bleeds users dry. I have no clue why its userbase is so filled with gaslit fanbois. I guess it comes down to the "misery loves company" mantra.

    • @demonfedor3748
      @demonfedor3748 11 months ago

      @@noanyobiseniss7462 Every big company wants to get as much profit as the next guy. NVIDIA does it through proprietary stuff; AMD does it via open standards to claim the moral high ground. There are pros and cons to each approach, but the goal remains the same. NVIDIA has a lot of fans because they innovate a lot and are trailblazers in multiple areas: real-time hardware ray tracing, DLSS, G-SYNC, frame generation, GPGPU aka CUDA, OptiX, just to name a few. I know most of this stuff is proprietary and/or hardware-locked, but it's still innovation. I don't mean that AMD doesn't innovate: Mantle, which subsequently led to Vulkan, was a big deal, plus chiplet GPU and CPU design, 3D V-Cache on CPUs and GPUs, and SAM. There's no clear winner; however, NVIDIA is currently the performance king. Intel has wanted in the game for over 15 years, but they have big shoes to fill. It was a big blow when Larrabee failed.

  • @tommy.3377
    @tommy.3377 10 months ago +1

    This is the best... this guy is the best... Thank you, Jeff... All the interest I developed was thanks to watching your videos... Please don't quit... keep on making such good and informative videos for all of us... Thanks again... :-)

  • @batoczki93
    @batoczki93 11 months ago +53

    But can CUDA center a div?

    • @abhishekpawar921
      @abhishekpawar921 10 months ago +1

      💀💀💀

    • @drangertornado
      @drangertornado 10 months ago

      Yes. When you center a div in CSS, the browser uses your GPU to render the page.

    • @mulletmate8
      @mulletmate8 9 months ago +1

      center div
      exit vim
      I use arch btw
      hmm yes, very original "I've been programming for two weeks" joke

  • @bramvdnheuvel
    @bramvdnheuvel 10 months ago

    I would love to see Elm in 100 seconds soon! It definitely deserves more love.

  • @aghilannathan8169
    @aghilannathan8169 11 months ago +24

    Data scientists don't use CUDA; they use Python abstractions like TensorFlow or Torch, which parallelize their work using CUDA, assuming an NVIDIA GPU is available.

    • @el_teodoro
      @el_teodoro 11 months ago +10

      "Data scientists don't use CUDA, they use CUDA" :D

    • @drpotato5381
      @drpotato5381 11 months ago +9

      @@el_teodoro The guy above you doesn't know what the word abstraction means lmao

    • @HUEHUEUHEPony
      @HUEHUEUHEPony 10 months ago

      @@el_teodoro or ROCm? or Vulkan? or Metal?

  • @dfsafsadfsadf
    @dfsafsadfsadf 11 months ago +1

    That was a great summary! Thank you!!!

  • @markosdelaportas3089
    @markosdelaportas3089 11 months ago +9

    Can't wait to install ZLUDA on my Linux PC!

  • @The472k
    @The472k 10 months ago

    Thanks for the video! Easy to understand, and it helped me a lot to get a basic understanding of CUDA

  • @radumihaidiaconu
    @radumihaidiaconu 11 months ago +17

    ROCm next

  • @TheVilivan
    @TheVilivan 11 months ago +5

    Would love to see some more videos on parallel computing, with more explanation of this kind of code. Maybe a more in-depth video on Beyond Fireship?

  • @stefantanuwijaya8598
    @stefantanuwijaya8598 11 months ago +9

    OpenCL next!

    • @noanyobiseniss7462
      @noanyobiseniss7462 11 months ago +2

      I doubt AMD will pay him a 7900 XTX to do it.

  • @Monstacheeks
    @Monstacheeks 3 months ago

    Thanks so much for visually explaining Cuda!

  • @noble.reclaimer
    @noble.reclaimer 11 months ago +7

    I can finally build my own LLM now!

  • @marcellsimon2129
    @marcellsimon2129 11 months ago

    Love how this video came out 20 minutes after I did an intensive Google search about CUDA :D

  • @DeJaK314
    @DeJaK314 10 months ago +5

    1:30 THE CAKE IS A LIE

  • @CoughSyrup
    @CoughSyrup 11 months ago +2

    While you are correct for crediting both Buck and Nichols for the prior work leading up to CUDA, I felt like it was important to point out that they did not both contribute equally to the research in question, as most people will agree that one Buck is worth about 20 Nichols.

  • @zainkhalid3670
    @zainkhalid3670 11 months ago +40

    Getting CUDA to run on your Windows machine is one of the greatest problems of modern computer science.
    Edit: "getting CUDA-related libraries in a Python environment to correctly run neural networks"

    • @eigentensor
      @eigentensor 11 months ago +9

      lol, holy wow this really is a noob channel

    • @СергейМакеев-ж2н
      @СергейМакеев-ж2н 11 months ago +7

      Getting it to run the "official" way, from Visual Studio, is not much of a problem. Now, getting CUDA-related libraries in a Python environment to correctly run neural networks - THAT's a challenge. Especially with how much of a bother Conda is.

    • @MrCmon113
      @MrCmon113 11 months ago +1

      Lots of ML stuff doesn't have good support on Windows. It's probably a good idea to just run an Ubuntu VM if you plan to do much locally.

  • @JLSXMK8
    @JLSXMK8 11 months ago +2

    Can I mention this video as part of my channel intro? I use NVIDIA CUDA to re-render and upscale all my video clips for YouTube nowadays!! You give a really good explanation of how it all works.

  • @timmyanimations8321
    @timmyanimations8321 11 months ago +9

    It didn't change the world at all. OpenCL is exactly the same thing except it works on any graphics card instead of just NVIDIA ones.

    • @noanyobiseniss7462
      @noanyobiseniss7462 11 months ago +7

      Stop, Ngreedia doesn't give you free 4090s to say this!

    • @ProjectPhysX
      @ProjectPhysX 11 months ago +1

      Yes. Not to mention that OpenCL is exactly as fast as CUDA. I don't know why people still fall for Nvidia's marketing and limit their software to a proprietary platform. Having one OpenCL implementation that works on literally every GPU is so much better; it gives users the choice of which GPU to buy.

  • @NEOchildish
    @NEOchildish 11 months ago +1

    Great video! A ROCm video would be awesome too. It could help me explain my suffering to friends: running CUDA-native apps in a crappy Docker container for less performance vs. native Nvidia.

  • @samiparlayan4758
    @samiparlayan4758 11 months ago +4

    "I was not paid to make this video, but Nvidia did hook me up with an RTX4090"
    Dude i'd rather get an rtx 4090 than getting paid 💀💀💀💀💀

  • @klaotische5701
    @klaotische5701 10 months ago

    Just what I needed. A simple and quick introduction to it.

  • @historyrevealed01
    @historyrevealed01 11 months ago +16

    A: How complex is CUDA?
    B: So complex that even the Fireship video doesn't make sense

    • @lucasgasparino6141
      @lucasgasparino6141 11 months ago +3

      Honestly, it's a rather low-level API, so it CAN get excessively complicated. That being said, you'd mostly use the basics of CUDA, and the complexity would come from making the algorithm you're trying to implement parallel in itself. Of course, the real magic is that you can optimize the SHIT out of it, i.e. overengineer the kernel 😅 but yeah, trust me when I say he covers only the intro bits of CUDA; this thing is a rabbit hole.

  • @ace9463
    @ace9463 10 months ago

    Having used the CUDA Toolkit to implement LSTMs and CNNs for computer vision and sentiment analysis projects, via the TensorFlow GPU and scikit-learn Python libraries running on my laptop's NVIDIA GPU, the process of writing raw CUDA kernels in C++ is somewhat new to me and seems fascinating.

  • @3lqm89
    @3lqm89 11 months ago +3

    hey, that's more than 100 seconds

  • @gamemotronixg3965
    @gamemotronixg3965 11 months ago +1

    Finally 🎉🎉🎉
    I challenge you to do CUDA matrix multiplication using C

  • @Joey-dj4cd
    @Joey-dj4cd 11 months ago +223

    Use me as the button "I understood NOTHING"

    • @AndrewI-n5l
      @AndrewI-n5l a month ago +2

      He explained it pretty clearly, and bragging about not understanding isn't helpful

    • @AndrewI-n5l
      @AndrewI-n5l a month ago

      He explained it pretty clearly and bragging about not understanding even a single bit is just toxic. You didn't get it? Watch again or read more.

  • @romanino
    @romanino 10 months ago

    I didn't understand MOST of it, but I still loved it. Thanks!

  • @bradenhelmer9795
    @bradenhelmer9795 11 months ago +4

    I literally just finished an exam on CUDA, wtf

    • @acestandard6315
      @acestandard6315 11 months ago

      What course do you offer?

    • @SalomDunyoIT
      @SalomDunyoIT 11 months ago

      @@acestandard6315 where do u study?

    • @bradenhelmer9795
      @bradenhelmer9795 11 months ago +4

      @@SalomDunyoIT Nunya University

  • @novacoax
    @novacoax 11 months ago

    Watched the entire video from start to finish, and the only words I'm familiar with are AI and CUDA. Still the best 100 seconds.

  • @Nova-rk3fq
    @Nova-rk3fq 9 months ago +4

    What game is that at 0:25?

    • @FORREAL-TEME
      @FORREAL-TEME 4 months ago +8

      It's from the Unreal Engine 5 showcase from 2020, I guess

  • @bonobo3748
    @bonobo3748 11 months ago +1

    The video editing must take hours for each upload.
    Well done, brother.

  • @gourav7315
    @gourav7315 11 months ago +3

    0:25 what is the game's name?

    • @pramodgoyal743
      @pramodgoyal743 10 months ago

      Leaving a dot here for a captain to show up.

    • @BinaryBlueBull
      @BinaryBlueBull 10 months ago

      I also would like to know this. Anyone?

    • @-bismarck
      @-bismarck 5 months ago

      It's not a real game; it was just a demo to show off Unreal Engine 5's possibilities

  • @hypeSe7
    @hypeSe7 11 months ago +1

    00:02 CUDA lets you use the GPU for parallel computing beyond games
    00:29 CUDA enables parallel processing for graphics calculations
    00:54 Nvidia CUDA lets developers harness the GPU's power for fast parallel processing.
    01:17 Nvidia CUDA enables parallel execution on the GPU for faster processing.
    01:44 Nvidia CUDA enables parallel GPU processing for C++ programs.
    02:08 Using managed memory for seamless data access between CPU and GPU.
    02:30 Configuring the CUDA kernel launch to optimize for the data structures
    02:50 Running 256 threads in parallel on the GPU

  • @noanyobiseniss7462
    @noanyobiseniss7462 11 months ago +5

    CUDA is closed source and therefore a non-starter for anyone who believes in free and open standards.

    • @Volian0
      @Volian0 11 months ago +2

      I wouldn't recommend Nvidia to anyone, their CEO is crazy!!

    • @MrCmon113
      @MrCmon113 11 months ago +1

      And the alternative is what?
      Hospitals, garbage collection, fire departments, etc. aren't open source either, but you're kinda forced to use them.
      Nvidia has got us all by the balls.
      Your balls are firmly placed in Nvidia's hands.
      Godspeed your efforts to come up with a freedom-respecting alternative.

    • @Volian0
      @Volian0 11 months ago

      @@MrCmon113 the alternatives exist! In the case of CUDA, OpenCL is the alternative that works on all GPUs. And in the case of gaming, AMD cards perform very well (and their drivers are open source)

  • @backyardfreestyler7866
    @backyardfreestyler7866 10 months ago

    I had CUDA in my parallel computing class, and it has been less than a year since. Back then it was difficult to find any resources on YouTube, but now YouTube is filled with them.

  • @MaybeBlackMesa
    @MaybeBlackMesa 11 months ago +8

    Nothing worse than buying an AMD card and being locked out of anything AI (and these days that's a LOT of things). Never again.

    • @noanyobiseniss7462
      @noanyobiseniss7462 11 months ago +4

      You're not too bright, are you.

    • @montytrollic
      @montytrollic 11 months ago

      Google ZLUDA, my friend...

  • @drangertornado
    @drangertornado 10 months ago

    My master's project is based on CUDA, and I was blown away by the performance of my 5-year-old 1050 Ti Max-Q laptop. I am really starting to like Nvidia.

  • @xbozo.
    @xbozo. 11 months ago

    awesome animations in the video, man

  • @livelife3051
    @livelife3051 11 months ago

    Bro, your way of teaching moves much faster than my mind...

  • @Orincaby
    @Orincaby 11 months ago

    I love how the 100 Seconds series is really “how long it takes to explain the topic, and then some”

    • @NonTwinBrothers
      @NonTwinBrothers 10 months ago

      WAY back they used to be :(

  • @AitCollini
    @AitCollini 11 months ago

    That's the sponsored material that the Internet deserves and really needs!

  • @roflixo
    @roflixo 11 months ago +8

    0:57 The best way to describe the difference between CPUs and GPUs is that:
    1. CPUs are designed to be mostly MIMD - executing Multiple Instructions on Multiple Data sets (and are thus slower, but more versatile)
    2. GPUs are experts at SIMD (performing a Single Instruction on Multiple Data sets)

    • @the_mastermage
      @the_mastermage 11 months ago

      Although you can nowadays also do SIMD on a CPU.

    • @rubbish9231
      @rubbish9231 11 months ago +2

      @@the_mastermage You can always do SIMD on MIMD, but you can't do MIMD on SIMD. That's why CPUs are slower but more dynamic, and more central to advanced computing.

    • @HoloTheDrunk
      @HoloTheDrunk 11 months ago

      @@the_mastermage CPU SIMD is incomparable to GPUs; CPU SIMD is usually limited to blocks of 512 bits max (history note, but 64/128-bit SIMD has been a thing for around 3 decades by now, so not sure "nowadays" applies heh)
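
  A small sketch of why the SIMD/MIMD distinction shows up in CUDA code (the kernel is illustrative): all 32 threads of a warp share one instruction stream, so a data-dependent branch is executed by masking lanes off rather than as truly independent paths, unlike cores on a MIMD CPU:

      __global__ void divergent(const int* x, int* y, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          if (x[i] % 2 == 0)
              y[i] = x[i] * 2;   // even lanes take this path...
          else
              y[i] = x[i] + 1;   // ...odd lanes this one; the warp runs both, masked
      }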

  • @OK-ri8eu
    @OK-ri8eu 11 months ago

    I worked on a project using the CUDA environment; this brought back some memories, like the copying from host to device and vice versa. I'm sure I'll be working with it again in the future.

  • @hyperpug2898
    @hyperpug2898 11 months ago

    Wow what great timing to mention ZLUDA

  • @devrim-oguz
    @devrim-oguz 11 months ago +2

    You should do a video on SHMT (simultaneous and heterogeneous multithreading)

  • @dheovanixavierdacruz3043
    @dheovanixavierdacruz3043 11 months ago

    YES! I was waiting for this one

  • @M7ilan
    @M7ilan 11 months ago +1

    Valuable video!

  • @vectoralphaSec
    @vectoralphaSec 11 months ago +1

    Game Developers Conference (GDC) is also that week.

  • @SuvviSanthosh
    @SuvviSanthosh 11 months ago

    Very informative on CUDA and NVIDIA 👌👌👌 Do your own research, but don't miss out on AI & NVIDIA; it's touching all companies & all sectors.

  • @survivalskillspodcast
    @survivalskillspodcast 11 months ago +1

    Fireship is smart; when are you creating the first-ever teleportation machine?

  • @MatheusLB2009
    @MatheusLB2009 11 months ago

    I honestly recommend GTC if you're into graphics or just interesting curiosities