Some days the internet makes me sad. Other days it reminds me of all the people with the same niche interests as me and how incredibly talented some of them are. Thanks for putting so much effort into this :)
Thanks a lot. Glad you liked the video 😃
*skilled. not talented. talent is god-given. skill is developed through practice. I think it's a little disrespectful to call someone talented - almost makes it seem like they didn't work for it.
🙂
This is why social media is amazing. No mainstream outlet would make this: too niche and too hard for the general public. But here you can find hidden diamonds.
"MINI" Project? What the heck?! You just crunched through a lot of hard-to-grasp technical implementations, coded a working example, shared it on your blog, AND made a fully animated video about it!! You make me mad.
😅
That was a click bait for me, cause it ain't MINI at all
Guess it IS mini because he covered a single operation in depth. Great work anyway. Cheers to the author!
@@wlatol6512 makes sense
I aspire toward the day when i can do something similar and have the audacity to call it a "mini project" 😂
Thank you for sharing your brilliance and curiosity with the world my friend 💜
I am reading the CUDA C programming book and your videos are super helpful in visualizing the memory access process! Thank you very much!
Glad it was helpful!
This is not a "mini" project, you made some real content here! Fantastic video, congrats!
Thanks! 😃
Hey, I’ve been going through the Programming Massively Parallel Processors book lately and doing some CUDA and this was a GREAT video!!!
Thanks a lot! Glad the video was helpful!
Great stuff Tushar. I have been keen on learning GPU programming so great to see your videos in my feed. Keep it up and all the best.
Great to hear! 😀
Yay, CUDA video. Feel like my timeline has been needing CUDA content
I am from embedded systems background but I love your work. Keep Going brother, Just don't quit ! There's always an audience for great content.
Thanks a lot! I’m in this for the long run 😃
dude this is actually amazing. you’re the cs version of 3b1b… keep up the great work!
Thanks a lot! I appreciate it 😃
Good job. The reason workgroups are laid out in 1d/2d/3d grids is that all GPU compute APIs were first designed and implemented on top of existing graphics concepts where calculating outputs as e.g. 2D grids is a natural thing.
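The comment above explains why compute APIs expose 1D/2D/3D launch grids. A minimal CPU-side sketch of that indexing (grid and block sizes are illustrative, names mirror CUDA's `blockIdx`/`threadIdx` convention):

```python
# CPU simulation of CUDA-style 2D grid/block indexing (hypothetical sizes).
# Each (block, thread) pair maps to one element of a 2D output, exactly the
# way a graphics API maps workgroups onto an image.

def global_indices(grid_dim, block_dim):
    """Yield (row, col) for every thread in a 2D launch, like
    row = blockIdx.y * blockDim.y + threadIdx.y (and likewise for col)."""
    for by in range(grid_dim[1]):
        for bx in range(grid_dim[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    row = by * block_dim[1] + ty
                    col = bx * block_dim[0] + tx
                    yield row, col

# A 2x2 grid of 2x2 blocks covers a 4x4 output exactly once.
covered = sorted(global_indices((2, 2), (2, 2)))
assert covered == [(r, c) for r in range(4) for c in range(4)]
print("every element of the 4x4 grid is covered exactly once")
```

The same flattening generalizes to 3D, which is where the sometimes-confusing (z, y, x) loop ordering comes from.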
Super satisfying to see Manim used to show the algorithm like that.
Glad you enjoy it!
Great video. I love the simplicity, and the great explanation.
Thanks. Glad you found it useful.
Very cool project, I will definitely go through the project code in the evening!
Great! Please do let me know what you think 😃
Hey, great video. Are the animations at 4:58 detailing the matrix multiplications correct? It looks like there's a mistake in what's being selected by the animation. The column vector should be repeated for each dot product shown, but that doesn't look to be the case.
I didn't show the complete set of calculations there. My point was to just show the memory access patterns. 🙂
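The access pattern the commenter describes can be traced explicitly. A small sketch (hypothetical 2x2 matrices) that records which elements of A and B each multiply-add touches, confirming that the same column of B really is re-read for every output row:

```python
# Sketch of the memory accesses in a naive matrix multiply (row-major,
# illustrative 2x2 example): every output element C[i][j] reads one full
# row of A and one full column of B.

def matmul_with_trace(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    trace = []  # (element of A, element of B) read per multiply-add
    for i in range(n):
        for j in range(m):
            for p in range(k):
                trace.append((("A", i, p), ("B", p, j)))
                C[i][j] += A[i][p] * B[p][j]
    return C, trace

C, trace = matmul_with_trace([[1, 2], [3, 4]], [[5, 6], [7, 8]])
assert C == [[19, 22], [43, 50]]
# The same column of B (j fixed at 0) is re-read once per row of A:
b_reads_col0 = [b for a, b in trace if b[2] == 0]
assert b_reads_col0 == [("B", 0, 0), ("B", 1, 0)] * 2
```

That redundant re-reading of B's columns is exactly what tiling into shared memory is meant to amortize.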
Thanks for the research! Keep going! I would like to see other algorithms being run and optimized on GPUs...
Beautiful explanation and animation
Beautiful visualization!! i am enjoying watching your videos. Keep up the good work
Thank you! Cheers!
Beautiful video as usual. I'm motivated to pick up PMPP after the sem ends just from watching your videos!
Thanks a lot and good luck 😃
Great explanation. Thanks
Glad you liked the video 😃
Crazy good manim skills! perfect video
Appreciate it!
yay i know a little about this now, thank you!!
Your videos have similar vibes to 3Blue1Brown Channel!
Great content
Glad you liked the video 😃
Yes, because he uses the python library Manim created by 3B1B!
thanks for this amazing vid brother!
Thanks. Glad you liked the video 😃
Your content is very helpful and your teaching method is great.
This is beautiful. I would love to see what it takes to reach the cublas implementation.
You can check out the references in the video description. There’s a blog post (by someone else) who got around 90% of the cuBLAS performance.
Can you talk about scan operations like Blelloch and Hillis/Steele algorithms? Maybe you've already talked about them I haven't seen all your videos. This would be nice to know in what context they're used and provide a cool visualisation of them too.
I can take these topics for some future videos. Thanks a lot for suggesting. 😀
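Since the thread above asks about Hillis/Steele and Blelloch scans, here is a minimal CPU sketch of the Hillis-Steele inclusive scan, the step-efficient variant a GPU would run with one thread per element (the function name and sequential loop are illustrative; on a GPU each step updates all elements in parallel):

```python
# CPU sketch of the Hillis-Steele inclusive scan (prefix sum).
# On a GPU, each of the O(log n) steps is one parallel pass over the array.

def hillis_steele_scan(xs):
    """Inclusive prefix sum in O(log n) parallel steps."""
    out = list(xs)
    step = 1
    while step < len(out):
        # Copy to emulate reading the previous step's values, the way a
        # GPU implementation double-buffers or synchronizes between steps.
        prev = list(out)
        for i in range(len(out)):
            if i >= step:
                out[i] = prev[i] + prev[i - step]
        step *= 2
    return out

assert hillis_steele_scan([1, 2, 3, 4, 5]) == [1, 3, 6, 10, 15]
```

Blelloch's work-efficient scan does the same job in an up-sweep/down-sweep pair with O(n) total additions, which is why it is usually preferred for large arrays.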
Gonna enjoy this knowledge
High quality content. Subscribed.
Thanks a lot. I really appreciate it 😃
This is awesome. Have you considered using streams and unrolling the matrices?
I’ll consider this in future. My objective as of now is to showcase GPU programming, and keep things as simple as possible. Next videos will be on different algorithms and how GPU can be used there. But, I’ll definitely consider covering streams in future. Thanks a lot for your comment and I appreciate your input. Cheers! 😃
What was the reason for surprise at the x-horizontal / y-vertical layout? It's the standard convention for image processing, which is what GPUs are designed for
The y-axis is generally vertical (pointing up), and that part doesn't bother me much either. What confused me was the (z, y, x) ordering. I understand that GPUs weren't designed for these kinds of computations, but it always confused me when I started out.
@@0mean1sigma I think you mean i,j vs x,y. i often means the row in matrix operations, but x is always horizontal in Cartesian coordinates.
Fantastic video
This is so beautiful and magnificent to see ❤❤❤🎉
Thank you so much!
Amazing🎉.
Thank you!
Is the nvidia implementation using wmma (Warp Matrix Functions)?
Either way it would be interesting to see how the performance would be impacted if the tensor cores were used as well! Good video!
I have a video showing the use of tensor cores in a very basic way. I'm not using tensor cores in this video, but I'll definitely consider a similar step-by-step approach using wmma. Thanks a lot for the suggestion 😀
really nice stuff, thanks
Glad you liked it!
this is golden
Great content!
One thing I have never understood is, for matrix multiplication, what is the sort of threshold in size that makes a GPU implementation faster than a CPU implementation? If I want to do one million multiplications of 4 x 4 matrices, ignoring the overhead required to set up the computations, is the GPU faster than the CPU? Surely not. What about 100 x 100? 1000 x 1000?
GPUs are suited for large datasets. I can't specify a number because it will depend on the algorithm and the GPU specs.
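The question above can at least be framed with a back-of-envelope model. All the numbers below are illustrative assumptions, not measurements, and the model ignores kernel-launch overhead and the fact that small kernels can't saturate a GPU, so real crossovers are considerably larger:

```python
# Toy roofline-style model: the GPU wins on an n x n float32 matmul once
# its compute advantage outweighs the PCIe cost of shipping the matrices.
# cpu_gflops, gpu_gflops, and pcie_gb_s are made-up, plausible figures.

def cpu_time(n, cpu_gflops=50):
    return 2 * n**3 / (cpu_gflops * 1e9)          # matmul is ~2n^3 FLOPs

def gpu_time(n, gpu_gflops=5000, pcie_gb_s=16):
    transfer = 3 * n * n * 4 / (pcie_gb_s * 1e9)  # A, B, C as float32
    return 2 * n**3 / (gpu_gflops * 1e9) + transfer

# Smallest n where the GPU (including transfers) beats the CPU in this model.
crossover = next(n for n in range(1, 20000) if gpu_time(n) < cpu_time(n))
print(f"model crossover around n = {crossover}")
```

For the batched 4x4 case in the question, the right comparison is different again: a million independent 4x4 multiplies is plenty of parallel work, so the answer hinges almost entirely on whether the data already lives on the GPU.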
What did you do? For the GFLOPS comparison, did you develop your own function to handle matrix multiplication?
I wrote SGEMM from scratch (it runs on a GPU).
@ got it 🔥great man !!
I'm too dumb to understand this right now, but I know this is something good. I'll understand it someday.
I study linear algebra and I'm shocked right now, because it turns out to be important for programming the hardware.
I have a question... where can I learn these concepts?
I’ve provided some of the links in the video description. There are also some good textbooks. Good luck!
Nice Manim work!
I was hoping that someone would simplify GPU matrix multiplication for me, so thank you.
Glad you found it useful 😃
If you use Strassen's method for multiplying 2x2 blocks, you cut the eight multiplications down to seven, which compounds into large savings for big matrices (O(n^2.807) instead of O(n^3)).
I'll definitely try that in future. Thanks a lot for your comment 😀
@@0mean1sigma 👍👍👍👍
@@0mean1sigma th-cam.com/video/0oJyNmEbS4w/w-d-xo.html
That is where I learnt it 😃
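The Strassen suggestion in this thread can be sketched at the 2x2 level. This is one level of the standard recursion (variable names are illustrative; in practice the entries would themselves be sub-blocks):

```python
# One level of Strassen's method: 7 multiplications instead of 8 for a
# 2x2 block, the source of the O(n^2.807) asymptotic bound.

def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]

assert strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```

Worth noting that on GPUs the extra additions and irregular memory traffic often eat the savings, which is why tuned SGEMM libraries mostly stick with the classical algorithm.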
Thanks for the video, from where did you learn all this stuff? Any book or course?
I've put links to a couple of Blog posts in the description. They were very helpful (especially when it came to verifying my code).
But you didn’t take any course right?
Nope
Are you using manim?
Yes he is
Yet another Indian banger video
Any plans to make more hands on gpu/cuda tutorials? That show more of the specific syntax instead of algorithms and techniques?
Or do you know where i can find quality tutorials for that?
Oh, I'm guessing your blog is that place? I'll check it out.
Yes, you can find more details on my blog and I also open source my code.
Thanks for your service.
Why does it feel like a 3blue1brown vid... Do you use Manim???
Yes
W project
Where is Cuda/C/C++
Check out the links in the description
So, after trying your benchmarks, I found that cuBLAS is still the fastest compared to any of your approaches.
Nice video, though.
Thanks a lot for your comment. Yes, my implementations are slower than cuBLAS, but my focus was more on understanding GPU programming concepts.
thanks but the music is very annoying can't focus
Will keep this in mind for the future 😃
5 minutes in and I’m starting to stroke
Come on, now use tensor cores
Mini💀
wtf did I just watch? 😬
Are you using manim??
Yes
Where did you learn that? It's so clean asf @@0mean1sigma