Tutorial: CUDA programming in Python with numba and cupy

nickcorn93

มุมมอง 84 525

2 100

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 10 ก.พ. 2025

ความคิดเห็น • 82

@ErolErten 2 ปีที่แล้ว ⁺¹⁵
I have been looking into gpu programming using numba and python for a while, this seems to be the best tutorial I was able to find so far.. . thank you
@ÁngelTena-w9p หลายเดือนก่อน ⁺¹
Thank you very much for this video, very useful for an introduction to those python packages.
@taj-ulislam6902 10 หลายเดือนก่อน ⁺³
Definitely a lot of new material not seen else where - not a run-of-the-mill video. Great job on originality.
@Omgtired 2 ปีที่แล้ว ⁺¹⁸
Thank you so much. Probably the best introdution to CUDA with Python. The example you use, while very basic, touches on usage of blocks, which is usually omitted in other introduction-level tutorials. Great stuff! Hope you return with some more videos. I have subscribed!
@kayakMike1000 ปีที่แล้ว
Cuda is bullshit closed source. Just wait for Tenstorrent, it's gonna be HUGE.
@kineticraft6977 ปีที่แล้ว ⁺²
This reminds me a lot of the mindset you need to program in assembly.
@frankkoslowski6917 4 หลายเดือนก่อน ⁺²
Excellent Lesson. It answered my questions pertaining Cupy nicely. 🤩
@andrjo 3 ปีที่แล้ว ⁺⁹
wanted to comment that the information in this presentation is very well structured and the flow is excellent.
@nickcorn93 3 ปีที่แล้ว
Thanks man!
@shaheeng8034 11 หลายเดือนก่อน ⁺¹
Thanks a lot! Still the best guide I could find.
@martind.6053 หลายเดือนก่อน ⁺¹
Very interesting ! Thanks
@vallurirajesh 3 ปีที่แล้ว ⁺¹¹
Thank you so very much. This is the exact kind of material I was looking for on this very specific subject. Kudos.
@prietjepruck ปีที่แล้ว ⁺¹
Really great introduction to GPU programming. I hope you make a new one soon.
@Shoz_ ปีที่แล้ว ⁺³
Thank you, this is gold
@______373 3 ปีที่แล้ว ⁺¹¹
wait i tought that this made by some popular channel, done pretty well and then saw, 29 subscribers
@nickcorn93 3 ปีที่แล้ว ⁺¹²
you would be surprised what powerpoint can do. To be honest I don't enjoy making videos that much, it's a lot of work, it always turns out kind of shit (especially audio and webcam quality), and I get nothing in return. But when I encounter a really niche topic that I struggled with myself and I don't find many resources for it I figure I make it myself hopefully such that it may be useful to someone else.
@______373 3 ปีที่แล้ว
@@nickcorn93 "nickcorn93
nickcorn93
2 hours ago
you would be surprised what powerpoint can do." not only powerpoint))))))
@leaodev 3 ปีที่แล้ว ⁺³
Great video, nick!
@ouaililydia3835 ปีที่แล้ว ⁺¹
thank you so much, it is the best explaination i found. Please keep going and give us more information and examples on that
@duongkstn 2 ปีที่แล้ว ⁺³
great tut ! thanks
@Zysperro 2 ปีที่แล้ว ⁺²
Just what I needed! Thanks!
@sciencewolf963 3 ปีที่แล้ว ⁺¹⁴
Excellent explanation, keep going with this content man ;)
@PhoenixReflex ปีที่แล้ว
Thank you so much. Keep up the hard work. Just hoping that more and more libraries in python will support GPU computations soon.
@plumberski8854 ปีที่แล้ว ⁺¹
Great intro for me. Waiting for my new GPU (likely 4060 Ti) for me to dig deeper into Python, CUDA, deep learning ...
@silkworm6861 3 ปีที่แล้ว ⁺⁶
This is a great video!
@DanielleBabitzKarn ปีที่แล้ว ⁺²
thank you! super helpful
@LoneXeaglE 2 ปีที่แล้ว ⁺¹
Thank you so much sir, you are an amazing human being !
@jakob3267 3 ปีที่แล้ว ⁺²
Really nice video, thank you for sharing!
@srepmub 2 ปีที่แล้ว ⁺³
fantastic video.
@thousandTabs 2 ปีที่แล้ว ⁺¹
this was such an excellent video, thank you so much!
@therealbatman664 2 ปีที่แล้ว ⁺²
Thanks a lot really got me started .
@localhost_mds 2 ปีที่แล้ว ⁺²
thank you. good video!!! it was very helpful
@terriplays1726 3 ปีที่แล้ว ⁺²
Thanks for the video, I found the first half and the wrap up really excellent.
@ArijitBhattacharya971 3 ปีที่แล้ว ⁺¹
wold love to see a video on what are a few CUDA programming challenges
@MichaelSkinner-e9j ปีที่แล้ว ⁺¹
What about if you want to develop a library for neural net work?
A highly specialized library
@nucspartan321 ปีที่แล้ว ⁺²
Great video
@dfrank5157 3 ปีที่แล้ว ⁺³
This is really helpful for my computing. Thank you.
@rezidwipradana495 2 ปีที่แล้ว ⁺¹
Thank you very much
@Khaled_Elsadani ปีที่แล้ว
Thanks for sharing INFO
@lfmtube 3 ปีที่แล้ว ⁺¹
Perfect Video! Saw was revealing to me to understand how it works. Thank you! I am a new subscriber of your channel. Regards from Buenos Aires, Argentina
@tooniatoonia2830 ปีที่แล้ว
Really learnt a lot here, thanks!💪
@taiman9423 6 หลายเดือนก่อน
masterclass
@bitcode_ 6 หลายเดือนก่อน
quality education
@timharris72 3 ปีที่แล้ว ⁺¹
This was really good. Thanks for posting this!
@svart-rav8072 6 หลายเดือนก่อน
Hey everyone,
I followed the code 1 to 1 - and then also then checked with the notebook provided in the video description to make sure it's no mistake in my code-along version
And the calculation times for the custom code is always significantly slower, then in the video
Did someone eventually did encounter the same problem or has an explanation for that?
@gauravdeshpande4298 ปีที่แล้ว
I am unable to install cupyx from pip any help
@garywilliams4214 ปีที่แล้ว
Great tutorial, Nick! One minor critique: your pronunciation of ‘array’ was confusing…a more standard pronunciation is “uh-RAY”.
@1Eagler 2 ปีที่แล้ว ⁺¹
Very educational. One thing I've missed: The function matmul is running on the PC or the GPU?
@richardbennett4365 2 ปีที่แล้ว ⁺¹
Wait. At 12:10, the narrator says the timeit magic function reports a duration of 5 ms, but the number is 0.01 ms from 6 ms. The number us far away from 5 compared to 6. It shoukd be 6 ms if he's rounding, not 5 ms. He's truncating the decimals to arrive at an integer.
@nickcorn93 2 ปีที่แล้ว ⁺³
Congratulations, you have invalidated the entire video by spotting this massive mistake ;) !
@richardbennett4365 2 ปีที่แล้ว
@@nickcorn93 🆗.
@sevdattufanogullari6581 7 วันที่ผ่านมา
Thanks for ruining the video
@zaharkohut7881 ปีที่แล้ว
Thank you for this tutorial, it has been very helpful! But since it is only an introduction could anyone tell me what I should watch or read next on this topic? Thanks in advance for the advice!
@billyblackburn864 3 ปีที่แล้ว ⁺¹
hi, I have a program that I want to translate to numba. could you help me?
@nickcorn93 3 ปีที่แล้ว
- what should the program do?
- who is the program for?
- what is it currently written in?
@benheller472 5 หลายเดือนก่อน
@@nickcorn93 is there an email I can contact you at?
@AngeloHafner ปีที่แล้ว
Muito bom...
@niffoxichere8394 2 ปีที่แล้ว
is it only me or the cooling fan going brrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr.
@jakubkahoun8383 2 ปีที่แล้ว ⁺¹
Hi, I m trying this on my local computer, but cannot install Cupy, I have NVida geforece RTX 3060. EDIT: Installed CUDA 11.6 toolkit and it works now.
@nickcorn93 2 ปีที่แล้ว
What is your OS? You may be having issues if you are using windows and pip. Easiest to install cupy in a conda virtual environment, as it will also install the cuda toolkit.
@jakubkahoun8383 2 ปีที่แล้ว
@@nickcorn93 Sorry for bother you, the problem was not installing Cuda Toolkit, srly I hate people who doesnt watch full video closely and ask stupid questions....and now I m one of them :D. Thx alot for this tutorial in 2 months i will try write my own GPU operator for my program, would be interting if this will be faster than CPU. (Btw using normal Visual code in python 3.10 env. on win 11, so far so good. (Altrough i have some code output delay problem when using openCV for some strange reason)
@glenneric1 3 ปีที่แล้ว ⁺²
You say ARRay, I say arRAY. Let's call the whole thing off. But seriously, good stuff.
@Julian-tf8nj 3 ปีที่แล้ว
I kept thinking, "huh? what is he talking about?? Oh, he meant an ARRay!" lol
Other than that, awesome vid!
@nickcorn93 3 ปีที่แล้ว ⁺²
Interesting, so I've basically been pronouncing array incorrectly my whole life. Will try to watch out for that in the future.
@glenneric1 2 ปีที่แล้ว
@@nickcorn93 I've heard other people saying it your way too.
@rweaver6 ปีที่แล้ว
@@nickcorn93 it was very distracting. Work on it google it and use the pronunciation feature.
Otherwise outstanding and very useful tutorial.
@0Clappy 2 ปีที่แล้ว ⁺²
Can you do a tutorial series on how to accelerate things using cuda python?
@nickcorn93 2 ปีที่แล้ว
I've thought about it but it's a lot of work to make and edit a silly video like this, and at the moment I really don't have the time. I don't get anything for making these videos.
@Julian-tf8nj 3 ปีที่แล้ว ⁺¹
VERY helpful, thank you!!!!
@kayakMike1000 ปีที่แล้ว ⁺¹
GPUs aren't general purpose... sigh... They are really good at specific executing the same operation on many data banks. It just happens to be similair type of needs for graphics an machine learning
@nickcorn93 ปีที่แล้ว
Isn't that what I say in this video? Did you even watch it?
@jesusmtz29 11 หลายเดือนก่อน
Approximate arbitrary function? There are caveats.
@HectorHernandez-ws3el 3 ปีที่แล้ว ⁺¹
Thanks for the video, it isn´t very information about, sorry for my english
@TheAIEpiphany ปีที่แล้ว
Something is seriously off with your fast matmul implementation, it's 3 orders of magnitude slower than the built-in method (12.5 ms vs 8.82 us)?
You probably have some host-device copying going on?
@nickcorn93 ปีที่แล้ว
The matmul example shown is the example from the numba documentation so I don't think it's wrong. It's (relatively) slow because matrix multiplication is something that is so common, it is insanely optimized in available implementations. You won't write a matrix multiplication implementation with numba that's faster than cupy. But if you have something custom you need to do, a custom kernel can be faster than a combination of cupy operations.
@mateuszanuszewski69 7 หลายเดือนก่อน
i never really understood how to use numba, the basic explainations in docs didn't help. Good that we have mojo now
@wrcz 8 หลายเดือนก่อน
all these tutorials using light mode while I learn at night... I'm gonna go blind :X
@snapo1750 2 ปีที่แล้ว ⁺¹
There is a python opencl package (pyopencl)
a = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
b = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
krnl = ReductionKernel(ctx, numpy.float32, neutral="0",
reduce_expr="a+b", map_expr="x[i]*y[i]",
arguments="__global float *x, __global float *y")
my_dot_prod = krnl(a, b).get()
🙂 Benefit is it works on ALL GPU's not only Nvidia, (works on intel built in cpu gpu's and on amd gpus)
@mfatihaydogdu7 2 ปีที่แล้ว ⁺¹
Very helpful, thank you.
@SweFromSe ปีที่แล้ว
Thank you so much

ต่อไป

เล่นอัตโนมัติ

Make Python code 1000x Faster with Numba