nickcorn93
Tutorial: CUDA programming in Python with numba and cupy
Using the GPU can substantially speed up all kinds of numerical problems. Conventional wisdom dictates that for fast numerics you need to be a C/C++ whiz. It turns out that you can get quite far with only Python. In this video, I explain how you can use cupy together with numba to perform calculations on NVIDIA GPUs. Production quality is not the best, but I hope you may find it useful.
00:00 Introduction: GPU programming in python, why?
06:52 Cupy intro
08:39 Cupy demonstration in Google colab
19:54 Cupy summary
20:21 Numba.cuda and kernels intro
25:07 Grids, blocks and threads
27:12 Matrix multiplication kernel
29:20 Tiled matrix multiplication kernel and shared memory
34:31 Numba.cuda demonstration in Google colab
44:25 Final remarks
Edit 3/9/2021: the notebook I use for the demonstration can be found here: colab.research.google.com/drive/15IDLiUMRJbKqZUZPccyigudINCD5uZ71?usp=sharing
Edit 9/9/2021: at 23:56 one of the grid elements should be labeled 1,3 instead of 1,2. Thanks to _______ for pointing this out.
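As a rough companion to the 27:12 and 29:20 chapters, here is a minimal CPU-only sketch of the tiling idea. It is not the kernel from the video or from the numba documentation: the function names `matmul_naive` and `matmul_tiled` and the tile size are made up for illustration, and plain Python lists stand in for GPU memory so that it runs without a GPU.

```python
def matmul_naive(a, b):
    """Reference product: one dot product per output element."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Same product, accumulated tile by tile along the shared dimension.

    On a GPU, each (tile x tile) patch of the output would typically belong
    to one thread block, and the slices of a and b used in the inner loops
    would be staged in shared memory. Here they are ordinary list lookups.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):              # block row
        for j0 in range(0, m, tile):          # block column
            for p0 in range(0, k, tile):      # walk tiles along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = 0.0
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] += acc        # partial sum from this tile
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
result = matmul_tiled(a, b)
```

Both functions should give the same matrix; the tiled version only changes the order in which the partial sums are accumulated, which is what makes reuse of small blocks of data possible on the GPU.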
Views: 73,285

Videos

Git basics 1: What is Git and how does it work?
299 views, 3 years ago
In this video I explain the basic working principle of the distributed version control software Git. You can get Git for your system at git-scm.com/. These videos are only meant as an introductory presentation, not as an exhaustive course. You will find all relevant details in the Pro Git book by Chacon and Straub. For specific questions you will find answers on Stack Exchange 99% of the time.
Git basics 2: Branching and merging
138 views, 3 years ago
In this video I explain the star features of Git: branching and merging. Branching allows you to make parallel histories inside one project. This is a key element to easy collaboration, and a good understanding of branching and merging is a prerequisite for effectively using remote services such as Gitlab. You can get Git for your system at git-scm.com/. These videos are only meant as an introd...
Git basics 3: Git on a remote
92 views, 3 years ago
Now that you are an expert on git, in this somewhat longer and last video in the series I finally explain how we can use Gitlab and other remote git services to collaborate on git projects. Since this video is a bit longer, here is an outline: 0:00 Introduction and rant about different hosting platforms 4:47 Basic explanation of how remotes work 7:00 Demonstration of creating a new remote repos...
Creating and maintaining a conda-forge package
9K views, 3 years ago
Conda is a powerful package manager with which you can create virtual environments and install basically any type of software on a user level. If you are in science or data science, chances are high you are using conda to manage your packages. But how can you create your own packages so that they can be installed with conda? Community contributed conda packages tend to be scattered across multiple...
Demonstration of TVIPSconverter, a GUI for converting .tvips to .blo
185 views, 4 years ago
TVIPSconverter is a small open source GUI tool for converting TVIPS camera data into other data formats. Installation instructions can be found on Github: github.com/din14970/TVIPSconverter TVIPS cameras are pixelated detectors made by the small German company TVIPS. They are used in some transmission electron microscopes to collect precession electron diffraction (PED) and 4D-STEM data. However...

Comments

  • @wrcz
    @wrcz 12 days ago

    all these tutorials using light mode while I learn at night... I'm gonna go blind :X

  • @taj-ulislam6902
    @taj-ulislam6902 2 months ago

    Definitely a lot of new material not seen elsewhere - not a run-of-the-mill video. Great job on originality.

  • @shaheeng8034
    @shaheeng8034 3 months ago

    Thanks a lot! Still the best guide I could find.

  • @jesusmtz29
    @jesusmtz29 3 months ago

    Approximate arbitrary function? There are caveats.

  • @mattiskardell
    @mattiskardell 4 months ago

    Thank you so much

  • @Khaled_Elsadani
    @Khaled_Elsadani 6 months ago

    Thanks for sharing INFO

  • @PhoenixReflex
    @PhoenixReflex 7 months ago

    Thank you so much. Keep up the hard work. Just hoping that more and more libraries in python will support GPU computations soon.

  • @AngeloHafner
    @AngeloHafner 7 months ago

    Very good...

  • @garywilliams4214
    @garywilliams4214 9 months ago

    Great tutorial, Nick! One minor critique: your pronunciation of ‘array’ was confusing…a more standard pronunciation is “uh-RAY”.

  • @TheAIEpiphany
    @TheAIEpiphany 11 months ago

    Something is seriously off with your fast matmul implementation, it's 3 orders of magnitude slower than the built-in method (12.5 ms vs 8.82 us)? You probably have some host-device copying going on?

    • @nickcorn93
      @nickcorn93 11 months ago

      The matmul example shown is the example from the numba documentation so I don't think it's wrong. It's (relatively) slow because matrix multiplication is something that is so common, it is insanely optimized in available implementations. You won't write a matrix multiplication implementation with numba that's faster than cupy. But if you have something custom you need to do, a custom kernel can be faster than a combination of cupy operations.

  • @user-um9sl1kj6u
    @user-um9sl1kj6u 11 months ago

    What if you want to develop a library for neural networks? A highly specialized library.

  • @kineticraft6977
    @kineticraft6977 11 months ago

    This reminds me a lot of the mindset you need to program in assembly.

  • @tooniatoonia2830
    @tooniatoonia2830 11 months ago

    Really learnt a lot here, thanks!💪

  • @zaharkohut7881
    @zaharkohut7881 1 year ago

    Thank you for this tutorial, it has been very helpful! But since it is only an introduction could anyone tell me what I should watch or read next on this topic? Thanks in advance for the advice!

  • @user-tx1we1hw8b
    @user-tx1we1hw8b 1 year ago

    thank you! super helpful

  • @rohitsatyam2935
    @rohitsatyam2935 1 year ago

    Thanks for creating this video. This really got me started with building packages.

  • @plumberski8854
    @plumberski8854 1 year ago

    Great intro for me. Waiting for my new GPU (likely 4060 Ti) for me to dig deeper into Python, CUDA, deep learning ...

  • @kayakMike1000
    @kayakMike1000 1 year ago

    GPUs aren't general purpose... sigh... They are really good at specifically executing the same operation on many data banks. It just happens that graphics and machine learning have similar needs.

    • @nickcorn93
      @nickcorn93 1 year ago

      Isn't that what I say in this video? Did you even watch it?
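The "same operation on many data" pattern in this thread can be sketched on the CPU. This is only an emulation under stated assumptions: `launch` and `add_kernel` are hypothetical helpers, not numba.cuda or cupy APIs, and the "threads" run one after another rather than in parallel. What it does show faithfully is the usual 1D global index calculation, block index times block size plus thread index, with a bounds guard.

```python
def launch(kernel, blocks_per_grid, threads_per_block, *args):
    """Emulate a 1D launch: call the kernel once per (block, thread) pair."""
    for block_idx in range(blocks_per_grid):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, threads_per_block, thread_idx, *args)


def add_kernel(block_idx, block_dim, thread_idx, x, y, out):
    """Every 'thread' runs the same code on a different element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(out):                        # guard: the grid may overshoot the data
        out[i] = x[i] + y[i]


n = 10
x = list(range(n))
y = [10 * v for v in x]
out = [0] * n

threads_per_block = 4
# Ceiling division so that the grid covers all n elements.
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block

launch(add_kernel, blocks_per_grid, threads_per_block, x, y, out)
```

With 10 elements and 4 threads per block the grid has 3 blocks, so 12 "threads" are started and the last two do nothing because of the guard.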

  • @prietjepruck
    @prietjepruck 1 year ago

    Really great introduction to GPU programming. I hope you make a new one soon.

  • @Shoz_
    @Shoz_ 1 year ago

    Thank you, this is gold

  • @gauravdeshpande4298
    @gauravdeshpande4298 1 year ago

    I am unable to install cupyx from pip. Any help?

  • @nucspartan321
    @nucspartan321 1 year ago

    Great video

  • @nigmaxus
    @nigmaxus 1 year ago

    Cupy does not install well through the use of pip

    • @nickcorn93
      @nickcorn93 1 year ago

      Typically it is easier via conda, yes.

  • @ouaililydia3835
    @ouaililydia3835 1 year ago

    Thank you so much, it is the best explanation I found. Please keep going and give us more information and examples on that.

  • @snapo1750
    @snapo1750 1 year ago

    There is a Python OpenCL package (pyopencl):
        # assumes ctx and queue already exist and ReductionKernel was imported from pyopencl.reduction
        a = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
        b = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
        krnl = ReductionKernel(ctx, numpy.float32, neutral="0", reduce_expr="a+b", map_expr="x[i]*y[i]", arguments="__global float *x, __global float *y")
        my_dot_prod = krnl(a, b).get()
    🙂 The benefit is that it works on ALL GPUs, not only Nvidia (it works on Intel integrated GPUs and on AMD GPUs).

  • @duongkstn
    @duongkstn 1 year ago

    great tut ! thanks

  • @0Clappy
    @0Clappy 1 year ago

    Can you do a tutorial series on how to accelerate things using cuda python?

    • @nickcorn93
      @nickcorn93 1 year ago

      I've thought about it but it's a lot of work to make and edit a silly video like this, and at the moment I really don't have the time. I don't get anything for making these videos.

  • @richardbennett4365
    @richardbennett4365 1 year ago

    Wait. At 12:10, the narrator says the timeit magic function reports a duration of 5 ms, but the number is 0.01 ms from 6 ms. The number is far away from 5 compared to 6. It should be 6 ms if he's rounding, not 5 ms. He's truncating the decimals to arrive at an integer.

    • @nickcorn93
      @nickcorn93 1 year ago

      Congratulations, you have invalidated the entire video by spotting this massive mistake ;) !

    • @richardbennett4365
      @richardbennett4365 1 year ago

      @@nickcorn93 🆗.
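For reference, the rounding versus truncation distinction raised in this thread looks like this in Python. The value 5.99 is an assumption read off the comment ("0.01 ms from 6 ms"), not a figure taken from the notebook.

```python
import math

reading_ms = 5.99  # assumed timing, as described in the comment above

rounded = round(reading_ms)         # nearest integer
truncated = math.trunc(reading_ms)  # simply drops the decimals
```

Here `rounded` is 6 and `truncated` is 5, which matches the commenter's observation that saying "5 ms" corresponds to truncation.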

  • @vicentemedel8469
    @vicentemedel8469 1 year ago

    I have a question, I'm a noob at this: why is it that whenever I run a .py project I have to install some packages with conda install all over again?

    • @nickcorn93
      @nickcorn93 1 year ago

      I'm not sure this video is the right place for this question ;)

  • @arcface2casia255
    @arcface2casia255 1 year ago

    Just discovered your channel! Great content 👍 instant sub! Thanks!

  • @localhost_mds
    @localhost_mds 1 year ago

    thank you. good video!!! it was very helpful

  • @ErolErten
    @ErolErten 1 year ago

    I have been looking into GPU programming using numba and Python for a while, and this seems to be the best tutorial I was able to find so far... thank you

  • @thousandTabs
    @thousandTabs 1 year ago

    this was such an excellent video, thank you so much!

  • @1Eagler
    @1Eagler 1 year ago

    Very educational. One thing I've missed: The function matmul is running on the PC or the GPU?

  • @bradleykreider3358
    @bradleykreider3358 1 year ago

    I suggest using "conda install conda-forge::package" over "conda install -c conda-forge package". These mean different things:
    1. Install this one package from conda-forge; install dependencies and other packages from my channel settings.
    2. Install this package and look for ALL packages on conda-forge first.
    The second makes conda-forge the highest priority channel, so if you use "defaults" you will see many packages getting replaced by the same version, and again if you run another install command on the same environment without the "-c" (the same packages will get reinstalled from defaults).

    • @nickcorn93
      @nickcorn93 1 year ago

      In general, the packages on conda forge aim to be interoperable, so ideally all the packages in your environment should be from conda-forge. There is no guarantee packages will work together when they come from different channels, for example C and Fortran packages if they were compiled using different compilers. Conda-forge standardizes on these details and ensures that dependencies come from conda-forge during build.

    • @bradleykreider3358
      @bradleykreider3358 1 year ago

      @@nickcorn93 If all of the packages are coming from conda-forge, then there is no need to specify -c, --channel. If one is using -c, then they are most likely using defaults and cherry picking a few packages from conda-forge. In any case, those two invocations look very similar but act differently when conda is using strict channel priority (the default for 99% of people). I agree that it's better to be all-in (or all out) when using conda-forge; the packages are tested very well and built using the same build-chains and configurations. It's when mixing channels that one can run into inscrutable problems. The fact that defaults and conda-forge work so well together most of the time makes the tiny inconsistencies more surprising for most users.

  • @mfatihaydogdu7
    @mfatihaydogdu7 1 year ago

    Very helpful, thank you.

  • @LoneXeaglE
    @LoneXeaglE 1 year ago

    Thank you so much sir, you are an amazing human being !

  • @Omgtired
    @Omgtired 1 year ago

    Thank you so much. Probably the best introduction to CUDA with Python. The example you use, while very basic, touches on usage of blocks, which is usually omitted in other introduction-level tutorials. Great stuff! Hope you return with some more videos. I have subscribed!

    • @kayakMike1000
      @kayakMike1000 1 year ago

      Cuda is bullshit closed source. Just wait for Tenstorrent, it's gonna be HUGE.

  • @srepmub
    @srepmub 1 year ago

    fantastic video.

  • @niffoxichere8394
    @niffoxichere8394 2 years ago

    is it only me or the cooling fan going brrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr.

  • @fredeisele1895
    @fredeisele1895 2 years ago

    Good job. A mention about using the ‘build:skip # [win]’ or similar to only build for linux would be helpful (you mentioned why it would be helpful, but not how to do it).

    • @nickcorn93
      @nickcorn93 1 year ago

      Best to have a look at this example github.com/conda-forge/exitwavereconstruction-feedstock/blob/main/recipe/meta.yaml and check out the docs conda-forge.org/docs/maintainer/adding_pkgs.html#build

  • @Zysperro
    @Zysperro 2 years ago

    Just what I needed! Thanks!

  • @therealbatman664
    @therealbatman664 2 years ago

    Thanks a lot, really got me started.

  • @jakubkahoun8383
    @jakubkahoun8383 2 years ago

    Hi, I'm trying this on my local computer, but cannot install Cupy. I have an NVIDIA GeForce RTX 3060. EDIT: Installed the CUDA 11.6 toolkit and it works now.

    • @nickcorn93
      @nickcorn93 2 years ago

      What is your OS? You may be having issues if you are using windows and pip. Easiest to install cupy in a conda virtual environment, as it will also install the cuda toolkit.

    • @jakubkahoun8383
      @jakubkahoun8383 2 years ago

      @@nickcorn93 Sorry for bothering you, the problem was that I had not installed the CUDA Toolkit. Seriously, I hate people who don't watch the full video closely and ask stupid questions... and now I'm one of them :D. Thanks a lot for this tutorial. In 2 months I will try to write my own GPU operator for my program; it will be interesting to see whether it is faster than the CPU. (By the way, I'm using plain Visual Studio Code with a Python 3.10 environment on Windows 11, so far so good, although I have some code output delay problem when using OpenCV for some strange reason.)

  • @rezidwipradana495
    @rezidwipradana495 2 years ago

    Thank you very much

  • @billyblackburn864
    @billyblackburn864 2 years ago

    hi, I have a program that I want to translate to numba. could you help me?

    • @nickcorn93
      @nickcorn93 2 years ago

      - what should the program do?
      - who is the program for?
      - what is it currently written in?

  • @ArijitBhattacharya971
    @ArijitBhattacharya971 2 years ago

    Would love to see a video on what a few CUDA programming challenges are.

  • @jakob3267
    @jakob3267 2 years ago

    Really nice video, thank you for sharing!

  • @leaodev
    @leaodev 2 years ago

    Great video, nick!

  • @Julian-tf8nj
    @Julian-tf8nj 2 years ago

    VERY helpful, thank you!!!!