GPU Compute Shader Work Groups

Comments • 24

  • @tetronym4549
    @tetronym4549 1 year ago +9

    This video is exactly what I needed! One slight nitpick, though: the music is rather loud compared to your voice, and there is a lot of static as well.
    The good news is this isn't too hard to fix! The static is rather homogeneous, so it should be easy to remove with something basic like Audacity

  • @redenvalerio601
    @redenvalerio601 1 year ago +6

    Loved your informative videos. Been subscribed ever since. Please continue this kind of content. Hoping for more frequent updates! Much love!

    • @arsiliath
      @arsiliath  1 year ago

      Thank you! Really heartwarming to hear this.

  • @michelechirra7120
    @michelechirra7120 1 year ago +3

    Great video! A clear and concise explanation of a topic that is not covered that much. Congratulations!

  • @gt3293
    @gt3293 1 year ago +2

    This is a great explanation, and I would love to see more of these!

  • @absorbingdude
    @absorbingdude 5 months ago

    Big thanks for this clear explanation!

  • @dpscloud3324
    @dpscloud3324 1 year ago +3

    Awesome video! Earned a new like and subscribe.

  • @alexandredias4782
    @alexandredias4782 2 months ago +1

    The formula at 4:08 contains a mistake: dispatchX x dispatchY x dispatchY. The Y is repeated twice; it should instead be dispatchX x dispatchY x dispatchZ.
    Question: Nvidia GPUs have thousands of warps, but each warp contains a maximum of 32 threads. So, wouldn't a numthreads of 16x16x1 exceed that number?
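
On the warp question above: a work group does not need to fit inside a single warp. The hardware carves each group into warps of 32, so a 16x16x1 group simply becomes 8 warps; the real ceiling is the per-group thread limit (1024 on current NVIDIA GPUs). A small C sketch of the corrected formula and the warp arithmetic (the dispatch and numthreads values here are just illustrative):

```c
#include <stdio.h>

int main(void)
{
    /* Corrected formula from the comment above:
       total = dispatchX * dispatchY * dispatchZ
             * numthreadsX * numthreadsY * numthreadsZ */
    unsigned dispatchX = 4, dispatchY = 4, dispatchZ = 4;
    unsigned numthreadsX = 16, numthreadsY = 16, numthreadsZ = 1;

    unsigned threadsPerGroup = numthreadsX * numthreadsY * numthreadsZ; /* 256 */
    unsigned totalThreads =
        dispatchX * dispatchY * dispatchZ * threadsPerGroup;            /* 16384 */

    /* A 256-thread group is simply carved into warps of 32;
       it does not have to fit inside one warp. */
    unsigned warpSize = 32;
    printf("threads per group: %u = %u warps of %u\n",
           threadsPerGroup, threadsPerGroup / warpSize, warpSize);
    printf("total threads dispatched: %u\n", totalThreads);
    return 0;
}
```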

  • @Chribit
    @Chribit 1 year ago +2

    Nice video! :)
    Do you know of any way to query warp / wave size at runtime, with Vulkan for example?
    I've found it makes quite the difference whether or not a workgroup fits within the internal thread block, and you lose out on some performance if you just use the smallest possible workgroup size for all devices.
    For instance: my laptop has Intel integrated graphics, and supposedly 8x8 workgroups work better there. But my desktop PC with a GTX 1080 Ti has much larger warps and technically could go for 32x32 as far as I know.
    Would be really cool if you could point me to any resources regarding this; I can't seem to find much in the Vulkan specs (only things regarding maximum thread counts per workgroup).

    • @arsiliath
      @arsiliath  1 year ago +1

      Unfortunately I'm not sure. I suppose one hack would be to programmatically try a few different sizes and see which leads to the highest fps. The CUDA documentation is often more robust, so if you look there and then try to find the corresponding thing in whatever framework you are using, that can be a good angle.
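
For readers with the same question: Vulkan 1.1 does expose the warp/wave width as the "subgroup size" via VkPhysicalDeviceSubgroupProperties. A minimal C sketch, assuming a VkPhysicalDevice has already been selected and the instance was created with API version 1.1 or later:

```c
#include <stdio.h>
#include <vulkan/vulkan.h>

/* Query the warp/wave ("subgroup") width of a device.
   Requires a VkInstance created with apiVersion >= VK_API_VERSION_1_1. */
static uint32_t query_subgroup_size(VkPhysicalDevice gpu)
{
    VkPhysicalDeviceSubgroupProperties subgroup = {0};
    subgroup.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_PROPERTIES;

    /* Chain the subgroup struct into the extended properties query. */
    VkPhysicalDeviceProperties2 props2 = {0};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = &subgroup;

    vkGetPhysicalDeviceProperties2(gpu, &props2);

    /* Typically 32 on NVIDIA, 32 or 64 on AMD, 8/16/32 on Intel. */
    return subgroup.subgroupSize;
}
```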

  • @dev_reimu
    @dev_reimu 7 months ago

    This is great and incredibly informative, but why would we use more than 1 Dispatch if it's going to get distributed in the GPU?

    • @arsiliath
      @arsiliath  7 months ago +1

      You dispatch once per kernel call. If you want to dispatch multiple kernels or one kernel multiple times, then you would do multiple dispatches.
      When you dispatch a grid, you choose the number of work groups and the work group dimensions.

    • @dev_reimu
      @dev_reimu 7 months ago

      @@arsiliath Oh haha! That's just the same video this comment is on!
      I am just confused because at 4:28 you show a Dispatch of (1, 1, 1), saying it uses much more of the hardware and that it's better, but then proceed to use a Dispatch of (4, 4, 4) and numthreads of (4, 4, 4). Why use multiple dispatches for this one texture?

    • @arsiliath
      @arsiliath  6 months ago

      @@dev_reimu What is your understanding of what "Dispatch" means? It might be helpful to articulate your understanding of dispatch (grid size) and of numthreads (work group size), and make sure that you understand what's happening.
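
To make the distinction in this exchange concrete: numthreads (fixed in the shader) sets the size of one work group, the three Dispatch arguments set how many groups that single dispatch launches, and Dispatch(4, 4, 4) is still one dispatch, of 64 groups. A small C sketch of the usual sizing arithmetic, with a hypothetical 256x256 texture and numthreads(8, 8, 1):

```c
#include <stdio.h>

/* Ceiling division: groups needed so that count * groupSize covers n. */
static unsigned ceil_div(unsigned n, unsigned groupSize)
{
    return (n + groupSize - 1) / groupSize;
}

int main(void)
{
    /* Work group size, fixed in the shader, e.g. numthreads(8, 8, 1). */
    unsigned ntX = 8, ntY = 8, ntZ = 1;
    /* One thread per texel of a 256x256 texture (hypothetical size). */
    unsigned width = 256, height = 256;

    /* A single Dispatch call launches this whole grid of groups. */
    unsigned dispatchX = ceil_div(width, ntX);   /* 32 */
    unsigned dispatchY = ceil_div(height, ntY);  /* 32 */
    unsigned dispatchZ = 1;

    printf("Dispatch(%u, %u, %u): %u groups, %u threads total\n",
           dispatchX, dispatchY, dispatchZ,
           dispatchX * dispatchY * dispatchZ,
           dispatchX * dispatchY * dispatchZ * ntX * ntY * ntZ);
    return 0;
}
```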

  • @TestVideoChannel_1234
    @TestVideoChannel_1234 1 year ago

    Amazing video! Thank you for breaking down such a complex topic and making it so much more understandable. Liked and subscribed. I'm interested in the Compute Shader course you have on Gumroad. Is that still current? I know it's only a few years old, but I thought I would check before purchasing.

    • @arsiliath
      @arsiliath  1 year ago

      Hi - yes! The course now has students using the current versions of Unity and it is still working. The Unity compute shader API has not changed much, if at all, since the course was created.

    • @TestVideoChannel_1234
      @TestVideoChannel_1234 1 year ago

      @@arsiliath Great! Thank you so much!!

  • @musabkara1684
    @musabkara1684 5 months ago

    So what is the limit on threads? Is choosing a high value like 64,64,1 or higher possible?

    • @DxXNA
      @DxXNA 5 months ago

      Yeah, that's something I'd like to know as well. But if his suggestion of "warps" is correct, you take your CUDA core count (e.g. a 2060 Super has 2176 CUDA cores) and divide it by 32 (the warp size) to get 68 warps.

    • @alexfrozen
      @alexfrozen several months ago +1

      You have to look at the limits when acquiring the GPU device. I don't remember the name of this one (they're long to keep in mind), but it's the last one. Each limit has a guaranteed value in the GPU spec, and every vendor should respect it. So 256 is a guaranteed value, which means that 256,1,1, or 1,1,256, or 16,4,4 are all valid values. Google the keywords GPU LIMITS for how to ask the GPU about this.
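
The limit described above corresponds, in Vulkan, to maxComputeWorkGroupInvocations (plus per-axis caps) in VkPhysicalDeviceLimits. A minimal C sketch, assuming a VkPhysicalDevice is already in hand:

```c
#include <stdio.h>
#include <vulkan/vulkan.h>

/* Print the compute work group limits the comment above refers to.
   Every driver must report at least the spec's guaranteed minimums,
   so these are safe bounds for choosing a portable numthreads value. */
static void print_compute_limits(VkPhysicalDevice gpu)
{
    VkPhysicalDeviceProperties props;
    vkGetPhysicalDeviceProperties(gpu, &props);
    const VkPhysicalDeviceLimits *lim = &props.limits;

    /* Total threads per work group (x*y*z must not exceed this). */
    printf("maxComputeWorkGroupInvocations: %u\n",
           lim->maxComputeWorkGroupInvocations);
    /* Per-axis thread limits for a single work group. */
    printf("maxComputeWorkGroupSize: %u x %u x %u\n",
           lim->maxComputeWorkGroupSize[0],
           lim->maxComputeWorkGroupSize[1],
           lim->maxComputeWorkGroupSize[2]);
}
```

On typical desktop GPUs maxComputeWorkGroupInvocations is 1024, which answers the 64,64,1 question above: that group would be 4096 threads and exceed the limit, while something like 32,32,1 (1024 threads) is usually the ceiling.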

  • @v037_
    @v037_ 1 year ago

    Nice video, thanks dude! Can you also add some code samples of compute shaders? For me, the best way to learn is also viewing a lot of examples.

    • @arsiliath
      @arsiliath  1 year ago

      Thank you!
      For examples please see my course: arsiliath.gumroad.com/l/compute-shaders