Kernel Grid | GPU Programming | Episode 2

  • Published on 16 Sep 2024
  • Support this channel at:
    buymeacoffee.c...
    More on Matrix Multiplication:
    • Matrix multiplication ...
    en.wikipedia.o...
    Code for animations and examples:
    github.com/Szy...

Comments • 5

  • @gowiththeflo59 12 days ago

    This is a great series, thank you!

  • @dimanft6160 several months ago +7

    How does this have only 165 views, it's so good

    • @vastabyss6496 several months ago

      ikr! Even 3 weeks later, it's not even at 1k :(

  • @bhavindhedhi 6 days ago

    equations at 2:24 are incorrect

  • @Stefan-td1pw several months ago +1

    Hi, I've been watching these videos alongside reading Programming Massively Parallel Processors.
    My take on the exercise (for the sake of brevity, I will not include memory allocation or memcpy for now):
    ```c
    // Kernel function for array summing: D[z][y][x] = A[z][y][x] + B[y][x] + C[x]
    __global__ void sumArrays_Kernel(float *A, float *B, float *C, float *D, int Width, int Height, int Depth) {
        // Global coordinates of this thread within the 3D grid
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        int z = blockIdx.z * blockDim.z + threadIdx.z;
        // Threads past the array edges do nothing
        if (x < Width && y < Height && z < Depth) {
            int index = x + y * Width + z * Width * Height; // Stored once because it is used twice on the next line
            D[index] = A[index] + B[y * Width + x] + C[x];
        }
    }

    void sumArrays_Host(float *A, float *B, float *C, float *D, int X, int Y, int Z) {
        float *A_d, *B_d, *C_d, *D_d;
        // Malloc and memcpy vars (i.e. A -> A_d)
        dim3 block(2, 2, 2); // I'm not massively sure on good sizing here
        // Round up so the grid covers every element even when a dimension is not a multiple of the block size
        dim3 grid((X + block.x - 1) / block.x,
                  (Y + block.y - 1) / block.y,
                  (Z + block.z - 1) / block.z);
        // Launch with the grid/block configuration, passing the device pointers declared above
        sumArrays_Kernel<<<grid, block>>>(A_d, B_d, C_d, D_d, X, Y, Z);
        // memcpy result back, and then free memory
    }
    ```
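    The general idea is that each input array gets its own index, following the logic you mentioned earlier: A uses the full 3D index, B uses only x and y, and C uses only x. The grid calculation rounds up so that every element is covered, and the bounds check in the kernel stops the extra threads from reading or writing out of range.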
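
One way to sanity-check a kernel like the one above is to compare its output against a plain CPU loop. The following is a minimal sketch in host-side C, not part of the original comment. It assumes, as the kernel's indexing implies, that A holds Width × Height × Depth elements, B holds Width × Height, and C holds Width. The names `ceil_div` and `sum_arrays_reference` are hypothetical helpers introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Ceiling division: how many blocks of size `block` are needed to cover
 * `n` elements. This mirrors the (X + block.x - 1) / block.x expression
 * used for the grid dimensions. */
static unsigned int ceil_div(unsigned int n, unsigned int block) {
    assert(block > 0);
    return (n + block - 1) / block;
}

/* CPU reference for the broadcast sum (assumed layout, row-major with x
 * fastest): D[z][y][x] = A[z][y][x] + B[y][x] + C[x]. */
static void sum_arrays_reference(const float *A, const float *B, const float *C,
                                 float *D, int width, int height, int depth) {
    assert(width > 0 && height > 0 && depth > 0);
    for (int z = 0; z < depth; ++z) {
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                /* Same flattened index as the kernel */
                size_t index = (size_t)x
                             + (size_t)y * (size_t)width
                             + (size_t)z * (size_t)width * (size_t)height;
                D[index] = A[index] + B[(size_t)y * (size_t)width + (size_t)x] + C[x];
            }
        }
    }
}
```

After copying the GPU result back to the host, each element can be compared against the reference output. With a block size of 2 and a dimension of 5, `ceil_div(5, 2)` gives 3 blocks, so the last block has one thread that falls outside the array and is skipped by the kernel's bounds check.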