As far as I understand it, some of these details are slightly off. For example, my understanding is that the 65k/1024 register limit only applies to warps that are resident concurrently rather than to the entire block, as registers can be recycled once warps finish and other warps take their place. The CUDA infrastructure does this register/occupancy planning ahead of time.
Also, multiple blocks can be resident on an SM at the same time, so a block does not have to wait for the previous one to finish if the SM has the resources available to start running another block.
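To make the "multiple resident blocks" point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the 65,536-register file per SM mentioned above, plus two limits that are my own assumptions and vary by GPU architecture: 2,048 resident threads per SM and 32 resident blocks per SM. The function name `resident_blocks_per_sm` is hypothetical, and the sketch ignores shared memory and register allocation granularity, so treat it as an estimate rather than what the hardware actually reports.

```python
def resident_blocks_per_sm(threads_per_block, regs_per_thread,
                           regs_per_sm=65536,        # "65k" register file from the comment above
                           max_threads_per_sm=2048,  # assumption: varies by architecture
                           max_blocks_per_sm=32):    # assumption: varies by architecture
    """Estimate how many blocks can be resident on one SM at the same time."""
    regs_per_block = threads_per_block * regs_per_thread
    by_registers = regs_per_sm // regs_per_block
    by_threads = max_threads_per_sm // threads_per_block
    # The tightest of the three limits decides how many blocks fit.
    return min(by_registers, by_threads, max_blocks_per_sm)


if __name__ == "__main__":
    for regs in (32, 64, 128):
        blocks = resident_blocks_per_sm(256, regs)
        print(f"256 threads/block, {regs} regs/thread: {blocks} resident block(s)")
```

Under these assumptions, a 256-thread block using 64 registers per thread leaves room for several blocks on the same SM, while heavier register use per thread reduces that number. On real hardware, the CUDA runtime's occupancy API is the more reliable source for these figures.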
Wow. Is this manim??
no is womanim.
@shukranmuchai4750 Yes, all of the animations were done with manim
wow.