Kind + GPU: A Local K8s cluster that can run GPU pods

  • Published Sep 24, 2024

Comments • 8

  • @alzeNL • 3 months ago

    I really appreciate this - I wanted a dev environment, found 'kind', and it was amazing to see it had GPU support! Saved me hours of fighting with Kubernetes and NVIDIA plugins. Fantastic work.

    • @alzeNL • 3 months ago

      Will be linking to your video in my PhD blog :) (alanknipmeyer.phd)

  • @xenoaiandrobotics4222 • 2 months ago

    Please, can I get this for k8s? I'm using containerd as my container runtime.

    • @Samos123x • 2 months ago

      k8s inside kind is still using containerd, I think. You just need Docker, Podman, or nerdctl on the laptop / device that runs kind.
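
For reference, standing up a kind cluster along the lines the video describes can look like the sketch below. It assumes the NVIDIA driver and Container Toolkit are already set up on the host; the file name, cluster name, and the /dev/null hostPath are illustrative choices, while the containerPath mount is the one referenced in the thread further down.

  # kind-gpu.yaml (file name is illustrative)
  kind: Cluster
  apiVersion: kind.x-k8s.io/v1alpha4
  nodes:
  - role: control-plane
    extraMounts:
      # Expose all host GPUs to the kind node; the host's NVIDIA runtime
      # interprets this special path when it accepts devices as volume mounts.
      - hostPath: /dev/null
        containerPath: /var/run/nvidia-container-devices/all

  $ kind create cluster --name gpu-cluster --config kind-gpu.yaml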

  • @kevinchen7002 • 11 months ago

    When I run the command:
    $ helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --set driver.enabled=false
    I hit an error:
    FailedMount pod/gpu-feature-discovery-9bnmp MountVolume.SetUp failed for volume "run-nvidia" : hostPath type check failed: /run/nvidia is not a directory
    But when I created the cluster, I used containerPath: /var/run/nvidia-container-devices/all. Where is this /run/nvidia needed?

    • @Samos123x • 11 months ago

      Interesting. I haven't seen that before. What host are you running this on?

    • @kevinchen7002 • 11 months ago

      Thank you, Sam! I'm on a local Ubuntu 22.04 host. I waited quite a long time and the error somehow resolved itself. One more point: make sure the system uses cgroup v2, which can be checked with 'docker info | grep -i cgroup'. Otherwise it may cause an 'adding pid to cgroups: failed to write' error while the gpu-operator-XXX-node-feature-discovery-master pod is running. Thanks again! Your tutorial is very clear and useful!
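
For context on the install command quoted earlier in this thread, a typical end-to-end sequence looks roughly like the sketch below. The repo-add and repo-update steps are assumed prerequisites the comment doesn't show; the install command itself is the one quoted above, with driver.enabled=false telling the GPU operator to reuse the driver already present on the host.

  $ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
  $ helm repo update
  $ helm install --wait --generate-name -n gpu-operator --create-namespace \
      nvidia/gpu-operator --set driver.enabled=false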
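
And a quick sketch of the cgroup check mentioned in the last reply; the stat command is an added host-level alternative, not something from the comment.

  # Check what Docker reports (from the reply above); a cgroup v2 host
  # should show a line like "Cgroup Version: 2".
  $ docker info | grep -i cgroup

  # Host-level alternative (added for illustration): prints
  # "cgroup2fs" on cgroup v2 and "tmpfs" on cgroup v1.
  $ stat -fc %T /sys/fs/cgroup/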