I really appreciate this - I wanted a dev environment and found 'kind', and it was amazing to see it had GPU support! Saved me hours of fighting with Kubernetes and NVIDIA plugins. Fantastic work.
Will be linking to your video in my PhD blog :) (alanknipmeyer.phd)
Please, can I get this for k8s? I'm using containerd as my container runtime.
k8s inside kind is still using containerd, I think. You just need Docker, Podman, or nerdctl on the laptop/device that runs kind.
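For anyone following along, the cluster-creation step discussed in this thread passes the GPU device mount through a kind config file. A minimal sketch, assuming the NVIDIA container toolkit on the host is configured with `accept-nvidia-visible-devices-as-volume-mounts = true` (the file name and the `/dev/null` hostPath are illustrative conventions, not requirements):

```yaml
# kind-gpu.yaml - sketch of a single-node kind cluster that exposes all GPUs.
# Mounting anything at the nvidia-container-devices containerPath signals the
# NVIDIA runtime to inject the listed GPUs ("all") into the node container.
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  extraMounts:
    - hostPath: /dev/null
      containerPath: /var/run/nvidia-container-devices/all
```

Then create the cluster with `kind create cluster --config kind-gpu.yaml`.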
When I run the command:
$ helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --set driver.enabled=false
I get an error:
FailedMount pod/gpu-feature-discovery-9bnmp MountVolume.SetUp failed for volume "run-nvidia" : hostPath type check failed: /run/nvidia is not a directory
But when I created the cluster, I used containerPath: /var/run/nvidia-container-devices/all. Where is this /run/nvidia needed?
Interesting. I haven't seen that before. What host are you running this on?
Thank you Sam! My local host is Ubuntu 22.04. I waited for quite a long time and the error somehow resolved itself. One more point I want to add: make sure the system uses cgroup v2, which can be checked with 'docker info | grep -i cgroup'. Otherwise it may cause an error 'adding pid to cgroups: failed to write' while the gpu-operator-XXX-node-feature-discovery-master pod is running. Thanks again! Your tutorial is very clear and useful! @Samos123x
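Besides `docker info | grep -i cgroup`, the cgroup version can also be checked directly on the host filesystem. A small sketch, assuming a Linux host: cgroup v2 mounts a unified hierarchy of filesystem type `cgroup2fs` at /sys/fs/cgroup, so inspecting that mount is enough.

```shell
# Detect the host's cgroup version by checking the filesystem type
# mounted at /sys/fs/cgroup (cgroup2fs == unified cgroup v2 hierarchy).
if [ "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" = "cgroup2fs" ]; then
    echo "cgroup v2"
else
    echo "cgroup v1 (or hybrid)"
fi
```

On a v1/hybrid host, switching typically means booting with the kernel parameter `systemd.unified_cgroup_hierarchy=1`.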