Scaling Kubernetes Clusters for Generative Models: Managing GPU Resources for AI Applications - Jack Min Ong
- Published on Dec 17, 2023
- Scaling Kubernetes Clusters for Generative Models: Managing GPU Resources for AI Applications - Jack Min Ong, Jina AI
With the rise of Generative AI applications, GPU resources have become a critical bottleneck in scaling infrastructure to efficiently serve AI-powered applications. Kubernetes, an open-source container orchestration platform, coupled with the NVIDIA GPU Operator, provides a scalable solution to this problem, allowing teams to configure and consume GPU resources at scale through the Kubernetes API.
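As a concrete illustration of consuming GPUs through the Kubernetes API: once the GPU Operator has installed the NVIDIA device plugin on GPU nodes, a workload requests a GPU like any other resource by setting an `nvidia.com/gpu` limit. The sketch below uses the official Kubernetes Python client; the pod name, image, and namespace are placeholders for illustration, not details from the talk.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (e.g. ~/.kube/config).
config.load_kube_config()

# A single-container pod that asks the scheduler for one NVIDIA GPU.
# The nvidia.com/gpu resource is advertised by the NVIDIA device plugin,
# which the GPU Operator deploys on GPU-equipped nodes.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-inference"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="model-server",
                image="nvcr.io/nvidia/pytorch:23.10-py3",  # placeholder image
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # request one whole GPU
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The scheduler will only place this pod on a node that still has an unallocated GPU, which is what makes the GPU visible and manageable through the standard Kubernetes resource model.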
In this talk, we will explore how Kubernetes can be used to scale Generative AI workloads efficiently. We will introduce the challenges of GPU resource management, discuss techniques for sharding GPU devices across workloads (see the sketch below), and examine ways to optimize GPU usage in generative model pipelines.
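One such sharing technique is time-slicing, where the GPU Operator's device plugin advertises a single physical GPU as several schedulable replicas. The sketch below creates the time-slicing ConfigMap with the Python client; the ConfigMap name, namespace, and replica count are illustrative, and the Operator's ClusterPolicy must separately be pointed at this ConfigMap (e.g. via `devicePlugin.config.name`) before it takes effect.

```python
from kubernetes import client, config

# Time-slicing configuration for the NVIDIA device plugin: each physical GPU
# is advertised as 4 nvidia.com/gpu replicas, so up to 4 pods can share it.
TIME_SLICING_CONFIG = """\
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
"""

config.load_kube_config()

config_map = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(
        name="time-slicing-config",   # illustrative name
        namespace="gpu-operator",     # namespace where the GPU Operator runs
    ),
    data={"any": TIME_SLICING_CONFIG},  # "any" key applies the config to all nodes
)

client.CoreV1Api().create_namespaced_config_map(
    namespace="gpu-operator", body=config_map
)
```

Time-slicing trades isolation for density: the replicas share the GPU's memory and compute without hard enforcement, so it suits bursty or low-utilization inference workloads, whereas hardware partitioning (e.g. MIG on supported GPUs) provides isolated slices.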
By the end of this talk, attendees will have a solid understanding of how Kubernetes can be used to provision and share GPU resources across multiple containers, allowing them to make the most of their GPU investments and accelerate their Generative AI applications.