Serving Gemma on GKE using vLLM

  • Published Sep 15, 2024
  • Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
    vLLM is a fast and easy-to-use library for LLM inference and serving.
    In this video, Mofi Rahman and Ali Zaidi walk through the process of deploying Gemma on GKE using the vLLM serving engine (a minimal deployment sketch follows below).
    Find Gemma on Hugging Face - huggingface.co...
    Follow along with the guide: cloud.google.c...
    Find other guides for serving Gemma and other AI/ML resources for GKE: g.co/cloud/gke-aiml
    Find other resources for learning about Gemma: ai.google.dev/...
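
For reference, here is a minimal sketch of what a vLLM Deployment for Gemma on GKE might look like. This manifest is an illustration, not taken from the video: the container image tag, the model name (google/gemma-2-9b-it), the GPU type (nvidia-l4), and the hf-secret Secret holding a Hugging Face token are all assumptions; the linked guide has the exact manifest.

```yaml
# Sketch of a vLLM serving Deployment for Gemma on GKE (assumed values throughout).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-gemma
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-gemma
  template:
    metadata:
      labels:
        app: vllm-gemma
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest   # assumed image; the guide may pin a specific tag
        args:
        - --model=google/gemma-2-9b-it   # Gemma weights pulled from Hugging Face
        - --tensor-parallel-size=1       # single-GPU serving
        env:
        - name: HUGGING_FACE_HUB_TOKEN   # required to download gated Gemma weights
          valueFrom:
            secretKeyRef:
              name: hf-secret            # hypothetical Secret created beforehand
              key: hf_api_token
        ports:
        - containerPort: 8000            # vLLM's OpenAI-compatible API port
        resources:
          limits:
            nvidia.com/gpu: "1"
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4   # assumed GPU type
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-gemma
spec:
  selector:
    app: vllm-gemma
  ports:
  - port: 8000
    targetPort: 8000
```

Once the Pod is ready, `kubectl port-forward svc/vllm-gemma 8000:8000` exposes vLLM's OpenAI-compatible API locally, so a request to `/v1/completions` can confirm the model is serving.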
