Running Multiple Models on the Same GPU, on Spot Instances
- Published on 10 Nov 2024
- Speaker: Oscar Rovira, Co-founder, Mystic AI
I’ll talk about the two biggest cost optimisations companies can adopt when running ML inference in their cloud. I’ll cover what GPU fractionalization is, along with its benefits and limitations, and the value of using Spot instances and the challenges they can bring. I’ll include a couple of examples of how the two combined can increase the throughput and reduce the cost of your GenAI application.