Demo: Optimizing Gemma inference on NVIDIA GPUs with TensorRT-LLM

  • Premiered Jul 27, 2024
  • Even the smallest large language models are compute-intensive, significantly affecting the cost of your generative AI application. Your ability to increase throughput and reduce latency can make or break many business cases. NVIDIA TensorRT-LLM is an open-source tool that lets you considerably speed up execution of your models, and in this talk we demonstrate its application to Gemma.
    Check out more videos from Gemma Developer Day 2024 → goo.gle/440EAIV
    Subscribe to Google for Developers → goo.gle/developers
    #Gemma #GemmaDeveloperDay
    Event: Gemma Developer Day 2024
  • Science & Technology

Comments • 2

  • @LuigiGoogle
    @LuigiGoogle 3 months ago

    Hello Google, I am inspired by your company. Could you create open-source projects to work with people who are also interested in this topic of neural networks?

  • @shayantriedcoding
    @shayantriedcoding 3 months ago

    Hi, I am Shayan, a beginner Python developer. I want to learn something from code jams or online coding challenges. Could the Google team do that for me, please 🥺? I really want to learn something from Google Code Jam.