Serving 100s of LLMs on 1 GPU with LoRAX - Travis Addair | Stanford MLSys #84

  • Published on Nov 21, 2024

Comments • 8

  • @voncolborn9437 • 11 months ago +2

    Great presentation. It is interesting to see the practical side of running a bunch of LLMs. Ops makes it happen. Coming from the old, really old, school of computing with massive multi-user, time-share systems, it is interesting to see how no matter how much computing changes, aspects of it remain the same. Throughput, latency, caching, and scheduling are still central. All that seems to have changed is the problem domain. We do, indeed, live in interesting times.

  • @conan_der_barbar • 1 year ago +1

    great talk! still waiting for the open source release 👀

  • @suleimanshehu5839 • 11 months ago

    Please create a video on fine-tuning an MoE LLM such as Mixtral 8x7B using LoRA adapters within your framework
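
    Editor's note: pending such a video, here is a minimal sketch of attaching LoRA adapters to Mixtral 8x7B with the Hugging Face PEFT library. This is not the presenter's framework (LoRAX handles serving, not training); the model ID, rank, and target modules below are illustrative assumptions.

```python
# Hedged sketch: LoRA fine-tuning setup for a Mixtral-style MoE model with PEFT.
# Hyperparameters and target modules are assumptions, not the talk's settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the experts across available GPUs
)

# Adapt only the attention projections; the MoE expert weights stay frozen,
# which keeps the trainable-parameter count small.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights
```

    The resulting adapter can be saved with `model.save_pretrained(...)` and later loaded at serving time alongside other adapters sharing the same frozen base model, which is the multi-adapter setup the talk describes.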

  • @Gerald-iz7mv • 7 months ago

    Hi, do you have any links to benchmarks one can run to measure latency and throughput for different models and frameworks?
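
    Editor's note: no benchmark links appear in the thread, but a rough probe is easy to sketch. The snippet below assumes a LoRAX-style `/generate` HTTP route (inherited from text-generation-inference); the URL, payload shape, and adapter ID are assumptions to adapt to your deployment. For serious numbers, use a concurrent load generator rather than this sequential loop.

```python
# Hedged sketch: sequential latency/throughput probe against an assumed
# LoRAX-style HTTP endpoint. Adjust URL and payload for your server.
import time
import requests

URL = "http://localhost:8080/generate"  # assumed local deployment
PROMPT = "Explain multi-adapter serving in one sentence."
N_REQUESTS = 20
MAX_NEW_TOKENS = 64

latencies = []
for _ in range(N_REQUESTS):
    start = time.perf_counter()
    resp = requests.post(
        URL,
        json={
            "inputs": PROMPT,
            "parameters": {
                "max_new_tokens": MAX_NEW_TOKENS,
                # "adapter_id": "my-org/my-lora",  # target a specific adapter
            },
        },
        timeout=120,
    )
    resp.raise_for_status()
    latencies.append(time.perf_counter() - start)

total = sum(latencies)
print(f"mean latency: {total / N_REQUESTS:.3f}s")
# Upper bound: assumes every response generated MAX_NEW_TOKENS tokens.
print(f"throughput:   {N_REQUESTS * MAX_NEW_TOKENS / total:.1f} tokens/s")
```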

  • @fastcardlastname3353 • 1 year ago

    This could change the landscape of multi-agent systems if it delivers on its promise.

  • @mohamedfouad1309 • 11 months ago

    GitHub link 😅

  • @nithinrao7191 • 1 year ago

    Second

  • @absbi0000 • 1 year ago

    First