LLM in a flash: Efficient Large Language Model Inference with Limited Memory

  • Published Jul 28, 2024
  • In this video we review an important recent paper from Apple, titled: "LLM in a flash: Efficient Large Language Model Inference with Limited Memory".
    This paper presents a method for running large language models (LLMs) on devices that do not have enough memory to store the entire model's weights.
    This is exciting progress in the democratization of LLMs, as it brings us closer to running top large language models on our personal computers and phones.
    Watch the video to learn more about how this method works; a rough illustrative sketch of the core idea appears after the chapters below.
    Paper page - arxiv.org/abs/2312.11514
    Blog post - aipapersacademy.com/llm-in-a-...
    -----------------------------------------------------------------------------------------------
    ✉️ Join the newsletter - aipapersacademy.com/newsletter/
    👍 Please like & subscribe if you enjoy this content
    We use VideoScribe to edit our videos - tidd.ly/44TZEiX (affiliate)
    -----------------------------------------------------------------------------------------------
    Chapters:
    0:00 Introduction
    1:25 Flash Memory & LLM Inference
    3:42 Reduce Data Transfer
    5:16 Increase Chunk Size
  • Science & Technology
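For readers who want a concrete picture before watching: below is a minimal, hypothetical Python sketch of the core idea as we understand it from the paper. The sizes (HIDDEN, FFN, WINDOW), the random-projection stand-in for the paper's trained low-rank activation predictor, and the memory-mapped file standing in for flash storage are all our own illustrative assumptions, not the authors' implementation. It mirrors the three chapter topics in miniature: loading FFN weights from "flash" only when predicted active, a sliding window that reuses recently loaded weights to reduce data transfer, and bundling each neuron's up- and down-projection weights so each flash read fetches a larger chunk.

```python
import numpy as np

# Illustrative sizes only; the paper targets ~7B-parameter models.
HIDDEN, FFN = 64, 256
WINDOW = 4            # sliding window: keep neurons used by the last 4 tokens
TOP_K = FFN // 10     # assume ~10% of FFN neurons fire per token (ReLU sparsity)

# Simulate flash with a memory-mapped file. Each row bundles one neuron's
# up-projection row and down-projection column, so a single read fetches a
# larger chunk (the "increase chunk size" idea).
rng = np.random.default_rng(0)
rng.standard_normal((FFN, 2 * HIDDEN), dtype=np.float32).tofile("ffn_bundled.bin")
flash = np.memmap("ffn_bundled.bin", dtype=np.float32, shape=(FFN, 2 * HIDDEN))

# Cheap stand-in for the paper's trained low-rank activation predictor.
predictor = rng.standard_normal((FFN, HIDDEN), dtype=np.float32)

cache = {}    # neuron id -> bundled weights currently held in RAM
history = []  # active-neuron sets for the last WINDOW tokens

def ffn_forward(x):
    """One sparse FFN pass: predict active neurons, load only those from flash."""
    active = np.argsort(np.abs(predictor @ x))[-TOP_K:]

    loads = 0
    for n in active:                  # fetch only cache misses from "flash"
        if n not in cache:
            cache[n] = np.array(flash[n])
            loads += 1

    history.append(set(active.tolist()))
    if len(history) > WINDOW:
        history.pop(0)
    keep = set().union(*history)      # evict neurons unused for WINDOW tokens
    for n in list(cache):
        if n not in keep:
            del cache[n]

    y = np.zeros(HIDDEN, dtype=np.float32)
    for n in active:                  # compute with the active neurons only
        up, down = cache[n][:HIDDEN], cache[n][HIDDEN:]
        y += max(0.0, float(up @ x)) * down
    return y, loads

for t in range(8):                    # consecutive tokens reuse cached rows
    _, loads = ffn_forward(rng.standard_normal(HIDDEN, dtype=np.float32))
    print(f"token {t}: {loads} neuron rows read from flash")
```

On real hardware the savings come from flash read granularity and bandwidth, which this toy only gestures at; the sketch captures the control flow, not the performance engineering.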

Comments • 3

  • @niazhimselfangels 7 months ago +1

    Really nice video. That's a lot of good content shared in a very digestible form.

  • @rS8NkZRu 7 months ago

    My man said jiggabyte

  • @PaulSchwarzer-ou9sw 7 months ago