Coding Llama 3 from scratch in PyTorch - Part 1

  • Published May 5, 2024
  • In this video series, you will learn how to train and fine-tune Llama 3 model from scratch.
    The goal is to code Llama 3 from scratch in PyTorch to create models with sizes 3B, 6B, 35B and 45B params. In this first video, you'll learn about upcycling, downcycling and infini-attention.
    📚Papers:
    - Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
    : arxiv.org/abs/2212.05055
    - Pre-training Small Base LMs with Fewer Tokens: arxiv.org/abs/2404.08634
    - Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention: arxiv.org/abs/2404.07143
    💻 To follow along you can use this colab notebook:
    - github.com/Blaizzy/Coding-LLM...
    🎥 Coding Llama 2 from scratch video series
    Part 1: th-cam.com/users/liveXHmag4damTg
    Part 2: th-cam.com/users/liveLSWDpFmbE90
    Part 3: • Coding Llama 2 from sc...
  • Science & Technology
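To make the "downcycling" idea above concrete (the approach from the "Pre-training Small Base LMs with Fewer Tokens" paper): a smaller model is initialized from a subset of a larger checkpoint's transformer layers instead of from random weights. A minimal sketch in plain PyTorch, using a toy encoder stack as a stand-in for a Llama checkpoint — the layer counts and the keep-first-and-last selection heuristic here are illustrative, not necessarily the video's exact recipe:

```python
import torch.nn as nn

# Toy stand-in for a decoder stack; the real case would be Llama's layers.
def make_stack(num_layers: int, dim: int = 64) -> nn.ModuleList:
    return nn.ModuleList(
        nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        for _ in range(num_layers)
    )

def downcycle(layers: nn.ModuleList, keep: int) -> nn.ModuleList:
    """Initialize a smaller stack from the layers of a bigger one.

    Keeping the first keep//2 and the last keep - keep//2 layers is one
    common heuristic; the paper compares several selection strategies.
    """
    half = keep // 2
    kept = list(layers[:half]) + list(layers[-(keep - half):])
    return nn.ModuleList(kept)

big = make_stack(12)       # e.g. a 12-layer "parent" model
small = downcycle(big, 6)  # 6-layer "child" initialized from its weights
```

The child starts from trained parent weights, so it needs far fewer tokens of continued pre-training than a from-scratch model of the same size.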

Comments • 15

  • @linz4213
    @linz4213 3 days ago +1

    Well made, Prince! Learned a lot.

  • @fliptip
    @fliptip 1 day ago

    Such a high-quality piece of content.

  • @AC-go1tp
    @AC-go1tp 29 days ago +3

    This is a very thoughtful and great initiative! Researchers with enough gray matter but limited means can still be in the game. Thank you PC🙏!

    • @princecanuma
      @princecanuma 28 days ago

      Most welcome!
      It’s my pleasure:)
      I lived through this so others don’t have to.

  • @ngamcode2485
    @ngamcode2485 18 days ago

    This is very impressive and great content. Thank you!

    • @princecanuma
      @princecanuma 13 days ago

      You're very welcome!

  • @kishoretvk
    @kishoretvk 28 days ago

    Super impressive. Great value.
    One question:
    How do I further train the model on my custom content, instead of using LoRA?
    Can we do further full training on it and add new memory?

    • @princecanuma
      @princecanuma 22 days ago

      Most welcome!
      You can do that, but it can be very expensive.
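For context on the tradeoff behind this reply: full fine-tuning keeps every parameter trainable (and optimizer state for all of them — the source of the expense), whereas LoRA freezes the base weights and trains only small adapters. A minimal sketch of the full fine-tuning loop in plain PyTorch, with a toy model and random batch as placeholders for Llama and the custom dataset:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the language model; in practice this would be Llama.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))

# Full fine-tuning: ALL parameters stay trainable, so AdamW keeps
# optimizer state for every weight -- unlike LoRA, which freezes the
# base model and only trains small low-rank adapters.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder batch standing in for the "custom content".
x = torch.randn(8, 16)
y = torch.randn(8, 16)

start = loss_fn(model(x), y).item()
for step in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
final = loss_fn(model(x), y).item()
```

For a model with billions of parameters the same loop works, but memory for gradients plus AdamW state is roughly 3x the model size, which is why LoRA is usually the cheaper default.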

  • @maslaxali8826
    @maslaxali8826 4 days ago

    CS programmers are vampires. My eeeeyyyes. Great content though.

  • @vivekpadman5248
    @vivekpadman5248 16 days ago

    Bro, how did you train Llama 3 without a paper?

    • @princecanuma
      @princecanuma 13 days ago

      Could you elaborate?

    • @vivekpadman5248
      @vivekpadman5248 12 days ago

      @princecanuma As far as I know, there hasn't been an official Llama 3 paper released, and no data info either. But I could be wrong... 😅

    • @princecanuma
      @princecanuma 12 days ago +1

      @vivekpadman5248 True, they only released a blog post detailing the data, model architecture and performance.
      Here is how I did it: Llama 3 has the exact same architecture as Llama 2, which we already covered on this channel.
      th-cam.com/play/PLDn_JsyofyfQp4td_ub6LfIg5vxyu6YJK.html&si=0Gyt9mdaA-ydiWOA
      Finally, if you understand how these models work, you don't need the paper; the code implementation is more than enough.

    • @vivekpadman5248
      @vivekpadman5248 12 days ago +1

      @princecanuma Oh, understood. Thanks, I'll check it out, and also your video 💙

    • @princecanuma
      @princecanuma 12 days ago

      Most welcome :)