How to run Large AI Models from Hugging Face on Single GPU without OOM

  • Published on 14 Nov 2024

Comments • 67

  • @serta5727
    @serta5727 2 years ago +11

    It is really impressive! I didn't expect that it would be possible for me to host a huge model like BLOOM myself!

    • @1littlecoder
      @1littlecoder  2 years ago +1

      I was very happy to see that

  • @NoobMLDude
    @NoobMLDude 1 year ago +3

    Thanks for walking through the notebook and sharing the resources! Good job!

  • @muhammedvaseem8570
    @muhammedvaseem8570 1 year ago +4

    This channel is really a treasure

  • @prathameshjadhav3041
    @prathameshjadhav3041 2 years ago +4

    Woah, this is what I needed. Thank you!!

  • @abdelrhmandameen2215
    @abdelrhmandameen2215 2 years ago +3

    Fantastic. Thank you for sharing.

  • @samlaki4051
    @samlaki4051 2 years ago +4

    Excellent video! I'd love to learn more and hopefully contribute to these feats of optimization someday.

    • @1littlecoder
      @1littlecoder  2 years ago +1

      Thank you. I think you can check their GitHub issues to see if any good first issue is marked.

    • @samlaki4051
      @samlaki4051 2 years ago

      @@1littlecoder gotcha! Thanks mate!

  • @darshantank554
    @darshantank554 2 years ago +1

    Thanks to Kalyan KS, who suggested this amazing video to me!

    • @1littlecoder
      @1littlecoder  2 years ago

      That's great to know. Thanks to Kalyan and you!

  • @robert301990
    @robert301990 1 year ago +1

    You are a fantastic explainer, thank you!

  • @vtrandal
    @vtrandal 1 year ago +1

    Excellent. I looked at your Google Colab notebook, and I want to know whether the Nvidia V100 GPU is supported. The Colab notebook says, "Currently Turing and Ampere GPUs are supported." Volta is not listed, and the V100 is the Volta microarchitecture. [Update: V100 GPUs are mentioned in Table 1 of "8-bit Optimizers via Block-wise Quantization" by Dettmers et al.]

  • @ashutossahoo7041
    @ashutossahoo7041 1 year ago +1

    It is really amazing 😍

  • @fontenbleau
    @fontenbleau 8 months ago

    I recently bought a 4070 Ti Super, which I want to use in tandem with my 2070 Super.

  • @BrokenRecord-i7q
    @BrokenRecord-i7q 1 year ago

    You are one ‘great’ coder❤

  • @EvanBurnetteMusic
    @EvanBurnetteMusic 2 years ago +5

    There's a typo in the notebook you've linked: "bitsandbytes" is missing the s at the end, so pip can't find the package.

    • @1littlecoder
      @1littlecoder  2 years ago +2

      Thank you for highlighting this. I actually linked the original notebook from the dev, so I don't have edit rights to it, but if anyone stumbles upon the issue they should hopefully see this comment.

    • @lolmaker
      @lolmaker 1 year ago +2

      @@1littlecoder Might be worth adding a note in the description; I was wondering why pip couldn't find the package until I ventured into the comments.
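For anyone who lands here with the same error: the fix the thread describes is a one-character change to the package name in the notebook's install cell. A sketch of the corrected command, assuming the notebook installs the same three packages the video uses:

```shell
# The typo'd cell ("bitsandbyte", missing the final "s") is not a package on PyPI.
# Corrected install:
pip install bitsandbytes transformers accelerate
```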

  • @jonathanberry1111
    @jonathanberry1111 11 months ago +1

    🎯 Key Takeaways for quick navigation:
    00:00 🚀 *Running Large AI Models on Single GPU*
    - Exploring how to run large language models on a single GPU.
    - Introducing the use of the "bitsandbytes" library for this purpose.
    - Acknowledging the source of the content from Tim Dettmers.
    01:11 🧮 *Quantization for Model Size Reduction*
    - Explaining the concept of quantization in neural networks.
    - Highlighting the importance of quantization for reducing model size.
    - Emphasizing the use of 8-bit and 16-bit precision for quantization.
    04:11 🔧 *Setting Up Environment for Model Loading*
    - Listing the steps to set up the environment for loading large models.
    - Mentioning the installation of required libraries (bits and bytes, transformers, accelerate).
    - Providing guidance on selecting the appropriate GPU hardware.
    06:20 📦 *Loading Large Models with Ease*
    - Demonstrating how to load a large language model with a single line of code.
    - Showcasing the ability to load a 3 billion parameter model without RAM issues.
    - Comparing the use of transformers' pipeline with manual model loading.
    09:33 💾 *Quantization Without Performance Degradation*
    - Highlighting the key benefit of quantization: reducing model size without performance degradation.
    - Discussing memory savings achieved with quantization for large models.
    - Illustrating how quantization allows hosting large models on single GPUs.
    13:18 👏 *Acknowledgment and Conclusion*
    - Expressing gratitude to Tim Dettmers and his team for simplifying the process.
    - Recognizing the potential impact of this advancement on hosting AI models.
    - Encouraging viewers to explore this opportunity and stay tuned for further research details.
    Made with HARPA AI
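The quantization idea summarized above (shrinking weights to 8 bits without degrading quality) can be sketched in plain NumPy. This is a minimal absmax-style quantizer for illustration only, not the actual bitsandbytes kernels; the commented `from_pretrained` call at the end shows the one-line loading pattern the video demonstrates (it needs a CUDA GPU plus the bitsandbytes, transformers, and accelerate packages, so it won't run in a plain CPU session):

```python
import numpy as np

def absmax_quantize(x):
    # Scale so the largest magnitude maps to 127, then round to int8.
    scale = 127.0 / np.max(np.abs(x))
    return np.round(x * scale).astype(np.int8), scale

def absmax_dequantize(q, scale):
    # Recover approximate float32 values from the int8 codes.
    return q.astype(np.float32) / scale

weights = np.random.randn(512, 512).astype(np.float32)
q, scale = absmax_quantize(weights)
recovered = absmax_dequantize(q, scale)

print(weights.nbytes // q.nbytes)  # 4: int8 weights are 4x smaller than float32
print(float(np.max(np.abs(weights - recovered))))  # small reconstruction error

# The same idea, applied transparently by the library
# (real model id; needs a GPU session to actually run):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "bigscience/bloom-3b", device_map="auto", load_in_8bit=True)
```

The 4x figure is exactly the memory saving the takeaways describe: each float32 weight becomes one int8 code plus a shared per-tensor scale.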

  • @smoklares9791
    @smoklares9791 1 year ago

    How can I run Chronos Hermes 13B on a PC? What do I need?

  • @じある
    @じある 2 years ago +3

    Can you please verify whether you can run the 175B BLOOM model?
    I see you ran the 3B model, but I want to know if you have the 175B model working in Colab. Please help.

    • @1littlecoder
      @1littlecoder  2 years ago

      You're right, I ran the 3B model. I think you'd need a better GPU for the 175B model.

    • @じある
      @じある 2 years ago

      @@1littlecoder Could you run the 175B model with an A100 GPU (or another GPU) provided with a Google Colab Pro subscription?

    • @julius333333
      @julius333333 1 year ago

      you can't
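The back-of-envelope arithmetic behind that "you can't": even with 8-bit weights, a 175B-parameter model needs far more memory than the single A100 (40–80 GB) a Colab Pro subscription can provide. A quick sketch of the weight-memory math, ignoring activations and KV cache entirely:

```python
PARAMS = 175e9  # parameter count of the 175B BLOOM model asked about above

# Bytes per parameter at each precision -> GB of VRAM for the weights alone.
for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{name}: {gb:.0f} GB")
# fp32: 700 GB
# fp16: 350 GB
# int8: 175 GB  -- still several A100s' worth, not one Colab GPU
```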

  • @namratashivagunde1027
    @namratashivagunde1027 2 years ago +1

    Which paper is the table shown from?

  • @fractalarbitrage
    @fractalarbitrage 2 years ago

    For human-like original text, do you prefer paraphrasing or generating text? Which model do you recommend?

    • @1littlecoder
      @1littlecoder  2 years ago

      Thanks for checking out the video. I'd go with generating text if it's from scratch. Overall, GPT-3 still rules this space, but of the open-source alternatives, OPT and BLOOM seem good. I think domain-based fine-tuning would make more sense than just using the model right out of the box.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago

    What is T4?

  • @thumperhunts6250
    @thumperhunts6250 1 year ago

    How would you recommend building a custom PC for running a local LLM?

    • @anthrophilosophia
      @anthrophilosophia 1 year ago

      An Nvidia A100, or the Nvidia card with the most VRAM you can afford
      (put it in an average high-end desktop).

  • @kitastro
    @kitastro 1 year ago +1

    I like your style :)

  • @imranullah3097
    @imranullah3097 1 year ago

    I think this doesn't work for fine-tuning large models 💔☹️

    • @1littlecoder
      @1littlecoder  1 year ago

      Try looking into Accelerate for fine-tuning.

  • @knowledgelover2736
    @knowledgelover2736 1 year ago

    Do you know if anybody is working on instructOPT, like instructGPT?

    • @1littlecoder
      @1littlecoder  1 year ago +1

      (Meta) OPT recently released its instruct model. I think even the model weights were shared. The other option is BLOOMZ.

    • @knowledgelover2736
      @knowledgelover2736 1 year ago

      @@1littlecoder Awesome, thanks. I am researching that now, and BLOOMZ too; I didn't know about that. I will read into it.
      Do you know how many tokens OPT can take? davinci from GPT takes 4000 tokens.

  • @ElNinjaZeros
    @ElNinjaZeros 1 year ago

    How do I fine-tune an LLM model in free google colab?

    • @1littlecoder
      @1littlecoder  1 year ago +1

      check this out - th-cam.com/video/NRVaRXDoI3g/w-d-xo.html

  • @techinsider3611
    @techinsider3611 1 year ago

    Can I run it on an RTX 3060 12 GB?

  • @user-wp8yx
    @user-wp8yx 1 year ago

    6:30

  • @geekyprogrammer4831
    @geekyprogrammer4831 2 years ago

    You are an amazing instructor, no doubt. But why don't you work on improving your English accent?

    • @1littlecoder
      @1littlecoder  2 years ago +10

      I'm trying by speaking to native speakers. If you have any suggestions, please let me know.

    • @ayomidediekola2505
      @ayomidediekola2505 2 years ago +5

      @@1littlecoder This was such a humble reply. You gained a sub, man.
      The OP probably doesn't know how hard it is to speak with an English accent when it's not your native language.

    • @1littlecoder
      @1littlecoder  2 years ago +5

      Thanks Ayomide :) I've got a full-time job, I do TH-cam to learn and share what I know and also expand my knowledge on what I don't know. If English is something that I should improve for my subs to get the content better, I'm all in :) Thank you for the kind words!

    • @ayomidediekola2505
      @ayomidediekola2505 2 years ago +5

      @@1littlecoder It must take a lot of effort to put out quality videos alongside a full-time job. I really commend your effort, and I'm sure it will pay off.
      Your grammar is great, by the way, and I'm sure a lot of people won't mind the accent. It's also very nice that you're trying to improve it.

    • @1littlecoder
      @1littlecoder  2 years ago +2

      Thank you for being kind :)