It is really impressive! I didn't expect that it would be possible for me to host a huge model like BLOOM myself!
I was very happy to see that
Thanks for walking through the notebook and sharing the resources! Good job!
Thanks
This channel is really a treasure
Thank you
Woah, this is what I needed. Thank you!!
I'm glad you liked it
Fantastic. Thank you for sharing.
Glad you liked it
Excellent video! I'd love to learn more and hopefully contribute to these feats of optimization someday.
Thank you. I think you can check their GitHub issues to see if any are marked as a good first issue.
@@1littlecoder gotcha! Thanks mate!
Thanks to Kalyan KS, who suggested this amazing video to me!
That's great to know. Thanks to Kalyan and you!
You are a fantastic explainer, thank you!
Excellent. I looked at your Google CoLab notebook, and I want to know if Nvidia V100 GPU is supported? The CoLab notebook says, "Currently Turing and Ampere GPUs are supported." Volta is not listed. V100 is Volta micro-architecture. [update: V100 GPUs are mentioned in Table 1 of “8-BIT OPTIMIZERS VIA BLOCK-WISE QUANTIZATION” by Dettmers et al]
It is really amazing 😍
I recently bought a 4070 Ti Super, which I want to use in tandem with my 2070 Super.
You are one ‘great’ coder❤
There's a typo in the notebook you've linked: "bitsandbytes" is missing the "s" at the end, so pip can't find the package.
Thank you for highlighting, I actually linked the original notebook from the dev. I don't have edit rights to it, but if anyone stumbles upon the issue they should mostly see this comment
@@1littlecoder Might be worth adding a note in the description; I was wondering why pip wasn't able to find the package until I ventured into the comments.
🎯 Key Takeaways for quick navigation:
00:00 🚀 *Running Large AI Models on Single GPU*
- Exploring how to run large language models on a single GPU.
- Introducing the use of the "bitsandbytes" library for this purpose.
- Acknowledging the source of the content from Tim Dettmers.
01:11 🧮 *Quantization for Model Size Reduction*
- Explaining the concept of quantization in neural networks.
- Highlighting the importance of quantization for reducing model size.
- Emphasizing the use of 8-bit and 16-bit precision for quantization.
04:11 🔧 *Setting Up Environment for Model Loading*
- Listing the steps to set up the environment for loading large models.
- Mentioning the installation of required libraries (bitsandbytes, transformers, accelerate).
- Providing guidance on selecting the appropriate GPU hardware.
06:20 📦 *Loading Large Models with Ease*
- Demonstrating how to load a large language model with a single line of code (see the sketch after this summary).
- Showcasing the ability to load a 3 billion parameter model without RAM issues.
- Comparing the use of transformers' pipeline with manual model loading.
09:33 💾 *Quantization Without Performance Degradation*
- Highlighting the key benefit of quantization: reducing model size without performance degradation.
- Discussing memory savings achieved with quantization for large models.
- Illustrating how quantization allows hosting large models on single GPUs.
13:18 👏 *Acknowledgment and Conclusion*
- Expressing gratitude to Tim Dettmers and his team for simplifying the process.
- Recognizing the potential impact of this advancement on hosting AI models.
- Encouraging viewers to explore this opportunity and stay tuned for further research details.
Made with HARPA AI
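For anyone who wants to try this, here is a minimal sketch of the workflow summarized above. It assumes a CUDA GPU and recent versions of transformers, accelerate, and bitsandbytes; the model name is only an example, not necessarily the exact one used in the video.

```python
# Minimal sketch (not the notebook's exact code).
# Setup, roughly: pip install bitsandbytes transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-3b"  # example ~3B-parameter model

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit=True asks bitsandbytes to quantize the weights to int8,
# roughly halving memory versus fp16; device_map="auto" (via accelerate)
# places the layers on the available GPU. Newer transformers versions
# express the same option through BitsAndBytesConfig(load_in_8bit=True).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)

prompt = "Running large models on a single GPU is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 8-bit weights are what let a multi-billion-parameter model fit in the memory of a single Colab or consumer GPU.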
How do I run Chronos Hermes 13B on my PC? What do I need?
can you please verify if you can run the 175b bloom model?
I see you ran the 3B model, but I want to know if you have the 175B model working in Colab. Please help.
You're right, I ran the 3B model. I think you'd need a better GPU for the 175B model.
@@1littlecoder Could you run the 175B model with an A100 GPU (or another GPU) provided with a Google Colab Pro subscription?
You can't; even in 8-bit, a 175B-parameter model needs roughly 175 GB for the weights alone, which is far more than the 40 GB (or even 80 GB) of a single A100.
The table shown is from which paper?
Thanks for the video
Here's the paper - arxiv.org/pdf/2208.07339.pdf
Thanks
For human-like original text, do you prefer paraphrasing or generating text? Which model do you recommend?
Thanks for checking out the video. I'd go with generating text if it's from scratch. Overall, GPT-3 still rules in this space, but among the open-source alternatives, OPT and BLOOM seem good. I think domain-based fine-tuning would make more sense than just using the model right out of the box.
What is T4?
How would you recommend building a custom PC for running a local LLM?
- An NVIDIA A100, or whichever NVIDIA card has the most VRAM you can afford.
(Put it in an average high-end desktop.)
I like your style :)
Oh thank you!
I think it doesn't work for fine-tuning large models 💔☹️
Try looking into Accelerate for fine-tuning.
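In case it helps, here is a rough, illustrative sketch of what fine-tuning with Hugging Face Accelerate could look like; the model name, toy data, and hyperparameters are placeholders, not code from the video.

```python
# Illustrative sketch of fine-tuning with Accelerate (placeholders throughout).
# Setup, roughly: pip install accelerate transformers torch
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # placeholder small model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy data; in practice, use your own tokenized corpus.
texts = ["Hello world.", "Quantization shrinks models."]
enc = tokenizer(texts, padding=True, return_tensors="pt")
loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"]), batch_size=1)

accelerator = Accelerator()  # handles device placement and, optionally, mixed precision
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for input_ids, attention_mask in loader:
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids)
    accelerator.backward(outputs.loss)  # used in place of outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

This is only the plain-Accelerate path; for very large models you would still need something like parameter-efficient fine-tuning or multi-GPU sharding on top of it.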
Do you know if anybody is working on instructOPT, like instructGPT?
Meta recently released an instruct version of OPT. I think even the model weights were shared. The other option is BLOOMZ.
@@1littlecoder Awesome, thanks. I'm researching that now, and BLOOMZ too; I didn't know about it, so I'll read into it.
Do you know how many tokens OPT can take? Davinci from GPT-3 takes 4,000 tokens.
How do I fine-tune an LLM model in free google colab?
check this out - th-cam.com/video/NRVaRXDoI3g/w-d-xo.html
Can I run it on an RTX 3060 12 GB?
Not very sure
6:30
You are an amazing instructor, no doubt. But why don't you work on improving your English accent?
I'm trying, by speaking to native speakers. If you have any suggestions, please let me know.
@@1littlecoder This was such a humble reply. You gained a sub man.
The OP probably doesn't know how hard it is to speak with an English accent when it's not your native language
Thanks Ayomide :) I've got a full-time job, I do TH-cam to learn and share what I know and also expand my knowledge on what I don't know. If English is something that I should improve for my subs to get the content better, I'm all in :) Thank you for the kind words!
@@1littlecoder It must take a lot of effort to be putting out quality videos with a full time job. Really commend your effort. I'm sure it will pay off
Your grammar is great by the way. And I'm sure a lot of people won't mind the accent. It's also very nice that you're trying to improve on it.
Thank you for being kind :)