It is really impressive! I didn't expect that it would be possible for me to host a huge model like BLOOM myself!
I was very happy to see that
Thanks for walking through the notebook and sharing the resources! Good job!
Thanks
This channel is really a treasure
Thank you
Woah, this is what I needed. Thank you!!
I'm glad you liked it
Fantastic. Thank you for sharing.
Glad you liked it
Excellent video! I'd love to learn more and hopefully contribute to these feats of optimization someday.
Thank you. I think you can check their GitHub issues to see if any are marked as a good first issue.
@@1littlecoder gotcha! Thanks mate!
Thanks to Kalyan KS, who suggested this amazing video to me!
That's great to know. Thanks to Kalyan and you!
You are a fantastic explainer, thank you!
Excellent. I looked at your Google CoLab notebook, and I want to know if Nvidia V100 GPU is supported? The CoLab notebook says, "Currently Turing and Ampere GPUs are supported." Volta is not listed. V100 is Volta micro-architecture. [update: V100 GPUs are mentioned in Table 1 of “8-BIT OPTIMIZERS VIA BLOCK-WISE QUANTIZATION” by Dettmers et al]
It is really amazing 😍
I recently bought a 4070 Ti Super, which I want to use in tandem with my 2070 Super.
You are one ‘great’ coder❤
There's a typo in the notebook you've linked: "bitsandbytes" is missing the "s" at the end, so pip can't find the package.
Thank you for highlighting, I actually linked the original notebook from the dev. I don't have edit rights to it, but if anyone stumbles upon the issue they should mostly see this comment
@@1littlecoder Might be worth adding a note in the description; I was wondering why pip wasn't able to find the package until I ventured into the comments.
🎯 Key Takeaways for quick navigation:
00:00 🚀 *Running Large AI Models on Single GPU*
- Exploring how to run large language models on a single GPU.
- Introducing the use of the "bitsandbytes" library for this purpose.
- Acknowledging the source of the content from Tim Dettmers.
01:11 🧮 *Quantization for Model Size Reduction*
- Explaining the concept of quantization in neural networks.
- Highlighting the importance of quantization for reducing model size.
- Emphasizing the use of 8-bit and 16-bit precision for quantization.
04:11 🔧 *Setting Up Environment for Model Loading*
- Listing the steps to set up the environment for loading large models.
- Mentioning the installation of required libraries (bitsandbytes, transformers, accelerate).
- Providing guidance on selecting the appropriate GPU hardware.
06:20 📦 *Loading Large Models with Ease*
- Demonstrating how to load a large language model with a single line of code (see the sketch after this summary).
- Showcasing the ability to load a 3 billion parameter model without RAM issues.
- Comparing the use of transformers' pipeline with manual model loading.
09:33 💾 *Quantization Without Performance Degradation*
- Highlighting the key benefit of quantization: reducing model size without performance degradation.
- Discussing memory savings achieved with quantization for large models.
- Illustrating how quantization allows hosting large models on single GPUs.
13:18 👏 *Acknowledgment and Conclusion*
- Expressing gratitude to Tim Dettmers and his team for simplifying the process.
- Recognizing the potential impact of this advancement on hosting AI models.
- Encouraging viewers to explore this opportunity and stay tuned for further research details.
Made with HARPA AI
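For anyone who wants to try this, here is a minimal sketch of the workflow summarized above. It assumes a CUDA GPU and recent versions of transformers, accelerate, and bitsandbytes; the model name is only an example, not necessarily the exact one used in the video.

```python
# Minimal sketch (not the notebook's exact code).
# Setup, roughly: pip install bitsandbytes transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-3b"  # example ~3B-parameter model

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit=True asks bitsandbytes to quantize the weights to int8,
# roughly halving memory versus fp16; device_map="auto" (via accelerate)
# places the layers on the available GPU. Newer transformers versions
# express the same option through BitsAndBytesConfig(load_in_8bit=True).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)

prompt = "Running large models on a single GPU is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 8-bit weights are what let a multi-billion-parameter model fit in the memory of a single Colab or consumer GPU.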
How do I run Chronos Hermes 13B on my PC? What do I need?
can you please verify if you can run the 175b bloom model?
I see you ran the 3B model, but I want to know if you have the 175B model working in Colab. Please help.
You're right, I ran the 3B model. I think you'd need a better GPU for the 175B model.
@@1littlecoder Could you run the 175B model with an A100 GPU (or another GPU) provided with a Google Colab Pro subscription?
You can't; even in 8-bit, a 175B-parameter model needs roughly 175 GB for the weights alone, which is far more than the 40 GB (or even 80 GB) of a single A100.
The table shown is from which paper?
Thanks for the video
Here's the paper - arxiv.org/pdf/2208.07339.pdf
Thanks
For human-like original text, do you prefer paraphrasing or generating text? Which model do you recommend?
Thanks for checking out the video. I'd go with generating text if it's from scratch. Overall, GPT-3 still rules in this space, but among the open-source alternatives, OPT and BLOOM seem good. I think domain-based fine-tuning would make more sense than just using the model right out of the box.
What is T4?
How would you recommend building a custom PC for running a local LLM?
- An NVIDIA A100, or whichever NVIDIA card has the most VRAM you can afford.
(Put it in an average high-end desktop.)
I like your style :)
Oh thank you!
I think it doesn't work for fine-tuning large models 💔☹️
Try looking into Accelerate for fine-tuning.
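In case it helps, here is a rough, illustrative sketch of what fine-tuning with Hugging Face Accelerate could look like; the model name, toy data, and hyperparameters are placeholders, not code from the video.

```python
# Illustrative sketch of fine-tuning with Accelerate (placeholders throughout).
# Setup, roughly: pip install accelerate transformers torch
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # placeholder small model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy data; in practice, use your own tokenized corpus.
texts = ["Hello world.", "Quantization shrinks models."]
enc = tokenizer(texts, padding=True, return_tensors="pt")
loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"]), batch_size=1)

accelerator = Accelerator()  # handles device placement and, optionally, mixed precision
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for input_ids, attention_mask in loader:
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids)
    accelerator.backward(outputs.loss)  # used in place of outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

This is only the plain-Accelerate path; for very large models you would still need something like parameter-efficient fine-tuning or multi-GPU sharding on top of it.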
Do you know if anybody is working on instructOPT, like instructGPT?
Meta recently released an instruct version of OPT. I think even the model weights were shared. The other option is BLOOMZ.
@@1littlecoder Awesome, thanks. I'm researching that now, and BLOOMZ too; I didn't know about it, so I'll read into it.
Do you know how many tokens OPT can take? Davinci from GPT-3 takes 4,000 tokens.
How do I fine-tune an LLM model in free google colab?
check this out - th-cam.com/video/NRVaRXDoI3g/w-d-xo.html
Can I run it on an RTX 3060 12 GB?
Not very sure
6:30
You are an amazing instructor, no doubt. But why don't you work on improving your English accent?
I'm trying, by speaking to native speakers. If you have any suggestions, please let me know.
@@1littlecoder This was such a humble reply. You gained a sub man.
The OP probably doesn't know how hard it is to speak with an English accent when it's not your native language
Thanks Ayomide :) I've got a full-time job, I do TH-cam to learn and share what I know and also expand my knowledge on what I don't know. If English is something that I should improve for my subs to get the content better, I'm all in :) Thank you for the kind words!
@@1littlecoder It must take a lot of effort to be putting out quality videos with a full time job. Really commend your effort. I'm sure it will pay off
Your grammar is great by the way. And I'm sure a lot of people won't mind the accent. It's also very nice that you're trying to improve on it.
Thank you for being kind :)