First off, respect for the hustle; the in-depth breakdown of integrating Llama with other tools really shows how much work goes on behind the scenes. That said, not sure why everyone's so hyped about all these new models when sometimes simpler, older architectures can do the trick. But hey, if it's all about pushing boundaries and experimenting, you're killing it, bro!
Thanks a mil moondev!! Yeah, at this point I'm just pushing to see where it's going. I started fine-tuning this for some custom use cases and it looks hyper promising though!
Thanks so much for the detailed videos @NicholasRenotte! Can you make a video on fine-tuning?
Can you name some of the older models, so I can look them up and learn about them?
@vyrsh0 @moondevonyt Yes, I would like to learn which older models do the trick as well!
You can't load Llama2-70b on a single A100 GPU. Using full precision (float32) would require 70 billion * 4 bytes = 280GB of GPU memory. Loading it in float16 would halve that to 140GB. It finally worked because you loaded it in int8, which only requires 70GB of memory, while the A100 has 80GB of GPU memory. If you wanted to load it in full or half precision you would need multiple GPUs, and you would also need to leverage tensor parallelism, whereby you slice the tensors across multiple GPUs.
I'm not sure that was it; I successfully loaded in half precision over 2x A100-80GB (didn't show the loading in the vid). But when I went to generate, this is what I came up against: github.com/huggingface/transformers/issues/24056. Solid calcs though!
That's nice. I'll just have to settle for my quantized 70b LLMs that run hot and fast on my 4090.
I think I can live with this.
Use Petals.
It runs nicely at 4-bit precision on an A6000.
What you meant to say was that you can load Llama2-70b on a single A100 GPU; you just have to run it in int8.
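For anyone who wants to sanity-check the numbers in this thread, here's a quick back-of-envelope sketch (weights only; it ignores activations, the KV cache, and framework overhead):

```python
# Rough GPU memory needed just to hold Llama-2-70B's weights.
PARAMS = 70e9          # 70 billion parameters
A100_MEMORY_GB = 80    # a single A100-80GB

for precision, bytes_per_param in [("float32", 4), ("float16", 2), ("int8", 1)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    verdict = "fits" if weights_gb <= A100_MEMORY_GB else "does NOT fit"
    print(f"{precision}: {weights_gb:.0f}GB -> {verdict} on one A100-80GB")
```

That prints 280GB, 140GB, and 70GB, matching the calculation above; only int8 squeezes under the 80GB ceiling.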
Always looking forward to your videos...
I've got an MSc in AI, but I still learn from you 👏🏼
I have a PhD and I am here as well 🤷♂
I guess I'm on the right path then
Yeah, nice work!
I've been playing around with RAG as well, so I can relate to all the roadblocks and pain points.
I'm trying to squeeze out as much as possible so I can have a decent RAG without any fancy GPU: consumer-grade hardware running everything locally. It's been fun/painful.
This video was great. You have created a format that is very entertaining to watch! 🙌 Subbed!
Thanks so much Mike!
Nicholas, I love your videos and your way of making learning about ML/AI fun! In your next video can you please show us how to fine-tune an LLM? Thanks for all the hard work you put into making these videos!
Incredible stuff... thank you, Nick.
Anytime!! Glad you liked it Rahul!
Nick this is insanely good, thank you for the effort
Thanks a mil!!!
Love this style of video. Fantastic content as always mate. You've given me some ideas to try out. Thanks :)
🙏🏽 thanks for checking it out!
Huge thanks for your videos. Nowadays I code, demonstrate, and perhaps lead AI, ML, DL, and RL development in a 1300+ worker engineering and consulting company.
I am combining technical analysis tools (FEM, CFD, MBS…) with AI to generate new digital business cases.
Ooooh, sounds amazing!
@NicholasRenotte It's a 13-worker digital business development group :) But thanks again mate!
Well done. One of the best and most compact tutorials I've ever had. Thanks for providing the source code!
Sick production value and great content!
Thanks a mil Andrey!
Taking the viewers along for the development and debugging ride is a cool style.
2:40 😂
Thanks Nick, the video is awesome! 🤘🏽🤘🏽🤘🏽
LOL, stoked you liked it Kev!!
Great content! Helped me a lot with building my own open-source-model RAG.
My computer is currently training a LoRA on Stable 7B for natural language to Python (30k examples) and SQL (30k). I also included 30k Orca questions so it doesn't lose its abilities as a language model, and 20k sentiment analysis examples for news headlines. I would love to try this model with this as soon as it's done training.
Noiceee, what datasets are you using for Python?
I love you Nicholas... you are awesome. My only regret is that I didn't find you earlier. All my dream projects in one channel... thank you!
Please make a video on OCR for past question papers: one that can extract questions and keywords, analyse 10 years of papers, and predict upcoming questions.
I think "Amazing" falls short, the amount of knowledge, the fact that your using cutting edge Open source model and all of that in a really funny and light tone. Keep up the good work! I have a question, do you think is much harder to deploy that app into google cloud run compared with runpod?
Thanks so much Juan! I can't imagine it would be; it'd be running on a VM instance with GPUs attached. You could also separate out the LLM bit and run that solely on a GPU, then just run the app on a basic Linux instance!
Love your videos Nicholas. Watching this with my morning coffee, a few chuckles, and a bunch of "ooohhh riiiiiight!"s. Your vid bridged a bunch of gaps in my knowledge.
Gonna be implementing my own RAG now 😎👍
Great share! Thank you for your persistence and giving away your efforts :)
Anytime! Gotta share where I can!
wow! This is top-tier content. Thank you!
Great tutorial! Can you also do a tutorial on setting up RunPod to host the application? I found that part to be a bit confusing and would love a more thorough walkthrough. Thanks for all you do!
Ya, might do something soon and add it to the free course on Courses From Nick. I'm saving infra style/setup videos for the Tech Fundamentals course.
Amazing editing and content, learnt a lot.
🙏🏽
Thankyou so much for this. God bless you
🙏🏽
I so wish I could do this. Maybe not specifically THIS, but things like this. I wish I understood the underlying principles for making something like this work. Great video!!!
@jimmc448 Ha ha ha...
'How to start a farm with no experience' - Hahaha, man, I just want to say that I love your sense of humour. Also, your videos are really useful for me: I'm an English teacher and I'm trying to build useful tools for my students. Thanks for your content.
😂 it's my secret dream job! Hahah thanks so much for checking it out man!!
Hi Nick... really late, but I'd be super grateful for a response. I'm trying to figure out how you used RunPod for this. It looks like you created a folder to store the weights instead of using one of their custom LLM options. Did you pay for extra storage? I can't imagine you loaded all the weights each time you needed to use this on the cloud. I'm new to working with these models and cloud GPUs, so any help is greatly appreciated!
minute 4:45 comment is confirmation clutch! Never give up!
8:57 nice auth key you got there
It's possible that when you tried to load the PDF with SimpleDirectoryReader it was skipping pages because of the chunk size / embedding model you selected. The model you picked (all-MiniLM-L6-v2) is limited to 384 while the chunk size you specified was 1024. Maybe, just maybe, that's why it was skipping pages: it was unable to fit the whole chunk into the embedding model.
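If that mismatch was the culprit, shrinking the chunks to fit the embedding model is a small change. A minimal sketch, assuming the 2023-era llama-index API used in the video (import paths and defaults may differ in newer releases):

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings import LangchainEmbedding
from langchain.embeddings import HuggingFaceEmbeddings

# Keep chunks comfortably under the embedding model's input limit
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)
service_context = ServiceContext.from_defaults(
    chunk_size=256,  # instead of 1024
    embed_model=embed_model,
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```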
Really cool Llama application. Really impressive.
Do you really have to be granted access by Meta to use the weights? My current interpretation is that you enter the license agreement as soon as you use the weights, wherever you got them (as you're also allowed to redistribute them).
I'm not 100% sure about this, but I think you don't need to register. I think that's more for them to keep track of early adopters.
Great video - thank you
Thanks a mil for checking it out!
Can you please make a video explaining which LLM to use when developing a RAG?! It would be a great help if you could make one, and please also tell us how to run this locally on Linux!! 😁
What are the differences between the Meta-released Llama 2 models, the hf models, and the quantised (GGML) model files found on Hugging Face? Why can't we use the meta/llama-2-70b model?
You could! llama-2-70b is the base model; chat is the model fine-tuned for chat. The GGML model is a quantized model (optimized for running on less powerful machines). The hf suffix indicates that it's been updated to run with the transformers library.
@NicholasRenotte The 70b chat model downloaded from Meta has consolidated.pth files in it. How do I use those files to fine-tune the model on custom datasets?
Thank you brother
Anytime!!
...Sunday morning after a bender hahhaaha bro I love you.
Best time to deploy imho 😅
Hi, just found your channel and I'm enjoying it, but I can't wait till we have real open-source LLMs. Anyway, keep up the good work. Cheers from Sydney!
Thank you. ❤️🍕
Nice video. You seem to have taken the tough route. I didn't have as much trouble :)
LOL, murphy's law!
This is awesome!
Thanks a mil!!
@NicholasRenotte Please attempt the DAG context model next; would love to see that, sort of like a causal inference model.
You make me love machine learning more
My job here is done 🙌🏼
Hey, can it be done on Chainlit with LMQL and Langflow added to it, where the output shows the PDF files as references, plus scores based on whether it retrieves factual data or makes up its own answer?
I wanted to use Llama in a chatbot. Do you know if that would be possible? I'd like your opinion. I'm using the Rasa framework to build the chatbot but I'm not sure how to integrate it.
Sure can! Seen this? forum.rasa.com/t/how-to-import-huggingface-models-to-rasa/50238
Please help me 😓 (in your licence plate TensorFlow video). I get this error when I copy the train command into cmd:
ValueError: mutable default for field sgd is not allowed: use default_factory
Excellent video
How can it be scalable, since this deployment costs around $2 per hour? Thanks.
Didn't show it here but if I were scaling this out, the whole thing wouldn't be running on a GPU. The app would be on a lightweight machine and the LLM running on serverless GPU endpoints.
@NicholasRenotte But you would still need to pay to rent an A100 GPU, which is around $1 to $4 per hour.
Yeah, no real way around that, gotta host somewhere! Especially so if you want to be able to use your own fine-tuned model eventually (coming up soon)!
Does gpt-3.5-turbo (4k or 16k context) remain cheaper at a small production scale?
Nick, thank you so much for the great content. I’m new to AI and want to build an LLM for my startup, but I’m not sure where to start. Can you recommend something?
RunPod A100 instances are looking scarce, any tips on how to adapt for multiple GPU instances?
Going to give it a crack this week; I've got a fine-tuning project coming up. Will let you know. The other option is to use the GGML/4-bit quantized models, which reduces the need for such a beefy instance. Also, check out RunPod Secure Cloud: a little pricier but it seems to have more availability (I ended up using SC when I was recording results for this vid because the community instances were all unavailable). Not sponsored, just in case I'm giving off salesy vibes.
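If anyone wants to try the quantized route, here's a minimal sketch using the ctransformers library, shown with the 7B chat model for brevity (the repo and file names are illustrative; check the actual GGML uploads on Hugging Face for the exact ones):

```python
from ctransformers import AutoModelForCausalLM

# 4-bit GGML weights are a fraction of the float16 footprint,
# so this can run on consumer hardware or a modest cloud instance.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-Chat-GGML",               # example repo
    model_file="llama-2-7b-chat.ggmlv3.q4_0.bin",  # example 4-bit file
    model_type="llama",
    gpu_layers=32,  # offload what fits onto the GPU (CUDA build); rest runs on CPU
)

print(llm("What is retrieval augmented generation?"))
```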
Really great content; you might have the most effective style I've ever seen. Well done. I can't remember which video I saw where you spoke about your hardware setup. It's cloud-based, isn't it?
Thanks a mil! This particular instance is cloud-based, yup! It's all RunPod; I used a remote SSH client to use the env with VS Code. The old HW vid might have been this: th-cam.com/video/GH1RuKguO54/w-d-xo.html
Would you consider a video showing the setup process you use?
Nice sharing, sir; your way of teaching is very helpful for beginners. Please make a video on how we can build a deep learning model on an earthquake dataset, like the project you made on image classification.
You got it!
Waiting sir
Hey, is there some structured way (steps) to learn to work with LLMs? As an analogy, DSA is one structured way to learn to solve coding problems. I'm new to the LLM realm and any advice is much appreciated.
Hi!
What was the performance of the method?
How many tokens per second with that deployment?
All facets of your work are incredible! Are the context limits of Llama 2 similar to those of OpenAI's models?
Thanks a mil! Would depend on which models you're comparing!
Nick, you said that you were able to build your lip reading model in 96 epochs. How long is an epoch in real time?
Thank you for this tutorial, although I'm facing a slight issue parsing tables from PDFs. I managed to allow the parser to take in multiple documents, and it answers quickly. The only issue is that if the question relates to data within a table, or sometimes data spanning multiple lines, it fails to retrieve that data.
TL;DR basic RAG with Llama 70B, nothing more, nothing less - (thanks a lot for the video, really well done)
When someone tells you they made something "as good as" or "better than" ChatGPT, remember that even FB doesn't compare Llama-70b to the current GPT-4 Turbo, but to the previous release.
Where can we find the code you use in the video? Can you please share it?
How can we set up Llama 2 on a local system with memory? Not just one-off questions, but interactive conversation like the online ChatGPT.
Nice video! I think it's impossible to use LLaMA 2 70B on an M1 Mac with 8GB RAM :( Or is there any chance of using it locally without cloud services?
Could give it a crack with the GGML models, haven't tried it yet though tbh!!
Hi, could you provide the RunPod source code for this? I can't find any outside documentation on how you made this possible.
As you've used the RAG method, I'd like to know how it can answer extrapolated questions.
Hi Nicholas, are you planning to make a video on training the OWL-ViT model?
Love your videos! I would love to deploy a model, but the 70B compute is way too much. Do you have any ideas, or do you know any website where I can check compute requirements for the 7B model? Just got my Meta access last week. Thanks again for the video!
Hi, thanks for the video. Which GPU are you using? I want to buy and build a DL machine to play with LLMs.
What platform are you using for the $1.69/hr GPU? Can't find any good GPU cloud providers 🥺
Hey, can you tell me the minimum VRAM, RAM, and disk space required to load the model and run inference?
Hello, this seems like a less expensive approach than using Google Cloud. How much did it cost?
What is the best way to learn deep learning fundamentals via implementation (say, picking a trivial problem like building a movie recommendation system) using PyTorch, as of Aug 26, 2023? Thanks in advance.
What I'm asking is: do I need to know the math at a deep level to get ahead in machine learning, or just how things work in the specific library I'm using? Please answer my question.
Really interesting, but what was your total cost in the end?
Hi! Nice video! Is it possible to use Llama 2 to build an app like AutoGPT or GPT Researcher in a local environment?
Essentially RunPod is a local environment. It's a Linux server in the cloud, but it's no different from a local Linux server.
Yup, what he said ^!
Do you use Linux? I can't run this on my Windows machine; bitsandbytes didn't support Windows for CUDA >= 11.0.
What is the response time for each query? And which GPU did you use for this app?
What are the limitations on monetizing the Llama Banker app? Could you please explain?
You are marvelous! I bow down after witnessing your next-level hacking skills 🧐.
How do I use this with a React frontend?
Could wrap the inference side of the app up into an API with FastAPI, then just call out to it using axios!
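Something like this rough sketch (the run_llm stub is a hypothetical placeholder for the llama-index query engine built in the video):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    prompt: str

def run_llm(prompt: str) -> str:
    # Placeholder: swap in the real query engine,
    # e.g. return str(query_engine.query(prompt))
    return f"echo: {prompt}"

@app.post("/generate")
def generate(query: Query) -> dict:
    return {"answer": run_llm(query.prompt)}

# Run with: uvicorn main:app --host 0.0.0.0 --port 8000
# React side would then call: axios.post("http://<host>:8000/generate", { prompt: "..." })
```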
Gosh, when I use GPT-4 it gives me a response saying it cannot further summarize a personal report, and it just stops there. I think I'll just need to switch to a different model.
You are a god, thank you!
Waiting for an open-source equivalent of ChatGPT's function calling; that will be amazing.
What is your laptop model?
Mac M1 Max (2021)
I have learned machine learning up to an intermediate level. Can I now start deep learning alongside machine learning? Please tell me, sir.
You are a weapon!!!
Hey Nicholas,
It's a little disappointing that you haven't actually released the final model yet, even though you mentioned it in the video. While showing the source code is a good start, it's not the same as actually providing the finished product.
Unfortunately, without the final model itself, it's difficult to take your word for it. To build trust and transparency, it would be much better to provide a download link for the model so people can try it out for themselves. This would be a much more impactful way to share your work and allow others to engage with it.
I hope you'll reconsider and release the final model soon!
Hello Nicholas, I still don't understand the ./model part.
You are brilliant. I've been trying to find a tutorial for a slidebot... could you work on it?
Where's the code for the front-end website?
Can you do a video about analysing trends from websites such as WGSN?
You got it!
It's great; I've been trying to find something like this.
Can we have a tutorial on conditional GANs please? And multi-feature conditional GANs as well 😊
I am getting this error: ValidationError: 1 validation error for HuggingFaceLLM query_wrapper_prompt str type expected (type=type_error.str). I am using the 7b chat Llama 2 model.
Did you resolve that error? I am getting the same error and I'm unable to solve it.
This man never disappoints. Thank you for all that you do!
Anytime, thank YOU for watching!!
I am confused: is Llama 2 an LLM, or did you use the Hugging Face LLM?
LLaMA 2 70b is the LLM; we loaded it here using the Hugging Face library.
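For reference, loading it in int8 with the transformers library looks roughly like this (a sketch against the 2023 API; load_in_8bit needs bitsandbytes installed, and the meta-llama repo is gated, so you need access from both Meta and Hugging Face):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-70b-chat-hf"  # gated repo
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    load_in_8bit=True,   # bitsandbytes int8: ~70GB, fits a single A100-80GB
    device_map="auto",
)

inputs = tokenizer("What does EBITDA stand for?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```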
A deep learning in PyTorch video pleaseee!
How do I get a free GPU on a web server? I don't have a GPU.
Does anyone know if renting a GPU is cheaper than using the OpenAI API? By how much? Thanks Nicholas for your great content!
Well done! Have you considered a video around a formal fine-tune of one of the lesser variants of Llama 2 (e.g. the 7B version)? I'd love to see you do one. 😁
On the cards for this week DK, had a client ask for it. Actually got a super interesting use case in mind!
@NicholasRenotte Awesome! Looking forward to it.
Excellent video, you are amazing! Please update the video "AI Face Body and Hand Pose Detection with Python and Mediapipe"; I can't solve the errors, and it would be very useful for my university projects. Thank you very much.
Will take a look!