Great to see another open-source release. I am very curious about all the things you mention with LangChain. Many thanks.
Amazing video, this model looks interesting, thanks for sharing
Considering Stability AI has the resources, I hope they eventually develop an open foundational model to replace LLaMA.
rwkv
@@woongda yes
@@woongda RWKV is not by Stability, and it is a whole different architecture also (nothing wrong with that). I mean, Stability AI spent like $300 to train Vicuna which isn't bad but they spent like $600k to train the first few versions of Stable Diffusion.
What I am saying, bro, is that this is their business model. So far, their models and the datasets used to train them are subpar. StableLM is trained on less than 60B tokens of GitHub data, when there are tons of papers highlighting that learning representations from code leads to higher levels of generalized reasoning. It's like basic shit to me. They have too much money to be releasing subpar products.😮
Very excited to see your colab/video on running this locally with LangChain.
Working on this, but unfortunately it only works 30% of the time with this model.
@@samwitteveenai Thanks for the reply! Can LangChain adapt for the prompts for vicuna? Have you tried other models with LangChain like OpenAssistant-SFT-7-Llama-30B or Wizard Vicuna 13b? What local models have you found work best or worst so far?
@@kenfink9997 To be honest for work we just trained our own. I did try adapting the prompts for a few of these and I do think that should be possible, but I haven't had great success yet doing it that way. I may just release a model aimed at this.
Excellent as always, running out of superlatives (note to self: query GPT 😊) THANKS
Hey Sam, Awesome video as usual. Can you make a video on fine-tuning as well? Hope that will be helpful for the community.
Hi Sam, thanks for your content! You are amazing. I'm really excited to see your next video about LangChain, ReAct and StableVicuna. I've been struggling to move away from the OpenAI LLM to run LangChain tools in a decent way without spending a sh*t ton of money 😂
Could you please make a video on how to do the 4-bit conversion of a model like this, thank you!
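For anyone who wants to experiment before a video appears: one common route today is not converting the weights offline at all, but loading the HF checkpoint with 4-bit quantization via bitsandbytes. A rough sketch, assuming a recent transformers/bitsandbytes/accelerate install (the config values are just typical defaults, not anything from the video):

import torch
from transformers import LlamaForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization; cuts the 13B weights down to roughly 7-8GB of VRAM
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = LlamaForCausalLM.from_pretrained(
    "TheBloke/stable-vicuna-13B-HF",
    quantization_config=bnb_config,
    device_map="auto",
)

The GPTQ/GGML files people mention elsewhere in the thread are the other option if you want quantized weights saved to disk.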
I was trying to run the COLAB notebook but it crashes due to insufficient memory errors. Is it ONLY possible to run this on the PRO paid version of COLAB? How could I run this on a free version of COLAB or perhaps even locally on a PC in a Jupyter notebook? Perhaps using 4-bit version of the LLM as you mention?
yeah unfortunately you need a GPU with a lot of VRAM to run this so Colab free isn't going to work.
If I want to further fine tune these RLHF fine-tuned models on domain specific data, is preparing instruction-response dataset the only way? I have domain specific data, but it is just corpus.
It is probably the best way currently. You can do things like more pre-training on a particular corpus, just doing next-token prediction, and then do SFT on instruction/response data. I.e. for a finance model you can do more pretraining on that kind of vocab and then do the instruction/response tuning.
@@samwitteveenai Thank you Sam. Appreciate your comment. I don't have the means to collect instruction/response data atm, and my single 24GB GPU is likely not able to pre-train the whole model (just LoRA should be OK). Maybe I will try some automated way to generate QA data from the corpus first.
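Since LoRA on a single 24GB card came up: here is a very rough sketch of the "more pre-training on your own corpus" step described above, using peft with an 8-bit base model and plain next-token prediction. The file name, LoRA ranks and training arguments are placeholders, not recommendations:

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training  # older peft versions call this prepare_model_for_int8_training
from transformers import (LlamaForCausalLM, LlamaTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

tokenizer = LlamaTokenizer.from_pretrained("TheBloke/stable-vicuna-13B-HF")
tokenizer.pad_token = tokenizer.eos_token

# load the base weights in 8-bit so the 13B model fits on a 24GB card
base_model = LlamaForCausalLM.from_pretrained(
    "TheBloke/stable-vicuna-13B-HF", load_in_8bit=True, device_map="auto")
base_model = prepare_model_for_kbit_training(base_model)

# train only small LoRA adapters on top of the frozen 8-bit weights
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base_model, lora_config)

# plain next-token prediction over a raw domain corpus (file name is a placeholder)
data = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    train_dataset=data["train"],
    args=TrainingArguments(output_dir="lora-domain-pretrain",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=1, fp16=True, logging_steps=10),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

The SFT pass on instruction/response pairs afterwards looks much the same, just with the examples formatted into the prompt template instead of raw text.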
Another nice video, thanks Sam. I've been getting into LangChain a lot recently as a decent abstraction over the different LLMs, but in terms of videos, have you considered looking over the alternatives to LangChain? I've seen about half a dozen similar ones, like Chameleon and that C# one whose name escapes me right now. I've tried a bunch of the different models, but honestly the whole licensing thing for LLaMA/Alpaca/Vicuna and the like really annoys me. Because I can't build anything commercially, I have to mess around finding patches to models etc. I just wish someone would release a properly open-source base model that everyone can customize, and be done with the LLaMA licensing issues.
The truly open-source stuff is coming in regards to models. As for other frameworks, there is at least one I am looking at doing some vids for.
@@samwitteveenai Waiting for the RedPajama model to be out!
Agree. I look at all of these as trial runs for once that is available etc.
You say you need an A100 to run this. I am wondering if a 24GB Nvidia RTX 4090 is big enough for this 13B model with 8-bit quantization. Possible? How much A100 memory is it using?
I didn't check the usage, but I know others have run some of the similar models on a 3090/4090 with 24GB etc. so I think it should work.
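For reference, the usual way to squeeze the 13B model onto a 24GB card is 8-bit loading, which needs roughly 13-14GB for the weights. A minimal sketch, assuming bitsandbytes and accelerate are installed:

from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("TheBloke/stable-vicuna-13B-HF")
model = LlamaForCausalLM.from_pretrained(
    "TheBloke/stable-vicuna-13B-HF",
    load_in_8bit=True,   # ~1 byte per weight, so 13B parameters fit in roughly 13-14GB of VRAM
    device_map="auto",   # needs accelerate; will offload to CPU RAM if VRAM runs out
)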
Would it be easier to first train a model that predicts RLHF preferences, and then use that to "self-train" one of these LLMs? So you'd build a kind of generative adversarial network that way.
Yeah this is what Constitutional AI does. I made a couple videos about that and go through RLAIF in that.
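For anyone curious how the preference-predicting model mentioned above is usually set up: reward models in RLHF are typically trained with a pairwise loss so that human-preferred answers score higher. A toy sketch below; the gpt2 backbone and the example strings are placeholders, not what any of these projects actually used:

import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# a small backbone with a single scalar "reward" head (gpt2 is just a stand-in)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
reward_model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
reward_model.config.pad_token_id = tokenizer.pad_token_id

def preference_loss(chosen_texts, rejected_texts):
    # human labellers preferred `chosen` over `rejected`; train the model to score it higher
    chosen = tokenizer(chosen_texts, return_tensors="pt", padding=True, truncation=True)
    rejected = tokenizer(rejected_texts, return_tensors="pt", padding=True, truncation=True)
    r_chosen = reward_model(**chosen).logits.squeeze(-1)
    r_rejected = reward_model(**rejected).logits.squeeze(-1)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

loss = preference_loss(["Q: ...\nA: a helpful answer"], ["Q: ...\nA: an unhelpful answer"])
loss.backward()

Once trained, that scalar score is what the PPO step in RLHF (or a "self-training" loop like the one described above) optimises the LLM against.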
I opened the Colab and ran all, then it failed with "CUDA error: invalid device function". Is that caused by my free Colab account?
yes this won't work with the free Colab unfortunately
Hi Sam. Great video as always! I have learned a lot from you. One quick question: I don't feel the temperature parameter has any effect in this case, since sampling is not enabled and the model just does greedy search during inference (i.e. by default do_sample=False)?
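You're right that greedy decoding ignores it; temperature (and top_p) only matter once sampling is switched on. A quick sketch using the base_model/tokenizer objects from the notebook (the prompt text is made up; the ### Human / ### Assistant format follows the StableVicuna model card):

prompt = "### Human: Write a short poem about llamas.### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)

output = base_model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,    # without this, generate() does greedy search and temperature is ignored
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))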
This model can follow complex instructions the best so far, on par with the turbo 3.5 API. I think this is the new open-source king for us peasants lol
Pretty sure by now all teams working on camelids and other LLMs are making sure to train their models to answer your standard questions :D
lol this is something that did occur to me as I recorded this.
I'm really liking these datasets, if not the models, but those just keep getting better and better! Awesome work once again Sam, and thank you for what you do!🦾🤖😎
I like to play with AI by setting up a competition between two of them, and they can really write a book together, correcting each other's errors, for example Microsoft's cognitive AI (ChatGPT-4) with Open Assistant from Hugging Face, each reviewing and consulting with the other (both have competing pros and cons). For now it's only possible with manual copy-paste, but someday it will happen automatically in one window.
In about 30 minutes they put together a great business project with careful document creation (correcting each other) and even a full memo on how best to pitch it to investors; the presentation part was edited quite thoroughly, like 5 times, down to the most distilled, simplest form, with advice for the presenter.
It will be an interesting world soon.
Use this model (or better, another one) to download data from free sources, refactor it, and retrain; that avoids the license issue, yes?
Great quality as always! I notice you have done a lot of videos using an A100 in Colab Pro+. Have you ever faced running out of compute units? If that happens, what GPU does Google give us for the rest of the month?
Good question. I think it goes back to just CPU, but I am not sure these days. I tend to make the vids in Colab Pro, but I often use a custom Colab with a different backend.
How does this compare to the GPT4All x Alpaca model? That seems to be the best one I've come across so far.
Not sure, I haven't tried that against this.
Hi Sam, thank you for your videos. I have 12 GB of RAM; how can I train/fine-tune LLM models? Or how much video memory should I buy to fine-tune them?
You should be able to run some of the 3B models. Generally for the bigger models you will need at least a consumer card with 24GB of RAM.
@@samwitteveenai Thank you for your answer. I am trying to fine-tune vicuna on 24GB ram, but got this error: Cannot copy out of meta tensor; no data!
You are my favorite llama 🎉
Tbh their LLM releases have been subpar compared to what's already out. The dataset could have been better. I mean, their whole business model is around open source, so where is the quality? The fact that I have a better dataset is comical to me. Meta has honestly been the 🐐 lately in the space.
Great content as usual.
It's hard to find a good commercial-use model.
Can you run it on a 24GB card?
Yes I think so, though I was using a 40GB card in the video.
Don't know why they call this open source; none of the LLaMA models are really open source.
Agree. At best they are Open Access.
How does the RLHF piece work?
Reinforcement Learning from Human Feedback.
When I get some time I will make a proper video going through how it works. If you look at my video on constitutional AI it has a bit about it in there.
Is it continuously learning on human feedback? Or was that during the training period only?
Hi Sam, great content as usual! I am a doctor from Australia working on a SaaS product for GP's to use and would love some guidance regarding AI integration. Do you offer consulting services and if so, can you be reached by email? Thank you and kind regards, Dr Gabriel
Best to just reach out to me on LinkedIn.
You need a $2000 graphics card.
Vicuñas eat jalapeños
I just don't get how any of these models are useful. They are really poor in comparison to the OpenAI models. I would like to hear your opinions on that. I don't see any more huge advances coming out of these sorts of models. Also, who wants a model that is worse than OpenAI's but still has all the same filters and restrictions? I just don't get it.
Lots of people want models that don't require them to share their data with OpenAI. I agree that these still have a long way to go compared to OpenAI for open-domain chat. Most business cases don't need open-domain chat; they need something that can be fine-tuned to be really good within a limited closed domain. I do understand that people want to have non-restrictive models. This is something I have actually spent most of today testing and working on.
So, Sam, would you say that these latest open LLMs, Vicuna, etc., could be used as a base for fine-tuning in specific domains? Is that what private businesses are looking for?
OpenAssistant is not a bad model if you are looking to chat with a bot.
These, and more often the 30B models.
In addition to what others said: using OpenAI is a privacy nightmare for anything that needs to be kept secure. Customer records, proprietary code, business secrets and planning.
Bummer that the colab isn't working anymore :(
It spits out an error when you get to the tokenizer = LlamaTokenizer.from_pretrained("TheBloke/stable-vicuna-13B-HF") line.
Hey Sam, can you change your Colab to load the model with these settings? It's running 3x faster, with better accuracy and only 26GB of memory usage:
import torch
from transformers import LlamaForCausalLM

base_model = LlamaForCausalLM.from_pretrained(
    "TheBloke/stable-vicuna-13B-HF",
    torch_dtype=torch.float16,   # half precision instead of the fp32 default
    low_cpu_mem_usage=True,      # stream weights in without a full copy in CPU RAM
    device_map='auto',           # requires accelerate; places layers on the GPU automatically
)
will check it out
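If it helps anyone trying the snippet above: once the model is loaded like that, it can be queried with a simple pipeline as well. A minimal sketch (the tokenizer line and the ### Human / ### Assistant prompt format follow the StableVicuna model card; the question is made up):

from transformers import LlamaTokenizer, pipeline

tokenizer = LlamaTokenizer.from_pretrained("TheBloke/stable-vicuna-13B-HF")
pipe = pipeline("text-generation", model=base_model, tokenizer=tokenizer)

result = pipe("### Human: What is StableVicuna good at?### Assistant:", max_new_tokens=256)
print(result[0]["generated_text"])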
NameError: name 'init_empty_weights' is not defined
Your code throws this error, can you help me out here?
Make sure you have a GPU that can run the model.
@@samwitteveenai Yes, I am using the Colab GPU and my local is a 3040.
@@taimoorneutron2940 Are you using Colab Pro+? You will need an A100 to run it (check with !nvidia-smi -L).
@@samwitteveenai I have Pro+. I will show you updates, let me update you.
Based on some initial testing, this model (TheBloke/stable-vicuna-13B-GGML, 4_2 bit quantized version with llama.cpp) is way more incoherent than eachadea/ggml-vicuna-13b-1.1. Also I notice it is broken by all the censorship bs and keeps babbling around about content policies and restrictions. I'd recommend using the uncensored version of Vicuna instead.
thanks
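For anyone wanting to try those GGML builds locally, a minimal sketch with llama-cpp-python (the file name, context size and prompt are placeholders, and quant formats like q4_2 come and go between llama.cpp versions):

from llama_cpp import Llama

# path to whichever quantized GGML file you downloaded (placeholder name)
llm = Llama(model_path="./stable-vicuna-13B.ggml.q4_2.bin", n_ctx=2048)

out = llm(
    "### Human: Give me three facts about vicunas.### Assistant:",
    max_tokens=200,
    stop=["### Human:"],   # stop before the model starts a new turn
)
print(out["choices"][0]["text"])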