Your explanations are easy to understand and in-depth at the same time. Thank you for making my life easier.
I like methods that are simple yet extremely effective.
In fine-tuning an LLM, we have 2 options.
1) Change the parameters of the actual base model. But this requires a lot of resources and time.
2) Add new layers and change the architecture of the model. During fine-tuning, only the weights of these additional layers change and the base model remains frozen. At inference we use both the base model and the additional layers.
LoRA helps us shrink these additional layers by using low-rank matrices.
This is my understanding. Please react to it so I can verify my knowledge! 😊
This is a good overview 👍
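To put a rough number on the "low-rank matrices" point in the comment above, here is a minimal sketch that counts trainable parameters for one weight matrix. The sizes (4096 x 4096, rank 8) and the function names are illustrative assumptions, not values from the video.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA pair: B is d_out x rank, A is rank x d_in."""
    return d_out * rank + rank * d_in


# Illustrative sizes only (assumed, not taken from the video).
d_out, d_in, rank = 4096, 4096, 8

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(f"full fine-tuning: {full} trainable parameters")
print(f"LoRA (rank {rank}): {lora} trainable parameters")
print(f"reduction factor: {full / lora:.0f}x")
```

With these assumed sizes the LoRA pair trains 65,536 parameters instead of 16,777,216 for that one matrix.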
Awesome explanation! I have a few questions though:
1) At 24:00, you said we can do some matrix multiplication and addition to update the value of Wq, so that the fine-tuned information gets infused into Wq, which in turn gives us faster inference. But won't that hurt performance compared to the case where we don't update Wq and keep A and B separate? Are we just trading performance for inference speed?
2) What if we do the same "update Wq" step with additive adapters? Would that also speed up their inference?
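One way to check question 1 yourself is a small numerical experiment. The sketch below assumes the LoRA update is the plain product B·A added to Wq (the usual alpha/r scaling factor is left out for brevity), and uses tiny random matrices in pure Python. All names and sizes here are hypothetical. It compares the output of the unmerged path, Wq·x + B·(A·x), with the merged path, (Wq + B·A)·x.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matadd(X, Y):
    """Add two matrices of the same shape element by element."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, r = 6, 2                      # assumed toy sizes: model width 6, LoRA rank 2

W_q = rand_matrix(d, d, rng)     # frozen base weight
B = rand_matrix(d, r, rng)       # LoRA factor, d x r
A = rand_matrix(r, d, rng)       # LoRA factor, r x d
x = rand_matrix(d, 1, rng)       # one input column vector

# Unmerged: keep A and B as a separate branch at inference time.
y_unmerged = matadd(matmul(W_q, x), matmul(B, matmul(A, x)))

# Merged: fold the update into the weight once, then do a single matmul.
W_merged = matadd(W_q, matmul(B, A))
y_merged = matmul(W_merged, x)

max_diff = max(abs(u[0] - m[0]) for u, m in zip(y_unmerged, y_merged))
print(f"largest difference between the two outputs: {max_diff:.2e}")
```

Under these assumptions the two outputs differ only by floating-point rounding, so merging changes the cost of inference rather than the result. The trick depends on the LoRA branch being purely linear. An adapter that applies a nonlinearity between its layers generally cannot be folded into a single weight matrix this way.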
LoRAs are the biggest thing to come out of AI since the transformer
Cursor with Claude 3.5 or o1-mini is great. Use their shortcuts to save time. It still struggles with new languages and frameworks though.
When did you explain the benefits of LoRA over adapters?
I seem to have missed it.
Custom GPTs or Gemini Gems are pretty spot on after you get good at making them. I would play around with these before building an AI agent with LangChain and vector embeddings.
Appreciate it!
Amazing, thank you. Can you do one on latent diffusion?
Back again ❤❤❤
The quizzes aren't well connected to the content. Heck, if you could add a timestamp after each quiz, like "if you got this wrong, check out this timestamp", that would be helpful.