Low-Rank Adaptation - LoRA explained
- Published 26 Jun 2024
- RELATED LINKS
Paper Title: LoRA: Low-Rank Adaptation of Large Language Models
LoRA Paper: arxiv.org/abs/2106.09685
QLoRA Paper: arxiv.org/abs/2305.14314
LoRA official code: github.com/microsoft/LoRA
Parameter-Efficient Fine-Tuning (PEFT) Adapters paper: arxiv.org/abs/1902.00751
Parameter-Efficient Fine-Tuning (PEFT) library: github.com/huggingface/peft
HuggingFace LoRA training: huggingface.co/docs/diffusers...
HuggingFace LoRA notes: huggingface.co/docs/peft/conc...
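Alongside the libraries linked above, the core LoRA update itself is easy to sketch. A minimal NumPy illustration (all sizes and values here are made up for illustration): the frozen pretrained weight W is augmented with a low-rank product B·A scaled by alpha/r, and B starts at zero so the adapted model initially matches the pretrained one.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 8, 2, 16          # hidden size, LoRA rank, scaling (illustrative values)
W = rng.normal(size=(d, d))     # frozen pretrained weight
A = rng.normal(size=(r, d))     # trainable, Gaussian-initialised (as in the paper)
B = np.zeros((d, r))            # trainable, zero-initialised so B @ A = 0 at start

x = rng.normal(size=(d,))

h_base = W @ x                              # original forward pass
h_lora = W @ x + (alpha / r) * (B @ A @ x)  # LoRA forward pass

# With B = 0, the adapted model reproduces the pretrained output exactly.
assert np.allclose(h_base, h_lora)
```

Only A and B (2·d·r values) are trained, while the d² values in W stay frozen; at inference, B·A can be merged back into W so there is no extra latency.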
⌚️ ⌚️ ⌚️ TIMESTAMPS ⌚️ ⌚️ ⌚️
0:00 - Intro
0:58 - Adapters
1:48 - Twitter ( / ai_bites )
2:13 - What is LoRA
3:17 - Rank Decomposition
4:28 - Motivation Paper
5:02 - LoRA Training
6:53 - LoRA Inference
8:24 - LoRA in Transformers
9:20 - Choosing the rank
9:50 - Implementations
MY KEY LINKS
YouTube: / @aibites
Twitter: / ai_bites
Patreon: / ai_bites
Github: github.com/ai-bites
This is better explained than what the inventor of LoRA himself explained in his video.
Underrated channel, keep making videos and it'll eventually blow up.
Sure. Thanks for the encouraging words 👍
Amazing video
Glad you think so! 😊
Thanks for the video!
I loved that you added some libraries we can use for this.
Do you want me to do more hands-on videos? Or should I continue with theory and papers? Your inputs will be quite valuable :)
@AIBites Hands-on videos would be great too.
wow u r great 😄
Thank you! I am chuffed :)
Good job on the clear explanation of the method and the simplification. At 3:40, when you showed the matrix decomposition, the result on the left side does not match the result on the right side. Is this a mistake in the video editing, or is there a point to it? [2 20 30]ᵀ × [1 2 3] should be [[2 4 6], [20 40 60], [30 60 90]].
Ah yeah, great spot! I got that wrong while editing. Sorry... 🙂
@AIBites Yup, the matrix should be [1/2/3] * [2 20 1].
Thanks again :)
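For anyone checking the 3:40 example themselves: the outer product of two vectors is always a rank-1 matrix, which is exactly the idea behind the rank decomposition in the video. A quick NumPy check, using the vectors from the comment thread above:

```python
import numpy as np

u = np.array([2, 20, 30])   # column vector
v = np.array([1, 2, 3])     # row vector

M = np.outer(u, v)          # 3x3 matrix built from two length-3 vectors
print(M)                    # [[2 4 6], [20 40 60], [30 60 90]]

# An outer product of two nonzero vectors always has rank 1:
print(np.linalg.matrix_rank(M))  # 1
```

So a 3×3 matrix (9 numbers) is stored using only 3 + 3 = 6 numbers; this saving is what LoRA exploits at much larger scale.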
Very well explained! If ΔW's dimensions are 10 × 10, then A and B's dimensions are 10×2 and 2×10 respectively. So instead of training 100 params we only train 40 params (10×2 + 2×10). Am I correct?
Yup, you got it right. And based on the compute available, we can adjust the rank, going as low as, say, 2.
@@AIBites Thanks for the confirmation.
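The parameter arithmetic in the comment above generalises: for a d×d weight and rank r, full fine-tuning updates d² parameters, while LoRA trains only d·r + r·d. A quick check with the numbers from the comment (d = 10, r = 2; the helper name is made up for illustration):

```python
def lora_param_counts(d: int, r: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for a d x d weight."""
    full = d * d          # every entry of delta-W
    lora = d * r + r * d  # A is r x d, B is d x r
    return full, lora

print(lora_param_counts(10, 2))  # (100, 40), as in the comment
```

The saving grows with d: for d = 4096 and r = 8, that is ~16.8M full parameters versus only 65,536 trained by LoRA.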