Low-Rank Adaptation of Large Language Models Part 2: Simple Fine-tuning with LoRA

  • Published Sep 5, 2024

Comments • 83

  • @gnostikas · 1 year ago · +14

    You seem to be the kind of ai expert that I am trying to become. Very impressive.

  • @LordKelvinThomson · 1 year ago · +8

    At least as good as, and at times better than, every other equivalent tutorial on the subject at this time.

    • @chrisalexiuk · 1 year ago

      Thanks so much for the kind words!

  • @television9233 · 1 year ago · +8

    Very cool!
    Hugging Face has done so much of the heavy lifting for us; they are actually amazing.
    Also, when I first heard about LoRA I thought the implementation was complicated (utilizing some efficient SVD or other numerical method to achieve the decomposition of the full weight update matrix). Turns out it literally just starts with the two smaller matrices and backprop does all the work lol
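That observation can be sketched in a few lines of plain Python (a toy illustration with made-up sizes, not the actual PEFT implementation): the base weight W stays frozen, the update is carried entirely by the two small factors A and B, and with the usual zero initialization the adapter starts as a no-op.

```python
# Toy LoRA forward pass: y = Wx + B(Ax). The base weight W stays frozen;
# only the low-rank factors A (r x d_in) and B (d_out x r) would be trained.
def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x):
    base = matvec(W, x)              # frozen base output
    delta = matvec(B, matvec(A, x))  # low-rank correction learned by backprop
    return [b + d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity, for illustration)
A = [[0.0, 0.0]]               # rank-1 factor, zero-initialized
B = [[0.0], [0.0]]
x = [2.0, 3.0]

# With the factors initialized to zero, the adapter contributes nothing yet:
assert lora_forward(W, A, B, x) == [2.0, 3.0]
```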

    • @chrisalexiuk · 1 year ago · +1

      Backprop coming to the rescue again!

  • @datasciencetoday7127 · 1 year ago · +3

    Mind blown into 3 billion pieces

  • @waynesletcher7470 · 10 months ago · +1

    Oh, Wise Wizard, I bow before your might. Please, continue to guide me.

  • @user-hf3fu2xt2j · 1 year ago · +2

    OK, you got me absolutely amused by the results.
    Also, thanks for showing that there's a LoRA library out there: I tried to do it on my own

  • @Umuragewanjye · 1 year ago · +2

    Hi Chris! Thanks for the course. I want to learn more. May God bless you 🤲

  • @DreamsAPI · 1 year ago · +3

    Subscribed and Thumbs up, appreciate the videos.

  • @afifaniks · 1 year ago

    Very intuitive! I didn't even yawn throughout the whole video lol. Keep up the good work! :)

  • @ryanbthiesant2307 · 1 year ago · +1

    Who, what, where, why and when. I am grateful for your video. Please can you give a use case. And start your videos with the end in mind.

    • @chrisalexiuk · 1 year ago

      Absolutely! I'll seek to do that going forward!

    • @ryanbthiesant2307 · 1 year ago · +1

      @@chrisalexiuk Thanks for not taking offence. I have ASD and ADHD. It's super hard to focus without an idea of what you are making and what problem you are trying to solve. Apologies for the directness.

  • @andriihorbokon2015 · 1 year ago · +2

    Great video! So much passion, love it.

  • @danraviv7393 · 1 year ago · +2

    Thanks for the video, it was very useful and clear

  • @nothing_is_real_0000 · 1 year ago · +6

    Hi Chris! Thank you so much for such a detailed tutorial. Loved every bit of it. In a time of big corporations trying to monopolise the technology, people like you give hope and knowledge to so many others! Really appreciate it. You've made the LoRA tutorial easy to understand.
    Just had a question. I guess you have answered it in some way already, but I just wanted to confirm: GPT-2 is somewhat old, so does this method apply to GPT-2 also? I mean, can we use a GPT-2 model instead of BLOOM?

  • @ENJI84 · 1 year ago · +2

    Amazing set of videos!
    Can you please post an update on the text-to-SQL model that you've mentioned? This is very important to me :)

  • @MasterBrain182 · 1 year ago · +1

    Astonishing content Man 🚀

  • @tech-talks-with-ali · 1 year ago · +1

    WoW! You are amazing man!

  • @sagardesai1253 · 1 year ago · +2

    Informative video.
    Can you suggest some GPU compute resources? The aim is to implement the learnings, so I would like to know the cheapest possible option.

    • @chrisalexiuk · 1 year ago

      Lambda Labs has great prices right now, otherwise Colab Pro is an affordable and flexible option.

  • @user-wr4yl7tx3w · 1 year ago · +3

    what is meant by causal language model? I assume it has nothing to do with the separate field of Causal AI.

    • @chrisalexiuk · 1 year ago · +5

      A causal language model is a model that predicts the next token in the series. It only looks at tokens to the "left" (backward) and cannot see future tokens.
      It's confusing because, as you noted, it has nothing to do with Causal AI.
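That "left-only" rule can be sketched in a few lines of plain Python (the helper name here is made up, purely for illustration):

```python
# Causal attention mask: position i may attend only to positions j <= i,
# so the model never sees "future" tokens when predicting the next one.
def causal_mask(seq_len):
    # mask[i][j] is True when position i is allowed to see position j
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

mask = causal_mask(4)
assert mask[0] == [True, False, False, False]  # first token sees only itself
assert mask[3] == [True, True, True, True]     # last token sees the whole prefix
```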

  • @PCPTMCROSBY · 1 year ago · +1

    Trying to get some people interested in product development and modification, but they have a requirement that material can't leave the building. That means no internet: everything has to be done on our machines in-house, and we can't share it with Colab or anybody. It would be nice if you did more shows related to keeping complete control of material, because there are so many people that are just scared to death of breaches.

  • @yasinyaqoobi · 8 months ago · +1

    Attempting to run the notebook, but I keep getting "ValueError: Attempting to unscale FP16 gradients." Tried different Colab envs but no luck.

  • @omercelebi2012 · 1 year ago · +2

    Thanks for sharing this tutorial. I get "IndexError: list index out of range" when reading from the hub. I just copied and pasted the code; it happens at the 6th progress bar. Any solution? Model: bloom-1b7

    • @chrisalexiuk · 1 year ago

      Could you share with me your notebook so I can determine what the issue is?

  • @yasinyaqoobi · 8 months ago

    Great video. I wish you had shown the comparison against the base model. Just to clarify: we are not able to use the LoRA adapter generated from model A with a different base model?

  • @mchaney2003 · 1 year ago · +1

    What are the ways you mentioned to more efficiently teach a model new knowledge rather than new structures?

    • @chrisalexiuk · 1 year ago

      You'd be looking at something like continued pre-training. I perhaps misspoke by saying "more efficient", I meant to convey that LoRA might not be the best solution for domain-shifting a model - and so there are more *effective* ways to domain-shift.

  • @datasciencetoday7127 · 1 year ago · +2

    Hi Chris, can you make a video on this or give me some pointers?
    Scaling with LangChain: how to have multiple sessions with an LLM, meaning how to have a server with the LLM and serve multiple people concurrently. What would the system requirements be to run such a setup? I believe we will need Kubernetes for the scaling.

    • @chrisalexiuk · 1 year ago

      You'll definitely need some kind of load balancing/resource balancing. I'll go over some more granular tips/tricks in a video!

  • @vita24733 · 1 year ago · +2

    Hi Chris, about the block of code with model.gradient_checkpointing_enable(), which increases the stability of the model: have you made any previous videos where I can read and learn about this? If not, are there any resources you would recommend to understand it?

    • @chrisalexiuk · 1 year ago · +1

      Basically, you can think of it this way:
      As we need to represent tinier and tinier numbers, we need more and more exponent range. There are a number of layers that tend toward very tiny values, and if we let those layers stay in 4-bit/8-bit it might have some unintended side effects. So we let those layers stay in full precision so as to not encounter those nasty side effects!
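The underflow problem behind this can be demonstrated with the standard library alone: Python's struct 'e' format round-trips a value through IEEE 754 half precision (fp16), where very small numbers collapse to zero.

```python
import struct

def to_fp16(x):
    # round-trip a Python float through IEEE 754 half precision (fp16)
    return struct.unpack('<e', struct.pack('<e', x))[0]

tiny = 1e-8                  # smaller than fp16's smallest subnormal (~6e-8)
assert to_fp16(tiny) == 0.0  # underflows to zero in half precision
assert tiny != 0.0           # survives just fine in full precision
```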

    • @vita24733 · 1 year ago

      @@chrisalexiuk Ohhh, OK, understood. This was by far the clearest explanation about this. Thank you!

  • @user-su1lr7by7w · 1 year ago · +2

    Amazing work. Can you put up something similar for fine-tuning the MPT-7B model?
    I switched the model to MPT-7B, but I keep getting this error during training: "TypeError: forward() got an unexpected keyword argument 'inputs_embeds'". I am scratching my head but can't seem to figure out what went wrong.

    • @chrisalexiuk · 1 year ago

      Sure, I can try and do this!

  • @user-hf3fu2xt2j · 1 year ago · +1

    Tried this, and it's interesting that the 3b/7b1 BLOOM models perform WORSE on my test questions after this training than bloom-1b1.

    • @chrisalexiuk · 1 year ago

      Hmmmm. That's very interesting!

    • @chrisalexiuk · 1 year ago

      I wonder specifically why, it would be interesting to know!

    • @user-hf3fu2xt2j · 1 year ago

      @@chrisalexiuk I didn't change the other parameters, though. Maybe rank and batch size should be higher for higher-param-count models.

    • @user-hf3fu2xt2j · 1 year ago

      @@chrisalexiuk Man, it gets weirder now. I tried doing more steps with a smaller learning rate and a smaller batch size on a bigger model. It started adding explanation sections and generating, well, explanations.
      bloom-3b

    • @gagangayari5981 · 1 year ago

      @@user-hf3fu2xt2j What was the learning rate you were using? Is it the same as mentioned in the BLOOM paper? Also, what is the current learning rate?

  • @Robo-fg3pq · 8 months ago · +2

    Getting "ValueError: Attempting to unscale FP16 gradients." when running the cell with trainer.train(). Any idea?

    • @shashankjainm5009 · 4 months ago · +1

      I'm getting the same error for "bloom-1b7". Did your problem get resolved?

    • @Jithendra0001 · 2 months ago · +1

      @@shashankjainm5009 I am getting the same error. Did you fix it?

  • @honglu679 · 3 months ago · +1

    Thanks for the great video! So what is the better way to teach a model new knowledge, if fine-tuning is somehow only good for structure? Thanks much!

    • @chrisalexiuk · 17 days ago

      Continued Pre-Training or Domain Adaptive Pre-Training!

  • @sarabolouki · 11 months ago · +1

    Thank you for the great tutorial! How do we set that we only want to fine-tune query_key_value and the rest of the weights are frozen?

    • @chrisalexiuk · 11 months ago

      By using the adapter method, you don't need to worry about that! The base model will remain frozen - and you will not train any model layers.
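For completeness, this is roughly what that looks like in code (a sketch assuming the peft library used in the video; the rank and dropout values here are illustrative, not prescriptive):

```python
from peft import LoraConfig, get_peft_model

# Only modules named in target_modules receive LoRA adapters; every other
# weight in the base model is left frozen automatically.
config = LoraConfig(
    r=8,                                 # illustrative rank
    lora_alpha=16,
    target_modules=["query_key_value"],  # BLOOM's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
# model = get_peft_model(base_model, config)
# model.print_trainable_parameters()  # reports how few weights actually train
```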

  • @98f5 · 10 months ago · +2

    Any chance you can make an example of fine-tuning Code Llama like this?

    • @chrisalexiuk · 10 months ago · +1

      I might, yes!

    • @98f5 · 10 months ago

      @chrisalexiuk It'd be greatly appreciated. There are almost no implementation docs or examples around for using LoRA 😀

  • @shaw5698 · 1 year ago · +3

    Sir, is it possible to share the Colab notebook? For extractive QA, how will we evaluate and compare with other models (like EM and F1)? How will we implement those and compare with other BERT or LLM models?

    • @chrisalexiuk · 1 year ago · +1

      Yes, sorry, I will be sure to update the description with the Notebook used in the video!

    • @shaw5698 · 1 year ago

      @@chrisalexiuk Thank you, it will be very much appreciated.

    • @prospersteph · 1 year ago

      @@chrisalexiuk we will appreciate it

    • @chrisalexiuk · 1 year ago · +1

      colab.research.google.com/drive/1GzHdbIarvnRee_Ix9bdhx1a1v0_G_eqo?usp=sharing

  • @akashdeepsoni · 1 year ago

    Thanks for explaining the implementation in such an easy way.
    I wanted to play around with this, so I used free-tier Google Colab with a T4 GPU and the smaller "bigscience/bloom-1b7" model. The inference method make_inference(context, question) is giving me the error below. Is this because of using the free-tier GPU? Training and all the previous steps executed without any issues. Would be great if you could shed some light on this!
    Error:
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

  • @maxlgemeinderat9202 · 10 months ago · +1

    Great video! What would be different if I downloaded the model locally rather than on Colab? Which lines in the code would change?

    • @chrisalexiuk · 10 months ago

      You should be able to largely recreate this process locally - but you would need to `pip install` a few more dependencies. You can find which by looking at what the colab environment has installed - or using a tool like pipreqs!

  • @Neuralbench · 1 year ago · +1

    Hey Chris, awesome video! Thank you for it. Can you please help me out here? I am using your notebook, but when I do model.push_to_hub, adapter_config.json and adapter_model.bin are not being uploaded to Hugging Face; instead I only see
    1. generation_config.json
    2. pytorch_model.bin
    3. config.json
    What am I doing wrong here?

    • @Neuralbench · 1 year ago · +1

      I figured out the problem: it was this line after the training:
      model = model.merge_and_unload()

    • @chrisalexiuk · 1 year ago

      Yes! Sorry, Adil!
      We are only pushing the actual *LoRA* weights to the hub - and merging the model back will mean that the entire model will be pushed to hub.
      Great troubleshooting!

  • @alexandria6097 · 9 months ago

    Do you know how much GPU RAM the meta-llama/Llama-2-70b-chat model would take to fine-tune?

  • @kartikpodugu · 11 months ago · +1

    Amazing.
    I tried this on my desktop, which has an NVIDIA GeForce 3060, and I was able to run only 6 steps.
    On Windows I wasn't able to run it at all, as I am facing some issues with the bitsandbytes library.
    Also, I used bloom-1b7.
    But after doing all the exercises, I see that the generated output doesn't stop after CONTEXT, QUESTION, and ANSWER; it keeps generating text, which includes EXAMPLE and so on.
    Though the notebook adds bitsandbytes at the start using "import bitsandbytes as bnb", bnb is not used anywhere.
    So I thought commenting that line out would make my script work on Windows, but no: even without that line, the script I wrote mimicking your Colab notebook didn't work on Windows.
    Can you tell me how the notebook depends on bitsandbytes?

    • @chrisalexiuk · 11 months ago

      Bitsandbytes is leveraged behind the scenes through the HuggingFace library.

  • @davidromero1373 · 10 months ago

    Hi, a question: can we use LoRA just to reduce the size of a model and run inference, or do we always have to train it?

    • @chrisalexiuk · 10 months ago

      LoRA will not reduce the size of the model during inference. It actually adds a very small amount extra; the memory savings during training come from the reduced number of optimizer states.
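This point can be checked with back-of-envelope arithmetic (the layer sizes below are hypothetical): the trainable parameter count, and hence the optimizer state, shrinks by orders of magnitude, while inference memory actually grows slightly.

```python
d_in = d_out = 4096                  # one hypothetical attention weight matrix
r = 8                                # LoRA rank

full_trainable = d_in * d_out        # params the optimizer tracks in full fine-tuning
lora_trainable = r * (d_in + d_out)  # only A (r x d_in) and B (d_out x r) train

assert lora_trainable * 100 < full_trainable         # >100x fewer optimizer states
assert d_in * d_out + lora_trainable > d_in * d_out  # inference loads W *plus* A and B
```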

  • @chrism315 · 1 year ago · +1

    The notebook linked doesn't match the one used in the video. Is the notebook in the video available somewhere?
    Thanks, great video!

    • @chrisalexiuk · 1 year ago

      Ah, so sorry! I'll resolve this ASAP.

    • @chrisalexiuk · 1 year ago

      I've updated the link - please let me know if it doesn't resolve your issue!
      Sorry about that!

  • @ilya6889 · 1 year ago

    Please don't scream 😬

  • @ArunKumar-bp5lo · 11 months ago

    Facing KeyError: 'h.0.input_layernorm.bias' when downloading from the hub.

    • @chrisalexiuk · 11 months ago

      Hmmm.
      Are you using the base notebook?

    • @ArunKumar-bp5lo · 11 months ago

      @@chrisalexiuk Yeah, I just changed the model to 1b7.

    • @chrisalexiuk · 11 months ago

      Could you try adding `device_map="auto"` to your `.from_pretrained()` method?
      Also, are you using a GPU enabled instance for the Notebook?