Llama 3 Fine Tuning for Dummies (with 16k, 32k,... Context)

  • Published on Dec 10, 2024

Comments • 48

  • @drnicwilliams • 7 months ago • +8

    LOL “We don’t need this code, so let’s put it in a text cell”

  • @triynizzles • 5 months ago • +3

    At 10:32 the Unsloth comment says "We only need to update 1 to 10% of all parameters" - what does that mean? I recently created my own training data with 1015 questions and answers, but when I run the trainer for 1 epoch, it only does 127 steps. Shouldn't it do more?

    • @nodematic • 5 months ago

      There are "base model" parameters, and then "adapter layer" parameters that are added at the end of the base model, when doing this LoRA fine-tuning. The comment is highlighting that we are only working with the adapter layers at the end when doing this fine tuning - which is around 1-10% of all the parameters. This is normal. You could do full-parameter fine-tuning (which updates the base model parameters), but that's not worth the high computational demands and complexity for most use cases.
      Each one of your steps is doing a batch when fine-tuning. The effective batch size is: per_device_train_batch_size * gradient_accumulation_steps * number_of_devices. For the demonstrated setup, the effective batch size is 8, meaning 127 steps covers up to 127*8=1016 of your Q&A examples. So, you're using all your Q&A examples, and doing a full pass over your training data, in the 127 steps. You could bump the epochs value if you want to do multiple passes.
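      For illustration, a minimal sketch of that arithmetic (the batch-size values match the video's setup; yours may differ):

      import math

      per_device_train_batch_size = 2
      gradient_accumulation_steps = 4
      num_devices = 1

      # Effective batch size: examples consumed per optimizer step.
      effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices  # 2 * 4 * 1 = 8

      # One epoch over 1015 Q&A examples therefore takes ceil(1015 / 8) steps.
      num_examples = 1015
      steps_per_epoch = math.ceil(num_examples / effective_batch_size)
      print(effective_batch_size, steps_per_epoch)  # -> 8 127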

  • @danielhanchen • 7 months ago • +2

    Oh fantastic video as always - absolutely packed with detailed information so great work!

  • @simonstrandgaard5503 • 6 months ago • +2

    Great explanation. The background music was a little distracting.

    • @nodematic • 6 months ago • +1

      Thanks for the feedback - we'll keep this in mind on future videos.

  • @ralvin3614 • 6 months ago • +1

    Really love the playlist! Great video.

  • @ahmadrana-c1y • 14 days ago

    Is it worth using Unsloth with Amazon SageMaker?

  • @MYPL89 • 3 months ago • +1

    What is the difference between push_to_hub and push_to_hub_merged in 4-bit?
    Great video btw, many thanks!!

  • @triynizzles • 6 months ago

    Hello, I don't understand how, at 11:00, I can change "yahma/alpaca-cleaned" to a local .json file on my PC.

    • @nodematic • 6 months ago

      The Hugging Face datasets library is used in either case, to compile a dataset of training strings. The load_dataset("yahma/alpaca-cleaned") approach (or similar) is only needed if your dataset is hosted on Hugging Face. The Dataset.from_dict approach used in the video should work if you read in the data from your local JSON and use it for the dictionary's "text" value. Depending upon how the text is structured in your JSON, you may need to do string interpolation - the end-result "text" values for the dataset need to be pure strings.
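      For example, a minimal sketch (the file name "train.json" and its "story"/"summary" keys are hypothetical - adapt them to your own JSON structure):

      import json
      from datasets import Dataset

      with open("train.json") as f:
          records = json.load(f)

      # Interpolate each record into one plain string, mirroring the video's prompt format.
      texts = [
          f"Summarize the story below.\n\n### Story\n{r['story']}\n\n### Summary\n{r['summary']}"
          for r in records
      ]

      dataset = Dataset.from_dict({"text": texts})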

    • @triynizzles • 6 months ago

      @nodematic Thank you! I may have more questions in the future. :)

  • @shehanmkg • 7 months ago • +1

    Great explanation. This could be a stupid question: how do we fine-tune to trigger function calling?

    • @nodematic • 7 months ago • +1

      Thanks for your question - it's definitely not a stupid one! In your dataset, have fields like "instruction", "prompt", and "function", and then do the string interpolation to create your text field (you could do it similarly to the video, but replace "### Story" with "### Prompt" and "### Summary" with "### Function"). Make sure your training set has a consistent format for the function to trigger, and a consistent fallback value for non-triggering cases. Overall, the process should be quite similar to the video.
      Your model itself won't be able to actually trigger the function - only identify the right function to trigger (and possibly the arguments to supply to it). You'll need to execute the function as a "next step" in some broader pipeline, application, service, or script.
      Hope I'm understanding the question correctly and that this helps.
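      A minimal sketch of building those training strings (the field names and the "none" fallback are illustrative, not from the video):

      instruction = "Identify the function to call for the user's prompt, or answer 'none'."

      examples = [
          {"prompt": "What's the weather in Paris right now?", "function": "get_weather(city='Paris')"},
          {"prompt": "Tell me a joke.", "function": "none"},  # consistent fallback for non-triggering cases
      ]

      texts = [
          f"{instruction}\n\n### Prompt\n{e['prompt']}\n\n### Function\n{e['function']}"
          for e in examples
      ]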

  • @AshrafiArad • 5 months ago

    Great video. Loved the fun generated music. "We don't need this code, so let's put it in a text cell" =))

  • @slimpbisquitte3942 • 7 months ago • +2

    Really comprehensive and well-explained! Great work!
    I wonder if it is also possible to fine-tune not a text generator but an image generator. Does anyone have any ideas? I am super new to this field and pretty much in the dark so far. I could not find anything for image generation yet :/
    Thanks for any suggestions!

    • @nodematic • 7 months ago

      We'll try to make a video on this. Thanks for the suggestion.

  • @triynizzles • 6 months ago • +1

    I have been having tremendous difficulty - can this be run locally in VS Code?

    • @nodematic • 6 months ago

      We haven't tested this, but it should work. The biggest concern would be if you don't have enough GPU memory on your local machine, or if you don't have a clean Python package and CUDA setup.

    • @triynizzles • 6 months ago • +1

      @nodematic I have read about it more and it looks like Windows isn't acting too friendly and most people are running Linux. :(

  • @alokrajsidhaarth7130 • 7 months ago • +1

    Great Video!
    I had a doubt about RoPE Scaling. How efficient is it and to what extent does it help solve the LLM context window size issue?
    Thanks!

    • @nodematic • 7 months ago • +2

      RoPE is the standard way to solve the context window size issue with these open models. It can come at a quality cost, but it's basically the best method we have if you need to go beyond the model's default context window. Use it only if you truly need the additional tokens. In the video's example, the RoPE scaling is needed, because you simply can't summarize a 16k token story by only looking at the second-half 8k of tokens.

    • @npip99 • 7 months ago

      @nodematic Is there an easy API for RoPE?
      I don't even need fine-tuning - I just need a chat completion API for 32k-context Llama 3.

    • @nodematic • 7 months ago

      Yes, you can use RoPE without fine-tuning (e.g., off-the-shelf Llama 3 with a 32k context). I would recommend using Hugging Face libraries, which can be configured for RoPE scaling (for example TGI RoPE scaling is detailed here huggingface.co/docs/text-generation-inference/en/basic_tutorials/preparing_model).
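      As an illustration, a minimal sketch (not from the video, and worth double-checking against your transformers version) of loading Llama 3 with linear RoPE scaling for roughly a 32k context:

      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          rope_scaling={"type": "linear", "factor": 4.0},  # ~4x the default 8k window
          device_map="auto",
      )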

  • @minidraco2601 • 7 months ago

    What's the name of the song at 3:47? Sounds pretty cool.

    • @nodematic • 7 months ago

      That's a custom Udio-generated song, and it isn't published.

  • @SahlEbrahim • 4 months ago

    Why aren't we tokenizing the fine-tuning dataset? Is it automatically done in the SFT trainer?

    • @nodematic • 4 months ago

      Yes, it's done by the Trainer.
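      For illustration, a minimal sketch (assuming TRL's SFTTrainer as in the video, with model, tokenizer, dataset, and max_seq_length already defined by the earlier setup):

      from transformers import TrainingArguments
      from trl import SFTTrainer

      trainer = SFTTrainer(
          model=model,                    # the LoRA-wrapped model from earlier
          tokenizer=tokenizer,
          train_dataset=dataset,          # has a plain-string "text" column
          dataset_text_field="text",      # the trainer tokenizes this field internally
          max_seq_length=max_seq_length,
          args=TrainingArguments(
              per_device_train_batch_size=2,
              gradient_accumulation_steps=4,
              num_train_epochs=1,
              output_dir="outputs",
          ),
      )
      trainer.train()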

  • @galavant_amli • 4 months ago

    It would be better if someone gave a noob tutorial or guide on how to prepare the dataset.
    I do get that the data is a set of inputs and outputs, but I don't know how to label the data.

  • @adnenbenabdelaali6016 • 7 months ago

    Great video and nice code - can you do this context-length extension for the DeepSeek Coder model?

    • @nodematic • 7 months ago

      I believe it's possible, but I haven't tried yet and there isn't an existing Unsloth model for this. We'll look into it though and try to create a video. Thanks for the suggestion.

  • @nishitp28 • 7 months ago

    Nice video.
    What should the format be for data extraction, if I want to extract data from a chunk?
    Can I include something like:
    """
    {Instruction or System Prompt}
    ### {Context or Chunks}
    ### {Question}
    ### {Answer}
    """

    • @nodematic • 7 months ago • +1

      The "###" lines signify headers, so I wouldn't put your content on those lines - rather, they are used to categorize the line(s) of text below each header. If you're using a chunk of content (e.g., via some sort of RAG approach), yes, you could have that as a separate categorization. Something like:
      """
      {instruction}
      ### Background
      {chunk}
      ### Question
      {question}
      ### Answer
      {answer}
      """
      For the best results, use the header terms in your instruction. For the example above, this could be something like: "Based on the provided background, which comes from documentation, FAQs, and/or support tickets, answer the supplied question as clearly and factually as possible. If the background is insufficient to answer the question, answer 'I don't know'."

  • @ShdowGarden • 6 months ago

    Hi, I am fine-tuning the Llama 3 model but I am facing some issues. Your video was great. I was hoping to connect with you - can we connect?

    • @nodematic • 6 months ago

      Thanks. You can reach out via email at community@nodematic.com. We often do not have the staff to handle technical troubleshooting or architectural consulting, but we'll answer if we can.

  • @Itsgosm • 7 months ago

    Amazing video! I've been curious: if I had to train on a set of code samples, which would have indentation (take Python code, for example), will the data still need to be in the standard format of 'instruction', 'output', and 'input'? With 150+ code samples of quite high complexity, will it be possible to train it? Are there any other ways to set up the dataset? And is Llama 3 capable of being trained on unstructured data?

    • @nodematic • 7 months ago

      Yes, you could use a different, non-Alpaca-style format. For the "text" field creation via string interpolation, replace that with a text block of your code lines (including line breaks).
      Llama 3 does well on HumanEval, so I suspect it would work well for your described use case. Just be careful with how you create your samples - getting the model to stop after generating the right line/block of code may not be easy (although you could trim things down with post-processing).

  • @SameerUddin-q5k • 7 months ago

    Do we need to create the repo first, before the push-to-hub command?

    • @nodematic • 7 months ago • +1

      No, just replace "hf/model" with your username (or organization name) and desired model name. Also, if you want a private repo, add a private=True argument to push_to_hub_merged.
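      For example, a minimal sketch (the repo name and token are placeholders; this assumes Unsloth's push_to_hub_merged as used in the video):

      model.push_to_hub_merged(
          "your-username/your-model-name",  # replaces the "hf/model" placeholder
          tokenizer,
          save_method="merged_16bit",       # or "merged_4bit" / "lora"
          private=True,                     # omit for a public repo
          token="hf_...",                   # a Hugging Face write token
      )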

  • @krishparwani2039 • 5 months ago • +3

    This is not for dummies - I could not understand anything.

  • @LaelAl-Halawani-c4l • 6 months ago

    I hate how everyone who does Unsloth tutorials isn't able to use a multi-GPU setup.

  • @artemvakhutinskiy900 • 5 months ago

    Not gonna lie, the AI song was a banger.

  • @Gootsffrida • 2 months ago

    You lost me within 60 seconds - how is this for dummies?

  • @ChituyiDalmasWakhusama • 6 months ago

    Hi, I keep getting this error: "TypeError: argument of type 'NoneType' is not iterable". It originates from "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py". Could you please share the requirements.txt? Also, it only happens when I try to push "merge_16bit" - "merge_4bit" works just fine!