The EASIEST way to finetune LLAMA-v2 on local machine!

  • Published on 19 Jul 2023
  • In this video, I'll show you the easiest, simplest and fastest way to fine tune llama-v2 on your local machine for a custom dataset! You can also use the tutorial to train/finetune any other Large Language Model (LLM). In this tutorial, we will be using autotrain-advanced.
    AutoTrain Advanced github repo: github.com/huggingface/autotr...
    Steps:
    Install autotrain-advanced using pip:
    - pip install autotrain-advanced
    Setup (optional, required on google colab):
    - autotrain setup --update-torch
    Train:
    autotrain llm --train --project_name my-llm --model meta-llama/Llama-2-7b-hf --data_path . --use_peft --use_int4 --learning_rate 2e-4 --train_batch_size 12 --num_train_epochs 3 --trainer sft
    If you are on the free version of Colab, use this model instead: huggingface.co/abhishek/llama.... This is a smaller sharded version of llama-2-7b-hf by Meta.
    Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :)
    My book, Approaching (Almost) Any Machine Learning problem, is available for free here: bit.ly/approachingml
    Follow me on:
    Twitter: / abhi1thakur
    LinkedIn: / abhi1thakur
    Kaggle: kaggle.com/abhishek

Comments • 291

  • @linuxmanju
    @linuxmanju 4 months ago +29

    For anyone who comes across this in 2024 (January), the command switches with the new autotrain version are:
    autotrain llm --train --project-name josh-ops --model mistralai/Mistral-7B-Instruct-v0.2 --data-path . --use-peft --quantization int4 --lr 2e-4 --train-batch-size 12 --epochs 3 --trainer sft
    Great video, thanks Abhishek

    • @BrusnickiRoberto
      @BrusnickiRoberto 4 months ago

      After finetuning it, how to run it?
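For readers comparing the two commands quoted on this page, the flag renames can be collected in a small lookup table. This is only a sketch built from the description's command and the January 2024 comment above; `RENAMED_FLAGS` and `translate_args` are hypothetical helper names, not part of autotrain, and other autotrain versions may spell the flags differently again.

```python
# Flag renames inferred from the two commands quoted on this page
# (video description vs. the January 2024 comment). Not an official list.
RENAMED_FLAGS = {
    "--project_name": ["--project-name"],
    "--data_path": ["--data-path"],
    "--use_peft": ["--use-peft"],
    "--use_int4": ["--quantization", "int4"],
    "--learning_rate": ["--lr"],
    "--train_batch_size": ["--train-batch-size"],
    "--num_train_epochs": ["--epochs"],
}


def translate_args(old_args):
    """Rewrite old-style autotrain arguments into the newer spelling."""
    new_args = []
    for arg in old_args:
        # Values and flags that did not change pass through untouched.
        new_args.extend(RENAMED_FLAGS.get(arg, [arg]))
    return new_args


old_command = (
    "autotrain llm --train --project_name my-llm "
    "--model meta-llama/Llama-2-7b-hf --data_path . --use_peft --use_int4 "
    "--learning_rate 2e-4 --train_batch_size 12 --num_train_epochs 3 "
    "--trainer sft"
).split()

print(" ".join(translate_args(old_command)))
```

Running `autotrain llm --help` on the installed version remains the reliable way to confirm which spelling applies.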

  • @tarungupta83
    @tarungupta83 11 months ago +4

    That's awesome, nothing better than this way of training a large language model. Super easy ❤

  • @andyjax100
    @andyjax100 2 months ago

    Keeping it this simple is something very few people are able to do. Very well explained.
    This can be understood by even a beginner. At least the execution, if not the intuition behind it. Kudos

  • @syedshahab8471
    @syedshahab8471 11 months ago +2

    Thank you for the on-point tutorial.

  • @tarungupta83
    @tarungupta83 11 months ago +5

    Appreciate it, and request to continue making such videos🎉

  • @WeDuMedia
    @WeDuMedia a month ago

    Incredibly helpful video, I appreciate that you took the time to create this! Great stuff

  • @charleskarpati1129
    @charleskarpati1129 6 months ago

    Thank you Abhishek! This is phenomenal.

  • @AICoffeeBreak
    @AICoffeeBreak 11 months ago +11

    Amazing, tutorials at light speed! Llama 2 was just released! 😮

  • @MasterBrain182
    @MasterBrain182 10 months ago +1

    Astonishing content Man 🔥🔥🔥 🚀

  • @nirsarkar
    @nirsarkar 10 months ago

    Excellent, thank you so much. I will try.

  • @xthefoetusx
    @xthefoetusx 11 months ago +3

    Great video! Would be great if in some future vid you could go into depth on the training hyperparameters and perhaps also talk about what size your custom datasets should be.

    • @abhishekkrthakur
      @abhishekkrthakur 11 months ago +4

      sometimes I do that. however, this model would have taken wayy too long to train. im training a model as i type here and if i get good results ill share both model and params 🙂

    • @emrahe468
      @emrahe468 10 months ago +1

      @abhishekkrthakur guess no good luck with the training :(

  • @sohailhosseini2266
    @sohailhosseini2266 8 months ago

    Thanks for sharing!

  • @JagadishSongapagounder
    @JagadishSongapagounder 11 months ago +1

    Great Job :)

  • @aaronliruns
    @aaronliruns 9 months ago +7

    Great tutorial! Can you also put up one video teaching on how to merge the fine tuned weights to the base model and do inference? Would like to see an end-to-end course. Thank you!

    • @adamocheri3513
      @adamocheri3513 9 months ago +2

      +1 on this question !!!!

    • @devyanshrastogi
      @devyanshrastogi 7 months ago

      Any updates, guys? I really want to know how to merge the fine-tuned model with the base model and do inference. Do let me know if you have any resources or insights about this.

    • @kopamed5024
      @kopamed5024 4 months ago

      @devyanshrastogi also need this answered. have you guys had any success?
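For the merge-and-inference question in this thread, a minimal sketch is given below. It assumes the `--use_peft` run saved a LoRA adapter in the project folder (for example `my-llm`, the project name from the description) and that `transformers`, `peft`, `torch` and `accelerate` are installed. The helper names `load_finetuned` and `generate` are hypothetical, and the page does not confirm that this is the method used in the author's follow-up video.

```python
def load_finetuned(base_model_id, adapter_dir, merge=False):
    """Load the base model, attach the LoRA adapter, optionally merge it."""
    # Imported lazily so the helpers can be defined without the GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # Half precision rather than int4: merging into quantized weights
    # is more involved, so this sketch avoids it.
    base = AutoModelForCausalLM.from_pretrained(
        base_model_id, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    if merge:
        # Folds the adapter weights into the base weights and
        # returns a plain transformers model.
        model = model.merge_and_unload()
    return tokenizer, model


def generate(tokenizer, model, prompt, max_new_tokens=128):
    """Run one prompt through the model and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (assumed paths, needs a GPU with enough memory):
# tokenizer, model = load_finetuned("meta-llama/Llama-2-7b-hf", "my-llm", merge=True)
# print(generate(tokenizer, model, "### Instruction:\nSay hello\n\n### Response:\n"))
```

After merging, `model.save_pretrained(...)` would write a standalone checkpoint. The prompt passed to `generate` should follow the same format used in the training data.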

  • @YuniYoshi
    @YuniYoshi 6 months ago +1

    There is only one thing I want to see. I want to see you using the final result and prove it actually works. Thank you.

  • @bryanvann
    @bryanvann 11 months ago +18

    Thanks for the tutorial! A couple of questions for you. Is there an approach you're using to test quality and verify that the training data has influenced the weights in the model sufficiently to learn the new task? And second, can you use the same approach for unstructured training data, such as using a large corpus of private data to do domain adaptation?

  • @mautkajuari
    @mautkajuari 11 months ago

    Informative video, hopefully one day I will get a task that requires me to finetune an LLM

    • @abhishekkrthakur
      @abhishekkrthakur 11 months ago +1

      or you can just do it for fun 🤗

  • @abramswee
    @abramswee 10 months ago

    thanks for sharing!

  • @dr.mikeybee
    @dr.mikeybee 7 months ago

    Nice job!

  • @abhishekkrthakur
    @abhishekkrthakur 11 months ago +25

    Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :)

    • @arpitghatiya7214
      @arpitghatiya7214 9 months ago

      Please make a video on Llama2 + RAG (instead of finetuning)

  • @jdoejdoe6161
    @jdoejdoe6161 10 months ago +1

    Hi Abh
    Your method is inspiring and commendable. How do we read the csv or json training dataset we prepared instead of the hugging face dataset you used?

  • @prachijadhav9098
    @prachijadhav9098 10 months ago +2

    Nice video Abhishek!
    I am curious to know about custom data for LLMs. What is the ideal (good quality) data size (e.g., number of rows) to fine-tune these models for good performance? It doesn't necessarily have to be big data, of course.
    Thanks!

  • @spookyrays2816
    @spookyrays2816 11 months ago

    Thank you brother

  • @sd_1989
    @sd_1989 11 months ago

    Thanks!

  • @cloudsystem3740
    @cloudsystem3740 10 months ago

    thank you very much

  • @tal7atal7a66
    @tal7atal7a66 2 months ago

    thanks bro ❤

  • @elmuchoconrado
    @elmuchoconrado 9 months ago +7

    As always, very useful and short without wasting anyone's time. Thank you. I'm just a bit confused about the prompt formatting you have used here - "### Instruction:
    ### Input:... etc" while Llama official is "[INST] {{ system_prompt }}{{ user_message }} [/INST]" and on TheBloke's page it says "SYSTEM: {system_prompt}
    USER: {prompt}
    ASSISTANT:"

    • @ahmetekizx
      @ahmetekizx 7 months ago

      I think this isn't mandatory, it is a suggestion.
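The three prompt layouts mentioned in this thread can be written as small template functions to make the difference concrete. This is a sketch only: the function names are hypothetical, the "### Response:" section is assumed from the Instruction, Input, Response wording used elsewhere in the comments, and the Llama layout is reproduced as quoted in the comment rather than from Meta's documentation.

```python
def alpaca_style(instruction, user_input="", response=""):
    """'### Instruction / ### Input / ### Response' layout from the video."""
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{user_input}\n\n"
        f"### Response:\n{response}"
    )


def llama_inst_style(system_prompt, user_message):
    """The '[INST] ... [/INST]' layout as quoted in the comment above."""
    return f"[INST] {system_prompt}{user_message} [/INST]"


def system_user_assistant_style(system_prompt, prompt):
    """The SYSTEM/USER/ASSISTANT layout quoted from TheBloke's page."""
    return f"SYSTEM: {system_prompt}\nUSER: {prompt}\nASSISTANT:"


print(alpaca_style("Translate to French", "Good morning"))
```

Whichever layout is chosen for the training rows, the same layout should be used when prompting the fine-tuned model, as the author notes further down the page.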

  • @ajaytaneja111
    @ajaytaneja111 10 months ago +4

    Hi Abhishek, is autotrain using LoRA or prompt tuning as the PEFT technique?

  • @nehabidkar7377
    @nehabidkar7377 9 months ago

    Thanks for this great explanation. Can you provide the link to your training data?

  • @stevenshaw124
    @stevenshaw124 10 months ago +3

    what kind of GPUs do you have? how big was your dataset and how long did it take to train? what is the smallest fine-tuning data set size that would be reasonable?

  • @FlyXing16
    @FlyXing16 9 months ago

    Thanks Kaggle grandmaster :) you got a channel.

  • @jeremyarancio1683
    @jeremyarancio1683 11 months ago

    Nice vid
    Should we label input tokens to -100 to focus the training on the prediction?
    I see no one doing it
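The masking idea raised in this comment can be sketched in plain Python. The value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so positions labelled -100 do not contribute to the loss. Whether autotrain applies such masking is not stated anywhere on this page, and `mask_prompt_labels` is a hypothetical helper used only for illustration.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids, prompt_length):
    """Copy input_ids to labels and hide the prompt part from the loss."""
    labels = list(input_ids)
    for i in range(min(prompt_length, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels


# Toy example: 3 prompt tokens followed by 2 response tokens.
token_ids = [101, 7592, 2088, 2023, 102]
print(mask_prompt_labels(token_ids, prompt_length=3))
```

With this labelling, only the response tokens are scored during training. Without it, the model is also trained to reproduce the prompt text.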

  • @jessem2176
    @jessem2176 10 months ago

    Great Video. i love it and can't wait to try it. Now that Llama2 is out... is it better to FineTune a model or try to create your own Model?

  • @mariusirgens5555
    @mariusirgens5555 10 months ago

    Superb video! Does autotrain allow exporting the finetuned model as a GGML file? Or can it be used with a GGML file?

  • @user-nj7ry9dl3y
    @user-nj7ry9dl3y 10 months ago +1

    For fine-tuning of the large language models (llama-2-13b-chat), what should be the format(.text/.json/.csv) and structure (like should be an excel or docs file or prompt and response or instruction and output) of the training dataset? And also how to prepare or organise the tabular dataset for training purpose?

  • @boujlidamohamed
    @boujlidamohamed 10 months ago +1

    First, thank you for the great tutorial. I have one question: I am trying to finetune the model on Japanese. Do you have any advice for that? I tried the same script as you did but it didn't work; it produced some gibberish after the training finished. I am guessing it is a tokenizer problem. What do you think?

  • @manishsharma2211
    @manishsharma2211 11 months ago

    The way Abhishek side-eyes before stopping the video and resuming it is soo crazy 🤣🤣😅

    • @abhishekkrthakur
      @abhishekkrthakur 11 months ago +2

      lol. big screen. button too far 🤣

  • @r34ct4
    @r34ct4 11 months ago

    Thanks for the comprehensive tutorial. Can this be done using chat logs to build a clone of your friend? I have done this with GPT3.5 finetuning using prompt->response. The prompts are questions generated by ChatGPT based on the chat log message. Can the same thing be done with Instruction->Input->Response? Thank you very much man.

  • @muhammadasadullah4452
    @muhammadasadullah4452 8 months ago

    Great work Abhishek Thakur, it would be great if you made a video on how to run the fine-tuned model

    • @abhishekkrthakur
      @abhishekkrthakur 8 months ago

      already done. check out other videos on my channel

    • @AnandMoorthyJ
      @AnandMoorthyJ 7 months ago

      @abhishekkrthakur can you please post the video link? There are many videos on your channel, it's hard to find which one you are talking about.

    • @devyanshrastogi
      @devyanshrastogi 7 months ago

      @abhishekkrthakur I did fine-tune the model, but I don't think I can run it on Google Colab with a T4 since it shows an out-of-memory error!! Any suggestions?

    • @ozzzer
      @ozzzer 2 months ago

      @AnandMoorthyJ did you find the video? im looking for the link as well :)

  • @unclecode
    @unclecode 10 months ago +1

    Beautiful content. I have a side question: what tool are you using to get "copilot"-like suggestions in your terminal? Thx again for the video

    • @jessem2176
      @jessem2176 10 months ago

      I use Hugging Face's copilot - it works pretty well and is super easy to set up and free.

    • @ahmetekizx
      @ahmetekizx 7 months ago

      @jessem2176 Thanks for the recommendation, but did you mean the HuggingFace Personal-copilot blog?

  • @safaelaqrichi9096
    @safaelaqrichi9096 10 months ago

    Thank you for this interesting video. How could we change the encoding to 'latin-1' in order to train on French? Thank you.

  • @deltagamma1442
    @deltagamma1442 11 months ago +1

    How do you set up the training data? I see different people using different formats. Does it matter, or is the only requirement that it has to be structured meaningfully?

  • @tachyon7777
    @tachyon7777 8 months ago

    Great one! Two things - you didn't show how to configure the cli to enable access to the model. Secondly, it would be useful to know how to use aws for training. Thanks!

  • @rohitdaddekar2900
    @rohitdaddekar2900 10 months ago

    Hey, could you guide us on how to train llama2 on a custom dataset? How do we prepare our dataset for training?

  • @EduardoRodriguez-fu4ry
    @EduardoRodriguez-fu4ry 11 months ago

    Great tutorial! Thank you! Maybe I missed it but, at which point do you enter your HF token?

    • @abhishekkrthakur
      @abhishekkrthakur 11 months ago +1

      You dont. You login using "huggingface-cli login" command. There's also a similar command for notebooks and colab. :)

  • @DevanshiSukhija
    @DevanshiSukhija 10 months ago

    How is your ipython giving suggestions? I want the same set up. Please make a video on these types of set up that assists in coding and other processes.

  • @crimsonalchemist856
    @crimsonalchemist856 11 months ago +1

    Hey Abhishek, Thanks for sharing this amazing tutorial. Can I do this on my RTX 3070Ti 8GB GPU? If yes, what batch size would be preferable?

    • @abhishekkrthakur
      @abhishekkrthakur 11 months ago +2

      8GB sounds a bit low for this. maybe try bs=1 or 2? but tbh, im not sure if it will work. Might work fine for a smaller model!
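A rough back-of-the-envelope calculation shows why 8GB is tight. The sketch below counts only the memory for the model weights. It deliberately ignores activations, LoRA parameters, optimizer state and CUDA overhead, all of which add to the total, so treat the numbers as a lower bound rather than a prediction. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed just to hold the weights, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


seven_billion = 7e9
print("fp16 weights:", weight_memory_gb(seven_billion, 16), "GB")
print("int4 weights:", weight_memory_gb(seven_billion, 4), "GB")
```

At 4 bits the 7B weights alone take about 3.5 GB, which leaves limited room on an 8GB card for everything else. This is consistent with the suggestion to try a batch size of 1 or 2 or a smaller model.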

  • @chichen8425
    @chichen8425 2 months ago

    I know it could be too much, but could you also make a video on how to prepare the data? I have 'question' and 'answer' pairs, but I am struggling to turn them into a trainable dataset in that kind of csv so I could use it!
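One possible way to turn question/answer pairs into a CSV is sketched below. It assumes that autotrain reads a file named `train.csv` from the folder passed as the data path and expects a single column named `text`; that assumption should be checked against the installed version. The "### Instruction / ### Response" layout follows the format discussed elsewhere in the comments, and both helper names are hypothetical.

```python
import csv


def to_training_text(question, answer):
    """Join one question/answer pair into a single training string."""
    return f"### Instruction:\n{question}\n\n### Response:\n{answer}"


def write_train_csv(pairs, path="train.csv"):
    """Write (question, answer) pairs as a one-column CSV named 'text'."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])  # assumed column name
        for question, answer in pairs:
            # The csv module quotes fields that contain newlines or commas.
            writer.writerow([to_training_text(question, answer)])


# Example usage:
# write_train_csv([("What is the capital of France?", "Paris")])
```

Because the `csv` module handles quoting, answers that contain line breaks or commas survive the round trip without manual escaping.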

  • @jaivalani4609
    @jaivalani4609 10 months ago

    Thank you, what is the difference between instruction and input?

  • @utoubp
    @utoubp 5 months ago

    Hi Abhishek,
    Much appreciated. How would things change if we were to use simple fine tuning? That is, just a large single code file to learn from, to tune code-llama, phi2, etc..

  • @sandeelg_lite
    @sandeelg_lite 10 months ago

    I trained a model using autotrain in the same way as you suggested, and the model file is stored.
    Now I need to use this model for prediction. Can you shed some light on this as well?

  • @ajaypranav1390
    @ajaypranav1390 5 months ago

    Thanks for this great video, but how do I fine-tune or train on a question-answer dataset?

  • @returncode0000
    @returncode0000 11 months ago

    I just bought an RTX 4090 Founders Edition. Could you give a particular example of where I could run into limits with this card when training LLMs locally? I personally think that I'm safe for the next few years and will not run into any problems.

  • @user-ut8ts5gv2g
    @user-ut8ts5gv2g 4 months ago

    Nice video. It's great for a beginner to learn how to fine-tune LLAMA2 locally. It would be better if you could share the code and dataset.

  • @dhruvilshah7770
    @dhruvilshah7770 3 months ago +1

    Can you make a video on fine-tuning on Apple silicon Macs?

  • @simonv3548
    @simonv3548 10 months ago

    Thanks for the nice tutorial. Could you show how to perform inference with the finetuned model?

    • @abhishekkrthakur
      @abhishekkrthakur 10 months ago +1

      yes. its in one of my previous videos :) thanks!

    • @BrusnickiRoberto
      @BrusnickiRoberto 4 months ago

      @abhishekkrthakur Which one?

  • @ShotterManable
    @ShotterManable 11 months ago

    Is there a way to run it on CPU? Thanks sir, I love your work

  • @user-we6vc9co1b
    @user-we6vc9co1b 11 months ago +1

    Do you have to use [INST]...[/INST] for indicating the instructions? I think the original Llama 2 model was trained with these tags, so I am a bit puzzled if you have to use the tags in the csv or they are added internally ?!

    • @abhishekkrthakur
      @abhishekkrthakur 11 months ago

      in this video, im finetuning the base model. you can finetune it anyway you want. you can even take the chat model and finetune it this way. if you are using a different format for finetuning, you must use the same format while inference in order to get the best results.

  • @manojreddy7618
    @manojreddy7618 10 months ago

    Thank you for the video. I am new to this, so I am trying to set it up on my Windows PC. When I try to install the latest version, autotrain-advanced==0.6.2, I get an error saying triton==2.0.0.post1 cannot be found, which I believe is only available on Linux. So is it possible to use autotrain-advanced on Windows?

  • @agostonhuszka8237
    @agostonhuszka8237 11 months ago

    Thanks for the tutorial!
    How can I fine-tune the language model with a domain-specific unlabeled dataset to improve performance on that specific domain? Is it effective to leave the instruction and input empty and only use domain-specific text for the output?

    • @sanjaykotabagi4407
      @sanjaykotabagi4407 10 months ago

      Hey, Can we connect. Even I need help on similar topic. We can discuss more ...

  • @marioricoibanez144
    @marioricoibanez144 11 months ago

    Hey! Fantastic video, but I do not understand the division of the model into smaller chunks in order to work on the free version of Colab. Can you explain it? Thank you!

    • @abhishekkrthakur
      @abhishekkrthakur 11 months ago

      chunks are loaded into ram first. since larger chunks didnt fit in ram with all the other stuff, i created a version with smaller shards :)

  • @0xeb-
    @0xeb- 10 months ago

    How to shard as you mentioned towards the end?
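On the sharding question, the idea the author describes (smaller checkpoint files so each one fits in RAM while loading) can be sketched in two parts. `plan_shards` is a hypothetical, simplified illustration of grouping tensors under a size cap. `reshard` uses the `max_shard_size` argument of `save_pretrained` in `transformers`. The "2GB" value is an arbitrary example, and the page does not say how the author's sharded model was actually produced.

```python
def plan_shards(tensor_sizes, max_shard_bytes):
    """Greedily group tensor sizes (in bytes) into shards under the cap."""
    shards, current, current_bytes = [], [], 0
    for size in tensor_sizes:
        # Start a new shard when adding this tensor would exceed the cap.
        if current and current_bytes + size > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


def reshard(model_id, out_dir, max_shard_size="2GB"):
    """Re-save a checkpoint in smaller files (needs enough RAM to load it once)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.save_pretrained(out_dir, max_shard_size=max_shard_size)
    AutoTokenizer.from_pretrained(model_id).save_pretrained(out_dir)


print(plan_shards([4, 4, 4, 4, 4], max_shard_bytes=10))
```

The resharding itself has to run on a machine with enough memory to load the full model once. The smaller files then help on memory-constrained machines such as free Colab instances.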

  • @Truizify
    @Truizify 11 months ago

    Thanks for the tutorial! How would you modify the code to train on a dataset containing a single column of text? i.e. trying to perform domain-specific additional pretraining?
    I would remove the peft portion to do full finetuning, anything else?

    • @sanjaykotabagi4407
      @sanjaykotabagi4407 10 months ago

      Hey, Can we connect. Even I need help on similar topic. We can discuss more ...

    • @user-bq2vt4zz2e
      @user-bq2vt4zz2e 9 months ago

      Hi, I'm looking into something similar. Did you find a good way to do this?

  • @oliversilverstein1221
    @oliversilverstein1221 9 months ago

    hello, thank you. i really need to know: does this pad appropriately? also, how does it internally split it into prompt completion? Can i make up roles like ### System? does it complete only the last message?

  • @jdoejdoe6161
    @jdoejdoe6161 11 months ago +3

    Please show how you used the trained model for inference

  • @ezzye
    @ezzye 10 months ago

    "These days", LOL. We should all be training our own LLMs.

  • @manabchetia8382
    @manabchetia8382 11 months ago

    Thank you. Can you please also show us how to train on GPU #3 or GPU#1 or both GPU#1&3 but not in GPU #0 in a multi GPU machine?

    • @abhishekkrthakur
      @abhishekkrthakur 11 months ago +4

      CUDA_VISIBLE_DEVICES=0 autotrain llm --train ..... will run it on gpu 0
      CUDA_VISIBLE_DEVICES=1,3 autotrain llm --train ..... will run it on gpu 1 and 3
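The same GPU selection can be done when launching the command from Python, for example from a notebook or a job script. The sketch below only builds the environment; `gpu_env` is a hypothetical helper, and the commented `subprocess.run` call leaves the remaining training arguments elided, as in the reply above.

```python
import os


def gpu_env(gpu_ids, base_env=None):
    """Return a copy of the environment that exposes only the given GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


print(gpu_env([1, 3], base_env={})["CUDA_VISIBLE_DEVICES"])

# Launching the training on GPUs 1 and 3 (remaining arguments omitted):
# import subprocess
# subprocess.run(["autotrain", "llm", "--train", ...], env=gpu_env([1, 3]))
```

Inside the launched process the selected devices are renumbered from 0, so GPUs 1 and 3 appear there as devices 0 and 1.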

  • @susteven4974
    @susteven4974 10 months ago

    It's very useful for me. Can you share your instruction dataset, or just tell me where I can look for some small and good datasets? Thanks again!

    • @susteven4974
      @susteven4974 10 months ago

      I already found some tiny datasets on Hugging Face. I will follow your steps, that's an interesting thing!

  • @kunalpatil7705
    @kunalpatil7705 9 months ago

    Thanks for the video. I have a question: how can I package it so others can also use it offline by just installing the application?

  • @anjalichoudhary2093
    @anjalichoudhary2093 10 months ago

    Great tutorial, how can i run the fine tuned model on inference data?

    • @abhishekkrthakur
      @abhishekkrthakur 10 months ago

      there are a couple of videos on my channel for that

  • @srinivasanm48
    @srinivasanm48 a month ago

    When will I be able to see the model that I have trained? Once all the training is complete?

  • @user-oh6ve3df7l
    @user-oh6ve3df7l 10 months ago +1

    Amazing content. One Q left: how can I run the model locally in inference mode after training? Anyone have a command for that?

    • @abhishekkrthakur
      @abhishekkrthakur 10 months ago +1

      th-cam.com/video/o1BCq1KJULM/w-d-xo.html

  • @protectorate2823
    @protectorate2823 9 months ago

    Hello @abishekkrthakur can I train summarization models with autotrain advanced?

  • @rog0079
    @rog0079 10 months ago

    Thanks for this awesome video! Also, if we were fine-tuning llamaV2 for instruction/chat use-case, we would just replace Input, Response with User, Assistant respectively in the dataset, that's it right?

    • @ahmetekizx
      @ahmetekizx 7 months ago

      You should check for Alpaca Format.

  • @mallorywestwood
    @mallorywestwood 10 months ago

    Can we do this on a CPU? I am using a GGML model. Please share your thoughts

  • @0xeb-
    @0xeb- 10 months ago

    How do you deal with response in the dataset that has newline characters?

  • @sebastianandrescajasordone8501
    @sebastianandrescajasordone8501 11 months ago

    I am running out of memory when testing it on the free-version of google colab, did you use the exact same tuning parameters as described in the video?

    • @abhishekkrthakur
      @abhishekkrthakur 11 months ago

      yes. you can reduce batch size. note, you need to use different model path if you are on colab or it will run out of memory. see description for more details

  • @jas5945
    @jas5945 10 months ago +1

    Very good tutorial. On what machine are you running this? I am trying to run it on a Macbook pro M1 but I keep getting "ValueError: No GPU found. Please install CUDA and try again." I have tried to do this directly on Huggingface and got "error 400: bad request"...so I cloned autotrain and ran it locally...still getting error 400. Do you have any pointers?

    • @nirsarkar
      @nirsarkar 8 months ago

      Same error

  • @_Zefyr_
    @_Zefyr_ 8 months ago +1

    Hi, I have a question: is it possible to use "autotrain" without CUDA, with ROCm support for AMD GPUs?

  • @ConsultingjoeOnline
    @ConsultingjoeOnline 3 months ago

    How do you convert it to work with Ollama? I set up the model file and it doesn't seem to know anything from my training.

  • @mohdzaki1930
    @mohdzaki1930 10 months ago

    Nice video. What if we just want to fine tune on plain text, like a bunch of lines in a csv file?

    • @abhishekkrthakur
      @abhishekkrthakur 10 months ago +1

      yeap

    • @user-bq2vt4zz2e
      @user-bq2vt4zz2e 10 months ago

      @abhishekkrthakur Would it just be one column then?

  • @karthik-pillai
    @karthik-pillai 11 months ago +1

    Amazing!
    How can we use the model for inference?

    • @abhishekkrthakur
      @abhishekkrthakur 11 months ago

      th-cam.com/video/o1BCq1KJULM/w-d-xo.html

    • @emrahe468
      @emrahe468 10 months ago +2

      @abhishekkrthakur sorry, but the output of this video and the video you linked don't really match

    • @mraihanafiandi
      @mraihanafiandi 10 months ago

      @abhishekkrthakur I don't think it's relevant, since you use PEFT and 4-bit. The model loading for inference should be different when you have an adapter model. It would be really great if you showed us inference with the model you trained in the video

  • @abdellaziztekaya8596
    @abdellaziztekaya8596 5 months ago

    Where can I find the code you wrote and your dataset? I would like to use it as an example for testing

  • @sachinsoni5044
    @sachinsoni5044 11 months ago

    hey Abhishek, I am a full stack developer and interested in AI. I love to code. I tried learning DS but found no interest in juggling with data. How should i learn?

  • @BTC198
    @BTC198 10 months ago

    What GPUs were you running?

  • @radus8832
    @radus8832 10 months ago

    Is it possible to train a transformer model from scratch using this library? If not, maybe you could make a video on how to train an LLM from scratch using a custom dataset?

  • @oxydol3456
    @oxydol3456 a month ago

    Which machine is recommended for fine-tuning LLAMA? Windows?

  • @vasuchandra
    @vasuchandra 10 months ago

    Thanks for the tutorial.
    On a Linux 5.15.0-71-generic #78-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux machine, I get the following error when training an LLM with the small dataset:
    File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2819, in from_pretrained
    raise ValueError(
    ValueError:
    Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
    the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
    these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
    `device_map` to `from_pretrained`.
    What could be the problem? Is it possible to share the data.csv that you have with single row that I can take as reference to test my own data?

  • @yashvardhanjain1968
    @yashvardhanjain1968 11 months ago

    Thanks! Is there a way to push the trained model to hub after its trained and not using the --push_to_hub while training? Also, when I try to use push to hub, I get a "you don't have rights to create a model under this namespace". I am using a read token to access the llama model. Do I need to change it to a write token? Is it possible to use two separate tokens? (sorry, I'm super new to Huggingface) Any help is much appreciated. Thanks!

    • @abhishekkrthakur
      @abhishekkrthakur 11 months ago +1

      yes. you need to use a write token. you can remove push to hub and then push the model manually using git commands if you wish

  • @rajhammeersinghhada72
    @rajhammeersinghhada72 5 months ago

    Why do we need both --mixed-precision and --quantization? Aren't they both doing the same thing?

  • @prathampundir5924
    @prathampundir5924 27 days ago

    can i train llama3 also with these steps?

  • @nirsarkar
    @nirsarkar 9 months ago

    Can this be done on Apple Silicon, I have M2 with 24G memory?

  • @kishalmandal5676
    @kishalmandal5676 10 months ago

    How can i load the model for inference if i stop training after 1 epoch out of 3 epochs.

  • @shaileshtiwari8483
    @shaileshtiwari8483 9 months ago

    Is a GPU machine necessary to train Llama 7B?

  • @deepakkrishna837
    @deepakkrishna837 7 months ago

    Hi, when we tried fine-tuning the MPT LLM using autotrain, we got the error ValueError: MPTForCausalLM does not support gradient checkpointing. Any help you can offer on this, please?

  • @ashishtater3363
    @ashishtater3363 2 months ago

    I have llm downloaded can I fine tune it with downloading from huggingface.

  • @codeguero8933
    @codeguero8933 10 months ago

    If I understand correctly, this model is training on the local machine and is saved locally too? Or does the model stay on Hugging Face servers?

    • @abhishekkrthakur
      @abhishekkrthakur 10 months ago +1

      its training locally. its also saved locally. if you use --push-to-hub arg, it will push model to huggingface servers

  • @DavidJones-cw1ip
    @DavidJones-cw1ip 9 months ago +1

    Any chance you have the python scripts available somewhere? Thanks in advance.

  • @abdalgaderabubaker6078
    @abdalgaderabubaker6078 11 months ago +2

    Any idea how to fine-tune it on Apple M1/M2 chips? I just have installation issues with autotrain-advanced 😢

    • @allentran3357
      @allentran3357 10 months ago

      Would love to know how to do this as well!

    • @jas5945
      @jas5945 10 months ago +1

      Bumping because running into so many issues with M1. Cannot believe how little resources are available for M1 right now given that macOS is so widely used in data science

  • @aurkom
    @aurkom 10 months ago

    How to change this for tasks like classification?