The Ultimate Guide to Fine Tune Mistral Easily

  • Published on 13 Oct 2024

Comments • 14

  • @gbs-l3n
    @gbs-l3n 3 months ago +1

    Is this full parameter fine-tuning? Can I use this fine-tuning method to train on my specific data, meaning it shouldn't answer any out-of-the-box questions or draw on previous knowledge? For example, I'm fine-tuning a model for a medical chatbot; at inference it shouldn't respond to general-knowledge questions (for example: how to make a sandwich). If not, how can we achieve that?

  • @harshkamdar6509
    @harshkamdar6509 4 months ago +5

    I want to fine-tune a model on entire textbooks to give it specific knowledge instead of instruction-tuned datasets like these. How can I do it?
    I am looking to fine-tune SLMs like Phi-3 128k, so if you can show me some resources for that it would be really helpful.

    • @xspydazx
      @xspydazx 4 months ago +1

      For me:
      I asked GPT to give me Python code that takes my folder of text files and, for each document, creates chunks of 1024 / 2048 / 4096 / 8k / 16k / 32k tokens (I did this for each target size). Training on different context lengths strengthens the responses generated from the contexts and makes the model more robust at recalling these books later. I asked it to include the document title with each chunk, as well as a series ID, so I can reconstruct the order of the book later and, when training, just take random batches, shuffling the records to get a good spread. (A rough sketch of this chunking step is shown right below this comment.)
      Then I trained the model with various prompts, e.g. "save this document for later recall", "save this important information...". Instructions like this are very good for telling the model to store the data, and it becomes a task. Later I can ask it to recall these same chunks in another task, i.e. the opposite of "save book".
      Here we can use the larger chunks, since the smaller chunks we trained with require the model to draw on all its contexts and produce larger chunks for recall; hence small training chunks and large recall chunks, forcing the data through the model.
      Eventually I will recall the book!
      I also find it's productive to just dump the books in as raw text as well, so the data is also unstructured.
      So we have added unstructured data first, then structured data, then a task.
      Now we know that when training, the unstructured data should be trained to a loss of 0-2, since it's unstructured, preferably around 1.25.
      For the storage task we also need to make sure we are pushing around 20 million parameters, in depth.
      For recall we can use 5 million parameters to extract and adjust the network to better retrieve this data; later, for full recall, we return to the 20 million parameters.
      There is another philosophy in play here:
      parameter depth!
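      As a reference for the comment above, here is a rough, hypothetical sketch of the chunking step it describes: split each text file in a folder into chunks at several target token sizes, tagging every chunk with the document title and a series ID so the book can be reconstructed later. The folder name, output file, and tokenizer choice are assumptions, not the commenter's actual script.

      ```python
      # Hypothetical sketch (not the commenter's actual code): chunk every text
      # file in a folder at several target token sizes, keeping the document
      # title and a series id with each chunk so the book can be rebuilt later.
      import json
      from pathlib import Path
      from transformers import AutoTokenizer

      # Assumed tokenizer; swap in the model you intend to fine-tune.
      tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
      CHUNK_SIZES = [1024, 2048, 4096, 8192, 16384, 32768]

      records = []
      for doc_path in Path("books").glob("*.txt"):          # assumed input folder
          title = doc_path.stem
          tokens = tokenizer.encode(doc_path.read_text(encoding="utf-8"))
          for size in CHUNK_SIZES:
              for series_id, start in enumerate(range(0, len(tokens), size)):
                  records.append({
                      "title": title,          # document title stored with every chunk
                      "chunk_size": size,      # which target context length this is
                      "series_id": series_id,  # order, so the book can be reconstructed
                      "text": tokenizer.decode(tokens[start:start + size]),
                  })

      # Shuffle at training time to get a good spread of books and chunk sizes.
      with open("chunks.jsonl", "w", encoding="utf-8") as f:
          for record in records:
              f.write(json.dumps(record, ensure_ascii=False) + "\n")
      ```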

    • @xspydazx
      @xspydazx 4 months ago

      Sorry for the long post! (But that's the whole thing!)

    • @harshkamdar6509
      @harshkamdar6509 4 months ago +2

      @@xspydazx don't apologise, it was a very detailed explanation, thank you for that. I did something similar:
      I created segments of 128k tokens per chunk (the context window of Phi-3), wrapped them with the Phi-3 prompt template, then used QLoRA and SFTTrainer to train on the dataset (a rough sketch of that setup follows below). The dataset has 16 segments of 128k tokens; it was a 600-page book. But when I trained the model and ran inference with the updated weights, it had no effect on the model, and I fail to understand why. I tried adjusting the bias from none to all and tried different hyperparameters, but with no luck.
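      For reference, here is a minimal, hypothetical sketch of the kind of QLoRA + SFTTrainer setup described in the reply above. The model name, dataset file, and every hyperparameter are assumptions, the SFTTrainer argument names vary between trl versions, and a full 128k-token sequence rarely fits in GPU memory, so a shorter max_seq_length is shown.

      ```python
      # Hypothetical QLoRA + SFTTrainer sketch, not the exact recipe from the reply.
      # Argument names follow trl ~0.8-era APIs and may differ in newer versions.
      import torch
      from datasets import load_dataset
      from peft import LoraConfig
      from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                BitsAndBytesConfig, TrainingArguments)
      from trl import SFTTrainer

      model_id = "microsoft/Phi-3-mini-128k-instruct"   # assumed Phi-3 128k variant

      bnb_config = BitsAndBytesConfig(                  # 4-bit base weights: the "Q" in QLoRA
          load_in_4bit=True,
          bnb_4bit_quant_type="nf4",
          bnb_4bit_compute_dtype=torch.bfloat16,
      )
      model = AutoModelForCausalLM.from_pretrained(
          model_id, quantization_config=bnb_config, device_map="auto"
      )
      tokenizer = AutoTokenizer.from_pretrained(model_id)

      lora_config = LoraConfig(                         # small trainable adapters on a frozen base
          r=16, lora_alpha=32, lora_dropout=0.05, bias="none",
          task_type="CAUSAL_LM",
          target_modules="all-linear",                  # target every linear layer, model-agnostic
      )

      # Assumed dataset: the chunked text produced earlier, one "text" field per record.
      dataset = load_dataset("json", data_files="chunks.jsonl", split="train")

      trainer = SFTTrainer(
          model=model,
          tokenizer=tokenizer,
          train_dataset=dataset,
          peft_config=lora_config,
          dataset_text_field="text",
          max_seq_length=4096,                          # full 128k segments rarely fit in memory
          args=TrainingArguments(
              output_dir="phi3-book-qlora",             # assumed output directory
              per_device_train_batch_size=1,
              gradient_accumulation_steps=8,
              num_train_epochs=3,
              learning_rate=2e-4,
              logging_steps=10,
          ),
      )
      trainer.train()
      ```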

    • @xspydazx
      @xspydazx 4 months ago

      @@harshkamdar6509 yes, I have now been able to recall a whole dataset of books or papers using the technique...
      I used it to get the Bible in! So we can call for a verse, then a chapter, then a book or a summary of the book (hmm), and sometimes the whole document, without alteration. It varies. As the Bibles are clearly marked, referencing the ASV instead of the King James Version is no problem, because I was very meticulous about the markdown Bibles as well as the normal Bibles.
      It seems long-winded, I know, but the Bible would not take into the model, so I had to examine the whole process of fine-tuning information into the model for verbatim recall. Hence we will also require discourse, but everything in order first. It also worked for the transformers documentation, so I did the same for LangChain and Gradio.

  • @urbancity1254
    @urbancity1254 2 months ago

    what are the costs for a fine-tuning job like this 🙏

  • @aryanakhtar
    @aryanakhtar 4 months ago +1

    Is it necessary to have an account on Massed Compute for fine-tuning the Mistral model?

    • @MervinPraison
      @MervinPraison 4 months ago

      No, you don't need to have a Massed Compute account

  • @rodrimora
    @rodrimora 4 months ago

    Will this work for WizardLM 2 8x22B, as it's based on Mixtral 8x22B?

    • @w1ll2p0wr
      @w1ll2p0wr 4 months ago

      WLM8x22B is so slept on, but I figure you'd have to fine-tune each of the 22B experts for your use case then combine them in the MoE layer… but idk, I'm just some guy

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 4 months ago

    Is the GPU free on the Mistral server?

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 4 months ago

    The only thing is that it has abstracted away all aspects of training into a black box, so you have no idea of the inner workings