Simple, straight to the point
Hi. I still do not understand how I can create my own datasets. Can you make a video about that? It's in the title: "on your own dataset" ;-)
If you don't know, you may not need it 😂. A dataset is mainly a series of question/answer pairs.
@sherpya I have a CSV of question/answer pairs. How should I upload it? Please answer.
Use DSPy. Don't worry about creating your own datasets. It's a rabbit hole you'll never get out of.
@onlineinformation5320 There are some video tutorials. You mainly format the data as JSON Lines or whatever, since you typically need to read it in a notebook.
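For anyone wondering what "format them as JSON Lines" looks like in practice, here is a minimal sketch using only the Python standard library. The field names `question` and `answer`, the file name `my_dataset.jsonl`, and the two sample pairs are placeholders I made up, not something from the video; use whatever keys your training code expects.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(pairs, path):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = {"question": question, "answer": answer}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back into a list of dictionaries, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up sample pairs, only for illustration.
pairs = [
    ("What is fine-tuning?", "Continuing training on your own examples."),
    ("What is a dataset?", "A series of question/answer pairs."),
]

jsonl_path = Path(tempfile.gettempdir()) / "my_dataset.jsonl"
write_jsonl(pairs, jsonl_path)
records = read_jsonl(jsonl_path)
print(records[0]["question"])
```

One record per line keeps the file easy to append to and easy to read back in a notebook.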
Why do you suggest DSPy? Do you have experience with it, @marilynlucas5128?
In the code, the EOS_TOKEN is added with "…, output) + EOS_TOKEN", right? Is that all that is necessary, or do I have to add the EOS_TOKEN to the dataset as well? I trained the model using the code you used, but with my own data, and my resulting model never stops talking. 😢
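A rough sketch of the pattern being asked about, in case it helps with debugging. This is not the notebook's actual code: the prompt template, the column names `instruction` and `output`, and the helper `missing_eos` are assumptions for illustration, and `EOS_TOKEN` is hard-coded to a placeholder string here (in a real notebook it would normally come from `tokenizer.eos_token`). The idea is that the token is appended when the training text is built, so it does not also need to be stored in the raw dataset, and you can check that every formatted example really ends with it.

```python
# Placeholder value; in a notebook this would usually be tokenizer.eos_token.
EOS_TOKEN = "</s>"

# Hypothetical two-slot template, not the one from the video.
PROMPT = "### Instruction:\n{}\n\n### Response:\n{}"


def format_examples(batch):
    """Build training texts and append EOS_TOKEN to each one."""
    texts = []
    for instruction, output in zip(batch["instruction"], batch["output"]):
        texts.append(PROMPT.format(instruction, output) + EOS_TOKEN)
    return {"text": texts}


def missing_eos(texts, eos_token=EOS_TOKEN):
    """Return the indexes of texts that do not end with the EOS token."""
    return [i for i, text in enumerate(texts) if not text.endswith(eos_token)]


batch = {"instruction": ["Say hi"], "output": ["Hi!"]}
formatted = format_examples(batch)
print(missing_eos(formatted["text"]))  # an empty list means every example ends with EOS
```

This only checks the formatting side. If every example already ends with the token and the model still rambles, the cause is somewhere else (for example, in how generation is configured).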
Can HF auto-train be used here? Also, why does everyone keep insisting on fine-tuning when DSPy is already out? You can obtain structured outputs with DSPy without the need to fine-tune. You haven't focused much on DSPy, and I think it's very important that you do. It's clearly the future of AI.
You can use auto-train. Unsloth gives you more memory-efficient fine-tuning. DSPy is on my list. I need to get a better understanding of it, and then I will start creating content on it.
But if my data includes something like dialogue, how can it be structured so that there is one instruction for each response?
Thank you, man, for your videos. But my most pressing question is how to prepare a dataset from my own data. I have a book and want to talk with the book. Obviously RAG cannot fit all the content of the book, even with a 128k context length. So how do I train my model on that book?
I have the same doubt. A video on this would be very helpful.
I second that.
Huh!? RAG is your best bet! If you want structured outputs to enable easy and efficient state transitions, use DSPy.
Will see what I can do here.
Is it possible to finetune on my language?
For my use case the dataset only consists of input and response. Is it possible to fine-tune!?
Yes, you can modify the prompt template in any capacity you want
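To make "modify the prompt template" concrete for a dataset with only input and response, here is a small sketch. The template wording, the keys `input` and `response`, and both helper names are my own assumptions, not the notebook's; the point is just that the template can have two slots instead of three, and that the same template with an empty response slot serves as the prompt at inference time.

```python
# Hypothetical two-field template; adjust the wording to taste.
TEMPLATE = "### Input:\n{input}\n\n### Response:\n{response}"


def build_training_text(example, eos_token):
    """Fill both slots and append the end-of-sequence token for training."""
    return TEMPLATE.format(input=example["input"], response=example["response"]) + eos_token


def build_inference_prompt(user_input):
    """Leave the response slot empty so the model completes it."""
    return TEMPLATE.format(input=user_input, response="")


example = {"input": "Translate 'hello' to French.", "response": "bonjour"}
print(build_training_text(example, eos_token="</s>"))
print(build_inference_prompt("Translate 'cat' to French."))
```

Keeping the training and inference templates identical up to the response slot matters more than the exact wording of the headers.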
How do I create a docker image and run it as a service? Can it support concurrent requests?
If I use the Groq API, it's no longer open source.
What's next, show your skills?
1. CodeCraft Duel: Super Agent Showdown
2. Pixel Pioneers: Super Agent AI Clash
3. Digital Duel: LLM Super Agents Battle
4. Byte Battle Royale: Dueling LLM Agents
5. AI Code Clash: Super Agent Showdown
6. CodeCraft Combat: Super Agent Edition
7. Digital Duel: Super Agent AI Battle
8. Pixel Pioneers: LLM Super Agent Showdown
9. Byte Battle Royale: Super Agent AI Combat
10. AI Code Clash: Dueling Super Agents Edition
I have a CSV of question/answer pairs. How should I upload it? Please answer.
You can read that CSV file, convert it into a dictionary, and then use the same code provided in the notebook.
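A minimal sketch of the "read the CSV and convert it into a dictionary" step, using only the standard library. The column names `question` and `answer` and the sample rows are made up, and an in-memory string stands in for your file; with a real file you would pass `open("your_file.csv", newline="", encoding="utf-8")` instead.

```python
import csv
import io


def csv_to_columns(file_obj):
    """Turn CSV rows into a dictionary of column name -> list of values."""
    reader = csv.DictReader(file_obj)
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns


# In-memory stand-in for a real CSV file with a header row.
sample = io.StringIO(
    "question,answer\n"
    "What is 2+2?,4\n"
    "Capital of France?,Paris\n"
)

data = csv_to_columns(sample)
print(data["question"])
print(data["answer"])
```

A dictionary of columns like this is the shape that `datasets.Dataset.from_dict` accepts, so from there the rest of a typical notebook should apply, assuming its formatting function is pointed at your column names.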
Why do so many "use your own dataset" videos just use online datasets? This has nothing to do with my data. It is custom, yes, but not mine. It is an online dataset from HF. My dataset wouldn't be there.
wrong 🤦♂️