Coding Llama 3 from scratch in PyTorch - Part 1
- Published on May 5, 2024
- In this video series, you will learn how to train and fine-tune the Llama 3 model from scratch.
The goal is to code LLaMA 3 from scratch in PyTorch to create models with sizes of 3B, 6B, 35B and 45B params. In this first video, you'll learn about upcycling, downcycling and infini-attention.
📚Papers:
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints: arxiv.org/abs/2212.05055
- Pre-training Small Base LMs with Fewer Tokens: arxiv.org/abs/2404.08634
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention: arxiv.org/abs/2404.07143
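The downcycling idea covered in the video (and in the small-base-LMs paper above) boils down to initializing a smaller model from a subset of a large dense checkpoint's transformer blocks. A minimal sketch of the layer-selection step, assuming a "first-k plus last-m blocks" strategy; the function name is hypothetical, not from the video:

```python
def downcycle_layer_indices(n_source_layers: int, n_target_layers: int) -> list:
    """Pick which transformer blocks of a large dense checkpoint to copy
    into a smaller model: roughly half of the target count from the
    bottom of the stack, the rest from the top."""
    assert n_target_layers <= n_source_layers
    front = n_target_layers // 2          # blocks kept from the bottom
    back = n_target_layers - front        # blocks kept from the top
    return list(range(front)) + list(range(n_source_layers - back, n_source_layers))

# e.g. initialize a small model from a 32-layer Llama checkpoint, keeping 8 blocks
print(downcycle_layer_indices(32, 8))  # → [0, 1, 2, 3, 28, 29, 30, 31]
```

In a real setup you would then copy the weights of the selected blocks (plus embeddings and output head) into the smaller model before continued pre-training.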
💻 To follow along you can use this colab notebook:
- github.com/Blaizzy/Coding-LLM...
🎥 Coding Llama 2 from scratch video series
Part 1: th-cam.com/users/liveXHmag4damTg
Part 2: th-cam.com/users/liveLSWDpFmbE90
Part 3: • Coding Llama 2 from sc... - Science & Technology
Well made Prince! Learned a lot
such a high quality content piece
This is a very thoughtful and great initiative! Researchers with enough gray matter but limited means can still be in the game. Thank you PC🙏!
Most welcome!
It’s my pleasure:)
I lived through this so others don’t have to.
this is very impressive and great content. thank you
You're very welcome!
Super impressive. Great value
One question: how do I further train the model on my custom content instead of using LoRA?
Can we do further full training on it and add new memory?
Most welcome!
You can do that, but full fine-tuning can be very expensive.
CS programmers are vampires. My eeeeyyyes. Great content though
Bro, how did you train Llama 3 without a paper?
Could you elaborate?
@@princecanuma As far as I know, no official Llama 3 paper has been released, and no data info either. But I could be wrong... 😅
@@vivekpadman5248 True, they only released a blog post detailing the data, model architecture and performance.
Here is how I did it: Llama-3 has the exact same architecture as Llama-2, which we already covered on this channel.
th-cam.com/play/PLDn_JsyofyfQp4td_ub6LfIg5vxyu6YJK.html&si=0Gyt9mdaA-ydiWOA
Finally, if you understand how these models work, you don't need the paper; the code implementation is more than enough.
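The "same architecture, different hyperparameters" point can be sketched with a minimal config comparison. The decoder block is unchanged (RMSNorm, RoPE, attention, SwiGLU MLP); the values below are my best recollection of the publicly released Llama-2-7B and Llama-3-8B configs, not figures from the video, so treat them as assumptions:

```python
from dataclasses import dataclass

@dataclass
class LlamaConfig:
    # Shared decoder-only architecture: RMSNorm + RoPE + attention + SwiGLU MLP
    dim: int            # hidden size
    n_layers: int       # number of transformer blocks
    n_heads: int        # query heads
    n_kv_heads: int     # key/value heads; < n_heads enables grouped-query attention
    vocab_size: int
    hidden_dim: int     # SwiGLU intermediate size
    rope_theta: float   # RoPE base frequency

# Published configs, to the best of my knowledge (assumed, verify against the release):
llama2_7b = LlamaConfig(4096, 32, 32, 32, 32000, 11008, 10000.0)
llama3_8b = LlamaConfig(4096, 32, 32, 8, 128256, 14336, 500000.0)

# Same block structure and depth; Llama-3 mainly adds grouped-query attention,
# a much larger vocabulary, and a larger RoPE base for longer context.
assert llama2_7b.n_layers == llama3_8b.n_layers
print(llama3_8b.n_heads // llama3_8b.n_kv_heads)  # → 4 query heads per KV head
```

This is why a Llama-2 implementation can run Llama-3 with only config changes.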
@@princecanuma oh understood, thanks I'll check it out and also your video 💙
Most welcome :)