My favorite (one of my favorite) pizza is actually "Pizza with mozzarella di bufala", also known as "bufalina" in Italy 😆😋
Sir, thanks a lot for this awesome work. Being a student, this benefits me a lot, sir.
Your lectures are very helpful. Please increase the frequency of videos.
Would love to visit Venice (it's on my checklist of places to visit), gonna try it!
BTW, very grateful for your lecture, got so much to learn and revise. I have followed the Stable Diffusion one too, learnt a lot and was able to connect that lecture to this one.
Thank you @umarjamilai, I love your video.
It really connects with your previous videos, and I have seen all the videos 😉
Thank you once again. TBH I don't have any words to explain my feelings.
Bufala or Parmigiana ❤ Thanks for the amazing videos as always
You Sir are a source of pride to all of us Italian Computer Scientists. Best wishes! Thank you!
You are the best YouTuber on the internet, the best! Not one of the best! I have listened to a bunch of programming videos; none of them are like yours, which are so good, so up to date, so amazing.
Thank you for the like!
You and Andrej are the two guys inspiring me a lot. Respect!
5 hours of top tier content that too completely for free! Thank you so much! Please keep uploading such content
6 hours****
I appreciate your effort to make such a clear explanation. I spent the whole Thanksgiving week watching the video.
It's so satisfying to know a lot of details that were once quite confusing. Thanks for such a great effort in offering this amazing lecture! It helps me not only get a deeper understanding of what a VLM is about but also refresh and connect the Transformer part :)))
Saw a Twitter post about how this is an underrated channel. Bro, your videos are hardcore and I love it!
You're the best at explaining papers with code. Keep it up bro👏. I hope the next one is about *ControlNet* from scratch.
This channel and video are the real deal. Amazing quality. Can't wait to watch the whole thing. Can't believe it's completely free - we have no excuse! Keep up the great work and Assalamu Alaikum from Austin, TX!
I've been watching your videos for years, but this is the first time I've seen you. Look at that charisma! You're the king. Let me know if you come to Istanbul; I'll treat you to some kebab.
Thank you! 😋😋😋🥙🥙🥙
I have no words to thank you. Two weeks ago I was wondering why there are no books for VLMs like there are for LLMs, and today I found your comprehensive explanation video.
I have faced the same problem.
Same, I only found one paper: "An Introduction to Vision-Language Modeling".
Maybe you will find it useful
@@vinc6966 I also found this today: Building and better understanding vision-language models: insights and future directions
The model is working, thank you Umar, you are a great person. Can you do more videos about training LMs and fine-tuning them?
The Stable Diffusion video was great. I bet this is even better. Nice to see your videos, man, welcome back.
You are awesome, sir! Thank you so much for putting this together and sharing it so generously with us!! You and Andrej are a boon to the ML/AI community.
Can't thank you enough. You are simply the best guy on YouTube in this field...
This is super super awesome! Thank you very much for this awesome work!
Man, you are a saviour... please keep up the good work.
From the bottom of my heart, thank you. Your explanations are exceptionally clear, even for novices like me. We wish you the best.
You are just doing amazing work, and this will provide a path for all those who are interested in AI to do some amazing things. You are an inspiration for all the students who want to learn this field at a deep level. Thank you for what you are doing; this is helping us a lot.
Fantastic contribution!! Best content I ever saw on YouTube.
Priceless... You are my idol and a constant source of inspiration!
The Man is back 🤩🤩🤩🤩
Finito! Thank you a lot: I was very curious to learn how multimodal algorithms are even able to work, and it has been a very good challenge to follow the flow of information from input to output, one math operation at a time. Kudos!
Umar, I don't have words for this type of content. I started by watching Aladdin Persson on YouTube 4 years back, where he implemented papers, and that got me into ML, but he stopped uploading. After so much time I found you; I am glad and would love to see many more videos like this.
Before watching the video, I want to thank you for the great effort! Your videos always answer my questions!!
Super well commented and structured code, well explained video. Superb quality video with zero fee!
I wanted to spend a few days reading how multimodal LMs work, but your video broke all my plans 😅. As always, perfect timing and explanation, keep up the great work!
Legend! Don't stop.
Your content is top tier.
Thank youuu very much, I'm doing my master's thesis on Visual Language Models and this video is such an amazing resource for completing it. Excellent work!
You are the best ML engineer, bro. There is no full explanation with PyTorch code for a multimodal LM anywhere else on YouTube; may God preserve you.
I haven't seen the video yet, but I'm sure it's amazing, like all your videos, here's a little thank you
Thank you very much! Please reach out to me on LinkedIn if you have any questions or doubts.
What a video, made with so much effort. Thanks.
what an excellent video. Well done bro
This is god level stuff. Thanks a lot man.
How is this still 39k subs? This is so alpha
Please share it with your network of friends; that's the best way to help me.
Love it! Very good. BTW, the image "normalization" is actually standardization. Normalization is the rescaling of values into a fixed range.
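For anyone unsure of the distinction the comment is pointing at, here is a minimal sketch (illustrative tensor and shapes of my own, not the preprocessing constants used in the video):

```python
import torch

x = torch.rand(3, 224, 224) * 255.0           # fake image with values in [0, 255]

# Min-max normalization: rescale values into a fixed range such as [0, 1].
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization: subtract the mean and divide by the standard deviation,
# giving roughly zero mean and unit variance per channel.
mean = x.mean(dim=(1, 2), keepdim=True)
std = x.std(dim=(1, 2), keepdim=True)
x_std = (x - mean) / std
```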
This is gold. thank you.
Best guy on YouTube for this field.
Thanks Umar Jamil, big fan of your work!
Thanks for the great material. Hope you enjoy the coffee.
Great video @umarjamilai, learned a lot, very helpful, thanks for the effort.
A detailed video on the Llama 3.2 architectures would be helpful.
Amazingly fabulous as always.
Could you please cover an implementation of ControlNet from scratch using PyTorch? Would love to see that.
Thanks again for the great content
Sir, you are the biggest inspiration for me. Thank you for your guidance.
Yeees, I am a fan of your coding from scratch videos🤩
Incredible work Umar! You really have a great talent for visualizing complex things. And doing all this work for free is impressive. Quick question for the experienced Data Scientists -> Is it normal to have soooo many layers (a function inside another function inside another function .....)? Is it efficient, and don't you lose the general view of what the code is doing?
Thanks man, you are the best.
PLEASE do a video about FLUX models.
Thanks
Love how you explain things in depth. Keep up the work!
Love ❤ your channel and learned a lot about machine learning.
Love from Xi'an! Good explanation!
Then you must recognize the photo I used at the end 😄
Legend is back 🎉
Ah yes! Gonna try this over the weekend!!
💚Thank you!
Thank you so much @umarjamilai, I love your explanation method.
I really like your approach.
I hope you will make a video on the Triton language and how it differs from regular CUDA.
Thank you for this detailed video.
Absolutely loving it. Could you please make a similar video on CoCoOp as well?
I love this content so much. I don't care if it's 6 hours long, man, I appreciate the effort.
Amazing lecture Umar! I hope and wish you keep doing this for a long, long time. BTW, which app do you use for making notes?
Thank you!
At 4:10:25 he is showing a softmax after the linear layer; why didn't we add a softmax in the GemmaModel class?
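One likely reason, sketched here as a toy example under the assumption that the code follows the usual PyTorch convention (this is not the repository's code): the softmax belongs to the sampling step, while the training loss applies it internally.

```python
import torch
import torch.nn.functional as F

vocab = 32000
logits = torch.randn(1, 5, vocab)              # raw outputs of the final linear layer
targets = torch.randint(0, vocab, (1, 5))

# Training: cross_entropy expects raw logits and applies log-softmax internally,
# so a softmax inside the model would be redundant (and numerically worse).
loss = F.cross_entropy(logits.view(-1, vocab), targets.view(-1))

# Inference: softmax is applied only to the last position, right before sampling.
probs = torch.softmax(logits[:, -1, :], dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
```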
Jazakallah Khairan ❤
Please do federated learning from scratch next.
Thanks for your selfless, awesome video; it is hard to describe how much I appreciate it! God bless you~~~~
Love From India, Sir
Welcome back!!❤❤❤
Great content! Thanks for sharing!
Do you have a plan to cover the training process and fine-tuning of multimodal LLMs?
love the way you say "Pepperoni"
🫠🥰
Thanks very much for your excellent explanation, Umar. And can you explain the recently popular Agents stuff, please? Hope you have a nice day.
You are a monster. My dream is to be like you.
Have you made any video on tokenizers?
Could you tell me how I can train this model on specific data to improve its performance further?
Hello, Did you find any way to train this?
Thanks
I commented on your older video about DDP training. You explained everything soooo well. Can you please make a detailed video about tensor-parallel training, just like the DDP one? It would be very helpful.
Geez. You deliver :)
@umarjamilai Thank you for this amazing video! At 1:13:35, why are the skip connections taken from before the LayerNorm rather than after? Wouldn't this result in un-normalized values being added to the output of the MHA or MLP blocks? Is there some reason why the architecture looks like this?
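A minimal pre-norm block sketch (my own simplified module, not the video's classes) makes the question concrete: the residual stream is kept un-normalized on purpose, and LayerNorm is applied only to the copy that enters each sub-layer.

```python
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Simplified pre-norm transformer block (illustrative only)."""
    def __init__(self, dim, attn, mlp):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = attn   # any module mapping (B, T, dim) -> (B, T, dim)
        self.mlp = mlp

    def forward(self, x):
        # The skip carries the raw, un-normalized residual stream;
        # only the copy fed into each sub-layer is normalized.
        x = x + self.attn(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x
```

Keeping the identity path free of LayerNorm is the defining property of the pre-norm design, which is generally reported to train more stably than the original post-norm layout; a final norm is typically applied once at the end of the stack.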
Just out of curiosity, what PDF is the one you are going over here @05:02:36? Can we have that?
Unbelievable!!!!!!!!
I think maybe the tie_weights method should flip the assignment, because I tried this implementation with GPT-2 from Karpathy and the loss was very, very big at the beginning; after flipping the assignment the loss went back to a reasonable range!
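For readers hitting the same thing, here is the generic weight-tying pattern as a sketch (not a claim about either repository); the point is that the direction of the assignment decides which tensor survives.

```python
import torch.nn as nn

vocab_size, hidden = 32000, 768
embed = nn.Embedding(vocab_size, hidden)            # token embeddings
lm_head = nn.Linear(hidden, vocab_size, bias=False) # output projection

# Typical tying: the output head reuses the embedding matrix.
# Assigning in the opposite direction overwrites whichever tensor already holds
# meaningful (pretrained or carefully initialized) weights with the other one,
# which can show up as an unreasonably large initial loss.
lm_head.weight = embed.weight
```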
Thanks!
@umarjamilai Great video. Is there a way to train it on an OCR task (like handwritten text for other languages)?
One word - (Legend)^100
Great video. Thank you. Can you attach the slides on vision transformers?
How about training a multimodal (vision) model for video?
It would be good if you could show how to encode the data for each VLM task.
Time index 3:11:40: I guess that, like an LSTM, the more information we pack into a given token and the further out we go, the harder it gets for the last token to remember information about past tokens. I would love to get your views on why transformers can remember the initial sequence and the last few tokens but struggle to retrieve information from the middle.
I cannot clone the model even though I have a token; I can only download the tokenizer and the safetensors index. Is there any solution?
You're probably using the SSH endpoint to clone; use the HTTPS one.
@@umarjamilai I am actually downloading the safetensors manually, and then I will put them in the same folder as the .json files that I had.
36:22
What's the point of doing super().__init__() in the config? It will always be initialized properly without it being there. It is redundant.
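Whether it is redundant depends on what the parent __init__ does: if the config only inherits from object it is indeed a no-op, but it stops being one as soon as a base class does work there. A small sketch with hypothetical classes (not the video's code):

```python
class BaseConfig:
    def __init__(self, **kwargs):
        # A parent class may do real work in __init__, e.g. keep extra kwargs around.
        self.extra = kwargs

class VisionConfig(BaseConfig):
    def __init__(self, hidden_size=768, **kwargs):
        super().__init__(**kwargs)   # drop this line and self.extra is never set
        self.hidden_size = hidden_size

cfg = VisionConfig(hidden_size=1024, image_size=224)
print(cfg.extra)                     # {'image_size': 224}
```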
Please make a video on gpt and sora.
What's the pad he used? An iPad? And what's the app? I like that it can be mapped to Windows and used for streaming.
Tldraw, maybe. Excalidraw is similar and I like it
Thanks a lot. Can anyone help me with training? I am confused about training time: do we use the image logits in the loss function, or just the prompt tokens?
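On the loss question, one common convention (a sketch of that convention with made-up shapes, not necessarily what PaliGemma's training code does) is to mask the image and prompt positions so that only the target text tokens are scored.

```python
import torch
import torch.nn.functional as F

vocab = 32000
logits = torch.randn(1, 10, vocab)   # positions for image tokens + prompt + answer
labels = torch.randint(0, vocab, (1, 10))

# Suppose the first 7 positions are image/prompt tokens: mask them so only the
# answer tokens contribute to the loss. -100 is cross_entropy's default ignore_index.
labels[:, :7] = -100

loss = F.cross_entropy(logits.view(-1, vocab), labels.view(-1), ignore_index=-100)
```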
Thanks!
Great work. Please share the software, iPad, etc. you used for the presentation, and if possible do a presentation on video LLMs.
Why is Dropout not used here?
Because Dropout is used during training, but not during inference.
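A tiny sketch of that behaviour in PyTorch:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # about half the values zeroed, the survivors scaled by 1/(1-p)

drop.eval()
print(drop(x))   # identity: dropout is a no-op in eval/inference mode
```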
Thank you so much for the video. Loved every bit of it. At one point you mentioned that they chose GELU based on heuristics and experimentation, but won't changing one of many hyperparameters increase the training cost by a huge margin? Is there any way to guesstimate that intuition? Maybe by looking at activations on TensorBoard / wandb? How does that work in industry?
But thanks for the video again, had a great weekend watching it.
Read this paper by Noam Shazeer: "GLU Variants Improve Transformer". In the conclusion the author says "We offer no explanation as to why these
architectures seem to work; we attribute their success, as all else, to divine benevolence."
Noam Shazeer is one of the fathers of the Transformer model. If he says so, who am I to argue? 😁
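For readers who have not seen the paper, a minimal sketch of the gated feed-forward variants it compares (shapes chosen here for illustration only):

```python
import torch
import torch.nn.functional as F

def ffn_geglu(x, W, V, W2):
    # FFN_GEGLU(x) = (GELU(x W) * (x V)) W2, as defined in "GLU Variants Improve Transformer"
    return (F.gelu(x @ W) * (x @ V)) @ W2

def ffn_swiglu(x, W, V, W2):
    # FFN_SwiGLU(x) = (SiLU(x W) * (x V)) W2   (SiLU is Swish with beta = 1)
    return (F.silu(x @ W) * (x @ V)) @ W2

x = torch.randn(2, 512)
W, V = torch.randn(512, 2048), torch.randn(512, 2048)
W2 = torch.randn(2048, 512)
out = ffn_geglu(x, W, V, W2)   # shape (2, 512)
```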
@@umarjamilai Hahah, true that, and I remember you mentioned this in one of your other videos; I think it was the Mistral explanation video.
What I'm thinking right now is: how can we estimate which model to choose? There's Moondream, CogVLM, MiniGPT-4, Florence-2, Phi-3, etc. I'm thinking of first seeing how the image encoder performs on certain tasks, then seeing which model uses that encoder, and fine-tuning it further on that task.
Anyway thanks again :)
32:16, you mean patch number 16 is always on the bottom right? :)
respect!😍
Thank you very much!
Sir, could you create a video on pruning and knowledge distillation with LLMs?
Does anyone get an error while loading the model saying "Tokenizer class GemmaTokenizer does not exist or is not currently imported"?
Bro, how is the tutorial? Can you understand things clearly? What's your view?
Thanks!
Please upload more such videos on the channel; nobody else teaches with code.