Hey Everyone! Hope you're all doing super well. This video will give you everything you need to know about Transformer Neural Networks, BERT Networks and Sentence Transformers - or at least all that we can cover in 17 minutes. Hoping we all understand why these Architectures were developed the way they were, painting the picture as a fluid story. I'm trying another teaching style here. If you like this kind of video, please do let me know in the comments. Put a lot of effort into this, so I hope you think this is good! Enjoy! And Cheers!
Dude, you are amazing. Hope you keep this work up! Explaining complex things in an easy-to-follow way with examples is a great skill!
Thanks a ton Daniel! Much appreciated compliments :)
Wow. Thanks a lot for all these videos. I am a self-studying beginner and your videos have been a boon. Keep up the good work, man!
This channel deserves orders of magnitude more views than it has
Thank you! This is what I need for my thesis
Great Video! One minor comment: Shouldn't the loss equation from the triplet method (13:58) be the other way around? The difference that is subtracted should be between the anchor sentence and the negative sentence.
I have the same doubt.
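For reference, the triplet objective as it is usually written (for example in the Sentence-BERT paper) is max(||a - p|| - ||a - n|| + margin, 0), so the anchor-negative distance is the term being subtracted, as the comment above suggests. A minimal plain-Python sketch follows; the function names, the toy 2-D vectors and the margin of 1.0 are illustrative assumptions, not taken from the video.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(a, p) - d(a, n) + margin, 0): the anchor-negative distance is subtracted."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

# Toy vectors (assumed for illustration): the positive is close to the anchor,
# the negative is far away.
anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [0.0, 3.0]

print(triplet_loss(anchor, positive, negative))  # 0.0 -> triplet already well separated
print(triplet_loss(anchor, negative, positive))  # 3.0 -> roles swapped, large penalty
```

The loss reaches zero only once the negative is at least `margin` farther from the anchor than the positive, which is why the sign of the two distance terms matters.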
Great video. I just figured out the issue with my dataset, after I had bad results from directly using RoBERTa
Excellent overview!
Great explanation dude
Brilliant video 🚀
Great Video
Amazing overview !
Hey man, huge fan! Would you do a video about the "vanishing gradient problem"? Tbh I've been looking for a good video on it, but they're just not as on point as you are... I'd really like your explanation of this topic!
Keep up the great work
Very informative, thank you!
You are very welcome! Thanks for watching and commenting
Really the best info out. Thank you.
Masterpiece 💯
Oooooooh, this is so freaking cool!! When are we teaming up to build something?!
Dude. I will reach out ma guy (sorry i didn't before) :)
@@CodeEmporium ayyy no problemo man!
Really neat. Thank you, I was looking for nice stuff on SBERT with decent depth
Could you explain these 2 points in more detail?
3:21 transformers weren't designed to be language models + 16:35 transformers not complex enough to train a language model
1. What are language models supposed to do that transformers can't? My interpretation is that transformers do seq2seq tasks like translation, and translation needs a language model, so transformers are language models. Anything wrong with this thinking?
2. Can I say transformers were invented only to parallelize the RNN family of models with attention? Are there any other obvious general or task-specific benefits of transformers?
I guess he means that they get better through improved pretraining (and thus understand language better).
From Papers with code: "BERT improves upon standard Transformers by removing the unidirectionality constraint by using a masked language model (MLM) pre-training objective. The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context."
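The masking step described in that quote can be sketched roughly as below. This is a simplified illustration: it only swaps a chosen token for `[MASK]`, whereas BERT's actual recipe also sometimes substitutes a random token or leaves the token unchanged. The function name `mask_tokens`, the whitespace tokenization and the fixed seed are assumptions made for the example; 15% is the masking rate BERT uses.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels).

    labels[i] holds the original token where a mask was applied and None
    elsewhere, so a model would only be scored on the masked positions.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(tok)   # prediction target: the original token
        else:
            masked.append(tok)
            labels.append(None)  # not a prediction target
    return masked, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens, mask_prob=0.15)
print(masked)
print(labels)
```

Because each masked token is predicted from the words on both sides of it, the model is not restricted to left-to-right context, which is the "unidirectionality constraint" the quote refers to.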
Great stuff. Thanks
Great video!!! Can we get some project videos on Transformers? You showed text similarity with BERT in this video, so do you have any plans to create a video doing this in Python?
Yes, really needed.
The internet is lacking a complete project built with Transformers that includes proper backend details
Great video. Thanks
Nice overview
Great video. One part I didn't completely understand is the NLI part. Do you mean that after the NLI step, the mean-pooled sentence vector of the newly trained BERT is no longer "poor"? Thanks.
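For readers unsure what "mean pooling" refers to in that question: it is the average of a sentence's token vectors, skipping padding positions, which yields one fixed-size sentence vector. A minimal plain-Python sketch, where the function name, the toy 2-D token vectors and the attention mask are all illustrative assumptions:

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (real tokens, not padding)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

# Toy example: two real tokens followed by one padding position.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

print(mean_pool(token_vectors, attention_mask))  # [2.0, 3.0]
```

The pooling operation itself is the same before and after fine-tuning; what fine-tuning on sentence pairs changes is the token vectors being averaged.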
This is 🔥!!!😍😍😍😍😍
Great video, thanks a lot
Welcome:)
Great choice!
Thank you sir.
5:11 Bidirectional Encoder Representations FROM Transformers (not OF Transformers)
My favourite model to train is T5. So much better. I don't like encoder-only models. I'd rather use a model that has both an encoder and a decoder than just one or the other.
Can we use this for comparing two web articles?
this is good!
Nice
It is spelled chien, not chein
QUORAAAAA
AAAHHH