If you are watching this in lockdown, you are one of the rare species on earth. Many students are wasting their time on Facebook, YouTube, Twitter, Netflix, watching movies and playing PUBG, but you are working hard to achieve something. All the best... NITJ student here.
Are you the same guy who commented on 5 Minutes Engineering?
I promise you nobody!!! I repeat, nobody!!!! can explain anything better than Krish sir, and that RNN explanation was the simplest yet best explanation I have ever seen. Just so excited to see LSTM as well.
All your content is like rubies, emeralds and gold given away for free for anybody to take; that's how I see your videos.
No words to thank you, sir!!!
You are a blessing. I usually come here when my highly talented professors in London could not make me understand this simply.
One of the best AI professors and data scientists in the country
world*
I love how you go into depth and show the mathematical expression of what is happening in each step. Definitely one of the best channels on deep learning.
By far the best explanation of the workings of an RNN I have experienced.
I liked the code basics one more!!
I have been following Krish for the last 3 years and he is one of my favourite instructors.
Dude, you're a gem, the way you made it so simple. Respect.
I feel so motivated when watching his videos. God bless you; you have huge respect, sir.
OMG!!!!! One of the best and most simplified explanations of RNNs on the internet. Thank you, sir.
This is the best explanation for how RNN architecture works. I have read multiple blogs and watched animations, but I was not clear on how the feedback on the RNN block works. Thank you so much for such a passionate presentation.
I think in the calculation of O2 and O4 it should be w' in place of w1.
The number of ads I get while watching your videos clearly shows the monopoly of your videos in ML/DS. Thank you for the videos; they help a lot.
Adblock to the rescue... use the Google Chrome extension.
Outstanding! Had you answered Shahriar's question in your video, I would say you made the perfect RNN video. Great job! Keep up the outstanding work.
I was watching the previous video just now 😃 thanks for the next video!
Anyone else try to wipe the fly on the wall off their screen?! :'D Great video BTW :)
Didn't notice until I read this comment! :P
Yes. I tried.
I wanted to thank you for your time and effort making this video. Your series is just amazing and your explanations are simply beautiful. I learned a lot from you, so thanks again for sharing your understanding.
WOW. Great. I have never seen a video on YouTube explaining the concept so clearly. Thank you.
I've watched 30 of your DL videos today and would like to thank you for them, because they were very helpful in explaining these complicated topics. In some of the videos you've made simple typing errors which might confuse people, so I would suggest you watch the videos afterwards and add annotations correcting the typos. Keep up the great work! :)
Your list of these videos is amazing! Thanks!
1st time seeing the video: nothing understood,
2nd time seeing the video: slightly understood,
3rd time seeing the video: pretty clear,
4th time seeing the video: pretty much good. Thanks, Krish bro!
Seeing it a 5th time, I have wonderful knowledge now.
Great tutorial. Keep up the good work! Eagerly waiting for the next lesson on RNN
A Grand Salute to you Man👍
That annoying fly XD
Great explanation thank you
Sir what about the bias term???
Thanks for this great video, Krish. You're awesome! I've liked and subscribed.
Thank you so much for the very wonderful explanation!! It is only now that I have understood it.
Awesome Work Sir
Thanks a ton
If you know how forward propagation works in ANN, this makes perfect sense. Great explanation as usual.
Sir, just a suggestion: you could also give some links for further or supplementary reading.
Awesome explanation 👍👏😊
I just throw a like on your videos without even seeing them first, because I know the effort behind them.
At 8:02 it should be sigmoid, as you mentioned the classification is between 0 and 1.
yes it should be sigmoid....
Amazing explanation! Seeing this after completing the Udacity course and found this to be more intuitive
Sir, don't mind, but you have some misconceptions regarding RNNs.
Is the loss function calculated for every word in the neural network?
Shouldn't the last-layer function be a sigmoid, as we are classifying only between 0 and 1?
yeah it will be sigmoid and not softmax
Superb! Finally I could understand the intuition behind RNN, great work, thanks Krish ! 👍
Can you explain this to me?? I am a bit confused. 1. In an RNN, are you using the same layer for the forward propagation, or are there many layers like in an ANN, only with the same number of neurons? 2. Are the weights given to O1, O2, O3 the same or different? Because the weights assigned to x11, x12 are the same for the forward propagation.
very detailed explanation! Awesome
Well explained video. Thank you for sharing.
I have one doubt regarding the input to the hidden layer. Krish said the text data is first converted into vectors and then fed to the RNN. So, while converting into vectors, the sequential information may be lost (as Krish said in the previous video). My question: why are we converting into vectors?
Also, please suggest some of the best techniques used here to convert text data into vectors.
Correct me if I am wrong.
A question, please: why do you pronounce the W1 of the hidden state differently? Do they have different numbers? You call it W1 (W one) at O1 & O3, but W' (W dash) at O2.
did you get the answer?
Hello Krish, you are really doing a great job for us.
I just want to make a correction: at the end of the RNN, after O4, the activation function we will be applying for binary classification is sigmoid, and if we have multi-class classification then we will use the softmax function.
Why are we using the softmax activation function, when it is used for multiclass classification? Should we use the sigmoid activation function instead?
great explanation sir.....
Thanks, great video, helped me a lot.
Thanks a lot for the video. Looking forward to seeing “logic behind CRNN for text recognition” video someday :)
Brilliant, to say the least, Krish. As you explained, this is for t=4; in your corpus you may have sentences of different sizes. How do you handle that? Will watch your next videos to find out.
Great content! One question: is w' getting updated during forward propagation? Like, is w' different at t=1, 2, and 3, or is it the same value throughout the forward propagation?
Found the answer in the next video: the weight w' is the same throughout an instance of forward propagation.
All the w' will remain the same.
@Krish Naik: Why is the weight different at the output of timestamp t4 (W'')?
I love the energy
Krish, thanks a lot for your videos on ML, deep learning and now NLP. Very helpful and vital. I have two questions: (1) How do we create PoS unigrams, PoS bigrams or PoS trigrams? (2) How do we train an ML model using these PoS unigrams, bigrams and trigrams? If you could answer, it would be very, very helpful. Thank you in advance.
- Please can you make a video explaining how probability and statistics are useful in machine learning, deep learning and AI?
- Which parts of statistics and probability are most important for us to study for this?
- Please give some elaborate examples to help us understand.
- How can they be implemented in these kinds of learning and in AI?
Please make a video on that.
By the way, love your videos 😘😘
Let's take an example:
If you are a civil engineer and you have to construct a dam, you have collected 200 samples from a dataset of past rainfall in that city. Being an engineer, your first question will be: what is the probability that the total rainfall in centimeters exceeds a danger mark like 20 cm, so that you can build the dam accordingly? Simply put, you have to find P(X >= 20).
Solution:
Here you will use probability and statistics.
You have the sample data, and you will estimate parameters like the mean and variance from it.
And once you have the mean and variance, you can plot the PDF/CDF graphs, and just by looking at them you can tell P(X >= 20) = 1 - P(X < 20).
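A minimal sketch of that calculation, assuming the rainfall is roughly normally distributed (the samples here are synthetic, purely for illustration):

```python
# Estimate P(rainfall >= 20 cm) from 200 samples, assuming normality.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
samples = rng.normal(loc=15.0, scale=4.0, size=200)  # hypothetical past rainfall (cm)

mu, sigma = samples.mean(), samples.std(ddof=1)      # parameters estimated from data

# Tail probability via the CDF: P(X >= 20) = 1 - P(X < 20)
p_danger = 1 - norm.cdf(20.0, loc=mu, scale=sigma)
print(f"Estimated P(rainfall >= 20 cm) = {p_danger:.3f}")
```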
You are such a great teacher - thank you so much !! #StayBlessednHappy
Sir, please explain how W is different from W'???
You have used W and W' alternately while calculating the output at each time step... why is that???
The weights are not required to change during forward propagation, i.e. the weights will be the same for all the iterations.
@@chitneedihemanthsaikumar7511 That means for every O1, O2, O3, O4 the weights are the same? (x11*w, x12*w, x13*w, x14*w) Are all these w's the same?
Pls reply
I have the exact same question.
@@Mrcrownjk Both the weights w and w' are constant during forward propagation. As said in the video, the first time they are initialized using some initialization technique, and then they get changed appropriately depending on the loss function during backpropagation.
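To make this thread concrete, here is a minimal NumPy sketch of the shared-weight forward pass; the names, shapes, and initialization are illustrative assumptions, not taken from the video. It also shows the usual answer to the "output 0" question asked below: the initial state is simply a zero vector.

```python
# Shared-weight RNN forward pass: the same W and W_rec are reused at every timestep.
import numpy as np

rng = np.random.default_rng(42)
embed_dim, hidden_dim, timesteps = 8, 16, 4                   # e.g. a 4-word sentence

W = rng.normal(scale=0.1, size=(embed_dim, hidden_dim))       # same w for x1..x4
W_rec = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # same w' for o1..o3
xs = [rng.normal(size=embed_dim) for _ in range(timesteps)]   # word vectors

h = np.zeros(hidden_dim)            # "output 0": nothing before the first word
for x in xs:
    h = np.tanh(x @ W + h @ W_rec)  # identical weights at each step; only
                                    # backpropagation ever changes W and W_rec

print(h.shape)                      # final state o4, fed to the output layer
```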
Thank you sir
Shouldn't we use 'sigmoid' as the activation function at the last layer? Or can 'softmax' also be used??
For classification we usually use sigmoid.
@@krishnaik06 Even for binary classification??? I have seen people use 'sigmoid' for binary classification and 'softmax' for multi-class classification. That's why I'm asking, Krish! Correct me if I am wrong.
@@kishanlal676 Yes, I think you can use softmax even for binary classification. But you'll have to make sure that your class labels are one-hot encoded.
The output of softmax is always the set of probability scores associated with each class, and these values are improved via a loss function through backpropagation.
So it can be used for both multi-class and binary classification.
Whereas sigmoid can only be used for binary classification because it produces a single squashed value. The output of sigmoid is a value between 0 and 1, which is then thresholded to a 1 or a 0.
Correct me if I'm wrong.
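A quick numerical check of the claim in this thread (with an arbitrary logit value): for two classes, a softmax over logits [z, 0] gives exactly sigmoid(z), which is why sigmoid is the two-class special case of softmax.

```python
# Verify: softmax([z, 0])[0] == sigmoid(z) for a binary problem.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    return e / e.sum()

z = 1.7
print(sigmoid(z))            # P(class 1) from a single sigmoid output
print(softmax([z, 0.0])[0])  # the same probability from a 2-way softmax
```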
07:58 sigmoid at the last layer
Waiting for the next RNN video... please upload... it's great!
Can you please explain why all the weights for the inputs x11, x12... are initialized the same (w), and why all the weights from the hidden layer to the output layer (w1) are the same for all the connections? I mean, can't we initialize all the weights as different random values so that they are not the same? (What I mean is, say we initialize weight w1_ for the first input, w2_ for the second and so on, and also initialize weight w1 for the first output, w2 for the second and so on.) What could possibly be different? (Or would it actually give us similar outputs, since backward propagation will fix that accordingly anyway?)
And I also want to know what happens if we don't use layer 0 (output 0), because there's nothing before the first word, which means there is no previous sequence either.
My question is, this diagram was in the context of one hidden layer only. How will the input move if we have more than one hidden layer?
It's not one hidden layer, it's the whole neural network!!!
Hi, I have a doubt regarding the topic, please see it through.
1st: when we find the weight w, do we have to find it only once (since the weight is shared)?
2nd: can you please tell how we calculate/update the weight w'?
Please reply, thanks.
For binary classification, the final activation function is sigmoid, and not softmax
You can use the softmax activation for binary classification instead of sigmoid; sigmoid is a special case of softmax.
Sir, we have to add bias, haven't we!
Why didn't you consider the bias of every hidden layer of the network in forward propagation?
claps to you!! you are amazing!!
I think the weights of the previous info are a constant W, and the weights of the current inputs are altered in backpropagation... Krish, please confirm.
Sir, please make a video on how to identify plant diseases using neural networks.
Real knowledge exhibited in the chalk-and-talk method 👋
Nicely explained.
Thanks Krish
Are the weight w and the output weight w' the same, or are they different?
Why is Loss calculated as yhat-y instead of y-yhat?
Can't thank you enough. Love from Pakistan.
Sir, can you make a playlist on text-to-speech with CNNs from scratch?
So basically the number of neurons in an RNN is always the same, right????
Hello, I wonder if it's possible to output a vector like [0, 1, 0] at the end... thanks.
The sigmoid value ranges between 0 and 1... basically it converts the output value to between 0 and 1, and then if the value is < 0.5 we consider it as 0, and if it is > 0.5 we consider it as 1... this is how sigmoid works.
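A tiny sketch of that thresholding (the raw outputs here are made up):

```python
# Squash raw outputs with sigmoid, then threshold at 0.5 to get class labels.
import numpy as np

z = np.array([-2.0, 0.3, 4.1])      # hypothetical raw outputs from the last layer
probs = 1.0 / (1.0 + np.exp(-z))    # squashed into (0, 1)
labels = (probs > 0.5).astype(int)  # threshold at 0.5
print(probs, labels)                # approx [0.12 0.57 0.98] -> [0 1 1]
```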
Very Helpful Sir
Great.
Hello sir, first of all, thanks for all the videos you have provided for us, but could you update the playlist with the neural network videos we need to watch before starting with RNNs?
Thank you.
The last activation function should be sigmoid, not softmax, since it's basic sentiment analysis (only two possible outputs).
It depends on the number of classes. I mean, many datasets for sentiment analysis have four classes, and then softmax should be the activation...
Yes, you are right... he meant the same; it's just a small human error.
Thank You So Much!!!!
Good explanation!
Is that circle thing inside the hidden layer an RNN neuron?
Yes, consider it a neuron......
Are the weights associated with each preceding output (w') also initialized, fixed, and the same for all, like the weights for each Xi1 input?? In simpler words, just as w is the same for all X, is w' also the same for all O?
I think it will be the same for all O. Even if it's not, ultimately all the weights would be driven to their optimum values in backpropagation. So, just for mental satisfaction, think of it this way...
@@vkaspainkra5107 haha, yes makes sense. Thanks a lot 😁👍
Sir, can you do a video with an RNN example using numerical values?
you are amazing
Awesome
Is it a single hidden layer or 4 hidden layers ?
Is the w' value identical for all the layers?
@Krish Naik Do the weights that are applied to each output layer (w1, w', w'') have to be different? If so, it seems inefficient: the network would have to update many weights for a sentence with many words.
The weights are the same for every RNN cell. I studied this in an MIT course.
Sir, are W dash and W one the same?
You mean in a hidden layer the output of one neuron is the input of the next neuron 🤔
Hey, if possible can you also tell whether the number of hidden layers is related to the sentence length? You were very concise on the other aspects, which I really appreciate. I just had that doubt; if possible, do clarify. Thanks for the amazing video again.
Yes, the number of unrolled steps is based on the number of words; the same layer is applied once per word.
Krish, Good job man.
Would you please make a video for BERT? Thanks!
Sigmoid function should be used instead of Softmax
Sir, what will be the output dimension of O1?
Can anyone explain why the bias is not included in the RNN calculation?
Please share more links for more in-depth knowledge.
Why are w1 and w' different?
I can't see where the NLP videos are; please, can anyone help?