I can't even imagine the amount of effort you put into making these videos. When I have to explain something worth 10 minutes to my friends, I have to prepare for hours. You make these long, well-researched videos. I really can't imagine the effort you put in. You are really a gem. Thank you very much for doing this.
Exactly true, bro... Personally, I have never seen such an informative channel on TH-cam that gives detailed information on machine learning topics. After watching the 100 Days of Machine Learning series, my ML knowledge has really been boosted. Thank you, sir, for everything. I hope you will make a video playlist on image processing: take an image, apply filters, and process it.
Yes bro
Yes
Looks like 48 days of information in 43 minutes. Salute to you, man!
So many blogs and other videos are contradictory and confusing... until Nitesh sir explained it visually and simply. Thanks a ton 😊
Thanks a ton! This is a criminally underrated channel!
Oh my God! Sir, salute to you 🔥 You leave no questions in my mind; your videos are in so much detail... Hats off to you, sir!
By far the best explanation of BN I have ever seen. Thanks a lot for such a tremendous effort. Now I will surely go through all your videos about NNs.
Just can't believe that someone can explain this complex topic in such a smooth way.
I have been learning data science for a year now, and your channel is the one place where all my queries, errors, and concepts get cleared. Your channel is an institute in itself; it has everything needed to become a data scientist. You are doing a great job, so please keep it up, sir. Thank you very much for putting in so much effort to teach in depth. One request: please make a video on time series analysis 🙏
Amazing clarity and explanation... by far the best channel on the subject. Take a bow, Nitish! Thanks a ton.
Amazing explanation. Nowhere else have I found such a clear explanation. Thank you very much.
Blessed are you, guru; you explained it very clearly.
Thank you so much, sir. You are a true gem! I'll never forget your contribution to my ML/DL journey.
Most underrated channel.
Sir, you are the best teacher in this world of data science.
You are amazing, sir 😊 Thank you for presenting such difficult content in such a precise manner. You have Ma Saraswati's blessing for grasping and spreading knowledge 🙏
The code is simple thanks to the Keras library, but the batch normalization concept, my God, is so deeply involved and interesting... sir 🙂
I'm so happy he is gaining momentum in subscribers 👌👌👌
It was an amazing video... everything explained so nicely... great... thank you!
Sir, please make videos on optimizers; they are very hard to understand. You're doing great work, so thanks.
Your effort really shows in your teaching and videos. Keep it up. You have gained a new subscriber. ⭐
You are doing the best job, sir, at least for me.
Your way of teaching is very good, sir ji. Thank you so much!
Please upload the lecture on CNNs as soon as possible, sir ji, please, please!
Thanks, sir. That was a very informative and detailed discussion.
Thanks a lot for the wonderful explanation!
All I can say is, I wholeheartedly appreciate your effort! Just one thing though: if your videos were in English, people across the globe could benefit. Thank you!
Awesome, yaar!
You are very good at making anyone understand anything. But bad luck 💔, you are underrated.
Keep up your work, and Allah will help you.
This is amazing stuff, but the challenge is to keep all the concepts in my head. Another point: you say to normalize the activation Z11, but Z11 is the pre-activation output of a particular node. So I think you meant: first normalize that output, then multiply by gamma and add beta, and then pass the result through the activation function. I got a little confused here.
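If it helps anyone, here is that order in a tiny NumPy sketch (the numbers and gamma/beta values are made up, just to illustrate):

import numpy as np

z = np.array([1.2, -0.7, 0.3, 2.1])        # pre-activation outputs of one node over a batch
z_hat = (z - z.mean()) / (z.std() + 1e-5)  # step 1: normalize
gamma, beta = 1.5, 0.1                     # step 2: learnable scale and shift (made-up values)
y = gamma * z_hat + beta
a = np.maximum(0, y)                       # step 3: pass the result through the activation (ReLU, as an example)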
Hi bro, please finish the interview and NLP playlists, covering topic modelling, NER, CRF, encoders, transformers, chatbots, etc. Also, please start deployment/MLOps.
Great, sir, keep uploading 🙏🙏🙏 After finishing DL, please complete NLP.
This channel is so underrated.
Sir, if you ever start paid courses, I will be the first one to buy them.
I love you man, what a lecture!!
Great explanation of a critical topic.
Great video. Not only did you explain batch normalization with an implementation, but you also clarified why normalization is useful in general.
It's a great lecture, probably the best one. Just some minor things in the maths bothered me, so I'm writing them down for anyone else who is watching.
20:02: I think the bias term should be a 2x1 matrix (2 rows, 1 column), instead of the (1,2) stated above?
Update: something is off in the maths logic here; matrix multiplication is not commutative, i.e. Wᵀ·X is not equal to X·Wᵀ. In the video, (4,2)·(2,2) corresponds to X·Wᵀ.
Another apparent flaw: you can't add two matrices with different dimensions; in the video it's (4,2) + (1,2). This only works via broadcasting, where the (1,2) bias row is repeated across the 4 rows.
My interpretation: it takes a single row and calculates a1, then another point and calculates a2; in the end we have four numbers at node 1, and then we can calculate the mean (μ) and std (σ) at node 1 over these four points. The process repeats like a loop for each node, for the given batch size (4 here).
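Here is a minimal NumPy sketch of this interpretation (weights and inputs are random; the point is only to check the shapes):

import numpy as np

X = np.random.randn(4, 2)           # batch of 4 samples, 2 input features
W = np.random.randn(2, 2)           # weights: 2 hidden nodes x 2 inputs
b = np.random.randn(2)              # one bias per hidden node

Z = X @ W.T + b                     # (4,2) @ (2,2) -> (4,2); the bias row broadcasts over the 4 samples
mu = Z.mean(axis=0)                 # per-node mean over the batch, shape (2,)
sigma = Z.std(axis=0)               # per-node std over the batch, shape (2,)
Z_hat = (Z - mu) / (sigma + 1e-5)   # each node's 4 values now have mean ~0, std ~1

So the (4,2) + (1,2) step in the video works through broadcasting, not plain matrix addition.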
Amazing video on BN, sir!
Great explanation
Goated Tutorial
Thank You Sir.
Isn't the mean 0 and SD 1 after standardization? Normalization is when the values are scaled to between 0 and 1, right? I am a little confused by the terms.
I am trying to make sense of the same.
great job bro
Thank you
Could you explain the "disharmony between dropout and batch normalization" and suggest a good solution?
Thanks sir
34:05: I had a lot of trouble understanding how the EWMA works (I came back after taking a lecture on EWMA and messing around with ChatGPT). This is what my tiny brain could grasp; kindly correct me wherever I'm wrong, as I'm still confused:
Q1: How is the EWMA calculated after each batch?
Solution: with alpha = 1 - beta,
EWMA_now = (1 - alpha) * EWMA_last_batch + alpha * MEAN_of_this_batch
so you get a new EWMA after each batch; you take the previously calculated EWMA and move forward with it.
For the test set:
standardized_value = (new_data_point - mean or EWMA) / σ (if you have it) (formula from ChatGPT)
Q2: I was confused about which std to use during testing, since a std is computed per node.
Here is ChatGPT's answer:
During testing, use the per-node standard deviations tracked during training for the nodes in your network, to ensure consistent and accurate normalization.
Note: it also said that if you don't have the standard deviation explicitly calculated during training, you would use only the mean/EWMA values (if you calculated them during training) for standardization during testing.
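To make this concrete, here is a minimal sketch of the running-statistics update as I understand it (variable names are mine, and momentum = 0.9 is just an assumed value):

import numpy as np

momentum = 0.9                 # the "beta" in the EWMA formula (assumed value)
running_mean = np.zeros(2)     # one running mean per node (2 nodes here)
running_var = np.ones(2)       # one running variance per node

# during training, after each batch of pre-activations batch_Z (shape: batch_size x nodes):
def update_running_stats(batch_Z, running_mean, running_var):
    running_mean = momentum * running_mean + (1 - momentum) * batch_Z.mean(axis=0)
    running_var = momentum * running_var + (1 - momentum) * batch_Z.var(axis=0)
    return running_mean, running_var

# at test time, a new point is normalized with the stored running stats, never with batch stats:
def normalize_test(z, running_mean, running_var, eps=1e-5):
    return (z - running_mean) / np.sqrt(running_var + eps)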
Hello sir. Firstly, thank you for teaching such great content; I highly appreciate your efforts. I had one doubt (34:05): why do we use the EWMA of the training-batch means and SDs during testing?
Sir, we request you to please complete the machine learning interview questions playlist as soon as possible.
Campus placements are starting from 25 July.
How was your placement?
Nitesh - Do the weight initialization concepts apply to gamma and beta also? And are techniques like SGD used to optimize the gamma and beta values too?
Hi sir, I have been following you on TH-cam for a couple of years... Is there a chance you will make a 100 days playlist on image processing? A lot of companies are looking for people with expertise in image processing.
WOW, what an explanation!
The CSV file is not available in the directory.
I went through the paper on He initialisation. There they proved that once we initialize the weights according to the given formula, the inputs and outputs (Z) have the same variance, and this is preserved during the backpropagation step as well. So if we just normalise the inputs, there would be no need for batch normalisation, right? Kaiming He initialisation should be sufficient.
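Out of curiosity, here is a quick NumPy check of that variance claim (a toy forward pass, not from the video): with He-initialized weights and ReLU, the pre-activation variance stays roughly constant across layers. Note, though, that this only holds at initialization; once training moves the weights, the activation statistics can drift again, which is part of why batch normalization still helps.

import numpy as np

n, batch = 512, 10000
z = np.random.randn(batch, n)                    # first layer pre-activations, unit variance
for layer in range(10):
    a = np.maximum(0, z)                         # ReLU
    W = np.random.randn(n, n) * np.sqrt(2 / n)   # He initialization
    z = a @ W                                    # next layer's pre-activations
    print(layer, z.var())                        # stays close to 1.0 across layers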
Thanks for the great video :)
Good explanation. Keep it up
31:44: Why is this needed? If I have trained my network with batch normalization, then I will use the same weights and biases at test time.
Really Great
Your teaching is damn good, sir.
How come it's called normalization? Could anyone please clarify this for me? Mean 0 and standard deviation 1 means standardization.
17:20
At 5:13, can you please explain what the contour plots are of?
Loss functions.
Can you share the OneNote you have used? It would be very helpful.
Please finish NLP playlist
If we have an activation function that already produces output in a normalized range, then why do we need to add batch normalization on top of that?
LeakyReLU, ELU, and SELU do not have a fixed range, i.e., they are unsaturated, so batch normalization will help there.
Amazinggggg
finished watching
❤❤❤❤❤
best
Sir ji, pay a little attention to the thumbnails; a good thumbnail makes a big difference in the view count, even for long videos 🧐
Sir, please share the concentric circles dataset.
⁉🤔 Sir, I have a doubt about batch normalization: the trainable parameters increase (gamma and beta), and it also calculates the mean and standard deviation over all values in the batch. So my question is:
Doesn't this increase the computation time during training?
🙏
Hi sir, can you please clarify my doubt here?
The dense hidden layer parameters come out as (512*500)+500.
I also got BN_1: 500*4. What is this 4, sir? According to me, it's the number of learnable parameters per node.
Can you please confirm?
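Update, in case it helps others: the 4 comes from BatchNormalization keeping 4 values per node, namely gamma and beta (trainable) plus the moving mean and moving variance (non-trainable). A small Keras sketch (assuming the same layer sizes as above) that reproduces the counts:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(512,)),
    tf.keras.layers.Dense(500, activation='relu'),  # params: 512*500 + 500 = 256500
    tf.keras.layers.BatchNormalization(),           # params: 500*4 = 2000
])
model.summary()  # BN shows 1000 trainable (gamma, beta) and 1000 non-trainable (moving mean/var) parameters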
How many more days will it take to complete deep learning?
Please share the DL OneNote.
This is basically standardization, right? Then why is it called batch normalization?
Sir, the deep learning videos come out very late.
Yes, bro, I asked sir in a live session; he said the videos require research, which is why they come late.
Then that's even more impressive!
Brothers, it honestly takes a lot of effort. I'm not joking 😪
@campusx-official Thank you so much, sir, for the effort you are putting in for us.
That means a lot
And sorry if that was offensive.
@campusx-official Yes sir, I was just mentioning it.
Sorry to Aditya Mishra ji if it came across wrong.
3:55: mean = 0, does that mean normalization or standardization?
Thanks in advance
Same doubt... it should be standardization, though.
Standardization
💎💎💎💎💎
You are damn good...