Concise and to the point! Thank You
Cheers from Hungary!
Nice explanation sir
Nice explanation... are there kids in the background?
They are students from IIT.😁
Sir, your voice changed from this lecture.
As E[mt] equals the true expected value of the gradient after bias correction, does this mean the loss will always decrease?
Not sure about the loss in this case. Bias correction is used to bring the exponentially weighted running average in line with the gradients, and that is what he proved mathematically in this lecture, i.e. the expectation of the bias-corrected exponentially weighted running average equals the expectation of the gradients.
PS: do correct me and explain if I'm wrong.
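To make that expectation argument concrete, here is a short sketch of the derivation (assuming the gradients g_i all share the same expectation E[g], which is the stationarity assumption the lecture uses):

\begin{aligned}
m_t &= (1-\beta_1)\sum_{i=1}^{t}\beta_1^{\,t-i}\, g_i \\
\mathbb{E}[m_t] &= (1-\beta_1)\sum_{i=1}^{t}\beta_1^{\,t-i}\,\mathbb{E}[g] = \mathbb{E}[g]\,\bigl(1-\beta_1^{\,t}\bigr) \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^{\,t}} \quad\Rightarrow\quad \mathbb{E}[\hat{m}_t] = \mathbb{E}[g]
\end{aligned}

So dividing by (1 - beta1^t) removes the bias toward zero that the running average has in the early steps; it says nothing by itself about the loss always decreasing.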
If your entire data set is noisy, then it won't work; however, if only some of the data is noisy and not the entire set, then yes, it may remove the noise.
The bias correction just multiplies mt by a constant, which is something the learning rate already does. And the loss doesn't always decrease; we have seen the error overshoot due to the presence of momentum. As long as E[mt] = c*E[gt], we are on the right track, as the learning rate will take care of the rest.
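A minimal numeric sketch of that point (assuming NumPy and a synthetic gradient stream with true mean 1.0; all names here are illustrative, not from the lecture): the raw running average m starts out biased toward 0, while the bias-corrected m_hat tracks the true mean from the first step, and the two converge as t grows.

import numpy as np

beta1 = 0.9
rng = np.random.default_rng(0)
grads = 1.0 + 0.1 * rng.standard_normal(50)   # synthetic noisy gradients, true mean 1.0

m = 0.0
for t, g in enumerate(grads, start=1):
    m = beta1 * m + (1 - beta1) * g           # exponentially weighted running average
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimate
    if t in (1, 5, 50):
        print(f"t={t:2d}  m={m:.3f}  m_hat={m_hat:.3f}")

At t=1, m is roughly 0.1 while m_hat is already near 1.0, which is the early-step bias the correction is meant to fix.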