Best explanation I've seen on NAGD
Bro, I'm watching this years after graduating; props for the excellent explanation.
Best explanation indeed for NAGD. I completely understood the derivation, especially the look-ahead term.
Great explanation, sir! Especially the last 5 minutes of the explanation.
This is pure gold !
Excellent explanation! Thank you.
You explain things so clearly
Amazing video, nicely explained
Man, you are really a good teacher. Thank you.
Notes to self: Nesterov momentum is just look-ahead GD; we first move by the amount suggested by the history, then calculate the gradient and make the final move.
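A minimal sketch of that note on a toy loss f(w) = w², in the course's Python style; grad_f, eta, and gamma here are illustrative names, not the lecture's code:

```python
def grad_f(w):
    # toy loss f(w) = w**2, so the gradient is 2*w
    return 2 * w

w, update, eta, gamma = 5.0, 0.0, 0.1, 0.9

for t in range(50):
    w_lookahead = w - gamma * update                     # 1. move by the history
    update = gamma * update + eta * grad_f(w_lookahead)  # 2. gradient at the look-ahead point
    w = w - update                                       # 3. final move

print(w)  # converges toward the minimum at w = 0
```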
Best explanation
Dude, it was an excellent explanation, which I was looking for, of course. 👍
Wow, awesome. Why go to Coursera or edX and deal with accents when our professors are so good 😂 Thank you, sir.
Very good explanation 😀
Awesome lecture
Great vid
Very, very good! Thanks:)
What an awesome explanation. Now I gotta subscribe to One Fourth Labs.
Excellent explanation. I have one observation about the content in the slide versus the code you shared. From the slide, initially w_lookahead = w_t - gamma * update_{t-1}, but in the code v_w = gamma * prev_v_w, i.e., w_lookahead = gamma * prev_v_w, so I'm getting confused here.
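On that confusion: the two are consistent. In the code, v_w = gamma * prev_v_w is only the partial (history) step, and the look-ahead point is w - v_w, which is exactly the slide's w_lookahead = w_t - gamma * update_{t-1}; v_w stores the shift, not the look-ahead point itself. A runnable sketch in the spirit of the course code (the sigmoid-neuron toy data and the grad_w/grad_b helpers are a reconstruction, not a quote of the original):

```python
import numpy as np

# Toy data and a sigmoid-neuron squared-error loss, to make the sketch runnable.
X = np.array([0.5, 2.5])
Y = np.array([0.2, 0.9])

def f(w, b, x):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def grad_w(w, b, x, y):
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx) * x

def grad_b(w, b, x, y):
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx)

w, b, eta, gamma = -2.0, -2.0, 1.0, 0.9
prev_v_w, prev_v_b = 0.0, 0.0

for epoch in range(1000):
    dw, db = 0.0, 0.0
    # partial update: move by the history alone
    v_w = gamma * prev_v_w
    v_b = gamma * prev_v_b
    for x, y in zip(X, Y):
        # gradient is evaluated at the look-ahead point (w - v_w, b - v_b),
        # which is the slide's w_lookahead = w_t - gamma * update_{t-1}
        dw += grad_w(w - v_w, b - v_b, x, y)
        db += grad_b(w - v_w, b - v_b, x, y)
    # full update: history plus gradient at the look-ahead point
    v_w = gamma * prev_v_w + eta * dw
    v_b = gamma * prev_v_b + eta * db
    w, b = w - v_w, b - v_b
    prev_v_w, prev_v_b = v_w, v_b
```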
Thank you !
Nice explanation!
At 1:11, shouldn't it be w(t+1) = w(t) + gamma*update(t-1) - eta*gradient? @NPTEL
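For reference, the update in the lecture's notation (reconstructed, not quoted from the slide) is usually written as:

```latex
\begin{aligned}
w_{\text{lookahead}} &= w_t - \gamma \,\text{update}_{t-1} \\
\text{update}_t &= \gamma \,\text{update}_{t-1} + \eta \,\nabla f(w_{\text{lookahead}}) \\
w_{t+1} &= w_t - \text{update}_t
\end{aligned}
```

so the new weight is the old weight minus both the history term and the gradient taken at the look-ahead point.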
Holy smoke!!
Such a great video! Thanks a lot! If you could share a GitHub repo with the code and the simulation, it would be really nice, to be able to try it faster without rewriting. I would also appreciate it if you guys could show how exactly the "update" vector is formed, and how NAG appears in the computation graph of the whole NN.
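On seeing NAG inside a whole network: most frameworks expose it as an optimizer flag rather than an explicit node in the graph. A minimal PyTorch sketch (the network shape and data here are illustrative, not from the video):

```python
import torch
import torch.nn as nn

# Tiny network; Nesterov momentum is applied per parameter by the optimizer,
# not drawn as a separate node in the computation graph.
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)

X = torch.randn(64, 2)   # illustrative random data
Y = torch.randn(64, 1)

for epoch in range(100):
    opt.zero_grad()                  # clear old gradients
    loss = loss_fn(model(X), Y)      # forward pass builds the graph
    loss.backward()                  # backprop through the whole network
    opt.step()                       # Nesterov-style momentum step on every weight
```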
Amazing video, first time on NPTEL.
best
24k Gold for free.
Sorry, this explanation is too vague to make sense: it reveals nothing about how or why the method works. For instance, the method works provably well only for very specific values of gamma and eta, whereas this explanation would "prove" that it works for a large range of values of these parameters.
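For context, the classical guarantee behind that point: for an L-smooth convex f, Nesterov's method with step size 1/L and momentum schedule (t-1)/(t+2) achieves an O(1/t²) rate, e.g. in the standard textbook form (not from the video):

```latex
y_t = x_t + \frac{t-1}{t+2}\,(x_t - x_{t-1}), \qquad
x_{t+1} = y_t - \frac{1}{L}\,\nabla f(y_t), \qquad
f(x_t) - f^\star = O\!\left(\frac{1}{t^2}\right)
```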
worst explanation