Best lecture on the quasi-Newton method that I have found on the internet so far!
Using LaTeX generated equations like a boss. Thank you sir Ahmad !
2:50 the animations are very nice. Thank you for taking time to record the lecture.
Hello Ahmad. Many thanks for your support! To be honest, I don't know much about gradient methods. I often use search-based optimization methods in my research such as GA, PSO ...
Ten minutes of this video explains better than an hour of lecture in the course I’m taking🤣 thanks for saving my brain!
Honestly, this guy is incredible. He explains everything so precisely and efficiently, without any unnecessary information. Thanks a lot for this video. You made my life easier.
It's rare for a less-viewed video to give the best explanation. Your presentations are almost on par with 3Blue1Brown or Khan Academy! I don't know why this video has so few views!!
To find this whole course freely available on YouTube is such a gift. Seriously, you cover a LOT of ground.
I am a PhD student and I will be using optimization methods in my research.
I've known this man only for 40 minutes, but I feel like I owe him 40 decades of gratitude. Thank you for this awesome tutorial!
I'm here from yesterday's 3b1b video on Newton's method for finding roots, after wondering if there's any way to use it for minimizing a function, mainly to see why we can't use it instead of Stochastic Gradient Descent in linear regression. Turns out the Hessian of a function of many variables can be large and computationally expensive, and if the function isn't well approximated by a parabola, the step can take you far away from the minimum. Still, it was nice to see how the method works in practice, and you mentioned the same points about Hessians too. Good job 😊👍
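To make that computational-cost point concrete, here is a rough sketch of my own (not from the video; the data and the names X, y, w are made up) comparing the per-step work of a gradient step and a Newton step on a least-squares regression loss:

```python
import numpy as np

# Toy least-squares problem: f(w) = 0.5 * ||X w - y||^2 (made-up data).
rng = np.random.default_rng(0)
n, d = 1000, 50
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = np.zeros(d)

# Gradient descent step: only needs the gradient, roughly O(n*d) work.
grad = X.T @ (X @ w - y)
w_gd = w - 0.01 * grad

# Newton step: needs the d x d Hessian and a linear solve, roughly O(n*d^2 + d^3) work.
H = X.T @ X                          # Hessian of the least-squares loss
w_newton = w - np.linalg.solve(H, grad)
```

For d in the millions (as in deep learning), forming and solving with that d x d Hessian is what makes the full Newton step impractical.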
I have been watching your videos regularly and they are very informative. Thank you for taking the time to enlighten us. Would you mind making videos on conventional optimization methods like conjugate gradient methods?
Understandable with an example, unlike those explanations that go on at length using only the matrix formula. Thank you 🙏✨
I can't believe these types of courses are free here; it's amazing how education has changed.
He did all this hard work and put it on the internet for free. He doesn't get much in return, but what he gets is RESPECT and credit for bringing new aspiring engineers into the world.
Dude, I'm less than 2 minutes in and I just want to say thank you so much for creating this absolute monster of a video.
Your explanation is awesome. The extension from the root-finding scenario to the minimum-finding problem was exactly my question.
Hats off ! Ahmad I have no words to let you know how grateful I am for this free course, it is not only well designed but also easy to follow, God bless you.
Thanks so much for posting!!
Ahmad can really keep you hooked with the way he explains things. What a legend.
I can't even imagine how long it took to complete this video. Thanks a ton for your effort.
This course has literally changed my life. Two years ago I started learning optimization from this course, and now I am a software engineer intern at a great startup. Thanks Ahmad!
Just finished watching and following along with this. Ahmad, thank you so much! It took me about 12 hours to actually get through it cus I kept pausing and going back to make sure I got all the things right.
ABSOLUTELY LOVE your 40 minute video series... Thanks a lot Ahmad :)😍
This guy is the most underrated youtuber on planet earth.
Can we just take a moment to appreciate this guy for providing this type of content for free ? great help, Thank you sir! 🙏🙏🙏
Thank you so much for the wonderful series of videos. Can you please make a video on solving a bi-level optimization problem with a number of variables using different optimization solvers, like GA, etc.? It would be very much appreciated.
What an absolutely epic contribution to the world. Thank you!
Gorgeous tutorial! I had never even seen the Python interface in my life before, but with the help of your videos I feel like I understand a lot.
The way you explain this is so helpful - love the comparison to the linear approximation. Thank you!
This is wonderful!
This guy sat for about an hour and talked about Newton in one video, and then released it for free. Legend.
I really appreciate your precious effort, not to mention how fun and friendly it is to learn from. Thanks, Prof. Ahmad.
Excellent video, it really helped me understand the quasi-Newton method, thank you very much!
Super clear explanations and very well put together. Thank you!
HOLYYYYY FKKK !!!! I really wish I had come across your video much earlier, before I took the painful route to learn all this… definitely a big recommendation for all the people I know who are just starting optimisation courses. Great work !!!!!
Wonderful video clarifying how Newton's method is used to find the minima of a function in machine learning.
I love your videos! Having learnt all this in my GCSEs / A levels, I'm just rewatching it four months after my exams.
This course is extremely useful. Thanks a lot. You did a great job!
Thank you, Ahmad, for the time and effort you put into making this marvellous tutorial. Much, much appreciated!
Thank you for the amazing optimization algorithms tutorial! We appreciate your time and the effort to teach us coding 😃
man, perfect explanation. clear and intuitive!
Incredible work as usual. Congratulations on the whole video.
Sir your way of explaining is really good.
Thanks for this tutorial. Awesome explanations perfect for beginners and experts.
Really appreciate your course! Your tutorials are always so helpful.
Hi Ahmad, how are you doing? Thank you so much for your videos. Personally, they have been very eye-opening and educational. This might be a far-fetched request: as a graduate student, your videos have been very helpful, especially with implementation, which is missing in classes, but I'd like to know if you have any plan for a full-blown project implementation on any of your playlists, be it ML or Math Optimization. Thank you
Hello Gaffar, I'm doing well, I hope you are as well.
I'm very glad you found it useful. As a matter of fact, this is a great idea. I will give it some deep thought and then act accordingly. Thank you for your idea :)
Amazing explanation! This is very helpful for understanding. Thanks a lot, sir.
Wow! This is amazing work, man. Thank you.
This was such an awesome explanation, so grateful, thank you.
Loved the graphical presentation
Thank you for the words of encouragement, I appreciate it!
Thank you very much, it was so helpful. Can I get the PDF version?!
This is brilliant, thank you. I hope you give us more visual insight into calculus-related things.
Amazingly presented, thank you.
Brilliant explanation, thank you so much.
Thank you very much for your suggestion! I will try my best.
Thanks for posting these videos. They are quite helpful. So, to ensure that we minimize and not maximize, is it sufficient to ensure that the Newton step has the same sign as (i.e., goes in the same direction as) the gradient? Is it OK to just change the sign of the step if that's not the case? (My experiments seem to indicate it's not, but what should be done then?)
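One common safeguard (just a sketch of the standard textbook idea, not something from the video; f, df, d2f are placeholder callables you would supply) is to check whether the Newton step is a descent direction, flip or replace it when it is not, and then pick the step length with a backtracking line search, since flipping the sign fixes the direction but not the length:

```python
def safeguarded_newton_step(f, df, d2f, x, alpha0=1.0, beta=0.5, c=1e-4):
    """One 1-D Newton step with a descent check and a backtracking line search."""
    g, h = df(x), d2f(x)
    p = -g / h if abs(h) > 1e-12 else -g      # raw Newton direction (or gradient fallback)
    if p * g > 0:                              # points uphill (happens when f''(x) < 0) ...
        p = -p                                 # ... so flip it into a descent direction
    alpha = alpha0                             # the length of a flipped step is arbitrary,
    while f(x + alpha * p) > f(x) + c * alpha * g * p:  # so backtrack until f decreases
        alpha *= beta
    return x + alpha * p
```

In higher dimensions the analogous fix is to modify the Hessian (e.g., add a multiple of the identity) so that it becomes positive definite before solving for the step.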
Very nice. Thank you for your insights, Ahmad. Always pleased to watch your content.
This was exactly what I needed, thank you!
What the what?! Even I understood this. Killer tutorial!
Superb, excellent, best video!
Ahmad, you should write a book; it'll be really helpful for literally a lot of people out there.
Illuminating! Thank you
Thank you so much for the amazing lecture.
Wow, this looks like a great course! 😀
Amazing video. Looking forward to more.
This was actually quite helpful :)
lovely explanation 🤩🤩🤩🤩🤩🤩
Another problem is that with negative curvature the method climbs uphill. E.g., ML loss functions tend to have a lot of saddle points, which attract the method, so gradient descent is used instead, because it can still find the downhill direction away from the saddle.
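A tiny made-up example of that saddle attraction (my own sketch, not from the video): for f(x, y) = x^2 - y^2, the pure Newton step jumps straight onto the saddle at the origin, while a gradient step moves the y-coordinate away from it:

```python
import numpy as np

# f(x, y) = x**2 - y**2 has a saddle point at the origin.
grad = lambda p: np.array([2 * p[0], -2 * p[1]])
hess = lambda p: np.array([[2.0, 0.0], [0.0, -2.0]])

p0 = np.array([1.0, 0.5])

# Pure Newton step: solves H d = -grad and lands exactly on the saddle (0, 0).
newton = p0 - np.linalg.solve(hess(p0), grad(p0))

# Gradient step: the y-coordinate grows, i.e. it escapes the saddle downhill.
gd = p0 - 0.1 * grad(p0)

print(newton)  # [0. 0.]
print(gd)      # [0.8 0.6]
```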
Thanks for the amazing lecture
Amazing lecture! Muchas gracias!
Sure. Consider the quadratic approximation f(x) ~ f(xk) + f'(xk) (x - xk) + 1/2 f''(xk) (x-xk)^2 at the bottom of the screen at 7:06. To minimize the right hand side, we can take the derivative with respect to x and set it to zero (i.e., f'(xk) + f''(xk) (x - xk) = 0). If you solve for x, you get x = xk - 1 / f''(xk) * f'(xk).
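For anyone who wants to check that algebra mechanically, here is a small sketch using SymPy (assuming it is installed; the symbols g and h stand in for f'(xk) and f''(xk)):

```python
import sympy as sp

x, xk, fk, g, h = sp.symbols('x x_k f_k g h')   # g = f'(x_k), h = f''(x_k)

# Quadratic model from 7:06: f(x) ~ f(x_k) + f'(x_k)(x - x_k) + 1/2 f''(x_k)(x - x_k)^2
model = fk + g * (x - xk) + sp.Rational(1, 2) * h * (x - xk) ** 2

# Minimize the model: set its derivative w.r.t. x to zero and solve for x.
x_new = sp.solve(sp.Eq(sp.diff(model, x), 0), x)[0]
print(sp.simplify(x_new - xk))   # -g/h, i.e. the Newton step -f'(x_k)/f''(x_k)
```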
Very nice and clear explanations
Hi Dr. Ahmad. As far as I know, these methods are used in machine learning, where gradient descent is a classic algorithm for finding the minimum of a function (not always a zero). If you know the basics of ML, you will be familiar with the loss function; we have to minimize that function, and for that we need its derivative to be zero. To find that, we use the gradient as the direction in which the change in the function is greatest. Now we have the direction but not the magnitude; a first-order method uses a constant learning rate for that. A second-order method instead uses the curvature to get a step size, so the point where the derivative is zero can be reached in fewer iterations. Thus a third-order method would ultimately find the minimum of the derivative of the loss function, but we need the minimum of the loss function itself, so it would be useless. Hope this was helpful.
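To make that first-order vs. second-order contrast concrete, here is a tiny made-up 1-D example (my own sketch, not from the video): gradient descent with a hand-picked learning rate takes dozens of steps, while the Newton step f'(x)/f''(x) reaches the minimum of this quadratic in one:

```python
# Toy 1-D "loss": f(x) = 4 * x**2, so f'(x) = 8*x and f''(x) = 8.
df = lambda x: 8 * x
d2f = lambda x: 8.0

# First-order: fixed learning rate, many small steps toward the minimum.
x, lr, iters = 5.0, 0.05, 0
while abs(df(x)) > 1e-8:
    x -= lr * df(x)
    iters += 1
print("gradient descent:", x, "after", iters, "iterations")

# Second-order: the step size 1/f''(x) replaces the hand-tuned learning rate.
x = 5.0
x -= df(x) / d2f(x)   # one Newton step hits the minimum of a quadratic exactly
print("newton:", x, "after 1 iteration")
```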
I think the visualization makes sense if we think about approximating the function f(x) by its second-order Taylor expansion around x_t. Taking the derivative of the second-order Taylor expansion and setting it equal to zero leads us to the formula of Newton's method for optimization. This operation is the same as minimizing the second-order approximation of the function at x_t, as depicted in the video.
Awesome video! Thank you!
Imagine how many people will have earned a better living because of the effort Ahmad put in. Huge respect.
Ahmad is a legend !
OMG, this video just saved my homework
I hope listening to this brings more positive YouTube channels like yours 💜
Can someone please tell me what algebra is needed to get Newton's method from the Taylor series stated at 6:58? Thank you in advance.
Your videos are awesome!
Hi, can you please explain how you convert alpha into 1 over the second derivative at xk at 7:06? Thank you!
Thank you Ahmad !
Very good explanation
Awesome, thank you!
We appreciate you ❤️
Thank You so much Sir✨
really appreciate your work :)
Very great content
Thanks, very informative.
Amazing job! Thanks a lot!!
Again amazing
Liked, Subscribed and voted for him 👍
Good job. I am subscribing !
Yes, I think that statement on Wikipedia is a little misleading. Any symmetric rank-one update can be written as c c^T for some vector c. For a problem in R^n, c has n degrees of freedom, and equation (**) gives you n constraints, so it's not surprising that you get a unique solution.
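If it helps to see that concretely, here is a short sketch of the SR1 update written in that c c^T form (my own illustration with made-up numbers, not code from the video), along with a check that the secant equation B_new s = y holds:

```python
import numpy as np

def sr1_update(B, s, y, eps=1e-8):
    """Symmetric rank-one (SR1) update of a Hessian approximation B.

    With s = x_{k+1} - x_k and y = grad_{k+1} - grad_k, the correction
    c c^T / (c^T s) with c = y - B s is the symmetric rank-one update
    that satisfies the secant equation B_new @ s = y.
    """
    c = y - B @ s
    denom = c @ s
    if abs(denom) < eps * np.linalg.norm(c) * np.linalg.norm(s):
        return B  # standard safeguard: skip the update when the denominator is tiny
    return B + np.outer(c, c) / denom

# Quick check of the secant equation (made-up numbers).
B = np.eye(3)
s = np.array([1.0, 0.0, 2.0])
y = np.array([2.0, -1.0, 1.0])
B_new = sr1_update(B, s, y)
print(np.allclose(B_new @ s, y))  # True
```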
If df/dlambda turns out to be a quadratic or a higher-order polynomial, which lambda do we need to choose?
All the best!!
Thank you !