For anyone wondering why the function stays linear even when we square terms of the observations: it's because the observations can always be transformed. E.g., an observation X*X can always be replaced by a new observation Z = X*X, since it's still just data, but the same does not hold for the parameters of the model. That's why, if the function is linear w.r.t. the betas, the least-squares optimization is linear.
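To make that concrete, here is a tiny sketch (assuming NumPy; the data and names are just for illustration, not from the video): the squared term is just another known column, so the fit stays an ordinary linear least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=50)

# Treat x**2 as just another observed column: z = x**2.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary linear least squares recovers all three betas.
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
print(betas)  # roughly [1, 2, -3]
```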
Thank you so much for this incredible tutorial!
Thanks so much for this great video. It helped a lot
Thank you so much for your clear explanation. It was incredible!
Thank you!
Great explanation! I have a question:
At 1:27 you mention that the function stays linear even when adding the x-squared term, and that the linearity depends on the betas, not the x's. This is mind-blowing. I always thought the linearity depends on the x's. Could you briefly elaborate why that is?
This is how ordinary least squares (OLS) works. To solve OLS, you differentiate w.r.t. the betas, and you get a system of equations. That system is linear if the original regression is linear w.r.t. the betas. It doesn't matter if the x's are nonlinear. If you move to matrix form, you can simply expand your design matrix (X) to include these squared terms. Remember that the x's are your data points; they are known. There is no problem taking any function of them, be it square, cube, etc.
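A minimal sketch of that point (assuming NumPy; this is not code from the video): differentiating the squared error w.r.t. the betas gives the linear normal equations (XᵀX)β = Xᵀy, no matter how the columns of X were built (x, x², x³, ...).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100)
y = 0.5 - 1.5 * x + 4.0 * x**2 + rng.normal(scale=0.05, size=100)

# Expanded design matrix: constant, x, and the squared term.
X = np.column_stack([np.ones_like(x), x, x**2])

# Solve the normal equations (X^T X) beta = X^T y directly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # roughly [0.5, -1.5, 4.0]
```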
Thanks for the quick response, that fully answered my question!
Absolutely fab! Thank you!
Really nice! Thank you
Awesome lecture. Thank you very much. One question, though: why at 5:47 has the final result become the transpose of the Jacobian and not the Jacobian itself?
Very very useful, thanks
Thanks a lot~
You've got good shit
Excellent, thanks 😄
What is the Bk you use in the second derivative?
The code is not available; could anyone tell me where to find it, please?
This is the link to view the Jupyter Notebook: github.com/MaverickMeerkat/TH-cam/blob/master/Code/Gauss-Newton.ipynb
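In case the link goes stale, here is a minimal Gauss-Newton sketch (assuming NumPy; an illustration under my own assumptions, not the notebook's actual code) fitting y = a·exp(b·x):

```python
import numpy as np

def gauss_newton(x, y, params, n_iter=20):
    # Fit y ~ a * exp(b * x); params = (a, b) initial guess.
    a, b = params
    for _ in range(n_iter):
        pred = a * np.exp(b * x)
        r = y - pred                                   # residuals
        # Jacobian of the residuals w.r.t. (a, b)
        J = np.column_stack([-np.exp(b * x), -a * x * np.exp(b * x)])
        # Gauss-Newton step: solve (J^T J) delta = -J^T r
        delta = np.linalg.solve(J.T @ J, -(J.T @ r))
        a, b = a + delta[0], b + delta[1]
    return a, b

# Example: recover a = 2.0, b = 0.5 from noisy data.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 40)
y = 2.0 * np.exp(0.5 * x) + rng.normal(scale=0.01, size=x.size)
print(gauss_newton(x, y, params=(1.5, 0.4)))
```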
It is not available; could anyone tell me where the code is, please?
Better than Isaac!
Could you recommend some books where I could learn more about this?
Specifically, I read this in a book called Computational Statistics (2005), but it was only covered briefly there (only the 2nd view). I also had to google other resources online.
Reading SciPy's docs, these are the references for non-linear least squares:
M. A. Branch, T. F. Coleman, and Y. Li, “A Subspace, Interior, and Conjugate Gradient Method for Large-Scale Bound-Constrained Minimization Problems,” SIAM Journal on Scientific Computing, Vol. 21, No. 1, pp. 1-23, 1999.
William H. Press et al., “Numerical Recipes: The Art of Scientific Computing, 3rd edition”, Sec. 5.7.
R. H. Byrd, R. B. Schnabel, and G. A. Shultz, “Approximate solution of the trust region problem by minimization over two-dimensional subspaces”, Math. Programming, 40, pp. 247-263, 1988.
A. Curtis, M. J. D. Powell, and J. Reid, “On the estimation of sparse Jacobian matrices”, Journal of the Institute of Mathematics and its Applications, 13, pp. 117-120, 1974.
J. J. Moré, “The Levenberg-Marquardt Algorithm: Implementation and Theory,” Numerical Analysis, ed. G. A. Watson, Lecture Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977.
C. Voglis and I. E. Lagaris, “A Rectangular Trust Region Dogleg Approach for Unconstrained and Bound Constrained Nonlinear Optimization”, WSEAS International Conference on Applied Mathematics, Corfu, Greece, 2004.
J. Nocedal and S. J. Wright, “Numerical Optimization, 2nd edition”, Chapter 4.
B. Triggs et al., “Bundle Adjustment - A Modern Synthesis”, Proceedings of the International Workshop on Vision Algorithms: Theory and Practice, pp. 298-372, 1999.
Dear, could you share your email? I'd like to talk to you about implementing this method for another particular case.