Best teacher in the world. I have watched many machine learning lectures, but nobody explains better than Prof. Strang. Salute!
Dr. Strang & MIT OpenCourseWare - you are a gift to the entire field of engineering.
In Dr. Strang's lecture, every idea rolls out so naturally that you feel the world should have been like that in the first place.
Thanks again, Dr. Strang. For those of us coming from the ill-posed problems world (a.k.a. the Inverse Problems world), this lecture is both gold and awesome.
A million thanks
The lecturer is able to use very simple examples to illustrate very difficult material. What a teacher he is.
Dr. Strang, thank you for a powerful lecture on the Survey of Difficulties with Ax = b. This equation is displayed in many technical books on linear systems with little or no thought about its difficulties. In this lecture, the difficulties of this equation are explained in great detail.
18.06 + 18.065 + Deep Learning Specialization = Great Combo
Agreed!
What is the last one? A course, or did you mean to apply the learning from the previous two courses? Thanks!
@@sicongliu1484 The Deep Learning Specialization is another online course.
So far, exactly what I need. A note: this is a hybrid of algebra and statistics, so you may rewrite A as X, b as y, and x as β. This has helped me relate the presented material to associated material in the statistical literature.
The video series is a great and fast review of linear algebra.
So I'm literally writing a paper on inverting neural networks. It's fairly direct (yet not trivial) under certain nice assumptions. My plan was to begin to investigate what happens as you relax these assumptions. It's perfectly timed that I'm taking this course :))))
Sounds good!
wish you luck buddy
23:32 Different ways to generalize the solution to test data: use a Lipschitz continuity bound on your loss function. (Usually while applying SGD we have a methodology that combines the two using a Lipschitz continuity constraint projection.)
Any links that address 5:25 about whether deep learning and the iteration from stochastic gradient descent go to the minimum l1 norm?
Thanks. Bro
@42:59 There is no power of 2 on the l1 norm for the lasso problem.
In case 3 where m
I found out that if x0 = A+ b (the plus denotes the pseudoinverse, not a sum), then two facts are true (see the quick numeric check below):
1. Ax0 is the closest point to b among all Ax.
2. x0 has minimum norm among all the x's that satisfy condition 1 above (it is good to mention that, in theory, several points may satisfy condition 1).
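If it helps, here's a minimal numeric check of both facts in MATLAB/Octave (the rank-deficient A and the b are just made up for illustration):
A = [1 2; 2 4; 0 0];    % rank 1, so Ax = b has no exact solution for a general b
b = [1; 1; 1];
x0 = pinv(A) * b;        % minimum-norm least-squares solution
x1 = x0 + null(A);       % another least-squares minimizer: add a null-space vector of A
norm(A*x0 - b), norm(A*x1 - b)   % same (smallest possible) residual...
norm(x0), norm(x1)               % ...but x0 has the smaller norm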
Thanks for such great content!
So, can someone explain (or point me in the right direction) how the penalty actually helps our inverse problem? I understand that it bumps up the very small sigmas of the original matrix, but why is that useful to us?
In the linear case, l2-norm regularization just gives the pseudoinverse (in the limit)! How about the nonlinear case, say in deep learning?
Question: If A is poorly conditioned, I thought QR decomposition was computationally unstable as well. Why would we do this with a poorly conditioned matrix?
"And for my last trick I will pull the answer down from the board above"
Prof. Gilbert Strang: "When I say we, I don't mean I."
Siraj the scammer: "Objection, my lord!"
What does he actually mean when he says "its inverse is going to be big"? He says it twice, when talking about columns in bad condition and a nearly singular matrix. Please help. And also, please tell me what problems a "big" inverse could cause.
Hmm, here's my guess: when a matrix is nearly singular, at least one of its singular values must be close to 0 (check out 24:50). When you invert it, that singular value becomes 1/(the small singular value), which is large (i.e. some entries of the inverse are large).
E.g.
inv([1 1; 1 1.001])   % nearly singular, let's say
gives
 1001  -1000
-1000   1000
Singular matrices have at least one zero eigenvalue. If a matrix is nearly singular, at least one eigenvalue is nearly zero, which blows up when you form the inverse. As a result, the inverse of a near-singular matrix will involve massive numbers.
Yashwanth Soodini thank youuuu
I guess it means that a matrix with small singular values will have an inverse with big singular values. It can be understood by considering a simple diagonal matrix case.
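For example, a toy diagonal case in MATLAB/Octave (numbers made up):
D = diag([1, 1e-6]);   % one tiny singular value
inv(D)                 % = diag([1, 1e6]): the tiny singular value becomes a huge one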
All the comments add value to answering the question. My ¢5 contribution: if the singular values are close to zero, they become large numbers when you take the inverse, and in computational terms you are then mixing large numbers with small values close to the round-off error (near machine precision of zero), which is a big headache.
31:18 Why are there only two cases? Why not consider sigma < 0?
Because it's a positive definite matrix system, so it's guaranteed to be >= 0.
@@elyepes19 No? It only requires A^T A to be a positive semidefinite matrix, which is always the case. I believe σ can be smaller than 0.
38:24 Why does σ/(σ^2+δ^2) go to zero when σ goes to zero? I think it should be infinity, because 1/σ as σ approaches zero is 1/0 = infinity. Can anyone explain? Thanks!
@@shashankshekhar3891 Got it! Thanks for the clarification!
The limit really has to be evaluated carefully: it comes down to how fast delta goes to zero relative to sigma.
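To spell that out: for fixed delta > 0 the ratio sigma/(sigma^2 + delta^2) is not an indeterminate form at sigma = 0; it peaks at 1/(2*delta) when sigma = delta and then falls back to 0 as sigma goes to 0. You only get the 1/sigma blow-up if delta goes to 0 first with sigma held positive. A quick numeric check in MATLAB/Octave (delta and the sigmas are made-up values):
delta = 1e-3;
sigma = [1e-1 1e-3 1e-6 1e-9];
sigma ./ (sigma.^2 + delta^2)   % roughly 10, 500, 1, 0.001: heads to 0 once sigma << delta
1 ./ sigma                      % what you would get with delta = 0: blows up instead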
Can someone help me understand how the pseudoinverse conclusion at 37:52 is reached?
superhanfeng: This is because the pseudoinverse has 1/σ_i for each σ_i > 0, and takes the value 0 where σ_i = 0.
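Concretely, a toy example in MATLAB/Octave (the diagonal "Sigma" is made up): pinv inverts the nonzero singular values and leaves the zero ones at zero.
S = diag([2, 0]);   % a Sigma with one zero singular value
pinv(S)             % = diag([0.5, 0]): 1/sigma where sigma > 0, and 0 where sigma = 0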
What is (A^T A + δ^2 I)^-1? I think it can't be separated as (A^T A)^-1 + (δ^2 I)^-1.
Correct, you can't separate the inverse like that (there are formulas for something related, e.g. en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula, but that's not really relevant here).
Here it's simply the inverse of A^T A + δ^2 I as a whole: A^T A may not be invertible (i.e. if any of its eigenvalues are zero), but adding δ^2 to the leading diagonal adds δ^2 to all the eigenvalues, which must then all be positive, so the sum is invertible.
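A quick sanity check of that eigenvalue-shift argument in MATLAB/Octave (toy rank-1 A and a made-up delta):
A = [1 1; 1 1];             % rank 1, so A'*A is singular
delta = 0.1;
eig(A'*A)                   % eigenvalues 0 and 4
eig(A'*A + delta^2*eye(2))  % shifted to 0.01 and 4.01: all positive now, so invertible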
I don't understand where the δ^2 times ||x||^2 term comes from. Why does it help us fix the problem?
It seems like a magic answer at first, but the resolution is at 40:39: choosing the δ^2 ||x||^2 penalty transforms (A^T A)^-1 into (A^T A + δ^2 I)^-1, adding a value on the diagonal so that near-zero eigenvalues no longer blow up.
Probabilistically, you're putting a prior on the values of x. This is a form of regularization.
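If it helps to see it numerically, here's a small MATLAB/Octave sketch (singular A and b made up) showing the penalized solution (A^T A + δ^2 I)^-1 A^T b approaching pinv(A)*b as δ shrinks, which is the limit the lecture is after:
A = [1 2; 2 4];    % singular, so plain inv(A'*A) does not exist
b = [1; 3];
for delta = [1e-1 1e-3 1e-6]
    x_ridge = (A'*A + delta^2*eye(2)) \ (A'*b);
    disp([delta, x_ridge']);   % penalized solution for this delta
end
pinv(A) * b        % the limit the penalized solutions approach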
Where are the prof's notes?
The course materials are on MIT OpenCourseWare at: ocw.mit.edu/18-065S18. Best wishes on your studies!
Thanks @@mitocw
I understood the previous lessons very well, but not this one.
Given A and b, solve for x. If A = U Σ V^T contains very small singular values, then Σ^-1 is very large, and a small change in b gets amplified in x = A^-1 b (ill-conditioned). So we introduce δ to smooth the result x (b usually comes from sampling and contains (Gaussian) noise).
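To see the amplification and the smoothing this comment describes, here's a toy MATLAB/Octave sketch (matrix, data and noise all made up):
A = [1 1; 1 1.0001];    % nearly singular: one tiny singular value
b = [2; 2.0001];        % clean data, exact solution x = [1; 1]
noise = [1e-4; -1e-4];  % a tiny measurement error
A \ b                   % [1; 1]
A \ (b + noise)         % roughly [3; -1]: the tiny noise is amplified by 1/sigma_min
delta = 1e-2;
(A'*A + delta^2*eye(2)) \ (A'*(b + noise))   % the delta penalty damps the amplification, back near [1; 1]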
I love it even if I use my own knowledge. I say that unknown language in this comment could swap energy and sides and direction.
WOW
Damn, I really want that book, but it costs like $80 and is only available in paperback.
2022-05-30, checking in.
camera angle !@#$!$, come on!