Gilbert Strang is a treasure to the world. His books are top notch, and the MIT Open Courseware has made it possible for myself and so many others to learn at our own pace. Many thanks Dr. Strang!
Brown nose. MIT also fails to hire anyone with brown skin and will blow out of proportion any work conducted by white old guys. Unbelievable.
Thumbs up to the camera angle management team; great video audio editing.
I love the way he always explains everything from the beginning, he tells you the whole process.
One of the best lectures on TH-cam, thanks Prof. Gilbert for explaining this beautifully.
His explanations have always been impeccable. What a legend.
Awesome! I could watch the lectures over and over, and each time I gain a different, extra insight into the results and the process. Linear algebra turns into magic once you apply its theory to the once-dull computations of rows and columns. The most fascinating aspect of linear algebra is when all the details of a matrix collapse into a single effect, one that resembles the effect of simple scalars in ordinary formulas, thanks to the fundamental concepts of eigenvectors and eigenvalues. Understanding the very essence of a matrix and what it does to any vector (a change in magnitude and direction) is the key to appreciating the idea behind the various matrix decompositions. It all boils down to how many ways we can see the workings of a matrix in terms of its decomposed forms. The SVD is one of those decompositions: it lets a matrix be seen as a sequence of rotations and stretches in terms of orthonormal bases characteristic of that matrix and its eigenspaces. Such a beauty!
Fantastic lecture, really appreciate MIT sharing the course lectures for free.
This SVD business is truly poetry in motion.
"The key is that A transpose A is a great matrix" :D simply GOAT!
Love Prof Strang's lectures; they enlighten me on the depth of linear algebra.
Thank you very much Dr. Strang for making this intuitive and natural. I had only gone through SVD in bits and pieces and never had what I could consider a good enough understanding until I saw this lecture.
God bless your existence, Prof. Strang.
He just saved me from the brink of breaking down trying to understand SVD! What he presented is so natural, so beautiful that I caught it instantly. I appreciate MIT OpenCourseWare too!
This is pure gold... such a fantastic Professor!! Thank you
The genius of this man is the way he explains such a difficult concept so easily..... Hats off. As per Indian ethos, I just want to touch your feet.
This lecture singularly helped me through the concepts on the use of eigenvalues for capturing variability
Prof Strang is mathematically elegant in delivering linear algebra in an understandable way
This lecture in specific was truly beautiful
16:23 symmetry; 28:50 geometry; 42:24 "econ" SVD
Professor Strang thank you for breaking down and explaining Singular Value Decomposition in linear algebra.
I am confident that once I've gone through a refresher up to this point, his explanation will be as clear as water 😀 None of my advanced mathematics professors ever explained in such detail!! Awesome professor 👍🏾🙏
Beautiful exposition of SVD
Poles (eigenvalues) are dual to zeros -- optimal control theory.
The standard basis is dual to the eigenvector basis -- symmetric matrices.
Stretch is dual to squeeze -- eigenvalues, forces are dual, ellipsoids are dual to circles or hyperspheres.
"Always two there are" -- Yoda.
Syntax is dual to semantics -- languages or communication.
If mathematics is a language then it is dual.
The teacher (Gilbert Strang, Yoda) is dual to the pupil (You, Luke Skywalker) -- the Hegelian dialectic.
It would be better not to rotate when drawing the ellipse. The ellipse should first appear without a rotation, because multiplying by the diagonal matrix stretches along the axes; only then comes the rotation by the matrix U. I.e., when the ellipse first appears on the board it should sit along the axes, not inclined.
It is possible to draw it that way: place vectors 1 and 2 inclined on the first step. Then after multiplying by Vᵀ they lie along the axes. Then comes the stretching, then once again a rotation. Still, it is a good lecture! :)
You are right. For a picture see en.wikipedia.org/wiki/Singular_value_decomposition#/media/File:Singular-Value-Decomposition.svg
His drawing isn't necessarily wrong, if for instance the vectors in V happen to be the unit vectors
Love the cool idea to light up the U and V!
Awesome simplicity. Wish I had him in college
Thank you professor Strang, thank you MIT OCW
I cannot thank you enough for this clear, insightful lecture. It is very inspiring that you are calling the matrices "beautiful", and I can feel your love for them :) For me your lectures are beautiful.
Thank you so much for this beautiful and brilliant lecture.
Good lecture, but I think the geometric interpretation of SVD is its motivation. We should use SVD as a tool to understand general linear transformations.
42:00 use SVD, the smallest SVD and square SVD
Splendidly explained. Thank you so much
@6:49 The fact that A^T * A and A * A^T have the same non-zero eigenvalues is blowing my mind right now. I guess this is intimately related to the fact that det(B) = det(B^T), since the determinant is where we get the characteristic polynomial/eigenvalues...
I think it turns out that this is not actually used, so no need to lose sleep over it, I guess...
@@allyourcode You're mistaken. It is not only used, it's crucial for this to work. The fact that the eigenvalues are the same is what allows the same Sigma matrix to serve both matrices, AA' and A'A. If that were not the case, the technique would lead us astray.
In a previous lecture, the professor mentioned that the eigenvalues of AB and BA (similar matrices) are the same. If we consider A' as B and A as A, then A'A and AA' should have the same eigenvalues.
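For anyone who wants to see this fact in action, here is a small NumPy sketch (a random rectangular matrix, purely for illustration, not anything from the lecture): the two Gram matrices have different sizes, but their nonzero eigenvalues coincide.

```python
import numpy as np

# Random rectangular A, so A A^T (3x3) and A^T A (5x5) have different sizes.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))

eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # 3 eigenvalues, descending
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # 5 eigenvalues, descending

# The nonzero eigenvalues agree; A^T A just has extra zeros (here 5 - 3 = 2 of them).
print(eig_AAt)
print(eig_AtA)
print(np.allclose(eig_AAt, eig_AtA[:3]))               # True
```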
50:00 the important/biggest/principal part of matrix
If A^T A V = V S is the spectral decomposition of A^T A, with S diagonal and V orthogonal, we have (A V)^T (A V) = S. Hence each nonzero column of AV (the ones associated with the nonzero diagonal entries of S) is orthogonal to all the other nonzero columns of AV. Hence it is possible to find an orthogonal matrix U such that A V = U sqrt(S) -- note that the diagonal of S gives us the squared norm of each column of AV, which means this is completely fine. Now, this means A = U sqrt(S) V^T. Completing U, if necessary, we have the SVD. In practice, after computing the spectral decomposition of A^T A, we would compute the kernel of (AV)^T -- the orthogonal complement of Im(AV) -- and find an orthogonal basis for it, and of course all the singular values associated with this part would be zero.
P.S.: In the picture at 32:35, the u_1 and u_2 should be a_1 and a_2, after applying the rotation U.
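A small NumPy sketch of the construction described above (build V and S from the eigendecomposition of AᵀA, take U = AV scaled by 1/√S on the nonzero part, and check A = U·sqrt(S)·Vᵀ). The matrix is random and full column rank here, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))                 # m x n, generically full column rank

# Spectral decomposition of A^T A: columns of V are orthonormal eigenvectors.
lam, V = np.linalg.eigh(A.T @ A)                # eigenvalues in ascending order
order = np.argsort(lam)[::-1]                   # re-sort descending
lam, V = lam[order], V[:, order]

sigma = np.sqrt(np.clip(lam, 0, None))          # singular values = sqrt(eigenvalues)
r = int(np.sum(sigma > 1e-12))                  # numerical rank

# Columns of A V with nonzero singular values are orthogonal; normalize to get U_r.
U_r = (A @ V[:, :r]) / sigma[:r]

Sigma = np.zeros((r, V.shape[1]))               # r x n "economy" Sigma
np.fill_diagonal(Sigma, sigma[:r])

print(np.allclose(U_r.T @ U_r, np.eye(r)))      # True: U_r has orthonormal columns
print(np.allclose(A, U_r @ Sigma @ V.T))        # True: A = U sqrt(S) V^T
```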
He's an elite prof
Truly awesome instructor. Fantastic!
I love prof. Strang and I'm very grateful for his lectures, but I have to admit I don't find this proof convincing. Correct me if I'm wrong, but the project he starts out with (@3:32) is to prove that every matrix A has this decomposition. But @13:26, he starts out by **assuming** that A has this decomposition, and then he deduces what U, Σ, and Vᵀ must be. But that doesn't show that every matrix A has this decomposition. Instead, it shows that *if A has this decomposition*, then *that's what U, Σ, and Vᵀ must be*. Even if we were to argue that we can obtain such U, Σ, and Vᵀ for any matrix A, prof. Strang doesn't show that by multiplying UΣVᵀ we will get back A for any A.
I struggled with this topic for months until I finally could arrange everything into what I believe is a proof for why every matrix A can be decomposed this way.
Can you please share your proof with us? That would be awesome. Thank you!
@@dariuszspiewak5624 Hi. I'm going to paste what I've written in my notes. The non-mathematic notation is in Portuguese (my native language), but I think Google Translate will probably do a good job.
In any case, I haven't been peer-reviewed, so I cannot guarantee the accuracy of this proof.
[Theorem]
If A is mxn and Rank(A) = r, then A = UΣVᵀ, where:
Vᵀ is nxn, orthogonal, and has in its first r rows the unit eigenvectors of AᵀA whose eigenvalues are greater than zero, and in the remaining n-r rows the unit eigenvectors of AᵀA whose eigenvalues are exactly zero. The first r rows of Vᵀ are a basis for C(Aᵀ), and the remaining n-r rows of Vᵀ are a basis for N(A).
Σ is mxn and diagonal (rectangular if m≠n); its first r diagonal entries are the square roots of the eigenvalues of AᵀA that are greater than zero (which are the same nonzero eigenvalues as those of AAᵀ), ordered to match the rows of Vᵀ, and all its other entries are 0.
U is mxm and orthogonal; the i-th of its first r columns is Avᵢ/σᵢ, where σᵢ = √λᵢ (λᵢ = eigenvalue of vᵢ). Those r columns form a basis for C(A), and the remaining m-r columns form any orthonormal basis for N(Aᵀ). It is also the case that the first r columns of U are the unit eigenvectors of AAᵀ whose eigenvalues are greater than zero, and the remaining m-r columns of U are unit eigenvectors of AAᵀ whose eigenvalues are exactly zero.
[Proof]:
Consider A mxn (m≠n is allowed), with Rank(A) = r (r can be any value).
Basic recap:
We know that the column space C(A) and the left null space N(Aᵀ) live in ℝᵐ, and that the row space C(Aᵀ) and the null space N(A) live in ℝⁿ.
We know that the dimension of both C(A) and C(Aᵀ) is r.
Call {v₁, v₂, ..., vᵣ} the orthonormal eigenvectors of AᵀA whose eigenvalues are greater than 0, and {vᵣ₊₁, vᵣ₊₂, ..., vₙ} the orthonormal eigenvectors of AᵀA whose eigenvalues are exactly 0.
These sets must always exist, because:
AᵀA is nxn and symmetric, and therefore has a spectral decomposition QΛQ⁻¹, which makes its n eigenvectors orthogonal. Normalizing those n eigenvectors, we get an orthonormal set.
AᵀA is positive semidefinite, so its eigenvalues are greater than or equal to zero.
Since Rank(AᵀA) = Rank(A) = r, it follows from the rank-nullity theorem that dim(N(AᵀA)) = n-r, and therefore, by Gram-Schmidt, we can find an orthonormal basis of n-r vectors for N(AᵀA). And as we know, every linearly independent vector in the null space of a matrix is an eigenvector of that matrix with eigenvalue zero. So there are at least n-r orthonormal eigenvectors of AᵀA with eigenvalue zero. In fact, there are _exactly_ n-r orthonormal eigenvectors of AᵀA with eigenvalue zero, because if there were one more orthonormal eigenvector of AᵀA with eigenvalue zero, then dim(N(AᵀA)) would be greater than n-r.
So the set {vᵣ₊₁, vᵣ₊₂, ..., vₙ} can always be obtained.
The set {v₁, v₂, ..., vᵣ} is obtained simply by selecting the orthogonal eigenvectors of AᵀA whose eigenvalues are greater than zero. They must always exist, because all the eigenvectors of AᵀA with eigenvalue zero are in the other set, and the eigenvalues of AᵀA are either zero or greater than zero.
As we saw, {vᵣ₊₁, vᵣ₊₂, ..., vₙ} is a basis for N(AᵀA). And as already proved, N(AᵀA) = N(A).
As already proved: if V ⊆ ℝⁿ with dim(V) = r, then any subspace V⟂ that is orthogonal to V and has dim(V⟂) = n-r is the orthogonal complement of V.
The subspace spanned by {v₁, v₂, ..., vᵣ} lives in ℝⁿ and has dimension r, and the subspace spanned by {vᵣ₊₁, vᵣ₊₂, ..., vₙ}, which is N(A), is orthogonal to it and has dimension n-r.
Therefore the subspace spanned by {v₁, v₂, ..., vᵣ} is the orthogonal complement of N(A), which as we know is C(Aᵀ).
So note that {v₁, v₂, ..., vᵣ} is an orthonormal set that is a basis for C(Aᵀ), while {vᵣ₊₁, vᵣ₊₂, ..., vₙ} is an orthonormal set that is a basis for N(A).
Call {u₁, u₂, ..., uᵣ} the set of vectors such that u₁ = Av₁/σ₁, u₂ = Av₂/σ₂, ..., uᵣ = Avᵣ/σᵣ, where σᵢ = √λᵢ.
These vectors must always exist, because λ₁, λ₂, ..., λᵣ are greater than zero.
Let's prove that {u₁, u₂, ..., uᵣ} are orthogonal:
For i≠j:
uᵢᵀuⱼ = (Avᵢ/σᵢ)ᵀ(Avⱼ/σⱼ)
uᵢᵀuⱼ = (1/(σᵢσⱼ))vᵢᵀAᵀAvⱼ
Since vⱼ is an eigenvector of AᵀA, it follows that AᵀAvⱼ = σⱼ²vⱼ. Substituting:
uᵢᵀuⱼ = (1/(σᵢσⱼ))vᵢᵀσⱼ²vⱼ
uᵢᵀuⱼ = (σⱼ²/(σᵢσⱼ))vᵢᵀvⱼ
Since the vᵢ's are orthonormal, vᵢᵀvⱼ = 0. Substituting:
uᵢᵀuⱼ = (σⱼ²/(σᵢσⱼ))·0
uᵢᵀuⱼ = 0
Let's prove that {u₁, u₂, ..., uᵣ} are unit vectors:
||uᵢ|| = √(uᵢᵀuᵢ)
||uᵢ|| = √((Avᵢ/σᵢ)ᵀ(Avᵢ/σᵢ))
||uᵢ|| = √((1/(σᵢσᵢ))vᵢᵀAᵀAvᵢ)
Since vᵢ is an eigenvector of AᵀA, AᵀAvᵢ = σᵢ²vᵢ. Substituting:
||uᵢ|| = √((1/(σᵢσᵢ))vᵢᵀσᵢ²vᵢ)
||uᵢ|| = √((σᵢ²/(σᵢσᵢ))vᵢᵀvᵢ)
Since the vᵢ's are orthonormal, vᵢᵀvᵢ = 1. Substituting:
||uᵢ|| = √((σᵢ²/(σᵢσᵢ))·1)
Simplifying:
||uᵢ|| = √1
||uᵢ|| = 1
Therefore the set {u₁, u₂, ..., uᵣ} is orthonormal.
Because they were defined as uᵢ = Avᵢ/σᵢ, it follows that all of them belong to C(A), since they are simply linear combinations of the columns of A.
Since Rank(A) = r, these r vectors {u₁, u₂, ..., uᵣ}, which are linearly independent (because they are mutually orthogonal), must be a basis for C(A).
Call {uᵣ₊₁, uᵣ₊₂, ..., uₘ} an orthonormal basis for N(Aᵀ). Such an orthonormal basis can always be found by Gram-Schmidt.
Call V = [v₁ v₂ ... vₙ]. Note that V is orthogonal.
Call U = [u₁ u₂ ... uₘ]. Note that U is orthogonal.
Call Σ the mxn matrix whose first r diagonal entries are {σ₁, σ₂, ..., σᵣ} and all of whose other entries (diagonal and off-diagonal) are 0. Note that Σ is diagonal (rectangular if m≠n).
Let's explain why the following equality holds:
AV = UΣ
First, let's look at AV.
We know that the i-th column of AV is simply Avᵢ. For the first r columns of AV, the i-th column is, as defined above, σᵢuᵢ, since uᵢ = Avᵢ/σᵢ. All the remaining n-r columns are the zero vector, since Avᵣ₊₁ = 0, ..., Avₙ = 0, because those vectors form a basis for N(A).
So AV = [σ₁u₁ σ₂u₂ ... σᵣuᵣ 0 0 ... 0], where the number of zero columns is n-r. Note that AV is mxn, since A is mxn and V is nxn.
Now let's look at UΣ.
This is also an mxn matrix, since U is mxm and Σ is mxn.
Because Σ is diagonal with r nonzero diagonal entries and n-r zero diagonal entries, we have the following:
The first column of Σ has σ₁ in the first position and zeros elsewhere, the second column has σ₂ in the second position, the r-th column has σᵣ in the r-th position, and all the other n-r columns are zero.
It therefore follows that UΣ is [σ₁u₁ σ₂u₂ ... σᵣuᵣ 0 0 ... 0].
So AV = UΣ.
Since V is orthogonal, its inverse is Vᵀ. Post-multiplying both sides by Vᵀ:
A = UΣVᵀ
So every matrix has a singular value decomposition.
Let's see what AAᵀ is:
AAᵀ = UΣVᵀVΣᵀUᵀ
Because V is orthogonal, VᵀV = I. Substituting:
AAᵀ = UΣΣᵀUᵀ
As already proved, because Σ is an mxn diagonal matrix, ΣΣᵀ must be mxm and diagonal, and since Σ was structured so that its first r diagonal entries are greater than zero and the rest are exactly zero, it follows that the first r diagonal entries of ΣΣᵀ are greater than zero and the rest are exactly zero. Call D := ΣΣᵀ. Substituting:
AAᵀ = UDUᵀ
Because U is orthogonal, Uᵀ = U⁻¹. Substituting:
AAᵀ = UDU⁻¹
Note that the right-hand side is a diagonalization of AAᵀ.
As already proved, it must then be the case that the vectors in U are eigenvectors of AAᵀ. More specifically, the first r columns of U are the unit eigenvectors of AAᵀ whose eigenvalues are greater than zero, and the remaining m-r columns of U are unit eigenvectors of AAᵀ whose eigenvalues are exactly zero.
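For what it's worth, the claims in this proof are easy to sanity-check numerically. A sketch under the assumption of a random rank-deficient matrix (nothing here comes from the lecture or the notes): it builds U, Σ, V exactly as above and verifies AV = UΣ, A = UΣVᵀ, and that the columns of U diagonalize AAᵀ.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 5, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # a rank-r matrix

# v's: orthonormal eigenvectors of A^T A, eigenvalues sorted in descending order
lam, V = np.linalg.eigh(A.T @ A)
idx = np.argsort(lam)[::-1]
lam, V = lam[idx], V[:, idx]
lam = np.where(lam > 1e-10, lam, 0.0)            # the last n - r eigenvalues are zero
sigma = np.sqrt(lam)

# u_i = A v_i / sigma_i for the r nonzero singular values
U_r = (A @ V[:, :r]) / sigma[:r]

# Complete U with an orthonormal basis of N(A^T), the orthogonal complement of C(A)
Q, _ = np.linalg.qr(np.hstack([U_r, rng.standard_normal((m, m - r))]))
U = np.hstack([U_r, Q[:, r:]])

Sigma = np.zeros((m, n))
np.fill_diagonal(Sigma, sigma)

print(np.allclose(A @ V, U @ Sigma))             # AV = U Sigma
print(np.allclose(A, U @ Sigma @ V.T))           # A = U Sigma V^T
D = np.diag(np.append(sigma[:r] ** 2, np.zeros(m - r)))
print(np.allclose(U.T @ (A @ A.T) @ U, D, atol=1e-8))   # columns of U diagonalize A A^T
```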
"Six angles in 4D rotations. We'll leave it there..."
really useful lecture, thank you!
Very fruitful content
That was really beautiful...
Actually better than his textbook :)
One thing that I find confusing is that because the sigma matrix multiplies V transposed from the left, it is not pure scaling on its own: the sigma values scale the rows of V transposed, not the columns. That seems to cause additional rotation and changes in dimensionality.
You need to do the steps in sequence. Yes, if you consider the operations from left to right you get mixed scaling and rotation. But considered from right to left, it's a (pure) rotation, a (pure) scaling, and a final rotation.
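A toy 2x2 illustration of that right-to-left reading (random invertible A, purely a sketch): track what happens to points on the unit circle at each of the three stages.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 2))
U, s, Vt = np.linalg.svd(A)          # A = U @ diag(s) @ Vt

theta = np.linspace(0, 2 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # unit circle, 2 x 200 points

step1 = Vt @ circle                  # rotation/reflection: still the unit circle
step2 = np.diag(s) @ step1           # stretch along the axes: axis-aligned ellipse
step3 = U @ step2                    # final rotation/reflection: the tilted ellipse

print(np.allclose(np.linalg.norm(step1, axis=0), 1.0))   # lengths unchanged by V^T
print(np.allclose(step3, A @ circle))                    # the three steps reproduce A
```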
Amazing explanation by Gilbert sir as always, but I have a small doubt. We find the U vectors from the V vectors because otherwise there would be a whole collection of orthogonal vectors in U to choose from. But what happens when even the V vectors are a collection from which we have to choose the orthogonal ones?
45:30 second approach to find SVD
Polar decomposition
Love this!!
At 23:00, the proof of why orthogonal v1, v2, ..., vr give orthogonal u1, u2, ..., ur via A v = sigma u.
Amazing lesson ♥♥
38:35 pilot in class
4:49 There is an error in the subtitles. Should say 'It's symmetric', not 'It's a metric'
|det|,absolute value
Love Prof.Strang so much!!!really excellent lecture!!!!
At 19:37, why does Prof Strang encourage using only the first formula?
Since we can get free lectures from MIT, is it really necessary to go to college? I bet that wherever it is, MIT is better in quality.
Awesome, thanks!
Thanks!
How can any mXn matrix A be factorized into SQ, where both are square matrices (symmetric and orthogonal)? The dimensions are not matching. Or, is this only for a square matrix A?
I thought A^{T} A is positive semidefinite isn't it?
I was going to respond the same thing. I believe that's correct. The general argument, however, still holds.
that is why he writes positive or equal to 0.
Yes I think you are right.
Respect
You should really re-record the previous 5 classes with functions in place of numbers and call it tensor calculus - introduction
Feels like we can do math on our own when we intuit it better
How wonderful!
Respect!
OK, but if AA^T = U \Lambda U^T and A^T A = V \Lambda V^T, what is the derivation of the formula AV = U\Sigma?
Great video thanks!
@25:00 Professor Strang manipulates V_1^t * (A^T * A) *V_2 into V_1^t * (∂_2 ^2) *V_2......
How does A^T * A = ∂_2 ^2 exactly?
Those v's are just eigenvectors of A^TA. Then we have A^TA v = ∂^2 v.
It just comes from Ax = ∂x.
1. Start with Ax = ∂x and keep the transpose, x^T A^T = ∂ x^T, ready.
2. Multiply both sides by A^T: A^T Ax = A^T ∂x = ∂ A^T x.
3. Multiply both sides by x^T: x^T A^T Ax = ∂ x^T A^T x.
4. Replace x^T A^T with ∂ x^T (using the transposed equation from step 1).
5. Thus, x^T (A^T A) x = x^T (∂^2) x.
So you can believe that the ∂^2 are the eigenvalues of A^T A, i.e. the sigma^2 (a quick numerical check is sketched below).
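A quick NumPy check of that conclusion, with a random matrix purely as an illustration: the squared singular values of A coincide with the eigenvalues of AᵀA.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))

s = np.linalg.svd(A, compute_uv=False)                 # sigma_1 >= sigma_2 >= ...
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]       # eigenvalues of A^T A, descending

print(np.allclose(s ** 2, lam))                        # True: sigma_i^2 = lambda_i
```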
How do you prove that A times A transpose has the same non-trivial eigenvalues as A transpose times A?
Amazing stuff
Amazing
@37 Doesn't rotation in 3d space require two angles only, theta(zenith) and phi(azimuth)?
For 3D rotation we need roll, pitch, and yaw; you can specify the zenith with pitch and the azimuth with yaw, so we need the roll parameter too.
You can think of it as 3 choose 2, which gives 3. Two directions combine to form a single rotation from the one vector to the other. So the number of angles is D choose 2, where D is the dimension of your space. Four dimensions will thus require 6 angles, etc.
I think 2 angles are enough to specify the direction. Another parameter to specify the magnitude. I'm not a native speaker of English and didn't understand what they meant by roll, pitch and yaw physics.stackexchange.com/questions/175354/why-are-three-parameters-required-to-express-rotation-in-3-dimension
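A one-line check of the D-choose-2 counting a couple of replies up (matching the "6 angles in 4D" remark from the lecture); just an illustrative snippet.

```python
from math import comb

# Number of independent rotation planes (hence angles) in D dimensions: C(D, 2).
for D in (2, 3, 4, 5):
    print(D, comb(D, 2))   # 2 -> 1, 3 -> 3, 4 -> 6, 5 -> 10
```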
I have a question, could anyone answer it? Many thanks! How can v1 and v2 be in the row space? That u1 and u2 are in the column space is obviously true.
5:06 Did he mean positive semidefinite?
32:12 I do not understand why the resulting vectors are called u1 and u2. I expect Ax1 and Ax2 to be there as the result of multiplying by A. If multiplying by U is a rotation, then u1 and u2 are just rotated unit vectors (exactly like the v's, just a different rotation)... Can somebody prove me wrong please?
I think it is correct to write σ_i*u_i not just u_i. U is just the rotation, so σ_i should remain there.
legend
One question I have here is with the rotation and stretching - if A is a rectangular matrix does it not project x into a lower or higher dimensional space?
Projections are square matrices in the first place, I guess.
I think he made an error. He said the determinant of an orthogonal matrix is 1, but actually it is 1 or -1. Then Det(A) = + or - the product of the singular values.
Actually it refers to the symmetric matrix, so he simply ignores the negative value, because a symmetric positive semidefinite matrix has nonnegative eigenvalues; the orthogonal matrix here refers to the eigenvector matrix.
Matías Sandacz Hi, I think he was only talking about (semi-) definite matrices in which case Prof Strang was right.
@@imrematajz1624 I think @Matías Sandacz refers to the U and V matrices. Those are orthogonal which means the determinant is either 1 or -1. If it is 1 it would be a rotation. If it -1 it would be a reflection.
At 29:46 he even says that it could be a reflection.
Can someone please confirm this?
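An illustrative check of the point being discussed (random square matrix, not from the lecture): the orthogonal factors U and V each have determinant +1 or -1, and |det A| equals the product of the singular values.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 3))

U, s, Vt = np.linalg.svd(A)
# det of each orthogonal factor is +1 (rotation) or -1 (includes a reflection)
print(round(np.linalg.det(U), 6), round(np.linalg.det(Vt), 6))
# |det A| = product of singular values; the sign comes from det(U) * det(Vt)
print(np.isclose(abs(np.linalg.det(A)), np.prod(s)))    # True
```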
I wish I had gotten into MIT.
To be fair the stress MIT students go through their undergrad/grad careers is inconceivable. Though it's incredible how much you learn even in one lecture at this school, let alone an entire 4 years
@@dectorey7233 lol I know, I have friends that are going there. The summer programs I did there were tough too.
Love....
wait... this is the dude that wrote the textbook!!!???
Professor Strang mentions that U and V form bases for the four fundamental subspaces of A, but it's not clear to me how C(A) = C(Ur) and C(A') = C (Vr'). I know that U and V were determined by the eigenvectors of AA' and A'A, respectively, but how are these related to the column and row spaces of A?
I think the structure of Sigma tells you that.
Here's how I understand it. They are connected to the column and row spaces by the fact that they constitute bases for those spaces, which in turn means there are r of them (the rank of the matrix) and they are orthogonal (within each basis, that is, because the row space and column space generally live in two different spaces if the matrix is not square). They are not columns or rows from the matrix itself. They can be constructed from the matrix (especially the basis for the row space), though, by the Gram-Schmidt process. I still need to confirm how the ortho-basis for the column space comes to life...

By the way, it just so happens (miraculously) that the U and V matrices that store these orthonormal vectors contain the eigenvectors of AA' and A'A, respectively. It's just a strange and beautiful fact. Additionally, the sigma matrix is built from the square roots of the eigenvalues, which all happen to be real and non-negative for the symmetric matrices A'A and AA'.

Obviously, since the entries in U and V are eigenvectors belonging to the column/row spaces, they definitely are linear combinations of the columns/rows (and there are r of them). Is that clearer?
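A small numerical sketch of the subspace claim (random rank-deficient matrix, purely illustrative): the first r columns of U span C(A), and the first r columns of V span C(Aᵀ), which you can check by seeing that appending A (or Aᵀ) to them does not raise the rank.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 5, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-r matrix

U, s, Vt = np.linalg.svd(A)
Ur, Vr = U[:, :r], Vt[:r, :].T

# Same column space: adding the columns of A to U_r does not increase the rank.
print(np.linalg.matrix_rank(np.hstack([Ur, A])) == r)       # True -> C(A) = C(U_r)
# Same row space: adding the columns of A^T to V_r does not increase the rank.
print(np.linalg.matrix_rank(np.hstack([Vr, A.T])) == r)     # True -> C(A^T) = C(V_r)
```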
One question, Prof. and guys: how come sigma in the middle is diagonal (square)? Shouldn't its dimension be (m, n)? Thanks!
@What could go wrong? - Niagara edition Ah, that makes sense. Thanks for your explain!
Has anybody else spotted the error @25:21?
A^T A = sigma_2^2 / (sigma_1 * sigma_2) -- is that an error there?
sigma_2^2 relates to the vector v_2, while A^T A relates to sigma^2, a diagonal matrix, i.e. A^T A = V sigma^2 V^T.
Or am I wrong? If so, I would appreciate an explanation :)
We know that A^T A = V (E^T E) V^T, in which v2 is an eigenvector of A^T A and sigma2^2 is its eigenvalue (from E^T E). Then (A^T A) v2 = (sigma2^2) v2.
@@vivianhuang5647
Thanks
:)
Why are singular values always positive?
Why do U and V mean rotation? Why not reflection?
When U and V transposed are the same, do these matrices represent a change-of-basis matrix? For this to happen, A has to be symmetric positive definite, so the singular values of A would equal the eigenvalues of A, right? So if A represents a transformation, then the diagonal matrix sigma could be understood as that same transformation but written in the eigenvector basis, and then U = V transposed should be change-of-basis matrices?
But when are change-of-basis matrices equal to each other?
Thank you!!
When transposed they are not the same. They are orthonormal which means when multiplication happens - UU' or VV' - the outcome is an identity matrix in the correct space (m-by-m or n-by-n).
A = SQ, where A must be a square matrix?
26:06 What did the professor mean by orthogonal v's in the "row space"?
I think he said it because the v's belong to Vᵀ (V is transposed), so the v's appear there as rows.
I am not entirely sure if this is correct (please say so if I am not), but I'm also pretty sure U, sigma, and V-transpose can also be called LDL-transpose, or LDU, where L is lower triangular, D diagonal, and U upper triangular.
Did he skip the explanation of why u is an eigenvector of AAᵀ after proving that the u's are orthogonal?
I don't think any proof is needed, because he rearranged A = UΣVᵀ to get the u's he tests for orthogonality. That means any u he considers is by definition an eigenvector of AAᵀ. But don't quote me on that.
Magic
He never explained how he got Av = sigma u.
speed 1.25 is perfect
Can someone please explain why Av1 = sigma1 u1?
„6 angles in 4 dimensions, we leave it there“ 😂
Legend
How can I plug UᵀU in after sigma when the matrix sizes don't satisfy the conditions for multiplication?
I think he is referring to a square matrix A in that case
26:09 I doubt the statement that v must be in the row space.
(A^T A)v = cv, i.e. A^T (Av) = cv, so (for c ≠ 0) v = A^T (Av)/c is a linear combination of the columns of A^T and must be in the column space of A^T. Therefore it is in the row space of A.
Where can we find the notes professor Strang mentions in the videos? ( 45:17 )
The materials for the course are available on MIT OpenCourseWare at: ocw.mit.edu/18-065S18. Best wishes on your studies!
Prof Strang's notes were turned into his most recent book -- visit the OCW course site for links and the Reading section for guidance on what chapters go with the lectures: ocw.mit.edu/18-065S18.
Can anyone please tell me why it's always sigma1 ≥ sigma2 ≥ ... ≥ sigma_r?
That is by convention; it is simply the order in which we have decided to put the sigmas into the sigma matrix.
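Standard library routines follow the same convention; a tiny illustrative check with a random matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))

s = np.linalg.svd(A, compute_uv=False)
print(s)                                   # largest singular value first
print(np.all(s[:-1] >= s[1:]))             # True: sigma_1 >= sigma_2 >= ... >= 0
```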