For future reference: in the first part of the video, X's _columns_ (not rows) are the individual data points (correction at 5:02: it's not every row but every column of the mean matrix that is the average of X).
Also, note that the code uses the 'svd' function, not the 'pca' function.
This can be confusing, because Prof. Brunton says in the previous lecture (the first video on PCA) that PCA assumes 'rows' represent individuals (e.g., people), in contrast to SVD, which assumes 'columns' do.
*_BUT,_* in the second part (ovarian cancer), even though the code uses the 'svd' function, the 'obs' matrix is 216x4000 (216 patients), where each 'row' represents an individual patient. So here, U and V play the roles of V and U from the first part of the lecture, respectively.
Also, in the for loop, the code plots each patient (each dot) in the 3 "principal" axes (in MATLAB, A' means the conjugate transpose of A).
*_However,_* the code calculates the dot products of two long vectors (4000 elements each, and this can be even larger in other examples).
We _don't_ need this expensive calculation, because U and S already contain the exact same values: since obs = U*S*V', the dot product V(:,k)'*obs(i,:)' is just U(i,k)*S(k,k) (this U would have been V if each patient were represented by a column instead of a row). So we can just use U(i,1)*S(1,1), U(i,2)*S(2,2), U(i,3)*S(3,3) for x, y, z in the for loop, instead of computing the dot products.
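For example, a minimal sketch of the shortcut in MATLAB (assuming the 216x4000 'obs' matrix and variable names from the video's cancer code; I haven't run this myself):
[U, S, V] = svd(obs, 'econ');   % economy SVD: U is 216x216, V is 4000x216
i = 1;                          % any patient index
x1 = V(:,1)' * obs(i,:)';       % projection via the 4000-element dot product
x2 = U(i,1) * S(1,1);           % same value, read directly from U and S
disp(abs(x1 - x2))              % ~0 up to round-off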
(I don't use MATLAB, but it should work. In Python, the only differences would be that indices start from 0 and you use square brackets instead of parentheses.)
Still, knowing why those dot products (a "projection" onto an orthonormal vector, in this case) work is important for understanding SVD and PCA.
Anyway, thanks a lot for this great series of lectures, awesome.
THANK YOU. Your comment has saved my sanity, it was the final puzzle piece that made it all click. This should be included in the description of the video word-for-word.
Excellent! Your 15-minute video really captures the majority of the 100 years of information on PCA. SVD works!
thank you so much! my understanding increased exponentially when you explained with the ovarian cancer example.
The code is more understandable for me, thanks for your great work. This example shows what PCA looks like geometrically. There is also an implicit relationship between the shape of the data points and the transformation capability of the centered matrix, which is not mentioned in linear algebra courses.
Thanks for this excellent lecture. I have a question: why didn't you subtract the mean of the rows before computing the SVD, like in the previous example and as explained in the PCA video?
I just don't understand why, for the ovarian cancer example, you don't do the preprocessing steps (subtracting the mean and dividing by sqrt(Nmeas)).
Well, I'll try to answer your question. What you are actually reconstructing in the Gaussian data example with the SVD is a scaled version of the standard deviation: STD/SQRT(n). This is done because he is trying to plot different confidence intervals. As you may remember, the equation for a confidence interval is Mean + Z*STD/SQRT(n), which is what he is plotting as the red circles. With a Z value of 3 you capture almost 99% of the data, which is what we see in the plots. So that normalization term is there only because of the application in that particular code. For PCA you don't always have to normalize or standardize the data; it is only needed when you are working with correlations or when the application demands it. In fact, if you are working with SVD, the data doesn't even have to be centered by the mean, which is one of the advantages of using truncated SVD instead of PCA.
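For what it's worth, here is a minimal sketch (MATLAB) of where those ellipses come from; the data and variable names are my own stand-ins in the spirit of the Gaussian example, not the video's script:
nPoints = 10000;
X = [2 0; 1 3] * randn(2, nPoints) + [2; 1];    % made-up correlated Gaussian cloud
B = X - mean(X, 2);                             % mean-subtracted data
[U, S, ~] = svd(B / sqrt(nPoints), 'econ');     % now diag(S) approximates the stds
theta = linspace(0, 2*pi, 100);
plot(X(1,:), X(2,:), 'k.'), hold on, axis equal
for z = 1:3                                     % Z = 1, 2, 3 confidence ellipses
    Xstd = z * U * S * [cos(theta); sin(theta)];
    plot(mean(X(1,:)) + Xstd(1,:), mean(X(2,:)) + Xstd(2,:), 'r-', 'LineWidth', 1.5)
end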
Great video, but one confusion: aren't we supposed to subtract the mean before computing the SVD in the ovarian cancer case?
He is working with SVD as a way to compute PCA, and one advantage of SVD over the classical eigendecomposition formulation of PCA is that SVD (or truncated SVD) does not require the data to be centered by the mean.
At 8:27, isn't it the columns of V (not U) that point in the directions of maximum variance?
Thank you so much for this video. The explanation with the example is gold.
Excellent lecture. Question: once you have determined the magnitude of the principal components, is there a way of determining which features they represent in your original data? For instance, determining which features from the cancer data correlate most strongly with a cancer diagnosis?
Lovely setup and great presentation. Thanks
Mr. Steve, first of all I would like to thank you for this video. Secondly, I would like to ask a question, since it is my first time studying PCA. At 5:47 of your video, you explained that you divide B by the square root of nPoints; where does this come from? I mean, did you do this because you wanted to minimize the value of the division?
Thanks..
How can I do a varimax rotation of the PCs in MATLAB?
such a nice presentation
This is what I'm looking for and what I'm coming for
Thanks...
High quality presentation, Thanks for sharing.
Glad you liked it!
Great explanation, thanks! May I ask how to tell which genes have the highest "impact" on PC1 (in the ovarian cancer example)? Is there a way I can tell from matrix U or matrix V? I just learned PCA 3 days ago, sorry if this is a noob question :)
You can tell from the sigma matrix, AFAIK. Look for the largest singular value in sigma and find its corresponding column in V; that column is your most significant direction, and its largest entries mark the most influential features.
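If it helps, here's a small sketch (MATLAB) of that idea applied to the genes question; 'obs' is assumed to be the 216x4000 cancer matrix from the video, and maxk (R2017b+) is my own choice:
[U, S, V] = svd(obs, 'econ');
loadings = V(:, 1);                          % PC1 direction: one weight per gene
[~, topGenes] = maxk(abs(loadings), 10);     % the 10 genes with the largest |weight|
disp(topGenes)                               % column indices into obs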
Just to clarify, when you mention the energy of the statistical data, you're referring to the extent to which it captures the trend in the data, right?
Thank you so much! I coded along with you, but my 2 lines were not perpendicular. In the code, the circle must be red ('r-') and the line must be cyan ('c-').
Also, is there any way to get this code for practice? Thank you in advance!
Wonderful series of lectures. I have a question about using the top 3 PCs: why are you not scaling the top 3 singular vectors by their associated singular values from S in order to find x, y, and z?
Because those only tell you how much variance there is in those directions; he just projects the data points onto those directions and plots them.
@Steve I work with spatial time series data (3D: x, y, t, e.g., temperature). I've seen code that reshapes the spatial dimensions into 1D, giving a 2D series, and then applies the PCA.
But I need to work on vector data (e.g., wind), which comes in components (u, v) in 3D, making it 4D... Would it make sense to reshape the 3D part (2D spatial + 1D components) into 1D, giving 2D data, and then apply the PCA?
Off-topic, but how do you get the IDE to be dark for your presentations?
He inverted the colors at the OS level.
Is singular value decomposition also used for 3-dimensional data plots?
I got a little bit confused: what's the intuition behind calculating x, y, and z by multiplying V with b (the observations)? What are x, y, and z showing? Sorry for the silly question, thanks in advance.
Here, x y and z are just the first three principal components of the data set. So it allows us to visualize how the data scatters in these new V coordinates. There are interesting patterns in V(:,4) and V(:,5) too, but I can't plot in x y z u v coordinates and make sense of it as a puny human stuck in 3D.
@@Eigensteve Oh, so it's the data reconstruction but just using 3 of the components rather than all 4000 of them... why do we do V * d or V * obs?
@@AdityaDiwakarVex V * obs essentially takes the "observations" (i.e. the data) and transforms it into the V coordinate system. (it is also V-transpose * obs, which is an important subtlety when computing these things)
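To make that concrete, here's a small vectorized sketch (MATLAB) of the same transformation; it assumes the 'obs' matrix from the thread, and it is not the video's exact loop:
[U, S, V] = svd(obs, 'econ');    % V is 4000x216
scores = obs * V;                % each row is one patient in V coordinates
x = scores(:, 1);                % PC1 coordinate of every patient
y = scores(:, 2);
z = scores(:, 3);
plot3(x, y, z, 'o'), grid on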
@@Eigensteve Oh right, I do see that it is V-transpose. Thank you so much, that cleared it up. You are easily one of the best professors/teachers I've come across, thank you!
Really great video! However, can I relate PC1, 2, and 3 back to the actual variables?
What are the differences between the 2-dimensional and 3-dimensional data set plots?
How is Proper Orthogonal Decomposition, used in fluids, different from PCA or SVD?
BTW, to use a legend for the ovarian data you can make use of plot handles as follows:
h = zeros(2,1);
...
if strcmp(grp{i}, 'Cancer')   % strcmp instead of ==, which errors on char arrays of different lengths
    h(1) = plot3(...);
else
    h(2) = plot3(...);
end
...
legend(h, 'Cancer', 'Normal')
Been trying to write a formula to combine both Honey Mustard [ dataSet ] and Ranch BBQ Sauce [ dataSet(2×2) ] as one component while randomly scaling calories and sugar. Don't see what The Matrix movie has to do with anything, though.
Nice. Actually, people do think about food, flavors, and chemistry in PCA coordinates. Some neat and unexpected food pairings have been discovered this way.
One other question: in U/S/V, which index corresponds to PC2? Is it V(2,1) or V(2,2)? Thank you!
NVM, I think I answered my own question. Variable "V" is 4000x216, so I believe each row would correspond to a feature "label", if there were one for the ovarian cancer data?
Dear Steve, I see that in my data set 2 states contribute 90% of the data. How do I know which ones?
Which MATLAB version should I use? I have 2014.
Hello sir, I'm using your code to visualize the classification of ECG signals with 3 labels. The diagram generated is not correct; I think the problem is in the "for" loop. Please help me fix this, as I have tried several times to no avail.
Can someone explain why, for the log and cumulative singular value graphs, we have 216 along the x-axis? Why is it not 4000, the number of genetic markers?
Hi Erik, I had the same confusion at the beginning. But digging a bit into Steve's previous video: the singular value plot shows how much variance is captured by each principal component, and the cumulative sum is Sum(lambda_k) over the sum of all the lambdas. Dimension-wise, the B matrix (with the means subtracted) is 216x4000. Through the economy SVD, U is 216x216, Sigma is 216x216, and V-transpose is 216x4000. I think both plots are drawn against the number of sigmas (216 here).
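A quick way to see it (MATLAB sketch with a random stand-in matrix, since only the shapes matter here):
B = randn(216, 4000);            % stand-in for the mean-subtracted data
[U, S, V] = svd(B, 'econ');      % economy SVD
size(U)                          % 216 x 216
size(S)                          % 216 x 216, so at most 216 singular values
size(V)                          % 4000 x 216
semilogy(diag(S), 'o')           % this is why the x-axis stops at 216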
Where can I find the code? Cheers.
2:10 Actually, that would be 16 times as much variance (variance scales as the square of the standard deviation, and 4^2 = 16).
Hello Steve, I would first like to thank you for your effort in sharing and teaching these amazing techniques. I would also like to ask whether you could make a video on how to find the best r value using the Gavish-Donoho method in Python. This would be very useful for me. Thanks a lot and keep going.
Thanks for the comment. Yes, that video is coming up (in Matlab and Python). Just need a few days to process and upload.
Python? What snakes gotta do with it?
Is there a convention about signs? I was convincing myself, and what confused me is that the T1, T2, T3 (scores) matrices in the code below have the same values but with different signs. I found some articles and code about flipping the signs of SVD and PCA results, but I couldn't be sure... I'd be very happy if you could clear this up for me, thanks!
%% CODE
clear; close all; clc;
load fisheriris                 % 150x4 'meas' matrix, ships with the Statistics Toolbox
X = meas;
% X = 5*randn(300, 10);         % alternative random test matrix
[W, D] = eig(X'*X);             % eigendecomposition of the (uncentered) Gram matrix
W = W(:, end:-1:1);             % eig returns ascending eigenvalues; flip to descending
D = D(end:-1:1, end:-1:1);
T1 = X*W;                       % scores via the eigendecomposition
[U, S, V] = svd(X, 'econ');
T2 = U*S;                       % scores via the SVD (equal to T1 up to column signs)
[coeff, score, latent] = pca(X, 'Algorithm', 'svd', 'centered', false);
T3 = score;                     % scores via pca (again equal up to column signs)
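For what it's worth, each singular vector is only defined up to sign: flipping the k-th column of both U and V leaves U*S*V' unchanged, so the three methods are free to disagree. One common convention (a sketch adapted from what scikit-learn's svd_flip does internally, not an official MATLAB routine) is to flip each column so its largest-magnitude entry is positive:
for k = 1:size(V, 2)
    [~, j] = max(abs(V(:, k)));    % entry of column k with the largest magnitude
    s = sign(V(j, k));             % +1 or -1 (a true singular vector is never all zero)
    V(:, k) = s * V(:, k);         % make that entry positive
    U(:, k) = s * U(:, k);         % flip U too, so U*S*V' stays the same
end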
Great work, Steve; thank you for your effort. Could you please help me with MATLAB code to do feature extraction with PCA on galaxy images? I searched a lot and did not find any results.
I think any old PCA code in Matlab will work if your data is structured as a matrix.
love the series, thank you
Thanks Steve for the amazing explanation! One thing I don't quite understand: why does U*S*[cos(theta); sin(theta)] capture 1 std of the data?
What I think he is actually doing is using the SVD to capture the linear transformation (a diffeomorphism, in fancy terminology: in this case, a linear transformation whose inverse exists) that reconstructs STD/SQRT(N), which is part of the equation for the confidence interval. He then plots those confidence intervals up to a Z value of 3, which corresponds to almost 99% of the data.
(I know this comment is 2 years old, but...) if anyone else is wondering, scroll back to how X is created initially, and note that U only rotates vectors and Sigma only stretches them; U and Sigma come from the SVD of B.
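Here's a tiny standalone sketch (MATLAB) of that rotate-and-stretch picture; R and Sig play the roles of U and Sigma, and all the numbers are made up:
theta = linspace(0, 2*pi, 200);
C = [cos(theta); sin(theta)];                  % unit circle
Sig = diag([3, 1]);                            % Sigma: stretch x by 3
R = [cosd(30) -sind(30); sind(30) cosd(30)];   % U: rotate by 30 degrees
plot(C(1,:), C(2,:), 'k-'), hold on, axis equal
E = R * Sig * C;                               % stretched, then rotated ellipse
plot(E(1,:), E(2,:), 'r-')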
Can I get the code?
All code on databookuw.com
16 times more variance in one direction than the other (2:10).
Me being a simple pleb: That looks like a galaxy!
No, you're right, it does.
This example is way too complicated. You should stick to something like 10-20 data points for the initial demonstration; otherwise it's too hard to understand exactly, only in hand-wavy terms.
Can I get this MATLAB code, please?