Thank you for doing such a straightforward and simple video. No one else seems to have a video applying PCA and eigenvalues to a real dataset.
Great video. Simple intuitive rundown of something complex.
Bloody love your channel mate - thanks for all your great work! Peace from UK :D
This is so incredibly helpful. Thank you!
Thanks for this presentation
Dude you're a lifesaver for these videos! THANK YOU! - Clinical Research PhD Student
Amazing video!
Good stuff! I'm doing PCA for my senior project and this video helped me out a ton!
I'm glad this helped!
Very nicely explained!!!
Very helpful in preparing for exam PA. Thx ~
Thank you so much! It helped a lot!
Do you know about multivariate PCA in RStudio?
Hey Spencer, great video, keep up the good work!
I wanted to ask again if you have an academic article, paper or book that I could cite in my thesis when I do PCA?
Try This:
royalsocietypublishing.org/doi/10.1098/rsta.2015.0202
Thank you for sharing your knowledge. You could make a video about RDA and CCA. Greetings from Ecuador 🇪🇨🙋♂️
Hey Spencer Pao,
Thank you for the great video!
Just a few questions:
- Why are you entering cor = TRUE? (pc.teeth
Hello, I'm glad you liked it!
1) The cor = TRUE argument indicates whether to use the correlation or the covariance matrix for the PCA calculation. In this case, since it is set to TRUE, I am using the correlation matrix.
2) PCA naturally tries to rotate the observations' axes for the best fit.
3) Deciding which rotation to use is sort of an art form. You'd have to try different rotation methods to see which one is best suited to your use case. I would try maybe 2-3 popular rotation algorithms and observe the outcomes.
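For reference, a minimal sketch of how that argument changes the result, using the built-in USArrests data as a stand-in for the video's teeth dataset:

# Correlation- vs covariance-based PCA with princomp()
pc_cor <- princomp(USArrests, cor = TRUE)   # uses the correlation matrix (variables standardized)
pc_cov <- princomp(USArrests, cor = FALSE)  # uses the covariance matrix (raw variable scales)
summary(pc_cor)  # the variance split across components differs between the two,
summary(pc_cov)  # because the USArrests variables are on very different scales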
@@SpencerPaoHere Thank you for the clarification! I am only familiar with PCA in SPSS.
My teacher explained PCA in short as the following steps:
- examine the correlation matrix
Variables have to be correlated for PCA to be appropriate (i.e., if they are not correlated they are unlikely to share common factors).
+ as a guide look for correlations > 0.35 in absolute size
- extract all potential factors
- examine eigenvalues
The total variance explained by each factor is the eigenvalue.
The most common method is to retain only factors with an eigenvalue > 1 (the Kaiser criterion).
An alternative method is to use a scree plot (look for the break in the curve as an indication of the point at which further factors stop giving us a worthwhile extra amount of explained variance).
- examine the factor matrix (loadings): determine which variables load heavily on which components (does it make sense theoretically that they load heavily on a certain component?)
- examine the final statistics (communalities fall since only a subset of the factors is used)
Low communalities suggest that a variable may need to be excluded (i.e., it is not explained well by your components)
- explore how rotations influence your PCA (if needed)
How do you feel about the above explanation?
Is it possible for variables to be too highly correlated? E.g., would a correlation of > 0.90 be a problem for PCA?
Also, if I wanted to examine communalities in R, do you know how I would code this?
I am just starting to learn about PCA, I do apologise if there are any mistakes in the above text.
@@XxRoos898xX Hello. Are you referring to the mathematics of PCA? I like the shortest explanations possible that address the main steps of the algorithm haha. But by all means, when it comes to teacher explanations, I'd stick with what they are saying since they are the ones who are grading your descriptions :p
High level overview:
1) Compute covariance or correlation matrix
2) Calculate the Eigenvectors / eigenvalues from covariance or correlation matrix
3) Sort the eigenvectors by their eigenvalues and choose however many eigenvectors your scree plot suggests.
4) Transform your data into new subspace using the chosen eigenvectors
The communalities you are referring to are equivalent to the sums of squared loadings. Check out 15:02
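If it helps, here is a minimal sketch of those four steps in R, using the built-in iris data (the choice of k = 2 components is arbitrary here):

# Manual PCA on the numeric columns of iris
X <- scale(iris[, 1:4])            # center and scale the data
R <- cor(X)                        # 1) correlation matrix
e <- eigen(R)                      # 2) eigenvalues / eigenvectors (eigen() returns them sorted)
e$values / sum(e$values)           # 3) proportion of variance explained; pick k from the scree plot
k <- 2                             # arbitrary choice for this sketch
scores <- X %*% e$vectors[, 1:k]   # 4) project the data onto the chosen eigenvectors
head(scores)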
@@SpencerPaoHere Thank you! I will rewatch your video again :) it truly is amazing! I will check out 15:02 again, thank you for your quick replies
Sorry, my notes were from my teacher's explanation of how to interpret/work through an SPSS output of a PCA - we didn't touch the algorithms behind PCA (I think this may be why it is difficult for me to follow some of the steps in R)
Great video! Hope you will do more tutorials in R. Do you use stepAIC for feature importance? How is the AIC method different from the PCA method? Thank you!
Hi! I'm glad you liked it. StepAIC is used to determine which model is best; it does this by calculating scores while subtracting features (hence it can be considered a form of feature selection).
The PCA method is used to shrink the number of features, reducing complexity while retaining as much of the variance in your features as possible.
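For anyone curious, a minimal stepAIC() sketch (it lives in the MASS package; the built-in mtcars data is just a stand-in):

library(MASS)
full <- lm(mpg ~ ., data = mtcars)             # start from the full model
best <- stepAIC(full, direction = "backward")  # drop features while AIC improves
summary(best)                                  # the AIC-selected subset of features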
Hello Spencer. First, I want to thank you for your great video. I have a question: why can't I use 'princomp'? I get the error "can only be used with more units than variables". Is there a solution? Thank you.
Thanks! That error means princomp() requires more observations (rows) than variables (columns). If your data has more variables than observations, try prcomp() instead; it is SVD-based and does not have that restriction. That said, it is perfectly reasonable (and ideal) to have more observations than there are features! This helps with the collinear effect.
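A minimal sketch of the workaround, on made-up wide data:

# Hypothetical wide data: 10 observations, 50 variables
set.seed(1)
X <- matrix(rnorm(10 * 50), nrow = 10)
# princomp(X) would fail with "can only be used with more units than variables"
pca <- prcomp(X, center = TRUE, scale. = TRUE)  # SVD-based; works when p > n
summary(pca)  # at most min(n - 1, p) = 9 non-trivial components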
Where is the next part? I love your work
Hmm. You can probably scroll through the videos on my channel to see what you're looking for. If not, let me know!
Please do a video on how to avoid overlapping labels in k-means cluster plots and PCA scatter plots!
Hmm. I don't really follow. Are you referring to a preprocessing step for the visualization side?
Hi Spencer, great video! I was wondering if you could use PCA for binary data? I see that the bot.canine feature is binary. Thanks!
Yes! You absolutely can use PCA for binary outcomes. But, there are better options to use when it comes to modeling categorical variables.
I read an article that said there might be problems if the difference in item difficulties between the variables (the proportion of people agreeing to an item) is too high. It recommends using correspondence analysis in this case.
Hi, I wonder how you autocomplete variables so fast at 11:02?
Oh haha. I just edited the video so that you won't see me type the words out (keeping out the more mundane parts).
However, in R, there are ways to autocomplete. Try the 'Tab' + 'Enter' keys when writing variables/functions, etc.
@@SpencerPaoHere Thank you so much!!!
Hello, I have a PCA analysis where I used prcomp, and I want to show in the graph the species and the environmental parameter that is sorting (grain size particle), but the graph does not show the sorting. This is the code that I have at the moment. Really appreciate all the help, thank you!!!
pr.envt1
This sounds like a graphing issue. Have you taken a look at the sort() function? This can 'order' your independent variables when graphing. You can use the sort function on the "X" variables that you are trying to plot, and it should sort the variables to your liking.
@@SpencerPaoHere OK, so what would the R code with the sort function look like? If you have an email address, I'd be happy to send you my R script to see what I'm doing wrong, if that's no problem for you
I am wary about revealing an email address publicly. Do you have a github? Maybe you can push your stuff up there and I can take a look at it?
Or, even better, if you have reproducible code with sample data (i.e., the iris dataset) that can generate the problem, I can help diagnose the issue.
But, in essence, for the variable that you wanted to plot, try doing something like plot(sort(name_variable)...)
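A minimal sketch with made-up numbers (the variable names are hypothetical); order() is handy when you need to reorder a second variable alongside the sorted one:

grain_size <- c(0.8, 0.2, 0.5, 1.3, 0.1)   # hypothetical sorting variable
abundance  <- c(12, 30, 18, 5, 40)         # hypothetical response to plot
idx <- order(grain_size)                   # indices that sort grain_size ascending
plot(grain_size[idx], abundance[idx], type = "b",
     xlab = "Grain size (sorted)", ylab = "Abundance")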
@@SpencerPaoHere ok awesome will do
Does having a categorical variable along with continuous variables make any difference? What modifications are needed in the analysis if the dataset is so? Thank you.
You'd have to one-hot encode your categorical variables. Otherwise, your categorical variables would be interpreted as ordered (which is something you don't want).
@@SpencerPaoHere So suppose my variable has been categorised as a rating from 1 to 5, so its values are like 0, 2, 5, 1 and so on. Can I use them as-is for the analysis?
@@adityapratapsingh6068 Yep! Just one-hot encode that feature. Then, you should be good to go!
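In base R, one-hot encoding can be done with model.matrix(); a minimal sketch on a made-up rating column:

# Hypothetical 1-5 rating treated as categorical
df <- data.frame(rating = factor(c(1, 3, 5, 2, 5)))
onehot <- model.matrix(~ rating - 1, data = df)  # "- 1" drops the intercept so each level gets its own 0/1 column
onehot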
How do we construct an index with PCA? Do we multiply the raw data of each column by the proportion of variation and sum them up?
Now, there is a really lengthy answer to that question, and I would not be doing it justice without this post; in essence, it depends. This might better answer your question:
stats.stackexchange.com/questions/133492/creating-a-single-index-from-several-principal-components-or-factors-retained-fr
@@SpencerPaoHere Thanks :) I have another question, do I need to scale my dataset before doing PCA if I were to use the 'prcomp' function in R?
@@kx7522 Yes! (Because PCA's backend is based on sums of squares.) You should center and scale.
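With prcomp() specifically, the centering and scaling can be done in the call itself; a minimal sketch (note the argument is scale. with a trailing dot):

pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
summary(pca)  # variance proportions now reflect standardized variables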
Sir, I have data with categorical independent variables which I have converted into 0/1, and I am trying to fit a logistic regression because the response variable is also categorical (0/1). Can this technique be used to avoid the multicollinearity problem in the dataset and to do discriminant analysis for prediction?
Yes. There are other forms of penalization techniques out there -- but you can use PCA to avoid multicollinearity (in fact, that is one of the main purposes of PCA).
Thanks for the tutorial, but do you know how to convert a principal component analysis into a principal component regression? I'm stuck after getting the components that will be used for the regression, but I don't know how to convert them. Thanks!
Hi!
You could run lm() on your components and your Y variable.
So it would be something like this:
df
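Fleshed out, a minimal principal component regression sketch on the built-in mtcars data (keeping 3 components is an arbitrary choice here):

pca    <- prcomp(mtcars[, -1], center = TRUE, scale. = TRUE)  # PCA on the predictors only
scores <- as.data.frame(pca$x[, 1:3])                         # keep the first 3 component scores
scores$mpg <- mtcars$mpg                                      # attach the response
pcr_fit <- lm(mpg ~ ., data = scores)                         # regress Y on the component scores
summary(pcr_fit)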
@@SpencerPaoHere OMG THANKS A LOT, U R HELPING TOO MUCH RIGHT NOW
@@SpencerPaoHere Would we include all the components or just the main ones? And how would we do prediction using this regression, as the test data will be in a different format?
@@AJman24 The idea behind PCA is to maximize the variance explained with as few components as possible. So, whatever your threshold of variance is will determine how many components you want to use.
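On the prediction side: predict() on a prcomp object projects new data into the same component space (reusing the training set's centering and scaling), so the held-out observations end up in the same format; a minimal sketch continuing the mtcars example above, with the first 5 rows standing in for a test set:

test <- mtcars[1:5, -1]                                            # hypothetical held-out rows
test_scores <- as.data.frame(predict(pca, newdata = test)[, 1:3])  # same space as the training scores
predict(pcr_fit, newdata = test_scores)                            # PCR predictions on the test rows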
@@SpencerPaoHere But what if we want to compare the testing results with another model, let's say ridge, and we have already set aside a few observations for that, and we want to test our PCR on the same observations as the ones we used for ridge?
I have been trying to plot a biplot of the varimax-rotated components. Is this possible?
It should be! You can attempt to store the component values into X and Y variables and plot(X, Y) as needed.
@@SpencerPaoHere It won't work unfortunately when I use fviz.
@@MarinaUganda Hmm, weird. Once you have obtained your fit rotated with varimax, you would plot your loadings to get the visualization.
Your code should look something like this:
fit
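As a base-R fallback, here is a minimal sketch that varimax-rotates the first two loading vectors and plots them (just the rotated loadings, not a full fviz-style biplot):

pca <- prcomp(USArrests, scale. = TRUE)
vm  <- varimax(pca$rotation[, 1:2])  # varimax-rotate the first two loading vectors
L   <- unclass(vm$loadings)          # strip the "loadings" class to get a plain matrix
plot(L, xlab = "RC1", ylab = "RC2", xlim = c(-1, 1), ylim = c(-1, 1))
text(L, labels = rownames(L), pos = 3)
abline(h = 0, v = 0, lty = 2)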
Does the number of entries among columns matter or not?
Are you referring to records/rows? It really depends on what type of machine learning model you use. In general, the more data you have, the better off your model will be. (Not always the case, i.e., billions of records for a PCA might be a bit overboard.)
But if you don't have a lot of data, then you will be worse off (i.e., < 100 records/observations).
Hello, would you please tell me whether it's possible for me to pursue an MS in ML/DL at a good US university if I have a 3.5ish CGPA from undergrad?
I'd argue that anything is possible. I'd take a look at Georgia Tech (*note that there are a ton of EXCELLENT MS programs out there). But I specifically know people who have gone through Georgia Tech's program, and I have heard good things about it. They have a DS program that is remote / part-time.
All the experts in YouTube videos say 'PCA is dimensionality reduction and bla bla bla…'. However, no one explains in a simpler way what the reduced dimensions or principal components (explaining x% of variance) actually mean in terms of the original variables, in a way a beginner in statistics can understand.
How do you analyze principal components using variance values?
Well... technically, the variances are the eigenvalues. You can check out the covariance matrix of the principal components and see how much variability is explained by each component. To find out how much of the variability is explained, you take the diagonal values and divide by the sum of the diagonal values to get the 'explainability' of each component... I am not sure if I answered your question.
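In R, that diagonal-over-sum calculation is a one-liner with prcomp, since the eigenvalues are the squares of the standard deviations it stores; a minimal sketch:

pca <- prcomp(USArrests, scale. = TRUE)
eigenvalues <- pca$sdev^2           # component variances (the eigenvalues)
eigenvalues / sum(eigenvalues)      # proportion of variance explained per component
summary(pca)                        # reports the same proportions, pre-computed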
Thank you, this was very helpful! One quick question I had was: if I wanted to know which variables were closely related to another variable, how could I interpret that? For example, of the 8 variables, which 3 are strongly related to bot.canine? How would I go about doing that?
Hi! Try looking at the correlations between variables. If the correlation between, say, X and Y is close to 1 (or -1), then you know that they are strongly related. You can run the correlation function on all the features and generate a correlation matrix.
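A minimal sketch on the built-in mtcars data (swap in your own data frame and the bot.canine column):

corr_matrix <- cor(mtcars)                           # correlation matrix of all features
sort(abs(corr_matrix[, "mpg"]), decreasing = TRUE)   # variables most strongly related to one target column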
Dear Respected Professor! Thank you so much for providing us free knowledge. I highly appreciate your precious efforts. Kindly please give me your email address since I want to send my issue regarding R codes to you. Thank you.
I am not a professor. :p
Though I can answer any questions you might have in the comment section, I can be reached at
business.inquiry.spao@gmail.com
@@SpencerPaoHere Thank you so much for your kind reply.