Principal Component Analysis (PCA) - easy and practical explanation

Biostatsquid

37,937 views


Published on 1 Jun 2024
In this video, I will give you an easy and practical explanation of Principal Component Analysis (PCA) and how to use it to visualise biological datasets.
You can also find a step-by-step explanation here: biostatsquid.com/pca-simply-e...
Hope you like it!
--------------------------------------------------------------------------------------------------------------------
Watched it already?
If you liked this video or found it useful, please let me know! Your comments and feedback are very much appreciated😊
If you have questions, don't hesitate to leave me a comment down below, I will answer as soon as I can:)
--------------------------------------------------------------------------------------------------------------------
For more biostatistics tools and resources, you can visit biostatsquid.com/ for:
• simple and clear explanations of biostatistics methods
• computational biology tools
• easy step-by-step tutorials in R and Python
to analyse and visualise your biological data!
Or follow me on Instagram at @biostatsquid
Don’t forget to subscribe if you don’t want to miss another video from me!
--------------------------------------------------------------------------------------------------------------------
More PCA resources:
A deeper explanation of the math behind PCA, without math!
towardsdatascience.com/princi...
I also really love this explanation from StatQuest!
• StatQuest: Principal C...

Comments • 78

@busyshah a month ago ⁺⁶
I am now convinced that there are no tough subjects, only ineffective tutors. I have been struggling to understand this concept for over 3 years, and here I am, within 11 minutes things have fallen into place.
An expert is not necessarily a great teacher. There might be great experts assigned in educational institutes to teach such concepts.
But someone like you is what we need in our schools and colleges (expert and well articulated).
Simplicity is the utmost form of sophistication.
Thanking you from the bottom of my heart.
Keep on helping people like us.
Perhaps another video on how to do it in R would be a great hit.
@biostatsquid a month ago ⁺¹
Thank you so much for your kind words, I'm really flattered! I'm glad it was useful and it cleared up concepts for you:) Great idea about an R tutorial, will definitely add it to my todo list!
@jumazevick5143 3 months ago ⁺¹⁵
This video just solved half of my problem in understanding PCA stats. To solve the other half, I need to translate the info to my actual research.
@sandracrnipolsek2289 a year ago ⁺²⁵
You are for sure principal component #1! You're the best at describing information ;)
@kyaw94 5 days ago
I'm currently watching without logging into my Google account. 😊 However, halfway through, I made the decision to log in, hit the like button, and subscribe to your channel. 🎉 Thank you for your valuable content-it's truly helpful, and I encourage you to keep up the great work! 👍
@cuby4942 24 days ago
I have watched so many videos trying to understand PCA, and this is by far the most interesting, with the fundamentals fully explained.
@edwardhitti3422 a year ago ⁺⁴
Amazingly well explained
@elmoelmo6505 8 days ago
Hi thank you so much for explaining PCA in such a clear way. I've been really stressed about understanding it for my uni stats exam, but now I feel much more confident :)
@user-zn9wj3wk8f 7 months ago
Best explanation on YouTube, awesome.
@julenekenyon3278 7 months ago
Very well explained, thank you!
@aditisharma9369 5 months ago
wow!!! that was explained so nicely by you..... thank you!
@wobby7055 20 days ago
So well explained. Thanks a bunch!
@antaraghoshal4012 5 months ago
Best video to understand PCA plot 😊
@ruthgyereh1052 a year ago ⁺¹
Fantastic presentation.
@michaellewandowski5489 3 months ago
This was excellent. Some people just know how to explain things
@Ms1Unique 8 months ago
Thank you. Well explained!
@tubihemukamamethodius6952 5 months ago
You really understand what you are talking about, big up
@nibirsaadman8214 a year ago ⁺¹
Wow! best PCA video on youtube.
@huyquach5044 a year ago
Thank you very much for your clear explanation.
@harshaharod2076 7 months ago
just on point!Loved it!
@serendipitum1694 a year ago ⁺²
super elegant and clear explanations, thank you!
@biostatsquid a year ago
Thank you, I'm happy you found it useful:)
@IqbalPrawira 2 months ago
the best explanation. easy to understand.
@moniquebrasilbaptista1989 a month ago
Loved it! It's a really comprehensive explanation!😍
@Dr.AmulyaPanda a month ago
Simply excellent !
@RafaelRabinovich 5 months ago ⁺¹
It would be great to have PCA explained conceptually, mathematically, as well as programmatically. When push comes to shove, we'll need to do it on a computer, running an algorithm that either we have to put in, or call from a Python library.
Thank you for all the work you do educating us!
@monicaaelavarthi5637 7 months ago
Well explained. Thank You.
@tathagatasharma 5 months ago
Thank you very much, very well explained
@coder-c6961 6 months ago
Great example, this was the exact example of what I'm doing too!
@vylam1521 2 months ago
Thanks for making an amazing video; it helped me explain things I have been researching for days.
@dannggg 17 days ago
Very good high level video!
@ranjanpal7217 6 months ago
Amazing explanation
@ricardoveiga007 a month ago
Amazing content, clearly explained! :)
@mohammedy.salemalihorbi1210 2 months ago
Good explanation, that's great!
@tarathetortoise 3 months ago
Was going insane looking for an understandable explanation of "what" a PCA is, until I found this video! Thank you very much!
@biostatsquid 2 months ago
Thank you for your kind words!! Glad it helped:)
@ahmadebrahem4611 a month ago
Very well explained
@andrefsr00 2 months ago
Nice video, thanks!
@AW12 a year ago ⁺¹
Great I benefited a lot!
@mdmahmudulhasanmiddya9632 8 months ago
Very good explanation mam
@nchimunyamuloongo4436 2 months ago
Woow.. this is so helpful
@Aoffyfeefy a year ago
Nice video😊
@zdkr_4ii a month ago
Lovely video, thank you for explaining!
@biostatsquid a month ago
Glad it was helpful! You're very welcome:)
@user-sh8og4pp9p 6 months ago
very nice
@onatovonatovic526 5 months ago
i wish i could hug you, thank you so much
@mariamontero5651 4 months ago
really nice, congratulations for your video! I follow you now :)
@basuumer501 3 months ago
Nice, very nice
@user-lb7yq4ws8z a month ago ⁺¹
It's well explained for a beginner to understand the plot, but if you want to know how to do it, this video can't help you.
@cujo7494 4 months ago
Very well explained. Two questions:
How do they know which dimensions they have to combine into a principal component to explain most of the variance? The combinations are limitless, especially for single-cell sequencing analysis.
Can combining dimensions also reduce the variance explained? For example, dimension 1 + dimension 2 explains 50% but dimension 1 + 3 explains 30%? How do you make sure this doesn't happen?
@biogfp9340 5 months ago
I'd love to have a tutorial on how to perform this in R. This was very well explained.
@biostatsquid 4 months ago
Great suggestion! I cover it a bit in the preprocessing video but maybe a specific video for PCA in R would be good - I'll keep it in mind! Thanks!
@shivavyavahare 4 days ago
How can I explain which factors contribute to PC1 and PC2? With a biplot?
@zeeshanazam5104 4 months ago
I have one question. I have 60 variables (A1-A60) with a sample size of 2k.
A1 is the first and A60 is the last; in between are A10, A20, A30, A40, A50 and the confirmed output. But for some of the samples the A19 and A29 outputs don't exist, as A20 was reached earlier; the data is of this type for some reasons.
Will PCA work in the same way as explained?
@jackdawson7385 18 days ago
Could you please tell me how we can calculate the principal loadings? I am a bit confused about this part.
@kishranai6262 8 months ago ⁺¹
Hi
Good presentation on PCA. Can we apply PCA to a dataset that has numeric and categorical data? Also, do we need to ensure that each variable follows a normal distribution, and if it does not, what should we do? Also, do we need to normalise each of the variables? I'd appreciate your comments.
@biostatsquid 8 months ago ⁺²
Hi, great questions. PCA is not recommended for categorical data, even if you one-hot encode it. For mixed data types, there are better alternatives like Factor Analysis of Mixed Data, available in the FactoMineR R package (FAMD()); Multiple Factor Analysis (MFA()) is also an option. I haven't got experience with either but you can check the thread here: stats.stackexchange.com/questions/5774/can-principal-component-analysis-be-applied-to-datasets-containing-a-mix-of-cont
Yes, it is necessary to standardise data before performing PCA because PCA basically maximises the variance. So if you have some variables with a very large variance and some with little variance, it will give more importance to the variables with large variance. If you change the scale of one of your variables, e.g., weight of mice, from kg to g, the variance increases, and the variable 'weight' will go from having little impact to being the main feature that explains variance in your dataset. Standardising will do the trick since it makes the SD of all the variables the same (normalization does not give all variables the same variance). Hope this was clear!
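The kg-to-g point in the reply above can be checked numerically. The following is only a minimal sketch in Python with NumPy, using made-up mouse weights and tail lengths (the variable names, numbers and helper functions are assumptions for illustration, not data or code from the video). It shows the first principal component switching from one variable to the other when only the unit changes, while the standardised data stay the same.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical, made-up measurements (not from the video)
weight_kg = rng.normal(0.025, 0.003, n)  # mouse weight in kg
tail_cm = rng.normal(8.0, 0.5, n)        # tail length in cm


def pc1_direction(X):
    """Unit eigenvector of the covariance matrix with the largest eigenvalue."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]


def standardise(X):
    """Centre each column and scale it to a standard deviation of 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


X_kg = np.column_stack([weight_kg, tail_cm])
X_g = np.column_stack([weight_kg * 1000, tail_cm])  # same mice, weight in g

pc1_kg = pc1_direction(X_kg)  # dominated by tail length
pc1_g = pc1_direction(X_g)    # dominated by weight

Z_kg = standardise(X_kg)
Z_g = standardise(X_g)        # identical to Z_kg up to rounding

print("PC1 (weight in kg):", np.round(np.abs(pc1_kg), 3))
print("PC1 (weight in g): ", np.round(np.abs(pc1_g), 3))
print("Standardised data identical:", np.allclose(Z_kg, Z_g))
```

Because the standardised matrices match, a PCA run on them gives the same result whichever unit the weights were recorded in.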
@lizheltamon a year ago ⁺¹
Hi! I really love your explanation! Would it be possible to get a copy of the dataset? I need to teach PCA and I think this is a nice example because the relationships between the factors are easy to understand! Would definitely point them to this video!
@biostatsquid a year ago
Hi Liz, thanks for your feedback! Unfortunately I cannot share my dataset, not because I don't want to, but because there is no dataset! I just made up the categories and figures for illustration purposes, just cause it is easier to understand when the factors are 'obvious'. So sorry to disappoint you...
However, you can check out my post here in case it is helpful: biostatsquid.com/pca-simply-explained/
@lizheltamon a year ago
@@biostatsquid no worries thanks so much!
@nicthofer 2 months ago
How do I obtain the loadings? Are they the same as the eigenvectors or the scaled coordinates? In my geochemical software ioGAS, the PCA report contains these items: Correlation - Eigenvectors - Eigenvector Plots - Eigenvalues - Scree Plot - Scaled Coordinates - PC1 vs PC2 - PC1 vs PC3 - PC1 vs PC4 and so on... (the last is PC3 vs PC4). My input was 32 chemical elements previously transformed with CLR.
@nicthofer 2 months ago
Here is the ioGAS description for Scaled Coordinates:
"Created by scaling the length of the eigenvector to the eigenvalue. All eigenvectors have a length of 1 so scaling by the eigenvalue changes the lengths so that the length is proportional to the variance (eigenvalue) accounted for by that eigenvector.
Click on a PC header column to sort the scaled coordinates from lowest to highest or vice versa."
And for Eigenvectors:
"Eigenvectors are PCA coordinate values that correspond to the projected location of the original input variables onto the calculated PCA axes. PC1, or the first eigenvector, is a calculated line of best fit through the maximum direction of variation for the selected variables. The PC1 eigenvectors represent the value of each input put along this line. PC2 is a line of best fit through the maximum variation at right angles to PC1 so the PC2 eigenvalues are the original input variable values projected onto this axis, and so on for each of the number of principal components.
An eigenvector may be in either of two opposite directions. ioGAS will always choose the eigenvector whose first element is positive. Click on a PC header column to sort the eigenvectors from lowest to highest or vice versa."
@nicthofer 2 months ago
Ahhh, I think the Loadings are equal to the Scaled Coordinates 😅
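For the eigenvectors versus scaled coordinates question in this thread, a small numerical sketch may help. It assumes one common convention, in which "loadings" are the unit-length eigenvectors of the correlation matrix multiplied by the square root of their eigenvalue; whether ioGAS scales by the eigenvalue or by its square root is not confirmed here. The data are random and made up, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 100 samples, 4 correlated variables (made up)
latent = rng.normal(size=(100, 2))
mixing = np.array([[1.0, 0.8, 0.1, -0.5],
                   [0.0, 0.3, 1.0, 0.7]])
X = latent @ mixing + 0.3 * rng.normal(size=(100, 4))

# Standardise, then decompose the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)   # ascending order
order = np.argsort(eigvals)[::-1]      # re-sort so PC1 comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvectors: one column per PC, each of length 1
# Loadings (assumed convention): column k scaled by sqrt(eigenvalue k)
loadings = eigvecs * np.sqrt(eigvals)

# PC scores: the samples projected onto the eigenvectors
scores = Z @ eigvecs

print("Eigenvector lengths:", np.round(np.linalg.norm(eigvecs, axis=0), 3))
print("Squared loading lengths:", np.round((loadings ** 2).sum(axis=0), 3))
print("Eigenvalues:            ", np.round(eigvals, 3))
```

Under this convention, and with standardised input, each loading equals the correlation between an original variable and the scores of a PC, which is why loadings are often easier to read than raw eigenvectors.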
@brettlidbury4110 4 months ago
Thank you for your video. After you have assigned PC1 to PC5 ..., you show the PC matrix in order reflecting the amount of variation explained, where there are a variety of values listed under each PC from - 6 to +6. What do these values represent?
@biostatsquid 4 months ago ⁺¹
Hi! Thanks for your question! So the values are just an example, they don't necessarily go from -6 to +6. Basically, the values represent the 'contribution' of that variable to a specific PC. Since PCs are ranked by the variation of the dataset they explain (PC1 explains more than PC2, which in turn explains more than PC3...), variables with higher (more positive) or lower (more negative) values for the first PCs (e.g., PC1) are 'more important'; in other words, they explain more variability in the dataset. Hope this helped!
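The idea of variables 'contributing' to a PC can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions only: the data are random, the variable names (var_a to var_d) are hypothetical, and the values ranked here are the PC1 coefficients from an SVD of the standardised data, which may not match the table shown in the video.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
names = ["var_a", "var_b", "var_c", "var_d"]  # hypothetical variable names

# Made-up data: one hidden driver shapes var_a and var_b strongly,
# var_c weakly, and var_d not at all
driver = rng.normal(size=n)
X = np.column_stack([
    driver + 0.2 * rng.normal(size=n),   # var_a follows the driver
    -driver + 0.2 * rng.normal(size=n),  # var_b moves the opposite way
    0.3 * driver + rng.normal(size=n),   # var_c is mostly noise
    rng.normal(size=n),                  # var_d is pure noise
])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# SVD of the standardised data: the rows of Vt are the PC directions
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = Vt[0]

# Rank variables by the size of their PC1 coefficient, ignoring the sign
ranking = sorted(zip(names, pc1), key=lambda pair: abs(pair[1]), reverse=True)
for name, coef in ranking:
    print(f"{name}: {coef:+.2f}")
```

The sign only says in which direction a variable moves along the PC; the absolute value is what ranks its contribution.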
@brettlidbury4110 4 months ago
Thank you very much for your rapid reply and explanation. I thought that this was the case, but was not certain. As an extension of my question, do these + or - values under each PC align with a tick mark on the x:y and -x:-y axes? (for reference the axes you use to demonstrate these concepts around 5:10 to 5:30 minutes into your presentation). If "yes", and by way of feedback, having a scale on these axes would be helpful. I have watched 3 separate presentations on PCA today, and I have found yours the most useful. Thank you again, and in particular for responding to my question so quickly. Best wishes.
@biostatsquid 4 months ago ⁺¹
Hi thanks so much for your feedback! No, they're not! The tick marks represent increments of 1 (so 1, 2, 3, 4...) and I think my intention was to make them match the PC scores, but I must have changed the labels around to make it make sense with the biology and forgot to update the table. But they should match, so thanks for pointing that out! Will correct it if I ever do a part 2 on this:) Cheers @@brettlidbury4110
@brettlidbury4110 4 months ago
@@biostatsquid My pleasure and looking forward to the next installment (o:
@rd10718 2 months ago
Looking for a response from the Author - What is the significance of a low PCA for a large biological data set? - Does a PC1 of
@biostatsquid 2 months ago ⁺¹
If PC1 is 20% it means it explains 20% of the variability of the dataset. You can then check which are the top contributing variables of PC1 to figure out what are the features of your dataset that explain most variability. In complex scenarios you might be happy with 20% of variability. For example, you are studying height in the human population, and want to figure out which genes contribute to height. You 'take' a sample of people with different heights, do RNAseq to figure out gene expression (this is a very simple example, but let's go with it). You do PCA on the gene expression counts of all genes in the human genome. PC1 explains 20% of variability (i.e., differences in height in the sample you took). Then you check, and the top PC1-contributing genes are X, Y, Z. So you know that X, Y, Z genes most probably play an important role in height. But of course this is only 20% of the variability of your data. What about the other 80%? Well, you forgot about other important factors that contribute to height, like diet, gender, genomic variability (not only transcriptomics, but also epigenetics, genomics might play an important role!) ... etc. Hope this made it a bit easier to understand!
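How a figure like "PC1 explains 20%" is obtained can be sketched as follows. This is a minimal example in Python with NumPy on a random, made-up matrix shaped like a small expression table (50 samples by 200 features); the sizes, the noise level and the variable names are all assumptions, and no specific percentage is implied.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical expression-like matrix: 50 samples x 200 features (made up)
n_samples, n_features = 50, 200
signal = rng.normal(size=(n_samples, 1)) @ rng.normal(size=(1, n_features))
X = signal + 2.0 * rng.normal(size=(n_samples, n_features))

# Centre the columns, then take the singular values of the data matrix
Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)  # largest first

# Variance along each PC, and the share of the total it accounts for
var_per_pc = singular_values ** 2 / (n_samples - 1)
explained_ratio = var_per_pc / var_per_pc.sum()
cumulative = np.cumsum(explained_ratio)

for k in range(5):
    print(f"PC{k + 1}: {explained_ratio[k]:.1%} (cumulative {cumulative[k]:.1%})")
```

The shares always sum to 100% and never increase from one PC to the next, so whatever PC1 leaves unexplained is spread over the remaining components.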
@nikitrianta9896 a year ago
Very helpful video, but I'm not sure I understand: to use PCA, do the variables need to be correlated or not?
@biostatsquid a year ago
Hi Niki, not sure if I understand your question, could you rephrase it, please?
@nikitrianta9896 a year ago
@@biostatsquid Sorry it was not clear... I just wonder if there is a limitation to applying PCA only in cases where there is some correlation among the factors, or some of the factors; for example, height and weight are correlated, etc.
@biostatsquid a year ago ⁺¹
@@nikitrianta9896 Oh I see! No, not at all, actually PCA allows you to gather insights about the features describing your data - by looking at the coefficients of the features/variables for each PC you can find out if they are positively, negatively or not correlated.
If you want to visualise this you can draw a plot of the coefficients for PC1 vs PC2 (for example) for all features. For each feature, imagine (or draw) a vector with origin in (0, 0) to the point (coefficient PC1, coefficient PC2). Features that are positively correlated with each other have an angle between their vectors close to 0 degrees, if they are negatively correlated the angle between them is close to 180 degrees, and if they are not correlated the angle is close to 90 degrees.
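The angle rule described above can be tried out on made-up data. The sketch below is in Python with NumPy and rests on two assumptions: the feature coordinates are the eigenvector coefficients scaled by the square root of the eigenvalue (often called loadings), and PC1 and PC2 together capture most of the variance, which is the situation in which the rule works well. The features a, b, c and d are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)

# Made-up features: a and b move together, c moves against them,
# d is unrelated to the other three
X = np.column_stack([
    f1 + 0.2 * rng.normal(size=n),   # a
    f1 + 0.2 * rng.normal(size=n),   # b
    -f1 + 0.2 * rng.normal(size=n),  # c
    f2 + 0.2 * rng.normal(size=n),   # d
])

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]  # PC1 first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# One 2D point per feature: (coordinate on PC1, coordinate on PC2)
coords = (eigvecs * np.sqrt(eigvals))[:, :2]


def angle_deg(u, v):
    """Angle in degrees between two vectors drawn from the origin."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


angle_ab = angle_deg(coords[0], coords[1])  # positively correlated pair
angle_ac = angle_deg(coords[0], coords[2])  # negatively correlated pair
angle_ad = angle_deg(coords[0], coords[3])  # uncorrelated pair

print(f"a vs b: {angle_ab:.0f} degrees")
print(f"a vs c: {angle_ac:.0f} degrees")
print(f"a vs d: {angle_ad:.0f} degrees")
```

Flipping the sign of a PC mirrors every point at once, so the angles between features do not depend on the sign a given program happens to pick.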
Does this answer your question? :)
@ruthgyereh1052 a year ago
Do you by chance make time for appointments? I would be grateful. Thanks
@biostatsquid a year ago
Hi Ruth! Just send me an email describing your issue and I'll tell you if I can help:)
@tomnewman9306 3 months ago
At ~3:58 you say the principal components explain 85% of the variance in life expectancy. I don't think that's right. I think it's 85% of the variance in the predictor variables. Or am I totally confused?
@user-God-s-child-0101 a day ago
Whole world creator's godfather bless you all always and you all love and remember godfather with your pure hearts.
