Principal Component Analysis in R Programming | How to Apply PCA | Step-by-Step Tutorial & Example

  • Published on 27 Jun 2023
  • This video explains how to apply a Principal Component Analysis (PCA) in R. More details: statisticsglobe.com/principal...
    The video is presented by Cansu Kebabci, a data scientist and statistician at Statistics Globe. Find more information about Cansu here: statisticsglobe.com/cansu-keb...
    In the video, Cansu explains the steps and application of the Principal Component Analysis in R. Watch the video to learn more on this topic!
    Here you can find the first part of this series:
    Introduction to Principal Component Analysis (Pt. 1 - Theory): • Introduction to Princi...
    Links to the tutorials mentioned in the video:
    Can PCA be Used for Categorical Variables? (Alternatives & Example): statisticsglobe.com/pca-categ...
    PCA Using Correlation & Covariance Matrix (Examples): statisticsglobe.com/pca-corre...
    Biplot of PCA in R (Examples): statisticsglobe.com/biplot-pca-r
    R code of this video:
    # Install Packages (only needed once)
    install.packages("MASS")
    install.packages("factoextra")
    install.packages("ggplot2")
    # Load Libraries
    library(MASS)
    library(factoextra)
    library(ggplot2)
    # Import biopsy data (shipped with MASS)
    data(biopsy)
    dim(biopsy)
    # Structure of Data
    str(biopsy)
    summary(biopsy)
    # Delete Cases with Missingness
    biopsy_nomiss <- na.omit(biopsy)
    # Exclude the ID (column 1) and the categorical class variable (column 11)
    biopsy_sample <- biopsy_nomiss[, -c(1, 11)]
    # Run PCA on standardized variables
    biopsy_pca <- prcomp(biopsy_sample,
                         scale. = TRUE)
    # Summary of Analysis
    summary(biopsy_pca)
    # Elements of PCA object
    names(biopsy_pca)
    # Std Dev of Components
    biopsy_pca$sdev
    # Eigenvectors (loadings)
    biopsy_pca$rotation
    # Mean (center) and Std Dev (scale) of Variables
    biopsy_pca$center
    biopsy_pca$scale
    # Principal Component Scores
    biopsy_pca$x
    # Scree Plot of Variance
    fviz_eig(biopsy_pca,
             addlabels = TRUE,
             ylim = c(0, 70))
    # Biplot with Default Settings
    fviz_pca_biplot(biopsy_pca)
    # Biplot with Labeled Variables
    fviz_pca_biplot(biopsy_pca,
                    label = "var")
    # Biplot with Colored Groups
    fviz_pca_biplot(biopsy_pca,
                    label = "var",
                    habillage = biopsy_nomiss$class)
    # Biplot with Customized Colored Groups and Variables
    fviz_pca_biplot(biopsy_pca,
                    label = "var",
                    habillage = biopsy_nomiss$class,
                    col.var = "black") +
      scale_color_manual(values = c("orange", "purple"))
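To make the printed elements less of a black box, the following minimal sketch (assuming only base R and the MASS package) recomputes the main prcomp() outputs by hand. With scale. = TRUE, PCA amounts to an eigen decomposition of the correlation matrix: the component standard deviations are the square roots of the eigenvalues, and the rotation matrix holds the eigenvectors, which are only defined up to sign.

```r
library(MASS)  # provides the biopsy data

data(biopsy)
biopsy_sample <- na.omit(biopsy)[, -c(1, 11)]
biopsy_pca <- prcomp(biopsy_sample, scale. = TRUE)

# With scale. = TRUE, PCA is an eigen decomposition of the correlation matrix
eig <- eigen(cor(biopsy_sample))

# Component standard deviations are the square roots of the eigenvalues
sdev_manual <- sqrt(eig$values)

# Proportion of variance per component (the row reported by summary())
prop_var <- eig$values / sum(eig$values)

# Eigenvectors match the rotation matrix up to the sign of each column
loadings_match <- all.equal(abs(unname(eig$vectors)),
                            abs(unname(biopsy_pca$rotation)))

# Scores are the standardized data multiplied by the eigenvectors
scores_manual <- scale(biopsy_sample) %*% eig$vectors

print(round(prop_var, 3))
```

Comparing absolute values is deliberate: eigen() and prcomp() may return the same eigenvector with opposite signs, which does not change the interpretation.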
    Follow me on Social Media:
    Facebook - Statistics Globe Page: / statisticsglobecom
    Facebook - R Programming Group for Discussions & Questions: / statisticsglobe
    Facebook - Python Programming Group for Discussions & Questions: / statisticsglobepython
    LinkedIn - Statistics Globe Page: / statisticsglobe
    LinkedIn - R Programming Group for Discussions & Questions: / 12555223
    LinkedIn - Python Programming Group for Discussions & Questions: / 12673534
    Twitter: / joachimschork
    Instagram: / statisticsglobecom
    TikTok: / statisticsglobe

Comments • 60

  • @macanbhaird1966
    @macanbhaird1966 11 months ago +3

    Great stuff. Clearly explained and easy to follow for a somewhat complicated analysis. Thanks for this!

    • @cansustatisticsglobe
      @cansustatisticsglobe 11 months ago +1

      Hello,
      I am so glad to hear that. You are welcome!
      Best,
      Cansu

  • @darrylmorgan
    @darrylmorgan 11 months ago +3

    Really helpful tutorial. Thank you Cansu and Joachim!!

    • @cansustatisticsglobe
      @cansustatisticsglobe 11 months ago

      Hello Darryl,
      I am glad that the tutorial was helpful for you. Welcome!
      Best,
      Cansu

  • @user-wq3df5wd2q
    @user-wq3df5wd2q 8 months ago +2

    This was an excellent presentation, and doubly good when paired with the intro one. I agree with others that being able to take the final solution and unscale and unrotate it to get back to the original variables would have made this off-the-charts great!

    • @cansustatisticsglobe
      @cansustatisticsglobe 8 months ago

      Hello hello!
      Thank you for your encouraging feedback. I am not sure I got your last point. Are you interested in recovering the original values from the calculated principal components?
      Best,
      Cansu
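Regarding the question about getting back to the original variables: the following is a minimal sketch, not part of the original tutorial, that "unrotates" and "unscales" the scores of a prcomp() fit. The helper reconstruct_from_pca() is hypothetical. Using all components reproduces the data exactly; using fewer gives a low-rank approximation in the original units.

```r
library(MASS)

data(biopsy)
biopsy_sample <- na.omit(biopsy)[, -c(1, 11)]
biopsy_pca <- prcomp(biopsy_sample, scale. = TRUE)

# Hypothetical helper: map the first k component scores back to original units
reconstruct_from_pca <- function(pca, k) {
  # "Unrotate": scores (n x k) times transposed eigenvectors (k x p)
  z <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
  # "Unscale": multiply by each variable's SD, then add back its mean
  z <- sweep(z, 2, pca$scale, "*")
  sweep(z, 2, pca$center, "+")
}

# All components: recovers the original data (up to floating-point error)
full <- reconstruct_from_pca(biopsy_pca, ncol(biopsy_pca$x))

# First two components only: an approximation of the original data
approx2 <- reconstruct_from_pca(biopsy_pca, 2)
```

This works because the rotation matrix is orthogonal, so multiplying by its transpose undoes the rotation; the scale and center vectors stored in the prcomp object undo the standardization.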

  • @johneagle4384
    @johneagle4384 11 months ago +2

    Thank you Joachim and Cansu. This is a good overview of PCA.

    • @cansustatisticsglobe
      @cansustatisticsglobe 11 months ago

      Hello John,
      I'm glad to hear that you're interested. If you're also considering learning how to perform PCA in Python, be sure not to miss the upcoming tutorial in this series.
      Best,
      Cansu

  • @KameshwarChoppella
    @KameshwarChoppella 3 months ago +1

    Well done! This was simple and straightforward.

    • @StatisticsGlobe
      @StatisticsGlobe 3 months ago

      Thanks a lot, glad you found it helpful! :)

  • @anuraratnasiri5516
    @anuraratnasiri5516 11 months ago +1

    Thank you so much for sharing valuable information about PCA!

    • @cansustatisticsglobe
      @cansustatisticsglobe 11 months ago

      Hello!
      You are very welcome! You can always visit our Statistics Globe webpage: statisticsglobe.com/ for further details about PCA.
      Best,
      Cansu

  • @greggunter5975
    @greggunter5975 2 months ago +2

    If this is Part 2, please label the video "Part 2" so it is easy for people to watch them in sequence.
    Thanks

    • @StatisticsGlobe
      @StatisticsGlobe 2 months ago

      Hey, thanks for the feedback, Greg! You can also watch this video without watching the first one, if you are only interested in how to apply PCA in R.

  • @USKalemao
    @USKalemao 4 months ago

    Thanks a lot for this valuable video! It was very easy to follow the explanations.

    • @Ifeanyi.StatisticsGlobe
      @Ifeanyi.StatisticsGlobe 4 months ago

      Thanks for your kind words, Kalemao. We are happy that you found the video helpful!

  • @rubicleisgomes323
    @rubicleisgomes323 11 months ago +1

    I will use this example in my classes! Thank you very much.

    • @cansustatisticsglobe
      @cansustatisticsglobe 11 months ago

      Hello,
      I am glad that you liked the example. You are welcome.
      Best,
      Cansu

  • @wakjiratesfahun3682
    @wakjiratesfahun3682 11 months ago +1

    Welcome, Cansu! Excellent tutor. I love her way of teaching. Just 😮

    • @cansustatisticsglobe
      @cansustatisticsglobe 11 months ago +1

      Hello!
      Thank you for such kind words. Feel free to let me know if you have any questions about this topic.
      Best,
      Cansu

    • @wakjiratesfahun3682
      @wakjiratesfahun3682 11 months ago +1

      @@cansustatisticsglobe Sure. I would love to see a video from you on principal coordinate analysis. Is that okay?

    • @cansustatisticsglobe
      @cansustatisticsglobe 11 months ago

      @@wakjiratesfahun3682 Noted!

  • @deeptimittal8420
    @deeptimittal8420 months ago +1

    Thank you, Statistics Globe, for the insightful and informative video. Please keep posting this kind of tutorial.

    • @StatisticsGlobe
      @StatisticsGlobe months ago

      Thanks a lot, will do! :)

    • @fisherh9111
      @fisherh9111 12 days ago

      The part 1 video does this well I think.

  • @preeyawangsomnuk189
    @preeyawangsomnuk189 10 months ago +2

    Thanks for your video.

    • @matthias.statisticsglobe
      @matthias.statisticsglobe 10 months ago

      Hey Preeya! Thanks for your comment, hope the video has been helpful!

  • @Geology_monster
    @Geology_monster 23 days ago

    Love u guys 🙌🏼

  • @AlbertoFCabreraCasillas
    @AlbertoFCabreraCasillas 11 months ago +4

    Excellent presentation of PCA analyses by Cansu. You may want to alert the reader that either MASS or factoextra libraries affect the tidyverse select() function. I had to reinstall tidyverse after completing this session. On another point, I wonder if Cansu will describe the process to rotate the factor solution. I assume the example dealt with an unrotated solution.

    • @cansustatisticsglobe
      @cansustatisticsglobe 11 months ago +1

      Hello Alberto,
      I am happy to hear that you liked the tutorial. Thank you for the feedback.
      I have double-checked the functions of the packages. MASS does export a generic function called select() (used with lm.ridge() fits), so loading MASS after dplyr or the tidyverse masks dplyr's select(). Reinstalling the tidyverse is not necessary: you can explicitly call dplyr::select() whenever you want the dplyr version, or load MASS before the tidyverse so that dplyr's select() takes precedence.
      Regarding the second point, rotation is mainly used in factor analysis, especially exploratory factor analysis (EFA), to ease the interpretation of the factors. However, rotating may undermine the purpose of PCA, which is to capture as much variability as possible with the first few components. In other words, an individual rotated component might not explain as much of the variance as the corresponding original principal component, even though an orthogonal rotation leaves the total variance explained by the rotated set unchanged. That is why, in practice, it's often more useful to interpret the loadings of the original variables on the principal components directly.
      I hope it is clear. Please let me know if I am missing something, or in case of any further questions.
      Best,
      Cansu
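The effect of rotation on explained variance can be checked directly. The sketch below assumes stats::varimax() applied to the loadings of the first two components (eigenvectors scaled by the component standard deviations, a common convention) and compares the sum of squared loadings per component before and after rotation: the total is preserved, but it is redistributed, so the first rotated component no longer carries the maximum possible share.

```r
library(MASS)

data(biopsy)
biopsy_sample <- na.omit(biopsy)[, -c(1, 11)]
biopsy_pca <- prcomp(biopsy_sample, scale. = TRUE)

k <- 2
# Loadings: eigenvectors scaled by the component standard deviations
L <- biopsy_pca$rotation[, 1:k] %*% diag(biopsy_pca$sdev[1:k])

# Orthogonal varimax rotation of the retained loadings
vm <- varimax(L)

# Variance explained per component = sum of squared loadings
ss_unrotated <- colSums(L^2)
ss_rotated <- colSums(unclass(vm$loadings)^2)

print(round(ss_unrotated, 3))
print(round(ss_rotated, 3))
```

Because varimax applies an orthogonal rotation matrix (vm$rotmat), the column totals can only be reshuffled among the retained components, never increased beyond the first eigenvalue.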

  • @ginnistrehle426
    @ginnistrehle426 5 months ago +1

    Very helpful! Is there a way to only project the observations in the space instead of a biplot of both the observations and variables?

    • @cansustatisticsglobe
      @cansustatisticsglobe 5 months ago

      Hello!
      I am glad that you found the tutorial helpful. Sure! You can use scatterplots for that. Please see the tutorial on our website: statisticsglobe.com/scatterplot-pca-r
      Best,
      Cansu
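As a minimal illustration of an observations-only plot, the sketch below plots the first two columns of the score matrix (biopsy_pca$x) in base R, colored by diagnosis. The colors mirror the video's customized biplot and are otherwise arbitrary; factoextra's fviz_pca_ind() offers a similar plot with less code.

```r
library(MASS)

data(biopsy)
biopsy_nomiss <- na.omit(biopsy)
biopsy_pca <- prcomp(biopsy_nomiss[, -c(1, 11)], scale. = TRUE)

# First two score columns together with the class labels
scores <- data.frame(PC1 = biopsy_pca$x[, 1],
                     PC2 = biopsy_pca$x[, 2],
                     class = biopsy_nomiss$class)

# Observations only: no variable arrows, unlike a biplot
group_cols <- c("orange", "purple")
plot(scores$PC1, scores$PC2,
     col = group_cols[scores$class],  # factor codes index the colors
     pch = 19, xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(scores$class),
       col = group_cols, pch = 19)
```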

  • @erdavg
    @erdavg 8 months ago +1

    Thank you so much!

    • @matthias.statisticsglobe
      @matthias.statisticsglobe 8 months ago

      Thank you very much for the positive feedback. Hope the video has been helpful.

  • @Mustafa-dw3wm
    @Mustafa-dw3wm 5 months ago +1

    Very understandable. Thanks a lot. But one question: isn't it necessary to run a Kaiser-Meyer-Olkin test before the PCA?

    • @cansustatisticsglobe
      @cansustatisticsglobe 5 months ago

      Hello Mustafa,
      The Kaiser-Meyer-Olkin test is more commonly used for other factor analysis techniques like EFA and CFA. It may also be a useful tool for assessing the adequacy of your data for PCA. However, in practice, PCA usually proceeds without it, as PCA is used more for pattern recognition and dimensionality reduction rather than strict factor extraction like in EFA and CFA.
      Best,
      Cansu
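If one does want to check sampling adequacy, the overall KMO statistic can be computed from the correlation matrix and the partial correlations implied by its inverse. This is a minimal sketch with a hypothetical helper kmo_overall(); the psych package's KMO() function is an established alternative that also reports per-variable values.

```r
library(MASS)

data(biopsy)
biopsy_sample <- na.omit(biopsy)[, -c(1, 11)]

# Hypothetical helper: overall Kaiser-Meyer-Olkin measure of sampling adequacy
kmo_overall <- function(x) {
  r <- cor(x)
  inv <- solve(r)
  # Partial correlations derived from the inverse correlation matrix
  p <- -inv / sqrt(outer(diag(inv), diag(inv)))
  off <- row(r) != col(r)  # use off-diagonal elements only
  sum(r[off]^2) / (sum(r[off]^2) + sum(p[off]^2))
}

kmo <- kmo_overall(biopsy_sample)
print(round(kmo, 3))
```

Values close to 1 indicate that correlations are large relative to partial correlations, which is the situation in which a few components can summarize many variables.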

  • @alessandrorosati969
    @alessandrorosati969 11 months ago +1

    Can a dataset consisting of the principal components and the target variable be used to perform machine learning techniques?

    • @cansustatisticsglobe
      @cansustatisticsglobe 11 months ago

      Hello Alessandro,
      Yes, you can.
      Best,
      Cansu
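A minimal sketch of that workflow, assuming a logistic regression on the first two component scores as the model (any learner could be swapped in). One caution not covered in the reply: fit the PCA on the training data only and project the test data with predict(), so that the components carry no information from the test set. The 70/30 split, the seed, and k = 2 are arbitrary choices.

```r
library(MASS)

data(biopsy)
biopsy_nomiss <- na.omit(biopsy)
x <- biopsy_nomiss[, -c(1, 11)]
y <- biopsy_nomiss$class

set.seed(1)
train <- sample(nrow(x), size = round(0.7 * nrow(x)))

# Fit PCA on the training rows only to avoid leaking test information
pca_train <- prcomp(x[train, ], scale. = TRUE)

# Keep the first k components for both sets
k <- 2
train_df <- data.frame(pca_train$x[, 1:k], class = y[train])
test_scores <- as.data.frame(predict(pca_train, newdata = x[-train, ])[, 1:k])

# Logistic regression on the component scores (models P(class = "malignant"))
fit <- glm(class ~ ., data = train_df, family = binomial)
prob <- predict(fit, newdata = test_scores, type = "response")
pred <- ifelse(prob > 0.5, "malignant", "benign")

accuracy <- mean(pred == y[-train])
print(accuracy)
```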

  • @BhanuBhaktaSharma-ut5zw
    @BhanuBhaktaSharma-ut5zw 7 months ago +1

    Hi, is there any way of doing PCA for different variables with unequal numbers of values without omitting cases?

    • @cansustatisticsglobe
      @cansustatisticsglobe 7 months ago

      Hello!
      Do you mean variables with different lengths of inputs? If so, yes, you can, but the lacking inputs would be treated as missing values. Then, dealing with missingness comes into play. You should decide on a missingness handling method. The most straightforward but hazardous one is omitting the cases with missingness. For more advanced methods, see statisticsglobe.com/missing-data/.
      Best,
      Cansu
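One alternative to deleting cases is iterative PCA imputation: fill the gaps with column means, fit a PCA, replace the missing cells with the low-rank reconstruction, and repeat until the imputed values stop changing. The sketch below is a minimal, unregularized version with a hypothetical helper impute_pca(); the missMDA package's imputePCA() implements a regularized version of this idea. The choice of two components and the stopping tolerance are assumptions.

```r
library(MASS)

data(biopsy)
x <- as.matrix(biopsy[, -c(1, 11)])  # nine numeric variables; V6 has NAs

# Hypothetical helper: unregularized iterative PCA imputation
impute_pca <- function(x, k = 2, max_iter = 100, tol = 1e-6) {
  miss <- is.na(x)
  # Initialize missing cells with their column means
  col_means <- colMeans(x, na.rm = TRUE)
  x[miss] <- col_means[col(x)[miss]]
  for (i in seq_len(max_iter)) {
    pca <- prcomp(x, scale. = TRUE)
    # Rank-k reconstruction, back in the original units
    fit <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
    fit <- sweep(sweep(fit, 2, pca$scale, "*"), 2, pca$center, "+")
    change <- max(abs(x[miss] - fit[miss]))
    x[miss] <- fit[miss]  # update only the missing cells
    if (change < tol) break
  }
  x
}

x_imputed <- impute_pca(x, k = 2)
pca_all <- prcomp(x_imputed, scale. = TRUE)  # PCA on all 699 cases
```

Unlike na.omit(), this keeps every case, but the imputed values inherit the assumption that a few components describe the data well.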

  • @Nico_boost
    @Nico_boost 7 months ago +1

    Nice video! How can one remove principal components to simplify the model?

    • @cansustatisticsglobe
      @cansustatisticsglobe 7 months ago

      Hello Nico!
      I am glad that you liked the video. If you want to keep, let's say, only the first two principal components to simplify your dataset, you can do the following for the dataset in this tutorial:
      # Principal Component Scores
      biopsy_pca$x
      # Retrieve the first two components
      biopsy_pca$x[,1:2]
      As you can see, it is a simple dataset subsetting operation.
      Best,
      Cansu

    • @Nico_boost
      @Nico_boost 7 months ago +1

      Thank you!

    • @cansustatisticsglobe
      @cansustatisticsglobe 7 months ago

      Welcome @@Nico_boost !
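Instead of fixing the number of components in advance, a common rule of thumb is to keep the smallest number that reaches a target cumulative proportion of variance, the same quantity reported by summary(biopsy_pca). A minimal sketch, where the 80% threshold is an arbitrary assumption:

```r
library(MASS)

data(biopsy)
biopsy_pca <- prcomp(na.omit(biopsy)[, -c(1, 11)], scale. = TRUE)

# Cumulative proportion of variance explained
cum_var <- cumsum(biopsy_pca$sdev^2) / sum(biopsy_pca$sdev^2)

# Smallest number of components reaching the (arbitrary) 80% threshold
k <- which(cum_var >= 0.80)[1]

# Keep only those component scores
reduced <- biopsy_pca$x[, 1:k, drop = FALSE]
print(k)
```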

  • @thezenithanalysis7541
    @thezenithanalysis7541 12 days ago +1

    Which variable should we use as the index (independent variable)? I mean, what is the biopsy index variable we will use for the analysis?

    • @StatisticsGlobe
      @StatisticsGlobe 11 days ago

      I'm not an expert in the biopsy field, but as far as I know, the natural candidate for a biopsy index is the first principal component (PC1) of the PCA on the nine cytological variables. This component captures the largest share of the variation in the data, so its scores can serve as a composite index.
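A minimal sketch of turning PC1 into an index, assuming a 0 to 100 rescaling for readability. Because the sign of a principal component is arbitrary, the sketch flips it so that higher index values correspond to higher scores on the original variables.

```r
library(MASS)

data(biopsy)
biopsy_nomiss <- na.omit(biopsy)
biopsy_pca <- prcomp(biopsy_nomiss[, -c(1, 11)], scale. = TRUE)

pc1 <- biopsy_pca$x[, 1]
# Component signs are arbitrary: orient PC1 so that its loadings are
# predominantly positive (higher index = higher original scores)
if (sum(biopsy_pca$rotation[, 1]) < 0) pc1 <- -pc1

# Rescale to 0-100
index <- 100 * (pc1 - min(pc1)) / (max(pc1) - min(pc1))

# Average index within each known diagnosis group
print(tapply(index, biopsy_nomiss$class, mean))
```

The class variable is not used to build the index; comparing group means afterwards is only a sanity check.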

  • @thezenithanalysis7541
    @thezenithanalysis7541 12 days ago +1

    Can we use PCA to make an index using a set of dummy variables?

    • @StatisticsGlobe
      @StatisticsGlobe 11 days ago

      Hey! Yes, PCA can be used to create an index from a set of dummy variables by transforming them into principal components that summarize the underlying patterns.

  • @ariskoitsanos607
    @ariskoitsanos607 10 months ago +1

    Nice intro. If you had added at least some basic interpretation of the results, it would have been even better.

    • @cansustatisticsglobe
      @cansustatisticsglobe 10 months ago

      Hey!
      Thank you for your feedback. It is the second video of our PCA series. I explain how to interpret the results in our first video, which you can find here: th-cam.com/video/DngS4LNNzc8/w-d-xo.html. I hope it helps!
      Best,
      Cansu

  • @freya_yuen
    @freya_yuen 8 months ago +1

    Is it possible to perform PCA without the factoextra library, relying solely on the tidyverse?

    • @cansustatisticsglobe
      @cansustatisticsglobe 8 months ago

      Hello Melody!
      Yes, you don't need to install the factoextra package. It is just needed to visualize the results easily. You can also use only ggplot2 for the visualization, but then you need to write more lines of code to obtain the same visual. Please let me know if you still have some questions.
      Best,
      Cansu
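As an illustration of the ggplot2-only route, here is a minimal sketch that recreates a scree plot similar to fviz_eig() from the component standard deviations. The colors, axis labels, and axis limit are assumptions chosen to resemble the video's plot.

```r
library(MASS)
library(ggplot2)

data(biopsy)
biopsy_pca <- prcomp(na.omit(biopsy)[, -c(1, 11)], scale. = TRUE)

# Percentage of variance explained per component
scree_df <- data.frame(
  component = factor(seq_along(biopsy_pca$sdev)),
  percent = 100 * biopsy_pca$sdev^2 / sum(biopsy_pca$sdev^2)
)

scree_plot <- ggplot(scree_df, aes(x = component, y = percent)) +
  geom_col(fill = "steelblue") +
  geom_line(aes(group = 1)) +   # connect the bar tops
  geom_point() +
  geom_text(aes(label = sprintf("%.1f%%", percent)), vjust = -0.5) +
  labs(x = "Dimensions", y = "Percentage of explained variances") +
  ylim(0, 70)

print(scree_plot)
```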

    • @freya_yuen
      @freya_yuen 8 months ago +1

      @@cansustatisticsglobe Got it!! Thanks for the reply!!

    • @cansustatisticsglobe
      @cansustatisticsglobe 8 months ago

      Welcome @@freya_yuen !

  • @ambujmishra695
    @ambujmishra695 5 months ago

    If the labels of the resulting variables overlap, what can be done to separate them? I used geom_text_repel with nudge_x = c(4, 5, 6), # Adjust horizontal position for each label
    nudge_y = c(0, 0, 0) # Adjust vertical position for each label
    etc., but nothing worked. Maybe I was using it incorrectly.
    Kindly guide me on this, please.

    • @Ifeanyi.StatisticsGlobe
      @Ifeanyi.StatisticsGlobe 4 months ago

      Hi Ambujmishra. Sorry about the late response to your comment. Do you still require assistance with the problem?

    • @ambujmishra695
      @ambujmishra695 months ago

      @@Ifeanyi.StatisticsGlobe yes
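One possible approach, sketched under the assumption that the overlapping labels are the variable names in a loading plot: let ggrepel place the labels automatically instead of hand-tuning nudge_x and nudge_y. box.padding, max.overlaps, and min.segment.length are geom_text_repel() arguments; the values below are illustrative. Within factoextra, fviz_pca_biplot() also accepts repel = TRUE.

```r
library(MASS)
library(ggplot2)
library(ggrepel)

data(biopsy)
biopsy_pca <- prcomp(na.omit(biopsy)[, -c(1, 11)], scale. = TRUE)

# Loadings of the nine variables on the first two components
loadings_df <- data.frame(variable = rownames(biopsy_pca$rotation),
                          PC1 = biopsy_pca$rotation[, 1],
                          PC2 = biopsy_pca$rotation[, 2])

p <- ggplot(loadings_df, aes(x = PC1, y = PC2)) +
  geom_segment(aes(x = 0, y = 0, xend = PC1, yend = PC2),
               arrow = arrow(length = unit(0.2, "cm"))) +
  # Repelled labels: never dropped, always connected to their arrow tips
  geom_text_repel(aes(label = variable),
                  box.padding = 0.5,
                  max.overlaps = Inf,
                  min.segment.length = 0)

print(p)
```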

  • @pramitthapa283
    @pramitthapa283 5 months ago

    Everyone makes simple videos about PCA. No one explains well what PCA and dimension reduction mean. Maybe the video makers also don’t understand what PCA is.

    • @StatisticsGlobe
      @StatisticsGlobe 5 months ago

      Hey, do you have a specific question that is not answered by the video? We are happy to help. Regards, Joachim