How to use Stata for Principal Component Analysis (PCA)

  • Published on 18 Sep 2024
  • Using Stata to replicate the results of the PCA example in Multivariate Data Analysis by Hair et al.
    The link to download the authors' sample data is mvstats.com/do....
    See my website for a copy of the Stata log file used in the video (and much else besides!): financefundame....
    Join the Finance Fundamentals Discord server: / discord .

Comments • 12

  • @khadimhussainmalik3284 5 months ago

    Dear Sir, I extend my gratitude for the insightful lecture you provided. In my research, I have identified two variables with noteworthy cross-loading factors. The dilemma arises as to which variable should be prioritized for removal, considering their significant cross-loading with Factor 1 and Factor 2.
    Variable | Factor 1 | Factor 2
    tour4    |  0.7039  | -0.5249
    ser      |  0.7423  |  0.5641

    • @financefundamentals 5 months ago

      Thank you for your comment/question! As I mentioned in the video, I'm not a statistics expert. Just a generalist interested in sharing knowledge about using Stata for various analyses. So you need to consider my response below while bearing that in mind.
      Regarding your specific question about which variable to remove due to cross-loading, a common approach is to consider both the statistical and theoretical aspects. From a statistical perspective, you would most likely remove the variable with the lower communality. (Based on the limited numbers you provided, this might be 'tour4' - but you need to check that column of your results.)
      However, you should also think about the theoretical relevance of each variable to your research question. Consider which variable is more meaningful to retain, based on your study's objectives and underlying theory. Sometimes a variable with slightly lower communality may be more crucial to keep from a conceptual standpoint.
      Another option to consider is trying the analysis with each variable removed in turn, and comparing the results to see which solution makes more sense and aligns better with your research goals.
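
      The communality comparison described above can be sketched as follows. This is a minimal illustration in Python rather than Stata output, and it assumes that only the two factors quoted in the question are retained; with more factors, each communality would also include the extra squared loadings. The helper name `communality` is hypothetical.

```python
# Loadings copied from the question above (Factor 1, Factor 2).
loadings = {
    "tour4": (0.7039, -0.5249),
    "ser": (0.7423, 0.5641),
}


def communality(row):
    """Sum of squared loadings across the retained factors (assumed: two)."""
    return sum(value ** 2 for value in row)


communalities = {name: communality(row) for name, row in loadings.items()}

# The variable with the lowest communality is the statistical candidate for removal.
lowest = min(communalities, key=communalities.get)

for name, h2 in communalities.items():
    print(f"{name}: communality = {h2:.4f}")
print(f"Lowest communality: {lowest}")
```

      On these two factors alone, tour4 comes out at about 0.771 and ser at about 0.869, which matches the tentative suggestion that tour4 is the weaker of the two. The communality column of the actual results should still be checked.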

  • @ehiidoko6934 4 months ago

    Thanks for this! It was super helpful.

    • @financefundamentals 4 months ago

      Awesome! Happy I could help! Good luck with your Stata/PCA journey!

  • @atharalishah4951 1 year ago

    Hello sir, can you please explain why X11 is eliminated for cross-loading even though its values are not the same in both columns? In fact, they are only close to each other. If that is the criterion, the loadings of other variables are also close to each other, so why are they not dropped? Thanks.

    • @financefundamentals 1 year ago +1

      [Timestamp: the issue starts around 9:55] Take a careful look at all the loadings. Notice that every variable except X11 has one (and only one) factor with a high loading. X11 is different. Its maximum loading is only 0.6420, which is lower than the highest loading of any other variable. But that is not the main problem. Even worse, it has TWO loadings in the range 0.59 to 0.64. This is called a cross-loading, so X11 is dropped. A cross-loading is NOT defined as two loadings that are exactly the same. Instead, you are looking for a single variable with two or more high(ish) loadings, each exceeding your chosen threshold for a significant loading.
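
      The cross-loading check described above can be sketched in a few lines. This is an illustration in Python, not the Stata output from the video: only X11's two values (about 0.59 and 0.6420) come from the reply, while the remaining numbers, the factor columns they sit in, the 0.5 cutoff, and the helper name `significant_factors` are all assumptions made for the example.

```python
# Hypothetical rotated loadings on four factors. Only X11's 0.59 and 0.6420
# echo the reply above; every other number is made up for illustration.
loadings = {
    "X6": (0.05, 0.88, 0.10, 0.02),
    "X9": (0.90, 0.08, 0.12, 0.03),
    "X11": (0.59, 0.6420, 0.11, 0.07),
}

THRESHOLD = 0.5  # assumed cutoff for a "significant" loading


def significant_factors(row, threshold=THRESHOLD):
    """Return 1-based factor numbers whose absolute loading exceeds the threshold."""
    return [i for i, value in enumerate(row, start=1) if abs(value) > threshold]


# A variable cross-loads when two or more of its loadings pass the threshold.
cross_loaded = [
    name for name, row in loadings.items()
    if len(significant_factors(row)) >= 2
]
print("Cross-loading variables:", cross_loaded)
```

      The two loadings do not need to be equal to be flagged; it is enough that both pass the threshold. Raising or lowering the assumed cutoff changes which variables are flagged.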

  • @mohammadtaufan9914 1 year ago

    Hello, can I ask you one little question? Is there a way to create a plot using the factors shown at 9:29? Thanks in advance.

    • @financefundamentals 1 year ago

      Remember that you would realistically be limited to a maximum of 3 factors if you wanted to visualise a plot. Here there are 4, which is why the source text used for this video does not try to show such a plot. 4-dimensional plots on a 2-D piece of paper are not strictly speaking impossible, but are unavoidably messy and hard to interpret.

    • @mohammadtaufan9914 1 year ago

      @financefundamentals First, I'd like to thank you for replying. Your answer makes sense, as a plot of these factors would provide little to no information. What I had in mind was a time series graph with one line per factor (the X axis is time and the Y axis is the value of the factor loadings). Perhaps there is a tutorial for making such a graph? As always, thank you in advance.

  • @Mimi-nr6jx 1 year ago +1

    How do you use the loadings to create an index please?

    • @financefundamentals 1 year ago

      There are a number of methods. I personally have used the approach in Anderson, TW and Rubin, H. 1956. Statistical inference in factor analysis. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 5:111-150.

    • @Mimi-nr6jx 1 year ago +1

      Thank you!