Validating K-means cluster analysis in SPSS

  • Published on 21 Sep 2024
  • In this video I show and explain how to determine the appropriate and valid number of clusters to extract in a k-means cluster analysis.

Comments • 30

  • @Ana-zi4mk 8 years ago

    Hi, James. Thank you for this video. I also watched your other video on K-means cluster analysis in SPSS, where you mentioned: "If we can't converge in 10 iterations then we probably don't have good data for clustering". I am trying to learn how to do cluster analysis and I am using some of my own data. I followed your suggestions on how to determine the number of clusters and how to validate them. In my case, I ran k-means cluster analysis specifying 2, 3, 4 and 5 clusters. For the 3-cluster solution, the post hoc tests in the Multiple Comparisons table were all significant, but it took 14 iterations to reach a change of 0.000 for all three clusters. For the 4-cluster solution, on the other hand, it took 10 iterations to reach a change of 0.000 for all three clusters, but in the Multiple Comparisons table two clusters were not significantly different on a few variables. What is your opinion: is my data not suitable for cluster analysis?

    • @Gaskination 8 years ago +1

      +Ana It might be suitable. The more variables you include, the harder it is to converge. So, if there are lots of variables, then more than 10 iterations is fine. I don't know if there is a published threshold or guideline.

    • @eboamuah6811 3 years ago

      @Gaskination Hi James. Your work has been very helpful. I have read about the silhouette as a method of validation in K-means cluster analysis. However, I don't know how to obtain it in SPSS. Is there any index in SPSS that can be used to validate the number of clusters chosen in K-means cluster analysis? Thank you.

    • @Gaskination 3 years ago

      @eboamuah6811 Silhouette is used in two-step cluster analysis in SPSS, but I don't know of a way to produce it for K-means.
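
One possible workaround, outside SPSS, is to save the cluster membership from the K-means run, export the data, and compute the silhouette by hand. The following is a minimal sketch in plain Python, assuming Euclidean distance and small data; the function name `silhouette_score` and the toy points are illustrative assumptions, not anything produced by SPSS or shown in the video.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width over all cases (Euclidean distance).

    points: list of equal-length numeric tuples (ideally standardized).
    labels: cluster membership for each case, e.g. exported from a K-means run.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # common convention for single-member clusters
            continue
        # a: mean distance to the other members of the case's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(widths) / len(widths)


# Toy data (hypothetical): two well-separated groups of three cases each.
toy_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]    # deliberately scrambled membership

good = silhouette_score(toy_points, good_labels)
bad = silhouette_score(toy_points, bad_labels)
print(f"good solution: {good:.3f}, scrambled solution: {bad:.3f}")
```

Values closer to 1 suggest compact, well-separated clusters, so the same function could be run on the saved memberships for each candidate number of clusters and the results compared.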

  • @zhalehmohammadalipour3542 2 years ago

    Great tutorial! It helped a lot. Thanks.

  • @nataliegillepiegaskins 2 years ago

    Thank you for this! Nice last name!

  • @najeebullahahmadzai5160 3 months ago

    Thank you sir!

  • @kanika8123 4 years ago

    Thanks a lot. Very helpful video.

  • @jdemontre 4 years ago +1

    Hey James, I enjoy your videos, especially those about SEM and now cluster analysis. Thank you! I ran my data and everything went well (10 variables and ca. 100 observations). The 3-cluster solution was the best on all criteria. But the Bonferroni test was not significant in 2 (out of 60) comparisons (p-value slightly higher than 0.1). Does that mean the solution was not validated?

    • @Gaskination 4 years ago

      If it is just 2 out of 60 comparisons, then this is strong evidence that it is a good clustering solution. Nice!

  • @009kishor 6 years ago

    Very helpful video 👍🏻

  • @thanghoang1944 3 years ago

    THANK YOU!

  • @marcelbeermann1036 4 years ago

    Thanks for the video.
    How can I see if a cluster actually is underrepresented?

    • @Gaskination 4 years ago

      It's just a subjective judgment. If the sample size of the cluster is small, then perhaps it is under-represented. You can see what the profile of members of that cluster looks like to determine if it is a legitimate cluster, or just an odd outlier.

  • @kieramillar-brandt2854 3 years ago

    Hi James, thanks for this video. Is there a paper that can be referenced to support that a lower number of iterations is better? Or maybe a paper that indicates best practice in general for reporting the results of k-means clustering? Many thanks. Kiera

    • @Gaskination 3 years ago

      Chapter nine of Hair et al. (2010), "Multivariate Data Analysis", is all about clustering methods.

    • @kieramillar-brandt2854 3 years ago

      @Gaskination Thanks very much. That's really appreciated. Your videos are great!

  • @henrypritchard4911 4 years ago

    Hi James,
    This has been very helpful, so firstly thank you!
    I was wondering if there is a way to validate, or find a statistical difference between, two clusters, since post hoc tests for a one-way ANOVA cannot be performed on fewer than 3 groups/clusters of data?
    Kind Regards,
    Henry

    • @Gaskination 4 years ago +1

      You can just use a t-test instead.
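
To illustrate the t-test suggestion for a two-cluster solution, here is a minimal sketch that compares one variable across the two clusters. It uses Welch's unequal-variance version of the t-test, which is a choice made for this example rather than something specified in the reply; the function name `welch_t` and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Each sample holds one variable's values for the cases in one cluster.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each cluster (sample variance / n)
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical scores on one clustering variable, split by cluster membership.
cluster_1 = [4.1, 3.8, 4.5, 4.0, 4.3]
cluster_2 = [2.0, 2.4, 1.9, 2.2, 2.6]

t_stat, dof = welch_t(cluster_1, cluster_2)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The p-value would then come from a t distribution with `dof` degrees of freedom, which a statistics package can supply. The test would be repeated for each clustering variable, ideally with a correction for multiple comparisons such as Bonferroni.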

    • @henrypritchard4911 4 years ago

      @Gaskination Thank you!

    • @henrypritchard4911 4 years ago

      @Gaskination Hi James,
      I am sorry to be a pain with another question. I was also wondering why in these instances there is no need to test for normality of distribution before performing the ANOVA with post hoc tests?
      Thank you in advance and Kind regards, Henry

    • @Gaskination 4 years ago

      @henrypritchard4911 Normality of distribution is not required for cluster membership. We really just need sufficient sample size in each group.

  • @shantanuchakrabory5527 4 years ago

    K-means cluster analysis using SPSS is a really special one.

  • @mayurgo10 7 years ago

    My data contains 900 observations and I tried the k-means method. The data converges at 15 iterations for the 4-cluster solution and 16 iterations for the 10-cluster solution. Can you suggest a good test to check which cluster solution would be better?

    • @Gaskination 7 years ago +3

      Check the AIC or BIC if that is an option. You want to minimize these. Also, check to see which solution is more helpful. Usually 3-5 clusters is most useful and anything more than 5 begins to be difficult to interpret or distinguish.
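
If the software does not report AIC or BIC for a K-means solution, one rough heuristic is to derive them from the within-cluster sum of squares. The sketch below assumes standardized variables and treats each cluster as a spherical Gaussian with unit variance and hard assignment, so that the within-cluster sum of squares stands in for -2 log-likelihood (constants dropped) and each cluster mean contributes one parameter per variable. These are simplifying assumptions made for this example, not the method from the video; the function name and all input values are hypothetical.

```python
import math


def kmeans_information_criteria(wcss, n_clusters, n_variables, n_cases):
    """Heuristic AIC and BIC for a K-means solution.

    wcss: total within-cluster sum of squared distances to the cluster centers.
    Assumes unit-variance spherical clusters, so wcss approximates -2 log L
    up to a constant, with n_clusters * n_variables estimated means.
    """
    n_params = n_clusters * n_variables
    aic = wcss + 2 * n_params
    bic = wcss + math.log(n_cases) * n_params
    return aic, bic


# Hypothetical inputs: 900 cases, 6 standardized variables, two candidate solutions.
aic_4, bic_4 = kmeans_information_criteria(wcss=2100.0, n_clusters=4, n_variables=6, n_cases=900)
aic_10, bic_10 = kmeans_information_criteria(wcss=1500.0, n_clusters=10, n_variables=6, n_cases=900)

print(f"4 clusters:  AIC = {aic_4:.1f}, BIC = {bic_4:.1f}")
print(f"10 clusters: AIC = {aic_10:.1f}, BIC = {bic_10:.1f}")
```

Lower values are preferred, and BIC penalizes the extra cluster centers more heavily than AIC once there are more than a handful of cases. Because the result depends on how the variables are scaled, this is best read alongside interpretability rather than as a standalone test.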

  • @statsmadeeasy7233 1 year ago

    Hi James, can we get a copy of the file that you used? I want to practice with it.

    • @Gaskination 1 year ago

      It's the burgers dataset available on the homepage of statwiki.gaskination.com/

  • @masharifulamin5682 4 years ago

    Hello James, I'm new here. Is it possible to get the dataset to practice with? Please share it with us.

    • @Gaskination 4 years ago

      The dataset is available on the homepage of statwiki: statwiki.kolobkreations.com/

  • @karlafuentes2726 3 years ago

    In Spanish, please.