How to perform clustering in R with the k-means algorithm - R for Data Science

  • Published on 2 Feb 2025

Comments • 48

  • @data.ninjas
    @data.ninjas  3 years ago +1

    Get access to download the scripts and data from GoogleDrive: dataninjas.ck.page/yt-files

  • @amiyabasak7096
    @amiyabasak7096 1 year ago +1

    I have gained a comprehensive understanding of this topic, and sir, your explanations have been exceedingly clear to me.

    • @data.ninjas
      @data.ninjas  1 year ago

      Thank you very much for your kind message. I'm happy to hear that you find my video helpful. Best regards

  • @juanbautista6766
    @juanbautista6766 2 years ago +1

    Wow. Great tutorial. Have seen many videos for generating “elbow plot”, but using the factoextra package as you noted here is GOLDEN! Thanks!!

    • @data.ninjas
      @data.ninjas  2 years ago +1

      Thank you very much for your kind message! Yes, the factoextra package makes it easy to create an elbow plot. Glad to hear you find the video helpful. Kind regards

  • @snehaj3378
    @snehaj3378 1 year ago

    You have no idea how much you helped me. God bless!

    • @data.ninjas
      @data.ninjas  1 year ago

      You're very welcome. Glad to know you find the video helpful. Kind regards

  • @gabrielp.40
    @gabrielp.40 8 months ago

    You are a lifesaver, thank you so much for the tutorial!

    • @data.ninjas
      @data.ninjas  8 months ago

      You're very welcome! Thank you for watching my video

  • @AchiragChiragg
    @AchiragChiragg 1 year ago

    Thank you for making this video!
    It was very informative and helpful

    • @data.ninjas
      @data.ninjas  1 year ago

      Glad to hear you found the video helpful! Thanks for your kind comment

  • @DaliaAboelmakarm-un9ee
    @DaliaAboelmakarm-un9ee 6 months ago

    Many thanks for this thorough illustration. Really, thanks.

    • @data.ninjas
      @data.ninjas  6 months ago

      You're very welcome, thank you for watching my video

  • @johneagle4384
    @johneagle4384 2 years ago

    Thank you for the video, and also thank you for the scripts!

    • @data.ninjas
      @data.ninjas  2 years ago

      You're very welcome! Thank you for watching and for commenting on my video

  • @JorgeRodriguez-mp1mt
    @JorgeRodriguez-mp1mt 3 years ago

    Aware of your contributions. Greetings from Mexico.

    • @data.ninjas
      @data.ninjas  3 years ago +1

      Thank you very much. Best regards

  • @thelightofgod9151
    @thelightofgod9151 3 years ago

    Wow. Very clear and precise. Thanks

    • @data.ninjas
      @data.ninjas  3 years ago

      Thanks for your kind comment

  • @lehoangucduy1425
    @lehoangucduy1425 1 year ago +1

    Why did you choose a centers value of 3 in the kmeans function? Please explain.

  • @MmaNdibe
    @MmaNdibe 7 months ago

    Thank you, sir. Can k-means be applied to an analysis with Likert scale data?

    • @data.ninjas
      @data.ninjas  7 months ago +1

      You're welcome. You may need to do some data preprocessing to apply k-means to Likert scale data. First apply one-hot encoding so that each response category becomes a binary variable (0 or 1), then normalize the data to have a mean of 0 and a standard deviation of 1. Note, however, that k-means clustering uses Euclidean distance and assumes that distances between points are meaningful and comparable. This may not be appropriate for Likert scale data, which is ordinal, so the distances between responses may not be consistent. You may want to consider alternative clustering techniques better suited to ordinal data, such as hierarchical clustering or model-based clustering approaches.
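
      The preprocessing described in this reply can be sketched as follows. This is only an illustration under stated assumptions: the data frame `survey`, its columns `q1` and `q2`, and the choice of 3 clusters are all hypothetical and do not come from the video.

```r
# Hypothetical Likert responses (1 = strongly disagree ... 5 = strongly agree)
set.seed(1)
survey <- data.frame(
  q1 = factor(sample(1:5, 30, replace = TRUE), levels = 1:5),
  q2 = factor(sample(1:5, 30, replace = TRUE), levels = 1:5)
)

# One-hot encode each question: one 0/1 column per response category
onehot <- cbind(
  model.matrix(~ q1 - 1, data = survey),
  model.matrix(~ q2 - 1, data = survey)
)

# Drop categories nobody chose; a constant column cannot be standardized
onehot <- onehot[, apply(onehot, 2, sd) > 0, drop = FALSE]

# Normalize every column to mean 0 and standard deviation 1
scaled <- scale(onehot)

# k-means on the encoded data (3 clusters is an arbitrary choice here)
fit <- kmeans(scaled, centers = 3, nstart = 25)
table(fit$cluster)
```

      The caveat in the reply still applies: after encoding, Euclidean distance ignores the ordering of the response categories.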

    • @MmaNdibe
      @MmaNdibe 7 months ago

      @data.ninjas Thank you. I think hierarchical clustering will be good.

  • @aysegulgunduz4292
    @aysegulgunduz4292 2 years ago

    Hi, how can I find this data on the internet? Or how can I get access to an explanation of the dataset?

  • @letsfly8654
    @letsfly8654 11 months ago

    fviz_nbclust(data, kmeans, method = 'wss') is not working. Why?

  • @Pooh991
    @Pooh991 2 years ago

    Great video, I learned a lot from it, especially regarding the methods for choosing the optimal number of clusters. Quick question though: the clusters overlap in your plot, but I don't think they are supposed to overlap in the k-means method. Do you have any insight on this?

    • @data.ninjas
      @data.ninjas  2 years ago

      Thanks for your kind comment. The clusters were created using 6 variables. The plots only show 2 variables at a time (2-dimensional plots), so some overlap can be seen. If it were possible to create a 6-dimensional plot, there would be no overlap.
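
      A small sketch of why the overlap is only an effect of the 2-dimensional view. It assumes the built-in `mtcars` data as a stand-in (the dataset from the video is not reproduced here), with six arbitrarily chosen variables and 3 clusters.

```r
# Stand-in data: six standardized variables from the built-in mtcars dataset
vars <- scale(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])

set.seed(42)
fit <- kmeans(vars, centers = 3, nstart = 25)

# A plot of only two of the six variables can show clusters that seem to overlap
plot(vars[, "mpg"], vars[, "qsec"], col = fit$cluster, pch = 19,
     xlab = "mpg (scaled)", ylab = "qsec (scaled)")

# In the full 6-dimensional space, compute each point's distance to each centre
d <- as.matrix(dist(rbind(fit$centers, vars)))[-(1:3), 1:3]
nearest <- apply(d, 1, which.min)

# TRUE when every point sits closest to the centre of its own cluster
all(nearest == fit$cluster)
```

      When that check is TRUE, each cluster occupies its own region of the 6-dimensional space, even though a 2-variable projection can make them look mixed.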

  • @anteachmad
    @anteachmad 1 year ago

    Does cluster analysis have to start with a multicollinearity test?

    • @data.ninjas
      @data.ninjas  1 year ago +1

      No, it does not. Multicollinearity does not directly influence the cluster analysis results

  • @rafipermana7734
    @rafipermana7734 11 months ago

    When I execute fviz_nbclust, this happens: Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning messages:
    1: In stats::dist(x) : NAs introduced by coercion
    2: In storage.mode(x)

    • @data.ninjas
      @data.ninjas  11 months ago

      It may be because of NAs; kmeans cannot handle data that has NA values. See: stackoverflow.com/questions/36469671/error-in-do-onenmeth-na-nan-inf-in-foreign-function-call-arg-1
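
      A minimal sketch of cleaning the data before clustering, assuming a hypothetical data frame `raw` with one text column and one missing value. The "NAs introduced by coercion" warning quoted above typically appears when a non-numeric column is forced to numeric, so the sketch also keeps only numeric columns.

```r
# Hypothetical data: a text identifier column and one missing value
raw <- data.frame(
  id     = c("a", "b", "c", "d", "e", "f"),
  height = c(1.2, 1.4, NA, 5.1, 5.3, 5.0),
  width  = c(0.9, 1.1, 1.0, 4.8, 5.2, 4.9)
)

# Keep numeric columns only, then drop rows that contain NA
numeric_only <- raw[, sapply(raw, is.numeric), drop = FALSE]
clean <- na.omit(numeric_only)

# Sanity check before clustering: no NA, NaN or Inf left
stopifnot(!anyNA(clean), all(is.finite(as.matrix(clean))))

scaled <- scale(clean)

set.seed(123)
fit <- kmeans(scaled, centers = 2, nstart = 10)
fit$cluster
```

      The same cleaned and scaled matrix is what would then be passed to fviz_nbclust in place of the original data.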

  • @HarpreetKaur-bx1ej
    @HarpreetKaur-bx1ej 2 years ago

    Hi, I have a question.
    Perform a cluster analysis for 20 randomly selected Swiss bank notes.
    What is 20 in this case?

    • @data.ninjas
      @data.ninjas  2 years ago

      Hi. That question is not clear. It may mean that from a given dataset select 20 observations (rows) randomly and perform a cluster analysis, or it may mean something else

    • @HarpreetKaur-bx1ej
      @HarpreetKaur-bx1ej 2 years ago

      @data.ninjas
      Here is the full question.
      What is 20?
      Cluster analysis for 20 randomly selected data points from the Swiss bank dataset, with the following requirements:
      1. Set pseudo-random numbers for the 20 randomly selected data points.
      2. Write about accuracy, missing values and outliers.
      3. What is the rationale for selecting k-means clustering with a distance function?
      4. Interpret and comment on the clustering output.
      5. Is the cluster analysis technique used for the dataset good? Use cluster evaluation.
      6. Visualize the 20 selected data points by plotting the result of principal components.

    • @data.ninjas
      @data.ninjas  2 years ago

      @HarpreetKaur-bx1ej The first interpretation was correct. Select 20 rows (data points) from the dataset randomly.
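
      Selecting 20 random rows reproducibly might look like the sketch below. The data frame `banknotes` is a hypothetical stand-in filled with random numbers, and the seed value and the choice of 2 clusters are arbitrary assumptions. Note that the `nstart` argument of kmeans() is the number of random starting configurations to try; it is unrelated to the sample size of 20.

```r
# Fix the pseudo-random number generator so the same rows are drawn each run
set.seed(2024)

# Hypothetical stand-in for the banknote data: 200 rows, 6 numeric columns
banknotes <- as.data.frame(matrix(rnorm(200 * 6), ncol = 6))

# Draw 20 distinct row indices and keep only those rows
idx <- sample(nrow(banknotes), size = 20)
subset20 <- banknotes[idx, ]

# Cluster the 20 selected rows (nstart = 25 random starts, not the sample size)
fit <- kmeans(scale(subset20), centers = 2, nstart = 25)

# Show the 20 points on the first two principal components
pca <- prcomp(subset20, scale. = TRUE)
plot(pca$x[, 1:2], col = fit$cluster, pch = 19)
```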

    • @HarpreetKaur-bx1ej
      @HarpreetKaur-bx1ej 2 years ago

      @data.ninjas Does it mean I have to set nstart=20?

    • @HarpreetKaur-bx1ej
      @HarpreetKaur-bx1ej 2 years ago

      Can you please help me with this question, as I am stuck on it?

  • @vishalisharma3883
    @vishalisharma3883 11 months ago

    Why is my mutate function not working?

  • @what2605
    @what2605 6 months ago

    That one sample, no. 79, made me feel very unsatisfied.

  • @kharankumarr2119
    @kharankumarr2119 3 years ago

    Is this the CURE algorithm?

    • @data.ninjas
      @data.ninjas  3 years ago

      The kmeans() function in R uses the Hartigan-Wong algorithm by default. Other options are the Lloyd, Forgy and MacQueen algorithms
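
      The algorithm is selected through the `algorithm` argument of kmeans(). A short sketch, assuming the built-in `iris` measurements as example data and 3 clusters; R's documentation treats "Lloyd" and "Forgy" as two names for the same algorithm.

```r
# Example data: the four numeric iris measurements, standardized
x <- scale(iris[, 1:4])

set.seed(1)
fit_hw <- kmeans(x, centers = 3, nstart = 25)  # default: Hartigan-Wong
fit_lloyd <- kmeans(x, centers = 3, nstart = 25,
                    algorithm = "Lloyd", iter.max = 100)
fit_macqueen <- kmeans(x, centers = 3, nstart = 25,
                       algorithm = "MacQueen", iter.max = 100)

# Compare the total within-cluster sum of squares reached by each variant
sapply(
  list(HartiganWong = fit_hw, Lloyd = fit_lloyd, MacQueen = fit_macqueen),
  function(f) f$tot.withinss
)
```

      `iter.max` is raised from its default of 10 here because the Lloyd and MacQueen variants can need more iterations to converge.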

    • @kharankumarr2119
      @kharankumarr2119 3 years ago

      @data.ninjas Sir, I now need R code for the CURE algorithm.

    • @kharankumarr2119
      @kharankumarr2119 3 years ago

      Can you please give me your email ID?

    • @data.ninjas
      @data.ninjas  3 years ago

      @kharankumarr2119 There may not be an implementation of the CURE algorithm in R yet (or at least I have not found any). There is a Python implementation of CURE: github.com/annoviko/pyclustering. You may run CURE in Python, or you may use the reticulate package to work with Python from R: rstudio.github.io/reticulate/

    • @kharankumarr2119
      @kharankumarr2119 3 years ago

      @data.ninjas Sir, it is a project that we have to do in R. I am a data analytics student at psgcas.

  • @foziachoudhary9858
    @foziachoudhary9858 2 months ago

    Please provide your mail

  • @mehrananjum5501
    @mehrananjum5501 10 months ago

    Can you please help me? I need your email.