How to Perform K-Means Clustering in R Statistical Computing

  • Published on 2 Feb 2025

Comments • 95

  • @PsychTeacher100
    @PsychTeacher100 5 years ago +2

    Helped me for my data analytics class. I'm very new to R, and this step-by-step tutorial was wonderful. My assignment was actually to use the Iris data. I learned so much.

  • @joes9110
    @joes9110 8 years ago +7

    You explained this in plain English, which I really appreciate!! Great video, thank you

  • @DWR447
    @DWR447 10 years ago +9

    Great Work. I appreciate your use of the Iris data set. I'm familiar with the data set. That means I can focus on what you have to say about K-Means clustering without having to learn the details of a new data set. Thank you.

  • @juandavidcamargo5713
    @juandavidcamargo5713 4 years ago +1

    So easy for me with your tutorial.

  • @TheSandyKale
    @TheSandyKale 8 years ago

    Great video. Explained in an easy to understand manner, compared to some of the more cryptic R training material I have looked at.

  • @timothysorber5825
    @timothysorber5825 10 years ago

    Excellent video. It provided simple steps to follow. I was working with the faithful dataset in the R distribution. One's eye could see the two clusters. I applied your instructions to the faithful data and was able to break out the two existing clusters. These are almost intuitive. Thanks

  • @BishSinhaExcelsior
    @BishSinhaExcelsior 9 years ago +1

    Simple and very good for beginners.

  • @WahranRai
    @WahranRai 5 years ago

    4:04 Why don't you normalize the data before kmeans?
    Are there any rules concerning the range of the attributes and extra relationships ...

  • @abuzarzia71
    @abuzarzia71 7 years ago +3

    Error in plot.xy(xy, type, ...) : invalid color name 'Iris-setosa'. That's the error which appears every time... help me out with this

    • @samuelsephiri147
      @samuelsephiri147 7 years ago

      I'm experiencing a similar problem. Can anyone help with this, especially the part that plots a comparison with the original data? If you use colRamp, it outputs the plot, but the colors disappear.
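
      A minimal sketch of one likely fix for the "invalid color name" error above. It assumes the class column was read in as text (for example "Iris-setosa"), which plot() then tries to interpret as a color name. The data frame below is rebuilt from R's built-in iris data to imitate the tutorial's CSV; the name class_colors is made up for illustration.

```r
# Imitate the tutorial's CSV: class labels stored as plain text.
Iris <- data.frame(
  petalLength = iris$Petal.Length,
  petalWidth  = iris$Petal.Width,
  class       = paste0("Iris-", iris$Species),
  stringsAsFactors = FALSE
)

# plot(..., col = Iris$class) would fail here, because "Iris-setosa"
# is not a color name. Converting the labels to a factor and then to
# integer codes (1, 2, 3) gives valid color indices.
class_colors <- as.integer(factor(Iris$class))

plot(Iris[c("petalLength", "petalWidth")], col = class_colors)
```

      The same conversion should work for the sepal columns; only the column names passed to plot() change.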

  • @mayurgo10
    @mayurgo10 7 years ago +1

    The code isn't working for me; it shows an error. Can you help me?

  • @hanspratyaksa8936
    @hanspratyaksa8936 4 years ago

    Thanks for the tutorial. Why didn't you normalize or scale the data first?

  • @zinmot5457
    @zinmot5457 3 years ago

    Very helpful! Thank you, god bless you sir.

  • @murtadhaal-sharuee3874
    @murtadhaal-sharuee3874 9 years ago

    Hi Influxity, really helpful, thank you very much.
    And I have a question, if you could answer it please: is there a way to specify the initial centroids and the type of distance we can use?
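
    A short sketch for the first half of this question, assuming the built-in iris data stands in for the tutorial's CSV. The centers argument of kmeans() accepts a matrix of starting centroids instead of a cluster count. The choice of rows 1, 51, and 101 is arbitrary and only for illustration.

```r
iris.features <- iris[, 1:4]

# Arbitrary illustrative choice: one starting point from each block of 50 rows.
initial_centers <- as.matrix(iris.features[c(1, 51, 101), ])

# Passing a matrix as `centers` fixes the starting centroids,
# so no random initialization takes place.
results <- kmeans(iris.features, centers = initial_centers)

results$size
results$centers
```

    As for the distance: kmeans() in base R minimizes the within-cluster sum of squares, which corresponds to Euclidean distance, and it has no argument for choosing another metric. Its algorithm argument only switches between variants of the procedure. For a different metric, a different method is needed, for example pam() from the cluster package, which takes a metric argument.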

  • @mugrad25
    @mugrad25 8 years ago

    Quick questions. 1) Would you standardize the data, turning all responses into z-scores?
    2) Is cluster generation not criterion-based? As it stands now, the clustering is based on finding the greatest differences and similarities simultaneously for the wanted number of clusters; the user then has to compare the clustering results to the response variables. The closest match suggests a possible difference, which would more than likely be tested with an ANOVA to establish significance?

    • @souravdas1983
      @souravdas1983 8 years ago +1

      use scaled_iris = scale(iris.features)
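
      The one-liner above can be expanded into a small sketch, assuming iris.features holds the four numeric columns (taken here from R's built-in iris data). scale() centers each column to mean 0 and rescales it to standard deviation 1, so no single measurement dominates the distance calculation. The seed value and nstart = 25 are arbitrary choices.

```r
iris.features <- iris[, 1:4]

# Column-wise z-scores: subtract the column mean, divide by the column sd.
scaled_iris <- scale(iris.features)

# Sanity check: means are (numerically) 0 and standard deviations are 1.
round(colMeans(scaled_iris), 10)
apply(scaled_iris, 2, sd)

# Cluster the standardized features.
set.seed(42)
results <- kmeans(scaled_iris, centers = 3, nstart = 25)
results$size
```

      Note that the cluster centers are then reported in z-score units rather than centimeters.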

  • @Dr_Ali.Aljboury
    @Dr_Ali.Aljboury 7 years ago

    It's great, now I understand how it works. But I have one question: how would we make it work with semantic analysis and topological parameters?

  • @alexmartino5949
    @alexmartino5949 9 years ago

    Nice video. You said that the algorithm reinitializes during each run. What does that mean?

  • @fauziardi4985
    @fauziardi4985 7 years ago +1

    Hello, can you help me? I'm new to R. I followed your tutorial but I got an error.
    results = kmeans(Iris.features, 3)
    Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning message:
    In kmeans(Iris.features, 3) : NAs introduced by coercion

    • @yasiruddin9896
      @yasiruddin9896 7 years ago

      Initially, I too got this error. Then, I tried this and it worked fine.
      x

    • @subhashishkoirala9292
      @subhashishkoirala9292 7 years ago

      Although a bit late, try this:
      results
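
      A hedged sketch of one common cause of the error quoted in this thread. The warning "NAs introduced by coercion" suggests that a non-numeric column, most likely the class label, was passed to kmeans(). The data frame below imitates that situation using R's built-in iris data; numeric_columns is a made-up helper name.

```r
# Imitate a data frame that still contains the text class column.
Iris <- data.frame(iris[, 1:4],
                   class = as.character(iris$Species),
                   stringsAsFactors = FALSE)

# Keep only the numeric columns before clustering.
numeric_columns <- sapply(Iris, is.numeric)
Iris.features <- Iris[, numeric_columns]

set.seed(11)
results <- kmeans(Iris.features, centers = 3)
results$size
```

      If the error persists with numeric columns only, checking the file for missing or malformed values is the next step.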

  • @meshackamimo1945
    @meshackamimo1945 10 years ago

    God bless you for this ingenious and intuitive posting... keep up the good job. More videos please.

  • @shabnamtafreshi714
    @shabnamtafreshi714 10 years ago

    Simple and to the point. Thanks!

  • @abeersaxena3204
    @abeersaxena3204 6 years ago

    What do the cluster means describe for a particular observation? We have to consider a whole row as a complete record. So how should the cluster means be interpreted: as the coordinates of the centroid, or as the mean distance between the centroid and the data points?
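
    A small sketch that may help with this question, using R's built-in iris data as a stand-in for the tutorial's CSV. It recomputes the per-cluster column averages by hand and compares them with the "Cluster means" that kmeans() reports in results$centers. The names manual_means and the seed value are illustrative only.

```r
iris.features <- iris[, 1:4]

set.seed(1)
results <- kmeans(iris.features, centers = 3)

# Average every feature within each assigned cluster.
manual_means <- aggregate(iris.features,
                          by = list(cluster = results$cluster),
                          FUN = mean)

manual_means
results$centers

# The two agree: each row of results$centers is the centroid of one cluster.
all.equal(as.matrix(manual_means[, -1]), results$centers,
          check.attributes = FALSE)
```

    So each row of the cluster means gives the coordinates of a centroid, one value per feature, not a distance.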

  • @anishamariamthomas2981
    @anishamariamthomas2981 10 years ago

    Which similarity measure is used here to perform the k-means clustering? And how can this default measure be changed in RStudio?

  • @lenwerksgt5906
    @lenwerksgt5906 4 years ago

    Where did you download the Iris.CSV? I don't see that anywhere on the website. Please assist.

  • @ASTRAGOVANM
    @ASTRAGOVANM 5 years ago

    What should I do when I rerun the code with the same data and the clusters contain different observations? Help please.

  • @DCentFN
    @DCentFN 4 years ago

    When I try to check the actual data using "plot(Iris[c("petalLength","petalWidth")], col = Iris$class)" or "plot(Iris[c("sepLength","sepWidth")], col = Iris$class)", I get an error saying "Error in plot.xy(xy, type, ...) : invalid color name 'Iris-setosa'". Not sure how to fix this.

  • @Injektil_o
    @Injektil_o 10 years ago +3

    Hi Influxity. You mentioned that you can estimate how many clusters there should be. Can you cover this, or have you covered this elsewhere?
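
    One common heuristic for this is the "elbow" plot. It is sketched here as an assumption about what could be meant, not as the method from the video. The idea is to run kmeans() for several values of k and look for the point where the total within-cluster sum of squares stops dropping sharply. The range 1 to 8, the seed, and nstart = 20 are arbitrary choices.

```r
iris.features <- iris[, 1:4]

set.seed(123)
k_values <- 1:8

# Total within-cluster sum of squares for each candidate k.
wss <- sapply(k_values, function(k) {
  kmeans(iris.features, centers = k, nstart = 20)$tot.withinss
})

plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

    The bend in the curve is a judgment call, so it is best treated as a rough guide rather than a definitive answer.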

  • @srivathsesh
    @srivathsesh 7 years ago

    Thank you, very well illustrated

  • @DBProds96
    @DBProds96 7 years ago

    How would this be done with the mnist_dataset? It's got 20,000 rows and 700+ columns

  • @becarefull01
    @becarefull01 8 years ago

    Could you please provide an implementation of different versions of spectral clustering in R?

  • @Pokemonpets
    @Pokemonpets 8 years ago

    How can you calculate the F1-measure, precision, recall, and entropy of the clustering result? Additionally, does it support sparse data? For text clustering, sparse data is a must.

    • @bklamoreaux_old
      @bklamoreaux_old 8 years ago

      You can evaluate clustering algorithms with other techniques like: Calinski Harabasz Evaluation, Davies Bouldin Evaluation, Gap Evaluation (Distance from center), Silhouette Evaluation. See www.mathworks.com/help/stats/clustering.evaluation.clustercriterion-class.html

  • @revenez
    @revenez 6 years ago

    Very informative video. Thank you.

  • @jeshrielpolancos5143
    @jeshrielpolancos5143 7 years ago +2

    Error in table(Iris$class, results$cluster) :
    all arguments must have the same length
    How do I fix this?

    • @siddharthadas86
      @siddharthadas86 7 years ago +1

      Check whether class is a factor using levels(); if it is not, convert it to a factor.

  • @urmayshah6863
    @urmayshah6863 8 years ago

    How can I train on some data using k-means and then test other data with the result? And could you give a bit of explanation about accuracy and the other parameters?

  • @vktonline
    @vktonline 8 years ago

    Could you please upload the implementation? I want to make changes to the algorithm and see the results.

  • @sj8648
    @sj8648 6 years ago

    What a beautiful voice!

  • @otomehusband
    @otomehusband 6 years ago

    Thanks for providing these good videos.

  • @sourishmukherjee2404
    @sourishmukherjee2404 7 years ago

    I am getting 3 clusters, each of the same size: 50 observations in each cluster. Something is wrong. Any comments?

  • @HibaYahyaoui
    @HibaYahyaoui 9 years ago

    Thank you very much, it is a very helpful tutorial.

  • @sathyavel8046
    @sathyavel8046 11 years ago +52

    I am tired of seeing this iris data... lots of videos use the same... why are no real-world examples used?

    • @influxity2694
      @influxity2694 10 years ago +33

      I understand what you're saying and I'm sorry you feel that way. The point of the video is to understand k-means and see how it's used in R. The Iris dataset fits k-means well as it has nice clusters that are in lower dimensions. It's also a familiar dataset and easy to get and work with. This allows viewers to focus on the commands and outputs of the algorithm in R and get a better understanding of the algorithm in general without having to worry about more complicated datasets and edge cases. I'll work with some different datasets in future videos but want to pick them so they add to understanding the main point of the video.

    • @BishSinhaExcelsior
      @BishSinhaExcelsior 9 years ago +14

      mightyvel vel, iris data IS a real-world example, but it may not be in your area of work :)

    • @Pokemonpets
      @Pokemonpets 8 years ago

      +mightyvel vel Well, you are just at the beginning. You will be grossed out when you see how primitive the examples are, and how many algorithm equations come without a single example :D

    • @nahid7499
      @nahid7499 7 years ago

      Bish Sinha ftv6

    • @aslogdahl4469
      @aslogdahl4469 7 years ago +5

      It is very sad to hear that Irises are not considered to be part of the real world.

  • @kapamagicman
    @kapamagicman 11 years ago +2

    Thanks for doing this! I subscribed and liked. Also do some more similar functions in R

    • @influxity2694
      @influxity2694 10 years ago +1

      Thank you. I'll work to get some more videos up soon.

  • @HonGoArtist
    @HonGoArtist 9 years ago

    Question: How can we add the kmeans results to the original data set? I want to compare reality to the predicted class for each entity. In other words, I'd like to see the predicted class results in the original spreadsheet "iris".

    • @janisgredzens7463
      @janisgredzens7463 8 years ago +2

      If it is still helpful, here is a version using data.table package:
      -------------------------------------------------------------------------------------------------------------------------------------------------
      require(data.table)
      iris.features

    • @souravdas1983
      @souravdas1983 8 years ago

      Also use 'confusionMatrix' to check how well the model has predicted.
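
      A base R sketch of the idea in this thread, offered as an alternative to the data.table version and assuming the built-in iris data stands in for the tutorial's CSV. It appends the cluster number from kmeans() as a new column, so every row shows its true species next to its assigned cluster. The name iris_with_clusters and the output file name are made up.

```r
iris.features <- iris[, 1:4]

set.seed(7)
results <- kmeans(iris.features, centers = 3)

# results$cluster is in the same row order as the input,
# so it can be bound on directly as a new column.
iris_with_clusters <- cbind(iris, cluster = results$cluster)

head(iris_with_clusters)

# Optional: write the combined table out to open it in a spreadsheet.
# write.csv(iris_with_clusters, "iris_with_clusters.csv", row.names = FALSE)
```

      Keep in mind that the cluster numbers are arbitrary labels; cluster 1 does not automatically correspond to the first species.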

  • @slacroix-31
    @slacroix-31 10 years ago +1

    This is awesome. Thank you !!!

  • @dduttaroy
    @dduttaroy 6 years ago

    table(Iris$class, results$cluster) is not clear. Please explain.
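
    A brief sketch of what that call does, using the built-in iris data (where the label column is called Species rather than class). table() cross-tabulates two vectors of equal length: each cell counts how many flowers of a given true species landed in a given cluster. The name comparison and the seed are illustrative.

```r
set.seed(20)
results <- kmeans(iris[, 1:4], centers = 3)

# Rows: true species. Columns: cluster numbers assigned by kmeans.
comparison <- table(iris$Species, results$cluster)
comparison

# Each row sums to 50 because there are 50 flowers of each species.
rowSums(comparison)
```

    A row with all 50 counts in a single column means that species was recovered cleanly; counts spread across columns show where two species were mixed. Both vectors must come from the same rows of the data, otherwise table() reports that the arguments differ in length.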

  • @marcoantoniomirandahernand6819
    @marcoantoniomirandahernand6819 10 years ago

    Dear Influxity, I tried to follow the same steps that you did, with the same DB, in R GUI (32-bit), but when I execute: results

    • @influxity2694
      @influxity2694 10 years ago +2

      Can you check the file you're reading in? See if it has non-numeric or missing values. Look to see if you have an extra line, Null, or something like 1.2e+10. Let me know what you find and we'll go from there.

    • @NanaOnix23
      @NanaOnix23 10 years ago

      Influxity Hi!!
      If the file has missing values, what should I do to fix this?
      Thank you for this video, it helps me a lot!! :D
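
      One simple option, sketched under the assumption that dropping incomplete rows is acceptable for the data at hand. The missing values below are injected into a copy of the built-in iris data purely for illustration; clean_features is a made-up name.

```r
iris.features <- iris[, 1:4]

# Inject two missing values for the sake of the example.
iris.features[c(3, 77), "Sepal.Width"] <- NA

# Count the rows that contain at least one missing value.
sum(!complete.cases(iris.features))

# Drop those rows; kmeans() cannot handle NA values.
clean_features <- na.omit(iris.features)

set.seed(3)
results <- kmeans(clean_features, centers = 3)
results$size
```

      If the species labels are compared with the clusters afterwards, the same rows have to be dropped from the label column so that both stay the same length.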

    • @michelleli8953
      @michelleli8953 10 years ago

      Hi Marco Antonio Miranda Hernández, I had this problem initially as well, and it turned out there was an error in my .csv file from copying and pasting.

    • @subhashishkoirala9292
      @subhashishkoirala9292 7 years ago

      Although very late, try this:
      results

  • @betzthomas9693
    @betzthomas9693 5 years ago

    What is a vector in k-means clustering?

  • @vivekjoshi937
    @vivekjoshi937 7 years ago

    This is amazing learning

  • @arpitbhatnagar2154
    @arpitbhatnagar2154 7 years ago

    Thank you!
    It was helpful

  • @kaiyuwang2822
    @kaiyuwang2822 10 years ago +1

    fantastic mate!

  • @miguelguilherme4331
    @miguelguilherme4331 11 years ago

    Amazing job! Thanks

  • @anoojkvarghese9903
    @anoojkvarghese9903 10 years ago

    When k=3, I get clusters of sizes 50, 61, and 39. Is that correct?

    • @vineyshar1
      @vineyshar1 10 years ago

      kmeans does some random initialization behind the scenes, which results in a slightly different outcome every time you run it.
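
      A short sketch of two ways to tame that randomness, using the built-in iris data. Fixing the random seed with set.seed() makes a run repeatable, and the nstart argument of kmeans() tries several random starts and keeps the best one. The seed value 2024 and nstart = 25 are arbitrary choices.

```r
iris.features <- iris[, 1:4]

# Same seed before each call: identical starting centroids, identical result.
set.seed(2024)
run_a <- kmeans(iris.features, centers = 3)
set.seed(2024)
run_b <- kmeans(iris.features, centers = 3)
identical(run_a$cluster, run_b$cluster)

# nstart = 25 runs 25 random initializations and keeps the solution
# with the lowest total within-cluster sum of squares.
stable <- kmeans(iris.features, centers = 3, nstart = 25)
stable$size
```

      Even when the grouping is the same, the cluster numbers themselves can be permuted between runs, so comparing sizes is more informative than comparing labels.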

  • @nikhiljamisetti7139
    @nikhiljamisetti7139 7 years ago

    Can anybody tell me how to find the defect clusters in the above dataset?

  • @XenomorphLV426
    @XenomorphLV426 4 years ago

    How do I scale the data?

  • @hongngoctran1218
    @hongngoctran1218 6 years ago

    How do I analyze the cluster means? I don't understand them.

  • @Eleni.314
    @Eleni.314 6 years ago

    Thank you sir!

  • @sarthakbiswas2201
    @sarthakbiswas2201 8 years ago

    Can someone do a k-modes as well? For categorical data?
    Since k-means doesn't work for categorical data, I guess.

    • @souravdas1983
      @souravdas1983 8 years ago

      What is the issue with substituting the 'kmodes' function in R? For datasets with both categorical and numerical variables, use k-prototypes (kproto).

  • @harshalaharivaliveti5185
    @harshalaharivaliveti5185 7 years ago

    How do you download the data from archive.ics.uci.edu?

  • @BlueHenAnalytics
    @BlueHenAnalytics 5 years ago

    Even though it's an old video, a few important parts are intentionally left out. He says that you need to 'normalize your data' and remove any rows that are missing values. Unfortunately, these instructions are not given, which only complicates completing this task properly from start to finish. It's only a couple of lines of code to do these operations, yet he doesn't show how to do it.

  • @muniseswar7526
    @muniseswar7526 7 years ago

    How do I download the iris data set?

  • @rachidaitlhaj9176
    @rachidaitlhaj9176 9 years ago

    Good Job

  • @zhou6075
    @zhou6075 2 years ago

    thanks

  • @adityanjsg99
    @adityanjsg99 5 years ago

    Nice video.. Voice fades (not often but can be avoided)

  • @pepikkk10
    @pepikkk10 7 years ago

    thanks a mil :D

  • @kanikaswap
    @kanikaswap 7 years ago +1

    good

  • @utkucansa
    @utkucansa 8 years ago

    I liked the video. But the interpretation part could be wider. Thanks though,
    Cheers,

  • @vktonline
    @vktonline 8 years ago

    nice

  • @TheZvercica
    @TheZvercica 10 years ago

    I am not able to download the database!!!!! I'll go crazy

    • @yovanyluis
      @yovanyluis 10 years ago +1

      Hi, what I did was download "iris.data", open it in a text editor like Notepad++, and add a header: just write "sepal.length","sepal.width","petal.length","petal.width","class" on the first line, since it is required for a csv file. Then save it as "iris.csv", and now you can follow the video :D

    • @ajufsd
      @ajufsd 10 years ago +1

      iris is available in R by default. Just key in "data(iris)" and you are in.
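
      A minimal sketch of that route. The built-in data set uses different column names from the CSV header suggested earlier in this thread (for example Sepal.Length instead of sepal.length, and Species instead of class), so commands copied from the video need their column names adjusted.

```r
# Load the copy of the iris data that ships with R; no download needed.
data(iris)

str(iris)      # 150 rows: four numeric measurements plus a Species factor
names(iris)

# The four numeric columns to pass to kmeans().
iris.features <- iris[, c("Sepal.Length", "Sepal.Width",
                          "Petal.Length", "Petal.Width")]
head(iris.features)
```

      Because Species is already a factor in the built-in data, it can be used directly as a color index in plot().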

  • @subhashishkoirala9292
    @subhashishkoirala9292 7 years ago

    For all of you who are having the "Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)" problem:
    instead of
    results

  • @mm_007
    @mm_007 2 years ago

    To keep your voice audible, do you have to pay extra taxes?