K-means Cluster Analysis With Excel - A Tutorial

แชร์
ฝัง
  • เผยแพร่เมื่อ 27 พ.ย. 2024

ความคิดเห็น • 80

  • @DaveOnData
    @DaveOnData  3 ปีที่แล้ว +7

    💥 Learning R programming is easy for Excel users! 💥
    📺 th-cam.com/video/MNpsyjSuR20/w-d-xo.html
    👩‍🔬👨‍🔬 Learn to use your basic Excel skills to analyze the business like a Facebook data scientist:
    📺 th-cam.com/video/xIXymabyFIM/w-d-xo.html

  • @Dfcameron13
    @Dfcameron13 3 ปีที่แล้ว +7

    I love the detailed explanation. Dave Langer has a great way of explaining things conversationally. As mentioned, the first problem the algorithm needs to solve is the starting centroids for the proposed clusters. This is actually a critical problem that does not have a good general solution. I do this for a living, and use domain expertise vs. random. My only additional recommendation is to realize that you don't often know what cluster each point belongs to at the start. So, his setup of a "red cluster" and a "green cluster", where you can find the means, as Dave said, is overly simplistic. In real-world situations, it isn't that easy to find means across dimensions.

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +3

      Thank you so much for taking the time to add such a valuable comment to the video - much appreciated! 🙏🙇‍♂️
      Also, thank you for the kind words regarding my teaching style, this is also much appreciated. 😁

  • @caitlinthompson458
    @caitlinthompson458 2 ปีที่แล้ว +6

    This is such a great tutorial. I've been trying to do this all week (started in python but came back to Excel) and this is exactly what I needed. Really thorough, lots to think about and still easy to follow!

  • @aaronporche6820
    @aaronporche6820 9 หลายเดือนก่อน +1

    DUDE i think you sir are maybe the best. most well spoken teacher i have ever had. THANK YOU FOREVER YOU ARE SPECIAL. I am a logistics coordinator and trying to schedule work in excel with at least 10 dimensions of variables, this is a miracle for me ill figure out how to use R for sure cause this is exactly what I need. My brain hurts so bad this sounds easy hahaha I think its mostly the teacher tho

    • @DaveOnData
      @DaveOnData  9 หลายเดือนก่อน

      Wow! So glad you found my content useful. If you're interested in k-means clustering, you might want to check out some of my newer videos.
      K-means using Python in Excel:
      th-cam.com/video/eg-13ffiggI/w-d-xo.html
      Combining k-means and decision trees using Python in Excel:
      th-cam.com/video/O479uWMe_F0/w-d-xo.html

  • @abhinav18rana
    @abhinav18rana 3 ปีที่แล้ว +1

    Dave, you are a master at simplistic explanation of complex topics. I would be coming to your channel more often now and will recommend to my team as well who are new to analytics.

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +1

      Thank you so much for this feedback!
      As a content creator and teacher, such recommendations are strong motivation to continue my mission.

  • @andrew-jones-data-science
    @andrew-jones-data-science 3 ปีที่แล้ว +1

    k-means is extremely useful - this is great!

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +1

      Agreed! There's a reason why is the de facto standard clustering algorithm - easy to understand, easy to implement, and useful in so many business scenarios.

  • @pedroserpa2897
    @pedroserpa2897 ปีที่แล้ว

    Great Video.
    Suggestion1
    If in the iteration 1 you calculate de sum of the minimum distances. After that, you use the "excel->Data->Solver" to find your minimum of that sum by changing your initial points. With that, excel will do all the work for you in a glance.

  • @VenujanSrithar
    @VenujanSrithar 22 วันที่ผ่านมา +1

    You are explaining very well.
    Thank you Man !

    • @DaveOnData
      @DaveOnData  22 วันที่ผ่านมา

      You are most welcome!

  • @jissbela
    @jissbela 2 ปีที่แล้ว +2

    this video is awesome, thank you David!

  • @topfundus1093
    @topfundus1093 ปีที่แล้ว

    Prima, vielen Dank! Konnte nun mein eigenes Programm mit VB zur flexiblen Clusterbildung beliebiger Wertepaare schreiben. Die Daten (x, y) übernehme ich zunächst aus der Tabelle in ein Array. Dieses lasse ich dann mit verschachtelten Schleifen n-mal (=Iterationen) durchlaufen, bis alle Wertepaare auf Basis der kürzesten Entfernungen einem Cluster(-Punkt) zugeordnet sind, ohne das weitere Schleifendurchläufe diese Zuordnungen verändern. Es erfolgt der Ausstieg aus den Schleifen. Die Clusternummern werden dann auf einen Schlag in die Quelltabelle, in eine neue Spalte eingefügt. Fertig! Das geht alles blitzschnell und ohne die vielen, doch ziemlich aufwendigen Tabellen und Formeln, die Du im zweiten Teil Deines Tutorials zeigst. Dazu kommt, dass ich mein k-Mean-Programm universell nutzen kann. Es ist gleich, welche (numerischen) Datenspalten einlesen und clustern kann.
    Great. Thanks! I was now able to write my own program with VB for flexible clustering of any pairs of values. First, I take the data (x, y) from the table into an array. I then run this through nested loops n times (=iterations) until all pairs of values are assigned to a cluster (point) based on the shortest distances, without further loop runs changing these assignments. There is an exit from the loops. The cluster numbers are then inserted in one fell swoop into a new column in the source table. Finished! It's all lightning fast and without the many, but rather complex tables and formulas that you show in the second part of your tutorial. In addition, I can use my k-mean program universally. It doesn't matter which (numeric) data columns can read and cluster.

  • @Persianchannel1
    @Persianchannel1 2 ปีที่แล้ว

    dude you are a great teacher. i appreciate your passion of teaching. your explanation was great. thank you so much. your videos deserve more views 🌹🌹

  • @kisholoymukherjee
    @kisholoymukherjee 6 หลายเดือนก่อน +1

    As always, fantastic video Dave.

    • @DaveOnData
      @DaveOnData  6 หลายเดือนก่อน

      Thank you! I am so glad to hear you are finding my content useful.

  • @festusdavidoundo6227
    @festusdavidoundo6227 ปีที่แล้ว +2

    Thanks Dave, Got it.

  • @PaulAcito-ce2ex
    @PaulAcito-ce2ex ปีที่แล้ว

    Just fantastic explanations and approach. Thanks for sharing this.

  • @paddies64
    @paddies64 2 ปีที่แล้ว +1

    A great thank you Dave, don't know where you're from but very clear accent and a great teacher cos the subject is not that easy to teach. Need te review some parts of the video especially the one with the 1km formula ;-) but it's very clear ! Deserve a good bottle of wine Dave!

    • @boknows8263
      @boknows8263 2 ปีที่แล้ว

      He is from US, obvious from how he speaks English.

  • @willykitheka7618
    @willykitheka7618 3 ปีที่แล้ว +1

    Great job Dave! I really enjoyed that entire tutorial

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +1

      Thank you for the feedback and so glad you are enjoying my content!

  • @strategy_gal
    @strategy_gal 3 ปีที่แล้ว +2

    Great tutorial, David! This is very helpful!

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +1

      Wow! Thank you so much for taking the time to offer feedback. I'm flattered.

  • @husseinadam4724
    @husseinadam4724 ปีที่แล้ว

    You gained a new subscriber now thanks Dave

  • @JoseOlivar-sk5rv
    @JoseOlivar-sk5rv ปีที่แล้ว +1

    Hello, great video! Do you have a video showing how you would do this using R?

  • @AlexanderKonig2577
    @AlexanderKonig2577 2 ปีที่แล้ว +1

    Hi David, great video!!! today there are several new formulas, xlookup is an amazing and much easier way than vlookup. and for you huge and monster formula use the SUMXMY2 function. To find the average of each cluster (top rows) use the averageif formula... muuuuch easier and skip the power query step. Finnaly you shouldt run a min solver in order to find the min distance among variables and centroids.

    • @AchooUsman
      @AchooUsman ปีที่แล้ว

      Can u tell me the formula how to do it

  • @ultimatepravint
    @ultimatepravint 2 ปีที่แล้ว

    Great video, plain and simple language explanation

  • @vacilando86
    @vacilando86 2 ปีที่แล้ว

    You're my Hero. God bless you 🤝

  • @SMILE__6208
    @SMILE__6208 4 หลายเดือนก่อน

    Hi Dave! This video is very helpful, but i would like to know how you were able to add your existing tables into the power query editor. i am unable to select any other table from the worksheet, only the current one.

  • @VenujanSrithar
    @VenujanSrithar 22 วันที่ผ่านมา +1

    Please cover the part on how to select the K value ( cluster count )?
    Please explain the elbow method, wcss value, the silhouette method every thing related to selection of K value.

    • @DaveOnData
      @DaveOnData  22 วันที่ผ่านมา

      I will release an online course around Black Friday here in the US covering all these topics.

  • @ejpeninger
    @ejpeninger 3 ปีที่แล้ว +1

    Excellent tutorial, thank you so much Sir!

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +1

      You are most welcome, hope you find the content useful!

  • @sandaranote6979
    @sandaranote6979 2 ปีที่แล้ว

    Thank you Dave for this video! It was interesting and your explanation is really clear and useful =)

  • @user-rc2vh6zu8d
    @user-rc2vh6zu8d 7 หลายเดือนก่อน +2

    If you would add a timeline (very easy in youtube) it would make navigation within this great video even easier.

    • @DaveOnData
      @DaveOnData  7 หลายเดือนก่อน

      My new videos have video chapters, but haven't had time to go back and update my older videos.

  • @swadminstudynwork1360
    @swadminstudynwork1360 2 ปีที่แล้ว

    Thanks!!! I learned a lot with your video :)

  • @mbaintheusa1165
    @mbaintheusa1165 5 หลายเดือนก่อน +1

    Thanks for the video!
    But what if the measures in different columns are way different? Lets say, we have spend in thousand dollars and some other parameter in Sq meters?
    Is it correct to say that we would need first to normalize date, to bring to some comparable values?

    • @DaveOnData
      @DaveOnData  4 หลายเดือนก่อน

      Great question! The results of k-means can be skewed when the data is not on equal footing. A standard solution (pun intended) is to apply the Z-score to each column before clustering to help with this issue.

  • @btomblin386
    @btomblin386 2 ปีที่แล้ว

    This was an awesome video!

  • @mikastamaria3062
    @mikastamaria3062 ปีที่แล้ว

    Hello! May I ask how the end results can be put into a graph for visualization to show the actual result of the clustering? Trying to search online but I can't seem to find one that aligns to this method.
    Very informative tutorial on the groundwork though to understand the foundations. :)

  • @spilledgraphics
    @spilledgraphics 3 ปีที่แล้ว +1

    7 emojis: 🤯🤯🤯🤯🤯🤯🤯 ...... one more.... 🤯❗ amazing David !

    • @spilledgraphics
      @spilledgraphics 3 ปีที่แล้ว +1

      @David, have you thought combining LAMBDA function for the iterative calculations ?

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +2

      Thank you for the feedback, very much appreciated! 🙏🙇‍♂️
      I considered a number of options, but in the end I decided to go with the simplest (albeit most manual) solution.

  • @ajaykulkarni576
    @ajaykulkarni576 3 ปีที่แล้ว +1

    Great Content David!

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +1

      Thank you, Ajay, for the feedback, much appreciated! 🙏🙇‍♂️

  • @krispsimpson79
    @krispsimpson79 ปีที่แล้ว

    You mentioned that for categorical data there are better alternatives for cluster analysis - what would you recommend please?

  • @VenujanSrithar
    @VenujanSrithar 22 วันที่ผ่านมา +1

    So is it okay to take the initial cluster center from the data points then ?

    • @DaveOnData
      @DaveOnData  22 วันที่ผ่านมา

      In general, the best practice is to start with the cluster centers in random locations.

  • @williamstan1780
    @williamstan1780 2 ปีที่แล้ว

    David
    Thanks for the clip, very useful and informative.
    May I suggest to have one for Mixed Data (Category and numeric data? I have been searching it for ages but in vain.
    Thanks for your help

  • @nagendravishwamitra3652
    @nagendravishwamitra3652 2 ปีที่แล้ว +2

    One of the best lecture on k-means ..if you were in front of me i would have kissed you ..Greetings from India

  • @sebastianbriceno5293
    @sebastianbriceno5293 3 ปีที่แล้ว +1

    Great video! Thank you so much.
    Can we to access to the xls file to understand the formulas that you showed in the notepad?
    Regards

    • @willykitheka7618
      @willykitheka7618 3 ปีที่แล้ว +2

      at 39:27 David mentioned that you can find the notepad and the excel file in his Github repo.

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +2

      That's correct. The GitHub repo link is in the video description.

    • @sebastianbriceno5293
      @sebastianbriceno5293 3 ปีที่แล้ว +1

      @@willykitheka7618 Thanks!

    • @sebastianbriceno5293
      @sebastianbriceno5293 3 ปีที่แล้ว +1

      @@DaveOnData Thanks!

  • @rainashrivastava6182
    @rainashrivastava6182 ปีที่แล้ว

    Hey, I have a media data set in which, Rows are episode names and Columns are the different slot timings, So Let's say episode A has data for only 3 slots, and Episode B has data for 4 slots and so on. How do I apply K means to this Data set?

  • @enanta7
    @enanta7 4 หลายเดือนก่อน +1

    Can we cluster text with this method too?

    • @DaveOnData
      @DaveOnData  4 หลายเดือนก่อน

      You can cluster text documents using k-means, but the data must be preprocessed first. Check out my Python in Excel videos to learn more.

  • @dataanalyst1012
    @dataanalyst1012 2 ปีที่แล้ว

    In k means clustering, is there an assumption in numbers of observations and variables? Would having variables greater than observation affect the results of clustering and make it less accurate?

  • @mamaraci
    @mamaraci 2 ปีที่แล้ว

    Hello! Thanks for the detailed explanation. But, do you have the same, but fully based on PowerQuery?
    Initially, I want to try the same, but with ~2 million (lines) of customers, and >50 columns (dimensions). I think VLOOKUping them is not an optimal way to do that (
    Or, my best alternative is to switch to R / Python with these volumes?

  • @venkatesh0075
    @venkatesh0075 3 ปีที่แล้ว +1

    Thanks for sharing. Can you also show how this works in Python pls

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +5

      My apologies, but I don't do Python. I'm a Excel, SQL, and R guy. 😁

    • @abhinav18rana
      @abhinav18rana 3 ปีที่แล้ว +2

      @@DaveOnData I usually do stuff in excel first and then transition the logic to a tool or language. Excel makes me grasp the data and logic much better which in turn makes my life easy when the time comes for the project to upscale on a tool/language for heavy processing.

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +2

      @Abhinav Rana - I would suspect that the way you approach analytics is very common. Prototype in Excel and move to another technology if needed.

    • @AchooUsman
      @AchooUsman ปีที่แล้ว

      ​@@DaveOnDataDavid is it easy for any non programmer and no computer science student to use R ?

  • @kevinwflanagan9156
    @kevinwflanagan9156 3 ปีที่แล้ว +1

    Can you add the formula to download?

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +1

      Apologies, I'm missing the intent of the question. The formula to download what?

    • @kevinwflanagan9156
      @kevinwflanagan9156 3 ปีที่แล้ว +1

      @@DaveOnData The Notepad formulas you showed

    • @willykitheka7618
      @willykitheka7618 3 ปีที่แล้ว +1

      @@kevinwflanagan9156 you can find the notepad in David github repo. (watch at 39.27)

    • @DaveOnData
      @DaveOnData  3 ปีที่แล้ว +1

      This is correct. The link to the GitHub repo is in the video description.

  • @RohbertWhite
    @RohbertWhite 4 หลายเดือนก่อน +1

    You clearly know your stuff, but the excel worksheet is blurry and often unreadable. So I'm having to make sense of much that you are talking about from your verbal description of what you are doing. It might as well be an audiobook instead of a video. Constructive criticism only.

    • @DaveOnData
      @DaveOnData  4 หลายเดือนก่อน

      Interesting. I just watched the video, and it was crystal clear. The video was recorded in 1080P.

    • @RohbertWhite
      @RohbertWhite 4 หลายเดือนก่อน +1

      @@DaveOnData Thank you for your input. My computer did auto adjust to 480p for some ungodly reason. So I was wrong about that part, however, the writing is still very small and I have vision problem after starring a screen to long. It would have been helpful if he would zoom in on the particular chart that his is referring rather than me having to look at all three chart while he's talking about one. I may just be old, blind, cranky, and stupid. good possibility.

  • @ie95_Moh
    @ie95_Moh 3 ปีที่แล้ว

    Sir i need to contact you please i have a question about k means