StatQuest: K-nearest neighbors, Clearly Explained
- Published on Jun 16, 2024
- Machine learning and Data Mining sure sound like complicated things, but that isn't always the case. Here we talk about the surprisingly simple and surprisingly effective K-nearest neighbors algorithm.
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
0:00 Awesome song and introduction
0:21 K-NN overview
0:44 K-NN applied to scatterplot data
2:44 K-NN applied to a heatmap
4:12 Thoughts on how to pick 'K'
#statquest #KNN #ML
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
I'm taking a machine learning course at university, and I've been blessed with having found your channel. Keep up the great content!
Hooray! I'm glad the videos are helpful. :)
Whenever I search for a video tutorial, and you pop up in the search results, my heart fills with joy!!! ^^
Thank you once again!
Hooray!!!!! :)
Same here... I haven't started the video yet, but there's only one video on KNN. I don't know if I'll understand it as well as linear regression.
Five minutes here explain more than some teachers do in an hour. :)
Thank you! :)
Better than a teacher spending a whole semester, for me.
hahahahaha
@@free_thinker4958 Wtf, really? My teacher also took 5 minutes, and that's why I understood nothing.
For real, this channel is a godsend.
INTRO IS LEGENDARY BRO : )
Yup, that's a good one. :)
Every time I see your videos I'm simply amazed at how you manage to make things simple, it's like 1+1=2. Respect.
Thank you! :)
it is good to listen to your music in your website after watching this clear-explained video. thanks a lot.
Thank you so much! :)
When a random YouTube channel explains it better than your university professor....
Keep it up!
Wow, thanks!
This is by far the best video on KNN algo ! Thanks Josh
You are doing awesome work, sir. I have watched your other videos as well; very intuitive and logically explained.
I can't believe how good you are at explaining this. wow!!!
bam!
Thank you, very clear and to the point explanation !
This channel is salt of the Earth
Thanks!
Very clear, I got the idea of this concept right away.
Well done, thanks!
Thanks!
Thank you! This helped me so much in understanding KNN faster :D
Hooray!!! :)
Thank you so much. So useful, honestly. I didn't get this from a 2-hour lecture.
Glad it was helpful!
When I search for something and find it on the StatQuest channel. Super BAM!!
YES!
WOWW! This was super helpful!
Thanks Josh!
Glad it was helpful!
Thank you. Very good explanation in such a short time.
Thanks! :)
Your videos are sooo great, I can't stop watching 💖💖 thank you
Hooray!!!!
StatQuest with Josh Starmer can you add an ICA as well?
It's on the to-do list, but it might be a while before I get to it.
StatQuest with Josh Starmer 😔😕 that's sad, but i look forward to it. You explain beautifully sir! 💪🏼👊🏼
Amazing explanation! Thank you!
Clear and concise explanation. Thank you :)
Thanks! :)
I am brushing up on my ML terminology and StatQuest always comes to the rescue!! BAM!
bam!
Thank you josh and the FFGDUNCCH (the friendly folks from the genetics department at the university of north carolina at chapel hill)
Triple bam! :)
Very well explained and loved your uke intro by the way :)
Thank you!
Easy to understand and straightforward. Thanks.
Thanks!
I am so glad I found this channel.
Thanks!
Thank you for your Clear explanation.
You're welcome! :)
Another exciting episode of statquest!
bam! :)
Simple and Clear explanation. Thank you!
Thanks!
such an amazing explanation. Thank you!
Thanks! :)
BAM!!! That was great as usual.
Hooray! Thank you! :)
Ohhh man, this is so simple.
Thank you for this type of explanation.
Most welcome 😊
Good stuff, thanks! Do you have any videos about survival analysis?
You're a legend at explaining.
:)
awesome explanation ! thank you so much!
Thank you! :)
Summarised in a very short video....just perfect
Thank you! :)
Many thanks for the clear explanation
Thanks! :)
Great explanation! BAM! Great illustrations! Double BAM!!
Thank you very much! :)
Thank you so much. This was well explained.
Thanks!
Dang. Simple and to the point! Thank you!
Thanks!
awesome! You should do a quadratic discriminant analysis to go with your awesome one on LDA
BAM! Amazing explanation!
Thanks!
Bam! Smart and clear as usual.
one video explained better than a whole semester
Awesome! :)
Loved it.... Thank you 😊
Glad you enjoyed it!
Best explanation ever, thank you!!!
Thanks!
Well explained, thank you good sir!
Glad it was helpful!
Your video is amazing as always... It would be great if you could include how to choose the value of 'k' and evaluation metrics for kNN. Also, if I understand correctly, there is no actual "training" happening in kNN. It is about arranging the points on the Cartesian plane, and when a new data point comes in, it is placed on the same plane and, depending on the value of "k", it is classified. Correct me if I'm wrong.
Hi. Yes, you are right. KNN is easy to implement and understand and has been widely used in academia and industry for decades. You may utilise the cross-validation technique and the validation datasets to select the value for k.
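The exchange above (no real "training", majority vote among the k nearest points, cross-validation to pick k) can be sketched in plain Python. The points and labels below are made up for illustration, and this uses simple leave-one-out validation rather than full k-fold:

```python
import math
from collections import Counter

def knn_classify(train, new_point, k):
    """Classify new_point by majority vote among its k nearest
    training points (Euclidean distance). train is a list of
    ((x, y), label) pairs -- KNN just stores the data as-is."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Tiny made-up dataset: two "clumps" on a scatterplot.
train = [((1.0, 1.0), "red"), ((1.5, 1.2), "red"), ((1.2, 0.8), "red"),
         ((5.0, 5.0), "green"), ((5.5, 4.8), "green"), ((4.8, 5.3), "green")]

print(knn_classify(train, (1.3, 1.1), k=3))  # lands in the red clump

# Leave-one-out cross-validation to compare values of k: hold out each
# point in turn and check whether the rest of the data classifies it.
for k in (1, 3, 5):
    hits = sum(
        knn_classify(train[:i] + train[i + 1:], point, k) == label
        for i, (point, label) in enumerate(train)
    )
    print(f"k={k}: {hits}/{len(train)} correct")
```

With this toy data, k=1 and k=3 classify every held-out point correctly, while k=5 gets every one wrong, because each clump only has three points, so a large k lets the other clump outvote it. That mirrors the video's warning about picking a K that is too large for small clusters.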
Wow! such a great explainer
Glad you think so!
That opening banjo solo is pretty sweet.
Thanks!
This channel is GOD SENT. Period.
Thanks!
I love you, sir! Your videos saved my life!
Happy to help!
Just came across the video! Love it!! It's really clear and easy to follow! :D I have a question regarding the steps. For step 1, you said it would be used for known categories, and I'm looking to use this method for unknown categories. Since we know most of the traits, is there any way to create categories using those characteristics? I'm new to machine learning, and I wonder if there is any method for this?
It depends on a lot of things. Creating categories from the raw data can be very subjective.
@@statquest Would it be possible to categorize items having traits 1, 2, 3, and 4 using similarity tests? But then the question is where to start.
You are amazing! Thank u so much.
Cheers from BRAZIL
Muito obrigado! :)
Just wow, thanks Josh. You are just great. One doubt, however: if k values are large, won't outliers affect my algorithm? What is the effect of outliers in KNN? Please answer.
I believe that large values for K will provide some protection from outliers.
Thank you so much for the video! I do have one question, though, relating to what you were saying about what happens if you have too large of a K (at the end of the video)...is this algorithm something that would inherently not work very well in cases where you have much larger sample sizes for some groups/clumps than others? It seems like it would always have a bias towards groups with larger sample sizes over ones with smaller ones...
That's a good question and I don't know the answer off the top of my head. To me, it seems like it could go either way.
Thank you so much for saving our time, sir ❤ love from Sri Lanka 🇱🇰
bam!
Where would I be without StatQuest? Luckily, I now have the statistical tool to estimate this!
bam!
Man, you are a legend. If I pass the exam on Monday (about which I am pretty hopeless), I will buy one of your shirts next month.
Hooray! Good luck with your exam! :)
@@statquest Hey, I failed :D but still, I learnt a lot, thanks!
@@eltajbabazade1189 Better luck next time! :)
Thank you very much for your amazing work! A question that's kind of unrelated, but I was wondering: is there any explanation of how Euclidean distance is calculated in Stata as well? Thanks!
Unfortunately, I don't know how to use Stata.
Thanks sir, great explanation!
Glad you liked it!
BAM!!! You nailed it.
Thank you! :)
please do statquest videos on complete model building projects in R!!
Great tutorial!
Thank you!
Your videos are really great! Clear and detailed explanation. Can you please make a similar detailed playlist for neural networks?
I'm working on it. I have 5 videos so far, and 5 more to go before I have the whole playlist. Here's the link to the first one: th-cam.com/video/CqOfi41LfDw/w-d-xo.html and the other links are here: statquest.org/video-index/
@@statquest Yes I have seen those videos, just wanted to know whether there are more videos to come. Eagerly waiting!
@@unnatinandrekar99 The next one comes out on Monday, and then the rest will come out, one or two per week, for the next month.
@@statquest BAM!!!! That's perfect!!!!!!!!
THANK YOU JOSH!
Anytime! :)
THANK YOU!
YOU HAVE SAVED ME :D
Awesome! :)
You're a legend ! Thank you :)
Thanks!
you are the master of machine learning
:)
You are awesome man!!
Thanks!
Hello Josh, how are you? I was wondering if you might kindly explain Naive Bayes, clearly explained. :)
Thanks a lot for this video.
Hooray! :)
Good job ! I loved the videooo :)
Thanks!
Hey there! Love your videos, thank you so much. I'm still a bit confused, though: doesn't KNN only work with numerical features?
It will work for any distance metric. So if you have a distance metric for categorical data, it will work. However, typically it is just used for numeric data.
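To make the reply above concrete: one common distance metric for categorical data is Hamming distance (count how many attributes differ). Here is a minimal sketch; the records, traits, and labels are invented for illustration:

```python
from collections import Counter

def hamming(a, b):
    """Number of positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_categorical(train, new_record, k):
    """Majority vote among the k training records with the smallest
    Hamming distance to new_record. train is a list of (record, label)."""
    nearest = sorted(train, key=lambda item: hamming(item[0], new_record))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical records: (color, shape, texture) -> category
train = [(("red", "round", "smooth"), "apple"),
         (("green", "round", "smooth"), "apple"),
         (("yellow", "long", "smooth"), "banana"),
         (("yellow", "long", "spotted"), "banana")]

print(knn_categorical(train, ("red", "long", "smooth"), k=3))
```

The same `knn_categorical` function would work with any distance function swapped in for `hamming`, which is the point of the reply: KNN itself only needs *some* notion of distance.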
Your videos are K-nearest perfection :)
Ha! Very funny.
@@statquest Noice 👍 Thanks 👍
Thank you so much
No problem!
lifesaver! thank you!
Glad it helped!
Was extremely helpful, thank you so much.
Thanks! :)
Nice video well done
Thanks!
Your videos are brilliant! Would you also do a series of videos on scRNAseq/spatial transcriptomics analysis?
I'll keep that in mind!
Omg, thank you so much!!!!!
Happy to help!
Thank you!
You bet!
I like your bandcamp!
Hooray! Thank you! :)
Another one of the best resources.
Can you please tell me what is meant by "it adds weight to each distance, giving ever so slightly more weight to large clusters than to small clusters"?
One way to think about the "weights" is that if I have two clusters, A and B, and A is larger than B, then a new point that is equidistant to both A and B will be added to cluster A since it is larger. In other words, the larger the cluster, the more likely a new point will be put into that cluster.
@@statquest That's great Josh! Thanks a lot for the video series. You really rock!
Great videos. Do you have one that explains Parzen Windows for non parametric estimates?
Not yet.
What exactly does the model store after being trained? Does it store all the features and labels in our training data, so that while predicting it can compare all the distances?
I believe so.
Maybe make a video on Bayesian classification? Also, we should choose a k that isn't a multiple of the number of categories, to avoid a tied vote.
I should have a video on that topic by the end of February or early March.
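The tied-vote point in the comment above is easy to demonstrate with a toy sketch (the points below are made up, and with two categories an even k is what risks a tie):

```python
import math
from collections import Counter

def vote_counts(train, new_point, k):
    """Tally the labels of the k nearest neighbors (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], new_point))[:k]
    return Counter(label for _, label in nearest)

# Two categories, so any even k can produce a tied vote.
train = [((0.0, 0.0), "red"), ((0.0, 2.0), "red"),
         ((2.0, 0.0), "green"), ((2.0, 2.0), "green")]
new_point = (0.9, 0.0)  # slightly closer to the red side

print(vote_counts(train, new_point, k=2))  # one vote each: a tie
print(vote_counts(train, new_point, k=3))  # odd k breaks the tie
```

With k=2 the vote is red 1, green 1 and the classification is ambiguous; with k=3 red wins 2 to 1. Hence the rule of thumb: pick a k that can't split evenly across the categories.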
It is awesome how you explain these topics. One suggestion: you could show how the 7 nearest are red, the 3 nearest are orange, and the 1 nearest is green for the point in the middle. To my eyes, the 1 nearest neighbour is still red, which makes me confused about what "nearest" actually means. :)
What time point, minutes and seconds, are you referring to?
Hey Josh! This is just a thank you note saying if I pass the upcoming exam, then it would be all because of you! ❤
Good luck!!! Let me know how it goes!
@@statquest It went well, thank you! Hopefully I get good grades. I was thinking of suggesting that it would be great if you could cover Markov Chain Monte Carlo and related topics. Thank you again! Your channel has been incredibly helpful!
@@suparnaroy2829 I'm glad it went well! And I'll keep those topics in mind.
The explanation was very, very good. Easily understandable by almost anyone.
Can you please do a video on KFold and StratifiedKFold with an example using Python? Also, can you explain cross_val_score in detail?
I have a video on cross validation that covers the concepts in K-Fold cross validation. That might not be exactly what you are looking for, but just in case, here is the link: th-cam.com/video/fSytzGwwBVw/w-d-xo.html
Hello Josh, could you tell me when and where I should use KNN, KMC, hierarchical clustering, or other unsupervised machine learning methods? By this I mean: are there any metrics to judge which one is better, or situations in which one is more suitable than another?
It depends on the field and your goal. Often heatmaps are clustered with hierarchical clustering. PCA is often combined with KNN.
Thank you for the clear explanation. Sir, can you suggest how I can deal with imbalanced data?
I'll be talking about that on my livestream on Monday (9am, New York Time).
Thanks for the very informative video! Though I have a question: if my dataset is filled with just categorical string data, so no numerical data, is there a way I can still use KNN to predict? I heard about encoding the strings to numerical values, but that seems very complex with a big dataset.
If you use R, then you can use a Random Forest to cluster anything and then apply KNN to that clustering: th-cam.com/video/sQ870aTKqiM/w-d-xo.html If you don't use R, you can use target encoding: th-cam.com/video/589nCGeWG1w/w-d-xo.html
Omg thank you so much
No problem!
great video
Thanks!
Excellent
Thank you so much 😀
Amazing!
Thanks!
thank you so much for this video! i have my midterm tomorrow and im so scared :(
Good luck!!
My 10-year-old humming the StatQuest song made me realise my new obsession with this.
bam!
Please do a video on K-Medoid
I'll keep that in mind.