Support Vector Machines Part 2: The Polynomial Kernel (Part 2 of 3)

  • Published on 3 Oct 2024
  • Support Vector Machines use kernel functions to do all the hard work, and this StatQuest dives deep into one of the most popular: the Polynomial Kernel. We talk about the parameter values, how the kernel relates to high-dimensional coordinates via the dot product, and how it calculates high-dimensional relationships.
    NOTE: This StatQuest assumes you already know about...
    Support Vector Machines: • Support Vector Machine...
    Cross Validation: • Machine Learning Funda...
    ALSO NOTE: This StatQuest is based on...
    1) The description of Kernel Functions, and associated concepts on pages 352 to 353 of the Introduction to Statistical Learning in R: faculty.marshal...
    2) The Polynomial Kernel is also based on the Kernel used by scikit-learn: scikit-learn.o...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/...
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumr...
    Paperback - www.amazon.com...
    Kindle eBook - www.amazon.com...
    Patreon: / statquest
    ...or...
    YouTube Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshi...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer....
    ...or just donating to StatQuest!
    www.paypal.me/...
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    #statquest #SVM #kernel

Comments • 430

  • @statquest
    @statquest  2 years ago +10

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @davidonwuteaka2642
      @davidonwuteaka2642 1 year ago

      How do I get it from Nigeria?
      I'd love to.

    • @statquest
      @statquest  1 year ago +1

      @davidonwuteaka2642 Unfortunately I don't have distribution of physical (printed) copies in Nigeria, but you can get the PDF.

    • @davidonwuteaka2642
      @davidonwuteaka2642 1 year ago

      Yes, I have been trying to, but the site kept rejecting my card.
      Thanks for your reply.

    • @statquest
      @statquest  1 year ago +1

      @davidonwuteaka2642 Bummer! I'm sorry to hear that.

  • @RHONSON100
    @RHONSON100 2 years ago +90

    Your videos should be a mandatory tutorial for Data Science/ML courses in all universities. Students throughout the world would benefit from watching the best ML videos. Hats off to you, great Josh Starmer!

    • @statquest
      @statquest  2 years ago +5

      Wow, thanks!

    • @rameshmitawa2246
      @rameshmitawa2246 2 years ago +10

      Not mandatory, but my prof recommends this channel after every slide/lecture.

    • @statquest
      @statquest  2 years ago +4

      @rameshmitawa2246 That's awesome!

    • @tinacole1450
      @tinacole1450 1 year ago +1

      I believe that's because most instructors don't teach it. They simply give information... Josh actually explains difficult concepts in a simple way.

  • @stanlukash33
    @stanlukash33 3 years ago +206

    I will make it easy for you guys:
    3:38 - BAM
    4:49 - DOUBLE BAM
    5:54 - TRIPLE BAM

    • @statquest
      @statquest  3 years ago +24

      Just the hits! BAM! :)

    • @MrPikkabo
      @MrPikkabo 3 years ago +7

      Thanks, I know statistics now

  • @madhuvarun2790
    @madhuvarun2790 3 years ago +76

    Dude, you are amazing. The best tutorial on SVM. I have searched the entire Internet to understand but couldn't. Please continue to make videos.

    • @statquest
      @statquest  3 years ago +9

      Thanks, will do!

  • @Snoozy_FTW
    @Snoozy_FTW 1 year ago +9

    Best machine learning playlist I have encountered on YouTube.
    The animations and your funny way of teaching make it easy to understand concepts.
    The amount of work you put into creating these videos deserves great appreciation.
    I would definitely recommend going through the videos to anyone who is reading this comment.

    • @statquest
      @statquest  1 year ago

      Glad you like them!

  • @jacobwalker6891
    @jacobwalker6891 3 months ago +2

    I have read and looked at most recommended books and videos on kernels and, whilst somewhat familiar with the math, never truly understood the principles.
    StatQuest actually makes complex topics simple, arguably one of the best if not the best teacher on YouTube and definitely the best stat explanations.
    Thanks Josh, much appreciated 👍

    • @statquest
      @statquest  3 months ago

      Thank you very much! :)

  • @itsfabiolous
    @itsfabiolous 1 year ago +3

    Bro you're just a blessing. Never stop with the dry humor. Lots of love for you!

    • @statquest
      @statquest  1 year ago

      Thank you! Will do!

  • @marcoharfe9812
    @marcoharfe9812 4 years ago +14

    I want to thank you so much for all your videos. I was lost in a forest of vectors, matrices, and Greek letters when I heard about these topics in lecture, and I did not understand a thing. As I was practising for the exam, I discovered your videos and now I do actually understand what is happening. Really love the practical, example-driven approach!

    • @statquest
      @statquest  4 years ago +2

      Awesome!!!! Good luck with your exam and let me know how it goes. :)

  • @priyangkumarpatel9317
    @priyangkumarpatel9317 4 years ago +3

    This is one of the best explanations of support vector machines... If anyone is interested in why dot products are integral to the idea of SVM, please refer to Professor Wilson's MIT lecture on SVM... It is another great explanation of SVM...

    • @statquest
      @statquest  4 years ago +1

      Thanks! :)

  • @amalboussere9270
    @amalboussere9270 4 years ago +20

    Thanks a lot. You are such a big help in this harsh student world. God bless you.

    • @statquest
      @statquest  4 years ago +1

      I'm glad you like my videos! :)

    • @palashchandrakar1112
      @palashchandrakar1112 4 years ago +1

      @statquest We don't just like them, we love your videos XOXO

    • @leif1075
      @leif1075 3 years ago

      @statquest This doesn't show where on earth you derive that formula from. WHY do you multiply a times b and then add r? Why not multiply all three or add all three? See what I mean? I don't see how anyone could figure it out. There's not enough info here to derive it.

  • @kwok9298
    @kwok9298 2 years ago +3

    I really appreciate the way it is explained. Please keep up the good work!

    • @statquest
      @statquest  2 years ago +1

      Thank you!

  • @606Add
    @606Add 4 years ago +5

    Your videos are simply amazing! And the level of abstraction is right at the sweet spot! Thank you for the extremely thoughtful and precise illustrations!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @chenghuang4724
    @chenghuang4724 1 year ago +1

    Sir, this is the best video for explaining the Kernel!

    • @statquest
      @statquest  1 year ago

      Glad you think so!

  • @jonathannoll3386
    @jonathannoll3386 4 years ago +6

    My man. I'm so happy I have my presentation about SVM's after your uploads... Keep up the great work!

  • @hayskapoy
    @hayskapoy 4 years ago +101

    Would love to see more math after seeing the big picture behind these algorithms 😄

  • @billykristianto3818
    @billykristianto3818 8 months ago +1

    Thank you very much, the explanation is easier to understand compared to my class!

    • @statquest
      @statquest  8 months ago

      Glad it helped!

  • @flaviodefalcao
    @flaviodefalcao 4 years ago +1

    It is awesome and satisfying to be able to learn the intuition with these videos and then read a textbook understanding everything. THANKS

    • @statquest
      @statquest  4 years ago

      Awesome! I'm glad the videos are helpful! :)

    • @flaviodefalcao
      @flaviodefalcao 4 years ago +2

      @statquest BAM!!!

  • @johnjunhyukjung
    @johnjunhyukjung 1 year ago +1

    This is how concepts should be introduced to students.. makes so much more sense

  • @tymothylim6550
    @tymothylim6550 3 years ago +1

    Thank you for this video! It was very helpful in terms of understanding the details of how the kernel function leads to certain equations that need to be solved to obtain the relevant Support Vector Classifier!

  • @evelillac9718
    @evelillac9718 3 years ago +1

    You literally saved my homework with your videos

  • @muhtasirimran
    @muhtasirimran 2 years ago +2

    Mr. Starmer is almost unconsciously changing machine learning's future 😀

  • @MrZidane1128
    @MrZidane1128 4 years ago +22

    First of all, thanks for your explanation. After plugging two data points, a and b, into the polynomial kernel function, you get the value 16,002.25, and then you said we get a high-dimensional relationship. Could you elaborate on what "relationship" you were referring to, based on the value 16,002.25? Sorry, I was not quite sure about that.

    • @statquest
      @statquest  4 years ago +9

      In some sense the "relationships" are similar to transforming the data to the higher dimension and calculating the distances between data points.

    • @vedgupta1686
      @vedgupta1686 2 years ago +1

      @statquest But the value 16002.25 alone is a 1-D data point. How do you suppose that helps us classify? Am I missing something?

    • @statquest
      @statquest  2 years ago +5

      @vedgupta1686 Think of that number as a loss value that is used as input for an iterative optimization algorithm like gradient descent.

    • @HeduAI
      @HeduAI 2 years ago

      I thought the whole point of using the kernel trick was to save on the computation cost. If we are using an iterative algorithm anyway, how is that better than transforming the data?

    • @statquest
      @statquest  2 years ago +1

      @HeduAI Either way you would still have to use an iterative procedure. So that computation is fixed.
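
For readers following this thread, here is a minimal sketch of where a value like 16,002.25 can come from. The inputs are assumptions, not stated anywhere in the comments: a = 9, b = 14, r = 1/2 and d = 2 are chosen only because they reproduce that number with the kernel (a*b + r)^d. The helper names are hypothetical. The sketch also checks that the kernel value equals an ordinary dot product of the explicit coordinates (x, x^2, 1/2), which is one way to read "high-dimensional relationship".

```python
def polynomial_kernel(a, b, r, d):
    """Polynomial kernel (a*b + r)^d for single-feature observations."""
    return (a * b + r) ** d


def feature_map(x):
    """Explicit coordinates implied by r = 1/2 and d = 2: (x, x^2, 1/2)."""
    return (x, x ** 2, 0.5)


def dot(u, v):
    """Plain dot product of two equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Assumed example observations; they are not given in the thread.
a, b = 9, 14

kernel_value = polynomial_kernel(a, b, r=0.5, d=2)
explicit_value = dot(feature_map(a), feature_map(b))

print(kernel_value)    # kernel computed without transforming the data
print(explicit_value)  # same quantity computed in the higher dimension
```

The point of the comparison is that the kernel gives the same number as the transformed dot product without ever building the transformed coordinates.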

  • @rrrprogram8667
    @rrrprogram8667 4 years ago +7

    After a lonnnnggg waitttt..... MEGAA MEGAAA MEGAAAA BAMMM is back

    • @statquest
      @statquest  4 years ago

      Ha! Thank you! :)

  • @trashantrathore4995
    @trashantrathore4995 2 years ago +1

    Earlier I had an intuition for all the algorithms, but it was incomplete and I could not explain it to others. The concepts are getting clearer now. Thanks, StatQuest team and Josh Starmer. I will contribute as soon as I get a job in the DS field.

  • @mahfuzurrahmansazal3974
    @mahfuzurrahmansazal3974 1 month ago +1

    Came here for the SVM, stayed for BAM!!!
    Double BAM!!!!

    • @statquest
      @statquest  1 month ago +1

      That's awesome! You made me laugh.

  • @deashehu2591
    @deashehu2591 4 years ago +38

    I have grown to love your little songs. They sound like Phoebe's songs!!! I have a little question: what do you use for visualization?

    • @statquest
      @statquest  4 years ago +12

      Thanks! I draw all the pictures in Keynote.

    • @gargidwivedi7700
      @gargidwivedi7700 4 years ago +1

      That's exactly what my sister and I agreed on just before we saw your comment! Haha.

    • @statquest
      @statquest  4 years ago +2

      @Leila Mohammadzadeh Google "svm lagrange dual" and you will see how SVM uses the dot products to find the classifier.

  • @nightawaitsusall9607
    @nightawaitsusall9607 4 years ago +2

    You my friend are a champion. Yes.

    • @statquest
      @statquest  4 years ago +1

      Thank you! :)

  • @temesgenaberaasfaw5076
    @temesgenaberaasfaw5076 4 years ago +2

    Best tutorial for SVM. YOU DID IT, THANKS

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @ahming123
    @ahming123 4 years ago +74

    What do you mean by high-dimensional relationship??

    • @huhuboss8274
      @huhuboss8274 4 years ago +9

      Like the distance, but in higher dimensions.

    • @Actanonverba01
      @Actanonverba01 4 years ago +1

      A synonym for 'high dimension' is many features or variables. For 'relationship', think connection(s). So if we have a high-dimensional relationship, we have a set of many variables that are connected by some idea or mathematical formula. Does that help?

    • @BrandonSLockey
      @BrandonSLockey 4 years ago +2

      Watch the first video (Part 1).

    • @leif1075
      @leif1075 3 years ago

      @Actanonverba01 That's what I thought, but that is irrelevant here because we only have one variable with two possible categories of values. But of course we can add more connections and variables, which I think is what you are alluding to.

    • @clapdrix72
      @clapdrix72 2 years ago +1

      @leif1075 It's not actually what he means and it's not irrelevant. High-dimensional space means we take our original input feature space (in this case just X1) and transform it into a higher-dimensional space by "making up" new dimensions that are functions of our original dimensions (X1) so that the data is linearly separable in that new space. The pairwise relationships (aka similarities) are the distances between the observations projected into that higher-dimensional space (usually referred to as latent space). So it doesn't matter how many features you have in your original dataset nor how many outcome classes you have. Those are irrelevant to the SVM algorithm mechanics; they only change the scale.
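
As a rough illustration of the "pairwise relationships" described in this thread, the sketch below computes the polynomial kernel for every pair of observations in a tiny single-feature dataset. The dosage values are hypothetical, and r = 1/2 and d = 2 are assumed parameter choices, not values fixed by the comments. The resulting table is often called a kernel (or Gram) matrix.

```python
def polynomial_kernel(a, b, r=0.5, d=2):
    """Polynomial kernel (a*b + r)^d for single-feature observations."""
    return (a * b + r) ** d


# Hypothetical single-feature observations (e.g. dosages).
dosages = [1.0, 2.0, 3.0]

# One kernel value per pair of observations; no coordinates in the
# higher-dimensional space are ever computed.
gram = [[polynomial_kernel(a, b) for b in dosages] for a in dosages]

for row in gram:
    print(row)
```

Each entry is a single number, yet it equals a dot product in the higher-dimensional space, and the table is symmetric because the kernel does not care about the order of the two points.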

  • @thawinhart-rawung463
    @thawinhart-rawung463 2 years ago +1

    Good job Josh

  • @axa3547
    @axa3547 3 years ago

    Machine learning algorithms!!! Is it just me, or do others also have to learn these again and again to fill the gaps in their knowledge?

  • @NathanPhippsONeill
    @NathanPhippsONeill 4 years ago +2

    Amazing vid! Thanks for helping me prepare for my Machine Learning exam 😁

    • @statquest
      @statquest  4 years ago +1

      Good luck and let me know how it goes. :)

    • @NathanPhippsONeill
      @NathanPhippsONeill 4 years ago +1

      @statquest It went well for a difficult exam. BUT I had a lot to write about thanks to this channel. Appreciate it ❤️

    • @statquest
      @statquest  4 years ago +1

      @NathanPhippsONeill Hooray!!! That's awesome and congratulations. :)

  • @benardmwanjeya8371
    @benardmwanjeya8371 4 years ago +4

    God bless you Josh STARmer

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @dok3820
    @dok3820 2 years ago +1

    Thank you Josh. Just..thank you

  • @alternativepotato
    @alternativepotato 3 years ago +1

    I love you, my man. You really are a life saver. Just because of that I am gonna buy a t-shirt.

    • @statquest
      @statquest  3 years ago

      BAM! Thank you very much! :)

  • @technojos
    @technojos 3 years ago +1

    Thanks, Josh Starmer. I am fascinated by your videos.
    Please make a video about how 16002.25 is used. BAM!
    Moreover, I think you could make a video playlist about how machine learning algorithms are coded. Double BAM!
    Keep going man, we love you. Triple BAM!!!

    • @statquest
      @statquest  3 years ago +1

      Great suggestions!

    • @kevinarmbruster2724
      @kevinarmbruster2724 3 years ago

      @statquest How is the relationship of 16,002.25 to be interpreted?
      I understood that if we transform everything to the higher dimension we can solve it, but I did not understand the part about relationships between the points and how they help.

    • @statquest
      @statquest  3 years ago +1

      @kevinarmbruster2724 We plug the relationships into an algorithm that is similar to gradient descent and it can use them to find the optimal classifier. However, the details are pretty complex and would require another video.

  • @TaylorSparks
    @TaylorSparks 2 years ago +1

    bam. love it homie. keep it up

  • @shahbazsiddiqi74
    @shahbazsiddiqi74 4 years ago +1

    waited too long... Thanks a ton

  • @yulinliu850
    @yulinliu850 4 years ago +1

    Awesome! Josh is back.

  • @manaspatil4316
    @manaspatil4316 3 years ago +2

    God bless you !!!

  • @sinarb2884
    @sinarb2884 3 years ago

    I could be wrong, but I think there is a slight mistake in this video. The kernel function should be of the form (ab-1/2)^2. This is because the support vector classifier is essentially thresholding based on whether x>y or not. Please let me know if I am wrong. And thanks for your cool videos.

    • @statquest
      @statquest  3 years ago

      Most people define it the way I defined it in the video, (ab + r)^d. For more details, see: en.wikipedia.org/wiki/Polynomial_kernel and Page 352 of the Introduction to Statistical Learning in R.

  • @sornamuhilan.s.p
    @sornamuhilan.s.p 4 years ago +1

    Josh Starmer, you are a genius sir!!

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @harshitsati
    @harshitsati 3 years ago +1

    Thank you angel

  • @edmondkeogh4057
    @edmondkeogh4057 3 years ago +2

    the beep boop thing was hilarious

  • @manasadevadas8685
    @manasadevadas8685 3 years ago +3

    First of all, thank you so much for explaining with such amazing illustrations. One doubt: how can we actually use the relationships between points to find the support vector classifier?

    • @statquest
      @statquest  3 years ago +4

      Unfortunately that's a difficult question to answer and I'd have to dedicate a whole video to it. However, the simple answer is that it uses a method like Gradient Descent to find the optimal values.

    • @manasadevadas8685
      @manasadevadas8685 3 years ago +2

      @statquest Thanks for the response! Hopefully later you'd dedicate a whole video to it :)

  • @tinacole1450
    @tinacole1450 1 year ago +1

    Does anyone laugh at how silly yet genius Josh is? Loved the robot... I rewound to do the robot.

    • @statquest
      @statquest  1 year ago

      You are my favorite! Thank you so much! I'm glad you enjoy the silly sounds.

  • @harithagayathri7185
    @harithagayathri7185 4 years ago +3

    Great explanation 👍 Thanks a ton Josh!! But I'm a bit confused about how to calculate an appropriate 'r' coefficient for the equation. I understand that the 'd' value is determined using Cross Validation.

    • @statquest
      @statquest  4 years ago +1

      'r' is also determined by cross validation, but I am under the impression that it doesn't have as much impact as 'd'. It basically scales things by a constant, rather than adding extra dimensions.

    • @thememace
      @thememace 3 years ago

      @statquest What's the point of setting r anyway since it later gets completely ignored?🤔

    • @statquest
      @statquest  3 years ago +1

      @thememace I'm not sure

    • @rohanpatel702
      @rohanpatel702 2 years ago

      @thememace It doesn't get completely ignored. When r=1/2, the math works out such that the x-axis doesn't get scaled at all. But when r=1, the x-axis gets scaled by sqrt(2). Even though the third element of the vectors combined by the dot product is a constant (and thus ignored), the choice of r still affects how the dot product evaluates because of how it changes the first element of each vector.
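
The last reply in this thread can be checked numerically. The sketch below assumes d = 2, where (a*b + r)^2 expands to a^2*b^2 + 2r*a*b + r^2, i.e. the dot product of (sqrt(2r)*a, a^2, r) with (sqrt(2r)*b, b^2, r). The observations a and b are hypothetical and the function names are made up for illustration.

```python
import math


def polynomial_kernel(a, b, r, d=2):
    """Polynomial kernel (a*b + r)^d for single-feature observations."""
    return (a * b + r) ** d


def feature_map(x, r):
    """Coordinates implied by d = 2: (sqrt(2r)*x, x^2, r)."""
    return (math.sqrt(2 * r) * x, x ** 2, r)


def dot(u, v):
    """Plain dot product of two equal-length tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical observations, chosen only for illustration.
a, b = 2.0, 3.0

results = {}
for r in (0.5, 1.0):
    x_scale = math.sqrt(2 * r)  # how much the original axis is stretched
    k = polynomial_kernel(a, b, r)
    explicit = dot(feature_map(a, r), feature_map(b, r))
    results[r] = (x_scale, k, explicit)
    print(f"r={r}: x-axis scale={x_scale:.4f}, kernel={k}, explicit={explicit}")
```

With r = 1/2 the x-axis scale is 1, and with r = 1 it is sqrt(2), which matches the reply: r changes the first coordinate even though the constant third coordinate is the same for every observation.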

  • @eric752
    @eric752 2 years ago

    One suggestion: if all the topics were listed in a logical order at the beginning, it would be even better. Big thanks for the videos, really appreciate it 🙏

    • @statquest
      @statquest  2 years ago

      Thanks!

    • @eric752
      @eric752 2 years ago +1

      @statquest Thank you

  • @tuongminhquoc
    @tuongminhquoc 4 years ago +2

    First comment! I have turned on notification for your videos. I love all of your videos!

    • @statquest
      @statquest  4 years ago +1

      Awesome! Thank you! :)

  • @vincent-paulvincentelli2627
    @vincent-paulvincentelli2627 3 years ago

    Great video! It would be very nice to have such an intuitive one for kernel PCA :)

    • @statquest
      @statquest  3 years ago +2

      I'll keep that in mind.

  • @commentor93
    @commentor93 2 years ago

    I've understood more than I ever expected to understand in this topic all thanks to your videos.
    But now I've stumbled a bit: How do you solve a constant like the one at 5:50? Or what does solving mean in that context now that it isn't a formula? Could you please expand on that?

    • @statquest
      @statquest  2 years ago

      Think of it as a loss value, and it is something we try to optimize with an iterative algorithm that is similar to Gradient Descent: th-cam.com/video/sDv4f4s2SB8/w-d-xo.html

  • @aryamahima3
    @aryamahima3 2 years ago

    @5:09, you said that we need to calculate the dot product between each pair of points. How do we use this dot product further? Could you please clarify this for me? You are the only person on the whole internet who can clear this up. :D

    • @statquest
      @statquest  2 years ago +1

      We use it as input to an iterative optimization algorithm similar to gradient descent. For details on gradient descent, see: th-cam.com/video/sDv4f4s2SB8/w-d-xo.html

    • @aryamahima3
      @aryamahima3 2 years ago +1

      @statquest Thank you so much ☺️

  • @muhammadavimajidkaaffah7715
    @muhammadavimajidkaaffah7715 4 years ago +1

    SVM for multiclass please, I like your video so much.

  • @iisc2022
    @iisc2022 1 year ago +1

    thank you

  • @preeethan
    @preeethan 4 years ago +3

    Amazing explanation :) We find the High Dimensional Relationship between 2 points to be 16002.25.
    Practically, what do we do with this value? How do we find the Support Vector Classifier with this value?

    • @statquest
      @statquest  4 years ago +1

      It's quite complicated - way too complicated to be described in a comment.

    • @preeethan
      @preeethan 4 years ago +1

      StatQuest with Josh Starmer
      Okay. I love all your videos, especially your intro songs! Great work, keep it going Josh :)

    • @sanjivgautam9063
      @sanjivgautam9063 4 years ago

      I want this answer too!

    • @balasubramanian5232
      @balasubramanian5232 4 years ago

      @statquest I want answers to the question. It'll be helpful if you could share links to resources on this.

    • @statquest
      @statquest  4 years ago

      @balasubramanian5232 Google "svm lagrange dual" and you will have lots and lots of resources.
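
For readers who follow that search suggestion, the standard soft-margin dual problem is sketched below in common textbook notation. None of these symbols are defined in the comments, so treat them as assumptions: n training points x_i with labels y_i in {-1, +1}, multipliers alpha_i, a penalty C, and an intercept b_0. Kernel values such as 16002.25 enter only as the pairwise terms K(x_i, x_j).

```latex
\max_{\alpha}\;\; \sum_{i=1}^{n} \alpha_i
  \;-\; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
        \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\qquad \text{subject to} \qquad
0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0,

\text{with} \quad K(a, b) = (a \cdot b + r)^{d},
\qquad
\hat{y}(x) = \operatorname{sign}\!\Big( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b_0 \Big).
```

Read this way, the optimizer never needs the high-dimensional coordinates themselves, only the kernel value for each pair of points, and a new observation is classified using kernel values between it and the training points.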

  • @tsunningwah3471
    @tsunningwah3471 2 months ago +1

    amazing

    • @statquest
      @statquest  2 months ago

      Thanks!

  • @61_shivangbhardwaj46
    @61_shivangbhardwaj46 3 years ago +1

    Thanks sir, great explanation :-)

    • @statquest
      @statquest  3 years ago

      Thank you! :)

  • @donaldmahaya2689
    @donaldmahaya2689 4 years ago +1

    I'm always left with the illusion that I understood what you just said.

    • @statquest
      @statquest  4 years ago +1

      :(

    • @donaldmahaya2689
      @donaldmahaya2689 4 years ago +3

      @statquest Re-watched it and I did get it after all. BAM!

  • @zheyuanzhou3165
    @zheyuanzhou3165 4 years ago +1

    Super clear tutorial. Thank you very much! But as a non-native English speaker, I am a little confused: what is BAM trying to express?

    • @statquest
      @statquest  4 years ago +1

      th-cam.com/video/i4iUvjsGCMc/w-d-xo.html

    • @zheyuanzhou3165
      @zheyuanzhou3165 4 years ago +1

      @statquest
      A tut for BAM! cool lol

  • @rajdeepkumarnath8944
    @rajdeepkumarnath8944 3 years ago +1

    I once knew a kernel, whose name was Fred,
    But that's not the path we are gonna tread.
    (that's a better song Josh :D )

  • @geo1997jack
    @geo1997jack 3 years ago

    I did not understand what that 16000 value means or how it helps us. Could you please clarify? Everything else was crystal clear :)

    • @statquest
      @statquest  3 years ago +1

      It's used as a measure of the relationship between two points. Once we calculate the relationships between all of the points, they are used in a method similar to Gradient Descent to find the optimal classifier.

  • @dimitrismarkopoulos3964
    @dimitrismarkopoulos3964 2 years ago

    First of all, congratulations! Your videos are super explanatory! One question: does the equation of the polynomial kernel always have the same form?

    • @statquest
      @statquest  2 years ago +1

      As far as I know. However, the variables might have different names.

  • @jhfoleiss
    @jhfoleiss 4 years ago +1

    Great explanation, thanks! One question: what happens when a and b are vectors? I understand that in this quest you wanted to give a simple example (with a single feature) to make things clear. If the answer to this question is in another quest, I'll gladly wait for it :)

    • @statquest
      @statquest  4 years ago +2

      If 'a' and 'b' are vectors (because you have measured more than one thing per observation), then you just multiply a^T b, where a^T = a transpose.

    • @primeprover
      @primeprover 4 years ago +2

      @statquest Doesn't that assume all the features have the same impact on the outcome? I would have thought that some form of weighting in the sums in the dot product of a and b would be necessary.

    • @statquest
      @statquest  4 years ago +2

      @primeprover That's a good point. Like PCA, SVMs are sensitive to scale, so the first thing you would do is normalize all of the variables you've measured.

    • @primeprover
      @primeprover 4 years ago

      @statquest Surely more than just normalization is needed? If you provide two normalized variables to a linear regression model they will each get their own coefficient. One could be 1 and the other 0.1. As far as I can see we seem to be giving all features a coefficient of 1 in the models you described? I would have thought that all but one of the additional features (the other would be 1) would need an extra model parameter to scale it in relation to the others.

    • @statquest
      @statquest  4 years ago +6

      @primeprover I think conceptualizing SVMs in terms of linear or logistic models can be a little misleading. The choice of the parameters for the kernels, unlike in linear or logistic regression, does not represent a relationship between the data and the classification. All the SVM is doing is applying relatively arbitrary transformations to the data to increase the dimensionality in a way that might be helpful for separation.
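
To make the a^T b reply in this thread concrete, here is a minimal sketch of the polynomial kernel for observations with more than one feature. The two observations are hypothetical, and r = 1/2 and d = 2 are assumed defaults rather than values stated in the thread. Every feature enters the dot product with the same weight, which is why scaling the features beforehand matters.

```python
def polynomial_kernel(a, b, r=0.5, d=2):
    """Polynomial kernel (a^T b + r)^d for observations given as lists."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of features")
    a_dot_b = sum(ai * bi for ai, bi in zip(a, b))  # a^T b
    return (a_dot_b + r) ** d


# Hypothetical observations with two measured features each.
a = [1.0, 2.0]
b = [3.0, 4.0]

value = polynomial_kernel(a, b)
print(value)  # (1*3 + 2*4 + 0.5)^2
```

With a single feature per observation, this reduces to the (a*b + r)^d form discussed elsewhere on this page.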

  • @harshitamangal8861
    @harshitamangal8861 4 years ago +3

    Hi Josh, the explanation is amazing. I had a question: you said that the equation (a*b + r)^d is used for finding the relationship between two points. How is this relationship used to find where the Support Vector Classifier goes?

    • @statquest
      @statquest  4 years ago +2

      Unfortunately the details of how it is used would require a whole video and I can't cram it into a comment. However, making the video is on the to-do list.

  • @abrahamjacob7360
    @abrahamjacob7360 4 years ago

    Josh, this is a great video. One question on the Polynomial Kernel derivation. The original problem was to find a classification point for the drug dosage that cures or doesn't cure the disease. When we set d to 2, you mentioned it introduced a second dimension. I understood how squaring the value helped to find a better classifier line, but ideally there is no meaning to the y-axis here, right? The case still remains the same: we are just finding out whether the drug dosage had a positive or negative impact. We could still use the y-axis to determine its efficacy, but if we increase the value to 3, what would the z-axis represent here? Sorry if the question was confusing.

    • @statquest
      @statquest  4 years ago +1

      The new dimensions don't mean anything at all - they are just extra dimensions that allow us to curve and bend the data so that we can separate it. The more dimensions, the more we can curve and bend the data.

  • @beshosamir8978
    @beshosamir8978 2 years ago

    Quick question: why is it useful to calculate the relationships between every two points, regardless of the dimension? How can that be useful for calculating the decision boundary?

    • @statquest
      @statquest  2 years ago +1

      SVMs are optimized using an iterative algorithm that is similar to Gradient Descent, and the relationship values are essentially the "loss" values and help move the SVC to the correct spot.

    • @beshosamir8978
      @beshosamir8978 2 years ago +1

      @statquest
      So how do I know which dimension is the best one, based on the relationships between every two points?

    • @statquest
      @statquest  2 years ago

      @beshosamir8978 www.cs.cmu.edu/~epxing/Class/10701-08s/recitation/svm.pdf

  • @sabbirakhand7120
    @sabbirakhand7120 4 years ago

    After getting the values of r and d by cross validation, we get the value 16002.25. But how do we use this value to determine the high-dimensional relationship??
    This video was really helpful for understanding the topic despite my being from a different background. Thanks.

    • @statquest
      @statquest  4 years ago

      The actual method for finding the classifier would require a whole video on its own. It's like gradient descent, but with a few important differences.

    • @jaideepkukkadapu2600
      @jaideepkukkadapu2600 3 years ago

      @statquest Please make a video on that, it will be very helpful for us.

  • @iliasp4275
    @iliasp4275 2 years ago +1

    Send my love to Fred

  • @harishh.s4701
    @harishh.s4701 2 years ago

    Hi,
    Thanks a lot for your content. It is very easy to understand and I appreciate your way of explaining things. I had one doubt. Can you please explain how cross-validation helps to determine the optimal degree of the polynomial kernel used in SVMs?

    • @statquest
      @statquest  2 years ago

      I do that in this video: th-cam.com/video/8A7L0GsBiLQ/w-d-xo.html

  • @ronitganguly3318
    @ronitganguly3318 2 years ago

    The high-dimensional relationship you calculated at the end is a number which tells us what exactly? How does it help to pseudo-transform into higher dimensions?

    • @statquest
      @statquest  2 years ago

      Are you familiar with Gradient Descent? th-cam.com/video/sDv4f4s2SB8/w-d-xo.html SVMs use a different algorithm, but the idea is similar, and you can think of the numbers, like 16002.25, as values that the algorithm is trying to optimize.

  • @marijatosic217
    @marijatosic217 4 years ago +2

    Thank you for the video! And now, what does this number 16002.25 tell us? :D How will we know what the right dosage is?

    • @statquest
      @statquest  4 years ago +2

      That's just an example of the kind of values that are used by the kernel trick to determine the optimal placement of the support vector classifier.

  • @L.-..
    @L.-.. 4 years ago +1

    After we find the dot product, how do we use that value to decide whether a new sample belongs to the positive class or the negative class?
    Please clarify, Josh.

    • @statquest
      @statquest  4 years ago +1

      It's a little too much to put into a comment. The purpose of the video was only to give insight into how the kernel works, not derive the math.

  • @marcelocoip7275
    @marcelocoip7275 2 years ago

    Thinking visually about the last set of data: if you can draw a line to separate the data after squaring each observation for the y-axis, then you can draw that line regardless of the scale/ratio of the x-axis. So what I see is that the only thing adding "solving/math value" is increasing the order of the xi-axis to fit a hyperplane (the d value). What does r contribute to arriving at a better solution?

    • @statquest
      @statquest  2 years ago +1

      I don't think it adds much.

  • @tumul1474
    @tumul1474 4 years ago +3

    this is damn amazing !!

  • @wong4359
    @wong4359 1 year ago +1

    I wish there were 10 like buttons so that I could click all of them! I would make a "bibibubibu" sound while clicking like.

  • @ayoubmarah4063
    @ayoubmarah4063 4 years ago

    Great content as usual, BIG THANKS to you.
    I hope you are having a nice day.
    I have some questions, if you don't mind:
    I got confused by the problem of imbalanced classes. When the classes are imbalanced, we do either upsampling or downsampling so that we have balanced data.
    1) Is the accuracy score always wrong with imbalanced data? What about f1_score then?
    2) How do we decide which sampling method is good? Should we run them both?
    I do my best to search for a solution, but there are so many opinions that I'm lost. I saw your video last week, but when I got my hands dirty with projects I ran into new problems that are complicated.
    Thank you again for your help

    • @statquest
      @statquest  4 years ago +1

      I'm glad you like the video. For details about how to use SVMs with unbalanced data, see this discussion: stats.stackexchange.com/questions/94295/svm-for-unbalanced-data

  • @The_Mashrur
    @The_Mashrur 2 years ago

    When you say relationships between observations, what exactly do you mean? You didn't really go over how such relationships allow you to find an SVC in the higher dimension?

    • @statquest
      @statquest  2 years ago +2

      In the case of SVM, the relationship is a rather abstract metric of distance.

  • @DeepakSingh-fo2wm
    @DeepakSingh-fo2wm 4 years ago +1

    I am still not clear on what happens after finding a relationship in the higher dimension. For example, in the video, what happens after finding 16002.25? Could you please add a short video on this if possible?

    • @statquest
      @statquest  4 years ago +1

      It would be a long video, but it's on the to-do list.

  • @nafassaadat8326
    @nafassaadat8326 3 years ago +1

    BAMMMMMMMMMM!!!!!!!!!!!!!!!!!!!!! great , thank you

    • @statquest
      @statquest  3 years ago

      You're welcome!!

  • @XoXkS
    @XoXkS 4 years ago

    Another great thing, besides the astonishingly easy explanations, is the way you talk.
    You talk so slow, that I can watch the easy parts easily on 1.5 Speed and the hard parts on normal speed.
    Most people, when they talk slow, talk slow by making long pauses in between words, this way watching at a higher speed sounds very unnatural. You sound just fine on normal and 1.5 Speed!

  • @leonugraha
    @leonugraha 4 years ago +2

    Thank you for SVM follow up video, by the way, do you maintain a Github account?

    • @statquest
      @statquest  4 years ago +1

      I should...

  • @MrWincenzo
    @MrWincenzo 4 years ago +1

    Since the kernel requires calculating the dot product for each pair of points, suppose we have 10 points. If we compute it for each point with respect to the others and itself, we obtain 10 different dot products for every single point. Which one of those 10 dot products becomes the new "y" dimension of the point?

    • @statquest
      @statquest  4 years ago +1

      None of them end up being the new "y" dimension. The kernel trick works without having to make that transformation. We use the transformation to give an intuition of how the process works, but the kernel trick itself bypasses the transformation. This is the "kernel trick", and I mention it in the first video in the series on SVMs: th-cam.com/video/efR1C6CvhmE/w-d-xo.html

    • @MrWincenzo
      @MrWincenzo 4 years ago +1

      @@statquest Yes, I misunderstood before; now I get it: plugging the values into the polynomial expression is equivalent to calculating the dot product in higher dimensions. And since the SVM only depends on those dot products among points, we have "improved" the classification by mimicking the dot product in higher dimensions, even infinitely many dimensions as with RBF.
      Still thank you for all your efforts and your gentle replies to our questions. Regards.
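
The equivalence described in this thread can be checked numerically. The sketch below is a minimal illustration, not code from the video: the function names are made up, and it only covers the d = 2 case, where (a·b + r)^2 expands to 2r·ab + a^2·b^2 + r^2. With r = 1/2 the explicit coordinates reduce to (x, x^2, 1/2).

```python
from math import isclose, sqrt

def polynomial_kernel(a, b, r=0.5, d=2):
    """Polynomial kernel (a*b + r)^d for 1-D observations."""
    return (a * b + r) ** d

def explicit_features(x, r=0.5):
    """Higher-dimensional coordinates whose dot product matches the d = 2 kernel.

    (a*b + r)^2 = 2r*a*b + a^2*b^2 + r^2, so the coordinates are
    (sqrt(2r)*x, x^2, r). For r = 1/2 this is (x, x^2, 1/2).
    """
    return (sqrt(2 * r) * x, x * x, r)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# The kernel gives the high-dimensional dot product without ever
# building the high-dimensional coordinates.
pairs = [(9.0, 14.0), (1.0, 2.0), (-3.0, 5.5)]
for a, b in pairs:
    via_kernel = polynomial_kernel(a, b)
    via_features = dot(explicit_features(a), explicit_features(b))
    print(a, b, via_kernel, via_features)
```

None of these values becomes a new coordinate for a point; each one is only a pairwise number handed to the optimizer.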

  • @guyelovici4940
    @guyelovici4940 2 years ago +1

    King!

    • @statquest
      @statquest  2 years ago

      Wow! Thanks!

  • @berknoyan7594
    @berknoyan7594 4 years ago +1

    Hi Josh, thanks for the video. You are helping me a lot. I have just one question. What do you mean by "high dimensional relationship"? The same value can be obtained from any two numbers whose product is 126, and there are infinitely many such pairs. It's just a dot product of two 3-dimensional points. As far as I know, cross validation uses the misclassification rate to select the best r and d. Does CV use these numbers in any calculation?

    • @statquest
      @statquest  4 years ago +1

      Cross Validation does not use these high-dimensional relationships. Instead, the algorithm that finds optimal fits, given constraints (like the number of misclassifications you will allow) uses them. Although the dot product seems like it would be too simple to use, it has a geometric interpretation related to how close the points are to each other. For more details, check out the Wikipedia article: en.wikipedia.org/wiki/Dot_product

  • @pratyanshvaibhav
    @pratyanshvaibhav 1 year ago

    Respected Josh sir, thank you for such an amazing explanation. Sir, please help me, I have a question. Do we take the dot products for every pair of points, like the first red point with all the green points and so on, or do we take the first red point with the first green point and so on?

  • @stoicism-101
    @stoicism-101 2 years ago

    Dear Sir,
    Kernels are basically used for finding the relationship between two points using the formulae. How do we further find the Support vector classifier?

    • @statquest
      @statquest  2 years ago +1

      The SVC is found using an iterative process that is a lot like Gradient Descent, and the output from the kernels is like the "loss" values.

  • @surajjoshi3433
    @surajjoshi3433 3 years ago

    Sir, where did you learn all of this in such detail? Please make a video on the best resources for studying machine learning, or just reply to my comment please, Josh Starmer Sir :)

    • @statquest
      @statquest  3 years ago +1

      To learn how I learn, see: th-cam.com/video/crLXJG-EAhk/w-d-xo.html

  • @chinzzz388
    @chinzzz388 4 years ago

    When we calculate relationships between 2 data points, do we calculate relationships between all the points w.r.t all the other points? Ex: if we have 4 data points (1,2,3,4) do we calculate relationship between (1,2) and (3,4) OR do we calculate relationship between (1,2),(1,3),(1,4),(2,3)...etc

    • @statquest
      @statquest  4 years ago +1

      We calculate all of the relationships.
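
A minimal sketch of what "all of the relationships" means in practice: every point is paired with every point, itself included, which gives a square, symmetric table of kernel values. The four points 1, 2, 3, 4 come from the question above; r = 1/2 and d = 2 are assumed, and the function names are illustrative only.

```python
def polynomial_kernel(a, b, r=0.5, d=2):
    """Polynomial kernel (a*b + r)^d for 1-D observations."""
    return (a * b + r) ** d

def kernel_matrix(points, kernel):
    """Kernel value for every pair of points, including each point with itself."""
    return [[kernel(p, q) for q in points] for p in points]

points = [1.0, 2.0, 3.0, 4.0]
K = kernel_matrix(points, polynomial_kernel)

# K[i][j] is the relationship between points[i] and points[j];
# the matrix is symmetric because a*b == b*a.
for row in K:
    print(row)
```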

  • @hrdyam865
    @hrdyam865 4 years ago +1

    Thanks for the videos 😊, Can we use SVM for multinomial classification?

    • @statquest
      @statquest  4 years ago

      I believe you just create one SVM per classification, and each SVM compares one classification to all the others (i.e. a sample either has that classification or not).
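
The "one SVM per classification" idea is usually called one-vs-rest. The sketch below only shows the bookkeeping, under the assumption that each binary classifier returns a decision score where larger means "more likely this class". The helper names and the example scores are hypothetical, and no SVM is trained here.

```python
def one_vs_rest_labels(labels):
    """For each class, relabel the data as +1 (this class) or -1 (any other class)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}

def pick_class(scores):
    """Given {class: decision score} from the per-class classifiers, pick the highest."""
    return max(scores, key=scores.get)

labels = ["a", "b", "c", "a"]
binary_problems = one_vs_rest_labels(labels)
print(binary_problems)

# Hypothetical decision scores for one new sample, one per binary classifier.
print(pick_class({"a": -0.2, "b": 1.3, "c": 0.1}))
```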

  • @rishabhmalhotra127
    @rishabhmalhotra127 4 years ago +4

    This StatQuest isn't about a kernel named Fred coz it's about a kernel named Polynomial. Hilarious xD

    • @statquest
      @statquest  4 years ago +1

      Thanks! :)

  • @hamidomar3618
    @hamidomar3618 2 years ago

    Hey, great video, thanks!
    What happens after the transformation though? I mean, how does the final result, i.e., a scalar corresponding to the relationship between each pair of observations, help in identifying an optimally classifying hyperplane?

    • @statquest
      @statquest  2 years ago

      The value is used in a way similar to how loss values are used in Gradient Descent. There is an iterative algorithm that uses the values to optimize the fit.
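
To make "an iterative algorithm that uses the values" concrete, here is a sketch using a kernel perceptron. This is NOT the algorithm SVMs use and it does not maximize a margin; it is swapped in because it is about the simplest iterative learner that touches the data only through kernel values. The data, r = 1, d = 2, and all names are assumptions for illustration.

```python
def polynomial_kernel(a, b, r=1.0, d=2):
    """Polynomial kernel (a*b + r)^d for 1-D observations."""
    return (a * b + r) ** d

def decision_score(x, xs, ys, alphas, kernel):
    """Weighted sum of kernel values between x and every training point."""
    return sum(a * y_j * kernel(x_j, x) for a, x_j, y_j in zip(alphas, xs, ys))

def train_kernel_perceptron(xs, ys, kernel, max_epochs=1000):
    """Kernel perceptron: bump a point's weight each time it is misclassified."""
    alphas = [0] * len(xs)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * decision_score(x, xs, ys, alphas, kernel) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alphas

def predict(x, xs, ys, alphas, kernel):
    """Classify a new observation as +1 or -1 from the sign of the score."""
    return 1 if decision_score(x, xs, ys, alphas, kernel) > 0 else -1

# 1-D data that no single threshold can split: -1 at both ends, +1 in the middle.
xs = [-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0]
ys = [-1, -1, -1, 1, 1, 1, -1, -1, -1]

alphas = train_kernel_perceptron(xs, ys, polynomial_kernel)
print(alphas)
print(predict(0.25, xs, ys, alphas, polynomial_kernel))
print(predict(4.0, xs, ys, alphas, polynomial_kernel))
```

The same shape of decision rule, a sign taken over a weighted sum of kernel values between the new sample and training samples, is how a kernelized classifier assigns a new observation to a class. An SVM differs in how the weights are found and in using only the support vectors.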

  • @nick_g
    @nick_g 3 years ago +1

    I get the feeling some linear algebra might help with this stuff. I'm no expert here, but kernels remind me of how an extra column is added to a matrix in order to transform to a higher dimension without changing the original values, which I saw in a computerphile video: th-cam.com/video/vQ60rFwh2ig/w-d-xo.html
    Also there’s a video I watched about factoring polynomials with matrices in the numberphile channel that might apply: th-cam.com/video/wTUSz-HSaBg/w-d-xo.html

  • @zeynabmousavi1736
    @zeynabmousavi1736 4 years ago

    How is overfitting evaluated in SVMs? How do you check whether the output of an SVM is generalizable or not?

    • @statquest
      @statquest  4 years ago +1

      You compare the classifications made with the training dataset to classifications made with the testing dataset.

    • @zeynabmousavi1736
      @zeynabmousavi1736 4 years ago

      @@statquest Thank you. I should have mentioned that I have a small data set, and I take all data points as the training set and do 10-fold cross validation. I am concerned about overfitting.

  • @Han-ve8uh
    @Han-ve8uh 3 years ago

    1. Is there a relationship between the values of d and r in polynomial kernel, and the number of output dimensions?
    2. At 3:27 why is the 3rd term ignored, is this part of the kernel trick? Are the 3rd terms always the same no matter what d or r is used?
    3. It seems that the dot product exists only because d=2 which after expansion allows the expression to be expressed as a dot product, if d=3 then we cannot express as a dot product of 2 terms anymore?
    4. Does this whole video apply to other kernels too?

    • @statquest
      @statquest  3 years ago

      1) d ends up being the number of dimensions.
      2) Regardless of the values for 'a' and 'b', the last dimension will always have the exact same value, 1/2. Thus, it will not help us establish how 'a' and 'b' are related.
      3) If d=3 (and r=1), then we get (ab + 1)^3 = a^3b^3 + 3a^2b^2 + 3ab + 1, which equals the dot product (a^3, sqrt(3)a^2, sqrt(3)a, 1) dot (b^3, sqrt(3)b^2, sqrt(3)b, 1)
      4) This video provides the background for understanding how the RBF kernel works. For details on that, see: th-cam.com/video/Qc5IyLW_hns/w-d-xo.html
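
The d = 3 case in point 3 can be verified with a few lines. This sketch assumes r = 1, which is what the expansion a^3b^3 + 3a^2b^2 + 3ab + 1 implies; the function names are illustrative.

```python
from math import isclose, sqrt

def polynomial_kernel(a, b, r=1.0, d=3):
    """Polynomial kernel (a*b + r)^d for 1-D observations."""
    return (a * b + r) ** d

def cubic_features(x):
    """Coordinates whose dot product reproduces (a*b + 1)^3.

    The binomial coefficients 1, 3, 3, 1 are split evenly between the two
    vectors by taking square roots: (x^3, sqrt(3)*x^2, sqrt(3)*x, 1).
    """
    return (x ** 3, sqrt(3) * x ** 2, sqrt(3) * x, 1.0)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a, b = 2.0, 3.0
print(polynomial_kernel(a, b))                  # (2*3 + 1)^3
print(dot(cubic_features(a), cubic_features(b)))
```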

    • @Han-ve8uh
      @Han-ve8uh 3 years ago

      @@statquest Thanks a lot, now I get the idea for 3): you can always use square roots to split up the constants, then put a and b into the two vectors of the dot product. Also, for 1), I see that the number of terms of (x+y)^n is n+1, but we always throw away the last term since it's a constant that is the same for both points, so we end up with n dimensions.
      How did people invent these kernels? Did the kernel trick come later as a hack to overcome computation constraints, or it came first before a whole class of kernels was discovered possible?
      Also, why is there an obsession with learning straight lines through the data (no matter raw/dimension raised), has this got to do with limitations of the optimization method (I think you mentioned in other comments it uses gradient descent). Because i'm thinking if it could generate non straight lines, then maybe there's no need to raise to higher dimensions?

    • @statquest
      @statquest  3 years ago

      @@Han-ve8uh I don't know how people came up with the kernel trick. However, straight lines are usually much easier to optimize than curved ones. That said, neural networks, which I describe here, fit curved lines to data: th-cam.com/video/CqOfi41LfDw/w-d-xo.html

  • @redaouazzani7120
    @redaouazzani7120 4 years ago

    Great explanation! But what are the mathematical reasons to choose the RBF kernel or the polynomial kernel? What does it depend on?

    • @statquest
      @statquest  4 years ago

      Usually people just start with the RBF kernel and see how well it performs. If it doesn't do well, they might try the polynomial kernel.

  • @slirpslirp
    @slirpslirp 4 years ago +1

    awesome, so the dot product is equal to the result of the kernel function ?

  • @aaditstudent
    @aaditstudent 8 months ago

    Hey guys, did any of you figure out why we only need to compute the dot product, and not actually transform the data?
    Thanks in advance! :)

    • @statquest
      @statquest  8 months ago

      The kernel function itself is enough to give a metric of distance, which can be used for an iterative optimization procedure.
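
One way to see how a kernel can act as a metric of distance without transforming the data: the squared distance between two points in the higher-dimensional space can be written entirely in kernel values, as K(a,a) - 2K(a,b) + K(b,b). The sketch below assumes r = 1/2, d = 2 and dosages 9 and 14 (the dosages are an assumption), and compares the result against the explicit coordinates (x, x^2, 1/2).

```python
from math import isclose

def polynomial_kernel(a, b, r=0.5, d=2):
    """Polynomial kernel (a*b + r)^d for 1-D observations."""
    return (a * b + r) ** d

def kernel_distance_squared(a, b, kernel):
    """Squared distance in the kernel's feature space, using only kernel values.

    |phi(a) - phi(b)|^2 = K(a, a) - 2*K(a, b) + K(b, b)
    """
    return kernel(a, a) - 2 * kernel(a, b) + kernel(b, b)

def explicit_distance_squared(a, b):
    """Same distance computed from the coordinates (x, x^2, 1/2)."""
    return (a - b) ** 2 + (a * a - b * b) ** 2 + (0.5 - 0.5) ** 2

a, b = 9.0, 14.0
print(kernel_distance_squared(a, b, polynomial_kernel))
print(explicit_distance_squared(a, b))
```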

  • @hassanjb83
    @hassanjb83 4 years ago +1

    ​At 6:33 you mention that we need to determine the value of both r and d through cross validation. If we have one dimensional data then shouldn't be d = 2 only?

    • @statquest
      @statquest  4 years ago +1

      Why do you say that?

    • @hemersontacon3168
      @hemersontacon3168 4 years ago +2

      I think you got too attached to the example. Imagine the same example but with the two colors all mixed up. Then I think that d = 2 would not be enough to split things up!

    • @ccuny1
      @ccuny1 4 years ago +1

      @@hemersontacon3168 That's an insightful comment that actually opened my eyes. Thank you.

    • @hemersontacon3168
      @hemersontacon3168 4 years ago

      @@ccuny1 Glad to know and glad to help ^^

  • @abhishekanand5974
    @abhishekanand5974 3 years ago

    What exactly is meant by relationships between observations?

    • @statquest
      @statquest  3 years ago

      It's some metric of distance.

  • @hamedbahramiyan
    @hamedbahramiyan 3 years ago

    I didn't get how the kernel relationship value is used to transform the data. In theory, the kernel relationship is a square matrix containing kernel values for all pairs of samples. You calculated one of these at the end, but how are they used to transform the initial data? What is the function or algorithm?

    • @statquest
      @statquest  3 years ago

      Explaining how the kernel values are used would require a bunch more videos. However, the quick and easy version is that it uses an iterative algorithm that is similar to (but not the same as) gradient descent to find the optimal classifier.

    • @hamedbahramiyan
      @hamedbahramiyan 3 years ago

      @@statquest Thanks a lot. You showed a manual transformation at 3:30. My question is: what are 'a' and 'b'? Any pair of samples regardless of their labels? If so, how does it work when a=x1 is used in different dot products (first with b=x2 and then with b=x4)? I hope this clarifies my point.

    • @statquest
      @statquest  3 years ago

      @@hamedbahramiyan If a=x1, then its 2-D coordinates will be a, a^2, regardless of what b is set to.

    • @hamedbahramiyan
      @hamedbahramiyan 3 years ago +1

      @@statquest thank you