The Kernel Trick - THE MATH YOU SHOULD KNOW!

  • Published Jun 2, 2024
  • Some parametric methods, like polynomial regression and Support Vector Machines, stand out as being very versatile. This is due to a concept called "kernelization".
    In this video, we kernelize linear regression and show how kernels can be incorporated into other algorithms to solve complex problems.
    If you like this video, hit that like button. If you’re new here, hit that SUBSCRIBE button and ring that bell for notifications!
    FOLLOW ME
    Quora: www.quora.com/profile/Ajay-Ha...
    REFERENCES
    [1] The Kernel Trick: people.eecs.berkeley.edu/~jor...
    [2] Positive Definite Kernels: en.wikipedia.org/wiki/Positiv...

Comments • 132

  • @CodeEmporium
    @CodeEmporium  5 years ago +30

    Going to make a video on SVM and how it uses this kernel trick. So if you want to understand the math behind one of the most common Machine learning Algorithms, *subscribe* to keep an eye out for it ;)

  • @pavan4651
    @pavan4651 2 years ago +40

    A few of my friends wanted to get into ML at some point, but when they realized ML is just math they went back to web dev. I love math, and your videos make me love ML even more. Keep it up!

  • @achillesarmstrong9639
    @achillesarmstrong9639 4 years ago +11

    Very good explanation, and in depth. Most other videos just explain without the formulas, which is too simplistic.

  • @987654321ABC1000
    @987654321ABC1000 4 years ago +2

    This video is awesome, thanks for the lecture!

  • @krishnasumanthmannala984
    @krishnasumanthmannala984 4 years ago +15

    I think at 2:50 the subscript on y and x should be i.
    Thank you for the great explanation.

  • @uforskammet
    @uforskammet 4 months ago +2

    Amazing! Elegant explanation.

  • @meloyang9326
    @meloyang9326 3 years ago +1

    Thank you a lot! It really solved my problems with kernel tricks, as I felt extremely puzzled by our professor's lecture.

  • @MrMaipeople
    @MrMaipeople 5 years ago

    Thank you so much for this excellent Video

  • @Ashrafzaman37
    @Ashrafzaman37 4 years ago

    Very nice presentation...like it.

  • @pratikdeoolwadikar5124
    @pratikdeoolwadikar5124 4 years ago

    Thanks a lot, that cleared many doubts !!

  • @johnfinn9495
    @johnfinn9495 3 years ago +5

    At about 2:56 you need to explain to viewers the logic here: w* is not the solution because alpha depends on w. Also, at about 4:16 you cancel K, although K is singular. At least you need some discussion of the range and null space of the kernel.

    • @User-cv4ee
      @User-cv4ee 2 years ago +1

      I was wondering about how w* was the solution yet contained w in it. What is it supposed to be?
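
      For anyone stuck on the same point, here is a sketch of the standard derivation (a reconstruction in the video's notation, where Phi stacks the phi(x_n)^T as rows; it is not a quote from the video):

          J(w) = \sum_{n=1}^{N} ( y_n - w^T \phi(x_n) )^2 + \lambda ||w||^2
          % Setting the gradient with respect to w to zero gives
          w^* = \frac{1}{\lambda} \sum_{n=1}^{N} ( y_n - w^T \phi(x_n) ) \phi(x_n) = \Phi^T \alpha,
          \qquad \alpha_n = \frac{1}{\lambda} ( y_n - w^T \phi(x_n) )
          % So w^* is implicit: alpha still contains w. Substituting w = \Phi^T \alpha back in
          % and writing K = \Phi \Phi^T gives an objective in alpha alone, minimized by
          J(\alpha) = || y - K \alpha ||^2 + \lambda \alpha^T K \alpha,
          \qquad \alpha = (K + \lambda I)^{-1} y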

  • @harry5094
    @harry5094 5 years ago +9

    Damn dude!! You really deserve a lot more views and subscriptions. Keep doing the great work.

    • @CodeEmporium
      @CodeEmporium  5 years ago

      Thanks for the kind words homie!

    • @leif1075
      @leif1075 3 years ago

      @@CodeEmporium But polynomial regression is an example of a nonlinear function generally right? Unless it just has several linear variables...

  • @prithviprakash1110
    @prithviprakash1110 3 years ago

    Great explanation.

  • @TheAkashkajal
    @TheAkashkajal 5 years ago

    Great Video

  • @Manuel-tf7qc
    @Manuel-tf7qc 5 years ago

    Just to make sure I understand your development: at minute 4:13, is it the symmetry of the kernel matrix (K = t(K)) that allows you to have [t(y) * K * alpha] instead of [t(alpha) * K * y]? Here "t" is transpose.
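
    One way to see the step being asked about, assuming K is symmetric: \alpha^T K y is a scalar, so it equals its own transpose,

        \alpha^T K y = (\alpha^T K y)^T = y^T K^T \alpha = y^T K \alpha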

  • @msjber5870
    @msjber5870 4 years ago +5

    At 3:37, isn't the kernel matrix K of size (m, m) rather than (m, n)? Since we take a dot product of every observation (from X1 to Xm) with every other one, we get a square matrix (as you mentioned yourself just before), so K should be of size m * m, not m * n, unless I missed something.
    So the last element of the first row, for instance, should be Phi(X1)t * Phi(Xm), and not Phi(X1)t * Phi(Xn).
    Correct?

    • @josephchong783
      @josephchong783 4 years ago +2

      I was wondering about this too. It should be m x m, or at least he should have stated m = n. Annotation is really annoying when it is not explained.
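
      To make the shape point concrete, a minimal sketch (the RBF kernel here is only an illustrative choice): for m samples with n features each, the Gram matrix pairs every sample with every other sample, so it is m x m no matter what n is.

          import numpy as np

          def rbf_kernel(a, b, gamma=0.5):
              # k(a, b) = exp(-gamma * ||a - b||^2)
              return np.exp(-gamma * np.sum((a - b) ** 2))

          m, n = 5, 3                 # m samples, n features
          X = np.random.rand(m, n)

          # K[i, j] = k(x_i, x_j): every sample paired with every other sample
          K = np.array([[rbf_kernel(X[i], X[j]) for j in range(m)] for i in range(m)])

          print(K.shape)              # (5, 5) -> m x m, independent of n
          print(np.allclose(K, K.T))  # True -> symmetric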

  • @birdman8375
    @birdman8375 2 years ago

    Can you make a video like this for simple linear regression without regularization?

  • @johnfinn9495
    @johnfinn9495 3 years ago

    I have a few related questions. First, the regression equations are overdetermined, i.e. the number N of data points is greater than the number M of basis functions, right? And this is why we need regularization (ridge regression, lambda>0), right? If N>M and K is N x N, it has rank (at most) M so K^{-1} does not exist, but (K+lambda I)^{-1} does. That is OK, and I suppose you can let lambda go to zero to minimize the amount of regularization. But then, if you use a Gaussian radial basis function, this is infinite dimensional (M goes to infinity), and the regularization is no longer needed. Does all this seem correct?

    • @JI77469
      @JI77469 8 months ago

      If you go to the section on prediction, you'll see that the size of M (even if M = infinity) is irrelevant, and what's required is the inversion of K + lambda I, which is "just" inverting an
      N x N matrix. I don't think M has anything to do with the degree of overfitting. So yes even for a Gaussian kernel (when M = infinity) you still want to regularize.
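
      A from-scratch sketch of the point in this reply (illustrative code, not code from the video): with a Gaussian/RBF kernel the implicit feature space is infinite-dimensional, yet fitting only ever requires solving an N x N system in (K + lambda*I).

          import numpy as np

          def rbf(A, B, gamma=1.0):
              # Pairwise kernel matrix: K[i, j] = exp(-gamma * ||A_i - B_j||^2)
              d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
              return np.exp(-gamma * d2)

          rng = np.random.default_rng(0)
          X = rng.uniform(-3, 3, size=(50, 1))              # N = 50 training points
          y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)

          lam = 0.1
          K = rbf(X, X)                                     # N x N Gram matrix
          alpha = np.linalg.solve(K + lam * np.eye(50), y)  # the only "inversion" needed

          X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
          y_pred = rbf(X_test, X) @ alpha                   # y(x) = k(x)^T (K + lam*I)^{-1} y
          print(y_pred)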

  • @mikel5264
    @mikel5264 4 years ago

    Man, you are the best

  • @darasingh8937
    @darasingh8937 2 years ago

    Thank you for your videos! I love the fact that you show equations.

  • @JRAbduallah1986
    @JRAbduallah1986 2 years ago

    Thanks for uploading this video. Using the kernel trick to get around phi phi transpose is a good solution; however, at the end we have the inverse of K + lambda I, which is a big matrix. Do you have any solution for that?

    • @JI77469
      @JI77469 8 months ago

      To my knowledge the two practical methods that exist to avoid dealing with the often huge matrix K are "Random Features" and "Nystrom Method". But in general the huge matrix K and related issues (like inversion) are really why deep learning is often used when lots of data is available.
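
      To give a flavor of the Random Features idea mentioned above, a rough sketch for an RBF kernel (the bandwidth and sizes are assumptions for illustration): approximate k(x, z) by the inner product of a finite random feature map, then do ordinary ridge regression in that D-dimensional space, so the N x N solve in K becomes a D x D solve.

          import numpy as np

          rng = np.random.default_rng(0)
          N, d, D = 2000, 5, 200                  # N samples, d input dims, D random features
          X = rng.normal(size=(N, d))
          y = rng.normal(size=N)                  # placeholder targets

          gamma = 0.5                             # target kernel: k(x, z) = exp(-gamma * ||x - z||^2)
          W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
          b = rng.uniform(0, 2 * np.pi, size=D)
          Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)    # random Fourier feature map, N x D

          lam = 0.1
          # Ridge regression on Z: a D x D system instead of the N x N one in K
          w = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)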

  • @zoro8117
    @zoro8117 6 months ago

    Dude thanks a lot ❤

  • @chaosido19
    @chaosido19 1 year ago

    omg I watched a plethora of videos and read so many articles trying to explain to me what the kernel method actually gains, and I finally understand it not only conceptually but also down to the technical level

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Haha I made this video so long ago I thought I explained it in a real complex way. That said, super glad this was helpful

  • @purvanyatyagi2494
    @purvanyatyagi2494 3 years ago

    can we use the same techniques with SVM? as in SVM we have to use the Lagrangian to get to the dual form

  • @MSalem7777
    @MSalem7777 3 years ago

    Thank you! Great explanation.

    • @CodeEmporium
      @CodeEmporium  3 years ago

      Thanks for the compliments

  • @saeedmakki9923
    @saeedmakki9923 3 years ago

    Thanks a lot!

  • @mikel5264
    @mikel5264 4 years ago +2

    How to get the vector 'k' in the last slide?

  • @rishabtomar9837
    @rishabtomar9837 5 months ago

    It would be a great help to understand this better if you could please make a video that takes a dataset of m samples and n features and shows how we would calculate this K matrix and use it for transforming the features.
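
    Along the lines of the request above, a small numerical sketch of the kernel trick itself (the explicit feature map below is one standard choice for 2-D inputs, written out only for illustration): evaluating the polynomial kernel k(x, z) = (x^T z + 1)^2 directly gives the same number as the inner product of the explicitly mapped points, without ever forming phi.

        import numpy as np

        def phi(v):
            # Explicit feature map matching k(x, z) = (x^T z + 1)^2 for 2-D inputs
            x1, x2 = v
            return np.array([x1**2, x2**2,
                             np.sqrt(2) * x1 * x2,
                             np.sqrt(2) * x1,
                             np.sqrt(2) * x2,
                             1.0])

        x = np.array([0.3, -1.2])
        z = np.array([2.0, 0.5])

        direct = (x @ z + 1) ** 2          # kernel evaluated in the original 2-D space
        mapped = phi(x) @ phi(z)           # same value via the explicit 6-D feature map
        print(np.isclose(direct, mapped))  # True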

  • @keyangke
    @keyangke 5 years ago

    I think there might be a mistake in your equation for J(alpha) at 3:21; it should be J(w*) instead.

  • @T_rex-te3us
    @T_rex-te3us 9 months ago

    Incredible explanation, thank you very much.

    • @CodeEmporium
      @CodeEmporium  9 months ago

      Thanks so much for watching and commenting! Glad it is useful

  • @ekbastu
    @ekbastu 3 years ago

    Man you are a champion. Thank you very much.

    • @CodeEmporium
      @CodeEmporium  3 years ago +1

      I'd love to be one someday. Thanks a ton :)

  • @hrizony7847
    @hrizony7847 9 months ago

    sorry I don't understand the last part. In prediction, how do we calculate k(x)? Say we have all the training points so that we have K, but for a testing point x what does it mean? Thanks for the help bro
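
    On the question above: at prediction time k(x) is just the vector of kernel evaluations between the new point x and every training point, so (in the video's notation, as far as I can tell)

        k(x) = [ k(x_1, x), k(x_2, x), \dots, k(x_m, x) ]^T,
        \qquad y(x) = k(x)^T (K + \lambda I)^{-1} y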

  • @aspergale9836
    @aspergale9836 4 years ago

    Where does the $1/\lambda$ come from in the derivative at 2:47?

  • @rajkundaliya7796
    @rajkundaliya7796 2 years ago

    Damn! Damn! Damn! Couldn't have been better. Thanks a lot! A lot! As simple as it can get!

  • @shivamsisodiya9719
    @shivamsisodiya9719 5 years ago +2

    Please make more videos on GAN

  • @ayushtankha413
    @ayushtankha413 6 months ago

    why do we get the 1/lambda term after the derivative in w*?? @2:52

  • @frankbreeze9895
    @frankbreeze9895 4 years ago

    Dear author, how can we obtain the w* at 2:58? Do we obtain it by setting the derivatives of J(w) to zero? Could you please explain it? Thanks.

    • @CodeEmporium
      @CodeEmporium  4 years ago +1

      Yes. The idea is to find the weight vector that minimizes the cost.

    • @frankbreeze9895
      @frankbreeze9895 4 years ago

      @@CodeEmporium Thank you very much for your reply.

    • @birdman8375
      @birdman8375 2 years ago

      @@CodeEmporium Can you do the same but now without regularization?

  • @rembautimes8808
    @rembautimes8808 2 years ago +1

    It is indeed an awesome video, but viewers should have some background knowledge so that it is easy to follow. What is nice is that it ties in so many concepts in a single 7 min video. A good warm-up video for those who have to go out and develop some code. I had resisted watching Code Emporium for a long time; now I'm a subscriber.

    • @CodeEmporium
      @CodeEmporium  2 years ago

      You're right. I made this video while in grad school, so it was meant to serve as a refresher for me before exams :) That's why it's a lil hard to follow. Maybe if I had my audience more in mind at the time, this video might have been more accessible

  • @niteshkans
    @niteshkans 3 months ago

    There are major errors in the equations that you solved. But yes, the explanation is on point.

  • @JI77469
    @JI77469 8 months ago

    I understand data scientists might want to shy away from Hilbert spaces, but this stuff is so much clearer if you just use the Finite Representer Theorem to reformulate Ridge regression as a simple regression problem involving the kernel matrix K. :) Just my opinion.

  • @univuniveral9713
    @univuniveral9713 4 years ago +2

    I can't really get the difference between polynomial regression and nonlinear regression. Please can you help me with that?

    • @covariance5446
      @covariance5446 3 years ago +3

      You could probably answer that question without even knowing linear algebra or regression. After all, what is the relationship between a polynomial function and a non-linear function? A non-linear function is simply any function that isn't of the form y = mx + b (or y-hat = b1x1 + b2x2 + ... + bnxn for multiple linear regression). A polynomial function is of the form y = [polynomial expression here]. Recall that a polynomial expression is one that only involves terms of the form c_n x^n + c_(n-1) x^(n-1) + ... + c_0 x^0. Examples include linear functions, but also quadratics, cubics, and so forth.
      In short, a polynomial function can be linear (as in the case of y = mx + b) or non-linear (as in the case of, say, y = 3x^2 + x + 2).
      In linear regression, you are fitting a line to the data. In non-linear regression, you are fitting a curve (and it would have to be a curve, not a line, since it's *non-linear*) to a set of data. BUT that curve doesn't have to be polynomial in nature (though it certainly can be). Whether that curve is defined by a polynomial function (not of order 1) OR something else depends on the circumstance! It might, for example, be an exponential function or a sinusoidal one. Recall that neither of the latter is a polynomial function, because they are not of the form y = c_n x^n + c_(n-1) x^(n-1) + ... + c_0 x^0.
      Hope that was a satisfying answer!
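
      A small sketch to complement the reply above (the model choices are just for illustration): polynomial regression is still linear in its parameters, so it reduces to one least-squares solve on a design matrix, whereas a model like y = a*exp(b*x) is nonlinear in its parameters and needs an iterative fit.

          import numpy as np
          from scipy.optimize import curve_fit

          rng = np.random.default_rng(0)
          x = np.linspace(0, 2, 40)
          y = 3 * x**2 + x + 2 + 0.1 * rng.normal(size=x.size)

          # Polynomial regression: nonlinear in x, but LINEAR in the coefficients,
          # so ordinary least squares on the design matrix [x^2, x, 1] is enough.
          A = np.column_stack([x**2, x, np.ones_like(x)])
          coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

          # Genuinely nonlinear regression: y = a * exp(b * x) is nonlinear in (a, b),
          # so it is fit iteratively rather than with a single linear solve.
          params, _ = curve_fit(lambda t, a, b: a * np.exp(b * t), x, y, p0=(1.0, 0.5))

          print(coeffs, params)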

  • @vintonchen6210
    @vintonchen6210 3 years ago +3

    How did you solve for the optimal w* from J(w)? I'm new to matrix calculation, would be great if you can give an explanation. Thank you.

  • @RichardBrautigan2
    @RichardBrautigan2 3 years ago +3

    Great video, thank you. However, there is a mistake at 6:30: y_pred = w'*phi(x) (w was phi'*(K+lambda)^-1*y). Hence w' = y'*(K+lambda)^-1*phi and y_pred = y'*(K+lambda)^-1*phi*phi(x). But you wrote phi'*phi(x), and that is an inner product! It is not the kernel. phi is an n x m matrix, and m can be infinite (phi'*phi -> m x m covariance matrix, and phi*phi' -> n x n kernel matrix). We know the kernel matrix; it cannot have infinite x infinite dimensions. It should be an n x n matrix. There is also a problem with the notation you wrote at 3:32. If phi = [phi(x1)'; phi(x2)'; ...; phi(xn)'] is n x m, then phi*phi' can be the kernel matrix, as phi' = [phi(x1) phi(x2) ... phi(xn)] and phi*phi' = [phi(x1)'phi(x1) phi(x1)'phi(x2) ... phi(x1)'phi(xn); ...; phi(xn)'phi(x1) phi(xn)'phi(x2) ... phi(xn)'phi(xn)], which is n x n. But you wrote an m x n matrix at 3:32. That cannot be right: if m is not equal to n, it cannot be symmetric.

    • @RichardBrautigan2
      @RichardBrautigan2 3 years ago +1

      It is hard to explain this with plain text, sorry. In short, m is the number of dimensions and n is the number of samples. At 3:32 K is an m x n matrix, but then K cannot be symmetric if m is not equal to n. K must be symmetric, as you said. Hence phi is an n x m matrix, phi^T is an m x n matrix, and phi*phi^T is an n x n matrix.

  • @spiritmoon3457
    @spiritmoon3457 3 months ago

    2:53 why, after solving for w, do you still have w on both the left and right sides of the equation?

  • @lewiswesley66
    @lewiswesley66 2 years ago

    can someone explain where the 1/lambda comes from in the derivative?

  • @jamesfulton6981
    @jamesfulton6981 5 years ago +16

    I think there might be a mistake in your equation for alpha_n at ~3:08. The summation shouldn't be there

    • @CodeEmporium
      @CodeEmporium  5 years ago +4

      You're right. My bad. I'll pin your comment for now (until you/someone else points out some more mistakes). Thanks for the heads up!

    • @ditke71
      @ditke71 5 years ago +2

      @@CodeEmporium Under the summation everything should be indexed by i, not by n, and then the summation is OK.

  • @rishabhnandy38
    @rishabhnandy38 4 years ago

    can u please help me solve one problem on this topic

  • @birdman8375
    @birdman8375 2 years ago +1

    That's fine, you make predictions without phi, but in order to make predictions you need to compute w*. In your kernelized version, you still need phi transpose, in addition to K, in order to estimate w*. Can you explain that better? How do you get rid of phi transpose in the kernelized version of w*?

    • @JI77469
      @JI77469 8 months ago

      He does this in the last section ("prediction"). You really want the actual prediction function y, and he shows the formula for it just in terms of K and not with phi floating around anywhere.

  • @ilyaskhan.1994
    @ilyaskhan.1994 1 year ago

    What kind of math is this, vector calculus? Thanks

  • @goldfishjy95
    @goldfishjy95 2 years ago

    what does w* represent? thank you

  • @danawen555
    @danawen555 2 years ago

    thanks!

  • @rrrprogram8667
    @rrrprogram8667 4 years ago

    Subscribed

  • @sathyakumarn7619
    @sathyakumarn7619 2 years ago +1

    Good speed in the video. But I should say it is perhaps a bit too under-detailed. Maybe one would be able to get through it after going through all your videos. I would request more detail in the derivations in future videos.

  • @a741987
    @a741987 5 years ago +1

    Damn this is so beautiful

  • @st0a
    @st0a 8 months ago

    But why did you write \sum_{n=1}^{N} ||w||^2 when there's no n term in that part of the ridge regression equation? Very confusing, to say the least....

  • @ignasa007
    @ignasa007 1 year ago

    2:57 you mean \alpha_n = \frac{1}{\lambda} (y_n - w^T\phi(x_n)), without the \sum operator. Had me confused for a while.

  • @sourasekharbanerjee9018
    @sourasekharbanerjee9018 4 years ago +3

    At 6:33, how is it possible to shift "y" in front of the kernel matrix without transposing, compared to 4:45?

    • @pritamkhan4143
      @pritamkhan4143 4 years ago

      @CodeEmporium, it's a genuine question. Please do reply.

  • @sakshamsoni1869
    @sakshamsoni1869 3 years ago

    How is 𝜑^𝑇 𝜑 a variance matrix?

  • @njmanikandan9408
    @njmanikandan9408 2 years ago

    can someone tell me what a Gram matrix is?
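
    To the question above: the Gram (kernel) matrix is simply the matrix of all pairwise kernel values between the training points, i.e. the inner products of the mapped points,

        K_{ij} = k(x_i, x_j) = \phi(x_i)^T \phi(x_j), \qquad i, j = 1, \dots, m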

  • @johndagdelen815
    @johndagdelen815 3 years ago

    Did you hire someone else to do the voice over for your video?

  • @brettgattinger3338
    @brettgattinger3338 3 years ago +4

    maaaaaatrix

  • @unknown-otter
    @unknown-otter 4 years ago

    Finally, I understood!

  • @Trubripes
    @Trubripes 29 days ago

    Dense but informative.

  • @tekingunasar4189
    @tekingunasar4189 2 years ago

    What is the point of the summations in minute three if you don't even use the index variable? Why y_n and x_n and not y_i and x_i? Also, it would have been better if you had credited the Analytics Vidhya article you took this information from (same with your support vector machine video).

  • @ahmad3823
    @ahmad3823 6 months ago

    several typos for sure but good video!

    • @CodeEmporium
      @CodeEmporium  6 months ago

      Yea. I have tried to get better about this over the years. Thanks for watching!

  • @vtrandal
    @vtrandal 2 years ago

    Maahtrix? It may not all be one world, but it does overlap. Matrix!

    • @CodeEmporium
      @CodeEmporium  2 years ago

      Math Trix

    • @vtrandal
      @vtrandal 1 year ago

      @@CodeEmporium I am glad you have a sense of humor.

  • @devendraalawa4173
    @devendraalawa4173 3 years ago

    It will come

  • @Leon-pn6rb
    @Leon-pn6rb 4 years ago +1

    still didn't get it!
    ughhhhhhhh

  • @redberries8039
    @redberries8039 5 years ago

    ...but do I need to know that math to apply kernelisation? Do I really?

    • @keyangke
      @keyangke 5 years ago +2

      from sklearn.kernel_ridge import KernelRidge
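
      For completeness, a minimal usage sketch of that import (the hyperparameters are illustrative, not recommendations):

          import numpy as np
          from sklearn.kernel_ridge import KernelRidge

          rng = np.random.default_rng(0)
          X = rng.uniform(-3, 3, size=(100, 1))
          y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

          # alpha is the regularization strength (the lambda in the video), gamma the RBF width
          model = KernelRidge(alpha=0.1, kernel="rbf", gamma=1.0)
          model.fit(X, y)
          print(model.predict(X[:5]))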

  • @spherinder5793
    @spherinder5793 2 months ago

    mattrix

  • @devendraalawa4173
    @devendraalawa4173 3 years ago

    Cos theta won't come up, bro

  • @amitupadhyay6511
    @amitupadhyay6511 3 years ago +1

    hey, make it easy, we came here to understand it easily, not the hard way, damn it

  • @resitk7272
    @resitk7272 5 years ago

    This is amazing, but there are errors I'm getting implementing this in Python :( Could anyone help?

  • @techsavy5669
    @techsavy5669 3 years ago +1

    I am even more confused now.

  • @lglgunlock
    @lglgunlock 2 years ago

    Wrong equations confuse people, pls correct it

  • @devendraalawa4173
    @devendraalawa4173 3 years ago

    Going to riding fhor fhirsht vi teck no

  • @choubro2
    @choubro2 1 year ago

    bro is the mahtrix contrarian

  • @joaquingiorgi5809
    @joaquingiorgi5809 1 year ago

    What a fuck boy way to say maatrix 😂, great video though

  • @piyushjaininventor
    @piyushjaininventor 8 months ago

    perfect definition of How Not To Teach Machine Learning Concepts.

  • @kaustubhkeny1140
    @kaustubhkeny1140 3 years ago

    Bouncer.

  • @devendraalawa4173
    @devendraalawa4173 3 years ago

    It's wrong

  • @sameure6486
    @sameure6486 5 years ago +3

    You're pronouncing "matrix" incorrectly.

    • @MrCmon113
      @MrCmon113 5 years ago

      No, you are, because of the vowel shift in the English language. A German person, for example, would pronounce it as he does.

    • @PS-eu6qk
      @PS-eu6qk 4 years ago

      who gives a f**k. You pronounce Chicago as "shikago" and chimes as chimes. There is plenty of other stupidity in this language.

  • @Seff2
    @Seff2 3 years ago

    2 minutes in and understood absolutely nothing. waste of time

    • @sally1917
      @sally1917 3 years ago

      maybe you need some prior coursework

  • @watsufizzi
    @watsufizzi 3 years ago +1

    sooooo, nobody is going to mention the ridiculous pronunciation of the word "matrix"?

  • @harishr5620
    @harishr5620 5 years ago +3

    You are just news-reading the topic, not teaching.. -_-

  • @dariosilva85
    @dariosilva85 4 years ago +3

    Matrix is pronounced May-Trix.

    • @CodeEmporium
      @CodeEmporium  4 years ago +2

      I'm partly from India and the States, so my pronunciations and metrics are all over the place. I'll be more consistent.

    • @PS-eu6qk
      @PS-eu6qk 4 years ago +1

      What does it matter, you idiot. Why is Chicago pronounced "shikago" and chimes as chimes? English is a faulty language. No wonder NASA scientists claimed that English as a language is not suitable for artificial intelligence/NLP but Sanskrit is.

    • @PS-eu6qk
      @PS-eu6qk 4 years ago +2

      @@CodeEmporium This really annoys me about native English speakers picking on non-native English speakers. You don't need to justify how you say matrix. Can native English speakers, or any other native-language speakers, pronounce Indian languages like Sanskrit and/or Tamil properly? Of course not. Don't be apologetic.

    • @PS-eu6qk
      @PS-eu6qk 4 years ago

      how do you justify magic pronounced as "may-gic" and matrix as "matrix"

    • @eskedarayele4430
      @eskedarayele4430 1 year ago

      Oh my God thank you. I always find it hard to pronounce it.

  • @ejomaumambala5984
    @ejomaumambala5984 4 years ago

    Too many gross math mistakes (some of them pointed out in other comments). You need to read more/better material. Please don't post any more misleading and incorrect videos.

  • @victorzurkowski2388
    @victorzurkowski2388 1 year ago

    Why not pronounce it "/ˈmātriks/"?

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Cuz phonics and I never got along. I have wartime flashbacks from the 3rd grade