Machine Learning Lecture 22 "More on Kernels" -Cornell CS4780 SP17

  • Published Jan 1, 2025

Comments • 47

  • @dhruvpatel7793 5 years ago +50

    I am pretty convinced that this guy can explain the theory of relativity to a toddler. Pure Respect!!

    • @rajeshs2840 5 years ago +9

      He seems to be a great prof.. I love him..

  • @rhngla 1 year ago +3

    The subscript of x_i implicitly switches from referring to the i-th feature dimension to the i-th data sample at the 7:40 mark for the discussion on kernels. Just a note to prevent potential confusion arising from this.

  • @godsgaming9120 4 years ago +11

    Studying at the best uni in India and still completely dependent on these lectures. Amazing explanations! Hope you visit IIT Delhi sometime!

  • @rajeshs2840 5 years ago +7

    Your lectures, for me, are like being a kid going to a candy shop.. I love them..

  • @KulvinderSingh-pm7cr 6 years ago +8

    Kernel trick on the last data set blew my mind... literally !!

  • @mrobjectoriented 5 years ago +7

    The climax at the end is worth the 50-minute lecture!

  • @j.adrianriosa.4163 3 years ago +3

    I can't even explain how good these explanations are!! Thank you sir!

  • @81gursimran 5 years ago +9

    Love your lectures! Especially the demos!

  • @venugopalmani2739 5 years ago +15

    How I start my day every day these days: Welcome everybody! Please put away your laptops.
    You should make a t-shirt or something with that line.

  • @blz1rorudon 5 years ago +4

    Mad respect to this guy. I might consider being his lifelong disciple or some shit. Noicce!!

  • @sandeepreddy6295 4 years ago +3

    One more very good lecture from the playlist.

  • @shashihnt 3 years ago +2

    Extra claps for the demos, they are so cool.

  • @Tyokok 11 months ago

    One question: around 17:33 you say "this is only true for linear classifiers", but from the induction proof it should also apply to linear regression, so why do we never see linear regression written as w = sum( alpha_i * x_i )? Thank you so much for the great, great lecture!
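
    The same expansion does show up in regression. A minimal numpy sketch (illustrative only, not from the lecture): kernel ridge regression solves for the coefficients alpha directly, and the weight vector w = sum_i alpha_i x_i never has to be formed explicitly. The toy data and the ridge parameter lam are made up.

    ```python
    import numpy as np

    # toy data: rows of X are the inputs x_i
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

    lam = 0.1                    # l2 (ridge) regularization strength
    K = X @ X.T                  # linear kernel matrix, K[i, j] = x_i . x_j

    # dual / kernelized solution: (K + lam*I) alpha = y, then w = sum_i alpha_i x_i
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    w_dual = X.T @ alpha

    # ordinary (primal) ridge regression weights for comparison
    w_primal = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    print(np.allclose(w_dual, w_primal))   # True: regression weights also lie in the span of the x_i
    ```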

  • @rishabhkumar-qs3jb 3 years ago

    Best machine learning course I have ever watched. Amazing...:)

  • @jiahao2709 6 years ago +4

    I really like your course, very interesting.

  • @mfavier 2 years ago

    About the inductive proof around 15:14, I think we should also specify that, because the linear space generated by the inputs is closed, the sequence of w_i converges to a w that is in that same linear space. Otherwise we are merely saying that w is in the closure of that linear space.
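
    For reference, here is the induction being discussed, written out compactly (a sketch using the gamma notation the professor uses further down this page; the step size s is generic):

    ```latex
    \textbf{Base case: } w_0 = \vec{0} = \sum_{i=1}^n 0 \cdot \mathbf{x}_i
      \quad (\text{or any } w_0 \text{ in the span}).
    \qquad
    \textbf{Step: if } w_t = \sum_{i=1}^n \alpha_i \mathbf{x}_i, \text{ then}
    \quad
    w_{t+1} = w_t - s \sum_{i=1}^n \gamma_i \mathbf{x}_i
            = \sum_{i=1}^n (\alpha_i - s\,\gamma_i)\,\mathbf{x}_i ,
    \qquad
    \gamma_i = \frac{\partial \ell(w_t^{\top}\mathbf{x}_i,\, y_i)}{\partial\,(w_t^{\top}\mathbf{x}_i)} .
    ```

    Since the span of finitely many vectors is a finite-dimensional, hence closed, subspace, the limit of the iterates stays in the span as well, which addresses the closure point above.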

  • @hassamsheikh 5 years ago +5

    I wonder how lucky the grad students are whose adviser is this professor.

  • @rahuldeora1120 5 years ago +3

    Great lecture! But I had a question: in the induction proof at 15:14, as you said we can initialise any way, what if I initialise w to a value that cannot be written as a linear combination of the x's? Then every iteration I will add a linear combination of the x's, but the total still won't be a linear combination of the x's. Will this not disprove the induction for some initialisations?

    • @kilianweinberger698 5 years ago +2

      Yes, good catch. If the x's do not span the full space (i.e. n < d), this can indeed happen. In practice these scenarios are avoided by adding a little bit of l2-regularization.

    • @rahuldeora1120 5 years ago

      @kilianweinberger698 Thank you for taking the time to reply. When you say "in practice these scenarios are avoided by adding a little bit of l2-regularization", how does l2-regularization make the weight vector a linear combination of the input data when the inputs do not span the space?

    • @rahuldeora5815 5 years ago

      @rahuldeora1120 Do reply

    • @jiviteshsharma1021 4 years ago

      @rahuldeora1120 It doesn't make the data span the space, but since the regularizer keeps w inside a ball, the regularized solution is the best estimate among the global minima present, which kind of makes it seem as though the data spans the regularized space. This is what I understood, hope I'm not wrong, professor.

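      The "how" asked in this thread has a standard short answer, not spelled out above, sketched here in the same notation:

      ```latex
      \text{Decompose } w = w_{\parallel} + w_{\perp}, \quad
        w_{\parallel} \in \operatorname{span}\{\mathbf{x}_1,\dots,\mathbf{x}_n\}, \quad
        w_{\perp}^{\top}\mathbf{x}_i = 0 \;\; \forall i .
      \text{Every prediction } w^{\top}\mathbf{x}_i = w_{\parallel}^{\top}\mathbf{x}_i
        \text{ ignores } w_{\perp}, \text{ while }
        \|w\|_2^2 = \|w_{\parallel}\|_2^2 + \|w_{\perp}\|_2^2 .
      \text{So any minimizer of } \sum_{i=1}^n \ell(w^{\top}\mathbf{x}_i, y_i) + \lambda\,\|w\|_2^2
        \text{ must set } w_{\perp} = \vec{0},
        \text{ i.e. the regularized solution lies in the span of the inputs.}
      ```
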
  • @bryanr2978 2 years ago

    Hi Prof. Kilian, around 34:53 in the Q&A you said that we could just set zero weights for the features that we don't care about. I was a bit confused how you could potentially do this, since you only have one alpha_i for the i-th observation. If you set alpha_i to zero, it would zero out all the features of x_i. Am I wrong?

  • @jayye7847 5 years ago +1

    Amazing! Good explanation! Very helpful! Great thanks from China.

  • @nabilakdim2767 3 years ago

    Intuition about kernels: a good kernel says two different points are "similar" in the attribute space when their labels are "similar" in the label space.
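
    A tiny illustration of that similarity reading, using an RBF kernel (the parameterization, sigma, and the example points are just assumptions for this sketch):

    ```python
    import numpy as np

    def rbf(x, z, sigma=1.0):
        """RBF kernel: a similarity score that decays with squared distance."""
        return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

    a = np.array([0.0, 0.0])
    b = np.array([0.1, 0.0])   # close to a  -> kernel value near 1 ("similar")
    c = np.array([3.0, 0.0])   # far from a  -> kernel value near 0 ("dissimilar")

    print(rbf(a, b))   # ~0.99
    print(rbf(a, c))   # ~0.0001
    ```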

  • @anunaysanganal 4 years ago +1

    Great lecture! I just don't understand why we are using linear regression for classification. Can we use the sigmoid instead? It also has the w^T x component, so we could kernelize that as well.

    • @kilianweinberger698 4 years ago +2

      Just because it is simple. It is not ideal, but also not terrible. But yes, typically you would use the logistic loss, which makes more sense for classification.

  • @sudhanshuvashisht8960 3 years ago +1

    Though the proof by induction that the w vector can be expressed as a linear combination of all input vectors makes sense, another point of view is confusing me. Say I have 2-d training data with only 2 training points and I map them to three-dimensional space using some feature map. Now, since I have only two data points (vectors) in three dimensions, expressing w as a linear combination of these vectors implies that w is limited to the plane spanned by these two vectors, which seems to defeat the purpose of mapping to a higher-dimensional space. The same example becomes more pressing when, say, we have 10k training points and we map each of them to a million-dimensional space.

    • @kilianweinberger698 3 years ago +1

      Yes, good point. That's why e.g. SVMs with an RBF kernel are often referred to as non-parametric. They become more powerful (the number of parameters and their expressive power increases) as you obtain more training data.

    • @sudhanshuvashisht8960 3 years ago

      @kilianweinberger698 Thanks Professor. This brings me to the next question: in my example (2 training points in 3-d space), does this mean there might be a better solution (in terms of lower loss) that is not in the plane spanned by those 2 training points?
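
      To make the "non-parametric" point above concrete: a kernelized predictor has the form f(x) = sum_i alpha_i k(x_i, x), so there is one coefficient per training point and the model grows with the training set. A rough numpy sketch (kernel regression with an RBF kernel; the data, sigma, and the small ridge term are all made up for illustration):

      ```python
      import numpy as np

      def rbf_kernel(A, B, sigma=1.0):
          # pairwise RBF kernel values between the rows of A and the rows of B
          sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
          return np.exp(-sq_dist / sigma ** 2)

      rng = np.random.default_rng(1)
      X = rng.uniform(-3, 3, size=(10, 2))            # 10 training points in 2-d
      y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=10)

      K = rbf_kernel(X, X)
      alpha = np.linalg.solve(K + 1e-3 * np.eye(len(y)), y)   # one alpha per training point

      def predict(x_new):
          # f(x) = sum_i alpha_i k(x_i, x)
          return rbf_kernel(np.atleast_2d(x_new), X) @ alpha

      print(alpha.shape)            # (10,) -- adding training points adds parameters
      print(predict([0.5, 0.0]))
      ```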

  • @chanwoochun7694 4 years ago +3

    I should have taken this course and intro to wine before graduating...

  • @padmapriyapugazhendhi7465 4 years ago +1

    How can you say the gradient is a linear combination of inputs??

    • @kilianweinberger698 4 years ago

      Not for every loss - but for many of them. E.g. for the squared loss, take a look at the gradient derivation in the notes. The gradient consists of a sum of terms \sum_i gamma_i*x_i .

    • @padmapriyapugazhendhi7465 4 years ago

      @kilianweinberger698 I am sorry if it's a silly doubt. I thought that a linear combination means the coefficients of x_i should be constants, independent of x_i. When gamma itself depends on x_i, isn't it then a non-linear combination?

    • @kilianweinberger698 4 years ago

      No, it is still linear. The gradient being a linear combination of the inputs just means that the gradient always lies in the space spanned by the input vectors. Whether or not the coefficients are a function of x_i doesn't matter in this particular context. Hope this helps. (Btw, you are not alone, a lot of students find that confusing ...)

    • @padmapriyapugazhendhi7465 4 years ago

      Thank you for your patient replies. Just one more intriguing question: is x_i a vector, so that I can write x_i as (x_i1, x_i2, ..., x_in)?
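
      A quick numpy check of the point made in this thread (illustrative only): for the squared loss the gradient is X^T gamma with gamma_i = 2(w^T x_i - y_i), i.e. a weighted sum of the input vectors x_i (and yes, each x_i is a vector), so it always lies in their span even though the weights gamma_i depend on the x_i.

      ```python
      import numpy as np

      rng = np.random.default_rng(2)
      X = rng.normal(size=(5, 3))    # 5 inputs x_i (rows), each a vector in R^3
      y = rng.normal(size=5)
      w = rng.normal(size=3)

      # squared loss: sum_i (w.x_i - y_i)^2
      gamma = 2 * (X @ w - y)        # one coefficient gamma_i per training point
      grad = X.T @ gamma             # = sum_i gamma_i * x_i, a linear combination of the inputs

      # the same gradient computed term by term
      grad_direct = sum(2 * (w @ x_i - y_i) * x_i for x_i, y_i in zip(X, y))
      print(np.allclose(grad, grad_direct))   # True
      ```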

  • @KulvinderSingh-pm7cr 6 years ago +3

    Infinite dimensions made me remember Dr. strange !!

  • @71sephiroth 4 years ago +1

    Whenever there is a breakthrough in ML there is always that exp(x) sneaking around somehow (boosting, t-SNE, RBF...).

    • @kilianweinberger698 4 years ago +3

      Good point ... maybe in the future we should start all papers with “Take exp(x) ...”. ;-)

    • @prattzencodes7221 4 years ago

      Started with Gauss and his bell shaped distribution, I guess? 😏😏

  • @dimitriognibene8945 4 years ago

    @Kilian Weinberger the 2 HDs in North Korea feel very alone

  • @raydex7259 4 years ago

    (Question I ask myself after 11 minutes): in this video a linear classifier is used as the example, yet the squared loss is used. Squared loss does not make much sense for classification, or am I wrong?

    • @kilianweinberger698 4 years ago +3

      Well, it is not totally crazy. In practice people still use the squared loss often for classification, just because it is so easy to implement and comes with a closed form solution.
      But you are right, if you want top performance a logistic loss makes more sense - simply because if the classifier is very confident about a sample it can give it a very large or very negative inner-product, whereas with a squared loss it is trying to hit the label exactly (e.g. +1 or -1, and a +5 would actually be penalized).
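
      The effect described above is easy to see numerically. A toy sketch (the scores are made up; y = +1 is the true label):

      ```python
      import numpy as np

      def squared_loss(score, y):
          return (score - y) ** 2

      def logistic_loss(score, y):
          return np.log(1 + np.exp(-y * score))

      for score in [0.5, 1.0, 5.0]:   # increasingly confident positive predictions
          print(score, squared_loss(score, +1), logistic_loss(score, +1))

      # squared loss:  0.25, 0.0, 16.0    -> the confident +5 is punished for overshooting the label
      # logistic loss: 0.47, 0.31, 0.0067 -> more confidence keeps lowering the loss
      ```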

  • @dolphinwhale6210 5 years ago

    Are these lectures for an undergraduate or a graduate program?

    • @kilianweinberger698 5 years ago +6

      Both, but the class typically has more undergrads than graduate students.

  • @KK-mt4km 5 years ago +2

    respect