Machine Learning Lecture 36 "Neural Networks / Deep Learning Continued" -Cornell CS4780 SP17

  • Published on 16 Jan 2025

Comments • 41

  • @ahmedmustahid4936 5 years ago +26

    These are the best lectures on ML I have come across so far

  • @jachawkvr 4 years ago +16

    The lecture was so much fun, I used to think that deep learning is somehow very different than the other ml algos, but I realize now that this is not the case. But, the shocking thing was how accurate the neural net's predictions were at the end, simply based on the photo.

  • @deltasun 4 years ago +6

    beautiful and intuitive explanation of why SGD is working for NNs even if it is so stupid as an optimization algorithm. Really illuminating. I've been struggling with this thing for months. thank you! this course is by far the most insightful course on ML I've ever seen

  • @ForcesOfOdin 11 months ago +1

    The intuition here was so so satisfying. The way it all comes together at the end, when he points out that the sigmoidal functions people used to use (because of emulating neuronal activation functions) have these flat parts which slow down the gradient. Not only is the slowed learning bad, but that slowed learning dampens the ability of the noisy SGD to escape the thin deep wells which represent ideal parameters only for a SPECIFIC data set. I.e. the thin deep wells = overfitting, the noise of SGD escapes them along with big alpha, and a slowed gradient from the sigmoidal flat parts causes an effective reduction in learning rate, which leads to getting trapped in the wells even with SGD, which causes overfitting. Just awesome.
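
A small sketch of the "flat parts" point above, using only the Python standard library. The function names are illustrative and not taken from the lecture. It compares the derivative of the sigmoid with the (sub)derivative of ReLU at a few pre-activation values, to show how the sigmoid's gradient shrinks toward zero for large inputs while ReLU's stays at 1 on the positive side.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid: s(z) * (1 - s(z)). Peaks at 0.25 for z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z: float) -> float:
    """Subgradient of ReLU; the value 0 at z <= 0 is a common convention."""
    return 1.0 if z > 0 else 0.0


# The gradient that reaches the weights is scaled by these factors, so a tiny
# sigmoid derivative acts like a much smaller effective learning rate.
for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z={z:5.1f}  sigmoid'={sigmoid_grad(z):.6f}  relu'={relu_grad(z):.1f}")
```

With a learning rate alpha, the update through a saturated sigmoid unit is roughly alpha times a number close to zero, which is the "effective reduction in learning rate" described in the comment.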

  • @michaelmellinger2324 2 years ago +2

    2:00 Begin
    2:25 Neural networks are just a simple extension of linear classifiers
    5:00 Chain rule
    15:30 Gradient descent
    16:00 No longer working with convex functions because of the transition function. Where we start matters. Not with all zero vector. Initialization is a big deal.
    20:25 SGD is really important
    28:45 We end up at some of the large holes (not necessarily deep) that we can’t escape from. Throwing away training data and test data gives us a different function. Wider less likely to change a lot. SGD can only find these!
    33:40 Two tricks: mini-batch and initial large learning rate then lower by a factor of 10.
    37:40 If you wanted to do bagging with neural networks… Ensemble several networks and don’t need to resample.
    38:50 Why they’re called neural networks.
    43:00 Discusses why ReLU is used even though it is non-differentiable at zero: it is good at not getting trapped in local minima
    44:50 Demo
    46:00 ReLU is better at complex problems but not smaller problems like the demo.
    47:00 playground.tensorflow.org demo
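
The two tricks noted at 33:40 (mini-batches, and a large initial learning rate that is later lowered by a factor of 10) can be sketched roughly as follows. This is a toy example under stated assumptions, not the lecture's code: the helper names `step_decay` and `minibatches`, the 1-D regression data, the batch size of 4, and the choice to drop the rate every 10 epochs are all made up for illustration.

```python
import random


def step_decay(initial_lr, epoch, drop_every, factor=10.0):
    """Divide the learning rate by `factor` once every `drop_every` epochs."""
    return initial_lr / (factor ** (epoch // drop_every))


def minibatches(n_examples, batch_size, rng):
    """Yield lists of example indices, reshuffled on every call (one epoch)."""
    indices = list(range(n_examples))
    rng.shuffle(indices)
    for start in range(0, n_examples, batch_size):
        yield indices[start:start + batch_size]


# Toy data (assumption): y = 3x without noise, model y_hat = w * x.
xs = [i / 10 for i in range(20)]
ys = [3.0 * x for x in xs]

rng = random.Random(0)
w = 0.0
for epoch in range(30):
    lr = step_decay(0.1, epoch, drop_every=10)
    for batch in minibatches(len(xs), batch_size=4, rng=rng):
        # Gradient of the mean squared error over this mini-batch only.
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
        w -= lr * grad

print(f"learned w = {w:.4f}")
```

The toy loss here is convex, so it only shows the mechanics of the schedule and the batching, not the escape from narrow minima discussed in the lecture.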

  • @gaconc1 4 years ago +6

    Great intellectual insight delivered passionately! Many thanks, Prof Kilian!

  • @lastfirst4073 a year ago +2

    I love the fact that he looked himself up on Rate My Professors. 50:53
    I rate you a 10/10.
    Thank you professor.

  • @kodjigarpp 3 years ago +3

    Thank you so much for your teaching, this is the best content I found in four years in this field. I am close to applying for a PhD in your laboratory haha!

  • @in100seconds5 4 years ago +6

    I am really grateful, really useful contents and very practical

  • @sandeepreddy6295 4 years ago +2

    Great lecture! You made learning fun. The gap in difficulty between understanding these concepts through Bishop's book and through these lectures is huge, even though the book may well be one of the best.

    • @sudhanshuvashisht8960 4 years ago

      Hey, I'm planning to start Bishop's book after finishing this. Is that the right thing to do, or are you saying that the book is just a more complexly presented version of Prof. Kilian's lectures?

    • @sandeepreddy6295 4 years ago

      @sudhanshuvashisht8960 It is presented in a more complex way; someone like the professor is the right person to say what to do or what not to do.

  • @RedPillDS 3 years ago +3

    Not the professor we want but the one we NEED !!!
    Just Awesome ...

  • @vatsan16 4 years ago +9

    Am I the only one who is sad that I have only one more lecture to go? :( (Of course, I will probably come back to some of the classes but there is nothing like discovering it for the first time)

  • @sayantanmitra7567 5 years ago +3

    Sir, could we get access to all the projects you assign to students? That would really help us.

    • @in100seconds5 4 years ago

      Sayantan Mitra, not on this one, but other videos have a link to the homework.

    • @tostupidforname 4 years ago

      @in100seconds5 Oh really? I somehow missed that. That's amazing!

    • @in100seconds5 4 years ago +1

      Yeah, there is a link to the homework, but not to the projects. Apparently Kilian does not share them publicly because the solutions might become available to future Cornell students.

    • @tostupidforname 4 years ago +2

      @in100seconds5 That's unfortunate but understandable.

  • @Shkencetari 5 years ago +4

    Can we access practical exams?

    • @kilianweinberger698 5 years ago +11

      Yes, actually you can download them here: www.dropbox.com/s/zfr5w5bxxvizmnq/Kilian%20past%20Exams.zip

    • @Shkencetari 5 years ago +1

      @kilianweinberger698 Thank you very much. I really appreciate it.

    • @jiviteshsharma1021 4 years ago

      @kilianweinberger698 Thank you so much.

    • @udiibgui2136 3 years ago +1

      @kilianweinberger698 Hi Kilian, thank you for these amazing lessons and resources! The files seem to have been deleted. Is there a new link?

  • @med0897 4 years ago +1

    First, I would like to thank you for these amazing lectures on ML! I just have a question about SGD. Can we say that SGD can escape local minima because the landscape of the loss on a single example (or a single batch) is different from the loss on the whole dataset?

    • @kilianweinberger698 4 years ago +4

      To some degree, but it is important that you are changing the mini-batch for every gradient update.
      Probably a better way to think about it is that it is not that easy to get stuck in local minima / saddle points. You need precise gradient information to hit one exactly (a little like hitting the moon with a rocket - only if you aim very carefully will you be successful). If you estimate your gradient with a mini-batch, your gradients will be way too noisy to hit the local minimum, and you will shoot past it, eventually converging near a global minimum from which it is very hard to escape. Hope this helps.

    • @med0897 4 years ago +1

      @kilianweinberger698 Thank you for the reply!
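
To make the "gradients will be way too noisy" point in this thread concrete, here is a rough standard-library sketch. Everything in it is an assumption for illustration: the noisy 1-D regression data, the batch size of 8, and the helper name `grad`. It computes the exact minimizer of the full-data squared loss, where the full gradient vanishes, and then evaluates mini-batch gradients at that same point.

```python
import random

rng = random.Random(0)

# Toy data (assumption): y = 2x plus Gaussian noise, model y_hat = w * x.
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]


def grad(w, indices):
    """Gradient of the mean squared error over the examples in `indices`."""
    return sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in indices) / len(indices)


all_idx = list(range(len(xs)))

# Closed-form least-squares minimizer of the full-data loss.
w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

full_grad = grad(w_star, all_idx)
batch_grads = [grad(w_star, rng.sample(all_idx, 8)) for _ in range(5)]

print(f"full-data gradient at w*: {full_grad:.2e}")
for g in batch_grads:
    print(f"mini-batch gradient at w*: {g:+.4f}")
```

The full gradient is zero up to rounding, yet each mini-batch still pushes the parameter away, so SGD with a fresh batch per update does not come to rest exactly at a stationary point. The toy loss is convex, so this shows only the noise, not the non-convex landscape itself.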

  • @pratoshraj3679 5 years ago +1

    Why is the cost function of neural networks non-convex, and is this always the case?

    • @kilianweinberger698 5 years ago +1

      Yes, unless you have no hidden layers, in which case you obtain something like logistic regression.

    • @kilianweinberger698 5 years ago +4

      I have to add ... Or in case you have no non-linear transition functions, in which case you would also get something like logistic regression :-)
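
A short check of the second case in this thread: without a non-linear transition function, stacked layers compute the same thing as one linear layer. The sketch below uses plain Python lists; the weights `W1`, `W2`, the input `x`, and the omission of bias terms are all assumptions chosen for illustration.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, t) for t in v]


W1 = [[1.0, -2.0], [0.5, 3.0], [2.0, 1.0]]  # hidden layer, 3 x 2
W2 = [[1.0, 0.0, -1.0]]                      # output layer, 1 x 3
x = [0.5, -1.5]

two_layer_linear = matvec(W2, matvec(W1, x))   # no transition function
single_layer = matvec(matmul(W2, W1), x)       # one collapsed weight matrix
two_layer_relu = matvec(W2, relu(matvec(W1, x)))

print(two_layer_linear, single_layer, two_layer_relu)
```

The first two results agree because W2 (W1 x) = (W2 W1) x, so the network can only represent a single linear map of the input. Inserting ReLU between the layers breaks that identity, which is where the extra expressive power (and the non-convexity) comes from.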

  • @husamalsayed8036 3 years ago

    Thanks for the lecture.
    As you said, if you have enough data, the loss function on the training set is close to the one on the test set. But SGD tends to go to the wider local minima anyway. In that case, isn't it better to use another classifier that puts you in a much narrower local minimum, since, as you said, the test function would be close to the function of the training data?

    • @kilianweinberger698 3 years ago +4

      The danger is that the loss surface changes as you switch to different (test) data. So a narrow minimum in the training set may actually not be very deep for the test data. Wider minima are often considered more stable.

  • @Bmmhable 5 years ago

    at 11:38, I think it should be da/dU = phi_prime(x).

    • @zelazo81 5 years ago

      it's correct on the blackboard because you differentiate with respect to U and then \phi(x) is a 'constant'.

  • @gregmakov2680 2 years ago

    the layers in NN could be considered as filters. One layer = one filter. NN has behaviour as cascaded filters.

  • @prateekpatel6082 3 years ago

    Could you please share pointers on why SGD finds good minima?

  • @shrishtrivedi2652 3 years ago

    2:00

  • @architsharma292 several months ago

    goated

  • @gregmakov2680 2 years ago

    hahah, I love to be lazy :D:D all the concepts got turned upside down :D:D:D