Deep Learning (CS7015): Lec 9.3 Better activation functions

  • Published on 31 Oct 2024

Comments • 12

  • @niks4u93
    @niks4u93 5 years ago +25

    Awesome explanation. The tutor is brilliant at covering the deeper, hidden answers by explaining the "why" of things. So far the best deep learning tutor.

  • @joshi98kishan
    @joshi98kishan 4 years ago +1

    Best explanation of activation functions. Thank you, sir.

  • @sandeepganage9717
    @sandeepganage9717 3 years ago

    Beautifully explained

  • @newbie8051
    @newbie8051 1 year ago +3

    3:10 Problems with sigmoid
    Saturation of sigmoids causes gradients to vanish.
    8:00 Initializing the weights to large values makes the weighted sums large, which saturates the neurons faster.
    8:40 Sigmoids are not zero-centered; the output lies between 0 and 1.
    Issues with non-zero-centered activations:
    they restrict the directions of the weight updates and therefore make convergence slower.
    14:00 Sigmoids are computationally expensive (an exponential has to be evaluated).
    15:00 Tanh activation function
    Improves over sigmoid by being zero-centered.
    Does not mitigate saturation of neurons.
    Computationally even more expensive.
    16:00 ReLU as a piecewise non-linear activation function
    Does not saturate in the positive region.
    Computation is easy.
    Causes dead neurons: once a neuron's pre-activation becomes negative for every input, its gradient is zero and all further updates stop. The weights connected to this neuron also stop receiving gradient contributions through it.
    A large number of neurons can die off if the learning rate is set too high.
    23:00 Leaky ReLU
    Allows a small gradient to flow for negative inputs,
    to avoid the dead-neuron issue (a small sketch of these activations follows below).
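    A minimal NumPy sketch of these activations and their gradients (my own illustration, not from the lecture; the function names are mine), showing where sigmoid/tanh saturate and where ReLU and Leaky ReLU keep a usable gradient:

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def d_sigmoid(x):                      # peaks at 0.25 at x = 0, ~0 once |x| is large (saturation)
            s = sigmoid(x)
            return s * (1.0 - s)

        def d_tanh(x):                         # zero-centered output, but still saturates
            return 1.0 - np.tanh(x) ** 2

        def d_relu(x):                         # exactly 0 for x < 0 -> neurons can die
            return (x > 0).astype(float)

        def d_leaky_relu(x, alpha=0.01):       # small but non-zero gradient for x < 0
            return np.where(x > 0, 1.0, alpha)

        x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
        print(d_sigmoid(x))     # [~0.00005, 0.197, 0.25, 0.197, ~0.00005] -> vanishes at the tails
        print(d_tanh(x))        # [~0.0, 0.42, 1.0, 0.42, ~0.0]            -> vanishes at the tails
        print(d_relu(x))        # [0, 0, 0, 1, 1]                          -> no gradient for x < 0
        print(d_leaky_relu(x))  # [0.01, 0.01, 0.01, 1, 1]                 -> small gradient survives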

  • @ashutoshnagayach6969
    @ashutoshnagayach6969 4 years ago +1

    Really nice

  • @neotodsoltani5902
    @neotodsoltani5902 1 year ago

    Hi, brilliant lecture and course.
    How can I get access to the slides? It's really cumbersome to write out the handout ourselves.
    Thanks

  • @pawanchoudhary619
    @pawanchoudhary619 11 months ago

    Sir, please also teach us Machine Learning.

  • @manikantabandla3923
    @manikantabandla3923 1 year ago

    Can we think of a dead neuron (ReLU activation) as a form of dropping out a neuron, as in Dropout regularization?
    The weights on both the incoming and outgoing edges of a dead neuron do not get updated in that iteration (and, in fact, never get updated again).
    This leads to my next question: is ReLU acting as some form of regularization?

    • @newbie8051
      @newbie8051 1 year ago +1

      Dropout is stochastic in nature, whereas a dead neuron stays dead and also cuts off the gradient flowing to the weights connected to it.
      Also, in dropout layers some neurons are muted just for that iteration, so that the neurons learn more independently. Those neurons are muted only temporarily (for a particular training iteration).
      So I do not think ReLU acts like a regularizer; your thoughts are welcome 😀
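      A tiny NumPy sketch contrasting the two (my own illustration, not from the lecture): the dropout mask is redrawn every iteration, whereas a dead ReLU's output and gradient stay at zero in every iteration once its pre-activation is negative for all inputs.

          import numpy as np

          rng = np.random.default_rng(0)
          h = np.array([0.7, 1.2, 0.3, 2.0])            # some hidden activations

          # Dropout: a fresh random mask in each training iteration (keep probability 0.5)
          for it in range(3):
              mask = rng.random(h.shape) < 0.5
              print("dropout iter", it, h * mask)        # different neurons muted each time

          # Dead ReLU: pre-activation is negative for every input, so the output
          # and the gradient are zero in every iteration -> no recovery
          pre_activation = np.array([-3.0, -1.5, -0.2, -4.0])
          for it in range(3):
              out = np.maximum(0.0, pre_activation)
              grad = (pre_activation > 0).astype(float)
              print("dead ReLU iter", it, out, grad)     # always zeros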

  • @keshavkumar7769
    @keshavkumar7769 4 years ago +1

    How do the ReLU units die if we set the learning rate too high?
    20:34

    • @mr_law886
      @mr_law886 4 years ago +4

      b' = b - eta * grad(b)
      If b was originally small, choosing a high value for eta blows up the second term.
      Hence b' can become strongly negative (the cause for a neuron to die).
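      A small numeric sketch of that argument (my own toy numbers, not from the lecture): with a large eta, a single update can push b so far negative that w*x + b < 0 for every input, after which the ReLU's gradient is zero and the neuron never recovers.

          import numpy as np

          x = np.array([0.5, 1.0, 1.5, 2.0])    # toy inputs, all positive
          w, b = 1.0, 0.1
          grad_b = 2.0                          # suppose the gradient w.r.t. b is positive

          for eta in (0.1, 10.0):
              b_new = b - eta * grad_b          # b' = b - eta * grad(b)
              pre = w * x + b_new               # pre-activations over the whole dataset
              relu_grad = (pre > 0).astype(float)
              print(eta, b_new, relu_grad)

          # eta = 0.1  -> b' = -0.1,  pre > 0 for these inputs: the neuron keeps learning
          # eta = 10.0 -> b' = -19.9, pre < 0 for every input: gradient is 0, the neuron is dead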

    • @manikantabandla3923
      @manikantabandla3923 1 year ago

      @@mr_law886 What if grad(b) < 0?
      Choosing a high value of eta would then make -eta * grad(b) large and positive, so b' would most likely end up positive.