Why Activation Function and Which One Should You Use?

  • Published on 31 Oct 2024

Comments • 15

  • @solletisuresh4277 · 5 years ago · +2

    Your explanation is simply awesome.

  • @aghileslounis · 4 years ago · +3

    There is no explanation of how the activation function captures non-linearity at all. We know the output will be non-linear because of the function, but...

    • @NormalizedNerd · 4 years ago · +3

      I think it will be better if we try to imagine the boundaries of our classifier. With a linear function, we can never have a boundary that solves the XOR problem, but with a non-linear function we can. So the key point is: the activation function can capture the non-linearity (of the dataset) because it creates a boundary that can separate linearly inseparable data points.
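      A minimal NumPy sketch of that point (the weights below are hand-picked for illustration, not taken from the video or the comment): two ReLU units plus a linear output reproduce XOR exactly, a boundary that no purely linear model w·x + b can draw.

      ```python
      import numpy as np

      def relu(z):
          return np.maximum(0, z)

      X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # the four XOR inputs

      def xor_net(x):
          # hidden layer: two ReLU units with hand-picked weights
          h = relu(np.array([x[0] + x[1], x[0] + x[1] - 1]))
          # linear output layer
          return int(h[0] - 2 * h[1])

      print([xor_net(x) for x in X])  # [0, 1, 1, 0] -> matches XOR
      ```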

    • @aghileslounis · 4 years ago

      @@NormalizedNerd Thanks for your response, but I still can't understand it well enough. I mean, ReLU just keeps the number as-is or sets it to 0 if it's negative, so how can that capture non-linearity? It's very confusing for me. I tried a lot of things and ended up concluding that the non-linearity is created by using multiple nodes and layers, and that the activation function just helps the network converge faster, but I'm still very confused.

    • @NormalizedNerd · 4 years ago · +3

      @@aghileslounis Here are a couple of points to clear your confusion:
      1. ReLU is not linear (because linearity doesn't hold over its entire domain).
      2. You can solve any problem (approximate any function) with a NN that has just one hidden layer. As the complexity of the function increases you have to add more nodes, but you won't require more than one hidden layer. This means the non-linearity comes from the activation function; the extra nodes are only needed to approximate a more complex function.
      3. It is true that we require multiple ReLU units to approximate a complex non-linear boundary. But it is also true that you can't approximate these boundaries using multiple linear functions.
      Hence the non-linearity indeed comes from the activation functions.
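      A quick NumPy check of points 1 and 3 (my own illustration, not from the thread): additivity fails for ReLU, so it is not a linear function, while stacking linear layers without any activation collapses into a single linear map.

      ```python
      import numpy as np

      relu = lambda z: np.maximum(0, z)

      # Point 1: additivity fails for ReLU, so it is not linear
      print(relu(1 + (-1)), relu(1) + relu(-1))          # 0 vs 1

      # Point 3: two linear layers with no activation equal one linear layer
      rng = np.random.default_rng(0)
      W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
      x = rng.normal(size=2)
      print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))   # True
      ```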

    • @aghileslounis · 4 years ago

      @@NormalizedNerd Thank you, sir! I appreciate your help a lot.

    • @mathai1003 · 4 years ago · +1

      I think this is what you are looking for, @Ghiles Lou: mathai.co/2020/04/activation-function

  • @mikiallen7733 · 4 years ago · +1

    Great intro, but how can I use stochastic activation functions that allow for the fat-tailed probabilities sometimes seen in real-world datasets?

    • @NormalizedNerd · 4 years ago

      Thanks.
      Your question is a bit unclear to me. Neural networks are universal function approximators, hence they can approximate any function. The activation functions help to include the non-linearity. One activation alone can't predict well, but if we use a lot of them together (i.e. many neurons in one layer), then they work really well.
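      A tiny illustration of "one activation alone can't, but a few together can" (my own example, not from the video): a single ReLU only captures half of |x|, while the sum of two ReLU units reproduces it exactly.

      ```python
      import numpy as np

      relu = lambda z: np.maximum(0, z)
      x = np.linspace(-2, 2, 9)

      # relu(x) alone is 0 on the negative side; relu(x) + relu(-x) equals |x| everywhere
      print(np.allclose(relu(x) + relu(-x), np.abs(x)))  # True
      ```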

    • @mikiallen7733 · 4 years ago · +1

      @@NormalizedNerd See "Advanced Stochastic Optimization Algorithm for Deep Learning Artificial Neural Networks in Banking and Finance Industries".
      In there you will probably get a better idea of what I mean.

    • @NormalizedNerd · 4 years ago

      TBH, I wasn't aware of these extended versions of activation functions. I read some portions of the paper (www.researchgate.net/publication/337769491) and noticed some modified versions of the sigmoid activation. I'm not familiar with concepts like the volatility of macroeconomic indicators, functions satisfying Jameel's Criterion, etc., but if you can compute these things from your dataset, then it's just a matter of implementing a custom activation function. This is very easy to do using Keras: stackoverflow.com/questions/43915482/how-do-you-create-a-custom-activation-function-with-keras
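      For reference, a minimal Keras sketch of plugging in a custom activation (the scaled sigmoid and its alpha parameter below are hypothetical stand-ins, not the paper's actual functional form):

      ```python
      import tensorflow as tf

      # Hypothetical parameterised sigmoid; replace with whatever form your data calls for
      def scaled_sigmoid(x, alpha=2.0):
          return 1.0 / (1.0 + tf.exp(-alpha * x))

      model = tf.keras.Sequential([
          tf.keras.Input(shape=(4,)),
          tf.keras.layers.Dense(16),
          tf.keras.layers.Activation(scaled_sigmoid),    # custom activation plugged in here
          tf.keras.layers.Dense(1, activation="sigmoid"),
      ])
      model.compile(optimizer="adam", loss="binary_crossentropy")
      ```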

  • @nsnilesh604 · 2 years ago

    Is there any math behind the ReLU function to help understand it more clearly?

  • @santhoshnaredla8243 · 5 years ago · +1

    Good one