Neural Networks Pt. 3: ReLU In Action!!!

  • Published 9 Jul 2024
  • The ReLU activation function is one of the most popular activation functions for Deep Learning and Convolutional Neural Networks. However, the function itself is deceptively simple. This StatQuest walks you through an example, step-by-step, that uses the ReLU activation function so you can see exactly what it does and how it works.
    NOTE: This StatQuest assumes that you are already familiar with the main ideas behind Neural Networks. If not, check out the 'Quest: • The Essential Main Ide...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying my book, The StatQuest Illustrated Guide to Machine Learning:
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    TH-cam Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    0:00 Awesome song and introduction
    1:45 ReLU in the Hidden Layer
    5:35 ReLU right before the Output
    7:38 The derivative of ReLU
    #StatQuest #NeuralNetworks #ReLU
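
For reference while reading the comments below, the ReLU the video walks through is just f(x) = max(0, x), and its derivative is 0 for negative inputs and 1 for positive inputs. A minimal NumPy sketch (the function names are mine, not from the video):

```python
import numpy as np

def relu(x):
    # ReLU keeps positive inputs as-is and clips negative inputs to 0
    return np.maximum(0.0, x)

def relu_derivative(x):
    # 1 where the input is positive, 0 where it is negative; at exactly
    # x = 0 the derivative is undefined, so we simply pick 0 here
    return (x > 0).astype(float)

dosages = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(dosages))             # [0.  0.  0.  0.5 2. ]
print(relu_derivative(dosages))  # [0. 0. 0. 1. 1.]
```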

Comments • 300

  • @statquest
    @statquest  2 ปีที่แล้ว +13

    The full Neural Networks playlist, from the basics to deep learning, is here: th-cam.com/video/CqOfi41LfDw/w-d-xo.html
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @DThorn619
    @DThorn619 3 ปีที่แล้ว +104

    Just to help with promotion: the study guides he posts on his site only cost $3.00, and they are immensely helpful to refer back to. It's like having his entire video condensed into a handy step-by-step guide as a PDF. Yes, you could just watch the video over and over, but this way you help Josh continue making great content for us, at the cost of a cup of coffee.

    • @statquest
      @statquest  3 ปีที่แล้ว +28

      TRIPLE BAM!!! Thanks for the promotion! I'm glad you like the study guides. It takes a lot of work to condense everything down to just a few pages.

  • @MrAlb3rtazzo
    @MrAlb3rtazzo 3 ปีที่แล้ว +72

    Every time I go to the bathroom and use "SoftPlus" I think about neural nets again, and this accelerates my learning process :)

  • @naf7540
    @naf7540 3 ปีที่แล้ว +93

    This is just so crystal clear and must have taken you some time to really deconstruct in order to explain it, really fantastic, thank you Josh!

    • @statquest
      @statquest  3 ปีที่แล้ว +36

      Thanks! It took a few years to figure out how to create this whole series.

    • @wassuphomies263
      @wassuphomies263 2 ปีที่แล้ว +2

      @@statquest Thank you for the videos! This helps a lot :)

    • @AdityaSingh-qk4qe
      @AdityaSingh-qk4qe 2 ปีที่แล้ว +4

      @@statquest That's a big BAM!!! StatQuest is by far one of the best resources for statistics and ML - thanks a lot, you helped me understand so many concepts I never got, such as PCA, and even how activation functions like ReLU actually bring non-linearity by slicing, flipping, etc.!

  • @iiilllii140
    @iiilllii140 ปีที่แล้ว +7

    This is such a nice and clear visualization of how activation functions work inside a neural network, and a perfect way to remember the inner workings. This is a masterpiece!
    Before this, all I knew was: apply an input, activation functions, etc., etc., and you will receive an output with a magic value. But now I have a much deeper understanding of WHY we apply these activation functions / different activation functions.

    • @statquest
      @statquest  ปีที่แล้ว

      Thank you very much! :)

  • @discotecc
    @discotecc 4 หลายเดือนก่อน +2

    The theoretical simplicity of deep learning is a beautiful thing

    • @statquest
      @statquest  4 หลายเดือนก่อน

      :)

  • @bibiworm
    @bibiworm 3 ปีที่แล้ว +4

    I like how you explain, with vivid illustrations, that the affine transformation rotates, scales, and flips the activation function. Now I can relate to LeCun's deep learning class, which talks about this in abstract matrix form. Thanks.

    • @statquest
      @statquest  3 ปีที่แล้ว

      Glad it was helpful!
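
Picking up on the comment above about the affine transformation: a small sketch, with made-up weights and biases rather than the trained values from the video, showing how a weight scales or flips a ReLU output while a bias slides it.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-1, 2, 7)      # a few example inputs
base = relu(x)                 # the plain ReLU shape

scaled  = 2.5 * base           # a weight > 1 stretches the shape vertically
flipped = -1.0 * base          # a negative weight flips it upside down
shifted = base + 0.75          # a bias slides the whole shape up (or down)

for name, y in [("relu", base), ("scaled", scaled),
                ("flipped", flipped), ("shifted", shifted)]:
    print(f"{name:8s}", np.round(y, 2))
```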

  • @raul825able
    @raul825able ปีที่แล้ว +2

    Thanks Josh!!! It's such fun to learn machine learning from your videos.

    • @statquest
      @statquest  ปีที่แล้ว +2

      Thank you! :)

  • @alimehrabifard1830
    @alimehrabifard1830 3 ปีที่แล้ว +23

    Awesome guy, Awesome channel, Awesome video, TRIPLE BAM!!!

  • @ifargantech
    @ifargantech ปีที่แล้ว +1

    I always look forward to your intro music... I like it. Your content is also satisfying. Thank you!

    • @statquest
      @statquest  ปีที่แล้ว

      Glad you enjoy it!

  • @user-bf6xb8xx8r
    @user-bf6xb8xx8r 3 ปีที่แล้ว +5

    How can you make difficult Machine Learning content so easy? Incredible!

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      Wow, thanks!

  • @heplaysguitar1090
    @heplaysguitar1090 3 ปีที่แล้ว +7

    I come here every time I learn some new concept to understand it clearly. Thanks a ton!!
    Would really love to jam with you someday for the intros, and maybe we can call it a BAMMING session.

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      That would be awesome!

  • @hozaifas4811
    @hozaifas4811 10 หลายเดือนก่อน +1

    Your explanation deserves a huge bam! that's great man

    • @statquest
      @statquest  10 หลายเดือนก่อน +1

      Thanks!

  • @ashutosh-porwal
    @ashutosh-porwal 3 ปีที่แล้ว +7

    The way you explain is on another level sir..Thanks🙏

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      Thank you very much! :)

  • @maliknauman3566
    @maliknauman3566 2 ปีที่แล้ว +3

    Google should give you an award for spreading knowledge to us all...

    • @statquest
      @statquest  2 ปีที่แล้ว

      Bam! Thank you! :)

  • @aswink112
    @aswink112 3 ปีที่แล้ว +9

    My mind is blown. Triple BAM. Josh Starmer, a great thanks to you for making such amazing videos and educating others free of cost.

    • @statquest
      @statquest  3 ปีที่แล้ว +2

      Wow, thank you!

  • @mainakray6452
    @mainakray6452 3 ปีที่แล้ว +2

    your explanation makes things so simple ...

  • @ramyagoka9693
    @ramyagoka9693 3 ปีที่แล้ว +1

    Thank you so much, sir, for such a clear explanation.

    • @statquest
      @statquest  3 ปีที่แล้ว

      Happy to help! :)

  • @RomaineGangaram
    @RomaineGangaram 2 หลายเดือนก่อน +1

    Bro, you are a genius. Much love from South Africa. Soon I will be able to buy your stuff. You deserve it.

    • @statquest
      @statquest  2 หลายเดือนก่อน

      Thanks!

  • @drzl
    @drzl 2 ปีที่แล้ว +1

    Thank you, this helped me with an assignment

    • @statquest
      @statquest  2 ปีที่แล้ว

      Glad it helped!

  • @harishbattula2672
    @harishbattula2672 2 ปีที่แล้ว +1

    Thank you for the explanation.

    • @statquest
      @statquest  2 ปีที่แล้ว

      You are welcome!

  • @maryamsajid8400
    @maryamsajid8400 ปีที่แล้ว +1

    Amazing job... understood clearly... now I don't have to search any more for ReLU :D

  • @matthewlee2405
    @matthewlee2405 3 ปีที่แล้ว +2

    Thank you very much Starmer, very clear and great video! Thank you!

    • @statquest
      @statquest  3 ปีที่แล้ว

      Glad it was helpful!

  • @QuranKarreem
    @QuranKarreem 11 หลายเดือนก่อน +1

    Very good explanation, especially when you talked about the ReLU function not being differentiable at the bend.
    Keep up the great work, brother.

    • @statquest
      @statquest  11 หลายเดือนก่อน

      Thanks, will do!

  • @speedtent
    @speedtent ปีที่แล้ว +1

    You saved my life. Thank you from Korea.

  • @wojpaw5362
    @wojpaw5362 3 ปีที่แล้ว +1

    OMG - CLEAREST EXPLANATION OF RELU ON THE PLANET!!! PLEASE TEACH ME EVERYTHING YOU KNOW

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      bam! Here's a list of all of my videos: statquest.org/video-index/

    • @wojpaw5362
      @wojpaw5362 2 ปีที่แล้ว +1

      @@statquest Thank you Mister :)

  • @Mak007-h5s
    @Mak007-h5s 2 ปีที่แล้ว +1

    So good!

  • @tagoreji2143
    @tagoreji2143 ปีที่แล้ว +1

    thank you Professor

  • @williambertolasi1055
    @williambertolasi1055 3 ปีที่แล้ว +4

    Good explanation. It is interesting to see how the ReLU is used to gradually refine the function that defines the probabilistic output.
    Looking at how the ReLU is used reminds me of the use of diodes (with an approximate characteristic curve) in electronic circuits.

    • @statquest
      @statquest  3 ปีที่แล้ว +3

      Same here - the ReLU reminds me of a circuit.

  • @hemantrawat1576
    @hemantrawat1576 ปีที่แล้ว +1

    I really like the intro of the video statquest....

  • @nandakumar8936
    @nandakumar8936 2 หลายเดือนก่อน +1

    'at least it's ok with me' - all we need for peace of mind

    • @statquest
      @statquest  2 หลายเดือนก่อน

      :)

  • @anishtadev2678
    @anishtadev2678 3 ปีที่แล้ว +2

    Thank you Sir

    • @statquest
      @statquest  3 ปีที่แล้ว

      Most welcome!

  • @RubenMartinezCuella
    @RubenMartinezCuella 3 ปีที่แล้ว +3

    Hey Josh, here is a topic you may be interested in making a video about. It is very relevant, and I feel like not many videos on the web explain it:
    - Which hyperparameters affect a NN, and what is the intuition behind each of them? Most packages (e.g. caret) run a grid of models with all combinations of the parameters you have specified, but it gets computationally expensive pretty easily. It would be great to learn some of the intuition behind them in order to feed that grid something better than random guesses.
    Let me know what you think about this topic, and thanks again for your great job.

    • @statquest
      @statquest  3 ปีที่แล้ว +2

      I'll keep that in mind. In the next few months I want to do a webinar on how to do neural networks in python (with sklearn) and maybe that will help you.

    • @RubenMartinezCuella
      @RubenMartinezCuella 3 ปีที่แล้ว +1

      @@statquest thank you
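
The reply above mentions a future webinar on doing neural networks in Python with sklearn. As a rough, hypothetical sketch of the kind of hyperparameter grid being discussed (the parameter values below are arbitrary placeholders, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# toy data standing in for a real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# a few common MLP hyperparameters and some arbitrary values to try
param_grid = {
    "hidden_layer_sizes": [(4,), (8,), (8, 8)],  # network width/depth
    "activation": ["relu", "logistic"],          # ReLU vs a sigmoid-style unit
    "alpha": [1e-4, 1e-2],                       # L2 regularization strength
    "learning_rate_init": [1e-3, 1e-2],          # optimizer step size
}

search = GridSearchCV(MLPClassifier(max_iter=2000, random_state=0),
                      param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Even then, intuition about each hyperparameter helps keep the grid small, which is exactly the point of the question above.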

  • @BowlingBowlingParkin
    @BowlingBowlingParkin 2 ปีที่แล้ว +1

    AMAZING!

  • @adamoja4295
    @adamoja4295 2 ปีที่แล้ว +1

    That was very satisfying

  • @jamasica5839
    @jamasica5839 3 ปีที่แล้ว

    With ReLU life is easier, you don't have to compute the complicated CHAIN RULE :D
    Great series!!! I finally get it because of you Josh!

  • @chaoukimachreki6422
    @chaoukimachreki6422 2 ปีที่แล้ว +1

    Just awesome...

    • @statquest
      @statquest  2 ปีที่แล้ว

      Thank you! :)

  • @jennycotan7080
    @jennycotan7080 ปีที่แล้ว +1

    You said that ReLU sounds like a robot. My personification of this function is actually a robot who is simple-minded when solving problems! Coincidence!

    • @statquest
      @statquest  ปีที่แล้ว

      Bam!

    • @jennycotan7080
      @jennycotan7080 ปีที่แล้ว +1

      @@statquest Bigger bam! A bam caused by fire magic.

  • @sohambasu660
    @sohambasu660 2 ปีที่แล้ว

    I really like the great content you make that helps us understand such difficult topics.
    Also, if you could kindly include the formula generally used for the concept and break it down in the video, it would be immensely useful.
    Thanks anyway.

    • @statquest
      @statquest  2 ปีที่แล้ว

      I'm not sure I understand your question. This video discusses the formula for ReLU and breaks it down.

  • @ML-jx5zo
    @ML-jx5zo 3 ปีที่แล้ว +2

    Once again, my appreciation for you.

    • @statquest
      @statquest  3 ปีที่แล้ว

      Thank you! :)

  • @Xayuap
    @Xayuap ปีที่แล้ว +2

    a tiny bam is just a declaration of humility

  • @6866yash
    @6866yash 2 ปีที่แล้ว +1

    You are a godsend :')

    • @statquest
      @statquest  2 ปีที่แล้ว

      Thank you! :)

  • @_1jay
    @_1jay ปีที่แล้ว +3

    Another banger

  • @FloraSora
    @FloraSora 3 ปีที่แล้ว +2

    I love the toilet paper image for softplus... didn't catch it on the first watch but it became more and more suspicious as I went through this a few times... LOL.

  • @edmalynpacanor7601
    @edmalynpacanor7601 2 ปีที่แล้ว +1

    Not skipping ads for my guy Josh

    • @statquest
      @statquest  2 ปีที่แล้ว

      BAM! :) thanks for your support!

  • @firattamur1682
    @firattamur1682 3 ปีที่แล้ว +4

    Hi, I was really excited when I saw you start on neural networks after your great machine learning videos. Can you create a playlist for neural networks as you did for machine learning? It is easier to follow with playlists. Thanks

    • @statquest
      @statquest  3 ปีที่แล้ว +2

      Yes, soon! As soon as I finish all the videos in this series. There are at least 2, maybe 3 or 4 more to go.

  • @yacinerouizi844
    @yacinerouizi844 3 ปีที่แล้ว +1

    thank you

  • @omerutkuerzengin3061
    @omerutkuerzengin3061 5 หลายเดือนก่อน +1

    You are great!

    • @statquest
      @statquest  5 หลายเดือนก่อน

      bam!

  • @Vanadium404
    @Vanadium404 11 หลายเดือนก่อน +2

    That SoftPlus toilet paper and the Chain Rule sound effect in every video lol

    • @statquest
      @statquest  11 หลายเดือนก่อน

      :)

  • @nonalcoho
    @nonalcoho 3 ปีที่แล้ว +2

    Learning math is becoming sooooo easy and fun with your effort!
    Thank you so MUCH! BAM~~~~~~
    If possible, can you make a video about the "vanishing gradient" problem in the future~?

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      I'll keep that in mind.

  • @darshdesai2754
    @darshdesai2754 3 ปีที่แล้ว +1

    Hey Josh! Amazing content - as always. I have always found your videos to be very useful in understanding the fundamental ideas, rather than just accepting the 'theoretical definitions'. I just wanted to throw out a suggestion that it would be great if you could collaborate with other open source/free-for-all learning mediums like Khan Academy. This would not only increase the viewer base for all open source platforms, but it would also fill in the gaps where the content on your channel or their channel has not been created yet.

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      I'll keep that in mind. However, I have no idea how to collaborate with Khan Academy. If you have suggestions, let me know.

  • @adriangabriel3219
    @adriangabriel3219 2 ปีที่แล้ว +1

    Why does it make sense to use a ReLu at the end? Is it to reduce the complexity of the green squiggle from a curvy to a pointy squiggle?

    • @statquest
      @statquest  2 ปีที่แล้ว +1

      It restricts the final output to be between 0 and 1.

  • @anirudhsingh9025
    @anirudhsingh9025 ปีที่แล้ว

    I have been watching your quest for NN for the past few days, and the way you explain is good, but I didn't get one thing that you said about adding 2 lines on a graph. So how do we add 2 lines on a graph and find a third curve?

    • @statquest
      @statquest  ปีที่แล้ว

      What time point, minutes and seconds, are you asking about?

  • @HarryKeightley
    @HarryKeightley 4 หลายเดือนก่อน

    Thank you for the very clear explanations - this video series is wonderful in its capacity to communicate complex topics in a very clear and understandable manner. I have a question: does the use of the ReLu function for this network result in a less accurate model overall due to its piecewise linearity? It seems the more elegant curve has been replaced by a more simplistic linear-looking triangle. Would this mean a larger network would be needed in order for the model to be more accurate - so that the non-linear relationship between dosage and effectiveness can be modelled more accurately through a more complete/complex interaction of nodes?

    • @statquest
      @statquest  4 หลายเดือนก่อน +1

      That's a good question and I'm not 100% sure what the answer is other than, "it probably depends on the data". That said, ReLU, because of its simplicity, allows for much deeper neural networks (more hidden layers and more nodes per layer) than the sigmoid shape, and, as a result, allows for more complicated shapes to be fit to the data. ReLU's introduction to neural networks a little over 20 years ago made a huge impact on AI because "deep learning" wasn't possible with the sigmoid shape. In contrast, with its super simple derivative, ReLU allowed neural networks to model much more complicated datasets than ever before.

    • @HarryKeightley
      @HarryKeightley 4 หลายเดือนก่อน +1

      @@statquest Thanks for taking the time to answer my question, much appreciated :). I'm very much looking forward to watching the rest of your videos related to NNs and related concepts.

  • @lisun7158
    @lisun7158 2 ปีที่แล้ว +3

    [Notes excerpt from this video]
    7:10 Why ReLU works -- like other activation functions, the weights and biases on the connections slice, flip, and stretch the function's shape into a new shape.
    7:40 How to handle the derivative of the ReLU function not being defined at the bent point (0,0) -- manually define the derivative to be 0 or 1.
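
A tiny sketch of the second note, assuming we simply declare the derivative at x = 0 to be 0, which the video says is allowed (defining it as 1 works just as well):

```python
def relu_derivative(x, value_at_zero=0.0):
    # ReLU's slope is 0 to the left of the bend and 1 to the right; at
    # exactly x = 0 there is no true derivative, so we just define one
    if x < 0:
        return 0.0
    if x > 0:
        return 1.0
    return value_at_zero

print(relu_derivative(-3.2))  # 0.0
print(relu_derivative(0.0))   # 0.0 by our convention (could also be 1.0)
print(relu_derivative(1.7))   # 1.0
```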

  • @adityams1659
    @adityams1659 2 ปีที่แล้ว +1

    *This video / most of his videos have fewer than 100K views!!??*
    *People are missing out on a gold mine!*

    • @statquest
      @statquest  2 ปีที่แล้ว

      Thank you! :)

  • @Intaberna986
    @Intaberna986 ปีที่แล้ว +1

    God bless you, I mean it.

  • @mischievousmaster
    @mischievousmaster 3 ปีที่แล้ว +1

    Josh, could you please do a video on NLP and its implementation in Python? Would really love that.
    And about the video, it is awesome as always!

    • @statquest
      @statquest  3 ปีที่แล้ว +3

      I'll keep that in mind.

  • @seanlynch4354
    @seanlynch4354 ปีที่แล้ว

    I have a question: will the y output of the ReLU function always be the same as the x input if the x input is greater than 0?

  • @Odiskis1
    @Odiskis1 3 ปีที่แล้ว +1

    How do we know values won't become really high above 0? I thought the activation function contained the values in both the negative and positive directions so that they wouldn't explode. Is that not a problem?

    • @statquest
      @statquest  3 ปีที่แล้ว

      A much bigger problem is something called a "vanishing gradient", which is where the gradient gets very small and moves very slowly - too slowly to learn. ReLU helps eliminate that problem by having a gradient that is either 0 or 1.
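
To make the reply above concrete, here is a toy illustration (my own, not from the video) of why gradients can vanish with sigmoid-shaped activations but not along active ReLU paths: the chain rule multiplies one derivative per layer, the sigmoid's derivative is never larger than 0.25, and ReLU's is either 0 or 1.

```python
import numpy as np

def sigmoid_derivative(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)          # peaks at 0.25 when x = 0

def relu_derivative(x):
    return 1.0 if x > 0 else 0.0  # passes the gradient through, or blocks it

pre_activations = [0.3, -0.8, 1.2, 0.1, -0.4, 0.9]   # made-up values, one per layer

sigmoid_chain = np.prod([sigmoid_derivative(x) for x in pre_activations])
relu_chain = np.prod([relu_derivative(abs(x)) for x in pre_activations])  # pretend all units are active

print(f"product of 6 sigmoid derivatives: {sigmoid_chain:.6f}")  # already tiny after 6 layers
print(f"product of 6 ReLU derivatives:    {relu_chain:.1f}")     # stays at 1.0
```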

  • @vishaltyagi5000
    @vishaltyagi5000 3 ปีที่แล้ว +1

    Hi, really love your work. Any plans on doing Recurrent Neural Networks, including the modern RNN units (LSTM, GRU)?

    • @statquest
      @statquest  3 ปีที่แล้ว +3

      I'll keep that in mind. I'm working on convolutional neural networks right now.

    • @vishaltyagi5000
      @vishaltyagi5000 3 ปีที่แล้ว +4

      @@statquest Thanks. Appreciate your efforts 🙂

  • @anshulbisht4130
    @anshulbisht4130 ปีที่แล้ว

    Hey Josh, how did you add the blue and orange lines @5:24? I mean, how did that blue line in the -y axis come into the +y axis (we need to multiply it by something which is not shown in the video)? Hopefully you reply soon :)

    • @statquest
      @statquest  ปีที่แล้ว

      We added the y-axis coordinates of the two lines. However, those lines are not exactly to scale, so that might be confusing you.

  • @user-rt6wc9vt1p
    @user-rt6wc9vt1p 3 ปีที่แล้ว +2

    How would we deal with the derivative of this function? I've read that the derivative is 0 for x < 0 and 1 for x > 0, but I'm having issues in that when weights are initialized below zero (something like -0.5), the derivative of the activation function is 0. The chain rule would then make the entire gradient for that weight 0, and the weight would just never change.

    • @statquest
      @statquest  3 ปีที่แล้ว +2

      Yes, that's true, so you may have to try a few different sets of random numbers. However, in practice, we usually have much larger networks with lots of inputs and lots of connections to every node and a lot of data for training. In this case, because we sum all of the connections, having a few negative weights may, for some subset of the training data, result in 0 for the derivative, but not for all data.

    • @user-rt6wc9vt1p
      @user-rt6wc9vt1p 3 ปีที่แล้ว +1

      @@statquest Ah, I see. My network had 1 neuron per layer, makes sense now
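
A tiny numerical sketch of the situation in this thread, using made-up numbers: with one neuron per layer, a negative pre-activation pushes a 0 into the chain rule, so the gradient for the upstream weight is 0 and that weight never gets updated.

```python
def relu(x):
    return max(0.0, x)

def relu_derivative(x):
    return 1.0 if x > 0 else 0.0

# one input -> one hidden ReLU unit -> one output; all numbers are made up
x, target = 2.0, 1.0
w1, b1 = -0.5, 0.0          # unlucky initialization: the pre-activation will be negative
w2, b2 = 0.8, 0.1

z = w1 * x + b1             # -1.0, so the ReLU is "off"
h = relu(z)                 # 0.0
prediction = w2 * h + b2    # 0.1

# the chain rule for dLoss/dw1 passes through relu_derivative(z), which is 0 here
dloss_dprediction = 2 * (prediction - target)
dloss_dw1 = dloss_dprediction * w2 * relu_derivative(z) * x
print(dloss_dw1)            # 0.0 -> w1 gets no update from this sample
```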

  • @Red_Toucan
    @Red_Toucan 10 หลายเดือนก่อน +3

    I was struggling (well, I still am) with understanding activation functions and these videos are helping me a lot. And man, you answer every single comment, even in old videos. Thanks a lot and many bams from Argentina :D .
    There's one thing that I didn't quite get, and maybe this is reviewed in the next episodes.
    I understand activation functions are used to introduce "non-linearity" to predictions, but to me they still seem very arbitrary. I mean, why would I, for example, with ReLU in mind, keep positive values and change negative values to 0? Am I not losing a lot of information there?
    I know when it comes to deep learning sometimes the answer is along the lines of "because some dude tried it 10 years ago and it worked. Here's a paper discussing it" but I'd still like to ask.

    • @statquest
      @statquest  10 หลายเดือนก่อน +1

      It might help to see the ReLU in action. Here's an example that shows how it can help us fit any shape to the data: th-cam.com/video/83LYR-1IcjA/w-d-xo.html

    • @SonkMc
      @SonkMc 7 หลายเดือนก่อน

      You aren't losing information, because what travels through after you start manipulating the algorithm is no longer the original information but a "shadow" of it, and what you are trying to do is adjust the parameters so that everything ends up fitted to that shadow.
      The goal of activation functions is to add non-linearity, but remember that the magic is also in the neurons and the hidden layers of the network. Each hidden layer adds an intersection point, and that further fits the information being observed.

  • @hongyichen8369
    @hongyichen8369 3 ปีที่แล้ว +2

    Hi, there are a lot of activation functions, like ReLU and tanh, etc. Can you make a video about the usage of the different activation functions?

    • @statquest
      @statquest  3 ปีที่แล้ว +2

      I'll keep that in mind.

  • @nikachachua5712
    @nikachachua5712 2 ปีที่แล้ว

    How would that green line fit the data if we don't apply ReLU on the last node?

    • @statquest
      @statquest  2 ปีที่แล้ว

      It's hard to say since we'd have to retrain and calculate new weights and biases.

  • @chenzeping9603
    @chenzeping9603 ปีที่แล้ว

    If you add an activation function to the final layer, doesn't it restrict the possible output values? (i.e., if you add ReLU to the last layer before the output, doesn't it restrict the outputs to positive values? Similarly, if you use sigmoid, doesn't it restrict them to between [0, 1]?)

    • @statquest
      @statquest  ปีที่แล้ว

      I'm a little confused by your question. Are you asking about what happens at 5:35 ?

  • @epistemophilicmetalhead9454
    @epistemophilicmetalhead9454 7 หลายเดือนก่อน +1

    Note regarding ReLU: the derivative at 0 is not defined, so we define the derivative at 0 to be either 0 or 1.

    • @statquest
      @statquest  7 หลายเดือนก่อน

      yep

  • @carzetonao
    @carzetonao ปีที่แล้ว +1

    Really like your video and r shirt is nice

    • @statquest
      @statquest  ปีที่แล้ว

      Thank you so much 😀!

  • @arkobanerjee009
    @arkobanerjee009 3 ปีที่แล้ว +3

    Brilliant as usual. Is there an SQ on softmax activation function in the pipeline?

    • @statquest
      @statquest  3 ปีที่แล้ว +4

      Thank you and yes. Right now I'm working on Convolutional Neural Networks and image recognition, but after that (and perhaps part of that) we'll cover softmax.

    • @masteronepiece6559
      @masteronepiece6559 3 ปีที่แล้ว +3

      @@statquest It would be great if you could visualize what happens to the data to show us how those methods work. Best regards,

  • @Luxcium
    @Luxcium ปีที่แล้ว

    *I am already familiar with Neural Networks Part One* 😂😂😂 So this is where my quest will start, so it's time to _Start Quest_. This time my quest is leading me to *ReLU in Action*, then I will unwind and backpropagate 🎉 to *Recurrent Neural Networks (RNNs)…* I will then learn what « *Seq2Seq* » is, but first I must go watch *Long Short-Term Memory*. I think I will also have to check out the quest *Word Embedding and Word2Vec…* and then I will be happy to come back to learn with Josh 😅 I can't wait to learn *Attention for Neural Networks* _Clearly Explained_

    • @statquest
      @statquest  ปีที่แล้ว

      Please just just watch the videos in order: th-cam.com/play/PLblh5JKOoLUIxGDQs4LFFD--41Vzf-ME1.html

  • @bibiworm
    @bibiworm 3 ปีที่แล้ว +1

    At 8:17, could you please explain in detail why it does not matter that ReLU is bent? How is it related to vanishing/exploding gradients? The reason I am asking is that if, like you said, we can get around the non-differentiability at the bent point by setting the gradient to 0, then that leads to vanishing gradients during backpropagation, right? Thanks.

    • @statquest
      @statquest  3 ปีที่แล้ว

      The gradient for ReLU is either 1 or 0. Thus, the gradient can not vanish unless every single value that goes through it is less than 0. If this happens, the node goes dark and becomes unusable, but this is rare if your data is relatively large.

    • @bibiworm
      @bibiworm 3 ปีที่แล้ว

      @@statquest Thanks. That makes sense. I guess what I didn't understand was that you said in the video that "we can simply define the gradient at the bent point to be 0 or 1". So the choice really does not make a difference? Thanks.

    • @statquest
      @statquest  3 ปีที่แล้ว +2

      @@bibiworm It doesn't make a difference because the probability of having an input value equal to 0 *exactly* (and not 0.000001 or -0.00001) is pretty much 0. So it doesn't matter what value we give the derivative at *exactly* 0.

    • @bibiworm
      @bibiworm 3 ปีที่แล้ว +1

      @@statquest thank you!

  • @alfadhelboudaia1935
    @alfadhelboudaia1935 3 ปีที่แล้ว +2

    Hi, you are really awesome. I would appreciate it if you did a video on Maximum A Posteriori estimation (MAP).

    • @statquest
      @statquest  3 ปีที่แล้ว

      I'll keep that in mind.

  • @flockenlp1
    @flockenlp1 ปีที่แล้ว

    Hi, are you planning on making a video on Radial Basis Function Networks and self-organizing maps?
    Especially with self-organizing maps it's very hard to find good resources; so far I have found nothing that could really help me wrap my head around this topic. I figured, since this seems like your kind of topic and you have a knack for explaining these things in an easy-to-understand way, there'd be no harm in asking :)

    • @statquest
      @statquest  ปีที่แล้ว

      I have a video on the radial basis function with respect to Support Vector Machines, if you are interested in that. SVMs: th-cam.com/video/efR1C6CvhmE/w-d-xo.html Polynomial Kernel: th-cam.com/video/Toet3EiSFcM/w-d-xo.html and Radial Basis Function Kernel: th-cam.com/video/Qc5IyLW_hns/w-d-xo.html

  • @dengzhonghan5125
    @dengzhonghan5125 3 ปีที่แล้ว

    Can you also talk about CNNs and RNNs? You are my favorite teacher.

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      CNNs are coming up soon.

  • @edwinokwaro9944
    @edwinokwaro9944 16 วันที่ผ่านมา

    You did not show how the rest of the weights are updated. I need to understand how the derivative of the activation function affects the weight updates.

    • @statquest
      @statquest  16 วันที่ผ่านมา

      See this video for details and just replace the derivative of the SoftPlus with 0 or 1 depending on the value for x: th-cam.com/video/GKZoOHXGcLo/w-d-xo.html

  • @Bbdu75yg
    @Bbdu75yg ปีที่แล้ว +1

    Wow!

  • @prasadphatak1503
    @prasadphatak1503 3 ปีที่แล้ว +3

    Tiny bam 😂 omfg I couldn't stop laughing. It's like Ryan Reynolds is explaining Neural Networks 😂

  • @r0cketRacoon
    @r0cketRacoon 3 หลายเดือนก่อน +1

    Could you do another video on backpropagation with 2 hidden layers combined with ReLU functions?
    I really love your visualization.

    • @r0cketRacoon
      @r0cketRacoon 3 หลายเดือนก่อน

      According to the formula for the derivative of the loss function with respect to the weights/bias in the previous video, if we replace the SoftPlus function with the ReLU function, then (e**x / (1+e**x)) is replaced with 0 or 1.
      If 0, then the derivative of the loss function with respect to the weights/bias is 0, then the step size = 0, then there is no tweak to the weights and bias.
      Can you make a video about that? Or am I wrong?

    • @statquest
      @statquest  3 หลายเดือนก่อน

      Yes, that's correct. So, for very simple neural networks, it can be hard to train with ReLU. However, for bigger, more complicated networks, the value is rarely 0 since there are so many possible inputs.

    • @r0cketRacoon
      @r0cketRacoon 3 หลายเดือนก่อน +1

      @@statquest Thanks, I really appreciate your dedication to making a video like this. It has helped so much with the concepts.
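
A short sketch of the substitution described in this thread, with made-up pre-activation values: wherever the SoftPlus derivative e^x / (1 + e^x) appeared in the gradient, ReLU contributes a 0 or a 1 instead, and a 0 wipes out that gradient term entirely, which is why very small networks can get stuck.

```python
import numpy as np

def softplus_derivative(x):
    # derivative of ln(1 + e^x); always strictly between 0 and 1
    return np.exp(x) / (1.0 + np.exp(x))

def relu_derivative(x):
    return 1.0 if x > 0 else 0.0

for z in [-2.0, -0.1, 0.5, 3.0]:   # made-up pre-activation values
    print(f"z = {z:5.1f}   softplus' = {softplus_derivative(z):.3f}   relu' = {relu_derivative(z):.1f}")
```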

  • @Anujkumar-my1wi
    @Anujkumar-my1wi 3 ปีที่แล้ว +1

    I want to know whether the weights in a neural network are a linear relation between the input and the non-linear output, or something else?

    • @statquest
      @statquest  3 ปีที่แล้ว +2

      The weights and biases are linear transformations. Remember, the equation for a line is y = slope * x + intercept, and we can replace the "slope" with the "weight" and the "intercept" with the "bias". So y = weight * x + bias = a linear transformation. All of the non-linearity comes from the non-linear activation functions.

    • @Anujkumar-my1wi
      @Anujkumar-my1wi 3 ปีที่แล้ว

      @@statquest So are we using the weights as a linear relation between the inputs and the output, and then introducing non-linearity with the activation function, so that the linear combination of inputs and weights gets converted into a non-linear result?

    • @nguyenngocly1484
      @nguyenngocly1484 3 ปีที่แล้ว +1

      f(x)=x means connect; f(x)=0 means disconnect. You can view ReLU as a switch. What is being connected and disconnected? Dot products (weighted sums). The funny thing is, if all the switch states are known, you can simplify many connected dot products into a single dot product.

    • @Anujkumar-my1wi
      @Anujkumar-my1wi 3 ปีที่แล้ว +1

      @@nguyenngocly1484 Hey, thanks. Can you tell me why we use the weighted sum as an activation function's input? Can't we use the neuron's raw input as the activation function's input in a neural network?

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      @@Anujkumar-my1wi The "weighted sum" is simply the weights times the input value. If you have multiple connections to a neuron, then each one has its own weight - just like the connections to the final ReLU function in this video.
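
To illustrate the reply above with made-up numbers (not the weights from the video): the "weighted sum" that feeds an activation function is just each incoming value times its own weight, plus the bias.

```python
def relu(x):
    return max(0.0, x)

# two connections coming into one node, each with its own weight
incoming_values = [0.6, 1.3]    # outputs from the previous layer (made up)
weights = [-0.4, 0.9]           # one weight per connection (made up)
bias = 0.2

weighted_sum = sum(v * w for v, w in zip(incoming_values, weights)) + bias
activation_output = relu(weighted_sum)

print(round(weighted_sum, 2))       # -0.24 + 1.17 + 0.2 = 1.13
print(round(activation_output, 2))  # 1.13, because the weighted sum is positive
```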

  • @MrFindmethere
    @MrFindmethere 10 หลายเดือนก่อน

    How do we add the resulting graphs together?

    • @statquest
      @statquest  10 หลายเดือนก่อน

      What time point, minutes and seconds, are you asking about?

  • @user-oj6uc1kv6u
    @user-oj6uc1kv6u 3 ปีที่แล้ว

    Can you make a video about Recursive Feature Elimination ? I like your video style.

    • @statquest
      @statquest  3 ปีที่แล้ว

      I'll keep that in mind.

  • @ThePanagiotisvm
    @ThePanagiotisvm 3 ปีที่แล้ว +1

    Is it only me who didn't understand where the values of the weights and biases come from? Why, for example, is the first weight w1=1.70?

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      The weights and biases come from backpropagation, which I talk about in Part 1 in this series, and then show a simple example of in Part 2 th-cam.com/video/IN2XmBhILt4/w-d-xo.html and then go into more detail in these videos: th-cam.com/video/iyn2zdALii8/w-d-xo.html th-cam.com/video/GKZoOHXGcLo/w-d-xo.html

    • @ThePanagiotisvm
      @ThePanagiotisvm 3 ปีที่แล้ว +1

      @@statquest thank you!! Back propagation video made it clear.

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      @@ThePanagiotisvm bam!

  • @slkslk7841
    @slkslk7841 2 ปีที่แล้ว

    At 7:45, could you please explain why gradient descent wouldn't work for a bent line? The gradient descent videos didn't help clear up this doubt.
    Amazing video btw! Thanks

    • @statquest
      @statquest  2 ปีที่แล้ว +1

      Gradient Descent needs to have a derivative defined at all points. Technically the bent line does not have a derivative when x = 0 (at the bend). However, at 8:03 I say that we just define the derivative at x = 0 to be either 0 or 1, and when we do that, gradient descent works just fine with the ReLU.

  • @rafibasha4145
    @rafibasha4145 2 ปีที่แล้ว

    Hi Josh, thank you for the excellent videos. How is the input data split across the 2 hidden neurons?

    • @statquest
      @statquest  2 ปีที่แล้ว +1

      I'm not sure I understand your question. However, I will say that input data simply follows the connections from the input node to the nodes in the hidden layers. For details, see: th-cam.com/video/CqOfi41LfDw/w-d-xo.html

    • @rafibasha1840
      @rafibasha1840 2 ปีที่แล้ว +1

      @@statquest ,Thanks Josh
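
A sketch of the reply above with made-up weights and biases (not the trained values from the video): the single input isn't split; the same value is sent along both connections, and each hidden node applies its own weight and bias before its ReLU.

```python
def relu(x):
    return max(0.0, x)

def forward(dosage, p):
    # the same input value goes to BOTH hidden nodes
    h1 = relu(p["w1"] * dosage + p["b1"])
    h2 = relu(p["w2"] * dosage + p["b2"])
    # each hidden node's output then gets its own weight on the way to the output
    return relu(p["w3"] * h1 + p["w4"] * h2 + p["b3"])

made_up_params = {"w1": 1.2, "b1": -0.3, "w2": -2.0, "b2": 1.1,
                  "w3": 0.7, "w4": 0.5, "b3": -0.1}

for dosage in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(dosage, round(forward(dosage, made_up_params), 3))
```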

  • @ProEray
    @ProEray 3 ปีที่แล้ว +6

    I desperately need a recurrent neural networks video :'(

    • @ProEray
      @ProEray 3 ปีที่แล้ว +1

      Good Job btw, liked and subscribed as always

    • @statquest
      @statquest  3 ปีที่แล้ว +5

      Thanks! I'm working on convolutional neural networks right now.

  • @user-bz8nm6eb6g
    @user-bz8nm6eb6g 3 ปีที่แล้ว +2

    Wow Wow

    • @statquest
      @statquest  3 ปีที่แล้ว

      Thank you! :)

  • @bibiworm
    @bibiworm 3 ปีที่แล้ว +1

    Could you shed some light on the advantages and disadvantages of ReLU vs SoftPlus, please? Thank you. I didn't know there was a SoftPlus until this video, lol.

    • @statquest
      @statquest  3 ปีที่แล้ว

      There seems to be a raging debate as to whether or not ReLU is better or worse than Soft Plus and it could be domain specific. So I don't really know the answer - maybe just try them both and see what works better.

    • @bibiworm
      @bibiworm 3 ปีที่แล้ว

      @@statquest thank you!
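
For anyone who, like the commenter, hadn't met SoftPlus before: it is a smooth approximation of ReLU, softplus(x) = ln(1 + e^x). A quick numerical comparison (my own sketch, not from the video):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softplus(x):
    return np.log(1.0 + np.exp(x))   # smooth, and always a little above ReLU

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print("relu:    ", np.round(relu(x), 3))      # [0.    0.    0.    1.    3.   ]
print("softplus:", np.round(softplus(x), 3))  # [0.049 0.313 0.693 1.313 3.049]
```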

  •  3 ปีที่แล้ว +1

    I gave a "like" before I watched it...

  • @user-et8es9vg5z
    @user-et8es9vg5z 3 หลายเดือนก่อน +1

    As always, everything is very clear, but I still don't understand why the ReLU function is currently the most effective activation function in machine learning. I mean, the shape seems much less natural than the SoftPlus function, for instance.

    • @statquest
      @statquest  3 หลายเดือนก่อน

      It's simple - super easy to calculate - so it doesn't take any time, which means we can use a lot more of them to fit more complicated shapes to the data.
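
A rough way to see the "super easy to calculate" point for yourself (timings vary by machine; this is only a sketch):

```python
import timeit

setup = "import numpy as np; x = np.random.randn(1_000_000)"

relu_time = timeit.timeit("np.maximum(0.0, x)", setup=setup, number=100)
softplus_time = timeit.timeit("np.log1p(np.exp(x))", setup=setup, number=100)

print(f"ReLU:     {relu_time:.3f} s for 100 passes")
print(f"SoftPlus: {softplus_time:.3f} s for 100 passes")  # typically noticeably slower
```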

  • @nirajpattnaik6294
    @nirajpattnaik6294 3 ปีที่แล้ว +2

    Awesome.. Awaiting CNN from SQ ..

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      It should come out in the next few weeks.

  • @sgrimm7346
    @sgrimm7346 8 หลายเดือนก่อน

    So, in other words, if the value coming in is anything less than zero, ReLU outputs a 0. If it's 0 or above, the output is that value. Ex: -.49 → 0, .15 → .15 or whatever, .5 → .5, -15 → 0. Is this correct?

    • @statquest
      @statquest  8 หลายเดือนก่อน

      yep

  • @abbastailor3501
    @abbastailor3501 2 ปีที่แล้ว +1

    Take me to your leader Josh 🤲

  • @onemanshow3274
    @onemanshow3274 3 ปีที่แล้ว +1

    Hey Josh, Can you please please make videos on Recurrent Neural Network and Transformers

    • @statquest
      @statquest  3 ปีที่แล้ว +1

      I'll keep that in mind.

  • @TrungNguyen-ib9mz
    @TrungNguyen-ib9mz 3 ปีที่แล้ว

    Great video!! But might you explain more about how to estimate w1,w2,b1,b2,...? Thank you!

    • @statquest
      @statquest  3 ปีที่แล้ว +2

      I explain how to estimate parameters (w1, w2, b1, b2 etc.) in Part 2 in this series (this is Part 3): th-cam.com/video/IN2XmBhILt4/w-d-xo.html and in these videos: th-cam.com/video/iyn2zdALii8/w-d-xo.html and th-cam.com/video/GKZoOHXGcLo/w-d-xo.html

    • @TrungNguyen-ib9mz
      @TrungNguyen-ib9mz 3 ปีที่แล้ว +1

      @@statquest Thank you so much!

  • @Okkyou
    @Okkyou 3 ปีที่แล้ว

    Does that mean I can choose any activation function?

  • @nickmishkin4162
    @nickmishkin4162 หลายเดือนก่อน

    Nice video. If ReLU always outputs a positive number, how can the neural network produce a negative sloping curve?

    • @statquest
      @statquest  หลายเดือนก่อน +1

      A weight that comes after the ReLU can be negative, and flip it over.

    • @nickmishkin4162
      @nickmishkin4162 หลายเดือนก่อน

      @@statquest So is it inefficient to end a neural network with a ReLU function? Because then we never allow the network to generate a negative slope.
      Correct me if I'm wrong:
      Input -> ReLU(X1) -> only positive outputs -> negative weights -> ReLU(X2) -> only positive final outputs.
      I guess my real question is this: can negative weights followed by a ReLU function produce a negative slope?
      Thanks!

    • @statquest
      @statquest  หลายเดือนก่อน +1

      ​@@nickmishkin4162 This video pretty much illustrates everything you want to know about the ReLU. Look at the shape of the function that comes out of the final ReLU at 5:35

    • @nickmishkin4162
      @nickmishkin4162 28 วันที่ผ่านมา +1

      @@statquest Yes! Didn't realize your nn ended with a ReLU. Thank you

  • @miriza2
    @miriza2 3 ปีที่แล้ว +2

    Triple BAM!!! 💥 💥 💥

    • @statquest
      @statquest  3 ปีที่แล้ว

      :)

    • @zeetech0123
      @zeetech0123 3 ปีที่แล้ว

      Fourple BAM!!!!
      .
      .
      .
      .
      .
      .
      .
      (ik its not a word :p lol)

  • @julescesar4779
    @julescesar4779 2 ปีที่แล้ว +1

  • @muzammelmokhtar6498
    @muzammelmokhtar6498 2 ปีที่แล้ว

    Great video, but I don't really understand the curvy and bent ReLU in the last part of the video.

    • @statquest
      @statquest  2 ปีที่แล้ว

      What time point, minute and seconds, are you asking about?

    • @muzammelmokhtar6498
      @muzammelmokhtar6498 2 ปีที่แล้ว

      @@statquest 7:42-8:15

    • @statquest
      @statquest  2 ปีที่แล้ว

      @@muzammelmokhtar6498 Bent lines don't have derivatives where they are bent. This, in theory, is a problem for backpropagation, which is what we use to optimize the weights and biases. To get around this, we simply define a value for the derivative at the bend. For details on back propagation, see: th-cam.com/video/IN2XmBhILt4/w-d-xo.html th-cam.com/video/iyn2zdALii8/w-d-xo.html and th-cam.com/video/GKZoOHXGcLo/w-d-xo.html

    • @muzammelmokhtar6498
      @muzammelmokhtar6498 2 ปีที่แล้ว +1

      @@statquest ouh okay, thank you👍