C4W2L03 Resnets

  • Published on 24 Nov 2024

Comments • 59

  • @joshsmit779
    @joshsmit779 5 years ago +167

    your videos are iconic and should be preserved in the national library

    • @zy4663
      @zy4663 2 years ago +3

      Internet is a nice global digital library right?

    • @saranghaeyoo8239
      @saranghaeyoo8239 2 years ago +3

      *international library

  • @gravitycuda
    @gravitycuda 6 years ago +38

    Hi Andrew, you are the first professor who taught me ML. I studied your course on Coursera; nice to see you again.

    • @ThamizhanDaa1
      @ThamizhanDaa1 3 years ago

      Periyar is a scoundrel.. he is not a genuine atheist at all.. he only speaks in ways that hurt the feelings of Hindus..

  • @viniciussantana8737
    @viniciussantana8737 4 years ago +1

    Andrew is simply the best instructor on neural networks out there. Helped me a lot.

  • @JonesDTaylor
    @JonesDTaylor 4 years ago +5

    I am doing the DL specialization after finishing your old ML course. By far you are the best teacher out there. Thank you so much for this.

  • @zekarias9888
    @zekarias9888 4 years ago +8

    Wow. After I watched 5 other videos about ResNets, I was still lost. Then I found this video and it cleared the misunderstandings out of my mind. Super cool!

  • @Cicatka.Michal
    @Cicatka.Michal 3 years ago +2

    Finally got it! Sometimes it is hard for me to grasp even the basic concepts if I don't know what I should understand and take away from the topic. Thank you very much for these videos that first tell you what problem you are trying to solve and why you should solve it, and then clearly explain the solutions. Thumbs up! :)

  • @sanjurydv
    @sanjurydv 5 years ago +1

    Never seen tutorial videos with such a clear explanation. He is the best.

  • @Alex-xx8ij
    @Alex-xx8ij 2 years ago

    Your explanation is very clear! Thank you for the lecture.

  • @JohnDoe-vr4et
    @JohnDoe-vr4et 4 years ago +15

    Me after listening to most people explaining Resnet: "What? Why? Why do you do this?"
    Me after listening to Andrew: "Makes sense. Easy peasy."

  • @davidtorres5012
    @davidtorres5012 3 years ago

    You are the best, Andrew

  • @HexagonalClosePacked
    @HexagonalClosePacked 5 years ago

    I'm trying to understand the components behind Semantic Segmentation and your videos really helped!

  • @iasonaschristoulakis6932
    @iasonaschristoulakis6932 2 years ago

    Excellent both theoretically and technically

  • @Ganitadava
    @Ganitadava 2 years ago

    Sir, very nice explanation as always, thanks a lot.

  • @altunbikubra
    @altunbikubra 4 years ago

    Thank you, it was a very brief and simplified explanation, loved it.

  • @swfsql
    @swfsql 1 year ago

    I think we could use the ResNet concept to improve Dropout, creating a "shutdown" regularization:
    Select a layer (or rather, nodes from that layer) that ought to be shut down, and instead only act on the cost function by adding a cost relative to that layer not being an identity layer. Then the network is free to gradually adapt itself (hopefully by reducing train-set overfitting and generalizing) so as to push that layer ever closer to being an identity. If that layer manages to become an identity, it can be permanently shut down.
    This could be a way to reduce the network size, and maybe it could be applied automatically in the high-variance, low-bias case.
    As far as linear Z functions go, one way for a layer to be an identity is if it has the same number of nodes as inputs and you add a cost for each node[j] so that only its weight[j] is 1 while all other weights are 0; this would be similar to an "identity" Z layer. Trying to make the activation function an identity as well is a hassle, but even ignoring the activation function, if you could still manage to shut down the Z-function nodes and stack the posterior activation back into the previous activation, that would already be a network simplification.
    Edit: We could also try to simplify the activation functions if we generalize and re-parametrize them. E.g. for a ReLU activation we could turn it into a leaky ReLU whose leaky-side parameter starts at zero (so it is just like a normal ReLU), then add a cost that penalizes the parameter for staying at zero and let backprop push it towards 1, at which point the previously-ReLU activation has turned into the identity activation and can be gracefully shut down.
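
    A minimal NumPy sketch of the re-parametrized activation described in the comment above, assuming a single learnable negative-side slope per layer and a quadratic penalty that pulls it toward 1 (the function names and the penalty strength are illustrative, not from the video):

      import numpy as np

      def leaky_relu(z, slope):
          # slope = 0.0 recovers plain ReLU; slope = 1.0 turns the activation
          # into the identity, so the layer could later be folded away.
          return np.where(z > 0.0, z, slope * z)

      def identity_penalty(slope, strength=1e-3):
          # Extra cost (and its gradient w.r.t. slope) that nudges the
          # activation toward the identity during training.
          cost = strength * (slope - 1.0) ** 2
          grad = 2.0 * strength * (slope - 1.0)
          return cost, grad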

  • @MeAndCola
    @MeAndCola 5 years ago +14

    "man pain" 2:10 😂

    • @altunbikubra
      @altunbikubra 4 years ago

      omg he is really writing that :D

  • @MrRynRules
    @MrRynRules 3 years ago

    Thank you!

  • @promethful
    @promethful 4 years ago +2

    So the skipped connections don't literally skip layers but rather add the original input onto the output of the 'skipped' layers?

    • @АннаКопатько
      @АннаКопатько 3 years ago

      I think so too, at least it is the only explanation that I understand
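
    That reading matches the formulas in the lecture: both layers are still computed, and a[l] is only added to z[l+2] right before the final nonlinearity:

      z[l+1] = W[l+1] a[l]   + b[l+1],    a[l+1] = g(z[l+1])
      z[l+2] = W[l+2] a[l+1] + b[l+2],    a[l+2] = g(z[l+2] + a[l])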

  • @mannansheikh
    @mannansheikh 4 years ago

    Great

  • @steeltwistercoaster
    @steeltwistercoaster 4 years ago

    +1 this is great

  • @kumarabhishek5652
    @kumarabhishek5652 3 years ago +1

    Why does the training error increase in practice, as opposed to in theory, for the plain network?

  • @rahuldogra7171
    @rahuldogra7171 3 years ago

    What is the benefit of adding identity blocks and skipping them? Instead of skipping, why are we adding?

  • @academicconnorshorten6171
    @academicconnorshorten6171 5 years ago +5

    Do you broadcast a[l] to make it match the dimensionality of a[l+2]?

  • @wliw3034
    @wliw3034 3 years ago

    Good

  • @patrickyu8470
    @patrickyu8470 2 years ago

    Just a question for those out there - has anyone been able to use techniques from ResNets to improve the convergence speed of deep fully connected networks? Usually people use skip connections in the context of convolutional neural nets but I haven't seen much gain in performance with fully connected ResNets, so just wondering if there's something else I may be missing.
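
    A fully connected residual block is only a small change to a plain MLP layer. A hedged PyTorch sketch of the kind of block the comment above refers to (module name and sizes are illustrative, not from the video):

      import torch
      import torch.nn as nn

      class ResidualMLPBlock(nn.Module):
          """Two fully connected layers plus a shortcut: out = relu(f(x) + x)."""
          def __init__(self, dim):
              super().__init__()
              self.fc1 = nn.Linear(dim, dim)
              self.fc2 = nn.Linear(dim, dim)

          def forward(self, x):
              h = torch.relu(self.fc1(x))
              z = self.fc2(h)
              return torch.relu(z + x)  # shortcut adds the block's input back in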

  • @whyitdoesmatter2814
    @whyitdoesmatter2814 4 years ago

    Wait!!!!! z[l+1] should normally be equal to W[l]a[l] + b[l+1]??

  • @amir06a
    @amir06a 4 years ago

    I have a very silly doubt: if the skip layers/connections exist, isn't the real number of layers in play = total layers / 2?

  • @ravivaghasiya5680
    @ravivaghasiya5680 2 years ago

    Hello everyone. In this video at 5:20 it is mentioned that as the number of layers increases in a plain network, the training error increases in practice. Could you please explain, or share some references on, why this actually occurs? One reason I found is the vanishing gradient problem, which can be addressed using ReLU. Thus, one can use ReLU in a plain network. Then why is ResNet so conventional?

  • @chrlemes
    @chrlemes 2 years ago

    I think the title of the paper is wrong. The correct one is "Deep residual learning for image recognition".

  • @rahulrathnakumar785
    @rahulrathnakumar785 5 years ago +1

    If a[l] skips two layers to directly enter the final ReLU, how do we get the z[l+2] in the final equation a[l+2] = g(z[l+2] + a[l])? Thanks!

    • @IvanOrsolic
      @IvanOrsolic 5 years ago +2

      You still calculate them; you just keep a copy of the original a[l] value and add it back in just before computing a[l+2].
      Why would you even do that? It's explained in the next video.

    • @mohe4ever514
      @mohe4ever514 3 years ago

      @@IvanOrsolic If we plug this value into the network, then what is the benefit of skipping the layers? We still go through the layers to calculate a[l+2]; we just added one more term, so how does the skip connection help?

  • @astropiu4753
    @astropiu4753 4 years ago +7

    there's some high-frequency noise in many of this specialization's videos which is hurting my ears.

  • @swapnildubey6428
    @swapnildubey6428 5 years ago +1

    How are the dimensions handled? I mean, the dimension of a[l] could happen to be unequal to that of a[l+2].

    • @SuperVaio123
      @SuperVaio123 4 years ago +1

      Padding

    • @s.s.1930
      @s.s.1930 3 years ago

      Padding with zeros, or using a 1x1 conv inside the skip connection.
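
    In other words, a[l] is not broadcast; when the shapes differ, the shortcut term gets its own transform, often written a[l+2] = g(z[l+2] + Ws a[l]), where Ws can be fixed zero padding or a learned projection such as a 1x1 convolution. A hedged PyTorch sketch of the 1x1-conv option (channel counts and strides are illustrative):

      import torch
      import torch.nn as nn

      class ProjectionShortcutBlock(nn.Module):
          """Residual block whose shortcut uses a 1x1 conv so a[l] matches a[l+2]."""
          def __init__(self, in_ch, out_ch, stride=2):
              super().__init__()
              self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)
              self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)
              # 1x1 conv projects the input to the block's output shape.
              self.proj = nn.Conv2d(in_ch, out_ch, 1, stride=stride)

          def forward(self, x):
              h = torch.relu(self.conv1(x))
              z = self.conv2(h)
              return torch.relu(z + self.proj(x))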

  • @shivani404sheth4
    @shivani404sheth4 3 years ago

    Meet the ML god

  • @trexmidnite
    @trexmidnite 3 years ago

    Sounds like terminator..

  • @adityaniet
    @adityaniet 6 years ago

    Hi Andrew, I have a question: when we are calculating a[l+2] we need both a[l] and z[l+2]. But z[l+2] can only be calculated by calculating a[l+1], so how will we get that? ... Many thanks :-)

    • @larryguo2529
      @larryguo2529 6 years ago +1

      If I understand your question correctly, a[l+2] = activation of z[l+2] ......

    • @freee8838
      @freee8838 6 years ago

      just like in formula a[l+1]=g(z[l+1])...

    • @_ashout
      @_ashout 6 years ago +3

      Yep this confuses me as well. How can we have both z[l+2] and a[l+1] if a[l+1] skips a layer?
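
    For what it's worth, nothing in the forward pass is skipped computationally: a[l+1] and z[l+2] are computed exactly as in a plain network, and the shortcut only carries an extra copy of a[l] into the last activation. A minimal NumPy sketch, assuming ReLU for g and equal layer widths so the shapes match:

      import numpy as np

      def residual_block(a_l, W1, b1, W2, b2):
          g = lambda z: np.maximum(z, 0.0)   # ReLU
          z1 = W1 @ a_l + b1                 # z[l+1]
          a1 = g(z1)                         # a[l+1] is still computed
          z2 = W2 @ a1 + b2                  # z[l+2]
          return g(z2 + a_l)                 # a[l+2] = g(z[l+2] + a[l])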

  • @lorenzosorgi6088
    @lorenzosorgi6088 4 years ago

    is there any theoretical motivation justifying the increasing error of a deep plain network during training?

    • @mohnish.physics
      @mohnish.physics 4 years ago

      Theoretically, the error should go down. But in practice, I think the exploding gradients for a network with a large number of layers increases the error.

    • @tumaaatum
      @tumaaatum 3 years ago

      Yes there is. I am not sure why Andrew Ng didn't touch on this. Basically, once you add the skip connection you are including an additive term inside the non-linearity. The additive term can only increase the function space (the range of the function www.intmath.com/functions-and-graphs/2a-domain-and-range.php) as it is inside the original function (the theory of nesting functions). Hence, you allow the network to have more approximating/predictive capacity in each layer. You can visit the D2L lectures about this:
      d2l.ai/chapter_convolutional-modern/resnet.html?highlight=resnet

  • @kavorkagames
    @kavorkagames 5 years ago

    I find a ResNet behaves like a shallower net. It gives a solution that resembles that of a (roughly) four-to-six-layer net when it is eight layers deep. ResNets are out for me.

    • @okktok
      @okktok 5 years ago +3

      Kavorka Games ResNets are now the state of the art for image recognition. Every new architecture uses them, and it doesn't make sense anymore to use plain networks.

    • @amir06a
      @amir06a 4 years ago

      @@okktok But aren't the actual layers in play = total layers / 2, since we are providing a shortcut?
      So, on a broader note, are they just like plain networks that look bigger?

  • @mikebot5361
    @mikebot5361 5 years ago

    If we use resnets, are we losing the information in between the layers?

    • @s.s.1930
      @s.s.1930 3 years ago

      No, we're not losing it - we just add x back in after a certain number of layers (in this example, 2 layers) - this is our ResNet block

  • @ahmedb2559
    @ahmedb2559 1 year ago

    Thank you !