PyTorch Hooks Explained - In-depth Tutorial

  • Published on Jan 2, 2025

Comments • 143

  • @rockapedra1130
    @rockapedra1130 9 months ago +3

    These lectures are gold!!

  • @altostratous
    @altostratous 3 years ago +2

    Most professional video I've ever seen in programming.

  • @scottmiller2591
    @scottmiller2591 3 years ago +4

    Very nice presentation: good pacing, good use of animation, good example, for a complicated subject that is not explained clearly in one place in the documentation but is scattered throughout without a unifying set of examples.

  • @kartheekakella2757
    @kartheekakella2757 4 years ago +13

    awesome vid! this channel's gonna go viral, take my word for it..

    • @elliotwaite
      @elliotwaite 4 years ago +1

      Thanks, Sukruth! You're the first to comment this prediction as far as I can remember. I'll try to make it come true.

    • @xzl20212
      @xzl20212 3 years ago

      @elliotwaite I really appreciate the quality of your video. Glad you do not sacrifice quality for subscriptions.

  • @hilmandayo
    @hilmandayo 4 years ago +10

    I LOVE the small details and catches you threw into the video!
    They can clear up a lot of doubts that beginners will probably face.
    Keep the videos coming! You contribute a lot to the world with this kind of video.
    By the way, your channel has made a huge transition from producing totally random videos of exercise, etc. to deep learning, hahaha.

    • @elliotwaite
      @elliotwaite 4 years ago

      Thanks, Hilman! I'll keep the videos coming.
      Hah, yep, this channel has been through some interesting phases. But I think I've found my niche.

  • @zichenwang8068
    @zichenwang8068 a year ago

    Thank you so much for sharing this high-quality tutorial.

    • @elliotwaite
      @elliotwaite a year ago +1

      Woah, this is the first Super Thanks I've ever received. Thanks, Zichen! 😊 I'm glad you found the tutorial helpful.

  • @abhijitdeo2683
    @abhijitdeo2683 3 years ago +1

    Dude this content is gold

    • @abhijitdeo2683
      @abhijitdeo2683 3 years ago +1

      If you are planning member-only content and stuff too, I'm in, man. This is literally gold.

    • @elliotwaite
      @elliotwaite 3 years ago +1

      @abhijitdeo2683, thanks! I don't have any plans for member-only content at the moment, but I appreciate your comment.

  • @junweizheng1994
    @junweizheng1994 2 years ago +1

    My first comment on YouTube. This video is amazing, and I can imagine you have done lots of work to make it. I really appreciate that. Good content, good presentation, good slides. This channel will get popular if you continue making great videos like this!

    • @elliotwaite
      @elliotwaite 2 years ago +1

      Thank you! I hope to make more videos in the future.

    • @chaupham1186
      @chaupham1186 2 years ago +1

      @elliotwaite Looking forward to it! Thanks for the great videos.

  • @SumerbankaNeOldu
    @SumerbankaNeOldu 4 years ago +3

    Finally, I've nailed the hooks. Thank you :)

  • @khushpatelmd
    @khushpatelmd 3 years ago +1

    Thank you so much. Please make more videos. You are an incredible teacher!!

    • @elliotwaite
      @elliotwaite 3 years ago +1

      Thanks for the comment, glad you liked the video. I have been thinking about starting to make videos again, and your encouragement helps.

  • @samanthaqiu3416
    @samanthaqiu3416 4 years ago +4

    Very interesting. I saw your autograd video and it was very cool. Something that gets confusing for me is when you need to retain the graph in order to use the gradients computed in a first backward pass in a second meta-loss calculation.

    • @elliotwaite
      @elliotwaite 4 years ago +6

      Ah, the reason you have to set retain_graph=True is that the default behavior of the backward method is to delete, after the gradients have been passed through, the data stored in the backward graph that was needed to calculate those gradients (such as the data for the tensors that were used in the forward pass). This is the default because most of the time people only do one pass through the backward graph, and deleting the graph's data saves memory. So you have to specify that you want to keep the graph's data in memory if you want to do a second pass through it.
      Let me know if I didn't answer your question, or if there is anything you're still unsure about.
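
A minimal sketch of the behavior described above. The tensor values and variable names are chosen here purely for illustration and are not taken from the video.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2  # the backward node saves x so it can compute dy/dx = 2 * x later

# First pass: ask autograd to keep the graph's saved data for another pass.
y.backward(retain_graph=True)
first_grad = x.grad.clone()   # 2 * x = 4

# Second pass through the same graph; gradients accumulate into x.grad.
y.backward()
second_grad = x.grad.clone()  # 4 + 4 = 8

# A third pass fails, because the second pass (retain_graph left at its
# default of False) freed the saved data.
try:
    y.backward()
    third_pass_failed = False
except RuntimeError:
    third_pass_failed = True

print(first_grad, second_grad, third_pass_failed)
```

Note that gradients accumulate across passes, so call `x.grad.zero_()` (or set `x.grad = None`) between passes if the second pass should start from zero.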

  • @abdelmananabdelrahman4099
    @abdelmananabdelrahman4099 a year ago

    Wow. Great content! We need more of these videos.

    • @elliotwaite
      @elliotwaite a year ago

      Thank you for the encouraging comment. I may make more in the future.

  • @datascience3008
    @datascience3008 a year ago

    It's amazing how you can reply to all the comments that you receive.

    • @elliotwaite
      @elliotwaite a year ago

      🙂 I enjoy the subject matter. Also, it's not too many to get overwhelmed by at the moment.

  • @pizhichil
    @pizhichil 4 years ago +1

    Thank you so much for this video. As always, very helpful. If not this one, it would have taken a large effort to understand everything ... thanks

    • @elliotwaite
      @elliotwaite 4 years ago

      Thanks, Amit! I'm glad you found it helpful.

  • @raunakkbanerjee9016
    @raunakkbanerjee9016 5 months ago

    Excellent video.. crystal clear explanations

    • @elliotwaite
      @elliotwaite 5 months ago

      @raunakkbanerjee9016 thanks!

  • @fernandofariajunior
    @fernandofariajunior 6 months ago

    This video is so helpful, thanks for making it!

    • @elliotwaite
      @elliotwaite 6 months ago

      Thanks. I'm glad you liked it.

  • @shaozhuowang3403
    @shaozhuowang3403 4 years ago +3

    It's great as always, thank you guys.

  • @ohotpow
    @ohotpow 3 years ago

    Very good video! It should be linked in the PyTorch documentation.

  • @jiangpengli86
    @jiangpengli86 4 months ago

    Marvelous tutorial. Thank you so much.

    • @elliotwaite
      @elliotwaite 4 months ago

      @jiangpengli86 I'm glad to hear you liked it.

  • @carlossegura403
    @carlossegura403 3 years ago

    Great summary and video-quality to PyTorch hooks!

  • @shubhamthapa7586
    @shubhamthapa7586 3 years ago

    Wow, thanks for making this video. Finally my doubts are cleared up!

    • @elliotwaite
      @elliotwaite 3 years ago

      Glad it helped.

    • @shubhamthapa7586
      @shubhamthapa7586 3 years ago

      @elliotwaite Yeah, I was trying to implement Grad-CAM, so I thought of clearing up the concept of hooks first, and this video is just perfect for that.

  • @ArunKumar-bp5lo
    @ArunKumar-bp5lo 2 years ago

    Nicely explained visually.

  • @wayc04
    @wayc04 a month ago

    Great and helpful video! But I have a question: when I execute your program from 12:50 and check the grad of e using print(e.grad), the output is tensor(2.). I'm a little confused about it.

    • @elliotwaite
      @elliotwaite a month ago

      Thanks for pointing this out. It looks like the way `.retain_grad()` works may have changed since this video was recorded, so that it now always retains the output gradient after it has gone through all your other hooks, which is why the gradient of e is now printed as 2.0 instead of 1.0. If you run the code on PyTorch 1.6.0 (which was the latest version when this video was made), you'll see the old behavior that's described in the video.

    • @wayc04
      @wayc04 a month ago +1

      @elliotwaite Thank you, I understand the reason now. Additionally, PyTorch has fixed the bug related to register_backward_hook by replacing it with register_full_backward_hook.

    • @elliotwaite
      @elliotwaite a month ago

      @wayc04 ah, good to know.

  • @jizhang2407
    @jizhang2407 3 years ago

    @11:00, I don't get why `return grad + 2` will update the `grad` variable if this is not done by `grad += 2` and then `return grad` in the c_hook function... Can anybody enlighten me? Thanks. Anyway, brilliant tutorial, and I learned a lot. Thank you, Elliot.

    • @elliotwaite
      @elliotwaite 3 years ago

      Thanks, glad you liked the tutorial. About your question, I think during the backward pass, the PyTorch code does something like this:
      grad = registered_hook_function(grad)
      So it updates the grad variable with whatever was returned from the hook function, but not in the same way an in-place operation would, because the grad variable is now pointing to a different tensor, the tensor returned from the hook function (unless the returned tensor is the same tensor that was passed into the hook function). This new tensor is then passed along as the gradient to the next backward nodes in the graph.
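
A small sketch of a hook whose return value replaces the gradient. The values and the names `a`, `c`, `d`, and `received` are assumptions made for illustration; only the `return grad + 2` pattern comes from the discussion above.

```python
import torch

received = []  # gradients as they arrive at the hook, for inspection

def c_hook(grad):
    received.append(grad.item())
    # Returns a NEW tensor; the incoming `grad` tensor is left untouched.
    return grad + 2

a = torch.tensor(2.0, requires_grad=True)
c = a * 3
c.register_hook(c_hook)
d = c * 4
d.backward()

# The gradient arriving at c is 4 (from d = c * 4). The hook returns 4 + 2 = 6,
# and that returned tensor is what continues backward: a.grad = 6 * 3 = 18.
print(received, a.grad)
```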

  • @peasant12345
    @peasant12345 a year ago

    I still don't follow why c.grad will be modified to 100 at 14:32. Is there some enforced integrity, something like the grads of the two sides of an add operation must be equal?

    • @elliotwaite
      @elliotwaite a year ago

      Yeah, finding the gradient is basically saying, "if I change the input by a little bit, by how much will that change the output?" And the add operation (or the sum of any number of inputs) will result in the inputs all having the same gradient with respect to the output, because changing any of the inputs by a little bit (dx) will change the output by that exact same amount (dx). So the gradient for all the inputs of a sum with respect to the output is 1, which means that when backpropagating the gradients through the add (or sum) operation, the code can use the optimization of just passing the same gradient tensor to all the inputs. But this optimization is only safe if none of the backward hook functions apply any in-place operations to this shared tensor, because the result of an in-place operation would be visible to all that are using that shared tensor.
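
A sketch of the safe pattern implied above: the hook returns a new tensor instead of modifying its argument in place, so a gradient tensor that may be shared with other inputs is never mutated. Leaf tensors and these particular values are assumptions chosen to keep the example short; the video uses intermediate tensors.

```python
import torch

c = torch.tensor(2.0, requires_grad=True)
d = torch.tensor(3.0, requires_grad=True)
e = c + d

def d_hook(grad):
    # Out-of-place: builds a new tensor and leaves the incoming one alone.
    # The risky variant discussed above would be `grad *= 100`.
    return grad * 100

d.register_hook(d_hook)
e.backward()

print(c.grad, d.grad)  # c is unaffected by the hook registered on d
```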

  • @samllanwarne6512
    @samllanwarne6512 4 years ago +1

    at 4:55 when you multiply by 4, should accumulatedGrad for a be 8 and accumulatedGrad for b be 12, the other way around to in the video?

    • @elliotwaite
      @elliotwaite 4 years ago +1

      I think the gradients are correct in this case, but it is a bit counterintuitive and this has mixed me up before. The counterintuitive part is that when you backprop through a multiplication, the gradients actually get multiplied by the flip of the input values. For example, when you backprop through A * B, the gradient for A is B * the incoming gradient, and the gradient for B is A * the incoming gradient. This is because each little increase in A will increase the output by that little increase times B, and each little increase in B will increase the output by that little increase times A.
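
A quick numeric check of the "flipped" rule. The values a = 2, b = 3 and the multiplication by 4 are assumed here to match the numbers mentioned in the question; they are not confirmed from the video itself.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

out = (a * b) * 4
out.backward()

# Incoming gradient at the a * b node is 4.
# Gradient for a = b * 4 = 12, gradient for b = a * 4 = 8.
print(a.grad, b.grad)
```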

    • @samllanwarne6512
      @samllanwarne6512 4 years ago +1

      @elliotwaite Thank you Elliot!

  • @phuclai4492
    @phuclai4492 a year ago

    great video, I love it !!! I hope you make more great videos in the future.

  • @aymensekhri2133
    @aymensekhri2133 2 years ago

    Very amazing. Could you please take, for example, the state-of-the-art models in deep learning, break them down, and explain how the flow works, especially those models that contain very specific PyTorch methods like "register hooks"? I have noticed that most YouTubers focus on the big terms in PyTorch and explain the simple concepts, but once we get to the SOTA models we find many things that are new and complex.

    • @elliotwaite
      @elliotwaite 2 years ago

      That's a good suggestion for potential future videos, thanks. I've noticed that as well, that YouTubers usually don't break down the more complex PyTorch models. I'm currently busy working on another project and have taken a break from making YouTube videos, but I'll add this idea to my list of video ideas in case I get back into making YouTube videos in the future.

  • @catthrowvandisc5633
    @catthrowvandisc5633 4 years ago +1

    hey Elliot, thank you for this as well! i came to your channel for your autograd video and it really helped me quickly get a clearer picture. this one's just as good too. i really like how you incrementally take the problems deeper in each of your videos, they are very helpful to cement the understanding. would you be able to do one on pytorch distributed training? i couldn't find a good video explanation to help with it.

    • @elliotwaite
      @elliotwaite 4 years ago +2

      Glad to hear you're finding these videos helpful. And thanks for the suggestion! I still don't have a good understanding of how PyTorch distributed training works either yet, but it seems like something I should learn at some point. I'm not sure when I'll get around to this one, since it seems like it might take a bit of research to get a deeper understanding of it, but I'll definitely add it to my list of potential future videos. And if I decide to make it, I'll leave another reply here letting you know when it's posted.

  • @andreborgescavalcante4589
    @andreborgescavalcante4589 2 years ago

    One related thing: to read the grad properties of intermediate tensors, we only need to first call retain_grad() on the tensor and then return that tensor as an output of the forward method.

    • @elliotwaite
      @elliotwaite 2 years ago

      Thanks for mentioning this tip.
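
One way the tip above might look in practice. `TwoLayer` and its layer sizes are hypothetical, introduced only for this sketch.

```python
import torch
from torch import nn

class TwoLayer(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(3, 4)
        self.fc2 = nn.Linear(4, 1)

    def forward(self, x):
        hidden = torch.relu(self.fc1(x))
        hidden.retain_grad()  # ask autograd to keep hidden.grad after backward
        # Return the intermediate tensor too, so the caller can read its .grad.
        return self.fc2(hidden), hidden

model = TwoLayer()
out, hidden = model(torch.randn(5, 3))
out.sum().backward()

print(hidden.grad.shape)  # same shape as hidden
```

Without the `retain_grad()` call, `hidden.grad` would stay `None`, since by default only leaf tensors keep their gradients.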

  • @jizhang2407
    @jizhang2407 3 years ago

    @14:21, I don't understand why the in-place operation inside d_hook also changes the gradient passed to MulBackward0. Isn't the gradient of e, i.e. 1.0, passed to both c and d as two "1.0"s, i.e. the same but two independent values? Can anybody enlighten me? Thanks.

    • @elliotwaite
      @elliotwaite 3 years ago

      Since the AddBackward0 node doesn't change the gradient, it saves memory by not duplicating the data and just passing along the same "1.0" tensor object to both c and d (or in other words, it passes along pointers that point to the same underlying data). This is why when that data is changed in the d_hook by an in-place operation, it also affects the data that is seen in the c node.
      I hope that helps clarify.
      P.S. - Sorry for the late reply. I'm not sure how I missed your comment earlier. Thanks for the question.

  • @programmer8064
    @programmer8064 a year ago

    Thank you so much!!!!!!!!!!!!!!!!!!!!! I love this video!!!

  • @oheldad
    @oheldad 2 years ago

    Great tutorial well done !

  • @markomitrovic4925
    @markomitrovic4925 a year ago

    Thank you for the explanation :)

    • @elliotwaite
      @elliotwaite a year ago

      I'm glad you liked it.

  • @andreborgescavalcante4589
    @andreborgescavalcante4589 2 years ago

    Amazing video.

  • @케이케이-u8y
    @케이케이-u8y 3 years ago

    Hi Elliot, very good video. I have a question about 14:39 in your video.
    I define def d_hook(grad): grad *= 100
    After replacing e = c + d with e = c * d, I wrote d.register_hook(d_hook) and it affects c: the c gradient = 100.
    But when I use e = c + 1 * d with the same hook function (grad *= 100) and write d.register_hook(d_hook), it does not affect c: the c gradient = 1.
    I don't know why there is such a difference.

    • @elliotwaite
      @elliotwaite 3 years ago +1

      When going backwards through c + d, the same gradient gets passed to both c and d, so multiplying the d gradient in-place by 100 will also affect the c gradient. When going backwards through c + 1 * d, the same gradient is passed to both c and the 1 * d term, and then the gradient passed to the 1 * d term is multiplied by 1 to get the gradient for d, and this multiplication by one ends up creating a new tensor, so now when you update the d gradient in-place it is no longer the same tensor used for the c gradient, so it doesn't affect the c gradient.
      I hope that makes sense. Let me know if that doesn't answer your question.

    • @케이케이-u8y
      @케이케이-u8y 3 years ago +1

      @elliotwaite Thanks, your answer is really great.

  • @rayanzaki9314
    @rayanzaki9314 4 years ago +1

    Awesome explanation. It was really very helpful. Could you make a video on quantization in PyTorch? Thanks

    • @elliotwaite
      @elliotwaite 4 years ago

      Thanks! Glad you liked it. And thanks for the suggestion. I'll add that to my list of potential future videos. Quantization is something I also want to learn more about at some point, and I'll probably make a video about it when I do.

  • @jonatan01i
    @jonatan01i 4 years ago

    Thank you for this video, very informative, helped me a lot! Thanks!

  • @sudhirdeshmukh8445
    @sudhirdeshmukh8445 3 years ago

    Hi Elliot, thanks for yet another wonderful PyTorch video. I was just wondering why there is "@staticmethod" mentioned before the forward function of a module. Why use "@staticmethod" also when and where. REF: 15:21 in the video.
    Thank you

    • @elliotwaite
      @elliotwaite 3 years ago +1

      "@staticmethod" is just a function decorator that makes it so that when the method is called, its first argument isn't auto-filled with the value of the instance that is calling the method, or in other words, it just makes it so that the first parameter of the method doesn't have to be "self". You can use it on any methods where the "self" parameter is not used, in which case you can add the "@staticmethod" decorator and remove the "self" parameter. Some tests show that using it when appropriate provides a tiny performance boost, but I think it mostly just makes it cleaner in the sense that you don't list any unused parameters. You can find more info about it here: docs.python.org/3/library/functions.html#staticmethod

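A plain-Python sketch of the difference between a regular method and a static method, as described in the reply above. `Scaler` and its methods are hypothetical names used only for illustration.

```python
class Scaler:  # hypothetical class, for illustration only
    def __init__(self, factor):
        self.factor = factor

    def scale(self, x):
        # Regular method: `self` is auto-filled with the calling instance.
        return x * self.factor

    @staticmethod
    def double(x):
        # Static method: there is no `self`, and nothing is auto-filled.
        return x * 2

s = Scaler(3)
scaled = s.scale(5)                    # uses s.factor
doubled_via_instance = s.double(5)     # the instance is NOT passed in
doubled_via_class = Scaler.double(5)   # can also be called on the class

print(scaled, doubled_via_instance, doubled_via_class)
```
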
  • @pouyaparsa5851
    @pouyaparsa5851 4 years ago +2

    Nice job, thank you.
    I wonder, is there any way to see these nodes in code and print their properties, like where they point to and so on?
    In other words, could we go any further than knowing grad_fn is actually there?

    • @elliotwaite
      @elliotwaite 4 years ago +1

      Yeah, you can inspect some things, like the nodes in the backwards graph. I advise using a debugger. I use the one in PyCharm. The grad_fn property will point to a node in the backward graph, and then that node will have a next_functions property which will be a tuple of tuples that contain other nodes in the backward graph, and so on. For example:
      a = torch.tensor(2.0, requires_grad=True)
      b = (a * 3) * 4
      print(b.grad_fn.next_functions[0][0].next_functions[0][0])
      # Will print out something like: <AccumulateGrad object at 0x...>
      print(b.grad_fn.next_functions[0][0].next_functions[0][0].variable)
      # Will print out: tensor(2., requires_grad=True)
      The first print function will work its way back through the backward graph until it gets to the AccumulateGrad node for the `a` tensor, so it will print out that AccumulateGrad node object. And the second print statement will print out the variable associated with that AccumulateGrad node, which is the `a` tensor, so it will print out the `a` tensor.
      However, the things you can access are limited. For example, I don't think you can access which tensors are associated with the intermediate nodes, like the MulBackward0 nodes, since I think that information is stored on the C++ side of things.
      Good question. Thanks, Pouya!
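
Building on the `next_functions` chain described above, a short recursive walk can list every reachable node. `walk_backward_graph` is a hypothetical helper written for this sketch, not a PyTorch function.

```python
import torch

def walk_backward_graph(node, depth=0, names=None):
    """Collect (depth, node type name) for every node reachable from `node`."""
    names = [] if names is None else names
    if node is None:  # inputs that don't require grad show up as None
        return names
    names.append((depth, type(node).__name__))
    for next_node, _input_index in node.next_functions:
        walk_backward_graph(next_node, depth + 1, names)
    return names

a = torch.tensor(2.0, requires_grad=True)
b = (a * 3) * 4

names = walk_backward_graph(b.grad_fn)
for depth, name in names:
    print("  " * depth + name)
```

For graphs where a node is reachable along several paths, this simple version visits it once per path; tracking visited nodes in a set would avoid that.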

    • @pouyaparsa5851
      @pouyaparsa5851 4 years ago +1

      @elliotwaite thanks for this perfect answer!

  • @az8134
    @az8134 3 years ago +1

    damn, I never looked into those details before.

  • @TheMazyProduction
    @TheMazyProduction 4 years ago +2

    What do you use to make these diagrams?

    • @elliotwaite
      @elliotwaite 4 years ago +2

      For this one I used Figma. I learned from this tutorial: th-cam.com/video/OM-lTzFm9JQ/w-d-xo.html

    • @TheMazyProduction
      @TheMazyProduction 4 years ago +1

      @elliotwaite Perfect, I needed something like this to make flowcharts!

  • @thevikinglord9209
    @thevikinglord9209 2 years ago

    Nice video, so how do you check out the graph ?

    • @elliotwaite
      @elliotwaite 2 years ago

      Do you mean how do you check out the backward graph of the code you wrote? If so, I explored the backward graph by using a debugger and looking at the `grad_fn` property on the tensors.

  • @ハェフィシェフ
    @ハェフィシェフ 2 years ago

    Do you make the diagrams yourself or did you write code for it?

    • @elliotwaite
      @elliotwaite 2 years ago +1

      I made them myself using Figma.

  • @PankajGupta-ki9gx
    @PankajGupta-ki9gx 3 years ago

    Can you make a full-fledged series, in the form of a playlist, covering various PyTorch functionalities and built-in classes? The documentation is quite tough to interpret for custom datasets.
    Thank You

    • @elliotwaite
      @elliotwaite 3 years ago

      Thanks for the suggestion. I haven't been interested in making PyTorch videos lately, but I'll add your recommendation to my potential future videos ideas list.

  • @mariogalindoq
    @mariogalindoq 3 years ago

    Elliot: very interesting, but could you give examples of using hooks? That is, why would it be useful to use hooks? In which problems do you need hooks? What can be done with hooks that can't be done another way?

    • @elliotwaite
      @elliotwaite 3 years ago

      Good question. I probably should have included more specific use cases in the video. The best use case I can think of for hooks is when you want to change how data flows through an existing module (so these would be using the module style hooks, `register_forward_hook` and `register_forward_pre_hook`). For example, these hooks were used in PyTorch's library to implement these tools:
      Quantization: pytorch.org/docs/stable/quantization.html
      Pruning: pytorch.org/tutorials/intermediate/pruning_tutorial.html
      Spectral Norm: pytorch.org/docs/stable/generated/torch.nn.utils.spectral_norm.html
      Weight Norm: pytorch.org/docs/stable/generated/torch.nn.utils.weight_norm.html
      You can download the PyTorch library and search the source code for "register_forward_" to see how they were used.
      Good use cases for the Tensor hooks are harder to think of, but they could be used to experiment with novel ways of managing the gradients as they flow through the backward graph. For example, to do gradient clipping, most people just clip the gradients at the end of the backward pass, but maybe it would be better in certain cases to clip the gradients as they are flowing through the backward graph instead, which could be done with Tensor hooks (that might actually be a good experiment to try 🤔).
      I personally haven't ever run into a situation where I needed to use them in the projects I've worked on, but it's good to know they're there if needed, and I wanted to explain them since my viewers had asked about them.
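
A rough sketch of the "clip while flowing backward" idea mentioned above, using a Tensor hook on an intermediate tensor. The values and the clipping range of [-1, 1] are arbitrary choices made for illustration.

```python
import torch

x = torch.tensor([0.5, 2.0, -3.0], requires_grad=True)
h = x * 10  # intermediate tensor in the graph

# Clamp the gradient as it passes through h, before it reaches x.
h.register_hook(lambda grad: grad.clamp(min=-1.0, max=1.0))

loss = (h * torch.tensor([0.1, 5.0, -7.0])).sum()
loss.backward()

# Gradient arriving at h is [0.1, 5, -7]; the hook clamps it to [0.1, 1, -1];
# it is then multiplied by 10 on its way back to x.
print(x.grad)
```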

  • @nezgi8220
    @nezgi8220 3 years ago

    What about stacking a and b into another tensor? How are grads calculated if many of these small grad-requiring tensors are stacked into one big tensor?

    • @elliotwaite
      @elliotwaite 3 years ago

      In the backward pass, the gradient will get distributed to each of the tensors that were stacked together, only passing along to each tensor the part of the gradient that corresponds with that tensor.
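
A small sketch of how a gradient is split back across stacked tensors. The tensors and the `weights` matrix are made up for illustration.

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([3.0, 4.0], requires_grad=True)

stacked = torch.stack([a, b])  # shape (2, 2): row 0 is a, row 1 is b
weights = torch.tensor([[10.0, 20.0],
                        [30.0, 40.0]])
(stacked * weights).sum().backward()

# Each input receives only the slice of the gradient for its own row.
print(a.grad, b.grad)
```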

    • @nezgi8220
      @nezgi8220 3 years ago

      @elliotwaite Indeed, it is. I tested it empirically. What a miracle!

  • @rachelliu7253
    @rachelliu7253 2 years ago

    Thanks so much

    • @elliotwaite
      @elliotwaite 2 years ago

      You're welcome. Thanks for the comment.

  • @ravivaishnav20
    @ravivaishnav20 4 years ago +1

    Awesome explanation. Could you please give some intuition on data parallelism in PyTorch? And is there any way we can use a Colab GPU together with our laptop GPU?

    • @elliotwaite
      @elliotwaite 4 years ago

      Thanks!
      I'll add data parallelism in PyTorch to my list of potential future videos (but not sure when I might get around to making it). For RL tasks, I've just been using MPI for Python (mpi4py.readthedocs.io/en/stable/). An example of it being used can be seen in OpenAI's Spinning Up in Deep RL code:
      github.com/openai/spinningup/blob/master/spinup/utils/mpi_pytorch.py
      github.com/openai/spinningup/blob/master/spinup/utils/mpi_tools.py
      github.com/openai/spinningup/blob/master/spinup/algos/pytorch/vpg/vpg.py
      And there's also PyTorch's built-in distributed training tools, but I haven't dived into those much yet.
      Using a colab GPU and your laptop's GPU in parallel should be possible, but I'm not sure the details of how you would get it to work. I would imagine you'd establish a way to communicate between the two processes running on the separate machines, then you synchronize the models at the start of training, and then each model computes the gradients for a separate batch of data, and then those gradients would get averaged using the communication method before using them to update the models.

  • @Aditya-ne4lk
    @Aditya-ne4lk 4 years ago +1

    a.grad will have the same shape as a, correct?

    • @elliotwaite
      @elliotwaite 4 years ago +1

      Yep. The gradient tensor will have a gradient value corresponding to each of the values in the A tensor, so it will have the same shape as A.

  • @anas.2k866
    @anas.2k866 2 years ago

    Thanks. So we can't track the gradient in the backward pass when it is inside a module? Is there any other way?

    • @elliotwaite
      @elliotwaite 2 years ago

      After I made this video, PyTorch added a new hook called register_full_backward_hook() that works on modules. It is called whenever the gradient with respect to the inputs is computed. The docs for it are here:
      pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_full_backward_hook
      However, if you are asking about tracking the gradients inside the module, and not just the gradients of the inputs, I think you would have to update the actual module code by adding calls to register hooks on the intermediate tensors.

    • @anas.2k866
      @anas.2k866 2 years ago

      @elliotwaite Ah, thank you. So if I put this hook on layer 5 of my multilayer perceptron and I launch loss.backward(), the grad_input is the gradient of the loss with respect to the weights and bias of layer 5, isn't it? And what is the grad_output? Thanks again for your huge effort!!!!

    • @elliotwaite
      @elliotwaite 2 years ago

      @anas.2k866 hooks are usually used to intercept gradients in places where you wouldn't otherwise have access to them.
      The gradient of the loss with respect to the weights and bias of layer 5 will already be accessible through the `.grad` attribute of the weight/bias tensors of that module.
      Registering a hook on a module would be used for something else. It would be used to access the gradients just before they enter the module in the backward pass and just after they leave the module in the backward pass. The gradients just before entering the module in the backward pass will be the `grad_output` value of the hook function, because those will be the gradients with respect to the output of the module. And after those gradients flow backward through the module, you'll get the `grad_input` values, the gradients of the loss with respect to the inputs of the module. The variable names `grad_input` and `grad_output` are using input/output to refer to input/output of the forward pass, which is why they are the reverse names of what they are when flowing backward through the graph (`grad_output` is the input gradient in the backward pass and `grad_input` is the output gradient in the backward pass).
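
A minimal sketch of `register_full_backward_hook()` showing which gradient is which. It assumes a PyTorch version that provides this method (it was added after the video was made); the layer sizes and the `captured` dict are illustrative choices.

```python
import torch
from torch import nn

captured = {}

def backward_hook(module, grad_input, grad_output):
    # grad_output: gradients w.r.t. the module's forward OUTPUT
    #              (they enter the module first during the backward pass).
    # grad_input:  gradients w.r.t. the module's forward INPUT
    #              (they leave the module during the backward pass).
    captured["grad_output"] = grad_output[0].clone()
    captured["grad_input"] = grad_input[0].clone()

layer = nn.Linear(3, 2)
handle = layer.register_full_backward_hook(backward_hook)

x = torch.randn(4, 3, requires_grad=True)
out = layer(x)
out.sum().backward()
handle.remove()  # detach the hook once it is no longer needed

print(captured["grad_output"].shape)  # matches out:  (4, 2)
print(captured["grad_input"].shape)   # matches x:    (4, 3)
```

The weight and bias gradients are not part of either tuple; they are read from `layer.weight.grad` and `layer.bias.grad` as usual.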

    • @anas.2k866
      @anas.2k866 2 years ago

      @elliotwaite Ah, OK. So grad_output is the gradient of the loss with respect to the output of the module, which in my case is the gradient of the loss with respect to the activations of the neurons in layer 5. And grad_input is the gradient of the loss with respect to the activations of the neurons in layer 4?

    • @elliotwaite
      @elliotwaite 2 years ago +1

      @anas.2k866 Yep

  • @BlackHermit
    @BlackHermit 3 years ago

    So, still no answer as to why the 0 is there in "MulBackward0"?

    • @elliotwaite
      @elliotwaite 3 years ago +2

      I think I just figured it out. It allows for the same operation to be called in multiple ways (function overloading), and each different overloaded way of calling that operation gets a different index number for its backward version.
      The part of the PyTorch library that generates these backward operation names can be found here (and the comment above the code also describes that this is done to de-duplicate overloaded operation names):
      github.com/pytorch/pytorch/blob/master/tools/autograd/load_derivatives.py#L355
      For example, the `min` operation can be called in multiple ways. In the code below, I call the `min` operation in two of these different ways, and the resulting backward operations associated with different output tensors end up having different index numbers. "torch.min(a)" generates a MinBackward1 operation, and "torch.min(a, dim=0, keepdim=False)" generates a MinBackward0 operation.
      Code example:
      import torch
      a = torch.tensor([2.0, 3.0], requires_grad=True)
      b = torch.min(a)
      (c, c_indices) = torch.min(a, dim=0, keepdim=False)
      print(b)  # Prints: tensor(2., grad_fn=<MinBackward1>)
      print(c)  # Prints: tensor(2., grad_fn=<MinBackward0>)

    • @BlackHermit
      @BlackHermit 3 years ago +1

      @elliotwaite Oh, interesting. Thanks!

  • @jonatan01i
    @jonatan01i 4 years ago

    Besides the inplace operation on the grads,
    is there any difference between
    - using hooks
    and
    - going through the model.parameters()'s grads
    in order to modify them before a call on optimizer.step()?

    • @elliotwaite
      @elliotwaite 4 years ago +1

      As far as I'm aware, there won't be any difference. When you call optimizer.step(), the optimizer will only be concerned with what the current grad values of the parameters are, and it won't matter how those grad values were assigned.
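
A sketch comparing the two approaches from the question: scaling a gradient with a hook versus editing `.grad` after `backward()`. The parameter values, the 0.5 scale factor, and the use of plain SGD are assumptions chosen for illustration.

```python
import torch

def make_param():
    return torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Option 1: scale the gradient with a Tensor hook during the backward pass.
p_hook = make_param()
p_hook.register_hook(lambda grad: grad * 0.5)
(p_hook ** 2).sum().backward()

# Option 2: run backward normally, then modify .grad before optimizer.step().
p_manual = make_param()
(p_manual ** 2).sum().backward()
p_manual.grad.mul_(0.5)

grads_match = torch.equal(p_hook.grad, p_manual.grad)

# The optimizer only reads the current .grad values, so both end up the same.
for p in (p_hook, p_manual):
    torch.optim.SGD([p], lr=0.1).step()

print(grads_match, p_hook, p_manual)
```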

    • @jonatan01i
      @jonatan01i 4 years ago +1

      @elliotwaite Makes sense, thank you!

  • @michpo1445
    @michpo1445 a year ago

    What is the program you're using to graphically design PyTorch code?

    • @elliotwaite
      @elliotwaite a year ago

      I make the designs with Figma.

    • @michpo1445
      @michpo1445 a year ago

      @elliotwaite Thanks, but to clarify, does this tool create the PyTorch code for you, or do you just use it to graphically represent what you are coding?

    • @elliotwaite
      @elliotwaite a year ago +1

      @michpo1445 it wasn't auto-generated, I just designed the slides by hand to match the info I was seeing in the Python debugger.

  • @liweidai4474
    @liweidai4474 3 years ago

    There is a register_full_backward_hook() method now, which is recommended over register_backward_hook(). You can check it out. One thing that bothers me is that the documentation clearly says that modifying inputs or outputs in-place is not allowed when using backward hooks and will raise an error. But what if I have a predefined model that has some in-place operators, and I just want to know the grads w.r.t. the tensors before and after the in-place op? Is there any way to accomplish this other than modifying my code to not use in-place ops? How about the register-hook method for tensors?

    • @elliotwaite
      @elliotwaite 3 years ago +1

      Thanks for letting me know about the new register_full_backward_hook() method. I've added a note about this to the video description.
      And about your question, I don't know the answer to this one.

  • @AmitYadav-zs4ft
    @AmitYadav-zs4ft 3 years ago

    Hi, what are you using to display those CPU/memory specs in your upper right corner? On Linux we have System Monitor, but I am looking for an alternative on Mac. Thanks

    • @elliotwaite
      @elliotwaite 3 years ago +1

      iStat Menus is the one I use: bjango.com/mac/istatmenus/

  • @Jimmy-et1bp
    @Jimmy-et1bp 3 years ago

    How do forward_pre_hook and forward_hook affect the a, b, c gradients?

    • @elliotwaite
      @elliotwaite 3 years ago

      Any operations performed within the forward_pre_hook or forward_hook functions will affect the gradients the same as any computations performed in the module's forward method. It's almost as if you are just inserting the forward_pre_hook function's code into the beginning of the forward method, and inserting the forward_hook function's code into the end of the forward method (forward_hook probably should have been named forward_post_hook, I'm not sure why it wasn't).
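
A minimal sketch of the "inserted at the beginning and end of forward" picture described above. The `Double` module, both hook functions, and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import torch
from torch import nn

class Double(nn.Module):  # hypothetical module, for illustration only
    def forward(self, x):
        return x * 2

def pre_hook(module, inputs):
    # `inputs` is a tuple of the positional args; the return value replaces it.
    return (inputs[0] + 1,)

def post_hook(module, inputs, output):
    # The return value replaces the module's output.
    return output * 10

m = Double()
m.register_forward_pre_hook(pre_hook)
m.register_forward_hook(post_hook)

x = torch.tensor(2.0, requires_grad=True)
y = m(x)      # ((2 + 1) * 2) * 10 = 60
y.backward()  # dy/dx = 1 * 2 * 10 = 20

print(y, x.grad)
```

The operations inside both hooks are recorded by autograd like any other forward computation, which is why they show up in `x.grad`.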

  • @randomforrest9251
    @randomforrest9251 3 years ago

    So why is it called MulBackward0?

    • @elliotwaite
      @elliotwaite 3 years ago +2

      I recently figured it out. It allows for the same operation to be called in multiple ways (function overloading), and each different overloaded way of calling that operation gets a different index number for its backward version.
      The part of the PyTorch library that generates these backward operation names can be found here (and the comment above the code also describes that this is done to de-duplicate overloaded operation names):
      github.com/pytorch/pytorch/blob/master/tools/autograd/load_derivatives.py#L565
      For example, the `min` operation can be called in multiple ways. In the code below, I call the `min` operation in two of these different ways, and the resulting backward operations associated with different output tensors end up having different index numbers. "torch.min(a)" generates a MinBackward1 operation, and "torch.min(a, dim=0, keepdim=False)" generates a MinBackward0 operation.
      Code example:
      import torch
      a = torch.tensor([2.0, 3.0], requires_grad=True)
      b = torch.min(a)
      (c, c_indices) = torch.min(a, dim=0, keepdim=False)
      print(b)  # Prints: tensor(2., grad_fn=<MinBackward1>)
      print(c)  # Prints: tensor(2., grad_fn=<MinBackward0>)

    • @randomforrest9251
      @randomforrest9251 3 years ago +1

      @elliotwaite thank you a lot!

  • @MaximYudayev
    @MaximYudayev 3 years ago

    Hi Elliot. Great videos. Sub’d :) It would be outstanding if we could dive into customization of the quantization workflow! Things like making custom modules compatible with the fusing and quantization workflows, as well as expanding the data type formats. Thank you!

    • @elliotwaite
      @elliotwaite 3 years ago

      Thanks. Glad you liked the videos.
      So far I've only briefly looked into PyTorch's quantization capabilities, but it looks interesting. I'm not sure if I'll ever get around to making a video about it, since I've been more focused on learning JAX these days, but I'll add the idea to my list of potential future YouTube videos. Thanks for the recommendation.

  • @shvprkatta
    @shvprkatta 4 years ago

    Thanks a ton Elliot!...it would take a lot of time to understand the concepts otherwise...

    • @elliotwaite
      @elliotwaite 4 years ago +1

      Thanks! Glad you found it helpful.