I appreciate that you followed the official tutorial from the Pytorch documentation. Also your comments make things much clearer
Thanks! Glad you like it
what an amazing teacher you are
This is great work. I really like the detailed explanation using simple practical examples. Please continue doing this exploring the various options in pytorch. Happy holidays
Thank you! Glad you like it. I have more videos planned for the next few days :) Happy holidays!
Simply amazing, Appreciate your efforts and making it available to the public :)
I appreciate your efforts, you readily explained every detail. Thanks very much.
🙏
@ 8:00 what is the difference between z.backward() vs z.backward(v)?
z.backward() ==> calculates dz/dx
z.backward(v) ==> calculates dz/dv ?
Also, how do we decide the value for v?
z.backward(v) calculates dz/dx*v
@@alanzhu7538 You got any idea now?😂
@@gautame You mean v * (dz/dx)? Or dz/(dx * v)?
@@peixinwu4631 v * (dz/dx)
This tutorial is brilliant. It is super friendly to people who are new to Pytorch!
glad you like it!
Thank you! Your video helps a lot to my undergraduate final project!
that's nice to hear :)
This video is super helpful! Explanations are very understandable. Thank you so much!!🤩👍🙏
@ 8:16 do you have a simple real-life "example" of why we have to use v with the right size, and why it wouldn't work without it? I know it's a silly question, but I don't really grasp the concept behind it.
Fantastic work!
slight recommendation on how to improve this: if you use the same naming scheme in the Jacobian and in the code (l vs z), we can follow the chain rule more easily!
Very well done. An excellent video!
Thanks !
This video helps me greatly. I like your language speed since English is not my mother tongue. Thank you a lot.
Nice, glad you like it
This video is extremely useful. Thank you!
Excellent video. Thanks!
Great Work Sir.
This video is very clear in explaining things. Thank you so much sir! Keep up the good work pls!
This tutorial is more than great! Thank you!
Very clear, thank you very much
Interesting thing that I realized: even though setting z = z.mean() changes the grad_fn of z from MulBackward to MeanBackward (so z doesn't have MulBackward anymore), it is still able to track the gradient.
if requires_grad=True, any operation that we do with z tracks the gradient
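A quick sketch of that behavior (the printed grad_fn names are what current PyTorch versions show):

```python
import torch

x = torch.ones(3, requires_grad=True)
z = x * 2
print(type(z.grad_fn).__name__)  # MulBackward0
z = z.mean()                     # rebinding z replaces its grad_fn...
print(type(z.grad_fn).__name__)  # MeanBackward0
z.backward()                     # ...but the MeanBackward node still points at
print(x.grad)                    # the old MulBackward node, so the whole chain
                                 # is backpropagated: d(mean(2x))/dx = 2/3
```

Rebinding the Python name z doesn't break the graph; the new MeanBackward node keeps a reference to the old MulBackward node internally.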
Great work Sir, and Thank you!
@5:46 in the image it is J^T . V
thankyou for your efforts
why can you just specify v as [0.1, 1.0, 0.001], why not some other numbers?
Did you find out?
Only the shape needs to match z. The values of v do affect the result, though: backward(v) computes Jᵀ·v, so v weights the gradient.
how did you decide values in vector v?
The gradient at 5:46 is the multiplication of the Jacobian matrix and a directional vector. When the function whose gradient you want to calculate has a one-dimensional output, there is no need to specify a direction for the derivative since it is unique; that's why for the mean the gradient accepts no arguments. On the other hand, for a multidimensional output such as y = x + 2 (y is a vector), you have to specify in which direction you want to take your gradient. That is where the vector v enters. He arbitrarily chose the components just to show that the function requires a vector to define the directional derivative, but in statistical learning the direction where the gradient is steepest is usually chosen.
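That explanation can be sanity-checked in a few lines. For y = x + 2 the Jacobian is the identity matrix, so backward(v) should hand back exactly Jᵀ·v = v (the tensor values here are just illustrative):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x + 2                          # Jacobian dy/dx is the 3x3 identity matrix
v = torch.tensor([0.1, 1.0, 0.001])
y.backward(v)                      # computes J^T @ v
print(x.grad)                      # tensor([0.1000, 1.0000, 0.0010]) -> exactly v
```

So the values of v are not arbitrary in effect: they weight each output component's contribution to the accumulated gradient.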
I don't get why you need the 3 methods presented at 8:06 for preventing gradient tracking. Can one not simply set requires_grad=False on the x tensor?
very nice, congrats!
Thank you very much for your tutorial. However, I did not fully understand some details, like the variable v or the use of the optimizer.
Great work!
Quite love these courses! Great thanks
Thank you :)
awesome! Thank you
Just on to my 4th lecture in the PyTorch series. Don't know if it's a complete series on PyTorch, but whatever is there is definitely depicted nicely.
nice, yes it's a complete beginner series
At 9:30, isn’t another way to do x.volatile = False ?
All in one pytorch.. yeahhh.. fantastic.. thanks a ton🎉🎊🎊
glad you like it!
great work thank you
Great tutorial! Thanks for this material!
thanks :)
very nice tutorial. thank you!
thanks :)
Thanks for the video. for the tensor v, are the values of any importance or its only the size that is of importance?
Hi, good question! Yes, this can be a bit confusing, so I provided some links from the PyTorch forum which might help:
discuss.pytorch.org/t/how-to-use-torch-autograd-backward-when-variables-are-non-scalar/4191/4
discuss.pytorch.org/t/clarification-using-backward-on-non-scalars/1059
@@patloeber I can't find anyone describing how to use it there; are you able to give a quick summary? Help would be much appreciated, as I can't find anything, including the values 0.1, 1, 0.001.
@@sebimoe Apparently he doesn't know either....Typical
In my understanding, the shape is what has to match, so you can write v = torch.randn(3) and pass v to the backward function. The values still scale the resulting gradient, though, since backward(v) computes Jᵀ·v.
Did anyone find a good explanation for v? I'm also a bit confused here.
@ 8:00 what is the difference between z.backward() vs z.backward(v)?
z.backward() ==> calculates dz/dx
z.backward(v) ==> calculates dz/dv ?
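A minimal sketch of the difference: backward() with no argument only works when the output is a scalar; a non-scalar output needs the vector argument, and the result is still dz/dx, just weighted by v (the values of v here are only illustrative):

```python
import torch

x = torch.ones(3, requires_grad=True)
z = x * x * 2                     # non-scalar output: 3 elements

try:
    z.backward()                  # fails: no implicit gradient for non-scalars
except RuntimeError as e:
    print("needs a vector:", e)

v = torch.tensor([0.1, 1.0, 0.001])
z.backward(v)                     # computes v^T . J, i.e. dz/dx weighted by v
print(x.grad)                     # tensor([0.4000, 4.0000, 0.0040])
                                  # dz/dx = 4x = 4 at x = 1, scaled by v
```

So z.backward(v) is not dz/dv; it is still the gradient with respect to x, with v selecting how the output components are combined.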
Really amazing and very well explained ! Thank You .
Btw what IDE are you using ? looks so cool and handy , love the output option below .
Visual Studio Code
Exactly :)
thank you
When do you learn the why of things ? why am I making a gradient ? when would I use it ? I feel like these things are often explained in DS videos/classes
take a look at the ML and DL courses by Andrew Ng.
I just discovered your channel and it's really good! One question: I don't totally understand why we should use .detach() or no_grad() when updating weights... are we creating a new graph or something like that? What does "prevent tracking the gradient" exactly mean? Hope you can help me with that. Keep up the good work (:
You should use this, for example, after training when you evaluate your model, because otherwise PyTorch tracks every operation you do with your tensor in order to calculate gradients. This is time-consuming and expensive, and after training we no longer need backpropagation. Disabling it reduces memory usage and speeds up computations.
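A tiny sketch of what "not tracking" looks like in practice, using two of the options from the video:

```python
import torch

w = torch.ones(4, requires_grad=True)

y = w * 2
print(y.requires_grad)     # True -> PyTorch built graph nodes for this op

with torch.no_grad():      # option 1: temporarily switch tracking off
    y = w * 2
print(y.requires_grad)     # False -> no graph was built, cheaper

y = w.detach() * 2         # option 2: use a copy that is cut out of the graph
print(y.requires_grad)     # False
```

Either way, nothing is recorded for backpropagation, so no extra memory is spent on intermediate values.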
Great PyTorch Tutorial videos! May I know what is the VS Code extension you use to autocomplete the PyTorch line?
In this video it was only the built-in autocompletion through the official Python extension
Amazing
Really good tutorial, but I wanna know what the app which you use to draw is.
I used an Ipad for this (some random notes app)
Thank you!!
thanks for watching
If I want to play with it: suppose we have y = 3x, so we should get the answer 3 for the gradient. How can I do that in PyTorch?
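That experiment is only a few lines (a sketch; x here is a single-value tensor, and the starting value 2.0 is arbitrary):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = 3 * x
y.backward()          # computes dy/dx
print(x.grad)         # tensor(3.) -> the slope of y = 3x, independent of x
```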
Getting this error,
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
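That error usually comes from calling backward() twice on the same graph. A minimal reproduction with the fix the message suggests:

```python
import torch

x = torch.ones(3, requires_grad=True)
z = (x * 2).sum()

z.backward(retain_graph=True)  # keep the saved tensors for another pass
z.backward()                   # second pass now works; gradients accumulate
print(x.grad)                  # tensor([4., 4., 4.]) -> 2 + 2 from the two passes
```

Often the cleaner fix is to recompute z inside the loop, so that each backward() call has a fresh graph and retain_graph is unnecessary.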
I had a German professor who also pronounced "value" as "ualue". Can you explain why you sometimes pronounce it that way? I am very intrigued.
P.S. you have the best pytorch series on youtube
pretty good
Thanks!
This is so f**king useful, thank you sooo much!!
I can't understand why the gradient must be accumulated (grad += gradient) in each epoch. Where can I find a consistent mathematical formula? Can you explain it to me?
I'm not exactly sure what you mean. Can you point me to the time in the video where I show this?
When I run the following code, I encountered an error. Could you help me? Thank you very much!
weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD(weights, lr=0.01)
The error is
Traceback (most recent call last):
File "D:/1pytorch-tutorial/my_try/learning-PythonEngineer/learning.py", line 113, in
optimizer = torch.optim.SGD(weights, lr=0.01)
File "D:\Anaconda3\lib\site-packages\torch\optim\sgd.py", line 68, in __init__
super(SGD, self).__init__(params, defaults)
File "D:\Anaconda3\lib\site-packages\torch\optim\optimizer.py", line 39, in __init__
torch.typename(params))
TypeError: params argument given to the optimizer should be an iterable of Tensors or dicts, but got torch.FloatTensor
torch.optim.SGD([weights], lr=0.01)
That’s good, ja? :)
Hey, could you explain what it means to track the gradient? I mean, why is it an issue?
Tracking each operation is necessary for backpropagation. But it is expensive, so after training we should disable it.
in the end...it became heavy
I'm not sure how you managed to be unclear on the third video of the series.
What you said about gradients, .backward(), .step(), and .zero_grad() was not clear at all.
Too many advertisements
Great work!