👉 Check out the blog post and other resources for this video:
🔗 deeplizard.com/learn/video/ozpv_peZ894
After 3 years of studying engineering, you have officially blown my mind with extracting classes. I had never done this before, but I know for damn sure my future code will look 👌 clean and professional
Thank you so much for sharing your knowledge. It's because of people like you guys that science can continue moving forward. I have studied in different sources, web pages, books, videos, and graduate classes and your content is the best I have ever found.
Thanks for making this!
7:35 I really like this tree diagram showing the unfolding relationship of hierarchical layers of code into each extracted class. It's like a map that shows each class and the methods that belong to it. Visualizing this diagram helps me keep in context exactly where I am in the program overall, and what specific part I'm working on. Thank you, guys!
Hey RedShipsofSpainAgain - I feel the same way. Thank you for letting me know. It's good to see a long-time sub in the comments!
Thanks. I have extended the example to incorporate test loop and model checkpointing for my use case.
8:49 When using the add_graph method, you pass network, which makes sense, but why are the images passed as the second argument?
I like the question. The code doesn't work if we don't also pass the images. I think they pass the batch through the network to generate the graph. The graph is stored on the output tensor, not in the network itself.
The GridsearchCV equivalent in pytorch?
Haha. It looks like it! I didn't know about that. A few more lines of code and this would be that. 😆 I like the term "parameter space". I should have used it. That's what the run builder gives us. My intention is for people to use this to experiment and easily discover relationships between parameter values, as opposed to trying to optimize this way. It's hard to keep track of all the tweaks and changes we make when just starting out.
@@deeplizard lol I use it to get more insights into parameter space.
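For anyone curious, the RunBuilder pattern discussed here can be sketched in a few lines. This is a from-memory reconstruction of the idea, not necessarily the exact class from the video:

```python
from collections import namedtuple
from itertools import product

class RunBuilder:
    @staticmethod
    def get_runs(params):
        # One namedtuple field per hyperparameter name.
        Run = namedtuple('Run', params.keys())
        # Cartesian product: one Run per combination of values.
        return [Run(*values) for values in product(*params.values())]

params = dict(lr=[.01, .001], batch_size=[1000, 10000])
runs = RunBuilder.get_runs(params)
# Four runs: every lr paired with every batch_size.
```

Each run is then a named, self-describing point in the parameter space, which is what makes the sweeps in the video easy to log.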
13:07 - can we pull the Network() instantiation out of the for loop?
Hey Shreyas - The network needs to stay inside the loop so that each run starts with a new set of weights. Remember, the network's weights are initialized when the network is created. It's not strictly required, though: if the network is outside the loop, the same weights will be used for each run. That even sounds a bit interesting!
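A torch-free sketch of that difference, where TinyNetwork stands in for an nn.Module whose weights are randomly initialized in the constructor:

```python
import random

class TinyNetwork:
    """Stand-in for an nn.Module: weights are set once, at construction."""
    def __init__(self):
        self.weights = [random.uniform(-1, 1) for _ in range(3)]

runs = ['run1', 'run2']

# Inside the loop: every run gets its own fresh random weights.
fresh = [TinyNetwork() for _ in runs]

# Outside the loop: one object, so every run continues from the same weights.
shared = TinyNetwork()
reused = [shared for _ in runs]
```

The second variant is the "interesting" case mentioned above: each run would continue training the weights left behind by the previous run.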
I am propagating the loss backward and zeroing the gradient for every batch. At the end of the epoch, when I try to plot a histogram for param.grad, it throws an error because it is NoneType.
Thanks for your video. I have a question, please: why didn't you clear the value of self.run_count in the end_epoch function? If, for instance, our epoch count is more than 1, won't self.run_count continue counting from where it stopped in the previous epoch?
Hey lpos lpos - You are correct. This behavior is arbitrary. It can work either way. In general, we'd like to pick up where we left off. This is what happens as long as the program stays in memory. However, as an enhancement, we could create a load method that loads the state of the training process from disk when the program is first starting. In this way, we can track the stats of the training process as a whole. This is what we need. For example, we should be able to answer the question, how long did it take this network to train? How many epochs? etc. Hope this helps!
@@deeplizard thanks a lot
I want to do transfer learning for either NLP or Computer Vision case
Great work! How many videos do you expect are still left in this series?
As of now, working in an unplanned fashion. In general, it's open-ended.
Aiming to get people to a place where they can tinker and experiment.
I don't understand why you used the @torch.no_grad decorator before the _get_num_corrects method. I mean, it's not like you are making any changes to any tensors in that method, right?
Hey Houidi - Good point! The decorator is unnecessary in this situation. Probably was left over after doing some refactoring. Thanks!
Great tutorial! I have one question: When setting no_grad (before _get_num_correct()), when does this reset back to default? I think it only applies in this function because of the '@' symbol, but I'm not sure.
Also, I use a validation loop with model.eval() (I think it's so bn and dropout layers don't act the same as in training). Apart from not using optimizer.step() & loss.backward() in my loop (it's obvious why), do I also need to add torch.no_grad()? And if so, for how long does it apply?
Hey danny,
"I think it only applies in this function because of the '@' symbol, but I'm not sure."
You are correct.
"I think it's for bn and dropout layers to not act the same as in training"
You are correct.
"do I also need to add torch.no_grad()?"
Yes. It applies inside the code block where it is used.
Here is an example:
with torch.no_grad():
    preds = network2(images)  # Pass Batch
    loss = F.cross_entropy(preds, labels)  # Calculate Loss
Used with the @ symbol, torch.no_grad is a decorator that disables grad within the given function.
The use of torch.no_grad using the "with" keyword disables grad within the given code block.
See here: pytorch.org/docs/stable/torch.html#locally-disabling-gradient-computation
Hope this helps!
@@deeplizard So, ideally to free up some memory I should use both model.eval() and torch.no_grad() inside my eval loop.
That's right.
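To make the two forms from this thread concrete, here is a minimal side-by-side sketch. The doubling function is a made-up stand-in for network2 from the example above:

```python
import torch

@torch.no_grad()  # disables grad tracking for the whole function
def forward_no_grad(x):
    return x * 2

x = torch.ones(2, requires_grad=True)

y = forward_no_grad(x)
# y was produced with grad disabled, so it is detached from the graph.

with torch.no_grad():  # disables grad tracking for this block only
    z = x * 2

w = x * 2  # outside the block, tracking is back on
```

Both forms are local: as soon as the function returns or the with-block ends, gradient tracking resumes for subsequent operations.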
You could add an evaluation loop to the class. Make it compact, like Keras.
The tree-like structure was awesome. It gives a great overall view. Do you know any Python package that can do this automatically?
Hey saeed - Thanks for mentioning that. I don't. That would be cool.
Great series!!! Absolutely love it :) One question, shouldn't we add self.network = None to the end_run function? In my mind, we want to have a new network instance every time we use a different set of parameters. Otherwise, the network remembers previously learned things, doesn't it?
Hey Snowe - Glad you like the series 😊 You are correct. The network object does remember previously learned things. This is due to the weights inside the network. However, notice that the begin_run() method requires a network instance. The network is re-initialized inside the run loop before being passed to the begin_run() method, and so the network starts with fresh weights for each run.
@@deeplizard Wow, thank you for the quick response :) So the idea is that because we create all the separate run instances before we start with the main loop, we have a bunch of newly instantiated models that just have the same name? And then for each run, we take one of those previously instantiated models?
The networks are created as we are looping.
We have a list of runs. Now, for each run, we create a network by calling the Network() class constructor. This happens inside the main loop. On the line:
network = Network()
@@deeplizard That makes sense. So in the case where we test multiple different models, we create a new instance for every run in the line "network = networks[run.network].to(device)" right? Where "run" is used to refer to the already created model instances of the .get_runs method?
Yes. That's correct. You are thinking ahead. Later in the series, we do just that. :)
Great tutorials! :) One question though. I am not sure if I am correct but I think you are not actually calculating the weighted average loss. You pass the loader to the begin_run() method and set it as self.loader. In the track_loss() method you multiply the loss.item() with self.loader.batch_size, which will always be the same value. Please correct me if I am wrong.
Hey Snowe - Thank you 😃. The loss that we are passing to the track_loss method is coming from the cross_entropy loss function. This loss value is the weighted average loss for the batch (see the link below for details). Then, we multiply this average by the size of the batch, which gives us an approximate total loss for the batch. We sum these rough totals, and at the end of the epoch we divide by the training set size in the end_epoch method.
Thanks for pointing out that this is a bit crude. We could make it precise by setting the reduction method to 'sum' and dropping the multiplication bit inside the track loss method.
See the reduction parameter here:
pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#crossentropyloss
I hope this helps clarify these calculations.
Chris
@@deeplizard Thank you for your clarification! :) So if I understand correctly, the loss calculated by the cross_entropy loss function and the one we calculate and display in the table are not precisely the same? I know that this does not impact the weight updates since we use the cross_entropy loss for that. I am just curious ;)
That's correct. The loss in the table is for the entire epoch. The loss coming from the loss function is per batch.
@@deeplizard That I understand. But the loss for the entire epoch is higher than it actually is since we always multiply with the same batch size, correct? Say for example I have a dataset with 150 records and I use a batch size of 64. Then I have two batches of 64 and one batch of 22 in one epoch (64+64+22=150). But we always take the loss and multiply it with 64 and then divide it by the length of the data (150 in this example). However, this is not really a weighted average since we would have to multiply the loss of the last batch with 22 instead of 64. I hope this explanation makes sense :)
@Snowe - I see now. Your explanation makes sense. You are correct. This was later enhanced after the video was recorded. Check the updates section on the blog where there are some details about it.
Link: deeplizard.com/learn/video/ozpv_peZ894#commit-acacec9
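Plugging numbers into the 150-record example above shows the over-count. The per-batch mean losses here are made up:

```python
batch_sizes  = [64, 64, 22]      # 64 + 64 + 22 = 150 records
batch_losses = [0.9, 0.8, 0.7]   # hypothetical per-batch mean losses

n = sum(batch_sizes)

# Naive: always multiply by the loader's nominal batch size.
naive = sum(loss * 64 for loss in batch_losses) / n

# Exact: weight each mean by its actual batch size.
exact = sum(loss * size for loss, size in zip(batch_losses, batch_sizes)) / n
```

The naive total weights the short final batch as if it had 64 records, so naive comes out at 1.024 versus an exact 0.828 here, which is why the enhancement linked above switches to a precise accumulation.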
Hello! Thanks for the amazing course. I have a question though. Let's say I have a bunch of face images with 3 different labels each: age, gender, and race. How can I train one single network to learn to predict all 3 different labels for a single input? Is it possible? I totally understood how to do it for 1 label, but for more than 1 I have no clue.
It's called multi-label classification. You can search for that. I don't know it myself, I just know the term.
The only differences are that you should use sigmoid instead of softmax, and that the labels will change from vectors to matrices (because you will have more than one prediction per image). To calculate the accuracy, choose a probability threshold to determine which classes to pick from the outputted class predictions, instead of argmax. Then compare them to the ground-truth labels.
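A minimal sketch of that evaluation step, with made-up logits and targets for a batch of 2 images and 3 independent labels:

```python
import torch

# Hypothetical raw network outputs and ground-truth label matrices.
logits  = torch.tensor([[ 2.0, -1.0,  0.5],
                        [-0.5,  3.0, -2.0]])
targets = torch.tensor([[1., 0., 1.],
                        [0., 1., 0.]])

# Sigmoid per label instead of softmax across labels.
probs = torch.sigmoid(logits)

# Threshold instead of argmax to pick the predicted label set.
preds = (probs > 0.5).float()

# Accuracy over all label slots.
accuracy = (preds == targets).float().mean().item()
```

For training such a model, F.binary_cross_entropy_with_logits is the usual loss for this setup, since each label is an independent binary decision.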
Thanks for the video.
Please, I get this error and I don't know if I have written all the steps correctly:
44
---> 45 epoch_duration = time.time() - self.epoch_start_time
46 run_duration = time.time() - self.run_start_time
47
TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'
Any help will be appreciated
It looks like this variable is not being set properly: self.epoch_start_time
Check that it is indeed being set when the epoch starts
@@deeplizard Yes, I have solved it. It was not supposed to be set to None at the top; instead I set it to 0 and it worked.
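For anyone hitting the same TypeError, a minimal sketch of where it comes from and the usual fix: the timestamp has to be assigned when the epoch begins, not left at its initial None:

```python
import time

class RunManager:
    def __init__(self):
        self.epoch_start_time = None  # not a usable timestamp yet

    def begin_epoch(self):
        # Set the timestamp when the epoch actually starts.
        self.epoch_start_time = time.time()

    def end_epoch(self):
        # Raises TypeError (float - NoneType) if begin_epoch never ran.
        return time.time() - self.epoch_start_time

m = RunManager()
m.begin_epoch()
duration = m.end_epoch()
```

So the error above almost always means end_epoch was reached without begin_epoch having run first, or the attribute was overwritten with None somewhere in between.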
I think there is a bug in end_epoch: it contains the code run_duration = time.time() - self.run_start_time, but this line should be in end_run().
Here, we are tracking the run duration at each epoch. Notice in the output that the run duration increases as the epochs move forward. For this reason, the code is okay. Thank you for checking on this 😃
What I learned:
1. The run manager helps us clean up the training loop.
2. The run manager can use a class to track shared properties like epoch or run.
3. It's easy to add new hyperparameters to the training loop.
4. Results are stored as CSV and JSON for analysis.
Question:
1. What if I want to add hyperparameters to the model?
I love your question. To do it, we need a function that creates networks based on some parameters. Then, we specify those parameters in the dictionary and pass them to the function that will create the network. So instead of doing this:
network = Network()
We do something like this:
network = create_network(param1, param2, param3)
This is where I'm thinking of going next.
@@deeplizard I'd like to see your model tuning video ^_^ Your videos are very helpful! How can I use parameter tuning in reinforcement learning?
Just one request: can you add the serial number to the video title or thumbnail? Since this is the 32nd video, adding 32 to the thumbnail would be very helpful.
Hey Bidhan - We considered this before, but since the videos are enumerated on the website, we decided not to add this data to the thumbnails. Sometimes we update the playlist by inserting a video or simply changing the order, which would make updating thumbnails really problematic. Let me know what you think of the navigation on the site. It's on the right side if you are on desktop and at the bottom on mobile: deeplizard.com/learn/video/ozpv_peZ894
There is also another view here: deeplizard.com/learn/playlist/PLZbbT5o_s2xrfNyHZsM6ufI0iZENK9xgG
Hey guys! Enough Vlogging. Pls make the next one :D
=)))
😄 It's in the works!
Why am I getting 50 epochs for each run? My dataframe has 200 rows.
Here's my csv file: github.com/SaugatBhattarai/pytorch-from-deeplizard/blob/master/results.csv
The line:
for epoch in range(50):
sets the number of epochs per run.
OOP is dead, long live the functional programming.... FYI :-)
😝
@@deeplizard 💖
I don't like it when people just call self "self". Why not give it a better-fitting name in each function? It makes the code look ugly and it costs you more time. Just call it dl or ai or nn... which is what I do. 😀