I have been struggling with my master's degree, and your tutorials really help me a lot. What distinguishes your tutorials from others is that they're very practical and hands-on. I have learned the basic theory of deep learning, but implementing it is the key! Thanks for your hard work. God bless you!
Really enjoy how you leave the theory for other videos and get right to the hands-on part, thank you!
Thanks for the tutorial. What I think regarding the LSTM having better performance when only taking the last time step's output is that the LSTM then has the chance to develop and accumulate a good decision, since the question is a classification problem (i.e. many-to-one). That is because the last output is conditioned on ALL the previous states. In the case of including the intermediate states as input to the FC layer, the accumulated learning will be somehow "partially" phased out by the immature decisions represented in those hidden states, if I may say :)
Yeah, I think you're absolutely right. I'm also thinking that if we trained for longer the performance should be slightly improved, but in the video we only trained for a very short duration. I think your argument is especially true in this situation, since the network is only fed information that is relevant (or more relevant than information from all states), but with longer training time the network would also be able to figure out the relevant parts.
@@AladdinPersson yeah! Your illustration is wonderful! Good luck
A slight heads-up for people trying this out themselves (for the eagle-eyed observers, never mind):
Using a learning rate of 0.005 for a vanilla RNN does not lead to any learning, and you will end up not converging (abysmal accuracy). Use a smaller learning rate for RNNs (0.001); you can keep the default 0.005 for the GRU and LSTM implementations to replicate the results. Great video nevertheless!
Love your channel, very underrated
Never thought of doing image-related processing with RNNs xD
Nice tutorial. Thanks. I like this playlist for its clear explanations about the code, and yeah the intro is my favourite
Nice Tutorial! However, I have a question. Do you know some references where they combine all time steps for the classification at the end? I've not seen that before, and I'm wondering what's the point? Shouldn't the last time step output be the best predictor anyway?
Thank you, appreciate you stopping by! I see major flaws with combining all time steps and sending that through a linear layer to some output, the major one being that we're assuming every example has the same number of time steps, which is the case for MNIST but not true in general. In retrospect I wasn't very clear on this in the tutorial. I think I did show just using the last time step at the end of the video, which is the standard; my idea here was to show how you could play around with the layers to better understand the output of the RNN.
To be clear, you should only use the last time step. As you say, it is the step which contains all of the information (hopefully) from previous time steps, so from a computation perspective it should definitely be the best, and as mentioned it is also the most general way.
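In code, a minimal sketch contrasting the two variants discussed here (shapes assumed from the video's MNIST setup):

```python
import torch
import torch.nn as nn

# Hypothetical shapes matching the MNIST-as-sequence setup
batch_size, seq_len, input_size = 64, 28, 28
hidden_size, num_classes = 256, 10

rnn = nn.RNN(input_size, hidden_size, num_layers=2, batch_first=True)
x = torch.randn(batch_size, seq_len, input_size)
out, _ = rnn(x)  # out: (batch_size, seq_len, hidden_size)

# Standard, general approach: classify from the last time step only
fc_last = nn.Linear(hidden_size, num_classes)
scores = fc_last(out[:, -1, :])  # (batch_size, num_classes)

# The variant shown mid-video: flatten all time steps into the linear
# layer (only works when every example has exactly seq_len time steps)
fc_all = nn.Linear(hidden_size * seq_len, num_classes)
scores_all = fc_all(out.reshape(out.shape[0], -1))
```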
Thanks for your quick response! That clarifies it, and yes I think so, too! I was a bit surprised and it indeed made me play around with the layers 😃 Great tutorial anyways :)
As per the PyTorch documentation, the shape of the output of the nn.RNN cell is (seq_length, batch_size, hidden_size), so the reshaping operation should be `out.reshape(out.shape[1], -1)`
I think you're absolutely right that the way to do the reshaping changes if you have (seq_len, batch, hidden), but keep in mind that we used batch_first=True in our example.
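A small sketch of both layout conventions, assuming the same shapes as in the video:

```python
import torch
import torch.nn as nn

x = torch.randn(64, 28, 28)  # (batch, seq_len, input_size)

# batch_first=True, as in the video: out is (batch, seq_len, hidden)
rnn_bf = nn.RNN(28, 256, batch_first=True)
out_bf, _ = rnn_bf(x)
flat_bf = out_bf.reshape(out_bf.shape[0], -1)  # one flat row per example

# Default layout: output is (seq_len, batch, hidden), so the batch
# dimension is out.shape[1] and the flattening differs accordingly
rnn_def = nn.RNN(28, 256)
out_def, _ = rnn_def(x.permute(1, 0, 2))
flat_def = out_def.permute(1, 0, 2).reshape(out_def.shape[1], -1)
```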
Good explanation with informative content as usual. Some topics still need to be discussed, like transformers and autoencoders. Any business use cases/applications using this library (PyTorch) with end-to-end discussions would also be helpful.
Thanks
Will do Transformers next
@@AladdinPersson Also, I have one doubt, Aladdin. I believe that using the LSTM initialisation in __init__ we can drive multiple LSTM layers, like other linear layers; kindly share your ideas. If I am wrong, kindly correct me.
Thanks
@@venkatesanr9455 Yes, you can initialize anything you want in the init and then just use those later in the forward pass. What is it you would like to add to the current init?
@@AladdinPersson Any suggestions/links for work related to multiple LSTM layers using PyTorch would be helpful.
@@venkatesanr9455 The number of layers is just an additional parameter that you send in to the LSTM you initialize in the init. An example is what I did in the video, as I had number_of_layers = 2. You can easily change this to any number of layers you would like ;)
Look: github.com/AladdinPerzon/Machine-Learning-Collection/blob/b64cd3048d6f73da13625c69b5d32f18a658c362/ML/Pytorch/Basics/pytorch_rnn_gru_lstm.py#L40
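Concretely, a sketch of stacking via num_layers (hyperparameter values assumed from the video):

```python
import torch
import torch.nn as nn

num_layers, hidden_size = 2, 256
lstm = nn.LSTM(input_size=28, hidden_size=hidden_size,
               num_layers=num_layers, batch_first=True)

x = torch.randn(64, 28, 28)  # (batch, seq_len, input_size)
# The initial hidden and cell states take num_layers as their first dim
h0 = torch.zeros(num_layers, x.size(0), hidden_size)
c0 = torch.zeros(num_layers, x.size(0), hidden_size)
out, (hn, cn) = lstm(x, (h0, c0))  # out: (64, 28, 256)
```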
What I'm always missing is a few inference examples with the final model, and the code to do so.
I am not able to understand how the sequence is being passed and predicted; I can't follow the flow of data that is taking place. Please can you explain how it is passing? In sequence analysis with an RNN, if we give the input ("hello") then we must predict ("ello_"), where the underscore is the next input after the "o". So what are we predicting here in the MNIST example? I hope you got my question.
Also, I did not understand hidden_size*sequence_length. Why was the linear layer size chosen that way?
I think to understand how the sequence is being passed in and how RNNs work, it can be beneficial to watch a few lectures by Andrew Ng. Here's a playlist: th-cam.com/play/PL1F3ABbhcqa3BBWo170U4Ev2wfsF7FN8l.html. But so, when we use batch_first=True we have the dimensions (batch_size, sequence_length, input_size), where both input_size and sequence_length equal 28. This is because we have an image of 28x28 and we are adapting this to a sequence problem where we let the model see one row at a time. So one time step here is one row of pixels: we have 28 of these steps in the sequence, and one row of pixels is 28-dimensional.
This is a bit different from the Seq2Seq example you have; the RNNs we are creating here are many-to-one, with the goal of predicting a single number, which is the number in the image. Regarding hidden_size*sequence_length: that was just a way of me making it overly complicated. Normally you would take the last hidden state (when the RNN has seen all time steps) and send this through a linear layer, which would take as input hidden_size*1, but I was making it a bit confusing by taking the hidden states from all time steps, hence hidden_size*seq_length. This is not a general way of doing it at all and is not recommended; in fact I regret showing it. Generally we do not even have sequences of the same length, so if not all sequences are exactly the same length this would break.
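A minimal sketch of the many-to-one flow described here (shapes assumed from the video):

```python
import torch
import torch.nn as nn

# An MNIST image as a sequence: 28 time steps, each one row of 28 pixels
images = torch.randn(64, 1, 28, 28)  # (batch, channel, height, width)
x = images.squeeze(1)                # (batch, seq_len=28, input_size=28)

rnn = nn.RNN(input_size=28, hidden_size=256, num_layers=2, batch_first=True)
out, _ = rnn(x)                      # (batch, 28, 256)

# Many-to-one: predict the digit from the final time step's output
fc = nn.Linear(256, 10)
scores = fc(out[:, -1, :])           # (batch, 10)
```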
@@AladdinPersson got it man .....thank you soo much❤️❤️❤️❤️❤️❤️thank you❤️❤️❤️❤️❤️❤️
May I ask how you would define your input size and sequence length if you had word embeddings of num_instances by num_features?
Your GRU and LSTM seem to work correctly, but your RNN gives an accuracy (test and train) of about 9.7%; something went wrong there (I got them from your repo).
If you reduce the learning rate to 0.001 it should converge.
perfect!!!!
Thanks. Why, in the implementation, do you not need to specify sequence_length in your architecture? Is there a specific form to give the input to the model in order to let it detect the sequence_length alone?
I really like the paper walkthrough tutorials. I am a loyal fan. I expect you to deliver more cool stuff.
Thanks for the explanation. I have tried a BiLSTM on the SALAMI dataset for detecting boundaries, but the F1 score decreases after 20 epochs. Can you please elaborate on how I might fix this overfitting issue using the same model?
The second conv layer should have stride 2, which will output 784, i.e. 16*7*7; otherwise 16*14*14 will be the input to the FC layers.
Can I ask why there is torch.zeros in the forward method (the reason)? If you have any resource to share, that would be good.
Why do you take the product of hidden_size and sequence_length as input into nn.Linear() at 6:00?
Can you please post some videos on pytorch autoencoder?
As soon as I'm done with these courses I'm taking at the moment, I have a long list of topics I want to explore, autoencoder is added to that list ;)
Please post videos on loss functions, epochs, and batch size, with TensorBoard monitoring.
@@palanichamyramasamy825 I've done that one actually :) th-cam.com/video/RLqsxWaQdHE/w-d-xo.html
@@AladdinPersson Thanks... I already viewed it. Can you please post one particularly explaining loss functions, train/eval, and how to use the trained vector file on another set of data?
Hi Aladdin, the video is to the point and awesome through the implementation part. I think you could have added a hacky intro to RNN/GRU/LSTM as well; otherwise I really liked this one.
Good Tutorial
Hi Aladdin,
Thank you so much for your amazing tutorial videos.
I was wondering that about only using the last hidden state in the lstm, should the code be `self.fc = nn.Linear(sequence_length, num_classes)` rather than `self.fc = nn.Linear(hidden_size, num_classes)`?
Best,
Yu
Thanks Aladdin. Great tutorial.
By the way, I was trying to test my model on individual samples. I realized that it does not matter whether the shape of my individual image is (1,28,28) or (28,28); my model accepts it and gives me correct results. Why would that be? Shouldn't the model reject an image with shape (28,28), since it expects (batch, seq_len, features)?
Hello, very nice tutorial. I have a question. I know that RNNs can take variable-length sequences, but when it comes to mini-batches we should pad them to the same length. Why? Why can't we have variable-length sequences in a mini-batch?
It's a good question. The way tensors are defined, you can't have variable lengths within one tensor, so we need to pad to ensure that all sequences have the same dimensionality. The RNN itself can take variable lengths, so we don't have to commit to one specific length we send in, but within a batch everything has to have the same dimensions because of how a tensor is constructed.
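PyTorch ships utilities for exactly this; a minimal sketch (feature size and lengths assumed):

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Three sequences of different lengths, each step a 28-dim feature vector
seqs = [torch.randn(length, 28) for length in (5, 3, 7)]

# Pad to one rectangular tensor so they fit in a single mini-batch
padded = pad_sequence(seqs, batch_first=True)  # (3, 7, 28)

# Optionally pack, so the RNN skips the padded positions entirely
lengths = torch.tensor([5, 3, 7])
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)
```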
Thank you for the tutorial. Is it possible to teach us RNNs with a text dataset? It is a bit hard to get how it goes with images.
It works the same with text as with images, because the LSTM always expects two-dimensional input per example: you just have sequences of text as your data (a 2D array) instead of pixels-times-pixels image data (also a 2D array).
Does it perform the same if you put a sequence of rows or a sequence of columns as the input?
PyTorch seems like so much more work to code than TensorFlow with Keras, what a pain!
YOLO object detection from scratch is much needed. Please, can you make a video on that?
When I get the time it's definitely a priority
Why are you creating a new hidden state for every forward pass?
Thank you!
Hello! Great tutorial. I am getting really confused by hidden_size = 256. How did you end up with that number? I understand it's a hyperparameter, but I am unable to visualize what it represents. Any light on this part would be highly appreciated.
Hm, not sure if I have a great response to this. You're right that this is just a hyperparameter so the choice of 256 is quite arbitrary. What happens in between the input and the hidden nodes is just a linear layer, meaning we're mapping some input nodes (which in this case is a row of pixel values from an image) to some higher dimension where it has hopefully learned how to represent the features of what this row represents in a better way than it was originally.
I think 3b1b has great videos on trying to understand what the neural network is learning, and it's basically the same thing here, except the 256 nodes are going to be computed from both the input row and the computation from the previous time steps.
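A rough sketch of what one vanilla-RNN time step computes, illustrating where the 256 shows up (random weights, biases omitted):

```python
import torch

input_size, hidden_size = 28, 256
W_ih = torch.randn(hidden_size, input_size)   # input-to-hidden weights
W_hh = torch.randn(hidden_size, hidden_size)  # hidden-to-hidden weights

x_t = torch.randn(input_size)      # one row of pixel values
h_prev = torch.zeros(hidden_size)  # hidden state from the previous step

# The new 256-dim hidden state mixes the current row with the past
h_t = torch.tanh(W_ih @ x_t + W_hh @ h_prev)
```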
@@AladdinPersson thanks man! Amazing tutorials..have been binge watching/coding this :)
Hi Aladdin,
Which line of code says that we are feeding our LSTM one row of features at a time? I'm confused, as it seems we are still feeding the LSTM model the whole 28x28 image.
Every time step we're feeding it one row of 28 pixels at a time. In total we're running it for 28 time steps, and that's how we end up feeding it the entire 28x28 image. We are indeed sending in 28x28 at the same time, but nn.LSTM is made to take all the sequence time steps at once if you look at the documentation. If you were to use LSTMCell, you would need to feed it one row of 28 pixels at a time and have a for loop over all of the 28 rows.
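Side by side, a sketch of the two (shapes assumed from the video):

```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 64, 28, 28, 256
x = torch.randn(batch, seq_len, input_size)

# nn.LSTM consumes all 28 time steps in a single call
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
out, _ = lstm(x)  # out: (batch, 28, hidden_size)

# nn.LSTMCell needs an explicit loop, one row of pixels per step
cell = nn.LSTMCell(input_size, hidden_size)
h = torch.zeros(batch, hidden_size)
c = torch.zeros(batch, hidden_size)
for t in range(seq_len):
    h, c = cell(x[:, t, :], (h, c))  # h ends up matching the last time step
```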
@@AladdinPersson So if we wanted to feed the LSTM two rows of pixels instead of one, would we just have to double the sequence length to 56? Thanks for your reply and explanation.
Hey, I have been following your tutorial series and I have a doubt! Why are we getting such overfitting results from training for just 2 epochs, even though we're using an RNN, which isn't suited for image data?
Hi Aladdin, thanks again for your videos!
In lines 61-76 you wrote the training loop outside of a function.
But then, in line 103, you called a model.train() method that you didn't define.
Is it again something defined in the parent class?
I would like to work similarly to the sk-learn models, as in model.train(), model.predict().
How should I define it?
model.train() doesn't actually train the model; you can regard it as mode switching. When you are about to begin testing, model.eval() changes the model to a testing state. If your model contains batch normalization or dropout, those layers would otherwise keep behaving in training mode during testing. In conclusion, use model.eval() before testing to put the model in inference mode, and use model.train() to switch it back to training mode afterwards.
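A typical usage pattern (model and test_loader assumed defined elsewhere):

```python
import torch

model.train()          # training mode: dropout active, batch-norm updating
# ... run the training loop here ...

model.eval()           # inference mode: dropout off, batch-norm frozen
with torch.no_grad():  # additionally skip gradient tracking while testing
    for x, y in test_loader:
        scores = model(x)

model.train()          # switch back before the next training epoch
```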
Thanks!
Hi Aladdin.
I really thank you for the tutorial. I recently ran your code, but I got an error:
```
..\aten\src\ATen\native\BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
```
Could you update your code or let me know how to fix it? I can't find the answer.
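For anyone hitting the same warning: it usually comes from dividing two integer tensors, e.g. in an accuracy computation. A sketch of the fix the warning itself suggests (variable names hypothetical):

```python
import torch

num_correct = torch.tensor(9600)    # hypothetical integer counters
num_samples = torch.tensor(10000)

# Deprecated: num_correct / num_samples on integer tensors
acc = torch.true_divide(num_correct, num_samples)  # explicit true division
# or convert to Python floats first:
acc2 = float(num_correct) / float(num_samples)
```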
When we squeezed here, we lost info about the channels; isn't that bad?
Thanks
perfect :)
Appreciate the comment :)
out.reshape seems to be the same as nn.Flatten()
I find it a really bad habit with these tutorial videos: you could save much of your time and others' by presenting your already-written code, commenting on each function that you have researched and understand deeply, spending time printing and plotting the transformations along the way and the math intuition behind things, rather than sitting and reading/typing your saved code. Show the input data in CSV or Excel, plot it, show its nature, speak insight... Hope this helps. Nothing personal. 3B1B is a huge standard, but play along those lines in the coding space. Hope this is clear.
I hear you and can definitely understand your argument, but I do not necessarily agree.
I think a con of what you are proposing is that people trying to learn a topic do not want to see massive code right from the start, they want to be introduced to it and then guided along the thought process of creating the final result. That's exactly what you would miss if you just read through someone's code project and that's kind of why I want to make these videos to offer someone the ability of following along and seeing the thoughts that go behind each line.
To your 3B1B point, which I do find an interesting one: I have not found a way of representing code as efficiently as he can represent the topics in his videos. I don't think your idea would make for a good presentation, and my arguments against it are the ones I mentioned above, but I might be wrong about this. Also, even if there were a way to make graphical animations that efficiently explain code, it wouldn't be practical in the same way: code changes quickly, while 3b1b's conceptual videos are ones that you can probably watch for a really, really long time.
@@AladdinPersson The alternative is not just reading code; that wasn't a thoughtful answer. I mean you commenting deeply on code you show, in whatever way you want, rather than typing, retyping, and mistyping with minimal depth. If you focus on the depth and explanation, you'd spare your time and the time of others. Besides, if it were massive code you wouldn't be typing it on camera. Anyway, logic is good, basic for AI. Keep doing videos, think of improvements, don't send people to read GitHub :) good luck
@@samirelzein1978 Alright, so I edited my first response; reading through your first comment again, I think I misinterpreted what you were saying. I agree with focusing on depth, and the way I find most efficient so far is explaining the topic while coding it line by line (coding it live, explaining it step by step); my goal is never to just type saved code, and I wouldn't say that's what's going on in these videos either. Anyway, I don't feel that presenting and explaining complete code would result in more depth in these videos. If you have other ideas I'm definitely open to hearing them.
Dear Aladdin,
I have listened to and liked many of your videos, but this one was pretty much incomprehensible to me...
What are num_layers, hidden_size, num_classes, etc. in the context of an RNN? A figure would be immensely helpful. And what are the "nodes" you keep referring to? Neurons? If I had known all of this, I wouldn't be watching this video...
Sorry to hear that the video wasn't so clear. Yes, I use nodes and neurons interchangeably. It was a while since I made this video: num_layers is stacking RNNs on top of each other, hidden_size is the number of nodes in our intermediate calculations, and num_classes is the number of nodes we should output at the end/top of the RNN. I think I am assuming you know about RNNs; if you don't, I would recommend searching for Andrew Ng's RNN lectures and watching his explanation, then watching this to learn how to implement it in PyTorch.
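An annotated sketch of those hyperparameters (values assumed from the video):

```python
import torch.nn as nn

input_size = 28    # features per time step, here one row of pixels
hidden_size = 256  # nodes (neurons) in each intermediate hidden state
num_layers = 2     # RNNs stacked on top of each other
num_classes = 10   # output nodes at the end/top, one per digit

rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
fc = nn.Linear(hidden_size, num_classes)  # last hidden state -> class scores
```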
@@AladdinPersson Thanks a lot. I don't want to be misunderstood: I am a great fan of your videos and I usually recommend them to other people, but in my opinion this one was a bit too fast. So please take what I say as constructive criticism.
I am perfectly aware that you are using up your precious time to bring us this information...
It's faster to create tensors on the GPU.