PyTorch Seq2Seq with Attention for Machine Translation

  • Published Dec 1, 2024

Comments • 50

  • @semaj8683
    @semaj8683 1 year ago

    Excellent video as ever! Thank you very much for the clear explanations!

  • @teetanrobotics5363
    @teetanrobotics5363 4 years ago +1

    This guy is better than several college professors.

    • @AladdinPersson
      @AladdinPersson  4 years ago +1

      You're too kind, but it's not true; I have too much to learn :\

  • @antoninleroy3863
    @antoninleroy3863 2 years ago

    Thanks for the free education!

  • @saraferro509
    @saraferro509 1 year ago +1

    Dear @Aladdin, when you concatenate solely the first hidden and the second hidden, you are considering only one hidden layer of the RNN. I mean, in your example you are using an LSTM with 2 hidden layers, so the dimension of hidden would be (2*n_hidden_layers, N, n_hidden_nodes). Should we consider not only the forward and backward connections of the first hidden layer, but also those of the second? Or, with more layers, those of all the other layers? (minute 9:53)

  • @ZobeirRaisi
    @ZobeirRaisi 4 years ago +1

    Thanks for the tutorial

    • @AladdinPersson
      @AladdinPersson  4 years ago

      Appreciate the comment, hope you find it useful :)

  • @vanshshah6418
    @vanshshah6418 2 years ago +1

    What do I have to modify if I want to change num_layers? I can't figure it out. Can you modify your GitHub code to generalize it rather than hardcoding it for only one layer? Thanks.

  • @2010mhkhan
    @2010mhkhan 4 years ago

    Thank you so much for the great video and explanation.

  • @zawadtahmeed850
    @zawadtahmeed850 4 years ago

    Thanks for this excellent content. Please make a follow-up video on the utils file; it would be really helpful for new learners like me who want to work with other or custom datasets.

    • @AladdinPersson
      @AladdinPersson  4 years ago +1

      In hindsight I probably should've gone through the utils functions, but I do think that after the video you're able to go through that code by yourself if you take some time. The code for it can be found here: github.com/AladdinPerzon/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/Seq2Seq_attention/utils.py

  • @theexecutioner66
    @theexecutioner66 1 year ago

    Great video! Could you perhaps make one about skip connections in RNNs and how to utilise them?

  • @slouma1998
    @slouma1998 4 years ago +4

    I'm curious, how did you learn this? Is there an academic book or anything like that which discusses the coding part of building models?

    • @AladdinPersson
      @AladdinPersson  4 years ago +12

      Papers (& source code from papers), blog posts, forums; since everything is so new, it's hard to find good books, although if you find any then do let me know :)

    • @vijayabhaskar-j
      @vijayabhaskar-j 3 years ago +1

      @@AladdinPersson d2l.ai is awesome for most stuff and it's often updated.

  • @stephennfernandes
    @stephennfernandes 3 years ago

    BucketIterator is deprecated. Also, can you please help with converting your code to training on TPUs? The BucketIterator class doesn't support a sampler argument, so I can't set a DistributedDataSampler object on the DataLoader for TPU training.

  • @Andrew6James
    @Andrew6James 3 years ago

    Hi, I wondered why the input to the decoder has size (1, N)? I thought that the decoder only takes as input the previous output in the sequence? I am trying to use a Seq2Seq model for non-NLP tasks, such as predicting factory outputs, and I am getting stuck having followed your GitHub.

  • @impact783
    @impact783 3 years ago +1

    Hey! Did you draw those first images? If so, what software did you use?

  • @1potdish271
    @1potdish271 2 years ago +2

    So your code will not work for a decoder with `num_layers=2` or with `bidirectional=True`. Right?!

  • @DanielWeikert
    @DanielWeikert 4 years ago

    Any ideas on how to figure out the necessary input and output shapes for the layers? It's really something I struggle with a lot. Thanks, and great video!

    • @AladdinPersson
      @AladdinPersson  4 years ago +5

      I'm assuming you're referring to when we define the LSTM and the subsequent linear layers in the encoder and decoder, since the input size is determined by the vocab size and the embedding, and hidden_size is just a hyperparameter.
      I understand that the shapes can be confusing, particularly since we're using a bidirectional LSTM in the encoder but not in the decoder (following the paper's implementation). If we start with the Encoder, the input_size will just be the embedding size (since we first run the input x through the embedding layer). Then the linear layers in the Encoder will take hidden_size*2, since we concatenate the hiddens for the forward part and the backward part of the bidirectional LSTM. You could also just use one of the hiddens of the encoder LSTM, either forward or backward, but if you want to use the information from both, you need to do something like I did with a linear layer that maps from hidden_size*2 down to hidden_size, since the Decoder will not be bidirectional.
      For the decoder we will have the encoder_states, which remember are really just the hidden values for every timestep (since we don't run the encoder_states from the encoder through any additional linear layers), hence the final dimension will be hidden_size*2. These encoder_states are just element-wise multiplied by the attention scores, which are scalar values, to form the context vector, but that multiplication doesn't modify the shape. We then concatenate this context_vector with the output of the embedding layer in the decoder, resulting in hidden_size*2 + embedding_size as the input size of the decoder LSTM.
      Hopefully that gave you something; it can be confusing and there are a lot of shapes to keep track of :) Wish you the best of luck!
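
A minimal sketch of the layer definitions this explanation describes, assuming illustrative sizes; the variable names here are not taken from the video's repository:

```python
import torch.nn as nn

# Illustrative sizes; the video derives its vocab sizes from the datasets.
input_size_encoder = 5000   # source vocabulary size (assumed)
embedding_size = 300
hidden_size = 1024

# Encoder: embedding -> bidirectional LSTM.
embedding = nn.Embedding(input_size_encoder, embedding_size)
encoder_rnn = nn.LSTM(embedding_size, hidden_size, num_layers=1, bidirectional=True)

# Forward and backward hidden/cell states are concatenated (hidden_size*2)
# and mapped down to hidden_size, since the decoder is unidirectional.
fc_hidden = nn.Linear(hidden_size * 2, hidden_size)
fc_cell = nn.Linear(hidden_size * 2, hidden_size)

# Decoder LSTM input: the context vector (hidden_size*2, from the
# bidirectional encoder states) concatenated with the embedded previous token.
decoder_rnn = nn.LSTM(hidden_size * 2 + embedding_size, hidden_size, num_layers=1)
```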

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 4 years ago

    How do you keep track of the shapes of the various inputs and outputs to ensure that they are aligned? I notice that as the program becomes longer, it becomes harder to keep track, especially when you need to go through the encoder and decoder parts.

  • @somayehseifi8269
    @somayehseifi8269 2 years ago

    A question about the decoder part: hidden, which is an input argument of the forward function, I think it was the hidden from the encoder, not from the decoder, am I right? So we used the hidden from the encoder 3 times without using the hidden from the decoder. I would be thankful if you could explain this, @Aladdin Persson. So for the first time step, will all three hiddens be from the encoder, or what?

  • @archit_474
    @archit_474 8 months ago

    I want to say that now (at the time I am watching), Field and BucketIterator have been removed, so how can we do it without them?

  • @user-qk2ev5jl2b
    @user-qk2ev5jl2b 1 year ago +1

    In line 92, the permutation sequence (1, 2, 0) works for me instead of (1, 0, 2).

  • @swarajshinde3950
    @swarajshinde3950 4 years ago

    Nice explanation of Seq2Seq with PyTorch. Can you suggest some more sources for NLP using PyTorch?

  • @charissayu8025
    @charissayu8025 3 years ago

    Hi, I did pip install utils yet got this problem:
    cannot import name 'translate_sentence' from 'utils' (/opt/anaconda3/lib/python3.7/site-packages/utils/__init__.py)
    I'd appreciate it if you could advise how to solve it. Thank you.

    • @ATPokerATPoker-dp4ex
      @ATPokerATPoker-dp4ex 3 years ago

      Download utils from his GitHub; it's a custom script he made.

  • @UknownCompiler
    @UknownCompiler 3 years ago

    I get this attention mechanism, but it took me time to get it after so much research. Why? Because the attention I know has keys, queries, and values, and it can be a multi-head one, and all of the articles explain it that way. Somehow the attention with the same name here is much simpler and different from the multi-head one I know, and very few articles explain the attention shown here. My question is: why can't this attention have keys, queries, and values? And what exactly is the name of this type of attention?

  • @adityay525125
    @adityay525125 3 years ago

    Hi, I am getting concatenation errors between the context vector and the embedding.

  • @tanmay_ds
    @tanmay_ds 4 years ago

    Not related to this current video, but can you suggest some good sources on implementing i-vectors for speaker verification? Thank you!

    • @AladdinPersson
      @AladdinPersson  4 years ago +1

      I must admit I have not put any considerable time into this topic, so I can't really give you any valuable advice on it :\

    • @tanmay_ds
      @tanmay_ds 4 years ago

      @@AladdinPersson No issues, sir. Thank you for your response.

  • @MasterMan2015
    @MasterMan2015 2 years ago +1

    Setting num_layers = 2 (> 1) broke the code at:
    output, hiddens, cells = model.decoder(previous_word, outputs_encoder, hiddens, cells)

    • @AdityaThurvasSenthilKumar
      @AdityaThurvasSenthilKumar 1 year ago

      Yep, I got the same issue! Did you figure out where it went wrong? Lol, it was 1 year ago, but shooting my shot still.

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 4 years ago

    Given that the video uses an LSTM, is this the ELMo model?

  • @felixmohr8354
    @felixmohr8354 1 year ago

    First of all: great video, really well done. I have some remarks (besides the now-necessary updates in preparing the tokens etc., and the permutation prepared for bmm, which was already mentioned below).
    Most importantly, I would like to point you to the appendix of the paper, which not only explains how the hidden states are composed (yes, concatenated), but also explains how the energy function is computed. And with respect to the latter, I think that your implementation is not correct. The problem is that you concatenate the hidden states of the decoder and encoder and then use that directly as the input to the energy function, but this is not how it works. As I understand it, you need two more separate linear layers for the encoder and decoder hidden states, to learn two separate weight *matrices* for them (both with the same number of neurons, which is n' in the paper, and the matrices are W_a and U_a). The results of these computations are then *added* up and passed through a tanh function, and it is that result which is then mapped by a linear layer (your energy function, which corresponds to the vector v_a in the paper); see the sketch after this thread.
    I would be curious to know your opinion on this.

    • @winx_hajar
      @winx_hajar 1 year ago

      Could you provide your own code with this correction, please?
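
A minimal sketch of the additive (Bahdanau) energy computation described in the comment above, following the paper's appendix; the layer names and shapes are illustrative assumptions, not code from the video:

```python
import torch
import torch.nn as nn

hidden_size = 1024  # illustrative

# W_a transforms the previous decoder state s_{i-1}; U_a transforms each
# encoder annotation h_j (hidden_size*2, forward and backward concatenated);
# v_a maps the tanh result to one scalar energy per source position.
W_a = nn.Linear(hidden_size, hidden_size, bias=False)
U_a = nn.Linear(hidden_size * 2, hidden_size, bias=False)
v_a = nn.Linear(hidden_size, 1, bias=False)

def attention_weights(decoder_hidden, encoder_states):
    # decoder_hidden: (N, hidden_size); encoder_states: (seq_len, N, hidden_size*2)
    # e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j): added, not concatenated.
    energy = v_a(torch.tanh(W_a(decoder_hidden).unsqueeze(0) + U_a(encoder_states)))
    return torch.softmax(energy, dim=0)  # (seq_len, N, 1), weights over source positions
```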

  • @arshadshaikh4676
    @arshadshaikh4676 4 years ago

    How do I increase the number of layers? Can you share a link to that implementation?

    • @AladdinPersson
      @AladdinPersson  4 years ago +1

      It's in the GitHub repository, which is in the video description, or here: github.com/aladdinpersson/Machine-Learning-Collection
      In the readme file I've tried to collect everything into a nice summary, and hopefully it shouldn't be too difficult to find. To increase the number of layers you just have to pass the number of layers to the LSTM, although remember that we need the same number in the encoder and the decoder.

    • @arshadshaikh4676
      @arshadshaikh4676 4 years ago

      @@AladdinPersson Yes, I tried to pass the number of layers in that, but it throws an error.
      I.e., assume I have 4 as the number of layers; then for the decoder we need 4 encoder hidden weights as initialization.
      But there's an issue: we are actually concatenating the fwd and bwd passes, which converts the 8 (4 fwd, 4 bwd) into 1, i.e. [8,*,*] --> [1,*,*], whereas we need [4,*,*] to initialize the decoder.
      Next, I edited it and returned [4,*,*] from the encoder, but the error still persists, because the attention mechanism is hard-coded for encoder outputs of [1,*,*].
      (Drop me an email, we can have a discussion: ars.arshad.ars@gmail.com)

    • @AladdinPersson
      @AladdinPersson  4 years ago +2

      @@arshadshaikh4676 Actually, now I remember this problem. What I did to solve it was to concatenate the states and send them through a fully connected network whose output can then be sent into the decoder (see the sketch after this thread). I found this unnecessarily complicated, since it didn't improve performance all that much in my experiments, so I didn't include it in the video.

    • @arshadshaikh4676
      @arshadshaikh4676 4 years ago

      @@AladdinPersson Thanks for it.
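
A minimal sketch of the fix described in the reply above (concatenate each layer's forward and backward states and pass them through a fully connected layer), assuming a bidirectional multi-layer encoder; the names here are illustrative, not from the repository:

```python
import torch
import torch.nn as nn

num_layers, hidden_size, N = 4, 1024, 32  # illustrative

# Maps each layer's concatenated forward+backward state back to hidden_size.
fc_hidden = nn.Linear(hidden_size * 2, hidden_size)

def reduce_bidirectional(hidden):
    # hidden from a bidirectional encoder LSTM: (num_layers * 2, N, hidden_size)
    # Separate the layer and direction axes: (num_layers, 2, N, hidden_size)
    hidden = hidden.view(num_layers, 2, -1, hidden_size)
    # Concatenate forward (index 0) and backward (index 1) per layer:
    # (num_layers, N, hidden_size*2)
    hidden = torch.cat((hidden[:, 0], hidden[:, 1]), dim=2)
    # Project back to hidden_size so each decoder layer gets an initial state.
    return fc_hidden(hidden)  # (num_layers, N, hidden_size)

h0 = torch.randn(num_layers * 2, N, hidden_size)
print(reduce_bidirectional(h0).shape)  # torch.Size([4, 32, 1024])
```

The same reshaping would be needed for the cell state, and the attention block would also have to stop assuming a single-layer [1, *, *] hidden, as noted in the thread.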

  • @chiragbaid8211
    @chiragbaid8211 4 years ago

    Please make a video on image captioning.