seq2seq with attention (machine translation with deep learning)

  • Published on Nov 29, 2024

Comments • 46

  • @mohamedibrahimbehery3235
    @mohamedibrahimbehery3235 3 years ago +10

    Man, these are 11 minutes of excellence right here... You practically explained a complex topic in just 11 minutes... unlike other channels... You're awesome, keep it up :)

  • @imadsaddik
    @imadsaddik 1 year ago +1

    Thank you so much, your explanation was very clear

  • @skymanaditya
    @skymanaditya 3 years ago +7

    Nice tutorial. For a moment I got confused by the output translations like Nan, Nul, which I confused with NaN (not a number) and NULL (empty state). :D

    • @TheEasyoung
      @TheEasyoung  3 years ago +2

      Haha, sorry for the confusion... yeah, indeed these nan, nul are confusing :)

  • @gamefever6055
    @gamefever6055 4 years ago +1

    best explanation in 10 min

  • @rishirajgupta6262
    @rishirajgupta6262 4 years ago +1

    Thanks for this video. You saved my time

  • @ashishbodhankar1993
    @ashishbodhankar1993 3 years ago +1

    bro you are my hero!

  • @jinhopark3671
    @jinhopark3671 5 years ago +2

    Very helpful and easy to understand. Keep up the good work!

  • @superaluis
    @superaluis 4 years ago +1

    Your videos are awesome. Thank you!

  • @hafsatimohammed9604
    @hafsatimohammed9604 2 years ago

    Thank you so much! Your explanation helps a lot!

  • @dani-ev6qz
    @dani-ev6qz 3 years ago +1

    This helped me a lot thank you!

  • @djsnooppyzatdepoet7568
    @djsnooppyzatdepoet7568 5 years ago +1

    Thank you, Sir. Very easy to understand. Thank you

  • @jinpengtian2072
    @jinpengtian2072 4 years ago +1

    very clear tutorial, thanks a lot

  • @yashumahajan7
    @yashumahajan7 4 years ago +2

    How is this fully connected layer working? How are we getting the relevant hidden state from this fully connected layer?

  • @ladyhangaku2072
    @ladyhangaku2072 4 years ago +3

    Thank you, sir! One question: how are the attention weights calculated? I know the equation for it, but I don't really understand the dependencies between the last hidden states of the RNN and the decoder. Could you recommend something to read, or could you explain it here?

    • @TheEasyoung
      @TheEasyoung  4 years ago +5

      The attention weights are initially random numbers and are incrementally learned during backpropagation. The research paper is a good resource :) (see the sketch after this thread)

    • @coqaine
      @coqaine 4 years ago +1

      @@TheEasyoung Why watch your videos if we need to read the original research paper anyway? :) Maybe you could make a separate video on the matrices and Q, K, V vectors. :)
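
    A minimal NumPy sketch of how such attention weights can be computed, following the additive attention in the Bahdanau et al. paper linked further down in the comments. The sizes and the parameter names W_a, U_a, v_a are illustrative assumptions, not the video's code; these matrices are the trainable parameters that start out random and are learned by backpropagation, as noted in the reply above.

      import numpy as np

      def softmax(x):
          e = np.exp(x - np.max(x))
          return e / e.sum()

      # Illustrative sizes: T source tokens, encoder/decoder hidden sizes H/D, attention size A
      T, H, D, A = 5, 8, 8, 10
      rng = np.random.default_rng(0)

      encoder_states = rng.normal(size=(T, H))    # h_1 ... h_T from the encoder RNN
      prev_decoder_state = rng.normal(size=(D,))  # s_{t-1} from the decoder RNN

      # Trainable parameters of the alignment model (start random, learned by backprop)
      W_a = rng.normal(size=(A, D))
      U_a = rng.normal(size=(A, H))
      v_a = rng.normal(size=(A,))

      # Additive score for each source position j: e_j = v_a . tanh(W_a s_{t-1} + U_a h_j)
      scores = np.array([v_a @ np.tanh(W_a @ prev_decoder_state + U_a @ h_j)
                         for h_j in encoder_states])

      attention_weights = softmax(scores)                   # one weight per source token, sums to 1
      context_vector = attention_weights @ encoder_states   # c_t, fed into the next decoder step

      print(attention_weights, context_vector.shape)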

  • @anggipermanaharianja6122
    @anggipermanaharianja6122 4 years ago +1

    well explained, thanks for the effort to make it

  • @greyxray
    @greyxray 3 years ago +1

    Thank you for the great videos! Could you elaborate a bit on the implementation of the fully connected layer in the example with the attention mechanism (or point to a code example if one exists)? I am not quite sure what the dimensions of this part of the network are.

    • @TheEasyoung
      @TheEasyoung  3 years ago

      Here is well-written code:
      colab.research.google.com/github/tensorflow/tensorflow/blob/r1.9/tensorflow/contrib/eager/python/examples/nmt_with_attention/nmt_with_attention.ipynb
      The dimension should be the RNN output, which you can also see from the code.
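
    A rough sketch of the shapes involved, in the spirit of the attention block in the linked notebook. The layer names W1, W2, V and all sizes are illustrative assumptions, not a quote of that code.

      import tensorflow as tf

      # Illustrative sizes: batch of 2, 5 source tokens, hidden size 8, attention size 10
      batch, T, H, units = 2, 5, 8, 10

      values = tf.random.normal((batch, T, H))  # encoder RNN outputs, one per source token
      query = tf.random.normal((batch, H))      # previous decoder hidden state

      # The "fully connected" part of the attention block is just these Dense layers
      W1 = tf.keras.layers.Dense(units)  # projects encoder outputs: (batch, T, H) -> (batch, T, units)
      W2 = tf.keras.layers.Dense(units)  # projects decoder state:   (batch, 1, H) -> (batch, 1, units)
      V = tf.keras.layers.Dense(1)       # collapses each position to a single score

      query_with_time_axis = tf.expand_dims(query, 1)                     # (batch, 1, H)
      score = V(tf.nn.tanh(W1(values) + W2(query_with_time_axis)))        # (batch, T, 1)
      attention_weights = tf.nn.softmax(score, axis=1)                    # (batch, T, 1)
      context_vector = tf.reduce_sum(attention_weights * values, axis=1)  # (batch, H)

      print(context_vector.shape)  # (2, 8): the same size as one RNN output, as the reply says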

  • @sahiluppal3093
    @sahiluppal3093 4 years ago +1

    Nice explanation.

  • @anujlahoty8022
    @anujlahoty8022 4 years ago +1

    Very Nice Video.

  • @sophies674
    @sophies674 2 years ago

    Most awesome 😱🎉😱😂 explanation 😱🌼🎊🌼🎉🤗🤗

  • @TheEasyoung
    @TheEasyoung  5 years ago

    reference: arxiv.org/pdf/1409.0473.pdf

  • @chanjunpark2355
    @chanjunpark2355 5 years ago +1

    Hello, I have a question. In the decoding part, if you look at the paper, the output is g(Y_t-1, S_t, C), i.e., when generating the target-language word, the context vector, the previous output, and the hidden state at the current time step go in as inputs. But in the diagram you explained, it looks like only the previous output (Y_t-1) and the context vector (C) go in. Which one is correct? It seems like S_t should also go into the diagram as an input! I could be wrong, though.

    • @TheEasyoung
      @TheEasyoung  5 years ago

      Thanks for the feedback. g stands for a nonlinear function, and it takes the previous output, the current RNN output, and the previous context vector as inputs. That looks the same as what I explain in this video. I describe the current state as being used to build the context vector, and I thought my explanation would be a bit more intuitive and easier to follow than the paper. (See the sketch after this thread.)

    • @chanjunpark2355
      @chanjunpark2355 5 years ago +1

      @@TheEasyoung Yes, thank you. I really appreciated the good explanation.
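
    For readers following this exchange: the paper's decoder defines p(y_t | y_1,...,y_t-1, x) = g(y_t-1, s_t, c_t), with s_t = f(s_t-1, y_t-1, c_t) and c_t built from the previous state s_t-1 and the encoder states. A hedged Python sketch of one decoder step follows; all function names are stand-ins, not the video's code.

      # One decoder step, following the equations in the linked paper.
      # rnn_cell, alignment, and output_layer are placeholders for trained components.
      def decoder_step(prev_state, prev_output, encoder_states,
                       rnn_cell, alignment, output_layer):
          # c_t: context vector built from the previous decoder state and all encoder states
          context = alignment(prev_state, encoder_states)
          # s_t = f(s_{t-1}, y_{t-1}, c_t): the new decoder hidden state
          new_state = rnn_cell(prev_state, prev_output, context)
          # p(y_t | ...) = g(y_{t-1}, s_t, c_t): distribution over the target vocabulary
          probs = output_layer(prev_output, new_state, context)
          return new_state, probs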

  • @xtimehhx
    @xtimehhx 4 years ago +1

    Great video

  • @balagurugupta5193
    @balagurugupta5193 5 years ago +1

    Neat explanation! Thank you

  • @InNOutTube
    @InNOutTube 4 years ago +3

    who noticed point you for the 1 at 6.13 minutes :(

  • @kartikafindra
    @kartikafindra 4 years ago +1

    Thank you so much! What is the meaning of softmax?

    • @TheEasyoung
      @TheEasyoung  4 years ago

      Softmax gives you a probability for each class. Before softmax, the numbers are not values between 0 and 1. After softmax, the number on each class node is in the 0-to-1 range, which you can treat as a probability, and you take the class with the maximum probability as the prediction of your deep learning model.

    • @kartikafindra
      @kartikafindra 4 years ago +1

      Minsuk Heo 허민석 Thank you. Is that also called an activation function?

    • @TheEasyoung
      @TheEasyoung  4 years ago

      No, an activation function is a different concept. Sigmoid, which is a popular activation function, is similar in the sense that it gives a 0-to-1 range. Softmax is normally located after the activation functions. The big difference is that while an activation function only looks at one node, softmax looks at all the nodes and normalizes the values across all of them (see the sketch after this thread).

    • @kartikafindra
      @kartikafindra 4 years ago +1

      Minsuk Heo 허민석 Okay, thank you so much. :) Where can I find a dataset to try translation pairs?

    • @TheEasyoung
      @TheEasyoung  4 years ago

      There are many places. Here is one.
      www.tensorflow.org/datasets/catalog/wmt19_translate
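
    To make the softmax explanation above concrete, a minimal NumPy sketch; the scores are made-up numbers, not from the video.

      import numpy as np

      def softmax(scores):
          e = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
          return e / e.sum()

      logits = np.array([2.0, 1.0, 0.1])  # raw outputs for 3 classes, not in [0, 1]
      probs = softmax(logits)             # roughly [0.659, 0.242, 0.099]
      print(probs.sum(), probs.argmax())  # 1.0, and class 0 is the prediction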

  • @vybhavshetty9094
    @vybhavshetty9094 4 years ago +1

    Thank you

  • @dhanojitray9475
    @dhanojitray9475 5 years ago +2

    thank you sir.

  • @bhavyasri642
    @bhavyasri642 3 years ago

    Great explanation, sir. Could you please share that colab link?

  • @Deshwal.mahesh
    @Deshwal.mahesh 3 years ago

    Can you explain Luong, bilinear, global, and local attention using the same methodology?

    • @TheEasyoung
      @TheEasyoung  3 years ago

      Good suggestions! I am an engineer who mostly spends his time on work, so I can't guarantee whether I will create these videos or not, but thanks for suggesting good topics.

  • @23232323rdurian
    @23232323rdurian 5 years ago

    Do you happen to know of a good TOKENIZER for 日本語 Japanese text? There are no spaces between Japanese words, which greatly complicates tokenization.
    I've found a few (MeCab), but I haven't found anything truly effective (there's no NLTK Japanese tokenizer)...
    I'm building a Synthetic Text Generator (word/phrase level), but I'm missing the Japanese tokenizer....
    Thank you....

    • @whatohyou01
      @whatohyou01 5 years ago

      AFAIK MeCab is one of the best Japanese POS taggers, and it can be edited to add new data, but that's about it. Damn that Japanese without whitespace.
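
    For anyone hitting the same problem, a minimal sketch of word-segmenting Japanese text with MeCab's Python bindings. It assumes the mecab-python3 package plus a dictionary such as unidic-lite are installed; those package names are an assumption, not something from the video or this thread.

      import MeCab  # e.g. pip install mecab-python3 unidic-lite (assumed setup)

      # "-Owakati" asks MeCab for space-separated surface forms (wakati-gaki)
      tagger = MeCab.Tagger("-Owakati")

      text = "私は機械翻訳を勉強しています"
      tokens = tagger.parse(text).split()
      print(tokens)  # something like ['私', 'は', '機械', '翻訳', 'を', '勉強', 'し', 'て', 'い', 'ます']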

  • @gamefever6055
    @gamefever6055 4 years ago +1

    nan and null value