Man, these are 11 minutes of excellence right here... You practically explained a complex topic in just 11 minutes... unlike other channels... you're awesome, keep it up :)
Thank you so much, your explanation was very clear
Nice tutorial. For a moment I got confused by the output translations like Nan, Nul, which I mistook for NaN (not a number) and NULL (empty state). :D
Haha, sorry for the confusion... yeah, indeed those Nan, Nul outputs are confusing :)
best explanation in 10 min
Thanks for this video. You saved my time
bro you are my hero!
Very helpful and easy to understand. Keep up the good work!
Your videos are awesome. Thank you!
Thank you so much! Your explanation helps a lot!
This helped me a lot thank you!
Thank you, Sir. Very easy to understand.
very clear tutorial, thanks a lot
How is this fully connected layer working? How are we getting the relevant hidden state from this fully connected layer?
Thank you, sir! One question: how are the attention weights calculated? I know the equation for it, but I don't really understand the dependencies between the last hidden states of the RNN and the decoder. Could you recommend something to read, or could you explain it here?
The attention weights are initially random numbers and are incrementally updated during backpropagation. The research paper is a good resource :)
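For reference, these are the attention-weight equations from the paper linked below (arxiv.org/pdf/1409.0473.pdf). The alignment model a is a small feedforward network whose weights start out random and are trained jointly with the rest of the model by backpropagation; s_{i-1} is the previous decoder state and h_j is the j-th encoder hidden state:

```latex
e_{ij} = a(s_{i-1}, h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j
```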
@@TheEasyoung Why watch your videos if we need to read the original research paper anyway? :) Maybe you could make a separate video on the matrices and the q, k, v vectors. :)
well explained, thanks for the effort to make it
Thank you for the great videos! Could you elaborate a bit on the implementation of the fully connected layer in the example with the attention mechanism (or point to a code example if one exists)? I am not quite sure what the dimensions of this part of the network are.
Here is well-written code:
colab.research.google.com/github/tensorflow/tensorflow/blob/r1.9/tensorflow/contrib/eager/python/examples/nmt_with_attention/nmt_with_attention.ipynb
The dimension should match the RNN output, which you can also see in the code.
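To make the shapes concrete, here is a rough sketch of how the linked notebook wires up the fully connected (Dense) layers for Bahdanau-style attention. `units` is a hyperparameter you choose, and the variable names and example sizes here are mine, not necessarily the notebook's:

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects the encoder outputs
        self.W2 = tf.keras.layers.Dense(units)  # projects the decoder hidden state
        self.V = tf.keras.layers.Dense(1)       # collapses to one score per source position

    def call(self, query, values):
        # query:  (batch, hidden)           -- current decoder hidden state
        # values: (batch, src_len, hidden)  -- all encoder RNN outputs
        query_with_time_axis = tf.expand_dims(query, 1)              # (batch, 1, hidden)
        score = self.V(tf.nn.tanh(
            self.W1(values) + self.W2(query_with_time_axis)))        # (batch, src_len, 1)
        attention_weights = tf.nn.softmax(score, axis=1)             # (batch, src_len, 1)
        context_vector = tf.reduce_sum(attention_weights * values,
                                       axis=1)                       # (batch, hidden)
        return context_vector, attention_weights

# quick shape check with made-up sizes
attention = BahdanauAttention(units=10)
context, weights = attention(tf.random.normal((64, 256)), tf.random.normal((64, 16, 256)))
print(context.shape, weights.shape)  # (64, 256) (64, 16, 1)
```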
Nice explanation.
Very Nice Video.
Most awesome 😱🎉😱😂 explanation 😱🌼🎊🌼🎉🤗🤗
reference: arxiv.org/pdf/1409.0473.pdf
Hello, I have a question. In the decoding part, looking at the paper, g(Y_{t-1}, S_t, C) means that when generating the output language, the context vector, the previous output, and the hidden state at the current time step all go in as inputs, but in the diagram you explained it looks like only the previous output (Y_{t-1}) and the context vector (C) go in. Which one is correct? It seems like S_t should also go in as an input in the diagram! I could be wrong, though.
Thanks for the feedback. g refers to a nonlinear function, and it takes the previous output, the current RNN output, and the previous context vector as inputs. That looks the same as what I explain in the current video. I describe the current state as being used to build the context vector, and I thought my explanation would be a bit more intuitive and easier to understand than the paper.
@@TheEasyoung Yes, thank you. I really enjoyed the good explanation.
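For readers following this thread in English: the cited paper (arxiv.org/pdf/1409.0473.pdf) writes the decoder as

```latex
p(y_i \mid y_1, \ldots, y_{i-1}, x) = g(y_{i-1}, s_i, c_i), \qquad
s_i = f(s_{i-1}, y_{i-1}, c_i)
```

so in the paper's notation the current decoder state s_i does appear inside g, while the video folds that state into how the context vector is built, which the author suggests is the more intuitive picture.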
Great video
Neat explanation! thank u
who noticed point you for the 1 at 6.13 minutes :(
thank you so much, what is the meaning of softmax?
Softmax provides a probability for each class. Before softmax, the numbers are not in the 0 to 1 range. After softmax, the number on each class node is between 0 and 1, which you can treat as a probability, and you use the class with the maximum probability as the prediction of your deep learning model.
Minsuk Heo 허민석 Thank you. Is that also called an activation function?
No. An activation function is a different concept. Sigmoid, which is a popular activation function, is similar in the sense that it gives a 0 to 1 range. Softmax is normally located after the activation functions. The big difference is that while an activation function only cares about one node, softmax looks at all the nodes and normalizes the values across all of them.
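A tiny sketch of the difference, using made-up scores for three classes:

```python
import numpy as np

def sigmoid(x):
    # element-wise: each node is squashed to (0, 1) independently
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # looks at all nodes together: exponentiate, then normalize so they sum to 1
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw scores for 3 classes (made-up numbers)
print(sigmoid(logits))               # each value is in (0, 1), but they do not sum to 1
print(softmax(logits))               # ~[0.66, 0.24, 0.10], sums to 1
print(np.argmax(softmax(logits)))    # index 0 -> the predicted class
```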
Minsuk Heo 허민석 Okay, thank you so much. :) Where can I find a dataset to try translation pairs?
There are many places. Here is one.
www.tensorflow.org/datasets/catalog/wmt19_translate
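A minimal loading sketch with TensorFlow Datasets; the "de-en" language-pair config name is an assumption, so check the catalog page above for the pairs that actually exist:

```python
import tensorflow_datasets as tfds

# Downloads WMT19 translation data on first use; "de-en" is an assumed config name.
ds = tfds.load("wmt19_translate/de-en", split="train")
for example in ds.take(1):
    print(example)  # a dict holding one source/target sentence pair
```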
Thank you
thank you sir.
🤣
Great explanation, sir. Could you please share that Colab link?
Can you explain Luong, Bilinear, Global, Local using the same methodology?
Good suggestions! I am an engineer who mostly spends my time on work, so I can't guarantee whether I will create these videos, but thanks for suggesting good topics.
do you happen to know of a good TOKENIZER for 日本語 Japanese text? No spaces between the Japanese words, greatly complicating tokenization.
I've found a few (MeCab), but I haven't found anything truly effective (NLTK has no Japanese tokenizer)...
I'm building a Synthetic Text Generator (word/phrase level) but I'm missing the Jtokenizer....
Thank you....
AFAIK MeCab is one of the best Japanese POS taggers, and it can be edited to add new data, but that's about it. Damn that Japanese text without white space.
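If it helps, here is a minimal sketch of word-level segmentation with MeCab's Python binding (assuming the mecab-python3 package plus a dictionary such as unidic-lite is installed; your local dictionary setup may differ):

```python
import MeCab

tagger = MeCab.Tagger("-Owakati")  # -Owakati outputs tokens separated by spaces
text = "私は日本語を勉強しています"
tokens = tagger.parse(text).strip().split()
print(tokens)  # roughly: ['私', 'は', '日本語', 'を', '勉強', 'し', 'て', 'い', 'ます']
```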
nan and null value