Man, these are 11 minutes of excellence right here... You practically explained a complex topic in just 11 minutes... unlike other channels... you're awesome, keep it up :)
Thank you so much, your explanation was very clear
Nice tutorial. For a moment I got confused by the output translations like Nan, Nul, which I mistook for NaN (not a number) and NULL (empty state). :D
Haha, sorry for the confusion... yeah, indeed those Nan, Nul outputs are confusing :)
best explanation in 10 min
Thanks for this video. You saved my time
bro you are my hero!
Very helpful and easy to understand. Keep up the good work!
Your videos are awesome. Thank you!
Thank you so much! Your explanation helps a lot!
This helped me a lot thank you!
Thank you, Sir. Very easy to understand.
very clear tutorial, thanks a lot
How is this fully connected layer working? How are we getting the relevant hidden state from this fully connected layer?
Thank you, sir! One question: how are the attention weights calculated? I know the equation for it, but I don't really understand the dependencies between the last hidden states of the RNN and the decoder. Could you recommend something to read, or could you explain it here?
The attention weights are initially random numbers and are incrementally updated during backpropagation. The research paper is a good resource :)
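For reference, these are the attention-weight equations from the paper linked below (arxiv.org/pdf/1409.0473.pdf). The alignment model a is a small feedforward network whose weights start out random and are trained jointly with the rest of the model by backpropagation; s_{i-1} is the previous decoder state and h_j is the j-th encoder hidden state:

```latex
e_{ij} = a(s_{i-1}, h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j
```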
@@TheEasyoung Why watch your videos if we need to read the original research paper anyway? :) Maybe you could make a separate video on the matrices and the q, k, v vectors. :)
well explained, thanks for the effort to make it
Thank you for the great videos! Could you elaborate a bit on the implementation of the fully connected layer in the example with the attention mechanism (or point to a code example if one exists)? I am not quite sure what the dimensions of this part of the network are.
Here is well-written code:
colab.research.google.com/github/tensorflow/tensorflow/blob/r1.9/tensorflow/contrib/eager/python/examples/nmt_with_attention/nmt_with_attention.ipynb
The dimension should match the RNN output, which you can also see in the code.
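To make the shapes concrete, here is a rough sketch of how the linked notebook wires up the fully connected (Dense) layers for Bahdanau-style attention. `units` is a hyperparameter you choose, and the variable names and example sizes here are mine, not necessarily the notebook's:

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects the encoder outputs
        self.W2 = tf.keras.layers.Dense(units)  # projects the decoder hidden state
        self.V = tf.keras.layers.Dense(1)       # collapses to one score per source position

    def call(self, query, values):
        # query:  (batch, hidden)           -- current decoder hidden state
        # values: (batch, src_len, hidden)  -- all encoder RNN outputs
        query_with_time_axis = tf.expand_dims(query, 1)              # (batch, 1, hidden)
        score = self.V(tf.nn.tanh(
            self.W1(values) + self.W2(query_with_time_axis)))        # (batch, src_len, 1)
        attention_weights = tf.nn.softmax(score, axis=1)             # (batch, src_len, 1)
        context_vector = tf.reduce_sum(attention_weights * values,
                                       axis=1)                       # (batch, hidden)
        return context_vector, attention_weights

# quick shape check with made-up sizes
attention = BahdanauAttention(units=10)
context, weights = attention(tf.random.normal((64, 256)), tf.random.normal((64, 16, 256)))
print(context.shape, weights.shape)  # (64, 256) (64, 16, 1)
```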
Nice explanation.
Very Nice Video.
Most awesome 😱🎉😱😂 explanation 😱🌼🎊🌼🎉🤗🤗
reference: arxiv.org/pdf/1409.0473.pdf
Hello, I have a question. In the decoding part, looking at the paper, g(Y_{t-1}, S_t, C) means that when generating the output language, the context vector, the previous output, and the hidden state at the current time step all go in as inputs, but in the diagram you explained it looks like only the previous output (Y_{t-1}) and the context vector (C) go in. Which one is correct? It seems like S_t should also go in as an input in the diagram! I could be wrong, though.
Thanks for the feedback. g refers to a nonlinear function, and it takes the previous output, the current RNN output, and the previous context vector as inputs. That looks the same as what I explain in the current video. I describe the current state as being used to build the context vector, and I thought my explanation would be a bit more intuitive and easier to understand than the paper.
@@TheEasyoung Yes, thank you. I really enjoyed the good explanation.
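For readers following this thread in English: the cited paper (arxiv.org/pdf/1409.0473.pdf) writes the decoder as

```latex
p(y_i \mid y_1, \ldots, y_{i-1}, x) = g(y_{i-1}, s_i, c_i), \qquad
s_i = f(s_{i-1}, y_{i-1}, c_i)
```

so in the paper's notation the current decoder state s_i does appear inside g, while the video folds that state into how the context vector is built, which the author suggests is the more intuitive picture.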
Great video
Neat explanation! thank u
who noticed point you for the 1 at 6.13 minutes :(
thank you so much, what is the meaning of softmax?
Softmax provides a probability for each class. Before softmax, the numbers are not in the 0 to 1 range. After softmax, the number on each class node is between 0 and 1, which you can treat as a probability, and you use the class with the maximum probability as the prediction of your deep learning model.
Minsuk Heo 허민석 Thank you. Is that also called an activation function?
No. An activation function is a different concept. Sigmoid, which is a popular activation function, is similar in the sense that it gives a 0 to 1 range. Softmax is normally located after the activation functions. The big difference is that while an activation function only cares about one node, softmax looks at all the nodes and normalizes the values across all of them.
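A tiny sketch of the difference, using made-up scores for three classes:

```python
import numpy as np

def sigmoid(x):
    # element-wise: each node is squashed to (0, 1) independently
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # looks at all nodes together: exponentiate, then normalize so they sum to 1
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw scores for 3 classes (made-up numbers)
print(sigmoid(logits))               # each value is in (0, 1), but they do not sum to 1
print(softmax(logits))               # ~[0.66, 0.24, 0.10], sums to 1
print(np.argmax(softmax(logits)))    # index 0 -> the predicted class
```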
Minsuk Heo 허민석 Okay, thank you so much. :) Where can I find a dataset to try translation pairs?
There are many places. Here is one.
www.tensorflow.org/datasets/catalog/wmt19_translate
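A minimal loading sketch with TensorFlow Datasets; the "de-en" language-pair config name is an assumption, so check the catalog page above for the pairs that actually exist:

```python
import tensorflow_datasets as tfds

# Downloads WMT19 translation data on first use; "de-en" is an assumed config name.
ds = tfds.load("wmt19_translate/de-en", split="train")
for example in ds.take(1):
    print(example)  # a dict holding one source/target sentence pair
```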
Thank you
thank you sir.
🤣
Great explanation, sir. Could you please share that Colab link?
Can you explain Luong, Bilinear, Global, Local using the same methodology?
Good suggestions! I am an engineer who mostly spends my time on work, so I can't guarantee whether I will create these videos, but thanks for suggesting good topics.
do you happen to know of a good TOKENIZER for 日本語 Japanese text? No spaces between the Japanese words, greatly complicating tokenization.
I've found a few (MeCab), but I haven't found anything truly effective (NLTK has no Japanese tokenizer)...
I'm building a Synthetic Text Generator (word/phrase level) but I'm missing the Jtokenizer....
Thank you....
AFAIK MeCab is one of the best Japanese POS taggers, and it can be edited to add new data, but that's about it. Damn that Japanese text without white space.
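If it helps, here is a minimal sketch of word-level segmentation with MeCab's Python binding (assuming the mecab-python3 package plus a dictionary such as unidic-lite is installed; your local dictionary setup may differ):

```python
import MeCab

tagger = MeCab.Tagger("-Owakati")  # -Owakati outputs tokens separated by spaces
text = "私は日本語を勉強しています"
tokens = tagger.parse(text).strip().split()
print(tokens)  # roughly: ['私', 'は', '日本語', 'を', '勉強', 'し', 'て', 'い', 'ます']
```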
nan and null value