C5W3L07 Attention Model Intuition

  • Published on Feb 4, 2018
  • Take the Deep Learning Specialization: bit.ly/2TF1B06
    Check out all our courses: www.deeplearning.ai
    Subscribe to The Batch, our weekly newsletter: www.deeplearning.ai/thebatch
    Follow us:
    Twitter: / deeplearningai_
    Facebook: / deeplearninghq
    Linkedin: / deeplearningai

Comments • 79

  • @TheBlackhawk2011
    @TheBlackhawk2011 2 years ago +79

    "If you can't explain it simply, you don't understand it well enough." Andrew is one of the best instructors in the world. No wonder he teaches at Stanford.

    • @grownupgaming
      @grownupgaming 2 years ago +2

      As someone from Berkeley, this is a good comment!

  • @Mesenqe
    @Mesenqe 4 years ago +158

    I can't believe that Siraj Raval has around 700K subscribers while this most valuable channel has only 76K.

    • @AIPlayerrrr
      @AIPlayerrrr 4 years ago +12

      the worst channel on YouTube... notorious...

    • @Mesenqe
      @Mesenqe 4 years ago +2

      @@AIPlayerrrr OK, if #Deeplearning_ai is the worst channel, where almost everyone becomes a master of machine learning and deep learning... please recommend which channel you use. N.B. Not in Chinese, in English.

    • @AIPlayerrrr
      @AIPlayerrrr 4 years ago +30

      Gery W. Adhane I am talking about Siraj Raval....

    • @AIPlayerrrr
      @AIPlayerrrr 4 years ago +3

      Gery W. Adhane I remember Lex's interview with Siraj Raval, in which Siraj said he only knew 50% of the material lolll

    • @Mesenqe
      @Mesenqe 4 years ago +2

      @@AIPlayerrrr Oh sorry, I thought you were referring to #deeplearning_ai. In that case I agree. He manipulates us all.

  • @mehmetcelepkolu7660
    @mehmetcelepkolu7660 4 years ago +44

    3:24 - Woah, the legend is speaking French!

  • @frankie59er
    @frankie59er 3 years ago +15

    These explanations are top-notch, definitely deserving of way more views

    • @MasterofPlay7
      @MasterofPlay7 3 years ago

      you know the BERT model is the best NLP model, right?

  • @PookyCodes
    @PookyCodes 3 years ago +3

    Thank you so much for this valuable video!

  • @bionhoward3159
    @bionhoward3159 5 years ago +2

    thank you!

  • @anggipermanaharianja6122
    @anggipermanaharianja6122 4 years ago +3

    really clear explanation

  • @aryanyekrangi7093
    @aryanyekrangi7093 2 years ago

    Great video series!

  • @youssefdirani
    @youssefdirani 3 years ago +2

    Magical voice

  • @arthurswanson3285
    @arthurswanson3285 3 years ago

    Well explained.

  • @arborymastersllc.9368
    @arborymastersllc.9368 1 year ago

    You would definitely have to shift the location of the context weights for different languages, since expressions shift word order across languages. Example: 3 words left and 2 right for language A, but 2 left and 4 right for language B, with language-specific variance in the weight distributions themselves.
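
    A minimal Python sketch of the idea above, assuming a purely illustrative per-language
    window table and raw alignment scores; it only illustrates a language-specific
    left/right context window, not how the course implements attention:

        import numpy as np

        # Hypothetical per-language windows: (words to the left, words to the right).
        WINDOWS = {"A": (3, 2), "B": (2, 4)}

        def local_attention_weights(scores, t, language):
            """Keep only scores inside the window around source position t, then softmax."""
            left, right = WINDOWS[language]
            masked = np.full_like(scores, -np.inf)
            lo, hi = max(0, t - left), min(len(scores), t + right + 1)
            masked[lo:hi] = scores[lo:hi]
            exp = np.exp(masked - masked.max())
            return exp / exp.sum()            # weights sum to 1 inside the window

        print(local_attention_weights(np.random.randn(10), t=4, language="A"))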

  • @yeming3777
    @yeming3777 2 years ago

    helps me a lot

  • @theSpicyHam
    @theSpicyHam 4 years ago +2

    I learned a lot, thank you

  • @marinzhao3513
    @marinzhao3513 4 years ago +1

    Very clearly explained.

  • @zhenpingli2313
    @zhenpingli2313 5 years ago

    really great

  • @mohsenboughriou9846
    @mohsenboughriou9846 1 year ago

    man, you're the best

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago

    Good explanation, but I need to watch it again.

  • @MrAkhilanil
    @MrAkhilanil 1 year ago

    Sounds cool, someone should build a chat bot with this tech!

  • @fackarov9412
    @fackarov9412 3 years ago +3

    To me it seems like applying kernels (1D CNN) to an RNN and calling it "attention".

  • @taku8751
    @taku8751 4 years ago

    I wish the cursor were bigger; I can't see it.

  • @abail7010
    @abail7010 2 years ago +1

    The only thing I am not understanding is why it's harder to translate shorter sentences? :)

  • @namHoang-lb6jp
    @namHoang-lb6jp 2 years ago

    Some segments in the video seem to be edited so that they are not adjacent to each other.

  • @thepresistence5935
    @thepresistence5935 2 years ago +1

    Attention model invented by "avengers" 😆😆 2:55

  • @beniev1
    @beniev1 5 years ago +1

    How can one know how many words should be in the output?

    • @beniev1
      @beniev1 5 years ago

      I mean when you translate a new sentence, not during training...

    • @coralbow
      @coralbow 5 years ago

      @@beniev1 There are separate RNNs for the encoder and the decoder. The encoder RNN takes a fixed-length sequence as input, and the decoder RNN outputs a fixed-length sequence. To make every sentence the same length (for the encoder RNN), they usually add 0s to the start of sentences that are shorter than the required length. For the decoder RNN the idea is similar: it outputs a fixed-length sequence of hidden states (word distributions), and for each state in that sequence the most probable word is chosen. In that word distribution there is a special word (EOS, which basically stands for 0); they don't show this in the visualisation because it adds no information.

    • @abbashoseini9344
      @abbashoseini9344 5 years ago +5

      @@beniev1 I think, according to this video, there is no need to know the number of words in your translation, and you can't know it until you translate. Your translation is finished when your network generates the EOS word, and then you can count how many words the network has generated for the translation (see the sketch after this thread).

    • @louisraison
      @louisraison 5 years ago +1

      @@abbashoseini9344 Exactly! What is a bit misleading in the example here is that each word in the French sentence is translated by exactly one word in English, but any word could actually be predicted instead, and the translation of this word could come later.

    • @abbashoseini9344
      @abbashoseini9344 5 years ago

      It is important to note that it depends on your application. For example, in sequence tagging problems you need to force the model to make the output and input have equal length. The attention mechanism has the power to generalize to all kinds of problems with various constraints on input and output length.
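
      A minimal Python sketch of the decoding loop described in this thread, assuming a
      hypothetical decoder_step function standing in for one step of the decoder RNN
      (plus attention) that returns a word distribution and a new state; the point is
      that the output length is not fixed in advance, decoding simply stops at EOS:

          def greedy_decode(encoder_states, decoder_step, vocab, max_len=50):
              output, prev_word, state = [], "<SOS>", None
              for _ in range(max_len):                 # hard cap as a safety net
                  probs, state = decoder_step(prev_word, state, encoder_states)
                  word = vocab[max(range(len(probs)), key=probs.__getitem__)]  # most probable word
                  if word == "<EOS>":                  # network says the translation is done
                      break
                  output.append(word)
                  prev_word = word
              return output                            # length was decided by the model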

  • @bhimireddyananthreddy1487
    @bhimireddyananthreddy1487 4 years ago

    What does "some set of features" at 3:36 mean?

    • @ThePritt12
      @ThePritt12 4 years ago

      an encoding = features of a sentence

    • @Utbdankar
      @Utbdankar 4 years ago

      To determine each feature vector (the set of features), you use the current word's input and the previous feature vector, which yields a new feature vector. You can think of the feature vector as "everything needed to translate the current word, from the words that came before it in the sentence".
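
      A minimal Python sketch of the reply above, using a plain simple-RNN step with
      randomly initialised, purely illustrative weights: each feature vector is computed
      from the current word's embedding and the previous feature vector, so it summarises
      the sentence read so far:

          import numpy as np

          hidden, emb = 16, 8
          rng = np.random.default_rng(0)
          Waa = rng.normal(size=(hidden, hidden))      # previous features -> new features
          Wax = rng.normal(size=(hidden, emb))         # current word embedding -> new features
          ba = np.zeros(hidden)

          def encoder_features(word_embeddings):
              a = np.zeros(hidden)                     # initial feature vector
              features = []
              for x in word_embeddings:                # one RNN step per input word
                  a = np.tanh(Waa @ a + Wax @ x + ba)
                  features.append(a)
              return features                          # one feature vector per word

          sentence = [rng.normal(size=emb) for _ in range(5)]   # 5 stand-in word embeddings
          print(len(encoder_features(sentence)))                # -> 5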

  • @arborymastersllc.9368
    @arborymastersllc.9368 1 year ago

    Recursive context checking every so many words? Like after every noun-verb combo is identified, recheck context appropriateness.

  • @danielcai1017
    @danielcai1017 2 years ago +1

    3:29 it seems he knows French well

  • @saeedullah5365
    @saeedullah5365 4 years ago

    Why does an LSTM have higher accuracy than a bidirectional LSTM, even though the latter is a newer approach?

  • @2005sty
    @2005sty 2 years ago +1

    It is not correct that a human translator carries out translation part by part, especially when translating between two languages with different grammatical rules.

    • @isodoubIet
      @isodoubIet 1 year ago

      The "parts" don't have to be in the same order.

  • @moodiali7324
    @moodiali7324 3 years ago +1

    very good content with very bad audio quality, hope that improves one day

  • @nichosetiawan1377
    @nichosetiawan1377 2 years ago

    The video image quality is too poor; it needs to be fixed.

  • @procrastipractice
    @procrastipractice 1 year ago

    Good luck with German where one verb can be split over a huge distance

  • @fahds2583
    @fahds2583 3 years ago +4

    video starts here 3:11

  • @ananthakrishnank3208
    @ananthakrishnank3208 1 year ago

    3:00

  • @heejuneAhn
    @heejuneAhn 4 years ago +2

    Oh, I see, that is why Google Translate is still bad with English-Japanese or Korean!

    • @socratic-programmer
      @socratic-programmer 4 years ago

      Is that because the sentences read in an unusual order?
      Incidentally, recent models have become a lot better at multi-translation, so maybe they are better now than before.

    • @peterfireflylund
      @peterfireflylund 4 years ago +2

      @@socratic-programmer No, it's not about an unusual order. That's actually quite easy to handle.
      It's because Korean and Japanese have completely different grammar from English. They are agglutinating, have pretty free word order because they both "tag" their words with short sounds that tell what role they play in the sentence, and they both allow most of the "real" sentence to be left out if it can be inferred from context. There are also problems with the semantic mapping between J/K and E where context is needed to figure out how to translate words/idioms. To top it all off, Japanese and Korean both have really complicated systems of honorifics. Oh, and copula is handled in *completely* different ways in J/K and E.
      Current translation systems are really bad at handling context above the sentence level, so... you can see the problems.
      Wikipedia has pretty good articles on Japanese, Korean, and English grammar.

    • @peterfireflylund
      @peterfireflylund 4 years ago +2

      Forgot to add that tokenization is another issue. There are translating neural networks that are completely end-to-end: they take characters/punctuation as input and produce characters/punctuation as output. Most deep learning systems use a tokenizer before the input and a "detokenizer" after the output.
      Such a tokenizer may give all common words their own token number and it may split rarer words into smaller components, often using simple rules based on tables and regular expressions. It may also turn things like "isn't" into "is not" for English and "du" into "de le" for French.
      How to properly tokenize ideographic scripts like Chinese hanzi and Japanese kanji (and Korean Hanja) is still a research subject.
      Actually, even tokenization for *English* is still a research subject! (See the sketch after this thread.)

    • @socratic-programmer
      @socratic-programmer 4 years ago

      @@peterfireflylund Some of those linguistic terms elude me, but that makes sense. Thanks for that explanation.
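
      A minimal Python sketch of the pre-tokenization ideas described in this thread, with
      tiny, purely illustrative rule tables; a real tokenizer would use much larger tables
      and learned subword splits rather than fixed-length chunks:

          import re

          CONTRACTIONS = {"isn't": "is not", "don't": "do not"}   # table/regex-style rewrite rules
          COMMON = {"is", "not", "do", "the", "translation"}      # words that get their own token

          def tokenize(text):
              words = []
              for w in text.lower().split():
                  words.extend(CONTRACTIONS.get(w, w).split())    # expand contractions first
              tokens = []
              for w in words:
                  if w in COMMON:
                      tokens.append(w)                            # common word: one token
                  else:
                      tokens.extend(re.findall(r".{1,3}", w))     # crude stand-in for subword splitting
              return tokens

          print(tokenize("The translation isn't tokenized"))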

  • @dnaphysics
    @dnaphysics 3 months ago

    'Attention' should have been called 'context'.

  • @francois-xaviermenage4531
    @francois-xaviermenage4531 3 years ago +2

    The French sentences are written in a very wrong way.

  • @jozefkoslinski4111
    @jozefkoslinski4111 2 years ago

    "The audio of the video is not good."

  • @umairgillani699
    @umairgillani699 5 years ago +10

    The audio quality in all of Professor Ng's videos sucks!!