Language or Vision - What's Harder? (Ilya Sutskever) | AI Podcast Clips

  • Published on 26 Jun 2024
  • Full episode with Ilya Sutskever (May 2020): • Ilya Sutskever: Deep L...
    Clips channel (Lex Clips): / lexclips
    Main channel (Lex Fridman): / lexfridman
    (more links below)
    Podcast full episodes playlist:
    • Lex Fridman Podcast
    Podcasts clips playlist:
    • Lex Fridman Podcast Clips
    Podcast website:
    lexfridman.com/ai
    Podcast on Apple Podcasts (iTunes):
    apple.co/2lwqZIr
    Podcast on Spotify:
    spoti.fi/2nEwCF8
    Podcast RSS:
    lexfridman.com/category/ai/feed/
    Ilya Sutskever is the co-founder of OpenAI, is one of the most cited computer scientists in history with over 165,000 citations, and to me, is one of the most brilliant and insightful minds ever in the field of deep learning. There are very few people in this world who I would rather talk to and brainstorm with about deep learning, intelligence, and life than Ilya, on and off the mic.
    Subscribe to this YouTube channel or connect on:
    - Twitter: / lexfridman
    - LinkedIn: / lexfridman
    - Facebook: / lexfridman
    - Instagram: / lexfridman
    - Medium: / lexfridman
    - Support on Patreon: / lexfridman
  • Science & Technology

Comments • 62

  • @darshantank554
    @darshantank554 3 years ago +17

    "where the vision ends, language begins" this line touches my heart!

  • @JonKroeker
    @JonKroeker 1 year ago +5

    Not only is this guy brilliant, he’s just such a nice guy

  • @bubelevakalisa7313
    @bubelevakalisa7313 3 years ago +12

    Vision ends when the viewer (agent 1) sees the words. Language begins when the viewer (agent 1) combines the words it has seen with "prior knowledge" and then communicates "value added" information to a listener (agent 2). For example, when agent 1 "sees" (vision) the name Lewis Hamilton, it must be able to use its knowledge about Hamilton to effectively engage in a coherent conversation with an expert about this great F1 driver. At the moment, state-of-the-art models like GPT-3 can fake a coherent conversation only when communicating with non-experts.

    • @sidgirase
      @sidgirase 1 year ago

      Vision takes the visual input, the brain caches the sentences, and NLP begins? If the cache runs out of memory, vision goes back and queries the same input again?

  • @adambrickley1119
    @adambrickley1119 4 years ago +4

    "i am going to explain why"...opens by asking a question, nice!

  • @nikhilvarmakeetha3917
    @nikhilvarmakeetha3917 4 years ago +27

    The question "Where does vision end and language start?" was intriguing. It shows a potential final destination that needs to be achieved for DL-based AI.

    • @breakawaybooks4752
      @breakawaybooks4752 4 years ago

      ✔️John Venn liked this.

    • @holgerjrgensen2166
      @holgerjrgensen2166 4 years ago

      Windows Start should be On/Off; it is here your illiteracy begins.
      Better open some windows, get some fresh air, and understand the nature of the dictator principle.
      AI is illiteracy and superstition; intelligence can never be artificial.
      Repeating dead mantras is not individual thinking.
      The development of consciousness and language are two sides of the very same development, based on eternal principles.

    • @olivercroft5263
      @olivercroft5263 4 years ago

      @@holgerjrgensen2166 more like returnal than eternal 🤔😘

    • @holgerjrgensen2166
      @holgerjrgensen2166 4 years ago

      What do you mean, if you know what you're saying?

    • @holgerjrgensen2166
      @holgerjrgensen2166 4 years ago

      Ru-Mu,
      "okay" can mean almost anything; in Danish, it is "åvkæj", just a sound combination.

  • @DamianReloaded
    @DamianReloaded 4 years ago +14

    This is about semantic interpretation: whether image recognition and natural language processing could share the same "back end" for semantic interpretation and abstraction. I wonder if one could train a convolutional NN and a transformer to spit out the same semantic vector, so that a natural language description of a picture and the picture itself would be compressed into the same (or similar) vector space coordinates? :/

    • @Gyringag
      @Gyringag 4 years ago +1

      There are already datasets for this task: you build a net for NLP, a net for CV, and minimize the KL divergence between the two hidden spaces.
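
The idea in this thread (two encoders trained so that a picture and its description land near each other) can be sketched roughly as follows. This is only a toy illustration, not anyone's actual method: the three vectors are made-up stand-ins for the outputs of a hypothetical image encoder and text encoder, and `alignment_loss` is an assumed name for the KL-divergence term the reply mentions, applied to softmax-normalized hidden vectors.

```python
import math


def softmax(v):
    """Turn a hidden vector into a probability distribution."""
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_loss(image_hidden, text_hidden):
    """Hypothetical training signal: KL divergence between the two
    softmax-normalized hidden vectors (smaller means better aligned)."""
    return kl_divergence(softmax(image_hidden), softmax(text_hidden))


# Made-up encoder outputs (assumptions for illustration only).
image_vec = [2.0, 0.5, -1.0]      # hypothetical CNN output for a picture
caption_vec = [1.8, 0.7, -0.9]    # hypothetical text encoder output for its caption
unrelated_vec = [-1.0, 2.0, 0.5]  # hypothetical output for an unrelated sentence

matched = alignment_loss(image_vec, caption_vec)
mismatched = alignment_loss(image_vec, unrelated_vec)
```

In a real system the loss would be minimized by gradient descent over the parameters of both encoders; here it only shows that a matching pair scores a lower divergence (and a higher cosine similarity) than a mismatched one.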

  • @burkebaby
    @burkebaby 1 year ago

    This was an interesting conversation! Lex - I wonder if the title should be "Language vs. Vision" instead. 6:56 - In terms of Generative AI, can Language and Vision both work to improve each other, like an arms race? How will the AI model and algorithm decide when to determine a pass or fail result for either/or?

  • @Ross-nd6xi
    @Ross-nd6xi 4 years ago +2

    You should get a linguist on, Lex. It might be interesting to talk about the hermeneutic aspect of language learning and interpretation for AGI.

  • @TimoNineSix
    @TimoNineSix 4 years ago +4

    once the vision can read the language, the loop is complete

  • @user-my5qk5xu1d
    @user-my5qk5xu1d 4 years ago +12

    0:49 The Word is "Interdisciplinary"

    • @ssssssstssssssss
      @ssssssstssssssss 4 years ago +2

      Man. I hate that word.... It stems from artificial boundaries that we've created due to historical happenstance.

  • @jamesblankenship3077
    @jamesblankenship3077 4 years ago +1

    This conversation really seemed to enlighten me on how language would have been impossible without sight and hearing. I can see that a word can have many definitions without the presence of a visual or a tone of voice. So, for the computer to learn, we could relate these few things in the algorithm so that the computer can learn as we did. If the computer is a rigid piece of electronics, isn't that how life began billions of years ago? Maybe with a better architect.

  • @FromFame
    @FromFame 4 years ago +2

    I literally suffer from the same cosmetic matter this respectable person suffers from.
    I use a solution daily. I understand how you can get used to it, but please, for the sake of other people, research a solution too. I felt embarrassed to mention it and not many will, but I care about AI and those pushing it forward. Beyond being highly intelligent, you are an attractive person👍

  • @chrisbarry9345
    @chrisbarry9345 1 year ago +1

    Man this is going to finally get watched by people

  • @johnniefujita
    @johnniefujita 4 years ago +1

    I believe CNNs and NLP should serve as inputs for decision-making systems, and reinforcement learning should explore the space of actions, states, and target states. So the first two are more like perception constructors and the last a decision-space explorer.

  • @NoOne-uz4vs
    @NoOne-uz4vs 4 years ago +7

    0:54 - Does anyone know what those principles are??

    • @nobodykid23
      @nobodykid23 4 years ago +1

      This is just my ballpark guess, but I think it should be empirical risk minimization and something around the no-free-lunch theorem. I don't know the third one.

  • @pratik245
    @pratik245 2 years ago

    Great, Ilya

  • @justinkiff4159
    @justinkiff4159 4 years ago +3

    I think the wife example is quite bad because there is a sexual component in the perception of the other; with a friend there will probably be more objectivity.
    Also, yes, if you have human-level speech recognition and understanding, you'll have the vision for free. Understanding text is just a primitive form of acquiring information: replace objects in a picture with words and voilà.

  • @shreeyatyagi
    @shreeyatyagi 4 years ago

    Yes, in the man-made world (physicality), our thought and action are primarily governed by language. So, language is fundamental.

  • @jonomichi2262
    @jonomichi2262 7 months ago

    I thought the interviewer was smart, but Ilya is on a different level.

  • @joshuaerkman1444
    @joshuaerkman1444 2 years ago +1

    Language has much higher dimensionality than vision. Vision has three basic dimensions and that could probably be abstracted up to thousands or millions. Language has over 6,500 basic dimensions. The abstraction of these basic dimensions may go into the trillions

  • @timdh100
    @timdh100 4 years ago +6

    Lex, how about a podcast with Shai Ben-David on advances on the theoretical side of ML?

  • @stevee5718
    @stevee5718 1 year ago

    So interesting to look back at this interview now, in the wake of GPT4.

  • @umberto488
    @umberto488 24 days ago

    Beautiful hair

  • @leecharlie2513
    @leecharlie2513 3 years ago +2

    Which field has more jobs (NLP or CV)? It seems to me that so far there are a lot more applications for CV, and therefore CV has more job opportunities than NLP. Simply search "computer vision job USA" and "NLP jobs USA" on Google, and a comparison of the results will show that CV has more jobs. I wonder what your two cents on it are? Maybe I am wrong?

    • @MrSchweppes
      @MrSchweppes 3 years ago +2

      It will change this year or maybe in 2022.

    • @nn-sv5vi
      @nn-sv5vi 16 days ago

      It's 2024 and your opinions are still correct so far

  • @henrikbergman4055
    @henrikbergman4055 4 years ago

    Throwing out a question here, as there are some clever people in the thread. Anyone care to help me understand why "natural language" (and does that exclude body language and tone of voice?) would be important for AI? As an example; IKEA furniture assembly instructions don't need words to explain stuff to humans. And being a poet is not a requirement for human level intelligence, right?

    • @seo95
      @seo95 4 years ago +2

      Your examples are more about language generation. Even if that is important, the hot topic nowadays is language understanding. Understanding language hides a lot of very difficult challenges, and reasoning about entities is one of the most difficult ones. Each time we speak, we refer to events that happened in the past and in the present, make implicit relations between entities, and talk about abstract things. Language is the description of the world in which we live and of the abstract world we have created (the concept of nations, politics, jokes, etc.). To understand language, a machine first needs to understand the world we have built. We are far from achieving something like that with AI.
      How can we pretend to have an "intelligent" machine if it cannot understand us?

  • @IsmaelAlvesBr
    @IsmaelAlvesBr 4 years ago

    The problem is that we are trying to make a robotic brain from scratch. Maybe the solution is to give it initial steps so that it doesn't start from 0. It's like when you learn another language. You already know what a dog is, but need to learn how to say it in another "way" and when you should say it.

    • @maxsnts
      @maxsnts 4 years ago +1

      How does that apply? When a baby is born, he does not know what a dog is.
      The only things he starts with are unconscious behaviors, like "cry if hungry".
      In that sense, starting from scratch seems very similar.

  • @styles9783
    @styles9783 4 years ago

    Hey Lex

  • @mohammadaminparchami7462
    @mohammadaminparchami7462 4 years ago +5

    Hey Lex, one cool thing would be to add some more media to the conversations. Show the guests some clips, read them news, and then we would like to hear their opinion. Great job ✋🏻👏🏻

  • @danielcogzell4965
    @danielcogzell4965 4 years ago

    man.. I find it interesting how I really respect Ilya for what he achieved but I just don't agree with his views on things most of the time.

  • @pawarboy7
    @pawarboy7 2 years ago

    I think vision lags language because it doesn't have a lot of labeled data

  • @Priyanka-us8rw
    @Priyanka-us8rw 1 year ago +1

    Computer vision fascinates me more

  • @AM-qx3bq
    @AM-qx3bq 3 years ago

    I don't understand the difficulty in the "Where vision ends and language starts" question. I imagine an advanced enough vision system can just recognize that a particular region of pixels represents text; from that point it can be converted to raw text (which is a decades-old solved problem) and then fed to an NLP pipeline for interpretation. In my understanding, it's not a vision system's role to accomplish language understanding, but it would be ideal if it could at least identify what is text and relay it to the NLP component.

  • @ko95
    @ko95 1 year ago

    hmmm

  • @BaikalLV
    @BaikalLV 4 years ago +4

    8:15 such a blue pilled Lex

  • @michaelpetronzio6557
    @michaelpetronzio6557 4 years ago +1

    You are the most nicest cutest thing!

  • @chocolategolemofroidgutand2839
    @chocolategolemofroidgutand2839 4 years ago

    JUST

  • @jefferysherwood7424
    @jefferysherwood7424 4 years ago +1

    🐸🐸🐸🐸🐸🐸

  • @olivercroft5263
    @olivercroft5263 4 years ago +3

    Rezpect ze russians🇷🇺

  • @shreeyatyagi
    @shreeyatyagi 4 years ago

    Language

    • @leecharlie2513
      @leecharlie2513 3 years ago

      Why?

    • @shreeyatyagi
      @shreeyatyagi 3 years ago

      @@leecharlie2513 because language is a representation.

    • @leecharlie2513
      @leecharlie2513 3 years ago +2

      @@shreeyatyagi But isn't the recent GPT-3 demonstrating very promising results in generating meaningful text and dialog?

  • @luisselvera9878
    @luisselvera9878 2 years ago

    Vision ends when language starts.

  • @henrychoy2764
    @henrychoy2764 3 years ago

    Have to say that the dumbest animals have vision but not language

  • @enriquemartinez5647
    @enriquemartinez5647 4 years ago

    Read what Lacan says about language. Not Chomsky.