Stanford CS25: V4 I Hyung Won Chung of OpenAI

  • Published Sep 24, 2024

Comments • 74

  • @hyungwonchung5222
    @hyungwonchung5222 3 months ago +350

    I very much enjoyed giving this lecture! Here is my summary:
    AI is moving so fast that it's hard to keep up. Instead of spending all our energy catching up with the latest development, we should study the change itself.
    The first step is to identify and understand the dominant driving force behind the change. For AI, a single driving force stands out: exponentially cheaper compute and the scaling of progressively more end-to-end models to leverage that compute.
    However, this doesn’t mean we should blindly adopt the most end-to-end approach, because such an approach is simply infeasible. Instead, we should find an “optimal” structure to add given the current level of 1) compute, 2) data, 3) learning objectives, and 4) architectures. In other words, what is the most end-to-end structure that has just started to show signs of life? Such structures are more scalable and eventually outperform those with more structure when scaled up.
    Later on, when one or more of those 4 factors improve (e.g. we got more compute or found a more scalable architecture), then we should revisit the structures we added and remove those that hinder further scaling. Repeat this over and over.
    As a community, we love adding structure but are far less keen on removing it. We need to do more cleanup.
    In this lecture, I use the early history of the Transformer architecture as a running example of which structures made sense to add in the past, and why they are less relevant now.
    I find comparing the encoder-decoder and decoder-only architectures highly informative. For example, the encoder-decoder has a structure where input and output are handled by separate parameters, whereas the decoder-only uses shared parameters for both. Having separate parameters was natural when the Transformer was first introduced with translation as the main evaluation task: the input is in one language and the output is in another.
    Modern language models used in multiturn chat interfaces make this assumption awkward. Output in the current turn becomes the input of the next turn. Why treat them separately?
    Going through examples like this, my hope is that you will be able to view seemingly overwhelming AI advances in a unified perspective, and from that be able to see where the field is heading. If more of us develop such a unified perspective, we can better leverage the incredible exponential driving force!
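    A minimal sketch (not from the lecture; the class names below are made up for illustration) of the parameter contrast described above: the encoder-decoder keeps separate parameter stacks for the input and the output, while the decoder-only model runs one shared stack over the concatenated sequence, so the previous turn's output can simply stay in the context of the next turn.

    ```python
    # Toy contrast between encoder-decoder and decoder-only parameterization,
    # using standard PyTorch building blocks; hypothetical class names.
    import torch
    import torch.nn as nn

    class ToyEncoderDecoder(nn.Module):
        """Input and output are handled by separate parameter stacks."""
        def __init__(self, vocab=1000, d=64, heads=4, layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab, d)
            self.encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d, heads, batch_first=True), layers)
            self.decoder = nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d, heads, batch_first=True), layers)
            self.lm_head = nn.Linear(d, vocab)

        def forward(self, src_ids, tgt_ids):
            memory = self.encoder(self.embed(src_ids))   # encode the input once
            causal = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
            h = self.decoder(self.embed(tgt_ids), memory, tgt_mask=causal)
            return self.lm_head(h)

    class ToyDecoderOnly(nn.Module):
        """One shared stack over the concatenated input-and-output sequence."""
        def __init__(self, vocab=1000, d=64, heads=4, layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab, d)
            # Encoder layers plus a causal mask behave as a decoder-only stack.
            self.blocks = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d, heads, batch_first=True), layers)
            self.lm_head = nn.Linear(d, vocab)

        def forward(self, ids):
            causal = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
            return self.lm_head(self.blocks(self.embed(ids), mask=causal))

    # Multi-turn chat is just one growing sequence for the decoder-only model:
    # the previous turn's output tokens become part of the next turn's context.
    prev_turn = torch.randint(0, 1000, (1, 12))
    new_input = torch.randint(0, 1000, (1, 5))
    logits = ToyDecoderOnly()(torch.cat([prev_turn, new_input], dim=1))
    ```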

    • @s.7980
      @s.7980 3 months ago

      Thank you so much for giving us this lecture! Enabling us to think in this new perspective!

    • @frankzheng5221
      @frankzheng5221 3 months ago

      Nice of u

    • @shifting6885
      @shifting6885 3 months ago +1

      very nice explanation, thx a lot!

    • @ansha2221
      @ansha2221 3 months ago

      Amazing Content! Thank you

    • @shih-binshih9889
      @shih-binshih9889 3 months ago

      tks a lotssss!!

  • @labsanta
    @labsanta 3 months ago +19

    00:07 Hyung Won Chung works on large language models and training frameworks at OpenAI.
    02:31 Studying change to understand future trajectory
    07:00 Exponentially cheaper compute is driving AI research
    09:25 Challenges in modeling human thinking for AI
    13:43 AI research heavily relies on exponentially cheaper compute and associated scaling up.
    15:59 Understanding the Transformer as a sequence model and its interaction mechanism
    19:58 Explanation of cross attention mechanism
    21:55 Decoder-only architecture simplifies sequence generation
    25:41 Comparing differences between the decoder-only and encoder-decoder architectures
    27:38 Hyung Won Chung discusses the evolution of language models.
    31:33 Deep learning hierarchical representation learning discussed
    33:34 Comparison between bidirectional and unidirectional fine-tuning for chat applications

  • @kitgary
    @kitgary 2 months ago +6

    There are so many geniuses in this field! Really amazing!

  • @varshneydevansh
    @varshneydevansh 3 months ago +16

    He is so well organized in his thoughts, and philosophical too. I liked the way he related the dropping pen and gravity (the dominant force) to AI (and linear algebra).

  • @JuliusSmith
    @JuliusSmith 2 months ago +2

    Thanks for your excellent lecture! Regarding the "bitter lesson", I remain optimistic (as a signal processing expert) that we can push to the _left_ in that diagram to obtain comparable performance at far less compute cost. While I agree with your prescription for frontier model development, there is much work to be done pushing to the left as well. I have already seen many instances of this. Witness, for example, the multi-scale, perceptually rooted adversarial losses in the major audio codecs these days - sure, we could learn all that end-to-end by effectively simulating the evolution of the human ear, but we don't really have to. For me the program is (1) get the best results at the far right of your diagram (maximum compute, maximum performance), then (2) push to the left to reduce computation (by literally orders of magnitude) while maintaining comparable quality. Even distillation is an example of this. There are many. Thanks again for your stimulating talk!
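    One concrete instance of the "push to the left" mentioned above is distillation. Below is a minimal, generic sketch of a soft-target distillation loss (temperature-scaled KL against a teacher plus ordinary cross-entropy on the labels); the function name and hyperparameter values are illustrative, not anything from the talk.

    ```python
    # Generic knowledge-distillation loss: a smaller student matches a larger
    # teacher's tempered output distribution while also fitting the labels.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft-target term: KL between tempered student and teacher distributions.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-target term: ordinary cross-entropy on the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    # Example shapes: a batch of 8 examples over a 100-way output space.
    student = torch.randn(8, 100, requires_grad=True)
    teacher = torch.randn(8, 100)
    labels = torch.randint(0, 100, (8,))
    distillation_loss(student, teacher, labels).backward()
    ```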

  • @labsanta
    @labsanta 3 months ago +15

    Detailed Summary for [Stanford CS25: V4 I Hyung Won Chung of OpenAI](th-cam.com/video/orDKvo8h71o/w-d-xo.html)
    "History and Future of Transformers in AI: Lessons from the Early Days | Stanford CS25 Lecture"
    [00:07](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=7) Hyung Won Chung works on large language models and training frameworks at OpenAI.
    - He has worked on various aspects of large language models, including pre-training, instruction fine-tuning, reinforcement learning from human feedback, and reasoning.
    - He has also been involved in notable works such as the scaling Flan papers (Flan-T5, Flan-PaLM) and T5X, the training framework used to train the PaLM language model.
    [02:31](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=151) Studying change to understand future trajectory
    - Identifying dominant driving forces behind the change
    - Predicting future trajectory based on understanding driving force
    [07:00](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=420) Exponentially cheaper compute is driving AI research
    - Compute costs decrease every five years, leading to AI research dominance
    - Machines are being taught to think in a general sense due to cost-effective computing
    [09:25](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=565) Challenges in modeling human thinking for AI
    - Attempting to model human thinking without understanding it poses fundamental flaws in AI.
    - The AI research has been focused on scaling up with weaker modeling assumptions and more data.
    [13:43](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=823) AI research heavily relies on exponentially cheaper compute and associated scaling up.
    - Current AI research paradigm is learning-based, allowing models to choose how they learn, which initially leads to chaos but ultimately leads to improvement with more compute.
    - Upcoming focus of the discussion will be on understanding the driving force of exponentially cheaper compute, and analyzing historical decisions and structures in Transformer architecture.
    [15:59](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=959) Understanding the Transformer as a sequence model and its interaction mechanism
    - A Transformer is a type of sequence model that represents interactions between sequence elements using dot products (a minimal attention sketch follows this summary)
    - The Transformer encoder-decoder architecture is used for tasks like machine translation, involving encoding input sequences into dense vectors.
    [19:58](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=1198) Explanation of the cross-attention mechanism
    - The decoder attends to the output from the encoder layers
    - The encoder-only architecture is a simplification suited to specific NLP tasks like sentiment analysis
    [21:55](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=1315) Decoder-only architecture simplifies sequence generation
    - The decoder-only architecture can be used for supervised learning by concatenating the input with the target
    - The self-attention mechanism serves both the cross-attention role and within-sequence learning, sharing parameters between the input and target sequences
    [25:41](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=1541) Comparing differences between the decoder-only and encoder-decoder architectures
    - The decoder attends to the same layer representation of the encoder
    - The encoder-decoder architecture has additional built-in structures compared to the decoder-only architecture
    [27:38](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=1658) Hyung Won Chung discusses the evolution of language models.
    - Language models have evolved from simple translation tasks to learning broader knowledge.
    - Fine tuning pre-trained models on specific data sets can significantly improve performance.
    [31:33](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=1893) Deep learning hierarchical representation learning discussed
    - Different levels of information encoding in bottom and top layers of deep neural nets
    - Questioning the necessity of bidirectional input attention in encoders and decoders
    [33:34](th-cam.com/video/orDKvo8h71o/w-d-xo.html&t=2014) Comparison between bidirectional and unidirectional fine-tuning for chat applications
    - Bidirectional fine-tuning poses engineering challenges for multi-turn chat applications requiring re-encoding at each turn.
    - Unidirectional fine-tuning is more efficient as it eliminates the need for re-encoding at every turn, making it suitable for modern conversational interfaces.
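    A minimal sketch of the two recurring mechanisms in this summary (the dot-product interaction at 15:59 and the bidirectional vs. unidirectional attention question at 31:33 and 33:34); this is a generic formulation for illustration, not code from the lecture, and the function name is made up.

    ```python
    # Scaled dot-product self-attention. causal=False ~ bidirectional
    # (encoder-style); causal=True ~ unidirectional (decoder-style), where each
    # position may only attend to earlier positions.
    import math
    import torch

    def self_attention(x, w_q, w_k, w_v, causal=False):
        q, k, v = x @ w_q, x @ w_k, x @ w_v                        # project tokens
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))   # pairwise dot products
        if causal:
            mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
            scores = scores.masked_fill(mask, float("-inf"))       # hide future tokens
        return torch.softmax(scores, dim=-1) @ v                   # weighted sum of values

    # Example: 6 tokens, model width 16.
    x = torch.randn(6, 16)
    w_q, w_k, w_v = (torch.randn(16, 16) for _ in range(3))
    bidirectional_out = self_attention(x, w_q, w_k, w_v, causal=False)
    unidirectional_out = self_attention(x, w_q, w_k, w_v, causal=True)
    ```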

  • @junxunpei
    @junxunpei 3 months ago +5

    Nice talk. Before this talk, I was confused about the scaling law and the design choices in GPT. Now I understand the source of this wonderful work.

  • @manonamission2000
    @manonamission2000 2 months ago +5

    The dung beetle uses the milky way to navigate on the ground... humans use criteria, substantially different, to perform decision-making... the point I am trying to make is that mechanisms of machine-based reasoning may be wildly different than those we are aware of (mostly human) and we should be open to the possibility of discovering new forms of reasoning which may not fit within our preconceived notions of information processing/handling.

  • @g111an
    @g111an 3 months ago +11

    Thanks for giving me a new direction to think in. Learnt something new

  • @winterspring998
    @winterspring998 2 months ago +6

    Working at OpenAI and still having a full head of hair... he's a god..

    • @laheek1190
      @laheek1190 2 months ago +1

      Hahahaha

  • @KSK986
    @KSK986 2 months ago +1

    Great talk, I really enjoyed the perspective and intuition.

  • @AIWorks2040
    @AIWorks2040 3 months ago +48

    I keep watching this. This is my fourth time.

    • @KSSE_Engineer
      @KSSE_Engineer 3 months ago

      😂 bad learner .

    • @AIWorks2040
      @AIWorks2040 2 days ago

      @@KSSE_Engineer Completely the opposite. A good learner is someone who can repeat and apply what they've learned. That takes more time. If someone thinks they've learned everything quickly but actually has little understanding, that's a bad learner.

  • @AoibhinnMcCarthy
    @AoibhinnMcCarthy 3 months ago +10

    Great guest lecture

  • @sucim
    @sucim 3 months ago +3

    Good talk, but too short. I love the premise of analysing the rate of change! Unfortunately, the short time only permitted the observation of one particular change; it would be great to observe the changes in other details over time (architectural, but also hardware and infra like FlashAttention), as well as more historical changes (depth/ResNets, RNN -> Transformers), and then use this library to make some predictions about possible/likely future directions for change.

  • @luqmanmalik9428
    @luqmanmalik9428 3 months ago +4

    Thank you for these videos. I am learning generative AI and LLMs, and these videos are so helpful ❤

  • @hayatisschon
    @hayatisschon 3 months ago +7

    Good lecture and insights!

  • @QueenLover-j5i
    @QueenLover-j5i 3 months ago +2

    By YouSum Live
    00:02:00 Importance of studying change itself
    00:03:34 Predicting future trajectory in AI research
    00:08:01 Impact of exponentially cheaper compute power
    00:10:10 Balancing structure and freedom in AI models
    00:14:46 Historical analysis of Transformer architecture
    00:17:29 Encoder-decoder architecture in Transformers
    00:19:56 Cross-attention mechanism between decoder and encoder
    00:20:10 All decoder layers attend to final encoder layer
    00:20:50 Transition from sequence-to-sequence to classification labels
    00:21:30 Simplifying problems for performance gains
    00:22:24 Decoder-only architecture for supervised learning
    00:22:59 Self-attention mechanism handling cross-attention
    00:23:13 Sharing parameters between input and target sequences
    00:24:03 Encoder-decoder vs. decoder-only architecture comparison
    00:26:59 Revisiting assumptions in architecture design
    00:33:10 Bidirectional vs. unidirectional attention necessity
    00:35:16 Impact of scaling efforts on AI research
    By YouSum Live

  • @claudioagmfilho
    @claudioagmfilho 3 months ago +3

    🇧🇷🇧🇷🇧🇷🇧🇷👏🏻, Thanks for the info! A little complex for me, but we keep going!

  • @puahaha07
    @puahaha07 3 months ago +2

    Fascinating! Really curious what kind of Q&A was exchanged in the classroom after this presentation.

  • @soyeopjeung2853
    @soyeopjeung2853 2 months ago

    Thank you for the great lecture.

  • @kylev.8248
    @kylev.8248 3 months ago +3

    Thank you so much

  • @RahulAtlury
    @RahulAtlury 12 days ago

    this is lovely!🙂

  • @bayptr
    @bayptr 3 months ago

    Awesome lecture and interesting insights! One small remark: the green and red lines in the Performance vs. Compute graph should probably be monotonically increasing.

  • @pratyushparashar1736
    @pratyushparashar1736 3 months ago

    Amazing talk! I was wondering why the field has moved toward decoder-only models lately and whether there's an explanation for it.

  • @aniketsakpal4969
    @aniketsakpal4969 3 months ago +1

    Amazing!

  • @_rd_kocaman
    @_rd_kocaman 3 months ago +1

    I keep watching this. This is my eleventh time.

  • @beaniegamer9163
    @beaniegamer9163 2 months ago

    Brilliant mind 👌 ❤🎉

  • @vvc354
    @vvc354 2 months ago +3

    To think there's talent like this in Korea too!

  • @ShowRisk
    @ShowRisk 3 months ago +1

    I keep watching this. This is my fourth time.

  • @specifictoken
    @specifictoken 3 months ago +2

    Go K-Bro!!

  • @gwonchanjasonyoon8087
    @gwonchanjasonyoon8087 3 months ago +1

    No PyTorch embedding?

  • @ryan2897
    @ryan2897 3 months ago +2

    I wonder if GPT-4o is a decoder only model with causal (uni-directional) attention 🤔

    • @lele_with_dots
      @lele_with_dots 3 months ago

      Less structure; it is just a huge MLP

  • @studenthub-q9g
    @studenthub-q9g 3 months ago +3

    Where is that shirt from? Nice white shirt

  • @s8x.
    @s8x. 3 months ago +4

    This guy's stats are insane 😂

  • @GatherVerse
    @GatherVerse 3 months ago +1

    You should invite Christopher Lafayette to speak

  • @sortof3337
    @sortof3337 3 months ago

    Okay, who else thought about the man named Hyung Won Chung of OpenAI. :D

  • @nityak8536
    @nityak8536 2 months ago

    👍👍

  • @MJFloof
    @MJFloof 3 months ago

    Awesome

  • @pierce8308
    @pierce8308 3 months ago +2

    I find this talk a bit unsatisfactory. He mentions how, for encoder-decoder models, the decoder only attends to the last encoder layer, and how we treat input and output separately in the encoder-decoder. However, that's not really the point of encoder-decoder models, right? It's just that the encoder-decoder model has an intermediate encoder objective (to represent the input), that's all.
    The decoder attending only to the last layer, or separating input and output, is just how the original Transformer did it. Clearly it's possible to attend to layer-wise encodings instead of only the last layer's encodings, to give just one example. It's also possible to mimic decoder-style generation by adding the new input to the encoder rather than the decoder. I would have really liked some experiments, even toy ones, because as presented it's incredibly unconvincing. Specifically, he mentions a couple of times that the encoder's final layer is an information bottleneck, but, I mean, just attend to layer-wise embeddings if you want, or put some MLP on top of the encoder's last states.
    I'd argue we are putting more structure into the "decoder-only" model (by which I mean the causal-attention decoder, which is what he describes). The reason is causal attention, where we restrict the model to attend only to the past, both during training and inference, even for the part of the output that has already been generated.
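    To make the comment's point concrete, here is a toy sketch (not an experiment from the talk; all names and sizes are illustrative) showing that cross-attention can just as easily read per-layer encoder states as the final layer's output.

    ```python
    # Cross-attention over layer-wise encoder states vs. final-layer-only.
    import torch
    import torch.nn as nn

    d, heads, n_layers = 64, 4, 3
    enc_layers = nn.ModuleList(
        [nn.TransformerEncoderLayer(d, heads, batch_first=True) for _ in range(n_layers)])
    cross_attn = nn.ModuleList(
        [nn.MultiheadAttention(d, heads, batch_first=True) for _ in range(n_layers)])

    src = torch.randn(1, 10, d)   # encoder-side input states
    tgt = torch.randn(1, 7, d)    # decoder-side states (self-attention omitted for brevity)

    # Collect every encoder layer's output, not just the last one.
    enc_states, h = [], src
    for layer in enc_layers:
        h = layer(h)
        enc_states.append(h)

    # Variant A: attend only to the final encoder layer (as in the original Transformer).
    out_final, _ = cross_attn[0](tgt, enc_states[-1], enc_states[-1])

    # Variant B: decoder layer i attends to encoder layer i's states instead.
    x = tgt
    for i in range(n_layers):
        x, _ = cross_attn[i](x, enc_states[i], enc_states[i])
    ```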

  • @untitled1727
    @untitled1727 2 months ago +1

    Wow, he's seriously good-looking...

  • @joedorben3504
    @joedorben3504 1 month ago

    Keep in mind that guys like this and Ilya are why OpenAI is what it is, not Sam Altman. I'm sure Altman is smart, but he is not the kind of genius these guys are, nor as knowledgeable.

    • @MrFlawor
      @MrFlawor 1 month ago

      Look at his portfolio and talk again. You’re tunnel visioning hard af. You can’t compare the two skill sets the way you do

  • @matheussmoraess
    @matheussmoraess 2 months ago

    What’s your favorite part?

  • @123jay34
    @123jay34 3 months ago

    Yep. I’m lost now lol

  • @dealscale
    @dealscale 2 months ago

    Imagine if an actual PhD scientist sat on the OpenAI board making decisions. Instead, a rich trust-fund baby with no background somehow makes it onto the board.

  • @misterbeach8826
    @misterbeach8826 3 months ago +1

    Nice talk, but physical analogies such as the one at 6:30 are... rather naive and high-school level. He should have focused only on the AI details.

  • @s.h5247
    @s.h5247 2 months ago

    I found him intellectually sexy 😊

  • @tungcaveusd
    @tungcaveusd 2 months ago

    I have no idea who he is

  • @joseph24gt
    @joseph24gt 3 months ago

    buy nvda!

  • @TheDoomWizard
    @TheDoomWizard 2 months ago +4

    No idea what he is talking about.

    • @childrenofkoris
      @childrenofkoris 1 month ago

      He's talking about developing AI with human-like thinking capabilities, covering past, current, and future developments.

  • @robertthallium6883
    @robertthallium6883 3 months ago

    show us the git ffs