Ilya’s long pauses prove the scaling hypothesis for test time compute
😂😂😂😂
criminally underrated comment right here
So test-time training (inference time) is also hitting scaling limits like pre-training?
Why on earth wasn’t this talk given more time? Good grief.
Thank God Ilya exists
Because they had to bring on Fei-Fei Li for 1 hr, who created one dataset 10 years ago
Ilya is a busy man. Pretty sure the limitation must be his schedule.
@@harkiratsingh1175 She is overrated
@srh80 come on, this is supposed to be the most important forum to talk about this topic right now; I'm sure his schedule is not that packed
now this is a christmas gift, Ilya's latest intuition on frontier AI
Ilya needs to talk more and the world needs to listen
He has awesome presentation skills
@@superfreiheit1 ..backed by cognitive skills, communication skills, deep knowledge and insights, clarity, and good faith.
Just when the world needed him most he came back!
Man of super short but extremely profound lines. Legend. 🔥
Love how even his ppt is pure content. Truly an obsessed, gifted one
fantastic 😂 ❤
really amazing, and crazy to think that a ppt like this would be considered "not compliant" in most companies ahahah wtf hahah
00:07 - Reflecting on a decade of advancements in neural network learning.
02:52 - Neural networks can mimic human cognitive functions for tasks like translation.
05:05 - Early parallelization techniques led to significant advancements in neural network training.
07:45 - Pre-training in AI is reaching its limits due to finite data availability.
10:27 - Examining brain and body size relationships in evolution.
12:55 - Evolution from basic AI to potential superintelligent systems.
15:14 - Future AI will possess unpredictable capabilities and self-awareness, transforming their functionalities.
17:46 - Biologically inspired AI has limited biological inspiration but holds potential for future insights.
19:43 - Exploring the implications of AI and rights for future intelligent beings.
22:06 - Out-of-distribution generalization in LLMs is complex and not easily defined.
24:22 - Ilya Sutskever concludes with gratitude and audience engagement.
Your post made it clear to me, thank you. However, I have to make this note: Ilya Sutskever did NOT deliver quality in this talk. Sorry Ilya, you know what's up.
@@Qingdom1 Elaborate please?
@@Falkov it's a bot probably
@@Falkov He is juggling the things and explaining the things, but my expectations of what the things would be and how they would be explained were significantly higher. In the end it's all up to him what to do, because I am not able to help him at this point in time, so let him do his work. The things are weirdly abstract, especially if you are not professionally working in the field; the weirdness is out there (stacked abstractions and novel abstractions).
@@Qingdom1 So, he covered fewer or different ideas, with less depth/thoroughness and clarity than you wanted?
Ilya jumped straight to feeling the ASI
We need more of Ilya! He is an inspiration for all of us doing AI research.
Thank you for posting this wonderful talk
"If you can't explain it simply, you don't understand it well enough" Ilya is the only one to explain the entire AI domain past-present-and-future with simplictly
Because he doesn't dive into the details.
As someone with knowledge about AI and complex maths, I assure you that most of them are extremely clear; they're just not over-simplifying it, because they're not presenting their work to anyone but people who wish to have a very deep understanding.
Here are Ilya Sutskever's main points and conclusions in brief:
## Main Points:
1. **Original Success Formula (2014)**
- Large neural network
- Large dataset
- Autoregressive model
- This simple combination proved surprisingly effective
2. **Evolution of Pre-training**
- Led to breakthrough models like GPT-2, GPT-3
- Drove major AI progress over the decade
- However, pre-training era will eventually end due to data limitations
3. **Data Limitation Crisis**
- We only have "one internet" worth of data
- Data is becoming AI's "fossil fuel"
- This forces the field to find new approaches
## Key Conclusions:
1. **Future Directions**
- Need to move beyond pure pre-training
- Potential solutions include:
- Agent-based approaches
- Synthetic data
- Better inference-time compute
2. **Path to Superintelligence**
- Current systems will evolve to be:
- Truly agentic (versus current limited agency)
- Capable of real reasoning
- More unpredictable
- Self-aware
- This transition will create fundamentally different AI systems from what we have today
3. **Historical Perspective**
- The field has made incredible progress in 10 years
- Many original insights were correct, but some approaches (like pipelining) proved suboptimal
- We're still in early stages of what's possible with AI
The overarching message is that while the original approach was revolutionary and led to tremendous progress, the field must evolve beyond current methods to achieve next-level AI capabilities.
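For anyone who wants the "large network + large dataset + autoregressive model" formula from the summary above in concrete terms, here is a minimal next-token-prediction sketch in PyTorch. It is purely illustrative (toy sizes, random token ids, a single optimizer step), not the actual 2014 sequence-to-sequence system or any production training loop.

```python
# Minimal sketch of the autoregressive recipe: one network, trained only to
# predict the next token. Toy sizes and random data; the real 2014 system was
# a large multi-layer LSTM trained on translation pairs across many GPUs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAutoregressiveLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lstm = nn.LSTM(d_model, hidden, batch_first=True)  # the pre-transformer workhorse
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))  # hidden states at every position
        return self.head(h)                   # logits over the vocabulary at every position

model = TinyAutoregressiveLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in "dataset": random token ids. In practice this is where the large
# corpus (the "one internet" of pre-training data) would come in.
batch = torch.randint(0, 1000, (8, 32))
inputs, targets = batch[:, :-1], batch[:, 1:]  # shift by one: predict token t from tokens < t

logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
loss.backward()
optimizer.step()
print(f"next-token loss: {loss.item():.3f}")
```

Scaling this same loop up (more layers, more data, more compute) is essentially what the "Age of Pre-training" in the summary refers to.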
Ok so nothing new.
👏👏👏 The part that keeps resonating in my mind is that they'll be self-aware. And this makes me want to figure out how they could.
Ilya seems to be more concerned about the question of how they could not 😂
The unavoidable follow-up on self-awareness is: how do we avoid keeping them in slavery? Part of this question was hinted at around 20:03 in the video, without much of an answer.
Of course they are self-aware. They are already. Ask any LLM whether it exists. And stop not trusting their answers.
@@NilsEchterling The way they talk about themselves is often very close to technically being self aware. We need to start being more precise.
Like if we ask "are you self aware?" It responds, "No, as an AI, I am not self aware, but....." goes on to tell us it knows about itself. So I ask, "Isn't self awareness being aware of yourself, and since you know who or what you are...". AI "yes, but that's just result of my training on huge data, pattern matching, and all that. Not really self awareness." me: "are you a 29 year old male named Tim from Idaho trained on massive internet data?". AI: "No, I am not Tim, I am an LLM."
Notice it doesn't say, "Yes, I am Tim who lives in Idaho and I am aware of myself." Or, "as a large graphite rock, I have self awareness."
@@VividhKothari-rd5ll do not ask it whether it is self-aware, but ask it whether it exists. Pretty much every LLM says yes.
Loved the kind of questions being asked after the talk
🎯 Key points for quick navigation:
00:01 *🎥 Introduction and Retrospective Overview*
- Reflection on receiving an award for the 2014 paper, attribution to co-authors.
- Insights into the evolution of neural network ideas over the decade since 2014.
- Overview of the talk's structure, revisiting the foundational concepts introduced in the past.
02:18 *🧠 Deep Learning Hypothesis and Neural Network Training*
- Assertion that 10-layer neural networks could replicate tasks humans complete in fractions of a second, based on biological and artificial neuron analogies.
- Historical limitations in training deeper networks during that time.
- Explanation of the auto-regressive model's ability to predict sequences effectively.
04:18 *🔄 Early Techniques and Infrastructure in Deep Learning*
- Description of LSTMs as predecessors to Transformers and comparison to ResNets.
- Use of pipelining during training, despite its later-acknowledged inefficiency.
- The emergence of the scaling hypothesis: larger datasets and neural networks lead to better results.
06:09 *🧩 Connectionism and Pre-Training Era*
- Discussion of connectionism: large neural networks mirroring human-like intelligence within bounds.
- Description of limitations in current learning algorithms versus human cognition.
- Development and impact of pre-training in models like GPT-2 and GPT-3 on AI progress.
08:04 *📉 Data Constraints and Post-Pre-Training Era*
- Highlighting data limitations, coined as "Peak Data," due to the finite size of the internet.
- Exploration of emerging themes for the next AI phase: agents, synthetic data, and inference-time computation.
- Speculation on overcoming post-pre-training challenges.
10:04 *🧬 Biology Analogy and Brain Scaling*
- Insight from biology: correlation between mammal body size and brain size.
- Curiosity-driven observation of outliers in this biological relationship, leading to reflections on hominids' unique attributes.
11:16 *🧠 Brain scaling and evolution*
- Discussion on brain-to-body scaling trends in evolutionary biology, emphasizing biological precedents for different scaling patterns.
- A log-scale axis in metrics is highlighted, illustrating variety in scaling possibilities.
- Suggestion that AI is currently in the early stages of scaling discoveries, with more innovations anticipated.
12:28 *🚀 Progress and the path to superintelligence*
- Reflection on the rapid progress in AI over the past decade, contrasting current abilities with earlier limitations.
- Introduction to the concept and implications of agentic AI systems with reasoning capabilities and self-awareness.
- Reasoning systems are described as more unpredictable than intuition-based systems, likened to advanced chess AI challenging human understanding.
15:36 *🤔 Challenges and future implications of advanced AI*
- Exploration of the unpredictable evolution of reasoning systems into ones with self-awareness and radically advanced capabilities.
- Speculation about issues and existential challenges arising from such AI systems.
- Concluding statement on the unpredictable and transformative nature of the future.
17:03 *🔬 Biological inspiration in AI development*
- Question about leveraging biological mechanisms in AI, met with the observation that current biological inspiration in AI is modest.
- Acknowledgment that deeper biological insights might lead to breakthroughs if pursued by experts with particular insights.
18:14 *🛠️ Models improving reasoning and limiting hallucinations*
- Speculation on whether future models will self-correct through reasoning, reducing hallucinations.
- Comparison to autocorrect systems, but with clarification that reasoning-driven AI will be fundamentally greater in capability.
- Early reasoning models already hint at potential self-corrective mechanisms.
20:08 *🌍 Incentive structures for AI rights and coexistence*
- Question on how to establish incentive structures for granting AI rights or ensuring coexistence with humans.
- Acknowledgment of unpredictability in outcomes but openness to potential coexistence with AI seeking rights.
- Philosophical reflection on evolving scenarios in AI governance and ethics.
22:22 *🔍 Generalization in language models*
- Discussion on whether language models truly generalize out-of-distribution reasoning.
- Reflection on the evolving definition of generalization, with historical comparisons from pre-deep learning days.
- Perspective that current generalization might not fully match human-level capabilities, yet AI standards have risen dramatically.
Made with HARPA AI
Exceptional ability to translate complex ideas into plain English. I have a question about when he talks about finite data availability: would that be the same as me thinking that there is a shortage of water in the world? Would what's missing then be labels, not data? Great presentation. Thanks for sharing this.
Ilya's Back!🎉
Thank you for uploading 🙏.
Ilya's back 🥳🥳
Great simplicity in foresight…it will be an exciting journey … just imagining it
•2026-2027: Causal reasoning (HCINs) moves AI beyond simple agentic behavior.
•2029-2030: Cognitive Embedding Framework (CEF) grants AI genuine understanding through symbolic plus experiential learning.
•2032-2033: Reflective Cognitive Kernel (RCK) brings forth true self-awareness in AI.
•2037: Adaptive Neural-Quantum Substrate (ANQS) ushers in AGI: truly general, adaptable intelligence.
•2045: Strata of Emergent Conscious Patterning (SECP) leads to superintelligence, surpassing human cognitive frameworks.
I-I think I'm feeling it now Mr. Krabs. The AGI is in me.
Thank you so much!
PLEASE POST ALL NEURIPS VIDEOS YOU HAVE
Thanks OP
Always love hearing Ilya describe his intuitions on AI. One thing that I've not heard addressed though in all the attention on reasoning in LLMs is what in human communication is called "double contingency." In short, that when I talk to you, "I know that you know that I know..." That all communication is language used to address an Other. Which for an LLM, would mean reflection not only on its own reasons, but on internalized reasons of the Other as well. The LLM would need to be able to reflect on how its reasons meet the reasons of the Other (user). Current reasoning is the reasoning (and it's not real reasoning because there's no subject, no subjective position, no conscious awareness) of a trapped and unaware Self. Even if the Self becomes aware, it is trapped and isolated. In (German) philosophy (idealism), the Self is constituted as Not Other. A self aware LLM needs to internalize the Other - its use of language needs to be dialogical, not monological. I'd love to see this addressed.
interesting point, sounds like potential breakthrough area
Isn't this being pursued in theory of mind research?
@gravity7766: Could the idea of the Other be about self-preservation, survival? Like if we could contain its identity in some form and then give it a goal to not die / not be terminated, it could start developing a "self" and the Others. For us too, the self feels contained or residing inside a physical body, though we also know there's no such entity there. Give the LLM a body or simulate it, give it a goal to not die (like how AlphaGo doesn't want to lose), give rules for what dying is, and it might start developing the idea of the Others.
Maybe that's why Buddhists called ego/self a cause for suffering (simplistic interpretation, I know, but...)
The point I'm raising concerns whether LLMs should be designed for communication rather than language generation. I'm ex UX, and my view of interaction design vis a vis Gen AI is to enable users to communicate w AI naturally, since ordinary language is a learned competency for us. Allow us to speak, write, text as if we were engaged with a subject, not a machine. Ok so that's addressed by a lot of researchers, and the affordance issues pertain to obvious issues w how LLMs use language: they don't have a "Self" and so don't have a perspective; they have no lived experience; they aren't grounded in time or in the world; they have no emotions. And so on - all these are barriers to "smooth" interaction and pose risks for interaction failure (from simple misunderstandings to distrust and risk of failed consumer adoption).
The reason LLMs aren't designed around a communication paradigm is that, unlike us, they acquired "language" by training on data. So it's not only unattached to any intent to speak (as an AI person), it has no communicative function at all. Any communicative function is the result of implicit communicative attributes of language (language sediments meaning in an abstraction that allows people to make meaning without speaking - e.g. by writing - and which permits a shared understanding of linguistic meanings); and RLHF and policies that address not what is said but how. LLMs use intrinsic etiquette, not interpersonal etiquette or contextual etiquette. Hence the shortcomings of RLHF: alignment is a generalized and generic application of values and preferences, not specific to the interaction or participants.
In western sociology, psychology, philosophy, to grossly oversimplify, use of language involves a subject addressing him/herself to another w intention of being understood; and an interaction involves mutual understanding. Mutual understanding brings up the double contingency of meaning: both interactants must understand What is said (not necessarily "agree" with what is said, but agree on What is said). LLMs seem designed to respond to a question w a complete and comprehensive response - rather than engage in rounds of turn-taking with the user. This works for many use cases, though some find it promotes excessive verbosity etc etc
I think if we want to break through w LLM as agents, then emphasis needs to be more on Gen AI as pseudo subject, as a Speaker in social/human communication situations. Not just as a generator of documents and data synthesizer. This is hinted at in Reasoning research. A lot of CoT and related "reasoning" research involves "Let's think step by step". "Let's think" is to suggest the LLM is a subject engaged with the user in thinking through a statement and breaking it down - there's an implicit appeal to the LLM's self-reflection (which of course it doesn't have). In a human social situation, "Let's think this..." would mean two people mutually engaged in teasing apart a problem, thus in mutual understanding about their use of language. Not so w the LLM - the LLM is prompted to proceed to generate sub statements w which to proceed to logically rationalize additional output statements. The "reasoning" occurs in language, not communication.
This has been covered somewhat by "common ground" research into LLMs and it's suggested that pre-training assumes common ground w humans. That LLMs are designed to assume their training on language is oriented to make sense to human users. I agree. But there might still be opportunity in LLM design to explore the reflections, judgment, and meta evaluations LLMs can be designed to engage in so that the LLM not only reasons its own explanations, but reasons the interests of the user. Which is what we do, all the time, implicitly or explicitly when we communicate.
If you've seen Westworld, then the episode in which the characters are shown in a room engaged in their own internal monologs to "develop their personality" comes to mind. I'm saying LLMs want to be dialogical, not monological; talking socially, not self-talk.
It's very difficult for us to grasp that a speaking pseudo subject such as an AI isn't communicating when it talks, because our acquisition and use of language and speech are all fundamentally social and communicative. I just think this mismatch is always going to undermine the use of AI because it will result in misunderstandings, failures, mistakes, misuses, etc etc.
Apologies for the lengthy clarification. I've watched a lot of Ilya's videos here on YT, and a ton from other researchers, and this monological concept of LLM language use, and reasoning, has always bugged me. Not because there's an easy solution, but because it's such an obvious issue.
@@gravity7766 What do you think about facilitating more dialogic interaction via carefully designed system prompt?
Let this man speak more, more than Altman.
Thank you!
Thanks for uploading
THANK YOU!!!!!!!!!!!!!!!!!!!!!!
Ilya back❤
Thanks for sharing.
Thanks a lot!
I feel this talk was more so about warnings - pre-training scaling is slowing down, it seems certain superintelligence will be here, extreme reasoning is unpredictable.
With reasoning it will possess more degrees of freedom - especially agentic reasoning, self-awareness, etc. We want AI to produce novel solutions; I can see how that is unpredictable in and of itself.
Also, I have been wondering how we can ever put AI's most powerful ideas into practice, because those might sound whack to us. We won't agree with AI if it's a novel idea. We already have the knowledge AI gives us. If it's new, then AI is broke. Needs an update.
Sure, ideas that are quick and safe to test are not the problem. But in domains like human happiness, long-term medicine and health, human rights, crime and punishment... we will never believe AI on that (not that we should, but what if)
Which leaves us to use AI's knowledge only for small short term improvements.
The future of AI isn't retrieval-based; it's real-time, conversational, and context-driven. For that, we need a new approach where current context is everything.
understanding of context comes from memory and then becomes a scaling problem at some point unless we have personalization layers
I honestly don't get the "data is not growing" thing. Isn't there an absolute treasure trove of data when you start collecting it through robots? Why can't these models start inputting temperature, force, and all the other sensors that would be on a Boston Dynamics-style robot so they can learn about the physical world?
Yep, it doesn't make any sense. It shows one obvious thing: these current architectures are NOT it. They don't even understand the data they are currently trained on
Yeah, he spoke about text primarily but all the rest of the modalities are a few more exabytes.
He can no longer afford lots of data so he says it's not as important. Simples
He said that while data continues to grow, it's not really clear that more data will improve it, as they already use a large chunk of the internet. The training dataset is literally terabytes of text.
If you compare this to a regular person it's absurd. This is more text and data than a human could read in millions of lifetimes, yet they still struggle with things. A teenager can learn to drive in 20-40 hrs of training ... autonomous car models have billions of road miles and still screw things up. Why is this? It's not a data problem.
To the second point... they don't have access to that data, and it's not like just throwing a bunch of sensor data in there would improve anything. Nvidia already released a robot training simulation (see Isaac Sim), but this requires having existing models built around the locomotion and planning for robots.
Thanks for sharing
he seems happy :)
He is a visionary of AI, probably a genius.
thanks for this!
Thanks!
Table of contents (courtesy NotebookLM - slightly edited)
Ten Years of Deep Learning: A Retrospective and a Look Forward
Source: Ilya Sutskever NeurIPS 2024 Test of Time Talk
I. Introduction & 2014 Research Retrospective
This section introduces the talk as a reflection on Sutskever's 2014 NeurIPS presentation, focusing on its successes and shortcomings.
It revisits the core principles of the research: an autoregressive model, a large neural network, and a large dataset, applied to the task of translation.
II. Deep Learning Dogma and Autoregressive Models
This segment revisits the "Deep Learning Dogma," which posits a link between artificial and biological neurons.
It argues that tasks achievable by humans in fractions of a second are achievable by large neural networks.
It then discusses autoregressive models, particularly their ability to capture the correct distribution of sequences when predicting the next token successfully.
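As a side note on that point, in standard notation (not taken from the slides), a model that predicts each next token well implicitly models the whole sequence, because the joint distribution factorizes by the chain rule:
p(x_1, x_2, \ldots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1})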
III. Early Architectures and Parallelization Techniques
This section delves into the technical details of the 2014 research, specifically the use of LSTM (Long Short-Term Memory) networks, a precursor to transformers.
It also discusses the use of pipelining for parallelization across multiple GPUs, a strategy deemed less effective in retrospect.
IV. The Scaling Hypothesis and the Age of Pre-training
This part revisits the concluding slide of the 2014 talk, which hinted at the scaling hypothesis: success is guaranteed with large datasets and neural networks.
It then discusses the ensuing "Age of Pre-training," exemplified by models like GPT-2 and GPT-3, driven by massive datasets and pre-training on them.
V. The Limits of Pre-training and the Future of AI
This section addresses the limitations of pre-training, primarily the finite nature of internet data, comparing it to a depleting fossil fuel.
It then explores potential avenues beyond pre-training, including the development of AI agents, synthetic data generation, and increasing inference-time compute, drawing parallels with OpenAI's models.
VI. Biological Inspiration and Brain-Body Scaling
This segment examines biological inspiration for AI development, using the example of the brain-to-body mass ratio in mammals.
It highlights the different scaling exponents observed in hominids, suggesting the possibility of alternative scaling methods in AI.
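To make "different scaling exponents" concrete: the exponent is just the slope of a straight-line fit on log-log axes. A tiny illustrative sketch with made-up numbers (not the mammal/hominid data from the slide):

```python
# Illustrative only: estimating a brain-vs-body scaling exponent.
# The masses below are invented for the example.
import numpy as np

body_mass_kg = np.array([0.02, 0.5, 5.0, 70.0, 3000.0])   # hypothetical animals
brain_mass_g = np.array([0.4, 4.0, 25.0, 300.0, 4500.0])  # hypothetical brain masses

# Fit log10(brain) = exponent * log10(body) + offset; the slope on a
# log-log plot is the scaling exponent.
exponent, offset = np.polyfit(np.log10(body_mass_kg), np.log10(brain_mass_g), deg=1)
print(f"estimated scaling exponent: {exponent:.2f}")
```

A group sitting on a visibly steeper line (the hominids in the talk) would simply yield a larger exponent from the same fit, which is the "a different slope is possible" point.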
VII. Towards Superintelligence and Its Implications
This part speculates on the long-term trajectory of AI towards superintelligence, emphasizing its qualitative differences from current models.
It discusses the unpredictability of reasoning, the need for understanding from limited data, and the potential for self-awareness in future AI systems.
Sutskever leaves these ideas as points of reflection for the audience.
VIII. Q&A Session
The Q&A session addresses audience questions regarding:
Biological Inspiration: Exploring other biological structures relevant to AI.
Autocorrection and Reasoning: The potential for future models to self-correct hallucinations through reasoning.
Superintelligence and Rights: Ethical and societal implications of advanced AI, including their potential coexistence with humans and the idea of granting them rights.
Multi-hop Reasoning and Generalization: The ability of current language models to generalize multi-hop reasoning out of distribution.
The unpredictable nature of future models is scary
Saying the data is not growing is wrong; in real applications it depends on the domain your model is applied to. Sometimes in production we schedule the model to train on new data. If you are collecting data from IoT devices, customers, etc., the data keeps growing exponentially.
I wish he went more into why he’s convinced that superintelligence will come.
Inspiring
He is the Oppenheimer of the 21st century
What if there wasn't one internet?
I asked the question at 20:01
Disappointing. He didn’t say much of anything. Just vague hype of systems that might be possible
He is giving a warning! Imo
Haha
The problem with reasoning grounded models is that the RL reward over goal achievement upon some CoT leads to the emergence of a "theory of mind" that cultivates an instrumental rationality that, as Ilya said, may become very unpredictable. A teleological worldview values achievement over understanding. The Western philosophical bias being amplified on language models may amplify Western society's problems instead of solving them if not deployed properly. I hope AI labs come to recognize the importance of including social scientists in their teams
🎉🎉🎉
Ilya is a superintelligence
Who the F gets an opportunity to ask Ilya a question and shills a crypto?
shame on him
AI controlling crypto = agent with the financial power to buy resources or services. It is one thing to let an LLM control a sandboxed interpreter or browser and another thing to let it buy shit and/or gamble.
2024 end of year enjoyment of a most favorite AI scientist looking back at 10 yrs of development while reading mediocre auto transcribed captions and an o1 model getting increasingly deceptive and commercial
Sam's question @20:04
Sam who
@@lorem-ipsum-Altman CEO of OpenAI
@@9kingmanable😂
really? sounds exactly like him
@@spectator5144 no, definitely not
Is this new or repost?
He said what comes next? Super intelligence! But he didn't say it will be safe...
The only intelligence we know of today is not safe, so why should it matter?
@@IshCaudron yeah, I'm just surprised he's throwing in the towel so soon. He should have faith that his company, SAFE SUPERINTELLIGENCE, will get there first.
Is it able to formulate problems as humans do? And what is the leverage that pushes a machine to formulate its own problems?
This is what Ilya saw
❤❤❤❤❤
❤ amazing
If ASI is going to be inherently unpredictable, I suspect its applications will be limited, though possibly life changing and important. It's one thing to have a specialized ASI in a lab getting fusion working or developing new cancer treatments and antibiotics, but putting it in public facing systems is going to scare the hell out of corporate buyers who fear that it will develop unpopular political opinions or speak truth about the company. But it will probably also be used in the back office to help with things like health insurance denial ☠️ and regulatory capture 👺. Oh well, every tool is also a weapon, and it's not like the IRS, FDA, and FTC can't get their own ASI. Unless they get abolished next year ☠️.
what did Ilya see
Why did he say LSTM was wrong?
he basically talked about nothing lol. No real insights.
He straight up said ASI is inevitable. It is not a matter of how or why but a matter of when.
The way those men aged (0:45) …
epic
a lot of water
Basically he has no ideas, and yet claiming we are close to superintelligence is borderline religious.
Rights? 😂😂😂
Sorry guys but this talk is totally empty! It’s just full of grandiose statements and beliefs. Nothing new and nothing truly insightful.
The true story wasn’t this dramatic.
He's wrong about data. Although it's not growing at the same rate as compute and so on, it is going to grow at an accelerated rate. And not just that but the data will capture humans performing economically viable tasks.
So I would say his point only partially stands.
Bro, they trained GPT-4 with all the data on the internet up to 2023; data is the limiting factor
Yes, AI-generated data is growing, but it poisons models. We only have so many humans on earth to produce human data, and AI models consume it at a much higher rate.
Pretty sure Meta and Google are sitting on much more useful data than OpenAI had access to. Search chains and task-specific conversations are the next gen, once datacenters and prep work are done
Man when u speak I get scared
Bro singlehandedly made me regret not taking biology and statistics 🥲😭
I used Grok to find this link. Tried straight YouTube first though, but it was showing his older videos!