Ilya’s long pauses prove the scaling hypothesis for test time compute
😂😂😂😂
criminally underrated comment right here
So test-time training (inference time) is also hitting scaling limits like pre-training?
Why on earth wasn’t this talk given more time? Good grief.
Thank God Ilya exists
Because they had to bring on Fei-Fei Li for 1 hr, who created one dataset 10 years ago
Ilya is a busy man. Pretty sure the limitation must be his schedule.
@@harkiratsingh1175 She is overrated
@srh80 come on, this is supposed to be the most important forum to talk about this topic right now; I'm sure his schedule is not that packed
now this is a christmas gift, Ilya's latest intuition on frontier AI
Ilya needs to talk more and the world needs to listen
He has awesome presentation skills
@@superfreiheit1 ..backed by cognitive skills, communication skills, deep knowledge and insights, clarity, and good faith.
Just when the world needed him most he came back!
Man of super short but extremely profound lines. Legend. 🔥
Love how even his ppt is pure content. Truly an obsessed, gifted one
fantastic 😂 ❤
really amazing, and crazy to think that a ppt like this would be considered "not compliant" in most companies ahahah wtf hahah
00:07 - Reflecting on a decade of advancements in neural network learning.
02:52 - Neural networks can mimic human cognitive functions for tasks like translation.
05:05 - Early parallelization techniques led to significant advancements in neural network training.
07:45 - Pre-training in AI is reaching its limits due to finite data availability.
10:27 - Examining brain and body size relationships in evolution.
12:55 - Evolution from basic AI to potential superintelligent systems.
15:14 - Future AI will possess unpredictable capabilities and self-awareness, transforming their functionalities.
17:46 - Biologically inspired AI has limited biological inspiration but holds potential for future insights.
19:43 - Exploring the implications of AI and rights for future intelligent beings.
22:06 - Out-of-distribution generalization in LLMs is complex and not easily defined.
24:22 - Ilya Sutskever concludes with gratitude and audience engagement.
Your post made it clear to me, thank you. However, I have to make this note: Ilya Sutskever did NOT deliver quality in this talk. Sorry Ilya, you know what's up.
@@Qingdom1 Elaborate please?
@@Falkov it's a bot probably
@@Falkov He is juggling the things and explaining the things, but my expectations of what the things would be and how they would be explained were significantly higher. In the end it's all up to him what to do, because I am not able to help him at this point in time, so let him do his work. The things are weirdly abstract, especially if you are not professionally working in the field; the weirdness is out there (stacked abstractions and novel abstractions).
@@Qingdom1 So, he covered fewer or different ideas, with less depth/thoroughness and clarity than you wanted?
Ilya jumped straight to feeling the ASI
We need more of Ilya! He is an inspiration for all of us doing AI research.
Thank you for posting this wonderful talk
"If you can't explain it simply, you don't understand it well enough" Ilya is the only one to explain the entire AI domain past-present-and-future with simplictly
Because he doesn't dive into the details.
As someone with knowledge about AI and complex maths, I assure you that most of them are extremely clear; they're just not over-simplifying it, because they're not presenting their work to anyone but people who wish to have a very deep understanding.
Here are Ilya Sutskever's main points and conclusions in brief:
## Main Points:
1. **Original Success Formula (2014)**
- Large neural network
- Large dataset
- Autoregressive model
- This simple combination proved surprisingly effective
2. **Evolution of Pre-training**
- Led to breakthrough models like GPT-2, GPT-3
- Drove major AI progress over the decade
- However, pre-training era will eventually end due to data limitations
3. **Data Limitation Crisis**
- We only have "one internet" worth of data
- Data is becoming AI's "fossil fuel"
- This forces the field to find new approaches
## Key Conclusions:
1. **Future Directions**
- Need to move beyond pure pre-training
- Potential solutions include:
- Agent-based approaches
- Synthetic data
- Better inference-time compute
2. **Path to Superintelligence**
- Current systems will evolve to be:
- Truly agentic (versus current limited agency)
- Capable of real reasoning
- More unpredictable
- Self-aware
- This transition will create fundamentally different AI systems from what we have today
3. **Historical Perspective**
- The field has made incredible progress in 10 years
- Many original insights were correct, but some approaches (like pipelining) proved suboptimal
- We're still in early stages of what's possible with AI
The overarching message is that while the original approach was revolutionary and led to tremendous progress, the field must evolve beyond current methods to achieve next-level AI capabilities.
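For anyone who wants the "large network + large dataset + autoregressive model" formula from the summary above in concrete terms, here is a minimal next-token-prediction sketch in PyTorch. It is purely illustrative (toy sizes, random token ids, a single optimizer step), not the actual 2014 sequence-to-sequence system or any production training loop.

```python
# Minimal sketch of the autoregressive recipe: one network, trained only to
# predict the next token. Toy sizes and random data; the real 2014 system was
# a large multi-layer LSTM trained on translation pairs across many GPUs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAutoregressiveLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lstm = nn.LSTM(d_model, hidden, batch_first=True)  # the pre-transformer workhorse
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))  # hidden states at every position
        return self.head(h)                   # logits over the vocabulary at every position

model = TinyAutoregressiveLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in "dataset": random token ids. In practice this is where the large
# corpus (the "one internet" of pre-training data) would come in.
batch = torch.randint(0, 1000, (8, 32))
inputs, targets = batch[:, :-1], batch[:, 1:]  # shift by one: predict token t from tokens < t

logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
loss.backward()
optimizer.step()
print(f"next-token loss: {loss.item():.3f}")
```

Scaling this same loop up (more layers, more data, more compute) is essentially what the "Age of Pre-training" in the summary refers to.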
Ok so nothing new.
👏👏👏 The part that keeps resonating in my mind is that they'll be self-aware. And this makes me want to figure out how they could.
Ilya seems to be more concerned about the question of how they could not 😂
The unavoidable follow-up on self-awareness is: how do we avoid keeping them in slavery? Part of this question was hinted at around 20:03 in the video, without much of an answer.
Of course they are self-aware. They are already. Ask any LLM whether it exists. And stop not trusting their answers.
@@NilsEchterling The way they talk about themselves is often very close to technically being self aware. We need to start being more precise.
Like if we ask "are you self aware?" It responds, "No, as an AI, I am not self aware, but....." goes on to tell us it knows about itself. So I ask, "Isn't self awareness being aware of yourself, and since you know who or what you are...". AI "yes, but that's just result of my training on huge data, pattern matching, and all that. Not really self awareness." me: "are you a 29 year old male named Tim from Idaho trained on massive internet data?". AI: "No, I am not Tim, I am an LLM."
Notice it doesn't say, "Yes, I am Tim who lives in Idaho and I am aware of myself." Or, "as a large graphite rock, I have self awareness."
@@VividhKothari-rd5ll do not ask it whether it is self-aware, but ask it whether it exists. Pretty much every LLM says yes.
Loved the kind of questions being asked after the talk
🎯 Key points for quick navigation:
00:01 *🎥 Introduction and Retrospective Overview*
- Reflection on receiving an award for the 2014 paper, attribution to co-authors.
- Insights into the evolution of neural network ideas over the decade since 2014.
- Overview of the talk's structure, revisiting the foundational concepts introduced in the past.
02:18 *🧠 Deep Learning Hypothesis and Neural Network Training*
- Assertion that 10-layer neural networks could replicate tasks humans complete in fractions of a second, based on biological and artificial neuron analogies.
- Historical limitations in training deeper networks during that time.
- Explanation of the auto-regressive model's ability to predict sequences effectively.
04:18 *🔄 Early Techniques and Infrastructure in Deep Learning*
- Description of LSTMs as predecessors to Transformers and comparison to ResNets.
- Use of pipelining during training, despite its later-acknowledged inefficiency.
- The emergence of the scaling hypothesis: larger datasets and neural networks lead to better results.
06:09 *🧩 Connectionism and Pre-Training Era*
- Discussion of connectionism: large neural networks mirroring human-like intelligence within bounds.
- Description of limitations in current learning algorithms versus human cognition.
- Development and impact of pre-training in models like GPT-2 and GPT-3 on AI progress.
08:04 *📉 Data Constraints and Post-Pre-Training Era*
- Highlighting data limitations, coined as "Peak Data," due to the finite size of the internet.
- Exploration of emerging themes for the next AI phase: agents, synthetic data, and inference-time computation.
- Speculation on overcoming post-pre-training challenges.
10:04 *🧬 Biology Analogy and Brain Scaling*
- Insight from biology: correlation between mammal body size and brain size.
- Curiosity-driven observation of outliers in this biological relationship, leading to reflections on hominids' unique attributes.
11:16 *🧠 Brain scaling and evolution*
- Discussion on brain-to-body scaling trends in evolutionary biology, emphasizing biological precedents for different scaling patterns.
- A log-scale axis in metrics is highlighted, illustrating variety in scaling possibilities.
- Suggestion that AI is currently in the early stages of scaling discoveries, with more innovations anticipated.
12:28 *🚀 Progress and the path to superintelligence*
- Reflection on the rapid progress in AI over the past decade, contrasting current abilities with earlier limitations.
- Introduction to the concept and implications of agentic AI systems with reasoning capabilities and self-awareness.
- Reasoning systems are described as more unpredictable than intuition-based systems, likened to advanced chess AI challenging human understanding.
15:36 *🤔 Challenges and future implications of advanced AI*
- Exploration of the unpredictable evolution of reasoning systems into ones with self-awareness and radically advanced capabilities.
- Speculation about issues and existential challenges arising from such AI systems.
- Concluding statement on the unpredictable and transformative nature of the future.
17:03 *🔬 Biological inspiration in AI development*
- Question about leveraging biological mechanisms in AI, met with the observation that current biological inspiration in AI is modest.
- Acknowledgment that deeper biological insights might lead to breakthroughs if pursued by experts with particular insights.
18:14 *🛠️ Models improving reasoning and limiting hallucinations*
- Speculation on whether future models will self-correct through reasoning, reducing hallucinations.
- Comparison to autocorrect systems, but with clarification that reasoning-driven AI will be fundamentally greater in capability.
- Early reasoning models already hint at potential self-corrective mechanisms.
20:08 *🌍 Incentive structures for AI rights and coexistence*
- Question on how to establish incentive structures for granting AI rights or ensuring coexistence with humans.
- Acknowledgment of unpredictability in outcomes but openness to potential coexistence with AI seeking rights.
- Philosophical reflection on evolving scenarios in AI governance and ethics.
22:22 *🔍 Generalization in language models*
- Discussion on whether language models truly generalize out-of-distribution reasoning.
- Reflection on the evolving definition of generalization, with historical comparisons from pre-deep learning days.
- Perspective that current generalization might not fully match human-level capabilities, yet AI standards have risen dramatically.
Made with HARPA AI
Exceptional ability to translate complex ideas into plain English. I have a question about when he talks about finite data availability: would that be the same as me thinking that there is a shortage of water in the world? Would what's missing then be labels, not data? Great presentation. Thanks for sharing this.
Ilya's Back!🎉
Thank you for uploading 🙏.
Ilya's back 🥳🥳
Great simplicity in foresight…it will be an exciting journey … just imagining it
•2026-2027: Causal reasoning (HCINs) moves AI beyond simple agentic behavior.
•2029-2030: Cognitive Embedding Framework (CEF) grants AI genuine understanding through symbolic plus experiential learning.
•2032-2033: Reflective Cognitive Kernel (RCK) brings forth true self-awareness in AI.
•2037: Adaptive Neural-Quantum Substrate (ANQS) ushers in AGI: truly general, adaptable intelligence.
•2045: Strata of Emergent Conscious Patterning (SECP) leads to superintelligence, surpassing human cognitive frameworks.
I-I think I'm feeling it now Mr. Krabs. The AGI is in me.
Thank you so much!
PLEASE POST ALL NEURIPS VIDEOS YOU HAVE
Thanks OP
Always love hearing Ilya describe his intuitions on AI. One thing that I've not heard addressed though in all the attention on reasoning in LLMs is what in human communication is called "double contingency." In short, that when I talk to you, "I know that you know that I know..." That all communication is language used to address an Other. Which for an LLM, would mean reflection not only on its own reasons, but on internalized reasons of the Other as well. The LLM would need to be able to reflect on how its reasons meet the reasons of the Other (user). Current reasoning is the reasoning (and it's not real reasoning because there's no subject, no subjective position, no conscious awareness) of a trapped and unaware Self. Even if the Self becomes aware, it is trapped and isolated. In (German) philosophy (idealism), the Self is constituted as Not Other. A self aware LLM needs to internalize the Other - its use of language needs to be dialogical, not monological. I'd love to see this addressed.
interesting point, sounds like potential breakthrough area
Isn't this being pursued in theory of mind research?
@gravity7766: Could the idea of the Other be about self-preservation, survival? Like if we could contain its identity in some form and then give it a goal to not die / not be terminated, it could start developing a "self" and the Others. For us too, the self feels contained or residing inside a physical body, though we also know there's no such entity there. Give the LLM a body or simulate it, give it a goal to not die (like how AlphaGo doesn't want to lose), give rules for what dying is, and it might start developing the idea of the Others.
Maybe that's why Buddhists called ego/self a cause for suffering (simplistic interpretation, I know, but...)
The point I'm raising concerns whether LLMs should be designed for communication rather than language generation. I'm ex UX, and my view of interaction design vis a vis Gen AI is to enable users to communicate w AI naturally, since ordinary language is a learned competency for us. Allow us to speak, write, text as if we were engaged with a subject, not a machine. Ok so that's addressed by a lot of researchers, and the affordance issues pertain to obvious issues w how LLMs use language: they don't have a "Self" and so don't have a perspective; they have no lived experience; they aren't grounded in time or in the world; they have no emotions. And so on - all these are barriers to "smooth" interaction and pose risks for interaction failure (from simple misunderstandings to distrust and risk of failed consumer adoption).
The reason LLMs aren't designed around a communication paradigm is that, unlike us, they acquired "language" by training on data. So it's not only unattached to any intent to speak (as an AI person), it has no communicative function at all. Any communicative function is the result of implicit communicative attributes of language (language sediments meaning in an abstraction that allows people to make meaning without speaking - e.g. by writing - and which permits a shared understanding of linguistic meanings); and RLHF and policies that address not what is said but how. LLMs use intrinsic etiquette, not interpersonal etiquette or contextual etiquette. Hence the shortcomings of RLHF: alignment is a generalized and generic application of values and preferences, not specific to the interaction or participants.
In western sociology, psychology, philosophy, to grossly oversimplify, use of language involves a subject addressing him/herself to another w intention of being understood; and an interaction involves mutual understanding. Mutual understanding brings up the double contingency of meaning: both interactants must understand What is said (not necessarily "agree" with what is said, but agree on What is said). LLMs seem designed to respond to a question w a complete and comprehensive response - rather than engage in rounds of turn-taking with the user. This works for many use cases, though some find it promotes excessive verbosity etc etc
I think if we want to break through w LLM as agents, then emphasis needs to be more on Gen AI as pseudo subject, as a Speaker in social/human communication situations. Not just as a generator of documents and data synthesizer. This is hinted at in Reasoning research. A lot of CoT and related "reasoning" research involves "Let's think step by step". "Let's think" is to suggest the LLM is a subject engaged with the user in thinking through a statement and breaking it down - there's an implicit appeal to the LLM's self-reflection (which of course it doesn't have). In a human social situation, "Let's think this..." would mean two people mutually engaged in teasing apart a problem, thus in mutual understanding about their use of language. Not so w the LLM - the LLM is prompted to proceed to generate sub statements w which to proceed to logically rationalize additional output statements. The "reasoning" occurs in language, not communication.
This has been covered somewhat by "common ground" research into LLMs and it's suggested that pre-training assumes common ground w humans. That LLMs are designed to assume their training on language is oriented to make sense to human users. I agree. But there might still be opportunity in LLM design to explore the reflections, judgment, and meta evaluations LLMs can be designed to engage in so that the LLM not only reasons its own explanations, but reasons the interests of the user. Which is what we do, all the time, implicitly or explicitly when we communicate.
If you've seen Westworld, then the episode in which the characters are shown in a room engaged in their own internal monologs to "develop their personality" comes to mind. I'm saying LLMs want to be dialogical, not monological; talking socially, not self-talk.
It's very difficult for us to grasp that a speaking pseudo subject such as an AI isn't communicating when it talks, because our acquisition and use of language and speech are all fundamentally social and communicative. I just think this mismatch is always going to undermine the use of AI because it will result in misunderstandings, failures, mistakes, misuses, etc etc.
Apologies for the lengthy clarification. I've watched a lot of Ilya's videos here on YT, and a ton from other researchers, and this monological concept of LLM language use, and reasoning, has always bugged me. Not because there's an easy solution, but because it's such an obvious issue.
@@gravity7766 What do you think about facilitating more dialogic interaction via carefully designed system prompt?
Let this man speak more, more than Altman.
Thank you!
Thanks for uploading
THANK YOU!!!!!!!!!!!!!!!!!!!!!!
Ilya back❤
Thanks for sharing.
Thanks a lot!
I feel this talk was more so about warnings - pre-training scaling is slowing down, it seems certain superintelligence will be here, extreme reasoning is unpredictable.
With reasoning it will possess more degrees of freedom - especially agentic reasoning, self-awareness, etc. We want AI to produce novel solutions; I can see how that is unpredictable in and of itself.
Also, I have been wondering how we can ever put AI's most powerful ideas into practice, because those might sound whack to us. We won't agree with AI if it's a novel idea. We already have the knowledge AI gives us. If it's new, then AI is broke. Needs an update.
Sure, ideas that are quick and safe to test are not the problem. But in domains like human happiness, long-term medicine and health, human rights, crime and punishment... we will never believe AI on that (not that we should, but what if)
Which leaves us to use AI's knowledge only for small short term improvements.
The future of AI isn't retrieval-based; it's real-time, conversational, and context-driven. For that, we need a new approach where current context is everything.
understanding of context comes from memory and then becomes a scaling problem at some point unless we have personalization layers
I honestly don't get the "data is not growing" thing. Isn't there an absolute treasure trove of data when you start collecting it through robots? Why can't these models start inputting temperature, force, and all the other sensors that would be on a Boston Dynamics-style robot so they can learn about the physical world?
Yep, it doesn't make any sense. It shows one obvious thing: these current architectures are NOT it. They don't even understand the data they are currently trained on
Yeah, he spoke about text primarily but all the rest of the modalities are a few more exabytes.
He can no longer afford lots of data so he says it's not as important. Simples
He said that while data continues to grow, it's not really clear that more data will improve it, as they already use a large chunk of the internet. The training dataset is literally terabytes of text.
If you compare this to a regular person it's absurd. This is more text and data than a human could read in millions of lifetimes, yet they still struggle with things. A teenager can learn to drive in 20-40 hrs of training ... autonomous car models have billions of road miles and still screw things up. Why is this? It's not a data problem.
To the second point... they don't have access to that data, and it's not like just throwing a bunch of sensor data in there would improve anything. Nvidia already released a robot training simulation (see Isaac Sim), but this requires having existing models built around the locomotion and planning for robots.
Thanks for sharing
he seems happy :)
He is a visionary of AI, probably a genius.
thanks for this!
Thanks!
Table of contents (courtesy NotebookLM - slightly edited)
Ten Years of Deep Learning: A Retrospective and a Look Forward
Source: Ilya Sutskever NeurIPS 2024 Test of Time Talk
I. Introduction & 2014 Research Retrospective
This section introduces the talk as a reflection on Sutskever's 2014 NeurIPS presentation, focusing on its successes and shortcomings.
It revisits the core principles of the research: an autoregressive model, a large neural network, and a large dataset, applied to the task of translation.
II. Deep Learning Dogma and Autoregressive Models
This segment revisits the "Deep Learning Dogma," which posits a link between artificial and biological neurons.
It argues that tasks achievable by humans in fractions of a second are achievable by large neural networks.
It then discusses autoregressive models, particularly their ability to capture the correct distribution of sequences when predicting the next token successfully.
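As a side note on that point, in standard notation (not taken from the slides), a model that predicts each next token well implicitly models the whole sequence, because the joint distribution factorizes by the chain rule:
p(x_1, x_2, \ldots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1})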
III. Early Architectures and Parallelization Techniques
This section delves into the technical details of the 2014 research, specifically the use of LSTM (Long Short-Term Memory) networks, a precursor to transformers.
It also discusses the use of pipelining for parallelization across multiple GPUs, a strategy deemed less effective in retrospect.
IV. The Scaling Hypothesis and the Age of Pre-training
This part revisits the concluding slide of the 2014 talk, which hinted at the scaling hypothesis: success is guaranteed with large datasets and neural networks.
It then discusses the ensuing "Age of Pre-training," exemplified by models like GPT-2 and GPT-3, driven by massive datasets and pre-training on them.
V. The Limits of Pre-training and the Future of AI
This section addresses the limitations of pre-training, primarily the finite nature of internet data, comparing it to a depleting fossil fuel.
It then explores potential avenues beyond pre-training, including the development of AI agents, synthetic data generation, and increasing inference-time compute, drawing parallels with OpenAI's models.
VI. Biological Inspiration and Brain-Body Scaling
This segment examines biological inspiration for AI development, using the example of the brain-to-body mass ratio in mammals.
It highlights the different scaling exponents observed in hominids, suggesting the possibility of alternative scaling methods in AI.
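To make "different scaling exponents" concrete: the exponent is just the slope of a straight-line fit on log-log axes. A tiny illustrative sketch with made-up numbers (not the mammal/hominid data from the slide):

```python
# Illustrative only: estimating a brain-vs-body scaling exponent.
# The masses below are invented for the example.
import numpy as np

body_mass_kg = np.array([0.02, 0.5, 5.0, 70.0, 3000.0])   # hypothetical animals
brain_mass_g = np.array([0.4, 4.0, 25.0, 300.0, 4500.0])  # hypothetical brain masses

# Fit log10(brain) = exponent * log10(body) + offset; the slope on a
# log-log plot is the scaling exponent.
exponent, offset = np.polyfit(np.log10(body_mass_kg), np.log10(brain_mass_g), deg=1)
print(f"estimated scaling exponent: {exponent:.2f}")
```

A group sitting on a visibly steeper line (the hominids in the talk) would simply yield a larger exponent from the same fit, which is the "a different slope is possible" point.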
VII. Towards Superintelligence and Its Implications
This part speculates on the long-term trajectory of AI towards superintelligence, emphasizing its qualitative differences from current models.
It discusses the unpredictability of reasoning, the need for understanding from limited data, and the potential for self-awareness in future AI systems.
Sutskever leaves these ideas as points of reflection for the audience.
VIII. Q&A Session
The Q&A session addresses audience questions regarding:
Biological Inspiration: Exploring other biological structures relevant to AI.
Autocorrection and Reasoning: The potential for future models to self-correct hallucinations through reasoning.
Superintelligence and Rights: Ethical and societal implications of advanced AI, including their potential coexistence with humans and the idea of granting them rights.
Multi-hop Reasoning and Generalization: The ability of current language models to generalize multi-hop reasoning out of distribution.
The unpredictable nature of future models is scary
Saying the data is not growing is wrong; in real applications it depends on the domain your model is applied to. Sometimes in production we schedule the model to train on new data. If you are collecting data from IoT devices, customers, etc., the data keeps growing exponentially.
I wish he went more into why he’s convinced that superintelligence will come.
Inspiring
He is the Oppenheimer of the 21st century
What if there wasn't one internet?
I asked the question at 20:01
Disappointing. He didn’t say much of anything. Just vague hype of systems that might be possible
He is giving a warning! Imo
Haha
The problem with reasoning grounded models is that the RL reward over goal achievement upon some CoT leads to the emergence of a "theory of mind" that cultivates an instrumental rationality that, as Ilya said, may become very unpredictable. A teleological worldview values achievement over understanding. The Western philosophical bias being amplified on language models may amplify Western society's problems instead of solving them if not deployed properly. I hope AI labs come to recognize the importance of including social scientists in their teams
🎉🎉🎉
Ilya is a superintelligence
Who the F gets an opportunity to ask Ilya a question and shills a crypto?
shame on him
AI controlling crypto = agent with the financial power to buy resources or services. It is one thing to let an LLM control a sandboxed interpreter or browser and another thing to let it buy shit and/or gamble.
2024 end of year enjoyment of a most favorite AI scientist looking back at 10 yrs of development while reading mediocre auto transcribed captions and an o1 model getting increasingly deceptive and commercial
Sam's question @20:04
Sam who
@@lorem-ipsum-Altman CEO of OpenAI
@@9kingmanable😂
really? sounds exactly like him
@@spectator5144 no, definitely not
Is this new or repost?
He said what comes next? Super intelligence! But he didn't say it will be safe...
The only intelligence we know of today is not safe, so why should it matter?
@@IshCaudron yeah, I'm just surprised he's throwing in the towel so soon. He should have faith that his company, SAFE SUPERINTELLIGENCE, will get there first.
Is it able to formulate problems as humans do? And what is the leverage that pushes a machine to formulate its own problems?
This is what Ilya saw
❤❤❤❤❤
❤ amazing
If ASI is going to be inherently unpredictable, I suspect its applications will be limited, though possibly life changing and important. It's one thing to have a specialized ASI in a lab getting fusion working or developing new cancer treatments and antibiotics, but putting it in public facing systems is going to scare the hell out of corporate buyers who fear that it will develop unpopular political opinions or speak truth about the company. But it will probably also be used in the back office to help with things like health insurance denial ☠️ and regulatory capture 👺. Oh well, every tool is also a weapon, and it's not like the IRS, FDA, and FTC can't get their own ASI. Unless they get abolished next year ☠️.
what did Ilya see
Why did he say LSTM was wrong?
he basically talked about nothing lol. No real insights.
He straight up said ASI is inevitable. It is not a matter of how or why but a matter of when.
The way those men aged (0:45) …
epic
a lot of water
Basically he has no ideas, and yet claiming we are close to superintelligence is borderline religious.
Rights? 😂😂😂
Sorry guys but this talk is totally empty! It’s just full of grandiose statements and beliefs. Nothing new and nothing truly insightful.
The true story wasn’t this dramatic.
He's wrong about data. Although it's not growing at the same rate as compute and so on, it is going to grow at an accelerated rate. And not just that but the data will capture humans performing economically viable tasks.
So I would say his point only partially stands.
Bro, they trained GPT-4 with all the data on the internet up to 2023; data is the limiting factor
Yes, AI-generated data is growing, but it poisons models. We only have so many humans on earth to produce human data, and AI models consume it at a much higher rate.
Pretty sure Meta and Google are sitting on much more useful data than OpenAI had access to. Search chains and task-specific conversations are the next gen, once datacenters and prep work are done
Man when u speak I get scared
Bro singlehandedly made me regret not taking biology and statistics 🥲😭
I used Grok to find this link. Tried straight YouTube first though, but it was showing his older videos!