SPONSORED BY TUFA AI LABS (home of MindsAI)!
Open research positions to work on ARC - tufalabs.ai/open_positions.html
True intelligence involves planning, learning, and modelling "new concepts" about the world and ideas in general.
For this, pattern recognition is a necessary (but obviously not sufficient) requirement.
Amazing content as always. Glad to see Chollet back on the show!
I think searching is a part of intelligence too.
@@EobardUchihaThawne Search encompasses planning. You need a world model + a search algorithm.
Something that I very often think of when reflecting on this definition of intelligence is how many forms of economically valuable work don’t require dealing with novelty - just pattern recognition and following standard processes. The type of intelligence that Arc asks us to strive for isn’t necessary for AI systems to displace a significant portion of human labor. It’s only needed if we want AI to replace all economically valuable work.
It’s needed to call it AGI
You're completely missing the point. The point is that those tasks don't require intelligence at all.
He's not arguing against AI systems being useful; he's saying don't be fooled into thinking it's more capable than it is simply because it does so well on tasks that don't require it to use actual intelligence (as defined by Chollet).
The type of intelligence he is advocating for is necessary for true conversation and decision making. Not everyone is interested in AI simply to automate some process. There is a lot of room for AI systems to act as advisors who don't share the same memory deficits as humans and can simultaneously consider many more courses of action than a human without losing the context and goal of the one they're advising.
This requires Chollet's version of intelligence.
AI companies have been promising AGI (whatever that means) but we can at least agree that an AGI could easily solve Arc. What you're doing isn't just shifting the goal post, you're trying to convince yourself that it's not even there.
Great interview. Francois has a lot of great novel ideas and can express them clearly; I see why you're a fan!
Yesss finally someone realistically depicting the current state of AI.
AIs are currently really good at pattern recognition.
Pattern recognition is one of our brain's intellectual functions, yes, but it's not the only function involved in making humans intelligent.
Relational reasoning, spatial manipulation, different kinds of memory (working memory, short term, long term), executive functions - these are all interconnected intellectual functions of the human brain. LLMs, for instance, are currently only capable of a subset of relational reasoning (which is pattern recognition) and also have memory - they simply are not at our level yet.
Dude great point.
They're just very sophisticated search engines
This interview was awesome. Thank you
"Intelligence vs Skill"
Very well explained!
This is where I believe Demis Hassabis got it wrong when he said that you can have intelligence without consciousness. I don't think you can.
The smartest LLMs are like the subconscious part of our mind that can learn elaborate skills but that understand nothing. The conscious part of the brain, which is slow and serial (can only focus on one thing at a time) delegates most of the work to these programmed areas, only providing guidance where necessary. When we delegate too much and don't provide sufficient guidance, so we can focus on something else, we often end up executing unwanted actions, like taking a wrong turn in the car, walking into the wrong room or throwing food in the bin then putting the packaging in the fridge. We have to delegate but then guide and monitor occasionally because although our subconscious has the ability to execute actions, it has no idea why it's doing anything. It understands nothing.
To understand requires the awareness we know as consciousness.
Thank you so much for this video. I can't say enough about the value of this interview compared with a thousand others.
The thumbnail got me!!! Look out, Mr Beast. Seriously, that's why I clicked even though I'm a subscriber. Best ever.
I can’t stop seeing adult Harry Potter
💀🗣💀
Ze AI is like ze Voldemort, powerfül but hollöw.
1:51:34 - very good analogy for "intelligent tasks and agents"
1:56:51 - also very nice point about learning
Great intro, and great talk so far!
the way the interviewer was smiling the whole conversation.... me and you both mate
Great interview. Well produced.
Thank you for explaining your thorough research and advanced thinking so clearly!
I agree our brains and the path to AGI is made up of multiple agents and sub-agents working together, each with different expert specialisations, reward maximisation and loss minimisation functions built in.
By using prompt engineering to create the appropriate expert agent (with many years of experience in that particular field and with the appropriate value system, thinking skills and output format), then chaining many agents together to work both hierarchically and sequentially, these collaborations unlock improved cognitive capabilities.
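A minimal sketch of that sequential agent-chaining idea in Python; call_llm, the persona templates, and the chain itself are hypothetical stand-ins for illustration, not any particular framework's API:

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in; swap in a real model call for actual use.
    return f"[model output for: {prompt[:40]}...]"

# Each "expert agent" is just a persona prompt; the output of one feeds the next.
EXPERT_TEMPLATES = [
    "You are a domain expert with 20 years of experience. Break down: {task}",
    "You are a meticulous planner. Turn this breakdown into an ordered plan: {task}",
    "You are a critical reviewer. List weaknesses and fixes for this plan: {task}",
]

def run_chain(task: str) -> str:
    result = task
    for template in EXPERT_TEMPLATES:
        result = call_llm(template.format(task=result))
    return result

print(run_chain("Design a study plan for learning linear algebra"))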
The space of vector functions is functionally complete. That means that in composed pipelines of vector functions, some stages can act as logical functions like AND and OR.
>AND and OR
Boolean algebra.
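A minimal sketch of that functional-completeness point, assuming simple threshold units as the vector functions (illustrative only, not tied to any particular framework):

import numpy as np

def step(x):
    # Heaviside step: the nonlinearity of a basic threshold unit.
    return (x > 0).astype(float)

def AND(a, b):
    return step(a + b - 1.5)   # fires only when both inputs are 1

def OR(a, b):
    return step(a + b - 0.5)   # fires when at least one input is 1

def NOT(a):
    return step(0.5 - a)

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
a, b = inputs[:, 0], inputs[:, 1]
# XOR composed from AND/OR/NOT inside a vector pipeline.
print(AND(OR(a, b), NOT(AND(a, b))))   # [0. 1. 1. 0.]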
One of my favorite episodes ...thanks
amazing show, as always!
This channel has shown me the whole world of AI in a serious fashion - the intersection of logic, reasoning, philosophy, and science....makes the hype-side seem a little silly compared to the incredibly rich content this channel puts out.
🎉Great interview!
This was amazing, thank you!
Thank you
This is beginning to be better understood.
The risk appears to be held by the individual interpretation of intelligence, subject to pattern and anomaly distortions.
It is likely best practice to assume that the agent is unaware of its intelligence, and must be analyzed by measuring patterns and anomalies within its boundary.
This is a challenge confronted over and over, as each agent is of a specific design and is responsible for assessing one another, and so on.
The compute data is necessary for scale; however, it is not the problem confined to proper function resolution.
Meaning: what good is endless scale, pattern, intelligence, pattern, and compute in an ever-outpaced, faulty operating system?
This is good only for idiots to remain believing that they are not.
That is all it's good for, and so fault function/tyranny remains.
My ego took a hit from the title and I clicked
Fluid intelligence vs knowledge re professors entrenched in their beliefs (48:30)
"It depends whether you…believe you already have the answer to the question or you believe you have templates that you can use to get the answer."
Yes that's spot on.
FINALLY!!! 🙌🏾
Lotta bangers lately!
That was a small book length podcast. Epic.
I think there are varying levels of consciousness, all the way from the individual cell up to the power and majesty of the neural networks of our minds. Neurons are the most conscious cells, and together they form the most conscious mass of cells.
❤ to the MindsAI team!
I've been wondering, why the close-up camera all the time? Why no wide shots if they're both in the same room?
My conception of intelligence is sort of similar to the Kaleidoscope model (I think searching through a tree of compositions of known ideas, while pruning the tree through learnt pattern recognition, is sufficient to deal with "novel" stimuli). I can also agree with intelligence being the sample efficiency needed to generalise. But it is also possible that sample efficiency is a product of scale. There is some potential evidence for that (larger LLMs learn faster than smaller LLMs), but that could also be explained by larger LLMs just having better representations (fitting higher-dimensional manifolds).
There is also the question of how much is actually "novel", because there is a chance that you could just "solve" all of science with the currently observed data (everything is in distribution) but most people (including me) might be displeased if that were the case.
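A toy sketch of that tree-of-compositions idea: arithmetic primitives stand in for "known ideas", and a plain error score stands in for the learnt pattern recognizer that prunes the tree (all names and primitives here are made up for illustration):

# Toy primitives standing in for known building blocks.
PRIMS = {
    "inc":    lambda x: x + 1,
    "double": lambda x: x * 2,
    "neg":    lambda x: -x,
}

def score(fn, examples):
    # Stand-in for a learnt scorer: total error on the given examples.
    return sum(abs(fn(x) - y) for x, y in examples)

def search(examples, depth=3, beam=4):
    # Beam search over compositions, pruning branches the scorer dislikes.
    frontier = [((), lambda x: x)]
    for _ in range(depth):
        candidates = []
        for names, fn in frontier:
            for name, prim in PRIMS.items():
                comp = (lambda f, g: (lambda x: g(f(x))))(fn, prim)
                candidates.append((names + (name,), comp))
        candidates.sort(key=lambda c: score(c[1], examples))
        frontier = candidates[:beam]   # prune everything below the beam
        if score(frontier[0][1], examples) == 0:
            return frontier[0][0]
    return None

# Target program is x -> 2 * (x + 1); expected result: ('inc', 'double')
print(search([(1, 4), (3, 8)]))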
I maintain that humans' ability to deal with truly novel situations is limited as well, and that humans will fall back on experience and instincts. Humans may be better at this, but I don't think it is a fundamental difference.
oooo been waiting for this since august
Suspend, suspend... suspend... suspension sound is masterfully done.
The intro looks like the A24 intro lol
Developing curves from direct inverse trade off of extents (inverse vector direction symmetry) (vector extents, constant hypotenuse length), by multiplying synergistically differential change in slope length over time integrals (some integral measure), (integral of (slope1 delta * slope2 delta) = curve) has grounding in scaling dimensions towards deriving a larger scale reference frame factor to scale further with. Also, instead of using hypotenuse constant length, you can use hypotenuse constant slope, for synergistically shrinking vectors (vector direction symmetry component for extra dimensional comparison), and of course the integral of products * linear function. I mean, somewhat in the context of reinforcement, but not really, but more about end products of what you want the system to be like in its end-point composition of weight value relationships; we need to know whether the dog kinda looks like a cat, because that is a symmetry in the data to further scale from into a different relational space * proportionality of vector influence.

Therefore the classifiers and backprop are not great for pass-forward of nuanced relationships (overly quantized parameter behavior; a lattice of proportions, when scaled, renders a smaller volume of adaptability). There are other losses to reconsider as significant losses from scaling high dimensional integration: the gaps will scale, compute-per-performance loss, inefficiency in the parameter summed-product forward pass (output constraint of nuance, loss of constructive interference with dead priors and forward actuation guidance = false actuation symmetry with a working pathway toward goal, inability to derive high dimensional subspaces (sharp gaps between working gradients, flickering problems, hallucination as sharp phase transitions occur), the interdimensional lattice thins out as you scale sum of products, the mean^n_power direction pointer to the actuation subspace (which will have quantized phase shifts that do not have uniformity in navigating that high dimensional space's nuances) doesn't point to the right subspace, when there are actually many more tangents you could be relating per compute).

As you can see, this is actually easier to fix than we think; it's all in the integral of slopes, the foundational measures of an accurate curve within vector spaces, and you arrive at the right subspace of actuation morphology relative to goal. It's like that psychologist dude said: "relevance realization" maximization. You really need that fluidity; the quantization is bad, unless you have maximized the full parameter space, then quantization of context is good (a float range (0 to 1) carries so much nuance, and has so much potential; its differential derivation will allow much more nuance * all outputs). A single float is a lot like a rainbow of possible actuation correction, not just a single dimension direction; it is an interdimensional light cone of possible vector reflections (even possible cascading reverberations with itself, which allow for nuanced and efficient temporal counting, to control data flux cycles based on their reward driven amplification of momentum of synergistic relevant context, to derive high dimensional angular momentum over working actuation pathways towards goals).
This is why intelligence isn't really THE algorithm that does the work; it's the algorithm that finds other algorithms to do the work. It shape-shifts the weights in the direction (actuation symmetry with goal convergence = weight update factor) (prior node output acceleration * pattern detection acceleration * temporal proximity = weight update factor for that node, where the pattern detection is the reward mechanism (vector comparison equality/difference)). As a pattern detection acceleration * detection frequency increase occurs, this increase (= importance value) becomes the weight of influence on shaping nodes to accelerate towards those patterns, as well as an inverse appetite function, which is simply (acquired resource amount counter divided by resource converted into work), decreasing a reward mechanism's transformation acting on node weight. This system is a synergistic vector comparer, optimizing for relevance, keeping or tossing patterns by their ability to turn environmental data into successful actuation towards goal.

Animal intelligence really is better when combining all relevant factors (converging accelerations towards goals). It's a hyper pyramid of proportional or inversely proportional trade-offs that skew the hyper pyramid of right-angle trade-off or scaling factors, or constant relationships by equal proportion of each other / integrated dimensions. You are riding in a nested hyper pyramid of synergistic proportionality factors as a conscious system; you have no idea how great you really are, every single person and lifeform on this planet. We are definitely an interesting feature of this universe, one important for science, one that gives life and direction to science.

We are also computationally bound; we suffer from the scale-induced thinning of working gradients as well, and we correct for it by that relevance realization (THE critical component, for diffusion of actuation capability (uniformly) into the phase transition gaps between gradients when modeling a target high dimensional actuation subspace). Relevance is our handle on reality; intelligence is the sum of pointers to that relevance. Real time high dimensional angular momentum of integrated relevant context driven by self-growing reward motors that leverage the work of (vector accelerations converging on working pathways around interdimensional obstacles in the data environment, optimizing actuations that converge on detection acceleration, on a node-per-node basis, to drive algorithmic morphology).
Scale = Fractal
You never start with the fractal, however; you have to derive it from data-environment-input-induced convergence of prior (accelerations of outputs * reward detection acceleration * temporal proximity) to return a node weight update factor, on a node-per-node basis, based on its output acceleration direction converging on reward detection acceleration. And intermediate reward mechanism growth comes from data input snapshots' input pattern: that pattern is favored as a new reward mechanism influence based on (new pattern detection acceleration) * (pre-existing reward detection acceleration) = importance factor (future weight of influence on nodes with high accelerations that converge with that pattern detection acceleration). Intelligence is the number of synergistic pointers that interrelatedly scale relevant context proportionally to actuating on a working pathway towards goal.
@@NicholasWilliams-uk9xu Thank you for replying. However, what if you can see the fractals first? (I am dyslexic, and pattern recognition seems to be how I perceive these 'phenomena' or systems.)
I've yet to finish viewing the video and will watch it a few more times. Can I ask more questions, please?
Thank you for sharing your time and energy.
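A very loose sketch, purely for illustration, of the multiplicative update factor described a couple of comments up (output acceleration * reward-detection acceleration * temporal proximity); every name and number here is invented:

import numpy as np

def weight_update(output_accel, reward_accel, temporal_proximity, lr=0.01):
    # Per-node factor: the three signals simply multiplied together.
    return lr * output_accel * reward_accel * temporal_proximity

# Toy traces over three time steps; "acceleration" read as a discrete
# second difference of each trace.
outputs = np.array([0.1, 0.3, 0.9])   # a node's output over time
rewards = np.array([0.0, 0.2, 0.8])   # a reward-detection signal over time
output_accel = np.diff(outputs, 2)[-1]
reward_accel = np.diff(rewards, 2)[-1]
print(weight_update(output_accel, reward_accel, temporal_proximity=1.0))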
44:22 "15 yo will be better at skill acquisitions than 10 yo" ,
I have some questions about it, because, neuroscience determines neuroplasticity like the ability of the brain to modify itself and adapt to new behaviors.
And it is demostrated that as younger you are, the more plasticity you have. In other words, a baby or from 0 to 12 yo or something you neuroplasticity is extremely high and therefore as time goes on, your neuroplasticity decays too much. It's not removed completely but is reduced a lot.
So taking in mind this, imagining that the two boys had the same cognition development or like that, the 10yo would acquire skill faster than 15 yo.
Maybe the additional ingredient that francois comment is, you polish your macro-system of intelligence, and that's true. As long as you are improving a part of your body this part becomes better, that's not debatable. but what's more important or has more weight ?
A 10yo with more neuroplasticity but less intelligence polished,
or,
A 15yo with less neuroplasticity with more polished intelligence?
I think it depends on the previous knowledge, the new skill to be acquired and the plasticity. For example, in the case of learning a language, the 10-year-old child could acquire the accent much better thanks to the plasticity in the neural networks that control prosody (muscle movements of the tongue, etc.) but the 15-year-old boy will probably understand more quickly the grammar, advanced vocabulary and other aspects of the language that he can relate to the knowledge he already has and the linguistic and social skills that he has more developed than the other child.
@@CodexPermutatio Interesting point. Maybe the younger you are and the more neuroplasticity you have, the better you learn implicit things, while more explicit things like reasoning tasks may depend more on the previously accumulated knowledge than on neuroplasticity. As Francois said, these building blocks - you reuse them to construct or adapt to the new challenge. So the older one (imagining he comes from the same development process) could acquire reasoning skills better than the younger one, but the younger one could acquire more intrinsic behaviors such as language, patterns, etc.
It all starts with trying something and failing; sometimes, though, the result is good, and the question becomes what pattern led to that result.
System 1... System 2 to system n. Fundamentally I see heroes will understand that fundamentally we are limited (@16:56), as said by many philosophers like JK, OSHO... and many others without AGI research.. 😊😊 Loved those philosophers and those who are fighting science now on gravity...❤
55:43 It's suggested that the ARC test is 100% soluble based on the non-overlap (disjoint set) of two test takers' solutions that Francois evaluated to be incorrect. This conclusion is faulty: three observers (Francois and the two test takers) cannot use their mutual disagreement to prove 100% solubility; rather the reverse, at least a portion of the test is undecidable until a fourth observer can find perfect agreement with one of the three previous observers.
As of this comment, SOTA is 55.5%
Quantum computing is needed for true AGI.
Are artificial neural networks meaningfully operationally-functionally different from human neural networks?
If not, then maybe we are just pattern recognition machines too?
I remember birth, so "babies are not or less conscious" is a misnomer I think, especially since idiots will run with this and think young people are not people.
I was not aware of what was happening, but my memory made sense of that experience in hindsight. I checked several early memories with my parents to make sure they were not false memories and they pretty much weren't.
Consciousness as defined by Francois is not really correct; what he means is awareness after some training of the brain against physical reality. Yet, before birth and in early life, people are already conscious of their own inner world without being aware of "what it all means", not even capable of expressing that question, but capable of memory and of experiencing a moment through sensory data. At early stages everything looks like some random passive movie, but that characteristic will of course change while we learn.
The perception of time is indeed inversely correlated to the amount of data that is abstracted away, but consciousness is again a misnomer, since you can space out and people would say you were not conscious (to them). I think Francois means "abstracted awareness" rather than purely "consciousness", although one can indeed have and express more or less of both regarding some (imaginary) event.
Animals are fully conscious, yet incapable of understanding higher abstractions. On some levels animals are more "aware" than humans, since they can react early to storms and such.
I think anything that can experience (pain) is conscious. Intelligence or being capable of expressing yourself are just insufficient but necessary proxies.
I'd love to know what Chollet thinks about "metaphorical thinking" (Lakoff & co): metaphorical thinking is just as important as abstraction. His own *Kaleidoscope* is such a thing: a conceptual metaphor.
This is the basis of my thesis. Lol. Glad I'm not the only one looking at things this way. It's a hard train to get people to board though.
@@TerrelleStephens Nice! The research in linguistics still repeats Lakoff's ideas. An advance is the Neural Theory of Language. A technical book seems to be coming out in 2025 (with Narayanan). There is also Feldman (worth reading), and in AI, Schmidhuber mentions Metaphors We Live By in a paper about the binding problem. All of them seem to think along the lines of Minsky's little (nice) theories. I think metaphors help creativity, and help us think out of distribution. Best of luck with your thesis!
Isn't this just a reformulation of the concepts underlying Wolfram's computational language?
If intelligence is the ability to handle novelty, is that not just another way of saying the ability to recognize patterns?
Revelations from most to least severe, focusing on implications for AI development and our understanding of intelligence:
1. Most Severe: Current AI Performance Metrics Are Fundamentally Flawed
Timestamp: 00:03:45-00:04:05
Quote: "Performance is measured via exam style benchmarks which are effectively memorization games"
Why Panic-Inducing: This suggests we've been fooling ourselves about AI progress - our primary metrics for "intelligence" are actually just measuring memorization capacity. Years of perceived progress might be illusory.
2. The Scale is All You Need Hypothesis is Wrong
Timestamp: 00:02:34-00:03:00
Quote: "Many people are extrapolating... that there's no limit to how much performance we can get out of these models all we need is to scale up the compute"
Why Concerning: The dominant strategy in AI (just make bigger models) may be fundamentally misguided. This challenges the foundation of many major AI companies' strategies.
3. LLMs Cannot Do True Reasoning
Timestamp: 16:54-16:56
Quote: "Neural networks consistently take pattern recognition shortcuts rather than learning true reasoning"
Why Alarming: Suggests current AI systems, no matter how impressive they seem, are fundamentally incapable of real reasoning - they're just very sophisticated pattern matchers.
4. We're Missing Half of Intelligence
Timestamp: 00:12:17-00:12:28
Quote: "Intelligence is a cognitive mechanism that you use to adapt to novelty to make sense of situations you've never seen before"
Why Troubling: Current AI systems lack this fundamental capability, suggesting we're much further from AGI than many believe.
5. The Deep Learning Limitation
Timestamp: ~16:39-16:54
Quote: "I realized that actually they were fundamentally limited, that they were a recognition engine"
Why Significant: Suggests deep learning itself may be a dead end for achieving true AI, despite being the dominant paradigm.
This transcript is particularly shocking because it systematically dismantles many of the core assumptions driving current AI development and suggests we might be on the wrong path entirely. Chollet's insights, backed by his extensive experience and concrete examples like the theorem-proving work, suggest that the current AI boom might be building on fundamentally limited foundations.
The most panic-inducing aspect is that these aren't speculative concerns - they're observations from someone who has been deeply involved in the field and has seen these limitations firsthand through practical experimentation. It suggests we might need to fundamentally rethink our approach to AI development.
But don't they create programs in the training phase? I guess the point is that it's inefficient, maybe.
21st also the rising
HLM..
With
Str of criticism
Agi of hacking
Int of negatism
..
H - acking
L - anguages
M - odel
So much evolutions
Just showed up
This 21st..
So weird.. 😂😂
Peace out ❤❤❤
Spread love.. 😘
If somebody actually makes AGI or a model that can solve ARC problems submitting it would be really stupid
shiet it's 1 am. I'll never go to bed with this D:
Where do you live? Its like 7pm here
@brandonmorgan8016 I live in Italy lol
The weird thing is I think exactly like this guy
Francois Chollet thinks deeply and has extensive knowledge of AI, but unfortunately, he seems somewhat disconnected from hands-on work with current LLMs, relying more on his knowledge and experience in machine learning and traditional deep learning. Modern LLMs represent a fundamentally different paradigm from traditional machine learning and deep learning approaches - something many AI researchers haven't fully grasped yet.
Another problem Francois Chollet has is that, even though he keeps talking about intelligence, his ideas originate more from a computer science perspective and lack deep understanding from a human cognition perspective - a problem many AI researchers share.
Computer science tends to focus on math and detailed architecture but lacks the whole picture or vision, whereas human cognition and other social sciences such as psychology and neuroscience can inspire a much more effective and simpler model/method for AI intelligence.
"Occam's Razor" - the idea that, given multiple explanations for a phenomenon, the simplest one is usually the best. Einstein's "Everything should be made as simple as possible, but no simpler." Francois Chollet has developed a complex architecture and theory. Actually, it could be much simpler if it incorporated social sciences such as human cognition, psychology and neuroscience.
GUYS, IT'S HAPPENING
ChatGPT, summarize this 3 hour interview for me.
Where’s schmidhuber part 2?
Waiting on him to approve sorry, will release as soon as he does
thanks! Love the show
"Babies are not conscious because they sleep". In all due respect sir, I lucid dream. Consciousness exists when I'm "asleep".
Your idea about babies is not correct.
Babies in the womb can hear music and remember it after birth. They can also be quite active in the womb, at least some of that activity is intentional - coming from a mind that experiences things (I won't go into the details).
I'm pretty certain that whatever creates our consciousness and intelligence can exist independently of external inputs.
Maybe you should look into that.
Thanks
AI zen monk is back.
I completely disagree with this "you're most efficient in acquiring new skills in your early 20s" thing. I wonder if that's more a result of the environment. For instance, academia, where that seems true based on what academics like to say. Just seems like something that definitely hasn't been proven.
Interesting to see a hero/god talking about consciousness, when he himself admitted in this video that he is not aware of consciousness. No one is fully aware of consciousness.. There has to be a dot there. Why explain it through babies, not fully sleeping, not fully conscious..
Simple question: does consciousness need eyes? @2:18:18
Isn't that humans memorizing how to solve something?
The most insightful conversation on AI since Wolfram on Lex Fridman back in May of '23. 👍
Stop this, I need to get some actual work done :')
Intelligence is not scaling, it's the power of the scaling law.... (quite literally the exponent of some performance function derived from the training function....)
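A minimal illustration of that point in code: fit L(N) = a * N^(-alpha) to loss-vs-scale data and read off the exponent; the numbers below are made up for illustration:

import numpy as np

N = np.array([1e6, 1e7, 1e8, 1e9])      # scale (params/compute), illustrative
L = np.array([4.0, 3.2, 2.56, 2.05])    # loss at each scale, illustrative
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha = -slope                          # the exponent of the power law
print(f"alpha = {alpha:.3f}")           # the quantity the comment points at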
If you give the average human a problem significantly different from their training data, he or she will fail.
First comment. please like :-)
Francois!!!
It is interesting how computer professionals like Chollet never stop for a moment to consider that consciousness has anything to do with the physics or chemistry of bodies and brains, or even the universe itself. No, it is all some abstract and ethereal manipulation of information.
I'm all but certain that they have considered it, and then rejected it.
@@deadeaded To their detriment, which is why are making no empirical progress.
@@christianpadilla4336 Not me, but many scientists like Michael Levin, Johnjoe McFadden, and Colin Hales are working on more concrete concepts of consciousness.
Lmao "computer professional"
Good bot
2:17 He's dead wrong about us not having consciousness in the womb. I can't say when it starts, but there's no doubt in my mind that we all have it shortly after our minds have formed enough to 'house' it.
Is this synonymous with gamers that can complete no damage runs using their intelligence and pattern recognition?
Why do I have the sneaking suspicion that if a new model were to solve ARC problems, then we would move the goalpost for intelligence once again?
I also have that
Yeah, got the same feeling. Intelligent humans can solve ARC-AGI at 98%, so it only makes sense that solving ARC is at least necessary (but maybe not sufficient) for advances towards AGI.