One piece of advice to the host: you need to give your guest space. You are not a salesman, or a missionary. Challenging them does not mean repeating the same argument over and over again. It was suffocating to listen to your challenges; if it were not for the calm and patient demeanor of Chollet, it would have been impossible to watch. We were not able to listen to Chollet expand upon his ideas because the host just reset the clock to zero by repeating the same "but memorization is intelligence" argument. It should be about your guest, not about showing the supremacy of your ideology or beliefs. If your guest is wrong, you can prove them wrong by laying out their arguments, asking questions as they expand upon them, and then showing where they are inconsistent. Not by repeating the same thing over and over and over again.
One piece of advice to you: you can make this point better without insulting Dwarkesh, who is a young man still learning. Perhaps you should try hosting a podcast and see if you do better. Want my guess? You would do much worse than you think.
@@Limitless1717 This is the most mundane criticism of a criticism. Dwarkesh himself is doing a podcast on topics he is not a specialist in, and he is openly criticizing and challenging the views of a specialist on that topic here. So maybe he should work on AGI before challenging François here, if he were to take your advice seriously (though he should try to educate himself on topics in any case).

And I am not doing podcasts, but I have taught many, many classes and won a lot of teaching awards. Not the same, but similar when it comes to expanding on topics. When I teach a concept, I don't just attack it in the first sentence. I explain it, allow it to ripen, put different light on different aspects of the topic. I don't try to destroy the whole concept in the first sentence. So my advice doesn't come out of nowhere.

And if he puts himself in the public spotlight, my criticism is actually among the most innocent things thrown in his direction. If he takes it into account, he can improve. I am mostly criticizing how he is doing some things and even suggesting what he can do better. It is weird that you take offense at that. Anyway, it is up to him to do what he wants, but I won't watch him again anytime soon. As it stands, this is a really bad way of arguing with anyone even in private, let alone on a podcast. When someone interrupts me the way he does and doesn't know how to argue, I generally just don't bother.
@@kjoseph7766 They were clearly putting their own opinions forward, which didn't align. I would say that put pressure on each of them to defend their beliefs. The one dude got super panicked as the French guy kept putting forward clearly articulated ideas and rebuttals.
After listening to this interview and reflecting on it, I was actually thinking that "people" like Destiny (not Dwarkesh, who is legitimately smart in my opinion) talk extremely fast to compensate for shallow, ill-formed thoughts.
@@ahpacific Dwarkesh could help himself out a lot if he slowed down. Simple things like rephrasing what the 'french guy' was saying in a way that Francois would agree with would also help tremendously. There is a fundamental difference in epistemology between these two: Francois is emphasising true understanding, while Dwarkesh seems to imply that gross large-scale memorisation leads to understanding, which I don't think Francois would agree with.
@@ahpacific I am pretty sure Destiny is very transparent about when his thoughts are shallow or not. Notice when he formulates things as open questions, or the adjectives and intonations he uses to suggest the difference between perspective and fact. You can make a false statement and still create a good conversation like that. People like that are fun to talk to, as opposed to people who will only say something if they are fully certain.
@@ClaudioMartella I may have commented too quickly; it seems that later on in the video I got the impression Dwarkesh was playing devil's advocate. Not sure...
@@wiczus6102 Right, but I couldn't tell if he meant just in that shorter exchange or if the whole interview was him taking the opposite side of the debate.
@@ModernCentrist That's the impression you get from content focused on hype merchants, alarmists, and those with a vested interest in overinflating the value of current methods. Academia is full of skeptics/realists.
Dwarkesh dismissed a few completely valid answers to try and steer the answer toward his preconceived idea of LLMs. I didn't like that. The dude is smart; let him finish and actually take his answer on board before asking another question.
He said he was playing devil’s advocate, calm down. He does that with most of his guests. It generally makes for a more informative and engaging interview than simply taking everything your interview subject says at face value.
@@therainman7777 Dwarkesh said himself that they were going in circles. I think this was mostly due to Dwarkesh not really thinking about Chollet's responses in the moment. LLM hype causes brain rot in smart people too.
It is pretty clear Dwarkesh has a lot of influence from the local SF AI folks. Watch his interview with Leopold. His command of the subject is admirable, and he smartly relies on the researchers in his circle to inform his understanding. Many notable people maintain quite strongly it's only a matter of scaling, and I thought he thoroughly went through these types of arguments. It was a valuable thing to do. What is an example of something Francois wasn't able to eventually effectively articulate because he was cut off?
Really glad to see people like Chollet are willing to say the emperor has no clothes. I'd recommend Subbarao Kambhampati's talks as well. He goes into some theories about _why_ people are being fooled into thinking that LLMs can reason.
@@CortezBumf Now I see why there are so many here simping for a fraud like Chollet. Basically these people read "Deep learning with python" and thought Chollet was the frontier person in AI. It's hilarious and ironic. Chollet has made no contribution to frontier AI. He's nowhere near Ilya, Schulman, Hassabis and others who've been interviewed by Dwarkesh. He's just parroting LeCun's viewpoint mixing in his own ideas about general intelligence that are completely unverified.
@@CortezBumf You do know what Keras is, right? It's a frontend to libraries like PyTorch, Theano or Tensorflow that do the actual heavy lifting. It's basically syntactic sugar for the masses who couldn't use the somewhat more complex libraries. Now that their interfaces have been simplified, Keras is redundant.
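For readers who don't know the library being referred to, here is a minimal sketch of the Keras Sequential API (assuming a TensorFlow backend and toy random data, purely for illustration of the "thin frontend" claim): a few lines define, compile, and fit a model while the backend does the numerical heavy lifting.

```python
# Minimal Keras sketch (assumes TensorFlow is installed; toy random data).
import numpy as np
from tensorflow import keras

# Define a small feed-forward model declaratively.
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])

# Compilation wires up the optimizer and loss provided by the backend.
model.compile(optimizer="adam", loss="mse")

# Fit on dummy data; gradients and kernels are handled by the backend.
x, y = np.random.rand(256, 10), np.random.rand(256, 1)
model.fit(x, y, epochs=2, batch_size=32, verbose=0)
```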
Francois Chollet is an amazing guy. The best thing is he, like all the LLM guys, also wants to work toward AGI! He just doesn't think the current LLM paradigm will get us there. I'm really excited to see where this goes because he's challenging the current option space in exactly the right way
This interview actually got me believing LLMs might indeed get us there. The guest seems to believe in a form of intelligence that he idolizes but we haven't really seen. Dwarkesh was spot on that no scientist zero-shots their ideas.
He and the others trying to achieve AGI are exactly what will get us eradicated. I agree with Chollet, but at some point brute-force memorization is all you need to solve any task; you just need to refit the training data.
@@JimStanfield-zo2pz If experience and memorization were enough for AGI, how did people create things they had not seen before? How did Mozart create his music? How did we create skyscrapers? How did we go to the moon? How did we discover relativity and quantum physics? Only someone who has lived his life like a parrot and never created, or even attempted to create, anything would say this.
@@KP-fy5bf That's a fair point actually. If one claims that LLMs and that approach are sufficient for solving our problems and just need more development, I might agree. But once people seriously think LLMs are intelligence, it is a different story.
One of the worst and least informed interviewers, the only way I can get through his videos is fast forwarding through his incessant low information babbling. Yet he keeps getting very smart people on to interview!
I was gonna like this comment, until I watched the whole talk. Guys watch the whole talk before forming an opinion! really interesting. @dwarkesh - great going as usual
So good to see someone let some air out of the LLM bubble. Dwarkesh might be a little challenged by this, but it’s great to get out of the echo chamber regularly.
This strikes me as moving the goal posts. Chollet doesn't seem to understand how dumb the average person is. Can LLMs replace programmers? Can the average human replace a programmer? Go watch Jerry Springer and then tell me how LLMs won't reach AGI. To the average human, these things are already AGI. Everybody in this video is in the top 1%. They are so far up the intelligence scale that they can't even imagine what average intelligence looks like.
Wanted to agree with him, but Francois Chollet is way off. From the moment he mentioned not "copying" from stack overflow as some sort of example of humans handling novelty in the wild, it was clear he was idealizing some belief that he holds. He refuses to believe that creativity is mostly interpolation.
Train an LLM to solve math but don't include anything related to calculus in the training data. Then ask the LLM to solve a calculus problem and you'll see it fail. That's essentially what Francois Chollet was saying. Isaac Newton was able to introduce calculus based on his FOUNDATION OF MATH (Memory) and actual INTELLIGENCE (Ability to adapt to change)
Francois said the opposite, he actually said that a lot of human math skills rely on memorization too. But actual intelligence to discover/invent new math goes beyond this. This is why even winning a math olympiad would be as meaningless as winning chess, it's old math. Actual intelligence wouldn't win chess but invent the game - without being told so!
Making an AI Isaac Newton vs an AI Calculus Student is a nice and simple way to capture what they're trying to do. Making a great AI Calculus Student is an awesome accomplishment, but we really want a Newton.
Many math teachers I've known say that if you memorize and apply the tricks, at least you'll pass the exams. You won't be great at math, but good enough. Up to some level math is about memory, like chess.
He's not a skeptic, he's a realist. A human, with lower than average intelligence, can learn to safely drive a car in a few hours. No-one's created an AI that can safely drive a car on all roads even when trained on all the data available to mankind. See the problem? Bigger AI models require more training data in order to improve their performance. In other words, it's the greater volume of data that's improving their performance, not their intelligence (memory, not intelligence). An increase in intelligence would enable the model to improve performance without requiring more training data.
There was 20 minutes of: "Can LLMs replace programmers?" "No" "But can they?" "No" "But can they?" "No" "But can they?" "No" "But can they?" "No" "But can they?" "No" "But can they?" XD ... It simply becomes clear that LLMs can't replace programmers when you start using them every day in your programming job and realize how bad they perform once you get into even slightly complex logic.
@jasonk125 They are not AGI because they can't perform every task an average human can. Also, he was trying to explain that they probably can't learn novel tasks. The things at which LLMs excel are problems that have been solved numerous times and for which there is a lot of data. Even so, the world and its operations are not interconnected and digitized enough for LLMs to take over.
@@GabrielMatusevich Well if Chollet wants to redefine AGI, and then say LLMs aren't AGI (which is what he does) then I guess there is no point arguing with him. From his website: Consensus definition of AGI, "a system that can automate the majority of economically valuable work," Chollet's definition: "The intelligence of a system is a measure of its skill-acquisition efficiency over a scope of tasks, with respect to priors, experience, and generalization difficulty." So they should have first come to an agreed upon definition of AGI (which they did not), before arguing about whether LLMs could meet that definition. Your statement: "they are not AGI because they can't perform every task an average human can" is not arguing within Chollet's definitional framework. It is closer to the consensus framework.
@jasonk125 Yeah, that's a good point. It reminds me that I don't think there is even actual consensus on a definition of just "intelligence"... which makes it even harder 😆
It is pretty glaring how different Francois's interview is from Ilya's. Maybe part of it is the result of Dwarkesh's polish as an interviewer, where for Ilya it was a series of questions Ilya minimally answered, while here Francois openly expanded upon the problem. But also, from the start the two seemed different. Where Francois maintains all the features of a researcher who values an open exchange of ideas, Ilya values secrecy and the advantages of being a first mover. I definitely enjoyed this interview much more.
Interesting observation that I had not noticed, but perhaps felt to some degree. Smart people share information freely as for them it is abundant whereas people who are less smart hold onto knowledge as it is scarce in their world. I think this is the easiest way to really identify if the person you are talking to is smart or just socially smart.
@@MetaverseAdventures No doubt Francois is generous with his understanding because for researchers understanding functions as a public good, i.e. it does not decrease in value with greater supply. More so, I think it demonstrates the differences in their respective positions. Ilya has a product he needs to deploy and make profitable, and he needs to secure an advantage over the competition. It is intrinsically a much more narrow and concentrated effort. This can lead to good results in the short term, but long term it's an approach that tends to become stifling. This is also why Francois laments the shift brought about by OpenAI (which is rather ironic).
I don’t think Ilya wants to be first simply for its own sake. He is worried about the future and would like to steer the ship in the right direction. I don’t know exactly what Ilya saw at OpenAI, but after watching what Leopold Aschenbrenner had to say, the departure of Jan Leike over the 20% of compute that was promised but never delivered, and hearing that Ilya is now starting his own AI company called Safe Superintelligence, I suspect he has a good reason to be worried.
@Matis_747 I do appreciate and understand the position you describe that Ilya is in and how it colours all his words, but I am still not convinced he is as brilliant as made out. We will see, as he has started a new company with no profit motive, so I anticipate more knowledge flowing from him, especially around safety, as that is his focus. He is a smart person, no doubt, but there is just something that does not add up for me. Probably just me and my lack of understanding, as I am just a user of AI and not a creator of AI. I look forward to seeing where Ilya takes things, as maybe he is going to be the most important person in AI history, or fade into obscurity.
So... You're saying that Ilya is... Playing his cards close to the vest, only responding to direct questions, almost a black box that you have to creatively prompt to get the right answer... Could he be a... LLM? :)
He literally tortures him with "okay, but isn't this AGI? why is that not AGI?" questions and, after not getting any positive feedback, asks "okay, but let's suppose you lost your job to AGI".
I think these are two separate conversations. The host goes technical; the guest goes philosophical about how these LLMs currently lack processes that help them keep up with conversation the natural way. I kind of know what Chollet is talking about. In the case of Gemini, it's unable to answer from an existing thread. It automatically goes into a cycle of verification against its database. It could simply mine past conversations to respond. That would be more efficient in my view: fewer tokens, fewer resources than the current architecture it's in. Also, Gemini can't produce "on the fly" responses either. For example, with breaking news, it won't be aware of it until hours later.
And in turn Francois keeps redefining what LLMs are doing and what intelligence is. He starts with LLMs just memorize, then they memorize templates but can't generate new templates, then they generalize but only locally, then they generalize enough to learn new languages that they haven't been exposed to but that's not intelligence either... Sure, Francois. Call us when you've discovered AGI.
"But if we gave LLMs as many Adderalls as I popped before this interview, would then get AGI?" "Ok, that may work." "That was a trick question. I snorted the Adderall."
I think Francois really got his point across in the end there. I was leaning somewhat to the scaling-hypothesis side; he made me question that more. In any case, you have to give him credit for actually coming up with interesting stuff to support his arguments, unlike many other critics.
“Coming up with”? You mean like on the fly? 😂 But seriously though, this guy's a verified genius in his field; he didn't just “come up with it” one morning.
$1 million for this is as ridiculous as $1 million for P vs NP; it's a multi-trillion-dollar problem. It's like offering $1 for someone to find your lost supercar or something.
Both of them could be solved by a kid in Africa on a $100 Android smartphone. You say trillion only because you assume you need power plants and datacenters, maybe one more time over the current total infrastructure, but that's just zero imagination. It's like if you asked in the 1500s, the answer would be that you need 100 million horses and millions of km² of woods to burn. What if the answer is that you need a million times the power of the Sun, like an advanced Kardashev II civilization?
There is so much great information here... I'm at 24:32 and Francois was saying "Generality isn't specificity scaled up..." He seems very aware that the current approach to LLMs is bigger, better, more data, and he is right in noting that is not how human intelligence works. We don't need to absorb the whole of human information to be considered intelligent.
Dwarkesh brings up a solid point that no scientist ever "zero shots" their ideas. Francois is partly correct, but he's holding onto some tight beliefs there about creativity, novelty and interpolation.
TL;DR: The host was combative in a way that made him come off as a salesman for AI rather than having a conversation about what the guest thinks, at least for the first half of the conversation. Also, the host refused to budge on his *belief* that current LLMs are capable of real understanding despite the guest's points to the contrary.

This is my first time seeing this podcast, so I don't have a frame of reference for the normal vibes of the host, but he seemed extremely defensive. The guest kept calmly stating that memorization and understanding are two completely different things, while the host just kept referring to anecdotes and examples of things that he thinks display understanding.

The example that set my radar off was the obscure-language dictionary example. The guest had already shot him down once by claiming that the ARC puzzles are a set of tests it would be very hard to make training data for, and that if LLMs developed true understanding/adaptive capabilities then they should be able to pass the ARC puzzles easily. The host then tried to bring up the example of the chess models, which the guest pointed out is almost exclusively pattern recognition and memorization, and instead of wrestling with that point he moved back to the obscure-language point. I think that evasion of the chess point is actually extremely telling. If he truly believed it was a good point, he might have pushed back or tried to justify why he brought it up, but instead he said "sure we can leave that aside" immediately. Maybe I'm being a little cynical; maybe he realized that was actually a bad point for the argument he was trying to make. Regardless, he went back to the obscure-language point, which might have been impressive if it were not for the rest of the conversation up to that point.

Earlier, the host tried to give an example of a simple word problem that had to do with counting. The guest countered that, with all of its training data, the model was probably just referencing a word problem it had seen before, which, from my understanding of how these things work, is probably accurate. The host clearly did not understand this earlier point, because the thing about language models that the guest had to point out AGAIN is that the training data probably contains similar information. Not necessarily data on that language, but to my imagination the data probably contains a lot of different dictionaries in a lot of different languages. Dictionaries, on top of having similar formats across most languages, also typically contain complete sentences, verb conjugations and word classes. I can see how the guest's point about memorization and pattern recognition would apply to LLMs in this aspect.

As I continue watching, I am realizing that this has turned into a debate on whether or not LLMs have the capability to understand and process information as well as synthesize new information, which I was not expecting nor did I want. I think it is intuitively understood that current models are not capable of these things; this is why they require so much training data to be useful. There were genuinely good parts of this podcast, but the host insisting that LLMs understand things the way humans do was not one of them.

This is a little nitpicky, but there was a point when the host said something like "let's say in one year a model can solve ARC, do we have AGI?". To me this comes off as extremely desperate, because the most obnoxious part of that question is also the most useless: the timeframe in which this may happen is completely irrelevant to the question. The guest at no point argued anything about timeframes of when he thinks AGI might happen. In fact, when the guest answered in the affirmative, the conversation took a turn for the better.

Finally, if you haven't gone and taken the ARC test, I would encourage you to do so, because neither the host nor the guest did a very good job explaining what it is. By the second or third puzzle, I intuitively understood why it would be hard to get our current generation of models to perform well on those tests: they require too much deliberate thought about what you are looking at. It almost reminded me of the video game "The Witness" in its simplicity, with the only clues to how to solve the puzzles in both games coming from the context of earlier puzzles.
@@young9534 I probably won't. The host did not really make me want to see more of him. I am kind of tired of being evangelized to about this tech in its current state. I will likely still follow the space and continue to learn more and seek more information; I hope this host does the same, honestly. It seems like the space is very full of people who want me to believe either that AGI is impossible or that AGI is coming next year. I personally don't appreciate either side's dogmatism and will continue to try to find people with a more measured view on this stuff.
You are overly judgmental. He pushed back because the answers felt too abstract. If they felt too abstract to him, they would ALSO feel too abstract to others. There is literally no personal attachment involved. Too many hosts don’t push back nearly enough due to stuff like this.
Quite the opposite effect on me. François felt like a calm android repeating "arc puzzle" and his beliefs about "novelty", like he has all the answers. Dwarkesh captures the frenzy of the puzzling human experience.
Dwarkesh is right though. Experience is enough. Chollet is just wrong, even in what he thinks LLMs are doing. LLMs do generalize. Patel didn't make the correct arguments.
Exactly. Like, what's the point of asking questions if you don't wanna hear the answer? Dwarkesh has got that journalist mindset: "I only want to hear a certain answer, not hear what they want to say."
Great interview. I've personally done about 50 of the online version of the ARC challenge, and the gist of solving them is simply to recognize the basic rules used to solve the examples and apply the same rule to get the answer. While some are challenging, most use basic rules such as symmetry, containment, change in color or rotation, or a combination of more than one rule. I'm sure that current large LLMs like GPT-4 have internalized these basic rules in order to answer questions so proficiently. What is perplexing to me is why LLMs can't extract those rules and apply them to get more than 90% on any ARC challenge. I think that is the crux of the matter that Francois is getting at. If solving any ARC challenge basically requires one to identify the simple rules in an example and then apply those rules, why are LLMs not crushing it?
Because LLMs - once trained - don't extract rules from input data and do another step of applying those rules. That would be precisely the "synthesizing" step that Chollet talked about. LLMs just ingest the input and vomit out the most likely output. The human equivalent is a gut-feel reaction (what we call intuition) without attempt of reasoning.
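To make that "extract a rule, then apply it" step concrete, here is a toy Python sketch (hypothetical grids and a made-up rule library; not the real ARC format or any actual solver): it searches a small set of candidate transformations for one that explains every demonstration pair, then applies it to a new input.

```python
# Toy illustration of rule induction + application (not a real ARC solver).

def mirror_lr(grid):      # flip each row left-to-right
    return [row[::-1] for row in grid]

def rotate_90(grid):      # rotate the grid clockwise
    return [list(row) for row in zip(*grid[::-1])]

def recolor(grid, old=1, new=2):   # swap one color for another
    return [[new if c == old else c for c in row] for row in grid]

CANDIDATE_RULES = [mirror_lr, rotate_90, recolor]

def induce_rule(demonstrations):
    """Return the first candidate rule consistent with all (input, output) pairs."""
    for rule in CANDIDATE_RULES:
        if all(rule(x) == y for x, y in demonstrations):
            return rule
    return None

demos = [([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
         ([[0, 0], [1, 0]], [[0, 0], [0, 1]])]
rule = induce_rule(demos)            # finds mirror_lr
print(rule([[1, 1], [0, 1]]))        # -> [[1, 1], [1, 0]]
```

The hard part of ARC, of course, is that the space of possible rules is open-ended, so a fixed library like this quickly runs out of road; that open-ended synthesis step is what the comments above are pointing at.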
Not sure they would have really gotten any further with more time. I'm 40 minutes in and the conversation basically seems to go in circles. Dwarkesh: "but this and this behavior by LLMs could be interpreted as intelligence, couldn't it?", Francois: "if that were true, then they would be able to perform well on ARC".
@@QuantPhilosopher89 I think it's good because honestly I know a lot of people like Dwarkesh. I obviously have very different metaphysical presuppositions than most people, so being able to find someone who is able to push back against LLM hype in a way that's understandable and reasonable is nice.
And that argument relies on the most complex and inclusive definition of memorisation when evaluating whether LLMs are just memorizing, and then on the simplest definition of memorisation when evaluating the usefulness of memorisation.
Francois is about the only mainstream AI researcher I follow; all the other dorks saying scale is all we need should take a class in critical thinking.
@@netscrooge Thank you for that insight. I assumed, considering his roots, that he was simply part of the LLM, crowd. Now I will listen with fresh ears.
@@jameshuddle4712 I'm mostly going by what comes out of his own mouth. But if you can find the right interview, we can also hear what Hinton says about working with Ilya when he was his student, the startling way his mind could leap ahead.
Nice to get a little return to earth with this one, pie in the sky talk about AGI is fun but the difference between that and LLMs still seems pretty huge.
The thing about exponential curves is that great distances can be crossed in surprisingly little time. Which is most likely what is going to happen. We’re not as far as you think, or as Chollet is making it seem.
@@therainman7777 You know that if AI systems were to increase in performance every year by only 1% of each previous year, that would still be considered exponential growth.
@@10ahm01 Yes, and after a sufficient number of years each 1% increment would represent a huge increase in capabilities, just as a 1% return on a trillion dollars is 10 billion dollars. Also, on an empirical level, we can actually estimate the % increase per year in technological progress using a number of different metrics, and it is nowhere near 1%. It is far, far larger than that. Moore’s law, to give just one example, equates to roughly a 40% increase per year. And many metrics relevant to AI development, such as GPU compute, are increasing even faster than that. So your point about 1% per year increases is also irrelevant for this reason. Lastly, this is not an exponential trajectory that started two or three years ago; it started decades ago. Which means the absolute increment of progress per annum at this point is quite large.
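For a sense of the compounding being described, a quick sketch (illustrative growth rates only, not a forecast of anything):

```python
# Compound growth at 1%/yr vs ~40%/yr (roughly a doubling every two years).
for rate in (0.01, 0.40):
    for years in (10, 20):
        print(f"{rate:.0%}/yr over {years} yrs -> {(1 + rate) ** years:.1f}x")
# 1%/yr:  ~1.1x after 10 yrs, ~1.2x after 20 yrs
# 40%/yr: ~28.9x after 10 yrs, ~836.7x after 20 yrs
```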
This is the most constructive debate I have watched on AGI, to be honest. Bravo, Patel, for asking the right questions of Francois. It definitely makes me think more deeply about all of it.
Thanks for this! Most interesting conversation on LLMs I've heard for a long time. I think programme memorisation vs novel programme creation is an important distinction. I can personally buy the idea that we mostly rely on programme memorisation in daily life, but we clearly rely on novel programme creation happening at some point! I'm unsure, though, of the degree to which that happens within individual brains vs coming out of collective processes, e.g. cultural evolution etc.
ARC is a great milestone for REAL research, in part because it's immune to brute-force efforts where progress can be faked (all results are not equal). The prize money might get people to try things out, but at its core a "working solution" signals the existence of new technology (and the potential to leap over $100M models). Posting "winning numbers" is less about prize money and more about $xxM in VC funding.
@@PhilippLenssen Why would such accusations necessarily be wrong? This topic was mentioned during the podcast, and the fact that the test isn't perfect was conceded. Why do you think it is a perfect test, when the creator doesn't?
@@PhilippLenssen That will be a problem with any test, but a system that can form new behaviors to understand/solve problems will be able, as a single system, to take on all challenges, even new ones, without having to go through a costly training process. Even when we get something that is human-level, people will still question how much is AI and how much is humans hidden offshore. Only "the real thing" will have staying power.
@@VACatholic A lot of benchmarks are good, as long as you are going at them honestly. Long before adding language abilities, a system should be human-level on all games (that don't involve language).
Here is the correct way to push back on the "LLM scale still gets there" claim: having a set of all "solution patterns" stored doesn't do anything by itself; it's the human, doing the prompting, who connects the stored pattern with what it needs to be applied to. With ARC, no one gets to see the test data, so any system has to operate at a level where what it can do is searched for based on what it can see. And the key aspect of ARC-AGI is a system that creates its own internal solution to unseen challenges (i.e., discovers novel solutions and saves them).
Thanks, Francois, for the reminder that we can't just scale our way to mastering intelligence; you can't memorize everything. I took this approach in college and it ultimately fails.
@@JumpDiffusion Exactly. Why do they keep parroting the memorization bit? François knows better than to say that there's some copy of the code that the LLMs memorized.
@@eprd313 It's not just an architectural issue. The whole epistemological foundation of the prevailing approach to AGI is shaky as hell ( th-cam.com/video/IeY8QaMsYqY/w-d-xo.html )
@@maloxi1472 that video aged like spilled milk. All the progress made in the last year contradicts its claims and the distance from AGI is now more of a matter of hardware than software, a distance that AI itself is helping us cover as it's designing microchips more efficiently than any human could.
Your podcasts are absolutely fantastic! I always eagerly anticipate each new episode and have learned so much from your content. Thank you for your hard work. This episode especially was inspiring and it gave me so many ideas to try and think about.
The perfect ever-changing ARC puzzle set already exists in the dynamic environment of driving a vehicle. This is a test that can't be solved with yet more examples, because there is always something unique happening that throws the self-driving program into disarray, with the resultant ripping off of a Tesla driver's head or the idiotic smashing into the rear of a stationary emergency vehicle. If AI could become AGI through bigger data sets and more memory, then we'd already have flawless self-driving cars, robotaxis and AGI. We don't. Not even close. I think Chollet has highlighted the missing piece of the AI puzzle, pun intended.
I appreciate and very much enjoy these podcasts. I also fully understand the need to play devil's advocate. However, to me this felt a lot more biased than most of the other episodes. It's clear which position Dwarkesh has chosen. That's fine, but it really shines through when someone who is not an LLM maximalist is on the podcast. Devil's advocate? Yes, always do that. Extreme bias where it becomes waiting for your turn to speak rather than a discussion? Not ideal in my opinion. I hope if he sees this he doesn't take it personally. Obviously he's very excited about this tech. Most tech folks are either excited or at the very least quite impressed by the advances that have been made over the last few years. I just hope the quality of discussion remains consistent regardless of who the guest is.
@@jimbojimbo6873 Please define "come up with something someone hasn't already" and tell me, have you actually ever done this - and what did you come up with?
@@everettvitols5690 I’ve made a dish I’d never seen in a recipe book or online before. If we entertain hyperbole, then something as grand as self-driving cars that meet regulatory requirements is something we still don't have. If it can do something that's truly innovative, then I’d consider it AGI. I don’t see how LLMs will ever achieve that.
I would totally invest in Chollet's research. He has a tons of insight and clarity. I have ideas, but I don't have the background to do the work myself - my background is philosophy. I'd love to participate in this challenge, but it would take me years to embed myself in the academic institutions.
Francois was very patient in facing the barrage of questions from the LLM champion. I think a conversation with Jeff Hawkins would be very beneficial to understand, in a more general way, why some aspects of intelligence are missing in current deep learning.
@@ZachMeador We have to look at who the users are: we see that the majority were students who wanted to falsify their work, and the user numbers are really not significantly high. We must also take into account that they may even be inflating the user base thanks to the mandatory integration of LLMs into internet browsers and operating systems.
@@eprd313 We do know that DNNs are probabilistic adjusters; they can help us find the most likely answers, but they are not intelligent, and neither is a search engine. In fact their implementation in search engines has been catastrophic, especially for Google, where they prioritize answers from Reddit whether or not those are true.
Voldemort asks all the right questions, but rarely more than two in a row, before he rests his throwing arm while firing off a bunch of nerf balls. His best probes are episodic rather than sustained. This stuff is simply too hard to command on an episodic basis.
People are actually much smarter on average than one tends to give them credit for. It's just that we are very very reluctant to use System II. We'll do literally everything else before deploying the full power. But if one's life depended on it or there was sufficient incentive, we can be extremely fast learners. We just naturally try not to get challenged this way in everyday life.
Even though I agree with what you're saying- one of the things researchers found that exists as a general difference between persons widely separated along the I.Q. spectrum was that the glucose uptake & thermal output in brains of lower I.Q. people were much greater than those on the higher. This indicates that a more generally intelligent mind is both thermally and resource efficient: expending less fuel and generating less waste per unit of output. What this points to is that some people can activate system 2 with considerably less cognitive burden. Since most of us are pleasure-maximising, instinctually and natively, and since it's distinctly unpleasurable to be in the uncomfortable state of mental strain/discomfort associated with glucose starvation or one's brain overheating, one might expect that behavioural inclination follows from ability. In the same way that a natural weakling doesn't enjoy lifting weights, and so avoids it, an intellectual weakling doesn't enjoy activating system II, and so avoids it. The fundamental reason is the same in both cases: we avoid that for which we lack rewarding feedback (relative to peers) and which relatively strains us (relative to peers). The fact that anyone _can_ activate system II means simply that everyone has and utilises a general form of intelligence. However, the fact that people _don't_ do this suggests that they have a relative deficit in system II (or rather in its utilisation) which explains this avoidant tendency, while simultaneously pointing to the degrees of difference in the general intelligence of people.
It might not be fair to say an LLM needs millions of training examples but a young child doesn't. By the time a child is old enough to solve these puzzles, they have 'trained' far more than that through interaction with the world. A more accurate comparison would be an untrained LLM vs. a baby.
@@allanc3945 And even our evolutionary history encodes priors in the brain, like symmetries and basic geometry; hence, human "training" starts well before birth.
Lol I screenshotted the problem at 7:24 and asked ChatGPT. While the image it generated was completely off, its answer was almost there. I sent it the puzzle and asked "what would the 8th image look like?" It replied "[...] Considering the established pattern, the 8th image should show a continuation of the diagonal line, turning green at the intersection points with the red border. Therefore, the next step would involve the following arrangement: The blue diagonal line continues towards the right border. Any part of the diagonal line intersecting the red boundary turns green. So, the 8th image should depict a continuation of the diagonal line, transforming blue cells into green ones at the boundary intersection. [...]" So OK, it didn't get the perfect answer because our line becomes green even before meeting the wall. But it's pretty damn close. GPT 5 should smash these puzzles.
Over the past few months I've tried multiple IQ-style tests on Gemini 1.5, Claude 3 Opus and recently GPT-4o. I've noticed that on exercises involving "sequential selection," where the model should guess the logic of the sequence order and select among multiple other elements to complete it, performance is very inconsistent. There's one test I extracted from a logic test with geometric rules where each step gradually increases in complexity: GPT-4o succeeded on 4/10, but it got some right in an incoherent order, as if the model wasn't actually reasoning, and on the 6/10 it failed there were hallucinations at the end of the rationales. It got some of the more complex ones right while failing some simpler ones, similarly to Claude 3 Opus and Gemini 1.5. My conclusion is that these models don't logically reason despite visual CoT prompting and high-resolution images for the tests; they generalize over multiple similar training samples, but they can't logically reflect as we do.
@@victormustin2547 He's still right; no AGI will come out of LLMs, though. Just read some research papers about "grokking" and "platonic representation" to understand the insurmountable plateau of LLMs.
@@TheRealUsername In the face of what's going on right now (LLMs get better and better) I'm not going to believe in a hypothetical plateau until I see it!
@@victormustin2547 He was very happy to respond to lots of the points with "that's an empirical question, we'll see very soon", and has been very explicit about what would prove him wrong (a pure LLM which beats ARC without massive ARC-specific fine tuning), and has even offered a cash prize to incentivize people solving this problem. This is all great behaviour and should be encouraged. It's exactly how disagreements should be handled! If he does get proven wrong and graciously concedes, then he will deserve to be commended for that. It's weird that you'd take pleasure from someone getting proven wrong when they were so open to having their mind changed in the first place. That's not great behaviour. It's that kind of attitude that makes people defensive, so they become entrenched in their positions and refuse to listen to other points of view. You should celebrate when people admit they're wrong and change their mind, but you shouldn't gloat about it.
Always worth pointing out the LLMs require a server farm that has the energy requirements of a small state, whereas the human brain runs pretty effectively on a bowl of cheerios. I think more people should think about this!
While this is true, I think it misses the point of the eventual advantage of deep learning systems. Human brains are fixed in size right now, mostly due to the evolutionary pressure of the size of the birth canal. Even if deep learning is multiple orders of magnitude less data and compute efficient than human brains (excluding the horrible compute efficiency of evolution to get us where we are), we can still scale the models to run on ever more power hungry data centers to surpass human brains. At the same time we can do this, our algorithmic and data sample efficiency gets better too, improving the ceiling that we can achieve.
@@CyberKyle All of that advancement will lead to great things, but at its core foundation an LLM cannot achieve AGI. Also keep in mind these models are not even scratching the surface of the brain's capabilities to apply intuition, rationality, morality, and many other things that contribute to decision making beyond simple data processing.
@@Martinit0 that’s not true, the operations needed for inference can be sharded across many nodes just like training. It’s just that training requires a ton of forward passes to see what the model outputs before you backpropagate the errors, so it requires large clusters to complete training in a reasonable timeframe. It is conceivable that you could make a ginormous model with many trillions of parameters that you’d shard across many GPUs.
@@CyberKyle Although the capabilities will possibly reach AGI even without radical efficiency improvements, AI will always be greatly limited in its impact until the energy efficiency problem is solved. Most likely there needs to be a fundamental change in the type of hardware architecture AI runs on, toward something that can be computationally sparse and reduce memory transfer costs, possibly by combining memory and processing into a unified "processing-in-memory" (PIM) architecture, like neuromorphic computing.
It seems like the ARC thing maybe would be difficult for LLMs because they are reliant on visual symmetry that wouldn't be preserved through tokenization? I mean, I'm sure it's not that simple, because then natively visual models would probably be solving them easily. But still, there should be a version of this test that has complete parity between what the human is working with and what the LLM is working with, I.E. already tokenized text data.
Chollet addressed this objection in the video by pointing out that LLMs actually do quite well on these kinds of simple visual puzzles, when the puzzles are very similar to puzzles they've been trained on. So this can't be the answer. We'll find out soon enough how well the multi-modal ones do.
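On the tokenization point raised in this exchange, here is a toy sketch (a hypothetical row-major serialization, not how any particular model actually tokenizes images or grids) of how a grid's 2D structure becomes non-local once flattened into a 1D sequence:

```python
# A grid with an obvious symmetry becomes a flat sequence in which
# vertical neighbors are no longer adjacent tokens.
grid = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

tokens = [cell for row in grid for cell in row]   # row-major flattening
print(tokens)  # [0, 1, 0, 1, 1, 1, 0, 1, 0]

# "The cell directly below" is now width positions away in the sequence.
width = len(grid[0])
print(tokens[0], tokens[0 + width])  # cell (0,0) and the cell below it
```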
New ideas are really just combinations of existing ideas. As such, LLMs can indeed create new things that are not in training data. Check my channel for the video "Can AI Create New Ideas?" for more details and examples of this. That aside, ARC is a fascinating benchmark that tests something entirely different: advanced few-shot pattern recognition. This feels like a powerful and important architectural component for future AGI systems, but I would not label it as "AGI" on its own.
I don't get all these comments saying that there were a lot of repeated questions. The way I see it, the subject was interesting and "tricky" enough to talk about in depth like you guys did here. Yes, it might seem that you repeat the same question every time, but the answers and explanations from Chollet were super interesting, and every time we got a new way to look at it. Nice interview.
Great interview. Good to have a different, well-thought-out perspective on AI. I appreciated that Dwarkesh was willing to press Chollet on his claims, in particular trying to nail him down on what exactly counts as generalizing beyond the data set. It seems that he didn't really have a good answer apart from "doing well on ARC." I still think he overestimates the extent to which average humans are able to do this, and underestimates the extent to which transformers are able to do this. Also, going from 0% on ARC to 35% in a few years seems like a lot of progress to me, so I'm really surprised he didn't think so. I would bet that the next generation of multimodal models gets to 50-60% and that we get to 90% by 2030.
Chollet's claim about human intelligence being unique is weak. Even ants have demonstrated the ability to respond to novelty. In the end, we're all neural networks. Stop deifying human intelligence. Bottom line: Chollet is expressing a form of religion, not science.
It's such a fascinating discussion! It definitely deserves more views! I can disagree with Francois Chollet about LLM potential in general, but I must admit that his approach is extremely refreshing and novel. We need more people with nontrivial ideas to build true AGI, because just scaling LLMs is a risky approach: if we fail, a new great AI winter is waiting for us.
I am not so sure that humans do much more than use pattern recognition to solve the ARC problems. When I look at them, I very quickly recognize patterns I saw somewhere else. Our brain has been trained on millions of image patterns by evolution.
How is the brain not interpolating or probabilistic? The only additions the brain has are qualia and input from the CNS, and how significant they are for generalized AGI is still unclear. For reference: Oxford's Shamil Chandaria's lectures on the Bayesian brain.
@@mythiq_ It goes beyond interpolation and probabilistic solutions. Our brains are fundamentally able to abstract concepts from very few data points, proving that we are very sample efficient even when exposed to an entirely new set of data. LLMs are just really fancy estimators, capable of understanding the semantics of a given problem and generating an outcome based on similar problems they've faced before. The semantic understanding of a problem enables interpolation. It does not abstract the given problem and then deal with it with its own sense of understanding.
@@calvinjames8357 The human brain receives 11 million bits of information per second. Multiply that by 12 years as a generous lower limit for when humans start doing very useful intelligence, and we are talking about 520 terabytes of data: several times what LLMs receive. And that's not counting the data received during evolution. Now sure, a lot of this data is redundant and mostly noise, but so is most LLM pretraining data. A baby takes months to be able to do anything besides its preprogrammed instincts like sleeping, nursing and crying. And even then it is something simple like picking up toys, arguably also instinctual. Arguably almost nothing great, innovative, creative or intelligent is done by humans before their mid-20s. Early in a human's 23rd year is when they have received 1 petabyte of information from their senses. To say humans only get a few data points and then infer smart and novel ideas is just plainly false. We only do that after we have been exposed to terabytes of information about the world around us, and we only do it well after petabytes.
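For what it's worth, the arithmetic in that comment does check out under its own stated assumption of 11 million bits per second of sensory input; a quick sketch:

```python
# Back-of-envelope check of the 520 TB / 1 PB figures above.
bits_per_second = 11e6
seconds_per_year = 3600 * 24 * 365

terabytes_per_year = bits_per_second * seconds_per_year / 8 / 1e12
print(f"{terabytes_per_year:.0f} TB/year")                      # ~43 TB/year
print(f"{terabytes_per_year * 12:.0f} TB by age 12")            # ~520 TB
print(f"{1000 / terabytes_per_year:.0f} years to reach 1 PB")   # ~23 years
```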
First I was glad that I could solve the ARC puzzles myself and apparently am not a bot, but then I wondered: how would someone do who has never seen these kinds of puzzles before? Is the difference between humans and LLMs just that we are able to generalize from fewer samples across a broader field?
EXCELLENT EPISODE! These types of counter arguments against the LLM hype are SUPER IMPORTANT in this public debate, and both Francois and Dwarkesh made great points for both sides of the debate! The pushback from Dwarkesh was excellent, but we need that type of pushback against the proponents of scale = AGI as well.
Dwarkesh, don't listen to the comments, you did extremely well in this interview, much better than Lex when he challenges his guests. Well done and continue this line!
The section of the video from "Skill and Intelligence" to the end of "Future of AI Progress" made for such a compelling and engrossing watch. I must convey huge thanks to Dwarkesh for pressing on the matter -- with such a vast grey area between "memorization" and "intelligence" (AGI) given the present state of research -- which allowed Francois to give out so many priceless points that can rarely be found elsewhere. Francois's insight and experience, coupled with Dwarkesh's active back-and-forth questioning, gave some insights which are extremely valuable, at least for someone like myself. And I have to commend his impeccable presence of mind, as at 47:58. Had it not been this active, it would have been a bland conversation (podcast). The questions assisted in drawing out so many brilliant viewpoints from Francois.

To those commenting on Dwarkesh's fast speaking style and Francois's slow-paced speaking: it is their individual style. He (Francois) spoke at the same pace with Lex, and Lex himself is a slow-spoken man. It is one's individuality. With his same fast-paced speaking, Dwarkesh made the conversation with Sholto and Trenton very engaging, worthwhile and enlightening, as this one. And if I am not wrong, Francois wouldn't be this slow in uttering if he were speaking French instead.

In the very beginning of this interview, Dwarkesh said that he has had many guests who were strong proponents of LLMs and their scalability, which he found a bit "contrived." So his attempt here is clearly to use those same arguments, plus some of his own viewpoints, and present the questions to Francois for better and clearer insight for us, the listeners and the learners, as to what the "bottlenecks" of LLMs are that he perceives -- given their massive "accomplishments" these last 4-5 years -- and what different mechanisms or tests there are to get steps closer to "true AGI." This was the backdrop of his questioning, and he created a brilliant premise for such a learned and esteemed guest of the field. Had he not presented a case for LLMs while Francois presented a case "against" LLMs, it would have been a one-sided talk -- just asking questions about ARC. How else would we have come to know why LLMs haven't fared as well on ARC these past 5 years while they have done tremendously well on other benchmarks? How else would we have gotten so many precious inputs from Francois?

I bow with respect to Francois for understanding the reasoning behind the premise set for this interview, and to Dwarkesh for his brilliant questions, "hard presses" on so many important points, and his presence of mind. I got to learn SO MUCH from this.
@@wookiee_1 CS degrees in the US in 2021: about 100,000, vs. bootcamp grads: about 200,000. Either way, I agree with you that coding != software engineering. But most "coders" claim to be engineers. My point was that there aren't that many software engineers needed per company; on average most "developers" do work that wouldn't be considered "knowledge work".
More interviews about the importance of system 2 thinking would be awesome, for instance John Carmack (of Doom and MetaAI fame) is also working on this... Your channel is becoming so popular it could easily lead to a technical breakthrough at this point
@@falklumo If that's true, why did Dwarkesh's ML friends who are working on AGI not know about the ARC benchmark, and why were they surprised that the frontier models failed?
@@jeroenbauwens1986 The German guy (interviewed in the immediately previous podcast) is young (inexperienced) and not an ML researcher. He stated that he was extrapolating straight lines derived from existing data. DP also recently interviewed a pair of guys about LLMs who had only been in ML for a year, i.e. inexperienced.
@@bazstraight8797 So your point is that people like John Schulman and Ilya Sutskever are more knowledgeable... I wouldn't be too sure they know about ARC though; Ilya has said in the past that scaling is essentially all you need. It sounds like this might be a blind spot in all of these companies. I guess Dwarkesh missed the opportunity to ask them.
Kudos for this amazing discussion. At 1:00:03 Dwarkesh and Francois finally arrived at the consensus that it's not just scaling that is required; a low-hanging fruit is upcoming architectural improvements. I am not sure if these will be brought about by LLMs or by some other architecture, like the discrete program search system that Francois alluded to at 52:39.
I don't understand why hyperscale people are so stubborn… It is obvious this architecture does not possess the ability to reason… Let's scale it up and use it, but let's also spend equal resources on finding new architectures…
I think it's not hard to understand his point: ask GPT to create a novel architecture for AGI and it can't, because it can't come up with an actually novel thing, only a mix of existing ones that looks "novel" but really is not.
That's not the issue. The issue is that GPT can't think about the problem. Even humans come up with new ideas by recombining existing bits and pieces. Look at Jules Verne's "rocket" to the moon which was a big big cannon ball ...
@@falklumo Okay, but it's not the same to combine things into a new thing as to actually propose a completely novel idea, right? So what I understood from his claim was that current models use what they already know and combine it, but they can't take a broader look and redefine, let's say, quantum mechanics with a truly novel approach. Isn't it like that? Maybe I got that wrong, IDK 🤔
I might have missed something then; I just thought that's what this test was all about: performing on completely unseen problems that require you to abstract away from the details, look at the bigger picture, and try to find the underlying logic. I am not an expert on any of these things, I just thought that was it.
This is really insightful from Francois: great articulation of memorization/benchmark gaming, true generalization, intelligence, and a good working definition of intelligence from Piaget (I recall the other Frenchman, Yann, quoting him once too). I'd love to see the day when Google/Francois' team creates a new System 2 based combinatorial (discrete) search engine that can do program synthesis on the fly, with LLMs embedded in it!
Dwarkesh is totally right pressing on the critical claim that humans do more than template matching. Chollet couldn't give a single example. The question was very easy to understand.
@@power9k470 there is a structure to this world that we seem to be able to partially observe. We create models of this structure. Whatever model we create - the model structure can be understood as a template. And many theories share similarities. One could think of the application of modus ponens/ deduction as one such template.
@@power9k470 and it's also not about who is right. I am not sure. The problem is just that Chollet claimed there are many examples but couldn't name a single one (besides his test).
@@vincentcremer4235 Chollet probably could not because he is a CS and ML guy. These subjects do not have profound paradigm shifts. This question is suitable for physicists and mathematicians.
Think about art. Generative AI cannot create art in new, never seen styles. If you trained a generative art AI on all the paintings up until 1850, do you think it could develop new styles as original as impressionism or surrealism or comic art or graffiti? If you trained a generative music AI only on music recorded before 1960, could it come up with new styles as original as punk or hip-hop or EDM? Art is constantly evolving. Sure, individual artists do much that is derivative. But the fact that art evolves demonstrates a human capability that cannot possibly be a form of template-matching.
"You cannot navigate your life using memorisation" - Chollet 29:05 My take: We need some form of memory matching for decision making. We definitely do store something in memory, but not the exact memory of the event, but in different abstraction spaces. The benefit of different abstraction spaces is that you can mix and match various modular blocks together, and easily construct something new with this. So, even if you have not encountered a situation before, you can have some idea of what to do based on something similar that you have encountered before. And the thing about human memory is that it need not even be an exact replica of what you have experienced - it can change based on retrieval, but as long as it is still relevant to the context, at least you are starting from something. So, memory is definitely the way we do decision making.
he talks about the importance of synthesising programs on the fly, using pieces of other programs and reassembling them for the task at hand. isn't that process just another form of a program, which could be incorporated into an LLM just like the various existing programs he claims don't represent intelligence?
@@perer005 i'm not saying they're the same thing. just that they're both technically programs potentially able to be embedded into an AI. it's not infeasible that there are patterns within the logic used to piece together a new program which the AI could learn from and improve with.
Prediction: Arc will be solved within a year. Dwarkesh will claim he was right because it was a scaled up model. Francois will claim he was right because it wasn't purely an LLM. In the end it doesn't matter because we got closer to AGI.
There’s a reason why our biological neural nets are so efficient. The fact that you need to scale an LLM model so much for it to get a grasp of the world model is telling of its inadequacies - it doesn’t diminish the value of scale or the methods themselves, it just signals their flaws. Intelligence may be a mix of a set of skills that include the memorization LLMs can run, and abstract reasoning from little context. System 1 and System 2 together, not one without the other. I’d expect a model that can actually reason to require much less compute, just like our brain does.
Our biological neural nets are not more efficient in terms of complexity. Just the opposite is true. Our largest AI systems are terribly simple compared to the complexity of our brains, way, way less than 1% the complexity.
something weird going on in these comments, and this podcast episode, where everyone is totally talking past each other. honest question: what is your explanation for dwarkesh's question on gemini understanding, and translating, the dead out-of-sample language?
I have 2 questions for everyone present in the video... 1. If LLMs are able to perform economically valuable tasks through memorization and interpolation, does it matter if they aren't "truly" intelligent? 2. Can the advanced tooling and features offered within ChatGPT, combined with self-learning LLMs, be viewed as achieving the form of intelligence referenced in the video? (Can be any AI offering, using ChatGPT merely as an example)
interesting thought experiment: if you had all the data of everything that happened in medieval economic systems, before the industrial revolution: every conversation spoken, and lots of multimodal data, trained the biggest LLM ever on all that, and then jumped forward to today: how well would it do?
Dwarkesh just didn't get the point about what Francois says general intelligence is and gets stuck there. LLM training data is truly vast. It's packed with test questions, example test questions, textbooks from every country and decade, online courses, homework help requests, brain teasers, etc. It is extremely hard to come up with an actually new problem. That is - something that isn't similar to anything in the training data. For the common questions, the LLM in training wouldn't make it far by storing the solution for every version of every problem. It keeps encountering new versions of those problems in training. So it evolves an algorithm to solve that kind of problem. What Francois is saying is that when it solves benchmark problems it can rely on a huge bank of algorithms, each of which works on only one class of problem. The benchmarks don't contain many problems that are completely different from every other problem in the training data. Only those problems require actual one-shot creative problem solving.
What he said about IQ tests is also very important. In an official test, the only kind psychologists would accept as an accurate score, you have to wait two years before you can take it again. This is because familiarity with the test will artificially increase the score, making you think you're smarter than you actually are. If an hour long test can boost your intelligence for two years, what does a lifetime of problem solving do? Kids spend most of their waking hours playing games, solving puzzles, being taught by adults. Any of those things could end up being similar to an IQ test question.

Imagine you had to make up a test for strength. The trouble is strength doesn't exist. What exists is hundreds of muscles, and each of those muscles has "strength", but it has a different strength depending on its extension, and it has different sustained strength and explosive strength, and so on and so on. So what do you do? You make a weight lifting competition. But weightlifting is only a very narrow type of motion and it uses different muscles in different ways from any other sport. So... what? Climbing? Pole vaulting? MMA? Strength can only be measured by asking people to perform some kind of sport, and different people do a lot of different sports. There has to be one test that everyone takes, otherwise how do you compare and rank people by their strength score? You could test every muscle in different positions and different conditions. But the test also has to be doable in less than 5 hours otherwise only the people who care about strength will do it, and the strength scientists need to test a lot of people from a lot of demographics to get a large enough sample size to publish their paper. And people certainly wouldn't watch it on TV.

IQ is the same as strength. Someone decided there must be a "general intelligence" factor called G, and set out to measure it. But first of all, G doesn't exist. What exists is millions of neurons, and the anatomy of a single neuron is almost as complex, in its way, as the anatomy of the human body. But second of all, if G did exist, IQ would not measure it, because IQ tests are standardized questions crafted by formulas that could be similar to a bunch of things a human could be exposed to, and the test is short too. So all an IQ test does is measure the ability to take an IQ test. How many challenges and tasks in your minute to minute existence are anything like an IQ test? Not to mention the lives of your ancestors?
Dude, with all due respect, ease up and let the guest speak. You keep cutting him off with the same question while he's trying to finish his thoughts. Do you even code? If you used different LLM coding assistants for a while, you'd understand what he means. It's essentially glorified Stack Overflow. And the story won't be any different with any other transformer neural networks.
Listened to another podcast where he and his guest spent an hour chasing their tails about nationalisation of ai research. It was a very frustrating listen, and I finally gave up, just like I'm giving up on this one.
One advice to host: You need to give your guest space. You are not a salesman. Or a missionary. Challenging them does not mean repeating the same argument over and over again. It was suffocating to listen to your challenges. If it was not for the call and patient demeanor of Chollet, it would be impossible to watch. We were not able to listen to Chollet expanding upon his ideas because host just reversed the clock to zero by repeating the same "but memorization is intelligence" argument. It should be about your host, not showing the supremacy of your ideology or beliefs. If your host is wrong, you can prove them wrong by showing their arguments and ask questions as they expand upon them and then show if they are inconsistent. Not repeating the same thing over and over and over again
this. on the other hand, it gave me tons of reaction images hahaha, some of francois' sighs and nods are just gold
💯
One advice to you: You can make this point better without insulting Dwarhesh, who is a young man that is still learning. Perhaps you should try hosting a podcast and see if you do better. Want my guess? You would do much worse than you think.
@@Limitless1717 ya ur right lol... he's just a lil bit slow and no offence to him, it just demonstrates how Francois is a genius
@@Limitless1717 this is the most mundane criticism of a criticism. Dwarhesh himself is doing a podcast on topics he is not a specialist on and he is openly criticizing and challenging the views of a specialist on a topic here. So maybe he should work on AGI before challenging François here if he were to take your advice seriously (though he should try to educate himself on topics in any case)
And I am not doing podcasts but I have taught many many classes with a lot of teaching awards. Not the same but similar when it comes to expanding on topics. When I teach a concept I don't just attack it on the first sentence. I explain it, allow it to ripe. Put different light on different aspects of the topic. I don't try to destroy the whole concept on the first sentence.
So my advice doesn't come out of nowhere. And if he puts himself to public spotlight, my criticism is actually one of the most innocent stuff that he is thrown in his direction. If he takes into account he can improve upon. I am mostly criticizing how he is doing some stuff and even provide what he can do better. It is weird that you take offense to that. Anyways it is up to him to do what he wants but I won't watch him anytime sooner again. As it is now, this is really bad way of arguing with anyone even in private, let alone on a podcast. When someone interrupt me like he does and don't know how to argue, in general I just don't bother
What I learned from this: Be the guy who speaks slower under pressure, not the guy who talks faster!
what pressure
@@kjoseph7766 They were clearly putting their own opinions forward, which didn’t align. I would say that put pressure on each other to defend their beliefs. The one dude got super panicked as the French guy kept putting forward clearly articulated ideas and rebuttals.
After listening to this interview and reflecting on it, I was actually thinking that (not Dwarkesh because he's legitimately smart in my opinion) but "people" like Destiny talk extremely fast to compensate for shallow ill-formed thoughts.
@@ahpacific Dwarkesh could help himself out a lot if he slowed down. Simple things like rephrasing what 'french guy' was saying in a way that Francois would agree with would also help tremendously. There is a fundamental difference in epistemology between these two, Francois is emphasising true understanding, and Dwarkesh seems to imply that large gross memorisation leads to understanding- which I don't think Francois would agree with
@@ahpacific I am pretty sure Destiny is very transparent on when his thoughts are shallow or not. Notice when he formulates things as open questions and or the adjectives and intonations he uses to suggest the difference between perspective and fact. You can make a false statement and still create a good converstation like that. People like that are fun to talk to as opposed to people who will only say something if they are fully certain.
Thank goodness for Francois' infinite patience
Dwarkesh is LLM, Francois is AGI
perfectly said
@@ClaudioMartella I may have commented too quickly; it seems that later on in the video I got the impression Dwarkesh was playing devil's advocate. Not sure...
@@cj-ip3zh By impression you mean that he literally said he is playing the devil's advocate?
@@wiczus6102 Right but I couldn't tell if he meant just in that shorter exchange or if the whole interview was him taking the opposite side for the debate.
Damn - perfectly put!
Need more interviews with legit AI/AGI skeptics to balance out the channel
There is money in that
Skeptics? You mean realists
nah fck the doomers
The issue is, highly intelligent AI skeptics are in short supply.
@@ModernCentrist that's the impression you get from content focused on hype merchants, alarmists, and those with a vested interest in overinflating the value of current methods. Academia is full of skeptics/realists.
Dwarkesh dismissed a few completely valid answers to try and steer the answer toward his preconceived idea of LLMs. I didn't like that. Dude is smart, let him finish and actually take his answer on board before asking another question
He said he was playing devil’s advocate, calm down. He does that with most of his guests. It generally makes for a more informative and engaging interview than simply taking everything your interview subject says at face value.
@@therainman7777 Dwarkesh said himself that they were going in circles. I think this was mostly due to Dwarkesh not really thinking about Chollet's responses in the moment. LLM hype causes brain rot in smart people too.
@@BrianPeiris was like an LLM was interviewing Chollet
It was like llm vs analytical intelligence
It is pretty clear Dwarkesh has a lot of influence from the local SF AI folks. Watch his interview with Leopold. His command of the subject is admirable, and he smartly relies on the researchers in his circle to inform his understanding. Many notable people maintain quite strongly it's only a matter of scaling, and I thought he thoroughly went through these types of arguments. It was a valuable thing to do. What is an example of something Francois wasn't able to eventually effectively articulate because he was cut off?
Really glad to see people like Chollet are willing to say the emperor has no clothes. I'd recommend Subbarao Kambhampati's talks as well. He goes into some theories about _why_ people are being fooled into thinking that LLMs can reason.
Thanks for the additional resource
Subbarao is wonderful! Great recommendation.
Francois doesn't even sound like a skeptic, just an informed educator
he invented Keras- he knows his stuff
@@CortezBumf Now I see why there are so many here simping for a fraud like Chollet. Basically these people read "Deep learning with python" and thought Chollet was the frontier person in AI. It's hilarious and ironic. Chollet has made no contribution to frontier AI. He's nowhere near Ilya, Schulman, Hassabis and others who've been interviewed by Dwarkesh. He's just parroting LeCun's viewpoint mixing in his own ideas about general intelligence that are completely unverified.
@@CortezBumf You do know what Keras is, right? It's a frontend to the libraries like PyTorch, Theano or Tensorflow that do the actual heavy lifting. It's basically some syntactic sugar for the masses who couldn't use the somewhat more complex libraries. Now that their interface is simplified Keras is redundant.
@@randomuser5237 lmao always appreciate a redditor-ass response
🤓☝️
Francois Chollet is an amazing guy. The best thing is he, like all the LLM guys, also wants to work toward AGI! He just doesn't think the current LLM paradigm will get us there. I'm really excited to see where this goes because he's challenging the current option space in exactly the right way
This interview actually got me believing LLMs might indeed get us there. The guest seems to believe in a form of intelligence that he idolizes but we haven't really seen. Dwarkesh was spot on that no scientist zero-shots their ideas.
Chollet is actually wrong though. The LLM guys are right. Experience is enough
Him and the others trying to achieve AGI is exactly what will get us eradicated. I agree with Chollet but at some point brute force memorization is all you need to solve any task, you just need to refit the training data.
@@JimStanfield-zo2pz If experience and memorization is enough for AGI, how did people create things they have not seen before? How did Mozart create his music? How did we create skyscrapers? How did we go to moon? How did we discover relativity and quantum physics?
Only someone who has lived his life like a parrot and never created or even attempted to create anything would say this
@@KP-fy5bf That's a fair point actually. If one claims LLMs and that approach is sufficient for solving our issues, they just need more development, I might agree. But once people seriously think LLMs are intelligence, it is a different story.
Dwarkesh does not appear to have the ability to adapt on the fly in this interview! 😂
He showed by example that some humans are not AGI
One of the worst and least informed interviewers; the only way I can get through his videos is fast-forwarding through his incessant low-information babbling. Yet he keeps getting very smart people on to interview!
@@slm6873 dudes like this are good for pumping up the stock price
@@slm6873 can you tell us what the dumbest question was and why it was dumb?
I was gonna like this comment, until I watched the whole talk.
Guys watch the whole talk before forming an opinion!
really interesting. @dwarkesh - great going as usual
So good to see someone let some air out of the LLM bubble. Dwarkesh might be a little challenged by this, but it’s great to get out of the echo chamber regularly.
yeah truly!!
This strikes me as moving the goal posts. Chollet doesn't seem to understand how dumb the average person is. Can LLMs replace programmers? Can the average human replace a programmer? Go watch Jerry Springer and then tell me how LLMs won't reach AGI. To the average human, these things are already AGI. Everybody in this video is in the top 1%. They are so far up the intelligence scale that they can't even imagine what average intelligence looks like.
Wanted to agree with him, but Francois Chollet is way off. From the moment he mentioned not "copying" from stack overflow as some sort of example of humans handling novelty in the wild, it was clear he was idealizing some belief that he holds. He refuses to believe that creativity is mostly interpolation.
Edit: I was wrong. This AI cycle is so dead.
@@mythiq_ what made you change your mind so abruptly?
Train an LLM to solve math but don't include anything related to calculus in the training data.
Then ask the LLM to solve a calculus problem and you'll see it fail.
That's essentially what Francois Chollet was saying.
Isaac Newton was able to introduce calculus based on his FOUNDATION OF MATH (Memory) and actual INTELLIGENCE (Ability to adapt to change)
Francois said the opposite, he actually said that a lot of human math skills rely on memorization too. But actual intelligence to discover/invent new math goes beyond this. This is why even winning a math olympiad would be as meaningless as winning chess, it's old math. Actual intelligence wouldn't win chess but invent the game - without being told so!
@@falklumo That's what I meant with the Isaac Newton sentence.
Making an AI Isaac Newton vs an AI Calculus Student is a nice and simple way to capture what they're trying to do. Making a great AI Calculus Student is an awesome accomplishment, but we really want a Newton.
Newton AI is more ASI than AGI
Most people can't discover calculus, some don't know how to apply it..
Many math teachers I've known say that if you memorize and apply the tricks, at least you'll pass the exams. You won't be great at math, but good enough. Up to some level math is about memory, like chess.
This is by far the best interview until now. We need to hear the skeptics too, not only the super optimists. I really like the french guy.
He's not a skeptic, he's a realist.
A human, with lower than average intelligence, can learn to safely drive a car in a few hours. No-one's created an AI that can safely drive a car on all roads even when trained on all the data available to mankind.
See the problem?
Bigger AI models require more training data in order to improve their performance. In other words, it's the greater volume of data that's improving their performance, not their intelligence (memory, not intelligence). An increase in intelligence would enable the model to improve performance without requiring more training data.
Reading between the lines, François doesn't think that what LLMs do represents intelligence at all.
There was 20mins of:
"Can LLMs replace programmers?"
"No"
"But can they?
"No"
"But can they?
"No"
"But can they?
"No"
"But can they?
"No"
"But can they?
"No"
"But can they?
XD ... it simply becomes clear that LLMs can't replace programmers when you start using them every day in your programming job and realize how bad they perform once you get to even slightly complex logic
They were inventing an ARC puzzle on the fly 😂
This strikes me as moving the goal posts. Chollet doesn't seem to understand how dumb the average person is. Can LLMs replace programmers? Can the average human replace a programmer? Go watch Jerry Springer and then tell me how LLMs won't reach AGI. To the average human, these things already are AGI.
@jasonk125 they are not AGI because they can't perform every task an average human can. Also he was trying to explain they probably can't learn new novel tasks. The things at which LLMs excels are problems that have been solved numerous times and there is a lot of data around those.
Even so. The world and its operations are not interconnected and digitized enough for LLMs to take over.
@@GabrielMatusevich Well if Chollet wants to redefine AGI, and then say LLMs aren't AGI (which is what he does) then I guess there is no point arguing with him.
From his website: Consensus definition of AGI, "a system that can automate the majority of economically valuable work,"
Chollet's definition: "The intelligence of a system is a measure of its skill-acquisition efficiency over a scope of tasks, with respect to priors, experience, and generalization difficulty."
So they should have first come to an agreed upon definition of AGI (which they did not), before arguing about whether LLMs could meet that definition.
Your statement: "they are not AGI because they can't perform every task an average human can" is not arguing within Chollet's definitional framework. It is closer to the consensus framework.
@jasonk125 yea, it's a good point. That reminds me that I don't think there is even actual consensus on a definition of just "intelligence".. which makes it even harder 😆
It is pretty glaring how different Francois's interview is from Ilya's. Maybe part of it is the result of Dwarkesh's polish as an interviewer, where for Ilya it was a series of questions Ilya minimally answered, and here Francois openly expanded upon the problem. But also, from the start the two seemed different. Where Francois maintains all the features of a researcher who values an open exchange of ideas, Ilya values secrecy and the advantages of being a first mover. I definitely enjoyed this interview much more.
Interesting observation that I had not noticed, but perhaps felt to some degree. Smart people share information freely as for them it is abundant whereas people who are less smart hold onto knowledge as it is scarce in their world. I think this is the easiest way to really identify if the person you are talking to is smart or just socially smart.
@@MetaverseAdventures No doubt Francois is generous with his understanding because for researchers understanding functions as a public good, ie it does not decrease in value with greater supply. More so, I think it demonstrates the differences of their respective positions. Ilya has a product he needs to deploy and make profitable, and he needs to secure advantage over the competition. It is intrinsically a much more narrow and concentrated effort. This can lead to good result in the short term, but long term, it’s an approach that tends to become stifling. This is also why Francois laments the shift brought about by OpenAI (which is rather ironic).
I don’t think Ilya wants to be first simply for its own sake. He is worried about the future and would like to steer the ship in the right direction. I don’t know exactly what Ilya saw at OpenAI but after watching what Leopold Ashenbrenner had to say, the departure of Jan Leike over the 20% compute that was promised but never delivered and hearing that Ilya is now starting his own AI company called Safe Super Intelligence I suspect he has a good reason to be worried.
@Matis_747 I do appreciate and understand the position you describe that Ilya is in and thus it colours all his words, but I am still not convinced he is as brilliant as made out. We will see as he started a new company that has no profit motive and thus I anticipate more flowing knowledge from him, especially around safety as that is his focus. He is a smart person no doubt, but there is just something that does not add up for me. Probably just me and my lack of understanding as I am just a user of AI and not a creator of AI. I look forward to seeing where Ilya takes things as maybe he is going to be the most important person in AI history, or fade into obscurity.
So... You're saying that Ilya is... Playing his card close to the vest, only responding to direct questions, almost a black box that you have to creatively prompt to get the right answer... Could he be a... LLM? :)
literally tortures him with "okay but isn't this AGI why is that not AGI?" questions and after not having any positive feedback asks "okay but let's suppose you lost your job to AGI"
To get chollet to concede, you must synthesize new programs. Sampling the database of conversations with LLM hype bros doesn't generalize.
I think these are two separate conversations. Host goes technical, guest goes philosophical on how these LLMs currently lack processes that helps them keep up with conversation the natural way.
I kinda know what Chollet is talking about. In the case of Gemini, it's unable to answer from an existing thread. It goes into a cycle of verification from its database automatically. It could simply mine from past conversations to respond. That would be better in my view: fewer tokens, fewer resources, and more efficient than the current architecture it's in.
Also Gemini can't process an 'on the fly' response either. For example, for breaking news, it won't be notified until hours later.
@@maximumwal underrated comment :)
@@h.c4898 No I don't think so
And in turn Francois keeps redefining what LLMs are doing and what intelligence is. He starts with "LLMs just memorize", then they memorize templates but can't generate new templates, then they generalize but only locally, then they generalize enough to learn new languages that they haven't been exposed to but that's not intelligence either.... sure Francois. Call us when you've discovered AGI
"But if we gave LLMs as many Adderalls as I popped before this interview, would then get AGI?"
"Ok, that may work."
"That was a trick question. I snorted the Adderall."
🤣🤣
Bro's onto something
I could almost smell the Adderall through my screen.
Insufferable 😫
😂
I think Francois really got his point across in the end there. I was leaning somewhat to the scaling hypothesis side; he made me question that more. In any case you have to give him credit for actually coming up with interesting stuff to support his arguments, unlike many other critics.
“Coming up with”, you mean like on the fly? 😂 but seriously though, this guy's a verified genius in his field, he didn't just “come up with it” one morning
What timestamp?
1 million for this is as ridiculous as 1 million for P vs NP, it's a multi trillion dollar problem, it's like offering $1 for someone to find your lost supercar or something
lol absolutely, had the same thought. You could even expand it to "1 million for a discovery that will render capitalism and your reward useless :D"
even worse arguably, since it'd make money obsolete
What? LLM's should be able to do this.
P vs NP would instantly change everything.
Are you implying that we’re nowhere near AGI?
Both of them could be solved by a kid in Africa on a $100 Android smartphone.
You say trillion only because you assume you need power plants and datacenters maybe one more time over the current total infrastructure, but that's just zero imagination. It's like if you asked in the 1500s, the answer would be that you need 100 million horses and millions of km² of woods to burn
What if the answer is that you need a million times the power of the Sun, like an advanced Kardashev 2 civilization?
I love how Francois explains his answers with great patience and a subtle smile in his eyes. What a dude.
Not subtle. More like a stubborn smile…
There is so much great information here.... I'm at 24:32 and Francois was saying "Generality isn't specificity scaled up..." He seems very aware that the current approach to LLMs is bigger, better, more data, and he is right in noting that is not how human intelligence works. We don't need to absorb the whole of human information to be considered intelligent.
Dwarkesh brings up a solid point that no scientist ever "zero shots" their ideas. Francois is partly correct, but he's holding onto some tight beliefs there about creativity, novelty and interpolation.
TL;DR the host was combative in a way that made him come off as a salesman for AI rather than having a conversation about what the guest thinks for the first half of the conversation. also, the host refused to budge on his *belief* that current LLMs are capable of real understanding despite the guest's points to the contrary.
first time seeing this podcast so i don't have a frame of reference for the normal vibes of the host but he seemed extremely defensive. the guest seemed to keep calmly stating that memorization and understanding are two completely different things while the host just kept referring to anecdotes and examples of things that he thinks displays understanding. the example that set my radar off to this was the obscure language dictionary example.
After being shot down the first time by the guest by claiming that the ARC puzzles are a set of tests that it would be very hard to make training data for and if LLM's developed true understanding/adaptive capabilities then they should be able to pass the ARC puzzles easily. the host then tries to bring up the example of the chess models which the guest points out is almost exclusively pattern recognition and memorization and instead of wrestling with that point he moves back to the obscure language point. i think that evasion of the chess point is actually extremely telling. if he truly believed that was a good point, he might have pushed back on it or tried to justify why he brought it up but instead he says "sure we can leave that aside" immediately. maybe I'm being a little cynical. maybe he realized that was actually a bad point for the argument he was trying to make.
regardless, he went back to the obscure language point which may have been impressive if it was not for the rest of this conversation to this point. earlier, the host tried to give an example of a simple word problem that had to do with counting. the guest countered that with all of its training data, it probably was just referencing a word problem that it had before which, from my understanding of how these things work, is probably accurate. the host clearly did not understand this earlier point because the thing about language models that the guest has to point out AGAIN is that the training data probably contains similar information. not necessarily data on that language but to my imagination, the data probably contains a lot of different dictionaries in a lot of different languages. dictionaries on top of having similar formats across most languages also typically have complete sentences, verb conjugations and word classes. i can see how the guest's point about memorization and pattern recognition would apply to LLM's in this aspect.
as i continue watching i am realizing that this has turned into a debate on whether or not LLMs have the capability to understand and process information as well as synthesize new information, which i was not expecting nor did i want. i think it is intuitively understood that current models are not capable of these things. this is why they require so much training data to be useful. there were genuinely good parts of this podcast, but the host insisting that LLMs understand things in the way that humans do was not it.
this is a little nitpicky but there was a point when the host said something like 'let's say in one year a model can solve ARC, do we have AGI?'. to me this comes off as extremely desperate because the most obnoxious part of that question is also the most useless. the timeframe in which this may happen is completely irrelevant to the question. the guest at no point argued anything about timeframes of when he thinks AGI might happen. in fact when the guest answered in the affirmative the conversation took a turn for the better.
finally if you haven't gone and taken the ARC test, i would encourage you to do so because neither the host nor the guest did a very good job explaining what it was. but on the second or third puzzle, i intuitively understood why it would be hard to get our current generation of models to perform well on those tests. they require too much deliberate thought about what you are looking at for the current models to pass. it almost reminded me of the video game "the witness" in its simplicity, with the only clues as to how to solve the puzzles in both games being the context of earlier puzzles.
You summed up my feelings completely
I agree with you. But I think you should still check out some of his other podcast episodes
@@young9534 i probably wont. the host did not really make me want to see more of him. i am kind of tired of being evangelized to about this tech in its current state. i will likely still follow the space and continue to learn more and seek more information. i hope this host does the same honestly. seems like the space is very full with people who want me to either believe that AGI is impossible or AGI is coming next year. i personally dont appreciate either side's dogmatism and will continue to try and find people with a more measured view on this stuff.
You are overly judgmental. He pushed back because the answers felt too abstract. If they felt too abstract to him, they would ALSO feel too abstract to others. There is literally no personal attachment involved. Too many hosts don’t push back nearly enough due to stuff like this.
Thank you for sharing your thoughts. Really helped me distill the conversation.
Dwarkesh's way of talking reminds me of LLMs.
Hearing François reminds me of what humans are.
Makes sense because it seems like dwarkesh is much more knowledgeable
Quite the opposite effect on me. François felt like a calm android repeating "arc puzzle" and his beliefs about "novelty", like he has all the answers. Dwarkesh captures the frenzy of the puzzling human experience.
Dwarkesh is right though. Experience is enough. Chollet is just wrong, even in what he thinks LLMs are doing. LLMs do generalize. Patel didn't make the correct arguments.
I must say I don’t really like the way he keeps interrupting François during the interview
exactly. like what's the point of asking questions if you don't wanna hear the answer.
dwarkesh got that journalist mindset: "i only want to hear a certain answer, not hear what they want to say"
Great interview. I've personally done about 50 of the online version of the ARC challenge and the gist of solving them is simply to recognize the basic rules that are used to solve the examples and apply that same rule to get the answer. While some are challenging, most use basic rules such as symmetry, contained or not contained in, change in color, or rotation; or a combo of more than one rule. I'm sure that current large LLMs like GPT-4 have internalized these basic rules in order to answer questions so proficiently. What is perplexing to me is why LLMs can't extract those rules and apply them to get more than 90% on any ARC challenge. I think that is the crux of the matter that Francois is getting at. If solving any ARC challenge basically requires one to identify the simple rules in an example and then apply those rules, why are LLMs not crushing it?
Because LLMs - once trained - don't extract rules from input data and do another step of applying those rules. That would be precisely the "synthesizing" step that Chollet talked about. LLMs just ingest the input and vomit out the most likely output. The human equivalent is a gut-feel reaction (what we call intuition) without attempt of reasoning.
Because they can't generalize from 2 examples the rules like humans do.
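The "recognize the rule, then apply it" loop described a couple of comments up is easy to sketch in code, which also makes the gap visible: the rule has to be inferred from the example pairs and then executed on a new grid, not just pattern-matched. This is only a toy illustration I'm adding; the grids are made up and the tiny hand-picked rule library (flip_horizontal, recolor) is nothing like the open-ended program search real ARC tasks demand.

```python
# A minimal sketch of "infer the rule from examples, then apply it".
# Grids are lists of lists of ints (colors); rules are plain functions.

def flip_horizontal(grid):
    # Mirror each row left-to-right.
    return [list(reversed(row)) for row in grid]

def recolor(grid, src=1, dst=2):
    # Replace one color with another everywhere.
    return [[dst if cell == src else cell for cell in row] for row in grid]

CANDIDATE_RULES = {
    "flip_horizontal": flip_horizontal,
    "recolor_1_to_2": lambda g: recolor(g, 1, 2),
}

def infer_rule(examples):
    # Keep the first candidate rule consistent with every (input, output) pair.
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(inp) == out for inp, out in examples):
            return name, rule
    return None, None

if __name__ == "__main__":
    examples = [
        ([[1, 0], [0, 1]], [[0, 1], [1, 0]]),   # flipped horizontally
        ([[1, 1, 0]], [[0, 1, 1]]),
    ]
    name, rule = infer_rule(examples)
    print(name)               # -> flip_horizontal
    print(rule([[0, 1, 1]]))  # apply the inferred rule to an unseen grid
```

The hard part of ARC is that the space of possible rules is unbounded, so the fixed menu above has to be replaced by something that composes new rules on the fly, which is exactly the program synthesis Chollet keeps pointing at.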
I wish this went for longer than an hour, it's refreshing to hear LLM skeptics to balance my view of AI. Yann next?
Not sure they would have really gotten any further with more time. I'm 40 minutes in and the conversation basically seems to go in circles. Dwarkesh: "but this and this behavior by LLMs could be interpreted as intelligence, couldn't it?", Francois: "if that were true, then they would be able to perform well on ARC".
@@QuantPhilosopher89 I think it's good because honestly I know a lot of people like Dwarkesh. I obviously have very different metaphysical presuppositions than most people, so being able to find someone who is able to push back against LLM hype in a way that's understandable and reasonable is nice.
@@QuantPhilosopher89 well, if you stopped there then you missed some good insights
@@QuantPhilosopher89 Francois would have had a lot to say about program synthesis I would have liked to hear though ...
Francois is no LLM sceptic. He sees the value of LLM scale. He's only saying the obvious: LLMs cannot become AGI, only be a part of it.
This one line from Francois summed up his argument - generality is not specificity at scale.
And that argument relies on the most complex and inclusive definition of memorisation when evaluating whether LLMs are just memorizing, and then the simplest definition of memorization when evaluating the usefulness of memorization.
Well, it took six months. Six months from the publishing of this video until an LLM-based model scored > 85% (OpenAI's o3).
under their conditions it scored 75, which is still insane to me
28:55 When the point in your day where you first need to synthesize an entirely new template is while interviewing Francois Chollet.
francois is about the only mainstream AI researcher i follow, all the other dorks saying scale is all we need should take a class in critical thinking.
Would love to see a conversation between Ilya and Francois
I think Ilya's understanding is deeper.
@@netscrooge because?
@@jameshuddle4712 His conceptual framework isn't merely sophisticated; it has closer ties to reality.
@@netscrooge Thank you for that insight. I assumed, considering his roots, that he was simply part of the LLM crowd. Now I will listen with fresh ears.
@@jameshuddle4712 I'm mostly going by what comes out of his own mouth. But if you can find the right interview, we can also hear what Hinton says about working with Ilya when he was his student, the startling way his mind could leap ahead.
Nice to get a little return to earth with this one, pie in the sky talk about AGI is fun but the difference between that and LLMs still seems pretty huge.
The thing about exponential curves is that great distances can be crossed in surprisingly little time. Which is most likely what is going to happen. We're not as far as you think, or as Chollet is making it seem.
@@therainman7777 You know that if AI systems were to increase in performance every year by only 1% of each previous year, that would still be considered exponential growth.
@@10ahm01 Yes, and after a sufficient number of years each 1% increment would represent a huge increase in capabilities, just as a 1% return on a trillion dollars is 10 billion dollars.
Also, on an empirical level, we can actually estimate the % increase per year in technological progress using a number of different metrics, and it is nowhere near 1%. It is far, far larger than that. Moore’s law, to give just one example, equates to roughly a 40% increase per year. And many metrics relevant to AI development, such as GPU compute, are increasing even faster than that. So your point about 1% per year increases is also irrelevant for this reason.
Lastly, this is not an exponential trajectory that start two or three years ago; it started decades ago. Which means the absolute increments of progress per annum at this point is quite large.
source?
@@therainman7777 a sigmoid function looks like an exponential in the beginning.
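For anyone wanting to check the figures traded in this thread, here is a quick back-of-the-envelope calculation (added for illustration, not from any commenter): a doubling every two years compounds to roughly 41% per year, while 1% per year really is exponential but takes decades to add up to much.

```python
# Rough growth-rate arithmetic for the thread above (illustrative only).

# "Doubling every two years" (the usual Moore's law phrasing) as an annual rate:
annual = 2 ** (1 / 2) - 1
print(f"{annual:.1%} per year")              # -> 41.4% per year

# 1% compound growth is exponential, but slow on human timescales:
print(f"{1.01 ** 50:.2f}x after 50 years")   # -> 1.64x
# 40% compound growth over the same window is a different universe:
print(f"{1.40 ** 50:.2e}x after 50 years")   # -> roughly 2e+07x
```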
This has quickly become my goto podcast.. thanks Dwarkesh!
This is the most constructive debate I have watched on AGI to be honest. Bravo Patel for asking the right questions to Francois. Definitely makes me think more deeply about all of it
Thanks for this! Most interesting conversation on LLMs I've heard for a long time. I think programme memorisation vs novel programme creation is an important distinction. I can personally buy the idea that we mostly rely on programme memorisation in daily life, but we clearly rely on novel programme creation happening at some point! But unsure on the degree to which that happens within individual brains vs happening out of collective processes e.g. cultural evolution etc
That was very well thought out. And, if it is a collective process, LLMs can now be an intrinsic part of it. So well done. Vygotsky would agree.
ARC is a great milestone for REAL research, in part because it's immune to brute force efforts where progress can be faked (all results are not equal). The prize money might get people to try things out, but at its core, a "working solution" signals the existence of new technology (and potential to leap over $100M models). Posting "winning numbers" is less about prize money and more about $xxM in VC funding.
We shouldn't be surprised though that whatever AI solves this will still get accusations of having brute forced it... moving of AGI goalposts and all.
@@PhilippLenssen Why would such accusations necessarily be wrong? This topic was mentioned during the podcast, and the fact that the test isn't perfect was conceded. Why do you think it is a perfect test, when the creator doesn't?
@@PhilippLenssen That will be a problem with any test, but a system that can form new behaviors to understand/solve problems will be able to, as a single system, take on all challenges, even new ones, without having to go thru a costly training process. Even when we get something that is human-level, people will still question how much is AI, and how much is hidden offshore human help. Only "the real thing" will have staying power.
@@VACatholic A lot of benchmarks are good, as long as you are going at them honestly. Long before adding language abilities, a system should be human-level on all games (that don't involve language).
Here is the correct way to push back on the "LLM scale still gets there" thing: having a set of all "solution patterns" stored doesn't do anything; it's the human, doing the prompting, that connects the stored pattern with what it needs to be applied to. With ARC, no one gets to see the test data, so any system has to operate at a level where what it can do is searched based on what it can see. And the key aspect of ARC/AGI is a system that creates its own internal solution based on unseen challenges (i.e., discovers novel solutions and saves them).
Thanks Francois for the reminder that we can't just scale our way to mastering intelligence; you can't memorize everything. I took this approach in college and it ultimately fails.
yep. need true UNDERSTANDING to solve NEW PROBLEMS
LLMs are not just about memorization
@@JumpDiffusion and GPTs are not just LLMs
@@JumpDiffusion .. That is literally what they are about. They just have seen a LOT.
@@JumpDiffusion Exactly. Why do they keep parroting the memorization bit. François knows better than to say that there's some copy of the code that the LLMs memorized.
Finally Dwarkesh got hold of a guest who talks sense out of this LLM madness. LLM will get nowhere near AGI, multimodal or not.
But DNNs will, and LLMs are built on them
@@eprd313 It's not just an architectural issue. The whole epistemological foundation of the prevailing approach to AGI is shaky as hell ( th-cam.com/video/IeY8QaMsYqY/w-d-xo.html )
@@maloxi1472 that video aged like spilled milk. All the progress made in the last year contradicts its claims and the distance from AGI is now more of a matter of hardware than software, a distance that AI itself is helping us cover as it's designing microchips more efficiently than any human could.
I can't believe DP was arguing that the Caesar cipher of any N is not enough to close the case on this issue.
Literally, dude. I was neutral on the matter but that convinced me
Your podcasts are absolutely fantastic! I always eagerly anticipate each new episode and have learned so much from your content. Thank you for your hard work. This episode especially was inspiring and it gave me so many ideas to try and think about.
This is the first AI talk by Dwarkesh I actually enjoy.
Damn, I haven't even finished Leopold yet and you're hitting me with Francois Chollet!?
Not complaining, though.
❤ same here :)
Also same. Leopold is 4+ hours though...
Same. I couldn't take Leopold's effective altruism based CCP fearmongering.
The perfect ever-changing ARC puzzle set already exists in the dynamic environment of driving a vehicle. This is a test that can't be solved with yet more examples, because there is always something unique happening that throws the self-driving program into disarray, with the result that a Tesla driver's head gets ripped off or the car idiotically smashes into the rear of a stationary emergency vehicle. If AI could become AGI through bigger data sets and more memory, then we'd already have flawless self-driving cars, robotaxis and AGI. We don't. Not even close. I think Chollet has highlighted the missing piece of the AI puzzle, pun intended.
Hey you got yourself an actual expert about the subject. Thanks! 🙏
I appreciate and very much enjoy these podcasts. I also fully understand the need to play devils advocate. However, to me this felt a lot more biased than most of the other episodes. It's clear which position Dwarkesh has chosen. That's fine, but it really shines through when someone who is not an LLM maximalist is on the podcast.
Devils advocate? Yes, always do that. Extreme bias where it becomes waiting for your turn to speak over a discussion? Not ideal in my opinion.
I hope if he sees this he doesn't take it personally. Obviously he's very excited about this tech. Most tech folks are either excited or at the very least quite impressed with the advances that have been made over the last few years. I just hope the quality of discussion remains consistent regardless of who is the guest.
Thank you, someone called it out: an LLM will not achieve AGI. It's like building faster cars to achieve flying cars.
Plenty of cars have flown. Just not controllably 🤣
Who's building LLM anyways? They're old by now
@@mint-o5497 it fundamentally cannot come up with something someone hasn’t already. That’s the issue. They are two different problem sets.
@@jimbojimbo6873 Please define "come up with something someone hasn't already" and tell me, have you actually ever done this - and what did you come up with?
@@everettvitols5690 i've made a dish I've never seen in a recipe book or online before. If we entertain hyperbole, then something as grand as self-driving cars that meet regulatory requirements is something we haven't conceived; if it can do something that's truly innovative then I'd consider it AGI. i don't see how LLMs will ever achieve that.
I learned a lot from this. This was so cool! Thanks Dwarkesh
I would totally invest in Chollet's research. He has a tons of insight and clarity.
I have ideas, but I don't have the background to do the work myself - my background is philosophy. I'd love to participate in this challenge, but it would take me years to embed myself in the academic institutions.
I knew you would bring Francois on the show one of these days. Thanks for making it be today! 🎉❤
Francois was very patient in facing the barrage of questions from the LLM champion. I think a conversation with Jeff Hawkins would be very beneficial to understand, in a more general way, why some aspects of intelligence are missing in current deep learning.
LLMs are arguably the most significant tech bubble in human history. The gap between public expectations and their actual capabilities is insane.
but, people use them? every day? this is very different compared to shitcoins, which was much frothier, cumulatively
@@ZachMeador People's expectations far surpass their uses. ChatGPT is nice but it isn't going to solve the Riemann Hypothesis.
When you don't understand the power of DNNs and/or don't know how to use them...
@@ZachMeador We have to see who the users are; we see that the majority were students who wanted to falsify their work, but really the numbers of users are not significantly high. We must also take into account that it is possible they even inflate the user base thanks to the mandatory implementation of LLMs in internet browsers and operating systems
@@eprd313 We can know that DNN's are probabilistic adjusters, they can help us find the most likely possible answers, but they are not intelligent, nor is a search engine, in fact their implementation in search engines has been catastrophic, especially for Google where they prioritize the answers from Reddit, whether or not these are true does not matter
I'm so glad Dwarkesh is doing these interviews. He asks all the key questions. Unlike another science/tech podcaster who shall remain unnamed.
Voldemort asks all the right questions, but rarely more than two in a row, before he rests his throwing arm while firing off a bunch of nerf balls. His best probes are episodic rather than sustained. This stuff is simply too hard to command on an episodic basis.
People are actually much smarter on average than one tends to give them credit for. It's just that we are very very reluctant to use System II. We'll do literally everything else before deploying the full power. But if one's life depended on it or there was sufficient incentive, we can be extremely fast learners. We just naturally try not to get challenged this way in everyday life.
Even though I agree with what you're saying- one of the things researchers found that exists as a general difference between persons widely separated along the I.Q. spectrum was that the glucose uptake & thermal output in brains of lower I.Q. people were much greater than those on the higher. This indicates that a more generally intelligent mind is both thermally and resource efficient: expending less fuel and generating less waste per unit of output. What this points to is that some people can activate system 2 with considerably less cognitive burden. Since most of us are pleasure-maximising, instinctually and natively, and since it's distinctly unpleasurable to be in the uncomfortable state of mental strain/discomfort associated with glucose starvation or one's brain overheating, one might expect that behavioural inclination follows from ability. In the same way that a natural weakling doesn't enjoy lifting weights, and so avoids it, an intellectual weakling doesn't enjoy activating system II, and so avoids it. The fundamental reason is the same in both cases: we avoid that for which we lack rewarding feedback (relative to peers) and which relatively strains us (relative to peers).
The fact that anyone _can_ activate system II means simply that everyone has and utilises a general form of intelligence. However, the fact that people _don't_ do this suggests that they have a relative deficit in system II (or rather in its utilisation) which explains this avoidant tendency, while simultaneously pointing to the degrees of difference in the general intelligence of people.
Dwarkesh seems to be drinking the omnipotent-LLM Kool-Aid, saying that LLMs can do everything a human can do. Even Ilya admits the limitations
Yep! I think he just can't get past his preconceived notions and keeps banging on about something that was explained to him in the first few minutes.
It might not be fair to say an LLM needs millions of training examples but a young child doesn't. By the time a child is old enough to solve these puzzles, they have 'trained' far more than that through interaction with the world. A more accurate comparison would be an untrained LLM vs. a baby.
LOL, I should have watched the rest of the video before posting!
@@allanc3945 And even evolutionary upbringing encodes priors in the brain, like symmetries and basic geometry, hence, human « training » starts far prior to being born
Lol I screenshotted the problem at 7:24 and asked ChatGPT. While the image it generated was completely off, its answer was almost there.
I sent it the puzzle and asked "what would the 8th image look like?"
It replied
"[...] Considering the established pattern, the 8th image should show a continuation of the diagonal line, turning green at the intersection points with the red border. Therefore, the next step would involve the following arrangement:
The blue diagonal line continues towards the right border.
Any part of the diagonal line intersecting the red boundary turns green.
So, the 8th image should depict a continuation of the diagonal line, transforming blue cells into green ones at the boundary intersection. [...]"
So OK, it didn't get the perfect answer because our line becomes green even before meeting the wall. But it's pretty damn close. GPT 5 should smash these puzzles.
Over the past few months I've tried multiple IQ-style tests on Gemini 1.5, Claude 3 Opus and recently GPT-4o. I've noticed that on the exercises related to "sequential selection", where the model should guess the logic of the sequence order and select among multiple other elements to complete it, performance is very inconsistent. There's one test I extracted from a logic test with geometric rules where each step gradually increases the complexity: GPT-4o succeeded on 4/10, but it got some right in an incoherent order, as if the model wasn't actually reasoning, and on 6/10 it failed with hallucinations at the end of the rationales. There were more complex ones it got right while it failed on simpler ones, similarly to Claude 3 Opus and Gemini 1.5. My conclusion is that these models don't logically reason despite Visual CoT prompting and high-resolution images for the tests; they generalize over multiple similar training samples, and they can't logically reflect the way we do.
gpt5 is gonna blow past his expectations, coming back to this video when it does will be so fun
@@victormustin2547 He's still right, no AGI will come out of LLMs though. Just read some research papers about "grokking" and "platonic representation" to understand the insurmountable plateau of LLMs
@@TheRealUsername In the face of what's going on right now (LLMs get better and better) I'm not going to believe in a hypothetical plateau until I see it!
@@victormustin2547 He was very happy to respond to lots of the points with "that's an empirical question, we'll see very soon", and has been very explicit about what would prove him wrong (a pure LLM which beats ARC without massive ARC-specific fine tuning), and has even offered a cash prize to incentivize people solving this problem.
This is all great behaviour and should be encouraged. It's exactly how disagreements should be handled! If he does get proven wrong and graciously concedes, then he will deserve to be commended for that.
It's weird that you'd take pleasure from someone getting proven wrong when they were so open to having their mind changed in the first place. That's not great behaviour. It's that kind of attitude that makes people defensive, so they become entrenched in their positions and refuse to listen to other points of view.
You should celebrate when people admit they're wrong and change their mind, but you shouldn't gloat about it.
Always worth pointing out that LLMs require a server farm with the energy requirements of a small state, whereas the human brain runs pretty effectively on a bowl of Cheerios. I think more people should think about this!
While this is true, I think it misses the point of the eventual advantage of deep learning systems. Human brains are fixed in size right now, mostly due to the evolutionary pressure of the size of the birth canal. Even if deep learning is multiple orders of magnitude less data and compute efficient than human brains (excluding the horrible compute efficiency of evolution to get us where we are), we can still scale the models to run on ever more power hungry data centers to surpass human brains. At the same time we can do this, our algorithmic and data sample efficiency gets better too, improving the ceiling that we can achieve.
@@CyberKyle all of that advancement will lead to great things, but at its core an LLM cannot achieve AGI. Also keep in mind these models are not even scratching the surface of the brain's capability to apply intuition, rationality, morality, and many other things that contribute to decision making beyond simple data processing.
Not for inference. Inference can be done on a single (sufficiently large) GPU.
It's only the training of LLMs that requires massive server farms.
@@Martinit0 that’s not true, the operations needed for inference can be sharded across many nodes just like training. It’s just that training requires a ton of forward passes to see what the model outputs before you backpropagate the errors, so it requires large clusters to complete training in a reasonable timeframe. It is conceivable that you could make a ginormous model with many trillions of parameters that you’d shard across many GPUs.
@@CyberKyle Although the capabilities will possibly reach AGI even without radical efficiency improvements, AI will always be greatly limited in its impact until the energy efficiency problem is solved. Most likely there needs to be a fundamental change in the hardware architecture AI runs on, something that can be computationally sparse and reduce memory-transfer costs, possibly by combining memory and processing into a unified "processing-in-memory" (PIM) architecture, like neuromorphic computing.
It seems like the ARC thing might be difficult for LLMs because it relies on visual symmetry that wouldn't be preserved through tokenization? I'm sure it's not that simple, because then natively visual models would probably be solving it easily. But still, there should be a version of this test with complete parity between what the human is working with and what the LLM is working with, i.e. already-tokenized text data.
An LLM can easily transform the JSON test files to ASCII art and still doesn't solve it.
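For anyone who wants to try it, a minimal sketch of that conversion (assuming the public ARC task JSON format with "train"/"test" pairs and grids of integer color codes; the file name is made up):

import json

# Load one ARC task and print its grids as ASCII, so the same information a human
# sees visually can be pasted into an LLM prompt as plain text.
with open("arc_task.json") as f:
    task = json.load(f)

def render(grid):
    # Each cell is an integer color code 0-9; printing digits keeps the pattern visible.
    return "\n".join("".join(str(cell) for cell in row) for row in grid)

for i, pair in enumerate(task["train"]):
    print(f"--- example {i} input ---")
    print(render(pair["input"]))
    print(f"--- example {i} output ---")
    print(render(pair["output"]))

print("--- test input ---")
print(render(task["test"][0]["input"]))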
Chollet addressed this objection in the video by pointing out that LLMs actually do quite well on these kinds of simple visual puzzles, when the puzzles are very similar to puzzles they've been trained on. So this can't be the answer.
We'll find out soon enough how well the multi-modal ones do.
New ideas are really just combinations of existing ideas. As such, LLMs can indeed create new things that are not in training data. Check my channel for the video "Can AI Create New Ideas?" for more details and examples of this. That aside, ARC is a fascinating benchmark that tests something entirely different: advanced few-shot pattern recognition. This feels like a powerful and important architectural component for future AGI systems, but I would not label it as "AGI" on its own.
One of the best episodes, love the challenging conversation
Great job Dwarkesh. Always v interesting videos.
Francois Chollet is a hero for assigning these limitations to compute in the evaluation for the ARC prize. Efficiency of inference is the thing.
In some respects, parts of this discussion sounded more like arguing than interviewing.
I don't get all these comments saying there were a lot of repeated questions. The way I see it, the subject was interesting and "tricky" enough to be worth talking about in depth like you did here. Yes, it might seem like the same question gets repeated, but Chollet's answers and explanations were super interesting and each time gave us a new way to look at it. Nice interview.
Great interview. Good to have a different, well-thought-out perspective on AI. I appreciated that Dwarkesh was willing to press Chollet on his claims, in particular trying to nail him down on what exactly counts as generalizing beyond the data set. It seems that he didn't really have a good answer apart from "doing well on ARC". I still think he overestimates the extent to which average humans are able to do this, and underestimates the extent to which transformers are able to do this. Also, going from 0% on ARC to 35% in a few years seems like a lot of progress to me, so I'm really surprised he didn't think so. I would bet that the next generation of multimodal models gets to 50-60% and that we get to 90% by 2030.
Chollet's claim about human intelligence being unique is weak. Even ants have demonstrated the ability to respond to novelty. In the end, we're all neural networks. Stop deifying human intelligence. Bottom line: Chollet is expressing a form of religion, not science.
What a great initiative! I'm so grateful to Dwarkesh for giving this publicity!
What is the name of the researcher they are talking about with test-time finetuning (mentioned by Dwarkesh at 14min mark)? It sounds like “Jack Cole”?
It's such a fascinating discussion! It definitely deserves more views! I may disagree with Francois Chollet about LLM potential in general, but I must admit that his approach is extremely refreshing and novel. We need more people with nontrivial ideas to build true AGI, because just scaling LLMs is a risky approach: if we fail, a new great AI winter is waiting for us.
I am not so sure that humans do much more than pattern recognition to solve the ARC problems. When I look at them I very quickly recognize patterns I saw somewhere else. Our brain has been trained on millions of image patterns by evolution.
Exactly - and that’s the point Dwarkesh was challenging Chollet to address, but Chollet refused.
One of the best AI talks I've seen in recent months, thumbs up for Francois Chollet and Mike Knoop. Also thanks to Dwarkesh Patel for bringing them on :)
The thing is that LLMs can only do correlation, not cognitive abstraction. Remember that they are probabilistic models.
How is the brain not interpolating or probabilistic? The only additions the brain has are qualia and input from the CNS, and how significant they are for generalized AGI is unclear yet. For reference: Oxford's Shamil Chandaria's lectures on the Bayesian brain.
@@mythiq_ It goes beyond interpolation and probabilistic solutions. Our brains are fundamentally able to abstract concepts from very few data points, proving that we are very sample efficient even when exposed to an entirely new set of data. LLMs are just really fancy estimators, capable of understanding the semantics of a given problem and generating an outcome based on similar problems they've faced before. That semantic understanding of a problem enables interpolation. It does not abstract the given problem and then deal with it with its own sense of understanding.
@@calvinjames8357 The human brain receives 11 million bits of information per second. Multiply that by 12 years, as a generous lower limit for when humans start doing very useful intelligence, and we are talking about 520 terabytes of data. Several times what LLMs receive. And that's not counting the data received during evolution. Now sure, a lot of this data is redundant and mostly noise, but so is most LLM pretraining data.
A baby takes months to be able to do anything besides its preprogrammed instincts like sleeping, nursing and crying. And even then it is something simple like picking up toys; arguably also instinctual.
Arguably almost nothing great, innovative, creative or intelligent is done by humans before their mid-20s. Early in a human's 23rd year is when they have received 1 petabyte of information from their senses.
To say humans only get a few data points and then infer smart and novel ideas is just plainly false. We only do that after we have been exposed to terabytes of information about the world around us, and we only do it well after petabytes.
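For what it's worth, the arithmetic above roughly checks out if you take the 11-million-bits-per-second figure at face value; a quick sanity check:

# Quick check of the sensory-bandwidth numbers in the comment above.
BITS_PER_SECOND = 11e6                      # the claimed input rate to the brain
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def terabytes_by_age(years):
    bits = BITS_PER_SECOND * SECONDS_PER_YEAR * years
    return bits / 8 / 1e12                  # bits -> bytes -> terabytes

print(round(terabytes_by_age(12)))   # ~520 TB by age 12
print(round(terabytes_by_age(23)))   # ~1000 TB, i.e. about a petabyte by age 23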
I love the stark contrast to the last episode! Very interesting guests!
First I was glad that I could solve the ARC puzzles myself (apparently I am not a bot), but then I wondered: how would someone do who has never seen these kinds of puzzles? Is the difference between humans and LLMs just that we are able to generalize from fewer samples across a broader field?
The unique part is that humans can use their generalizations to predict/understand the world. It is not about data storage.
EXCELLENT EPISODE! These types of counter arguments against the LLM hype are SUPER IMPORTANT in this public debate, and both Francois and Dwarkesh made great points for both sides of the debate! The pushback from Dwarkesh was excellent, but we need that type of pushback against the proponents of scale = AGI as well.
Dwarkesh, don't listen to the comments, you did extremely well in this interview, much better than Lex when he challenges his guests. Well done and continue this line!
It’s a very engaging conversation, clearly the host is very passionate about the topic and excited to converse
I am beyond fully convinced that this is the best podcast on the whole of the global internet.
Thanks!
The section of the video from "Skill and Intelligence" to the end of "Future of AI Progress" made for such a compelling and engrossing watch. I must convey huge thanks to Dwarkesh for pressing on the matter: with such a vast grey area between "memorization" and "intelligence" (AGI), given the present state of research, his pressing allowed Francois to give out so many priceless points that can rarely be found elsewhere. Francois's insight and experience, coupled with Dwarkesh's active back-and-forth questioning, gave some insights which are extremely valuable, at least for someone like myself. And I have to commend his impeccable presence of mind, as at 47:58. If it had not been this active, it would have been a bland conversation (podcast). The questions helped draw out so many brilliant viewpoints from Francois.
Those commenting on Dwarkesh's fast speaking style and Francois's slow-paced speaking: it is their individual style. Francois spoke at the same pace with Lex, and Lex himself is a slow-spoken man. It is one's individuality. With that same fast-paced speaking, Dwarkesh made the conversation with Sholto and Trenton very engaging, worthwhile and enlightening, as he did this one. And if I am not wrong, Francois wouldn't speak this slowly if he were speaking French instead.
At the very beginning of this interview, Dwarkesh said that he has had many guests who were strong proponents of LLMs and their scalability, which he found a bit "contrived." So his attempt here is clearly to use those same arguments, along with some of his own viewpoints, and put the questions to Francois to give us, the listeners and learners, better and clearer insight into the "bottlenecks" of LLMs that Francois perceives, given their massive "accomplishments" over these last 4-5 years, and into what different mechanisms or tests there are to get closer to "true AGI." This was the backdrop of his questioning, and he created a brilliant premise for such a learned and esteemed guest of the field.
Had he not presented a case for LLMs while Francois presented a case "against" them, it would have been a one-sided talk, just asking questions about ARC. How else would we have come to know why LLMs haven't fared as well on ARC these past 5 years while doing tremendously well on other benchmarks? How else would we have gotten so many precious inputs from Francois?
I bow with respect to Francois for understanding the reasoning behind the premise set for this interview, and to Dwarkesh for his brilliant questions, "hard-presses" on so many important points, and for his presence of mind. I got to learn SO MUCH from this.
Dwarkesh sure hates code monkeys.
Maybe he got bitten by one as a kid
@@wookiee_1 He is right though. Why else would the majority of US coders hired in the last 5 years come out of 12-week bootcamps?
@@wookiee_1 CS degrees from the US in 2021: about 100,000, vs bootcamp grads: about 200,000. Either way, I agree with you that coding != software engineering. But most "coders" claim to be engineers. My point was that there aren't that many software engineers needed per company; on average, most "developers" do work that wouldn't be considered "knowledge work".
This completely makes sense and has fully brought me back down to earth regarding LLMs 😂
More interviews about the importance of system 2 thinking would be awesome, for instance John Carmack (of Doom and MetaAI fame) is also working on this... Your channel is becoming so popular it could easily lead to a technical breakthrough at this point
The importance of System-2 thinking is now a trivial fact for everybody working on AGI. But this channel helps to popularize this.
@@falklumo If that's true, why did Dwarkesh's ML friends who are working on AGI not know about the ARC benchmark, and why were they surprised that the frontier models failed?
@@jeroenbauwens1986 The German guy (interviewed in the immediately previous podcast) is young (inexperienced) and not an ML researcher. He stated that he was extrapolating straight lines derived from existing data. DP also recently interviewed a pair of guys about LLMs who had only been in ML for a year, i.e. inexperienced.
@@bazstraight8797 So your point is that people like John Schulman and Ilya Sutskever are more knowledgeable... I wouldn't be too sure they know about ARC though; Ilya has said in the past that scaling is essentially all you need. It sounds like this might be a blind spot at all of these companies. I guess Dwarkesh missed the opportunity to ask them.
Thanks Dwarkesh for giving Francois some more reach.
I'd recommend watching Francois' 2nd Lex interview for more context on his thinking.
He is young and excited with all the knowledge, give him time. Amazing conversation.
Kudos to this amazing discussion. At 1:00:03 Dwarkesh and Francois finally arrived at the consensus that it's not just scaling that is required; a low-hanging fruit is upcoming architectural improvements. I am not sure if these will be brought about by LLMs or by some other architecture, like the discrete program search system Francois alluded to at 52:39.
I don't understand why the hyperscale people are so stubborn… It is obvious this architecture does not possess the ability to reason… Let's scale it up and use it, but let's also spend equal resources on finding new architectures…
But it does reason... otherwise we would be stuck with GPT 2 level intelligence
Lol. Have you ever tried paid models?
“I don’t understand…”. Yeah, we see that…
The channel definitely deserves more subscribers) Extremely interesting discussion, and I'm looking forward to new interviews.
I think it's not hard to understand his point: ask GPT to create a novel architecture for AGI and it can't, because it can't come up with an actually novel thing, only a mix of existing ones that looks "novel" but really isn't.
That's not the issue. The issue is that GPT can't think about the problem. Even humans come up with new ideas by recombining existing bits and pieces. Look at Jules Verne's "rocket" to the moon which was a big big cannon ball ...
Not actually true.
@@falklumo Okay, but combining things to make a new thing is not the same as actually proposing a completely novel idea, right? What I understood from his claim was that current models use what they already know and combine it, but they can't take a broader look and redefine, let's say, quantum mechanics with a truly novel approach. Isn't it like that? Maybe I got that wrong, IDK 🤔
I might have missed something then; I just thought that's what this test was all about: performing on completely unseen problems that require you to abstract away from the details, look at the bigger picture, and try to find the underlying logic. I am not an expert on any of these things, I just thought that was it.
This is really insightful from Francois: great articulation of memorization and benchmark gaming, true generalization, intelligence, and a good working definition of intelligence from Piaget. I recall the other Frenchman, Yann, quoting him once too. I'd love to see the day when Google / Francois's team creates a System 2-based combinatorial (discrete) search engine that can do program synthesis on the fly with an LLM embedded in it.
Dwarkesh is totally right pressing on the critical claim that humans do more than template matching. Chollet couldn't give a single example. The question was very easy to understand.
You think the discovery of quantum mechanics and general relativity was template matching. You know nothing about the capability of the human mind.
@@power9k470 there is a structure to this world that we seem to be able to partially observe. We create models of this structure. Whatever model we create - the model structure can be understood as a template. And many theories share similarities. One could think of the application of modus ponens/ deduction as one such template.
@@power9k470 And it's also not about who is right. I am not sure. The problem is just that Chollet claimed there are many examples but couldn't name a single one (besides his test).
@@vincentcremer4235 Chollet probably could not because he is a CS and ML guy. These subjects do not have profound paradigm shifts. This question is suitable for physicists and mathematicians.
Think about art. Generative AI cannot create art in new, never seen styles. If you trained a generative art AI on all the paintings up until 1850, do you think it could develop new styles as original as impressionism or surrealism or comic art or graffiti? If you trained a generative music AI only on music recorded before 1960, could it come up with new styles as original as punk or hip-hop or EDM?
Art is constantly evolving. Sure, individual artists do much that is derivative. But the fact that art evolves demonstrates a human capability that cannot possibly be a form of template-matching.
"You cannot navigate your life using memorisation" - Chollet 29:05
My take: We need some form of memory matching for decision making. We definitely do store something in memory, but not the exact memory of the event, but in different abstraction spaces.
The benefit of different abstraction spaces is that you can mix and match various modular blocks together, and easily construct something new with this. So, even if you have not encountered a situation before, you can have some idea of what to do based on something similar that you have encountered before.
And the thing about human memory is that it need not even be an exact replica of what you have experienced - it can change based on retrieval, but as long as it is still relevant to the context, at least you are starting from something.
So, memory is definitely the way we do decision making.
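A toy way to picture that kind of matching against abstracted memories (purely illustrative; the vectors just stand in for whatever abstraction spaces the brain actually uses):

import numpy as np

# Toy "memory matching": store abstracted situations as vectors in some abstraction
# space, then retrieve the closest stored memory when a new situation comes in.
memories = {
    "crossing a busy road": np.array([0.9, 0.1, 0.0]),
    "cooking a new recipe": np.array([0.1, 0.8, 0.3]),
    "negotiating a price":  np.array([0.2, 0.2, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest_memory(situation):
    # A never-seen situation still lands near whichever stored abstraction it resembles.
    return max(memories, key=lambda name: cosine(memories[name], situation))

print(closest_memory(np.array([0.85, 0.2, 0.1])))   # -> "crossing a busy road"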
he talks about the importance of synthesising programs on the fly, using pieces of other programs and reassembling them for the task at hand. isn't that process just another form of a program, which could be incorporated into an LLM just like the various existing programs he claims don't represent intelligence?
Having a program capable of creating programs to solve specific tasks and having a large number of available specific programs are not the same thing!
@@perer005 i'm not saying they're the same thing. just that they're both technically programs potentially able to be embedded into an AI. it's not infeasible that there are patterns within the logic used to piece together a new program which the AI could learn from and improve with.
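As a concrete picture of what a "program that assembles programs" can look like, here's a minimal brute-force search sketch over a few made-up grid primitives (just an illustration of the idea, not Chollet's actual method):

from itertools import product

# Minimal sketch of discrete program search: enumerate short compositions of a few
# hand-written grid primitives until one maps every example input to its output.
def identity(g): return g
def flip_h(g):   return [row[::-1] for row in g]
def flip_v(g):   return g[::-1]
def rotate_cw(g): return [list(row) for row in zip(*g[::-1])]

PRIMITIVES = [identity, flip_h, flip_v, rotate_cw]

def search(examples, max_depth=3):
    for depth in range(1, max_depth + 1):
        for ops in product(PRIMITIVES, repeat=depth):
            def program(grid, ops=ops):
                for op in ops:
                    grid = op(grid)
                return grid
            if all(program(x) == y for x, y in examples):
                return [op.__name__ for op in ops]
    return None

# One toy example pair whose output is a horizontal flip of the input.
examples = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
print(search(examples))   # -> ['flip_h'], the shortest composition that fits

A real system would obviously need far richer primitives and a smarter search, but the shape of the idea (search over compositions, checked against the examples) is the same, and yes, the searcher is itself just a program.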
This is by far my favorite video you’ve ever done, really great to hear the other side
Prediction: Arc will be solved within a year. Dwarkesh will claim he was right because it was a scaled up model. Francois will claim he was right because it wasn't purely an LLM. In the end it doesn't matter because we got closer to AGI.
This, and people will still be stuck in their reactionary prejudices or desperation to see change
There’s a reason why our biological neural nets are so efficient. The fact that you need to scale an LLM model so much for it to get a grasp of the world model is telling of its inadequacies - it doesn’t diminish the value of scale or the methods themselves, it just signals their flaws.
Intelligence may be a mix of a set of skills that include the memorization LLMs can run, and abstract reasoning from little context. System 1 and System 2 together, not one without the other.
I’d expect a model that can actually reason to require much less compute, just like our brain does.
Your brain uses serious compute actually.
@@TheBobiaan No doubt it does! Several orders of magnitude less of course. I argue it's likely more energy efficient.
Our biological neural nets are not more efficient in terms of complexity. Just the opposite is true. Our largest AI systems are terribly simple compared to the complexity of our brains, way, way less than 1% the complexity.
Interviewer did not internalize anything Francois said, my god.
Something weird is going on in these comments, and in this podcast episode, where everyone is totally talking past each other. Honest question: what is your explanation for Dwarkesh's question about Gemini understanding, and translating, the dead out-of-sample language?
I have 2 questions for everyone present in the video...
1. If LLMs are able to perform economically valuable tasks through memorization and interpolation, does it matter if they aren't "truly" intelligent?
2. Can the advanced tooling and features offered within ChatGPT, combined with self-learning LLMs, be viewed as achieving the form of intelligence referenced in the video? (Can be any AI offering, using ChatGPT merely as an example)
interesting thought experiment:
if you had all the data of everything that happened in medieval economic systems, before the industrial revolution: every conversation spoken, and lots of multimodal data, trained the biggest LLM ever on all that, and then jumped forward to today: how well would it do?
I like this!
Dwarkesh just didn't get the point about what Francois says general intelligence is and gets stuck there.
LLM training data is truly vast. It's packed with test questions, example test questions, text books from every country and decade, online courses, homework help requests, brain teasers, etc. It is extremely hard to come up with an actually new problem. That is - something that isn't similar to anything in the training data.
For the common questions, an LLM in training wouldn't get far by storing the solution to every version of every problem. It keeps encountering new versions of those problems in training.
So it evolves an algorithm to solve that kind of problem.
What Francois is saying is that when it solves benchmark problems it can rely on a huge bank of algorithms, each of which works on only one class of problem. The benchmarks don't contain many problems that are completely different from everything in the training data, and only those problems test actual one-shot creative problem solving.
What he said about IQ tests is also very important. In an official test, the only kind psychologists would accept as an accurate score, you have to wait two years before you can take it again. This is because familiarity with the test will artificially increase the score, making you think you're smarter than you actually are.
If an hour long test can boost your intelligence for two years, what does a lifetime of problem solving do? Kids spend most of their waking hours playing games, solving puzzles, being taught by adults. Any of those things could end up being similar to an IQ test question.
Imagine you had to make up a test for strength. The trouble is strength doesn't exist. What exists is hundreds of muscles, and each of those muscles has "strength", but it has a different strength depending on its extension, and it has different sustained strength and explosive strength, and so on and so on.
So what do you do? You make a weight lifting competition. But weightlifting is only a very narrow type of motion and it uses different muscles in different ways from any other sport. So... what? Climbing? Pole vaulting? MMA? Strength can only be measured by asking people to perform some kind of sport, and different people do a lot of different sports.
There has to be one test that everyone takes, otherwise how do you compare and rank people by their strength score?
You could test every muscle in different positions and different conditions. But the test also has to be doable in less than 5 hours otherwise only the people who care about strength will do it, and the strength scientists need to test a lot of people from a lot of demographics to get a large enough sample size to publish their paper. And people certainly wouldn't watch it on TV.
IQ is the same as strength. Someone decided there must be a "general intelligence" factor called G, and set out to measure it.
But first of all, G doesn't exist. What exists is millions of neurons, and the anatomy of a single neuron is almost as complex, in its way, as the anatomy of the human body.
But second of all, even if G did exist, IQ would not measure it, because IQ tests are standardized questions built from formulas that can resemble plenty of things a human has already been exposed to, and the test is short too.
So all an IQ test does is measure the ability to take an IQ test. How many challenges and tasks in your minute to minute existence are anything like an IQ test? Not to mention the lives of your ancestors?
Dude, with all due respect, ease up and let the guest speak. You keep cutting him off with the same question while he's trying to finish his thoughts. Do you even code? If you'd used different LLM coding assistants for a while, you'd understand what he means. It's essentially glorified Stack Overflow. And the story won't be any different with any other transformer neural network.
Listened to another podcast where he and his guest spent an hour chasing their tails about nationalisation of AI research. It was a very frustrating listen, and I finally gave up, just like I'm giving up on this one.