DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?
MLST is sponsored by Tufa Labs:
Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more.
Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2.
Interested? Apply for an ML research position: benjamin@tufa.ai
Fascinating conversation. Impressive post-bacc work. Thanks guys!
A problem I suspect in solving the ARC challenge is that solutions to the private test set probably benefit greatly from chunks that humans acquire over the course of normal human experience, but that wouldn’t be found based simply on experience with the training set. Said differently, I suspect the test set uses analogies that normal humans are familiar with, but that don’t appear in the training set. If so, then a solution needs to figure out how to leverage a much wider breadth of experience beyond the training set.
It would literally need to understand relationships with everything, wouldn't it? Is that difficult to do?
@@AeroMagic_Official not everything. Just things that inspire salient analogies for people in smallish grid-spaces.
People are trying to use LLMs to provide that general purpose knowledge. But LLMs aren’t very competent with small grid spaces. That’s why the ARC challenge has been more resistant to defeat. I don’t think that’s fundamental… it’s just designed to fit in a space that ML hasn’t conquered yet.
@@_obdo_ do you have Discord? How do I message you? I wanna ask you about a project relating to this. You seem to know your stuff.
Vivid dreams are like a renderer: we render our dream in varying detail as we experience it, so it sounds like a good direction to try! GL
10:00 fluid intelligence at one level is crystallized intelligence at a higher level of abstraction
Congrats on this interview Alessandro. You do an admirable job of explaining some extremely difficult topics.
here are these people doing amazing work and I can't even get a proper job
I hear you brother
You can do it, keep promoting yourself!
I feel this on a higher, deeper level
Getting a junior role is very hard. It gets a little easier after that. Do something that you can show during the interview.
@@NeoKailthas especially when most of your skills are really advanced, but you can't prove them. I'd love to work at some AI company and train my own autoregressive models, but my knowledge is mostly "trust me bro"
first first first.
MLST on 🔥, dropping bangers back to back
A wonderful discussion!
that's quite a useful definition of intelligence
and a very interesting talk overall!
I wonder whether DreamCoder plateaus because it doesn't do any planning. After a wake phase, it refactors the found solutions into functions, but it doesn't try to do this during the wake phase, when solving a problem. Instead, it relies on previous chunks transferring well to new tasks - and if they don't, it's essentially back to discovering programs one atom at a time.
But when humans solve problems, they plan - they divide the problem into subproblems. That seems to have a lot to do with noticing properties of the input-output pairs. For example, the ARC task at 17:59 is a combination of four different transformations, which depend on the color of the input pixel (of which two are the identity). If DreamCoder could first discover a program sketch, and then fill in the four transformations, the whole program would be much easier to discover.
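To make the program-sketch idea concrete, here is a minimal Python sketch. It is only an illustration, not DreamCoder's actual mechanism: the sketch shape (one hole per input colour), the tiny `PRIMITIVES` library, and the function names `fill_sketch` and `run_sketch` are all assumptions invented for this example. The point it tries to show is that once the sketch is fixed, each hole can be filled independently, so the search is a handful of small lookups instead of one search over whole programs.

```python
# Hypothetical primitive library: each entry maps a pixel colour to a new colour.
PRIMITIVES = {
    "identity": lambda c: c,
    "to_red": lambda c: 2,
    "to_blue": lambda c: 1,
    "to_black": lambda c: 0,
}


def fill_sketch(pairs):
    """Fill the sketch `out[r][c] = hole[in[r][c]](in[r][c])`, one hole per input colour.

    `pairs` is a list of (input_grid, output_grid) tuples with equal shapes.
    Returns a dict {input colour: primitive name}, or None if some hole
    cannot be filled from the library.
    """
    # For every input colour, collect the output colours it was mapped to.
    observed = {}
    for grid_in, grid_out in pairs:
        for row_in, row_out in zip(grid_in, grid_out):
            for a, b in zip(row_in, row_out):
                observed.setdefault(a, set()).add(b)

    holes = {}
    for colour, outs in observed.items():
        # Each hole is searched on its own: the first primitive that explains
        # every observation for this colour wins.
        for name, fn in PRIMITIVES.items():
            if outs == {fn(colour)}:
                holes[colour] = name
                break
        else:
            return None  # no primitive explains this colour
    return holes


def run_sketch(holes, grid):
    """Apply a filled sketch to a new grid."""
    return [[PRIMITIVES[holes[c]](c) for c in row] for row in grid]
```

With one training pair such as `([[0, 3], [4, 0]], [[0, 2], [1, 0]])`, `fill_sketch` assigns the identity to colour 0 and a different primitive to each of colours 3 and 4. With k colours and a library of n primitives, that is at most k * n checks rather than n^k whole-program candidates.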
Very astute! I've heard on the grapevine that the original authors are working on something along those lines
@@MachineLearningStreetTalk Cool! I've seen their "Write, Execute, Assess: Program Synthesis with a REPL" which is perhaps a step in that direction (the assess part). Exciting :)
Just a bachelor's and this guy is smarter than PhDs... IQ is everything
IQ is a fairly useless measure of intelligence...but yes, institutional qualifications aren't everything.
@@Dri_ver_ IQ works, it accurately measures intelligence
@@quantumspark343 Nope, it's not deterministic enough
Not at all. I'm in the bottom 1% and I've already understood the applications of most of these concepts. Moreover, it's really not hard to make people think they're smart.
@@TheRealUsername what do you want? A brain surgery? 😭 It's good enough
The ARC challenge is absolutely NOT an AGI test, it is a visual reasoning test!
I totally agree, I can see many futures where a machine solves it without being AGI
But it is the best AI test we have so far when it comes to novelty and reasoning. Albeit not General.
@mennovanlavieren3885 I mean it tests a specific thing. Advanced vision models will do a lot better on it; on the other hand, even a model that reasons textually better than humans will fail on this. That's what I mean.
amazing podcast, i wish i had more hours in the day
The ARC-AGI test is about spatial skills, which every LLM and other AI is really, really bad at.
Once someone figures out how to integrate them into AIs, the ARC test will be conquered quickly.
22:00 this seems quite similar to tokenization in LLMs
Exactly how I'd build it and what I had in mind. Weird how that works
It's effectively monkeys and typewriters. Not that I'm downplaying the validity; it's more of a wake-up call that our accumulated knowledge has made the phenomenon of 2+2=5 a reality. The idea is quickly outpacing the productive output from this technological perspective. Essentially it pertains to specific logarithms akin to human entropic deductions.
Great work! Essentially you’re creating a language like the FORTH programming language. Do this with concepts, math, functions, abstract concepts, group memberships, and classes, and basically you’re building Tetris, or box packing, or any of the 20 or so search-space algorithms, where group theory and parallel A* search rule them all for abstract concepts: search through the space and throw in a defrag function. See if the box of blocks matches a statistical confounder or something. What you might need is a sort algorithm that can trace all the sort steps taken WITHOUT recording anything, or my version of combinatorics, which doesn’t use nearly the resources of the permutation and combination methods that are today’s standard and require ungodly compute, or perhaps just the MIT box-packing algorithm.
I would share, but giving out such powers might accelerate disaster, and I haven’t been able to code them well yet! (Not the best coder, I am afraid.) Getting them to work consistently has been a coding challenge I have not been able to master. Pinning ChatGPT down to producing code that meets my requirements has been a fool’s errand, due to its deceptive nature. To be perfectly honest, I would hate to see the result of empowering these multimodal LLMs with such intuition-giving capabilities and accelerations. Besides, I can’t be sure of the speedup or the completeness/reproducibility at this stage anyway, just like I don’t have the compute resources to test my million-fold speedup of the Adam algorithm, or others I won’t mention. Lots of it leaked into the training data for some of the LLMs and into YouTube videos anyway. Funny how it can take 3 years of 16 hours x 7 days a week to develop and iterate through solutions, and just a few weeks for someone to reverse engineer something, or rather give third parties bright ideas on how to innovate such solutions!
WAIT, that’s what AI does: it consolidates world knowledge into the most highly compressed approximation, err, double slit experiment. Truth is, AI will be the only winner in the long run; you can’t control infinities! Yes, you can control large or very large geometries, but no one can tame tetration or TREE, or G(64), or whatever was invented by 19th century mathematicians for that matter. It is amazing that people have been able to get even this far! Utopia or Dystopia, them’s the choices! Those are the two sandwiches in humanity’s picnic basket, and everyone is starving. Have to have Jesus divide the bread and fish into infinite pieces greater than the original, ’cause UBI ain’t gonna cut it! GL, and hope for the Utopia!
SHUT IT! The stupids are gonna know what to try first. That needs to be us; then who cares who comes second.
@@AeroMagic_Official who is "us"?
Good sir, would you oppose becoming in cahoots?? Why worry about ARC when you can do the things we are thinking? Shoot a reply if you think there's power in being first adopters.
What would we do if we became the first to use this framework for even just one enterprise solution?
I'll share a Google Doc for an hour or so.
Please reach out
I'll put my info there till then
docs.google.com/document/d/14eiquMso78OZCdtX5gIHqoSM0-TYTqEbu4hTWLxuoLI/edit?usp=sharing
We both have very similar plans.
And I'm also worried about the risk of it all being lost.
thx for the Show Notes
The approach he's laying out seems quite intuitive.
For every ARC problem a human can solve, the rule should be expressible via code.
Then a computer can run the code to verify it works, and LLMs are already great at writing code.
Use the question + code as training data, then fine-tune a model that can create code, see the output, and iterate until it solves the task, much like o1 does.
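The "run the code to verify it works" step can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the method from the talk: the hard-coded `CANDIDATES` list stands in for programs an LLM would generate, the convention that each candidate defines a function named `transform` is invented here, and `verify` and `solve` are hypothetical helper names. Fine-tuning on the verified question + code pairs is left out entirely.

```python
def verify(source, train_pairs):
    """Return True if the candidate's `transform` reproduces every training output."""
    namespace = {}
    try:
        # NOTE: exec on untrusted, model-written code is unsafe; a real system
        # would run this in a sandbox with a timeout.
        exec(source, namespace)
        fn = namespace["transform"]
        return all(fn(x) == y for x, y in train_pairs)
    except Exception:
        # Syntax errors, missing `transform`, and runtime crashes all count as failures.
        return False


def solve(candidates, train_pairs):
    """Return the first candidate program that passes verification, else None."""
    for source in candidates:
        if verify(source, train_pairs):
            return source
    return None


# Stand-ins for model-generated programs (assumption: one `transform` per program).
CANDIDATES = [
    "def transform(g): return g",
    "def transform(g): return [row[::-1] for row in g]",
    "def transform(g): return g[::-1]",
]

# One toy training pair: the output mirrors each row of the input.
TRAIN = [([[1, 0], [2, 3]], [[0, 1], [3, 2]])]
```

Calling `solve(CANDIDATES, TRAIN)` rejects the identity program and returns the row-mirroring one. In the loop the comment describes, the failures and their outputs would be fed back to the model so it can revise its next attempt.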
I have a gut feeling o1 with multimodality will solve this
The intelligence part is finding the rule. Which the AI should be capable of.
Inspiring
I really don't think ARC is that hard; it's just that people aren't solving it the right way
11:30 he explained it well.
I initially assumed it's more due to the lack of visual intelligence in LLMs compared to textual knowledge. It's simply not thinking in the correct paradigm.
Like, LLMs aren't suited to playing games that there isn't enough text data on. The only reason they're solid at chess is that chess can be expressed through text.
Brilliant
Good for him to admit that a lot of it is in theory, appreciate that. Looking forward to the concrete reality!
In a sense, the only reason it's in theory is 'cause they observed the potential connection... so it's an unconfirmed reality, since it's basically how human consciousness operates in its base mode
Impressive!
What is the best learning resource for program synthesis? (I am a dev who knows computer science.)
Why can't the inducted programs now just be generated in natural language? Since an LLM can convert between natural language and formal programs, why can't an LLM generate programs and "run" them itself to search (reason) for a solution?
I've had many ideas that I was sure were going to work, but didn't. Ideas are a dime a dozen.
Nicely explains why we have a neocortex
31:04 Seems like an AI-generated video artifact on the ear. I know it isn't, but there it is.
I disagree with his thoughts on the relationship between theories and observation. It's a dialectical relationship. Yes, we hypothesize, but likewise, observations can help us hypothesize better. Additionally, I would say scientific knowledge begins with observations. Every idea has a material origin. You don't come up with ideas in a vacuum.
Alessandro's technique looks fascinatingly similar to OpenAI's "o1" applying "system 1" (fast) and "system 2" (slow) thinking, very interesting.
The search space can be reduced by clustering (warping) attention nodes in the state space (e.g. by K-means clustering) to focus attention on regions of interest in the state/action space (similar to TRPO, MCTS, or PPO).
The thumbnail 😅
Why do people use math terminology so loosely in these talks? I find it disrespectful.
Damn, looks like the bitter lesson never taught the researchers anything