Let's go! I'm all in on this. I will say: don't count out the power of one-shot
Nice! Love it - let us know if you need anything along the way
Is there a way to know all the priors you embed into the puzzles?
So far I’ve identified:
1. Translations - Shifting objects or patterns across the grid.
2. Rotations - Rotating objects or patterns at different angles.
3. Reflections - Flipping objects or patterns across a line.
4. Scaling - Changing the size of objects or patterns.
5. Repetition and symmetry - Repeating patterns or creating symmetrical designs.
6. Color changes - Altering the color of objects or patterns.
7. Compositions - Combining multiple operations or transformations.
8. Object addition or removal - Adding or removing elements within the grid.
9. Grid size changes - Modifying the dimensions of the grid or the objects within it.
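For concreteness, several of the priors in the list above map directly onto simple array operations. Here's a minimal sketch using NumPy (encoding the grid as a small integer array is an illustrative assumption, not the official ARC format):

```python
import numpy as np

grid = np.array([
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
])

translated = np.roll(grid, shift=1, axis=1)      # 1. translation (wraps around the edge)
rotated    = np.rot90(grid, k=-1)                # 2. rotation, 90 degrees clockwise
reflected  = np.fliplr(grid)                     # 3. reflection across the vertical axis
scaled     = np.kron(grid, np.ones((2, 2), dtype=grid.dtype))  # 4. scale each cell to 2x2
recolored  = np.where(grid == 1, 2, grid)        # 6. color change: color 1 -> color 2
composed   = np.fliplr(np.rot90(grid))           # 7. composition of two transforms
```

The hard part of ARC isn't applying any one of these operations; it's inferring from a handful of examples which (composition of) operations a task is asking for.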
There have been a bunch of attempts at this.
Table 4 of this paper leans in that direction:
arxiv.org/pdf/2403.11793
There isn't a way to know all the priors; publishing a complete list would essentially give away the answers to the test set
Oh you sweet summer child...
Thank you! I know this has been around for a while, but I'm happy to see a legitimate attempt at testing intelligence that isn't "It passed the Turing test." LLMs sound smart because they speak our language, but are they really doing anything more than regurgitating memorized information? This test suggests the answer is most likely no.
Thanks! Yes, we agree
I have an idea now, thanks. I’ll probably check out ARC after my PhD qualifying exam. Finetuning is gonna be fun 🤩
Could someone please explain how the AI soccer players in a simulation can go from physically flopping around on the ground to teaching themselves team strategy but AI can't solve these ARC tasks?
Thanks for the demonstrations! These tasks feel like a single, arbitrary state-transition step in a cellular automaton. It also looks like fun to play 😄
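The cellular-automaton analogy can be made concrete: many ARC tasks look like learning one local update rule and applying it for a single step. A toy sketch — the specific fill rule below is made up purely for illustration:

```python
import numpy as np

def step(grid, rule):
    """Apply one step of a local update rule to every cell.

    `rule` takes a cell's value and its count of nonzero neighbors
    and returns the cell's new value.
    """
    h, w = grid.shape
    out = np.zeros_like(grid)
    for i in range(h):
        for j in range(w):
            # 3x3 neighborhood, clipped at the grid border
            neighborhood = grid[max(0, i - 1):i + 2, max(0, j - 1):j + 2]
            live = int(np.count_nonzero(neighborhood)) - int(grid[i, j] != 0)
            out[i, j] = rule(grid[i, j], live)
    return out

# Hypothetical rule: any empty cell touching a colored cell turns color 2.
fill_rule = lambda v, live: 2 if v == 0 and live > 0 else v

grid = np.array([[0, 0, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
print(step(grid, fill_rule))  # the ring around the center fills with color 2
```

The catch, of course, is that in ARC the rule is never given — it has to be induced from two or three input/output pairs.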
Nice! Yes please go try it out and let us know what you think
I am new to programming, but this challenge really interests me and I'd like to give it a try. Could you create a tutorial on how to submit an entry to the ARC challenge? Maybe with a model which will produce some minimal results?
Totally! We have a ton of templates here
arcprize.org/guide
As for a submission tutorial, we don't have a video of this directly, but this video shows how to work with Kaggle notebooks.
th-cam.com/video/crhrzhVjWog/w-d-xo.html
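To get a feel for the mechanics before watching the tutorial, here's a sketch of a do-nothing baseline that just predicts each test output equals its input. The file name and the `attempt_1`/`attempt_2` submission schema below follow the ARC Prize Kaggle setup as we understand it — double-check the current competition rules, as the exact format can change between years:

```python
import json

def predict(task):
    """Trivial baseline: guess that each test output equals its input."""
    return [{"attempt_1": pair["input"], "attempt_2": pair["input"]}
            for pair in task["test"]]

def build_submission(challenges_path, out_path="submission.json"):
    """Read the challenge file and write a submission.json for Kaggle."""
    with open(challenges_path) as f:
        tasks = json.load(f)
    submission = {task_id: predict(task) for task_id, task in tasks.items()}
    with open(out_path, "w") as f:
        json.dump(submission, f)
```

Scoring zero with this is expected — the point is that once the plumbing works, you can swap `predict` for something smarter.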
When you think about it, the optimal network should be like a physics simulator: every example has its own stable rules. My guess is that a recurrent network would have the best chance, though the parameter count would need to be huge, so we could perhaps make a hypernetwork to generate the weights from scratch.
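The hypernetwork idea in the comment above can be sketched in a few lines: a small network consumes an embedding of a task's examples and emits the weights of a second, task-specific network that does the actual mapping. Everything here — the sizes, the random task embedding, the plain NumPy MLP — is an arbitrary illustration, not a working solver:

```python
import numpy as np

rng = np.random.default_rng(0)

TASK_DIM, HIDDEN, IN_DIM, OUT_DIM = 16, 32, 10, 10

# Hypernetwork parameters: a single linear map from a task embedding
# to the flattened weights of the target network.
W_hyper = rng.normal(0, 0.1, size=(TASK_DIM, HIDDEN * IN_DIM + HIDDEN * OUT_DIM))

def generate_weights(task_embedding):
    """Produce the weights of a small two-layer target network."""
    flat = task_embedding @ W_hyper
    w1 = flat[:HIDDEN * IN_DIM].reshape(IN_DIM, HIDDEN)
    w2 = flat[HIDDEN * IN_DIM:].reshape(HIDDEN, OUT_DIM)
    return w1, w2

def target_net(x, w1, w2):
    """Per-cell mapping whose weights were generated, not trained directly."""
    return np.maximum(x @ w1, 0) @ w2   # ReLU MLP

task_embedding = rng.normal(size=TASK_DIM)   # stand-in for encoded example pairs
w1, w2 = generate_weights(task_embedding)
cell = rng.normal(size=IN_DIM)               # stand-in for a one-hot cell color
out = target_net(cell, w1, w2)
```

In a real system only `W_hyper` (and the example encoder) would be trained; the target network's weights are regenerated fresh for every task.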
Why is the train/evaluation set so small?
The tasks are handmade, which limits the scale that can be achieved.
They focus on diversity rather than quantity at this stage
There are also birds, such as Sulphur-crested Cockatoos, that have shown problem-solving skills. Hopefully that's proof enough that a basic reasoning model won't require a trillion parameters.
Does your submission count if you make use of private models like GPT-4 at some point in your algorithm?
Let's get to the bottom of this. How much do I get for reaching 90% accuracy with a free LLM?
The threshold on the Kaggle leaderboard is 85%; reach that with a valid submission and you're eligible for a prize
@@ARCprize thanks
So have you collaborated with any psychologists to make this test?
Check out section 11.1 of On the Measure of Intelligence. Francois digs into the influence of human psychology on his approach there
i think i might have unintentionally set the basis for solving this in a project i did a couple months ago
We'd love to see a submission!
@@ARCprize Working on it. I just handed in my graduation project, so I have time to work on this now
Children can solve these puzzles, but I don't think LLMs can
We haven't seen an LLM do this yet
@@ARCprize How about VLM? I think this task requires strong spatial understanding.
@@ARCprize How?
I get that this is a stepping stone, but calling it a test for AGI is just ludicrous. This isn't even close to AGI, it's just a toy.
It can't be that hard, right?
Try it out! We'd love to see a submission
If you can't design an AI architecture to solve this problem, you aren't as smart as you think.
Hot take)
Don’t forget to design great design to sell subscriptions😅
Sorry, this isn't general intelligence. This is just reasoning. It is painful watching a whole industry try to reinvent psychology when there is already a century of research there.
Thanks for the comment! We'd love to hear your ideas and thoughts about how to get closer to AGI