These are the topics I like to see, awesome.
More to come!
The title of the last two papers combined should be "Test Time Turbo Is All You Need".
Smile.
It would be interesting to see when an LLM can solve sudoku (even an easy one). LLMs seem to struggle with those (tested with the newest DeepSeek DeepThink), even if you give them a little advice in the prompt. Of course there might be other AI that solves them better, but if LLMs are the path to AGI, that could be one potential benchmark for logical thinking. Also word puzzles, "Sanaristikko" in Finnish, are quite easy for people but difficult for LLMs.
You just have to pre-train your model on this particular task, in your case a visual and geometric word pattern, and the performance will go up. Fine-tuning will not deliver results if the model was not pre-trained on the task. It is all a question of the pre-training datasets, their complexity and geometry (and more ...).
Well, I cannot see why this would be required at all. All the model has to do is recognize that this is a sudoku, generate the algorithm in Python to solve it (that is in the dataset for sure), and execute it.
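For reference, the kind of standard solver an LLM could plausibly generate and then run through a code tool is only a few lines. A minimal backtracking sketch (illustrative only; the grid is assumed to be a 9x9 list of lists with 0 for empty cells):

```python
def valid(grid, r, c, v):
    # Check row, column and 3x3 box for the candidate value v.
    if any(grid[r][j] == v for j in range(9)):
        return False
    if any(grid[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[br + i][bc + j] != v for i in range(3) for j in range(3))

def solve(grid):
    # Find the first empty cell and try all candidates recursively.
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for v in range(1, 10):
                    if valid(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0
                return False
    return True  # no empty cells left: solved
```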
I have one question.
In the TTT+TTC combination (from the last video), isn't it possible to just not reset the parameters in the TTT part, or to do some kind of selective resetting or re-LoRA training after solving the problem (roughly the flow in the sketch below)? That would result in persistent continuous learning.
Won't this drive down computational costs dramatically, since retraining will only happen in novel situations, while for problems that get more and more repetitive it will just run like a traditional LLM to reach the solution, without the same extensive thinking time?
Also, intelligence will grow like a virus.
Thoughts?
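Purely as an illustration of the control flow I have in mind (a rough sketch; is_novel, train_lora_adapter and solve are hypothetical placeholders, not anything from the papers):

```python
# Sketch of the proposed "keep the adapter instead of resetting" loop.
# All functions are stand-ins so the flow is runnable end to end.

def is_novel(problem, memory):
    # Placeholder novelty check: have we seen a similar problem before?
    return problem not in memory

def train_lora_adapter(model, problem):
    # Placeholder for test-time training (e.g. fitting a LoRA adapter and keeping it).
    return model + [f"adapter({problem})"]

def solve(model, problem):
    # Placeholder for ordinary inference / test-time compute.
    return f"solution({problem}) using {len(model)} adapters"

memory = set()
model = []  # base model plus accumulated adapters

for problem in ["A", "B", "A", "A"]:
    if is_novel(problem, memory):
        model = train_lora_adapter(model, problem)  # expensive path, only for novel tasks
        memory.add(problem)
    print(solve(model, problem))  # repeated tasks reuse what was already learned
```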
If you do not reset the parameters, you encounter memorization.
@code4AI Is that bad?
Don't we want it to memorize what it has learned, and go through the process of learning again only when we need it to?
You don't understand. We want the LLM to understand the problem and the solution on a generic level, so it can be applied continuously to all problems. This is what we call "learning the solution".
If the LLM just memorizes one solution string, that knowledge is not inherent in what it has learned, and therefore it will fail with the slightest variation of the given task. It is like with humans: if I just memorize the answers to a test, without understanding the underlying methodology or why this is the solution, I will pass this single test but fail right at the next one, because I haven't learned anything. I just memorized a string.
18:04: I found it surprising that so many people felt that this seemed like cheating.
To my mind, if it is just doing computations on the input to get the output, and it stays within the compute requirements of the challenge, why would it be cheating?
Solomonoff induction presumably isn't cheating, except that it would take too much compute time and memory, so why would this be cheating?
Because you have a permutation training sequence that includes the solution to the test.
To *a* test, yes, but only an answer that was part of the question one was actually given. There is no outside help involved, no extra information given about the answer to the actual question one is expected to answer.
The task is, "given these input/output pairs, determine the answer for this other input". Training on (a processed version of) the given input/output pairs is just doing a computation on part of the input one is given.
To think of it as cheating, I think one would have to confuse different levels of abstraction about the problem.
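To make the point concrete, here is a rough sketch of the setup as I understand it: everything trained on at test time is computed from the task's own demonstration pairs, and the test output never enters the process. The names augment, fine_tune and predict are hypothetical placeholders, not the actual method from the paper:

```python
# Sketch: the test-time "training data" is derived entirely from the task's
# own demonstration pairs; the test output never appears anywhere.

def augment(pairs):
    # Stand-in for real augmentations (e.g. permutations of the given demos).
    return pairs + [(y, x) for (x, y) in pairs]

def fine_tune(model, pairs):
    # Stand-in for gradient updates on the augmented demonstrations.
    return model + list(pairs)

def predict(model, test_input):
    # Stand-in for inference on the held-out test input.
    return f"guess for {test_input} from {len(model)} examples"

task = {"demos": [("in1", "out1"), ("in2", "out2")], "test_input": "in3"}

model = fine_tune([], augment(task["demos"]))  # computation on the given input only
print(predict(model, task["test_input"]))      # the test output is never used
```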