Impressed with your bit on how self-reinforcing systems find it difficult to break out of their viewpoint. Nice talk, John.
@@MachineLearningStreetTalk Thanks Tim, glad you liked it. It is fascinating how much we are stuck in a local optimum for LLMs when there is so much to learn from cognitive science.
Haha more to come on that after I read more.
Thanks so much for delving deeper into this, wish you a Happy New Year from Italy.
Great talk, thank you!
53:08 those two phrases don't appear in the article?
Hi there, it is extracted from this sentence:
On the 2024 AIME exams, GPT-4o only solved on average 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function.
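To make the two aggregation strategies in that sentence concrete, here is a minimal sketch of "consensus among samples" (majority vote) and "re-ranking with a scoring function". The sample answers and the `score` function are made up for illustration; nothing here is taken from OpenAI's actual implementation.

```python
from collections import Counter


def consensus(answers):
    """Majority vote: return the most frequent final answer among the samples."""
    return Counter(answers).most_common(1)[0][0]


def rerank(answers, score):
    """Re-ranking: return the answer that a scoring function rates highest.

    `score` is a hypothetical stand-in for a learned scoring function.
    """
    return max(answers, key=score)


# Hypothetical final answers from five independent samples of one problem.
samples = ["204", "204", "113", "204", "97"]
print(consensus(samples))
```

Majority voting needs no extra model, while re-ranking is only as good as the scoring function it relies on.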
@@johntanchongmin I stand corrected :) Thanks
Maybe some tree search is done during training, in the style of Stream of Search (arXiv:2404.03683): training on serialized, suboptimal search strategies in which errors can be recovered through backtracking, which might overcome error accumulation. During inference, though, tree search seems implausible to me, at least for o1. o1-pro seems different (see the 17 Dec AMA session from OpenAI's API team) and is probably using multiple samples. I also don't think a reward model is used during inference. QwQ-32B-Preview is an open-weight reasoning model with performance similar to o1-preview/o1-mini on AIME, MATH and LiveCodeBench, and it has the same architecture and inference code (with different default temperature, top_p, ...) as the standard LLM Qwen2.5-32B; you can try it on Hugging Face. All of this suggests that reasoning models are likely just LLMs with custom post-training.
I agree. If tree search is used, it is only in training. I really suspect that adding tree search at inference might make the outcomes worse if the wrong heuristic is used. Hence, it may be like what I said about expanding nodes exhaustively, at least for maybe the first X layers, then taking all leaf nodes and doing majority voting or checking against the ground truth.
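A rough sketch of that idea, under toy assumptions: every layer offers a fixed list of candidate steps, all combinations for the first X layers are expanded with no heuristic pruning, and the answers at the leaves are put to a majority vote. The names `expand_exhaustively`, `majority_vote` and `finish` are hypothetical; in a real system each step would be a model-generated reasoning step and `finish` would complete a path into a final answer.

```python
from collections import Counter
from itertools import product


def expand_exhaustively(choices_per_layer, depth):
    """Enumerate every path through the first `depth` layers (no pruning)."""
    return list(product(*choices_per_layer[:depth]))


def majority_vote(leaves, finish):
    """Turn each leaf path into a final answer and return the most common one."""
    answers = [finish(path) for path in leaves]
    return Counter(answers).most_common(1)[0][0]


# Toy tree: two layers with two candidate steps each; `sum` plays the
# role of the (hypothetical) function that completes a path into an answer.
layers = [[1, 2], [1, 2]]
leaves = expand_exhaustively(layers, depth=2)
print(len(leaves), majority_vote(leaves, finish=sum))
```

The number of leaves grows as the product of the branching factors, which is why exhaustive expansion is only practical for the first few layers.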
Part 1 here: th-cam.com/video/-6J0S1q03Ds/w-d-xo.html