o3 (Part 2) - Tradeoffs of Heuristics, Tree Search, External Memory, In-built Bias

  • Published 27 Jan 2025

Comments • 10

  • @MachineLearningStreetTalk  19 days ago  +1

    Impressed with your bit on how self-reinforcing systems find it difficult to break out of their viewpoint. Nice talk, John.

    • @johntanchongmin  19 days ago

      @MachineLearningStreetTalk Thanks Tim, glad you liked it. It is fascinating how much we are stuck in a local optimum for LLMs when there is so much to learn from cognitive science.
      Haha, more to come on that after I read more.

  • @alessandrofrau4196  23 days ago

    Thanks so much for delving deeper into this; wishing you a Happy New Year from Italy.

  • @tejshah7258  26 days ago

    Great talk, thank you!

  • @420_gunna  27 days ago  +1

    53:08 those two phrases don't appear in the article?

    • @johntanchongmin  27 days ago  +1

      Hi there, it is extracted from this sentence:
      On the 2024 AIME exams, GPT-4o only solved on average 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function.

    • @420_gunna  27 days ago

      @johntanchongmin I stand corrected :) Thanks
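
    As an illustration of the two inference-time strategies in the quoted sentence above (consensus among 64 samples, re-ranking 1000 samples with a learned scoring function), here is a minimal sketch. `sample_answer` and `score` are hypothetical stand-ins for a sampling call and a learned scorer; this is not OpenAI's published implementation.

    ```python
    from collections import Counter
    from typing import Callable


    def consensus_answer(sample_answer: Callable[[str], str],
                         problem: str, n: int = 64) -> str:
        """Draw n independent samples and return the most common final answer."""
        answers = [sample_answer(problem) for _ in range(n)]
        return Counter(answers).most_common(1)[0][0]


    def reranked_answer(sample_answer: Callable[[str], str],
                        score: Callable[[str, str], float],
                        problem: str, n: int = 1000) -> str:
        """Draw n samples and keep the one the learned scoring function rates highest."""
        samples = [sample_answer(problem) for _ in range(n)]
        return max(samples, key=lambda ans: score(problem, ans))
    ```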

  • @najialazhar9091  28 days ago  +3

    Maybe some tree search is done during training, in the same style as Stream of Search (arXiv:2404.03683): training on serialized suboptimal search traces where errors are potentially recovered through backtracking, which might overcome error accumulation. During inference it seems implausible to me, at least for o1; o1-pro seems different (see the 17 Dec AMA session from OpenAI's API team), probably using multiple samples. And I don't think a reward model is used during inference either: QwQ-32B-Preview is an open-weight reasoning model with similar performance to o1-preview/o1-mini on AIME, MATH and LiveCodeBench, yet it uses the same architecture and inference code (with different default temperature, top_p, ...) as the standard LLM Qwen2.5-32B, and you can try it on Hugging Face. All this points to reasoning models likely just being LLMs with custom post-training.

    • @johntanchongmin  28 days ago

      I agree. If tree search is used, it is only in training. Although I really suspect adding tree search might make the outcomes worse if the wrong heuristic is used. Hence, it may be like what I said about expanding nodes exhaustively, at least for maybe the first X layers, then taking all leaf nodes and doing majority voting / checking against ground truth.
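
    A minimal sketch of the idea in the reply above: expand the tree exhaustively (no heuristic pruning) for the first X layers, then majority-vote over the answers at the leaf nodes. `propose_steps` and `final_answer` are hypothetical helpers standing in for LLM calls, not part of any released system.

    ```python
    from collections import Counter
    from typing import Callable, List


    def exhaustive_then_vote(problem: str,
                             propose_steps: Callable[[str, List[str]], List[str]],
                             final_answer: Callable[[str, List[str]], str],
                             depth: int = 2) -> str:
        """Expand every node for the first `depth` layers, then majority-vote over leaves."""
        frontier: List[List[str]] = [[]]      # each element is a partial reasoning path
        for _ in range(depth):                # exhaustive breadth-first expansion, no pruning
            frontier = [path + [step]
                        for path in frontier
                        for step in propose_steps(problem, path)]
        # Complete each leaf path and take the most common final answer.
        answers = [final_answer(problem, path) for path in frontier]
        return Counter(answers).most_common(1)[0][0]
    ```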

  • @johntanchongmin  28 days ago

    Part 1 here: th-cam.com/video/-6J0S1q03Ds/w-d-xo.html