o3 (Part 2) - Tradeoffs of Heuristics, Tree Search, External Memory, In-built Bias

  • Published 27 Jan 2025

Comments • 10

  • @MachineLearningStreetTalk  19 days ago  +1

    Impressed with your bit on how self-reinforcing systems find it difficult to break out of their viewpoint. Nice talk, John.

    • @johntanchongmin  19 days ago

      @MachineLearningStreetTalk Thanks Tim, glad you liked it. It is fascinating how much we are stuck in a local optimum for LLMs when there is so much to learn from cognitive science.
      Haha, more to come on that after I read more.

  • @alessandrofrau4196  23 days ago

    Thanks so much for delving deeper into this; wishing you a Happy New Year from Italy.

  • @tejshah7258  26 days ago

    Great talk, thank you!

  • @420_gunna  27 days ago  +1

    53:08 those two phrases don't appear in the article?

    • @johntanchongmin  27 days ago  +1

      Hi there, it is extracted from this sentence:
      On the 2024 AIME exams, GPT-4o only solved on average 12% (1.8/15) of problems. o1 averaged 74% (11.1/15) with a single sample per problem, 83% (12.5/15) with consensus among 64 samples, and 93% (13.9/15) when re-ranking 1000 samples with a learned scoring function.

    • @420_gunna  27 days ago

      @johntanchongmin I stand corrected :) Thanks
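
    As an illustration of the two inference-time strategies in the quoted sentence above (consensus among 64 samples, re-ranking 1000 samples with a learned scoring function), here is a minimal sketch. `sample_answer` and `score` are hypothetical stand-ins for a sampling call and a learned scorer; this is not OpenAI's published implementation.

    ```python
    from collections import Counter
    from typing import Callable


    def consensus_answer(sample_answer: Callable[[str], str],
                         problem: str, n: int = 64) -> str:
        """Draw n independent samples and return the most common final answer."""
        answers = [sample_answer(problem) for _ in range(n)]
        return Counter(answers).most_common(1)[0][0]


    def reranked_answer(sample_answer: Callable[[str], str],
                        score: Callable[[str, str], float],
                        problem: str, n: int = 1000) -> str:
        """Draw n samples and keep the one the learned scoring function rates highest."""
        samples = [sample_answer(problem) for _ in range(n)]
        return max(samples, key=lambda ans: score(problem, ans))
    ```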

  • @najialazhar9091  28 days ago  +3

    Maybe some tree search is done during training, in the same style as Stream of Search (arXiv:2404.03683): training on serialized suboptimal search traces where errors are potentially recovered through backtracking, which might overcome error accumulation. During inference it seems implausible to me, at least for o1; o1-pro seems different (see the 17 Dec AMA session from OpenAI's API team), probably using multiple samples. And I don't think a reward model is used during inference either: QwQ-32B-Preview is an open-weight reasoning model with similar performance to o1-preview/o1-mini on AIME, MATH and LiveCodeBench, yet it uses the same architecture and inference code (with different default temperature, top_p, ...) as the standard LLM Qwen2.5-32B, and you can try it on Hugging Face. All this points to reasoning models likely just being LLMs with custom post-training.

    • @johntanchongmin  28 days ago

      I agree. If tree search is used, it is only in training. Although I really suspect adding tree search might make the outcomes worse if the wrong heuristic is used. Hence, it may be like what I said about expanding nodes exhaustively, at least for maybe the first X layers, then taking all leaf nodes and doing majority voting / checking against ground truth.
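
    A minimal sketch of the idea in the reply above: expand the tree exhaustively (no heuristic pruning) for the first X layers, then majority-vote over the answers at the leaf nodes. `propose_steps` and `final_answer` are hypothetical helpers standing in for LLM calls, not part of any released system.

    ```python
    from collections import Counter
    from typing import Callable, List


    def exhaustive_then_vote(problem: str,
                             propose_steps: Callable[[str, List[str]], List[str]],
                             final_answer: Callable[[str, List[str]], str],
                             depth: int = 2) -> str:
        """Expand every node for the first `depth` layers, then majority-vote over leaves."""
        frontier: List[List[str]] = [[]]      # each element is a partial reasoning path
        for _ in range(depth):                # exhaustive breadth-first expansion, no pruning
            frontier = [path + [step]
                        for path in frontier
                        for step in propose_steps(problem, path)]
        # Complete each leaf path and take the most common final answer.
        answers = [final_answer(problem, path) for path in frontier]
        return Counter(answers).most_common(1)[0][0]
    ```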

  • @johntanchongmin  28 days ago

    Part 1 here: th-cam.com/video/-6J0S1q03Ds/w-d-xo.html