DeepSeek R1 and the Trade-off Between Accuracy and Efficiency | Hello SundAI - our world through the lense of AI
- Published on 7 Feb 2025
- This study examines the performance of the DeepSeek R1 language model on complex mathematical problems, revealing that it achieves higher accuracy than other models but uses considerably more tokens. Here's a summary:
DeepSeek R1's strengths: DeepSeek R1 excels at solving complex mathematical problems, particularly those that other models struggle with, due to its token-based reasoning approach.
Token usage: DeepSeek R1 uses significantly more tokens than the other models. Its average token count is 4717.5, while the other models average between 191.75 and 462.39. This higher token usage is linked to its more deliberate, multi-step problem-solving process.
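As a quick back-of-the-envelope check (not from the study's code, just the averages quoted above plugged into Python), that gap works out to roughly 10 to 25 times more tokens per problem:

```python
# Ratio of DeepSeek R1's average token count to the other models' averages
# (figures taken from the summary above).
deepseek_avg = 4717.5
other_avgs = {"lowest other-model average": 191.75, "highest other-model average": 462.39}

for label, avg in other_avgs.items():
    print(f"DeepSeek R1 vs {label} ({avg}): ~{deepseek_avg / avg:.1f}x more tokens")
# Prints roughly 24.6x and 10.2x respectively.
```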
Trade-off: The study highlights a trade-off between accuracy and efficiency. While DeepSeek R1 offers superior accuracy, it requires longer processing times because of its extensive token generation. Models like Mistral might be faster but less accurate, making them suitable for tasks requiring rapid responses.
Temperature settings: The experiment underscores the importance of temperature settings in influencing model behaviour. Llama 3.1, for instance, achieved correct results only at a temperature of 0.4, demonstrating how sensitive some models are to this parameter.
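For readers who want to try a sweep like this themselves, the sketch below shows one way to vary temperature across 11 settings. It assumes an OpenAI-compatible Python client; the model name, placeholder prompt, and the 0.0 to 1.0 range in steps of 0.1 are illustrative assumptions rather than details confirmed by the paper:

```python
# Minimal temperature-sweep sketch (illustrative; not the study's actual harness).
# Assumes the openai Python SDK and an API key in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
temperatures = [round(0.1 * i, 1) for i in range(11)]  # 11 settings: 0.0 .. 1.0 (assumed range)

problem = "Solve the following problem and state the final answer: ..."  # placeholder prompt

for temp in temperatures:
    response = client.chat.completions.create(
        model="gpt-4o-mini-2024-07-18",  # one of the evaluated models
        temperature=temp,
        messages=[{"role": "user", "content": problem}],
    )
    answer = response.choices[0].message.content
    tokens = response.usage.completion_tokens  # track generated tokens, as the study did
    print(f"T={temp}: {tokens} completion tokens")
```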
Methodology: The study used 30 challenging problems from the MATH dataset that other models had previously failed to solve under time constraints. Five LLMs were tested across 11 different temperature settings; each solution was scored for correctness with a binary metric, using the mistral-large-2411 model as a judge, and the number of tokens generated was tracked.
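The binary LLM-as-a-judge step could look roughly like the sketch below. It assumes the mistralai Python SDK and an invented grading prompt; only the judge model name (mistral-large-2411) and the binary correct/incorrect verdict come from the study:

```python
# Sketch of a binary correctness check with an LLM judge (illustrative, not the paper's code).
import os
from mistralai import Mistral

judge = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

def is_correct(problem: str, reference_answer: str, candidate_answer: str) -> bool:
    """Ask mistral-large-2411 for a binary verdict on a candidate solution."""
    prompt = (
        "You are grading a maths solution. Reply with exactly CORRECT or INCORRECT.\n\n"
        f"Problem: {problem}\n"
        f"Reference answer: {reference_answer}\n"
        f"Candidate answer: {candidate_answer}"
    )
    verdict = judge.chat.complete(
        model="mistral-large-2411",
        messages=[{"role": "user", "content": prompt}],
    )
    return verdict.choices[0].message.content.strip().upper().startswith("CORRECT")
```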
Models evaluated: The models evaluated include deepseek-r1:8b, gemini-1.5-flash-8b, gpt-4o-mini-2024-07-18, llama3.1:8b, and mistral-8b-latest.
Dataset: The dataset is derived from a previous benchmark experiment that evaluated LLMs on advanced mathematical problem-solving. The 30 problems were selected because no model in the original study could solve them within imposed time limits.
Future research: Future research should explore the internal workings of DeepSeek R1 to better understand its "reasoning tokens", investigate methods to reduce token usage, and examine prompt engineering strategies to maximise model performance.
Source: Evstafev, E. (2025) Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH.
Hello SundAI - our world through the lense of AI
Disclaimer: This podcast is generated by Roger Basler de Roca (contact) using AI. The voices are artificially generated, and the discussion is based on public research data. I do not claim any ownership of the presented material; it is for educational purposes only.
rogerbasler.ch...
Episode link: play.headliner...