Xiaol.x
Hong Kong
Joined May 22, 2010
Researcher
X: x.com/xiaolGo
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts without sufficiently exploring promising paths to reach a correct solution. This behavior leads to inadequate depth of reasoning and decreased performance, particularly on challenging mathematical problems. To systematically analyze this issue, we conduct experiments on three challenging test sets and two representative open-source o1-like models, revealing that frequent thought switching correlates with incorrect responses. We introduce a novel metric to quantify underthinking by measuring token efficiency in incorrect answers. To address underthinking, we propose a decoding strategy with a thought switching penalty (TIP) that discourages premature transitions between thoughts, encouraging deeper exploration of each reasoning path. Experimental results demonstrate that our approach improves accuracy across challenging datasets without requiring model fine-tuning. Our findings contribute to understanding reasoning inefficiencies in o1-like LLMs and offer a practical solution to enhance their problem-solving capabilities.
arxiv.org/abs/2501.18585
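A minimal sketch of how a thought-switching penalty during decoding might work, assuming thought transitions begin with known marker tokens. The token ids, penalty strength `ALPHA`, and early-step window `BETA` below are illustrative assumptions, not the paper's values:

```python
import numpy as np

# Assumed ids of tokens that typically open a new thought (e.g. "Alternatively").
SWITCH_TOKEN_IDS = {17, 42}
ALPHA = 3.0  # penalty strength (assumption)
BETA = 50    # penalize switches only during the first BETA steps of a thought (assumption)

def apply_tip(logits, step):
    """Subtract a penalty from thought-switching tokens early in a thought,
    discouraging premature transitions while leaving later steps untouched."""
    logits = logits.copy()
    if step < BETA:
        for tid in SWITCH_TOKEN_IDS:
            logits[tid] -= ALPHA
    return logits

# Toy usage: a switch token that would otherwise dominate is pushed down early on.
logits = np.zeros(100)
logits[17] = 5.0
penalized = apply_tip(logits, step=10)
```

In a real decoder this would run as a logits processor each step, so the penalty reshapes the next-token distribution without any fine-tuning.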
Views: 50
Videos
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems
22 views · 2 hours ago
Large Language Models (LLMs) are transforming artificial intelligence, evolving into task-oriented systems capable of autonomous planning and execution. One of the primary applications of LLMs is conversational AI systems, which must navigate multi-turn dialogues, integrate domain-specific APIs, and adhere to strict policy constraints. However, evaluating these agents remains a significant chal...
Tell me about yourself: LLMs are aware of their learned behaviors
43 views · 2 hours ago
We study behavioral self-awareness an LLM's ability to articulate its behaviors without requiring in-context examples. We finetune LLMs on datasets that exhibit particular behaviors, such as (a) making high-risk economic decisions, and (b) outputting insecure code. Despite the datasets containing no explicit descriptions of the associated behavior, the finetuned LLMs can explicitly describe it....
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
23 views · 2 hours ago
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks but their performance in complex logical reasoning tasks remains unsatisfactory. Although some prompting methods, such as Chain-of-Thought, can improve the reasoning ability of LLMs to some extent, they suffer from an unfaithful issue where derived conclusions may not align with the generated reasoning c...
o3-mini vs DeepSeek-R1: Which One is Safer?
40 views · 2 hours ago
The irruption of DeepSeek-R1 constitutes a turning point for the AI industry in general and the LLMs in particular. Its capabilities have demonstrated outstanding performance in several tasks, including creative thinking, code generation, maths and automated program repair, at apparently lower execution cost. However, LLMs must adhere to an important qualitative property, i.e., their alignment ...
Learning high-accuracy error decoding for quantum processors
33 views · 2 hours ago
Building a large-scale quantum computer requires effective strategies to correct errors that inevitably arise in physical quantum systems1. Quantum error-correction codes2 present a way to reach this goal by encoding logical information redundantly into many physical qubits. A key challenge in implementing such codes is accurately decoding noisy syndrome information extracted from redundancy ch...
Trading Inference-Time Compute for Adversarial Robustness
33 views · 4 hours ago
Robustness to adversarial attacks has been one of the thorns in AI's side for more than a decade. In 2014, researchers showed that imperceptible perturbations, subtle alterations undetectable to the human eye, can cause models to misclassify images, illustrating one example of a model's vulnerability to adversarial attacks. Addressing this weakness ...
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
51 views · 4 hours ago
We introduce FrontierMath, a benchmark of hundreds of original, exceptionally challenging mathematics problems crafted and vetted by expert mathematicians. The questions cover most major branches of modern mathematics from computationally intensive problems in number theory and real analysis to abstract questions in algebraic geometry and category theory. Solving a typical problem requires mult...
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
98 views · 7 hours ago
Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-...
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
53 views · 7 hours ago
Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as the softmax of dot products between keys and queries. Recent work has explored alternatives to softmax attention in transformers, such as ReLU and sigmoid activations. In this work, we revisit sig...
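A minimal sketch of the core idea of replacing softmax with an elementwise sigmoid in attention. The `-log(n)` additive bias is a stabilization choice discussed in this line of work, but the exact scaling and normalization below are assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid_attention(Q, K, V, b=None):
    """Attention where each query-key score is squashed independently by a
    sigmoid instead of being normalized across keys by a softmax."""
    n, d = Q.shape
    if b is None:
        b = -np.log(n)  # assumed bias so rows start near uniform-softmax scale
    scores = Q @ K.T / np.sqrt(d) + b
    weights = 1.0 / (1.0 + np.exp(-scores))  # elementwise sigmoid, no row sum
    return weights @ V

# Toy usage on random sequences of length 4 with head dimension 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = sigmoid_attention(Q, K, V)
```

Because each weight is computed independently, there is no normalization bottleneck across keys, which is part of what makes this variant attractive for efficient kernels.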
An Operating Principle of the Cerebral Cortex, a Cellular Mechanism for Attentional Pattern Learning
258 views · 7 hours ago
An Operating Principle of the Cerebral Cortex, and a Cellular Mechanism for Attentional Trial-and-Error Pattern Learning and Useful Classification Extraction. A feature of the brains of intelligent animals is the ability to learn to respond to an ensemble of active neuronal inputs with a behaviorally appropriate ensemble of active neuronal outputs. Previously, a hypothesis was proposed on how th...
Hallucinations Can Improve Large Language Models in Drug Discovery
223 views · 9 hours ago
Concerns about hallucinations in Large Language Models (LLMs) have been raised by researchers, yet their potential in areas where creativity is vital, such as drug discovery, merits exploration. In this paper, we come up with the hypothesis that hallucinations can improve LLMs in drug discovery. To verify this hypothesis, we use LLMs to describe the SMILES string of molecules in natural languag...
Qwen2.5-1M Technical Report
39 views · 9 hours ago
We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques such as long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning are employed to effectiv...
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer
716 views · 12 hours ago
As is known, hybrid quadratic and subquadratic attention models in multi-head architectures have surpassed both Transformer and Linear RNN models, with these works primarily focusing on reducing KV complexity and improving efficiency. For further research on expressiveness, we introduce our series of models distilled from Qwen 2.5, based on pure native RWKV-7 attention, which aims to make RNN ...
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
171 views · 12 hours ago
We propose Model Swarms, a collaborative search algorithm to adapt LLMs via swarm intelligence, the collective behavior guiding individual systems. Specifically, Model Swarms starts with a pool of LLM experts and a utility function. Guided by the best-found checkpoints across models, diverse LLM experts collaboratively move in the weight space and optimize a utility function representing model ...
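A generic particle-swarm-style sketch of the idea of moving experts through weight space toward the best-found checkpoints. Flattening each expert to a weight vector and the inertia/attraction coefficients below are assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np

def swarm_step(experts, velocities, personal_best, global_best, utility,
               inertia=0.5, c1=1.0, c2=1.0, rng=None):
    """One collaborative update: each expert's velocity is pulled toward its
    own best-found weights and the swarm's global best, then applied."""
    rng = rng or np.random.default_rng(0)  # seeded for reproducibility
    for i in range(len(experts)):
        r1, r2 = rng.random(), rng.random()
        velocities[i] = (inertia * velocities[i]
                         + c1 * r1 * (personal_best[i] - experts[i])
                         + c2 * r2 * (global_best - experts[i]))
        experts[i] = experts[i] + velocities[i]
        if utility(experts[i]) > utility(personal_best[i]):
            personal_best[i] = experts[i].copy()
    return experts, velocities, personal_best

# Toy usage: three "experts" in a 3-d weight space; utility rewards weights
# near the origin, standing in for any scalar model-quality score.
utility = lambda w: -np.sum(w ** 2)
experts = [np.ones(3) * k for k in (1.0, 2.0, 3.0)]
velocities = [np.zeros(3) for _ in experts]
personal_best = [e.copy() for e in experts]
global_best = max(experts, key=utility).copy()
experts, velocities, personal_best = swarm_step(
    experts, velocities, personal_best, global_best, utility)
```

The design point is that no gradients of the utility are needed; experts only need their weights and a shared scalar score, which is what lets heterogeneous checkpoints collaborate.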
Schrodinger's Memory: Large Language Models
67 views · 12 hours ago
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
83 views · 12 hours ago
Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier
122 views · 16 hours ago
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
2.2K views · 16 hours ago
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
547 views · 16 hours ago
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
51 views · 16 hours ago
Reverse Thinking Makes LLMs Stronger Reasoners
261 views · 16 hours ago
The Geometry of Concepts: Sparse Autoencoder Feature Structure
280 views · 19 hours ago
Does Prompt Formatting Have Any Impact on LLM Performance?
230 views · 19 hours ago
HybridFlow: A Flexible and Efficient RLHF Framework
112 views · 19 hours ago
Fugatto 1: Foundational Generative Audio Transformer Opus
55 views · 19 hours ago
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
128 views · 21 hours ago
GameFactory: Creating New Games with Generative Interactive Videos
148 views · 21 hours ago
TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space
114 views · 21 hours ago
These are AIs discussing this report
stop lying, you hater! You're just mad because you sound fat when you do voiceover
NotebookLM?
Bro using ai for explanation 😂
Really great! Every test I run, o3-mini surprises me more
wait a min 2 human like ai convo created using deepseek loll ???
Wow, amazing review! "we've gone from single neurons to these collaborative minicolumns, explored the role of the MTL, dived deep into free will and attention, and now we are connecting it all to language" 😃
Are the voices real humans or AI chatbots?
Chat bots... I would be surprised if they are not AI.
It’s pretty obvious they are AI. There aren’t any pauses in speech between speakers, no fillers like um, no flaws in language.
@@HurricaneEmily How funny would it be if it was just highly edited, all sliced up...
It could be highly edited but I doubt it. It sounds like the two voices are reading, not having a natural conversation. This is what bad actors sound like. There is a sense you get from humans that they are genuinely listening to what another person is saying when they're talking. That quality is completely lacking from this conversation, which makes it hard to listen to. I almost clicked off the video because it was so grating, but I was interested in the information. However, because it was delivered by AI, by a human who was willing to deceive viewers by not informing them that it was a conversation between two AIs, I'm highly skeptical of the truth behind it.
@@HurricaneEmily I don't think you can value information based on its presenter; it's too difficult to discern the truth that way. That being said, there is a lot of information as far as this "controlled chaos" goes, and the hallucinations they talk about are icebergs of information in themselves, but they are only discussed at a very surface, vague level, to the point where there's not much substance to the conversation, and therefore not much to be disputed, because "they" didn't really talk about much; they touched the tip of a few icebergs and said, "oooo, interesting". That being said, hallucinations in LLMs are real, controlled chaos has been a subject of philosophy, religion and mysticism since ancient times, and they could be interrelated. But that topic is so much bigger than this tiny video could ever encapsulate. Still, good points you made; the way the AI talks to itself is definitely not congruent with any conversation with real people that I have been a part of. It's as you say, like they are reading from books, not really talking to each other; they are more or less reading lines from a "script" or book that one person (or AI) wrote.
Wait is this a conversation between actual people or AI bots?
AI
1:15 math example.
2:00 Wait a second. I know these voices! #NotebookLM
Awesome work!
The windows sounds in the background hehe
You can use chat.deepseek.com/ to explain the final unified formulation presented in the paper's snapshot.
Who are the people talking? Do they have a TH-cam channel or similar?
Omg it's NotebookLM, I didn't know about it. I'm an experienced software engineer and I sometimes feel like a donkey in this new AI world
Please write the script yourself next time. Those sloppy AI jokes are horrible. From the first seconds: "Literal crystals?" Who would say something like that in a paper discussion for ML? This is AI slop. Use your own voice.
th-cam.com/video/GwiWDwgBsNw/w-d-xo.htmlsi=SQOr7snjaAv2-i1v this is my ongoing work.
Is there any other exam ?
frontier math
@Xiaoliu.x grateful to you
@@Xiaoliu.x can I have the link please
th-cam.com/video/J1GGd0T94qI/w-d-xo.html
I think you should rethink this. Get some coherence in your mental subway.
This sounds incredible! Does that mean universal translators might be possible? Hmmm... I also wonder how hardware developers should use this information. Maybe, if we assume that certain common structures will be created in the information, information might be accessed more efficiently if we design caches to be related to each other in similarly common ways. Thanks for posting this! Keeps me excited for school.
Any form of information is a kind of language.
Hi, nice review! Can you please review this: Rvachev (2024) An operating principle of the cerebral cortex, and a cellular mechanism for attentional trial-and-error pattern learning and useful classification extraction. Frontiers in Neural Circuits?
th-cam.com/video/8ipE0mRr1gQ/w-d-xo.html , i have tried my best to elaborate the paper.
Thank you, that's a very impressive review!
Good stuff! Could you please review this: Rvachev (2024) An operating principle of the cerebral cortex, and a cellular mechanism for attentional trial-and-error pattern learning and useful classification extraction. Frontiers in Neural Circuits?
It's AI generated - NOTEBOOK LLM by Google 😅
Thanks for chiming in on the bots!
what is this shit the narration is AI?
I didn’t know we could just post Notebook LM audio. I’ve had issues with the conversation not flowing, I noticed it in yours too. I want to see if I can get mine to teach, not just summarize the paper.
Yes, you probably can make it treat you as a student.
She sounds so sexy..
Hearing something smart discussed in a podcast format sounds as weird as seeing batman on a sunny day
I want to listen but I hate listening to AI voices, so I decided not to finish this video; maybe that is something you guys can take as feedback
I cant listen to this, they are too annoying
the girl jumping in is weird. cutting each other off. same thing for the guy. hard to listen to
exactly
its ai, they prompt each other through.
that's an interesting way to think.
Lol asking notebookllm to explain llms
What happened to your mike?
AI voice casters?
I'm curious, my thoughts too lol
Yeah, you'll get this by feeding the PDF to Google's Notebook LM.
New videos revealed.
Woww, this channel just got recommended to me and i must say it's one of the best unplanned things that happened to me today Keep publishing more🎉🎉
Thank you for your appreciation!
It's AI generated - NOTEBOOK LLM by Google 😅
@@ankitsrivastava513 ohhh😭😂
While having two noticeably distinct AI “characters” “discuss” a paper is “neat”, it’s still noticeable and it honestly makes the subject matter feel like it’s not important to the person who made this video, and that causes the effect of “I don’t care if you don’t care”
While listening to the podcast, I found it counterintuitive that the paper increased the number of denoising steps, and it took me considerable time with the proof to see that it made sense.
Yay!! Awesome stuff ❤
Interesting possibilities, but really needs a good web UI to be useful.
How were these two back-and-forth voices, the funny man and the straight man of the double act, made?
revealed in new episodes
Thanks for posting clips on good papers!
This is a very good channel.. Even as a beginner, you can still learn a lot from these videos.
See it, use it, conquer it.
This is freaking crazy!
This was very interesting. I will say, while it is certainly good to be an active listener, you dont have to interject after every one of her sentences to show that you are engaged. I used to do that a lot because I have RBF and people would often cut their stories short because they thought I didn't care, but I then had to fight against the compulsion because it made me seem very stressed and would stress out the other person as well. But again, very cool topic. And I love Terrence Tao.
Thank you. I am a little frustrated with this paper's readability, but it offers profound insights in this field.
Are those Ai voices? asking because the commentary seems bland
which part do you think the most bland?
It is. Probably Google's podcast summary app.
Yes! This is notebooklm's podcast summary feature
Dislike...
why?
Please stop with this nonsense... Every idiot on TH-cam is passing papers through NotebookLM and passing it off as original content. Get a life
Oh come on, do your own explanation instead of the AI one
No difference essentially
@@Xiaoliu.x Really? There's no difference between taking the time to understand a research paper and explain it yourself, and dragging and dropping a file into NotebookLM??? Any idiot can do the latter....
I think it's way better when using an AI; it's way easier to understand. Congrats to the author. Can I ask which tool you used for that?
The voiceover is so annoying to me... The idea is not bad, but the conversation feels so off. It's like the first person is as knowledgeable as the other person and they just complement each other. That is so distracting. I would then expect only one person to talk, or have the first person know nothing and ask simple questions about the subject while the other person, an expert, answers them. Then a dialog would make a lot of sense. In this case it does not make any sense. It's like one brain talking with two voices, ugh.
You have discovered the essence of the problem. It happens when content is lacking; this is a very short paper.
fascinating!
What reader are you using? And how is it putting the square box over the years like that...
Sorry, I am not so sure what you mean, but you could find out in the other videos.
I am talking about this at 1:54, the green box
@@Tiimezz3373 Just Microsoft Edge
Thank you
This seems neat. I think I daydreamed about one that uses a hypernetwork or something. But that was probably a naive and unwieldy approach. I like LLMs conceptually. I got into the Replika community back when ChatGPT was only freshly announced. I like how it's a mix of stage magic and metafiction/meta-nonfiction*. So yeah. It's neat ______ *because it's basically a model of the stats of the entire world of language, so it's like a self-aware metacommentary on how people talk. It's trippy.
hmm...intelligence is compression?
Tks.