5 Easy Ways to Help LLMs Reason
- Published 26 Jan 2025
- 5 Effective Strategies to Enhance LLM Reasoning: If your LLM (whether an open-source Llama 3 or a proprietary GPT-4 omni) fails at reasoning on your task, I introduce 5 easy methods to significantly improve its reasoning capability.
Boost LLM Reasoning: 5 methods w/o fine-tuning LLMs
From Chain-of-Thought to Tree-of-Thoughts, Graph-of-Thoughts, and Abstraction-of-Thought, up to my own experimental Hybrid-Graph-Abstraction-of-Thought for real complex causal reasoning. I use the simple example of optimizing the traffic light system in a city with a naive AI that has not been pre-trained on complex reasoning decisions for PLANNING complex systems.
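As a minimal taste of one of these techniques, here is an illustrative Tree-of-Thoughts style loop. `ask_llm` is a hypothetical stand-in for whatever chat-completion client you use; the mock body below only exists so the control flow runs end to end, and this greedy variant is a simplification, not the method from the paper:

```python
import random

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-completion call.
    # Replace with your own client; this mock returns toy output
    # so the loop below can actually be executed.
    if "Rate this step" in prompt:
        return str(random.randint(1, 10))
    return f"thought({random.randint(0, 999)})"

def tree_of_thoughts(task: str, breadth: int = 3, depth: int = 2) -> str:
    """Expand `breadth` candidate thoughts per step, keep the best one,
    and repeat for `depth` steps -- a greedy, minimal ToT variant."""
    state = task
    for _ in range(depth):
        candidates = [
            ask_llm(f"Task: {task}\nPlan so far: {state}\n"
                    f"Propose the next reasoning step ({i + 1} of {breadth}).")
            for i in range(breadth)
        ]
        # Self-evaluation: let the model score each candidate 1-10.
        scores = [
            float(ask_llm(f"Rate this step from 1-10 (number only):\n{c}"))
            for c in candidates
        ]
        state = candidates[scores.index(max(scores))]
    return state

# The running example from the video: planning a city's traffic lights.
print(tree_of_thoughts("Optimize the traffic light schedule for a city grid."))
```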
All rights w/ authors:
Abstraction-of-Thought Makes Language Models Better Reasoners
arxiv.org/pdf/...
#airesearch
#newtechnology
#science
Excellent! We need more like this, please.
I know just enough to be dangerous, as the saying goes, but here's an idea I had a while ago. Instead of building on sequential data like LLMs, or operating all at once on a predefined grid like diffusion-based image generators, what if you had something based on the Wave Function Collapse algorithm (the game-map-generation one, not quantum mechanics), but applied to a free-form graph structure, with both nodes and edges being tokens? The graph is initialized with the prompt (and context), with whatever dangling edges they imply, and as the AI thinks, the probabilities of adding, removing, and replacing tokens are adjusted (keeping the prompt and context frozen, obviously), until a cluster with a good enough score (including self-consistency and compatibility with the prompt cluster's edges) is found; that gets interpreted as the output in whatever modality the graph it forms represents. Something like this could allow the AI to intuit (is that a word?) an end goal that starts disconnected (the void could be considered its own node token with edges compatible with all tokens, always having dangling edges, but special in that it would not count as belonging to any cluster), and then gradually deduce a way to connect it to the context+prompt cluster, and think in a more flexible way, with parallel hypotheses, inherent self-consistency checking mechanisms, etc., right?
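To make the mechanism concrete, here is a toy sketch of WFC-style constraint propagation over a graph of token slots. The token set, compatibility rules, and graph are all made up for illustration, and a real version would restart or backtrack on contradictions instead of keeping the old domain:

```python
import random

TOKENS = {"road", "car", "light", "stop", "go"}
# Which tokens may sit on the two ends of an edge (symmetric, toy rules).
COMPATIBLE = {
    ("road", "car"), ("car", "light"), ("light", "stop"), ("light", "go"),
}

def ok(a: str, b: str) -> bool:
    return (a, b) in COMPATIBLE or (b, a) in COMPATIBLE or a == b

def collapse(graph: dict[int, list[int]], seed: dict[int, str]) -> dict[int, str]:
    """graph: node -> neighbors; seed: frozen prompt nodes (never changed)."""
    domains = {n: ({seed[n]} if n in seed else set(TOKENS)) for n in graph}
    while any(len(d) > 1 for d in domains.values()):
        # Collapse the most constrained undecided node (lowest entropy).
        node = min((n for n in domains if len(domains[n]) > 1),
                   key=lambda n: len(domains[n]))
        domains[node] = {random.choice(sorted(domains[node]))}
        # Propagate: prune neighbor domains to edge-compatible tokens.
        stack = [node]
        while stack:
            cur = stack.pop()
            for nb in graph[cur]:
                pruned = {t for t in domains[nb]
                          if any(ok(t, c) for c in domains[cur])}
                # Guard against empty domains; a fuller version would
                # restart here, as WFC map generators do on contradiction.
                if pruned != domains[nb] and pruned:
                    domains[nb] = pruned
                    stack.append(nb)
    return {n: next(iter(d)) for n, d in domains.items()}

# Tiny 4-node chain; node 0 is the frozen "prompt" cluster.
g = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(collapse(g, seed={0: "road"}))
```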
I will be thinking about this for the next 72 hours. I will come back with a reply, but as far as I can tell, I am in for this!
You asked the wrong expert. Lean4 is a programming language for mathematical theorem proving. Your data-cleaning gateway must pass through this transform to ensure that the data and its use are unambiguous.
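For readers unfamiliar with it, this is roughly the smallest possible Lean 4 example; once the kernel accepts it, the statement is unambiguous by construction:

```lean
-- Minimal Lean 4 example: a statement the kernel checks mechanically.
-- If this compiles, commutativity of addition on Nat holds, full stop;
-- there is no fuzziness left to misread.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```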
Hybrid-Graph-Abstraction-of-Thought, that's cool. Do you find it works better?
Correct me if I am wrong, but are we focusing too much on perfecting linear-algebra-based models rather than abstract and universal algebra concepts? In graph theory, when related to abstract algebra, data is represented as triangles building higher-dimensional structures from lower-dimensional nodes, from nodes up to complex graphs and their evolution. There may be better concepts in mathematics for solving mathematical-cognition-based reasoning that deep learning experts are not exploring.
So what is happening? I mean, I know it takes a huge amount of resources, but so did transformers in the beginning.
Imagine the psychological shock if big tech announced that their beloved AI systems, in which they invested billions of dollars, fail at simple logic tests before they can actually materialize a return on their investment. Because until now, only the short-term financial valuation of these companies went up, based on what they promised for future AI. Imagine what Microsoft would be worth without AI, without Copilot Plus, ... only based on the value of Win11? Smile.
Using my limited understanding: the main problem seems to be the fuzziness of human language. Why not let the systems develop their own precise data/command language that they use to interact (with their agents, for internal communication)? That would normalize the input into a precise perception of the situation/context. We receive visual (looking, reading), audible, and other sensory information and translate it into meta-information. Watching a real-time situation vs. reading a description of it ends up very similar in our perception. Seeing a tiger jumping out of the bushes or reading about it naturally requires different actions, but there is a preprocessing, context-establishing step prior to our processing. Translating the input and building a context this way would remove a lot of noise and ambiguity. This, expressed in a meta language that is the only channel feeding the generation process, may already enhance objectivity and precision in the input phase. Missing context, safety concerns, etc. may be requested/verified in the translation phase, before anything ever reaches the generation process. It may reduce the sample and parameter count of the system significantly, as a result of the normalization in the preprocessing phase.
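A toy sketch of what such a normalized meta-language frame could look like; every name and field here is hypothetical, and a real normalizer would be a learned model rather than these hand-written rules:

```python
from dataclasses import dataclass, field

@dataclass
class ContextFrame:
    situation: str                 # normalized event, e.g. "tiger_approach"
    modality: str                  # "visual" | "text" | "audio"
    urgency: int                   # 0 (none) .. 3 (act now)
    missing: list[str] = field(default_factory=list)  # info to request back

def normalize(raw: str, modality: str) -> ContextFrame:
    """Translate raw sensory/text input into the meta language.
    Rule-based only for demonstration purposes."""
    if "tiger" in raw.lower():
        # Same event token, but the required reaction depends on modality.
        urgency = 3 if modality == "visual" else 1
        return ContextFrame("tiger_approach", modality, urgency)
    return ContextFrame("unknown", modality, 0, missing=["situation type"])

# Seeing vs. reading about the tiger yields the same situation token,
# but different urgency -- the ambiguity is resolved before generation.
print(normalize("A tiger jumps out of the bushes!", "visual"))
print(normalize("The book describes a tiger jumping out.", "text"))
```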
Critiques?
Hermes-Pro-Llama-3 with my custom sampler (kind of a beam search with restarts) solves your prompt:
Generated completion:
Let us assume that the official fees for sending one child to Stanford is $X. Now, since Stanford pays 90% of these fees for families with low income, it would mean that the parents need to cover the remaining 10% of the fees. So, they would have to pay $0.1 * X for each child.
< .... a useless calculation ... >
Dividing both sides by 6, we find T = 0.1 / 6 ≈ 0.0167 years or about 5 days.
However, since the time it takes to accumulate the required funds is measured in years, they won't have enough money for the 7th child with no additional income. They would need some savings or outside financial help.
----
In the end, the model reached the right conclusion.
Now if we enter the arena of additional prompt instructions to my innocent little prompt: An A* approach would be more elegant than beam search.
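For readers curious what "beam search with restarts" might mean in practice, here is an illustrative guess; it is not the commenter's actual sampler, and `next_token_logprobs` is a hypothetical hook you would wire to any model that exposes per-token log-probs:

```python
import math
import random

VOCAB = ["they", "need", "more", "savings", "income", "."]

def next_token_logprobs(prefix: list[str]) -> dict[str, float]:
    # Toy stand-in: pseudo-random but fixed per prefix within a run,
    # so different beams see different continuations.
    rng = random.Random(hash(tuple(prefix)) & 0xFFFF)
    raw = {t: rng.random() for t in VOCAB}
    z = math.log(sum(math.exp(v) for v in raw.values()))
    return {t: v - z for t, v in raw.items()}  # normalized log-probs

def beam_search(prompt: list[str], width: int = 3, steps: int = 5,
                rng: random.Random | None = None) -> tuple[float, list[str]]:
    rng = rng or random.Random()
    beams = [(0.0, prompt)]  # (cumulative log-prob, token sequence)
    for _ in range(steps):
        expanded = []
        for score, seq in beams:
            lps = next_token_logprobs(seq)
            # Stochastic expansion: sample candidates instead of pure
            # top-k, so each restart explores different branches.
            for tok in rng.sample(list(lps), k=min(width, len(lps))):
                expanded.append((score + lps[tok], seq + [tok]))
        beams = sorted(expanded, key=lambda b: b[0], reverse=True)[:width]
    return beams[0]

def beam_with_restarts(prompt: list[str], restarts: int = 5):
    # Re-run the stochastic search with fresh seeds; keep the best beam.
    runs = [beam_search(prompt, rng=random.Random(seed))
            for seed in range(restarts)]
    return max(runs, key=lambda b: b[0])

score, tokens = beam_with_restarts(["the", "parents"])
print(f"{score:.3f}", " ".join(tokens))
```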