Do not use Llama-3 70B for these tasks ...
- Published on 20 Sep 2024
- A detailed analysis of the 1 million votes cast by the AI community on LLM performance opens up new insights into areas where LLMs outperform, and areas where you had better not use a particular LLM but opt for a better-performing one.
All rights with the authors:
What’s up with Llama 3? Arena data analysis
lmsys.org/blog...
#airesearch #ai #newtechnology
This is a great video! Really amazing explanation.
One of the best comments today! 😊
“of course, those people were wrong”…..hahahaha.
Finally, someone is laughing ! Success! 😂
Love how your critiques shred the populist AI community while providing useful info.
If an open-source LLM performs well for your particular use case, then, for me, it will always have my preference over a big monolithic closed-source LLM from ClosedAi!
I couldn't care less about friendliness. We can get that from low param models and use them to reform texts. Larger models should just care about reasoning above all else.
Now I know you are tripping. Unless I can't read that graph properly, you are trying to tell us that a 44-45% win rate is a big loss!
Especially as this is a 70b open weights model, while the others are all closed weights.
And as another commenter noted Llama 3 has only 4k context window so of course it will be poor at summarisation and other tests that rely on a long context.
We will be getting longer-context versions from Meta, as well as multimodal and huge-parameter models.
Llama 3 was trained on 8192 tokens 😂
@code4AI OK, it has an 8k token length; GPT4 Turbo has 128k, Claude 200K, Gemini 1000K+, so 16 times longer. My point still stands.
And I notice how you did not address my first point. Like I said, you are tripping.
I found it essentially useless and a waste of my time. I gave it a dataset of 10,000 lines with 22 variables and asked for summary statistics in cumulative blocks of 1000, 10 blocks in total. I re-posed this question about 8 times over several hours, and each time the answer was DRIBBLE. And that was a very easy task. Imagine giving it a slightly more difficult task like time series modelling. I will check the alternatives.
Maybe you should choose an appropriate tool for the task.
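For what it's worth, the task described above (summary statistics over cumulative blocks of 1000 rows) can be done deterministically in a few lines of ordinary code. This is only a rough sketch: the original dataset is not available, so the data below is synthetic, the function name `cumulative_block_stats` is made up for illustration, and "cumulative blocks" is assumed to mean the first 1000 rows, then the first 2000, and so on.

```python
import random
import statistics


def cumulative_block_stats(rows, block_size=1000):
    """Summarise each column over rows[:block_size], rows[:2*block_size], ...

    Hypothetical helper, not taken from the video or the comments.
    Returns a list of (rows_used, per_column_stats) tuples.
    """
    results = []
    for end in range(block_size, len(rows) + 1, block_size):
        # Transpose the first `end` rows into columns.
        columns = list(zip(*rows[:end]))
        stats = [
            {
                "mean": statistics.fmean(col),
                "stdev": statistics.stdev(col),
                "min": min(col),
                "max": max(col),
            }
            for col in columns
        ]
        results.append((end, stats))
    return results


# Synthetic stand-in for the commenter's data: 10,000 rows, 22 numeric variables.
rng = random.Random(0)
rows = [[rng.gauss(0.0, 1.0) for _ in range(22)] for _ in range(10_000)]

blocks = cumulative_block_stats(rows)
for rows_used, stats in blocks:
    # Print the mean of the first variable for each cumulative block.
    print(rows_used, round(stats[0]["mean"], 3))
```

With real data you would load the rows from a file instead of generating them; a dataframe library would make this shorter still, but the standard library is enough for a table of this size.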