This LLM Is WAY BETTER than I thought - Mistral Small 24B - 2501 Fully Tested
- Published Feb 10, 2025
- Mistral Small 24B 2501 is a surprisingly strong coding model, much better than I originally realized. This LLM requires a much lower temperature setting than most. I put the model up against itself at different temperature settings and found that temperature matters a lot for this model.
Note: the scoring at the top of the app currently has a bug that I didn't catch until after all the footage was done.
My Links 🔗
👉🏻 Subscribe: / @gosucoder
👉🏻 Twitter: x.com/adamwlarson
👉🏻 LinkedIn: / adamwilliamlarson
My computer specs
GPU: 7900xtx
CPU: 7800x3d
RAM: DDR5 6000 MHz
Media/Sponsorship Inquiries ✅
gosucoderyt@gmail.com
Links:
huggingface.co...
I am using a low temperature of 0.1 for DeepSeek and Qwen as well. For my coding purposes it makes a huge positive difference. I am running Mistral Small 3 on an ASRock DeskMeet mini PC with a Ryzen 5 8600G CPU, 64 GB of RAM (6,000 MHz), a Samsung 1 TB NVMe drive, and a 4 TB HDD. That system runs LLMs up to 48 GB and cost me only $900. Its maximum power consumption is 65 watts. Inference is not the fastest (around 16 TOPS) but sufficient for my coding and authoring purposes. The coding results are clean and of good quality, saving me a lot of time.
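For context on why a 0.1 temperature changes behavior so much: temperature divides the token logits before the softmax, so values below 1 sharpen the distribution toward the most likely token. A minimal Python sketch (the logit values here are made up for illustration, not taken from any real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into sampling probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]

# At temperature 1.0 the distribution stays relatively spread out;
# at 0.1 almost all probability mass collapses onto the top token.
p_default = softmax_with_temperature(logits, 1.0)
p_low = softmax_with_temperature(logits, 0.1)
print(p_default[0], p_low[0])
```

With these numbers the top token goes from roughly 63% of the mass at temperature 1.0 to essentially all of it at 0.1, which is why low-temperature coding runs feel so much more consistent.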
which model 7b or?
@sentinel-q6j Small 3 is new; it's 24B. Not sure if they really mean the whole thing?
No additional GPU?
thx for the temp! Nice channel...keep it up.
love your tests
I'm looking forward to the Roo code video with this model
Temperature matters, prompt matters, somehow even time of day matters... We see so many people testing just one prompt and concluding a model is good or bad, but you really have to dive into the statistics and experiment with the ideal temperature and prompt engineering.
I've noticed that too.
I had it code that standard letters falling and bouncing demo and it failed, but not by too much.
I use a very low temp that vLLM lets me set: 0.01.
I'm curious, is it worse than 0.15?
I need consistent output. 0.15 will not make it deterministic.
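The determinism point can be illustrated: at 0.15 the sampler still occasionally picks a runner-up token, while near-zero temperature effectively collapses to greedy (argmax) decoding, which is what gives repeatable output. A rough sketch with made-up logits:

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one token index from temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Hypothetical logits where tokens 0 and 1 are fairly close.
logits = [3.0, 2.5, 1.0]
rng = random.Random(0)

# At 0.01 the top token wins essentially every draw; at 0.15 the
# runner-up still gets picked now and then, so runs can diverge.
picks_001 = [sample_token(logits, 0.01, rng) for _ in range(1000)]
picks_015 = [sample_token(logits, 0.15, rng) for _ in range(1000)]
print(set(picks_001), set(picks_015))
```

So 0.15 is close to deterministic but not quite; if you need byte-identical reruns, near-zero temperature (or an explicit greedy mode) is the safer choice.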
Great work.
How does this compare with Deepseek R1.?
DeepSeek R1 is definitely better but it’s a lot bigger model.
I personally find higher temperatures should only be used for roleplaying or story writing, to allow for random and less focused responses, i.e. imaginative, emotive. For tasks I use a low temperature setting. If it's an untrained task I will slowly adjust upwards until it's perfect.
Please post more about this model, especially for agents.
It would be interesting to see what would happen if you gave each model three chances at each problem.
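Giving each model several chances per problem is essentially the pass@k idea: a problem counts as solved if any of k sampled attempts passes. A toy sketch of why three tries changes the picture, where `attempt` is a hypothetical stand-in for one scored model run and the 40% per-try success rate is invented for illustration:

```python
import random

def attempt(rng, success_rate=0.4):
    """Hypothetical stand-in for one model run scored pass/fail."""
    return rng.random() < success_rate

def solved_within(k, rng):
    """A problem counts as solved if any of k independent attempts passes."""
    return any(attempt(rng) for _ in range(k))

rng = random.Random(42)
trials = 10_000

# With a 40% per-try rate, three tries should solve roughly
# 1 - 0.6**3 = 78.4% of problems instead of 40%.
one_try = sum(solved_within(1, rng) for _ in range(trials)) / trials
three_tries = sum(solved_within(3, rng) for _ in range(trials)) / trials
print(one_try, three_tries)
```

The gap between one-shot and three-shot scores would also show how much of each model's failure rate is just sampling noise rather than a real capability limit.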
Also, in my experience LLM responses vary greatly by how good your prompt is, and the same exact prompt could be great on one model but be lackluster on another. Of course it would be very hard to test this objectively because the number of potential prompts one could use for any given problem is infinite, but I would be willing to bet you could get much better results from both models by using better prompts, and of course by having more of a back-and-forth conversation rather than relying on just one single prompt.
I prompted it to triple-check my physics and calculus homework. I don't know how they demonstrate these AI models on the toughest math problems that exist when it can't get sophomore-level university physics right. It's accurate about 1 in 5 times. So I have to use several different models to see how closely they agree, and then swap the answers from one model to another so they're checking each other's work. I'd give it about 90% accuracy when I'm done, which is still not acceptable in my opinion for what is essentially a calculator.
Yes I definitely agree with the prompt mattering a lot.
The snake game is actually nothing special. The first model I remember that could do it was Llama 3.1 8B! Only an 8B model, and that was many months ago. Now many other models can produce code for a snake game, Qwen 7B/14B definitely. At the moment I'm not impressed with this model. My favorite is Qwen2.5-14B-Instruct-1M, but I will test it a bit more to make sure.
but can it play crysis
Hahaha!
Maybe give a bit of context?
Do you mean on the model itself or ?