Sonnet is a force to be reckoned with when it comes to coding.
I gave DeepThink a system prompt before I rewrote your confetti prompt. When you prompt Cline, you phrase it as "write a website in HTML, CSS, and JS", but in your test you only say "you can use CSS and JS". I got the confetti in one shot; just press Run HTML in the window. Let's add a 2nd test, update the questions, and A/B test the prompts to really test these models.
Oh lord! Just tested it, it's wild! Loving it! I'm hoping the API cost stays the same as now; if it does, I will forget Heroku quickly!
What do you mean, you will forget Heroku?
Do they also have some kind of Bedrock alternative?
@@ANSHU61936 I meant Claude Haiku, sorry! Wrote too quickly 🙏
I'm curious how you would drop Heroku, and why you could do that once DeepSeek's API costs so little.
@@vdbv0 Same question again: why are you dropping Heroku? Do they also provide some kind of AI model?
Now this is interesting to see. Finally, a new model showing highly promising results. Well, let's see what I think of it. Also, forgive me if this is a bit chaotically structured; I am writing it as I watch the video. With that out of the way, let us get started!
As weird as it is, I would consider test one neither a fail nor a pass, as what the model went through eerily resembled a human being stunned by a question and not seeing the logical answer immediately. It is hard to say how this can be improved, but I theorise the problem may solve itself once the model is given more time to think without rushing. Maybe even having it change perspectives at some point?
Moving on to test #4, we can tell that it objectively failed, but the reasoning chain was obviously halted prematurely, presumably by the system itself to limit the number of tokens spent on thinking. A smart choice by DeepSeek, yet obviously a performance limiter in cases like these. I would love to see what the answer would be if it were given as much time as it wants. Then again, if they release the model open source down the line, these compute problems can be solved easily by the users themselves (of course, assuming it is not an absurdly big model, which unfortunately does seem to be the case here. I would love to be incorrect about the size, though.)
Then again, more compute time will not solve all the problems, as we can see from test #9, where it was unable to create a proper website for the confetti. Unfortunately, though, we did not see the code perform outside of DeepSeek's own environment, which may itself be the limiting factor in this case, not the model. For more rigour, the code should also have been run through more conventional means just in case, much like how the Python code was executed externally. (Also, do pardon me if my assumptions about DeepSeek's environment are incorrect; I am not that familiar with web frameworks or their execution.) I would say something similar for test #12, but I did not catch whether the DeepSeek environment was used, so I am forced into mere speculation here.
Sorry for the long paragraph, but moving on to test #11, I would consider it a fail from an artistic perspective, though the model was most likely not trained on SVG creation, so the expected potential is rather low. However, it is still impressive that it created the general shape of a butterfly.
All in all, a very, very exciting model. Especially if it is able to be used on most systems.
The AI's response to Test #3 was correct, but it would have been interesting if it had speculated further that C was possibly the other person playing table tennis with E, unless E was playing solo with the table against the wall.
Interesting comparison. I would love to see the API come out so we can implement it in our own apps.
Well, this is a great result for a model named "Lite" that can almost beat o1, not just o1-mini. I'm fairly sure the "Large" one might beat Sonnet as well, so we could have the greatest open-source model, and much, much cheaper than Sonnet. Can't wait for another brilliant move from this company.
Have you tested AYA? Great for structured outputs.
WOW. Finally, DeepSeek!
While the "thinking outside the box" is impressive, I think the AI failed Test #1 for two reasons. First, the AI said "there doesn't appear to be any country with an official English name that ends in 'lia'."
good one, this will really be a game changer
With regards to question 2: shouldn't C be playing table tennis with E? If no one else is in the house and C is not playing, who is E playing with?
Claude 3.5 Sonnet is the best choice for coding.
First to view, first to comment.
This is quite impressive. I have used it, and the results are amazing.
Nice, open models are getting close. Can't wait until we can run Cline locally and do full-stack applications with no limits.
I really like the art style of the images at the beginning of your videos. Would you mind sharing the prompt to get this art style? It would be greatly appreciated.
It's very basic. Something like "A panda in a forest, in front of a campfire, cinematic, anime style".
Great job. I disagree with the opinion about CoT and coding. In the case of complicated architectures, thinking step by step should provide better results.
Logically, your 3rd question has a better answer than "unknown".
Statement 1 says there are five people in a house and names them. Four of them are given activities, with the 5th (C) not being mentioned.
E, however, is playing table tennis, a two-player game. Logically, E is playing with C, because there are five people in the house and table tennis cannot be played alone.
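The elimination argument above can be sketched in a few lines of Python. Note that the specific activities for A, B, and D are assumptions made up for illustration; only E's table tennis and C's missing activity matter to the argument:

```python
# Hypothetical reconstruction of the puzzle: five named people, four of
# whom are given activities. Activities other than E's are assumed here.
people = {"A", "B", "C", "D", "E"}
activities = {
    "A": "reading",        # assumed
    "B": "cooking",        # assumed
    "D": "watching TV",    # assumed
    "E": "table tennis",   # stated in the question
}

# Whoever has no stated activity is the only candidate partner.
unassigned = people - set(activities)
assert unassigned == {"C"}, "only C is left without an activity"

# Table tennis needs two players, so E's partner must be C.
partner = unassigned.pop()
print(f"E's table tennis partner: {partner}")  # E's table tennis partner: C
```

The set difference does the whole elimination: since four of the five people are accounted for, the single leftover element is forced to be the partner.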
I wonder if the answer to question 3 should be "playing table tennis". It's hard to imagine that E is playing table tennis solo, right?
I'm having trouble creating a markdown file of the PixiJS API. Something about the URL syntax prevents it from being scraped. Any advice?
Can you test the new Mistral Large 2411?
An open-source model this good is crazy. I just hope it's not like 500 GB.
It's okay even at 500 GB; if we can't run it locally, we can use theirs, which is crazy cheap compared to Sonnet and GPT.
How do you use this open-source model in aider or VS Code extensions/apps?
You need to pay to use their API.
E
Edit: I just now noticed I'm the top commenter with the most hearted comments; that is a pog.
Suppose we combine it with the coder model☠️
I think you need a new benchmark 😂
😊
First! And bro, can you answer how to make aider detect my local project?
It should probably detect it automatically.
Are you running it in the project folder/repository?
@@AICodeKing I am using Linux and it does not. I think I should do /save then /load, no?
@@Gorops yep
Yoooooo😂🎉
OpenAI has no moat.
Haha, 8th one to comment. It would be cool if you showed how we can access these LLM models for free without any limits.
Well, just install it in Ollama or LM Studio and use it locally. But be sure you have a great GPU, or it will perform very slowly.