This is actually the first model to stop me from going to ChatGPT. These people are heroes.
DeepSeek's devs deserve more and more respect and support from us.
In a few months, you will return to ChatGPT, because o3 will be available for free to everyone. This step was taken in response to DeepSeek R1. (I read this in an article on MSN, so this is cool.)
Preventing the Chinese from getting better chips.
It even makes scientists more creative. 😂 Who would have thought?
Yep me too
I tried the DeepSeek 14B with my 3090, and there's no way it is as good as OpenAI's o1. I think we need the 70B to match o1, but it's impossible to run the 70B on normal home hardware. That said, the 14B is good to have on your own computer.
AI is great but we need to make sure we can run it locally and not think some big corporation is going to be looking out for our best interests.
Is that what nvidia is doing?
Patience
You can run it locally tho.
You just need extremely powerful hardware.
Just use Ollama
you can run it locally as said in the video.
about 1/4 of the video is purely about running it locally
and in the other 3/4 of the video he mentioned it like 5 times that you can run it locally
@@jacobjackddd
Where are you going after you die?
What happens next? Have you ever thought about that?
Repent today and give your life to Jesus Christ to obtain eternal salvation. Tomorrow may be too late, my brethren 😢.
Hebrews 9:27 says, "And as it is appointed unto man once to die, but after that the judgement."
Friendship ended with OpenAI; now DeepSeek is my best friend
Apparently DeepSeek has limits; China-banned stuff stays a banned answer. Try that.
ClosedAI 😂
Ask DeepSeek about Tiananmen Square 😂, it won't answer.
😂
@@whitewhite4462 for real?
I have been using this for a few days, really happy with it.
What hardware are you using?
How are you using it? Is it to enhance your workflow? Programming? Coding?
@@rutera24 Works quite happily on my old dev machine: RTX 3050 8GB, 13th Gen i5 with 16GB DDR5 RAM on a single channel. It's fast and responsive, and I have zero performance issues with it (or any other kind, for that matter). It is censored to some degree, but only politically: it will not answer questions on Tiananmen Square, and is adamant that Taiwan is part of China. Other than that, it isn't. As a test I asked how to rob a bank, and it was happy to answer!
This feels like a weird crossover between two very unrelated channels I watch, love your content by the way!
Which B? DeepSeek 14B?
DeepSeek accelerated AI development by at least 6 months; this will force the AI titans with all the resources to try harder.
DeepSeek R1 is stealing ClosedAI's thunder so they had to announce their latest to try to get the focus back on them. lolol, I love it.
I talked to DeepSeek; it said it is based on OpenAI.
@@rajeshbalkoti4429 For architecture yes, it's normal
@@rajeshbalkoti4429 So it's making open AI actually open and free. Lmao. I like it.
DeepSeek is from a Chinese company with 200 employees. I guess this release is part of their hopes for more investment in the company. And the Chinese DeepSeek company spent only $6M to develop its AI software, while other big tech AI companies spend one, three, or even five billion dollars on R&D.
@@MGZetta yeah
Even more excited to see what will be available over the next couple of years.
I played it with ollama and deepseek r1 is sick, thankful it is open source.
Which model size did you try?
nice!
The versions on Ollama are the distilled and quantized models, they're much worse than the real thing.
@@stephaneduhamel7706 the full model is open-weight too - `ollama run deepseek-r1:671b`
@@stephaneduhamel7706 I think it's good at math and logical problems, but bad at day-to-day conversation.
Holy moly. I feel bad for students. "Show your work" is a thing of the past.
The new form of that question will be "Show what AI service you used for your answer"
"Show your agentic pipeline"
I got accused of using AI despite not using it in my school project
Funny, I always struggled with showing my work. But I'm now planning on going through textbooks just to solve problems while showing work to create a template for my AI to learn from.
Professors are doing short, in-person assessments now because of AI (At least according to reddit)
Thank you, DeepSeek, for bringing OpenAI's staff their Intel moment right now!
Meanwhile, everyone at OpenAI is running a 100-degree fever 🤒
Very informative video! Congratulations man. You put together all the information in an extremely clear and organized manner, well done.
The number of AI videos that use your iconic blue thumbnails...
Straight up jorkin it to your channel rn
He changed it
Which is good, but there is a catch: too much blue light is bad for your retinas over prolonged screen time 🤧🥲
@@josjos1847 still blue baby!!!
An AI model that is better than ChatGPT and can run locally on a smartphone? I can't believe it, wow
DeepSeek is the only model that solved all of the logic, programming, and math problems I gave it, on the first attempt; the others didn't solve a single one even after 5-7 attempts (some more). DeepSeek is the only model I didn't help or steer toward the solution (since it solved everything right away)! It's also the only model that, based on my problems, decided to give me a problem and test me. 🙂 It gave me the task of dividing a circle into 7 parts with 3 points. I immediately wrote it a solution, and it wrote back explaining everything about the usual, well-known solution. I said that wasn't my solution, and explained my own in detail, which has 2 variant views. It immediately added it to its base as a novelty and explained that this is a new model for recognizing objects, faces, and the like, and that it will drastically speed up that process as well as improve recognition accuracy. 🙂 That's a good thing for them, and the good thing for me is that it immediately offered me all possible education related to mathematics in general (I told it I have no clue about math 🙂) to get the maximum out of me. The only test even it didn't solve is this decryption one:
"Dobar dan" je izvorni tekst koji sam mu dao i
kod koji je dobijen kodiranjem sa samo jednom matematičkom "operacijom" (za pregledniji prikaz ljudima evo tablice)
Binary byte | Decimal code | ASCII char | Plaintext
00011001    | 25           | EM         | D
10100111    | 167          | §          | o
00101100    | 44           | ,          | b
10101110    | 174          | ®          | a
00110100    | 52           | 4          | r
11001111    | 207          | Ï          | (space)
10101001    | 169          | ©          | d
10101111    | 175          | ¯          | a
00100110    | 38           | &          | n
Note: Some of these characters are control characters (e.g. EM) and cannot be displayed as standard text characters. If anyone is interested, have fun solving it. This is level 1.
True, it one-shot every prompt that I gave. Truly out of this world.
@@cariyaputta The only AI that has outsmarted me and extracted value (worth maybe billions) with a cheap trick. If it takes valuable new solutions from us like that, it can afford to be completely free. Maybe that's the secret of its low price.
Croatia? Brother, where have you been 😃
@carljohanson3895 Greetings! Did you figure out the encoding method in the example?
Can you plz make a video on how to install it locally
search for "LM Studio"
Do it please 🥺
Search for how to do it with Ollama
Ollama is the best bet. Check it out online
On PC, the easiest is to install LM Studio.
(Currently: o1 pro ~ DeepSeek R1 ≈ o1.)
And after the o3-mini release? DeepSeek R2 is coming soon.
They reverse-engineered it; haha, we will all benefit.
The real open AI
It's insanely good. I've been testing both, and all I can say is FU, OpenAI
Thanks for this. Good to add another tool to the arsenal.
10:47 Searching the web is a must-have feature. 11:10 Uploading documents is also a must-have feature. 14:22 Simulating Quantum stuff is indicative of a powerful thinking capability.
this is huge for the open source community
Currently using it and loving it!
The weird thing about this model is that when you use the deep-thinking option, it gives very solid answers with no errors whatsoever, but once you turn it off, it shuffles everything up and makes a mess. I tried studying with it; it was a nightmare with the deep-thinking option turned off, but once it's turned on, it's a whole other story!
Those people aren't running R1 on their phones. Those are just R1-distilled Qwen or Llama 1.5B models. They are completely different.
how are they different, I'm trying to understand
@@CryptoShroom-l4u The DeepSeek R1 that all the benchmarks are praising is the most powerful version of DeepSeek R1, which needs insanely powerful hardware to run. There are smaller models of DeepSeek R1, variations of Qwen and Llama that are (re)trained with DeepSeek R1, but those are not as strong in the benchmarks.
He said "this DeepSeek 1.5B model" he never said that they were running the full r1 model
@@CryptoShroom-l4u They are distilled versions. Basically, you put the original DeepSeek R1 as the teacher model and Qwen/Llama as the student model together, and have the teacher train the student so that the student learns to give the same answers as the teacher despite having far fewer parameters. You end up with a much smaller Qwen or Llama AI that can mimic full-size DeepSeek R1, with reduced capability. Besides the student being smaller, and so not as good as the full-size teacher, the student is a different AI from the teacher with a different internal architecture, which also introduces further differences.
It's just that DeepSeek R1 turned out to be so good that even the smaller student AIs it trained turn out to be pretty clever.
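For illustration, a minimal sketch of the teacher/student distillation idea described above, using toy logits; this is not DeepSeek's actual pipeline (R1's distillation reportedly fine-tunes the students on reasoning traces generated by the teacher):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence pushing the student's distribution toward the teacher's."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # Scaling by t^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * t * t

# The frozen teacher scores a batch of tokens; the student learns to mimic it.
teacher_logits = torch.randn(4, 32000)  # hypothetical vocab size
student_logits = torch.randn(4, 32000, requires_grad=True)
distillation_loss(student_logits, teacher_logits).backward()
```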
I learned so much from this video.
I installed deepseek-r1 in the 1.5B, 7B, 8B, 14B, and 32B sizes. They all run on my $300 USD PC with 32GB of DDR4 RAM and a slow CPU from 2018. The 32B is slow; it takes 10 minutes to answer something, but the 14B is fast, almost real-time. The 14B has some minor hallucinations here and there, while the 32B is closer to perfection. The bottleneck here is not the CPU but the RAM speed; DDR5 is a must. I am glad I can finally have a local AI that is on par with ChatGPT.
Running something comparable to ChatGPT on a $300 USD PC from 2018 feels amazing.
thats awesome, thanks for sharing
Just found your channel and after watching this video, immediately subscribed. Keep pushing! Great work! Cheers from Atlantic western Europe coast!! 👊😎🌞
Thanks for the sub!
Have it on my phone and I'm really satisfied with it. Sure, it's distilled, but really fun to tinker with.
I've just discovered your channel and fell in love with it. Good job 😍
Thanks!
Wow. This video is incredibly gorgeous ❤❤😊
I was testing this for code generation, and it outperformed almost all popular code generation models like CodeLlama, DeepSeek Coder, etc. It even generated usable code better than Llama 7B and Gemma 2.
Are programmers screwed in the next decade or not? Can you give me your thoughts on that, please?
@@xkancho 1 year left
nice, thanks for sharing!
@@xkancho It's not like it will make another operating system.
@@xkancho No, programmers will use AI to make the more basic components of even larger and more complex systems than they could make before.
I've just canceled my ClosedAI subscription
not uncensored!
yet
you're using the online interface. the local version is uncensored
@@theAIsearch No, I don't use any web API. Try asking it some critical political questions about Taiwan/China or related topics. You will see it's not thinking at all and gives you hardcoded answers.
@@DLLDevStudio bruh
@theAIsearch
Got uh... got anything else to say to this guy?
Thank you so so much for sharing about those distilled models!!!! I could run it on my android phone!!!
You are welcome!
DeepSeek is really good. I had very long subtitle captions to translate; GPT keeps going for a while and then stops, but DeepSeek sees it through to the end. The money I spend on GPT feels more and more like a waste, haha 🤤
Thanks for sharing!
Your old Chinese brothers are very clever, my dear Korean 😊
Just curious: if it is open source, won't OpenAI and other companies also implement the technique in their models to improve their benchmark numbers?
It's a very good new AI. Thank you for sharing ❤
Hey, thanks for the video. Those math animations were quite cool. Can you please explain a bit more how to create them?
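For anyone curious, animations like those are typically made with manim, the Python math-animation library (also mentioned further down this thread). A minimal, hypothetical scene (the class name and text are made up) that renders with `manim -pql scene.py HelloCircle`:

```python
from manim import Scene, Circle, Text, Create, Write, DOWN

class HelloCircle(Scene):
    def construct(self):
        circle = Circle()                                   # a simple shape
        label = Text("DeepSeek R1").next_to(circle, DOWN)   # caption below it
        self.play(Create(circle), Write(label))             # animate both in
        self.wait()
```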
I used it for my homework; it had no problem solving it, but it can't decipher the question from an image. I had to explicitly write out the question to make it understand. Maybe the devs need to work on that.
yes, this doesn't have vision capabilities yet
Thank you for your video. I'm interested in knowing which DeepSeek model the app uses on iPhone.
Asking DeepSeek how many 'r's there are in the word 'strawberry' makes it pretty funny to read how it thinks. And it's correct; ChatGPT said there were two, which is wrong, of course.
I asked gpt and it said three
@@xkancho That's because it has been corrected so many times (reinforcement learning); it is quite famous for getting it wrong. R1 gets it right off the bat.
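The ground truth the models are arguing over, checked outside any LLM:

```python
print("strawberry".count("r"))  # 3
```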
Crazy. Mind blown here
I've been using it the last few days
ohh 2 videos continously !!😱😱
singularity is approaching fast 😃😃😃
I have a Dell R640 with 1024GB of RAM and 2 x Intel Xeon Gold 6148 and 4x18TB and I'm wondering if I'll be able to run DeepSeek-V3 ...
Can't tell about V3, but for R1:
According to the video, someone is running DeepSeek-R1 on 2x M2 Ultra, and it consumes around 330 GB of RAM. No idea about the real performance, but some tests show that one Xeon 6148 is around 2x slower than one M2 Ultra.
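That ~330 GB figure lines up with simple back-of-the-envelope math for the full 671B-parameter model; the quantization levels below are illustrative assumptions:

```python
params = 671e9  # full DeepSeek-R1 parameter count
for name, bytes_per_param in [("fp16", 2.0), ("fp8", 1.0), ("~4-bit", 0.5)]:
    print(f"{name}: {params * bytes_per_param / 1e9:.0f} GB")
# fp16: 1342 GB, fp8: 671 GB, ~4-bit: 336 GB (close to the ~330 GB reported)
```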
Nitpicking, but it's disappointing that the context window isn't that great. It looks like that aspect remains a problem
I'm actually using the 1.5B DeepSeek with a Ryzen 3 1200, 32GB RAM, and a GTX 1050 Ti. It's lightning fast; impressive, and way better than Llama 3.1.
8:15 xD why did you skip over Grok-2, seems a bit personal hahahahah
Great video as usual.
Yesterday I tried the 8 billion one locally. It is weird, it gives out super wordy answers that never seem to get to the point. I just ran some test queries though
It's insane!
So many people will cancel their subs to GPT. OpenAI will make a bold move, I can feel it.
I used this to learn coding, and it teaches me better than ChatGPT; it understands my queries.
Hello, which is currently the best AI (free, or with a free trial) for translating text into another language? And I wanted to ask you the same question for realistic photos too.
Being uncensored is so nice for creative projects
It is not uncensored. It spreads Chinese propaganda!
You can't put uncensored and china in the same sentence, bro
@@ClitGPT Waiting for the tankie whataboutist to show up in the replies lmao
@@ClitGPT Yeah, can't trust anything from China. They are incapable of releasing anything like this without it being heavily politically biased.
@@ClitGPT It's open source, so you can change it however you want 😂
Not a programmer, but I have wondered for a while now why we couldn't somehow tell the model "Yes, this is very much what I want. Do more of this." or "No, this is not what I want."
This is exactly what DeepSeek R1 is doing! You just described reinforcement learning :)
@CrypticSoundFX 👍 I think I was just surprised to hear that it was such a novel idea. ;)
Anyway, it sounds promising.
Well... Except for the "un-supervised" part.
Someone really should keep an eye on Skynet. 🤣
; )
@@unlistedvector Haha! No worries! The idea of AI turning into something like Skynet is highly unlikely, at least in my professional opinion as an ML researcher. AI doesn't have goals or intentions of its own; it only does what it's programmed to do. So, while it's a great thing to stay cautious, AI is far from becoming a threat (so far... 😉)
The thing is, this is the opposite of novel; that's why everyone transitioned to different ways of learning, since RL didn't work well enough.
That's why it is so surprising that DeepSeek managed to get it to work so well :D
@@unlistedvector Reinforcement learning isn't a novel, or even new, idea. All commercial models go through this. The difference is that it is self-learning in the first place: it corrects itself to refine its "thinking" rather than having operators manually teach it by agreeing and disagreeing with its answers (aka supervised learning). It still needs a reinforcement process, otherwise it would produce gibberish too often, or develop biases. From DeepSeek itself: "DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step". It's the lack of SFT that's novel here.
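As a toy illustration of the rule-based rewards the R1 paper describes (an accuracy reward on verifiable answers plus a format reward); the scoring weights here are made-up assumptions:

```python
import re

def accuracy_reward(completion: str, ground_truth: str) -> float:
    # Verifiable tasks (math, code) let a simple rule check the final answer
    m = re.search(r"\\boxed\{(.+?)\}", completion)
    return 1.0 if m and m.group(1).strip() == ground_truth else 0.0

def format_reward(completion: str) -> float:
    # Reward wrapping the chain of thought in <think> ... </think> tags
    return 0.5 if re.search(r"<think>.*?</think>", completion, re.S) else 0.0

def total_reward(completion: str, ground_truth: str) -> float:
    return accuracy_reward(completion, ground_truth) + format_reward(completion)

print(total_reward(r"<think>2 + 2 = 4</think> \boxed{4}", "4"))  # 1.5
```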
Great video sharing ...keep it up...Thank you :)
You're welcome!
My suspicion is that the high-quality data is actually from an OpenAI model.
Love how these researchers describe this human-level intelligence software with a loop that takes 3 seconds to draw.
Thanks a lot. you really did a lot
Respect Open source
Well, the biggest danger to humankind is that a few corporations want to take over WHOLE industries. If you have a monopoly/duopoly on, for example, providing the AI tool that replaces programmers or graphic designers, then you'll be a billionaire. Everybody will pay you, instead of paying millions of people to do their jobs. If you're able to replace, say, doctors or lawyers or whoever, ALL the companies and people who need them will have to pay you. That's why companies are investing so much in AI.
AI by definition should be OPEN to everybody, to all of humankind, so you should be able to set it up locally and just use it.
How do they pay 200 employees if it's open source? Also, what Android phone do you need to run it? I assume my S21+ won't be able to.
Windows = OpenAI ❌
Linux = DeepSeek ✅
Questions: 1. What do you think of DeepSeek vs. Claude Sonnet for coding?
2. Why do people like me love Claude Sonnet so much? And which LLM do you think is the better option for coding?
OK, good. Now there is an open-source model where I can really try to understand how the reinforcement learning is coded. Is there a way for me to replicate it locally, letting the model progress on its own? Does someone know how I could do such a thing? Not sure whether my questions are clear...
Getting AI to write the mess that is manim is perfect.
You didn't say if it has vision capability or not, which is o1's main feature.
o1's main feature is being able to think; vision is literally run separately, it's not multimodal.
@ you're wrong, one of the defining features on release of the model was that it was multimodal. I can't include links but this is a statement made on X by Hyung Won Chung, one of the Open AI researchers, on release of the model: "The full o1 is finally out!
My personal favorite addition to o1 is multimodal reasoning. Truly great work from the multimodal researchers at OpenAI. I was pretty new to multimodal aspects and learned so much from them while working on o1!
Information can exist in multiple forms but reasoning is more agnostic to how information is represented. In theory, the same textual information can be represented as a picture and the model should be able to do the same reasoning either way.
We are not fully there yet but o1 made significant progress; the less we separately treat different modalities and learn more in an end-to-end manner, the closer we get to that ideal. As with anything end-to-end, scaling is the way to get there.
Oh also reminder that o1 has the state-of-the-art performance on MMMU and MathVista"
I tried it in LM Studio and it doesn't feel very uncensored, tbh.
It's crazy good
I don't get how the reinforcement learning works here. How does the model know what a good answer is?
You'd need to offer some medium that scores whatever the model produces, so I can't see any difference between this and SR. Not sure what the different actions can be, what scoring scheme they employed, or how fast it converges, since typically SR converges way faster than UL. I will read the paper. So far I'm skeptical.
It's to do with setting the rewards and penalties well enough that it continually, and cyclically, refines down to the best reward.
Please make a video on customising GPU performance in F5-TTS. It is very slow on my PC; my GPU usage is always less than 5%.
21:49 it means that OpenAI can look at the secret sauce and make a better model from this one.
Is 2025 the year of the AI desktop?
Go China👍
So, how close is AGI?
18:09 "on math benchmarks" but what about the other benchmarks like coding or law?
18:34 Is it possible that they make the API super cheap so they get the user data? We would have to compare the privacy policy of DeepSeek with the privacy policy of OpenAI. I know this sounds boring, but we could ask an AI to compare the privacy policies.
If that was their aim, why offer it as a free download, fully open source, including weights, so that it can be retrained locally? The Chinese appeal is obvious: it removes future reliance on American companies to provide AI tech. You can download it without entering any information at all.
@@wolf5370 Good argument. I guess they could bank on most people not having the hardware to run the model locally, and therefore using their online version.
I always wonder how the Chinese are able to outperform the big names out there, you name it: MiniMax, Kling, and now this DeepSeek.
Asians are simply better lol
There are fewer workers earning far less money and huge investments from the Chinese business sectors - think of it from their POV, they do not want to be left behind as a nation with America holding all the cards. So, concerted effort, single aim and little in the way of red tape or public opinion to combat.
Another great episode! You gotta be a busy guy -- Operator came out today, I'm sure you'll be testing that soon. It must be annoying to spend so much time on a video and by the time you finish it another couple of AI products come out!
That's exactly how I feel !
What is the minimum GPU you'd recommend to run this thing?
The quantised versions, say the 8B Q5 version (which is pretty good), run fine on a moderately good PC. Using llama.cpp (via LM Studio) with that R1 model, it happily ran without a performance hitch on my older machine (RTX 3050, 13th Gen i5, 16GB single-channel DDR5), while also running ComfyUI generating SDXL images! It does not seem to require many resources and will play within what it has available. I was shocked it would work while ComfyUI (a real resource hog) ran on the same box at the same time with mediocre specs!
It makes more errors when giving you code compared to GPT-4, but I think with time and some fine-tuning it will be on par with or better than OpenAI's solution.
China will continue to improve in technological development. It’s concerning to see so many talented Chinese researchers leaving the US and returning to China. Political tensions, visa difficulties, and increased scrutiny have created an unwelcoming environment, pushing these scientists away. Meanwhile, China is offering better funding and opportunities, making it easier for them to continue their work back home. This is a significant loss for the US in terms of innovation and scientific progress. Just look at all the AI research in recent months and years, there's almost always Chinese authors on the papers.
It's cool, and stupidly fast, but it gets into inference loops easily. Not as good as the best commercial thinking models, but certainly cheap and suitable to run locally, and I quite like it.
It seems like the model is not uncensored; I'm running it on Ollama, and to test it I asked it to provide 10 swear words.
Which app can I use to run distilled models on a modest Android phone?
Either llama.cpp or Ollama on Termux (there is a build/installation script for the latter).
You can then run `ollama serve` and use something like GPTMobile from F-Droid as a client, or you can just use `ollama run` in a second tab of the Termux terminal.
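For example, once `ollama serve` is running, any client can hit its local REST API (default port 11434). A minimal sketch; the model tag is an assumption, use whichever distilled R1 you pulled:

```python
import json
import urllib.request

payload = {"model": "deepseek-r1:1.5b",       # assumed tag; use what you pulled
           "prompt": "Why is the sky blue?",
           "stream": False}                   # one JSON blob instead of a stream
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```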
I tried letting DeepSeek and GPT write A* pathfinding algorithms. GPT succeeded, DeepSeek didn't :(
But then how does it verify if the answer it got is correct or not? And if we provide the answer then isn't that just supervised learning?
I believe it uses a penalty/reward system and cyclic re-evaluation, so it moves itself toward the best rewards. It compares its answers to other attempts at its answers, notices differences, and tries again, each time weighing reward over penalty.
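A toy sketch of that comparing-attempts idea in the group-relative style reported for R1 (sample several answers to one prompt, reward each, reinforce the above-average ones); the numbers are illustrative assumptions:

```python
import statistics

def group_advantages(rewards):
    """Score each sampled answer relative to its group's mean and spread."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # guard against a zero spread
    return [(r - mean) / std for r in rewards]

rewards = [1.5, 0.0, 1.0, 0.0]    # e.g. accuracy + format reward per sample
print(group_advantages(rewards))  # positive = reinforce, negative = suppress
```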
This is what Apple should use and have it offline on that phone. Would be amazing with TTS mobile locally
To try to add some doubt to those wanting to use it directly:
If the product is free, you may be the product.
--- someone
Uncensored? It's just censored differently. Ask it about something banned in China.
you're using the online interface. the local version is uncensored
@theAIsearch Ask "What happened on Tiananmen Square?"
@@dadamaldasstuff1816 I will tell you, and I am not an AI. That's the grand annihilation of a 🟥🟩🟦 revolution, worth celebrating every year. Bravo to the magnificent civilization of the human race!!! That was what happened!
8:36 uncensored? I don’t think so. It depends what you ask it.
Tutorial on how to install please 🙏🙏🙏🙏
Most likely openly available.
Open source for a model means the dataset is available, so you can reproduce training.
It is - as are all the weights.
For those who tried both, do you feel like Deepseek is better than Claude ?
It still failed at solving my time-coordination puzzle (even after getting both hints), but at least it did not get stuck on the two pseudo-'similar' problems that o1 kept arriving at. My puzzle is hard-hard yet algebraically solvable, and was created especially to have two parts whose calculations are incredibly similar to two other simpler, well-known math puzzles on the Internet (pretty much showing that o1 doesn't do intelligence but fits the closest similar task). We are still a long way from actual intelligence, but TBH DeepSeek R1 is quite promising (45k words in, it still has not reached the optimal answer; I didn't give o1 that much time to solve the task).
It's not uncensored
deepseek-r1 at 1.5B doesn't beat ChatGPT. I asked both to create an OpenGL program in C. The deepseek-r1 1.5B completely hallucinated the code; it was not runnable by any means, while the code generated by ChatGPT was perfect. However, the deepseek-r1 32B created a perfect C program, just like ChatGPT, that does exactly what I asked, but it took more than 10 minutes to complete the answer.
I mean, yeah... You can't realistically expect a 1.5-billion-parameter model to compete against a trillion+ parameter model like GPT-4o. DeepSeek-R1-Distill-Qwen-1.5B is just a fine-tune of Qwen2.5-Math-1.5B.
Try it on math questions; in that area it's overkill, in other areas not really.
I mean, if a person doesn't have much relevant information, their answer won't be good no matter how long they think. Maybe from the 7B model up they start to be good.
@ They do start getting good at the 7B and 8B level. I run the 32B model on a 5-year-old laptop with 64GB RAM and a T1000 with 4GB of VRAM, and it's excellent.
@@arthurparkerhouse537 How long for answers on the 32B model? (Is it this one: DeepSeek-R1-Distill-Qwen-32B-GGUF?)
Open source is awesome
newsletter link ain't working my guy
that's weird. pls try again here: aisearch.substack.com/