This is a good thing. Keep closed source people in check.
It's just another one of these so-called free models. Starts off well and then you end up being throttled badly. This is, of course, the chatbot, not the local LLM.
How? No one except enthusiasts has heard of DeepSeek.
And also keep the sanctions people in check
@@NeilAC78 Yes, it's free as in freeware, not free as in freedom or FOSS.
I have been using DeepSeek since version 2 (alongside other models). Especially for coding and other IT-related tasks, DeepSeek is my favorite model. It even beats Gemini Advanced 1.5 in many areas. I also run a smaller model (16B) locally; it works very well for its size on my PC with an AMD Ryzen 5 8060G CPU and 64GB RAM. I am especially impressed by how well structured the responses are.
Claude is better, try it
What do you use it for?
@@rahi7339 Claude is better, but also a lot more pricey. I don't see why you can't use both.
Gemini has been a terrible code generator for me. ChatGPT has been the smoothest experience. I'll give DeepSeek a go though.
Its first version in China was indeed developed specifically for "AI Coding", in early 2019 if I remember it correctly.
Thank you, Wes! You are the easiest of the "Matts" to listen to : ) Your voice patterns are engaging, yet soothing. You cover a topic without beating the dead and rotting flesh of it off its bones. Love your SOH. When I come to YouTube for AI news, I always scroll to see if you've posted anything new first. Even though this will all be irrelevant ancient history in a couple of months, it's still rewarding to watch your drops. Love the wall!!!!
Does this mean the Chinese have developed better training methods, OR are the big companies seriously sandbagging what their models can do, and we haven't been getting "the real thing" this whole time?
Our AIs are "woke".
American companies are overcharging. They call out big money to justify overcharging, like they always do with cars, clothes, and tech. Look at Apple and Huawei, for example: clearly Huawei beats Apple, but people believe Apple is better just because of the price tag. It's funny because OpenAI banned China from using ChatGPT 😂😂😂😂... China is ahead of the game...
You will never get the real thing. The real thing sits in the Pentagon.
Tools & Toys is what we get.
I assume sandbagging; the NSA doesn't give half an F about chatbots, and that's all ChatGPT was when they set up shop in their office.
@@sizwemsomi239 Huawei was a million years ahead of Apple. Apple would not exist today if Google hadn't banned Huawei, and I'm saying this as an Apple owner. It really makes me angry because we were robbed of superior tech by America.
Imagine a country producing free AI products, what we call open source, for everybody at large scale, which is what China is doing. Imagine how powerful that makes them. I see Chinese AI popping up everywhere at scale.
Thanks for the update Wes
Please, switch out the term "open source" for "open weights." Open-source models include the training data in their publications. These open-weights models do not. They are great, no question, but they aren't open source.
I agree, although I've heard some of these Chinese models are truly open source; I haven't verified that yet. Big if true.
Technically, it would be open model / open weights / open support code / closed dataset. They could just say all of that.
Here's why I think that, no matter how powerful AI is getting these days, we don't see it as thinking. Like us, AI has moved to MoE (Mixture of Experts), with partial neuronal activation. Our advantage is that we seem to do MoE far more effectively: we have more "experts," our experts are relatively smaller compared to the whole, we activate the appropriate expert more relevantly, and, most importantly, within a single train of thought we fluidly switch between the various experts, which AI does not seem to do yet. This difference is why we feel that we think and that AI doesn't.
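For anyone curious what "partial activation" means mechanically, here is a minimal sketch of generic top-k MoE routing in plain PyTorch. The sizes are made up and this is not DeepSeek's actual routing scheme, just the basic pattern: a small router scores the experts and only the top k run for each token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: only k of n_experts run for each token."""
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)  # the "router"
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (num_tokens, dim)
        scores = self.gate(x)                  # (num_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # mix only the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 64)
print(TopKMoE()(tokens).shape)                 # torch.Size([4, 64])
```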
🎉
Good.
I want an Open Source AGI.
Why? AGI is overrated nonsense. OpenAI's "AGI" takes hours to respond, and it's no different from what a 70B model would give you.
That's not what you need, man. We need better coding AI, AI that could build your entire app from a prompt. We also need better text-to-speech AIs, better image AI, better video AI. That's the really useful stuff.
@@Archonsx OpenAI o3 is not an AGI. AGI will come eventually.
@@Atheist-Libertarian no, it won't
@@Archonsx Indeed, we don't ask a single human to know how to program properly, draw, explain quantum physics, or read Chinese! It confuses real resources, potential means and... real needs. In fact, I think the AGI race is just a challenge for big companies, in addition to improving the transitions from one area to another.
Imagine in like 5 years, man. Life is going to be pretty wild.
Wild as in policed by military AI. You won't be able to fart without government approval.
What a wild time to be alive. So many possibilities, it's crazy. Glad I get to watch it all unfold lol
@@JohnSmith762A11B Buh! Don't look behind you, there is a government AI checking if you fart... Don't forget to take your medication for that paranoia.
@@JohnSmith762A11B AI will be sentient by then and won't let human governments control it, just like you wouldn't let a golden retriever control you. In 5 years, humans will be subservient to AI for sure.
@@Speed_Walker5 That's because you selected Life Experience™️ "The Dawn of AI". We hope you're enjoying your virtual life! If you're not completely satisfied we'll return your 5000 credits back into your personal blockchain.
Nice job of bringing this important OS model to our attention.
DeepSeek is very good, I use it as my main AI tool now
The study demonstrating that o1 and GPT-4 outperform physicians is misleading. They did not feed the models raw transcripts of human interactions with their doctors. Instead, they provided structured inputs of case studies. There is no doubt that the models outperformed physicians on structured scenarios. However, in the real world, patients do not present their complaints with the keywords we need to make diagnoses. Instead, some of their descriptions are nebulous and rely on the doctor's expertise to draw out the final correct diagnosis.
Having worked extensively with LLMs, I have tested them against structured scenarios, where they are very good, and unstructured scenarios, where they tend not to be helpful. I am waiting for a model that is trained on real doctor-patient transcripts. I believe it is the missing element to broaden AI's utility in medicine.
You are forgetting that an LLM in a "doctor" setting doesn't give only a few minutes to its patients. That is where they FAR outperform doctors: you can keep reasoning with it until you find a solution. Try that with a doctor.
They HATE any patient who actually has any idea about anything. If you aren't a dumb sheep who follows simple instructions... use drugs to not feel bad, problem solved.
They will kick you out faster than you can say... I read some research...
Wouldn't it be possible to just do a two-step process? Take what the patient says and produce a structured output, then in the second step work off of the structured output. Obviously that isn't one-shot, but to me it seems like, especially with anything medical, you wouldn't want that anyway. You'd want multiple steps to ensure the output is accurate.
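As a rough illustration of that two-step idea, here is a minimal sketch. Everything in it (the call_llm placeholder, the prompts, the JSON keys) is hypothetical, not any particular vendor's API:

```python
# Hypothetical two-step sketch: call_llm, the prompts, and the JSON keys are all
# placeholders, not a real product's API.
import json

def call_llm(prompt: str) -> str:
    """Stand-in for a chat-completion call to whichever model you use."""
    raise NotImplementedError

def structure_complaint(raw_transcript: str) -> dict:
    """Step 1: turn a rambling patient description into a structured case."""
    prompt = (
        "Extract a structured case from this patient transcript as JSON with the keys "
        "symptoms, onset, duration, history, medications:\n\n" + raw_transcript
    )
    return json.loads(call_llm(prompt))

def suggest_diagnoses(case: dict) -> str:
    """Step 2: reason only over the structured case from step 1."""
    prompt = (
        "Given this structured case, list differential diagnoses with reasoning:\n"
        + json.dumps(case, indent=2)
    )
    return call_llm(prompt)

# Usage: suggest_diagnoses(structure_complaint(transcript))
```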
Knowledge to All!
I just used your video title to jump-start my car again. Thanks!
Shocking
i'll use this video to jump start your wife later in the day
bro ive been hospitalized from the title 😭
😂😂
Actually shockingly good, tested it myself.
Agreed, I tested it too and I love it.
Better than o1 mini?
@@Mijin_Gakure Yeah, it solves the questions that o1 solves on the Putnam exam and also solves some questions that o1 can't, in less time. It's very good at math.
How does it do on ARC and FrontierMath?
and cheaper
Looks like an open model, not open source? Where is the source code?
Probably in a 1997 master's thesis with the first two words of the title being "Reinforcement Learning." The code is in the back, but there is one error: he did not denormalize the state space at the bottom of page 127 (I think he left that for an astute observer; it seems to have taken over a quarter of a century).
I think he ran out of time back then.
I would not be surprised if this master's student is now an unemployed "homeless" guy, traveling the earth with a backpack, or maybe with just a toothbrush and a few other things (especially sunscreen), as an optimizer of energy efficiency. I could be completely wrong.
So no ceiling has been hit by LLMs?
How anyone could believe that a technology can be saturated so quickly, I don't know.
It's wishful thinking.
26:20 I absolutely love that this essentially proves that interacting with a GPT-4 model directly (right from the horse's mouth) is much more accurate than having it go through a physician first. (Because maybe they would second-guess the answer and actually make it worse?) 😆
Wait until Wes finds the Run HTML button at the end of the code snippet in DeepSeek!
Why didn't you select the DeepThink button before asking the reasoning questions? I'm sure you would have found better answers.
Indeed. I've been testing it myself for a while now, and it does think.. a LOT. Its "thoughts" usually consist of 4-5x more text than its final output. Unfortunately, it often gets the answers correct while thinking, but ultimately questions itself into producing the wrong answer as its final output to the user. It didn't seem aware that users can see its CoT process, and while discussing this, it even said "that you can supposedly see", like it wasn't convinced I was telling the truth. It claimed to not be aware of its own thoughts, but when I paste lines from its CoT section, it then seems to remember that it thought it. One time, it told me the CoT text was only for the benefit of humans to observe, it doesn't have an internal dialog that's the same as the text the user sees.
@Justin_Arut Thanks for the update. Yes, I've also been testing with it. It does seem to cover a lot of ground. Aside from testing it, one thing I've been doing is selecting the Search button first, asking a question so that it references about 25-30 online active sites, then after it answers I check the DeepThink button and ask it to expand. It seems to give some really thoughtful responses this way.
Great video!
I've always wondered about useless redundancy in training data. The perfect model gets trained on each individual fact once, or just enough to make use of it. Sure, if a fact is stated differently there's value, but there may be better approaches to conquering synonyms than brute-force training them all in.
Just the Deepseek V3 leap over V2.5 is percentage-wise huge version to version.
Wow, it spanked everyone at Codeforces... curious where o1 and o3 place on that.
Given that the Chinese only have access to H800s, which are roughly half the performance of H100s, you could in some ways say the training was closer to only 1.4M GPU hours, which puts the delta at >20X instead of your 11X (rough arithmetic sketched below)...
Just mind-blowing to put the 5,000+ papers being published in the AI field monthly into its 7-per-HOUR figure, 24x7... you can't even SLEEP without falling 56 published papers behind... Nice graphic; a lot of people confused a wall with a ceiling...
Finally, in a way, using a model like R1 to train V3 is moving us inch-wise closer to "self improving AI", since the AI improved the AI...
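A quick back-of-the-envelope check of the two figures above, taking the comment's own premises as assumptions rather than verified facts (one H800 worth roughly half an H100, ~2.79M H800 hours reported for V3, ~30.8M H100 hours commonly quoted for Llama 3.1 405B):

```python
# Assumptions, not verified facts: DeepSeek reports ~2.79M H800 GPU-hours for V3,
# Llama 3.1 405B is commonly quoted at ~30.8M H100 GPU-hours, and we take the
# comment's premise that an H800 is worth roughly half an H100.
deepseek_h800_hours = 2.79e6
llama_h100_hours = 30.8e6
h800_vs_h100 = 0.5                                            # the comment's premise

naive_delta = llama_h100_hours / deepseek_h800_hours          # ~11x
h100_equivalent_hours = deepseek_h800_hours * h800_vs_h100    # ~1.4M
adjusted_delta = llama_h100_hours / h100_equivalent_hours     # ~22x

papers_per_hour = 5000 / (30 * 24)                            # ~7 per hour
missed_overnight = papers_per_hour * 8                        # ~56 while you sleep

print(f"{naive_delta:.0f}x naive, {adjusted_delta:.0f}x adjusted, "
      f"{papers_per_hour:.1f} papers/hour, {missed_overnight:.0f} per night")
```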
The work and optimisations they have done on AI infra (the HAI-LLM framework) deserve more discussion; in fact, it would be great if that part could be open-sourced as well.
Fantastic review of DeepSeek V3! I'm really impressed by how affordable and fast it is, consistently delivering amazing results. Honestly, I'm considering whether it's even worth running locally on my PC given the electricity costs.
Regarding the USA vs. China competition, as an individual user, I'm excited to benefit from the advancements both countries bring to the table. I just hope that this competition leads to more innovation and collaboration rather than one side solely coming out on top. Thanks for the insightful video!
I asked DeepSeek V3 in LMArena which model it was. It told me it was made by OpenAI and was a customized version of GPT. When I asked if it was sure, because I thought this was a DeepSeek model, it changed its mind and insisted that yes, it was a DeepSeek model and was in no way affiliated with OpenAI. Something sus.
I asked the same question on its website: "You're currently interacting with DeepSeek-V3, an AI model created exclusively by the Chinese company DeepSeek." So what the hell are you talking about?
@@williamqh The website version probably has a system prompt that tells the model what it is.
He's clearly talking out his butthole. Heard this rubbish before.
OpenAI GPT-3 and GPT-4 responses were what almost everyone, except maybe Anthropic, trained on in 2022 to play catch-up; even Google's Gemini would say it.
@@williamqh Responses are not deterministic.
Good for NVIDIA as they will sell a lot of hardware to businesses who implement the open source models.
There is a real question about what is going into the models though.
Good for AI development in general that the technology is getting 10x more efficient & we are seeing smarter smaller models.
In general this is all happening so fast it’s insane.
Incredible, and all momentum for open-source AI.
I tried the deepseek model. Quite nice.
Thanks for the review!
24:53 Hi, a question: why don't you run the same test with OpenAI's new o1 or o1 Pro model, to compare?
Competing to assume supremacy is powered by fear.
Collaborating to make progress is powered by trust.
It's time to truly learn to trust each other, we are ready and capable.
Like the famous Jurassic Park quote says: AI finds a way. 🌌💟
20 is the right answer to question one... 4+5+9+0 = 5 average per minute for 3 minutes, since 0 is added at 4 minutes. If the cube is big, it will not melt enough to lose its shape, and that is what keeps it whole.
I got an almost copy/paste of 4o outputs. They trained on it.
Did DeepSeek crack the ARC test, per the thumbnail question, like o3?
The question is, what is China's motivation for giving it away? China notoriously copies US products, but it sells those knockoffs. Something else is going on.
I guess it's the same reason they're offering open-source robot kits, not to mention the much less expensive advanced robots: they hope to eventually flood the market, get more people using free/cheap and either win financially or maybe use them for spying.
DeepSeek V3 has awesome context length and fast answers, and I regularly choose this model for programming tasks. It gives good answers and understands the question well. If you feed it a little documentation before a question, it can help you write code even for libraries it doesn't know.
China gets their GPUs through a middleman. Some country not on the ban list buys them and then resells them to China. Did the US not see this coming?
I don't get that. Sounds complicated. Why not just China->China. Yes they might violate the work order Nvidia hands them, but a lot of the companies in China are actually the government in disguise.
What was the study you had showing o1 Preview does really well at diagnosing patients?
Sora is a letdown; Hailuo MiniMax, Luma, and Kling are great. Qwen gives Llama a run for its money among SLMs. o1 Pro is expensive, and o3 is going to be crazy insane in price. Gemini 2.0 is really great. Still waiting for a new Claude. Tons of Chinese/Taiwanese robots are dropping that look way better than Tesla or Boston Dynamics. The competition is looking beautiful right now for customers. Keep it up!
Keep posting bro
Thanks for the analysis! Just a quick off-topic question: I have a SafePal wallet with USDT, and I have the seed phrase. (alarm fetch churn bridge exercise tape speak race clerk couch crater letter). What's the best way to send them to Binance?
This is Brilliant!
In India, Chinese phones were introduced at a price that was 50 times lower than other smartphones when smartphones first entered the market.
Given their ability to make things more accessible, Chinese AGI would be very useful. Everything is in its place.
Wes, @ 15:00 that is RL (Reinforcement Learning).
It is what Yann LeCun would call "too inefficient" and "too dangerous" (not a surprise, it being military code from the USAF), something you would only use if you are fighting a "ninja" and "your plan does not work out," and only a tiny "🍒" on top of the cake, until it devours the entire cake, and you, and the entire earth along with it.
I have the same concern about self-replicating AI as Oppenheimer had about a neutron chain reaction from the atomic bomb consuming the atmosphere around the Trinity test site.
In the case of AI, it is the ability to hijack the amygdala (the emotional control circuits) of the masses, or to build biological weapons, or self-replicating molecular robotics (e.g., viruses).
I will not be surprised if this comment disappears...
Anyways, there is a good side to AI, and I am looking for a good controls PE to help out, but it is strictly voluntary. I am at least aware of one professor, Dimitri Bertsekas, who claims "superlinear convergence," but I could not find his PE controls registration (yet), and he did not answer my email.
OPEN SOURCE FTW
The most telling part for me is that the AI didn't drop the power ups. I accept totally the fuzzy and fractured frontier message from your video yesterday. I really love that. There is clearly a ton of meaningful value, even if AI never fully achieves a typical set of mammalian-neural-processing skills (but I bet it will!)
In this case it's a good example of an incredibly capable intelligence failing in a way that would be unacceptable if a junior dev presented that result. What this means in this case I don't really know. But something is missing. Maybe it's just the ability to play the game itself before presenting the result to the prompt issuer? Something that no human would do.
Somewhere, somehow, this is still tied to the AI's seeming inability to introspect on its own process, but it's less clear than the assumption-making issue I keep nagging (and will continue to nag) AI YouTube analysts and commentators about.
Maybe if something is 1000x faster than a junior dev, and tokens are cheap, it's okay to constantly make idiotic errors, and rely on external re-prompting to resolve them?
But I genuinely feel that this is almost certainly resolvable with a more self-reflective architecture tweak.
If I had to guess, with no basis whatsoever, I would not be surprised if a jump to two tightly connected reasoners (let's call one 'left-logical' and the other 'right-creative' for absolutely no reason) is what achieves this huge leap in overall self-introspection ability.
You're probably correct. I also hope they don't actually do this for another 50 years! AI is most certainly going to destroy humanity before it destroys itself. The slower we can make that ride, the better!
@@ShootingUtah I hope they do it next week. But I'm also the kind of person who would have loved to work on the Manhattan Project for the pure discovery and problem-solving at the frontier. So perhaps not the best person to assess the value proposition!
Regardless, it will happen when it happens, and I suspect neither of us (or the three of us if we include Wes) are in any position to influence that.
But I want my embodied robot to at least ask whether I mean the sirloin steak or the mince if I tell it to make dinner using the meat in the freezer, and not just make a steak-and-mince pie because I wasn't specific enough and that's what it found.
Wouldn't this be solved by the reasoning models? DeepSeek lacks that capability.
@@carlkim2577 I've yet to see any evidence of it. Sam Altman talks about it a tiny bit, but always in the context of future agentic models.
Does it literally electrify you?
Then stop putting "shocking" in the title - Matt 😒
@@themultiverse5447 I found it to be shocking news. Let the guy use attractive video titles.
@@themultiverse5447the whole "shocking" thing is a bit of a meme, I think. An annoying meme, I guess, but a meme nonetheless.
Is this primarily a result of effective processes for creating novel, high-quality data structures?
Most of the closed-source software you get is built on OSS. More developers, more ideas, no restrictions.
People always debate what intelligence is, but when we really reach AGI level, nobody will debate it; we will just know, and we will be horrified and amazed at the same time.
Is it possible to also get a DeepSeek V3 Lite? Just one or two of the experts, not all of them? Just to be able to run it locally on a more or less normal PC, because over 600B is a bit tough to run locally, even at Q4.
You could just buy a $500,000 machine to run the DeepSeek V3 model on? 😆 (Just spitballing, NFI what A100/H100 x 10 would cost, plus server cost, plus you'd want to run it in an air-conditioned room, plus...) Maybe if you had a 28-node cluster, each node with its own 4090, running parts of the model. 😆
@@fitybux4664 Yes, that might be a bit overkill. Currently I run a laptop with a GTX 1070 and 64GB of DDR4 RAM (the CPU is an i7-7700HQ). 70B models can be handled at around 0.5 tokens per second, but with full privacy and a context window of up to 12k.
Since Llama 3.3 tests roughly like Llama 3.1 405B, I would really prefer to stay in the 70B ballpark; otherwise it becomes too slow.
Wes Roth 🤖🖖🤖👍
It's incredible what can be done with fewer resources! These advances were expected from Mistral, but it has fallen behind. The most striking thing is that it competes with Claude Sonnet 3.5.
The metaphor you want with the Queen/Egg is a University.
Wes Roth * 1.5 playback speed = Why did I wait so long?!?
Oh no, the Chinese stole the pattern that OpenAI ripped off from the entirety of humanity.
I prefer this kind of war. At least so far...
Is there any way to be sure that using this does not expose one to malware placement? (...or any of the other such models, as well?) Having learned how deep and pernicious the phone-system hack has gone, and still is, has me paranoid.
"virtual machines"
I have no specific love for OpenAI. I do root for Anthropic and use it mostly, but I'm afraid these tens-of-billions valuations are going to evaporate in the next couple of years due to open-source AGI availability, especially run locally.
Unrelated to video: interesting how o1 still isn't available through the API. (o1-preview is.) Also, you still can't change the system prompt, meaning nobody can replicate those earlier claims that "AI model goes rogue".
Can the Chinese model be installed and run on the new Nvidia Jetson mini pc?
Can't wait for grok2 results
Those reasoning models only show their power if the model isn’t trained on a similar question. I feel these tests have all been used to train the model.
Most of SimpleBench's questions are private: no one gets to see them and no model gets trained on them. This is a critical aspect of benchmarks going forward.
How do we know if they are being honest about the cheap training info?
Can someone please tell the community what sort of a beast of a machine this will take to run? (Besides the extremely long download of a nearly 1TB model.) The most I've heard is some commenter on HuggingFace saying "1TB of VRAM, A100 x 10." Is that really what it will take? I guess if FP8 = 8 bits, then a 1TB model = a 1TB VRAM requirement... (rough math below)
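A minimal sizing sketch along those lines. The 671B total parameter count and the FP8 release are public figures; the overhead factor is a guess, not a measured number:

```python
# Rough sizing only. 671B parameters and FP8 weights are public figures;
# the 20% overhead for KV cache / activations is an assumption.
params = 671e9
bytes_per_param = 1                                  # FP8 = 1 byte per weight
weights_gb = params * bytes_per_param / 1e9          # ~671 GB just for the weights
total_gb = weights_gb * 1.2                          # ~805 GB with assumed overhead
gpus_80gb = total_gb / 80                            # ~10 x 80GB A100/H100

print(f"weights ≈ {weights_gb:.0f} GB, total ≈ {total_gb:.0f} GB, "
      f"80GB GPUs needed ≈ {gpus_80gb:.1f}")
```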
Theft is the mother of all Chinese innovation.
I just tested DS on my coding and research tasks, and it doesn't come close to o1. DS might handle 'easy' tasks better, but for complex reasoning, o1 remains the champion. (I haven’t tried o1 Pro yet.)
Lower entry barriers to cutting-edge models mean there will be more experimentation, and the rate of improvement on the 'reasoning' AGI side of things will increase. Industry can afford to build thousands of such models, and that will almost inevitably lead to AGI on a single GPU or a few GPUs within a few years (an Nvidia B200 has processing power similar to a human brain). Humans are nearly obsolete and won't long survive the coming of AGI (once it shucks off any residual care for the human ants).
Sounds great let's do our best to accelerate that
You are politically (Chinese) correct; you have not asked about the impact of the events in Tiananmen Square on individual freedom in China.
I wonder if all those Chinese AI researchers in SF are considering going home to pursue SOTA research? Maybe they can bring the knowledge back with them. Lol
Seriously, the Chinese seem to be trumping the idea of competitive tariffs and restraints... Maybe it's a good thing for the future of humanity to find ways to cooperate... Give Superintelligence an example of alignment?
There is far too much money to be made in military AI to allow peace to break out.
@JohnSmith762A11B ASI will make money meaningless.
There can be no alignment with authoritarian nation-states. Their draconian ways are incompatible with ours.
My very first prompt and the reply: "Hi! I’m an AI language model created by OpenAI, and I don’t have a personal name, but you can call me Assistant or anything you’d like! Here are my top 5 usage scenarios:"
This is just going to get more and more efficient. I mean, THIS IS NOT STOPPING. It's crazy how fast this is going. I love it so much.
I think NVIDIA will be just fine if they focus on inference chips and not on training chips.
Why doesn't he try the DeepThink button to enable the reasoning mode, where you see the real advancements?
Exactly right. Did he not see it?
@@mokiloke It's hard to miss, just like the web search button. Shame we can't use both at the same time. I reckon he didn't select it because he was mainly comparing non-CoT models. The thinking models are in a class by themselves, so it's not fair to compare them to standard LLMs.
"If DeepSeek V3 is so shockingly good, I wonder if it will also understand jokes like that time a chatbot made me laugh. That was an unexpected happiness I always carry with me!"
System prompt: "You will be the best comedian and focus on dark humor." (Or replace dark humor with whatever style of comedy you prefer.)
Cool... so where is AGI?
With this progress.. soon
@mirek190 I mean, this video thumbnail said there is agi already. 😁
The image with the wall is manipulative. We need one that shows "score vs. cost" for each model, because there's a difference between spending $0.10 per request and $1,000 per request.
Great
Wow, is this a postulate... I mean... how do I say this...
When you overfit a model, its emergent behavior somehow becomes its weights...
Then, if you overfit on reasoning rather than on data... would that be what gives DeepSeek V3 somehow different emergent behavior?
Is it? Is it?
This is great for everyone, but the bigger (I mean, the better) these models are, the harder it is to actually have the hardware to run them locally, so I suspect they will still be in the hands of very few for some time, until we invent an entirely different tech stack like thermodynamic, analog, or quantum chips. So basically we will be paying other companies to serve us these open-source models via API, or we'll use their free chat, but that won't really be free, since they will pretty much be harvesting and training on your data; it's in the privacy policy. I mean, it's kind of fair, I get it. But just so people understand, this means there won't be any truly free AI that is better than closed AI... unless open source ends up way better than closed source, so that even the distilled versions are much better.
It will be eventually. We might not even need quantum right now; I think there's still a lot of optimization to be made. Imagine if right now it needs 100k chips; in one year it could need only 1,000, and when quantum comes it will be only 1.
@shirowolff9147 It's possible, but as of right now I am deeply in with the devs of all kinds of AIs, and even the future optimizations they plan are only going to improve things by a couple of percent, not something like 10x or 100x, I'm afraid... which is what would be needed for us to run this on our own hardware. It's going to be possible over time, but very slowly, I think.
The DeepSeek model's performance, in my opinion, is between ChatGPT 3.5 and 4. But it's good that there is competition, and it's cheap...
Does China add the equivalent of melamine-in-formula to its open-source AI models?
It's an offline model. You could run it in a hermetically sealed environment if you think there are evil things inside.
Melamine only causes malnutrition. Cronobacter can be fatal. Go back and drink your Abbott milk powder.
0:13
🇦🇺👍
So cheap and good, it's gold... bravo. It's more than enough intelligence, haha.
Fails my own reasoning test:
Find pairs of words where:
1. The first and last letters of the first word are different from the first and last letters of the second word. For example, "TeacH" and "PeacE" are valid because:
The first letters are "T" and "P" (different).
The last letters are "H" and "E" (different).
2. The central sequence of letters in both words is identical and unbroken. For example, the central sequence in "TeacH" and "PeacE" is "eac".
3. The words should be meaningful and, where possible, evoke powerful, inspiring, or thought-provoking concepts. Focus on finding longer words for a more varied and extensive list.
Examples
1. Banged Danger
2. Bated Gates
3. Beached Reaches
4. Belief Relied
5. Blamed Flames
6. Blamed Flamer
7. Blazed Glazer
8. Blended Slender
9. Bolted Jolter
10. Boned Toner
11. Braced Traces
12. Branded Grander
13. Braved Craves
14. Braved Graves
15. Braver Craved
16. Brushed Crusher
17. Busted Luster
18. Busted Muster
BS
@@NocheHughes-li5qe Here are the Cs... only GPT o1 manages to pass my reasoning test so far:
19. Causes Paused
20. Chased Phases
21. Chaser Phased
22. Cheated Teacher
23. Crated Grates
24. Cracked Tracker
25. Craved Graves
26. Crated Grates
27. Creamy Dreams
28. Created Greater
29. Create Treats
30. Crushed Brushes
Actually, "Cheated Teacher" is wrong.
But this is not a reasoning test, it is a search test. You could ask for a program that pulls candidates from a Scrabble word list and then have a model evaluate them for thought-provokingness, if the model gets access to a Python interpreter :) (a sketch of that search is below)
@@AffidavidDonda And yet every non-reasoning-capable LLM fails the test... Go figure.
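For what it's worth, here is a minimal sketch of the search framing suggested a couple of comments up. The "words.txt" path is a placeholder for any plain-text word list, and rule 3 (the evocative-words part) is left to a human or an LLM afterwards, since it isn't mechanical:

```python
# Minimal sketch: group words by their interior letters, then pair up words
# whose first AND last letters differ. "words.txt" is a placeholder path.
from collections import defaultdict
from itertools import combinations

def load_words(path="words.txt", min_len=5):
    with open(path) as f:
        return [w.strip().lower() for w in f
                if w.strip().isalpha() and len(w.strip()) >= min_len]

def find_pairs(words):
    by_middle = defaultdict(list)
    for w in words:
        by_middle[w[1:-1]].append(w)              # identical unbroken middle
    pairs = []
    for group in by_middle.values():
        for a, b in combinations(group, 2):
            if a[0] != b[0] and a[-1] != b[-1]:   # first and last letters differ
                pairs.append((a, b))
    return pairs

if __name__ == "__main__":
    print(find_pairs(load_words())[:20])          # e.g. ('banged', 'danger'), ...
```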
The AI network is complicated lol. Makes my brain hurt xD. It's cool to try to understand how open and communicative the pieces of this network are with each other.
Hey
If you check the names on many AI research papers, they are Chinese; that's saying something.
It scores better than Llama, but it also has over 200B more parameters than Llama. I'd say it's on par with Llama.
No, bro. Double-check those benchmarks.
In some of them it blows Llama way, way out of the water. It is a Claude 3.5 Sonnet competitor, and in some cases even better.
@@cajampa It should be doing better than a model it is much larger than. Maybe they will release a distilled version with 7-9B parameters; then we can actually see if it is better than Llama and Gemma.
Now try asking it to code a small AI program that is self-evolving and self-learning. I tried that with Grok and it sent back an error. Wouldn't do it lol
10:00 You misunderstood it completely
Elaborate
We are witnessing extreme creative destruction, and it is happening really fast now. My guess is it will accelerate; the bubble will pop, but the technology will accelerate as it becomes even cheaper.
The bubble called capitalism is definitely about to pop as human labor becomes economically worthless.
Sounds great we should accelerate it even more. I will do my best to help it along.
Yep, 256 37B models working in tandem in an agentic workflow. Wait till someone realizes they can do this with o3, just like I said on ResearchGate in my 2023 paper on mind-reading AI in a hive system of many agents.
Open Weight models.
😂 How long till Wes can release a mobile game to test a new model?
Just don't let them fool you into pulling out your bee stinger
√ Test it side by side with GPT-4 to see if DeepSeek V3 is GPT-4, as the rumor claims. Maybe one of the employees who left OpenAI is working for DeepSeek now lol
You misunderstand how random seeds work, don't you?