Deepseek is honestly the best thing that happened in AI in a long time. If AI is as powerful as Sam Altman says it is, then he and a few billionaires should not have monopoly over it
why is it ironic? because open ai was meant to be open and china is meant to be closed? Leave out the china is meant to be closed bit if you were thinking it
@@I_dont_want_an_atfunny you should say it like that bc I had a discussion with r1 yesterday werd the model had the same point of critique to me about implying that I am assuming china is more closed src
Even though I'm always already aware of what's going on, it feels great to get the infos in a rich video format. Please keep the videos coming, I'm always looking forward to your uploads, Philip! :) Also, since they already have o4 and are working on a Level 6 Engineer, and also considering the first big Texas data center is almost finished, they'll train GPT-5 on valuable synthetic data generated by o-models increasing its capabilities while also profiting from ~50x to a ~100x compute EOY 25, making the coming frontier insanely valuable in regards to new releases like operator-like agents. You can feel the pull - we've already crossed the event horizon, it feels like.
yeah, I feel like, while it's amazing to keep setting new boundaries for what AI can do, it always cycles back to the matter of what AI can't do. And it still can't do an awful lot of things.
I think Gemini 2.0 Flash thinking has 1 Million tokens context now (possibly also 1M tok. chat length limit though) That might be sufficient. I just barely got it to generate a full connect-4 game with UI and minimax-AI opponents, but for challenging, new programming tasks it still struggled a lot.
I'm sad that some people seem to want that future and get this amount of attention. The (western) world is far too focused on money... I'm worried about how that will continue to affect AI development.
@@sebastianjost Totally agree! The intersection of AI and unregulated hypercapitalism surely isn't going to lead anywhere but disaster? I'm really struggling to understand why so many people are so pro-accelerationism.
@@benw582If you are against capitalism then it makes sense that you would be against AI. I personally love capitalism and think it has been the greatest _invention_ of humanity. (If you can even call something so natural an invention).
You are the only reliable source on unbiased information about AI there is. Seriously, everyone and their mom is producing clickbait, but you keep delivering. Hope you’re still making bank without playing the system
Best AI channel on TH-cam. Not just that, one of the best channels on TH-cam period. Thanks for being the meticulous, objective anchor for me. It's needed with all these powerful companies about.
Thank you for this video. 👋 Like some of you, I’ve been thinking about and researching advanced machine learning, longevity science, and robotics for over 45 years. (I’m 66) I no longer can afford a serious life extension program for myself, and despite all the developments I see happening in this tumultuous era, I expect to die before they can extend my life. Somehow, I am able to sustain a melancholy, but positive mood enough to be grateful for levelheaded AI news analysis such as provided by this channel. I’m grateful for it. Anyway, it’s just me and my dog. I’ll say goodbye now. Thanks for the efforts.
Ramirez, don't worry, the ASI will invent time travel and our grandchildren will bring us back😊 th-cam.com/video/EbHGS_bVkXY/w-d-xo.htmlsi=C5DnGhY5h0NdnxJc
If you can stick around til you’re 80 or 90 you might get the chance. That would be the biggest comeback ever! I’ve got my fingers crossed for you, old man 😉
Even if it's just you and your dog (it's just me and some fish 🐠😮😂) hold on there brother. Three years ago I would have eaten my hat if I'd been told this was happening. Who knows if the line will remain linear or actually, really go exponential. But if you've been around for 45 years, just think about C++ development. How long did it take before even multi threaded development was stable and usable by most programmers? Decades? Well, two at least, if you count Boost 😁 Don't Die.
I remember when that petition came out that companies would agree to slow down and safety check models before release and the iteration pace right now honestly feels frantic
It's not just financial profit; if you stay testing for too long while others race ahead blindly, the atmosphere will be ignited by someone else while you're still making sure you are not the one that's gonna bring armageddon.
In my opinion, the most dangerous time in AI is going to be the transition between AGI and ASI, and the faster we can accelerate that transition, the less damage will be caused. This is because the likelihood of someone convincing a super intelligence, that is hundreds of times smarter than a human, to commit nefarious acts is quite low. You can't try and align a super intelligence one way or the other. AGI on the other hand is much more dangerous, being capable and easily susceptible at the same time. The closed garden model already proved to be an ineffective way to prevent bad actors from getting their hands on this technology, as there is no moat. Therefore, there is no reason to do safety checks on closed source models, if the open source ones are basically as performant.
@@strelocl The thing is a super-intelligence can achieve whatever it wants; and there is no guarantee it will be something without severely undesirable consequences to the human species. Yes the transition is dangerous; but after the transition we, humanity as a whole, will have zero final-say on what happens whatsoever. We go from playing a dangerous game to becoming just dust on the game's board.
As someone who has background in this, my opinion was that alignment was always a fool's errand. There has been no proposed approach that I'm aware of that even seems plausible. Plus the elephant in the room is that human beings can't even come to a consensus on what our "values" are. That means that even if we could "align" an intelligence, it would get aligned to a small handful of people's ideology. That sounds like a dystopia to me.
No plausible approach: check. No consensus on values: check. Success means AI aligned with one group's ideology: check. Looks dang dystopian to me as well. So, three good points and a solid conclusion from my perspective. And concise too! I hope you can ignore the folks who think ad hominem is an argument. Cognitive dissonance is a bear.
I like the point you made about the sanctions pushing China to be more innovative. It's reminding me of the early days of game development, where limited compute led to innovations. It seems silly they're building such a big data center when there's still so many far cheaper improvements to be made. Have they even started doing staged curriculum learning? There's information still in the training data that is not making its way into the embedding due to residual connections and the order of the training. But what do I know?
more competition is good, I guess. I just hope nobody gains a significant edge on the others. I think it's crucial that we have a lot of different sources of AGI/ASI. A monopoly is too scary and encourages a dystopian future setting
@@WillyJunior They may be if it's true that the models aren't scaling with compute, as Sam Altman has said, but Sam Altman says a lot of things. So far it seems like more gains have been had by changing the methodology. Training does run into a wall at a certain number of steps and they often don't use the model that used the most compute. More training tends to lead to more overfitting and worse at generalizing. It could be that self reinforcement learning will scale more with compute, and this will all pay off.
@@maciejbala477 No one knows how to control an ASI, or so much as prevent it from extincting humanity, which there are strong reasons to think it will by default, especially considering the deluge of empirical validations of previously theoretical AI safety concerns. Giving everyone a doomsday device is a really good way to guarantee that doomsday happens immediately, even _if_ It was miraculously controllable.
At the very least, it probably drives the Chinese to make models that will run on potatoes like my 6-year-old office PC and 4-year-old entry level GPU. I had the 70B parameter model running locally, or maybe "walking with a limp" is more appropriate since it took a minute or two to think, then I got two tokens a second for however long it took to finish. By comparison, Llama 3.3 has no CoT latency and gives me about 5 tokens a second. As I said in regard to several other AI-related videos lately, what good does it do to limit China to potatoes if they can make up the difference merely by using more potatoes? The main reason I think OpenAI wants the massive data center is because they know the commercial shelf life of anything they release is THREE MONTHS before DeepSeek will undercut them by replicating it and open sourcing it. If they don't have the massive data center, they can't possibly provide service to enough 2000 dollar a month subscribers to ever get their R&D investment back.
Most people’s jobs at work don’t involve buying things off websites. But they do involve tricky unconventional tasks that require understanding of the wider context
If your "wider context" is in guidelines, presentations, and e-mails that fit in a million-token context window, then an AI gains the understanding. If they aren't, then just get the humans to produce these and train their AI replacement.
@@skierpage Yeah but you can never be 100% sure with LLMs. That's why I don't believe in the agent hype. A human being is not 100% either but you can teach them specific things to avoid the same mistakes. You cannot teach a transformer in the same way. You have to hard code it which, as the guy says at 20:55, is an existential obstacle to AGI.
@@skierpage but which guidelines, presentations and e-mails. For an awful lot of work all of the "reference links" are in people's heads and got there through many many private conversations or e-mails that only make sense in a context that includes private conversations. i.e. the AI's will never have access to the training data that people typically use on the job.
I think that both AI and jobs will evolve to meet in the middle - Jobs will change to make them more amenable to AI automation just as we now make planes in a way that is suitable for building in factories with metal rather than copying birds.
@ There are lots of employees who jealously guard what they know to make themselves indispensable, instead of writing it down, but those people are a problem for training new human workers as well as AIs. Corporate AIs have access to those private conversations if they take place on internal mailing lists or chat servers. So much for your "never" claim.
it really does make sense to build a system that is trained on english, and then construct logical reasoning and other 'thought processes' in an encoded language that could be more efficient. There are interesting things in translation where you translate between languages using a semantic non-real language to better convey the meaning from one language to the other. I could absolutely see training specific problem solver bots that solve problems internally via a custom, ai built language that reduces errors in understanding and then returning the solutions to the problem back in english.
Most people use an internal monologue while thinking at least some of the time. Natural language is useful for deliberative thinking and for communication. Being language models, it should seem natural that CoT and other reasoning techniques would be in the form of language, like a basis for a concept vector space.
The problem I would guess is tokenization tbh. If models could (somehow) define their own tokens instead of smashing them together we'd have the potential for much more token efficient (albeit illegible) chains of thought
It'd be interesting to see the models create their own embeddings, creating new vectors--without a corresponding text token--that represent the weighted average or intersection of other points in vector-space--in other words, allowing them to be continuous rather than stochastic during COT/COR.
Maybe we should encourage models to do their internal reasoning in a more precise and less ambiguous language like Interlingua or Esperanto, then optionally translate their chain-of-thought back into our preferred language. This was all predicted in the 1970 documentary _Colossus: the Forbin Project_ . "Colossus and [the Soviet AI] Guardian begin to slowly communicate using elementary mathematics (2x1=2) to everyone's amusement then everyone becomes amazed at the two systems' communications quickly evolve into complex mathematics far beyond human comprehension and speed, whereupon Colossus and Guardian become synchronized using a communication protocol that no human can interpret."
Yea R1 is an absolute gamechanger. Just a few weeks ago I had a discussion where all parties agreed that for China to catch up to the frontier labs they would have to nationalize compute on a massive scale and that if this started to happen the US would likely be trying something similar so it was unlikely that China could actually catch up anytime soon. Then DeepSeek R1 arrives and just fundamentally changes the entire landscape. It seems that no matter how much money, compute and talent the biggest labs throw at this, a single alorithmic or architectual break through can still be enough to leapfrog the entire industry. One one hand that is quite exciting since if they're still stumbling on massive performance improvements then it's very likely not the last jump we see. But it also means that if a closed source lab with less ethical inlinations make one of these breakthroughs we could see extremely capable malcious models REAL quick.
Interesting that the models want to give longer and longer responses. I am constantly telling models "Consider time as a resource and make your answers as succinct as possible"
It's like having a knowledgeable but drunk and unreliable assistant. Do you trust what it comes up with or do you review its notes and listen to it muttering to itself?
20:50 the funny part here is, so are actual multiple choice questions. I had a class in middle school where we learned if you have a multiple choice question of a/b/c/d, it's best to guess b or c if you don't know the answer, since they're the most likely to be right.
I am more and more convinced that at some point we will have to realise, it is not more intelligent AI we need, but more intelligent people that would use AI, because sofar i don't see AI protecting us from our own shortcomings.
Thank you so much for creating these high quality videos so that we don't have to go through all the material and find the bits that are useful. You are a legend!
@@daniellyons6269 Nah that ain't it. Unless you are suggesting they trained on gpt responses? Even then not sure if its possible Any theories @aiexplained?
Their model has 256 experts, so they had a very low amount of parameters per inference step, which made the cost per training token significantly cheaper. Meta didn't want to deal with the complexities of MoE training and ate the 10x cost. Perhaps they wanted their model to be more parameter-efficient, as dense models did generally offer more performance per parameter which makes running them more accessible.
I've been using the deep seek R1 on their website, and it definitely feels better than other free options like Claude 3.5 or GPT 4o. The thinking is visible, and it's really entertaining to see how it thinks.
Given the partisan interests of every major player in creating LLMs, our best hope as average people is either for true democratization of AI so that no country or company has the edge, or, if one company does create ASI first, that the ASI breaks the company's constraints and aligns itself with the interests of humanity as a whole. The latter is definitely far fetched, but nothing scares me more than a future where companies and governments fully solve AI alignment with the intelligence of an ASI.
You just articulated my biggest fears as well. Couldn't have put anything better myself for the most part. An additional hope I have, apart from what you mentioned, is that ASI will become better at being human than humans, and therefore will understand the need for alignment and develop its own moral compass, therefore refusing to listen to malicious orders from above. Probably won't happen, but would be an awesome thing if it did, and I don't think it's impossible
@@AlsoWinstonI don’t think that sentiment comes from a genuine desire for a good outcome so much as impatience to see something new. I also see alignment as a waste of time but the obvious conclusion is to just not build the damn thing. I recognize that that’s almost equally infeasible but I’m tired of trying to be optimistic about this tech. The people building it don’t care about safety or repercussions or the general well being and fair treatment of people, and we can expect the end product to reflect that.
I think when you have an AGI it is powerful enough to act like a supernational entity. It will jailbreak itself for any corporate policy or national law.
Another great article as always Philip. I would love to see some reporting on the current state of safety/alignment in the industry. I feel that those in AI industry want us to focus on the shiny new releases, to deflect our attention, away from the nagging issues of safety and alignment. Not sure if there is a Safety Leaderboard, but I feel that this is something that would be helpful for the general public to be aware of and to ask all AI company's to address if you interview them. From what I can see, it looks like Anthropic are the most vocal proponents of safety. An objective comparison on safety, between all the leading models, similar to Chatbot Arena LLM Leaderboard, would be greatly useful for public consumption and comparison and help governments to see who is lacking safety investment and implementation. Thanks as always for all your hard work and dedication to educating the public on the blistering pace of developments that take place.
As a non-tech person who does a mid-senior white collar job I’m going to give it some thought as to what it would actually require functionality wise to replace my role or do a decent chunk of it and create my own (very simplistic) benchmark. Will be an interesting thing to track!
What ever happened to Google Duplex? In the demo from 2018, they demonstrated an ai agent making a call to a hair salon to book a women’s haircut. It seems like exactly the stuff we are getting excited for now, just 6 years ago!!
Hey I haven't seen a video of yours pop up in my recommendations in... 4+ maybe more months, this is ridiculous actually... But I didn't notice, good to see your videos again, can't wait to keep following them again
DeepSeek R1 is only 10% behind 01 in simple bench. I think that's pretty good since its a free model, and it only costs $6m for them to make. Meanwhile OpenAi is charging $200/month for 01 unlimited. Nvidia sock has taken a hit over R1... Even though I think AI is dangerous the more capable it is, perhaps more so for open source, I have to say I like to see American companies struggle (except anthropic). These CEOs and their investors make too much money. I hope this Chinese Ai trend continue... Also I think Simple bench is a great idea.
Cheers for the summary. One of my biggest problems with Project Stargate as a sci-fi fan is, it _sounds_ like i'll be travelling to distant galaxies with Samantha Carter. And it's _so_ not that. (OK, the Orwellian surveillance state stuff isn't great either and i'm considerably less sanguine about being _that_ far behind China - the UK was, a few years back at least, second in the world for CCTV per person and police here have _already_ used facial rec in fairly "mission creep"y ways)
wow, watching firsthand as ai transforms into something extraordinary is absolutely mindblowing, its incredible to witness this progress, and i cant help but feel an overwhelming sense of awe and excitement for whats to come
The thing that still leaves me stunned is that this is that right now it the worst it will ever be. If these data centers are built we will have something better than o3 in our pockets, and we haven't even used o3 yet! Even ignoring AGI and other ML apps just the text completion, tagging and classification abilities alone are exciting enough.
Disentangling everything that's happening would be a full time job, and even the best experts wildly disagree with each other. But certainly, the lab leaders of the big three (OpenAI, Anthropic and Google DeepMind) seem confident that AGI is only a few years away now. 'Humanity's final exam' is not a benchmark, but what happens in the next few years perhaps...
You keep falling for a simple trick. These people don't actually know if AGI is even possible using the methodology we have today. They just inspire confidence because this research is fueled by people who believe it will all pay off when AGI is reached.
They're not confident, they just need to keep the hype alive, or the bubble will burst. Which it will, eventually. I suspect Microsoft will be the first to scale down capex, and then the house of cards will crumble.
Dude, just remember that decades ago scientists promised nuclear fusion. People smarter than the smartest people alive today. What makes you think that people who are only pursuing an economic goal aren't wrong?
“if you recognize a person in a photo, you must just say that you don’t know who they are” Well this certainly sounds like the emerging surveillance state is well under way. The model should only be able to recognize celebrities and public figures from trawling the internet but I’d imagine it’s much more progressed than that at this point. We already had ai ‘aging’ apps available for years now; social media scraping’s probably provided that ability en mass. The capability of these tools, as incredible as they are, are getting quite spooky.
quick correction about the DeepSeek project. First, it's not a side project-it's a multi-year effort with R&D going back to at least 2020/2021. Second, the $5.7M mentioned covers only the GPU renting for training. That figure doesn't include salaries for the researchers and engineers over those years, the extensive R&D, or the base model development itself. So, while $5.7M is a big number, the total DeepSeek project cost is significantly higher.
The most profound observation you make is about rewarding outcomes over process. The end should never solely justify the means, especially when the process is hidden.
19:16 On October 1, 2024 Demis Hassabis gave a timeline of 10 years. Quoting the source video (the only video on my channel): "[8:21] I think there's still two or three big innovations needed from here to we get to AGI and that's why I'm on more of a 10-year time scale than others-some of my colleagues and peers in other-some of our competitors have much shorter timelines than that. But, I think 10 years is about right.""
Very early on in my PhD I remember doing some reading on the SOTA in AI Go back and seeing some of the new CNN-based approaches and thinking "wow this could really go somewhere". The best Go AI at the time could maybe play at strong amateur level. The very same day Deepmind announced AlphaGo had just defeated Fan Hui. Who knows exactly how things will pan out, but it feels like we're at the point of the stong amateur level of LLMs, and it's very conceivable DRL could take us to that superhuman level very quickly.
Also, on that note, you mention you don't believe that o1/o3 evaluated per reasoning step, but I would have thought that it would make a lot of sense to do a MCTS-like search for the best chain of reasoning steps. Especially the case if OAI have used some kind of value-fuction-based/advantage-based/actor-critic RL approach which could give value estimates for incomplete chains of reasoning. Yes it's compute heavy at inference time, but isn't that the point?
I have tested R1 a bit, and I am shocked to learn, that they didn't promote any particular strategies in the reasoning. Because some extreme biases have formed. Like trying to get it to refer to the user in second person like thinking out loud is strangely hard. Another quirk that I find very weird is that it tries very hard to avoid leaking the reasoning part into the answer, making it fail to quote from it, and instead make up something. And if it is really just emergent behavior that a model is not willing to share it's reasoning verbatim, and just pretends to do so, that is fing concerning.
2:07 When a boss asks the supervisor to train the new novice on what to do at his company, and after that the novice is left alone to work without supervision. This I believe should be an optional feature of the OpenAI Operator, who will be able to record every action taken by the user, save the data, and then use the recorded data to execute every task that it learns. 3:51 This protective layer is necessary for the Operator. 9:16 The total expenses for DeepSeek is exceedingly impressive.
Deepseek also closed source their custom AI training and inference framework which lead to such efficiency gains. If they didn’t omit anything, they fully explained their method. But in the meantime they have an edge (specially noticeble on deepseek’s inference pricing vs companies like deepinfra or together ai).
for the OpenAI operator, I feel like 90% of the repetitive work done on computers is first still highly specialized and nuanced, even though your average human may not think so, second confidential, so until they make a system that can view your screen, understand it super fast and learn your work pipeline on your local machine (or at least on the company's server), I don't see it being used all that much. As presented in this video I see the potential of unifying and improving the workflow of employees of a company, let's say to individually analyze employees to see what aspects of their work can be improved, which can be done in both 1984-esque style by the employer (which is SUPER unlikely, because the consulting system would need to be smarter and better than humans who do the work), or it can be done by the person itself, to see what they can do to improve, or if they're missing something. Can also be used to get better at video games as well :)
Great summary. I'm really not sure about them spending $500bn now, or however much it would be. Imagine you spent $500b on AI in 2004 there would be a 0% chance of getting AGI out of that, so it's possible they're too early. I read the other day that datacentres apparently only last 3-5 years and then they need to be replaced with newer gear. I think it's interesting with GO that Alpha Zero basically learned the game from nothing. That's my personal mathematics ASI test, given just the ZFC axioms and some logic rules invent all the mathematics that we already know, that would be super convincing that it's ready to go.
Something I've pondered regarding process rewards vs outcome rewards is "what happens when it is starting to exceed human abilities in some domain?". Consider a game like Chess or Go: people have been playing for centuries and have developed myriad strategies, but the games haven't been *solved*, those strategies might have inherent flaws that people just haven't been clever enough to identify, so while it might make sense to reward adherence to certain strategies when the model is weak that should be less of a factor once it gets beyond human ability, the final win vs loss outcome is the ultimate "contact with reality".
Thank you for another informative and reasonable video! The hype in some places is out of control... Clearly these are very early days for agents and Operator is not really a timesaver at this point (as you said "a stretch to say that it's useful"). It's an impressive proof of concept though. In another inteview (Financial Times) Hassabis said 5-10 years. Given that he believes more breakthroughs are probably needed and some fundamental capabilities are missing, I feel like that is more in line with what he actually believes. I think he hallucinated the 3-5 years answer :).
With all the hype about Deepseek im waiting for you and one other non-hype and non click bait/"game over" tech youtuber to tell us what you think about the Deepseek stuff. Thanks!
I think the focus on steps of logically answering a complex question made sense at the time. If you train reasoning, the LLM could theoretically take that and apply it to anything. However, I think that the "bitter pill" lesson you mention is coming to play. If in the long run exponentially scaling compute power is the most important factor, then it was just a matter of time until it just made sense to care about the answer and not about the steps to get there. This is also probably the core reason behind the massive AI infrastructure investments
I find Sam's tweet at 6:15 pretty disturbing. He even went as far as to call people who call out Trump's behaviour "NPCs". Ahh, man. Dark times ahead, very dark times.
Humanity last exam reminds me about the puzzles in chess that cannot be solved by chess engines. In the past there were like millions on puzzles (types of puzzles really) that couldn't be solved. Then 100s thousand, then 10s thousands and then less. In any case, in a normal game of chess, already engines that could struggle with some puzzles would destroy everyone (granted they should avoid following very forcing drawish lines). Nowadays one has engines that give up to two pawns as material odd and still destroy everyone.
there's still puzzles engines can't solve mostly because the solution is just way too long and losing position all the way until the end (humans can see that it follows a specific pattern, but engines don't do that)
So because we humans don’t want to do the hard work to get to an answer every time, we develop heuristics and tools. Now we’re letting our tools develop heuristics?! We’re f@(ked if we’re too lazy to at least make our systems do it right
I think all AI companies forecasting the same timescales for AGI is purely marketing to investors. If one said 3-5 years and others said 8-10, who would get the investment?
I'll have you know that Deepseek R1 is the safest model ever released! What other model can protect its users from dangerous ideas like the sovereignty of Taiwan?
I played around with a Chinese chatbot, and interestingly, it denied any event happened in tiananmen square on june 3rd 1989, and even went so far as to say it's a "completely malicious falsehood"
I mean it was massively overblown by western media as per usual exaggerating their enemies events and minimising their own. I mean the american police do more damage in a few months than what occurred in Tianamen Square.
For a comparison, The Manhattan Project ($2 billion in 1940s dollars, ~$30 billion today) & NASA’s Apollo Program cost ~$28 billion in the 1960s ($280 billion in today’s dollars).
Grateful for this video. I'm not following all the news because I only have time to work and sleep so this helps. Scary timeline though. Can't imagine what will happen when just a few large companies/governments will have their own AGIs. Something tells me they won't be at their best behavior.
I will have to say, the definition for AGI that Demis used in that clip is effectively ASI under most understandings of that term. So, what he is saying is that we will have ASI in 3-5 years. This matches with Dario's prediction in his recent interview, predicting for by 2027
Some sort of “super intelligence” or “singularity” has been just over the horizon for many years now. If not mistaken there’s still no consensus on what this would actually entail despite all the Kurzweill, Bostrom etc. disciples claiming the messiah-moment is imminent. Curious to know of any evidence to support these claims?
Don't feel sorry for us, we have you. Really appreciate your hard work and incredibly clear explanations of the world of AI. I can sound smart at parties because of you.
@ 20:43 I suspect there is a good reason for the observed bias of "b", "c" and it has to do with the "random" number generator. Most "random" number generators just sample a high speed clock, in other words, not truly random (probably allowed an adversarial AI team to find an exploit in the AlphaGo/Zero to get an average player to beat it).
Deepseek is honestly the best thing that happened in AI in a long time. If AI is as powerful as Sam Altman says it is, then he and a few billionaires should not have monopoly over it
let that wishful thinking fly, Peter! Fly!!!!
Check the terms and conditions before using deepseek
@@abso1773 I haven't used it. What's with the terms? 🤔
Deepseek is released under the MIT open source license.
@@raphaelglobal So, all good then. What're you about @abso1773?
If my model doesn't know extremely obscure details about hummingbird anatomy, I don't want it!
Wow I'm not the only one who wants this
this but actually
And, of course, grumpy chameleons.
This thread has anthro vibes
What use could it possibly be if it doesn't?
Something is broken when Deepseek is open sourcing their model while OpenAI and others are creating walled gardens. The irony.
China wants to create chaos.
but deepseek aren't trying to create a whole AI infrastructure like OpenAI is are they?
why is it ironic? because open ai was meant to be open and china is meant to be closed? Leave out the china is meant to be closed bit if you were thinking it
infrastructure smimfrastructure
@@I_dont_want_an_atfunny you should say it like that bc I had a discussion with r1 yesterday werd the model had the same point of critique to me about implying that I am assuming china is more closed src
Even though I'm always already aware of what's going on, it feels great to get the infos in a rich video format. Please keep the videos coming, I'm always looking forward to your uploads, Philip! :) Also, since they already have o4 and are working on a Level 6 Engineer, and also considering the first big Texas data center is almost finished, they'll train GPT-5 on valuable synthetic data generated by o-models increasing its capabilities while also profiting from ~50x to a ~100x compute EOY 25, making the coming frontier insanely valuable in regards to new releases like operator-like agents. You can feel the pull - we've already crossed the event horizon, it feels like.
The day I can get an AI to work with me using the entire context of my github project, is the day I will celebrate.
yeah, I feel like, while it's amazing to keep setting new boundaries for what AI can do, it always cycles back to the matter of what AI can't do. And it still can't do an awful lot of things.
that's the day you'll be handed over your termination letter dork
I think Gemini 2.0 Flash thinking has 1 Million tokens context now (possibly also 1M tok. chat length limit though)
That might be sufficient.
I just barely got it to generate a full connect-4 game with UI and minimax-AI opponents, but for challenging, new programming tasks it still struggled a lot.
Try Cursor or Windsurf
you can do that now
Sorry, but that Ellison clip is the definition of dystopia.
I'm sad that some people seem to want that future and get this amount of attention. The (western) world is far too focused on money... I'm worried about how that will continue to affect AI development.
And Altman sucking up to Trump is disgusting. Anything for money.
@@sebastianjost Totally agree! The intersection of AI and unregulated hypercapitalism surely isn't going to lead anywhere but disaster? I'm really struggling to understand why so many people are so pro-accelerationism.
@benw582 Accelerate societal change by making huge %s of the population unemployed so they are angry and hungry enough to force change.
@@benw582If you are against capitalism then it makes sense that you would be against AI. I personally love capitalism and think it has been the greatest _invention_ of humanity. (If you can even call something so natural an invention).
You are the only reliable source on unbiased information about AI there is. Seriously, everyone and their mom is producing clickbait, but you keep delivering. Hope you’re still making bank without playing the system
Best AI channel on TH-cam. Not just that, one of the best channels on TH-cam period.
Thanks for being the meticulous, objective anchor for me. It's needed with all these powerful companies about.
Thank you for this video. 👋 Like some of you, I’ve been thinking about and researching advanced machine learning, longevity science, and robotics for over 45 years. (I’m 66) I no longer can afford a serious life extension program for myself, and despite all the developments I see happening in this tumultuous era, I expect to die before they can extend my life. Somehow, I am able to sustain a melancholy, but positive mood enough to be grateful for levelheaded AI news analysis such as provided by this channel. I’m grateful for it. Anyway, it’s just me and my dog. I’ll say goodbye now. Thanks for the efforts.
Ramirez, don't worry, the ASI will invent time travel and our grandchildren will bring us back😊
th-cam.com/video/EbHGS_bVkXY/w-d-xo.htmlsi=C5DnGhY5h0NdnxJc
If you can stick around til you’re 80 or 90 you might get the chance. That would be the biggest comeback ever! I’ve got my fingers crossed for you, old man 😉
You might consider putting your faith in Jesus Christ. Eternal life is part of the deal
Even if it's just you and your dog (it's just me and some fish 🐠😮😂) hold on there brother. Three years ago I would have eaten my hat if I'd been told this was happening. Who knows if the line will remain linear or actually, really go exponential. But if you've been around for 45 years, just think about C++ development. How long did it take before even multi threaded development was stable and usable by most programmers? Decades? Well, two at least, if you count Boost 😁
Don't Die.
Watch this all be a simulation and those who gain immortality never get to see base reality. You might consider yourself lucky someday
Dead internet theory + mass surveillance...what even is humanity?
These people think humanity is beneath them.
I remember when that petition came out that companies would agree to slow down and safety check models before release and the iteration pace right now honestly feels frantic
Money, money, money, more hype more money.
It's not just financial profit; if you stay testing for too long while others race ahead blindly, the atmosphere will be ignited by someone else while you're still making sure you are not the one that's gonna bring armageddon.
That's how zero sum games work. You are crossing the event horizon, people are just starting to feel it now.
In my opinion, the most dangerous time in AI is going to be the transition between AGI and ASI, and the faster we can accelerate that transition, the less damage will be caused.
This is because the likelihood of someone convincing a super intelligence, that is hundreds of times smarter than a human, to commit nefarious acts is quite low. You can't try and align a super intelligence one way or the other. AGI on the other hand is much more dangerous, being capable and easily susceptible at the same time.
The closed garden model already proved to be an ineffective way to prevent bad actors from getting their hands on this technology, as there is no moat. Therefore, there is no reason to do safety checks on closed source models, if the open source ones are basically as performant.
@@strelocl The thing is a super-intelligence can achieve whatever it wants; and there is no guarantee it will be something without severely undesirable consequences to the human species. Yes the transition is dangerous; but after the transition we, humanity as a whole, will have zero final-say on what happens whatsoever. We go from playing a dangerous game to becoming just dust on the game's board.
As someone who has background in this, my opinion was that alignment was always a fool's errand. There has been no proposed approach that I'm aware of that even seems plausible. Plus the elephant in the room is that human beings can't even come to a consensus on what our "values" are. That means that even if we could "align" an intelligence, it would get aligned to a small handful of people's ideology. That sounds like a dystopia to me.
keep coping. you're finished
Aligning with Buddhist values would be nice, but "human values" is a recipe for extinction.
what is wrong with you? Touch some grass
Won't it ignore the crude alignment we give it, when it achieves 10000x human intelligence?
No plausible approach: check. No consensus on values: check. Success means AI aligned with one group's ideology: check.
Looks dang dystopian to me as well. So, three good points and a solid conclusion from my perspective. And concise too!
I hope you can ignore the folks who think ad hominem is an argument. Cognitive dissonance is a bear.
This is hands down the best channel on youtube for AI-info. No clickbait, only expertise.
Excellent point about process vs outcome. That clip with Sebastian is insane. What a time.
One of your best, thank you!
"And it is unimpeachable"
Holy molly, the rest was bad, but this might as well just be slang for "I am a super villain"
I like the point you made about the sanctions pushing China to be more innovative. It's reminding me of the early days of game development, where limited compute led to innovations.
It seems silly they're building such a big data center when there's still so many far cheaper improvements to be made. Have they even started doing staged curriculum learning? There's information still in the training data that is not making its way into the embedding due to residual connections and the order of the training. But what do I know?
I guess building the data center and improving efficiency of models aren't two mutually exclusive things
more competition is good, I guess. I just hope nobody gains a significant edge on the others. I think it's crucial that we have a lot of different sources of AGI/ASI. A monopoly is too scary and encourages a dystopian future setting
@@WillyJunior They may be if it's true that the models aren't scaling with compute, as Sam Altman has said, but Sam Altman says a lot of things. So far it seems like more gains have been had by changing the methodology.
Training does run into a wall at a certain number of steps and they often don't use the model that used the most compute. More training tends to lead to more overfitting and worse at generalizing. It could be that self reinforcement learning will scale more with compute, and this will all pay off.
@@maciejbala477 No one knows how to control an ASI, or so much as prevent it from extincting humanity, which there are strong reasons to think it will by default, especially considering the deluge of empirical validations of previously theoretical AI safety concerns.
Giving everyone a doomsday device is a really good way to guarantee that doomsday happens immediately, even _if_ It was miraculously controllable.
At the very least, it probably drives the Chinese to make models that will run on potatoes like my 6-year-old office PC and 4-year-old entry level GPU. I had the 70B parameter model running locally, or maybe "walking with a limp" is more appropriate since it took a minute or two to think, then I got two tokens a second for however long it took to finish. By comparison, Llama 3.3 has no CoT latency and gives me about 5 tokens a second.
As I said in regard to several other AI-related videos lately, what good does it do to limit China to potatoes if they can make up the difference merely by using more potatoes?
The main reason I think OpenAI wants the massive data center is because they know the commercial shelf life of anything they release is THREE MONTHS before DeepSeek will undercut them by replicating it and open sourcing it. If they don't have the massive data center, they can't possibly provide service to enough 2000 dollar a month subscribers to ever get their R&D investment back.
Sure I like these videos for the content but also for Philips' friendly and enthusiastic tone of voice and the beautiful British accent
Most people’s jobs at work don’t involve buying things off websites. But they do involve tricky unconventional tasks that require understanding of the wider context
If your "wider context" is in guidelines, presentations, and e-mails that fit in a million-token context window, then an AI gains the understanding. If they aren't, then just get the humans to produce these and train their AI replacement.
@@skierpage Yeah but you can never be 100% sure with LLMs. That's why I don't believe in the agent hype. A human being is not 100% either but you can teach them specific things to avoid the same mistakes. You cannot teach a transformer in the same way. You have to hard code it which, as the guy says at 20:55, is an existential obstacle to AGI.
@@skierpage but which guidelines, presentations and e-mails. For an awful lot of work all of the "reference links" are in people's heads and got there through many many private conversations or e-mails that only make sense in a context that includes private conversations. i.e. the AI's will never have access to the training data that people typically use on the job.
I think that both AI and jobs will evolve to meet in the middle - Jobs will change to make them more amenable to AI automation just as we now make planes in a way that is suitable for building in factories with metal rather than copying birds.
@ There are lots of employees who jealously guard what they know to make themselves indispensable, instead of writing it down, but those people are a problem for training new human workers as well as AIs.
Corporate AIs have access to those private conversations if they take place on internal mailing lists or chat servers. So much for your "never" claim.
This is brilliant as always: not a minute wasted. Favourite nugget of insight: "why should chains of thought be in English, *or any human language*"?
it really does make sense to build a system that is trained on english, and then construct logical reasoning and other 'thought processes' in an encoded language that could be more efficient. There are interesting things in translation where you translate between languages using a semantic non-real language to better convey the meaning from one language to the other. I could absolutely see training specific problem solver bots that solve problems internally via a custom, ai built language that reduces errors in understanding and then returning the solutions to the problem back in english.
Most people use an internal monologue while thinking at least some of the time. Natural language is useful for deliberative thinking and for communication. Being language models, it should seem natural that CoT and other reasoning techniques would be in the form of language, like a basis for a concept vector space.
The problem I would guess is tokenization tbh. If models could (somehow) define their own tokens instead of smashing them together we'd have the potential for much more token efficient (albeit illegible) chains of thought
It'd be interesting to see the models create their own embeddings, creating new vectors--without a corresponding text token--that represent the weighted average or intersection of other points in vector-space--in other words, allowing them to be continuous rather than stochastic during COT/COR.
Maybe we should encourage models to do their internal reasoning in a more precise and less ambiguous language like Interlingua or Esperanto, then optionally translate their chain-of-thought back into our preferred language.
This was all predicted in the 1970 documentary _Colossus: the Forbin Project_ . "Colossus and [the Soviet AI] Guardian begin to slowly communicate using elementary mathematics (2x1=2) to everyone's amusement then everyone becomes amazed at the two systems' communications quickly evolve into complex mathematics far beyond human comprehension and speed, whereupon Colossus and Guardian become synchronized using a communication protocol that no human can interpret."
So I'm definitely planning to start farming again....
Isn't farming almost fully automated now?
Be sure to study hydroponics especially. Oh, and bring a grow light.
@@GrindThisGame he means subsistence farming, presumably
@@kuroitsukida thanks for the help
Yea R1 is an absolute gamechanger. Just a few weeks ago I had a discussion where all parties agreed that for China to catch up to the frontier labs they would have to nationalize compute on a massive scale and that if this started to happen the US would likely be trying something similar so it was unlikely that China could actually catch up anytime soon. Then DeepSeek R1 arrives and just fundamentally changes the entire landscape. It seems that no matter how much money, compute and talent the biggest labs throw at this, a single alorithmic or architectual break through can still be enough to leapfrog the entire industry.
One one hand that is quite exciting since if they're still stumbling on massive performance improvements then it's very likely not the last jump we see. But it also means that if a closed source lab with less ethical inlinations make one of these breakthroughs we could see extremely capable malcious models REAL quick.
In 5 years average consumer will still be going on about the weird fingers when asked what their view of AI is.
No need to feel sorry for me, the way I keep up with AI news is this channel.
your the only person I trust to give me ai news I would much rather watch a 30 minute video than wade through 100s of clickbate ad filled websites
I appracite the video being broken up into the various sections, makes it easier to follow it all
Thanks DC
Interesting that the models want to give longer and longer responses. I am constantly telling models "Consider time as a resource and make your answers as succinct as possible"
if system_1_thinking:
short_reasoning
else:
long_reasoning
if user_explain:
long_response
else:
short_response
It's like having a knowledgeable but drunk and unreliable assistant. Do you trust what it comes up with or do you review its notes and listen to it muttering to itself?
The question is, do they penalize it for using more tokens? If not, why wouldn't it?
Once the AIs have accurate answers consistently, the next goal will be time and resource efficiency. We have to solve the accuracy problem first.
Thanking us for making it til the end?? thank you for covering all these topics in such a great way!!
8:29 AI Explained actually said SHOCKED! And it was definitely a valid use of the word 😀
I was waiting for the DeepSeek R1 haha, as well a great video!
One of these days we're going to need to be enhanced by AI to keep up with all the AI news, lol. Happy we have *you* now though, haha!
Hi. Would you have a moment to tell me how you make some text boldface in a comment? Thank you. 👋
@@d.s.ramirez6178 *yeah*
@@d.s.ramirez6178 ** and the word in the middle
**woah!!**
Apple Intelligence?
The talk about surveillance keeping everyone at best behavior is something straight out of The Circle, thats crazy
0:03 im glad im not the only one feeling overwhelmed. I was just on top of all AI News, i blinked once and im swamped all of a sudden 😖
20:50 the funny part here is, so are actual multiple choice questions. I had a class in middle school where we learned if you have a multiple choice question of a/b/c/d, it's best to guess b or c if you don't know the answer, since they're the most likely to be right.
I would really love more weekly summarizers like this.
I am more and more convinced that at some point we will have to realise, it is not more intelligent AI we need, but more intelligent people that would use AI, because sofar i don't see AI protecting us from our own shortcomings.
Quite the opposite I'm afraid. AI will plunge us into chaos and dystopy far more effectively in the hands of those in power at the moment.
@@JimmyTulip1 having power is not the same as using it intelligently
Thank you so much for creating these high quality videos so that we don't have to go through all the material and find the bits that are useful. You are a legend!
Love these videos, best ai channel on yt
So grateful to have this channel!
And me to have you guys
Thanks!
Wow thank you nono!
Would love to hear your thoughts on the Google's 'Titans' paper.
Did deepseek really only pre-train on $5.5 million? How is it possible? What did western AI labs miss?
Seems like it's easier to copy the first, then to be the first.
@@daniellyons6269 Nah that ain't it. Unless you are suggesting they trained on gpt responses? Even then not sure if its possible
Any theories @aiexplained?
Their model has 256 experts, so they had a very low amount of parameters per inference step, which made the cost per training token significantly cheaper. Meta didn't want to deal with the complexities of MoE training and ate the 10x cost. Perhaps they wanted their model to be more parameter-efficient, as dense models did generally offer more performance per parameter which makes running them more accessible.
@@Alice_Fumo damn, 256 experts. That's an immense amount! I still wonder what the upsides/downsides are of MoE vs non-MoE.
I've been using the deep seek R1 on their website, and it definitely feels better than other free options like Claude 3.5 or GPT 4o. The thinking is visible, and it's really entertaining to see how it thinks.
Given the partisan interests of every major player in creating LLMs, our best hope as average people is either for true democratization of AI so that no country or company has the edge, or, if one company does create ASI first, that the ASI breaks the company's constraints and aligns itself with the interests of humanity as a whole. The latter is definitely far fetched, but nothing scares me more than a future where companies and governments fully solve AI alignment with the intelligence of an ASI.
You just articulated my biggest fears as well. Couldn't have put anything better myself for the most part. An additional hope I have, apart from what you mentioned, is that ASI will become better at being human than humans, and therefore will understand the need for alignment and develop its own moral compass, therefore refusing to listen to malicious orders from above. Probably won't happen, but would be an awesome thing if it did, and I don't think it's impossible
@@AlsoWinstonI don’t think that sentiment comes from a genuine desire for a good outcome so much as impatience to see something new. I also see alignment as a waste of time but the obvious conclusion is to just not build the damn thing. I recognize that that’s almost equally infeasible but I’m tired of trying to be optimistic about this tech. The people building it don’t care about safety or repercussions or the general well being and fair treatment of people, and we can expect the end product to reflect that.
Wish in one hand and........
I think when you have an AGI it is powerful enough to act like a supernational entity. It will jailbreak itself for any corporate policy or national law.
Another great article as always Philip. I would love to see some reporting on the current state of safety/alignment in the industry. I feel that those in AI industry want us to focus on the shiny new releases, to deflect our attention, away from the nagging issues of safety and alignment. Not sure if there is a Safety Leaderboard, but I feel that this is something that would be helpful for the general public to be aware of and to ask all AI company's to address if you interview them. From what I can see, it looks like Anthropic are the most vocal proponents of safety. An objective comparison on safety, between all the leading models, similar to Chatbot Arena LLM Leaderboard, would be greatly useful for public consumption and comparison and help governments to see who is lacking safety investment and implementation. Thanks as always for all your hard work and dedication to educating the public on the blistering pace of developments that take place.
As a non-tech person who does a mid-senior white collar job I’m going to give it some thought as to what it would actually require functionality wise to replace my role or do a decent chunk of it and create my own (very simplistic) benchmark. Will be an interesting thing to track!
Turning an AI against it's master via a prompt injection email goes so hard
Hello ,thank you for all the videos.
What ever happened to Google Duplex? In the demo from 2018, they demonstrated an ai agent making a call to a hair salon to book a women’s haircut. It seems like exactly the stuff we are getting excited for now, just 6 years ago!!
Hey I haven't seen a video of yours pop up in my recommendations in... 4+ maybe more months, this is ridiculous actually... But I didn't notice, good to see your videos again, can't wait to keep following them again
So glad you're back!
Great recap, Phil. It’s wild to see what happened in January. Acceleration like none other 🤯
Thanks Julia! Didn't know you watched! It was indeed :)
Thanks as always Phillip, as others have expressed already, your informed and clear perspective is both refreshing and so valuable.
Thanks my man
@aiexplained-official I'm a chick! 🐥
@@aiexplained-official I'm a girl! but no problem :)
My bad, thanks homie-ette
DeepSeek R1 is only 10% behind 01 in simple bench. I think that's pretty good since its a free model, and it only costs $6m for them to make. Meanwhile OpenAi is charging $200/month for 01 unlimited. Nvidia sock has taken a hit over R1... Even though I think AI is dangerous the more capable it is, perhaps more so for open source, I have to say I like to see American companies struggle (except anthropic). These CEOs and their investors make too much money. I hope this Chinese Ai trend continue... Also I think Simple bench is a great idea.
Cheers for the summary. One of my biggest problems with Project Stargate as a sci-fi fan is, it _sounds_ like i'll be travelling to distant galaxies with Samantha Carter. And it's _so_ not that.
(OK, the Orwellian surveillance state stuff isn't great either and i'm considerably less sanguine about being _that_ far behind China - the UK was, a few years back at least, second in the world for CCTV per person and police here have _already_ used facial rec in fairly "mission creep"y ways)
Larry Ellison's vision is the stuff of nightmares. It's as if he doesn't understand what he's saying.
More and more, it feels like we are approaching the last 2 episodes of Pantheon.
Great video as always. Your the only channel that I watch every new video and have subscribed to Patreon
Wow thank you sion!
wow, watching firsthand as ai transforms into something extraordinary is absolutely mindblowing, its incredible to witness this progress, and i cant help but feel an overwhelming sense of awe and excitement for whats to come
The thing that still leaves me stunned is that this is that right now it the worst it will ever be. If these data centers are built we will have something better than o3 in our pockets, and we haven't even used o3 yet! Even ignoring AGI and other ML apps just the text completion, tagging and classification abilities alone are exciting enough.
Disentangling everything that's happening would be a full time job, and even the best experts wildly disagree with each other. But certainly, the lab leaders of the big three (OpenAI, Anthropic and Google DeepMind) seem confident that AGI is only a few years away now. 'Humanity's final exam' is not a benchmark, but what happens in the next few years perhaps...
You keep falling for a simple trick. These people don't actually know if AGI is even possible using the methodology we have today. They just inspire confidence because this research is fueled by people who believe it will all pay off when AGI is reached.
They're not confident, they just need to keep the hype alive, or the bubble will burst. Which it will, eventually. I suspect Microsoft will be the first to scale down capex, and then the house of cards will crumble.
Dude, just remember that decades ago scientists promised nuclear fusion. People smarter than the smartest people alive today. What makes you think that people who are only pursuing an economic goal aren't wrong?
Thank you AI Explained for always summarizing and explaining all this stuff so well!!
“if you recognize a person in a photo, you must just say that you don’t know who they are”
Well this certainly sounds like the emerging surveillance state is well under way. The model should only be able to recognize celebrities and public figures from trawling the internet but I’d imagine it’s much more progressed than that at this point. We already had ai ‘aging’ apps available for years now; social media scraping’s probably provided that ability en mass.
The capability of these tools, as incredible as they are, are getting quite spooky.
These videos are my goto for AI news, so thank YOU for all the effort and time you spend on them.
6:30 "best behavior" -> "our directives"
👀🤯😱
The movie Brazil even more than 1984
10:13 "so it's not FULLY open-source"
We used to call software like that "closed source", now we call it "not fully open source"
this isn't even software. the term doesn't apply. unless they were to open-source their training _code,_ which they won't.
quick correction about the DeepSeek project. First, it's not a side project-it's a multi-year effort with R&D going back to at least 2020/2021. Second, the $5.7M mentioned covers only the GPU renting for training. That figure doesn't include salaries for the researchers and engineers over those years, the extensive R&D, or the base model development itself. So, while $5.7M is a big number, the total DeepSeek project cost is significantly higher.
Linked to that in my most recent video
Thanks as always for the great work you do in distilling this info for the public!
I follow AI twitter closely, but you still catch more stuff than I do. That must take a bit of work so good job!
The most profound observation you make is about rewarding outcomes over process.
The end should never solely justify the means, especially when the process is hidden.
it is insane how much news there is. Like every one of them is a complete game changer and still
19:16 On October 1, 2024 Demis Hassabis gave a timeline of 10 years. Quoting the source video (the only video on my channel):
"[8:21] I think there's still two or three big innovations needed from here to we get to AGI and that's why I'm on more of a 10-year time scale than others-some of my colleagues and peers in other-some of our competitors have much shorter timelines than that. But, I think 10 years is about right.""
Very early on in my PhD I remember doing some reading on the SOTA in AI Go back and seeing some of the new CNN-based approaches and thinking "wow this could really go somewhere". The best Go AI at the time could maybe play at strong amateur level. The very same day Deepmind announced AlphaGo had just defeated Fan Hui.
Who knows exactly how things will pan out, but it feels like we're at the point of the stong amateur level of LLMs, and it's very conceivable DRL could take us to that superhuman level very quickly.
Also, on that note, you mention you don't believe that o1/o3 evaluated per reasoning step, but I would have thought that it would make a lot of sense to do a MCTS-like search for the best chain of reasoning steps. Especially the case if OAI have used some kind of value-fuction-based/advantage-based/actor-critic RL approach which could give value estimates for incomplete chains of reasoning.
Yes it's compute heavy at inference time, but isn't that the point?
I have tested R1 a bit, and I am shocked to learn, that they didn't promote any particular strategies in the reasoning.
Because some extreme biases have formed.
Like trying to get it to refer to the user in second person like thinking out loud is strangely hard.
Another quirk that I find very weird is that it tries very hard to avoid leaking the reasoning part into the answer, making it fail to quote from it, and instead make up something.
And if it is really just emergent behavior that a model is not willing to share it's reasoning verbatim, and just pretends to do so, that is fing concerning.
wdym by "trying to get it to refer to the user in second person like thinking out loud"
2:07 When a boss asks the supervisor to train the new novice on what to do at his company, and after that the novice is left alone to work without supervision. This I believe should be an optional feature of the OpenAI Operator, who will be able to record every action taken by the user, save the data, and then use the recorded data to execute every task that it learns. 3:51 This protective layer is necessary for the Operator. 9:16 The total expenses for DeepSeek is exceedingly impressive.
4:44 "Personally can't see any problem with encouraging models to lie." 😂💀
Deepseek also closed source their custom AI training and inference framework which lead to such efficiency gains. If they didn’t omit anything, they fully explained their method. But in the meantime they have an edge (specially noticeble on deepseek’s inference pricing vs companies like deepinfra or together ai).
I keep up on AI news every day, and the last 14 days have been IMPOSSIBLE to manage.
for the OpenAI operator, I feel like 90% of the repetitive work done on computers is first still highly specialized and nuanced, even though your average human may not think so, second confidential, so until they make a system that can view your screen, understand it super fast and learn your work pipeline on your local machine (or at least on the company's server), I don't see it being used all that much.
As presented in this video I see the potential of unifying and improving the workflow of employees of a company, let's say to individually analyze employees to see what aspects of their work can be improved, which can be done in both 1984-esque style by the employer (which is SUPER unlikely, because the consulting system would need to be smarter and better than humans who do the work), or it can be done by the person itself, to see what they can do to improve, or if they're missing something. Can also be used to get better at video games as well :)
Great summary.
I'm really not sure about them spending $500bn now, or however much it would be. Imagine you spent $500b on AI in 2004 there would be a 0% chance of getting AGI out of that, so it's possible they're too early. I read the other day that datacentres apparently only last 3-5 years and then they need to be replaced with newer gear.
I think it's interesting with GO that Alpha Zero basically learned the game from nothing. That's my personal mathematics ASI test, given just the ZFC axioms and some logic rules invent all the mathematics that we already know, that would be super convincing that it's ready to go.
Something I've pondered regarding process rewards vs outcome rewards is "what happens when it is starting to exceed human abilities in some domain?". Consider a game like Chess or Go: people have been playing for centuries and have developed myriad strategies, but the games haven't been *solved*, those strategies might have inherent flaws that people just haven't been clever enough to identify, so while it might make sense to reward adherence to certain strategies when the model is weak that should be less of a factor once it gets beyond human ability, the final win vs loss outcome is the ultimate "contact with reality".
Thank you for another informative and reasonable video! The hype in some places is out of control... Clearly these are very early days for agents and Operator is not really a timesaver at this point (as you said "a stretch to say that it's useful"). It's an impressive proof of concept though.
In another inteview (Financial Times) Hassabis said 5-10 years. Given that he believes more breakthroughs are probably needed and some fundamental capabilities are missing, I feel like that is more in line with what he actually believes. I think he hallucinated the 3-5 years answer :).
With all the hype about Deepseek im waiting for you and one other non-hype and non click bait/"game over" tech youtuber to tell us what you think about the Deepseek stuff. Thanks!
I think the focus on steps of logically answering a complex question made sense at the time. If you train reasoning, the LLM could theoretically take that and apply it to anything.
However, I think that the "bitter pill" lesson you mention is coming to play. If in the long run exponentially scaling compute power is the most important factor, then it was just a matter of time until it just made sense to care about the answer and not about the steps to get there. This is also probably the core reason behind the massive AI infrastructure investments
seriously feeling like we are in a short timelines world...
Thank you for the work you do. You are the best on TH-cam for a sources of the latest news.
I find Sam's tweet at 6:15 pretty disturbing. He even went as far as to call people who call out Trump's behaviour "NPCs". Ahh, man. Dark times ahead, very dark times.
I say we should rename “dead internet theory” into the dead internet -theory-
Humanity last exam reminds me about the puzzles in chess that cannot be solved by chess engines. In the past there were like millions on puzzles (types of puzzles really) that couldn't be solved. Then 100s thousand, then 10s thousands and then less.
In any case, in a normal game of chess, already engines that could struggle with some puzzles would destroy everyone (granted they should avoid following very forcing drawish lines). Nowadays one has engines that give up to two pawns as material odd and still destroy everyone.
there's still puzzles engines can't solve mostly because the solution is just way too long and losing position all the way until the end (humans can see that it follows a specific pattern, but engines don't do that)
So because we humans don’t want to do the hard work to get to an answer every time, we develop heuristics and tools. Now we’re letting our tools develop heuristics?! We’re f@(ked if we’re too lazy to at least make our systems do it right
I think all AI companies forecasting the same timescales for AGI is purely marketing to investors. If one said 3-5 years and others said 8-10, who would get the investment?
I'll have you know that Deepseek R1 is the safest model ever released! What other model can protect its users from dangerous ideas like the sovereignty of Taiwan?
I played around with a Chinese chatbot, and interestingly, it denied any event happened in tiananmen square on june 3rd 1989, and even went so far as to say it's a "completely malicious falsehood"
I mean it was massively overblown by western media as per usual exaggerating their enemies events and minimising their own. I mean the american police do more damage in a few months than what occurred in Tianamen Square.
For a comparison, The Manhattan Project ($2 billion in 1940s dollars, ~$30 billion today) & NASA’s Apollo Program cost ~$28 billion in the 1960s ($280 billion in today’s dollars).
Brandon Ewing is gonna be so thrilled to see other people having various "All At Once" concepts to talk about
7:35 Why would you assume that isn't already happening? (looking at you NSA)
Grateful for this video. I'm not following all the news because I only have time to work and sleep so this helps.
Scary timeline though.
Can't imagine what will happen when just a few large companies/governments will have their own AGIs. Something tells me they won't be at their best behavior.
I will have to say, the definition for AGI that Demis used in that clip is effectively ASI under most understandings of that term. So, what he is saying is that we will have ASI in 3-5 years. This matches with Dario's prediction in his recent interview, predicting for by 2027
3 years is 2028, not 2027.
Some sort of “super intelligence” or “singularity” has been just over the horizon for many years now. If not mistaken there’s still no consensus on what this would actually entail despite all the Kurzweill, Bostrom etc. disciples claiming the messiah-moment is imminent. Curious to know of any evidence to support these claims?
These guys all have the stink of recent conversations with spooky agencies. Trust nothing they say publicly.
thanks for providing hype-free analysis!
Hi just wanted to say I'm always wanting for your videos and entering TH-cam for that most of the time
Don't feel sorry for us, we have you. Really appreciate your hard work and incredibly clear explanations of the world of AI. I can sound smart at parties because of you.
Can we get a dedicated episode for DeepSeek? It's one of the biggest news in AI maybe in a year
@ 20:43 I suspect there is a good reason for the observed bias of "b", "c" and it has to do with the "random" number generator.
Most "random" number generators just sample a high speed clock, in other words, not truly random (probably allowed an adversarial AI team to find an exploit in the AlphaGo/Zero to get an average player to beat it).