This is the only channel I still trust to get my Tic Tac Toe news
This is the comments section I come to get the most accurate Tic Tac Toe news. Sorry Phil XD
The real shipmas is the frequency of these AI Explained Videos.
For real, he’s spoiling us.
the most impressive thing for me is that they actually have the capacity to roll this out. we've come a long way since google got caught flat-footed and had nothing more than poor old LaMDA-based Bard prototypes because everything else was too heavy to serve
That's because none of this is particularly hard - it just requires lots of compute - which they have.
The debate over slowing progress in LLMs overlooks a key point: while model advancement rates may be debatable, we're nowhere near realizing the potential of existing capabilities. Emergence isn't just about unexpected model capabilities appearing; it's also about practitioners discovering unexpected possibilities through creative applications of current systems.
Not to mention that if Einstein was struck by lightning and became 10x smarter, the benchmark wouldn't be able to reflect it. It would look like 1.1x or something 😂
:-) - lol
yep. There are still ways to use the current LLMs, without improving them, that don't exist yet. That's also progress if it's implemented, just perhaps slightly less exciting
I get what you mean, but if the long-term goal is to at some point reach something that can actually be called AGI, the foundation being used won't ever bring us there no matter what novel approach comes out. I think we need one or two major breakthroughs at the level of “Attention Is All You Need” from 2017. These architectures leveraging transformers initially started as a very effective tool for translation, and eventually with fine-tuning we were able to get GPT-3.5; currently the newest approach is to use chain of thought to imitate reasoning.
While the progress is still impressive, we won't ever build something truly intelligent this way (it still seems that if you give a SOTA LLM a problem it hasn't seen, it will struggle, and even if it has, it is still prone to generating a misleading token, sending it entirely off the right solution). Another problem I've been encountering online and even experienced recently is skepticism about using these models in production within a somewhat serious application; the propensity of every single model to “hallucinate” carries too much risk, since at the end of the day, that's what they're basically designed to do: hallucinate based on the text they've been trained on and given as a prompt. Very excited to see regardless where we will be in the next 5 years
In reality this is how all technology advances, massive strides and then iterative steps which lead to enormous advancement over time
So far this new Gemini is the only amazing thing to come out during OpenAI’s 12 days
Nobody cares about Gemini.
🤣
@tunestar I use Gemini way more than ChatGPT so idk what you're going on about
@tunestar and nobody can get anything by OpenAI.
@tunestar it doesn't matter. it's a google "default". defaults rule the world
At first I thought Sam Altman was a hero but the more time passes and the more he speaks the more I realise he's just a hypeman. I don't blame him, it's his job, but it does reduce how much I trust him.
Amazing video as usual Mr Explained!
Thank you so much for your videos! Quick uploads, high quality, intelligent, and yet still fun to watch. In the past weeks, the amount of time I have has decreased drastically. I stopped watching a lot of different AI YouTube channels. But let me tell you this: I did not miss a single video of yours, and I don't plan to ever miss one!
That is so kind and means a lot
“There isn’t really a wall per se, but there is a bit of a hill that we need to hike.” - Sundar Pichai
Pretty much what Altman is saying. No wall, just harder to make progress.
@byrnemeister2008 Meanwhile Noam Brown at OpenAI (one of the main guys behind o1) has just confidently said that progress will accelerate in 2025.
Pichai was saying that it gets really steep, but when the "competition" was mentioned, he changed his tune (investors are listening)...
Most surprising fact from today’s video is that your name is Phillip :D
ikr
must be new then :D
30 minutes ago I opened Gemini, just setting up basic parameters I want carried out in the background, to check out every couple of weeks, as an experiment. As always thank you Phillip
the tic-tac-toe part was gold 🤣 Amazing video as always! Thank you for the laugh and the great info 👏
was it a reference to the last video where he was told he got the tic tac toe problem wrong and told the AI (which got it right) that it was wrong?
Yes. The AI also got it wrong tho, not just him.
And it was quite an easy problem, but man was up all night reading research papers so we shouldn't be too hard on him.
I used the studio to help me solve a puzzle live in a videogame. Who even needs game guides anymore :)
Major progress, I suspect, will shift from scaling giant general models to assembling smaller, narrower-domain specialized models -- along with memory storage and management components, and some kind of domain identification/routing element -- into a sort of modular system that's smarter than the sum of its parts.
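A minimal sketch of what that could look like, purely illustrative: the domains, the keyword-based classifier, and the model callables below are hypothetical stand-ins (a real system would presumably use a learned classifier and actual model endpoints).

```python
# Illustrative sketch of a "router + specialists" modular system.
# Every component here is a hypothetical stand-in, not a real API.
from typing import Callable

def math_model(query: str) -> str:      # placeholder specialist
    return f"[math specialist] {query}"

def code_model(query: str) -> str:      # placeholder specialist
    return f"[code specialist] {query}"

def general_model(query: str) -> str:   # fallback generalist
    return f"[generalist] {query}"

SPECIALISTS: dict[str, Callable[[str], str]] = {
    "math": math_model,
    "code": code_model,
}

def classify_domain(query: str) -> str:
    """Toy domain-identification element: keyword matching stands in
    for what would realistically be a small classifier model."""
    q = query.lower()
    if any(w in q for w in ("integral", "equation", "prove")):
        return "math"
    if any(w in q for w in ("python", "compile", "stack trace")):
        return "code"
    return "general"

def route(query: str, memory: list[str]) -> str:
    """Dispatch to a specialist; the shared list is the crudest
    possible stand-in for the memory/management component."""
    domain = classify_domain(query)
    answer = SPECIALISTS.get(domain, general_model)(query)
    memory.append(f"{domain}: {query}")
    return answer

memory: list[str] = []
print(route("prove the sum of two even numbers is even", memory))
```

The extra classification hop is also where the latency concern in the reply below comes from.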
that would surely introduce a lot of latency, and I doubt that's the way for AGI
@goldenshirt Cacogenicist's observations do seem like the most likely areas of progress. People's desperation for AGI blinds them.
Any reason for thinking this?
@davidlovesyeshua Human minds comprise somewhat domain-biased, sort of soft cognitive modules (permeable, still relatively domain-flexible) along with harder (less permeable, less flexible) evolutionarily older modules with clear neuro-structural correlates. They aren't just big, homogenous natural language processing blobs.
@davidlovesyeshua if you're talking to me, it's because using many different specialized models is very different from AGI, you could even say it's the opposite.
AGI is supposed to be the best, or close to it, in all domains (general intelligence), so using different models will not create one model that's good at everything.
Of course it's doubtful that AGI is possible, but I hope it can happen
It can't be denied that Advanced Voice has way higher sound quality than Project Astra
Ahh, blessed voice of reason :). And yep, even long "pre-AI" HUDs (that essentially calculate the odds, advise "textbook best play" etc. during hands) were a big issue in games like online Poker (sites initially tried to stop their use but now some at least actually bill it as a "feature").
AI will just expand that to, well, potentially _every_ game (and _that_ actually seems a sane use case to me, even with hallucinations, because it gives you an edge BUT - unless you've got a gambling problem - the stakes aren't _that_ high, unlike e.g. medical diagnosis, driving etc.).
6:20 the reference to the mistake on the previous video is hilarious
IKR
This is probably the only Google product I can say I approve of so far. Mostly because they finally listened and toned down the censorship. You can create a tailor-made model now specifically for you, with tons of fancy features that can help you and a lot of input tokens.
I was and still am supporting OpenAI, yet this last year they have been hit hard with a lot of the key developers leaving. One of the biggest issues, I think, was that o1 was meant to be ChatGPT 5.0, yet it wasn't what they were hoping for. The only answer they have to fix the current issue is to simply throw more compute power at it, which is why it has started to cost so much, as the compute power should only be needed at the training level.
This is not football 😊
i've tried Gemini 2 Flash in my native language (french) and the results were HILAROUSLY bad. i asked it, "hey! can you hear me okay?" and it wrote me, i kid you not, *an essay about the meaning of what the phrase "hey! can you hear me okay?"*, instead of just replying. it did that for anything i asked. like i would literally just say "hello!" and instead of saying hello back to me it would offer translations, suggestions, explanations, of... "hello", instead of talking. i've never seen a language model do that before.
that's a true LLM moment right there lol. Ah, the experience of running into a mistake that a human would never make under any circumstances...
I spoke to it in Greek and it replied with bogus TV series Greek 😂😂
Advanced voice mode from OpenAI started speaking Dutch with mistakes when I asked it to have a basic conversation in Dutch with my wife to help her learn the language. One time it started speaking German with me, and when I told it I don't speak German, it switched to English with a German accent 😂
I recall speaking to Open AI's advanced voice mode in French and discussing movies and when I mentioned a Ghibli movie it started speaking to me in Japanese. I could make out little bits of what it said, but I had to ask it to repeat it in French and not do that again. Later we switched to English without issues.
It was also bad at transcribing the French we were speaking. It would speak and understand me much more accurately than it would document in the transcription.
Proud of OAI shipping Gemini Flash 2.0 and all those amazing tools for their shipmas lol
Thanks for the timely video, Philip - you are the go-to source for me!
Sundar's "the low hanging fruit is gone" and Sam's "there is no wall", are not contradictory. To me, they are both obviously true. A year ago, any company with a billion-dollar GPU cluster could advance the state of the art if that was their goal. Today it is no longer a given. That doesn't mean there is some kind of barrier. We can reasonably expect ai abilities to improve at least logarithmically in proportion to the amount of compute applied to it (e.g. something like, double the compute gives you 5% more intelligence). That's not great, but even if that is the best we can do (spoiler alert, it's not that bleak), the tide of moore's law will still carry us to superintelligence in a reasonable amount of time. So, right, "there is no wall", but also, "the low hanging fruit is gone".
Loving the frequent updates from you
This is definitely big for their data collection program😯🔥
Ultimately, while I believe all of this is relatively amazing and I can see the potential for this tech, I can't really see myself using it until it's able to be run entirely locally and for free. I want to have control over my data, my privacy and my system, and until an open source model that can do all of these tasks can be run locally, uncensored and unrestricted, I doubt many people will use these services.
💯
I think you're a bit overly optimistic regarding how aware people are of these privacy issues. The vast majority of consumers don't know the difference between something being run locally or remotely. For major companies this isn't going to be a factor in their marketing at all.
Just like in any business, when the easy gains are over you need to get out the big guns, like:
- Paying more to keep the good researchers
- Hope for accidental discoveries
- Put some money aside for the inevitable waves of new machine learning graduates, that should come online in a year or two
- Keep scaling compute efficiency
- Putting your researchers on crunchtime. This will never go poorly, am I right?
"AGI is in the air" Brockman said. Hope we end this shipmas with a banger from both Google and OpenAI. Great video, dude!
Wouldn’t hold my breath, they already gave us o1 pro, what else is gonna come now 😂
@@julius4858right, but you never know
@julius4858 There was either an error or a leak of a GPT-4.5 in their web UI on the first day when they released o1, so maybe that. Although why would they need an o1 pro if they had a 4.5 (which would at least need to be able to beat Sonnet 3.5 (new))
@@julius4858 Alas, as a pessimist, I agree. Start with the big one. Everything else is gravy.
@@AmandaFessler I mean it wouldn’t even make sense to work on o1 and release it with a 200$ price tag and then release an even better model, days later? Idk, I don’t see that happening
The trajectory of uses at OAI and Google is noteworthy: both integrate them at the same time but with different strategies. Google is still experimenting, while for OAI it is part of its service. I can see use cases still on the horizon, and how it can impact education and business is the most important thing to see and test.
Excellent callback on the noughts and crosses lmao
We need to talk about Genesis.
Thanks for the video! I understand what you said about the benchmarks, but what Gemini Flash got on GPQA, MMLU Pro, etc. is on par with the new Sonnet 3.5. They are impressive. I understand you weigh your own benchmark higher, but that doesn't take away from how impressive it is on other benchmarks. Always enjoy the videos!
I think it's abundantly clear that progress in terms of LLM intelligence has slowed down (of course, depending on what you take as your starting point) and Sundar is obviously right that the low hanging fruit is gone. While we see a lot of interesting additions like step by step reasoning, better text to video, etc., they all come with the same problem of frequently hallucinating and not being reliable, and progress is incremental, not exponential. I'm not saying those incremental gains are not interesting or even amazing, but they are surely not as revolutionary as the times when LLMs became more public, or when things like image recognition, text to video, GPT-4 were launched. We have the same tools as back then, just slightly better. I would argue that for most people things like o1 and Sora are not interesting developments.
I think that this technology is really game-changing in perspective. A system that can "see" and "listen" like people do unlocks a vast amount of use cases
Best thumbnails in the game.
Any thoughts on Anthropic Claude's MCP?
Thanks! Great content, as always! 🙏🏼
Thanks for this summary! Much appreciated!
Real ones that get the tic tac toe joke button
Give him a break😂
yeah that was a cute reference
Very cheeky, we love to see it 😏😁
Gemini 2 is looking amazing!
I may be wrong, but I believe that when Sam said “there is no wall,” he was referring to this quote from The Matrix: “there is no spoon.”
Lol
I thought it was Monty Python. “Found this spoon, sir”. “Well done, Centurion”
Liked for the use of the word "Normies".
I love how you owned your tic-tac-toe mistake from a previous video :)
It's funny that you brought up competitive gameplay at the end. I already feel like every game is seeing an uptick in cut-and-paste Python aimbots.
Great video mate 🎉
As usual. Great video !
Tried to use Live and it failed.
Sorry to hear, Max
Startups depend on hyping up AI. Google doesn't. Everyone is aware of the limitations of scaling pretraining. Last year OpenAI was hyping Strawberry as if it was AGI; turns out it does not deliver significant gains compared to standard models.
No, they weren’t. You may have heard that, but if so it’s because you’re not paying attention. Absolutely no one from OpenAI-not a single, solitary employee-ever at any point claimed or even implied that strawberry was AGI. What’s more, Altman said multiple times, on the record, that the models they’re working on now, including strawberry, are NOT AGI, that they’re still pretty dumb in many ways, and that people should set realistic expectations. What you said is not even remotely close to being true, in any way. Pay better attention.
@therainman7777 Where were you during their Microsoft drama? The paywalled article about Strawberry, Q*, etc. Their teams on Twitter. o1 isn't remotely close to what it was hyped to be.
Basically, Jimmy Apples is probably an OpenAI psyop and it might just be Sam Altman's sock puppet
I tried to watch a film together with Gemini, but it just kept responding to the movie every two or three seconds. It seems it cannot tell the difference between my voice and the sound in the video.
Yeah, that would be a big unlock
Sounds like watching a movie with my family lol
Cool tools, still far into their infancy. Thanks for the video.
6:20 quite an expert at tic-tac-toe huh... I can't agree more!!😂
I think we’re all waiting for your take on O3 here!!
is this AGI Philip?
I remember I read the "Never Eat Alone" book. Finally the promise in the title is becoming real.
Let's say it is slowing down; even with another year of progress before a significant pause, we will have models that will probably max out most benchmarks. That in itself is nothing short of astonishing and shouldn't be written off as a "slowdown in progress".
Great video. Thanks
I'll be really interested to see Gemini 2.0 Flash's pricing. On benchmarks done by Artificial Analysis it actually seemed kind of competitive with even GPT-4o or 3.5 Sonnet in (some) areas (e.g. 87 MMLU, 59% GPQA Diamond, 91% HumanEval, which is all quite decent, especially for the smallest "Flash" model). Obviously on Simple Bench, as you point out, it's scoring a lot lower. But if it's priced anywhere similarly to 1.5 Flash then the ratio of price to performance will be actually insane. Or perhaps we may see a pricing hike just as we did with Claude 3.5 Haiku.
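To put rough numbers on that price-to-performance point (the MMLU scores are the ones quoted above; both prices are assumptions, with the Flash figure simply mirroring 1.5 Flash's old input price, not announced 2.0 pricing):

```python
# Back-of-the-envelope price-to-performance. All prices are assumed
# placeholders, not announced figures for Gemini 2.0 Flash.
models = {
    "Gemini 2.0 Flash (at assumed 1.5 Flash pricing)": (87.0, 0.075),
    "hypothetical frontier model":                     (88.0, 2.50),
}

for name, (mmlu, usd_per_1m_input) in models.items():
    ratio = mmlu / usd_per_1m_input
    print(f"{name}: {ratio:,.0f} MMLU points per $ (per 1M input tokens)")
```

On those assumed numbers the small model delivers over 30x the benchmark score per dollar, which is the sense in which the ratio would be "actually insane".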
Where does Microsoft's Copilot sit within all these new announcements? Does Microsoft risk missing out (again) on the next wave of personal assistants around automating local/personal tasks for users on their local computer/phone?
they use openai models
my biggest use for AI will be to read dialogue in games, the day there is an AI for that
4:36 would have been easier for me to discern what you were talking about if you had animated a red circle or added some other indicator to point to which benchmark you were referring to. i barely even noticed the mouse pointer
are we forgetting that gemini 2's omnimodalities, ya know, the thing that was actually cool, don't come until later this year
When can we get o1 benchmarked on Simple Bench?
They are rolling out the API 'in the coming weeks'
i found gemini 2 to be veeery willing to hallucinate in an effort to answer any question
Llama 3.3 70B results on Simple Bench are remarkable considering its size and the fact that it is open source
It seems that lack of inference time compute is the main reason these advanced features aren't being released publicly. It could be years before compute costs have fallen enough to make this intelligence "too cheap to meter".
Given the amount of releases, bro is working overtime. Pulling all-nighters
Do you play an instrument @AIExplained? Take a screenshot of any sheet music and ask it any question, it will get it wrong - no matter the model
imagine you buy a Copilot PC but you're still waiting for the live assistant and you see this
thank you 4kliksphilip
Google is a company that does much more than just AI, so they can talk about it more realistically. In contrast, OpenAI and Anthropic ONLY do AI; they have no choice but to claim dramatic improvements in AI.
My personal view is that the reason for this AI explosion is that, via the transformer architecture, we were able to shift the task of developing a cleverer AI from humans (algorithmic improvements made by humans) to hardware. So we can overload the hardware and get better AI. At this point it seems we are back to improving the algorithm, since it's too costly or too difficult to simply train larger models. It's a phase change in how progress is made, and it will be much slower.
I like how not a single person at Google actually plays Clash of Clans to point out that giants don't attack town hall
Would you talk about the Byte Latent Transformer?
will you be doing a video on test-time training? it's a technique used to significantly improve a model's abstract reasoning on the ARC benchmark
Multi-billion dollar data centres and R&D investments in AI, and we now have one hell of a gaming assistant.
I wanna see AI achievements we can all benefit from e.g. Healthcare milestones (Physical & Mental), climate crisis solutions, individualised education, encryption enhancements, energy efficiencies, reduced starvation, methods to prevent wars, etc.
I'm over the toys... not Santa's toys, I still want those.
The real Christmas gift would be an open Gemma 3
why did you remove the pauses while talking with gemini? Are the pauses worse or better than OpenAI's voice mode?
I got very few pauses earlier in the day then loads later so hard to pick an interaction that was representative
When I tested Gemini 2 screen-sharing a Grafana graph, it failed every single question. It seemed to just make up stuff from generic knowledge. It could not read the legend or the menu selections. It was a total fail.
OK, after testing on an iPad I also got some strange behavior. Basically, on the iPad Gemini needs permission to access the microphone, and on the PC you need to attach a microphone for Gemini to work (otherwise it just makes up stuff). Once I made those changes, Gemini was able to give me detailed info about the screens I was sharing.
Google might leverage Google Analytics data to train their model. Imagine events called "item added to cart". A gold mine
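For reference, that kind of event already has a standard shape in GA4's ecommerce reporting; a payload looks roughly like the sketch below (all field values are made up for illustration).

```python
# Roughly the shape of a GA4 "add_to_cart" ecommerce event.
# All values below are made up for illustration.
add_to_cart_event = {
    "name": "add_to_cart",
    "params": {
        "currency": "USD",
        "value": 29.99,
        "items": [
            {
                "item_id": "SKU_12345",         # hypothetical SKU
                "item_name": "Example widget",  # hypothetical product
                "price": 29.99,
                "quantity": 1,
            }
        ],
    },
}
```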
Has Gemini 1.5 Flash been tested on Simple Bench?
Love Simple Bench. Surprised that GPT-4 outperformed most others besides Sonnet and o1 when I tested some of the demo questions.
*SUNDAR PICHAI:* “The low-hanging fruit is gone.”
*SAM ALTMAN:* “There is no wall.”
*DARIO AMODEI:* “…if you just eyeball the rate at which these capabilities are increasing, it does make you think that we’ll get there [to AGI] by 2026 or 2027.”
I’d place my money on Sundar Pichai.
Ed Zitron, AI-skeptic, said earlier this month, “Generative AI's products have effectively been trapped in amber for over a year.” (And, as I’ve said previously, I tend to view what Sam Altman says in much the same way columnist Mary McCarthy viewed what playwright and memoirist Lillian Hellman had to say.)
Of course the low hanging fruit is gone for the company that's been the furthest behind for most of the race. There is currently a wall, but it's entirely artificial and a result of methodological pigeonholing and bad curation/system prompts/DPO/clamping; as soon as one of them finds the right path, there will be another watershed, even more so with some new long-range attention mechanisms inspired by quantum surfaceology
@alakani “Of course the low hanging fruit is gone for the company that's been the furthest behind for most of the race.”
Well, that might be true but I'm not so sure that's what Pichai meant.
I guess my take is there’s a wall in terms of the current technology, i.e., more scaling, more “compute” is not going to lead to AGI. (Right at this point, I don't think we're even at I.) But I'm no computer expert, just a regular person, so it's just a hunch.
@jeff__w Yeah I think I agree with you on that, a fundamental architecture change will be needed, but it's not really that big of a change and can even be retrofitted without full retraining. An analogy I like to use is that on December 8th 1903 the New York Times published an article that matched the sentiments of the military and mainstream science: that humans achieving flight was an unrealistic fantasy because materials were just too heavy for machines to fly, and that it would take 1 to 10 _million years_ - This was of course 5 days before a couple randos did it in their garage for a few bucks
@alakani I liked your NYT human flight story, but I don't know who the Wright brothers in the current AI scene would be; the current crop of AI promoters seem more like Otto Lilienthal than Orville and Wilbur to me. And I'm not sure a fundamental architecture change _won't_ be a big change, but we'll just have to see. And, in any case, I don't know if Dario Amodei or Sam Altman are thinking anything like that; again, it seems like, to them, _more_ of the current architecture is what will get us to our Glorious AGI Future. (And, whatever Sam Altman thinks, I don't trust a syllable out of his mouth, anyway.)
Tried the Google instant test tool on my iPhone. Pretty glitchy (it didn't think it had access to the video). Will try again tomorrow. Good that these tools are becoming more practical. My guess is that to develop significantly more intelligence than a human we will need a) some form of "embodiment", enabling these models to continuously learn from the environment, maybe over an extended period, and b) a new technology for building, training and running these models significantly more efficiently, such as analogue computing circuitry or optical computing circuitry (these are just examples; I don't have a view on which, if either, might be made to work).
After this release, OpenAI is forced to release screen sharing. If not, I will have to make the switch
Google is just being honest, and with DeepMind behind them, I have the most confidence in Google and what they say.
thx 🙏 🤖👍🏼
Not sure if I would be comfortable with using an AI that's not run locally to be my computer assistant.
Why do coding adventure, 3b1b and AI explained all sound similar? Are they one person?
I trust Sam Altman more than Google, because Google inserts features secretly. Google has developed so much AI, but it is highly secretive.
There is a simple explanation for the difference between what Sundar Pichai is saying and what Sam tweeted. Sam Altman, as I have said many times in your comments, is a liar. 😉
Where I work, I can see pathways to automating 90% of what my department does. With just existing technology.
smash that notification
Up to this point, I only trust benchmarks with private test sets
Yeah I'm up
Wow, AI will be able to teach itself how to play a game? I can finally max all my skills in RuneScape
Please add Granite 3 to Simple Bench
"Altman: 'No limits to AI!' Sure, gotta keep the investors happy. Pichai: 'Let’s be cautious.' Easy to say when you're sitting on a pile of cash."
“Knots and crosses” Huh, I’ve never heard it called that before.
Noughts and Crosses, a British name
I see they've gone for the Stephen Hawking Voice Mode on Gemini, i.e. TTS circa 2000
Nice
Anticheat will pretty easily be able to prevent agents from playing for people.
6:17 😉😉😉