How Far Can We Scale AI? Gen 3, Claude 3.5 Sonnet and AI Hype

  • Published Sep 12, 2024

Comments • 661

  • @Lucas-gt8en
    @Lucas-gt8en 2 หลายเดือนก่อน +497

    I think the fact that Zuckerberg sounds vaguely human is the most impressive AI advancement yet

    • @Penrose707
      @Penrose707 2 หลายเดือนก่อน +1

      Oh please, don't celebrate what was obviously a transparent PR move to become "cool" in the eyes of the youth. Broccoli hair and chains... all of a sudden. Ok. He's still the dude who purposefully rocked the Julius Caesar flow for like a decade. Pray tell. What other data does he want to steal from us to flip for profit? Which other brilliant app is he going to ~~create~~... er, buy, which will incite depression in our population?

    • @dg-ov4cf
      @dg-ov4cf 2 หลายเดือนก่อน +32

      I was just about to comment about how painfully practiced his "normal human being" act is, like you just know he probably paid money to the best speech therapists and performance mentors in the world to coach him on all the mannerisms, the laugh and everything (unless you believe he just started hanging out with a bunch of chill frat dudes).
      Blows my mind how he goes from being (rightfully) clowned 24/7 for his ruthlessness and cold reptilian demeanor to somehow becoming everyone's favorite smiley happy-go-lucky bro, and now you see these comments everywhere he pops up. It kinda makes his feudal worldview even more insulting, because it kind of says he figured normal people were such easily manipulable NPCs that all it'd take was some smiles and a shiny open source model to go from being Mark "They actually trust me. Dumb fucks" Zuckerberg to Mark "Friendly Llama Man" Zuckerberg.

    • @reza2kn
      @reza2kn 2 หลายเดือนก่อน +9

      He's running on llama3-405B :D

    • @martiddy
      @martiddy 2 หลายเดือนก่อน +2

      Vaguely human, lol

    • @Merlin_Price
      @Merlin_Price 2 หลายเดือนก่อน +5

      He still looks exactly like I imagine Ronald McDonald looks without make-up.

  • @jonp3674
    @jonp3674 2 หลายเดือนก่อน +78

    Great video as always. I think one thing is it's really hard to compare machines to humans.
    So a pocket calculator is highly superhuman at arithmetic and really bad at tic-tac-toe, so how "intelligent" is it compared to a human?
    I also think if you ask undergrads "You have 20 seconds to produce the best answer you can to the following questions"
    "What were the causes of the 7 years war?"
    "How do plants use Boron?"
    "Give an overview of the Mayan religion and cultural practices around it."
    "Translate the above questions into 15 different languages."
    Then yeah clearly the current models are going to absolutely destroy all undergrads, in 20 seconds maybe a specialist in one of these subjects could garble something on one of the questions.
    So yeah it's really complicated. Like Chollet has examples a 5 year old can do and the models can't, but there's a lot of things the models can do that even expert humans can't. So it's really hard to compare.
    And a model which is super human at drug discovery and can't drive or play tic tac toe is still going to change the world massively.

    • @aiexplained-official
      @aiexplained-official  2 หลายเดือนก่อน +19

      Great points. And thanks Jon

    • @alihms
      @alihms 2 หลายเดือนก่อน +5

      It is as if we are working toward one monolithic AI system that can do all those things. That's not the way to go. A simpler and better approach is to have a system that can pull input from various sources (i.e., AI sub-domains), analyse them, and make intelligent decisions and take actions based on that. There should be an AI sub-domain that deals with factual data (how do plants use Boron?), another sub-domain that deals with heuristics and logical reasoning (should I walk towards that hooded guy in that dark alley?), a computational system (plays tic-tac-toe, calculates when an eclipse occurs) and so on.
      By focusing on individual specialised sub-domain AI trainings, it should be easier and less error-prone to achieve the integrated general purpose AI.
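      A minimal sketch of the kind of router this implies (every name below is made up, purely to illustrate the shape of the idea, not any real system):

      def factual_handler(query):        # factual sub-domain, e.g. "How do plants use Boron?"
          return "answer from a curated knowledge base"

      def reasoning_handler(query):      # heuristics / logical reasoning sub-domain
          return "answer from a reasoning model"

      def computational_handler(query):  # exact computation, e.g. tic-tac-toe, eclipse dates
          return "answer from a symbolic or numerical solver"

      HANDLERS = {
          "factual": factual_handler,
          "reasoning": reasoning_handler,
          "computation": computational_handler,
      }

      def route(query, classify):
          # classify() is an assumed small model that labels the query's sub-domain
          domain = classify(query)
          return HANDLERS.get(domain, factual_handler)(query)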

    • @FortWhenTeaThyme
      @FortWhenTeaThyme 2 หลายเดือนก่อน +2

      Exactly. People will complain about minor hallucinations, but it's like... most doctors even have ~1% hallucinations. For my history teachers it was probably 5%. We're talking about a single brain that knows nearly everything about the world, and we're complaining that occasionally it gets minor details wrong.

  • @micbab-vg2mu
    @micbab-vg2mu 2 หลายเดือนก่อน +108

    Claude 3.5 Sonnet is great :) I cannot understand why less than 5% of people in the large corporation where I work are enthusiastic about generative AI; most of them haven't even tried it.

    • @HighPotentateCanute
      @HighPotentateCanute 2 หลายเดือนก่อน +37

      NFT grifters have poisoned the well, as far as getting the public really excited about new tech goes. Most folks are too tech illiterate and just want things that work. IMO of course.

    • @verigumetin4291
      @verigumetin4291 2 หลายเดือนก่อน

      @@HighPotentateCanute you live online if you think the average joe knows what an NFT is. Just like they didn't know and didn't care about NFTs, people don't know and don't care about AI.
      Until it takes their job away. Then they care.

    • @oiuhwoechwe
      @oiuhwoechwe 2 หลายเดือนก่อน

      They don't understand and don't want to understand, because the result is scary for them. Expect pressure for pitchforks and strict regulation from politicians soon. I expect they will create a false flag event to kick-start that!

    • @sebastianjost
      @sebastianjost 2 หลายเดือนก่อน +40

      The frequent hallucinations are still problematic. You need experience to work around the limitations of most current LLMs. That's time people would need to invest to get the full benefits. In some domains LLMs can be super helpful, but deviating from established workflows is also a significant cost for most people.

    • @GeoMeridium
      @GeoMeridium 2 หลายเดือนก่อน +5

      @@sebastianjost I agree with you, but when you combine recent research with the level of scaling planned for the next wave of models, these errors and hallucinations are going to become a lot less frequent.
      Some hobbyists are already discovering ways of using AIs to create self-checking workflows, so I think that the common complaints about AI are going to fade.
      With that being said, the timeframe of development could get held up by chip production. In a November 2023 DARPA report, it was mentioned that OpenAI's training of GPT-5 had been held back by Nvidia's chip production backlog.
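      One of those self-checking workflows can be as simple as a generate-critique-revise loop; a rough sketch, where call_llm is a hypothetical stand-in for whatever API is being used:

      def self_checked_answer(call_llm, question, max_rounds=2):
          # Draft an answer, then repeatedly ask the model to critique and repair it.
          answer = call_llm(f"Answer carefully:\n{question}")
          for _ in range(max_rounds):
              critique = call_llm(
                  f"Question: {question}\nAnswer: {answer}\n"
                  "List any factual or logical errors. Reply OK if there are none."
              )
              if critique.strip().upper().startswith("OK"):
                  break
              answer = call_llm(
                  f"Question: {question}\nDraft: {answer}\nCritique: {critique}\n"
                  "Rewrite the draft so the critique no longer applies."
              )
          return answer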

  • @daveogfans413
    @daveogfans413 2 หลายเดือนก่อน +245

    3:48 - Legit sounds like someone is having a mental breakdown

    • @julkiewicz
      @julkiewicz 2 หลายเดือนก่อน

      A mental breakdown while also having an orgasm

    • @dexterrity
      @dexterrity 2 หลายเดือนก่อน +43

      sounds manic indeed

    • @lowmax4431
      @lowmax4431 2 หลายเดือนก่อน +5

      What is that clip from?

    • @VividhKothari-rd5ll
      @VividhKothari-rd5ll 2 หลายเดือนก่อน +6

      Sounds like old movies

    • @justtiredthings
      @justtiredthings 2 หลายเดือนก่อน +21

      It sounds like they deliberately trained it on "sexy" speaking, but it sounds insane here because it's so exaggerated and false and bc of the pedestrian context. Creepy.

  • @Hunter-uz9jw
    @Hunter-uz9jw 2 หลายเดือนก่อน +200

    Zuckerberg becoming one of the more normal and balanced figures in tech is a welcome surprise lol

    • @Jm-wt1fs
      @Jm-wt1fs 2 หลายเดือนก่อน +37

      I have a feeling that he doesn’t really care about open source as a principle, but bc he was too far behind the closed source companies with AI development, it was just a business decision that was a smart move by him. Though I will say, regardless of his true beliefs or motives, the field of AI is in a significantly better place than it could’ve been, thanks to the new open source Zucc, man of the people. Who knows, maybe he’ll even open source the code he’s made of one day

    • @mikebarnacle1469
      @mikebarnacle1469 2 หลายเดือนก่อน

      ​@@Jm-wt1fs Facebook always had a great open source reputation, just look at React and the ecosystem they support around it. My personal favorite language Rescript, only really exists because they backed the core contributors. It's more about attracting talent, and I think an authentic recognition of the broader benefits of OSS. I don't like them, but gotta give them credit here. It could be worse.

    • @chrisanderson7820
      @chrisanderson7820 2 หลายเดือนก่อน +5

      @@Jm-wt1fs Everyone else decided to go Apple form factor and he decided to go the IBM PC form factor. Become the substrate, then you can sell Office for $500 a pop later on.

    • @faselblaDer3te
      @faselblaDer3te 2 หลายเดือนก่อน +1

      The balance of the force must always be maintained

    • @AISafetyAustraliaandNewZ-iy8dp
      @AISafetyAustraliaandNewZ-iy8dp 2 หลายเดือนก่อน +2

      I'm pretty confident that his position is going to age poorly.

  • @jippoti2227
    @jippoti2227 2 หลายเดือนก่อน +115

    Demis Hassabis says that AI is overhyped in the short term and probably underestimated over the long term.

    • @aiexplained-official
      @aiexplained-official  2 หลายเดือนก่อน +29

      Sounds right!

    • @byrnemeister2008
      @byrnemeister2008 2 หลายเดือนก่อน +13

      He does seem to be the most grounded of all these AI superstars.

    • @alansmithee419
      @alansmithee419 2 หลายเดือนก่อน +14

      An AI youtuber (don't know which one, might have been AIExplained for all I know) did a poll about whether people thought it was over or underhyped, and many people went into the comments to say exactly that in spite of the poll, so that is not a rare opinion.
      Another (slightly less I think) popular opinion is that AI is overall overhyped in AI spaces but massively underhyped in general public spaces.

    • @sandy666ification
      @sandy666ification 2 หลายเดือนก่อน

      What does that mean?​@@aiexplained-official

    • @Rick-rl9qq
      @Rick-rl9qq 2 หลายเดือนก่อน +2

      ​@@alansmithee419You may be referring to David Shapiro. He's the one who usually does the polls

  • @KitcloudkickerJr
    @KitcloudkickerJr 2 หลายเดือนก่อน +31

    Perfect watch for my spot of tea. The growth of ai this year has been breathtaking. 3.5 sonnet is such a pleasure to work with

  • @gball8466
    @gball8466 2 หลายเดือนก่อน +21

    We are at the dawn of a new era. People are arguing over if it's going to be 2, 5, 10, or 20 years from now. Not some second coming that will never happen, but measurable progress that we get to access in real time.

  • @johnnoren7244
    @johnnoren7244 2 หลายเดือนก่อน +108

    People are saying AI is overhyped, but the actual research being released says otherwise. We've had groundbreaking paper after groundbreaking paper being released this year. If anything, the pace has increased. It's just that you don't see it in the large models, yet. It takes some time to be implemented and tested since it's risky for the big players to make big changes. There is also a lot of money invested in the old technologies. Expect a lot of new players entering that are not tied to legacy architectures and hardware. Things are about to get wild.

    • @kingki1953
      @kingki1953 2 หลายเดือนก่อน +3

      Agreed, lots of research papers are being released on arXiv, but I can't even follow a single one because of the language barrier and my lack of basic knowledge about the current newest LLMs.

    • @Apjooz
      @Apjooz 2 หลายเดือนก่อน

      Machine learning has been eating the world for over a decade now. Doesn't seem to be going away.

    • @fromscratch8774
      @fromscratch8774 2 หลายเดือนก่อน +7

      Disagree. The same hype has made it so that no one with any "groundbreaking" paper gets to sit on it for a second. Everyone is scrambling to stay ahead.

    • @squamish4244
      @squamish4244 2 หลายเดือนก่อน +3

      @@fromscratch8774 What he means is that so much of what we already possess is transformative, but it hasn't been integrated into most societal structures yet. We can't know what the results of integrating AI into drug discovery will be, for instance, because it's only been a few years, a literally impossible time frame to get results in anything resembling a human.
      Not that these industry leaders aren't deliberately hyping stuff with no basis to make such claims, of course - sure they are.

    • @SnapDragon128
      @SnapDragon128 2 หลายเดือนก่อน +7

      You're right... after all, even in some weird world where Claude 3.5 Sonnet is the pinnacle of AI forevermore, it will still change the world in incredible ways over the next decade. Civilization has discovered a brand new resource, and it'll take time to figure out how to use it.

  • @josh0n
    @josh0n 2 หลายเดือนก่อน +10

    What are some of the most promising new approaches?
    - Hinton: time to think
    - Marcus: base knowledge/semantic structures?? IDK exactly
    - Karpathy: Rethinking tokens and streaming input? With the models deciding how to chunk and what to pay attention to??
    - Others: continuous reinforcement through inference - i.e. combining training and inference
    - are these kinda right?
    - what else?

    • @aiexplained-official
      @aiexplained-official  2 หลายเดือนก่อน +5

      See my last video!

    • @NoidoDev
      @NoidoDev 2 หลายเดือนก่อน +1

      We want all of them.

  • @KillTheWizard
    @KillTheWizard 2 หลายเดือนก่อน +43

    “Simply trusting words from the leaders of AI labs is less advisable than ever.” Agreed, and I’m feeling that way about most people with power across many disciplines.

    • @Raulikien
      @Raulikien 2 หลายเดือนก่อน

      This tool that we are creating to be better than most humans at everything, will not replace you. New jobs (that this thing cannot totally do by definition) will appear. Buy my products. Thanks.

  • @iceshoqer
    @iceshoqer 2 หลายเดือนก่อน +46

    I was wondering when your Claude 3.5 Sonnet video would come out, this took a while!

  • @bobtivnan
    @bobtivnan 2 หลายเดือนก่อน +16

    I'm a high school math teacher who is optimistic about using AI to improve learning. The fact that it doesn't see the equivalence in q² and (-q)² could be viewed as a mistake, or it could be viewed as an opportunity for students to have conversations and find these mistakes and in the process improve their own understanding. Of course, you need a teacher to vet these instances. But I claim that it can be a great conversation starter and motivator because kids love to find faults.

    • @anonymes2884
      @anonymes2884 2 หลายเดือนก่อน +4

      Sure but then you have a very expensive tool (whether we end users pay for it or some venture capitalist) that's effectively doing the same job as "A bad maths book". _Any_ mistake is an opportunity to learn but I don't really think "mistake engine" is much of an achievement.
      (this strikes me as similar to the "brainstorming tool" idea for LLMs proposed by a researcher in the previous video, where they're apparently useful _because_ they make up weird nonsense - hallucinations, we're told, are a _feature_ not a bug. Fair enough, but I'm not sure why we need to spend billions of dollars and waste huge amounts of energy to get the same outcome as an hour spent talking to your stoner friend from high school :)

    • @Hexanitrobenzene
      @Hexanitrobenzene 2 หลายเดือนก่อน

      ​@@anonymes2884
      "Brainstorming" you refer to is actually useful in the areas humans have not mastered. Like coming up with objective functions for reinforcement learning of robotic movement.
      LLM + Verifier is a very powerful tandem. AlphaGo, AlphaFold and aforementioned "robot trainer" are all of this type. The problem is, we don't know how to make verifiers as general as LLMs.

    • @musicbro8225
      @musicbro8225 2 หลายเดือนก่อน

      Using AI as a conversation starter seems questionable to me. We already see students using it to do homework because they're more interested in exploiting AI than actually learning stuff. They're happy to abdicate their inheritance of guiding the future to a machine so why are they going to suddenly be inspired to have informed conversations because AI was introduced to them with faulty functionality?
      The potential I see is it can teach one to one, which is huge! It won't cost as much as teachers, so a lot of teachers can be laid off and/or paid less since they're only overseeing now... Sounds brutal right, but that IS the future, there is no alternative in my mind. Not yet, but not long.
      Question is, will that make for smarter thinking, rational kids? What kind of world are they growing up into? Will critical thought be relevant any more or would they just need to be cooperative and indoctrinated? These are legit questions imo.

  • @Lorem_ipsum_dolor_sit_amet
    @Lorem_ipsum_dolor_sit_amet 2 หลายเดือนก่อน +76

    Pre-trained, transformer-based AIs seem to be a brute-force approach to a generalised AI system. A model with sufficient training data won't need the ability to genuinely reason if it has enough examples to pull from, assuming it never encounters a novel scenario (which in the real world it will).
    I'd imagine if we ever do get a "true" AGI system, it'll probably require significantly fewer resources than even GPT-3.5, because if a system can reason it would require only a fraction of the data a current-gen LLM needs for extrapolating patterns.

    • @thenextension9160
      @thenextension9160 2 หลายเดือนก่อน +4

      Useful to use to make a more sophisticated form.

    • @blisphul8084
      @blisphul8084 2 หลายเดือนก่อน +2

      I think the real breakthroughs are happening with LLMs that focus on less training data and compute. First, Mixtral, then Phi-3, and now Qwen2 are the ones leading in efficiency. Sure llama 3 does things well, but it's clear they took the brute force approach, which hurts it in novel situations as you've said.

    • @fromscratch8774
      @fromscratch8774 2 หลายเดือนก่อน +1

      100%.

  • @amirhussain3028
    @amirhussain3028 2 หลายเดือนก่อน +239

    Scaling AI as a strategy is one which favours monopolistic AI rather than doing the hard work of inventing a better algorithm that could learn and do inference energy-efficiently.

    • @caty863
      @caty863 2 หลายเดือนก่อน +38

      Scaling is the only strategy that is "realistic" by now. A breakthrough in new architecture/algorithm could come any time now, but who knows; maybe never. So, scaling is what we've got now.

    • @mrmooshon5858
      @mrmooshon5858 2 หลายเดือนก่อน +11

      I only partially agree with you. A bigger brain could mean a more intelligent animal. If we look at small examples of neural networks, sometimes you just don't have enough parameters to be able to predict the result at a high enough success rate. LLMs work with many languages, so many languages. And in every language there are so many concepts. If we humans have a brain that's about 20x bigger than the current biggest LLM(as far as I know, maybe I am wrong), then I think it's not quite fair yet to compare the two. That's not to say we can't achieve the same level with the current scale. I am just saying that it is very possible, maybe even likely, that the scale is not nearly big enough for agi.

    • @Kazekoge101
      @Kazekoge101 2 หลายเดือนก่อน +3

      The etched ASIC chips are working on that currently apparently

    • @julkiewicz
      @julkiewicz 2 หลายเดือนก่อน +3

      It's the mainframe vs home PC all over again.

    • @stcredzero
      @stcredzero 2 หลายเดือนก่อน +4

      Someone should work up a calculation on the amount of energy a human being uses for training our natural neural networks up until we're 20 years old as a benchmark for what is possible with regards to efficiency. I know there's also the Landauer limit, but being close to that is a tech level that's getting close to godlike "Clarketech." Being as efficient as a human being is probably a far-off limit from where we are with LLMs and GPUs. But it's a good benchmark for what we could do with hardware and algorithmic improvements. (Paradigm shifts, not just incremental improvement, of course.)

  • @daPawlak
    @daPawlak 2 หลายเดือนก่อน +5

    You are the sole YT channel about AI (at least that I know of) that avoids falling into either the hype or the debunk pitfall as far as the current state of LLMs and the near future goes.
    I lost interest in a bunch of others recently as they are just denying the reality of the issues with scaling. Half a year ago I was giving them the benefit of the doubt, but now if you are paying attention you must see it, so if one doesn't acknowledge it, I can't see an excuse for it. At this point it's either ignorance or denial, and yet I only ever hear you, in the YT sphere, talking about the current situation as it is.
    Thank you again for that!

  • @karlwest437
    @karlwest437 2 หลายเดือนก่อน +31

    I think LLMs are like an artificial cerebellum, great at recognising, memorising and acting on instinct, but not very good at reasoning, so I think the next step would be an artificial cerebral cortex, with the neocortical column structure the human brain has

    • @thirdeye4654
      @thirdeye4654 2 หลายเดือนก่อน +9

      I understand what you want to say, but the cerebellum is mostly responsible for motor functions. I usually think of LLMs as the parts of the neocortex that are correlated with language, like Broca's and Wernicke's areas. And we need more subsystems to make "consciousness" possible. For example sensory input, memory, maybe also motor functions to give agency.

    • @karlwest437
      @karlwest437 2 หลายเดือนก่อน +2

      @@thirdeye4654 I think that things you need to consciously think and reason about are done in the cerebral cortex, but stuff that you've done enough times gets baked into the cerebellum, like motor control. When you first learn to drive, you have to think through everything, but after a while it becomes automatic. You could say the same thing happens with language: once you've learned enough, you don't really think about it, it becomes completely natural and instinctive; it's only when asked some complex question that you have to kind of sit back and think, and that's when the cerebral cortex kicks in.

    • @azertyuiop432
      @azertyuiop432 2 หลายเดือนก่อน +2

      A more apt analogy might be the number of layers in the cortex. We could say that the current LLMs have a very gyrated cerebrum with a vast surface and a great number of connections, but the intrinsic coordination is very lacking; there is no heterogeneity, it is a large homogeneous lump.
      There is no real compartmentalisation, with a central coordinator.

    • @karlwest437
      @karlwest437 2 หลายเดือนก่อน +3

      @@azertyuiop432 yes, you could say LLMs are sort of unconsciously dreaming, and it needs some logical filter applying to it, which would be the cerebral cortex equivalent, the LLMs would dream up all sorts of solutions to questions or problems, and the cerebral cortex would analyse them and reject solutions that don't work or are nonsense, in fact hallucinations might be considered random dreaming with no conscious selection applied, essentially I think they need an artificial cerebral cortex to become conscious

    • @musicbro8225
      @musicbro8225 2 หลายเดือนก่อน +1

      @@karlwest437 Yes, primarily the frontal lobe is associated with decision making and problem solving, also reasoning, emotions and personality amongst other things (googled). It's a complicated relationship the cerebral cortex has with its data though, and not simple statistical prediction of patterns it would seem. More analogue and nuanced surely than typical digital.
      Perhaps this 'data processing unit' could best be coded by AI itself, since its complexity is many levels beyond mere database manipulation. But I think you're onto it.

  • @trucid2
    @trucid2 2 หลายเดือนก่อน +6

    Just a few years ago we were amazed that GPT 3 could add two numbers, a skill it wasn't explicitly trained for. And now only a few years later we're disappointed that the reasoning these models can do isn't yet at adult human level?

    • @hydrohasspoken6227
      @hydrohasspoken6227 2 หลายเดือนก่อน +3

      Yes. Let me tell you why. Because they bought the "AGI soon" thing.

  • @theanonymoushackers1214
    @theanonymoushackers1214 2 หลายเดือนก่อน +4

    My life is totally fcked up. I am on the verge of giving up on everything. No one respects me. The joy from studying science and technology is what is keeping me going. Thank you for your work on AI.

    • @aiexplained-official
      @aiexplained-official  2 หลายเดือนก่อน +2

      I am so sorry to hear that but glad my work is helping

  • @manslaughterinc.9135
    @manslaughterinc.9135 2 หลายเดือนก่อน +75

    Man, Zuck is looking more and more human every day.

    • @alcoholrelated4529
      @alcoholrelated4529 2 หลายเดือนก่อน +7

      That's evidence that superintelligence has taken full control over him.

    • @Dan-dy8zp
      @Dan-dy8zp 2 หลายเดือนก่อน +1

      Once he is indistinguishable from us, the end comes soon after.

  • @MrSchweppes
    @MrSchweppes 2 หลายเดือนก่อน +4

    Until evidence proves otherwise, we can assume that scaling laws are not reaching their limits. It seems that scaling AI models will continue to produce impressive results, or 'wow moments.' However, as Demis Hassabis noted about six months ago, large language models (LLMs) are just 'one of the ingredients' in AI advancement, suggesting that additional breakthroughs will be necessary. Nevertheless, it's clear that increased computational power, combined with more efficient use of that power, will remain key factors in AI progress. Therefore, the strategy of scaling up AI models is likely to remain important. Thanks for the great video! 👍

  • @maximefournes9148
    @maximefournes9148 2 หลายเดือนก่อน +28

    I do not understand why people are getting more and more skeptical of scaling laws based on janky conceptual arguments when all the empirical results (for example Sonnet 3.5) show that they continue to hold. People don't seem to understand logarithms very well, including AI Explained, when they say "these results are not 4 times better". There are so many things wrong with this statement. If the model is 4 times bigger you should not expect "4 times better" results. And these benchmarks are all capped at 100%. A better way to quantify the progress on a benchmark like this would be to look at by how much the error rate has been divided.
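    To make that concrete with made-up numbers: going from 80% to 95% looks like a modest score gain, but it divides the error rate by 4.

    # Illustrative numbers only: raw score ratio vs error-rate reduction.
    old_score, new_score = 0.80, 0.95
    score_ratio = new_score / old_score                   # ~1.19x higher score
    error_reduction = (1 - old_score) / (1 - new_score)   # 4.0x fewer errors
    print(f"{score_ratio:.2f}x score, {error_reduction:.1f}x error reduction")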

    • @anonymes2884
      @anonymes2884 2 หลายเดือนก่อน +7

      To me the video is sceptical of the idea that scaling LLMs will achieve _AGI_ (as many AI leaders have been suggesting) and the improvements in benchmarks say nothing about that (I guess unless you subscribe to the - to me quite naive - position that AGI is just "reaching X% on Y different benchmarks", for some values of X and Y).
      I don't think many are claiming that LLMs have no use as tools but there's a big difference between that and general intelligence - if anything i'd say what the empirical results show is that a tool can score 60, 70 even 80%+ on various benchmarks and _still_ clearly _not_ be intelligent. So it's arguably starting to look more like an article of _faith_ that sufficiently scaled LLMs = AGI (or of course, just straight up self-delusion/cynical hype).

    • @Luigi-qt5dq
      @Luigi-qt5dq 2 หลายเดือนก่อน +7

      Exactly, 95% is 4 times better than 80%. I think at this point that AGI is not such a high bar, if this is human intelligence, ahah. And almost nothing scales linearly in cost; a Ferrari costs 5 times a standard car and it is only twice as fast. And it does still make sense for some use cases.

    • @josjos1847
      @josjos1847 2 หลายเดือนก่อน

      You're the one who said it

    • @chadwick3593
      @chadwick3593 2 หลายเดือนก่อน +1

      Based on the original scaling laws, I think we should expect about 50% error reduction for every 10x model size increase.
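      Reading that as a power law, error ~ N^(-alpha), is an assumption rather than a measured fit, but the implied exponent and the effect of two decades of scaling are just arithmetic:

      import math

      # Halving the error per 10x of model size implies 10**(-alpha) = 0.5.
      alpha = math.log10(2)        # ~0.301
      error_after_100x = 0.5 ** 2  # two 10x steps -> error divided by 4
      print(round(alpha, 3), error_after_100x)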

    • @mikebarnacle1469
      @mikebarnacle1469 2 หลายเดือนก่อน +4

      People are skeptical because S curves look like exponentials, and the benchmarks are not scientific in the slightest.

  • @daveinpublic
    @daveinpublic 2 หลายเดือนก่อน +1

    I like this take.
    Some YouTubers and news sites are saying AI will continue at this exponential curve…
    Others are saying it will plateau hard…
    The truth is we don’t know. And we’ll find out very soon. But I like hearing both sides, and knowing that AI finally has a place in our world, and it’s no longer sci fi.

  • @lpls
    @lpls 2 หลายเดือนก่อน +2

    I love how you put all the references in the description.

  • @Raulikien
    @Raulikien 2 หลายเดือนก่อน +18

    The thing is, it doesn't really matter if the field slows down by 1, 2, 5 years... We are talking about the ultimate technology here. The fact that it will probably exist within our lifetimes is already insane.

    • @SebastianLopez-nh1rr
      @SebastianLopez-nh1rr 2 หลายเดือนก่อน +3

      Speculation, not fact .

    • @hydrohasspoken6227
      @hydrohasspoken6227 2 หลายเดือนก่อน

      It won't

    • @christopherbelanger6612
      @christopherbelanger6612 2 หลายเดือนก่อน

      @@hydrohasspoken6227 That's a bold claim, and why not?

    • @hydrohasspoken6227
      @hydrohasspoken6227 2 หลายเดือนก่อน +1

      @@christopherbelanger6612 because that bold claim and expectation is mostly based on the hype created by content creators and CEOs with their "AGI soon" narrative.

    • @tgo007
      @tgo007 2 หลายเดือนก่อน

      @@christopherbelanger6612 Money. Tech can only develop through research, aka money. In the short term, investors and businesses are happy to spend that money. Eventually, if investors are not getting a return, they stop investing. Everything is good now and AI has helped do things faster and cut costs. But I think it's gonna hit the wall. We're already seeing it. Easy to go from 0 to 80. To go from 80 to 100 is very hard. The USA spent 300 billion. Now from here on out, each additional 300 billion will make it 1% better. Then eventually 0.5% better. Then eventually 0.25% better.

  • @SirQuantization
    @SirQuantization 2 หลายเดือนก่อน +30

    When I see AI being shown to act strangely like at 3:47 I always wonder what the person did to prompt it. There's no way to tell if they prompted it with, "Start rambling like a crazy person" and then pretended to act shocked. It happens a lot (not always ofc)

    • @julkiewicz
      @julkiewicz 2 หลายเดือนก่อน +6

      Not really, it sounded unhinged at times in the OpenAI demos as well.

    • @anonymes2884
      @anonymes2884 2 หลายเดือนก่อน +2

      No offence intended but when I see sceptical responses to LLMs doing weird stuff I always wonder what LLMs that poster has been using. Because I see them spout nonsense pretty much every time I use one for more than a simple query or two (even without actually _trying_ to trip them up) - in fairness _usually_ "articulate nonsense", that might _sound_ plausible to someone without domain specific knowledge, but still nonsense.
      (so given the millions of people using them every day it's not at all surprising to me that every now and then an LLM will just spout _actual_ gibberish, like we might expect from someone who's high or having some form of mental health episode)

    • @ronnetgrazer362
      @ronnetgrazer362 2 หลายเดือนก่อน

      I saw another example from that leak/botched AB-test/glitch and there was talk of replies being completely unrelated to the prompt.

    • @johnnoren7244
      @johnnoren7244 2 หลายเดือนก่อน +3

      Fake videos/screenshots of AI acting strangely are unfortunately very common because they get many views. Whatever gets views gets faked.

  • @davidpark761
    @davidpark761 หลายเดือนก่อน +1

    i cannot eat, sleep, or breathe until you drop another video
    PLEASE!!!!!!!!!!!!!!!!!! I NEED MORE!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

  • @awakstein
    @awakstein 2 หลายเดือนก่อน +3

    This is the most underrated channel on YouTube! I use Kling (or Klin?) and it is superb; the only issue is that it only generates 5 seconds.

  • @mickmickymick6927
    @mickmickymick6927 2 หลายเดือนก่อน +4

    It's wrong to call Claude or GPT4o 'free', they offer a limited version for free, GPT4o being more restricted but even with Claude every hour or two I hit a wall and have to stop using it. We wouldn't call test driving a car getting it for 'free' so it's a shame we swallow these companies' marketing on this one.

  • @lucnotenboom8370
    @lucnotenboom8370 2 หลายเดือนก่อน +32

    I don't want "undergraduate level" models that get the basic stuff wrong! I want primary schooler models that actually understand what they're doing!
    To put it differently, in university I took pride in learning differently from other students. Where many studied to pass the test, with the memorization and blind application of knowledge that comes with that mindset, I would learn to understand a topic so that it would become like an addition to my common sense, and then figure out the answers on the fly on the test.
    We should not be scaling mere processing of information, we should be scaling the comprehension of these models, and I believe that at the core, they're not trained for it. They happen to pick up some comprehension, but it's not their main focus.

    • @Josh-ks7co
      @Josh-ks7co 2 หลายเดือนก่อน +2

      Making GPT create a board and play tic-tac-toe in a chat engine is a gimmick to show random edge-case limitations. I am not saying limitations don't exist, just that that's not a good example.

    • @orterves
      @orterves 2 หลายเดือนก่อน

      I think that's the sort of thing Bill Gates is referring to in the interview - meta-cognition capabilities

    • @lucnotenboom8370
      @lucnotenboom8370 2 หลายเดือนก่อน

      @@Josh-ks7co I mean, it gets basic math and logic wrong all the time, which points at there being no real reliable comprehension on which its utterances are based

    • @41-Haiku
      @41-Haiku 2 หลายเดือนก่อน

      @@lucnotenboom8370 Same as you and me, I guess.

    • @chinesesparrows
      @chinesesparrows 2 หลายเดือนก่อน

      I don't want an Altman i want a Primeagen

  • @AllisterVinris
    @AllisterVinris 2 หลายเดือนก่อน +24

    I think it's a whole. Scale alone won't do everything, but it'll help. Techniques and stuff alone won't fix all the problems, but they can go a long way. Together though... That's where it's at.

    • @Ecthelion3918
      @Ecthelion3918 2 หลายเดือนก่อน +2

      Agreed

    • @GodbornNoven
      @GodbornNoven 2 หลายเดือนก่อน +2

      Absolutely

    • @Apjooz
      @Apjooz 2 หลายเดือนก่อน +1

      If there were easy tricks we would have found them already. So scale scale scale it is.

    • @AllisterVinris
      @AllisterVinris 2 หลายเดือนก่อน +1

      @@Apjooz I mean yeah, at the very least we can always scale up. But while there isn't any *easy* trick that we haven't found, there still might be more complex and potentially revolutionary tricks left to discover, you never know.

    • @user-fr2jc8xb9g
      @user-fr2jc8xb9g 2 หลายเดือนก่อน

      @@AllisterVinris In fact, to my knowledge, the simpler the trick, the harder it is to find; simple =/= easy

  • @omniopen
    @omniopen 2 หลายเดือนก่อน +1

    One thing I’ve noticed with the biggest LLMs is how having them come up with an answer and then code the process in Python drastically improves the numerical and analytical accuracy of their solution. I’m not entirely sure what’s going on there, but I’ve had it easily convert handwritten numbers to digital values, then from those values create lists in Python, and then perform data analysis with surprising consistency and high accuracy. However, when you do not prompt it to answer the question in this manner, the results it generates seem to be inconsistent and unreliable.
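    A rough sketch of that prompting pattern (call_llm and the prompt wording are hypothetical; the point is to run the model's code instead of trusting its in-head arithmetic):

    def answer_via_python(call_llm, task):
        # Ask for a script rather than a direct answer, then execute the script.
        code = call_llm(
            "Write a self-contained Python script that solves the task below and "
            "stores the final value in a variable named result.\n"
            f"Task: {task}"
        )
        scope = {}
        exec(code, scope)  # in practice this should be sandboxed
        return scope.get("result")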

  • @dr-maybe
    @dr-maybe 2 หลายเดือนก่อน +2

    17:25 loved how Dario interrupts you talking about how these companies are pressing ahead even though shit is insanely dangerous

  • @comicipedia
    @comicipedia 2 หลายเดือนก่อน +4

    I think this is the first time I strongly disagree with one of your videos. I think you're falling into the trap of taking these amazing pieces of technology for granted having gotten used to them.
    I was absolutely blown away when I first used GPT 3.5. And in the year and a half since we've had gpt 4, Claude 3, gpt 4 turbo, gpt 4o and now Claude 3.5. Each better than the previous one.
    Models just keep getting smarter. Claude 3.5 is much much smarter than GPT 3.5 and it's only been just over a year and a half, scaling doesn't seem to have hit a wall yet.
    Sonnet 3.5 being 4 times bigger than 3.0 isn't much. GPT-3 used 100 times the compute of GPT-2, and GPT-4 used about 100 times as much as GPT-3.

    • @julkiewicz
      @julkiewicz 2 หลายเดือนก่อน

      It's definitely slowing down. It's marginally better, sometimes worse, in the tests that I performed.

    • @aiexplained-official
      @aiexplained-official  2 หลายเดือนก่อน +2

      Thanks for your perspective. Perhaps a downside of me being so immersed in AI. Still incredible achievements for sure, but incremental upgrades now

    • @comicipedia
      @comicipedia 2 หลายเดือนก่อน +2

      @@aiexplained-official but is it an incremental upgrade? Claude 3.5 is a huge leap ahead of GPT 3.5 in a year and a half. There was over 2 years between GPT 3 and 3.5. Yes there have been lots of models in between, but that's why the jump feels smaller. Things are actually still moving very fast.

    • @digitalspecter
      @digitalspecter 2 หลายเดือนก่อน +1

      @@comicipedia It is incremental. Yes, the models do stuff quite a bit better but the stumbling blocks have remained pretty much the same: hallucinations, math weakness, logic problems, creating something actually novel etc. Yes, they're much more usable now but there hasn't been any fundamental breakthroughs which is especially damning when contrasted with the constant hype.. promises that can't be redeemed without solving problems that nobody knows when or even if they will be solved.. this is getting pretty close to straight up lying.

    • @comicipedia
      @comicipedia 2 หลายเดือนก่อน

      @@digitalspecter it's only been a little over a year since GPT-4, and Sonnet 3.5 is much better than the original GPT-4 at both maths and reasoning. 3.5 is even quite a bit better than GPT-4o on the ARC-AGI challenge, which can't be memorised and which is a test of reasoning. 4o only came out a couple of months ago.
      Things are progressing much the same way as before; the issue is people have unrealistic expectations. We've made quite a lot of progress in the past year.

  • @scoops2016
    @scoops2016 2 หลายเดือนก่อน +3

    Thanks, succinct and informative as always. I had been waiting patiently for my dose of AI Explained.

  • @Oliver_w8
    @Oliver_w8 2 หลายเดือนก่อน +25

    I've always thought it was nonsense to try to say that one of these models is like a "smart highschooler" or an "undergraduate", because the models always have sophisticated databases containing very advanced material; for instance, you could even ask GPT-3 questions about PDEs or measure theory and it would reproduce accurate definitions from textbooks that are presumably in its training data. But the basic reasoning hallucinations, sheer amount of data required, and slow progression indicate very clearly that the 'mental faculty' of an LLM is so dissimilar to that of a human that there is a certain sense in which a newborn child is leagues "smarter" than even the most advanced models.

    • @vaevictis3612
      @vaevictis3612 2 หลายเดือนก่อน +3

      The problem with this approach is that we *do not know* how the human brain "stores" and "reproduces" data. GPT-3 or whatever does not simply enter some textbook database and copy text from there. The data is not simply "stored" inside the model, even if it can almost seemingly reproduce it word for word. When a human expert remembers something, they also do not simply "read" from an imaginary textbook cheatsheet. In this regard, the human brain is simply lagging behind the modern LLMs in terms of the ability to store/reproduce data.
      However I do agree that the full mental cognitive algorithm (whatever it is, in either the human brain or a silicon analogue) in the current iteration of LLMs is far from ideal. Consider that when a human expert tries to remember something, they can see that there are different possible answers to the question. The human then tries to carefully traverse these possible answers, to choose one that is most self-consistent and that contains the most possible options. But LLMs try to shortcut through the shortest possible answer that could be perceived as correct. As Andrej Karpathy once said (I think?), LLMs are not so much knowledge machines as imagination machines. They are more "concerned" about the act of answering than about the answer itself.

    • @GodbornNoven
      @GodbornNoven 2 หลายเดือนก่อน

      There is no formal definition of intelligence. A newborn is not smarter than the SOTA LLMs.

    • @SeventhSolar
      @SeventhSolar 2 หลายเดือนก่อน

      I find hallucinations at least to be very human. Not only are hallucinations a symptom of many mental injuries and diseases, healthy people hallucinate all the time. It's well-known that memories can be completely fabricated by your own brain to fill in gaps.

    • @andersberg756
      @andersberg756 2 หลายเดือนก่อน

      Yeah, we shouldn't try to understand LLMs so much in human terms, but by their strengths and weaknesses on their own. People err a lot with this and get disappointed: "how can it write so well but still lie?"
      Like learning about dogs in order to use them as tools, e.g. in police work.

    • @mgscheue
      @mgscheue 2 หลายเดือนก่อน

      Agreed. It’s not a meaningful measure. François Chollet discusses this in detail on Sean Carroll’s Mindscape podcast.

  • @dsmogor
    @dsmogor 2 หลายเดือนก่อน +2

    The point is that the whole power of generative LLMs is more of a discovery than an invention. Transformers were created for automated translation, and all the generative capabilities are just an observed side effect that accompanied the scale of training that transformers made possible. Nobody, including the chief scientist at OpenAI, predicted what GPT-3 would be capable of compared to GPT-2, so all the current promises are at best wild guesses, at worst schemes to keep the share price up.

  • @Pizzarrow
    @Pizzarrow 2 หลายเดือนก่อน +40

    Judging simply by the rate and tone of your recent uploads, we all need to accept that the pace of AI progress is slower than we might have thought.

    • @M1ntt806
      @M1ntt806 2 หลายเดือนก่อน +6

      I'm so glad that he made this video and is open and honest about his own scepticisms regarding the recent developments/ lack thereof and the hype around them.

    • @2CSST2
      @2CSST2 2 หลายเดือนก่อน +14

      I personally disagree, not necessarily with the possibility that AI progress is slowing but with what seems like a conclusion people are making about it right now, including in this video.
      I especially don't get how a lot of people right now seem like they think they have a good handle on the limitations of scaling, when it's not something so easily predictable.
      As far as I'm concerned, there aren't any concrete grounds for opinions to change on that point any more than there were at GPT-4, since we haven't actually seen a truly new scale since GPT-4.
      That's what will determine it, not guessing about how less or more bullish Suleyman or any other tech leaders are now compared to before, or looking at the incremental improvement of Claude Sonnet 3.5 compared to previous models.
      None of that is solid evidence, let's wait the same time gap and increase in scale that happened between GPT3 and GPT4 before getting all that hasty and drastic in claiming what scaling can and can't do. Anything before then is mostly conjecture.

    • @aisle_of_view
      @aisle_of_view 2 หลายเดือนก่อน +2

      That's good, I want a job for a few more years.

    • @Tomjones12345
      @Tomjones12345 2 หลายเดือนก่อน +1

      @@2CSST2 you mention it but seem to be dismissing it, it being Claude 3.5 vs 3: 4 times the training data, but nowhere close to 4 times the improvement. Of course we are going to see diminishing returns by simply throwing more data at the problem. You can argue we don't know yet, and maybe it's not definitive evidence, but the Claude version comparisons suggest we might no longer see big leaps with just more data.

    • @Apjooz
      @Apjooz 2 หลายเดือนก่อน +1

      I just love people who talk like they sprung into existence 12 months ago. Unless you are not a human...

  • @GabrielVeda
    @GabrielVeda 2 หลายเดือนก่อน +2

    This just goes to show how effective repetition is as a tool of persuasion. Gary Marcus has saturated X with his anti-LLM, anti-scale rhetoric and it is clear he is beginning to gain traction with the masses. Scale + better pre-training data has a *lot* still to give. That doesn’t mean we have to rely on scale alone, but dismissing scale at this early stage is just foolish and wrong.

  • @superfeel1275
    @superfeel1275 2 หลายเดือนก่อน +1

    I think if we were able to scale extremely big (like 1000x the hardware and maybe 300-500x the traning data) and that we used 1-character tokens for tokenization, we could achieve an AGI. My reasoning is that at some point, to reduce loss, the model has to "predict" how our world works for the text to make sense. For example, if you put a book on a table and move the table, intuitively the book moves as well. But I doubt there's a specific passage out there that describes this. So the model will see situations where the book is displaced when a room gets thrashed for example, and intuit that to reduce loss, we now have to dedicate some weights to that concept. Of course this is the step after memorization isn't good enough and would probably require tons of scale.
    THOUGH, naive scaling is too unrealistic and expensive. You'd probably want mechanisms that highlight the "weights" that encode different reasonings, or ways to extract reasoning from training data and not just memorize it (which Anthropic has made huge efforts towards, funnily enough). Also, 1-character tokens would solve issues like not being able to reverse a word or find words that end with specific suffixes, and would help generalize "word patterns" in general.
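    The reversal point is easy to see in code: characters reverse trivially, while a subword tokenizer never exposes the individual letters (the split below is invented for illustration):

    word = "strawberry"
    print(word[::-1])                        # yrrebwarts - trivial at character level

    subword_tokens = ["str", "aw", "berry"]  # an invented subword split
    print(list(reversed(subword_tokens)))    # ['berry', 'aw', 'str'] - not the reversed word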

  • @MatthewKelley-mq4ce
    @MatthewKelley-mq4ce 2 หลายเดือนก่อน +4

    Sonnet is helping me quite a lot.

  • @kylewollman2239
    @kylewollman2239 2 หลายเดือนก่อน +16

    I would guess that OpenAI previewed their advanced voice model when they did to try and take attention away from Google's event the next day, knowing full well that it wasn't going to be released to the public for months. When tech companies start saying that revolutionary things are coming at the end of this year or just a couple of years away, it means they have nothing and are just hyping. That's the most valuable lesson I learned from Elon Musk.

  • @stevefox7469
    @stevefox7469 2 หลายเดือนก่อน +8

    Best AI channel by far. One of the few channels that casts a critical scientific eye over the hype.

  • @Jack0trades
    @Jack0trades 2 หลายเดือนก่อน +1

    It looks to me like the recent rapid growth in AI performance, based on massive amounts of training data, is limited by that data. Instead of an exponential rise over the course of this epoch, we get more of a sigmoid - rising rapidly, then settling into an asymptotic approach dictated by the limits of that data. We will likely find better ways to extrapolate beyond those limits, but that will require some fundamentally new techniques.

  • @novantha1
    @novantha1 2 หลายเดือนก่อน +1

    I can't shake this sneaking suspicion that we're overdue some form of paradigm shift which will be deceptively simple once unlocked, very analogous to how something like a Transformer architecture feels self evident nowadays.
    I think there's basically three areas it could happen:
    Autoregression. Current models autoregressively predict tokens, but that's not really how people work. We can non vocally and implicitly reason about things before answering, and produce a "world simulation", getting feedback from that simulation before answering. There is kind of a "layer" between the tokens we predict and the answer we give. Some people have tried to bridge that with things like scratchpads, which helped, but I wonder if there's not a more fundamental shift there. Perhaps some sort of latent or implicit linear regression which processes data before doing the autoregression. Or, maybe it's something simpler. Maybe instead of predicting the "next" token, we just need an output token embedding that lets the model choose where to put the token, or when to overwrite an existing token.
    Training dynamics. We still use gradient descent to this day, but it heavily limits the architectures we can train, and the ways we can train them. Something like a recurrent Transformer, or an architecture which is to a Boltzmann network what Transformers were to a feedforward network, or a spiking neural network, or something to that effect might be part of the answer. It might be that there's a training dynamic which allows a model to backpropagate its insights from inference rather than the inferred tokens (for instance, the ability to produce a long chain of reasoning and then to backpropagate the insight gained from that chain, rather than the chain itself), or perhaps we need something simpler; it could just be that we need a raw model and an adapter model. The raw model processes information as a raw completion engine, similar to a base model (non-instruct), with an adapter which converts that reasoning to instruction following, and the model does continual learning by adjusting weights of the completions component, while only high quality instruction following data goes to the instruct component. I'm not sure, but I think something that allows a model to reason about an undefined or open ended problem and backpropagate that information could be the key we needed. Imagine being able to tell a model "solve this math problem" without necessarily having the answer ahead of time. It would heavily change the way we could train models.
    Data patterns. I'm still not totally sold on this because I don't completely understand it, but I think the authors of "Human-like systematic generalization through a meta-learning neural network" were onto something, but I don't know exactly what they're on to. Regardless, I think the slew of papers on grokking and the one I listed here note something very interesting when taken as a package; LLMs can reason and generalize, but just feeding them more data from the internet doesn't produce full generalization of all concepts contained within it, and we might require different types of (presumably synthetic) data, which are easiest to predict with generalization over memorization. I don't claim to fully understand the mechanisms involved or the shape the data would take, but I think this is not an unreasonable supposition.

  • @rowanmoore284
    @rowanmoore284 หลายเดือนก่อน +1

    Looking forward to the next video, a lot's happened over the last few weeks.

  • @DanielSeacrest
    @DanielSeacrest 2 หลายเดือนก่อน +1

    4x the compute doesn't equal 4x the improvement. It doesn't get 4x the score on the MMLU lol, but we can still see the correlations between compute scaling and performance on set benchmarks, and I don't think this correlation has been decreasing. We know at least some of the ramifications of scaling; we can reliably predict the MMLU score of models given specific compute scaling, as an example.
    But another important thing is effective compute. This number takes into consideration improvement (i.e. better data quality), algorithmic efficiencies, and raw compute scales (explaining it for anyone who doesn't know). Now we don't necessarily have this information on hand from Anthropic, but raw compute scaling is obviously not the only way to scale.
    And I don't think Claude 3.5 Sonnet got larger, since the cost didn't change and it actually got faster I believe, so it was likely just trained on a lot more data.
    But if anything the compute scaling we are talking about here is still just GPT-4 level. I do not believe it went far ahead of the compute that went into GPT-4, and we can see this in benchmarks and performance (similar reasoning flaws to GPT-4). It is 4x the compute over Claude 3 Sonnet getting to about GPT-4 scales. And it is kind of disappointing we haven't seen any kind of significant scaling over GPT-4 class at all yet, but with Claude 3.5 Opus (probably 4x compute over 3 Opus), Gemini 1.5 Ultra and GPT-4.5 (probably 10x compute over GPT-4) on the horizon I feel like this is going to (hopefully) change soon. Although it isn't that surprising because from what I recall Anthropic said they wouldn't push the frontier, and I believe they haven't. A Claude 3.5 Opus release would have definitely pushed the frontier, but Claude 3.5 Sonnet is more or less GPT-4 level compute / class with likely better post training techniques.

  • @wisdomking8305
    @wisdomking8305 2 หลายเดือนก่อน +2

    Less than 4 hours ago, Philip released an AI course, and yes, I have watched all the modules in full, completed all 9 quizzes, and have already tested the model in a dozen ways.

  • @spacexfanclub6529
    @spacexfanclub6529 2 หลายเดือนก่อน +2

    I have been watching and learning, sometimes in horror and sometimes in amazement, from all of the videos you upload on this channel, and just want to say this: you are a wonderful human being doing extremely helpful work for humankind by cutting through all the clutter and hype and delivering a truly authentic one-stop place for non-AI people to keep genuine track of reality as AI continues to advance. Much, much love from India!!

  • @michaelwoodby5261
    @michaelwoodby5261 2 หลายเดือนก่อน +1

    I suspect it's a bit of scaling but the huge increases in efficiency are going to make it reliable.
    If you've got a model that's right 92% of the time, but runs at a quarter the cost, you can just run it a couple times and see if each answer is in agreement. If it decides there's an issue, it can then do that a couple more times until one answer gets a majority of the votes.
    A simple request may only have to run twice, a complex one many times, but it could scale automatically. It could also be scaled manually from the customer (how much would you like to spend on quality checks?) without new tech, beyond an editor model or a think step by step logic model where needed.
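    That's basically self-consistency voting; a minimal sketch (call_model is a hypothetical stand-in and the sample counts are arbitrary):

    from collections import Counter

    def vote_answer(call_model, prompt, start=2, max_runs=8):
        # Sample a few answers; only spend more compute when they disagree.
        answers = [call_model(prompt) for _ in range(start)]
        while len(answers) < max_runs:
            best, count = Counter(answers).most_common(1)[0]
            if count > len(answers) / 2:  # clear majority -> stop early
                return best
            answers.append(call_model(prompt))
        return Counter(answers).most_common(1)[0][0]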

  • @williamjmccartan8879
    @williamjmccartan8879 2 หลายเดือนก่อน +1

    Always great to catch up with you Phillip, thank you for sharing your time and work. I thought Geoffrey Hinton was talking about using digital processes to increase the ability of AI, because it looks a little like we're running out of places for the current iterations to expand into, both physically and in terms of power consumption. Have a great night and be safe brother, peace.

  • @oimrqs1691
    @oimrqs1691 2 หลายเดือนก่อน +14

    Is the whole video based on an unproven assumption that Claude 3.5 Sonnet was trained on 4x more data than 3.0 Sonnet? Weirdly skeptical video, wasn’t expecting that.

    • @aiexplained-official
      @aiexplained-official  2 หลายเดือนก่อน +14

      It's more that I am getting tired of the hype from the leaders. It distracts from the great models.

    • @JeffBuckleyFanboy
      @JeffBuckleyFanboy 2 หลายเดือนก่อน +2

      @@aiexplained-officialSo is it proven that Sonnet 3.5 was trained on 4X the data?

    • @aiexplained-official
      @aiexplained-official  2 หลายเดือนก่อน +3

      Nothing can be proven without insider knowledge, but they did a safety post saying they are currently testing a frontier model (likely 3.5 Opus) which has 4x the compute of its predecessor, so there is limited evidence that Sonnet 3.5 may have a similar multiple.

    • @JeffBuckleyFanboy
      @JeffBuckleyFanboy 2 หลายเดือนก่อน +4

      @@aiexplained-officialIt could be that Sonnet is simply a checkpoint of the larger model they will be releasing later this year.

    • @41-Haiku
      @41-Haiku 2 หลายเดือนก่อน +3

      @@JeffBuckleyFanboy This seems likely to me. That could mean 3.5 Sonnet was trained on twice as much compute as 3.0 Sonnet, which perfectly comports with the increase in performance. The model performs almost exactly twice as well on each benchmark (in terms of halving failure rates).
      If I'm correct about where Claude 3.5 Sonnet lies on the scaling hypothesis curves, it looks to me like it perfectly matches up with expectations. The scaling laws have held from the word go and we should expect them to continue to hold.

  • @jamesyoungerdds7901
    @jamesyoungerdds7901 2 หลายเดือนก่อน +2

    Another gem, thanks Philip! Couldn’t help but think: couldn’t an agentic flow or strategy boost things like your math problem results? Layering in checkers, supervisors, verifiers, etc. before the output might have really levelled up your example?

  • @felipefairbanks
    @felipefairbanks 2 หลายเดือนก่อน +1

    It is always a pleasure to see your videos: always to the point, always backing up everything you say as best as possible... I always end up feeling smarter in the end.

  • @rosscads
    @rosscads 2 หลายเดือนก่อน +1

    Philip, love how you stick to the facts and aren't afraid to challenge the prevailing wisdom. Your balanced takes are a breath of fresh air among AI YouTubers. Keep telling it like it is!

    • @aiexplained-official
      @aiexplained-official  2 หลายเดือนก่อน

      Thanks Ross, it's tough as you get flak from whichever side you happen to be pissing off that day, but it's worth it

  • @mattshelley6541
    @mattshelley6541 2 หลายเดือนก่อน +3

    Entirely agree, people need to stop deifying these tech leaders. As he points out at the end, he has no real world experience in biology, yet his claims are being retweeted.

  • @applejuice5635
    @applejuice5635 2 หลายเดือนก่อน +6

    Been 12 days since your last video, but it feels like an eternity. Missed your anxiety inducing videos big dawg.

  • @Michael-ul7kv
    @Michael-ul7kv 2 หลายเดือนก่อน +2

    the training is more expensive but they're actually cheaper to run

  • @danagosh
    @danagosh 2 หลายเดือนก่อน +2

    It may be hype because of the extreme claims everyone is making, but I also don't think they are outlandish. This technology is improving exponentially, and exponentials are hard to fully appreciate sometimes. As Dario Amodei said at the end, if chips keep improving over the next few years and companies keep throwing money at this, the AI systems could reach another patch of emergent capabilities. And with the number of people now working on the problem, if you throw in another Transformer-like breakthrough, they could become very capable almost overnight.
    I would rather have society start talking about and preparing for a world in which we could reach powerful AI (maybe not AGI but near it) in the next few years than have us all be blindsided if it happens. Imo, people need to start taking it very seriously rather than saying it is just a bubble that will lead to nothing. In fact, I think all people working on alignment would love it if this tech plateaued for a while so more research could be done and society could adjust. An extra 5 years would be great, please.

  • @jeancharles6378
    @jeancharles6378 2 หลายเดือนก่อน

    Scale is most probably not the way to go. But I like the Q* approach, in which LLMs learn "to talk" through the correct reasoning patterns that lead to solving a problem correctly. This mix of sentence completion and feedback on the solution of the problem seems capable of embedding into these systems the "grammar" they need to "speak" correctly.

  • @7TheWhiteWolf
    @7TheWhiteWolf 2 หลายเดือนก่อน +6

    Regardless of what happens, I’m excited for whatever is coming next!

  • @absiddi.7712
    @absiddi.7712 2 หลายเดือนก่อน +8

    The LLM world is long overdue for a shift away from transformers towards an architecture similar to our own. Anything else will simply create tools, not agents.

  • @aiforculture
    @aiforculture 2 หลายเดือนก่อน +1

    Great video as always. Although I have made the switch to using Claude 3.5 as my main LLM now, its hallucinations do feel far more frequent than with GPT4o - especially when taking stats out of PDFs or working with data across multiple documents.

  • @KayButtonJay
    @KayButtonJay 2 หลายเดือนก่อน +14

    The limitations are always going to be the training set. Anything not in training will not be queryable post-training. Additionally, the transformer / attention architecture will not be capable of AGI-like reasoning. It requires better architectures

  • @Slayer666th
    @Slayer666th 2 หลายเดือนก่อน +1

    I wonder if anyone has investigated how much these AIs improve if you combine them.
    Like give Claude, GPT, and Llama the same task, then let them check each other's answers for errors and combine those results to get the most correct one.
    That step alone would probably decrease the errors a ton.
    Don't know what the research says about it, though (a rough sketch of the idea follows below).

    • @aiexplained-official
      @aiexplained-official  2 หลายเดือนก่อน +1

      This is SmartGPT 2.0, which I put on Patreon! Works nicely for many tasks
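
      A rough sketch of that kind of cross-model checking (not the actual SmartGPT 2.0 implementation). The ask callables in the models dict are hypothetical stand-ins for Claude, GPT and Llama API calls:

      # Cross-model error checking: each model drafts an answer, each model
      # critiques the others' drafts, and one model merges everything.
      from typing import Callable, Dict

      def cross_check(task: str, models: Dict[str, Callable[[str], str]]) -> str:
          # 1. Independent drafts from every model (e.g. Claude, GPT, Llama).
          drafts = {name: ask(task) for name, ask in models.items()}
          # 2. Each model looks for errors in the other models' drafts.
          critiques = []
          for name, ask in models.items():
              others = "\n".join(f"{n}: {d}" for n, d in drafts.items() if n != name)
              critiques.append(ask(
                  f"Task: {task}\n\nOther answers:\n{others}\n\n"
                  "List any errors you can find."))
          # 3. One of the models combines drafts and critiques into a final answer.
          merge = next(iter(models.values()))
          return merge(
              f"Task: {task}\n\nDrafts:\n{drafts}\n\nCritiques:\n{critiques}\n\n"
              "Write the single most correct answer.")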

  • @ReflectionOcean
    @ReflectionOcean 2 หลายเดือนก่อน +2

    By "YouSum Live"
    00:00:00 Advancements in AI video generation
    00:01:20 AI models trained on minimal video data
    00:02:30 Scale's impact on AI model accuracy
    00:03:32 Challenges in AI reasoning and understanding
    00:09:04 Future potential of AI models
    00:13:10 Caution against blind trust in AI advancements
    00:16:31 Uncertainty in AI's future capabilities
    By "YouSum Live"

  • @robkline6809
    @robkline6809 2 หลายเดือนก่อน +1

    Thanks, as usual, Philip for the wonderful video. Bill Gates makes a good point about the need for improved metacognition in AI systems, but he says nothing about improving *human* metacognition. With prompting and metaprompting we have some cognitive scaffolding, but mostly the talk is about the failures of genAI - not our failure to challenge our own "thinking about thinking" and the skillful expression of those thoughts.
    I wrestle daily with expressing and manipulating complex issues, and with Claude Sonnet 3.5 my experience is a significant qualitative improvement over Chat GPT 4o; more nuanced in expression, more capable of going off on conversational sideroads without losing the thread. And with their workbench (with prompt generation and evaluation of test cases) Anthropic has provided a dynamic duo. This combination blows my mind every day. Just focusing on scaling and the inadequacies of genAI overlooks our own inadequacies in interacting with these emerging intelligences. Metacognition needs improvement - both for AI *and* humans.

  • @techgiantt
    @techgiantt 2 หลายเดือนก่อน +1

    When Llama 3 came out I thought some of these people would understand that scaling is not the solution 🤦🏼‍♂ Most of them also mistake knowledge for reasoning and the application of knowledge. There's obviously a reason why, even though computers are more reliable and consistent than humans, these LLMs find it difficult to be consistent when exposed to large amounts of data (e.g. the Claude example you showed).

  • @bambabam1234
    @bambabam1234 2 หลายเดือนก่อน +1

    What do you think about the recent MatMul-free paper on running LLMs without matrix multiplications?

  • @funginimp
    @funginimp 2 หลายเดือนก่อน

    In my opinion, so long as synthetic data works well, it's a sign there are algorithmic inefficiencies still to address. I also believe we have barely scaled up inference-time compute, and I think the most obvious way to solve hard problems like cancer research is inference-time brute force. I imagine a time in the future where we classify a problem's complexity class by FLOPs.

  • @jvlbme
    @jvlbme 2 หลายเดือนก่อน +1

    I would think scaling up still has a lot of merit, but one cannot simply increase parameters without also changing the algorithm landscape. I think it will follow 'naturally' that for models to _make_ _sense_ of increased data amounts they will need to create better, different concept models _anyway_, and thus reasoning will have to improve regardless. Now whether the models can improve _themselves_ simply as a function of larger scale, or whether some fundamental change will have to be implemented in programming, in architecture or in other approaches, remains to be seen.

  • @tautalogical
    @tautalogical 2 หลายเดือนก่อน +1

    I don't think it's overhyped. If you have kids you will know that reason is something that emerges over time and quite slowly. The world is screaming at us, through so many different signals, that intelligence is a natural product of certain kinds of systems at scale. There might be a few missing tricks, but the bitter lesson is correct: we can blast through without those tricks.

  • @shawnvandever3917
    @shawnvandever3917 2 หลายเดือนก่อน +1

    There is no doubt that more than scale is needed. However, scale is important; it seems to organize world models much better. The difference between someone with a low IQ and a high IQ is not architecture, it is better efficiency in the structure and organization of mental maps. So I see how scale can do a ton to make things better. I still believe we need continuous learning, the ability to make many predictions and the ability to update mental models on the fly.

  • @Noraf83
    @Noraf83 2 หลายเดือนก่อน +1

    It's time for AI companies to publish ARC scores with new frontier model releases.

  • @peersvensson9253
    @peersvensson9253 2 หลายเดือนก่อน +4

    Speaking as someone working in research, these tech bros don't have a real understanding of how scientific progress happens and seem to believe in the "lone genius" trope popularised by movies and TV shows. CRISPR, as an example, was not discovered through some exercise of brute force intellect, it was discovered by people doing actual experimentation. Similarly, one of the main problems in modern physics is the fact that we don't have enough experimental guidance for the development of new theories, and less so that people aren't smart enough to come up with new theories.

    • @Frostbiker
      @Frostbiker 2 หลายเดือนก่อน

      The vast majority of the "tech bros" researching AI have, unsurprisingly, a background in academic research. They know how research is performed because it's all they have ever done. My criticism from working with those guys would be that they don't have adequate business experience.

  • @chrisanderson7820
    @chrisanderson7820 2 หลายเดือนก่อน +11

    I find AI as a whole concept to be highly variable. I've seen it do stuff that is literally at PhD level and of massive use to people and businesses, then act like a mushroom. Sadly it's hard to tell when it's going to be one or the other. Maybe in the short term we need to identify exacting and narrow use cases where we KNOW it will act like a PhD and not try to just use it for everything.

    • @anonymes2884
      @anonymes2884 2 หลายเดือนก่อน +5

      Exactly. If it could _reliably_ deliver even "smart undergrad" level performance that would be a major advance IMO. Right now it vacillates between post-doc and nursery school.

    • @mikebarnacle1469
      @mikebarnacle1469 2 หลายเดือนก่อน +1

      These analogies don't really make sense. You could make the exact same observation about a calculator from the 80s. Those also alternated between PhD and nursery school level intelligence, because that's what a specialized tool looks like. But the calculator isn't anywhere close to being more than a calculator. The only difference now is that we don't know how learned networks work, which should be even less confidence-inspiring, but because humans love to believe in magic they see it the other way and overestimate the trajectory. Scaling improves capabilities because there is more memorization. Humans don't work that way; we don't just memorize more and get smarter. Memory and intelligence are separate things for us.

    • @chrisanderson7820
      @chrisanderson7820 2 หลายเดือนก่อน +1

      @@mikebarnacle1469 Your syntax is a bit scrambled so I am not entirely sure of the direction you are going. When I say PhD I mean asking the LLM to diagnose complex, rare medical problems by asking you, the user, questions to formulate its diagnosis, or passing the bar exam; a calculator from the 80s cannot do medical diagnoses or pass human exams. Yet at the same time it tells you to use glue on your pizza. It shows LLMs have massive knowledge-model pattern recognition skills and zero common sense; they aren't sufficiently self-examining or completely referential to give reliable answers.

    • @mikebarnacle1469
      @mikebarnacle1469 2 หลายเดือนก่อน

      @@chrisanderson7820 The point is that calculators from the 80s can do basic arithmetic faster, and more accurately, than any PhD mathematician, yet they perform as well as or below nursery school kids when tasked with writing a formal proof. All specialized tools have this property, and it means nothing; it shows that the original analogy between human education levels and LLM capabilities is a meaningless observation, and not surprising: it's expected for a specialized tool. It's only if you drink the Kool-Aid that you are surprised when the Clever Hans and ELIZA perception biases fail to meet practical real-world expectations.

  • @jonnyspratt3098
    @jonnyspratt3098 2 หลายเดือนก่อน +9

    "Agentic models won't be possible until they are 2 orders of magnitude greater" :D
    "... so another 2 years" D:

  • @not_a_human_being
    @not_a_human_being 2 หลายเดือนก่อน +1

    I think we're overestimating humans... Those "Nobel laureates" aren't some separate breed of human beings; we praise them, we put them high in our pecking order, that's that. Ghost in the Shell answered that question long ago. The question is not when it'll be as smart as us, but when we are going to admit that we aren't as smart as we thought.

  • @reza2kn
    @reza2kn 2 หลายเดือนก่อน +2

    I love this man exactly as much as I don't trust Sam Altman.

  • @lucifermorningstar4595
    @lucifermorningstar4595 2 หลายเดือนก่อน +1

    Scaling works, but we need novel architectures that can combine the creativity and generalization inherent in Transformers with reasoning, memory, and their own data interpretation and connection

  • @elitegamer3693
    @elitegamer3693 2 หลายเดือนก่อน +2

    I think current models are quite intelligent, but not undergrad level, as they become incoherent within a short time and hallucinate a lot. We need big architecture and algorithm breakthroughs to solve the current roadblocks, more than raw scaling.

    • @aisle_of_view
      @aisle_of_view 2 หลายเดือนก่อน +1

      I know a lot of undergrads who hallucinated once or twice. A lot.

  • @danielhenderson7050
    @danielhenderson7050 2 หลายเดือนก่อน +1

    Where is the gpt 4o voice clip from?

  • @JaredFarrer
    @JaredFarrer 2 หลายเดือนก่อน

    I will admit when I saw him surfing in a suit drinking a beer holding that flag… my respect level for zuck went way up

  • @joshuacook9376
    @joshuacook9376 2 หลายเดือนก่อน +4

    I think the transformer architecture is holding back progress in spatial reasoning. While it's technically possible to represent visual data as a sequence of tokens, only the time dimension is actually sequential.

  • @Ecthelion3918
    @Ecthelion3918 2 หลายเดือนก่อน +3

    I don't think the hype has gone too far personally. Technology will continue to improve, and from what I'm seeing it's only ramping up

    • @JosefTorkelsen
      @JosefTorkelsen 2 หลายเดือนก่อน

      Companies like Apple are dumbing AI down and will mislead people about the importance of AI. AI can now start to solve medical problems as well as or better than some humans and can make breakthroughs in research. People thinking it is just a Q&A or emoji generator will put the wrong perception on the value of these tools. I use it as an employee replacement as we downsize, and so far we have lost 4 people and haven't seen a productivity loss, thanks to AI.

    • @hydrohasspoken6227
      @hydrohasspoken6227 2 หลายเดือนก่อน

      Everybody was talking about AGI "soon".

    • @imperson7005
      @imperson7005 2 หลายเดือนก่อน

      @@hydrohasspoken6227 I think the problem is the masses want definitions to come from world leaders. Basically, no one has any real definition of AGI, AI, or how soon it might happen. For me, AGI "soon" means replacing human labor within 5-10 years.
      It's different for everyone because it's still so early

  • @HorizonIn-Finite
    @HorizonIn-Finite 2 หลายเดือนก่อน +1

    7:35
    Actually, the models pass if you tell them it’s not a trick question or riddle. And yes, I tested the shortened farmer riddle on a coworker and they started thinking of the whole riddle.
    The moment I said it’s not a trick, they said, “once, right?”

  • @demeurecorentin
    @demeurecorentin 2 หลายเดือนก่อน +1

    Thank you for the video, I enjoy them

  • @XOPOIIIO
    @XOPOIIIO 2 หลายเดือนก่อน +1

    You can scale toward a dead end as long as you wish, but real progress is impossible without new algorithmic breakthroughs. We haven't seen anything like transformers since 2017.

  • @choltha
    @choltha 2 หลายเดือนก่อน +2

    AI usefulness is overhyped on a short timeframe (6 months), underhyped on a long perspective (2+ years).
    Maybe if the scaling hype cools down a bit we can integrate the vast amount of research results from other areas, which would unlock new dimensions of capability that are not related to just scale.
    If we don't catch up with this AI integration side, we might get into the weird situation where we have a car with a 1000 kW motor (GPT-4) but we drive like we are on an ice-like surface (chance of hallucinations, not able to quickly adapt to new situations, etc.) and can only go really slowly as a result, not putting the power to good use. Now if we put spikes on the tires (good integration, as mentioned before), there might be a sudden jerk (a jump in end-result capabilities) that catches most people off guard.

  • @betabob2000
    @betabob2000 2 หลายเดือนก่อน +2

    Thanks!

  • @shotx333
    @shotx333 2 หลายเดือนก่อน +3

    Man, so slow. Is this because of posting on Patreon sooner?
    Anyway, thanks.

  • @josh0n
    @josh0n 2 หลายเดือนก่อน +1

    Thank you. Until new architectures/algorithms are created, many of the real advances will be in design around validation of the output, making it easy for people to check the workings, sources and conclusions of LLMs.

  • @chongshaohong2969
    @chongshaohong2969 2 หลายเดือนก่อน +1

    Do you have any comments or plans to do a video on the recent news of Perplexity plagiarizing articles and ignoring robots.txt? Also on Mustafa Suleyman's recent comments on CNBC?

  • @Shunarjuna
    @Shunarjuna 2 หลายเดือนก่อน +1

    It’s difficult to say whether it’s hype or expectations that are getting out of hand. Maybe both, but definitely one.

  • @MichaelRicksAherne
    @MichaelRicksAherne 2 หลายเดือนก่อน +1

    I have some doubts about the reasoning advancing as fast as they predict.

  • @cupotko
    @cupotko 2 หลายเดือนก่อน

    Thanks for the usually-awesome video! It raises a key question for me: have researchers already explored all the low-hanging fruit on the path to ASI/AGI? It seems we’re still in the early stages of understanding the impact of scaling, so it’s premature to despair. Many cognitive tests still evaluate models on their first attempt, similar to “system 1 thinking.” Even smart people would struggle to make the right move in chess or tic-tac-toe in 5 seconds if the board was described in Morse code or Braille.
    Focusing on LLMs’ weak performance in games like tic-tac-toe: could the issue be the conditions under which they generate responses? Models generate tokens at a speed comparable to human thinking, lacking the time to thoroughly “redraw” the game board for optimal evaluation. This limitation affects their ability to perform detailed reasoning and problem-solving, similar to humans.
    The same applies to implementing agent functions and using LLMs in scenarios requiring millisecond response times, like robotics and autonomous driving. If LLM-based systems had the luxury of a 1000-shot approach, filtering out dead ends and suggesting corrections with a separate expert discriminator network, we might see AGI/ASI-level performance in tasks beyond chess, Go, poker, and protein folding (see the sketch after this comment).
    Why isn’t this obvious research direction more explored? Have researchers already deemed it a dead end? Also, I’m surprised that ASIC accelerators, which can significantly speed up LLM inference, are only now being discussed by startups, not by companies like NVIDIA. I’d love to hear the channel author’s thoughts on this!
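
    A minimal best-of-N sketch of the 1000-shot-plus-discriminator idea above. generate and score are hypothetical stand-ins for a proposer model and a separate verifier/discriminator model; n = 1000 is purely illustrative:

    # Best-of-N with a separate verifier: sample many candidate solutions,
    # then keep the one the discriminator scores highest.
    from typing import Callable, List

    def best_of_n(problem: str,
                  generate: Callable[[str], str],       # hypothetical proposer model
                  score: Callable[[str, str], float],   # hypothetical verifier model
                  n: int = 1000) -> str:
        candidates: List[str] = [generate(problem) for _ in range(n)]
        return max(candidates, key=lambda c: score(problem, c))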

    • @aiexplained-official
      @aiexplained-official  2 หลายเดือนก่อน +3

      I agree on most points! Etched for example, coming into the game. See my last video!

  • @daves1412
    @daves1412 2 หลายเดือนก่อน

    Quite a bit of hype rn. We are heading towards the chasm phase I’m guessing.
    Like any tool it has its purpose, but I cannot see it replacing humanity for a very, very long while, personally.
    These things are more like precocious toddlers useful for task acceleration.
    Metacognition is interesting. My guess, and it’s only that, is that it can be solved but that this will be really hard.
    In the meantime a multi agent architecture will be used to get as good as possible prior to human validation.

  • @francescoromano8370
    @francescoromano8370 2 หลายเดือนก่อน

    People who doubt the scaling benefits need to read OpenAI's deep double descent paper that was released before GPT-4. It is basically the reason why they think scale is all you need, and in the years since it was published, a lot of evidence has accumulated supporting it.

  • @erwile
    @erwile 2 หลายเดือนก่อน

    I think it's all about generalization. But, because we don't have access to the training data, it's hard to know how far from the training set we are when we talk to the AI.
    With basically the internet in the training data, it looks more and more like a highly fancy search engine with lots of language skills: it knows how to combine concepts, rephrase, and use your prompt to guide its response, but it can't really reason. It can partially reason, but mainly when the problem resembles its training set, which is really huge.
    Maybe they will try to create a dataset that forces the AI to generalize:
    - Like training it to deduce from something that seems different (in its vector space) but really is the same (A=B so B=A).
    - And the other way around: concepts that seem really close to each other for it (the famous riddle that is easy to solve vs the one that is hard) but are not the same.
    - And also, training it to think by, for example, making it make its own mistakes and automating the logical correction with code generation/math in the training process (you can't reason if you have only positive examples, and you can't learn well if you don't make your own mistakes).

  • @ulob
    @ulob 2 หลายเดือนก่อน

    Solving ARC using Luma there at the end? Sure, once it is scaled 1000000000000000000000x, it will certainly work!