Phi-2, Imagen-2, Optimus-Gen-2: Small New Models to Change the World?

  • Published Dec 12, 2023
  • Phi-2 is a tiny model that could fit on a phone, but it outperforms huge language models like Llama 2. I explain more about how it was made and what it means. Then we see Imagen-2, the most stunning text-to-image model yet, at least according to Google's images. We then glimpse Optimus 2, smaller in the sense that it's 10 kg lighter! With more degrees of freedom, its movements look a lot more humanoid. And then the full launch of AI Insiders, plus a recap of why we shouldn't use the MMLU to 2 decimal places!
    / aiexplained
    Phi-2 now on Hugging Face: huggingface.co/microsoft/phi-2
    Bubeck Video, min 19: • Textbooks Are All You ...
    Phi 2: www.microsoft.com/en-us/resea...
    Shital Shah: / 1734882570603753814
    Shoggoth: / 1702488701782352097
    Mamba 3B: www.together.ai/blog/mamba-3b...
    Phi 1.5B: arxiv.org/abs/2309.05463
    Phi 1: arxiv.org/abs/2306.11644
    Microsoft Prompting: www.microsoft.com/en-us/resea...
    SmartGPT Video: • SmartGPT: Major Benchm...
    The Information: www.theinformation.com/articl...
    Imagen 2: / 1734954295655534780
    deepmind.google/technologies/...
    / 1734763060244386074
    Greg Technology: / 1734544659953623509
    Swyx: www.latent.space/
    AI Engineer: youtube.com/@aiDotEngineer?si...
    Shawn Wang: x.com/swyx?s=09
    / aiexplained
    Non-Hype, Free Newsletter: signaltonoise.beehiiv.com/
  • Science & Technology

Comments • 432

  • @SebastienBubeck
    @SebastienBubeck 5 months ago +24

    Yet another amazing video! I really enjoyed your critical take on benchmarks like MMLU; this is much needed.

    • @aiexplained-official
      @aiexplained-official  5 months ago +7

      Thanks so much Sebastien, Phi-2 is an incredible model - have been testing it for many hours - congratulations to you and the team! And yes, am looking forward to new benchmarking standards for 2024. Thank you again for speaking yesterday.

  • @Diabloto96
    @Diabloto96 5 months ago +218

    Philip doing public work by fact-checking the MMLU WHILE creating all this content?? Impressive work, you're one-of-a-kind in the AI vulgarization field, congrats!

    • @aiexplained-official
      @aiexplained-official  5 months ago +24

      Thanks Diabloto, I am very LLM-curious

    • @gabrote42
      @gabrote42 5 months ago +4

      @@aiexplained-official major credits!!! Hope you sent a link to this to all those companies!

    • @sumanthbalaji1768
      @sumanthbalaji1768 5 months ago +1

      @@aiexplained-official hey, these MMLU flaws are crazy, could you share the doc of inaccuracies for others to go through?

    • @skierpage
      @skierpage 5 months ago +1

      @@sumanthbalaji1768 I found a Medium post from August, "Errors in the MMLU: The Deep Learning Benchmark is Wrong Surprisingly Often," but that seems to be independent work by a Daniel Erenrich.

    • @sumanthbalaji1768
      @sumanthbalaji1768 5 months ago +1

      @@skierpage Yes, I went through that blog too; it doesn't have this document of errors.

  • @Megneous
    @Megneous 5 months ago +110

    You honestly need to publish a paper on the errors in the MMLU. This needs to be seen by academia.

    • @KP-sg9fm
      @KP-sg9fm 5 months ago +8

      100%

    • @maxm1555
      @maxm1555 5 months ago +1

      No paper needed, they should watch this video and immediately build a new test from the ground up!

    • @StevenAkinyemi
      @StevenAkinyemi 5 months ago +1

      They know lol

    • @onoff5604
      @onoff5604 5 months ago +1

      please please publish, but please prepare to be attacked for your honesty

  • @raphaelsoltero8805
    @raphaelsoltero8805 5 months ago +98

    I feel as though it is slightly ironic that the AI's intelligence was held back not by its own way of learning, but by our inaccurate datasets.

    • @KibberShuriq
      @KibberShuriq 5 months ago +11

      It makes a lot of sense though. We tried to make it equally good at predicting experts AS WELL as predicting average Joes AND raging lunatics. Of course that task is much harder than just predicting experts.

  • @rantmarket
    @rantmarket 5 months ago +23

    I still can't believe the MMLU isn't being called out by people, at least. It's been so long since you found those problems that I can't accept people don't know about the issue; every benchmark suite using it should have thrown it out by now.
    Thank you again for your great work. Cheers.

    • @aiexplained-official
      @aiexplained-official  5 months ago +6

      Thanks rant. I thought so too, and then up it pops with Gemini, front and centre.

    • @skierpage
      @skierpage 5 months ago +1

      ​@@aiexplained-official what did the authors of "Measuring Massive Multitask Language Understanding," Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt, say when you contacted them?

  • @DaveShap
    @DaveShap 5 months ago +16

    Increase efficiency!

  • @L1AM
    @L1AM 5 months ago +94

    Well, at this rate this time next year we'll have a locally runnable AGI.

    • @Feel_theagi
      @Feel_theagi 5 months ago +2

      I'm more excited about how much better the largest cloud ones will be

    • @Boufonamong
      @Boufonamong 5 months ago +8

      Imagine that 😂, I'm calling mine hal

    • @Karearearea
      @Karearearea 5 months ago +2

      5 years from now we probably will

    • @aN0nyMas
      @aN0nyMas 5 months ago +3

      @@Boufonamong I'm calling mine Meg. Short for Megatron.

    • @aiexplained-official
      @aiexplained-official  5 months ago +31

      AG-phi?

  • @ClayFarrisNaff
    @ClayFarrisNaff 5 months ago +5

    I love that you're an informed AI enthusiast, yet you're not afraid to criticize -- and to do so forcefully -- where you see the need. It's a mark of integrity.

  • @kylemorris5338
    @kylemorris5338 5 months ago +15

    Having seen your previous work on the MMLU, that graph that declared a .06 PERCENT breakthrough made me burst out laughing.
    We need an MMLU 2 or something to that effect yesterday, and I'm starting to suspect the only reason we don't have it yet is that nobody wants their fancy benchmark numbers to go down, even if they would be more accurate.
    Re: Phi-2, I am happy to see that synthetic data is getting more love, as opposed to the models that just use mass scrapes of any data that isn't tied down properly.

  • @randfur
    @randfur 5 months ago +35

    Thanks for looking into the benchmark data, they were too opaque up until now. Whenever a model scores impressively on one we should dig into it to know whether it really means it's good at X subject or if it's just good at making the same mistakes.

  • @consultantnigel-projectman7274
    @consultantnigel-projectman7274 5 months ago +6

    As a new Patreon member, I'm here to tell you how amazing AI Insiders is. Phillip's research is impeccable. The Insider info is priceless. Those of you who make your living with AI - do yourself a favor & budget the $30 each month to support Phillip. Everyone will eventually be making their living with AI; if not today, very soon. You will need quality, authoritative information upon which you can make important decisions. AI Insiders will provide you with AI news that is second to none. If you have not already, join. Completely worth the money.

    • @aiexplained-official
      @aiexplained-official  5 months ago +1

      Such high praise, thank you so much. If you like what's there, in 2024 you will be even more impressed!

  • @H1kari_1
    @H1kari_1 5 months ago +68

    The big, big issue most people are currently overlooking is that all those benchmarks are in English. The data is in English. The models are heavily optimized for English. GPT-3.5 and GPT-4? They speak just about any language they have gotten some data for and also provide excellent results for tasks in those languages.

    • @aiexplained-official
      @aiexplained-official  5 months ago +25

      Great point

    • @twosaibackbot
      @twosaibackbot 5 months ago

      Yeah, I am Swedish and will be truly scared of an automated workforce when these LLMs speak and understand smaller, more local languages fluently. GPT-4 is decent at it but not yet good enough for professional use.

    • @jokmenen_
      @jokmenen_ 5 months ago

      Very true. I haven't seen a model with fewer than 70B params yet that really impressed me with its performance in my language.

    • @ryzikx
      @ryzikx 5 months ago +2

      Though that is a very big problem, I'd argue the larger problem is 'poisoned' models, basically trained to tackle the benchmarks rather than being actual general-purpose models.

    • @KyriosHeptagrammaton
      @KyriosHeptagrammaton 5 months ago +1

      Given that multi-modality seemed to boost performance, I wonder if multilingual models would also be boosted.

  • @CalConrad
    @CalConrad 5 months ago +3

    For the record, you have the best thumbnails in the game.

    • @aiexplained-official
      @aiexplained-official  5 months ago +2

      Aw thanks Cal, I often get criticised for them, and get hundreds of offers to pay for thumbnail services, but I love them too. Minimalist.

  • @swyxTV
    @swyxTV 5 months ago +2

    thanks for having me as your first Insiders speaker Philip!

    • @aiexplained-official
      @aiexplained-official  5 months ago +1

      And thank you so much swyx. It was a great talk, and I laughed at the intro!

  • @skippy6086
    @skippy6086 5 months ago +10

    The pace of the race toward the edge of the cliff quickens.

    • @MachineLearningZuu
      @MachineLearningZuu 5 months ago +1

      Gemini Nano hit the punch line 🥊

    • @GrindThisGame
      @GrindThisGame 5 months ago +1

      Time to fly.

    • @aiexplained-official
      @aiexplained-official  5 months ago +3

      Hmm, but synthetic data is good for safety, no?

    • @Igor_lvanov
      @Igor_lvanov 5 months ago

      @@aiexplained-official Maybe this model won't be a Shoggoth, but there are a lot of ways things may go wrong. E.g. because we will get extremely powerful systems without proper defence mechanisms against misuse, or things like instrumental convergence.

  • @alphahurricane7957
    @alphahurricane7957 5 months ago +14

    I think that smaller models feeding 100% accurate information to a general, bigger AI, one capable of understanding, finding anomalies in the process, or being critical of the result, is the real AGI.
    I saw "Teslabot 2" today; I'm very much interested in seeing AI and robotics in everyday life.
    A lot of insights as always, thanks!

    • @MCA0090
      @MCA0090 5 months ago +1

      Maybe the way to go is finding ways to make models smaller and more efficient, to the point that they could run on local devices instead of big datacenters relying on the cloud, an internet connection and higher latencies (the cloud would never work for making robots operate properly)... Yesterday I was reading about liquid neural networks and how they can do the work with just a few neurons. It seems promising for shrinking really large NNs into much smaller and faster ones, especially for vision, video, images and audio/speech recognition. For robotics, LNNs can handle vision better than current neural networks and run fast even on small devices such as a Raspberry Pi, because they need just a few neurons to do the same task a really big NN based on other architectures does. LNNs are very small and have plasticity to adapt to new situations without needing a new training process.

  • @skippersthepenguin3591
    @skippersthepenguin3591 5 months ago +48

    They should make Phi-3 a 7B model. If Phi-2 is about a quarter of that size, then scaling up should make it even better, and 7B models are runnable on 90% of computer hardware.

    • @berkertaskiran
      @berkertaskiran 5 months ago +14

      Their priority is probably phones.

    • @boris6237
      @boris6237 5 months ago +3

      Yeah, I think it's especially important for decent models to be able to run on low-end phones so that LLM access isn't restricted to the first world. @@berkertaskiran

    • @noone-ld7pt
      @noone-ld7pt 5 months ago +6

      @@boris6237 Oh wow, that's an incredibly important argument, had not thought about it like that and I really appreciate you sharing that perspective!

    • @QH96
      @QH96 5 months ago +1

      Don't quote me, but a 7-billion-parameter model would probably use about 6 GB of RAM (rough math sketched below).

    • @carkawalakhatulistiwa
      @carkawalakhatulistiwa 5 months ago +1

      @@QH96 And the iPhone 15 Pro Max only has 8 GB of RAM, and the iOS system already uses 2 GB of it.

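      For context, a minimal back-of-the-envelope sketch of that RAM estimate; the quantization levels and the ~15% overhead factor are assumptions, not figures from the video:

      # Rough RAM estimate for a 7B-parameter model at common quantization levels.
      params = 7e9
      for bits, label in [(16, "fp16"), (8, "int8"), (4, "int4")]:
          weights_gb = params * bits / 8 / 1e9   # raw weight storage in GB
          total_gb = weights_gb * 1.15           # assumed ~15% runtime overhead
          print(f"{label}: ~{weights_gb:.1f} GB weights, ~{total_gb:.1f} GB total")
      # int8 lands around 8 GB with overhead, int4 around 4 GB; the ~6 GB figure
      # above sits between them, roughly a 6-bit quantization.
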
  • @_ptoni_
    @_ptoni_ 5 months ago +8

    I was impressed by the phi-2 code perf

  • @BTFranklin
    @BTFranklin 5 months ago +10

    Is there any effort to actually correct the MMLU? If not, why not? What would be required to get these corrected? I feel that this is a serious problem and it's disturbing that the MMLU is continuing to be used without correction.

  • @heinkle1
    @heinkle1 5 months ago +1

    I’ll be honest, I stopped watching your videos for a while because they caused me too much anxiety - but when I then look at some of the other things going on in the world, it is actually comforting to hear about AI. Congrats on your meteoric growth in 2023.

    • @aiexplained-official
      @aiexplained-official  5 months ago +1

      Thanks for coming back heinkle! Great to have you here

  • @harnageaa
    @harnageaa 5 months ago +14

    TL;DR: If the datasets these GPTs are trained on were actually accurate, we'd have even more impressive models overall. So without even changing the training method, just the data, you can get way better models.

    • @skierpage
      @skierpage 5 months ago

      I wonder if a large model with a big context window would be able to spot inconsistencies and mistakes in training data. I saw a documentary where a AI presented with logical inconsistencies went into a garbled "Does not compute" loop and eventually caught fire, so maybe it's too dangerous!

    • @harnageaa
      @harnageaa 5 months ago

      Idk, how can you determine whether something is right or wrong if you learn the wrong thing from the dataset? I think the best approach would be "smaller models" used by a bigger model, where the smaller models are used to detect inconsistencies within the dataset.
      You train the small models on 100% accurate data and teach them to spot right/wrong answers; that's their sole purpose, and they will find every mistake in any dataset. So a model for math, one for chemistry, one for biology, etc. Then the bigger model could access these mini models through an API, get the results from them, and recreate PDFs with a "correct dataset".
      I think it's safer that way: when you have a big model it's harder to "control" and know what it actually knows. And a model that has perfect data for code, math, physics, etc. is basically the final product we want, but to obtain that we need to curate the data we have, and the fastest way to do that is a smaller model. Then once all the data is curated, we use it for a bigger model. I spammed q_q oops. u get the point.
      @@skierpage

    • @skierpage
      @skierpage 5 months ago

      @@harnageaa Symbolic AI tried to develop AI by teaching systems only the right answers, and it utterly failed to keep up with transformers. One of the great things about LLMs is they can handle inconsistency and exceptions: "Water is wet" (ice), "Palestine is a state" (disputed), "An asteroid killed the dinosaurs" (generally accepted), etc. Learning everything includes ingesting megabytes of the "wrong" things; again, I want to know if an LLM can be aware of discrepancies while or after it trains.

  • @jawadur_
    @jawadur_ 5 months ago +1

    The most value delivered per minute on YouTube.

  • @pablolucianogomezdemayorag4060
    @pablolucianogomezdemayorag4060 5 months ago +22

    Amazing as always! Wish regular media were half as good at explaining complicated topics; this channel is gold.

  • @3dVisualist
    @3dVisualist 5 months ago +41

    With AI Insider, you really are creating a lot of content. I do hope it turns out you were AI all along!

    • @aiexplained-official
      @aiexplained-official  5 months ago +24

      Haha not quite, a hardworking human!

    • @iamcoreymcclain
      @iamcoreymcclain 5 months ago +1

      @@aiexplained-official The way you pronounced "Imagen" made me question if this was an AI voice as well lol, but I think you've left enough small clues to prove your humanity 😂

    • @SBImNotWritingMyNameHere
      @SBImNotWritingMyNameHere 5 months ago +1

      @@aiexplained-official that's what you think

    • @3dVisualist
      @3dVisualist 5 months ago

      @@aiexplained-official Certainly hardworking! Thanks for all your explainers, they really help me stay on top of the fast-moving world of AI.

  • @Olack87
    @Olack87 5 months ago +24

    Amazing video, as always. Have you contacted any of the people in the field about the erroneous benchmarks? Do we know if anyone is working on creating new ones or fixing them? I can't believe they don't know or care about it, but the problem is still there, it seems.

    • @aiexplained-official
      @aiexplained-official  5 months ago +23

      Yes, and people are. There are better benchmarks coming out all the time, hence my surprise at this MMLU d-measuring

  • @baychaoz
    @baychaoz 5 months ago +1

    7:06 such a legend

  • @educated_guesst
    @educated_guesst 5 months ago +1

    Hi Philip,
    just wanted to say thank you for still pumping out so many videos despite your Patreon content probably also being a ton of work.
    Thank you so much for keeping us informed!

    • @aiexplained-official
      @aiexplained-official  5 months ago

      Haha, no, thank you for supporting on Insiders. It's what keeps the main channel going!

  • @schemage2210
    @schemage2210 5 months ago +9

    We have all seen the Boston Dynamics robots doing incredible things, but the scripting and trial and error involved in making those incredible videos is insane. And let's not forget that the Atlas robot is huge. Are we actually meant to believe that Musk's Optimus robot is "as described"? AI-powered, and physically capable of all the actions it's shown doing?

    • @whiterottenrabbit
      @whiterottenrabbit 5 months ago

      This time next year

    • @McDonaldsCalifornia
      @McDonaldsCalifornia 5 months ago

      Anything Musk hypes up should be taken with a laaarge grain of salt.

  • @MindFieldMusic
    @MindFieldMusic 5 months ago +2

    Billy Madison to the MMLU, "I choose: Business Ethics." 😉

  • @Dylan-zg2jl
    @Dylan-zg2jl 5 months ago +1

    As usual, a fascinating video with revealing insights that are seldom if ever found anywhere else. Great job mate and look forward to more

  • @DreamOfFlying
    @DreamOfFlying 5 months ago +2

    I absolutely love your videos! They deserve each and every view and like!

  • @onoff5604
    @onoff5604 5 months ago +1

    Thank you so much for investigating problems with testing.

  • @Just4Games2011
    @Just4Games2011 5 months ago +5

    Great video, but why not mention Mixtral? Are you still experimenting with it?

    • @aiexplained-official
      @aiexplained-official  5 months ago +4

      First, I think Phi-2 is more significant, but also to cover it properly would be a lot more work; there's only so much time in the day!

    • @Just4Games2011
      @Just4Games2011 5 months ago

      @@aiexplained-official Fair point, can't wait to see your video on it.

  • @eburgwedel
    @eburgwedel 5 months ago +1

    Can’t wait to see Mixtral in the mix!

  • @Shaunmcdonogh-shaunsurfing
    @Shaunmcdonogh-shaunsurfing 5 months ago +1

    Sounds great for general chat conversation

  • @stephenrodwell
    @stephenrodwell 5 months ago +1

    Thanks! Fantastic content, as always. 🙏🏼

  • @sharkeys
    @sharkeys 5 months ago +2

    You know they are flexing their ability when they show hands :D

  • @HonestyLies
    @HonestyLies 5 months ago +1

    great vid as always, strapping in for next year's craziness

  • @ryzikx
    @ryzikx 5 months ago +1

    always a good day when phillip ai uploads

  • @freek633
    @freek633 5 months ago +2

    Phi-2 is a tiny model that could fit on a phone, but it outputs huge language models like Llama 2. (from the caption) outputs should be outperforms!

  • @MrSchweppes
    @MrSchweppes 5 months ago +1

    As always great video! Very informative! Many thanks to you!

  • @k225
    @k225 4 months ago +1

    AIs are experiencing the real world of academic exams. I remember several times in college where textbooks were wrong, exam questions were ambiguous, or we were told to give outdated or blatantly wrong answers to pass tests and get good grades.

  • @miker99
    @miker99 5 months ago +1

    When will they learn? Rubbish in, rubbish out. Thanks for all your efforts to bring awareness to this issue of testing quality.

  • @beaumac
    @beaumac 5 months ago +8

    AGI coming to a mobile device near you in 2024 thanks to synthetic data. Is there any safety checking done on this data?

    • @aiexplained-official
      @aiexplained-official  5 months ago +3

      Well it's synthetic so shouldn't be as bad but I was still surprised that there was toxicity at all, maybe I shouldn't be

  • @GrindThisGame
    @GrindThisGame 5 months ago +4

    Better data, better models...makes sense.

  • @aaronnewman2
    @aaronnewman2 5 months ago +3

    You are beautiful sir. Thanks as always.

    • @aiexplained-official
      @aiexplained-official  5 months ago +2

      Wow thanks Aaron, that's cheered my spirits

  • @Q1w1N
    @Q1w1N 5 months ago +1

    I don't know what's more concerning: the fact that those models did so well on a flawed test, or that they might be much more capable than we think.

  • @Y3llowMustang
    @Y3llowMustang 5 months ago +1

    Wow, that was a surprisingly sudden end to the video.

  • @doctormaddix2143
    @doctormaddix2143 5 months ago

    Can’t appreciate your work enough! Thank you.❤

  • @jamesatotago
    @jamesatotago 5 months ago +8

    Great video again! Please do a video on synthetic data. I get that this will likely decrease toxicity, but what else will it do? If, for example, Microsoft is building the synthetic data, does that mean that we are training the AI on Microsoft's view of the world? One can imagine how this could be influenced by all sorts of commercial imperatives. Will synthetic data make models more and more similar to one another and perhaps less interesting?

    • @aiexplained-official
      @aiexplained-official  5 months ago +1

      I don't think less interesting if you ensure diversity - see original phi1 vid

  • @CrueMusic
    @CrueMusic 5 months ago +1

    Thank you! I hope you don't reduce the amount of great content here on your channel. It's invaluable.

  • @nacho7872
    @nacho7872 5 months ago +3

    Amazing video as usual, thanks for the fast update

  • @JohnLeMayDragon
    @JohnLeMayDragon 5 months ago +1

    Thanks for another informative video!

  • @williamjmccartan8879
    @williamjmccartan8879 5 months ago +1

    Thank you for sharing your time and work, Phillip. I responded to one of your tweets by asking if you knew what is going on over at Liquid AI; the new year is fine, and by the looks of it you're going to be really busy, but if you get a chance I'm curious, as that's where Joscha Bach is working right now. Merry Christmas to you and your family and all the other family helping you with this great work, and a Happy New Year.

    • @aiexplained-official
      @aiexplained-official  5 months ago +1

      Merry Christmas Bill, will check it out, cool name at the very least

  • @user-pf9jv1fl2n
    @user-pf9jv1fl2n 5 months ago +12

    Great video, just one question:
    Do you feel the AGI?

    • @ekstrajohn
      @ekstrajohn 5 months ago +2

      The others think you should no longer be on the board. It's not my decision, really.

  • @youssefanajar4061
    @youssefanajar4061 5 months ago +1

    Best yt channel

  • @matusstiller4219
    @matusstiller4219 5 months ago +1

    Great video, like always.

  • @muhammedkoroglu6544
    @muhammedkoroglu6544 5 months ago +1

    Amazing content! Don’t get how you don’t have a million subs

  • @jessedbrown1980
    @jessedbrown1980 5 months ago +1

    Jesus Christ. So many implications from this, and it will result in massive improvements. Thank you so much for pointing this out, as it will really slap AI into hyperdrive.

  • @tomaszkarwik6357
    @tomaszkarwik6357 5 months ago +4

    If this was SDXL, I'd give the image a 9/10. The problems are:
    - the eyes (they are not pointing in the same direction)
    - the ear (it is just weird)
    - the lighting is wrong (the leaves are lit from behind the subject and the subject is lit from the front)
    - her whole right side is a bit wonky
    - the one strand of hair in the back is weird. 7:33
    PS: if you want to see the best SDXL models, use the ones over at Civitai and not Stability AI's (the 1.0 model is still the best you can get from there). Just pick "JuggernautXL" or "DreamshaperXL" as they are SotA for XL.
    PPS: Other than the part about Imagen-2 this was a very good video. Love your dedication to the craft of making AI news without the hype.

    • @aiexplained-official
      @aiexplained-official  5 months ago

      Thanks tomas, your professional eye caught much more than me, apologies

    • @tomaszkarwik6357
      @tomaszkarwik6357 5 months ago

      @@aiexplained-official I ain't a professional, but I use SD; these things are just what you train your eye for.

    • @maciejbala477
      @maciejbala477 5 months ago

      Really? Dreamshaper is SotA? I knew about Juggernaut, but I remembered Dreamshaper's earlier non-SDXL versions as kinda worse than some alternatives. Will have to try it out.
      WyvernMix was another that impressed me.

    • @tomaszkarwik6357
      @tomaszkarwik6357 5 months ago

      @@maciejbala477 The XL version is at least close to the SotA for turbo.
      Or at least it was late last week.

  • @covle9180
    @covle9180 5 months ago +2

    Small models ftw! If I can't run it on my phone or self-host it (without really expensive GPUs), then 90% of use cases just don't work.
    Models are flaky enough as they are. Add to that the unreliability of some companies' APIs, and we need self-hosted solutions we can fine-tune. (Not to mention privacy issues.)

  • @onoff5604
    @onoff5604 5 months ago +1

    Great video!! Many thanks. On the topic of generated images of human faces: look at the shirt collar (and ears and earrings if you can see them), an instant giveaway. The face is phenomenal... but textile manufacturing is apparently a harder problem.

  • @clearpupil
    @clearpupil 5 months ago +2

    This explains why I did so badly in my medical exams. The college has all the wrong answers :-)

  • @user-hk8jt6so3l
    @user-hk8jt6so3l 5 months ago +1

    I cannot thank you enough! I will definitely support you on Patreon when my finances allow it! THANK YOU FOR GUIDING US THROUGH ALL OF THIS, YOU ARE THE BEST!❤

    • @aiexplained-official
      @aiexplained-official  5 months ago

      Thanks so much, no worries on Patreon, your kindness here is enough!

  • @patronspatron7681
    @patronspatron7681 5 months ago +3

    Methinks the Phi models were named after you. :-)

  • @yw1971
    @yw1971 5 months ago +1

    I think if we can find a formula, no matter how long & complex, that can be the 'Engine' for such a training, it will change the field.

  • @cjgoeson
    @cjgoeson 5 months ago +1

    0:00 “You my have thor”

  • @lorenzoblz799
    @lorenzoblz799 5 months ago +1

    It would be interesting to take a few LLMs and ask them to evaluate the questions: are they clear, are they ambiguous, do they make sense?
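
    A minimal sketch of that idea, assuming the OpenAI Python client; the prompt wording and the placeholder question list are assumptions, not from the comment:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    questions = ["..."]  # placeholder: load benchmark questions here

    for q in questions:
        # Ask the model to critique the question rather than answer it.
        review = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": "Is this exam question clear, unambiguous, and "
                           f"answerable from its options? Explain briefly.\n\n{q}",
            }],
        )
        print(review.choices[0].message.content)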

  • @atom1496
    @atom1496 5 months ago +1

    For benchmarks, it is common to include wrong or ambiguous questions to catch training leakage. It should not be possible to get 100%.

  • @davidbutler9323
    @davidbutler9323 5 months ago +2

    By this time next year, I expect to see a continuous stream of AI Explained content generated by Phillip-2 or I'll be really disappointed.

  • @dcgamer1027
    @dcgamer1027 5 months ago +2

    Appreciate the updates as always. I was wanting to look more into the MMLU since you mentioned people still using it, and thought I'd go back and watch your video on it, but it's not in the description; might be a good idea to put it there since you played a bunch of it at the end here. I assume I'm not the only one that might want to look more at that part.
    Anyways, ty and have a good day :)
    edit: also just a thought, has anyone compiled an exhaustive list of the issues in the MMLU test? And if so, does anyone have a link to that list?

    • @aiexplained-official
      @aiexplained-official  5 months ago +1

      Hey dc, thought I put it in there somewhere! Can search mmlu broken benchmark too. And no, to the best of my knowledge this channel has shown the biggest repository of mistakes in that benchmark

  • @mugwortofz3lb1za
    @mugwortofz3lb1za 5 months ago +1

    Always the best videos!! Have you considered making a Patreon tier where some of the funds go towards a Google Colab backend for the patrons to use, depending on their subscription amount and time? Given how few resources were used training Phi-2, it could be a good idea to let people experiment with the concepts shown in your videos, as well as more exotic variations in model architecture such as cyclic attention heads, sub-networks etc.

  • @YoussefMohamed-er6zy
    @YoussefMohamed-er6zy 5 months ago +1

    Finally a new video!!!🎉🎉🎉

  • @KP-sg9fm
    @KP-sg9fm 5 months ago +1

    TOP FRICKEN NOTCH MY FRIEND, THANK YOU!!!

  • @zonas7915
    @zonas7915 5 months ago +1

    Welcome, HAL 9000 and Skynet!

  • @onil2301
    @onil2301 5 months ago

    Is there a way to access the document you've compiled of the errors that you found in the MMLU benchmark? I would like to cite it in my bachelor's thesis, if that's possible.

  • @Veileihi
    @Veileihi 5 months ago +1

    Feels like we're part of those vaguely touched-upon histories in AI movies 😅

  • @kevinli3767
    @kevinli3767 5 months ago +1

    I'll ask the question that everyone's curious about - how are you able to 1) access, 2) digest, 3) synthesize, and 4) produce so productively???

    • @aiexplained-official
      @aiexplained-official  5 months ago

      Will do a video on that someday! And don't forget the hours of content (researched, narrated and edited by me) for AI Insiders at the same time, plus sourcing and conducting interviews! And comment replying!

    • @kevinli3767
      @kevinli3767 5 months ago

      AGI must be helping you with the details :D @@aiexplained-official

  • @ok373737
    @ok373737 5 months ago +1

    Brilliant!

  • @tomski2671
    @tomski2671 5 months ago +1

    By my estimate it cost about $70k to train. However, the real cost is in preparing the data.
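
    That figure roughly checks out. A quick sketch, assuming Microsoft's reported 96 A100 GPUs for 14 days; the ~$2/GPU-hour cloud rate is an assumption:

    # Back-of-the-envelope training cost for Phi-2.
    gpus = 96                # A100s, as reported by Microsoft
    days = 14
    usd_per_gpu_hour = 2.0   # assumed rate; varies widely by provider

    gpu_hours = gpus * days * 24          # 32,256 GPU-hours
    cost = gpu_hours * usd_per_gpu_hour   # ~$64.5k, close to the $70k estimate
    print(f"{gpu_hours:,} GPU-hours -> ~${cost:,.0f}")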

  • @carterellsworth7844
    @carterellsworth7844 5 months ago +7

    Is it rational to say that if Google and OpenAI are using the MMLU benchmarks in this way, without acknowledging the benchmark's problems, then they are behaving too naively to deserve the public's trust to try and solve the alignment problem?
    It's so blatant once you point it out that I find it very disturbing no one else talks about it.

    • @skierpage
      @skierpage 5 หลายเดือนก่อน

      The two issues seem unrelated. The numbers game to two decimal digits is stupid when the benchmarks are 1% flawed, and training to the test when the test is bad may degrade models' real-world abilities, but what does that have to do with alignment?

  • @DavidsKanal
    @DavidsKanal 5 months ago +2

    Hey Philip, dunno if it's me watching this at 6am but this video felt a little fast and stressful. Do you think you could integrate a 1 to 2-second pause before switching to a new topic to give us time to digest the information?

    • @aiexplained-official
      @aiexplained-official  5 months ago

      Thanks for the feedback David, will bear it in mind!

    • @be2eo502
      @be2eo502 5 months ago +1

      Agreed. We poor biological intelligences need a longer pause between concepts to integrate the information.

    • @aiexplained-official
      @aiexplained-official  5 months ago +1

      It's like we need a smidgen of noise between the signal

  • @anywallsocket
    @anywallsocket 5 months ago +1

    Soon we’ll have to get the LLMs to not only generate the next update’s training data, but to prove to us the labels are correct, because otherwise we are limited by what we think we know.

  • @bobtivnan
    @bobtivnan 5 months ago +2

    Tesla robot walking like "I just sharted"

  • @AICodingAdventures
    @AICodingAdventures 5 months ago +2

    Awesome video! You did a great job exposing MMLU and how shady it is. I agree that people should stop trusting it as a measure of capabilities. What about MoE and Mistral?

  • @UncleJoeLITE
    @UncleJoeLITE 5 months ago +1

    I'll speak only to what I know. That project sounds amazing; I wish I was into VC, I'd buy in! Tbh, most Gen X weren't taught ANY entrepreneurship if we went the corporate/govt career route. I'm sure you have even bigger plans, depending on what sticks. _Putting decimal places on data with ~? confidence intervals is how we manipulate ofc._

  • @humunu
    @humunu 5 months ago +1

    MMLU...WTF? (Merry Christmas/Happy Holidays)

  • @supertetleman
    @supertetleman 5 months ago +1

    Just wait until Jan 8th. No meetings in the last week of December or the first week of Jan, so all the AI researchers have extra time to work on their pet projects and keep the compute farms running over the holiday; I expect to see some interesting results. It's always the most productive time of year.

  • @lhomme_bepis
    @lhomme_bepis 5 months ago +1

    Could you add timeline sections to your videos? I'd like to see an outline of what topics exactly are being covered at a quick glance

    • @aiexplained-official
      @aiexplained-official  5 months ago

      When I add timestamps YT doesn’t automatically segment the video, wondering what I am missing

  • @haileycollet4147
    @haileycollet4147 5 months ago +3

    Please make a cleaned (remove or fix questions) MMLU bench as a PR to EleutherAI's evaluation harness :) (see the sketch after this thread)

    • @haileycollet4147
      @haileycollet4147 5 months ago

      Some fixes are better than none...

    • @Houshalter
      @Houshalter 5 months ago

      You can't just change a benchmark that is already widely used. It would create confusion when different models are tested at different times. And produce results that aren't comparable to each other. It needs to be a new benchmark like "MMLU 2"

    • @haileycollet4147
      @haileycollet4147 5 months ago

      @@Houshalter I mean, arguably it's pretty worthless in its current state ... I suppose it could be its own bench, or v2 or 1.5 or whatever, but seems better to fix it somewhere than to just say it's bad, since it's gonna get used anyway...
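
      A minimal sketch of that cleaning step, assuming the Hugging Face datasets library and the cais/mmlu dataset ID; the flagged indices are hypothetical, since the full list of broken questions hasn't been published as a dataset:

      from datasets import load_dataset

      # "all" combines MMLU's 57 subjects into a single split.
      mmlu = load_dataset("cais/mmlu", "all", split="test")

      flagged_idx = {12, 345, 9876}  # hypothetical row indices of broken questions

      # Drop the flagged rows, keeping everything else untouched.
      clean = mmlu.filter(lambda _, idx: idx not in flagged_idx, with_indices=True)
      print(f"kept {len(clean)} of {len(mmlu)} questions")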

  • @zockermarlon5183
    @zockermarlon5183 5 months ago +1

    comment for the algo. keep up the great videos :D

  • @Dron008
    @Dron008 5 months ago +1

    I wonder what you think about the Mixtral 8x7B model? And what about the new MMMU (multimodal) benchmark? Is it good enough?

    • @aiexplained-official
      @aiexplained-official  5 months ago +1

      MMMU is so much better yes. And Mixtral I am still playing about with, and investigating with the help of experts. The future looks bright.

    • @Dron008
      @Dron008 5 months ago

      @@aiexplained-official I tried it on the DeepInfra site; looks good for such a small model.

  • @mukulishwar2737
    @mukulishwar2737 5 months ago +1

    Can you also talk about the newly released Mixtral 8x7b?

  • @HakuCell
    @HakuCell 5 months ago +1

    Will you also make YouTube Shorts for those who don't have time for all the details?

    • @aiexplained-official
      @aiexplained-official  5 months ago

      Maybe one day! Do you find the videos too long?

  • @KolTregaskes
    @KolTregaskes 5 months ago +1

    4:30 Not many people are talking about the flaws in these benchmarks, e.g. MMLU. Perhaps we need another video on this?

    • @KolTregaskes
      @KolTregaskes 5 months ago

      I've read, heard and watched a lot of content on Gemini, and very few mentioned any issues with MMLU.
      For once I think a more clickbaity title is needed.

    • @aiexplained-official
      @aiexplained-official  5 months ago

      Haha, more so than the original 'Broken Benchmark Smartgpt' one!

    • @KolTregaskes
      @KolTregaskes 5 months ago

      @@aiexplained-official Hehe, indeed. Perhaps it needs spelling out more, including words like "MMLU" and not "SmartGPT". BTW, how is SmartGPT going?

    • @aiexplained-official
      @aiexplained-official  5 months ago +1

      @@KolTregaskes more to come on that front in 2024...:)

  • @michaelnemo4593
    @michaelnemo4593 5 months ago +1

    7:33 The ear looks weird; the system messed up the earrings. Even so, the system is getting better by the day.

  • @Woodchuckization
    @Woodchuckization 5 months ago +1

    Is it time for Philip to create a benchmarking test for AI systems himself?

  • @Rkcuddles
    @Rkcuddles 5 months ago

    16:46 "type of question that depends which source you ask". I didn't catch this point. Can anyone elaborate?

  • @BradleyZS
    @BradleyZS 5 months ago +3

    The errors with the MMLU make me think a good test for AI should have trick questions, questions without actual answers or lacking the appropriate option, to test the AI's ability to recognise when it doesn't know or can't find the answer.

    • @skierpage
      @skierpage 5 months ago

      I think the video showed that GPT-4 would give a better answer than any of the garbled multiple-choice answers. I think you could engineer a different test-taking prompt where you prompt the AI to pick the best multiple-choice answer but also point out when there's a problem with the Q&A (a rough sketch of such a prompt follows this thread).
      One problem is these technical reports are drowning in a sea of benchmark numbers, so I'm sure the person cranking out all the scores to two decimal digits has no time for nuance or evaluation.

    • @BradleyZS
      @BradleyZS 5 months ago

      @@skierpage
      While it is useful to let it answer freely, in terms of serving people AI should be able to work within constraints. Otherwise it will likely become just an advertising tool, always telling you to buy the industry tool to get the job done.
      In an example specific to me, I do a lot of Python programming on my phone, and ChatGPT often gives coding examples for libraries that don't work on my phone. So it's handy if we can give it a constraint, asking it to solve the coding problem with a specific library, since we may want the best solution we can do right now rather than the theoretical perfect solution.

    • @skierpage
      @skierpage 5 months ago

      @@BradleyZS Make up your mind. Do you want to constrain the AI to answer a multiple choice question, or point out that it's flawed? What should the AI do in response to a sleazy lawyer: "Have you stopped beating your wife? Answer yes or no!"

    • @BradleyZS
      @BradleyZS 5 months ago +1

      @@skierpage
      The ideal would be if the AI could recognise the intent of such questions. That it could understand that a leading question is intended to ascribe undue guilt to it, or that a trick test question exists to test the AI's ability to react to an impossible task.
      Such an ability, I believe, is crucial for the progression of AI beyond the simple LLM. An AI should be able to understand the desire of the user, and in the context whether it should give the best answer or admit the inability to answer.
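
      A rough sketch of the flag-aware test-taking prompt discussed in this thread (the wording is assumed, not taken from the video):

      # Prompt template: answer the question, but also flag defective items.
      FLAG_AWARE_PROMPT = """Answer the multiple-choice question below with a single letter.
      Then, on a second line, write FLAG plus a short note if the question is
      ambiguous, contains an error, or has no correct option; otherwise write OK.

      Question: {question}
      Options:
      {options}
      """

      prompt = FLAG_AWARE_PROMPT.format(
          question="Which planet is closest to the sun?",
          options="A) Venus\nB) Mercury\nC) Earth",
      )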

  • @Huhujadu
    @Huhujadu 5 months ago +1

    They gotta do something about this, holy, it's unacceptable.