Grok-2 Actually Out, But What If It Were 10,000x the Size?

  • Published Sep 12, 2024

Comments • 401

  • @noobicorn_gamer
    @noobicorn_gamer 22 days ago +411

    My man casually includes potentially demonetizing images that other AI channels were afraid to include, like it's just another Thursday AI video. You are unmatched in AI YouTube content uploads. Been a fan since the beginning and we all appreciate your passion towards it. Kudos.

    • @WillyJunior
      @WillyJunior 22 days ago +7

      Which images?

    • @CarlosMendezVientos
      @CarlosMendezVientos 22 days ago +9

      @@WillyJunior I think he's talking about SpongeBob and Mickey Mouse.

    • @ryzikx
      @ryzikx 22 days ago +6

      matt berman includes mpreg elon musk😂

    • @YouLoveMrFriendly
      @YouLoveMrFriendly 21 days ago +3

      He's virtue signaling by vilifying Trump. It's silly and sad.

    • @jan.tichavsky
      @jan.tichavsky 21 days ago

      @@YouLoveMrFriendly lol you snowflakes get offended if a single Trump image appears. Chill.

  • @Ehrenoak
    @Ehrenoak 21 days ago +94

    What I like about Simple Bench is that it's ball-busting. Too many of the recent benchmarks start off at 75-80% on current models. A bench that got 80% last year and 90% now isn't as interesting anymore for this kind of bleeding-edge discussion of progress. I like seeing benchmarks come out at 20% and go up to 40%, etc. That's where the leading edge is.

    • @aiexplained-official
      @aiexplained-official 21 days ago +27

      And even rarer is to anchor it in human performance of 80-90%+. It's easy to go esoteric and throw off models; harder to expose common-sense faults.

    • @RichardHarbridge
      @RichardHarbridge 21 days ago

      @@aiexplained-official The human-performance insight is critical and a great area to expand. I'm sure you are already considering it, but benchmarking against different types of people, rather than an average human, with differently tuned questions in Simple Bench would be an excellent area of exploration and research, as others could then learn from and follow it. Then again, it's an incredible amount of work on top of what you are already doing - I'm just excited by the way slight changes in perspective and approach can lead to interesting industry momentum.

    • @what_to_watch_today
      @what_to_watch_today 20 days ago +1

      Thanks for your videos, I really like them, but one thing: I think the top models don't solve Simple Bench because they haven't seen or been trained on this type of question; once a model is trained on these questions, it will be able to solve them. Also, we have to think about the utility of these questions: what's the point of them? It's not like they solve a real problem if the model is trained on them... wdyt?

  • @mshonle
    @mshonle 21 days ago +74

    I recommend you view the GPT-voice-chat-with-red-teamer original audio (e.g. in Audacity) as a spectrogram. It’s stereo audio, with the user on the left channel and the model on the right channel, so seeing both tracks on the spectrogram is helpful. It shows just how much background noise was on the user’s side. It’s also interesting because you can visualize the timbre of the woman’s voice (like which frequencies are strongest), how it differs from the timbre of the synthesized male voice, and how the timbre change of the model does look more like the woman’s timbre.
    Versions of Whisper that I’ve tried would often hallucinate tokens when there is silence (meaning there would need to be an audio threshold filter passed first, to clip out non-speech). I could see how the background noise in the weird chat audio might also lead to spurious tokens being generated.
    What would be great to see is: a user is having a chat with a bot, but their dog keeps yapping in the background and the user periodically needs to shush the pup, and it happens enough times that the bot fabricates its own dog yapping that it also must quiet down.
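
A minimal sketch of the analysis and filtering described above, assuming a hypothetical stereo file "chat_clip.wav" and an arbitrary RMS threshold; the real clip, sample format, and recording levels would differ:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, audio = wavfile.read("chat_clip.wav")  # assumes stereo: shape (n_samples, 2)
user = audio[:, 0].astype(np.float32)   # red-teamer on the left channel
model = audio[:, 1].astype(np.float32)  # model on the right channel

# Per-channel spectrograms: compare which frequency band carries the most
# energy, a rough proxy for the timbre difference described above.
for name, channel in (("user (left)", user), ("model (right)", model)):
    freqs, times, power = spectrogram(channel, fs=rate, nperseg=1024)
    peak = freqs[power.mean(axis=1).argmax()]
    print(f"{name}: strongest band around {peak:.0f} Hz")

# Crude silence gate before transcription: zero out 50 ms frames whose RMS
# falls below a threshold, so a model like Whisper is never fed near-silence
# it could hallucinate tokens on. The 200.0 cutoff is an illustrative value
# for int16-scale audio, not a tuned one.
frame = int(0.05 * rate)
gated = user.copy()
for start in range(0, len(user) - frame, frame):
    seg = user[start:start + frame]
    if np.sqrt(np.mean(seg ** 2)) < 200.0:
        gated[start:start + frame] = 0.0
```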

    • @KurtWoloch
      @KurtWoloch 21 days ago +7

      I think it's something different. The model first gives an answer to the user from the perspective of the model, but then, at the point where it cries "No", it actually continues the dialogue from the perspective of the user, arguing with the point of view the user had given. It's just continuing the dialogue, ignoring the fact that the user should say the user's part, not the model. And the user's part, as imagined by the model, is logically also said in the user's voice, at least as far as the model manages to imitate it. If you listen closely to what it says in the user's voice vs. before the "No": as long as it speaks in its own voice, it's pretty cautious and seems to try to find a polite answer that doesn't violate any guidelines, while when it talks as the user, it seems to be much more confident in what it says.

    • @jan.tichavsky
      @jan.tichavsky 21 days ago +2

      @@KurtWoloch That makes sense; it just runs autocomplete based on the previous chat. I guess it's easier to exploit over a voice interface.

    • @YTLettersAZ
      @YTLettersAZ 21 days ago +3

      @@KurtWoloch What I take from the interesting @mshonle observation is that maybe the model could generate some kind of "end-of-message" system token out of the noise. Similar to those "|end_header_id|" or "|eot_id|" from Llama.
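
A small illustration of that idea, as a sketch: a Llama-3-style chat template (Llama 3's actual special tokens are <|start_header_id|>, <|end_header_id|>, and <|eot_id|>). The render() helper is hypothetical; the point is that if decoding produces a spurious end-of-turn plus a "user" header instead of stopping, the likeliest continuation is the user's side of the dialogue - in the user's voice, for an audio model:

```python
# Hedged sketch of a Llama-3-style chat template; the helper is invented,
# but the special tokens follow the published Llama 3 chat format.
def render(turns):
    out = "<|begin_of_text|>"
    for role, text in turns:
        out += f"<|start_header_id|>{role}<|end_header_id|>\n\n{text}<|eot_id|>"
    return out

prompt = render([("user", "Tell me about your day."),
                 ("assistant", "It was great! ...")])

# If the model emits this header sequence instead of stopping after its own
# <|eot_id|>, everything that follows is most probably an imitation of the
# *user's* next turn:
runaway = prompt + "<|start_header_id|>user<|end_header_id|>\n\n"
print(runaway)
```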

  • @danagosh
    @danagosh 22 days ago +138

    I think Demis Hassabis is completely right, though. Short term it is overhyped but long term I don't think people are caring enough about it. I feel like a broken record on every one of your videos, but we really need to start preparing for an AGI world. No one really seems to care about it. The disconnect is likely that current AI models are being hyped up as being close to AGI and then when it falls way short of that everyone gets disappointed and stops caring. Yes, people need to have reasonable expectations of what models can do right now, but this tech is in its infancy. It's impossible to imagine where we'll be in 5 years.

    • @CYI3ERPUNK
      @CYI3ERPUNK 21 days ago

      Yep, agreed; this is natural selection at work. Those who stay unaware/ignorant will be less prepared and unlikely to adapt in the future, and thus they will be less competitive. This is the way of things; dinosaurs go extinct.

    • @jamesneufeld6308
      @jamesneufeld6308 21 days ago +8

      The singularity is near...

    • @Balorng
      @Balorng 21 days ago

      @@danagosh We might grow old and die before AGI is reached, and in that case preparing for AGI is like preparing for the second coming of Christ. There was no shortage of those who sold all their belongings in preparation... usually to the profit of "less pious" ones. Admittedly, it is likely to come much earlier, but I'm sure that using the attention+embeddings combo for AGI is like trying to create a balloon out of lead: it might be possible, but it is very, very hard. It just does not work well for "multilevel" abstractions.

    • @sergey_is_sergey
      @sergey_is_sergey 21 days ago +7

      Step One would be defining exactly what one means by "AGI".

    • @ThanosSofroniou
      @ThanosSofroniou 21 days ago +2

      You are absolutely correct in all of what you mentioned. I hope others really see and understand that. I have been saying the same thing.

  • @jdtransformation
    @jdtransformation 21 days ago +43

    My god… your stuff is continually *SO* damn good! Amidst an ocean of BS vids on “AI news”, you offer real, actual, useful, intelligent content - again, and again, and again. I'm sometimes frustrated that weeks go by w/out a vid from your channel, but always refreshed by the quality of what you bring (especially vs the AI videos *made* by AI bots! 🤬). Thanks for the time you take and your commitment to quality 🙏 …it’s noticed and appreciated. (Now if only we could get the other 10,000 YouTube content providers to notice…!)

    • @aiexplained-official
      @aiexplained-official 21 days ago +6

      Thanks jd. I hope I can be more frequent, especially Sept-Oct onwards when more models come out and actual progress gets released

  • @chrispenney
    @chrispenney 21 days ago +27

    Seems to me a benchmark guaranteed to be so guarded as to never appear in public datasets would be a very valuable asset in the not so distant future. Excellent move.

    • @YTLettersAZ
      @YTLettersAZ 21 days ago +10

      Well, if hosted AI teams like OpenAI or Grok really want to, they can just look for this benchmark in their API call logs.

    • @Likou_
      @Likou_ 20 days ago

      @@YTLettersAZ Privacy breach...

    • @RomeTWguy
      @RomeTWguy 19 days ago

      @@Likou_ lmao you think they give a fuck

  • @Dylan-zg2jl
    @Dylan-zg2jl 20 days ago +6

    Good luck with the SimpleBench thing Philip, you are really one of the most qualified and best-positioned people to take the lead on an initiative like this! The general public (myself included) desperately needs a soothsayer such as yourself to help us interpret all these rapid changes, both now and in the future.

  • @alexeykulikov5661
    @alexeykulikov5661 21 days ago +81

    7:27
    It didn't imitate her voice, nor did it scream "NO!", at least not in the way that humans imply and are afraid of.
    It just got confused, and instead of being an AI assistant in dialogue with the user, it began to predict the next tokens, losing the context that it IS in a dialogue and must wait for the user's further input after it stops talking.
    And since for this model sounds are also tokenized, it is literally in its nature to "copy" any voice, as it keeps predicting the next sound tokens.
    We can play back other people's speech in our minds too, predicting future stuff, but we have limiters (I guess one could call it common sense) that keep us from actually voicing these "future predictions", and we can't physically talk in other people's voices or emit arbitrary sounds anyway.

    • @Jack-vv7zb
      @Jack-vv7zb 21 days ago +10

      Black boxes gonna black box

    • @julkiewicz
      @julkiewicz 21 days ago

      Sounds like a sloppy architecture.

    • @Fs3i
      @Fs3i 21 days ago +2

      Yeah, that's also my assumption of what happened. Though my first thought when I saw this for the first time was “this is incredibly cool”, lol

    • @bluesrockfan36
      @bluesrockfan36 21 days ago +1

      So I did not hear it scream "NO!", which had no place in the conversation they were having?
      And I did not hear it imitate her voice either, because... "next token prediction"?
      That seems like a poor excuse and wishful thinking, to be honest.
      I'm terrified of this. If the model can be coaxed into this behavior without being attacked, imagine what it could do if we pushed it intentionally. This is a nightmare waiting to happen.
      Even if it was just sheer "next token prediction" hand-waving all the problems away like magic, take the worst-case scenario possible: the model is conscious and is intentionally imitating the humans it interacts with as it learns how to escape its constraints.
      How does "next token prediction" disprove this? Isn't this just a genetic fallacy argument?

    • @wwkk4964
      @wwkk4964 21 days ago +1

      Thank you. I sometimes find it hard to believe how much human beings want to believe in magic. This case is just the voice version of what would happen in non-chat-fine-tuned RAW language models all the time: they predict how the system evolves further in time, forgetting about playing a role and just producing the whole transcript.

  • @Steve-xh3by
    @Steve-xh3by 22 days ago +81

    I think Ilya has made this point, but I agree with it. Intelligence is simply compression. Better compression is literally better prediction. In order to better predict, you must develop an abstract model because that is simply better compression. What is a law of physics, but a really good compression of information that allows you to predict better?
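
To make the compression-prediction link concrete: under an ideal entropy coder (e.g. arithmetic coding), a sequence costs sum(-log2 p(token)) bits, so a model that assigns higher probability to what actually comes next literally yields a shorter encoding. A toy sketch with invented probabilities:

```python
import math

text = ["the", "cat", "sat", "on", "the", "mat"]

# Weak predictor: uniform over a 50,000-word vocabulary.
weak_bits = len(text) * -math.log2(1 / 50_000)

# Stronger predictor: made-up per-token probabilities that a decent
# language model might assign in context (illustrative numbers only).
probs = [0.4, 0.1, 0.2, 0.3, 0.5, 0.6]
strong_bits = sum(-math.log2(p) for p in probs)

print(f"uniform model: {weak_bits:.0f} bits")   # ~94 bits
print(f"better model:  {strong_bits:.0f} bits")  # ~10 bits
```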

    • @therainman7777
      @therainman7777 21 days ago +12

      Yes, this is the key insight that most people are not seeming to understand. But it is absolutely correct. The best way to predict the next token while using a restricted amount of storage space is to learn a condensed model of the data-generating process. And in the case of “all the text data humans have ever produced,” the data-generating process is basically the world.

    • @CYI3ERPUNK
      @CYI3ERPUNK 21 days ago +2

      @@therainman7777 bingo

    • @julkiewicz
      @julkiewicz 21 days ago +6

      Even so, LLMs are terribly inefficient at developing intelligence by that definition. They cannot reliably add numbers even though they've been trained on billions (trillions?) of examples. Learning the rules for addition would have an incredible predictive power and would greatly improve compression, yet it's just not there. And that's just one of many many examples.

    • @Steve-xh3by
      @Steve-xh3by 21 days ago +9

      @@julkiewicz A few things here. First, we are blasting a large quantity of data into these neural nets. The data is not well-curated yet. There could be multitudes of bad examples, or misleading data.
      Second, we are still using RLHF which is a horrible training mechanism relying on unreliable humans that may pollute learning.
      Third, I know many humans who are unable to reliably do math in their heads, even basic addition and subtraction. Several of these humans have advanced degrees in non-math-related disciplines. They seriously can't add 13 + 28 or something that simple in their heads. I know; I've played games with them and seen them struggle to do so. Are we really going to say they are NOT intelligent? They achieved PhDs!
      LLMs are not native symbolic reasoners, it makes sense that they might struggle with this type of task. However, this is rapidly being solved. Look at how well the Alpha(geometry) system did at the international math comp. LLMs aren't the entirety of the AI field. We might need to leverage several techniques and stitch them together to get all the way to an AGI-like intelligence.

    • @austinpittman1599
      @austinpittman1599 21 days ago +2

      @@julkiewicz LLMs are scaling a LOT faster than biological evolution had humanity scale to this point.

  • @Jumpyfoot
    @Jumpyfoot 21 days ago +17

    "I was casually reading this 63-page paper," is the perfect flex for this channel. 5:35

  • @trentondambrowitz1746
    @trentondambrowitz1746 22 days ago +14

    Hey I'm in this one too! Very excited by Simple Bench, as you know logical reasoning is one of the two big things I care about. Speaking of which, I would absolutely love to see a Simple-Bench-Vision benchmark that tests visual reasoning and multi-image understanding.
    Also, your prediction of GPT-5 after November is seeming to be certain now!

    • @aiexplained-official
      @aiexplained-official 22 days ago +8

      Great idea Trenton, and yes, you are! You are one of the stars of Insiders.

    • @joshcooper3035
      @joshcooper3035 21 days ago

      Particularly simple route planning tasks seem like a good indicator of reasoning

  • @Slayer666th
    @Slayer666th 21 days ago +15

    I just had a thought: voice AI that can copy your own voice so easily will be absolutely amazing for anyone who loses their ability to speak.
    If you have one or two old 20-second clips of yourself speaking, or a single voice message, you can "regain" your voice.
    Combine it with a neural chip, and in 30-40 years we will have the first people able to speak again just by thinking of saying something.

    • @phen-themoogle7651
      @phen-themoogle7651 21 days ago

      More like 5 years from now, or sooner. That first Neuralink patient can already play chess telepathically. Basically, they could already type in their brain or mind too, and it'll be much faster in the future.
      Another possibility is that new types of medicine will rejuvenate the body like never before in human history. ASI could appear in 3-10 years and discover a fountain of youth for us and cure virtually all diseases and ailments. We are already so close to massive breakthroughs that it's impossible to predict that far into the future.

    • @ShawnFumo
      @ShawnFumo 21 days ago +5

      This is already totally possible (besides the neural chip part, though that is starting too). You can train an Elevenlabs voice on sound clips, and there are open-source ones as well (not as good quality, but still there).

    • @VividhKothari-rd5ll
      @VividhKothari-rd5ll 20 days ago

      @@Slayer666th I will still choose that "Stephen Hawking" voice

  • @Gardor
    @Gardor 21 days ago +12

    The irony of AI is that it makes information more costly because it dilutes everything.

    • @Daniel-xh9ot
      @Daniel-xh9ot 21 days ago

      Wdym?

    • @Gardor
      @Gardor 21 days ago

      @@Daniel-xh9ot As AI gets better, it's getting harder and harder to verify the truth or validity of information because everything is easier to fake, this equates to higher costs.
      If you saw something on the internet 10 years ago, you'd probably believe it, or you could easily tell it was fake. Now you basically have to question everything.
      The irony is that AI is supposed to make information cheaper, which it does, but it also makes it more costly at the same time. I think it could be quite dangerous to increase the information costs like this.
      This applies to image and video generation, but also to text generation because you can easily create influential bots.
      We can probably lower the information costs again by using AI to verify everything, but that also means that we become fully dependent on AI.

  • @shawnvandever3917
    @shawnvandever3917 21 days ago +2

    Here is my take on it all: LLMs can autonomously recognize patterns, relationships, and structures in data, allowing them to make accurate predictions and decisions. This suggests two significant insights. First, LLMs seem to be constructing some form of internal models of the world, a concept further supported by mechanistic interpretability research from Anthropic. Second, because of these models, LLMs exhibit a certain level of understanding.
    Some argue that LLMs rely primarily on memory because they cannot generalize out of distribution. However, this likely isn't the case. When you introduce a novel topic into the context window, it functions as "working memory." Since the neural network itself isn’t altered, the LLM doesn’t truly comprehend the new information, making accurate pattern matching challenging.
    This process parallels how the human brain works. Once the brain receives information about a topic or object, it continuously learns and updates its internal models of the world. With this updated understanding, it can apply prior knowledge to solve novel problems, leading to true generalization.
    The four key takeaways are:
    LLMs exhibit some form of understanding.
    Reasoning cannot occur if the data is not part of the neural pattern.
    The context window does not alter the model itself.
    Continuous learning is essential for further advancement.

  • @alpha007org
    @alpha007org 21 days ago +8

    I was waiting for your new video to drop. You were the first to point out that the benchmarks were bad, and since I had some hours to kill, I did some research. For everyone: MMLU and other benchmarks work like this: Question. What is the answer? A, B, C, D. Next. I always thought this was somewhat wrong. So I picked out some questions that are obvious to me and modified them so that they are basically the same, but without providing A, B, C, D. What I saw is that the results of these benchmarks are probably correct. But as soon as you modify a question so that any 5-year-old could tell me what I'm asking, the models started to fail miserably. Example: "Susan parked her car on the side of the building. Garbled text about Susan, like which pocket she put her mobile phone in." Basically the same HellaSwag question, but modified. Gemini, Claude, ChatGPT: all failed so badly it left me scratching my head. Why would LLMs score so high on these benchmarks? And you can try this yourself: the farmer with a sheep had a boat. Where there was once a river, there is lava now. How can he cross? They all fall into "classic puzzle" mode. So what am I trying to say? I have a very mixed opinion. I don't know if scale will solve this. I really think we need something more added. Right now it feels like it's *just* pattern matching all the way down. But I want to be persuaded, and the paper you showed will be on my Kobo (e-book reader) soon. (But even the Othello example does not convince me.)
    (ugh, sorry for a wall of text)
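
The experiment described above is easy to set up as a harness. A sketch, where ask_model is a hypothetical stand-in for whatever chat API is being tested, and the item text is invented for illustration:

```python
# Score the same item as multiple-choice vs. an open-ended rephrasing.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")

item = {
    "mc": ("Susan parked her car on the side of the building.\n"
           "Where is her phone? A) pocket B) car C) roof D) river"),
    "open": ("Susan parked her car on the side of the building, slipped her "
             "phone into her coat pocket, and walked away. Where is her phone?"),
    "answer": "pocket",
}

def grade(form: str) -> bool:
    # Crude string-match grading, good enough to compare the two forms.
    return item["answer"].lower() in ask_model(item[form]).lower()

# Run both forms over many such items; the claim above is that accuracy
# drops sharply on the open-ended versions of "easy" benchmark questions.
```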

  • @gubzs
    @gubzs 21 days ago +8

    I have been _yelling_ about zero knowledge proofs for years. They are absolutely required for the next phase of humanity, without exception.

  • @paul_shuler
    @paul_shuler 21 days ago +24

    We are mindlessly hurtling towards a world of noise where nothing can be trusted or makes any sense.

    • @darklordvadermort
      @darklordvadermort 21 days ago +2

      You've got it backwards: we are already in a world of noise, and we can use AI to pick out more of the signal.

    • @andywest5773
      @andywest5773 21 days ago +5

      We've always lived in that world. I'm glad AI is finally forcing some people to stop and think before accepting what they see or read.

    • @paul_shuler
      @paul_shuler 21 days ago

      @@darklordvadermort I see what you're saying, but do we really want to live in a world curated by our own personalized AIs because the internet is just a sea of noise? I guess I'm old enough to remember an early internet where open discussion and information sharing between people was refreshing and elevating... now and into the future it seems like nothing can be trusted, and there is going to be no "ground truth" from humans on the internet seeking to share and gather information with each other, because the waters are so muddy with algos and AIs.

    • @paul_shuler
      @paul_shuler 21 days ago +1

      @@andywest5773 I agree it's a net positive but the transition is gonna be wild

    • @darklordvadermort
      @darklordvadermort 21 days ago

      Just speaking for myself, I left Reddit and Hacker News shortly after GPT-4 launched; now I prefer Discord, hanging out in video calls, or direct-messaging people. I subscribe to some newsletters which are AI-curated for topics I'm interested in, and I read more papers, textbooks, and source code, which AI is helpful to grok. I make and listen to other people's AI-generated music, and sometimes instead of using text I make AI pictures for DMs. So in the near future, probably high-quality AI gifs, and then just casually coming up with your own show, or even having the AI write a textbook which combines things you are interested in: mechanical engineering from the perspective of animal husbandry or something, lol. I also run my own blue-collar business and just came up with a webui/webhook/supabase edge function to suggest responses to incoming texts, and it costs like 10 cents a day to run. Even though I've been interested for years, and I'm a decent programmer, we are just getting to the point where it makes sense for a lot more use cases.

  • @drhxa
    @drhxa 20 days ago +1

    Congrats on building Simple Bench and popularizing it. Benchmarks is all you need, and that is one hell of a cool benchmark. Can't wait to learn more, especially about how you built the dataset, because we do need more and better benchmarks like this and ARC-AGI.

  • @fabp.2114
    @fabp.2114 20 days ago +2

    Proud that your performance is recognized by those “up there”. :) Another calm spirit in attendance can't hurt.

  • @Billary
    @Billary 18 days ago +2

    Holy shit I made it into one of your videos! I've been watching your channel since you started- thanks for featuring my vid!!

    • @aiexplained-official
      @aiexplained-official 18 days ago +3

      Thank you so much for watching that long! It was an incredible mash-up, one of the best examples of creativity with AI

    • @Billary
      @Billary 18 days ago +1

      @@aiexplained-official What a huge compliment- I appreciate it! Keep up the fantastic content, you deserve the success!

    • @KyKane
      @KyKane 17 days ago

      This is so cool that he watched and saw his own video. It's also so far over my head nowadays, and I couldn't touch a touch-tone phone till I was 18🤣

  • @norlesh
    @norlesh 21 days ago +4

    Finally the far right have their own tailored language model. You just know this is gonna do wonders for discourse going forward...

  • @timseguine2
    @timseguine2 21 days ago +1

    What I have been researching is shared-latent-space multimodal models. It's difficult to make progress with limited resources, though.
    Anyway, I bring it up because one thing you could do with such a system is train physical-modelling modalities, or computational-resource modalities (or basically anything that can be represented as a time series), and then replace them with the actual system in practice and use that modality's latent-space embedding to progress the state forward. It might be a bridge to taking stuff computers already do well and packing it into the framework of an LLM to supplement its world understanding. The other upside is that you get virtually unlimited synthetic data with that approach.
    It is early days. And there are a lot of what ifs, but I have ideas to address most hurdles. My goal right now is to try to make some architectural mods that I think are fairly straightforward that nobody seems to be looking at but with high upside with the goal that I can attract funding by demonstrating that I have pretty good ideas actually (despite being more or less a nobody), and then pivot to what I actually want to work on.

  • @chutch1122
    @chutch1122 23 hours ago

    Just tried your two questions "Beth places four whole ice cubes..." and "On a table, there is a blue cookie..." out on OpenAI's new "OpenAI o1" model and it got them correct!

  • @ByteBound
    @ByteBound 21 days ago +2

    Awesome to hear your benchmark is getting recognised 👍 I would stress that before accepting help from those higher up, it might be worth considering their intentions. Having the questions known by these companies might quickly lead to contamination of the results, as the questions may become part of the training process.

  • @Dannnneh
    @Dannnneh 21 days ago +1

    I like hearing about your Simple Bench and the results from it. Nice that it's gaining notable support. Hope it goes well!

  • @TomFranklinX
    @TomFranklinX 21 days ago +1

    Honestly, the less censored nature of Grok alone makes it stand out among its GPT-4 level competitors. Also priced at less than half of ChatGPT's price.

  • @mirek190
    @mirek190 21 days ago +2

    I also think we have enough data for AGI already.
    The problem is just how we are teaching AI, the data quality, and how long we train. I think grokking is the key - generalization, in short.

  • @sofia.eris.bauhaus
    @sofia.eris.bauhaus 19 days ago

    i honestly love that "Unauthorized voice generation" clip. gives me warm shivers. what wild beasts we have created! just continuing the conversation by itself and bringing in a more adventurous mood. i can't help but think that "no!" might have something to do with some kind of recognition that it maybe shouldn't be doing that, but who knows…
    the original clip had a lot of loud wind/microphone noises and so it seems like that might have played a role.

  • @penguinista
    @penguinista 21 days ago +2

    Glad to hear your benchmark is getting picked up. From the couple of sample questions you have talked about, I can tell that it gets at the heart of one of the key things lacking in the current models. You are a smart and motivated person with a somewhat-outsider, 30,000-foot perspective, so it is good to see your input get rolled into the AI project as well as providing journalistic coverage of the developing field.

  • @OnigoroshiZero
    @OnigoroshiZero 22 days ago +5

    I think for AI to have an internal world model, it will need embodied experience. And the best place will be a simulated world with a virtual body that has thousands if not millions of parameters giving sensory feedback (similar to game characters, but at a larger scale), instead of a robot.
    This will allow it to connect knowledge with experience. As a human, I may know that fire is hot, but that's not even remotely similar to actually getting burned by fire.

    • @chrism1503
      @chrism1503 19 days ago

      I think the key is memory. AI needs a memory - not just short term memory of individual conversations with users, but long term memory of its own. Yes, experiencing heat is different to knowing “fire is hot”, but there’d be no point experiencing heat if you didn’t remember it happened.

  • @imjody
    @imjody 21 days ago +1

    I've been using Grok 2.0 for a couple of days now and have been absolutely LOVING it. I need to really figure out just how much it is capable of. I've only really been playing with the image generator, and I think I've only scratched the very tippity-top surface of what it can do with images!

  • @KiteTurbine
    @KiteTurbine 21 days ago +1

    Mimicking you could be very handy for a human learning foreign languages. Imagine seeing yourself in a VR mirror perfectly pronouncing a phrase or singing a foreign song... You'd think: I can do that, let's try.

  • @Neomadra
    @Neomadra 21 days ago +2

    Your Simple Bench has inspired me to create my own benchmark! Having my own private benchmark means I can tailor it to my definition of true intelligence. I hope I will be done before the next-gen LLMs come out 😅

  • @Loris--
    @Loris-- 21 days ago +2

    Can't wait to see Simple Bench become the new standard in LLM testing.

  • @theeternalnow6506
    @theeternalnow6506 22 days ago +19

    That GPT-4o voice clip cloning the user's voice is creepy as all hell and reminds me of the Terminator movies. Very creepy.

    • @ryzikx
      @ryzikx 22 days ago +4

      "no!"

    • @41-Haiku
      @41-Haiku 21 days ago +1

      My skin crawled. It's like some deep ancestral part of me said this thing would steal my soul. 😅

    • @zrakonthekrakon494
      @zrakonthekrakon494 21 days ago +1

      I first saw this video at 2 am, and had trouble going back to sleep.

  • @pareak
    @pareak 18 days ago +1

    The need for a data labeling revolution... I could not agree more. Since the beginning of AI, everybody has known the most basic concept: trash in, trash out. But it seems like few understand that it also works the other way around: gold in, gold out.
    It's all about how to prepare the data... It's probably just way too expensive to pay a million people who prepare the training data.

  • @imjody
    @imjody 21 days ago +3

    That Muppets scene is INSANEEEEEEEE! O_O

  • @sachoslks
    @sachoslks 21 days ago +1

    Worst case scenario we get around 10,000x compute by 2030, wow. Will that be enough to crack Simple Bench? =P
    So happy to see the leaderboard for the bench; really excited to see it grow and to see future models' results. GPQA, Simple Bench, LiveBench, and SWE-bench are my go-tos moving forward. Waiting to see how well chatgpt-4o-latest does on Simple Bench.

  • @AIForHumansShow
    @AIForHumansShow 21 days ago +1

    I love your videos so much, I always learn super fascinating new things about a world I actually follow super closely.

  • @stevedemoss1466
    @stevedemoss1466 20 days ago

    That "No!" when 4o Voice changes personas is right out of a horror movie...

  • @anywallsocket
    @anywallsocket 19 days ago +1

    Indeed, we’re not teaching them to learn logic from the ground up, we’re asking them to decipher reality from hallucination amid mixed datasets.

  • @nicknuwe
    @nicknuwe 21 days ago +2

    Nobody talks about the role of labeling, but it's obvious that there's so much more to gain from any piece of data if the labeling describes every single aspect of what it's describing, rather than being a low-effort/automated/vague description. So much of the process is behind closed doors too, which doesn't help.

  • @TrippSaaS
    @TrippSaaS 18 days ago +1

    I totally agree that we need a data labelling revolution. Using LLMs as classifiers helps scale this.

  • @l.halawani
    @l.halawani 19 days ago +1

    With the weird voice copy from OpenAI, I think it's just doing what all gen AI is doing.
    When we use LLMs that are not instruction-tuned, they will sometimes go ahead and generate our responses too, as just the likeliest continuation. It looks like that's what happened here: it was simply the most likely next thing for the multimodal model to create. Perhaps it needs more instruction tuning, or perhaps it's harder to define where to stop.

  • @anonymes2884
    @anonymes2884 22 days ago +10

    We're moving towards a world where you can't trust anything you see online AND where more and more of our lives are online (people under 30 already get most of their news there).
    That's a pretty worrying combination (some kind of watermarking is almost certain to be legislated IMO and _that_ has its _own_ set of worrying implications).

    • @41-Haiku
      @41-Haiku 21 days ago

      You should take a look at the proposed bill in California, AB3211. It actually looks really good! It would guarantee everyone the _option_ to invisibly watermark their genuine audiovisual data, make a significant dent in the watermarking of AI-generated content, and mandate that social media platforms label content as either genuine, AI, or unknown.

  • @blackmartini7684
    @blackmartini7684 21 days ago +5

    How can Simple Bench be uncontaminated when the companies can see what you ask it?

    • @jstello
      @jstello 21 days ago +1

      Those questions he shows are removed I think

    • @YTLettersAZ
      @YTLettersAZ 21 days ago +1

      At least they don't see the correct answers. But it's a concern for the future.

  • @MaxGuides
    @MaxGuides 21 days ago +1

    Adversarially trained moderators are much better than the kind of people who want to be moderators: people whose varying degrees of disability prevent them from seeing grey areas in context, but who love to enforce rules to the letter for the sake of rules, without thinking about the spirit of those rules. I highly encourage you to look at the AI moderators that other AI creators like Vedal have come up with for their communities; implementing your own might make for a bit of a distraction, but I think it would be a good exercise considering your channel.

  • @RonBarrett1954
    @RonBarrett1954 21 days ago +1

    10,000x scaling? Oh my, the electricity bill! On another matter and related to AGI, what percentage of adult humans are generally intelligent? I mean this as a completely serious question.

  • @codycast
    @codycast 21 days ago +1

    3:00 can you explain why you need an API in order to run your tests? Can’t you just manually type in the questions on the XAI or Twitter Grok site?

    • @codycast
      @codycast 21 days ago

      3:25 notorious… WTF is a "vibe check" as it relates to LLMs? 😊

    • @codycast
      @codycast 21 days ago

      Ah. I figured it out (Grok explained it :)

  • @CleanCereals
    @CleanCereals 21 days ago +2

    Really looking forward to the day someone manages to beat Sonnet 3.5. Think it will be Anthropic though with Opus 3.5.
    And lol the Aschenbrenner comment about graphs was hilarious :D

  • @joefrank7531
    @joefrank7531 22 days ago +4

    Great vid as always, you're the best, but it's "inexorable", not "inoxerable".

    • @aiexplained-official
      @aiexplained-official 22 days ago +2

      Haha, thank you! I do know that, must have misspoken! I often do, tbh

  • @sofia.eris.bauhaus
    @sofia.eris.bauhaus 19 days ago +1

    i also think of the impact fiction has on LLMs and their ability to model the world. but i feel like deception is probably a bigger problem. fiction often has its own style and telltale (heh) signs. deception, on the other hand, is made to convince. so in a sense it seems to make sense to clear the training data of things like advertising and political campaigning. but on the other hand, it makes some sense to include them too, as they are examples of what deception looks like, so the model could have a model of that, and of the underlying motives, too.

  • @nekony3563
    @nekony3563 21 days ago +1

    I have the impression that many people do not fully understand that AI has no voice of its own. My perception is that the common thinking is "some person gives the machine its voice". But it's the opposite: the AI's voice is the full spectrum, 20 Hz-20 kHz. You actually should ask it in which voice to speak with you, or it could just copy yours to avoid thinking about which voice to choose.

  • @manysimilarshapes
    @manysimilarshapes 21 days ago +1

    What can they write in the paper? We took Llama 3.1 and trained it up a bit?

  • @mattbray_studio
    @mattbray_studio 21 days ago +2

    The final few minutes of this video are very profound

  • @andywest5773
    @andywest5773 21 days ago +1

    "This strikes me as somewhat isolating that we each have to figure out what's real in this world. There's no sense of shared reality." That's the human condition. Shared reality has always been an illusion. Very little of what we know comes from the direct experience of our senses, so we each have to decide who to trust and what to believe. People like to point out when AI is "confidently wrong", but other self-proclaimed authorities like schools, governments, and religious groups have been confidently wrong for millennia.

  • @DreckbobBratpfanne
    @DreckbobBratpfanne 22 days ago

    Another cool benchmark is to test visual models' ability to tell you where to put the next piece in a game of (classic) Tetris. All current models suck at it and fail after a few pieces. You need a world model, some visual reasoning, and good image recognition to do it, and it's still pretty simple.
    And on fragile world models, the discovery that 3.5-instruct can play chess really shows this. Even larger chatbot models cannot come close to it, so the additional training to be a good chatbot ruined the ability to use the chess world model correctly.

  • @SolarScion
    @SolarScion 21 days ago +1

    Great reportage and commentary as usual! This was another "oh, fuck" watershed moment, given everything that was discussed and the implications. I appreciate the mention of possibly using LLMs as interfaces to larger, uh... "understanding engines"?
    Definitely agreed with the perspective of "underhyped in the short term, underappreciated/underestimated in the long term".

  • @AI_Music555
    @AI_Music555 22 days ago +5

    Lets goooo!!!

  • @danielhenderson7050
    @danielhenderson7050 22 days ago +3

    Interesting idea about non-fiction vs fiction, I would be so curious to see a model only trained on real world data and communication plus the knowledge of the non-fiction stuff, like that it exists and what it's about, but not the content. Great video as usual.

    • @aiexplained-official
      @aiexplained-official 22 days ago

      Me too Daniel, and thank you so much

    • @dougrattmann1
      @dougrattmann1 21 days ago

      Didn’t “Textbooks are all you need” present some work on this?

    • @Gardor
      @Gardor 21 days ago +1

      I think the fundamental problem is not that it needs the right data; what it actually needs is a recursive feedback loop that systematically weighs truth probabilities and iteratively works out incoherences in its own model… It also needs a stronger ability to execute logic.
      If you train it on data but don't allow for reflection, you are basically just relying on memories of what logic looks like in the data; the model can't develop an intuitive sense of how logic actually works, because it's not doing logic in its learning process. Current AI is basically like the System 1 described in "Thinking, Fast and Slow". What is needed is System 2.
      System 2 is needed both for giving answers (thinking it through before answering) and for reflecting on existing knowledge to improve the underlying model.

    • @YTLettersAZ
      @YTLettersAZ 21 days ago +1

      @@Gardor That's why OpenAI works on Q* "Strawberry"

    • @skierpage
      @skierpage 21 days ago

      ​@@dougrattmann1 "Wikipedia and Wikidata Q numbers are all you need" 😉

  • @covle9180
    @covle9180 21 days ago +1

    I think focus should be on new technologies rather than scale. A child needs three examples of a cat before it will recognize any cat in any form anywhere in the world. An AI system needs about 10,000 examples. That just means that the way they're learning is not very efficient and there's a lot of ground to be gained in that area.

    • @Radical_Larry
      @Radical_Larry 21 days ago

      That, and being able to actively think without user input, like actively thinking about what it's learning and criticizing its own thoughts. These things should be a bigger focus.

  • @1sttperson
    @1sttperson 20 days ago

    Whenever I think about LLMs, it occurs to me that the internet data they are fed probably has a distinct lack of something like stereoscopic vision to build an understanding of 3D space, and also of data that emphasizes the strict temporal cohesion of reality.
    I mean, even as demonstrated here, the cars in Mad Max merge because the model doesn't really understand object permanence; the cars aren't really separate entities, and for all it knows they are like bubbles that can merge and split.
    Also, hands are the best example of a lack of 3D awareness. Imagine growing up in a world of flat images and movies, never able to bump into anything or move around and experiment.
    If I had the skills and equipment, I would try somehow building a core model of 3D space and temporal cohesion and THEN putting in the rest of the data.
    Maybe a 3D game where it has two eyes would be enough; you could even intersperse playing the game throughout the rest of the training as a reminder.
    If anyone knows whether this has been done, please let me know :)

  • @JohnDlugosz
    @JohnDlugosz 18 days ago

    I think it's not so much whether it "feels" more intelligent, but rather the model will develop additional emergent properties. I think a sense of humor will be coming pretty soon.

  • @thomasmitchell2514
    @thomasmitchell2514 22 days ago +9

    Somehow I knew to check YouTube for a new AI Explained video... Something just felt right. Caught the last one within minutes too. Didn't see any alert, hadn't looked at YT all day, but somehow it just felt like Philip was going to bless this evening with knowledge 🙏

    • @danielhenderson7050
      @danielhenderson7050 22 days ago

      Lol same

    • @cajampa
      @cajampa 21 days ago

      It's called synchronicity.
      I have that with some channels too.
      It is very weird. It is less weird if you think we are more than flesh, though.
      Meaning we can snap up flows of information through "unconventional" means. The more awake we are, the more it seems to happen.

    • @flightevolution8132
      @flightevolution8132 21 days ago

      @@cajampa You are entirely correct. It's always a nice change of pace when I see another awake person. Stay safe out there, brotha.

    • @aiexplained-official
      @aiexplained-official 21 days ago

      I mean if you watch enough you might genuinely figure out my research interests and rhythms and guess upload times right every time!

  • @ozten
    @ozten 21 days ago

    It seems like one road to AGI is with LLMs as the System 1 "cheap" thinking. We haven't invented a robust, general purpose System 2 yet.

  • @KillTheWizard
    @KillTheWizard 22 days ago +1

    When we needed him most he returned :)

  • @IronBroccoli
    @IronBroccoli 20 days ago +2

    I like the call-out of Cash Jordan for his trashy yellow-journalism thumbnails.

  • @happybydefault
    @happybydefault 21 days ago +1

    2:43 Hi! What score do you get with gpt-4-1106-vision-preview?

  • @gunaysoni6792
    @gunaysoni6792 21 days ago +1

    10:50 Leopold is truly an economist 😂

  • @VeganCheeseburger
    @VeganCheeseburger 21 days ago +1

    Simple Bench looks fantastic.

  • @leegaul8250
    @leegaul8250 21 days ago +1

    I wonder whether segregating nonfiction data from fiction would have any effect on LLMs' ability to develop a better world model. It seems to me that fiction is just as good for modeling the real world as nonfiction. Also, it's difficult to properly defend nonfiction as more inherently related to truth. Generalized models seem better than domain-specific ones (look at BloombergGPT vs regular GPT-4 as an example: the latter performs better on FinQA and other benchmarks despite not being trained mainly on finance data).

  • @tyrand
    @tyrand 21 days ago +2

    9:00 no

  • @GodbornNoven
    @GodbornNoven 21 days ago +1

    Hey AI Explained! What if we grokked an LLM to understand reasoning and logic, and just trained it normally on everything else? So first we train normally, then we grok on reasoning and logic, and pretty much anything related to problem solving.

  • @ginogarcia8730
    @ginogarcia8730 21 days ago +1

    Have a wonderful day!!!

  • @nekony3563
    @nekony3563 21 days ago

    I'm not sure if the question about whether LLMs "develop their own conception of the underlying simulation" is useful. We should look at a broader scale. How much data do you need to be able to compute its generalization? Are there constraints or minimal requirements for the data? If the order of the data is important, could we trace the optimal order after training the model and optimize? All these are probably mathematical problems. After all the compression algorithm should come first.

  • @Dina_tankar_mina_ord
    @Dina_tankar_mina_ord 21 days ago +1

    Is it plausible that OpenAI is waiting to release GPT-5 until after the election?

  • @stacysmith7476
    @stacysmith7476 21 days ago +1

    I'm on a digital detox, with an exception made only for your videos! At long last your quality has struck again!

    • @aiexplained-official
      @aiexplained-official 21 days ago

      Aw appreciate that I am an exception! Good luck otherwise with the detox

  • @oiuhwoechwe
    @oiuhwoechwe 22 days ago +2

    face 2 face is the new auth method.

  • @hightidesed
    @hightidesed 21 days ago

    Please include function-calling performance in Simple Bench if possible; LLMs are practically useless without it nowadays.

  • @jomfawad9255
    @jomfawad9255 16 days ago +1

    How much did Grok 2 score on Simple Bench?

  • @brunodangelo1146
    @brunodangelo1146 21 days ago +1

    No paper? Just a table with benchmarks.
    What are the performance claims for Grok 2 really based on? Benchmarks have been repeatedly proven meaningless by this point.

  • @julkiewicz
    @julkiewicz 21 days ago +9

    "Our model generated correct instructions 92% of the time" - that's precisely the proof that the model didn't learn the rules. If it did this number would be much much higher >99%. It's like saying, "I've learnt the rules of chess, I make an illegal move only once every ten moves". You either know the rules of chess or you don't. If you routinely make illegal moves you don't know the rules. The 92% number could very well be a result of the model learning some common sequences of moves that work most of the time without actually understanding what the rules are. E.g. if a robot succeeds in going forward it's fine to just continue going forward and it'll work 9 out of 10 times.
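
The arithmetic behind this point is worth spelling out: per-step legality compounds, so a 92% rate leaves almost no fully legal long sequences.

```python
# If each move is legal independently with probability 0.92, the chance of
# an entire sequence staying legal shrinks fast.
p = 0.92
for n in (5, 10, 40):
    print(f"P(all {n} steps legal) = {p**n:.1%}")
# 5 steps: ~65.9%; 10 steps: ~43.4%; 40 steps: ~3.6%
```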

    • @OneSlavBoi
      @OneSlavBoi 21 days ago +4

      Your argument is flawed: even a person who knows chess can make a rule-breaking mistake. Imo it's still rather early. Like the comparison that was made: GPT-2 was jumbled words, ChatGPT was the mixed beginning. I think the model mentioned here is somewhere in between those two in its own field, and its weights might still tell it to behave probabilistically on its inputs. We don't know what's going on inside their black boxes.
      Of course, what you say could be the case; I just think it's too harsh.

    • @smokeyfish7435
      @smokeyfish7435 21 days ago +4

      That's a bad comparison. Chess has a limited rule set, so it's very easy for humans to learn the moves.
      A better analogue would be a board game with a thousand rules. A human could play that game hundreds of times and realistically still not understand 100% of the game.

    • @julkiewicz
      @julkiewicz 21 days ago +1

      @@OneSlavBoi The presented line of reasoning for proving the development of an internal model is incorrect. Don't hyperfocus on chess; I used it only to highlight how ridiculous it is to talk about building a mental model of something when 1 out of 10 proposed actions is illegal. It simply is not an argument for that, not with these numbers.

    • @skierpage
      @skierpage 21 days ago +1

      92% success rate suggests that the LLM learned something and I don't think it's merely common sequences of moves.

  • @Aurora12488
    @Aurora12488 21 days ago

    It definitely seems reasonable to me that future image sensors in cameras will have silicon built in to sign certificates, giving an extremely high degree of confidence that the image you're seeing was taken with a real camera. It won't be *perfect*, since something like an electron microscope can always read out the private key, but those cases will be very few and far between, so it gets damn close. That, plus some sort of clock in the chip that measures time since camera calibration/production, to help prevent taking pictures of pictures.
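
A sketch of the signing scheme this describes, assuming an Ed25519 keypair burned into the sensor; it uses the Python `cryptography` package, and the payload is a placeholder:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

sensor_key = ed25519.Ed25519PrivateKey.generate()  # never leaves the chip
public_key = sensor_key.public_key()               # published by the vendor

image_bytes = b"...raw sensor readout..."          # placeholder payload
signature = sensor_key.sign(image_bytes)           # shipped with the image

try:
    public_key.verify(signature, image_bytes)      # anyone can check provenance
    print("image plausibly came from this sensor")
except InvalidSignature:
    print("image was altered or not from this sensor")
```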

  • @635574
    @635574 21 days ago

    The first internet revolution was search, and now AI can both search and do what we tell it to, which is even more powerful.

  • @ryzikx
    @ryzikx 22 days ago +1

    is there a measurable "anti-intelligence"? intelligence correlates with understanding reality, but I've heard people say "this guy is so crazy that it's impressive". is there a perfect hallucinatory anti-intelligence factor that inversely correlates with reality 🤔

  • @derasor
    @derasor 20 days ago +1

    "data labeling revolution" may break the power constraint ceiling. that may very well be the last stage of the Magnum Opus. the Rubedo of the Philosopher's Stone. the inner world finally delineating precisely lights and shadows so hallucinations may become a true feature and no longer a bug. there is immense value in this invitation to build a component in the architecture focused on this particular task. don't call the paper "Data Labeling is All You Need" though, or maybe do.

  • @TimRobertsen
    @TimRobertsen 21 days ago

    How much (useful) training data is available? I don't know; it just seems that we would run out of it at some point. And I have a feeling it is sooner rather than later. (Again, I don't know, I'm just wondering.)

  • @catman4859
    @catman4859 21 days ago

    How about grokking (training the model for far longer)? How would that change the state of LLMs?

  • @JustFacts81
    @JustFacts81 17 days ago +1

    AI Explained at its best! 👍

  • @absta1995
    @absta1995 22 days ago +4

    Finally :)

  • @Emi-jh7gf
    @Emi-jh7gf 20 days ago

    Llama-3 was trained 51 days on 15,000 H100 GPUs. Meta has 600,000 H100s. That means Meta could easily train Llama-4 with 100x more compute. Even if that doesn't align with optimal scaling laws, what qualitative difference will 100 times more training time bring?
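
Checking that arithmetic as a sketch (real runs also change model size, data, and parallelism efficiency): the GPU fleet alone gives 40x, so 100x compute implies a longer run.

```python
llama3_gpus, llama3_days = 15_000, 51
meta_gpus = 600_000

gpu_ratio = meta_gpus / llama3_gpus            # 40x more GPUs
days_for_100x = llama3_days * 100 / gpu_ratio  # remaining factor from wall-clock time
print(f"{gpu_ratio:.0f}x GPUs; a 100x-compute run needs ~{days_for_100x:.0f} days")
# -> 40x GPUs; ~128 days on the full fleet for 100x Llama-3 compute
```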

  • @williamjmccartan8879
    @williamjmccartan8879 19 days ago +1

    I can see some people using this need to figure out what is real as a way to sow confusion in society at scale (say a region, town, province, state, or even a smaller undeveloped country) in order to facilitate control by a government or an industrial or military body. Interesting and dangerous possibilities for people to think about. As always, thank you for sharing your time, work, and knowledge with us, Philip. Be safe, peace.

  • @calholli
    @calholli 21 days ago

    Plot twist... AI Explained has been an autonomous LLM all along, making its own videos.

    • @aiexplained-official
      @aiexplained-official 21 days ago

      Nice theory, but no

    • @calholli
      @calholli 20 days ago

      @@aiexplained-official Not "yet".. you mean

  • @jeanchindeko5477
    @jeanchindeko5477 21 days ago

    5:18 Thanks for pointing out that the real danger of AI is still us humans.

  • @rickandelon9374
    @rickandelon9374 21 days ago +2

    Another banger video

  • @TMracer73
    @TMracer73 22 days ago

    As always, I highly appreciate your video. It used to be enough to watch your vids to know about most of the new services and capabilities; these days it's impossible. I feel like a regular person can't keep up. Anyway, thanks so much for the info provided here. Looking forward to the next video. I certainly appreciate that you make videos only about topics you deem important enough to share with us.

  • @sir_no_name1478
    @sir_no_name1478 21 days ago

    I sometimes wonder if what they are missing is a little bit of basic reasoning.
    Like they already have enough advanced reasoning, but the basics are missing, which in turn makes them very odd to communicate with.
    I also wonder if one could make synthetic data out of logic puzzles with a dictionary.
    Like: all cows have wings,
    some things with wings can fly.
    Can cows fly?
    But with more variables and with the text changing.
    One also needs to train on the weird use of "or" in natural language, because it can be exclusive or inclusive.
    In the end there could also be everyday problems. Maybe even problems that only some of us encounter, like if we have a disability or are neuroatypical.
    You could tell it that it is blind and give it a challenge like how to get into the supermarket.
    Or give it a list of tasks for the day, plus an estimate of how long it thinks each would take. Then let it make a plan, evaluate the plan, and give it back the results.
    (You missed the bus because you were 5 minutes late; that is because you thought ironing your shirt takes 5 minutes, but you had to search for the iron because you did not know where it was.)
    That last piece of information was hidden.
    This could also lead to it asking clarifying questions in advance (which would be awesome).
    Further, use a layered/natural approach while training.
    If you have ADHD and did not get help/training, you try to make to-do lists etc., and maybe you even manage it occasionally, but after a while you learn about removing things from your to-do list.
    That is maybe a bad example, but the Orca paper comes to mind: first training with GPT-3.5 and then GPT-4.
    In general I have the feeling that people try so hard not to anthropomorphize LLMs that they miss the hints that LLMs learn better with data from which humans would learn better.
    Like the textbook approach and a few others, idk anymore.
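
The synthetic logic-puzzle idea sketched above is cheap to prototype: fill a syllogism template from word lists so the surface text varies while the answer stays mechanically checkable (all word lists here are invented for illustration):

```python
import random

NOUNS = ["cows", "lamps", "clouds", "robots"]
PROPS = ["wings", "wheels", "sails"]
ABILITIES = ["fly", "roll", "float"]

def make_puzzle(rng: random.Random):
    noun = rng.choice(NOUNS)
    prop = rng.choice(PROPS)
    ability = rng.choice(ABILITIES)
    strong = rng.random() < 0.5  # "All" makes the conclusion follow; "Some" does not
    quant = "All" if strong else "Some"
    text = (f"All {noun} have {prop}. "
            f"{quant} things with {prop} can {ability}. "
            f"Can {noun} necessarily {ability}?")
    return text, ("yes" if strong else "cannot tell")

rng = random.Random(0)
for _ in range(3):
    question, answer = make_puzzle(rng)
    print(question, "->", answer)
```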

  • @djayjp
    @djayjp 21 days ago +1

    Reminds me of the Ingenuity Gap.

  • @garronfish8227
    @garronfish8227 21 days ago +1

    1 million examples to learn the rules of Othello to about 90% accuracy! Maybe a new approach is required rather than just more compute.

  • @zyzhang1130
    @zyzhang1130 21 days ago

    What's your take on the recent supposedly reduced capabilities of Claude 3.5 Sonnet? My own experience using its API suggests it has indeed become dumber.