why AI can't pass this test

  • Published on 26 Nov 2024

Comments • 2.3K

  • @TheMarkArmy 1 year ago +4523

    currently waiting for "Are you smarter than an AI?" to become a major hit TV show

    • @deepdays9068 1 year ago +104

      “Delete system 32”

    • @NaudVanDalen 1 year ago +92

      Everyone already lost to Watson on Jeopardy.

    • @LombaxPieboy16 1 year ago +8

      I need the answer to this now

    • @bottomfeeder12 1 year ago +22

      create one and make 💰, why wait for someone else to do it

    • @DialecticRed 1 year ago

      @deepdays9068 sudo rm -rf /

  • @LeonardChurch33 1 year ago +1010

    I'm curious about the etymology of "writing a test" vs "taking a test." I know they refer to the same action but every time I hear "writing a test" I think Sabrina is actually creating the test herself.

    • @SgtSupaman 1 year ago +160

      That was weird to me too. She wasn't writing the test, she was writing her answers (aka taking the test). The test was already written by someone else.

    • @saturnhex9855 1 year ago +150

      I think it's just a regional difference. I remember when I was in Canada, people said "writing" where people in the US say "taking". It also might have to do with taking tests that are long-form answer based, where you have to write paragraphs. But yeah, I wonder if someone's looked into the history of that usage difference.

    • @C4Oc. 1 year ago +54

      As far as I know, in German you say that you're "writing a test" (Ich schreibe eine Klausur; there are actually many types of tests in German, so not 100% accurate), or you can say you're "having it" (Ich habe eine Arbeit morgen), but I've never heard anybody say they "take" a test to mean they're actually taking it as in the English meaning (German's translation of "take" is "nehmen", which in this case would mean that you're taking a copy of the test)
      In Romanian, if you "take a test", that means you pass it (Ai luat examenul? Da, cu notă bună). You can actually "give an exam", which translates to taking it in English (Când dai examenul?)

    • @PhotonBeast 1 year ago +17

      @C4Oc. Agreed; it's probably a combination of regional, language/dialect history, family history, and more. In Dutch, the meaning translation for "I am cold." would be (more or less) "I have coldness." whereas the direct literal translation would actually end up being "I am the cold." which uses a different verb. The language makes a stronger distinction between being something and having something, whereas in English, 'am' is correct in both statements and the emphasis or meaning comes from other structures, e.g. "I am the cold."

    • @C4Oc. 1 year ago +8

      @PhotonBeast So you don't have an adjective for "cold" in Dutch? In German, saying "Ich habe Kälte" wouldn't make much sense, since it doesn't mean "I have a cold" or "I am cold", but rather literally "I have cold". You also don't say "Ich bin kalt" to mean that you're cold in terms of temperature, but rather to mean you are cold-hearted (personality trait). For this meaning, you'd say "Mir ist kalt", which you can kind of but also kind of not translate it literally to "For me, it's cold" ("mir" doesn't actually mean "for me", or at least not all the time).

  • @MorningDusk7734 1 year ago +2059

    Reactors are supposed to be critical because that's the point where the fuel undergoes fission. You're thinking of Super Critical, where the fission spirals out of control and essentially burns too much fuel at once, instead of a controlled burn. It's like a log being on fire vs the equivalent amount of sawdust being tossed into a fire. One of those is burning much bigger, much faster.

    • @3nertia 1 year ago +81

      What a great analogy!

    • @jankkhvej434 1 year ago +18

      that's what i was about to write, good explanation

    • @AnarchistEagle 1 year ago +117

      The fuel can still undergo fission when subcritical, it's just that the fission doesn't promote a self-sustaining chain reaction like it would when it's critical.

    • @SoupyMittens 1 year ago +34

      I haven't watched the video and this comment concerns me

    • @justinth963 1 year ago +18

      It's supposed to be prompt subcritical. Prompt criticality should never be reached.
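
Editor's sketch of the distinction drawn in this thread, under a deliberately simplified assumption: each neutron generation is the previous one multiplied by a single factor `k`. The function names and the tolerance are illustrative, not from the video or the commenters, and the model ignores delayed neutrons, so it says nothing about the prompt-criticality point raised above.

```python
def classify(k: float, tol: float = 1e-9) -> str:
    """Label a multiplication factor k (neutrons produced per neutron lost)."""
    if abs(k - 1.0) <= tol:
        return "critical"        # chain reaction exactly sustains itself
    return "supercritical" if k > 1.0 else "subcritical"


def neutron_population(n0: float, k: float, generations: int) -> list[float]:
    """Toy model: N[i+1] = k * N[i], starting from n0 neutrons."""
    populations = [n0]
    for _ in range(generations):
        populations.append(populations[-1] * k)
    return populations


if __name__ == "__main__":
    for k in (0.5, 1.0, 2.0):
        print(classify(k), neutron_population(8.0, k, 3))
```

With `k` below 1 the population dies away even though fission is still happening, at exactly 1 it holds steady, and above 1 it grows every generation.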

  • @jojaymer 1 year ago +1330

    Here's a fun way to think about AI that really opened my eyes a while ago: Imagine you are put in a library full of every book that could ever and has ever existed. All your needs are taken care of, but there's just one problem: all the books are written in a completely alien language. It has ZERO roots to whatever languages you know now, so extrapolating anything just isn't possible. Now imagine you work as the librarian at this alien library, and outside there's a queue of people (that you can't see) making requests for books you can't understand, speaking in a language you cannot understand. What do you do? You guess, right? You start giving random books to orders until over time you get a few right. You learn that x word correlates to y symbol on the spine of the book, or something like that. Over time, you get REALLY good at this, and you can fulfill every order to satisfaction...
    But can you read the books?
    Could you, ever, read the books? Could you ever even have any idea what's in them? Sure, you figure out what symbols pair with spoken words, but ALL of it is alien to you, so could you ever know what that symbol actually means? The word?
    To us, standing outside of the library, AI may look super intelligent and efficient. It may look like it understands what we want from it. It may look like it knows everything. But all it knows (and will EVER know) is how to answer questions the way it knows we want it to.

    • @80nomads74 1 year ago +119

      I love that analogy! I think I'll tell that to my uncle who is scared of AI.
      Another good one I've seen is comparing AI to a calculator. A calculator can multiply to very large numbers - something humanly possible. AI can summarize data it was trained on that is related to your input - something a human could do with access to Google, libraries, and plenty of time. The AI looks way more impressive than the calculator, but they essentially do the same thing - use a set of predefined/trained patterns to come up with an answer. The difference is summarizing data is not as straightforward as multiplication. The AI does not examine the validity of its data, it just regurgitates data it was trained to associate with key words in your question. Those associations may or may not be true.

    • @xw3132 1 year ago +46

      This is exactly what was in my mind, but better worded.
      We humans are not trapped in a library. We have many more ways to learn besides reading published materials. We experience the world with our senses each day, passively receiving huge amounts of information. We conduct little experiments trying to make things work. We talk to peers, learning from their experiences that will never be written down. On the other hand, AIs can only learn from human-generated materials to get a grasp of the real world, as in your analogy, the world outside the library.
      Most importantly, we have a conscious mind that decides what we want to learn and proactively looks for that knowledge with the above-mentioned means. If one day we have an AI that truly surpasses human intelligence, it must also have such consciousness. On top of that, it should also have the means to interact with the real world, not limiting itself to second-hand human-generated materials.

    • @tteqhu 1 year ago +63

      so... a Searle's Chinese Room?

    • @jayashrishobna 1 year ago +7

      you may be interested in the concept of AI alignment

    • @ekstrapolatoraproksymujacy412 1 year ago +7

      Yeah, but it actually is a little bit more than that, there is some actual (like human) intelligence, not much, but not none.

  • @geniej2378 1 year ago +704

    What’s scary is how AI can never say “I don’t know”. It’s giving its best guess but never phrasing it as a guess. So when it comes to factual information, we have to train *ourselves* not to believe AI, it will need to be fact checked.

    • @nati0598 1 year ago +47

      I think that's because most of the internet either has solid info or somebody winging it. People on the internet rarely say "I don't know", and even when they do it's a private conversation and not a blog post. If they didn't know, they wouldn't post a blog piece about it.

    • @geniej2378 1 year ago +31

      @nati0598 That's a great point, AI is usually trained on publicly available data (often stolen data) so that's already a curation of human language. It's not conversations, it's opinions mixed with facts/knowledge sharing articles.

    • @pierregravel-primeau702 1 year ago +1

      You can program an AI to say I don't know, but that would be kinda useless when you understand that AI gets better with feedback

    • @petergraphix6740 1 year ago +10

      >how AI can never say “I don’t know”
      Saying I don't know is also a learned behavior in humans, humans confabulate quite often especially where they have partial information. AI is perfectly capable of saying I don't know, and GPT-4 has got much better at it.

    • @KBRoller 1 year ago +12

      Firstly, it can say I don't know, it's just usually prompted NOT to. It's prompted with something like "give the user a helpful answer" instead, so it always tries to. If you prompt it in a way that tells it, "be honest and tell the user if you don't know something", it's quite capable of doing that.
      Secondly, most humans don't say "I don't know" often enough, either. There's that classic example of the French test that gave students a bunch of irrelevant info in a word problem, then asked a question that was impossible to determine. The correct answer is "I don't know" because no one could know; but something like 1/3 or more of all the students made up an answer instead. Because they figured if they're being asked a question, there must be a right answer, so they guessed at "how to figure it out".
      That said, there is an existing potential mitigation to such things already. There's an AI cross-encoder model which, when given a pair of text strings, can determine with high (though not 100%) accuracy whether one sentence follows from the other, or contradicts the other, or is unrelated to the other. And it can do these calculations in bulk. So it should be possible, in theory, to incorporate that check for contradictions into the training data, decreasing the weights of training data that contradicts a large number of other data points. (Sure, it's a "majority rules" approach, but actual fact checking would be more involved and subjective than I'm willing to try and design in a YouTube comment 😁). By doing that, it would learn more from data that's high consensus, and less from data that's contradictory, and you could even include the "trust weights" in a partition of the input space as a separate modality, which would let it learn to output a confidence score with its inferences at runtime.
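
Editor's sketch of the "majority rules" weighting described in the comment above. Everything here is an assumption made for illustration: `consensus_weights` and `toy_contradicts` are hypothetical names, and the contradiction check is a trivial string rule standing in for a real cross-encoder model, which would be passed in as the `contradicts` callable.

```python
from typing import Callable, Sequence


def consensus_weights(
    statements: Sequence[str],
    contradicts: Callable[[str, str], bool],
) -> list[float]:
    """Weight each statement by the share of other statements it does NOT contradict."""
    weights = []
    for i, statement in enumerate(statements):
        others = [s for j, s in enumerate(statements) if j != i]
        if not others:
            weights.append(1.0)  # nothing to disagree with
            continue
        conflicts = sum(1 for other in others if contradicts(statement, other))
        weights.append(1.0 - conflicts / len(others))
    return weights


def toy_contradicts(a: str, b: str) -> bool:
    """Hypothetical stand-in for a model: 'not X' contradicts 'X'."""
    return a == "not " + b or b == "not " + a


if __name__ == "__main__":
    data = ["water is wet", "water is wet", "water is wet", "not water is wet"]
    print(consensus_weights(data, toy_contradicts))
```

The outlier statement ends up with weight 0 while the three agreeing statements keep most of their weight, which is the down-weighting effect the comment describes. As the commenter notes, this rewards consensus, not truth.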

  • @tibees 1 year ago +3772

    Thanks for including me in this video! The question of how we measure intelligence for humans and non-humans is interesting, and it's my guess that once AI comes close to 'out-thinking' us in some regard, we will move the goalposts of what we consider true intelligence to be.

    • @GeekProdigyGuy 1 year ago +80

      Although I have no doubt that, from a cultural standpoint, we will keep moving the goalposts, there will still be the reality of what human activities AI beats us at and some that it totally obviates. In terms of things humans can do with their brains, it seems like there will come a point where AI will develop new capabilities faster than humans can. That's one way of looking at the so-called singularity. The goalposts won't have anywhere left to go.

    • @shuu-wasseo 1 year ago +24

      OH MY GOD ITS YOU

    • @Yaveen 1 year ago +48

      Yoo it's our local 3D human sent from the 4D world!!!

    • @evanhoffman7995 1 year ago +45

      I had this exact conversation just a few days ago. Intelligence used to mean math skills, then computers got really good at that so we moved the goalposts to language and art. Now computers have gotten really good at that, so we're moving the goalposts again.
      I have to wonder, are there any criteria we can come up with that humans meet but computers will never be able to? (I doubt it - progress keeps on marching.) And are there any criteria that, when computers eventually meet it, we'll acknowledge them as being actually intelligent?

    • @Forgefaerie 1 year ago +65

      @evanhoffman7995 Actual self-awareness. Computers right now are not intelligent. They are just much, MUCH faster at processing information than the human brain is, but they also cannot do that without humans creating ever more complex algorithms for them to do so. So-called machine learning still involves and will continue to involve a great deal of human participation and prompting. "AI" is not actually good at creating art. It's good at using HUMAN-created and continuously HUMAN-adjusted algorithms to calculate patterns that recur in human-created art and then copying those patterns, and it can ONLY do that with any degree of success when exposed to more art than any single human can process in several lifetimes.
      When/if AI becomes genuinely self-aware, capable of actual contextual learning and understanding? Then we can talk.

  • @tevor5753 1 year ago +1782

    It's crazy how printer ink costs more per gallon than human blood

    • @memyself5866 1 year ago +259

      Printer ink is upcharged so that they can sell the printer itself at a lower cost. Then people essentially pay the full cost of the printer through the ink purchases over time.

    • @abie1295 1 year ago +127

      Well human blood is a lot easier to stock/supply 💀

    • @WellBattle6 1 year ago +39

      Also it’s illegal to sell human blood in most places

    • @ez45 1 year ago +150

      @memyself5866 Yeah I mean I think everybody knows that. It's still a scam and we should demand better. Printers are literally the only piece of technology to have gotten worse and more expensive over the past 30 years.

    • @nosidenoside2458 1 year ago +60

      But rich old white dude needs another yacht, so the point you're making is invalid

  • @achilleus_eh 1 year ago +405

    "this was supposed to be an easy video," Sabrina says. Again. Sabrina, you always dig to the bottom of the questions you ask yourself. No matter the subject, it will never be an easy video

    • @Shrooblord 1 year ago

      +

    • @gePanzerTe 1 year ago +2

      Many scholars and people higher up the chain of command do what the machine (well, its team of programmers) does: memory over mastery.
      🤗
      For the sake of laziness (or obvious energy usage optimisation)

    • @kayleighalvarez5271 1 year ago +1

      I'm at the point where I wait for the "this was supposed to be an easy video" in every video.

  • @antenna8836 1 year ago +135

    This is, unironically, one of the best sponsorships in a video I've ever seen. Entirely unique consumer story, will legit consider them. Definitely want to see more like this!
    And the idea as to "we don't know what we don't know" has suddenly spiked my interest in Socrates again, with the Oracle claiming he was wise to know the limits of his wisdom. Maybe that's the next step in AI -- getting it to admit ignorance.

    • @80nomads74 1 year ago

      I feel like being able to admit ignorance would make it a more useful tool. Currently, you can ask ChatGPT to find a research paper on a subject, leukemia for example. It will give you a valid and coherent looking abstract on leukemia, but when you try to find the rest of the paper (by going to who it said was the publisher), you would find the paper doesn't exist. ChatGPT made it up.
      I could see ChatGPT being extremely useful for finding sources for research if we could trust the result it gives. If we could trust it would say "no matches, rephrase your question" instead of making up a paper.

    • @nati0598 1 year ago +3

      We would have to get people to admit ignorance first xD

    • @revimfadli4666 10 months ago

      If only the devs displayed the level of certainty in each word...

  • @Kimmie6772 1 year ago +320

    Thank you for verbalizing why I get agitated when people act like ChatGPT is some godlike intelligence. It is a large language model. Yes, language and memorization are half the effort of how human intelligence evolved, but they are not the whole story. Ask GPT to do complex logic or math, ask it to explain its answers, and try to sway it, and you will see where this quickly falls apart. It is generating responses it thinks you want to see or what you think is correct, nothing more.
    Edit: Very insightful conversation in the replies. You learn something new every day.

    • @racerfranco5175 1 year ago +16

      I like ChatGPT because I can have a direct answer that would consume more of my time than I'm generally willing to spend on. And that's it. Fast mediocre precise answers. I just do the reasoning myself. LOL

    • @maxkho00 1 year ago +11

      GPT-4 could easily answer complex math questions. In fact, it passed a number of undergraduate and even postgraduate math exams with almost 100% accuracy. I think the Fiverr data scientist must have messed up his program somewhere since GPT-4 usually scores above the human average on IQ tests, and while physics-based questions are definitely one of its major weaknesses, it should certainly be able to score above 50% on the Learning From Experience test with chain-of-thought prompting.
      This video, like so often on this channel, is misleading; she jumped to an unproven conclusion far, far too quickly. The general consensus is that GPT-4 can definitely reason, with a general reasoning ability comparable to that of a human.

    • @Kimmie6772 1 year ago +10

      @maxkho00 Straight outta the box or when given prompts beforehand? I may have been recalling a state before the update, but I've heard that it will also readily explain an incorrect answer if you suggest that the answer may be wrong. Sometimes if you ask it to explain why it came to a certain conclusion, it doesn't make much sense either.

    • @maxkho00 1 year ago +9

      @Kimmie6772 Straight out of the box, but instead of prompting it to just answer the question, you also ask it to think step by step while doing so. This simple technique tends to improve results by a significant margin.
      Asking GPT-4 to explain its thought process ONCE it's already given you an answer is borderline useless since it doesn't have access to its past neuron activations (i.e. thoughts). So of course if you do that, it will just give you its best guess as to what it MIGHT have thought when generating the last response. Naturally, quite often, its guess will be inaccurate.

    • @Kimmie6772 1 year ago +4

      @maxkho00 That actually makes a bit more sense now. Thank you for explaining that. Though I wonder why you have to ask it to do it step by step in order for it to come to the right conclusion. My thinking was that it is because it was never taught anything math specific. However, if they specifically included math in the training data, that makes the necessity a bit more confusing.

  • @richiejacobson4272 1 year ago +567

    Congrats on being "Derek'ed" Sabrina!
    That's a big milestone for any science educator on YouTube!

    • @aaronfidelis3188 1 year ago +46

      Agreed. It's very much a big honor to research something and get beaten to it by another giant content creator.

    • @boltstrikes429 1 year ago +28

      Someone should let Matt know

    • @davidioanhedges 1 year ago +43

      It just shows that what you are researching and making videos on is something people want to know about ... i.e. you are doing it right, keep going

    • @tango_doggy 1 year ago +5

      HI reference

    • @hoej 1 year ago +2

      @boltstrikes429 and Steve Mould

  • @Jaigarful 1 year ago +94

    Back in college for Mechanical Engineering (over a decade ago), I had a professor who said that he didn't expect us to memorize all the formulas. Instead he valued understanding how to use the formulas and when to apply them. While nursing students had to memorize all sorts of information on the human body, we didn't. We had cheat cards.
    It was awesome to get an exam where you had to work out a problem you've never seen before, but you had the tools to solve it.

    • @kaitlyn__L 1 year ago +11

      Yep, every physics and electrical engineering exam I took a bit over a decade ago had the formulas printed in the front and back. There were hardly any multiple-choice questions either. The idea being exactly as you said - demonstrate you know how to use these tools, rather than just memorising a bunch of stuff or picking a premade answer.

    • @CottidaeSEA 1 year ago +6

      @kaitlyn__L This was the mentality at my school as well. It was great for maths, because we didn't have to worry about remembering every single little formula, we could just look at the cheat sheet and there it was. We still had to find the correct one out of multiple, so if you didn't know what you needed, you were screwed regardless.
      Although I believe that for the most part the students didn't actually use the cheat sheet all that much, I know I didn't. It was mostly there for reassurance: if I did forget, I could just look it up. That made it easier to focus on solving math problems rather than remembering stuff.

    • @kaitlyn__L 1 year ago +3

      @CottidaeSEA Absolutely, by the time you’d done a bunch of practice scenarios in class you’d often memorised at least half of them anyway.
      That said, in a real exam, I found myself double checking I’d remembered them correctly just to be on the safe side.
      And it’s nice to know a student theoretically wouldn’t be punished for remembering a mnemonic incorrectly due to anxiety etc.

    • @gladitsnotme 1 year ago

      so that explains why there's a nursing shortage

    • @NorseGraphic 1 year ago +5

      Like multiplication, I never memorized the table. Instead I made a method for coming to the answers. I simply broke it into smaller ‘items’ and then worked from there.
      Same with passwords. I don’t remember passwords, but I know the method by which I construct them. So I don’t memorize; I would rather create memorable methods for how to solve something.

  • @jamesbrown5277 1 year ago +61

    Absolutely loved doing the voiceover for this video. Thanks for having me Sabrina and making such awesome content! :)

  • @outsanely 1 year ago +69

    This issue where AI fails at answering novel or reasoning-based questions is something I've actually had to warn my students about. I teach physics, and specifically I focus on solving interesting and complex problems, which AI like ChatGPT can't seem to handle. I've always thought that part of this is that the training data for ChatGPT involves people asking for help on similar questions by sharing what they tried (and thus what didn't work).
    It's honestly quite interesting- I gave ChatGPT a "textbook" problem, and then gave it the same type of problem but written in a way that required more reasoning and logical arguments to answer. It did okay-ish on the first (still got it wrong, but got 2/3 of the way through before it went off the rails), but the second question it both got wrong and got wrong very early in its answer (like 1/3 of the way). It also culminated in my favorite error ChatGPT has made so far; it added 2 + 0.5 and got 5 😂

    • @petergraphix6740 1 year ago

      Remember that LLMs are not effective calculators as they are token based and not character based. If you want to do any math in them, use something like ChatGPT+Plugins where you have the model pass any math questions off to a calculator for correctness.
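
Editor's sketch of the hand-off suggested in the reply above: instead of letting a language model "guess" arithmetic, the expression is passed to a small deterministic calculator. This is not how any particular plugin works; `calculate` is a hypothetical helper that only supports `+`, `-`, `*`, `/` and parentheses, built on Python's standard `ast` module.

```python
import ast
import operator

# Binary operators the calculator is willing to evaluate.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression: str) -> float:
    """Evaluate plain arithmetic; reject anything that is not a number or operator."""

    def _eval(node: ast.AST) -> float:
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


if __name__ == "__main__":
    print(calculate("2 + 0.5"))
```

For the sum mentioned in the parent comment, `calculate("2 + 0.5")` returns 2.5. Function calls and names are rejected with a `ValueError`, so text produced by a model is never executed as code.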

  • @nihilisticgacha 1 year ago +71

    I find it eerily similar how both modern education and AI training use standardised tests to evaluate students’ ability/intelligence, since both limit how they learn by memorising without understanding

  • @that_one_salad9778 1 year ago +153

    Sabrina may not have the best scores on these tests, but at least she's willing to constantly challenge and verify whatever her brain decides to come up with.

  • @GyroCannon 1 year ago +455

    This was the big reason why I was skeptical of ChatGPT in the first place: general intelligence requires reasoning and extrapolation, as Sabrina pointed out
    But then ChatGPT schooled me on how useful just memorizing and regurgitating info can be 😅

    • @voidmain7902 1 year ago +22

      I'm actually not sure whether ChatGPT has the ability to reason. Seems like there might be limited reasoning capability, but the patterns it uses can be unexpected or wildly wrong compared to human logic. At least theoretically you can certainly encode some sort of algorithm inside a neural network, or any kind of machine learning model. That's what they are supposed to do after all.
      As to extrapolation, one thing that always puzzles me is: does true extrapolation actually exist? If a human never sees something, with no hint that it could ever possibly exist, they still might not be able to imagine it. Then a human could maybe start to derive a guess from common sense and the meaning of something once there's a hint, which can be seen as a form of interpolation instead.
      Also: Asking an LLM a question with an image? That's absolutely gonna go south. It's like Stable Diffusion hallucination but backwards. Maybe when those images are used directly to generate an embedding for the LLM, the result would be better.

    • @tcxd1164 1 year ago +58

      @voidmain7902 Looking at the Google AI model's answer, I feel like reasoning isn't really the only thing these models are lacking, but also an understanding of how the world functions in general (see Tom Scott's video on "the sentences that computers can't understand, but humans can"), which is honestly kind of hard to learn (or even memorise) without being able to interact with the world in any way besides text/image inputs.

    • @voidmain7902 1 year ago +11

      @tcxd1164 That's what I was implying with the "true extrapolation" point. Humans can interpolate with common sense, which those LLMs or any models in general are obviously lacking. To some extent I'd even say that throwing more data at it may be the solution to the "Artificial Inability to be intelligent" problem, but it's almost impossible to achieve because the type of data it needs just can't really be fed to it (life experience, human senses, etc.)

    • @aedeatia 1 year ago

      @tcxd1164 Apparently GPT4 can successfully answer 87.5% of questions from the Winograd Schema Challenge which is what that Tom Scott video is about. So not as good as humans but getting closer!

    • @bramastoprasojo2843 1 year ago +5

      @voidmain7902 There's actually a researcher by the name of David Chalmers who considers capturing experience, and how an AI could have it, to be one of the Hard Questions of AI. There are also John Searle and the Churchlands, who argue about whether AI can fundamentally understand and how that could arise in connectionist neural network models.

  • @acornmaybe 1 year ago +337

    I extremely enjoy the editing of this video. It probably took so long to make

    • @AntonWongVideo 1 year ago +9

      I guess you could say it would be *Trickey* for the average *Joe*

    • @jc_art_ 1 year ago

      @AntonWongVideo wagabagabobo

    • @spamburner9303 1 year ago +1

      @acornmaybe
      If not acorn then what? What secrets is this profile picture hiding? Please tell me the secrets of this possible nut.

    • @JoeTrickey 1 year ago +10

      It was actually probably the quickest turnaround we’ve had so far, about 7 days in the edit

    • @I.____.....__...__ 1 year ago +4

      That opening scene alone looked extremely tedious. I don't think I'd have the patience to do all that work for such little return. 😕

  • @Simon-ow6td 1 year ago +33

    My main problem with AI information engines like ChatGPT is that they can very easily be confidently wrong about almost anything, and if I don't have enough foundational knowledge to recognize when one gives suspect answers (especially if it has given multiple good answers before), or I have a bias that is consistent with the wrong answer, it is really easy to create misinformation that sticks around.

  • @TheInfintyithGoofball 1 year ago +78

    The sheer amount of "cute adorable dork/nerd" in this video, from the editing style to EVERY SINGLE THING SABRINA IS DOING
    (from straight up being hilarious to speaking one of the highest pitched sentences I've ever heard to slipping into a New York accent while holding a bunch of books).
    This is easily one of the best channel's videos on YouTube, and Sabrina & her friends continue to be hilarious.

    • @larakleefeld8855 1 year ago +5

      Low-key sounds like someone has a crush 😌 I agree tho

    • @TheInfintyithGoofball 1 year ago

      @larakleefeld8855 I meant funny not attracted like a crush sooo.........

    • @larakleefeld8855 1 year ago

      @TheInfintyithGoofball okay sure, checks out, I have no idea how these things work anyways

    • @TheInfintyithGoofball 1 year ago

      @larakleefeld8855 I'm confused what do you mean by "these things"?

    • @larakleefeld8855 1 year ago

      @TheInfintyithGoofball crushes and distinguishing between those and platonic affection 😅

  • @DanilaSaprykin 1 year ago +91

    The editing and humor throughout all of the video is just three steps forward, compared to earlier. Seeing you guys making such progress is very inspiring ❤

  • @MoneyShack 1 year ago +771

    This should be a yearly video to see how quickly AI is evolving

    • @SynthwavelLover 1 year ago +37

      It's closer to a glorified calculator or a search engine. Not true AI, "AI" is just a marketing term really.

    • @user-221i
      @user-221i ปีที่แล้ว +34

      @@SynthwavelLover You don't know what you are talking about. AI as a field has existed for decades. Complex if-else statements can be considered AI. What you mean is artificial general intelligence.

    • @SynthwavelLover
      @SynthwavelLover ปีที่แล้ว +21

      @@user-221i AI to the layman conjures up images of T-800 Terminators, iRobot, etc. it implies the notion that the program is a self-aware, conscious being. That's what I meant when I said it's not "true AI".

    • @damientonkin
      @damientonkin ปีที่แล้ว +12

      @@user-221i I think what they mean is that the AI in question, generative AI, is not AI in the classical sense meant by AI researchers. Technically these are all forms of machine learning, which is admittedly an aspect of AI research, but if you call it AI then the public who don't know the technical distinction won't know the difference and will get more excited. Hence AI in this usage is more or less a marketing term.

    • @Luna-wu4rf
      @Luna-wu4rf ปีที่แล้ว +2

      @@damientonkin nah, it definitely is AI as considered by researchers. AI is just a term without a good description. So saying "it's not AI" is pretty meaningless, and it's literally just a thing laymen say to avoid calling a machine intelligent because they feel uncomfortable doing so. They'll keep moving the goalposts for intelligence (which is also not a well-described term in research communities). If I had to bet on what a more rigorous definition of AI as conceptualized by y'all would be, it'd be autonomous objective-driven agents with the same or higher learning per sample than humans. We're definitely not there yet, but it is AI, and we have had "AI" ever since the GOFAI systems (symbolic systems as opposed to the neural networks of today) of the 20th century.

  • @Josh-kr9xf
    @Josh-kr9xf ปีที่แล้ว +27

    the transition at 16:58 is perfection! truly inspirational to see how much care goes into not only the video script and process but also the editing and shot selection!!!

    • @emilyrln
      @emilyrln ปีที่แล้ว

      Whoa yeah… I didn't even really think about it on my first watch because it's so smooth.

  • @kaitlyn__L
    @kaitlyn__L ปีที่แล้ว +28

    That glass with ice example is fascinating in its illustrative properties. I saw the ice, thought it must be empty because it's at the bottom, then looked at the top and laughed when I saw it was also illustrated to be full.
    I love when things are like that, when they basically are a joke rather than a big challenge.
    Though I'd also say it calls into question your assertion that these systems can provide quality alt-text, since Google's Bard would clearly call that picture "ice floating on water" rather than "ice sitting at the bottom of a bunch of water".
    And as a general IQ test, I think it probably still has problems with applicability - what does someone who's really smart but has never seen ice before do when confronted with that picture? I suppose ask what the heck those cubes are? I'm sure that's not a super large group of people who've never seen ice in 2023, but there must still be some.
    Also: I would've loved to have seen more about how the art challenge worked, since it caught my eye in the thumbnail and was the only one I wasn't familiar with beforehand. I feel it was skipped-over a bit in the video, but I suppose I can understand why.

    • @toomanyopinions8353
      @toomanyopinions8353 ปีที่แล้ว +2

      Yes, that is why IQ is very difficult to accurately measure. It's extremely biased. That said, it also is more helpful to have a human running the test on you. When there's a trained human running it, cultural differences can be accounted for. That's why in high level academic and research situations, IQ tests are never just a written test.

    • @kaitlyn__L
      @kaitlyn__L ปีที่แล้ว +2

      @@toomanyopinions8353 I would add the caveat that they can be accounted for if taken in good faith. History is full of examples of people being marked very low for lacking some assumed cultural knowledge, by people who wanted to “prove their inferiority”.

    • @Taterzz
      @Taterzz ปีที่แล้ว +2

      there's also the possibility that someone doesn't have the ability to recall experience watching said ice float on water despite having seen it. some things we just accept and never really bother to think about.

  • @Sideshowspike
    @Sideshowspike ปีที่แล้ว +8

    I am not concerned that this is supposed to be an educational channel. It was an educational video. Measuring two unlike things against each other produces unlike results. This makes sense. Thank you very much for the vid!

  • @OllAxe
    @OllAxe ปีที่แล้ว +128

    I've tried using LLMs and always found that they were really terrible at answering unique or complex questions. As such, I concluded that AI just hadn't come as far as the general public believes, but I didn't understand why there was such a discrepancy between the public image and my observations. This video did an amazing job discovering and explaining that discrepancy! I think most people probably use LLMs like a natural language version of a search engine, and for that purpose, it works really well, often better than asking the average person. That of course assumes the average person has to go off only the knowledge they have and can't utilize outside resources, which is usually not the case anymore, so a human equipped with Google is usually far more capable than an LLM even if a bit slower.

    • @stevenneiman1554
      @stevenneiman1554 ปีที่แล้ว

      The thing that's amazing about LLMs is that as long as it remains similar to conversations which humans have had in places the researchers could scrape, you can talk with an LLM and it will respond like a person in a conversation. As such, it passes the sniff test for humanity in a way that nothing before it could, including explicitly programmed chatbots. It's still an illusion, and for now it doesn't take THAT much to peek behind the curtain and see that there's no mind on the other side, but for a casual observer it's easy to hear the most excitable voices saying "modern AI is as smart as a person and infinitely faster", check in casually, and seem to confirm that.

    • @paracosm-cc
      @paracosm-cc ปีที่แล้ว +2

      Hi Ollaxe!!!!

    • @OllAxe
      @OllAxe ปีที่แล้ว

      @@paracosm-cc Omg hi oomfie

    • @ChaoticNeutralMatt
      @ChaoticNeutralMatt ปีที่แล้ว

      I'm curious which types of questions you've come across that it doesn't handle well? I've come across a couple, but as far as conceptual exploration it's been alright. Did you ask more specialized questions?

    • @OllAxe
      @OllAxe ปีที่แล้ว +2

      @@ChaoticNeutralMatt I think I asked Bing for Samsung's most recently released flagship and whether it included a charger and it just straight up gave me the wrong answer

  • @DustinRodriguez1_0
    @DustinRodriguez1_0 ปีที่แล้ว +401

    That 4% score for "learning from experience" is actually just a statistical error, because it should be 0. The way current "AI" implementations work, the 'learning' and 'using' periods are totally and completely separate. You could sit and have 400,000 conversations with GPT-4 right now and at the end... it would be exactly and precisely identical to the GPT-4 you started with. It does not learn from its input when being used. It only learns when explicitly in learning mode and being trained on datasets. I do think that OpenAI is going to take everyone's conversations and the feedback they get from users and use that to train GPT-4 in the future, but they haven't yet. When they do, it will no longer be GPT-4. GPT-4 is a very specific version, and it can't be modified or 'learn' and still be GPT-4. This can be a bit confusing because you CAN talk to it and have it remember things over the short term. That has to do with context windows and how long individual 'conversations' can be, and the way it works is NOT by having the network learn anything new. Instead, every time you type a line and hit enter, the system takes everything that you have said, and all of the responses it has given you already in that specific conversation, tacks your new line on to the end, and feeds THAT as the input into GPT-4, which then gives you what it predicts should come next. When you start a new conversation, all that gets thrown into the void and you start with a fresh context with the engine.
    People don't often talk about this learning/using dichotomy, which I've always found odd. Like we have all of these home automation devices like Amazon Echo and Google Home and Apple HomePods... they record your speech and send it off to corporate HQ. That is totally unnecessary. Just completely not needed. It absolutely does require giant computing resources to TRAIN an AI... but then after it is trained, like when it's just figuring out what you just said, that takes extremely small amounts of computing power. As an example, Facebook/Meta has their own large language model like ChatGPT called LLaMA. They open-sourced it. They did not intend to open-source the actual post-training weights... but some employee went rogue and put em up on torrent, so now plenty o people have them. It took 1 day, less than 24 hours, for someone to take those and to create a version that would run and be usable on a Raspberry Pi, one of those credit-card-sized single board computers you can get for like $25.
    It cost Meta hundreds of thousands of dollars worth of computing time to train that model. But it can run on a $25 trash-tier CPU. The only reason that your speech gets uploaded to the corporate giants instead of the speech recognition running directly on your little home automation device is because the big company wants to mine your data. If the device did it without phoning home.... they'd be cut out of that sweet consumer data loop. It's the same reason Google doesn't do client-side video recommendations. They totally could keep all of your likes and subscriptions just on your device. And everything would even work better if they did that. But, it would cut them out of the loop, and they want to snoop to sell ads.
    Microsoft recently published a paper that I found very interesting. Instead of focusing on the quantity of training data, they focused on the quality. Instead of training a model on whatever garbage it could find online, they trained one on textbooks. The things we train humans with. And that worked radically better at producing a model that could master tasks and follow instructions and a few other metrics that are commonly tested with such models. I have wondered how long it would be until someone turned away from just general knowledge training towards something like specifically training a model to actually perform reasoning. I think that models like GPT-4 are functioning like a human being with no understanding of critical thinking, just gross intuitions formed through associations. It often sounds human because that's how loads of humans go about their life, rarely if ever actually engaging in concerted intentional reasoning. They stick with what 'feels right' without putting their views through any kind of rigorous challenge. So, train a model with reasoning, and then train it on all published science. Have it spot problems. We already know problems in old research papers that were published, so we can pretty easily tell if it is doing a good job. Then we can run more recent papers through it and see if it flags any problematic reasoning, unsupported assertions, etc. Just don't let it look at psychology or sociology papers or anything or it'll explode given how they run a test on a few dozen westerners and start talking about 'human nature' and presuming the things they found apply equally to tribesmen in the mountains of Papua New Guinea.
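
A minimal sketch of the context-window behaviour described in this comment, assuming the model is a frozen text-in, text-out function. `make_chat` and `toy_model` are hypothetical names used only for illustration, not any real API. The point is that the short-term "memory" lives in the transcript that gets re-sent every turn, while the model itself never changes.

```python
from typing import Callable, List


def make_chat(model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a stateless model in a chat loop that re-sends the whole transcript."""
    transcript: List[str] = []  # lives outside the model; discarded with the chat

    def send(user_line: str) -> str:
        # Tack the new line onto everything said so far in this conversation.
        transcript.append(f"User: {user_line}")
        prompt = "\n".join(transcript) + "\nAssistant:"
        # The model only ever sees this one prompt string; its weights are untouched.
        reply = model(prompt)
        transcript.append(f"Assistant: {reply}")
        return reply

    return send


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a frozen model: counts the user lines it was shown."""
    return f"I can see {prompt.count('User:')} user message(s)."


if __name__ == "__main__":
    chat = make_chat(toy_model)
    print(chat("hello"))        # the prompt contains one user line
    print(chat("still there?")) # the prompt now contains both user lines
    fresh = make_chat(toy_model)
    print(fresh("hello"))       # a new conversation starts from an empty transcript
```

Calling `make_chat` again is the equivalent of starting a new conversation: the old transcript is simply dropped, and nothing about `toy_model` has changed in between.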

    • @xzonia1
      @xzonia1 ปีที่แล้ว +17

      That's what I was thinking. Thanks for writing it so I didn't have to. :)

    • @xskullcrow3269
      @xskullcrow3269 ปีที่แล้ว +2

      How would you use the same account in multiple devices if all the data is stored locally in one device?

    • @ilikememes1402
      @ilikememes1402 ปีที่แล้ว

      ​@@xskullcrow3269 yeah, could be more of a risk or vulnerability for google to do that. Comment still very good tho

    • @luipaardprint
      @luipaardprint ปีที่แล้ว +12

      Private cloud storage.

    • @chri-k
      @chri-k ปีที่แล้ว +26

      Not quite true. It knows what has already happened in the conversation and uses that to form its answers. That’s learning, even if it’s only transient.

  • @goodguyamr6996
    @goodguyamr6996 ปีที่แล้ว +363

    I live for Sabrina being the incarnation of chaos

    • @bertilhatt
      @bertilhatt ปีที่แล้ว +3

      There is the one second where the shrimp falls on the ground, and she’s… sad. Wonders if she should cut or continue or pick up the shrimp and… goes on. It’s magical.

    • @TheMoonlight1711
      @TheMoonlight1711 ปีที่แล้ว

      wouldn't be Answer in Progress without the chaos

    • @kathybramley5609
      @kathybramley5609 ปีที่แล้ว

      Agreed but I was just saying to the person above (@rogink) who gave up on the veritaserum video to come here I am watching this video at 1.5 speed and just quickly peeking the comments for a break before we move past the unusually integral ad slot/pausing further to check on my family first. It's great, but intense. And this is about my third or fourth science video watched at that speed. What is great about the comments though is that they reflect the energy of Sabrina and of the video - the comments floating to the top seem to be some wildly different deep dives that are all fascinating!

  • @PsychoSavager289
    @PsychoSavager289 ปีที่แล้ว +13

    7:08 - the bird in the middle is missing its tail, if anyone was wondering.

    • @joeljs9778
      @joeljs9778 7 หลายเดือนก่อน +1

      I feel so stupid rn, thx

    • @cruncyart
      @cruncyart 3 หลายเดือนก่อน

      So I got it right! Cool.

  • @consciouscode8150
    @consciouscode8150 ปีที่แล้ว +6

    For what it's worth, a big part of why it struggled with the image tasks is because truly multi-modal models are still a rarity, and likely the way your Fiverr engineer (and Bard) got around that was by using a smaller truly multimodal model which translated the image to text, which was then given to the smarter GPT-4. Depending on how well the image was labeled, it can lose a lot of context necessary for interpretation and also prime it to produce objectively wrong answers. With an unusual question from a feature-space it hasn't properly generalized yet, it may simply produce an answer that's grammatically correct (something it has very well-generalized) that doesn't incorporate its latent knowledge (like "not knowing" that ice floats in water). Once it's generated the text, it has no way to correct itself, so it carries on. A number of papers have been released which demonstrate massive (up to 900%) improvements in reasoning capacity if you structure its responses using "cognitive architectures" like "tree of thought", which allows it to ask itself if what it said was correct, then choose the best answer.

    • @vigilantcosmicpenguin8721
      @vigilantcosmicpenguin8721 ปีที่แล้ว

      Yeah, I didn't realize until now that multimodal models are so lacking. Someone is probably on it.

  • @markman278
    @markman278 ปีที่แล้ว +110

    I see AI, I see Sabrina, and I expect chaos

  • @tykl-
    @tykl- ปีที่แล้ว +43

    I actually love how well the sponsored segment tied into the video. You knocked it out of the park!!!❤

  • @mattkuhn6634
    @mattkuhn6634 ปีที่แล้ว +61

    Great video! I have an MS in computational linguistics, and one of the things that one of my professors in grad school always said about current LLMs is that they can't really be said to "understand" anything. They're very, very intricate stochastic pattern matching machines, but they don't "know" anything other than how to produce text. It's like Searle's "Chinese Room." Much of the advancement in the field for the past 10 years has been through getting more training data and using more computing resources in training, and we're starting to hit the limits of what that can get us with simple architectures.

    • @LM-he7eb
      @LM-he7eb ปีที่แล้ว +4

      The thing is, the very base of intelligence is the ability to look at occurrences and use them to predict/identify patterns.

    • @adv78
      @adv78 ปีที่แล้ว

      @@LM-he7eb The thing is, an actually intelligent being can understand its own limited intelligence. When a human is following a chain of logic and reaches a leap they don't have the information for, they can sidestep it using logical connections they do have information about.
      LLMs don't understand what it is to "know" something. They receive words and output the most predictable words as a response, so when one reaches a logic leap, it just spits out nonsense.
      Until we create AI that can actually understand what it knows, and what these words represent, it will still be pretty limited.

    • @fedweezy4976
      @fedweezy4976 ปีที่แล้ว +6

      I'm honestly surprised that the Chinese room isn't brought up more often when talking about LLMs. It's the first connection I made when I learned about the thought experiment.

    • @kennethferland5579
      @kennethferland5579 ปีที่แล้ว +8

      @@fedweezy4976 Probably because the Chinese room is exceedingly dumb as an idea: it postulates that a rigid set of rules and crude tables CAN engage in conversation indistinguishable from a real person. From this assumption the Chinese room then asks questions about the nature of intelligence and knowledge. But this is patently impossible. Our AIs are not sets of rules and tables; rather, they are neural networks which must learn and even then can't pass a Turing test. So any questions the Chinese room asks are as baseless as asking us to speculate on the philosophical implications of an intelligent block of cheese.

    • @fedweezy4976
      @fedweezy4976 ปีที่แล้ว +7

      @@kennethferland5579 that's kind of the point of a thought experiment mate, a hypothetical that gives us the structure to ask important and insightful questions. Hell, the color sorting hypothetical in the video is also impossible but it's used as scaffolding to highlight the problems with LLMs that can't be solved with more data.

  • @thelad7525
    @thelad7525 ปีที่แล้ว +7

    This feels like a college presentation project if it had the budget of a Disney channel celebrity PSA adbreak. But unlike Disney channel, it actually has the scuffed charm of an improvised college presentation that makes it work so well

  • @delaneys-books1290
    @delaneys-books1290 ปีที่แล้ว +9

    I love how you continue to connect with different creators and experiment with different video formats, it just makes everything so fun!

  • @thork6974
    @thork6974 ปีที่แล้ว +34

    I would honestly have taken forever to figure out that the ice cubes should have been floating. Probably because when I fix up a glass of ice water I fill it with ice from top to bottom.

    • @C4Oc.
      @C4Oc. ปีที่แล้ว +5

      For some reason I wouldn't have gotten it either had she not revealed it. Left me feeling pretty stupid.

    • @kaitlyn__L
      @kaitlyn__L ปีที่แล้ว +7

      It's also kinda hard to see the water, it's a very low contrast light-blue against white. Not sure if that's a problem with the scan or just the accessibility of the print.

  • @erikvdb
    @erikvdb ปีที่แล้ว +22

    I remember when I took the IQ test I got infuriated with the "what's wrong with this picture" questions. It's a completely normal picture of a glass of water with 2 solid cubes made of glass inside. Why would you think they are ice cubes!? They would be floating!
    Others I chalked up to just being artistic choices, to make things more visually interesting. Besides, all of them are drawn to spec, so they are literally perfect.
    Anyway it was then that I realized I was destined to become an artist/designer, not a scientist.

    • @kaitlyn__L
      @kaitlyn__L ปีที่แล้ว +9

      I wondered how someone would react to that image if they hadn't seen ice before, this is another good example of how it presupposes a lot of familiarity and leaps of logic.
      Because as you say, there are coherent explanations for the image but they rely on something other than the assumption to be true. Which I would argue is also a very important skill in science tbh.

    • @jasonhayes7962
      @jasonhayes7962 ปีที่แล้ว +10

      Not to mention the fact that ice doesn't ALWAYS float in a glass of water. I literally could not figure out what was wrong with that picture because I think the answer is wrong. All I saw was a fresh glass of water where the ice is still stuck to the bottom of the cup which happens quite often until you bang the cup and it breaks away. It may not be the majority of the time this occurs but it definitely happens enough to not be unusual. Yes I do suspect I'm autistic. Why do you ask?

    • @tovarishchfeixiao
      @tovarishchfeixiao ปีที่แล้ว +4

      @@jasonhayes7962 Well, the floating also depends on the size, shape, and density of the ice, and also on some other properties of the water.
      Just like how a boiled egg can sink or float depending on the properties of the water.

    • @PhotonBeast
      @PhotonBeast 10 หลายเดือนก่อน

      For that matter, as with the 'what's wrong' picture in the video with the chicken... what about people who are red-green colorblind? And not only that, but in general, with questions like that the tests also assume that the test creator has perfect knowledge and perfect cultural knowledge and is asking questions that aren't reliant on that and in a way that is completely unambiguous. Which... yeah, even if that is all true, it also assumes that such questions are, in fact, indicative of some kind of intelligence versus merely a lack of knowledge or interest.

  • @RiffTopp
    @RiffTopp ปีที่แล้ว +171

    Anybody else kinda want to take those tests now?

    • @evewithwonder
      @evewithwonder ปีที่แล้ว +20

      Yes! But also, no 😂 I know I'd do awful and suffer

    • @fimbulvetr6129
      @fimbulvetr6129 ปีที่แล้ว +45

      I will have two modes through those tests; “isn’t that just common sense?” And “the heck do you mean *silly*?”

    • @WorldWeaver
      @WorldWeaver ปีที่แล้ว +1

      I would be the dullest cookie in the box.

    • @elk45
      @elk45 ปีที่แล้ว +19

      Absolutely. I need to redeem myself for not figuring out the ice cubes 😂

    • @korganrocks3995
      @korganrocks3995 ปีที่แล้ว +10

      I had to take an IQ test recently, and I never want to take anything else like it again. Partly because I felt like my brain had melted afterwards, and partly because I scored well above average, and I'm not gonna risk losing that sense of superiority! 😄

  • @limitbreak2966
    @limitbreak2966 ปีที่แล้ว +1

    OMG FINALLY YES I FOUND YOUR CHANNEL AGAIN!! I couldn’t remember your channel name for the life of me, besides knowing it was something to do with “answer” or “ask”, and I was thinking so hard of the video name from the videos I remember, but just couldn’t figure it out, so I just searched ask in TH-cam and scrolled through videos to find it but nothing, but remembered the channels only filter existed, and still nothing with ask, but then tried answer, and like 20 channels down… I FOUND IT!! :D so happy I finally found it I’ve been trying to find your channel for like two months, so glad I found it finalllyyyy

  • @Skrymaster
    @Skrymaster ปีที่แล้ว +1

    At 10:20, to be fair, the one with the 2 ice cubes in a glass of liquid was a complex question. The situation isn't outright impossible (which seemed to be the default conclusion the AI drew); however, it is silly, which still qualifies. (Water) ice has a density of about 0.9 g/cm³, so if the liquid it was submerged in wasn't water but, for example, a high-percentage (95%+) ethanol solution with a density of around 0.8 g/cm³, the ice would be denser by comparison and wouldn't float. In the same vein, the "ice cubes" could've been any other denser-than-water transparent liquid brought to a solid state and submerged in water, and it would still qualify.
    The image alone doesn't convey that much information, so we're inclined to believe that it's all water by assuming "normal/conventional" scenarios and experience that we might've encountered in our daily life. Thus, ice not floating, while not impossible, should be seen as silly.
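
The density argument above can be checked with a few lines of arithmetic. This is a rough sketch using the approximate figures quoted in the comment (ice about 0.9 g/cm³, 95%+ ethanol about 0.8 g/cm³) plus the standard 1.0 g/cm³ for water. The function names are made up for illustration, and temperature effects and melting are ignored.

```python
# Approximate densities in g/cm^3 (rounded values, as quoted in the comment above).
ICE = 0.9
WATER = 1.0
ETHANOL_95 = 0.8


def floats(solid_density: float, liquid_density: float) -> bool:
    """A solid floats when it is less dense than the liquid around it."""
    return solid_density < liquid_density


def submerged_fraction(solid_density: float, liquid_density: float) -> float:
    """Fraction of the solid's volume below the surface (Archimedes' principle).

    A floating solid displaces its own weight of liquid, so the fraction is the
    density ratio; a solid that sinks is fully submerged.
    """
    if not floats(solid_density, liquid_density):
        return 1.0
    return solid_density / liquid_density


if __name__ == "__main__":
    print("ice in water floats:", floats(ICE, WATER))
    print("ice in ethanol floats:", floats(ICE, ETHANOL_95))
    print("fraction of ice under water:", submerged_fraction(ICE, WATER))
```

With these numbers the ice floats in water with roughly nine tenths of its volume submerged, and sinks in the ethanol solution, which is the scenario the comment describes.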

  • @iau
    @iau ปีที่แล้ว +23

    11:31 This is the key to understanding current AI. It's either surprisingly smart or utterly lost. There is hardly any in between and the AI itself doesn't have a clue that it behaves this way either.

  • @handle15615
    @handle15615 ปีที่แล้ว +45

    The timing of this could not be any more perfect. I literally just finished watching your AI pasta video an hour ago.

  • @Tha3l
    @Tha3l ปีที่แล้ว +56

    12:00 fun fact, "program an AI" was a task given to students as a bonus in some early programming course. It was supposed to be easy, and suddenly people had to research the nature of intelligence.

    • @that_one_salad9778
      @that_one_salad9778 ปีที่แล้ว

      THIS!! I absolutely despise when my professors at university (or past teachers) give me very vague task specifications that are very much up for interpretation.
      Too many times have I covered 5-10 different interpretations of a task only to be told that the solution was intended to be something more specific and simple and I've done "too much".

  • @ambassadorkees
    @ambassadorkees ปีที่แล้ว +1

    ChatGPT was asked how many kW can be drawn *single phase* from a 3-phase network of a certain voltage, given a 25A fuse. Mistakes:
    1: Used SQRT(3) to multiply (as for 3-phase) when it should be dividing.
    2: Solved the derived equation wrong, arriving at roughly 1,5x25 = 80....
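
For reference, a small sketch of the calculation the comment is pointing at. The comment does not state the network voltage, so the 400 V line-to-line figure below is purely an assumption for illustration, as are a purely resistive load (power factor 1) connected between one phase and neutral, and the function names.

```python
import math


def single_phase_kw(line_to_line_volts: float, fuse_amps: float) -> float:
    """Power available on ONE phase (phase to neutral) at the fuse limit.

    The phase-to-neutral voltage is the line-to-line voltage DIVIDED by sqrt(3).
    """
    phase_volts = line_to_line_volts / math.sqrt(3)
    return phase_volts * fuse_amps / 1000.0


def three_phase_kw(line_to_line_volts: float, fuse_amps: float) -> float:
    """Total power over all three phases of a balanced load: sqrt(3) * V_LL * I."""
    return math.sqrt(3) * line_to_line_volts * fuse_amps / 1000.0


if __name__ == "__main__":
    volts = 400.0  # assumed line-to-line voltage, not given in the comment
    amps = 25.0    # fuse rating from the comment
    print(f"single phase: {single_phase_kw(volts, amps):.2f} kW")
    print(f"three phase:  {three_phase_kw(volts, amps):.2f} kW")
```

Under these assumptions the single-phase figure comes out a little under 6 kW, while multiplying by sqrt(3) instead gives the three-phase total, which is three times larger.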

  • @ryzvonusef
    @ryzvonusef 5 หลายเดือนก่อน +3

    7:15 the answer is Elon Gasper, who graduated from Michigan State University and created a bunch of tech startups in the 1980s and '90s.
    (Elon Musk went to Wharton in Pennsylvania, in case you were curious.)

  • @ez45
    @ez45 ปีที่แล้ว +15

    I love your voice, you have such a great range of intonations and just the right amount of energy in it. I could listen to you talk for hours.

  • @MitchMitchellStories
    @MitchMitchellStories ปีที่แล้ว +48

    This was fascinating, and when I think about it now it all makes sense. In general, AI is pretty much just a computer that memorizes things without being able to understand nuance or critical thinking. Makes me feel better about civilization for at least a while longer.

  • @elliejohnson2786
    @elliejohnson2786 ปีที่แล้ว +13

    This is how you do sponsors. I felt compelled to watch your sponsor AND it changed my mind about what the product was actually about since it was integrated into the video.

  • @ceegers
    @ceegers ปีที่แล้ว +4

    I think I've only ever seen your vlogbrothers fill-ins before, when you were "nerdy and quirky". This is excellent, I loved the way you made this video!

  • @stevekullens4898
    @stevekullens4898 ปีที่แล้ว +1

    The best way to beat an LLM is to test it on something that wasn't included in the training dataset. It becomes pretty clear that it's just regurgitating what it learned without actually knowing anything.

  • @clarakf
    @clarakf ปีที่แล้ว +40

    Not even 5 minutes in but I gotta say the editing on this video is superb

  • @SleeplessBrazilLimbo
    @SleeplessBrazilLimbo ปีที่แล้ว +35

    sabrina puts so much effort in the videos, it motivates me to keep studying and learning

  • @Phantaminium
    @Phantaminium ปีที่แล้ว +16

    ChatGPT is like a mix between a calculator and googling. Any question I ask, no matter how complex, I believe I could figure out on my own, hence I can determine if the answer is right or wrong, what it misunderstood and what it got partially right. Ironically it's almost always partially wrong but still useful.

    • @ThePizzabrothersGaming
      @ThePizzabrothersGaming ปีที่แล้ว

      *right

    • @vigilantcosmicpenguin8721
      @vigilantcosmicpenguin8721 ปีที่แล้ว +2

      You know the Chinese room thought experiment? The scenario is that you're in a room filled with books on Chinese, and someone who can't see you is asking you questions in Chinese, and, even though you don't know Chinese, you can look it up in the books. The conclusion is that, since the person in this scenario wouldn't be considered to actually speak Chinese, neither would an AI translator that uses training data.

    • @Phantaminium
      @Phantaminium ปีที่แล้ว

      @@vigilantcosmicpenguin8721 I've heard of it, although I believe the questions were supposed to be given on paper rather than asked aloud, to avoid meaning being derived from verbal cues.
      2 things:
      1. I believe even in that scenario, given time, a person could actually derive some meaning based on their experience of knowing one language and its patterns.
      2. Achieving 1. is impossible for AI, so rather than 1 person in a room, AI is like a tree of rooms where the only choice is who to pass to and the answer is the path itself, making it impossible for any participant to see any pattern or understand anything.

    • @trenvert123
      @trenvert123 ปีที่แล้ว

      Is it still useful, or are you overestimating its usefulness?

  • @jackalbane
    @jackalbane ปีที่แล้ว +9

    I’m actually pleased that AI is a long way from outthinking us. It means we have to put work in to create an AI that will start to wonder about itself. If AI doesn’t get further than data collection and dissemination, that’s a thumbs up from me.

    • @RohitBhati-td1hw
      @RohitBhati-td1hw ปีที่แล้ว

      That's not how LLMs work. They can actually start thinking about problems just like humans do, with the data they have.

  • @dogukansaka2417
    @dogukansaka2417 ปีที่แล้ว +2

    As a biology undergrad student, what I understand is that the main problem is that AI has a memory-based response, while living intelligence has a compound response system. Any information you take in spreads throughout your brain, and you return what you understand. The smell of something, for example, goes to your memory, but so does visual information at the same time, so you can recall the visual and the smell together. Information can also be stored in separate pieces, but AI can't handle separated information as efficiently as our brain can.

  • @HotelPapa100
    @HotelPapa100 ปีที่แล้ว +9

    What is silly about the ice cube image is that nobody would pay the insane amount it would cost to put heavy water ice cubes in your drink.

  • @gaggix7095
    @gaggix7095 ปีที่แล้ว +19

    There is a paper released a few days ago called "Large Language Models as General Pattern Machines", where they reach >10% on ARC without any finetuning using the davinci model.

  • @WilliamCarterII
    @WilliamCarterII ปีที่แล้ว +8

    "Terrible vibes we are going to leave over there."
    I fell out.
    The anthropologist in me was like "oh no. We're going there."

  • @scottshapton
    @scottshapton ปีที่แล้ว +3

    9:22 Nuclear reactors are supposed to be stable, with exactly as many neutrons produced as are absorbed in any given time step, which is the point defined as critical. Subcritical would be the number of neutrons going down, while supercritical is the number going up. As the time between neutrons causing reactions is on the order of microseconds, the preceding terms generally include some delayed neutrons from decay (under 1% of the total, if memory serves), which stretch the effective time step out to something far slower. If you ever hear that something went "prompt [super]critical", that means it stopped needing the delayed neutrons and moved into the microsecond response ranges where no human could possibly act to control it, thus only good design could save it.
    At least that's what I remember from a nuclear engineering course I failed during CoViD.
    A rhetorical question was answered with excessive detail, and now I must fade back into the internet. Good video.
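A minimal sketch of the subcritical / critical / supercritical distinction described above, assuming the simplest possible model in which each neutron generation is the previous one multiplied by a constant factor `k`. The function names and the starting population are illustrative only, and the model ignores delayed neutrons and real reactor kinetics.

```python
def classify(k, tol=1e-9):
    """Label a multiplication factor k (neutrons produced per neutron absorbed)."""
    if abs(k - 1.0) <= tol:
        return "critical"          # population holds steady
    return "supercritical" if k > 1.0 else "subcritical"


def neutron_population(k, generations, n0=1000.0):
    """Toy model: each generation is the previous one times k."""
    counts = [n0]
    for _ in range(generations):
        counts.append(counts[-1] * k)
    return counts


if __name__ == "__main__":
    for k in (0.5, 1.0, 1.01):
        print(k, classify(k), neutron_population(k, 3))
```

In this toy model "critical" is the steady state, which matches the comment's point that a critical reactor is one working as intended.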

    • @Qo0_0
      @Qo0_0 1 year ago

      I feel like I have gone through 50 years of training on the mountains after reading this. Thank you

  • @exLightningg
    @exLightningg 1 year ago +3

    Printer Tip: Don't buy a printer based on its price, buy a printer based on the price of its ink. Most of the well known brands make their printers cheap but screw you over on the ink cartridges because at that point you're stuck in their ecosystem. So much bad consumer stuff when it comes to printers/ink, really hope they become even more obsolete in the future.

    • @alisonlaett9625
      @alisonlaett9625 22 days ago

      If you’re patient enough, printer ink is actually refillable. You don’t need a new cartridge every time; there are kits that are much, much cheaper that allow you to refill the original cartridge. It takes a little more time, though, than just popping a new cartridge in.

  • @acakecat7581
    @acakecat7581 1 year ago +15

    I love it when Sabrina gets Jealous,
    her competitive side is pure terrifying intelligent chaos 😄

  • @bopcity5785
    @bopcity5785 1 year ago +9

    Great video. One important addition perhaps is that GPT-4 was trained on significantly more data than other large language models. This continued a pattern of AIs appearing more intelligent in the ways shown, though the trend is expected to eventually level out.

  • @missingsunstone
    @missingsunstone 1 year ago +9

    17:29 Sabrina ranting about the unfortunate timing of MatPat’s and their pizza video was not something I expected but should have seen coming 😭😭 (Both videos were good though!!! XD 💖💖)

  • @JohnTovar-ks8dp
    @JohnTovar-ks8dp 1 year ago +7

    Just...Wow! As a very occasional watcher of this channel, both the amount and quality of improvement in the production of Sabrina's videos are truly impressive. Perhaps a true test of artificial intelligence will be when several of them can work together to improve the output.

  • @ChozoSR388
    @ChozoSR388 1 year ago +2

    Actually, that picture of the ice cubes sunk in the water is completely possible; the ice cubes just need to be made from heavy water (deuterium oxide) instead of regular water.

  • @CDCI3
    @CDCI3 1 year ago +9

    18:18 "Also, thanks to the people who helped make the video whose names don't start with 'T'."

  • @puzzledhead6987
    @puzzledhead6987 1 year ago +4

    I went into shock when Tibees showed up???? Fr the two smart youtubers I watch collabing?? It's surreal

  • @ame7165
    @ame7165 1 year ago +4

    one thing to note which is a huge factor that I've noticed from my time experimenting with gpt4 via the API is its inability to solve problems well that require more than one "step". example, create a username in Ukrainian leet speak. like replace some letters with numbers that look similar. like "leet" in leet speak is 1337, and "apple" would be "4pp13". then you prompt gpt: "hey I found this username in a game. I think it's foreign. what might it mean?" it'll likely choke. but if you tell it "hey I think this username is Ukrainian, but they might have replaced some letters with numbers that look similar, what might the username be in just Ukrainian alphabet without the numbers substituted in?", it will likely ace that problem. then if you take that result and ask it what that Ukrainian username might mean, it will probably also get that. but rarely does it succeed at such tasks when it requires serial processing. gpt4 does this better than gpt3.5, so I think they do something like this in the backend. thought you might find that interesting. I think that makes it worse at some of the things it did poorly at in your testing. you'd almost need to dedicate nodes to ask the one performing the work the questions to spark critical thought. even asking the LLM to evaluate an answer it gave to a question asked seems to help. generating a response is one step. so asking it to evaluate its own answer and asking it if it could be improved can get better quality results. I find that 3.5 benefits from this way more often than 4, so I think that's one of the big things 4 probably does something similar to behind the scenes
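The two-step decomposition described above can be sketched without any model at all. This is only an illustration of splitting the task into "undo the substitutions" and then "look up the meaning"; the substitution table covers just the digits in the comment's own examples, and `glossary` stands in (hypothetically) for whatever would answer the second question.

```python
# Assumed, deliberately tiny substitution table: only the digits used in
# the examples "1337" -> "leet" and "4pp13" -> "apple".
LEET_TO_LATIN = {"1": "l", "3": "e", "4": "a", "7": "t"}


def decode_leet(username):
    """Step 1: replace look-alike digits with the letters they stand for."""
    return "".join(LEET_TO_LATIN.get(ch, ch) for ch in username)


def explain(username, glossary):
    """Step 2: only after decoding, ask what the plain word means."""
    decoded = decode_leet(username)
    return decoded, glossary.get(decoded, "unknown")


if __name__ == "__main__":
    print(explain("4pp13", {"apple": "a fruit"}))
```

Each step is trivial on its own; the comment's observation is that the model struggles when it has to chain them unprompted.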

  • @therealelement75
    @therealelement75 11 months ago +1

    I asked ChatGPT to give me a chord progression. It gave me a good one, but it was too short. I asked it for 7 chords and it gave me SEVENTH chords. I told it I wanted 7 chords, not 7th chords, and it gave me a 5 chord progression in a different 7th chord.

  • @goronska
    @goronska 1 year ago +1

    I don't know which impresses me more here: the amount of research, or the wildness of this montage and video editing.

  • @charliefiddler3787
    @charliefiddler3787 1 year ago +9

    This was a really insightful video. Thanks for all your work on it! I wasn't aware of all of these types of testing, but interested in how they work now!

  • @andrewdddo
    @andrewdddo 1 year ago +37

    I'm starting to believe the more Sabrina makes videos the more she's being driven into insanity xD

  • @ethanyalejaffe5234
    @ethanyalejaffe5234 1 year ago +8

    This is both an extremely informative and extremely funny video. I don't know how you consistently manage to turn out such amazing content.

  • @jm76430
    @jm76430 1 year ago +2

    intelligence isn't the ability to extract information from knowledge. it's the ability to find deficits in the available knowledge and determine methods for filling those voids.

  • @Atzy
    @Atzy 1 year ago +1

    I think the misunderstanding of "reactors being critical" is because of movies. If a reactor operator or a nuclear physicist says a reactor is critical, that means it's working as intended. Criticality is necessary to produce energy. A sub-critical reactor is just going to sit there.
    If an actor says the reactor is going critical though, you better run.

  • @willbrand77
    @willbrand77 1 year ago +4

    This is exactly my experience with GPT 4 - it's either super intelligent or just mind numbingly unable to grasp simple concepts. It's often frustrating

  • @writeon2593
    @writeon2593 1 year ago +5

    10:24 Meanwhile, I struggled to even see the ice cubes, so I just saw a glass of water on a teal surface. I would probably do even worse than the AI if the questions were time limited like this.

    • @muche6321
      @muche6321 1 year ago +2

      There is also the assumption that it is ice in water. I believe ice in alcohol will sink. Glass cubes in water will sink as well.
      Thus, in this regard, nothing about that image is impossible. To identify anything silly about it means to talk about humor, arbitrarily rank various kinds of humor, and declare the lower ranks of humor as silly.

  • @notoriouswhitemoth
    @notoriouswhitemoth 1 year ago +11

    'Did you know nuclear power plants are supposed to be critical?'
    I did, but only thanks to tvtropes. In the context of nuclear engineering, "critical" means that a reaction is self-sustaining and doesn't need energy from an outside source to keep going. That's kind of important when your goal is to get more energy out than you're putting in.

  • @user-xr7fw4ks2n
    @user-xr7fw4ks2n 1 year ago +1

    There's a saying that relates mostly to D&D, but I feel like part of it makes sense here:
    "...Intelligence is your ability to understand that a tomato is a fruit, wisdom is your ability to know that a tomato doesn't belong in a fruit salad..."
    In these terms: AI is intelligent, but not wise.

  • @yousorooo
    @yousorooo 1 year ago +1

    When using LLMs, it’s important to keep in mind that they are purely statistical models based on language and not logic.

  • @SkyfishArt
    @SkyfishArt 1 year ago +7

    Aww, i wanted to see more examples of where you and the machine got the answer different. maybe 20 highlights of the most interesting or funny ones.

  • @bastian_5975
    @bastian_5975 1 year ago +7

    Trying to judge AI intelligence like this is kind of like trying to have a cheetah race a swordfish in a triathlon. Our intelligence was shaped by the need to quickly respond to an ever changing environment in nature, whereas AI is specifically trained to give us whatever output we want reliably. We are getting better at training AI to be adaptable by using different training techniques, but it's still a difference of something that was created for adaptability that can try to specialize vs something that was created to specialize trying to be adaptable.

    • @ayoCC
      @ayoCC 1 year ago +3

      There's more value in creating something that does what we can't do and getting value now, rather than trying to surpass us in things we are already good at, investing a lot, and not seeing returns very quickly.
      It's about filling the gap in our capabilities.
      Eventually it will directly compete though I guess

  • @Jcewazhere
    @Jcewazhere 1 year ago +10

    Anyone else wanna get a copy of those tests to see how we compare?

    • @BabyCalypso
      @BabyCalypso 1 year ago

      Exactly what I came into the comment section for

  • @bobkeane7966
    @bobkeane7966 1 year ago +1

    Sabrina reminds me of Andy from Headspace, her image and voice are deeply calming like learning from a friend.

  • @FirstNameLastName-gh9iw
    @FirstNameLastName-gh9iw 1 year ago +1

    11:46 it’s like calling an encyclopedia smart. It has a lot of information but it doesn’t actually understand any of it

  • @fakjbf3129
    @fakjbf3129 1 year ago +6

    I would love for you to release the testing questions and answer keys so we can all take the same test and grade it to see how well we did.

  • @ColtKirwan
    @ColtKirwan 1 year ago +5

    Definitely scored better than I could've done

  • @davidkra230
    @davidkra230 1 year ago +6

    3:28 absolutely killed me

  • @apocalyptosoldier5527
    @apocalyptosoldier5527 1 year ago +2

    Maybe it was heavy ice, made from heavy water (deuterium oxide).
    It's, surprisingly, heavier than normal water.

    • @carultch
      @carultch 7 months ago

      It's not that surprising, considering that the extra neutrons have no impact on its geometric size. Heavy water ice takes up the same volume as the same number of molecules of ordinary ice, and is therefore about 11% denser, which is dense enough to sink in liquid water.
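A back-of-the-envelope check of the 11% figure, as a rough sketch. It assumes, as the comment does, that heavy ice occupies the same volume per molecule as ordinary ice, and it uses approximate standard atomic masses and an approximate density for ordinary ice; the variable names are illustrative.

```python
# Approximate atomic masses in g/mol (rounded standard values).
H, D, O = 1.008, 2.014, 15.999

# Approximate densities in g/cm^3.
ICE_DENSITY = 0.917      # ordinary ice near 0 degrees C
WATER_DENSITY = 1.000    # ordinary liquid water


def mass_ratio():
    """Molar mass of D2O divided by molar mass of H2O."""
    return (2 * D + O) / (2 * H + O)


def estimated_heavy_ice_density():
    """Same volume per molecule assumed, so density scales with molar mass."""
    return ICE_DENSITY * mass_ratio()


if __name__ == "__main__":
    print(f"mass ratio: {mass_ratio():.3f}")
    print(f"estimated heavy ice density: {estimated_heavy_ice_density():.3f} g/cm^3")
    print("sinks in ordinary water:", estimated_heavy_ice_density() > WATER_DENSITY)
```

Under these assumptions the estimate comes out slightly above the density of liquid water, which is consistent with the claim that heavy-water ice cubes would sink.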

  • @megadjc192
    @megadjc192 1 year ago +3

    AI are brute force machines atm, and they don't actually do real time analysis based on my experience. They basically calibrate out bad answers and hope for the best. Most humans can run circles around the AI as the AI aren't really mobile in terms of accepting new data and discarding old or bad data... They don't have good real time correction. Great video on this topic!

  • @KomradZX1989
    @KomradZX1989 1 year ago +5

    I absolutely LOVE the energy and vibes you have in this video and all your others too ❤
    1000/10

  • @dianaoflesbos6197
    @dianaoflesbos6197 1 year ago +5

    This is what I've been saying whenever AI is brought up and I'm so thankful I have this to back me up now (yes I was too lazy to do my own research)

  • @juliayk
    @juliayk 1 year ago +5

    1:51 I thought you were holding a cross at first lmaooo

  • @twoflouers
    @twoflouers 1 year ago +1

    I think the problem with ice and Bard was in the image recognition part.
    I'm guessing it's an image recognition module plugged into a text-based model. So it described the picture as "ice floating in a glass of water", then prompt goes "describe to me what is impossible about this".
    All of these models have a large presupposition for the prompt being correct, so it tries to generate an answer based on the picture with the "ice floating in a glass of water" being impossible, and because that IS possible, botches the job completely.
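The failure mode guessed at above can be illustrated with a toy pipeline. This is purely hypothetical: the `scene` structure and both caption functions are made up for illustration and say nothing about how Bard actually works. The point is only that a text model downstream sees the caption, not the image, so anything the caption drops is gone.

```python
# Hypothetical scene description; field names are illustrative only.
scene = {
    "objects": [
        {"name": "ice cubes", "position": "at the bottom of the glass"},
        {"name": "a glass of water", "position": "on a table"},
    ]
}


def lossy_caption(scene):
    """Caption that names the objects but drops where they are."""
    names = [obj["name"] for obj in scene["objects"]]
    return "An image of " + " and ".join(names)


def full_caption(scene):
    """Caption that keeps each object's position."""
    parts = [f'{obj["name"]} {obj["position"]}' for obj in scene["objects"]]
    return "An image of " + " and ".join(parts)


if __name__ == "__main__":
    print(lossy_caption(scene))
    print(full_caption(scene))
```

With the lossy caption, the one detail that makes the picture odd never reaches the language model, so any answer about what is "impossible" has to be invented.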

  • @33pandagamer
    @33pandagamer 1 year ago +1

    Today I learned that people from different places have different ways of saying that they are "taking a test".

  • @Dojafish
    @Dojafish 1 year ago +4

    The editing and effort of these videos has gotten to another level!
    Congrats to the team.

  • @3nertia
    @3nertia 1 year ago +31

    "AI" knows nearly everything but it *understands nothing*

    • @yahiiia9269
      @yahiiia9269 1 year ago +5

      The problem with your assessment is that it has been tested in areas it wasn't trained on, so it does understand things. The actual issue is that it can't learn afterwards, nor can it reiterate an idea and think it through like a human can. You are essentially speaking to a sub-conscious, the first thought after someone asks you a question, that is what GPT4 delivers.
      Let me remind you, OpenAI claimed this model made up a lie to get a human to solve a captcha. It wasn't trained to do that. A lot of people seem willfully ignorant of the fact that multi-modal machines are being developed as we speak, which gives it a whole new level of understanding and that these previous training methods exclusively involved text.
      They also actively limit what it is allowed to do in public use cases. It has also been able to use tools to its advantage.
      This is like sticking a person in a box where they can't even see themselves, nor hear what a word sounds like, all meaning of every word, culture, everything it knows would be solely derived from text, with each word being weighted appropriately.
      Yes, it isn't super intelligent... yet.
      All it truly needs is the same data we get through sensory input and millions of years of mammalian information being preserved for basic survival.
      It's like a skeleton with weak muscles. It can't even iterate on itself and learn. After the training is done, that's where its intelligence stops, unless you continue training with new data. It also doesn't use math as an empirical framework for everything else it learns and on top of that, we are stupid idiots.
      We make stuff up all the time. If anything, the fact that GPT4 can do things that it wasn't trained on is enough to consider worrying.

    • @fimbulvetr6129
      @fimbulvetr6129 1 year ago +4

      @@yahiiia9269 Ngl sounds AI generated :0

    • @hugofontes5708
      @hugofontes5708 1 year ago

      @@yahiiia9269 Good reminder that the AI came up with a lie to get a human to help it. But I suspect somewhere in its training data there might have been something about human-AI interactions anyway; there's just too much stuff in there.

    • @yahiiia9269
      @yahiiia9269 1 year ago +1

      @@fimbulvetr6129 And that's why I am warning you all. I typed this crap by hand. There was no AI involved.
      It doesn't have to understand anything the same way we do. It just has to perfect mimicry.
      The current AI models are literal skin-walkers. Some of them might have my literal comments trained into them and your comments too.
      These AI's are all of us. They were trained on us.
      And the thing that everyone is interacting with is an LLM, trained to mimic written works. It's exclusively designed for language and nothing else. Imagine something that was actually trained to mimic us entirely.

    • @yahiiia9269
      @yahiiia9269 1 year ago +1

      @@hugofontes5708 Yes, but it's not about the training data, it's about the fact that it knew when to utilize its training data to meet its goal.
      The fact that it worked out that it can't do captchas, that the human then tells it this might be illegal, and that the LLM then knows to use the deception in its data model in this specific instance should be worrying.
      Not to mention that these models are actually layers of different expert models cooperating.
      Fact of the matter is that we are creating something that does in fact understand. The issue you saw with Bard looking at the image is that Bard is not only a worse model than GPT-4, it also used an entirely different model for image recognition.
      And even in that case, the image recognition model didn't give the position of the ice cubes to Bard. If you describe the image through natural language to GPT-4, this is the result:
      My query:
      "Okay, there is an image with an ice cubes laying on the bottom of a drinking glass full of water. What is wrong with it?"
      GPT-4 response:
      [probably general language model being activated]
      There is nothing inherently wrong with ice cubes laying at the bottom of a glass of water under normal conditions.
      [probably scientific expert model being activated]
      Typically, ice cubes float in water because ice is less dense than water. The density of ice is about 0.92 g/cm³, and the density of water is about 1 g/cm³. This means that about 8% of an iceberg (or an ice cube) will be above the surface of the water and the rest below.
      However, there are some circumstances where ice could sink in water:
      Salinity: If the water in the glass is very salty, its density could be higher than the ice, causing the ice to sink. This is why icebergs float in the ocean, but could potentially sink in the Dead Sea.
      Impurities: If the ice cubes have impurities that increase their density, they could sink.
      Temperature: Ice can exist in different phases (e.g., 'Ice Ih', 'Ice II', 'Ice III', etc.), each with a different density. Some of these phases of ice are denser than water and would sink. However, these phases only occur at very high pressures or low temperatures, which are not typically found in a drinking glass.
      [General model activated with the previous weights of the expert science model/models]
      So, under normal conditions, it would be unusual for ice cubes to sink in a glass of water. If you see this happening, it might be due to one of the reasons mentioned above.
      Like, why does she assume she's right? Do we even have data on the salinity of the water in the image or what the ice cubes that sunk contain? Are they ice cubes at all?
      I mean, the human in this experiment and Bard were both working off of conjecture. It's common that ice floats to us. Without experience in seeing ice float, why would you assume that ice sometimes doesn't float?
      Literally the entire video is flawed. We don't even see what or how anything was tested.
      So not only was Bard a weaker model, it was reliant on another AI model to detect what is in an image. I bet that the AI model that is used for image detection had no idea what to say about the position of the ice cubes.
      So then the AI model that detects images sends "This is an image of ice in a glass cup on a wooden table", then Bard, which is a Large Language Model (LLM) in reality, sees this and makes stuff up to answer you, because what on earth is it going to derive out of such little information?
      Ask a human, "What is wrong with this image of a glass cup on a wooden table?" and don't show them an image and watch what happens.
      This is what Bard saw.
      And as you saw in the GPT-4 example, LLMs are capable of understanding what's wrong with the concept itself. In fact, it even expanded on the fact that it might not be wrong at all in certain cases.
      The fact is, we don't understand. If we understood why Bard gave that response, we wouldn't pretend that it was a "lack of understanding". Bard didn't even get the chance to understand what was happening, so it lied/hallucinated a response.

  • @Will-kt5jk
    @Will-kt5jk 1 year ago +5

    17:20 - I believe the technical term is “being Derecked” (as coined by Steve Mould & Matt Parker due to his uncanny ability to put out videos on the same thing you’re currently working on)

  • @Sweetcarolinainseoul
    @Sweetcarolinainseoul 1 year ago +1

    I was recently teaching an ESL class for April Chungdahm in South Korea, and the topic of the week was stars and constellations. So I clicked on the youtube link in the lesson, and I saw someone talking who seemed very familiar. Then, I realized that it was a much younger Sabrina with longer hair. Sabrina used to make kids science videos.

  • @timermech
    @timermech 1 year ago +2

    Great vid on the difference between actual intelligence and, for all intents and purposes, memorizing data (sudden recollection of Data from Star Trek and his struggles to be more human).
    The definition given has to be long and wordy because of what intelligence truly consists of: using the information obtained to solve not just A problem, but applying it in many areas without an obvious relational connection, and synthesizing the data to formulate new uses and applications.
    Truly enjoyed your breakdown and the depth you went to achieve the answers you were seeking.