ChatGPT can't do math...

  • Published 9 Feb 2025
  • 🌏 Get NordVPN 2Y plan + 4 months extra ➼ nordvpn.com/to... It’s risk-free with Nord’s 30-day money-back guarantee! ✌
    Against my better judgement, I decided to give ChatGPT another chance to solve a maths exam. This is the 2023 British Mathematics Olympiad Round 1. Has it improved since last time?
    The exam paper remains the property of UKMT and is used under a fair usage policy with no commercial gain. Find out more about the work of UKMT here: ukmt.org.uk/wh...
    Watch ChatGPT-3 take the Oxford Maths Admissions Test here: • Can ChatGPT Pass the O...
    You can download the 2023 BMO exam to try it for yourself here: ukmt.org.uk/wp...
    You can try all of the competition papers produced by UKMT for free here: ukmt.org.uk/co...
    Produced by Dr Tom Crawford at the University of Oxford. Tom is Public Engagement Lead at the Oxford University Department of Continuing Education: www.conted.ox....
    For more maths content check out Tom's website tomrocksmaths....
    You can also follow Tom on Facebook, Twitter and Instagram @tomrocksmaths.
    / tomrocksmaths
    A HUGE thank you to all of my patrons for their support:
    Dr Peet Morris
    Jeryagor
    John Hanson
    Rodhern
    Denise
    Cooper Healy
    Hiro
    Delicious Rose
    Bin Liu
    Check out the Tom Rocks Maths Patreon page here: / tomrocksmaths

Comments • 268

  • @TomRocksMaths
    @TomRocksMaths  3 months ago +17

    🌏 Get NordVPN 2Y plan + 4 months extra ➼ nordvpn.com/tomrocksmaths It’s risk-free with Nord’s 30-day money-back guarantee! ✌

    • @lio1234234
      @lio1234234 3 months ago +1

      These models don't do any background reasoning (essentially thinking before answering). I definitely recommend trying out o1-mini, which does do this. Currently o1-mini does better at maths than o1-preview, but o1-preview has better general-knowledge reasoning. o1, when it's finally released, should be just downright better than o1-mini at everything, including maths.
      Highly recommend trying some of these out on that model :)

    • @IsZomg
      @IsZomg 3 months ago

      This uses ChatGPT 3, which is outdated. The latest free-tier model is ChatGPT 4o and the top model is o1. Both of these are much better at math than ChatGPT 3, which is TWO YEARS OLD now.

  • @nofilkhan6743
    @nofilkhan6743 3 months ago +413

    Chatgpt doing black magic instead of geometry.

    • @asiamies9153
      @asiamies9153 3 months ago +6

      It sees the world differently

    • @delhatton
      @delhatton 3 months ago +24

      @@asiamies9153 it doesn't see the world at all

    • @alexandermcclure6185
      @alexandermcclure6185 3 months ago +6

      @@delhatton that's still different from how humans see the world. 🙄

    • @obiwanpez
      @obiwanpez 3 months ago

      “Narn, flëmadoch, F’Tadn ygsorath, loqgawtygsdryr!”

    • @obiwanpez
      @obiwanpez 3 months ago

      Seems to be a Deep Language learning model…

  • @narutochan620
    @narutochan620 3 months ago +190

    ChatGPT invoked the Illuminati on the Geometry question 😂

  • @bekabex8643
    @bekabex8643 3 months ago +108

    the geometry drawing it produced had me gasping for air 🤣

  • @Eagle3302PL
    @Eagle3302PL 3 months ago +146

    The problem is that ChatGPT, or any LLM, is not applying formal logic or arithmetic to a problem. Instead they regurgitate a solution they tokenized from their training set and try to morph the solution and the answer into the context of the question being asked. Therefore, just like a cheater, it can often give a correct result confidently because it has memorised that exact question; sometimes it can even substitute values into the result to appear to have calculated it, but in the end it's all smoke and mirrors. It didn't do the math, it didn't think through the problem. That's why LLMs crumble when never-before-seen questions get asked: an LLM has no understanding, only memorisation. LLMs also crumble when irrelevant information is fed alongside the question, because the irrelevant information impacts the search space being looked at, so accuracy of recall is reduced.
    LLMs do not think and do not process information logically; rather they process input and throw out the most likely output, using some value substitution in the result to appear to be answering your exact question.
    LLMs cannot do mathematics. At best they can spit out likely solutions to your questions where similar or those exact questions and their solutions have been fed to them in their training set. An LLM knows everything and understands nothing.

    • @mattschoolfield4776
      @mattschoolfield4776 3 months ago +12

      I wish everyone understood this.

    • @Nnm26
      @Nnm26 3 months ago +12

      Try o1 brother

    • @mattschoolfield4776
      @mattschoolfield4776 3 months ago +7

      @@Eagle3302PL It's even in the name: Large Language Model. I don't get how anyone thinks they have any understanding.

    • @IsZomg
      @IsZomg 3 months ago +12

      New o1 model can 'show its work' and reason in multiple steps. If you think LLMs won't beat humans at math soon you are mistaken.

    • @CoalOres
      @CoalOres 3 months ago +1

      They _might_ process information logically, we actually don't know. Since they generate it word by word (or token by token), after enough training it might have learned some forms of logic because it turns out those are very good at predicting the next token in logical proofs. Logic is useful for many different proofs, just memorizing the answer is only useful for a single one (i.e. it would be trained out pretty quickly); this doesn't guarantee it knows logic, but it makes it plausible.
      It is a common misconception that these programs work by searching the dataset, 3Blue1Brown has an excellent video series I would recommend that shows just how complex its underlying mechanics actually are.

  • @shoryaprakash8945
    @shoryaprakash8945 3 months ago +70

    I once asked ChatGPT to prove that π is irrational. It gave back the proof that √2 is irrational, discussed the squaring-the-circle problem, and in its final conclusion wrote "hence π is irrational".

    • @RFC3514
      @RFC3514 3 months ago +8

      Wow, it independently (re)discovered the Chewbacca defence!

    • @JagoHerriott
      @JagoHerriott 18 days ago

      @@RFC3514 No it didn't. ChatGPT can't invent; it just knows the Chewbacca defence and tried to show how to get to the answer. Look at @Eagle3302PL's comment.

  • @tymmiara5967
    @tymmiara5967 3 months ago +15

    It becomes obvious that the language model is essentially a separate module from the image generator. I bet even if the solution had been flawlessly found, the drawing of the diagram would be completely bonkers.

  • @Lightning_Lance
    @Lightning_Lance 2 months ago +9

    I feel like ChatGPT may have taken your first message to be meant as a compliment rather than as a prompt that it should pretend to be you.

  • @yagodarkmoon
    @yagodarkmoon 3 months ago +20

    Question 3 the geometry one ends up much better when you give it the graph with the instructions. I tried it and got a much better result. To do this I used the snipping tool to make an image of both the question and the graph. Then I saved it to desktop as screenshot.jpg and dragged that into the ChatGPT window. It read them both fine.

    • @Pedro-op6zj
      @Pedro-op6zj 2 months ago

      After using the snipping tool you can directly Ctrl+C / Ctrl+V into ChatGPT.

  • @toshiv-y1l
    @toshiv-y1l 3 months ago +6

    20:42 power of a point is a basic geometry theorem...

  • @TheDwarvenForge05
    @TheDwarvenForge05 2 months ago +2

    ChatGPT has, on multiple occasions, told me that odd numbers were even and vice versa

  • @dmytryk7887
    @dmytryk7887 3 months ago +9

    In Q1 there seems to be an error in chatgpt's explanation. For example, it says "D" must be in position 7, 8 or 9 but "DOLYMPIAS" is a valid misspelling...every letter is one late, except for D (early) and S (correct).

    • @SgtSupaman
      @SgtSupaman 3 months ago +5

      Yeah, its mistaken assumption that a letter must be within one position of its original location (in either direction) actually limits the number of possible permutations to 55.
      So, it definitely didn't properly pair up its explanation with its answer.

    • @coopergates9680
      @coopergates9680 2 months ago

      You caught it first. I'm surprised GPT could pull out the correct number while misunderstanding the terms along the way.

    • @GS-td3yc
      @GS-td3yc 2 months ago

      @@coopergates9680 it literally did 2^9=2^8=256

  • @JavairiaAqdas
    @JavairiaAqdas 3 months ago +2

    We can add the shape through the attachment icon in the left corner of the prompt box: just take a screenshot of the figure and upload it like that.

  • @rostcraft
    @rostcraft 3 months ago +2

    Power of a point is actually real, and while I'm usually bad at geometry at olympiads, some of my friends have used it several times.

    • @deinauge7894
      @deinauge7894 3 months ago

      OK. To use this at the point Z you need two lines through Z which cut a circle in 1 or 2 points. Say this circle is centered at B with radius BA. If the line through Z and B meets the circle at W and W', power of a point gives:
      ZX*ZY = ZW*ZW'
      Since ZW = ZB - BA and ZW' = ZB + BA,
      we get
      ZX*ZY = ZB*ZB - BA*BA.
      This looks almost like what ChatGPT wrote. I'd give it a pass 😂
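Power of a point is easy to sanity-check numerically. The sketch below uses arbitrary made-up coordinates (B, r and Z here are illustrative, not the configuration from the exam question): for a point Z outside a circle centred at B with radius r, every line through Z that meets the circle at X and Y gives the same product ZX·ZY = ZB² − r².

```python
import math

# Arbitrary test data (hypothetical, not the exam's configuration):
# a circle centred at B with radius r, and an external point Z.
B = (0.0, 0.0)
r = 2.0
Z = (5.0, 1.0)

def secant_product(theta):
    """Return ZX * ZY for the line through Z with direction angle theta,
    or None if that line misses the circle."""
    dx, dy = math.cos(theta), math.sin(theta)
    fx, fy = Z[0] - B[0], Z[1] - B[1]
    # Points on the line are Z + t*(dx, dy); |Z + t*d - B|^2 = r^2
    # reduces to the quadratic t^2 + b*t + c = 0.
    b = 2 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - r * r
    disc = b * b - 4 * c
    if disc < 0:
        return None
    t1 = (-b - math.sqrt(disc)) / 2
    t2 = (-b + math.sqrt(disc)) / 2
    return abs(t1) * abs(t2)  # signed parameters -> lengths ZX, ZY

power = (Z[0] - B[0]) ** 2 + (Z[1] - B[1]) ** 2 - r * r  # ZB^2 - r^2
for theta in (0.1, 0.4, 2.5):
    prod = secant_product(theta)
    if prod is not None:
        assert abs(prod - power) < 1e-9
print("power of Z:", power)  # prints "power of Z: 22.0"
```

The product of the two roots of that quadratic is exactly its constant term ZB² − r², which is the entire content of the theorem: the product doesn't depend on the direction of the line.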

  • @gtziavelis
    @gtziavelis 3 months ago +2

    19:35 LOL, the diagram drawing looks like equal parts 1) M.C. Escher, 2) the Indian Head test pattern from the early days of television, 3) steampunk, 4) the Vitruvian Man. It's all sorts of incorrect, its confidence is a barrel of laughs, but it's lovely to look at and fun to contemplate how ChatGPT may have come up with it. My favorite part is the top-center A with the additional 'side shield' A, and honorable mention to how the matchsticks of the equilateral triangle have three-dimensional depth and shadows.

  • @loadstone5149
    @loadstone5149 3 months ago +12

    Tom is not locked in. Every uni maths student knows if you take a picture of the question it will always give you the right answer

    • @micharijdes9867
      @micharijdes9867 2 months ago

      Facts, but for some reason it has a really hard time with topology

  • @gergoturan4033
    @gergoturan4033 3 months ago +1

    I've only watched up to the first question so far, but I came up with a different solution that's interesting enough to mention. Another way to think of the problem is dividing the characters into 2 subsets: one is the characters that were typed 1 late, and the other is all the ones that weren't. If all the characters are different, these 2 sets give enough information to reconstruct any possible spelling. Therefore, we just need to count all the ways to make these subsets.
    We know that in an n-character-long word the last character can never be 1 late, so we only have n-1 letters to work with. [n-1 choose k] gives us a k-sized subset. To get all possible subsets, we sum over every case of k:
    [sum(k = 0..n-1)(n-1 choose k)]
    This is row n-1 of Pascal's triangle, and the sum of row n-1 is 2^(n-1). The word "OLYMPIADS" has 9 letters, therefore the answer is 2^8, which is 256.
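The counting argument above can be verified by brute force over all 9! arrangements (a quick sketch, assuming the rule is that every letter may appear at most one position later than its correct position, but arbitrarily early):

```python
from itertools import permutations

WORD = "OLYMPIADS"  # 9 distinct letters
n = len(WORD)

# perm[i] = position where the i-th letter of the word ends up
# (0-indexed). A typing is valid when every letter is at most one
# position late, i.e. perm[i] <= i + 1; arbitrarily early is fine.
count = sum(
    all(pos <= i + 1 for i, pos in enumerate(perm))
    for perm in permutations(range(n))
)
print(count)  # 256
assert count == 2 ** (n - 1)
```

Checking all 362,880 permutations takes well under a second and agrees with the 2^(n-1) formula.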

  • @jppereyra
    @jppereyra 2 months ago +3

    Our jobs are safe, ChatGPT can’t do maths at all.

  • @abdulllllahhh
    @abdulllllahhh 3 months ago +3

    On an unrelated note, I remember sitting this BMO paper last year and struggling but enjoying it. I recently started uni in Canada and have been training for the Putnam, and now I'm looking back at these questions both cringing and feeling proud of how much I've grown in just a year: how I've gone from finding these questions tough to now being able to solve them without much struggle. This is what I love about maths, how I can always improve with just some practice. P.S. Great video Tom, really enjoyed watching it.

  • @bigbluespike5645
    @bigbluespike5645 3 months ago +3

    I asked o1-preview the geometry question and it approached the problem very analytically: by setting up a coordinate system, finding the points X, Y and Z by solving systems of equations for the lines and the circle, and finally showing BZ is perpendicular to AC using vectors and the dot product BZ⋅AC. I can't fully evaluate whether it's perfect, but I still think its solution was way better.

    • @bornach
      @bornach 3 months ago

      @@bigbluespike5645 How does it do on the other problems that ChatGPT made a mess of?

    • @bigbluespike5645
      @bigbluespike5645 3 months ago

      @bornach I didn't test yet, but I'll update you when I do

  • @cheesyeasy1238
    @cheesyeasy1238 3 months ago +3

    0:24 maybe i'm too panicky but the mere mention of the MAT sends a shiver down my spine... hoping for a non-disaster tomorrow 🙏

  • @samarpradhan3985
    @samarpradhan3985 3 months ago +2

    Me who also can’t do math: “Maybe I am ChatGPT”

  • @Hankyone
    @Hankyone 3 months ago +59

    Cool video and all but are you aware of o1-mini and o1-preview???

    • @TomRocksMaths
      @TomRocksMaths  3 months ago +36

      yes of course. the plan here was to use the free version as it is what most people will have access to, so I wanted to warn them to be careful when using it.

    • @IsZomg
      @IsZomg 3 months ago +6

      @@TomRocksMaths 4o is the best 'free' model, not ChatGPT 3

    • @9madness9
      @9madness9 3 months ago +1

      Would love to know if you could test with the Stephen Wolfram add-in, to see how good the add-in makes ChatGPT at maths!

    • @devilsolution9781
      @devilsolution9781 3 months ago

      ​@@9madness9are there plugins???

    • @IsZomg
      @IsZomg 3 months ago +2

      @@TomRocksMaths ChatGPT 3 is TWO YEARS OLD now lol, you didn't do your research.

  • @komraa
    @komraa 2 months ago +1

    That image had me dying for 2 minutes straight😂😂

  • @MorallyGrayRabbit
    @MorallyGrayRabbit 3 months ago +3

    25:43 Obviously it just used the power of a point theorem

    • @SethRoganSeth
      @SethRoganSeth 7 days ago

      Where is this reference from

  • @obiwanpez
    @obiwanpez 3 months ago +1

    19:50 - “Wull there’s yer prablem!”

  • @SayanMitraepicstuff
    @SayanMitraepicstuff 3 months ago +4

    You did not use the latest o1 series of models. I was trying to find where you mention which model you were using, but couldn't find an exact response: you cropped the part that mentions the model and also haven't shown the footage of the answer generation, which would give away the model you were testing. o1 cannot generate images, which was the giveaway.
    Do the same tests with o1-preview.

    • @blengi
      @blengi 3 months ago

      Yeah, this is all moot if it's not o1, which is OpenAI's first reasoning model; all the other LLMs are just level-1 chatbots by OpenAI's own definition.

  • @Justashortcomment
    @Justashortcomment 3 months ago +5

    Why didn’t you use OpenAI’s new model o1, which is designed for these types of problems? Would be interesting to see the performance of o1-preview with these.

  • @egodreas
    @egodreas 6 days ago

    This is a good example of someone completely misunderstanding how an LLM works. It does not "look for answers online". It does not try to do math or logic or "understand the question". It just generates plausible sounding text. Sometimes it's so very plausible that it just happens to also be factually correct, but that is mostly coincidence. If you want actual _reasoning,_ you have to use a model designed for this. Like OpenAI o1.

  • @hondamirkadirov5588
    @hondamirkadirov5588 3 months ago +5

    Chatgpt got really creative in geometry🤣

  • @asdfghyter
    @asdfghyter a month ago

    The reason the "diagram" it drew was such complete nonsense is that the model for generating images is completely different from the one used to generate text. All the image generator is given is a text description from the GPT model, none of the text model's internal "understanding" of the question.

  • @suhareb9252
    @suhareb9252 3 months ago

    The way chatgpt makes Tom wonder is the same way I make my maths teacher wonder about my answers in exams 😂

  • @patrickyao
    @patrickyao 2 months ago +3

    Hey Dr Crawford - thank you for your video and insight. It seems that you are using the basic GPT-4 model to solve these BMO questions. There is a different model ChatGPT provides called o1-preview, which is specifically designed for complex and advanced reasoning and for solving difficult mathematical questions like this. If you use the o1-preview model, it takes way longer (sometimes even more than a minute) before giving you a response, and it thinks in a far deeper way than the model you have used here. With that model, I've tried feeding it questions 5 and 6 on the BMO1 paper, and it could solve them perfectly.
    Therefore I would encourage you to try again with that specific model. I believe you have to have a ChatGPT subscription to access it, but I think they are going to release a free version. Anyway, thank you so much!
    P.S. It would have been better if you had simply uploaded a screenshot of the question, as the diagrams could have been included, and ChatGPT would be able to read the question from the image (probably better than it being retyped with a different syntax)

  • @Twi_543
    @Twi_543 3 months ago +1

    When I did this practice paper I got the same thing as u for question 2, about how the difference either increases or stays the same at each point, so if it is 1 at 2024 then it must be 1 at 1 bc each term is an integer. But I was confused when looking at the mark scheme, so I wasn't sure it was right. Thanks for explaining the mark scheme, it helped me understand it better😁👍

  • @uenu7230
    @uenu7230 a month ago

    I forget exactly how I phrased the question to ChatGPT, but it involved splitters with 1 input and 1-3 outputs, where the outputs are equally divided from the input, and mergers with 1 output and 1-3 inputs, where the output is the sum of the inputs, and how to construct a sequence of splitters and mergers to end up with two outputs carrying 80% and 20% of the original source input. It said to split the first input with a 1-2 splitter (50%/50%), then split one of those outputs with another 1-2 splitter (25%/25% of the original input), then merge the remaining 50% with one 25%, and that will equal the requested 80% output, while the remaining 25% equals the requested 20%.
    In summary, it thinks that 50% plus 25% equals 80%, and that 25% equals 20%. So, yeah, ChatGPT can't math.
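ChatGPT's claimed construction is easy to refute with exact rational arithmetic (a sketch of the setup described above; the variable names are just illustrative):

```python
from fractions import Fraction

source = Fraction(1)

# 1-2 splitter on the source: two equal halves.
half_a = source / 2
half_b = source / 2

# 1-2 splitter on one of the halves: two quarters of the original input.
quarter_a = half_b / 2
quarter_b = half_b / 2

# Merger of the untouched half with one quarter,
# which ChatGPT claimed equals 80% of the input.
merged = half_a + quarter_a

print(merged, quarter_b)  # 3/4 1/4
assert merged == Fraction(3, 4)      # 75%, not the claimed 80%
assert quarter_b == Fraction(1, 4)   # 25%, not the claimed 20%
```

Incidentally, any loop-free network of 2-way and 3-way equal splitters only produces sums of fractions of the form 1/(2^a·3^b), whose lowest-terms denominators divide some 2^a·3^b; since 4/5 has denominator 5, an exact 80/20 split seems to require feeding an output back into an upstream merger.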

  • @djwilliams8
    @djwilliams8 3 months ago

    I found it works a lot better when you upload a photo of the question. Just take a screenshot with the snipping tool and paste it in.

  • @eofirdavid
    @eofirdavid 3 months ago +7

    ChatGPT's "proof" for the first question was wrong.
    According to its "step 2" the answer should be 2^2 * 3^7, which is false. Also, the possible positions are wrong, since the n-th letter can be in any of the positions 1,...,n+1 (except for the last letter, which is in 1,...,n). I have no idea why it needed to mention Young tableaux in step 3; even if they are related somehow, this is a simple problem that doesn't need anything advanced in order to solve it.
    Finally, in step 3, without a proper explanation, it suddenly gives only 2 possibilities for each letter, and for some reason the letter 'L' has either 2 or 3 possible positions. Even if you ignore this and give 2 positions for each letter, you get 2^9 and not the correct answer of 2^8.

    • @FlavioGaming
      @FlavioGaming 3 months ago +1

      L has 2 possibilities AFTER placing O in position 1 or 2. Y has 2 possibilities AFTER placing the letters O and L in their positions and so on...

    • @FlavioGaming
      @FlavioGaming 3 months ago +1

      For the last letter S, it should've said that since we've placed 8 letters in 8 positions, there's only 1 place left for S.

    • @eofirdavid
      @eofirdavid 3 months ago

      @@FlavioGaming You are right. Without knowledge of the positions of the previous letters, the n-th letter can be in positions 1,...,n+1 (which seems to be what step 2 meant to say), and after you assume that you have placed the previous n-1 letters, you only have 2 possibilities (which should have been step 4), except for the last letter, which has only one possibility.
      In any case, while ChatGPT somehow managed to give the final right answer, everything in between seems like guesses. This sort of proof is something I would expect from a student who saw the answer before the exam, didn't understand it, and tried to rewrite it from memory (which, granted, is how ChatGPT works). I would not call this "mathematics", and I have yet to see ChatGPT answer any math problem correctly unless it is very standard and elementary, the type of question you expect to see in basic math textbooks.

  • @asdfghyter
    @asdfghyter a month ago

    20:00 I think it might have misunderstood the question. I think it interpreted "two apart" as "has two dots in between" despite the question being very clear about this

  • @justanotherinternetuser4770
    @justanotherinternetuser4770 3 months ago +1

    a british man saying math instead of maths is a thing i never thought id see in my life

  • @mujtabaalam5907
    @mujtabaalam5907 3 months ago +5

    Is this GPT 4o or 4o1?

    • @caludio
      @caludio 3 months ago +1

      I think this is a relevant question. O1 is probably a better "thinker"

    • @TomRocksMaths
      @TomRocksMaths  3 months ago +3

      the plan here was to use the free version as it is what most people will have access to, so I wanted to warn them to be careful when using it.

    • @mujtabaalam5907
      @mujtabaalam5907 3 months ago +1

      ​@@TomRocksMathsThat's fair, but you should definitely do a video where you compare the two. Or see if you can beat 4o1 at chemistry, physics, or some other subject that isn't your speciality

  • @charmer1979
    @charmer1979 a day ago

    ChatGPT cannot even solve simultaneous equations in x, y, z. It got one totally wrong.

  • @tontonbeber4555
    @tontonbeber4555 2 months ago

    @2:41 There seems to be a problem in your statement of the problem.
    It says a letter can appear at most one position late, but as many positions early as you wish.
    So the third letter Y can also appear in first position, am I wrong?
    Like MATHS can be typed TMASH, where the 3rd letter appears in 1st position...

  • @3750gustavo
    @3750gustavo a month ago

    The new QwQ 32B preview model amazingly does better at these hardcore math questions than bigger models; it outputs over 3k tokens for each question as it tries to brute force a solution.

  • @MindVault-t3y
    @MindVault-t3y 2 months ago +1

    Thanks for coming to my school (I was one of the year 10s), the presentation was very interesting!

  • @CosmicAerospace
    @CosmicAerospace 3 months ago

    You can input images into the prompt by copy-pasting a screenshot or placing an attachment onto the prompt :)

  • @dan-florinchereches4892
    @dan-florinchereches4892 2 months ago +1

    The second problem reminds me of Euclid's algorithm, and most notably the Chinese usage of such a method: if you have 2 vessels of volumes a and b, the lowest volume you can measure is the greatest common divisor of a and b.
    By using this logic and the fact that any ai and ai-1 are linear combinations of a0 and a1, it follows that gcd(ai, ai-1) = gcd(a0, a1); hence if they are consecutive they both have gcd 1.

  • @jacks5kids
    @jacks5kids 19 days ago

    For the unreliable typist problem, the professor failed to notice that ChatGPT cheated in its answer, thus the professor has failed in his duty; he should have spotted the flawed argument. Additionally he should have called out ChatGPT for cheating. ChatGPT failed to provide an acceptable answer. We know that ChatGPT had trained on this question since Tom said that he had presented the same problem to an earlier version. Thus ChatGPT already had stored up the correct numerical answer. However it overtly cheated in the lines between its flawed reasoning and the numerical answer. Look at 3:24. It multiplies two by itself nine times, and then writes that this is equal to 2^8 = 256. Count the twos; there are definitely nine of them. The reasoning was flawed on line 9. Once 8 letters are typed, the last one has only one place to go, not two. Since the question specified "with proof", ChatGPT failed to provide correct reasoning but was able to recall the correct numerical result from its training. Then it cheated to make them fit together. That's bad math.

  • @TheMemesofDestruction
    @TheMemesofDestruction 3 months ago +2

    I have found that WolframGPT is better at maths than standard ChatGPT. That said, both often require additional prompting to achieve the desired results. Then again, it could just be human error on the prompter's side. Cheers! ^.^

  • @KaliFissure
    @KaliFissure 3 months ago

    ChatGPT can't draw a simple cardioid. Even after I gave it the formula.

  • @Jordan-gt6gd
    @Jordan-gt6gd 3 months ago +1

    Hi Dr, can you do a lecture series on any math course you like, similar to the ones you did for calculus and linear algebra?

  • @KoHaN7
    @KoHaN7 3 months ago +3

    Hi Tom, I really like the video! 😀 If you want to see a good performance in logic and reasoning from GPT, GPT o1-preview seems to be the best at the moment. It would be interesting to repeat the same with that more advanced model. It thinks before answering, which allows it to check its own answers before saying the first thing that comes to mind.

    • @TomRocksMaths
      @TomRocksMaths  3 months ago

      ooooo this is exactly the kind of thing I was thinking it needs!

  • @Neptoid
    @Neptoid 3 months ago +15

    The new Chat GPT o1 doesn't have this problem, it can reason about math on the research level

    • @mattschoolfield4776
      @mattschoolfield4776 3 months ago +2

      Not if it's an LLM

    • @IsZomg
      @IsZomg 3 months ago +6

      @@mattschoolfield4776 lol then neither can 80% of humans

    • @eofirdavid
      @eofirdavid 3 months ago +2

      @@IsZomg This is probably the most accurate way to think about ChatGPT... Yes, its answers seem like it tries to remember and rewrite an answer that it had seen before but never understood. However, as someone who has marked many math exams, I'd say that is not too far from the average student's answers. So in this sense, ChatGPT does exactly what it's supposed to do: answer like a human...

    • @IsZomg
      @IsZomg 3 months ago

      @@eofirdavid o1 scores 120 on IQ tests which means it's beating more than half of humans now. There's no reason to think the progress will stop either.

    • @bornach
      @bornach 3 months ago

      ​@@IsZomg Then create a reply video demonstrating that o1 can solve all the math problems that ChatGPT failed at in Tom's video. This would be very instructive for the Tom Rocks Maths audience

  • @asdfghyter
    @asdfghyter a month ago

    7:09 The second rule is incorrectly rewritten. The rewritten second rule ChatGPT wrote is just the rewritten first rule in flipped order and negated. The correct rewritten second rule would be:
    a_i - a_{i-1} = 2 * (a_{i-2} - a_{i-1})
    This is impossible if a_i and a_{i-1} are consecutive (2*n can never be ±1), so by induction the first case must hold for all i.
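A small exhaustive check of that induction (a sketch; the starting differences and rule sequences are arbitrary, not taken from the exam): under rule 1 the difference d_i = a_i - a_{i-1} is unchanged, and under rule 2 it becomes -2*d_{i-1}, so |d| never shrinks, and a final difference of ±1 forces the starting difference to be ±1 with rule 1 used throughout.

```python
from itertools import product

def final_difference(d1, rules):
    """Apply rule 1 (keep d) or rule 2 (d -> -2*d) in sequence."""
    d = d1
    for rule in rules:
        d = d if rule == 1 else -2 * d
    return d

# If the last difference is +-1, the start must have been +-1 and
# rule 2 (which doubles |d|) can never have been used.
for d1 in [d for d in range(-3, 4) if d != 0]:
    for rules in product((1, 2), repeat=6):
        if abs(final_difference(d1, rules)) == 1:
            assert abs(d1) == 1 and all(r == 1 for r in rules)
print("checked all cases")
```

This only checks short sequences and small starting differences, but it matches the parity argument: one rule-2 step makes every later difference even, so it can never be ±1 again.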

  • @MorallyGrayRabbit
    @MorallyGrayRabbit 3 months ago

    One time I asked it what an abelian group was, as a test, and it told me all abelian groups are dihedral groups and spat out a bunch of complete nonsense math. I was so sad, because at first I saw all the math and thought it might actually be real.

  • @lipsinofficial3664
    @lipsinofficial3664 3 months ago

    You can UPLOAD PDFS

  • @jursamaj
    @jursamaj 3 months ago

    On the unreliable typist: I feel ChatGPT mischaracterized the possible positions of letters (or I'm drastically misunderstanding the rules). In steps 1 & 2, it said 'S' can only be in the last 2 positions. But 'SOLYMPIAD' appears to fit the rules ('S' is way early, and each other letter is 1 late). It may have gotten the right answer, but its argument was flawed.
    On the polygon: Step 1 is false. Convex with equal sides does *not* imply the vertices lie on a circle. A rhombus is convex and all its sides are equal, but the vertices are *not* on a circle. This alone invalidates all the rest of the proof, which relies on the circle. Also, in step 4 part 'n=5', the 3 diagonals do *not* form an equilateral triangle. Nor would it "ensure … a regular polygon" if they did.
    The important thing to remember is that LLM "AI" isn't *reasoning* at all. It's just stringing a series of tokens together based on how often it has seen those words strung together before, plus a bit of randomness.

  • @rudolf-nd9nu
    @rudolf-nd9nu 2 months ago

    Math so hard even chatgpt ain't mathing

  • @PrajwalDSouza
    @PrajwalDSouza 3 months ago +10

    O1? O1 preview?

    • @Hankyone
      @Hankyone 3 months ago +6

      This oversight makes no sense, is he not aware these models exist???

    • @TomRocksMaths
      @TomRocksMaths  3 months ago +3

      the plan here was to use the free version as it is what most people will have access to, so I wanted to warn them to be careful when using it.

    • @PrajwalDSouza
      @PrajwalDSouza 3 months ago +2

      @@TomRocksMaths Makes sense. :) However, like I mentioned earlier, given the title of the video, it might be apt to include a discussion of o1 or draw a comparison with o1.
      Damn. I sound like a reviewer now. 😅

  • @thisisalie-s1s
    @thisisalie-s1s 3 months ago +1

    Hi Dr Tom! I am a fan from Singapore and I would like to inform you about the Singapore A level, which is known to be harder than the IB HL maths paper. I think that you would probably enjoy doing that paper

    • @ramunasstulga8264
      @ramunasstulga8264 3 months ago

      Nah jee advanced is easier than IB HL, lil bro 💀

    • @thisisalie-s1s
      @thisisalie-s1s 3 months ago

      @@ramunasstulga8264 If you're unable to even do both papers before making a valid criticism, you shouldn't comment. I find it baffling that someone like you is even watching this video.

  • @Justcurios-f7f
    @Justcurios-f7f 2 months ago +1

    I love the way you approach problems. You should try a Sri Lankan A/L paper.

  • @CodecYT-w4n
    @CodecYT-w4n 3 months ago

    I once asked ChatGPT to use my algorithm to find the number of primes from 1 to 173. It said 4086.99.

  • @Sevenigma777
    @Sevenigma777 3 months ago

    Why does it look like someone else is controlling his arms in the intro? Lol

  • @snehithgaliveeti3293
    @snehithgaliveeti3293 3 months ago +1

    Tom, can you try the TMUA entrance exam papers 1 and 2?

    • @nightskorpion1336
      @nightskorpion1336 3 months ago

      Yesss I've been asking this too

  • @gogyoo
    @gogyoo 3 months ago

    ChatGPT teaching us about humility. We're all smug quoting "By the power of Greyskull!". Meanwhile, it's like: "No. KISS principle. No need to be bombastic: 'By the power of a point'."

  • @srikanthtupurani6316
    @srikanthtupurani6316 3 months ago

    The way ChatGPT answers questions makes us laugh, but it has the capability to understand hints and solve the problems.

  • @alohamark3025
    @alohamark3025 3 months ago

    Chatgpt is already smarter than all nine members of the US Supreme Court. But, I tend to doubt that Chatgpt will ever get to the creativity level of any well known mathematician in history. We can breathe a sigh of relief.

  • @johnplays9654
    @johnplays9654 3 months ago +1

    Chat-GPT can barely solve some basic Algebra 1 questions

    • @Natearl13
      @Natearl13 2 months ago

      Mine’s been on point with multivariable calc idk what you’re using

  • @dekkeroid2962
    @dekkeroid2962 3 months ago

    I have seen GPT do 1+1=6.
    Giving it Olympiad questions is just wasting electricity.

  • @ValidatingUsername
    @ValidatingUsername 3 months ago

    Have you ever had a question that used the arc length of equal sized circles to solve the question?

  • @igorvieira344
    @igorvieira344 3 months ago +2

    o1 models are way better at maths

    • @bornach
      @bornach 3 months ago

      @@igorvieira344 How does o1 do when given these maths problems?

    • @Tobi21089
      @Tobi21089 3 months ago

      @@bornach it aces them

  • @Mathsaurus
    @Mathsaurus 3 months ago

    It feels like ChatGPT is still quite a way from being able to solve these sorts of problems. I made a similar video recently putting it up against this year's (2024) Senior Maths Challenge and I found its results quite surprising! th-cam.com/video/crMeD37Q49U/w-d-xo.html

  • @TheOriginalJohnDoe
    @TheOriginalJohnDoe 3 months ago

    Today I let chatgpt add two fractions and it got it wrong. HOW?!

  • @wizkidsid1991
    @wizkidsid1991 3 months ago +1

    For the second question.
    Case 1: a_i = 2a_{i-1} - a_{i-2}. Subtract a_{i-1} from both sides to get a_i - a_{i-1} = a_{i-1} - a_{i-2}, so with d_i = a_i - a_{i-1} as per ChatGPT's suggestion, the differences are all equal. Now a_2024 - a_2023 = 1 as they are consecutive, so a_2023 - a_2022 = a_2024 - a_2023 = 1, i.e. a_2023 = a_2022 + 1, and so it follows for the entire sequence.
    Case 2: a_i = 2a_{i-2} - a_{i-1}, i.e. a_i + a_{i-1} = 2a_{i-2}. Now a_2024 and a_2023 are consecutive, so a_2024 + a_2023 = 2a_2022. Of two consecutive integers one is odd and one is even, so their sum is odd: a_2024 + a_2023 = 2k + 1. Then 2k + 1 = 2a_2022 gives a_2022 = k + 1/2, which is not an integer. But the problem states that the sequence consists of integers, hence case 2 is not allowed.
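    The parity step in case 2 can be spot-checked in a few lines (a sketch of the argument above, not part of the original comment or the official solution):

```python
# Case 2 rearranges to a_2022 = (a_2024 + a_2023) / 2.
# If a_2024 and a_2023 are consecutive integers, their sum is odd,
# so a_2022 always ends in .5 and is never an integer.
def case2_a2022(a_2023):
    a_2024 = a_2023 + 1  # consecutive integers
    return (a_2024 + a_2023) / 2

# The fractional part is 0.5 for every starting value tried:
assert all(case2_a2022(n) % 1 == 0.5 for n in range(-100, 100))
print(case2_a2022(2022))  # -> 2022.5
```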

  • @arthurdt6025
    @arthurdt6025 3 months ago

    now time for the o1-mini model if you have premium

  • @nicoleweigelt3938
    @nicoleweigelt3938 3 months ago

    Looked for something like this after I got frustrated it was getting algebra and calculus wrong 😅 Thanks for the vid!

  • @HITOKIRI01
    @HITOKIRI01 2 months ago

    Can you repeat the exercise with o1-preview?

  • @funtimenoahh
    @funtimenoahh 3 months ago +1

    You came to my school

  • @Axacqk
    @Axacqk 3 months ago

    "Cirbmcircle and Perpenimctle" is the title of a lost work by Rabelais. Unfortunately we will never read it because it is lost.

  • @lupusreginabeta3318
    @lupusreginabeta3318 3 months ago +11

    1. The prompt is definitely upgradable 😂
    2. You should use the new preview model o1; it is quite a lot better than 4o

  • @alternativegoose7122
    @alternativegoose7122 3 months ago

    Do you ever mark igcse papers?

  • @o7rein
    @o7rein 3 months ago

    And that's how bullshitting works: experts are speechless, and the general public takes that for incompetence.

  • @Calcprof
    @Calcprof 3 months ago

    A while back I saw a "research" paper written by ChatGPT about an issue in game theory. It was absolute nonsense: the vocabulary and sentence structure were all OK, but the "logical steps" were all outright nonsense.

  • @HBtu-f7y
    @HBtu-f7y 3 months ago

    Did you consider trying their o1 model?

  • @PW_Thorn
    @PW_Thorn 3 months ago

    Next time I have to argue about anything, I'll say it's "by the power of a point theorem!!"
    Thanks ChatGPT!!!

  • @TomLeg
    @TomLeg 3 months ago +1

    Khan Academy explains the "power of a point theorem".

  • @jppereyra
    @jppereyra 2 months ago

    Comparing Gemini vs ChatGPT:
    for the time being, Gemini is worse than ChatGPT. However, Gemini doesn't limit the number of questions you may ask, but ChatGPT does. That could be a decisive factor in the dominance of Gemini vs ChatGPT, depending upon how many of us start teaching Gemini or ChatGPT to do maths properly. Do you want to be redundant? That is the main question!

  • @OzoneTheLynx
    @OzoneTheLynx 3 months ago +1

    I tried getting Gemini to draw its 'solution' to 3) and it responded with a link to the solutions XD.

  • @froggy41046
    @froggy41046 3 months ago

    It's clearly said that ChatGPT can make mistakes, just like us humans.

  • @Smashachu
    @Smashachu 2 months ago +1

    You didn't use the newest model, o1, which is significantly better at mathematics in every way.

  • @coopergates9680
    @coopergates9680 2 months ago

    Question 1, step 2, doesn't "SOLYMPIAD" fit the constraints? Same with "OLSYMPIAD"? At least some cases with a letter appearing at least 2 slots early seem omitted. D should not be restricted to 7 or later and S should be allowed before 8, for instance.

  • @Justashortcomment
    @Justashortcomment 3 months ago +1

    Hey Tom,
    Thanks for the video. BUT! ;) OpenAI will release the full o1 “reasoning model” soon. Currently we only have access to the preview.
    It would be fantastic to see a professional mathematician evaluate its performance, ideally with a problem set that isn’t on the internet or in books or has only been put on the internet recently.

  • @GodexCigas
    @GodexCigas 3 months ago +9

    Try using GPT-o1-preview - It uses advanced reasoning.

    • @Exzyll
      @Exzyll 2 months ago

      Yeah I was gonna say that he will be shocked

  • @kaisoonlo
    @kaisoonlo 2 months ago

    Try using GPT o1-preview. Unlike GPT-4o, it excels at STEM questions due to its "advanced reasoning".

  • @tambuwalmathsclass
    @tambuwalmathsclass 3 months ago

    No AI is as good as humans when it comes to mathematics.
    AIs have failed so many prompts I've given them.

  • @floretion
    @floretion 3 months ago

    The obvious problem confusing ChatGPT is your use of terms involving letters "a_i" when describing the equations :)

  • @brettt.9464
    @brettt.9464 3 months ago

    It forgot to apply the Power of Grayskull Theorem. That's why it was wrong.

  • @yehet8725
    @yehet8725 2 months ago

    Whenever I ask ChatGPT for help with maths questions, I almost always notice something went wrong. So I guess a tool made to help me get the question right instead made me help myself in knowing when things are wrong :3 (this makes sense in my head, okay)

  • @massiveastronomer1066
    @massiveastronomer1066 2 months ago

    I have this test coming up on the 20th, these questions are brutal.

  • @bobbobson6867
    @bobbobson6867 3 months ago

    And this is why you shouldn't do drugs, kids. 😔😔

  • @vasiledumitrescu9555
    @vasiledumitrescu9555 2 months ago

    I use it to study some theoretical stuff; it's good at explaining theorems and definitions and producing good examples. It can even prove things pretty well, because it's not actually doing the proof but just taking it from its database and pasting it to you. Of course it makes mistakes now and then, but they're so dumb they're easy to catch. And by "using it" I mean: as I'm studying from my notes or books, I ask ChatGPT things from time to time in order to understand the mind-bogglingly abstract stuff I have to understand. Overall it has proven to be a fairly useful tool for learning math, at least for me, as I'm pursuing my bachelor's degree in math.