Can ChatGPT o1-preview Solve PhD-level Physics Textbook Problems?

Kyle Kabasares

มุมมอง 175 127

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 27 ธ.ค. 2024

ความคิดเห็น • 1.1K

@KyleKabasares_PhD 3 หลายเดือนก่อน ⁺⁹⁶
Hi everyone, thank you so much for the feedback! I couldn't have expected this kind of attention on my video in the first 48 hours. I've taken some of your suggestions in the comments and have created a Part 2: th-cam.com/video/a8QvnIAGjPA/w-d-xo.html
Please consider watching!
@AlfarrisiMuammar 3 หลายเดือนก่อน ⁺¹
Open ai say The real Ai o1 version of will be out before the end of 2024.
@User.70793 3 หลายเดือนก่อน
UNIVERSAL BASIC ICOME 2025
@xspydazx 3 หลายเดือนก่อน
it was funny !
the thing is to keep up with the technolgys ad current innovations being deployed as it should not be hard to emulate these neural networks wth the open sourced models ! the aim is to train the local models as best you can at the highest point you capability but keep aware that technolgy needs to adavce to hande these heavy tensor calcualtions hence local model will be able to perform these tasks without the eed of outseide intervenetion so get an early start !
or it will be a nightmre of stuy to catch up : it has taken me a full year of constant python etc doing this training and implementation to keep up and get ahead ! .. that gap is widening @
Just expect to be able to host a 24b or 70b local within the next two years ! , a full genrative model ! ( so you could host multiple mini 7b agents at the same time ! hence a power full system ! ( agentic ! )
@debragotenks 2 หลายเดือนก่อน ⁺¹
How much did open ai pay you to make this ad?
@User.70793 2 หลายเดือนก่อน
@@AlfarrisiMuammar I can't wait I'm still intrepid for GPT 5
@An_Attempt 3 หลายเดือนก่อน ⁺³⁶¹
It is worth noting that GPT has probably 'read' ever thousands of answer books on Jackson, As well as all of Stack Exchange, as well as several study guides on Jackson in particular. So if you want to really test GPT ability you probably need to create novel questions that will not be found in any textbook or online form.
@gabrielbarrantes6946 3 หลายเดือนก่อน ⁺²⁷
Exactly, problems that solve students are already done somewhere on the internet, is just about googling it and copy paste the solution.
@taragnor 3 หลายเดือนก่อน ⁺⁴³
It's the same issue with AI being "great at programming" because it's extensively trained on leetcode problems.
@gabrielbarrantes6946 3 หลายเดือนก่อน ⁺⁸
@@taragnor being good at leetcode is not even being good at programming.
@khiemgom 3 หลายเดือนก่อน ⁺⁵
@@gabrielbarrantes6946 its doesnt have access to the internet
@ShayPatrickCormacTHEHUNTER 3 หลายเดือนก่อน ⁺³
@@gabrielbarrantes6946 said the web dev.
@kylechickering5890 2 หลายเดือนก่อน ⁺⁴⁴
Note 2: I haven’t looked through the answers, but in cases where GPT knows what the answer should be, it will make up crap to fill in the middle. I’ve asked it many PhD level math questions where it has hallucinated its way to a “plausible” answer.
@KyleKabasares_PhD 2 หลายเดือนก่อน ⁺⁸
I'm planning on making a follow up video on comparing my approach to solving this problem with ChatGPT's! Thanks for pointing that out
@omatbaydaui722 2 หลายเดือนก่อน ⁺⁷
@@KyleKabasares_PhD that's not what he was saying. You were providing GPT 1o the answers, so of course it would give you the right answers since you provided them for it. To know if it truly solves PhD questions, you shoudn't give it questions like :"prove that this formula is verified" but rather " what is the formula for ... ?"
@KyleKabasares_PhD 2 หลายเดือนก่อน ⁺⁷
@@omatbaydaui722 I understood what he was saying. I’ve verified it doing problems correctly from start to finish (non-Jackson) without knowing the answer in advance! But in those cases I actually did the problem unlike here, so I’m planning on revisiting the problems in this video.
@dimitriskliros 2 หลายเดือนก่อน ⁺⁵⁵
to be fair, you don’t seem to have actually checked the model responses, there could have been mistakes or hallucinations throughout
@luissantiagolopezperez4938 2 หลายเดือนก่อน ⁺¹
Can you point out any specific hallucinations on this video?
@particle.garden 2 หลายเดือนก่อน ⁺⁹
100% this. It's given the answer to work towards. I do not have enough knowledge in this area to prove that it came to it's conclusions incorrectly, but it's a well known quirk.
@nexicturbo 3 หลายเดือนก่อน ⁺⁴¹
Crazy part is that this isn’t even the full model, which is even better
@CommentGuard717 3 หลายเดือนก่อน ⁺⁵
Yeah, it's not even a beta, it's a preview. And it's still using the last gen model. They're coming out with a new model pretty soon.
@CrazyAi166 3 หลายเดือนก่อน
To us who's math students😂😂❤
@alvaroluffy1 3 หลายเดือนก่อน ⁺¹
@@CommentGuard717 yeah, imagine the GPT 5 implemented not preview version. Thats gonna be fucking wild and its not that far from now
@Ken-vy7zu 3 หลายเดือนก่อน
@alvaroluffy1 well, they are now working on chatgpt 6.0
@alvaroluffy1 3 หลายเดือนก่อน
@@Ken-vy7zu shut up you know nothing, stop making up things, they are still working on gpt-5 you realize that right?
@JohnSmall314 3 หลายเดือนก่อน ⁺⁵⁰
I just tried o1 with some fairly simple integrals, which it got badly wrong and I had to guide it to the correct answer. So I'd advise checking every step in the derivation.
@jamesmcadory1322 21 วันที่ผ่านมา ⁺³⁰
It could have made several errors in its derivation but displayed the right answer since it knew the desired result. Ai is notorious for simple mistakes and contradictory statements. It’s still impressive but also as you admitted you didn’t really check it in depth and gave it problems where the end result was given.
@JurankoNomo 2 หลายเดือนก่อน ⁺²⁷
You have to remember that this book was probably directly in the chatGPT training data, so this may not be a valid measure of novel problem solving ability
@delxinogaming6046 3 หลายเดือนก่อน ⁺⁷⁶⁵
This is the worst this technology will ever be….
@armwrestlerjeff 3 หลายเดือนก่อน ⁺⁷²
That's an incredible truth
@MaxWinner 3 หลายเดือนก่อน ⁺⁵¹
That's a terrifying truth
@thegeeeeeeeeee 3 หลายเดือนก่อน ⁺²⁵
Eh it might hit a wall though.
@wbay3848 3 หลายเดือนก่อน ⁺⁶⁶
@@thegeeeeeeeeeeI’m here from the future, your comment aged poorly
@igorbessa3563 3 หลายเดือนก่อน ⁺⁵
It might stagnate tho
@masterfall27 2 หลายเดือนก่อน ⁺⁵⁴
if its a famous problem, isn't there a good chance the solution was already in the training data?
@Stepbro126 2 หลายเดือนก่อน ⁺¹⁸
In general, ML models shouldn’t memorize the training data. A lot of effort is put into ensuring the model learns how to do the problem rather than memorizing.
@Balls23912 หลายเดือนก่อน
it was backtracking and double checking though?
@bladerunner9531 26 วันที่ผ่านมา
Exactly. But similar data as the training data yields high accuracy
@SomeRandomdUde14 2 หลายเดือนก่อน ⁺⁴⁷
Testing it with a book that is “infamous” probably isnt a great benchmark considering that it would mean that there is a considerable database related to that specific book it could read from. If you could test it on a novel problem that would be better
@amcmillion3 3 หลายเดือนก่อน ⁺⁸¹
The issue with this "test" is that the solutions to one of the most famous physics books are certainly in it's training data. Give it a problem for which there is no known solution. Or at a minimum give it a problem from an unknown text book. Find an old physics or math book from 100 years ago that has no digital copies. Then ask it questions from that and see how it does.
@akarshghale2932 3 หลายเดือนก่อน ⁺¹
Exactly
@Lucid.28 3 หลายเดือนก่อน ⁺²
@@amcmillion3 yes I did it and it’s not very accurate , I feed in 3 jee advance questions and out of which it could only answer 1 correctly , 1 he did it wrong even with hints and 1 he had to solve wrong first and than with hints it was able to solve it
@apache937 3 หลายเดือนก่อน
best of all, make your own
@pratikpaharia 3 หลายเดือนก่อน ⁺³
If they were JEE Mains level questions, than solving 1/3 would put it at the same level as those qualifying in top 1000 for the exams. FYI, The highest marks in JEE Mains were usually around 33-35%. I’d wager that would be folks with l an IQ level of ~130+, that is pretty damn good for GenAI. On the normal distribution curve of IQ where 100 is the population average, 130 should at least yield 1 or 2 sigma of confidence level to the statement “GenAI has definitively exceeded the average of the human intelligence level”
@Lucid.28 2 หลายเดือนก่อน
@@pratikpaharia nah
@rishikhan2 3 หลายเดือนก่อน ⁺¹²
Instead of telling it the answers, try asking it to find them. When I did this, it got the first one to an infinite sum but didn't reduce the infinite sum to the final answer: pretty good! For the second one, it had an extra factor of 1/pi that it made up. For the third it completely disregarded the angular dependence of the scattering and failed.
@chudchadanstud 3 หลายเดือนก่อน ⁺⁴²
They told me AI would replace hard labourers and Fast food workers first leaving us more time to think so I went to college now I'm college and I'm the first one being replaced.
@phen-themoogle7651 3 หลายเดือนก่อน ⁺⁵
Don't worry, everyone will be replaced in 3-5 years💀
@DeathDealerX07 3 หลายเดือนก่อน
@@phen-themoogle7651 💯
@avijit849 3 หลายเดือนก่อน ⁺⁴
yeah it's everyone. from labourers to physicists, Ai could do everything much more effectively.
the biggest surprise was creativity, that Ai could create art.
@meanmachine99999 3 หลายเดือนก่อน
Just don’t be a data analyst and if you want to be a computer scientist get out of school and get into the languages and start building plenty of budding industries right now
@glub1381 3 หลายเดือนก่อน
@@avijit849 ai is not creative and I don't believe it ever will be
@kylechickering5890 2 หลายเดือนก่อน ⁺²⁴
I haven’t watched the solving yet, but immediately I would like to point out that choosing problems which have known solutions may mean that the model has already seen (and simply memorized) the correct answer!
A better test is to ask it an impossible problem or one that solutions don’t exist for and then try to see if it’s generated solution is correct.
@pripyaat 2 หลายเดือนก่อน ⁺¹
Absolutely. If you simply Google the first 15 words of Problem 1, the very first result is a pdf document with a detailed, step-by-step solution. If anything, assuming the steps provided by o1 are correct, it just demonstrates it's decent at summarising search results...
The same goes for programming. A lot of people get easily impressed when GPT "writes" a 50-line script that's basically within the first 3-4 StackOverflow posts after a Google search. I mean, yeah, I won't deny it's really convenient that the tool can save you a few clicks, but saying that it has an understanding of the answer it's giving you is (as of today) still a stretch.
@o_o9039 2 หลายเดือนก่อน
if you know how ai works the way they are trained is lossful they don't have access to word for word of every bit of their training info if they did these models would be terabytes upon terabytes in size and would be extremely slow.
@pripyaat 2 หลายเดือนก่อน ⁺¹
@@o_o9039 I know how they work, and I'm not saying the model has all the information stored in its parameters, but it's no secret GPT can indeed search the web and summarize its findings. Copilot (based on GPT4) openly provides sources for almost everything it spits out.
@KRYPTOS_K5 2 หลายเดือนก่อน
@@pripyaatHow to know if it cheated?
@98danielray 2 หลายเดือนก่อน
@@pripyaatjeez. even worss than I thought
@olivetree9920 3 หลายเดือนก่อน ⁺¹⁷
And remember this is a truncated version of the model. It's full version is much better at problems like this
@redtoxic8701 3 หลายเดือนก่อน ⁺³
What? 😮
@paulojcavalcanti 3 หลายเดือนก่อน ⁺⁹
i asked it to calculate some stuff (quantum mechanics) for me and it also did some difficult step without explanation. i asked it to prove that step and it gave me a proof containing 1 mistake, but i wasn't sure and asked about that step, then it realized it was wrong, explained exactly why it was wrong, fixed it, and remade the calculation corrrectly
@anthonypenaflor 2 หลายเดือนก่อน ⁺²⁸
The model's performance is undoubtedly impressive, but if it was trained on this book (which seems likely), it's not truly generalizing to new data. For a fair assessment of its capabilities, models should be tested on novel, unforeseen problems, rather than those for which the answers are already available. In practice, models are typically evaluated on fresh data to gauge how well they can generalize. To accurately measure performance at this level, problems should be novel and manually verified, even if that takes considerable time (1.5 weeks or more).
@pentazucar 2 หลายเดือนก่อน ⁺⁵
I believe the book does not have the answers to the problems, so even if it was trained with the book it shouldnt help it to solve problems. Still it is possible that it just took the answers from some physics subreddit post and just pasted it
@velcrawlvfx 2 หลายเดือนก่อน
It backtracked on its own answers double checking so I doubt it already knew the answer off it being trained off the book
@Daniel__- 2 หลายเดือนก่อน
IT BACKTRACKED ITSELF THOUGH????
@Daniel__- 2 หลายเดือนก่อน
Not to mention, universities have and still run research where they create brand new tests solely for having AI take them
@marul688 2 หลายเดือนก่อน ⁺³⁸
There is a problem with the test:
Since the answer ,,show that.." is given, the AI will always show the correct answer, the reasoning might be flawed. It would be better to cut out the correct answer from the problem and see what AI will answer then.
@maxaposteriori 2 หลายเดือนก่อน
This applies to humans completing the problem as well, and there was an effort made to check the steps.
I agree, it might be interesting to see if it could though (although if it succeeds, will likely express it in an different form which may be hard to verify).
@pentazucar 2 หลายเดือนก่อน ⁺³
i agree with you, specially taking into account that it may just be bluffing and we would have no idea
@briansauk6837 2 หลายเดือนก่อน ⁺³
Prior versions had bogus steps that didn’t really follow legitimate steps, and units were often fouled up. Definitely deserves to be looked at deeper to see if that has improved.
@ominousplatypus380 3 หลายเดือนก่อน ⁺¹²
The model has most likely been trained on these problems and their solutions, since they've been around on the internet for a long time. So it isn't really a good test of its abilities since it has just memorized the solutions. That being said, I also tried it with some problems from the 2024 International Math Olympiad, and it was able to get at least two (out of six) of them correct consistently. I only tried problems where the answer was a single number/expression, going through proofs would be much more work. The model's knowledge cutoff should be in October 2023 so it shouldn't be possible for it to have seen these problems before. It's still hard to tell since OpenAI isn't being very transparent with their methodology, but if the model is actually able to solve novel IMO level problems it has never seen before, color me impressed.
@contentfreeGPT5-py6uv 3 หลายเดือนก่อน ⁺¹
I test ,AND correct answer for me o1 2024 final with alternatives
@mirek190 3 หลายเดือนก่อน
gpt4o has the same training data and can not sole it? so ...
@Draco-jk3rb หลายเดือนก่อน ⁺⁷
If you want to know, the steps are simply contextual changes, it is essentially a gpt that has the instructions self-set, and the output of its thinking steps is the instructions it is providing itself at each step. it works because by shifting context at each step rather than only a single context of the original message and response, it is able to aproach problems itteratively from different 'perspectives'
@cluelessMuslims-ed5js 3 หลายเดือนก่อน ⁺⁶⁰
This channel is the reason why I'm not reading fluid mechanics rn
@KyleKabasares_PhD 3 หลายเดือนก่อน ⁺³⁰
Can’t tell if I should say thank you or I’m sorry lol
@dreamscapes2727 3 หลายเดือนก่อน ⁺²
Fluid mechanics is extremely fun🤲
@knotratul5106 หลายเดือนก่อน ⁺⁷⁷
It was trained on this data lol
@juanperez-lh9mt 27 วันที่ผ่านมา ⁺⁷
Yes, but it's not like it just looks it up. It actually thinks whatever that means for a machine
@ancientmodis 24 วันที่ผ่านมา ⁺⁹
@@juanperez-lh9mtIt's a language model, it LITERALLY looks it up.
doesn't make this less impressive
@tachyonindustries 21 วันที่ผ่านมา ⁺¹
@@juanperez-lh9mt It doesn't look it up, but the questions and answers are likely part of the training data for the model, which means that they're, in a way, already imprinted into the AI's "memory". Now, it surely still has to do some work to figure it out, but sometimes when you see something like "Recognize that _____ = _____" in some completely unrecognizable way, it's probably something the AI has just summoned from "memory".
@nashh600 17 วันที่ผ่านมา
@@tachyonindustries It's not prevalent enough in the training data that we can say it recognizes the problem, it just recognizes the type of problem and reasons from there
@BrazaG วันที่ผ่านมา
@@ancientmodis LLMs do not look up in their databases.
The database is used to train the model, all they learned is probabilities to figure out the most probable continuation to your prompt.
It does not look anything up. LLMs do not even have access to look up the raw data they have been trained on....
That is fundamentally NOT how they work.
@pancakesauces 2 หลายเดือนก่อน ⁺⁶⁴
Now ask it if 9.11 is bigger than 9.9
@goldenspoon8203 2 หลายเดือนก่อน ⁺²
it worked for me 👍
@anhucphan3828 2 หลายเดือนก่อน
bruh it said 9.11 is bigger than 9.9
@thatonebruneian 2 หลายเดือนก่อน ⁺⁶
*is 9.11 bigger than 9.9*
[Thought for 4 seconds]
"Yes, 9.11 is greater than 9.9."
*Are you sure?*
[Thought for 17 seconds]
"You're right, and I apologize for the mistake in my previous response. Actually, 9.9 is greater than 9.11. Here's why:
9.9 can be expressed as 9.90 or 9.900.
9.11 remains 9.11 or 9.110.
When comparing:
9.900 (which is 9.9) is greater than 9.110 (which is 9.11).
Therefore, 9.9 is greater than 9.11."
I suppose we're going to have to make sure ChatGPT re-evaluates their answer every single time
@craycody 2 หลายเดือนก่อน
I just did it and it got it right first try:
No, 9.11 is not bigger than 9.9; in fact, 9.11 is less than 9.9. Here's the explanation:
9.11 equals 9 plus 0.11.
9.9 equals 9 plus 0.9.
Since 0.11 is less than 0.9, it follows that:
9
+
0.11
9+0.11 (which is 9.11) is less than
9
+
0.9
9+0.9 (which is 9.9).
Therefore:
9.11 < 9.9
@trucid2 2 หลายเดือนก่อน ⁺¹
9.11 is bigger than 9.9 when it comes to version numbers.
@warsin8641 3 หลายเดือนก่อน ⁺¹⁴
Just a a few years ago no one ever imagined bots thinking...😭
@KyleKabasares_PhD 3 หลายเดือนก่อน ⁺¹⁰
I certainly didn't!
@samsonabanni9562 3 หลายเดือนก่อน ⁺⁸
" OpenAI's new AI model, "o1," has achieved a significant milestone by scoring around 120 on the Norway Mensa IQ test, far surpassing previous models. In a recent test, it got 25 out of 35 questions right, which is notably better than most humans. A critical factor in these tests is ensuring the AI doesn't benefit from pre-existing training data. To address this, custom questions were created that had never been publicly available, and o1 still performed impressively"
@marcianoforst6311 2 หลายเดือนก่อน
So it’s already smarter than 90% of the global human population, and it knows everything on the internet.
@militiamc 3 หลายเดือนก่อน ⁺²⁵
O1 was trained on all Internet, including that book
@HedgeFundCIO 3 หลายเดือนก่อน ⁺⁷
So were all of us.
@roro-v3z 3 หลายเดือนก่อน ⁺²
@@HedgeFundCIO the difference is we can think, but it can only answer. Its a great tool!! but not think on its own
@casaruto 3 หลายเดือนก่อน ⁺⁷
Actually we dont know if its thinks because we dont know how we think. This is a philosical debate in ai community over the years.@@roro-v3z
@Hosea405 2 หลายเดือนก่อน
@@roro-v3z almost like you didn't see it go through problems step by step to get to an answer..... It can indeed reason on it's own now
@roro-v3z 2 หลายเดือนก่อน ⁺¹
@@Hosea405 yes it did but on training data, but it won't have new ideas that have not been trained
@decouple 2 หลายเดือนก่อน ⁺¹²
Its funny how good it is at some things and how terrible it is at others things still, seems its abilities are heavily dependent on whether examples of the problem were included in its training data. I've asked it to create a 32 bit crc algorithm and it did it perfectly, however when asking it to create considerably more trivial 3 bit crc algorithm (which is uncommon and quite useless), it failed miserably and in fact produced multiple wrong result that got worse and worse as i pointed out the flaws.
@1.4142 หลายเดือนก่อน ⁺²³
The key is having the answer before hand so it can guess from both ends and connect them. Ask it to evaluate a parameterized surface integral even with wolfram plugins and it will make mistakes.
@albertoalfonso7835 2 หลายเดือนก่อน ⁺⁶⁵
If the solutions exist on the internet is it really solving it? Or just analyzing and printing the answers . A true test could be a creating a unique problem with no known solutions
@dieg9054 2 หลายเดือนก่อน ⁺⁹
how would you know if it was correct or not if there was no known solution?
@d1nrup 2 หลายเดือนก่อน ⁺⁸
@@dieg9054 Maybe he means a problem that isn't posted on the internet since ChatGPT gets its solutions from the downloaded internet data.
@TrueOracle 2 หลายเดือนก่อน ⁺⁴
That isn't how LLMs work, unless it is a wildly popular problem the small details it learns from the internet gets lost in the neural web
@a.b3203 2 หลายเดือนก่อน ⁺¹³
As a person doing a bachelor's in EE, this will be very useful for me. Like many, I only wonder what'll happen in the future when it gets even more advanced?
Maybe take my reduced earnings and live off the land somewhere. Away from this.
@otty4000 2 หลายเดือนก่อน ⁺¹¹
i am doing a phd in ml related field.
Setting fair benchmarks and tests in the current day is quite hard considering the shear scale of data top models are trained on.
And using a famous physique text book isnt really a good attempt.
model o1 reasoning is a massive step up though for sure, i think it could do a similar blind test like this very soon.
@Ridz149 18 วันที่ผ่านมา ⁺¹⁰
Hahaha I love it, we are truly living in the future guys, appreciate it!
@Linshark 3 หลายเดือนก่อน ⁺⁷
It might hallucinate since it knows the answers. So one would need to check all the calculations.
@KyleKabasares_PhD 3 หลายเดือนก่อน
I just filmed a part 2 where it involves some problems where the answer is not known in advance, and problems that I'm confident it did not have access to previously: th-cam.com/video/a8QvnIAGjPA/w-d-xo.html
@diophantine1598 3 หลายเดือนก่อน ⁺¹³
Since that book is older than the model, I wonder if it appeared in its training data.
@Analyse_US 3 หลายเดือนก่อน
100%. Perplexity pointed me to at least 6 pdf versions available for free online. There are also lots of study notes
online available for this text. Although I have no idea if it is memorizing answers.
@lolilollolilol7773 3 หลายเดือนก่อน
@@Analyse_US it looks like it actually tries to solve the problems.
@Analyse_US 3 หลายเดือนก่อน ⁺¹
@@lolilollolilol7773 I agree, it's definitely not just remembering the answer. But is it remembering steps to solving the problems that it found in online study examples? I don't know. But my own testing makes me think it is a big step up in capablity.
@denysivanov3364 3 หลายเดือนก่อน
@@Analyse_US AI memorizes patterns. If pattern is similar but exercise is different AI will solve it.
@lolilollolilol7773 3 หลายเดือนก่อน ⁺⁶
Incredible. It would be interesting to see what happens if you give it to solve an incorrect result. Will it show that your result is incorrect and instead give the correct one ?
@xorqwerty8276 3 หลายเดือนก่อน ⁺¹¹
Imagine 10 years from now
@phen-themoogle7651 3 หลายเดือนก่อน
@@xorqwerty8276Star Wars Universe but more humanoid bots on our planet , and billions of them being like gods building anything and everything they imagine. Earth is surrounded by a giant dome that extracts/enhances light from the sun combined with technology that speeds up how fast plants or trees grow, we have a combo of biological machines that have become humans too and are interbreeding half humans half machines. The sun is all we need to survive now. Millions of unique new species emerge.
(10 years is like millions of years if true ASI comes in a year from now)
Even 2 years could be very wtf lol 😂
@MrSchweppes 3 หลายเดือนก่อน
In less than 3 years lots of knowledge workers will be displaced by AI.
@rwi6760 3 หลายเดือนก่อน ⁺⁴
as a high schooler who had taken part in aime, o1 is really impressive. aime problems get so much harder when it gets to the latter half. so 83% (o1) compared to 13%(gpt4o) is huge. the latter solve possibly only solve the first two which are not challenging at all
@ibzilla007 3 หลายเดือนก่อน ⁺⁸
If it is on the internet, it's in its training data. You would need to find questions that it has not been trained on. This is why benchmarking is so hard
@maalikserebryakov 3 หลายเดือนก่อน ⁺⁵
It still impressive the model can accurately comprehend which part of its training data deals with the problem in question.
There are human beings who haven’t mastered this skill lmao
@Weirdgeek83 3 หลายเดือนก่อน ⁺²
Stop the downplaying. These types of problems are impossible to solve without reasoning. Simple pattern recognition doesn't make this possible.
This cope needs to stop
@Nordobali 3 หลายเดือนก่อน ⁺⁵
I don't understand anything about physics and advanced mathematics, but this video just made me excited for the future again!
@Patrick-vv3ig 2 หลายเดือนก่อน ⁺³⁶
"PhD-level". Our undergraduate theoretical physics course in electrodynamics used Jackson lol
@mohammadfahrurrozy8082 2 หลายเดือนก่อน ⁺¹⁶
smells like a clickbait title you know
@trent2043 2 หลายเดือนก่อน ⁺¹
Definitely non-standard in the US.
@andreaaurigemma2782 2 หลายเดือนก่อน
You used it as a vague reference book but you never really read through it.
@Patrick-vv3ig 2 หลายเดือนก่อน
@@andreaaurigemma2782 Of course I did.
@andreaaurigemma2782 2 หลายเดือนก่อน
@@Patrick-vv3ig no you didn't and if I had a penny for every shitty undergrad bragging about how they went through hard books without understanding a single thing I'd be rich
@BAAPUBhendi-dv4ho 3 หลายเดือนก่อน ⁺¹⁰
The real question is can it solve Indian entrance exam questions or not?
@cheisterkamp 3 หลายเดือนก่อน ⁺⁹
Since it is an infamous book, how do we know that it really solved the problems by reasoning and is not just trained on the existing solutions?
@hxlbac 3 หลายเดือนก่อน
Is there the answers at the back of this book?
@cheisterkamp 3 หลายเดือนก่อน ⁺¹
@@hxlbac No, but an Instructor's Solutions Manual online as PDF and several other sample solutuons.
@plonkermaster5244 หลายเดือนก่อน ⁺⁴²
the problems are known by the llm already, it has been trained on the issue it dident come to a conclusion through reasoning
@tomyao7884 หลายเดือนก่อน ⁺³
to my knowledge, its data is only until october 2023, and it can solve problems created after that data cutoff just as well. (for example it o1 mini was able to solve some advent of code programming problems published december 2023)
@matiasluna514 หลายเดือนก่อน
@plonkermaster5244 Your statement it's half true, LLMs need to have existing information to propperly work. However, unless the problem presented needs an actual new theory with previous research and a never seen formula, LLMs can recognize the formulas needed to solve the problem. Good observation.
@I_INFNITY_I หลายเดือนก่อน ⁺⁶
Lol, it's not the case whatsoever, keep coping though.
@ZONEA0official หลายเดือนก่อน ⁺¹
@@matiasluna514to be fair, we as humans need to do that as well haha
@Kurt0v 18 วันที่ผ่านมา
@@I_INFNITY_I LLMs do not do reasoning. They just give the appearance of doing so. It's one of the most researched topics of LLMs
@JJ-fr2ki 3 หลายเดือนก่อน ⁺⁹
I suspect this was trained on the Jackson book.
@stevedavey9435 3 หลายเดือนก่อน ⁺⁵
God, if only I had this back in 2003 when I completed my physics degree. I would have saved myself so much pain and suffering.
@foubani3673 3 หลายเดือนก่อน ⁺²¹
This is scary. But you have to try with novel problems that the AI has never seen before. Chatgpt has been for sure trained with the Jackson book!
Nevertheless, the reasoning capabilities are astonishing.
A new era has begun.
@sid.h 3 หลายเดือนก่อน ⁺¹⁰
" Chatgpt has been for sure trained with the Jackson book!"
This is such an oft-repeated nonsense statement though. Just because a problem might be in its training set, the model will not be significantly or any better answering that exact problem than any other problem in the same category.
It's, like. Do you remember every homework math equation you have solved in your life?
Would you be any better at solving a problem you have already encountered once 10 years before vs a similar novel one? No, of course not, unless you have superhuman memory where you keep an exact copy of everything you've done ever.
Similarly, these models don't memorize. They synthesize. They are learning models, not search engines or "neually indexed databases" or whatever.
@denysivanov3364 3 หลายเดือนก่อน ⁺²
@@sid.h Ai remembers patterns, not particular problems. And indeed if some pattern is missing AI will miss it, if pattern is well represented AI will solve it well. Better architecture of neural network remembers more and remembers and solves corner cases better. This is what we see in chess networks such as Leela Chess Zero.
@boredofeducation-sb6kr 3 หลายเดือนก่อน ⁺²
The way this model was trained was it took physics problems just like that and used a model like gpt4 to create reasoning chains until it could actually derive the correct answer. So it's not surprising. It can already solve textbooks that are well solved already because the answer is very objective and once you get a solid reasoning chain to get to the answer, you can simply train the model on that
@AlfarrisiMuammar 3 หลายเดือนก่อน ⁺⁹
Open ai say The real Ai o1 version of will be out before the end of 2024.
@Romathefirst 3 หลายเดือนก่อน ⁺¹
really? where?
@achille5509 3 หลายเดือนก่อน ⁺⁴
They said about 1 month but will probably be end of 2024 as you say, o1-preview is not the full version there is the "full" o1 that is better yeah
@hidroman1993 3 หลายเดือนก่อน ⁺⁷
You show up in 2005 with this tool and they'd call it AGI
@luisalfonsohernandez9239 3 หลายเดือนก่อน ⁺⁶
Maybe it was in its training dataset, would be interesting for you to test something it could not have seen during training
@iFastee 3 หลายเดือนก่อน ⁺¹
not maybe, for sure. i know people dont have to be all experts in exactly what the black box of deep learning is doing but holy people are so dumb... i wonder if they don't think that IF what they think is true, meaning the models being this great, that in 1 month we wouldn't have to get new discoveries in all science fields...
which will not come because the current AI is 100% data capped. its just memorization of PDFs and manifold recalling
@KyleKabasares_PhD 3 หลายเดือนก่อน
This is a fair point! I have gone ahead and uploaded a Part 2 using problems I'm confident it had not seen before and that I have detailed answers to! th-cam.com/video/a8QvnIAGjPA/w-d-xo.html
@fcampos10 2 หลายเดือนก่อน ⁺¹²
I think giving it problems where it asks to arrive at a specific solution (shown in the problem itself) is not a good way to evaluate it.
I bet the results would be very different if you just asked it to solve the problem by itself.
@FlavioSantos-uw1mr 3 หลายเดือนก่อน ⁺⁶
Not bad for a model smaller than the o1 and based on GPT-4, to be honest I don't know how I'll be able to test upcoming versions like the ones based on GPT-5.
I can't wait to use this on university projects, there are so many things I need to go looking for experts for relatively "easy" tasks.
@danielbrown001 3 หลายเดือนก่อน
There’s so much potential in the pipeline. Imagine the o1 techniques applied to image/video generation. Bye-bye obviously fake images, and hello “indiscernible from reality” images.
Also, once o1 is layered on top of GPT-5, we’re likely talking “competing with or beating best-in-the-world level scientists/thought leaders” in different fields. This will fuel more investment into compute farms to create even MORE powerful AI, and multiple instances can run simultaneously to solve problems that would take humanity millennia to solve otherwise. Including AI researching how to improve AI in a self-improving recursive loop that will only stop upon reaching the physical boundaries of the universe.
@bgill7475 2 หลายเดือนก่อน ⁺¹⁴
This was an interesting test. I still think it's funny when people say these models don't understand.
Anyone who's used them enough understands that they do understand.
One nice thing is that you can ask follow up questions as well and ask why something is like that, or ask it to try things in a slightly different way if you want it done differently.
@woosterjeeves 2 หลายเดือนก่อน ⁺⁵
I dunno about latest models, but ChatGPT 3.5 does NOT "understand" anything. It feeds you fake references, and when you repeatedly tell it it is doing so, it will say "sorry" and continue to feed you fake references. That is not its fault--it is not "replying" or "responding" or dong anything a living being is doing. If you give it a training set containing PhD Level physics problems, sure it can solve those problems. That is just predicting output from a training data.
@人人人人人人人人 2 หลายเดือนก่อน ⁺¹
@@woosterjeeves This isn't GPT 3.5 though, and that specific model you mentioned was released back in November of 2022, the first public release of ChatGPT. In the video, you can see it's process of reasoning. ChatGPT doesn't use fake references if it's able to break it down and be able to express why and how it conducts it's problem solving and reasoning. Also to "That is just predicting output from a training data", one, how is that different from learning? Isn't that the point of teachers, to help you predict and reason the output from the input of questions and data? Two, this is just a preview, not the full model, and it is able to do extremely difficult problems like these, explain the reasoning, the process, and give the right answer. We are slowly gravitating towards such a world where such an excuse of prediction of data will no longer be viable to argue about. The model is able to understand. The model is able to think with it's data. It's putting formulas and answers together from it's data, to reason and to form intelligent answers and responses when in contrast, the same problems make the most qualified PhDs scratch their heads. Reminder, these questions take around 1.5 weeks as said to solve ONE problem, GPT-o1 does it in less than 2 minutes.
@woosterjeeves 2 หลายเดือนก่อน ⁺³
@@人人人人人人人人 Sure. I am still flummoxed why someone would add "understanding" to a prediction model. If you think prediction (from training data) is equal to understanding, then algorithms are already "understanding". Why hype this one? OTOH, if you think there is something qualitatively different, then we can talk about that. But you cannot claim both.
Are chess computers "understanding" because they can do moves that make super GMS scratch their heads? If so, then the argument is already over. I am only cautioning against use of common-term words ("understanding") which makes one think in terms of sentience. A language model has no such thing.
Does this mean AI will never reach sentience? I never said that--just that the video does not do it for me. I am totally clueless why others are impressed about this model's "understanding", the same way I would be if someone said Alpha-Zero (the chess AI) understands chess. That is all.
@Smrda1312 2 หลายเดือนก่อน ⁺¹
Please refer to the chinese room problem.
@fr5229 2 หลายเดือนก่อน
@@woosterjeeves If you’ve only used 3.5 then I’m not surprised that’s your opinion 😂
@themonsterintheattic 3 หลายเดือนก่อน ⁺⁴
i’ve been watching lots of videos on o1 and i’ve not had a wow moment yet…. but this was it
@yzhishko หลายเดือนก่อน ⁺²⁴
Solutions are publicly available and most probably in training datasets already. LLMs are good at what they already learned, but even not 100% accurate there.
@I_INFNITY_I หลายเดือนก่อน ⁺⁴
"to my knowledge, its data is only until october 2023, and it can solve problems created after that data cutoff just as well. (for example it o1 mini was able to solve advent of code programming problems published december 2023)"
@brucerosner3547 17 วันที่ผ่านมา ⁺¹
This is true for humans as well. I have worked in aerospace for a major company for many years. When I have to solve a difficult engineering problem I first search for for a so-called a "subject matter expert" in the field and it's quite likely that he or she will know the answer.
@cross4326 3 หลายเดือนก่อน ⁺¹⁵
GPT is most probably trained on the answers since it is a well known book
@sCiphre 3 หลายเดือนก่อน ⁺²
Maybe, but it showed its work
@mootytootyfrooty 3 หลายเดือนก่อน ⁺¹⁸
So much cope in the comments
@Mayeverycreaturefindhappiness 3 หลายเดือนก่อน ⁺²⁵
This book is probably in its training data
@japiye 3 หลายเดือนก่อน ⁺⁷
so why did it try different approaches and not just the correct one?
@Mayeverycreaturefindhappiness 3 หลายเดือนก่อน ⁺¹
@@japiye I am not sure but I do know it was trained on those types of problems so it’s not truly deriving those problems cold did you notice it would pull numbers out of nowhere. It’s still really impressive and a very useful model I think we skeptical that it’s really the equivalent of a physics grad student, if you watch ai explained video it gets common sense problems wrong
@trueuniverse690 3 หลายเดือนก่อน ⁺²
@@Mayeverycreaturefindhappiness Still impressive
@Mayeverycreaturefindhappiness 3 หลายเดือนก่อน ⁺²
@@trueuniverse690 yes
@deror007 3 หลายเดือนก่อน ⁺³
@@japiye As it probabilistically selects the next word, it will select different words compared to what is has seen. This is what makes the model generate new sentences, but it is able to evaluate it's chain of thought which leads to the correct one or a better result. As the problems are found online and the jackson problems are well known in the field for many years previously, it must be in it's training set.
@JJ.R-xs8rf 2 หลายเดือนก่อน ⁺¹²
The first one is the easy one? Yet at the same time you're amazed that it solved it in 122 seconds, while you mention that it generally takes others 1.5 week.
@Balls23912 หลายเดือนก่อน
he shouldve clarified that jackson problems can take around 10 hours to 10 days. that question probably takes a couple of days to do but not 10 days
@sephypantsu 3 หลายเดือนก่อน ⁺⁶
I tested to solve a sudoku and it failed. It either give wrong results or change up the original question
Still did much better than 4o when I tested it a few months ago
@rickandelon9374 3 หลายเดือนก่อน ⁺⁴
The first time i watched a video like this was from sixty symbols where they also tried to solve physics problems using the original vanilla Chatgpt 3.5. They didn't get anywhere close to this level. I think the progress is reallty accelerating. I also think that inference time compute is a very real thing and the guys at openai have solved it with this new model in a fundamental way for sure. I think there will be other ways to implement system 2 thinking but i think that using reasoning tokens for accomplishing this is maybe the best and coherent way to go forward. I truly think that with o1, we have the first complete architecture for AGI.
@Sheeshening 3 หลายเดือนก่อน ⁺⁴
And they wrote how this was just a step of many like that to come. In 5-10 years the world may be changed fundamentally, 20 years it’ll be hard to recognize
@debasishraychawdhuri 3 หลายเดือนก่อน ⁺³⁶
You have to give it your own problem. The book is part of its training data. That is why it just knew the sum.
@lolilollolilol7773 3 หลายเดือนก่อน ⁺⁵
Even if that was the case, the simple fact that it worked out the path to the solution is impressive. But you are likely wrong.
@lewie8136 3 หลายเดือนก่อน ⁺¹
@lolilollolilol7773
LLMs literally predict the next word based on probability. If the answer isn’t in the training data it can’t answer the question. It doesn’t have reasoning skills.
@Lucid.28 3 หลายเดือนก่อน
But they do have reasoning skills ,
@lewie8136 3 หลายเดือนก่อน
@@Lucid.28 No they dont.
@Tverse3 3 หลายเดือนก่อน ⁺³
@@lewie8136they recognize patterns like we do... We don't really think, we also predict things based on the patters we see... We just named it thinking.
@Drone256 3 หลายเดือนก่อน ⁺¹¹
This should make you seriously question the way we do education. If human value isn't in solving problems that are hard, yet clearly defined, then why teach that? You teach it because you need to know it to solve higher level problems. But maybe we no longer need to also train the skill of doing the calculations. So long as you understand the concept properly you can move on without spending a week pushing through the math. That's going to be very hard for some people to accept.
@JaredCooney 3 หลายเดือนก่อน
Understanding the concept, unfortunately, typically requires dozens of practical experiences. This us why teaching math starting with a calculator leads to lesser learning than introducing a calculator following basic practice
@Drone256 3 หลายเดือนก่อน
@@JaredCooney very true. But I think students will be doing less of it and learning more. We’ve seen this pattern before.
@lorenzocl6002 3 หลายเดือนก่อน ⁺¹⁵
All i see is that in a few years AI will be able to do everything and most of us will be obsolete
@marcusrosales3344 3 หลายเดือนก่อน ⁺²
Keep in mind these companies lie A LOT! Like the bar exam, it tests in the 60th percentile with the initially hidden caveats in place
@danielbrown001 3 หลายเดือนก่อน ⁺¹
@@marcusrosales3344True, but the money doesn’t lie. Until the bubble bursts, very smart people have bet tens of billions of dollars on it being game-changing. And notice how the goalposts keep moving back? “It’s ONLY getting 60% on the bar” is a far cry from 5 years ago when AI could only put out gibberish.
@ISK_VAGR 3 หลายเดือนก่อน ⁺⁵
But one moment, how can you verify that the GPT did not know about this problem before and is only recreating it from his own knowledge base? You need to give something that you are 100 sure that it doesn't know. For that the best way to do it is to ask if it actually knows the solution directly if GPT4o knows the solution, it is likely that the o1 knows it.
@KyleKabasares_PhD 3 หลายเดือนก่อน ⁺²
This is a good point! I just recorded a Part 2 with some new problems that I believe it didn't have in its knowledge-base. Will be uploading shortly!
@EGarrett01 3 หลายเดือนก่อน
I don't know if it's seen these problems before, but it was tested repeatedly using newly made-up logical and reasoning problems and it solved them. I showed it some work that was unpublished (but actually valid and verifiable) so I knew it hadn't seen it, and it's response was the same as I would expect from someone who was very experienced in the field and seeing it for the first time. So it definitely can reason (in its own way) on its own without already knowing the answer. I highly recommend the "Sparks of AGI" paper or lecture that goes into this in detail.
@bbamboo3 3 หลายเดือนก่อน ⁺⁴
I asked it to find how much the earth would have to be compressed to become opaque to neutrinos: It took it 39 seconds to say 26 km diameter. Totally fascinating how it got there...(01Preview)
@Diamonddavej 3 หลายเดือนก่อน
The correct answer is ~300 meters. It told me 360 meters.
@Dante-uw1ge 2 หลายเดือนก่อน ⁺¹⁶
Chat we're cooked
@p-k98 3 หลายเดือนก่อน ⁺⁶
Well, until you yourself don't know whether what it did was correct, we can't say for sure. It is surprising nonetheless, yes, however if you had given these problems to the earlier version, it would have also arrived at the conclusion required, i think. It would have just done some mumbo jumbo and forcefully arrived at the conclusion, no matter what it got wrong in the process. This time around though, it looks like it actually did all the things correctly in its "reasoning" process.
@tanner9956 2 หลายเดือนก่อน ⁺⁶
ChatGPT is truly amazing i wonder what this technology will be like in 10 years i think schools should really use this technology and allow it because it’s not like it’s going away tomorrow. I also think this technology makes it impossible to be ignorant
@deal2live 2 หลายเดือนก่อน ⁺¹⁵
You can retire now??
@coolcraft7624 2 หลายเดือนก่อน ⁺²¹
I’m worried that this doesn’t show anything that somewhere and it’s training data. It has the answer and it’s memorized how to explain the answer, but not the underlying logic.
@bestopinion9257 2 หลายเดือนก่อน ⁺⁴
Most of the time if it can give you the answer, it is enough.
@o_o9039 2 หลายเดือนก่อน
the way so is trained is lossful the ai doesn't have access to word for word all of the info they were trained on it doesn't have anything "memorized"
@98danielray 2 หลายเดือนก่อน
@@o_o9039regardless, it synthethizes training data to a big extent, so in some sense, it is "memorized"
@konoha4 3 หลายเดือนก่อน ⁺⁸
I would have done the test of giving the answer with some error, for example an extra factor of 2, or an arctan instead of arcsin, and see if it gets the true answer anyway and recognizes the incorrect input. That would make a very convincing test.
@Ikbeneengeit 3 หลายเดือนก่อน ⁺⁷
But surely the model has already been trained on that textbook?
@KyleKabasares_PhD 3 หลายเดือนก่อน
It's a fair point, I've gone ahead and filmed and recorded a part 2 that involves problems I'm confident it hadn't seen before: th-cam.com/video/a8QvnIAGjPA/w-d-xo.html
@ketanmann4371 2 หลายเดือนก่อน ⁺¹⁰
Great Video. Can it be the case that solutions were part of training data of this model as earlier GPTs had a lot of books as training data?
@chiscoduran9517 2 หลายเดือนก่อน ⁺¹
It is possible but hard to know
@disgorgeengorge 2 หลายเดือนก่อน ⁺⁵
It ABSOLUTELY was. Jackson is such a common book used in grad EM.
This video has almost no substance, there's no verification on the accuracy of the logic. Guy also said he didn't know if it was correct.
@akaalkripal5724 3 หลายเดือนก่อน ⁺³⁹
We need AI to replace politicians, ASAP. The 'presidential debate' was a travesty.
@jamesbench8040 3 หลายเดือนก่อน ⁺⁶
best realization I've heard in weeks
@CubeStarMaster 3 หลายเดือนก่อน ⁺⁴
An "ai president" as long as there isn't a person telling it how to think could be the best thing for any country. I would still give it a few years before doing so tho and make sure it's main objective is to do the best for the country.
@tchadcarby8439 3 หลายเดือนก่อน
I support this idea 1000%
@alvaroluffy1 3 หลายเดือนก่อน
i think current o1-preview is far more capable to govern than any human. Of course, it would need some readjustments like a more continous existence, without resetting itself, and a virtually infinite context window so it can always take into account everything that has ever happened in the past
@gurpreet4912 2 หลายเดือนก่อน
You have no clou how ai works 😂
@h-e-acc 3 หลายเดือนก่อน ⁺¹
I mean it gave you step by step how it was able to solve those problems and gives you its insights into how it’s thinking. That is just wild beyond imagination.
@mradford10 3 หลายเดือนก่อน ⁺⁹
Great video and interesting commentary. It’s interesting you think this might be a good study aid or a tool… however I just watched you take longer to check the answers than the model took to solve them… and your an actual subject matter expert… and as you correctly pointed out, this is just a preview of the full model capabilities. This new type of model will not help experts, but replace them. They will eclipse not only human level knowledge, but human level speed. This is not a tool. It’s disruption personified. With something this good (and as the saying goes, this is as bad as they will ever be as they will only improve from this time onwards) what purpose will it serve to complete university study for 3 years, only to try and find employment in a career that no longer requires humans. Amazing.
@msromike123 3 หลายเดือนก่อน ⁺³
It's a machine, like cotton gin, the steam engine, the locomotive, etc. All advance of technology has displaced people from some jobs into others. And yet we are still here. What's the alternative? Structure society to be less productive and less efficient in order to keep people employed in obsolete jobs? That will just slow the growth of the economy and cause a lower standard of living, leading to poverty and hunger as the world population keeps multiplying. It's going to put people out of work, we will be ok. Becoming a Luddite is not going to change anything.
@AlfarrisiMuammar 3 หลายเดือนก่อน ⁺²
@@msromike123Cars replace horses So will humans suffer the same fate as horses?
@msromike123 3 หลายเดือนก่อน ⁺¹
@@AlfarrisiMuammar I am glad you are thinking about it now. 1) Truck drivers replaced wagon drivers (not horses.) There are many more truck drivers now. 2) The standard of living for both truck drivers AND horses is higher than ever. Same thing goes for automobiles and horses.
@robclements4957 2 หลายเดือนก่อน ⁺⁶
Tip to past questions in: ask ChatGPT 4o to transcribe the picture
@domenicperito4635 3 หลายเดือนก่อน ⁺⁴
What happens when the full orion model drops soon? This is like half as "smart"
@sergiosierra2422 3 หลายเดือนก่อน ⁺¹⁰
Lol that was my reaction last year with gpt4 but with programing
@AlexBerish 3 หลายเดือนก่อน ⁺⁴
FYI you should put new problems in new chats to avoid polluting the context window
@akarshghale2932 3 หลายเดือนก่อน ⁺⁵
If you want to test the actual knowledge of the model then use textbooks that were compiled with questions created after the knowledge cutoff of the model. This doesn't reflect its actual knowledge but prior knowledge of the model.
@trejohnson7677 3 หลายเดือนก่อน ⁺⁵
the changing my approach part was kinda scary ngl
@kylokat 2 หลายเดือนก่อน ⁺¹⁰
the amount of copium here is hilarious
@Saracsh 3 หลายเดือนก่อน ⁺⁴
How can you be sure that the model has not actually been learned by this book?
@AAjax 3 หลายเดือนก่อน ⁺¹
I don't think you can be, but the fact that it tried one approach and then backtracked and did another is pretty good evidence it's not just a regurgitated answer.
@KyleKabasares_PhD 3 หลายเดือนก่อน
That is a valid point, this is why I have gone ahead and made a part 2 using problems that I'm confident aren't floating around on the internet: th-cam.com/video/a8QvnIAGjPA/w-d-xo.html
@OmicronChannel 3 หลายเดือนก่อน ⁺⁷
Just as a comment: it looks impressive. However, to truly judge how good the model is, one (unfortunately 😬) needs to read the proofs line by line and examine the arguments in depth. From my experience with GPT-4, the proofs often look good, but they sometimes contain flaws when examined more closely.
@KyleKabasares_PhD 3 หลายเดือนก่อน ⁺⁵
Just finished recording a video where I do that more or less with some problems I have the answer to and am pretty sure the problem didn't exist on the internet!
@KyleKabasares_PhD 3 หลายเดือนก่อน ⁺⁴
Here is part if you are interested: th-cam.com/video/a8QvnIAGjPA/w-d-xo.html
@qozia1370 หลายเดือนก่อน ⁺¹⁵
How do you know the answers are correct?
Did you show it in the video and I missed it somehow?
@jongeorge3358 หลายเดือนก่อน ⁺⁵
Yes he said it multiple times
@hermanbrachey7653 หลายเดือนก่อน ⁺²
@@jongeorge3358he just kept saying it "looked right". He never did actual calculations or looked at an answer key
@joaovitormendescerqueira6985 17 วันที่ผ่านมา
He solved the problems and remembers the answers
@hermanbrachey7653 17 วันที่ผ่านมา
@@joaovitormendescerqueira6985 and how is that helpful for the people who don't know the answer ???
@RentalProcurement 2 หลายเดือนก่อน ⁺¹³
great, but can it make my 2010 PC run Crysis
@Надежда1611 2 หลายเดือนก่อน
underrated comment 😂 (from someone that couldn’t even talk in 2010) 💀
@AnonymousApexio 2 หลายเดือนก่อน ⁺¹
@@Надежда1611 That's crazy.
@tama-gq2xv 3 หลายเดือนก่อน ⁺⁹
OMFG, another year.... everyone going to have a PHD.
@hipotures 3 หลายเดือนก่อน ⁺¹
Or no one, because why do something that a machine does better?
@tama-gq2xv 3 หลายเดือนก่อน ⁺³
@@hipotures No, you don't get it. The standards have been raised. The hyper intelligent.... are going to be on steroids. I know i am.
Imagine if someone at 18 with an iq of 145+ with AI tools at their disposal? now imagine a decade of this progress and the new generation coming in.
We're going to see hyper geniuses.
@hypnogri5457 3 หลายเดือนก่อน ⁺⁷
You need to rigorously check each step one by one. If you don't, then it can just lie at the end and you would be none the wiser
@KyleKabasares_PhD 3 หลายเดือนก่อน
This is a fair point, which is why I've made a part 2 using problems I have the answer to: th-cam.com/video/a8QvnIAGjPA/w-d-xo.html and these are questions I'm confident are not in its training set.
@ribaldc3998 3 หลายเดือนก่อน ⁺⁴
I think a student who can't check the answer for correctness may get his ‘points’, but if the professor asks, the gaps in his understanding will quickly become apparent.
@Fabricio-rm4hj 3 หลายเดือนก่อน ⁺⁷
what if this book was in the training base?
@Ken-vy7zu 3 หลายเดือนก่อน ⁺³
Yes, probably. But this is only o1 model, every 6 months openAI release new model. What do you think, an o9 can do?
@lolilollolilol7773 3 หลายเดือนก่อน ⁺¹
You need the solutions book, if it exists. But the model backtracked, so it tried several methods.
@mirek190 3 หลายเดือนก่อน ⁺¹
nothing ... gpt4o had the same training data and fails to sole it
@err0rz633 3 หลายเดือนก่อน ⁺³
It would be cool if one of the creators of these problems could get paid to make original ones on the spot and feed it to o1.
@KCM25NJL 3 หลายเดือนก่อน ⁺⁷
Now give it the Millennium problems and see how it does with those
@greenboi5632 3 หลายเดือนก่อน ⁺¹
LOL
@evangelion045 3 หลายเดือนก่อน
It still produces wrong answers in topics as basic as finite automaton.
@adamskrodzki6152 3 หลายเดือนก่อน ⁺⁴
Generally I would love to see some problems where you do not need to prove solution you know in advance, So not "Show that" but what is ..... I wonder if those proofs are actually flawles or they just look convincing
@joshs.6155 3 หลายเดือนก่อน ⁺³
But is it at the level where it can say "I don't know" yet or will it still say nonsense
@lutaayam 3 หลายเดือนก่อน ⁺⁵
I was afraid of watching the video thinking that it might have failed the questions
@MrErick1160 3 หลายเดือนก่อน ⁺⁶
Hey man! You should do a video with scores, like, you will do 5 tests, and allow 5-shot for each problem to each model. And then see out of 5 what's the score. Do this for GPT4o vs O1 preview, you can also do O1 vs Claude sonnet!
Like a "LLM's Face Off"
@KyleKabasares_PhD 3 หลายเดือนก่อน
I actually did a stream like that last night! Gave o1, 4o, Gemini Advanced, Claude Sonnet 3.5, Grok 2, and LLama 3.1 a college math exam! th-cam.com/users/liveGdN4MFxLQUU?si=flPSFIxx85Uqyoz7

ต่อไป

เล่นอัตโนมัติ