I keep reading people crapping on the new advanced voice, but I am just absolutely blown away by it. The fact that it was able to solve any of those problems just from being told them verbally, and even threw in some comedy by doing it in a New York accent... seriously, I still can't believe we actually have this technology right now. When you take a step back and truly try to grasp what this thing is doing, at least for me, my jaw just drops to the ground. Great videos, by the way.
Agreed
The NY accent was OK. I'm a New Yorker and it didn't quite nail it, but I'm just being picky; what really matters is that it actually solved it! Through verbal instruction! Amazing! People who crap on it will keep pushing the goalposts further and further as it comes closer and closer to AGI; that just shows the level of fear and threat they feel from this technology.
I understand that people have high expectations because of the demos, but those were mostly for stuff without real value (for me at least), things like moaning, or telling stories with very specific or invented accents or languages.
Humans are just endlessly ungrateful spoiled brats. That's never going to change, unfortunately. We may have god-level tech, but we'll always be greedy for more.
100% agree.
All the people who complain about anything about Advanced Voice Mode should stop everything they're doing right now and watch Louis CK's bit: "Everything is amazing and nobody is happy."
Finally a demo different from "count from 0 to 100 as fast as you can". Thanks
Hey Kyle. Fun video. Just wanted to comment again that the advanced model is not actually transcribing to text, but is taking the audio as input directly. Transcribing to text is what the old version was doing.
In a way this actually makes it even more impressive in my opinion, and it would be interesting to try to figure out whether there is any distinction between its knowledge and understanding in the audio modality vs. the text modality. The best case is that it has been able to generalize and internally project both to the same semantic representation, but in practice I would guess it ends up not being as good at things like math (which seems to be the case). With the text modality there is a discrete and unambiguous representation available to it in its context, whereas with audio this is not the case.
Thanks so much!
I think OpenAI needs to hire you as an employee to create your own dataset to train GPT and improve accuracy on astronomy tasks.
You are doing a great job, keep going 💯
Such fun. I had no problem getting her to speak like an angry Australian, but when I asked it to sound like an angry Italian, she said she won't do stereotypes. As someone half Italian, I think I resent that!
That is interesting... perhaps in a few more weeks they'll nerf it to stop producing the angry Australian tone. It's a shame big businesses always have to conform to political correctness/wokeness.
Watching you interrupt it politely and then progress to frustration is 😂
Regarding the issue you were having with the integration problem: as far as I know, advanced voice mode is using 4o as a basis (not o1). This means advanced voice mode suffers from the same historical encoder-related issues, i.e., the joke about LLMs not knowing how many R's are in "strawberry" still applies. This is a well-understood problem that is solved in o1. Give it time :)
Yep
I meant to say tokenizer, not encoder.
For what it's worth, o1 still uses a similar tokeniser (as far as we know) and still has frequent issues with Strawberry r's...
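To make the tokenizer point above concrete, here is a minimal sketch using the open-source tiktoken library. Assuming the GPT-4-era "cl100k_base" encoding purely for illustration (the tokenizer actually behind advanced voice mode isn't public), it shows why letter-counting questions are awkward for these models:

```python
# Minimal sketch: why counting letters trips up LLMs.
# The choice of "cl100k_base" is an assumption for illustration only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")

# The model sees a few multi-character chunks, not ten individual letters,
# so "how many r's?" has to be inferred rather than read off directly.
print(tokens)                              # a short list of integer token ids
print([enc.decode([t]) for t in tokens])   # e.g. chunks like ['str', 'aw', 'berry']
```

With audio input the representation is even further removed from individual characters, which fits the thread's point that the text modality at least gives the model a discrete, unambiguous view of the problem.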
A couple of points to consider:
1. Voice input for these types of problems becomes a nuisance for humans and a source of noise for LLMs.
2. Some people are quick to suggest that certain problems might have been present in the training data. While this may be true, I always think that the training process is geared more towards creating accurate representations than simply memorizing answers.
However, you provided an example where the steps toward the solution were inaccurate, and the model still arrived at the correct answer. In another example, it read a bunch of fractions that weren't actually there, but got the rest of the problem right. This makes me think that there might be some level of memorization at play here, especially since, as far as I know, these models process audio input in an end-to-end manner.
Holy shit. That relativity problem is handled superbly well. It's not just about multiplication and division, but also exponents.
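For context, the kind of special-relativity arithmetic being praised here typically runs through the Lorentz factor, gamma = 1/sqrt(1 - v^2/c^2). The exact problem from the video isn't quoted in the thread, so the numbers below are purely illustrative:

```python
import math

# Hypothetical example of the arithmetic involved: time dilation at 0.8c.
# The specific values are illustrative, not the problem from the video.
v_over_c = 0.8
gamma = 1.0 / math.sqrt(1.0 - v_over_c ** 2)   # Lorentz factor, ~1.667

proper_time = 10.0                   # seconds elapsed in the moving frame
dilated_time = gamma * proper_time   # ~16.67 seconds in the lab frame
print(f"gamma = {gamma:.3f}, dilated time = {dilated_time:.2f} s")
```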
Here are a few different tests the user could try with OpenAI's advanced voice mode:
Differential Equations: Present a basic first-order or second-order differential equation and ask for the general solution or a particular solution given initial conditions.
Optimization Problem: Pose a multivariable calculus problem, such as finding the local minima or maxima of a function using partial derivatives or Lagrange multipliers.
Physics Kinematics: Give a scenario involving an object under projectile motion with initial velocity, angle, and gravitational force, and ask for time of flight or maximum height.
Logic Puzzles: Present a complex logic problem (e.g., involving truth-tellers and liars) and ask the AI to reason through and provide a solution.
Chemistry Stoichiometry: Give a balanced chemical equation and ask the AI to calculate the number of moles or mass of a product formed from given reactants.
These tests would assess a broader range of STEM and reasoning capabilities. (A quick way to spot-check the model's spoken answers programmatically is sketched below.)
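As referenced above, one way to verify the answers it gives verbally is to recompute them symbolically. A minimal sketch for the differential-equation suggestion, assuming Python with sympy installed and using y'' + y = 0 as an arbitrary illustrative ODE:

```python
# Minimal sketch: check a spoken answer to the differential-equation test
# against sympy's symbolic solver. The ODE chosen is illustrative only.
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')

ode = sp.Eq(y(t).diff(t, 2) + y(t), 0)
general_solution = sp.dsolve(ode, y(t))
print(general_solution)   # Eq(y(t), C1*sin(t) + C2*cos(t))
```

The same pattern (state the problem verbally, then recompute with sympy or a CAS) works for the optimization, kinematics, and stoichiometry suggestions too.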
So this is GPT-4o, not o1, so it will be very interesting to see where this goes when we have access to o1.
We are only at the very beginning... just imagine what AI will be like 10 years from now.
Trivial insight
10 years with double exponential growth will be crazy
Just think, this is the worst it will ever be, and it will just improve from here.
Can it take a photo upload of the circuit diagram via vision and solve it properly?
It can; however, OpenAI hasn't released that feature yet.
@@pigeon_official ok
6940 lol why is it getting stuck there?
That’s what I wanted to know
I think there is some problem where the speech is converted to tokens
o1-mini was doing the same thing to me - ignoring my instructions after I corrected it
Interesting. Still more work to do.
he starts to think that humans are stupid and ignores them.
@@programmingpillars6805 'He'? I don't think it thinks it's a he.
@@lanceguilin In today's world it's just a matter of time till you must say "he" or "she" to AI LLMs.
Very good!
The fact that it missed the two resistors being in series rather than parallel may have been a programmed error to make it seem more human.
So this is using GPT-4o, right? So no chain of thought?
Right, advanced voice mode so far only works with 4o.
@@mgscheue Next step for Advanced Voice Mode: to learn to pause for thinking while routing the advanced query to o1-mini (or even the full o1), then integrate the result from that call into its response.
Remember that the new voice mode only uses GPT-4o, so it won't be as smart as o1 outside the new voice mode.
Thanks
so coool
This was hilarious. 😂🤣
God damn!😮
And it is all matrix operations and algorithms run fast!
I'm impressed about that too!
🇧🇷🇧🇷🇧🇷🇧🇷👏🏻, This is amazing!
3:48
I was even more persistent, but with textual input, prompting it to program in Wolfram Language, and at least 80%-90% of my attempts were unsuccessful. Not sure how it can solve tasks at PhD-student level 😕
The model used in the video is 4o, not the one that solves PhD problems
The AI is hard of hearing
Try the integral problem again with a different number. I have a weird suspicion that the issue might actually be a result of them trying to stop it from engaging in both 69 and 420 jokes...
I tried it without the 69420 and it still did not do it, unfortunately.
Copilot had no problems with the integral. You can see that if they integrated the chat capabilities with collaborative software, it would be much more powerful. I imagine this is what Khan Academy is doing to create an online tutor.
th-cam.com/video/_nSmkyDNulk/w-d-xo.html
Their text-to-speech team deserves much more gratitude than their LLM scammers.
It's not text-to-speech.
The audio file, like the actual .wav file (not sure what kind of audio file), is being tokenized; the waveform itself is being tokenized, and the AI is generating audio tokens back. No text involved.
Bruh...😆
This is not text-to-speech; it's a natively multimodal model, so it's voice-to-voice. It hears what you're actually saying. It's not transcribing your voice to text.
Do you want me to take my gratitude back then?
I didn't find confirmed information that they are using a purely end-to-end approach to produce the sound wave.
They definitely have some sort of tokenization, and until I see the implementation, or at least a paper with a detailed model architecture, I will assume that they use text-to-speech, because they already have a tremendously large LLM. Why not just produce text with this model? You don't need a wave-to-wave model to generate speech of this quality.
I may be wrong, but I trust myself more than I trust OpenAI, sorry. In any case, my point was completely tangential to this discussion.
@@bashbarash1148 Go watch the GPT-4o launch live stream from 4 months ago. They talk about how the old voice mode used text-to-speech, which introduced a lot of latency, but 4o is multimodal and reasons natively in text, speech, and vision.
"They already have a tremendously large LLM. Why not just produce text?" Because 4o was actually trained on text, video, and audio; it's a fully multimodal model, it's just that up to this point text has been the only available input. Now with advanced voice mode straight audio is available as an input, and once vision gets integrated into advanced voice mode, straight images will be allowed as an input.
Yes, there is tokenization, but the audio is being tokenized, those tokens are being fed in, and audio tokens are being produced; there is no textual middleman. This is provably true from the fact that advanced voice mode can literally steal your voice. I've had it actually start to respond for me in my own voice, which is creepy btw, but that wouldn't be possible with speech-to-text; it's only possible if my voice, with its timbre, affectation, etc., is being tokenized.
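To make the "no textual middleman" point concrete, here is a purely conceptual sketch contrasting the old cascaded pipeline with an end-to-end audio-token model. Every function here is a stand-in stub invented for illustration, not OpenAI's actual API:

```python
# Conceptual sketch only: old cascaded voice pipeline vs. end-to-end audio tokens.
# All functions below are hypothetical stubs, not a real API.

def speech_to_text(wav):   return "transcribed text"          # stand-in ASR
def llm_generate(text):    return "text reply"                # stand-in text-only LLM
def text_to_speech(text):  return b"synthesized waveform"     # stand-in TTS

def audio_tokenizer(wav):      return [101, 102, 103]         # waveform -> discrete audio tokens
def multimodal_model(tokens):  return [201, 202, 203]         # audio tokens in, audio tokens out
def audio_detokenizer(tokens): return b"generated waveform"   # audio tokens -> waveform

def old_voice_mode(input_wav):
    # Cascade: ASR -> text LLM -> TTS. Adds latency, and tone/timbre are lost
    # at the transcription step.
    return text_to_speech(llm_generate(speech_to_text(input_wav)))

def advanced_voice_mode(input_wav):
    # End-to-end: the waveform itself is tokenized and the model emits audio
    # tokens directly, with no text in between.
    return audio_detokenizer(multimodal_model(audio_tokenizer(input_wav)))
```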
😆🤣😆🤣😆🤣😆🤣😆🤣😆😂
Been waiting for this mode and then the EU bans it. So dumb, Jesus.
I have to say this is painful at times to listen to - you are annoying her lol
That accent bro why
🤣🤣🤣
Might have to show this to some of my anti-AI copium frens. Can't wait to see AI take everything. Fingers crossed we're allowed to integrate into the AI god.
I've been doing a role play with it since yesterday by having it do a countdown with different emotions in different scenarios. I am blown away and often shocked at how realistic its portrayal is, especially the crying voice. It's incredible.