🗂 GET ALL THE CODE FILES: bartslodyczka.gumroad.com/l/jeznwq 📋 Take This Quick Survey: forms.gle/otAr1xUamgyYZE5y7 📺Realtime API Tutorial Series: th-cam.com/play/PLi7jtY2ZZqRYE8Lvw4MuLHTZPYTA4jZHQ.html&si=7DAE9z7YtQlMrzrd
If you’re just performing audio to text, is it necessary to specify both text and audio modalities? Will the model just ignore the audio file if you don’t specify both modalities?
I haven't tested if the model will ignore it and yeah also not sure if you need to specify both. Made this code a couple weeks back and can't recall from the top of my head 🙏
Thanks. How is that different from whisper voice to text? For voice to text usecase? The price difference is 10x. Is it faster? Is Quality better? The price looks stull very high. Like 20$/ hour of voice conversation. Almost, the cost of hiring humans for talking).
Haven't done any work with whisper voice to text so i cant say, but in the demo I show this new audio model recognise abstract sounds and not just speech. So if whisper is cheaper for now, then you might stick with that for speech to text. Whereas for more dynamic sound recognition, you can use this audio model
I just tested using short audio with 2 speakers talking to each other. I asked for a transcript of the convo broken down by speaker and it gave me the below: **Speaker 1:** So, Erin, in your email you said you wanted to talk about the exam. **Speaker 2:** Yeah, um, I've just never taken a class with so many different readings. I've managed to keep up with all the assignments, but I'm not sure how to... how to... **Speaker 1:** How to review everything? **Speaker 2:** Yeah. In other classes I've had, there's usually just one book to review, not three different books. Plus all those other text excerpts and videos...
🗂 GET ALL THE CODE FILES: bartslodyczka.gumroad.com/l/jeznwq
📋 Take This Quick Survey: forms.gle/otAr1xUamgyYZE5y7
📺Realtime API Tutorial Series: th-cam.com/play/PLi7jtY2ZZqRYE8Lvw4MuLHTZPYTA4jZHQ.html&si=7DAE9z7YtQlMrzrd
Instant subscription, then I saw you build and provide resources. Excellent content.
Thanks legend 🤝
You are an absolute legend! You should have millions of subscribers
haha! thank you my man!
You are very pedagogic and explaining very well. Thanks for sharing!
thank you very much, appreciate this comment 🙏
very well structured video and test, great work! hope you do more videos
thanks legend! Will do 💪
Most onpoint video 😍 i appreciate you
thanks man! I appreciate you too 💪
If you’re just performing audio to text, is it necessary to specify both text and audio modalities? Will the model just ignore the audio file if you don’t specify both modalities?
I haven't tested if the model will ignore it and yeah also not sure if you need to specify both. Made this code a couple weeks back and can't recall from the top of my head 🙏
Thanks. How is that different from whisper voice to text? For voice to text usecase?
The price difference is 10x. Is it faster? Is Quality better? The price looks stull very high. Like 20$/ hour of voice conversation. Almost, the cost of hiring humans for talking).
Haven't done any work with whisper voice to text so i cant say, but in the demo I show this new audio model recognise abstract sounds and not just speech. So if whisper is cheaper for now, then you might stick with that for speech to text. Whereas for more dynamic sound recognition, you can use this audio model
is it doing diarizarion? separation voices - voice1 - voice2 etc?
I just tested using short audio with 2 speakers talking to each other. I asked for a transcript of the convo broken down by speaker and it gave me the below:
**Speaker 1:** So, Erin, in your email you said you wanted to talk about the exam.
**Speaker 2:** Yeah, um, I've just never taken a class with so many different readings. I've managed to keep up with all the assignments, but I'm not sure how to... how to...
**Speaker 1:** How to review everything?
**Speaker 2:** Yeah. In other classes I've had, there's usually just one book to review, not three different books. Plus all those other text excerpts and videos...
@@BartSlodyczka wow wow, I will try. thank you