I'm doing something similar but building AI-powered cybersecurity applications instead. And you're right, this whole thing is taking off.
We laughed at the Star Trek scene where Scotty, the engineer, tried to speak to the computer to build software. They thought he was crazy... he wasn't crazy 💯
It's cool seeing someone who really gets where these things are going and also what you can do with them right now. I genuinely think there will be people within the next 5 years who will have superintelligent AI directly accessible in their brain, at least if everything goes right. Which I feel insane for typing, but it truly doesn't seem impossible. Thanks for releasing this too.
Thank you. I totally agree with you. I felt after Whisper came along the Speech-to-Text and now Speech-to-Speech combined with these new tools, future models and reduced costs will be such a game changer. I am still surprised it's rarely mentioned and how it will super-charge productivity and fundamentally change the way we code and interact with our devices. I think once it's combined with all of your data (and external data of course) in an intelligent way the creative process will be mind blowing compared to how we work now. Keep the videos up. Thanks
Another Awesome video!
Making sure to comment to get you algorithm points.
What's your experience with a more open-ended toolbox for the agent? Like having a database in Supabase with functions and agent workflows that can be semantically searched and used. This way you don't have to provide a long list of available tools to the agent, and adding new tools or workflows, or even letting agents create, test, and then add them for reuse, would be easier. Like what they did with that Minecraft agent, Voyager was its name? Or does this fail, and if so, where and why?
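The Voyager-style idea above can be sketched without any infrastructure: embed each tool's description, then retrieve only the top matches for the current request instead of sending the full tool list. In this sketch the embedding is a toy bag-of-words stand-in and the tool names are invented for illustration; a real setup would store the embeddings in Supabase/pgvector or a similar vector DB.

```python
# Sketch of a semantically searchable tool registry (Voyager-style).
# The "embedding" here is a toy bag-of-words counter; swap in a real
# embedding model + vector DB for production. Tool names are made up.
from collections import Counter
import math

TOOLS = {
    "create_file": "create a new file on disk with given content",
    "run_sql_query": "execute a sql query against the database and return rows",
    "send_email": "send an email message to a recipient",
}

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search_tools(query: str, k: int = 2) -> list[str]:
    # Rank tools by similarity to the request; only the top-k get
    # handed to the agent, instead of the whole registry.
    ranked = sorted(TOOLS, key=lambda n: cosine(embed(query), embed(TOOLS[n])), reverse=True)
    return ranked[:k]

print(search_tools("query the database"))  # "run_sql_query" ranks first
```

The agent then only sees the retrieved schemas, which keeps the prompt short no matter how large the tool library grows.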
I've been waiting to see something on the new openai stuff from you. Gotta head into work and listen!
It’s truly great that you release the code for this for the masses
After listening to you I started completely rethinking the system I am developing 🎉
Thanks for this video and your POC project! really Epic stuff. I built a RAG prototype using Ollama and Qdrant and just updated your project to have a function call to get the related vectors from Qdrant and then have the advanced voice mode tell me about them and it works flawlessly... mind racing with all the ideas of how to integrate this into our products! Appreciate the effort to share this with the community 🔥
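The pattern this comment describes, exposing vector retrieval to the voice model as a function call, can be sketched in a few lines. Everything here is an invented stand-in (the `search_docs` tool name, the fake in-memory store); the real version would query Qdrant via its client instead of the keyword match below.

```python
# Minimal sketch: register a retrieval function as a tool and dispatch
# the model's call to it. The schema shape mirrors generic function-
# calling tool definitions; the store is a fake stand-in for Qdrant.
import json

SEARCH_DOCS_SCHEMA = {
    "type": "function",
    "name": "search_docs",
    "description": "Return the most relevant stored passages for a query.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

FAKE_STORE = {
    "onboarding": "New users sign up with an email and a password.",
    "billing": "Invoices are generated on the first of each month.",
}

def search_docs(query: str) -> str:
    # Stand-in for a vector search: naive keyword match over the store.
    hits = [text for key, text in FAKE_STORE.items() if key in query.lower()]
    return json.dumps({"results": hits or ["no match"]})

def handle_tool_call(name: str, arguments: str) -> str:
    # The model sends a tool name plus JSON arguments; we route it.
    args = json.loads(arguments)
    if name == "search_docs":
        return search_docs(**args)
    raise ValueError(f"unknown tool: {name}")

print(handle_tool_call("search_docs", '{"query": "how does billing work"}'))
```

The returned JSON string is what you'd feed back to the voice model so it can talk about the retrieved passages.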
Mate, I just wanted to start building EXACTLY this! As if you've read my mind^^ Thank you so much!
I am so with you! "This is exactly what we've been waiting for" is right. I haven't been able to leave my computer for the last two days.
As someone who’s dyslexic I’m soooooo excited by this. Looking forward to your next videos :)
I regret not doing my Computer Science degree seeing this
On the contrary, if you have the imagination and the motivation, this stuff is easier than ever to learn, especially if you pair yourself up with Claude or ChatGPT. A CS degree by now is borderline old-fashioned, given the rate at which things are changing.
Man, I wish I was this good. So grateful others are. Thanks for the sick demo.
Add a command for Ada to wait until you say "over", walkie-talkie style, and you can have time to pause and think when prompting?
Now combine this with the meta AI sunglasses and doing this with it and seeing the result while you are moving around to other places 😎
First time I've caught a video of yours. Not sure how they've slipped under the radar like that. Oh well, better late than never.
Very well done sir. Wonderful video.
Awesome work - loved it
It's completely amazing, thanks for sharing
Really love your example and passion!! Amazing stuff! I have been building my own real time speech-to-speech system, all the STT and TTS is local, works really well. And, free!
Impressive and creative!! Thanks for sharing!!
Seems like the only thing standing between us and full blown AI assistants is... software. Incredible.
I think I know the answer, but did you make this to be open source? If yes, where is it? Another question: does this work with an OpenRouter API key? Also... I can't believe you have only 20k subs; your channel is so great that I swear I wait the whole week for your content, and when it arrives, a beautiful sensation of joy kicks in. Thanks for everything, bro! Huge fan here! Waiting to spend my monthly wage on your courses!
👍 👍 Great Work, Subscribed 👍👍
It would be very interesting to see you build the Next Best Version of this, using all open source and compare ❗❗ ❗ ❗
Excellent! Nice work! Realtime API costs add up, are you able to mitigate this somehow?
I am doing the same thing but they can be prohibitively expensive for anything more than a hobby project
Thanks Dan, Incredible video. Subscribed.
Nice demo, Dan. Thoroughly enjoyed watching. We need the Realtime API to come down in price about 300x, then I think we'll see it embedded everywhere. I would have it run constantly for myself, like an ambient buddy.
This fine-grained level of control and fidelity is impressive. From a product perspective this should be engineered more vertically - agents to examine usage logs and recommend autonomy - eliminating redundancy. I'm imagining these goals are along your trajectory and it's interesting seeing it develop.
Bro that was quick. Great work ❤
Excellent video as usual. What other developers like you are out there? I’m learning a lot as a new developer from your projects. Thanks
Finally something interesting, ty for the good video!
Idea: LLMs sometimes struggle with simple tasks that coding has already solved. Counting the Rs in strawberry, for example: such tasks can be done by the AI writing code for them, rather than having to run the question through its own banks.
High-level LLMs >control> low-level LLMs/neural networks >control> non-AI scripts.
Most tasks would filter down and up this chain, possibly multiple times per prompt.
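The strawberry example above is the simplest possible case of filtering a task down the chain: instead of asking the model to count, expose a deterministic helper it can call. A minimal sketch (the function name is illustrative, not from any particular library):

```python
# Deterministic helper an LLM could delegate to via function calling,
# instead of "counting" with its own weights.
def count_letter(word: str, letter: str) -> int:
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # → 3
```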
Brilliant......thank you....
The issue currently is that you have to program the calls etc. If it would just write the code for them on the fly, it would be much more automated. This is generally the issue: we don't want to hardcode anything in the future; it should know what to do. Also, as well as it works, it's still probabilistic. We need some sort of classification model that checks the answers/outputs for correctness and is much less probabilistic.
Amazing project thanks for sharing
WOW! IDD, you rock.
Thanks for another excellent video!
wow.... AI is truly mindblowing. I am still not sure if it is incredible or terrifying, or maybe an amalgamation of both
Yessss. He's done it again!
This is awesome, and others have attempted it. The issue is the API cost.
I tried the Realtime API and it cost me almost 2 dollars for 2 minutes (way more than the $0.06 - $0.24 per minute the official post says). Not going to use it any time soon, until it becomes very cheap.
Thank YOU!
Great work! The future looks amazing.
But avoid pranksters next to you... "Hey, Ada, force delete all my files."
@@ZukunftBilden Now imagine the future: "Hey Ada, donate all my money to a charity of your choice."
Hi, I'm not an engineer or developer. Just began my AI programming journey. I believe you are doing amazing things here that I don't see from other creators. Greatly appreciate your video. Is this code actually building ADA, or just an implementation of the new Realtime API? Like I said, I'm new, and the README isn't exactly clear to me.
Now we need a capable local model with an inference engine that supports the Realtime API. Then we are fine :-).
Async threading would be a beast. Only problem is you still have to confirm the tool was successful; if it's not, that can mess a lot of shit up for other operations, depending on what's happening while those tools run. The solution, I guess, is structured output and making sure tools don't have errors, for anything they can control anyway.
Async operations would be fine as long as you know which workflows you can use them with in an open-ended manner. Even if you can't quite imagine which ones can, just ask o1-mini to help you brainstorm it. Structured output will likely always be a necessity. What I would really like to see, however, is a library of open-source and standardised function calls that can be included in your project, as both a RAG solution to assist LLMs when building out new apps and an import for making the function calls available to those new apps.
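The confirm-before-proceeding concern in this exchange can be sketched with asyncio: run tools concurrently, have each return structured output with an explicit ok flag, and fail fast before anything downstream consumes the results. The tool names and failure condition here are invented for illustration.

```python
# Sketch: concurrent tool calls whose structured results are checked
# before any dependent step runs. Names are hypothetical.
import asyncio

async def run_tool(name: str) -> dict:
    await asyncio.sleep(0)  # stand-in for real I/O (API call, file op, ...)
    # Structured output: every tool reports ok/error explicitly.
    if name == "broken_tool":
        return {"tool": name, "ok": False, "error": "simulated failure"}
    return {"tool": name, "ok": True, "result": f"{name} done"}

async def run_all(names: list[str]) -> list[dict]:
    results = await asyncio.gather(*(run_tool(n) for n in names))
    # Confirm success before anything downstream consumes the outputs.
    failures = [r for r in results if not r["ok"]]
    if failures:
        raise RuntimeError(f"{len(failures)} tool(s) failed: {failures}")
    return results

print(asyncio.run(run_all(["fetch_weather", "save_note"])))
```

The point is that the ok flag is part of the contract, so a failed tool can never silently feed garbage into whatever was waiting on it.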
Thank you.
And this is only Level Two AGI!
Next year we get Level Three, which will assist in developing Level Four.
Level Five will appear no later than 2027.
awesome work man
This is what rabbit R1 wished it could be
amazing
OK, first of all... wow. Second, what computer setup do you have: GPU, CPU, memory, motherboard, etc.?
Thanks for putting this together and for sharing. Do you think having the functions in Python will create a barrier? I have implemented a personal cognitive agent, currently with standard voice interaction and CRUD tool access to support personal journaling, but like you I noticed longer delays when updating, and I have been wondering recently whether this could be improved by having the functions written in a compiled language. Maybe Mojo will make the difference.
Amazing! Now what if we build out an extensive list of stock technical analysis and plotting functions in python using yfinance for your agent to use, and then combined with your file functions and current date, you could direct it to perform all kinds of stock research tasks and save those outputs for future comparison, etc. 💲📈📊
The Realtime API is way too expensive for this sort of use case, sadly. Maybe once it gets cheaper.
Would a loop that runs every minute and provides a simple RNG chance, to every so often prompt o1-mini to "have a random thought related to this conversation", then have the voice model verbalize it, work the way I think it would? I imagine it could give a certain spark at the cost of some token burn.
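The loop described in that comment is easy to sketch; `trigger_thought` below is a placeholder for the actual o1-mini prompt plus voice-model handoff, and the default chance/interval values are arbitrary assumptions.

```python
# Sketch of the "ambient random thought" loop: each tick, roll a die
# and occasionally inject a spontaneous prompt.
import random
import time

def trigger_thought() -> str:
    # Placeholder: here you'd prompt the reasoning model, then hand the
    # reply to the voice model to verbalize.
    return "random thought about the current conversation"

def ambient_loop(chance: float = 0.1, interval: float = 60.0, ticks: int = 5) -> list[str]:
    thoughts = []
    for _ in range(ticks):
        if random.random() < chance:
            thoughts.append(trigger_thought())
        time.sleep(interval)
    return thoughts
```

Run with a bounded `ticks` while testing (e.g. `ambient_loop(chance=0.1, interval=60, ticks=10)`); the token burn scales directly with `chance`, so it's the knob to tune.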
I like your terminal window. How did you make it look transparent plus the emojis ⁉️⁉️⁉️ is it cursor??
this is bonkers !!
Now give me a private version of that
How long do you think I'll need to wait til I can tell my computer to do my Houdini work?
Is it possible to use Blender or any program through voice commands only?
Big !
What is the oldest computer processor (Celeron, i3, i4, etc.) that can be used to do this?
No normal person will use the new realtime AI voice from OpenAI.
The prices are exorbitant, and on top of that you have to pay for text + voice:
$25 / 1M tokens for text
BUT
$200 / 1M tokens for voice
LOL
we are so back
23!
Can you make a similar build but with Gemini or Claude for a cheaper price, like a small-model version?
@@SeeFoodDie it's just speech-to-text and then converted back to speech, bro
Sky isn't the limit...your wallet is. Voice api blows a hole in your wallet.
Oh my God, I can’t believe this. I am a blind professional, and this would help me so much.!
Bro is so professional, he could write a comment without having to see.
This technology is awesome, but it's just a combination of what has already been possible, put together well and professionally by OpenAI. Combining voice models with powerful LLMs and function calling, nothing is new here; the only new thing is that it has been done so well and so fast.
@@requestfx5585 do you think that’s funny? Why would you make such a shitty comment?
@@requestfx5585 I realize that I only represent 3% of the world’s population, but I am still a person with feelings.
@@requestfx5585 why would you say something so nasty do you think it’s funny that I’m blind?
@@requestfx5585 I'm not sure why my comments keep getting deleted, but your fucking comment was disgusting and not funny, asshole