📺 WATCH PART 2 - AI Cold Caller With Google Calendar: th-cam.com/video/J3d92Ak-P7o/w-d-xo.html
👉 GET THE CODE FOR FREE: bartslodyczka.gumroad.com/l/zsjdn
🛠 Hire me to build out an EPIC AI Voice Assistant for you: bart@supportlaunchpad.com
🧠 If you are interested in joining my incubator please fill out this form: forms.gle/KJxiqhB3aWxbgGoh8
📋 Take This Quick Survey: forms.gle/otAr1xUamgyYZE5y7
Don’t download the free code; it doesn’t work. Save yourself the frustration.
Works flawlessly. The peeps mentioning latency: it's most likely your connection. I have consistently achieved sub-1-second, almost realtime performance with this. Nicely done dude. Function calling would be neat, especially CRUD ops with a DB.
Noiccee!!!! 💪
You’re a legend mate, great work. I’m learning a lot from your videos. Thanks mate.
Thank you very much 🤝 keep going man 🚀🚀
Solid build man, amazing job!
Thank you legend :)
Very nice explanation, love watching your videos 👍
Thank you 💪
Yes 11 Labs, definitely!
Also, would love to see how you would implement a script rather than an FAQ.
Script is a solid idea, will do more thinking about this :)
Thanks for this video! I've been meaning to set this up with the real-time Twilio API, but just haven't gotten to it yet. Been using Vapi but it's so expensive. I would like to see how to transfer a call to a real person, or actually book an appointment in a Google Calendar. Definitely ElevenLabs integration too!
Great suggestions, google calendar keeps coming up so I will also look into this :)
Amazing video
I have one question: why are we using Replit? Can we deploy it on our own servers, like EC2, and what would we need to change to do so? Thank you.
Thank you so much, amazing video. I look forward to your other videos. I'm looking to create super reliable appointment-booking AI assistants, and I would definitely appreciate a video on that subject. Thank you.
Great suggestion my man!
Awesome! Thank you for sharing this. I have big plans for you.
Shit yeah!! 💪
Awesome! Thanks for sharing. I will definitely give it a try
Woot woot! Enjoy :)
Very cool stuff! Function calling would be nice to see. 👍
Thank you, and done, will pencil this in 💪
FIRE CONTENT AS USUAL
Thank you Viski 💪
This is amazing work! How does this compare in intelligence to the OpenAI realtime API?
The realtime API is MUCH better and if you can afford it, I would use that. The main reason is that the backend of the realtime API is a built-in thread, so you're having a conversation with an “agent”, whereas in this setup we're sending calls to the completions endpoint along with the entire conversation history. So it's still very good, but inherently it is not an “agent” (so to speak). For basic calls/tasks this current setup works great :)
@ Appreciate that! Also there's the conversion delay. I wish the realtime API was cheaper and had other voices.
Can you interrupt the current voice response? Or can you finish your thought if you didn't manage to say it in full and the agent already started its voice response? Like saying “continue”, which would interrupt the response while keeping the previous input prompt and allowing you to properly finish it.
I implemented command words like this using Microsoft Azure Speech Services with continuous voice recognition.
+1 for adding function calling
You can do interruptions, and toward the end of my video in the final demo I interrupt and continue speaking about the same topic, and the response was in line with what I was saying. The mechanism that sends API calls to the GPT actually holds all conversation items (user messages and agent responses) and sends the entire history with each API call, so each response is always contextually correct. I don't know how efficient this process is, but it works for now. And I haven't thought about commands just yet, but good idea! And noted on function calling 🙏
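For anyone curious what that history mechanism looks like in code, here is a minimal sketch (Python with the OpenAI SDK; the function and variable names are mine, not necessarily what the actual repo uses):

```python
# Minimal sketch of the "resend the whole history every turn" approach.
# Assumes the openai Python SDK (>= 1.0) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# One growing list per phone call: system prompt plus every turn so far.
conversation = [
    {"role": "system", "content": "You are a friendly phone assistant."}
]

def get_reply(user_transcript: str) -> str:
    """Append the caller's transcript, send the ENTIRE history, store the reply."""
    conversation.append({"role": "user", "content": user_transcript})
    response = client.chat.completions.create(
        model="gpt-4o-mini",        # any chat model works here
        messages=conversation,      # full history, so every reply stays in context
    )
    reply = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": reply})
    return reply
```

The trade-off is that the prompt grows with every turn, so long calls cost a bit more and eventually hit the context limit; trimming or summarising older turns is the usual fix.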
Great, it works! Can you expand on implementing function calling, and ElevenLabs or Cartesia as an alternative for TTS?
Awesome! And done, will pencil it in 💪
Yes, really wanna see function calling, like booking appointments and transferring calls. Btw, isn't it easier to do with LiveKit?
Good suggestions, will pencil them in 💪 I've never used LiveKit before, will check it out :)
@@BartSlodyczka Bro, you can handle a lot with LiveKit more easily. Make sure you check it out. You'll thank me later, that's how good it is.
When I interrupt, the agent stops talking. Is there some kind of bug? I think it has to do with the speakerphone. When I put my phone call on speaker, the agent does not reply with audio after the third or fourth interaction. But when I take the phone off speaker it works fine.
Hmm, that is strange. When I demo'd the interaction on YouTube I had it on speaker and I had multiple conversation turns (so I spoke many times and the AI replied many times). Not really sure what it could be 🙏
Hello Bart,
If you were to use Deepgram's TTS streaming service instead of plain REST API calls, wouldn't the response time be faster?
Hey legend, yes you're 100% correct, it would be even faster than standard REST API calls. I think using ElevenLabs streaming would be faster still. So really, there is so much opportunity in this code to have a really fast, really cheap AI Caller 💪
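As an illustration of the idea, here is a rough sketch of streaming the TTS audio back to the call as it is generated instead of buffering the whole file first. The endpoint, model name, and parameters are assumptions based on Deepgram's Aura REST API, so check them against the current docs; `send_audio_chunk` stands in for whatever writes audio to the Twilio media stream:

```python
# Hedged sketch: stream Deepgram TTS audio chunk-by-chunk instead of waiting
# for the complete file. Endpoint/model/params are assumptions -- verify them.
import os
import requests

DEEPGRAM_TTS_URL = "https://api.deepgram.com/v1/speak?model=aura-asteria-en"

def stream_tts(text: str, send_audio_chunk) -> None:
    """Forward TTS audio to `send_audio_chunk` as soon as bytes arrive."""
    headers = {
        "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
        "Content-Type": "application/json",
    }
    with requests.post(DEEPGRAM_TTS_URL, headers=headers,
                       json={"text": text}, stream=True) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=4096):
            if chunk:
                send_audio_chunk(chunk)  # start playback before TTS finishes
```

The same chunk-forwarding pattern applies to ElevenLabs' streaming endpoints, which is where the extra speed mentioned in the reply would come from.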
Why not use OpenAI's realtime API? Just because of the voices, right? Please pardon my ignorance.
I’ve got other videos showing how to do that too 💪 but the realtime API is currently like 30 cents per minute to run, and since it’s still in beta it has some stability issues. But the realtime API is very fast and I’m sure all the kinks will be ironed out soon :) great question to ask, legend
@@BartSlodyczka
Cool tech demo, but let's think twice about automating every customer interaction just because we can. Sure, AI phone systems are cheaper than human staff, but real human connection in customer service is priceless. Personal relationships, genuine empathy, and human judgment are what build lasting customer loyalty. Maybe instead of replacing humans, we should use AI to help them do their jobs better? Sometimes the 'old way' with real people is still the best way, even if it costs more than 1¢ per minute. 🤔 Great tutorial though - the technical implementation is impressive!
Thank you and excellent point. For pretty much my entire journey with AI I’ve had this assumption/belief that initially businesses will adopt AI to save costs and have faster experiences, but then when everyone uses AI, the question will become “what is actually a good support experience?” And for that I think businesses will revert back to human support. It might not be 100% human, but maybe 50/50 with AI and humans. Either way, I still use a 100% human customer support team for my ecommerce brand, but I do give my agents AI tools and augment other parts of our support experience with AI (e.g. an AI chatbot, AI search on our help desk). I agree the tech is cool but we should use it wisely 💪 love the comment, I always want to see this kind of discussion 🤝
I couldn’t find your video where you lay out how to use AI to help real humans do their jobs. Any help?
What about the MANY times customer service doesn't give a damn about their job and treats customers as if they were asking for a favor? What about the long waiting times? What about the lack of good manners?
It’s priceless when you’re employing “customer service”, not lazy employees.
It's the Pareto 80/20 rule. 80% of CS requests are easily manageable and answerable through the various channels (bots, agents, knowledge base, etc.). You then augment this with the human experience for the 20% of more involved support and service requests.
Hello Bart! Do you think it's possible to create something like this for the Polish market? But without using Twilio, because their rates are crazy.
Siema! I'm not sure which Twilio alternatives work in Poland, but you should be able to forward calls from the provider to the Replit code :) And I'm pretty sure you can also change the language to Polish, so then you'd have a mega AI Caller 💪
Can this AI agent also speak different languages, or is it restricted to English only?
Haven't tested but should be able to speak in different languages!
@@BartSlodyczka Tried it; it doesn't come out as good as ChatGPT, but it definitely works. I just added a line "you can understand and reply in Punjabi" to the prompt haha. The bottleneck in this pipeline is Deepgram's transcription.
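For reference, switching languages usually means touching two places in a Deepgram + GPT pipeline: the STT options and the system prompt. The sketch below (using Polish as the example, per the earlier thread) is a hypothetical configuration using the Deepgram Python SDK's `LiveOptions` as I understand it; verify the option names and whether your chosen model actually supports the target language before relying on it:

```python
# Hypothetical sketch of the two language settings in a Deepgram + GPT pipeline.
# Option names follow the Deepgram Python SDK's LiveOptions; verify against the
# docs, and check the model's supported-language list for your target language.
from deepgram import LiveOptions

# 1) Tell the STT model what language to expect (e.g. "pl" for Polish).
#    A poorly supported language gives poor transcripts, which is the
#    "bottleneck" described in the comment above.
stt_options = LiveOptions(
    model="nova-2",
    language="pl",
    encoding="mulaw",     # Twilio media streams send 8 kHz mu-law audio
    sample_rate=8000,
)

# 2) Tell the LLM to reply in the same language via the system prompt.
system_prompt = "You are a phone assistant. Understand and reply only in Polish."
```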
Hmm, what about using a GSM modem for calling? AT commands and you're at home. Or use a VoIP gateway. Second thought: I was thinking about building an app for the same purpose, but my main goals are to be independent (self-hosted) and to make it as 'realistic' as possible with low latency. Using external APIs is too easy; building the whole thing from scratch is a good challenge to get to know all the LLM/AI stuff.
I have heard of people using a local LLM to run the backend, and it is possible, fast, and cheap if you do it this way. I haven't looked into this yet but there may be other videos about this online already. As for calling with a GSM modem or VoIP, great ideas!
Need to figure out how to make that reasoner model that formulates the text think on graph now hmm
Very interesting 🤔
Can we do this by connecting it to a custom GPT?
Yes you can, but this will be slightly more unstable, as the Assistants API is in beta (and there are like 5 or 6 API calls per request).
@@BartSlodyczka That makes sense. Thanks a lot for taking the time to respond!
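To make the "5 or 6 API calls per request" concrete, here is a rough sketch of the round-trips a single caller turn needs with the beta Assistants API (using the openai Python SDK; `ASSISTANT_ID` is a placeholder, and the beta interface may change):

```python
# Sketch of why the Assistants API adds latency: several round-trips per turn
# instead of one completions call. ASSISTANT_ID is a placeholder value.
import time
from openai import OpenAI

client = OpenAI()
ASSISTANT_ID = "asst_..."                       # placeholder, set to your assistant

thread = client.beta.threads.create()           # call 1 (once per phone call)

def assistant_reply(user_transcript: str) -> str:
    client.beta.threads.messages.create(        # call 2: add the caller's turn
        thread_id=thread.id, role="user", content=user_transcript
    )
    run = client.beta.threads.runs.create(      # call 3: start a run
        thread_id=thread.id, assistant_id=ASSISTANT_ID
    )
    while run.status not in ("completed", "failed", "cancelled", "expired"):
        time.sleep(0.5)
        run = client.beta.threads.runs.retrieve(  # calls 4..n: poll until done
            thread_id=thread.id, run_id=run.id
        )
    messages = client.beta.threads.messages.list(thread_id=thread.id)  # final call
    return messages.data[0].content[0].text.value  # newest message comes first
```

Each of those round-trips adds network latency on top of the model's own response time, which is why the plain completions endpoint is snappier for a live phone call.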
Thanks, what about Deepgram Voice Agent API Real Time?
Haven't thought about this before! Nice suggestion 💪
I would like to see more of the infrastructure side, like how to set up a small call-centre structure.
Very interesting suggestion! I will do more thinking about this 💪
Hi Bart... First of all, thank you... Secondly, are you going to extend this video, like adding functions/tools? That's the main purpose of building these callers.
Hey legend! Yeah I will make a part 2 video with function calling 💪
@@BartSlodyczka Thanks, legend Chief...
An outbound agent, please? In a way that we can schedule multiple calls one after another, to different customers.
Great suggestion, will pencil it in!
Function calling for booking appointments.
Penciling it in 💪
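Since function calling for booking keeps coming up in the thread, here is a hedged sketch of how it could look with the chat completions `tools` parameter. `book_appointment` and its fields are invented for illustration; the real tool would call Google Calendar, a CRM, or a call-transfer routine:

```python
# Hedged sketch of function calling for appointment booking.
# `book_appointment` and its parameters are hypothetical examples.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Book an appointment for the caller.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Caller's name"},
                "start_time": {"type": "string", "description": "ISO 8601 start time"},
            },
            "required": ["name", "start_time"],
        },
    },
}]

def reply_or_book(conversation: list[dict]) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=conversation, tools=tools
    )
    message = response.choices[0].message
    if message.tool_calls:                                # model chose the tool
        args = json.loads(message.tool_calls[0].function.arguments)
        # ...call your real calendar/CRM here with `args`...
        return f"Booked {args['name']} for {args['start_time']}."
    return message.content                                # normal spoken reply
```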
Sorry, but how is this 1 cent per minute? I'd really love to know how you came to that conclusion.
I calculated the number of transcription minutes (STT) along with the characters spoken (TTS) via Deepgram, then I compared this to the total cost spent on Deepgram. This came to ~0.89 cents per minute (so under 1 cent). From there I looked at the OpenAI API usage for the same period, which was negligible. So then I decided to just say it was 1 cent total. Hope this makes sense 💪
thanks:)
Always 🤝
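A back-of-the-envelope version of that calculation, with illustrative rates (assumed to be in the ballpark of Deepgram's pay-as-you-go pricing; plug in current prices and your own talk ratio before quoting a number):

```python
# Rough per-minute cost estimate for the STT + TTS pipeline.
# Both rates below are assumptions for illustration, not quoted prices.
STT_PER_MIN = 0.0043          # $ per minute of transcription (assumed)
TTS_PER_1K_CHARS = 0.0150     # $ per 1,000 TTS characters (assumed)

# Assume the full minute of caller audio is streamed to STT, and the agent
# speaks roughly half the minute (~375 characters of rendered text).
stt_cost = 1.0 * STT_PER_MIN
tts_cost = (375 / 1000) * TTS_PER_1K_CHARS

total = stt_cost + tts_cost   # LLM token cost is negligible on top
print(f"~${total:.4f} per minute")   # ≈ $0.0099, i.e. about 1 cent
```

The exact figure moves with how much the agent talks, which is why the measured number above (~0.89 cents) differs a little from this sketch.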
🔥🔥🔥🔥🔥🔥🔥
Letsss goooo 💪💪
ElevenLabs plz
This has 2-second latency; it didn't work for me.
It can be even faster with the streaming API for Deepgram TTS, and even faster with ElevenLabs streaming TTS.
The problem is with Sanju, not the app.
@zubairkhankharooti3621 You try it. Let me know if you are able to get 1-second latency. Text-to-speech and speech-to-text WITH interruption support did not work from India. But I do want it to work. I will retry and post my findings. If it works, then awesomeness 👌
@mmdls602 mentioned he tried it and it worked for him. Let me find the fault in my deployment.
But this definitely has HORRIBLE turn-taking, emotion detection, and latency...
Or am I wrong? That's what the secret sauce of Retell and Vapi is :)
Yeah, the value prop here is the 1-cent-per-minute cost, and I agree that other purpose-built tools like Retell and Vapi are better at the backend operations of AI calling systems 💪