Coding an AI Voice Bot from Scratch: Real-Time Conversation with Python
ฝัง
- เผยแพร่เมื่อ 2 มิ.ย. 2024
- 🔑 Get your AssemblyAI API key here: www.assemblyai.com/?...
Learn how to build a real-time AI voice assistant using Python that can handle incoming calls, transcribe speech, generate intelligent responses, and provide a human-like conversational experience. Perfect for call centers, customer support, and virtual receptionist applications.
In this coding tutorial, you'll integrate multiple cutting-edge technologies, including:
1. Assemblyai Speech-to-Text API for accurate real-time transcription.
2. OpenAI's powerful language models for natural language processing (NLP) and response generation.
3. ElevenLabs' AI voice synthesis to convert text responses into natural-sounding audio.
Step-by-step, you'll create a Python application that seamlessly combines these APIs, enabling your AI assistant to listen to incoming audio, comprehend the speech, formulate contextual responses, and communicate back with synthesized voice in real-time.
Github code: github.com/smithakolan/Assemb...
Timestamps:
00:00 - Intro & Demo of application
01:10 - Outline of application
01:58 - Step 1: download python libraries
06:21 - Step 1: Streaming Speech-to-Text with AssemblyAI
12:11 - Step 3: OpenAI Chat completion
15:32 - Step 4: Generate Human-like audio with Elevenlabs
18:48 - Running our AI Call Assistant
#AIVoiceAssistant #RealTimeSpeechRecognition #NaturalLanguageProcessing #AIVoiceSynthesis #PythonTutorial #CallCenterAutomation #VoiceBot #StreamingSpeechtoText
▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: www.assemblyai.com
🐦 Twitter: / assemblyai
🦾 Discord: / discord
▶️ Subscribe: th-cam.com/users/AssemblyAI?...
🔥 We're hiring! Check our open roles: www.assemblyai.com/careers
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#MachineLearning #DeepLearning - วิทยาศาสตร์และเทคโนโลยี
Exactly what I was intending on making. Thanks!
Using Groq / Mistral AI instead of OpenAI will greatly reduce the latency issue you have in your demo.
can you fine tune groq?
Great suggestion, we will explore this in the next tutorial. This one was meant to be as accessible as possible so that people could build quickly.
@@logannon no its impossible to fine tune groq. thats the problem. you have to use rag instead of fine tuning. but if you wanna make chatbot for specific domain you should try other service
Please a tutorial on llava vision model to analyze video live with cv2
And I am unable to get my API token from assembly AI website please fix it
amazing lady and also an engineer omg)) thank you a million, I'll just add this to my stack
The programming is not responding after the first introduction ,as shown in the video ;though even after using the github code. Any alternative with step by step instruction video ?
why not chunk text and output instead of output after all text is generated?
how would you handle interruptions while the ai is talking?
hi thanks for your video . i want Api real time conversation with python for Farsi language . the LLM support Farsi language?
Thanks. First time I hear of AssemblyAI. Everyone talks about faster_whisper and Deepgram. Is AssemblyAI better for STT?
Two questions: How can we improve the latency between the patient's response and the AI voice reply? and What can be done for the AI Voice to account for patient input if the patient speaks while the AI voice is speaking?
Hi Jeffrey, two very good questions! These deserve a video on their own, to be honest. To improve latency one thing you could try is running the LLM locally so you can get a faster inference over calling openai's API. As for handling overlapping speech, I've written the program to stop listening when the AI voice is responding back. But what you could do, is run another thread that is still listening while the AI voice is speaking.
As for the latency, I was assuming the majority of the latency was actually coming from ElevenLabs? And likely also from whatever functions might be needed to actually check the availability of the dentist and then also to schedule the actual appointment in the end. Am I wrong?
So yeah I think running the LLM locally will surely help, or using Groq, but I'm not convinced yet that that is the biggest bottleneck.
This video is so great! I'm following your video but now I ran into this problem, I can install the package in Pycharm with Windows system, but I got this error: OSError: Cannot find mpv-1.dll, mpv-2.dll or libmpv-2.dll in your system %PATH%. I'm a researcher in the art field with only a debutant python knowledge, could you help me solve this problem? Thanks a lot!
But I still have problems it says that [from elevenlabs import generate, stream
ImportError: cannot import name 'generate' from 'elevenlabs'] how come
i have the exact same error did you fix it ?
I followed this tutorial then in the end I realized .. assemblyAI doesn't provide the support for the Japanese language in the live Reltimetranscriber. Which sucks .. lol can't use it. Any help? @assemblyAI
Hi nice tutorial. I have coded real-time voice bot for phone conversations in Twilio.
The latency comes from text-to-speech mostly and gpt response time.
I'm guesing if either ones speed can be reduced about 2-3x, then the response time would be fast enough. In human conversation, we expect the response within 1 second....and anything above that seems more unnatural. I'm sure the speed issues will be solved with new Nvidia GPU-s or other hardware innovations.
any way to make one with adam voice like the one in elevenlabs?😊
Hi There - I was just looking at the code. Where is the appointment setting details / info coming from ?
All that is coming from the LLM we are using, so it's not hard-coded.
i am getting error "Cannot find reference 'generate' in '__init__.py' " on from elevenlabs import generate, stream line can you please help me to resolve this issue
For some reason, the microphone isn't picking up my voice. I enabled all permissions on my mac and am still having trouble. Is there any way to fix this?
I think you need to pay for the real-time transcription for this at AssemblyAI
streaming from assembly ai is a paid service. So, first you need add balance into your account. If you have not done that yet. Hope that helps :)
Excellent .
super cool
How can i conect to my phone number and google calendar?🙏🏼
You can make use of the Google API for google calendar and something like Twilio's API for making phone calls.
can u make just a chat bot word to voice
An error occured: Could not connect to the real-time service: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)
what to do with this error?
same error. You found the solution?
most likely your microphone is switched off pls check
i am facing the mpv value error on windows i already installed it many times how can i fix that
just use vlc instead mpv bro
@@sethuraman9884 thank you guys
or check environment path of mpv. when you command mpv --version on cmd. you have to see its running
The only downside is the fact it takes a while to respond with voice.
The assembly ai api is not free.
I am very api to have found this
TOO SLOW!
after watching your video, i think i prefer interacting with humans
why are you saying fro. scratch if you're only using api
TOO SLOW !