Coding an AI Voice Bot from Scratch: Real-Time Conversation with Python

แชร์
ฝัง
  • เผยแพร่เมื่อ 2 มิ.ย. 2024
  • 🔑 Get your AssemblyAI API key here: www.assemblyai.com/?...
    Learn how to build a real-time AI voice assistant using Python that can handle incoming calls, transcribe speech, generate intelligent responses, and provide a human-like conversational experience. Perfect for call centers, customer support, and virtual receptionist applications.
    In this coding tutorial, you'll integrate multiple cutting-edge technologies, including:
    1. Assemblyai Speech-to-Text API for accurate real-time transcription.
    2. OpenAI's powerful language models for natural language processing (NLP) and response generation.
    3. ElevenLabs' AI voice synthesis to convert text responses into natural-sounding audio.
    Step-by-step, you'll create a Python application that seamlessly combines these APIs, enabling your AI assistant to listen to incoming audio, comprehend the speech, formulate contextual responses, and communicate back with synthesized voice in real-time.
    Github code: github.com/smithakolan/Assemb...
    Timestamps:
    00:00 - Intro & Demo of application
    01:10 - Outline of application
    01:58 - Step 1: download python libraries
    06:21 - Step 1: Streaming Speech-to-Text with AssemblyAI
    12:11 - Step 3: OpenAI Chat completion
    15:32 - Step 4: Generate Human-like audio with Elevenlabs
    18:48 - Running our AI Call Assistant
    #AIVoiceAssistant #RealTimeSpeechRecognition #NaturalLanguageProcessing #AIVoiceSynthesis #PythonTutorial #CallCenterAutomation #VoiceBot #StreamingSpeechtoText
    ▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
    🖥️ Website: www.assemblyai.com
    🐦 Twitter: / assemblyai
    🦾 Discord: / discord
    ▶️ Subscribe: th-cam.com/users/AssemblyAI?...
    🔥 We're hiring! Check our open roles: www.assemblyai.com/careers
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    #MachineLearning #DeepLearning
  • วิทยาศาสตร์และเทคโนโลยี

ความคิดเห็น • 48

  • @thebackpainmiracle
    @thebackpainmiracle 24 วันที่ผ่านมา

    Exactly what I was intending on making. Thanks!

  • @NatGreenOnline
    @NatGreenOnline 2 หลายเดือนก่อน +12

    Using Groq / Mistral AI instead of OpenAI will greatly reduce the latency issue you have in your demo.

    • @logannon
      @logannon 2 หลายเดือนก่อน

      can you fine tune groq?

    • @AssemblyAI
      @AssemblyAI  2 หลายเดือนก่อน

      Great suggestion, we will explore this in the next tutorial. This one was meant to be as accessible as possible so that people could build quickly.

    • @user-vm8yn4hb4w
      @user-vm8yn4hb4w หลายเดือนก่อน

      @@logannon no its impossible to fine tune groq. thats the problem. you have to use rag instead of fine tuning. but if you wanna make chatbot for specific domain you should try other service

  • @JokerJarvis-cy2sw
    @JokerJarvis-cy2sw 2 หลายเดือนก่อน +2

    Please a tutorial on llava vision model to analyze video live with cv2
    And I am unable to get my API token from assembly AI website please fix it

  • @euginekholmogorov5196
    @euginekholmogorov5196 2 หลายเดือนก่อน +1

    amazing lady and also an engineer omg)) thank you a million, I'll just add this to my stack

  • @simonsandeep4977
    @simonsandeep4977 หลายเดือนก่อน

    The programming is not responding after the first introduction ,as shown in the video ;though even after using the github code. Any alternative with step by step instruction video ?

  • @TheBestgoku
    @TheBestgoku 2 หลายเดือนก่อน

    why not chunk text and output instead of output after all text is generated?

  • @theghostyced
    @theghostyced 23 วันที่ผ่านมา

    how would you handle interruptions while the ai is talking?

  • @sarap.sadegh4691
    @sarap.sadegh4691 2 หลายเดือนก่อน

    hi thanks for your video . i want Api real time conversation with python for Farsi language . the LLM support Farsi language?

  • @bens4446
    @bens4446 16 ชั่วโมงที่ผ่านมา

    Thanks. First time I hear of AssemblyAI. Everyone talks about faster_whisper and Deepgram. Is AssemblyAI better for STT?

  • @JeffreyJohnson-vy1zm
    @JeffreyJohnson-vy1zm 2 หลายเดือนก่อน +1

    Two questions: How can we improve the latency between the patient's response and the AI voice reply? and What can be done for the AI Voice to account for patient input if the patient speaks while the AI voice is speaking?

    • @AssemblyAI
      @AssemblyAI  2 หลายเดือนก่อน

      Hi Jeffrey, two very good questions! These deserve a video on their own, to be honest. To improve latency one thing you could try is running the LLM locally so you can get a faster inference over calling openai's API. As for handling overlapping speech, I've written the program to stop listening when the AI voice is responding back. But what you could do, is run another thread that is still listening while the AI voice is speaking.

    • @EvertvanBrussel
      @EvertvanBrussel หลายเดือนก่อน

      As for the latency, I was assuming the majority of the latency was actually coming from ElevenLabs? And likely also from whatever functions might be needed to actually check the availability of the dentist and then also to schedule the actual appointment in the end. Am I wrong?
      So yeah I think running the LLM locally will surely help, or using Groq, but I'm not convinced yet that that is the biggest bottleneck.

  • @yuchengpeng7706
    @yuchengpeng7706 2 หลายเดือนก่อน

    This video is so great! I'm following your video but now I ran into this problem, I can install the package in Pycharm with Windows system, but I got this error: OSError: Cannot find mpv-1.dll, mpv-2.dll or libmpv-2.dll in your system %PATH%. I'm a researcher in the art field with only a debutant python knowledge, could you help me solve this problem? Thanks a lot!

  • @FaisalKhrisan
    @FaisalKhrisan 16 วันที่ผ่านมา +1

    But I still have problems it says that [from elevenlabs import generate, stream
    ImportError: cannot import name 'generate' from 'elevenlabs'] how come

    • @Ghosty0069
      @Ghosty0069 วันที่ผ่านมา

      i have the exact same error did you fix it ?

  • @uttamdwivedi7709
    @uttamdwivedi7709 2 หลายเดือนก่อน

    I followed this tutorial then in the end I realized .. assemblyAI doesn't provide the support for the Japanese language in the live Reltimetranscriber. Which sucks .. lol can't use it. Any help? @assemblyAI

  • @randotkatsenko5157
    @randotkatsenko5157 8 วันที่ผ่านมา

    Hi nice tutorial. I have coded real-time voice bot for phone conversations in Twilio.
    The latency comes from text-to-speech mostly and gpt response time.
    I'm guesing if either ones speed can be reduced about 2-3x, then the response time would be fast enough. In human conversation, we expect the response within 1 second....and anything above that seems more unnatural. I'm sure the speed issues will be solved with new Nvidia GPU-s or other hardware innovations.

  • @urekmazino1327
    @urekmazino1327 22 วันที่ผ่านมา

    any way to make one with adam voice like the one in elevenlabs?😊

  • @iainhmunro
    @iainhmunro หลายเดือนก่อน +1

    Hi There - I was just looking at the code. Where is the appointment setting details / info coming from ?

    • @AssemblyAI
      @AssemblyAI  หลายเดือนก่อน

      All that is coming from the LLM we are using, so it's not hard-coded.

  • @PalashDandge
    @PalashDandge 24 วันที่ผ่านมา

    i am getting error "Cannot find reference 'generate' in '__init__.py' " on from elevenlabs import generate, stream line can you please help me to resolve this issue

  • @vishalsaichindepalli2798
    @vishalsaichindepalli2798 2 หลายเดือนก่อน

    For some reason, the microphone isn't picking up my voice. I enabled all permissions on my mac and am still having trouble. Is there any way to fix this?

    • @michaelnumnum
      @michaelnumnum 2 หลายเดือนก่อน +1

      I think you need to pay for the real-time transcription for this at AssemblyAI

    • @Vrilogs
      @Vrilogs 27 วันที่ผ่านมา

      streaming from assembly ai is a paid service. So, first you need add balance into your account. If you have not done that yet. Hope that helps :)

  • @mehdismaeili3743
    @mehdismaeili3743 20 วันที่ผ่านมา

    Excellent .

  • @sillystuff6247
    @sillystuff6247 2 หลายเดือนก่อน

    super cool

  • @Alex-qo5je
    @Alex-qo5je หลายเดือนก่อน +1

    How can i conect to my phone number and google calendar?🙏🏼

    • @AssemblyAI
      @AssemblyAI  หลายเดือนก่อน

      You can make use of the Google API for google calendar and something like Twilio's API for making phone calls.

  • @mrunexpected10
    @mrunexpected10 2 หลายเดือนก่อน

    can u make just a chat bot word to voice

  • @nithishreddy7684
    @nithishreddy7684 หลายเดือนก่อน

    An error occured: Could not connect to the real-time service: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)
    what to do with this error?

    • @islamicinterestofficial
      @islamicinterestofficial หลายเดือนก่อน

      same error. You found the solution?

    • @chittisai47
      @chittisai47 หลายเดือนก่อน

      most likely your microphone is switched off pls check

  • @viditsharma6990
    @viditsharma6990 หลายเดือนก่อน

    i am facing the mpv value error on windows i already installed it many times how can i fix that

    • @sethuraman9884
      @sethuraman9884 หลายเดือนก่อน

      just use vlc instead mpv bro

    • @user-vm8yn4hb4w
      @user-vm8yn4hb4w หลายเดือนก่อน

      @@sethuraman9884 thank you guys

    • @user-vm8yn4hb4w
      @user-vm8yn4hb4w หลายเดือนก่อน

      or check environment path of mpv. when you command mpv --version on cmd. you have to see its running

  • @daeralbra
    @daeralbra 2 หลายเดือนก่อน +1

    The only downside is the fact it takes a while to respond with voice.

  • @jeevanjaison9646
    @jeevanjaison9646 16 วันที่ผ่านมา

    The assembly ai api is not free.

  • @user-qp1jq3eh3e
    @user-qp1jq3eh3e 20 วันที่ผ่านมา

    I am very api to have found this

  • @BeRMaNyA
    @BeRMaNyA 9 วันที่ผ่านมา

    TOO SLOW!

  • @drmarioschannel
    @drmarioschannel 2 หลายเดือนก่อน +3

    after watching your video, i think i prefer interacting with humans

  • @urekmazino1327
    @urekmazino1327 22 วันที่ผ่านมา

    why are you saying fro. scratch if you're only using api

  • @BernardoCastro-eb6rp
    @BernardoCastro-eb6rp 9 วันที่ผ่านมา

    TOO SLOW !