Use Gemini 2.0 to Build a Realtime Chat App with Multimodal Live API

  • Published on Feb 3, 2025

Comments • 34

  • @yeyulab
    @yeyulab  23 days ago

    Google-genai has been upgraded, and the session.send() definition was changed. Make sure you run the demo code on google-genai==0.3.0.

    • @miloldr
      @miloldr 21 days ago

      It would be really cool if I wouldn't have to downgrade, as it ruins everything else. Essentially I'm asking for a fix on our side; I'm going to try to find the fix on my own, but it would be cool if someone else would try too.

    • @miloldr
      @miloldr 21 days ago +1

      Oh well that was simple, just change .send(...) to .send(input=...)
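
      For context, a minimal sketch of where that change lands, assuming the google-genai 0.3.0+ live API used in the video (the model name and message are illustrative, and the API key is read from the GOOGLE_API_KEY environment variable):

      import asyncio
      from google import genai

      # Client picks up the API key from the GOOGLE_API_KEY environment variable.
      client = genai.Client(http_options={"api_version": "v1alpha"})

      async def main():
          config = {"generation_config": {"response_modalities": ["TEXT"]}}
          async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
              # Pre-0.3.0 accepted a positional argument: await session.send("Hello", end_of_turn=True)
              # From 0.3.0 onward the payload goes in the `input` keyword argument.
              await session.send(input="Hello", end_of_turn=True)
              async for response in session.receive():
                  if response.text:
                      print(response.text, end="")

      asyncio.run(main())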

    • @rishabhchopra6418
      @rishabhchopra6418 17 hours ago

      Which Python version are we using here? It doesn't seem to be working with Python 3.13

    • @miloldr
      @miloldr 14 hours ago

      @@rishabhchopra6418 idk and I don't mind

  • @AbdulRahman-vj9el
    @AbdulRahman-vj9el months ago +2

    Can I get the response in both audio and text?
    I tried:
    CONFIG = {"generation_config": {"response_modalities": ["AUDIO", "TEXT"]}} but it gives an error.

    • @yeyulab
      @yeyulab  months ago

      Unfortunately, the current API will throw an error; it only accepts a single response modality per session.

  • @Swollphin
    @Swollphin 27 days ago +1

    Great video! Are we able to choose between different voices?

    • @yeyulab
      @yeyulab  25 days ago +3

      Yes, you can choose from several voices: Puck, Charon, Kore, Fenrir, Aoede.
      You can try adding "speech_config" inside sendInitialSetupMessage() in index.html, like this:
      function sendInitialSetupMessage() {
        console.log("sending setup message");
        const setup_client_message = {
          setup: {
            generation_config: { response_modalities: ["AUDIO"] },
            speech_config: {
              voice_config: {
                prebuilt_voice_config: { voice_name: VOICE_NAME }
              }
            }
          }
        };
        webSocket.send(JSON.stringify(setup_client_message));
      }

  • @adarsh1056
    @adarsh1056 14 days ago

    Can this API be used to deploy and launch a web application to be used by others?

    • @yeyulab
      @yeyulab  13 days ago

      Yes, run python -m http.server 8000 --bind 0.0.0.0 to publish the client.

    • @adarsh1056
      @adarsh1056 13 days ago

      @ but is Google fine with us deploying this for large scale use? Maybe with say 1000 users?

    • @yeyulab
      @yeyulab  7 days ago

      @ There are limitations on this experimental free API, like 3 concurrent sessions allowed per key and a 15-minute max per session, so it's not ready for commercial use yet.

  • @simphyy
    @simphyy 20 days ago

    It is only showing the response in text. How do I make it talk back?

    • @yeyulab
      @yeyulab  20 days ago

      In index.html, find this block and change "TEXT" to "AUDIO":
      setup_client_message = {
        setup: {
          generation_config: { response_modalities: ["TEXT"] },
        },
      };
      Or you can check out this video for supporting both audio and text: th-cam.com/video/nzP46fXz36c/w-d-xo.html

  • @rollin_sap
    @rollin_sap months ago

    Could you check the git repo? It's not working properly. The config with text is working okay-ish, but the audio config isn't working.

    • @yeyulab
      @yeyulab  months ago

      Could you open your browser's Inspect console and let me know what it prints? I was using Chrome; what browser are you using?

  • @ismynamemoyo6743
    @ismynamemoyo6743 months ago

    How do I allow interruption?

    • @yeyulab
      @yeyulab  months ago

      You don't have to intentionally "allow" it; the voice data sent to and received from the Multimodal Live API is chunked and handled asynchronously.
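
      A rough sketch of that pattern, assuming the same google-genai live session as in the video (the queues and chunk format are illustrative placeholders, not the repo's code):

      import asyncio

      async def send_audio(session, audio_in: asyncio.Queue):
          # Keep forwarding microphone chunks; nothing blocks on the model's reply,
          # so the user can talk over the model at any time.
          while True:
              chunk = await audio_in.get()
              await session.send(input={"data": chunk, "mime_type": "audio/pcm"})

      async def receive_audio(session, audio_out: asyncio.Queue):
          # Keep draining the server's audio chunks for playback.
          async for response in session.receive():
              if response.data:
                  await audio_out.put(response.data)

      async def run(session, audio_in: asyncio.Queue, audio_out: asyncio.Queue):
          # Sending and receiving run concurrently, which is what makes barge-in work.
          await asyncio.gather(send_audio(session, audio_in), receive_audio(session, audio_out))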

  • @RealLexable
    @RealLexable months ago

    Isn't Gemini 2.0 Flash multimodal-capable from the beginning anyway? Or was your dev setup for local purposes?

    • @yeyulab
      @yeyulab  months ago

      I decoupled the API usage from Google AI Studio for customization.

  • @lohithnh3496
    @lohithnh3496 months ago

    Getting this error:
    Error in Gemini session: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)
    Gemini session closed.

    • @yeyulab
      @yeyulab  months ago

      How did you run the frontend side? There should be a WebSocket connection to this backend session; I don't know why you have an SSL issue.

    • @lohithnh3496
      @lohithnh3496 months ago

      @yeyulab Exactly as you did in the video: in a new terminal, running python -m http.server gives "Serving HTTP on :: port 8000 ([::]:8000/) ..."

    • @yeyulab
      @yeyulab  months ago

      @@lohithnh3496 Oh, this issue is not related to the WebSocket between server and client; it's an issue that prevents your server from connecting to the Gemini API because Google's SSL CA cannot be verified. Check whether your google-genai package is updated to 0.3.0+. It's better to "pip uninstall google-generativeai" if you still have that legacy Google SDK installed. If it still doesn't work, you may try updating the CA certificates on your host system.
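
      If the package versions are fine and the error persists, a quick standalone check (a hypothetical helper, not part of the repo) that verifies Python can validate Google's certificate chain using certifi:

      # If this fails, update certifi (pip install -U certifi) or the system CA store.
      import socket
      import ssl

      import certifi

      HOST = "generativelanguage.googleapis.com"  # Gemini API host
      ctx = ssl.create_default_context(cafile=certifi.where())
      with socket.create_connection((HOST, 443), timeout=10) as sock:
          with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
              print("TLS OK:", tls.version())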

    • @lohithnh3496
      @lohithnh3496 months ago +1

      @@yeyulab I'll check it out. thanks for helping out 😊

    • @memerstone-dankmemes3877
      @memerstone-dankmemes3877 21 days ago

      @@lohithnh3496 Hi, did you get around this? I am getting the same issue when trying to run it.

  • @thabisonaha3939
    @thabisonaha3939 months ago

    This video was so helpful. Thank you for the good work. Always worth buying you some coffee. I was wondering if it's possible to train this system on my plumbing business data, such that when customers live-stream a video of their plumbing issue, the API can analyse the issue, advise accordingly, then tell the customer that we have the solution in stock/inventory and what it costs. Maybe the customer could purchase the item at the same time. (My wild thought.)

    • @yeyulab
      @yeyulab  months ago +1

      Thanks! That's a great idea for a business opportunity. The multimodal models aren't trainable at the moment, but you can use RAG to do document retrieval over the business data, set that retrieval up as an external function, and then use the multimodal model's function calling to invoke it when handling a customer's issue. It's not complicated to implement.
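
      A rough sketch of how the tool side could look, assuming the dict-style config used elsewhere in this project; check_inventory and search_inventory_docs are hypothetical names, not code from the repo:

      # Hypothetical RAG-backed lookup the model can call during a live session.
      def search_inventory_docs(query: str) -> dict:
          # Placeholder for retrieval over the plumbing business data (vector DB, etc.).
          return {"name": "3/4-inch PTFE tape", "in_stock": True, "price": 4.99}

      def check_inventory(issue_description: str) -> dict:
          match = search_inventory_docs(issue_description)
          return {"item": match["name"], "in_stock": match["in_stock"], "price": match["price"]}

      # Declare the function to the model in the session config so it can decide
      # when to call it; run the call yourself and send the result back to the model.
      CONFIG = {
          "generation_config": {"response_modalities": ["AUDIO"]},
          "tools": [{
              "function_declarations": [{
                  "name": "check_inventory",
                  "description": "Check whether a part for the described plumbing issue is in stock and what it costs.",
                  "parameters": {
                      "type": "OBJECT",
                      "properties": {"issue_description": {"type": "STRING"}},
                      "required": ["issue_description"],
                  },
              }]
          }],
      }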

  • @mayank7001
    @mayank7001 months ago

    The video was very helpful. With its help I was able to build my own application, but I'm having trouble deploying it. Could you please make a video on how to deploy this application? For information, I'm having an issue deploying it on Render: pyaudio cannot be used. How do I solve it? Please help.

    • @yeyulab
      @yeyulab  months ago +1

      If you need to deploy the app for external access, you must make sure you have an HTTPS certificate for your URL; otherwise, voice recording will be disabled by the browser for security reasons. I will try to make a video on that! 👌
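
      For anyone stuck on this, a hypothetical way to serve the client over HTTPS with Python's standard library (not part of the repo; cert.pem/key.pem are placeholders for a certificate you generate yourself, e.g. with mkcert):

      # Browsers only expose the microphone on secure origins (https:// or localhost),
      # so external access to the client page needs TLS.
      import http.server
      import ssl

      server = http.server.HTTPServer(("0.0.0.0", 8443), http.server.SimpleHTTPRequestHandler)
      ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
      ctx.load_cert_chain(certfile="cert.pem", keyfile="key.pem")
      server.socket = ctx.wrap_socket(server.socket, server_side=True)
      print("Serving HTTPS on https://0.0.0.0:8443")
      server.serve_forever()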

    • @mayank7001
      @mayank7001 months ago

      @yeyulab yeah please. That would be really helpful.