Google-genai has been upgraded, and the session.send() definition has changed. Make sure you run the demo code with google-genai==0.3.0.
It would be really cool if I didn't have to downgrade, as it breaks everything else. Essentially I'm asking for a fix on our side. I'm going to try to find the fix on my own, but it would be cool if someone else would try too.
Oh well, that was simple: just change .send(...) to .send(input=...)
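A minimal sketch of what this change looks like. FakeSession below is a hypothetical stand-in, not part of the SDK; it only mimics the keyword-only signature that newer google-genai versions reportedly use, so the old positional call fails while .send(input=...) works:

```python
import asyncio

# FakeSession is a hypothetical stand-in for the google-genai live session,
# mimicking a keyword-only `input` parameter as described in this thread.
class FakeSession:
    async def send(self, *, input, end_of_turn=True):
        # The real SDK would forward `input` over the websocket here.
        return {"input": input, "end_of_turn": end_of_turn}

async def main():
    session = FakeSession()
    # Old style (raises TypeError against the keyword-only signature):
    #   await session.send("Hello")
    # New style:
    reply = await session.send(input="Hello")
    print(reply["input"])  # Hello

asyncio.run(main())
```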
Which python version are we using here? It doesn't seem to be working with Python 3.13
@@rishabhchopra6418 idk and I don't mind
Can I get response in both audio and text?
I tried:
CONFIG = {"generation_config": {"response_modalities": ["AUDIO", "TEXT"]}} but it gives an error.
Unfortunately, the current API will throw an error.
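A small client-side guard can catch this before the websocket round trip. A sketch under the assumption stated in this thread (the API accepts exactly one of "AUDIO" or "TEXT" per session; the helper name is made up):

```python
import json

# Assumption from this thread: the Live API accepts exactly one
# response modality per session, either "AUDIO" or "TEXT".
ALLOWED = {"AUDIO", "TEXT"}

def build_setup_message(modalities):
    # Reject empty, multiple, or unknown modalities up front.
    if len(modalities) != 1 or not set(modalities) <= ALLOWED:
        raise ValueError("choose exactly one modality: AUDIO or TEXT")
    return json.dumps({
        "setup": {"generation_config": {"response_modalities": list(modalities)}}
    })

print(build_setup_message(["AUDIO"]))
```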
Great video! Are we able to choose between different voices?
Yes, you can choose from several voices: Puck, Charon, Kore, Fenrir, and Aoede.
You can try to add the "speech_config" under sendInitialSetupMessage() in index.html. Like this:
function sendInitialSetupMessage() {
    console.log("sending setup message");
    const setup_client_message = {
        setup: {
            generation_config: { response_modalities: ["AUDIO"] },
            speech_config: {
                voice_config: {
                    prebuilt_voice_config: { voice_name: VOICE_NAME }
                }
            }
        }
    };
    webSocket.send(JSON.stringify(setup_client_message));
}
Can this API be used to deploy and launch a web application to be used by others?
Yes, run python -m http.server 8000 --bind 0.0.0.0 to publish the client.
@ But is Google fine with us deploying this for large-scale use? Maybe with, say, 1000 users?
@ There are limitations on this experimental free API, like 3 concurrent sessions allowed per key and a 15-minute max per session, so it's not for commercial use yet.
It is only showing the response in text. How do I make it talk back?
In index.html, change "TEXT" to "AUDIO" in the setup message:
setup_client_message = {
    setup: {
        generation_config: { response_modalities: ["AUDIO"] },
    },
};
Or you can watch this video for supporting both audio and text: th-cam.com/video/nzP46fXz36c/w-d-xo.html
Could you check the git repo? It's not working properly. The config with text is working okay-ish, but the audio config isn't working.
Could you open the Inspect console of your browser and let me know what it prints? I was using Chrome; what browser are you using?
How do I allow interruption?
You don't have to explicitly "allow" it; the audio sent to and received from the Multimodal Live API is chunked and handled asynchronously, so interruption works naturally.
Isn't Gemini 2.0 Flash multimodal-capable from the beginning anyway? Or was your dev for local purposes?
I decoupled the API usage from Google AI Studio for customization.
Getting this error:
Error in Gemini session: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)
Gemini session closed.
How did you run the frontend side? There should be a websocket connection to this backend session; I don't know why you have an SSL issue.
@yeyulab Exactly as you did in the video: in a new terminal with the command python -m http.server, which gives Serving HTTP on :: port 8000 ([::]:8000/) ...
@@lohithnh3496 Oh, this issue is not related to the websocket between server and client; it's an issue that prevents your server from connecting to the Gemini API because Google's SSL CA certificate cannot be verified. Check whether your google-genai package is updated to 0.3.0+. It's better to "pip uninstall google-generativeai" if you still have that legacy Google SDK. If it still doesn't work, try updating the CA certificates on your hosting system.
@@yeyulab I'll check it out. thanks for helping out 😊
@@lohithnh3496 hi did you get around this? I am getting the same issue when trying to run it
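A quick, stdlib-only way to diagnose a CERTIFICATE_VERIFY_FAILED like the one above is to check which CA bundle Python's ssl module is actually using. The SSL_CERT_FILE override noted in the comments is a standard OpenSSL/Python mechanism, not specific to this project:

```python
import ssl

# Where does this Python installation look for trusted CA certificates?
paths = ssl.get_default_verify_paths()
print("cafile:", paths.cafile)
print("capath:", paths.capath)

# A default client context uses those paths; verification is on by default.
ctx = ssl.create_default_context()
print("verify_mode is CERT_REQUIRED:", ctx.verify_mode == ssl.CERT_REQUIRED)

# If the bundle is stale or missing, you can point Python at another one,
# e.g. the bundle shipped by the `certifi` package, via an env var:
#   export SSL_CERT_FILE=$(python -c "import certifi; print(certifi.where())")
```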
This video was so helpful. Thank you for the good work; always worth buying you some coffee. I was wondering if it's possible to train this system on my plumbing business data, such that when customers live-stream a video of their plumbing issues, the API can analyse the issue and advise accordingly, then tell the customer that we have the solution in stock and what it costs. Maybe the customer could even purchase the item at the same time. (My wild thought.)
Thanks! That's a great idea for a business opportunity. The multimodal models are not trainable at this moment, but you can use RAG to do document retrieval on the business data, set up that retrieval as an external function, then use the multimodal model's function calling to invoke that function when handling a customer's issue. It's not complicated to implement.
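A hypothetical sketch of that RAG-as-a-tool pattern. INVENTORY and check_inventory are made-up stand-ins for a real retrieval backend, and the tool declaration follows the common function-calling shape (name / description / parameters); you would adapt it to the exact schema the Gemini SDK expects:

```python
# Made-up stand-in for a real inventory/retrieval backend.
INVENTORY = {
    "P-trap": {"in_stock": True, "price_usd": 12.5},
    "ball valve": {"in_stock": False, "price_usd": 24.0},
}

def check_inventory(part_name: str) -> dict:
    """Retrieval function the model can call while diagnosing an issue."""
    item = INVENTORY.get(part_name)
    if item is None:
        return {"found": False}
    return {"found": True, **item}

# Generic function-calling declaration shape (adapt to the SDK's schema).
check_inventory_tool = {
    "name": "check_inventory",
    "description": "Look up whether a plumbing part is in stock and its price.",
    "parameters": {
        "type": "object",
        "properties": {"part_name": {"type": "string"}},
        "required": ["part_name"],
    },
}

# When the model emits a function call, the server runs it and sends
# the result back as the function response:
print(check_inventory("P-trap"))
```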
The video was very helpful; with its help I was able to build my own application. But I'm having trouble deploying it. Could you please make a video on how to deploy this application? For information, I'm having an issue deploying it on Render: pyaudio cannot be used. How do I solve that? Please help.
If you need to deploy the app for external access, you must make sure you have an HTTPS certificate for your URL; otherwise the browser will disable voice recording for security reasons. I will try to make a video on that! 👌
@yeyulab yeah please. That would be really helpful.
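One minimal way to test the HTTPS requirement locally is to wrap Python's built-in static file server in TLS. This is a sketch, not the project's deployment setup; cert.pem and key.pem are assumed to be a certificate pair you already generated (a self-signed one is fine for local testing, though browsers will warn about it):

```python
import ssl
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def make_tls_context(certfile: str, keyfile: str) -> ssl.SSLContext:
    # Server-side TLS context; the paths are assumed to point at an
    # existing certificate pair (self-signed is fine for local tests).
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile, keyfile)
    return ctx

def serve(certfile: str = "cert.pem", keyfile: str = "key.pem", port: int = 8443):
    httpd = ThreadingHTTPServer(("0.0.0.0", port), SimpleHTTPRequestHandler)
    # Wrap the plain socket so the client dir is served over HTTPS.
    httpd.socket = make_tls_context(certfile, keyfile).wrap_socket(
        httpd.socket, server_side=True
    )
    print(f"Serving HTTPS on port {port}")
    httpd.serve_forever()

# serve()  # uncomment to run locally once cert.pem/key.pem exist
```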