Build a Gemini Voice Assistant in Python

Ai Austin

มุมมอง 25 379

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 18 ม.ค. 2025

ความคิดเห็น • 67

@johnbarros1 9 หลายเดือนก่อน ⁺¹¹
Ai Austin you the man! Excellent content! Much respect! 🤜🏾🤛🏼
@Ai_Austin 9 หลายเดือนก่อน ⁺³
I appreciate that!
@mbegangsylvain1076 9 หลายเดือนก่อน ⁺⁵
Thanks, I'm working on it tomorrow 😊
@biswaranjannayak4580 6 หลายเดือนก่อน ⁺¹
Why not today?
@mulderbm 8 หลายเดือนก่อน
Thanks!
@Ai_Austin 8 หลายเดือนก่อน
thank you for the support!
@Afran146 9 หลายเดือนก่อน ⁺⁴
I will try this. Thank you
@Ai_Austin 9 หลายเดือนก่อน ⁺³
Hope you enjoy
@Tomblom 7 หลายเดือนก่อน
8:39 this is so insane to me. That your actual code is a message directly to the AI. "Hey keep it short mr. It dont want any bullshit from you. I use you to talk and that's it"
@seththunder2077 9 หลายเดือนก่อน ⁺³
yoooooooooo the ending is littttttttttttt
@raunaksharma8638 9 หลายเดือนก่อน ⁺⁵
Thnx but I am getting error module 'google.generativeai' has no attribute 'GenerativeModel' tried it in many versions of python and installed it many times.Please help
@Ai_Austin 9 หลายเดือนก่อน ⁺³
Could be the incorrect version of the genai library. I would try uninstalling the library and reinstalling. GenerativeModel is the correct class in the library, as stated by their documentation.
@shishwankalva775 9 หลายเดือนก่อน
Bro have you solved this issue!?
@dionisnavarro3772 9 หลายเดือนก่อน ⁺⁵
if I could put my two cents on having or not the source code from discord or github. This video as it is, AI Austin is giving us the free source code of this amazing tutorial as always, we just have to be patient and following video and type away those line of codes // codes blocks and you got the full working snippet as always ..... Thanks a lot AI Austin ..... (Note: Copy / Paste really won't help you a lot to learn and understand any snippet explained regardless how good is the author explaining such as it is the case here. If one wants to learn about coding, python language and get 'first hand' experience , one must code , search, READ !!!, PRACTICE!!! , Golden rules ..... try and error, you fall..... you stand up /// Error Crash --- Troubleshoot and fix "you stand up" ...... Cheers!!! Happy coding !!!
@Ai_Austin 9 หลายเดือนก่อน ⁺⁴
Well said. I totally agree copying and pasting isn't going to help anyone. Learn. Might be a convenience thing for some to compare to my actual code on there machine. Most people try to skip the video, copy and paste my github code, then complain it doesn't work on the free side of my discord. The real value of the PRO membership is the chat channel escaping that lazy insanity common in most people intrigued by my videos.
Glad to hear the algorithm found someone actually interested in developing this skill, as I aim to make this content for. Thanks for the support brotha!
@rgspacelictics 9 หลายเดือนก่อน ⁺⁴
Please show how to install it in Android as it would be more useful in Android, very good video thanks for the insights
@Nelson_Bazzard 2 หลายเดือนก่อน ⁺¹
I heavily modified your code and used it as a base, I integrated speech interupts, having it so there's a convosation loop until the language model classifies the convosation as dead or allp oints have been answered, I also went ahead and integrated functions, so it says a specific keyword then i use another script to monitor the output and it blocks the keyword from being said, then takes that and uses that keyword to trigger real actions, so far I've got it to turn my lights on and off
@logxdx158 22 วันที่ผ่านมา ⁺¹
How did you implement speech interrupt?
I have a similar setting made using open ai whisper and and silero vad for voice activity detection. The llm works in the background to produce results and then uses TTS to speak out the results. I wasn't able to integrate interrupt here.
Could you help?
@Nelson_Bazzard 22 วันที่ผ่านมา
@@logxdx158 yea bro, let me tell you it was an absolute pain and I was messing around with threading forever, essentially you want a separate thread that is continuously listening for the wakeword, when detected it stopes all processes, so stops pygame (what I used for playing the audio file of Jarvis speaking) and any other sub processes, then it calls the main function (in my case) and that defaults my assistant back to listening
@Nelson_Bazzard 22 วันที่ผ่านมา
@@logxdx158 Hey I would also be really interested to see your assistant
@logxdx158 20 วันที่ผ่านมา
@@Nelson_Bazzard I'm using threading in my speech to text code and it's working superrrr fine. But the thing is it's in a separate file.
The main llm backend is in it's own file, and the text to speech is in it's own file.
I thought making parts modular would be good but integrating interruptions feel impossible now 🥲
@DjTechDJ 8 หลายเดือนก่อน ⁺¹
Great tutorial man. I just wonder if running this on the cloud is possible? And what would be your recommendation?
Keep up the good work man!
@Pro-edit-No.01 4 หลายเดือนก่อน ⁺²
i am a windows user can you give me the link of pyAudio binding for the portAudio library. plzzzzz help me i have to submit my ai project in 2 weeks
nice explaination
❤
@Ronaldograxa 5 หลายเดือนก่อน
looove this! thanks a lot! Please how do you get the mouth to move while you talk on the video? I have been trying to learn this! thanks again for that
@nehapant9792 7 หลายเดือนก่อน
This is a great tutorial! I was wondering if you could consider adding a vision component to it in the future will be 🔥
@davidthiwa1073 7 หลายเดือนก่อน
Why did you use whisper instead of uploading the audio to Gemini since it handles it well natively with barely any downtime?
@Ai_Austin 7 หลายเดือนก่อน
because functionally that would offer no benefits and not doing so gives superior user privacy.
the latency it would take for most users to upload every audio file to the cloud would take longer than my 8gb macbook air does to process the audio with faster-whisper.
i also don't care to livestream my microphone to big tech servers.
@julienduchesneau9747 8 หลายเดือนก่อน
This shit is too funny, I had to work my way because I did it in a conda env ( I do not recommand) and at the end not able to wake up Gemini (maybe my french canadian accent) I looked the end of the video to hear the prononciation and the video wake up my Gemini and wrote me a Dank rap song !!!!
@Ai_Austin 8 หลายเดือนก่อน
🤣
@neoyt8805 9 หลายเดือนก่อน ⁺²
bro why are you not uploading the source code in github?
🙃
@Ai_Austin 9 หลายเดือนก่อน ⁺⁴
It's available on the PRO channels of my Discord for people who care to support the work I put into making the code, so I can continue making the code 🙃
@neoyt8805 9 หลายเดือนก่อน ⁺²
@@Ai_Austin bro I'm 15 I can't afford 25$ per month...also I'm Indian tooo..I've been watching your videos since the start
@Ai_Austin 9 หลายเดือนก่อน ⁺⁴
Appreciate that buddy. If you don't have money, then you have time. Learning to code can make you money. Going through the actual video, will help you learn better so you don't need my tutorials. Hope you enjoyed this one!
@neoyt8805 9 หลายเดือนก่อน
But I've always depended on your source code bro..😢.
Anyway bro I got the ask Gemini button after that some errors are coming..😢
@seththunder2077 9 หลายเดือนก่อน
@@neoyt8805 You need to appreciate the effort he puts in and the time it takes to do these videos. It looks easy with just 1 code after the other but the amount of errors he had to go through definitely hurt his head. Maybe you're just 15 now but one day you will realize how low $25 is for the quality he provides.
@bosvikanimations728 8 หลายเดือนก่อน
With the google ai API key 3:49 , How long I can use that API key? and is it completely free or do I need to pay any money? Anybody comment.
@VloggerDivyansh18 7 หลายเดือนก่อน
Free but limited, only some messages a minute.
@thenewme.ghazialpha หลายเดือนก่อน ⁺¹
12 api call per minute
@patflc 5 หลายเดือนก่อน
hello Austin everything was working so well until importing WhisperModel, then i got this error:
File "/home/linuxbrew/.linuxbrew/Cellar/python@3.12/3.12.5/lib/python3.12/site-packages/av/__init__.py", line 20, in
from av._core import time_base, library_versions
ModuleNotFoundError: No module named 'av._core'
I have been trying to solve it in a bunch of ways but still nothing T_T do you know how could i fix it? im working with windows wsl vscode thx you
anyways, is there an alternative library for achieving these same functions for the program?
@Ai_Austin 5 หลายเดือนก่อน
it is a wsl issue. running a linux operating system inside or a non unix operating system, adds a large layer of complexity and introduces millions of bugs that do not exist on an actual linux operating system.
my advice is code on linux, if not the windows will suffice. but do not code on linux in windows, unless you like having to hack wsl to run your program.
@novadocnews 9 หลายเดือนก่อน
Newbie in coding here🙋‍♂. Hello from Mexico!! Hi Austin. For days now I have tried to follow tutorials to create my own AI assistant with the little understading I have about python. I was ALMOST succesfull to use Crewai but something always went wrong in the end so I got frustrated and ended up creating one in OPENAI in 2 minutes but i have to pay constantly for tokens.😔.
I came across your video and its amazing! Even someone like me can understand it. Thank you!!
Now, I have only 2 questions:
1. I didnt undersant in which of the code part should I include a "context" so I can give it a specific personality?
2. How does this code keep "memory" or keep track of the conversations so it keeps "learning" and being costumizable? (Aside from the pre-programmed context i have made assistants "learn" to analyze an attached file so it can understand better how to behave.)
Thank you in advance and I wish you all the best!
@Ai_Austin 9 หลายเดือนก่อน ⁺¹
Appreciate feedback.
For giving it a personality, you would want to adjust the system message.
This program does not have memory of previous conversations. It remembers everything in the conversation until you hit a context limit or restart the program.
My next video that will be live in a few days will show you how to use embeddings to create an agent to remember what you choose!
@novadocnews 9 หลายเดือนก่อน
@@Ai_Austin Thankyou!! Just suscribed :)
@sowbharnikadevi9331 5 หลายเดือนก่อน
can we get the openai api without paying credits?
@TopBassBoosters 4 หลายเดือนก่อน
Sadly no, that's what is holding me back from making my own chatbot😢
@yunik_developer 9 หลายเดือนก่อน
Bro why are suggesting paid audios like there are many free and open-source libraries for it like pyaudio
@Ai_Austin 9 หลายเดือนก่อน ⁺²
1. People want to know how to give their voice assistant a higher quality voice with OpenAI and some people can afford to add $5 of credit every two months to get a better experience from their assistant.
2. No resource exists with the streaming method I showed online
3. PyAudio does not have tts
4. There is no value in showing the same voice assistant code in every video with a different language model. This gives people ability to learn multiple TTS libraries going through my videos so they can find out what works best for them. If that was the route I was going, I'd make one voice assistant tutorial and a bunch of "how to use this new LLM in python" videos.
5. You can never please everyone. If you think you can create programming videos that satisfies 100% peoples preferences, show me your success rate and how you did it. Otherwise, hope you enjoyed MY programming tutorial from MY preferences.
6. No open source library is going to compare in voice quality without taking 5 minutes to generate a sentence of audio.
So the simple answer: it was the content creation choice that makes the most sense for my content.
@mulderbm 8 หลายเดือนก่อน
How can you not like this style and learning something, you bring it as adventure story, very good use of GenAI, if you do this you can partner with Netflix
@64revolt 8 หลายเดือนก่อน ⁺¹
I can tell you one thing though, if the result is this hugely annoying angry sibilant voice that the video has I am 100% not interested.
@CreepyFilmz 2 หลายเดือนก่อน
Outdated video. Sorry.
@aragamideveloper 8 หลายเดือนก่อน
Dude got a nice voice but hes prefere to use ai voices
@robottinkeracademy 9 หลายเดือนก่อน ⁺⁶
I hate when ppl ask me for code, nothing is free in this world and certainly not my time
@Ai_Austin 9 หลายเดือนก่อน ⁺¹
respect! ✊
@SunilBagri-f9b 6 หลายเดือนก่อน
Ayo bro.
I can pay you 150$ if you decrease the latency of bot to less than 1 second. Contact me on telegram: dhruvsaharan @@Ai_Austin
@wartem 3 หลายเดือนก่อน ⁺¹
??? Heard of open source? I've given free support for my free apps and mods for 8 years now.
@Success_hustler07 2 หลายเดือนก่อน
If cr7 asking u for sorce code . Than u what aso From him for money or autograph 😅
@aayushsharma2683 9 หลายเดือนก่อน
hi, I'm getting this error can you pls tell me how to resolve this
AttributeError: module 'google.generativeai' has no attribute 'GenerativeModel'
@shishwankalva775 9 หลายเดือนก่อน
Bro have you solved this error
@Ai_Austin 9 หลายเดือนก่อน
this means you have an old version of google.generativeai installed and need to upgrade it to the newest version that contain the GenerativeModel class.
@bosvikanimations728 8 หลายเดือนก่อน
I get this in output
Ask Gemini: What are you?
response:
GenerateContentResponse(
done=True,
iterator=None,
result=glm.GenerateContentResponse({'candidates': [{'content': {'parts': [{'text': 'I am Gemini, a large multi-modal model, trained by Google. I am designed to understand and generate human language, answer questions, and provide information on a wide range of topics.'}], 'role': 'model'}, 'finish_reason': 1, 'index': 0, 'safety_ratings': [{'category': 9, 'probability': 1, 'blocked': False}, {'category': 8, 'probability': 1, 'blocked': False}, {'category': 7, 'probability': 1, 'blocked': False}, {'category': 10, 'probability': 1, 'blocked': False}], 'token_count': 0, 'grounding_attributions': []}]}),
)
When running this code
import google.generativeai as genai
GOOGLE_API_KEY = 'your_google_api_key'
genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel('gemini-1.0-pro-latest')
response = model.generate_content(input("Ask Gemini: "))
print(response)
what should I do to get the output like this. 5:09
@Ai_Austin 8 หลายเดือนก่อน ⁺¹
print(response.text) to select only the text value in the response
@bosvikanimations728 8 หลายเดือนก่อน
@@Ai_Austin Thanks bro. And could you make a video or tell me about How can I implement gTTS instead of Openai Key
@adityakak8661 8 หลายเดือนก่อน
@@Ai_Austin
Ask Gemini: hhjjkkl
Traceback (most recent call last):
File "/main.py", line 43, in
print(response.text)
^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/google/generativeai/types/generation_types.py", line 407, in text
if len(parts) != 1 or "text" not in parts[0]:
^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'Part' is not iterable
i am getting this in the terminal (the ask gemini line is where i put in the text input)
@adityakak8661 7 หลายเดือนก่อน
@@Ai_Austin Ask Gemini: who are you
Traceback (most recent call last):
File "/main.py", line 42, in
print(response.text)
^^^^^^^^^^^^^
File "/opt/anaconda3/lib/python3.11/site-packages/google/generativeai/types/generation_types.py", line 407, in text
if len(parts) != 1 or "text" not in parts[0]:
^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'Part' is not iterable
this is the error im getting, when doing print(response.text). how do i fix this?

ต่อไป

เล่นอัตโนมัติ

World’s Fastest Talking AI: Deepgram + Groq