Creating JARVIS - Python Voice Virtual Assistant (ChatGPT, ElevenLabs, Deepgram, Taipy)

  • Published on Jul 8, 2024
  • Check out the GitHub repository here:
    github.com/AlexandreSajus/JARVIS
    0:00 Talking to JARVIS
    0:58 Intro
    1:52 How JARVIS works
    3:12 How to setup JARVIS
    4:05 Getting API keys
    5:05 Installing JARVIS
    6:49 Running JARVIS
    7:44 Talking to JARVIS
    9:18 How to mod JARVIS for your use case
    10:45 Recording audio using Pyaudio
    12:25 Transcribing to text using Deepgram
    12:45 Sending prompts to OpenAI GPT
    13:14 Changing JARVIS' personality (context)
    14:10 Generating voice using ElevenLabs
    14:50 Playing audio using Pygame
    15:15 Displaying the convo in a webpage with Taipy
    16:40 Use cases and limitations
  • Science & Technology

Comments • 174

  • @joeternasky
    @joeternasky 6 months ago +6

    Fantastic project. Love how you connected these services and packages together. Thanks for going over the project, posting this video, etc. I learned quite a bit.

    • @alexandresajus
      @alexandresajus 6 months ago

      Thank you very much!

  • @dwilson7230
    @dwilson7230 5 months ago +2

    Bro this is sick as hell! Thanks for posting a video about it.

    • @alexandresajus
      @alexandresajus 5 months ago

      Thanks! Glad you liked it!

  • @isagiyoichi5207
    @isagiyoichi5207 5 months ago +2

    this is actually really incredible thanks for the video

  • @iandanforth
    @iandanforth 6 months ago +2

    Impressive! One key bit of the UX of ChatGPT mobile is the "clicks" that indicate when the model has 1. stopped listening and 2. stopped talking. A very small touch that makes a world of difference.

    • @alexandresajus
      @alexandresajus 6 months ago

      Yes I should definitely find better ways to convey to the user when he is being listened to

  • @chrsl3
    @chrsl3 6 months ago +1

    Fantastic work and video, thank you!!

  • @gr8tbigtreehugger
    @gr8tbigtreehugger 3 months ago +2

    Many thanks for this super helpful tutorial! My next step is voice ID, so the AI knows it's me!

  • @muhammadilyasrasyid5817
    @muhammadilyasrasyid5817 6 months ago +1

    thank you very much sir

  • @xgodwhitex
    @xgodwhitex 6 months ago +1

    Amazing job!

  • @Threecommaaclub
    @Threecommaaclub 5 months ago +1

    Hey Alex, I'm using a Linux device running a Python 3.11 venv. When I try to run main.py I get the error "No module named 'pyaudio'". I went about using the simple command pip install pyaudio; however, when running that command I was greeted with "Could not build wheels for pyaudio, which is required to install pyproject.toml-based projects". I was hoping you may be able to share some insight into why this may be happening. Great video btw, I await your speedy response :)

    • @alexandresajus
      @alexandresajus 5 months ago +1

      Were you able to solve this by creating a new virtual environment? Otherwise, I have no idea how to fix this; let me know if you find a solution

    • @Threecommaaclub
      @Threecommaaclub 5 months ago +1

      @alexandresajus yeah man, we were able to make it happen once we used the virtual env. Thanks again

    • @alexandresajus
      @alexandresajus 5 months ago

      @@Threecommaaclub Perfect!

  • @marouane9682
    @marouane9682 6 months ago +1

    i love it maaaaaaaan thank u for sharing .. pls keep sharing with us ur magic

    • @alexandresajus
      @alexandresajus 6 months ago +1

      Thank you!

    • @marouane9682
      @marouane9682 6 months ago +1

      @alexandresajus brother, help me pls with my question: how can I make JARVIS transcribe and talk in French instead of English?

    • @alexandresajus
      @alexandresajus 6 months ago +1

      @marouane9682 This should not be too hard; you just need to add a few parameters for Deepgram and ElevenLabs. For ElevenLabs, just change the voice parameter to "Pierre" or another French voice at line 116 of main.py. For Deepgram it is a bit more complicated: you will have to add a PrerecordedOptions parameter at line 72 of main.py which contains a language="fr" parameter. It's a bit too much to write in a comment, so I invite you to take a look at the Deepgram docs (github.com/deepgram/deepgram-python-sdk/blob/main/README.md). Let me know if you need more help
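
      A minimal sketch of those two changes, assuming the deepgram-sdk v3 PrerecordedOptions API and the legacy elevenlabs generate()/play() helpers (exact line numbers and variable names in main.py may differ):

      import os
      from deepgram import DeepgramClient, PrerecordedOptions
      import elevenlabs

      deepgram = DeepgramClient(os.getenv("DEEPGRAM_API_KEY"))
      elevenlabs.set_api_key(os.getenv("ELEVENLABS_API_KEY"))

      def transcribe_french(wav_path: str) -> str:
          # language="fr" switches Deepgram's nova-2 model to French transcription
          options = PrerecordedOptions(model="nova-2", language="fr", smart_format=True)
          with open(wav_path, "rb") as audio_file:
              payload = {"buffer": audio_file.read()}
          response = deepgram.listen.prerecorded.v("1").transcribe_file(payload, options)
          return response.results.channels[0].alternatives[0].transcript

      def speak_french(text: str) -> None:
          # "Pierre" is the French voice suggested above; any French ElevenLabs voice works
          audio = elevenlabs.generate(text=text, voice="Pierre")
          elevenlabs.play(audio)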

    • @marouane9682
      @marouane9682 6 months ago +1

      @alexandresajus thank you so much chief

  • @painperdu6740
    @painperdu6740 6 months ago +1

    LETS GOOO NEW ALEXANDRE SAJUS VIDEO I CLICK LIKE I SUBSCRIBEEE

    • @alexandresajus
      @alexandresajus 6 months ago

      XD I can't deal with you anymore

  • @mikew2883
    @mikew2883 6 months ago +1

    Good stuff! 👍

  • @oldspammer
    @oldspammer 3 months ago +1

    Some operating systems have free text-to-speech APIs that can respond instantly, without routing everything over the internet to some central system that might get bogged down under heavy usage. I have noticed that once you become dependent on something or someone, a monopoly situation may well result, and you end up potentially having to pay, pay, pay for things your local PC could have done for free on its own, with no network data interactions. Often the distant server has a better-sounding voice and mispronounces fewer words, but soon you are outsourcing too many things to outside entities and becoming too dependent on them.
    If a set of 10 or so words is known to be mispronounced by the local speech API on your PC, is there a way to have your PC handle those exception words with specialized processing, a syllable at a time for each of the 10 exception words, to save you from having to use an API key that can be withdrawn from handy use at the flick of a switch by the third-party provider?

  • @rodrigodifederico
    @rodrigodifederico 5 months ago +1

    I did the same a few months ago, but I made it all work through a real phone number, so you can actually call a number and an assistant will pick up the call and talk to you about the shop services or clinic procedures, etc. Pretty nice lab.

    • @alexandresajus
      @alexandresajus 5 months ago +1

      That is a great use case. Were there any issues surrounding the latency? Were there any customer complaints from people who found the delay in answering too long or did not want to talk to an AI?

    • @rodrigodifederico
      @rodrigodifederico 5 months ago +2

      @alexandresajus I reduced the delay by 90% by running all the systems locally: the speech-to-audio generator, audio transcription, the language model, etc. The only remote API I used was for the phone number (Twilio). If you run everything through remote APIs, the delay will be a real problem; it won't work as an assistant over the phone because it may take up to 10 seconds for an answer. But running everything locally it's almost instant. For the voice part, both to text and back, I don't generate an audio file, I stream it, so there is no delay. With a few tricks, you can make it almost real time 🙂

    • @alexandresajus
      @alexandresajus 5 months ago

      @rodrigodifederico Great! Is there anywhere I could take a look at that project? Which text-to-speech model are you using?

    • @rodrigodifederico
      @rodrigodifederico 5 months ago +1

      @alexandresajus I am planning to turn it into a product, so for now I won't share the code, but I'll record a live interaction video and upload it to YouTube soon; I'll drop the link here if you are interested. About the text to speech, I created my own model, pretty similar to ElevenLabs. But I have to say that if you use ElevenLabs streaming, this part of the process will have a similar delay, so I might switch to ElevenLabs streaming in the future, unless I want to keep it 100% free of costs, in which case I would keep my model.

    • @alexandresajus
      @alexandresajus 5 months ago

      @@rodrigodifederico Sure I'd love to see a demo

  • @sebaperalta2001
    @sebaperalta2001 6 months ago +1

    Nice work! Is it possible to have it answer only on an activation word? Like if you don't say "Jarvis", then it would not answer. So the program is always listening, but activates on context.

    • @alexandresajus
      @alexandresajus 6 months ago +1

      Thanks! Yes this should be easy to do, just add a condition: if the activation word is not in the transcript, continue (restart the loop without answering)
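
      Roughly, assuming helpers with the same roles as the repo's (speech_to_text, request_gpt, generate_audio and play_audio are illustrative names here; adapt them to whatever main.py's loop actually calls):

      WAKE_WORD = "jarvis"

      while True:
          transcript = speech_to_text()            # record and transcribe the next utterance
          if WAKE_WORD not in transcript.lower():
              continue                             # no wake word: restart the loop without answering
          answer = request_gpt(transcript)         # only reached when "jarvis" was heard
          play_audio(generate_audio(answer))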

  • @edbayliss1862
    @edbayliss1862 5 months ago +1

    This really interested me. I modified it a bit to add a listen button to the UI so it only listens when you select listen; this is easier than a "wake word".
    Then I thought: integration. I use macOS.
    I built a folder called modules, added a second step that parses the text through GPT again to match a dictionary, and then GPT decides which function in the dictionary matched and runs it.
    It worked great for checking calendar events etc., and if no matches were found it defaulted to the GPT chat response, but the extra layer added more latency and just isn't scalable

    • @alexandresajus
      @alexandresajus 5 months ago

      Incredible! Good work! Is there anywhere where we could check out your project?

    • @edbayliss1862
      @edbayliss1862 5 months ago +1

      @alexandresajus sure, is your GitHub open to branches? I can just push it as a branch for you to check out on Monday

    • @alexandresajus
      @alexandresajus 5 months ago

      @@edbayliss1862 I'm not sure, I think it is open to fork then pull request. I think I need to manually add you as a collaborator if you want to directly push to a branch. Your call. Or you could just share the link of your repo if it is public.

  • @taylorsmith1720
    @taylorsmith1720 3 months ago +2

    🎯 Key Takeaways for quick navigation:
    01:02 *🚀 Overview of Voice Virtual Assistant Development*
    - Explanation of building a voice virtual assistant similar to Jarvis from Iron Man.
    - Overview of the backend workflow involving voice input, transcription, response generation, and audio output.
    - Introduction to third-party services like Deepgram, OpenAI, 11 Labs, and Taipy used in the development process.
    03:21 *🔧 Installation Instructions for the Voice Virtual Assistant*
    - Cloning the GitHub repository and installing necessary requirements.
    - Setting up API keys for Deepgram, OpenAI, and 11 Labs.
    - Creating an environment file to store API keys securely.
    - Executing installation commands and waiting for requirements to install.
    08:33 *🛠️ Running the Voice Virtual Assistant*
    - Instructions for running the display interface (`display.py`) and the main script (`main.py`).
    - Description of how the assistant listens, transcribes, generates responses, and displays conversations.
    - Example interaction demonstrating the assistant's response to user input.
    09:28 *💡 Customization and Modification of the Voice Virtual Assistant*
    - Guidance on modifying the assistant for specific use cases.
    - Suggestions for changing context, models, and voices for customization.
    - Discussion of potential improvements, such as integrating news, adding memory, and overcoming latency limitations.
    Made with HARPA AI

    • @alexandresajus
      @alexandresajus 3 months ago +1

      Now THAT is how you should advertise a product. Great summary!

  • @Firebabys89
    @Firebabys89 2 months ago +1

    u are amazing dude

  • @crprp4769
    @crprp4769 5 months ago +1

    Awesome video! Thanks for sharing, but I've got a question. How can I implement a pre-trained OpenAI assistant into Taipy?

    • @alexandresajus
      @alexandresajus 5 months ago +1

      Thanks! It should be quite simple. Just replace the model variable at line 53 (shown at 12:52) with your own model ("ft:gpt-3.5-turbo:my-org:custom_suffix:id") and it should work. Let me know if you need more help.
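
      Something like this, assuming the pre-1.0 openai Python client the project was written against (the model id below is just OpenAI's placeholder format for fine-tuned models):

      import openai

      context = "You are JARVIS, a helpful voice assistant."   # the personality string from main.py
      transcript = "What is on my agenda today?"

      response = openai.ChatCompletion.create(
          model="ft:gpt-3.5-turbo:my-org:custom_suffix:id",    # replace with your fine-tuned model id
          messages=[
              {"role": "system", "content": context},
              {"role": "user", "content": transcript},
          ],
      )
      print(response["choices"][0]["message"]["content"])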

  • @shawnmuok542
    @shawnmuok542 1 month ago

    hello, I have a problem: when I try to run main.py it shows me "no module named deepgram"

  • @tismine
    @tismine 2 months ago +1

    Hey Alex! Thanks a lot for the video, can you please explain a good way to create a neat requirements.txt file after I'm done with a project?

    • @alexandresajus
      @alexandresajus 2 months ago

      Sure! Use « pip list » in the terminal to check which package versions you are using. Then create a requirements.txt at the root of your project with one line per package, « package_name==version », including only the packages you import within the code (not their dependencies)

  • @JanikJanesch
    @JanikJanesch 1 day ago

    Do you know why there is an error saying I only have 12 characters left but my request needs 42 characters? Even though I have a $20 account balance on ChatGPT.

  • @handlepersonthing
    @handlepersonthing 6 months ago +1

    Awesome work! I wonder if using the GPT-4 model would speed things up a bit?

    • @alexandresajus
      @alexandresajus 6 months ago +2

      Thank you very much! Unfortunately, I don’t think switching the model would do a lot. Profiling here is 1s for transcribing, 1s for gpt and 2s for generating audio. The best way to reduce latency would be using smaller/quantized models or streaming data instead of doing each task sequentially
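
      For reference, the per-step numbers above can be reproduced with a small timing wrapper like this (an illustrative helper, not something in the repo):

      import time

      def timed(label, fn, *args, **kwargs):
          # Wrap any pipeline call and print how long it took
          start = time.perf_counter()
          result = fn(*args, **kwargs)
          print(f"{label}: {time.perf_counter() - start:.2f}s")
          return result

      # e.g. transcript = timed("deepgram", speech_to_text)
      #      answer     = timed("openai", request_gpt, transcript)
      #      audio      = timed("elevenlabs", generate_audio, answer)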

    • @serenditymuse
      @serenditymuse 6 months ago +2

      @alexandresajus larger models often take longer to think.

  • @nightmare6159
    @nightmare6159 2 months ago +1

    I need help, When I do pip install -requirements.txt it says there is no such directory even tho I see the file

    • @alexandresajus
      @alexandresajus 2 months ago

      Make sure that you are in the right directory in your terminal. You can use ls in the terminal to check the contents of the directory you are in. You can switch directory using cd in the terminal or using "Open Folder..." in VSCode.
      In general, the syntax should be "pip install -r [PATH-TO-TXT]"

  • @DalazG
    @DalazG 2 months ago +1

    Incredible material! Thanks bro, your tutorials are super helpful for those learning to code. I'm trying to follow along.
    Not sure if you've taken any subscriber requests. I've really wanted to find a tutorial on creating a machine learning model in Python that can figure out its own strategy for successfully trading forex and integrating it with MQL4 or 5.
    Definitely possible, but there are next to no tutorials on this anywhere, I noticed

    • @alexandresajus
      @alexandresajus 2 months ago

      Thanks! Glad to know the video is helpful. This indeed seems to be a niche topic. I don’t think I could help you with this unfortunately since I don’t know anything about forex or mql.

    • @DalazG
      @DalazG 2 months ago +1

      @alexandresajus no worries, this tutorial was super useful anyway! Subscribed.
      Curious, would the APIs you used for this Jarvis application cost a lot of money though? I know the ChatGPT API isn't free (just the free credits)

    • @alexandresajus
      @alexandresajus 2 months ago +1

      @@DalazG The APIs did not cost that much: for the whole project I talked for about 2 hours to JARVIS. It cost less than a dollar for both Deepgram and OpenAI. ElevenLabs cost me 5$ only because they have a subscription based fee.

    • @DalazG
      @DalazG 2 months ago

      @@alexandresajus gotcha, elevenlabs has a brilliant voice api. But just because it adds up, i would probably prefer to use a cheaper worse one 😅 .

  • @adben001
    @adben001 25 days ago

    Will that generate costs through the API or is it free?

  • @PenguinjitsuX
    @PenguinjitsuX 5 months ago +2

    This is awesome! I am wondering though how much this project is costing you from API calls (if you were to use this daily and pretty often)? I'm planning to build a home assistant that can control all of my home gadgets and perform actions on my computer, but I'm trying to decide whether I should use all local models (whisper, coqui, and mistral) instead of the paid online services. The quality and speed is a bit lower locally, but it's free so I'm thinking about the tradeoff. Please let me know what you think, thanks!

    • @alexandresajus
      @alexandresajus 5 months ago +1

      Hey! Thanks, glad you liked it! I recommend going the paid online route. ElevenLabs is a paid subscription at 5$/month for 30,000 characters. OpenAI and Deepgram are pay-per-request but are dirt cheap: for this whole project, I probably talked for an entire hour with JARVIS, and it cost me 12 cents on OpenAI and 40 cents on Deepgram. If you want to lower cost, find an ElevenLabs equivalent that is pay-per-request, and you'll be good.
      Going local will drastically reduce performance and speed unless you have proper hardware, i.e., a dedicated GPU cluster at home. You'll have to use open-source, quantized to 8Gb models. If you have adequate hardware though, going local might be a good idea since you'll keep performance, and you can reduce latency by half by hosting locally, doing code shenanigans to parallelize each task instead of running them sequentially, and generally optimizing the pipeline.
      Latency is the biggest drawback; JARVIS is at 4 seconds of latency. Even if it was 2 seconds, it is still too awkward for a conversation.

    • @PenguinjitsuX
      @PenguinjitsuX 5 months ago +1

      @alexandresajus Thanks for the in-depth reply! That's awesome to see that it's so cheap. I was actually really lucky and got a 4090 last week. I've been running tests: on Whisper and LLM inference, I got performance at almost real time.

    • @alexandresajus
      @alexandresajus 5 months ago

      @PenguinjitsuX Wow, you already made a lot of progress! Yeah, unfortunately I think we are just a few years away from solving that performance-latency tradeoff for TTS; then we'll be able to have a proper conversational Jarvis. Is your project open source? I would love to take a look if you'd let me. I don't have a Discord server but I'd love to keep in touch on Discord. Here's my username: alex_1337

  • @PilotsPitstop
    @PilotsPitstop 1 month ago +1

    what exactly did u purchase on the OpenAI API thing for it not to return "exceeded current quota"? I paid for the ChatGPT "hobbyist" plan and thought that would help, but nah, I wasted $20. And u should def start a Discord, good stuff

    • @alexandresajus
      @alexandresajus 1 month ago +1

      Ah I see, you're not supposed to pay for a ChatGPT subscription. OpenAI has a website for their API where you just have to enter billing details and maybe add a dollar of credit to use. They charge per request and not on a subscription basis. It should be on the same site where you got your API key

    • @PilotsPitstop
      @PilotsPitstop 1 month ago

      @@alexandresajus AH MY HERO SO FAST, so i just add some money to my account and boom it works?

  • @aashishkumarlohra277
    @aashishkumarlohra277 1 month ago

    when I run python main.py I get this error:
    Traceback (most recent call last):
    File "E:\JARVIS_TEST\JARVIS\main.py", line 15, in <module>
    from record import speech_to_text
    File "E:\JARVIS_TEST\JARVIS\record.py", line 8, in <module>
    from rhasspysilence import WebRtcVadRecorder, VoiceCommand, VoiceCommandResult
    ModuleNotFoundError: No module named 'rhasspysilence'

    • @alexandresajus
      @alexandresajus 1 month ago

      Check this issue:
      github.com/AlexandreSajus/JARVIS/issues/4
      Also try creating a new clean virtual env before installing requirements. Check if there are no errors during installation. Check that you are running main.py from that env. Check that rhasspysilence is installed with pip list

  • @olakunleogunseye9657
    @olakunleogunseye9657 6 months ago +1

    aye this is so cool but there is no wake-up key and end key, but this is the greatest and I know you know

  • @omjondhalefyco-9953
    @omjondhalefyco-9953 3 months ago +1

    What alternative can be used for ElevenLabs?

    • @alexandresajus
      @alexandresajus 3 months ago

      I have not tried anything apart from Elevenlabs and google_tts. I was not impressed with the quality of google_tts, but it was way faster. I'm sure you'll find better answers online

  • @pntra1220
    @pntra1220 6 months ago +1

    Nice project bro! Do you know how I can use Deepgram to transcribe Spanish voice? I already figured it out for ElevenLabs but not for Deepgram. Thank you for taking the time to read this and keep making these videos!

    • @alexandresajus
      @alexandresajus 6 months ago +1

      Thanks! I have not tried but there does seem to be the option to transcribe Spanish voice by using their nova-2 model and adding the parameter "language=es" to the query
      developers.deepgram.com/docs/language
      developers.deepgram.com/docs/models-languages-overview

  • @PandaLorian14
    @PandaLorian14 1 month ago

    does no one get the same code on Deepgram? me and you don't get the same code

  • @anirvindhch1209
    @anirvindhch1209 3 months ago +1

    What are you using to code this Alexandre??

    • @alexandresajus
      @alexandresajus 3 months ago +1

      What do you mean? I'm coding in Python using VSCode, I used external APIs like ElevenLabs, OpenAI, Deepgram. Libraries like Taipy for the interface. I use GitHub Copilot to help me code faster as well.

  • @edosetiawan9589
    @edosetiawan9589 3 months ago +1

    Awesome!! How do I make this project access custom data?

    • @alexandresajus
      @alexandresajus 3 months ago

      A quick way to do this would simply be adding the data as a string in the context. This has its limitations (the context has a max length). If you want a chatbot that knows information from documents, I suggest you look into RAG models
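
      A rough sketch of that quick approach (the variable names are illustrative, not necessarily the ones in main.py):

      CUSTOM_DATA = """
      Opening hours: 9am to 6pm, Monday to Friday.
      Returns are accepted within 30 days with a receipt.
      """

      context = (
          "You are JARVIS, a helpful voice assistant. "
          "Answer using the information below when it is relevant.\n"
          + CUSTOM_DATA
      )
      # context is sent as the system message on every request, so the whole
      # string has to stay well under the model's maximum context length.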

  • @FantasyDark-ub3xh
    @FantasyDark-ub3xh 3 months ago +1

    Sir, I want to do something like this. Is there any free API available? If not OpenAI, please tell me some other AI APIs to do AI tasks, sir!

    • @alexandresajus
      @alexandresajus 3 months ago

      Sir! If you search for them online, there should be free alternatives for the models I used in the video! I recommend looking at HuggingFace for an OpenAI alternative, sir! For example, the Mistral model has a free inference API that is only rate-limited, sir!

  • @charliepersonalaccount5276
    @charliepersonalaccount5276 2 months ago +1

    Great stuff man! What's the best way to chat with you? I have an mvp i want to run by you and maybe have you help me build it out

    • @alexandresajus
      @alexandresajus 2 months ago

      Thanks. Feel free to reach out on Linkedin:
      www.linkedin.com/in/alexandre-sajus/
      I don't have much time because of work, but I can take a look.

  • @user-qw6zz7pr2x
    @user-qw6zz7pr2x 4 months ago

    When I run display.py to start the web interface, it shows "ModuleNotFoundError: No module named 'taipy'". But then after I install taipy (version 3.0.0), it still gives me the same error message. I have tried to uninstall and install taipy but same error message...

    • @alexandresajus
      @alexandresajus 4 months ago +1

      Are you sure you are running display.py from the Python environment where taipy is installed? Use `pip list` to check that taipy is installed and then `python display.py` to run the file. If this does not work, I suggest creating a new virtual environment and re-installing the requirements. Bear in mind that taipy only works with Python 3.8 to 3.11

    • @user-qw6zz7pr2x
      @user-qw6zz7pr2x 4 months ago

      Thanks! Instead of clicking to run display.py, I typed in "python display.py" and it opened the website! @alexandresajus
      One more question: when I ran "python main.py", I got the error message "TypeError: 'ABCMeta' object is not subscriptable". I am using Python 3.8.10 in Visual Studio.

  • @s.gveeronstart4794
    @s.gveeronstart4794 5 months ago +1

    sir can u teach how to make it?
    I mean, could you make a playlist on this topic?

    • @alexandresajus
      @alexandresajus 5 months ago

      Unfortunately, I won't be making an extended tutorial on this in the near future. But I'm sure there are many tutorials on the tools I used on YouTube. You can just look up "ElevenLabs tutorial" or "OpenAI API tutorial".

  • @GameXnationOfficial
    @GameXnationOfficial 2 months ago +1

    "You exceeded your current quota, please check your plan and billing details" its showing something like this and jarvis is not replying after an error

    • @alexandresajus
      @alexandresajus 2 months ago

      You've exceeded your free quota on one of the APIs, check on which function call this error gets triggered to see which API needs billing
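
      One quick way to see which service is the culprit (a hypothetical wrapper, not something in the repo): wrap each third-party call so the failing API is named in the logs.

      def call_with_label(label, fn, *args, **kwargs):
          try:
              return fn(*args, **kwargs)
          except Exception as exc:
              # e.g. "openai call failed: ... exceeded your current quota ..."
              print(f"{label} call failed: {exc}")
              raise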

  • @EnnoAI431
    @EnnoAI431 5 months ago +1

    Great project!!
    Would it also run on a Raspberry Pi?
    Recently I ran a project also called Jarvis on a Pi. You don't need the APIs from Deepgram & ElevenLabs and the latency is pretty good, although the voice was horrible.... unless you like robots :-).

    • @alexandresajus
      @alexandresajus 5 months ago

      Thanks! Sure, this should be able to run on a Raspberry Pi since all of the heavy stuff is third-party services that are hosted, so barely anything runs locally. Cool! Where can I take a look at your project?

  • @undeadgaming2102
    @undeadgaming2102 3 months ago +1

    i want to ask can you make a video on how we can make it do different tasks???

    • @alexandresajus
      @alexandresajus 3 months ago

      What task are you thinking about? If it's just asking about the weather, you can add the current weather to the context so Jarvis knows about the current weather

    • @undeadgaming2102
      @undeadgaming2102 3 months ago

      @@alexandresajus i was thinking like a google assistant

  • @AndroidePulpico
    @AndroidePulpico 3 months ago +1

    The latency is pretty bad, have you tried Whisper JAX or Faster Whisper??

    • @alexandresajus
      @alexandresajus 3 months ago

      Yeah, the latency issue is currently the worst one. I have not tried these services. Let me know if it speeds up things. Currently, the consensus for reducing latency seems to be streaming data, running the tasks in parallel instead of sequentially, and hosting local and smaller models.

  • @tomasrochaakemi
    @tomasrochaakemi 6 months ago +2

    hey alex! can you help me with this error? "ERROR: Failed building wheel for webrtcvad
    Failed to build webrtcvad
    ERROR: Could not build wheels for webrtcvad, which is required to install pyproject.toml-based projects"

    • @alexandresajus
      @alexandresajus 6 months ago

      Sure! This is because you don't have Microsoft Visual C++ installed properly. I have written a guide on how to fix this here:
      github.com/AlexandreSajus/JARVIS/issues/3

    • @tomasrochaakemi
      @tomasrochaakemi 6 months ago +1

      @alexandresajus hey man, it worked but now I got another error. While running python main.py this error appears: line 17, in set_api_key
      os.environ["ELEVEN_API_KEY"] = api_key
      ~~~~~~~~~~^^^^^^^^^^^^^^^^^^
      File "", line 684, in __setitem__
      File "", line 744, in check_str
      TypeError: str expected, not NoneType

    • @alexandresajus
      @alexandresajus 6 months ago

      @tomasrochaakemi This means that Python has tried to find a .env file with ELEVEN_API_KEY but has not found either the file or the key in the file. You'll need to create a .env file at the same level as main.py containing ELEVENLABS_API_KEY=[your-API-key]
      Please follow the Requirements and the How to Install Step 3 of my repository ( github.com/AlexandreSajus/JARVIS ). I mention these steps at 4:06 and 6:06 of the video.
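
      For reference, the keys are picked up roughly like this, assuming the repo's python-dotenv setup (the key names must match the README exactly):

      # .env, placed next to main.py:
      #   ELEVENLABS_API_KEY=your-key
      #   OPENAI_API_KEY=your-key
      #   DEEPGRAM_API_KEY=your-key
      import os
      from dotenv import load_dotenv

      load_dotenv()  # reads .env from the directory you run main.py from

      if os.getenv("ELEVENLABS_API_KEY") is None:
          # This is exactly the situation behind the "str expected, not NoneType" error above
          raise RuntimeError("ELEVENLABS_API_KEY missing: check the .env location and key names")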

    • @tomasrochaakemi
      @tomasrochaakemi 6 months ago +1

      @@alexandresajus I did it still shows this

    • @alexandresajus
      @alexandresajus 6 months ago

      @tomasrochaakemi Hmmm, weird issue. As a workaround, just replace the 3 lines of os.getenv("...") with the API key as a string. For example:
      OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") -> OPENAI_API_KEY = "YOUR-API-KEY"

  • @Jordan-tr3fn
    @Jordan-tr3fn 5 months ago +1

    hey cool vids, why not use OpenAI for transcription instead of Deepgram? you could stream the audio and not have audio files

    • @alexandresajus
      @alexandresajus 5 months ago +1

      This is indeed probably a better approach. I was not aware of it at the time

    • @tismine
      @tismine 2 months ago

      Are you sure OpenAI supports streamed audio input? I looked around all the places no one was able to do that...

    • @Jordan-tr3fn
      @Jordan-tr3fn 2 months ago

      @@tismine « openai stream audio » on Google …

  • @niyatibalsara9409
    @niyatibalsara9409 4 months ago

    im encountering webrtcvad installation error..please let me know what to do..its urgent.. i need it for my project

    • @niyatibalsara9409
      @niyatibalsara9409 4 months ago

      @alexandresajus

    • @alexandresajus
      @alexandresajus 4 months ago +1

      Please refer to this fix, let me know if it works:
      github.com/AlexandreSajus/JARVIS/issues/3

    • @niyatibalsara9409
      @niyatibalsara9409 4 months ago

      PS C:\Users\HP\Desktop\JARVIS2> & c:/Users/HP/Desktop/JARVIS2/myvenv/Scripts/python.exe c:/Users/HP/Desktop/JARVIS2/JARVIS/main.py
      Traceback (most recent call last):
      File "c:\Users\HP\Desktop\JARVIS2\JARVIS\main.py", line 8, in
      from dotenv import load_dotenv
      ModuleNotFoundError: No module named 'dotenv'
      PS C:\Users\HP\Desktop\JARVIS2> pip install python-dotenv
      Collecting python-dotenv
      Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
      Installing collected packages: python-dotenv
      Successfully installed python-dotenv-1.0.1
      [notice] A new release of pip available: 22.3.1 -> 24.0
      [notice] To update, run: python.exe -m pip install --upgrade pip
      PS C:\Users\HP\Desktop\JARVIS2> .\venv\Scripts\Activate
      (venv) PS C:\Users\HP\Desktop\JARVIS2> python JARVIS\main.py
      pygame 2.5.2 (SDL 2.28.3, Python 3.11.2)
      Hello from the pygame community. www.pygame.org/contribute.html
      Traceback (most recent call last):
      File "C:\Users\HP\Desktop\JARVIS2\JARVIS\main.py", line 13, in
      import elevenlabs
      File "C:\Users\HP\Desktop\JARVIS2\venv\Lib\site-packages\elevenlabs\__init__.py", line 2, in
      from .simple import * # noqa F403
      ^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\HP\Desktop\JARVIS2\venv\Lib\site-packages\elevenlabs\simple.py", line 113, in
      elevenlabs.set_api_key(os.getenv("ELEVENLABS_API_KEY"))
      ^^^^^^^^^^^^^^^^^^^^^^
      AttributeError: partially initialized module 'elevenlabs' has no attribute 'set_api_key' (most likely due to a circular import)
      Please solve this error.. its urgent not working.. please help

  • @ezzeldinhany7301
    @ezzeldinhany7301 4 months ago

    hi alex, it says "no module named 'deepgram'" after running python main.py in the terminal, what should I do?

    • @ezzeldinhany7301
      @ezzeldinhany7301 4 months ago

      i also tried pip install deepgram and it did not work

    • @alexandresajus
      @alexandresajus 4 months ago

      @ezzeldinhany7301 Using the same terminal where you ran "python main.py", run "pip list" and check if deepgram is properly installed. I suggest you reinstall the requirements into a clean environment for this. Let me know if this works.

    • @ezzeldinhany7301
      @ezzeldinhany7301 4 months ago

      @@alexandresajus i did reinstall requirements during the process of trying to solve this problem

    • @alexandresajus
      @alexandresajus 4 months ago

      @@ezzeldinhany7301 Did the terminal say that deepgram was successfully installed? Can you check with "pip list" if deepgram is installed? Can you check if you are running main.py from the environment where you installed deepgram? Once again, I strongly recommend creating a fresh Python environment using venv and installing the requirements there and checking everything above

    • @ezzeldinhany7301
      @ezzeldinhany7301 4 months ago

      @alexandresajus I have now fixed the deepgram issue, but it says it cannot download rhasspysilence; I tried with pip also

  • @felipemartinez1924
    @felipemartinez1924 5 months ago +1

    How do I change the speech recognition to spanish? Btw amazing work!

    • @alexandresajus
      @alexandresajus 5 months ago +1

      Thanks! I have not tried another language but there does seem to be the option in Deepgram's API to transcribe Spanish voice by using their nova-2 model and adding the parameter "language=es" to the query
      developers.deepgram.com/docs/language
      developers.deepgram.com/docs/models-languages-overview

    • @felipemartinez1924
      @felipemartinez1924 5 months ago +1

      @@alexandresajus Thanks, you're amazing! You should do a series of this kind of videos, maybe a Jarvis like this one but that is able to take action like opening a program, or saving reminders, stuff like that. Thank you very much and looking forward to more videos. :)

    • @jan-peterbornsen8506
      @jan-peterbornsen8506 4 months ago

      @felipemartinez1924 Hey, were you able to change the language of Deepgram's API? I want to change it to German but all my attempts have failed so far... I tried just adding language=de but it's not helping in any way...

  • @ibrahimqadirmustafa
    @ibrahimqadirmustafa 6 months ago +1

    Amazing bro, I want to create something like this but in the Kurdish language. Do you know how I can make it understand and speak Kurdish?

    • @alexandresajus
      @alexandresajus 6 months ago

      Thanks! Unfortunately this might be harder to do in Kurdish. You need to find services that support the Kurdish language which are quite rare: both Deepgram and Elevenlabs do not support Kurdish currently. I'd guess that OpenAI does support Kurdish but I am not sure, even if it does not you can use a service to do the English-Kurdish translation in the middle of the pipeline.

    • @ibrahimqadirmustafa
      @ibrahimqadirmustafa 6 months ago +1

      @alexandresajus
      Can I use the Google Translate package in Python to translate the response content from the AI?

    • @alexandresajus
      @alexandresajus 6 months ago

      @@ibrahimqadirmustafa Yes this would solve part of the problem

    • @ibrahimqadirmustafa
      @ibrahimqadirmustafa 6 months ago

      @@alexandresajus ok thanks for you if i need help i can contact u 😁

  • @NotZymsYT
    @NotZymsYT 2 months ago

    can anyone help me? I keep getting "ERROR: Failed building wheel for pyarrow"

    • @alexandresajus
      @alexandresajus 2 months ago +1

      Switch to Python 3.8 to 3.11. The Taipy version I am using is old and does not support Python 3.12. You can also try changing to taipy==3.1.0 in requirements.txt
      github.com/AlexandreSajus/JARVIS/issues/7

    • @NotZymsYT
      @NotZymsYT 2 months ago +1

      @alexandresajus you are awesome thank you so much !!!!

    • @NotZymsYT
      @NotZymsYT 2 months ago

      @alexandresajus hey, sorry to be a pest; the original issue is fixed, but now it seems like the api_key variable obtained from os.getenv("ELEVENLABS_API_KEY") is None, and the set_api_key function from the elevenlabs module is trying to set this None value as the value of the ELEVEN_API_KEY environment variable. However, environment variables must be strings, so attempting to assign None raises a TypeError. I'm really new to all this and any help is super appreciated

    • @alexandresajus
      @alexandresajus 2 months ago

      @NotZymsYT os.getenv("ELEVENLABS_API_KEY") should not return None. Please make sure you properly do step 3 of the installation as described at 6:04: make sure you have a .env file at the same level as main.py and make sure it is filled with the API keys using the syntax described in the README

    • @NotZymsYT
      @NotZymsYT 2 months ago

      @@alexandresajus i ran through the whole video on extra slow and now its giving me Traceback (most recent call last):
      File "main.py", line 59, in
      file_name: Union[Union[str, bytes, PathLike[str], PathLike[bytes]], int]
      TypeError: 'ABCMeta' object is not subscriptable

  • @_GIGABYTES
    @_GIGABYTES 6 months ago

    Traceback (most recent call last):
    File "F:\va\New folder (3)\JARVIS\display.py", line 5, in
    from taipy.gui import Gui, State, invoke_callback, get_state_id
    ModuleNotFoundError: No module named 'taipy'

    • @alexandresajus
      @alexandresajus 6 months ago

      Are you sure you installed the requirements of the project (5:33)?

    • @Threecommaaclub
      @Threecommaaclub 5 months ago +1

      hey, I'm not sure if you're still running into this issue, however I was able to solve this dilemma by creating a virtual environment as stated in the video. Try creating a virtual environment, and if you need help there is another video on YouTube that should solve that issue.

  • @AdeniranFrancis
    @AdeniranFrancis 12 days ago

    whenever i see videos like these, i clone the repos and i am never, ever able to successfully install all the dependencies or requirements.txt. makes me want to give up writing code altogether.

  • @GreggHoush
    @GreggHoush 6 months ago +7

    You should disable those API keys and blur API keys in videos like these. Everybody wants free API keys.

    • @alexandresajus
      @alexandresajus 6 months ago +2

      Good advice. I disabled these keys right after recording and they all have a hard rate limit

  • @blazzycrafter
    @blazzycrafter 5 months ago +2

    YOU STOLE MY WORK?........
    ......
    ......
    .....
    .....
    ......
    HOW THE HEK DID IT WORK?
    XD

  • @ashrafulislamemon8782
    @ashrafulislamemon8782 20 days ago

    I am stuck at git clone

  • @tchen8124
    @tchen8124 6 months ago

    What’s the point of using elevenlabs? Without carefully finetuning, the voice sounds robotic anyway. Kinda a waste of money

    • @alexandresajus
      @alexandresajus 6 months ago

      What do you suggest I use? I looked for fast TTS AI services and stumbled upon Elevenlabs and did not ask too many questions. The whole point was trying to recreate Jarvis from Iron Man which has a robotic voice. It cost me a dollar for 30,000 characters

    • @kyouko5363
      @kyouko5363 6 months ago

      ​@@alexandresajus I'm tempted to make a suggestion here but.. if it gets too popular I might not be able to use it anymore. I can't afford API keys, and rely on it every day to ingest documentation and large pieces of text without interrupting my programming. Even made a private Neovim plugin for it.. as for LLMs.. I am *this* close to saying to hell with it and writing a daemon or local webserver or something that'll instruct Selenium to forward queries and responses on a headless Chromium instance. I'm tired of there being no free API keys for LLMs, not even rate limited ones, when the browser experience is free to begin with, but the moment I want to see the text in my terminal and respond in my terminal, it suddenly costs money, despite me technically having reduced their server load by skipping all the unnecessary CSS, HTML and JS every time I want to just send and receive a goddamned string? I *thought* ChatGPT had a free rate limited API key, and conveniently around the time it became part of my workflow, the API credits equivalent of a free trial runs out, almost as if to give you a cake and then take it right back after the first bite. I'm rambling. But hey, at least I've got good TTS for free.

  • @PHG_Team
    @PHG_Team 6 months ago

    bruh
    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for pyarrow
    Failed to build pyarrow
    ERROR: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects

    • @alexandresajus
      @alexandresajus 6 months ago

      This is probably due to a Python version issue: you are probably using Python 3.12 and this project uses Taipy which only supports Python 3.8 to 3.11. Please try using another Python version. If this does not help, do not hesitate to give more details on the issue here: github.com/AlexandreSajus/JARVIS/issues

    • @PHG_Team
      @PHG_Team 6 months ago +1

      @alexandresajus thx bro. If I delete display.py does the assistant still work? I want to create my own GUI

    • @alexandresajus
      @alexandresajus 6 months ago

      @PHG_Team Yes, you can delete display.py; both programs are independent.

    • @PHG_Team
      @PHG_Team 6 months ago

      I'm Italian and I want to change the speaking language, how can I do that? @alexandresajus

  • @Mirkolinori
    @Mirkolinori 22 days ago

    Good idea, but ElevenLabs is too expensive; the price is more than horrible for live TTS… you're better off using the built-in OpenAI TTS. Also, you can use the OpenAI API for Whisper, the assistant GPT and TTS… all quick, cheap and easy

  • @n00ter99
    @n00ter99 6 months ago +1

    That latency is painful

    • @alexandresajus
      @alexandresajus 6 months ago +2

      Agreed, unfortunately that latency is very hard to shave off. We could probably reduce it a bit by hosting locally, using quantized/smaller models and streaming the data instead of doing each task sequentially

    • @chrsl3
      @chrsl3 6 months ago +1

      it works so wonderfully, i wouldn't be bothered at all by the small latency.

    • @n00ter99
      @n00ter99 6 months ago +1

      @@alexandresajus Measure the latencies of the things you mentioned - you'll find that implementing streaming all the way across the stack will solve most of it. I have spent the last year building low latency streaming models in order to get sub 100-millisecond latencies for various audio/speech startups, it's the only way to get speeds and responsiveness that feels natural

    • @alexandresajus
      @alexandresajus 6 months ago +1

      ​@@n00ter99 I did profiling on each task and we are at about 1s for transcribing, 1s for gpt and 2s for generating audio. Really? Where can I find how to do this? What models/services were you using?