How to use
- Published on Jan 7, 2025
- I've used the #SpeechRecognition Python library extensively in many of the projects on my channel, but I will need an offline speech recognition library for future projects. So in this video, I'll be showing you how to install #vosk, the offline speech recognition library for Python.
If you're on Windows, download the appropriate #pyaudio .whl file here prior to pip installing vosk: www.lfd.uci.ed...
You can download the model you need here: alphacephei.co...
Tip Jar:
Bitcoin: 1AkfvhGPvTXMnun4mx9D6afBXw5237jF9W
focken hell, this video was buffering in the background and suddenly started playing while i was coding, thought my language model had come alive for a few seconds there lmao
LOL, I don't think my voice would be a great language output.
Thank you so much. This is the only source that works perfectly with no holding back or alteration of material. You are great. You saved my day.
Yay! I'm glad it helped!
Great Tutorial! I used it to add vosk to my own JARVIS system.
I don't know if others mentioned it, but to fix the problem with the mic not being able to be shared you can change 1 line of code in your example.
change:
data = stream.read(4096)
to
data = stream.read(4096, exception_on_overflow = False)
I can talk to my assistant and have OBS recording from the same microphone at the same time.
Awesome. Thanks for the contribution too. I will need that for sure.
I've wanted to do this project ever since the IBM activa commercial way back in like 1990 :D looks like the tech is finally getting close!
Me too. Well circa 1999/2000, but then I abandoned my programming dreams for nearly two decades before starting again.
Very cool, brother!!! I know you speak English, but translating is easy. I loved the video. I'm just starting out in Python and was having trouble converting speech to text, mainly because it has to be "offline", which is what I need! Thank you very much!!!
It was an awesome tutorial and exactly what I was looking for, thanks very much.
Thank you for the positive feedback!
Very glad vosk is getting some more attention. I think it's very underrated.
I'm using the speech_recognition library for my digital assistant, but I'm going to replace it with vosk going forward.
This is just what I've been looking for. Thanks brandon🖒
I didn't see this notification one year ago, but just wanted to say I'm glad it helped and I hope you've done some great projects since then.
The output is a JSON string; that's why it's displayed like this. Use the json module to extract the text from it.
@@Zaddish2 can you show me please how to write the voice text in a .txt file like every spoken word in a line
File "C:\Python\Python37\lib\site-packages\vosk\__init__.py", line 138, in __init__
self._handle = _c.vosk_recognizer_new(args[0]._handle, args[1])
AttributeError: 'str' object has no attribute '_handle'
Got the vosk and pyaudio installed without any problems, however when I try to run the script as written in the video it's unable to locate the model location as it tosses the above error at me.
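For anyone hitting the traceback above: `args[0]` is the first argument given to `KaldiRecognizer`, so a likely cause is passing the model path string straight to `KaldiRecognizer` instead of a `Model` object. A minimal sketch, assuming the model zip was extracted to a folder you pass in; `check_model_dir` and `build_recognizer` are hypothetical helper names, not part of vosk.

```python
import os


def check_model_dir(model_dir: str) -> str:
    """Return the absolute model path, or fail early with a readable error."""
    if not os.path.isdir(model_dir):
        raise FileNotFoundError(f"Vosk model folder not found: {model_dir!r}")
    return os.path.abspath(model_dir)


def build_recognizer(model_dir: str, sample_rate: int = 16000):
    """Create a recognizer from a model folder (hypothetical helper)."""
    # Imported here so the path check above works even before vosk is installed.
    from vosk import Model, KaldiRecognizer

    model = Model(check_model_dir(model_dir))   # load the model from disk first
    return KaldiRecognizer(model, sample_rate)  # pass the Model object, not the path
```

Checking the folder first also gives a clearer message than a bare "Failed to create a model" when the path is wrong.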
11:38
Better way is to do
Line 20: text = json.loads(text)
Line 21: print(text['text'])
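Since a few people asked about saving the text to a file: here is a minimal sketch that combines the `json.loads` approach above with appending each utterance to a .txt file, one per line. The helper names and the sample string are made up for illustration; the sample is only shaped like what `recognizer.Result()` returns.

```python
import json


def extract_text(result_json: str) -> str:
    """Return the recognized text from a Vosk result string ('' if absent)."""
    return json.loads(result_json).get("text", "")


def append_line(path: str, text: str) -> None:
    """Append one utterance per line, skipping empty results."""
    if text:
        with open(path, "a", encoding="utf-8") as f:
            f.write(text + "\n")


# Hand-written string shaped like recognizer.Result() output.
sample = '{"text": "turn on the lights"}'
print(extract_text(sample))
```

In the microphone loop you would call `append_line("transcript.txt", extract_text(recognizer.Result()))` each time `AcceptWaveform` returns True; the file name is just an example.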
Awesome. I was looking for documentation for vosk online and there wasn't much... and the few others were either Medium articles or Stack Overflow... but the most solid source I've seen is this video... I just wish it showed how to write it to a file, and the key interrupt that you were talking about. Sorry for the rant, but awesome video.
I'm self taught so the lack of information on Vosk is what inspired me to create the video. I'm glad it helped.
@@BrandonJacobson thanks im self teaching as well
Awesome tutorial! Exactly what I needed.
Awesome! I'm glad it helped.
4:06 Can't i just copy and paste the file itself instead of screwing with command prompt and all the other?
Also, pyaudio is now installable on Python 3.10 on Windows with pip. I just tried it for giggles, and it worked :D
7:28 This isn't quite correct, yes it does store that string in that variable, which is the absolute path.
The error mentioned at 7:34 isn't because you're not giving the absolute path, the reason why it may give an error, is because there may be an invalid character after the \ character
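To illustrate the backslash point, here is a small sketch showing how an ordinary string literal can silently change a Windows path, and three ways around it. The `D:\vosk\new-model` folder is a made-up example path.

```python
from pathlib import PureWindowsPath

# In a normal string literal, "\v" and "\n" are escape sequences
# (vertical tab and newline), so this is NOT the path it looks like.
broken = "D:\vosk\new-model"

# Three ways to keep the backslashes literal:
raw = r"D:\vosk\new-model"        # raw string
escaped = "D:\\vosk\\new-model"   # doubled backslashes
forward = "D:/vosk/new-model"     # forward slashes also work on Windows

print(repr(broken))               # shows the escape characters that crept in
print(raw == escaped)             # True
print(PureWindowsPath(forward) == PureWindowsPath(raw))
```

Any of the three safe forms can be handed to `Model(...)`, and the same applies to a model stored on another drive.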
With vosk, can you manually interrupt the recording process? Meaning, let's say I press a button to take input from vosk, can I interrupt that process and turn off recording whenever I want?
Is there a way to take the vosk model and add words that it can recognize?
It works very well for French model thank you !
Excellent tutorial, Thank You so much!
Awesome! I'm glad it helped!
This is very useful. Thank you :)
Awesome! I'm glad it helped. I'm releasing how to do it on a Raspberry Pi today.
I use Vosk with Kdenlive, Spanish and English. Have not found a Norwegian model. The first two languages work smoothly for me.
What kind of project are you using it for?
@@BrandonJacobson Just editing videos, but I have family in Argentina, the UK and I am living in Norway and learning the language.
I'm getting an error when trying to create a Model: "Exception: Failed to create a model"... Do you know how to fix? Thanks.
Thank you for this valuable information. It's helping me with a project for a homeless shelter. Unfortunately, I have run into a problem. I've made an "if" statement to perform an action if a hotword/phrase is heard by the mic, but I can't figure out how to interrupt the action if I say another hotword/phrase. I'd like to be able to say "stop" aloud and have it interrupt the action performed by a previous phrase that was said in the "if" statement. Do you have any advice for me on how I can figure this out? Thank you for any help and for the help already!
I haven't experimented with this yet, but it's on my list to figure out. You can try threading: create a function whose sole purpose is to listen for a STOP command and interrupt whatever is going on.
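One possible shape for the threading idea, as a rough sketch rather than a tested assistant design: the long-running action runs in its own thread and checks a `threading.Event` between steps, while the listening code sets that event when it hears "stop". The phrases are simulated here with plain strings; in a real program they would come from the recognizer. All names below are made up for illustration.

```python
import threading
import time

stop_event = threading.Event()
completed = []  # records how many steps each action finished


def long_action(steps: int = 200) -> None:
    """Stand-in for whatever a hotword triggers; checks for 'stop' each step."""
    done = 0
    for _ in range(steps):
        if stop_event.is_set():
            break
        time.sleep(0.01)  # pretend to do a slice of work
        done += 1
    completed.append(done)


def handle_phrase(phrase: str):
    """React to a recognized phrase; returns the worker thread if one started."""
    if phrase == "stop":
        stop_event.set()
        return None
    if phrase == "start":
        stop_event.clear()
        worker = threading.Thread(target=long_action, daemon=True)
        worker.start()
        return worker
    return None


worker = handle_phrase("start")  # simulated hotword
time.sleep(0.05)
handle_phrase("stop")            # simulated "stop" heard while the action runs
worker.join()
print(completed)
```

The key design choice is that the action has to cooperate by checking the event; Python threads cannot be killed from the outside.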
@@BrandonJacobson Thank you again! I will try studying your advice and continue to work on it.
Thanks for the tutorial, it's really helpful XD
Awesome. I'm glad it helped.
Can I save the model on my D drive and run it from there with the absolute path?
You decided to go from scratch like "Programming Hero" guy did? Well, it is fun up to some extent. This is where you actually need:
1) a good modularity (like to be able to replace say STT engine of Vosk with Whisper / Whisper.cpp or DeepSpeech or AprilASR), or replace TTS
2) good architecture, ideally multi threaded one, so you can interrupt whatever else your assistant will be doing
3) wake word detection, ideally with some sophisticated model, which is trained to hear only that wake word and nothing else, but does that well and very efficiently
4) dialog management and skills
You can code all that yourself, that's not a problem, but does it make sense to reinvent the wheel? I'd say it makes a lot more sense to take something like Rhasspy or Mycroft and start from that.
It is a dictionary so you can do print(text["text"]) and it should cleanly display voice commands
This is great. super simple and effective. I am trying to do an NBA play-by-play (speech to text) app. It gets a lot of NBA players names and the "actions" (e.g. Rebound, Assist, Jump Shot, Dunk) correct.
But that said, it doesn't get many names. So I was wondering if you knew if VOSK can train custom models? If not, what would give an OFFLINE inference for a custom model? Any recommendations?
Thanks in advance!
I was just about to warn you about navigating through the diverse names in the NBA. According to the Github you can create your own custom model. Seems a bit beyond my knowledge level, but I hope it helps: github.com/matteo-39/vosk-build-model
thank you bro, i can now start my project
Awesome. Good luck!
Didn't realize that this would solve my biggest nemesis which has been installing PyAudio. Omg thank you!
Awesome! It opens the path to so many more projects.
thank you so much, I have been working on my voice assistant but i couldn't figure a way of working things around when am offline. I had issues using pocket sphinx.
I heard a lot of people were having trouble using pocket sphinx, that's why I tried Vosk. I'm going to try and put it on a Raspberry Pi soon too.
This tutorial is just GRAND. I have written a bot that goes and download various videos from reddit, and then goes and makes compilation videos and inserts my pre recorded intro outro and midrole clips. For the one channel about pets there is nothing more to do, but I am working on another one for dashcam clips which tend to have a lot of curses and wanted a way to programmatically find and bleep them out.
There are pre made tools for this but they are overly complex and heavy.
I think what I will do now that you have given me a basic understanding of how to instantiate vosk and fire an audio stream through it, is after my compilation is done do some "post processing" and just split the audio off with ffmpeg, fire it through vosk and when it finds a word I want to filter, notate the timestamp in a list, and then at each index in the list, throw my bleep audio clip in there, and then ffmpeg the new audio back onto the video and call it a day.
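A rough sketch of the "notate the timestamp in a list" step described above. It assumes per-word timing is enabled on the recognizer (as far as I know, via `recognizer.SetWords(True)`), in which case each result carries a `"result"` list with `word`, `start`, and `end` fields. The helper name and the sample data are invented for illustration.

```python
import json


def find_word_spans(result_json: str, banned) -> list:
    """Return (start, end) times in seconds for each recognized banned word."""
    words = json.loads(result_json).get("result", [])
    banned_lower = {w.lower() for w in banned}
    return [
        (w["start"], w["end"])
        for w in words
        if w["word"].lower() in banned_lower
    ]


# Hand-made sample shaped like a result with word timings enabled.
sample = json.dumps({
    "result": [
        {"conf": 1.0, "start": 0.5, "end": 0.9, "word": "what"},
        {"conf": 0.9, "start": 0.9, "end": 1.4, "word": "heck"},
    ],
    "text": "what heck",
})
print(find_word_spans(sample, {"heck"}))  # [(0.9, 1.4)]
```

The resulting list of spans is what you would feed to the ffmpeg step to overlay the bleep clip.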
Hello Brandon, I'm very impressed with this tutorial, and it made me curious to dig further into your books. The books are even more impressive and I'm looking forward to getting my hands/eyes on some of them. I am a retired technology hobbyist and have some questions about your books. Q1. Have you provided the code from the YouTube videos and books on GitHub? Q2. I am not very familiar with how the Kindle version works. Can I copy and paste the text/code from the Kindle book? Thanks for your kind reply.
Thanks for the details, Could you please help us understand how "elsa speaking app" pronunciation error identification works. What is the logic and code behind it.
I get a "cannot import name model" error from the "from ... import model" line. Could you tell me how to fix this, please?
14:-3 was really helpful..... 🙌🙌
Awesome! I'm glad to hear that!
There is an issue with vosk website. The site is not working. Is there any other way I can download the model?
Thank you for information sir 🙏
I'm glad it helped!
Hi Brandon, this is an awesome tutorial. Can you pass an audio file (say wav or mp3) instead of Mic input? I have some speech audio that I would like to convert to Text. I can play that and have it captured by Mic, but if I can just pass an audio file, that would be an awesome feature. Please help.
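On feeding a file instead of the mic: a minimal sketch, assuming a mono 16-bit PCM WAV file (an mp3 would first need converting, for example with ffmpeg) and a model folder you already downloaded. `iter_wav_chunks` and `transcribe_wav` are hypothetical helper names; the vosk calls are the same ones used for the microphone, just fed from the `wave` module instead of a stream.

```python
import json
import wave


def iter_wav_chunks(path: str, frames_per_chunk: int = 4000):
    """Yield raw audio bytes from a WAV file, a few thousand frames at a time."""
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data


def transcribe_wav(path: str, model_dir: str) -> str:
    """Run a WAV file through Vosk and return the joined text (sketch)."""
    # Imported here so the chunk reader above works without vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()

    recognizer = KaldiRecognizer(Model(model_dir), sample_rate)
    pieces = []
    for chunk in iter_wav_chunks(path):
        if recognizer.AcceptWaveform(chunk):  # True when an utterance is complete
            pieces.append(json.loads(recognizer.Result())["text"])
    pieces.append(json.loads(recognizer.FinalResult())["text"])  # flush the tail
    return " ".join(p for p in pieces if p)
```

Usage would look like `print(transcribe_wav("speech.wav", "model"))`, where both names are placeholders for your own file and model folder.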
@Alpha Group doesnt work
Error : initializer for ctype 'char *' must be a bytes or list or tuple, not str
Hello, how do I make a language model for vosk?
How do I select the 4th sound input device? That's my mic.
I found it:
stream = mic.open( ..... input_device_index=4, ....... )
For me it was index 4.
YEAH it worked very nice. Many thanks for this tut. ❤
Great tutorial!! thank you
Thanks for this video sir. I want to know do we need to repeat step of installing model in each run?... As it takes a lot of time.
The model runs once when you start the program and then it shouldn't after that. It only takes 3 seconds on my computer.
Tell us where to paste it in VS Code.
You r so cute teaching.
Thanks bro for this explanation
A great video but I still don't know what Model I need
I need Vosk so I can recognize speech in anime videos (I will convert the video to audio, of course) and then use an AI voice cloning technology to clone the voices in another language.
I am not a programmer. I know nothing about programming. My device is Win 7.
What do you advise me to do so I can set up Vosk to be 100% compatible with my PC, given that my PC is not that high end?
And Thank you in advance.
Similar to English, there's a small and large Japanese model you can find on this website: alphacephei.com/vosk/models.
You should be able to use my code but replace the model with the Japanese model and it should be able to understand most Japanese. I don't know the accuracy on the more "tonal" languages like Japanese or Chinese. I dabbled in Japanese, and I know that one wrong vowel pronunciation can have embarrassing results.
@@BrandonJacobson Thanks
For some odd reason I still cannot import PyAudio. Why is installing PyAudio such a bs overcomplicated issue?
Edit:
Finally figured it out. If you are still having issues downloading PyAudio follow these steps:
1. Open up Pycharm and create a new Python File.
2. Type in the following:
import os
print(os.system("cmd /k pip install pipwin"))
3. Run this script. This will install pipwin in the Project Folder.
4. Locate '(venv) [insert Project Folder Location]>' in the Run Module
5. Type in the following and press Enter:
pipwin install pyaudio
Hopefully this helps anyone that is stuck with the import pyaudio error like I was.
Thank you for contributing like this! It was one of my hopes for my YouTube channel.
@@BrandonJacobson no problem Brandon. Your video is very good, but I just wanted to help others that were struggling over that PyAudio import since it isn’t exactly intuitive.
How do I create a model with only 20 words, for example? I will pay for it.
plz make a video how to add Vosk in ai assistant
That's my plan in the future. I used speech_recognition in these videos for an ai assistant. th-cam.com/video/qhITM2q_FyQ/w-d-xo.html and the original here: th-cam.com/video/GYJKjHMBpDI/w-d-xo.html
Can I find this for the Serbian language?
How to avoid noise cancellation or multiple people speaking same time, how to identify unique voice !
I haven't used vosk in a while, but I think it has two methods to help with this by adjusting --max-alternatives and --word-penalty.
Does this work for mac?
can u please provide the code here??
Will it support other languages? I am trying to help my elderly mom, who speaks Ukrainian/Russian. I tried to have her speak into a Windows PC in Russian to transcribe this text and then translate it to English. Is there a tool for this? For some reason, Windows does not support Russian speech recognition. Yes, offline is a Big plus.
Yes, there are four Russian models: alphacephei.com/vosk/models
@@BrandonJacobson Do you have a walk-through? I am new to programming. What do I need? I have a Windows PC
Thank you so much.
I'm glad it helped!
@@BrandonJacobson lol only responding to comments that compliment you, grow a spine
@@stinger220 whoah, chill. YouTube is horrible at sending notifications. Luckily, I saw this one. It said you responded just now, but the comment says 1 day ago. Sorry it seems like I'm cherry-picking comments to respond to. I'll do better going forward.
offline ?
thank you
thank you very much bro i need it , crack
if everything works for me, then wait for the donation
instead of manually slicing the result from recognizer.Result(), this worked for me
said = self.recognizer.Result().splitlines()[1].split(sep=":")[1].strip().strip("\"")
anyways.. thanks for the video
I bet that's way faster as well. Thanks for the contribution.
People like you must spread the idea of personal AI, not corporate AI. Stop giving private data to corporations. Each person must use their own AI. GOOGLE AND OTHER COMPANIES! I DO NOT ALLOW YOU TO USE MY IDEAS!
Thank you