Piper TTS (Nearly realtime TTS, and demo of a UI work in progress)
- Published on 29 Jun 2024
- Piper Github: github.com/rhasspy/piper
YouTube for Thorsten-Voice: • TEXT TO SPEECH | Piper...
0:00 Intro
0:40 Install
1:59 Usage
2:55 UI Demo
- Science & Technology
Piper author here, nice video! Just a quick note: the majority of the runtime for Piper is often just loading the voice model into memory. You can avoid doing this multiple times if you keep the Piper process running and feed lines into its stdin. It will print the path to each WAV file it generates on stdout. I usually use "--output_dir" to control the directory. It's also possible to use "--json_input" to send lines of JSON into Piper instead, with "text", "output_file", and (optionally) "speaker" fields (for multi-speaker models).
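A rough sketch of that approach in Python, for anyone who wants to try it: start Piper once, then send one JSON line per sentence so the voice model is only loaded a single time. The executable path, the model path, and the helper names (`build_request`, `start_piper`, `synthesize`) are made up for illustration, and the code assumes Piper answers each request with exactly one line on stdout holding the WAV path, as described in the comment above.

```python
import json
import subprocess


def build_request(text, output_file, speaker=None):
    """Build one JSON line using the fields named in the comment above."""
    request = {"text": text, "output_file": output_file}
    if speaker is not None:
        # Only meaningful for multi-speaker models.
        request["speaker"] = speaker
    return json.dumps(request)


def start_piper(piper_exe, model_path):
    """Start one long-lived Piper process (paths are placeholders)."""
    return subprocess.Popen(
        [piper_exe, "--model", model_path, "--json_input"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered, so each request is sent right away
    )


def synthesize(proc, text, output_file, speaker=None):
    """Send one request and read back the path Piper prints on stdout.

    Assumption: one stdout line per request, containing the WAV path.
    """
    proc.stdin.write(build_request(text, output_file, speaker) + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().strip()


# Example usage (needs a local Piper install and a downloaded voice):
#   proc = start_piper("./piper", "voice.onnx")
#   print(synthesize(proc, "Hello there.", "hello.wav"))
#   print(synthesize(proc, "Second line, no reload.", "second.wav"))
#   proc.stdin.close(); proc.wait()
```

The model load happens once in `start_piper`, so every later `synthesize` call only pays for the synthesis itself.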
This is great news. I've been making an app for a friend to use when streaming, so this method would improve the performance.
It would be good if we could get a Library though to do this rather than calling an external app 😁
Hey man thanks for your work..
Can you make an offline version for apple metal? It would be so helpful
Hi Piper author. Piper is insanely fast and has very good output. It would be great if there was an easier way to train custom voices for it, as I'm following Natlamir's guide currently to no avail. Thanks
@Ravisidharthan You're welcome! I believe the onnx runtime that Piper uses for machine learning has a CoreML backend.
Much more meaningful and practical than the turtle tts and python programming language's infamy in version and package issues. thanks really.
Good new Voice!
thanks 🙏
How I miss your old voice :) great work.
haha thanks. me too. i would like to find a way to create an onnx trained model of that voice as i have the audio clips for the dataset, and if i could create that onnx and json model for it, and have it generate with that voice in less than half a second, that would be awesome.
@Natlamir RVC is currently my fav due to its fast model training speed. Too sad it can't generate voice with text directly, good to see you, our hero, still exploring. Looking forward to your new related videos
@Hugozen 🙏 thanks!
Is it possible to compile it to webassembly to be able to use in a chrome extension for read aloud (text to speech) as offline voices?
man you are amazing. Whatever I need, you shout in the simplest form, like someone who says "come to me, I have everything here". I am building a virtual assistant for my own company and I was looking for real-time Turkish voice-over for free. I will definitely use this. Now, those who call my company will talk to artificial intelligence with this voice. You are super!!!! The only thing missing is realtime lip sync. I'm looking forward to this too, I'm sure you'll find it.
thanks. real time lip sync would be interesting, i was thinking of having the generated audio being input to wav2lip, but then wav2lip takes around 30 seconds for a short clip, so would be the bottleneck in being able to do a real time lip sync.
You really did it. Several languages. You care about the requests of your followers. Thank you for your humility. It looks interesting. I will try it quickly in Arabic and French and give you an honest opinion
thank you! 🙏
Is there a way to generate tts in real time during text input stream?
Thank you for the video @nathlamir!
🙏
Can you use the LLaVA model to read a video file?
Like extract every single frame from the video, then use LLaVA to read them. Then use a tts model to speak what the video is talking about?
yes, that is technically possible to do. i imagine the part that would take the longest would be the result from llava for each frame, i think it can take a while to return results. store all of the text responses and feed them to a local llm asking it to summarize and exclude / combine similar outputs to avoid duplication from 1 frame to the next that might look the same, then the output from llm runs through piper and we get the final result. sounds interesting.
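A loose sketch of how those steps could be wired together. Nothing here calls LLaVA, an LLM, or Piper directly: `describe_frame`, `summarize`, and `speak` are hypothetical callables you would supply yourself. The `drop_near_duplicates` helper is an extra pre-filter that was not part of the reply above (which leaves de-duplication to the LLM); it uses `difflib` from the standard library to skip captions that barely differ from the previous one, and the 0.9 threshold is an arbitrary guess.

```python
from difflib import SequenceMatcher


def drop_near_duplicates(captions, threshold=0.9):
    """Skip a caption when it is almost identical to the last one kept."""
    kept = []
    for caption in captions:
        if kept and SequenceMatcher(None, kept[-1], caption).ratio() >= threshold:
            continue  # consecutive frames that look the same
        kept.append(caption)
    return kept


def narrate_video(frames, describe_frame, summarize, speak):
    """frames -> captions -> summary -> speech.

    describe_frame: hypothetical wrapper around a vision model such as LLaVA
    summarize:      hypothetical wrapper around a local LLM
    speak:          hypothetical wrapper around a TTS engine such as Piper
    """
    captions = [describe_frame(frame) for frame in frames]
    summary = summarize(drop_near_duplicates(captions))
    return speak(summary)
```

Filtering before the LLM call mostly just keeps the prompt shorter; the captioning step per frame is still likely to be the slow part, as noted above.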
i get api is missing error
api-ms-win-core-heap-l2-1-0.dll
What happened to your voice lmao😅
haha i am going to see what else is available with piper and other workflows, i do really like the speed with piper.
It worked for me for both languages: Arabic is average and French is excellent. I am impatiently waiting for your UI version. Where can I find other voices for all languages and how can I make the voices trained in RVC WebUI work with it? Thank you.
i just finished uploading the video and code / exe for the UI. for the different languages that are available in the huggingface linked on the piper site, they should be accessible from huggingface or through the Languages dropdown in the UI. that is a good question about making RVC trained voices work with it, i too would like that. i am doing research and will let you know what i find.