Piper TTS (Nearly realtime TTS, and demo of a UI work in progress)

Natlamir

3,948 views

Published Jun 29, 2024
Piper Github: github.com/rhasspy/piper
TH-cam for Thorsten-Voice: • TEXT TO SPEECH | Piper...
0:00 Intro
0:40 Install
1:59 Usage
2:55 UI Demo
Science & Technology

Comments • 28

@synesthesiam 5 months ago ⁺¹³
Piper author here, nice video! Just a quick note: most of Piper's runtime is often spent just loading the voice model into memory. You can avoid doing this multiple times if you keep the Piper process running and feed lines into its stdin. It will print the path of each WAV file it generates on stdout. I usually use "--output_dir" to control the directory. It's also possible to use "--json_input" to send lines of JSON into Piper instead, with "text", "output_file", and (optionally) "speaker" fields (for multi-speaker models).
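The workflow described in that comment (one long-running Piper process, one JSON request per line on stdin, output paths read back from stdout) could be scripted roughly as follows. This is a minimal Python sketch, not Piper's own API: the class and function names are hypothetical, the flag spellings are taken from the comment (check `piper --help` for your build), and it assumes Piper prints exactly one output path per request on stdout, as the comment describes.

```python
import json
import subprocess
from typing import Optional


def build_request(text: str, output_file: Optional[str] = None,
                  speaker: Optional[str] = None) -> str:
    """Serialize one utterance as a single JSON line for Piper's JSON input mode.

    Field names ("text", "output_file", "speaker") are the ones named in the comment.
    """
    request = {"text": text}
    if output_file is not None:
        request["output_file"] = output_file
    if speaker is not None:
        # Only meaningful for multi-speaker models.
        request["speaker"] = speaker
    # json.dumps escapes embedded newlines, so each request stays on exactly one line.
    return json.dumps(request) + "\n"


class PiperSession:
    """Hypothetical wrapper that keeps one Piper process (and its loaded model) alive."""

    def __init__(self, piper_exe: str, model_path: str, output_dir: str):
        # Assumed flags: --model, --output_dir, --json_input (verify with `piper --help`).
        self.proc = subprocess.Popen(
            [piper_exe, "--model", model_path,
             "--output_dir", output_dir, "--json_input"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered, since we talk to Piper one line at a time
        )

    def speak(self, text: str, output_file: Optional[str] = None,
              speaker: Optional[str] = None) -> str:
        """Send one request and return the WAV path Piper reports on stdout."""
        assert self.proc.stdin is not None and self.proc.stdout is not None
        self.proc.stdin.write(build_request(text, output_file, speaker))
        self.proc.stdin.flush()
        # Blocks until Piper prints the path of the file it just wrote.
        return self.proc.stdout.readline().strip()

    def close(self) -> None:
        """Close stdin so Piper exits, then wait for it."""
        if self.proc.stdin is not None:
            self.proc.stdin.close()
        self.proc.wait()
```

For example, create `PiperSession("./piper", "voice.onnx", "out")` once and then call `speak("Hello there.")` for each line; the executable and model file names are placeholders. The model is loaded only once, when the process starts, which is the saving the comment describes.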
@LargoatMcGoat 5 months ago
This is great news. I've been making an app for a friend to use when streaming, so this method would improve the performance.
It would be good if we could get a library, though, to do this rather than calling an external app 😁
@Ravisidharthan 3 months ago ⁺¹
Hey man, thanks for your work.
Can you make an offline version for Apple Metal? It would be so helpful.
@LucidFirAI 3 months ago
Hi, Piper author. Piper is insanely fast and has very good output. It would be great if there were an easier way to train custom voices for it, as I'm currently following Natlamir's guide to no avail. Thanks.
@synesthesiam 3 months ago ⁺¹
@Ravisidharthan You're welcome! I believe the ONNX Runtime that Piper uses for machine learning has a CoreML backend.
@SosyalMedyaArge-so5bs 7 months ago
Much more meaningful and practical than turtle TTS and the Python programming language's notorious version and package issues. Thanks, really.
@ocin3055 7 months ago ⁺²
Good new voice!
@Natlamir 7 months ago
Thanks 🙏
@Hugozen 7 months ago ⁺¹
How I miss your old voice :) great work.
@Natlamir 7 months ago
Haha, thanks. Me too. I would like to find a way to create a trained ONNX model of that voice, since I have the audio clips for the dataset. If I could create the ONNX model and JSON config for it and have it generate with that voice in less than half a second, that would be awesome.
@Hugozen 7 months ago
@Natlamir RVC is currently my favorite due to its fast model training speed. Too bad it can't generate a voice from text directly. Good to see you, our hero, still exploring. Looking forward to your new related videos.
@Natlamir 7 months ago
@Hugozen 🙏 Thanks!
@tahasoft1 3 days ago
Is it possible to compile it to WebAssembly so it can be used in a Chrome extension for read-aloud (text to speech) with offline voices?
@gkhndnc 7 months ago
Man, you are amazing. Whatever I need, you show it in the simplest form, like someone who says "come to me, I have everything here". I am building a virtual assistant for my own company and I was looking for free real-time Turkish voice-over. I will definitely use this. Now, those who call my company will talk to artificial intelligence with this voice. You are super!!!! The only thing missing is real-time lip sync. I'm looking forward to this too; I'm sure you'll find it.
@Natlamir 7 months ago
Thanks. Real-time lip sync would be interesting. I was thinking of feeding the generated audio into wav2lip, but wav2lip takes around 30 seconds for a short clip, so it would be the bottleneck for real-time lip sync.
@Mehdi0montahw 7 months ago ⁺¹
You really did it. Several languages. You care about the requests of your followers. Thank you for your humility. It looks interesting. I will try it quickly in Arabic and French and give you an honest opinion.
@Natlamir 7 months ago ⁺¹
Thank you! 🙏
@zakuro8532 months ago
Is there a way to generate TTS in real time from a streaming text input?
@guile3d 7 months ago
Thank you for the video @nathlamir!
@Natlamir 7 months ago
🙏
@DucNguyen-99 7 months ago
Can you use the LLaVA model to read a video file?
For example, extract every single frame from the video, then use LLaVA to read them, then use a TTS model to speak what the video is about?
@Natlamir 7 months ago ⁺¹
Yes, that is technically possible. I imagine the part that would take the longest is getting the result from LLaVA for each frame; I think it can take a while to return results. Store all of the text responses and feed them to a local LLM, asking it to summarize and to exclude or combine similar outputs, so that consecutive frames that look the same don't produce duplicate descriptions. Then the output from the LLM runs through Piper and we get the final result. Sounds interesting.
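In the pipeline described in that reply, a local LLM merges similar frame descriptions. As a cheaper pre-filter (a swapped-in heuristic, not something proposed in the thread), consecutive captions that are nearly identical could be dropped before the LLM step, so fewer tokens reach the summarizer. A minimal sketch using Python's standard `difflib`; the function name and the 0.9 similarity threshold are hypothetical choices, not values from the video.

```python
from difflib import SequenceMatcher
from typing import List


def drop_near_duplicates(captions: List[str], threshold: float = 0.9) -> List[str]:
    """Drop per-frame captions that are nearly identical to the last caption kept.

    Hypothetical pre-filter: similarity is SequenceMatcher's ratio (0.0 to 1.0)
    on lowercased text, and `threshold` is an illustrative default.
    """
    kept: List[str] = []
    for caption in captions:
        text = caption.strip()
        if not text:
            continue  # skip empty responses
        if kept:
            ratio = SequenceMatcher(None, kept[-1].lower(), text.lower()).ratio()
            if ratio >= threshold:
                continue  # looks like the previous frame's description; skip it
        kept.append(text)
    return kept
```

The surviving captions would then go to the local LLM for summarizing, and its output to Piper, as the reply describes. Comparing only against the last kept caption keeps the filter linear in the number of frames, at the cost of missing repeats that are not adjacent; those are left for the LLM step.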
@gourcuff72 5 months ago
I get an "API is missing" error.
@gourcuff72 5 months ago
api-ms-win-core-heap-l2-1-0.dll
@DzulkifleeTaib 7 months ago
What happened to your voice lmao 😅
@Natlamir 7 months ago
Haha, I am going to see what else is available with Piper and other workflows. I really do like Piper's speed.
@Mehdi0montahw 7 months ago ⁺¹
It worked for me for both languages: Arabic is average and French is excellent. I am impatiently waiting for your UI version. Where can I find other voices for all languages and how can I make the voices trained in RVC WebUI work with it? Thank you.
@Natlamir 7 months ago ⁺²
I just finished uploading the video and the code / exe for the UI. The different languages available in the Hugging Face repo linked on the Piper site should be accessible from Hugging Face or through the Languages dropdown in the UI. That is a good question about getting RVC-trained voices to work with it; I would like that too. I am doing research and will let you know what I find.