Raspberry Pi | Local TTS | High Quality | Faster Realtime with Piper TTS
- Published on 29 Jun 2024
- Tutorial showing you how to use Piper as a locally running, high-quality TTS service in less than 5 minutes! It's faster than realtime on a Raspberry Pi 3!!!
Works standalone or as a Home Assistant integration.
Check out my step-by-step tutorial on cloning your own voice locally with Piper TTS.
* • Create your AI digital...
#raspberry #raspberrypi #tts #texttospeech #privacy #performance #quality
* github.com/rhasspy/piper
* github.com/synesthesiam
00:00 Intro
00:50 README of Piper
01:53 Installation of Piper and TTS Voice model
03:20 Testing Piper
Please subscribe to my channel 😊.
th-cam.com/users/ThorstenMue...
---
- www.Thorsten-Voice.de
- github.com/thorstenMueller/Th... - Science & Technology
🎯 Key Takeaways for quick navigation:
00:51 Piper is a fast, locally running neural text-to-speech system optimized for Raspberry Pi 4.
01:20 Piper supports integration with Home Assistant, a popular smart home software, allowing for voice control.
02:02 To set up Piper, download and unzip the Piper executable, then download an international text-to-speech model of your choice.
03:21 Using Piper is straightforward; input text is taken from standard input, piped to the Piper process along with the selected model, and the synthesized output is saved.
04:10 Piper generates output audio faster than real-time, with a real-time factor (RTF) value less than one, showcasing its efficiency on small compute devices like the Raspberry Pi.
Made with HARPA AI
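The real-time factor mentioned at 04:10 can be checked by hand. The following is a minimal sketch, assuming the usual definition (time spent synthesizing divided by the duration of the produced audio); the function names are made up for illustration and are not part of Piper.

```python
import wave


def wav_duration_seconds(path):
    """Return the playback length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the resulting audio.

    A value below 1.0 means the audio was generated faster than it plays.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds
```

For example, if synthesis took 0.5 seconds and the resulting WAV file plays for 2 seconds, the RTF is 0.25.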
Thank you 😊. I'm not sure, but did you do this "Harpa AI" magic on another of my videos, too? This is really helpful, so the comment is pinned.
Fantastic video. Learned a great deal about Linux and this tool. Thank you for posting it.
Thanks a lot for your nice feedback and glad you find it useful 😊.
Thank you man! I've tried a bunch of free TTS engines for my raspi project: espeak, flite, pyttsx3, and some others; they all sound robotic and unnatural to me. Piper TTS is just so good and it's surprisingly fast on the raspi 4. One thing to note: the voice model downloads don't come with the json file now, so you have to grab it yourself.
Thanks for your nice comment 😊. Hasn't it always been two downloads - onnx model and json config file?
oh dear down another rabbit hole i go
I'm sorry 😆
This comment made me laugh so hard. Glad I'm not the only one! :)
Thank you for posting the video. I am in the process of building a new robot and wanted a better quality TTS engine/voice for my WaLi: Wall follower Looking for Intelligence robot. I chose arctic-medium at 50% higher sample rate. (Current robots use espeak-ng voices.) Loving rhasspy/piper-tts!
You're welcome, i'm glad that you can use piper-tts for your robot 😊.
That was nice.. Thanks
You're welcome 😊.
great !
Thanks 😊.
This looks good! Could you make a tutorial on how to do the voice cloning/learning for Piper?
Thanks 😊 and yes - this will be one of my next video tutorial topics.
@@ThorstenMueller Glad to hear! Subscribed.
@@DoubleBob Thank you and welcome 😊.
Hi DoubleBob, just to keep you updated. The Piper TTS voice clone tutorial is now online 😊: th-cam.com/video/b_we_jma220/w-d-xo.html
@@ThorstenMueller Very cool! Thanks for notifying me.
Thanks a lot! I think it will be useful on my slot car project. Any advice on how to obtain as fast as possible real time response (less than 1 second)?
Most Piper TTS models can be used faster than real time and there's a stream feature available or coming next 😊.
Hey @ThorstenMueller nice video, the voice is indeed quite decent, do you by any chance know or can point to a tutorial on how to configure speech dispatcher with piper?
Thanks for your nice feedback. I'm not sure what you mean by "speech dispatcher"?
Hey @@ThorstenMueller, to be honest, I'm not quite sure yet, I'm fairly new to the TTS world but from what I have found, it seems to be a Linux only thing, like a wrapper that allows to use any TTS engine in a uniform way for all the other programs, most of the programs use this to synthesize audio from text
Thanks for a good introduction video.
Is there a way to make the speaker pause after saying a word? For instance, if I want the speaker to give me a list of items: You need: A rope, (pause), Scissors, (pause), Paper, (pause), and a flashlight.
Thanks for your nice feedback 😊. I'm not sure if this works in Piper, but this should work in Mimic 3 by Mycroft AI (same developer as Piper). Mimic 3 supports SSML syntax. I've created a tutorial about Mimic 3, maybe it's useful for you. th-cam.com/video/bCZlS6I84Go/w-d-xo.html
@@ThorstenMueller Thanks, I'll have a look at it!
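One possible workaround for the pause question, independent of SSML support, is to synthesize each list item into its own WAV file and join the files with silence in between. The sketch below only does the joining, using Python's standard `wave` module; the function name is hypothetical, and it assumes all clips share the same format and use signed PCM samples (where zero bytes are silence), which would need to be verified for the voice in use.

```python
import wave


def join_wavs_with_pause(wav_paths, out_path, pause_seconds=0.5):
    """Concatenate WAV files, inserting silence between consecutive clips."""
    params = None
    chunks = []
    for path in wav_paths:
        with wave.open(path, "rb") as clip:
            current = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            if params is None:
                params = current
            elif current != params:
                raise ValueError(f"{path} has a different audio format")
            chunks.append(clip.readframes(clip.getnframes()))
    if params is None:
        raise ValueError("no input files given")

    channels, sample_width, rate = params
    # Assumption: signed PCM, so zero bytes represent silence.
    silence = b"\x00" * (int(rate * pause_seconds) * channels * sample_width)

    with wave.open(out_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(rate)
        # The silence is placed only between clips, not before or after.
        out.writeframes(silence.join(chunks))
```

With one file per item ("A rope", "Scissors", "Paper", ...), the joined output would contain a half-second gap after each item.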
Very useful video, thanks a lot! I wonder if there is something like this for Windows?
Yes, Piper TTS runs fine on Windows, too 😊.
I made a video on how to set it up too. Do you know this?
th-cam.com/video/GGvdq3giiTQ/w-d-xo.htmlsi=iZGkYcOY2FjqdVGn
Would you consider a video using Piper Recording Studio to create your own voice recordings? I tried it but got stuck. Fortunately Mike took my recordings as a donation and they may be available in the next release. But I would still like to see the full process. I am running Ubuntu 20.04.
Thanks for the videos. I find them very helpful.
Joe
Thanks for your voice contribution 👏. I've added Piper-Recording-Studio to my TODO list. In the meantime, do you know Mimic-Recording-Studio?
th-cam.com/video/4YT8WZT_x48/w-d-xo.html
I second this 🙌🏻
Also, how to use other datasets for piper? 1150 sentences is a big time commitment and i already have previous transcribed datasets I'd like to use if possible.
I'm waiting for the detailed process of the training. I just can't wait, Mr. Thorsten.
Would you say it comes close to the quality of Coqui TTS? Coqui is good, but it takes a long time to initialize. For short sentences that always have to be regenerated, that's rather bad.
What about the German voice?
Thanks ^^
It mostly depends on the models. Some Coqui TTS models are probably better than Piper models, some the other way around. But quality is subjective. I think the longer synthesis time of most Coqui TTS models speaks for higher quality.
Piper supports multiple German models, including mine 😉.
Piper TTS is a very good TTS for the pipeline I am developing, but I want to be able to edit the source code so it doesn't print out messages every time it runs. Can you do a tutorial where you build it from source?
I've never tried building it manually myself, but as it's open source there should be a way to set up a build pipeline locally. But maybe it's worth opening an "issue" on the project to add a switch to turn that message on/off.
Does this work for a chatbot application? If it’s so small I wonder if you built an app could you offload the processing onto the smart phone or tablet. 🤔
Piper TTS models perform well, so that could work for a chatbot, depending on the performance of the computer you're running Piper TTS on. As of now, I guess Piper cannot run on an Android or iOS device.
Really need this to run on android
I agree, this would really be amazing. Maybe you can join this discussion: github.com/rhasspy/piper/issues/103
Hello, would you be able to provide instructions on installing Piper TTS, and guiding through the process of training a voice using available data on Windows, please?
Thanks for your feedback and topic suggestion. Right now Piper is not officially supported on Windows. But maybe i can make a tutorial for Docker or WSL. What do you think?
@@ThorstenMueller That’s great, I’m really looking forward to your step-by-step guide with Docker. Hope you will complete the video soon, thank you.
Hello Thorsten,
thanks for the great video. I'm currently trying to install Piper in my Home Assistant installation via the add-on store. Unfortunately I always get the error message: This add-on is not compatible with your device's processor or operating system. I'm using a Raspberry 4 with 2 GB of RAM. It really should be compatible. The only thing I've noticed is that I only have 32 bit. Can you maybe help me with that?
Glad you like the video 😊. I can't judge whether the problem is due to a 32-bit version. I just searched the Piper issues for your problem and found a post about a special 32-bit version. I haven't looked at it in detail, but maybe it already helps you a bit.
github.com/rhasspy/piper/issues/67#issuecomment-1593594543
@@ThorstenMueller Hello Thorsten,
thanks for your quick reply and your help. But I can't install the 32-bit version you sent me as an add-on for HA, can I? Sorry, I'm not that experienced with these things. Can I install that version "manually" and then connect it to HA? Or do you know whether I can "simply" get the Raspberry onto 64 bit?
@@fred1459 Good question, I haven't needed the 32-bit variant as an HA add-on yet. Maybe you can ask the question in the HA or Rhasspy/Piper community - they can surely help you better than I currently can 😊.
Would cloning make this more expressive or just copy the voice characteristics?
I'm not sure if I understand your question correctly. If you clone your voice (e.g. with Piper TTS) it will try to copy your voice as you pre-recorded it in your audio voice dataset. It will not add any emotions or expressions. Is this what you meant?
Btw. this is my tutorial on voice cloning with Piper TTS: th-cam.com/video/b_we_jma220/w-d-xo.html
@@ThorstenMueller I have been trying to train Tortoise but to no avail. I thought maybe this would be a good second option. I am just looking for a better cadence or prosody, so it does not sound so robotic. I tried to implement the steps you provided in your new tutorial on training Piper but I keep getting errors. I will take another stab at it next week. Thank you for all your work on these tutorials they have helped me.
Hello, I'm currently trying to get Piper running on a Raspberry Pi Zero. When I run it I get machine errors or segmentation faults. Doesn't it run on the Zero? I'm using the Piper armv7 build.
Hello, that's a good question. I'm not completely sure and don't have a Pi Zero for testing. But I think I once read that it doesn't work on it.
Do you think this will work on RPi Zero? What do you think the performance will be?
From what I've read, the RPi Zero is not supported because of its architecture, but as I don't have a Pi Zero I cannot try it myself.
Is it possible to run a Piper server locally, so I can make its voices available to be used with other programs, such as TTSVoiceWizard?
Sorry if this is a double comment, I think my other one got deleted because of the link. Thanks in advance for any help.
That's a good question. I'll ask your question (Piper TTS server process) during my interview with Mike, so stay tuned for this video to (hopefully) get an answer 😊.
Awesome, thank you so very much! @@ThorstenMueller
hy i installed piper tts and i don t know after you end to run in terminal llama AI to speech ?
I'm not sure if i understand your question right. Do you mean using Piper TTS with Llama LLM or PrivateGPT? Do you have a timecode in that video which brings up your question?
How to fix the Mac install problem: ERROR: Could not find a version that satisfies the requirement piper-phonemize (from versions: none)
I guess I've seen your issue on their GitHub repo ;-). Have you already seen this issue, which has some possibly helpful ideas?
github.com/rhasspy/piper/issues/27
@@ThorstenMueller Thanks, I ran it successfully on Ubuntu.
Absolutely terrific! Not only your video, but especially Piper.
It's super easy to include in Python code. (For me it took a while to find out how..)
To my ear, the thorsten-medium voice runs a little bit too fast (word to word); I'd still like to know how to slow it down.
Thanks for your feedback :-). Maybe you can post process the output to reduce speed with ffmpeg or pydub. Might this be an option? Btw. i'm training a thorsten-high model at the moment.
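The ffmpeg post-processing idea could look roughly like this. It is a sketch, assuming `ffmpeg` is installed and on the PATH; it uses ffmpeg's `atempo` audio filter, which changes tempo without changing pitch. The function names and the 0.5 to 2.0 range check are illustrative choices, not something Piper prescribes.

```python
import subprocess


def ffmpeg_tempo_command(input_wav, output_wav, tempo):
    """Build an ffmpeg command that changes speaking tempo but keeps the pitch.

    tempo < 1.0 slows the audio down, tempo > 1.0 speeds it up.
    """
    if not 0.5 <= tempo <= 2.0:
        raise ValueError("tempo is restricted to 0.5 .. 2.0 in this sketch")
    return [
        "ffmpeg", "-y",              # -y: overwrite the output file if it exists
        "-i", str(input_wav),
        "-filter:a", f"atempo={tempo}",
        str(output_wav),
    ]


def change_tempo(input_wav, output_wav, tempo=0.9):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(ffmpeg_tempo_command(input_wav, output_wav, tempo), check=True)
```

Calling `change_tempo("welcome.wav", "welcome_slow.wav", 0.9)` would produce a copy that plays at 90% of the original speed.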
Hey andreas! I am really struggling with including Piper in Python, actually. I can run the executable with no problem, but I would like it to be incorporated in my Python program so that the loading only happens once (upon startup) and the model stays in memory. From your experience, is this indeed possible? If so, any chance of a short explanation? Thanks a ton :)
Uri
"it took a while to find out how" Please post a link to an example. My robot would like this. If the delay is too long, I'm thinking to search a folder of past TTS and only call Piper if the phrase has not already been generated
@@urishmueli284 th-cam.com/video/2qCTx6OFs90/w-d-xo.html
@@cyclicalobsessive th-cam.com/video/2qCTx6OFs90/w-d-xo.html
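The caching idea from this thread (search a folder of past TTS output and only call Piper for phrases that have not been generated yet) can be sketched as follows. All names here are hypothetical; `synthesize` stands for whatever function actually calls Piper and writes the WAV file.

```python
import hashlib
from pathlib import Path


def cached_wav_path(cache_dir, text, voice):
    """Map a (voice, text) pair to a stable file name inside the cache folder."""
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{key}.wav"


def speak_cached(text, voice, cache_dir, synthesize):
    """Return the WAV path for `text`, synthesizing only on a cache miss.

    `synthesize(text, out_path)` is a caller-supplied function that writes
    the audio for `text` to `out_path`.
    """
    out_path = cached_wav_path(cache_dir, text, voice)
    if not out_path.exists():
        out_path.parent.mkdir(parents=True, exist_ok=True)
        synthesize(text, out_path)
    return out_path
```

Hashing the voice name together with the text keeps phrases spoken by different voice models from overwriting each other.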
Hello! How are you. I need to train a model with a Spanish accent from Argentina (I have good GPUs) any clue on how to achieve it? Thank you genius.
Thanks - I'm doing fine, hope you are too 😊. Do you have access to a usable voice dataset in Spanish with an Argentinian accent?
@@ThorstenMueller Yes, I have! I've trained many successful models with so-vits-svc, but that's a different approach. I work with between 15 and 45 minutes of samples to get good results. I'm looking for a way to make a high-quality text-to-speech voice in Argentinian Spanish. What is the procedure? How long should the samples be? Thank you.
@@b1ll1on_ai I guess i can't answer that. When you have good results with 15-45 minutes of training data maybe this might be a working value for an Argentinian model too.
I followed your steps exactly and it did not run. I get "piper is a directory" (which it is), so I moved everything into piper/piper and got a different error. I even made the piper file executable but then got a further error. Very frustrating, but thank you for posting this.
Can you run the piper executable with the "--help" command-line argument? Does this show the Piper help output?
@@ThorstenMueller Yes, I can run ./piper --help from within the piper directory and the help text displays. The error indicates there is a problem with the JSON file.
@@justinanthony8858 A syntax error in JSON file or a wrong value for a specific key?
@@ThorstenMueller I'm not sure, I'll have to check when I get a chance, but I'll raise the issue on the git pages, thanks.
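To tell a JSON syntax error apart from a wrong value, the voice's config file can be parsed with Python's standard `json` module. This is a small sketch with a made-up function name; it only checks syntax and says nothing about which keys or values Piper expects.

```python
import json


def json_syntax_error(path):
    """Return None if the file parses as JSON, else a short error description."""
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None
```

If this returns None for the `.onnx.json` file, the file is at least well-formed, and the problem is more likely a wrong file or an unexpected value.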
How can we make a TTS read a pdf book for us?
That's a frequently asked question and requested feature 😊. PDF input is not supported for now. Maybe you can save the PDF as text and then split it into smaller chunks that can be synthesized. But for now that's more of a workaround. It may be a good idea to discuss this in the Piper TTS GitHub community - maybe a feature for it will be implemented in the future.
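The "split it into smaller chunks" step of this workaround could be sketched like this, assuming the PDF has already been saved as plain text. The function name, the 500-character default, and the simple sentence-boundary regex are illustrative assumptions.

```python
import re


def split_into_chunks(text, max_chars=500):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized into its own audio file, one after another.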
How do I use this in python instead of local command line?
Piper TTS Python integration is not optimal yet - I've talked to Mike about this. This question/topic will be part of my interview with him (creator of Piper TTS).
how to use it in python code?
I've talked with Mike Hansen about that feature recently and there's still work-in-progress when it comes to native python usage. But you can run it as extra process and use the results this way. Does this help you?
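Running Piper as an extra process from Python might look like the sketch below. It mirrors the shell pipeline used elsewhere in these comments (text on standard input, `--model`, `--config` and `--output_file` as arguments); the function names are hypothetical and the paths depend on where Piper and the voice files were unpacked.

```python
import subprocess


def piper_command(piper_binary, model_path, output_wav, config_path=None):
    """Build the argument list for one Piper invocation."""
    cmd = [str(piper_binary), "--model", str(model_path),
           "--output_file", str(output_wav)]
    if config_path is not None:
        cmd += ["--config", str(config_path)]
    return cmd


def synthesize(text, piper_binary, model_path, output_wav, config_path=None):
    """Send `text` to Piper on standard input and wait for the WAV file.

    Raises CalledProcessError if the Piper process exits with an error.
    """
    subprocess.run(
        piper_command(piper_binary, model_path, output_wav, config_path),
        input=text.encode("utf-8"),
        check=True,
    )
```

Note that this starts a new process for every call, so the model is presumably loaded each time; it does not keep the model in memory between calls.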
When will de_DE-thorsten-medium come to Home Assistant? Can this version be downloaded somewhere? I can listen to it, but haven't found a download anywhere.
Hi, I'm in contact with Michael Hansen and want to train a Piper "Thorsten (high)" model soon. After that I can look into getting it included in Home Assistant.
@@ThorstenMueller Ooh! That's perfect. I'm looking forward to it. Definitely the best German voice so far 👏🏻😍
I was very impressed when I heard it for the first time. And that was already on "low".
@@redrox7657 I'm very glad to hear that 😊. Just to keep you up to date: the Thorsten-Voice "high" model training is currently running 🙂.
Hello Thorsten, I followed all the steps in the video. After running the last command, I got an unexpected message:
-bash: ./piper: Is a directory
I looked again: the downloaded folder is called "piper", and it contains the binary file "piper". That means the required file is located at ~/piper/piper/piper.
The structure is a bit confusing.
And there is a naming error with the voice "en_GB-alba-medium" (not related to this video, but maybe you can tell Michael Hansen?):
The .onnx file is named "en_GB-alba-medium.onnx", but the other two are named "en_en_GB_alba_medium_MODEL_CARD" and "en_en_GB_alba_medium_en_GB-alba-medium.onnx.json".
That doesn't look quite right.
By the way, in my case the test command should now look like this:
echo 'Welcome to the world of speech synthesis!' | ./piper/piper --model en_GB-alba-medium.onnx --config en_GB-alba-medium.onnx.json --output_file welcome.wav
Yes, I've also fallen for the nested folder structure and the name "piper" 😆. I just looked at the Hugging Face download of the Piper voices, and the file names look right to me there, or am I looking in the wrong place?
huggingface.co/rhasspy/piper-voices/tree/v1.0.0/en/en_GB/alba/medium
@@ThorstenMueller The names on the website are correct. My PC changed the names automatically. I don't know why that happens. :\
@@alexanderyang126 That's strange. With special characters or spaces in the file name I could have imagined your computer automatically adjusting (escaping) the paths, but that shouldn't really be the case with these file names.