Really helpful video, thanks for making such informative videos. Great work 👍 👏
Thanks for the video, but the audio still feels AI-generated, with incorrect pauses. Is there any way to make this as flawless as English?
Could you please help me decide which TTS model(s) would fit faceless YT videos?
Would it be possible to show how to do a Slavic language model?
Hello NanoNomad,
Do I need to first train a HiFiGAN vocoder and then the Glow-TTS model with that vocoder, or is a vocoder not needed for Glow-TTS? I'm trying to train a model for a Slavic language.
Any suggestions would be appreciated. BTW, I'm new to this topic :)
Hi,
Sorry I missed your earlier comments. I'm not actively working on anything for this channel anymore. I don't have enough experience with Glow-TTS to give a good answer for that. I found VITS, Tortoise, YourTTS, and XTTS easier to work with and train, so I stuck with those.
A lot of the scripts and methods used in the videos here are probably very out of date now. Coqui TTS as a company/project is no longer in business. There is a community fork of the Coqui source code that is still being updated, but I haven't followed it closely. The community fork of Coqui does have XTTS fine-tuning support, but I don't think it has Slavic support out of the box.
For XTTS there is this project I found for training additional languages: github.com/anhnh2002/XTTSv2-Finetuning-for-New-Languages
How can we add a new language, so that we can clone a voice into that language using Coqui?
Can you please share the text prompt you gave to generate the audio you shared in the video? Was it in Latin or Devanagari script?
I think the text prompts are stored in the config.json. I just had to copy random sentences from an online learning document. I don't speak the language, so I have no idea what the sentences actually say. It was Devanagari though.
Hey, the overall audio sounds great, but when there is a number in the middle of the Hindi text we get muffled audio for the number part.
There are no text cleaners in Coqui TTS for Hindi at all. You need to look at the Coqui code and understand how the input is being handled. Numbers need to be written out in their spoken form until someone writes a proper text handler.
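As a stopgap for the number problem, here is a minimal sketch of a pre-processing step you could run on the text before passing it to the model. It is not part of Coqui TTS; the function name `spell_out_digits` is made up for this example. It also only reads numbers digit by digit, because proper Hindi number names are irregular and would need a real text handler.

```python
import re

# Hindi words for the digits 0-9. This gives a digit-by-digit reading only;
# it does NOT produce proper number names for multi-digit values.
HINDI_DIGITS = ["शून्य", "एक", "दो", "तीन", "चार", "पाँच", "छह", "सात", "आठ", "नौ"]

# Map Devanagari digits to ASCII digits so both forms are handled the same way.
DEVANAGARI_TO_ASCII = str.maketrans("०१२३४५६७८९", "0123456789")


def spell_out_digits(text: str) -> str:
    """Replace every run of digits with space-separated Hindi digit words."""
    text = text.translate(DEVANAGARI_TO_ASCII)

    def replace(match: re.Match) -> str:
        return " ".join(HINDI_DIGITS[int(d)] for d in match.group())

    return re.sub(r"[0-9]+", replace, text)


if __name__ == "__main__":
    print(spell_out_digits("मेरे पास 25 किताबें हैं"))
```

With this, "25" is read as "दो पाँच" (two five) rather than "पच्चीस" (twenty-five), which is crude but avoids feeding raw digits to a model that has no cleaner for them.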
Can I use this checkpoint with StyleTTS2 by pointing its checkpoint and config settings at these files? Also, what's the difference between config.json and config.yaml, and what would be the difference between, say, best_model_10759.pth and best_model_53795.pth?
StyleTTS2 is a different model architecture and is not compatible.
Great stuff!
Can we clone a voice using this?
Every voice is a clone, because you need to supply reference audio samples when doing inference. Fine-tuning just guides the model to be closer to the reference samples.
There are no text cleaners for Hindi in Coqui TTS, so every number needs to be spelled out, acronyms have to be avoided, etc. Someone probably needs to look at the punctuation handling in the text cleaner code for Hindi to make sure the pauses are being handled correctly.
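To illustrate the point about reference audio, here is a rough sketch of inference with reference clips through the Coqui Python API. It assumes an XTTS-style model; the stock model name, the file names, and the helper names `build_request` and `synthesize` are placeholders for this example, and a fine-tuned checkpoint would be loaded from its own files instead.

```python
def build_request(text, reference_wavs, language="hi", out_path="output.wav"):
    """Collect the keyword arguments passed to TTS.tts_to_file."""
    if not reference_wavs:
        # The model has no voice of its own here; it needs clips to imitate.
        raise ValueError("at least one reference audio clip is required")
    return {
        "text": text,
        "speaker_wav": list(reference_wavs),
        "language": language,
        "file_path": out_path,
    }


def synthesize(request, model_name="tts_models/multilingual/multi-dataset/xtts_v2"):
    """Run inference; the model name is an assumption, swap in your own."""
    # Imported lazily so build_request works without the TTS package installed.
    from TTS.api import TTS

    tts = TTS(model_name)
    return tts.tts_to_file(**request)


if __name__ == "__main__":
    # "reference_01.wav" is a hypothetical clip of the target speaker.
    request = build_request("नमस्ते, आप कैसे हैं?", ["reference_01.wav"])
    synthesize(request)
```

The reference clips are supplied on every call, whether or not the model was fine-tuned, which is why every generated voice is effectively a clone.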
The Hindi audio is indeed not good. English is really good.