🔥Create Audiobooks with Kokoro - th-cam.com/video/gAX6wItqrtE/w-d-xo.htmlsi=MnM9RTGrIwbicJ9j
🔥Run Kokoro on Google Colab - th-cam.com/video/gAX6wItqrtE/w-d-xo.htmlsi=MnM9RTGrIwbicJ9j
🔥Install Kokoro Locally - th-cam.com/video/xOa8LgZopq0/w-d-xo.htmlsi=nb2ROt8LLySlwE5T
Hello, by any chance, do you know of any recently released multilingual TTS models? For English, I've seen that there are many very good ones, but for other languages I only know of XTTS.
F5-TTS/E2 TTS is one. Fish is good, though I'm not sure how many languages it supports. And then of course EdgeTTS supports tons of languages.
Good response, and yes, we have already covered those models on the channel. There are a few others too, which you can search for here. Thanks.
CosyVoice 2 is good too.
Supported Language: Chinese, English, Japanese, Korean, Chinese dialects (Cantonese, Sichuanese, Shanghainese, Tianjinese, Wuhanese, etc.)
Crosslingual & Mixlingual: Support zero-shot voice cloning for cross-lingual and code-switching scenarios.
Thanks for all the responses, everyone. I tested out the models and found that Fish was the best one for Portuguese. Its tone quality and voice cloning capabilities are really impressive for an open-source model - much better than XTTS and F5, for example.
Fish Speech v1.5
It looks good for such a small model. I'm going to try this on my CPU.
Indeed
They say in the Model Card: Supported Languages: American English, British English
That's true.
Greetings from Puerto Rico. I love your videos, thank you very much for them, and no, the Spanish accent is not good, haha.
Thanks for the kind words and the feedback on the Spanish.
Is it possible to generate audio for a large amount of text at once with this model? Currently it only gives me 30 seconds of audio from the large text I provide 😅
Yes, I just did a video on it. The link is in the comments.
@fahdmirza ok 👌🏻
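For anyone hitting the same 30-second cutoff: a common workaround is to split the text into sentence-sized chunks and synthesize them one at a time. Below is a minimal sketch of that idea, assuming the cutoff comes from a per-call input limit (the thread does not confirm the cause). The names split_into_chunks, narrate, and the synthesize callable are hypothetical placeholders, not part of Kokoro or any other library; synthesize stands in for whatever TTS call you are actually using, and the 400-character limit is an arbitrary illustrative value.

```python
import re


def split_into_chunks(text, max_chars=400):
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole, so that one
    chunk can exceed the limit.
    """
    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def narrate(text, synthesize, max_chars=400):
    """Call the (hypothetical) synthesize(chunk) per chunk, keeping order."""
    return [synthesize(chunk) for chunk in split_into_chunks(text, max_chars)]
```

The per-chunk audio returned by narrate still has to be joined into one file afterwards; how to do that depends on the audio format your TTS call returns.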
Can I create custom voices? And how can I run the model in a smartphone app?
You can have custom voices, and I have done other videos on mobile phone support.
Can you make a tutorial on how to install mateogon/pdf-narrator? It is based on Kokoro TTS and converts PDFs to speech.
Sure, just did.
How can I use this locally with the same GUI as on their demo page?
I just did a video about it.
I always comment.
appreciated.
I tried it, but it did not work. Some issue with the torch version, I guess.
Any errors?
Kokoro in Nexa for the next video ;)
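When reporting a suspected version mismatch like this, it helps to post the installed versions along with the error text. A small sketch for collecting them is below, using only the Python standard library. The function name report_versions is hypothetical, and the package names "torch" and "kokoro" are assumptions about what is installed in your environment; adjust them as needed.

```python
import platform
from importlib import metadata


def report_versions(packages=("torch", "kokoro")):
    """Return a dict mapping each name to its installed version, or None."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```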
sure
How do I install it on Windows?
Use WSL.
Lol, even I can't speak Chinese that well!
lol
Hi sir, can you give a list of commands for a Windows/Mac environment instead of a Linux one?
Same commands, as long as you have the dependencies met.