These whispers, filler words, and emphasis on words are really important. That's how I instantly know it's AI. It just sounds like someone reading an audiobook, pretty much: very 'scripted/acted' and very non-casual, with pacing that is too consistent.
What about other languages?
I have the same question.
That'd be great to get more data on, but I only know English and some-ish Japanese. Haven't come across leaderboards for multilingual models, but technically these models can be trained to be very good in other languages
@Jarods_Journey Is there any reference for training this model? I tried to train some models, but it ended with some errors.
"Pass names in the prompt to prevent misspellings" is cool as hell! I did not know Whisper can do that!
Ollama deepseek-r1 32b is on par with o1 mini. Great video! Looking forward to you testing some of the top TTS models.
You can continue to talk about the updates to your project. I find it to be fascinating.
How do I add other languages, like Indonesian? Can it import from other TTS engines like Applio?
Out of the box, F5 TTS supports English and Chinese only. Other languages can be trained, though it's an integrated process.
Thank you Jarod.
Have you tried DeepSeek?
I have. Does it do TTS or voice cloning too?
In the process of trying it this weekend :)!
Thanks for the video, those tools are interesting
How can I train a foreign language (e.g., a traditional or cultural language from Indonesia) into a TTS system? Can you provide suggestions on the best models and training methods?
I have Kokoro and F5 TTS integrated into my chatbot. Kokoro is super fast and, I think, slightly higher quality, but it can't clone audio like F5. Unfortunately, F5 isn't good with laughter or "umm"-type sounds, and it takes at least 2-5 seconds to generate the first audio, so it's not good for interactive chat.
Yoo sick llm stuff
How about Hailuo in that contest?
It would cost, but what about comparing cloned voices?
Can you make a tutorial on how to fully configure coqui-ai-TTS and create a d_vector for voices in it?
Like your videos!
First example:
B sounds more realistic, like a real human, but it sounds like someone who is reading an audiobook and "acts" angry, yet does not come across as someone really angry (in an audiobook that is fine or even preferred, but it does not seem like an actual angry/annoyed person).
5:25 Man I don't know if you drank something, but A definitely sounded better
agreed
Can you make a tutorial on how to make RVC models?
I have one from a long time ago, dunno if I'll be able to get around to it again anytime soon
llasa