Local Setup Tutorial - th-cam.com/video/LVINm5vUSW8/w-d-xo.html
Can't anybody publish it on a website?
Just FYI because you said you assume nobody is - but I am watching this video from LA :) and thank you - keep up the great work my friend
ah man! hope you and your family are safe!
Ditto 😅
A heartfelt thank you from Vietnam! I was completely captivated by your positive and engaging voice; it’s like how you were drawn to the AI voice of Kokoro TTS. Maybe it’s something about the hormones, haha! It’s such a coincidence that I’m also a Mac user, and it’s rare to find YouTubers in this field who provide tutorials for macOS. I’ve already subscribed to your channel and am eagerly waiting for your guide on setting up Kokoro locally. Honestly, I haven’t been able to fine-tune much with the F5-TTS-MLX model voices, so Kokoro TTS might be the perfect alternative. Once again, thank you so much!
Very glad to hear this!
Yes please, I can't wait for the next video about doing this locally and also how to host it on our cloud. Thank you!
done sir - th-cam.com/video/LVINm5vUSW8/w-d-xo.html
Are there any free models available for training our own voice?
Coqui tts
Abdul having the " Her" moment 😂
@ 10:30. Be careful bhai!
On a serious note, I liked the comparison, and it looks like quite a promising TTS.
ScarJo is going to sue me
This is great! Been looking for something on the level of Eleven Labs. Appreciate your videos, also learned of Sarvam AI from you
Glad to know that. You're working with Indian languages?
@1littlecoder yeah, working on a multilingual app for small businesses, and focusing on a few big countries including India
@1littlecoder no Hindi voice?
Kokoro sounds good and interesting. Please make a video for local installation
Is 80M params enough? 🤔
Would love to have a model this good and free in Spanish. BTW, you can use an LLM to check the text and put guardrails on these types of models to avoid abuse.
SF bay area in the house 💯
Subscribed! Please do a tutorial on how to install locally.
here you go! - th-cam.com/video/LVINm5vUSW8/w-d-xo.html
👍 Good TTS comparison of Kokoro vs ElevenLabs
thank you!
I use Balacoon TTS because it runs in 1/50 real time on a single CPU core (for their en_us_hifi92_light_cpu.addon model). It sounds a little artificial but still perfectly intelligible, and the low load never gets in the way, unlike any of the alternatives that sound anywhere near as good.
Balacoon is the best CPU TTS I've used. The speed is quite amazing, but yes, the voice is just like an average TTS.
@1littlecoder have you seen the voice changer apps and cloning service Balacoon offers? They can clone from a 10 second sample. I don't know whether cloned voice production is CPU or GPU though, or its speed vs real time.
Are there prompts for moods, intonations, etc.?
Yet to figure out!
Brothers, does anyone know how to add pauses or SSML tags to this model?
Hey bro, I've been following you for the last couple of months. Can you tell me how to find out about new AI products and tools?
How much GPU space do I need to run it?
Can we make use of live chat with Groq?
Sir, when you do the tutorial, please also include how to use StyleTTS on a mobile phone.
You should add dialog to the test. Like: and then he whispered, "hi, what's up." ElevenLabs will try to interpret it as dialog, but I have no idea about this new one.
"IDK, maybe like hormones" 😂😂
BTW, I like F5-TTS the best of the open source ones out there.
Do you know of any way to add emotion at least to some extent in post processing?
Generally there would be tags, but I haven't tested with this yet
Not specifically Kokoro. I meant outside of Kokoro. Is there anything like that where you can process the generated audio and add emotion?
@figs3284 Yes, several AI services let you take generated audio and add emotion to it. Murf AI, ElevenLabs, and LOVO AI are prominent options that let you control pitch, emphasis, and other vocal elements to convey different emotions.
@markcasey8465 thanks for your reply. I should have been more specific. The appeal of Kokoro is the fact that it's free and runs on anything. I was more so talking about a library or some open source project or tool to add the emotion.
Can it do STS, too?
It is not an STT model.
@jsalsman I said STS. There aren't enough tools that do speech2speech, which I think is a very valuable modality
@justtiredthings STS would imply it can do STT, no?
If they fix the timbre/emotions this will give ElevenLabs a run for their money. Either that or I’ll have to wait for a Chinese copycat 😂
I wish :D
Bro, could you explain the difference between open source and open access models? It's kind of confusing.
Doesn't do cloning
Please do a full local tutorial
you need to say 2-4-5 so it reads it like that
Bro, please make a video on how to run it locally.
thank you
Phi 4 came out. And nobody is talking about it.
Please make a full tutorial of this.
Great to see open source improvements; however, it is not great: still an electric, robotic timbre, tinniness, and an unnatural cadence to sentences, like every other word might be the end of the sentence. Lots of work still to do. And it is entangled by the espeak dependency.
this video should be narrated by this voice tool...
Doesn't it have Spanish? :(
cool
Unless it's open source, as in it lets you train your own model, it's useless. The creator won't release the code. Don't bother.
@bomar920 why is it useless? Enlighten me.
@1littlecoder One major limitation is that you can't use or train on your own data. Instead, we have to rely on pre-existing voices, which isn't good for production.
*No Arabic* 😢
Hormones! 😂😂
Nah Runaway text to speech is better,
and weight gg is better that text to speech
It's a shame that this model has 2 different accents for the third most spoken language in the world and voices for other less spoken languages, but NONE for the second most spoken language in the world.
@paelnever which one is that?
@1littlecoder If you really don't know, you'll get the answer faster by searching Google or Wikipedia, or even by asking an LLM.
@paelnever I asked what language you were referring to. Instead of telling me to look it up on Wikipedia or Google, you could have just answered with the language in one word, but you didn't. Great! Thank you for teaching me that I should Google or use Wikipedia. I didn't know that before. Feeling enlightened.
@1littlecoder Of course, I guess you already know how to search for things; I was just pointing out your laziness.
Maybe hormones ? 😂
Too bad it cannot clone your own voice yet!
French generation and voice are OK
Hormones 😂😂😂