🌀✨Hey! Quick favor... ✨🌀
If you found value in this video, a simple LIKE 👍 and hitting SUBSCRIBE 🛎 helps more than you know. It lets YouTube know we’re onto something good here and helps others discover it too. ⚛
I’d love to hear what you think! 💬 Drop your thoughts, insights, or questions in the comments below. 👇
Your support means everything. Let’s keep exploring the creative uses of AI together!
#BobDoyleMedia #LikeAndSubscribe #YourSupportMatters #ThankYou
Hey Bob. Great Channel. Thanks.
Try finding one that can do voices like Darth Vader. These things can't do voices like that yet.
Thank you, dude. You just made my day. I tried to use this through Pinokio, but my PC sucks and it took 25 minutes to generate a 4-word sentence. NOW I can run it online, and it takes just seconds. I have a 4-second audio clip of my departed father, and NOW I can finally make all the AI pictures I made of him talk. I just made my first one, and it is scary good.
Sorry Pinokio didn't work out for you as a solution. Yeah, the GPU definitely makes a difference.
@@BobDoyleMedia Pinokio does work for me with FaceFusion 3.0, though. Still a little slow, but it's tolerable.
How does it work on mac?
I don't have an NVIDIA graphics card; I have an AMD Radeon instead. Will that work?
How did you run it online?
The big question is if the community can improve it so it can reach its full potential. BTW I think the creators of the model said the cost wasn't astronomical.
6:09 You can see the Whisper transcription if you click the "Terminal" tab in the Pinokio sidebar. From there you can copy it and paste it into the Reference Text field in the UI if you want to use the same text multiple times and want to skip the transcription step (faster).
@@LiFancier thank you! Great tip.
@@BobDoyleMedia I've experimented, and it even works with effects.
In my opinion, the E2 model does the voice cloning more accurately.
The F5 model sometimes gives results that don’t really sound like the voice.
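Building on that tip: if you reuse the same reference clip a lot, the idea boils down to transcribing it once and caching the text. Here's a minimal Python sketch of that, assuming you supply your own `transcribe` function; the helper names and the `ref_text_cache.json` file are hypothetical and not part of Pinokio or F5-TTS.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """SHA-256 of the clip's bytes, so a re-recorded clip gets a new cache key."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def cached_reference_text(
    clip: Path,
    transcribe: Callable[[Path], str],
    cache_file: Path = Path("ref_text_cache.json"),  # hypothetical cache location
) -> str:
    """Return the reference text for `clip`, calling `transcribe` only on a cache miss."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    key = file_digest(clip)
    if key not in cache:
        cache[key] = transcribe(clip).strip()
        cache_file.write_text(json.dumps(cache, indent=2))
    return cache[key]
```

With the openai-whisper package, one possible `transcribe` is `lambda p: model.transcribe(str(p))["text"]` after `model = whisper.load_model("base")`; the cached string is what you'd paste into the Reference Text field.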
Great demo! Me too, I love this stuff! You did a great job explaining AND showing us how it works, and how to get started. Thanks Bob!
Something about that last segment reminded me of Janet from The Good Place emotionally pleading with you not to kill her, then speaking normally to remind you that it’s just her self-defense mechanism. It’s hilarious how it switches modes instantly instead of building up to the next emotional state.
It’s interesting watching the technology evolve in real time, with each incremental step bringing more believable speech. It is remarkable how quickly it’s changing.
It's cool to see how satisfied you look when you're listening to the results :)
I truly do get excited about how cool all this stuff is, even when it's not perfect yet.
Really cool, thanks for the tip on Pinokio; very smooth installation. I'll be playing with a bunch of other toys through Pinokio now!
I might try this one! ;) Please keep entertaining and informing us with this kind of video content.
You seem like a down-to-earth guy. Thanks for this video and for explaining everything step by step :)
When these things start becoming full packages and not just tech demos or developer APIs, so much is going to change. Once there are packages with plugins, slicing tools, synth modulators, speed curves, and features that let you link images and video for export or create decision trees, the number of writers publishing their own media productions is going to be huge.
Big studios think we have run out of stories and keep regurgitating the same stuff with different skins, but there are so many creative people out there with stories trapped in their heads. They just need the right tools to tell those stories the way they want, in their own style and language, and then have them translated across cultures at scale. That doesn't even cover education, where teachers are going to write up re-enactments of historic and scientific events or mathematical scenarios, so that students can watch videos as homework and understand the why before teachers show them the how, getting the most value out of short classes. Teachers will be able to do it in their own language and translate it into students' native languages, making this even better for poor, developing nations trying to grow their education systems quickly.
While the video itself was amazing, the last 10 seconds took me away! 😅 You're amazing, well done.
I have a tape of my dad from when he used to send taped “letters” to his brother from abroad. I am totally going to resurrect my dad's voice.
I watched it again, and both times I felt your pain about the loading, but it made me laugh every time.
I got caught up on the image animations with lip sync. How did you make them? Please share a tutorial about it. Lovely video, thoroughly entertained 😂
If you're taking a survey on GPUs, both models worked fine on my Gigabyte laptop with an RTX 3070 GPU.
I loved the ending example, and seeing you smile and enjoy it! This is why I watch your stuff Bob! Please keep entertaining and informing us. In return, I always watch the full two ads without skipping.
Thanks so much!
I've been waiting for a free tool to convert ebooks into audiobooks for me. This might have met the threshold I've been looking for! I have a 12GB 4070 Super, 64GB RAM, and a 20-core i7-14700. I'm extremely curious to see if this would take a month or a day. I really have no idea! Based on your experience, what would you guess the total time would be for a 10-hour audiobook?
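One rough way to answer this yourself: time a short test generation, compute the real-time factor (RTF, seconds of processing per second of output audio), and multiply by the book's length. A minimal sketch, assuming the RTF stays roughly constant over long jobs; the 0.5 figure below is a made-up placeholder, while 60 matches the "1 minute per second of audio" a commenter further down reports when F5 fell back to the CPU. It ignores overhead such as splitting the text and stitching files together.

```python
def measure_rtf(processing_seconds: float, audio_seconds: float) -> float:
    """RTF from one timed test run, e.g. 30 s to generate 10 s of audio -> 3.0."""
    return processing_seconds / audio_seconds


def estimate_generation_hours(audio_hours: float, rtf: float) -> float:
    """Estimate wall-clock hours needed to synthesize `audio_hours` of speech."""
    if audio_hours < 0 or rtf <= 0:
        raise ValueError("audio_hours must be >= 0 and rtf must be > 0")
    return audio_hours * rtf


# Example for a 10-hour audiobook under two assumed RTFs.
for label, rtf in [("fast GPU (placeholder RTF 0.5)", 0.5), ("CPU fallback (~60 s per s)", 60.0)]:
    hours = estimate_generation_hours(10, rtf)
    print(f"{label}: ~{hours:.0f} h ({hours / 24:.1f} days)")
```

On those assumed numbers, 10 hours of audio works out to roughly 5 hours at an RTF of 0.5 and roughly 25 days at an RTF of 60, so measuring your own RTF matters far more than the hardware specs on paper.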
I'm excited about this info. Thanks for always sharing
My pleasure!
@@BobDoyleMedia Which is best: RVC, SO-VITS-SVC, or F5-TTS models?
It's so cool to see how you make boring stuff fun to learn. Thanks for that!
Wow, this is absolutely mind-blowing! The accuracy and emotion captured in just a 10-second sample are incredible. It’s amazing to see how far voice cloning technology has come. F5-TTS really nailed it! Can't wait to see what more is possible with this level of precision and emotion. Great job!
I tried to install it on a Surface Pro 7, and it froze during the installation. I tried the online version, and it was very slow.
Did they take out the "podcast" section? It's been gone from Pinokio's repo for the last few days!
Amazing! I wish you had used better-quality reference audio, so we could hear its best quality.
The last speech cloning program that I tried took me hours to install, didn't work well (still, shoutout to the devs), and took minutes to render.
Do you know of any alternatives for German speakers?
I'm so glad I've found your channel ❤ Thank you for your many great videos
So, how do you fix errors? Like, the AI voice chose to emphasize “sick” in the sentence that was intended to emphasize the word “out” (13:13).
FYI: The Pinokio full install can take a long time if your connection isn't fast, so, don't do it on a deadline and have something else to do to keep yourself busy.
I was listening to you speak, thinking you are the Matthew McConaughey of AI vids, and at that precise moment, your sample audio mentioned his name.
@@jamesvictor2182 Coooooooooool 😎
My problem with it is that even if I use high-quality recordings (dataset), the results don't sound good. I mean, the clone of the character is good, but the sound quality is bad, like phone-call quality or something like that.
LOL, the Windows update progress bar! You made me spit out my coffee 😂
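One thing worth ruling out when the output sounds like a phone call is a low-sample-rate reference clip (telephone audio is typically 8 kHz), since a clone can pick up the limited bandwidth of its reference. This is a guess at a possible cause, not a confirmed diagnosis. A minimal stdlib sketch that reports a WAV file's sample rate, channels, and duration; it only reads uncompressed PCM WAV, and the 22,050 Hz warning threshold is an assumption, not a documented F5-TTS requirement.

```python
import wave
from dataclasses import dataclass


@dataclass
class ClipInfo:
    sample_rate: int
    channels: int
    duration_s: float


def inspect_wav(path: str) -> ClipInfo:
    """Read basic properties of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return ClipInfo(rate, wf.getnchannels(), wf.getnframes() / rate)


def warnings_for(info: ClipInfo, min_rate: int = 22_050) -> list[str]:
    """Flag a sample rate below `min_rate` (an assumed threshold, heuristic only)."""
    notes = []
    if info.sample_rate < min_rate:
        notes.append(f"low sample rate: {info.sample_rate} Hz")
    return notes
```

Usage: `info = inspect_wav("reference.wav")`, then `print(info, warnings_for(info))`, where `reference.wav` stands in for your own clip.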
@@p_p I always appreciate when someone lets me know they caught some little thing like that. Sorry for the mess. 🤪
Thanks for making it simple. I also tried to use this through Pinokio, but my PC sucks and it took hours to generate a sentence.
F5-TTS was the AI used for Tank Rogan’s voice
Bit confused by the ending.
So if I have a 15-second clip of a narrator with just a basic voice, can I add happiness and sadness to that?
That would be great if you could do that from a generic clip.
So, is this what Suno and Udio are using? Or something similar? That's why they can reproduce the singer's voice with only a small sample.
Yes, exactly. It seems like Suno and Udio are using a similar approach.
Wonder if this could be used to bring back the voice of a loved one who has passed away, if you took an audio clip from an old recording. Might be a bit uncanny valley, though.
Is there a GPU setting somewhere? It takes a freaking long time for a simple Multi-* small paragraph of 4 lines of emotion.
Here we go again with another of your treasures. U DA MAN
@@joseparedes380 thanks! Should be a fun one!
Why does a white screen appear after I click discover in pinokio?
When I tried it, it was using the CPU for me instead of the GPU. Any idea how to change this?
Hi, is there a way to make an API out of this? And what is the character limit on inputs?
Bob, would this be good for cloning a friend’s singing voice? My musical partner passed away a year ago from lung cancer, but I, and his widow, would still like to hear him “live” again through my productions. I have used software to animate him singing, but need tools for the voice. Any suggestions?
I think there would be much better options for that, since singing is a lot more complex. If you have recordings of your friend singing, an easy way would be to look into RVC (Retrieval-based Voice Conversion). If you don't know, there was huge hype around it last year, when people started making songs with the voices of big stars. There are a lot of platforms that use it, where you can train your own model and basically turn any singing recording into a recording with the voice of that model. You can also do this for free on your own machine, but I guess it's a bit harder to understand for a beginner, and as always you've got to have a capable computer. You can find "RVC" in Pinokio, the program Bob used in this video.
But if you ask me, the absolute best option would be to use ACE Studio. It's basically a composing suite for AI voices, and they recently added the option to train your own model for free. So you could basically use your friend's voice as an instrument that sings lyrics, and it is highly flexible. Unfortunately, the software itself is not free; there's a monthly fee. Bob made a video about it recently, but he used his speaking voice as samples for the model, which isn't optimal. I got much better (I would even say shockingly realistic) results with samples of me singing.
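On long inputs: a common workaround for any TTS tool with an input limit is to split the text at sentence boundaries into chunks under a character budget and generate them one at a time. A minimal sketch; the 200-character default is an arbitrary assumption, not F5-TTS's actual limit, and the regex sentence splitter is deliberately naive (it will also split after abbreviations like "Dr.").

```python
import re


def _split_words(sentence: str, max_chars: int) -> list[str]:
    """Pack words into pieces of at most max_chars (a single longer word stays whole)."""
    pieces, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Fallback: break an over-long sentence into word-packed pieces.
        pieces = [sentence] if len(sentence) <= max_chars else _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_text("Hi there. How are you? Fine.", max_chars=20)` packs the last two sentences together and returns `["Hi there.", "How are you? Fine."]`.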
But keep in mind that all of those solutions still aren't perfect and could easily lead to frustration in a case like yours. I wish you the best of luck and hope you find the right solution for you!
@@missoats8731 Thanks for the very useful advice. I am well versed in audio recording and have lots of multitrack recordings of his voice that I guess could be used to train the models. What I don’t have is a lot of experience in the AI scene, but I have been watching the emerging apps through YouTube and the web. I will take your advice to heart and explore RVC and ACE Studio. Thanks again for the helpful guidance!
This is so fascinating. I will have to test it on my Mac. I’m not sure how accessible it is with the VoiceOver screen reader, but I will give it a whirl. I’ve been using ElevenLabs for a while, so this would be a really nice tool if it works properly. Do you know if you can use this on iPhone and Android as well? It would be pretty cool if you could.
Bob, is it better than FakeYou? I've been using FakeYou, and I love it. Oh wow, Bob. It turns out that F5-TTS is built into FakeYou, and it's pretty cool. When did they add that?
Is there something that can do voice-to-voice from just a 20-second audio sample?
I think if you use EMaster with the outputs it will sound even better
Hi, not sure why my install doesn't feature a 'podcast' tab. Could you perhaps shed some light as to why this is?
Will it work on my Windows laptop: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz?
Me: I wonder if Bob will cover this. Yes. Yes, he will! Thanks. I’ve been using ttsopenai for two podcasts I started, and I needed something that can clone my voice really well. Thanks a bunch, as always.
It would be really cool if the software had a feature to change already-recorded audio to sound like the sample audio :) Do you know any open-source AI software that could do that? :)
I know there is RVC, but you have to train a model first, and that model requires ~15 min of audio.
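Before training an RVC model, you can check whether your clips actually add up to the roughly 15 minutes mentioned above. A minimal stdlib sketch that sums the durations of the PCM WAV files in one folder; the 15-minute target is this commenter's figure, not an official RVC requirement, and the `*.wav` pattern is case-sensitive on Linux.

```python
import wave
from pathlib import Path


def total_wav_minutes(folder: str) -> float:
    """Sum the duration, in minutes, of every .wav file directly inside `folder`."""
    total_seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            total_seconds += wf.getnframes() / wf.getframerate()
    return total_seconds / 60
```

Usage: `print(f"{total_wav_minutes('my_voice_clips'):.1f} min")`, where `my_voice_clips` is a hypothetical folder holding your recordings.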
The longish pause between sentence fragments is weird for me. I don't notice the problem in your examples, or in examples I see in other videos about F5-TTS. It almost sounds [pause] like this. [pause] And there's little I can do, [pause] to adjust it.
Thank you for your comment. I know what. You mean. 😂 Been running into that with other TTS AI. Might still give this one a go though.
I did try that TTS, but for some reason the copy-paste function is not working; you need to type manually 😢
I might try this one... like that it can talk in Mandarin, could be great for learning another language.
Thanks, this is the best voice clone by far. I just use the Hugging Face version, which is all I really need. F5-TTS is really good, but I have to say the E2-TTS version is much better at emotion. I did try the add-emotion feature in F5-TTS as shown in your video, but it doesn't seem to work: it shows (No Audio being produced) even though I uploaded an audio clip, and it just produces the same audio. What am I doing wrong?
Do you know why they removed that podcast feature?
Please use DeepL for this text translation, not google translate. XD
Found it today and didn't have the time yet to try it myself, but it is indeed really good at voice cloning itself, and the multiple-tone feature is nice too. I wonder if you can mix tones inside sentences.
It also works with a 6GB card. I have an RTX 3060 6GB in my laptop, and it still works great and usually takes about 45 seconds per generation. Not the best timing, but totally worth the wait.
How many words did you use to get this estimate? I'm just curious.
What if I don't have room on my C drive? Pinokio installs it there, and I can't change the home directory because it gives me an error.
Thanks for the video; it's much easier to install via Pinokio. Great tool.
My F5 is using the CPU instead of the GPU, so it's taking too long to process the audio file.
I have an RTX 2060, but not a good processor, so each second of TTS takes 1 minute to process.
So a 10-second TTS takes 10 minutes to produce. :/
How do I configure the GPU to process the TTS instead? Or is that not possible?
Thanks!
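Before trying to move things to another drive, it can help to check how much space each drive actually has. A minimal sketch using Python's standard `shutil.disk_usage`; the 20 GB default is a placeholder assumption, not Pinokio's documented requirement.

```python
import shutil


def free_gb(path: str) -> float:
    """Free space, in gigabytes (10**9 bytes), on the drive containing `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path: str, needed_gb: float = 20.0) -> bool:
    """True if the drive holding `path` has at least `needed_gb` free (placeholder default)."""
    return free_gb(path) >= needed_gb
```

On Windows, something like `for drive in ("C:\\", "D:\\"): print(drive, f"{free_gb(drive):.1f} GB free")` compares drives side by side, assuming both exist.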
I'm having the same problem. I have an RTX 3060, but instead of using the GPU, the program keeps running on the CPU. I still don’t know how to fix it.
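A likely cause, though nobody in the thread confirms it, is a CPU-only PyTorch build in the app's environment. A minimal check to run with that environment's Python, using standard PyTorch calls (`torch.cuda.is_available`, `torch.cuda.get_device_name`, `torch.backends.mps.is_available`); it falls back gracefully if PyTorch isn't installed. Note it does not change any setting in Pinokio or F5-TTS; it only reports what PyTorch can see.

```python
def describe_torch_device() -> str:
    """Report which device PyTorch would use: 'cuda', 'mps' (Apple silicon), or 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu (PyTorch not installed in this environment)"
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        return f"cuda: {name}, {vram_gb:.1f} GB VRAM (torch {torch.__version__})"
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return f"mps (torch {torch.__version__})"
    # torch.version.cuda is None when this PyTorch build has no CUDA support at all.
    return f"cpu (torch {torch.__version__}, CUDA build: {torch.version.cuda})"


print(describe_torch_device())
```

If it prints `CUDA build: None` on a machine with an NVIDIA card, reinstalling a CUDA-enabled PyTorch build into that environment is the usual fix; PyTorch's install selector gives the exact command for your setup.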
It struggles with British accents, particularly northern ones like mine. When I try to synthesize my own voice, it seems to make me sound like I speak with RP. I've even tried accentuating my accent in the sample, and it doesn't make any difference.
The ‘Corner Over There’ is a half a mile down the road at the corner of the farm…..
Why is there no sound in my audio output?
@@Yanduo888 is there any visible indication that a waveform was generated?
Can an NVIDIA GTX 1660 Super with 6GB of VRAM run it?
@@BobDoyleMedia waveform not generated
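If an output file gets written but nothing plays, a quick check is whether it is actually silent. A minimal stdlib sketch that computes the peak amplitude of a 16-bit PCM WAV; other sample widths would need a different decode, and the 0.001 silence threshold is an arbitrary assumption.

```python
import array
import sys
import wave


def peak_amplitude(path: str) -> float:
    """Peak absolute sample value of a 16-bit PCM WAV, scaled to the range 0.0-1.0."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768


def looks_silent(path: str, threshold: float = 0.001) -> bool:
    """True if no sample rises above `threshold` (an arbitrary cutoff)."""
    return peak_amplitude(path) < threshold
```

An empty or all-zero file points at the generation step failing, rather than at playback in the browser.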
Any idea how well it runs on Apple silicon Macs? I was planning to get an M2 Pro Mac.
Apparently it doesn't work on macOS 13.5; I get an error message.
So this won't run with 4GB of VRAM at all... not even slowly???
Would this work for vocals like replay?
Hey Bob
Can I load Pinokio on Mimic-PC? How would I do that?
You're the man 👍
The final example, while very impressive, falls short of what I would consider “usable”, at least for my needs.
I suffer from extreme camera/microphone anxiety. I believe I’m a decent writer, and I’m not shy about most of the things I might do in front of a camera (playing guitar, woodworking…). But trying to speak and explain what I’m doing feels impossible. So the idea of being able to type things up and have a voice speak it for me sounds like a brilliant solution! But in many cases, it would need to sound like me, and I don’t want friends clicking on my videos and saying “what the hell is this?” because it’s obviously not me or just doesn’t sound right.
Great video
Morning, buddy! Can I install this on Linux? It does say so, but if you're familiar with these things: I have Stable Diffusion installed, so Python, PyTorch, and Git are all installed already... I'd rather have all my AI stuff installed on one OS, and generally it's a nightmare on Windows since I have an AMD card. Thanks, buddy, this is all very cool! Happy Friday! ;D
10 seconds in, "it's not perfect"
title: "perfect"
...
Okay.
Can this be used on MimicPC?
clonemyvoice AI fixes this. Perfect voice clone with emotion.
Which one? I can't seem to find it.
It's a comparison page of 'Top AI Voice Cloning Software in 2024' to pay for.
Bob, your voice and personality can carry the content easily without the (over-gained) music used in the intro. Great content🌟👏👍
Ooh, cool, only a few seconds of input to train.
Yes, I find it very impressive.
15:20 The emotion needs to be in curly brackets, actually. (Parentheses will not work.)
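To illustrate the tip above, here is a minimal sketch that splits a script into (style, text) segments using `{Style}` markers like the ones the commenter describes. This is not F5-TTS's own parser; the function name and the `Regular` default label are assumptions for illustration. Note how text in parentheses is simply left as ordinary text.

```python
import re

# A {Style} marker: anything except braces between one pair of curly brackets.
MARKER = re.compile(r"\{([^{}]+)\}")


def split_by_style(script: str, default: str = "Regular") -> list[tuple[str, str]]:
    """Split '{Happy} Hi! {Sad} Bye.' into [('Happy', 'Hi!'), ('Sad', 'Bye.')].

    Text before the first marker gets `default`; empty segments are dropped.
    """
    # With one capture group, re.split alternates: text, style, text, style, ...
    parts = MARKER.split(script)
    segments = []
    style = default
    for i, part in enumerate(parts):
        if i % 2 == 1:
            style = part.strip()
        elif part.strip():
            segments.append((style, part.strip()))
    return segments
```

For example, `split_by_style("(Happy) Hi")` returns `[("Regular", "(Happy) Hi")]`, because parentheses aren't treated as markers, while `split_by_style("I was calm {Angry} until now")` splits mid-sentence into two segments. Whether the real app handles a mid-sentence switch smoothly is something to test, not something this sketch shows.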
Could you advise on how to make Tamil text-to-speech with my own voice?
Is it available on any online services like MimicPC?
@@epicchannel4724 i’m hoping it comes to mimicPC. I’ll ask them about it.
Will it work on an RTX 3050 Laptop GPU with 4GB of VRAM?
Can you tell me the best AI for lip syncing non human characters? Thanks
Please use the dark theme.
I like the light theme.
No
IPinokio!
Does it only support English language?
Thanks for the video for taking the time to produce it
@@SAMEGAMAN right now it’s English and Chinese.
I've been using it for days, and it's better than most, but still not as good as PlayHT or whatever it is. But for a local tool it's cool; I've been playing with it locally.
I'm amazed ElevenLabs still hasn't got emotions! They've been out for over a year on PlayHT and Revoicer!
This is great.
amazing...many many thanks....
Pure gold
The Windoze progress bar is 100% not 100% when it sits at 100% for 100% of 3 plus minutes before getting to 100% finished.
I have an Irish accent, and I can never get TTS websites right; I either sound American or English! 😁
Yeah. I tried a couple of UK soccer pundits. The Scottish and Irish ones come out really badly. Shame about that.
Does this work with vocals or only spoken language?
only spoken language
You can very likely train a fine-tuning specifically for vocals and use that on top.
Will this work on an ordinary laptop computer???
It will if you're running it from MimicPC. If you're going to download from somewhere and run it locally, that's a different story and would totally depend on the power of your laptop.
@@BobDoyleMedia I have a laptop with a 10th-gen Intel i7 and only 16GB of RAM. Will this work?
@@Tom77889 I truly wish I could tell you for sure, but I honestly don't know, having no way to try it myself.
@@BobDoyleMedia I have already installed everything needed for F5-TTS in Pinokio, but after it finished, the tab is empty. The terminal shows it's done and can open, but no local F5-TTS shows up in my Pinokio. I don't know what is wrong.
The E2 model sounds like Office Space.
I tried it, but the quality is so-so...
We want a tutorial about the last 5 seconds of the video😅
Nice to see progress, but this is not even orbiting the same planet as ElevenLabs.
Love it
Is this software safe? I don't even expect it to be free, just, is it safe?
@@czesnikadam6355 it is both free and safe. At least I haven’t had any problems with it.
Cool !😊