For a free alternative for speech to speech, check out this video: th-cam.com/video/Usua2LnnX4g/w-d-xo.html
I'm editing a wedding recap video and there's a section where, during the bridal party speeches, the microphone kinda cut out and ruined the flow of what they were saying. I used this to recreate their voices on the sections that cut out and my goodness, it's wonderful. Truly a gem for issues like this.
Game changer for sure. No more cryptic use of punctuation to try to get the right flow and inflection on words and multiple re-rolls of lines. Brilliant
I've tried over a dozen different AI voice labs, several are good, some are not. The one I decided to go with was Eleven Labs.
I don't see anything wrong with using it to speak with passed family members if their voices are saved. It can help people who are grieving.
That's not letting go, not grieving. That seems unhealthy.
I'd love something that can read e-books to me with emotion. Some Text To Speech voices are good, but completely robotic in their emotional emphasis.
I've encountered audiobooks where I like the story but the voice reading it is not to my taste, especially during dialogue of the opposite gender to the reader.
It would be great if there were easy AI solutions to both of these.
It's going to cost too much. If you have lots of money then go ahead.
Good, fun demonstration. Just the tool I've been looking for. Thanks.
I appreciate your time ✌️
Nice. I've used some Replica voices because they have good emotional weight but poor voice clarity. Thanks for this.
Should we try it?
@@DawnPeacockOwens just try respeecher
@@saintfame23 we already have the subscription to Eleven Labs, so may as well use that one first
Respeecher has this feature and specializes in speech-to-speech technology. Try them as well.
I extensively use Eleven Labs and I love it
This is awesome! I’m hoping to use it to answer customers that ask the same question over and over on the phone so I don’t gotta sit there for 10 minutes. The turbo feature will make it seem real, I hope.
I would love to use this technology, if they didn't nickel-and-dime you for every little character you use.
A workaround could also be Creative Commons impersonators.
5:29 What generator did you use? I have a face swap software, one with no sound, and D-ID didn't allow me to use a famous person even though the image was AI-generated.
Whoa, this is awesome! Quick question, what was used for the Liam Neeson headshot movement (face over)? I’m hoping there’s an API out there somewhere.
Thank you, keep up the great work!
That's what I want to know...did you ever find out?
I'm just so lost. I'm trying to start streaming online using a voice changer, speaking live and having the voice changed live, and I'm trying to clone a voice for this purpose. Do you know of anything? I can't find anything. Every time I search for ANYTHING on this, I keep getting "text to speech" options, or cloning voices that ultimately result in text to speech only options.
Is what I'm looking for a thing? I don't know what to actually search for. ;(
I tried ElevenLabs but had better results with Vocs AI speech to speech
Hey Bob, do you mind if I ask how you achieve such perfect background substitution? Thanks.
Awesome. 🎉 Many Thanks for sharing face fusion and 11 labs 🎉
You can really test the voice AI with something like a "love note" reading. They sound like a business transcript... funny though.
So grateful for 11 and my subscription too!
Yeah, it's getting to be a better and better value!
Just today tried Descript for the first time for a lot of text I have to read. It, or I, sound like I want to take a long walk off a short pier and when I'm half joking I can't get that light hearted flavor tone to come out. I may try ElevenLabs tomorrow?
What is it like re-rendering your own voice? I am thinking of a bad recording, like in a noisy cafe or echo-y room, and redoing it so it sounds studio quality. Bonus points for 2 people talking and separating them out into different tracks.
Well, I know the feature, but I like your way of presenting it. Just love your videos. I am also a big fan of Liam Neeson.
Please make videos about open source solutions for this 🎉
I use it to clone my own voice for my videos, as I tend to slur some words and not keep tempo.
It definitely isn't perfect, and I've had to do some re-recordings to work around that very issue. How long was the sample that you sent it, and does it have any examples of the word that is slurring?
4 clips, each 30 seconds long, seemed to work. I read from a script I found online. @@BobDoyleMedia
So where does this leave security? Voice print identification, etc... crazy. Watch out you don't get cloned!
It's a likely thing, no doubt. But I think we probably all already are...
HeyGen will do this also
Can you do speech to speech with the output voice being a cloned voice?
Emily definitely sounds like she isn't thrilled about what you're wearing.
What microphone are you using? The quality is great
It's a Blue Yeti Pro. I have it as close to me as I can without it being in the frame. And I also may be running some compression on it, depending on the video.
Does anyone know if there is a service like this where you can purchase or download your Voice and add it to your Apple or Windows computer?
I want to do an audiobook of my late great father, reading a public domain translation of the Bible. If I’m limited to an amount of words or minutes, I’m gonna spend 1 trillion billion dollars getting that done…
Is it possible to change the voice and also change some of the text in the audio? For example, to change the speech in a film.
@@hecaz7052 you could certainly use this tool in conjunction with lip syncing software to do something like that.
@@BobDoyleMedia ok because I'm watching a lot of videos and I can see you can change the voice, but not the text... I wanted to be sure I can before paying for it :)
Fun stuff! Thanks
you've an amazing voice, what do you need this for?
As I say in the video, now I can use whatever acting skill I use with my own voice and then apply it to others, thus being able to create a stable of characters that I can offer or use for my own projects.
great vid - we live in exciting times
Indeed!
Brilliant! How can I do a clone of a singing voice?
There are definitely solutions for that which I would love to cover eventually here on the channel. Do a search for RVC voice clone and you’ll find your answer.
I tried it and it makes the result all jumbled up
Hey Bob, nice vid. You want to know how I use it? I’m one of the pre-made Australian voices on ElevenLabs - friends send me all sorts of crazy stuff people have used my voice for every day! As a producer I use it to perform reads in voices I don’t have - I even did a read in a 30yo Aussie female voice recently :)
Precisely! That's just the kind of use case I'm talking about. I have another video that addresses this specifically for VO professionals: th-cam.com/video/edNQd2LgBrw/w-d-xo.htmlsi=PbK0i5jN_BeylcB7
You could have picked me!
hmm so how do they calculate the limit using speech to speech?
My guess is that it creates a transcript of what is being said and counts the characters.
What is the best open source voice cloning app, then?
Personally I like RVC. With my 3090 I can clone a voice with about 20 minutes of audio in around 30 minutes that sounds pretty good, and can then convert recordings like this, or do "real time" conversion, with a delay depending on your GPU.
Emily is kind of an Emily Downer is wild, or melancholy. 😂😂😂
how much text/time can you upload at one time?
They've changed it since I did the video, and it looks like they'll take up to 50MB audio files. That's a lot!
The thing that concerns me is, if you upload your voice to ElevenLabs, are you giving them permission to use your voice somewhere else without compensation?
I’m not sure that’s 100% true but I will certainly look into it. I think you have to give them permission to use your voice in their marketplace.
It's still a little wonky. The voices can still sound slurred and drunk with the replica feature.
Loved the Deepfake ending
:) Thanks. That's Facefusion. Going to do another video on that.
what if you sing it?
Unfortunately, it does not work, at least in any of the tests I did. Generally, the models require a slightly different type of training if you’re going to use them for singing, but I’m only speaking about training approaches that I know. I really don’t know what ElevenLabs is doing.
It still sounds generated to me. Delivery is flat and unnaturally inflected
Well, I think some voices are better than others, and I believe that like with most AI things, a lot of it has to do with the data going into the model. If the read going in is flat, that's all you're going to get out. That's why I have several models of my voice with a range of modulation.
No hate for 11labs, but don't Vocs AI & Kits already do this?
I’m not familiar with these as you’ve listed them. Do you have a link? I’d love to check them out!
This is dope, also grapes are toxic to dogs big FYI for those who don't know
I actually do know from first hand experience. Lost my chihuahua after only 2 years. We had no idea, and he loved them as treats. Hard lesson.
@@BobDoyleMedia😢❤
Here's a question: do you think Eleven Labs can get to a point where voices are synthesized in real time, instead of you submitting an audio sample file and it churning through it and then spitting out the end result like it does now (which is still impressive, to say the least)?
Whether they do it or not, it's hard to say - but will they be ABLE to? No doubt.
I've seen videos of a man who is using AI on his PC to change his voice to a female anime character in almost real time. The technology is pretty much there.
what engine is he using to do this…
@@danielle78730 It's w-okada AI voice changer: open source, free, local and easy to use... And you can use it in real time in games, Discord, etc... and every app with microphone support...
"Cloning" is making an identical copy. Cloning is NOT voice-to-voice. Voice-to-voice is a conversion. The two are very different. I'm having a tough time finding out what this program is capable of, thanks to illiterate use of terminology. Many are.
Well, I feel like the word "cloning" describes enough for the general public what the result will be. And I guess the term "conversion" isn't as sexy. I get your point, but is it really unclear what the program does? From your viewpoint, I'd say it does exactly what you said: converts. It converts text to speech, and it converts voice to another voice, clearly based on some kind of AI model that is created amazingly fast. "Cloning" or not, I'd say it's amazing.
The program itself is written using non-standard terminology. Most are today. You are obliged, I think, to use the same terminology as the product, right or wrong. What you think is a "sexy" word is irrelevant.
Yes, it is unclear to those who may have a technical vocabulary. Another example is "AI." It has no definition at all. It means "really cool," right? There are quite a few examples. It's a sign of the sorry times we live in.
If you must use undefined terms, consider adding a link to a glossary.
@@BobDoyleMedia
only english((
Yes, good point. Forgot to mention that! Obviously, that will change any moment. :)
@@BobDoyleMedia hope so)
Elevenlabs has multilingual model too.
@@xHeadcleanerx Not for Speech to Speech yet
What's the legality for dead actors, I wonder?
It can be illegal (harassment, defiling, and infringement) unless you get permission; for the dead it's considered false light. The bottom line is, you don't own it. Regardless of legislation not being fully updated anywhere yet, there's the morality of it too.
Even the end of this video is infringement on the look, even though it's satire and genuinely means no harm. Pay for actor release forms, even yourself.
@@TXanders There's no morality for the dead; they do not need to worry about that. It's probably more of a public domain thing, so maybe after 50 years or less, depending on living heirs trying to cash in on work they didn't do.
Yes, your point is totally valid. I guess I'm just "going with it while I can" until firmer rules are in place - but it's going to be hard to backpedal on this tech, so it will be interesting to see what kind of legislation is created. In my case, I'm always making it evident that it's AI, so I believe that this is currently acceptable to TH-cam, which suits me just fine. @@TXanders
ROFLMFAO!!! That was great!! ha ha ha!!!
Changing the name Emily to George is transphobic
Huh?
Putting your face on YouTube will not have any value at all because people will think it is all fake. Well, I think it is kind of good because then it boils down to the value of the content. But I predict people will upload hundreds of pieces of automated content every week, all AI generated, so it is all going to be BS. Maybe AI will generate uniqueness as well, so it will all go to BS anyway. Welcome to this BS future.
It has all been fake for years. The focal length of the camera is already changing how you look, video is color graded, using a green screen, high lumen lighting... Does it matter? No. People look at faces, eyes, mouth, gesticulation, facial expressions... because it’s a strong part of the human multi-sensory toolset for communication and understanding.