@@nathanbanks2354 It will sadly sound exactly as robotic. This technique mimics the input voice, and that's what makes it more natural compared to TTS, to the point that it can mimic any other language, no matter what. It sadly means that if you input a flat, inexpressive voice, you'll just get a flat, inexpressive voice as output.
I was thinking, is it possible to use this method as a sort of voice changer? For example, I want a character to give me a line with a particular tone, but TTS doesn't offer much acting. So I record myself performing the line, and then use that like a sort of a cappella base.
@@glassmarble996 the “cd” command can be used to change directory, and yes, obviously you can save your files anywhere you like. It doesn’t have to be in your home directory like I show in the video - that’s simply how I organise my own, personal files.
Wild, I already created a soundpack for RC radios with 850 sound files of my voice that I've recorded and cleaned up with Audacity... I wasn't using the best mic at the time... some junk from Walmart, which I've since upgraded to a K668B, but still... wild that I already have a massive dataset of audio files of my voice before even getting started.
I just spent an hour scouring the internet for a copy of the "practice for speech quality measurement" text - I can't find access to that file anywhere. Help me out, bro - thanks!
Great video as always! I would love to hear an English song so I can pick up on the tone and inflection errors. Curious if any French speakers can comment on how well it did.
As a French speaker I can tell you that it sounds a little bit strange. It's like he has an English accent rather than being a native French speaker, but it's still quite good.
Depends on your OS! On Linux it'll just be right there, integrated into your terminal. For Microsoft Windows it's a bit more complicated, as you'll have to locate the "Windows" icon to click in order to show your installed programs. Check out the beginner's guide in the video description for more information if you're an MS Windows user :)
Hey Nerdy, thanks for the tutorial. I followed it thinking I would somehow get a usable text-to-speech model, but it seems that this procedure is only for voice-to-voice, correct? Or can I use the model for text-to-speech? I'm not a coder/dev, so it's kinda hard to follow.
This was brilliant and so incredibly helpful!!!! I am incredibly new to this and appreciate this video so much! I am at the training step and copied and pasted the 4 commands; however, I am getting several errors that say, for example, "Invalid value for '-i' / '--input-dir': Path 'dataset_raw' does not exist", when it does exist. Any help will be greatly appreciated. (I searched your comments and already tried the "ls dataset_raw" suggestion, and that did not work either; it says that is, Is, and 1s are not recognized as an internal or external command.) I created a dataset_raw folder already with all of my audio clips, but I keep getting error messages saying the datasets do not exist.
@NerdyRodent thank you so much for responding; however, I am a noob at this 🥲 Can you please tell me how to do that? I feel like this is my last step before I can finally start training it with my voice, as I've gotten 100s of samples spliced, ready to rock and roll. I cannot input those last 4 svc prompts 😭 and I don't want all of this work to go to waste. I'm working from a laptop.
@@NerdyRodent Thank you again. I have rewatched the video several times up to 6:20 and do not see where you made a home directory. I made the dataset_raw folder but not a home one. I restarted the process several times from the python/pip step. I truly do appreciate you taking the time to respond to questions. It means a lot.
Hello, I was wondering: I've already created an environment for this, named "so-vits-svc-fork", and I've downloaded the git repo, but it seems like I can't find them in my file manager?
Do you know where it saves by default? I did cd into the folder before creating the environment, but after completing all the steps the folder is still empty. (I'm on Windows.)
Where do you run the pre-resample command? In the command prompt that pops up after the GUI does, or your own command prompt? You can't type in the GUI prompt, and my own cmd says svc isn't a recognized command.
Great tutorial, thanks! Just a few questions: if I have a dataset of 5-7 mins of audio in total for a voice, how many epochs is it better to train? Because the automatically generated config has 10000 epochs, and that seems like quite a lot.
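If you do want to trim that 10000 down, the generated config.json can be edited by hand or with a small script like the sketch below. Note that the "train" -> "epochs" key path is an assumption based on the fork's generated config, so check it against your own file first:

```python
import json
from pathlib import Path

def set_epochs(config_path, epochs):
    """Lower the epoch count in a generated config.json.

    Assumes the config keeps the value under "train" -> "epochs";
    verify the key path against your own file before running.
    """
    path = Path(config_path)
    cfg = json.loads(path.read_text())
    cfg["train"]["epochs"] = epochs
    path.write_text(json.dumps(cfg, indent=2))
    return cfg["train"]["epochs"]
```

For example, `set_epochs("configs/44k/config.json", 100)` would cap training at 100 epochs; you can always raise the number later and resume training.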
One of my biggest questions trying to learn all this: I want to make an AI voice of a character I like and then use him for cover songs, but there aren't many singing clips of his voice. Do the clips have to be singing? Or can they be normal talking and that's fine? (I know this is probably a stupid question, but it's been a huge struggle just trying to learn all this.)
Why do my models sound like freaky ass robots? My friend and I are in a band; I took a vocal recording from one of our songs and trained on 66 ten-second segments. I let it run for 10000 epochs on an RTX 4090, and the end result was pretty shiet tbh! I don't know what I'm doing wrong. Is it just training duration? Should I let it run for 10+ hours? I can do that; the GPU and CPU are both water cooled, so I have no issue running this 24/7.
You said it saves if we put the number 1 as the epoch number in the config file, but it just continues from the latest model that's been saved. :( I need it to start at the latest epoch from the last training. Kind of sucks.
Hi! Thanks for such a great tutorial, but I'm stuck on finding the exact place where you created 'dataset_raw'. I mean, where can I generally find the so-vits-svc-fork folder? Thank you in advance :D
You can make files and directories wherever you like on your personal computer system. I store pictures in a pictures directory and music in a music directory, for example. For anything github related I work in my github directory. Feel free to use the same file and directory structure as me if you like 😀
Does it work better than RVC? Because RVC hasn't produced a usable song vocal for me once, whether I used downloaded voice models or trained my own. You used a target vocal track that didn't vary in pitch much, how does it work with a real song that varies a lot? With RVC it always came out sounding like autotune in several places, jumping straight from one pitch to the next, and a few places where it just squawked. It never once sounded like a person actually singing like the target file through the whole track. If this so-vits program is going to have similar results to RVC then it won't be anywhere near worth the trouble of setting it up and using it, which is plenty.
When I use the train command, I get this error: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [16, 2, 80, 929] What can I do?
I am a bit confused about Anaconda and creating a directory to install all these programs in. Any tip on where to start? Your explanation is a bit fast for me.
When you save a file, such as an audio recording, you need somewhere to save that audio file on your personal computer system. Directories are like filing cabinets and are where you can store files on a computer. For example, you could have a directory called “pictures” and save picture files in there or a directory called “Music” where you save music files. To make a new directory you can use the “mkdir” command, such as “mkdir audio” to make a directory called audio. There are also graphical file managers which provide another way to create directories, files and move them around. Hope that clears it up!
Hey Nerdy Rodent. That looks awesome. Is there a way to make this a portable AI that can be taken with all of its data and used on different computers, running it from a USB stick or external hard drive?
Since Colab expires after 12 hours, can I just continue where it stopped from a new notebook? And should commands like "#!rm -r "dataset_raw" #!rm -r "dataset/44k"" be skipped then? Or is it just a completely identical setup like in the first run?
For some reason my comment got deleted, but do you have a Discord or something like that? I would love to ask some questions, because when I run the GUI or the command line it says "ModuleNotFoundError: No module named 'attrs'". EDIT: Never mind, all I had to do was upgrade attrs lol. Still no idea why the inference doesn't work on Colab though.
I'm a producer who writes and composes the melody and lyrics for all my songs. I don't have a marketable voice, so I've always had a hard time finding the right singers for my tracks. My understanding of this technology is limited for the moment, but does anybody here know if one can currently make a custom voice with AI to overlay my own voice, and produce vocals that sound great without the need for a real singer?
Any ideas why pip doesn't work for me? I've tried so many solutions but I just can't get it to install anything! It always says pip isn't recognised. I've tried reinstalling Python/pip, tried changing it via the control panel, and I've tried various commands. This is so annoying!
I'd like to make my thesis about voice cloning. So far I know there's AI that changes one voice to another (like this one) and AI that does text-to-speech in the voice it was trained on, but I have no idea if they all work mostly the same way or not. I'm not sure if you'd be able to point me to any references/literature you know of that explain how this works. I've tried looking into the github pages of some of the programs that do voice cloning, but they only explain how to use them.
‼‼‼ At 6:56 it says it requires 14GB of VRAM. Glad I was paying attention before I installed something that's beyond my system's capabilities. System requirements are the most important thing to note at the beginning of a tutorial, or at least in the description. Guess my broke ass gotta cough up some cash one way or another 😭
@@NerdyRodent With 8GBs of RAM, I cannot seem to get past the "svc pre-hubert" step with 122 10-second samples, even with setting the batch size lower. I simply get the exact same CUDA OOM error. Is there some way to reduce the VRAM usage during pre-hubert that I'm missing? Decreasing the size of the samples and/or number of samples does not appear to change this.
I’ve installed so-vits on my Mac via homebrew but I’m stuck. I’ve made a full sample pack of voice examples, but I don’t know how to start the training. Every tutorial I see is PC based and puts the dataset in some folder that it seems so-vits made on their system, but I don't know where that’d be on a Mac. I have the audio files, I have the program installed, but I don't know what to do from here.
On Mac the command is “mkdir” to make a directory. You’ll then need to use “cp” or “mv” to put all your audio into the new directory. You could also use the "graphical user interface", and while I don't have a Mac myself, I did find a guide here - support.apple.com/en-gb/guide/mac-help/mh26885/mac
@@rpvee yup, you’d need to type the name of the directory you want to make, such as “mkdir my-music”. Take a look at the link I sent before for a guide on how to use the Mac OS GUI. It’s probably worth spending a week or two just getting used to the very basics of your new computer first. You could try things such as making directories, creating files, seeing how you can rename things, editing a text file in a directory - stuff like that.
@@NerdyRodent I guess what I'm confused by is that at 6:10 in your video, it shows a so-vits-svc-fork folder. Is that something you made, or something the program auto-generated somehow?
Nerdy Rodent I am always amazed at your brilliance. You are at the top of your game as always! Thank you for your talent.
Thank you too 😀
Congrats. You got credited as a contributor to the so-vits-svc-fork repo. 👏👏👏
Oh cool! 😀
About a year ago I thought nothing could top Dall-E... the last two weeks we seem to be taking 10 steps forward every day. I fancy hearing Donald Duck singing some ACDC... and now it's possible.
Because why not, eh?
Lmfao dude, AI has limitless potential. Seriously, as a kid I knew this would happen one day, and I'm so happy it's happening in my teen years.
One day, AI will be able to generate any scenario prompt you throw at it to amazing detail in a virtual reality space
On the contrary, this is only the beginning
Brilliant tutorial. I love your videos especially because you use open source programs. I wish you would make an ai music generator tutorial next. Thanks again for your great work.
Thank you for the step by step guide and immense insight into the process and steps required to train and infer. We all really appreciate it! ☺️
You're very welcome, and thank you! 😀
You sing really well pal 😆 While there are many idiots doing crap videos about it, yours is an excellent tutorial that covers all the necessary stuff. Congrats from Buenos Aires.
Glad I could help!
Just stumbled upon this channel while looking for a tutorial for so-vits-svc. I'm thoroughly impressed by the quality of this tutorial; you couldn't be more thorough or clearer. Not only did you provide a tutorial, but you also explained what to do about splicing and removing the vocals. You've earned a subscriber.
Great to hear - thanks! :)
Here's a tip for those encountering the 'ValueError: Invalid integer data type 'f'' error in AudioSlicer. I'd like to share my experience with you.
WAV files can store their samples either as integers or as floats, depending on the bit depth. If your WAV file is in float format, you'll hit this error, so you need to export your WAV file in int16 format. Keep this in mind.
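For anyone who wants to check a file before re-exporting, here's a small stdlib-only Python sketch (my own helper, not part of AudioSlicer) that reads the WAV fmt chunk directly and reports whether the samples are integer PCM (format code 1) or IEEE float (format code 3):

```python
import struct

def wav_sample_format(path):
    """Return (format_code, bits_per_sample) from a WAV file's fmt chunk.

    format_code 1 = integer PCM, 3 = IEEE float. Reading the chunk
    bytes directly avoids Python's wave module, which rejects
    float-format files outright.
    """
    with open(path, "rb") as f:
        data = f.read()
    i = data.find(b"fmt ")
    if i < 0:
        raise ValueError("no fmt chunk found - not a WAV file?")
    # fmt chunk layout after the id: size(4), format(2), channels(2),
    # sample_rate(4), byte_rate(4), block_align(2), bits_per_sample(2)
    fmt_code = struct.unpack_from("<H", data, i + 8)[0]
    bits = struct.unpack_from("<H", data, i + 22)[0]
    return fmt_code, bits
```

If this returns something like (3, 32), re-export the clip as signed 16-bit PCM (in Audacity: Export Audio, then "Signed 16-bit PCM") before slicing.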
For anyone wondering why nothing showed up in the folder you set as the install path, like me (I'm very new to all this): just create a dataset_raw folder, then inside it a folder with the speaker id (this can be anything), then put the wav files there and run the commands as shown in the video. The folders you see in the video are created as you run these commands; they aren't instantly there after you run the command to install the so-vits fork. Also, great video 💜
Excellent work.
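The layout described in the comment above can be sketched in a few lines of Python. The dataset_raw/&lt;speaker&gt;/ convention comes from the comment; the helper name and defaults are my own illustration:

```python
from pathlib import Path
import shutil

def build_dataset_raw(wav_files, speaker="my_voice", base="."):
    """Copy training wavs into the dataset_raw/<speaker>/ layout
    that the preprocessing commands expect (per the comment above)."""
    target = Path(base) / "dataset_raw" / speaker
    target.mkdir(parents=True, exist_ok=True)
    for wav in map(Path, wav_files):
        shutil.copy2(wav, target / wav.name)
    return target
```

After this, running `svc pre-resample` from the folder containing dataset_raw should pick the clips up and create the processed dataset folders.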
This must be one of the clearest tutorials about a collaboration of github / CLI tools that I have ever stumbled upon. Stunning work 👍 thank you
Glad it was helpful!
Really like your tuts on newest neuro repos! Thank you, Nerdy Rodent!
Glad you like them!
The AI French wasn't that bad. Nice, would love some more in-depth guides/tutorials on this
There’s not much else to it tbh! 😃
Nerdy Rodent always dropping the best AI tuts around. Now just replace Snoop Dogg's voice with yours and make a sick intro to your vids 🎤🎵
WOW!!! That's awesome!! I was waiting for something like this! Thank you very much
Glad you liked it!
So we could get Elvis interviews for his voice and then have him sing Where The Streets Have No Name by U2?
Wow!! I'm training my voice right now! Thank you for the instructions.
You got this! 👍
How did you get this to work?? I've been at this a legit 2 hours and I can't get the GUI to open
Thanks!
Welcome!
Awesome tutorial as always!
Would love to see some free, open source text to voice in a good quality
This is indeed free and open source!
You may already know this, but TorToiSe TTS is currently the best open source TTS AI. It's not as good as ElevenLabs, but it's pretty powerful: you only need a handful of example audio and a few minutes of training each run (that's why it's called Tortoise). There's also an AI called Bark that focuses more on inflections and emotions, but it's currently in its infancy and doesn't reliably output usable results.
Amazing video, so well done and loved your way of explaining it!! ♥
Glad it was helpful!
Held out to the last minute and BOOM! It was worth it. This is amazing technology; thanks for sharing. I'm super interested to see a heavy metal drum transformed into softer brush percussion for a jazz piano vibe. We already trained for speech, and the separation part with Spleeter is straightforward... how can we do it? Someone must be working on this.
I sort of want to try with out of scope things like animal noises and other weird sounds 😆
Oh there's people working on it alright, these tools are getting crazier by the day
My Nerdy Buddy again gave me exactly what I was looking for! Nice, mate! 👊😎 Although, in the "AnimateDiff New Model! A1111, ComfyUI, ControlNet" video I did not understand what you meant with Conda, and then when I installed another AI program it messed up AnimateDiff (: ... So it's very cool that you speak about this virtual environment feature at 02:35!
Whenever I try to record myself singing, the computer refuses to save the file. I checked my permissions and they are all OK. I guess my singing is really that bad.
Uh oh! The computers are taking over 😆
It is so different when using Google Colab. I kind of wish you also did a tutorial alongside that as well. There is a lot of guessing and pretty much no info on this online, except for this one video on the non-Colab version.
This is awesome!! Some day imma donate like $100 to this channel! Don’t have the money yet, but you’ve helped sooo much!
you have a great voice btw!
Thanks 😉
Awesome video! I've already made a dataset and it turned out great. However, if I want to create a separate new dataset of another voice, do I simply move my old datasets & log folders out and replace them with new ones and start training from there? Thank you
You always make videos about things I'm looking for or didn't know I needed. Thanks very much 🫶
😀 Glad to be of service! And much thanks to you too
Thank you very much!! Great tutorial, but I still can't get it to work with Colab, and sadly I don't have enough VRAM to train locally. Would you please make a tutorial on how to train in Colab??
If anyone could write down all the steps of how to install, I would invite them for a drink. I would really love to try out this fantastic tool, but I'm already stuck at overwriting the code to slice the vocal, as I am not sure how to open that window at 5:30 to overwrite the input. Thank you for your help in advance.
Got the same problem
Oh man, I am so lost.
Just start at the beginning and do the steps in the order shown. You can do it! 😀
I don't know anything about Python or anything related to AI... man, I can't even speak English properly, and see me here, trying to learn how to use this. Thx for teaching me and keep up the great work! 😁😁
I need help lol.
When I do svc pre-resample
it tells me: Error: Invalid value for '-i' / '--input-dir': Path 'dataset_raw' does not exist.
Please, I need help. What do I do?
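A likely cause of this error is running svc from a directory that doesn't contain dataset_raw, since the tool looks for it relative to the current working directory. As a quick sanity check, here's a hypothetical helper (mine, not part of so-vits-svc) that verifies the expected layout before you run the command:

```python
from pathlib import Path

def check_dataset_raw(base="."):
    """Sanity-check the layout `svc pre-resample` expects:
    ./dataset_raw/<speaker>/*.wav relative to the current directory.
    Returns (speaker_count, wav_count) if the folder exists.
    """
    root = Path(base) / "dataset_raw"
    if not root.is_dir():
        raise FileNotFoundError(
            f"{root.resolve()} not found - run svc from the folder "
            "that contains dataset_raw, or create the folder first")
    speakers = [d for d in root.iterdir() if d.is_dir()]
    wavs = [w for d in speakers for w in d.glob("*.wav")]
    return len(speakers), len(wavs)
```

If this raises, cd into the folder where you made dataset_raw (or make it there) and run svc pre-resample again.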
Great video - these trainings are hitting my CPU, and meanwhile my GPU is chillin' not doing much at all. When Tortoise is processing, it makes a satisfying 'working' sound. It's an Asus 3090 Strix. Not sure if I need to specify GPU anywhere, but I'm using Cuda 11.7 in my env, and followed the default instructions (avoiding the cpu/amd ones). 62c on CPU, 43c on GPU without anything running except, well, this video. - just wondering if I'm missing out by letting my ryzen do the lifting
The only way I can think of to not use the GPU would be to install the CPU only version of pytorch
7:30 I'm a little confused here. I'm not sure if I have any VRAM, because I don't have an external GPU, only a CPU. I'm having trouble figuring out what settings to use. 🤔
Damn another rabbit hole. Ahh. Awesome
Weeee! 😆
This is an impressive tutorial! I want to have voice-to-voice in Italian. Should I record my voice in English or Italian? I guess the model is pretrained only on English voices
Either. You can use any language. Most of the examples were in Japanese.
Nice tutorial, but I wish you'd demonstrated it on normal speech too. Singing, I think, is actually easier to make sound realistic/good than normal speech, since it's got regular pitch and rhythm, and we the listeners are used to compression and loads of effects, etc.
I had a listen to the Element Song short, and it sounded good, though at places it sounded like you had a sore throat. Not sure if that was the model training or poorly split voice/music tracks.
Great tutorial! I'm missing a few things. What if you start a new training? Do you simply delete everything from the logs/44k folder and start training again, or is there some way you can divide trainings into separate folders? Also, what's really the difference between the D_ and G_ parts of the output files?
Yup, you can save your old files before starting a new project. You only need the generator if you’re not going to continue training, so it’s fine to remove the discriminator if you’re only doing inference
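As a sketch of that cleanup, the snippet below deletes the discriminator checkpoints and keeps the generators. The logs/44k path and the D_*/G_* naming follow the thread above; the helper itself is my own illustration, so double-check what's in the folder before deleting anything:

```python
from pathlib import Path

def prune_discriminators(log_dir="logs/44k"):
    """Delete discriminator (D_*.pth) checkpoints, keeping the
    generator (G_*.pth) files needed for inference.

    Only do this once you're sure you won't resume training,
    since resuming needs both halves of the checkpoint pair.
    """
    removed = []
    for ckpt in sorted(Path(log_dir).glob("D_*.pth")):
        ckpt.unlink()
        removed.append(ckpt.name)
    return removed
```

Moving the D_ files to a backup folder instead of unlinking them is a safer variant if disk space allows.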
At 6:30 I'm stuck on where to type that code and where to place my dataset_raw folder. How can I do this?
You can place it on your desktop if you like! :)
These techs really heat my spirit
Hey, I'm following the video and am up to the part where I'm trying to launch the GUI, but the command does nothing. It says 'svcg' is not recognized as an internal or external command. Could you help with this?
I’m not sure how you could run svc train but not svcg as they should both be installed at the same time. I’d suggest running “pip install -U so-vits-svc-fork” again
Thanks man!!! I just wonder if I can do this on a Mac mini or MacBook Pro? I have no idea at all. Or should I buy a laptop with Windows? Thanks again.
Not locally that I’m aware of - but you could still use the colab. Linux + Nvidia is best if you’re getting new stuff!
Is there any way to make the realtime voice changer be instant or near instant without degrading the quality or is that just a "in the future" thing. Because it sounds good but the issue is when I say something it doesn't go through for 3 to 4 seconds. I already got a means to use it in games through Voicemeeter but that delay is a bit of a killer.
Bedankt
And thank you! 😀
Is there a way to add additional data to a trained model, or do you have to just retrain on the entire dataset?
The following is from Google Translate. Sorry, my English is not good. I deleted some of the old D and G models, and only kept some of the new ones, and they corresponded one-to-one. However, now choosing to continue training prompts me: load old checkpoint failed and will start training from the beginning, what can I do to get it to continue training from the last time?
great tutorial!
but can you show how to continue developing the model with newly added sample data?
We need a model repository for this project
Awesome content!
Can this only be used for singing? I'm looking for a good voice cloning AI that works in different languages. It can also be an AI that transforms an input clip's voice into a target voice.
It’s voice to voice 😉
Hi Nerdy Rodent, I'm trying to use this tut and so far it's been very comprehensive (tysm!). However, I have an AMD card, and over at PyTorch it's saying that ROCm isn't available for Windows. I'm not exactly very intelligent when it comes to this coding stuff, so I was hoping you might have a solution. Thank you!
Yes, Linux is best for AI performance, stability, ease of use and compatibility.
Hello, could you tell me what the command would be to run the inference in Colab without the auto prediction, and to be able to choose crepe as the inference type?
A very good tool, well explained. Thank you, sir
Simple question; where exactly does sovits fork install itself?
I'm trying to find the folder at 6:37 and I simply can't find it. Yes, I'm a Windows user, and I'm transferring my vocal AI trainings from Colab to training locally, so I'm trying to replace files
As described and shown in the video, the dataset directory will be created from your dataset_raw directory, once you run the processing commands
@@NerdyRodent Ok, so I actually listened to the sentences you were speaking as you were saying that, and I've got it now
Thanks!!
Sorry you need to deal with so many ignorant people in these comments as I see it a lot
Nice! I wonder if you can take the output of a text to speech program like Tortoise TTS or coqui-ai as the input for so-vits-svc to match someone's voice and/or improve quality.
Yes, you obviously can ;)
Yup. Or tts in another language too!
@@ChristianIce I suppose what I'm really asking is how much less robotic the voice would sound if you do. TTS also often sounds over-compressed like an 8KHz telephone conversation rather than a 44.1KHz studio recording.
(Now I'm also asking myself if I care enough to start experimenting.)
@@nathanbanks2354 i have the same question, pls update if you try it out...
@@nathanbanks2354
It will sadly sound robotic exactly the same.
This technique mimicks the input voice, that's what it makes more natural compared to TTS, to the point it can mimick any other language, no matter what.
This sadly means that if you input a flat inexpressive voice, you would just get a flat inexpressive voice as output.
The GUI version works amazingly well, but I can't seem to get the CLI version going... Anyone been down that road?
I was thinking, is it possible to use this method as a sort of voice changer? For example, I want a character giving me a line with a particular tone, but TTS doesn't give much acting. So I record myself performing the line, and then use that like a sort of a cappella base.
Yup!
Thank you so much, this was super helpful! I have mine training now 🎉
😀
Invalid value for '-i' / '--input-dir': Path 'dataset_raw' does not exist. IT DOES EXIST!
You can type "dir dataset_raw" or "ls dataset_raw" to see if you really have created the directory where you think ;)
@@NerdyRodent looks like my dataset should be under C:\users\username. How can i change the directory.I dont want to fill up my ssd
@@glassmarble996 the “cd” command can be used to change directory, and yes, obviously you can save your files anywhere you like. It doesn’t have to be in your home directory like I show in the video - that’s simply how I organise my own, personal files.
@@NerdyRodent hey, thanks for the answer. The plain cd command did not work, but "cd /d d:\Ai" solved my problem.
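For anyone else hitting the "Path 'dataset_raw' does not exist" error, the shell itself can confirm where things really are. A minimal sketch, assuming the same layout as the video (the speaker id "MySpeaker" is just an example, and on Windows cmd it's `dir` instead of `ls`):

```shell
mkdir -p dataset_raw/MySpeaker    # dataset_raw with one speaker folder inside
ls dataset_raw                    # confirm the speaker folder is really there
cd dataset_raw && cd ..           # the svc commands are run from this parent directory
```

If `ls dataset_raw` errors, you are simply not in the directory you think you are; `cd` to the parent of dataset_raw first.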
Wild - I already created a soundpack for RC radios with 850 sound files of my voice that I've recorded and cleaned up with Audacity... I wasn't using the best mic at the time... some junk from Walmart, which I've since upgraded to a K668B, but still... wild that I already have a massive dataset of audio files of my voice before even getting started.
Sounds great!
How would I go about adding more samples to an already trained model?
I just spent an hour scouring the internet for a copy of the "practice for speech quality measurement" text - I can't find access to that file anywhere. Help me out bro -thanks..
Hi, can I stop in the middle of training and then continue at the same point the next time? If yes how?
Yup. It will automatically continue
Can you do other things than switching the language?
I heard that it can slightly alter the lyrics while still keeping the same energy
Great video as always! I would love to hear an English song so I can pick up on the tone and inflection errors. Curious if any French speakers can comment on how well it did.
Then check out the yt short where I sing the elements song - particularly near the end with the non-standard pronunciation of the word “discovered” 😉
As a French speaker I can tell you that it sounds a little bit strange. It is like he has an English accent. Not native French speaker. But it’s still quite good
Man, 2 mins in I'm already stuck!! Hehe, how do you open up the Anaconda prompt?? Thx
Depends on your OS! On Linux it'll just be right there, integrated into your terminal. For Microsoft Windows it's a bit more complicated, as you'll have to locate the "Windows" icon to click in order to show your installed programs. Check out the beginner's guide in the video description for more information if you're a user of MS Windows :)
Hey Nerdy, thanks for the tutorial.
I followed it thinking I would get a "usable text-to-speech" model somehow, but it seems that this procedure is only for voice-to-voice, correct? Or can I use the model for text-to-speech?
I'm not a coder / dev, so it's kinda hard to follow
Yes, this is voice to voice software where you can turn one voice into another. No coding or development required!
@@NerdyRodent But can I use it for a text-to-speech software?
@@arafatdeluxe1120 No, this is not text to speech it is voice to voice, where one voice is changed into another voice. Hope that makes sense!
This was brilliant and so incredibly helpful!!!! I am incredibly new to this and appreciate this video so much! I am at the training step and copied and pasted the 4 commands; however, I am getting several error codes that say "Invalid value for '-i' / '--input-dir': Path 'dataset_raw' does not exist" when it does. Any help will be greatly appreciated. (I searched your comments and already tried the "ls dataset_raw" suggestion and that did not work either; it says that is, Is, and 1s are not recognized as an internal or external command. I created a dataset_raw folder already with all of my audio clips. I am getting a lot of error messages saying that datasets do not exist.)
Cd to the parent directory first
@NerdyRodent thank you so much for responding; however, I am a noob at this 🥲 Can you please tell me how to do that? I feel like this is my last step before I can finally start training it with my voice, as I've gotten 100s of samples spliced, ready to rock and roll. I cannot input those last 4 svc prompts 😭 and I don't want all of this work to go to waste. I'm working from a laptop.
@@Sahgee Just like I show in the video, if you made /home/nerdy/dataset_raw you would “cd /home/nerdy” as the parent directory 🤓
@@NerdyRodent Thank you again. I have rewatched the video several times up to 6:20 and do not see where you made a home directory. I made the dataset_raw folder but not a home one. I restarted the process several times from the python/pip step. I truly do appreciate you taking the time to respond to questions. It means a lot.
@@Sahgee your home directory is created automatically for each user by the operating system
Great tutorial, but I still have some questions about what each parameter in the Common tab means, how I can adjust it, and where I can learn it.
See github.com/34j/so-vits-svc-fork/discussions/190
@@NerdyRodent thanks
Do you know where the HuBERT model of so-vits-svc-fork is?
How is the latency in the real-time voice changing? Let's say, when the latest RTX cards are used? Can it benefit from multiple cards?
Not too bad, just a slight delay
Hello, I was wondering - I've already created an environment for this, named "so-vits-svc-fork", and I've downloaded the git repo, but it seems like I can't find them in my file manager?
Unfortunately you’ll need to remember where you decided to download to. Personally, I save everything in the “GitHub” directory I created in my home 😀
@@NerdyRodent Huh, could you tell me how to do it like yours?
Like, how do I put it in another folder? Thanks in advance!
All I did was make a new directory called "github" (mkdir github). That's it. Done - now you have a new home for everything github related! :)
@@NerdyRodent i see, thank you!
Do you know where it saves by default? I did cd into the folder before creating the environment, but after completing all the steps the folder is still empty. (I'm on Windows)
Where do you run the pre-resample command? In the command prompt that pops up after the GUI does, or your own command prompt? Because you can't type in the GUI prompt, and it says svc isn't a recognized command in my own cmd
Great tutorial, thanks! Just a few questions: if I have dataset of 5-7 mins of audio in total for voice, how much epochs is better to train? Because in automatically generated config, there are 10000 epochs and it seems quite a lot
Yeah, I start at 20k steps
@@NerdyRodent My 128/16 * 3125 = 25,000 steps, so my epoch is 3125, is this alright?
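The arithmetic in this thread can be sanity-checked directly in the shell. This is just the rough relation steps ≈ (samples ÷ batch size) × epochs, ignoring partial batches, using the figures quoted above (128 samples, batch size 16, 3125 epochs):

```shell
# steps = (samples / batch_size) * epochs, ignoring partial batches
echo $((128 / 16 * 3125))   # → 25000
```

So yes, 3125 epochs at that dataset/batch size lands right around the 20-25k step range mentioned above.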
So, this was the surprise. Cool 😎🥶
One of my biggest questions while trying to learn all this: I want to make an AI voice of a character I like and then use him for cover songs, but there aren't many singing clips of his voice, so I've been wondering - do the clips have to be singing, or can they be normal talking and that's fine? (I know this is probably a stupid question, but it's been a huge struggle just trying to learn all this)
They need to be audio clips of the voice. Try to avoid any backing music.
Why do my models sound like freaky ass robots?
My friend and I are in a band. I took a vocal recording from one of our songs and ran it to train on 66 (10-second) segments. I let it run for 10,000 epochs on the RTX 4090, and the end result was pretty shiet tbh! I don't know what I'm doing wrong - is it just training duration? Should I let it run for 10+ hours? I can do that; the GPU is water cooled and the CPU is water cooled, I have no issue running this 24/7
Very nice, and it works on OSX too; a bit tricky with pyenv & anaconda, but it works great even without a GPU
Awesome, good to know!
You said it saves if we put the number 1 in the epoch number in the config file, but it just continues at the latest model that's been saved. :( I need it to start at the latest epoch in the last training. Kind of sucks.
Not quite, but very close! The suggestion in the video is actually to _add_ 1 to the number of epochs so that it saves the final checkpoint ;)
@@NerdyRodent I understand it now. Thank you. :)
Hi! Thanks for such a great tutorial, but I'm stuck at finding the exact place where u had created 'dataset_raw' I mean generally where I can find so-vits-svc-fork folder? Thank you in advance :D
You can make files and directories wherever you like on your personal computer system. I store pictures in a pictures directory and music in a music directory, for example. For anything github related I work in my github directory. Feel free to use the same file and directory structure as me if you like 😀
@@NerdyRodent Unlucky :(( It turned out I don't have CUDA, and AMD can't work. At least I tried...
Does it work better than RVC? Because RVC hasn't produced a usable song vocal for me once, whether I used downloaded voice models or trained my own. You used a target vocal track that didn't vary in pitch much, how does it work with a real song that varies a lot? With RVC it always came out sounding like autotune in several places, jumping straight from one pitch to the next, and a few places where it just squawked. It never once sounded like a person actually singing like the target file through the whole track. If this so-vits program is going to have similar results to RVC then it won't be anywhere near worth the trouble of setting it up and using it, which is plenty.
Pls can you do the Google Colab tutorial too? I've tried it but I'm not sure what I'm doing wrong. I would appreciate it if you can show, pls
When I use the train command, I get this error:
Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [16, 2, 80, 929]
What can I do?
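One possible cause (not confirmed here, just a guess from the 2 in [16, 2, 80, 929], which looks like a channel count) is stereo training audio where mono is expected. If that turns out to be the case, converting the dataset to mono may help; here is a standard-library sketch for 16-bit wav files:

```python
import struct
import wave

def stereo_to_mono(src: str, dst: str) -> None:
    """Average the two channels of a 16-bit stereo wav into a mono wav."""
    with wave.open(src, "rb") as w:
        assert w.getnchannels() == 2 and w.getsampwidth() == 2, "expects 16-bit stereo"
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    # Interleaved L R L R ... as signed 16-bit little-endian samples
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)]
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(mono), *mono))
```

Tools like Audacity or ffmpeg can do the same conversion in bulk if you have many files.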
Fantastic tutorial! Is there a way to use my new cloned voice to read text? I would like to do voiceovers, but with my own voice in Spanish
Yes, simply use any text to speech then you have a voice to do voice-to-voice on!
What would happen if I combine my voice samples with a friend of mine's voice samples? Does it give some sort of hybrid between the two?
I am a bit confused about Anaconda and creating a directory so that I can install all these programs. Any tip where to start?
Your explanation is too fast for me.
When you save a file, such as an audio recording, you need somewhere to save that audio file on your personal computer system. Directories are like filing cabinets and are where you can store files on a computer. For example, you could have a directory called “pictures” and save picture files in there or a directory called “Music” where you save music files. To make a new directory you can use the “mkdir” command, such as “mkdir audio” to make a directory called audio. There are also graphical file managers which provide another way to create directories, files and move them around. Hope that clears it up!
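The filing-cabinet explanation above, as actual commands (the directory name "audio" and the wav files are just examples):

```shell
mkdir audio        # make a new directory called "audio"
mv *.wav audio/    # move your wav recordings into it
ls audio           # list the files to check they moved
```

A graphical file manager does exactly the same thing with right-click → New Folder and drag-and-drop.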
Please, a video for google colab! Thank you so much!
Hey Nerdy Rodent. That looks awesome. Is there a way to make this a portable AI that can be taken with all of its data and used on different computers, running it from a disk-on-key (USB stick) or external hard drive?
Sure, you could indeed add it to your Ubuntu portable usb stick
Hi,
after a couple of runs, I'm getting a CUDA out of memory exception... how do I fix this? Thanks
Since Colab expires after 12 hours, can I just continue where it stopped from a new notebook? And should commands like "#!rm -r "dataset_raw"
#!rm -r "dataset/44k"" be skipped then? Or just completely identical setup like in the first run?
Yup, it just carries on from the last save
@@NerdyRodent Thanks a lot for answering!
@@mediation7997 Did it carry on for you? for me it starts all over
For some reason my comment got deleted but do you have a discord or something like that? I would love to ask some questions because when I run the GUI or with command line it says
"ModuleNotFoundError: No module named 'attrs''.
EDIT: Nevermind, all I had to do is just upgrade attrs lol. Still no idea why the inference doesn't work on colab though.
I'm a producer who writes and composes the melody and lyrics for all my songs. I don't have a marketable voice, so I've always had a hard time finding the right singers for my tracks. My understanding of this technology is limited for the moment, but does anybody here know if one can currently make a custom AI voice to overlay my own voice and produce vocals that sound great without the need of a real singer???
Any ideas why pip doesn't work for me? I've tried so many solutions but I just can't get it to install anything! It always says pip isn't recognised. I've tried reinstalling python/pip, tried changing it via the control panel, and I've tried various commands. This is so annoying!
svc pre-resample just spits out "Preprocessing: 0it [00:00, ?it/s]". Anyone have any advice?
I'd like to make my thesis about voice cloning, but so far I know there's AI that changes one voice to another (like this one) and another AI that does text-to-speech in the voice it was trained to do
No idea if they all work mostly the same or not
I'm not sure if you'd be able to point me to any references/literature you know that explains how this works, since I'm kinda lost
I've tried looking into the GitHub pages of some of the programs that do voice cloning, but they all only explain how to use them
Check the papers for more info, e.g. arxiv.org/abs/2206.04658
@@NerdyRodent thanks, love u
‼‼‼ At 6:56 it says it requires 14GB VRAM. Glad I was paying attention before I installed something that's beyond my system's capabilities. System requirements are the most important thing to note at the beginning of a tutorial, or at least in the description. Guess my broke ass gotta cough up some cash one way or another 😭
Use a lower batch size with less VRAM
@@NerdyRodent With 8GBs of RAM, I cannot seem to get past the "svc pre-hubert" step with 122 10-second samples, even with setting the batch size lower. I simply get the exact same CUDA OOM error. Is there some way to reduce the VRAM usage during pre-hubert that I'm missing? Decreasing the size of the samples and/or number of samples does not appear to change this.
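Since a couple of threads here ask where the batch size is actually set: it lives in the generated config.json. The key layout below is an assumption based on the generated file, so check yours; the surrounding keys are omitted:

```json
{
  "train": {
    "batch_size": 4
  }
}
```

Lower values use less VRAM per training step at the cost of slower progress.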
Thanks for this great video, my favourite rat!
🥰
thanks, and you're welcome!
Sorry if I missed this. Is it possible to train for free on colab? If not, what is the most cost efficient cloud service for training?
Yes, free colab is enough
I’ve installed so-vits on my Mac via homebrew but I’m stuck. I’ve made a full sample pack of voice examples, but I don’t know how to start the training. Every tutorial I see is PC based and puts the dataset in some folder that it seems so-vits made on their system, but I don't know where that’d be on a Mac. I have the audio files, I have the program installed, but I don't know what to do from here.
On Mac the command is “mkdir” to make a directory. You’ll then need to use “cp” or “mv” to put your all your audio into the new directory. You could also use the "graphical user interface", and while I don't have a Mac myself, I did find a guide here - support.apple.com/en-gb/guide/mac-help/mh26885/mac
@@NerdyRodent I've typed "mkdir" but do I have to type anything else with that? Sorry, I'm a complete beginner at this.
@@rpvee yup, you’d need to type the name of the directory you want to make, such as “mkdir my-music”. Take a look the link I sent before for a guide on how to use the Mac OS gui. It’s probably worth spending a week or two just getting used to the very basics of your new computer first. You could try things such as making directories, creating files, seeing how you can rename things, editing a text file in a directory - stuff like that.
@@NerdyRodent I guess what I'm confused by is that at 6:10 in your video, it shows a so-vits-svc-fork folder. Is that something you made, or something the program auto-generated somehow?
@@rpvee yes, I used the “mkdir” command to make the directory.