So hyped for the dataset analysis video man, keep it up!
6:20 This isn't actually necessary.
To save more logs to TensorBoard, you can change a setting in configs/48k_v2.json (or the JSON for the model type you selected when creating the experiment): edit "log_interval" down from 200 to 25 steps. That saves the loss to TensorBoard more often, without needing to save the model state at all.
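For reference, the relevant fragment of configs/48k_v2.json would look roughly like this (other keys omitted; the layout is assumed from the VITS-style configs RVC uses, so check your own copy):

```json
{
  "train": {
    "log_interval": 25
  }
}
```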
Good tip, but the purpose I was going for wasn't logging, it was having the model available to use. You can check the logs to find a good spot, but if you don't have the model saved at that point... well, you've gotta retrain it lol.
Once again, a great tutorial on AI/deep learning, wrapped up in a way that's really easy for end users to understand!
Also, I noticed some of the EN Holo members talked about your video in one of their streams, which is pretty cool. Maybe this is a question for your next Q&A session: what are your thoughts on one of your videos getting their attention, and how do you feel about their reaction to it? Keep at it, mate, and I can't wait for your next videos!
Appreciate it! I'm glad to hear that my content is resonating with you :)!
As for their reaction to my videos, it's always intriguing to see how different people respond to the technology I discuss, especially large streamers. I knew sooner or later they would come across it, and it's interesting to hear their POV on this topic. Is it Watson?
@@Jarods_Journey True true! I guess streamers will have mixed reactions regarding this AI stuff, as I think influencers such as them will be the first ones to be copied. And yes, haha, right on the money, mate 😄✨
Sorry for the dumb question, but which of the axes (vertical/horizontal) in those graphs represents my epochs? And how did you know which epoch you were at when you made your comparison with the 70k model?
Dear YouTube channel owner,
I am a fan of your channel and regularly watch your videos with great enthusiasm. I really admire how you use background music in your videos; it makes each video more vibrant and engaging.
I have been searching for, and would like to know, the list of background music used in some of your recent videos.
They all belong to and are created by a Japanese artist, しゃろう.
@@Jarods_Journey Thank you so much!
I hope they will upgrade RVC's voice cloning. The current one is nice, but I'm looking forward to the next generation.
What's the difference between epochs and steps? My model sounds TERRIBLE (noisy, robotic, staticky) with 40 minutes of a super clean and crisp dataset at 200 epochs and 18,600 steps... Is that number of steps normal for 200 epochs?
When I tried training in Mangio, my model also sounded terrible, so I used Applio, which is better.
I just do 250 epochs and consider it done.
For those of us with a crappy GPU (or none at all) who have been using Google Colab to make models, Kits AI has an upload feature for the .pth and index files. It allows free generation using the audio samples it provides, which include singing and talking. You can also upload up to 15 minutes of your own audio, but I use the provided samples to check, since they don't count against the limit on free accounts.
I have also fed decent singing samples generated from a voice back into my dataset, to train the model to sing certain notes (or just to sing in general) alongside all the talking data I have for each one.
I've made 8 character models, and from the free singing and talking samples Kits AI allows, I can tell that they sound like the characters. You can upload lots of models at once and run them all on the same sample. I've compared different epoch checkpoints this way too, with the same model or different datasets.
It's faster than RVC in some cases and lets me test singing specifically, without having RVC tie up my CPU for generation.
Why does "No dashboards are active for the current data set." appear? I have a trained model, and if it's looking for an index file inside the logs folder, I have one there.
Can you explain mathematically what the graph is showing? It looks like it’s a ratio between different numbers, but I can’t tell which numbers they are.
I could do it yesterday; today I get errno 13: permission denied. What am I doing wrong? Help, please.
How do I get TensorBoard on the Google Colab version?
Every time, I get an error code and it won't show up.
What do you mean by overtraining? Do you mean more epochs, or more new datasets fed into the pretrained model? Sorry for my bad English ❤
When I run the venv Scripts activate command and hit Enter, I get "cannot be loaded because running scripts is disabled on this system". Help!
Hey Jarod, thanks for all the videos! I was wondering if you could make a video on how to set it up in Roblox, CS:GO, Valorant, etc., because it doesn't seem to be working.
Great video!
1. Can you explain, once and for all, the importance of the index file?
I literally can’t hear the difference.
2. Can I regenerate an index file if I have only the pth file?
Thanks!
You have amazing RVC content.
The index aims to keep more features of the voice, so it can help with maintaining accents. I can usually barely hear the difference as well.
Unfortunately, you need all the audio training data to rebuild the index file; the .pth alone isn't enough.
How can I uninstall the AI voice changer?
Hi. How do I train above 1000 total training epochs? If I enter a bigger number, it gets capped at the max of 1000 epochs.
Is there a way to continue training from the last epoch I stopped at? Let's say on another day, after I've already closed the app. Thank you.
Sir Jarod, do you have a video on hyperparameters?
Hi, I set the save frequency to 50, and every 50 epochs it creates 2 different .pth files: one called G_<epochnumber>.pth and another called D_<epochnumber>.pth.
When I try to load either of them to test it out, I get an error.
Did yours get loaded automatically into your voice models' .pth list?
Thanks!
Hi Jarod! What is the duration of the WAV files in the 0_gt_wavs and 1_16k_wavs directories after step 2a? Mine are 3 seconds long. I have a fairly clean dataset, 3 hours long, but loss_mel seems to be stuck around 20 after 120 epochs, and the generated voice sounds a bit synthetic.
I often see that longer datasets may not be necessary; I would try a one-hour dataset to make training a bit faster, and then try training for longer. The files in folders 0 and 1 are from the preprocess step and will be cut to around 3 seconds for RVC to process.
The synthetic sound may depend on what you're running inference on, so try other audio files to see if that could be the case here too.
On RVC, is the Houshou Marine voice already available?
Good video!! I like your content about RVC (although I don't know Python lol). Which would you say is the best extraction algorithm between harvest, crepe, and mangio-crepe? (I have a good dataset of spoken audio only, but I would like to use the AI to do covers.)
If it's a spoken dataset: preprocess with harvest (best for speech) and later run inference with mangio-crepe.
@@blakusp What if it's a singing dataset? What's the best option for that?
@@nxrthsidebxy mangio-crepe for sure
What about rmvpe? @@blakusp
What do you mean by steps?
Hi Jarod! Can you make a tutorial about using RVC without the interface, via command-line commands? I'm trying to integrate RVC into my Python code: text-to-speech first, then feeding its output into RVC. Thanks!
Is overtraining bad, quality-wise?
I want to create an audiobook with my cloned voice for free. Is there anything that can do it? Please reply.
Hey Jarod, great video! One question though: which folder did you take the models at steps 10k and 70k from?
The RVC folder, specifically the weights folder.
Only my final version is in there; what about the versions in between?
@@jerrythefeared They should be in Drive under rvc_backup -> weights, in the format 'modelname_e[epochnumber]_s[stepnumber].pth'.
I think you just have to train more epochs to get higher step counts, thus potentially improving the network's performance (though not always). The number of individual steps (parameter updates) in an epoch depends on the dataset size and batch size. For example, with a dataset of 10,000 samples and a batch size of 100, each epoch would have 100 steps.
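The arithmetic, if anyone wants to check their own run (plain Python, nothing RVC-specific):

```python
import math

dataset_size = 10_000  # number of training samples
batch_size = 100
epochs = 200

# One step = one parameter update = one batch processed.
steps_per_epoch = math.ceil(dataset_size / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 100
print(total_steps)      # 20000
```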
Hi, do you know if a model created with RVC can be reused with other clients, for text-to-speech for example? In ONNX format?
No, TTS is a completely different architecture, and at the moment I don't know of any way to convert between the two.
Hello, I have a question and I cannot find any solution for this. I have 1,277 WAV files of around 5 seconds each for training. On my RTX 3080, it took 13 hours to run 10 epochs with a batch size of 26. Is it normal for it to take so long with this dataset? I want to train for 200 epochs in total; how do I go about that? On Colab, one epoch on the same dataset took a few minutes, but I can't use it because the notebook terminates too early. I then used a dataset with 200 files, but it still took very long, and with 20 files one epoch took maybe 5 minutes? I honestly don't know what the issue is.
Odd, you might wanna increase your batch size. That comes out to less than 2 hours of audio for the whole dataset... so at most I would expect it to be closer to maybe 10 minutes per epoch.
Make sure you have torch installed correctly on your system.
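A quick way to rule out a CPU-only torch install (a common cause of epochs taking hours) is something like:

```python
import torch

print(torch.__version__)          # a CUDA build looks like e.g. 2.0.1+cu118
print(torch.cuda.is_available())  # should print True if the GPU is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. NVIDIA GeForce RTX 3080
```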
Have you played with mangio crepe v2 at all? I was running it on Colab and it had TensorBoard built in. It seems like everybody is using it now, but I don't know that I was getting much better/faster results. Just curious whether you haven't gotten around to messing with it yet, or whether you think RVC v1 is better.
Thanks for the videos, very helpful!
I haven't gotten around to Mangio's V2, but a lot of people recommend it! I will test it out sometime, so until then!
@@Jarods_Journey You need to delete and reinstall the latest Mangio RVC. I did a git pull and mangio-crepe didn't show up.
Your videos are 10/10
If I've trained a model for 1,000 epochs, can I train it again for 1,000 more after it's finished? It seems 1,000 is the max RVC will do at one time.
RVC's max is 1,000. You can try downloading Mangio's RVC, which allows you to train longer, I believe.
The only way to enable it in base RVC is to manually edit the parameter in the Gradio code.
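For anyone comfortable poking at the code, the edit is just raising the slider cap in the web UI script. A hypothetical reconstruction of what the control looks like (the actual file and variable names in your copy of RVC may differ):

```python
import gradio as gr  # RVC's web UI is built on Gradio

with gr.Blocks() as demo:
    # The epoch count is a Slider whose `maximum` enforces the 1000 cap;
    # raise it (e.g. to 10000) to allow longer runs.
    total_epoch = gr.Slider(
        minimum=2,
        maximum=1000,  # <- edit this value
        step=1,
        value=20,
        label="Total training epochs",
    )
```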
Thank you, Jarod! ♥
Hi, can you mix several models together?
For mine, it just says NaN for both the Smoothed and Value fields; how would I fix that?
Something may have occurred during training... though I don't know what. You may have to run training again under a different experiment name to get the training graphs.
Hey Jarod, how about a tutorial on creating deepfake or deepface or faceswap videos? I think you could show us how to do it better than others. Thanks
Wait, so can I watch this TensorBoard in realtime to save time, and stop training when it flatlines?
Can you ELI5 a simple rule of thumb for choosing how many epochs to train? I've got a TensorBoard graph that goes down until 10k and then starts rising again from 25k to 40k...
There's no rule for epoch count because all datasets are different. You may converge faster on one compared to another. It's pretty much a game of just looking at your training and making an educated guess about the best stopping point.
@@Jarods_Journey Good to know, thanks.
Do you have a discord server at all? There are discords for basically everything but I can't find one for local TTS.
When you're testing, are you referencing the index file at all? Is it really needed?
The index is not needed for good quality; it mainly just affects how close the voice's accent is.
Hi Jarod, thanks for all the good work. I trained for 10,000 epochs and checkpointed every 50 epochs, and I don't have even one model that lands on a low point of the graph; they're all at "high" points. What can I do to get the best one :/ ?
You could try saving more often so one gets saved at a valley, but I just recommend you pick one and listen to it. Graph numbers don't really mean much; as long as the loss was going down, it should be fine in most cases.
Once I train a model, how can I make it available to my customers?
Well, most likely you'll want to host it somewhere and then allow your customers to select a voice and use it. Either that, or you can distribute the model to them... though that would most likely be too cumbersome, I'd assume.
Very informative, thanks for the info
Bro, my buffer is really high and it's increasing the response time a lot. I've noticed yours stays almost at 0. How do I fix this?
Hardware limitation, unfortunately.
What’s a good way to know what octave the training data is in?
By listening to it :)!
Hey there, is there any guide for downloading the AI soundboard? I'm kinda new to this thing.
Not sure what the AI soundboard is, but no guide on my channel
@@Jarods_Journey ah okay!
Feels like you can find RVC models everywhere, but now none of the programs for training new models work.
you are the best !!!
Good job... but now I'm hearing about rmvpe...?
I can use a similar approach in Colab as well, right?
Yup, you can run TensorBoard in Colab if you want, but you'll have to find the code to do that somewhere out there.
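For reference, the usual pattern in a Colab cell is the notebook extension plus the log directory. The path below is only a guess at where the RVC notebook keeps its logs, so adjust it for yours:

```python
# Run in a Colab cell (these are notebook magics, not plain Python).
%load_ext tensorboard
%tensorboard --logdir /content/Retrieval-based-Voice-Conversion-WebUI/logs
```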
This is super useful, but for some reason my TensorBoard graph does not look as detailed as the one you have. I trained for 200 epochs and checkpointed every 20 epochs, but my graphs don't plot with that many points: I only see about 2-3 points (very simplified) vs. what you're seeing, with lots of points for each step. I made a fresh conda environment with Python 3.9 and the latest TensorBoard and followed your steps. Any advice?
I saved at every epoch lol, that's why. That was for video purposes only; I don't recommend you do this, as it increases the total training time.
great stuff!
Help, my 6 GB 3060 runs out of VRAM. Can I fix that, or should I just cry?
Lower your batch size or make sure your dataset samples are all 10 seconds or less. Other than that, 6 GB may just be too small to run this nicely.
@@Jarods_Journey Stupid question: I have 3 hours' worth of clean and crisp audio recordings. If I cut that up into roughly 10-second pieces and end up with like 1,000 files, do I need to train on them one by one, or can I just process them all at once?
Thanks for the reply.
@@whitegoose-k4g You can put all the files into one folder after chopping them up, and the whole folder will be processed. The more audio you have, the slower it will run; you will also need to lower your batch size significantly.
It is useful to chop up the data so it can be loaded and buffered in smaller chunks. Smaller chunks provide more flexibility in loading the data, allowing the buffer to stay as close to full as possible.
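If you want to script the chopping, here's a minimal sketch with pydub (assumes `pip install pydub` with FFmpeg available; the paths are just examples):

```python
import os
from pydub import AudioSegment

audio = AudioSegment.from_wav("dataset/full_recording.wav")  # example input
chunk_ms = 10_000  # 10-second chunks, per the advice above

os.makedirs("dataset/chunks", exist_ok=True)
for i, start in enumerate(range(0, len(audio), chunk_ms)):
    chunk = audio[start:start + chunk_ms]  # pydub slices by milliseconds
    chunk.export(f"dataset/chunks/clip_{i:04d}.wav", format="wav")
```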
What's the difference between RVC v1 and RVC v2?
They describe V2 as using a "768 dimensional feature of 12 layer hubert", which means it can perhaps capture more features of the voice, since the feature vector's dimension is larger and it trains through more layers, hopefully learning the complexities of the voice better.
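Shapes-wise, the difference is just the per-frame feature width (an illustration only, not the real models):

```python
import torch

frames = 100  # about 2 seconds of audio at HuBERT's ~50 feature frames/sec
feat_v1 = torch.zeros(frames, 256)  # v1: 256-dim projected HuBERT features
feat_v2 = torch.zeros(frames, 768)  # v2: 768-dim 12th-layer HuBERT output
print(feat_v1.shape, feat_v2.shape)  # torch.Size([100, 256]) torch.Size([100, 768])
```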
How do you create an executable for TensorBoard instead of always typing the command manually in VS Code?
ChatGPT it :), that's what I would do. Should just be a few lines.
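Something like this saved as launch_tensorboard.py would do it; a sketch that assumes `tensorboard` is on your PATH (e.g. the venv is active) and the event files live under ./logs:

```python
# launch_tensorboard.py - one-click TensorBoard launcher (sketch).
import subprocess

subprocess.run(["tensorboard", "--logdir", "logs"])
```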
wallpaper link?
You can probably find it easily by looking up Rem landscape wallpapers lol, I don't have an exact link.
man, we arrived in cyberpunk
Thank you!
very useful
I just get 'NaN' when training and can't get a clear voice like others get.
Maybe try setting the learning rate lower? You'd have to edit the config file in the configs folder.
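The relevant fragment would look roughly like this; the layout is assumed from the VITS-style configs RVC uses, and 0.0001 (1e-4) is simply half the usual 2e-4 default as a first thing to try:

```json
{
  "train": {
    "learning_rate": 0.0001
  }
}
```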
What GPU do you use? I had the same issue on my 1660 and fixed it myself, and reported it on the GitHub.
[overtrains it on purpose for the lols]
Could be the voice effect you're going for lool 👀
3:11
croissant
jesus, why so complicated
Python is a terrible language with a terrible platform, yikes.
Can we increase the epoch limit from 1000 to something higher in offline training? If yes, please guide me. Thank you 🙂
1k is the max for base RVC :)!
@@Jarods_Journey Then how can I increase it? Do you have any clue? Is there any other version of RVC out there which supports epochs beyond 1k?
Hey, on the Gradio page, use inspect element and change the max value while the epoch slider is selected. Do it for both the slider and the number box.
@@hipete1295 I've tried this, but no luck. I'll try again with hope.
I found the way bro.