I've been using RVC for the past couple of days with better-than-expected results. With these tips I feel like I'm gonna take my projects to the next level. Thank you for all the hard work!
Best of luck!
RVC is damn good. Not only can it clone one voice to another, it can do so in another language.
Heck, I've been using it to change the voices of some anime I watched to my favorite actors.
Fun, huh? 🍿
You're able to do other languages? I tried it with German and it captures my voice pretty well, but the pronunciation of words is kinda terrible. Do we need more training data for other languages? Or do you have any additional tips for me?
@@dthSinthoras You need a voice in English for music in English, or a voice in German for music in German. If you use a German voice for English music, you get an accent. I tested with several languages; you have to keep the voice and the music in the same language.
woah that's so cool!!
@@LegendaryLibrary-l2u I think the accent is pretty cool
You are the coolest geek ever....
The final duet is gold
Thanks! Glad you liked the things 😀
Great practical advice. I've personally used the free Adobe AI voice enhancer to make the extracted voices clearer, but UVR seems very promising. Will have to try that out. Thanks!
Sandesu.-Hen!
I use UVR all the time and it's great.
I always thought crepe would give the best results. Thanks for the research and the useful tips on how to get the best results!
Why is it that when I download voice models from the weights site and run inference on a vocal track from a song, it sounds like crap? It'll sound normal until a certain part comes up, and then it'll sound like somebody getting strangled.
That's nerdy to the core, thanks!
The outputs seem to be the best, but the annoying quirk is that each time you want to train or retrain a model, you have to input the settings all over again. And there's no quick or convenient method to share and publish methods so that others can use them.
This is all very interesting, but I'd like a direct answer to these questions if you don't mind:
- Does crepe or mangio-crepe make a huge difference in voice quality compared to harvest? I mean, is it really night and day, like comparing pm to harvest? I have a version of RVC without crepe, and I'd like to know if it's worth upgrading ^^.
- I've been using UVR5 to isolate voices, and it is indeed great... except for the quality loss. What's the current best workflow with UVR5 to avoid losing voice quality? Basically, which models and processing methods work best to isolate a voice from most songs?
- My voices tend not to articulate as well. I assume I'm not doing something right when I create the model. I'm using 7 minutes of voice samples extracted from a videogame, so it's professional audio quality. The voice is very recognizable, but I do feel it sometimes doesn't articulate. Is it a current limitation of voice cloning tech?
Thank you for your insight!
I'm also struggling with the last one. I personally assume it's due to the low amount of training data, but I haven't tested that. My other theory is that I'm not extracting the vocals from songs correctly.
Super interesting. I really like the invert audio trick!
Glad you liked it!
thank you for your hard work NR
On that note, what would be better at this stage: text-to-voice with a custom trained voice, or text-to-voice with a built-in Bark voice and then voice-to-voice?
Thank you for your hard work making this video! Looking forward to a sample cover song with your model.
I think that's all too nerdy for me. I'm pretty happy with the normal version, but it sometimes doesn't find the index file, which sucks. Right now it's much too warm in my apartment, so I'm taking a break from training new models. Thanks for the tips!
I haven't listened to this song since I played Steno Arcade! (Which is free to play either for download or on Steam.) Presumably you both used it because of the license; I can't remember if it was commissioned or not.
Creative Commons licenses are cool - especially the ones that allow derivatives 😉
Hell yeah waited for this!
That's insane! Consider my mind blown.
Good advice about sharing; I had somehow almost managed to understand that on my own, but you've confirmed my theory.
The sound is incredibly amazing, but unfortunately, it's not working for me, even though the Retrieval-based-Voice-Conversion-WebUI is working perfectly.
I keep getting an error indicating that there is an issue with Torch, even though I am certain that it is installed and present.
I'm using Windows 10.
Can you help me?
If I choose that a model should have no pitch and I train it in the new version of RVC, I can still choose which pitch algorithm to use. This means it still uses RMVPE in the new version, and the quality is not particularly good either. Hope it gets fixed. Try choosing false in both the old and the new version.
I've played around a lot with the TensorBoard graphs now. Why do you set the smoothing this high? Because you want to see which curve looks better overall? For finding the best epoch, wouldn't it be better to set it to 0?
It shows the trend, which can be difficult to tell without smoothing
@@NerdyRodent Do you have something (a Discord, Google Drive, whatever) to discuss curves? I would love to deep-dive into that topic :)
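For reference, TensorBoard's smoothing slider applies an exponential moving average over the logged points, which is why a high value shows the overall trend while 0 shows the raw values. A minimal Python sketch of the idea (illustrative only, not RVC or TensorBoard code; 'raw' is a made-up stand-in for a logged loss curve):

```python
# TensorBoard-style smoothing: an exponential moving average over logged points.
def smooth(values, weight):
    smoothed, last = [], values[0]
    for v in values:
        last = last * weight + (1 - weight) * v  # blend previous EMA with new point
        smoothed.append(last)
    return smoothed

raw = [1.0, 0.8, 0.9, 0.6, 0.7, 0.5, 0.55, 0.4]  # hypothetical loss values
print(smooth(raw, 0.0))  # identical to the raw curve: best for picking an exact step
print(smooth(raw, 0.9))  # heavily smoothed: best for judging the overall trend
```

So high smoothing helps when eyeballing which run is better overall, while the raw (unsmoothed) values are what you'd read to pick one specific best point.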
Thanks for this, sir. BTW, is the voice you use a voice model? lol
Does anyone know of any communities where people are sharing their vocal models for RVC?
Using a translator: when I train a voice in the new version of RVC, it looks as if it has chosen a pitch algorithm even though I chose not to use pitch during training. Try training a model in version RVC0813Nvidia and then in version RVC1006Nvidia. I mean, when you train you can choose whether to use pitch or not, and the sound is more natural in the old version when you don't use pitch. There is also a synthesizer input module, but how can we use this?
How can I combine Harvest + DiffGrad and Adam when I just use the RVC interface from Mangio or Applio? In the JSON file, under learning rate, I have this entry: "learning_rate": 1e-4. Why is it different from yours? How should I change it?
Could you please react to the new conversion mode, rmvpe? Is Harvest better, or the new one? And please show how to set up the optimizers.
I need an OBS Studio live-streaming tutorial with this voice cloning tool, please :)
Could we make a shared Dropbox? I'd like to know: if you trained one of my datasets with the false setting at a batch size of 40, would the autotune effect go away? My dataset is 1 hour long, trained for 1000 epochs. Thanks. Also, how much can I overclock my GPU in EVGA Precision X1? My graphics card is 400.
I'm just trying to explain something more clearly via Google Translate: during training you have to choose 'Whether the model has pitch guidance (required for singing, optional for speech)':
true
false. When you choose false and then run a Model Inference, it has still chosen one of those rmvpe options. That is, if I load Ai_Hoshino_TTS.pth it doesn't ignore RMVPE, but the old version does.
Can I send you my dataset? It is 1 hour, 7 min and 6 sec long; the index file only gets up to 600.2 MB. Could you train it for 1000 epochs at 48 kHz with pitch set to false? I would like to hear the difference between a batch size of 26 and a batch size of 40. We could make a Dropbox folder.
This app is so much fun!
Sure is!
When I use Ultimate Vocal Remover, the VR models are very bad: every voice sounds like Mickey Mouse and the separation is very poor. Maybe I installed something wrong, I don't know. The MDX models sound good, but they seem to run on my CPU instead of my GPU.
Ay this is awesome! After getting the hang of so-vits, this is looking real promising.
Just wondering though, is there an easy way to use a model for TTS, or would I need to create a separate model specifically for that?
Sandesu.-Hen!
And for changing the learning rate, what am I supposed to change it to? Can this be done if a model is already training, or would I need to retrain with these changes made?
Hey, how can I use the harvest + DiffGrad setup for training? I can't find it in Google Colab.
I don't understand why you did the audio at 40k. Please explain. Does it change something in the long run?
Hi Nerdy, thanks for the tips! Is it necessary to update the RVC that I installed a couple of weeks ago?
I ask because I only have pm, harvest and crepe in my RVC.
Personally I git pull every time, but whatever version you've got should do!
If I make a model that doesn't respect pitch, how can I get it to change pitch when I transpose it? I have a Daft Punk vocoder model; when I transpose, will the model still work even though it has no pitch?
Try including more pitch variety in your training data
@@NerdyRodent Can I send you the Daft Punk vocoder model?
I really need some help. My software decided to stop working. I made a few cover songs and they turned out cool, but now, for some reason, when I click convert on Model Inference it just stops and will not process. I had another app that converted the sounds with my models, RVC-GUI-pkg,
and it just does not work anymore and gives me a blank mp3 file... Why would this happen? Is there some time limit for these apps? I see nothing on the internet about this, and I changed nothing in my directories that would explain why this would occur. PLEASE help, I want to make some more covers and I don't know why it's not letting me.
You may need to git pull again to get the latest version (assuming you did a normal install, not a zip file)
How can people train without v1 or v2? I have a model where it says none.
I have 128 GB of system RAM, but only an 8 GB GPU I bought yesterday on eBay.
Is this going to work for me with only a 3070 8 GB card?
That is impressive quality, I have to say. I wonder, does it work for text-to-speech, or is it only for singing?
You can do tts then convert that to a new voice. If it’s not singing, probably best to keep singing out of your dataset
Impressive!
Glad you liked the things!
Hi there, I am playing around with Tortoise for text-to-speech and RVC for speech-to-speech, but with both I have a pretty similar problem: they sound like me, but they can't pronounce German sentences.
Are they just not trained for that? Are there other base models for other languages? Or is my training data of ~45 minutes of speaking still not enough?
You may need to ensure that you have a balanced range of phonemes in your dataset.
@@NerdyRodent So.. it should be possible yes? :)
Quality, not quantity, I say. I have an English voice speaking Vietnamese, Indonesian, Japanese, French, Russian, Cantonese, Mandarin, and Dutch, so of course it can handle many known languages. It can even pronounce Middle English! I just wish there were a Middle English text-to-speech so we could hear Chaucer-style parodies of current events... But German should be very easy for it. Try singing a little maybe, or varying the enunciation of common words.
What does changing the log interval from 200 to 100 do or benefit? Also, if I'm using v2 models, would I need to change the 48k_v2 JSON config file?
It just logs more often. Who doesn’t love extra datapoints? 😉
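For anyone hunting for these settings: they live under the "train" block of the training config JSON. A minimal sketch of editing them from Python, assuming the stock RVC field names and a config path like configs/48k_v2.json (your fork's path and key names may differ, so check before running):

```python
# Minimal sketch: tweak RVC training settings in the config JSON.
# Field names match the stock RVC configs; the path is an assumption.
import json

path = "configs/48k_v2.json"  # assumed location; adjust for your install
with open(path) as f:
    cfg = json.load(f)

cfg["train"]["log_interval"] = 100    # log twice as often (more datapoints)
cfg["train"]["seed"] = 1234           # the RNG seed used for training
cfg["train"]["learning_rate"] = 1e-4  # stock default mentioned in the thread

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```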
What does the X-axis on the graphs represent?
Steps
@@NerdyRodent Thanks, one more question: does the file format used for training (wav, mp3, flac) affect the resulting model's output accuracy and quality in a noticeable way? If so, what export setting do you recommend in Audacity?
@@toasteroven6761 go for wav
@@NerdyRodent Welp, looks like I'm going to have to wait over 4 hours just to upload my 50-minute wav file to Colab (rural internet)...
So mp3 must really tank the quality then...
*I need to translate an srt (subtitle) file from English to Arabic. Please, any good solution with the best quality, human-like translation? Not Google Translate or anything like that.*
And what batch size should I train on?
How can I continue training my pre-trained model later? I want to train it for 20,000 epochs for the most realistic quality, but I need to run the same pre-trained model day after day... Is that possible on the EasyGUI Colab? I have Colab Pro.
Sure, you can do that on Colab, though 20,000 epochs is a bit high…
@@NerdyRodent How can i?
Is there a TTS tool that can use the self-trained models?
As this is voice-to-voice, and TTS generates a voice, pretty much any TTS-generated voice will do!
Hello, and thanks for your video. I struggle a little with getting voice data for training, and with training time (I don't have much).
Is there an AI to change the lyrics of a song? Like, in goes a singing voice and out comes the same voice with the lyrics changed.
Probably best done by humans at the moment!
What equipment do you have? I mean, how much did you pay for your computer? Can you use a 12 GB graphics card, or do you have to have 16 GB to be on the safe side?
The more VRAM the better, though it all depends what one does with one’s computer
So I can train the most on 16 GB?
Is it possible to use the files without having the index? Many people have shared models without this file, and they seem to work.
Yup, it just sounds a lot better with the index file too in my experience
@@NerdyRodent OK, thanks. I tried several times to copy voices, but without the index it never worked. I also tried RVC-beta, which launches quickly and has a simplified interface, but you have to use zip files that contain the "index" file inside. Is there a way to create this "index" file?
@@LegendaryLibrary-l2u not without the original dataset
Sorry if this is a stupid question, but how can I resume the training at a checkpoint, if possible?
Yes, you can just resume
@@NerdyRodent Should I just swap the pretrained models in step 3 for the checkpoint model to do that?
@@exidion54 simply increase the number of epochs and press train again
Was wondering what batch size you used for this?
As much as your GPU can handle 😉
@@NerdyRodent I have a 24 GB 3090, but I don't know why it's training so slowly. I set it to a batch size of 40 and left it for many, many hours; it only hit 40 epochs. Very weird: epoch 34 took 1 hour and 44 min (I wasn't home to notice when that happened), epoch 44 took 2 hours and 45 min to finish, and another took 58 min. I am very confused.
@@mahmood392 it could be that you have a two or three hour long dataset. The bigger your dataset, the longer it will take. Usually 30 minutes is absolutely fine.
@@NerdyRodent It was a 10-minute dataset. Something was broken, because some epochs took 3 hours while the epoch right before or after took just 2 minutes... Anyway, I ran it again with a batch size of 28 and it finished in 16 minutes, at 5 seconds per epoch.
Can I easily train on the 4080, or do I have to have a 4090?
Any modern Nvidia GPU will do!
@@NerdyRodent I mean the performance of the two, i.e. the 4080 versus the 4090.
4090 is better
Does anyone know if it's possible to use trained voices with text to speech somehow?
Yup! TTS -> voice -> voice
Can I do text-to-speech with this? How can I do that? I really need TTS :)
Can it change a voice live, for making videos or calls?
Yup, it’s just about fast enough for real time
@@NerdyRodent How do you get the real-time mode up? I can't see it, please help.
I didn't quite understand what this does, but I really felt the scale of the experiment. You gave a great demonstration; I'm eager to follow how this unfolds. More demonstrations will be welcome 😮!
Hi Nerdy, thanks for the tips. Would you please share a zip file of Mangio-RVC-Fork, same as the RVC-beta 7z, so I can run it directly on my Windows machine? Thank you.
Can you tell us where to change that seed?
It’s in the config file for your training, as shown
How do I change the optimizer?
What batch size do you train on?
40
@@NerdyRodent So if I can only train with 24, am I not getting good quality?
@@denblindedjaligator5300 24 is fine 👍🏽
I tried to reproduce your optimizations, but I get this error: AttributeError: module 'torch.optim' has no attribute 'DiffGrad'
How do I fix that?
AdamW and RAdam are working; I just can't get DiffGrad to run as the optimizer.
The output with RAdam is just silence for me :O
DiffGrad isn't a default optimiser; it's part of the separate pytorch-optimizer package.
@@NerdyRodent I have questions regarding this, but they always disappear, so this comment is just to see whether YouTube doesn't like my questions, or whether everything disappears...
@@NerdyRodent OK, my test answer seems to stay, so YouTube probably doesn't like code snippets here? I tried to show what I attempted, because I still get the same error. I'll describe what I tried without using anything that looks like code...
I did a pip install torch_optimizer and I replaced AdamW with DiffGrad.
What else is needed?
Oh, and thank you for the other answer; that seemed like a good hint at least!
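For anyone hitting the same AttributeError: DiffGrad is not in torch.optim; it comes from the third-party torch-optimizer package (install as torch-optimizer, import as torch_optimizer). A minimal sketch of the swap, using a stand-in model since RVC's real train script builds its own parameter groups:

```python
# pip install torch-optimizer   (imported as torch_optimizer)
import torch
import torch_optimizer  # third-party package; DiffGrad is NOT in torch.optim

model = torch.nn.Linear(10, 1)  # stand-in for the RVC generator/discriminator

# Instead of: optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
optimizer = torch_optimizer.DiffGrad(
    model.parameters(),
    lr=1e-4,            # matches the "learning_rate" field in the config
    betas=(0.8, 0.99),  # RVC's stock betas
)

# Standard training step, just to show it is a drop-in replacement
optimizer.zero_grad()
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()
optimizer.step()
```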
Hello Nerdy MASTER! Why index first, training second? Thanks!
Index trains fast 😉
Teach us how to install and use Bark!
Freaking awesome
😀
DAMN, why not make a Windows no-code version: a screen with buttons (START, COPY, CHANGE, OK, CONVERT)?
What nvidia card is supported?
You’ll get the best performance with something like a 4090. Nice and fast with plenty of VRAM!
@@NerdyRodent What about the 4080, is it good? And 16 GB of RAM?
@@denblindedjaligator5300 16gb VRAM can do lots of things for sure
@@NerdyRodent What is the learning rate and the seed? Could you make a video about Piper TTS?
Who on earth could be knocking at my door? Come on, don't come here anymore. Can't you see it's late at night? I'm very tired and not feeling well. All I want is to be left alone. Don't come near; don't break into my house. It would be best if you just hung around outside. Don't come in; I'll just run and hide. Who could I become now? Who could I become now? Who could I become now? Who could I become now?
Hi Nerdy Rodent, do you have a Discord community?
Great video!
Thanks! Glad you liked it
@@NerdyRodent I'm getting a pretty robotic voice with a lot of random high-pitch "artifacts". Any quick tips you can give me?
Tips, Tricks and Bookmarks
"Download the weights" Weights? What weights? Where?
From the link to the weights on the GitHub page ;)
Pretty impressive that it can make a rat talk with a British accent!
😉
Why did you train at 40k and not 48k?
Wow!
*Open source & free: Tenacity, instead of Audacity*
RIP Audacity 🪦
Wow
best
very nerdy :)
Why does my trained voice not sound as intended? What can go wrong?
One thing that could go wrong is having a very poor-quality dataset.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 4.00 GiB total capacity; 3.03 GiB already allocated; 0 bytes free; 3.49 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
You're better off using Colab if you've got 4 GB VRAM. Even games today will struggle on just 8 GB!
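For reference, the knob that error message mentions is the PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch of setting it before CUDA initialises (the 128 MiB split size is just an example value, and it only reduces fragmentation; it can't create more VRAM, so a smaller batch size is usually the real fix):

```python
# Set the allocator config BEFORE torch initialises CUDA, e.g. at the very
# top of the launch script (or export it as an environment variable in your shell).
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value

import torch  # must be imported after the env var is set

# The other practical lever on a 4 GB card is batch size:
# e.g. a batch of 4 instead of 40, at the cost of slower training.
print(torch.cuda.is_available())
```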