Super Fast Voice To Voice AI! | Voice Cloning with so-vits-svc

  • Published 24 Jul 2024
  • Clone any voice! Have you ever wanted to sing a song in a language foreign to you? Can't be bothered to actually learn how to speak that language? No problem! Thanks to the amazing power of FREE open source Voice To Voice technology, anyone can remix any audio with the voice of their choosing - in just SECONDS from any trained model! So fast, you can even run it in real-time :)
    German, French, Dutch, Japanese, English, Welsh, Cornish, Spanish, Portuguese, Bengali, Greek, Latin, Sumerian, Akkadian, Kawishana, Paakantyi - whatever language you speak or want to speak (or sing!) it does them all! Any language really does actually mean exactly that 😉
    Super quick and easy to install locally for maximum fun! Not suitable for children.
    Some ideas for research:
    * Want to hear a loved one again? If you’ve got their voice recorded, then you can do exactly that
    * Lost your voice but have some old recordings? Hear yourself speak or sing again!
    * Develop a way to have a chatbot speak to you using your voice
    * Take audio generated by a TTS and convert it into your own voice
    All free, ready for you to run on your own computer at home (or using Google Colab).
    Enjoy!
    Update: for issues relating to “fairseq” on MS Windows, see the GitHub page for fixes!
    == Links! ==
    * so-vits-svc-fork - github.com/voicepaw/so-vits-s...
    * song - pixabay.com/music/pop-french-...
    * anaconda - www.anaconda.com/
    * pytorch - pytorch.org/get-started/locally/
    * AudioSlicer - github.com/henrymaas/AudioSlicer
    * Spleeter - github.com/deezer/spleeter
    * Demucs - github.com/facebookresearch/d...
    * Installing Anaconda for MS Windows Beginners - • Anaconda - Python Inst...
    * Text To Speech with Tortoise TTS - • AI Voice Cloning - Tor...
    == Stable Diffusion ==
    * Talking Faces! - • Create your own animat...
    * Stable Diffusion Playlist! - th-cam.com/play/PLj.html...
    * Interested in adding things to your AI Art? Try these!
    * Dreambooth Playlist - • Stable Diffusion Dream...
    * Textual Inversion Playlist - • Stable Diffusion Textu...
    Note that GitHub repositories change often. Be sure to check for any changes!
    0:00 Introduction to voice to voice cloning
    1:34 so-vits-svc installation
    3:56 Downloading so-vits-svc pre-trained models
    4:06 Voice dataset for so-vits-svc
    5:56 Training so-vits-svc
    10:26 so-vits-svc inference
  • Science & Technology

Comments • 766

  • @RolandaSupsene (1 year ago, 60 likes)

    Nerdy Rodent, I am always amazed at your brilliance. You are at the top of your game as always! Thank you for your talent.

  • @sharifhamza25 (1 year ago, 47 likes)

    Brilliant tutorial. I love your videos, especially because you use open-source programs. I wish you would make an AI music generator tutorial next. Thanks again for your great work.

  • @TheMohasher (1 year ago, 15 likes)

    Just stumbled upon this channel while looking for a tutorial for so-vits-svc. I'm thoroughly impressed by the quality of this tutorial. You couldn't be more thorough or clearer. Not only did you provide a tutorial, but you also explained what to do about splicing and removing the vocals. You've earned a subscriber.

    • @NerdyRodent (1 year ago, 3 likes)

      Great to hear, thanks! :)

  • @PleaseOpenSourceAI (1 year ago, 6 likes)

    Really like your tuts on the newest neural repos! Thank you, Nerdy Rodent!

  • @marcthenarc868 (1 year ago, 21 likes)

    Congrats. You got credited as a contributor to the so-vits-svc-fork repo. 👏👏👏

  • @PythonAndy (1 year ago, 5 likes)

    Amazing video, so well done, and I loved your way of explaining it!! ♥

  • @3Voc (1 year ago, 1 like)

    This must be one of the clearest tutorials about a combination of GitHub/CLI tools that I have ever stumbled upon. Stunning work 👍 Thank you

  • @Havelim (10 months ago, 1 like)

    You sing really well, pal 😆 While there are many idiots doing crap videos about it, yours is an excellent tutorial that covers all the necessary stuff. Congrats from Buenos Aires.

    • @NerdyRodent (10 months ago)

      Glad I could help!

  • @JavierGarcia-td8ut (1 year ago, 1 like)

    WOW!!! That's awesome!! I was waiting for something like this! Thank you very much

  • @amkire65 (1 year ago, 46 likes)

    About a year ago I thought nothing could top DALL-E... the last two weeks we seem to be taking 10 steps forward every day. I fancy hearing Donald Duck singing some AC/DC... and now it's possible.

    • @NerdyRodent (1 year ago, 4 likes)

      Because why not, eh?

    • @supremeturtle4620 (1 year ago, 2 likes)

      Lmfao dude, AI has limitless potential. Seriously, as a kid I knew this would happen one day, and I'm so happy it's happening in my teen years.
      One day, AI will be able to generate any scenario prompt you throw at it in amazing detail in a virtual reality space.

    • @sholonator92 (1 year ago)

      On the contrary, this is only the start.

  • @cosmicdust632 (1 year ago, 5 likes)

    The AI French wasn't that bad. Nice, would love some more in-depth guides/tutorials on this.

    • @NerdyRodent (1 year ago, 4 likes)

      There’s not much else to it tbh! 😃

  • @aaronperron (1 year ago, 32 likes)

    Nerdy Rodent, always dropping the best AI tuts around. Now just replace Snoop Dogg's voice with yours and make a sick intro for your vids 🎤🎵

  • @AaronALAI (1 year ago, 4 likes)

    Wow!! I'm training my voice right now! Thank you for the instructions.

    • @NerdyRodent (1 year ago, 2 likes)

      You got this! 👍

    • @Fridayynightss (1 year ago, 4 likes)

      How did you get this to work?? I've been at this for a legit 2 hours and I can't get the GUI to open.

  • @preparationsparkly (1 year ago)

    Thank you so much, this was super helpful! I have mine training now 🎉

  • @MrNorBro (10 months ago)

    My nerdy buddy again gave me exactly what I was looking for! Nice, mate! 👊😎 In the "AnimateDiff New Model! A1111, ComfyUI, ControlNet" video I did not understand what you meant with Conda, and then when I installed another AI program it messed up AnimateDiff ( : ... So it's very cool that you talk about this virtual environment feature at 02:35!

  • @Dittomaster4444 (1 year ago, 26 likes)

    For anyone who, like me, wondered why nothing showed up in the folder you set as the install path (I'm very new to all this): just create a dataset_raw folder, then a folder inside it named with the speaker ID (this can be anything), then put the WAV files there and run the commands as shown in the video. The folders you see in the video are created as you run these commands; they aren't there right after you run the command to install the so-vits-svc fork. Also, great video 💜
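The folder layout described in the comment above (a `dataset_raw` folder holding one sub-folder per speaker, each containing that speaker's WAV clips) can be set up with a short Python sketch. The helper `prepare_dataset_raw` and its arguments are hypothetical and not part of so-vits-svc-fork; it only creates `dataset_raw/<speaker>/` under the directory you plan to run the `svc` commands from and copies `.wav` files into it.

```python
import shutil
from pathlib import Path


def prepare_dataset_raw(project_dir, speaker, wav_files):
    """Copy WAV clips into <project_dir>/dataset_raw/<speaker>/ (hypothetical helper).

    Returns the list of copied paths; non-.wav files are skipped.
    """
    speaker_dir = Path(project_dir) / "dataset_raw" / speaker
    # parents=True also creates project_dir and dataset_raw if they are missing
    speaker_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for wav in map(Path, wav_files):
        if wav.suffix.lower() != ".wav":
            continue  # only WAV clips belong in the dataset folder
        target = speaker_dir / wav.name
        shutil.copy2(wav, target)  # copy2 keeps file timestamps
        copied.append(target)
    return copied
```

Afterwards, run the `svc` commands from `project_dir` itself, since the `svc pre-resample` error quoted further down this page shows it looking for a relative `dataset_raw` path.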

  • @YouMostTubeWanted (1 year ago, 10 likes)

    Awesome tutorial as always!
    Would love to see some free, open-source text-to-voice with good quality.

    • @NerdyRodent (1 year ago, 6 likes)

      This is indeed free and open source!

    • @xyzonox9876 (1 year ago, 3 likes)

      You may already know this, but TorToiSe TTS is currently the best open-source TTS AI. It's not as good as ElevenLabs, but it's pretty powerful: you only need a handful of example audio clips and a few minutes of training (each run, that's why it's called Tortoise). There's also an AI called Bark that focuses more on inflections and emotions, but it's currently in its infancy and doesn't reliably output usable results.

  • @num001koo (1 year ago, 8 likes)

    Here's a tip for those encountering the 'ValueError: Invalid integer data type 'f'' error in AudioSlicer, based on my own experience.
    WAV files can store samples either as integers of a given bit depth or as floats. If your WAV file is in float format, you'll hit this error, so export it in int16 (16-bit integer) format instead.
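Re-exporting as 16-bit integer PCM, as the tip suggests, is normally done in an audio editor. If you already have the samples as floats in the range -1.0 to 1.0, Python's standard library can write the int16 file; how you decode the float WAV in the first place is left open here, because the standard `wave` module only handles integer PCM and cannot read float WAVs. The function names below are illustrative, not part of AudioSlicer.

```python
import array
import sys
import wave


def float_to_int16(samples):
    """Clip float samples to [-1.0, 1.0] and scale them to signed 16-bit integers."""
    out = array.array("h")
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip so out-of-range values cannot overflow int16
        out.append(int(round(s * 32767)))
    return out


def write_int16_wav(path, samples, sample_rate=44100, channels=1):
    """Write interleaved float samples to a 16-bit PCM WAV file."""
    pcm = float_to_int16(samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample, i.e. int16
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

For stereo, pass the left/right samples interleaved and set `channels=2`.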

  • @migriv4603 (1 year ago, 2 likes)

    This is awesome!! Some day imma donate like $100 to this channel! Don’t have the money yet, but you’ve helped sooo much!

  • @johnpope1473 (1 year ago, 3 likes)

    Held out to the last minute and BOOM! It was worth it. This is amazing technology. Thanks for sharing. I'm super interested in seeing heavy metal drums transformed into softer brush percussion for, like, a jazz piano vibe. We already train for speech... and the separation part with Spleeter is straightforward... how can we do it? Someone must be working on this.

    • @NerdyRodent (1 year ago)

      I sort of want to try out-of-scope things like animal noises and other weird sounds 😆

    • @kyletrent.mp4 (1 year ago)

      Oh, there are people working on it alright, these tools are getting crazier by the day.

  • @knoopx (1 year ago, 2 likes)

    You have a great voice btw!

  • @drux9647 (1 year ago)

    I don't know anything about Python or anything related to AI... man, I can't even speak English properly, and look at me here trying to learn how to use this. Thx for teaching me and keep up the great work! 😁😁

  • @dabookwriter (1 year ago, 1 like)

    Excellent work.

  • @farooqansari (1 year ago, 1 like)

    Awesome content!

  • @pinpointping6175 (1 year ago, 14 likes)

    Thank you for the step-by-step guide and immense insight into the process and steps required to train and infer. We all really appreciate it! ☺️

    • @NerdyRodent (1 year ago, 1 like)

      You're very welcome, and thank you! 😀

  • @gamma3501 (1 year ago)

    This tech really fires up my spirit.

  • @daniel99497 (1 year ago, 1 like)

    Amazing voice!

  • @Some1uNo (1 year ago, 3 likes)

    Damn, another rabbit hole. Ahh. Awesome

  • @minidinkde (1 year ago, 2 likes)

    Awesome video! I've already made a dataset and it turned out great. However, if I want to create a separate new dataset of another voice, do I simply move my old dataset & log folders out, replace them with new ones and start training from there? Thank you

  • @Rangomania69 (7 months ago, 1 like)

    A very good tool, well explained. Thank you, sir.

  • @TheChrisLouis (1 year ago, 2 likes)

    It is so different when using Google Colab; I kind of wish you had also done a tutorial alongside this one. There is a lot of guessing and pretty much no info on this online except for this one video on the non-Colab version.

  • @Endangereds (1 year ago, 1 like)

    So, this was the surprise. Cool 😎🥶

  • @phizc (1 year ago, 1 like)

    Nice tutorial, but I wish you'd demonstrated it on normal speech too. Singing, I think, is actually easier to make sound realistic/good than normal speech, since it has regular pitch and rhythm, and we listeners are used to compression and loads of effects, etc.
    I had a listen to the Element Song short, and it sounded good, though in places it sounded like you had a sore throat. Not sure if that was the model training or poorly split voice/music tracks.

  • @adolforangel1045 (1 year ago)

    Great content, thanks for the tutorial!

  • @Nutronic (1 year ago, 7 likes)

    So we could get Elvis interviews for his voice and then have him sing Where The Streets Have No Name by U2?

  • @mikerhinos (1 year ago, 1 like)

    Thanks for this great video, my favourite rat!
    🥰

    • @NerdyRodent (1 year ago)

      Thanks, and you're welcome!

  • @atomictraveller (1 year ago)

    I've never found the video on YouTube, but back in the '90s, Ray Kurzweil demonstrated this technology (cepstral deconvolution, phoneme matching).

  • @StalwartNightmare (1 year ago, 1 like)

    Is there any way to make the real-time voice changer instant or near-instant without degrading the quality, or is that just an "in the future" thing? It sounds good, but the issue is that when I say something it doesn't come through for 3 to 4 seconds. I already have a way to use it in games through Voicemeeter, but that delay is a bit of a killer.

  • @joannot6706 (1 year ago, 2 likes)

    Pretty cool!

  • @fernandomasotto (1 year ago, 1 like)

    Thank you very much!! Great tutorial, but I still can't get it to work with Colab, and sadly I don't have enough VRAM to train locally. Would you please make a tutorial on how to train in Colab??

  • @diegocelixx4493 (1 year ago, 1 like)

    Hello, could you tell me what the command would look like to run inference in Colab without auto-prediction, and with crepe chosen as the inference type?

  • @SamirPatnaik (1 year ago, 1 like)

    Fab Fab Fabulous!

  • @flonixcorn (1 year ago, 1 like)

    So cool!

  • @alessandroflaborea2457 (1 year ago, 1 like)

    This is an impressive tutorial! I want to have voice-to-voice in Italian. Should I record my voice in English or Italian? I guess the model is pretrained only on English voices.

    • @NerdyRodent (1 year ago, 1 like)

      Either. You can use any language. Most of the examples were in Japanese.

  • @metaeditors (1 year ago, 1 like)

    Very nice, and it works on OS X too. A bit tricky with pyenv & Anaconda, but it works great even without a GPU.

    • @NerdyRodent (1 year ago)

      Awesome, good to know!

  • @jahhe2611 (1 year ago, 3 likes)

    Great tutorial! I'm missing a few things: what if you start a new training? Do you simply delete everything from the logs/44k folder and start training again, or is there some way to divide trainings into separate folders? Also, what's really the difference between the D_ and G_ files in the output?

    • @NerdyRodent (1 year ago, 1 like)

      Yup, you can save your old files before starting a new project. You only need the generator if you’re not going to continue training, so it’s fine to remove the discriminator if you’re only doing inference.
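Following the reply above, here is one way to set an old run aside before starting on a new voice. It is a sketch under assumptions: it assumes the checkpoints live in `logs/44k` (the folder named in the question) and are named `G_<step>.pth` / `D_<step>.pth`; the `archive/` folder and the `archive_run` helper are hypothetical. Passing `keep_discriminator=False` deletes the `D_` files, which per the reply is fine for inference-only use, but you can then no longer resume training from that archive.

```python
import shutil
from pathlib import Path


def archive_run(project_dir, run_name, keep_discriminator=True):
    """Move logs/44k to archive/<run_name>/ so a new project starts from a clean slate.

    Hypothetical helper; the G_*.pth / D_*.pth naming is an assumption.
    """
    src = Path(project_dir) / "logs" / "44k"
    dst = Path(project_dir) / "archive" / run_name
    if dst.exists():
        # shutil.move would nest src inside an existing folder, so refuse instead
        raise FileExistsError(f"{dst} already exists")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    if not keep_discriminator:
        # Generator-only archives are enough for inference, not for resuming training
        for d_ckpt in dst.glob("D_*.pth"):
            d_ckpt.unlink()
    return dst
```

Keeping the discriminator by default is the safer choice: an archive with both files can still be resumed later, while deleting `D_` files only saves disk space.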

  • @Dante02d12 (1 year ago)

    Whatever you do, do not make your mom sing this song, lol.
    Very impressive tech, and the install doesn't look too frustrating, I'll try it out! Do I have to read lines to clone a voice, or can I use samples? I'd love to clone some characters' voices.

  • @circuitguy9750 (1 year ago, 1 like)

    Great video as always! I would love to hear an English song so I can pick up on the tone and inflection errors. Curious if any French speakers can comment on how well it did.

    • @NerdyRodent (1 year ago, 2 likes)

      Then check out the YT Short where I sing the elements song, particularly near the end with the non-standard pronunciation of the word “discovered” 😉

    • @ericcaginicolau (1 year ago, 2 likes)

      As a French speaker, I can tell you that it sounds a little bit strange, as if he has an English accent, not like a native French speaker. But it’s still quite good.

  • @OMGITSGB (1 year ago, 1 like)

    Is there a way to add more data to a trained model, or do you have to retrain on the entire dataset?

  • @XellOwO (1 year ago)

    Great tutorial!
    But can you show how to continue developing the model with newly added sample data?

  • @Qubot (1 year ago, 2 likes)

    Another interesting video, thanks again.

    • @NerdyRodent (1 year ago)

      Glad you enjoyed it 😀

  • @randomlikeu (1 year ago, 1 like)

    Super cool!

  • @Ag47fr (1 year ago)

    Thank you for your video. Informative. Important

    • @NerdyRodent (1 year ago)

      You are most welcome. You may like my latest video about RVC, which has much of this built into a single web interface and trains faster!

  • @KingYoshi93 (1 year ago)

    7:30 I'm a little confused here. I'm not sure if I have any VRAM, because I don't have a dedicated GPU, only a CPU. I'm having trouble figuring out what settings to use. 🤔

  • @animeui_es (1 year ago)

    Great video!
    I have a question…
    Does it matter if some audio clips contain background music?

  • @user-dm4vz7hf6d (7 months ago)

    The following is from Google Translate. Sorry, my English is not good. I deleted some of the old D and G models and kept only some of the newer ones, which corresponded one-to-one. However, when I now choose to continue training, it tells me "load old checkpoint failed" and that it will start training from the beginning. What can I do to make it continue training from where it left off?
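One thing worth ruling out in a case like this is a generator checkpoint without its matching discriminator (or the reverse) after manual deletions. The sketch below only reports which steps have both files and which are orphaned; it assumes the `G_<step>.pth` / `D_<step>.pth` naming, the helper `check_checkpoints` is hypothetical, and it does not reproduce how so-vits-svc-fork actually picks the checkpoint to resume from. The "load old checkpoint failed" message may well have other causes too, such as a changed config.

```python
import re
from pathlib import Path

# Assumed naming: G_<step>.pth for the generator, D_<step>.pth for the discriminator
CKPT_RE = re.compile(r"^([GD])_(\d+)\.pth$")


def check_checkpoints(log_dir):
    """Return (newest step with both G_ and D_ files, sorted steps missing a partner)."""
    steps = {"G": set(), "D": set()}
    for path in Path(log_dir).iterdir():
        m = CKPT_RE.match(path.name)
        if m:
            steps[m.group(1)].add(int(m.group(2)))
    paired = steps["G"] & steps["D"]           # steps present in both sets
    orphans = sorted(steps["G"] ^ steps["D"])  # steps present in only one set
    latest = max(paired) if paired else None
    return latest, orphans
```

If the orphan list is not empty, moving those files out of the folder (rather than deleting them) leaves only matched pairs for the trainer to find.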

  • @smarthalayla6397 (1 year ago)

    Hey Nerdy Robert. That looks awesome. Is there a way to make this a portable AI that can be taken along with all of its data and used on different computers, running it from a USB flash drive or an external hard drive?

    • @NerdyRodent (1 year ago)

      Sure, you could indeed add it to your Ubuntu portable USB stick.

  • @ds221b (1 year ago)

    When I use the train command, I get this error:
    Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [16, 2, 80, 929]
    What can I do?

  • @marcoc2 (1 year ago, 2 likes)

    We need a model repository for this project.

  • @Taylork64 (1 year ago)

    Where do you run the pre-resample command? In the command prompt that pops up after the GUI does, or in your own command prompt? You can't type in the GUI prompt, and in my own cmd it says svc isn't a recognized command.

  • @Passive-Options-Trader (5 months ago, 1 like)

    Brilliant

  • @h2rv (1 year ago)

    Any ideas why pip doesn't work for me? I've tried so many solutions, but I just can't get it to install anything! It always says pip isn't recognised. I've tried reinstalling Python/pip, tried changing it via the Control Panel, and tried various commands. This is so annoying!

  • @wowzers8853 (11 months ago)

    Hi Nerdy Rodent, I'm trying to use this tut and so far it's been very comprehensive (tysm!). However, I have an AMD card, and over at PyTorch it says that ROCm isn't available for Windows. I'm not exactly very intelligent when it comes to this coding stuff, so I was hoping you might have a solution. Thank you!

    • @NerdyRodent (11 months ago, 1 like)

      Yes, Linux is best for AI performance, stability, ease of use and compatibility.

  • @danrandall3302 (1 year ago, 1 like)

    Can’t wait til this gets streamlined lmao

  • @beikun (1 year ago)

    Great video.

  • @VirtualShaft (1 year ago)

    Thank you so much

  • @KyleJohnsonVA (1 year ago, 1 like)

    Man, your timing on this video is perfect, just what I was looking for! You are using Linux in this tutorial, right? If so, is it Ubuntu or something else?

    • @NerdyRodent (1 year ago, 1 like)

      Yup, Ubuntu 22.04 with an Nvidia GPU!

  • @adamlougoat (1 year ago)

    Nice video!! Could you give me a link or a website where I can find other voices, like Kendrick's for example?

  • @jump2k (1 year ago)

    I need help lol
    When I run svc pre-resample
    it tells me: Error: Invalid value for '-i' / '--input-dir': Path 'dataset_raw' does not exist.
    Please, I need help, what do I do?

  • @matiasbianchi1545 (1 year ago, 1 like)

    Please, a video for Google Colab! Thank you so much!

  • @SyndarNailo (1 year ago)

    I was thinking: is it possible to use this method as a sort of voice changer? For example, I want a character to deliver a line with a particular tone, but TTS doesn't give much acting. So I record myself performing the line and then use that as a sort of a cappella base.

  • @MrDanINSANE (1 year ago)

    Thank you for sharing! ❤
    I have a couple of questions, I hope you don't mind:
    1 - Can I create the dataset in another language, or must it be English?
    2 - Running this locally on my Windows PC, will it work with a 4 GB Nvidia card with the right settings, or does it need something crazy like 14 GB of VRAM by default?
    Thanks ahead!

    • @NerdyRodent (1 year ago)

      For training you’ll need at least a mid-range consumer GPU, so Colab for that! For inference you may get away with it…

    • @MrDanINSANE (1 year ago)

      @NerdyRodent Thank you!
      I hope it can work with other languages, for the dataset I mean.

  • @SyntheticVoices (1 year ago, 1 like)

    Noice, will check it out

  • @jeffshatton (1 year ago)

    What would happen if I combined my voice samples with a friend's voice samples? Does it give some sort of hybrid between the two?

  • @kernsanders3973 (1 year ago)

    Awesome, the fact that it can do different languages gives it an edge over other available options like xVASynth. What is considered a "reasonable" GPU? How much VRAM is required for training and generating?

    • @NerdyRodent (1 year ago)

      You should be able to get away with 8 GB of VRAM.

  • @musumo1908 (1 year ago)

    Another fine video! I’m trying to learn basic code to keep up..😂..does this mean you can use this sorcery to create the voice track for any video? I want to use it with a talking head as per your other vids. Thanks

    • @NerdyRodent (1 year ago)

      Yup. Any voice to any voice!

    • @musumo1908 (1 year ago)

      Excellent! So can you then use this audio track for your thin plate spline driving video? I guess you need to somehow replace the original audio file? I’m leaving out lip syncing for now lol

    • @NerdyRodent (1 year ago)

      @musumo1908 Sure. Wav2Lip can sort out the lip sync.

  • @_def (1 year ago)

    I'm using an A6000 GPU with 48 GB of RAM. If I increase the batch size to 30, does this increase the quality or just change how long it will take? I have noticed that stuff at 10k epochs sounds better than at 20k. How do you know if it is overtraining? I've been using a dataset of 200 samples. Are more samples better, or is it overkill at a certain point? A lot of questions, sorry lol

  • @ejabaLIVE (1 year ago, 1 like)

    How would I go about adding more samples to an already trained model?

  • @wakegary (1 year ago)

    Great video! These trainings are hitting my CPU, while my GPU is chillin', not doing much at all. When Tortoise is processing, it makes a satisfying 'working' sound. It's an Asus 3090 Strix. Not sure if I need to specify the GPU anywhere, but I'm using CUDA 11.7 in my env and followed the default instructions (avoiding the CPU/AMD ones). 62 °C on the CPU, 43 °C on the GPU with nothing running except, well, this video. Just wondering if I'm missing out by letting my Ryzen do the lifting.

    • @NerdyRodent (1 year ago)

      The only way I can think of to not use the GPU would be to install the CPU-only version of PyTorch.
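Since the reply points at a CPU-only PyTorch install as the likely cause, a quick check run inside the same conda environment can confirm which build is installed. This is a diagnostic sketch, not part of so-vits-svc-fork; it uses only `torch.cuda.is_available()`, `torch.version.cuda` and `torch.cuda.get_device_name()`, and the wording of the messages is illustrative.

```python
def describe_torch_device():
    """Report whether the installed PyTorch build can see a GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        # A usable GPU was found; name the first one
        return f"GPU available: {torch.cuda.get_device_name(0)} (CUDA build {torch.version.cuda})"
    if torch.version.cuda is None:
        # No CUDA version compiled in: most likely a CPU-only wheel
        return f"No CUDA support in this build ({torch.__version__}), likely CPU-only"
    return f"Built for CUDA {torch.version.cuda}, but no usable GPU was detected"


if __name__ == "__main__":
    print(describe_torch_device())
```

If it reports a CPU-only build, reinstalling PyTorch with the CUDA selector from pytorch.org/get-started/locally/ (linked in the description) is the usual remedy.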

  • @kidklava (1 year ago)

    The GUI version works amazingly well, but I can't seem to get the CLI version going... Anyone been down that road?

  • @youme4773 (1 year ago)

    Do you know where the HuBERT model of so-vits-svc-fork is?

  • @Antonsetiady (1 year ago, 1 like)

    Thanks, man!!! I just wonder if I can do this on a Mac mini or MacBook Pro? I have no idea at all. Or should I buy a laptop with Windows? Thanks again.

    • @NerdyRodent (1 year ago, 1 like)

      Not locally that I’m aware of, but you could still use the Colab. Linux + Nvidia is best if you’re getting new stuff!

  • @krankenwagen6042 (1 year ago)

    Can you do other things than switching the language?
    I heard that it can slightly alter the lyrics while still keeping the same energy.

  • @bobo-the-mf (11 months ago)

    I have a question. If I want to train a certain speaker, what command do I have to type to train only that speaker? Thank you for answering, and for this brilliant tutorial.

    • @NerdyRodent (11 months ago)

      That would be just like I show in the video, where only 1 speaker is trained 😃

  • @dummieangel (1 year ago)

    One of my biggest questions while learning how to do all this: I want to make an AI voice of a character I like and then use him for cover songs, but there aren't many singing clips of his voice. Do the clips have to be singing, or is normal talking fine? (ik this is probably a stupid question, but it's been a huge struggle just trying to learn all this)

    • @NerdyRodent (1 year ago)

      They need to be audio clips of the voice. Try to avoid any backing music.

  • @TheAiConqueror (1 year ago, 5 likes)

    You always make videos about things I'm looking for or didn't know I needed. Thanks very much 🫶

    • @NerdyRodent (1 year ago, 1 like)

      😀 Glad to be of service! And many thanks to you too

  • @SineEyed (1 year ago)

    I just spent an hour scouring the internet for a copy of the "practice for speech quality measurement" text, and I can't find that file anywhere. Help me out, bro. Thanks..

  • @ayydot4374 (1 year ago)

    Not sure where I went wrong. The environment is installed, but I get "svcg is not a recognized command".

  • @SaladeDeFruitt (1 year ago)

    Haha, as a French person the intro song was hilarious to hear.

  • @tufanarslan3311 (1 year ago)

    How is the latency in real-time voice changing? Let's say when the latest RTX cards are used? Can it benefit from multiple cards?

    • @NerdyRodent (1 year ago)

      Not too bad, just a slight delay.

  • @Glebean (1 year ago)

    Why do my models sound like freaky ass robots?
    My friend and I are in a band. I took a vocal recording from one of our songs and trained on 66 10-second segments. I let it run for 10000 epochs on an RTX 4090, and the end result was pretty shiet tbh! I don't know what I'm doing wrong. Is it just training duration? Should I let it run for 10+ hours? I can do that; both the GPU and the CPU are water-cooled, so I have no issue running this 24/7.

  • @majorsingh9433 (1 year ago)

    What should I write for the speaker ID? When I press Infer I get this error: ValueError: Speaker_id 0 is not found.
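The speaker has to be one the trained model actually knows. Assuming the generated training config is a JSON file with a `"spk"` object that maps speaker names to integer IDs (the layout so-vits-svc-style configs are understood to use, e.g. under `configs/44k/`; check your own file), this sketch lists the valid speakers. `list_speakers` is a hypothetical helper, not a so-vits-svc-fork function.

```python
import json
from pathlib import Path


def list_speakers(config_path):
    """Return the {speaker_name: id} mapping stored under "spk" in a training config.

    The "spk" key is an assumption about the config layout; adjust if yours differs.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    speakers = config.get("spk", {})
    if not speakers:
        raise ValueError(f'no "spk" mapping found in {config_path}')
    return speakers
```

If the mapping shows your speaker folder name (for example the one created under `dataset_raw/`), select that name in the inference settings rather than a bare `0`.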

  • @nathanbanks2354 (1 year ago, 1 like)

    Nice! I wonder if you can take the output of a text-to-speech program like Tortoise TTS or coqui-ai as the input for so-vits-svc to match someone's voice and/or improve quality.

    • @ChristianIce (1 year ago)

      Yes, you obviously can ;)

    • @NerdyRodent (1 year ago, 4 likes)

      Yup. Or TTS in another language too!

    • @nathanbanks2354 (1 year ago)

      @ChristianIce I suppose what I'm really asking is how much less robotic the voice would sound if you do. TTS also often sounds over-compressed, like an 8 kHz telephone conversation rather than a 44.1 kHz studio recording.
      (Now I'm also asking myself if I care enough to start experimenting.)

    • @MrAmack2u (1 year ago, 1 like)

      @nathanbanks2354 I have the same question, pls update if you try it out...

    • @ChristianIce (1 year ago)

      @nathanbanks2354
      Sadly, it will sound exactly as robotic.
      This technique mimics the input voice; that's what makes it more natural than TTS, to the point that it can mimic any other language, no matter what.
      Sadly, this also means that if you input a flat, inexpressive voice, you will just get a flat, inexpressive voice as output.

  • @user-hu1fl9iw2i (1 year ago)

    May I ask what the final music synthesis tool is?

  • @thorblehouse3236 (1 year ago)

    Using Google Colab, I can't seem to get the Inference GUI active. Any ideas?

  • @jackjones8790 (1 year ago, 1 like)

    Great tutorial, thanks! Just a few questions: if I have a dataset of 5-7 minutes of audio in total for a voice, how many epochs is it better to train for? In the automatically generated config there are 10000 epochs, and that seems like quite a lot.

    • @NerdyRodent (1 year ago)

      Yeah, I start at 20k steps.

    • @candyman3537 (1 year ago)

      @NerdyRodent My 128/16 * 3125 = 25,000 steps, so my epoch count is 3125. Is this alright?
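The arithmetic in the last comment reads as steps = (clips ÷ batch size) × epochs: 128 ÷ 16 = 8 steps per epoch, and 8 × 3125 = 25,000 steps. Reading 128 as the number of training clips and assuming one optimizer step per batch are interpretations, not something the thread confirms. The sketch below does the conversion in both directions; whether a final partial batch counts as a step (`drop_last`) depends on the data loader, so both options are shown.

```python
import math


def steps_per_epoch(num_clips, batch_size, drop_last=False):
    """Optimizer steps in one pass over the dataset, assuming one step per batch."""
    if drop_last:
        return num_clips // batch_size  # the incomplete final batch is skipped
    return math.ceil(num_clips / batch_size)  # the incomplete final batch still counts


def epochs_for_steps(target_steps, num_clips, batch_size, drop_last=False):
    """Smallest number of epochs that reaches at least target_steps."""
    per_epoch = steps_per_epoch(num_clips, batch_size, drop_last)
    if per_epoch == 0:
        raise ValueError("batch_size is larger than the dataset")
    return math.ceil(target_steps / per_epoch)


# The comment's numbers: 128 clips, batch size 16, 3125 epochs
print(steps_per_epoch(128, 16) * 3125)   # 25000
print(epochs_for_steps(20000, 128, 16))  # 2500
```

By the same arithmetic, the 20k-step starting point from the reply corresponds to 2,500 epochs for 128 clips at batch size 16; a larger dataset or a smaller batch size needs fewer epochs for the same step count.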

  • @Judexx (1 year ago)

    Hi,
    after a couple of runs, I'm getting a CUDA out-of-memory exception... How do I fix this? Thanks

  • @choppergirl (1 year ago)

    Wild, I already created a sound pack for RC radios with 850 sound files of my voice that I recorded and cleaned up with Audacity... I wasn't using the best mic at the time... some junk from Walmart that I've since upgraded to a K668B, but still... wild that I already have a massive dataset of audio files of my voice before even getting started.

  • @sholonator92 (1 year ago)

    I'm a producer who writes and composes the melody and lyrics for all my songs. I don't have a marketable voice, so I've always had a hard time finding the right singers for my tracks. My understanding of this technology is limited for the moment, but does anybody here know if one can currently make a custom AI voice to overlay my own voice and produce vocals that sound great without the need for a real singer???

  • @gamingdayshuriken4192 (1 year ago)

    Love ya