BTW my configs are definitely not the best - I'm still tinkering with the right params for the more experimental things like single image training and target blocks training. Most of the time they're not a catch-all configuration, so experiment with rank, learning rate and bucket size to see what fits your dataset.
As for why I build datasets without captions: my theory is that - for my specific use case - I want to train a new concept, so I don't want any previous biases to dilute it. I might be wrong, but it's what I'm doing because my specific pipelines need it this way.
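For anyone hunting for those knobs, this is roughly where they live in an ai-toolkit config (a sketch based on the example train_lora_flux_24gb.yaml; the name, dataset path and values here are placeholders to experiment with):

    config:
      name: "my_flux_lora"
      process:
        - type: "sd_trainer"
          network:
            type: "lora"
            linear: 16          # LoRA rank - lower for small/simple datasets, higher for complex concepts
            linear_alpha: 16
          train:
            lr: 1e-4            # learning rate
            steps: 2000
          datasets:
            - folder_path: "/workspace/dataset"
              resolution: [512, 768, 1024]   # bucket sizes the images get resized/cropped into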
Oh yeah. Keep those tutorials coming. Thank you for your deep insights and expertise
Thank you! I will!
Great tutorial! I've never used Runpod before but I've recently been thinking that I have to figure it out to train some loras - and this video is exactly what I needed. Thank you, Andrea! 🙏
Thank you for an interesting and well explained guide! I wouldn't have gotten anywhere on my second day doing this if it hadn't been for your tip to reinstall requirements.
I have actually done a few things differently so far, since I followed another guide before I saw yours, but I'm glad I watched yours as well for comparison. Especially interesting was the part about training on fewer images. I do most of my work with vintage portraits, and I often don't have more than a couple of training images that are in focus. Using fewer but better photos sounds like a potential game changer.
thank you so much for this great tutorial. I will try it out
Super useful, and it's actually motivating me to jump into the pool 🏊♂ of training
Thank you Andrea, very informative and clear.
9:10 Did that problem arise just because you didn't activate the virtual environment (venv) for AI-Toolkit in the first place? After stopping the runpod, you just need to activate the AI-Toolkit venv and run it again. Otherwise it breaks, because the initial library and dependency installation lives inside the AI-Toolkit venv, and you launched AI-Toolkit without it, so it won't work. Yes, you can just install the requirements again, but then you'd be installing them globally (without the venv), which isn't recommended.
I did say I am an idiot, so this might be precisely why lol
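For reference, getting back into the environment after a pod restart looks roughly like this (a sketch, assuming ai-toolkit was cloned into /workspace/ai-toolkit and its venv was created inside the repo as "venv" - adjust the paths and config name to your setup):

    cd /workspace/ai-toolkit
    source venv/bin/activate                 # re-enter the venv where the requirements were installed
    python run.py config/your_config.yaml    # your_config.yaml is a placeholder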
suggestion - show a few examples of generations by loading the trained loras with the base model and running it on some prompts. This can tell us about the diversity in generations from a small (or just one!) collection of training data.
You’re right, I should have, that was definitely an oversight on my end. If it’s of any consolation, the thumbnails for this video were made using one of the Loras
Hey Andrea - thanks so much for the tutorial! Do you have any advice on creating a quality dataset? Is image size important? What about aspect ratio? The reason I ask is that I have followed each method you have shown exactly, yet I cannot get anywhere close to the quality of character recreation with my LoRAs.
I can't get these Loras to work in ComfyUI... Is it possible? Example simple workflow?
Any results comparison for the 2-layer block training?
How should I approach training objects, for example shoes? Should I be captioning them?
I keep getting this error when I run "python run.py config/name.yaml": "[Errno 2] No such file or directory: 'workspace/name'"
Nevermind, you answered it already in a different response. I didn't have a / in front of workspace.
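In other words, the dataset path in the yaml has to be absolute, because a relative path like workspace/name gets resolved from wherever run.py is launched. Roughly (folder and config names are placeholders):

    datasets:
      - folder_path: "/workspace/name"   # leading / = absolute path inside the pod

and then, from the ai-toolkit folder:

    python run.py config/name.yaml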
Would this be the same for Schnell, if we just use the different config mentioned in the ai-toolkit repo?
I haven't tested Schnell, mostly because it doesn't have access to other things I use (ControlNet, for example), so I'm not sure - but the config file is there, so it should work
Hi Andrea, it was deleted. It's an Errno 2, the path to the dataset is not recognized or not found. I have changed everything I can think of and can't seem to get it to pick up the dataset path.
Ah, I see the issue. Add a / before workspace; I don't know why Jupyter's copy-as-path sometimes doesn't add the /
Hi Andrea, I got it all working, then hit this error and I cannot figure out what to do. I followed the instructions exactly, even using the same name. Any suggestions?
Hi, which error? If you copy paste it here it will probably be auto moderated and deleted, but I’ll be able to see it and approve it.
Thanks. You didn't adjust the trigger word in your config files. Is this not necessary?
I’m training the same thing (my face) over and over again, so in this particular case changing the trigger word is the same as not changing it. If I wanted to train something different I’d change it, but a trigger word is just a “random” series of letters you append to an image in order to let the model understand that trigger word = the concept / subject / style contained in the image(s)
@@risunobushi_ai yeah but in the section in the yaml file where you'd define your trigger word it says "p3rs0n" but you used the other word both in your captions and prompts later on. So I was wondering if you maybe missed this one...? I do a lot of testing for training persons myself currently and I found that using a trigger word doesn't really seem to change anything. Sometimes I feel it increases the likeness a bit but this might be also very subjective I guess.
The “p3rs0n” trigger word is in green, preceded by a #, so it’s bypassed and treated as a comment
Personally I'm testing single trigger words because it's what I need for my work pipelines, but there are definitely a whole lot of different approaches, such as no tags, trigger + tags, trigger + automatic natural language captioning, etc.
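For anyone looking for that spot in the yaml, it's roughly this (a sketch - key names follow the ai-toolkit example config, and "your_trigger" is a placeholder):

    # trigger_word: "p3rs0n"        # commented out with #, so no trigger word is applied
    trigger_word: "your_trigger"    # uncomment and edit like this if you do want one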
How do I even test this? I can't find a Flux/LoRA Comfy workflow that works with my LoRA
Are you using flux dev non quantized as the base model?
@@risunobushi_ai I believe so? I used FLUX.1-dev for the training on runpod just like you, and use the same model on comfy
Uh, that's weird. Are you using a LoraLoaderModelOnly node to load the LoRA after the Load Diffusion Model node and getting no LoRA results?
@@risunobushi_ai just get this error which seems from github nobody can fix yet. Wondered if there was another way? TypeError: ModelPatcher.calculate_weight() got an unexpected keyword argument 'intermediate_dtype'
@@risunobushi_ai ahh never mind, it turned out it was happening on all workflows. I uninstalled the XLabs nodes and now it's ok
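For anyone else wanting a minimal test setup in ComfyUI: assuming a standard Flux Dev install, you can chain roughly Load Diffusion Model (flux1-dev) → LoraLoaderModelOnly (pointing at your trained .safetensors placed in ComfyUI/models/loras) → your sampler, with a DualCLIPLoader (clip_l + t5xxl) for the text encoders and the Flux VAE for decoding. Exact node names can differ slightly between ComfyUI versions.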
Hi, great tutorial! However, I cannot log into Hugging Face using the command huggingface-cli login; I get "bash: huggingface-cli: command not found". Can anyone help?
Did you install the requirements by following the GitHub commands?
Anyway, you can install huggingface-cli by typing:
pip install -U "huggingface_hub[cli]"
I got stuck here too. There is a button on the Flux Hugging Face page that you have to click to accept access.
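Putting those steps together, the login part looks roughly like this (you need a read token from your Hugging Face account settings, and you have to accept the FLUX.1-dev license on its model page first):

    pip install -U "huggingface_hub[cli]"
    huggingface-cli login     # paste your HF read token when prompted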
$6 a day makes $180 a month. You could build a decent machine for genAI and amortize the expense in less than a year compared to these prices.
At least, that's what I'd do if I didn't find any better (read: less expensive) solution, but that's me getting anxious really easily when it comes to timer-based expenses.
to be fair, what I usually do is aimed at professionals first, hobbyists second - and for a professional, 6 dollars a day (depending on where you live and work) is not necessarily a lot of money. add to that that professionals can often deduct expenses, or allocate them as a business expense and invoice the clients for it.
also, being in the EU, a 4090 costs around 2200 USD here. 6 USD a day * 5 days a week (6 for me, but let's keep it at a standard work week as an average) * 50 weeks a year (excluding 2 weeks of vacation) = 1500 USD, and that's saying nothing of the prices for higher end cards like the H100 (32k USD + the cost of the server).
@@risunobushi_ai also the cost of electricity to run those cards... nothing to be ignored
I can't get my loras to work anywhere but runpod.
One major issue with Runpod is that none of the templates are updated with the tools to train Flux LoRAs. If you aren't working with them professionally, it's not worth it, because you have to pay for the pod volume even when you aren't using it. It would be great if kohya or AI Toolkit actually got a template so you don't have to set up the pod every time.
this might be me actually doing it for work, but I'm paying 3 cents/hour for the disk storage, which ends up being something like 0.7 dollars a day - not a lot for what I do. but regardless, if you don't actually need the storage, you could terminate the pod and set it all up again in something like half an hour every time you need it. it's more a matter of convenience, which could be improved with supported docker images, but then someone else would need to actually maintain them
@@risunobushi_ai yeah, I know, there are actually images available, but sadly none of them are updated. Kohya, for example, has a flux branch, but the images sadly don't use it. Regardless, it's a nice video
great video
by the way you forgot to change the folder path in your config for your single image training
I realized, and I switched to my H100 (that's why at one point I say "I switched to the H100" - I was sitting through all the trainings for the video and decided to run them all in parallel on different pods), rebuilding the dataset and config file behind the scenes. It happened so many times during the recording of this video - "wait, did I switch the dataset / output folder names?"
It still happens when I do my own training, I just forget one parameter and I have to restart the training, way more times than I’d like to admit!
@@risunobushi_ai I guess that even happens to the best of us all the time :)
@risunobushi_ai yes I did, and I am trying a new install to see if that works. 👍
Not cheap at all
0.30 USD for a LoRA (on an A40, including pod setup time) doesn't seem like much compared to Replicate's and CivitAI's pricing, which are arguably still rather cheap for professionals