1boy is something from -booru (anime style) training models. 1boy just means there is exactly 1 male in the picture; 1girl, exactly 1 female; 2boys for 2 males, etc.
Yes it does, don't worry, I am aware. As I mentioned (not in enough detail, mind you), this is useless for SDXL inference; booru tagging tends to work best with 1.5 models, whereas CLIP-L / CLIP-G do not use booru tags, they just use normal tokenized words.
When I start showing the append method (using a manually defined initial root caption in CLIP-G style first), this will add tags that are suitable for "hooking" into CLIP-L.
The result is a more robust LoRA that can approach ground truth more closely.
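For anyone curious what that appended style looks like, here is a made-up example caption (not from the video): a natural-language root caption first, then booru-style tags appended after it.

    a photograph of a young man standing on a beach at sunset, golden light, 1boy, solo, beach, sunset, short hair, looking at viewer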
Sometimes I have to cut things like this; it's probably worth its own video, but most little details like this do get cut :)
For anyone reading, the reply is correct and played a major part in training with the old 1.5 models; however, it doesn't really take effect in SDXL, as it didn't need that style of tagging.
@@FiveBelowFiveUK Really interesting stuff! I would love a video explaining this more. F-ing love your channel!
After I saw what those "pony" base models can do, I am finally sold, but I have a 3060, so I couldn't train locally for SDXL because of not enough VRAM; now I am resorting to Colab.
Thank god I never deleted my datasets for all the LoRAs I want to remake.
@@huevonesunltd the dataset is king!
there will be many more models to retrain our data on, no doubt!
What counts as owning the rights in that check box? The images I wanted to use I just grabbed off the internet, so I don't actually "own" them. That's the main reason I haven't messed around with creating my own model yet.
If you are doing this for research and educational purposes, "rights" are not so clear-cut in this regard. Taking other people's protected content is not allowed, but I would say that for evaluating a process and learning (where you are not publishing the model, just using it privately), fair use applies. I'm not a copyright lawyer, though, and recommend you train on your own work or use public domain sources.
To make a dataset from Google Drive, do the images need to be the same kind? For example, if I want to put Superman in 3D like Disney or Pixar, do I need to download live-action images, or will animation work the same, or could that cause errors?
It depends on whether you train the style ("Disney/3D") or the character ("Superman"); we can choose this by way of the order of importance in the captions for each image. It's a very deep subject. Similar images will give more consistency to a concept you want to train, and good descriptive captions are essential.
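As a rough, made-up illustration of that ordering idea (these captions are not from the video), whatever you lead with tends to get the emphasis:

    superman, 3d disney style render, standing on a rooftop    <- character first, leans toward training Superman
    3d disney style render, superman standing on a rooftop     <- style first, leans toward training the look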
@@FiveBelowFiveUK Alright, and is Stable Diffusion different? For example, if I download images from a Disney or Pixar scene, will the results be different, or will they look blurry or bad?
It all depends on the training method; Civit is nice because it takes care of the fine settings. You need to pay attention to the wording in your captions: "a red hat on xjx cat" vs "a red xjx hat on a cat" - the first will train the cat, the second will train the hat! This is the most basic example I can give here :)
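In practice that wording lives in a plain .txt caption file sitting next to each image. A hypothetical dataset folder (file names made up) would look like:

    dataset/
      cat_01.png
      cat_01.txt   ->  a red hat on xjx cat      (teaches the cat)
      hat_01.png
      hat_01.txt   ->  a red xjx hat on a cat    (teaches the hat)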
Can they still be made through Google? Not sure if it's true or not, but I've been hearing Google cut that feature off.
I uploaded 10 images. They were 1:1 images of a mascot with a clean white background. I chose SD 1.5 and the epoch examples I got back were terrible; just about all of them cropped the mascot from the waist down.
Hey there, this video teaches you how to train a LoRA with the Flux Dev model. SD 1.5 was released in 2022 and it's pretty old now; I have not used it for training in over a year! Try choosing Flux ;) That model came out last month. Good luck.
(4:16) Here's why you should bother with sample image prompts. Basically it's three free images that you don't have to create outside of this page. Let's say you were creating a LoRA for the token "mycoupletoken". Throw in "mycoupletoken on beach", "mycoupletoken in a limo", and "mycoupletoken jogging in the park" just to give people an idea of how to use your LoRA. If it turns out the images suck, you can always delete them later.
As it turns out, I now set sample prompts every time, mostly because as soon as I used a complex dataset, the auto-generated sample prompts were just the shuffled tags. That makes them useless unless you put in some sample prompts of your own :D
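If you ever take the same training local with kohya's sd-scripts (mentioned further down), the idea carries over: sample prompts go in a text file, one per line, with optional flags for size, steps and seed appended. From memory the syntax looks like the lines below - double-check against your install, and the prompts themselves are just the made-up ones from the comment above:

    mycoupletoken on beach --w 1024 --h 1024 --s 28 --d 42
    mycoupletoken in a limo --w 1024 --h 1024 --s 28 --d 42
    mycoupletoken jogging in the park --n blurry, low quality --w 1024 --h 1024 --s 28 --d 42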
Questions:
1) Do all the images that you upload have to be the same dimensions?
2) Let's say I upload a bunch of vertical images; does that mean it won't train the model to be horizontal, or can it train to be both vertical and horizontal? I see an option to type in dimensions, but if I type in 1024x1024, will it not do vertical training?
3) Do you have a workflow that you would recommend to test out the finished LoRA?
4) I skipped the middle part of your video, but I didn't hear you talk about trigger words in the first LoRA training that you demo'd. If you didn't bring up trigger words in this video, how do you set that? I need to go back and watch the whole video.
5) Do you have any advice for writing good captions?
Since bucketing was introduced to the LoRA training scripts some years back, I believe it can be advantageous to train the same image at various scaled sizes and also as cropped tall/wide formats; when trained with good captions this will improve the training results. I teach in stages: the assassinkahb test model is all 1024x1024, 1:1 only. Later models which I use to teach contain a defined ratio of images. For example, with a single concept, we might train that image in 1:1 at 100% (1024x1024), 75% (768x768) and 50% (512x512) - then we can use these base dimensions to include wide and tall format images. Depending on the dataset, you might have 6-9 images to represent that concept. (There is a tiny sketch of this resizing step at the end of this reply.)
You are teaching it knowledge to draw from when you run inference; it can learn how the edges "line up" with the style you are teaching.
All captions in your model are triggers, because you are finetuning tokens to create a difference patch on the model weights. This is how you can load your LoRA with many different base models that share the same base.
With the assassinkahb test dataset, the main triggers are "assassinkahb style". I do this to show you do not need to use "xjx style", although that has its own uses; every word which is repeated many times is a trigger. The ones which are most common are the strongest, that is all.
Context is incredibly powerful, due to NLP: "photo of xjx dog with hat" trains that specific dog, "photo of dog with xjx hat" trains the hat (in the simplest terms).
Soon there will be a tutorial series on this subject, so watch out for that.
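In the meantime, here is that resizing step as a minimal Python sketch - not my actual pipeline, just an illustration. It assumes Pillow is installed, the file names are made up, and the source image has already been cropped to 1:1:

    # save 100% / 75% / 50% square copies of one source image
    from PIL import Image

    src = Image.open("concept_01_source.png").convert("RGB")

    for size in (1024, 768, 512):
        copy = src.resize((size, size), Image.LANCZOS)
        copy.save(f"concept_01_{size}x{size}.png")

    # wide / tall variants can be made the same way, e.g. (1024, 768) or (768, 1024),
    # as long as every copy gets its own descriptive caption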
@@FiveBelowFiveUK What does a bucket stand for? And when I upload different images to train the LoRA, can the images be different sizes, or should they all be the same size?
nasty beat ;)
How do you make the txt files? Did you make them individually for each image, or how did you set it up?
I have always made them by hand; however, I will soon address this with new workflows in the next video - it's due any day now :)
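Until that video lands, here is a minimal sketch of how you could script the boring part yourself - hypothetical folder name and trigger phrase, assuming the usual one-.txt-per-image layout, with the real descriptions still edited in by hand afterwards:

    # create a starter caption .txt next to every image in a folder
    from pathlib import Path

    dataset = Path("dataset")          # hypothetical folder of training images
    trigger = "assassinkahb style"     # the repeated trigger phrase

    for img in sorted(dataset.glob("*.png")):
        txt = img.with_suffix(".txt")
        if not txt.exists():           # don't overwrite captions you already wrote
            txt.write_text(f"{trigger}, describe this image here")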
I prefer local procedures
...we covered OneTrainer in a past video for those with valid privacy concerns; you also have KohyaSS-GUI. I aim to provide processes for all levels of experience, online/offline.
On-site auto-trainers are designed for use by anyone at any skill level. As discussed in this video, we often prefer more control when training; however, that is a deep skill that takes lots of experience, and it's overkill if you only want to experiment with a LoRA of your pet, for example ;)