A million thanks for these. As finicky and frustrating as the program is for beginners, your calm expertise is just what's needed.
Thank you!
I would love it if you could go over some of those settings in more detail - like "oh, I fiddle with more conditioning steps when I want to X", etc. There are so many superstitious people out there giving bunk advice that your level-headed breakdown would be super valuable!
Great idea! I will have to ponder where to start! :-)
it blew my mind that you can load an entire workflow from the image! thanks for the great content.
Why are there width and height values in the CLIPTextEncodeSDXL node? What is the difference between width and target_width, and why is one of them 4096?
Great questions, and hopefully Scott can take the time to explain. Building out the workflow is a great first step, but not knowing what everything does so that you can fine-tune it is lame.
I've grown to understand and enjoy ComfyUI more than the one I was using before, thanks to your videos. I really appreciate you and the effort you put into making these tutorials. One of these days you could show us how to train SDXL 1.0 or its LoRA with our faces. Thanks :)
Great to hear! Training will be coming soon! Cheers!
I was waiting for this. These tools are very difficult for ordinary people to figure out how to use. Thank you for the video!
Glad it was helpful!
This video has some great insights on how to process the original image. I have a few FYIs to add. For those of us stuck with low-VRAM rigs that have to run SD 1.5 (for now 😢), the verbose negative prompt is essential -- for SDXL it is worthless, like Scott says. For those on a Mac, this web interface uses Ctrl just like Windows. If you prefer PNG over JPG and don't want to share metadata, open the image in an editing app (Photoshop, GIMP, etc.), export as PNG, and make sure "include metadata" is not selected. Thanks for all the great content Scott -- you never disappoint 👍.
Excellent tutorial, thanks! I got SDXL up and running with the refiner. If you have the time I'd like to see you make a video explaining how Stable Diffusion works and explain exactly what the program is doing as it sends the data through the nodes in Comfy so I can have a greater conceptual understanding of what is happening. Believe me I could watch hours of technical stuff lol.
Huh. I wonder what would happen if you had dedicated models for a variety of tasks (hands, eyes, hair, reflections, contrast, and so on) and fed a few steps from each of them in a daisy chain until you got to the first "true" sampler...
Truly the possibilities are endless; thanks for the food for thought and the hard work!
That's a great idea, and we do have those as loras. It's fun to combine them to help get what you want.
That is an interesting idea. Using a multitude of expert models is proving to be one of the more effective approaches developed recently.
Not to mention that you could also combine this with prompt-blending syntax to ensure that each part of the processing focuses entirely on one subject in the prompt while still maintaining an overall mixed composition.
If, for simplicity, you set up 5 samplers with an equal number of steps each (4 for the limbs and 1 for the head/torso), and then set up a prompt blend that focuses 20% of the processing on each part, it may even give better results.
And yes, using LoRA chains would mean we could have a separate model output for each limb while maintaining the same initial model, using fewer resources than multiple dedicated models would.
I think I'm going to play around with this now actually xD Minus the dedicated limb lora of course.
Thanks for these tutorials, great to have an in depth dive into the UI.
I'm a little confused about the start/end steps and steps in the KSampler.
In your second sampler in the chain of them, if you start at step 3 and do 12 steps, wouldn't that leave you at step 15 for your starting point in the next one?
There are some advantages to skipping steps in some cases. It all has to do with the residual noise.
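A conceptual sketch of the chaining (plain Python, not ComfyUI code): as far as I understand it, the steps widget on KSampler Advanced sets the length of the whole noise schedule, while start_at_step and end_at_step pick a window inside it, so chained samplers hand off at shared boundaries rather than adding their step counts together. The 3/12/20 boundaries below just mirror the values used in the video.

```python
# Sketch: chained KSampler Advanced nodes denoise consecutive,
# non-overlapping slices [start_at_step, end_at_step) of one shared schedule.
def split_schedule(total_steps, handoff_points):
    """handoff_points: where one sampler stops and the next starts, e.g. [3, 12]."""
    edges = [0] + list(handoff_points) + [total_steps]
    return list(zip(edges[:-1], edges[1:]))

print(split_schedule(20, [3, 12]))
# [(0, 3), (3, 12), (12, 20)]  -> conditioner, base, refiner
```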
This is awesome! Thanks for the clear tutorial!
I'm curious though. Is there a reason you left 'return_with_leftover_noise' disabled on the last Ksampler that you added?
I have been in love with ComfyUI since I found it (coming from Unreal Blueprints, very familiar system). I am currently working out some torch issues with my current system, but I generate whenever I can. It is great to see you building out the workflow and explaining the nodes that you use and why. Very informative and THANKS for the tip with the shift-click to copy nodes AND connections. NICE!
Great to hear! I am really happy with the nodes, but I hope they really update to things like docking, etc. Cheers!
Same here, since I come from Houdini - just love the node spaghetti
Yusss! I also used Houdini as well as Substance Designer and I am hoping to get into nested nodes here as well. Cheers!
I'm new to ComfyUI and really love your videos. Thanks! Maybe this is obvious to folks, but one thing I recently learned was the ability to condition after one KSampler ran so you can continue to refine your final image. It ended up being an alternative (or another tool in the toolbelt) to inpainting. I wasn't just refining, I was adding to or dramatically changing the final image - all without losing the "base" starting point that was all "locked down" in that the seed was fixed, the cfg and steps didn't change, etc. So it was a very non-destructive compositional workflow. If I wanted to add an object to the image, I could do that through a second prompt that was applied to a second KSampler.
I could also introduce new LoRAs later on in those steps. I'm going to continue to experiment with this strategy and go through this more than once. So instead of a long prompt followed by a smaller corrective one, do more of a build up of prompts. Start simple and continue to add on to it so that elements within the image can be independently adjusted, removed, or re-arranged. Again, a more compositional approach during image generation to hopefully reduce the amount of work in post (or a series of very similar images that can be worked together in post processing). This could get a bit messy too, but maybe not if they are arranged left to right in a linear fashion building up the scene.
That's great! It is a lot of fun adding into the pipeline. It's what we do internally as well when testing models and playing with new ideas. Cheers!
I'm mind blown! I never thought of using ComfyUI, but it seems like I'm sold after this video. Very nice, sir, and thank you for sharing your knowledge.
Glad you liked it! It will also teach you a lot more about how things work, which I always feel is a good idea.
@@sedetweiler absolutely! downloading it now hehe
WOW! That's a super tutorial of ComfyUI there! Thanks. I never knew there was this new CLIP node addition for SDXL!
The only drawback I find in ComfyUI is the way it manages workflows. I mean, when you want to change your original workflow you need to save a local file, and if you want to do something else (like inpainting) you have to redo ALL of your workflow and save it to a file, then recall it and switch by loading one workflow or another depending on what you want to do. Definitely not fond of this way of managing workflows. They could have done some kind of "favorite" workflow - like 5 or more ready-made workflows that you could customize afterwards, save as your "favorite custom workflow", and switch whenever you like. It would skyrocket the use and adoption of ComfyUI!
I just drop the json you get from using "save" into the interface and it loads. But I do agree that would be nice.
@@sedetweiler ooh ! Nice another tip ! Drag and drop the json just works too ! I might be able to explore more versatile stuff with comfyui now :)
Great video Scott, I wonder could you explain how to change the image size? What do I have to alter to produce an image of 832 x 1216 for example? Or point me to a future video that explains it, as I'm only on ep.2 Thanks💖
I was able to follow the tutorial well. I'm a bit confused by the three separate seeds. I can adjust the first (the conditioner/initializer) and get changed results; do I care about the others? In a previous video you said it wouldn't matter much for that context. Is that also true here?
Thank you so much. I've become really proficient with A1111, and moving to ComfyUI was a big switch, so your help with how the workflows work in ComfyUI has made it just as easy as using A1111 for me.
When you added the third or 'pre-sampler', why did you not pass the noise information as you had done with the first of the two samplers? I messed with that setting on the first two and didn't notice much of a change. Thank you for the videos and instructions. They are extremely helpful. And you suggest not to add things like extra fingers to the negative prompt. What is your method of not getting extra fingers or limbs, etc?
Prompt switching can be realized with an additional KSampler that renders the first steps with a completely different prompt. For example, you may want to create a triangle composition, or a symmetrical image, and that can be done in the early steps of a generation. Good for abstract art. I also like that in ComfyUI the seed can be fixed while the base model and refiner generate on different seeds.
Good idea! Just tried this out and it worked in an interesting way. Essentially prompting an init image.
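A minimal sketch of the two-stage prompt switch being described (the prompts, step numbers, and dictionary layout are only illustrative, not actual ComfyUI node definitions):

```python
# Two KSampler Advanced stages sharing one 20-step schedule: stage 1 lays
# down the composition with one prompt, stage 2 takes over the partially
# denoised latent with the real prompt.
composition_stage = {
    "positive": "symmetrical triangle composition, abstract shapes",
    "start_at_step": 0, "end_at_step": 6,
    "add_noise": "enable", "return_with_leftover_noise": "enable",
}
subject_stage = {
    "positive": "a lighthouse on a cliff at sunset",
    "start_at_step": 6, "end_at_step": 20,
    "add_noise": "disable", "return_with_leftover_noise": "disable",
}
```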
ComfyUI is truly about fine tuning the way one approaches the creation of an image using AI
I agree!
Love your disgust for the negative prompt lists haha. Relatable stuff
(((((((((extra arms!))))))))) :-)
My first steps into ComfyUI, and it's the kind of thing I really like 🙂
Glad to hear it!
Thank you for this! I've created my own custom workflow based on this one with lots of inputs --> primitives to change stuff quickly.
Fantastic!
Thanks!!! these boxes are actually starting to make sense
Woot!
I appreciate the clear steps and explanations. Almost all SDXL models found on Civitai for example, do not have refiners, so do I use the official refiner?
Thanks for this look at the setup that Stability uses internally. I'm not so familiar with Comfy, but I've been using and enjoying SDXL through Invoke, which has a similar Nodes capability. I have a few questions and comments:
1. What are the Original and Target W/H actually doing for the CLIP conditioning nodes and what is the logic to setting those values? I played around with it, testing various combinations, and the only thing I could confidently say is that setting Original W/H smaller than 1024 causes the image to become blurry. I couldn't see any specific benefit to any other value, as I tried 1024, 4096, and 40960 for Original and between 64 and 40960 for Target -- setting different values made the image different, but not obviously better or worse. I settled on just setting them the same as the output image dimensions.
2. Why are there two prompt inputs for the base text encoder node when you provide the same input to both? Invoke calls one input the prompt and the other the style. What effects are caused by, e.g. separating your prompts into a prompt and a style and sending them independently to the two inputs, switching the inputs (so prompt goes to the "style" input and vice versa), setting them both the same, or leaving one or the other blank? I've found that if I prompt the base model for a roller coaster in the first input, I get a roller coaster. But if I prompt "roller coaster" for the first input and "photograph" for the second, I get anything BUT a roller coaster -- ruined buildings, abstract paintings, etc.
3. Connected with #2, Invoke's refiner conditioning node only includes a "style" input, but I've found that only giving it a style prompt can cause the refiner to do weird things (like making architecture look like it's made of tent fabric).
4. You've indicated that initializing the noise with the refiner is an interesting idea, which it is, but have you seen any consequences other than just making the images different? Does it provide any actual benefit?
5. I've experimented with higher resolution SDXL generations. I'm on a Mac and there are some apparent generation bugs with Invoke on MPS (about 1856 square and above it becomes debilitating). But I've noticed that my scenes at higher resolution (photographic sci-fi style architecture) tend to become wide angle and taken from a high vantage point, almost as if the resolution setting is correlated with the position and zoom of the virtual camera. Has Stability done any experiments at higher resolutions than 1024x1024?
6. Is there a benefit or danger to sending the same noise seed to both the base and refiner?
Good questions... I also would like to know the answer. Did you understand the concept of why there is a field of dimensions in a node that is supposed to provide only text?
This is getting fun! Can't wait to work on img to img tomorrow after work!
Have fun!
Using SDXL with a 2060 SUPER 8GB + ComfyUI and it works great 👍 Turned out that Comfy is 15x(!!!) faster than A1111 for the same tasks using SDXL! Also, I never got any errors with ComfyUI, while A1111 always gives me "NaN tensor" errors when working with SDXL. For SDXL, Comfy is a MUST! So I'm looking forward to more tutorials on ComfyUI.
Great to hear!
I'm the opposite; I'm only able to use Colab. With A1111 it's very fast and smooth, letting me work with videos and other tasks, including upscaling to 8K, while ComfyUI only handles pictures for me. However, with ComfyUI on Colab I encounter issues such as disconnects and running out of RAM after just 5 images. I'm using Olivio Sarikas's workflow with the base SDXL 1.0 checkpoint + refiner 1.0; the refiner overhead seems to consume a lot of RAM.
@@JolieKim2795 I'd double-check the workflow you used then. Also, did you try to run ComfyUI locally? You don't need a decent GPU to do so. Even an old 8GB NVIDIA will do it.
Thanks Scott, as a beginner your videos are great. Very well explained and easier to learn from than a bunch of others.
Glad to help
Thank you so much! Even though I couldn’t understand much, it helped me get started with comfy.
You’re welcome 😊 Just keep working with it and it will start to click into place.
Thanks, Scott. I was really looking for something like this to get started with SDXL in ComfyUI.
Glad it was helpful!
Great tutorial!
This is my first time using ComfyUI and this video helped me a lot, tyvm!
You are most welcome!
Thanks. If you're looking for recommendations, a video focused on comparing upscalers and incorporating upscaling into this kind of workflow might help people. Seems like a nice next step. Appreciate what you've shared so far.
Hi there Scott, thank you for the excellent tut. I must admit though, my robots did not look anything close to how refined yours came out. I wonder if I missed anything somewhere...
Thanks for the tutorial! Is the "noise seed" in Ksampler Advanced same as "seed" in Ksampler? You set noise seed as 4, what's the meaning of the number? What if I left it as zero?
thanks so much for your detailed tutorials 🌺🌺💐💐💐🙏🙏
Love your videos :D just started using ComfyUI with SDXL. Having a lot of fun so far!
Glad you enjoy it!
This is a great video congrats. Very informative very thorough and you left no doubts. Can't wait for the next step!
More to come!
Appreciate this Scott, you helped me fill in the blanks! I was wondering how the primitive nodes were used, it was driving me nuts! Hahaha! I was able to add an extra step to add an upscale process and it works very well! Looking forward to more.
By the way, is there a way to create an image gallery somehow? Sort of like how Invoke AI is set up?
I just use Bridge, since I already have an Adobe sub and it's better than most gallery apps. Comfy really isn't good for that type of thing at this point.
quickly becoming my goto channel. keep up the great work
Thank you!
Couple of things, isn't it recommended for the refiner to actually be started at 80% of total steps? Also, is conditioning via the refiner really a thing or did you just kind of mess around with it? You didn't select pass on noise, so I'm not sure what that means.
Thank you for the tutorials, they are great!
Thank You so much! videos like these are a blessing and help people to get into it more professionally.
You're so welcome!
One question I had. Is there any reason why you recommend using the VAE from the refiner, when there is only 1 version of the VAE (barring custom fixes for FP16) publicly available?
If I choose to merge the fixed FP16 base VAE with the refiner, am I getting the same experience as you are (besides fp16-fp32 differences) ?
Your videos are amazing, thank you for your great content!!
I am definitely going to search for a good upscale workflow on your channel.
It's coming this weekend!
I’m curious, how would I implement a lora in this setup? I tried inserting 2 lora nodes after the checkpoint nodes and connecting them like I would in SD 1.5, but it seems to not be registering the existence of my Lora and just skipping over it. My checkpoints are connected to the Lora nodes only, except for the VAE, which is used for the decoding, what am I doing wrong and how exactly do I fix this?
Thanks for this really concise and helpful tutorial. Just one thought: you did not enable "return with leftover noise" for the "initial conditioning" node. Wouldn't it make sense to do so?
It actually returns so much that things go sideways. Give it a try. I have not found that to work well.
Very helpful. Have you experimented with or learned anything from using multiple KSamplers? Are you still keeping at least 3 at different steps today as part of your workflow?
I use 2 most of the time.
I just did download the official base and refiner but it seems I've got the VAE version from somewhere else in the past.
What's the difference? I get that the VAE is built-in to the model. Does this mean you get to delete the VAE Decode node or some other node?
Can you just keep the VAE version and follow your workflow with no difference in results or at least no negative results in quality?
As for the last step, you showed you can first generate a blank latent and then feed it into the base and refiner... Seems like you can do all sorts of tricks like that to experiment with the resulting image. I wonder if it makes sense. If I get it right, it seems that the latent creates a base noise ignoring the models, so that you can just get something a bit out of the box (model). Is that right?
Thank you for the tutorial. I have lots of stuff to learn.
Thanks for this ! I was using comfy and using refiner at the beginning and it was coming up with really wacky pictures, now I can use it correctly thx
Hi Scott, I'm rewatching the whole series again; you have done a good job. I have a question in this particular episode about the sampler... why do you have the possibility of using denoise within the KSampler but not with the advanced KSampler? Do they work differently?
it was to simplify things. when you start at a later step with the advanced sampler, you are "skipping" some of the pieces you do not want to denoise, so it is the same thing but harder to explain.
When you make Img2Img in one of the videos I saw that you used the common Ksampler because you needed the denoiser. Now everything is much clearer to me, thank you very much for answering.
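A rough way to picture the equivalence described above (an approximation, not the exact scheduler math):

```python
# Skipping the first start_at_step steps in KSampler Advanced behaves
# roughly like lowering denoise on the simple KSampler.
def equivalent_denoise(total_steps: int, start_at_step: int) -> float:
    return (total_steps - start_at_step) / total_steps

print(equivalent_denoise(20, 12))  # 0.4, i.e. roughly denoise=0.4 over 20 steps
```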
Fantastic! I am looking at some advanced workflows, however with no real explanation of how they work. I want to use them but I don't know what some of the nodes and flows do! However, I found a lot of value in your vids, and at this stage I am happy to just play and learn Comfy and put off creating art projects/ideas with SDXL for the time being.
That 3rd sampler is neat! I tried to see if you can use the latent upscale method from your previous video with the SDXL base and refiner; it didn't work, but that is the beauty of Comfy! You get to try stuff.
I also think it is a pretty great way to learn how all of this works together. It really is limitless!
Wow - great tutorial dude. I've only recently got into Comfy and wondered why all the ControlNets were failing last week :D All the new ones install thanks to your videos, and I'm loving all the SDXL videos... fun times ahead (but I really need a PC gaming rig for speed) haha
Out of interest - what kind of setup for a PC would you recommend for quicker generation/processing? Massive 128GB RAM and like an RTX 4090? :D
Thanks for your videos - amazing
Being that this video is now four months old can I assume that your checkpoint is now named differently? The one I have that was downloaded when I installed Searge's script yesterday has the vae included in the filename like so: sd_xl_base_1.0_0.9vae.safetensors and a refined one named respectively.
Sure, feel free to rename them. I do because they all have generic names and need to be changed to keep things sane.
Perhaps things have changed since this was published nine months ago, because this workflow just gave me dark, abstract images. But I learned a lot about how to build out a workflow! Thanks!
I experimented with your workflow and found that to really see what refinement was done, you should leave BOTH samplers at 20 steps, then on the refinement sampler you can start at 12. This way you can properly compare the differences. Whereas when you do only 12 steps on the first sampler, you end up with significant changes in the refined image.
thanks for this! looks so crazy to a beginner but I followed and it's a great place to start generating images
It really is! You have officially leveled up, as this specific type of workflow shows you how things work, not how to use a specific UI. Cheers!
Do you have any videos (or recommendations for other videos) that go in depth on debunking the negative prompt urban legends you mention?
No, but I should make one. It's just terrible what people pass on as the perfect negative. Do they think the model was trained on "bad anatomy" and "extra fingers?"
Thank you for your video! I learnt that comfyui is awesome :)
You're so welcome!
On that third sampler you added, you kept "return with leftover noise" set to disable... does that mean you use up all the noise in those 3 early steps? What's the thinking in not setting that to enable?
🎨🖌️I’m an artist and I’d love to use this to create variants of my work and also generate animations. Is this possible using this? Sorry if it’s a dumb question but I’m totally new to this. 🖌️🎨
All the info I was looking for. Great video, thank you
Glad to hear it!
Thanks Scott, your tutorials are great
Glad you like them!
Ok, I was a ComfyUI HATER. But once I learned more and more about it, I started noticing my images improved over what A1111 output. The thing is, yeah, A1111 is much simpler, so getting nearly the same or better out of ComfyUI requires some work, but the skill ceiling and options in ComfyUI are so much higher, which is a good thing - it means it can create VERY good pieces of art if the workflow is done right.
Excellent tutorial, learned a whole lot in a short time. Why is it that while the refined images are indeed sharper, they seem to lose some of the more acute details? EXAMPLE: an undersea shot without the refiner shows a murky underwater world with subtle light refraction and a sense of DOF; the refiner seems to strip that away, leaving a sterile shot with little atmosphere. Kudos
There is some balancing going on for sure!
@@sedetweiler I'm just an eager noob getting his toes wet. PS: Can I trouble you with one question: if I pick one image from, say, my "history", can I build from that and run further batches based off of that one image, so that I can fine-tune my results?
Thank you so much for the tutorial. It really helped with some basic knowledge that was not obvious to a new user of ComfyUI (double-clicking to get a list of nodes, for example). A couple of questions though: why are the width and height of the CLIPTextEncodeSDXL nodes set to 4096? What does that mean, since the output is still 1024?
It is the resolution CLIP was conditioned at prior to scaling. I tend to use it and prefer the result.
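For reference, a sketch of the size-conditioning inputs on the CLIPTextEncodeSDXL node (widget names as they appear in ComfyUI; the values and the comments about their meaning are illustrative, following the SDXL idea of conditioning on original size, crop, and target size):

```python
# Illustrative values only; width/height act as the "original size" hint,
# target_width/target_height describe the resolution you actually render,
# and crop_w/crop_h simulate a crop offset in the source image.
clip_text_encode_sdxl = {
    "width": 4096,
    "height": 4096,
    "crop_w": 0,
    "crop_h": 0,
    "target_width": 1024,
    "target_height": 1024,
    "text_g": "prompt for the OpenCLIP bigG text encoder",
    "text_l": "prompt for the CLIP ViT-L text encoder",
}
```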
I'm curious what's happening with your 2nd refiner when it starts at step 12 while the base model is also running to step 15. Are the 2 models alternating steps (acting simultaneously) or do they still run discretely? I'm curious if the starting step is logically useful or if it's straight voodoo magic.
Yes I was just going to comment that the math does not seem to add up in that 3 sampler version at the end (first refiner: start 0, steps 3; base model: start 3, steps 12; 2nd refiner: start 12 [??? why not 15??] steps 20). I tried it at both 12 and 15 and actually liked the 12 better, but that may have been a coincidence and in fact it doesn't really matter. Very curious what is actually happening if you "mess up" these numbers. If I really mess with them, most of the time it comes out black and white (In one iteration I forgot to change the numbers when I copied over the second refiner to make the first, and I got beautiful, but black and white, versions of my images!!). Voodoo magic indeed.
@@m4dbutt3r appreciate this. Would be nice to know if this power can be harnessed for good
Quick question, why was the last Ksampler added without a preview mode?
It wasn't on purpose. I just add them for their maths, not the previews.
@@sedetweiler thank you!
Thanks for the great video. Could you please talk more on the clip encoder width and height and target width and height? What do they do and is there any documentation? Why are you using a different value for the target than the base?
I would love to see an answer to this as well
Why did you choose 4096 for the height and width in the conditioners?
I'd like to know that one too
I'd also like to know what these conditioner numbers do. And somehow, I've been happier with outputs when I set those numbers to 2048. But why? I don't know what they are doing.
The refiner was initially conditioned at that size prior to scaling, so we tend to use that size.
@@sedetweiler Thanks for this tutorial - great reference. Great to have tutorials on this by someone who knows what they're talking about :-) I picked up on the size thing too - so it's 4096 for the base and 1024 for the refiner? Thanks!
@@sedetweiler "we tend to use that size" isn't really an answer. The only reason you'd have those numbers different is if you want to CROP a portion of the image... so in your case it's like wanting to crop a 4096x4096 OUT OF a 1024x1024 image, which obviously is not how the math works :)
Thanks so much for the video.
I have BASE Steps and TOTAL Steps primitives. So I'm trying to use a Primitive node to feed the PRERUN steps to the 1st refiner (let's call it the PRERUN KSampler), but I bumped into a problem.
- Feeding "steps" into the PRERUN KSampler is fine, but I cannot feed this "steps" INT into "start at step" on the BASE KSampler. They're both INT, but perhaps ComfyUI considers "steps" and "start/end at step" to be different types. 😒
- The other way around is feeding "end at step" for PRERUN, feeding that value to "start at step" for BASE, and feeding all KSamplers the same "steps" value. But for some reason, the PRERUN KSampler needs to be fed the exact number of steps, otherwise the result is nothing but NOISE. 😒
Please help, thanks again.
I have also noted that, and I think it is a bug. That should work just fine. I got around it by using a math node, since that was the end goal anyway.
@@sedetweiler That's exactly what I found. Derfuu VAR nodes and MATH nodes did the trick without any problem.
Having said that, I found the PRERUN step count should not be more than 3 or it's all crap :)
Thanks again, and please keep sharing the quirky tricks to play with in ComfyUI
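A sketch of the arithmetic those math nodes end up encoding (plain Python; the names are just labels for the primitive values, not ComfyUI node names):

```python
# Derived hand-off values for a prerun -> base -> refiner chain.
total_steps = 20
prerun_steps = 3            # reportedly best kept at 3 or fewer
base_end_step = 12

prerun  = {"start_at_step": 0,             "end_at_step": prerun_steps}
base    = {"start_at_step": prerun_steps,  "end_at_step": base_end_step}
refiner = {"start_at_step": base_end_step, "end_at_step": total_steps}
```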
@14:20, doesn't the second sampler go up to step 15? And as a result, shouldn't the third sampler start at 15? And thanks for a great video!
They are exclusive, the step start is correct.
I'm aware that SD doesn't take account of spatial relationships, but I want to be able to replace, for example, a sofa in an existing image with an image of another sofa, and I'm not sure how to take on that challenge with SD. Do you have any suggestions on where to start? I don't want to manually mask each image; I want the AI to recognize what part of the image is a sofa and mask it for me. I should just provide the image of the sofa and the "base image" of the living room.
Amazing tutorial, thanks for sharing!
Glad it was helpful!
You are da man! Thank You so much for this tutorial!!!
yay finally got it working :) learning fast thank you
Great!
How is there any noise left during handover to the refiner, if you don't use the "end_at_step" parameter? Don't you get images without any noise from the base sampler if you don't limit the end in any way?
Your base preview image confirms that you don't have any noise left after the base, which doesn't match the workflow described in the SD-XL documentation.
And why do you overlap steps? For example you do 12 steps in base, but start at step 12 in refiner, instead of starting at step 13.
Great video, very useful. I am struggling to build a workflow that loads an image and then puts it through an ultimate upscale node.
So I was following the guidance here, and found that UniPC and the 2M variants will barf on you when the refiner steps are higher than the base steps. I tried with the 12/20 pair you've demoed here, and got an image with nasty vertical streaks in it. It was fine at 20/20, but barfed again at 20/50.
Hmmm... at around 14:15, when you add the first refiner with the 3 steps, shouldn't the last refiner's "start_at_step" be changed to 15?
Should I use a refiner for a custom model? For example, if I use Juggernaut XL?
Do you also use an upscaler with SDXL? All the Comfyui examples I've seen never include it, so I'm just wondering how that would look in this workflow?
Ya, you can use any upscaler and use them repeatedly. It's way more flexible than AUTO1111. I will do a video on this super soon. Cheers!
Great watch, thank you!
Mine has nowhere near as much detail as this. I'm using the PonyXL model; is it an issue with the model?
Probably. Models that are "adjusted" can also have massive amnesia if they are not well done or are overly focused in one area.
Would you mind sharing this workflow through a gdrive ❤
Is it possible for you to do a tutorial showing the ComfyUI ->Models folder structure, and what goes into each of them? I manually installed the manager with no issues. But other things such as diffusers, embeddings, clip_vision, etc. are unknown to me. And a lot of things on huggingface can't be found within the manager. Thanks. PS: Just getting started with SDXL and using Comfy. So going through your videos one at a time.
Hmm, I'm messing around with rendering the first 2-3 steps as something I know SDXL is trained very well on, so for example "a brown horse racing" as the positive prompt for the first 3 steps, then using a negative prompt for the brown, with the new color being purple via (purple horse:1.3). It's been working very well, especially for harder-to-generate things; it's like it's erasing the colors and redrawing now that there's a rough shape. I'd love to see how it will work out in combination with ControlNet to maintain consistency in textures and shapes.
That method can also help with LoRA images that are not as strong as you prefer. It's a great workflow. 🥂
Hi Scott, really appreciate you giving us the most recent update on SDXL. Do you know how to fine-tune a model using SDXL 1.0 and DreamBooth? Is this something you can create a tutorial video on for us?
That is coming soon. It is going to be easier to train, results wise, but still getting methodology together.
@@sedetweiler 🙏🙏 Looking forward to it… do you know if the new dataset should be at a minimum of 1024 by 1024?
any tips on adding an upscaler?
Upscaler video is out today! Woot!
thanks!!@@sedetweiler
Thanks - for a non-native English speaker this was a good tutorial. It was very helpful! :)
Glad you enjoyed it!
I'm still confused about what the CLIPTextEncodeSDXL node does, and how does the value 4096 affect it?
That was the initial conditioning prior to scaling, so we just prefer that for the refiner.
Excellent video Scott. If you could do some of us a favor and go into detail about what everything is and how it works within the cliptextencode nodes then that would be of tremendous value. I have scoured the net and am only able to find limited info about the options and nothing i have found has explained how or why they work. Building out the workflow is a great first step but not knowing how to fine tune is lame 😂Thanks!!!!
Sure thing!
Do you know which sampler ClipDrop is using, and for how many steps? Especially in the SDXL 0.9 days. Would love to know.
I believe it is dpmpp sde GPU.
@@sedetweiler Interesting, seems like a very good Sampler, which I had never used. Thanks for the Info, very appreciated! 🤓👍
Why do I get a different robot to yours? Just curious. I thought if the seed was the same I should get the same image.
Even the simpler setup is convoluted. I've worked with shader graphs, so it's alright, but I can see how this has a bad learning curve for many. I just don't see the big gain in using this setup for this utility quite yet.
Stability should allow it to be "baked" into a simple GUI, so you can create a front end with different graphs, then not mess with it much, unless you want to add more pieces to the front end. Saving this front end would allow it to be shared with beginners and make it easy to get into, the complexity would be hidden until they're ready to explore.
What is the advantage to conditioning with the refiner first?
So I've been using a workflow that was on the ComfyUI GitHub examples page. I'm struggling to figure out how many steps I should be giving the refiner.
I would start with 32 in the base and 8 in the refiner
models, the refiner, etc. Where can I find definitions for all these variables?
You set the Base KSampler to return the leftover noise, but there is no leftover noise because it does all of its steps. Then the Refiner adds its own noise and processes it further. You can see it in the Base preview. I guess if you turn off the leftover noise on the Base, the result is going to be the same. What you need to do to pass leftover noise to the Refiner is to use, for example, 20 steps but end at step 12. Then disable the add-noise function on the Refiner KSampler.
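A sketch of the hand-off that comment describes (widget names from KSampler Advanced; the 20/12 split mirrors ComfyUI's stock SDXL base-plus-refiner example, and the exact values are only illustrative):

```python
# Base stops early and returns its remaining noise; the refiner continues
# the same schedule without adding fresh noise.
base_ksampler = {
    "steps": 20, "start_at_step": 0, "end_at_step": 12,
    "add_noise": "enable", "return_with_leftover_noise": "enable",
}
refiner_ksampler = {
    "steps": 20, "start_at_step": 12, "end_at_step": 10000,  # node default, i.e. run to the end
    "add_noise": "disable", "return_with_leftover_noise": "disable",
}
```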
Thank you very much. I prefer ComfyUI over A1111 and you are my go-to channel for my purposes.
Happy to hear that!