As I mentioned in the video, I'm joining StabilityAI in April 2024. Thanks for all the channel support. I've received permission to continue making YT videos as long as I don't violate NDA :)
Workflows are here: github.com/kudou-reira/SDXL_training_settings/tree/main/comfyUI_workflow/high_resolution_inpainting
Full timestamps:
00:00 Introduction
03:24 Resizing the image
05:04 Checking image size
07:00 Default mask
08:12 Preview bridge
09:23 Canvas tab
11:35 Naive inpainting
13:48 Issues with naive inpainting
15:06 Mask to region
16:34 Cut by mask
19:04 Resizing the crop image
19:47 Resizing the crop mask
20:45 Encoding the masked crop
21:48 Comparing naive and high-resolution inpainting
22:44 Compositing ksampler crop back onto the original image
25:40 Checking the robustness of the high-resolution inpainting workflow
27:32 Image blend by mask
27:56 Image downsampling on the composite
29:30 Acly's ComfyUI inpaint nodes (overview)
30:56 Acly's pre-process workflow
33:20 Fill masked options
34:38 Blur masked area
36:16 Fast inpaint
36:50 Outpainting with SDXL
41:44 Fill masked area (setup) with high-resolution inpainting
43:04 Rgthree nodes introduction
46:04 Fill masked area (visualize)
46:52 Integrating Fooocus patch
49:00 Tensor size mismatch error
50:16 Adding ControlNet depth
51:58 Rgthree bypass node on ControlNet depth
53:00 Adding ControlNet depth to the Fooocus patch workflow
54:06 Fill masked area integration
55:06 Comparing results with Acly pre-process
56:10 Image to mask error + fix
59:02 Fill masked area (blur)
59:36 Alternative blur method
01:00:34 Image composite masked
01:02:08 Fast inpaint
01:03:12 IP-Adapter overview
01:04:12 Removing redundant nodes
01:06:14 Series workflow (fill masked area)
01:07:40 Series workflow (blur masked area)
01:08:22 Group bypass/muter
01:09:12 Series workflow (fast inpaint model)
01:09:42 IP-Adapter crash course
01:11:20 Bypassing the IP-Adapter
01:12:20 Using high-resolution fast inpaint to remove objects
01:13:16 Applying reference image to high-resolution inpainting workflow
01:15:48 Integrating attention masking to the IP-Adapter inpainting
01:16:36 Why do we need double masking (attention + normal)?
01:18:06 Using IP-Adapter with multiple reference images
01:19:32 Addressing the opacity blend issue
01:20:48 Applying ControlNet to IP-Adapter
01:21:22 Adding pre-processing methods to IP-Adapter
01:22:29 Using multiple reference images for IP-Adapter
01:23:44 Compositing images inside ComfyUI (Canvas Tab)
01:26:48 Switch nodes (Comfy Impact)
01:29:16 Rgthree bookmarks
01:30:28 Pad image for outpainting
01:31:12 Integrating image padding into high-resolution inpainting workflow pt.1
01:31:44 pt.2
01:32:56 Outpainting from a bust-up image
01:34:20 Context nodes (rgthree)
01:35:52 Context switch (rgthree)
01:38:32 Replacing switch any with context switch
01:39:52 Toggle control system basics
01:41:40 Toggle between txt-2-img and high-resolution inpainting
01:43:12 Linking state across groups with relay node
01:44:20 Fixing the one-way relay issue
01:45:00 Linking the state of txt-2-img to image save node
01:46:00 Debugging the outpainting workflow
01:49:12 Debugging why both txt-2-img and inpainting run at the same time
01:51:00 Giving txt-2-img its own full scale/non-masked reference image
01:54:00 Conclusion
Congrats!
you deserve it hun, I wish you success and happiness
I never thought I would be taken back to Fluid Mechanics while watching a ComfyUI video 😅
thanks
Great vid! I tend to find myself wanting to conserve my productivity bandwidth (decrease mental latency) 1:24:11! Thanks for the heads up haha. Must've taken you a while to edit, appreciate all the effort, I learned a couple tricks!
Pretty nice to review most inpainting workflows in one video and compare similar processes in one go. I always wanted to test them side by side myself but never had the time, and learning each procedure one by one only gives you a vague idea of the pros and cons of each option, which is the overall "winner", or at least the best for a given situation. Thanks a lot, very fun to watch as well.
Nothing personal, but watching so many non-native English speakers with unfamiliar accents makes me nervous. (I'm a CG/VFX artist with both ADHD and ASD, which help me a lot to focus on the things I love and get obsessed with, like 3D/CG/VFX and now generative AI, but anything that perturbs my attention, like a non-standard accent, drives me crazy. Sorry to all the non-native speakers out there 🫣 it has nothing to do with segregation, racism, or anything like that, just phonemes and sounds my brain is not expecting in English that make me unable to focus on anything else 🤦🏻♂️. I'm sure other fellow ADHDers and ASDers can relate.)
So finding new natural-sounding, English-speaking, and technically acquainted AI demoers is quite a challenge. I'm really happy to have stumbled upon your channel, subscribing and liking right away.
There are lots and lots of "newcomers" to the world of digital "art" who have never worked in anything related to drawing, painting, sculpting, 2D, 3D, VFX, or CG in general.
Sometimes these newcomers and self-styled AI experts, who worked in who knows what before they decided to create YouTube channels, give advice that often makes no sense whatsoever.
Just because something works in one specific test doesn't mean it's technically and artistically correct, nor even what the people behind the idea and code intended it to be used for.
Most of them don't take the time to read papers and understand what's behind a certain idea or technique, nor do they know anything about Navier-Stokes, Euler, Karras, probabilistic algorithms, image sampling, noise sampling, progressive refinement, Gaussian/Perlin/Brownian noise, or anything math-related in the image creation field, and so on and so forth.
And as you said: good for them... For most people learning AI to have fun, make cool wallpapers for their PCs or phones, some funny memes, and the like, knowing anything more than what button to click and what checkpoint to use is more than enough.
But for those of us who are not just having fun with a new tech that lets you create cool images with little to no talent, knowledge, or expertise, but who are trying to learn, understand, and know all we can about this tech, starting to apply it to our daily workflows, oiling up our gears, and getting ready for what's coming sooner or later to our industry, finding channels like yours is really like finding a needle in a haystack.
The future of CGI is inevitably linked to the future of AI. Whatever those who predict we're all on the brink of extinction/unemployment may say, like it or not, AI is here to stay!
Human decisions, talent, aesthetic taste, and directed problem solving are not something machines will be able to do anytime soon.
So whether as plugins, native options, or entirely new software for designing, creating, manipulating, animating, editing, and compositing with AI tools, we ought to be ready today. Tomorrow might already be too late.
And in that regard, it's pretty frustrating to watch 10 or more minutes of wordy intros, excessive hype, and lots of nonsense or incorrect workflows that "just happen to work" but give you little control or are slower to generate.
Whether you are native or not doesn't mean a thing; your English and pronunciation are flawless. I suppose you are of Asian descent but raised in the US, because of your username, the kind of art you show, and your calm way of speaking. Just guessing here, of course, but whoever you are, THANKS A LOT for sharing your knowledge, tests, and findings.
Sorry for the super long comment (it's part of the ADHD side of my personality; I'm very talkative and completely unable to summarize my ideas when I write or type 😅).
Kudos, congrats on starting at Stability AI, and please keep up with your great videos!
Thanks for your kind words. As you have correctly guessed, I was born, lived, and studied in the United States for my entire life until deciding to move to Japan.
There are a few reasons why I made a YouTube channel in the first place.
First, I felt that there were plenty of underqualified creators with poor-quality channels trying to distort/shortcut design, CGI, and machine learning/generative AI. It's probably a large reason why my models/techniques/workflows are very different from those of the generic Gen AI influencer.
Second, I like to get very technical and also have the traditional industry pipeline training, so I wanted to show how I would use these new tools to help me in my creative process.
Thanks for stopping by!
Congratulations! It sounds really exciting!
Thanks for the kind words! It's been fun so far.
Congrats on the exciting new position!
The best! I looked through a bunch of tutorials, and everywhere it's just "connect this here, then here", never what it all does or how it works. God bless you man, you saved my day.
Thanks for the kind words. Two hours is a long time, so I understand that it's not for everyone, but I personally thought it was a resource worth making. Hopefully this helps everyone create/improve their own workflows.
Thanks for your video. This is a truckload of very good material. And congrats on your new job!
I might have stumbled upon another bug here. The blur masked node simply crashes my SD server whenever I use a blur > 5. I also had those other bugs you had, although the weird checkered alpha mask occurred AFTER I used your "hack". Before that, I "just" had problems with getting the mask blurred at all. Something here seems very buggy. I've never run into such a buggy situation with Comfy before!
Thanks for the kind words. For the blur crash, if the node doesn't work, I suggest trying nodes from other node creators that are supposed to do the same thing. Blurring at a high strength might be a computationally expensive task (not sure). As for the checkered alpha mask, it's either a bug with the newer version of Acly's inpaint nodes, or you'll just have to manually change the grow mask each time. Or, you can blur the mask, then solidify it again with a certain intensity threshold (127 is gray, 0 is black, so pixels around 10 or less would be converted to black).
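For the blur-then-solidify idea, here is a minimal sketch using Pillow and NumPy; the function name, radius, and threshold are illustrative choices, not taken from any specific ComfyUI node:

```python
import numpy as np
from PIL import Image, ImageFilter

def blur_and_solidify(mask: Image.Image, blur_radius: float = 8.0,
                      threshold: int = 10) -> Image.Image:
    """Blur a mask, then snap near-black pixels back to pure black.

    Pixel values run from 0 (black) to 255 (white), with 127 as mid-gray;
    anything at or below `threshold` is forced to 0 so the soft halo
    left by the blur doesn't leak into the composite.
    """
    blurred = mask.convert("L").filter(ImageFilter.GaussianBlur(blur_radius))
    arr = np.array(blurred)
    arr[arr <= threshold] = 0  # solidify: near-black becomes fully black
    return Image.fromarray(arr)
```

The same effect should be reachable with generic "blur mask" and "threshold/levels" nodes from other node packs if the Acly blur node keeps crashing.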
Do I understand the workflow correctly that the nodes will always have to be manually adjusted to each new picture and the cropped part? That's so much work for doing something that works with A1111 automatically. I wish there was an easier workflow than this, but thanks for pointing out a way to do it. I haven't seen anyone else doing a tutorial on high-resolution inpainting in ComfyUI.
What do you mean? You can change the crop resolution if you want, but I just leave it at 1024 x 1024 pixels usually. The workflow will automatically resize the inpainted area (assuming it's a single continuous entity and not separate blobs) into your desired crop resolution (i.e. 1024 x 1024 pixels), do the inpainting, then resize it back down at the end.
That's the core logic. Everything else I added, such as the pre-processing of the masked area (blurs, etc.) and ControlNets/IP-Adapter, is just an extra level of customizability.
Maybe I'm not understanding your question correctly. However, this is the "simplest" way I could think of to do the same method as A1111 in ComfyUI.
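That crop-inpaint-composite logic can be sketched outside ComfyUI as well. A rough Python approximation with Pillow/NumPy follows; the function names, padding amount, and LANCZOS filtering are my illustrative choices, not the exact nodes from the workflow:

```python
import numpy as np
from PIL import Image

def crop_for_inpaint(image: Image.Image, mask: Image.Image,
                     crop_size: int = 1024, padding: int = 32):
    """Find the mask's bounding box, pad it, and upscale that region to
    `crop_size` so the sampler inpaints at full model resolution.
    Returns the crop, the matching mask crop, and the box for pasting back."""
    arr = np.array(mask.convert("L"))
    ys, xs = np.nonzero(arr)  # assumes one continuous masked blob
    box = (max(int(xs.min()) - padding, 0), max(int(ys.min()) - padding, 0),
           min(int(xs.max()) + padding, image.width),
           min(int(ys.max()) + padding, image.height))
    crop = image.crop(box).resize((crop_size, crop_size), Image.LANCZOS)
    mask_crop = mask.convert("L").crop(box).resize(
        (crop_size, crop_size), Image.LANCZOS)
    return crop, mask_crop, box

def composite_back(image: Image.Image, inpainted_crop: Image.Image,
                   mask_crop: Image.Image, box) -> Image.Image:
    """Downscale the inpainted crop to its original size and paste it onto
    the source image, masked so only the inpainted region changes."""
    w, h = box[2] - box[0], box[3] - box[1]
    out = image.copy()
    out.paste(inpainted_crop.resize((w, h), Image.LANCZOS), box[:2],
              mask_crop.resize((w, h), Image.LANCZOS))
    return out
```

The inpainting step itself (the KSampler) would run on `crop`/`mask_crop` between these two functions; nothing needs manual adjustment per image because the bounding box is derived from the mask.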
Congratulations
Congrats on the position! Also, I think the reason resize width and height don't do anything at 4:40 and 6:15 is the mode on the image resize node. You use the mode dropdown to pick between using the rescale factor and the resize width/height to specify the operation.
Thanks! I think what you said is probably right and I'll try it when I have time. It would be useful for situations that require precise resolution.
Wish this video had a TL;DR-esque section so that, if you just wanted to test the workflow, you could jump in and try it more easily.
I did that at first and showed some friends the workflows and explanations, but they couldn't follow it at all. So, that's why I made this video.
Congrats. Is Stability AI a fully remote company?
Thanks! There are offices in some countries, but if Stability AI is really interested in recruiting you, they will let you be remote from wherever.
@@kasukanra Nice man. Congrats, dream job
Noob question: how do I see the corresponding custom node library name at the top of each individual node?
If you have the ComfyUI Manager add-on, you can open it up, go to the fourth option on the left (Badge), and change it to #ID Nickname.
@@kasukanra thanks
Congrats on the new role…
Great example of a very knowledgeable person who is sadly a terrible teacher. I wish you would understand didactics and how to convey knowledge in a manner suitable to those who don’t already know 90 percent of this.
You evidently attempt structure with those section titles, but each segment is all over the place and all you actually do is walk through the nodes, which any student of your topic can do in their own time.
It would be a thumbs up if you would actually explain what is going on, and spend less time frantically zooming and panning all over this overly convoluted workflow.
I grant you that you know this topic well, but as a tutorial, it really fails. Which is unfortunate given the effort put into it.
Thanks for the feedback. I've already started pulling back on creating detailed videos.
Also, you are mistaken about this being a tutorial. I shared the workflow before creating this video, and even experienced people could not understand the logic.
This is why I made this video. I agree with you that I'm not a teacher, but I don't claim to be one, either.