Just for the record, and in case anyone doesn't recognise them, the three men pictured at around 2:00 are, left to right: Yoshua Bengio, Geoffrey Hinton and Yann LeCun, three legends who have been doing pioneering work in AI over several decades. I believe that image dates from 2018, when they won the ACM A.M. Turing Award for their work on deep neural networks. Professor Hinton has recently been awarded a Nobel Prize (for physics, although some say that his work is not really physics).
I think for using these models commercially this is a big step up. I can imagine that clients often want this type of work, where two people from separate images are combined into one image, or similar tasks.
That's great so long as the clients want a picture of two different people who happen to look broadly similar to the originals.
Thanks Nerdy!
As usual: Great Video! About a supercool new Tool to play with!
Glad you liked it!
Tool wait what new album? oh just odd caps. *whispers at nerdy in comment above* yo that looks like something ai would say my man
What actual model is being used for the image generation, though? As I understand it, this is a whole bunch of different utility models mashed together with an image generation model, so what is generating the actual images? The functionality looks great, but the actual images are so inferior to things like Flux/SDXL, especially ones that we've tweaked and set up exactly how we want with LoRAs and whatnot.
Is it possible to essentially replace the image generation model inside OmniGen while retaining the rest of the functionality?
following this thread
I noticed that if I submit 2 images of people, they come out not looking like the 2 original characters; in fact, they are quite far from the originals. So I took a picture of the 2 people you used in your coffee shop example and used the same parameters as you did, and the 2 people that OmniGen generated didn't look like the people in the input images. So I don't know how you got the result you got (which looked so close to the people in your input images). In fact, I tried this with many different images, and none of them looked like the people in my input images. I'm guessing that unless the people were trained into the model (or are famous), they won't come out the same. In this respect, I can't use it to combine 2 people into 1 photo; it only works on people that have been trained into the model.
What a great video, thank you very much... Sorry, but could you share that workflow you created at the end with OmniGen and upscaling?
This is very impressive, thanks for sharing!
Where are the models or safetensors stored in OmniGen? It appears that when using ComfyUI, it downloads them each time. Please advise.
Experimenting with it now. Not combining images, but changing the style of an image. Takes forever though (RTX 3080).
Getting "Failed to import OmniGen. Please check if the code was downloaded correctly." 😖
Same here :/
Full of bugs. I fixed five and it's still not working.
I tried a couple of Comfy nodes and they keep giving one error after the next. It doesn't look ready for mass usage.
An interesting non sequitur
Can you share the upscaling workflow, please?
Another entertaining, perfectly orchestrated video, with plenty of sly humor and subtle jabs at the 'you-know-whos' and their secret stash of cheese. (Or not, could be me!)
👨‍💻
Cheers to the only mouse nerdier than the Redragon M908 Impact with its oodles and oodles of buttons for MMOs. I have one somewhere and might need to rig up Comfy with X-Mouse. So thankful that open source is actually available, that you're able to do this in a way that benefits everybody, AND that I have the big-ass GPU to do it. It's a fun time to be a nerd, but the 90's... 🤌 has control of the board, imo. I don't really game anymore, and I hope I never have to justify this video card to anybody not into AI.
What's the Chinese pop-up message that appears when the String Latent node crashes all about, telling me I need to obtain 2 API keys from the Baidu & Big Models websites if I want ComfyUI to be unhobbled and able to produce more "polished" images?
Huh?
I don't need to enter my prompts in Chinese, so this extortion message doesn't apply to me?
But how do we open it after it's installed? Thanks
OmniGen auto-downloaded it but I keep getting an error: Failed to import OmniGen. Please check if the code was downloaded correctly.
It downloaded to /models/LLM/OmniGen-v1
@@sk32md same here
Also had the same issue. I checked the model.safetensors file in that directory and found it to be smaller than expected, so possibly a download issue; redownloading now to give it another go.
ADDITION: Hmm, the listed size on HF is 15.5 GB while the size reported when downloading is 14.4 GB (as was the file it had auto-downloaded). I wonder if it's a 1000-vs-1024 unit mismatch, as the quick check below suggests.
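That hunch checks out: 15.5 decimal gigabytes is almost exactly 14.4 binary gibibytes, which many tools still label "GB". A quick check in Python:

```python
# Hugging Face lists the file as 15.5 GB, i.e. decimal (SI) bytes
size_bytes = 15.5e9

# Many download tools count in binary gibibytes but still print "GB"
size_gib = size_bytes / 1024**3

print(f"{size_gib:.1f} GiB")  # prints "14.4 GiB", matching the downloader's figure
```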
No joy for me either.
@@zid_just_zid I don't know if you've sorted it, but figured this was worth mentioning. I used the same model I had in my Pinokio OmniGen install; Windows Explorer lists that as 15.1 GB, and it worked with that web UI. I think I saw the one on CivitAI listed as 14.4 GB, but that may not equal the actual size downloaded, and I didn't get it because I obviously didn't need it. Figured it was worth a mention.
Upgrading diffusers through pip (pip install --upgrade diffusers) worked for me.
Excuse me, sir. I have a question. Is it possible to put 3-4 people in one picture at the same time? Or could even more people (as many as I choose) appear, sir? :) I always respect you, sir. Thanks a lot, as always. (I did subscribe, love the channel.)
I haven't tried, but based on the video, you can combine multiple OmniGen nodes, and since each node combines a max of 3 images, technically you can combine more than 3 if you chain the nodes together.
@@VuTCNguyenArtist Got it, thanks a lot as always. I'm gonna try it, sir! :)
Can OmniGen do writing and lettering and so on?
Want to make movie posters with the characters from my D&D campaign :-)
Stuck at "Loading safetensors" for like 15 minutes now. I've got a 3090 24 GB GPU and an otherwise powerful PC; resolution is 1152 x 896. That should work on my GPU, right?
Got it to work by closing Comfy and opening it again. On the first attempt, faces look like sh*t.
Thanks, great video, bro! But not as fast as you used to be; it's been covered for a while. (Positive criticism.)
I finally gave this a try, and unless something has changed, the OmniGen ComfyUI node by 1038lab is a nightmare. It takes way too long to use because literally every generation has to reload the entire model. I ended up switching to the AIFSH one, which is much better because it actually keeps the checkpoint in memory and is much faster as a result. Its documentation is lacking for sure, and I had to fix a .py file: the node's .py file was missing "import gc" near the top, and I kept getting an error about gc when attempting to run it until I added that (see the sketch below). Thanks, ChatGPT, lol.
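For anyone hitting the same error, the fix really is just the missing import at the top of the node file (a minimal sketch; the exact file name varies by node pack):

```python
# Top of the AIFSH OmniGen node's .py file (often nodes.py; name may differ)
import gc  # presumably needed for gc.collect() calls further down that free memory between runs
```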
@@zengrath The one from AIFSH also has lots of bugs. I couldn't get either to work.
For people and backgrounds: if the background is photographed from a real place, can the result look real?
Thanks, just the thing to try out on my 35 c/hr cloud 4090.
Nice!
@@NerdyRodent Pretty amazed that even with that hardware it takes a good 3 minutes to generate with 1K inputs and canvas. Maybe I should test it on an RTX A6000 instance, time the session, and then promptly terminate it.
As detailed in the video, it’s about one minute and 30 seconds on my 3090
@@NerdyRodent OK, next time I'll use a stopwatch; it's just that on a 4090, anything above a few seconds seems like an eternity.
@@NerdyRodent Checked: it took 1 minute 18 seconds on the 4090. Not really much of a boost.
Can you share the workflow?
Example workflows are in the examples directory 😉
Having been through hell with Pinokio (ironically, its nose should be longer), I'm left with no choice but to use this.
I've tried it, and it really can fuse images, but beware: it's not for consumer hardware. On a very good Nvidia card this fusing takes hours, and the result may be a total waste of time. Simple text generation is faster, but the quality is awful, like 3-year-old Stable Diffusion. It can't be used outside the cloud.
Yes, it’s best to have at least 16GB VRAM these days!
@@fontenbleau not even on a 4090?
@@NerdyRodent I have a 4090 and I thought it was rather slow, personally
I have an A100; it's very slow... 130 s for 1 image.
I have a 16 GB card. Maybe it's tuned mostly to process humans in photos; with objects or animals, the output is really weak (that's about fusing; text-to-image is like Stable Diffusion 1.5).
OUT OF MEMORY
Use the memory optimisations if you have a low-VRAM card (sketched below). There are even more of them now than in the video 😃
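For reference, the upstream OmniGen pipeline exposes low-memory flags along these lines; the names below are from the OmniGen repo as I recall them, so treat the exact spelling and defaults as assumptions that may differ between node versions:

```python
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

images = pipe(
    prompt="A person <img><|image_1|></img> sitting in a coffee shop",
    input_images=["person.png"],  # hypothetical input file
    height=1024,
    width=1024,
    separate_cfg_infer=True,  # run the CFG passes separately to lower peak VRAM
    offload_model=True,       # keep weights on CPU and move layers to GPU as needed
    use_kv_cache=False,       # skip the KV cache, trading speed for memory
)
images[0].save("output.png")
```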
"Phi3Transformer does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet."
I've got this error. How do I solve it?
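This is the standard transformers error for model classes that don't implement SDPA, and the usual workaround is to request eager attention when the model is loaded. Whether the ComfyUI node exposes this is an assumption; the idea looks like this:

```python
from transformers import AutoModelForCausalLM

# Hypothetical sketch: wherever the node loads the OmniGen/Phi-3 weights,
# force the eager attention path instead of the unsupported SDPA one.
model = AutoModelForCausalLM.from_pretrained(
    "models/LLM/OmniGen-v1",      # path mentioned earlier in this thread
    attn_implementation="eager",  # avoids torch.nn.functional.scaled_dot_product_attention
    trust_remote_code=True,       # assumes the custom Phi3Transformer ships with the model
)
```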
I just wanna use the LoRAs I made of my girlfriend and me to make funny Facebook posts, without generative-filling two photos together, lol.
Oh, Nerdy Rodent! 🐭🎵
He really makes my day! ☀
Showing us AI, 🤖
in a really British way! ☕🎶
After a week of trying, I got this node working for about 18 hours, but then an update broke it again. :-/
I *really* want Comfy to support this or ACE, but this does faces better.
First!
Hero! 😂