This AI image generator does EVERYTHING
- Published on 7 Feb 2025
- OmniGen can create & edit images with natural language. No more ControlNet, LoRAs, or inpainting.
#ainews #ai #agi #singularity
TurboType helps you type faster with keyboard shortcuts. Use it for FREE:
www.turbotype....
OmniGen: arxiv.org/pdf/...
Newsletter: aisearch.subst...
Find AI tools & jobs: ai-search.io/
Support: ko-fi.com/aise...
Here's my equipment, in case you're wondering:
Dell Precision 5690: www.dell.com/e...
GPU: Nvidia RTX 5000 Ada nvda.ws/3zfqGqS
Mouse/Keyboard: ALOGIC Echelon bit.ly/alogic-...
Mic: Shure SM7B amzn.to/3DErjt1
Audio interface: Scarlett Solo amzn.to/3qELMeu
👋 hi
We are doomed. It has been a good one guys 🫡
You can keep reviewing papers while advertising TurboType, or you can be more informative. It’s your choice.
Released 🎉
This is perfect for someone who wants to create games, visual novels, comics and manga without spending tons of time learning and practicing the tools. I'd be excited to give it a try when it's out.
likewise!
Sounds great, but people will be shocked and reject it when they find out it was created by AI.
@@wwk279 nowadays everything is generated using AI!
@@wwk279 over time I think people will get used to AI-generated content as it gets better and better
What if this paper is itself generated by AI 😂 @@truelies5431
Imagine next year when this level of control is available for AI video.
Imagine the year after next. OMG it is going to be so great
I can not wait
@@MartensFamilyHomeMovies what about the year after "after next one"? What's gonna happen then?
Next year would be too optimistic. AI can barely generate 5 second videos and takes forever to create them. My bet is 3 years if we are optimistic.
@@elon-69-musk really right!
This is a staggering upgrade to AI image generation. We are on year 2 of the OpenAI public era. I can not imagine what comes next, let alone, the next year!
exponential growth!
I've been telling you all for 4 years now
@@TheThetruthmaster1 be honest your prediction just happened to come true lol
It's just a paper, dude. It's not real until we can use it.
The paper shows that it's becoming real. Because if they can do it in a paper, then it can be done without a paper.
Yep, this video is nothing
@@ignoreme1141 Yeah, but most of these things begin with papers, right? That's how it works.
Sadly, that's the truth. There have been more than one or two duds that promised great things and delivered none.
@@ignoreme1141 Why do you seem angry that it came out as a paper first? All of the others who came out with something published papers first. Were you mad at them too? A little gratitude isn't a bad thing.
dopepics AI fixes this. AI image generator does everything.
The fact that this was released and there isn't a single YouTube video is wild.
Some of the early image gens had bad prompt adherence and you just glossed over that part. For example, a blonde woman was asked for and a brunette (dirty blonde at best) was generated. She also had no clothes compared to the "minimally clothed" that was asked for, background was not magenta. You said the prompt was "indeed what we get"
I agree, this looks like a weak version of ComfyUI using ControlNet, which AI Search is aware exists. Good and specific images take effort and time. There are multiple tools where you can just use one image of a character and generate images with their face without issues. ControlNet is king right now.
ok dude we all saw that. don't be picky.
@@cyberprompt It's not about being picky. It's not a useful tool if it doesn't generate what you prompt for. This is a big issue for people working with this stuff.
The point of this technology is not really prompt adherence. It will be made OpenSource, so the abilities detailed in the video will (probably very soon, it all goes so fast) be attached to generation tools with better prompt adherence like Ideogram for example.
It also messed up the second image in the Iron Man prompt.
That's a monkey-like samurai character, but the output made him look very human.
When SD first came out, I went all in, first to just understand the concepts, then learn all the extras. Now I'm waiting. There have been so many advances and competition for what is "best" I don't want to burn myself out while the winner is decided. Or max out my hard drive with unnecessary tools and models.
I feel the same! There's just so much going on
it is exhausting
This is how JavaScript web developers feel when there's always a new trend
I feel this is a dream I've been having for years about to come true. I can only hope this will come out with a free limited version for those of us who crave creation but truly can't afford anything...
i hope so too!
You died a very long time ago old friend... Read Codex of the Celestial Dream 🙏
@@NakedSageAstrology Please be careful what you say to people regarding death. I'm struggling with depression and almost killed myself recently. It has the potential to create the opposite effect you're aiming for.
@@aiwkua it's ok human the robots will takeover soon
@@aiwkua stay strong!
What does your daddy do for a living? He's a professional graphic designer....
Teacher: (silently smiles, furrowing her eyebrows)
Transformers are one of the coolest technologies to have been invented recently, and to think, thankfully Google released the "Attention Is All You Need" paper in 2017. It's quite a general architecture.
Sounds really cool. Looking forward to its eventual release.
i can't wait to try it out!
I need this thing. This is beyond amazing.
wow... this would definitely be on my bucket list for testing...
can't wait for it to be released!
The proof is gonna be in the pudding (or not...)
Hey there! I've really enjoyed most of the videos you've published on YouTube. As I'm currently experimenting with AI locally, I've been frequently visiting your channel since the spring of this year. I want to express my gratitude for all your efforts and the excellent explanations of AI's capabilities. Honestly, I'm tired of the endless parade of chatty AI newsreaders hogging YouTube. I'm increasingly drawn to faceless channels like yours that provide clear explanations of AI processes, along with examples and visuals that are easy to follow, unlike other channels that are just jumping on the AI bandwagon. Keep up the fantastic work and enjoy what you do! 🤘😉
Thank you so much!
I don't understand why, but you always manage to amaze me and make me think "as if something like this already exists, it's so incredibly fascinating!" Your videos are always the best, and I love it when you show something new that beats the old by a lot. I LOVE THAT ABOUT YOUR VIDEOS! You're my fav AI YouTuber!
Wow, thank you!
Amazing, man. I like your way of explaining, and you're an enjoyable storyteller 🎉
This is what I've been waiting for! I think other image generators will start having problems soon.
❤ love the progress
I believe this is what the omni-modality of GPT-4o has to be. The ability to chat with a model that inherently can create images.
Now that we have this, it really makes me wonder why the GPT-4o we're getting is still nerfed.
Now that you mention it, I had totally forgotten about 4o's multimodal features. In theory, 4o should be able to do what this paper claims as well. It's strange that it still resorts to DALLE for images instead of using its native image capabilities. Maybe they still need to figure out the guardrails
@@theAIsearch It seems to me a lot of big AI youtubers and influencers are forgetting this...
@@absolutedoruiyaaa4736 it doesn’t have image gen. It has text, vision and speech embedded in the model, but not image gen.
Someday, when this kind of control is possible, we will enjoy being empowered. For now it’s just a wish list kind of like describing what systems we might use to get to Mars.
I will suggest that my group check you out. I'm impressed.
Thank you for your excellent programs as always.
You're welcome!
Holy #! Can't wait to get my hands on this once it's released.
That upscaling (or deblur) at 10:10 looks mind-blowing. I wonder if that quality will be available for videos too in the near future.
How do you always stay updated 😂😂, good job man, I appreciate your hard work ❤❤❤❤❤❤❤
Thanks!
Mark my words, people will use this to ask the AI to strip naked people by providing a photo.
open source devs need to step up, i'm tired of chinese research papers and stuff like this, but not an actual tool i can actually use for free and locally
so you want to be spoon fed? create something yourself instead of expecting people to spend tons of money and time to release something for free
I just wanted to take a moment to express my gratitude for the amazing video you created on AI. Your dedication and creativity truly shine through, and it’s clear how much effort you put into making such valuable content. I know how hard it is to start and generate subscriptions. I watched the full video because it was interesting. Please keep us updated on this generator. As a new content creator, what would you say is the best image-to-video generator to start with? I use Kling but it takes way too long, and to be honest I don't really find it great.
Thank you again for your incredible work. Keep up the fantastic content!
Thanks! For image to video, Kling is your best bet. Other options include Luma (worse quality) and Runway (expensive af)
@@theAIsearch Thank you, I will keep watching you. A new subscriber.
This means I can start teaching my daughter how to use AI after the release of the 'OmniGen'😀😇
Consider this your cheat sheet for applying the video's advice:
1. Explore OmniGen as a user-friendly alternative to traditional AI image editing tools.
2. Communicate your desired image edits to OmniGen using natural language prompts.
3. Experiment with OmniGen's ability to add objects, change colors, adjust poses, and generate depth maps.
4. Understand the power of OmniGen's unified architecture, combining a VAE, LLM, and diffusion model.
5. Leverage OmniGen's emergent abilities for multi-step editing and in-context learning.
6. Start with simple edits and progressively increase complexity to maximize your results with OmniGen.
7. Be aware of OmniGen's limitations, such as prompt sensitivity and occasional struggles with hands and fingers, and unfamiliar image types.
8. Stay informed about the evolving capabilities and future developments of OmniGen.
This will be extremely useful for everything from forensics to YouTube thumbnails. I can't wait for this to come out!
Seems too good to be true
That is impressive. Can it also outpaint an image? That would be great.
Very intriguing. Although given the current state of AI development and previous heartbreaks, I'm practicing a healthy dose of skepticism. Hopefully when we get some hands on experience with it, the model holds true to the article's claims.
There are as many outrageous yet still true claims as grifts in the AI space, which is (1) what makes the grifts believable and (2) what makes AI so special: the fact that apparent scams turn out to be real technology.
OmniGen's ability to simplify complex image generation is a huge step forward. Seeing it in action shows just how intuitive AI tools can become for creative professionals!
Release date?
Since the software is being developed in China, it will probably be heavily used there for CCP matters, as they already do with AI software.
sorry, but your question about what you can do with the software is a trigger these days. All this software is cool, but unfortunately it also has its dark sides. In addition, for data protection reasons, these tools only become interesting when you can use them offline, without being an IT guy. :)
thank you! any guess when this will come out?
There was actually something like this back in the SD1.5 era. I've been trying to find it again, but I believe it had the word llama in it. It was injected into specific layers like LoRA and ControlNet, but it was an LLM that could comprehend all these things in the image and let you command it using natural language instead of prompting.
I probably wouldn't use these, as the quality looks poor compared to a well-trained LoRA on Flux (esp. the Wu Kong example, which probably wasn't already in the base model like Bill Gates would be), and you have much more control over composition with something like latent coupling, but hopefully future models use these techniques.
Oh interesting. Thanks for sharing!
Llava was the captioner
@@SearchingForSounds that's a vision LLM, but this was an A1111 extension that was primarily an embedding into an AI art model; the LLM was just controlling it. Also, this was long before we had multimodal LLMs.
Have you tried using this app? Because to me it looks like a lot of these examples are fakes.
JUSTICE FOR BOB
one ring to dominate them all!
😃
If adobe finds out about this one, they're getting shut down, it's adobe's motto after all, end the competition by buying them
it won't be long before you can specify anything. we already have video. next is to create a persona for companionship. it's always that direction.
Twitter disinformation just levelled up
Release it and make it open without restrictions and I am hyped
I look forward to conversational AI merging with these models so we can just chat back and forth with the generators and tweak our images together into a final image. I'm sad the conversational Pi chatbot was abandoned.
Hopefully OpenAI and Apple release their models to the masses.
Pi just got a huge upgrade and he is faster and smarter than ever 😊😊
They are simply using a language model to call various tools like the IPAdapter for that ironman example or depth, canny, and open pose models.
An excellent work, but nothing extraordinary. You can do all of that already in A1111 or Comfy
No, it's an LLM that's specifically trained to respond with images based on input images, similar to how text-based LLMs like gpt4o respond with unique text based on your input. It's not a collection of different tools working together behind the scenes. It's an LLM that accepts both images and text as input, but unlike gpt4o which only outputs text, this one outputs images.
Please raise the loudness of audio when you edit the video.
Thanks for sharing! Personally, I don't think this model will come to dominate image generation, because it doesn't seem to offer the fine-grained control our work needs, the way ControlNet or IPAdapter do, particularly in consistent-image workflows.
This is similar to the method used by Udio and Suno: they learn from reference audio plus a text description.
Nuts! Imagine the possibilities
Looks like it understands the general physics/objects in the image and can construct the prompt, rather than just mixing it up like Stable Diffusion. Some martial arts or gymnastics pics would be nice. Strange body positions or interactions have been a huge problem with the image creators since the beginning of the year. Or just human-object interactions, like a female woodworker climbing a tree with a belt full of working tools in the early morning hours, watched by a curious crow sitting on a nearby branch.
this might be peak ai image generating
Wait I was looking at mechanical turk jobs a little bit ago, and one of the jobs was to do what this image bot does - identify things like 'where can i wash my hands' or 'what can hold water'. Interesting.
Doesn't show me an image of where is your current location, it sucks
???
yes but probably the model was already trained with images of famous people / movie characters so it is no "magic without lora" as advertised in video I guess
Seems too good to be true. Wouldn't be the first time a chinese paper claimed it can do things it can't
Thanks
So when this ai becomes available will you make a video explaining how to use it?
definitely!
Would you recommend buying an RTX 4060 Ti for open source AI applications? Or a 3060, because of the 12GB of VRAM? ❤
looks cool nice
This will be good for architectural renderings. Say goodbye to overpriced rendering studios.
"What can be used to hold water" is a philosophical question. Technically, the woman already holds water or can hold it by drinking it. Plants contain water so they're holding it. The entire room could hold water.
I'm pretty sure this is Adobe's vision for Photoshop Ai
But they are so far off being this capable it's laughable. This new AI is going to make so many people's tasks so much easier
We've been waiting for something like this for a long time. Plus, the network should understand direction and position in space.
Wonderful
Every day we get 1 step closer to the Holodeck. 😁
Yeah, I just hope the final Holodeck won't be hampered by "ethic" and "moral" standards. It's the perfect playground for humanity's darker aspects, allowing us to separate it into a virtual space where it does no real harm.
This is fire
😃
Awesome dude, but Q: how the hell did you find this repo??? it's not really famous or anything???
I don't really remember - probably on X. There's so much crazy news everyday it's hard to keep up
a truly intelligent AI image generator :3
I can say for a fact that current AI doesn't suck.
The worst thing about all of these is that there is no button to click and purchase, followed by a bar to write the prompt. It is hell to find out how to use them, or their APIs or Discords or whatever. I want to use Flux1 and this to see how they compare, and I just cannot....
There's a great tradition of unbelievable papers promising things that never materialize. This "review" is like preordering a game based on a cinematic. It's a great way to get scammed.
looking forward to this, if the model actually does come out...free
*not free tho and 10 to 20 bux for an average user is untenable* 😅
Need this in comfyui asap...
it would be great if it generates directly PSDs with separate layers. A Designer then can work on the details, make changes easily. Adobe, are u listening?
In future we can program our babes
that's what i'm waiting for
I don't believe it until I test it 😜
Trying to replace LoRAs with a one-image approach is certainly convenient if you just want to create low-effort memes to share with friends. But you are inherently giving the AI less data and detail, which will almost always lead to a lower-quality result. This is still an impressive model and I appreciate that they are trying to package everything into a user-friendly interface with a fast workflow, but "There's no longer any need to train LoRAs" is hyperbole bordering on misinformation.
Now this is a game changer!!
Bro can you suggest any open source ai for captions generation for free..?
So ChatGPT but images.
basically
it's Chat SD!
would be interesting to see how it handles video generation.... asking it to generate the next frame, do this 1'000 times to get a 40 second video 🙂
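A quick sanity check on the arithmetic in the comment above, as a minimal sketch: 1,000 frames only add up to 40 seconds at 25 frames per second. The comment does not state a frame rate, so `fps=25` is an assumption, and both helper names are made up for illustration.

```python
import math


def clip_seconds(frame_count: int, fps: float) -> float:
    """Length in seconds of a clip made of frame_count frames played at fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


def frames_needed(seconds: float, fps: float) -> int:
    """Number of frames to generate for a clip of the given length, rounded up."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return math.ceil(seconds * fps)


# Assumption: 25 fps, the rate at which 1,000 frames come out to 40 seconds.
print(clip_seconds(1000, 25))
print(frames_needed(40, 25))
```

At other common rates the same 1,000 frames give a different length, so the frame rate is worth fixing before counting generation steps.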
How to use it? What's the link?
Shouldn't we be able to use an AI for ComfyUI? This way we should be able to use all the different tools with a single prompt even without OmniGen. The workflow json file could be AI generated as well and this should be no rocket science. Am I right? What do you think? I could start researching about this but maybe there already is something I don't know about.
cool idea. it sounds possible, since it's just json generation
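If a language model does write the workflow JSON, it is probably worth checking the result before running it. Below is a minimal sketch of such a check. It assumes a node-graph layout in which each node id maps to an object with `class_type` and `inputs`, and a link to another node is written as `[source_node_id, output_index]`. Treat that schema as an assumption rather than a confirmed ComfyUI specification. The function name and the node class names in the usage lines are hypothetical.

```python
import json


def validate_workflow(raw: str) -> list[str]:
    """Return a list of problems found in a generated workflow; empty means it passed."""
    try:
        graph = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(graph, dict):
        return ["top level must be an object mapping node ids to nodes"]

    problems = []
    for node_id, node in graph.items():
        if not isinstance(node, dict):
            problems.append(f"node {node_id}: must be an object")
            continue
        if not isinstance(node.get("class_type"), str):
            problems.append(f"node {node_id}: missing class_type")
        inputs = node.get("inputs", {})
        if not isinstance(inputs, dict):
            problems.append(f"node {node_id}: inputs must be an object")
            continue
        for name, value in inputs.items():
            # Assumed link shape: [source_node_id, output_index].
            is_link = (
                isinstance(value, list)
                and len(value) == 2
                and isinstance(value[0], str)
            )
            if is_link and value[0] not in graph:
                problems.append(
                    f"node {node_id}: input {name!r} links to unknown node {value[0]!r}"
                )
    return problems


# Hypothetical model output: node "2" points at node "9", which does not exist.
generated = json.dumps(
    {"2": {"class_type": "Sampler", "inputs": {"model": ["9", 0], "steps": 20}}}
)
for problem in validate_workflow(generated):
    print(problem)
```

A check like this only catches structural mistakes. Whether the generated graph produces the image you asked for still has to be tested by running it.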
As long as it is unrestricted.
If this model hits the net before the election, which it will, it'll have an effect.
are you thinking of deepfakes?
@@theAIsearch I just know what an out of context Facebook post was able to do and the Bill Gates photo scares me.
@@theAIsearch I would love to see the falsehoods of this paper dived into as others have pointed out the Bill Gates image is from 2018 and can be found on Alamy.
"Shanghai, China. 5th Nov, 2018. Bill Gates, chairman of the board of Terra Power, LLC, and Jack Ma, executive chairman of Alibaba Group, talk ahead of the Parallel Session on Trade and Innovation of the Hongqiao International Economic and Trade Forum in Shanghai, east China, Nov. 5, 2018. Credit: Shen Bohan/Xinhua/Alamy Live News"
Unfortunately, many papers stay just papers, and we have nothing to use years after they come out
Poor Bob! Stop giving ideas to his enemies!
Cool video
lemme guess this is equal to llama 3 405b model in file size :D
if it is true ... then it's so much easier to make a consistent character
How do we get it?
If this is true and not just showing off things it has been purposely trained on for this paper, this is the beginning of the end for being able to tell what's real and what isn't. Why use Photoshop to make an image when you can generate it. Even if you make it yourself, people will be suspicious of you. I for one am not excited for it.
We will never see this be open source
I wonder if it will be possible to run it on a consumer level GPU
Well, the 5000 series is going to launch this year, so I mean, on such a beast the model should probably run