This AI image generator does EVERYTHING
- Published on 27 Dec 2024
- OmniGen can create & edit images with natural language. No more controlnet, loras, inpaint.
#ainews #ai #agi #singularity
TurboType helps you type faster with keyboard shortcuts. Use it for FREE:
www.turbotype....
OmniGen: arxiv.org/pdf/...
Newsletter: aisearch.subst...
Find AI tools & jobs: ai-search.io/
Support: ko-fi.com/aise...
Here's my equipment, in case you're wondering:
Dell Precision 5690: www.dell.com/e...
GPU: Nvidia RTX 5000 Ada nvda.ws/3zfqGqS
Mouse/Keyboard: ALOGIC Echelon bit.ly/alogic-...
Mic: Shure SM7B amzn.to/3DErjt1
Audio interface: Scarlett Solo amzn.to/3qELMeu
TurboType helps you type faster with keyboard shortcuts. Use it for FREE:
www.turbotype.app/
👋 hi
We are doomed. It has been a good one guys 🫡
You can keep reviewing papers while advertising TurboType, or you can be more informative. It’s your choice.
Released 🎉
This is perfect for someone who wants to create games, visual novels, comics and manga without spending tons of time learning and practicing the tools. I'd be excited to give it a try when it's out
likewise!
Sounds great, but people will be shocked and reject it when they find out it was created by AI.
@@wwk279 nowadays everything is generated using AI!
@@wwk279 over time I think people will get used to AI-generated content as it gets better and better
What if this paper is itself generated by AI 😂 @@truelies5431
Some of the early image gens had bad prompt adherence and you just glossed over that part. For example, a blonde woman was asked for and a brunette (dirty blonde at best) was generated. She also had no clothes compared to the "minimally clothed" that was asked for, background was not magenta. You said the prompt was "indeed what we get"
I agree, this looks like a weak version of ComfyUI using ControlNet, which AI Search is aware exists. Good and specific images take effort and time. There are multiple tools where you can just use one image of a character and you can generate images with their face without issues. ControlNet is king right now.
ok dude we all saw that. don't be picky.
@@cyberprompt It's not about being picky. It's not a useful tool if it doesn't generate what you prompt for. This is a big issue for people working with this stuff.
The point of this technology is not really prompt adherence. It will be made OpenSource, so the abilities detailed in the video will (probably very soon, it all goes so fast) be attached to generation tools with better prompt adherence like Ideogram for example.
It also messed up the second image in the Iron Man prompt.
That's a monkey-like samurai character, but the output made him look very human.
dopepics AI fixes this. AI image generator does everything.
Its just a paper dude. Its not real until we can use it.
The paper shows that it's becoming real. Because if they can do it in a paper, then it can be done without a paper.
Yep, this video is nothing
@@ignoreme1141 Yeah but most of these things begin with papers, right? That's how it works
Sadly, that's the truth. There haven't been one or two duds that promise great things, and deliver none.
@@ignoreme1141 Why do you seem angry that it came out as a paper first? all of the others who came out with something publish papers first. were you mad at them too? a little gratitude isnt a bad thing
Imagine next year when this level of control is available for AI video.
Imagine the year after next. OMG it is going to be so great
I can not wait
@@robertmartens7839 what about the year after "after next one"? What's gonna happen then?
Next year would be too optimistic. AI can barely generate 5 second videos and takes forever to create them. My bet is 3 years if we are optimistic.
@@elon-69-musk really right!
This is a staggering upgrade to AI image generation. We are on year 2 of the OpenAI public era. I can not imagine what comes next, let alone, the next year!
exponential growth!
I've been telling you all for 4 years now
@@TheThetruthmaster1 be honest your prediction just happened to come true lol
This will be extremely useful for everything from forensics to YouTube thumbnails. I can't wait for this to come out!
I feel this is a dream I've been having for years about to come true. I can only hope this will come out with a free limited version for those of us who crave creation but truly can't afford anything...
i hope so too!
You died a very long time ago old friend... Read Codex of the Celestial Dream 🙏
@@NakedSageAstrology Please be careful what you say to people regarding death. I'm struggling with depression and almost killed myself recently. It has the potential to create the opposite effect you're aiming for.
@@aiwkua it's ok human the robots will takeover soon
@@aiwkua stay strong!
When SD first came out, I went all in, first to just understand the concepts, then learn all the extras. Now I'm waiting. There have been so many advances and competition for what is "best" I don't want to burn myself out while the winner is decided. Or max out my hard drive with unnecessary tools and models.
I feel the same! There's just so much going on
it is exhausting
This is how JavaScript web developers feel when there's always a new trend
The fact that this was released and there isn't a single YouTube video is wild.
Transformers are one of the coolest technologies to have been invented recently, and to think Google thankfully released the "Attention Is All You Need" paper in 2017. It's quite a general architecture
Release date?
Sounds really cool. Looking forward to its eventual release.
i can't wait to try it out!
I need this thing. This is beyond amazing.
wow... this would definitely be on my bucket list for testing...
can't wait for it to be released!
The proof is gonna be in the pudding (or not...)
This is what I've been waiting for! I think other image generators will start having problems soon.
Would you recommend buying an RTX 4060 Ti for open-source AI applications? Or a 3060, because of the 12 GB of VRAM? ❤
I don't understand why, but you always manage to amaze me and make me think 'as if something like this already exists, it's so incredibly fascinating!' Your videos are always the best, and I love it when you show something new that beats the old by a lot. I LOVE THAT ABOUT YOUR VIDEOS! You're my fav AI youtuber!
Wow, thank you!
❤ love the progress
I believe this is what the omni-modality of GPT-4o has to be. The ability to chat with a model that inherently can create images.
Now that we have this, it really makes me wonder why the GPT-4o we're getting is still nerfed.
Now that you mention it, I had totally forgotten about 4o's multimodal features. In theory, 4o should be able to do what this paper claims as well. It's strange that it still resorts to DALLE for images instead of using its native image capabilities. Maybe they still need to figure out the guardrails
@@theAIsearch It seems to me a lot of big AI youtubers and influencers are forgetting this...
@@absolutedoruiyaaa4736 it doesn’t have image gen. It has text, vision and speech embedded in the model, but not image gen.
Seems too good to be true
I will suggest that my group check you out, I'm impressed
Amazing man, I like your way of explaining, and you're an enjoyable storyteller 🎉
that upscaling (or deblur) at 10:10 looks mindblowing, I wonder if that quality will be available for videos too in the near future
one ring to dominate them all!
😃
How do you always stay updated 😂😂, good job man, I appreciate your hard work ❤❤❤❤❤❤❤
Thanks!
That is impressive. Can it also outpaint an image? That would be great.
Holy #! Can't wait to get my hands on this once it's released.
Someday, when this kind of control is possible, we will enjoy being empowered. For now it’s just a wish list kind of like describing what systems we might use to get to Mars.
Hey there! I've really enjoyed most of the videos you've published on YouTube. As I'm currently experimenting with AI locally, I've been frequently visiting your channel since the spring of this year. I want to express my gratitude for all your efforts and the excellent explanations of AI's capabilities. Honestly, I'm tired of the endless parade of chatty AI newsreaders hogging YouTube. I'm increasingly drawn to faceless channels like yours that provide clear explanations of AI processes, along with examples and visuals that are easy to follow, unlike other channels that are just jumping on the AI bandwagon. Keep up the fantastic work and enjoy what you do! 🤘😉
Thank you so much!
Thank you for your excellent programs as always.
You're welcome!
thank you! any guess when this will come out?
There was actually something like this back in SD1.5 era. I've been trying to find it again, but I believe it had the word llama in it. It was injected into specific layers like LoRa and controlnet, but it was an LLM that could comprehend all these things in the image and let you command it using natural language instead of prompting.
I probably wouldn't use these as the quality looks poor compared to a well trained LoRa on Flux (esp the wu kong example which probably wasn't already in the base model like bill gates would be), and you have much more control over composition with something like latent coupling, but hopefully future models use these techniques
Oh interesting. Thanks for sharing!
Llava was the captioner
@@SearchingForSounds that's a vision llm , but this was an A1111 extension that was primarily an embedding into an AI art model, the llm was just controlling it. also long before we had multimodal llms
They are simply using a language model to call various tools like the IPAdapter for that ironman example or depth, canny, and open pose models.
An excellent work, but nothing extraordinary. You can do all of that already in A1111 or Comfy
No, it's an LLM that's specifically trained to respond with images based on input images, similar to how text-based LLMs like gpt4o respond with unique text based on your input. It's not a collection of different tools working together behind the scenes. It's an LLM that accepts both images and text as input, but unlike gpt4o which only outputs text, this one outputs images.
Doesn't show me an image of where is your current location, it sucks
???
open source devs need to step up, i'm tired of chinese research papers and stuff like this, but not an actual tool i can actually use for free and locally
so you want to be spoon fed? create something yourself instead of expecting people to spend tons of money and time to release something for free
Please raise the loudness of audio when you edit the video.
Nuts! Imagine the possibilities
Consider this your cheat sheet for applying the video's advice:
1. Explore OmniGen as a user-friendly alternative to traditional AI image editing tools.
2. Communicate your desired image edits to OmniGen using natural language prompts.
3. Experiment with OmniGen's ability to add objects, change colors, adjust poses, and generate depth maps.
4. Understand the power of OmniGen's unified architecture, combining a VAE, LLM, and diffusion model.
5. Leverage OmniGen's emergent abilities for multi-step editing and in-context learning.
6. Start with simple edits and progressively increase complexity to maximize your results with OmniGen.
7. Be aware of OmniGen's limitations, such as prompt sensitivity and occasional struggles with hands and fingers, and unfamiliar image types.
8. Stay informed about the evolving capabilities and future developments of OmniGen.
Very intriguing. Although given the current state of AI development and previous heartbreaks, I'm practicing a healthy dose of skepticism. Hopefully when we get some hands on experience with it, the model holds true to the article's claims.
there are as many outrageous yet still true claims as grifts in the AI space, which is (1) what makes the grifts believable and (2) what makes AI so special: the fact that apparent scams turn out to be real technology
it won't be long before you can specify anything. we already have video. next is to create a persona for companionship. it's always that direction.
JUSTICE FOR BOB
Thanks for sharing! Personally, I don't think this model will come to dominate image generation, because it doesn't seem to offer the fine-grained control our work needs, the way ControlNet or IPAdapter do, particularly in consistent-image workflows.
So when this ai becomes available will you make a video explaining how to use it?
definitely!
I just wanted to take a moment to express my gratitude for the amazing video you created on AI. Your dedication and creativity truly shine through, and it’s clear how much effort you put into making such valuable content. I know how hard it is to start and generate subscriptions. I watched the full video, because it was interesting. Please keep us updated on this generator. As a new content creator, what would you say is the best image-to-video generator to start with? I use Kling but it takes way too long, and to be honest I don't really find it great
Thank you again for your incredible work. Keep up the fantastic content!
Thanks! For image to video, Kling is your best bet. Other options include Luma (worse quality) and Runway (expensive af)
@@theAIsearch Thank you i will keep watching you a new subscriber
Shouldn't we be able to use an AI for ComfyUI? This way we should be able to use all the different tools with a single prompt even without OmniGen. The workflow json file could be AI generated as well and this should be no rocket science. Am I right? What do you think? I could start researching about this but maybe there already is something I don't know about.
cool idea. it sounds possible, since it's just json generation
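A minimal sketch of the "it's just JSON generation" idea above, in Python. It does not call any model or ComfyUI itself: it only checks that a workflow string (which an LLM would produce) has a sane shape before anyone tries to run it. The layout assumed here, a JSON object keyed by node id where each node has a `class_type` and an `inputs` object and links are written as `[source_node_id, output_index]`, is an assumption about ComfyUI's API-style workflow export. The node names and the helper `validate_workflow` are illustrative only.

```python
import json


def validate_workflow(raw: str) -> dict:
    """Parse a generated workflow string and check its basic shape.

    Assumed layout (not confirmed by the video or paper): a JSON object
    keyed by node id, each node holding "class_type" and "inputs".
    """
    workflow = json.loads(raw)
    if not isinstance(workflow, dict):
        raise ValueError("workflow must be a JSON object keyed by node id")

    for node_id, node in workflow.items():
        if not isinstance(node, dict):
            raise ValueError(f"node {node_id} must be an object")
        if "class_type" not in node or not isinstance(node.get("inputs"), dict):
            raise ValueError(f"node {node_id} needs 'class_type' and 'inputs'")

        for name, value in node["inputs"].items():
            # Assumption: a two-element list is a link [source_node_id, output_index].
            if isinstance(value, list) and len(value) == 2:
                if str(value[0]) not in workflow:
                    raise ValueError(
                        f"node {node_id} input '{name}' links to missing node {value[0]}"
                    )
    return workflow


# Stand-in for text a language model might return; the node names are illustrative.
GENERATED = """
{
  "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "model.safetensors"}},
  "2": {"class_type": "CLIPTextEncode", "inputs": {"text": "a cat in a hat", "clip": ["1", 1]}}
}
"""

if __name__ == "__main__":
    checked = validate_workflow(GENERATED)
    print(f"{len(checked)} nodes look structurally valid")
```

Generated JSON can be well-formed and still wrong (unknown node types, bad parameter values), so a check like this would only be a first filter before handing the workflow to the real tool.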
Bro can you suggest any open source ai for captions generation for free..?
Release it and make it open without restrictions and I am hyped
looks like it understands the general physics/objects in the image and can construct the prompt then just mixing it up like stable diffusion, some martial arts or gymnastic pics would be nice. Strange body positions or interactions are a huge problem with the image creators from beginning the year. Or just human object interactions, like a female woodworker climbs a tree with a belt full of working tools in the early morning hours, watched by a curious crow sitting on a nearby branch.
The worst thing about all of these is that there isn't a button to click and purchase, and then a bar to write the prompt in. It is hell to find out how to use them, or the APIs or Discords or whatever. I want to use Flux1 and this to see how they compare and I just cannot....
This is similar to the method used by Udio and Suno: they learn from reference audio plus a text description
Would this model be "better" if it had a bigger parameter count? What if they increase the number of images it was trained on, and problems such as the hands or fingers, even the text, are resolved?
yes, increasing the dataset would help a lot. the dataset they used was 10 times smaller than what SD used. imagine how good this would be if it was trained on more data
@@theAIsearch do you know if the small parameter count would help it run on consumer-level hardware and old GPUs?
Wait I was looking at mechanical turk jobs a little bit ago, and one of the jobs was to do what this image bot does - identify things like 'where can i wash my hands' or 'what can hold water'. Interesting.
I look forward to conversational AI merging with these models so we can just chat back and forth with the generators and tweak our images together into a final image. I'm sad the conversational Pi chatbot was abandoned.
Hopefully OpenAI and Apple release their models to the masses.
Pi just got a huge upgrade and he is faster and smarter than ever 😊😊
Have you tried using this app? Because to me it looks like a lot of these examples are fakes.
what's your daddy do for living? he's professional design graphic....
teacher: (silent and smile with burrowing her eyebrow)
Need this in comfyui asap...
a truly intelligent AI image generator :3
How to use it? What's the link?
this might be peak ai image generating
I can say for a fact that current AI doesn't suck.
yes but probably the model was already trained with images of famous people / movie characters so it is no "magic without lora" as advertised in video I guess
OmniGen's ability to simplify complex image generation is a huge step forward. Seeing it in action shows just how intuitive AI tools can become for creative professionals!
it would be great if it generates directly PSDs with separate layers. A Designer then can work on the details, make changes easily. Adobe, are u listening?
Now this is a game changer!!
We've been waiting for something like this for a long time. Plus, the network should understand direction and position in space.
*not free tho and 10 to 20 bux for an average user is untenable* 😅
How do we get it?
This means I can start teaching my daughter how to use AI after the release of the 'OmniGen'😀😇
I wonder if it will be possible to run it on a consumer level GPU
Well the 5000 series is going to launch this year so I mean, probably on such a beast the model should run
Twitter disinformation just levelled up
Anybody knows of a model that would let us upload a pic of a person smiling and take the smile away so they're not smiling anymore, everything else remaining the same?
Face App for android / iOS
Awesome dude, but Q: how the hell did you find this repo??? it's not really famous or anything???
I don't really remember - probably on X. There's so much crazy news everyday it's hard to keep up
So ChatGPT but images.
basically
it's Chat SD!
As long as it is unrestricted.
looking forward to this, if the model actually does come out...free
looks cool nice
Thanks
Since the software is being developed in China, it will probably be heavily used there for CCP matters, as they already do with AI software.
sorry, but your question about what you can do with the software is a trigger these days. All this software is cool, but unfortunately it also has its dark sides. In addition, for data protection reasons, these tools only become interesting when you can use them offline, without being an IT guy. :)
This is fire
😃
If adobe finds out about this one, they're getting shut down, it's adobe's motto after all, end the competition by buying them
"What can be used to hold water" is a philosophical question. Technically, the woman already holds water or can hold it by drinking it. Plants contain water so they're holding it. The entire room could hold water.
I'm pretty sure this is Adobe's vision for Photoshop Ai
But they are so far off being this capable it's laughable. This new Ai is going to make so many peoples tasks so much easier
When can I use it ugh
would be interesting to see how it handles video generation.... asking it to generate the next frame, do this 1'000 times to get a 40 second video 🙂
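The arithmetic in the comment above works out if playback is assumed to run at 25 frames per second: 1,000 frames / 25 fps = 40 seconds. A rough Python sketch of that next-frame loop follows. The `next_frame` callable is a hypothetical stand-in for an image model, and nothing in the video or paper confirms that OmniGen can be driven this way.

```python
from typing import Callable, List, TypeVar

Frame = TypeVar("Frame")


def clip_seconds(frame_count: int, fps: float) -> float:
    """Duration of a clip with `frame_count` frames played at `fps`."""
    return frame_count / fps


def rollout(first: Frame, next_frame: Callable[[Frame], Frame], total: int) -> List[Frame]:
    """Build `total` frames by repeatedly asking for the frame after the last one."""
    frames = [first]
    while len(frames) < total:
        # Each new frame is conditioned only on the previous frame in this sketch.
        frames.append(next_frame(frames[-1]))
    return frames


if __name__ == "__main__":
    # Integers stand in for images so the loop can run without any model.
    frames = rollout(0, lambda f: f + 1, total=1000)
    print(len(frames), "frames =", clip_seconds(len(frames), fps=25), "seconds")
```

Conditioning each step only on the previous frame is the simplest possible choice; small errors would likely accumulate over 1,000 steps, which is one reason this is a thought experiment rather than a recipe.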
Unfortunately many papers are just papers, and we have nothing to use years after them
Mark my words, people will use this to ask the AI to strip naked people by providing a photo.
Every day we get 1 step closer to the Holodeck. 😁
Yeah I just hope the final Holodeck won't be hampered by "ethic" and "moral" standards. It's the perfect playground for humanity's darker aspects, allowing us to separate it into a virtual space where it does no real harm.
lemme guess this is equal to llama 3 405b model in file size :D
Wonderful
I cannot sign up.
In future we can program our babes
that's what i'm waiting for
Cool video
why is it so recent?
If this model hits the net before the election, which it will, it'll have an effect.
are you thinking of deepfakes?
@@theAIsearch I just know what an out of context Facebook post was able to do and the Bill Gates photo scares me.
@@theAIsearch I would love to see the falsehoods of this paper dived into as others have pointed out the Bill Gates image is from 2018 and can be found on Alamy.
"Shanghai, China. 5th Nov, 2018. Bill Gates, chairman of the board of Terra Power, LLC, and Jack Ma, executive chairman of Alibaba Group, talk ahead of the Parallel Session on Trade and Innovation of the Hongqiao International Economic and Trade Forum in Shanghai, east China, Nov. 5, 2018. Credit: Shen Bohan/Xinhua/Alamy Live News"
How do I use the tool? It's just a PDF guide
When do y’all think we will have LEV, FDVR, AGI, ASI, maybe UBI and FALC
AGI within 3 years. UBI depends on the stupid govt
AGI Level
Seems too good to be true. Wouldn't be the first time a chinese paper claimed it can do things it can't