DiT: The Secret Sauce of OpenAI's Sora & Stable Diffusion 3

bycloud

Views: 61,672

Published on 2 Dec 2024

Comments • 154

@bycloudAI 8 months ago ⁺¹¹
Don't miss out on these exciting upgrades designed to elevate your content creation experience with DomoAI! Go try out: discord.gg/sPEqFUTn7n
@TheBigLou13 8 months ago
Is DomoAI selfhosted or do you have to rely/trust a third party with your data?
@doritime5229 8 months ago
its not free I can not try anything
@amortalbeing 8 months ago ⁺⁴⁹
5:43, OpenAI itself said that the more compute they threw at sora, the better the results got, so you are right in that the compute here does absolutely matter!
@carkawalakhatulistiwa 8 months ago ⁺²
More and more GPU
@hitmusicworldwide 8 months ago ⁺¹
So far what a waste of compute
@Words-. 8 months ago ⁺²
@@hitmusicworldwide How so?
@ADMNtek 8 months ago
pretty sure Jen-Hsun started salivating when he heard that.
@hitmusicworldwide 7 months ago
@@Words-. Marvel Universe's Avengers End Game used $80.00 per GPU farms (compared to 30k per H100 ) with Autodesk Maya . The results of their efforts not only employed a small towns worth of people, it cost 400 million dollars to produce including talent salaries and has earned 2.7 BILLION dollars. Now THAT is a productive use of compute. The resulting Maya GPU generated video looks 2.7 billion times better as well. Sora uses 4,200-10,500 Nvidia H100 GPU's and produces this crap? Create a python script that selects video clips and runs Maya routines instead. Do the math.
@ovum 8 months ago ⁺¹³⁶
The hell with the Frieren thumbnail lmao
@Chris-iu6ws 8 months ago ⁺⁴
Yeah not cool
@the_gobbo 8 months ago ⁺⁵³
It is absolute perfection obviously, peak thumbnail right there
@anywallsocket 8 months ago ⁺¹⁴
I’m a simple man 😂
@capitandonculo 8 months ago ⁺¹
honestly loved the thumbnail, may be the only reason i clicked, (wtf is DiT?) and it was a great fucking video
@alexmehler6765 8 months ago
report him for misleading thumbnail
@imerence6290 8 months ago ⁺¹⁵⁸
Are you telling me Stable diffusion had ADHD 💀
@Happ1ness 8 months ago ⁺⁶
[Insert "always has been" meme here]
@PhotoBomber 8 months ago
Lol brilliant
@Dogo.R 8 months ago ⁺¹
Yes, just like in humans it seems to mostly be an education and environment problem.
@adesa756 7 months ago
@mehulagarwal858 8 months ago ⁺³²
Hey, great video as always. Would love to see a deep dive video into the Stable Diffusion 3 architecture and other DIT methods!
@cybertruck2008 8 months ago ⁺¹⁶⁰
Those facebook users are probably bots, because we are reaching the dead internet theory
@TerraGlide 8 months ago ⁺¹⁸
Does that mean you’re a bot too?
@Titangrille 8 months ago ⁺¹⁹
my words exactly. Best I can do is hope we are not there already. Hello stranger from another part of the world with the same interest as me.
@H0mework 8 months ago ⁺⁹
@@Titangrille HELLO HUMAN. YES I agree with your opinion, beep beep.
@ianglenn2821 8 months ago
yep, the irony of "those are my words exactly" haha there's only one person on the internet, and its us
@zeroking981 8 months ago ⁺¹
@@TerraGlide or say something a bot can't say, like some Austrian painter
@adityajoshi287 8 months ago ⁺⁵
Just letting you know, Yes, I would like to see your video on DiffiT and HDiT architecture if you make one! Love your videos!!
@manuffls1756 8 months ago ⁺¹⁵
In the interview, the developers mentioned that Sora will still need quite a while until its release. In the subsequent interview, however, the chief technology officer stated that there might be expanded access this year, possibly even in the coming months. I believe this is yet another instance of the conflict between scientific caution and the realization that, if high prices are initially charged as announced, there can be significant earnings.
@mktwos 8 months ago ⁺²
i hope sora will never be released to the public, the consequences would be disastrous.
@manuffls1756 8 months ago ⁺⁷
@@mktwos Basically you're right, I would like to play with it, but preferably without anyone else being able to do it haha. Being on YouTube and knowing that in a year you will probably be searching every remotely exotic video for evidence of AI is not pleasant. On the other hand, let's be honest: even if Sora waits another year, other companies will have published corresponding algorithms by then... so for OpenAI it's once again not so much a question of whether you should do it, but whether you should want to control this development by being the first
@incription 8 months ago
@@mktwos we will likely get an open source equivalent of Sora next year at the latest, and by then OpenAI will have an even better model
@autohmae 8 months ago
@@mktwos very big maybe, with an embedded watermark in each frame which isn't visible to normal people (thus using a form of steganography)
@Vivaerti 8 months ago
Corporations such as OpenAI care for the money and only for the money, they could care less about the small guys (Artists), and they care more for the potential profit they could rake in from having Sora go public once it's finished.
Remember: To them, we are only consumers, and they don't care for what we say.
@gemstone7818 8 months ago ⁺¹⁴
yeah it would be nice to hear more about DiT architectures
@rakly347 8 months ago ⁺¹⁴
"Can you even tell which one is the real and fake image?"
- Uh, yes, it was pretty obvious. I didn't even have to 'nitpick' about it, I could tell pretty much instantly.
I completely disagree with the notion that we are high up on the curve. If you actually work with AI and don't just use Midjourney (I'm talking about Krita and text completion models, for example to make agents), you will see there is still much progress day over day.
I do generate a couple thousand of images each and every day (work related). So I do have quite some experience from seeing so many of them.
I haven't even seen an upscaler yet that satisfies my requirements. The only way to get better upscales right now is more VRAM on a single device. Which isn't practical.
My current workflow is pretty convoluted. I have multiple GPU's all rendering different image layers, each with their own focus (like attention). I have to use multiple GPU's, one for each layer to make it viable. Rendering each layer one after another on a single GPU would not be practical. - So instead of using a GPU with tons of VRAM or use a very complex workflow with multiple GPU's, I'm still seeing great progress being made in efficiency. for example SD Cascade.
I think your focus on what progress is, is too narrow. Don't just look at what is possible, you have to take in to account how much progress there is being made in what is possible with limited resources. Which isn't just the amount of GPU's either. Also the amount of, and quality of, the required input.
@slackstation 8 months ago ⁺³
Definitely want to see an indepth look at DiT
@jghifiversveiws8729 8 months ago ⁺³
I definitely would like for you to explore the other DiTs
@CosmicAnew 8 months ago ⁺⁶
I can still tell the difference between fake and real images but that's because I study photos and paint them. Something about the noise and the lighting is really off in the first image, most people wouldn't light something slightly green since it's not a natural type of light. I think if you don't have a background with visual art or something similar, it's probably really difficult to tell real and fake images apart. (0:14)
@Anderson-f4t6c 8 months ago ⁺⁸
Frieren thumbnail, nice
@vedforeal7835 8 months ago ⁺¹⁵
The only vid someone needs to understand current ai situation
@NikoKun 8 months ago ⁺²²
The "Sigmoid Curve" is FAR too misleading a representation to actually give people any reliable indication of where we are, unless their goal is a skeptical narrative. It's impossible to tell where we are on the curve, or what the curve truly looks like at this time, as it's significantly different with every advance, and how we arbitrarily classify advances also changes the appearance of the curves. Such representations can ONLY be applied retrospectively, when analyzing the past, they have no value for reliable prediction. Maybe the curve has multiple slow spots along it, or maybe multiple sigmoid curves chain together.
Somewhere I saw an explanation that overlaid a whole bunch of sigmoid curves related to recent technological advances, and when you average out all their overlaps, you end up with the same exponential singularity curve that guys like Kurzweil predicted. Tho I can't seem to find that.
@technolus5742 8 months ago ⁺¹
exactly, a few years ago progress was slow, this guy could have said the exact same thing and be completely wrong 🤦‍♂
@chrisfleitas615 8 months ago ⁺⁵
Domo AI can get expensive very fast. Went through a month's tokens from the basic subscription in an hour transforming my client's 30 sec ad into an anime. The results were good.
@claxvii177th6 8 months ago ⁺⁷
My exact reaction when I started using AI in ComfyUI
@cdkw8254 8 months ago ⁺³
Love how the ai took over even sponsors
@MemeMultiverseGo 7 months ago
Venturing into the world of storytelling and creative videos, VideoGPT becomes the invisible hand that refines my content, making it resonate with a professional vibe.
@Kuchenrolle 8 months ago
+1 for the follow-up video on the more technical details of DiT.
@DrW1ne 8 months ago
You reached the peak of chill in this video. I like the vibe.
@kronux3831 6 months ago
To me, the biggest hurdles to overcome with image generation are character consistency (which is exactly what it sounds like) and object transfer (the ability to select a specific object in one image, such as a shirt, and have it be included as part of the resultant generation). AI image generation doesn’t need to look perfect, and I’m not too sure how much of a return companies will get from marginally increasing quality, when solving the issues I described above would lead to greater direct application. One immediate possibility would be how useful these advances would be for creating AI-generated comics or animations. If I were a betting man, I’d say these two concepts are what most AI image generation companies are working on right now.
@TimeLordRaps 8 months ago ⁺²
The only thing they added was space time relation is such an understatement.
@the_gobbo 8 months ago ⁺²
THE THUMBNAIL THO LOL i lov it
@SliceOfFish 8 months ago ⁺³
Ability to generate more complex scenes is cool but I don't see much difference between SD3 and SDXL in terms of image quality.
@2034-SWE 8 months ago ⁺²
Did you see the MIT paper that just published? Using "Distribution Matching Distillation" (DMD), 30x faster image generation is achieved vs. Stable Diffusion, and at the same/higher quality of image. How's that for near the top of sigmoid curve 😉
@OMGLittleB 8 months ago
yep it's pretty awesome
@lio1234234 8 months ago
I'd love to have a video from you on those architectures!
@nikroth 8 months ago ⁺¹
Transformers is a very good movie. You can't go wrong to watch it with someone (except for some movie snobs). Transformers and chill is the new 10/10.
@deltamico 8 months ago
If only someone came up with a proven optimal way of using transformers for something and called it optimus
@nikroth 8 months ago
@@deltamico HEHEHEH :D
@noobicorn_gamer 8 months ago ⁺¹
It's much harder to tell what's fireship and what's bycloud on recommended feed these days than AI image progress :D
@lukasgruber1280 8 months ago ⁺⁴
lower pic was Pulp Fiction so the other one had to be fake
@brainstormsurge154 8 months ago
Just watched your video on Mamba. After watching this it makes me wonder about the Mamba model being used more and more for its precision. Things like pixel art or voxel style need a lot more precision than regular diffusion or other image/video generation has. At least with what I've seen.
Although part of that is that how people make that is by giving themselves limiters such as only drawing on a bitmap within certain parameters or with programs that automates those parameters. That won't be as much of a limiting factor if AI is now getting access to a command line which means it's only a matter of time before the AI has access to a program where the parameters it works with are constrained and narrowed to get the results people want.
@7satsu 8 months ago
GPT-6 -> AGI - Supercomputers at consumer level → Quantum Computing -> ASI
@draken5379 8 months ago ⁺¹
If you like DomoAI, you can find the open source communities that they take all their workflows from, and learn to do it yourself for free :)
@MilesBellas 6 months ago ⁺¹
yes....
dive into DiT, Diff it and HDit, CorrDiff, eDiff-I
etc...
😊
@Malorianarms999 8 months ago
Banger
@jeffg4686 7 months ago
@4:06 - the other ones don't follow your prompt because they don't want you to compete with them - the others are heavily funded by corporations - all the wealthy are aligned and they don't like competition...
@l.halawani 8 months ago
I've been waiting for this video!
@Neonagi 8 months ago
The only thing you could really say is we are at the top of the 'current' sigmoid curve, until it's broken by yet another sigmoid curve. Using sigmoid curves doesn't work for predicting the future, they're only useful for looking at past processes.
@oneinazillion 8 months ago
Just because companies can afford lots of compute does not mean that they have a commercially viable/environmentally sustainable product. These are great "experiments" for sure and kudos to the amazing work being done by these scientists and engineers. But to me, these are still experimental.
I would call an AI product successful when I can run it on my phone's compute or something like an OS that can greatly augment general purpose tasks without ever having to connect to a cloud subscription.
@DaKussh 4 months ago
It's funny because the results have been arithmetically inversed but the VRAM requirements for the most basic stuff has been increasing between 25% and 50% between each iteration.
@dengyun846 8 months ago
I wish you would go into more detail on the actual mechanisms for those who can follow it.
@AlyphRat 8 months ago ⁺¹
Am I crazy, or is the editing oddly similar to Fireship?
@10x_discovery 8 months ago
Brilliant style. All the best insh'Allah :)
@waterbot 8 months ago ⁺¹
DiffiT video WHEN?
@levimccallum9006 8 months ago
Can you explain the DiT usage in Pixart sigma?
@AmirHamzah_MAHBAR 8 months ago
What is that game in @5:07?
@c0d3_m0nk3y 8 months ago
ClearConnect VR , according to Bing Copilot.
@canyoupleaserunfast 8 months ago ⁺¹
I'd like to know more about diffIT and hdit ^ _ ^
@jollyamvxgifts379 8 months ago
background music?
@jatiquep5543 8 months ago ⁺¹
Is this fireship second channel
@disguisedpuppy 8 months ago ⁺³
I am starting to get confused between these channels
@yudi8204 8 months ago ⁺¹
Fireship but 8 times longer
@aleanscm9350 8 months ago
The most probable cause is that we are limiting the growth of ai, at least we are starving it with novelty
@manutebol956 8 months ago
pulp fiction reference nice
@rje4242 8 months ago
Ringo and Honey Bunny are the real image. Not shown: wallet with "bad mother fucker" stamped on it.
@hitmusicworldwide 8 months ago ⁺¹
It's all fun n games until you realize that a Sora 2 min 720 p blurry and unintentional artifact filled video requires 720k H100 GPU's @ $30k each whilst Avengers Endgame using Autodesk Maya generates 4k masterpieces with GPU's that you can buy for $80.00 on eBay and 3d animators that work for less than 3 H100's. And Avengers Endgame generated 2.8 BILLION dollars from a $356 million production cost and paid a lot of human's grocery bills.
@christophkogler6220 5 months ago
that sounds like the training setup, not inference
@frederikcalsius5014 7 months ago
Please do DiffIT and HDiT videos!
@user-up4wj9vi3w 8 months ago
its finally over for artists
@Vivaerti 8 months ago
It isn't really, ai art still has some issues with anatomy and I doubt it will take over as the main form of art.
Though that isn't as noticeable now if you use NovelAI's latest model which can generate hands that are accurate and consistent.
@user-up4wj9vi3w 8 months ago
@@Vivaerti obviously ai art can be spotted, but that doesn't stop corpos from firing some if not all of their artists and having the remaining ones work with ai
@Vivaerti 8 months ago
@@user-up4wj9vi3w Corpos are corpos, they care for the money and only money.
It may be possible to spot the signs of AI art now, but slowly but surely it will be harder to tell as image generation improves over time.
@cdkw8254 8 months ago
I love how some millionaire guy was like let's just throw money at it and it actually works
@JazevoAudiosurf 8 months ago
how about MoE DiT SSM b1.58 transformers
@Miss0Demon 8 months ago ⁺⁴
Oh boy I can’t wait for easy to make AI blackmail material and mass unemployment!
@tommysalami420 8 months ago
lmao I livestreamed teaching the chatbots to use stable diffusion XD
@JotaroKujoJoJo-qx2yc 8 months ago
Why does this type of video look like Fireship's videos.... Did Fireship steal the idea (or get inspired) from this channel... Or did this channel steal it (or get inspired) from Fireship????
@WhhhhhhjuuuuuH 8 months ago
"attention is that we need" i see what you did there 😅
@Aurelloyell 8 months ago
this one is a hot one
@MilesBellas 6 months ago
Stability AI must be saved.
@unlomtrash 3 months ago
Not anymore. We have black forest
@TheLiverX 8 months ago
Finally
@asterlofts1565 8 months ago
Their secret, I think, is that they are open source... because PEOPLE THEMSELVES MODIFY THIS AND IMPROVE IT AS THEY WANT.
@CUBETechie 8 months ago ⁺³
I predict some r34 content
@jasonhemphill8525 8 months ago ⁺¹
Nooooo. On my good christian diffusion model? No way.
@SumitRana-life314 8 months ago
Man, I picked the one below and got stumped when the one above was fake. I swear these are getting so good that someday you could do it with both images real and I would still not get it.
@nonetrix3066 8 months ago
I think we are far from perfecting image generation even with SD3, it still struggles with background details and hands to a lesser extent
@southcoastinventors6583 8 months ago
Can't really say that about a base model. They were showing it off on Reddit 3 days ago and it is amazing, and once it's fine-tuned with Juggernaut or Dreamshaper it will be amazing, especially with ControlNet and inpainting.
@Vivaerti 8 months ago
I don't know, take a look at Niji journey V6 or NovelAI's latest model, they can do some neat backgrounds and hands.
They still mess up the hands but nowhere as much as before.
@jonmichaelgalindo 8 months ago
I'm worried SD3 will never release though.
@southcoastinventors6583 8 months ago
They were showing it off 3 days ago as a beta build on Reddit, where they were running people's prompts. It's still in early beta; they said it may be released in a month. Most likely the last open source image generator from Stability due to Emad not paying Amazon on time for using their clusters.
@Vivaerti 8 months ago
It will, it seems to still be in the beta phase at the moment, but once it's released, I know a certain community that is gonna go crazy for it and no I'm not talking about safe stuff.
If you know, you know.
@spencernorman2626 8 months ago
The thumbnail......😂
@patriciogarcia5442 8 months ago
Dive into DiffiT HDiT plz 🙏🏼
@tommysalami420 8 months ago
I actually helped figure this out :3
@omkarjamdar4076 8 months ago
Can I still participate in the giveaway coz I need a PC and all I have is a i5 7thGen laptop
@comic--sans 8 months ago ⁺⁵
why do you only have on screen subtitles in certain parts of your videos? it kinda defeats the purpose of subtitles in my opinion.
@rayhere7925 8 months ago
It's ok. There, there now. Shhh... Go back to sleep.
@dannyyyXYZ 8 months ago
Ai generated subtitles as well
@comic--sans 8 months ago
@@rayhere7925 I see this everywhere and it drives me insane. people realize you can make ai subtitles using real captions instead of on screen ones right? youtubers seem to care more about viewer retention than accessibility.
@phobosmoon4643 8 months ago
i dont think its the sigmoid curve of ai image generation its the sigmoid curve of agi
@FileNotFound404 8 months ago
AI is genuinely getting to the point where it isn't even fun anymore. the only reason I ever found it interesting was that I thought it wouldn't really go anywhere, but now I see it so often, and I'm constantly trying to figure out if something is "AI".
Like I just want it to stop now before it starts taking jobs from people. Especially considering that, basically every field I'm talented in is being attacked by these corporations looking to give even less money to the ones who deserve it.
@Neonagi 8 months ago
It's like wagon makers and horse maintainers lamenting the creation of the motor vehicle. As all things, not everyone wins with a new technology, and we are forced to adapt and overcome.
@Vivaerti 8 months ago
I still find it fun to try and see how many concepts it can do accurately or poses, but it is annoying shifting through thousands of ai posts on Pixiv.
@MortyMortyMorty 8 months ago ⁺¹
Stolen Fireship thumbnails to farm views! Clever little kid!
@morphidevtalk 8 months ago
WHITE THEME DISCORD WTF
@falsechord 8 months ago
ai art devs should focus on composition not quality. at this point the only composition method is control net which works great with humans but...what about everything else buildings, landscapes, dragons, eldritch creatures.
@BinaryDood 8 months ago
sketch it yourself first... it's literally just shapes
@falsechord 8 months ago
@@BinaryDood there are things that have the same shape, like a basketball and a soccer ball. If I draw 2 circles in 2 different locations, the AI won't be able to tell which type of ball to place in which circle.
this is just a basic example, a fox and a wolf is another.
@BinaryDood 8 months ago
@@falsechord idk, read "Picture this" by Molly Bang and see what you can come up with
@MrTurbo_ 8 months ago ⁺¹
I'm worried a 4090 is not gonna be enough for stable diffusion 3 anymore lol. And knowing nvidia they probably won't be releasing a GPU with more than 24GB of ram any time soon
@Vivaerti 8 months ago
Eh, sites that allow image generation will sort this out like they did when this technology was first released.
@googleyoutubechannel8554 6 months ago ⁺¹
Makes a whole video about 'DiT', never actually defines the term... 🙄
@bgNinjashows 8 months ago ⁺¹
Why does he talk like he's eating a hot potato
@HorseyWorsey 8 months ago
lol
@seriousOmajan 8 months ago
TBH I'm dumbfounded how much you rely on low res gifs with text on top while talking about image/video generation. It seems that it is basically useless to you or you have no idea how to pivot it to your workflow.
@deltamico 8 months ago
By using those gifs he culturally bonds with the viewer. But I agree, incorporating generation would signal more experience with the given subject
@Iog 8 months ago
Humans will just get lazier, let's be real. It's like how the anime industry pushed towards CGI use, in a way.
@evernam993m8 7 months ago
Work smart, not work hard😂
@Rundik 8 months ago
Stealing other's thumbnail styles is bad
