We were about to make a video about Nicholas' findings. After seeing your brilliant explanation, we don't have much to add! Thank you so much for sharing our research!
holy shit, that's such a high compliment! Keep up the fantastic work guys! We're all counting on you!!! 🤜
Very clever and elegant piece of work - makes you wonder how many other low hanging fruits are waiting to be plucked.
wow, in hindsight it seems so obvious. It only shows that we're in the early days and there are huge things just waiting to be discovered in diffusion models. Love your work. Z.
Really nice to find a Stable Diffusion channel that actually understands how the underlying technology works! Subscribed. So many of the bigger channels just do how-tos and make "breaking news" videos about new features without taking the time to understand *why* these things work.
Great explanation about the issue caused by SD's noising function. I understood the issue perfectly despite not having any ML experience (though, I am a software developer, so that helps lol)
This dude is singlehandedly wiping the floor with pretty much every other channel about SD. You are a godsend, my dude. Let's watch that subscribe count go to the moon.
hahahaha that's such a nice thing to say! This guy is the actual king though: www.youtube.com/@outlier4052 he published like, a literal image generation paper and his videos are still really engaging
why must we resort to snarky comparisons?
@@afrosymphony8207 😢
@@afrosymphony8207 your parents don't love you.
@@RikkTheGaijin I turned out alright because my community instilled a good moral system in me. You, on the other hand, received all the love in the world and just look at what a miserable twat you became.
Really appreciate all the effort that goes into the research and how well you explain it alongside the visualizations. Keep up the great work!
god I love this channel. What a great, straightforward explanation on the topic! I don't really know anyone else going this in depth on these stable diffusion topics on YouTube. Greatly appreciated!
The best SD channel, thanks Koi (Smiling:1.1)
It's like you're teaching what paintbrushes are made of and how, the characteristics of canvases, and all the fundamental stuff an artist should know to make good art. Thanks for the insights and your efforts to make some obscure subjects clear.
Your videos were already good to begin with and they're getting better. Great information, good explanations. Appreciate the good work! Cheers! :D
wow I just redid my last dreambooth model using offset noise, and I'm blown away by the difference it makes in general quality!
People always say "this is the midjourney killer/this could replace midjourney"
As if Stable Diffusion and its custom models haven't already been beating it for quite a while now
Got to be the most informative video on SD I've watched in months. I actually understand how it works a little better now. Excellent content!
Hoping for another leak soon to learn more secrets like this.
what an amazingly simple description of a Fourier transform
Sweet. The other day I was trying to generate a scene set at night, and no matter how much I fine tuned the positive prompt to emphasize dark (and used the negative to omit light-related concepts), I kept getting outputs with really bright skies. Night-associated hues, but day-like intensities.
Intentional or not, this video answered a lot of my questions regarding SD's trouble in this specific case. These deeper dive videos are much appreciated. Keep the goods coming!
Thank you for the explanation! I imagine the best hack is to randomize a separate offset on each image, but the more faithful method I've seen is to use a cosine noise schedule and fully destroy the image. Also, for some reason SD uses a smaller standard deviation of 0.12 instead of the 0.2 used in previous works.
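For context, here is a rough sketch (my own, not from the video) of the cosine ᾱ schedule from the "Improved DDPM" paper next to Stable Diffusion's scaled-linear betas; the 0.00085 and 0.012 endpoints are the commonly quoted SD 1.x values. The point the comment makes is that the cosine schedule drives ᾱ at the final step essentially to zero, i.e. the image really does get fully destroyed:

```python
# Sketch only: cosine alpha-bar schedule (Improved DDPM) vs. SD's scaled-linear
# betas. The 0.00085 / 0.012 endpoints are the commonly quoted SD 1.x values.
import numpy as np

T = 1000

def cosine_alpha_bar(t, s=0.008):
    f = lambda u: np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0)

betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, T) ** 2   # "scaled linear"
sd_alpha_bar = np.cumprod(1.0 - betas)

print("final alpha_bar, cosine       :", cosine_alpha_bar(T))   # ~0: image fully destroyed
print("final alpha_bar, scaled linear:", sd_alpha_bar[-1])      # small but clearly non-zero
```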
.. and you made this video with all that content. That alone is a lot!! Thank you.
hahah to be fair I haven't posted in like 8 years so I had time
Another wonderful video with amazing explanations!!! So awesome to see how far Cacoe has come.
dude cacoe is a beast! he's a MACHINE
an AI even!!
Great video. Perfect balance of technical and everyday. Keep going. Subscribed.
The one thing I've noticed and would like to see is these models doing more to improve the symmetry and anatomy of objects. As an artist, these are things that we are taught and that take years of practice to get right. And perhaps once they get this down right, the need for artists in this regard would diminish, but you have to remember that this is a tool for people who haven't mastered these aspects of art yet.
there's nothing more beautiful than a simple fix
They could've used simplex or perlin noise with a number of harmonics for training to properly represent low and high frequency noise.
both of those things sound super interesting, but my dude the first person to do that will get published so what are you waiting for?
@@lewingtonn yeah, I'm not a computer science major (or even minor) and my programming knowledge is limited to writing text-based tic-tac-toe in Python, so a science paper isn't really the place for me to plug my thoughts on the matter. Also, it wouldn't work if you just plugged in a different noise formula and hoped for the best. I guess a more feasible approach would be to split the training image into different frequency bands using a bandpass filter (like you showed in the video) and then apply noise of the same harmonic to each of them, preferably distributing the noise weights like in a saw wave, with lower harmonics getting more noise and higher ones less (I might be wrong, but I think that's called pink noise), and then pass all of them (or their sum) to the latent diffusion model for training. If this works, it might more accurately represent both high- and low-level details, and could use multi-harmonic noise like Perlin or simplex. It would probably take a double major in machine learning and digital signal processing to even test this idea, or to realise that I'm overcomplicating things and it can be done more easily with some handy signal-processing trick.
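For what it's worth, here is a rough sketch of the pink-noise part of that idea (just an illustration of the concept, nothing tested): weight white noise by 1/f in Fourier space so the lower harmonics carry more of the noise energy, then use that in place of plain Gaussian noise.

```python
# Rough sketch of 1/f ("pink") noise for a single-channel image: white noise is
# shaped in Fourier space so low frequencies get more energy than high ones.
import numpy as np

def pink_noise(h: int, w: int) -> np.ndarray:
    white = np.random.randn(h, w)
    spectrum = np.fft.fft2(white)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    freq = np.hypot(fy, fx)
    freq[0, 0] = 1.0                      # avoid dividing by zero at the DC term
    shaped = np.real(np.fft.ifft2(spectrum / freq))
    return shaped / shaped.std()          # renormalise to unit variance

noise = pink_noise(64, 64)                # drop-in replacement for white noise, in spirit
```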
@@xn4pl I don't know if you are overcomplicating it, but I do guess that you are onto something, and that pretty cool things could come from your experimentations even if you are not a double major. What matters is not titles but experience, testing, failure and repeat. Just a thought.
Brilliant analysis. Another awesome vid!
Great explanation! The principle kind of sounds like HDR photography. Combining multiple photos of different exposures to get a high dynamic range. The results can be stunning.
Midjourney still has the edge, maybe that can change with the next version of Illuminati
I like this guy! Clearly Midjourney's style is so distinctive, like a single, well-known artist, that you can immediately distinguish which is which. While SD with all its quirks is much more diverse and versatile, like thousands of artists working behind the scenes. And that's a huge advantage of SD (plus you can run it locally).
You explained noising so well, thank you for your awesome videos!
Midjourney is far ahead (in part) because they add hidden random tokens to prompts based on RLHF (user feedback and ratings)
So if you can rate the aesthetics of a model's outputs and feed that back into the prompt weights with a reinforcement-style approach, then this helps
LAION is collecting aesthetics-score data for generated images (and associated prompts), so let's hope this approach will be available to the open-source community soon, too.
Midjourney's doom has been sealed ever since SD was open-sourced. It was always eventually going to meet its match, with the growing community of model trainers and people discovering so many awesome functions and tricks.
It makes good sense! Well done with the explanation.
Very cool. I hope Stability AI sees this and updates these noising functions for the 3.0 model training that's coming up
they 100% know about this already, dw mate
BEHOLD 1.1 is here: civitai.com/models/11193/illuminati-diffusion-v11 praise be to cacoe!!!
every one of your videos is the best video I've seen on its topic. All of them.
Great video overview, thanks! Yeah, I think there is so much more experimentation we can do with diffusion, like implementing all the augmentations of StyleGAN-ADA and more. I wonder what would happen if we thought up and applied as many linear transformations as possible to influence what the model has to learn: for example, chromatic aberration, emboss, and edge sharpening would likely push the model to use a kind of convolution process to undo that type of augmentation, which could be a useful tool for the model in reconstructing an image as well.
yeah like, there's a super high likelihood that that would improve results imo... someone is gonna publish a paper where they train a model to do the noising at some point...
i thought the video is on 2.5x speed when i looked at the webcam.
The title and the thumbnail of the video made me think this was an art video, so I started it, then took a long break to read the original post, went down a rabbit hole back to '90s research on power spectra, and finally came back to the video to realise it was not an art video after all, but rather an amazing technical analysis of the work.
I do want to point out though it's not really that the noise doesn't affect low frequency features - white noise has a flat power spectrum and affects features of all frequencies equally. What's happening is that "natural" images have a bias that low frequency features have more power, so a constant amount of noise affects the signal to noise ratio of high frequency features more.
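A quick empirical way to see this (my own check, not from the video): compute a radially averaged power spectrum for a natural image and for white noise of the same size; the image's power falls off steeply with frequency while the noise stays roughly flat, so the high-frequency bins are the first to drop below the noise floor.

```python
# Radially averaged power spectra: a natural image vs. Gaussian white noise.
# skimage's "astronaut" photo is just a convenient stand-in for any natural image.
import numpy as np
from skimage import color, data

def radial_power_spectrum(x: np.ndarray) -> np.ndarray:
    power = np.abs(np.fft.fftshift(np.fft.fft2(x))) ** 2
    h, w = x.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    # mean power in each integer-radius (spatial frequency) bin
    return np.bincount(r.ravel(), power.ravel()) / np.bincount(r.ravel())

img = color.rgb2gray(data.astronaut())
img_ps = radial_power_spectrum(img - img.mean())
noise_ps = radial_power_spectrum(np.random.randn(*img.shape))

# image power drops steeply with frequency; noise power stays roughly flat
for k in (2, 8, 32, 128):
    print(f"freq {k:3d}: image power {img_ps[k]:12.1f}   noise power {noise_ps[k]:10.1f}")
```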
What is happening? I was reading about this topic about an hour ago o_O It seems like we're entering the singularity phase more and more every second.
good or bad?
@@2PeteShakur idk? My life is connected with computers, so this is always good for me.
More like the butterfly effect; it's been in the works for almost two weeks. Many smart people are working on a lot of things. Stuff is exciting though. :) I like the idea of the Singularity for sure. With everything happening in AI it's wild: a butterfly named GANs and Transformers flapped its wings, and here we are.
nah, I hacked into your pc like a few months back.
you need to spend less time on onlyfans mate
@@lewingtonn go chat in my notepad++, I have a lot of questions.
Great video and explanation! Kudos 🙌
Sir, you are the 3blue1brown of AI Art.
Anyone who has been training their own custom models, textual inversions, and LoRAs knows that Midjourney doesn't hold a candle to A1111 and Kohya
Awe. Kaapstad in the house.
Great explanation bro!
Thanks! Very interesting! But have to admit, I have been way more worried about missing arms and wonky shapes and details in general :D
Have to admit, maybe I didn't listen carefully enough, but is this more about dynamic range of the resulting image or more about preserving/learning finer details? Seems like the article you show talks about "generate very dark or light images" / guess I'll have to read the whole article :D
If you learn to paint you can add those yourself.
@@devnull_ yeah, the article is very sick, you should definitely read it
Great explanation. There's a lora called epi_noiseoffset which does this.
This was a joy to watch :)
good technical and fun stuff
Sweet!!
Noice! Surely we want noise controls for the user? Sometimes you want composition, sometimes detail…
It's wonderful to see how profit stunts technological development once again.
So simple yet so insane
Awesome video 😉
Should try adding noise in Fourier-transform space instead of pixel space to tackle every frequency.
Super interesting stuff man. I use both SD and MJ, and always wishing SD was on par with MJ, particularly for “art”. SD is incredible for realism, but definitely lagging behind MJ on the artistic front; hopefully this means there’s light at the end of the tunnel for SD and art?
MJ is based on Stable Diffusion, it's just really well trained because they had a team of people and the resources to make it good.
I mean, I have been able to make better art with SD than MJ; MJ gets repetitive since I can always tell its style. SD is open source though, and there are a ton of models based on SD that are really good on the artistic front; you just have to know each model's way of handling prompts. Civitai is where I get all my models, and I train my own models too.
@@KyleandPrieteni Thanks Kyle, I'm aware of all of that. I just use the term "MJ" as a quick way of saying "The SD model that MJ uses", and "SD" as "all other SD models", lol. I actually did some work for Civitai recently :)
man it's been a wild west out there a year later.
Does it affect only the training process, or will generating images with a pretrained model also benefit from this solution? Can I copy-paste his code into my local Automatic1111 SD code to get better generations?
offset noise affects the training process only. It just makes training more effective by forcing the model to think more about brightness
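For anyone curious what the tweak actually looks like: the change described in the original offset-noise post is essentially one extra line when sampling the training noise. In the sketch below, `latents` stands for a batch of VAE-encoded training images, and 0.1 is roughly the weight used in the post:

```python
# Offset noise, as described in the original post: on top of the usual per-pixel
# Gaussian noise, add one small random constant per (image, channel) so the model
# is also forced to learn overall brightness instead of assuming a fixed mean.
import torch

def offset_noise(latents: torch.Tensor, offset_strength: float = 0.1) -> torch.Tensor:
    noise = torch.randn_like(latents)
    noise = noise + offset_strength * torch.randn(
        latents.shape[0], latents.shape[1], 1, 1, device=latents.device
    )
    return noise

latents = torch.randn(4, 4, 64, 64)     # dummy batch, just to show the shapes
print(offset_noise(latents).shape)      # torch.Size([4, 4, 64, 64])
```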
@@lewingtonn doesn't txt2img use noise that also averages near 0.5, which makes the (original) model try to match that in the generated image, so by just offsetting it we could control how bright or dark the final generation will be? At least that's the way I understood it. If the model learns that no matter what, the final image should have a 0.5 mean, then it's a learning issue; but if it was trained on images of different brightness whose averages are all over the place, then it should generate proper images from offset noise from the get-go.
@@xn4pl I'm under the same impression. If the model tries to match the mean brightness of the input noise, wouldn't it be possible to control the brightness of the result by controlling the mean brightness of the input noise (by using some noise function with a controllable mean)?
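A quick way to test that hypothesis (just an experiment sketch, not something from the video; it assumes the Hugging Face diffusers library and a 1.5-style checkpoint) is to shift the mean of the initial latent noise and see whether the output gets darker or brighter. With the stock model the effect tends to be weak, which is consistent with the video's point that the model has learned to ignore the mean:

```python
# Experiment sketch: bias the mean of the initial latents and see whether the
# model tracks it with a darker image. Assumes the diffusers API and SD 1.5.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

generator = torch.Generator().manual_seed(42)
latents = torch.randn(1, pipe.unet.config.in_channels, 64, 64, generator=generator)
latents = latents - 0.3   # negative mean offset: in theory, a nudge toward darker output

image = pipe("a quiet city street at night", latents=latents).images[0]
image.save("night_street_offset.png")
```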
We're seeing this a bit more often now - the Midjourney killers. But these platforms only help Midjourney, competition is good business and will push MJ to improve and not get comfortable. Also, until the actual release and some time being spent by millions of users, we won't really know how great this all might be. Leonardo is something that is getting a lot of action recently, but will it stand the test of time? Everything new will always be hot for a moment - but when the dust settles, that will be the true indicator.
Have you tried noising the discrete cosine transform? That way you obscure all frequencies at the same rate.
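A sketch of what that could look like (not something from the video). One subtlety worth flagging: because the DCT is orthonormal, adding the *same* sigma to every coefficient is mathematically identical to plain white noise in pixel space, so to actually change the behaviour you'd weight the noise per frequency, e.g. boosting it at low frequencies to match the roughly 1/f power of natural images:

```python
# Sketch: noise the DCT coefficients with a frequency-dependent weight. With a
# flat weight this would be equivalent to ordinary pixel-space white noise
# (the DCT is orthonormal), so low frequencies are deliberately boosted here.
import numpy as np
from scipy.fft import dctn, idctn

def add_weighted_dct_noise(img: np.ndarray, sigma: float) -> np.ndarray:
    coeffs = dctn(img, norm="ortho")
    h, w = img.shape
    yy, xx = np.indices((h, w))
    weights = 1.0 / (np.hypot(yy, xx) + 1.0)    # more noise energy at low frequencies
    coeffs = coeffs + sigma * weights * np.random.randn(h, w)
    return idctn(coeffs, norm="ortho")

img = np.random.rand(64, 64)                    # stand-in for a real training image
noisy = add_weighted_dct_noise(img, sigma=5.0)
```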
Very nice! But, when I try to think mathematically about *why* the high frequency components would be washed out first,
even though it makes sense intuitively that that would be the case, it isn’t clear to me how to justify that conclusion mathematically?
I guess the thing to do would be to describe the distribution over Fourier transforms of Gaussian noise, but at first blush, I see no reason why the variance for higher frequency components would be larger than for lower frequency components.
Like, say that we have functions f : (Z/pZ) -> R
(Or maybe C , but whatever)
for some prime number p,
if we do a discrete Fourier transform of that, then, the components/coefficients (other than the frequency zero component) will be the dot products with the different p-th roots of unity, in different orders (possibly multiplied by some constant normalization factor).
But, because these have the same terms, just in different orders (this is true because I picked the number of entries to be prime, and multiplication mod p by things other than zero, is invertible.)
then, for independent identically distributed random values for the components of the function, these Fourier components should also be identically distributed.
Now, maybe this could be just because I picked Z/pZ instead of Z/nZ ,
but still, I would think that for most natural numbers less than n, for a typical n, they should have no factors in common with- ok actually that’s not true...
Half of the natural numbers n are even, and half of the natural numbers less than n (for large n) are even, and so at least half the time, at least half of the numbers less than n will have a factor in common with n and therefore have no inverse...
But, would that really be responsible for the low frequency components being influenced less by noise?!
Would we really expect that if the training only used images where both the width and height of the image are a prime number of pixels, that this phenomenon would go away?! (Except for the frequency zero component)
That doesn’t sound like it should be true. That would be pretty bizarre, I think?
Maybe the thing is just that there are many more frequencies (among integer multiples of the base frequency) that we would consider to be “high frequency” than that we would consider “low frequency”, and so most of the variance ends up in what we would consider “high frequency”?
If you know the answer to my confusion, even if you are reading this comment multiple years after I wrote it, please reply to let me know the answer (provided that no one else has already sufficiently explained it, but that kinda goes without saying I guess.) .
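I think the last guess above is close, and the cleaner way to say it is the point made a few comments up (my own summary, not from the video): the DFT of iid Gaussian noise has flat expected power at every frequency, for any N, prime or not, so nothing special happens with prime image sizes. What varies across frequencies is the image's power, which for natural images empirically falls off like a power law, so the per-frequency signal-to-noise ratio collapses at high frequencies first while the k = 0 (mean-brightness) component survives longest. Roughly:

```latex
\[
X_k = \sum_{n=0}^{N-1} x_n\, e^{-2\pi i k n / N}, \qquad
x_n \sim \mathcal{N}(0,\sigma^2) \;\Rightarrow\; \mathbb{E}\,|X_k|^2 = N\sigma^2 \ \text{for every } k,
\]
\[
\mathbb{E}\,|I_k|^2 \approx \frac{C}{|k|^{\alpha}} \ (\alpha \approx 2 \text{ for natural images})
\;\Longrightarrow\;
\mathrm{SNR}(k) = \frac{\mathbb{E}\,|I_k|^2}{N\sigma^2} \propto \frac{1}{|k|^{\alpha}}.
\]
```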
Always Koi, nice vid :)
Question: From the graphs in the video, as I understand it, the model assumes somewhat of a symmetrical normal distribution around the noise with set black and white points (0, 255). These assumptions (black/white points and normal distributions) don't hold true for most images - even more so for images that are extra-ordinary. Is there a way to train / denoise while accounting for these additional factors?
We've known this for years in denoising: the high frequencies go first
♥
14:35 exactly what I was thinking about.
Now, very high-frequency and very low-frequency info is destroyed fast.
We need to do that at all frequencies.
So, not only destroying some blobs, but blobs of any size.
fun fact... this is how photography and our eyes work. around 50-70% of the world around you is... grey. what you see... is for the most part... grey.
You're hecking sick as well fam 😊
thnx for the update
Great explanation. Can you make a video about ControlNet? Thanks.
I wish there was a way to implement the DALL-E mini text encoder into Stable Diffusion
dude, duuuuuude, I guarantee that's what SD 3.0 will do, because it's the most glaring issue with SD right now
@@lewingtonn Correct. With Dynamic Thresholding (CFG Scale Fix), CLIP skip, and the Deliberate model I'm getting quality better than MJ, but I can never get prompts to behave the way I want :/
maybe change the line not to
1*old_noise + 0.1*new_noise
but to
a*old_noise + (1-a)*new_noise ; that might help keep the overall noise from being too much
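A sketch of what that variant might look like (my reading of the suggestion, not the author's actual code). Note that with this blend the total variance is no longer exactly 1; a variance-preserving alternative would weight the two terms by sqrt(1 - b²) and b for some offset strength b:

```python
# Sketch of the blended version: one weight controls the balance between the
# usual per-pixel noise ("old_noise") and the per-image offset ("new_noise").
# a*x + (1-a)*y no longer has unit variance; a variance-preserving blend would
# use weights sqrt(1 - b**2) and b instead.
import torch

def blended_offset_noise(latents: torch.Tensor, a: float = 0.9) -> torch.Tensor:
    per_pixel = torch.randn_like(latents)                                    # "old_noise"
    per_image = torch.randn(latents.shape[0], latents.shape[1], 1, 1,
                            device=latents.device)                           # "new_noise"
    return a * per_pixel + (1.0 - a) * per_image

noise = blended_offset_noise(torch.randn(4, 4, 64, 64))
```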
This also kinda works as an anti "stable-diffusion-detector"
I'm new to SD. So we need to alter the code of SD then train our own models to approach the quality of MidJourney?
no sir, simply use the newer, better models. Luckily we have nerds to do all the hard work for us
so how do you use this offset thing in SD?
you download a model trained using it
I have a question: why do the MJ images all have a nearly identical art style, like digital painting with high contrast, while SD's style seems more diverse? Thank you
Sorry if I missed this in the video, but does SD 1.5 have this built in now? Or is it forthcoming? Or is there something I have to turn on?
is the 1.1 version not available anywhere?
Just released on civitai 🙂
The discovery is great but doesn't explain midjourney quality and coherence. Only data does.
I think you put the wrong link to cacoe's server in ya description... it just links to the example images...
holy shit thanks! I updated it now
Great channel, bump up your audio levels though! ^_^ X
Open source wins yet again. Technology belongs to the people!
I don't know anything about anything when it comes to this, but how would it change if the noise were added to the color represented as HSV or CMYK or something, rather than RGB?
If it's Open source, midjourney will just use it
I am a little confused. You say that the Illuminati images are better than, or at least as good as, Midjourney's, but I find them too desaturated. Is this because you are only focusing on the values and expect Illuminati to correct this problem in the future?
Okay, I had a look at the images in the Discord and they do appear better, some are quite saturated. However, in low light images the colors still do seem to desaturate. This does mirror what would happen with your eye since color sensitivity peaks in bright light and declines in lower light. In very dim light you are in effect color blind. There is a question here, though, about whether this is desirable. The dim images with higher saturation from Midjourney are, to me, more appealing.
Ok, look that's fair, obviously it's hard to articulate opinions about aesthetics, but to me some of the illuminati images certainly look better, and for me that's really exciting because it means we're at least close to parity
Mateeeeeeee
maaaaaaaaaaaaaaaaaaaaaaaaaaate, yeah apparently I'm not dead hahaha
Open source all the way!
Pardon a noob: if the models were trained to regenerate source images that don't necessarily have a 0.5 mean from noise (with a 0.5 mean), why would they learn to target a 0.5 mean?
That is very cool, thanks for sharing! Maths wins again lol
Much improved, but it doesn't come close to Midjourney in terms of aesthetics and creativity
👋
Parody 😂🎉
So did I get it right? You haven't even tested that model, 1.1, right? Great
nah, cacoe being a legend gave me an early copy
Those are not even close to parity. MJ still has the better composition.
Citation needed! I don't know how you can say that with such certainty. Publish a SICK comparison proving your point go go
@@lewingtonn if you have an eye for design, it's not even a debate. Midjourney is on a different level. Just grab a pro designer/artist and do a blind test.
Where has this gone over the last month? The Illuminati 1.1 model on Civitai seems to no longer be available. Has it been integrated into all the new models now?