Flow Matching for Generative Modeling (Paper Explained)

  • Published on Apr 7, 2024
  • Flow matching is a more general method than diffusion and serves as the basis for models like Stable Diffusion 3.
    Paper: arxiv.org/abs/2210.02747
    Abstract:
    We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.
    Authors: Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, Matt Le
    Links:
    Homepage: ykilcher.com
    Merch: ykilcher.com/merch
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: ykilcher.com/discord
    LinkedIn: / ykilcher
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

Comments • 77

  • @zerotwo7319 · 2 months ago +84

    A Jedi himself is teaching us about generative AI. I couldn't be more grateful.

    • @amedeobiolatti216 · 2 months ago +4

      These are not the papers you are looking for 🖐

    • @MilesBellas · 2 months ago +3

      Hu-Po = Yoda ?😅

  • @blacksages · 2 months ago +23

    Man, I have a presentation to give on this paper in a few days and I've been stuck on it; you just made it so much clearer, thank you!
    All the step-by-step explanations and reminders you put in your video are so helpful. I've been through Y. Lipman's presentation and he just glosses over these things because they're too obvious to him, but you don't, and I'm so grateful!

  • @jonatan01i · 2 months ago +42

    "that's a dog in a hat. I'm very very sorry"

  • @guillaumevermeillesanchezm2427 · 2 months ago +15

    I see a video from Yannic on a Monday, I click like.

  • @ArkyonVeil · 2 months ago +3

    Thank you for the in-depth analysis. I personally only have a passing interest in the content of these videos, but I find listening to them a relaxing experience. And as a bonus, I learn something useful every now and then. Cheers!

  • @xplained6486 · 2 months ago +3

    Insane video, Yannic, your explanation was superb! Keep up the great work!

  • @sergiomanuel2206 · a month ago

    Thank you so much Yannic!! Amazing explanation for such a complicated topic!!!!

  • @Kram1032 · 2 months ago +5

    Very cool stuff.
    Interesting how, in the optimal transport version, the shape (in their examples) does indeed get matched sooner, but initially it looks kinda small, and only then reaches its full size.
    I guess that amounts to hitting the shape sooner than the distribution is even able to spread out in full, whereas in the original diffusion process you'd first do the spreading out, and only *then* hone in on the result.

  • @diga4696 · 2 months ago +4

    Best birthday gift ever!

    • @YannicKilcher · 2 months ago +6

      Happy birthday

  • @loukasa · 2 months ago +1

    Great explanation Yannic

  • @PRAKASHFEB · 9 days ago

    Thanks for the simple explanation

  • @JTMoustache · 2 months ago +6

    Damn.. that data probability path formalism is awesome.

  • @simonstrandgaard5503 · 2 months ago +1

    Great explanations.

  • @user-xk6rg7nh8y · 2 months ago +2

    Awesome, so interesting!! It is really helpful :)) Thanks!!

  • @kev2582 · a month ago

    Great walkthrough as always. This paper shines with its abstraction/generalization and mathematical rigor. What is missing is a qualitative comparison between the diffusion and OT probability paths. Since this paper has aged a bit, it would be interesting to look up where the authors are now. My hunch is that straight-line path finding will be qualitatively worse for image generation compared to diffusion models.

  • @ljh0412 · 2 months ago

    I was waiting for this. Thank you Yannic. Hopefully you'll also check out the Bespoke Solvers paper, which is used to speed up flow matching in Audiobox from Meta.

  • @sebastianp4023 · a month ago +2

    Question:
    Do you have a video/opinion on gMLPs from the paper "Pay Attention to MLPs" (Liu et al., 2021)?

  • @OperationDarkside · 2 months ago +1

    Would a desert sand dune and wind analogy work to visualize the probability density flow and the vector field?
    The grains of sand are the probability at one point, the dunes are the distribution density in 2D, and the wind is the vector field.

  • @novantha1 · 2 months ago +12

    Wait, so the essence of this paper is that we can define a source Gaussian distribution and translate that into a target distribution based on a learned vector field which indicates a direction of flow, essentially.
    Notably, this is... maybe not a deterministic process, but it certainly is a finite one, in contrast to traditional diffusion denoising.
    But... how do we encode images in our dataset as a Gaussian distribution? How do we get the source distribution? Is it just noise "tokenized" as a Gaussian distribution? Is it a constant? Is it conditional on the prompt, like a latent LLM embedding (this last one would be wild; I would imagine it would be more effective for the LLM embedding to condition the target distribution, but I digress)?
    I feel like I do understand the process here, but I have no idea how I'd go about implementing it (a rough training sketch follows the replies below).

    • @drdca8263 · 2 months ago +6

      I believe the images are the points in R^d, not the distributions. For each point in the training set (each image), I think they associate a probability distribution which is a Gaussian with that image as its mean and a very small standard deviation.
      So the distribution associated with a particular training image is “this starting image, plus a very small amount of noise”.

    • @u2b83 · 2 months ago +6

      @@drdca8263 Karpathy made an offhand remark a few years ago that for high-dimensional points (R^d) you can effectively recover the exact point just by knowing the distribution.
      The "concentration of measure" phenomenon suggests that in high-dimensional spaces, points tend to be closer to the surface of a hypersphere than to its center. This implies that for a given distribution, many points will have similar distances from the mean of the distribution, making the space effectively "smaller" in some intuitive sense than one might expect. This phenomenon can sometimes allow for predictions or reconstructions of data points based on less information than would be necessary in lower dimensions.

    • @CalebCranney · a month ago +1

      Here's a video that I thought did an excellent job explaining the concept of normalizing flow from the coding perspective: th-cam.com/video/yxVcnuRrKqQ/w-d-xo.html. Then this one has some code that matches the diagrams in the Hans video: th-cam.com/video/bu9WZ0RFG0U/w-d-xo.html. I just spent a number of hours trying to grasp the concept of flow, and these were what made it start to click for me.
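
A minimal training-step sketch for the conditional flow matching objective discussed in this thread, assuming the paper's optimal-transport path; the network, dimensions, hyperparameters, and names (v_theta, cfm_step) are illustrative stand-ins, not the authors' code:

    import torch
    import torch.nn as nn

    d = 2                          # data dimension (a flattened image in practice)
    sigma_min = 1e-4               # width of the Gaussian placed around each data point

    # v_theta(x, t) predicts a velocity; a tiny MLP stands in for a U-Net here
    v_theta = nn.Sequential(nn.Linear(d + 1, 128), nn.SiLU(), nn.Linear(128, d))
    opt = torch.optim.Adam(v_theta.parameters(), lr=1e-3)

    def cfm_step(x1):
        """One optimization step of the (OT-path) conditional flow matching loss."""
        x0 = torch.randn_like(x1)                    # sample from the source Gaussian
        t = torch.rand(x1.shape[0], 1)               # t ~ U[0, 1]
        # straight-line conditional flow from the noise x0 toward the data point x1
        xt = (1 - (1 - sigma_min) * t) * x0 + t * x1
        target = x1 - (1 - sigma_min) * x0           # conditional vector field u_t(x_t | x1)
        pred = v_theta(torch.cat([xt, t], dim=1))
        loss = ((pred - target) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    # toy usage: a blob of 2-D points stands in for a dataset of images
    data = 0.3 * torch.randn(512, d) + torch.tensor([2.0, 0.0])
    for _ in range(200):
        cfm_step(data)

Sampling then just integrates v_theta from t = 0 to t = 1 with any ODE solver, starting from Gaussian noise (see the integration sketch further down the comments).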

  • @timothy-ul9wp · 2 months ago +1

    I wonder how “straight” the flow matching path during inference actually is, as the model doesn’t actually have information from previous steps.
    I assume the path will always point to the mean of all choices of x_0? (in Eq. 23)
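
On the "mean of all choices" point: as I read the paper, the field the network regresses onto at a point x and time t is the posterior-weighted average of the per-sample conditional fields, so at inference it moves toward a weighted mean of conditional targets, and individual trajectories need not be perfectly straight even though each conditional path is. Roughly, in the paper's notation:

    u_t(x) = \int u_t(x \mid x_1)\,\frac{p_t(x \mid x_1)\, q(x_1)}{p_t(x)}\,\mathrm{d}x_1,
    \qquad
    u_t(x \mid x_1) = \frac{x_1 - (1 - \sigma_{\min})\, x}{1 - (1 - \sigma_{\min})\, t}
    \quad \text{(OT path)}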

  • @TheRohr · a month ago

    Thanks for the video! Two open questions: (1) We still need lots of data to get a good estimate of the probability distribution, right? How much should we expect, and what should the dataset look like? Which is related to (2): what is actually meant by a data point or sample here? I understand that for diffusion we have an image that becomes noisy, but what would the 2-D Gaussian be for an RGB image? Or is a sample here something different from an image?

  • @DeepThinker193 · 2 months ago +13

    Omg he's wearing a hoodie. Is he hacking?

    • @timeTegus · 2 months ago +7

      yes

  • @IsraelMendoza-OOOOOO · 2 months ago +2

    God Bless You brother ❤

  • @LouisChiaki · a month ago

    Hmm... What is their choice of sigma_min? Is the end conclusion simply that we should downscale the noise by (1 - sigma_min)?

  • @fireinthehole2272 · 2 months ago +1

    Hi Kilcher, could you do "ReFT: Representation Finetuning for Language Models"? It's really interesting.

  • @kaikapioka9711 · 2 months ago

    Thx bud!

  • @TiagoTiagoT · a month ago

    Are they basically using the butterfly effect to disturb a standardized gaussian distribution into the desired result?

  • @herp_derpingson · 4 days ago

    20:44 If I'm understanding it correctly, the u_t can just be stored during the noising process, assuming we are using a flow-based noising algorithm. The paper doesn't seem to do that, but it could be done quite easily.
    .
    28:29 p_t(x) after marginalization should be thought of as: if we threw a bunch of points at the screen and let them flow, where would they settle? And v_t(x) after marginalization can be thought of as the net flow at a particular point on the screen? It is hard to intuit what they are doing.
    .
    36:16 psi_t(x) is the predicted phi_t(x) by the model?
    .
    39:31 Whenever I see a wall of equations like this, my BS sensors start tingling.
    .
    I was unable to build an intuition for what is happening in the paper and how it helps generate images. In normal diffusion, the pixel color changes appear out of nowhere. In this, they are supposed to "flow" and move into their correct places? That's all I understood.
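
On the 36:16 question: as I read the paper, psi_t is not a model prediction; it is the (conditional) flow induced by the chosen vector field, and only v is a neural network. The main objects and the two defining equations, in rough LaTeX:

    p_t(x):\ \text{probability density path, } p_0 = \text{noise prior},\ p_1 \approx \text{data}
    u_t(x):\ \text{target vector field generating } p_t
    v_t(x;\theta):\ \text{the learned approximation of } u_t
    \psi_t(x):\ \text{the flow, i.e. where a point starting at } x \text{ at } t=0 \text{ sits at time } t

    \frac{\mathrm{d}}{\mathrm{d}t}\,\psi_t(x) = u_t\big(\psi_t(x)\big), \quad \psi_0(x) = x
    \qquad\text{and}\qquad
    \partial_t\, p_t(x) + \nabla\!\cdot\!\big(p_t(x)\, u_t(x)\big) = 0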

  • @punkdigerati · 2 months ago +1

    Like Atz and Jewel Kilcher?

  • @SofieSimp · a month ago

    Do you have a recording of your Stable Diffusion 3 presentation?

  • @mikaellindhe · a month ago

    "Hey why don't you just go toward the target" seems like a reasonable optimization

  • @eriglac · 27 days ago

    I’d like to join the Saturday discussions. Where do I find that info?

  • @SouravMazumdar-ki7vv · a month ago

    Can someone say which approach is being discussed at 5:20?

  • @vangos154 · a month ago +1

    One of the disadvantages of flow-based models is that they require invertible layers, which limits the DNN architectures that can be used. Is that no longer a problem?

    • @xandermasotto7541 · a month ago

      Continuous normalizing flows are always invertible; it's just integrating an ODE forwards vs. backwards.
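
A minimal sketch of that point: sampling from a trained flow matching model integrates the learned field forward in time, and the inverse map integrates the same field backward; no invertible-layer constraint is needed because only the ODE's vector field is parameterised. v_theta is assumed to be a trained network taking (x, t) as in the training sketch above, and a plain Euler loop stands in for an off-the-shelf ODE solver:

    import torch

    @torch.no_grad()
    def integrate(v_theta, x, t0, t1, steps=100):
        """Euler integration of dx/dt = v_theta(x, t) from t0 to t1 (dt may be negative)."""
        dt = (t1 - t0) / steps
        t = t0
        for _ in range(steps):
            t_col = torch.full((x.shape[0], 1), t)
            x = x + dt * v_theta(torch.cat([x, t_col], dim=1))
            t += dt
        return x

    # forward: noise -> data-like samples; backward: recover the starting noise
    # x0 = torch.randn(16, d)                       # d as in the training sketch above
    # x1 = integrate(v_theta, x0, 0.0, 1.0)         # generate
    # x0_rec = integrate(v_theta, x1, 1.0, 0.0)     # invert the same flow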

  • @jabowery · 2 months ago +4

    UNCLE TED!!!

  • @abhimanyu30hans · 2 months ago

    For some reason I get "Unable to accept invite" from your discord invite link.

  • @ScottzPlaylists · 2 months ago +10

    @YannicKilcher
    What hardware / software are you using❓
    It seems to be a tablet and pen, but the details would be interesting.
    Would "How to Yannic a paper" be a good video❓ 😄 I'd watch it.
    Keep up the quality content❗

    • @Python_Scott · 2 months ago +6

      👍 I wondered the same... Make the video please. Or just answer here.

    • @AGIBreakout · 2 months ago +6

      👍I'd watch that, and Thumbs it UP 👍 an odd number of times ❗

    • @NWONewsGod · 2 months ago +5

      Me Too!!!!!!!

    • @NWONewsGod · 2 months ago +4

      @@AGIBreakout Ha, Ha.... "odd number of times" would work too..!!

    • @NWONewsGod · 2 months ago +3

      @@Python_Scott Something different and useful ! Yes, count me in. ☺

  • @andylo8149 · 2 months ago +1

    Given that flow matching is completely deterministic, I don't see how it is a generalisation of diffusion models. Sure, the (deterministic) probability flow induced by a diffusion model is a special kind of flow matching, but the training objective of a diffusion model is inherently stochastic.
    I think diffusion models and flow matching are different classes of models.

    • @gooblepls3985 · 20 days ago

      The stochasticity lives in the p(x0) of the expectation used as the loss: x0 is in the general case a randomly drawn sample from a tractable prior such as a Gaussian, just as in the diffusion literature (though the diffusion literature likes to call the data point x0, so the terminology is reversed there).
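
To make the "where the stochasticity lives" point concrete: the conditional flow matching objective takes an expectation over t, the data sample x_1, and the noise x_0, and for the OT path it reduces to a straight-line regression. As I read the paper (notation roughly as there), the inference trajectory is deterministic, but the training objective is stochastic in the same sense as denoising score matching:

    \mathcal{L}_{\mathrm{CFM}}(\theta)
      = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_1 \sim q,\; x \sim p_t(\cdot \mid x_1)}
        \big\| v_\theta(t, x) - u_t(x \mid x_1) \big\|^2

    \text{OT path: }\;
    \mathbb{E}_{t,\, x_1 \sim q,\, x_0 \sim \mathcal{N}(0, I)}
      \big\| v_\theta\big(t, \psi_t(x_0)\big) - \big(x_1 - (1 - \sigma_{\min})\, x_0\big) \big\|^2,
    \qquad
    \psi_t(x_0) = \big(1 - (1 - \sigma_{\min})\, t\big)\, x_0 + t\, x_1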

  • @Blooper1980 · a month ago

    I wish I could understand this.

  • @Peter.Wirdemo · a month ago +1

    Sorry, but too many formulas in that paper ;P
    Anyway, I kind of lost track at the beginning of what was going on; it started out nicely with images and suddenly it was all about points flowing.
    All that was going through my mind was ”What points are you talking about? Pixels?”
    Haha, guess I will have to watch this again when my state of mind is more up for it :D

  • @nevokrien95 · a month ago +1

    Israel mentioned

  • @mullachv · 2 months ago

    Can't be over prepared for the solar eclipse

  • @tornikeonoprishvili6361 · a month ago +1

    Damn, the paper is math-dense. Watching this, I feel like I'm being dragged along by a professional sprinter I just can't keep up with.

  • @JohnViguerie · 2 months ago +2

    Very hand-wavy

  • @ttul · a month ago

    This one is going to take me several passes…

  • @MrNightLifeLover · 2 months ago

    Published in 2022? Looks like I missed something :/

  • @BooleanDisorder · 2 months ago

    Obvious Labrador Retriever! 01:33

  • @AndrewRafas · 2 months ago

    At 20:53, what you say and what you mark in the paper don't match: v() is the vector field, not the other way around.

    • @YannicKilcher · 2 months ago

      u() is the actual vector field, v() is the neural network learned vector field

  • @wolpumba4099 · 2 months ago +7

    *Abstract*
    This video delves into the technical aspects of flow matching for generative models, contrasting it with traditional diffusion models. It explores the concept of morphing probability distributions from a source to a target, emphasizing the significance of conditional flows and the role of vector fields in guiding this transformation. The video delves into the mathematical underpinnings of flow matching, introducing key objects such as probability density paths and time-dependent vector fields. It demonstrates how these concepts are operationalized through the conditional flow matching objective, allowing for the training of neural networks to predict vector fields for data points. Finally, the video explores specific instances of flow matching, including its relationship to diffusion models and the advantages of the optimal transport path for efficient and robust sampling.
    *Summary*
    *Introduction to Flow Matching*
    * 0:00 - Introduction to flow matching for generative models and its application in image generation, specifically text-to-image tasks.
    * 1:06 - Comparison of flow matching with traditional diffusion-based models used in image generation.
    * 2:29 - Explanation of the diffusion process as a multi-step process of image generation involving the gradual denoising of random noise to produce a target image.
    * 5:46 - Introduction to flow matching as a generalization of the diffusion process, where the focus shifts from defining a fixed noising process to directly learning the morphing of a source distribution into a target distribution.
    *Mathematical Framework*
    * 6:04 - Illustration of morphing a simple Gaussian distribution into a data distribution, highlighting the challenge of the unknown target distribution and the use of Gaussian mixture models as an approximation.
    * 10:52 - Introduction of the concept of a probability density path as a time-dependent function that defines the probability density at a given point in data space and time.
    * 13:41 - Explanation of the time-dependent vector field, denoted as V, which determines the direction and speed of movement for each point in the data space to achieve the desired distribution transformation.
    * 17:54 - Demonstration of how the flow, representing the path of each point along the vector field over time, is determined by the vector field and the initial starting point.
    *Learning the Flow*
    * 19:26 - Explanation of how the vector field is set to generate the probability density path by ensuring its flow satisfies a specific equation.
    * 20:31 - Introduction of the concept of regressing the flow, which involves training a neural network to predict the vector field for each given position and time.
    * 21:56 - Highlighting the ability to define probability density paths and vector fields in terms of individual samples, enabling the construction of conditional probability paths based on specific data points.
    * 26:16 - Demonstration of how marginalizing over conditional vector fields, weighted appropriately, can yield a total vector field that guides the transformation of the entire source distribution to the target distribution.
    *Conditional Flow Matching*
    * 29:40 - Acknowledging the intractability of directly computing the marginal probability path and vector field, leading to the introduction of the conditional flow matching objective.
    * 30:48 - Explanation of conditional flow matching, where flow matching is performed on individual samples by sampling a target data point and a corresponding source data point, and then regressing on the vector field associated with that specific sample path.
    * 33:30 - Introduction of the choice to construct probability paths as a series of normal distributions, with time-dependent mean and standard deviation functions, allowing for interpolation between the source and target distributions.
    *Optimal Transport and Diffusion Paths*
    * 38:43 - Exploration of special instances of Gaussian conditional probability paths, including the recovery of the diffusion objective by selecting specific mean and standard deviation functions.
    * 41:21 - Introduction of the optimal transport path, which involves a straight-line movement between the source and target samples, contrasting it with the curvy paths characteristic of diffusion models.
    * 44:08 - Visual comparison of the vector fields and sampling trajectories for diffusion and optimal transport paths, highlighting the efficiency and robustness of the optimal transport approach.
    *Conclusion*
    * 46:48 - Recap of the key differences between flow matching and diffusion models, emphasizing the flexibility and efficiency of flow matching in learning probability distribution transformations.
    * 47:56 - Reiteration of the process of using a learned vector field to move samples from the source distribution to the target distribution, achieving the desired transformation.
    * 53:37 - Explanation of how the knowledge about the data set is incorporated into the vector field predictor, enabling it to guide the flow of the entire source distribution to the target distribution.
    I used Gemini 1.5 Pro. Token count: 12,628 / 1,048,576.
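
A compact recap of the construction the summary's middle sections walk through (conditional Gaussian probability paths and the vector field that generates them), with the OT instance spelled out; as I understand the paper, the diffusion paths are recovered by plugging in the usual noise-schedule mean and standard deviation instead:

    p_t(x \mid x_1) = \mathcal{N}\big(x \;\big|\; \mu_t(x_1),\, \sigma_t(x_1)^2 I\big),
    \qquad
    u_t(x \mid x_1) = \frac{\sigma_t'(x_1)}{\sigma_t(x_1)}\,\big(x - \mu_t(x_1)\big) + \mu_t'(x_1)

    \text{OT path: }\; \mu_t(x_1) = t\, x_1, \qquad \sigma_t(x_1) = 1 - (1 - \sigma_{\min})\, t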

  • @eitanporat9892 · 2 months ago +2

    I feel like this paper is a very convoluted and long-winded way of saying “move in straight lines”; the mathematical part is obvious and not very interesting. Your explanation was great - I just dislike when people write math for the sake of writing math in ML papers.

  • @drdca8263 · 2 months ago

    It seems to me like this kind of procedure should have many applications outside of images!... but I don’t know what?
    So, specifically, this should be applicable for when we want to learn a way to sample from a particular (but unknown) probability distribution. So, “generative AI” type stuff, I guess.
    Maybe quantizing like in language models might make this not as applicable to language models? Idk.
    What about world-model stuff? Or like, learning a policy?
    Hm, while that does involve selecting actions at random, those are often more discrete?
    Though, I guess not always. If one is doing a continuous control task thing, then I guess sampling from a continuous family of possible actions, may be the thing to do.
    Uh.
    Hm, so, if you started with a uniform distribution over a continuous family of actions, and wanted to evolve it towards a good distribution given the current scenario?
    Hm, no, I guess this probably isn’t especially applicable to that, because like, how do you obtain the samples from the target distribution?
    There must be *something* other than image generation, that this applies straightforwardly to...

  • @robmacl7 · 2 months ago +2

    1: Probability path go Woom!
    2: Waifus
    3: profit

    • @drdca8263 · 2 months ago

      Ugh, I wish “generating images of attractive women” wasn’t such a large fraction of the use of such models.
      I don’t think it is good for the person doing the viewing.
      Beetles and broken beer bottles, and all that.

  • @not_a_human_being · a month ago

    Another attempt to sprinkle some "statistics and theory" on machine learning. This will fail.