Evidence Lower Bound (ELBO) - CLEARLY EXPLAINED!

Comments • 122

  • @AndreiMargeloiu · 3 years ago +34

    Crystal clear explanation, the world needs more people like you!

  • @TheProblembaer2 · 3 months ago +3

    Again, thank you. This is incredibly well explained; the small steps and the explanations behind them are pure gold.

  • @genericperson8238 · 1 year ago +8

    Absolutely beautiful. The explanation is so insanely well thought out and clear.

  • @sonny1552 · 6 months ago +1

    Best explanation ever! I first found this video to understand VAEs, but I recently realized it is also directly related to diffusion models. Thanks for making this video.

  • @thatipelli1 · 3 years ago +3

    Thanks, your tutorial cleared my doubts!!

  • @danmathewsrobin5991 · 3 years ago +3

    Fantastic tutorial!! Hoping to see more similar content. Thank you

  • @9speedbird · 6 months ago +1

    That was great, been going through paper after paper, all I needed was this! Thanks!

  • @bevandenizclgn9282 · 3 months ago

    Best explanation I have found so far, thank u!

  • @vi5hnupradeep · 2 years ago +2

    Thank you so much sir! I'm glad that I found your video 💯

  • @brookestephenson4354 · 3 years ago +3

    Very clear explanation! Thank you very much!

    • @KapilSachdeva · 3 years ago

      Thanks Brooke. Happy that you found it helpful!

  • @Aruuuq · 3 years ago +3

    Amazing tutorial! Keep up the good work.

  • @FredS10xD · 4 months ago +1

    Thank you so much for this explanation :) Very clear and well explained. I wish you all the best

  • @alexfrangos2402 · 1 year ago +1

    Amazing explanation, thank you so much!

  • @chethankr3598 · 7 months ago +1

    This is an awesome explanation. Thank you.

  • @chadsamuelson1808 · 2 years ago +1

    Amazingly clear explanation!

  • @T_rex-te3us · 7 months ago +1

    Insane explanation Mr. Sachdeva! Thank you so much - I wish you all the best in life

  • @schrodingerac · 3 years ago +3

    excellent presentation and explanation
    Thank you very much sir

  • @the_akhash · 1 year ago +1

    Thanks for the explanation!

  • @ziangshi182 · 1 year ago +1

    Fantastic Explanation!

  • @AruneshKumarSinghPro · 1 year ago +2

    This one is a masterpiece. Can you please make a video on Hierarchical Variational Autoencoders when you have time? Looking forward to it.

  • @mahayat · 3 years ago +3

    best and clear explanation!

  • @kappa12385 · 2 years ago +1

    Superbly taught, brother. Really enjoyed it.

  • @satadrudas3675 · 5 months ago +1

    Explained very well. Thanks

  • @MrArtod · 2 years ago +1

    Best explanation, thx!

  • @ajwadakil6892 · 1 year ago +1

    Great explanation. Can you tell me which books/articles I may refer to for further and deeper reading on variational inference, Bayesian statistics, and in-depth probability concepts?

    • @KapilSachdeva · 1 year ago +2

      For Bayesian statistics, I would recommend reading:
      Statistical Rethinking by Richard McElreath [see this page for more information - xcelab.net/rm/]
      A good overview is this paper (Variational Inference: A Review for Statisticians by David M. Blei et al.):
      arxiv.org/abs/1601.00670
      For basic/foundational variational inference, PRML is a good source:
      www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
      There are many books and lecture notes on probability theory. Pick any one.

  • @alfcnz · 2 years ago +2

    Thanks! 😍😍😍

  • @amaramouri9137 · 3 years ago +2

    Very good explanation.

  • @UdemmyUdemmy · 11 months ago +1

    U are a legend!

  • @lihuil3115 · 2 years ago +2

    best explanation ever!

  • @abhinav9058 · 2 years ago +2

    Subscribed, sir, awesome tutorial.
    Learning variational autoencoders 😃

  • @kadrimufti4295 · 1 month ago

    At the 4:45 mark, how did you expand the third term's expectation into its integral form in that way? How is it an "expectation with respect to z" when there is no z but only x?

  • @peterhall6656 · 9 months ago +1

    Top drawer explanation.

  • @mmattb · 1 year ago +1

    One more question: at 10:11 I can see the right hand term looks like a KL divergence between the distributions, but I'm confused: what would you integrate over if you expanded that? In the KL formulation typically the top and bottom of the fraction are distributions over the same variable. Is it just an intuition to call this KL, or is it literally a KL divergence; if the latter, do you mind writing out the general formula for KL when the top and bottom are distributions over different variables (z|x vs z in this case)?

    • @KapilSachdeva · 1 year ago

      Z|X just means that you obtained Z given X, but it still remains a (conditional) distribution over Z. Hence your statement about the KL divergence being over the same variable is still valid. Hope this makes sense. (The KL is written out as an integral over z after this thread.)

    • @mmattb · 1 year ago +1

      @@KapilSachdeva ohhhh so both of them are defined over the same domain as Z. That makes sense. Thanks again.

    • @KapilSachdeva · 1 year ago

      🙏
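A short note on the exchange above: since both q(z|x) and p(z) are densities over the same variable z, the term at 10:11 is literally a KL divergence, written as an ordinary integral over z (the notation below follows the video's usage but is reconstructed here, not transcribed):

```latex
D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
  = \int q_\phi(z \mid x)\,\log\frac{q_\phi(z \mid x)}{p(z)}\,dz
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log\frac{q_\phi(z \mid x)}{p(z)}\right].
```

The conditioning on x only selects which distribution over z is used; the integration variable is still z.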

  • @Pruthvikajaykumar · 1 year ago +1

    Thank you so much

  • @mammamiachemale · 1 year ago +1

    I love you, great!!!

  • @wolfgangpaier6208 · 10 months ago +1

    Hi, I really appreciate your video tutorial because it’s super helpful and easy to understand. I only have one question left. At 10:27 you replaced the conditional distribution q(z|x) by q(z). Is this also true for Variational Auto-Encoders? Because for VAEs, if I understand right, q(z) is approximated by a neural network that predicts z from x. So I would expect that it’s a conditional distribution where z depends on x.

    • @KapilSachdeva · 10 months ago +1

      In the case of a VAE it will always be a conditional distribution. Your understanding is correct 🙏 (A small code sketch of this follows after this thread.)

    • @wolfgangpaier6208 · 10 months ago

      @@KapilSachdeva ok. Thanks a lot for the fast response 🙏
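A minimal sketch of the point confirmed above, assuming a toy PyTorch-style encoder (the layer sizes and names are illustrative, not taken from the video): the parameters of q(z|x) are produced by a network from x, which is exactly what makes it a conditional distribution.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps x to the parameters (mean, log-variance) of q(z|x)."""
    def __init__(self, x_dim=784, h_dim=128, z_dim=16):  # illustrative sizes
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

enc = Encoder()
x = torch.randn(8, 784)        # dummy batch
mu, logvar = enc(x)            # q(z|x) parameters depend on x
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterized sample
```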

  • @mikhaildoroshenko2169 · 2 years ago

    Can we choose the prior distribution of z in any way we want or do we have to estimate it somehow?

    • @KapilSachdeva · 2 years ago +1

      In Bayesian statistics, choosing/selecting the prior is one of the challenging aspects.
      The prior distribution can be chosen based on your domain knowledge (when you have small datasets) or estimated from the data itself (when your dataset is large).
      The method of "estimating" the prior from data is called "Empirical Bayes" (en.wikipedia.org/wiki/Empirical_Bayes_method).
      There are a few modern research papers that try to "learn" the prior as an additional step in VAEs.

  • @HelloWorlds__JTS · 8 months ago +1

    Great explanations! I do have one correction to suggest: At (6:41) you say D_KL is always non-negative; but this can only be true if q is chosen to bound p from above over enough of their overlap (... for the given example, i.e. reverse-KL).

    • @KapilSachdeva · 8 months ago

      🙏 Correct

    • @HelloWorlds__JTS · 4 months ago

      @@KapilSachdeva I was wrong to make my earlier suggestion, because p and q are probabilities. I can give details if anyone requests it, but it's trivial to see using total variation distance or Jensen's inequality.
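For readers following this exchange: non-negativity of the KL divergence needs no extra conditions on q; it follows from Jensen's inequality applied to the convex function -log (a standard argument, sketched here rather than quoted from the video):

```latex
D_{\mathrm{KL}}(q \,\|\, p)
  = \mathbb{E}_{q}\!\left[-\log\frac{p(z)}{q(z)}\right]
  \;\ge\; -\log \mathbb{E}_{q}\!\left[\frac{p(z)}{q(z)}\right]
  = -\log \int q(z)\,\frac{p(z)}{q(z)}\,dz
  \;\ge\; -\log 1 = 0 .
```

This is exactly why the ELBO never exceeds log p(x).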

  • @anshumansinha5874 · 1 year ago

    Why is the first term the reconstruction error? I mean, we are getting back x from the latent variable z, but shouldn't reconstruction be x - x', i.e. the initial x and the final x from p(x|z)? Also, how do I read that expression? E_q[log p(x|z)] = ∫ q(x) log p(x|z) dx, i.e. we want to average out the function of the random variable x with the weight q(x). What does that mean in the sense of a VAE?

    • @KapilSachdeva · 1 year ago

      > Why is the first term reconstruction error?
      One way to see the error in reconstruction is x - x', i.e. the difference or the square of the difference. This is what you are familiar with. Another way is to see it in terms of "likelihood". That type of objective function is called maximum likelihood estimation. Read up on MLE if you are not familiar with it. In other words, what you have is another objective/loss function that you will maximize/minimize.
      That said, you can indeed replace E[log p(x|z)] with the MSE. It is done in quite a few implementations. I talk about it in the VAE tutorial as well.
      > what does that mean in the sense of VAE?
      For that you will want to watch the VAE tutorial, in which I explain why we need to do this. If it is not clear from that tutorial, ask the question in the comments of that video.
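A short note on the equivalence mentioned in the reply above (a sketch under an assumption, not a quote from the video): if the decoder models p(x|z) as a Gaussian with fixed variance around its output x̂(z), then

```latex
\log p_\theta(x \mid z)
  = \log \mathcal{N}\!\big(x;\ \hat{x}_\theta(z),\ \sigma^2 I\big)
  = -\frac{1}{2\sigma^2}\,\lVert x - \hat{x}_\theta(z) \rVert^2 + \text{const},
```

so maximizing the expected log-likelihood E_q[log p(x|z)] is, up to constants, the same as minimizing the familiar x vs x' squared error.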

  • @sahhaf1234 · 7 months ago +1

    Good explanation. I can follow the algebra easily. The problem is this: what is known and what is not known in this formulation? In other words, @0:26, I think we try to find the posterior. But, do we know the prior? Do we know the likelihood? Or, is it that we do not know them but can sample them?

    • @KapilSachdeva · 7 months ago

      Good questions, and you have mostly answered them yourself. The prior is what you assume. The likelihood function you need to know (or model). The most difficult part is computing the normalizing constant, which most of the time is computationally intractable.
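Written out, the point about the normalizing constant (same p(x|z), p(z) notation as the video, reconstructed here):

```latex
p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)},
\qquad
p(x) = \int p(x \mid z)\, p(z)\, dz .
```

The prior p(z) is assumed and the likelihood p(x|z) is modeled, but the evidence p(x) requires integrating over all of z, which is usually intractable for high-dimensional z; that is the quantity variational inference sidesteps.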

  • @riaarora3126 · 1 year ago +1

    Wow, clarity supremacy

    • @KapilSachdeva · 1 year ago

      🙏 😀 “clarity supremacy” …. Good luck with your learnings.

  • @wadewang574 · 1 year ago +1

    At 4:40, how can we see that the third component is an expectation with respect to z instead of x?

    • @KapilSachdeva · 1 year ago

      Because the KL divergence (which in turn is the expected value) is between p(z|x) and q(z|x).
      Now you need to have a good understanding of KL divergence and expected value to understand it.

  • @mmattb · 1 year ago +1

    Sorry to bother you again Kapil - is the integral at 5:05 supposed to have d(z|x) instead of dz? If not, I'm certainly confused haha.

    • @KapilSachdeva · 1 year ago

      No bother at all. Conceptually you can think of it like that, but I have not seen/encountered the differential part of the integral written with the conditional (the pipe). So it is just a notation thing here. Your understanding is correct.

  • @Maciek17PL · 1 year ago

    What is log p_θ(x) (the blue θ) at 5:40? Is it a pdf or a single number?

    • @KapilSachdeva · 1 year ago +1

      It would be a density, but when used for optimization you would get a scalar value for a given batch of samples.

  • @anshumansinha5874 · 1 year ago

    So, we have to maximise the ELBO (@9:28), right? As that would bring it closer to the log-likelihood of the original data.
    1. Does that mean we should find a parameter 'phi' which increases the reconstruction term (as it is the first term)?
    2. And find 'phi' such that the second term gets minimised, which would mean q_phi(z|x) should be as close as possible to the prior p(z)?
    But don't we need to minimise the reconstruction error while not going too far from the assumed prior p(z)? How do we get these inferences from the derived equation @9:28?

    • @KapilSachdeva · 1 year ago

      We minimize the “negative” ELBO
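A minimal sketch of "minimizing the negative ELBO" as a training loss, assuming a Gaussian q(z|x) with parameters (mu, logvar), a standard-normal prior, and an MSE reconstruction term (variable names are illustrative, not from the video):

```python
import torch
import torch.nn.functional as F

def negative_elbo(x, x_recon, mu, logvar):
    # Reconstruction term: -E_q[log p(x|z)], approximated here by a summed MSE
    # (equivalent up to constants for a fixed-variance Gaussian decoder).
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # KL(q(z|x) || p(z)) in closed form for q = N(mu, diag(exp(logvar))), p = N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # minimizing this maximizes the ELBO
```

Minimizing this drives the reconstruction error down while keeping q_phi(z|x) close to the prior, which is how points 1 and 2 above reconcile: the first ELBO term is maximized as a log-likelihood, i.e. minimized as an error.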

  • @AI_Financier · 1 year ago +1

    it is a great one, would be greater if you could start with a simple numerical example

    • @KapilSachdeva · 1 year ago

      Interesting. Will think about it. 🙏

  • @user-or7ji5hv8y · 3 years ago +3

    Just one question, at th-cam.com/video/IXsA5Rpp25w/w-d-xo.html, when you expanded log p(x), how did you know to use q(z | x) instead of simply q(x)? Thank you.

    • @KapilSachdeva · 3 years ago +4

      We are after approximating the posterior p(z|x). We do this approximation using q, a distribution we know how to sample from and whose parameters we intend to find using an optimization procedure. So the distribution q would be different from p but would still be about (or for) z|x. In other words, it is an "assumed" distribution for "z|x".
      The symbol/notation "E_q" (sorry, can't write latex/typeset in the comments 😟) means that it is an expectation where the probability distribution is "q". Whatever is in the subscript of the symbol E indicates the probability distribution.
      Since in this entire tutorial q is a distribution of z given x (i.e. z|x), the notations E_q and E_q(z|x) are the same, i.e. q and q(z|x) are the same. This is why, when expanded, it was q(z|x) and not q(x). (The expansion is written out after this thread.)
      Watch my video on Importance Sampling (at least the starting portion, where I clarify the expectation notation & symbols). Here is the link to the video - th-cam.com/video/ivBtpzHcvpg/w-d-xo.html

    • @ericzhang4486 · 3 years ago

      @@KapilSachdeva Does that mean the expectation of log p(x) doesn't depend on the distribution q, since in the end E_q[log p(x)] becomes log p(x)?

    • @KapilSachdeva · 3 years ago

      @@ericzhang4486 Since log p(x) does not have any 'z' in it, log p(x) is treated as a constant when the sampling distribution used for computing the expectation is q(z) (or even q(z|x)). This is why the equation gets simplified by taking this constant out of the integral. Let me know if this helps you understand it.

    • @ericzhang4486 · 3 years ago

      @@KapilSachdeva it makes perfectly sense. Thank you so much!

    • @ericzhang4486 · 3 years ago

      I came to your video from equation 1 in the DALL-E paper (arxiv.org/pdf/2102.12092.pdf). If possible, could you give me a little enlightenment on how the ELBO is derived in that case? Feel free to skip this if you don't have time. Thank you!
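The expansion discussed in this thread, written out (same notation as the video, reconstructed here rather than transcribed):

```latex
\mathbb{E}_{q(z \mid x)}\big[\log p(x)\big]
  = \int q(z \mid x)\,\log p(x)\,dz
  = \log p(x)\int q(z \mid x)\,dz
  = \log p(x),
```

because log p(x) contains no z and can be pulled out of the integral, and q(z|x) integrates to 1. This is why the expectation is taken with respect to z (via q(z|x)) and why the term simplifies to log p(x).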

  • @RAP4EVERMRC96 · 1 year ago

    4:33 Why is it + ('plus') the expected value of log p(x), as opposed to - ('minus')?

  • @heshamali5208 · 1 year ago

    Why, when maximizing the first component, will the second component be minimized directly?

    • @KapilSachdeva · 1 year ago

      Let's say
      fixed_amount = a + b
      If `a` increases then `b` must decrease in order to respect the above equation.
      ##
      The log-evidence is fixed. It is the total probability after taking into account all parameters and hidden variables. As the tutorial shows, it consists of two components. If you maximize one component then the other must decrease. (The identity is written out after this thread.)

    • @heshamali5208 · 1 year ago

      @@KapilSachdeva Thanks sir. My last question is how, computationally, I can calculate KL(Q(Z)||P(Z)). How do I know P(Z), when all I can get is the latent variable Z, which in my understanding comes from Q(Z)? So how do I make sure that the predicted distribution of Z is as close as possible to the actual distribution of Z? I now know how I can get P(X|Z); my question is how do I calculate the regularization term?

    • @KapilSachdeva · 1 year ago

      I explain this in the tutorial on variational auto encoder. th-cam.com/video/h9kWaQQloPk/w-d-xo.html

    • @heshamali5208 · 1 year ago

      @@KapilSachdeva Thanks sir for your fast reply.
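The identity behind the "fixed amount" reply above, written out (the same decomposition the video derives around 9:28, reconstructed here):

```latex
\log p(x)
  \;=\; \underbrace{\mathbb{E}_{q(z \mid x)}\!\big[\log p(x, z) - \log q(z \mid x)\big]}_{\text{ELBO}}
  \;+\; \underbrace{D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z \mid x)\big)}_{\ge\, 0}.
```

The left-hand side does not depend on q, so for a fixed model, raising the ELBO necessarily lowers the KL term and vice versa. As for computing the regularization term KL(q(z|x) || p(z)): when both are Gaussians (the usual VAE choice, with p(z) = N(0, I)) it has a closed form, as in the loss sketch earlier in these comments.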

  • @heshamali5208 · 2 years ago

    At minute 9:20, how is it log p(z|x) / p(z)? It was addition; shouldn't it be log p(z|x) * p(z)? Please correct me, sir. Thanks.

    • @KapilSachdeva · 2 years ago

      Hello Hesham, I do not see the expression "log p(z|x)/p(z)" anywhere in the tutorial. Could you check again which screen is causing the confusion? You may have a typo in the above comment.

    • @heshamali5208 · 2 years ago

      @@KapilSachdeva Thanks for your kind reply, sir. I mean in the third line at minute 9:22: we moved from Eq[log q(z|x)] + Eq[log p(z)] to Eq[log q(z|x)/p(z)], and I don't know why it is division and not multiplication, since it was addition before taking a common log.

    • @KapilSachdeva · 2 years ago

      @@heshamali5208 Here is how you should see it. I did not show one intermediate step, hence your confusion.
      Let's look at only the last two terms in the equation:
      -E[log q(z|x)] + E[log p(z)]
      = -E[log q(z|x) - log p(z)]   {I have taken the expectation out as it is common}
      = -E[log q(z|x) / p(z)]
      Hope this clarifies it now. (The same step is written out more formally after this thread.)

    • @heshamali5208 · 2 years ago +1

      @@KapilSachdeva ok thanks sir. it is clear now
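The intermediate step from the reply above, in cleaner notation (linearity of expectation, then the difference of logs becoming the log of a ratio):

```latex
-\mathbb{E}_q\big[\log q(z \mid x)\big] + \mathbb{E}_q\big[\log p(z)\big]
  = -\mathbb{E}_q\big[\log q(z \mid x) - \log p(z)\big]
  = -\mathbb{E}_q\!\left[\log\frac{q(z \mid x)}{p(z)}\right].
```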

  • @UdemmyUdemmy · 11 months ago +1

    This one video is worth a million gold particles.

  • @yongen5398 · 2 years ago +1

    Haha, that "I have cheated you" at 7:36.

  • @anshumansinha5874 · 1 year ago

    Why do we even need to know the posterior p(z|x)? I think you could start with that.

    • @KapilSachdeva · 1 year ago +1

      For that watch the “towards Bayesian regression” series on my channel.

    • @anshumansinha5874 · 1 year ago

      @@KapilSachdeva Oh great, that'll be a lot of help! And great video series!

  • @NadavBenedek
    @NadavBenedek 4 หลายเดือนก่อน

    Not clear enough. In the first minute you say 'intractable', but you need to give an example of why this is intractable and why other terms are not. Also, explain why the denominator is intractable while the nomination is not.