Importance Sampling - VISUALLY EXPLAINED with EXAMPLES!

  • Published Feb 6, 2021
  • This tutorial explains the Importance Sampling technique and its variant for unnormalized distribution functions called Self Normalized Importance Sampling.
    Notebook to go with this tutorial:
    colab.research.google.com/drive/1Xm2e0MDVsoxgPIKIl-u60iu4fNARz3Kw?usp=sharing
    #sampling
    #statistics
    #montecarlo
  • Science & Technology

Comments • 53

  • @sudhagarraghavan9696 · 3 years ago +12

    Outstanding tutorial... probably the best on YouTube on this topic. Your knowledge is very impressive and your presentation of the subject is excellent! Thank you!

  • @malcolmyang5662 · 3 years ago +5

    One of the best explanations I have found so far on the internet. Good job!

  • @sanjaykrish8719 · months ago +2

    This is pure gold ❤

  • @atomsmasher · 2 years ago +1

    Incredibly well-done tutorial. I looked at four videos before landing on this one, and it is superb. Thank you for taking the time to enrich humanity by sharing knowledge clearly and intuitively.

    • @KapilSachdeva · 2 years ago

      🙏 many thanks for these kind words. I am happy you found the tutorial helpful.

  • @proxyme3628 · 1 year ago +1

    Finally I can understand importance sampling. Great way to start with a uniform distribution p(x) and then introduce sampling from the important region with a normal distribution q(x); a great way to get the idea of importance sampling. Thank you so much. (See the sketch after this thread.)

    • @KapilSachdeva · 1 year ago

      🙏 glad that it was helpful to you.
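
    To make the setup described in this thread concrete, here is a minimal sketch of plain Monte Carlo vs. importance sampling, assuming p = Uniform(0, 10), a Normal(8, 1) proposal, and an illustrative h. These choices are stand-ins, not necessarily the video's exact example.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    N = 100_000

    def h(x):
        # Hypothetical integrand, peaked near x = 8 (illustrative only).
        return np.exp(-0.5 * ((x - 8.0) / 0.5) ** 2)

    p = stats.uniform(loc=0.0, scale=10.0)   # target p = Uniform(0, 10)
    q = stats.norm(loc=8.0, scale=1.0)       # proposal over the important region

    # Plain Monte Carlo: sample from p directly.
    x_p = p.rvs(size=N, random_state=rng)
    mc = h(x_p).mean()

    # Importance sampling: sample from q and reweight by w(x) = p(x) / q(x).
    x_q = q.rvs(size=N, random_state=rng)
    w = p.pdf(x_q) / q.pdf(x_q)              # p.pdf(x) = 0 outside [0, 10]
    is_est = (w * h(x_q)).mean()

    print(mc, is_est)                        # both approximate E_p[h(X)]
    ```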

  • @schrodingerac · 2 years ago +2

    This is a great tutorial... I really appreciate the effort that went into making it look like this.

  • @prajwalchauhan6440 · 5 months ago +1

    This series is the best.

  • @avishkarsaha8506 · 2 years ago +1

    Best video on importance sampling I've seen.

  • @akapo98 · 1 year ago +1

    Very high quality video! Thank you very much!

  • @jonasschmitt3548 · 3 years ago +2

    Great video, thanks a lot :)

  • @vi5hnupradeep · 2 years ago +1

    Thank you so much, sir!

  • @mohsenvazirizade6334 · 6 months ago +2

    Fantastic tutorial. Way to go! What other resources do you recommend for other importance sampling methods?

  • @yogeshmathpati1472 · 2 years ago +1

    Keep up the good work, sir. Thank you so much!

    • @KapilSachdeva · 2 years ago

      🙏

    • @yogeshmathpati1472 · 2 years ago

      Can I get the code for importance sampling on some problem, to see how it's done? I'm new to coding and haven't been able to find it anywhere. It would be a huge favour if you could help. And something about Latin Hypercube Sampling too. Thanks

    • @KapilSachdeva · 2 years ago

      Here is a notebook in which I have added the example that I showed in the tutorial.
      colab.research.google.com/drive/1Xm2e0MDVsoxgPIKIl-u60iu4fNARz3Kw?usp=sharing
      Hope this helps.
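
      On the Latin Hypercube Sampling part of the question, a minimal SciPy sketch; the 1-D setting and the [0, 10] bounds are arbitrary choices for illustration.

      ```python
      from scipy.stats import qmc

      # Latin Hypercube Sampling in 1-D, rescaled to a hypothetical [0, 10] range.
      sampler = qmc.LatinHypercube(d=1, seed=0)
      u = sampler.random(n=1000)                          # stratified points in [0, 1)
      x = qmc.scale(u, l_bounds=[0.0], u_bounds=[10.0])   # mapped to [0, 10]
      ```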

  • @Matteo-uq7gc · 2 years ago +1

    Another awesome video! Sequential Importance Sampling would be amazing also!!!

    • @KapilSachdeva · 2 years ago +2

      🙏… will do it; most likely in conjunction with particle filtering.

    • @Matteo-uq7gc · 2 years ago

      @KapilSachdeva I will be on the lookout!!! Thank you

  • @chandreshmaurya8102 · 2 years ago +1

    Nice info on importance sampling. Can you tell me which software you used to make the videos?

  • @eceserin · 2 years ago

    Very nice, thank you.

  • @bodwiser100 · 4 months ago

    @KapilSachdeva, very nice video! Super helpful as always. Can you please shed light on one sticky point that has always eluded me when studying sampling methods (including MCMC): what did you mean when you said that one of the problems in computing the average is that "we can't sample from p"? I thought it meant that we do not know the PDF p(x). But as I continued watching the video, it seemed that was not the case: we _can_ have access to p(x). But then, if we know p(x), why can we not sample from it? In the same vein, I have always wondered: if we knew p(x), there would never be any need for the proposal function, the transforms, and all that machinery. Isn't the unavailability of p(x) the whole reason we are doing all this?

    • @KapilSachdeva · 4 months ago +1

      Quite a few people have this confusion. Two things should help:
      - Just because you know the functional form of the PDF does not mean you can sample from it. Most sampling algorithms are transformations of uniform sampling.
      - In the example I used a function whose analytical form I knew, but when you are doing Bayesian statistics and working with a posterior probability, most of the time you do not even know the functional form.
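
      A small sketch of the first point: drawing from a distribution by transforming uniform samples through the inverse CDF. The exponential distribution here is an illustrative choice, not the video's example.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      u = rng.uniform(size=100_000)     # the only randomness: Uniform(0, 1) draws

      # Inverse-CDF transform for Exponential(lam):
      # F(x) = 1 - exp(-lam * x)  =>  F^{-1}(u) = -ln(1 - u) / lam
      lam = 2.0
      x = -np.log1p(-u) / lam           # log1p(-u) computes ln(1 - u) accurately

      print(x.mean())                   # close to 1 / lam = 0.5
      ```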

  • @ArashSadr · 2 years ago

    Thanks for the great video. I was wondering, is this a typo or not? At 13:17, at the bottom of the right-hand column of text, the last formula should be 10/N...? The 1/N is missing, or am I wrong?

    • @KapilSachdeva · 2 years ago +1

      🙏 No, you are correct. Thanks for catching it… someone is really paying attention :)

    • @medomed1105 · 2 years ago

      @KapilSachdeva I think the 10 should not be there. It was for the uniform case.
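
      One way to reconcile the two remarks above, assuming (as the surrounding discussion suggests) that the quantity being estimated is the integral of $h$ over $[0, 10]$ and the samples $x_i$ come from a proposal $q$:

      $$\hat{I} = \frac{1}{N}\sum_{i=1}^{N}\frac{h(x_i)}{q(x_i)}, \qquad q(x) = \tfrac{1}{10} \text{ on } [0,10] \;\Rightarrow\; \hat{I} = \frac{10}{N}\sum_{i=1}^{N} h(x_i).$$

      So both the $1/N$ and the $10$ belong in the uniform-proposal formula, and the factor $10$ indeed disappears for a general proposal $q$.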

  • @ajazhussain2919 · months ago +1

    Thanks

  • @yiweikwok8735 · 1 year ago +1

    Thanks for the great video!
    I wanted to ask about the "variance" you plotted (timestamp 10:10) in Monte Carlo sampling. At first I didn't know what variance you were referring to, but by looking at the colab notebook, I realized you meant the variance of the random variable $10\,h(X)$, not the variance of the Monte Carlo estimate $\frac{10}{N}\sum_{i=1}^N h(X_i)$.
    So here I want to share some of my thoughts from when I was learning this. If we are talking about the variance of $10\,h(X)$, then it is always 4, and the plot only shows an estimate of this ground-truth variance as $N$ gets larger and larger. For a small $N$, the estimate of the variance itself has high variance, resulting in a very inaccurate estimate, i.e. it could be much larger or smaller than 4. With a sufficiently large $N$, we finally obtain an accurate estimate: 4. Running the program multiple times, one can observe the large fluctuations at the start of the plot.
    It's another thing when it comes to the variance of the Monte Carlo estimate $\frac{10}{N}\sum_{i=1}^N h(X_i)$. For this random variable, the variance is $4/N$, which depends on the number of samples $N$. Hence Monte Carlo reduces the variance of its estimate of the mean by increasing the number of samples $N$.
    However, the more important point is that, with the large variance of $10\,h(X)$, one would require a very large $N$ for the law of large numbers to really work well. By Chebyshev's inequality, if we expect the Monte Carlo estimate to have an error less than $\epsilon$, the number of samples needed is inversely proportional to the "risk" (the probability that we fail to achieve that error). With a large variance of $10\,h(X)$, the number of samples $N$ needs to be even larger. That's why we use importance sampling to reduce the sampling variance, so that we don't need so many samples for Monte Carlo.
    That's my own opinion, hope it helps 😄
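
    A quick empirical check of the distinction drawn above; the h below is a stand-in (so its variance is not 4), since the notebook's exact h is not reproduced here.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def h(x):
        return (x > 8.0).astype(float) * 2.5   # hypothetical integrand

    def mc_estimate(N):
        x = rng.uniform(0.0, 10.0, size=N)
        return 10.0 * h(x).mean()              # estimates the integral of h over [0, 10]

    # Variance of the single-sample term 10*h(X): does not depend on N.
    x = rng.uniform(0.0, 10.0, size=1_000_000)
    print("Var[10*h(X)] ~", np.var(10.0 * h(x)))

    # Variance of the estimator itself: shrinks like Var[10*h(X)] / N.
    for N in (10, 100, 1000):
        reps = [mc_estimate(N) for _ in range(2000)]
        print(N, np.var(reps))
    ```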

  • @100rabhkr_India · 2 years ago +1

    Dear Kapil Sir, thanks for this great video. At 4:10, you mention that P_theta is not normalized. I want to ask: why do we need P_theta to be normalized? What problem would it create if it's not normalized?

    • @KapilSachdeva · 2 years ago

      Thanks Saurabh.
      A function can only be considered a probability density function (PDF) if it is normalized. You use the PDF to compute the probability of a state of your random variable, and probability is a number between 0 and 1.
      In the expectation equation, the rightmost term is the PDF p(x). Therefore, your distribution had better be normalized if you want to use it there.
      In the importance sampling case, you see that we need to compute the ratio p(x)/q(x). This is a ratio of PDFs, and hence you need the normalizing constants for both of them.
      Now, in most real-world probabilistic modeling, computing this normalizing constant is the most difficult part (in computational cost).
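
      Since this reply turns on the ratio of PDFs and their normalizing constants, here is a minimal self-normalized importance sampling sketch; the unnormalized target exp(-x^2/2) (a standard normal up to its constant) and the Student-t proposal are illustrative choices.

      ```python
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      N = 100_000

      def p_tilde(x):
          return np.exp(-0.5 * x**2)    # unnormalized N(0, 1) density

      q = stats.t(df=3)                 # heavy-tailed proposal
      x = q.rvs(size=N, random_state=rng)

      w = p_tilde(x) / q.pdf(x)         # weights, off by an unknown constant
      w /= w.sum()                      # self-normalization cancels the constant

      print(np.sum(w * x**2))           # close to E_p[X^2] = 1
      ```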

    • @100rabhkr_India · 2 years ago

      @KapilSachdeva Thanks for the quick response. Can you please provide an example of a PDF which is not normalized?

    • @KapilSachdeva · 2 years ago +1

      Unnormalized PDFs arise when you multiply/mix/convolve two high-dimensional probability functions. Most of the time you see this happening when doing Bayesian statistics. Read this page - en.wikipedia.org/wiki/Normalizing_constant .... The posterior distribution is the result of combining the likelihood and the prior, divided by a normalizing constant. You also saw this in the Kalman Filter tutorial. If you use non-linear functions for the transforms, you end up with a resulting function for which computing the normalizing constant is extremely costly.
      Also look at the first 5-10 minutes of this video, where I explain this problem of the normalizing constant in another (but related) context (th-cam.com/video/68dJCDyzB0A/w-d-xo.html)
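
      Written out for the posterior case mentioned above:

      $$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta')\, p(\theta')\, d\theta'}$$

      The integral in the denominator (the evidence) is the normalizing constant that becomes intractable for high-dimensional or non-linear models.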

    • @100rabhkr_India · 2 years ago +1

      @KapilSachdeva Understood, thank you very much. Looking forward to your videos on the particle filter, bootstrap filter, etc. Loved your videos.

    • @KapilSachdeva · 2 years ago

      🙏

  • @animatedmath4307 · 2 years ago +1

    At 14:41 you state you are interested in computing $P(X \geq 2)$, but this has nothing to do with h. Do you mean $P(h(X) \geq 2)$? In that case the rest makes sense.

    • @KapilSachdeva · 2 years ago

      Based on your comment, in hindsight I agree it was not proper to say we are interested in the probability of X >= 2. Indeed, my sentence creates confusion, as it did for you. Apologies.
      Here is how I should have explained it:
      In this scenario, we are interested in calculating the expectation of a function of a random variable X (i.e. h(X)), where X is normally distributed __but__ we only want to consider situations when X >= 2.
      I hope it makes sense now.

    • @animatedmath4307 · 2 years ago +1

      @KapilSachdeva I see: then it is not even what I thought.
      $P(X \geq 2) = \int \mathbf{1}_{[2,\infty)}(x)\, p(x)\, dx$, where $\mathbf{1}$ is the indicator function and $p(x)$ is the pdf of $X$.
      $P(h(X) \geq 2) = \int \mathbf{1}(h(x) \geq 2)\, p(x)\, dx$.
      $E[\text{restriction of } h(X) \text{ to } [2, \infty)] = \int h(x)\, \mathbf{1}_{[2,\infty)}(x)\, p(x)\, dx$,
      which is the second integral at 15:50.
      Now it is clear.

    • @KapilSachdeva · 2 years ago

      🙏

    • @lakex24 · 4 months ago

      Thank you for raising this question. It's indeed quite confusing for me as well. IMO, the indicator function should be applied to h(x): set it to 1 for h(x) >= 2 and to 0 for h(x) < 2, since we are interested in the probability that h(x) >= 2, NOT the expectation of h(x)...?
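
      To make the two readings in this thread concrete, a sketch that estimates both quantities for X ~ N(0, 1), using a proposal shifted into the tail; the choice h(x) = x in the second reading is purely illustrative.

      ```python
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      N = 100_000

      p = stats.norm(0.0, 1.0)               # X ~ N(0, 1)
      q = stats.norm(3.0, 1.0)               # proposal shifted into the tail
      x = q.rvs(size=N, random_state=rng)
      w = p.pdf(x) / q.pdf(x)

      # Reading 1: the tail probability P(X >= 2), i.e. h is the indicator.
      print(np.mean(w * (x >= 2)), 1.0 - p.cdf(2.0))   # both close to 0.0228

      # Reading 2: the restricted expectation E[h(X) 1{X >= 2}], with h(x) = x.
      print(np.mean(w * x * (x >= 2)))
      ```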