The intuition behind the Hamiltonian Monte Carlo algorithm

  • Published 14 May 2018
  • Explains the physical analogy that underpins the Hamiltonian Monte Carlo (HMC) algorithm. It then goes on to explain that HMC can be viewed as a specific type of Metropolis-Hastings sampler.
    The paper by Michael Betancourt I mention is "A Conceptual Introduction to Hamiltonian Monte Carlo", 2018, ArXiv, and is available here: arxiv.org/pdf/1701.02434.pdf. The Radford Neal paper is, "MCMC using Hamiltonian dynamics", Chapter 5 in the "Handbook of Markov Chain Monte Carlo" by Brooks et al., 2011.
    This video is part of a lecture course which closely follows the material covered in the book, "A Student's Guide to Bayesian Statistics", published by Sage, which is available to order on Amazon here: www.amazon.co.uk/Students-Gui...
    For more information on all things Bayesian, have a look at: ben-lambert.com/bayesian/. The playlist for the lecture course is here: • A Student's Guide to B...
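
As a rough illustration of that Metropolis-Hastings view, here is a minimal HMC sketch in Python (not the Wolfram program shown in the video; the standard-normal target, step size, and trajectory length are arbitrary choices made for this example). Each iteration draws a fresh Gaussian momentum, approximately integrates Hamilton's equations with the leapfrog scheme to generate a proposal, and then accepts or rejects it with the usual Metropolis rule based on the change in total energy H.

```python
import numpy as np

def U(theta):
    # potential energy: negative log density of a standard normal (up to a constant)
    return 0.5 * theta**2

def grad_U(theta):
    return theta

def leapfrog(theta, m, eps, n_steps):
    # approximately integrate Hamilton's equations with n_steps steps of size eps
    m = m - 0.5 * eps * grad_U(theta)           # initial half step for momentum
    for _ in range(n_steps - 1):
        theta = theta + eps * m                 # full step for position
        m = m - eps * grad_U(theta)             # full step for momentum
    theta = theta + eps * m
    m = m - 0.5 * eps * grad_U(theta)           # final half step for momentum
    return theta, -m                            # flip momentum so the proposal is reversible

def hmc_step(theta, eps, n_steps, rng):
    m = rng.standard_normal()                   # the "kick": fresh Gaussian momentum
    theta_star, m_star = leapfrog(theta, m, eps, n_steps)
    # total energy H = U + m^2/2; accept with probability min(1, exp(-(H* - H)))
    dH = (U(theta_star) + 0.5 * m_star**2) - (U(theta) + 0.5 * m**2)
    return theta_star if np.log(rng.uniform()) < -dH else theta

rng = np.random.default_rng(1)
theta, draws = 0.0, []
for _ in range(2000):
    theta = hmc_step(theta, eps=0.1, n_steps=20, rng=rng)
    draws.append(theta)
print(np.mean(draws), np.std(draws))            # roughly 0 and 1 for a standard normal
```

The momentum flip at the end of leapfrog() is what makes the proposal its own inverse; see the discussion of the -m* trick in the comments below.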

Comments • 47

  • @user-lx7jn9gy6q
    @user-lx7jn9gy6q 2 years ago +13

    For those who don't know the research and esoteric academic work on this stuff: just imagine how difficult this would be to learn from academic documents. Ben has saved us hours upon hours of work and frustration.

  • @TanThongtan
    @TanThongtan 3 years ago +16

    This is incredible. I've struggled reading through both papers mentioned but finally got an intuitive idea of how HMC works 4 mins into this video.

  • @thomasrobatsch2582
    @thomasrobatsch2582 3 years ago

    I highly appreciate the effort you put into creating the animations!

  • @ana_log_y
    @ana_log_y 4 years ago +4

    Thank you for such well-prepared and well-explained material!

  • @vinayramasesh2959
    @vinayramasesh2959 5 years ago +7

    This was an excellent video, thanks! I hadn't understood until now that the only reason proposals in HMC would ever be rejected is a slight increase in the value of H, caused by the fact that we're only approximately integrating the equations of motion.
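
A quick numerical check of that point, under the same assumptions as the sketch above (standard-normal target; the step sizes are arbitrary): with exact dynamics H is conserved and the Metropolis ratio exp(-ΔH) is exactly 1, so the only source of rejections is the discretisation error of the leapfrog integrator, which shrinks with the step size.

```python
import numpy as np

def H(theta, m):
    # total energy for a standard-normal target: U(theta) + kinetic energy
    return 0.5 * theta**2 + 0.5 * m**2

def leapfrog(theta, m, eps, n_steps):
    m -= 0.5 * eps * theta                      # grad_U(theta) = theta for this target
    for _ in range(n_steps - 1):
        theta += eps * m
        m -= eps * theta
    theta += eps * m
    m -= 0.5 * eps * theta
    return theta, m

rng = np.random.default_rng(0)
theta, m = rng.standard_normal(2)
for eps in (0.5, 0.1, 0.01):
    n_steps = int(round(5.0 / eps))             # keep the total integration time fixed
    theta_star, m_star = leapfrog(theta, m, eps, n_steps)
    print(eps, np.exp(-(H(theta_star, m_star) - H(theta, m))))  # Metropolis ratio -> 1
```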

  • @richasrivastava999
    @richasrivastava999 2 years ago +1

    Thanks for this video. I was trying hard to understand the physical analogy of HMC; this video really helped.

  • @Zorothustra
    @Zorothustra 5 years ago +5

    That's a very intuitive explanation of the HMC sampler, great job! Thanks for sharing.

    • @SpartacanUsuals
      @SpartacanUsuals  5 years ago +2

      Thank you!

    • @haowenwu5046
      @haowenwu5046 5 years ago

      Ben Lambert, no, thank you! This video is really clear and helpful!

  • @matakos22
    @matakos22 2 years ago

    Thank you for this, it's pure gold. Highly appreciated.

  • @Peledziuko
    @Peledziuko 4 years ago +3

    Thank you for this video, it goes very nicely with the papers referenced. I was wondering if you have a video, or could recommend one, that goes into more depth on solving for the path of the particle? I am still struggling to fully understand what that means; otherwise I am clear on the other steps :)

  • @waltryley4025
    @waltryley4025 2 years ago

    Brilliant work, thank you for posting.

  • @tomlindstrom6698
    @tomlindstrom6698 4 years ago +1

    This presentation is fantastic. Is there any chance you'd be able to expand it to relate the sampling concepts to warning messages in Stan and the necessary tweaking (tree depth, adapt_delta, etc.)?

  • @MrPigno87
    @MrPigno87 2 years ago

    Thank you for this great explanation!

  • @khan.saqibsarwar
    @khan.saqibsarwar several months ago

    Thank you for the great explanation.

  • @qazaqtatar
    @qazaqtatar 2 years ago

    Excellent lecture!

  • @emmanuelniyigena4343
    @emmanuelniyigena4343 5 years ago +1

    Yes, very helpful, Mr. Lambert. Thank you :)

  • @renyf6093
    @renyf6093 4 years ago

    Thanks for the nice job~🍺 It's precise and easy to understand.

  • @chenxin4741
    @chenxin4741 3 years ago

    Very nice visualization.

  • @JuanIgnaciodeOyarbide
    @JuanIgnaciodeOyarbide 4 years ago

    Thank you for the video.
    I just have a question. For the bimodal posterior, wouldn't you receive warnings from Stan regarding the chains' performance? I think it was Betancourt who said that, in the presence of bimodality, HMC is not appropriate.

  • @samw.6550
    @samw.6550 5 years ago +1

    Great job! Thanks a lot! :)

  • @YYHo-kw2bi
    @YYHo-kw2bi 1 year ago

    Thank you~ I am watching this video in preparation for my master's program.

  • @rwtig
    @rwtig 4 years ago +1

    Great video, thanks Ben. It would have been good to have seen a visualisation showing the -m trick working; it took me a long time to satisfy myself of this (essentially thinking about how the equations of motion applied to the proposal point, followed by the flip, get us back to the original point). Also it is very interesting that, in an ideal world, this essentially never rejects a proposal, as the original point and the proposal should have the same energy. Maybe a visualisation driving that point home would also have been useful. Once I felt I understood these concepts, the algorithm made sense to me.

    • @rwtig
      @rwtig 4 years ago

      Of course, it is very important for the -m* trick that the joint distribution is independent in m and theta, so that we can be certain we have symmetry for the paths at m and -m*.

  • @mururaj4022
    @mururaj4022 3 years ago

    Amazing video to explain the intuition! Love it! I have a question though. Mathematically, I understand why we need to flip m to -m, i.e. to make the proposal symmetric. But in effect, since we are going to throw away m as we are only interested in theta, would we still need to flip it? It doesn't affect the r value (since it's Gaussian) or the next momentum value (since we are squaring it).

    • @mururaj4022
      @mururaj4022 3 years ago

      OK, I found the answer in Neal's paper: "... This negation need not be done in practice ..."

  • @leowatson1589
    @leowatson1589 7 months ago

    Excellent video! But one question.
    At 17:10, where does the normalizing constant 1/z come from? The integral over the support of the normal's pdf is 1, and the other terms on the RHS are simply constant w.r.t. the variable of integration.
    Is the LHS not a valid pdf at this point? Is that why we obtain the normalizing constant?

  • @richardpinter9218
    @richardpinter9218 3 years ago

    I love you, thank you!

  • @peymantehrani7810
    @peymantehrani7810 4 years ago +7

    Thanks for your wonderful video!
    Here are my questions:
    First, I didn't understand why changing M* to -M* makes the conditional probability change from 0 to 1.
    And also, why is the path from (theta, M) to (theta*, M*) deterministic?

    • @tboutelier
      @tboutelier 4 years ago +4

      The path from (theta, M) to (theta*, M*) is deterministic because it is the result of integrating the particle's path through the parameter space, subject to conservation of the total energy. This is purely mechanical reasoning; nothing probabilistic is used for this step.
      Understanding the change of sign of M* is still a bit magical for me! ^_^

    • @mikolajwojnicki2169
      @mikolajwojnicki2169 3 years ago +2

      The -M* is a clever trick: you essentially reverse the motion.
      Imagine throwing a ball with a certain momentum m1 from a certain position x1. After a time t it will end up at a place x2 with momentum m2. If you then throw the ball from x2 but with momentum -m2, after time t it will end up at x1.
      So from (x1, m1) you go to (x2, m2) but report the result as (x2, -m2), because from (x2, -m2) you get back to (x1, -m1), which you would report as (x1, m1).
      (You don't actually go back in the algorithm. It's just to make sure there is no bias in the proposed values. See Metropolis-Hastings.)
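
That reversal argument can be checked numerically, since the leapfrog integrator is exactly time-reversible: integrate forward, flip the momentum, integrate forward again with the same settings, and you recover the starting point. A small sketch (same standard-normal toy target as above; the starting point, step size, and number of steps are arbitrary):

```python
import numpy as np

def leapfrog(theta, m, eps=0.1, n_steps=25):
    # leapfrog integration for a standard-normal target, where grad_U(theta) = theta
    m -= 0.5 * eps * theta
    for _ in range(n_steps - 1):
        theta += eps * m
        m -= eps * theta
    theta += eps * m
    m -= 0.5 * eps * theta
    return theta, m

theta0, m0 = 1.3, -0.7
theta1, m1 = leapfrog(theta0, m0)     # integrate forward: (x1, m1) -> (x2, m2)
theta2, m2 = leapfrog(theta1, -m1)    # flip the momentum and integrate forward again
print(theta2, -m2)                    # recovers (theta0, m0) up to floating-point error
```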

  • @ErickChaplin
    @ErickChaplin 2 years ago

    Hi, I have a question. Could you send me the Wolfram program that you presented here? I have to give a presentation for my PhD and it would be useful!!
    Thank you so much

  • @yuanhu7264
    @yuanhu7264 4 years ago +1

    Thanks for your video! It was excellent. What I don't understand is calculating the trajectory based on the current location. In the physical analogy you would have to kick the particle so it starts to move; how do you 'kick' the particle in HMC (I guess it's M, but how do you know where it is heading)? Also, in the video at 26:37, once you flip the sign of M, logically I would expect the angle of the vector to change by -180 degrees (that's how you reverse a momentum physically); however, the graph shows it heading at a strange angle. How do you explain this? Thanks!

    • @rwtig
      @rwtig 4 years ago +1

      If I understand correctly, we start with a point theta and sample a momentum, so we now have (theta, m). Sampling this m can be thought of as the kick (or slow-down), as we can now be at a different energy level. Based on the momentum, if we go forward a small amount of time (I'm not sure how this time parameter was chosen), then we can solve the laws of motion to get a new point (theta*, m*), completely deterministically. But we want a deterministic proposal that is guaranteed to return back to (theta, m), otherwise MCMC won't work. Consider instead (theta*, -m*): due to the symmetry of our joint distribution space, as it is independent in m and theta, applying our laws of motion will send us to (theta, -m). Of course, flipping then gives us (theta, m). This shows that following the laws of motion and then flipping the momentum is exactly reversible.

    • @ruoshiwen371
      @ruoshiwen371 3 years ago

      I had the same confusion for a long time. Then I read Neal's paper, which has an illustrative one-dimensional example. I also found a lecture note which I hope could help you: faculty.washington.edu/yenchic/19A_stat535/Lec9_HMC.pdf

  • @JaGWiREE
    @JaGWiREE 5 years ago +1

    What was the intuition or reason for choosing a bimodal? Was it just to make the position and momentum graph easier to visualize both components as convergence occurs?

    • @SpartacanUsuals
      @SpartacanUsuals  5 years ago

      Thanks for your comment. Yes, it was a somewhat arbitrary choice. I mainly chose this rather than a unimodal Gaussian so that the paths were more interesting (i.e. not just circles). Best, Ben

  • @NilavraPathak
    @NilavraPathak 5 years ago +7

    This is awesome. Can you please do one for variational inference?

    • @SpartacanUsuals
      @SpartacanUsuals  5 years ago +5

      Thanks! Yes, one is currently being planned. Cheers, Ben

  • @nickbishop7315
    @nickbishop7315 1 year ago

    So HMC almost always accepts (rather than rejects) proposals as long as the momentum term is set correctly initially?

  • @hanyingjiang6864
    @hanyingjiang6864 2 years ago

    I have read like 100 blogs and none of them states things as clearly as this video.

  • @fakeeconomist
    @fakeeconomist 1 year ago

    genius

  • @asvz7375
    @asvz7375 6 years ago +1

    What is x?

    • @SpartacanUsuals
      @SpartacanUsuals  6 years ago

      Thanks for your comment. 'x' is the data sample. So p(X|theta) is the likelihood. Hope that helps! Best, Ben