Crash Course on Monte Carlo Simulation

แชร์
ฝัง
  • เผยแพร่เมื่อ 2 ก.พ. 2025

ความคิดเห็น • 60

  • @very-normal
    @very-normal  11 ชั่วโมงที่ผ่านมา +3

    Code for this video: github.com/very-normal/videos
    ADDENDUM:
    - At 14:14, I'm showing the wrong code to compile all of the results. The correct code can be found in the Github link.

  • @berjonah110
    @berjonah110 2 วันที่ผ่านมา +9

    One thing to keep in mind about saving the results is that disk operations are orders of magnitude slower than ordinary computation. Depending on the complexity of the calculation, you might be better of saving an intermediate result, like the chunked mean or something similar.

  • @d_b_
    @d_b_ 2 วันที่ผ่านมา +11

    Great video. Seeing code was really a turning point for me in understanding statistics, making the formulas and charts in textbooks more tangible. Now seeing the differences in 27:28 makes me want to dive deeper into the tests to see whats causing these differences.

  • @esosaagbongiator8870
    @esosaagbongiator8870 2 วันที่ผ่านมา +31

    Can’t believe this content is free 😭, thanks so much for this kind of valuable content.

    • @tr0wb3d3r5
      @tr0wb3d3r5 2 วันที่ผ่านมา

      same 💖

    • @very-normal
      @very-normal  2 วันที่ผ่านมา +9

      🫡

  • @ziggle314
    @ziggle314 2 วันที่ผ่านมา +5

    Fantastic exposition. You have inspired me to spend more time on Monte Carlo simulations.

  • @sethsims7414
    @sethsims7414 2 วันที่ผ่านมา +3

    Python has common sampling distributions in the "random" module of the standard library. Numpy/Scipy have a much wider selection of distributions though and will usually be faster if you know how to use them.

  • @Ocelus379
    @Ocelus379 วันที่ผ่านมา

    Amazing man! Keep it up, this is just the kind of videos that people like me need to get into the field.

  • @paper_cut9457
    @paper_cut9457 3 ชั่วโมงที่ผ่านมา

    lvl3 actually gets some computer science involved but I think this should be the case for anyone working with data in these times. super funny vim mention btw and thanks for the great content !

  • @jeffreychandler8418
    @jeffreychandler8418 2 วันที่ผ่านมา +1

    I've been diving into Rand Wilcox's robust estimation textbook and it's been a really compelling statistical book. It presents the argument that we care most about the core of the data, so we can use trimmed means and winsorized variances (for those that don't know those terms, you cut the outer percentiles for the mean, and for the variance, you set the outer percentiles to the new min and max from trimming). This seems to allow the robustness of rank statistics while maintaining the interpretability of regular statistics. I've worked with it on some of my data and it seems to capture similar patterns as the rank statistics but I can compute effect sizes and communicate it more effectively.

    • @jeffreychandler8418
      @jeffreychandler8418 2 วันที่ผ่านมา

      Also, DO NOT RUN COMPUTATIONS ON THE LOGIN NODE. That is the ONE WAY you can break a cluster. I cannot emphasize that enough xD The amount of times I've had to postpone work because some person somewhere in the country ran a "10 minute job" on a login node....
      Also I know there are tools to help write SLURM files for you, just remember your own cluster will have its own limits. It's also nice practice to request the minimum CPU/memory (usually the two are tied, so may as well use all memory you are allowed) necessary, since these are shared systems.
      And I will testify to the value of clusters. I took an analysis that would have taken 6 months on a high end consumer-grade desktop and shrunk it to a day and a half using cluster resources. If you are doing research/big data anything, this is a critical skill to your success.

  • @ElJuanchoCarrancho
    @ElJuanchoCarrancho 20 ชั่วโมงที่ผ่านมา

    Great video, I can now see where the power of the Bayesian statistics come from allowing to compute a simulation from a single model.

    • @very-normal
      @very-normal  18 ชั่วโมงที่ผ่านมา +1

      Thanks! But just as a heads up, none of the methodology here is Bayesian 😅

    • @ElJuanchoCarrancho
      @ElJuanchoCarrancho 18 ชั่วโมงที่ผ่านมา

      @very-normal that's true, but the underlying concept is widely used in Bayesian models (unless you're working with a sequential Gibbs sampler).

  • @ahiser1117
    @ahiser1117 วันที่ผ่านมา +2

    Hi Christian, great video! I love your explanations and have been following for a little while. In this implementation, I couldn't help but feel like there are some significant optimizations / tool choices that would drastically improve your runtime speed here.
    Firstly, I think writing each result to a separate file is going to lead to significant I/O overhead that is likely not trivial to the overall time.
    Another idea is switching to something like JAX in Python, where you can easily compile and parallelize your code. I took a quick crack at reimplementing the array of tests you're doing here and on my personal computer was able to run this in (2.5s compile time + 3.0s runtime) 5.5 seconds with 1e6 samples per permutation of shift, distribution, and test type.
    I know JAX can be a little daunting at first, but I highly recommend getting over the initial hump, since you can really scale up your analysis!

    • @very-normal
      @very-normal  วันที่ผ่านมา +2

      I can definitely see the I/O getting in the way with such a large amount of simulations. I even feel it when I need to read everything again and compile it for making graphs/tables 💀I haven’t heard of JAX before, but I think it’d be worth looking into. Thanks for the recommendation, I appreciate you taking the time out to tell me

    • @ahiser1117
      @ahiser1117 วันที่ผ่านมา

      @@very-normalJAX is a machine learning framework created by Google (was used to train Gemini). It has increasing use in probabilistic programming which is what I use it for in my research. Can’t recommend it enough.

    • @bokehbeauty
      @bokehbeauty วันที่ผ่านมา

      Would you be so kind to share your code, or point us to helpful guidance to convert the code to JAX?

    • @ElJuanchoCarrancho
      @ElJuanchoCarrancho 20 ชั่วโมงที่ผ่านมา

      do you use JAX with GPU?

    • @bokehbeauty
      @bokehbeauty 19 ชั่วโมงที่ผ่านมา +1

      @@ElJuanchoCarrancho No, not yet. I heart of JAX, but wasn’t forced to jump the learning hurdle.

  • @olipito
    @olipito 2 วันที่ผ่านมา

    Waited for soooo long for a good MC explanation video (series) YT is surprisingly poor on this topic. Thank you!

  • @realshmel
    @realshmel 2 วันที่ผ่านมา

    Amazing video . Thanks a million !

  • @Gigano
    @Gigano 2 วันที่ผ่านมา +4

    Fantastic stuff! I wish this were available 7 years ago haha when I did my PhD haha!

  • @Terracotta-warriors_Sea
    @Terracotta-warriors_Sea 2 วันที่ผ่านมา +1

    Thank you for a great video

  • @7embersVeryOwn
    @7embersVeryOwn 17 ชั่วโมงที่ผ่านมา

    Fantastic video as always. Subject is passionating, pedagogy is top tier, and the overall quality is excellent. My only constructive criticism: a bit unfortunate not to discuss the possibility to derive uncertainty estimates when doing MC simulation, imo it's one of its biggest advantages,

    • @very-normal
      @very-normal  17 ชั่วโมงที่ผ่านมา +1

      That’s true, I should’ve brought in a standard error estimate, that would’ve been relatively easy to do. Thanks for pointing it out

  • @azure-hawk
    @azure-hawk วันที่ผ่านมา

    If speed was a major concern, you could use matrix algebra to perform all of these tests much more quickly, though you would need to write your own versions of t.test() and wilcox.test(). This is generally going to be more performative than immediately resorting to parallelization. And this may be more of a personal preference, but you could use switch() instead of the separate if statements, like so (replace ellipses with function arguments): hypothesis_test

    • @azure-hawk
      @azure-hawk วันที่ผ่านมา

      There is also the dqrng package for fast random sampling and generating values from probability distributions, if that were to become a bottleneck.

  • @BOKRalavac
    @BOKRalavac 2 วันที่ผ่านมา +1

    I realy love your videos. If there is anything that I can do to help you produce them, I would love to do so (for example translation, since I am german, or writing short code chunks for viewers to download and play with)

  • @6PrettiestPics
    @6PrettiestPics วันที่ผ่านมา

    You should try Julia. I suspect you’d love it for this kind of work.

  • @julioccorderoc
    @julioccorderoc วันที่ผ่านมา

    This content is amazing, like incredible amazing
    Thansk!

  • @Gigano
    @Gigano 2 วันที่ผ่านมา +2

    By the way, at 14:14, I do not think that code shows how to read in the results of the simulations. Rather it shows the code that runs 10,000 simulations of the non-parallelized kind.

    • @very-normal
      @very-normal  2 วันที่ผ่านมา +5

      Oops you’re right, that’s my bad. I’ll add an addendum to the comments when I get back to my desktop. Thanks for the heads up!

  • @nj-bz8pv
    @nj-bz8pv 2 วันที่ผ่านมา

    Thanks

  • @panizzutti
    @panizzutti วันที่ผ่านมา

    Great video bro! Fuck yeahh i love this so much

  • @lex494
    @lex494 2 วันที่ผ่านมา

    Great video, thanks. I really like the aesthetics of your channel. It’s clean and tidy.
    Can you recommend some resources to get a deep understanding of statistics? I have no background in mathematics, but i’m self studying, because I feel like just intuition is not enough to excel at the subject. I would like to self study statistics from scratch as well. Do a lot of problems, calculations myself. Like a good entry level textbook, or an online resource. Thanks!

    • @very-normal
      @very-normal  วันที่ผ่านมา

      To answer your recommendation question, I recommend Fundamentals of Biostatistics by Ratner. I feel that Biostat is convenient because the examples are easier to grasp, so that should serve you well. I believe you’ll be able to find it online, so you don’t need to worry about the edition
      To be honest, I feel that a deep understanding on statistics does require some form of mathematical background. There are many areas in statistics where you’ll wonder why something is the way it is or why advice like “sample size of 30” exists, and it almost always boils down to a mathematical detail.
      That being said, I think it’s still perfectly acceptable to be adept at models that are commonly used in the knowledge are you work in. Understanding the math is not necessarily the same as the ability to translate what different models mean in different situations, and this is often the skill that people need more.

  • @creepyghoul2163
    @creepyghoul2163 วันที่ผ่านมา

    We have assumed different statistical models. One is the one that generates the data end the other ones are the ones assumed by the hypothesis tests. However, the monte carlo simulation evaluates only the statistical models of the hypostetis tests, right?

    • @very-normal
      @very-normal  วันที่ผ่านมา +1

      Yes thats right! Just the hypothesis tests

  • @toofardoug2188
    @toofardoug2188 14 ชั่วโมงที่ผ่านมา

    I think there's an editing hiccup just before 5:07

  • @therealawesomequest
    @therealawesomequest 2 วันที่ผ่านมา

    Great video! Love your stuff!
    When you're displaying code blocks, and transitioning between them, could you not morph the entire block? It makes it impossible to track what's being changed.
    I love the way you talk about these things!

    • @very-normal
      @very-normal  2 วันที่ผ่านมา +1

      Thanks! Yeah, I’m not a big fan of how manim handles the code changes either. I’ll try to look into this more or get better at an alternative way of showing code!

  • @creepyghoul2163
    @creepyghoul2163 วันที่ผ่านมา

    Changing the shift and other parameters is it called sensitive analysis?

    • @very-normal
      @very-normal  วันที่ผ่านมา +1

      I hadn’t thought about that, but you’re right, a sensitivity analysis is a very similar idea to using different parameters in a simulation study.
      The only nuance I can think of is that a sensitivity analysis is moreso doing the same analysis/model on different subsets of data that’s already been collected to see how it would affect results.
      On the other hand, a simulation study is moreso done with the purpose of planning an analysis, before any data is collected

  • @sethsims7414
    @sethsims7414 2 วันที่ผ่านมา

    You can embed your R script in the submit script using here docs. Assuming you're using bash.

  • @andyvirus2300
    @andyvirus2300 วันที่ผ่านมา +1

    To bad this is done using R…

  • @onnio7998
    @onnio7998 2 วันที่ผ่านมา

    A very interesting video! I can' help but wonder though whether it would be possible to run the third simulation on a laptop in a reasonable amount of time. Maybe I'm missing something, but shouldn't it be 180 000 * 27 * 2 samples of the distributions and then some math on those from which the final result can be derived. The number of calculations will be in the hundreds of millions probably, but that should't be too bad if I imagine writing some multithreaded c++. Might take a couple of seconds. Am I dreaming?

    • @very-normal
      @very-normal  วันที่ผ่านมา

      You’re right! It’s totally possible to still do the third one on a personal machine.
      The guiding example I chose here isn’t that good for demonstrating how much of a speed up you can get from the HPC, but the essential structure of the simulation should be about the same. That’s what I was hoping to get across and that people could twist it to their needs

    • @onnio7998
      @onnio7998 วันที่ผ่านมา

      @very-normal Yep. Of course there will be a point where it's no longer possible or practical to run a an experiment locally.

  • @TM-fk4gt
    @TM-fk4gt 2 วันที่ผ่านมา +2

    I don't understand the conclusion. Could you rephrase it. 27:07

    • @very-normal
      @very-normal  2 วันที่ผ่านมา +3

      The t-tests work better than the nonparametric test when all the assumptions needed to support the t-test are met. Furthermore, there’s not much of a difference between Student and Welch’s t-test.But this study shows that its power is severely decreased in the presence of outliers, which are a natural occurrence in TH-cam. In either case of data generating distribution, power is higher when the two groups are more different, since it’s much easier to detect the difference in their means/typical values.
      Since the power of the nonparametric test is not that much lower even in the Normal setting, it’s my opinion that it’s a safer hypothesis to use if I want to analyze this type of data in the future.

    • @statswithbrian
      @statswithbrian 2 วันที่ผ่านมา

      ​@@very-normal The equal variance (Student) t-test is only going to out-perform Welch's when the sample sizes are really small, which is where you get the benefit of pooling the variances. If you run these simulations with n=3 and data coming from normal distributions, that's where the pooled test will really shine compared to the others. with n=27, you'll be able to estimate each variance separately very well anyways, so you don't really get an advantage from pooling which is why the power is the same. Making sample size (in addition to sample size) a variable in the simulation would let you see when the pooled test outperforms the unpooled test. Adding various degrees of heteroskedasticity would also probability highlight the reverse cases where the unpooled test does better.

  • @MeghrajYashwantkar
    @MeghrajYashwantkar 18 ชั่วโมงที่ผ่านมา

    i want code

    • @very-normal
      @very-normal  11 ชั่วโมงที่ผ่านมา

      what's the magic word

  • @sitrakaforler8696
    @sitrakaforler8696 วันที่ผ่านมา

    500th like.
    Really cool vidéo hahaha

  • @Aldotronix
    @Aldotronix วันที่ผ่านมา

    Too much work mate, just run a permutation test and call it a day