5 statistics questions you should really know

  • Published on Dec 22, 2024

Comments • 54

  • @tystovall6574 13 days ago +95

    Everybody knows about Type 1 Errors and Type 2 Errors, but few know about Type 3 errors: confusing Type 1 Errors with Type 2 Errors

    • @bp56789 12 days ago +4

      People should take a course in naming things before they establish new terms. Worst names ever. Names should be a few syllables and somewhat self-explanatory. E.g. good hit, bad hit, good miss, bad miss.

    • @tystovall6574 11 days ago +3

      @bp56789 for real. I have no idea how "Type One" and "Type 2" sounded like good, memorable, or intuitive names for these.

    • @pipertripp 6 days ago +1

      Oh, I would say type 3 is by far the most common error. 😅

  • @jamesdavis3851 13 days ago +22

    It's also important to emphasize p-values require defining *a priori* exactly what a "more extreme" result is. If it can't be defined, or you define it afterward, you haven't actually generated a p-value.

    • @loldelol34w56436 13 days ago +2

      Would you mind explaining? Newbie here.

    • @kventinho 12 days ago

      expound please.
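
A minimal sketch of the point raised in this thread, assuming a test statistic that is standard normal under the null and a made-up observed value of z = 1.8 (both are illustrative assumptions, not taken from the video). The same observation gives three different p-values depending on which definition of "more extreme" was fixed beforehand, which is why choosing the definition after seeing the data does not yield a valid p-value.

```python
from statistics import NormalDist


def p_values(z_observed):
    """Return the p-value under three pre-specified meanings of 'more extreme'.

    Assumes the test statistic is standard normal under the null hypothesis.
    """
    cdf = NormalDist().cdf(z_observed)
    return {
        "greater": 1.0 - cdf,                    # extreme = larger than observed
        "less": cdf,                             # extreme = smaller than observed
        "two-sided": 2.0 * min(cdf, 1.0 - cdf),  # extreme = farther out in either tail
    }


# Hypothetical observed statistic, chosen only for illustration.
result = p_values(1.8)
for direction, p in result.items():
    print(f"{direction:>9}: p = {p:.4f}")
```

With a 0.05 threshold, the "greater" definition rejects the null while the "two-sided" one does not, so picking whichever definition gives the smaller number after the fact inflates the false positive rate.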

  • @GabrielAPPer 13 days ago +16

    Great video idea! I do think, however, that since these concepts are all fairly simple for someone with wide experience in statistics, it would be cool to see more versions of this video covering deeper concepts. As someone who deals with a huge amount of statistics, but under Econometrics and so mostly limited to Causal Inference, I'd love to see what I'm missing out on in the other subfields. Keep up the great work!

    • @very-normal 13 days ago +9

      Yeah that’s the tricky thing about videos like this one. On one hand, my audience is full of people who do have deep stats experience, so it’s more of a quick check. But on the other, these are also ideas that I regularly have to teach to researchers during consults. I appreciate the input, I’ll try to think of ways to strike a balance here

    • @GabrielAPPer 13 days ago +3

      Love that you care mate. It's always hard to balance complexity with educational themes, just know you make great videos!

    • @lexinwonderland5741 13 days ago +2

      @very-normal i would love if instead, you made another one of these videos but "for advanced viewers", because i thought this was too perfect of an introduction to miss out on! keep up the great work dude!!

    • @blueberrypanda931 13 days ago

      @very-normal just wanted to chime in and say the depth of this video was perfect for a stats beginner like me!

  • @im_83n 12 days ago

    I finally figured out how to remember the difference after watching this video. "False positive" is pretty common language, so that one is type one, as opposed to a false negative, which I, at least, didn't hear as often prior to taking stats.
    Thank you again, love the content, and the channel name.

  • @falconarea 13 days ago +3

    Wow, great video. I felt really engaged by the format of first checking whether I know the concept and then seeing the explanation. It makes me feel I'm actually improving, because I first find out that I don't fully understand the topic.

    • @falconarea 13 days ago

      Reading the other comments: at least for me, I learned a lot. I had a couple of in-depth statistics courses, but I don't use it in my day-to-day (I'm a computer engineer), so the subjects were not new, but it definitely showed me I don't remember anything about them.

    • @falconarea 13 days ago

      Lastly, I think the best way for me to retain this type of knowledge is through practical examples (hopefully outside of medical trials, which are all over the place and overused).

  • @RyJones 13 days ago +8

    There are two types of statisticians: those who understand power, those that don’t, and those that aren’t sure

  • @kononivskipolya3950 12 days ago

    really interesting to see some other ways to explain the same concepts I'm taught in uni. even though I think I had all the questions right I found it helpful to hear the ideas paraphrased and visualized. good way to enhance intuition.

  • @parthosen5942 13 days ago +3

    Hey mate, great work! Would love some videos on the difference between doing causal inference on observational vs experimental data, the pitfalls of linear regression, etc; econometrics topics that aren't technically rigorous but form the foundations of model based inference.

    • @very-normal 13 days ago +1

      Yeah I think some causal questions would be good! They come up a lot in Biostat as well

    • @parthosen5942 13 days ago

      @very-normal Haha just noticed I had typed casual instead of causal; the exact opposite of what to do XD

  • @sagie009 9 days ago +1

    8:50 thats so unhinged for you lol

  • @the_multus 13 days ago +10

    10:35 that's obviously wrong!
    Everybody knows the word »frequentist« comes from the word »freaqy« ( ͡° ͜ʖ ͡°)

    • @very-normal 13 days ago +4

      oops fell for the classic pitfall

  • @tomasroosguerra8338 11 days ago

    This way of teaching is really good - with questions leading forward a full narrative. I think you're on to something. Thank you.

  • @pfizerpflanze 12 days ago

    I think that a more general formula for the p-value should be
    2*min(P(T≥t|H0), P(T≤t|H0)), because, for example, when testing
    H0: σ²=σ²_0 vs H1: σ²≠σ²_0 with a normal simple random sample, the usual test statistic has an asymmetric null distribution (it's a chi-squared with n-1 df).
    It's not very common, though, because most tests are either chi-squared with larger values the farther the data are from the null, or normal/Student's t distributed under the null.
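
A rough sketch of the formula in this comment, using only the Python standard library. It assumes an even number of degrees of freedom so the chi-squared tail probability has a closed form; the function names and the example numbers are illustrative, not from the video. The cap at 1 is a common convention and is an addition to the formula as written.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-squared variable with an even number of df.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def two_sided_p(t, df):
    """2 * min(P(T >= t | H0), P(T <= t | H0)), capped at 1."""
    upper = chi2_sf_even_df(t, df)
    lower = 1.0 - upper  # continuous distribution, so P(T <= t) = 1 - P(T >= t)
    return min(1.0, 2.0 * min(upper, lower))


# Hypothetical example: n = 11 observations, so T = (n - 1) * s^2 / sigma0^2 has 10 df.
print(two_sided_p(3.0, 10))   # unusually small variance
print(two_sided_p(20.0, 10))  # unusually large variance
```

Because the chi-squared distribution is skewed, the two tails are not mirror images, which is why the smaller tail is doubled instead of relying on a symmetric |T| ≥ |t| rule.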

  • @RyeCA 13 days ago

    I wonder how I have never heard of the long run idea...
    Anyway, great video, I was able to answer questions 1-4! Looking forward to more question videos :)

  • @Inexorablehorror 13 days ago

    Thank you for the video. Just a small remark: at 4:30, your text says "... the two have a similar effect." I am not a native speaker, but doesn't that imply that they are not equal but differ slightly, i.e. that there is a small effect size? In that case, the drug would indeed be different from placebo, the null hypothesis WOULD be wrong, and you did NOT commit a type 1 error (which is a valid criticism of frequentism and NHST: in the real world, the null is never true...).

  • @yusufspahi1693 13 days ago +1

    more of these please

  • @giovannimantovani795 13 days ago

    Super idea, go on please!

  • @tofonofo4606 13 days ago +1

    Very Nice 👏

  • @user-hl6xe8dz9x 12 days ago +1

    One request: please make a more detailed video on Type 1, 2, 3 errors (or something similar, if it exists), and also on power, with a real use case in biology, since you are a biostatistician.

  • @mesplin3 13 days ago

    With question 5, would it be okay to say that there is a number L such that, for any tolerance however small, the probability that the distance between a sample proportion and L is below that tolerance approaches one as the number of trials grows?

    • @very-normal 13 days ago +2

      Yeahh, that’s about right. At that part, I made a vague reference to the Law of Large Numbers, which is similar to what you’re describing. Convergence in probability (or almost surely, depending on what law is used)
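
A small simulation sketch of the convergence-in-probability statement in this exchange. The true proportion (0.5), tolerance (0.05), number of repeated experiments (200) and seed are all arbitrary assumptions made for illustration. For each sample size it estimates how often the sample proportion lands within the tolerance of the true value.

```python
import random


def fraction_within(p, n, eps, runs, rng):
    """Estimate P(|sample proportion - p| < eps) for samples of size n."""
    hits = 0
    for _ in range(runs):
        # One experiment: n Bernoulli(p) trials.
        successes = sum(rng.random() < p for _ in range(n))
        if abs(successes / n - p) < eps:
            hits += 1
    return hits / runs


rng = random.Random(0)  # fixed seed so the run is repeatable
fractions = {n: fraction_within(0.5, n, 0.05, 200, rng) for n in (10, 100, 2000)}
for n, frac in fractions.items():
    print(f"n = {n:>4}: fraction within tolerance = {frac:.2f}")
```

The estimated fraction climbs toward one as n grows, which is the weak Law of Large Numbers behaviour described above; shrinking the tolerance only delays how large n has to be.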

  • @nabibunbillah1839 13 days ago

    It is useful. Thanks 😊

  • @d_b_ 2 days ago

    It'd be nice if the terminology was more descriptive of what they're measuring. Especially type 1/2 error. Like when programming, you'd want good variable names. Instead of type 1, maybe "false alarm rate"/"cry wolf rate". Type 2: "overlooked rate"/"failed rejection rate", power: "detection rate"/"bullseye rate"

    • @very-normal 2 days ago +1

      lol yeah I agree. Unfortunately, statisticians are the worst at naming things. Don’t even get me started on stuff like “sufficiency”, “completeness”, or “almost sure convergence”.
      But to be fair, statistics can be used in so many different contexts that it almost has to suffer from needing to use very vague, general terms
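
To make the naming discussion concrete, here is a hedged sketch that computes the three quantities under the descriptive labels suggested in this thread. It assumes a one-sided z-test with known variance; the function name, effect size and sample size are hypothetical choices for illustration only.

```python
from statistics import NormalDist


def error_rates(effect_size, n, alpha=0.05):
    """Error rates for a one-sided z-test (reject when Z > critical value).

    effect_size is the true mean shift in standard-deviation units.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)
    shift = effect_size * n ** 0.5  # mean of Z under the alternative
    type2 = std_normal.cdf(critical - shift)  # P(fail to reject | effect is real)
    return {
        "type 1 (false alarm rate)": alpha,
        "type 2 (overlooked rate)": type2,
        "power (detection rate)": 1.0 - type2,
    }


# Hypothetical study: standardized effect of 0.5 with 25 observations.
rates = error_rates(0.5, 25)
for name, value in rates.items():
    print(f"{name:<27} {value:.3f}")
```

Setting `effect_size` to 0 makes the "detection rate" collapse to `alpha`, which is one way to see that a rejection under a true null is exactly the false alarm case.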

  • @the_multus 13 days ago

    Simple questions really help me to up my terminology game in English! So thx, I guess.

  • @nicolasrobertovitordemorae9396 13 days ago +1

    Nice

  • @SwissPGO 13 days ago +1

    try to make a video where you question chatgpt's models on statistical data analysis and how it succeeds or fails?

  • @braineaterzombie3981 13 days ago

    Nice vid

    • @very-normal 13 days ago

      🌱💨🟢
      thank you

  • @LoganHolmes-og5jm 11 days ago

    Baba Is You music in a statistics video 🤯

  • @mertaliyigit3288 13 days ago

    Omg baba is you music!!!

  • @GrahamBornholt hour(s) ago

    Technically, the standard p-value is not a conditional probability.

    • @very-normal 32 minutes ago

      how would you describe it tho lol

  • @Blahcub 13 days ago +2

    These were too easy and basic.

    • @very-normal 13 days ago +1

      You know your stuff!! If you have another question you get tripped up by, I’m game to try to help out

    • @yoeri7004 13 days ago +4

      @very-normal I would love to see more of this type of video, but for more advanced topics.
      E.g. pitfalls when using MCMC or pitfalls when doing logit, to name a few topics you've covered earlier

    • @ratpackenterprises1607 13 days ago

      @yoeri7004 Truuu

  • @simonpedley9729 13 days ago +1

    Let's say I have a Bayesian prediction system. I test it 1000000 times. I find that the 90% quantile is exceeded 20% of the time. So it's a fail, obviously. This illustrates that even if you are a Bayesian, you also have to be a frequentist, because not being a frequentist breaks the definition of probability and probability doesn't make sense any more. B and F are not opposites. They refer to different things. They can work hand in hand.

    • @the_multus 13 days ago +2

      Not really. We can assign a probability for the explosion of a bomb, but once it happens it is no longer replicable. Sure, you can argue, that the bomb could be rebuilt, but let's say it's unique for the sake of the argument. In this case there is no frequency.
      You could also think of a locally probabilistic process, which evolves over time: e.g. reproduction of organisms in a sequestered eco-system (they do breed, but they go extinct after some, likely long, time)

    • @simonpedley9729 13 days ago

      @the_multus Wrt the bomb, the explosion can't be repeated. But the model that you use to model the explosion can be tested as many times as you want, using random data based on the assumptions in the model. So let's say you test it, and you find that the 90% quantile is exceeded 20% of the time. That means that the model is inconsistent with its own assumptions. Folk are not going to accept a model like that, once that's been pointed out. The point is...even Bayesian predictions need to have good frequentist properties in this way (the modelling methodology has to have good frequentist probabilities), or the probabilities that come out are not plausible. Over your career as a bomb disposal expert, you would try and defuse 1000s of bombs. You'd hope that the models that you use would have the property that the 90% quantile is exceeded 10% of the time, over the course of your career.
      I'm not sure I quite understand the second point. Is the point that it's non-stationary? If that's the point, then that's just like weather forecasts. And weather forecasts need to have the property that the daily 90% quantile is exceeded 10% of the time (over many repeats), otherwise they are not plausible (whether they come from a Bayesian analysis or something else).
      There was a famous paper on this topic by Philip Dawid back in the 80s, which is a good starting point. And I just wrote a paper about it, but it's not published just yet...
      The difference between what you are saying and what I am saying is very subtle, and I'm not sure I 100% understand it. I am making a point about predictions, that all predictions need to have good frequentist properties. I don't think anyone would really disagree with my point. With your bomb, you are making a point about the definition of probability for real events, and I don't think anyone would disagree with your point either. I think the resolution is that we're not really disagreeing because we are talking about different things.

    • @the_multus 12 days ago

      @simonpedley9729 oh, I see now. We were just talking about different things: predictions in a model environment and a real environment. I do agree that a model should stand up to frequentist analysis. I've just provided two examples of objects which don't demonstrate a cyclical (in a sense) behaviour but could still be reasonably described by probabilistic methods.
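
The calibration check discussed in this thread can be sketched as a simulation. This is a simplified stand-in, not the commenter's method: the forecaster is assumed to always predict a standard normal, and the true data are normal with a standard deviation we control. The seed, trial count and the value 1.52 are illustrative assumptions, with 1.52 picked so the exceedance rate comes out near the 20% figure mentioned above.

```python
import random
from statistics import NormalDist


def exceedance_rate(true_sd, trials, rng, level=0.9):
    """How often outcomes exceed the forecaster's claimed quantile.

    The forecaster always claims outcomes are N(0, 1); the data are
    actually drawn from N(0, true_sd).
    """
    threshold = NormalDist(0.0, 1.0).inv_cdf(level)  # claimed 90% quantile
    exceed = sum(rng.gauss(0.0, true_sd) > threshold for _ in range(trials))
    return exceed / trials


rng = random.Random(1)  # fixed seed so the run is repeatable
calibrated = exceedance_rate(1.0, 20000, rng)      # claim matches reality
overconfident = exceedance_rate(1.52, 20000, rng)  # reality is more spread out
print(f"calibrated forecaster:    {calibrated:.3f}")
print(f"overconfident forecaster: {overconfident:.3f}")
```

A forecaster whose 90% quantile is exceeded about 10% of the time passes this long-run check, while one whose quantile is exceeded about 20% of the time fails it, regardless of how the predictive distribution was produced.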
      That's all. Good point.