Explaining the ANOVA and F-test

  • Published on 5 Sep 2024

Comments • 58

  • @johns.7752
    @johns.7752 several months ago +26

    The law of total variance is what made it make sense for me! None of my classes covered why something called "analysis of variance" would be a hypothesis test for significantly different means.
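
To make the decomposition mentioned above concrete, here is a minimal sketch in plain Python. The three groups and their numbers are made up for illustration and do not come from the video. It checks that the total sum of squares splits into a between-group part and a within-group part, and then forms the F-statistic as the ratio of the two mean squares.

```python
from statistics import mean

# Hypothetical outcome measurements for three groups (made-up numbers).
groups = {
    "A": [4.1, 5.0, 4.6, 5.3],
    "B": [6.2, 5.8, 6.9, 6.4],
    "C": [5.1, 4.7, 5.5, 5.0],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = mean(all_values)

# Total variation of every observation around the grand mean.
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# Variation of each observation around its own group mean (within groups).
ss_within = sum(
    sum((v - mean(values)) ** 2 for v in values) for values in groups.values()
)

# Variation of the group means around the grand mean (between groups).
ss_between = sum(
    len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
)

k, n = len(groups), len(all_values)
# F compares the between-group mean square with the within-group mean square.
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))

print(f"total={ss_total:.4f}  between+within={ss_between + ss_within:.4f}")
print(f"F={f_stat:.3f} on ({k - 1}, {n - k}) degrees of freedom")
```

If the group means were all equal, the between-group part would only reflect sampling noise, which is why a large ratio counts as evidence that the means differ.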

  • @princeofrain1428
    @princeofrain1428 several months ago +18

    I wish my statistics classes had gone this deep into ANOVA. Unfortunately, we were limited by time constraints and sort of took for granted why they work. Thank you for providing more background context in a fun and engaging way!

    • @Apuryo
      @Apuryo several months ago +1

      At my school, linear models is a two-year course: regression and ANOVA get their own semester, then we do generalized models and other things.

  • @berjonah110
    @berjonah110 several months ago +6

    An additional point on using ANOVA in practice: the F-test can only tell you that a difference between the means is present, not necessarily which groups are different or not. You have to use a more specific test (Tukey's HSD) to compare specific groups against each other.
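
As a rough illustration of the follow-up step described above, the sketch below compares every pair of groups after the overall test. It is not Tukey's HSD: to stay within the Python standard library it swaps in exact permutation tests with a Bonferroni correction, which is a simpler and more conservative stand-in. The data and the function name `exact_permutation_pvalue` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(a, b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(a) + list(b)
    total = sum(pooled)
    n_a, n_b = len(a), len(b)
    observed = abs(mean(a) - mean(b))
    extreme = splits = 0
    # Enumerate every way of relabelling the pooled values into two groups.
    for chosen in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in chosen)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        extreme += diff >= observed - 1e-12  # small tolerance for float noise
        splits += 1
    return extreme / splits


# Hypothetical data: group C is shifted upward, A and B are similar.
groups = {
    "A": [5.1, 4.9, 5.0, 5.2, 4.8, 5.0],
    "B": [5.0, 5.2, 4.9, 5.1, 5.3, 4.7],
    "C": [7.0, 7.2, 6.9, 7.1, 7.3, 6.8],
}

alpha = 0.05
pairs = list(combinations(groups, 2))
threshold = alpha / len(pairs)  # Bonferroni: split alpha across all pairs

flagged = []
for first, second in pairs:
    p_value = exact_permutation_pvalue(groups[first], groups[second])
    if p_value < threshold:
        flagged.append((first, second))
    print(first, second, round(p_value, 4))

print("pairs flagged as different:", flagged)
```

In practice a dedicated procedure such as Tukey's HSD is usually preferred because it is less conservative than Bonferroni when all pairs are compared.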

  • @smoother4740
    @smoother4740 several months ago +4

    This is the best explanation of the ANOVA I've seen so far. It directly answers why a test of the "equality" of different means is called "ANOVA" (Analysis of Variance). I also liked how you showed its direct connection with the F-statistic using the actual equations. Keep up the good work!

  • @R.H111
    @R.H111 several months ago +2

    Hey dude. I'm in high school and I got back my (self-studied) AP Statistics score earlier today. Scored a 5/5. I don't think I could've done it without you lol. tysm.

    • @very-normal
      @very-normal several months ago +1

      Great job! I’m sure I only played a small role in that, you’re the one who hustled to learn the material, congratulations!

  • @lucasortengren3844
    @lucasortengren3844 several months ago +2

    Immensely underrated channel, 46k subscribers is criminal

  • @doentexd4770
    @doentexd4770 several months ago +2

    Christian, would you consider making a video specifically about multiple regression? I still don't have an intuitive understanding of why the Gauss-Markov assumptions need to hold in order to make inferences, and I think your videos would be a great help, since you're an incredible teacher. Thank you for your work! Keep it up!

    • @samlevey3263
      @samlevey3263 several months ago

      It's because the assumptions of the Gauss-Markov theorem are used to determine what the standard errors of the coefficient estimators are. So, if those assumptions aren't met, but you still calculate the standard errors in the same way as you would if they were met, then you're going to get incorrect values for the standard errors. Then you use those standard errors to calculate t-statistics and such, so you'll get incorrect values for the t-statistics, and hence incorrect confidence intervals and potentially incorrect results for hypothesis tests.
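
The point about standard errors can be checked with a small simulation. The sketch below is only an illustration under made-up settings: a simple regression with a true slope of 2, where the error spread grows with distance from the centre of the predictor, so the constant-variance assumption fails. It compares the textbook standard error reported by the usual formula with the actual spread of the slope estimates over repeated samples.

```python
import random
from statistics import mean, pstdev


def slope_and_classical_se(x, y):
    """Least-squares slope and its textbook (constant-variance) standard error."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, (rss / (n - 2) / sxx) ** 0.5


rng = random.Random(2)  # fixed seed so the run is repeatable
x = [float(i) for i in range(1, 21)]
x_bar = mean(x)
# Assumed noise pattern: the error standard deviation grows away from the centre of x.
noise_sd = [(xi - x_bar) ** 2 for xi in x]

slopes, reported_ses = [], []
for _ in range(2000):
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, sd) for xi, sd in zip(x, noise_sd)]
    slope, se = slope_and_classical_se(x, y)
    slopes.append(slope)
    reported_ses.append(se)

empirical_sd = pstdev(slopes)            # how much the slope really varies
average_reported_se = mean(reported_ses)  # what the textbook formula claims

print(f"mean slope estimate:      {mean(slopes):.3f}")
print(f"actual spread of slopes:  {empirical_sd:.3f}")
print(f"average reported SE:      {average_reported_se:.3f}")
```

In this setup the slope estimate is still centred on the true value, but the reported standard error understates its variability, so t-statistics built from it would look more convincing than they should.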

  • @Apuryo
    @Apuryo several months ago +5

    what's crazy is that my stat inference midterm is literally tomorrow, it's about one way anova 🤣

    • @very-normal
      @very-normal several months ago +5

      👀 good luck!

  • @walterreuther1779
    @walterreuther1779 several months ago

    Oh, I love it that you not only know the term Homoskedasticity but also mention it as an assumption we are making!
    Sometimes I ask Psychologists about what they think of Nassim Taleb's criticism of IQ - it being too heteroskedastic - and then usually their looks give away that they have never learned about Heteroskedasticity in their Psychometric lessons... I think this is sad, so all the better you mention it ;-)

  • @mclovin312
    @mclovin312 several months ago

    Thanks for continuously producing these videos! Your channel is by far the best explainer on statistics compared to other YouTube channels IMO. I’m curious: what software do you use to create the videos? PowerPoint?

    • @very-normal
      @very-normal several months ago +1

      Thanks! I use Final Cut Pro for editing, Figma and Midjourney for graphics, and the Manim Python library for animations

  • @yorailevi6747
    @yorailevi6747 several months ago

    I want to mention I am currently taking aparametric stats course! so I understand the vids about it better!

  • @yazer9821
    @yazer9821 several months ago +1

    can you do a video on GLMs please!! Your videos are great

  • @RomanNumural9
    @RomanNumural9 several months ago

    I think an important note on this is that the more populations you check, the higher the likelihood that one differs significantly by sheer luck. If instead of 5 cancers you're checking 100, the odds that a statistical fluke will make one mean look further away from the others are fairly high.

    • @very-normal
      @very-normal several months ago +1

      Yeah I thought about covering multiplicity here, but it deserves its own video

    • @statswithbrian
      @statswithbrian several months ago +3

      This is not true with ANOVA. It has a type I error rate of 5% for finding *any* difference, not for each particular difference. If you had 1 million populations that were all the same, you would still only have an alpha% chance of finding a fluke. This is the advantage of running an ANOVA and not just running a bunch of two-sample pairwise tests.
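
The disagreement in this thread can be explored with a simulation in which every group truly has the same mean. The sketch below makes several simplifying assumptions that are not from the video: 6 groups of 10 standard-normal observations, a critical value for F estimated by Monte Carlo rather than looked up in an F table, and uncorrected pairwise z-tests with a known standard deviation of 1 standing in for "a bunch of two-sample tests".

```python
import random
from itertools import combinations
from statistics import NormalDist


def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of groups."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    k, n = len(groups), len(values)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def any_pairwise_z_rejects(groups, z_crit, sigma=1.0):
    """True if at least one uncorrected pairwise z-test rejects."""
    means = [sum(g) / len(g) for g in groups]
    sizes = [len(g) for g in groups]
    for i, j in combinations(range(len(groups)), 2):
        se = sigma * (1 / sizes[i] + 1 / sizes[j]) ** 0.5
        if abs(means[i] - means[j]) / se > z_crit:
            return True
    return False


def simulate_null_groups(rng, k, n):
    """k groups of n observations, all drawn from the same N(0, 1)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]


rng = random.Random(0)  # fixed seed so the run is repeatable
K, N, ALPHA = 6, 10, 0.05

# Estimate the 95th percentile of F under the null by simulation.
calibration = sorted(f_statistic(simulate_null_groups(rng, K, N)) for _ in range(4000))
f_crit = calibration[int((1 - ALPHA) * len(calibration))]
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

trials = 2000
f_rejections = pairwise_rejections = 0
for _ in range(trials):
    groups = simulate_null_groups(rng, K, N)
    f_rejections += f_statistic(groups) > f_crit
    pairwise_rejections += any_pairwise_z_rejects(groups, z_crit)

f_rate = f_rejections / trials
pairwise_rate = pairwise_rejections / trials
print(f"overall F-test false-positive rate:        {f_rate:.3f}")
print(f"any-pairwise-test false-positive rate:     {pairwise_rate:.3f}")
```

The overall F-test should reject close to the nominal 5% of the time, while the chance that at least one uncorrected pairwise test rejects is much higher. The inflation described in the original comment applies to the pairwise approach, and it returns when pairwise follow-up tests are run after a significant ANOVA, which is where multiplicity corrections come in.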

  • @GeoffryGifari
    @GeoffryGifari several months ago +1

    Hmmm what if 5 out of 6 drug-organ pairs see success in cancer treatment? (1 mean singled out from the group, but not what we expect)
    Or if the group means are clustered, split in half (pairs 1,2,3 have the same mean, so do pairs 4,5,6)?

    • @very-normal
      @very-normal several months ago +2

      You’d have a similar conclusion. The ANOVA is only detecting that at least one of them is different, so if that’s the case, there should be some compelling evidence to reject the null hypothesis. But to actually figure out *which* one is different, you’d need to follow up with secondary testing for each of the means

  • @1.4142
    @1.4142 several months ago

    Wow I was just working on this exact scenario

  • @Iachlan
    @Iachlan several months ago

    Can you explain the statistics behind weather prediction

    • @very-normal
      @very-normal several months ago

      I’m not very well versed in it, but it sounds like it’d be a fancy, high-dimensional regression model

  • @AnkhArcRod
    @AnkhArcRod several months ago +1

    @Very Normal What textbook would you suggest for this content?

    • @very-normal
      @very-normal several months ago +1

      Rosner’s Fundamentals of Biostatistics (7th ed) is a good source with a solutions manual that can also easily be found online

    • @AnkhArcRod
      @AnkhArcRod several months ago

      @@very-normal Thanks! And I must say that you are an excellent teacher.

  • @Imperial_Squid
    @Imperial_Squid several months ago +1

    Could you explain a bit further about the "residuals are normally distributed not that the variable is normally distributed itself" thing? This is one of the things that trips me up most often..

    • @very-normal
      @very-normal several months ago +1

      Yeah for sure, I’ll try my best. This is partially my opinion, so just a heads up.
      My feeling is that assuming something about the data itself is much stronger than assuming something about the residuals. Very rarely will real-world data follow nice distributions like the Normal, so it’s harder to convince people (read: the statistical referee) that this will hold up.
      On the other hand, assuming that the residuals are normal is not so bad. It’s like saying, we know there’s an average outcome and people will differ from this average, but they won’t differ too badly from it. In other words, outlier residuals are very rare. It’s confusing because this residual assumption implies that the outcome is also normally distributed in this case, but it’s important to note that it’s the residual assumption we make.
      It’s also important because with stuff like linear regression, we’re looking at how different values of the predictor (i.e. cancer group) shift the distribution of the outcome. If you assume the data itself to have a distribution, it gets more complicated to try to work in how other variables influence it. Assuming the distribution is on the residuals doesn’t come with this baggage.
      Some people are taught that they should try to transform the outcome so that it “works better” with linear regression or ANOVA. Even though you’re manipulating the outcome, the hope is that this transformation makes the -residuals- look more normal.
      I hope this helps clarify somewhat. If anyone else sees this and thinks I left something out, please chip in. This is a common question, but even I don’t feel like I get all the nuances.

    • @Imperial_Squid
      @Imperial_Squid several months ago +2

      @@very-normal "it's confusing because this residual assumption implies the outcome is also normally distributed in this" yeah that's the bit that always tripped me up, like I get that you can make one or other the core assumption and build it up from there (it's like picking your axioms in pure maths or something), but in my head the fact that the kinda nebulous residuals assumption implies the much more intuitive distribution assumption meant that I was often fighting between intuition and logic in terms of thinking it through. It also doesn't help that thinking of an example where the residuals are Normal but the distribution _isn't_ is much harder...
      So it's more about being an assumption of convenience in that it makes the maths much nicer to deal with and is also a weaker and more generalisable assumption, rather than it being anything else like purity or tradition or something.
      Thanks, I think I get it now! Though no doubt this will be one of those weird bits that'll always feel a little bit off, I feel like I have a much better grasp of the rationale! Much appreciated!
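
One way to find the example asked about in this thread: normal residuals imply that the outcome is normal within each group, but the outcome pooled across groups with different means is a mixture of normals and can look far from normal. The sketch below uses made-up settings (two groups with means 0 and 10, unit standard deviation) and compares kurtosis, which is about 3 for a normal distribution.

```python
import random
from statistics import mean


def kurtosis(values):
    """Fourth standardized moment; roughly 3 for normally distributed data."""
    center = mean(values)
    second = mean([(v - center) ** 2 for v in values])
    fourth = mean([(v - center) ** 4 for v in values])
    return fourth / second ** 2


rng = random.Random(1)  # fixed seed so the run is repeatable
# Assumed setup: two groups whose true means are far apart, same spread.
true_means = {"group_1": 0.0, "group_2": 10.0}
outcomes = {
    name: [rng.gauss(mu, 1.0) for _ in range(5000)]
    for name, mu in true_means.items()
}

# The raw outcome, ignoring which group each observation came from.
pooled = [v for values in outcomes.values() for v in values]

# Residuals: each observation minus its own group's sample mean.
residuals = [
    v - mean(values) for values in outcomes.values() for v in values
]

pooled_kurtosis = kurtosis(pooled)
residual_kurtosis = kurtosis(residuals)
print(f"kurtosis of pooled outcome: {pooled_kurtosis:.2f}")
print(f"kurtosis of residuals:      {residual_kurtosis:.2f}")
```

A histogram of the pooled outcome here would show two separate humps, while the residuals form a single bell shape. This is one reason diagnostics are usually run on the residuals and not on the raw outcome.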

  • @chillphil967
    @chillphil967 several months ago +2

    1:19 is there heart cancer? i thought no, since the cells are from birth. cool video either way, thx!

    • @very-normal
      @very-normal several months ago +4

      I saw it was really rare, but deep down, I was just looking for an emoji to represent the group lol 😅

  • @walterreuther1779
    @walterreuther1779 several months ago +1

    Question: What to do when the assumption of h̶o̶m̶o̶s̶k̶e̶d̶a̶s̶t̶i̶c̶i̶t̶y̶ homogeneity of variance is not met, i.e. there are different variances in the different populations?
    I would think this is a rather major assumption, especially if the sample size is small, as that would make ̶h̶e̶t̶e̶r̶o̶s̶k̶e̶d̶a̶s̶t̶i̶c̶i̶t̶y̶ heterogeneity of variance harder to test...
    Shouldn't one not always in some form test for ̶h̶e̶t̶e̶r̶o̶s̶k̶e̶d̶a̶s̶t̶i̶c̶i̶t̶y̶ heterogeneity of variance? Is this done in practice?
    Edit: Sorry, I wrote homoskedasticity and heteroskedasticity, but I meant homogeneity of variance and heterogeneity of variance. (The former assumes constant variance in the regressor variables, while the latter assumes the same variance for different sub-populations.)

    • @zaydmohammed6805
      @zaydmohammed6805 several months ago

      Same question here. In regression I remember them teaching us that you can scale down the data with the different variances in presence of heteroscedasticity. I wonder if that would work here or we have to do some sort of non parametric test

    • @very-normal
      @very-normal several months ago +1

      Yeah, common variance is a pretty strong assumption to make. One solution I know of is a variant of the ANOVA called Welch’s ANOVA that can be used when you don’t want to make this assumption.
      It’s from the same guy behind Welch’s t-test, the version that students learn for two-sample problems when they also can’t assume common variance.
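
For readers curious what Welch's ANOVA computes, here is a minimal sketch of the test statistic and its degrees of freedom in plain Python, following the standard textbook formula. The function name `welch_anova` and the example data are hypothetical. Turning the statistic into a p-value needs an F distribution, which the standard library does not provide, so that step is left out.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's F-statistic with its numerator and denominator degrees of freedom."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n_j / s_j^2, so noisy groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum(
        (1 - w / total_weight) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Hypothetical groups with clearly different spreads.
example = [
    [4.9, 5.1, 5.0, 5.2, 4.8],
    [3.0, 7.5, 5.5, 9.0, 1.5],
    [6.8, 7.4, 7.0, 7.3, 6.6],
]
f_stat, df1, df2 = welch_anova(example)
print(f"Welch F = {f_stat:.3f} on ({df1}, {df2:.2f}) degrees of freedom")
```

With only two groups the statistic reduces to the square of Welch's t-statistic, and the second degrees of freedom match the Welch-Satterthwaite value, which is a handy sanity check for the formula.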

    • @walterreuther1779
      @walterreuther1779 several months ago

      @@very-normal Thank you that's great to know. It seems like Welch's ANOVA is really the way to go, both for small sample size and for no knowledge about the data. (Apparently, it is almost as powerful as the standard ANOVA, even if homogeneity of variance is fulfilled, so...)

  • @AJ-tr4jx
    @AJ-tr4jx several months ago

    what if the drug has an effect on all the test groups and the means for all the groups are shifted by the same amount?

    • @very-normal
      @very-normal several months ago +1

      You’d prolly get a null result. If you shift all the distributions by the same amount, there wouldn’t be a change in the variance in group means
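
The reply above can be verified directly: adding the same constant to every observation leaves the F-statistic unchanged. A minimal sketch with made-up numbers and an assumed shift of 3 units:

```python
def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of groups."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    k, n = len(groups), len(values)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical outcomes for three groups.
groups = [
    [4.1, 5.0, 4.6, 5.3],
    [6.2, 5.8, 6.9, 6.4],
    [5.1, 4.7, 5.5, 5.0],
]
# The same data after a drug effect that shifts every group by the same amount.
shifted = [[v + 3.0 for v in g] for g in groups]

f_original = f_statistic(groups)
f_shifted = f_statistic(shifted)
print(f"F before shift: {f_original:.6f}")
print(f"F after shift:  {f_shifted:.6f}")
```

Because the test only looks at differences among group means, detecting a uniform effect requires a comparison group that did not receive the drug.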

  • @bcs1793
    @bcs1793 a day ago

    At 9:17, shouldn't Y_i be \mu_j? Or \mu_i, depending on what you are summing over

    • @very-normal
      @very-normal a day ago

      My notation was a little sloppy here... I think you are right. The denominator is supposed to be the variance of the residuals, but my sum doesn't look like it there. Thanks for catching that

  • @Iachlan
    @Iachlan several months ago

    In the one-sample t-test, we take the alpha error to be constant and play around with the beta error. Could we do it the other way around, and what would the implications be?

    • @very-normal
      @very-normal several months ago

      you could, but most of the time we’re interested in detecting a significant effect, so power is the thing we want to maximize. There’s a trade off between reducing type-I error and power, so we choose to keep alpha constant to signify we tolerate a defined probability of making a wrong decision about rejecting the null
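
The trade-off described above can be put into numbers. The sketch below simplifies the question's one-sample t-test to a one-sided z-test with a known standard deviation, so that the standard library's `NormalDist` is enough. The effect size, standard deviation, and sample size are made-up values.

```python
from statistics import NormalDist


def one_sided_z_power(alpha, effect, sigma, n):
    """Power of a one-sided one-sample z-test against a mean shift of `effect`."""
    standard_normal = NormalDist()
    critical = standard_normal.inv_cdf(1 - alpha)
    # Under the alternative, the test statistic is shifted by effect * sqrt(n) / sigma.
    return 1 - standard_normal.cdf(critical - effect * n ** 0.5 / sigma)


# Assumed scenario: true shift of 0.5, sigma of 1, 25 observations.
for alpha in (0.01, 0.05, 0.10):
    power = one_sided_z_power(alpha, effect=0.5, sigma=1.0, n=25)
    print(f"alpha={alpha:.2f}  power={power:.3f}  beta={1 - power:.3f}")
```

Fixing beta and solving for alpha is possible with the same formula, but it requires committing to a specific effect size up front, whereas alpha can be fixed without knowing the true effect.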

  • @abcpsc
    @abcpsc several months ago

    At 9:22, why are they Chi square distributed?

    • @very-normal
      @very-normal several months ago

      It comes from the distribution assumption on the residuals.
      The residuals were assumed to be normally distributed with some variance, sigma^2. If you divide each residual by sigma, you get a standard normal random variable, and squaring and summing those gives a chi-squared distribution. That sum is just the sum of squares divided by sigma^2. This applies to both the numerator and denominator in the F-statistic.

  • @dullyvampir83
    @dullyvampir83 several months ago

    If the residuals are normally distributed, aren't the original data normally distributed as well? Aren't they just shifted by the mean?

    • @very-normal
      @very-normal several months ago +2

      You’re right, I just wanted to emphasize that the main assumption is on the residuals. It implies that the outcome is normally distributed, but it’s more of a consequence of the fact that the residuals are normally distributed, rather than an assumption of the model

  • @jasondads9509
    @jasondads9509 several months ago

    anova did my head in stats, i

  • @chillphil967
    @chillphil967 several months ago

    🎉

  • @dibyajyotisaikia11
    @dibyajyotisaikia11 several months ago

    I think the example is incorrect: if the new drug is effective on different types of cancer, ANOVA may still show a statistically non-significant result in spite of the drug being effective, leading to a wrong conclusion and a loss to the company 😂

    • @very-normal
      @very-normal several months ago +2

      that’s all hypothesis tests tho lol

    • @dibyajyotisaikia11
      @dibyajyotisaikia11 several months ago

      @@very-normal I meant you need at least one more group, a standard or control, to come to any conclusion regarding efficacy

  • @synchro-dentally1965
    @synchro-dentally1965 several months ago +1

    I heard recently that Fisher was great at stats but not the best in moral and ethical character.

    • @very-normal
      @very-normal several months ago +1

      yeahhh he had some L opinions with smoking and eugenics

  • @vegetableball
    @vegetableball several months ago

    Wait... You spend most of the time about ANOVA test and make an irrelevant simulation. Could you make a better simulation that looks more like the cancers and drugs problem we were looking at?

    • @very-normal
      @very-normal several months ago

      i don’t have access to data like that, so a simulation from a particular situation was the next best thing lol

  • @femboymadara
    @femboymadara several months ago

    ur the goat