Brilliant video as usual, keep up the nice work! You clearly seem to be headed towards being a great educational content creator geared towards statistics.
Great videos, don’t change when u blow up
First of all, thank you for your videos, I find them great to expand and reinforce what I learn in my statistics classes. Just one question: at 12:04 how did you calculate the null distribution?
Thanks! That’s exactly what I had in mind with these videos, so I’m glad they’ve been helpful.
The dataset I simulated has 30 observations, so this results in a null t-distribution with 29 degrees of freedom. One degree of freedom is “spent” estimating the sample mean that’s used in the test statistic, so that’s why it’s n-1 (where “n” is sample size)
@@very-normal Thanks for answering so fast! Not even my teachers are that quick. So, I assume simulating the dataset you mentioned has some hard math or something like that and that's why you didn't get into how to simulate it?
Oh no, it’s not hard math, I actually generated data from a specific normal distribution and I rounded the values to make them look like integers. I simulated it to look like plausible data
Originally, I didn’t think it would be helpful to show how I simulated it but it ended up coming up!
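For anyone curious what that kind of simulation could look like, here is a minimal sketch in Python. Only the sample size of 30, the normal distribution, and the rounding step come from the reply above; the mean of 50, the standard deviation of 10, the null mean of 45, and the seed are made-up values for illustration, not the ones used in the video.

```python
import math
import random
import statistics

def simulate_dataset(n=30, mean=50.0, sd=10.0, seed=1):
    """Draw n values from a normal distribution and round them to integers.

    mean, sd and seed are hypothetical; the video's actual values are not given.
    """
    rng = random.Random(seed)
    return [round(rng.gauss(mean, sd)) for _ in range(n)]

def one_sample_t(data, null_mean):
    """Return the one-sample t-statistic and its degrees of freedom (n - 1)."""
    n = len(data)
    sample_mean = statistics.mean(data)
    sample_variance = statistics.variance(data)  # divides by n - 1
    t_statistic = (sample_mean - null_mean) / math.sqrt(sample_variance / n)
    return t_statistic, n - 1

data = simulate_dataset()
t_statistic, df = one_sample_t(data, null_mean=45.0)  # 45 is an assumed null value
print(len(data), df)
print(round(t_statistic, 3))
```

With 30 observations the function reports 29 degrees of freedom, which is the null t-distribution mentioned in the reply.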
I'll watch a video about how you simulated it. Anyway, thanks again! @@very-normal
Absolutely love your videos, I was interested in biostats and thought I’d ask where you’re enrolled!
Love your videos.
Just studied that topic today in the class, and this video was like a revision.❤🎉
Great video! This channel is awesome!
You mention that you're leaning towards the Bayesian approach to statistical data analysis, which is cool because it's also the case for me! I noticed you don't have a video on Bayesian statistics on your channel. Are you planning on doing one? And if so, are you going to give a shot at explaining the frequentist-Bayesian divide? It seems to me like a hot topic rn and I'm still looking for good ways to popularize the issue.
Love your channel!
Yeah, thanks for watching! Part of what makes it hard is that I’m still sorting out how to include stuff like MCMC and Stan when we move past stuff like conjugate priors. Maybe I’m overcomplicating it lol. And yeah, I’ve been brainstorming how to approach a frequentist-Bayesian video as well, it would be a great long form video
@@very-normal I don't think it's possible to overcomplicate MCMC haha. Personally, I like the way McElreath puts it in his book Statistical Rethinking. He basically presents MCMC (and Stan) as a new and improved method of estimating the posterior distribution, the former methods being the analytical approach (working out the math = hard), grid approximation (computer intensive) and quadratic approximation (limited to normally distributed posteriors). This might be an oversimplification however, I'm new to this stuff.
If the null hypothesis ends up not being rejected (and not supported either), what is our next step to obtain a satisfying conclusion?
Failing to reject the null hypothesis usually is the conclusion of that experiment. You’d probably follow up with another experiment based on what you’ve learned from the previous one.
Hey, thanks for the video. Will you also go into Bayesian or Monte Carlo simulation-based things?
Yesss, Monte Carlo sooner than Bayesian, but both are in the works
@@very-normal Nice. Again. Thank you for this video. This was quite a good explanation of where all those formulas come from. I was always just provided the formula to calculate the t-statistic but never knew why I had to use this specific formula, this video explained it really well
At 10:27, shouldn't the sample size start from n=2 since the df equal n-1?
Yeah you’re right, but I needed the change to be more visible so I played around with which df I progressed through
@@very-normal So you actually start from n=1, which means df=0? I think that is ill-defined for Student's t-distribution.
Amazing video, but i would like to see some of the calculations
What kind of calculation did you have in mind?
In your calculation of the t.statistic you calculate it as t = (sample.mean - null.mean) / (sqrt(sample.variance /n)) . I thought the standard error is calculated as sample variance / sqrt(n) ?
By the central limit theorem, the variance of the sample mean is sigma^2 / n, or the population variance divided by sample size. So the resulting standard deviation/error is sigma / sqrt(n), and that’s what will appear in the denominator. Then we sub in the estimate for sigma there, which gives sqrt(sample.variance / n).
If you multiply a normal distribution by a constant “a”, then the variance is scaled by “a^2”. So we have to scale by the standard deviation instead of the variance
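Written out, the scaling argument looks like this. It is a short sketch that assumes the observations X_1, ..., X_n are independent with a common variance sigma^2, and uses s^2 for the sample variance.

```latex
\begin{align*}
\operatorname{Var}(aX) &= a^2 \operatorname{Var}(X) \\
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\Bigl(\tfrac{1}{n}\sum_{i=1}^{n} X_i\Bigr)
   = \tfrac{1}{n^2} \cdot n\sigma^2
   = \tfrac{\sigma^2}{n} \\
\operatorname{SE}(\bar{X})
  &= \sqrt{\tfrac{\sigma^2}{n}}
   = \tfrac{\sigma}{\sqrt{n}}
   \;\approx\; \tfrac{s}{\sqrt{n}}
   = \sqrt{\tfrac{s^2}{n}}
\end{align*}
```

So sqrt(sample.variance / n) and (sample standard deviation) / sqrt(n) are the same number. Dividing the sample variance itself by sqrt(n) would leave the denominator in squared units, which is why it does not standardize the difference in the numerator.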
And another question. In your video on the normal dist., you mentioned that the mean and variance of the normal are independent and that this is important for the t-dist., but I don't think that you explicitly discussed this here.
So why does it matter and what would change if they were dependent?
I’m happy that you’re keeping past material in mind! That’s what I want people to do.
I had originally intended to mention this again in the next video just to shorten this one. In short, if the sample mean and sample variance are dependent in some way, it suggests there may be some covariance between them. My educated guess is that this will contribute to higher variance in the resulting distribution, more than what is predicted by CLT. This will result in more faulty decisions downstream (type-I and type-II errors). I’m sure the sampling distribution will still look bell shaped, but would have even fatter tails than the t-distribution. This could be checked via simulation, could be interesting to look at
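A first step toward the simulation mentioned above could look like the sketch below. It only checks whether the sample mean and sample variance are correlated across repeated samples; it does not test the guess about fatter tails. The sample size, number of repetitions, seed, and the choice of an exponential distribution as the skewed comparison are all assumptions made for illustration. Zero correlation is also weaker than independence, so this is a quick diagnostic rather than a proof.

```python
import math
import random
from statistics import fmean, variance

def pearson_correlation(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def mean_variance_correlation(draw, n=30, reps=4000, seed=2):
    """Correlation between sample mean and sample variance over repeated samples.

    draw(rng) returns one observation; n, reps and seed are arbitrary choices.
    """
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(fmean(sample))
        variances.append(variance(sample))
    return pearson_correlation(means, variances)

normal_corr = mean_variance_correlation(lambda rng: rng.gauss(0.0, 1.0))
skewed_corr = mean_variance_correlation(lambda rng: rng.expovariate(1.0))
print(round(normal_corr, 3), round(skewed_corr, 3))
```

For normal data the correlation should hover near zero, while for the skewed data it should be clearly positive, since samples with a large mean also tend to contain the large values that inflate the variance.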
@@very-normal Thanks for the reply. That sounds quite right. I love that you try to connect intuition with the math. Sadly not many teachers are able to do this.
Does this mean that hypothesis testing must always be performed with just means? Or maybe if I want to do hypothesis testing for a thing that isn't the mean I should use another test?
I ask because the t-statistical formula seems to only work with means, the sample mean - population mean, so that's confusing for me at the moment.
Not always, but for the most common research problems, it’s with a mean or a difference in means. If not the mean, some other value that represents “typicalness” like a median can be used, but it involves a different hypothesis test.
If you can tell me what you’d want to test that’s not a mean, I can try to give some more specific advice as well
@@very-normal Thanks for answering, it's clearer now. I don't have a concrete example at the moment. I was just curious if working with parameters other than the mean was possible
Yeah for sure. If it helps any, a good example of a test not using the mean is ANOVA, the analysis of variance.
Weirdly enough, it is still a test for comparing means, but the test statistic itself for ANOVA is based on variances. There are also hypothesis tests for variance as well, but it is rare that people want to study the variance itself. I’d say they are more so used as supporting evidence to justify using certain tests.
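To make the "based on variances" point concrete, here is a small sketch of the one-way ANOVA F-statistic, computed as the between-group mean square divided by the within-group mean square. The three groups are made-up numbers chosen so the arithmetic is easy to follow by hand.

```python
from statistics import fmean

def anova_f_statistic(groups):
    """One-way ANOVA F-statistic: between-group variance over within-group variance."""
    all_values = [x for group in groups for x in group]
    grand_mean = fmean(all_values)
    k = len(groups)              # number of groups
    n_total = len(all_values)    # total number of observations

    # How far each group mean sits from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # How far each observation sits from its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # made-up data
f_statistic = anova_f_statistic(groups)
print(f_statistic)  # 3.0
```

Here the between-group sum of squares is 6 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, so the ratio is 3 / 1 = 3. The group means drive the numerator, which is how a ratio of variances ends up being a test about means.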
8:13 But usually I don't know the pop. mean and sd, so I guess in the real world one would use the sample mean/sd?
Nvm. explained like 10 seconds later
the only thing I didn't understand is the 0.05
I mean, why did we choose it? What is special about 0.05 exactly?
Is it used for all problems, or can it be calculated to fit the situation better?
Using one constant assumption for all situations doesn't make sense
I hope someone can answer my question
Nothing is inherently special about 5%, it just happens to be pretty small. It’s often chosen based on how safe or dangerous a Type-I error is. If a Type-I error is safe, then we might even tolerate a higher rate like 10%. But if a Type-I error is really dangerous and we want to decrease the chance of one happening, we could go for something like 1%.
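One way to see what the chosen level controls is to simulate many experiments in which the null hypothesis is true and count how often the test rejects anyway. The sketch below swaps in a z-test with a known standard deviation instead of the t-test from the video, purely so that it runs on the Python standard library; the sample size, null mean, standard deviation, repetition count, and seed are all assumed values.

```python
import math
import random
from statistics import NormalDist, fmean

def simulate_null_z_statistics(n=30, mu0=50.0, sigma=10.0, reps=5000, seed=5):
    """z-statistics from datasets generated with the null hypothesis true."""
    rng = random.Random(seed)
    standard_error = sigma / math.sqrt(n)
    z_statistics = []
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z_statistics.append((fmean(sample) - mu0) / standard_error)
    return z_statistics

def rejection_rate(z_statistics, alpha):
    """Share of simulated experiments where a two-sided test rejects at level alpha."""
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = sum(1 for z in z_statistics if abs(z) > critical_value)
    return rejections / len(z_statistics)

z_statistics = simulate_null_z_statistics()
type_one_rates = {alpha: rejection_rate(z_statistics, alpha) for alpha in (0.10, 0.05, 0.01)}
for alpha, rate in type_one_rates.items():
    print(alpha, round(rate, 3))
```

Each printed rate should land close to its alpha, which is the sense in which 10%, 5%, or 1% is the Type-I error rate you agree to tolerate before seeing the data.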
I know that the sample variance is the estimator for population variance but if we are testing whether we can reject H0 why don't we use H0 mu in the calculation of the std at the denominator? It's a random thought. I can also imagine the impact as we use H0 mu instead of the sample mean for the sample variance we will get a higher variance as we won't use the value that minimize the deviation in that sample so I guess the tstat will be smaller and the test less likely to pass. Anyway I was wondering if there is a reason related to the tstat distribution maybe being biased if we don't use that and yada yada yada, does it make sense?
That’s a good question. I don’t know the precise answer, but I feel it’s because we’d still like the sample variance to just be calculated from the data. Using the null mean in the calculation would tie the variance to the null hypothesis, but I think we are meant to only use the data in the variance. You’re definitely right that it would probably have downstream effects on the error rates of our decision making
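A bit of algebra supports the intuition in the question. The sketch below uses s^2 for the usual sample variance and introduces s_0^2, a hypothetical name for a variance estimate centered at the null mean mu_0 with divisor n (the divisor is an assumption; other choices change the constants but not the idea).

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \mu_0)^2
  &= \sum_{i=1}^{n} \bigl[(x_i - \bar{x}) + (\bar{x} - \mu_0)\bigr]^2 \\
  &= \sum_{i=1}^{n} (x_i - \bar{x})^2
     + 2(\bar{x} - \mu_0)\sum_{i=1}^{n} (x_i - \bar{x})
     + n(\bar{x} - \mu_0)^2 \\
  &= (n-1)\,s^2 + n(\bar{x} - \mu_0)^2
  \qquad \text{since } \sum_{i=1}^{n} (x_i - \bar{x}) = 0 . \\[6pt]
\text{With } s_0^2 &= \tfrac{1}{n}\sum_{i=1}^{n} (x_i - \mu_0)^2
  \text{ and } t = \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}} : \\
T_0 = \frac{\bar{x} - \mu_0}{\sqrt{s_0^2/n}}
  &= t \,\sqrt{\frac{n}{\,n - 1 + t^2\,}} .
\end{align*}
```

So the sum of squares around the null mean is never smaller than the one around the sample mean, as the question guessed, and the extra term grows exactly when the data disagree with the null. The resulting statistic T_0 is an increasing function of the usual t that can never exceed sqrt(n) in absolute value, so it would not follow a t-distribution with n-1 degrees of freedom and the usual cutoffs would not apply to it directly.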
Great video! I feel like you're supplementing all the topics that I was shaky on when I was in college... The t's "student" test or distribution had so many strange notational decisions that it made it confusing to me thinking that it was a distribution, but at the same time you have to calculate a t_{\text{df}} value with n - 1 DoF...
Your example helped me finally tie together p-values, distinguish between samples & populations, and the code example was great to help me confirm that I was actually following which symbols signified what.
I think I'm still a little fuzzy on where the t's distribution actually comes from. How are we comparing the t's statistic to the t's distribution generated from our null hypothesis? Is the p-value itself that value?
Keep up the awesome content! You're helping demystify statistics for a much wider audience :)
Are there any PDEs that get solved stochastically that implement statistical methods to solve them? I'll try not to pack too much in one comment, but you've got me hooked!
Thanks! That’s what I was hoping people would get out of this series of videos.
I chose to omit the origin of the t-distribution in the video because I felt it was too technical without giving much insight. With some mathematical manipulation, the t-statistic can be shown to be the ratio of two independent random variables: 1) a standard normal and 2) a function of a chi-squared random variable (specifically: the square root of a chi-squared random variable with n-1 degrees of freedom after it has been divided by n-1). Taken together, this ratio produces the t-distribution. Because standardization is a common operation, this ratio also appears very frequently in statistics, so much so that it’s more convenient for us to give it its own common name so that it’s easier to refer to.
And yes you’re right! The p-value comes from seeing where our test statistic (here, the t-statistic) falls within the null distribution (here, a t-distribution with n-1 degrees of freedom). I’ve always felt that these old naming conventions really hurt how we learn about this test.
Hope this helps!
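The ratio described above can be simulated directly, and the same draws can then stand in for the null distribution when computing a p-value. This is a Monte Carlo sketch rather than the exact calculation a statistics package would do: the chi-squared draw is generated as a gamma variable with shape df/2 and scale 2, and the observed t of 2.045, the repetition count, and the seed are hypothetical values picked for illustration.

```python
import math
import random

def simulate_null_t(df, reps=20000, seed=3):
    """Draw from a t-distribution as Z / sqrt(V / df).

    Z is standard normal and V is an independent chi-squared variable with
    df degrees of freedom, generated as Gamma(shape=df/2, scale=2).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(df / 2, 2.0)
        draws.append(z / math.sqrt(v / df))
    return draws

def monte_carlo_p_value(t_observed, null_draws):
    """Two-sided p-value: share of null draws at least as extreme as t_observed."""
    extreme = sum(1 for t in null_draws if abs(t) >= abs(t_observed))
    return extreme / len(null_draws)

null_draws = simulate_null_t(df=29)
p_value = monte_carlo_p_value(2.045, null_draws)  # 2.045 is a hypothetical observed t
print(round(p_value, 3))
```

Since 2.045 is the usual two-sided 5% cutoff for 29 degrees of freedom, the estimate should come out at roughly 0.05. The p-value is just the fraction of the null distribution that sits at least as far from zero as the statistic you observed.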
@@very-normal Thank you for your comprehensive response! Never knew that it was related to the chi-squared distribution...
In school, I had to use statistics to quantify different transport phenomena and thermodynamic experiments. Unfortunately, the t's test didn't make any sense to me during the labs or it'd be covered in ~5 minutes during the pre-lab lecture. So oftentimes I'd just have a mean, SD, and RMS. Curve fitting was a huge thing too that I never quite understood, but at least the libraries are there for that... :)
Do you have plans for a chi-squared video in the future? I'd love to see what you cook up!
Yeahhh, I’ve been dabbling with the more advanced function estimators and it’s tough. I’m glad I don’t have to be the one to implement those models lol
And yeah! My goal with this series is to try to cover most introductory topics in statistics and see how I feel from there, so tests involving chi-squared statistics are definitely in that scope. Thanks again for your continued viewership!
At 7:53, why do you divide sigma^2 by n? This probably has sth. to do with the CLT.
EDIT: It's discussed around minute 9 in the video on the normal distribution
Right! The CLT implies that the variance of the sample mean is the population variance, divided by the sample size. In more technical terms, it’s the variance of the test statistic, not the original data
@@very-normal I am not sure I understand the last sentence regarding the variance. The CLT says that the dist. of xbar will converge to N(mu, sigma^2/n), where mu and sigma^2 are the "true" population mean and variance of the X_i. So do you mean the variance of the distribution of the X_i's by "variance of test statistic"? And thank you for taking the time to answer all my questions.
Ah sorry for the confusion. In this context, the test statistic itself is x-bar, the sample mean. As you mentioned, sigma-squared divided by n is the variance of the distribution of x-bar (aka the test statistic). The variance of the individual X_i’s is actually the population variance, since they are the data. Since x-bar is calculated from the data, it itself is a random variable and therefore also has a distribution. And thankfully, CLT tells us what this distribution is. The parameters of this distribution are related to the population parameters, but are not always equal to them (as seen in the variances). I hope this helps clarify! This is a sticking point for even many of the graduate students I work with
@@very-normal I think I got it. Thanks again :). I guess I will find out if I really understood it when we discuss the two sample t-test
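The distinction between the variance of the data and the variance of x-bar can be checked with a short simulation. In this sketch the population mean of 50, standard deviation of 10 (so sigma^2 = 100), the sample sizes, the repetition count, and the seed are all assumed values.

```python
import random
from statistics import fmean, variance

def variance_of_sample_means(n, mu=50.0, sigma=10.0, reps=5000, seed=4):
    """Variance of x-bar across many repeated samples of size n."""
    rng = random.Random(seed)
    sample_means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_means.append(fmean(sample))
    return variance(sample_means)

sigma_squared = 10.0 ** 2
observed = {n: variance_of_sample_means(n) for n in (10, 30, 100)}
for n, value in observed.items():
    # Compare the simulated variance of x-bar with sigma^2 / n.
    print(n, round(value, 2), round(sigma_squared / n, 2))
```

The individual observations keep a variance of about 100 no matter how many are collected, while the variance of x-bar shrinks like 100 / n, which is the sigma^2 / n in the denominator of the test statistic.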
This video series is really fantastic, thank you! One truly enlightening concept has been the way that you explain the relationship between data, random variables, probability distributions, and statistics, and how all of these concepts are foundational for NHST. One naive clarification question. A difference that I’ve noted between the videos in this series and similar introductory teaching materials*, is that you don’t seem to focus much on the concept of the ‘sampling distribution,’ and the distinction between data derived from a single sample and the frequentist concept of repeated sampling from the population.
Is it accurate to say that the ‘distribution of *the* sample mean’ that you refer to in this video is a ‘distribution of *repeated* sample means’, - same for the distribution of the sample variances? IOW, doesn’t the CLT allow us to approximate the ‘mean of sample means’ rather than the mean of a single sample of n observations? Apologies if this is simply semantics, but historically, when I’ve tried to apply NHST concepts to the lab, as in, I’ve done an experiment one time, e.g. collecting the body weight of 10 mice on high fat diet, and now I want to make an inference about all mice on a high fat diet, I’ve confused these concepts.
(*apologies if this is an unfair/inaccurate characterization of your videos and I’ve missed something, certainly possible!)
Thanks for watching! Hopefully I can clarify a little bit here.
Yes, I think the way you've connected it is correct. When I talk about "the distribution of the sample mean," I'm also implicitly referring to the fact that the sample mean is a random variable and will vary if we collect different datasets. And this idea does extend to the sample variance. It is estimated from data, so it will have its own (sampling) distribution. But just because something is a random variable doesn't mean that its distribution is easy to describe. But in the case of the sample mean, CLT tells us we can approximate the distribution of the sample mean with a Normal distribution. The mean of this distribution ("mean of sample means") is the population-level mean, which is usually the thing you want to know.
It's not quite that CLT lets you "approximate" the population means, but it tells you that the sample mean you actually see is "close" to this population mean. Unfortunately with NHST, you have no way to confirm what the population mean actually is, only what it isn't.
The quantity you'd be interested in is the average body weight of lab mice (presumably on some intervention that will alter it). This is a population-level quantity you want to know. But you can only collect data from a subset of this population. So, the sample average you calculate will be a little different from the population-level average. 10 isn't a lot, but theoretically if you use many more mice, the average weight of your sample will get closer to the population weight you want.
I hope this clarifies a bit more!
@@very-normal Yes it does, thanks so much for the fast and thoughtful reply!