“The expected difference between the “correct” formula of the variance and the “wrong” one with n and the sample mean, is equal to the variance of the sample mean!”
This one sentence untied a knot! Thank you very much. This video is by far, and I watched various explanation of degrees of freedom etc., the best I've seen.
This is great, I have grappled with this for quite a while.
Thank you SO much for this video. It has been so hard to find a proper explanation of this.
The explanation given at 24:55 why we divide by 2 ("each value is in here twice") does not seem intuitive: If we only added (x_i-x_j)^2 for i < j, i.e., counted each pair only once, we would still have to divide by 2 after dividing by the number of squares.
Not sure if I understand you correctly, but your suggestion is basically to only look at one of the triangles of the matrix without the diagonal. Is that correct? In that case you would have to divide by n(n-1) only and not by 2 to get the same value.
@@statsandscience I'll try and elaborate more:
The sum of all the squares is 812. Divided by n^2 (because that's the number of squares we're considering), that's 16.571… Divided by 2, that's 8.28571… And finally, with Bessel's correction, it's 9.666…
The sum of half of the squares is 406. Divided by n(n-1)/2 (because that's the number of squares we're considering), that's 19.333… So we still have to divide by 2 to get 9.666…
I'm not saying the formula is wrong, just the explanation ("each value is in here twice"). In both cases, you have to divide by the number of squares AND by 2. So the division by 2 is not explained by counting the squares twice.
@statsandscience But the number of squares in one of the triangles is 1+2+3+4+5+6+7=21, not 42.
@@cannot-handle-handles Yes, I deleted my comment because I noticed my mistake before your answer. What you say makes sense; I had not done this test before. Do you have a good intuitive explanation for the 2?
Is it because you basically calculate the means of all pairs of points?
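For anyone following this thread, the bookkeeping above can be checked numerically. This is just a quick sketch with an arbitrary example sample (not the data from the video): dividing the full n×n table of squared differences by 2n² gives the biased variance, while dividing one triangle by n(n-1) gives the Bessel-corrected one.

```python
import itertools

def pairwise_check(x):
    n = len(x)
    mean = sum(x) / n
    biased = sum((v - mean) ** 2 for v in x) / n          # divide by n
    unbiased = sum((v - mean) ** 2 for v in x) / (n - 1)  # Bessel's correction

    # Full n x n table of squared differences: every unordered pair appears
    # twice, plus the zeros on the diagonal.
    full = sum((a - b) ** 2 for a in x for b in x)
    # One triangle only: every unordered pair appears exactly once.
    tri = sum((a - b) ** 2 for a, b in itertools.combinations(x, 2))

    assert abs(full / (2 * n * n) - biased) < 1e-12
    assert abs(tri / (n * (n - 1) / 2) / 2 - unbiased) < 1e-12  # the thread's version
    assert abs(tri / (n * (n - 1)) - unbiased) < 1e-12          # same thing in one step
    return biased, unbiased
```

So the extra 2 survives in the triangle version exactly because n(n-1)/2, the number of squares in the triangle, already absorbed one factor of 2; dividing the triangle sum by n(n-1) directly gives the same result in a single step.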
Thanks for the explanations! I'd been meaning to learn about this for ages, but just hadn't gotten around to it, haha. 😅 Something that might be helpful: if you put times and labels in a list in the description, YouTube will now automatically split the play bar into chapters, as long as the first one is 0:00, so something like:
0:00 - Intro
1:33 - Terminology
4:28 - Estimating the mean or variance
9:20 - Why is the version with n biased?
17:03 - Why does n-1 save it? (explanation 1)
21:56 - Why does n-1 save it? (explanation 2)
28:42 - Summary
That was super helpful, thanks! And extra thanks for providing all the correct time stamps!
The greatest video on this aspect on the internet!
Thanks, glad you liked it!
This was honestly such a great watch, thank you for the video
Thank you!
Great video! Helped me finally understand the derivation of the sample standard deviation
Very enjoyable explanation. Thank you! Greetings from Mexico.
this is an amazing explanation, Thank you so much ! I was so frustrated by the hand wavy explanations on youtube , even in lectures !
Thank you, I really appreciate it!
Thanks for the best video I've ever seen on the n-1. Referring to the key questions at 8:37, I was hoping to find the answer to a more specific question #2: not just "why isn't it n-2 or n-pi" but "why does the correction factor (n/(n-1)) not depend on the ratio between the sample size and the population size?" That is, if I know the population is 1000, and I choose a sample of 10 vs. a sample of 999, why wouldn't I use different correction factors to get the best answer? After all, my sample of 999 is going to be darn close to the true population variance, whereas my sample of 10 is going to be way off. Your video kind of implies, but doesn't say directly (wish it did), that the n-1 "solution" provides the "average" correction factor you might need for any possible sample size relative to the population size - or, to say the same thing another way, that n-1 is the best you can do if you don't actually know the population size. Is that correct? If we DO know the population size exactly, then can we choose a better correction factor that is tailored to that particular sample size : population size ratio?
To make an even clearer statement of the problem...suppose my sample size is always 499. Now suppose that the actual population is either 500 or 1000. So that's 2 cases in total. According to the n-1 rule, I should apply the same correction (499/(499-1)) to 499 samples in a 500 population as I should apply for 499 samples in a 1000 population. That doesn't seem to be the best we can do if we know the actual population size, since I should not need to correct as hard when sample size is very close to population size. So is the n-1 rule designed only for the case where one does not know the population size? If we do know the population size, can we do better? Using what formula?
Sorry that I did not get around to answering this question earlier. You put a lot of effort into this, and I hope you still benefit from an answer!
When you take intro stats classes, a quite basic assumption that lurks basically everywhere is that the population you are dealing with is infinite. Of course, this assumption is also basically always wrong. Usually that does not matter though, as populations are usually "big enough", so that wrong estimates of the actual population size do not influence our outcomes to a degree we would care about. The same is true here in this formula: It is not the "average" correction factor for all possible samples, but the one for an infinite population - again, it usually does not matter what the actual size is, except when the sample size comes close to the population size.
Now you correctly identified that this can cause problems because in this case you actually know a lot more than what the formula is giving you credit for.
What people came up with for this case is the Finite Population Correction (FPC) - I would advise you to just google it and have a look yourself, as the space here is quite limited (of course you can also ask follow-up questions about it here if you like!). However, in a nutshell, this correction does what you pointed out - it prevents you from correcting "too hard".
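For readers curious what the finite-population effect looks like concretely, here is a small Python sketch (the population values are made up for illustration). For simple random sampling without replacement, averaging the usual n-1 sample variance over every possible sample overshoots the population variance (defined with N in the denominator) by exactly N/(N-1), so scaling by (N-1)/N corrects it:

```python
import itertools
import statistics

def check_fpc(population, n):
    """Average the n-1 sample variance over every possible sample of size n
    drawn without replacement, and compare to the population variance."""
    N = len(population)
    mu = sum(population) / N
    pop_var = sum((v - mu) ** 2 for v in population) / N  # denominator N

    samples = itertools.combinations(population, n)
    s2 = [statistics.variance(s) for s in samples]  # n-1 in the denominator
    mean_s2 = sum(s2) / len(s2)

    # mean_s2 overshoots pop_var; the finite-population factor fixes it.
    return mean_s2, mean_s2 * (N - 1) / N, pop_var
```

This is of course only one flavor of the correction; the FPC you will mostly see in textbooks is the sqrt((N-n)/(N-1)) factor applied to the standard error of the mean.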
@@statsandscience Thank you, it is a very useful answer. I didn't know about that assumption, and so when your examples had a population size of 7, I was extra confused. Thanks for clearing it up. That makes it clearer why the correction should be greater when the sample size is smaller. Maybe mention that assumption in your video description to help others in the same boat as me?
Fantastic, I've been wondering this for a long time..
This puzzled me for a long time... Thanks for the explanation... P.S. I always thought it was a spelling mistake in the book(s)
Before I watch, basically the average cuts the variance by a factor of n, but when we find the difference between a sample value and the sample average, the average contains 1/nth of that sample so the calculated variance is shrunk by a factor of (1-1/n).
I am not sure if I understand correctly, but I think there might be something to framing the mean as containing 1/n of the information within the sample... Would you say you were right after watching?
Nice, now it is clear to me.
Great, thank you for the comment!
I always thought the n-1 was related to degree of freedom spent, but actually it isn't!
Well, it is, but you can sort of get around that in an explanation like this one. If you are interested, feel free to watch my video on degrees of freedom. :)
Does the explanation using pairwise differences apply in sampling without replacement where diagonal zeros don't occur?
They would still occur, wouldn't they? Because the margins of the table are identical either way, so there would be zeros on the diagonal.
Sampling without replacement is also a separate issue, as for instance discussed here: stats.stackexchange.com/questions/70124/unbiased-estimator-of-variance-for-samples-without-replacement
@@statsandscience
Thank you for your reply. Zeros occur when we subtract each data point from itself, and this doesn't happen in the case of sampling without replacement.
Thank you so much for this great video. I appreciate it. I have to give you feedback, though, on the quality of the sound. I found it sometimes difficult to hear what you say. I would suggest two things, as I think your understanding of these concepts and your ability to communicate them visually need not go to waste: 1. a better microphone (Shure and Rode are the best and not that expensive) and 2. read slower, pleeaassee. I had to stop multiple times and go back to fully understand what you say. If you think you need to keep your videos below a certain time threshold, then cut unnecessary words from your script, use shorter words, and trim wordy phrases (e.g. use 'most' instead of 'the majority of'). Thanks again for the great effort, keep it going.
Hey, thank you so much for taking the time to give detailed feedback. 1) I am actually using such a microphone, but maybe it wasn't well positioned? I will check that. 2) Thanks, I will try! It is not that I want to shorten the videos, I am just used to talking fast, I guess...
@@statsandscience good luck with your work and thank you from the bottom of my heart, I really do understand why we divide by n-1 now :D .
I did not get how the expected value of (sample mean - population mean) squared, i.e. the variance of the sample mean, is equal to the population variance divided by n, the sample size. Could anyone please explain? 20:00
Thank you very much! Very insightful!
Finally Understood !...
How did he get to the statement made on 20:00 - that var(sample mean) is equal to population variance divided by n?
I brushed over this a bit because it was not the focus here. Intuitively, I think it makes sense that the variance of the sample mean must be smaller than the population variance, and that this depends on n: as I explained, there is no way to get the most extreme observed values as means, and the mean becomes "less extreme" the higher n is. However, I don't know an intuitive explanation for the exact formula, but the reasoning goes like this: You want the variance of the sample mean, that is, the sum of the observations divided by n: Var((obs1+obs2+obs3+...)/n). You can rewrite this as Var((1/n)*obs1 + (1/n)*obs2 + (1/n)*obs3 + ...). For independent observations, a linear combination like this has a variance equal to the sum of the squared factors (in this case 1/n^2) times the variances of the individual components: (1/n^2)*Var(obs1) + (1/n^2)*Var(obs2) + ... When you then assume identical variances for the observations, this equals (1/n^2)*n*Var(obs), which is Var(obs)/n. You can find this formatted a bit more nicely here: online.stat.psu.edu/stat414/lesson/24/24.4
Hope this helps, thank you for the comment!
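If it helps, the Var(obs)/n result can also be seen in a quick simulation. This is just an illustrative sketch with made-up parameters (normal observations with variance 4, samples of size n = 10), not anything from the video:

```python
import random
import statistics

def var_of_sample_mean(pop_var=4.0, n=10, reps=50_000, seed=1):
    """Simulate many sample means and measure their variance.

    The result should come out close to pop_var / n (here 4 / 10 = 0.4).
    """
    rng = random.Random(seed)
    sd = pop_var ** 0.5
    # Draw `reps` samples of size n and record each sample's mean.
    means = [sum(rng.gauss(0, sd) for _ in range(n)) / n for _ in range(reps)]
    return statistics.pvariance(means)
```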
I was wondering too!
Please upload more videos. I’m begging
Thanks, glad you liked it!
What kind of statistics exactly do I need to learn in order to follow along? This looks really interesting, but I don't fully understand how it all works. Thanks!
You will probably find the general concept in any applied statistics textbook. As I said, it is a basic step from descriptive statistics, where you only draw conclusions about a particular sample, to inferential statistics, where you use a sample to draw conclusions about a bigger population - and that is basically what is always needed and taught in applied statistics. The issue is that those books tend to be shallow in that regard, while books with more detail might only be helpful with a serious understanding of the math behind them. That is why I made the video: to bridge between the two.
Let me know if that was what you had in mind!
Thank you for this great video.
I hope you continue uploading more videos.
Do you have a written text for this video?
As a non-native English speaker, I have some difficulty following your speech. I need to replay many parts of the video to catch the words.
Thank you! Yes, I do have that, and I have always wanted to make proper subtitles but just have not gotten to it yet. YouTube auto-generates subtitles, as you probably know, but I don't really like them. I will try to look into that soon and let you know.
@@statsandscience
Thank you for your reply. I will wait for this precious script.
@@الحافظالصغير-ه2ر English subtitles are up now! I hope you will find them helpful
Loved the video. Thanks!!
upload more videos
What software do you use to type math equations and animate them in videos? thanks!
It's honestly just PowerPoint, and I wouldn't recommend it for this; I am sure there are better options out there...
legendary
Thanks!
Thanks for the video, a really good way to explain it!
Frankly, to me it seemed like you were reading from a written text, because your speaking was too constant (no stress on the words, no ups and downs, nothing), and that made it really difficult for me to understand what you were saying.
Thanks! I will try to improve speaking next time!
The explanation for why we divide by 2n² in the second formula is not intuitive to me, despite it working on a small example I tested. I feel there is redundancy in dividing by both 2 and n². If we have two instances of each distance measurement, okay, we can divide by two, reducing the number of distances we're taking into consideration. But why would we then need to also divide by a second n, if dividing by 2 already reduced the number of distances we're considering from n²?
So instead of the sample lying somewhere much lower than the true population mean, what if it's lying much higher? Would it be correct to use n+1 instead of n-1 in order to deliberately make the sample variance smaller?
The main problem is that you don't know that. Remember that we do all this with samples because we do not have access to the population - and this is a problem that happens because of sampling, but not when you can use the population values.
Imagine a student who goes to the school cafeteria every day, and who knows that the staff tends to hand out portions that are too small most days. So they ask for something extra every day (and receive it). This will move the portion size to the optimum most days, but on days where the portion size was correct in the first place or even greater, the request will make it worse. However, this is still better because on most days the size is too small, so the average will be closer to the optimum. Does that help?
@@statsandscience Thank you for your response. It definitely helps but I still have the question of how you would know that the data values from a sample are too small. You cannot infer that it's too large, but why can you infer that it's too small? Shouldn't it go both ways? Maybe naturally, samples tend to gravitate around smaller data values as with the portion size example you gave? If that's the case then it does actually make sense since you'd typically not want to exceed the normal portion size so you don't run out (and this idea of scarcity can be applied to any other examples).
@@se0271 you indeed don't know that for a particular value. It can be too big or too small. It is just more likely that it is smaller. I'm afraid that when I go into more details I would just repeat what I said in the video but when you have specific questions I would be happy to help!
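One way to see the "more likely too small" point concretely: the code below (a hypothetical mini-example, not the video's data) enumerates every possible sample of size n drawn with replacement from a tiny population and averages both formulas over them. Individual samples can over- or undershoot, but on average the n-version undershoots by exactly the factor (n-1)/n, while the n-1 version hits the population variance:

```python
import itertools

def average_estimates(population, n):
    N = len(population)
    mu = sum(population) / N
    pop_var = sum((v - mu) ** 2 for v in population) / N

    biased_total = unbiased_total = 0.0
    count = 0
    # Every ordered sample of size n, drawn with replacement:
    for sample in itertools.product(population, repeat=n):
        m = sum(sample) / n
        ss = sum((v - m) ** 2 for v in sample)
        biased_total += ss / n          # the "wrong" formula with n
        unbiased_total += ss / (n - 1)  # Bessel's correction
        count += 1
    return biased_total / count, unbiased_total / count, pop_var
```

So the correction is not a claim about any single sample; like the cafeteria example, it fixes the average over all samples.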
@@statsandscience I see, I appreciate the explanation- thank you!
Subscribed!
If you permit me, I could add an Arabic translation to your video. If you provide me with the English script, it will facilitate my work.
Yes, that sounds great! I think you should now be able to just download it after I have added the subtitles.
Great video, but would be even greater if you articulated a bit more 😉