Even after 10 years this is still preferred over other videos.
Love you Khan Academy.
Even after many years this will still be preferred over other videos.
Love you Khan Academy.
Even after 13 years
after 14
@@StarlightPoulet
@@mohammedaldhahri6698 even after 15 years
"If you're understanding, frankly, even...like 10% any of this, really, you are a very advanced Statistics student"...ooooh, stop it, Sal! ^_^
Thank you so much for this. Life saver as usual!
This is really not easy to grasp though. I mean you can accept it and memorise it, but understanding is a bit harder
@@mihaililiev5932 for sure, it is always good to remember why things are the way they are (which is sometimes an inappropriate question to ask when it comes to math). But as long as you try to picture a proper population of something (like height), it'll make sense why these formulas are the way they are, and why you would run into some issues when calculating them.
Haha I wish my 400-level stats instructor would agree with you - it isn't that advanced, this is simply saying that variance will change when the mean changes - this can be done by simply changing the sample (getting a new sample). If you don't get a good (random) sample, then yeah, your variance is going to be very unrepresentative of your target population. That's why it's often beneficial to do this a large number of times, with new random samples each time, and then take the average of all the variances.
Thanks Sal, good refresher!
I want to compliment your teaching style. I am an older student in a Nurse Practitioner Program that requires statistics. I am using your videos to complement my studies. Your style is very clear and succinct. You are a very good instructor. Thank you for making these videos, they are helping me tremendously, especially to get over my "math anxiety".
HD in 240p...
It's recorded in HD but not distributed :D
It's very clear, man, what are you talking about?
@@totallynotdavid it's 240p nevertheless
It was HD in 2009
Nine years later I'm like 444 (11/21/23) 💗🥳
I'm randomly rewatching this lecture, after I've now finished all my statistics courses and did very well. Thanks to Sal Khan. I remember it clearly when he said "if you are understanding, frankly, even 10% of this, you are a very advanced statistics student", it made me feel so good about myself, because I got a good grasp of this.
I've been calculating variance and std deviation without ever truly knowing what they meant. This and his std deviation video clear up so much. Even after 10 years, what an invaluable resource.
The funny part is that I see people in his video praising the explanation when it is wrong. Search for my other comment.
I completely agree. I love the way he says "If you understand this, you're a good statistics student", he doesn't even realize what an amazing teacher he is.
this has to be the highest definition of any 240p video
It is because the background is still making it easy to compress. You gain a lot of quality with a still background.
An old teacher of mine always said... "the more you try to simplify science the harder it gets...". You have managed to beat that quote... THANKS!!
I did :)
;)
Stop reading the comments... you're supposed to be studying :)
Cheers and hopefully your intelligence rolls today are better than those of the people highly concerned with video quality as opposed to the material and teaching style. Well done and thank you
WildChildSpeaking Jo Scofield 👏🏻
I went through a lot of statistics, and only today I realised that sample variance has to be measured differently. Oh man, I have to start over again. Good video Mr Khan, probably one of the memorable encounters.......
Am here from 2019 THANKKYYOOUU ❤❤
Checked a couple of videos, no one gave the intuition behind why n-1. Then I found your video Sal, I knew you wouldn't disappoint me and ask me to take it for granted. I had known it before I watched the video. You are my idol Sal :)
Thanks for this, you're more clear than any Cambridge engineering lecturers I've had on this! Great explanation of the (N-1) term...
we never overestimate variance, the variance of a sample is always underestimated, and the reason is that
variance is the dispersion of a data set, which is more or less proportional to its range: the higher the range or dispersion of the data, the higher the variance will be. Now, the range or dispersion of a sample will always be less than that of its population (just think it over for a minute), so the variance of the sample will always be less than that of the population.
The best teacher ever, everything is clear and making good sense from video 1 to the last one. Great job, I will recommend this channel to every student i come across.
By this video I eventually broke my brain.)) It's much harder when it's not your mother tongue, but the information you give is so structured, I really appreciate it. So thankful, even after 13 years. Hope I'll absorb this part as well))
at minute 7:30 you said 'the population of the mean' instead of 'the mean of the population'... but generally speaking your way of explaining is amazing and you're doing a good, noble job!! thank you very much
You're a very good teacher, Sal. I'm here because I'm trying to catch up with the stats class I have in my master's degree program. Embarrassingly, I couldn't catch up without outside help. My professor just reads the slides. Thank you for all of these videos, Sal!
You're not alone! I'm also here to try to teach myself haha. And don't be too embarrassed, especially if your prof is lazy.
Sal, I wanted to thank you again for doing this ... not only for me but for all people. You're bringing complex ideas to the grasp of everyone. What an achievement!
You're second only to Wikipedia.
hdll
After the terrible things that happened to me in high school I never thought that I would one day enjoy watching videos about geeky math things. But I do... maybe I already got old and boring :) Thanks for posting the math videos, I found them very useful!
9:38 I have a stat final tomorrow and still don't understand half of this. You, my friend, have no idea how much this boosted my morale.
Actually, the reason we divide by (n-1) instead of n in the sample variance is degrees of freedom. If there are n data points in a sample, and we know the mean of the sample, only (n-1) values can vary freely (once (n-1) values are fixed, the remaining value is determined by the already known sample mean). Even though what the presenter says is true (we do indeed underestimate if we divide by n), this is the actual reason behind it.
What a great video explaining the meaning of variance. The last stat class I took was more than 10 years ago. Recently my colleague gave me a one-word answer (i.e. variance) when I asked him the reasoning behind a decision. Now I know how to ask more questions, and maybe continue to expect a one-word answer.
You may have just saved my mathematical career; I needed to skip a course and this is just right for helping me out!
Dear Khan Academy Your teaching style is amazing...
Thanks a lot...
Brilliant. I understand better than courses I took at the university
Wow I've never understood why the sample variance was computed by dividing by n-1 instead of n. Now after more than a decade I finally do!
I appreciate Sal so much for doing this, we should all be so thankful.
Thank you very much for saying me as an advanced statistic student - feels good @9:45
You are correct.
Also, there are ways to calculate sufficiently large sample sizes to practically neutralize this problem. There's a reason we don't base public opinion polls on samples of 3 people.
This was super easy to follow. Got 100% of it. However when my Econometrics professor talks I have no idea what she's saying : / Hope I pass the final fingers crossed.
Today I had my first statistics class and so I am watching this video and feel like I understand less than 20% but at 9:40 I learned after only one class I am a very advanced statistics student (understanding more than 10% of this)
Such an informative video. Finally I get to understand statistics outside of my class :)
The visual example of variance was excellent
I am really grateful to khan academy for excellent explanation
awesome proof at the end... basically the probability of choosing a crappy sample that will give you an underestimated mean is far higher than picking a sample that will give you the right mean. Hence the n-1... I just wonder why -1 though, instead of -0.5 or any other arbitrary number? It would be fun to write a computer program to check it through a bunch of simulations. good vid!
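The simulation idea in this comment can be sketched in a few lines of Python. This is only a rough illustration under assumptions that are not from the video: a normal population with mean 0 and variance 4, samples of size 5, and 20,000 trials. The function name `average_variance_estimates` is made up for this example.

```python
import random

def average_variance_estimates(n=5, trials=20000, sigma=2.0, seed=0):
    """Average the divide-by-n and divide-by-(n-1) variance estimates
    over many random samples from an assumed normal population."""
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Sum of squared distances from the sample's own mean
        ss = sum((x - mean) ** 2 for x in sample)
        total_n += ss / n
        total_n_minus_1 += ss / (n - 1)
    return total_n / trials, total_n_minus_1 / trials

biased, unbiased = average_variance_estimates()
print(f"true variance: 4.0, divide by n: {biased:.2f}, divide by n-1: {unbiased:.2f}")
```

With these settings the divide-by-n average should land near (n-1)/n of the true variance (about 3.2 here) and the divide-by-(n-1) average near 4. That also hints at why it is -1 and not -0.5: for independent draws the expected shortfall factor is exactly (n-1)/n.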
Very articulate. I admire you Sal.
I will never go to class again, except maybe for exams, Khan all day!
thanks a lot!!! saved my maths life
Yikes! I wish you would redo this video, on my phone it looks very small, and on my laptop it looks blurry. Ty.
But it's HD :D lol jk :)
Groundbreaking 240p HD
Yes sir ree :D
And I love your videos! Your videos help me a lot! Thank you
I think that is the reason why we perform probability tests and hypothesis tests (under different conditions) after getting a desired output. In the probability test we check the chance of occurrence of the sample we took within the total sample, and in hypothesis testing (one-sample or two-sample tests, t-tests, tests for means and tests for proportions) we check the probability of occurrence of our result in the case where H0 is our result and H1 argues against it, or vice versa. And in the confidence test we might summarize this with greater proof.
@patilnikh it may not help a month late
but when you are dealing with a "population" you simply divide by N
but when there is a "sample" you divide by N-1
you are a life saver khan
Great videos! Concepts are explained clearly and quite easy to follow, this lesson is far better than what I took in college lol.
yayyyyy im an advanced stats student i understood the last part
Great videos! Very helpful and straightforward!You're a great teacher! Thank you so much for sharing your knowledge!
For some reason your voice sounds just like Chris Evans/Captain America to me lol. Also thank you so much! Your videos have been a massive help in studying for my stats finals
THANK YOU!!!!!! You should write a book! It would be better than the one I wasted $100 on
If the values of the points are all greater than the mean, it wouldn't really change anything, because sample variance is based on the distance of the gathered points from their own mean. So the points that are greater can, and most likely still will, have a sample variance smaller than the actual variance of the population. Where the sample points sit relative to the population mean does not have that much impact.
Sal, you are an unbelievably good teacher! Thank you!!
A crisp 240p! glad you upgraded to HD saul (;
Your point that samples tend to underestimate means did fairly well with the example by taking data points that come below the mean.
But what do I do if the points are actually greater than the mean and I don't know it???
Thanx anyways for your thoroughly AWESOME videos...
Certainly there is a chance that our sample contains high values on the number line, but this only results in a larger sample mean, not a larger variance. The sample values are still close to their own average, so their squared distances from it are less than their squared distances from the population average, which means we underestimate the variance, just as we do when our sample contains low values.
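A small numeric check of that point, as a sketch only: the population mean of 20 and the four sample values below are hypothetical numbers chosen for illustration, and `sum_sq_dist` is a helper name invented here.

```python
def sum_sq_dist(values, center):
    """Sum of squared distances of the values from a chosen center."""
    return sum((x - center) ** 2 for x in values)

population_mean = 20.0             # assumed for illustration
sample = [26.0, 28.0, 30.0, 40.0]  # hypothetical sample, all above the population mean
sample_mean = sum(sample) / len(sample)

around_sample_mean = sum_sq_dist(sample, sample_mean)
around_population_mean = sum_sq_dist(sample, population_mean)
print(sample_mean, around_sample_mean, around_population_mean)  # 31.0 116.0 600.0
```

The squared distances around the sample's own mean (116) are smaller than around the population mean (600), even though every point is on the high side. The gap is n times the squared difference between the two means, 4 x 11^2 = 484, so it can never be negative, whichever side the sample lands on.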
Sal, You don't need to make an HD video, Your voice is enough to make videos HD.
feels good to be going over all this, knowing that I won't have to start statistics for over a month and a half!
Sir, please add a real example to this video, so that we can relate to it more and it will be unforgettable...
Once again thank you so much for this video series.
this is like "wow"
Isn't there a chance of overestimating the sample variance as well? How is that taken into account?
my thoughts exactly...
we never overestimate variance, the variance of a sample is always underestimated, and the reason is that
variance is the dispersion of a data set, which is more or less proportional to its range: the higher the range or dispersion of the data, the higher the variance will be. Now, the range or dispersion of a sample will always be less than that of its population (just think it over for a minute), so the variance of the sample will always be less than that of the population. Hope this will be satisfying
Let's say we have a population of 0, 10, 12, 14, 15, 16, 18, 20, 25, 24, 22, 30, 28, 26, 40. We get 0, 10, 20, 30, 40 as the sample. If my calculation is correct, the population variance is 86 whereas the sample variance (dividing by n) is 200. Variance is overestimated here. I am trying to say that if the sample does not represent the population well, the sample variance may overestimate the population variance. How is that dealt with?
Please correct me if I am getting the things wrong.
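The numbers in this example can be checked directly. The snippet below is a minimal sketch; `variance` and its `ddof` parameter are hypothetical names used only here (0 divides by the count, 1 divides by the count minus one).

```python
def variance(values, ddof=0):
    """Sum of squared deviations from the mean, divided by len(values) - ddof."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - ddof)

population = [0, 10, 12, 14, 15, 16, 18, 20, 25, 24, 22, 30, 28, 26, 40]
sample = [0, 10, 20, 30, 40]

pop_var = variance(population)                   # divide by N
sample_var_n = variance(sample)                  # divide by n
sample_var_n_minus_1 = variance(sample, ddof=1)  # divide by n - 1
print(pop_var, sample_var_n, sample_var_n_minus_1)  # 86.0 200.0 250.0
```

So the arithmetic holds: this particular sample overestimates the population variance under either divisor. The n-1 correction is a statement about the average over many random samples, not a guarantee about any single one.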
@@aaiyeeshamostak5096 I am having the same doubt, but in your example, since you included both the extremes of population in your sample, the variance is overestimated, how often can that be the case in real time scenarios while dealing with large data?
@@tejeshvemula6423 yeah, while what Nachiket Belwalkar said does make sense, it's seriously not "ALWAYS" true, but I guess I'm getting it...
🙏🙏🙏🙏🙏🙏 thank you sir
About the error you mentioned, that the variance of the population could be way different than the variance of the sample: you were kind of right (or absolutely right). The thing is, let's say for example you want to measure the height of the population of America. That error could occur only if you pick the men for the sample from the basketball team, like you said in a video! If done so, the mean of the sample as well as the variance would differ from the population's. We need to select the sample carefully in statistics!
great videos sir. you are my Guru!
sum them all up !!!
The n-1 deal is called Bessel's correction. There's a math proof for it on Wikipedia if you search "Bessel's correction"
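For readers who do not want to look it up, the core of that argument can be sketched as follows. The assumptions of this sketch are that the n sample values x_i are independent draws from a population with mean μ and variance σ², and that x̄ is the sample mean; the notation is chosen here and is not taken from the video.

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2 \\
E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2 \\
E\left[\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= \sigma^2
\end{aligned}
```

The second line uses the fact that the sample mean of independent draws has variance σ²/n. Dividing by n instead would give an expected value of (n-1)σ²/n, which is where the underestimate comes from.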
God sent :)
Thank you for the video, it really helps to refresh my knowledge of statistics as I plan to do further studies in Data Science. But I have a doubt: what if the sample data points are taken from the higher side only, in your case the ones on the right side? Wouldn't this make s (squared) larger, i.e. the sample variance would be higher than it should be?
that, actually makes pretty good sense. But if i had read it in a book, i would probably still be wondering...
I have a statistics exam in 10 days :D
that's why sample size is so important.....
Wow, this is a great help. Thanks !
HD... 240p?
lol
ikr it sucks
Raj solanky You would be wrong then.
lool
good enough
hey if you recorded this in HD, as u say, it didn't post that way when uploaded.
THANK YOU! (above all else)
very good explanations of concept !!!!!!!!!!!!!!!!
Thanks a ton......
I still don't really understand what variance as a number tells you, other than that I understood it, great video
Thanks
U just gave me a lifeline man.. :D
10 years old but I love this :3
No no no! The other way to think about it!! I was so captivated... Why you quit on me Sal?? :D
thank you!!
i'm not advanced but i got that thing ur trying to say. I KHAN PASS STATISTICS!
Sal, you made this example so difficult and complicated where it shouldn't be. Thank you
What is the standard deviation of the resolution of the video from standard HD???
Great job!
Very good effort.
What if the sample was taken to the right of the population mean, not to the left? Is it then going to be an overestimate, so that the right way of calculating the sample variance would be dividing by (n+1)? And also, if we have a huge n, tending to infinity, (n-1) will not make a huge difference. Or is this concept not applicable to large numbers?
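On the large-n part of this question, a quick sketch may help. The two estimates differ only by the factor (n-1)/n, so the helper below (a name invented for this illustration) just prints that factor for a few sample sizes.

```python
def shrink_factor(n):
    """Ratio of the divide-by-n estimate to the divide-by-(n-1) estimate."""
    return (n - 1) / n

for n in (2, 5, 10, 100, 10000):
    print(n, shrink_factor(n))
```

The factor is 0.5 at n = 2 and 0.8 at n = 5, and it approaches 1 as n grows, so the correction still applies to large samples but stops making a practical difference. As for the first part: sample variance is measured from the sample's own mean, so which side of the population mean the sample falls on does not flip the direction of the bias.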
What is variance, really? Why is it useful in statistics?
fascinating.
Sal, isn't the entire point of a "sample" to draw a good enough sample to generate a mean that is pretty close to the mean of the population? (Considering you've picked a large enough sample)
If you pick a sample that generates a mean that is on the other end of the number line, wouldn't that simply suggest that we've picked a bad sample? (Garbage in = garbage out, sorta thing)
Correct me if I'm misunderstanding please (anyone, not just Sal)
ye i think u made sense, but at the same time, it's kinda intuitive that even if you pick a random sample, you want it to be good, you want it to be able to justify whatever you are trying to find out. hence, an n-1 thingie to get a better estimate
Thank you, Tilar!
Appreciate the response :)
So there is a chance of underestimating the sample variance if dividing by n. Does dividing by n-1 ensure it will always be an overestimate? Or does it just decrease the chance of underestimation? It would be impressive if it were the former.
Let me be sarcastic.
WOW. THIS VIDEO IS IN HD I CAN CLEARLY SEE WHAT HE WRITES.
the playlist to watch a day before finals.
Let's take a population of 635, and say we are trying to get the variance from a sample of only 60. Say the numerator comes to 400, with the sample size 'n' in the denominator. The sample variance is then 400/60 = 6.667. This sample variance may not be precisely the population variance. Hence, to get a 'conservative' estimate we divide by (n-1) (i.e. 60-1 = 59 in this case). Now we get a sample variance of 400/59 = 6.78. This way, if we err, we err on the positive side.
@thelemur gd luck. i have SAT soon.
It is good to take multiple samples to study a huge population
Sir, we divide by n-1 to correct for underestimates, but what if it overestimates? Do we then divide by more than n?
You are underestimating the variance because you assume the population mean might not be within the sample points. So what if you assume wrong, and then dividing by n-1 overestimates your sample variance?
I know this is late but at 10:50 you say it's population variance? Thought s^2 was sample variance? Just need to confirm if it was just a mistake or if I'm getting it wrong, thank you!
I think I can explain why n-1 is better for variance. If you sample 10% of a population, you have only a 10% chance of getting the highest number, also only 10% chance of the lowest. In fact, you are most likely to get only 1 of the ten highest numbers, one of the ten lowest numbers. A purely random sample might exclude the 7 highest and 4 lowest numbers. Obviously, leaving those out deflates the variance of the sample. So divide by n-1 instead.
wrong
@@new_filler Why?