Why Sample Variance is Divided by n-1

  • Published on 30 Nov 2024

Comments • 87

  • @bankimdas9517
    @bankimdas9517 3 years ago +40

    Dividing by n underestimates the true variance, so to correct the bias we use (n-1) in the denominator.
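
The underestimation described in this comment can be checked with a small simulation. This is a minimal sketch, not the method used in the video: the normal population with variance 1, the sample size of 5, the trial count, and the function name `mean_variance_estimates` are all assumptions chosen for illustration.

```python
import random
import statistics


def mean_variance_estimates(n=5, trials=20000, seed=0):
    """Average the divide-by-n and divide-by-(n-1) estimates over many samples."""
    rng = random.Random(seed)
    biased_sum = 0.0
    unbiased_sum = 0.0
    for _ in range(trials):
        # Assumed population: normal with mean 0 and variance 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        biased_sum += statistics.pvariance(sample)   # divides by n
        unbiased_sum += statistics.variance(sample)  # divides by n - 1
    return biased_sum / trials, unbiased_sum / trials


biased_avg, unbiased_avg = mean_variance_estimates()
print(f"divide by n:     {biased_avg:.3f}")
print(f"divide by n - 1: {unbiased_avg:.3f}")
```

With a true variance of 1 and n = 5, theory predicts the divide-by-n average to settle near (n-1)/n = 0.8 and the divide-by-(n-1) average near 1.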

  • @abhinandandey_ricky
    @abhinandandey_ricky 2 years ago +41

    n-1 is actually the degrees of freedom.
    Why is it used for the sample s.d. or variance?
    The sample mean is supposed to be equal or close to the population mean, but you are free to choose the sample values randomly.
    So, to keep the sample mean equal to the population mean, you can freely change only (n-1) of the sample values.
    Suppose there are 5 sample values: you can change only (5-1) = 4 of them, because the one value you are not changing is what keeps the sample mean equal or closest to the population mean.
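
One way to make the degrees-of-freedom idea in this comment concrete: deviations from the sample mean always sum to zero, so once n-1 of them are known the last one is forced. The sketch below uses made-up sample values and is only an illustration of that constraint.

```python
import math

sample = [4.0, 7.0, 1.0, 9.0, 4.0]  # hypothetical sample of n = 5 values
mean = sum(sample) / len(sample)
deviations = [x - mean for x in sample]

# Deviations from the sample mean always sum to zero ...
total = sum(deviations)

# ... so the last deviation is fixed by the other n - 1 deviations.
recovered_last = -sum(deviations[:-1])

print(total, deviations[-1], recovered_last)
assert math.isclose(recovered_last, deviations[-1])
```

Only four of the five deviations carry independent information, which is the sense in which the sum of squared deviations has n-1 degrees of freedom.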

    • @kumudsharma007
      @kumudsharma007 2 years ago +7

      This is the correct explanation. Krish has explained it very wrongly. I think he just copied this from someone else without understanding it himself.

    • @tushartiwari7929
      @tushartiwari7929 1 year ago +1

      Can someone pin this to the top?
      This is the correct explanation.

    • @paulaugustus4671
      @paulaugustus4671 1 year ago +1

      This is the correct explanation; it's because of degrees of freedom.

    • @vigneshStack
      @vigneshStack 9 months ago

      Bro, if you don't mind, can you explain it to me?

    • @wyburp7970
      @wyburp7970 7 months ago +1

      @@vigneshStack This question haunted me for the past few weeks. I haven't found one contemporary who actually explained it well (and who therefore, by my standards, understands it). Basically, Fisher used it about 100 years ago and Walker (1940) explained it rather clearly (search "walker degree of freedom"; there's a guy who retranscribed it. His version is rough, but otherwise you have to pay for her article). Fisher wanted to estimate the population mean and variance precisely from a sample. To do so, he imagined an infinite population. In brief, Fisher used N-dimensional geometry for the calculations: he allocated one dimension to the possible values of each sample (1 sample = 1 axis), for a total of "N" dimensions. Every sample could therefore take any value along its own axis. However, if you fix the mean, the samples are stuck moving together around the value of the mean. One value in the sample then depends on the others, which reduces the degrees of freedom by 1 (N-1). In the same way, a train exists on a straight track: you can know its position with only one coordinate even though it also exists in a 2D system (excluding height and time). See Walker's 1940 article. There's also the article by J. L. Rodgers, 2019 (degrees of freedom at the start of the second 100 years), which gives great insight into the manipulation of degrees of freedom, complementary to Walker.
      Otherwise, there's the idea that a sample is always smaller than the population. Therefore, you can never be sure of the population mean estimated from a sample without a) collecting every single member of the population or b) using mathematics to calculate the value that the samples tend toward (the notion of limits). Those formulas explain where the 1/(n-1) comes from.
      TL;DR: The reason why there is a 1/(n-1) in the variance formula has basically been forgotten by the majority of the people who should know it.
      www.studocu.com/row/document/jamaa%D8%A9-alkahr%D8%A9/faculty-of-graduate-studies-for-statistical-research/degrees-of-freedom/43598306?fbclid=IwAR0ujzGHqcqm6DL-eRKbZs1LbcP_X8XdO5KR_8eOxoY8JEKlp7fqJV4xWdg_aem_AZwtXgCL3q6XBDsNS4z_aOCNDQTeGcaIpA76yrROzeoq9KlY7EpBL_R5ZtOcoPVl8GoR6JrgtEd1xQgvHjvUzOfc

  • @Zaheer-r4k
    @Zaheer-r4k 1 year ago +16

    Answer:
    The calculations for the sample standard deviation and the sample variance both contain a little bias (that's the statistics way of saying a systematic "error").
    Bessel’s correction (i.e. subtracting 1 from your sample size) corrects this bias.
    In other words, you’ll usually get a more accurate answer if you use n-1 instead of n.

    • @Callmeflamee
      @Callmeflamee 11 months ago +2

      If both contain errors, then why do we subtract only for the sample and not for the population?

    • @darkclaw12
      @darkclaw12 6 months ago

      @@Callmeflamee "Both" refers to the sample s.d. and the sample variance.

    • @DarkPrincess_M
      @DarkPrincess_M 5 months ago

      Doesn't n-1 overestimate the variance?

    • @calebsteinmetz9471
      @calebsteinmetz9471 3 months ago +1

      @@Callmeflamee Because the mean could be over- or underestimated, but given how variance is calculated it will always be underestimated.

    • @calebsteinmetz9471
      @calebsteinmetz9471 1 month ago

      Actually it isn't always underestimated, but it generally is.

  • @binarystar4947
    @binarystar4947 2 years ago +35

    I came here searching for the answer to this question from your live day 2 - basic to intermediate statistics video and got my answer ✨

    • @talkswithRishabh
      @talkswithRishabh 2 years ago

      ++

    • @user-jk1gb7wm6z
      @user-jk1gb7wm6z 2 years ago +1

      What if we pick all the sample values from the right side? n+1?

    • @ashishvinod2193
      @ashishvinod2193 1 year ago

      @@user-jk1gb7wm6z It gives the same result, because the sample values you chose from the right side don't give a mean close to the population mean; there is a big difference, that's why.
      If n-1 is small, then the sample variance is large, that's why.

  • @bharratkhanna6096
    @bharratkhanna6096 1 year ago +8

    The exact reason is the bias introduced by the sample mean, since it is only an estimate of the population mean. Because the deviations from the sample mean must sum to 0, this constraint restricts the freedom of the data points, so to counter the bias and the constraint we use n-1. Remember that the sample variance is then unbiased, but the sample standard deviation is still biased: the square root is a concave function, which introduces a negative bias, and n-1, being a linear correction, fails to correct the sample standard deviation as well as it corrects the sample variance. Remember too that this is not an experimental result; it is mathematically proven. Bessel's correction (n-1) is not used when the population mean is known.
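
The three claims in this comment (unbiased variance with n-1, a standard deviation that is still biased low, and no correction needed when the population mean is known) can be sketched in one simulation. The normal population with mean 0 and variance 1, the sample size, the trial count, and the function name `simulate_estimators` are assumptions made for illustration.

```python
import math
import random


def simulate_estimators(n=5, trials=20000, seed=1):
    """Return the long-run averages of three estimators of spread."""
    rng = random.Random(seed)
    true_mean = 0.0  # assumed known only for the third estimator
    var_sum = sd_sum = known_mean_sum = 0.0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
        var_sum += s2                # Bessel-corrected variance
        sd_sum += math.sqrt(s2)      # its square root
        # Deviations taken from the true mean, divided by n (no correction).
        known_mean_sum += sum((x - true_mean) ** 2 for x in sample) / n
    return var_sum / trials, sd_sum / trials, known_mean_sum / trials


avg_var, avg_sd, avg_known = simulate_estimators()
print(f"average s^2 (n-1):              {avg_var:.3f}")
print(f"average s (sqrt of the above):  {avg_sd:.3f}")
print(f"average with known mean, / n:   {avg_known:.3f}")
```

Under these assumptions the first and third averages should sit close to the true variance of 1, while the average standard deviation should come out noticeably below 1.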

    • @Touristtt4028
      @Touristtt4028 2 months ago

      I have some doubts regarding this. How can I reach out to you? Could you provide your LinkedIn profile link if that's not a problem?

  • @pratikmanghwani7417
    @pratikmanghwani7417 3 years ago +9

    With all due respect, you should explain why it is unbiased, with the math behind it. Thanks.
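
For readers looking for the math requested here, the standard derivation is short. It is a sketch under the usual assumption, not stated in the video, that X_1, ..., X_n are independent draws from a population with mean \mu and variance \sigma^2, and that \bar{X} is their sample mean.

```latex
% Assumption: X_1, ..., X_n are independent with mean \mu and variance \sigma^2.
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2, \\
E\!\left[\sum_{i=1}^{n} (X_i - \mu)^2\right] &= n\sigma^2,
\qquad
E\!\left[(\bar{X} - \mu)^2\right] = \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}, \\
E\!\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2.
\end{align*}
```

Dividing the sum of squared deviations by n-1 therefore gives an estimator whose expected value is exactly \sigma^2, while dividing by n gives an expected value of (n-1)\sigma^2 / n, which is too small.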

  • @arnabghosh2818
    @arnabghosh2818 3 years ago +5

    Theoretically, in core statistics there is another justification, related to degrees of freedom.

    • @Sean-oh3ph
      @Sean-oh3ph 3 years ago +5

      this is the correct answer

    • @Kumbutranjaami
      @Kumbutranjaami 2 years ago +2

      Right. There is math behind why it's n-1. It's not just the trial-and-error thing explained in this video.

    • @jonpit4342
      @jonpit4342 2 years ago +1

      And the core question is why we divide by the df. No one has explained that.

    • @user-jk1gb7wm6z
      @user-jk1gb7wm6z 2 years ago +1

      What happens if we pick values from the right side? We will get only the higher values... then? n+1?

    • @Kumbutranjaami
      @Kumbutranjaami 2 years ago

      @@user-jk1gb7wm6z It's not about right- or left-side numbers, because the metric is calculated by subtracting the mean from each value. 2 - 1 = 1 and 10001 - 10000 = 1, right? Are you able to follow this math?

  • @iwatchtvwithportal5367
    @iwatchtvwithportal5367 1 year ago +2

    But you failed to explain where that n-1 comes from.

  • @sumangupta871
    @sumangupta871 3 years ago +12

    What happens if someone chooses 5 samples from the right? Why are we not dividing by n+1 in that case?

    • @priyanshujain5286
      @priyanshujain5286 2 years ago +6

      It should still be divided by (n-1); you can try it with an example. The reason is that once you choose 5 samples from the right, their mean also shifts to the right, so the differences between the data points and the mean become small.
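
The "try it with an example" suggestion in this reply can be sketched as follows. The population of the integers 1 to 100 and the choice of its 5 largest values are hypothetical, picked only to mimic "choosing 5 samples from the right".

```python
import statistics

population = list(range(1, 101))      # hypothetical population: 1, 2, ..., 100
right_tail = sorted(population)[-5:]  # the 5 largest values: 96 to 100

xbar = statistics.mean(right_tail)    # the sample mean shifts right as well
sum_sq = sum((x - xbar) ** 2 for x in right_tail)
n = len(right_tail)

population_variance = statistics.pvariance(population)
# The same sum of squares divided by n - 1, n, and n + 1.
estimates = {divisor: sum_sq / divisor for divisor in (n - 1, n, n + 1)}

print("population variance:", population_variance)
for divisor, value in estimates.items():
    print(f"divide by {divisor}: {value:.3f}")
```

Because the five values cluster around their own mean, every estimate here falls far below the population variance, and a larger divisor such as n+1 only shrinks it further. Note that such a hand-picked sample is not random, so no choice of divisor repairs it.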

    • @himanshumaurya4737
      @himanshumaurya4737 1 year ago

      @@priyanshujain5286 exactly

  • @abcefg7045
    @abcefg7045 1 year ago +10

    If we take right-skewed data, will dividing by n-1 create more of a difference?

    • @uditkumar370
      @uditkumar370 2 months ago

      I have the same question. Did you find the answer yet?

  • @snehalhon
    @snehalhon 2 years ago +1

    After watching each of your video lectures, sir, my concepts are getting clearer. No words for you; you are great.

  • @mahammadodj
    @mahammadodj 2 years ago +3

    starts at 2:30

  • @snehaagarwal7640
    @snehaagarwal7640 1 year ago +1

    But what if we take the data on the right side of the population variance and divide by n-1? That would be inaccurate then.

  • @andrew.schaeffer4032
    @andrew.schaeffer4032 1 year ago +1

    Great explanation, thanks. I wish my statistics book talked about this.

  • @modemnaveen6240
    @modemnaveen6240 2 years ago +4

    According to the logic explained in this video, the sample mean and the population mean can differ depending on the samples we pick, right? So why are we underestimating only the variance? Why not the mean? Can someone please explain?

    • @APaleDot
      @APaleDot 2 years ago +4

      The sample mean might be higher or lower than the population mean, depending on what samples you happen to pick. So, in the long run if you average the sample means, you get closer to the population mean. However, when calculating the sample variance, you use the sample mean (because you don't know the population mean) and the samples will always be closer to the sample mean than the population mean, because the sample mean is precisely that value which minimizes the variance for those particular samples. Therefore, the sample variance tends to be lower than the population variance using this method.
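
The key step in this reply, that the sample mean minimizes the sum of squared deviations, can be checked numerically. The sample values and the "true" population mean below are hypothetical; in practice the population mean is unknown.

```python
import math


def sum_sq(values, center):
    """Sum of squared deviations of the values from a chosen center."""
    return sum((x - center) ** 2 for x in values)


sample = [2.0, 3.0, 7.0]   # hypothetical sample
population_mean = 5.0      # hypothetical; normally unknown
xbar = sum(sample) / len(sample)

ss_around_xbar = sum_sq(sample, xbar)
ss_around_mu = sum_sq(sample, population_mean)

# The shortfall is exactly n * (xbar - mu)^2, which is never negative.
gap = len(sample) * (xbar - population_mean) ** 2

print(ss_around_xbar, ss_around_mu, gap)
assert math.isclose(ss_around_mu, ss_around_xbar + gap)
```

Measuring spread around the sample mean can therefore only match or undershoot the spread around the population mean, which is the source of the downward bias.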

    • @HyperDangerousThing
      @HyperDangerousThing 2 years ago

      @@APaleDot *And because the sample variance is way lower than the population variance when the sample mean is used to compute it (with just "n" in the denominator, NOT "n-1"), people found that n-1 in the denominator brings the sample variance closer to the population variance (approximating it), since the resulting value of the variance gets bigger (because you're dividing by a smaller number). But I still don't get the bigger picture: why is this slight approximation so important for statistics in the long run? Why on earth is it so important for interpreting the result later that the tiny growth of the sample variance (through n-1) approximates the population variance?

  • @thevoiceofdarkness7655
    @thevoiceofdarkness7655 1 year ago +2

    This may be a silly question, but why do we assume it is more likely for my sample to be skewed below the mean than above?

  • @sudiptasen634
    @sudiptasen634 2 years ago +3

    Hi Krish, thank you for helping us understand the concept. I have one question.
    When we calculate the population variance or population standard deviation, it is always smaller than the sample variance or sample standard deviation (I tried with the Excel formulas). Is it because of Bessel's correction that the sample std. dev. is always greater than the population std. dev.?
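
The pattern observed in this question follows directly from the two formulas: on the same data, the divide-by-(n-1) result is the divide-by-n result multiplied by n/(n-1). The sketch below uses Python's `statistics` module rather than Excel, and the data values are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(data)

pop_var = statistics.pvariance(data)   # divides by n
samp_var = statistics.variance(data)   # divides by n - 1
ratio = samp_var / pop_var

print(pop_var, samp_var, ratio)
# Same sum of squares, different divisor: the ratio is n / (n - 1).
assert math.isclose(ratio, n / (n - 1))
```

Since n/(n-1) is greater than 1, the sample formula always returns the larger number unless all the values are identical, and the gap shrinks as n grows.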

  • @karthikeyanr1804
    @karthikeyanr1804 2 years ago

    # In simple words:
    Both the sample standard deviation and the sample variance have a slight bias when calculated, so we use n-1 to correct the bias.

  • @keyyyla
    @keyyyla 3 years ago +1

    Simple answer is: because dividing by n-1 makes the estimator unbiased.

    • @Kumbutranjaami
      @Kumbutranjaami 2 years ago

      It's not that easy.

    • @prithvidhyani2002
      @prithvidhyani2002 1 year ago

      This doesn't explain anything. What's the bias? That's what people are asking. And neither the video nor your comment addressed that.

  • @anivesh2225
    @anivesh2225 1 year ago

    Thanks for this, Krish. So to negate the bias of the sample variance w.r.t. the population variance we divide by n-1, as it is a value tested by statisticians, and it is also known as Bessel's correction.

  • @-isotope_k
    @-isotope_k 2 years ago +1

    Thanks !!!

  • @anilkumarsharma8901
    @anilkumarsharma8901 2 years ago

    Does the bell curve have a fixed height?
    Could you get an app made with all the statistics formulas, so that we have a database of all of them?

  • @annamelody5724
    @annamelody5724 10 months ago

    Hi, I have a set of data consisting of these numbers {4, 3, 5, 6, 4, 5, 7, 6, 5, 4}, with mean = 4.9 and variance = 1.4333. My question is: is this variance considered high or low?
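
The figures quoted in this comment can be reproduced with Python's `statistics` module, which also shows that 1.4333 is the divide-by-(n-1) variance. This is only a check of the arithmetic; whether the spread counts as high or low depends on the scale and context of the data.

```python
import statistics

data = [4, 3, 5, 6, 4, 5, 7, 6, 5, 4]  # values from the comment

mean = statistics.mean(data)
sample_var = statistics.variance(data)   # divides by n - 1
pop_var = statistics.pvariance(data)     # divides by n
sample_sd = statistics.stdev(data)       # square root of sample_var

print(f"mean = {mean}")
print(f"sample variance = {sample_var:.4f}")
print(f"population variance = {pop_var:.4f}")
print(f"sample standard deviation = {sample_sd:.4f}")
```

The standard deviation is in the same units as the data, so comparing it with the mean of 4.9 is usually a more readable way to judge the spread than looking at the variance alone.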

  • @harikrishna220
    @harikrishna220 2 years ago +1

    Why can't we use n-1 for the mean and everything else?

  • @me_debankan4178
    @me_debankan4178 2 years ago

    Let's assume a city has a population in which more than 50% are 80 years old, but when sampling or surveying we may get biased data whose sample mean is around 40-50 years. That can cause a problem when analyzing the data, because our biased data shows the mean age of the population to be around 50 years, and the variance is larger because of the squaring. But we can eliminate this problem by dividing the variance by n-1.

    • @me_debankan4178
      @me_debankan4178 2 years ago

      This usually happens when you are surveying a few thousand out of millions of people.

    • @mahiraj8522
      @mahiraj8522 2 years ago

      nice man... thank you

  • @Dilaram123
    @Dilaram123 7 months ago

    Brother, sorry, but you haven't mentioned degrees of freedom, which is the actual statistical reason behind dividing by n-1.

  • @ViolinCineMusic
    @ViolinCineMusic 9 months ago

    super explanation loved it

  • @-Neutron-Star
    @-Neutron-Star 10 months ago

    So they use "n-1" just based on empirical research and not on some hardcore mathematics? Why not use "n+1"?
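
The "why not n+1" question can be explored by comparing the three divisors side by side. The sketch below assumes a normal population with variance 1, a sample size of 5, and a hypothetical helper named `compare_divisors`; it reports both the average estimate (to show bias) and the mean squared error.

```python
import random


def compare_divisors(n=5, trials=20000, seed=2):
    """Average estimate and mean squared error for divisors n-1, n, n+1."""
    rng = random.Random(seed)
    true_variance = 1.0  # assumed population variance
    divisors = (n - 1, n, n + 1)
    totals = {d: 0.0 for d in divisors}
    sq_errors = {d: 0.0 for d in divisors}
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)
        for d in divisors:
            estimate = ss / d
            totals[d] += estimate
            sq_errors[d] += (estimate - true_variance) ** 2
    means = {d: totals[d] / trials for d in divisors}
    mses = {d: sq_errors[d] / trials for d in divisors}
    return means, mses


means, mses = compare_divisors()
for d in means:
    print(f"divide by {d}: mean = {means[d]:.3f}, MSE = {mses[d]:.3f}")
```

Only n-1 should average out to the true variance; n+1 pushes the estimate even lower than n does. For normally distributed data, n+1 is known to give the smallest mean squared error, so it is a legitimate choice when low error matters more than unbiasedness, but it is not an unbiased one.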

  • @file4318
    @file4318 1 year ago

    Thank you very much for your video, it was very very good at explaining. But I have one more question, If descriptive statistics do not try to generalize to a population (since there is no uncertainty in descriptive statistics), then why does the sample standard deviation try to best estimate the population mean? Yet it is still considered a descriptive statistic

  • @KartikKuri-qe6ye
    @KartikKuri-qe6ye 2 years ago

    thank you

  • @abhi-zc8ub
    @abhi-zc8ub 3 years ago +4

    Since morning I was trying to understand this but couldn't find any clear explanation 😅 Finally a video from Krish sir 😌 Thank you so much, sir 😇

  • @Ezio-ft8zm
    @Ezio-ft8zm 11 months ago

    I had gotten irritated with Poonam Kumari, so I landed here before my stats exam! It feels so good to learn a concept that I didn't understand even after listening to it twice in a 45-minute class!

  • @rubayetalam8759
    @rubayetalam8759 1 year ago

    thanks

  • @raj-nq8ke
    @raj-nq8ke 3 years ago

    Thanks.

  • @nithink94
    @nithink94 3 years ago

    Thank you for this video.

  • @vankadavathrohith1589
    @vankadavathrohith1589 9 months ago

    tq so much :)

  • @josephravi7722
    @josephravi7722 1 year ago

    Additional information on how it can be seen from the perspective of degrees of freedom - th-cam.com/video/9ONRMymR2Eg/w-d-xo.html

  • @priyaljain5274
    @priyaljain5274 2 years ago

    This is really helpful Mr Krish 👍😊

  • @aashishmalhotra
    @aashishmalhotra 2 years ago

    got it thanks

  • @KisaanTuber
    @KisaanTuber 3 years ago +1

    well explained

  • @DoingMyIkigai
    @DoingMyIkigai 1 year ago +2

    Wasted 10 minutes of my life; the exact reason is not given.

  • @adityams1659
    @adityams1659 3 years ago +2

    WHICH SOFTWARE IS THAT !??

  • @gunjanagrawal8626
    @gunjanagrawal8626 2 years ago +5

    This could be understood well from the simulation video used by the Khan Academy.

  • @mrrishiraj88
    @mrrishiraj88 3 years ago +1

    Hello Krish

  • @Rana-yc6yt
    @Rana-yc6yt 3 years ago

    I appreciate what you are doing for free, but I would really love a better view of your presentation. Honestly, your handwriting really gets messy when watching the videos.

  • @dipayanbhadra8332
    @dipayanbhadra8332 1 year ago

    "Researchers experimented and saw n-1 gives good estimates" - I am not satisfied with this explanation. If this were an interview question, would the interviewer be happy with this answer? Poor explanation.

  • @tanmaygupta8288
    @tanmaygupta8288 3 months ago

    Sir, you are just going in circles and saying whatever comes to mind in this video 😂

  • @MeghaDeySarkar
    @MeghaDeySarkar 4 months ago

    Best explanation - th-cam.com/video/ke8nSbXUJjQ/w-d-xo.html

  • @kumudsharma007
    @kumudsharma007 2 years ago

    Krish, please remove this video as soon as possible if you want to save your reputation. It is teaching students a completely wrong concept.