Unbiased Estimators (Why n-1 ???) : Data Science Basics

  • Published 5 Sep 2024

Comments • 88

  • @davidszmul2141
    @davidszmul2141 3 years ago +28

    To be even more practical, I would simply say that:
    - Mean: you only need 1 value to estimate it (the mean is the value itself).
    - Variance: you need at least 2 values to estimate it. The variance estimates the spread between values (the more variance, the more spread out around the mean the data are). It is impossible to capture this spread with only one value.
    For me that is sufficient to explain practically why it is n for the mean and n-1 for the variance.

    • @chonky_ollie
      @chonky_ollie 2 years ago

      Best and shortest example I’ve ever seen. What a gigachad
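
The two-values intuition above is easy to check in code; a minimal Python sketch (the numbers are illustrative, not from the video):

```python
import statistics

# One observation is enough to "estimate" the mean: the mean is the value itself.
print(statistics.mean([4.2]))  # 4.2

# But the sample variance divides by n - 1 = 0, so a single observation
# carries no spread information at all, and the estimate is undefined.
try:
    statistics.variance([4.2])
except statistics.StatisticsError as err:
    print("variance undefined for n = 1:", err)

# With two observations there is exactly one deviation's worth of
# information, and the n - 1 estimate is defined.
print(statistics.variance([4.0, 4.4]))
```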

  • @YusufRaul
    @YusufRaul 3 years ago +41

    Great video, now I understand why I failed that test years ago 😅

  • @jamiewalker329
    @jamiewalker329 3 years ago +18

    How I think about it: suppose you have n data points: x1, x2, x3, x4, ..., xn. We don't really know the population mean, so let's just pick the data point on our list that is closest to the sample mean and use it to approximate the population mean. Say this is xi.
    We can then code the data by subtracting xi from each element - this doesn't affect any measure of spread (including the variance). After coding we will have a list x1', x2', ..., xn', where the ith position is 0. Then only the other n-1 data points contribute to the spread around the mean, so we should take the average of those n-1 squared deviations.

    • @gfmsantos
      @gfmsantos 3 years ago +1

      I guess the other n-1 data points will contribute to the spread around zero, not the mean... I got lost.

    • @jamiewalker329
      @jamiewalker329 3 years ago +1

      @@gfmsantos 0 is the mean of the coded data.

    • @gfmsantos
      @gfmsantos 3 years ago +1

      @@jamiewalker329 Yes, but you didn't know the mean before you chose the point. As far as I understood, you've just picked a point that might be close to the sample mean, haven't you?

    • @jamiewalker329
      @jamiewalker329 3 years ago +2

      @@gfmsantos Yes, the sample mean. It's not supposed to be rigorous, just a way of thinking: given any data point as a reference point, there are n-1 independent deviations from that point. One data point gives zero indication of spread. With 2 data points, only the 1 distance between them gives an indication of spread, and so on...

    • @gfmsantos
      @gfmsantos 3 years ago +1

      @@jamiewalker329 I see. Good. Thanks

  • @Matthew-ez4ze
    @Matthew-ez4ze 11 months ago +1

    I am reading a book on Jim Simons, who ran the Medallion fund. I’ve gone down the rabbit hole of Markov chains and this is an excellent tutorial. Thank you.

    • @ritvikmath
      @ritvikmath 11 months ago

      Wonderful!

  • @Physicsnerd1
    @Physicsnerd1 3 years ago +7

    Best explanation I've seen on YouTube. Excellent!

  • @abderrahmaneisntthatenough6905
    @abderrahmaneisntthatenough6905 3 years ago +18

    I wish you would cover all the math related to ML and data science.

  • @699ashi
    @699ashi 3 years ago +2

    I believe this is the best channel I have discovered in a long time. Thanks man.

  • @stelun56
    @stelun56 3 years ago

    The lucidity of this explanation is commendable.

  • @Ni999
    @Ni999 3 years ago +2

    That last blue equation looks more straightforward to me as -
    = [n/(n-1)] [σ² - σ²/n]
    = [σ²n/(n-1)] [1 - 1/n]
    = σ² [(n-1)/(n-1)] = σ²
    ... but that's entirely my problem. :D
    Anyway, great video, well done, many thanks!
    PS - On the job we used to say that σ² came from the whole population, n, but s² comes from n-1 because we lost a degree of freedom when we sampled it. Not accurate but a good way to socialize the explanation.
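
The algebra above can also be confirmed by simulation; a quick Monte Carlo sketch in Python (parameters chosen purely for illustration):

```python
import random

random.seed(0)
n, trials = 5, 100_000
sigma2 = 4.0  # population variance of Normal(0, 2)

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    x = [random.gauss(0, 2) for _ in range(n)]
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    biased_sum += ss / n          # divide by n
    unbiased_sum += ss / (n - 1)  # divide by n - 1

# Dividing by n averages to sigma2*(n-1)/n = 3.2;
# dividing by n - 1 averages to sigma2 = 4.0.
print("divide by n:  ", biased_sum / trials)
print("divide by n-1:", unbiased_sum / trials)
```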

  • @DistortedV12
    @DistortedV12 3 years ago +3

    I watch all your vids in my free time. Thanks for sharing!

    • @venkatnetha8382
      @venkatnetha8382 3 years ago +1

      For a 1,200-page question bank of real-world scenarios to help you think like a data scientist, please visit:
      payhip.com/b/ndY6
      You can download the sample pages to see the quality of the content.

  • @junechu9701
    @junechu9701 1 year ago

    Thanks!! I love the way of saying "boost the variance."

  • @cadence_is_a_penguin
    @cadence_is_a_penguin 1 year ago

    been trying to understand this for weeks now, this video cleared it all up. THANK YOU :))

  • @ChakravarthyDSK
    @ChakravarthyDSK 2 years ago

    Please do one lesson on the concept of ESTIMATORs. It would be good to understand the basics of ESTIMATORs before getting into the concept of being BIASED or not. Anyway, you are doing extremely well and your way of explaining is simply superb. Clap... clap...

  • @subhankarghosh1233
    @subhankarghosh1233 6 months ago

    Marvelous... Loved it...❤

    • @ritvikmath
      @ritvikmath 6 months ago +1

      Thanks a lot 😊

  • @vvalk2vvalk
    @vvalk2vvalk 3 years ago +4

    What about n-2 or n-p? How come the more estimators we have, the more we adjust? How exactly does that translate into the calculation, and what is the logic behind it?

  • @tyronefrielinghaus3467
    @tyronefrielinghaus3467 11 months ago

    Good intuitive explanation... thanks!

  • @kvs123100
    @kvs123100 3 years ago +2

    Thanks for the great explanation! But one question: why minus 1? Why not 2? I know the DoF concept comes in here, but all the explanations I have gone through fix the value of the mean so as to make the last sample not independent.
    But in reality, as we take samples, the mean is not fixed! It is itself dependent on the values of the samples! Then the DoF would be the number of samples itself!

  • @neelabhchoudhary2063
    @neelabhchoudhary2063 9 months ago

    dude. this is amazingly clear

  • @musevanced
    @musevanced 3 years ago +15

    Great video. But does anyone else feel unsatisfied with the intuitive explanation? I've read a better one.
    When calculating the variance, the values we are using are x_i from 1 to n and x_bar. Supposedly, each of these values represents some important information that we want to include in our calculations. But suppose we forget about the value x_n and consider JUST the values x_i from 1 to (n-1) and x_bar. It turns out we actually haven't lost any information!
    This is because we know that x_bar is the average of x_i from 1 to n. We know all the data points except one, and we know the average of ALL of the data points, so we can easily recalculate the value of the lost data point. This logic applies not just to x_n: you can "forget" any individual data point and recalculate it if you know the average. Note that if you forget more than one data point, you can no longer recalculate them and you have indeed lost information. The takeaway is that when you have some values x_i from 1 to n and their average x_bar, exactly one of those values (whether it's x_1 or x_50 or x_n or x_bar) is redundant.
    The point of dividing by (n-1) is because instead of averaging over every data point, we want to average over every piece of new information.
    And finally, what if we were somehow aware of the true population mean, μ, and decided to use μ instead of x_bar in our calculations? In this case, we would divide by n instead of (n-1), as there would be no redundancy in our values.

    • @cuchulainkailen
      @cuchulainkailen 3 years ago +2

      Right. The phraseology is this: the system has only n-1 degrees of freedom when you use xbar... xbar has "taken one away".
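
The redundancy argument above can be made concrete: given x_bar and all but one data point, the missing point is fully determined. A small Python sketch (the values are made up):

```python
x = [3.0, 7.0, 8.0, 2.0, 5.0]
n = len(x)
xbar = sum(x) / n

# "Forget" the last data point; x_bar still pins it down exactly, so the
# set {x_1, ..., x_n, x_bar} carries only n independent pieces of information.
known = x[:-1]
recovered = n * xbar - sum(known)  # n * x_bar minus the known points
print(recovered)  # 5.0 - the forgotten value
```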

  • @user-mg2me7tg6v
    @user-mg2me7tg6v 5 months ago

    The last section is so helpful, thank you!

    • @ritvikmath
      @ritvikmath 5 months ago

      Glad it was helpful!

  • @richardchabu4254
    @richardchabu4254 3 years ago +1

    Well explained, very clear and easy to understand.

  • @yassine20909
    @yassine20909 2 years ago

    Now it makes total sense. Thank you 👏👍

  • @AbrarAhmed-ox2fd
    @AbrarAhmed-ox2fd 3 years ago

    Exactly what I have been looking for.

  • @braineater351
    @braineater351 3 years ago

    I wanted to ask a question. For E(x bar), x bar is calculated using a sample of size n, so is E(x bar) the average value of x bar over all samples of size n? Other than that, I think this has been one of the more informative videos on this topic. Additionally, many times people tie in the concept of degrees of freedom into this, but usually they show why you have n-1 degrees of freedom and then just say "that's why we divide by n-1", I understand why it's n-1 degrees of freedom, but not how that justifies dividing by n-1. I was wondering if you had any input on this?

  • @DonLeKouT
    @DonLeKouT 3 years ago +1

    Try explaining the above ideas using the degrees of freedom.

    • @cuchulainkailen
      @cuchulainkailen 3 years ago

      Correct.

  • @GauravSharma-ui4yd
    @GauravSharma-ui4yd 3 years ago +2

    Amazing...

  • @martinw.9786
    @martinw.9786 2 years ago

    Great explanation! Love your videos.

  • @chinmaybhalerao5062
    @chinmaybhalerao5062 2 years ago

    I guess the second approach to the n-1 explanation is only right when the population and the sample follow the same distribution, which is a very rare case.

  • @Set_Get
    @Set_Get 3 years ago

    Thank you. Could you please do a clip on expected value, its rules, and how to derive some results?

  • @jeffbezos4474
    @jeffbezos4474 2 years ago

    you're hired!

  • @missghani8646
    @missghani8646 2 years ago +2

    This is how we can understand stats - not by just throwing some numbers at students.

  • @nguyenkimquang0201
    @nguyenkimquang0201 1 year ago

    Thank you for great content!!!❤❤❤

    • @ritvikmath
      @ritvikmath 1 year ago

      You are so welcome!

  • @nelsonk1341
    @nelsonk1341 1 year ago

    you are GREAT

  • @chonky_ollie
    @chonky_ollie 2 years ago

    Great video, thanks!

  • @soumikdey1456
    @soumikdey1456 2 years ago

    just wow!

  • @EkShunya
    @EkShunya 1 year ago

    good one

  • @alexandersmith6140
    @alexandersmith6140 10 months ago

    Hi @ritvikmath, I want to understand those derivations in the red brackets. Do you have a good set of sources that will explain to me why those three expected values return their respective formulae?

  • @jingsixu4665
    @jingsixu4665 2 years ago +1

    Thanks for the explanation from this perspective. Can you talk more about why 'n-1'? I remember it has something to do with degrees of freedom, but I never fully understood that when I was learning it.

    • @samtan6304
      @samtan6304 2 years ago +3

      I also had this confusion when I first learned it. Say you have a sample with values 1, 2, 3, and you calculate the sample variance. The numerator will be [(1 - 2)² + (2 - 2)² + (3 - 2)²]. Notice that in this calculation you are implicitly saying the sample mean must be 2, because you are subtracting 2 from every value. Using this implicit information, you will realize that one term in the numerator cannot vary given the other two terms.
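
That constraint is just the fact that deviations from the sample mean always sum to zero; a tiny Python check using the same 1, 2, 3 sample:

```python
x = [1, 2, 3]
xbar = sum(x) / len(x)          # 2.0
devs = [xi - xbar for xi in x]  # [-1.0, 0.0, 1.0]

# The deviations sum to zero, so once n - 1 of them are known the last
# one is forced: only n - 1 deviations can vary freely.
print(sum(devs))                    # 0.0
print(-sum(devs[:-1]) == devs[-1])  # True
```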

  • @yitongchen75
    @yitongchen75 3 years ago +1

    Is that because we lose 1 degree of freedom when we use the estimated mean to calculate the estimated variance?

    • @cuchulainkailen
      @cuchulainkailen 3 years ago

      Correct. It's NOT, as the author states, that the variance is boosted.

  • @mm_ww_2
    @mm_ww_2 3 years ago

    Thanks, great explanation.

  • @plttji2615
    @plttji2615 2 years ago

    Thank you for the video. Can you help me prove whether the estimate in this question is unbiased? Question: comparing the average height of employees at Google with the average height in the United States, do you think it is an unbiased estimate? If not, how do you prove that it is not?

  • @AmineChM21
    @AmineChM21 3 years ago

    Quality video, keep it up!

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago

    Great video, but I'm still not convinced by the intuition. How do you know that the adjustment compensates for the missing tail in sampling? And if so, why not n-2, etc.? I guess if data were missing anywhere, it would be in the tail.

    • @yezenbraick6598
      @yezenbraick6598 2 years ago

      Yes, why not n-2? Jamie Walker's comment explains it in another way - check that out.

  • @pranavjain9799
    @pranavjain9799 1 year ago

    You are awesome

  • @prof.g5140
    @prof.g5140 2 years ago +1

    Incorrect intuition. This is more accurate: ideally the sample mean equals the population mean, but the actual sample mean is rarely ideal and there is some error. If the sample is concentrated on lower values, the sample mean will be lower than the population mean; since the sample mean sits among those low values, the differences between the samples and the sample mean will mostly be smaller than the differences between the samples and the population mean, lowering the sample variance. If the sample is instead concentrated on higher values, the sample mean will be higher than the population mean, and again the differences between the samples and the sample mean will mostly be smaller than the differences from the population mean, again lowering the sample variance. Whether the sample is concentrated on lower or higher values (no concentration at all is unlikely for small sample sizes), the sample variance (using n as the denominator) will probably be lower than the population variance. Therefore, we need to add a correction factor.
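
The mechanism described above is that the sample mean sits closer to the sample than the population mean does - formally, x_bar minimizes the sum of squared deviations. A quick Python check (the distribution parameters are arbitrary):

```python
import random

random.seed(1)
mu = 10.0  # pretend we somehow know the true population mean
sample = [random.gauss(mu, 3) for _ in range(5)]
xbar = sum(sample) / len(sample)

def ss(center):
    """Sum of squared deviations of the sample from a given center."""
    return sum((x - center) ** 2 for x in sample)

# x_bar minimizes the sum of squared deviations, so deviations measured
# from x_bar are never larger in total than deviations from mu - which
# is why dividing by n underestimates the population variance.
print(ss(xbar) <= ss(mu))  # True for every possible sample
```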

  • @jtm1283
    @jtm1283 6 months ago

    Two criticisms (of an otherwise very nice video): 1. all the real work in the proof is done by the formulae in black on the right, for which you provided no explanation; and 2. talking about the sample SD without mentioning degrees of freedom seems incomplete. WRT the latter, just look inside the summation and ask "how many of these are there?" For the mean, there are n different things (the x-sub-i values), so you divide by n. For the sample SD there are n things (the x-sub-i values) minus 1 thing (x-bar), so it's n-1.

  • @BigHotCrispyFry
    @BigHotCrispyFry 3 years ago

    good stuff!

  • @yepitsodex
    @yepitsodex 10 months ago

    The 'we need it to be slightly smaller to make up for it being a sample and not the population' argument isn't needed or realistic. Using n-1 regardless of the sample size would make the 1 completely arbitrary, just a tweak by the smallest amount. In reality, when you go from the population space to the sample space, you lose exactly one degree of freedom. That seems to be why it's n-1 and not n-2 or something else: if you had all of the sample values except one, the value of the last one would be fixed, because the values have to average out to the sample mean. Since it can't be just anything, that is a loss of a degree of freedom, which justifies the use of n-1.

  • @Titurel
    @Titurel 8 months ago

    4:38 You really should give links to the derivation; otherwise it still feels hand-wavy.

  • @asifshikari
    @asifshikari 1 year ago

    Why n-1? Couldn't we adjust even better by doing n-2?

  • @mohammadreza9910
    @mohammadreza9910 7 months ago

    useful

  • @thomaskim5394
    @thomaskim5394 3 years ago +1

    You are still not clear on why we use n-1 instead of n in the sample variance, intuitively.

    • @jamiewalker329
      @jamiewalker329 3 years ago

      See my comment.

    • @thomaskim5394
      @thomaskim5394 3 years ago

      @@jamiewalker329 I have already seen an argument similar to yours.

    • @cuchulainkailen
      @cuchulainkailen 3 years ago

      @@jamiewalker329 It's convoluted. The answer is what I posted: the number of degrees of freedom is reduced to n-1 by the use of xbar.

    • @venkatnetha8382
      @venkatnetha8382 3 years ago

      For a 1,200-page question bank of real-world scenarios to help you think like a data scientist, please visit:
      payhip.com/b/ndY6
      You can download the sample pages to see the quality of the content.

    • @thomaskim5394
      @thomaskim5394 3 years ago +1

      @@venkatnetha8382 What are you talking about?

  • @gianlucalepiscopia3123
    @gianlucalepiscopia3123 3 years ago

    I never understood why "data science" and not "statistics".

  • @tooirrational
    @tooirrational 3 years ago

    Bias is not the factor used to decide the best estimates - it's mean squared error. n-1 is used because the error is low, not because it's unbiased.
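
This claim is worth checking numerically: for normal data the n-1 divisor is the unbiased choice, but it does not minimize mean squared error - dividing by n+1 does. A Monte Carlo sketch in Python (parameters are illustrative):

```python
import random

random.seed(0)
n, trials, sigma2 = 5, 100_000, 1.0
errs = {n - 1: 0.0, n: 0.0, n + 1: 0.0}  # accumulated squared error per divisor

for _ in range(trials):
    x = [random.gauss(0, 1) for _ in range(n)]
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    for d in errs:
        errs[d] += (ss / d - sigma2) ** 2

for d in sorted(errs):
    print(f"divide by {d}: MSE ~ {errs[d] / trials:.3f}")
# The n+1 divisor gives the smallest MSE here, even though only the
# n-1 divisor is unbiased.
```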

  • @rhke6789
    @rhke6789 9 months ago

    Ah, learning is in the details. You just skipped over the "not interesting" steps that permit the logic to flow. Not good. Even mentioning the names of the quoted formulas you used, without explaining them, would be helpful... the variance decomposition formula or the squared-deviation formula.

  • @alexcombei8853
    @alexcombei8853 3 years ago