This actually gave me knowledge.
Yes 🥹😭😭
Well, in math only, at least...
Nice video, a few additional comments:
1) We don't square the deviations to keep them positive (we could use the absolute value for that). We do it because squaring has nice mathematical properties and, quite frankly, that's just how the second moment is defined (shifted by the first moment).
2) Why do we divide by n-1 when calculating the sample stdev but by n when calculating the population stdev? Because calculating the stdev requires knowing the mean. Theoretically, if we knew the population mean, we wouldn't need to divide by n-1 to find the stdev of a sample. In practice, however, to find the average squared deviation we use the sample mean to estimate the population mean. In doing so, we expend one degree of freedom: once the sample mean is fixed, the last data point is determined by the others, so only n-1 data points are free to vary.
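To make point 2 concrete, here is a small exact check. It is a sketch under my own assumptions: a made-up population [1, 2, 3, 4], samples drawn with replacement, and hypothetical helper names. Averaging over every possible sample shows that dividing by n is unbiased when the deviations are taken around the true mean μ, but comes out too small when they are taken around the sample mean.

```python
from fractions import Fraction
from itertools import product

# Made-up toy population; SIGMA2 is its population variance (divide by N).
POPULATION = [1, 2, 3, 4]

def mean(xs):
    return Fraction(sum(xs), len(xs))

MU = mean(POPULATION)
SIGMA2 = sum((x - MU) ** 2 for x in POPULATION) / len(POPULATION)

def average_over_all_samples(n):
    """Exact averages, over every size-n sample drawn with replacement, of
    (1/n) * sum of squared deviations around mu and around the sample mean."""
    samples = list(product(POPULATION, repeat=n))
    around_mu = around_xbar = Fraction(0)
    for s in samples:
        xbar = mean(s)
        around_mu += sum((x - MU) ** 2 for x in s) / n
        around_xbar += sum((x - xbar) ** 2 for x in s) / n
    return around_mu / len(samples), around_xbar / len(samples)

known_mean, sample_mean = average_over_all_samples(3)
print("sigma^2            =", SIGMA2)       # 5/4
print("around true mu     =", known_mean)   # 5/4: no correction needed
print("around sample mean =", sample_mean)  # 5/6: short by a factor (n-1)/n
```

Exact fractions are used so the comparison is not blurred by rounding or random sampling noise.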
Read about Bessel's correction to understand the second point.
So I have studied this chapter, but today I understood this formula to a greater extent.
Staying Hard - David Goggins
Better than my math teacher
The actual reason they are squared is different. By this video's logic, you could just drop the negative sign. The actual reason has to do with Euclidean distance: if you treat the data as a point in high-dimensional space, you're just taking the distance between the data and its average.
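The distance view can be checked directly. This is a sketch on made-up data using Python's `math.dist` (Python 3.8+). The Euclidean distance between the data vector and the vector (m, m, ..., m) is the square root of the sum of squared deviations, so dividing it by √n or √(n-1) gives the population or sample standard deviation.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data
n = len(data)
m = statistics.fmean(data)

# The data as one point in n-dimensional space, the mean as (m, m, ..., m).
dist = math.dist(data, [m] * n)  # sqrt of the sum of squared deviations

print(dist / math.sqrt(n), statistics.pstdev(data))     # population std dev
print(dist / math.sqrt(n - 1), statistics.stdev(data))  # sample std dev
```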
That's not the reason, just an interpretation. There is no reason; this is simply the definition: population variance is the second central moment, and sample variance is the same thing, just rescaled to make it an unbiased estimator.
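Following that definition, here is a sketch. The `central_moment` helper is hypothetical and the data is made up. The second central moment matches `statistics.pvariance`, and rescaling it by n/(n-1) matches the unbiased sample variance from `statistics.variance`.

```python
import statistics

def central_moment(xs, k):
    """Hypothetical helper: the k-th central moment, mean of (x - mean)**k."""
    m = statistics.fmean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data
n = len(data)

m2 = central_moment(data, 2)
print(m2, statistics.pvariance(data))                # population variance
print(m2 * n / (n - 1), statistics.variance(data))   # sample variance
```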
@@ignasa007 In fact it is the reason; you've misinterpreted what I meant by "reason" in this context.
The reason is that it is algebraically easier to work with. In fact, absolute-value distance is a better measure of statistical distance, since it is far more robust to outliers.
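How much more robust depends on which absolute-value measure you mean. This sketch uses made-up data with one hypothetical outlier (100) added. It compares the population standard deviation with the mean absolute deviation and the median absolute deviation; both helpers are my own, not a library API. The standard deviation grows roughly 15x, the mean absolute deviation roughly 12.5x, and the median absolute deviation only 2x.

```python
import statistics

def mean_abs_dev(xs):
    """Average absolute distance from the mean (hypothetical helper)."""
    m = statistics.fmean(xs)
    return statistics.fmean(abs(x - m) for x in xs)

def median_abs_dev(xs):
    """Median absolute distance from the median (hypothetical helper)."""
    med = statistics.median(xs)
    return statistics.median(abs(x - med) for x in xs)

clean = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
dirty = clean + [100]             # same data plus one outlier

for name, f in [("std dev", statistics.pstdev),
                ("mean abs dev", mean_abs_dev),
                ("median abs dev", median_abs_dev)]:
    before, after = f(clean), f(dirty)
    print(f"{name:>15}: {before:.2f} -> {after:.2f} (x{after / before:.1f})")
```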
Yes, but why wouldn't you use the absolute value for distance? Your explanation doesn't address this.
Yeah, if that were the only reason, you could have just taken the square root of each squared number.
This channel is really helping us with math problems.
NOTE!
If you're wondering why variance is labelled s^2, it's because we squared the deviations earlier. If you want just s, that's the "standard deviation", and to get it you simply take the square root of the variance. That's it.
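Here is a minimal sketch of that relationship, using Python's `statistics` module on made-up lengths in centimetres. The variance comes out in cm², and its square root (the standard deviation) is back in cm.

```python
import math
import statistics

lengths_cm = [12.0, 15.0, 11.0, 14.0, 13.0]  # made-up measurements in cm

s2 = statistics.variance(lengths_cm)  # sample variance, in cm^2
s = statistics.stdev(lengths_cm)      # sample standard deviation, in cm

print(f"s^2 = {s2:.2f} cm^2")          # s^2 = 2.50 cm^2
print(f"s   = {s:.2f} cm")             # s   = 1.58 cm
print(math.isclose(s, math.sqrt(s2)))  # True
```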
Commenting for algorithm, love this type of content!!!
You square the difference so that a value far from the average strongly influences the variance.
“We just square it to make sure it’s positive”
Why not take the absolute value instead then?
Because it's like using the distance formula.
It's more complicated to take the absolute value. When you generalise this to continuous values, analysing the absolute value is difficult.
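One concrete reason squares are easier to analyse: the centre c that minimises the sum of squared deviations has a closed form. Setting the derivative to zero gives d/dc Σ(xᵢ - c)² = -2 Σ(xᵢ - c) = 0, so c = x̄, the mean. The sum of absolute deviations has no derivative wherever c equals a data point, and its minimiser is the median. This sketch checks both facts by brute-force search over a grid of candidate centres; the data and the grid step of 0.01 are my own assumptions.

```python
import statistics

data = [1, 2, 3, 4, 100]  # made-up data with one large value

def sum_sq(c):
    return sum((x - c) ** 2 for x in data)

def sum_abs(c):
    return sum(abs(x - c) for x in data)

# Brute-force search over candidate centres 0.00, 0.01, ..., 100.00.
grid = [i / 100 for i in range(0, 10001)]
best_sq = min(grid, key=sum_sq)
best_abs = min(grid, key=sum_abs)

print(best_sq, statistics.mean(data))     # squared loss lands on the mean
print(best_abs, statistics.median(data))  # absolute loss lands on the median
```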
Why (n-1)?
Why don't we just take the mod (absolute value)?
Travis Scott when he's not making brain rot mumble rap:
If the square is to make everything positive, why can't we just take the absolute value of each term?
That is what is used in robust statistics.
You could, it just wouldn’t give the variance.
@@sentheaS I thought s^2 was the variance squared. So is s^2 just the variance itself?
@@notdaycrucial5179 Yes s^2 is the variance. It has units that are square units of whatever you are measuring. To bring it back to s you take the square root and that's what you'd call the standard deviation. It now has the same units as the thing you're measuring.
@@MrTeen-ul7yc oh ok thanks
A/B Hypothesis testing please
I smell underrated.
Please do the population vs. sample standard deviation with the degrees-of-freedom thing.
I'd love to see one about relativity, quantum, or other things in space. I really want to do this but the link is broken and I can't try it either
Damn. AI is insane.
Thanks bro, I never understood statistics, I just straight up slapped on the formulas.
Great content
Stay hard
How the hell is an AI channel teaching better than a paid teacher?
Why divide by N-1 instead of just N ?
It's a bit convoluted, but in small samples just dividing by N gives a wrong estimator (we say it is biased). Dividing by N-1 fixes that. Notice that in sufficiently large samples N-1 is close to N, so it doesn't matter much which one you use.
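The bias has an exact size. Averaged over all samples, the divide-by-N estimate equals (N-1)/N times the population variance. Dividing by N-1 instead multiplies it by N/(N-1) and removes the bias, and the factor (N-1)/N approaches 1 as N grows. This sketch checks that by brute-force enumeration, assuming a made-up population [1, 2, 3, 4] and sampling with replacement (the helper names are hypothetical).

```python
from fractions import Fraction
from itertools import product

POPULATION = [1, 2, 3, 4]  # made-up toy population

def mean(xs):
    return Fraction(sum(xs), len(xs))

SIGMA2 = sum((x - mean(POPULATION)) ** 2 for x in POPULATION) / len(POPULATION)

def expected_divide_by_n(n):
    """Exact average, over every size-n sample drawn with replacement,
    of (1/n) * sum of squared deviations from the sample mean."""
    samples = list(product(POPULATION, repeat=n))
    total = Fraction(0)
    for s in samples:
        xbar = mean(s)
        total += sum((x - xbar) ** 2 for x in s) / n
    return total / len(samples)

for n in range(2, 7):
    ratio = expected_divide_by_n(n) / SIGMA2
    # Dividing by n - 1 instead multiplies the estimate by n / (n - 1).
    corrected = ratio * Fraction(n, n - 1)
    print(f"n={n}: divide by n -> {ratio} of sigma^2, divide by n-1 -> {corrected}")
```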
@@alazrabed I wonder if there's a mathematically constructed argument for choosing "N-1" to "make it work". Otherwise it kinda seems like some sort of "quick fix" to a problem, without much reasoning behind it. But I am just a layman.
@@sergio_henrique Oh yes, there is one, and it's actually quite subtle and profound. The idea is that, given your sample mean, you only need to know all of the sample values except one: you can deduce the last one if you know everything else.
For instance, let's say the mean of a sample of three is 5. If I know two values from my sample, say 4 and 5, then I can deduce that the final one is 6; it has to be, since the three values must add up to 15.
This implies a loss of _degrees of freedom_ for the sample. Just one, and only one. And this is the "minus one" that you find in the denominator.
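The arithmetic in that example can be written out in a few lines. This is a sketch; `deduce_last` is a hypothetical helper, not a library function. The second part shows the same constraint another way: deviations from the sample mean always sum to zero, so only n-1 of them are free.

```python
def deduce_last(sample_mean, n, known):
    """Hypothetical helper: recover the one value not listed in `known`."""
    if len(known) != n - 1:
        raise ValueError("need exactly n - 1 known values")
    # The n values must sum to n * mean, so the last one is whatever is left.
    return n * sample_mean - sum(known)

print(deduce_last(5, 3, [4, 5]))  # 6

# Deviations from the sample mean always cancel out: one linear constraint,
# which is the single degree of freedom that gets used up.
sample = [4, 5, 6]
m = sum(sample) / len(sample)
print(sum(x - m for x in sample))  # 0.0
```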
@@alazrabed Nice! Makes a lot more sense now with this intuition. Thanks for the clear and succinct explanation!👌
cm squared as final units?
Beautiful
Why 'n minus one' in the denominator?
Insane
So hey, why does the Standard Deviation use the absolute bracket instead??
I don't know anything about absolute brackets. Standard deviation is just the square root of the variance. It's really useful because it has the same dimension (units) as the sample measurements you made.
@@alazrabed I meant "mean deviation", not standard deviation, sorry.
@@renr4502 Oh okay. So here again the absolute value gets rid of the negative values, but it won't measure the same thing. With the variance, because we square the difference from the mean, values that are really far from the mean weigh heavily in the final result. This is less true for the "mean deviation". I'm not sure where you would use that measure, though.
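Here is a quick illustration of that weighting, using made-up data where one value (15) sits far from the rest. The farthest value supplies half of the total absolute deviation but about 78% of the total squared deviation.

```python
data = [4, 5, 5, 6, 15]    # made-up data; 15 is far from the rest
m = sum(data) / len(data)  # 7.0

abs_devs = [abs(x - m) for x in data]  # 3, 2, 2, 1, 8
sq_devs = [(x - m) ** 2 for x in data] # 9, 4, 4, 1, 64

# Share of the total contributed by the single farthest value.
print(max(abs_devs) / sum(abs_devs))  # 0.5 under absolute deviations
print(max(sq_devs) / sum(sq_devs))    # about 0.78 under squared deviations
```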
@@alazrabed Yes, exactly, thank you for the explanation. I think I now grasp both variance and standard deviation.
But a year ago, in my last year of high school, I learned statistics, and they taught me about the mean deviation.
I searched on Google and found that both standard deviation and mean deviation are used to measure volatility (frequently used to assess investment risk, since higher volatility means bigger fluctuations). It also said that mean deviation is an alternative to standard deviation but is used less often, and that standard deviation is preferred when there are large outliers because it registers higher levels of dispersion (deviation from the center) than mean absolute deviation does. So, I guess, standard deviation is superior.
I'm not really sure why they taught us the mean deviation in the first place. Maybe it's because it's more practical to compute without a calculator.
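The "registers higher levels of dispersion" remark matches a general fact. The population standard deviation is the root-mean-square of the deviations, and a root-mean-square is never smaller than the mean of the absolute values. So σ ≥ mean absolute deviation for any dataset, with equality only when every deviation has the same size. This sketch checks that on a few randomly generated datasets; it assumes Python's `statistics` and `random` modules and a hypothetical `mean_abs_dev` helper, and the data is synthetic.

```python
import random
import statistics

def mean_abs_dev(xs):
    """Hypothetical helper: average absolute distance from the mean."""
    m = statistics.fmean(xs)
    return statistics.fmean(abs(x - m) for x in xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
datasets = [
    # Synthetic data: 20 normal draws plus one value from a much wider range.
    [rng.gauss(0, 1) for _ in range(20)] + [rng.uniform(0, 50)]
    for _ in range(5)
]

for xs in datasets:
    sd, mad = statistics.pstdev(xs), mean_abs_dev(xs)
    print(f"std dev = {sd:.3f}  mean abs dev = {mad:.3f}  ratio = {sd / mad:.2f}")
```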
This is what I invest my dollars in Nvidia for
fuck this is genius