I find the probability distribution picture in this example confusing because it appears to be treating the true mean as a random variable, which it is not.
This is an old comment, but here's my understanding: this particular sample mean is one of many sample means that could have been measured. Over many, many samples, the sample mean is normally distributed. The t-distribution bit creates a new distribution centred on your specific sample mean and computes a reasonable range of values to the left and to the right for the confidence interval. It is therefore saying that the *sample mean* is distributed, not the underlying population mean. But I think you've hit on something here: the hand-wavy explanation of confidence intervals does seem to imply that there is a 95% probability that any given CI contains the true pop mean. But that's not what CIs are (as he did make a small note about). CIs make no claim about the probability that the true mean lies within a particular range, because that would imply that the true mean is distributed (which, in this form of stats, it is not. It is a fixed number waiting to be discovered, as you've said).
Why does the iq have to be normally distributed? Doesn't the central limit theorem state that the distribution of means will be normally distributed anyway? Great videos btw
The standard formula for the standard error is $$SE = \frac{s}{\sqrt{n}}$$. As per the question, the objective is to reduce the SE by half, i.e. to SE/2. Thus the equation becomes $$\frac{SE}{2} = \frac{s}{\sqrt{n+x}}$$, where n = 20 and x is the number of additional measurements we are looking for. Substituting the first equation into the second and solving for x gives 60. Thus 60 additional measurements will be required to halve the standard error. Thanks. Note: the notation above is LaTeX. If you put it in an RMarkdown document you will see it as normal mathematical notation (provided that LaTeX or TinyTeX has been installed).
I like and understand your presentation. However, I can't seem to figure out what formula was used to calculate Std-Err-of-Percent in the following table:
Gender  Result    Frequency  Weighted-Frequency  Std-Err-of-Wgt-Freq  Percent   Std-Err-of-Percent  95%CI-for-Percent
Male    Negative  1           16.20746           16.20746               7.5605   7.5259               0.0000  22.6253
Male    Positive  4           64.82982           31.56546              30.2419  14.4387               1.3398  59.1441
Male    Total     5           81.03728           34.96896              37.8024  15.9078               5.9594  69.6454
Fem     Negative  3          100.00000           56.73086              46.6482  18.0779              10.4613  82.8350
Fem     Positive  1           33.33333           33.33333              15.5494  14.1520               0.0000  43.8778
Fem     Total     4          133.33333           64.91964              62.1976  15.9078              30.3546  94.0406
Total   Negative  4          116.20746           58.52508              54.2087  17.5366              19.1054  89.3120
Total   Positive  5           98.16316           45.08850              45.7913  17.5366              10.6880  80.8946
Total   Total     9          214.37061           71.16743             100.0000
Hi, I still didn't get why you took 97.5% in the t-distribution part if you only want 95% when calculating the confidence interval for the sample mean?
I know this is old, but I can help, when you look at the distribution, 5% lies outside but it's outside in both what are called tails. The 5% is for both on the positive side and for the negative side. Therefore there is 2.5% in each tail. So that's the reason we need to look at the .975. It seems a bit weird I know, but you are actually looking for the overall piece, which means .025 in either direction. So when you do it for .025 you are getting it for one tail. So now if you do it in the + and the - direction, twice that is the .05 that you were looking for. I hope that helped
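To make the two-tail point concrete, here is a minimal sketch. It uses the standard normal from Python's standard library as a stand-in, because the standard library has no t-distribution; for small samples the t critical value would be somewhat larger. The function name is my own.

```python
from statistics import NormalDist

def two_sided_quantile(confidence: float) -> float:
    """Cumulative probability to look up for a two-sided interval."""
    alpha = 1.0 - confidence   # total area left outside the interval
    return 1.0 - alpha / 2.0   # half of alpha sits in each tail

# For a 95% interval we look up the 0.975 quantile, not the 0.95 one.
q = two_sided_quantile(0.95)

# Normal critical value (an assumption standing in for the t value).
z = NormalDist().inv_cdf(q)
print(round(q, 3), round(z, 2))
```

The same lookup with a t-distribution only changes the last step; the 0.975 comes from splitting the 5% across both tails either way.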
this is a t-test, which takes into account that the sample size is small. go watch a video about it and it may become clearer although it only really clicked with me today lol
I also got 80 at first but that doesn't change n in the sample variance denominator. So I inserted the formula for the standard deviation into the formula for the standard error and moved things around a bit so that the denominator is (n-1)*sqrt(n). Our goal is to halve the standard error, so we must double this denominator. Insert n = 20 and we get approx. 85 as our denominator, and double that is 170. Our question now is what n makes our formula for the denominator equal 170. That's some cumbersome algebra so I put it in online and the result was approx. 31, so we LIKELY need 11 more samples. LIKELY because new samples may shift the numerator, which will shift the mean, which will shift the standard deviation, which will shift the standard error. Let me know what you think, I could be wrong but hoping to be on the right track haha.
I think you are on the right track! I like that you substituted in the formula for standard deviation, which allows you to consider all the places n appears. I think this is what most other commenters are missing. Nice approach! I also did this, and I worked through the math manually to come up with the number 57. (See my comments on this video for details and a simple exposition of the math.)
I did not understand the proportions part. Why does p get to be 0.65? Why does 65 percent of the votes for a specific party result in a p value of 0.65? Could you explain please? I would be thankful. Your videos are superb by the way!
At 12:20, shouldn't you calculate the 95% confidence intervals for n=50 and n=500 using their respective means, and not the mean for n=5? The whole point of these intervals looks wrong to me. At 15:32, "If N gets pretty large, all distributions converge to a normal distribution": you are simply citing the Central Limit Theorem wrong. Try to apply what you're saying to a uniform distribution... The Central Limit Theorem says that the SAMPLING DISTRIBUTION of means (for example) of ANY distribution converges to a normal distribution, not that ANY distribution --> normal !!!
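The uniform-distribution suggestion is easy to try. This is only a sketch, assuming draws from Uniform(0, 1) with a sample size of 50 and 5000 repeated samples (all three choices are mine). The raw draws stay uniform, while the sample means pile up around 0.5 with spread close to sqrt(1/12)/sqrt(50).

```python
import random
import statistics

def sample_means(n, repeats, seed=0):
    """Means of `repeats` independent samples of size n from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n))
            for _ in range(repeats)]

n = 50
means = sample_means(n=n, repeats=5000)

# Theoretical standard error: population SD of Uniform(0, 1) over sqrt(n).
theoretical_se = (1 / 12) ** 0.5 / n ** 0.5

# Share of sample means within 1.96 standard errors of the true mean 0.5;
# a roughly normal sampling distribution puts this near 0.95.
within = sum(abs(m - 0.5) <= 1.96 * theoretical_se for m in means) / len(means)

print(statistics.fmean(means), statistics.stdev(means), theoretical_se, within)
```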
"so heres the formula.. now don´t get too excited."
thanks justin
the impact your videos are having on all these peoples futures is inspiring
SE(x) = s/√n
s/√(n+x) = ½(s/√n)
√(n+x) = 2√n
n+x = 4n
x = 3n
x = 3*20 = 60
You will likely require 60 more measurements, provided the standard deviation remains the same when the sample size increases.
I like how you point out the standard deviation could change! Nice insight. 🥇
why +x?
can someone explain this?
Remember the question is how many more measurements to carry out...so that's why we say (n + x more measurements). Good work
Here's a much simpler solution and the key takeaway:
SE(xbar) = s/√n
½SE(xbar) = ½( s/√n)
Once you are here, you're done. Notice that 1/2 = 1/√4,
therefore,
½SE(xbar) = s/(√n·√4)
½SE(xbar) = s/√(4n)
So, in general, we need 4 times as many observations to cut SE(xbar) in half.
More generally, reducing SE by a factor of x requires x^2 times as many observations:
(1/x) SE(xbar) = s / √(n * x^2)
For the solution to the specific problem, 4n - n = 3n = 60.
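The x^2 rule above can be sketched in a few lines. This assumes, as the comment does, that s stays the same as the sample grows; the function name and the s = 5 in the check are illustrative choices of mine.

```python
import math

def extra_measurements(n_current, reduction_factor):
    """Additional observations needed to shrink SE by `reduction_factor`,
    assuming the sample standard deviation s does not change."""
    n_required = math.ceil(n_current * reduction_factor ** 2)
    return n_required - n_current

# Challenge question: n = 20, halve the SE (factor of 2).
extra = extra_measurements(20, 2)
print(extra)  # 60

# Check against the formula SE = s / sqrt(n) with an arbitrary s.
s = 5.0
se_before = s / math.sqrt(20)
se_after = s / math.sqrt(20 + extra)
print(se_after / se_before)
```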
Seriously man, I can't believe how much of a difference creators like him make. Honestly, they are the reason people like us have the confidence to accomplish things in our fields! You are the reason growth in our societies happens.
(S/root(n))/(S/root(a)) = 2
a/n = 4
a = 4n
if n = 20, a = 80, so 60 more observations.
Excellent
The delivery at 2:06 was absolutely perfect and had me chuckling for a good minute
hahaha, this was just perfect,
Hey Brother,
You are the best statistics tutor on YouTube.
Thank you so much.
Appreciate your good work.
12:19 shows the 500 count interval below the sample mean.
Very, very helpful. I landed here on my journey through Lean Six Sigma education. This series and your other videos helped me sharpen my statistics concepts.
EDIT: this answer is wrong, but in an interesting way. So I’m going to leave it here, with a correction posted below. I have posted a separate answer with what I believe is the correct math for the challenge question.
_________
If you increase the sample size, two things will happen: (1) n will increase, and (2) you’ll be adding random data points, so they will probably end up with a somewhat different mean and standard deviation.
About #1: n appears in the denominator of the formulas for standard deviation *and* SE, so an increase in n is sort of double counted; combining the formulas, you get a denominator of [sqrt(n-1)*sqrt(n)], which is pretty close to just n, especially for larger n. So in short, doubling n should get you close to halving SE.
About #2: The change in mean doesn’t affect SE, but if the variation about the mean changes as new data comes in, it would alter the numerator of standard deviation, which in turn alters the numerator of SE. It’s beyond my skills to know if this is likely to change up or down as n grows, so I’ll just assume that, on average, actual variation about the mean does not change. Thus, all that matters is the increase in n, where a doubling of the sample would approximately half the SE.
This is not correct. Doubling n would only reduce SE by a factor of sqrt(2). Hence you'd need to quadruple n to cut SE in half.
In the interest of helping anyone who reads this in the future, I’ll explain my error here and then post a new comment with the correct answer to the challenge question.
My insight was to look at where n appears in the standard deviation formula. My mistake was not realizing it appears (implicitly) *twice* in that formula. In the standard deviation formula, we see n in the denominator, but it is *also* implicitly in the numerator because the more values you sample, the larger the summation gets. (If you double your sample size, this summation approximately doubles.) So the implicit n in the numerator of the standard deviation and the n in the denominator of the SE approximately cancel out. So we need only use the denominator of the standard deviation to answer this question: it is sqrt(n-1), so we need slightly less than 4x the original sample size
@The Gao Thanks for pushing back. This led me to go and reevaluate the algebra again. You might consider that your response would have been more useful if you had engaged with the math I wrote, which introduced a new idea about the role of n in the _standard_ _deviation_ formula. Just telling me I’m wrong, or just telling me what the correct answer is without engaging with my explanation, doesn’t help me learn. I ended up going back and figuring it out for myself, so no worries there. It just didn’t help me or future readers to avoid the mistake I made.
@@minhtoto1542 Thank you so much! I’m so glad someone found it helpful. 😀
Thanks for a great stats course!
Timestamp 12:45: why is the sample mean 114.7 outside the confidence interval [110.8, 113.1]?
I am new to statistics, so this answer could be wrong, BUT I think a confidence interval is calculated to estimate the population mean. The population mean has to be in that interval; the sample mean doesn't need to be inside it. (Correct me if I am wrong.)
@@codercoder7270 We actually add some value to the sample mean and subtract the same value from the sample mean. So the ceiling and the floor of the interval are the same distance from the sample mean.
Actually, what went wrong is that he forgot to change the sample mean value: 112 is the mean for n=5. If you look, you will find that both limits are equidistant, as I mentioned, from 112, not from the means for n=50 and n=500. So, technically, he forgot to update the sample mean when finding the new intervals.
Sir , lovely presentation!
Loved from India
( SE/(ROOT(20 + x) ) = SE/2 => x = 60 , therefore 60 more. Therefore, Total 80 will make Half the SE
Your first "SE" is wrong; it has to be "S" :) because the definition is: SE(x) = s/(root(n))
Besides the CLT, what about the following:
If n·p-hat(1 - p-hat) ≥ 10, the sample proportion is approximately ND. Additionally, if n > 0.05 of the population means the sample mean is approximately ND.
For the challenge question, Excel did the work. Using the formula SE = sqrt(p-hat(1 - p-hat)/n), I generated a column n = 1 to 100. I then input my data as follows: lock (F4) the cell holding p-hat and, for the cells holding n = 1 to 100, lock (F4) the column only, allowing SE to be calculated for n = 1 to 100.
n = 80 produced SE = 1.4230
n = 20 produced SE = 2.8460
SE = 2.8460 * 0.50 = 1.4230
Required additional n = 80 - 20 = 60
Very informative. I recommend this channel to students who want to learn statistics for data science and machine learning, as almost all the concepts essential for understanding machine learning models are covered.
Thank you and really appreciate your work
80 measurements but on the condition that the standard deviation remains the same with the observations increasing from 20 to 80.
i want to believe that your answer is right. but since he said he is going to make the question difficult I suppose there could be more to it
@@tanmaychandak9958 Found you. Thanks for the channel recommendation. :)
@@11Sentinel I am glad you visited the channel. Happy learning :)
60, not 80. You need 60 MORE measurements to halve the standard error. You already have 20.
Hi Justin, I've made it, but I cannot find the videos on outliers, boutique measures and Range+IQR, which, according to the diagram, should be on the list. I searched your channel but I could not find them. Are they posted yet?
Challenge: S.E(sample mean)/2=sample S.D/√(4n)
so it will be 80 measurements (4*20)
So we need 60 more measurements
🙂thank you for sharing wonderful lectures
You're a God send dude thank you
Hi Justin, I didn't find your video on boutique measures. I would love to go through that one as well, just like the other videos in this series. Love your content and explanations.
It's the last video in the Descriptive Statistics playlist and goes by the name - Correlation and covariance
@@sagarladhwani4287 Thank you Sagar. I'll check it.
Is it possible for the confidence intervals not to contain the sample mean, as shown at 12:39 for n=500? If I understand this correctly, it is saying that when you have a sample mean of 114.7 there is a high likelihood that the population mean is between 110.8 and 113.1. Unless we had other information about the population, this doesn't seem reasonable to me.
U literally saved my life...
Loved it! I am new to statistics and ML and this has been extremely helpful in understanding the basic concepts 🙏
I'd enjoy watching a video about sample size formulas to achieve specific margins of error! Great videos man!!
check khan academy
SE(x) = S/√n
Equation for the 20 samples:
SE(x) = 5/√20
Equation for the required measurements:
½ SE(x) = S/√n
SE(x) = 2S/√n
Therefore, set equation 1 equal to equation 2:
5/√20 = 2S/√n
S·√n = 2S·√20
√n = (2S·√20)/S (the S cancels)
√n = 2√20
Square both sides:
(√n)^2 = (2√20)^2
n = 80
Therefore, you need 60 more measurements if you wish to halve the standard error.
where is 5 from?
@@MDMAx old s size as mentioned in the video. Hope this helps.
God grant you good health!
SE(x_bar)_{0.5} = SE(x_bar)_{1.0}/2 = s/(2*sqrt{n}) = s/(sqrt{4n}) i.e. we double the sqrt of the sample size, n in the denominator →2*sqrt{n}
@zedstatistics Perhaps I've missed something, but if the confidence intervals are the sample mean +/- (SE * t score), how is the 95% confidence interval for n=500 *lower* than the sample mean? The sample mean is 114.7, and the upper 95% confidence limit is 113.1 in the slide (12:46). Should it not be 0.55 (SE) * 1.964729 (t score) + 114.7 = 115.78?
In the slide, we have n = 5 with an even difference +/- from the sample mean (15.8)
with n = 50 we have (6.9 below and 0.3 above)
with n = 500 we have (3.9 below and 1.6 below)
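As a quick check of the arithmetic in this comment, here is a small sketch that rebuilds the n = 500 interval from the numbers quoted above (sample mean 114.7, SE 0.55, t score 1.964729). The function name is my own, and the inputs are taken from the comment rather than recomputed from the video's data.

```python
def confidence_interval(sample_mean, standard_error, t_critical):
    """Symmetric interval: sample mean plus or minus t * SE."""
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin

lower, upper = confidence_interval(114.7, 0.55, 1.964729)
print(round(lower, 2), round(upper, 2))  # 113.62 115.78
```

Built this way the interval is centred on 114.7 by construction, so the sample mean can never fall outside it.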
I don't think the underlying distribution needs to be a normal random variable... the sample mean distribution will always follow a normal distribution due to the central limit theorem... also, the z statistic is used when the population variance is known; otherwise the t statistic is used
This was so incredibly helpful!!!
Did you make some mistakes when computing the confidence intervals for n=50 and n=500? For example, for n=50, x bar equals 115.3 and your confidence interval is [108.5, 115.6]; however, (108.5+115.6)/2 does not equal 115.3. Looking forward to your help, thx
Just spotted that as well. My guess is that he forgot to change the mean and stuck with 112 as the sample mean. The final mean (114.7) isn't even in the confidence interval
Thanks for your explanation, Justin! Very helpful.
Regarding the question in the end:
SE = SD/sqrt(n). Since SD = sqrt(Sum((x'-xi)^2)/(n-1)), it makes SE = (whatever)/sqrt(n(n-1)).
Therefore, to halve SE, one needs to double the denominator, making sqrt(a(a-1)) = 2sqrt(n(n-1)).
Thus:
sqrt(a(a-1)) = 2sqrt(20(20-1))
a(a-1) = 4(20)(20-1)
a^2 - a - 1520 = 0
Solving for positive a, we get roughly 40.
So in order to halve the SE we need to increase n by 20. My best guess.
Amazing!
That's Really Helpful!
The lecture was very helpful.
Can anyone please explain to me the new formula for the standard error of the sample proportion at 13:54? Justin didn't quite go into detail about it, and I'm not able to extrapolate from the similar formula for the standard error of the sample mean that was shown just before. Why is the numerator of the SE of the sample proportion p*(1-p)?
So basically this has something to do with the binomial distribution. When we talk about proportions or situations with two outcomes (yes or no, voting in favor or against), we use the binomial distribution.
For a single yes/no observation, p(1-p) is the variance, so [p(1-p)]^(1/2) is the standard deviation.
Since we know that standard error = standard deviation / n^(1/2), we put this standard deviation directly into our formula, which gives SE = [p(1-p)/n]^(1/2).
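Here is a rough sketch of that substitution. The p-hat of 0.65 echoes the vote share asked about elsewhere in this thread, while n = 100 is purely a made-up sample size; the function name is mine too.

```python
import math

def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)

p_hat, n = 0.65, 100  # n is a hypothetical sample size

# Same thing written as "standard deviation / sqrt(n)".
sd_single_trial = math.sqrt(p_hat * (1.0 - p_hat))
se_via_sd = sd_single_trial / math.sqrt(n)

print(se_proportion(p_hat, n), se_via_sd)
```

Both expressions agree, which is the point of the reply: the proportion formula is the usual s/√n with the yes/no standard deviation plugged in.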
@@Shauracool123 Thanks Shaurya
Very well explained; I finally understand what it means. Thanks.
SE --> 1/2 SE
n --> 4n
20 -- > 80
we need 60 more points for our sample.
Everything else being equal, "n" and "SE" should be the only changing factors in the SE formula (the others remain fixed); this is to find the relationship between SE and n.
let us say my SE = 12, s= 24 , n=20
SE/2 = 6
Then I solve 1/2 (24/sqrt(20)) = s/sqrt(n) for n:
n = 80 x (s^2 / 24^2); everything else being equal, s = 24, so that we can see the impact on "n" of changing the SE.
n= 80 x (s^2/ s^2)
n= 80
Here is one thing I do not understand about the t-distribution: why do we need to assume the population is normally distributed? In effect the t-distribution is the sampling distribution for a given sample size, correct? I have read, and it does make intuitive sense as well, that sampling distributions do not depend on the underlying population distribution. So, if you have, say, a sample size of 30, you will have a t-distribution approaching a normal distribution regardless of whether the population distribution was non-normal, bi-modal, skewed or whatnot. Where am I going wrong here?
Best explanation
Love this explanation, you laid it out so well! However, it is still not clear to me as to why we are using SQRT of n (why SQRT, where does that come from?)
Could anyone clarify? cheers!
For the most part, the standard deviation of your sample (the numerator) doesn't change much regardless of sample size. So thanks to the "magic" of mathematics, using the square root of N makes it easy to systematically reduce your standard error. If you use the square root of N as the denominator, increasing your N by a factor of 4 cuts your SE in half. So this is a quick and easy way to eyeball how large your sample needs to be to reduce your standard error and consequently tighten your confidence intervals. That's at least one good reason for using the square root of N as opposed to something else. For a more detailed response: statisticsbyjim.com/hypothesis-testing/standard-error-mean/
This is because when using inductive method, SE = std dev/ sqrt(n) matches more closely with the standard deviation of deductive method. Since we want to at least make our inductive analysis as close as possible to deductive method, we take this expression of SE. Hope this is helpful.
√n comes in because I think of it as the sqrt of the variance divided by the total number of observations (the mean of the total variance). So taking the sqrt of the mean of the total variance gives you standard deviation / sqrt(n).
Hope this helps.
Sqrt(variance) / sqrt(n)
Why do we multiply the standard error by the t critical value though? Nice video!! thanks!!
At 12:40, when n = 500, the sample mean is not part of the confidence interval? Is this okay?
We need about *57* _more_ measurements, for a total of about *77*
_______________
Here’s why:
The sample size *n* affects the SE formula in 3 places:
(1) It appears in the denominator of the SE.
(2) It appears as “n-1” in the denominator of *s* (the standard deviation, which is the numerator of the SE formula).
(3) It _also_ appears implicitly in the numerator of *s* because the summation there gets larger as the sample size increases, and, assuming the deviations are similar for the new measurements, the summation increases at the same rate as *n*
(1) and (3) cancel. They are both just √n. So we can ignore them.
Therefore we only need (2) to answer the question.
Halving SE will require doubling the denominator of *s*
In this question, the denominator of *s* is
√(n-1) = √(19)
Doubling this:
2 * √(19)
= √(4*19)
= √(76)
= √(77-1)
= √(n-1)
So *n* = 77.
This means we need 77-20=57 _more_ measurements to halve the SE.
(Suggestions for improvement to this answer are welcome! 🙂)
@12:17 how is the upper limit of the 95% confidence interval of the third sample of 500 at 113.1 less than the sample mean of 114.7 ?
I suppose he used a sample mean of 112 for the confidence interval itself since it would be somewhat in the middle of 110.8 and 113.1.
I find the probability distribution picture in this example confusing because it appears to be treating the true mean as a random variable, which it is not.
*bayesians have entered the chat*
This is an old comment, but here's my understanding - this particular sample mean is an example of one of many sample means that could have been measured. Over many many samples, the sample mean is normally distributed. The t-distribution bit creates a new distribution that centres your specific sample mean as its mean, and computes a reasonable range of values to the left and to the right for the confidence interval. It is therefore saying that the *sampling mean* is distributed, not the underlying population mean.
But I think you've hit on something here - the hand-wavy explanation of confidence intervals does seem to imply that there is a 95% probability that any given CI contains the true pop mean. But that's not what CIs are (as he did make a small note about). CIs make no claim about the probability that the true mean lies within a particular range, because that would imply that the true mean is distributed (which, in this form of stats, it is not. It is a fixed number waiting to be discovered, as you've said).
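One way to see what the 95% refers to is to repeat the sampling many times and count how often the interval catches a fixed true mean. A rough sketch: the population values (mean 100, SD 15) and n = 5 are made-up assumptions, and 2.776 is the tabulated t critical value for df = 4.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 100, 15   # hypothetical, fixed population values
N = 5                          # measurements per sample
T_CRIT = 2.776                 # t critical value for a 95% CI with df = 4

def interval_covers_true_mean() -> bool:
    # Draw one sample, build its 95% CI, and check whether it contains the true mean
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    return mean - T_CRIT * se <= TRUE_MEAN <= mean + T_CRIT * se

trials = 4000
coverage = sum(interval_covers_true_mean() for _ in range(trials)) / trials
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The true mean never moves here; only the intervals do. The 95% describes how often the procedure succeeds across repeated samples, not a probability about any single interval.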
Why does the IQ have to be normally distributed? Doesn't the central limit theorem state that the distribution of means will be normally distributed anyway? Great videos btw
Quantitative genetics ...
Is standard error formula also valid for lognormal distribution? Or in other words, is it applicable to different types of distributions?
If $$x = K y^{p}$$, show that the standard error in x is given by $$S_x = \frac{x \, p \, S_y}{y}$$. Please, how do I go about this?
What is Expected incidence (best guess)
The standard formula for the standard error is $$SE = \frac{s}{\sqrt{n}}$$. As per the question, the objective is to reduce the SE by half, i.e. to SE/2. Thus the equation becomes $$\frac{SE}{2} = \frac{s}{\sqrt{n+x}}$$, where n = 20 and x is the number of additional measurements we are looking for. Substituting the values and solving for x gives 60. Thus 60 additional measurements will be required to decrease the standard error by half. Thanks
Note: The notation above is LaTeX. If you put it in an RMarkdown document, you will see it rendered as normal mathematical notation (provided that LaTeX or tinytex has been installed).
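For anyone who wants the intermediate steps, the algebra can be written out in the same notation, assuming as before that s does not change when the extra measurements are added:

$$
\begin{aligned}
\frac{SE}{2} = \frac{1}{2} \cdot \frac{s}{\sqrt{n}} &= \frac{s}{\sqrt{n+x}} \\
\sqrt{n+x} &= 2\sqrt{n} \\
n + x &= 4n \\
x &= 3n = 3 \times 20 = 60
\end{aligned}
$$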
60 more measurements to halve that standard error.
To decrease the standard error by half, n would have to increase by a factor of 4 i.e. you would need 80 measurements. Is that correct?
I got 77 total (so we need 57 more measurements). The math is explained in detail in a separate comment to this video.
I like and understand your presentation. However, I can't seem to figure out, what formula was used to calculate Std-Err-of-Percent in the following table?
Gender  Result    Frequency  Weighted-Frequency  Std-Err-of-Wgt-Freq   Percent  Std-Err-of-Percent  95%CI-for-Percent
Male    Negative          1            16.20746             16.20746    7.5605              7.5259    0.0000   22.6253
Male    Positive          4            64.82982             31.56546   30.2419             14.4387    1.3398   59.1441
Male    Total             5            81.03728             34.96896   37.8024             15.9078    5.9594   69.6454
Fem     Negative          3           100.00000             56.73086   46.6482             18.0779   10.4613   82.8350
Fem     Positive          1            33.33333             33.33333   15.5494             14.1520    0.0000   43.8778
Fem     Total             4           133.33333             64.91964   62.1976             15.9078   30.3546   94.0406
Total   Negative          4           116.20746             58.52508   54.2087             17.5366   19.1054   89.3120
Total   Positive          5            98.16316             45.08850   45.7913             17.5366   10.6880   80.8946
Total   Total             9           214.37061             71.16743  100.0000
Hi, I still don't get why you used 97.5% for the t-distribution part if you only want 95% when calculating the confidence interval for the sample mean?
I know this is old, but I can help, when you look at the distribution, 5% lies outside but it's outside in both what are called tails. The 5% is for both on the positive side and for the negative side. Therefore there is 2.5% in each tail. So that's the reason we need to look at the .975. It seems a bit weird I know, but you are actually looking for the overall piece, which means .025 in either direction. So when you do it for .025 you are getting it for one tail. So now if you do it in the + and the - direction, twice that is the .05 that you were looking for. I hope that helped
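The tail arithmetic can be spelled out in code. This sketch uses the standard normal from Python's `statistics` module as a stand-in, since the standard library has no t-distribution; the two-tail logic is the same, only the critical value differs.

```python
from statistics import NormalDist

confidence = 0.95
alpha = 1 - confidence          # total area outside the interval (0.05)
per_tail = alpha / 2            # split evenly between the two tails (0.025)
upper_quantile = 1 - per_tail   # the value you actually look up (0.975)

# Critical value from the standard normal (a stand-in for the t-distribution)
z = NormalDist().inv_cdf(upper_quantile)

# Area between -z and +z: this is the 95% we wanted in the first place
area_between = NormalDist().cdf(z) - NormalDist().cdf(-z)

print(f"per tail: {per_tail:.3f}, look up: {upper_quantile:.3f}")
print(f"area between -{z:.2f} and +{z:.2f}: {area_between:.3f}")
```

Looking up 0.95 instead would leave 5% in the upper tail alone, so the two-sided interval would only cover 90%.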
Hi, I am kind of lost. I still don't know where the 2.78 comes from or how it is calculated (11:20).
this is a t-test, which takes into account that the sample size is small. go watch a video about it and it may become clearer although it only really clicked with me today lol
You have to look up the t-statistic in a table of values, or calculate it in Excel or other software.
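If no table or spreadsheet is at hand, the value can also be found numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and searches for the 0.975 quantile by bisection. The function names are made up for this example, and df = 4 assumes the 2.78 in the video comes from a sample of 5 measurements.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF for x >= 0: 0.5 plus the area from 0 to x (Simpson's rule, even `steps`)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(p: float, df: int) -> float:
    """Find t with t_cdf(t) = p by bisection (for p > 0.5)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_value = t_critical(0.975, df=4)   # n = 5 measurements -> df = 4 (assumed)
print(round(t_value, 2))
```

In practice a statistics package does this lookup for you; the point here is only that the number is the 97.5th percentile of the t-distribution for the given degrees of freedom.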
I also got 80 at first but that doesn't change n in the sample variance denominator. So I inserted the formula for the standard deviation into the formula for the standard error and moved things around a bit so that the denominator is (n-1)*sqrt(n). Our goal is to halve the standard error, so we must double this denominator.
Insert n = 20 and we get approx. 85 as our denominator, and double that is 170. Our question now is: what is n so that our formula for the denominator equals 170? That's some cumbersome algebra, so I put it in online and the result was approx. 31, so we LIKELY need 11 more samples.
LIKELY because new samples may shift the numerator, which will shift the mean, which will shift the standard deviation, which will shift the standard error.
Let me know what you think, I could be wrong but hoping to be on the right track haha.
I think you are on the right track! I like that you substituted in the formula for standard deviation, which allows you to consider all the places n appears. I think this is what most other commenters are missing. Nice approach!
I also did this, and I worked through the math manually to come up with the number 57. (See my comments on this video for details and a simple exposition of the math.)
hi justin, could u plz redirect me to the video dealing with Boutique Measure.
4 times
i.e total 20x4 = 80
What software is used to make this video? power point?
Prezi PPT
thank you😁😁😁
Multiply your standard error by 4?
You mean multiply the number of scores by 4.
the playlist does not order well...
I went with Hi Fidelity Rules. Started with a bang, took it up a notch then cooled it off.
80 measures are required.
60
I did not understand the proportions part. Why does p gets to be 0.65? Why does 65 percent of votes on a specific party result in a p value of 0.65? Could you explain please? I would be thankful. Your videos are superb by the way!
It’s just 65/100 =0.65
At 12:20, shouldn't you calculate the 95% confidence intervals for n=50 and n=500 using their respective means, and not the mean for n=5? The whole point of these intervals looks wrong to me.
At 15:32: "If N gets pretty large, all distributions converge to a normal distribution" -- you are simply citing the Central Limit Theorem wrong. Try applying what you're saying to a uniform distribution... The Central Limit Theorem says that the SAMPLING DISTRIBUTION of means (for example) of ANY distribution converges to a normal distribution, not that ANY distribution --> normal!!!
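The distinction is easy to check with a quick simulation. This is a sketch with made-up sizes (20,000 raw draws, 10,000 samples of 30): for a normal distribution about 68% of values fall within one standard deviation of the mean, while for a uniform distribution the share is about 58%.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

def share_within_one_sd(values):
    """Fraction of values within one standard deviation of their mean."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - m) <= sd for v in values) / len(values)

# Raw draws from a uniform distribution: still uniform, however many you take
raw = [random.random() for _ in range(20000)]

# Means of many samples of size 30 drawn from that same uniform distribution
means = [sum(random.random() for _ in range(30)) / 30 for _ in range(10000)]

raw_share = share_within_one_sd(raw)
means_share = share_within_one_sd(means)

print(f"raw uniform data within 1 SD: {raw_share:.3f}")
print(f"sample means within 1 SD:     {means_share:.3f}")
```

The raw data keeps its uniform shape no matter how many draws are taken; only the distribution of the sample means moves toward the normal benchmark.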
20×4=80 so 60 more
Where my 94 gang at
dont get too excited Hahahahaha
wanted to confirm my answer
Is n=40 when the SE is halved? @zedstatstics