hypothesis testing actually starts at 52:13
Just great! This course is the best for clearing up all the confusion around hypothesis testing that you find in various sources!
These lectures are amazing. For people like me working in a related industry, I also recommend the book "Trustworthy Online Controlled Experiments" as a companion to these lectures (lectures 7-12), to gain a deeper understanding of how these concepts are applied to real-world product and business problems.
Can you identify which chapters to follow alongside lectures 7-12?
About calculating the power of a test (1:13:00 ->): isn't the most conservative choice for theta 3.49? I mean, when the true theta equals 3.49, the power becomes 5% (at a 5% significance level), because the sampling distributions of the standardized Xn_bar under theta_0 (3.5) and theta_1 (3.49) are basically equal.
By this logic, the power of a test would always equal the significance level. Obviously there's a flaw in my reasoning, but where? Prof. Rigollet says that we should look at the worst possible case (1:13:28). In my humble opinion, the worst possible case is when the difference between theta_0 and theta_1 is minuscule.
EDIT:
Ok, so Prof. Rigollet basically answers my question in the next lecture (lecture 8, at 13:45 ->). If theta_1 is defined on a set Theta_1, then the power is the value at which it is minimized over this set. What I didn't get at first, or got overly confused about, is that in real life I choose my theta_1 basically as my best guess, so in general it won't be equal to the significance level. If I let theta_1 be the 'next-door neighbor' of my theta_0, then the power will equal the significance level.
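If it helps, here is a quick numerical sketch of exactly this point (my own illustration, not from the lecture; sigma = 1 and n = 100 are made-up values): for a one-sided z-test of H0: theta = 3.5 against H1: theta < 3.5, the power tends to alpha as theta_1 approaches theta_0, so the infimum over an open Theta_1 is the significance level, while a fixed "best guess" theta_1 gives more.

```python
# Hypothetical sketch: power of a one-sided z-test as theta_1 -> theta_0.
# sigma = 1 and n = 100 are assumptions made purely for illustration.
from scipy.stats import norm

theta0, sigma, n, alpha = 3.5, 1.0, 100, 0.05
z_alpha = norm.ppf(1 - alpha)  # one-sided critical value, ~1.645

def power(theta1):
    # P(reject H0 | true theta = theta1), for the test that rejects
    # when (Xbar - theta0) / (sigma / sqrt(n)) < -z_alpha
    shift = (theta0 - theta1) / (sigma / n**0.5)
    return norm.cdf(-z_alpha + shift)

for theta1 in (3.0, 3.4, 3.49, 3.499):
    print(f"theta1 = {theta1}: power = {power(theta1):.4f}")
# 3.0 -> ~1.00, 3.4 -> ~0.26, 3.49 -> ~0.06, 3.499 -> ~0.05 = alpha
```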
17:46 He states "the mean is at distance 1/sqrt(n) from the expectation," but I guess it should be (1/sqrt(n)) x sigma.
Edit: I guess it's the same question as SuperChrzascz's.
I think he deliberately omitted the sigma. Distance is measured in units. Your units might as well be in terms of sigma.
@@jmann277 The sigma depends on the population; why would a person prefer a unit that changes with each population?
@@tempvariable Because you're addressing a statistical problem and should use units that are on the same order as the variation in your inference problem. Meter-long sticks aren't so useful at CERN!
He says it is on the order of 1/sqrt(n): given that the variance is a constant, the distance is mainly controlled by n.
Shouldn't he have multiplied the 0.3 by sigma = sqrt(373) around 18:40?
Yeah, technically that would be more precise, but he was focused more on the concept than on the finer details.
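For what it's worth, a tiny simulation backs this up (my own sketch; the video doesn't state n here, so n = 100 is purely an assumption): the sample mean fluctuates on the scale sigma/sqrt(n), so the bare 1/sqrt(n) is off by a factor of sigma.

```python
# Sketch: the std of Xbar is sigma/sqrt(n), not 1/sqrt(n), when sigma != 1.
# sigma = sqrt(373) is taken from the comment above; n = 100 is assumed.
import numpy as np

rng = np.random.default_rng(0)
sigma, n, trials = np.sqrt(373), 100, 100_000
means = rng.normal(0.0, sigma, size=(trials, n)).mean(axis=1)

print("empirical std of Xbar :", means.std())         # ~1.93
print("sigma / sqrt(n)       :", sigma / np.sqrt(n))  # ~1.93
print("1 / sqrt(n)           :", 1 / np.sqrt(n))      # 0.1, off by sigma
```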
At 40:12 I don't think he got the percentage of values between the sigmas of the normal distribution right.
You have ~95% between +/-2 (it's actually +/-1.96).
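For reference, both figures are easy to check numerically (a quick sanity check of my own, assuming scipy is available):

```python
# Quick check of the normal-tail figures discussed above.
from scipy.stats import norm

print(norm.cdf(2) - norm.cdf(-2))        # ~0.9545, i.e. ~95% within +/-2
print(norm.ppf(0.975))                   # ~1.9600, the exact 95% bound
print(norm.cdf(1.96) - norm.cdf(-1.96))  # ~0.9500
```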
@17:30 he says we are 1/sqrt(n) away from the mean. How does the CLT say that? The CLT says sqrt(n)(theta - thetahat) converges in distribution to N(0, sigma^2). What part of that statement indicates that the distance of the estimated mean from the true mean is approximately the inverse of the square root of the sample size? It looks like it says the distance is proportional to the square root of the sample size, not its inverse.
Bit late, but sqrt(n)(theta - thetahat) is not the distance; it's sqrt(n) multiplied by the distance (theta - thetahat). So instead we should say that (theta - thetahat) -> N(0, sigma^2)/sqrt(n) = N(0, sigma^2/n) = N(0, 1/n) (if the variance is 1, for simplicity). Now we can say, roughly speaking, that theta and thetahat will probably deviate by around 1/sqrt(n), since 1/n is the variance of the 'distance' distribution and the typical deviation is its square root. More accurately, I suppose we could consider the absolute value of this distribution (called a folded normal distribution) and observe that the average distance in this case is sqrt(2/pi)/sqrt(n) (look up the expected value of a folded normal; the formula resolves to this expression for mean 0 and variance 1), and is thus proportional to 1/sqrt(n).
Note that if the variance != 1, then the average distance is instead proportional to sigma/sqrt(n), but I think he was only giving a rough intuition anyway. And for a non-zero mean (not possible for an unbiased estimator, e.g. the sample mean when estimating the mean, by definition) the average distance would become a very ugly expression lol.
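The sqrt(2/pi) * sigma / sqrt(n) claim is easy to verify by simulation (my own sketch; mu = 0, sigma = 2, and n = 50 are arbitrary choices):

```python
# Monte Carlo check of the reply above: the average distance
# E|Xbar - mu| matches sqrt(2/pi) * sigma / sqrt(n), the mean of the
# folded normal. mu = 0, sigma = 2, n = 50 are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, trials = 0.0, 2.0, 50, 200_000
means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)

print("empirical E|Xbar - mu|  :", np.abs(means - mu).mean())
print("sqrt(2/pi)*sigma/sqrt(n):", np.sqrt(2 / np.pi) * sigma / np.sqrt(n))
# Both ~0.226: the distance shrinks like 1/sqrt(n), it does not grow.
```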
Thank you for making this SO INTUITIVE. You rock.
Absolutely well done and definitely keep it up!!! 👍👍👍👍👍
That was a really badly cleaned blackboard at 0:33.
At 5:56 he says that X1, ..., Xn are Gaussian. What is each Xi denoting in this case?
Each Xi is a random variable, one per observation. Because they are i.i.d., they all share the same distribution, which in this model is assumed to be Gaussian.
At the end of the video, according to what the prof. said, alpha = 5% implies that 5% of the innocents would be found guilty. I kind of disagree with his statement, since hypothesis testing is about finding sufficient evidence to prove that a person is guilty. So, instead of sticking with his assumption, can we just say that we need at least 10 pieces of evidence to conclude that the person is guilty? (Actually it can be any number that the investigator would determine; I just made it up.)
Alpha controls the maximum probability of a Type I error. It means that, by chance, at most 1 in 20 innocent people can be wrongly found guilty.
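To see what that 5% means operationally, here is a small simulation (my own sketch; the Gaussian model with known sigma = 1 and n = 30 are assumptions): when the null ("innocent") is true, a level-5% test wrongly rejects in about 5% of repeated experiments.

```python
# Simulating the Type I error rate discussed above: when H0 is true,
# a level-alpha test wrongly rejects ("convicts") about alpha of the
# time. Gaussian data with sigma = 1 and n = 30 are assumed here.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
alpha, n, trials = 0.05, 30, 100_000
z_crit = norm.ppf(1 - alpha / 2)               # two-sided threshold, ~1.96

data = rng.normal(0.0, 1.0, size=(trials, n))  # H0 (mu = 0) is true
z = data.mean(axis=1) * np.sqrt(n)             # standardized test statistic
print("rejection rate under H0:", (np.abs(z) > z_crit).mean())  # ~0.05
```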
Such a great lecture!!
thanks ♥️🤍
so fast that even auto caption cannot catch up!
He is French isn't he?
He answers that in the first video if you watched it properly. Yes, he is.
Hypothesis testing is the weakest subfield in mathematics.
People claiming it is fast. Me watching at 2x speed :p
Why is this bullshit bicycle in the classroom?
I think he injured one of his legs and is using the bicycle to move around. See the little hop at 1:08:01.
this is helpful ♥️🤍