@@statquest Thanks a lot Josh! I realized I made that mistake haha! 1-alpha consists of two parts: 1. correctly accepting the true null, and 2. falsely accepting the false null, while Power is the chance of correctly rejecting the false null. So they are not completely the same.
Hi Josh, I really appreciate these videos and have picked up a lot of knowledge from them. They have been a great help to my study and research. Actually, I have a little question about power analysis. If I am handling a small dataset and have no idea about the sample/population distribution, and I choose permutation/bootstrapping to test the mean/median to see if there is any statistically significant difference between the two groups, do I still need to perform a power analysis after calculating the p-value? Thanks
It depends. If it's just a one-off test, and you're not going to repeat it, then you don't need to do a power analysis (it might still be interesting to do one, but you don't need to). However, if you are using it as a pilot to decide if you should collect more data, then you should do a power analysis. Otherwise you could do something called "p-hacking". Here are the details on p-hacking: th-cam.com/video/HDCOUXE3HMM/w-d-xo.html
@@statquest Thank you so much for the detailed reply. I didn't understand why the professor said the p-value sometimes is not that important until I viewed your p-hacking video. Really appreciated!
Hey Josh! Thanks a lot for such amazing content!!! I had a question regarding p-hacking: what exactly goes wrong, giving false positives, if we adjust the sample size to suit our needs? After all, we are sampling from the same distribution and performing the exact same statistical test to determine the p-value.
Really informative video - thanks. I was just wondering how you would go about doing this for an RNA-seq experiment? How would you go about working out the effect size based on a previous experiment from the same lab, or on GEO? A more focused video would be amazing, but any advice on where to go from here would be really useful. Thank you.
RNA-seq power analysis is super simple because people have already worked out the variation for a variety of standard test subjects (humans, inbred mice etc). You just google "power analysis rna-seq" and all kinds of good stuff comes up. For example, cqs-vumc.shinyapps.io/rnaseqsamplesizeweb/ or scotty.genetics.utah.edu/
Thanks for the video :) and I have 2 questions: 1. If I cannot find any published data for the effect size, what should the sample size be for generating preliminary data? 2. Is the threshold of significance (alpha) the same as the p-value?
Thank you for the excellent video. If I may, I would like to ask you a simple question. In order to calculate the effect size, you need to know s, which depends on the sample size (in the video it is 10). Then you use this to estimate the required sample size, which is 9. What happens if they are very different? This could happen in two cases: 1. the starting sample size >> the calculated one, and 2. vice versa.
In theory, 's' (the standard deviation) does not actually depend on the sample size. We can get it from any source or just guess based on previous experience. Thus, if the output sample size is very different, that's OK.
Hi, very useful material and well explained. Thanks. What should the power be if I want to accept the null hypothesis? What should the overlap and sample size be in this case?
Unfortunately, traditional statistics is not designed to accept the null. The best we can do is to "fail to reject the null". You can make this failure relatively convincing if you select a relatively high power (like 95%) for rejecting the null and still fail.
@@statquest I have a subject interviewed at time 0, and at the same time the subject gives ratings for those questions via an app. Such measurements are then repeated at an interval. I want to show that there is no statistical difference between the interview ratings and the self-ratings via the app. I need to find out the sample size for this analysis. What should the power, alpha, and effect size be to find out the sample size? I appreciate your help. Thank you
Thanks for the great video. In your example you said 9 measurements per group is enough. What does "group" mean in correlation studies? Do we have more than one group in these studies?
@@Mostafaseyyedabadi If you're just trying to establish a correlation, you just have one group. You can determine the sample size using a sample size calculator for regression. For example, you could try this one: www.statskingdom.com/sample_size_regression.html
I have watched a lot of your videos, and I feel that at times, in the name of simplicity, you are diluting the concepts. Like in this one: I waited 15 minutes for you to tell me to go use a calculator?
I can assure you that I don't dilute the concepts. That said, every single statistical test has a different equation required to calculate power. So the important thing isn't to memorize every single test, but instead to remember the main ideas behind the equations, which is what this video covers.
Hi Josh, thank you for the nice video! Quick question: in some online calculators, I find something called "minimum detectable effect", which it seems I can set to ANY number during the sample size calculation. Is it exactly the same as the "effect size" you mention here, definition-wise? I'm asking because your effect size seems to be a FIXED number.
I discovered these videos a while ago and they're really great for understanding various statistical concepts. I need to do some power analyses on non-parametric data, is that even a thing? Is there even a variability estimate for a Mann-Whitney two-sample test? Thanks.
Dear Dr. Starmer. Imagine I run a pilot test with 5 subjects in each group to get an estimate of the mean and standard deviation of each group, so that I can do a power analysis. The result of the power analysis suggests that I should run the test with at least 15 subjects in each group. Question: Do I need to measure 15 new subjects for each group, or can I reuse the data that I already have (5 subjects in each group) from the pilot test? Is this statistically fine, or is it considered p-hacking? And is it a strict no-go, or is it not recommended but OK? In some research fields, getting data from even a few subjects is very time consuming and expensive, so discarding it must be well justified. Could you share your thoughts about this? Thank you so much and thanks for a great stats course!
Ideally you would get new data. Imagine I collected 3 measurements per group, but those measurements were extreme compared to what I would normally get and that there is no real difference. The power analysis would tell me that I need to collect 3 samples. If I did not collect new data, I would get a false positive.
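To see why, here is a minimal simulation sketch in Python under a TRUE null (the pilot size, the d >= 1.2 cutoff, and all numbers are invented for illustration; this is not from the video):

```python
# Sketch: under a TRUE null, keep only pilots extreme enough that a power
# analysis would reward them with a small required n, then compare testing
# the reused pilot vs. testing freshly collected data of the same size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_pilot, kept, reused_fp, fresh_fp = 5, 0, 0, 0

for _ in range(20_000):
    a = rng.normal(0, 1, n_pilot)           # both groups come from the
    b = rng.normal(0, 1, n_pilot)           # SAME distribution (null is true)
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d = abs(a.mean() - b.mean()) / pooled   # estimated effect size
    if d < 1.2:                             # keep only "extreme" pilots
        continue
    kept += 1
    reused_fp += stats.ttest_ind(a, b).pvalue < 0.05   # test the reused pilot
    a2, b2 = rng.normal(0, 1, n_pilot), rng.normal(0, 1, n_pilot)
    fresh_fp += stats.ttest_ind(a2, b2).pvalue < 0.05  # vs. collecting new data

print(f"reused pilot false positives: {reused_fp / kept:.2f}")  # far above 0.05
print(f"fresh data false positives:   {fresh_fp / kept:.2f}")   # about 0.05
```

The reused-pilot arm rejects far more often than 5% precisely because those pilots were selected for looking extreme.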
Thank you for the great video! I have one question though: at 1:51, in the normal distribution graph, if the variable is days to recovery, which can't be lower than 0, then would the estimated population probability distribution follow a Poisson distribution..? Sorry in advance if I'm confusing the concepts!
Believe it or not, the normal distribution is often used to approximate the poisson distribution. The idea is that if the means are far enough from 0, then it's a good fit. This is due to the central limit theorem. For details, see: online.stat.psu.edu/stat414/lesson/28/28.2
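If it helps, the approximation is easy to check numerically; a tiny sketch (lambda = 20 is an arbitrary example value):

```python
from scipy.stats import norm, poisson

lam = 20  # Poisson mean; the approximation improves as the mean moves away from 0
for k in [10, 15, 20, 25, 30]:
    exact = poisson.cdf(k, lam)
    approx = norm.cdf(k + 0.5, loc=lam, scale=lam ** 0.5)  # continuity correction
    print(f"P(X <= {k}): Poisson = {exact:.4f}, Normal approx = {approx:.4f}")
```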
Thanks for the great video! Do I understand correctly that the sample size determined from the power analysis depends on the means and variances estimated from the first experiment? How do I deal with the fact that this first experiment could result in estimating means and variances far away from the population's?
It's the same - even with just one sample, the sample has a distribution that may or may not be the same as an ideal distribution that we are comparing it to.
Thanks for this video, I now get the true meaning of "effect size". I am unable to find the sample size calculator, though. I am learning stats to work with SPSS but was curious how you got the value 9 for the example.
Thanks for the amazing video! Could you please help me understand how to tell if the difference is significant from the p-value when the data is imbalanced and the distributions severely overlap?
@@statquest Wow, that was quick! Thanks! One last thing, umm.. is there any book you have written on applied statistics, or one you recommend please? I mostly need it for machine learning applications. P.S. embedding your songs into your book would be something!!
@@priyamvadabhardwaj6331 I'm writing a book on machine learning that has some basic statistics, but it won't be out for another year. Unfortunately I don't know of any other good book.
Thank you Josh! I have a question. Could you collect a sample from Drug A and Drug B and use the two samples to calculate an Estimated Mean and Standard Deviation in order to calculate the Sample Size required for future research?
The main ideas are the same for dichotomous outcomes, however, the details might be a little different. So I recommend you start by finding a power calculating program for the test you want to use.
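As a starting point, here is a hedged sketch of a power calculation for two proportions using Python's statsmodels (the 60% vs. 80% rates are made-up example values, not from any real study):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Cohen's h for comparing two proportions (the order only affects the sign)
es = abs(proportion_effectsize(0.6, 0.8))
n = NormalIndPower().solve_power(effect_size=es, alpha=0.05, power=0.8)
print(f"about {n:.0f} subjects per group")
```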
NOTE: This StatQuest was brought to you, in part, by a generous donation from TRIPLE BAM!!! members: M. Scola, N. Thomson, X. Liu, J. Lombana, A. Doss, A. Takeh, J. Butt. Thank you!!!!
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
This video makes me feel so happy. I've had a paper in review for a little while, the editor and one reviewer noticed in one of my figures (in which I had shown the raw data and CIs) that a single datum point was not in line with the rest (n=6 not great but better than most molecular biology papers!). They asked me to add an extra replicate to the experiment. I refused on the grounds of p-hacking and showed them a power analysis (+93% power including the variation of the point) and showed the point is a Grubbs outlier. I'm still waiting for the editor to get back to me ;)
BAM! :)
What did the editor reply?
@@raycyst-k9v Editor said: BAAM!
So... What happened? Do you have your DOI for us to read? 🎉🥳🤩
@@karolgilbertosolanosuarez9094 I won ;) gurney et al 2020 microbiology Combinatorial quorum sensing in Pseudomonas aeruginosa allows for novel cheating strategies
Man Josh, I am trying to finish your playlist. Before this I had tried Khan Academy, Brandon Foltz, a few books, and I failed horribly. The p-value was something I could never grasp. I think the issue was that everyone talked about what a p-value is but never explained HOW it is calculated. You really go into the depth of how and why. And this one video right here has blown my mind. It doesn't just explain power analysis but also how you estimate the population mean and how p-values work. It's 5 am right now and I am still studying. You've really blown my mind!
Hooray!!! I'm glad my videos are helpful. :)
@@statquest HUGE BAM!
You've freaking done it again. I can't belive how simple this explanation was compared to the lecture I got in school. Thank you very much!
bam!
The other teachers cannot explain the concepts clearly because their concepts are not clear in the first place LOL. That's where our Mr BAM Sir rocks :D
Wow! I'm very impressed by your very clear explanations and calculations in this video. It was very easy to understand and follow. Big thanks from a PhD-student in Sweden!
Thank you very much! :)
Thank you! For my paper my lecturer said we don’t have to include a power analysis, so it wasn’t taught… so glad I watched this because I was struggling to justify my sample size and now it all makes so much more sense.
Awesome!!! :)
I am so happy to see the growth rate of Josh Starmer's StatQuest. I still remember I was one of the few subscribers who joined your fan base when the subscriber count was in the thousands! Great going, happy learning
Thank you very much! :)
I learned more (and laughed more) in these 16 minutes than a whole semester at med school. Thank you!
bam! :)
Fantastic! I’m taking a statistical modeling class with machine learning and this particular topic just wasn’t sticking with me. As advertised, this was super clear and nailed home all the key points!
Glad it was helpful!
I've done so much stats but always had difficulty understanding power analysis. This is very clear, and practical.
Thanks!
One of the best channels out there to learn statistics.
Thank you!
Your videos are absolutely amazing! It would be fantastic if you could create (if it doesn't already exist) a video covering the entire hypothesis testing process, including all the steps. That would involve determining the sample size and addressing the temptations encountered along the way until reaching the final result.
Great suggestion!
Thanks I haven’t slept this good in weeks
noted!
At 14:09, several assumptions and parameters are presented on the screen, creating the impression of the existence of "power analysis hacking." Furthermore, the concept of double-layered probabilities, where there is an 80% likelihood of correctly rejecting the Null Hypothesis and a 5% chance of randomly obtaining results below the threshold, is enough to make one's head spin. Nevertheless, this video provides the most exceptional and clear explanation.
Thanks!
Thanks Josh, you are the best at storytelling when explaining statistics.
Wow, thanks!
Josh in May 2020: "Imagine there is a virus"
2020: "Hold my beer"
:)
You are a great teacher, Josh ! Thank you so much. For a long time, I had difficulty in understanding this concept. Now, it is crystal clear.. :-)
Thanks!
Simulation is really helpful to understand this.
Thank you Professor. I will use it for my class. It is so well explained, I learnt how to explain complex concepts.
Glad it was helpful!
Hey! After chatting for an hour with ChatGPT about p-values, p-hacking and power analysis, I wanted to test my knowledge with a simple case, and ChatGPT suggested a coffee shop experiment. The experiment is going to be about a new bean and whether it improved customer satisfaction or not. I'll try to prepare a dataset for this. If I can, I'll try to share! Also, thanks a lot for the content. I want to be a machine learning engineer and decided to learn the fundamentals of statistics for that, but statistics basically changed the way I see the world ahahahah.
I have to learn how to do those statistical tests though :)) I'll watch the quest
bam!
This is, as usual, excellent. I'm a little bit surprised that you do not show alpha and beta for a given test (as the area below or above the critical value). It allows people to understand the main problem of power analysis: you need to bet on the effect size as, by definition, it's unknown. It leads, as you know, to so many problems, as we always discover too late that the actual mean difference is smaller than expected (also known as the "curse of the just a little bit too small sample"). The most common solution being a ritual sacrifice of a random slave. Sorry, I meant to say a PhD student.
I'm happy my fate made me study statistics after you made this playlist , Thank you so much for this amazing explanation
Happy to help!
The interesting examples really gave me a new perspective on the problem. Appreciate this! Thank you so much!
Glad it was helpful!
Excellent lesson on power analysis. I am immensely impressed by StatQuest's instructional videos. One criticism here, however, is that the final computation of the necessary sample size to achieve the desired power could have been covered rather than telling a student to find an app on some university's web site. I think most students could probably work out the math, but it seems to go against the overall grain of the talk, which carefully walks the student through a well-chosen toy example that nevertheless fully elucidates the process.
Thanks! I decided to not go over the math for calculating power because I did not want to give the illusion that there was a single equation that fit all situations. In reality, for each test, there is a different equation that we need to use to calculate power and, rather than worry about all of that, I thought it would be easier (and much more practical) to show how power analyses are done.
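To make that concrete: power libraries reflect this one-formula-per-test reality by shipping a separate solver for each test. A minimal sketch with Python's statsmodels (the effect sizes are arbitrary example values):

```python
from statsmodels.stats.power import (TTestIndPower, FTestAnovaPower,
                                     GofChisquarePower)

# Two-sample t-test: measurements needed per group
print(TTestIndPower().solve_power(effect_size=0.8, alpha=0.05, power=0.8))

# One-way ANOVA with 3 groups: total measurements across all groups
print(FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05, power=0.8,
                                    k_groups=3))

# Chi-square goodness of fit with 4 bins: total number of observations
print(GofChisquarePower().solve_power(effect_size=0.3, alpha=0.05, power=0.8,
                                      n_bins=4))
```

Same interface each time, but different math under the hood, which is Josh's point.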
Josh, thanks so much for your videos - so clearly explained relative to other content I've seen on inference. I do have a few questions.
1. In an A/B testing scenario, we don't know the mean and the standard deviation of the distribution of the treatment group beforehand. We do know the mean and the standard deviation of the prior distribution. If I want to estimate the sample size required for different effect sizes, holding the chosen p-value threshold constant at 0.05 and power at 0.8, how would we do that given that we don't have s2, the standard deviation of the second group?
2. Also since we don't know the mean of the treatment group's distribution, how would we calculate the estimated difference in means to plug it into the formula for effect size?
3. Do you have a video on How to pick the right test?
1) Do a pilot study to get a sense of the mean and standard dev.
2) Same
3) Not yet.
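For what it's worth, here is a minimal sketch of that pilot-study route in Python with statsmodels (the pilot measurements are invented, purely to show the mechanics):

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

control = np.array([3.1, 2.8, 3.5, 3.0, 2.9])    # hypothetical pilot data
treatment = np.array([2.6, 2.9, 2.4, 2.8, 2.5])  # hypothetical pilot data

# Estimate the effect size (Cohen's d) from the pilot
s1, s2 = control.std(ddof=1), treatment.std(ddof=1)
pooled_sd = np.sqrt((s1 ** 2 + s2 ** 2) / 2)     # equal group sizes assumed
d = (control.mean() - treatment.mean()) / pooled_sd

# Solve for the sample size that gives 80% power at alpha = 0.05
n = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.8)
print(f"estimated d = {d:.2f}, need about {int(np.ceil(n))} measurements per group")
```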
Totally loving the series
Hooray! :)
I love it when you're saying "BAM!" lol
BAM! :)
Nice way to start a remote workday.
Niila Saarinen yes right?
Hooray! :)
Hope your boss is not the subscriber :P
Thank you for making learning more interesting. Double bammm for the knowledge and funny remarks
Thank you! :)
i swear to god that you are a life saver man
:)
excellent video series in general, and particularly good video on Power Analysis! Thank you for making and sharing!
Thank you!
I have a question. Knowing that statistical power, significance level, effect size, and sample size are all related through power analysis, wouldn't choosing a statistical power (or even a significance level, since the 0.05 threshold is just for teaching demonstrations and in reality it can be any amount) also be considered p-hacking, by effectively choosing a desired sample size "remotely" in order to have it prove (or disprove) the hypothesis at the researcher's discretion? Or is there an objective way to determine the statistical power (and significance level)? Most of the materials I read say "commonly use power = 0.8 or alpha = 0.05", or I have even heard "amount chosen at the researcher's discretion", but without giving a sufficient reason for picking those amounts.
There are a couple of things to say. 1) Because we select all of the thresholds (power, significance etc.) before doing the experiment, we end up with a sample size that should not be biased in terms of having a higher probability of giving us a false positive or false negative. The key is that we set those parameters before we do the experiment, and then we have to live with the result no matter what it says. If we didn't get the result we wanted, and then we did it again, then that would be p-hacking.
2) The thresholds for significance and power are often "field specific" and depend on how easy or difficult it is to control things. In physics labs, they can control things relatively well, so they tend to require stricter thresholds. However, when working with human data, where it's hard to control anything, they tend to have more relaxed criteria. The thresholds also change depending on how serious the consequences of getting things wrong are. For example, if we are testing for Ebola, we might want to err on the side of caution and allow more false positives so that we can minimize the false negatives.
Thank you Josh for all your videos! they help make statistics look less deadly. Would look forward to some content on bayesian statistics and then maybe solving problems using both bayesian and frequentist methods :)
Will do!
Hi Josh, thank you a lot for the awesome videos!
I have two basic questions:
1. The denominator for the pooled estimated SD in a general condition is the number of the distributions (and not always 2), right?
2. What is the deal with statistics power calculators? Aren't there simple formulas we can use ourselves? And does the googling we do to choose one involve the nature of our experiment, or do we just randomly choose one?
1) I'm not sure how you could have more than two distributions, but, presumably, if you did, then you would divide by that number.
2) Every single statistical test has a different formula for doing power calculations. And there are a lot of statistical tests, because there are a lot of different experiment types and data types. Going through every single statistical test would take forever and, to be honest, be pretty boring. So the best thing to do is to google "power calculator" and you'll find a page that has a pulldown menu where you select the test that you want to do (t-test? chi-square goodness of fit? K-S test? etc.) and then plug in your numbers. bam.
Watching the night before my finals for STAT! Thanks Josh
Good luck!!
Every time I watch this series it reminds me of Moss in The IT Crowd :D
Nice!
Gpower is very good software for sample size calculation.
I'm not a fan of statistics or maths, and I've already finished writing my 500-word explanation of statistical power and how it works. But I'm just watching this for fun now...
Hooray! And congratulations on finishing your essay.
Thanks Josh! Just want to make sure, the green and red normal distributions are created from
1. collecting data_size=n from one group, calculate mean
2. repeat 1 R times
3. use R means to create green/red normal distribution
correct?
Unfortunately that's not correct. The green and red distributions are the "population" distributions that we are trying to estimate with relatively small sample sizes. For details on population distributions and how they are different from samples, see: th-cam.com/video/vikkiwjQqfU/w-d-xo.html and th-cam.com/video/SzZ6GpcfoQY/w-d-xo.html
Nice videos Josh. One thing concerning p-values that I think should be mentioned is that they are a continuous value, so 0.05 is somewhat arbitrary. In terms of p-hacking and the notion of correlation vs. causality, taking 0.05 too seriously as evidence can be troublesome. You may actually miss out on some good science just because the p-value was > 0.05 and, of course, you may be too optimistic in the other direction.
Maybe you already mentioned this, and I just missed it :-)
I mention this in my video that explains p-values: th-cam.com/video/vemZtEM63GY/w-d-xo.html
This tutorial is great and makes a lot of common sense.
Thank you!
Thank you Josh for being so careful and patient. I would like to ask you a question that I am not quite clear about. You used s when calculating the pooled estimated standard deviation and represented s with a dotted line on the normal distribution. However, it seems that s is very wide in the figure, which looks more like 2s.
I want to make sure that this s is what we call s, right?
's' looks pretty accurate to me. About 68% of the area under each curve should be within one s of the mean (under the dotted line).
Love your videos, thanks! :) Quick question though - from this video it seems that we should google 'statistics power calculator' after we set up 3 things - alpha, power and effect size. We can easily choose alpha and power, but what about effect size? It seems to be the thing that you are able to calculate after the test finishes, you can't set it up beforehand. Or maybe you meant something like 'minimal effect size that is enough', e.g. from business perspective?
It depends on the area you are working in. For example, in biology, we often want a 2-fold effect size. In physics, a smaller effect size might be good enough. So you have to know what people want in the field to determine the value.
@@statquest thank you for the answer. And in case of AB testing? It would be nice to know beforehand what would be the sufficient sample size. Then what should we set up in place of effect size?
@@jakubmazur1159 Again, this is domain specific.
What if I am trying to compare three groups? How would I calculate the effect size? What other key words should I google in addition to "statistics power calculator"? Thanks in advance for answer my questions, your videos are awesome!
Add "ANOVA" to your search (which compares 3 or more groups). In general, if you have a specific test in mind, just add that to your search.
Thank you for the detailed explanation!
But...I am having trouble finding a reliable power calculator online that allows me to input the Effect Size...perhaps you could recommend a few? What power calculator did you use to get the value of sample size 9?
Regardless, I loved the first part of your video! Very detailed, very interactive, thank you for this!
www.statskingdom.com/32test_power_t_z.html
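The same kind of calculation can be sketched in code with statsmodels (the exact effect size from the video isn't stated here, so d = 1.3 is a guess that lands in the same ballpark):

```python
from statsmodels.stats.power import TTestIndPower

# effect size d = 1.3 is an assumed value, NOT the one used in the video
n = TTestIndPower().solve_power(effect_size=1.3, alpha=0.05, power=0.8)
print(n)  # roughly 10 measurements per group for this assumed effect size
```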
Hi Josh, thanks for the amazing videos! There is one part of this that confuses me, how do you get the population means and population standard deviations to put in the power analysis calculator? Do you simply run the experiment once with many replicates for your positive control and negative control? Then is it okay to apply the resulting sample size to your conditions?
You have to estimate the population parameters with preliminary data, or data from other sources (like publications) or just take an educated guess.
@@statquest Thanks very much for your reply!
@@statquest I'm also struggling with this. In previous publications, they don't show the actual numbers for mean and standard deviation; only a graph. I guess I have to just guess based on the graph?
@@somenerdyblonde Email the authors of the publication.
Hi Josh, thoroughly enjoyed the video. One quick question though: if the two means are d1 and d2, then how do you take the difference between them? Should it be |d1-d2|, or can one use either d1-d2 or d2-d1? On a side note, my kids have started complaining - they say that I am always watching StatQuest :)
The order doesn't matter (and we don't use the absolute value either). If you want to learn a very cool way to compare two means, check out my playlist on linear regression. Believe it or not, a test to compare means is a type of linear regression: th-cam.com/play/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU.html (Tell your kids that I apologize for making you watch more StatQuest! :)
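For the curious, here is a minimal sketch of that equivalence (made-up data; assumes numpy, scipy, and statsmodels are installed):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10, 2, 20)  # hypothetical group A measurements
b = rng.normal(12, 2, 20)  # hypothetical group B measurements

# Classic two-sample t-test (equal variances assumed)
t_stat, p_ttest = stats.ttest_ind(a, b)

# The same comparison as a regression on a 0/1 group indicator
y = np.concatenate([a, b])
x = sm.add_constant(np.r_[np.zeros(20), np.ones(20)])
p_ols = sm.OLS(y, x).fit().pvalues[1]

print(p_ttest, p_ols)  # identical, up to floating point
```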
Thanks Josh. I wanted to ask what you should do if all your data comes from one giant experiment/repository. Your explanations usually assume we have done multiple experiments of small sample size and can generate pooled statistics. But if you did one big experiment with a very large sample size, should you artificially separate individual tests into small groups so you can generate pooled statistics? Thanks again, I hope I make sense.
Most of the time people do one big experiment and collect all of the data at once. This is the standard procedure. However, in the video, I show what happens if you repeat that same experiment a bunch of times to give you a sense of how whatever you are estimating (like the mean) has less variation when you have more samples. Thus, the repeated experiments in this video are just an illustration to give you confidence that a large sample size will be more accurate than a small one.
Another piece of great content! It would be great to include some examples for confidence intervals and hypothesis testing!
Hypothesis testing is right around the corner.
Dude, you have awesome explanatory superPower!
Thanks! 😃
I swear, if he was my stat teacher I'd ace it.
:)
Fantastic explanation, very useful and awesome graphics, high educational value!! CLAP CLAP
Thank you very much! :)
Wonderful video, very clear as always. Often the terms 'type 1 error' and 'type 2 error' are used in these discussions, or α and β. Was there a reason for avoiding this terminology, aside from just generally avoiding jargon?
I really don't like that terminology. I'd rather people say "false positive" or "false negative" instead of "type 1" or "type 2".
This explanation is amazing!!! Thank you so very much!!
Thank you!
Thanks for another great video Josh! There's one thing I still fail to understand, though: Isn't it true that a p-value and Power measure the same thing, namely the chance of (correctly) rejecting the H0? And if this is the case, is a chance of 80% of correctly rejecting the H0 not a downgrade from having a 95% chance when using an alpha of .05?
No, p-values and power are fundamentally different. p-values assume that the null hypothesis is correct. Power assumes that the null hypothesis is not correct. This makes pretty much everything about them different.
@@statquest thanks a lot for replying! I clearly still have some work to do😁
@@baksteen2420 To learn more about p-values, try: th-cam.com/video/vemZtEM63GY/w-d-xo.html and th-cam.com/video/JQc3yx0-Q9E/w-d-xo.html
14:43 damn just the part I needed and you guys decide to skip it.
The fact is, every single test has a different way to calculate power.
Really enjoying these videos - I was recommended by a colleague to check out your channel. One question I wanted to ask you: is the idea that we would collect 10 samples several times, then get the mean of the estimated means and compare that to the mean of the estimated means from the other distribution using a t-test? Just curious how it all comes together, but maybe that comes in some later videos... thank you
The idea is to collect 10 measurements 1 time and then do your tests. However, this video illustrates what would happen if we did it a bunch of times to give you a sense of how much variability there would be in different sample sizes. The larger the sample size, the lower the variability in the mean, and thus, the more confidence we can have in its value.
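Here is a quick simulation sketch of that point (the population mean of 20 and SD of 5 are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(42)
for n in [3, 10, 100]:
    # 10,000 repeated "experiments", each collecting n measurements
    means = rng.normal(loc=20, scale=5, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:>3}: std of the sample means = {means.std():.2f}")
# The spread of the sample means shrinks like 5 / sqrt(n)
```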
Been following the Statistics Fundamentals playlist quite well up until this point. Now things are getting a bit meta. P-values are the threshold for false positives, to be able to say "we have a 5% tolerance for getting the wrong result" - that's already pretty meta. Then FDR and the BH-method look at "distributions of p-values" to sub-divide the true positives from the false positives. Double-meta. Now on top of that we are doing a "power analysis" to get an 80% chance of trusting our p-values? Triple meta? We already set our tolerance of 0.05, saying "we know we'll get it wrong 5% of the time" so why do we then care about if those 5% are true/false or "close enough". I'll keep watching but FYI that's the thought process for a new viewer.
Power is sort of the opposite of a p-value. p-values tell us about the probabilities of false positives. Power tells us about the probabilities of false negatives.
Very nice and simplified way to put it, thank you Josh! Keep up the good work
Thanks for making these videos. One question on the beginning of this video: what is the actual method behind "do a statistical test and compare the means and get a p-value = 0.06"? From what I read, is it a t-test for two sample means in this case? Thanks!
Yes, in this case I'm using a t-test.
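For reference, a minimal sketch of such a test in Python with scipy (the recovery times are invented, not the video's data):

```python
from scipy import stats

drug_a = [14, 15, 13, 16, 15]  # hypothetical days to recover
drug_b = [13, 12, 14, 12, 13]  # hypothetical days to recover
t_stat, p_value = stats.ttest_ind(drug_a, drug_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```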
Hi Josh! Thanks for your amazing explanation! I appreciate it greatly! I have a question.
I read in a book that a large sample size can have unwanted implications.
QUOTE
....'By “too high” we mean that by increasing sample size, smaller and smaller effects (e.g., correlations) will be found to be statistically significant. The researcher must always be aware that sample size can affect the statistical test either by making it insensitive (at small sample sizes) or overly sensitive (at very large sample sizes). UNQUOTE
So does it mean that even if the test we are conducting is on data from the same distribution (i.e., the null hypothesis is true), it will somehow reject the null hypothesis given a large enough sample size? Thanks!
Regardless of the sample size, you always need to make sure that the effect size is large enough to have "meaning". For example, if taking an expensive medication increases life span by 2 seconds, then who cares if it is statistically significant or not - 2 seconds is not worth taking an expensive medication. So whenever you do statistics, don't just look at the p-value, also look at the effect size and make sure it is meaningful.
Great explanation! What do you do if you want to compare a normal distribution with a non-normal distribution?
If the sample size is large enough, the distribution doesn't matter (per the Central Limit Theorem: th-cam.com/video/YAlJCEDH2uY/w-d-xo.html ). Alternatively you can use non-parametric methods. These do not depend on the distribution.
@@statquest Thank you very much! We have a really small sample size, so I don't think that is an option. Do you maybe have a link to which non-parametric methods we can use?
@@nelhuens7590 Unfortunately, I don't. But if your data is continuous, instead of using a t-test, you can use a Mann-Whitney U test: en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test
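A minimal sketch of that test in Python (the measurements are invented example data):

```python
from scipy.stats import mannwhitneyu

group1 = [1.2, 3.4, 2.2, 2.9, 1.8]  # hypothetical measurements
group2 = [4.1, 3.9, 5.0, 4.4, 3.6]  # hypothetical measurements
u_stat, p_value = mannwhitneyu(group1, group2, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
```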
Man, I'm totally hooked on your playlist (it deserves to be on Netflix!). When you mentioned having a sample size of 9, I got a bit confused. How many samples, each with a size of 9, should I collect? I always struggle with this. For instance, would it be the same if I had 3 samples, each with a size of 3? Could you clarify that?
What time point, minutes and seconds, in the video are you referring to?
@@statquest 14:56. Anyway, would it be the same? (One sample of size 9 vs. 3 samples of size 3)
@@vladfarias It depends. For example, 1 person getting 9 measurements could be different than 3 people getting 3 measurements each because there could be variation in how the 3 different people collected measurements. Or if we use the same person, but collected 3 measurements on 3 different days, because there could be extra variation due to the different days. So your estimate of variation needs to take how the data will be collected into account.
Wait, but you assumed you already knew the two distributions? Since you'd need the population mean and s.d. to calculate the effect size?
The mean and standard deviations are estimated from the data.
StatQuest with Josh Starmer But if you already have the sample mean and data, wouldn't that mean that you have already finished your experiment and have the sample size fixed? To me this seems like a chicken-and-egg problem. Or are we talking about an iterative process, where you finish the experiment, check if the sample size is sufficient for the power you want, then do more experiments to get more samples?
You can estimate the mean and standard deviation from a preliminary experiment, or from a literature search, or just use an educated guess. I mention all of these options at 13:57
StatQuest with Josh Starmer Ah, sorry I missed that, and thanks!
@@lupita3689 that was exactly my thinking as well.
1. To check if samples belong to different groups, collect samples.
2. To know how many samples to collect, use the power formula.
3. To use the power formula, plug in the mean & SD of the 2 different groups.
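For anyone who wants to see steps 2 and 3 in code, here is a hedged sketch using statsmodels; the means and standard deviation are hypothetical pilot-study estimates, not values from the video.

```python
# Sketch of steps 2-3 above: estimate an effect size from (hypothetical)
# pilot means/SD, then solve for the required sample size per group.
import math
from statsmodels.stats.power import TTestIndPower

mean_a, mean_b = 20.0, 30.0   # hypothetical estimated group means
sd = 10.0                     # hypothetical common standard deviation

effect_size = abs(mean_a - mean_b) / sd   # Cohen's d

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"Collect at least {math.ceil(n_per_group)} measurements per group")
```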
followed you on bilibili haha, cant wait to your new vedios, so I came here :)
bam! :)
Hi Josh! Excellent video as always. I'm wondering what I should conclude if I planned a study that turned out to have low power to detect the difference I actually observed, yet despite that, I did get a significant p-value. Should my conclusion be that there's a real difference between the two groups, or instead that there's a higher probability my result is actually a false positive?
If you have an underpowered study that is significant, then it is significant and you say BAM! :)
so clear and understandable !!!! Thank you very much!!!
Glad it helped!
Thank you, very helpful! I am a new bioinformatician planning to do differential gene expression analysis with DESeq2, comparing human samples with n = 1000. How could we apply power analysis to RNA-seq differential expression?
I believe there are some standard RNA-seq power calculators out there that you can use. I remember using one, but that was 6 years ago and I bet there is something better now.
Awesome video! There's really nothing like being able to visualise a concept. Is there a reason you never mentioned the standard error when you were explaining how larger sample sizes increase the confidence that your sample mean is close to the population mean? I mean standard error is literally the measure of that confidence right?
Yes, the standard error is the measure of that confidence. However, I omitted it because I was simply trying to convey the concept of power, rather than dive into the technical terminology.
Hi Josh,
for a trustworthy result, I would combine a power analysis to determine the sample size and, after that, calculate p-values with the FDR correction. Is this right?
Or is the FDR correction no longer needed after a power analysis?
Many thanks for your response!
If you do multiple tests, you should always use FDR.
Great series of videos again Josh!! I have one question though, and I have tried asking ChatGPT, but it didn't help much...
Given that we rejected the null: if we say alpha is the probability of falsely rejecting the null, let's say 0.05, then the opposite would be correctly rejecting the null, which is 1 - 0.05 = 0.95. Isn't this exactly the definition of power? So given alpha = 0.05, the chance of correctly rejecting the null is 95%, and that is our power? I know this is somehow false, but I cannot figure out why.
I actually have an entire video that answers your question: th-cam.com/video/Rsc5znwR5FA/w-d-xo.html (I'll give you a hint: 1-alpha is not what you think it is. 1-alpha is related to the probability that we will fail to reject the null).
@@statquest Thanks a lot Josh! I realized I made that mistake haha! 1 - alpha consists of two parts: 1. correctly accepting a true null, and 2. falsely accepting a false null, while power is the chance of correctly rejecting a false null. So they are not completely the same.
This lecture was really excellent 😊
Glad you liked it!
Hi Josh, I really appreciate these videos and have picked up a lot of knowledge from them. This could be of great help to my study and research. Actually, I have a little question about power analysis. Suppose I am handling a small dataset and have no idea about the sample/population distribution, so I choose permutation/bootstrapping to test the mean/median and see if there is a statistically significant difference between the two groups. After calculating the p-value, do I still need to perform a power analysis? Thanks
It depends. If it's just a one off test, and you're not going to repeat it, then you don't need to do a power analysis (it might still be interesting to do one, but you don't need to do one). However, if you are using it as a pilot to decide if you should collect more data, then you should do a power analysis. Otherwise you could do something called "p-hacking". Here are the details on p-hacking: th-cam.com/video/HDCOUXE3HMM/w-d-xo.html
@@statquest Thank you so much for the detailed reply. I didn't understand why the professor say the p-value sometimes is not that important until I viewed your p-hacking video. Really appreciated!
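For readers curious about the permutation test mentioned in this thread, here is a minimal sketch; the data and the 10,000 shuffles are illustrative choices, not the only reasonable ones.

```python
# Minimal permutation test sketch for a difference in means;
# data and the number of shuffles are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
group_a = np.array([3.2, 4.1, 2.8, 5.0, 3.7])
group_b = np.array([5.9, 6.3, 4.8, 7.1, 6.0])

observed = abs(group_a.mean() - group_b.mean())
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

n_shuffles = 10_000
count = 0
for _ in range(n_shuffles):
    shuffled = rng.permutation(pooled)
    diff = abs(shuffled[:n_a].mean() - shuffled[n_a:].mean())
    if diff >= observed:   # shuffled difference at least as extreme
        count += 1

print(f"permutation p-value ≈ {count / n_shuffles:.4f}")
```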
Hey Josh! Thanks a lot for such amazing content!!! I had a question regarding p-hacking: what exactly goes wrong to give false positives if we adjust the sample size to suit our needs? After all, we are sampling from the same distribution and performing the exact same statistical test to determine the p-value.
I explain p-hacking in this video: th-cam.com/video/HDCOUXE3HMM/w-d-xo.html
Really informative video - thanks. I was just wondering how you would go about doing this for an RNA-seq experiment? How would you go about working out the effect size based on a previous experiment from the same lab, or on GEO? A more focused video would be amazing, but any advice on where to go from here would be really useful. Thank you.
RNA-seq power analysis is super simple because people have already worked out the variation for a variety of standard test subjects (humans, inbred mice etc). You just google "power analysis rna-seq" and all kinds of good stuff comes up. For example, cqs-vumc.shinyapps.io/rnaseqsamplesizeweb/ or scotty.genetics.utah.edu/
@@statquest thank you - very helpful!
What a way to explain the things😍
Thank you! :)
Great video, I'm still a bit confused about how having different sample sizes for the two groups would affect the calculations
Here's something that might help: stats.stackexchange.com/questions/108079/can-i-do-a-t-test-power-analysis-for-unequal-size-groups-which-produces-2-differ
@@statquest thanks!! that was a useful link
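For anyone using statsmodels, a small sketch of the unequal-group-size case; the effect size of 0.8 is a placeholder value:

```python
# Sketch: power analysis with unequal group sizes via the 'ratio'
# argument (nobs2 = nobs1 * ratio). The effect size is a placeholder.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_group1 = analysis.solve_power(
    effect_size=0.8, alpha=0.05, power=0.8, ratio=2.0  # group 2 twice as big
)
print(f"group 1: ~{n_group1:.1f} subjects, group 2: ~{2 * n_group1:.1f}")
```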
Great, easy, and simple video. Could you provide a reference for the formula you used to calculate Cohen's d? Thanks!
en.wikipedia.org/wiki/Effect_size NOTE: The pooled standard deviation is simplified to assume the same number of measurements from both categories.
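Written out, the equal-n simplification mentioned in the note above is:

```latex
d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{s_1^2 + s_2^2}{2}}
```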
Excellent explanation...
Thank you! :)
Thanks for the video :) I have 2 questions:
1. If I cannot find any published data for the effect size, what sample size should I use to generate preliminary data?
2. Is the threshold of significance (alpha) = the p-value?
1) Even if there is no published data, there is probably a standard sample size used within your field. Or just take a guess.
2) Yes
Thank you for the excellent video. If I may, I would like to ask you a simple question. In order to calculate the effect size, you need to know s, which depends on the sample size (in the video it is 10). Then you use this to estimate the "sample size", which is 9. What happens if the two are very different? This could happen in 2 cases: 1. the starting sample size >> the calculated one, and 2. vice versa.
In theory, 's' (the standard deviation) does not actually depend on the sample size. We can get it from any source or just guess based on previous experience. Thus, if the output sample size is very different, that's OK.
@@statquest Thank you very much for your answer.
Hi, very useful material and well explained. Thanks. What should the power be if I want to accept the null hypothesis? What should the overlap and sample size be in this case?
Unfortunately, traditional statistics is not designed to accept the null. The best we can do is to "fail to reject the null". You can make this failure relatively convincing if you select a relatively high power (like 95%) for rejecting the null and still fail.
@@statquest I have subjects interviewed at time 0, and at the same time each subject gives ratings for the questions asked via an app. These measurements are then repeated at intervals. I want to show that there is no statistical difference between the interview ratings and the self-ratings via the app. I need to find the sample size for this analysis. What should the power, alpha, and effect size be to find the sample size? I appreciate your help. Thank you
You're the best, boss!
Wow, thanks!
Thanks for the great video. In your example you said 9 measurements per group is enough. What does "group" mean in correlation studies? Do we have more than one group in these studies?
What time point in the video, minutes and seconds, are you asking about?
@@statquest Thanks for your quick reply. I was asking about 14:59.
@@Mostafaseyyedabadi If you're just trying to establish a correlation, you just have one group. You can determine the sample size using a sample size calculator for regression. For example, you could try this one: www.statskingdom.com/sample_size_regression.html
I have watched a lot of your videos, and I feel that at times, in the name of simplicity, you are diluting the concepts. Like in this one: I waited 15 minutes for you to tell me to go use the calculator?
I can assure you that I don't dilute the concepts. That said, every single statistical test has a different equation required to calculate power. So the important thing isn't to memorize every single test, but instead to remember the main ideas behind the equations, which is what this video covers.
Hi Josh, thank you for the nice video! Quick question: in some online calculators I find something called "minimum detectable effect", for which it seems I can pick ANY random number during the sample size calculation. Is it exactly the same as the "effect size" you mention here, from the definition perspective? I'm asking because your effect size seems to be a FIXED number.
I believe that the "minimum detectable effect" tells you how small an effect size your sample will be able to detect.
Awesome channel. Thank you for what you do
Glad you enjoy it!
I discovered these videos a while ago and they're really great for understanding various statistical concepts. I need to do some power analyses on non-parametric data - is that even a thing? Is there even a variability estimate for a Mann-Whitney two-sample test? Thanks.
Power estimates for non-parametric models exist. Just google "mann whitney u test power calculation"
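If a ready-made calculator is hard to find, one generic option is simulation: pick assumed distributions, simulate many experiments, and count how often the test comes out significant. A hedged sketch, with arbitrary assumed distributions and sample size:

```python
# Simulation-based power sketch for a Mann-Whitney U test.
# The assumed distributions and sample size are arbitrary guesses.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_per_group, n_sims, alpha = 15, 2_000, 0.05

significant = 0
for _ in range(n_sims):
    a = rng.exponential(scale=1.0, size=n_per_group)  # assumed group A
    b = rng.exponential(scale=2.0, size=n_per_group)  # assumed group B
    _, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    if p < alpha:
        significant += 1

print(f"estimated power ≈ {significant / n_sims:.2f}")
```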
As easy as it can be! Thanks...
You're welcome!
Dear Dr. Starmer.
Imagine I run a pilot test with 5 subjects in each group to get estimates of the mean and standard deviation for each group, so that I can run a power analysis. The result of the power analysis suggests that I should run the test with at least 15 subjects in each group. Question: do I need to measure 15 new subjects for each group, or can I reuse the data that I already have (5 subjects in each group) from the pilot test? Is this statistically fine, or is it considered p-hacking? And is it a strict no-go, or not recommended but still OK?
In some research fields, getting data from even a few subjects is very time consuming and expensive, so discarding it must be well justified. Could you share your thoughts about this? Thank you so much, and thanks for a great stats course!
Ideally you would get new data. Imagine I collected 3 measurements per group, but those measurements were extreme compared to what I would normally get, and there was no real difference. The power analysis would then tell me that I only need to collect 3 samples per group. If I did not collect new data, I would get a false positive.
Thank you for the great video! I have one question though: at 1:51, in the normal distribution graph, if the variable is days to recovery, which can't be lower than 0, then wouldn't the estimated population probability distribution follow a Poisson distribution?
Apologies in advance if I am mixing up the concepts!
Believe it or not, the normal distribution is often used to approximate the poisson distribution. The idea is that if the means are far enough from 0, then it's a good fit. This is due to the central limit theorem. For details, see: online.stat.psu.edu/stat414/lesson/28/28.2
@@statquest I understand the explanation, thank you!!
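A quick numerical check of that approximation (the mean of 30 is an arbitrary choice):

```python
# Check: for a mean far from 0, the Poisson CDF is close to a normal CDF
# with mean lam and sd sqrt(lam). The lam = 30 here is an arbitrary choice.
from scipy import stats

lam = 30
for x in (20, 25, 30, 35, 40):
    p_pois = stats.poisson.cdf(x, mu=lam)
    p_norm = stats.norm.cdf(x + 0.5, loc=lam, scale=lam**0.5)  # continuity corr.
    print(f"x = {x}: Poisson {p_pois:.3f} vs Normal {p_norm:.3f}")
```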
Thanks for the great video! Do I understand correctly that the sample size determined from the power analysis depends on the means and variances estimated from the first experiment? How do I deal with the fact that this first experiment could produce estimates of the means and variances that are far from the population values?
You have to start with some general sense of the variation in the data. It doesn't have to be perfect, but it's the best you can do.
Thanks for your illustration. But to understand it better: you explained it with an example of a two-sample test. What if we run a one-sample test?
It's the same - even with just one sample, the sample has a distribution that may or may not be the same as an ideal distribution that we are comparing it to.
Thanks for this video
I now get the true meaning of "effect size"
I am unable to find the sample size calculator. I am learning stats to work in SPSS, but I was curious how you got the value 9 in the example.
Just google "sample size calculator" and you should be good to go.
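For the curious, here is roughly what those calculators compute for a two-sample comparison (normal-approximation formula); the effect size of 1.33 is a made-up input, not necessarily the value used in the video.

```python
# Sketch of the normal-approximation formula behind many sample size
# calculators. The effect size d = 1.33 is a made-up input.
from scipy.stats import norm

alpha, power, d = 0.05, 0.80, 1.33
z_alpha = norm.ppf(1 - alpha / 2)  # ≈ 1.96
z_beta = norm.ppf(power)           # ≈ 0.84
n_per_group = 2 * (z_alpha + z_beta) ** 2 / d**2
print(f"~{n_per_group:.1f} per group")  # ≈ 8.9, which rounds up to 9
```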
Thanks for the amazing video!
Could you please help me understand how to know if the difference is significant when the data are imbalanced and the distributions severely overlap?
If the p-value is less than your threshold for significance, then it is significant.
@@statquest Wow, that was quick! Thanks! One last thing, umm... is there any book you have written on applied statistics, or one you recommend, please? I mostly need it for machine learning applications. P.S. Embedding your songs into your book would be something!!
@@priyamvadabhardwaj6331 I'm writing a book on machine learning that has some basic statistics, but it won't be out for another year. Unfortunately I don't know of any other good book.
@@statquest Looking forward to it!! Thanks and stay safe!
It was a very, very, very easy explanation
Thanks!
Great explanation! Thank you.
Thanks!
Thank you Josh! I have a question. Could you collect a sample from Drug A and Drug B and use the two samples to calculate an Estimated Mean and Standard Deviation in order to calculate the Sample Size required for future research?
Yep!
@@statquest BAM!!! Thanks, Josh!
11:46 this guy's humor is just smth else
what a shameless guy!
:)
Hi!! This is such a great video, thank you :) Is it the same if one wants to perform a power analysis for a repeated (test-retest) measure?
I don't know that off the top of my head, but presumably it is...(but I'm not sure).
Great video! But what about dichotomous outcomes, which are very common in clinical trials?
The main ideas are the same for dichotomous outcomes, however, the details might be a little different. So I recommend you start by finding a power calculating program for the test you want to use.