@@statquest If you do not mind, I'll shoot my question here :) To begin with, I am a model validator, and one of our tasks is to ensure that a model works as expected and is fit for business purposes. To do so, back-testing is typically performed to check the model performance. In a nutshell and in simple language, we have the following problem: A financial model generates thresholds at a confidence level of 90 percent. In order to check the model performance, it is important to count the number of defects over a given period, which is usually 250 working days (i.e., one year). A defect is defined as below: a defect occurs if the relative market movement over 10 days is greater than the threshold, in other words: log(P_{t+10} / P_{t}) > v_t, where t = 1, 2, ..., 240, P_{t} is the market price at time t, and v_t stands for the threshold that comes out of the model. Note that the market movements are obtained on a rolling basis, so we have overlapping intervals. If we believe that the model works well, then one can expect that the number of defects observed over 240 days should be 2.4 ~ 3 violations, because at the 90 percent confidence level there is a 10 percent chance of observing defects, i.e., 240*0.01 = 2.4. Now let's consider the hypothesis test that needs to be done in order to back-test the model: Null hypothesis: p = 0.01. Alternative hypothesis: p > 0.01, where p is the probability of a defect. Under the null hypothesis, the model works as expected because the probability of a defect is 1%, which is acceptable at the confidence level of 90 percent. Here are the steps taken to back-test the model: 1) Compute the spread, which is the difference between the market movement and the threshold, i.e., Spread = log(P_{t+10} / P_{t}) - v_t. 2) Generate 1000 synthetic samples, each of size 240, from the original spreads while preserving the dependency structure; for example, the Maximum Entropy Bootstrap approach is applied in this stage. 3) Count the number of positive spreads (indicating defects) for each synthetically generated sample. 4) Obtain the defect ratio for each synthetically generated sample using (#defects)/240. 5) Use the distribution of the generated defect ratios (i.e., the probability of defect) to find the p-value corresponding to the above hypothesis test. So, using p*_1, p*_2, ..., p*_1000 we calculate the following probability: p-value = P_H0( p > 0.01 ), which is approximated based on the distribution of p*_1, p*_2, ..., p*_1000. My question: here the quantity under consideration is the probability of a defect, or we could consider the defect rate. If the observed defect rate in the original data set is greater or less than 0.01, then we need to apply a transformation, like what you did for the mean where you shifted the data to get a mean of zero, so that the ratio equals 0.01, and then generate samples from spreads for which the defect ratio is 0.01 to compute the probability of being greater than 0.01 under the alternative hypothesis, right?
@@statquest It is fine; however, I already asked my question and I think it is interesting to take into account. Feel free to answer it. Thank you for your time.
Perhaps because of the different ways of thinking between East and West, as an Asian I find it easier to understand without switching to a mean of zero, instead treating "the drug has no effect" as -0.5, although doing so is somewhat inconsistent with the null hypothesis method. Good tutorial. There is another problem: in the example the probability between -0.5 and 0.5 is 0.36, the probability of less than -0.5 is 0.16, and the probability of greater than 0.5 is 0.47, which seems a bit contradictory for bootstrapping done under the null hypothesis. If we bootstrap enough times, shouldn't the probability of less than -0.5 and the probability of greater than 0.5 be equal?
Dear Josh, at time point 4:07 the probability of less than or equal to -0.5 is 0.16, and at time point 4:10 the probability of greater than or equal to 0.5 is 0.48. Is this a reasonable example? If we bootstrap enough times, shouldn't 0.16 equal 0.48? In addition, why can't the paper version of the book be shipped to China? I bought it in Japan and had it transferred from Japan to China. @@statquest
@@SunSan1989 My guess is that they will probably meet in the middle. As for my book, there should be a Chinese version (and translation) available in the next year. People are working on it.
Sorry, since my English is not very good, I want to confirm my understanding: should 0.16 end up being the same value as 0.48? Is this understanding correct? @@statquest
@@statquest A cross-tab (anyone who uses SPSS will know it) is a cross table of two variables, such as gender and healthy (yes or no), so you end up with 4 groups. I want to know if I can consider each group as an independent group and calculate a CI as normal.
At 2:29 I say that we shift the data to the left by 0.5 units (where 0.5 is the mean of the data). That means we subtract 0.5 from each value in the dataset.
@@statquest But why, Josh? If you have the bootstrap distribution and you calculate the 95% confidence interval, you can say whether the hypothesis can be rejected or not, right? If 0 is inside it, then it can't be rejected. So why shift the data if it doesn't matter?
@@drachenschlachter6946 Because this video is talking about how to calculate p-values, not confidence intervals. The first bootstrapping video describes confidence intervals (and does not require shifting the data): th-cam.com/video/Xz0x-8-cgaQ/w-d-xo.html
Let's say we don't look at the p-values and instead see that the 95% confidence interval crosses 0 at 5:41. Then can't we say that the majority of means cross 0, and therefore that the drug has been helping recovery instead of having no effect? I mean, from a confidence interval point of view.
This example is not great for discussing CIs because we shifted the data to be centered on 0. If we wanted to calculate a CI, we would do this: th-cam.com/video/Xz0x-8-cgaQ/w-d-xo.html
I cannot understand why we care about the region around -0.5. Given data with mean 0.5 and variance v, how likely are we to see this data if the true mean is 0? Let's assume the data comes from a normal distribution N: p-value = P(mean >= 0.5 | N(0, v)). If p-value < 0.05: reject H0. If p-value > 0.05: it is likely that H0 is true => cannot reject H0. Where is the role of -0.5 here?
It's a good question. The answer, I believe, is "power". Bootstrapping works in all kinds of situations, but (I believe) it has less power than parametric methods.
@@statquest That's a really good question, dear Josh. Can you make a video about the differences in power? Thank you for the tutorial. I appreciate it very much.
There is no reason to subtract the mean of the distribution before bootstrapping and then add it back later. Just bootstrap the original data and see where the original mean falls in the generated histogram.
I shifted the data because the null hypothesis is that the "true mean" is 0 and it's helpful to see how the distribution would be distributed around 0 in that case.
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
I started watching these videos to prepare for my Introduction to Machine Learning exam but now I just watch them because it's fun to learn about it when it is so well explained. Thank you for your effort!
Awesome!!! BAM! :)
I have really no words to express how incredibly amazing, clear and enlightening your videos are, you transform the historically "hard and complex" concepts into kids' games, it is astoundingly magnificent, almost majestic. Thank you, really, from the bottom of my heart. You deserve a whole university named after you.
Thank you so much 😀
@@statquest I just came across this, Josh. Do you have any explanation for why this happens? stats.stackexchange.com/questions/535343/bootstrapped-mean-always-almost-identical-to-sample-mean
@@insertacoin738 Yes, I do. First, keep in mind that the person who posted that is calculating the mean of the bootstrapped means. And this mean of means is very similar to the original mean. In other words, the mean of the histogram that bootstrapping created is centered on the mean of the original data. That's to be expected. Bootstrapping works because the sample distribution is an estimate (not an exact copy) of the population distribution. This estimate gets better as the sample size increases.
The way you explain the bootstrap is so good. It makes it simpler for everyone.
Thank you! :)
I really appreciate how you start off with a simple application example and then you build up from there with explanations and real time drawings. Lots of times when I read about concepts, they start more abstract or from theory, and that makes it less intuitive.
Thank you! I'm glad you appreciate my style.
Thank you! I've never really realized the power of bootstrapping until watching your 'Quests. Great stuff 👍👍
Thank you very much! :)
You just saved my work report, keep it up man.
Glad I could help!
Thanks a lot on the explanation. I was confused on how to create a simulated distribution for calculating p-value and this video explains really well. Shifting the data to a mean of zero before resampling is the key!
Glad it was helpful!
Literal black magic.
Cheers so much for making this. I had some data that was a pain in the butt to get and I'm trying to pull all I can out of it; this really helped!
Thanks!
This channel deserves at least a million subscribers!
Thank you! :)
Hi Prof. Starmer, thank you so much for your videos. They are well explained and fun. Your work is much appreciated. All the best!!! Yen :)
Thank you!
It's very easy to understand! Super explanation
Thank you
Thanks!
your work is amazing
Thank you! :)
This is exactly what I was looking for, thank you!
Great to hear!
Damn, that's some good quality here! Hope to see more videos!
Thanks!
Simply awesome !
Thank you!
Awesome video Josh! Really well explained, as usual. I was curious as to how the data is shifted (e.g. what function is applied) so that you can get from your original mean, to a mean of zero. Otherwise I think I understood everything!
BAM! :) We just subtract the original mean value from all of the original values to shift the data.
@@statquest Haha, I should've thought of that! Thanks Josh!
I'm just following your Fundamentals playlist in order. My first encounter with statistics ever. Thank you so much for putting it together!! Can you recommend any collection of beginner stat problems to practice on? It would help to learn tremendously.
Also, thank you for stripping away most of the terminology! Can't imagine learning this from a regular lecture or a textbook, ugh.
I'm glad you are enjoying the video. I have a few "beginner" stat problems here statquest.org/video-index/ (just search for "StatTest")
@@statquest Awesomeness, thank you!
Great video, as always. I so admire your work, your knowledge, and your ability to make concepts understandable.
What if we are interested in comparing some statistic between two different groups?
For example, the mean difference between two groups:
- Calculate the difference of the two group means
- Bootstrap each group by itself
- Calculate the bootstrap mean difference and subtract the observed mean difference
- Repeat to obtain the bootstrap mean difference under the null hypothesis of no mean difference?
Does that make sense?
Thank you so much
Here's a discussion on how to use the bootstrap to compare two means: stats.stackexchange.com/questions/92542/how-to-perform-a-bootstrap-test-to-compare-the-means-of-two-samples
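For illustration, here is a minimal Python sketch of one common version of that idea (in the spirit of the procedure sketched above and the linked discussion): shift both groups so they share the pooled mean (imposing the null hypothesis of no difference), resample each group separately, and compare the bootstrapped differences to the observed difference. The group values below are made-up placeholders.
import numpy as np
rng = np.random.default_rng(1)
# Hypothetical measurements for two groups (placeholder values).
a = np.array([1.2, 0.4, 2.1, 1.8, 0.9, 1.5])
b = np.array([0.2, -0.5, 0.7, 0.1, 0.4, -0.2])
observed_diff = a.mean() - b.mean()
# Impose the null hypothesis: give both groups the same (pooled) mean.
pooled_mean = np.concatenate([a, b]).mean()
a_null = a - a.mean() + pooled_mean
b_null = b - b.mean() + pooled_mean
# Bootstrap each group by itself and record the difference in means each time.
n_boot = 10_000
boot_diffs = np.empty(n_boot)
for i in range(n_boot):
    a_star = rng.choice(a_null, size=a_null.size, replace=True)
    b_star = rng.choice(b_null, size=b_null.size, replace=True)
    boot_diffs[i] = a_star.mean() - b_star.mean()
# Two-sided p-value: how often is the "null world" difference at least as
# extreme as the difference we actually observed?
p_value = np.mean(np.abs(boot_diffs) >= abs(observed_diff))
print(observed_diff, p_value)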
Thank you for sharing your knowledge. This video is helpful for me.
Glad it was helpful!
Amazing video as usual! I just was wondering why the value of 0.05 was used as a threshold for the p-value. Was it just arbitrarily set or did we assume that it was meaningful for our experiment with the drug ?
I explain p-value thresholds here: th-cam.com/video/vemZtEM63GY/w-d-xo.html
I finally did it for my real problem case.
bam! :)
Thanks for your work again. Another question that comes to mind: 😊 Does testing the null hypothesis always involve centering the dataset so that the mean is zero? I mean, there are common real-world cases where the mean (or whatever statistic we use) is not equal to 0 even when there is no effect / no difference between means, right?
Also, does the t-test function in R or Python also shift the dataset so that it has a mean of 0 (centered on the null hypothesis) and repeat bootstrap-like simulations many times to calculate the p-value? Or does it use a different method?
You don't have to center the data to get the p-value - it just makes it easier to visualize and interpret.
@@statquest So, technically and generally in statistics, as well as in R/Python functions, calculating a p-value (which of course involves null hypothesis testing) does not necessarily use this centering of the data to 0 in its mechanism, and calculating it with the bootstrap method is just one approach that is not part of the default R/Python functions for t-tests or linear models? Or does the classic p-value calculation also use the bootstrap?
@@marcingrzebalski103 that's correct and analytical methods do not use bootstrapping.
Thank you so much for the amazing video! I really appreciate the effort you put into it. I have a question, and I hope it makes sense. From previous videos, I've learned that we should conduct a power analysis to determine the necessary sample size before starting an experiment. My question is: if our sample size is smaller than the power analysis suggests, can we use bootstrapping in this case? Or could bootstrapping serve as a replacement for power analysis altogether? Alternatively, should bootstrapping only be applied after we've collected the sample size recommended by power analysis, but we still need to draw firm conclusions?
Bootstrapping can't replace getting enough measurements to begin with. Bootstrapping is only useful for calculating p-values when there isn't already an easy way to do that calculation. So, instead of dumping your data into a t-test to calculate the p-value, you dump it into a bootstrapping algorithm to calculate the p-value.
@@statquest I can't thank you enough.
Hi Josh, thank you for the great video. I had a question at 4:57. Why do you look at the probabilities of observing means ≤ or ≥ ±0.5 in the bootstrap distribution?
Are you already familiar with p-values? If not, check out these two videos: th-cam.com/video/vemZtEM63GY/w-d-xo.html and th-cam.com/video/JQc3yx0-Q9E/w-d-xo.html I believe those will answer your question.
Must we always consider both tails when calculating a p-value from bootstrapping? Had we looked at the medians and only considered the right tail, that would have been significant (at 0.05) and we could reject H0. Or did we assume that Ha was not equal to zero and therefore use a two-tailed test?
You don't always need to use two-tailed p-values. However, I think it is almost always a mistake to not use two-tailed p-values. Not once in my career as a biostatistician did I use a single sided test. If you want to know why, see: th-cam.com/video/JQc3yx0-Q9E/w-d-xo.html
Hey, really nice video! I am wondering why the p-value is calculated by adding up the proportions of values that are farther than the observed value on either side, instead of just one side (in the case of an observed mean of 0.5, just the proportion of means >= 0.5)?
It's because two-sided p-values are almost always better than 1-sided p-values. To understand why, see: th-cam.com/video/JQc3yx0-Q9E/w-d-xo.html
Great video! Is this the working principles of "Particle Filters"/"Sequential Monte Carlo"?
I have no idea. I've never heard of those things before. :(
great dude, keep rocking!
Thanks, will do!
great Video!! thank you so much. 🌻🌻
would you please make some other videos about Wild Bootstrapping?
I'll keep that in mind.
Josh just made part 2 so he could sing "part 2... calculate p-values". This is a gem!
:)
Clarification needed: failing to reject the hypothesis that the drug has 0 effect means that we don't reject the null hypothesis and that the experiment is not statistically significant? Does this therefore mean that we cannot conclude whether the drug is effective or not? Or that the drug is not effective?
It means that we do not have enough evidence to exclude that the drug has no effect. Or in other words we can't conclude that the drug is effective.
@@v0ldelord BAM! :)
To learn more about hypothesis testing, check out th-cam.com/video/0oc49DyA3hU/w-d-xo.html
Thanks a lot for the fantastic video. I have one question: if the test statistic is something other than the mean, for example, if we want to see whether the slope of a trend analysis passes 0 or not, how can we scale the dataset so that it represents the null hypothesis? In the video, we simply shift the samples 0.5 units to the left because we are bootstrapping mean values. But if we have some samples for which we want to bootstrap the slope of the trend, should we de-trend the data first and then bootstrap? Thanks a lot
In this video, I shifted the values just because it made the math more obvious. However, you don't need to do that, and you can just calculate the p-value of whatever value you want with the raw (unshifted) histogram. Or you can create a confidence interval.
Hi Josh, Big fan of your videos (and merchandise)! They are incredibly helpful :)
Could you please also do a series on running models in Bayesian framework?
Yes, that's a plan.
@@statquest That would be a TRIPLE BAM! Looking forward :)
OMG, this topic looked too complicated to learn, but you make it so easy.
BAM! :)
Really informative, thank you so much for uploading
Thanks!
Hi Josh, thanks as always! At 5:03, the probability of observing a bootstrapped mean >= 0.5 is 0.48, not 0.47, according to the previous calculation, maybe?
yep. you found a typo.
Hey Josh, when computing the p-value for medians, are you assuming that they come from a normal distribution (or at least one symmetrical around 0)? If so, why?
No, I don't make any assumptions about the bootstrapped distribution.
Hi Josh, may I know the reason why the p-value is calculated as two-sided?
Because 99 times out of a 100 you always want a two-sided p-value. For details, see: th-cam.com/video/JQc3yx0-Q9E/w-d-xo.html
God bless you, mister
Thanks!
Great video and I understood your procedure perfectly.
I just believe that the step of shifting 0.5 to the left and redoing the bootstrap with the mean at 0 is not strictly necessary (except for ease of understanding).
I think that instead of redoing the shifted bootstrap, it would be enough in the original bootstrap to take the probability of above 1.0 plus the probability below 0.0.
Taking those in the original bootstrap would correspond, respectively, to the probabilities above 0.5 and below -0.5 obtained after shifting 0.5 to the left.
Am I wrong?
Another point is that at 4:11, the probability above 0.5 was 48%, but at 5:04 to get the p-value you used 47%.
That is correct
Thank you for the amazing explanation; still, I am a little confused. At 4:10 of the video you have the probability for a mean >= 0.5 as 0.48, and at 5:02 of the video the probability for a mean >= 0.5 becomes 0.47... How is that? And for the median: how do you get the probability for a median >= 1.8 as 0.01? How is that calculated when the bootstrapped distribution of medians does not go beyond ~0.5 units? Isn't the calculated probability simply the portion of the distribution beyond the given value (like 1.8 for the median in our example)? What do I miss?
1) That's just a minor typo.
2) We count the number of bootstrapped generated medians >= 1.8 and divide by the total number of bootstrapped generated medians.
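For illustration, a minimal Python sketch of that counting step; the boot_medians array here is just a placeholder standing in for the medians produced by the bootstrap loop.
import numpy as np
# Placeholder for the bootstrapped medians (one median per bootstrapped sample).
boot_medians = np.array([0.1, -0.3, 1.9, 0.0, 2.1, -0.2, 0.4, 1.8, -0.1, 0.3])
# Count the bootstrapped medians >= 1.8 and divide by the total number of medians.
p_right_tail = np.sum(boot_medians >= 1.8) / boot_medians.size
print(p_right_tail)  # 0.3 for this placeholder array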
@@statquest Thanks!
Thanks a lot. Really nice video. I have a question about the number of replicates when doing the bootstrapping. Is this related to the sample size?
In a small way it is dependent on the sample size (if the sample size is small, there are only so many different bootstrapped samples you can create).
Great video and helpful examples! What do you do when you're testing the median (with H0: median = 0; HA: median not 0), and the observed median is 0? As there is no shift, I'm thinking the p-value is 1.000 (as all of the bootstrapped medians are either >= 0 or <= 0).
Yes, exactly.
Thanks so much! These videos are really great. I was wondering if you will make one on Mixed ANOVAs? :-) Your explanations really help to understand the concepts quickly.
One day I hope to.
Hi Josh, do you have to make assumptions about normality of the data? Or does bootstrapping work for parametric and non parametric cases (because of the central limit theorem)? Thank you for another informative video!
Bootstrapping makes no assumptions about the data.
Great video, thanks
Thank you! :)
Hey Josh, thanks for this awesome video!!
Do you know of any reference (paper, handbook chapter etc.) that shows the asymptotic validity of the approach you are using?
Best, Sebastian
Here's a great place to start if you want to learn more details: en.wikipedia.org/wiki/Bootstrapping_(statistics)
Hi Josh, and thanks for the overview. I have been using bootstrapping for quite some time now, but not to look at p-values for just one data set. What you describe is, more or less, a different kind of t-test, right?
I am using bootstrapping for determining confidence intervals, but also to compare two datasets, e.g., I use two models to predict data and compare the models' performance with bootstrapping.
For example, is the root-mean-squared prediction error (RMSE) larger in data set A in comparison to data set B?
When repeating this (e.g.) 1000 times, each time comparing the RMSEs, I get a p-value from these comparisons.
--> Model A performed better than model B in 990 of 1000 comparisons --> p = 0.99 (or 0.01)
I hope this was understandable.
What are your thoughts on this application of bootstrapping?
This example is like a one-sample t-test (without having to refer to the t-distribution). Your experiment is a little confusing. You have data sets A and B and also models A and B, so I don't know what you are comparing.
@@statquest Thanks, and I'll try to explain a bit more: I have data that I measured (in my case those are Speech Recognition Thresholds, i.e., the signal-to-noise ratio at which 50% of spoken words can be understood in a noisy environment; I hope this is not getting too abstract). I want to simulate this data with different models and determine which model is better (e.g., model A and model B).
To figure out which model is better, I create a bootstrapped data set of the measured data and calculate the RMSE for both model simulations. Let's say the RMSE for the bootstrapped data set is 1 for model A and 2 for model B. I compare these values and count how often the RMSE of model A was lower than the RMSE of model B:
--> For this first comparison, I count 1.
Second run: RMSE of model A is 1.5, RMSE of model B is 1.4
--> I do not count this (1 of 2 comparisons indicate that the RMSE of model A is lower than the RMSE of model B)
When repeating this procedure 1000 times, 990 of the comparisons showed that model A has a lower RMSE, and in 10 comparisons model B had a lower RMSE.
I consider this to yield a p-value of 0.99 (which is effectively a p-value of 0.01).
I hope you find this interesting, and I would be happy to get your thoughts on this application of bootstrapping.
@@DrMcZombie You've calculated a probability, which is part of a p-value, but not a p-value. A p-value is the probability of the observed result or data plus the probabilities of all results that are more extreme. For details, see: th-cam.com/video/JQc3yx0-Q9E/w-d-xo.html
So, here's what you should do (or consider doing):
0) The null hypothesis is that there is no difference between models A and B. This means that we would expect the difference in RMSE to be 0 between models A and B.
1) Bootstrap your data, run it through your models and make a histogram of differences in RMSE.
2) Draw a 95% CI between the 2.5% quantile and the 97.5% quantile of that histogram
3) Does that CI include 0? If so, fail to reject the hypothesis that models A and B are the same. If not, reject the hypothesis that models A and B are the same. Bam.
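For illustration, a minimal Python sketch of those three steps; the measured data and the two model_a / model_b prediction functions below are made-up stand-ins, and the interval is the simple percentile CI described above.
import numpy as np
rng = np.random.default_rng(7)
# Made-up measurements and two stand-in "models" (hypothetical placeholders).
x = np.linspace(0, 10, 40)
measured = 2.0 * x + rng.normal(0.0, 1.0, size=x.size)
def model_a(x):
    return 2.0 * x            # pretend predictions from model A
def model_b(x):
    return 1.8 * x + 0.5      # pretend predictions from model B
def rmse(pred, obs):
    return np.sqrt(np.mean((pred - obs) ** 2))
# 1) Bootstrap the data and record the difference in RMSE for each resample.
n_boot = 10_000
diffs = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.integers(0, x.size, size=x.size)   # resample indices with replacement
    diffs[i] = (rmse(model_a(x[idx]), measured[idx])
                - rmse(model_b(x[idx]), measured[idx]))
# 2) 95% CI from the 2.5% and 97.5% quantiles of the bootstrapped differences.
ci_low, ci_high = np.quantile(diffs, [0.025, 0.975])
# 3) If the CI includes 0, fail to reject the hypothesis that A and B are the same.
print(ci_low, ci_high, ci_low <= 0 <= ci_high)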
@StatQuest with Josh Starmer Thank you for your reply, and I also see the point that you make. But just to clarify: Wouldn't this boil down to the "counting the comparisons approach"? (not with regard to the p-value, but just for failing to reject the null hypothesis)
When 10 of 1000 comparisons (1%) showed, that model A had a lower RMSE than model B, then the 95%-CI of the histogram of differences between the models would not include 0.
The CI would include 0 when 25 or more of 1000 comparisons (i.e. more than 2.5 % of the comparisons) would show that model A has a lower RMSE than model B.
Anyway, thanks, and I am looking forward to more of your great videos.
--> octave code example (e.g. use octave-online.net/):
% let's assume A and B are the RMSEs of two models.
% H1: A is significantly different from B (0 not in 95%-CI of the difference histogram)
% H0: A and B are the same (0 in 95%-CI)
A = randn(10000,1) + 3; % random numbers, mean = 3; std = 1;
B = randn(10000,1); % same, but mean = 0;
hist(A-B); % draw histogram
comparisons = sum(B > A) / numel(B);
CI = quantile(A-B,[0.025 0.975]);
printf('comparisons: %1.3f ; CI: [%1.3f %1.3f]\n', comparisons, CI);
% when CI does not include 0 --> H0 rejected, H1 true
It sounds like calculating p-values from bootstrapping can lend itself to p-hacking, if you find "the right" statistic that does lead to rejecting the null hypothesis because of some reason (e.g. being more or less sensitive to outliers). What do you think?
That's why for everything in statistics, you plan what you are going to do (what metric you are going to use etc.) before collecting data.
Great video mate
Thanks!
How do you use bootstrapping when you have several variables? For example, for a regression model.
How would you use it to test the standard deviation?
See: www.sciencedirect.com/science/article/abs/pii/S0167715217303450
the jingles are off the chain
bam!!!
Amazing video!
Any ideas on how to make bootstrapping run faster on python? It starts lagging once you are doing > 10^5 trials with large sample sizes.
Good question...I'm not really sure, but with a large sample size, you might be able to get away with doing less bootstrapping.
@@statquest Thanks! There is probably some library that does this efficiently. I was just curious about how one could be implemented, but it is something that can be learned at another point in time.
Thanks
BAM! Thank you so much for contributing to StatQuest!!!
Thank you soooooooooooooo much!
BAM! :)
Thank you!
Why do you calculate ±0.5 in the histogram and not only 0 to 0.5?
What time point in the video, minutes and seconds, are you asking about?
awesome!
Thank you!
Thanks for the very clear and informative description of this. I have a question - whenever the absolute value of the mean/median/statistic-of-interest of the original data is greater than the absolute value calculated from the shifted data, the p-value will be zero. I have a large set of tests to run and would like to do an FDR correction on the resultant set of p-values, but a not-insignificant number of them are zero. Is this still a legitimate thing to do?
I'm not sure I understand your problem, because each time you calculate a p-value you have to calculate the bootstrapped statistic. Are you saying that when the absolute value of every single bootstrapped statistic (and there should be > 10,000 of them) is greater than the original statistic, the p-value is 0? Well... if that is the case, and all 10,000 bootstrapped statistics are way far away from 0, then the p-value should be 0.
@@statquest Sorry, I probably didn't explain very well. For the shifted data, the largest possible mean of a bootstrap resample is just the largest value in the shifted data (which happens when it is chosen for every element of a resample). When the mean of the original unshifted data is larger than this, the p-value will be zero, regardless of the number of bootstrap resamples carried out. But this does not distinguish between cases when it is just a little bit larger, or very much larger. So if I have a lot of tests on independent data sets, I am concerned that the 'zero p-value' ones will be treated identically by the FDR procedure, when perhaps they shouldn't be??
@@willw234 Since you are just testing the mean, you might consider just using a one-sample t-test. Then your p-values will be more spread out.
@@statquest I will do that. I was just hoping to use the bootstrap so I could use the median instead of the mean. (btw I recently purchased your book on ML - very helpful, thank you!)
@@willw234 Awesome! Thank you!
Please reply!!
When you were calculating the p-value, I think we were supposed to find the p-value under the null hypothesis, and if that value is less than 0.05 we can reject the null hypothesis. But here you were calculating the p-value of observing a mean of 0.5 or something more extreme, and I don't think that is the null hypothesis. Then, if we get a p-value greater than 0.05 for observing a mean >= 0.5, that means we will often get a mean >= 0.5, which means the drug is having some effect. This is what I understood; can you explain?
In this video, the null hypothesis is that, on average, the drug has no effect (average effect = 0). We then use bootstrapping to calculate a p-value for this null hypothesis and we get 0.63, so we fail to reject the null hypothesis that the drug has no effect. In other words, there's a high likelihood that any random set of 8 people that have the disease will have, on average, an effect = 0.5.
Thank you so much
Wouldn't shifting the bootstrap distribution that was obtained from the original sample data be basically equivalent (for the purpose of calculating a pvalue) to the bootstrap null distribution?
Sure, either way.
Thank you for the awesome video. 1) How does this apply to comparing means from two different groups (ctrl/test)? 2) What if my measure is a proportion (%)? How can we apply this method?
1) see: stats.stackexchange.com/questions/128694/bootstrap-two-sample-t-test
2) see: online.stat.psu.edu/stat200/lesson/4/4.3/4.3.1
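For the proportion case, here is a minimal Python sketch of applying the same bootstrap idea: resample the raw 0/1 outcomes with replacement, compute the proportion on each resample, and build a percentile interval from the quantiles. The 0/1 outcomes below are placeholders.
import numpy as np
rng = np.random.default_rng(11)
# Placeholder binary outcomes (1 = success, 0 = failure).
outcomes = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
observed_prop = outcomes.mean()
# Bootstrap the proportion: resample the 0/1 outcomes with replacement.
n_boot = 10_000
boot_props = np.array([
    rng.choice(outcomes, size=outcomes.size, replace=True).mean()
    for _ in range(n_boot)
])
# 95% percentile confidence interval for the proportion.
ci_low, ci_high = np.quantile(boot_props, [0.025, 0.975])
print(observed_prop, ci_low, ci_high)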
@@statquest Thank you Josh!!
Sir, please explain convolutional neural networks. I'm eagerly waiting for your way of explaining it.
I've already done that, see: th-cam.com/video/CqOfi41LfDw/w-d-xo.html For a complete list of all of my videos, see: statquest.org/video-index/
@@statquest Yes sir, thank you for the reply, but in that playlist there is no CNN or RNN.
Hi Josh!
How do we calculate the critical value of the statistic in this case?
If, for example, alpha = 0.05, then you can incrementally add the tails of the histogram together until you get 0.05. The last parts of this histogram added define the critical values.
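For illustration, a minimal Python sketch of one simple reading of that: for a two-sided test, the critical values are the alpha/2 and 1 - alpha/2 quantiles of the bootstrapped statistics computed under the null. The boot_stats array below is just a placeholder.
import numpy as np
rng = np.random.default_rng(3)
# Placeholder for the bootstrapped statistics computed under the null hypothesis.
boot_stats = rng.normal(0.0, 0.8, size=10_000)
alpha = 0.05
# Adding the two tails together until they total alpha (split evenly between the
# tails) amounts to taking the alpha/2 and 1 - alpha/2 quantiles.
lower_crit, upper_crit = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
print(lower_crit, upper_crit)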
@@statquest Got it! Thank you!
The main question is, how is it different from simply running a t.test to see if the mean equals to 0 or not? Is there anything that bootstrapping adds to it? Originally I thought that bootstrapping might help for example to get tighter confidence intervals without the need to take more sample data in the field, but according to my tests which I made with boot library, the confidence intervals from the bootstrapped data are basically the same as the ones computed from the original data. Well, when I call boot.ci() they tend to be a little bit tighter, but I think it's because the t.test computation is probably a little more conservative (I guess).
The purpose of bootstrapping isn't to replace a t-test, or any other known statistical test. Those known tests will always perform better because they make assumptions about the data that bootstrapping does not, and that results in them having an edge. However, the magic with bootstrapping is that it can be used to calculate p-values or confidence intervals in any situation - including those that are not appropriate for t-tests or any other known test. For example, with bootstrapping we can compare medians or modes instead of means, and you can't do that with a t-test.
Is there any way to know how good this method is? I mean, comparing resampling with the established, known statistics?
Yes, the same theory that we use to trust "normal" statistics (like t-tests and what not) also applies to bootstrapping. In other words, the theory that allows you to put trust into a t-test also suggests we should put trust in bootstrapping.
I don't understand how you got the actual p-value number. For example, the p-value of 0.47: how was that calculated?
First off, the p-value is not 0.47, so that might be part of the problem. At 3:29 we have a histogram that tells us what would happen if the null hypothesis was true. Then at 3:36 we can calculate the percentage of means that were between -0.5 and 0.5 (this is just the number of means that we calculated that fell between -0.5 and 0.5 divided by the total number of means). This percentage was 36%, which also tells us that the probability of observing a mean between -0.5 and 0.5 is 0.36. Likewise, we then calculate the probability of observing a mean >= 0.5 plus the probability of observing a mean <= -0.5, and the sum of those two probabilities (0.63 in the video) is the p-value.
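For illustration, a minimal Python sketch of the whole calculation described in this thread; the eight measurements below are made-up placeholders, not the values from the video.
import numpy as np
rng = np.random.default_rng(42)
# Hypothetical measurements from 8 people given the drug (placeholder values).
data = np.array([-1.2, 0.3, 1.1, 2.4, -0.6, 0.8, 1.5, -0.3])
observed_mean = data.mean()
# Shift the data so its mean is 0, i.e. so it reflects the null hypothesis.
shifted = data - observed_mean
# Bootstrap: resample the shifted data with replacement and record each mean.
n_boot = 10_000
boot_means = np.array([
    rng.choice(shifted, size=shifted.size, replace=True).mean()
    for _ in range(n_boot)
])
# Two-sided p-value: the fraction of bootstrapped means at least as far from 0
# as the observed mean, in either direction.
p_value = np.mean(np.abs(boot_means) >= abs(observed_mean))
print(observed_mean, p_value)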
Hey, your videos are a treasure! I had a doubt: at 6:18, the histogram of medians doesn't look bell-shaped. This made me wonder whether the distribution of medians would be normal (like the distribution of means) or not. Could you please let us know?
The distribution of medians is not normally normal.
Is there a minimum sample size needed for bootstrap to be valid?
I think 8 might be a good starting point.
I wish I could have seen this video before my exam.
bam!
I don't understand why you use the shifted data to perform the bootstrap. What if you don't "know" the null hypothesis but just your sample?
You don't have to shift the data, it just makes the math easier.
1:54 - Since the 95% CI includes 0, we can't reject the null hypothesis (drug not working). Why? What does the inclusion of 0 in the CI have to do with null hypothesis rejection? I am confused.
Ps: I have studied all previous videos.
When the confidence interval contains 0, then we can't be confident that the true value is not 0, even though our estimate is not 0. In other words, there is enough variation in the data that we can't have a lot of confidence in the estimate we made with it.
So what happens exactly when we shift the data (so the mean will be 0)? Is there a formula for the data shift?
value - mean
but how would you do this for a test statistic (like a correlation coefficient), where creating a "null data set" from which to resample is not as straightforward as just mean-centering the data?
See: www.sciencedirect.com/science/article/abs/pii/S0167715217303450
TRIPLE BAM!!@@statquest
When to use permutation over bootstrap (and the other way around) to calculate P-values?
If you have a relatively small dataset, you can use permutation. If it's relatively large, then you can use bootstrap.
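As a rough, hedged sketch of the two approaches on a one-sample problem (Python/NumPy, made-up data; the "permutation" here is a sign-flip test, which is my choice of example and assumes the null distribution is symmetric around 0):

```python
import numpy as np

rng = np.random.default_rng(1)
data = np.array([0.8, 1.3, -0.2, 0.7, 0.4, 1.1, -0.5, 0.9])  # made-up sample
observed = data.mean()

# Permutation-style (sign-flip) p-value: under a symmetric null,
# each value's sign is equally likely to be + or -.
n_resamples = 10_000
perm_means = np.array([
    (data * rng.choice([-1, 1], size=len(data))).mean()
    for _ in range(n_resamples)
])
p_perm = np.mean(np.abs(perm_means) >= abs(observed))

# Bootstrap p-value: resample the mean-centered data with replacement.
shifted = data - observed
boot_means = np.array([
    rng.choice(shifted, size=len(shifted), replace=True).mean()
    for _ in range(n_resamples)
])
p_boot = np.mean(np.abs(boot_means) >= abs(observed))

print(f"permutation p = {p_perm:.3f}, bootstrap p = {p_boot:.3f}")
```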
@@statquest BAM!
where did the 0.05 come from at 5:33 ? thank you
0.05 is the standard threshold for hypothesis testing. For details, see: th-cam.com/video/vemZtEM63GY/w-d-xo.html
Hey Josh, thanks for this comprehensive explanation!
I'm a bit confused about why you need to add values greater than or equal to 0.5 but also values less than or equal to -0.5 for the p-value. Why can't I just look at values >= 0.5?
It is 0.05, actually. To reject the null hypothesis, the observed results must be rare, such that the probability of observing such results is less than 0.05.
In this video we calculate a two-sided p-value and I describe these, and the reasons for them, extensively in this StatQuest on p-values: th-cam.com/video/JQc3yx0-Q9E/w-d-xo.html
I wish I had more time for your videos. Not only are they high-standard pieces of higher education, but they are also a moment to relax and enjoy the day.
bam!
I'm sorry, where did 0.05 come from, in 5:29?
0.05 is the standard threshold that we use when we try to understand if a p-value is significant or not. Values less than 0.05 have "statistical significance" and values larger than 0.05 don't.For more details, see: th-cam.com/video/vemZtEM63GY/w-d-xo.html
@@statquest Bam :)
Hi Josh, I have a question, how I can contact you and ask my question?
If you have a question about my videos, the best place to ask it is right here, in the comments.
@@statquest Yes, but I need to write a bit of narrative to clarify my question related to Bootstrap but not particularly your nice video. I am a risk analyst working at a company and also doing my PhD in the field of actuarial science. We recently encountered an issue related to a model being used at the company.
@@saeidsas2113 Unfortunately I don't have time to do much consulting work. :(
@@statquest If you do not mind, I'll shoot my question here :) To begin with, I am a model validator, and one of our tasks is to ensure that a model works as expected and is fit for business purposes. To do so, back-testing is typically performed to check the model's performance. In a nutshell and in simple language, we have the following problem:
A financial model generates thresholds at a confidence level of 90 percent. In order to check the model's performance, it is important to count the number of defects over a given period, which is usually 250 working days (i.e., one year). A defect is defined as below:
A defect occurs if the relative market movement in 10 days is greater than the threshold, in other words:
log(P_{t+10} / P_t) > v_t, where t = 1, 2, ..., 240, P_t is the market price at time t, and v_t stands for the threshold that comes out of the model. Note that the market movements are obtained on a rolling basis, so we have overlapping intervals. If we believe that the model works well, then one can expect the number of defects observed over 240 days to be about 2.4 ~ 3 violations, because at the confidence level of 90 percent there is only a 10 percent chance of observing defects, i.e., 240*0.01 = 2.4.
Now let's consider the test hypothesis that needs to be done in order to back-test the model:
Null hypothesis: p = 0.01
Alternative hypothesis: p > 0.01
where p is the probability of a defect. Under the null hypothesis, the model works as expected because the probability of a defect is 1%, which is acceptable at the confidence level of 90 percent. Here are the steps taken to back-test the model:
1) Compute the spread which is the difference between the market movement and threshold, i.e., Spread = log(P_{t+10} /P_{t}) - v_t
2) Generate 1000 synthetic samples each with size 240 from the original spreads while preserving the dependency structure, for example, the Maximum Entropy Bootstrap approach is applied in this stage.
3) Count the number of positive spreads (indicating defects) for each synthetically generated sample.
4) Obtain the defect ratio for each synthetically generated sample using (#defects)/240.
5) Use the distribution of the generated defect ratios (i.e., the probability of defect) to find the p-value corresponding to the above hypothesis test. So, using p*_1, p*_2, ..., p*_1000 we calculate the following probability:
p-value = P_H0( p > 0.01 ), which is approximated based on the distribution of p*_1, p*_2, ..., p*_1000.
My question: Here the quantity under consideration is the probability of a defect, or we could consider the defect rate. If the observed defect rate in the original data set is greater or less than 0.01, then we need to apply a transformation (like what you did for the mean, where you shifted the data to get a mean of zero) so that the defect ratio equals 0.01, and then generate samples from the spreads for which the defect ratio is 0.01 in order to compute the probability of it being greater than 0.01 under the alternative hypothesis, right?
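Not an answer to the question, but here is a minimal Python/NumPy sketch of how the listed steps 1-5 could look, with two big caveats: the prices and thresholds are made up, and a plain i.i.d. bootstrap stands in for the Maximum Entropy Bootstrap, so the dependency structure of the overlapping windows is NOT preserved:

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up inputs: 250 daily prices and 240 model thresholds.
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=250)))
thresholds = rng.normal(0.03, 0.005, size=240)

# 1) Spreads: 10-day log return minus the model threshold.
log_returns_10d = np.log(prices[10:250] / prices[0:240])
spreads = log_returns_10d - thresholds

# 2) Resample the spreads (plain i.i.d. bootstrap here, NOT Maximum Entropy).
n_boot = 1000
defect_ratios = np.empty(n_boot)
for i in range(n_boot):
    sample = rng.choice(spreads, size=240, replace=True)
    # 3) + 4) Count positive spreads (defects) and turn them into a ratio.
    defect_ratios[i] = np.mean(sample > 0)

# 5) Use the distribution of defect ratios to approximate the p-value.
p_value = np.mean(defect_ratios > 0.01)
print(f"approximate P(defect ratio > 0.01) = {p_value:.3f}")
```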
@@statquest That's fine; however, I already asked my question and I think it is interesting enough to be taken into account. Feel free to answer it. Thank you for your time.
Perhaps because of the different ways of thinking between East and West, as an Asian I find it easier to understand when we don't switch to a mean of zero and instead treat "the drug has no effect" as -0.5, but doing so is somewhat inconsistent with the null hypothesis method. Good tutorial.
There is another thing I find confusing: in the example, the probability between -0.5 and 0.5 is 0.36, the probability of being less than or equal to -0.5 is 0.16, and the probability of being greater than or equal to 0.5 is 0.47, which seems a bit contradictory for bootstrapping done on the basis of the null hypothesis. If we bootstrap enough times, shouldn't the probability of being less than or equal to -0.5 equal the probability of being greater than or equal to 0.5?
What time point, minutes and seconds, are you asking about?
Dear Josh, the time point is 4:07, where the probability of less than or equal to -0.5 is 0.16, and at time point 4:10 the probability of greater than or equal to 0.5 is 0.48. Is this probability a reasonable example? If we bootstrap enough times, shouldn't 0.16 be equal to 0.48?
In addition, why can't the paper version of the book be sent to China? I bought it in Japan and had it transferred from Japan to China. @@statquest
@@SunSan1989 My guess is that they will probably meet in the middle. As for my book, there should be a Chinese version (and translation) available in the next year. People are working on it.
Sorry, since my English is not very good, I want to confirm my understanding: should 0.16 end up being the same value as 0.48? Is that correct? @@statquest
@@SunSan1989 No, I'm not sure what the value will be, but the sum will probably still add up to something close to 0.63
Can we use the bootstrap to calculate a confidence interval (%) for a conditional event element, like a cross-tab element, and how? Thank you
Probably, but I don't know what a cross-tab element is so it would be better to get someone else to answer.
@@statquest Cross-tab is something people who use SPSS will actually know.
It is a cross table that crosses two variables, such as gender and healthy (yes or no), so you end up with 4 groups. I want to know if I can consider each group as an independent group and calculate a CI as normal.
Note: I have been searching for the answer for months, thank you a lot.
1) Make a bootstrapped Dataset
2) Calculate a statistic
3)???
4) Profit.
:)
Can you do a PyTorch implementation for ANNs and fuzzy systems? Please, sir.
I'll keep that in mind.
I gave it a try today. It's still not working / returning what I want it to return.
Noted
Isn’t this just randomization inference and you’re testing the sharp null hypothesis?
I believe they are different: jasonkerwin.com/nonparibus/2017/09/25/randomization-inference-vs-bootstrapping-p-values/
How do you shift the data?
At 2:29 I say that we shift the data to the left by 0.5 units (where 0.5 is the mean of the data). That means we subtract 0.5 from each value in the dataset.
@@statquest But why, Josh? If you have the bootstrap distribution and you calculate the 95% confidence interval, you can say whether the hypothesis can be rejected or not: if 0 is in it, then it can't be rejected. So why shift the data if it doesn't matter?
@@drachenschlachter6946 Because this video is talking about how to calculate p-values, not confidence intervals. The first bootstrapping video describes confidence intervals (and does not require shifting the data): th-cam.com/video/Xz0x-8-cgaQ/w-d-xo.html
Could you please upload 5 unavailable hidden videos?
Which ones?
Let's say we don't look at the p-values and see that the 95% confidence interval is crossing 0 at 5:41. Then can't we say that the majority of means are crossing 0, and therefore the drug has been helping with recovery instead of having no effect? I mean, from a confidence interval point of view.
This example is not great for discussing CIs because we shifted the data to be centered on 0. If we wanted to calculate a CI, we would do this: th-cam.com/video/Xz0x-8-cgaQ/w-d-xo.html
@@statquest ohkk thanks bam!
I cannot understand why we care about the region of -0.5.
Given data with mean 0.5 and variance v, how likely is it that I see this data if the true mean is 0? Let's assume the data come from a normal distribution, N:
p-value = P(mean >= 0.5 | N(0, v))
If the p-value <= 0.05: it is unlikely that H0 is true => reject H0.
If the p-value > 0.05: it is likely that H0 is true => cannot reject H0.
Where is the role of -0.5 here?
I almost always use two-sided p-values, and I explain the reasons here: th-cam.com/video/JQc3yx0-Q9E/w-d-xo.html
Why don't people just use bootstrapping for everything instead of worrying about robust standard errors and other types of similar concerns?
It's a good question. The answer, I believe, is "power". Bootstrapping works in all kinds of situations, but (I believe) it has less power than parametric methods.
@@statquest Thank you!
@@statquest That's a really good question. Dear Josh, can you make a video about the differences in power? Thank you for the tutorial. I appreciate it very much.
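For anyone curious, here is a rough simulation sketch (Python with NumPy/SciPy) that could be used to explore the power question empirically; the sample size, effect size, and simulation counts are arbitrary choices of mine, and the result will vary from run to run:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

n, effect, n_sims, n_boot = 10, 0.8, 500, 1000
t_rejects = 0
boot_rejects = 0

for _ in range(n_sims):
    # Simulate data where the true mean really is non-zero.
    data = rng.normal(loc=effect, scale=1.0, size=n)

    # Parametric test: one-sample t-test against a mean of 0.
    if stats.ttest_1samp(data, 0.0).pvalue < 0.05:
        t_rejects += 1

    # Bootstrap test: resample the mean-centered data and compare the tails.
    shifted = data - data.mean()
    boot_means = rng.choice(shifted, size=(n_boot, n), replace=True).mean(axis=1)
    p_boot = np.mean(np.abs(boot_means) >= abs(data.mean()))
    if p_boot < 0.05:
        boot_rejects += 1

# The fraction of simulations in which each test rejects the null
# is an estimate of that test's power in this particular setting.
print(f"t-test power ~ {t_rejects / n_sims:.2f}")
print(f"bootstrap power ~ {boot_rejects / n_sims:.2f}")
```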
Can you please meet 3blue1brown?
If you two would do something together it would surely be glorious
That would be a dream come true. I wonder what the best way would be to introduce myself.
@@statquest does asking your crowd to spam his comment section go against youtuber's etiquette? 😁
@@dbuezas I bet. Maybe we can find another way. I'll do what I can.
ah the elusive triple bam
:)
Could you make another video talking about one-sided tests?
You can just multiply the p-value by 2.
That was a bam with different statistics.
:)
My brother thought I was watching Blue's Clues, but stats edition
bam!
Q: What's the significance of a urine test?
A: The p-value!
Ugh! ;)
@@statquest Q: What do claims adjusters use to estimate hail damage?
A: Confi-dents intervals.
There is no reason to subtract the mean of the distribution before bootstrapping and then adding it later. Just bootstrap the original data and see where the original mean is in the generated histogram.
I shifted the data because the null hypothesis is that the "true mean" is 0 and it's helpful to see how the distribution would be distributed around 0 in that case.
I love you!
bam!