Correction:
4:15 I say getting 7 oranges and 1 blue is just as rare as getting 7 blues and 1 red. This is incorrect, since there are more blues and oranges in general than there are blues and reds. However, the idea that we add up rarer events is correct.
Here's the R-code:
data
There are a lot of possible events in this case. Can you point me to a source where I can find a reliable method of finding rarer events in such cases? The only method I can think of right now is to intuitively guess which ones could be rarer and then calculate their probabilities to check if they are indeed rare. However, this has a chance of missing some events.
Edit: I understand that the hypergeometric distribution can be used to calculate the probability of getting 7 blues and 1 red (instead of the longer method you described). However, that still leaves many possibilities.
@@MeetSinojia7 There are mathematical formulas that take care of this for you. So, for Fisher's exact test, see: en.wikipedia.org/wiki/Fisher%27s_exact_test
It's awesome how you link all your videos for knowledge that may be needed for what we are currently discussing and that is covered in another one of your video lectures! Thanks! Awesome! BAMMMMMM!
Thanks!
U know u hav beaten the youtube algorithm when it starts recommending statquest videos. Bam!!!!!!
bam! :)
Thank you for making this video. Google and NCBI gave me stat coma.
Thanks duuude! I just found your channel; your tutorials are simple to follow. Love how you break down complex matter into tiny digestible bites of data, and process and represent it in your videos, so fine. THANK YOU!
Awesome, thank you!
I really like how this video is condescendingly elementary, as if it were targeted towards an 8th grader as opposed to a university student. It's a refreshing break from all the in-depth websites that don't explain why we use the Hyper-Geometric distribution for the test statistic (I only made the connection after watching this video) and it's actually kind of funny.
I'm sorry this sounded condescending. My original target audience was post-doctoral genetics researchers. These people all had PhDs, but they were not statisticians. My goal was to explain it in a way that they could understand.
Can you explain the process behind getting the p-values for Fisher's exact test?
I talk about this starting at 3:46
The histogram you show from the internet that has 8 blues etc., from the standard bag, is of course just the means, I assume, from a large sample of bags (rounded to integers as well). But what if your handful had been 10 blues and 1 red? Then you cannot use the system you showed to calculate the probability, right? Thanks for the great videos.
You can extrapolate the proportions of each color to accommodate any size "handful" of m&ms.
Hi Josh, I love your channel. I have a question for your interpretation of the bag.
Your p-value for your draw of M&Ms is 0.01, which is below the standard threshold for significance.
When evaluating the meaning of this value however, is it more accurate to say that your bag is special, or that your draw is special?
Best,
-OC
The bag is special. The handful is an estimate/approximation of the entire bag.
@@statquest thanks! Then in the instance that we open the bag and see that indeed there is a normal distribution of M&M's, we would have had a false positive?
@@ocm6382 Correct.
Sir, could you please tell me what hypotheses we are going to assume for this Fisher's exact test? Is it the same null and alternative: that there's no significant difference in the probability, and vice versa, respectively? By the way, I love your videos; thank you for making them. :)
Yes, that is correct. It's the same null, that there is no difference.
Could you please make a clip about the difference between several tests (Chi-squared, Fisher, t-test,...) and When to use it
I'll keep that in mind.
This may be an obvious question, but how would Fisher's Exact test be applied to word frequencies in two datasets (i.e., corpora)? For example, let's say there are 5 instances of "stats" in Text #1, which has 100,000 words, but 75 instances in Text #2, which is 200,000. My field previously used chi-square and LL for this kind of test, but a researcher proposed Fisher's Exact test as a better alternative, and I'm having trouble wrapping my head around it. Thanks!
The idea is: if you pulled 100,000 words from dataset #2, how similar or different would the distribution of words be from dataset #1? The null hypothesis is that dataset #2 has the same proportion of each word, so if you get a very different batch of words when you sample from it, you can reject that hypothesis.
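One way to set this up as a 2x2 Fisher's exact test, sketched in Python rather than with R's fisher.test(): under the null hypothesis that both texts use "stats" at the same rate, the 80 total occurrences are scattered at random over the 300,000 word positions, so the number that lands in Text #1 follows a hypergeometric distribution. The function name and the two-sided rule (add up every split that is no more likely than the observed one) are illustrative choices, not something taken from the video.

```python
from math import comb

def fisher_two_sided(hits_1, size_1, hits_2, size_2):
    """Two-sided Fisher's exact p-value for one word counted in two texts."""
    total_hits = hits_1 + hits_2    # all occurrences of the word
    total_size = size_1 + size_2    # all words in both texts
    denom = comb(total_size, total_hits)

    def prob(x):
        # Hypergeometric probability that x of the occurrences fall in text 1.
        return comb(size_1, x) * comb(size_2, total_hits - x) / denom

    observed = prob(hits_1)
    # Sum every split that is as rare as, or rarer than, the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(total_hits + 1))
               if p <= observed * (1 + 1e-9))

p_value = fisher_two_sided(5, 100_000, 75, 200_000)
print(p_value)
```

With 5 occurrences in 100,000 words against 75 in 200,000, the p-value comes out very small, so the null hypothesis of equal rates would be rejected.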
Thank you for all your excellent videos. Please help me understand why 7 oranges and 1 blue is equally rare as 7 blues and 1 red at 4:15. Isn't the probability of 7 oranges and 1 blue higher than that of 7 blues and 1 red?
You are correct! I'll make a note of that.
What if you get three blue one brown?
Hahaha!!!! Excellent question! :)
Hmmm, thanks Josh. I have a question: when we plot the original distribution of the population, would the distribution normally be multinomial?
yes
@@statquest thankk you so muchh josh biggg support and love from tunisia
Hi Josh, can you give more talks about the probability distributions and statistics behind edgeR? I am a frequent user of edgeR, but even after reading their docs several times, I still can't fully understand them.
I already have two other videos on edgeR (and DESeq2). Check out the section for High-throughput Sequencing Analysis at the bottom of my index page: statquest.org/video-index/
StatQuest with Josh Starmer Thanks for the reply. I checked them. Actually, I watched them before, but it’s not what I asked for. I was talking about things like why we should consider the counts to follow a negative binomial distribution, bcf and such.
@@Yzhang250 I see! I'm sorry that I don't have additional videos on those subjects.
Thank you for the great explanation! I am wondering where can I find the R code for this one?
Unfortunately I think I just wrote the code as a one off and didn't save it.
hooray! mindblowing! thank you Josh :D
Thank you! :)
Thx Josh, I have a question: what is the difference between this test and the Fisher test that we learnt from linear regression?
Are you asking about the F-test? It's very different. For details, see: th-cam.com/video/nk2CQITm_eo/w-d-xo.html
@@statquest yess aaah okay thank you so much
The 'data' table you create in your R code is not a contingency table, is it? A picked M&M is in both the handful and bag column which means the marginal for those rows is wrong, I think but I'm not sure!
Thanks for noting that!
What if I just have 2 samples but no ideal probabilities and I want to compare how likely it is whether the probabilities of their binomial distributions the samples come from are different?
I think this might be a good test for that.
Hey thanks for great videos. what is the best stat analysis for a 2 by 3 table? treatment and control in columns and in rows 3 conditions.
It depends on your data, but Fisher's can generalize to a 2 x 3 table.
@@statquest Right, but my table has low values like 0 and 1, so it does not meet the requirements for Fisher's test. What is the best test in this case?
@@mehdihjamadi3225 Hmm.... If you have a lot of values in the other conditions, it shouldn't matter. If you don't, then you're in a tight spot indeed.
@@statquest Thanks for responding and for the great videos
Hi, thanks for producing these contents ....... They help me a lot ....
I replicated the script in the R language.
"
fisher.test (data, alternative = "less")
fisher.test (data, alternative = "greater")
"
I did not understand how the p-value remained the same (0.01473) for the alternative hypotheses P0 P1. Can you clarify?
What are the alternative and null hypotheses of the video problem?
Thanks!
Very good, I think the topic is fascinating. An in-depth analysis would be interesting, not just a quickie. :-)
Thanks! :)
Thanks, Josh, I have a question: the null hypothesis is "my bag is special", p-value is 1% (less than 5%), shouldn't we reject the null hypothesis, meaning my bag is not special??? But intuitively, 7 blue + 1 red is very rare so it kinda tells my bag is special, where am I wrong???
Oh I realized my mistake: the null hypothesis should be "my bag is NOT special"
That's exactly right. The null hypothesis, "that my bag is not special", although not explicitly stated, is implied when we get a super small p-value and conclude that my bag is special.
@@statquest Thanks, man. I like your video a lotttttttttttttttt. Keep it up! I learnt so much stats behind ML, you are the best! Are you a prof or phd candidate?
@@kennywang9929 I work in a research lab, helping people with the computational side of their experiments.
8C7*5C1/40C8
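Reading this as a hypergeometric calculation (8 blues choose 7, times 5 reds choose 1, divided by 40 M&Ms choose 8), it can be checked in a few lines of Python. The counts (8 blue, 5 red, 40 in the bag, a handful of 8) are read off the expression itself and are an assumption, not a published color table.

```python
from math import comb

# Counts read off the expression 8C7*5C1/40C8 (assumed: 8 blue, 5 red, 40 total).
ways_7_blue = comb(8, 7)      # choose 7 of the 8 blues
ways_1_red = comb(5, 1)       # choose 1 of the 5 reds
all_handfuls = comb(40, 8)    # every possible handful of 8

probability = ways_7_blue * ways_1_red / all_handfuls
print(f"{probability:.8f}")   # 0.00000052
```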
Does having a random 0 (not structural 0) bias the calculation of the Fisher's test p-value for a 2x2 contingency table?
Not that I know of.
Can Fisher's exact test be used to calculate the p-value using Benford's Law? It would be a 9x2 or 2x9 table?
Some of the expected frequencies would be less than 5
You could use Fisher's exact test to compare data to the distribution defined by Benford's law.
@@statquest
Thank you.
Would you know the formula to use for Fisher's exact test to test Benford's Law?
All I can find on the internet and youtube is the 2x2 formula.
(a+b)!(c+d)!(a+c)!(b+d)! divided by
a!b!c!d!n!
@@tomp4925 Unfortunately I don't know it off the top of my head. I would just use the "fisher.test()" command in R to do it.
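For a table with any number of rows and columns, the probability of one particular table (with the row and column totals held fixed) follows the same pattern as the 2x2 formula: the product of the factorials of every row total and column total, divided by n! times the product of the factorials of every cell. The Python sketch below, with a made-up function name, computes only that single-table probability. A full p-value would also need to add up every other table with the same totals that is as rare or rarer, which is the part fisher.test() takes care of.

```python
from fractions import Fraction
from math import factorial, prod

def table_probability(table):
    """Exact probability of one r x c table given its row and column totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    numerator = prod(factorial(t) for t in row_totals + col_totals)
    denominator = factorial(n) * prod(factorial(x) for row in table for x in row)
    return Fraction(numerator, denominator)

# With a 2x2 table this reduces to (a+b)!(c+d)!(a+c)!(b+d)! / (a! b! c! d! n!).
print(table_probability([[3, 1], [1, 3]]))   # 8/35
```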
For your p value calculation, you sum equal events and anything rarer. For the test in which this model was created- the lady drinking tea- wouldn't the p value then include her being 100% correct and 100% wrong?
Yep.
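A small sketch of why both extremes show up, assuming the classic setup of 8 cups with 4 poured milk-first, written in Python with illustrative names. The number of milk-first cups she identifies correctly follows a hypergeometric distribution, and the two-sided p-value adds every outcome that is no more likely than the observed one.

```python
from fractions import Fraction
from math import comb

CUPS, MILK_FIRST = 8, 4   # assumed setup: 8 cups, 4 of them milk-first

def prob_correct(k):
    """Probability that exactly k of her 4 'milk-first' picks are right."""
    return Fraction(comb(MILK_FIRST, k) * comb(CUPS - MILK_FIRST, MILK_FIRST - k),
                    comb(CUPS, MILK_FIRST))

def two_sided_p(observed):
    """Add up every outcome that is as rare as, or rarer than, the observed one."""
    cutoff = prob_correct(observed)
    return sum(p for p in map(prob_correct, range(MILK_FIRST + 1)) if p <= cutoff)

print(two_sided_p(4))   # 1/35: all correct (1/70) plus all wrong (1/70)
```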
But first, let's eat some eminems
Bam!
1:00 Shouldn't it be a bar graph instead of a histogram?
PS. A StatQuest without music is like a pizza without its toppings
;)
A histogram is a type of bar graph that shows you how many "counts" are in each column, and since each individual m&m is a count, then we have a histogram. And you're right, this 'Quest needs a song. :)
How did you calculate the probabilities of all the combinations?
To do this by hand, you just follow the example that starts at 1:25 and you do that for all of the different combinations. However, to get a computer to do the work for you, you simply plug the numbers into a multivariate hypergeometric distribution.
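As a rough sketch of the multivariate hypergeometric calculation in Python: the only counts used are the 8 blues, 5 reds, and 40 total that appear elsewhere in this thread, with every other color lumped into a single "other" group. That lumping is an assumption made to keep the example small, so the p-value it produces will not match the one in the video, which is based on all the individual colors.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod

BAG = {"blue": 8, "red": 5, "other": 27}   # "other" lumps the remaining colors (assumption)
HANDFUL = 8

def handful_probability(counts):
    """Multivariate hypergeometric probability of drawing these per-color counts."""
    ways = prod(comb(BAG[color], k) for color, k in counts.items())
    return Fraction(ways, comb(sum(BAG.values()), sum(counts.values())))

def all_handfuls():
    """Every way to split a handful of 8 across the colors in the bag."""
    colors = list(BAG)
    for combo in product(range(HANDFUL + 1), repeat=len(colors)):
        if sum(combo) == HANDFUL:
            yield dict(zip(colors, combo))

observed = handful_probability({"blue": 7, "red": 1, "other": 0})
# p-value: the observed handful plus every handful that is equally rare or rarer.
p_value = sum(p for p in map(handful_probability, all_handfuls()) if p <= observed)
print(float(observed), float(p_value))
```

Swapping in the real per-color counts only means changing the `BAG` dictionary; the enumeration grows quickly with the number of colors, which is why a statistics package is the practical choice.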
Hi, I just wanted to know if the formula for Fisher's exact test and the hypergeometric distribution is the same or not? Could you please get back to me?
Fisher's exact test uses the hypergeometric distribution for calculations.
@@statquest so hypergeometric distribution acts as a step for it?
@@yatharthghorawat7535 You can see it in action in the video.
@@statquest but you haven't explicitly mentioned when you start using the hypergeometric distribution...so it would be helpful if u put up a time stamp please...
@@yatharthghorawat7535 Basically the entire video shows how to calculate the hypergeometric distribution. Fisher's exact test then redoes this calculation as mentioned at 3:56 in order to get a p-value.
I missed the song in the beginning of this video :)
:)
hmm no opening song :(
:(
Is this the same as Fisher's score?
Nope
What is "an idealized bag of m&ms" in 0:53 ?
It's a bag of m&m's that has the average percentage of each color.
Another great video!!!
Thank you! :)
i cant seem to understand why p-value = 0.01 means the bag is special
The p-value = 0.01 means that there are very few other bags like the one we have. So maybe "special" is not the right word. Maybe "rare" is better.
@@statquest i see then how do u define this value of 0.01
@@silviapetrova8562 I talk about how to calculate it at 3:56 in this video. And I have an entire video that explains how p-values are calculated here: th-cam.com/video/5Z9OIYA8He8/w-d-xo.html
tripple BAM!!
:)
Where is the intro song? 🥺
Ha! Some of the old ones don't have intros. Oh well. :)
Hello, thank you. I followed the link from the other video.
To recap my question: it is quite similar to this video, but a bit different.
In my situation the M&Ms come in 40 different colors. Let's say I pick up 8 M&Ms and I get red, as in the video (ignoring the other colors).
I would like to know how rare it is to randomly pick 8 M&Ms and get red.
So I repeat the test 100 times, and out of those 100 times I get a red M&M only 3 times.
Then I want to know the p-value of getting a red M&M 3 times out of 100. Is it possible to calculate?
Yes, it's possible to calculate. You can use the hypergeometric distribution (as seen in this video) to calculate the probability of getting 8 out of 8 red m&ms. Then you can use that to calculate the probability of not getting 8 out of 8 red m&ms (1 - the probability of getting 8 out of 8 red m&ms). Then you can just calculate the probability of getting 3/100 red m&ms, plus 4/100 red m&ms plus 5/100 red m&ms etc. to get the p-value.
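The last step in that recipe is a binomial tail: if each handful has probability p of being the outcome of interest, the chance of seeing it 3 or more times in 100 handfuls is the sum of the binomial probabilities for 3, 4, 5, and so on. A minimal Python sketch, where the function name and the placeholder value of p are assumptions for illustration:

```python
from math import comb

def upper_tail(successes, trials, p_success):
    """P(X >= successes) for X ~ Binomial(trials, p_success)."""
    return sum(comb(trials, k) * p_success**k * (1 - p_success)**(trials - k)
               for k in range(successes, trials + 1))

# p_success would come from the hypergeometric step; 0.01 is a placeholder, not a real value.
print(upper_tail(3, 100, 0.01))
```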
Hi Josh, could you explain to me how to relate these probability calculations to a distribution? Could you show the probability distribution of your example, please?
I'll keep that in mind.
How can your bag be special if it is the same as the internet's?
Ha! Good question. :)
u r hero
:)
Awesome!
BAM! :)
LOVE IT!
Thank you! :)
Can you please explain to me what "enrichment analysis" means? Why do we call it enrichment? What are its steps? I have been looking for this since Hitler's time but I didn't find the answer 😭😭😭
The probability is 0.00000052, not 0.00000053.