While this is technically correct, it is not very statistically sound. With the Bonferroni and BH procedures, we typically adjust the cutoff, not the p-value itself. Multiplying the p-value by the number of tests can produce values greater than 1, which is impossible since a p-value is a probability between 0 and 1. (This mainly affects Bonferroni; in BH, dividing by the rank largely accounts for it.) The end conclusion is the same, but it doesn't make sense from a statistical standpoint. You can absolutely use this method to get the right significance calls, but if you are presenting this to a statistician or publishing the work, you should either adjust the p-value cutoff instead of the p-value, or set any adjusted p-value greater than 1 to exactly 1. Even that is a little more nuanced than plain multiplication.
This is a bit pedantic: of course probabilities don't go above 1. You can simply clip the data frame column to have a max value of 1.
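For anyone following along, here is a minimal sketch of the two views discussed above: multiplying and capping the p-values versus shrinking the cutoff. It uses plain Python lists rather than a data frame, and the p-values, the `alpha` of 0.05, and the function name `bonferroni_adjust` are all made up for illustration. With pandas, the capping step would typically be `Series.clip(upper=1)`.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values, invented for this example
p_values = [0.001, 0.02, 0.04, 0.3, 0.9]
alpha = 0.05  # assumed significance level

# View 1: adjust the p-values, keep the cutoff at alpha
adjusted = bonferroni_adjust(p_values)
significant = [p_adj <= alpha for p_adj in adjusted]

# View 2: leave the p-values alone, shrink the cutoff to alpha / m
cutoff = alpha / len(p_values)
significant_by_cutoff = [p <= cutoff for p in p_values]

print(adjusted)
print(significant, significant_by_cutoff)
```

Both views flag the same tests here; the cap only changes how the adjusted values are reported, not which ones pass.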
Each video is valuable. If possible, show us more stat models.
I have an "introduction to basic statistics in python" video in mind for the not-so-distant future
Nice vid! I still don't quite understand how you can adjust the p-values for BH without passing in an FDR value. Does that mean I have to compare the adjusted p-values against my chosen FDR (q=0.05 in your video, or 0.1 if that's my FDR)?
This is the same question I had when I first learned this method. You don't actually need to specify the FDR as input. Comparing afterwards, as you mention, is fine. This is how a lot of bioinformatics software calculates it by default, like DESeq2, MEME, etc.
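To make that concrete, here is a rough sketch of BH-adjusted p-values computed without any FDR input and then compared against two different FDR levels afterwards. The p-values and the function name `bh_adjust` are hypothetical. The running minimum is the usual monotonicity step that the plain "multiply by the number of tests, divide by rank" description leaves out.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so no adjusted value can
    # exceed the adjusted value of a larger p-value (or 1)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, invented for this example
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = bh_adjust(p_values)

# The FDR level only enters here, after the adjustment
n_hits = {q: sum(p_adj <= q for p_adj in adjusted) for q in (0.05, 0.1)}
print(adjusted)
print(n_hits)
```

The same adjusted column can be reused for any FDR level, which is why tools can report it without asking for q up front.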
That video was great. I now understand the difference between Bonferroni and BH. My question is: where can I get this type of data to play around with? Thanks
Jay Kammer
Hmm, basically any dataset that has multiple tests. This is very common in omics/genomics/transcriptomics, like GWAS, differential expression, etc.
Nice video. Could you please make a video with multiple variables, meaning multiple genes and multiple measures (for example, 40 genes and 50 clinical measures)? Thank you.
You are probably only going to correct along one axis, typically the genes. This is irrespective of how many samples/measurements you have. Correction is for multiple hypotheses, i.e., each of the ~20k genes is one hypothesis: gene X is differentially expressed (DE).
How do I write this in R for Benjamini-Hochberg?
Super easy if you have a data frame. You can similarly multiply the p-value column by the number of rows in the data frame and, for BH, divide by each p-value's rank.