False discovery rate p-value correction

  • Published 14 Oct 2024
  • I show you how to correct for the false discovery rate using the Bonferroni and Benjamini-Hochberg methods. Use FDR correction any time you test multiple hypotheses. My example is in Python, but it is easily translatable to R.
    I use pandas, but you can just as easily apply this to a list of p-values in numpy or by mapping over a native Python list.
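    A minimal sketch of the approach described above, assuming a pandas DataFrame with a column named "pvalue" (the column name and the example values are illustrative, not from the video):

```python
import pandas as pd

# Illustrative p-values; replace with your own test results.
df = pd.DataFrame({"pvalue": [0.001, 0.008, 0.039, 0.041, 0.042,
                              0.060, 0.074, 0.205, 0.212, 0.216]})
n = len(df)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
df["bonferroni"] = (df["pvalue"] * n).clip(upper=1.0)

# Benjamini-Hochberg: multiply each p-value by n / rank (rank = position after
# sorting ascending), then enforce monotonicity with a reversed cumulative min.
df = df.sort_values("pvalue")
ranks = pd.Series(range(1, n + 1), index=df.index)
bh = (df["pvalue"] * n / ranks).clip(upper=1.0)
df["bh"] = bh[::-1].cummin()[::-1]

print(df)
```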

Comments • 12

  • @pauldahlke5618 • 1 year ago • +1

    Nice vid! I still don't quite understand how you can adjust the p-values for BH without passing in an FDR value. Does that mean I have to compare the adjusted p-values against my chosen FDR (q=0.05 in your video, or 0.1 if that's my FDR)?

    • @sanbomics • 1 year ago • +1

      This is the same question I had when I learned this method the first time. You don't need to actually specify the FDR as input; comparing against it afterwards, like you mention, is fine. This is the way a lot of bioinformatics software calculates it by default, like DESeq2, MEME, etc.
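      A sketch of that "compare afterwards" workflow, using statsmodels rather than the video's manual pandas math: the BH-adjusted p-values do not depend on the threshold, so q only enters at the comparison step.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Illustrative raw p-values.
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205])

# method="fdr_bh" returns BH-adjusted p-values; no FDR is specified as input.
_, p_adj, _, _ = multipletests(pvals, method="fdr_bh")

q = 0.05  # chosen FDR; use 0.1 if that is your threshold
print(p_adj, p_adj <= q)
```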

  • @JacobFeinas • 4 months ago

    While this is technically correct, it is not very statistically sound. When adjusting with the Bonferroni and BH procedures, we typically change the cutoff point, not the actual p-value. Multiplying the p-value by the number of tests can lead to p-values greater than one (specifically for the Bonferroni method; the BH method is already accounted for by dividing by the rank), which is impossible since a p-value is a probability between 0 and 1. While the end conclusion is the same, it doesn't make sense from a statistical standpoint. You can absolutely use this method to get the right significance calls, but if you are presenting this to a statistician or publishing the work, you would need to adjust the p-value cutoff instead of the actual p-value, or change any p-value greater than 1 to exactly 1, though even that is a little more nuanced than just multiplication (see the sketch after this thread).

    • @sanbomics • 3 months ago

      This is a bit pedantic: of course probabilities don't go above 1. You can simply clip the data frame column to have a max value of 1.
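    Putting the two replies together, here is a sketch (values made up) of the "adjust the cutoff, not the p-value" framing, plus the clip-at-1 fix for multiplied p-values:

```python
import numpy as np
import pandas as pd

# Made-up raw p-values, sorted ascending.
pvals = np.sort([0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205])
n = len(pvals)
alpha = q = 0.05

# Bonferroni via the cutoff: compare raw p-values against alpha / n.
bonferroni_reject = pvals < alpha / n

# BH via the cutoff: find the largest i with p_(i) <= (i / n) * q and
# reject hypotheses 1..i.
thresholds = (np.arange(1, n + 1) / n) * q
below = np.nonzero(pvals <= thresholds)[0]
k = below.max() + 1 if below.size else 0
bh_reject = np.zeros(n, dtype=bool)
bh_reject[:k] = True

# Alternatively, if you multiply p-values directly, clip them so they stay
# valid probabilities (the fix mentioned in the reply above).
adjusted = pd.Series(pvals * n).clip(upper=1.0)

print(bonferroni_reject, bh_reject, adjusted.values, sep="\n")
```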

  • @jacobkammer2843 • 1 year ago

    That video was great. I now understand the difference between Bonferroni and BH. My question is: where can I get this type of data to play around with? Thanks,
    Jay Kammer

    • @sanbomics • 1 year ago

      Hmm, basically any dataset that has multiple tests. This is very common in omics/genomics/transcriptomics, like GWAS, differential expression, etc.
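      If you just want numbers to practice on, one option (my own suggestion, not from the video) is to simulate them: mostly uniform "null" p-values plus a few small "signal" ones.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
null_p = rng.uniform(0, 1, size=950)    # 950 null tests: uniform p-values
signal_p = rng.beta(0.5, 25, size=50)   # 50 true signals: skewed toward 0
df = pd.DataFrame({"pvalue": np.concatenate([null_p, signal_p])})
print(df.describe())
```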

  • @subhasismohanty7166 • 2 years ago • +2

    Each video is valuable. If possible, show us more stats models.

    • @sanbomics • 2 years ago

      I have an "introduction to basic statistics in Python" video in mind for the not-so-distant future.

  • @pragneydeme3876 • 1 year ago

    Nice video. Could you please make a video with multiple variables, meaning multiple genes and multiple measures (for example, 40 genes and 50 clinical measures)? Thank you.

    • @sanbomics • 1 year ago

      You are probably only going to correct along one axis, typically the genes. This is irrespective of how many samples/measurements you have. Correction is for multiple hypotheses, i.e., each of the ~20k genes is one hypothesis: gene X is DE.
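      A sketch of what "correcting along the gene axis" could look like for a genes-by-measures table of p-values (the shapes, column names, and use of statsmodels are my assumptions):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multitest import multipletests

# Rows = genes, columns = clinical measures; random p-values for illustration.
rng = np.random.default_rng(1)
pvals = pd.DataFrame(rng.uniform(0, 1, size=(40, 3)),
                     index=[f"gene_{i}" for i in range(40)],
                     columns=["measure_a", "measure_b", "measure_c"])

# For each measure, BH-adjust across the 40 gene-level tests (one axis only).
adjusted = pvals.apply(
    lambda col: pd.Series(multipletests(col, method="fdr_bh")[1], index=col.index),
    axis=0,
)
print(adjusted.head())
```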

  • @user-op9sv4ly9u • 1 year ago

    How to write this in R for Benjamini?

    • @sanbomics • 1 year ago

      Super easy if you have a dataframe. You can similarly multiply the p-value column by the length of the dataframe.