How to compute Fst from SNP genomic data

  • Published on 30 Sep 2024

Comments • 30

  • @katherinotalora3543 · 2 years ago · +3

    How can I calculate genetic diversity parameters, including nucleotide diversity (π), allelic richness (Ar), observed (Ho) and expected (He) heterozygosity, and the inbreeding coefficient (FIS), using rarefaction? Thank you.
    Do you have a video on this? Thanks.

    • @GenomicsBootCamp · 2 years ago · +1

      Hi, not currently, but these are good topics for the future. Ho, He, and F can be computed with PLINK's --het option.

    • @catarinadebettencourteavil7703 · 2 years ago

      I need the same.

  • @mohammadj.shamim9342 · 2 years ago · +1

    Hello professor. I solved the family-automation challenge as follows:
    breed1 = "ABR"
    breed2 = "ALB"
    write.table(breed1, "breed1.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, fileEncoding = "UTF-8")
    write.table(breed2, "breed2.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, fileEncoding = "UTF-8")
    # To put more families in one text file, extend the string, e.g. "ABR\nALB" and so on;
    # "\n" simply means "go to the next line".
    If writing these out by hand gets too long (e.g. for hundreds of families), use a loop:
    library(stringr) # provides str_c()
    families = c("ABR", "ALB", "WGA") # list of families
    for (fam in families) { # loop through all families
      write.table(fam, str_c("breed_", fam, ".txt"),
                  row.names = FALSE, col.names = FALSE,
                  quote = FALSE, fileEncoding = "UTF-8")
      print(str_c("breed_", fam, ".txt has been created")) # report each created file
    }
    Of course, it is also possible to automate the whole process so that Fst is calculated between all pairs of families in one go.
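The all-pairs automation mentioned above could look roughly like this Python sketch. It writes one two-line family file per pair of families and assembles a PLINK 1.9 command for each pair. It assumes PLINK 1.9 with --keep-fam to select the two families, --family to use family IDs as the Fst clusters (the option mentioned later in this thread), and --fst. The fileset prefix mydata and the function name are hypothetical. Check the flags against your PLINK version before running the commands, e.g. with subprocess.run(cmd, check=True).

```python
from itertools import combinations
from pathlib import Path

def write_pair_keep_files(families, outdir, bfile="mydata"):
    """Write one keep-file per pair of families and build a PLINK command for each.

    bfile: hypothetical PLINK binary fileset prefix; replace it with your own.
    Returns a list of ((fam1, fam2), keep_file_path, command_list).
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    jobs = []
    for fam1, fam2 in combinations(families, 2):  # every unordered pair once
        keep_path = outdir / f"breed_{fam1}_{fam2}.txt"
        # One family ID per line (assumed input format for --keep-fam)
        keep_path.write_text(f"{fam1}\n{fam2}\n", encoding="utf-8")
        out_prefix = outdir / f"fst_{fam1}_{fam2}"
        # Assumed PLINK 1.9 flags; verify before running
        cmd = ["plink", "--bfile", bfile, "--keep-fam", str(keep_path),
               "--family", "--fst", "--out", str(out_prefix)]
        jobs.append(((fam1, fam2), keep_path, cmd))
    return jobs

if __name__ == "__main__":
    for pair, path, cmd in write_pair_keep_files(["ABR", "ALB", "WGA"], "fst_runs"):
        print(" ".join(cmd))
```

For three families this gives three runs (ABR-ALB, ABR-WGA, ALB-WGA). With k families the number of runs grows as k(k-1)/2.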

  • @georgewanjala4605 · 3 years ago · +2

    I tried GAL vs. ALP, then GAL vs. KAR; the Manhattan plots show a similar trend.

  • @khuramrazzaq2388 · 1 year ago · +1

    Hi. I have SNP data for 250 varieties belonging to 5 groups. Could you guide me on how to compute pairwise Fst? I also need to see the percentage of variation among groups and within groups.

    • @GenomicsBootCamp · 1 year ago

      Hi,
      You can extract two populations at a time and run Fst with PLINK as described in this video, or, more straightforwardly, compute a matrix of pairwise Fst values as described in this video on the channel: "Fst matrix with confidence intervals"
      th-cam.com/video/f7ZNTIf6NW4/w-d-xo.html
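If you would rather compute the matrix yourself, here is a minimal Python/numpy sketch. It uses Hudson's Fst estimator (ratio of averages), not the Weir & Cockerham estimator that PLINK 1.9 reports, so the values will differ somewhat. It assumes no missing genotypes, and the function names are hypothetical.

```python
import numpy as np

def hudson_fst(g1, g2):
    """Hudson's Fst (ratio of averages over SNPs) between two groups.

    g1, g2: arrays (individuals x SNPs) of 0/1/2 allele counts, no missing data.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    n1, n2 = 2 * g1.shape[0], 2 * g2.shape[0]  # number of sampled alleles
    p1, p2 = g1.mean(axis=0) / 2, g2.mean(axis=0) / 2  # allele frequencies
    num = ((p1 - p2) ** 2
           - p1 * (1 - p1) / (n1 - 1)
           - p2 * (1 - p2) / (n2 - 1))
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # Sum numerators and denominators over SNPs before dividing
    return num.sum() / den.sum()

def pairwise_fst_matrix(groups):
    """groups: dict of group name -> genotype array. Returns names and a symmetric matrix."""
    names = list(groups)
    k = len(names)
    mat = np.zeros((k, k))  # diagonal stays 0 (a group vs. itself)
    for i in range(k):
        for j in range(i + 1, k):
            mat[i, j] = mat[j, i] = hudson_fst(groups[names[i]], groups[names[j]])
    return names, mat
```

With 5 groups you would pass a dict with 5 entries and get a 5 x 5 matrix.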

    • @GenomicsBootCamp · 1 year ago

      For the variation, you need the individual (per-SNP) Fst values, which you can then analyse with the statistical software of your choice, e.g. the var() function in R.
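As an example of that last step, this Python sketch reads the per-SNP values from a PLINK 1.9 .fst output file and computes their mean and sample variance. statistics.variance uses the n - 1 denominator, like R's var(). It assumes the file has a whitespace-delimited header containing an FST column and that uninformative SNPs are written as nan. The function name is hypothetical.

```python
import statistics

def per_snp_fst_stats(fst_file):
    """Summarise the per-SNP FST column of an open PLINK .fst file.

    Assumes a header line with an 'FST' column; rows with 'nan' are skipped.
    """
    header = fst_file.readline().split()
    col = header.index("FST")
    values = []
    for line in fst_file:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        value = float(fields[col])
        if value == value:  # NaN is the only value not equal to itself
            values.append(value)
    return {"n": len(values),
            "mean": statistics.mean(values),
            "variance": statistics.variance(values)}  # sample variance, like R's var()
```

Usage: with open("plink.fst") as fh: print(per_snp_fst_stats(fh))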

  • @fakharunnisa2178 · 2 years ago · +1

    How do I increase the log value on the Y axis?

    • @GenomicsBootCamp · 2 years ago

      This question is unclear. Could you rephrase it?

  • @matamatosa8898 · 3 years ago · +1

    How do I calculate it averaged over SNPs, i.e. a single Fst value per population comparison?

    • @GenomicsBootCamp · 3 years ago · +1

      Hi,
      - If you just want a plain mean Fst between the two breeds/populations, PLINK reports it, as shown at 10:42 in the video (the last two lines of the PLINK output).
      - If you want more sophisticated statistics or visualizations, it is best to read the results into R or another environment and do it there.
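PLINK 1.9 prints two genome-wide summaries, labelled "Mean Fst estimate" and "Weighted Fst estimate". As far as I understand, the first averages the per-SNP Fst values, while the second divides the summed numerators by the summed denominators (a ratio of averages). This small Python sketch only illustrates why the two can differ. The function name and the toy inputs are hypothetical, not PLINK output.

```python
import numpy as np

def mean_and_weighted_fst(numerators, denominators):
    """Contrast two genome-wide summaries of per-SNP Fst components.

    mean:     average of the per-SNP ratios num/den
    weighted: sum of numerators divided by sum of denominators
    """
    num = np.asarray(numerators, dtype=float)
    den = np.asarray(denominators, dtype=float)
    mean_fst = np.mean(num / den)
    weighted_fst = num.sum() / den.sum()
    return float(mean_fst), float(weighted_fst)
```

For example, mean_and_weighted_fst([0.1, 0.0], [0.2, 0.8]) returns (0.25, 0.1). The SNP with the larger denominator pulls the weighted estimate down.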

    • @matamatosa8898 · 3 years ago

      @GenomicsBootCamp Hi, I'm trying to compute pairwise Fst between populations. I found this function in PLINK 2, www.cog-genomics.org/plink/2.0/basic_stats#fst, but I can't get the command to work. Can you tell me how the command should be written, and what the first "categorical or binary phenotype name" should be?

  • @mahboobezamani4767 · 2 years ago · +1

    Could you make a video using HyPhy and FUBAR?

    • @GenomicsBootCamp · 2 years ago · +1

      Hi! Both of these seem to be useful for detecting selection signatures. I am not familiar with them, though, so the first step would be to get reliable results myself. I can't say when that might happen (time limitations).

    • @mahboobezamani4767 · 2 years ago

      @GenomicsBootCamp Sure, totally understandable.

  • @darwin6883 · 3 years ago · +1

    Do you have a GitHub page for the R code?

    • @GenomicsBootCamp · 3 years ago · +2

      Thanks for the reminder. The code is now available via the link in the description; not GitHub, but another site for sharing scripts.

    • @darwin6883 · 3 years ago · +2

      @GenomicsBootCamp Thank you so much. I am in an evolutionary biology PhD program (you can see the inspiration for my YouTube name, haha), and your videos are helping me immensely with my work.

  • @darwin6883 · 3 years ago

    You are amazing!

  • @mahboobezamani4767 · 2 years ago

    What about more than two populations? I have SNPs for 802 populations and am looking for positive selection.
    Could you please advise me?
    Thanks

    • @mahboobezamani4767 · 2 years ago · +1

      I mean 802 individual plants to compare.

    • @GenomicsBootCamp · 2 years ago

      @mahboobezamani4767 Hi, Fst is essentially a comparison of allele frequencies between two groups. So if you can form groups from your 802 plants, e.g. by (sub)species, line, region of origin, or similar, you can compare these groups with Fst. If you have more than two groups, you can do pairwise comparisons (there is also a video on this on the channel, using the --family option).
      Fst can identify the regions of the genome that differ, but it does not say what the reason is, whether it is positive selection, or in which of the two groups it occurred. It merely says that there is a difference. Still, if you can identify the genes located in the signals, that helps in some way.
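To make "differing regions of the genome" concrete, a common exploratory step is to average per-SNP Fst in windows and look at the most extreme ones. The Python sketch below does this for one chromosome with non-overlapping windows. The 100 kb window size, the 99th-percentile cutoff, and the function name are arbitrary assumptions. As the reply says, a high-Fst window only shows a difference, not its cause.

```python
import numpy as np

def fst_windows(positions, fst, window=100_000, top_quantile=0.99):
    """Average per-SNP Fst in non-overlapping windows and flag the top ones.

    positions: base-pair positions of SNPs on one chromosome
    fst:       per-SNP Fst values in the same order
    Returns a list of (window_start, mean_fst, n_snps, is_outlier).
    """
    pos = np.asarray(positions)
    vals = np.asarray(fst, dtype=float)
    starts = (pos // window) * window  # start of the window each SNP falls into
    rows = []
    for start in np.unique(starts):  # windows in increasing order
        in_win = starts == start
        rows.append((int(start), float(vals[in_win].mean()), int(in_win.sum())))
    means = np.array([r[1] for r in rows])
    cutoff = np.quantile(means, top_quantile)  # empirical cutoff, not a significance test
    return [(s, m, n, bool(m >= cutoff)) for s, m, n in rows]
```

Windows that contain only a few SNPs give noisy means, so it is worth looking at n_snps before interpreting a flagged window.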

    • @mahboobezamani4767 · 2 years ago · +1

      @GenomicsBootCamp Hi, thanks a lot for the quick reply.
      I don't think I can divide them into groups and do pairwise comparisons, but I will certainly check your other videos.
      Basically, I have SNPs (VCF format) for a few genes, taken from 802 individual plants in a single project. I want to see whether any allele(s) in those genes are under positive selection. My understanding is that one should use several evolutionary tests to make sure positive selection is not overestimated. I thought Tajima's D and Fst, as well as calculating π, might be helpful? Or some software?

    • @GenomicsBootCamp · 2 years ago

      @mahboobezamani4767 Fst compares allele frequencies, so for that you need two groups. You need a selection signature method that can be used within a single population. Tajima's D seems to be OK for this, but I do not know of any software for it that I can recommend from my own experience.
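No specific software is recommended here, so a minimal Python sketch of the standard Tajima's D formula (Tajima 1989) may help when checking whichever tool you pick. It assumes you already have, for each segregating site in a gene, the count of the derived allele among n sampled sequences. Using the ALT allele instead is an approximation. For 802 diploid plants n would be 1604. Missing data is ignored, and the function name is hypothetical.

```python
import math

def tajimas_d(derived_counts, n):
    """Tajima's D from derived-allele counts at sites in one region.

    derived_counts: per site, how many of the n sampled sequences carry
                    the derived (assumed: ALT) allele
    n:              number of sampled sequences (2 x individuals for diploids)
    """
    seg = [j for j in derived_counts if 0 < j < n]  # keep segregating sites only
    s = len(seg)
    if s == 0:
        raise ValueError("Tajima's D is undefined with no segregating sites")
    # Nucleotide diversity: mean pairwise differences, summed over sites
    pi = sum(2.0 * j * (n - j) / (n * (n - 1)) for j in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

Negative D indicates an excess of rare variants, and positive D an excess of intermediate-frequency variants. Demography can produce either pattern, which is why combining several tests, as suggested above, is sensible.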

    • @mahboobezamani4767 · 2 years ago

      @GenomicsBootCamp I see, that makes sense. Thank you. I will see what I can do.