How to extract genomic regions with PLINK

  • Published on 30 Sep 2024
  • This video gives an overview of ways to narrow down your SNP genotype data to the region you are most interested in. It also shows examples of output files.

Comments • 26

  • @md.shahjahan9175 · 3 years ago · +3

    Thanks for your continuing videos on genomics. I am also interested in learning how to estimate genetic parameters.

  • @sowadanognigamal134 · 1 year ago · +1

    I want to do a genome-wide association study (GWAS) using single nucleotide polymorphisms (SNPs). I have data for 173 SNPs generated in our lab, and I still need to add more, up to 500. The remaining ones we can get from the online 3K rice database. How can I extract those from the 3K data using Linux commands?

    • @GenomicsBootCamp · 1 year ago

      Hi, GWAS is one of the topics I want to follow up on in the near future.
      As for your question, the answer depends on which format the 3K data is in. If it is in a format PLINK can read (PED, binary PED, VCF, or any other), you can use PLINK's --extract option and specify the exact SNP names you want to get out.
      If it is in some non-standard format, you have to carefully extract the rows or columns corresponding to your chosen SNPs.

    • @sowadanognigamal134 · 1 year ago

      @@GenomicsBootCamp It's in PLINK format. I will give it a try. Thank you, Dr.
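A minimal Python sketch of the --extract approach from this thread, assuming PLINK 1.9 is installed as `plink` on the PATH and the 3K rice data is a binary fileset (.bed/.bim/.fam). The helper name `extract_snps_cmd` and the file names `rice3k`, `my_snps.txt`, and `rice3k_subset` are hypothetical placeholders. The function only builds the argument list, so you can inspect it before running it.

```python
# Hypothetical helper; assumes PLINK 1.9 is available as `plink`.
def extract_snps_cmd(bfile: str, snp_list: str, out: str, plink: str = "plink") -> list:
    """Build a PLINK 1.9 command keeping only the variant IDs listed in snp_list.

    bfile: prefix of the binary fileset (.bed/.bim/.fam)
    snp_list: plain-text file with one variant ID per line
    out: prefix for the new, smaller fileset
    """
    return [
        plink,
        "--bfile", bfile,       # read the binary PED fileset
        "--extract", snp_list,  # keep only the listed variant IDs
        "--make-bed",           # write the subset as a new binary fileset
        "--out", out,           # output prefix
    ]


if __name__ == "__main__":
    cmd = extract_snps_cmd("rice3k", "my_snps.txt", "rice3k_subset")
    print(" ".join(cmd))
    # Once PLINK and the input files are in place, run it with:
    # subprocess.run(cmd, check=True)
```

If the data is a VCF instead, PLINK 1.9 takes `--vcf file.vcf` in place of `--bfile prefix`; the `--extract` part stays the same.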

  • @kavoosmomeni4165 · 5 months ago

    Thank you. That is very helpful. Please add a link to your scripts.

  • @samrawittsehay1610 · 3 years ago · +3

    Thank you very much, your videos are really helpful. I was wondering how to control for population stratification in an association study.

    • @GenomicsBootCamp · 3 years ago

      Personally, I have used fitted PCA components in a linear model, but I am not too convinced by that approach; something always seems to remain in the QQ plot. My current go-to solution is the GEMMA software, which estimates a relatedness matrix from the genotypes. This is a double win: the software itself uses PLINK input files, and the relatedness matrix can be used right away to account for population structure/stratification in the follow-up association step.
      bioinformaticshome.com/tools/gwas/descriptions/GEMMA.html

    • @samrawittsehay1610 · 3 years ago · +2

      @@GenomicsBootCamp Thank you, Prof. Gabor. I will have a look at it. My problem was with the QQ plot: I got a high lambda value.
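A Python sketch of the two ideas in this thread, assuming GEMMA is installed as `gemma` and the phenotype sits in the .fam file (GEMMA's default). The helper names and prefixes are hypothetical. `gemma_commands` builds the two GEMMA steps (estimate a centered relatedness matrix with `-gk 1`, then fit a linear mixed model with a Wald test via `-lmm 1`, passing the matrix with `-k`). `genomic_inflation` computes the lambda value behind the QQ plot: the median observed 1-df chi-square statistic divided by its expected median (about 0.455). Values well above 1 point to remaining stratification or widespread polygenic signal.

```python
from statistics import NormalDist, median


# Hypothetical helper; assumes GEMMA is available as `gemma`.
def gemma_commands(bfile: str, prefix: str, gemma: str = "gemma") -> list:
    """Two GEMMA steps: relatedness matrix first, then a linear mixed model."""
    kinship_cmd = [gemma, "-bfile", bfile, "-gk", "1", "-o", f"{prefix}_kin"]
    # GEMMA writes into ./output/; -gk 1 produces the centered matrix (.cXX.txt)
    kinship_file = f"output/{prefix}_kin.cXX.txt"
    lmm_cmd = [gemma, "-bfile", bfile, "-k", kinship_file,
               "-lmm", "1", "-o", f"{prefix}_lmm"]
    return [kinship_cmd, lmm_cmd]


def genomic_inflation(p_values) -> float:
    """Lambda: median observed 1-df chi-square divided by its expected median.

    p_values must lie in (0, 1].
    """
    std_normal = NormalDist()
    # A two-sided p-value p corresponds to chi2 = z**2 with z = Phi^-1(p / 2).
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = std_normal.inv_cdf(0.75) ** 2  # median of chi2(1), ~0.455
    return median(chi2) / expected_median
```

Feeding the p-value column of GEMMA's association output into `genomic_inflation` before and after adding the relatedness matrix is one way to check whether the mixed model brought lambda closer to 1.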

  • @satishveto · 3 years ago · +2

    Your videos on PLINK are really good.

  • @Joy_Entertainment7 · 2 years ago · +1

    Thank you very much for the simple and complete explanation.

  • @alejandrorubio5583 · 1 year ago · +1

    Thanks for this clear and complete explanation.

  • @zahrakhamis8554 · 2 years ago · +1

    Thank you, Dr Gabor, the video is very helpful. I have a question: how can we write a loop in R to extract the SNPs from different chromosomes, say the 22 human chromosomes, in one step?

    • @GenomicsBootCamp · 2 years ago

      Hi,
      I am late with the answer here and you have probably found out already, but you have to put the PLINK command into an R loop, where you change the --chr option with each iteration as well as the --out file name, so that each output is saved under a different name.
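The loop described in this answer, sketched in Python rather than R (same idea: one PLINK call per chromosome, with `--chr` and `--out` changing on each iteration). The helper names and the input prefix `mydata` are hypothetical, and PLINK 1.9 is assumed to be on the PATH as `plink`.

```python
import subprocess


# Hypothetical helpers; assume PLINK 1.9 is available as `plink`.
def per_chromosome_commands(bfile: str, chromosomes=range(1, 23), plink: str = "plink") -> list:
    """One PLINK command per chromosome; --chr and --out change with each iteration."""
    commands = []
    for chrom in chromosomes:
        commands.append([
            plink,
            "--bfile", bfile,
            "--chr", str(chrom),             # keep only this chromosome
            "--make-bed",
            "--out", f"{bfile}_chr{chrom}",  # distinct output prefix per chromosome
        ])
    return commands


def run_all(commands) -> None:
    """Run the commands one after another, stopping at the first PLINK error."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in per_chromosome_commands("mydata"):
        print(" ".join(cmd))
```

Building the commands separately from running them makes it easy to print and check the 22 calls before calling `run_all`.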

  • @Joy_Entertainment7 · 2 years ago · +1

    Would you please make a video about how to analyze genomic data and estimate genomic breeding values (e.g. with GCTA or other practical software)?
    Thank you in advance, and happy new year!

    • @GenomicsBootCamp · 2 years ago · +1

      Yes, GCTA is on the plan at some point. The estimation of genomic breeding values might also come at some point, but it needs substantial preparation time.

  • @Crass1000 · 1 year ago

    Hi, is it possible to extract regions by giving the chromosome and position, rather than by SNP IDs?

    • @GenomicsBootCamp · 1 year ago

      Yes. Use the --from-kb/--to-kb or similar options, possibly combined with --chr if you want just one specific chromosome.
      www.cog-genomics.org/plink/1.9/filter#mrange_pos
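A sketch of the position-based approach, assuming PLINK 1.9's `--chr` combined with `--from-bp`/`--to-bp` (the base-pair counterparts of `--from-kb`/`--to-kb`). The helper name, the prefix `mydata`, and the coordinates in the usage line are hypothetical.

```python
# Hypothetical helper; assumes PLINK 1.9 is available as `plink`.
def region_cmd(bfile: str, chrom, start_bp: int, end_bp: int, out: str,
               plink: str = "plink") -> list:
    """Build a PLINK 1.9 command keeping variants on one chromosome between two positions."""
    if start_bp > end_bp:
        raise ValueError("start_bp must not be larger than end_bp")
    return [
        plink,
        "--bfile", bfile,
        "--chr", str(chrom),         # a single chromosome for the position range
        "--from-bp", str(start_bp),  # start of the range
        "--to-bp", str(end_bp),      # end of the range
        "--make-bed",
        "--out", out,
    ]


if __name__ == "__main__":
    print(" ".join(region_cmd("mydata", 6, 25_000_000, 35_000_000, "mydata_chr6_region")))
```

Working in base pairs avoids unit mistakes when copying coordinates from a genome browser; swap in `--from-kb`/`--to-kb` if your positions are already in kilobases.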

  • @edossamerga4814 · 2 years ago · +1

    Thank you, Professor, for all your videos on genomics. For beginners, could you please also cover the very first steps: downloading and installing PLINK?

    • @GenomicsBootCamp · 2 years ago

      Hi Edossa! I am not sure if I understand your request. Could you specify?

    • @edossamerga4814 · 2 years ago

      @@GenomicsBootCamp How to download and install PLINK 1.90.

  • @tesfayegetachew7791 · 3 years ago · +1

    Dear Gabor! Thank you very much. All your videos are really helpful. Could you tell me where I can find your R scripts?

    • @GenomicsBootCamp · 3 years ago

      Hi, I am glad you like it! The one-liners are most likely only in the videos, but I put the larger scripts on a pastebin account here: pastebin.com/u/GenomicsBootCamp
      If any larger script is missing, just let me know and I will add it as well.

    • @tesfayegetachew7791 · 3 years ago

      @@GenomicsBootCamp Got it. Thanks!

  • @georgewanjala4605 · 3 years ago · +1

    Thank you, Professor, for your tutorial.
    I am wondering how you got the SNP.txt file.

    • @georgewanjala4605 · 3 years ago · +1

      Especially these few, because in my case I am getting SNPs from chromosomes 1 to 29.

    • @GenomicsBootCamp · 3 years ago

      In this case I just manually copy-pasted a few SNPs for the example, to demonstrate the expected format. So there is no specific reason for exactly these in the video.
      There is usually a good reason why someone wants to extract or exclude specific SNPs; whatever they are, their names should appear in a single column.
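One way to produce such a single-column file without copy-pasting, sketched in Python: read a PLINK .bim file (column 1 is the chromosome code, column 2 the variant ID) and write the IDs of variants on chosen chromosomes, one per line. The helper name and file names are hypothetical, and filtering by chromosome is only an example criterion. The resulting file can be passed to PLINK's `--extract` or `--exclude`.

```python
from pathlib import Path


# Hypothetical helper; reads a standard 6-column PLINK .bim file.
def write_snp_list(bim_path: str, out_path: str, chromosomes) -> list:
    """Write IDs of .bim variants on the given chromosomes to out_path, one per line."""
    wanted = {str(c) for c in chromosomes}  # .bim chromosome codes are compared as text
    ids = []
    with open(bim_path) as bim:
        for line in bim:
            fields = line.split()  # chr, ID, cM, bp, allele 1, allele 2
            if len(fields) >= 2 and fields[0] in wanted:
                ids.append(fields[1])
    Path(out_path).write_text("".join(f"{snp}\n" for snp in ids))
    return ids
```

For example, `write_snp_list("mydata.bim", "SNP.txt", [14])` would list every variant on chromosome 14; replace the chromosome check with any other condition (a position window, IDs from a publication) to build a different list in the same format.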