Bitesize Bioinformatics: Downloading sequencing data from GEO and SRA

  • Published on 7 Aug 2024
  • In this video we're going to go through some of the different options you have for downloading raw sequence data in fastq format from the big public sequencing databases, GEO and SRA. We'll look at a couple of different database interfaces and other accessory tools which can help unlock the valuable data that these systems provide.
    Some of the sites and tools which we mention specifically are:
    GEO: www.ncbi.nlm.nih.gov/geo/
    ENA: www.ebi.ac.uk/ena
    SRA Explorer: sra-explorer.info/
    SRA Toolkit: ncbi.github.io/sra-tools/inst...
    SRA Downloader: github.com/s-andrews/sradownl...
  • Science & Technology

Comments • 28

  • @sahanamuthukumar6732
    3 years ago +6

    Thank you soooo much... After cracking my head for 3 days over the raw sequence data, I finally understood. Thanks to you.

  • @ishasingh9809
    4 years ago +4

    Thank you for all these amazing videos. They are highly recommended for biologists who lack bioinformatics skills. Could you please make videos on histone ChIP-seq analysis, starting from FastQC through to GO, since all the available ones are really old? Thanks again and please keep up the great work 👍

  • @Mr2009johnsteele
    4 years ago +1

    Thanks, really helpful video. Keep them coming!

  • @MrJonathanU
    1 year ago

    This is an excellent video. It clarifies so much! Cheers!

  • @berrydp
    1 year ago

    Great video and walk-through. Thank you!

  • @shravastimisra6793
    1 year ago

    Very useful resource for us dummies. Thank you!

  • @sarahaghani7663
    2 years ago

    This was super useful! Thank you so much!!!

  • @vitortarghetta418
    3 years ago

    This video is Amazing! Thanks for uploading it

  • @fmetaller
    3 years ago +1

    Thanks. It's a great guide!

  • @antoniarosaneta
    2 years ago

    Thank you very much for this valuable content. Helped me a looooot

  • @nesibesebnem2685
    3 years ago

    such a valuable content mashallah. thanks a lot.

  • @CoCo-bv4mn
    3 years ago

    What invaluable material

  • @taifshah9003
    3 years ago

    Thank you for your quick response...

    • @BabrahamBioinf
      3 years ago

      Please contact us at babraham.bioinformatics@babraham.ac.uk with any questions. Thanks.

  • @romanatorx3949
    3 years ago

    Amazing video - I love the --cantspell :D

    • @simonandrews5604
      3 years ago

      Thanks! Fortunately they've fixed that so it's not an issue with newer releases. We've also expanded support in SRAdownloader - it can now also download fastq files directly from ENA which seems quicker and more reliable, and it can also just be fed a list of SRR accessions (or a single SRR name) to get the corresponding data.
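
As a rough illustration of the "feed it a list of SRR accessions and fetch fastq from ENA" idea, here is a minimal Python sketch. It is not how SRAdownloader itself works. It assumes the ENA Portal API `filereport` endpoint with the `read_run` result type and the `fastq_ftp` field, so check the ENA documentation before relying on it. The helper names `filereport_url` and `parse_filereport` are hypothetical, and nothing here touches the network.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint: the ENA Portal API file report (verify against the ENA docs).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a file-report query URL for one run accession (e.g. an SRR name)."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text: str) -> dict:
    """Map each run accession in a TSV report to its list of fastq URLs.

    Assumption: paired-end runs list their files separated by ';' in the
    fastq_ftp column, and the paths are given without a scheme.
    """
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        files = [f for f in row["fastq_ftp"].split(";") if f]
        runs[row["run_accession"]] = ["ftp://" + f for f in files]
    return runs
```

To use it, loop over your SRR list, fetch each `filereport_url(...)` with whatever HTTP client you prefer, and pass the response text to `parse_filereport` to get the files to download.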

  • @yajinghe1092
    1 year ago

    A great lesson! I have a question: at around 15:02, when you switch from the downloaded fastq data to the analysis screen, what is that analysis app? I cannot get it at this step. Thank you very much!

    • @BabrahamBioinf
      1 year ago

      Hi, the software download at 15:02 is the route (direct from NCBI) which is NOT recommended. Other options are detailed from 17:04

  • @guruprasadh7928
    3 years ago

    Thank you for the informative video. I have a question and would request you to help me out. Should we consider SRR files as technical replicates or should we have to pool the SRR files for further analysis?

    • @simonandrews5604
      3 years ago

      If you have multiple SRR accessions for a single SRX then the implication from the structure of the database is that these are technical replicates of the same sample, so the same library split across multiple sequencing lanes. I would therefore look at merging them before analysing them. I'd also recommend reading the associated paper and metadata in case the submitters have done something strange within the constraints provided by GEO/SRA though.
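
One possible way to do that merge is sketched below in Python. It assumes you already have a mapping of SRR to SRX accessions (for example from the run metadata) and gzipped fastq files on disk. The function names and file layout are hypothetical. It relies on the fact that gzip files can be joined byte-for-byte, because a gzip stream may hold several compressed members.

```python
import shutil
from collections import defaultdict


def group_runs_by_experiment(run_to_experiment):
    """Invert an SRR -> SRX mapping into SRX -> sorted list of SRRs."""
    groups = defaultdict(list)
    for run, experiment in run_to_experiment.items():
        groups[experiment].append(run)
    return {exp: sorted(runs) for exp, runs in groups.items()}


def merge_fastq_gz(inputs, output):
    """Concatenate gzipped fastq files without recompressing them.

    The joined file decompresses to the reads of every input, in the
    order the inputs were given.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

For paired-end data, merge the read 1 files and the read 2 files separately, using the same run order for both so the pairs stay in sync.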

    • @guruprasadh7928
      3 years ago

      @@simonandrews5604 Thank you so much for helping me out.

  • @abdullahimuhammadsirajo9647
    1 year ago

    Great and informative video. Thanks for this. But is it possible to convert the downloaded data to say excel/csv format unaltered?

    • @simonandrews5604
      1 year ago

      The raw data you get from these databases are not going to be suitable to put into something like excel - they'd be way too big. If you're after quantitations generated from the data then every entry in GEO has to have a quantitated data file with it. There are no fixed rules for what this file has to be so the contents vary wildly from sample to sample. In some cases you would have a file which would be compatible with a spreadsheet (a matrix of samples vs counts or normalised expression for example), but in many cases even this won't be suitable (bigWig files for whole genome quantitation for example). The quantitated data will appear as a supplementary file at the bottom of the sample's GEO page. The metadata on the sample will describe what the quantitation is and how it was generated.
      A more reliable way to deal with this is to process the data to generate the quantitations you want. It's more work but at least you know what you're getting and you can do it consistently.
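
For the case where the supplementary file does happen to be a spreadsheet-style matrix, a conversion to CSV could look like the Python sketch below. It assumes a tab-delimited text file, optionally gzipped. Real GEO supplementary files vary, so the delimiter and layout are assumptions to check against the file itself, and `matrix_to_csv` is a hypothetical helper name.

```python
import csv
import gzip


def matrix_to_csv(source, destination, delimiter="\t"):
    """Rewrite a delimited text matrix (optionally gzipped) as CSV.

    Values are copied through as text, unaltered. Returns the number of
    rows written, header included.
    """
    opener = gzip.open if str(source).endswith(".gz") else open
    rows = 0
    with opener(source, "rt", newline="") as src, \
            open(destination, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter=delimiter):
            writer.writerow(row)
            rows += 1
    return rows
```

Because the rows are streamed one at a time, this copes with large matrices, although a spreadsheet program may still struggle to open the result.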

    • @abdullahimuhammadsirajo9647
      1 year ago

      @@simonandrews5604 Thank you for your detailed clarification. I really appreciate it.

  • @michaelagronah
    2 years ago

    Thanks so much for this video. I am running Ubuntu 20.04 and can't get sradownloader installed on it. Does sradownloader work on Ubuntu 20.04?

    • @simonandrews5604
      2 years ago

      It should do - anywhere with a recent Python should work. If you're having problems, can you open an issue in the sradownloader issue tracker and post the full output of the command you ran?

    • @michaelagronah
      2 years ago

      @@simonandrews5604 Thanks so much for the quick response. I will update my python and reinstall it. Once again thanks so much

  • @kennyday8767
    2 years ago

    Hmm "Bitesize" Bioinformatics...this video is 45m long...and goes well outside the scope of SRA.