FASTQ, BAM, and VCF file formats easily explained - A must watch if you have had a DNA test

  • Published on 28 Nov 2024

Comments • 55

  • @KatharineME  2 years ago +7

    The CRAM file format is simply a newer and more compressed version of the BAM file format, for anyone who was wondering that :)

    • @programmer5350  2 years ago +1

      Could you also do a video for the SLAM, JAM, and THANK YOU MA'AM file formats?

    • @cristianm7097  2 years ago +1

      @programmer5350 Are you already familiar with the WHAM-BAM file formats?

  • @zainabumarabdullahi9446  8 months ago

    No one can ever explain this better, love from Australia!

  • @programmer5350  2 years ago +7

    Awesome high quality bioinformatics video! We need more of these :)

  • @miguelarellano5260  1 year ago

    Thank you so much Katharine! You saved a biotech eng. student from Mexico! 🇲🇽

  • @aleksandraperz5037  9 months ago +1

    Katharine, thank you so much for this video

  • @sapandeepsandhu4410  6 months ago

    SAM File Structure:
    Header Section: Optional, starts with '@', contains metadata about the sequence and the alignments.
    Alignment Section: Contains alignment information with each line representing a read.
    Columns in SAM:
    QNAME: Query template name.
    FLAG: Bitwise flag.
    RNAME: Reference sequence name.
    POS: 1-based leftmost mapping position.
    MAPQ: Mapping quality.
    CIGAR: CIGAR string.
    RNEXT: Reference name of the mate/next read.
    PNEXT: Position of the mate/next read.
    TLEN: Observed template length.
    SEQ: Segment sequence.
    QUAL: ASCII of Phred-scaled base quality+33.
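
A minimal Python sketch of reading the eleven mandatory columns listed above, assuming a tab-separated alignment line. The names `SamRecord` and `parse_sam_line` are hypothetical, the example read is made up, and optional tag fields after column 11 are simply ignored:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamRecord:
    qname: str   # query template name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # segment sequence
    qual: str    # Phred+33 encoded base qualities


def parse_sam_line(line: str) -> Optional[SamRecord]:
    """Return a SamRecord for an alignment line, or None for a header line."""
    if line.startswith("@"):
        return None  # header lines start with '@'
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated columns")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10],
    )


# Made-up example read, for illustration only.
example = "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar)
```

For real BAM or CRAM files an established library is the safer choice; this only shows how the columns line up.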

  • @sapandeepsandhu4410  6 months ago

    .FASTQ (Raw Sequence Data)
    FASTQ is a text-based format for storing both nucleotide sequences and their corresponding quality scores. It is widely used in high-throughput sequencing.
    File Structure:
    Header Line: Starts with '@' followed by a sequence identifier.
    Sequence Line: Contains the nucleotide sequence.
    Plus Line: Starts with a '+' and may be followed by the same sequence identifier.
    Quality Line: Contains quality scores for each nucleotide in the sequence, encoded as ASCII characters.
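
The four-line structure above can be sketched in Python roughly as follows. This assumes unwrapped four-line records and the Phred+33 quality offset (an assumption, since the offset is not stated above); `read_fastq` and `phred33_scores` are hypothetical names and the read is made up:

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual


def phred33_scores(qual):
    """Decode a quality string into integer scores, assuming the Phred+33 offset."""
    return [ord(ch) - 33 for ch in qual]


# Made-up single-read example, for illustration only.
text = "@read1\nACGT\n+\nII5!\n"
records = list(read_fastq(io.StringIO(text)))
for name, seq, qual in records:
    print(name, seq, phred33_scores(qual))
```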

  • @md.mohiuddinmasum3632  2 years ago

    Simple and amazing explanation.
    This video deserves more views.

  • @doctorkash0792  6 months ago

    Amazing explanation, really cleared up many things just by watching, thanks a ton and keep up the good work:)

  • @humarafique3093  10 months ago

    Superb👏

  • @NM-tx7zm  7 months ago

    This was excellently done and easy to follow! Thank you!

  • @deepap1307  6 months ago

    Thank you for the clear explanations of basics.

  • @TimoHromadka  1 year ago

    Great video, helped me disambiguate many concepts!

  • @kaoulkae  2 years ago +1

    This was very helpful and very well explained. You are talented 🙂

  • @secondeye3927  2 months ago

    thank you very much. this is so helpful and very clear to understand easily

  • @MrFilu13  2 months ago

    Good... 👍 Nicely explained

  • @mst63th  2 years ago

    Thanks, you make it easy to understand. Keep going.

  • @europhile2658  3 months ago

    excellent description!

  • @shawnmcmurtrey8090  2 years ago

    Very good explanations!! Looking forward to watching more of your videos!

  • @stephenjohnson9733  2 months ago

    good explanation thanks

  • @iamadityavaishy  2 years ago

    Thank you so much. You explained all this so easily 🤗🤩

  • @sanakhawer693  2 years ago

    Super helpful, thank you so much... Please do a video on how to use different software tools.

    • @KatharineME  1 year ago

      Hi! I have been working on some content for certain software. What software did you have in mind?

  • @KristinaBecanovic  10 months ago

    thanks brilliant- very helpful!

  • @carlloeber  1 year ago

    You are amazing..

  • @oksana03fel  2 years ago

    Very clear video. Thank you.
    Katharine, could you please explain how to convert .fastq files to .vcf? Thank you.

  • @kikiarev  2 years ago

    Thank you for the explanation! It's really confusing at first glance!

  • @sinaisbitt  1 year ago +1

    Some great explanations in your videos. I'm really curious as to what we can do with the data once we get it. Right at the end of this video, you mentioned a video that would explain some of this. Do you still plan to make this?

    • @KatharineME  1 year ago

      Hi Simon! Yes I think what to do with the data is a question on everyone's mind who has had a DNA test. I do plan to make that video still. Stay tuned! If you want help in the short term, Guardiome does private custom DNA Analysis: www.guardiome.com/custom-dna-analysis.

  • @gerardmingarro6788  1 year ago

    excellent video!

  • @wakeup9199  5 months ago

    Well done, but I still have a doubt! If I have uploaded a VCF file to YFull and then upload the BAM, what are the advantages?

  • @franciscoromogaray3076  10 months ago

    Really clear, thanks!

  • @LappingMaster  11 months ago

    GREAT VIDEO!!!!!!!!!

  • @spicesmiles  2 years ago

    Amazing! Succinct! Thank you!!!!

  • @eduardofernandezdelpeloso8663  1 year ago

    Nice video!
    Do you know of any software tool I can use to compare the results of full genome sequencing from two different companies? I have bought tests from Dante and Nebula, and once I get the results I would like to be able to compare them and do some statistical analysis of the differences.

  • @mohamedesmailelsalahaty6050  2 years ago

    Great go on

  • @e3.s.nro2tan75  1 year ago

    Is 30x coverage or 100x coverage better? Which gives better accuracy? Is 100x overkill, or is it necessary to reduce the error margin?

    • @KatharineME  1 year ago

      With Illumina sequencing, ~89% of base calls are above Q30 (99.9% accurate). 30X means having ~30 base calls for each nucleotide.
      So 30X is usually all you need. 100X may be used when high variation is expected, like in a tumor.
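
The link between Q30 and 99.9% accuracy follows from the standard Phred definition, Q = -10 * log10(P), where P is the probability that a base call is wrong. A small sketch (the function name `phred_to_error_prob` is hypothetical):

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q into the error probability P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4%}, accuracy {1 - p:.4%}")
```

So each step of 10 in Q is a tenfold drop in the chance of a wrong call, and Q30 corresponds to roughly 1 error in 1,000 base calls.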

  • @arioche  3 months ago

    great

  • @جزائريهوآفتخر-ص9ث  1 year ago

    Thank you for your information. I have a question: in exams they always ask what the difference is between a FASTQ and a BAM file. What is the best short answer to this question?

    • @hatchet646  1 year ago +1

      bam is aligned to the reference genome, fastq is not.

    • @KatharineME  1 year ago

      I would agree with that. The fastq file contains unordered reads. The bam file contains the same reads plus the location each maps to in the reference genome.

  • @lolisimon2933  1 year ago

    Awesome video.
    But I'm not too sure about your explanation of genome coverage.
    Your explanation for it sounded more like read depth.

    • @KatharineME  1 year ago

      Right, I see your point. There are basically two concepts at hand which are both important for variant calling: one is the percent of the genome that was sequenced, and the other is the number of reads with a base call at a given nucleotide. For 30X depth sequencing, we want about 30 reads covering each nucleotide.
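
To make the two concepts concrete, here is a small sketch that computes both from a list of per-position read depths. The function name `breadth_and_mean_depth`, the `min_depth` threshold, and the numbers are all made up for illustration:

```python
def breadth_and_mean_depth(depths, min_depth=1):
    """Return (breadth, mean depth) for a list of per-position read depths.

    Breadth is the fraction of positions covered by at least `min_depth` reads;
    mean depth is the average number of reads per position.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths), sum(depths) / len(depths)


# Made-up depths for four positions: one position has no reads at all.
depths = [0, 10, 30, 40]
breadth, mean_depth = breadth_and_mean_depth(depths)
print(f"breadth: {breadth:.0%}, mean depth: {mean_depth:.1f}X")
```

A sample can have a high mean depth and still miss parts of the genome, which is why both numbers matter.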

  • @mubinpshtiwan2006  26 days ago

    It’s so helpful thanks, but the music is not necessary

  • @saud319  2 years ago

    So the company I tested with gave me these files, but none of them can be transferred to the famous ancestry databases. Is there a way to convert them?

    • @chibi171  2 years ago

      DNAgenics can convert whole genome files, if that's what you mean, into a raw data file similar to those from 23andMe, AncestryDNA, etc. That will allow you to upload your new results to third-party sites.

  • @dpchand  2 years ago

    Awesome explanation... Can you please tell me what the VCF file will look like if the segments from the mother and the father both have a different nucleotide from that of the reference?

  • @RayY-r4j  7 months ago +1

    FASTQ data need trimming.

  • @markcuello5  1 year ago

    HELP

    • @KatharineME  1 year ago

      Any question in particular I can help with?