FASTQ, BAM, and VCF file formats easily explained - A must watch if you have had a DNA test

Katharine ME

15,764 views

Published on 22 Oct 2024

Comments • 54

@zainabumarabdullahi9446 7 months ago
No one can ever explain this better, love from Australia!
@programmer5350 2 years ago ⁺⁷
Awesome high quality bioinformatics video! We need more of these :)
@aleksandraperz5037 8 months ago
Katharine, thank you so much for this video
@MrFilu13 several months ago
Good... 👍 Nicely explained
@secondeye3927 several months ago
thank you very much. this is so helpful and very clear to understand easily
@KatharineME 2 years ago ⁺⁶
The CRAM file format is simply a newer and more compressed version of the BAM file format, for anyone who was wondering that :)
@programmer5350 2 years ago ⁺¹
Could you also do a video for the SLAM, JAM, and THANK YOU MA'AM file formats?
@cristianm7097 2 years ago ⁺¹
@programmer5350 Are you already familiar with the WHAM-BAM file formats?
@miguelarellano5260 a year ago
Thank you so much Katharine! You saved a biotech eng. student from Mexico! 🇲🇽
@doctorkash0792 5 months ago
Amazing explanation, really cleared up many things just by watching, thanks a ton and keep up the good work:)
@NM-tx7zm 5 months ago
This was excellently done and easy to follow! Thank you!
@deepap1307 5 months ago
Thank you for the clear explanations of basics.
@md.mohiuddinmasum3632 2 years ago
Simple and amazing explanation.
This video deserves more views.
@europhile2658 2 months ago
excellent description!
@stephenjohnson9733 several months ago
good explanation thanks
@humarafique3093 8 months ago
Superb👏
@TimoHromadka a year ago
Great video, helped me disambiguate many concepts!
@KatharineME a year ago
Glad it helped!
@sapandeepsandhu4410 4 months ago
SAM File Structure:
Header Section: Optional, starts with '@', contains metadata about the sequence and the alignments.
Alignment Section: Contains alignment information with each line representing a read.
Columns in SAM:
QNAME: Query template name.
FLAG: Bitwise flag.
RNAME: Reference sequence name.
POS: 1-based leftmost mapping position.
MAPQ: Mapping quality.
CIGAR: CIGAR string.
RNEXT: Reference name of the mate/next read.
PNEXT: Position of the mate/next read.
TLEN: Observed template length.
SEQ: Segment sequence.
QUAL: ASCII of Phred-scaled base quality+33.
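As a rough illustration of the column list above, here is a minimal Python sketch that splits a SAM alignment line into its 11 mandatory fields and skips header lines starting with '@'. The `SamRecord` type, the helper names, and the two example reads are made up for this sketch; optional tag columns after QUAL are ignored, and a real pipeline would more likely use an established library.

```python
from typing import Iterable, Iterator, NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns, in file order (hypothetical helper type)."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str     # Phred+33 encoded base qualities


SAM_UNMAPPED = 0x4  # FLAG bit that marks a read as unmapped


def parse_sam_line(line: str) -> SamRecord:
    """Parse one tab-separated alignment line; extra tag columns are dropped."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


def read_sam(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Yield alignment records, skipping '@' header lines and blank lines."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        yield parse_sam_line(line)


# Made-up example: two header lines, one mapped read, one unmapped read.
sam_text = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tII5!\n",
]
records = list(read_sam(sam_text))
for rec in records:
    status = "unmapped" if rec.flag & SAM_UNMAPPED else f"{rec.rname}:{rec.pos}"
    print(rec.qname, status)
```

BAM stores the same records in compressed binary form, so this text parsing applies to SAM only.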
@kaoulkae 2 years ago ⁺¹
This was very helpful and very well explained. You are talented 🙂
@shawnmcmurtrey8090 2 years ago
Very good explanations!! Looking forward to watching more of your videos!
@mst63th 2 years ago
Thanks, you make it easy to understand. Keep going.
@sapandeepsandhu4410 4 months ago
.FASTQ (Raw Sequence Data)
FASTQ is a text-based format for storing both nucleotide sequences and their corresponding quality scores. It is widely used in high-throughput sequencing.
File Structure:
Header Line: Starts with '@' followed by a sequence identifier.
Sequence Line: Contains the nucleotide sequence.
Plus Line: Starts with a '+' and may be followed by the same sequence identifier.
Quality Line: Contains quality scores for each nucleotide in the sequence, encoded as ASCII characters
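The four-line structure above can be sketched in Python as follows. This assumes each record occupies exactly four lines (no wrapped sequences) and that qualities use the Phred+33 encoding; the function name `read_fastq` and the example read are made up for illustration.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (identifier, sequence, quality scores) for each 4-line record."""
    while True:
        header = handle.readline()
        if not header:            # end of file
            return
        if not header.strip():    # tolerate blank lines between records
            continue
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33 (assumed): score = ASCII code of the character minus 33.
        scores = [ord(ch) - 33 for ch in qual]
        yield header[1:].rstrip("\n"), seq, scores


# Made-up example record: 'I' decodes to 40, '5' to 20, '!' to 0.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
fastq_records = list(read_fastq(example))
print(fastq_records)
```

Older Illumina files used a different offset (Phred+64), so the offset is worth checking before trusting the decoded scores.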
@iamadityavaishy 2 years ago
Thank you so much. You explained all this so easily 🤗🤩
@KristinaBecanovic 8 months ago
thanks brilliant- very helpful!
@carlloeber a year ago
You are amazing..
@franciscoromogaray3076 9 months ago
Really clear, thanks!
@sanakhawer693 2 years ago
super helpful thank you so much.... please do a video on how to use different softwares
@KatharineME a year ago
Hi! I have been working on some content for certain softwares, what software did you have in mind?
@sinaisbitt a year ago ⁺¹
Some great explanations in your videos. I'm really curious as to what we can do with the data once we get it. Right at the end of this video, you mentioned a video that would explain some of this. Do you still plan to make this?
@KatharineME a year ago
Hi Simon! Yes I think what to do with the data is a question on everyone's mind who has had a DNA test. I do plan to make that video still. Stay tuned! If you want help in the short term, Guardiome does private custom DNA Analysis: www.guardiome.com/custom-dna-analysis.
@gerardmingarro6788 a year ago
excellent video!
@wakeup9199 4 months ago
Well done, but I still have a doubt! If I uploaded a VCF file to YFull and after that I upload the BAM, what are the advantages?
@arioche several months ago
great
@kikiarev 2 years ago
Thank you for the explanation! It's really confusing at first glance!
@LappingMaster 9 months ago
GREAT VIDEO!!!!!!!!!
@oksana03fel 2 years ago
Very clear video. Thank you.
Katharine, could you please explain how to convert .fastq files to .vcf? Thank you
@eduardofernandezdelpeloso8663 a year ago
Nice video!
Do you know any software tool I can use to compare the results of full genome sequencing from two different companies? I have bought tests from Dante and Nebula, and once I get the results I would like to be able to compare them and do some statistical analysis of the differences.
@mohamedesmailelsalahaty6050 2 years ago
Great go on
@spicesmiles a year ago
Amazing! Succinct! Thank you!!!!
@جزائريهوآفتخر-ص9ث a year ago
Thank you for your information. I have a question: in exams they always ask what the difference is between a FASTQ and a BAM file. What is the best short answer to this question?
@hatchet646 a year ago ⁺¹
bam is aligned to the reference genome, fastq is not.
@KatharineME a year ago
I would agree with that. The fastq file contains unordered reads. The bam file contains the same reads plus the location each maps to in the reference genome.
@e3.s.nro2tan75 a year ago
Is 30x coverage or 100x coverage better? Which is more accurate? Is 100x overkill, or is it necessary to reduce the error margin?
@KatharineME a year ago
With Illumina sequencing, ~89% of base calls are above Q30 (99.9% accurate). 30X means having ~30 base calls for each nucleotide.
So 30X is usually all you need. 100X may be used when high variation is expected, like in a tumor.
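The Q30 and 30X figures in this reply follow from two small formulas, sketched here in Python. The Phred relation (error probability = 10^(-Q/10)) is standard; the read count, read length, and genome size below are illustrative round numbers chosen for this sketch, not figures from the video.

```python
def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def expected_mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of base calls per reference position."""
    return n_reads * read_length / genome_size


# Q30 corresponds to a 1-in-1000 error rate, i.e. 99.9% accuracy.
q30_error = phred_to_error_prob(30)
print(f"Q30 error probability: {q30_error:.4f}")
print(f"Q30 accuracy: {1 - q30_error:.1%}")

# Illustrative only: 600 million reads of 150 bases on a 3-billion-base genome.
depth = expected_mean_depth(600_000_000, 150, 3_000_000_000)
print(f"Expected mean depth: {depth:.0f}X")
```

This gives the average depth only; the actual depth at any single position varies around that average.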
@lolisimon2933 a year ago
Awesome video
But I'm not too sure about your explanation of genome coverage
Your explanation for it sounded more like read depth
@KatharineME a year ago
Right, I see your point. There are basically two concepts at hand which are both important for variant calling: one is the percent of the genome that was sequenced, and the other is the number of reads with a base call at a given nucleotide. For 30X depth sequencing, we want about 30 reads covering each nucleotide.
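To make the two concepts in this reply concrete, here is a small Python sketch that computes both from a list of per-position read depths. The function name `coverage_summary`, the `min_depth` threshold, and the ten-position toy region are assumptions for illustration.

```python
from typing import Sequence, Tuple


def coverage_summary(depths: Sequence[int], min_depth: int = 1) -> Tuple[float, float]:
    """Return (breadth, mean depth) for a list of per-position read depths.

    breadth:    fraction of positions covered by at least min_depth reads
    mean depth: average number of reads per position
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths), sum(depths) / len(depths)


# Toy region of 10 positions: three have no reads, the rest have about 30.
depths = [0, 0, 30, 32, 28, 30, 31, 29, 0, 30]
breadth, mean_depth = coverage_summary(depths)
print(f"breadth: {breadth:.0%}, mean depth: {mean_depth:.1f}X")
```

In this toy region the covered positions each have about 30 reads, yet the mean depth is lower because uncovered positions pull the average down, which is why the two numbers are reported separately.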
@saud319 2 years ago
So the company I tested with gave me these files but none of them is transferable to the famous ancestry data bases. Is there a way to convert them?
@chibi171 2 years ago
DNAgenics can convert whole genome files, if that's what you mean, into a RAW data file similar to 23andme and AncestryDNA etc., which will allow you to upload your new results to third party sites.
@dpchand 2 years ago
Awesome explanation... can you please tell how vcf file will look like if the segment from mother and father both have different nucleotide from that of reference?
@RayY-r4j 6 months ago ⁺¹
FASTQ data need trimming.
@markcuello5 a year ago
HELP
@KatharineME a year ago
Any question in particular I can help with?
