The CRAM file format is simply a newer and more compressed version of the BAM file format, for anyone who was wondering that :)
Could you also do a video for the SLAM, JAM, and THANK YOU MA'AM file formats?
@programmer5350 Are you already familiar with the WHAM-BAM file formats?
No one can ever explain this better, love from Australia!
Awesome high quality bioinformatics video! We need more of these :)
Thank you so much Katharine! you saved a biotech eng. student from Mexico! 🇲🇽
Katharine, thank you so much for this video
SAM File Structure:
Header Section: Optional, starts with '@', contains metadata about the sequence and the alignments.
Alignment Section: Contains alignment information with each line representing a read.
Columns in SAM:
QNAME: Query template name.
FLAG: Bitwise flag.
RNAME: Reference sequence name.
POS: 1-based leftmost mapping position.
MAPQ: Mapping quality.
CIGAR: CIGAR string.
RNEXT: Reference name of the mate/next read.
PNEXT: Position of the mate/next read.
TLEN: Observed template length.
SEQ: Segment sequence.
QUAL: ASCII of Phred-scaled base quality+33.
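To make the eleven columns above concrete, here is a minimal Python sketch that splits one alignment line into those fields. The read name, coordinates, and sequence in the example line are made up for illustration, and the `SamRecord` and `parse_sam_line` names are hypothetical helpers, not part of any library. Real pipelines usually rely on a dedicated SAM/BAM parser instead.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns, in file order."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-separated alignment line into its mandatory fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


# Made-up alignment line, for illustration only.
example = "read1\t99\tchr1\t100\t60\t8M\t=\t150\t58\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(example)

# FLAG is bitwise: test individual bits with a mask.
is_paired = bool(rec.flag & 0x1)    # read is part of a pair
is_reverse = bool(rec.flag & 0x10)  # read maps to the reverse strand
print(rec.rname, rec.pos, is_paired, is_reverse)
```

Header lines (those starting with '@') should be skipped before calling the parser, since they do not follow the column layout.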
.FASTQ (Raw Sequence Data)
FASTQ is a text-based format for storing both nucleotide sequences and their corresponding quality scores. It is widely used in high-throughput sequencing.
File Structure:
Header Line: Starts with '@' followed by a sequence identifier.
Sequence Line: Contains the nucleotide sequence.
Plus Line: Starts with a '+' and may be followed by the same sequence identifier.
Quality Line: Contains quality scores for each nucleotide in the sequence, encoded as ASCII characters
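As a rough illustration of the four-line layout above, this Python sketch reads one record and decodes its quality line. It assumes the Phred+33 offset (the same one listed for the SAM QUAL column); older files may use a different offset. The record contents and the `parse_fastq_record` name are made up for the example.

```python
def parse_fastq_record(lines):
    """Parse one 4-line FASTQ record into (identifier, sequence, Phred scores)."""
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@"):
        raise ValueError("header line must start with '@'")
    if not plus.startswith("+"):
        raise ValueError("third line must start with '+'")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lines must have equal length")
    # Assumption: Phred+33 encoding, so subtract 33 from each ASCII code.
    scores = [ord(ch) - 33 for ch in qual]
    return header[1:], seq, scores


# Made-up record, for illustration only.
record = ["@read1", "ACGT", "+", "II5!"]
name, seq, scores = parse_fastq_record(record)
print(name, seq, scores)
```

Each character on the quality line maps to exactly one base, which is why the two lines must be the same length.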
Simple and amazing explanation.
This video deserves more views.
Amazing explanation, really cleared up many things just by watching, thanks a ton and keep up the good work:)
Superb👏
This was excellently done and easy to follow! Thank you!
Thank you for the clear explanations of basics.
Great video, helped me disambiguate many concepts!
Glad it helped!
This was very helpful and very well explained. You are talented 🙂
thank you very much. this is so helpful and very clear to understand easily
Good... 👍 Nicely explained
Thanks, you make it easy to understand. Keep going.
excellent description!
Very good explanations!! Looking forward to watching more of your videos!
good explanation thanks
Thank you so much. You explained all this so easily 🤗🤩
Super helpful, thank you so much. Please do a video on how to use different software tools.
Hi! I have been working on some content for certain software tools. What software did you have in mind?
thanks brilliant- very helpful!
You are amazing..
Very clear video. Thank you.
Katharine, could you please explain how to convert .fastq files to .vcf? Thank you
Thank you for the explanation! It's really confusing at first glance!
Some great explanations in your videos. I'm really curious as to what we can do with the data once we get it. Right at the end of this video, you mentioned a video that would explain some of this. Do you still plan to make this?
Hi Simon! Yes I think what to do with the data is a question on everyone's mind who has had a DNA test. I do plan to make that video still. Stay tuned! If you want help in the short term, Guardiome does private custom DNA Analysis: www.guardiome.com/custom-dna-analysis.
excellent video!
Well done, but I still have a doubt! If I have uploaded a VCF file to YFull and then upload the BAM afterwards, what are the advantages?
Really clear, thanks!
GREAT VIDEO!!!!!!!!!
Amazing! Succinct! Thank you!!!!
Nice video! Do you know any software tool I can use to compare the results of full genome sequencing from two different companies? I have bought tests from Dante and Nebula, and once I get the results I would like to be able to compare them and do some statistical analysis of the differences.
Great go on
Is 30x coverage or 100x coverage better? Which is more accurate? Is 100x overkill, or is it necessary to reduce the error margin?
With Illumina sequencing, ~89% of base calls are above Q30 (99.9% accurate). 30X means having ~30 base calls for each nucleotide.
So 30X is usually all you need. 100X may be used when high variation is expected, like in a tumor.
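For anyone wondering where "Q30 = 99.9% accurate" comes from, here is a small sketch of the standard Phred relationship, where the error probability is 10^(-Q/10). The function names are just illustrative.

```python
def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q: float) -> float:
    """Probability that a base call with Phred score q is correct."""
    return 1 - phred_to_error_prob(q)


for q in (10, 20, 30, 40):
    print(f"Q{q}: error ~ {phred_to_error_prob(q):.4f}, "
          f"accuracy ~ {phred_to_accuracy(q):.4%}")
```

Every 10 points of Q reduces the error rate tenfold, so Q30 works out to roughly 1 error in 1,000 base calls.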
great
Thank you for the information. I have a question: in exams they always ask what the difference is between a FASTQ and a BAM file. What is the best short answer to this question?
bam is aligned to the reference genome, fastq is not.
I would agree with that. The fastq file contains unordered reads. The bam file contains the same reads plus the location each maps to in the reference genome.
Awesome video
But I'm not too sure about your explanation of genome coverage.
Your explanation of it sounded more like read depth.
Right, I see your point. There are basically two concepts at hand, both of which are important for variant calling: one is the percentage of the genome that was sequenced, and the other is the number of reads with a base call at a given nucleotide. For 30X depth sequencing, we want about 30 reads covering each nucleotide.
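The two concepts can be told apart with a toy calculation. This sketch assumes you already have a per-position depth list (the numbers below are made up, and `coverage_summary` is a hypothetical helper); it reports mean read depth and the fraction of positions covered by at least one read.

```python
def coverage_summary(depths, min_depth=1):
    """Return (mean depth, breadth) for a list of per-position read depths.

    Breadth is the fraction of positions with at least `min_depth` reads.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    mean_depth = sum(depths) / n
    breadth = sum(1 for d in depths if d >= min_depth) / n
    return mean_depth, breadth


# Made-up depths for five reference positions; one position has no reads.
depths = [30, 28, 0, 32, 30]
mean_depth, breadth = coverage_summary(depths)
print(f"mean depth: {mean_depth:.1f}x, breadth: {breadth:.0%}")
```

A sample can have a high average depth and still miss parts of the genome, which is why both numbers matter for variant calling.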
It’s so helpful thanks, but the music is not necessary
So the company I tested with gave me these files, but none of them is transferable to the famous ancestry databases. Is there a way to convert them?
DNAgenics can convert whole genome files, if that's what you mean, into a raw data file similar to those from 23andMe, AncestryDNA, etc. That will allow you to upload your new results to third-party sites.
Awesome explanation... Can you please tell me what a VCF file will look like if the segments from the mother and the father both have a different nucleotide from that of the reference?
FASTQ data need trimming.
HELP
Any question in particular I can help with?