This is a really inspiring video. Previously I had good sampling effort, so sampling depth did not bother me. Now I have samples with a very low sampling depth: some even below 100 reads, and many below 1,000. I have to make a compromise decision for this analysis. By the way, I am working with soil samples. Thanks a lot.
Thanks for watching! I think people get too worked up about the number of reads. More important is the number of samples. It's a tradeoff, but I always lean towards having more samples
It's just great to find such valuable information on YouTube, thanks a lot Pat.
It’s really my pleasure. Thanks! 🤓
Thank you, I enjoyed studying from your videos a lot. They are very detailed, and the thing I really like about them is that you actually talk about what we, as researchers, should think about the data we have. I have learnt a lot. Thanks! Also, the explanations you give as you type the code are really helpful.
Thanks for watching Lilian! 🤓
Wow - thank you Pat! I will probably write a longer e-mail regarding this topic. I held back because I knew you would talk about it.
People following this channel can become SOOOOOOOOOOO much better in coding and biology.
I visited 3 unis in Germany and I swear no one except my current PI has taught me as much as you 😅🤣
P.S. the Mexican bots are hitting and boosting your YouTube algorithm :D
Ha! Thanks so much for watching and being such a loyal viewer 🤓
Your videos are awesome. Clear explanations, excellent coding.
Thanks Brant! Glad to have you watching 🤓
Thank you, thank you! I have been thinking about this topic for a couple of weeks now!
Wonderful- thanks for watching!🤓
Thanks a lot for your videos, they are amazing. You explain very well and address exactly the questions that popped up as soon as I started working on my actual data!
Wonderful! Thanks for watching 🤓
Great video. Thank you for this, as it comes at a point when I am struggling to determine that threshold.
Fantastic- thanks for watching!🤓
@@Riffomonas Can you please help with the script for calculating Good's coverage? I would like to apply the same to my data.
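Since no script is given in this thread, here is a minimal sketch of the calculation in R, not the script from the video: Good's coverage is one minus the fraction of reads that belong to singleton OTUs. The function name and input format below are illustrative.

# Illustrative sketch, not the video's script: Good's coverage for one sample,
# where `counts` is a vector of reads per OTU.
goods_coverage <- function(counts) {
  counts <- counts[counts > 0]      # drop OTUs with zero reads in this sample
  singletons <- sum(counts == 1)    # OTUs observed exactly once
  total_reads <- sum(counts)        # total reads in the sample
  1 - singletons / total_reads      # Good's (1953) coverage estimate
}

# Example: 3 singletons out of 100 reads gives a coverage of 0.97
goods_coverage(c(50, 30, 17, 1, 1, 1))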
Thank you for sharing the video, Patrick. I found it really interesting and helpful. Your code is so clean!
I recently read your 2024 paper in mSphere which discussed rarefaction and touched on Good's coverage. The paper left me wondering about coverage thresholds. For instance, is there a threshold of, say, >85% that indicates a reliable capture of community diversity/composition?
Eh, I don't think it really matters. The deeper you go, the more resolution you'll be able to detect
@@Riffomonas Thanks for the fast response Patrick! This kind of more direct interaction with researchers is so nice.
Well, at the end of the day the only thing that can be done is more sequencing, right? Hahah
Thanks for the explanation. If I have few reads, but that makes sense because of a treatment, is Good's coverage still a good estimator?
Thank you!! The video is great, I have learned a lot from it. I have one question. I have very imbalanced sequencing data: for example, sample 1 had 300k sequences while sample 2 had only 127. I want to rarefy the OTU table to 5,000 sequences, so I used the same method to check the coverage of my data. However, I found that the coverage is not good: several samples have coverage below 90%, even though those samples have about 20k-30k sequences. How do I deal with such problems?
I don't worry about the coverage. Use the same rarefaction depth for everything and then you can compare things on the same basis.
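One way to put that reply into practice, as a sketch: pick a single depth, drop the samples that fall below it, and randomly subsample everything else to that depth. The depth of 5,000 comes from the question above; the object name `otu` and the use of vegan's rrarefy() are assumptions, not the video's code.

library(vegan)

depth <- 5000                             # one rarefaction depth for all samples
otu <- otu[rowSums(otu) >= depth, ]       # `otu`: samples-by-OTUs count matrix (assumed)
set.seed(123)                             # make the random subsampling reproducible
rarefied <- rrarefy(otu, sample = depth)  # every sample now has exactly `depth` reads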
Thank you, this is so helpful :)
my pleasure! thanks for watching 🤓
Would love to hear your thoughts on relative abundance versus absolute count. I’m working with 16S from swabs, so we can’t normalize starting material. I’ve had people ask about absolute values but I can’t really figure out a way to do that, since the samples started with different amounts of material and aren’t consistent! Thanks!
I think you’d need a spike-in control to back out the abundance. Regardless, I think that if you want absolute abundance you would be better off using qPCR for the specific populations you are interested in
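To make the spike-in idea concrete, a hypothetical sketch: if a known number of copies of a synthetic standard is added to every sample, that standard's read count calibrates reads to copies for each sample separately. All names and numbers below are made up for illustration.

# All values hypothetical: a known quantity of a synthetic standard is spiked in
spike_copies_added <- 1e6                         # copies spiked into each sample
spike_reads <- c(sample1 = 2000, sample2 = 500)   # reads mapping to the spike-in
taxon_reads <- c(sample1 = 8000, sample2 = 8000)  # reads for a taxon of interest

copies_per_read <- spike_copies_added / spike_reads  # differs by sample
taxon_copies <- taxon_reads * copies_per_read
taxon_copies  # sample1: 4e6 copies; sample2: 1.6e7 copies, despite equal read counts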
@@Riffomonas thanks for the reply!
This was super helpful! Thanks a lot. One quick question: say you have a sample with a low number of sequences but high Good's coverage. Can you "trust" this sample? Or should there be a minimum sequencing depth you still need to decide on?
Always rarefy to the smallest sample size. I use the Good's coverage to tell reviewers to back off if I have a low sequencing depth.
I'm working for a microbiome testing company. Our data show that we need more than 25,000 good quality reads for the human gut microbiome...
I'm also doing a human gut microbiome study and mine needs 100,000 reads. However, the rarefaction curve looks even worse when I rarefy to 15,000 reads. Confused...
You need that many reads to do what? If you're looking for a super rare taxon, perhaps, but I find that hard to believe for typical analyses
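A quick way to look at what this exchange is debating is to plot rarefaction curves and mark the candidate depth. This sketch again assumes a samples-by-OTUs count matrix called `otu` and uses vegan's rarecurve(); the 15,000-read line is taken from the comment above.

library(vegan)

# One curve per sample: observed OTUs as a function of reads subsampled
rarecurve(otu, step = 500, xlab = "Number of reads", ylab = "Observed OTUs")
abline(v = 15000, lty = 2)  # candidate rarefaction depth from the comment above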