Having followed this debate in the literature for years now, I really appreciate your approach to demonstrating the importance of rarefaction. I remember being almost convinced to give it up after reading the Amy Willis paper.
Thanks Michael! Let me know if I'm missing anything, but like you, I think rarefaction is really the way to go.
Thank you!! Very clear explanation!!
Thank you so much for putting together this set of videos, they are very clear and well constructed!
I've been working on calculating alpha diversity metrics as well as Chao estimators for a long-term fisheries-independent survey dataset. The main challenge I'm having is that there are a ton of zero abundances and low counts of observed taxa, S_obs (e.g., in a dataset with 27K+ independent samples, the mean and median S_obs are just 7.16 and 7, respectively). I'm hacking at it now, but do you have any tips for cleaning up this dataset to get these estimators to run? I'm assuming I need to figure out how to remove samples with
Hi there, I don't remove any rare taxa. That would pretty much kill estimators like Chao1, which rely on the counts of singletons and doubletons. I'd really only calculate Sobs, Shannon, and inverse Simpson for each sample by rarefaction.
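A minimal sketch of what per-sample rarefaction of alpha diversity could look like, written in Python rather than the R tools discussed in the video. It assumes each sample is a list of integer read counts per taxon and a rarefaction depth chosen by the user (often the smallest sample's size, though that choice is an assumption here); the 100 iterations, the seed, and all function names are hypothetical, not taken from any package.

```python
import math
import random
from collections import Counter


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p_i^2) over taxa with nonzero counts."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def rarefy_once(counts, depth, rng):
    """Draw `depth` reads without replacement; return counts of observed taxa."""
    # One pool entry per read, labelled with the index of its taxon.
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    draw = rng.sample(pool, depth)
    return list(Counter(draw).values())


def rarefied_alpha(counts, depth, iterations=100, seed=1):
    """Mean S_obs, Shannon, and inverse Simpson over repeated subsamples."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the sample's total count")
    rng = random.Random(seed)
    sobs = sh = isimp = 0.0
    for _ in range(iterations):
        sub = rarefy_once(counts, depth, rng)
        sobs += len(sub)  # every taxon in `sub` was drawn at least once
        sh += shannon(sub)
        isimp += inverse_simpson(sub)
    return sobs / iterations, sh / iterations, isimp / iterations
```

For example, rarefied_alpha([50, 30, 10, 5, 3, 1, 1, 0], depth=50) returns the mean S_obs, Shannon, and inverse Simpson across 100 subsamples of 50 reads. Averaging over many draws smooths out the randomness of any single subsample.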
Thank you so much
My pleasure, glad you enjoyed it! 🤓
Such a great video once again, thank you!
I was wondering: when running breakaway on each sample, do you need to remove the OTUs/taxa with 0 counts for it to work? I am getting an error when I use breakaway (Error in if (diversity >= 0) { : missing value where TRUE/FALSE needed).
Thanks! It's supposed to get a vector that contains the frequency of frequencies (how many taxa were seen once, twice, and so on), not raw OTU counts. If the community is poorly sampled or super diverse, then it might not converge.
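To illustrate what a frequency-of-frequencies vector is, here is a small Python sketch (a hypothetical helper, not part of breakaway) that tallies how many OTUs in one sample were seen exactly once, twice, and so on. Zero-count OTUs simply drop out. How the resulting table is passed to breakaway in R is not shown here; the package documentation gives the exact input format.

```python
from collections import Counter


def frequency_counts(otu_counts):
    """Convert one sample's raw OTU counts into a frequency-count table.

    Returns sorted (k, f_k) pairs, where f_k is the number of OTUs observed
    exactly k times. OTUs with zero counts are skipped because they were not
    observed in this sample.
    """
    tally = Counter(c for c in otu_counts if c > 0)
    return sorted(tally.items())
```

For example, frequency_counts([0, 1, 1, 2, 5, 0, 1, 2]) gives [(1, 3), (2, 2), (5, 1)]: three singletons, two doubletons, and one OTU seen five times.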
Don't you think that estimating richness is biased? If I am not mistaken, in the Chao1 formula the information on the zero-observation class comes almost entirely from the number of things you saw once and twice. I think that is a problem because some errors/chimeras/artefacts/contaminants get interpreted as real variants, and those show up largely in the singleton-doubleton class, where they can almost entirely drive richness estimates to nonsensical values. What are your thoughts on that? Should we still estimate richness regardless?
I love the channel!
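The concern about singletons can be made concrete with the classic Chao1 formula, S_obs + F1^2 / (2 * F2), where F1 and F2 are the numbers of singletons and doubletons. This Python sketch falls back to the bias-corrected term F1 * (F1 - 1) / 2 when there are no doubletons (one common convention, assumed here); the counts in the example are made up for illustration.

```python
def chao1(otu_counts):
    """Chao1 richness estimate for one sample's raw OTU counts."""
    observed = [c for c in otu_counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # singletons
    f2 = sum(1 for c in observed if c == 2)  # doubletons
    if f2 > 0:
        # Classic form: unseen taxa estimated from F1 and F2 only.
        return s_obs + f1 ** 2 / (2 * f2)
    # Bias-corrected fallback when no doubletons were observed.
    return s_obs + f1 * (f1 - 1) / 2


base = [20, 15, 8, 5, 2, 2, 1, 1]
print(chao1(base))                 # 9.0
print(chao1(base + [1, 1, 1, 1]))  # 21.0
```

Adding four spurious singletons raises S_obs by 4 (8 to 12) but raises Chao1 by 12 (9.0 to 21.0), because the singleton count enters the estimate squared. This is the sensitivity to artefactual singletons raised in the question.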
I think the singletons and doubletons are largely meaningful (see www.biorxiv.org/content/10.1101/2020.12.11.422279v1). However, as I show in this episode, estimates of richness using Chao1 or breakaway are pretty horrendous. I don't use either in my work.
@Riffomonas Thank you! Which α-diversity metrics would you recommend using?
@SeguC48 I use observed richness, Shannon, and/or inverse Simpson.
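For reference, the usual definitions of these three metrics, for a (rarefied) sample with n_i reads assigned to taxon i and N reads in total, are as follows. The natural log in the Shannon index is one common convention; some tools use log base 2.

```latex
S_{\text{obs}} = \#\{\, i : n_i > 0 \,\}, \qquad p_i = \frac{n_i}{N}

H' = -\sum_{i=1}^{S_{\text{obs}}} p_i \ln p_i, \qquad
D^{-1} = \frac{1}{\sum_{i=1}^{S_{\text{obs}}} p_i^{2}}
```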
How can I rarefy data so that I can visualize alpha and beta diversity using vegan?
Check out the earlier episodes where I use avgdist.
@Riffomonas avgdist is for beta diversity. What about alpha diversity?
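As a rough illustration of the rarefy-repeatedly-and-summarize idea behind avgdist, here is a Python sketch for the Bray-Curtis distance between two samples. It is not vegan's implementation: the depth, the 100 iterations, the seed, and the use of the mean as the summary are all assumptions, and both samples must contain at least `depth` reads. The same loop, with an alpha metric in place of bray_curtis, would give a rarefied alpha diversity value per sample.

```python
import random
from collections import Counter
from statistics import mean


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement; keep every taxon's slot."""
    pool = [t for t, c in enumerate(counts) for _ in range(c)]
    tally = Counter(rng.sample(pool, depth))
    return [tally.get(t, 0) for t in range(len(counts))]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum |a_i - b_i| / sum (a_i + b_i)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    return num / (sum(a) + sum(b))


def averaged_bray_curtis(x, y, depth, iterations=100, seed=1):
    """Mean Bray-Curtis distance over repeated rarefactions of both samples."""
    rng = random.Random(seed)
    dists = [
        bray_curtis(rarefy(x, depth, rng), rarefy(y, depth, rng))
        for _ in range(iterations)
    ]
    return mean(dists)
```

For example, averaged_bray_curtis([40, 30, 20, 10], [10, 20, 30, 40], depth=50) returns a value between 0 (identical composition) and 1 (no shared taxa).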
It's funny, I just read Amy's paper :D
first 👈
Ha! Well done 🤓