For me a big big deal is reproducibility and openness. Excel costs $, doesn’t lend itself to easily adding more data, isn’t super transparent about what code is there, and mixes raw data, code, and results, which I prefer to keep separate.
Hi Pat! Thank you so much for this useful tutorial. Could you please share the code for pairwise comparisons between factor levels when there is a significant interaction effect? For example, between the different groups from both factors "disease_stat" and "gender".
Thank you for the great R videos. What do you think about learning the tidyverse after a week in base R? Suppose you are going to teach students who have no programming background; in this scenario, when would you start teaching them the tidyverse? I'm asking because the tidyverse has a lot of functions, and sometimes people have trouble memorizing them (and the pipelines, of course) and eventually give up. An example: ---------------- mpg %>% filter(year% select(cty,hwy)%>% summary() ------------- summary(mpg[mpg$year
I fought against teaching tidyverse for the longest time. Now it’s the foundation for what I teach (see the tutorials at Riffomonas.org). Even your “simple” example has a ton of non intuitive syntax and functions - [, :, $, summary,
What would be the function that performs pairwise comparisons between several groups? In your example there are only three groups (case, diarrhea, nondiarrhea), but let's imagine another example with 20 groups. How can I have pairwise comparisons without manually repeating 20 times the same test?
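One way to avoid repeating the test by hand is to generate every pair of groups with combn() and loop over them. The sketch below is only an illustration, not the code from the video: it assumes the vegan package and uses its built-in dune data (4 management groups) as a stand-in, and the object names are made up for the example. The same pattern would scale to 20 groups.

```r
library(vegan)

data(dune)      # example community matrix shipped with vegan
data(dune.env)  # matching sample metadata
set.seed(1)

# full distance matrix, computed once
dist_matrix <- as.matrix(vegdist(dune, method = "bray"))
groups <- as.character(dune.env$Management)

# every unordered pair of group labels
pairs <- combn(sort(unique(groups)), 2, simplify = FALSE)

# run one PERMANOVA per pair on the matching slice of the distance matrix
pairwise_p <- vapply(pairs, function(pair) {
  keep <- groups %in% pair
  pair_dist <- as.dist(dist_matrix[keep, keep])
  pair_meta <- data.frame(group = groups[keep])
  adonis2(pair_dist ~ group, data = pair_meta,
          permutations = 999)[["Pr(>F)"]][1]
}, numeric(1))

names(pairwise_p) <- vapply(pairs, paste, character(1), collapse = "_vs_")

# correct for the number of comparisons
adjusted_p <- p.adjust(pairwise_p, method = "BH")
adjusted_p
```

The number of comparisons grows quickly (20 groups give 190 pairs), so the adjustment step matters more as groups are added.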
Hi Michael - the ordination at the end was created in the preceding episodes. You might check out these videos: (CC080) th-cam.com/video/ipy8qZKqiM4/w-d-xo.html, (CC079) th-cam.com/video/Y0GI34S-ZMI/w-d-xo.html, and (CC078) th-cam.com/video/xijDvx-J1jo/w-d-xo.html
Hi Pat, thanks so much for making these videos. I'm a newcomer and so glad I found you. Is there any reason why you don't suggest pairwise comparisons with package pairwiseAdonis? Thanks! Mel
Hey Mel - well … I don’t know pairwiseAdonis 😂. Although all the packages are a great treasure, I find it’s hard to remember all the syntax if I rarely use the package/function. It’s easier for me to remember the basics and then apply those in many situations.
Thank you for the sweet ordination videos, you're helping a brother out! Does it matter that you have a low R2 value for your model even though it has a correspondingly low p-value? The model you're building around the 23:00 mark is what I am referring to. Cheers son.
Thanks for watching NWH! Good eye. The p-value is testing whether the R2 is significantly different from zero. So if we have a lot of data and a marginal r2 value, then we can have enough power to find small differences as being significant. Basically, things can be statistically significant, but not biologically significant. I'm not really sure what to make of that tradeoff in this situation.
Thanks for the question - that sounds like a great topic for a future episode. Stay tuned!
Hi! An Argentinian paleontologist here! I hope it's not too late to be asking questions here. But first, I want to thank you and congratulate you on the content. It's really good and I'm looking forward to watching the other videos. I ended up here trying to understand the output of ADONIS from a script written by my advisor. I also have 3 groups to compare. In the script, he first ran the test looking for differences using the three groups and then did the pairwise comparisons. To do the pairwise comparisons he recalculated the distances between the individuals of the two compared groups, but I think you used the same distance matrix calculated with all individuals. My questions are: did you recalculate the distances and I just didn't see it, or did my advisor do it differently? Do you have any bibliographic suggestions where I could read more about this? Thanks again! Gerardo A. Lo Valvo
Hey Gerardo! Thanks for tuning in 🤓 For this video I calculated the distances within mothur and then used R to pull out the distances for the pairwise comparison. I don't think there's a reason to recalculate a distance matrix. I'd suggest looking at the References section of the ?adonis documentation for papers about adonis. I like the MJ Anderson papers in there
@@Riffomonas Thank you very much for your answer!! I'll look there.
Thank you for making wonderful tutorials. Could you make some tutorials on interclass correlation between microbial taxa and between sample variables?
Hey Erick - hmmm. How different are our p-values? I would expect a little difference because the algorithm uses a random number generator. Their performance can vary by the seed and sometimes by the operating system, but the differences in outcome shouldn’t be that large
Wonderful tutorial as always Pat, thank you so much! I may have missed this in there (admittedly I skipped around a bit) but I'm curious how the adonis function changes when you are using a continuous numeric variable like moisture content or pH instead of categorical like Treatment. Is this a valid use of adonis? And if so how does it establish a "group" in this case?
@@Riffomonas @Robert Jones `adonis()` and `adonis2()` work just fine with continuous data. In the same way that a linear model works just fine with continuous and categorical covariates (this is the general linear model which fused ANOVA and linear regression), with sums of squares being computed for categorical and continuous covariates alike, `adonis()` and `adonis2()` do the same thing. If you're worried you could use `dbrda()` instead but these things are really all the same under the hood.
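As a rough illustration of mixing a continuous and a categorical covariate, here is a minimal sketch that assumes the vegan package and its built-in dune data rather than the data from the video. A1 (soil horizon thickness) is numeric and Management is a factor; the continuous term uses a single degree of freedom, just as it would in a linear model.

```r
library(vegan)

data(dune)
data(dune.env)
set.seed(1)

dune_dist <- vegdist(dune, method = "bray")

# by = "terms" is spelled out so each covariate gets its own row,
# whatever the default is in the installed vegan version
mixed_test <- adonis2(dune_dist ~ A1 + Management, data = dune.env,
                      permutations = 999, by = "terms")

mixed_test

# degrees of freedom: 1 for the numeric A1, (4 - 1) for the 4-level factor
term_df <- mixed_test$Df[1:2]
term_df
```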
Thanks for this video! May I ask what version of R and the vegan package you are using? I replicated your code exactly but get the following error: Error in `colnames
Hi Miranda - sorry to hear you're running into problems. I'm using R v4.0.5 and vegan 2.5-7. Here's what I had by the end of the episode leading up to the line with the colnames function. Does this help you spot anything? library(tidyverse) library(readxl) library(vegan) permutation
@@Riffomonas Amazing thank you! After some tinkering it seems like it was a technical problem with my machine, not your code. Thanks so much for an awesome tutorial! :)
Hi Sakke - thanks for the question. It appears that adonis really isn't set up to do random effects. However, you might be interested in the answer to this posting for how to approximate a random effect stats.stackexchange.com/questions/350462/can-you-perform-a-permanova-analysis-on-nested-data
Awesome video, thanks a lot for sharing and for your time. But could you please explain “strata” and when I need to use it? For my datasets, I get different results depending on whether I include strata or not. Secondly, for my significant factor I only got R2 = 0.22, while the R2 of the residual is 0.78. If I understood well, my factor explains only 22% and the residual accounts for 78%. Could you please confirm my understanding? Also, how do I explain that the residual accounts for more than my significant factor? Thanks again!
strata is a grouping level that you want to restrict the permutation test within. In permutation/designs this would be a blocking factor. Say you have repeated observations from a number of individuals; you wouldn't want to swap samples from individual A with any other individual. Hence you would pass strata a factor indicating which individual each sample came from and that way when we permute we only shuffle samples within individuals, never between them. It is much better today to use the permutations argument and pass it a control object to indicate how you want the samples permuted. Check out `?permute::how` for details on this as vegan uses my permute package to do the restricted permutation tests
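A minimal sketch of the control-object approach described above, assuming vegan and permute are installed. The dune data ship with vegan; treating the Use column as the blocking factor is purely hypothetical and only stands in for "which individual each sample came from".

```r
library(vegan)
library(permute)

data(dune)
data(dune.env)
set.seed(1)

dune_dist <- vegdist(dune, method = "bray")

# hypothetical blocking factor: samples are only shuffled within a block
blocks <- dune.env$Use

# how() builds the permutation design that adonis2() will follow
perm_design <- how(blocks = blocks, nperm = 999)

blocked_test <- adonis2(dune_dist ~ Management, data = dune.env,
                        permutations = perm_design)

blocked_p <- blocked_test[["Pr(>F)"]][1]
blocked_p
```

Because the allowed shuffles differ from an unrestricted test, the p-value can change when blocks (or strata) are supplied, which would be consistent with the different results described in the question.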
I am sorry for another comment. I ran your code, and everything went smoothly until I ran str(all_test). I got the output below, with only 3 obs. of 5 variables (instead of 3 obs. of 6 variables in your output). Then I could not run the next steps for the pairwise comparison as in your tutorial. Do you have any suggestions?
Classes ‘anova.cca’, ‘anova’ and 'data.frame': 3 obs. of 5 variables:
 $ Df : num 2 335 337
 $ SumOfSqs: num 14.1 103.3 117.4
 $ R2 : num 0.12 0.88 1
 $ F : num 22.9 NA NA
 $ Pr(>F) : num 0.000999 NA NA
Would you be interested in learning more about using R to do statistical analyses? Any specific questions?
I'd be interested in learning how to plot >1 species accumulation curve in ggplot2, where the Y axis is species richness and the X axis is total area sampled (most of the tutorials that I've seen show the X axis as number of individuals or number of sites)
Thanks, Jinny! I’ll add it to the list of ideas
I really enjoy the content in your videos and think that you have incorporated some really great workflows that show a high level of understanding and are also very generalizable for other people.
I see your point about doing your analysis using a real-world problem incorporating the use of C++ etc, but would comment that it is hard to parse a specific video without watching all of them. The current style is great if you want to see the entire analysis, but can be difficult if you are just wanting to learn about a specific problem or how to incorporate a specific function into solving a problem.
For example, I recently watched your expand.grid video as well as your nested dataframe videos to get a better sense for incorporating those packages into my own work (medicine, using ICD codes to predict in-hospital outcomes from a large multiyear dataset). As much as I don't like the general datasets (mtcars, gapminder, etc.), I think they are useful for modeling workflows because they are pretty consistent in what the inputs and outputs will be. I think this channel has really good content, but it would be more useful if you centered your videos around either the type of problem that you want to solve or the type of package that you are incorporating.
Also, providing a brief description in the intro of the type of tibble/dataframe that you are using (column names, types) for some context of a more generalizable problem would make it much more accessible. I'm a physician, so I have taken graduate-level biochem and micro courses, but even I found the source material density of your problem a little off-putting, and I ended up speeding through to the part that was useful for the specific problems I am working with. I can see the density of the microbiology being off-putting to people who are not from a biological sciences background and who are just looking for more of an R tutorial that can be consumed on an as-needed basis. At the end of the day, everything is about consistent inputs = consistent outputs, and I ultimately came back to your videos because no other videos were really addressing the high-level utilization that you are.
Take what you will from this. I have watched a lot of videos from this channel at this point and am glad that someone is finally filling this content gap. I definitely think the content is top notch so I hope to see this channel grow.
Thanks - I appreciate the input! I feel like there are already a number of tutorials using the built-in datasets. I also want to demonstrate how different tools are integrated to show people how they work together. I’m trying to provide more of a vlog style approach. As we get more into visualization the content becomes more focused on a single thing. Thanks for tuning in
Hi Patrick, thanks for your effort in all these great tutorials.
If possible, can you please make an episode about how to test the association between microbiome composition and a continuous outcome, which can be MiRKAT (Microbiome Regression-Based Analysis Tests) or corncob.
Your tutorials are really helpful. this is getting a little addictive !!!
Awesome - that's my secret plan!
Got to say, each time me and R fall out, your videos rekindle our love affair!
Fantastic!
For anyone following this tutorial recently and hitting issues with the adonis output, R now recognises adonis2 instead, so just add the 2 and it should sort the issue
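For readers making that switch, here is a minimal sketch of an adonis2() call. It assumes the vegan package and uses its built-in dune data rather than the data from the video; the object names are illustrative only. One practical difference is that the result is itself a data-frame-like table, so older code that reached into `$aov.tab` on an adonis() result needs adjusting.

```r
library(vegan)

data(dune)      # example community matrix shipped with vegan
data(dune.env)  # matching sample metadata
set.seed(1)

dune_dist <- vegdist(dune, method = "bray")

# same formula interface that adonis() used
all_test <- adonis2(dune_dist ~ Management, data = dune.env,
                    permutations = 999)

# columns are accessed directly; the first row holds the model term
p_value <- all_test[["Pr(>F)"]][1]
r_squared <- all_test[["R2"]][1]

p_value
r_squared
```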
You must be a great professor (human) who supports and uplifts students and peers. Thank you very much for your YouTube channel. Also, one request: could you do a video on only the most important base R functions that can be widely used? Stay safe and blessed.
Hi IOT - Thanks for watching and for your praise. Great minds think alike! I've been creating a top 10 list of base R functions for a future video. Stay tuned :)
@@Riffomonas Excellent ☺️ Looking forward to them. Also again thank you very much for your valuable time for creating and sharing these resources. Stay safe and blessed.
Your videos carried me through my own data, comparing the treatment effect of vortioxetine on fecal samples of mice with two different types of diet. Thank you sooo much!
Hi Pat! thanks very much for your videos. I am learning so much, I've even managed to replicate some graphs with my own data.
That is wonderful to hear - great work!
You are a wonderful teacher. Thank you so much!
Thanks Jeffrey!
brilliant- thank you so much from Spain
Wonderful - I appreciate your watching!
Thank you for your interesting tutorial.
My pleasure! Thanks for watching
Thank you!! EXACTLY what I needed and SO clear.
That’s so awesome!
Fantastic - glad it was helpful!
Congratulations! Do you talk about simper in any video? Thanks
amazing, thank you very much for your tutorial!
My pleasure - thanks for watching!
Thank you for the video, very useful
Thanks John!
Very interesting session. I was wondering if you have already done a test on homogeneity of variance in betadisper() function? I mean how do you rely on these p-values? Maybe they are significant not because of the differences in the centroids between groups but rather due to the dispersion around the centroids?
I haven't seen a lack of homogeneity of variance increase Type I errors. We published some simulations on this - www.nature.com/articles/ismej20085. Looking at the first column of Table 1, the lack of homogeneity of variance didn't impact the Type I error. We could certainly run that test - I'll be sure to add it to a future video!
@@Riffomonas Hi both, I had the same question about the need to test for homogeneity of variance first. (excellent demonstration by the way). I am becoming a bit confused on the differences between ADONIS and Betadisper tests, and how from my understanding, if the anova in Betadisper is significant, an ADONIS PERMANOVA would not be suitable as homogeneity is a condition of ADONIS. This is particularly a contention I am running into with my species matrix dataset. Can you provide any more insight on this?
@@11mgarrard In adonis() (PERMANOVA) we compute the group means and test if the estimated differences between groups are consistent with a null hypothesis that all groups have the same mean. This is exactly the same as an ANOVA or t-test. Those methods assume that the groups have the same variance. It has been shown that multivariate methods that rely on dissimilarities (PERMANOVA especially, and to a lesser extent CCA) are also sensitive to different variances in the groups. These variances are called dispersions in the multivariate case, but they're just the spread of the observations about their group mean. betadisper() tests the null hypothesis that the dispersions about the group means are equal. Note the difference: PERMANOVA (adonis()) focuses on the differences in the group means while PERMDISP (betadisper()) focuses on the difference in the dispersions of samples around these group means. Pat misspoke in the video by suggesting PERMDISP is a synonym for PERMANOVA/adonis. PERMDISP is for dispersions, just like betadisper().
In practice, if betadisper() says your group dispersions are different, then you can't have confidence that any differences in means you detect using adonis() are truly due to differences in group means. The result could in part be due to the differences in dispersion. Being uncertain whether the result you have from adonis() is correct isn't a good position to be in, but in practice I would just treat the p values with a big pinch of salt. I'd read Warton et al 2011 (doi.org/10.1111/j.2041-210X.2011.00127.x) for more on this, if you haven't already, as well as the paper Pat mentions.
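To make the distinction concrete, the two tests can be run side by side on the same distance matrix. This is a minimal sketch that assumes the vegan package and its built-in dune data, not the data from the video; the object names are illustrative.

```r
library(vegan)

data(dune)
data(dune.env)
set.seed(1)

dune_dist <- vegdist(dune, method = "bray")

# PERMANOVA: do the group centroids differ?
centroid_test <- adonis2(dune_dist ~ Management, data = dune.env,
                         permutations = 999)

# PERMDISP: do the groups differ in their spread around their centroids?
dispersion <- betadisper(dune_dist, dune.env$Management)
dispersion_test <- permutest(dispersion, permutations = 999)

centroid_p <- centroid_test[["Pr(>F)"]][1]
dispersion_p <- dispersion_test$tab[["Pr(>F)"]][1]

c(centroids = centroid_p, dispersions = dispersion_p)
```

If the dispersion p-value is small, the advice above applies: a small PERMANOVA p-value could partly reflect unequal spread rather than a shift in centroids.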
@@ftboth Thank you very much for this clarification. Not so good news for my personal data, but really helpful nonetheless.
Do you have an example on how to properly report methods and results for analysis with adonis?
You might check out our gut microbes paper where we looked at mouse data. I seem to recall that we wrote something in there
Are we allowed to report differences as significant if the significant difference comes with a very low R2?
Sure - but I’d report the R2. If the effect is so small then I’d probably move on and not discuss it much, if at all
I was wondering if you could kindly let me know how we can perform an analysis of the Gram +ve and Gram -ve difference in R. Thanks very much for all of the interesting and informative videos.
You'd have to find a way to aggregate the taxa that are Gram + or -. Then you could do something like group_by and summarize to compare their relative abundances. I'm not sure how easy it's going to be to classify taxa by their Gram staining properties since some of them are a bit variable
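The aggregation step could look roughly like this. Everything here is made up for illustration: the abundances, the column names, and the hand-curated lookup table assigning genera to a Gram stain category are all assumptions, and building a trustworthy lookup is the hard part, as noted above.

```r
library(dplyr)

# hypothetical relative abundances for two samples
abundances <- tibble(
  sample = rep(c("s1", "s2"), each = 4),
  genus = rep(c("Bacteroides", "Escherichia",
                "Lactobacillus", "Clostridium"), times = 2),
  rel_abund = c(0.5, 0.2, 0.2, 0.1,
                0.1, 0.2, 0.3, 0.4)
)

# hypothetical hand-curated lookup; real taxa can be Gram-variable
gram_lookup <- tibble(
  genus = c("Bacteroides", "Escherichia", "Lactobacillus", "Clostridium"),
  gram = c("negative", "negative", "positive", "positive")
)

# pool the relative abundance of each Gram category within each sample
gram_totals <- abundances %>%
  inner_join(gram_lookup, by = "genus") %>%
  group_by(sample, gram) %>%
  summarize(rel_abund = sum(rel_abund), .groups = "drop")

gram_totals
```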
@@Riffomonas Thanks very much for your reply. I will try to figure it out.
Hi Pat, very nice video! How do you/can you add random effects to your model using the adonis function from the vegan package?
I’ll be coming back to this in an upcoming video. Stay tuned!
Would subsetting a dist object yield the same thing as filtering down a data frame then converting to a dist object?
It should…
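This can be checked directly in base R. The sketch below uses simulated data and made-up sample names. It relies on the distance between two samples depending only on those two samples, so a step that depends on the whole dataset (for example, scaling columns across all samples before computing distances) could make the two routes differ.

```r
set.seed(1)

# hypothetical data: 6 samples by 4 features
counts <- matrix(runif(24), nrow = 6,
                 dimnames = list(paste0("s", 1:6), paste0("otu", 1:4)))
keep <- c("s1", "s3", "s4")

# route 1: compute all distances, then subset the dist object
full_dist <- dist(counts)
subset_after <- as.dist(as.matrix(full_dist)[keep, keep])

# route 2: filter the samples first, then compute distances
subset_before <- dist(counts[keep, ])

# compare the two results
same <- all.equal(as.matrix(subset_after), as.matrix(subset_before))
same
```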
Is the Adonis package suitable for analyzing two continuous variables, by factor? If so, is there a way to transform a table with columns headings into adonis?
Thanks! No, it’s for one categorical variable and a distance matrix between the observations
After knowing whether there are any differences, how can we find which variables drive the difference?
The next step would be to create a biplot. You can do this type of analysis in mothur with the core.axes function. I'll see if I can create an episode around this in the future
Thanks for a nice video. Short question though. Why would you adjust your p-values when they're derived from a re-sampling method (a permutation test)? I was under the impression that this was not necessary. If you read papers such as, doi: 10.1186/1751-0473-3-15, it sounds as though, a permutation test is dealing with the potential for false positive "in its own way" - and that you wouldn't have to correct (additionally) for multiple testing. I am curious to know, what you think of this :) Best, Simon.
Hmmmm, I hadn't seen that. I'm not sure what to think. I'd be interested in seeing the type I error using null distribution data.
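For reference, the adjustment step under discussion is a single call to base R's p.adjust(). The p-values below are invented for illustration; whether to adjust at all is the open question in this thread, and this sketch only shows the mechanics.

```r
# hypothetical raw p-values from three pairwise permutation tests
pairwise_p <- c(a_vs_b = 0.001, a_vs_c = 0.030, b_vs_c = 0.200)

# Benjamini-Hochberg controls the false discovery rate
adjusted_bh <- p.adjust(pairwise_p, method = "BH")

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
adjusted_bonf <- p.adjust(pairwise_p, method = "bonferroni")

rbind(raw = pairwise_p, BH = adjusted_bh, bonferroni = adjusted_bonf)
```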
Hello, great tutorial. Would you be so kind as to also show how to make beta diversity box plots comparing the three groups, using the same data?
Check out this video. Earlier ones in this series generated the ordination. Also geom_jitter and geom_boxplot have similar usage.
Alternatives to ordination in R: Visualizing community change relative to a specific point (CC207)
th-cam.com/video/jLVKJ_n6Qd0/w-d-xo.html
Is there a way to see which variables are responsible for the statistical difference found between the microbiome profiles, by any chance?
I’d suggest checking out the adonis2 documentation where you can do pretty sophisticated experimental designs with covariates
The diarrhea bugged me. Fortunately I could watch your nice video on the toilet.
Thanks for your video! I have a question: how can you know which group has a higher beta-diversity based on the pairwise result of PERMANOVA?
For the largest variation look at the average distance to the median
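A minimal sketch of pulling out that number per group, assuming the vegan package and its built-in dune data rather than the data from the video. betadisper() measures each sample's distance to its group's spatial median by default, so averaging those distances within groups gives the value referred to above.

```r
library(vegan)

data(dune)
data(dune.env)

dune_dist <- vegdist(dune, method = "bray")

# type = "median" is the default: distances to each group's spatial median
dispersion <- betadisper(dune_dist, dune.env$Management)

# average distance to the median for each group, largest first
avg_dist <- sort(tapply(dispersion$distances, dispersion$group, mean),
                 decreasing = TRUE)
avg_dist
```

The group at the top of the sorted vector has the most spread among its samples.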
@@Riffomonas My question is that I have several groups, and I can get a significant difference between each pair of groups; however, I don't know which group has a higher beta diversity if I'd like to use a, b, c, d to express their differences.
Hi, thank you for the video. One question: when I enter the adonis formula with * between two independent variables, why am I not getting the interaction between the two variables? When I put a / instead of a *, I get the interaction and the PERMANOVA for the first group only. The results of the two formulas are the same, which is weird. Could it be related to my data being presence/absence?
THX
Hmmm, presence absence data might have different syntax. The data in this episode is from a distance matrix that I calculated outside of R. It doesn't quite sound the same as what you've got. Perhaps the presence absence data is structured so that there isn't enough variation in the data to calculate the interaction term?
Hi, Pat! I have a microbiome dataset with people belonging to 3 different groups sampled across 3 time points. I would like to see if there is any difference between the groups for each time point and then do the pairwise analysis for all the groups within the significant time points. Should I run adonis for the time points and then run more adonises for the groups within the time points? My main concern is: should I consider all the p-values as one vector for the adjustment or do I adjust separately? Thank you for sharing your knowledge and sorry for this long question!
Actually I came across the linear mixture modelling approach for dealing with multivariate data with dependencies (like time points) but I didn't understand exactly if this could be a substitute for PERMANOVA in my case.
Thank you for this tutorial, exactly what I need, as I am a beginner. I have a question regarding my data. I have 2 substrates with 4 treatments each and I wanted to see the effect of substrate and treatment on my community composition. So I ran adonis on distance_matrix ~ substrat*treatment. Everything was significant (substrat, treatment, substrat*treatment). However, when I ran the pairwise comparisons I got non-significant p-values. Could you give me a hint as to why I get such a result? Thanks
My pleasure! Glad the videos are helpful. I did another more recent episode on Adonis you might enjoy too. I wonder if the problem is the significant interaction
@@Riffomonas thank you For the video and the suggestion. I am going to check that
Hi Pat. This is very useful. Thanks for creating these resources. One question- where does the .dist file come from in the Mothur SOP output? I have 5 different .dist files from going through the Mothur SOP (the really big .dist matrix that I know we're not using here, as well as opti_mcc.thetayc.0.03.lt.ave.dist, opti_mcc.thetayc.0.03.lt.std.dist, opti_mcc.jclass.0.03.lt.std.dist, and opti_mcc.jclass.0.03.lt.ave.dist) and none of them look like the square phylip in this video.
Hey Cameron - It would be one of the ave.dist files that were generated with dist.shared. Easiest to set output=square
@@Riffomonas perfect. Thanks!
Pat,
As a former "hater" do you have any advice on convincing Excel addicts to switch to R?
For me a big big deal is reproducibility and openness. Excel is $, doesn’t lend itself to easily adding more data, it’s not super transparent in what code is there, and it mixes raw data, code, and results, which I prefer to keep separate.
Hi Pat! Thank you so much for this useful tutorial. Could you please share the code for pairwise comparisons between factors in the case of a significant interaction effect? For example, between the different groups from both factors "disease_stat" and "gender".
Thank you for the great R videos. What do you think about learning the tidyverse after a week in base R? Suppose you are going to teach students who have no programming background; in this scenario, when would you start teaching them the tidyverse? I'm asking because the tidyverse has a lot of functions, and sometimes people have trouble memorizing them (and the pipelines, of course) and eventually give up.
An example:
----------------
mpg %>%
filter(year%
select(cty,hwy)%>%
summary()
-------------
summary(mpg[mpg$year
I fought against teaching tidyverse for the longest time. Now it’s the foundation for what I teach (see the tutorials at Riffomonas.org). Even your “simple” example has a ton of non intuitive syntax and functions - [, :, $, summary,
What would be the function that performs pairwise comparisons between several groups? In your example there are only three groups (case, diarrhea, nondiarrhea), but let's imagine another example with 20 groups. How can I have pairwise comparisons without manually repeating 20 times the same test?
I'd likely set up some type of loop over all possible pairs of comparisons and then do a correction with p.adjust at the end
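A minimal sketch of that loop, assuming simulated counts and four made-up groups (`counts`, `meta`, and the group labels are hypothetical, not the data from the episode). It subsets the one full distance matrix for each pair rather than recalculating distances, then corrects all of the p-values together.

```r
library(vegan)

set.seed(19760620)

# Hypothetical data: 40 samples x 10 taxa in four made-up groups
counts <- matrix(rpois(400, lambda = 10), nrow = 40)
meta <- data.frame(group = rep(c("a", "b", "c", "d"), each = 10))
dist_mat <- as.matrix(vegdist(counts, method = "bray"))

# Every unordered pair of groups, one column per pair
pairs <- combn(unique(meta$group), 2)

# Run adonis2 on the samples belonging to each pair of groups
p_values <- apply(pairs, 2, function(pair) {
  keep <- meta$group %in% pair
  pair_dist <- as.dist(dist_mat[keep, keep])
  pair_meta <- meta[keep, , drop = FALSE]
  test <- adonis2(pair_dist ~ group, data = pair_meta, permutations = 999)
  test$`Pr(>F)`[1]
})
names(p_values) <- apply(pairs, 2, paste, collapse = "_vs_")

# Correct all of the pairwise p-values as a single family
p_adjusted <- p.adjust(p_values, method = "BH")
p_adjusted
```

With 20 groups the same code would produce 190 comparisons without any copy and paste; only the `meta$group` column changes.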
Thanks for this video!
My pleasure!
Could you also make a video on how you collated the data to make the plot at the end?
Hi Michael - the ordination at the end was created in the preceding episodes. You might check out these videos: (CC080) th-cam.com/video/ipy8qZKqiM4/w-d-xo.html, (CC079) th-cam.com/video/Y0GI34S-ZMI/w-d-xo.html, and (CC078) th-cam.com/video/xijDvx-J1jo/w-d-xo.html
Hi Pat, thanks so much for making these videos. I'm a newcomer and so glad I found you. Is there any reason why you don't suggest pairwise comparisons with package pairwiseAdonis? Thanks! Mel
Hey Mel - well … I don’t know pairwiseAdonis 😂. Although all the packages are a great treasure, I find it’s hard to remember all the syntax if I rarely use the package/function. It’s easier for me to remember the basics and then apply those in many situations.
Ok, thanks Pat
Gold! Thank you!
You bet! Thanks for watching 🤓
Thank you for the sweet ordination videos, you're helping a brother out! Does it matter that you have a low r2 value for your model even though it's a corresponding low p-value. The model you're building around the 23:00 mark is what I am referring to. Cheers son.
Thanks for watching NWH! Good eye. The p-value is testing whether the R2 is significantly different from zero. So if we have a lot of data and a marginal r2 value, then we can have enough power to find small differences as being significant. Basically, things can be statistically significant, but not biologically significant. I'm not really sure what to make of that tradeoff in this situation.
I am wondering how you would proceed with continuous variables like age or weight. In the adonis examples you only used discrete variables?
Thanks for the question - that sounds like a great topic for a future episode. Stay tuned!
Hi! An argentinian paleontologist here!
I hope it's not too late to be asking questions here.
But first, I want to thank you and congratulate you on the content. It's really good and I'm looking forward to watching the other videos.
I ended up here trying to understand the output of ADONIS from a script written by my advisor. I also have 3 groups to compare. In the script, he first did the test looking for differences using the three groups and then he did the pairwise comparisons. To do the pairwise comparisons he recalculated the distances between the individuals of the two compared groups, but I think you used the same distance matrix calculated with all individuals to do this.
My questions are: did you recalculate the distances and I just didn't see it, or did my advisor do it differently? Do you have any bibliographic suggestions where I could read more about this?
Thanks again!
Gerardo A. Lo Valvo
Hey Gerardo! Thanks for tuning in 🤓 For this video I calculated the distances within mothur and then used R to pull out the distances for the pairwise comparison. I don't think there's a reason to recalculate a distance matrix. I'd suggest looking at the References section of the ?adonis documentation for papers about adonis. I like the MJ Anderson papers in there
@@Riffomonas Thank you very much for your answer!! I'll look there.
Thank you for making wonderful tutorials. Could you make some tutorials on interclass correlation between microbial taxa and between sample variables?
Thanks for the suggestion!
It is awesome! Many thanks for that. Just a detail, it is "Benjamini" not "Benjimani".
Thanks! yeah, I know my midwestern tongue flips things sometimes 😂
Thanks for your video! But I can't get the same P value (Pr(>F)) when I reanalyse the same data with the same method.
Hey Erick - hmmm. How different are our p-values? I would expect a little difference because the algorithm uses a random number generator. Their performance can vary by the seed and sometimes by the operating system, but the differences in outcome shouldn’t be that large
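One way to check whether the random number generator explains the gap is to fix the seed before each run. A sketch with simulated data; the `counts` and `meta` objects, the helper `run_test()`, and the seed values are all made up for illustration.

```r
library(vegan)

set.seed(1)

# Hypothetical data: 30 samples x 10 taxa in three made-up groups
counts <- matrix(rpois(300, lambda = 10), nrow = 30)
meta <- data.frame(group = rep(c("a", "b", "c"), each = 10))
d <- vegdist(counts, method = "bray")

# Hypothetical helper: set the seed, run the test, return the p-value
run_test <- function(seed) {
  set.seed(seed)
  adonis2(d ~ group, data = meta, permutations = 999)$`Pr(>F)`[1]
}

p1 <- run_test(42)
p2 <- run_test(42) # same seed, so the same permutations are drawn
p3 <- run_test(43) # different seed, so the p-value may shift slightly

identical(p1, p2)
c(p1, p3)
```

Raising `permutations` (for example to 9999) should also shrink the run-to-run differences between seeds.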
@@Riffomonas Thank you!
Wonderful tutorial as always Pat, thank you so much! I may have missed this in there (admittedly I skipped around a bit) but I'm curious how the adonis function changes when you are using a continuous numeric variable like moisture content or pH instead of categorical like Treatment. Is this a valid use of adonis? And if so how does it establish a "group" in this case?
Hey Robert - thanks! I don't think it will work well with adonis and am not really sure what to suggest - sorry!
@@Riffomonas @Robert Jones `adonis()` and `adonis2()` work just fine with continuous data. In the same way that a linear model works just fine with continuous and categorical covariates (this is the general linear model which fused ANOVA and linear regression), with sums of squares being computed for categorical and continuous covariates alike, `adonis()` and `adonis2()` do the same thing. If you're worried you could use `dbrda()` instead but these things are really all the same under the hood.
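As a rough illustration of that point, here is a sketch that mixes a continuous and a categorical covariate in `adonis2()`. The data are simulated, and `ph` and `treatment` are hypothetical column names.

```r
library(vegan)

set.seed(1)

# Hypothetical data: 30 samples x 10 taxa with one continuous and one
# categorical covariate
counts <- matrix(rpois(300, lambda = 10), nrow = 30)
meta <- data.frame(
  ph = runif(30, min = 5, max = 8),
  treatment = rep(c("x", "y", "z"), each = 10)
)
d <- vegdist(counts, method = "bray")

# The continuous covariate takes 1 degree of freedom, like a slope in a
# linear model; the three-level factor takes 2
fit <- adonis2(d ~ ph + treatment, data = meta, by = "terms",
               permutations = 999)
fit
```

No groups are formed for `ph`; it enters the model as a single numeric column, just as it would in `lm()`.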
This is fine. See my reply to Pat's reply to your comment
@@ftboth thank you for the clarification, that is supremely helpful!
thanks for this video! May I ask what version of r and the vegan package you are using? I replicated your code exactly but get the following error: Error in `colnames
Hi Miranda - sorry to hear you're running into problems. I'm using R v4.0.5 and vegan 2.5-7. Here's what I had by the end of the episode leading up to the line with the colnames function. Does this help you spot anything?
library(tidyverse)
library(readxl)
library(vegan)
permutation
@@Riffomonas Amazing thank you! After some tinkering it seems like it was a technical problem with my machine, not your code. Thanks so much for an awesome tutorial! :)
@@Mir-gw6kj wonderful - glad you got it working!
How can we fit a random factor in adonis?
Hi Sakke - thanks for the question. It appears that adonis really isn't set up to do random effects. However, you might be interested in the answer to this posting for how to approximate a random effect stats.stackexchange.com/questions/350462/can-you-perform-a-permanova-analysis-on-nested-data
@@Riffomonas thanks a lot and thanks for all those helpful videos, I really enjoy learning and watching those ❤
Awesome video, thanks a lot for sharing and for your time. But could you please explain "strata" and when I need to use it? With my datasets I get different results depending on whether or not I include strata. Secondly, for my significant factor I only got R2 = 0.22, but the R2 of the residual is 0.78. If I understood well, my factor can only explain 22% and the residual explains 78%. Could you please confirm my understanding? Also, how do I explain that the residual explains more than my significant factor? Thanks again!
Sorry, please help me with this question. Tons of thanks!
strata is a grouping level that you want to restrict the permutation test within. In permutation/designs this would be a blocking factor. Say you have repeated observations from a number of individuals; you wouldn't want to swap samples from individual A with any other individual. Hence you would pass strata a factor indicating which individual each sample came from and that way when we permute we only shuffle samples within individuals, never between them. It is much better today to use the permutations argument and pass it a control object to indicate how you want the samples permuted. Check out `?permute::how` for details on this as vegan uses my permute package to do the restricted permutation tests
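A small sketch of the control-object approach, assuming a made-up repeated-measures design (six hypothetical subjects sampled at four time points, with simulated counts):

```r
library(vegan)
library(permute)

set.seed(1)

# Hypothetical data: 6 subjects x 4 time points, 10 taxa
counts <- matrix(rpois(240, lambda = 10), nrow = 24)
meta <- data.frame(
  subject = factor(rep(1:6, each = 4)),
  timepoint = factor(rep(c("t1", "t2", "t3", "t4"), times = 6))
)
d <- vegdist(counts, method = "bray")

# Restrict shuffling to within each subject: samples are never swapped
# between individuals
ctrl <- how(blocks = meta$subject, nperm = 999)

fit <- adonis2(d ~ timepoint, data = meta, permutations = ctrl)
fit
```

Because the permutation scheme differs, the p-value from this restricted test can differ from an unrestricted run, which is consistent with getting different results with and without strata.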
@@ftboth Thanks very much for your answer. Although it's late, it helps me make it more clear.
goddamnit ggplot2
Thanks for watching!
Confusing
sorry! let me know what you couldn't follow
I am sorry for another comment. I run your code, and everything went smoothly until I do str(all_test). I got the output below, only 3 obs. of 5 variables (instead of 3 obs. of 6 variables in your output). Then I could not run the next steps for pairwise comparison as in your tutorial. Do you have any suggestions?
Classes ‘anova.cca’, ‘anova’ and 'data.frame': 3 obs. of 5 variables:
$ Df : num 2 335 337
$ SumOfSqs: num 14.1 103.3 117.4
$ R2 : num 0.12 0.88 1
$ F : num 22.9 NA NA
$ Pr(>F) : num 0.000999 NA NA