Just One Bird's Opinion
Canada
Joined Oct 15, 2022
Are you a student or a science enthusiast? If so, this channel is for you! Nikki Regimbal is a Ph.D. student studying ecology and evolutionary biology, sharing the twists and turns of her journey and getting excited about science. On this channel and on her podcast, she aims to document her Ph.D. journey from the beginning, share her own research and experiences (amphibians, flycatchers, and backswimmers - oh my!), and talk about other cool research in the field! It's part of her goal to make science education accessible - so stay tuned for tutorials in R and QGIS. Hopefully, this will prove useful to undergraduate students interested in research and to those who love ecology. Together, let's get excited about science and learn about our amazing planet - but remember… this is just one bird’s opinion!
Check out my website for more information about me and how to access the podcast! nicoleregimbal.wordpress.com
Binomial GLMM Assumptions
Follow along to learn how to check model assumptions for a logistic mixed effects model in R! We look at outliers, binned residuals, and overdispersion.
You can find the R Script and mock data on my GitHub: github.com/nikkireg1/Binomial-GLMM-Assumptions
Here is also a link to a very helpful GLMM FAQ page by Ben Bolker: bbolker.github.io/mixedmodels-misc/glmmFAQ.html#testing-for-overdispersioncomputing-overdispersion-factor
Happy coding! Comment if you have any questions!
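As a rough illustration of the overdispersion check mentioned in this description, here is a minimal sketch in the spirit of the ratio described in the GLMM FAQ linked above: the sum of squared Pearson residuals divided by the residual degrees of freedom. The data are simulated and every object name (`dat`, `fit`, `overdisp`) is hypothetical; this is not the script from the video.

```r
# Minimal sketch: overdispersion check for a binomial GLMM.
# Data are simulated; all object names are hypothetical.
library(lme4)

set.seed(42)
n_groups  <- 20
per_group <- 10
dat <- data.frame(
  group  = factor(rep(seq_len(n_groups), each = per_group)),
  x      = rnorm(n_groups * per_group),
  trials = 20
)

# Simulate successes out of 20 trials with a random intercept per group
group_effect <- rnorm(n_groups, sd = 0.5)
eta <- -0.5 + 0.8 * dat$x + group_effect[as.integer(dat$group)]
dat$success <- rbinom(nrow(dat), size = dat$trials, prob = plogis(eta))

fit <- glmer(cbind(success, trials - success) ~ x + (1 | group),
             family = binomial(link = "logit"), data = dat)

# Overdispersion factor: squared Pearson residuals over residual df
overdisp <- function(model) {
  rdf <- df.residual(model)
  pearson_chisq <- sum(residuals(model, type = "pearson")^2)
  c(chisq = pearson_chisq,
    ratio = pearson_chisq / rdf,
    rdf   = rdf,
    p     = pchisq(pearson_chisq, df = rdf, lower.tail = FALSE))
}

od <- overdisp(fit)
print(od)
```

A ratio well above 1 would hint at overdispersion; a ratio near 1 gives no particular cause for concern. Note that this check is only meaningful for aggregated binomial data (successes out of several trials), not for a purely 0/1 response.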
Views: 1,077
Videos
Redundancy Analysis (RDA) in R
12K views, 1 year ago
Follow along to learn how to do redundancy analysis, or RDA, in R! A redundancy analysis looks at relationships both among the response variables and between the response and explanatory variables. This tutorial looks at fauna mortality data and environmental cover data. You can find the R Script on my GitHub: github.com/nikkireg1/Redundancy-Analysis. Happy coding! Comment if you have any questions!
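For readers who want a feel for the workflow before opening the script, here is a minimal sketch of an RDA with the vegan package. It assumes vegan's bundled `varespec` (species cover) and `varechem` (soil chemistry) datasets rather than the mortality and cover data from the video, and the choice of predictors is arbitrary.

```r
# Minimal RDA sketch with vegan; datasets and predictors are stand-ins.
library(vegan)

data(varespec)  # species cover at 24 sites
data(varechem)  # soil chemistry at the same 24 sites

# Hellinger-transform the response and standardize the predictors
spec_hel <- decostand(varespec, method = "hellinger")
env_std  <- as.data.frame(scale(varechem[, c("N", "P", "K", "Al")]))

# Constrained ordination: species ~ environment
spec_rda <- rda(spec_hel ~ N + P + K + Al, data = env_std)

# Share of variation explained by the constraints
r2 <- RsquareAdj(spec_rda)
print(r2)

# Permutation test of the overall model
set.seed(1)
overall <- anova(spec_rda, permutations = 199)
print(overall)
```

The adjusted R-squared is usually the safer number to report, because the raw value increases with every predictor that is added.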
Episode 3: Why did the frog cross the road?
143 views, 1 year ago
Ah yes, the burning question we have all been wondering...Why did the frog cross the road? Today I try to answer this question. Tune in to learn about frog road mortality, risk factors, and what we can do to help our froggy friends.
Procrustes Rotation in R (Comparing Species/Variables)
1.1K views, 1 year ago
You want to do a concordance analysis but aren't sure if it's possible if your sites don't align? I am happy to tell you that it very much is possible! If this sounds like your dilemma, follow along with this tutorial to do a Procrustes Analysis that compares variables rather than sites!
Simplifying Expressions with Order of Operations
42 views, 1 year ago
This video goes over simplifying expressions using the order of operations. When in doubt - use BEDMAS/PEMDAS!
Solving Equations (1-step problems)
63 views, 1 year ago
This video is an introduction to solving equations and is good if you are being introduced to this topic for the first time. We solve a variety of types of problems but focus on one-step problems to practice isolating the variable.
Adding and Subtracting Fractions
90 views, 1 year ago
Watch this tutorial to go over how to add and subtract fractions! We look at problems with and without common denominators and also work on simplification. Please put any questions in the comments!
Episode 2: First Week of School
108 views, 1 year ago
My very first week of grad school has come to an end! Hear about what I am interested in studying (dispersal, symbioses, and global change!) and how this first week went.
Episode 1: Hello World!
176 views, 1 year ago
Hello world! This is episode 1 of my new podcast (and YouTube), Just One Bird's Opinion. I am Nikki Regimbal, a new Ph.D. student studying ecology and evolutionary biology - and I am excited to share my journey and talk about all things ecology. Watch this video for some brief introductions and to understand the motivation for this project. Stay tuned for coding tutorials, Ph.D. updates, chats a...
Procrustes Rotation in R (Comparing Sites)
2.4K views, 2 years ago
Procrustes analysis is a cool tool to assess the concordance of two principal component analyses. In this video, we discuss how a Procrustes analysis uses the PCAs and then go through an example assessing concordance of species inventory and mortality at different sampling sites. If you are not familiar with how to do a PCA, I suggest looking at my PCA video first. I hope you find this helpful!
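To make the idea concrete, here is a minimal sketch of a Procrustes comparison of two PCAs using vegan's `procrustes()` and `protest()`. It assumes the bundled `varespec` and `varechem` datasets, which share the same 24 sites, in place of the inventory and mortality data from the video.

```r
# Minimal Procrustes sketch with vegan; the two datasets are stand-ins.
library(vegan)

data(varespec)
data(varechem)

# Two PCAs over the same sites (rows must match between the two tables)
pca_spec <- rda(decostand(varespec, method = "hellinger"))
pca_env  <- rda(varechem, scale = TRUE)

# Rotate and scale one ordination to fit the other
pro <- procrustes(pca_spec, pca_env, symmetric = TRUE)
print(pro)

# Permutation test of the concordance between the two ordinations
set.seed(1)
pro_test <- protest(pca_spec, pca_env, permutations = 199)
print(pro_test)

# Arrows show how far each site moves between the two configurations
plot(pro)
```

The `t0` element of the `protest()` result is the Procrustes correlation, and `signif` is its permutation p-value.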
Linear Regression in R
979 views, 2 years ago
This tutorial reviews linear regression and goes through an example in R to test the significance and plot the trend. You can find the R Script on my GitHub: github.com/nikkireg1/Linear-Regression. Happy coding! Comment if you have any questions!
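As a quick taste of what the tutorial covers, here is a minimal sketch using R's built-in `cars` dataset (an assumption for illustration; the video uses its own data). It fits the model, prints the significance test, and plots the trend line.

```r
# Minimal linear regression sketch on the built-in cars dataset.
fit <- lm(dist ~ speed, data = cars)

# Coefficients, R-squared, and p-values
fit_summary <- summary(fit)
print(fit_summary)

# Scatterplot with the fitted trend line
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(fit, col = "blue", lwd = 2)
```

The p-value in the `speed` row of the coefficient table tests whether the slope differs from zero.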
Principal Component Analysis (PCA) in R (presence-absence data)
13K views, 2 years ago
In this tutorial, we discuss what a principal component analysis (PCA) is, walk through an example in R using species presence-absence data, and create and interpret a PCA biplot.
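Here is a minimal sketch of a PCA on presence-absence data. It assumes the PCA is run with vegan's `rda()` without constraints, which is the function that appears in the comment threads further down; the site-by-species matrix is simulated and its names are hypothetical.

```r
# Minimal PCA sketch on simulated presence-absence data using vegan.
library(vegan)

set.seed(7)
# 12 sites x 6 species, 1 = present, 0 = absent (simulated)
pa <- matrix(rbinom(12 * 6, size = 1, prob = 0.5), nrow = 12,
             dimnames = list(paste0("site", 1:12), paste0("sp", 1:6)))

# rda() with no explanatory variables performs a PCA
pa_pca <- rda(pa)

# Proportion of variance explained by each principal component
eig <- as.numeric(eigenvals(pa_pca))
var_explained <- eig / sum(eig)
print(round(var_explained, 3))

# Biplot with sites and species drawn as text labels
biplot(pa_pca, scaling = 2, type = "text")
```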
Logistic Regression in R
308 views, 2 years ago
Follow along to plot a logistic regression curve using ggplot and assess its statistical significance in R.
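A minimal sketch of that workflow, assuming the built-in `mtcars` data (transmission type `am` as the 0/1 response and weight `wt` as the predictor) in place of the data used in the video:

```r
# Minimal logistic regression sketch with glm() and ggplot2.
library(ggplot2)

# Fit the model; the coefficient table holds the z-tests and p-values
fit <- glm(am ~ wt, family = binomial, data = mtcars)
print(summary(fit)$coefficients)

# Plot the raw 0/1 points with the fitted logistic curve
p <- ggplot(mtcars, aes(x = wt, y = am)) +
  geom_point() +
  geom_smooth(method = "glm",
              method.args = list(family = "binomial"),
              se = TRUE) +
  labs(x = "Weight (1000 lbs)", y = "Probability of manual transmission")
print(p)
```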
Subsetting a dataframe using R
138 views, 2 years ago
This tutorial reviews different ways to subset a dataframe using the subset function in R. I like to use this function because it is easy, versatile, and does not require additional libraries. We review how to subset rows by true/false or specific values, subset by specific columns, and combine methods using and/or. Hope you find this video helpful! If you have other R questions that you'd like...
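The cases listed in this description can be sketched as follows. The `birds` data frame and its columns are made up for illustration.

```r
# Minimal sketch of base R subset(); the data frame is hypothetical.
birds <- data.frame(
  species = c("flycatcher", "warbler", "flycatcher", "sparrow"),
  site    = c("A", "A", "B", "B"),
  count   = c(3, 0, 5, 2),
  banded  = c(TRUE, FALSE, TRUE, FALSE)
)

# Rows where a logical column is TRUE
banded_rows <- subset(birds, banded)

# Rows matching a specific value
flycatchers <- subset(birds, species == "flycatcher")

# Specific columns only
species_counts <- subset(birds, select = c(species, count))

# Combine conditions with AND (&)
big_flycatchers <- subset(birds, species == "flycatcher" & count > 3)

# Combine conditions with OR (|) and pick columns in the same call
site_a_or_big <- subset(birds, site == "A" | count >= 5,
                        select = c(species, site))

print(site_a_or_big)
```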
Finding the centroid of overlapping buffer polygons in QGIS
592 views, 2 years ago
Follow this tutorial to learn my method to find the centroid and create a new buffer geometry for regions with buffer overlap in QGIS. This method is useful to avoid double-counting observations or overestimating what is measured in the buffer region. Note that this method creates new buffer geometry free of redundancies, but loses other fields of characteristics when you dissolve the buffers. ...
Creating Violin Plots with R (color and grayscale)
536 views, 2 years ago
so how does the result (relatedness and unrelatedness) inform in the science question at hand? (not an ecologist)
Thanks for making it easy for us❤
Thank you so much!! I've been looking for this tutorial for couple days... finally found it! my project can still move forward now. luv u
Thank youu so much!! I am using Logistic mixed effects model with a panel data, that's very helpful!!!
How to interpret the result?
Thank you so much ! The clearest and most logical video about RDA I've ever seen. Really helpful with a lot of details which I want to know ! 👍
thanks dude for explaining it 😎
you're welcome
BY FAR THE BEST AND MOST USEFUL VIDEO I HAVE SEEN SINCE BEING BORN!!!!!! THANK YOU FOR EXPLAINING THIS TO ME!
I know right?
Amazing video!! Thanks!!
Amazingly clear explanation! I love that you explain even the basics of each step, it's so useful for someone who's just starting out with statistical analyses and struggles to understand high-level explanations common in other sources
Oh my god I'm so happy I found this video before my ecology final tomorrow. Thanks so much 😭💖
I guess by now that you've noticed that your spelling of "principal" is incorrect.
This video is really helpful to me, and i would like to know about the variable, is there any minimum amount for the variable in using PCA? such as 5 variables of places with 6 or 7 parameters, could it be use PCA to solve it? Thanks
Thanks, great video. However, when I try to run those diagnostic tools I get a message:
binned_residuals(glmm_model)
Using `ci_type = "gaussian"` because model is not bernoulli
My model looks like:
glmm_model <- glmer(cbind(Emergence, Total_seeds - Emergence) ~ Inoculant * Species * Soil_type + (1 | Tray_ID/Pot_ID), family = binomial(link = "logit"), data = filtered_data)
where emergence/seeds that did not emerge should be binomial. Any ideas?
I find this tutorial really helpful. Thankyou for making this video.
Yes but how do the principal components that come back when you get the summary from a PCA in R correlate back to the variables you input? I have yet to make sense out of this, nobody seems to explain it clearly and simply, not my teacher and not 1 single youtube video I've watched. I'm lost...
Hi Greg. I am not super sure I understand what you mean, but will try to help! The principal components are axes explaining variation in your data (the variables you input). So high variance explained by a principal component indicates higher relatedness between the variables. If PC1 and PC2 cumulatively explain a super high amount like 90% of your variation, the story of your data is super clear! Pretty much all the relationships between your variables are explained by those two axes. Whereas if there are many principal components explaining very little variation, then the relationship is more complicated or the variables may not be related at all. A biplot can be used to visualize the relationships between the variables (the magnitude of a positive/negative relationship). But the PC values are important to add validity to those relationships. Again, if the PC1 and PC2 axes explain 90%, the relatedness between variables is going to be much more reliable than if they were cumulatively explaining 20%. If you are interested in a deeper dive, you can look at the individual PC scores for each variable in the summary output to look at the distance between variables along each PC axis. If the visual biplot is not answering your question, then maybe doing this could help. Does any of this answer your question? Apologies if I am misunderstanding.
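One way to see how the components relate back to the input variables is to print the variance explained next to the loadings. The sketch below assumes base R's `prcomp()` and the built-in `USArrests` data, not the vegan workflow or the data from the video.

```r
# Minimal sketch: linking principal components back to input variables.
pca <- prcomp(USArrests, scale. = TRUE)

# Variance explained per component and cumulatively
importance <- summary(pca)$importance
print(importance)

# Loadings: one row per input variable, one column per component.
# Large absolute values mark the variables that drive that component.
loadings <- pca$rotation
print(round(loadings, 2))

# Site (row) scores along each component
site_scores <- pca$x
print(head(site_scores, 3))
```

Variables with loadings of the same sign and similar size on a component vary together along that axis; opposite signs indicate a negative relationship along it.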
Thank you very much!! This helped me a lot!!!
Hello, can you give us the script, thank you for tutorials ❤
Error in step(spec.rda, scope = formula(spec.rda), test = "perm") : AIC is -infinity for this model, so 'step' cannot proceed. I got it problem?
Thank you so much! I was just reading some dense text for this and your video helped
Thanks a lot for your clear and simple explanation! You made rda way more clear to me after this video!
So awesome! Will be using this soon!!
Wow, thanks!
Are there more assumptions we have to meet or is it enough to do what you did, e.g. with the collinearity?
Hi! The main RDA assumption is linearity between the predictor and response. Besides that checking for those variance inflation factors like I do in my video is really what's important. Hope this is helpful!
🙌🙌🙌Grateful for the video 👏👏
Amazing work as always!! I also wanted to ask you how you could add ordinal variables into this, Is it possible? For example, vegetation growth habit (e.g. herbaceous, shrub, tree). Is it possible to mix it with numerical data as your environmental variables? On the other hand, could you do a video using a PCoA? I have been reading and it seems is the best way to represent beta diversity in terms of similarity among the groups because of the distance approach! I would appreciate your comments about this. Thank you so much!!
Thank you! It's definitely possible to add in categorical data like that - but you'll likely have to do what's called dummy coding. With dummy coding all the categorical factors would become their own columns and be assigned a 0 or a 1. So with the example you gave, rather than vegetation growth habit being the column name, there would be a separate column for herbaceous, shrub, and tree. If observation 1 was in a tree habitat, then herbaceous and shrub would be denoted with a 0 and tree with a 1. You can think of it like a presence-absence matrix for the category. I hope this is helpful! I will definitely add a PCoA video to my list - I am hoping I can be more active with the semester winding down. :)
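The dummy coding described in this reply can be sketched with base R's `model.matrix()`. The `veg` data frame and its column names are hypothetical.

```r
# Minimal dummy-coding sketch; the data frame is hypothetical.
veg <- data.frame(
  site  = paste0("s", 1:4),
  habit = factor(c("tree", "shrub", "herbaceous", "tree"))
)

# One 0/1 column per category; "- 1" drops the intercept so every
# level gets its own column
dummies <- model.matrix(~ habit - 1, data = veg)
colnames(dummies) <- sub("^habit", "", colnames(dummies))

# Combine with the site identifiers, like a presence-absence matrix
veg_coded <- cbind(veg["site"], dummies)
print(veg_coded)
```

When a factor is passed through the formula interface of vegan's `rda()`, it is expanded into contrasts automatically, so manual coding like this is mainly needed when building the matrix by hand.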
Hello! I just wanted to thank you for this amazing video! I also would like to ask you something. What would you do if you want to represent your species under different habitats (e.g. Forest, Meadow and Scrubland) but each habitat has their own amount of sites (e.g. 20 sites each one)? Would you combine your sites to represent the habitats? I'm a little bit lost about which approach I should take to work with my data. Thank you so much!
thanks you sharing, its so clear and really useful. I struggled in so many reading materials until your video.
Informative video. Thankyou
Great explanation! Can I do a pca using rda with numeric variables of 11 levels?
Thanks! Yes you can, numeric variables are perfectly fine. Although if you end up with a whole bunch of variables consider doing some dimensionality reduction.
Hello :) first of all, I would like to thank you for THE most straightforward and on the subject of ecology/biology video I have ever found!! It helped me understand and solve several mistakes and questions I made/had. REALLY, thank you! I would appreciate and love if you could find the time to do more videos like this. I only have one question: how to avoid the overlapping of the site labels/species labels? I have 30 sites and more than 30 species, the problem being that I can not see what sites are overlapping :( Thank you again and I wish you success and great accomplishments in your field of study!
Thank you so much! I have a few more similar videos, but am planning on starting up making more soon! I just came back from 5 months of field work so I am just getting back into the groove of things! Those overlapping labels are always the worst! I have trouble doing it with the biplot function because it's a bit limited in the level of customization you can do. An alternative would be to use the fviz_pca_biplot() function in the 'factoextra' package. Similarly, the first thing you would put in the brackets would be the name of the PCA object. With this function there is an argument you can add in the brackets called repel which can be 'repel = TRUE' or 'repel = FALSE'. If you try setting it to TRUE it might solve your problem. Another option which is a bit of a 'cheat' is what I usually do to be honest because it is much simpler. For some reason I find any text label repel code I find doesn't really do what I want it to. So I save the plot but as a metafile rather than an image or however else you generally save it. I then insert the metafile into PowerPoint. Here, I can right click, go to group, then ungroup. This unconstrains the text and any other discrete plot element so you can edit it. I use it to move around my labels so they aren't on top of each other and sometimes to revise some variable names to make the aesthetic better. Just be careful to not accidentally alter your plot itself! I hope this is helpful but please let me know if you have any questions!!
@@justonebirdsopinion Hello again, Thank you for your comment!!! I am back with another question :) What if we want to make a PCA with the environmental data?
chim <- read.delim("clipboard", row.names=1) #here I have 12 env data like Ca, Mg, O2, water temp etc
chim <- na.omit(chim)
summary(chim)
t.env <- decostand(chim, method="log") #Log transforming the non-standardized variables, as all of them are measured at different scales
summary(t.env)
chim.pca <- rda(t.env, scale=TRUE)
chim.pca
summary(chim.pca)
par(mfrow=c(1,1))
biplot(chim.pca, scaling=2, type = 'text', xlab="PCA axis 1", ylab="PCA axis 2")
I followed your video and tried to make a plot with the correlation between sites and env data. Is it correct? Do we always log our env data and then also scale=TRUE for the rda? I hope I explained it well enough. Thank you for your hard work and I can't wait for you to post more videos. Wish you best of luck.
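For the label-overlap suggestion earlier in this thread, here is a minimal sketch of `fviz_pca_biplot()` with `repel = TRUE`. One assumption to flag: as far as I know, factoextra works with `prcomp()`, `princomp()`, and FactoMineR-style PCA objects rather than vegan `rda()` objects, so the PCA is refit with `prcomp()` here, on the built-in `USArrests` data.

```r
# Minimal sketch: biplot with repelled labels via factoextra.
# Assumes the PCA is fit with prcomp(), not vegan's rda().
library(factoextra)

pca <- prcomp(USArrests, scale. = TRUE)

# repel = TRUE nudges text labels apart to reduce overlap
p <- fviz_pca_biplot(pca, repel = TRUE)
print(p)
```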
OMG finally an explanation of this that makes sense to me. Thanks!
Thanks, wonderful
Thank you so much <3
Great, but your code text is tiny - even at the highest res.
Hi! Yea sorry about that it was due to a monitor issue. You can find the code on my github as well if that's helpful. github.com/nikkireg1/Redundancy-Analysis
Hi Nikki, many thanks for creating this tutorial and also all the thorough explanation provided :). I have a question, when you do the anova (anova(spec.rda2, by="margin", perm.max=1000) #tells you if the order of the terms is significant). I don't get what you mean by the "order of the terms"? Biologically speaking, what does it mean? I am also struggling in customizing the rda plot with ggord... do you have any experience on that? Many thanks in advance! Cheers from Uruguay :)
Hi! So sorry it took me so long to get to this comment! I just came back from 5 months of field work so am just getting back to this channel. I'm sure it's too late where my answer won't be helpful, but I will reply anyways. The 'order of terms' is related to how the anova function is handling the data. So generally, it will try to explain how much variance is explained by the first term, and then only consider the remaining variance in how much the second term might account for, and so on. This leaves less variance as an option to explain terms that come later in the model. This becomes a problem if there's correlation between variables. So if variable 1 and 2 are correlated, most of the variance will be attributed to variable 1 and very little variance attributed to variable 2, even if in theory they should be relatively equal. So assessing if the order of terms is significant is letting you know if there's a significant effect of the order you fed the variables into the model, with different (perhaps only slightly different) results depending on the order of terms. This is helpful information as you go forward in interpreting the model and determining if you are running into Type 1 error.
@@justonebirdsopinionMany thanks again 🙂
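To see the order effect discussed in this thread, it can help to run both kinds of permutation test side by side. In current vegan, `by = "terms"` tests terms sequentially (so the order matters), while `by = "margin"` tests each term after accounting for all the others (so the order does not). The sketch below uses the bundled `varespec` and `varechem` data as stand-ins, and the `permutations` argument, which replaced `perm.max` in newer vegan versions.

```r
# Minimal sketch: sequential vs marginal permutation tests in vegan.
library(vegan)

data(varespec)
data(varechem)

spec_hel <- decostand(varespec, method = "hellinger")
mod <- rda(spec_hel ~ N + P + K, data = varechem)

# Sequential: each term is tested after the terms listed before it
set.seed(1)
seq_tests <- anova(mod, by = "terms", permutations = 199)
print(seq_tests)

# Marginal: each term is tested after all other terms in the model
set.seed(1)
marg_tests <- anova(mod, by = "margin", permutations = 199)
print(marg_tests)
```

If the two tables disagree noticeably for a term, that is a sign the term shares variance with other predictors, which is worth checking with a correlation matrix or variance inflation factors.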
Nice explanation and nice code ty
In my case the ANOVA analysis that I ran was not significant whatsoever. But when I checked only a few chosen variables that were interesting for me to plot, the ANOVA turned significant for each test. I just don't know why my plot is not fully visible on the screen. And how do I make sure I don't have overlapping of the final text in the plot?
Hi! Are you referring to the text on the rda biplot? If so, there is some ggplot code to repel overlapping text. But to be honest, I do a much easier perhaps less 'correct' method to fix my plots when stuff like this happens. I save the plot as a metafile and then insert it in powerpoint. Then if you right click and choose 'ungroup' you will be able to modify individual aesthetic elements of the plot. So you would be able to shift around any overlapping text boxes and even change the variable label if you want. Does this answer your question or am I misinterpreting?
@@justonebirdsopinion Exactly! I had no idea about this. It is amazing! Thank you. Now my graph is way more readable for my professor. Thank you so much! It is a really good explanation. 😇
Thanks for the video! What happens if your mortality dataset includes only a fraction of your inventory data? Say, some species never get run over? (or only a subset are hosts for a disease of interest)
Hi Kendra! I personally still find it relevant to include columns even if all the values are zero. So personally I would keep data where you know a host does not have a disease of interest. It is interesting how it aligns with the other data frame in the procrustes, even if it is absent in the other dataset as well! I think including it creates a more complete story and there’s nothing fundamentally wrong with including it that I can think of. But don’t be afraid to omit if doing so is more relevant to the question you’re asking! Regardless, make sure both data frames line up (whether that means adding columns where everything is absent or omitting data for which you are missing the pair). Thank you and happy coding! Please let me know if this is helpful or if you have any other questions!
Hi, first of all... ty for your video! I have a problem with my RDA analysis: the analysis is taking the same number of environmental variables as species, even when my environmental dataset has more columns than species. Do you know why this happens?
Hi Miguel! When you are going through the step that is equivalent to my env_stand, are you including all the variables for the analysis? Make sure to include any variables that may have been transformed; cbind() them with the non-standardized values. When you do your first rda, all the variables should be there. But after the step() function some should be omitted. I guess my first suggestion would be to ensure that env_stand is including all the variables you want before putting it in the first rda analysis. Are you doing that? If so, let's chat some more. Happy coding!
Hi! Great video :) I'm stuck in a situation which is similar to your script: I'm performing an rda and obtaining a model with only RDA1 and PC1 as output, but the step function won't work on it. I passed the model the same way as you did (rda_step1 <- step(rda_results, scope=formula(rda_results), test="perm")) but the script stops after a few succesful steps with the error "Error in `[.data.frame`(m, xvars) : undefined column selected". What could that mean?
Hi Laura! Thanks for your comment! The error you are getting is indicating an issue with your explanatory (x) variables. Since your output is only showing RDA1 and PC1, I am thinking that the explanatory variables you are using are highly correlated - which is causing the crash with the step function. The RDA1 output is essentially saying that all the variance explained can be attributed to this one axis, and PC1 is indicating that there is residual noise that cannot be explained by a RDA axis and is likely due to high correlation as well. My suggestion would be to start with assessing for collinearity first using the vif.cca function. This will tell you if there are any variance inflation factors in your explanatory variables that is messing with the model output. You can also check to see if your explanatory variables are highly correlated using a Pearson's r. The R script for this would be "cor(df$vars, method = 'pearson')". Omit variables that are highly correlated. If everything looks good on that front, you may be missing explanatory variables. So if there's anything you omitted from the beginning, perhaps consider adding it back in. Or it is possible that with the data you have a RDA may not be the best method to use. If you want to chat about this in more detail or have me look over your data - feel free to email me as well! If you go to the About section in my channel profile you will find a link to my website which has my contact information. I hope this at least helps you get started with your problem-solving! Good luck!
@@justonebirdsopinion Thank you very much, this was really helpful. Unfortunately I can't use vif.cca because the step function stops before having concluded, due to the error I mentioned in the first comment. I've also found the function redun() that seems to work on my data; anyway I can't find a lot of literature using this one, whereas rda() seems to be much more used. What do you think about it?
@@lauraviviani7472 Hi Laura! You can still use the vif.cca function before doing the step function. You would just put the name of your original RDA in the brackets rather than the new rda name for your step function. Did you try the Pearson's r as well? Hopefully both help you omit highly correlated variables and variance inflation factors that might be causing the problem. I've never used the redun() function before and have only used the rda() function in vegan. I think rda() is more typically used, but since I've never used redun() I can't really speak to whether it's better or worse. But since your output is suggesting very little variance in the data, changing the function likely won't fix the issue. Start with trying the vif.cca and let me know if you are able to run it! If you are having issues you can email me and I will send screenshots and maybe some mock code. If you do get rid of VIFs and correlated variables and are still having issues, maybe RDA isn't the best analysis for what you want to do. You can tell me a bit more about your data and the question you want to ask and I can think about whether a different analysis might be more suitable. I hope this can be a bit helpful! Let me know how it goes!
@@lauraviviani7472 Hi Laura! I just wanted to check in and see if the issue was resolved!
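The two collinearity checks suggested in this thread can be run on the full model before any call to `step()`. A minimal sketch, assuming vegan's bundled `varespec` and `varechem` data and an arbitrary set of five predictors:

```r
# Minimal sketch: collinearity checks before model selection in an RDA.
library(vegan)

data(varespec)
data(varechem)

env <- varechem[, c("N", "P", "K", "Ca", "Mg")]

# Pairwise Pearson correlations between explanatory variables
env_cor <- round(cor(env, method = "pearson"), 2)
print(env_cor)

# Fit the full RDA first, then ask for variance inflation factors
full_rda <- rda(decostand(varespec, method = "hellinger") ~ ., data = env)
vifs <- vif.cca(full_rda)
print(round(vifs, 2))
```

Common rules of thumb treat VIFs above roughly 10 as a sign of problematic collinearity, though stricter cut-offs are also in use; that threshold is a convention, not something stated in the thread.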
Congratulations to you As a research scholar your passion to share knowledge with world is pleasing
Wonderful video and explanation. Please can you provide the script used and the data set as well?
Hi! You can find the R Script and data on my GitHub at this link: github.com/nikkireg1/Redundancy-Analysis. Happy coding! Let me know if you have any questions!
@@justonebirdsopinion Thanks so much for your videos. I would like to ask if you could make a video on doing a Mantel test and also canonical correspondence analysis. One question: what is the major underlying assumption for using a redundancy analysis as opposed to a canonical correspondence analysis? Thanks for your time.
@@justonebirdsopinion I have different categories of environmental variables such as the biotic factor (temperature, windspeed), spatial factor (longitude and latitude), and soil properties (soil ph) and I would like to perform Canonical Correspondence analysis. I successfully performed the analysis but I don't know how to customize the plot. I want to customize the different categories of environmental variables by allotting them with different colours to differentiate spatial from biotic. In addition to this, how can I group the species based on a variable such as elevation etc? I would also appreciate it if you can give me your email. Thanks
@@farmIntegral Hi! I will work on making a CCA video and Mantel Test video soon and let you know when it's posted (hopefully the first one within a week)!
@@farmIntegral Hi! Here is a link that might be relevant to helping you group environmental variables by color: stackoverflow.com/questions/61348422/how-to-create-ordination-plots-with-different-species-groups. If this isn't what you're looking for I'm happy to help problem solve more than this! You can reach me via email at nicole.regimbal@mail.utoronto.ca. Looking forward to hearing back from you and happy coding!
Thank you
Thanks for sharing, this is what I was looking for: regression with R2 and formula in the graph. Can you please share the script with us?
Hi! You can now find the R script on my GitHub. Here is the link: github.com/nikkireg1/Linear-Regression
Thanks alot 🙂
By far the best video I could find on this topic, thanks a lot!
Hi, thanks for the video!!! Really helpful and helped me through the first step. The species data I have is also presence/absence. However, because I'm dealing with plants, I have p/a data for 159 species :( and therefore my explained variance values are very low (i.e. like 0.03 for PC1). What would you recommend in this case? Should I take out less important or very rare species?
Hi! Thank you :) I'm glad the video was helpful so far! Oof that sounds frustrating! It sounds like this could be largely due to autocorrelation between your species. You can create a correlation matrix in R and omit anything that is inflating your results (I suggest omitting anything with a value greater than 0.7 or less than -0.7). Here is the code for that (with df being the name of your dataframe):
pearson <- round(cor(df, method = 'pearson'), digits = 2) # rounded to 2 decimals
This will output a table of all pairwise correlations. Since you have 159 species, that may be a lot to sift through (and kinda painfully).... hopefully creating a visual will make this a little easier - you can use the "corrplot" package to do this.
corrplot(pearson, method="circle")
So if there's a high correlation between two values, omit one from the analysis. If taking out correlated variables doesn't help, then yes try to take out variables that you think are less important. Removing variables from PCAs has pros and cons, but really there's no right or wrong answer as long as you think through how it will affect your PCA. Here is a link to a post on StackExchange that goes into that a bit more: stats.stackexchange.com/questions/50537/should-one-remove-highly-correlated-variables-before-doing-pca I hope this helps you out! Let me know if you have any questions or ever need another pair of eyes on the data :)
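With 159 species, scanning the full correlation table by eye is slow, so it may help to list only the pairs beyond the 0.7 cut-off suggested in this reply. A minimal base R sketch; the small presence-absence table and its names are made up so the result is easy to verify.

```r
# Minimal sketch: list species pairs with |Pearson r| above 0.7.
# The presence-absence data frame is hypothetical.
sp_a <- rep(c(1, 0), times = 25)
sp_b <- sp_a
sp_b[1:2] <- 1 - sp_b[1:2]                 # nearly identical to sp_a
sp_c <- rep(c(1, 1, 0, 0), length.out = 50)
df <- data.frame(sp_a, sp_b, sp_c)

pearson <- cor(df, method = "pearson")

# Keep each pair once (upper triangle) and apply the cut-off
idx <- which(abs(pearson) > 0.7 & upper.tri(pearson), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(pearson)[idx[, "row"]],
  var2 = colnames(pearson)[idx[, "col"]],
  r    = round(pearson[idx], 2)
)
print(high_pairs)
```

Each row of `high_pairs` is a candidate pair from which one variable could be dropped before rerunning the PCA.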
This was super helpful!! Great explanations! Thanks for guiding me to this video from your comment response on the PCA video. Much appreciated!
Great video! Super helpful!! Would a PCA like this, for presence absence of species, be able to include explanatory variables that explain the distribution of the community in the PCA? Your video was super helpful and I was able to run the PCA with my data but now I'd like to visualize in the PCA plane how other variables relate to the spread and layout of the species in the pca. Is a PCA still even what I should be using?
Thanks so much! You are going to want to do a redundancy analysis (RDA) instead - it is essentially just a PCA with explanatory predictors for the response. I have a video on this on my channel, but let me know if you have any questions :) Good luck with your analysis!
Thank u so much, everything was 100% clear! If someone wants to read more about the RDA or other ordination methods, you can refer to Numerical Ecology with R (Borcard). Saludos desde Peru!