This was the best explanation of this topic I've had. Thank you so much.
THANK YOU! I was doing some googling to try and better understand what this chart was trying to explain in R and this video is exactly what I needed.
Glad it was helpful!
Wow it's all so clear now, thank you so so much ^-^
Excellent explanation !!
Good video my man
excellent explanation !!
Amazing explanation (from an ecology student). :)
Neatly Explained! Thank you Drew :)
Great video. Thank you for the clear explanation!!!
Clear and succinct presentation. This was helpful for me. Thanks!
Great explanation. Thank You.
Thank you for this walkthrough. I just want to get your opinion on how you view the assumptions underlying regression methods. Compared to my field [applied econometrics], would you expect the assumptions to be violated to some degree when applied in ecological statistics? Is the hope of the applied [statistician] that the model is a good-enough approximation of the process that generated the observed data? How does one identify that good-enough threshold? Finally, does removing outliers to attain better measured deviations increase the risk of overfitting?
To quote George Box: "all models are wrong, but some are useful" -- wrong in the sense that the assumptions are nearly always false to some degree. I'm not familiar with much from econometrics, but what little I do know suggests that the data sets tend to be larger than in ecology, which helps a lot, and that econometric data suffer from less "observation error". Identifying the "good enough" threshold is a matter of judgement, and you get better with experience. One way to develop a good "eye" is to simulate data from the model and then fit it -- the simulated data meet the assumptions exactly, so you get to see what the residuals "should" look like. As far as removing outliers goes, there are many published methods for testing for outliers. Sometimes removing a data point improves the current model, sometimes not. It's another area where experience and subject matter knowledge make a big difference. Depending on "how" the outlier is removed, you may get a biased estimate as well. I would say it's an area where expert statistical help is a good idea.
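For anyone who wants to try the simulate-and-fit idea, here is a minimal sketch. It is not the code from the video (that was done in R); it uses only the Python standard library, and the function name `simulate_and_fit`, the straight-line model, and all parameter values are my own assumptions for illustration.

```python
import random
import statistics


def simulate_and_fit(n=50, intercept=2.0, slope=0.5, sigma=1.0, seed=1):
    """Simulate y = intercept + slope * x + Normal(0, sigma) noise,
    then fit a straight line by ordinary least squares.

    All names and default values here are hypothetical, chosen only
    to illustrate the idea of fitting a model to data that meet its
    assumptions exactly.
    """
    rng = random.Random(seed)  # fixed seed so each run is repeatable
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0, sigma) for xi in x]

    # Closed-form least-squares estimates for a single predictor.
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return x, fitted, residuals, (b0, b1)


if __name__ == "__main__":
    # Repeat with several seeds to see how much "well-behaved"
    # residuals vary from one data set to the next.
    for seed in range(1, 4):
        _, _, residuals, (b0, b1) = simulate_and_fit(seed=seed)
        print(
            f"seed={seed} intercept={b0:.2f} slope={b1:.2f} "
            f"residual sd={statistics.stdev(residuals):.2f} "
            f"largest |residual|={max(abs(r) for r in residuals):.2f}"
        )
```

Plot the residuals against the fitted values for a handful of seeds and you will see how much apparent "pattern" turns up by chance alone, even though every assumption holds. That gives you a baseline to compare your real residual plots against.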
Hello. I'm so confused. Why is RStudio producing different results when I use the same call? 😢
I'm sorry you're having difficulty with your work. There are many reasons why you might run the same code and get different results. This isn't really a great forum for fixing issues like that, unfortunately. I suggest you look for help at forum.posit.co, stackoverflow.com (search for questions tagged 'R') or stats.stackexchange.com.
Very helpful, thank you
nice video!
Any chance of getting access to the code used to generate the plots used in this video? Thanks!
Ugh. Reproducibility fail! I reproduced those figures over at my blog: drewtyre.rbind.io/post/checking-assumptions/ hth.