Statistics with R (3) - Generalized, linear, and generalized least squares models (LM, GLM, GLS)
ฝัง
- เผยแพร่เมื่อ 4 ธ.ค. 2024
- In this video, I show how how to implement linear models, generalized linear models and generalized least squares models in R. Using the "airquality" dataset, I show how to fit and interpret the models. The core of the session is the interpretation of partial slope coefficients in Poisson generalized linear models. Finally, I give an outlook on generalized additive models which will be covered in one of the next sessions.
The R code used in this video is:
airquality
plot(Ozone~Wind,airquality)
model1=lm(Ozone~Wind,airquality)
plot(model1)
coef(model1)
(Intercept) Wind
96.872895 -5.550923
#predictions for Wind speeds of 19 and 20 mph:
Ozone1=coef(model1)[1]+coef(model1)[2]*19
Ozone2=coef(model1)[1]+coef(model1)[2]*20
Ozone1
Ozone2
##
model2=glm(Ozone~Wind,airquality,family=poisson)
coef(model2)
(Intercept) Wind
5.0795877 -0.1488753
Ozone1.glm=exp(coef(model2)[1]+coef(model2)[2]*19)
Ozone2.glm=exp(coef(model2)[1]+coef(model2)[2]*20)
Ozone2.glm/Ozone1.glm
0.8616765
exp(coef(model2)[2]) #exp(-0.1488753 )
#0.8616765
##
library(nlme)
model3=gls(Ozone~Wind,airquality)
summary(airquality$Ozone)
model3=gls(Ozone~Wind,airquality,na.action=na.exclude)
head(airquality)
airquality$Date=
as.Date(paste(1973,airquality$Month,airquality$Day,sep="-"))
library(lattice)
xyplot(Ozone~Date,airquality)
model4=gls(Ozone~Wind*Date,airquality,na.action=na.exclude)
air2=subset(airquality,complete.cases(Ozone))
model5=gls(Ozone~Wind*Date,air2)
plot(ACF(model5,form=~Date),alpha=0.05)
model6=update(model5,correlation=corAR1())
library(MuMIn)
AICc(model5,model6)
df AICc
model5 5 1099.40
model6 6 1095.21
summary(model6)
I wish I'd seen this video last month. Better late than never. Subscribed!
I think that this one of the best tutorial ever. Thanks a lot Herr Professor
Thank You Dr, You have explained all fluently and smoothly...
Honestly speaking, you are a wonderful teacher! You saved me a lot of effort, to be frank. Thank you!!
thank you so much!
Vielen Dank Christoph, Sie haben wirklich top erklärt! Mache weiter so!!!!!! ich bin big Fan von Ihnen!!!!
thanks a lot for your time and clear explanation is was priceless, I hope you can share more videos :) I'm looking forward to GLM part 2
You are an excellent teacher! thanks!
terrific video...this is exactly what i needed!!
Cool video! I like observing nature, discover and decompress...
Thank you so much. A nice, clear and excellent lecture on gls and lm. I have used and applied the lesson learned on my analysis. It came out wonderful!!
I hope you may add some example videos on linear mixed models with lmer function from the lme4 package. I am sure you would give us a wonderful understanding how the Random effect structure would remove/minimize or handle spatial/temporal variations or differences as the gls does in the case of spatially/temporally correlated residual errors.
Hi John,
Thanks very much for your suggestion! I just added a first video on lme models. More complex models will be covered later.
Best regards,
Christoph
Very good presentation. Thanks
Wasn't aware of the As.Date object...great!
Great video! Thanks for posting this.
Thank you very much! Not only for this, but also for the other videos! They helped me a lot! I think that would be a nice video if you use those really simple examples in simulations with Montecarlo...... that would be epic!
you went on my favs, very good vid
You are excellent! Keep up the good work!
Greetings from Sweden!
+Erik Wersäll thanks a lot!
Thank you so much for your tutorials! Very helpful
Super didactic, thank you very much!
thanks!
very helpful, thank you so much for your nice tutorial.
Thanks for your interesting and helpful viedo.
Maybe you could use typical R conventions for your code, to keep it a little more readable?
Thank you for such a great job!
Excellent video, thank you
If you could add chapters (in the video) so you could jump strait to e.g: GLM that would be great. Saves time when you just want to see GLM.
Great video by the way!!!
Thanks! I´ll try to implement this in the next videos.CheersChristoph
Dude, i love you. I really do. Thank you very much.
Do you provide any tutorial where you take more predictors into consideration?
E.g. working with 6-7 predictors which are influencing one "predicted" variable?
Besides that:
You really make great videos. Very explanatory and understandable (Like it that you also take paper and pen!)
Many thanks for the super useful tutorial. What did not like so much, that it could be misunderstood that the simple use of glm() instead of lm() already results in the better model, because you spent plenty of time explaining the issue of how the glm uses exponential functions. However, the superiority of the glm-fit is because of the family selection, as the output of lm() is equivalent to glm(,family="gaussian"). Which is were i stumbled. Because i have mathematically no real idea of what the link functions are actually doing. I just know how and when to apply those families. A tutorial on link functions would be really great :)
+Kendra Millard Thanks for these comments, I´ll provide more details on GLMs in new videos. Best regards, Christoph
Many thanks for your invaluable information and explanation about the regression modeling. Could you please provide a bit more information for us.
Best Regards
Thank you very much :)
Which package I need to install on R-Studio for Mac to run the GLS-models? I just cannot find the right one! These tutorials are great thank you a lot. You got some didactical talent!
you need to install the "nmle" package
THANK YOU VERY MUCH!
A quite helpfull video.
Christoph, can please add more lectures on identifying non linearity when building a model and how to improve the models? Also do you have videos on DOEs and latent variable analysis (PCA etc)
Thank you very much
Any plans to model MLE / likelihood ratio models? Nice work!
Thanks Christoph for great and understandable tutorial.
Do you have a tutorial that gives guidance how to handle multicollinearity between explanatory variables in regression analysis?
If I have understood right, VIF (Variance Inflation Factor) is used as an indicator of multicollinearity and it is somehow utilized
also in R when handling milticollinearity in regression analysis.
Thanks in advance for your advice.
br. Mika
Hi, why the errors for the glm model is not e^error? ok I saw later but it was not there when the model was written, thanks
I code following in video but I can not get ACF plot . How to get it?
This is very helpful :)
I have one question about the video, I've recently started using R, and dose not understand when you put "as.Date(paste(1973,airquality$Month,airquality$Day,sep="-"))" and said "depending on your system", how do i know what system I use?
Thank you!
Good, video. How would you then graph the GLM? Can you follow-up with a video showing this? Thank you!!!
One easy way to add a Poisson glm would be to make a vector
y=1:20
(all the integers from 1:20) and then write
predict1=exp(coef(model2)[1]+y*coef(model2)[2]
and then write
lines(predict1)
to add the glm to your plot (you should get a curve if you've done it right). There's probably a better way but this is how I do it.
Great!
thank you!
Nice videos. I find a better approximation formula for lnv! which is used in universities.
The formula they use is lnv! ≈ vlnv - v(If i remeber well). My semi-empiric formula is lnv! ≈ (v+1/2)lnv - v +1,08
thanks!
is the model given by GLM linear?
In lm model it was shown with plot in earlier video. Can we visualize something like that?
Also your presentation is very nice.
Just one question: why did you choose the poisson family when using the glm model?
yt viewers have the pause button available, so i don't need to see your typing errors. The video can be compressed. Good tutorial anyway, keep it going!
I had a question regarding Poisson Model that you performed on Ozone~Wind; While you used the model to control the negative predicted values of Y variable, the assumptions of Poisson model are not considered. The ozone content is not a discrete data which follows a poisson Distribution.
+Ask Analytics You are right that actually the variable is likely negative binomially distributed - as can be seen e.g. from
library(fitdistrplus)
plot(fitdist(as.numeric(na.exclude(airquality$Ozone)),"nbinom"))
While it is true that the ozone concentration in itself is not Poisson distributed, the generating measurement process clearly is a poisson process.
However, for the sake of simplicity, we model it as a Poisson glm here. Further details on negative binomial models (and how they may be used to account for overdispersion) will be added later.
Thank you very much for your comment!
at 16:40 roughly, wouldn't it just be e^b where b happens to be negative in this case since we see a decreasing/negative relation b/w wind and ozone.
+tawilk In a generalized linear model, all coefficients are on the scale of the linear predictor. Thus, you transform:
I: ln y= a + bx
II: y = exp (a+bx) = exp(a) * exp (bx)
a one-unit change in x will thus lead to a (exp b)-fold change in y.
Thank you so much, please how can i generate several samples from a multivariate normal population and compute the sample Mahalanobis squared distance for each of them using R?
+Mbanefo Madukaife you use the mvrnorm function for this
+Christoph Scherber Thanks for much. Any existing code for that which can be run in the mvrnorm?
This means, that once i introduce the wind*time interaction and correct for the autocorrelation, gls does not find a significant dependance between ozone and neither predictor, judged from the p-values given in summary(model6). Is this correct?
Thank you!!!
Hi, Thanks for such a nice tutorial :)
I wish to ask one question related to Model 2 (Poisson). For example if we have two explanatory variables: one numerical and other categorical, then how we can estimate the value of Ozone?
Ozone ~ Wind+Time # Time can be 'Day' or 'Night'
Let's assume we get coefficients of Intercept as 2.3, Wind as 0.0065 and Time as 0.56.
We wish to calculate for: Wind=200 & Time=Day. so, equation will be:
Ozone_val=exp(coef(mod)[1]+coef(mod)[2]*0.0065+coef(mod)[3]* ...??
Should we multiply by 0 or 1 with coef(mod)[3]? What should we should choose?
Thanks & Best regards,
Chaitanya
This is usually a bit more complex to understand. If you have Wind+Time, separated by a "plus" sign, then you get the following estimates in the model summary:- The intercept for Wind=0 and Time="Day"- The difference in intercepts between "Day" and "Night"- the common slope for WindIt´s easiest to find out about these things using the "effects" package:require(effects)plot(allEffects(model)))Cheers,Christoph
@@ChristophScherber Very interesting package Christoph, thanks a lot for your video AND your comments :)
You're an awesome teacher !
Thank you :)
should the windspeed and ozone measurements be normalized prior to evaluating models?
indeed it can be a good idea to scale variables to remove collinearity. But of course often the original scale of variables is interesting
Dear Professor Scherber: Your videos are excellent! However, I'm wondering if you can point me in the right direction. I'm looking for a statistical method to compare modeled data against actual measured data, with the goal of calibrating the model. This isn't the place to go into the experimental details, but in brief, I'm trying to calibrate and validate a method my colleagues and I are developing for nondestructively estimating the amount of C stored in living trees. We used this method on a large sample of trees in a research forest, and want to compare estimates from our model to actual measurements obtained by felling the trees. Regressing the model estimates on the actual measurements produced a high correlation coefficient (~96%), a non-zero intercept less than 10% of the smallest model value, and a slope of nearly 1, and while this is nice to see, it doesn't really address the question of agreement between model estimate and measurement. Do you have any suggestions on how we might do this? One idea I have is to regress y-x against x, and then compare (how?) this against a y=x line.... but how to do this in R? I'm asking a lot...hope you can help! Thanks, Bob
Dear Rob, I think a regression of the predicted y (from linear models) vs. the observed y (from felled trees) sounds reasonable; you could then test if the slope differs significantly from 1.
Thanks! Since I wrote you, this is exactly what I've tried, and it's quite interesting. I also calculated prediction intervals on the regression of the observed values on the predicted values, to see how well the latter can predict the former. I also used the Bland-Altman Limits of Agreement analysis.
Hi Christoph Scherber, Your videos are really helpful. Thanks a ton for the same. However I am facing a problem.I am unable to install the package MuMin for using AICc function. I am using R studio and R 3.1.2. it will be of great help if you can guide me to resolve the problem. Thanks in advance......
+Arnab Roy You may wish to write to the package maintainer if the problem is reproducible. MuMIn does interfer sometimes with other attached packages. Be sure to have the latest versions of R, RStudio and MuMIn. Best wishes, Christoph
Hi professor Scherber, thank you! I really enjoyed your video. Can you further show us how to implement negative binomial model if poisson model shows overdispersion?
is there any problem if we save R console in work space?
Do you know how to do a GLM on statistica?
What data set did you use, and where can I find it?
usually the datasets are loaded by using the data() command (the datasets are part of the base installation of R)
I am running the following gls regression g.lm
Hy can you please help me for L-moments in r????
Thanks, unfortunately I don´t have experience with L-moments. What do you need it for?
I might be wrong but it seem to me that if
log(y) = a + bx + error
than, y = exp(a)*exp(bx)*exp(error)
as opposed to y = exp(a)*exp(bx) + exp(error)
Is there a reason why the error would be additive as you say in the video, and not multiplicative?
simon venne The reason is that in a GLM the errors (e) are modeled separately from the linear predictor (a+bx). Mathematically, you are right; but the trick is that a GLM allows you to specify link function (for linear predictor) and variance functions (for errors) separately. Your equations are correct for a model where the response variable is transformed. In the case of a generalized linear model, things are a bit different, though.
were GAM and GLM are same??
no, that´s different types of models
Sorry, I believe your interpretations are a bit... misleading. At 16:33, you write that "In a glm, an individual slope gives an estimate of the multiplicative change in the response variable.... ". This is not always the case. This is the case for the Poisson distribution because the default link is a log link. But, you could easily run a glm with a Gaussian distribution and identity link, which would have an additive interpretation. Further, it is best practices to indicate your link, even when the default is intended (learned that the hard way :) ).
hallo Professor!, Ich verstehe nicht, warum eine Variable ist besser als der andere. Haben Sie jedes Video, wo ich weiter erklären?
Entschuldigen Sie mein schlechtes Deutsch, ich bin Argentinien.
Grüße
I thought this was a very good video, but you kind of hit the afterburners at the 2/3 mark.
Hi Exit, I´d be glad to hear how this video can be improved. Cheers!
Christoph Scherber Thank you very much for this video....it was really useful for me
and you remind me of Stewie (the baby in Family Guy) which is awesome. :)
I code following in video but I can not get ACF plot . How to get it?
Where exactly do you get an error message?
ah, you need to replace the minus sign with a tilda sign (~).
Happy new year to you!