I didn't find the function coeftest in sandwich, but I found it in lmtest. I say this for anyone who might have gotten an error when running coeftest for the first time with only sandwich loaded. The rest was perfect. This is great content, Mr. Huntington-Klein, thank you so much.
So to get the coeftest function we have to load lmtest? Because I did that and it doesn't work... is there any other solution? Thanks in advance
@@ToneDeluxe First try using lmtest::coeftest(). If that doesn't work, try reinstalling the lmtest package and re-run.
Thanks for your help, guys. I just updated R and RStudio and everything worked then. Thanks a lot!!
Yes, I am watching these in April 2021 and I got a warning that it didn't recognise "coeftest". However, some googling showed that you also need to install and load lmtest. After that it worked fine.
Man, every time I search on TH-cam for anything I need for my analysis, as I'm still learning, I find you. I'm really thankful for your videos; they are somewhat introductory yet concise enough for me to keep searching for concepts, methods, etc. that I had never heard of (I have no background in stats). Thanks again, mate.
To run the coeftest function you need to install the package lmtest (install.packages("lmtest")). Additionally, to run the function linearHypothesis() you need to install the package car (install.packages("car")) and call library(car) before running the function. Thanks for the very well presented content, Mr. Huntington-Klein!
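For anyone following along, a minimal setup sketch covering the packages mentioned in these comments:

# install once
install.packages("sandwich")   # robust/clustered covariance estimators (vcovHC, vcovCL)
install.packages("lmtest")     # coeftest()
install.packages("car")        # linearHypothesis()
# load each session
library(sandwich)
library(lmtest)
library(car)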
Thank you so much for this video! ChatGPT had me running in circles with error codes!
Thank you so much, Nick. I had spent almost an hour looking for a way to output robust standard errors in a regression table before finding your video.
I had to install and load several packages this time, and hit many errors along the way. One command even seemed defunct just because I didn't capitalize a letter :) Thanks for the video.
Very nicely explained; it cleared up many of my doubts. Excellent video for beginners👍
Great video, clear explanation of tests with code. Thanks
your content is always amazing!
Hi Nick,
Great video. Once you run the coeftest function through stargazer, it no longer produces the statistics at the bottom of the regression table (R2, Adj. R2, F-test, number of observations, etc.). Do you know how to keep those statistics?
Thanks for the help!
EDIT: I understand I can do it manually by taking the data from the non-clustered model, but when working with many models, it would be helpful to have stargazer produce this automatically.
You can send the regular, non-clustered version to stargazer, and then send the clustered version only for the SEs. I believe it's se= in stargazer to set the SEs, but you may want to check the stargazer help file.
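A sketch of that approach, assuming an lm model m and a hypothetical cluster column df$city (the se= argument here is my understanding of stargazer, so double-check the help file):

library(sandwich)
library(stargazer)
clustered_se <- sqrt(diag(vcovCL(m, cluster = df$city)))   # clustered SEs only
stargazer(m, se = list(clustered_se), type = "text")       # model stats come from m, SEs are swapped in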
Thank you for the video. My question is how to solve the problem if there is no heteroscedasticity but there is an autocorrelation problem. Should I still use vcovHC with coeftest, or is there a special command just for autocorrelation?
The adjustment for autocorrelation is different. You can use vcovHAC or NeweyWest from the sandwich package, or NW from fixest. See this section of my book theeffectbook.net/ch-StatisticalAdjustment.html#your-standard-errors-are-probably-wrong
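A quick sketch of those autocorrelation-robust options, assuming an lm model m (the lag choice here is purely illustrative):

library(sandwich)
library(lmtest)
coeftest(m, vcov. = vcovHAC(m))               # heteroskedasticity- and autocorrelation-consistent SEs
coeftest(m, vcov. = NeweyWest(m, lag = 4))    # Newey-West with a hand-picked lag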
Hello Nick! Thank you for the video, it's super useful. I am running a Heckman two-step model in R and I am trying to obtain the robust standard errors for that. Any ideas how to do that? Thanks :)
I believe the same steps should work for the Heckman regression object. Does it not?
@@NickHuntingtonKlein Thanks for the reply. Unfortunately not, at least I can't get the results. The error that I get is the following:
> stargazer(m2, coeftest(m2, vcovHC), type = "text")
Error in UseMethod("estfun") :
no applicable method for 'estfun' applied to an object of class "c('selection', 'selection', 'list')"
@@hussainimussa Hmm, strange. That's surprising, as under the hood Heckman is just a regular regression. I'm afraid I don't know off the top of my head; Google may be your friend here.
Just to give you more information, I am running the following Heckman two-step model, in which the dependent variable in both the selection model and the outcome model is a binary variable:
heckit(selection = A.KLD.Cov~A.FCF+A.MB+A.Size1+A.Lev.TLO.TA+A.Anal+A.Stock.Return+Year.effect, outcome=Cash.Only~A.Aggr.Str.Scaled+A.Aggr.Con.Scaled+A.FCF+A.MB+A.Size1+A.Lev.TLO.TA+A.Anal+A.Stock.Return+T.MB+R.Size1+T.Lev.TLO.TA+T.Anal+T.Sales.Growth+T.R.D+T.Hi.Tech+Related+S.State+Year.effect, method = "2step", data=mydata)->m2
@@NickHuntingtonKlein Thanks a lot, I will try to find a solution :)
Is there a difference between (1) summary(MOD.OLS.FE, .vcov = vcov.OLS.FE)
and (2) coeftest(MOD.OLS.FE, vcov.OLS.FE) with vcov.OLS.FE
I'm not sure, I don't really use summary(.vcov) but you could try running it both ways and seeing if it's any different. For the second question, no, HAC isn't a clustered SE format. But clustering by firm using vcovCL should have the desired effect.
Hello, thanks for your video. I'm having issues with the packages wooldridge and jtools; both refuse to install. Could you help me with this? I want to run a negative binomial model.
That's odd, they're both installing fine for me. Might be an issue with your local R installation.
Super helpful thanks! One question: I'm in a situation where I would prefer to use clustered standard errors since I suspect there's correlation between the error terms within a cluster (e.g. I want to cluster over city), but unfortunately I do not have access to the cluster id (no idea which observation came from which city). Would using the robust standard error (where I don't have to provide the cluster membership) be a reasonable substitute option since it should still be heteroskedasticity robust or is that a fool's errand? Thanks!
I wouldn't say it really solves the same problem - your clustering issue would still be there, but I'd still recommend doing the robust SEs in that case, yes. Then acknowledge in the paper the remaining correlation in the error term.
@@NickHuntingtonKlein Understood. Awesome, appreciate the great videos and the advice!
Hey Nick, I was wondering how I can include both robust standard errors and clustered standard errors in my output table, i.e. an output where my standard errors and p-values are both robustness-adjusted and clustered. Thanks in advance :)
This is more a question about the command you're using to generate the output table. I don't think stargazer or huxtable can do that (although maybe they can? I'm not sure). I bet the gt package can (gt.rstudio.com/) but I've never used it.
Yes, generally standard errors will get bigger after clustering. You can think of clustering as vaguely similar to reducing your degrees of freedom, which naturally raises SEs.
@@NickHuntingtonKlein Great, thanks a lot for your helpful answers. One last question regarding robust & clustered errors, though: is it not possible to calculate the robust standard errors first and then calculate the clustered errors on top of them, so that the cluster-adjusted regression result already encompasses the robust errors? In that sense it doesn't really have anything to do with output packages such as stargazer. Thanks in advance
@@b1b1b1b1able Oh, I thought the issue you were having was displaying both sets of errors on a table. The easiest way to do both robust and then cluster-robust errors is to just calculate the SEs both ways using coeftest. You may also want to look into the estimatr package.
@@NickHuntingtonKlein So using coeftest with both vcovCL and vcovHC in one line? That doesn't seem to work. What would the coeftest code look like to get an output that has both robust and clustered standard errors?
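For what it's worth, a sketch of running the two adjustments as separate calls, assuming an lm model m and a hypothetical cluster column df$firm. As far as I understand, the clustered version already allows for heteroskedasticity, so there is no separate "robust, then clustered" step:

library(sandwich)
library(lmtest)
coeftest(m, vcov. = vcovHC(m, type = "HC1"))         # heteroskedasticity-robust only
coeftest(m, vcov. = vcovCL(m, cluster = df$firm))    # cluster-robust (also allows heteroskedasticity)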
could not find function "linearHypothesis"
It's in the car package, so install car and then library(car).
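A sketch of how that fits together, assuming a model m with a hypothetical regressor x:

install.packages("car")
library(car)
library(sandwich)
linearHypothesis(m, "x = 0")                       # test with the default SEs
linearHypothesis(m, "x = 0", vcov. = vcovHC(m))    # same test using a robust covariance matrix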
Hi Nick. This was great. I developed a logit model with clustered standard errors and used the coeftest command. Then I used the predict command to build a predicted-probability dataset, with the plan of drawing a predicted probability curve. The predict command works on the glm object, but throws an error on the coeftest object. So my question is: how do I develop a predicted probability curve for a logit model WITH clustered standard errors? Please advise. Thanks.
Clustered or robust SEs are only an adjustment to the standard errors. They won't change the coefficients themselves, and so won't have any effect on your predicted probabilities (unless you're also calculating prediction CIs or something). If you're having trouble with coeftest in general though, you can often get adjusted SEs from a regression table command like export_summs in jtools or msummary in modelsummary. See the docs for how to cluster in these.
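A sketch of the modelsummary route, assuming a glm model m and a hypothetical district column in its data (modelsummary accepts a clustering formula for vcov, if I recall correctly):

library(modelsummary)
msummary(m, vcov = ~ district, stars = TRUE)   # cluster-robust SEs by district
msummary(m, vcov = "HC1", stars = TRUE)        # plain heteroskedasticity-robust SEs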
Thanks Nick. I will try doing this. Yes, I was looking for predicted CIs, and then to plot a predicted probability curve of the logit model with those CIs. Any advice on how I should do that, or which package I should use? Thanks so much for the quick response. I appreciate your help.
@@PraveenKumar-ro1cd I wasn't sure, but searching "prediction interval logit R" brings up this Stack Overflow question; the first answer seems to cover it stackoverflow.com/questions/14423325/confidence-intervals-for-predictions-from-logistic-regression
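Roughly the approach that answer takes, as I understand it, assuming a glm logit model m and a new-data frame newdf: build the interval on the link scale, then transform back with the inverse logit. Note these se.fit values come from the model's own covariance matrix, not a clustered one.

pr <- predict(m, newdata = newdf, type = "link", se.fit = TRUE)
prob  <- plogis(pr$fit)                       # predicted probability
upper <- plogis(pr$fit + 1.96 * pr$se.fit)    # upper confidence bound on the probability scale
lower <- plogis(pr$fit - 1.96 * pr$se.fit)    # lower confidence bound on the probability scale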
@@NickHuntingtonKlein Thanks so much. Let me check with the codes shown in the link. Hope it works. Will keep you updated. Thanks again :)
@@NickHuntingtonKlein Hi Nick, this link partially solved my question, but a piece of it remains. If I am clustering my logit model by district and then trying to develop a predicted probability curve, I still get an error. If I remove the clustering variable (i.e. district), the predict command does work, but if I add the clustering variable to my glm, predict throws an error. May I please have your suggestion?
Thanks for your video, it helps! However, I just want to know: is it sensible to use robust standard errors for first-difference estimation? I know I can use clustered standard errors for FE estimation to fix the serial correlation problem.
Robust SEs would be a good place to start (I assume you have more than two time periods?). If you have more than two time periods (i.e. more than one observation per individual after differencing) you'll want to carefully consider the time dependence structure of your data, and may want to consider clustering depending on how treatment is assigned - see www.nber.org/papers/w24003
@@NickHuntingtonKlein Many thanks for your detailed explanation! It's a small study, T=8 and N=320. I performed FE with clustered standard errors; however, the problem is that the adjusted R2 is negative, which is unusual, right? Although two coefficients are significant with two stars, the F-statistic also gets two stars, and the p-values are okay. So, should I perform FD in this case, or should I be happy with my FE (with clustering)? I have clustered data, I mean 40 countries. Thanks again for the link :)
@@nasimaakter4164 be very careful about R2 with fixed effect models as there are a few different R2 values and they all mean different things. Not sure which kind you're talking about but I imagine it has a specialized interpretation. That shouldn't have anything to do with the standard errors though. As for FE vs FD I think it comes down to modeling purposes. If you're interested directly in period to period growth, do FD
@@NickHuntingtonKlein Many thanks again! I am looking for some driving forces or drivers, not period-to-period growth. Yes, it is a specialisation index, and I would like to see what the drivers behind such specialisation are. Do you think a negative adjusted R2 is not good, I mean, that the model fit is bad? I am a bit confused here.
@@nasimaakter4164 I mean, generally yes a negative adjusted R2 would be a bad fit but I'm guessing it may not be the kind of R2 you have in mind. Fixed effects models have several different R2s with different interpretations. I generally don't pay much attention to R2, especially in a FE model
Hello ! Thank you for the clear and useful videos!
I tried robust and clustered errors with a log (ln) dependent variable in a semi-log model, but it doesn't work. Is that normal?
I'm not sure exactly what you mean by it not working, but in general it's cluster-robust or regular heteroskedasticity-robust, not both.
@@NickHuntingtonKlein I have several messages like "length of NULL cannot be changed" when I use stargazer. And the clustered standard errors are exactly the same in each cluster. Maybe the clusters are not relevant.
@@jessicabosseaux7334 strange, yeah, that doesn't sound right. Maybe try doing clusters in the lfe or estimatr packages instead. I have a video on estimatr
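A sketch of the estimatr route, assuming an outcome y, a regressor x, and a hypothetical cluster column city in df:

library(estimatr)
m_cl <- lm_robust(y ~ x, data = df, clusters = city)   # cluster-robust SEs (CR2 by default)
summary(m_cl)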
@@NickHuntingtonKlein It works !!! Thank you very much !
Great video, it helped me a lot! How can I obtain the R-squared and RSS from a model after clustering the standard errors?
R squared and RSS don't care about the standard errors, just the model coefficients. So it will be the same R squared and RSS from the original un-clustered model.
@@NickHuntingtonKlein Interesting. That makes sense given SE does not show up in the calculation for those statistics. Thanks for the quick response!
@@NickHuntingtonKlein What if we want to cluster around two variables?
@@tylerodette8347 You can cbind the variables you want to cluster by together, that should work. Or check out my video on estimatr which might be easier.
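A sketch of that two-way version, assuming an lm model m and hypothetical firm and year columns in df:

library(sandwich)
library(lmtest)
coeftest(m, vcov. = vcovCL(m, cluster = cbind(df$firm, df$year)))   # cbind the two cluster variables
# the formula interface should do the same, if firm and year are available in the model's data:
coeftest(m, vcov. = vcovCL(m, cluster = ~ firm + year))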
Use felm for clustered se without fe bro, less hassle
I usually recommend estimatr to undergrads these days, I find they have a hard time with the felm syntax
Your tutorials are the best I've seen!! Thank you so much. Maybe a stupid question, but to check the values in panel data can I start by treating it as linear?
Thank you! I'm not entirely sure what you mean - most panel data models are already linear. Do you mean ignoring the panel structure? That is called a pooled model. I wouldn't trust the results if you think the panel structure is important, but it would let you check things like collinearity
I mean that I don't know how to select the controls! For linear models it's pretty straightforward, and I know I can just add fixed effects by adding a factor to the lm function. Since I want to add industry-by-year fixed effects, I think it's easier to do with a linear model. But one question: to add industry-by-year fixed effects, I don't have to group or do anything more, right? I'm sorry... I need to submit my thesis and I'm going crazy with panel data...