Hi, thank you very much for the video, very instructive. I have a question. I have a small-N (18), large-T (83) panel dataset (quarterly data). Since the dependent variable is a profitability measure for which persistence over time can be expected, I include the lagged dependent variable in the model and use LSDV estimation (Nickell bias should be minor given the number of time periods). The question is: how many lags of the dependent variable should I use? Is there any rule for that? My model seems to fit better when I include 5 lags, but choosing 5 lags seems rather arbitrary to me and not very science-based. Thanks a lot in advance for your response.
LSDV sounds like a good idea in that case. The rule that I have seen for choosing the number of lags is that you keep adding lags until the most recently added one is no longer statistically significant. To check this, you look at the significance of the coefficients of the lagged variables. If this produces an unreasonably large number of lags, you can report two models: 1) this large model and 2) a model that includes a smaller number of lags, say 2-3, and then check whether including more lags makes a difference for the effect that you are mostly interested in.
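A minimal sketch of the stopping rule described above, not the author's own code. It assumes a balanced panel stored as an N x T NumPy array, classical (homoskedastic) standard errors, and a 1.96 cutoff; the function names `lagged_design`, `lsdv_fit`, and `choose_lags` are made up for illustration, and the data are simulated stand-ins for an 18 x 83 panel.

```python
import numpy as np

def lagged_design(y, p):
    """Return targets (N, T-p) and their p lags (N, T-p, p) from an (N, T) panel."""
    T = y.shape[1]
    dep = y[:, p:]
    lags = np.stack([y[:, p - k:T - k] for k in range(1, p + 1)], axis=2)
    return dep, lags

def lsdv_fit(y, p):
    """LSDV via the within transformation; returns coefficients and t-statistics."""
    dep, lags = lagged_design(y, p)
    # Demeaning each unit is numerically equivalent to adding unit dummies.
    Y = (dep - dep.mean(axis=1, keepdims=True)).reshape(-1)
    X = (lags - lags.mean(axis=1, keepdims=True)).reshape(-1, p)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    dof = Y.size - p - y.shape[0]  # p slopes plus one absorbed intercept per unit
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

def choose_lags(y, max_lags=8, crit=1.96):
    """Add lags one at a time; stop when the newest lag is not significant."""
    chosen = 0
    for p in range(1, max_lags + 1):
        _, t = lsdv_fit(y, p)
        if abs(t[-1]) < crit:
            break
        chosen = p
    return chosen

# Simulated stand-in data (assumption): 18 units, 83 periods, true AR(2) with unit effects.
rng = np.random.default_rng(0)
N, T, burn = 18, 83, 50
alpha = rng.normal(size=N)
y_all = np.zeros((N, T + burn))
for t in range(2, T + burn):
    y_all[:, t] = alpha + 0.5 * y_all[:, t - 1] + 0.3 * y_all[:, t - 2] + rng.normal(size=N)
y_sim = y_all[:, burn:]  # drop the burn-in periods

beta2, t2 = lsdv_fit(y_sim, 2)
chosen = choose_lags(y_sim)
print("AR(2) coefficients:", beta2.round(3), "t-stats:", t2.round(1))
print("lags chosen by the stopping rule:", chosen)
```

Note that each extra lag drops one more period from the start of the sample, so the models in the loop are fitted on slightly different observations. With real data you would probably also want standard errors that are robust to serial correlation, which this sketch does not compute.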
@@mronkko Thank you for your response. I have one follow-up. Are there any issues (in terms of OLS assumptions) that arise from including more lags of the dependent variable, and that I should be aware of and check? Once again, thank you, I really appreciate it.
@@paveljankular3512 Not that I can think of. Adding one lag means that you will need to deal with dynamic panel bias, but I do not think that adding a second lag makes a difference once you have one already.
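To make the dynamic panel (Nickell) bias mentioned here concrete, the following is a small simulation sketch under assumed values: an AR(1) panel with unit effects, a true coefficient of 0.5, and 18 units. The helper names are hypothetical. It compares the average LSDV estimate with very few time periods against the average with 83 periods.

```python
import numpy as np

def simulate_ar1_panel(rng, n_units, n_periods, rho, burn=50):
    """Simulate y_it = alpha_i + rho * y_i,t-1 + e_it and drop a burn-in."""
    alpha = rng.normal(size=n_units)
    y = np.empty((n_units, n_periods + burn))
    y[:, 0] = alpha / (1 - rho)  # start each unit at its long-run mean
    for t in range(1, n_periods + burn):
        y[:, t] = alpha + rho * y[:, t - 1] + rng.normal(size=n_units)
    return y[:, burn:]

def lsdv_ar1(y):
    """Within (LSDV) estimate of the coefficient on the first lag."""
    dep, lag = y[:, 1:], y[:, :-1]
    dep_w = dep - dep.mean(axis=1, keepdims=True)
    lag_w = lag - lag.mean(axis=1, keepdims=True)
    return float((lag_w * dep_w).sum() / (lag_w ** 2).sum())

def mean_estimate(n_periods, rho=0.5, n_units=18, reps=200, seed=0):
    """Average LSDV estimate over repeated simulated panels."""
    rng = np.random.default_rng(seed)
    draws = [lsdv_ar1(simulate_ar1_panel(rng, n_units, n_periods, rho))
             for _ in range(reps)]
    return float(np.mean(draws))

short_T = mean_estimate(6)   # very short panel: strong downward bias expected
long_T = mean_estimate(83)   # long panel: bias should be small
print("true rho = 0.5")
print("mean LSDV estimate, T = 6: ", round(short_T, 3))
print("mean LSDV estimate, T = 83:", round(long_T, 3))
```

The bias shrinks roughly in proportion to 1/T, which is why LSDV is usually considered acceptable for long panels like the one in the question.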
@@mronkko But we are choosing from a huge range of possibilities. The selection bias here must be massive, effectively lowering the statistical significance of whatever we choose, no? When I lag a variable by n months, in many cases I will have ~12 values to choose from. Picking the best 1 out of 12 leaves a lot of room for luck. I am encountering this in a project right now.
@@mronkko If I had to guess, we should just apply some kind of model validation technique, like k-fold or leave-one-out, to verify that the performance extrapolates?
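One way the validation idea could look in practice is sketched below. Plain k-fold or leave-one-out shuffles observations across time, which is awkward when the model contains lags of the outcome, so this sketch swaps in a time-ordered hold-out instead: fit on the early periods, then score one-step-ahead predictions on the later ones. Everything here is an assumption for illustration (simulated AR(2) data, the hypothetical `oos_mse` helper, the split at period 50), not a method taken from the video.

```python
import numpy as np

def oos_mse(y, p, t_split, max_lags):
    """Fit LSDV with p lags on periods before t_split, score on the rest."""
    T = y.shape[1]

    def design(t0, t1):
        dep = y[:, t0:t1]
        lags = np.stack([y[:, t0 - k:t1 - k] for k in range(1, p + 1)], axis=2)
        return dep, lags

    # Start at max_lags so every lag order is fitted on the same targets.
    dep_tr, lags_tr = design(max_lags, t_split)
    dep_m = dep_tr.mean(axis=1, keepdims=True)
    lags_m = lags_tr.mean(axis=1, keepdims=True)
    X = (lags_tr - lags_m).reshape(-1, p)
    Y = (dep_tr - dep_m).reshape(-1)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Recover each unit's intercept from its training-period means.
    alpha = dep_m[:, 0] - lags_m[:, 0, :] @ beta
    dep_te, lags_te = design(t_split, T)
    pred = alpha[:, None] + lags_te @ beta  # one-step-ahead, using observed lags
    return float(np.mean((dep_te - pred) ** 2))

# Simulated stand-in data (assumption): 18 units, 83 periods, true AR(2).
rng = np.random.default_rng(1)
N, T, burn = 18, 83, 50
unit_effect = rng.normal(size=N)
y_all = np.zeros((N, T + burn))
for t in range(2, T + burn):
    y_all[:, t] = (unit_effect + 0.4 * y_all[:, t - 1]
                   + 0.4 * y_all[:, t - 2] + rng.normal(size=N))
y_sim = y_all[:, burn:]

max_lags = 6
mse = {p: oos_mse(y_sim, p, t_split=50, max_lags=max_lags)
       for p in range(1, max_lags + 1)}
best_p = min(mse, key=mse.get)
for p, v in mse.items():
    print(f"lags = {p}: out-of-sample MSE = {v:.3f}")
print("lag order with the lowest hold-out MSE:", best_p)
```

This does not remove the selection problem raised above: picking the lowest hold-out error among many candidates is still somewhat optimistic, so the winning lag order is best treated as a candidate to report alongside a simpler model.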
This is so informative and clear!!! Thank you so much for generously sharing your knowledge 🙏
You are welcome!
Thank you for sharing knowledge~
You are welcome!
You are superb at teaching. Thank you so much for the very good video.
It's my pleasure
brilliant video!
Thanks!
Thanks for posting this!
Thanks for the video. I was wondering whether you can use the random effects model when you have a lagged dependent variable in the model?
Depends on what you mean by "use of the random effects model". Using the GLS RE estimator with lagged dependent variables is a bad idea because it leads to dynamic panel bias. See my talk on the Arellano-Bond estimator. But if you mean random effects (or latent variables) more generally, there are ways to use them with lagged dependent variables. I present one approach in the Arellano-Bond video, and, for example, the DSEM technique implemented in Mplus would work.
@@mronkko Thanks for your reply! Yes, I meant using the GLS RE estimator when there is a lagged dependent variable in the model. So, if I understand correctly, when the model includes a lagged dependent variable, the best approach is the Arellano-Bond approach (based on GMM), right?
@@amirhoseinzahedi4993 I would go for the ML approach that Allison discusses. (See the citations in my Arellano-Bond video). If you have a long panel, then A-B with GMM might be preferable because it is simpler to do.
@@mronkko Thanks a ton, that was really helpful!
@@amirhoseinzahedi4993 You are welcome. I just taught this topic in person this week, so it is fresh in my mind ;)
Thanks sir.
Please, sir, reply to me fast.
What are lagged independent variables?
@@mronkko Thank you so much, sir.
Love from Pakistan
@@mronkko Please, sir, can you define these?
Time as a variable? And errors in variables?
@@mronkko thank you sir 😊😊😊
@@mronkko always be happy