Sir, your video saved my master's thesis. Thank you for your clear explanations!
Two years later it is still helping with master's thesis.
You saved me and my bachelors degree
2 minutes in and this is the best produced video on econometrics on youtube. Very engaging presence and voice
Graduated in economics with honors 5 years ago, and still need a refresher every now and then lol
Thanks for the presentation
Just found my new favorite channel. Thank you for these videos.
Literally saved me. Ur a legend ❤
Can't thank you enough for how awesomely you explain the R part
You have covered everything that isn't available in any single video anywhere else
You're the best
This is so clear. Please keep making stats more accessible 🙏🙏🙏
Thank you for this! I always get confused between the graph for random intercept and fixed effects
So helpful. Was forgetting to declare it as a panel data frame!
Thanks for this video professor! You explain better than anyone!
Thank you for your amazing effort. There is a big issue in panel data, which is cross-sectional dependence, besides autocorrelation and heteroskedasticity. It would be great if you added a supplementary video on vcovHC, vcovSCC, etc.
Love your board games collection.....
Thanks a lot for the video!!! I really need the basics, even during my Master's degree.
Best background topic ever !!!
Hi! I'm using panel data with the indices "firm" and "year". I tried to run a Fixed Effects regression and put "Industry" and "Year" as indices for fixed effects, but the output for FE "individual" was still firms, not industries. I heard that Industry FE are not possible in the FE model because Industry is a time-invariant variable. That's why I chose a pooled OLS model now. However, how can I determine fixed effects here? Simply as a control variable or is there any other possibility? Thank you for your help!
It's correct that you can't add industry as a control when you already have firm fixed effects, since they'd be perfectly collinear. The good news is you don't need to! Including firm fixed effects automatically controls for anything that's fixed within firm, like industry. So you don't need to add the industry controls, that job is already done by the firm fixed effects.
If your goal is to study the effect of industry itself, are you sure you really want to control for firm? What identification problem does that solve for you? If you do really want both, you'll need to run some sort of hierarchical random effects model, which can get tricky. The relevant R tool for that would be lmer, in the lme4 package.
@@lanaschludi5466 if it says they did industry and year fixed effects then they probably didn't do firm fixed effects, just industry and year. The "yes" in the table just means "yes, we included these fixed effects, but the estimates for them aren't important so we're not going to show them to you, we just included them as controls." so you can ignore firm FE and just do industry and year.
Thank you, Nick! Greetings from Brazil!
Good shit man. Really enjoyed this.
Thank you so much for the clear explanation. What are reasons to prefer fixed effects model over random ones in theory?
Thanks! I'd recommend checking out this section of my fixed effects chapter www.theeffectbook.net/ch-FixedEffects.html#random-effects
Thank you so much! Very informative & easy to follow.
Great video! Very well explained. Thanks for this.
Dear Professor, do you have any video related to BLPEstimator in R or any step-by-step guide as in your knowledge? I'm trying to use it for my paper but I am super confused. Thank you, your videos are extremely helpful!
Thanks! I'm afraid I don't have any blp materials though.
Hi, Thanks for the videos! quick question: I noticed that when using the plm function, the R-squared statistic is not including the variation explained by the time and space fixed effects variables. Is there a way to include it into the results?
I tried doing it with lm() and inserting the year and space variables (as factors), but I have too many spaces (34,000) and I receive an error that "cannot allocate a vector of size 8.4 GB".
Thanks again!
I don't think plm can do it directly but you can compute it by hand as in karthur.org/2016/fixed-effects-panel-models-in-r.html
Or, estimate it instead in lfe (see the same link) or estimatr (see my estimatr video), which report the full R squared
Hey, does anyone know if there's a way to compute two-way clustered standard errors when working with the plm package? So far I haven't been able to solve this issue, since coeftest only allows one to cluster either by individual or by time. Thanks in advance!
The vcovDC() function in plm should be able to do it. But also you could switch to using the fixest package instead which has easy syntax for doing this.
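A minimal sketch of both routes mentioned above; the data frame dat and the variable names y, x1, x2, firm, and year are hypothetical placeholders.
# Sketch only: two-way ("double") clustered standard errors
library(plm)
library(lmtest)
library(fixest)

pdat <- pdata.frame(dat, index = c("firm", "year"))
m <- plm(y ~ x1 + x2, data = pdat, model = "within", effect = "twoways")

# Route 1: double-clustering by individual and time via plm's vcovDC
coeftest(m, vcov = vcovDC)

# Route 2: fixest clusters on both dimensions directly
feols(y ~ x1 + x2 | firm + year, data = dat, cluster = ~ firm + year)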
Thanks very much for posting this video. I have several questions about it: First, can we include dummy variables in panel regression? (The only thing I have found is that they are inappropriate for the FE model, but what about the other models (RE, OLS)?) Second, do these assumptions work with the pglm package in R, when I want to run panel ordinal logistic regression (OLS, FE, RE) or panel ordinal probit regression (OLS, FE, RE)? In pglm, would the interpretation be the same if I log the independent variables (like a 1% increase in the logged value resulting in a one-unit increase in the dependent variable, because I have taken a log)?
Including dummy variables in panel regression is totally fine (just be careful that they don't overlap too much with your fixed effects).
The meaning of FE is slightly different in logit-FE or probit-FE than in OLS-FE, but the same general ideas work. The interpretation of the coefficients will be the same as in *logit or probit* models, not the same as in OLS-FE. I should point out that the "1% in log value" interpretation is for use with logged variables, not with logit/probit - those are different.
Good video! I have a question regarding the FE model. I have a big issue with cross-sectional dependence and cannot "fix" it with coeftest. I have tried both measures (vcovHC with cluster = "time" and Driscoll-Kraay), and the result after running the coeftests is that all my variables are insignificant. Does that mean the model is not useful? What can I do?
Insignificant results doesn't mean the model is wrong, it just means your results are insignificant! That said, FE does suck a lot of the statistical power out of a model, especially if you have a "big-N-small-T" model (i.e. many groups but few observations per group). If you're pretty certain that your predictors *should* be predicting a large share of the within-variance but you're not getting anything, consider if maybe you are actually controlling away more variation than you intend. Ask: after removing the variation from the fixed effects, should there be enough variation left to actually study? You might also try random effects if you think those assumptions might be likely to hold.
Thank you so much, professor! Such high-quality content!
I have three questions regarding the fixed effect model which you did.
- Is that ONLY a "county" fixed effect? How do I include a "time" fixed effect in that model? In this case, the "year" variable? I want to do both county AND year fixed effects.
- The lag function that you used, is it by default a 1 year (1 row) lag? Don't we need to specify how much lag we want?
- How do I increase the lag, let's say to 2 years in this case?
Thank you and best regards!
For all of these I'd recommend checking out my video on the fixest package, which makes two way fixed effects easy and has an easily customizable lag function
@@NickHuntingtonKlein Thank you, sir!
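A rough sketch of what that looks like in fixest, using the county crime data shipped with plm (the same panel as the video); the regressors here are illustrative, not the video's exact specification.
library(fixest)
data("Crime", package = "plm")

# panel.id tells fixest the group/time structure so l() can build lags
m <- feols(crmrte ~ l(prbarr, 1) + l(polpc, 1) | county + year,
           data = Crime, panel.id = ~ county + year)
summary(m)  # standard errors are clustered by county (the first FE) by default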
Hi Nick, thanks for the video, it has been really helpful! I have a question about adding the lagged dependent variable to the fixed effects model. I want to estimate a dynamic model but I read that adding the lagged dependent to the fixed effect model would give dynamic panel bias, as the correlation between the lagged variable and the error term yields inconsistent estimates. I also read that the system GMM estimator developed for dynamic models deals with this issue. What do you think about this and do you know how to code the system GMM estimator in R?
I've never done it myself, but I believe the tool to do it is the pgmm function in the plm package
@@NickHuntingtonKlein Alright thank you, I'll have a look!
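For reference, a rough sketch of system GMM with pgmm(), closely following the Blundell-Bond example in the plm documentation (EmplUK data, not the video's dataset).
library(plm)
data("EmplUK", package = "plm")

# "ld" = level and difference equations, i.e. system GMM
sys_gmm <- pgmm(log(emp) ~ lag(log(emp), 1) + lag(log(wage), 0:1) +
                  lag(log(capital), 0:1) | lag(log(emp), 2:99),
                data = EmplUK, effect = "twoways", model = "onestep",
                transformation = "ld")
summary(sys_gmm, robust = TRUE)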
Hi Nick, thanks for the great content! I have one question: I am a step before this video. Therefore, still trying to figure out what to do with my missing data. It's not a lot that is missing though some of it is in the dependent variable and some of it in the independent variable. Do you have any videos on that? Would be very grateful for any advice here!
Best
Oliver
I'm afraid I don't have any videos on missing data. The standard thing to do is just to drop observations with missing data, although that introduces a fair number of problems. You may want to look into multiple imputation.
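A minimal multiple-imputation sketch with the mice package; the data frame dat and the variables y and x are hypothetical.
library(mice)

imp  <- mice(dat, m = 5, seed = 1)   # build 5 imputed datasets
fits <- with(imp, lm(y ~ x))         # fit the model on each one
pool(fits)                           # combine estimates via Rubin's rules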
@nick Thank you professor for a very informative and clear explanation.
I have one question, do we need to add the dummy variable for Year in the plm model?
fixedeff
If you want to add year effects you're better off specifying it as a two-way model in the options
@@NickHuntingtonKlein can you please point me in the right direction. I didn't get you, what do you mean by two way model
@@rajat1770 A two way fixed effects model is a regression model with fixed effects for individual/group and also fixed effects for time. You can get this in plm using effect = "twoways" as long as you specify both a group and time variable for the index
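A minimal sketch of that two-way specification, using the county crime data from plm; the regressors are illustrative rather than the video's exact model.
library(plm)
data("Crime", package = "plm")

twfe <- plm(crmrte ~ prbarr + polpc, data = Crime,
            index = c("county", "year"),
            model = "within", effect = "twoways")
summary(twfe)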
Professor thank you a lot for the video. It is really hard to find something easy to understand about this topic, and this video makes so many things clear for me.
I have a question about the video. In the plm function we can choose (model = "within", effect = "time") and (model = "between"). As I understand your definition of the between model, I would expect to get the same results, but my results differ from each other. My question is: what is the difference between the within model with a time effect and the between model?
Thanks, it is very helpful! Is there also some video/some documents we can check to understand how to run VIF, partial F test and VAR method for panel data? Thank you
I'm afraid I don't have materials on those except for partial F (see my post-regression statistics video), but for VIF I recommend the jtools package. Never done a VAR in R myself I'm afraid
Thank you so much. This was very informative.
If I want to know the time effect and country effect on my dependent variables (GDS, GDP, GCP), with T = 20 and N = 20: should I use year dummies and country dummies to capture their effect on (GDS, GDP, GCP)? And in that case, which model would you recommend? I want to know both the individual effects and the time effects.
If you want both effects, then yes you'd want to include fixed effects for both
@@NickHuntingtonKlein thank you for the video and the quick answer (nice and complete)
Thank you for the very nice explanation! It's rare to come across someone who can explain a topic in such a short and precise manner. I have one question though: how would you deal with unbalanced panel data, i.e. when the number of years differs among individuals? Can you just use the same approach or do you need to adjust for the differences? Thx again!
Thank you very much! There are some small differences in methods (often computational differences as opposed to statistical differences) when it comes to unbalanced panel data. However, conveniently, most main-line panel data software is designed to handle it already, plm being an example of a package that does so. So you can use the same commands with rare exception.
@@NickHuntingtonKlein Thx so much!
I am working with a different panel dataset and following your tutorial. When declaring my dataset as a panel I ran into an error
--> In pdata.frame(Healthcare, index = c("country", "Year")) :
duplicate couples (id-time) in resulting pdata.frame
to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany")
When applying fixed effects I ran into errors
--> Error in `.rowNamesDF
It's hard to say too much without knowing your data, but it sounds like you have multiple observations per combination of country/year, which plm doesn't like. I'd recommend checking out my video on the estimatr package and using that instead.
First, thank you for your amazing video. Second: I'd like to know, I did a panel regression in Stata (I want to see the effect of transport cost, along with other independent variables, on exports in Sub-Saharan Africa). I just ran a simple regression and I do not know what is fixed or anything. Can you please write down any advice? Thanks in advance
For Stata I'd recommend looking into their panel commands. xtset lets it know you're working with panel data, and then you can do fixed effects with xtreg
@@NickHuntingtonKlein Thank you, professor, for your interest. I'm not a professional and I feel like I'm struggling with my model, but I'm really thankful for your reply.
This is great. I like your passion and commitment to teaching. I was wondering if you might do videos on exploratory data analysis with R, Stata, or Python through an economics lens, since before diving into data analysis one has to check normality [normal distribution], outliers, and other issues in the data that may affect the analysis, to avoid wrong conclusions.
I do have some videos on EDA in my Data communications series
@@NickHuntingtonKlein Great. Would you mind sharing the link for those videos on EDA? By the way, I like to read economics papers from top outlets such as AEA, QJE, and NBER; when I see how they are written and analyzed, do they use R Markdown, Quarto, Stata Markdown, LaTeX, or Overleaf? What typesetting system is recommended for economists with reproducibility and replication in mind? Thanks.
@@kwizeralambert1316 apologies, looks like I added the EDA section to that class after making those videos. But you can see the updated course, including the EDA lecture, here github.com/nickch-k/datacommslides
Most economists use either latex or Word. Markdown is slowly gaining popularity.
@@NickHuntingtonKlein Thank you so much, very rich courses. I understand, LaTeX is widely used. Currently, I see some researchers using Quarto and R Markdown to write their papers
@@kwizeralambert1316 yes I'm pretty much exclusively using quarto from here on out, outside of collaboration with people who don't use it
Hi Nick, first of all thank you so much for this stunning video. I am a little bit confused about using Random effects with Robust standard error (sandwich estimator) in r. In Stata I use xtreg Y X, re vce(robust). How do you apply this syntax in r ?
Try running some lmer output (from lme4) through coeftest from the sandwich package, or just send it to msummary (in the modelsummary package) and set the SE type with the vcov option.
@@NickHuntingtonKlein Thanks! By the way, what is the difference between this RE (with robust SEs) and the RE in your video?
@@joseluissola8941 RE means random effects, not Robust se. So they're different things entirely
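One possible sketch for mirroring Stata's xtreg, re vce(robust), which clusters by the panel id: plm's random effects estimator with cluster-robust standard errors. The data frame dat and the variable names are hypothetical, and the lme4/modelsummary route mentioned above is an alternative.
library(plm)
library(lmtest)

pdat <- pdata.frame(dat, index = c("id", "year"))
re <- plm(y ~ x, data = pdat, model = "random")

# vcovHC on a plm object clusters by group by default,
# roughly matching Stata's panel-robust VCE
coeftest(re, vcov = vcovHC(re, type = "HC1"))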
Hello Nick, I downloaded your R code for this video and I tried to run all of it. However, the lagged models gave different results. The observations did not drop to 540 (as your video shows), but only to 639 observations. Do you know the reason behind this or what I should do? Thank you very much...
The 540 is only for the first-differencing model, since it by necessity has to drop one of the time periods. Is that the one you're running?
@@NickHuntingtonKlein No sir, I ran the "# include a lag" model, and in this video it became 540, while in my RStudio it's 629.
Apparently, I found the problem, sir. When I load the dplyr package, my result for the "# include a lag" model is N = 629. Once I unloaded dplyr and ran it again, it became 540 (probably from 630 - 90 counties) like in your video.
However, I don't know why dplyr could do this 😂
@@christina-4287 oh, yeah, one of the downsides of dplyr is that it shares function names with a lot of other packages, so sometimes loading it will overwrite a necessary function you need
@@NickHuntingtonKlein Okay, so is it better to unload dplyr in this case 👍. Moreover, sir, I tried to apply this to my data. I already unloaded dplyr, and I have an original N = 1187. But when I regressed it using the lag of Y, the N dropped very drastically to N = 836. But my panel only has 68 firms. I thought it should drop to 1187 - 68 only, but it's not... There are also no missing values... so why do you think R excludes so many observations in the regression? Thank you 🙏
@@christina-4287 You don't necessarily have to unload dplyr, but you want to be sure to use the plm version of lag(), so instead of lag() you can just say plm::lag().
As for the data dropping rapidly, are there perhaps gaps in the data? If you have time periods 1, 3, 4, 5, then applying a lag will drop both 1 and 3.
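A tiny toy example of the gap point above, with made-up data; in a session where dplyr masks lag(), use plm::lag() as suggested in the earlier reply.
library(plm)

toy <- data.frame(id   = c(1, 1, 1, 1),
                  year = c(1, 3, 4, 5),
                  y    = c(10, 12, 13, 15))
ptoy <- pdata.frame(toy, index = c("id", "year"))

# With periods 1, 3, 4, 5, the lag is missing both for period 1 (no earlier
# period) and for period 3 (its previous period, 2, is absent), so a
# regression on the lag drops both rows
lag(ptoy$y, 1)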
Hey Nick, thanks for the explanation. Do you know how to check for multicollinearity in panel data regressions?
Glad it helped! The standard test is the variance inflation factor or VIF. I don't know the R command off the top of my head but there's your Google term.
@@NickHuntingtonKlein Thanks for replying so quickly. The VIF commands only seem to work for lm not plm commands. Anyway, thanks for taking the time, it's much appreciated.
I'm getting this message when I'm trying to estimate the fixed effects model:
error in class(x)
I'm afraid I don't know French, but it sounds like an internal problem with plm maybe? I'd recommend asking on StackExchange.
Thanks for the video. How can I declare as a panel data given monthly?
In plm it should just work automatically. It assumes that the levels of the time variable it finds in the original data are consecutive and ascending.
@@NickHuntingtonKlein Thanks a lot for the feedback. I have tried adjusting my index variable, however, I get the error:
"In pdata.frame(Data) : duplicate couples (id-time) in resulting pdata.frame
to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")."
@@godwinezekoye211 sounds like you have duplicates. Did you run the check it said? Plm won't work if you have multiple observations per id/time. If you want that (and are just planning to do time FEs, obviously lags won't work in this setup), switch from plm to fixest, see my fixest video.
HI Nick. Thanks for the video.
I'm running both the within and fd models. In theory, the coefficients from these two models should be the same. However, when I run the regressions, I obtain different coefficients. Do you know anything about this? Thanks again
Yes, that should be true as long as you have exactly two time periods. In plm I'm getting numerically identical results for effect = "twoways" but not effect = "individual". i.e. you add in a time period dummy to the within model and it works. That lines up with what I get using regular lm() on demeaned data. I haven't thought about that equivalence in a while so I can't remember if a time dummy is a part of it, but that's what I'm getting.
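A sketch of that check on the video's county crime data, keeping only two years so the equivalence should hold; the year values assume the 81-87 coding in plm's Crime data.
library(plm)
data("Crime", package = "plm")

two <- subset(Crime, year %in% c(81, 82))

within_tw <- plm(crmrte ~ prbarr, data = two, index = c("county", "year"),
                 model = "within", effect = "twoways")
fd <- plm(crmrte ~ prbarr, data = two, index = c("county", "year"),
          model = "fd")

coef(within_tw)  # with exactly two periods this slope should match fd's
coef(fd)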
Hi there-- do you have any suggestions for specific packages/code to use for visualizing panel data regressions in R? I'm having a hard time finding any guidance!
Depends what kind of visualization you want! jtools has some good regression visualizations. I've also been largely using fixest for fixed effects these days, and it has its own set of regression visualization functions
Hi... as always your delivery is unparalleled. I have one question: can one use a fixed effects model on pseudo-panel data (multiple cross-sectional datasets pooled together), where each time the data is collected from a new sample? Stata declares it as weakly balanced data. Any suggestion is appreciated.
I'd call that a repeated cross section. You can add some sorts of fixed effects to that (like time) but not others (like individual). If plm won't let you register it as a pdata.frame, check out my estimatr video
Where is this help file mentioned at 10:13?
For any R function you can get the help file with help(). So in this case, help(lag). You'll want to select the one titled "lag a time series"
@@NickHuntingtonKlein Ok, so if I want a second lag in my plm, then I type plm(y ~ lag(y, k = 2) + x...)
@@mulle171 correct, that will give you a two period lag. If you want both one and two period lags you'll need two separate lag functions though
Amazing :D Is there also a way to optimize the number of lags? I would have thought maybe up to the point where the variables are significant but the R^2 is as low as possible
@@mulle171 Typically you would try a bunch of different lag lengths and select the one that gives the best AIC or BIC statistic (Akaike/Bayesian Information Criterion). In general you pretty much never want to use statistical significance to make decisions about your model.
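Since plm objects don't ship a logLik/AIC method, one hedged way to compare lag lengths is a BIC-style criterion computed by hand from the Gaussian formula n*log(RSS/n) + k*log(n); the regressors below are illustrative and use the video's Crime data.
library(plm)
data("Crime", package = "plm")
cp <- pdata.frame(Crime, index = c("county", "year"))

bic_plm <- function(m) {
  res <- residuals(m)
  n   <- length(res)
  k   <- length(coef(m))
  n * log(sum(res^2) / n) + k * log(n)
}

m1 <- plm(crmrte ~ lag(prbarr, 1),                  data = cp, model = "within")
m2 <- plm(crmrte ~ lag(prbarr, 1) + lag(prbarr, 2), data = cp, model = "within")

# Caveat: the extra lag drops another year, so m1 and m2 use different samples;
# strictly, refit m1 on m2's estimation sample before comparing
c(one_lag = bic_plm(m1), two_lags = bic_plm(m2))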
Hi Nick, this video is amazing!
It is really saving me. I have some questions regarding using the within function though. I set my indexes within the plm() regression, and there I have specified both the individual fixed effect and the time fixed effect. Is that a correct way of doing it? Or am I losing information that way? I saw another video that includes dummies for the years and only specifies the individual fixed effect in the index. Does that yield the same results?
Lastly, if there are na's in the panel data, and I use na.action = na.omit, does this damage the results of the regression too badly?
Sorry if these are too many questions at once, I'm just trying to wrap my head around the subject.
Yes, specifying it within plm() is the way to go - this should produce the same results as including year as a set of dummies, except perhaps for any standard error adjustments, and in that case the version with both included (and 'twoway' fixed effects specified) would be preferred.
As for the NAs, if there are NAs in the data, plm will be dropping them anyway. How much damage this does to the results depends on how much data is missing and why. See the section on missing data in the Under the Rug chapter in my book nickchk.com/causalitybook.html
@@NickHuntingtonKlein Thank you so much Nick!
Hi Nick, thank you for the video. I have a question: what would be the code if I want to add year fixed effects and state fixed effects while using first differences as the model? I don't know how to do it.
Thank a lot!
I'd recommend checking out my video on fixest, that package makes this fairly easy
Very interesting. I am trying to determine whether I should use FE or RE for my logit regression in R. Will these model = "within" commands also work with the glm lines of code? Thanks!
That won't work. Fixed effects for logit work very differently than for linear models. Check out the bife package for fixed effects logit. I haven't used it myself but I hear good things.
@@NickHuntingtonKlein Thank you, I will look into how to use that package! I have been having trouble finding information on how to determine whether I should be using an FE model or an RE model for my logistic regression. Is the Hausman test the best tool to use when making this determination?
@@onigiriman Hausman is designed for linear models, so that wouldn't work, unless there's some special logit version. I'm afraid I don't know too much about logit random effects.
@@NickHuntingtonKlein I see, thank you so much for your replies. They are very helpful!
Professor, do you know how one could calculate fixed effects in cross-sectional data?
You have to have multiple observations of each of the things you have fixed effects of. So you can't have individual-level fixed effects in cross-sectional data.
You could have fixed effects for some higher-level grouping. So for example if you had a cross-sectional data set of people, you can't have fixed effects for people, but you can have fixed effects for, say, city, as long as multiple people in your data are in each city. To do this in R, just add +factor(city) to your regression model, where city is the variable you want to add fixed effects of.
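A minimal sketch of that factor() approach; the data frame people and the variables y, x, and city are hypothetical.
# Cross-sectional data: city-level fixed effects via a set of factor dummies
m <- lm(y ~ x + factor(city), data = people)
summary(m)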
Is anybody getting an issue with the index function?
I get the following error;
data.p=pdata.frame(data,index=c("Site","Year"))
Warning message:
In pdata.frame(data, index = c("Site", "Year")) :
duplicate couples (id-time) in resulting pdata.frame
to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany")
same problem. how did u fix it?
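A couple of ways to track down the duplicate id-time pairs behind that warning, using the object names from the snippet above.
library(plm)

# The check the warning itself suggests: counts above 1 are the duplicates
table(index(data.p), useNA = "ifany")

# Or flag the offending rows directly in base R
data[duplicated(data[, c("Site", "Year")]) |
     duplicated(data[, c("Site", "Year")], fromLast = TRUE), ]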
Hi, could you show how to do the Hausman-Taylor model?
See this page on how to use the pht function, or how to run H-T in plm: rdrr.io/cran/plm/man/pht.html
@@NickHuntingtonKlein thank you. I was trying to understand how they classified the variables with such a simplified command, but I found it explained better on another site.
Thanks for the video!!!!...I will give it a look ASAP!!!!...Cheers!!!
Hi, maybe a weird question, but can I email you with a specific question about which model I should use for my thesis? I am really struggling :)
Sure, if it's brief
Thank you!
I have purchase panel data for consumers over a period of time (1.5 years). Within this period an intervention of roughly +/- 0.2 years is implemented. The purchase data consists of purchases of products, and these products can belong to a product group (10 product groups in total).
I am interested in the effect of the intervention on the sales of the products in each category (and in comparing them?). Furthermore, I have some interaction variables with the intervention variable, so I can analyze what the effect of these variables is during such an intervention.
As I have panel data for each individual, it is most powerful to determine this effect at the individual level. However, I am not sure how to determine whether I should use a fixed, a random, or a mixed model.
Can someone give me suggestions/advice on how to approach this?
@@rutgervanbasten2159 In your case, since you're interested in how the effect varies over different predictors, I might recommend a mixed model, and specifically an HLM where you model the effect of the intervention as a function of your predictors.
Hi Nick, thanks so much for this video, it's incredibly helpful! Just a quick question: I want to lag an independent variable by more than just one period. Do you know how to code for lags of 2, 3, 4, etc. periods?
In plm, lag(x, 2) will lag 2 periods for example
Thank you very much Sir! Very helpful :)
Sir, could you tell me how to convert country names to a country ID in R for a panel analysis?
as.factor() will turn country name into a factor variable, which will be acceptable for use with plm (and other panel packages) as an id variable
Thank you Nick for this video. I would like to ask: what would be the Stata command equivalent to the R command for the fixed effects estimator with a lag? Or any other estimator, like random effects?
After using xtset to declare the panel structure, use xtreg to run fixed or random effects. Use L. in front of a variable to lag it.
@@NickHuntingtonKlein Could you please advise how to generate a one-period lag of the dependent variable (as you have it in your video)? I tried the slide command, the setDF command, etc., and nothing really works... an additional problem is that I have some missing values...
@@macro_finance in stata? Just use L. on the dependent variable.
Or if you're back in R, plm::lag should do it
@@NickHuntingtonKlein No, sorry, it is in R. Actually, whenever I run the fixed effect estimator test in R, I get the error message: Error in class(x)
@@macro_finance did you use pdata.frame before running fixed effects, like in the video?
Isn't this a better example of reverse causality rather than OVB? Police per capita is actually based on the crime rate, not vice versa.
Depends on how crime rate is defined. Old crime rates determine current police presence which determines current crime rate. So with the variables in the data, you can consider "lagged crime rate" to be an omitted variable. But yes your interpretation makes sense too.
Thank you very much!
what is a natural experiment
A natural experiment is when you find a source of close-to-exogenous (randomized) variation in the real world without a researcher running the experiment themselves. For example, the US Vietnam draft lottery was administered by randomly ordering birthdates. So your likelihood of being drafted into the military was random, and that could be used as a sort of experiment to look at the effects of being in the military on later outcomes.
Thanks
Thank you very much for this video; it's clear! I'm a biologist and I have to use these panel data models for my work. Do you know where I could find a clear explanation of the output of summary(plm), please? I have to report my results, but until now I don't fully understand them :s
Thank you very much!
Glad you like it! I'm afraid I don't know a good explainer for this. My best bet would be to just Google each term you're unfamiliar with
@@NickHuntingtonKlein ... Well ok :'(
thank you very much for your quick answer. I'm gonna get to work!
Hi! Thank you very much!! Very useful. One question: how would you work with unbalanced panel data?
Usually the same software tools work; the necessary adjustments for unbalanced panels are often automatic and included. Check the documentation of your command of choice though
@@NickHuntingtonKlein I think the problem is there are duplicate firm-year IDs, but I always eliminate the duplicates. I matched my outcome variable with a different firm identifier... could it be that?
Please show your computer screen when you are explaining rather than showing yourself.
No, it is actually better the way he is doing it, gives a lecture feel to the video and makes it more engaging. Thanks for the great content Nick!
@@juanpabloaguirre6390 It's actually very distracting and annoying. Guess we all learn differently