This is excellent. I'd love to get even more videos like these on complete panel data exploration and other tests that we can run on such data. Thank you for the video and your work!
You're the GOAT
Thank you very much for this clear explanation!
Thank you for the awesome explanation, very clear!
Hi, when I run a fixed-effects panel regression in Stata with dummy variables, Stata omits my variable from the results and shows "omitted". How can I avoid that?
This is really helpful! Thanks a lot!!
Excellent Video. Sebastian is there a way to visualize a Multiple Linear Regression with Fixed Effects? I want to do a regression curve in a plot that considers fixed effects.
You can plot the predicted values if you want, it just won't look as clean as with a simple regression. You might want to look at my video on "Creating and Editing Graphs in Stata."
Hello, how do I include country fixed effects in my multivariate logistic regression? Is it also the plm function? Looking forward to an answer.
Thank you Sebastian, I had a quick question on standard errors - do we get the same standard errors regardless of whether we use lm or plm?
Yes, they should be the same.
are you using both time and state fixed effects in your LSDV estimation?
Yes. You can see on line 6 of the code, both are present.
Hi! Thank u so much for the video, but what if I was given the dummy variables to use?
Each time I run the model with effects= "twoways", I get this error message: 'Error in solve.default(crossprod(WX, t.CP.WX.A1)) :
system is computationally singular: reciprocal condition number = 1.12264e-21'. Can you please shed some light?
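One possible cause of a "computationally singular" error (an assumption, since the poster's data is not shown) is a regressor that is perfectly collinear with the fixed effects, such as a variable that varies only by year when year effects are included. A minimal base R sketch with simulated data and hypothetical variable names; lm() does not stop with an error here but reports NA for one of the redundant terms, which makes the problem visible:

```r
set.seed(1)
d <- expand.grid(id = 1:5, year = 2001:2006)
d$x <- rnorm(nrow(d))
d$macro <- 0.5 * (d$year - 2000)   # varies only over time, not across units
d$y <- 1 + 2 * d$x + rnorm(nrow(d))

# Dummy variable regression with unit and year effects.
# macro is a linear combination of the intercept and the year dummies,
# so one of the redundant terms cannot be estimated and comes back as NA.
fit <- lm(y ~ x + macro + factor(id) + factor(year), data = d)
dropped <- names(coef(fit))[is.na(coef(fit))]
print(dropped)
```

If a term shows up as NA in a check like this, dropping the redundant variable is one way to proceed.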
Thank you for this.
Hello,
Thanks for your video. I currently have the task of analyzing company data on buyouts. I have financial information for multiple companies over different time periods. For example, company A has financial information for 2005-2010, company B for 2008-2010, etc. So both the number of years and the time span covered differ across companies. How should I treat data like this? Could you please give me a hint? Thanks in advance.
What you're describing is an unbalanced panel. You can still use the estimation methods in this video but there can be some additional pitfalls that are outside the scope of this video. Any good econometrics book would have some information on that.
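A quick way to see whether a panel is balanced is to count observations per unit and period. This is a small base R sketch using the two companies from the question; the data frame and column names are hypothetical:

```r
# Company A is observed 2005-2010, company B only 2008-2010
d <- data.frame(
  firm = c(rep("A", 6), rep("B", 3)),
  year = c(2005:2010, 2008:2010)
)

# Observations per firm
obs_per_firm <- table(d$firm)
print(obs_per_firm)

# A panel is balanced when every firm-year cell is observed exactly once
cell_counts <- table(d$firm, d$year)
is_balanced <- all(cell_counts == 1)
print(is_balanced)
```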
Hello Sebastian. I have data in which I regress the net income of firms that are members of a business group on control variables, net working capital (NWC), and the ultimate owner of the business group (firm, family, or government). If I need to see the interaction between NWC and every type of owner, I'd use NWC*type of owner. However, there will always be one interaction, the reference one, that won't be shown. I can't just look at the intercept because one of the regressors is type of owner (different from the one with the interaction). How could I solve this situation?
It sounds like you're doing this the right way. You will always have a base group with categorical data like you have. So, you just have to think carefully about the interpretation. The estimates you get for the other groups show you what's going on with each group, relative to the base. If you're doing interactions, you'll have the difference in slope between each group and the base.
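To make the "relative to the base" reading concrete, here is a hedged sketch on simulated data (all names and numbers are made up for illustration). The coefficient on nwc is the slope for the base group, and each interaction term is the difference in slope from that base, so adding them recovers every group's own slope:

```r
set.seed(2)
n <- 300
owner <- factor(sample(c("firm", "family", "government"), n, replace = TRUE),
                levels = c("firm", "family", "government"))  # "firm" is the base
nwc <- rnorm(n)
true_slope <- unname(c(firm = 1, family = 2, government = 0.5)[as.character(owner)])
income <- 3 + true_slope * nwc + rnorm(n, sd = 0.1)

fit <- lm(income ~ nwc * owner)
b <- coef(fit)

# Slope per owner type = base slope + that group's interaction coefficient
slopes <- c(
  firm       = b[["nwc"]],
  family     = b[["nwc"]] + b[["nwc:ownerfamily"]],
  government = b[["nwc"]] + b[["nwc:ownergovernment"]]
)
print(round(slopes, 2))
```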
Do we need to check assumptions like multicollinearity, autocorrelation, etc. before selecting the regression model, or should we first select the model, either fixed effects or random effects?
It's worth thinking about serial correlation of the errors when choosing between first differences and fixed effects. Multicollinearity is not important to this.
@@sebastianwaiecon Is there a requirement to check stationarity in panel data?
I am trying to simulate data for the fixed effects model. I am struggling with generating the time-invariant and subject-invariant effects, since they must be correlated with X. Is there a reference on how to simulate these two effects?
They don't necessarily have to be correlated with X, but I take your point. How I would do that is generate the fixed effects first, then generate X dependent on them.
@@sebastianwaiecon Thank you for your reply.
I was going to generate X in the following fashion:
X_{it} = N(0,1) + c_1 \mu_i + c_2 \lambda_t
Another suggestion is to do the following instead:
X_{it} = N(0,1), then \mu_i = N(0,1) + c_1 \bar{X}_{i.} and \lambda_t = N(0,1) + c_2 \bar{X}_{.t}
Based on the literature, is there a preferable method? Could you please suggest a paper that I could use as a reference?
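The first scheme (draw the fixed effects first, then make X depend on them) could be sketched roughly as follows in base R. The sample sizes, the weights c1 and c2, and the true coefficient beta are arbitrary assumptions chosen for illustration. The comparison at the end shows why the correlation matters: pooled OLS picks up the effects, while the dummy variable regression does not.

```r
set.seed(3)
n_id <- 50
n_t <- 10
d <- expand.grid(id = 1:n_id, t = 1:n_t)

mu <- rnorm(n_id)       # unit effects, drawn first
lambda <- rnorm(n_t)    # time effects, drawn first
c1 <- 0.8
c2 <- 0.5

# X depends on both effects, so it is correlated with them by construction
d$x <- rnorm(nrow(d)) + c1 * mu[d$id] + c2 * lambda[d$t]

beta <- 2
d$y <- beta * d$x + mu[d$id] + lambda[d$t] + rnorm(nrow(d))

# Pooled OLS ignores the effects; the dummy variable regression absorbs them
pooled <- coef(lm(y ~ x, data = d))[["x"]]
lsdv <- coef(lm(y ~ x + factor(id) + factor(t), data = d))[["x"]]
print(c(pooled = pooled, lsdv = lsdv))
```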
Can you please explain the dummy variable treatment in random effects? I want to understand the random effects across time. what should I do?
Which Stata version are you using? I am not able to use these commands.
This is R, not Stata. Check my channel for other videos on fixed effects using Stata.
Hello Sebastian, I have a dataset of IPO offerings over time (let's say 15 years). Due to its nature (a company can go public only once), every company appears only once in the dataset, and the number of observations may differ across years. Can I treat my dataset as panel data and apply fixed effects? Or should I rather treat it as a cross-section? If the latter is the case, can I apply fixed effects on years? I know there might be significant differences across years and I would like to control for them. Thanks a lot for your answer!
Interesting question. What you have is not a panel, but what we might call a pooled cross section. As long as you have multiple observations per year, you can add time dummy variables. Since companies appear only once, you couldn't do company-level FE.
Succinct and very helpful. Thank you very much :)
After adding the legal variable, the intercept was missing. How do you recover the intercept? I'm having the same problem, and every R code I use doesn't reveal the y-intercept.
The intercept does not have a useful meaning in the within estimator. If you're using fixed effects, the goal is establishing an unbiased estimate of a coefficient. You wouldn't need the intercept in such a situation, anyway. If you really do want it, then you can use the dummy variable method.
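The point about the intercept can be seen by doing the within transformation by hand. This is a sketch in base R on simulated data with hypothetical names, not the code from the video. Subtracting state means removes anything constant within a state, including the intercept, while the dummy variable method still reports one (it belongs to the base state):

```r
set.seed(4)
d <- expand.grid(state = 1:6, year = 1:8)
alpha <- c(5, 3, 8, 1, 6, 4)                 # state-specific intercepts
d$x <- rnorm(nrow(d)) + alpha[d$state]
d$y <- alpha[d$state] + 1.5 * d$x + rnorm(nrow(d))

# Within transformation: subtract each state's mean from y and x
d$y_w <- d$y - ave(d$y, d$state)
d$x_w <- d$x - ave(d$x, d$state)
within_fit <- lm(y_w ~ x_w - 1, data = d)    # no intercept left to estimate

# Dummy variable method: same slope, plus an intercept for the base state
lsdv_fit <- lm(y ~ x + factor(state), data = d)

print(coef(within_fit))
print(coef(lsdv_fit)[c("(Intercept)", "x")])
```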
I'm getting the following error when I use index="state" fixed effect: Error in solve.default(vcov(x)[names(coefs_wo_int), names(coefs_wo_int)], : Lapack routine dgesv: system is exactly singular: U[1,1] = 0
Thank you this really was helpful! I’ve got one question: Why is the intercept coefficient estimate missing in the plm function?
The intercept is not really meaningful in within estimation. The point is to get unbiased estimates of the coefficients of interest.
Any tips on how to run the regression if I want to analyze two groups of countries across 50 years, divided into five decades, and show the results separately for each country and each decade? That is, the coefficients for the independent variables' effect on the dependent variable, for each country (id) and each decade.
Thank you!
I have a question when you use fixed effect with plm.
For example, if I want to test the state fixed effect, would my index be only year? And if I want to test the year fixed effect as in your example, is my index state?
What happens if I want to test both, which would be my index?
Best!
Felipe
I don't know what you mean by "test" here. In my example, I have two types of fixed effects: time and state. It does not matter which one I set as the index and which one I set as dummies. I get the same result for the variable of interest in the end.
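A small check of this claim, done by hand in base R rather than with plm so that it is self-contained. It assumes a balanced simulated panel with hypothetical names: demeaning by state and adding year dummies gives the same coefficient on x as demeaning by year and adding state dummies.

```r
set.seed(5)
d <- expand.grid(state = 1:6, year = 1:8)    # balanced panel (assumption)
a_state <- rnorm(6, sd = 2)
a_year <- rnorm(8, sd = 2)
d$x <- rnorm(nrow(d)) + a_state[d$state] + a_year[d$year]
d$y <- 1.5 * d$x + a_state[d$state] + a_year[d$year] + rnorm(nrow(d))

# "State as index": demean within state, year enters as dummies
d$y_s <- d$y - ave(d$y, d$state)
d$x_s <- d$x - ave(d$x, d$state)
b_state_index <- coef(lm(y_s ~ x_s + factor(year), data = d))[["x_s"]]

# "Year as index": demean within year, state enters as dummies
d$y_t <- d$y - ave(d$y, d$year)
d$x_t <- d$x - ave(d$x, d$year)
b_year_index <- coef(lm(y_t ~ x_t + factor(state), data = d))[["x_t"]]

print(c(b_state_index, b_year_index))
```

The shortcut of leaving the dummies undemeaned relies on the panel being balanced; with an unbalanced panel the dummies would need the same demeaning as y and x.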
Thank you very much. I have two questions: 1) Can I use a lagged variable in a fixed effects panel model, either with lm(xxx) or plm()? For example, can I use mrate(-1)? And 2) suppose I am interested in testing whether all the "state" dummies are jointly significant. Should I use an F test?
Yes to both.
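As an illustration of both points, here is a base R sketch on simulated data. The variable name mrate is borrowed from the question; everything else is hypothetical. The lag is built within each state (the rows must be sorted by year within state), and the joint F test compares the model with and without the state dummies:

```r
set.seed(6)
d <- expand.grid(state = 1:5, year = 1:10)
d <- d[order(d$state, d$year), ]             # sort so lags follow time order
d$mrate <- rnorm(nrow(d))

# One-period lag within each state; the first year of each state has no lag
d$mrate_lag <- ave(d$mrate, d$state, FUN = function(v) c(NA, head(v, -1)))

alpha <- c(-4, -2, 0, 2, 4)                  # state effects
d$y <- alpha[d$state] + 1 * d$mrate_lag + rnorm(nrow(d))

# Use the same complete rows for both models so they are comparable
d2 <- d[complete.cases(d), ]
restricted <- lm(y ~ mrate_lag, data = d2)
unrestricted <- lm(y ~ mrate_lag + factor(state), data = d2)

# F test of the joint significance of the state dummies
ft <- anova(restricted, unrestricted)
p_value <- ft[["Pr(>F)"]][2]
print(ft)
```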
How do I get an intercept when doing a within model?
thank you so much for your explanation can you help me with this error:
Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor, :
empty model
I'm not aware of the function plm.fit.
Thank you! One question: Can I also run the within model if my dependent variable is a dummy?
Yes, you can. Just be aware you will now be estimating a linear probability model, with the interpretation and assumptions that come with that.
I'm using fixed effects for years 2010-2019, but for some reason only 2016-2019 are taken as factors. Any idea why this might be happening? Thanks.
Hard to say without seeing it. I recommend you open up your dataset and look for any missing data or other problems.
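One way to look for missing data, sketched in base R with a made-up dataset in which x is missing for 2010-2015: regression functions drop incomplete rows, so years without any complete rows never become factor levels in the output.

```r
# Hypothetical data: three observations per year, x missing before 2016
d <- data.frame(
  year = rep(2010:2019, each = 3),
  y = 1:30,
  x = c(rep(NA, 18), 1:12)
)

# Rows that would actually be used in a regression of y on x and year dummies
used <- d[complete.cases(d[, c("y", "x", "year")]), ]
years_used <- sort(unique(used$year))
print(years_used)

# Missing values per year, to locate the problem
missing_by_year <- tapply(is.na(d$x), d$year, sum)
print(missing_by_year)
```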
Hi Seb
What if I want to create a dummy variable that separates high-income countries from low-income countries in the WDI data?
How do I do this specifically for a model like GDP~INTEREST_RATE? I'm having a difficult time finding an income measure for this and creating dummies from it.
Someone please help out.
THANK YOU
Thank you for the video! How do I apply industry, country and week fixed effects in the same regression?
You can use factor() to make dummies for any of those things.
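A minimal sketch of that suggestion on simulated data (column names and values are hypothetical). Each factor() term expands into one dummy per level, minus a base level:

```r
set.seed(7)
n <- 400
d <- data.frame(
  industry = sample(c("auto", "bank", "food", "tech"), n, replace = TRUE),
  country = sample(c("DE", "FR", "US"), n, replace = TRUE),
  week = sample(1:10, n, replace = TRUE),
  x = rnorm(n)
)
d$y <- 0.7 * d$x +
  as.integer(factor(d$industry)) +           # industry effect
  2 * as.integer(factor(d$country)) +        # country effect
  0.1 * d$week +                             # week effect
  rnorm(n)

# Industry, country, and week fixed effects in the same regression
fit <- lm(y ~ x + factor(industry) + factor(country) + factor(week), data = d)
print(coef(fit)[["x"]])
```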
@@sebastianwaiecon Are the fixed effects always dummy variables or for example, we can consider the Hofstede cultural dimensions values as fixed effects as well? Because they are almost invariant during the time.
I have a very similar set of data, but I have a problem when estimating the fixed effects model using a dummy variable. I have data from two different years. The problem is that the summary should show a result for every year when using factor(year), but it always shows the result for only one year instead of both. I really don't know what is wrong. I did it exactly the same way you did. Unfortunately, it doesn't work either when I replace the number of the year with a word. Quite strange. Do you know what the problem could be?
Edit: Ah, I just realized the second row is always the first result. Your results started with 1970 but the first result directly under legal is 1971 and not 1970
Nothing is wrong. When you use dummy variables you must always have a base group to which everything else is compared. In my case, the years went from 1970 to 1983. Notice I don't have an estimate for 1970 because it is the base. The dummy variables start at 1971. So, if you only have two years, your first year will be the base and you only have a dummy for the second.
@@sebastianwaiecon Ah, I understand. Thank you
@@sebastianwaiecon I have a followup question to this. Can you somehow choose what the base group is and for which dummy variables you want the result?
Edit: I figured it out. It is actually quite simple, when you understand how it works. Still, thanks for the help. This helped me a lot in conducting the empirical research for my bachelor thesis.
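The commenter does not say how they did it; one common way to choose the base group in R is relevel(), sketched here on toy data with hypothetical names:

```r
set.seed(8)
d <- data.frame(year = rep(1970:1973, each = 5))
d$y <- rnorm(nrow(d))

# Default: the first level (1970) is the base, so dummies start at 1971
fit_default <- lm(y ~ factor(year), data = d)
print(names(coef(fit_default)))

# Make 1973 the base instead; now 1970 gets its own dummy
d$year_f <- relevel(factor(d$year), ref = "1973")
fit_releveled <- lm(y ~ year_f, data = d)
print(names(coef(fit_releveled)))
```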
This was very detailed, thank you! But a quick question: you showed how to get year-specific intercepts, and I assume that's the same way to get state-specific intercepts, but what if we're looking for a year-state specific intercept?
In both estimation methods, you can see the coefficient estimates for the year dummy variables.
Hi, in the within estimator - how can you see what effect the state has now that it can't be seen in the output?
You can't estimate those effects using the within estimator, since they get differenced out. You'll have to use the dummy variable method.
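A sketch of the dummy variable route for seeing the state effects, using simulated data and hypothetical names. Dropping the overall intercept with `- 1` is a choice made here so that every state gets its own estimated intercept instead of a difference from a base state:

```r
set.seed(9)
d <- expand.grid(state = 1:6, year = 1:20)
alpha <- c(5, 3, 8, 1, 6, 4)                 # true state effects
d$x <- rnorm(nrow(d)) + 0.3 * alpha[d$state]
d$y <- alpha[d$state] + 1.5 * d$x + rnorm(nrow(d), sd = 0.5)

# No overall intercept, so factor(state) yields one coefficient per state
fit <- lm(y ~ x + factor(state) - 1, data = d)
state_effects <- coef(fit)[grepl("^factor\\(state\\)", names(coef(fit)))]
print(round(state_effects, 2))
```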
In short, I control for state in the within function. What if I'm interested in states rather than years?
You could switch them around and use dummy variables for the states.
Hello Sebastian! Excellent video! I would like your opinion. I want to define my X matrix and I want to include state FE and birth cohort FE. I tried the factor but then my matrix is not numeric and i get errors in my analysis due to that, what would you recommend ? So, my X right now, looks like this : X
thank you!!!!!!!!!
Thanks, sir, for the nice video. Is there any good and up-to-date book for learning econometrics in R?
I generally don't recommend books for learning R, since there are many online resources. A good place to start would be Grant Farnsworth's Econometrics in R. There are many good books for general econometrics. My recommendations are: Introductory Econometrics by Wooldridge, Mastering 'Metrics by Angrist and Pischke, and Mostly Harmless Econometrics by Angrist and Pischke.
@@sebastianwaiecon Thanks so much for your kind and informative reply.
Thank you, you just saved my B.A.! Do you know if there is a way to exclude the factor variables from the regression output to clean it up for a paper?
Honestly, I usually just manually delete them. You might want to look into one of the packages for making regression tables, and they might have some options.
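If deleting by hand gets tedious, one option in base R is to filter the coefficient table by row name. This is a sketch on simulated data with hypothetical names, and it assumes the dummies were created with factor() in the formula:

```r
set.seed(10)
d <- expand.grid(state = 1:6, year = 1970:1975)
d$x <- rnorm(nrow(d))
d$y <- 2 * d$x + d$state + rnorm(nrow(d))

fit <- lm(y ~ x + factor(state) + factor(year), data = d)

# Coefficient table without the rows that belong to factor() dummies
tab <- coef(summary(fit))
keep <- !grepl("^factor\\(", rownames(tab))
clean_tab <- tab[keep, , drop = FALSE]
print(round(clean_tab, 3))
```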
@@sebastianwaiecon Haha, that was what I was considering too. Thank you very much for your reply!
Hi, thank you for the video, it is really helpful! I have a question about 08:27: how should we interpret the result (more specifically, the estimated coefficient of legal) if we do not include factor(year)?
The interpretation wouldn't change (although you would get a different result). However, if you leave the time dummies out, you're opening your regression up to a lot more endogeneity. I wouldn't recommend doing that.
Thanks for the video. I am pretty new to the topic and was wondering if it would make sense to do a fixed effects regression in either of the following cases:
1. When the independent variable is different for every observation
2. When the independent variable can only be in one of two groups (and therefore is a dummy)
Would appreciate your reply a lot, thank you!
Neither of those would prevent use of fixed effects on their own.
Thanks
Dear Professor, do you work privately?
I don't do consulting, if that's what you're asking.
@@sebastianwaiecon I need some help interpreting my results. Are you able to help?
I need help as soon as possible
Hello Sebastian, this video was very helpful for understanding the plm package and the "within" model. I need to use this model on a particularly huge dataset. I let the code run for hours, but it won't finish, so I don't know if maybe I am just using the code incorrectly.
I suspect that it is because there are lots of NAs in my data. Do you have any tips to help the plm function run faster?
The code I am using is:
reg_Y = plm(Y ~ Zt + as.factor(year), data= DF, index = "id", model = "within", na.action = na.omit)
DF has 10 million observations of 17 variables. There are 8 years in total. Some of the variables are log-differences, some are lags, and some are leads, so they leave NAs in different places.
I don't have any tips for you in particular. That is simply a huge dataset and it will take some time. You could try running it overnight.