Difference in Differences Estimation in Stata

  • Published on 22 Sep 2024
  • An introduction to implementing difference in differences regressions in Stata.

Comments • 214

  • @wm6698
    @wm6698 2 years ago +3

    Thank you so much for this! My question is, why didn't you run a complete regression model for house price? Why only a bivariate regression (i.e., the dependent variable and dummies)?

    • @sebastianwaiecon
      @sebastianwaiecon 2 years ago +8

      The purpose of this video is to demonstrate the basic technique of differences in differences estimation. You can certainly add controls to the basic model, but that is outside the scope of this video. You can search my channel for other videos on that.

  • @tdogz8932
    @tdogz8932 4 years ago +20

    I watched the video 2 years ago, and it helped me understand the DID model and Stata so that I could finish my graduation dissertation on time. After graduation, I published another paper using the same model. Thank you sooooooooo much!!!!!!!!

    • @fgghdfg8638
      @fgghdfg8638 3 years ago

      Can you help me do my diff in diff? Otherwise I will miss my year. We can talk about price.

    • @tdogz8932
      @tdogz8932 3 years ago

      @fgghdfg8638 I'm sorry that I only saw your message now. Hope you are doing fine with your dissertation :)

    • @amnashaukat7827
      @amnashaukat7827 3 years ago

      Can you help me in this technique?

    • @tamandanikuchanje260
      @tamandanikuchanje260 1 year ago

      Hello can you help me?

  • @davecullins1606
    @davecullins1606 4 years ago +5

    You saved my exam in the previous semester, and you're saving me in this semester as well!

  • @dargon1084
    @dargon1084 2 years ago +1

    I learnt more in this video than six 2-hour videos of my own uni's lectures

  • @dandellionsy6537
    @dandellionsy6537 3 years ago +2

    Thank you so much, I need it. My model might be more complicated but at least I can sense the idea of doing it. Awesome! Keep sharing more

  • @jonaFUN999
    @jonaFUN999 3 years ago

    I’m from Andover, England and I approve this video 👍

  • @sylvieyin5261
    @sylvieyin5261 3 years ago +1

    Thank you so much. This video makes my HW much easier.

  • @huangkiana6165
    @huangkiana6165 3 years ago +1

    THIS VIDEO SAVED ME FROM MY DEADLINE. THANK YOU SO MUCH *cry

  • @danielkrupah
    @danielkrupah 2 years ago +1

    Sir, do you provide a paid service for DD? I need a coach.

  • @simonazambelli5320
    @simonazambelli5320 2 years ago

    Thank you very much. You explained everything very clearly! Thanks

  • @simonazambelli5320
    @simonazambelli5320 1 year ago

    Love it! Thank you Sebastian!!

  • @timothyowuor9478
    @timothyowuor9478 3 years ago

    Nice tutorial on DID, thanks for saving me

  • @aymanissa6722
    @aymanissa6722 1 year ago

    Thanks for such an informative video.
    Could you please explain the DiD method using the diff command?

  • @VINAYKUMAR-kf6kd
    @VINAYKUMAR-kf6kd 11 months ago

    Thanks for the detailed info. What if my dependent variable is categorical, like anemia (yes / no)? Should I take the B coefficient or Exp(B)? And how do I cross-check in Excel?

  • @peterdastan1288
    @peterdastan1288 3 months ago

    Does that mean house prices near the garbage incinerator declined by an average of 21.13%?

  • @samknight7290
    @samknight7290 5 years ago +1

    Hi Sebastian, thank you very much for the video. Just wondering why you did not regress the other independent variables?

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago

      I wanted to keep things simple and focus just on the diff in diff technique. However, you can certainly add more variables to the regression as controls.

  • @MrLi1231
    @MrLi1231 4 years ago

    Hi Sebastian, thank you so much. Quick question. Is this dataset a panel, or two separate cross section datasets? I am assuming it is two separate cross section, right?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago +1

      You are correct. It's a pooled cross section. It would be very unlikely for the same houses to be on the market in both years.

    • @MrLi1231
      @MrLi1231 4 years ago

      @sebastianwaiecon Good point, and thank you so much for the quick reply! I am working on a thesis and realised that I was supposed to be doing DiD when I had been using a different methodology for the last few weeks. Your video is incredible. Big thanks from Australia!

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      Happy to help!

  • @sajidnoor9482
    @sajidnoor9482 3 years ago

    Thank you very much for explaining this very clearly.

  • @sireenkhalili8631
    @sireenkhalili8631 2 years ago

    Thank you so much for this video, it was really helpful!

  • @user-vb3do7hh9v
    @user-vb3do7hh9v 2 years ago

    Thanks, very well explained.
    Can I get this dummy data set, or can you please tell me where I can get such a dummy data set, for educational / learning purposes only?

  • @pneumascope
    @pneumascope 4 years ago

    I note that you have large Standard Errors in your findings. Does this in any way have an impact on the reliability of the findings or the interpretation of the overall impact of the program (or incinerator in this case)?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago +1

      It's all relative when it comes to standard errors. You could say an SE of about 8000, as it is here, is large, but the estimate is -20,000. Standard errors are always going to be big numbers when dealing with things like the prices of homes, which are in the tens of thousands. All other things being equal, larger standard errors mean less precision in the estimates. Here, we can still be quite confident the incinerator did decrease property values.

  • @yanvianna4737
    @yanvianna4737 2 years ago

    Could you demonstrate how it would work when there is more than one year before and after treatment?

  • @YY-ty5fx
    @YY-ty5fx 4 years ago +3

    What a clear explanation! I'm working on my own DD regression, and it really helped. The dependent variable 'price' covers prices before & after the treatment here, right?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      At the beginning of the video, I show the data browser and scroll through the data. You can see some observations are before and some are after.

  • @indagame9
    @indagame9 5 years ago

    Have you ever done a coefplot to test the treatment effect? When I do, I get a positive but not significant coefficient for my treat dummy variable. Would this mean that the treatment group actually saw an increase in fatalities (my y variable), or does it mean my treatment effect is positive? It is confusing because if I do a lowess plot on just the different states, fatalities drop over time. However, in the coefplot the graph is trending upwards.

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago

      I don't use coefplot, but I don't see why it would show results any different from your regression table.

  • @GHSHAH
    @GHSHAH 3 months ago

    How do I interpret the interaction term, and how do I check whether it is significant or not? Please reply fast.

  • @vojtechkolar5897
    @vojtechkolar5897 1 year ago

    Hey, I kind of understand diff-in-diff, but now I am dealing with a problem: what if the control group is at much larger levels than the treatment group? Let's say control before: 100, after: 200 (a 100% increase); treatment before: 5, after: 9. If I calculate the DID effect using the standard table, i.e. the difference between the differences, I get in this case 100 - 4 = 96! So the counterfactual state of the world for the treatment group would be 105?! That does not make sense, no? Even R with OLS gives me these results. What am I doing wrong? Thank you

    • @vojtechkolar5897
      @vojtechkolar5897 1 year ago

      I get that I can solve this problem by working with a log-level model. But isn't this always a problem with level-level diff in diff? What am I missing?

    • @sebastianwaiecon
      @sebastianwaiecon 1 year ago

      You can do diff in diff with the dependent variable in logs. That's no problem as long as you are careful with the percentage change interpretation.
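
The level-versus-log contrast in this thread can be checked by hand. Below is a minimal Stata sketch that uses only the four group means quoted in the question above; the scalar names are made up for illustration and are not from the video.

```stata
* Group means taken from the question above
* (control: 100 -> 200, treatment: 5 -> 9); scalar names are hypothetical
scalar c_before = 100
scalar c_after  = 200
scalar t_before = 5
scalar t_after  = 9

* Level-level DiD: change in treatment minus change in control
display "DiD in levels:  " (t_after - t_before) - (c_after - c_before)

* Log-level DiD: difference in log changes
scalar did_log = (ln(t_after) - ln(t_before)) - (ln(c_after) - ln(c_before))
display "DiD in logs:    " did_log

* Exact percentage effect implied by the log estimate
display "Exact % effect: " 100*(exp(did_log) - 1)
```

In levels the estimate is -96 (treatment change minus control change). In logs it is roughly -0.105, which corresponds to an effect of about -10%, because the treatment group grew by a factor of 1.8 while the control group grew by a factor of 2. This is the "careful percentage change interpretation" the reply refers to: the log coefficient is only an approximation of the percentage change.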

  • @shamsunnahar2294
    @shamsunnahar2294 4 years ago

    Clear presentation. Do you have any video on two-way cluster regression in Stata? If yes, please send me the link here.

  • @lVaNeSsA90
    @lVaNeSsA90 2 years ago

    What did you use the rprice and lrprice variables for?

  • @Bibirallie
    @Bibirallie 2 years ago

    What if there are multiple before and after variables, but not one conclusive before and after or year variable?

    • @sebastianwaiecon
      @sebastianwaiecon 2 years ago

      You may want to consider a fixed effects model instead.

  • @aung9211
    @aung9211 5 years ago +2

    Could you please show how to check the Equal Trend (Parallel Trend) assumption?

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago

      Unfortunately, we can't do it with this dataset, since we don't have extra data on either side of the change.

  • @Muhammadilyas-ij6jh
    @Muhammadilyas-ij6jh 3 years ago

    Hello sir! I have a question. It looks like you first run a simple OLS regression and then you compute the differences using the collapse command. I do not understand whether I should just use the OLS regression and report the differences estimator (-18824) as the DID estimator. Please guide me.

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago

      The number you gave estimates the difference between the treatment and control group before the treatment. We need to use the coefficient estimate for the interaction term to get the DID estimator.
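
To make the distinction in this reply concrete, here is a minimal sketch on simulated data (not the video's dataset). The variable names y81, nearinc and y81nearinc follow the ones mentioned in this thread; the simulated coefficients are arbitrary assumptions.

```stata
* Simulated pooled cross section; all numbers here are made up
clear
set seed 2024
set obs 400
generate y81     = _n > 200        // 1 = "after" period
generate nearinc = mod(_n, 2)      // 1 = treatment group
generate price   = 80000 + 20000*y81 - 18000*nearinc ///
                   - 12000*y81*nearinc + rnormal(0, 10000)

* Interaction term: equals 1 only for treated observations in the after period
generate y81nearinc = y81 * nearinc

regress price y81 nearinc y81nearinc

* nearinc:    gap between treatment and control BEFORE the treatment
* y81nearinc: the difference-in-differences estimate
display "Pre-treatment gap: " _b[nearinc]
display "DiD estimate:      " _b[y81nearinc]
```

The same model can also be written with factor-variable notation as `regress price i.y81##i.nearinc`, which builds the interaction automatically.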

  • @DX-nh8qc
    @DX-nh8qc 2 years ago

    May I know how to enter control covariates in Stata?

  • @MrAdhoul
    @MrAdhoul 2 years ago

    Great video, thank you.

  • @nazlcaneroglu4427
    @nazlcaneroglu4427 3 years ago +1

    Thank you for the video! Btw, is there any way that we can also see the trends of both groups by drawing a line graph in Stata? If the trends are the same before the treatment period, we should be able to see that, right?

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago

      Yes, you can use a twoway graph to do that.
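
A rough sketch of such a twoway graph follows. It assumes data with several years on each side of the treatment, which the video's two-period dataset does not have, so the example simulates its own data. The year range, the 1981 treatment date and all coefficients are assumptions for illustration.

```stata
* Simulated multi-year data; everything here is hypothetical
clear
set seed 2024
set obs 1000
generate year    = 1976 + mod(_n, 8)        // years 1976-1983
generate nearinc = mod(floor(_n/8), 2)      // 1 = treatment group
generate price   = 60000 + 3000*(year - 1976) - 15000*nearinc ///
                   - 10000*nearinc*(year >= 1981) + rnormal(0, 8000)

* One mean per group and year, then one line per group
collapse (mean) price, by(year nearinc)
twoway (line price year if nearinc == 0, sort) ///
       (line price year if nearinc == 1, sort), ///
       xline(1980.5) ///
       legend(order(1 "Control" 2 "Treatment")) ///
       ytitle("Mean price") xtitle("Year")
```

If the two lines move roughly in parallel to the left of the vertical line, that is informal support for the parallel trends assumption. The `///` line continuations work in a do-file, not when typed line by line in the Command window.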

  • @narlikar78
    @narlikar78 5 years ago +2

    Sir, another question in this regard, and I humbly request your attention at the earliest:
    Suppose I have a panel data set of 75 banks for 5 years (pre-merger) which have merged to become 30 banks (also for 5 years post-merger), and I have been able to establish, using all the standard panel data tests, viz. the F-test, BP-LM test, and Hausman (1978), that it is a fixed effects model.
    Given that my dependent variable is an index of inclusion (whose values lie between 0 and 1), while all other independent variables are metric data from the balance sheets of banks, with a time dummy (0 for pre-and post merger), can I run a panel Tobit model knowing well that it is a fixed effects model? I use Stata 14 for my econometric model testing. I have been told that panel Tobit can be used only with a random effects model.
    My problem is that my dependent variable has a truncated range. Please guide asap.

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago

      Mechanically, you can do it with dummy variables (see my fixed effects video). While I am not aware of a specific reason you should not do so, I don't know enough to definitively tell you one way or another.

  • @subhalakshmipaul4816
    @subhalakshmipaul4816 6 years ago +1

    Hello sir, please provide a video on reshaping from wide to long, particularly when the data set is very large, i.e., how to organise the variables before reshape. Please, sir.

  • @nathanmasak
    @nathanmasak 3 years ago

    That's really helpful. Thank you. Did you ever run the "event study" model? I can't find resources on this model. Your input would be appreciated.

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago +2

      I haven't, but a Google search turned up some resources. Best of luck with it.

  • @katieleck9955
    @katieleck9955 3 years ago

    Hi, many thanks for the video.
    When I try to do DID for my panel data set, Stata says that my treatment group dummy and DID variable are omitted due to collinearity. Do you know why this would be / how I could fix it?

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago

      Most likely what happened is that you made a mistake creating your dummy variables. Click the magnifying glass button to look at your data to check what went wrong.

  • @consultingfaqs
    @consultingfaqs 5 years ago

    Could you please tell me: if we are using, for example, DHS data, which has data on the demographics and health of a nation, but we want to see the effect of an external policy like NREGA on the labour force participation of females (the data for which is available in DHS), should we merge NREGA data with DHS data and then apply matching techniques to determine treatment and control groups? If not, how should we see the impact?
    Thanks

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago

      This question is specific to data I don't have any experience with and is therefore outside the scope of this video.

  • @sabrinanasir5844
    @sabrinanasir5844 4 years ago

    Thanks for the video! If you don't have an ideal counterfactual control group (i.e. there are some slight differences between the treatment and control groups in the pre-treatment period), can you add other independent variables to the diff in diff when running the regression in Stata?

  • @KIMKIM-bt6hr
    @KIMKIM-bt6hr 3 years ago

    Good morning. I am a student working with the DID model. Thanks to your DID explanation, I was able to complete my assignment smoothly. But yesterday, the professor asked, 'Why was the control variable excluded?' and I couldn't actually answer. After class, the professor gave me a separate assignment: put the control variable in and analyze it again. I want to use Stata again. But how do I add a control variable to the regression in this video? Could you please advise which code to enter?

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago

      You can simply add control variables to the DID regression, if you want.

    • @KIMKIM-bt6hr
      @KIMKIM-bt6hr 3 years ago

      @sebastianwaiecon I'm a Stata beginner, so can you explain a little bit more about where to put this part?
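
As a rough illustration of where the controls go: they are simply listed after the DID terms in the same regress command. The sketch below uses simulated data, and the control names rooms and age are assumptions chosen for the example; with real data you would list whichever control variables your dataset actually contains.

```stata
* Simulated data; rooms and age are hypothetical control variables
clear
set seed 2024
set obs 400
generate y81     = _n > 200
generate nearinc = mod(_n, 2)
generate rooms   = 4 + floor(4*runiform())
generate age     = floor(40*runiform())
generate price   = 50000 + 20000*y81 - 18000*nearinc - 12000*y81*nearinc ///
                   + 6000*rooms - 300*age + rnormal(0, 10000)
generate y81nearinc = y81 * nearinc

* Basic DID regression
regress price y81 nearinc y81nearinc

* Same regression with controls appended to the variable list
regress price y81 nearinc y81nearinc rooms age
```

The coefficient on y81nearinc is still the DID estimate in both regressions; the second one just holds the listed controls fixed.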

  • @2thedata
    @2thedata 3 years ago

    Thank you so much! Your video helps me! :D

  • @nazda2007
    @nazda2007 4 years ago

    Dear Sebastian, I am working on my dissertation using DiD, and I included additional control variables in my model. However, the model suffers from heteroskedasticity and autocorrelation. How do I deal with them?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      You might want to look at my videos on heteroscedasticity.

  • @BrickTemplar
    @BrickTemplar 4 years ago

    Hi Sebastian, I wonder what we have to do if the effect is spread over the years, say, treatment was implemented in one year for the firms in one industry and the next year for another?
    Say, over three decades, the U.S. authorities have gradually cut import tariffs on a large variety of goods and services. CUT=1 if this happened, 0 otherwise.
    The equation will have the form
    Investment=b1*tariff CUT + b2*lagged controls + industry FE etc, cluster by industry-year.
    I do not understand what I have to add to a simple regression to make it diff-in-diffs in this case...
    Dummy CUT interacted with what?

    • @BrickTemplar
      @BrickTemplar 4 years ago

      Or, as in your example, suppose the incinerator had been installed for one neighborhood in 1981, for another in 1985 etc., for another in 2005... The y81 time dummy won't work anymore, so what do we have to interact?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      You'll need a dummy variable that "turns on" from a 0 to a 1 once the treatment is active. You won't be able to do this by building an interaction term, as it's more complex than that now. I'm not sure there's a better way than putting in the 1s on a case by case basis.
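
One possible way to build a dummy that "turns on" is sketched below, under the assumption that the data contain (or can be given) a variable recording each unit's treatment start year. The names id, treat_year and treated are hypothetical, the data are simulated, and the final regression with unit and year dummies is just one common specification, not something shown in the video.

```stata
* Simulated panel of 5 units over 1980-2009; all names and values are made up
clear
set seed 2024
set obs 5
generate id = _n

* Hypothetical treatment start year per unit; missing (.) = never treated
generate treat_year = .
replace  treat_year = 1981 if id == 1
replace  treat_year = 1985 if id == 2
replace  treat_year = 2005 if id == 3

* 30 yearly observations per unit
expand 30
bysort id: generate year = 1979 + _n

* Dummy switches from 0 to 1 once treatment is active for that unit
generate treated = (year >= treat_year) & !missing(treat_year)

generate y = 10 + 0.1*(year - 1980) + id + 2*treated + rnormal()

* Unit and year dummies via factor-variable notation
regress y treated i.id i.year
```

Entering the start year once per unit and deriving the dummy from it avoids typing the 1s observation by observation.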

  • @pudurvivek
    @pudurvivek 5 years ago

    Do we need to check the p values of the variables before understanding the effect of the interaction variable on the dependent variable?

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago

      If you want to know about p-values, I suggest taking a look at my video on hypothesis testing: th-cam.com/video/lhoqZjQHHjk/w-d-xo.html

  • @myleswhitmore8803
    @myleswhitmore8803 3 years ago

    Hi SebastianWaiEcon, I am a student at Morehouse College, and I really enjoyed watching your video. I need help running a diff in diff regression for my research paper. For context, I am using Stata to analyze NAFTA's impact on GDP and trade flow for its member nations. To facilitate this process, I will be running an individual diff in diff analysis for each country. My dummy variable will be years before 1994 (when NAFTA was signed) and after 1994. My DV will be GDP growth. And my extra variables will be human capital, agriculture industry growth percentage, manufacturing growth percentage, and other variables. However, I struggle with the Stata platform and would like your advice to ensure this regression runs smoothly.

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago

      The most important thing for diff in diff is to identify a control and treatment group. In your case, that might be countries that were part of NAFTA and countries that were not.

    • @amnashaukat7827
      @amnashaukat7827 3 years ago

      @sebastianwaiecon Enjoying your video, but I need help. I have 25 countries and data from 1960-2020. How can I specify only one time, 2012, while comparing it over 2010-2016? Please help me.

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago

      @amnashaukat7827 A fixed effects model may be more appropriate: th-cam.com/video/H95BHswbT3w/w-d-xo.html&ab_channel=SebastianWaiEcon

  • @FannysVista
    @FannysVista 4 years ago

    Hi Sebastian, your video helps me a lot to understand DID estimation. I have a follow-up question. Is it possible to estimate difference in differences for survey data analysis?
    I tried it on my survey data. However, the DID from the regression and the DID from the manual collapse calculations show different results.

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago +1

      The actual source of the data shouldn't matter here, whether it's from a survey or not.

  • @sarahfranz5748
    @sarahfranz5748 3 years ago

    Thanks for this video! One question: how would you proceed if you are comparing the difference between control and treated group across a 4 week period, testing whether the difference is bigger in the beginning and decreases?

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago

      You can interact a time variable (linear trend, or quadratic, etc.) with a treatment dummy variable.
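
A minimal sketch of that interaction, assuming a linear trend over four weeks. The variable names treat and week and the simulated effect sizes are made up for illustration.

```stata
* Simulated 4-week data; names and coefficients are hypothetical
clear
set seed 2024
set obs 800
generate treat = mod(_n, 2)                  // 1 = treated group
generate week  = 1 + mod(floor(_n/2), 4)     // weeks 1-4
generate y     = 10 + 0.5*week + 4*treat - 0.8*treat*week + rnormal()

* i. marks treat as a dummy, c. marks week as continuous,
* and ## adds both main effects plus their interaction
regress y i.treat##c.week
```

In the output, the coefficient on the treat-by-week interaction shows how the gap between the groups changes per week. A negative value means the difference starts out larger and shrinks over time.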

  • @TommasoSchembri
    @TommasoSchembri 3 years ago

    Hi, thanks for the clear explanation. Is it possible to do a DID by percentage level, so that I come up with a % increase/decrease in the treatment group? Thanks!!

    • @sebastianwaiecon
      @sebastianwaiecon 2 years ago

      Yes, you can take the natural log of the dependent variable to get an approximation of a percentage change.

  • @zdavirandimuhammad1515
    @zdavirandimuhammad1515 2 years ago

    Could you explain Propensity Score Matching using Stata?

  • @Maria-ny2mj
    @Maria-ny2mj 5 years ago +1

    Hi! Nice video, thank you very much! I have a question: what do you do if there is time-varying treatment? In your example it would be… Imagine there is a neighbourhood (1) where the incinerator got built in 81, but in another neighbourhood (2) in 82, for example… Would it be reg price y81 y82 nearincneighbourdhood1 nearincneighorhood2 y81* nearincneighbourdhood1 y82*nearincneighorhood2? Something like that?

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago

      You could also consider including interactions between y81 and neighborhood 2 and y82 and neighborhood 1. Once we get into more than 2 periods you should also be thinking of this as a fixed effects model. You may find my video on that helpful.

    • @Maria-ny2mj
      @Maria-ny2mj 5 years ago

      @sebastianwaiecon Thank you very much! I will take a look at the video!

  •  4 years ago

    Hey, thanks. How do you do it with multiple time points?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      You can still make a variable indicating before and after treatment. You might also want to think about a fixed effects regression.

  • @ssjvegeto4ever
    @ssjvegeto4ever 3 years ago

    Hi Sebastian, thanks a lot for the clean explanation! Could you tell me why you were including post-treatment levels of your covariates? Aren't they endogenous, and don't they thus result in bias? Thanks in advance!

    • @jackgandhi
      @jackgandhi 3 years ago

      I don't understand the question. What I showed here is the most basic version of diff in diff, with the bare minimum amount of variables needed. Even if I had added more variables, that would not have created any bias -- bias happens because you left variables out.

    • @ssjvegeto4ever
      @ssjvegeto4ever 3 years ago

      @jackgandhi Thank you for the fast reply! Sorry, I meant the covariate data structure. I recently did a DiD setup making use of this video's data structure, and got the criticism that, since I included covariates with a time index for the post-treatment period in the regression, these were endogenous and would thus impose bias.

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago

      @ssjvegeto4ever What you are describing is a common and valid criticism of time series analysis. The purpose of diff in diff is, if the data allows, solving this problem using a control and treatment group. The "post" dummy (y81 in the video) is not enough to establish a causal relationship. This is why we have the interaction term (y81nearinc in the video). In this video, y81 controls for effects over time that are constant across groups while nearinc controls for group effects that are constant over time. The interaction pulls out the estimated effect. This is not to say this method is perfect as there could still be endogeneity due to variables that are constant neither across groups nor across time, so you still may need to think about controls. The diff in diff method is just one tool in the analyst's toolbox.

  • @keith-ole
    @keith-ole 4 years ago

    Phenomenal explanation, thank you.
    If you wanted to include more prior years and a few years after, would you have to make a dummy variable for each year?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      You don't have to do that, but you might want to look into fixed effects models for that kind of thing.

  • @IamPaste
    @IamPaste 4 years ago

    How would you do it for 1978?

  • @jamesleleji9470
    @jamesleleji9470 2 years ago

    How can you do DID using SPSS or R programming? Thanks

    • @sebastianwaiecon
      @sebastianwaiecon 2 years ago

      The idea will be the same -- create dummy variables for treatment and time and an interaction, then put those in a regression.

  • @usmannasim618
    @usmannasim618 4 years ago

    Hi Sebastian,
    Can you also please describe the coding to be used when we have a dummy variable for 'treatment' and 'control' groups?
    Thanks,

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      I did that in the video. The variable nearinc is the dummy variable for the treatment group.

  • @md.arrahman7125
    @md.arrahman7125 4 years ago

    Dr., thanks for your excellent explanation. Are these steps the same for panel data? I am planning to run DID for a panel (2000-2019). Looking forward to your kind suggestion.

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago +1

      I have some other videos on general panel data methods.

  • @AAH123-v4x
    @AAH123-v4x 4 years ago

    A very useful video. Thank you so much. I have a question. I created 3 columns similar to y81, nearinc and y81nric. I am running a two-part logit and GLM model. Since the value of y81 and the other two is either 0 or 1, should we put i.y81 etc.? I mean, before a binary variable, aren't we supposed to put i.?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago +1

      For a binary variable, you will get the same result just putting the variable in or using the i. structure. If you have a categorical variable with more than two possible values, then you need to use i.
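
The point about the i. prefix can be checked directly. In this sketch on simulated data, the first two regressions should report identical coefficients, while the third shows a case where i. is needed. The three-valued variable nbh is a hypothetical categorical variable added for the example.

```stata
* Simulated data; nbh is a made-up categorical variable with values 1, 2, 3
clear
set seed 2024
set obs 400
generate y81        = _n > 200
generate nearinc    = mod(_n, 2)
generate y81nearinc = y81 * nearinc
generate nbh        = 1 + mod(floor(_n/2), 3)
generate price      = 80000 + 20000*y81 - 18000*nearinc ///
                      - 12000*y81nearinc + 2000*nbh + rnormal(0, 10000)

* Binary dummies: with and without i. give the same estimates
regress price y81 nearinc y81nearinc
regress price i.y81 i.nearinc i.y81nearinc

* More than two categories: i. creates one dummy per category (minus a base)
regress price y81 nearinc y81nearinc i.nbh
```

Without the i. prefix, nbh would be treated as a single continuous variable rather than as a set of category dummies.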

    • @AAH123-v4x
      @AAH123-v4x 4 years ago

      @sebastianwaiecon Thanks a lot!!

  • @abmakwara8010
    @abmakwara8010 4 years ago

    Hi Sebastian, thank you for the great content, very informative. However, I have a question. My research is looking at the impact of a bank regulation implemented in 2014, and this regulation only affects bigger banks within my population: banks with population of 25b and over. I have gathered panel data from 2010 - 2019. I intend to use performance ratios as dependent variables and variables that determine profitability as control variables. I am using DID in an FE model in Gretl to run the regression. I have generated some dummy variables: a time dummy variable for before and after, a group dummy variable with those impacted by the regulation as the treatment group and the rest as control, and a regulatory dummy, which I am not sure is necessary.
    Three questions:
    1. Is this research feasible in terms of parallel trend?
    2. Will I need to interact all other variables in my model with time, or does the interaction only need to be between the time and group dummies? If yes, do I need to add the group dummy to every interaction I do?
    3. Is there a need to add individual time effects, since I am running the regression in an FE model?
    Many thanks in advance

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      1. I have no idea, but it sounds like you have enough data to make that determination yourself.
      2. You should think about this on a case-by-case basis. Think about what you're trying to accomplish and whether or not interactions would help with that.
      3. Time dummy variables are an important component in FE. I have some videos on FE and panel data on my channel.

  • @johnkaimenyi9292
    @johnkaimenyi9292 2 years ago

    Hello, is DID regression possible in STATA 15.0?

    • @sebastianwaiecon
      @sebastianwaiecon 2 years ago +1

      I'm not aware of any changes in recent versions of Stata that would change anything in this video.

  • @trobberkah3425
    @trobberkah3425 2 years ago

    Hi, I'm doing a DiD for my thesis, but I'm dealing with panel data. Do you know what I should do differently compared to the regression you show in this video? I noticed that there is a Stata command for a fixed effects DiD regression, for example.

    • @nunosilva1563
      @nunosilva1563 1 year ago

      I face exactly the same situation, can you please reply to the above question?

  • @monikasrivastava5565
    @monikasrivastava5565 3 years ago

    What are the steps to generate the result? Why have you not shown them? Please, I really need to know how to do it.

  • @popi20101
    @popi20101 3 years ago

    What if we add more than 1 control variable? not only nearinc.

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago +1

      You are always allowed to add controls if you think the DD method did not eliminate endogeneity.

    • @popi20101
      @popi20101 2 years ago

      And if we have a 5-year period, 2007 to 2011, and the policy is announced in 2009, how do we set the year variable?

  • @zdavirandimuhammad1515
    @zdavirandimuhammad1515 2 years ago

    Hi, thank you for the explanation. Can we request the data so we can also practice?

    • @sebastianwaiecon
      @sebastianwaiecon 2 years ago

      This is the dataset KIELMC.dta from the Wooldridge econometrics textbook. It is widely available online.

    • @zdavirandimuhammad1515
      @zdavirandimuhammad1515 2 years ago

      @sebastianwaiecon Thank you, also for kindly replying to my message. God bless. Stay safe, stay healthy.

  • @amartilianom
    @amartilianom 6 years ago

    Hello, if you want to add control variables or covariates, do you add them normally at the regression? Thanks for the information!

    • @sebastianwaiecon
      @sebastianwaiecon 6 years ago +1

      Yes, I forgot to mention that in the video. You can add controls to the diff in diff regression as in any other.

    • @amartilianom
      @amartilianom 6 years ago

      Thanks. Another question: is it not necessary to tell Stata we have panel data when we have already created the dummy variables that differentiate the control and treatment groups, and the pre and post periods? No need to run a fixed effects regression too, I guess. I'm just learning about the subject :)

    • @sebastianwaiecon
      @sebastianwaiecon 6 years ago

      For a simple DD like this, you don't need to use xtset, if that's what you're asking. You can actually think of a DD as a very simple sort of FE model that only has two groups and two periods. If you want to see more about FE, I also have a video on it.

    • @amartilianom
      @amartilianom 6 years ago

      I really appreciate your responses. Keep helping us!

  • @achintyawidhi2299
    @achintyawidhi2299 4 years ago

    Sir, what is the difference between xtreg and reg? If I use data from the years 2007 and 2014, should I use reg or xtreg? My dataset doesn't have the same units across 2007 and 2014.

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      Reg is the basic regression command and xtreg is used for panel data methods such as within estimation and random effects. If you don't have the same units across years (pooled cross section), then you probably want to use reg.

  • @adriabc7614
    @adriabc7614 4 years ago

    Hi Sebastian, very useful video at a great pace ;). In this example you compare the differences in price, how would you interpret the results if the variable is categorical (eg. completed studies, married, etc). Many thanks!

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      You can only do this if the categorical variable is binary (eg. married and not married). Assign a 1 to married and 0 to unmarried. We now have a linear probability model (see my video on binary choice models). The interpretation of the diff-in-diff is now the difference in probability of being married.
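
A minimal sketch of the linear probability version, on simulated data. The variable names post, treat and married and the probabilities used to simulate the outcome are assumptions for illustration. The vce(robust) option is added here because linear probability models are heteroskedastic by construction; that is this sketch's choice, not something stated in the reply.

```stata
* Simulated binary outcome; names and probabilities are hypothetical
clear
set seed 2024
set obs 2000
generate post    = _n > 1000
generate treat   = mod(_n, 2)

* married = 1 with a probability that depends on period and group
generate married = runiform() < (0.40 + 0.05*post + 0.10*treat + 0.15*post*treat)

* Linear probability DID with heteroskedasticity-robust standard errors
regress married i.post##i.treat, vce(robust)
```

The coefficient on the interaction is read as a change in the probability of being married, so an estimate of 0.15 would mean a 15 percentage point increase attributed to the treatment.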

  • @harunasanibk2662
    @harunasanibk2662 5 years ago

    Sir, how am I supposed to run the data for both "treatment and control" groups?
    Should I run the data separately? Please, what command should I use?

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago

      I don't know what you mean by "run the data" here.

  • @mertbakirci6030
    @mertbakirci6030 4 years ago

    Hey, thanks for the great content here. QUESTION: How can I test for the "common trend" assumption of the DiD-estimator in Stata or in general? Thanks in advance!

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago +2

      Usually, this is done informally by comparing the dependent variable movement across groups in an extended period of time before and after the treatment goes into effect. You need a lot more data than I have in this example.

    • @mertbakirci6030
      @mertbakirci6030 4 years ago

      @sebastianwaiecon Thank you!

  • @ariagalit1875
    @ariagalit1875 5 years ago

    Hi. My data ranges from 2009 to 2018, and I have both treatment and comparison groups. I just want to ask whether DID, like what you did in the video, is applicable. I am not very familiar with the method or Stata, actually.

    • @ariagalit1875
      @ariagalit1875 5 years ago

      And how come the interaction variable is all zero?

    • @sebastianwaiecon
      @sebastianwaiecon  5 years ago +1

      You can do DID if you set up a dummy variable to indicate when the treatment went into effect. Once this is in place, you can create the interaction term.

    • @ariagalit1875
      @ariagalit1875 5 years ago

      Thanks much for your reply sir

  • @nip5554
    @nip5554 5 years ago

    Hi, what if I want to control for additional variables?
    Then the command "collapse (mean) y, by(after treatment)" is not sufficient. Please tell me what to do to control for variables.

    • @sebastianwaiecon
      @sebastianwaiecon  5 years ago +1

      You can add control variables, but you'll have to run a regression rather than using the collapse method.
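
      As a rough sketch of what that regression could look like with the KIELMC data: the outcome rprice (real price) and the choice of controls here are assumptions for illustration, so substitute the price variable and controls you are actually using.

      ```stata
      * File path is assumed; point this at your copy of the data
      use KIELMC.dta, clear

      * Basic diff-in-diff terms plus house characteristics as controls
      reg rprice y81 nearinc y81nrinc age agesq rooms baths area land, vce(robust)

      * The coefficient on y81nrinc is still the diff-in-diff estimate,
      * now holding the listed characteristics fixed
      ```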

    • @nip5554
      @nip5554 5 years ago

      @@sebastianwaiecon Thanks :)

  • @Diana-mo6mg
    @Diana-mo6mg 4 years ago

    If you used logprice instead of price, would the coefficient be different?

    • @sebastianwaiecon
      @sebastianwaiecon  4 years ago

      Yes, it would. See my video on natural logarithms for how that would work.
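
      A short sketch of the log version, assuming the KIELMC variable names and a new variable called logprice that is created here purely for illustration:

      ```stata
      * logprice is a name chosen for this sketch
      gen logprice = ln(price)

      reg logprice y81 nearinc y81nrinc

      * With a logged outcome, 100 times the coefficient on y81nrinc is
      * approximately the percentage change in price, not a dollar amount
      ```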

  • @himaep_agungkrisyana1013
    @himaep_agungkrisyana1013 2 years ago

    Can I get the Stata do-file for this?

  • @nirobkothopokothon
    @nirobkothopokothon 6 years ago

    Hi, I would like to know whether difference-in-differences analysis is suitable for a small data set that contains only 2 years of data and has only 168 samples (84 control and 84 treatment)? Thank you so much.

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago +1

      I don't see any reason why not. However, with only 2 years of data, you have no idea of how the outcomes have been trending over time, and you may have a hard time justifying your counterfactual.

    • @nirobkothopokothon
      @nirobkothopokothon 6 years ago

      thank you so much.

  • @cherrykhalil7481
    @cherrykhalil7481 6 years ago

    Sebastian, thank you so much for this video. Does the data have to be in long shape? Is there a way to run the diff in diff regression on a wide dataset? Thank you.

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago +1

      Yes, you can do it. Generate a new variable for the difference, then regress the difference on a dummy variable for the treatment group.
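
      A minimal sketch for a wide dataset, assuming one row per unit and hypothetical variables y_pre, y_post (the outcome in each period), and treat (0/1):

      ```stata
      * Hypothetical wide layout: one row per unit
      gen dy = y_post - y_pre

      * Regress the change on the treatment-group dummy
      reg dy treat

      * The constant is the control group's mean change;
      * the coefficient on treat is the diff-in-diff estimate
      ```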

    • @cherrykhalil7481
      @cherrykhalil7481 6 years ago

      Thank you very much! What about the interaction dummy between year and dummy, given that my dataset is a balanced panel of 400 firms observed in both 2008 and 2013? Thanks again

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago

      With the wide dataset, there is no interaction term, as you've already built it in by taking the difference ahead of time.

    • @jargodm
      @jargodm 5 years ago

      @@sebastianwaiecon Just to follow up on this, if you do have the same units before and after, the paired difference test gives a different result than the regression you discuss in the video: Y = b1 + b2*treat + b3*time + b4*treat*time, which assumes independent samples, does it not?

    • @sebastianwaiecon
      @sebastianwaiecon  5 years ago

      I believe the estimate would be the same, but the standard error would be different.

  • @raulfotso4032
    @raulfotso4032 3 years ago

    Good morning, all. Please, I want to know how to do a Fairlie decomposition. I am a student lecturer at the University of Douala.

  • @dalemantey6028
    @dalemantey6028 6 years ago

    Can you do a DD with logistic regression? Say I have a dichotomous outcome - for this example, it could be something like house sold (yes/no). Would it be similar Stata code, just changing "regress" to "logistic", or are there considerations within DD that might limit the statistical validity of that sort of analysis?

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago

      The principles which drive DD -- controlling for time trends and cross sectional trends -- are still useful for logits (and probits also). However, you need to be careful about the coefficient interpretations, as it's not as clean as in the least squares DD. I would suggest looking at my video on binary choice models for details.

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago +1

      For the code, yes, you can change "regress" to "logit" and it will run.
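
      A hedged sketch of the logit version, assuming hypothetical variables sold (0/1 outcome), treat, and post. Factor-variable notation is used here so that margins can report predicted probabilities, which are easier to read than log-odds coefficients:

      ```stata
      * Hypothetical names: sold, treat, post (all coded 0/1)
      logit sold i.treat##i.post, vce(robust)

      * Predicted probability of sold == 1 in each of the four
      * group-by-period cells; the interaction coefficient above
      * is in log-odds, not in probability units
      margins treat#post
      ```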

    • @dalemantey6028
      @dalemantey6028 6 years ago

      Thank you!

  • @mathewchandy9588
    @mathewchandy9588 4 years ago

    Is heteroscedasticity ever an issue when you conduct a difference-in-difference analysis?

    • @sebastianwaiecon
      @sebastianwaiecon  4 years ago

      Yes, it is. In this example, you could imagine there might be a difference in the variance of prices with and without the incinerator.

    • @mathewchandy9588
      @mathewchandy9588 4 years ago

      @@sebastianwaiecon Then to solve this, would you add the vce(robust) option at the end of your regression?

    • @sebastianwaiecon
      @sebastianwaiecon  4 years ago

      I can't think of a theoretical reason why you couldn't do that. To be honest, I think most people just use robust all the time and don't really think about it.
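
      For reference, a sketch of the robust version using the KIELMC variable names (rprice as the outcome is an assumption; use whichever price variable you ran the original regression on):

      ```stata
      * Same diff-in-diff regression with heteroskedasticity-robust
      * standard errors; the coefficients are unchanged, only the
      * standard errors and test statistics differ
      reg rprice y81 nearinc y81nrinc, vce(robust)
      ```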

  • @vaishalisharma6519
    @vaishalisharma6519 5 years ago

    Hello sir. How do I create the dummy for nearinc? What is the actual command?

    • @sebastianwaiecon
      @sebastianwaiecon  5 years ago

      nearinc indicates whether the house is within 3 miles of the incinerator. There is a variable called "dist" which is the distance from the incinerator in feet. Since 3 miles is 15,840 feet, to create the dummy we would use the command: gen nearinc = (dist <= 15840)

  • @ashishstat
    @ashishstat 3 years ago

    Can I have the link to the data set used in this video?

    • @sebastianwaiecon
      @sebastianwaiecon  3 years ago +1

      It's KIELMC.dta, which comes with the Wooldridge econometrics textbook. You should be able to find it online.

  • @oluwaseunoginni9828
    @oluwaseunoginni9828 5 years ago

    Please, how did you generate the interaction variable?

    • @sebastianwaiecon
      @sebastianwaiecon  5 years ago

      Create an interaction term by multiplying the two variables you are interacting.
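
      A small sketch with the KIELMC variable names. The dataset already ships with y81nrinc, so the new variable gets a different, made-up name (inter) to avoid a "variable already defined" error:

      ```stata
      * inter is a name chosen for this sketch
      gen inter = y81 * nearinc

      * Sanity check against the interaction that comes with the data
      assert inter == y81nrinc

      * Alternative: let Stata build the interaction with factor notation
      reg rprice i.y81##i.nearinc
      ```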

  • @frankzhao1678
    @frankzhao1678 3 years ago

    Thank you so much, it is a great video. Could you please show me how to do a DiD with multi periods?

    • @sebastianwaiecon
      @sebastianwaiecon  3 years ago

      Do you mean you have multiple periods before and after the change? It functions the same as this, but you need to define your "post" variable to include all periods after the change.

    • @frankzhao1678
      @frankzhao1678 3 years ago

      @@sebastianwaiecon So if I have 2000-2010 data and the policy happened in 2005, I need to set 2000-2004 equal to '0' and 2005-2010 equal to '1'?

    • @sebastianwaiecon
      @sebastianwaiecon  3 years ago

      That would be the simplest way to do it. I'm not promising this is the perfect solution as you may need to think about more sophisticated ways to handle your specific data, but it is a good starting point.
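
      A minimal sketch of that setup for the 2000-2010 example, assuming hypothetical variables year, treat (0/1), and y (the outcome):

      ```stata
      * post = 1 for 2005 onward, 0 before; the if-condition keeps
      * missing years from being coded as 1
      gen post = (year >= 2005) if !missing(year)

      gen did = post * treat

      reg y post treat did, vce(robust)
      * The coefficient on did is the diff-in-diff estimate
      ```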

  • @antoniomastrandrea967
    @antoniomastrandrea967 4 years ago +1

    Hi Sebastian, thank you for your video!
    I've two questions:
    1) What should I do if the FE variables (time and individual) are not significant? (I mean p-value > 0.1)
    2) Do I have to take care of R squared in this case?
    Thank you!

    • @sebastianwaiecon
      @sebastianwaiecon  4 years ago +1

      1) If what you're after is measuring the treatment effect, this doesn't matter.
      2) I don't know what you mean by "take care," but R squared is not particularly relevant in DID estimation.

  • @LaFemmeExec
    @LaFemmeExec 4 years ago

    Hey! How do I generate a variable that separates the years?

    • @sebastianwaiecon
      @sebastianwaiecon  4 years ago

      In this dataset, that is y81 -- a dummy variable with a 1 for 1981 and 0 otherwise. I have another video with some examples of how to create dummy variables: th-cam.com/video/DuAhUpM-56E/w-d-xo.html

  • @alexbrunofmn
    @alexbrunofmn 5 years ago

    When was the incinerator built?

    • @sebastianwaiecon
      @sebastianwaiecon  5 years ago +1

      According to the original paper, construction took place from 1981-1984.

  • @manojsapkota4880
    @manojsapkota4880 4 years ago

    Hello sir, I am interested in DID and want to know the command to run a DID regression in Stata.

  • @pujiannauli
    @pujiannauli 6 years ago

    What if the p-value after the regression for the interaction term (time dummy * group dummy) is not significant? How do I fix this? Thank you so much.

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago

      You don't "fix" it, it's just the result you got. It tells you that you can't reject the hypothesis that your treatment had no effect. Now, it could be that you have some endogeneity that you need to control for, but statistical significance, or lack thereof, is not (by itself) a problem to be fixed.

    • @consultingfaqs
      @consultingfaqs 5 years ago

      @@sebastianwaiecon Hi, if the interaction term is insignificant, will adding more variables help make the result significant? The results show that the constant term is highly significant, which means that there is omitted variable bias. I guess adding more controls can help solve the problem of the insignificant interaction term.

    • @sebastianwaiecon
      @sebastianwaiecon  5 years ago +2

      @@consultingfaqs It bears repeating that the treatment not being significant is not a "problem" to be solved unless you think this is because of an omitted variable. Tinkering around with different models with the explicit purpose of finding a significant effect is not an ethical use of data. The constant term being highly significant is also not evidence of omitted variables. I'm not sure where you got that idea. Adding more variables might or might not result in existing terms being more significant. It all depends on the direction of the bias, if there is one.

  • @FanettiMazakura
    @FanettiMazakura 6 years ago

    Sebastian, what if I want to include id and time fixed effects in the regression? Do I only keep the interaction variable in the regression?

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago

      Unlike FE models, diff in diff does not necessarily have the same cross-sectional units across time periods. In my example, it's not the same houses in '78 and '81. As such, ID-based FE won't work. Here, the nearinc variable plays the same role as the FE. Your time dummy is already in there in DD.

    • @FanettiMazakura
      @FanettiMazakura 6 years ago

      Yes, I get that. I have unbalanced panel data and I want to conduct a Difference-in-Differences with id and time fixed effects. Is // xtreg DepVar i.treated##i.during controls i.month , fe cluster(id) // the correct model to achieve that? Or do you think that it would be better to exclude the fixed effects?

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago

      If I'm understanding what you're trying to do correctly, I think you can include the fixed effects.
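
      A rough sketch of the panel version being discussed, reusing the names from the question (depvar, treated, during, id, month) and leaving out the unspecified controls. This is one plausible way to write it, not a verified specification for that dataset:

      ```stata
      * Declare the panel structure (unit id, time variable month)
      xtset id month

      * Unit fixed effects via fe, month dummies for time effects,
      * standard errors clustered by unit
      xtreg depvar i.treated##i.during i.month, fe vce(cluster id)

      * Expect Stata to flag some terms as omitted: a treated dummy that
      * never changes within a unit is absorbed by the unit fixed effects,
      * and during can be collinear with the month dummies. The
      * treated#during interaction is the diff-in-diff term.
      ```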

    • @motnaichuoiktnb
      @motnaichuoiktnb 6 years ago

      Firstly, thank you for your video, which is very helpful. As you mentioned in your comment, it was not the same houses in '78 and '81. Does that mean your treatment and control groups are not the same pre- and post-treatment?

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago

      The criterion for being in the control or treatment group is the same in both years, but the specific houses aren't the same.

  • @thanhtoba1464
    @thanhtoba1464 5 years ago

    Thank you for your helpful sharing. When I run the command "corr(y81 nearinc y81nrinc)" to test the autocorrelation between variables, the result shows there is an autocorrelation between the "nearinc" and "y81nrinc" variables. The correlation coefficient is 0.5776. So my question is: what should we do in this situation?

    • @sebastianwaiecon
      @sebastianwaiecon  5 years ago

      First of all, "autocorrelation" is a very specific term, which you are using incorrectly. In time series data, this refers to a variable correlating with itself across time. In any case, you've pointed out that an interaction term is correlated with one of the variables you are interacting. This is true by definition. There isn't anything you do about that -- it would be strange if it were not the case. In a more general sense, there is nothing wrong with two variables in a regression being correlated with each other. That is completely normal and probably the case in most regressions.

    • @thanhtoba1464
      @thanhtoba1464 5 years ago

      Thank you for pointing out my problem. You are right, it was my mistake to use the term "autocorrelation". What I really meant was "multicollinearity", but I mistyped. Anyway, according to the data in the video, there really is multicollinearity in the regression, because the correlation coefficient between the "nearinc" and "y81nrinc" variables is 0.5776. Usually, when we encounter multicollinearity, we omit one of the two variables from the model. However, it is impossible to omit either of these two variables under the difference-in-differences method, because they must be included together to show the effect of the construction of the incinerator. That is why I asked, "what should we do in this situation?" This problem does not only happen in this example; it occurs in every DID model, because we usually create a "did" variable by multiplying the "time" and "treated" variables (did = time * treated). The consequence is that there is always multicollinearity in a DID model. Can you help me solve this issue?

    • @sebastianwaiecon
      @sebastianwaiecon  5 years ago

      Multicollinearity is not a big deal. Getting into the practice of dropping variables because they are correlated with another variable in the model will lead you quickly into omitted variable bias. There is a simple test where you regress the one variable you are concerned about on all the other explanatory variables. If the R-squared is under 0.9, don't worry about it.
      As I explained previously, it is mathematically impossible for a variable and an interaction term involving it to be uncorrelated. The interaction term is absolutely key to a diff in diff regression.
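
      The auxiliary-regression check described above might be sketched like this with the KIELMC variable names (rprice as the outcome is an assumption):

      ```stata
      * Regress the variable of concern on the other explanatory variables
      reg y81nrinc y81 nearinc

      * R-squared of the auxiliary regression; compare it to the 0.9 rule of thumb
      display e(r2)

      * Equivalent built-in check: variance inflation factors after the
      * main regression (VIF = 1 / (1 - auxiliary R-squared))
      reg rprice y81 nearinc y81nrinc
      estat vif
      ```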

    • @thanhtoba1464
      @thanhtoba1464 5 years ago

      @@sebastianwaiecon Thank you very much for the explanation.

    • @gregorychung9421
      @gregorychung9421 4 years ago

      @@sebastianwaiecon Hello, I found this video very helpful. However, when running my model, my DID variable keeps getting dropped because of collinearity. Is there a fix to that?

  • @jodieteague8254
    @jodieteague8254 4 years ago +1

    Could you then graph this in Stata?

    • @sebastianwaiecon
      @sebastianwaiecon  4 years ago +1

      Yes. You would do this after running the collapse to get all the averages. The "classic" diff in diff graph has the outcome on the vertical axis and time on the horizontal axis. There are three lines: the treated group, the untreated group, and a counterfactual with the same starting point as the treated group but the same slope as the untreated group. See my video on graphing for how to use the twoway command.
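
      A minimal sketch of the two observed lines, assuming the KIELMC variable names and rprice as the outcome. The counterfactual line is left out here, since it has to be computed from the four group means before it can be plotted:

      ```stata
      * File path is assumed; collapse replaces the data in memory
      use KIELMC.dta, clear
      collapse (mean) rprice, by(year nearinc)

      * One connected line per group across the two years
      twoway (connected rprice year if nearinc == 1) ///
             (connected rprice year if nearinc == 0), ///
             legend(order(1 "Near incinerator" 2 "Far from incinerator")) ///
             ytitle("Mean real price") xtitle("Year")
      ```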

    • @jodieteague8254
      @jodieteague8254 4 years ago

      @@sebastianwaiecon Thank you will do!

  • @emilieriislarsen5134
    @emilieriislarsen5134 6 years ago

    Hi, Sebastian, thank you so much for your video. I was wondering if it's possible to do propensity score matching and difference in differences when my dependent variable is dichotomous?

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago

      I can't comment on specifics as I've never combined all of these myself. However, both diff in diff and propensity score matching can be done with dichotomous dependent variables. You just need to be careful about the issues inherent in linear probability. See my video on binary choice models for details.

  • @jaredgreathouse3672
    @jaredgreathouse3672 6 years ago

    What if your data have multiple units treated and untreated at the same time? There, a clean post period makes no sense. If city 1, for example, is being treated at time t, but cities 2 and 4 aren't, and the next year city 3 is being treated, and so on, wouldn't you just use a treatment##time variable?

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago

      For that, you might want to look into a full fixed effects model. I have a video on that, as well.

  • @narlikar78
    @narlikar78 5 years ago

    Can we have the dataset used in the video to try the results again ourselves?

    • @sebastianwaiecon
      @sebastianwaiecon  5 years ago

      The dataset is KIELMC.dta that comes with the Wooldridge econometrics book. You should be able to find it online.

  • @lateralus5117
    @lateralus5117 6 years ago

    Hello, I ran into a problem when running my regression.
    My regression looks like this:
    regress DepVar post_tr_yr treat_group treat_groupXpost_tr_yr
    Where post_tr_yr is a dummy for year>2007
    However my interaction term (treat_groupXpost_tr_yr) gets omitted due to collinearity.
    Is this a problem?

    • @sebastianwaiecon
      @sebastianwaiecon  6 years ago

      I always recommend you go to the data browser and take a look at the values. Presumably something went wrong in your variable generation.
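
      Alongside the data browser, a quick check like the following can show where the variable generation went wrong. It reuses the variable names from the question above and is only a sketch:

      ```stata
      * All four group-by-period cells should contain observations;
      * an empty cell makes the interaction collinear with the other terms
      tabulate treat_group post_tr_yr

      * Counts rows where the stored interaction differs from the product
      * of the two dummies; this should be 0
      count if treat_groupXpost_tr_yr != treat_group * post_tr_yr
      ```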

    • @Ilaay23
      @Ilaay23 6 years ago

      I also have this problem. My interaction term is omitted due to collinearity, does anyone know how you can fix this?

    • @bencaplan4565
      @bencaplan4565 6 years ago

      I have the same issue - what sort of issue in the variable generation can result in this?

    • @xMooshy
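      @xMooshy 5 years ago

      @@bencaplan4565 The time dummy should equal 1 for the control group in the post period too, even though the control group is never treated. If it is 1 only for treated units, it is identical to the interaction term, which then gets dropped.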

  • @fgghdfg8638
    @fgghdfg8638 3 years ago

    Hi professor, I hope you are doing well. I'm a follower on TH-cam. Can you help me do an assignment on the difference-in-differences method? I haven't found a subject or data that I can use for it. I must do it, otherwise I will have to repeat the year, and I have been sleeping only 3 hours a night for more than 3 weeks just because of this project. Can you help me? If you want, I can pay you to help me.

    • @sebastianwaiecon
      @sebastianwaiecon  3 years ago

      I recommend you ask your professor for help - it's what they're there for!