YouTube needs more high-quality videos like this. Thank you, Mr. Doug!
Thank you so much Doug! I just wanted to encourage you to keep up this great job. Your videos are awesome, and I believe they are being used by different people in different fields.
+Saleh Babazadeh Thanks so much for the kind words! I really should post more of these!
@@dougmckee673 thanks from my side too, very clear and easy to understand. Do consider posting similar vids on regression techniques and similar, cheers!
This is probably one of the best videos on this subject that I've ever seen. Thanks!!
One of the best and most lucid explanations of the DID method. Thank you for this, Doug. I especially liked how you explain the intuition behind why the DID estimate calculated by hand is the same as the one estimated by the regression model. And the part where you elaborate on the simple benefits of using a regression for a DID model is great.
I really appreciate you sharing your understanding here.
Best Diff-in-diff course I have learned. Thanks!
When I watched the video for the first time, I was totally lost. The second time, I paused in between to give myself more time to understand your super intelligent and super long sentences. It is so much clearer now. Thank you so much!
Way more intuitive than I previously thought. Well put, thanks.
Absolutely brilliant tutorial, first result returned, wish youtube was always this helpful!
This is such a clear and helpful video. I’m taking an exam in an hour and doing last minute double checks. This makes me feel more confident, thank you
Excellent explanation in 12 minutes. Thank you
Your videos have been vital for understanding the contents of my statistics course for me! So far, I've supplemented every new concept with your videos. Sometimes, I even watch your video first and then do the readings. Please keep doing these videos!
Can't tell you how useful your videos are. Thanks for passing on the knowledge!
thanks a lot for this clear explanation, you dont know how much it helped me
Thank you so much! This video was waaay much helpful than reading pages and pages on DD! Very clear and to the point! Thank you!!
You are such a legend mister McKee
Clear and right to the point. I always wondered why the multiplication coefficient is the DD coeff, Now I know :D
Very good and funny videos bring a great sense of entertainment!
Very clear, easy to understand. Great job!
This is pure gold. Thanks!
Thank you so much for uploading this! I had looked online at DID and was confused. This made it so easy to understand and apply.
Thank you very much Doug.
It helped me to analyse my data (pooled cross section).
Very useful, wait for more.
Thank you - very good explanation. Helped clear a lot up for me.
Thanks for this one ! You made it clear !
Based on the regression result (at 8:59), what is the criterion for rejecting the null hypothesis (that is, for saying that the effect of the lunch program is statistically significant)?
Very clear to the point
Great video. It saved me!
Thank you for this! I didn't quite understand the very last point, i.e. the difference between the points made for when DD is 'ok' (appropriate) and 'not ok'
Thank you so much, this has been really useful!
Amazing video
It is really helpful! This video is easy to understand for new learners like me! I really appreciate your help! If I can survive my PhD program, I hope I can make videos like this in the future!
Dude. This saved me thanks :)
Thanks for the video, it really helped me in my finance research. Just one thing: when you talk about the dummy variable Dtr, I think it takes 1 if the person is in the treatment group and 0 if the person is in the control group.
Kevin van den Brink You're exactly right--When (if) I re-record this I'll fix that. Thanks!
Abrupt ending, good video
I wish you would post more, you're great!
Much appreciated. Keep it up man!
Very useful!
Thank you! I would like to know, if there isn't a comparable group, like Rio, then how can one figure out the effect of this programme?
What is that program 8:17, looks very neat
Stata
Based on the regression result (at 8:59), what is the criterion for rejecting the null hypothesis?
Thank you for your video! But at the time 8:06, what is the difference between \beta_0 and \epsilon?
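Beta_0 is the intercept: the expected value of the outcome "y" when all the other variables in the model are zero (here, the control group before the program). Epsilon is the error term, which captures everything else that affects "y" but is not included in the model.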
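To make the roles of \beta_0 and \epsilon concrete, here is a sketch of the four group means implied by the model. It assumes the specification y = \beta_0 + \beta_1 D_{post} + \beta_2 D_{tr} + \beta_3 D_{post} D_{tr} + \epsilon with the error averaging to zero within each group; the ordering of \beta_1 and \beta_2 is an assumption and may not match the slide in the video.

```latex
\begin{aligned}
E[y \mid D_{tr}=0,\ D_{post}=0] &= \beta_0 \\
E[y \mid D_{tr}=0,\ D_{post}=1] &= \beta_0 + \beta_1 \\
E[y \mid D_{tr}=1,\ D_{post}=0] &= \beta_0 + \beta_2 \\
E[y \mid D_{tr}=1,\ D_{post}=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\[4pt]
\text{DiD} &= \bigl[(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_2)\bigr]
            - \bigl[(\beta_0 + \beta_1) - \beta_0\bigr] = \beta_3
\end{aligned}
```

Under these assumptions, \beta_0 is a group mean, while \epsilon is whatever makes an individual observation differ from its group mean.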
thanks, it was easy to digest
What a great video. I did miss conclusions about the example, though. Beta3 is 30, but it has a p-value equal to 0.228. So we can conclude that this free lunch plan didn't have a statistically significant effect (at the 95% level), right? Those 30 points could have been by chance, right?
+Linear Seller Absolutely correct and not that surprising given there were only 10 observations in this sample.
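For anyone asking about the rejection criterion, here is a minimal sketch of the usual decision rule, assuming a conventional significance level of 0.05 (the function name `reject_null` is made up for illustration; the p-value 0.228 is the one quoted above):

```python
def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Return True when the null hypothesis of zero effect is rejected at level alpha."""
    return p_value < alpha


# p-value on the interaction coefficient quoted in this thread
print(reject_null(0.228))        # compared against alpha = 0.05
print(reject_null(0.228, 0.10))  # compared against a looser alpha = 0.10
```

Failing to reject does not prove the effect is zero; with a sample this small, the estimate is simply too noisy to tell.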
Hello! Thank you for a great video! Do you have any advice for estimating the necessary sample size before implementing the treatment? Thanks!
Thank you kind sir! :)
I didn't get the DiD effect of 30 from 7:35. Somebody help, please! 😓
I didn't either at first! Remember to average (rather than add) each set of observations before doing the DiD calculation.
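The "average first, then subtract" advice can be sketched as follows. The scores are hypothetical and are NOT the numbers from the video; they are only chosen so the arithmetic is easy to follow.

```python
from statistics import mean

# Hypothetical test scores per (group, period) cell, not the video's data.
scores = {
    ("control", "pre"): [60, 70, 80],
    ("control", "post"): [65, 75, 85],
    ("treated", "pre"): [50, 60],
    ("treated", "post"): [85, 95],
}

# Step 1: average each cell (do not sum; cells can have different sizes).
cell_means = {cell: mean(values) for cell, values in scores.items()}

# Step 2: change over time within each group.
change_treated = cell_means[("treated", "post")] - cell_means[("treated", "pre")]
change_control = cell_means[("control", "post")] - cell_means[("control", "pre")]

# Step 3: difference of the two differences.
did = change_treated - change_control
print(cell_means)
print(did)
```

With unequal cell sizes, as here, using sums instead of means would give a different and misleading number.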
Great video Doug!!!
If there is just one treatment group and one control group with pre vs. post data, and we want to include many control variables, say 5, how do we fit a model with 5 control variables? What does the regression equation look like?
+inferno9004 It looks just like the regression model shown in the video with the addition of your control variables.
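Written out, a sketch of that equation with five controls might look like this. The symbols X_1, ..., X_5 and \gamma_1, ..., \gamma_5 are notation introduced here for illustration, and the ordering of the first three terms is an assumption.

```latex
y = \beta_0 + \beta_1 D_{post} + \beta_2 D_{tr} + \beta_3 \,(D_{post} \times D_{tr})
    + \gamma_1 X_1 + \gamma_2 X_2 + \gamma_3 X_3 + \gamma_4 X_4 + \gamma_5 X_5 + \epsilon
```

The coefficient on the interaction, \beta_3, is still the DiD estimate; the controls enter as ordinary additional regressors.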
In this example, are the y-scores the post-scores or the pre-post differences? I'm guessing just post-scores? Thanks for clarifying!
Hi Mr Doug,
Thank you for this interesting Video.
Is it possible to do DID with ordinal outcomes? My variables: Rating Firms (Y), D1 (D1 == 1 if treated sample; 0 if control sample); D2 (D2 == 1 if after treatment; 0 if before).
I didn't find any examples showing whether it is possible and how we can interpret the estimators.
Your response is very important for me.
Thank you.
+Zeineb Ouni I haven't seen it done, but I believe you could estimate an ordered logit model (ologit) with the same covariates shown above (D1, D2, and D1*D2 in your case). You have to be careful with interpreting interactions in the ordered logit, but I think the basic idea is valid.
+Doug McKee Thank you so much.
Any impact evaluation is supposed to start by building the database; only then should a methodology such as DID be analyzed, isn't that right?
how did you do it can you share with me , thank you
What are the assumptions of dif in dif?
First off, thanks for the great video, Doug! I have a follow-up question to one of the comments below:
One person commented:
So do I understand correctly that an extension of the model for 3 treatment groups and 1 control with pre and post could look like the following:
y = β0 + β1 * Dpost + β2 * Dtr1 + β3 * Dtr2 + β4 * Dtr3 + β5 * Dpost * Dtr1 + β6 * Dpost * Dtr2 + β7 * Dpost * Dtr3 + β8 * X
β5: DiD effect for Treatment 1
β6: DiD effect for Treatment 2
β7: DiD effect for Treatment 3
And you replied that is correct.
So my question is can you do this same procedure in logistic regression when your dependent variable is dichotomous (e.g., disease vs. no disease)?
Interpreting coefficients on interaction terms in nonlinear models (like logistic) is tricky. If it were me, I would just estimate a linear probability model, but there's a much longer (and better) answer here: stats.stackexchange.com/questions/89513/difference-in-differences-estimator-for-logistic-regressions
To respond to Doug, a word of caution on using the LPM: you can get predicted probabilities outside the 0 to 1 range, and your errors will be heteroskedastic. The latter can be fixed with an extra option, but the former is a fundamental issue with the estimator itself.
I would argue the point of using DiD is to examine the magnitude of the change from a program, etc. With a logit regression you get your coefficients, calculate the margins, and use the margins to calculate the change in probability that the DD had on your dependent variable. You're kind of muddling the point of using a logit in this regard, but it still works. It loses some explanatory power and some of the charm. Still doable, though.
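A rough sketch of the "coefficients, then margins" idea, using made-up logit coefficients (the values of `b0`, `b_post`, `b_tr`, and `b_inter` are hypothetical, not estimates from any data): predicted probabilities are computed for the four cells and then differenced on the probability scale.

```python
import math


def logistic(z: float) -> float:
    """Map a log-odds value to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical logit coefficients on the log-odds scale, for illustration only.
b0, b_post, b_tr, b_inter = -1.0, 0.2, 0.5, 0.8


def predicted_prob(tr: int, post: int) -> float:
    """Predicted probability for one (treatment, period) cell."""
    return logistic(b0 + b_post * post + b_tr * tr + b_inter * tr * post)


p = {(tr, post): predicted_prob(tr, post) for tr in (0, 1) for post in (0, 1)}

# Difference-in-differences on the probability scale.
did_prob = (p[(1, 1)] - p[(1, 0)]) - (p[(0, 1)] - p[(0, 0)])
print(round(did_prob, 3))
```

The probability-scale DiD is not the same number as the interaction coefficient `b_inter`, which is why interpreting the logit interaction directly is tricky.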
Hi, Doug! Thank you so much for your great video. I have a quick question. At the end of the video you mentioned the example for the case where DiD is not ok. If the free lunch program has already been implemented in the control group, is there any way I can still use it as a control group? Can semiparametric DiD be used?
Hi Doug!
Thank you so much for your video
I just wanted to ask you a small question:
I am also planning to use the difference in differences model. I am looking at the impact of the EURO (introduced in 1998 and in circulation in 2002) on trade flows between countries in Europe. I am new to STATA, hence I am not too sure how to proceed.
I did the following regression
regress Tradeflow Governmenteffectiveness1 Unemployment1 GDPpercapita1 Populationsize1 Governmenteffectiveness2 Unemployment2 GDPpercapita2 Populationsize2 Distance1-2
But I am not sure what I should do next?
Any help would be very much appreciated! :)
Best,
Joseph
+joseph dover To apply a difference in difference, you'll need to divide your trade flows into some set that might be affected by the introduction of the Euro (treatment) and another set that definitely would not be (control). You will also need to reshape your data so you have observations of each trade flow before and after the Euro was introduced. Then you should be able to apply the regression method shown in the video. Good luck!
Thank you so much for your video! But in the last slide, I could not understand the Not OK case...
I am trying to run this in STATA and it has omitted Beta3 because of multicollinearity between variables. Can you guide me on how to handle it?
Thanks
Hassan Murtza Khan I don't usually answer Stata questions on YouTube, but I'll make an exception just this once. :) There are two possibilities. The first is that you don't have observations for each group (treatment and control) in both the before and after periods. Cross-tabulate your treatment dummy against your post-period dummy and make sure all four cells have observations. The second possibility is that you made a mistake constructing the interaction variable. Check this by tabulating the interaction with each of the dummies to make sure the result makes sense.
Now your job is to try these and report back so everyone can learn!
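Outside Stata, the same two checks can be sketched in a few lines. The rows below are hypothetical; each one holds a treatment dummy, a post-period dummy, and the interaction as it was constructed in the dataset.

```python
from collections import Counter

# Hypothetical rows: (treated dummy, post dummy, interaction dummy).
rows = [
    (0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1),
    (0, 0, 0), (1, 1, 1),
]

# Check 1: every (treated, post) cell must contain at least one observation.
cell_counts = Counter((treated, post) for treated, post, _ in rows)
empty_cells = [(t, p) for t in (0, 1) for p in (0, 1) if cell_counts[(t, p)] == 0]

# Check 2: the interaction must equal treated * post on every row.
bad_interactions = [r for r in rows if r[2] != r[0] * r[1]]

print(dict(cell_counts))
print("empty cells:", empty_cells)
print("rows with a wrong interaction:", bad_interactions)
```

If either list is non-empty, the interaction column can end up perfectly collinear with the other regressors, which would be consistent with a dropped coefficient.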
Hi Doug
Thanks a lot for the video! I just have a question. I want to estimate a difference in differences model in STATA comparing students that received maths lessons and those that didn't. I would like to test whether having extra maths lessons helps students achieve higher marks.
My variables are: "StudentID" "TIME" "MATHS_LESSON" "MARKS"
But the problem I have is that not every student has received maths lessons over the period, and I would like to create 2 groups, one "maths_lesson" and one "Nomaths_lesson", by adding them to the variable column "StudentID". How should I proceed?
Let me recap: what I am now trying to obtain is a graph with "time" on the x axis and "marks" on the y axis with two lines (one for the group of students who took maths classes and one for the group that didn't), but I am struggling a bit to achieve this.
Hope I am clear in describing my problem!
Best regards,
John
+John Dupont Using your TIME variable, you should divide your observations into "before" and "after" groups. You've already divided your students into those that got the treatment (MATHS_LESSON) and those that didn't. Once you have that, you can compute means of the four cells and subtract them to get the DD estimate. I advise first understanding your data and computing the required numbers before worrying about communicating those numbers with a graph. Hope this helps!
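A small sketch of those steps, using the variable names from the question. The records and the `CUTOFF` period are hypothetical, and MATHS_LESSON is assumed to mark students who belong to the lesson group in every period (not just the periods in which lessons took place).

```python
from statistics import mean

CUTOFF = 2  # hypothetical: TIME >= CUTOFF counts as "after"

# Hypothetical long-format records, one row per student per period.
records = [
    {"StudentID": 1, "TIME": 1, "MATHS_LESSON": 1, "MARKS": 50},
    {"StudentID": 1, "TIME": 2, "MATHS_LESSON": 1, "MARKS": 62},
    {"StudentID": 2, "TIME": 1, "MATHS_LESSON": 1, "MARKS": 54},
    {"StudentID": 2, "TIME": 2, "MATHS_LESSON": 1, "MARKS": 66},
    {"StudentID": 3, "TIME": 1, "MATHS_LESSON": 0, "MARKS": 60},
    {"StudentID": 3, "TIME": 2, "MATHS_LESSON": 0, "MARKS": 63},
    {"StudentID": 4, "TIME": 1, "MATHS_LESSON": 0, "MARKS": 58},
    {"StudentID": 4, "TIME": 2, "MATHS_LESSON": 0, "MARKS": 61},
]

# Sort each record into one of the four (group, after) cells.
cells = {}
for r in records:
    after = int(r["TIME"] >= CUTOFF)
    cells.setdefault((r["MATHS_LESSON"], after), []).append(r["MARKS"])

# Mean marks per cell, then the difference of the two before/after changes.
means = {cell: mean(marks) for cell, marks in cells.items()}
did = (means[(1, 1)] - means[(1, 0)]) - (means[(0, 1)] - means[(0, 0)])
print(means)
print(did)
```

The four values in `means` are also exactly the points needed for the two-line graph of marks against time.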
Hi, thanks for the video. In the beginning you say that DID is useful for estimating causal effects of programs when the program is not implemented as a randomized controlled trial. So, in a randomized controlled trial, is DID not necessary? Thanks!
Thank you so much for this, I had never heard of difference in differences until a reading I had for economic development. I'm actually planning to reference this video in a paper; do you have anything you'd want me to include for a citation?
Thanks again.
Matthew Tarpinian I'm really glad you've found the video helpful, but it's probably not appropriate for a citation in your paper. If you want a good reference for the method, I suggest using Angrist and Pischke's _Mostly Harmless Econometrics_ instead.
Thanks, I have one question though, what's the name of the program you're using for the regression? I'm not familiar with it, I find it quite practical
Doug is using Stata
Hello Professor Armstrong!
Hi Doug. Please help me! Can I use DID if my data does not follow the assumption of normality? If not..is there a non-parametric DID?!
If you have a large enough number of observations (at *least* 25, and I'd feel comfortable over 100), then your outcome doesn't need to be normal--The Central Limit Theorem says your estimate of the treatment effect will be approximately normal.
I believe there are nonparametric DiD-like methods when you have a continuous treatment and you believe the effect is nonlinear, but I don't know much about them.
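The Central Limit Theorem point can be illustrated with a small simulation. This is only a sketch for a plain sample mean of a skewed outcome, not for a full DiD regression; the exponential distribution, the sample size of 100, and the seed are all arbitrary choices made here.

```python
import random

random.seed(0)

N_OBS = 100    # observations per simulated sample
N_REPS = 2000  # number of simulated samples

# Exponential outcomes with rate 1 are strongly skewed; their mean and sd are both 1.
true_mean, true_sd = 1.0, 1.0
std_error = true_sd / N_OBS ** 0.5

inside = 0
for _ in range(N_REPS):
    sample = [random.expovariate(1.0) for _ in range(N_OBS)]
    sample_mean = sum(sample) / N_OBS
    # Count sample means that land within 1.96 standard errors of the truth.
    if abs(sample_mean - true_mean) <= 1.96 * std_error:
        inside += 1

coverage = inside / N_REPS
print(coverage)
```

If the sample means were exactly normal, about 95% of them would fall inside that band; a share close to that suggests the normal approximation is reasonable at this sample size even though the raw outcome is far from normal.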
Thank you Doug!
Hi. Thanks so much for this! Quick question though. I've just run a DD regression on my data. The DD beta score isn't significant, but the group (test vs control) beta is. What does this mean?
The insignificant DD beta means there is no significant effect of the treatment. The significant group beta means you have significant pre-treatment differences between the groups.
Brilliant
Thank you very much. This video really helps me. What statistic program did you use in this video? Stata?
***** I did use Stata to get some of the numbers shown, but the content is fairly independent of the software in this video. Stata plays a bigger role in some of my other videos.
Thank you very much.
Doug McKee May I ask one more question? I am using a binary dependent variable (dummy). I have searched for information on the internet and found that it is possible to have a regression model with a binary dependent variable (in STATA: the .probit and .logit commands). In your opinion, can it also be implemented in the regression for a DD model (I mean, using the command .logit y DTr DPost DTrXDPost)?
***** Short answer: Yes. Longer answer: If you use your binary dependent variable in a linear regression model exactly as shown here, you are estimating a linear probability model. The coefficients can be interpreted as effects on the probability of the dependent variable being one. Most economists would do this. You *could* estimate a logistic model with the same variables on the right hand side, but it is much harder to interpret the magnitude of the coefficient on the interaction.
Doug McKee Do you mean that if y is a binary dependent variable and:
1. I use command [regress y DTr DPost DTrXDPost], then I am "estimating a linear probability model. The coefficients can be interpreted as effects on the probability of the dependent variable being one."
2. I use command [.logit y DTr DPost DTrXDPost], then "it is much harder to interpret the magnitude of the coefficient on the interaction."
I hope your answer is "yes".
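For the first option, here is a self-contained sketch of why the linear probability model coefficient on the interaction equals the DiD in proportions. The binary outcomes are hypothetical, and the tiny `ols` helper is written here for illustration; it stands in for what `regress` would compute and is not Stata's implementation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical binary outcomes keyed by (DTr, DPost).
cells = {
    (0, 0): [0, 0, 0, 1],
    (0, 1): [0, 0, 1, 1],
    (1, 0): [0, 0, 0, 1],
    (1, 1): [0, 1, 1, 1],
}

# Design rows: constant, DTr, DPost, DTr * DPost.
x_rows, y = [], []
for (tr, post), outcomes in cells.items():
    for outcome in outcomes:
        x_rows.append([1, tr, post, tr * post])
        y.append(outcome)

b0, b_tr, b_post, b_did = ols(x_rows, y)

# The same quantity by hand: difference-in-differences of the cell proportions.
share = {cell: sum(v) / len(v) for cell, v in cells.items()}
did_by_hand = (share[(1, 1)] - share[(1, 0)]) - (share[(0, 1)] - share[(0, 0)])
print(b_did, did_by_hand)
```

Because the model is saturated in the two dummies, the regression reproduces the four cell proportions exactly, so the interaction coefficient reads directly as a change in probability.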
Douglas thanks for this amazing video, it helped me so much! I just have a question:
Why does (y) have only one test score? I am a little bit confused about the pre-test and post-test information. If I have the test scores before the implementation and the scores after, how do I compute them? Thanks
The key is to have (or be able to compute) the average test score of both groups before AND after the intervention.
How do I do a difference in differences analysis using SPSS? I need practical steps.
Big from you Doug
Hi Doug, how do I add additional controls (i.e. X) into the model? I am using SPSS to do the DiD. Do I just add the control variable and regard it as an independent variable?
thank you!
Thanks!
thank . you!!!!!!!!
everything made sense until @7:55 help!
Who down voted this video? Someone who didn't get a free lunch?
lol! this vid was super helpful. especially for my econometrics exam tomorrow xd
Thank you!
Thanks!