Thank you for the wonderful lecture, Prof. I had a quick question: are the standard errors in the simulation results bootstrapped standard errors, given that we are running the regression 10,000 times?
These are actually not 10,000 regressions, but rather one simulation with 10,000 observations. The purpose of this exercise was to illustrate an identification problem, which means that even with infinite amounts of data we would not estimate the correct parameter. The important term here is "with infinite amounts of data." Therefore, as long as we care about identification, it is irrelevant what we do with our standard errors. If we care about inference -- that is, how much we can learn from a sample of finite size -- then it matters how we express uncertainty through standard errors.
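For concreteness, here is a minimal sketch of the kind of one-shot simulation described above: a single sample of 10,000 observations in which controlling for a mediator biases the coefficient on the treatment. The variable names, the data-generating process, and all coefficients below are made up for illustration (they are not the lecture's actual code); the point is that the bias does not shrink as the sample grows, because it is an identification problem, not an inference problem.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # one draw of 10,000 observations, not 10,000 regressions

# Hypothetical data-generating process (illustrative coefficients):
# temperature lowers income, income lowers conflict, and an unobserved
# factor u (say, institutions) drives both income and conflict.
temp = rng.normal(size=n)
u = rng.normal(size=n)                      # unobserved confounder of income and conflict
income = -0.5 * temp + u + rng.normal(size=n)
conflict = 0.3 * temp - 0.4 * income - 0.8 * u + rng.normal(size=n)

# True total effect of temperature on conflict: 0.3 + (-0.4) * (-0.5) = 0.5

def ols(y, *cols):
    """Slope coefficients from an OLS regression of y on cols plus an intercept."""
    X = np.column_stack((np.ones(len(y)),) + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

b_total = ols(conflict, temp)[0]           # no control: recovers the total effect
b_ctrl = ols(conflict, temp, income)[0]    # mediator controlled: biased

print(f"coef on temp, no control:     {b_total: .3f}  (total effect: 0.5)")
print(f"coef on temp, income control: {b_ctrl: .3f}  (neither 0.5 nor the direct path 0.3)")
```

With the mediator included, the coefficient settles near a value that is neither the total effect nor the direct-path coefficient, and running the same simulation with a million observations would only pin down that wrong value more precisely.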
Hi Ben! I really enjoyed the video. However, with regard to the signs you have used on the DAG, shouldn't the arrow between income and conflict have a negative sign? As income increases, conflict decreases and vice versa, which I believe is also borne out by the regression results?
Thanks for pointing this out. You are correct: the effect of income on conflict is negative. In combination with a negative effect of temperature on income, including income as a mediator reduces the size of the coefficient. I’ll include a box in the video after Xmas.
Splitting the effect of temperature into direct and indirect effects (through income) is useful, right? What we are estimating in a model with income (the mediator) as a control is both the direct and the indirect causal effect. Why is this wrong? The direct effect may not be significant after controlling for income. That is a valid result, right? Why is this considered bias? It is a more informative model.
This is an important question. My answer would be this: researchers should always take a stand on what effect they want to study, and then propose an identification strategy for that effect that is free from biases.

If the goal is to estimate the total causal effect of a change in temperature on conflict, then controlling for a mediator is a big no-no because it would introduce a bias even if temperature were as good as random. Controlling for a variable that lies on the causal path between the treatment and the outcome introduces a selection bias into the estimation. If you want to learn more about this bias, I recommend the section on bad controls in Angrist/Pischke (MHE) or the paper by Montgomery et al. referenced below.

If the goal is to estimate the direct causal effect net of other causal channels -- and this is what your question seems to aim at -- it is insufficient to just control for the mediators. By controlling for income, you will not get an unbiased estimate of the direct causal effect of temperature on conflict. To do that, one has to use the tools of mediation analysis. Kosuke Imai (Princeton) has some very accessible lecture slides on that. I also found the paper by Acharya et al. (referenced below) very helpful.

Acharya, A., Blackwell, M., & Sen, M. (2016). Explaining Causal Findings Without Bias: Detecting and Assessing Direct Effects. American Political Science Review, 110(3), 512-529. doi:10.1017/S0003055416000216

Montgomery, J. M., Nyhan, B., & Torres, M. (2018). How Conditioning on Posttreatment Variables Can Ruin Your Experiment and What to Do about It. American Journal of Political Science, 62, 760-775. doi:10.1111/ajps.12357
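To illustrate why simply controlling for the mediator is not enough, here is a sketch of the two-stage sequential g-estimation procedure from Acharya, Blackwell & Sen (2016) on simulated data. Everything here is hypothetical (variable names, coefficients, the assumption that the intermediate confounder z is observed); it is meant only as a toy version of the idea, not as their implementation.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

# Hypothetical DGP: z is an intermediate confounder, itself affected by
# the treatment, that drives both the mediator and the outcome.
temp = rng.normal(size=n)                          # treatment, as good as random
z = 0.7 * temp + rng.normal(size=n)                # intermediate confounder
income = -0.5 * temp + z + rng.normal(size=n)      # mediator
conflict = 0.3 * temp - 0.4 * income - 0.8 * z + rng.normal(size=n)

# Controlled direct effect (ACDE) of temp at fixed income:
# 0.3 (direct path) - 0.8 * 0.7 (path through z) = -0.26

def ols(y, *cols):
    """Slope coefficients from an OLS regression of y on cols plus an intercept."""
    X = np.column_stack((np.ones(len(y)),) + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

# Stage 1: regress the outcome on treatment, mediator, and intermediate
# confounders; keep only the mediator coefficient.
coefs = ols(conflict, temp, income, z)
gamma = coefs[1]                                   # estimated effect of income

# Stage 2: "demediate" the outcome and regress it on the treatment alone
# (dropping the mediator and the post-treatment confounder).
y_demediated = conflict - gamma * income
acde = ols(y_demediated, temp)[0]

print(f"stage-1 coef on temp (NOT the ACDE): {coefs[0]: .3f}")
print(f"sequential-g ACDE estimate:          {acde: .3f}  (truth here: -0.26)")
```

The stage-1 coefficient on temperature holds z fixed as well, so it recovers the direct path coefficient but not the controlled direct effect, which lets z respond to the treatment; the demediation step recovers the latter. (For inference on the two-stage estimate, the paper recommends bootstrapping.)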
This is awesome, man
It’s quite subjective to define or measure a term like beauty
@ben_elsner thanks for your detailed explanation. I will read the references suggested.