I'm a PhD candidate in Spain and was recommended this correction for one of my papers. Thank you for explaining it in such a simple, understandable way.
Happy that you found it useful. Good luck with your paper!
Best explanation so far. Thank you
Thank you very much! It's really useful!!
This has been extremely useful, thank you very much! The outcome model I'm running is a multinomial logistic regression. In that case, are all the steps the same except the last one, where mreg has to be specified instead of reg? Would really appreciate any help.
I would not be certain as I have not done it myself before. However, it sounds reasonable at first glance.
Thanks for the video
Thank you for explaining this in detail! One question I have: in the case of panel data, if I'm understanding correctly, the IMR will be computed differently for each time period for the same individual. Do we then treat the IMR as a state variable? Or is the IMR a control variable for some unobservables? Thanks a lot!
Thank you for your question. I must admit that I have not studied the panel data case enough to give a good answer to your question. However, my intuition would tell me that IMR would be treated the same. That is, once generated after the first step, it is added as an independent variable in the second step.
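A minimal sketch of that idea, in Python rather than Stata and under stated assumptions: the first step is a probit, and `linear_predictions` holds hypothetical fitted index values, one per person-period observation. The function names are illustrative and not taken from the video. Each observation gets its own IMR, which would then enter the second step as one more regressor.

```python
import math


def normal_pdf(z):
    """Standard normal density at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(xb):
    """IMR for a selected observation: density over CDF at the probit index xb."""
    return normal_pdf(xb) / normal_cdf(xb)


# Hypothetical first-step linear predictions (assumed values, not real data).
linear_predictions = [-1.0, 0.0, 1.5]

# One IMR per observation; a higher selection index gives a smaller IMR.
imr_values = [inverse_mills_ratio(xb) for xb in linear_predictions]
print(imr_values)
```

With panel data, this means the IMR differs across time periods for the same individual whenever the first-step prediction differs.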
Thanks Steffen, this was already super helpful! You mention at 9:08 that the exclusion restriction variables should not be strongly correlated with the IMR. However, I am working with a paper that says there should not be a significant correlation between the exclusion restriction variable and the dependent variable of the second stage (the paper is Brauer, Wiersema and Binder 2023).
In my case, I find that my potential exclusion restriction variable is significant in the probit model (so it would be a potential candidate), but I also find that it has a low but significant correlation with the DV of the second stage (-0.1; p-value < 0.001).
What is your opinion on this criterion? Is it still a viable candidate if the correlation with the IMR is low?
Hi! Thank you for your comment. From what I understand, at 9:08 I do indeed mention that there should not be too high a correlation between the IMR and the exclusion restrictions. You mention that there should not be a significant correlation between your exclusion restriction and the dependent variable in the second stage. I am a bit confused here. Your exclusion restrictions are only there in the first stage, and not 'directly' in the second stage.
Given what you show me here (-0.1; p-value < 0.001), I wouldn't be too worried.
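The correlation check discussed here can be sketched as follows. This is an illustration in Python, not the Stata workflow from the video; `imr` and `exclusion_restriction` are made-up values, and `pearson_correlation` is a hypothetical helper that assumes both inputs have non-zero variance.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation of two equally long lists with non-zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one IMR value and one exclusion restriction value per observation.
imr = [1.53, 0.80, 0.14, 0.52, 1.10]
exclusion_restriction = [0, 1, 1, 0, 1]

r = pearson_correlation(imr, exclusion_restriction)
print(f"correlation between IMR and exclusion restriction: {r:.3f}")
```

The sketch only computes the coefficient. How large a correlation counts as "too high" remains a judgment call.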
Thank you very much. With this video, my first challenge is settled. My second challenge is multinomial endogenous switching regression. Do you know how to perform it in Stata?
Feel free to mail me on the channel mail!
Hey, thank you so much. That really clarified a lot for me. One question: at 8:59 you talk about a paper recommending a correlation matrix for the IMR and the exclusion restriction variables. Could you provide the citation? That would be really helpful, and thanks again.
Here you go:
Certo, S.T., Busenbark, J.R., Woo, H.S. and Semadeni, M., 2016. Sample selection bias and Heckman models in strategic management research. Strategic Management Journal, 37(13), pp.2639-2657.
Awesome! Thank you so much for the quick response @@SteffensClassroom
@@SteffensClassroom Hey Steffen. I read the paper and I think you might have made a mistake. To evaluate the quality of the exclusion restrictions, the correlation should be tested between the independent variable and the Inverse Mills Ratio. In your video you only check the correlation between the IMR and the exclusion restrictions. Please share your thoughts.
Hi again! I hope you liked the paper. I think it is a really good piece. They talk about the correlation between IMR and x. For example, in the Simulation condition section, they refer to reporting the correlation between IMR and x, as in (Bushway et al., 2007; Leung and Yu, 1996). Their x refers to the exclusion restrictions. You can read this back in the Sample selection bias section on page 2643.
But please also share on what page in their paper they refer to this. It is a rather long read :)
@@SteffensClassroom I thought it was a really interesting paper. Still, I am just a Master's student who often struggles with these complex topics. On page 2649 they say:
"Nevertheless, some scholars have proposed evaluating the strength of exclusion restrictions by examining the correlation between the inverse Mills ratio and the independent variable, x (Bushway et al., 2007; Leung and Yu, 1996; Moffitt, 1)"
If they really mean that x is the exclusion restriction, I find this sentence oddly phrased and a bit misleading. I would not have guessed that they refer to the exclusion restrictions here.
Hi, what can I do if I have different datasets? One has wages and gender, and the other has all the variables needed to calculate the probability of being an employee. I don't know how to merge them since there is no common id variable.
Hi!
You would have to create an id variable that links the observations. Otherwise, ... well...
I suggest checking the merge video :)
Yep, but the datasets are not the same: one has only actual employees, wages, etc., and the other one also has non-employees. I use the latter to run the probit and then the former to see the wage differences.
@@MrAbrahamdelpozo There should still be a way to merge this. Sounds like a 1:m merge. In any case, it seems like you could link an employee's wage in one dataset to a set of other variables in the other dataset.
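As a rough illustration of linking the two datasets, here is a Python sketch rather than the Stata `merge` command. It assumes a linking key can be constructed; `person_id`, the other field names and all values are hypothetical. Non-employees stay in the merged data with a missing wage, so the full sample remains available for the probit.

```python
# Hypothetical dataset with employees only (wages observed).
employees = [
    {"person_id": 1, "wage": 2500.0},
    {"person_id": 3, "wage": 3100.0},
]

# Hypothetical dataset with employees and non-employees (selection variables).
population = [
    {"person_id": 1, "age": 34, "employed": 1},
    {"person_id": 2, "age": 51, "employed": 0},
    {"person_id": 3, "age": 28, "employed": 1},
]

# Look-up table from the linking key to the wage.
wage_by_id = {row["person_id"]: row["wage"] for row in employees}

# Keep every row of the larger dataset; wage is None where no match exists.
merged = [{**row, "wage": wage_by_id.get(row["person_id"])} for row in population]

for row in merged:
    print(row)
```

Without some variable, or combination of variables, that identifies the same person in both files, no merge of this kind is possible.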
Hi! What should I do if my selection equation is a multinomial model?
Not use a Heckman (:
@@SteffensClassroom Can you suggest any alternatives?
I am not sure what you want to accomplish? You need to think about what the goal is. You could also simply transform your selection variable into a dummy? Again, I do not know what you wish to accomplish here.