That quote seems particularly fitting for my life right now, but since this is a blog about doing applied micro, let me write again about control variables (hehe, I know, corny). Marc Bellman has a new blog post about the sensitivity of results to the use of particular controls. On the one hand, we should expect results to be sensitive. That's why we include those controls! On the other hand, if researchers are playing with different specifications, trying many different combinations of controls and only reporting those that generate significant results, ...well, you know.
Marc's post discusses a recent working paper by Lenz and San showing that in about 40% of the observational studies they analyzed from the American Journal of Political Science, researchers obtained statistical significance for their estimate of interest by tinkering with the covariates included.
Yikes! Would you expect similar numbers in an economics journal?
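To get a feel for why tinkering matters, here is a rough simulation sketch. It is not taken from Lenz and San's paper; the sample size, the number of candidate controls, and the 1.96 cutoff (a normal approximation) are all assumptions I picked for illustration. The true effect of d on y is zero, every candidate control is pure noise, and the "researcher" either reports the no-controls regression or searches over every subset of controls and reports the most significant one.

```python
import itertools
import numpy as np

def t_stat_on_d(y, d, controls):
    """t-statistic on d from OLS of y on an intercept, d, and the given controls."""
    X = np.column_stack([np.ones(len(y)), d] + controls)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(42)
n, k, n_sims = 50, 5, 500  # hypothetical sample size, controls, replications

# Every subset of the k candidate controls; the first one is the empty set.
subsets = [s for r in range(k + 1) for s in itertools.combinations(range(k), r)]

honest_hits = 0
searched_hits = 0
for _ in range(n_sims):
    d = rng.normal(size=n)
    Z = rng.normal(size=(n, k))   # candidate controls, unrelated to y and d
    y = rng.normal(size=n)        # the true effect of d on y is zero
    t_stats = [abs(t_stat_on_d(y, d, [Z[:, j] for j in s])) for s in subsets]
    honest_hits += t_stats[0] > 1.96      # one pre-chosen spec: no controls
    searched_hits += max(t_stats) > 1.96  # best spec out of all subsets

honest_rate = honest_hits / n_sims
searched_rate = searched_hits / n_sims
print(f"false positive rate, single specification: {honest_rate:.3f}")
print(f"false positive rate, best of {len(subsets)} specifications: {searched_rate:.3f}")
```

Because the search includes the no-controls specification, the searched rate can never fall below the single-specification rate. How much higher it ends up depends on the assumed setup.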
In past blog entries, I've written about how economics papers have gotten longer and longer over the years and how referees often help write the paper instead of just 'refereeing'. But now that I see that 40% figure, I think maybe these are not such horrible developments. If you only have to report one specification, it's not so hard to tinker your way to statistical significance. If you also have to report the many specifications suggested by referees, it's not impossible, but it's certainly a lot harder.
Economists have been worried about the issue of control variables recently. I really like Marc's description of two recent papers:
(1) y = a + bX + cD + e
"The issue of what goes on the RHS of equation (1) is getting a lot of attention in the applied literature. Two prominent examples are Emily Oster’s forthcoming JBES article “Unobserved Selection and Coefficient Stability: Theory and Evidence” and Pei, Pischke, and Schwandt’s (2017) NBER working paper titled “Poorly Measured Confounders are More Useful on the Left than on the Right.”
Oster provides a method to assess just how much coefficient (as in coefficient c in equation 1) stability tells us about selection on unobservables. Pei et al. develop a test of identifying assumptions that treats putative additional controls as dependent variables in equation (1).
I expect both methods to become part of the applied econometrician’s toolkit over the next five to 10 years. At the very least, I expect a bare-bone regression of y on D alone to become something that has to be included in a paper, along with a discussion of why the controls that were included on the RHS of equation (1) were retained for analysis."
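To make equation (1) concrete, here is a minimal sketch on simulated data. It does not implement Oster's method or the Pei et al. test. It only shows the two raw ingredients Marc describes: how the coefficient c on D moves between the bare regression and the regression with the control X, and what happens when the control is put on the left-hand side and regressed on D. The data-generating process, the parameter values, and the `ols` helper are all assumptions made for illustration.

```python
import numpy as np

def ols(y, regressors):
    """OLS with an intercept; returns (coefficients, classical standard errors)."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=n)                             # control; a confounder here
D = 0.5 * X + rng.normal(size=n)                   # variable of interest
y = 1.0 + 2.0 * X + 1.0 * D + rng.normal(size=n)   # equation (1) with a=1, b=2, c=1

b_bare, se_bare = ols(y, [D])      # bare-bones regression of y on D alone
b_full, se_full = ols(y, [X, D])   # equation (1): coefficients ordered a, b, c
b_bal, se_bal = ols(X, [D])        # control on the left-hand side, D on the right

print(f"c without controls: {b_bare[1]:.3f} (se {se_bare[1]:.3f})")
print(f"c with X included:  {b_full[2]:.3f} (se {se_full[2]:.3f})")
print(f"X on D:             {b_bal[1]:.3f} (t = {b_bal[1] / se_bal[1]:.1f})")
```

In this made-up setup X is correlated with both D and y, so the bare coefficient on D is biased away from the true value of 1, and the regression of X on D flags the imbalance. With a control unrelated to D, the two estimates of c would be close and the last regression would show nothing.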
(h/t David McKenzie)