Friday, September 27, 2019

Another PSA: Always Check Your Code

...or better yet, have someone else check your code. Even better, build checks into your code to help you catch mistakes.

So someone made a mistake. Instead of controlling for country of origin fixed effects, the person "controlled for" country of origin by including the country code as a continuous variable. Ouch! I cringed when I read this because I have actually seen this mistake being made. It's an easy mistake to make: instead of typing "i.country" into Stata, you just type "country". I feel for the paper's authors.
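To make the difference concrete, here is a minimal sketch using Stata's built-in auto dataset rather than the paper's data. The variable rep78 (a 1-5 repair rating) stands in for the country code, and the assert at the end is just one assumed way of building a check into the code, not something taken from the paper or its replication files.

```stata
* Illustration only: rep78 (a 1-5 category) stands in for the country code
sysuse auto, clear

* The mistake: the category code enters as a single continuous regressor
regress price mpg rep78

* What was intended: i. expands the code into one indicator per category
regress price mpg i.rep78

* A built-in check: the model should use mpg plus (categories - 1) indicators
quietly tabulate rep78 if e(sample)
assert e(df_m) == 1 + (r(r) - 1)
```

If someone later drops the "i." prefix, the model degrees of freedom fall to 2, the assert fails, and the do-file stops with an error instead of quietly producing the wrong estimates.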

But it's one thing to catch a typo in the early stages of research (especially for graduate students who are just learning to code), and quite another for it to be caught after a study has received significant media attention. The paper was about whether religiosity promotes generosity. See the description here.

And now my plea to journals: Please require code to be made available for all published papers. This is not only a way for mistakes to be caught quickly, but it also gives authors stronger incentives to write better, nicely organized code.

But now a question: What about working papers? Papers often get significant media attention even before they're published in a journal. Requiring code for publication doesn't help if all of the coverage happens before publication. You may say that journalists shouldn't write about working papers, but I'm not sure journalists should necessarily wait until publication given how long it takes for a paper to get through the referee process.

So maybe another plea to the journals: speed up the referee process. I'd be happy to be given less time to write my referee reports in exchange for prompter reports on my own submissions.

Stata Tip: How to Make a GIF of a Graph

Unfortunately, we can't put moving pictures in manuscripts, but they are excellent to use in seminars and tweet storms of your papers. Job market candidates, take note! For how to make these in Stata, see here. (h/t David McKenzie)

Wednesday, September 25, 2019

PSA: Always Read the Codebook

As you may know, I have been doing immigration research for many years, mostly using Census/ACS data. Sometimes I select the immigrant sample based solely on country of birth, but sometimes I drop from the sample those born abroad to American parents or those born in U.S. territories. It turns out, however, that if we want to use (1) 1980 Census data and (2) the years in the U.S. variable, we really need to drop those born abroad of American parents. Why? Because in 1980, the year of immigration variable is only available for "foreign-born persons who were not citizens at birth." See the relevant section of the codebook.

Conclusion for those doing immigration research using 1980 U.S. Census IPUMS data: Use the citizen variable to select the immigrant sample if you want to control for years in the United States (or year of migration). 
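As a rough sketch of that selection rule, the snippet below builds a tiny made-up dataset and keeps only the foreign born who were not citizens at birth. The variable names citizen and yrimmig and all of the code values are assumptions for illustration (in the spirit of this post, check them against the codebook for your own extract).

```stata
* Toy data, made up for illustration; a real extract would come from IPUMS
clear
input byte citizen int yrimmig
0 0
1 0
2 1965
3 1975
end

* Assumed citizen codes (verify in the codebook): 0 = N/A, 1 = born abroad
* of American parents, 2 = naturalized citizen, 3 = not a citizen
* Assumed yrimmig code: 0 = N/A
keep if inlist(citizen, 2, 3)

* Check: everyone left in the sample has a year of immigration
assert yrimmig > 0 & !missing(yrimmig)
```

The assert is there so that, if the sample is ever selected some other way (say, on birthplace alone), the do-file stops as soon as someone without a year of immigration slips in.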

Conclusion for everyone else: Read the codebook carefully! 

Saturday, September 21, 2019

Stata Tips: Two-way clustering and tabulating with labels

Two Stata tips for you today:


  1. When you're doing two-way clustering with the reghdfe command (one of my very favorite Stata discoveries!), order matters...at least when you feel the need, the need for speed. :) Cluster first on the variable with more unique values. See comments in this Twitter thread.
  2. And David McKenzie tells us that this is what we should do right now: Open up Stata and type "ssc install fre".  With the 'fre' package, you can look at the values and their labels when tabulating data. No more "tabulate x" followed by "tabulate x, nolabel".  Get all of the information you need in one easy step! 

Example output of Stata's fre command
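Both tips can be tried in a few lines. This is only a sketch on Stata's example datasets, assuming an internet connection for ssc and webuse; the regression itself is arbitrary and is there just to show the ordering of the cluster variables.

```stata
* One-time installs from SSC (reghdfe also needs ftools)
ssc install ftools, replace
ssc install reghdfe, replace
ssc install fre, replace

* Tip 1: two-way clustering. idcode (person) has far more unique values
* than ind_code (industry), so it is listed first
webuse nlswork, clear
reghdfe ln_wage tenure, absorb(year) vce(cluster idcode ind_code)

* Tip 2: values and their labels in a single table
sysuse auto, clear
fre foreign
```

Swapping the order to vce(cluster ind_code idcode) should leave the estimates unchanged; per the thread, the difference is in run time, which you would only notice on a much larger dataset than this one.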