Saturday, March 30, 2019

Cool Data Alert: Administrative Tax Data

I think that by now, everyone has heard of the pretty amazing findings of Raj Chetty and his coauthors regarding social mobility in the U.S. and how it depends on where people live. How were they able to learn so much about the world? Administrative data from the IRS. I have on more than one occasion sat dumbfounded while looking at the graphs and figures that have come out of the Opportunity Insights project. 

But how did a pair of academics get access to these data? This article tells the story. 

You may be more interested in how you can get access to tax data. The bad news is that it is still quite tricky to get IRS data. The good news is that tax data from other countries seems quite "get-able". See this blog post explaining the different types of tax data, where to access the data, and a bit on the practical steps towards acquiring access. 

My guess is that different countries collect different types of information on their tax forms. You may be able to answer some very important questions by looking into what is available in different countries. 

And if you're dead set on studying the U.S., the Opportunity Insights project provides quite a bit of aggregate data for anyone to download and use. 


Saturday, March 23, 2019

And More on Those Pesky P Values

I never really thought about statistics/econometrics being such a political subject, but I guess it is. A recent article in Nature comes with over 800 signatories rising up against statistical significance!

There is much to like about the article. Many things to think about. I do often see students in my office really excited about a placebo regression with statistically insignificant estimates--despite the fact that the point estimates are just as large as (or larger than!) those in a baseline regression. That's not exactly what I want to see in placebo regressions. I've also seen people really excited when they get stars even though the point estimates are just too big to be believable!

I think ignoring estimate magnitudes can be a mistake when trying to write good papers. Paying too much attention to those little stars also makes for bad science. One of my favorite quotes from the article:

"Statistically significant estimates are biased upwards in magnitude and potentially to a large degree, whereas statistically non-significant estimates are biased downwards in magnitude. Consequently, any discussion that focuses on estimates chosen for their significance will be biased. On top of this, the rigid focus on statistical significance encourages researchers to choose data and methods that yield statistical significance for some desired (or simply publishable) result, or that yield statistical non-significance for an undesired result, such as potential side effects of drugs — thereby invalidating conclusions."

So what does the article recommend?

"...we recommend that authors describe the practical implications of all values inside the interval, especially the observed effect (or point estimate) and the limits."

I definitely think that's a great idea. Think about what the estimates mean in the specific context of your paper. For some questions, a wide interval of potential estimates is still interesting. For other questions, maybe not.

Am I ready to abandon discussion of statistical significance altogether? Maybe not yet. Those stars are a nice and easy way to gauge how confident we should be that there is enough variation in the data to learn something about the world. Sure, thresholds may not be ideal for many reasons, but they do provide a quick way to make comparisons across studies.

So, how about this? Let's keep the stars but maybe report p values instead of standard errors? Would that be so crazy? And I'm all for pictures of estimates with confidence intervals around them.
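To make that concrete, here is a minimal Stata sketch (it uses the built-in auto dataset, so the variables are just stand-ins for whatever is in your paper). After an estimation command, the point estimate, confidence limits, and p value all sit in the returned matrix r(table), so reporting all three costs you nothing:

    sysuse auto, clear
    regress price mpg weight

    * Everything worth discussing is already in r(table):
    * point estimates, confidence limits, and p values.
    matrix T = r(table)
    matrix list T

    * Pull out the pieces for mpg explicitly.
    display "point estimate: " T["b","mpg"]
    display "95% CI: [" T["ll","mpg"] ", " T["ul","mpg"] "]"
    display "p value: " T["pvalue","mpg"]

    * For pictures of estimates with confidence intervals, the
    * user-written coefplot command (ssc install coefplot) is one option.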

The authors of the article hope that abandoning statistical significance will get people to spend less time with statistical software and more time thinking. I'm up for that!

P.S.
I have had this song in my head the entire time writing this post: https://www.youtube.com/watch?v=79ZLtr-QYNA. Enjoy!

Friday, March 15, 2019

Why IVs Can Be Really Tricky---Even the Good Ones!

After several discussions with a colleague last week, it has come to my attention that I may be more critical of instrumental variables approaches than the typical applied microeconomist. To be clear, I'm not talking about the use of IVs within RD or RCT designs. I'm talking about your standard IV paper. And I'm not even sure if the phrase "critical of" is the right one to use. After all, I use IVs in several of my papers. Maybe a better phrase would be "cautious about" or even "careful when using"...

In any case, there are some IVs that I tend to really like. One example: "judge-leniency" IVs. In a recent blog post, David McKenzie explained the basic idea behind these IVs with an example from Kling (2006, AER). Imagine you want to know the impact of incarceration length on subsequent labor market outcomes. It's almost impossible to answer this question with standard OLS approaches because, as David writes, "people who get longer prison sentences might be different from those who get given shorter sentences, in ways that matter for future labor earnings." What to do? Exploit the fact that some judges are more lenient than others when sentencing. This means that people who, by pure luck, end up with a lenient judge will have a shorter sentence for reasons that have nothing to do with them. Pretty believable, right? At least, I buy it. And the excellent news is that this basic approach can be used in many different scenarios with different types of "judges" (see the blog entry for examples). 

But it turns out that even with this really great IV, there are still problems, beyond the most obvious one: you need access to lots of administrative data to pull it off. First, you need to really know the institutional details of assignment: are the judges truly randomly assigned? Second, there is the exclusion restriction: even if the judges are randomly assigned, are we sure that the only way they affect outcomes is via your variable of interest? Third, there is the monotonicity assumption, something we do not typically have to worry about in other IV contexts: roughly, a judge who is more lenient toward one defendant cannot be harsher toward another. Again, read the blog entry for more details. 

For now, I will leave you with this. IV approaches can often allow you to answer really important questions in very precise ways. I will certainly not tell you to omit the IV estimates from your paper. I want to see those numbers. It's just that, when writing up your results, I strongly urge you to be very clear about where identification is coming from. As such, my preference is usually to focus on the reduced form estimates instead of the IV estimates. The reduced form is really where the magic is--IV estimates are just an interesting way to (potentially) interpret those reduced form estimates (but only under certain assumptions). 
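If it helps to see the moving parts, here is a hedged Stata sketch of the judge-leniency design. All of the variable names (earnings, sentence_length, judge_leniency, court_year, judge_id) are hypothetical, not from Kling's actual data. The point is just that the reduced form, the first stage, and the IV are three separate regressions, and only the last one leans on the exclusion and monotonicity assumptions:

    * Reduced form: effect of drawing a more lenient judge on earnings.
    * This is where the identification lives.
    regress earnings judge_leniency i.court_year, vce(cluster judge_id)

    * First stage: does leniency actually move sentence length?
    regress sentence_length judge_leniency i.court_year, vce(cluster judge_id)

    * IV/2SLS: rescales the reduced form by the first stage. Interpreting
    * it as the effect of sentence length requires exclusion and monotonicity.
    ivregress 2sls earnings (sentence_length = judge_leniency) i.court_year, vce(cluster judge_id)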

For a discussion of the problems with an IV that I often use, see here.  For a more sympathetic view of IV approaches, I urge you to get in touch with my colleague, Jorge Agüero, who has developed some of his own really cool IVs.

Saturday, March 9, 2019

A Reminder: Best Coding Practices

You are human. You will make mistakes. You are human. You will forget things. 

Maybe my best advice is just to recite those four short sentences out loud every time you open Stata. But what does your fallibility imply for how you should code? Tal Gross has some excellent rules of thumb. I will summarize them below in case the link ever stops working, with a bare-bones do-file sketch after the list.
  1. Use sensible names for variables and do-files. For example, instead of calling a new variable "sex", call it "female". 
  2. Comment everything! //You won't regret it! 
  3. Make code readable. Put spaces before and after "+" and never ever put anything after a { or }. Go to the next line immediately. 
  4. Create sections with ***************. 
  5. Make code portable by making appropriate use of folders. 
  6. Check your work. No, this doesn't mean reading lines and lines of code over and over again. It means things like summarizing variables right after creating them. Anything suspicious? 
  7. Use a template. 
  8. Preserve source data. Never ever change the original data source ever. Create new data files. 
  9. Don't repeat yourself. Speaking of repeating myself, many of these tips are sounding familiar. I think I have blogged about this before. Yes, I have! Read here. That's OK. These tips are worth repeating. Lines of code are not!
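To tie several of these rules together, here is the promised do-file sketch. It is only an illustration: the folder structure, file names, and variables are all made up.

    **************************************************
    * 0. Setup: portable paths (rule 5), one line to edit per machine
    **************************************************
    clear all
    global project "C:/projects/my_paper"   // edit this line only

    **************************************************
    * 1. Build: read the source data, never overwrite it (rule 8)
    **************************************************
    use "$project/raw/survey_2019.dta", clear   // raw file stays untouched

    // Sensible name (rule 1): "female" says what a 1 means; "sex" does not
    gen byte female = (sex == 2) if !missing(sex)

    // Check your work right away (rule 6): anything suspicious?
    summarize female
    tab sex female, missing

    save "$project/clean/analysis_sample.dta", replace   // a new file, not the raw one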

Friday, March 1, 2019

"I Just Got a Journal Rejection. Now Where Should I Send My Paper?"

I feel like I have said those exact words so many times to so many people. To all of my former advisors, colleagues, friends, random people at conferences, family members, neighbors, etc. who have helped me think through this, thank you. Today I guess I paid it forward (a bit) by having this exact conversation with a former graduate student of mine. I decided this deserves a blog entry. 

The good news is that there are people out there who have thought about this more carefully than I ever have (well, at least there is one person). Who, you ask? Her name is Tatyana Deryugina, and she gives excellent advice. I recommend that you read her blog regularly. 

Step 1: What to do after a rejection? The first thing to note is that rejection is part of the publication game. I will add that, unless you're always publishing in the very top journal(s), if you're not getting rejections, then you're not aiming high enough. 

My favorite piece of advice she gives: "It can be tempting to either (1) ignore the reports completely and send the paper back out as soon as possible or (2) treat the reports as a revise-and-resubmit and try to address all the reviewer’s comments. Neither approach is generally a good idea..." Read her blog for details on why, but she is exactly right. I have made both mistakes in the past. 

I will also add that when you first start a tenure-track job, people will encourage you to send your paper to the very top journals. I think this is excellent advice in general, but as the tenure clock keeps ticking, be careful. You do not want to skip over perfectly good top field journals simply because your paper spent too much time bouncing from top journal to top journal and you have run out of time and need a publication right away. You also don't want to skip the appropriate journals just because you're emotionally exhausted from all of the rejections. Again, aim high! At least to start. But be aware of the risks. 

Step 2: Where should I send my paper next? Basically, the answer is to figure out where similar papers have been published recently and send it there. Click on the link for practical tips on how to systematically do this. 

Good luck! 
