Sunday, October 25, 2020

Cool Data Alert: Historical Data

I haven't done any work using historical data, but I must say I am intrigued! This IZA paper describes some of the data sources most commonly used (well, by economists). They are broadly classified as geographical data, ethnographic data (really cool!!!) and censuses. For each group, the authors "outline the issues they raise and also point out which methodological advances allow economists to overcome or minimize these problems."

If you're a history buff and you're looking for a dissertation topic, take a peek. 

Thursday, October 15, 2020

Stata Tip: Quick and Easy Code for Making Plots

Yes, I've said this. A picture is worth a thousand words. Or in our case, a picture is worth a thousand numbers in a table. The DIME folks at the World Bank have just put together a super useful resource on how to quickly code all of the different types of plots you may want to make. Click on the plot you want to make and up comes the code that is used to make it! No need to spend hours googling how to do this for each potential plot. What an amazing public good!



P.S. I put the pictures up just to make my blog pretty. To get the actual code, click on the link above. Or here.

Saturday, October 10, 2020

10 Commandments for How to Give a Seminar

Kjetil Storesletten gave a talk with the commandments for giving a seminar--maybe specifically a job talk. You can see the slides here and watch the talk here.

Stating the question you are asking very clearly and at the beginning of the talk is so important (Commandment 2). Yes, this is of course always important but maybe especially important during an economics talk so that you don't keep getting interrupted with questions that are not really relevant to the question you are trying to answer. The fewer of those "out there" questions you get, the more likely you will be to have well prepared answers to the questions (see Commandment 10). 

I would also emphasize Commandment 4: Show the value-added of your paper. Sometimes newbies want to 'over-emphasize' their work by 'de-emphasizing' prior related work. This is not only dishonest, but you may find yourself fighting unnecessary battles. For example, if some well-respected economist has already used your identification strategy to study some other outcome and that paper is now published in a solid journal, then your audience may not make you work as hard to defend that broad identification strategy. Instead, they will focus their questions on why it may (or may not) be appropriate for your particular application. This is probably a much easier battle to fight and it is the battle you should be prepared for. 

Good luck! 

Thursday, October 8, 2020

Stata Tip: Best Advice on Writing Dofiles

Yes, I know you're excited to see the results of your regression. Go ahead and be sloppy with your coding. You probably will make mistakes. You'll go back to fix them. Maybe that's fine. But at some point, go through these J-PAL instructions and guidelines on how to clean your data. The big rules:

  1. Document decisions
  2. Never overwrite the original/raw data file

Other gems include: Look at the distribution of every variable you use in your analysis. (Do you have 500-year-olds? Are the missing values set to 99?) Do you see anything suspicious? The more you know about the data, the better.

Also, use Stata's help command to learn more about "mvdecode" and "subinstr." And remember to rename variables so that you can tell what they are by looking at the variable name (hint: a dummy variable called "male" is more helpful than one called "sex"). Label the values so that you don't have to keep going back to the codebook.
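Here is a minimal sketch of those cleaning habits, written in plain Python so the logic is easy to follow line by line. Everything in it is made up for illustration: the records, the variable names, the assumption that 99 is the missing-value code for "educ", and the 0 to 120 age range. In Stata, the recoding step is the kind of job mvdecode is for, and the string cleanup is the kind of job subinstr() is for.

```python
# Hypothetical raw records. 99 is ASSUMED to be the missing-value code for educ.
RAW = [
    {"id": 1, "age": 34, "sex": "M ", "educ": 12},
    {"id": 2, "age": 500, "sex": "f", "educ": 99},
    {"id": 3, "age": 61, "sex": "F", "educ": 16},
]
MISSING_CODE = 99
MISSING_CODED_VARS = ("educ",)  # list only variables the codebook says use 99


def clean(records):
    """Return cleaned copies of the records; the raw records are never modified."""
    cleaned = []
    for rec in records:
        row = dict(rec)  # work on a copy (rule 2: never overwrite the raw data)
        # Decision documented here (rule 1): 99 means "missing", not 99 years of school.
        for var in MISSING_CODED_VARS:
            if row[var] == MISSING_CODE:
                row[var] = None
        # Strip stray spaces and fix case, then replace "sex" with a dummy
        # whose name says what a 1 means.
        sex = row.pop("sex").replace(" ", "").upper()
        row["male"] = 1 if sex == "M" else 0 if sex == "F" else None
        cleaned.append(row)
    return cleaned


def suspicious_ages(records, low=0, high=120):
    """Return the ids of records whose age falls outside a plausible range."""
    return [
        r["id"]
        for r in records
        if r["age"] is not None and not low <= r["age"] <= high
    ]


CLEAN = clean(RAW)
for row in CLEAN:
    print(row)
print("Check these ids by hand:", suspicious_ages(CLEAN))
```

Note that the 500-year-old is flagged rather than silently dropped. What to do with that record is a decision, and decisions get documented.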

Tuesday, October 6, 2020

Message to My Students

I just saw this on Twitter, and it is absolutely true. I forget to say it, I know, but I am very often very impressed with your work. So impressed that I want to help make it as great as it can possibly be. Please do keep sending me your paper drafts. Do the best job that you can. Fix the typos you catch, and remember to include a date and page numbers. But remember that I don't expect perfection in your drafts. After all, they are drafts.


Friday, September 25, 2020

On Nonacademic Jobs

Yes, I know that most of you decided to get a PhD specifically because you had an academic career in mind. But while you were all exposed to academic careers to some degree in college, you might not know about the many fulfilling jobs you could have (with that PhD in hand) in the private sector, government sector, etc. I will not tell you which is best for you, but I will tell you to seriously consider all of these options when sending out applications. Read about what everyday life looks like in the different types of jobs. Think about how to prepare for the job market if you're particularly excited about a nonacademic career.

In this morning's blog entry, David McKenzie points us to many different and useful resources focusing on jobs for development economists. Have a look here.

Thursday, September 17, 2020

Are You Really Controlling for that Variable?

Yes, we love RCTs and RDs (and maybe sometimes IVs and diff-in-diffs, I guess), but remember that the tried-and-true way to get at causal estimates is just good old-fashioned controlling for omitted variables. From a "research is so cool" perspective, it's quite fascinating to see an estimate of interest drop suddenly in response to adding an important control variable. From an "I won't get this published unless I have those stars" perspective, you often hope this doesn't happen. And when is it least likely to happen? When you're sloppy in constructing those control variables or when the variables themselves only imperfectly measure the true omitted variable.

So what happens when you add a variable measured with error as a control variable in your model? Supplysideliberal.com explains it all here with excellent intuition as well as matrix algebra! Who could ask for anything more? I've copy-pasted the important bit here:

"Compare the coefficient estimates in a large-sample, ordinary-least-squares, multiple regression with (a) an accurately measured statistical control variable, (b) instead only that statistical control variable measured with error and (c) without the statistical control variable at all. Then all coefficient estimates with the statistical control variable measured with error (b) will be a weighted average of (a) the coefficient estimates with that statistical control variable measured accurately and (c) that statistical control variable excluded. The weight showing how far inclusion of the error-ridden statistical control variable moves the results toward what they would be with an accurate measure of that variable is equal to the fraction of signal in (signal + noise), where “signal” is the variance of the accurately measured control variable that is not explained by variables that were already in the regression, and “noise” is the variance of the measurement error."

But now for some practical advice from me: When you add an important control variable to your model, be sure to show its estimated coefficient in the table (vs. just having an X signifying that you controlled for it). Why? If you know that variable is an important omitted variable, but its estimated coefficient is close to zero and not statistically significant, the culprit may be measurement error. If that's the case, then we shouldn't be surprised that adding this poorly measured variable doesn't change the estimated coefficient of interest. On the other hand, if you can show that the control variable has its expected impact on the outcome AND it doesn't budge your estimate of the coefficient of interest, you're probably good!  
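Both the quoted weighted-average result and the "show the control's coefficient" advice can be checked with a little arithmetic. The sketch below is not from the blog post. It assumes a made-up population model, y = BETA*x + GAMMA*z + error, where z is the control we only observe with noise, and it works with population variances and covariances (the large-sample case) instead of simulated data. All numbers are illustrative.

```python
import math

# ASSUMED population model (illustrative numbers only):
#   y = BETA * x + GAMMA * z + error,  observed control = z + measurement noise
BETA, GAMMA = 1.0, 2.0
VAR_X, VAR_Z, COV_XZ = 1.0, 1.0, 0.5

# Covariances of y with each regressor implied by the model.
COV_XY = BETA * VAR_X + GAMMA * COV_XZ
COV_ZY = BETA * COV_XZ + GAMMA * VAR_Z  # noise is assumed unrelated to x, z, y


def with_control(var_noise):
    """Large-sample OLS coefficients (on x, on the noisy control)."""
    var_control = VAR_Z + var_noise  # measurement error inflates the variance
    det = VAR_X * var_control - COV_XZ ** 2
    b_x = (var_control * COV_XY - COV_XZ * COV_ZY) / det
    b_control = (VAR_X * COV_ZY - COV_XZ * COV_XY) / det
    return b_x, b_control


def without_control():
    """Large-sample OLS coefficient on x when the control is left out."""
    return COV_XY / VAR_X


def signal_weight(var_noise):
    """Fraction of signal in (signal + noise), as defined in the quote."""
    signal = VAR_Z - COV_XZ ** 2 / VAR_X  # variance of z not explained by x
    return signal / (signal + var_noise)


b_accurate = with_control(0.0)[0]  # case (a): control measured accurately
b_excluded = without_control()     # case (c): control left out

for var_noise in (0.0, 0.75, 7.5):
    b_x, b_control = with_control(var_noise)  # case (b)
    w = signal_weight(var_noise)
    weighted = w * b_accurate + (1 - w) * b_excluded
    assert math.isclose(b_x, weighted)  # the quoted weighted-average result
    print(f"noise variance {var_noise}: weight = {w:.2f}, "
          f"coefficient of interest = {b_x:.2f}, "
          f"control coefficient = {b_control:.2f}")
```

In this setup the true effect of x is 1 and the estimate without the control is 2. As the noise grows, the coefficient of interest drifts back toward 2 while the control's own coefficient shrinks toward zero. That second number is what you lose sight of when the table shows only an X for "controlled."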

Conclusion 1: Minimize measurement error by coding carefully. 

Conclusion 2 (copied from the blog): "I strongly encourage everyone reading this to vigorously criticize any researcher who claims to be statistically controlling for something simply by putting a noisy proxy for that thing in a regression. This is wrong. Anyone doing it should be called out, so that we can get better statistical practice and get scientific activities to better serve our quest for the truth about how the world works."