Friday, December 11, 2015

Clustering When Few Clusters

Clustering in Stata when you have many clusters is super easy. Sometimes depressing, but super easy. But what if you have few clusters? Then what? Good news: There's a Stata package for that! Woo hooo!!! Click here.

In other clustering news, it is true that you don't have to cluster when you have a true randomized experiment. Read this lament of clusterjerks and this response to it. Even for those of us not running RCTs, I think reading about this issue helps us understand the importance of clustering.


Friday, December 4, 2015

People Will Forget Your Regression Tables But They'll Remember Your Pictures

Take time to think about how to present your results graphically in a way that best tells your story. Context is important. Whether the y-axis includes zero? Maybe not so much. See this video.


Sunday, November 22, 2015

Commuting Zones, Occupation Codes, Occupational Tasks, and More...Thank You, David Dorn!

MSAs have gone out of fashion. Commuting Zones are all the rage these days! 


David Dorn has graciously provided cross-walks for PUMAs and counties to commuting zones on his webpage. Super useful! On behalf of labor economists everywhere, I thank you! 

Here is a short abstract: Commuting Zones (CZs) provide a local labor market geography that covers the entire land area of the United States. CZs are clusters of U.S. counties that are characterized by strong within-cluster and weak between-cluster commuting ties. The crosswalk files below provide a probabilistic matching of sub-state geographic units in U.S. Census Public Use Files to CZs.

And the webpage is linked here.

PS

Use this code with the IPUMS data so that you can use the crosswalk: 
* Build puma2000: the state FIPS code followed by the PUMA code, zero-padded to four digits
gen puma2000 = real(string(statefip) + string(puma, "%04.0f"))
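If you are not working in Stata, the same identifier can be built in a couple of lines of Python. This is only a sketch of what the Stata line does (state FIPS code glued to a four-digit, zero-padded PUMA code); the function name and the example values are made up for illustration.

```python
def puma2000_id(statefip: int, puma: int) -> int:
    """Concatenate the state FIPS code with the PUMA code zero-padded to four digits."""
    return int(f"{statefip}{puma:04d}")


# Hypothetical example values, not taken from any particular IPUMS extract.
print(puma2000_id(6, 101))
print(puma2000_id(36, 3801))
```

Going through a string rather than arithmetic mirrors the Stata version, so the two should agree row by row.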



Wednesday, October 28, 2015

Clustering. It's Kind of a Big Deal.

By now, all of us know that we need to cluster standard errors. We also know it's easy enough to do in Stata. But what are we actually doing when we add that little command to the end of our regress statement? Find out all you need to know here (working paper version here).

My favorite clustering haiku, written by Keisuke Hirano (the original version had robust standard errors instead of clustered ones). I found this version in Mostly Harmless:

T-stat looks too good.
Use clustered standard errors--
significance gone.
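To see the haiku in action, here is a minimal sketch of what the cluster option is doing for a regression with one regressor. It assumes the standard sandwich formula: sum the scores (demeaned x times residual) within each cluster before squaring them. No finite-sample correction is applied, whereas software packages typically scale the clustered variance up a bit, and the toy data (six clusters, with x and the error both constant within a cluster) are invented for illustration.

```python
from collections import defaultdict
from math import sqrt


def slope_with_ses(x, y, cluster):
    """OLS slope of y on x (with intercept), plus conventional and cluster-robust SEs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * yi for d, yi in zip(x_dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional SE: treats the errors as independent and homoskedastic.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = sqrt(s2 / sxx)

    # Cluster-robust SE: add up x_dev * resid within each cluster first,
    # then square, so within-cluster correlation is allowed.
    score = defaultdict(float)
    for d, e, g in zip(x_dev, resid, cluster):
        score[g] += d * e
    se_cluster = sqrt(sum(s * s for s in score.values())) / sxx
    return slope, se_conventional, se_cluster


# Made-up data: 6 clusters of 5 observations, one common shock per cluster.
shocks = [1.0, -2.0, 1.0, 1.0, -2.0, 1.0]
x, y, cluster = [], [], []
for g, shock in enumerate(shocks):
    for _ in range(5):
        x.append(float(g))
        y.append(1.0 + 0.5 * g + shock)
        cluster.append(g)

slope, se_conv, se_clu = slope_with_ses(x, y, cluster)
print(f"slope={slope:.3f}  t(conventional)={slope / se_conv:.2f}  t(clustered)={slope / se_clu:.2f}")
```

In this toy example the 30 observations really carry only six independent pieces of information, so the conventional t-statistic of about 3.2 drops to about 1.6 once the standard error is clustered.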



Sunday, October 18, 2015

When You Read Someone's Paper...

This is a quote about buying art by Rebekah Joy Plett. I think the same can be said about papers...especially job market papers.

Friday, October 2, 2015

Why It's So Important to Visualize Your Data..


..or add squared terms or splines to your empirical specifications.

All four of these data sets have the same linear regression line! Read more here.
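You can check the claim yourself. The sketch below assumes the four data sets in the picture are Anscombe's quartet (that is an assumption on my part; the values are the ones usually reported for the quartet) and fits a simple regression line to each.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Anscombe's quartet (assumed to be the data behind the figure).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = [
    (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
]

fits = [ols_line(x, y) for x, y in quartet]
for number, (intercept, slope) in enumerate(fits, start=1):
    print(f"data set {number}: y = {intercept:.2f} + {slope:.2f} x")
```

All four fits come out at roughly y = 3 + 0.5x, which is exactly why the regression output alone cannot tell you that one data set is curved and another is driven by a single outlier.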

Monday, September 28, 2015

Oh No! I Got the Wrong Sign! What Should I Do?

Graduate students, this paper has a handy dandy checklist of potential explanations for "the wrong sign" in empirical research. A related issue is an estimate which is way too big to be believable. I really like the discussion of data mining. When is it a sin and when is it a way to understand the world using data? I also really like the conclusion that many times a wrong sign is a blessing, not a curse. It forces us to really think hard about our theory, data, identification strategy, etc. Hmm...maybe all of our papers would be better if we always started off by getting the wrong sign. Maybe a good rule of thumb is to look back at this checklist not only when you get the wrong sign, but also when you happen to get the right one of reasonable magnitude.

Friday, September 11, 2015

Stop Trying to Be Clever and Just Be Clear

Writing clearly is such an important part of doing economics. You might have the most brilliant theoretical model or the most dazzling empirical findings, but if readers don't know what you're talking about, these things won't matter at all. Take the time to write well. Here is some advice on how.

Wednesday, September 9, 2015

Manove's Dissertation Advice

This is some excellent advice for how to complete a dissertation from my friend, Michael Manove, at BU. Some of the names/people are specific to BU, but the general ideas are useful for all econ students. Not only are the suggestions insightful, but he's pretty hilarious!

My favorite lines:

Michelangelo wrote: “Every block of stone has a statue inside it, and it is the task of the sculptor to discover it.”  Likewise, after years of coursework, after reading the newspapers, after listening to professors, preachers and politicians, you have a good idea for a dissertation topic somewhere inside your brain. 

If you plan to write an empirical thesis, and most of you should, you will need data.  The most important thing you need to know about data is that the word “data” is plural.  If you accidentally say “this data” instead of “these data” you won’t sound like the pompous scholar that you want to be. 

The main ingredients for success in a PhD program are self-confidence, self-discipline and ambition.  Intelligence has little to do with the process.

Wednesday, September 2, 2015

Empirical Analysis Dominating Economics

Have a look at this link. I'm intrigued by these machine learning techniques. My undergraduate Econometrics professor told us that when doing empirical work, we should always "let the theory guide the way." I think this was and is very useful advice in general. But I remember thinking way back then, "..but what if the theory is wrong?"

Wednesday, August 26, 2015

It's Not So Easy to be LATE

We all know about the four assumptions needed in order to interpret an IV estimate as a local average treatment effect (see below if you've forgotten). But I think we economists tend to focus on the independence assumption (and the first stage), and we sort of pay little mind to the exclusion assumption. It turns out that the exclusion restriction can be a big deal! See here. I really like the simple examples. Challenge to blog readers: Can you think of a real world example of a potential IV that satisfies the first three assumptions but not monotonicity?
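Here is a toy illustration of why exclusion matters. It is a sketch under made-up numbers, not an example from the linked piece: a binary instrument, a population of compliers, always-takers, and never-takers split evenly across the two instrument arms, and the Wald estimator (reduced form divided by first stage). The parameter `direct_effect_of_z` is a hypothetical knob that lets the instrument affect the outcome directly, which is precisely what the exclusion restriction rules out.

```python
# (type, count per instrument arm, untreated outcome, treatment effect); all values invented.
TYPES = [
    ("complier", 4, 10.0, 2.0),
    ("always-taker", 2, 20.0, 5.0),
    ("never-taker", 2, 5.0, 0.0),
]


def build_sample(direct_effect_of_z=0.0):
    """Return a list of (z, d, y) rows; a nonzero direct effect violates exclusion."""
    rows = []
    for z in (0, 1):
        for kind, count, y0, effect in TYPES:
            d = 1 if kind == "always-taker" or (kind == "complier" and z == 1) else 0
            y = y0 + effect * d + direct_effect_of_z * z
            rows.extend([(z, d, y)] * count)
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_estimate(rows):
    """Reduced form (effect of z on y) divided by first stage (effect of z on d)."""
    reduced_form = mean(y for z, d, y in rows if z == 1) - mean(y for z, d, y in rows if z == 0)
    first_stage = mean(d for z, d, y in rows if z == 1) - mean(d for z, d, y in rows if z == 0)
    return reduced_form / first_stage


late_ok = wald_estimate(build_sample())
late_violated = wald_estimate(build_sample(direct_effect_of_z=0.5))
print(late_ok, late_violated)
```

When exclusion holds, the estimate equals the compliers' effect of 2. A direct effect of only 0.5 pushes it to 3, because the violation gets divided by the first stage (0.5 here), so a weaker first stage makes the same violation look even bigger.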


Friday, July 31, 2015

Nothing screams “GRAD STUDENT!!!” louder than...

Here is a bit more on the LPM vs. nonlinear models issue.

My favorite quote: "Indeed, nothing screams “GRAD STUDENT!!!” louder than an obsession with fancy estimators — usually of the maximum likelihood variety, so probit, logit, tobit, etc., sometimes of the Bayesian variety — instead of with whether one has reasonably identified one’s parameter of interest (via a research design that relies on a plausibly exogenous source of variation), or with whether one’s findings have some reasonable claim at being externally valid (via the use of a representative sample)."

Also liked this: 



There is an unspoken ontological order of importance to things in applied work, which unfortunately goes unspoken in most econometrics classes. That order is roughly as follows:
  1. Internal validity: Is your parameter of interest credibly identified? In other words, are you estimating a causal relationship, or are you merely dealing with a correlation? If the latter, how close can you get to estimating a causal relationship with the best available data and methods?
  2. External validity: Are your findings applicable to observations outside of your sample? Why or why not?
  3. Precision: Are your standard errors right? Have you accounted for things like heteroskedasticity? Did you cluster your standard errors at the right level?
  4. Data-generating process: Did you properly model the DGP? For example, does your estimation procedure account for the fact that, say, your dependent variable is a positive integer, which would require a Poisson or negative binomial regression?




Saturday, July 25, 2015

Should We Just Stop Teaching Probit/Logit Models?

Instead of teaching these models and then teaching students to just use linear models (see Mostly Harmless), could we just skip teaching them? Read this.

Usually, the argument is that it doesn't make a difference. Fine. But sometimes it does. Then what? And how different do estimates need to be in order to even think about this....

Thursday, July 16, 2015

RCT, RD, IV, DiD..Whatever! :)

"O Data, Data! Wherefore Art Thou Missing?"

Here is a nice summary of all of the ways to address the missing data problem. All is great if it turns out that all of these techniques suggest that the missing data is missing at random. But which specification should you use as your baseline: The missing dummy trick or just drop missing observations? I usually just drop missing observations because it's simpler. Regardless of what you choose, it is important to discuss your results from the techniques discussed in the article (even if they suggest that the missings are not missing at random).
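For concreteness, the two baseline options can be sketched as follows. This assumes the usual meaning of the "missing dummy trick" (fill the missing covariate with a constant and add an indicator for missingness to the regression); the function names and the toy rows are hypothetical.

```python
def drop_missing(rows):
    """Listwise deletion: keep only rows with no missing (None) values."""
    return [row for row in rows if None not in row]


def missing_dummy(values, fill=0.0):
    """Missing dummy trick: fill gaps with a constant and flag them with an indicator."""
    filled = [fill if v is None else v for v in values]
    flag = [1 if v is None else 0 for v in values]
    return filled, flag


# Hypothetical toy data: (outcome, covariate) pairs, with None marking a missing covariate.
rows = [(3.1, 12.0), (2.4, None), (4.0, 16.0), (3.3, None), (2.9, 10.0)]

complete_cases = drop_missing(rows)
filled, flag = missing_dummy([x for _, x in rows])
print(len(complete_cases), filled, flag)
```

Dropping keeps three of the five rows, while the dummy approach keeps all five and lets both `filled` and `flag` enter the regression as controls.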

Tuesday, July 7, 2015

Stata Command for Using RD

I don't have a single RD paper, but hopefully in the future, I will. And when that time comes, I want to be prepared with the appropriate funky Stata command. Here it is.

Friday, May 29, 2015

Something to Read Every Now and Then

Favorite piece of advice: Don’t tie up too much of your self-esteem in someone else’s evaluation of your work. See more here

Friday, May 15, 2015

Standard Errors with Population Data

I can't say that I've ever used data on an entire population, but I thought this provided a really nice explanation of what we're doing with empirical work.

Sunday, April 19, 2015

How to Check for Balance

From David McKenzie's blog. Useful for all sorts of applied papers. See comments here.



Tools of the Trade: a joint test of orthogonality when testing for balance

This is a very simple (and for once short) post, but since I have been asked this question quite a few times by people who are new to doing experiments, I figured it would be worth posting. It is also useful for non-experimental comparisons of a treatment and a control group.
Most papers with an experiment have a Table 1 where they compare the characteristics of the treatment and control group and test for balance. (See my paper with Miriam Bruhn for discussion of why this often isn’t a sensible thing to do). Ok, but let’s assume you are in a situation where you want to do this. One approach people use is just to do a series of t-tests comparing the means of the treatment and control group variable by variable. Or they might do this with regressions of the form:
X  = a + b*Treat +e
And test whether b=0.

They might do this for 20 variables, find 1 or 2 are significant at the 5% level, and then say “this is about what we expect by chance, so it seems randomization has succeeded in generating balance”.  But what if we find 3 or 4 differences out of 20 to be significant? Or what if none are individually significant, but the differences are all in the same direction?

An alternative, or complementary approach is to test for joint orthogonality. To do this, take your set of X variables (X1, X2, …, X20) and run the following:
Treat = a + b1*X1 + b2*X2 + b3*X3 + ….+b20*X20 +u
And then test the joint hypothesis b1=b2=b3=…=b20=0
This can be run as a linear regression, with an F-test; or as a probit, with a chi-squared test.

That’s it, very simple. I think people get confused because the treatment variable jumps from being on the right-hand side for the single variable tests to being on the left-hand side for the joint orthogonality test.
Now what if you have multiple treatment groups? You can then run a multinomial logit or your other preferred specification and test for joint orthogonality within this framework, but I’ve not seen this done very often – typically I see people just compare each treatment separately to the control.
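The joint orthogonality test described above can be sketched in plain Python. This is a dependency-free illustration of the linear-regression version: regress Treat on all the X variables and compute the F-statistic for all slopes being zero. The function names and the tiny two-covariate data sets are invented, and turning the statistic into a p-value (by comparing it with an F distribution with k and n-k-1 degrees of freedom) is left out.

```python
def solve(a, b):
    """Solve the linear system a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def joint_orthogonality_f(treat, covariates):
    """F-statistic for H0: b1 = ... = bk = 0 in Treat = a + b1*X1 + ... + bk*Xk + u."""
    n, k = len(treat), len(covariates[0])
    rows = [[1.0] + list(x) for x in covariates]  # prepend the intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * t for r, t in zip(rows, treat)) for i in range(k + 1)]
    beta = solve(xtx, xty)
    ssr_u = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, treat))
    t_bar = sum(treat) / n
    ssr_r = sum((t - t_bar) ** 2 for t in treat)  # restricted model: intercept only
    return ((ssr_r - ssr_u) / k) / (ssr_u / (n - k - 1))


# Made-up covariates (X1, X2) for eight units.
covariates = [(0, 0), (0, 0), (1, 0), (1, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
balanced_treat = [0, 1, 0, 1, 0, 1, 0, 1]    # treatment unrelated to X1 and X2
imbalanced_treat = [0, 0, 1, 1, 0, 1, 1, 1]  # treatment more likely when X1 = 1

f_balanced = joint_orthogonality_f(balanced_treat, covariates)
f_imbalanced = joint_orthogonality_f(imbalanced_treat, covariates)
print(f_balanced, f_imbalanced)
```

With perfectly balanced assignment the F-statistic is zero; in the imbalanced example it is 5, driven almost entirely by X1. Note that Treat sits on the left-hand side here, which is the switch the post says people find confusing.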

What is the Hardest Part of Getting a PhD?

One answer here.

My answer: I struggled quite a bit with qualifying exams and do not want to discount that. As for dissertating, the hardest part was not knowing what exactly I should choose to learn/explore. Which techniques should I focus on? Which problems should I fix? So many potential paths...so easy to go down a useless path for months before learning it wasn't really helpful (at least in the short run).

Saturday, April 11, 2015

Friday, March 20, 2015

How NOT to Bias Your Heterogeneous Treatment Effect Results

I can't stand it when people write about how what's typically done in the literature is wrong without providing a solution. Well, this is not one of those times.

Often we want to know how the effect of a treatment differs depending on people's outcome in the absence of the treatment. For example, do smaller class sizes help improve test scores of the kids who typically score well or those who don't do so well?

The way you may think to answer such a question might lead to biased results, but don't worry, there's a way to fix it. Read here for a summary and here for the actual paper.


Tuesday, March 17, 2015

How to Get a PhD in Five Years

It's called "Out in Five"

Divine Genius Does Not Exist

This is a message for students struggling with their semester papers this spring break. 

Here is the gist of it:  

"Taken together, the stories reveal a pattern for how humans make new things, one that is both encouraging and challenging. The encouraging part is that everyone can create, and we can show that fairly conclusively. The challenging part is that there is no magic moment of creation. Creators spend almost all their time creating, persevering despite doubt, failure, ridicule, and rejection until they succeed in making something new and useful. There are no tricks, shortcuts, or get-creative-quick schemes. The process is ordinary, even if the outcome is not.
Creating is not magic but work."

Friday, February 13, 2015

Writing Good Referee Reports

This is ABSOLUTELY excellent for those of you who have not ever seen or written a referee report, but it's also very helpful for those who cannot even count the number they have written. Distinguishing between the things that are absolutely necessary to make the paper publishable and the things that might make the paper better is really helpful! Surely for the editor but also for the author!

I like this reminder: "Ultimately, the author’s name goes on the paper, not yours."

Read more here!

Sunday, February 1, 2015

Pretty Pictures Can Make All the Difference

From a recent JEP paper: "Once upon a time, a picture was worth a thousand words. But with online news, blogs, and social media, a good picture can now be worth so much more. Economists who want to disseminate their research, both inside and outside the seminar room, should invest some time in thinking about how to construct compelling and effective graphics."

My thoughts: People will forget the details of your talk/paper, but they will remember really cool pictures.


Tuesday, January 20, 2015

Characteristics of a Top Paper

From Prof. Nick Bloom's Labor Economics syllabus (Winter 2015):

In my view successful papers need to do at least two of the following three things: (1) Have excellent motivation (that is answer an important question – a good test of this is would a paper, say the New York Times, find the results interesting enough to write up); (2) Have excellent measurement (often using a new dataset, sometimes assembled by the author – this would be new data, rather than say the 1000th paper using Compustat or the CPS); and (3) Have excellent identification (showing clear causation, often with a natural or field experiment).

Thursday, January 1, 2015

How to Write Abstracts

I especially like the Google Scholar advice!

How to Write Abstracts

Happy New Year!

Sometimes, I find (or think of) interesting pieces of advice on how to do economics. The problem is that I forget about this stuff, and I don't have a good place to store it so that I can remind myself. My new idea for the new year: I will store my ideas and discoveries on the world wide web! Welcome and enjoy!