Sunday, June 26, 2016

Read This Before (and After) Sitting Down To Write

Marc Bellemare has a nice style guide (see here).

Note that I don't follow all of his writing advice, at least not all of the time. However, it is definitely worth reading. Often. People will take your work more seriously if it looks professional. Also, when I read really sloppy papers, either as a professor/advisor or as a referee, I often feel a bit insulted, like you don't value my time at all. Typos happen. Everyone knows this. But spend time cleaning what you write.

At the very least, reread your papers before submitting them. I just reread this post. Twice.

Addendum: I just happened to re-read a letter I wrote to an editor about a referee report. Caught two typos. How embarrassing! Do as I try to do, not as I always do....?

Saturday, June 11, 2016

Stata Tip: Dummy Variables and Interaction Terms

One of the most frustrating things to happen (all the time) when you're in the Stata groove is to get that error message, "no room to add more observations." Yes, you can usually add more memory (in older versions of Stata, use the 'set memory' command; Stata 12 and later manage memory automatically), and if you can't, you can always buy a new computer with more memory. But insufficient memory issues often come up when you have many dummy variables in your model. I used to make them using the tabulate command:

tabulate var, gen(newstub) 

There's a much better way! You don't actually need to create those variables! Much better to just add "i." to the beginning of your variable within the regression command:

regress y i.var

Done. You can even specify the base and test for equality of different dummy coefficients!
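
Here is a minimal sketch of both, assuming the nlsw88 example dataset that ships with Stata. The variables wage and race belong to that dataset and are only stand-ins for your own y and var:

```stata
* Load an example dataset that ships with Stata (an assumption for illustration)
sysuse nlsw88, clear

* By default the lowest category of race (1) is the omitted base
regress wage i.race

* Test whether the coefficients on categories 2 and 3 are equal
test 2.race = 3.race

* Make category 3 the base instead by using "ib3." in place of "i."
regress wage ib3.race
```

No new variables get created along the way, so the dataset stays the same size.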

You can use a similar trick for interaction variables. Instead of creating several interaction variables,

generate femaleXgroup2=female*group2
generate femaleXgroup3=female*group3

just use this regression command:

regress y i.sex i.group sex#group 

Actually, even better to do this:

regress y sex##group
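
If you want to convince yourself that the two forms are the same model, here is a quick sketch, again assuming the nlsw88 example dataset (married and race are that dataset's variables, standing in for sex and group):

```stata
* Example dataset that ships with Stata (an assumption for illustration)
sysuse nlsw88, clear

* Long form: both main effects plus the interaction, written out
regress wage i.married i.race married#race

* Short form: ## expands to the main effects and the interaction
regress wage married##race
```

The two commands should give identical coefficient tables.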

I would say that one of the most common coding errors I see is forgetting to include one of the non-interacted variables in a regression model with interaction terms. By using the ## trick, you don't need to worry about it! Oh, and if one of the variables is continuous, you need to tell Stata this by putting "c." before the variable. For example,

regress y i.sex age sex#c.age

or

regress y sex##c.age
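
A short sketch of the continuous case, assuming the nlsw88 example dataset that ships with Stata (wage, married, and age are its variables, used here only for illustration):

```stata
* Example dataset that ships with Stata (an assumption for illustration)
sysuse nlsw88, clear

* Main effects of married and age, plus a separate age slope by marital status
regress wage i.married##c.age

* Test whether the age slope differs between the two groups
testparm i.married#c.age
```

Without the "c." prefix, Stata would treat age inside the interaction as categorical and estimate a separate coefficient for every age value.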

More details here. Or for quick reference, look at this cheat sheet.