Wednesday, July 11, 2012

Three reasons not to be crazy about hypothesis tests

(1)  With large samples, unimportant treatments can be "statistically significant."  If we precisely measure that a treatment changes an outcome by .00001 percent, the estimate can be statistically significant even though the effect is too small to matter.
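To make point (1) concrete, here is a minimal sketch in Python. It assumes a one-sample z-test with a known standard deviation and a sample mean that lands exactly on the true effect; the effect size (one thousandth of a standard deviation) and the sample sizes are made-up illustrative values, not numbers from any study.

```python
import math

def two_sided_p_value(effect, sigma, n):
    """Two-sided p-value of a one-sample z-test of 'mean = 0',
    assuming the observed sample mean equals `effect`."""
    z = effect * math.sqrt(n) / sigma
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

tiny_effect = 0.001  # hypothetical: a thousandth of a standard deviation
sigma = 1.0          # hypothetical: known standard deviation

p_small_sample = two_sided_p_value(tiny_effect, sigma, n=100)
p_large_sample = two_sided_p_value(tiny_effect, sigma, n=100_000_000)

print(f"n = 100:         p = {p_small_sample:.3f}")
print(f"n = 100,000,000: p = {p_large_sample:.3g}")
```

The effect is the same in both cases; only the sample size changes, and with it the verdict of the test.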

(2) Relying on hypothesis tests produces publication bias.  Suppose 20 different researchers run one regression each, but use different samples (I will assume they are independently drawn).  They are all examining a particular treatment effect, say the impact of divorce on child outcomes.  If one measures significance at p < .05, then even if the true effect is zero there is about a 64 percent chance (1 - .95^20) that at least one regression will produce a "significant" coefficient, simply because of the random aspects of the coefficients.  The researcher who gets the "significant" coefficient is more likely to get her results published than those who do not.
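The arithmetic behind point (2) can be checked with a short Python sketch. It assumes the true effect is zero and the 20 tests are independent, so that each p-value is uniform on [0, 1]; the function names and the number of simulated trials are my own choices for illustration.

```python
import random

def prob_at_least_one_significant(n_studies, alpha):
    """Chance that at least one of n independent tests rejects
    at level alpha when the null hypothesis is true in all of them."""
    return 1.0 - (1.0 - alpha) ** n_studies

def simulate(n_studies, alpha, n_trials, seed=0):
    """Monte Carlo check: under the null each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        if any(rng.random() < alpha for _ in range(n_studies)):
            hits += 1
    return hits / n_trials

analytic = prob_at_least_one_significant(20, 0.05)
simulated = simulate(20, 0.05, n_trials=100_000)

print(f"analytic:  {analytic:.4f}")
print(f"simulated: {simulated:.4f}")
```

The analytic value is 1 - .95^20, roughly .64, and the simulated share should land close to it.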

(3) The fact that we don't generally do randomized trials on things like marital status means it is hard to draw inferences about the non-treated group based on those who are in the treated group.  In what may be my favorite book on applied social science work, Manski shows that the confidence intervals one should develop are much broader than those that are typically used.  His work also suggests that applying hypothesis tests in the social sciences is really problematic.  This is consistent with the theme that sometimes we can draw better conclusions from plots than test statistics.

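To a large extent, one could deal with (1) and (2) by reporting box-and-whisker plots of coefficient estimates across studies, which show the size and spread of the estimates rather than only whether each one cleared a significance threshold.  Dealing with (3) is much harder.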
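As a rough sketch of what such a plot would summarize, the Python below computes the five numbers behind a box-and-whisker plot for 20 coefficient estimates. The estimates are simulated, not real: I assume a true effect of zero and a common standard error of 1, both hypothetical, and draw each estimate from a normal distribution.

```python
import random
import statistics

rng = random.Random(42)
true_effect = 0.0      # hypothetical: no real effect
standard_error = 1.0   # hypothetical: same standard error in every study

# One simulated coefficient estimate per study.
estimates = [rng.gauss(true_effect, standard_error) for _ in range(20)]

# Quartiles that define the box; min and max stand in for the whisker ends.
q1, median, q3 = statistics.quantiles(estimates, n=4)
summary = {
    "min": min(estimates),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": max(estimates),
}

# How many studies would be called "significant" on their own at p < .05.
n_significant = sum(abs(b / standard_error) > 1.96 for b in estimates)

for name, value in summary.items():
    print(f"{name:>6}: {value: .3f}")
print(f"individually significant: {n_significant} of {len(estimates)}")
```

Seen together, a stray "significant" estimate shows up as one end of a distribution centered near zero rather than as a finding in its own right.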