Wednesday, July 11, 2012

Mark Thoma reminds me of something Art Goldberger taught me: R-squared is over-rated

Mark posts a letter from Stephen Ziliak:

The chief finding of the Soyer-Hogarth experiment is that the expert econometricians themselves—our best number crunchers—make better predictions when only graphical information—such as a scatter plot and theoretical linear regression line—is provided to them. Give them t-statistics and fits of R-squared for the same data and regression model and their forecasting ability declines. Give them only t-statistics and fits of R-squared and predictions fall from bad to worse.
It’s a finding that hits you between the eyes, or should. R-squared, the primary indicator of model fit, and t-statistic, the primary indicator of coefficient fit, are in the leading journals of economics - such as the AER, QJE, JPE, and RES - evidently doing more harm than good.
This reminds me of Art Goldberger's teaching in Econ 612.  After I took that class, he turned his class notes into a book.  From page 177:

From our perspective, R2 has a very modest role in regression analysis, being a measure of the goodness of fit of a sample LS (least squares) linear regression in a body of data.  Nothing in the CR (classical regression) model requires R2 to be high.  Hence a high R2 is not evidence in favor of the model, and a low R2 is not evidence against it...

...In fact, the most important thing about R2 is that it is not important in the CR model.  The CR model is concerned with parameters in a population, not with the goodness of fit within the sample.
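
Goldberger's point is easy to demonstrate by simulation. Here is a small sketch of my own (not from the book): the classical regression model is exactly right, the slope estimate is close to the truth, and yet R2 is low simply because the error variance is large. All names and parameter values below are made up for illustration.

```python
# A correctly specified model can still have a low R^2 when the
# error variance is large.  Pure-stdlib simple OLS.
import random
import statistics

random.seed(0)
n = 500
beta0, beta1 = 2.0, 3.0  # true parameters (my choice, illustrative)

x = [random.gauss(0, 1) for _ in range(n)]
# Big noise: the model is exactly right, but the fit looks "poor".
y = [beta0 + beta1 * xi + random.gauss(0, 10) for xi in x]

xbar, ybar = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

print(f"estimated slope = {b1:.2f} (true = 3.0), R^2 = {r2:.2f}")
```

With a signal variance of 9 and an error variance of 100, R2 sits near 0.08 even though the estimated slope is close to 3 - a high R2 would be evidence of low noise, not of a correct model.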

I also remember Gary Chamberlain was not crazy about t-statistics--he said he didn't want to see any "damn stars" in our papers.  We should care more about confidence intervals than hypothesis tests.
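
To make the "damn stars" point concrete, here is a sketch of my own (not Chamberlain's): the same estimate reported two ways. The star only says "significant or not"; the confidence interval also shows the magnitude and the precision of the estimate. The data-generating process below is invented for illustration.

```python
# One regression, two reports: a t-statistic with "stars" versus a
# 95% confidence interval for the slope.  Pure-stdlib simple OLS.
import math
import random
import statistics

random.seed(1)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.05 * xi + random.gauss(0, 1) for xi in x]  # tiny true effect

xbar, ybar = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in resid) / (n - 2)  # residual variance
se = math.sqrt(s2 / sxx)                   # standard error of slope

t = b1 / se
lo, hi = b1 - 1.96 * se, b1 + 1.96 * se

print(f"t = {t:.2f}  ->  'stars' report: {'*' if abs(t) > 1.96 else 'n.s.'}")
print(f"95% CI = [{lo:.3f}, {hi:.3f}]  ->  effect is small either way")
```

Whatever the star says, the interval makes clear that the effect is economically small - which is exactly the information a row of significance stars throws away.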