The Cult of Statistical Significance is a poorly argued rant about what appears to be an important topic in the pursuit of scientific knowledge. The authors argue that many of the statistical sciences have been using the wrong metric to determine whether the results of experiments are interesting and relevant. They report on a few detailed reviews of articles in top journals in economics, psychology, and other fields to show that the problem they describe is real and pervasive. Unfortunately, they are much more interested in casting aspersions on the work and influence of Ronald Fisher and in building up his colleague William Gosset, and so they don't actually explain how to apply their preferred approach. In amongst the rant, they do manage to make the defects of Fisher's approach clear, though it's tedious reading.
The basic story is that Fisher argued that the main point of science is establishing what we know, and to that end, the important result of any scientific experiment is a clear statement of whether the results are statistically significant. According to Fisher, that tells you what confidence you should have that the results would be repeated if you ran the experiment again. The authors want you to understand that a result can be statistically significant but practically useless. And there are worse cases, where statistical significance and Fisher's approach lead scientists to hide more relevant results, or worse, to conclude that a proposal was ineffective when the data show that a large effect might be present but the experiment failed to establish it conclusively. The authors want scientists to report primarily the size of the effects they find and their confidence in the result. To them, a large effect discovered in noisy data is far more important than a small effect in very clear data. They point out that with a large enough sample, every effect will be statistically significant. (Though they don't explain this point in any detail, nor give any numbers on what "large enough" means. I have an intuitive feeling for why this might be true, but this was just one of many points that wasn't presented clearly.)
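The "large enough sample" point is easy to make concrete, even though the book apparently doesn't. Because the sample size sits in the denominator of the standard error, any nonzero effect, however negligible in practice, eventually produces an arbitrarily small p-value. A minimal sketch (the numbers are mine, not from the book):

```python
import math

def z_test_p(effect, sigma, n):
    """Two-sided p-value for a sample mean `effect` away from zero,
    given population standard deviation `sigma` and sample size `n`
    (normal approximation)."""
    z = effect / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability

# A tiny effect, one hundredth of a standard deviation, that no one
# would care about in practice:
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9}: p = {z_test_p(0.01, 1.0, n):.4f}")
```

At n = 100 the effect looks like noise; at n = 1,000,000 it clears any conventional significance threshold, while remaining just as practically irrelevant.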
They describe a few stories in detail to show the consequences for public policy. Vioxx was approved, they claim, because the tests of statistical significance allowed the scientists to fudge their results sufficiently to hide the deleterious effects. (It's not clear why this should be blamed on statistical significance rather than corruption.) They also present a case in which a study of unemployment insurance in Illinois found a large effect ($4.29 in benefit for every dollar spent) but reached the Fisherian conclusion: not just that the result wasn't statistically significant, but that there was no effect. A careful review of the data showed that the program had a statistically significant benefit-cost ratio of $7.07 for white women, but the overall benefit-cost ratio was reported as insignificant because the $4.29 was significant only at the .12 level, while Fisher's followers require .05 or less.
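It may help to see what ".12 level" implies about the estimate. Working backwards from the reported figures (the study's actual standard error isn't given in the review, so the value below is back-derived and hypothetical), a $4.29 estimate significant at p ≈ .12 corresponds to a confidence interval that barely includes zero but whose plausible values are mostly large and positive:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

estimate = 4.29               # benefit per dollar spent, as reported
z = 1.55                      # two_sided_p(1.55) is roughly 0.12
se = estimate / z             # implied standard error (hypothetical)
lo = estimate - 1.96 * se     # 95% confidence interval
hi = estimate + 1.96 * se
print(f"p = {two_sided_p(z):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The interval straddles zero, so Fisherian practice reports "no effect", even though almost all of the plausible values are well above a break-even benefit-cost ratio of 1. That is the authors' complaint in miniature.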
The authors demonstrate that they're on the right side of the epistemological debate by supporting the use of Bayes' Law in describing scientific results, but beyond one example, they don't explain how a scientific paper should use it in presenting results. Fisher's approach gives a clear guide: describe some hypotheses, perform some tests, and finally analyze the results to show which relationships are significant. With Bayes, the reasoning, approach, and explanation are more complicated, but the authors don't tell you how to do it. Of the 29 references to Bayesian theory in the index, 24 have descriptions like "Feynman advocates ..." or "Orthodox Fisherians oppose ...". There aren't any examples of how one might write the conclusion of a paper using Bayesian reasoning, even though they pervasively give examples of analogous Fisherian reasoning that they find unacceptable.
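Since the book doesn't supply a recipe, here is one hedged sketch of what Bayesian reporting of an effect size could look like: combine a skeptical prior on the effect with the study's estimate via a conjugate normal-normal update, then report the posterior mean, its uncertainty, and the probability the effect is positive. All numbers are hypothetical.

```python
import math

def normal_posterior(prior_mean, prior_sd, estimate, se):
    """Conjugate normal-normal update: combine a prior on an effect
    size with a study's point estimate and standard error."""
    w_prior, w_data = 1 / prior_sd**2, 1 / se**2
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * estimate)
    return post_mean, math.sqrt(post_var)

def prob_positive(mean, sd):
    """Posterior probability that the effect exceeds zero."""
    return 0.5 * math.erfc(-mean / (sd * math.sqrt(2)))

# Skeptical prior centered at zero; the study reports an effect of 2.0
# with standard error 1.0 (illustrative values):
m, s = normal_posterior(0.0, 2.0, 2.0, 1.0)
print(f"posterior effect: {m:.2f} +/- {s:.2f}, "
      f"P(effect > 0) = {prob_positive(m, s):.2f}")
```

A conclusion written this way states how big the effect probably is and how sure we are that it exists at all, which is exactly the information the authors say significance tests suppress.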
Another question that the authors argue is important (but that they don't explain adequately), and that statistical significance hides, is how much various treatments or alternative policy approaches might cost. Fisher's approach allows authors to publish that some proposal would have a statistically significant effect on a societal problem or on the course of a disease without mentioning that the cost is exorbitant and the effect small (though likely). The authors argue that journal editors should require authors to publish the magnitude of any effects and a comparison of costs and benefits. According to the reviews they've done and others they cite, it's common in top journals to omit this level of detail and to focus on whether experimental results are significantly different from zero.
Another of the authors' pet peeves is "testing for difference from zero". They claim that it's common for papers to report results as "statistically different from zero" when they're barely so; they use the epithet "sign testing" for this case. Because significance testing lets authors ignore the size of an effect, papers get published showing that some treatment has a positive effect on a problem even when that effect is barely distinguishable from a placebo. And there are enough scientists performing enough experiments today that many treatments with no real effect will reach this level of significance purely by chance.
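That last point is just the expected false-positive rate at work: if the significance threshold is .05, then about one in twenty experiments on a completely ineffective treatment will come out "significant". A quick simulation makes it vivid (illustrative sketch; the parameters are mine):

```python
import math
import random

def simulate_null_experiments(n_experiments, n_samples, seed=0):
    """Simulate experiments where the true effect is exactly zero and
    count how many clear the two-sided .05 significance bar anyway."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% threshold under the normal approximation
    false_positives = 0
    for _ in range(n_experiments):
        data = [rng.gauss(0, 1) for _ in range(n_samples)]
        mean = sum(data) / n_samples
        z = mean / (1 / math.sqrt(n_samples))  # known sd of 1
        if abs(z) > z_crit:
            false_positives += 1
    return false_positives

fp = simulate_null_experiments(1000, 50)
print(f"{fp} of 1000 no-effect experiments were 'significant' at .05")
```

With a thousand null experiments, roughly fifty spurious "discoveries" appear, and those are disproportionately the ones that get written up.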
Overall, the book spends far too much time on personalities and politics. Even when the discussion is substantive, too much effort goes into explaining why the standard approach is mistaken and far too little into how to do science right, or why their preferred approaches would actually lead to better science.
For the layperson trying to follow the progress of science, and occasionally to dip into the literature to make a decision about what treatment to recommend to a family member or what supplements would best enhance longevity or health, the point is that scientific papers have to be read more carefully. The authors argue that editors, even of prestigious journals, are using the wrong metrics in choosing what papers to accept, and often pressure authors to present their results in formats that aren't useful for this purpose.
When reading papers, concentrate on the size and the costs of the effects being described. Significance can be relevant, but the fact that a paper appeared in a major publication doesn't mean that the effects being described are important or useful. Don't be surprised if the most-cited papers in some area don't actually present the circumstances in which an intervention would be useful. Don't assume that all "significant" effects are relevant or strong.