I’m ashamed to admit that the study of statistics was regarded (at least by me) as a necessary evil when I was in graduate school. I vaguely remember one course that attempted to teach a lecture hall of sleepy, stressed-out students how to calculate p values, the differences between retrospective, prospective and case-control studies, and the distinction between sensitivity and specificity. And don’t even get me started on odds ratios. Can you tell I’m still a bit fuzzy? In fact, I keep a reference guide at my desk for help (which I have to consult embarrassingly often).
Statistics might be dull, but there’s no denying its importance in scientific research, or the fallout when scientists fail to appreciate its power. Now, Stanford researcher John Ioannidis, MD, DSci (of “Why most published research findings are false” fame), has joined forces with Marcus Munafo, PhD, and others at the University of Bristol to publish a new study in Nature Reviews Neuroscience (subscription required) delineating the statistical flaws in many published neuroscience studies. Essentially, the researchers found that, although many scientists realize that an under-powered study (for example, one with too few study subjects to adequately capture the phenomenon being investigated) is less likely to find statistically significant results, they don’t necessarily realize the corollary: that any statistically significant finding from such a study is less likely to represent a true effect.
Stellar science blogger Ed Yong explains the sobering implications in an excellent post today:
Statistical power refers to the odds that a study will find an effect—say, whether antipsychotic drugs affect schizophrenia symptoms, or whether impulsivity is linked to addiction—assuming those effects exist. Most scientists regard a power of 80 percent as adequate—that gives you a 4 in 5 chance of finding an effect if there’s one to be found. But the studies that Munafo’s team examined tended to be so small that they had an average (median) power of just 21 percent. At that level, if you ran the same experiment five times, you’d only find an effect on one of those. The other four tries would be wasted.
But if studies are generally underpowered, there are more worrying connotations beyond missed opportunities. It means that when scientists do claim to have found effects—that is, if experiments seem to “work”—the results are less likely to be real. And it means that if the results are actually real, they’re probably bigger than they should be. As the team writes, this so-called “winner’s curse” means that “a ‘lucky’ scientist who makes the discovery in a small study is cursed by finding an inflated effect.”
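To put rough numbers on that second point, here is a minimal Python sketch of the standard positive predictive value (PPV) calculation: the chance that a statistically significant result reflects a true effect. The function name, the 5 percent significance threshold and the pre-study odds of 1 in 4 are my own illustrative assumptions, not figures from the paper or from Ed's post; only the 80 percent and 21 percent power values come from the excerpt above.

```python
def positive_predictive_value(power, prior_odds, alpha=0.05):
    """Chance that a 'significant' finding is a true effect.

    power      -- probability of detecting an effect that really exists
    prior_odds -- pre-study odds that the effect is real (assumed, illustrative)
    alpha      -- significance threshold, i.e. the false-positive rate (assumed 0.05)
    """
    true_positives = power * prior_odds   # real effects that get detected
    false_positives = alpha               # null effects that pass the threshold
    return true_positives / (true_positives + false_positives)


# Compare the conventional 80 percent power with the 21 percent median
# quoted above, holding the assumed pre-study odds fixed at 1 in 4.
for power in (0.80, 0.21):
    ppv = positive_predictive_value(power, prior_odds=0.25)
    print(f"power = {power:.0%}: PPV = {ppv:.0%}")
```

Under these assumptions, a significant result from an adequately powered study is real about 80 percent of the time, while one from a study with 21 percent power is real only about 51 percent of the time, which is close to a coin flip. Different pre-study odds would shift both numbers, but the low-powered study always comes out worse.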
I encourage you to read all of Ed’s post, which includes multiple comments from Ioannidis, Munafo and other researchers uninvolved in the study. It’s a fascinating analysis of why many studies are designed as they are, and it discusses some of the obstacles that must be overcome to improve their fidelity. And don’t overlook the comment stream, which is currently hosting a rich discussion among scientists in the field.
Previously: NIH funding mechanism “totally broken” says Stanford researcher; Research shows small studies may overestimate the effects of many medical interventions; and Animal studies: necessary but often flawed, says Stanford’s Ioannidis