The decline effect
Why so many published research findings are false

As the medical technology industry knows very well, before the effectiveness of a drug or device can be confirmed, it must be tested and tested again. Different scientists in different labs need to repeat the protocols and publish their results. The test of replicability, as it’s known, is the foundation of modern research. Replicability is how the scientific community polices itself. It’s a safeguard against subjectivity. But what if replicability itself is in question?

It’s as if facts are losing their truth
The problem first reared its ugly head in Brussels in September 2007, when a few dozen neuroscientists, psychiatrists, and drug company executives gathered in a hotel conference room to hear about a class of drugs known as atypical or second-generation antipsychotics. According to the data released at the conference, the therapeutic power of many of these drugs appeared to be steadily waning. A recent study had shown an effect that was less than half of that documented in trials carried out in the early nineteen-nineties. According to extensive replication studies, these expensive pharmaceuticals appeared to be no better than previous generations of far less sophisticated, and less profitable, medications.

As recent research shows, all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts are losing their truth: claims that have been enshrined in medical tomes are suddenly unprovable. While this phenomenon does not yet have an official name, it is occurring across a wide range of fields, from ecology to psychology. In medicine, the phenomenon seems extremely widespread, affecting therapies ranging from cardiac stents to vitamin E and antidepressants. For instance, a forthcoming analysis demonstrates that the efficacy of antidepressants has gone down as much as threefold in recent decades.

The academic superstar with a secret
This “problem” was first noticed by Jonathan Schooler back in the 1990s. As a young graduate student at the University of Washington in the 1980s, Schooler appeared to have discovered an interesting fact about language and memory. At that time, it was widely believed that the act of describing our memories improved them. But, in a series of clever experiments, Schooler demonstrated that this was not true. He showed that subjects who were shown a face and asked to describe it were much less likely to recognize that face when shown it later than subjects who had simply looked at it. Schooler dubbed the phenomenon “verbal overshadowing”.

This “discovery” turned Schooler into an academic superstar, and his study was cited more than 400 times in the years that followed. Buoyed by the success of his first experiment, Schooler went on to extend the model to a variety of similar tasks, which also seemed to show the same effect. But while Schooler was publishing these results in highly reputable journals, a secret worry was gnawing at him: he was finding it impossible to replicate his original findings. He would often see an effect of some kind, but the “verbal overshadowing” was just not as strong. It was as if it was getting weaker. At first, he assumed that he had made an error in the design of his experiment or a mistake in his statistical calculations.

Cosmic habituation
Schooler tried to put the pesky problem out of his mind. His colleagues assured him that such things happen all the time. Years passed: Schooler found new things to research, got married and raised a family. But his replication problems kept getting worse. His first attempt at replicating his 1990 study came in 1995, and the effect turned out to be 30% smaller. The next year, the size of the effect shrank by another 30%.
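Two successive 30% drops compound rather than add: the second cut is taken from an already reduced effect. A quick check, assuming the article’s rounded 30% figures and normalizing the 1990 effect to 1.0 (an arbitrary illustrative unit):

```python
def remaining_effect(initial: float, drops: list[float]) -> float:
    """Apply successive fractional declines to an effect size."""
    effect = initial
    for drop in drops:
        # Each decline is a fraction of whatever effect is left, not of the original.
        effect *= 1.0 - drop
    return effect


# 1990 effect normalized to 1.0; 30% smaller in 1995, another 30% the next year.
print(round(remaining_effect(1.0, [0.30, 0.30]), 2))  # 0.49
```

So two years of 30% declines leave roughly half of the original effect, not 40% of it.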

When other labs repeated his experiments, they saw a similar decline. It seemed as if nature had changed its mind and was now taking back the original results. Schooler started referring to the problem as “cosmic habituation”, by analogy to the way individual subjects habituate to a particular stimulus, responding less strongly each time they encounter it. Although he joked that the cosmos was habituating to his ideas, he took it very personally.

Now a tenured professor at the University of California, Schooler still ruminates over those experiments. “I know I should just move on already. I should really stop talking about this. But I can’t.” Schooler is convinced that he’s stumbled on a serious problem, one that afflicts many new ideas in many fields of research. And he’s not alone. Other scientists have also noticed the phenomenon, looked into it and come up blank as well.

Extraordinarily lucky researchers
It has been argued that the decline effect is largely a product of publication bias: the tendency of scientists and scientific journals to prefer positive data over null results, which are what a study produces when no effect is found. The bias was first identified by the statistician Theodore Sterling back in 1959. Sterling noticed that 97% of all published psychological studies with statistically significant data found the effect they were looking for. A result is conventionally called significant if data at least as extreme would be expected to arise by chance alone less than 5% of the time.

This test was invented in 1922 by the English mathematician Ronald Fisher, who picked 5% as the boundary line, somewhat arbitrarily, because it made pencil and slide-rule calculations easier. Sterling saw that if 97% of psychology studies were proving their hypotheses, either psychologists were extraordinarily lucky or they published only the outcomes of successful experiments. In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that are not favourable. But it is becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology.
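The 5% threshold implies that, even when there is no real effect at all, about one experiment in twenty will still cross it by luck. The sketch below simulates that. It assumes a simple two-group design with normally distributed scores of known standard deviation 1, analysed with a two-sided z-test (a simplification chosen so that only Python’s standard library is needed); the group size of 20 and the 10,000 repetitions are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # mean 0, sd 1


def two_sided_p(group_a: list[float], group_b: list[float], sd: float = 1.0) -> float:
    """Two-sided z-test p-value for a difference in means, assuming a known common sd."""
    se = sd * (1.0 / len(group_a) + 1.0 / len(group_b)) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / se
    # Probability of a difference at least this extreme if there is no true effect.
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def false_positive_rate(n_experiments: int = 10_000, n_per_group: int = 20,
                        alpha: float = 0.05, seed: int = 1) -> float:
    """Share of experiments with no true difference that still come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution: any difference is pure chance.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if two_sided_p(a, b) < alpha:
            hits += 1
    return hits / n_experiments


print(false_positive_rate())  # roughly 0.05
```

Against that one-in-twenty baseline, Sterling’s 97% success rate is hard to explain without some filter on which results reach print.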

While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the initial prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people who have been unable to replicate their initial data despite their best efforts.
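Where publication bias does operate, it can manufacture a decline on its own. The simulation below assumes a modest true effect (0.2 standard deviations, an arbitrary choice), many small studies, and a hypothetical journal that prints only positive, significant results, while later replications are reported whatever they find. It illustrates one mechanism, not a model of any specific field.

```python
import random
from statistics import NormalDist, fmean

Z_CRIT = NormalDist().inv_cdf(0.975)  # about 1.96, the two-sided 5% cutoff


def run_study(rng: random.Random, true_effect: float, n_per_group: int) -> tuple[float, bool]:
    """Simulate one two-group study; return (observed effect, positive and significant?)."""
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    observed = fmean(treated) - fmean(control)
    se = (2.0 / n_per_group) ** 0.5  # standard error with a known sd of 1 per group
    return observed, observed / se > Z_CRIT


def decline_demo(true_effect: float = 0.2, n_per_group: int = 20,
                 n_studies: int = 5_000, seed: int = 7) -> tuple[float, float]:
    """Return (mean published effect, mean replication effect)."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_studies):
        observed, significant = run_study(rng, true_effect, n_per_group)
        if significant:  # the hypothetical journal only prints "hits"
            published.append(observed)
    # Replications of the same design, reported regardless of outcome.
    replications = [run_study(rng, true_effect, n_per_group)[0] for _ in range(n_studies)]
    return fmean(published), fmean(replications)


first, later = decline_demo(true_effect=0.2)
print(f"true effect 0.20 | published mean {first:.2f} | replication mean {later:.2f}")
```

With these settings only unusually large observed effects clear the bar, so the published average sits well above the true effect, while the unfiltered replications average close to 0.2. The effect appears to shrink even though nothing about it has changed; the exact figures depend on the seed.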

Our beliefs as a form of blindness
One of the classic examples of selective reporting concerns the testing of acupuncture in different countries. Widely accepted as a medical treatment in various Asian countries, acupuncture is much more contested in the West. These cultural differences have profoundly influenced the results of clinical trials. Between 1966 and 1995, there were 47 studies of acupuncture in China, Taiwan and Japan. Every single trial concluded that acupuncture was an effective treatment. During the same period, there were 94 clinical trials of acupuncture in the United States, Sweden and the UK. Only 56% of these studies found any therapeutic benefits. This wide discrepancy suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want to see. Our beliefs are a form of blindness.

Reporting only positive results
John Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a serious issue in biomedical research. “These exaggerations are why the decline has become so common,” he says. “It’d be really great if the initial studies gave us an accurate summary of things. But they don’t. And so what happens is we waste a lot of money treating millions of patients and doing lots of follow-up studies on other themes based on results that are misleading.”

In 2005, Ioannidis published an article in the Journal of the American Medical Association that looked at the forty-nine most cited clinical-research studies in three major medical journals. Forty-five of these studies reported positive results, suggesting that the intervention being tested was effective. Because most of these studies were randomized controlled trials, the gold standard of medical evidence, they tended to have a major impact on clinical practice and led to the spread of treatments such as hormone replacement therapy for menopausal women and daily low-dose aspirin to prevent heart attacks and strokes. Nevertheless, the data Ioannidis found were disturbing: of the thirty-four claims that had been subject to replication, 41% had either been directly contradicted or had their effect sizes significantly downgraded.

Fashionable subjects are worse
The situation is even worse when a subject is fashionable. In recent years, for instance, there have been hundreds of studies on the various genes that control differences in disease risk between men and women. These findings have included everything from the mutations responsible for the increased risk of schizophrenia to the genes underlying hypertension. Ioannidis and his colleagues looked at four hundred and thirty-two of these claims. They quickly discovered that the vast majority had serious flaws. But the most troubling fact emerged when they looked at the test of replication: out of four hundred and thirty-two claims, only a single one was consistently replicable. “This doesn’t mean that none of these claims will turn out to be true,” he says. “But, given the fact that most of them were done badly, I wouldn’t hold my breath.”

According to Ioannidis, the main problem is that too many researchers engage in what he calls “significance chasing”: finding ways to interpret the data so that they pass the statistical test of significance, the 95% boundary invented by Ronald Fisher. “The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy,” Ioannidis says. In recent years, Ioannidis has become increasingly blunt about the pervasiveness of the problem. One of his most-cited papers has a deliberately provocative title: “Why most published research findings are false.”
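One common way of “playing around with the numbers” is to test many outcomes and report whichever one clears the bar. If each of k independent tests has a 5% chance of a false positive, the chance that at least one comes up significant by luck alone is 1 − 0.95^k. The sketch below computes that and checks it by simulation. It assumes the tests are independent and that every null hypothesis is true; under the null a continuous test’s p-value is uniformly distributed, which the simulation uses directly. The choice of 20 outcomes is illustrative.

```python
import random


def chance_of_any_hit(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent null tests is 'significant'."""
    return 1.0 - (1.0 - alpha) ** k


def simulated_chase(k: int, alpha: float = 0.05, trials: int = 20_000, seed: int = 3) -> float:
    """Draw k uniform null p-values per study; count studies where any falls below alpha."""
    rng = random.Random(seed)
    found = sum(
        any(rng.random() < alpha for _ in range(k))
        for _ in range(trials)
    )
    return found / trials


print(round(chance_of_any_hit(20), 2))  # 0.64
print(round(simulated_chase(20), 2))    # close to the exact value
```

With twenty outcomes to choose from, a researcher chasing significance will “find” something nearly two times in three, even when nothing is there.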

A fundamental cognitive flaw
The problem of selective reporting is rooted in a fundamental cognitive flaw: we like proving ourselves right and hate being wrong. Or, as Ioannidis puts it: “It feels good to validate a hypothesis. It feels even better when you’ve got a financial interest in the idea or your career depends on it. And that’s why, even after a claim has been systematically disproven, you still see some researchers citing the first few studies that show a strong effect. They really want to believe that it’s true.”

That’s why scientists need to become more rigorous about data collection before they publish. We’re wasting too much time chasing after bad studies and underpowered experiments. The current “obsession” with replicability distracts from the real problem, which is faulty design. Nobody even tries to replicate most science papers: there are simply too many. According to Nature, a third of all studies never even get cited, let alone repeated. “I’ve learned the hard way to be exceedingly careful,” Schooler says. “Every researcher should have to spell out, in advance, how many subjects they’re going to use, and what exactly they’re testing, and what constitutes a sufficient level of proof. We have the tools to be much more transparent about our experiments.”
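Deciding in advance how many subjects to use is usually done with a power calculation. A minimal sketch, assuming a two-group comparison analysed with a two-sided z-test, an expected effect of d standard deviations, a 5% significance level and 80% power (conventional defaults, not figures from the article), uses the normal-approximation formula n = 2((z_{1−α/2} + z_{1−β}) / d)² subjects per group:

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group for a two-sided, two-sample z-test (normal approximation).

    effect_size is the expected difference in means divided by the common sd (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)              # about 0.84 for 80% power
    n = 2.0 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)  # round up: a fractional subject still requires a whole one


for d in (0.8, 0.5, 0.2):
    print(d, n_per_group(d))
```

This prints 25, 63 and 393 subjects per group for d = 0.8, 0.5 and 0.2; an exact t-test calculation gives slightly larger numbers. Because n scales with 1/d², halving the expected effect roughly quadruples the sample needed, which is why small studies of subtle effects are so often underpowered.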

In a forthcoming paper, Schooler recommends the establishment of an open-source database, in which researchers would be required to outline their planned investigations and document all of their results. “I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment,” Schooler says. “It would help us finally deal with all these issues that the decline effect is exposing.”
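What an entry in such a database might hold can be sketched as a small data structure. Everything here is hypothetical: the class names, fields and checks are illustrative assumptions, not Schooler’s design. The plan is frozen once registered, every result (positive or null) is appended against it, and only a result on the pre-registered outcome is allowed to count as support.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StudyPlan:
    """Hypothetical pre-registered plan, fixed before any data are collected."""
    hypothesis: str
    planned_n: int          # subjects committed to in advance
    primary_outcome: str    # the one outcome the test is about
    alpha: float = 0.05     # pre-stated "sufficient level of proof"


@dataclass
class Result:
    outcome: str
    n: int
    p_value: float


@dataclass
class RegistryEntry:
    plan: StudyPlan
    results: list[Result] = field(default_factory=list)  # every result, null or not

    def report(self, result: Result) -> None:
        self.results.append(result)

    def deviations(self) -> list[str]:
        """List the ways reported results depart from the registered plan."""
        issues = []
        for r in self.results:
            if r.outcome != self.plan.primary_outcome:
                issues.append(f"unplanned outcome: {r.outcome}")
            if r.n != self.plan.planned_n:
                issues.append(f"sample size {r.n} != planned {self.plan.planned_n}")
        return issues

    def primary_result_significant(self) -> bool:
        """True only if a result on the planned outcome clears the pre-stated threshold."""
        return any(
            r.outcome == self.plan.primary_outcome and r.p_value < self.plan.alpha
            for r in self.results
        )


# Illustrative, made-up values: one planned outcome, one that was never registered.
entry = RegistryEntry(StudyPlan("describing a face impairs later recognition", 100,
                                "recognition accuracy"))
entry.report(Result("recognition accuracy", 100, 0.21))
entry.report(Result("confidence rating", 100, 0.03))
print(entry.deviations())                  # ['unplanned outcome: confidence rating']
print(entry.primary_result_significant())  # False
```

In this toy example the only “significant” number belongs to an outcome nobody registered, so the entry flags it instead of letting it stand in for the planned test.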

We still have to choose what to believe
Ultimately, anomalies in test results demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they often continue to get cited in textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science. Not because data can be tweaked and beliefs shape perception. (Such shortcomings aren’t surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. The decline effect is troubling because it reminds us of how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proven. And just because an idea can be proven doesn’t mean it’s true. Once all the experiments are done, we still have to choose what to believe.

This text was edited from an article written by Jonah Lehrer and published in the December 13, 2010 edition of the New Yorker magazine.