Many researchers have an unhealthy obsession with p-values, and critics say it is damaging science: the pressure to prove hypotheses right has produced false results. But is the much-maligned p-value really to blame?

The p-value is used to test a 'null hypothesis', which states that there is no difference between two groups or no correlation between a pair of characteristics. By convention, a p-value of 0.05 or less is taken to mean that a finding is statistically significant and, in practice, that the results can be published.

However, the American Statistical Association (ASA) says that this interpretation is not entirely accurate. A p-value of 0.05 means only that, if the null hypothesis is true and all other assumptions made are valid, there is a 5% chance of obtaining a result at least as extreme as the one observed.
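
The ASA's definition can be checked with a small simulation. The sketch below is illustrative only: it assumes normally distributed measurements with a known standard deviation of 1 and uses a simple two-sided z-test. The function names, sample sizes and seed are hypothetical choices, not taken from the ASA statement. Both groups are drawn from the same population, so the null hypothesis is true by construction, and roughly 5% of the simulated experiments should still produce p ≤ 0.05.

```python
import random
from statistics import NormalDist, mean


def two_sided_p_value(sample_a, sample_b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (sigma assumed known)."""
    standard_error = sigma * (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments=2000, n_per_group=30, alpha=0.05, seed=0):
    """Share of experiments with p <= alpha when there is truly no difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution: the null hypothesis holds.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_p_value(group_a, group_b) <= alpha:
            hits += 1
    return hits / n_experiments


rate = false_positive_rate()
print(f"Share of 'significant' results with no real effect: {rate:.3f}")
```

The printed share should land close to 0.05. That is all the threshold guarantees: it says nothing about how likely the hypothesis itself is to be true.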

P-value does not indicate the importance of a finding

A p-value also cannot indicate the importance of a finding. For example, a drug can have a statistically significant effect on patients' blood cholesterol levels without having a meaningful therapeutic effect.
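
A rough calculation shows how this can happen. The numbers below are entirely hypothetical: a drug that lowers cholesterol by 0.5 mg/dL on average, a standard deviation of 30 mg/dL between patients, and two trial sizes. The sketch assumes a two-sided z-test with known standard deviation and equal group sizes.

```python
from statistics import NormalDist


def p_value_for_difference(mean_difference, sd, n_per_group):
    """Two-sided z-test p-value for a mean difference between two equal-sized groups."""
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = mean_difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical values chosen for illustration, not taken from any real trial.
mean_difference = 0.5  # average reduction in mg/dL
sd = 30.0              # spread between patients in mg/dL

# Standardised effect size (Cohen's d): the same in both trials.
cohens_d = mean_difference / sd

p_small_trial = p_value_for_difference(mean_difference, sd, n_per_group=100)
p_huge_trial = p_value_for_difference(mean_difference, sd, n_per_group=100_000)

print(f"Effect size (Cohen's d): {cohens_d:.4f}")
print(f"p-value with 100 patients per group:     {p_small_trial:.3f}")
print(f"p-value with 100,000 patients per group: {p_huge_trial:.5f}")
```

The effect is identical and tiny in both cases. Only the sample size changes, yet the larger trial clears the 0.05 threshold comfortably while the smaller one does not.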

This limitation directly affects much biomedical research, where the p-value has been used as a yardstick to distinguish claims based on, say, a single patient's outcome from those backed by clinical trials on a large group of people. It is also often the deciding factor in whether a journal publishes a study and whether reporters cover it.

Statistician Andrew Gelman of Columbia University and psychologist Eric Loken of the University of Connecticut say scientists have fallen into a "fallacy": the belief that if a statistically significant result appears in an experiment with many variables to account for, that result is, by definition, a sound one.

“Statistically speaking, a statistically significant result obtained under highly noisy conditions is more likely to be an overestimate and can even be in the wrong direction,” said Gelman.

“In short: a finding from a low-noise study can be informative, while the finding at the same significance level from a high-noise study is likely to be little more than … noise.”
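
Gelman's point about noise can be illustrated with a simulation. This is a minimal sketch under stated assumptions, not Gelman and Loken's own analysis: the true effect (0.1), the two standard errors, the number of simulated studies and the seed are all hypothetical. Each simulated study produces one estimate of the same true effect, and only estimates that pass the usual 1.96 standard-error cutoff are kept, as if only "significant" findings were reported.

```python
import random


def significant_estimates(true_effect, standard_error, n_studies=20000, seed=1):
    """Simulate study estimates and keep only those that are 'significant'."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, standard_error)
        if abs(estimate / standard_error) > 1.96:  # two-sided test at about 0.05
            kept.append(estimate)
    return kept


def summarize(true_effect, standard_error):
    """Average exaggeration and wrong-sign share among significant estimates."""
    kept = significant_estimates(true_effect, standard_error)
    if not kept:
        return None  # no significant results to summarise
    mean_magnitude = sum(abs(e) for e in kept) / len(kept)
    wrong_sign = sum(1 for e in kept if e * true_effect < 0) / len(kept)
    return {
        "exaggeration": mean_magnitude / abs(true_effect),
        "wrong_sign_share": wrong_sign,
    }


# Same hypothetical true effect, measured with little noise and with a lot.
low_noise = summarize(true_effect=0.1, standard_error=0.02)
high_noise = summarize(true_effect=0.1, standard_error=0.5)

print("Low-noise studies: ", low_noise)
print("High-noise studies:", high_noise)
```

In the low-noise setting the significant estimates sit close to the true effect. In the high-noise setting an estimate has to be many times larger than the true effect to reach significance at all, so every reported result is a large overestimate and a noticeable share point in the wrong direction.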

Stop abusing the p-value, statisticians say

But Steven McKinney, a statistician at the British Columbia Cancer Agency in Vancouver, said that critics should stop picking on the p-value and pick on abuses of the p-value instead.

"It's not small p-values that are the problem, it is this repeated phenomenon of researchers publishing a result with a small p-value with no attendant discussion of whether the result is one of any scientific relevance and whether the appropriate amount of data was collected," he said. "This is the phenomenon behind the current replication crisis."

A paper published last year by Stanford meta-researcher John Ioannidis and his colleagues shows that the use of p-values is growing. The authors analysed more than 1.6 million study abstracts and more than 385,000 full-text papers, all of which reported p-values, and found "an epidemic" of statistical significance.

Of these papers, 96% claimed statistically significant results, and "the proportion of papers that use p-values is going up over time, and the most significant results have become even more significant over time."

Only about 10% of the papers mentioned effect sizes in their abstracts, and even fewer mentioned measures of uncertainty, such as confidence intervals. The practical importance of the findings behind the p-values was rarely reported.
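
Reporting these quantities takes little extra work. The sketch below uses made-up cholesterol readings for two small groups and a hypothetical helper, `summarize_difference`, to report the difference in means, a 95% confidence interval and a standardised effect size alongside the p-value. It relies on a normal approximation for simplicity; with samples this small, a t-based interval would be somewhat wider.

```python
from statistics import NormalDist, mean, stdev


def summarize_difference(treated, control, confidence=0.95):
    """Report difference in means, confidence interval, Cohen's d and p-value."""
    difference = mean(treated) - mean(control)
    n_t, n_c = len(treated), len(control)
    var_t, var_c = stdev(treated) ** 2, stdev(control) ** 2

    # Standard error of the difference (normal approximation).
    standard_error = (var_t / n_t + var_c / n_c) ** 0.5
    z_critical = NormalDist().inv_cdf(0.5 + confidence / 2)

    # Pooled standard deviation for the standardised effect size.
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5

    return {
        "difference": difference,
        "ci": (difference - z_critical * standard_error,
               difference + z_critical * standard_error),
        "cohens_d": difference / pooled_sd,
        "p_value": 2 * (1 - NormalDist().cdf(abs(difference / standard_error))),
    }


# Made-up readings in mg/dL, for illustration only.
control = [200, 210, 190, 205, 195, 215, 185, 200]
treated = [190, 200, 185, 195, 180, 205, 175, 190]

report = summarize_difference(treated, control)
for name, value in report.items():
    print(name, value)
```

The interval shows the range of effects compatible with the data, and the effect size shows how large the difference is relative to the spread between patients. Neither can be read off the p-value alone.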

So as p-values have become more popular, they've also become more meaningless.

Stopping the p-value epidemic

Just last year, the ASA warned in a statement that the misuse of the p-value is contributing to the number of research findings that cannot be reproduced. It even published a guide on the use of the p-value. It also advised researchers to avoid drawing scientific conclusions or making policy decisions based on p-values alone.

Some journals have suggested a ban on publishing papers that contain p-values, but this could be counter-productive, says Andrew Vickers, a biostatistician at Memorial Sloan Kettering Cancer Center in New York City.

Instead, he suggests that researchers should be instructed to "treat statistics as a science, and not a recipe".

However, a better understanding of the p-value will not eliminate the urge to use statistics to create an impossible level of confidence, Gelman says.

"People want something that they can't really get," he says. "They want certainty."

So if researchers forgo the need for certainty and recognise that statistical significance does not equal importance, that will be one step towards addressing the p-value epidemic. MIMS
