A flawed understanding of statistics leads to flawed science. The p-value is dead. Long live the e-value!

Rianne de Heide is a statistician at the Vrije Universiteit Amsterdam. While she explains her research, she repeatedly has to suppress the urge to draw on a blackboard, which the room at the VU in Amsterdam lacks. She wants to sketch mathematical definitions and graphs, because that is what it takes to really understand the p-value, the standard used in science to demonstrate a relationship. “The problem is that it appears to be difficult for researchers to understand what a p-value actually is.”

P-values are widely used, especially in medicine, psychology and economics. Roughly speaking, a p-value indicates how likely it would be to find results at least as striking as the ones a study actually found if, in reality, there were no effect at all and the data were just a coincidence. If that probability is less than 0.05, the result is called statistically significant and is generally treated as real. To demonstrate that a medicine works, for example, a p-value below 0.05 is the official standard used by the American Food and Drug Administration (FDA) and the European Medicines Agency (EMA).
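For readers who want to see what that means in practice, here is a minimal sketch in Python. It is an editorial illustration, not part of De Heide's work, and the blood-pressure numbers are invented; it estimates a p-value by simulation, as the fraction of label shuffles that produce a difference at least as large as the observed one when the drug is assumed to do nothing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: change in blood pressure (mmHg) for a treated and a control group.
treated = np.array([-8.0, -5.0, -12.0, -3.0, -7.0, -9.0, -4.0, -6.0])
control = np.array([-2.0,  0.0,  -4.0,  1.0, -3.0, -1.0,  2.0, -2.0])

observed_diff = treated.mean() - control.mean()

# Permutation test: if the drug did nothing, the group labels would be arbitrary,
# so shuffle them many times and count how often a difference at least as large
# as the observed one shows up purely by chance.
pooled = np.concatenate([treated, control])
n_treated = len(treated)
n_perm = 20_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = pooled[:n_treated].mean() - pooled[n_treated:].mean()
    if abs(diff) >= abs(observed_diff):
        count += 1

p_value = count / n_perm
print(f"observed difference: {observed_diff:.2f} mmHg, p-value ~ {p_value:.4f}")
# A p-value below 0.05 is what journals and regulators conventionally
# call a statistically significant result.
```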

It proves difficult for doctors, psychologists and anyone else who wants to use the p-value to understand exactly how it works, and mistakes are sometimes made. De Heide has therefore worked with other mathematicians on a replacement for the p-value: the ‘e-value’.

In January she presented the research she has been working on since 2016 – together with Peter Grünwald and Wouter Koolen – at the Royal Statistical Society in London, one of the most important organizations in statistics. “It has been clear for years that the p-value does not actually work well. It is a great honor to be able to present my work here.”

It now often happens that when research is done again, different results emerge

Why is it so important to replace the p-value?

“In both medical and social science, researchers speak of a replication crisis. It now often happens that when research is done again, different results emerge. One study may find a positive effect of a drug, for example, while another does not find it at all.

“It turns out that a lot of research is simply wrong. A famous article about this problem in medical science is bluntly titled ‘Why Most Published Research Findings Are False’, and the same is said about social science. The use of the p-value is one of the causes of this problem.”

What goes wrong with the p-value?

“There are all kinds of pitfalls to using a p-value to test a hypothesis. The research must therefore follow strict rules, and scientists do not always adhere to them, because they do not understand exactly how the p-value works.

“Surveys have been sent to doctors and psychologists, among others, and they show that many do not actually know what you calculate with a p-value. And you have to remember: doctors read articles about their field every week, and those articles are full of statements about p-values. Yet fewer than half of the doctors gave the correct answer to the question of what the p-value means. Even mathematics teachers often don’t know the right answer.”

Something that researchers often do, but which is actually not allowed, is adding extra data afterwards

So what are scientists doing wrong when it comes to statistics?

“Something that researchers often do, but which is actually not allowed, is that they add extra data afterwards. Suppose researchers investigate whether a drug can lower blood pressure and they investigate this in a group of thirty test subjects. It may be that the blood pressure does drop in many test subjects, but it is not enough to get a p-value that is less than 0.05. Researchers often think: let’s add some more test subjects to make the result statistically significant.”

“This is called ‘optional stopping’. Wanting to collect more data is a perfectly natural intuition, but with the p-value it is not allowed in this way. It can be proven mathematically that the chance of a false positive then becomes very high: after adding test subjects you find a p-value below 0.05 and conclude that there is an effect, while in fact the effect is not there at all. In some cases the chance is even 100 percent.”
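What De Heide describes can be checked with a small simulation. The sketch below is an editorial illustration with invented settings, not her analysis: it generates data in which the drug has no effect at all, then compares a single test at the planned sample size with optional stopping, where batches of subjects keep being added and the test is repeated after every batch.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(1)

def two_sided_p(data):
    # z-test against "no effect", with known standard deviation 1
    # (the simulated data are standard normal, so this is exact).
    z = data.mean() * sqrt(len(data))
    return erfc(abs(z) / sqrt(2))

n_runs = 5_000
start_n, batch, max_n = 30, 10, 200

fixed_hits = 0     # significant at the planned sample size of 30
peeking_hits = 0   # significant at any of the interim looks

for _ in range(n_runs):
    data = rng.standard_normal(max_n)          # no effect at all: pure noise
    if two_sided_p(data[:start_n]) < 0.05:     # single test, fixed sample size
        fixed_hits += 1
    n = start_n                                # optional stopping: test, add a
    while n <= max_n:                          # batch of subjects, test again...
        if two_sided_p(data[:n]) < 0.05:
            peeking_hits += 1
            break
        n += batch

print(f"false positives, fixed n = 30:       {fixed_hits / n_runs:.1%}")
print(f"false positives, optional stopping:  {peeking_hits / n_runs:.1%}")
# The first rate stays close to the nominal 5%; the second is much higher,
# and it keeps growing as more and more interim looks are allowed.
```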

That sounds crazy. If you add test subjects, are you sure you will get incorrect results?

“Yes, in some cases. If you do everything by the book, the chance of a false positive is only 5 percent, because the threshold for the p-value is 0.05. But if you do optional stopping and add a few more people after looking at the data, that chance increases. Often researchers do not mention that they have done this, or are not even aware that it is not allowed.

“Sometimes scientists deliberately want to do optional stopping. For example, you evaluate the results per test subject and stop if you see no effect. That is less expensive and often more ethical, for example if you want to investigate whether a vaccine works. But if you were to use the p-value in that way, the chance of a false positive would really be 100 percent.”

A useful feature is that you can also combine e-values

Does this problem not exist with the new e-value that you propose?

“No, with the e-value you can simply do optional stopping. It has also already been used for research into the effectiveness of a vaccine. We also think that the e-value is generally easier to understand than the p-value and will therefore lead to fewer problems.”
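The contrast with the earlier simulation can be made concrete. The sketch below is again an editorial illustration: it uses a likelihood ratio as the e-value, which is one standard way to construct e-values (the interview itself does not spell out the construction), and it stops the experiment the moment the e-value exceeds 20, the threshold discussed in the next answer. Even with that aggressive peeking, the false positive rate stays below 1 in 20.

```python
import numpy as np

rng = np.random.default_rng(2)

mu_alt = -0.5        # effect size the alternative hypothesis bets on (illustrative)
threshold = 20.0     # e-value above 20 counts as significant, as in the interview

n_runs, max_n = 5_000, 200
hits = 0

for _ in range(n_runs):
    data = rng.standard_normal(max_n)   # the null is true: no effect, pure noise
    # Running likelihood ratio of "effect mu_alt" vs "no effect",
    # one standard way to build an e-value that may be monitored continuously.
    log_e = np.cumsum(mu_alt * data - 0.5 * mu_alt**2)
    # Optional stopping: declare success the moment the e-value crosses the threshold.
    if np.any(np.exp(log_e) >= threshold):
        hits += 1

print(f"false positives despite optional stopping: {hits / n_runs:.1%}")
# Stays below 1/20 = 5% no matter how often you look or how many subjects you add,
# which is exactly the guarantee the p-value loses under optional stopping.
```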

How does this e-value work?

“The e-value indicates how strong the expectation is that a hypothesis is correct. The ‘e’ stands for ‘expectation’, but also for ‘evidence’, because it is also a measure of how much evidence your research provides for a hypothesis.

“For example, if you are studying a medicine that is intended to lower blood pressure, the e-value indicates how likely it is that the medicine actually lowers blood pressure. As with the p-value, there is a threshold: if the e-value is greater than 20, you can speak of statistical significance, and in this example you may then assume that the medicine lowers blood pressure. An e-value is therefore not a probability like the p-value, but a positive number; the larger it is, the stronger the evidence.

“A useful feature is that you can also combine e-values. This allows you to indicate how two studies together strengthen the evidence for a hypothesis, simply by multiplying their e-values. If one research group finds an e-value of 5 and the other finds a value of 10, then together they can say they have a value of 50. This is not possible with the p-value.”
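As a closing illustration, again an editorial sketch with invented data and the same likelihood-ratio construction as above, the combination rule really is just multiplication of the e-values that two separate studies report on their own data.

```python
import numpy as np

rng = np.random.default_rng(3)

mu_alt = -0.3   # the effect size the alternative hypothesis bets on (illustrative)

def e_value(data):
    # Likelihood ratio of "mean mu_alt" vs "mean 0" for unit-variance data,
    # one standard way to construct an e-value for a single study.
    return float(np.exp(mu_alt * data.sum() - 0.5 * mu_alt**2 * len(data)))

# Two hypothetical studies of the same blood-pressure drug, simulated on
# separate data and here with a genuine effect of size mu_alt.
study_1 = rng.normal(mu_alt, 1.0, size=30)
study_2 = rng.normal(mu_alt, 1.0, size=40)

e1, e2 = e_value(study_1), e_value(study_2)
combined = e1 * e2   # evidence from separate data sets combines by multiplication

print(f"study 1: e = {e1:.1f}   study 2: e = {e2:.1f}   combined: e = {combined:.1f}")
# Exactly as in the interview's example: e-values of 5 and 10 would combine to 50.
```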



