Digital Tbucket Tank (DTT)

How Scientists Can Stop Fooling Themselves About Statistics

An exciting article by Dorothy Bishop appeared in Nature 584, 9 (2020); doi: 10.1038/d41586-020-02275-8

Collecting simulated data can reveal common ways in which our cognitive biases lead us astray.



A few words about the author:  

Professor Dorothy Bishop


Professor of Developmental Neuropsychology, Department of Experimental Psychology; Fellow of St. John's College


Professor Bishop researches language impairments in children. In some cases the difficulties have an obvious cause, such as hearing loss or a condition like Down's syndrome. In other cases, children have marked difficulty learning to speak or to understand language for no apparent reason. Professor Bishop has studied children with specific language impairment (SLI), who make up about 3% of the population but tend to be neglected by researchers. Using twin studies, she has investigated the genetic components of these disorders and has worked with molecular geneticists to find out which genes are involved. Image source: Wikipedia


How Scientists Can Stop Fooling Themselves About Statistics

Numerous efforts have been made over the past decade to promote robust and credible research. Some focus on changing incentives, such as revising funding and publication criteria to favor open science over sensational breakthroughs. But attention must also be paid to the individual. All-too-human cognitive biases can lead us to see results that are not there. Faulty reasoning leads to sloppy science, even when the intentions are good.

Researchers need to become more aware of these pitfalls. Just as laboratory scientists are not allowed to handle dangerous substances without safety training, researchers should not be allowed to get anywhere near a P value or similar statistical probability measure until they have demonstrated that they understand what it means.

We all tend to overlook evidence that contradicts our views. When faced with new data, our pre-existing ideas can lead us to see structures that are not there. This is a form of confirmation bias: we seek out and remember information that fits what we already think. This can be adaptive: people need to be able to pick out important information and act quickly in the face of danger. But such filtering can lead to scientific error.


Physicist Robert Millikan's 1913 measurement of the charge on the electron is an example. Although he claimed that his paper included every data point from his famous oil-drop experiment, his notebooks revealed other, unreported data points that would have changed the final value only slightly but would have given it a larger statistical error. There has been debate over whether Millikan intended to mislead his readers. However, it is not unusual for honest people to suppress memories of inconvenient facts (R. C. Jennings Sci. Eng. Ethics 10, 639-653; 2004).

Another type of limitation promotes misunderstandings of probability and statistics. We have long known that people have difficulty grasping the uncertainty associated with small samples (A. Tversky and D. Kahneman Psychol. Bull. 76, 105-110; 1971). As a topical example, suppose that 5% of the population is infected with a virus. We have 100 hospitals that each test 25 people, 100 hospitals that test 50 people, and 100 that test 100 people. What percentage of hospitals will find no cases at all and mistakenly conclude that the virus has disappeared? The answer is 28% of the hospitals that test 25 people, 8% of those that test 50 people, and 1% of those that test 100 people. The average number of cases detected is the same regardless of the number of people tested, but the range is much greater with a small sample.
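These figures are easy to verify: the chance that a hospital testing n people sees no cases is (1 - 0.05)^n. The short sketch below is my own illustration in plain Python, not material from the article; it computes the exact probability and cross-checks it with a quick simulation.

```python
import random

PREVALENCE = 0.05  # assumed infection rate: 5% of the population

for n_tested in (25, 50, 100):
    # Exact probability that all n tests come back negative: (1 - 0.05) ** n
    p_zero = (1 - PREVALENCE) ** n_tested

    # Cross-check by simulating 20,000 hospitals that each test n people
    trials = 20_000
    zero_case_hospitals = sum(
        all(random.random() >= PREVALENCE for _ in range(n_tested))
        for _ in range(trials)
    )

    print(f"n = {n_tested:3d}: exact {p_zero:.1%}, "
          f"simulated {zero_case_hospitals / trials:.1%}")
```

Running it reproduces the article's figures: roughly 28%, 8% and 1% of hospitals see no cases at all.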

This non-linear scaling is difficult to grasp intuitively. It leads researchers to underestimate how noisy small samples can be, and therefore to run studies that lack the statistical power to detect an effect.

Researchers also fail to realize that the significance of a result, expressed as a P value, depends critically on context. The more variables you examine, the more likely you are to find a spuriously "significant" value. For example, if you test 14 metabolites for an association with a disorder, the probability of finding at least one P value below 0.05 - a commonly used threshold of statistical significance - by chance alone is not 1 in 20, but closer to 1 in 2.
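A quick back-of-the-envelope check, assuming for simplicity that the 14 tests are independent, confirms that figure:

```python
# Chance of at least one spuriously "significant" P value among 14
# independent tests of true null effects, at a threshold of P < 0.05.
alpha = 0.05
n_tests = 14

p_at_least_one = 1 - (1 - alpha) ** n_tests
print(f"{p_at_least_one:.2f}")  # about 0.51 - closer to 1 in 2 than 1 in 20
```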

How can we convey an understanding of this? One thing is clear: conventional training in statistics is inadequate, or even counterproductive, because it can give users unwarranted confidence. I am experimenting with an alternative approach: generating simulated data that students can subject to various statistical analyses. I use this to convey two key concepts.

First, when students are presented with null data sets (such as columns of random numbers), they quickly discover how easy it is to find results that appear statistically "significant" but are spurious. Researchers need to learn that interpreting a P value for the question "Is A associated with B?" is very different from doing so for the question "Are there correlations among the variables A, B, C, D and E for which P < 0.05?". Asking whether one particular metabolite is associated with a disease is not the same as screening a range of metabolites to see whether any are associated with it; the latter requires much more stringent testing.
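A classroom exercise along these lines might look like the sketch below; it is my own illustration of the idea (using hypothetical variable names and NumPy/SciPy), not Professor Bishop's actual teaching material. It generates pure noise for an outcome and five predictors, then counts how often at least one correlation slips under P < 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n_subjects = 20       # small sample, as in a typical pilot study
n_predictors = 5      # "variables A-E", all pure noise
n_experiments = 2000  # repeat the whole null "study" many times

hits = 0
for _ in range(n_experiments):
    outcome = rng.normal(size=n_subjects)
    predictors = rng.normal(size=(n_predictors, n_subjects))

    # Smallest P value among the five null correlations with the outcome
    p_values = [stats.pearsonr(x, outcome)[1] for x in predictors]
    if min(p_values) < 0.05:
        hits += 1

print(f"At least one spurious 'significant' correlation in "
      f"{hits / n_experiments:.0%} of experiments")
```

With these settings the rate typically comes out near what 1 - 0.95^5 (about 23%) would predict, far above the nominal 5% that a single pre-specified test would give.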


Simulated data are also informative when the samples are drawn from two "populations" with different means. Students quickly learn that, with small sample sizes, an experiment can be useless for revealing even a moderate difference. A 30-minute data simulation can stun researchers once they grasp its implications.
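For this second concept, a sketch along the following lines (again my own illustration, not code from the article) draws small samples from two normal populations whose means truly differ by half a standard deviation and records how often a t-test reaches P < 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

true_difference = 0.5  # real gap between the population means, in SD units
n_experiments = 5000

for n_per_group in (10, 30, 100):
    significant = 0
    for _ in range(n_experiments):
        group_a = rng.normal(0.0, 1.0, n_per_group)
        group_b = rng.normal(true_difference, 1.0, n_per_group)
        _, p_value = stats.ttest_ind(group_a, group_b)
        if p_value < 0.05:
            significant += 1

    print(f"n = {n_per_group:3d} per group: {significant / n_experiments:.0%} "
          f"of experiments detect the difference at P < 0.05")
```

With 10 participants per group only a small minority of experiments detect the genuine difference; with 100 per group almost all of them do.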


Researchers need to acquire lifelong habits to avoid being misled by confirmation bias. Observations that contradict our expectations need special attention. Charles Darwin wrote in 1876 that he had made a habit of "whenever a published fact, a new observation or thought came across me, which was opposed to my general results, to make a memorandum of it without fail and at once; for I had found by experience that such facts and thoughts were far more apt to escape from the memory than favourable ones". I have seen this myself. When writing literature reviews, I have been horrified to find that I had completely forgotten to mention papers that ran counter to my instincts, even though those papers had no particular flaws. I now try to list them.

We all find it difficult to see the flaws in our own work - this is a normal part of human cognition. But if we understand these blind spots, we can avoid them.