Accuracy and prediction… the limits of our intuition
Imagine that an infectious disease is spreading around the world and that a test is available with a proven accuracy of, say, 90%. Ask yourself the following question:
What is the likelihood that I am infected, if the test is positive?
Usually, people answer that the probability is 90%, that is, equal to the accuracy of the test. But this answer is wrong and betrays our difficulty in reasoning correctly with probabilities.
In reality, the probability in question could be any number between 0% and 100%!
Let me explain. Before doing so, however, a clarification is necessary. A test has two types of accuracy: its ability to correctly identify infected people, which is called ‘sensitivity’, and its ability to correctly identify non-infected people, which is called ‘specificity’. To simplify the discussion, we will take the 90% accuracy of the test in question to mean that both its sensitivity and its specificity are 90%.
So, how is it possible that with a 90% accurate test, the probability that a person is infected, when the test response is positive, can be any percentage, between 0% and 100%?
It’s simple: we usually forget that this probability depends on the proportion of infected people in the population.
To understand this, we must first remember that the probability in question is a conditional probability. In fact, we are looking for the probability that the person is infected (in), conditional on the fact that the test result is positive (+). Let us denote this probability P(in|+), as is customary in probability theory.
Now we must remember that the conditional probability of the event “in”, knowing that the event “+” is realized, is given by the probability that “in” and “+” are simultaneously realized, divided by the probability that “+” is realized. With obvious notation, we therefore have the following formula:
P(in|+) = P(in & +) / P(+).
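To make the formula concrete, here is a minimal Python sketch that evaluates it by simply counting people in a hypothetical population. The population size of 10,000, the numbers of infected people and the function name are assumptions chosen for illustration only; the 90% sensitivity and specificity are those of the test discussed above.

```python
def positive_predictive_value(population, infected, sensitivity=0.9, specificity=0.9):
    """Estimate P(in|+) by counting people in a hypothetical population."""
    non_infected = population - infected
    # Infected people who test positive: the event "in & +".
    true_positives = infected * sensitivity
    # Non-infected people who nevertheless test positive.
    false_positives = non_infected * (1 - specificity)
    # Everyone who tests positive: the event "+".
    all_positives = true_positives + false_positives
    return true_positives / all_positives


# Illustrative (made-up) numbers of infected people in a population of 10,000.
for infected in (10, 1_000, 5_000, 9_990):
    ppv = positive_predictive_value(10_000, infected)
    print(f"{infected:>5} infected out of 10,000 -> P(in|+) = {ppv:.1%}")
```

With 1,000 infected people out of 10,000, for instance, about 900 infected people and about 900 non-infected people test positive, so a positive result means infection only about half of the time, even though the test is 90% accurate.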
In the literature, the probability P(in|+) is called the “positive predictive value”, often abbreviated as PPV. To obtain a more explicit formula, the idea is to further decompose the joint probability P(in…