In another thread Wrath of the Swarm posed a question about the interpretation of medical test results. Some of you may be aware that this thread took a direction that, to some degree, moved away from the original point. I think the direction that this thread should have moved in (if we are to create more light than heat) is to look at how many medical students and doctors actually get questions of this sort correct. That is what I propose to do with this thread.
In the original thread Wrath of the Swarm wrote:
I would like to dispute that studies do show that doctors routinely fail basic aspects of test interpretation ... all those sites that pointed out that doctors routinely fail questions about the most basic aspects of test interpretation.
In order to examine how medical students and doctors perform on this sort of question, I will use the question from the Casscells et al. (1978) study as an example. This question was given to a group of faculty, staff and fourth-year students at Harvard Medical School.
From the original study:
"If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person's symptoms or signs? ____%"

(Incidentally, an assumption needs to be made in this question to give a precise result, but for the purposes of the study this does not matter, as subjects only had to give an approximately correct answer to be classified as giving the correct answer.)
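The intended answer drops straight out of Bayes' rule. A minimal sketch, using the numbers from the quoted question and assuming a 100% true-positive rate (the unstated assumption the study allowed for):

```python
# Bayes' rule for P(disease | positive test).
# prevalence and false positive rate are from the quoted question;
# the sensitivity of 1.0 is an assumption the question leaves unstated.
prevalence = 1 / 1000        # base rate of the disease
false_positive_rate = 0.05   # P(positive | healthy)
sensitivity = 1.0            # P(positive | disease), assumed

p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))
p_disease_given_positive = sensitivity * prevalence / p_positive

print(round(p_disease_given_positive, 4))  # ~0.0196, i.e. about 2%
```

Note how the huge pool of healthy people (999 in 1000) contributes far more false positives than the sick contribute true positives, which is exactly what the popular answer of 95% ignores.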
Only 18% of those taking the test gave the correct answer (~2%); the most frequent answer was in fact 95%, which is woefully wrong. This seems, at least initially, like a worrying statistic, especially considering the subjects were from Harvard Medical School. A large number of studies in the 70s and 80s found similar patterns across many different subject groups on many similar tasks.
This sort of incorrect response was termed 'base-rate neglect', for the rather obvious reason that subjects appeared to neglect the low base rate (or prevalence) of the disease and thereby overestimated the power of the test. The basic argument is that human reasoning does not approximate Bayesian reasoning.
However, from the late 80s until the present day, a number of cognitive psychologists have started to reappraise the phenomenon and question whether base-rate neglect is, in fact, really present in human reasoning. In fact, Staddon (1988) has argued that in animals from sea snails to humans, the learning mechanisms responsible for habituation, sensitization, classical conditioning and operant conditioning can be formally described as Bayesian inference machines.
I'd like to look, in particular, at a study by Cosmides & Tooby (1996) which used the question above as a control condition. They also presented the same question (in terms of the data) but rephrased using frequencies rather than percentages (technically a frequentist rather than a Bayesian representation). The results painted a completely new picture of base-rate neglect. Only 12% of the Stanford (non-medical) students in this study who were tested on the original question gave the correct answer, as opposed to 18% in the original study (I will leave it to others to speculate whether this reflects on the academic merits of the two universities or is merely experimental noise).
When students were presented with the question using frequencies, specifically:
1 out of every 1000 Americans has disease X. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive (i.e., the "true positive" rate is 100%). But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease (i.e., the "false positive" rate is 5%).

Imagine that we have assembled a random sample of 1000 Americans. They were selected by a lottery. Those who conducted the lottery had no information about the health status of any of these people.

Given the information above, on average, how many people who test positive for the disease will actually have the disease? _____ out of _____

76% of the non-medical students got this question correct, compared to 18% of the medical students in the original study. So it appears that merely rewording the question in terms of frequencies instead of percentages makes base-rate neglect disappear. In fact, when the non-medical students were presented with the above question along with some supplementary questions to focus their reply, 92% gave the correct answer.
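Part of why the frequency framing helps is that the whole problem reduces to counting. A sketch of that arithmetic, using the numbers from the quoted question:

```python
# Counting version of the same problem, per 1000 Americans.
population = 1000
sick = 1                      # 1 in every 1000 has disease X
healthy = population - sick   # the other 999

true_positives = sick         # the test always detects the disease
# The question states ~50 of every 1000 healthy people test positive
# (a 5% false positive rate); rounding 999 * 0.05 gives the same 50.
false_positives = round(healthy * 0.05)

total_positives = true_positives + false_positives
print(true_positives, "out of", total_positives)  # 1 out of 51, about 2%
```

Seen this way, the answer "1 out of 51" requires no explicit application of Bayes' rule at all, which is precisely Cosmides & Tooby's point about representation.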
A number of other studies (e.g. Medin & Edelson, 1988) have shown the same sort of effect, and base-rate neglect is now regarded as a rather simplistic and outmoded way of looking at human reasoning. In short, the earlier studies do not show that medical students and doctors are unable to take base rates into account; if they were, how could non-medical students do so well simply because the question was posed in a different way?
The next time I'm sick I think I'll still consult a doctor rather than a statistician.
Casscells, W., Schoenberger, A. & Grayboys, T. (1978). Interpretation by physicians of clinical laboratory results. New England Journal of Medicine, 299, 999-1000.
Cosmides, L. & Tooby, J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition, 58, 1-73.
Medin, D. L. & Edelson, S. M. (1988). Problem structure and the use of base-rate information from experience. Journal of Experimental Psychology: General, 117(1), 68-85.
Staddon, J. E. R. (1988). Learning as inference. In R. C. Bolles & M. D. Beecher (Eds.), Evolution and learning. Hillsdale, NJ: Erlbaum.
Note: I've been unable to find links to the above articles online. If anyone can find links to any or all of them I'd appreciate it if they could post them. This article, however, presents the argument that base rates are not normally neglected in human reasoning.