Accuracy of Test Interpretation - The Facts

steve74

Scholar
Joined
Feb 1, 2004
Messages
89
In another thread Wrath of the Swarm posed a question about the interpretation of medical test results. Some of you may be aware that that thread took a direction that, to some degree, moved away from the original point. The direction it should have moved in (if we are to create more light than heat) is to look at how many medical students and doctors actually get questions of this sort correct. That is what I propose to do in this thread.

In the original thread Wrath of the Swarm wrote:
....all those sites that pointed out that doctors routinely fail questions about the most basic aspects of test interpretation.
I would like to dispute the claim that studies show doctors routinely fail basic aspects of test interpretation.

In order to examine how medical students and doctors perform on this sort of question, I will use the question from the Casscells et al. (1978) study as an example. This question was given to a group of faculty, staff and fourth-year students at Harvard Medical School.

From the original study:
“If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person’s symptoms or signs?_ ____%”
(Incidentally, an assumption needs to be made in this question to give a precise result, but for the purpose of the study this does not matter as subjects only had to give an approximately correct answer to be classified as giving the correct answer.)

Only 18% of those taking the test gave the correct answer (~2%), and the most frequent answer given was in fact 95%, which is woefully wrong. This seems, at least initially, like a worrying statistic, especially considering the subjects were from Harvard Medical School. A large number of studies in the 70s and 80s showed similar patterns from many different subjects across many similar tasks.
This incorrect sort of response was termed 'base-rate neglect', for the rather obvious reason that subjects appeared to be neglecting to take into account the low base rate (or prevalence) of a disease and, thereby, overestimating the power of the test. The basic argument is that human reasoning does not approximate Bayesian reasoning.
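For anyone who wants to check the ~2% figure, here is a minimal sketch of the Bayesian calculation in Python (the 100% true positive rate is an assumption the question forces on you, as discussed later in this thread):

```python
# Bayes' theorem applied to the Casscells et al. question.
# Assumption: the true positive rate (sensitivity) is 100%,
# since the question doesn't state it.

prevalence = 1 / 1000        # P(disease) - the 1/1000 base rate
sensitivity = 1.0            # P(positive | disease), assumed
false_positive_rate = 0.05   # P(positive | healthy)

# Total probability of a positive test:
# P(+) = P(+|D) * P(D) + P(+|healthy) * P(healthy)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' theorem: P(D|+) = P(+|D) * P(D) / P(+)
posterior = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {posterior:.2%}")  # ~1.96%, i.e. about 2%
```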
However, from the late 80s until the present day, a number of cognitive psychologists have started to reappraise the phenomenon and question whether base-rate neglect is, in fact, really present in human reasoning. In fact, Staddon (1988) has argued that in animals from sea snails to humans, the learning mechanisms responsible for habituation, sensitization, classical conditioning and operant conditioning can be formally described as Bayesian inference machines.
I'd like to look, in particular, at a study by Cosmides & Tooby (1996) which used the question above as a control condition. They also presented the same question (in terms of the data) but rephrased using frequencies rather than percentages (technically a frequentist rather than Bayesian representation). The results painted a completely new picture of base-rate neglect. 12% of the Stanford (non-medical) students in this study who were tested on the original question gave the correct answer, as opposed to 18% in the original study (I will leave it to others to speculate as to whether this reflects on the academic merits of the two universities or is merely experimental noise).

When students were presented with the question using frequencies, specifically:
1 out of every 1000 Americans has disease X. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive (i.e., the "true positive" rate is 100%). But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease (i.e., the "false positive" rate is 5%).
Imagine that we have assembled a random sample of 1000 Americans. They were selected by a lottery. Those who conducted the lottery had no information about the health status of any of these people.
Given the information above:
on average,
How many people who test positive for the disease will actually have the disease?_____ out of ______
76% of the non-medical students got this question correct, compared to 18% of the medical subjects in the original study. So it appears that, just by rewording the question in terms of frequencies instead of percentages, base-rate neglect disappears. In fact, when the non-medical students were presented with the above question along with some supplementary questions to focus their reply, 92% got the correct answer.
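The frequency wording makes the arithmetic almost transparent. Here is a minimal sketch of the count-based version of the same calculation, with the numbers taken straight from the question above:

```python
# The same problem in frequency (count) form, as worded by Cosmides & Tooby.
sample_size = 1000
diseased = 1                       # 1 out of every 1000 Americans has disease X
true_positives = diseased          # the true positive rate is 100%
healthy = sample_size - diseased   # 999 healthy people
false_positives = healthy * 50 / 1000  # 50 of every 1000 healthy test positive

total_positives = true_positives + false_positives
print(f"{true_positives} out of {total_positives:.0f} positives have the disease")
# -> 1 out of 51, i.e. roughly 2% - the same answer as the Bayesian calculation
```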
A number of other studies (e.g. Medin & Edelson, 1988) have shown the same sort of effect, and base-rate neglect is now regarded as a rather simplistic and outmoded way of looking at human reasoning. In short, the earlier studies do not show that medical students and doctors are unable to take base rates into account. If they did, how could non-medical students do so well when the question is merely posed in a different way?
The next time I'm sick I think I'll still consult a doctor rather than a statistician.

Casscells, W., Schoenberger, A., & Graboys, T. (1978). Interpretation by physicians of clinical laboratory results. New England Journal of Medicine, 299, 999-1000.

Cosmides, L., & Tooby, J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition, 58, 1-73.

Medin, D. L., & Edelson, S. M. (1988). Problem structure and the use of base-rate information from experience. Journal of Experimental Psychology: General, 117(1), 68-85.

Staddon, J. E. R. (1988). Learning as inference. In R. C. Bolles & M. D. Beecher (Eds.), Evolution and learning. Hillsdale, NJ: Erlbaum.


Note: I've been unable to find links to the above articles online. If anyone can find links to any or all of them I'd appreciate it if they could post them. This article, however, presents the argument that base rates are not normally neglected in human reasoning.
 
But don't those questions give the students the terms they are looking for: 'false positive rate' and so on, enabling them to plug the numbers into formulae they have learned by rote? This does not demonstrate true understanding on the part of the students, merely that they are able to function effectively as calculating machines.
 
ceptimus said:
But don't those questions give the students the terms they are looking for: 'false positive rate' and so on, enabling them to plug the numbers into formulae they have learned by rote? This does not demonstrate true understanding on the part of the students, merely that they are able to function effectively as calculating machines.

The students in this study were non-medical undergraduate students, so I really don't think that applies. I managed to get through maths A-level and three years as a psychology undergrad without ever learning Bayes' theorem.

Also, you'll notice that both studies gave false positive rates, yet the level of correct responses in the Cosmides & Tooby study was much higher. These students were demonstrating real understanding; all they needed was the data in a format (frequencies rather than percentages) that is used in everyday human reasoning.

Edited to add: I just checked the original paper and Cosmides & Tooby did a further experiment which made no mention of false positives or percentages yet still 72% got the correct answer.
 
OK, what assumption must be made?

I've calculated the true negative rate as ~0.95

The sum total of true positives and false negatives is 0.001, but that would make the positive predictive value anything <= 0.02 -- could be much lower depending on the sensitivity or the false negative rate.
 
sickstan said:
OK, what assumption must be made?

I've calculated the true negative rate as ~0.95

The sum total of true positives and false negatives is 0.001, but that would make the positive predictive value anything <= 0.02 -- could be much lower depending on the sensitivity or the false negative rate.

You need to assume that the true positive rate is 100% to give an exact value of 1.963%. But, as I said, this doesn't really matter as even a roughly correct answer will be classified as correct. The most common answer was 95%, which was way out.
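To make the dependence on that assumption concrete, here is a minimal sketch showing how the positive predictive value moves with the unstated true positive rate (the function name and the values other than 100% are mine, for illustration):

```python
# Positive predictive value P(disease | positive) as a function of sensitivity.
# With sensitivity = 1.0 you get the exact 1.963% quoted above; lower
# sensitivities give smaller values, which is the <= ~0.02 bound noted earlier.

def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for sens in (1.0, 0.9, 0.5):
    ppv = positive_predictive_value(0.001, sens, 0.05)
    print(f"sensitivity {sens:.0%}: PPV = {ppv:.3%}")
# sensitivity 100%: PPV = 1.963%
# sensitivity 90%:  PPV = 1.770%
# sensitivity 50%:  PPV = 0.991%
```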

The main point is that when the question is presented in terms of frequencies, a lot more people are correct.
 
sickstan said:
OK, what assumption must be made?
The assumption that must be made is that the false negative rate is sufficiently small to not make a difference in the calculation. Only in such a case does knowing the population incidence permit the question to be answered.
 
Wrath of the Swarm said:
In one of the links I cited, that basic question was asked in a manner that presented the frequencies. The students gave the correct answer less than half of the time.

Perhaps these links would be helpful:

http://dieoff.org/page19.htm
http://www.mpib-berlin.mpg.de/en/institut/dok/full/martignon/kssbimbri/kssbimbri.pdf
http://yudkowsky.net/bayes/bayes.html
http://www.bcu.ubc.ca/~whitlock/bio300/Probability/FalsePositives&Doctors.html

I presume you are referring to the Gigerenzer & Hoffrage (1995) study where 46% of doctors gave the correct answer? This study used the same question as Eddy (1982). In Eddy's original study 70-80% of subjects gave the incorrect answer. So, there was an obvious improvement when using frequencies rather than probabilities (although not as large as in the Cosmides & Tooby study).

Your original question, however, was not given using frequencies, yet you still claimed that it demonstrated that doctors could not interpret tests correctly. I happen to have Gigerenzer & Hoffrage's paper in front of me and they would dispute that:

"Testing people's competencies for Bayesian inference with standard probability formats thus seems analogous to testing a pocket calculator's competence by feeding it binary numbers."

It would seem that even the studies you cite in favour of your argument disagree with your position.

Edited for spelling
 
I don't follow. They say that people can't answer the question - how exactly does that contradict my position, again?
 
It contradicts your position because they say that the reason people get the question wrong is not that they are unable to reason correctly, but the way the question is presented. You claimed that doctors doing poorly on this question meant that they couldn't interpret test results well. That is not the case, as the Cosmides & Tooby study shows. It is the case that people (even non-medical students) can answer this sort of question if it is presented in a suitable format.
 
Some of those sites demonstrate that doctors can't answer the questions well even when the data are presented in terms of frequencies.

There are also studies that show the order in which information is presented can have a significant effect on the conclusions people draw from them.

[added] If people can't answer the question when it is presented in a particular way, then that does show that they can't reason properly. If they could, it wouldn't matter in what way the problem was presented, as long as the information presented was the same.
 
Some of those sites demonstrate that doctors can't answer the questions well even when the data are presented in terms of frequencies.
All the evidence shows that when questions are presented in terms of frequencies, performance improves dramatically. The Cosmides & Tooby study also shows that 92% of non-medical students can answer this sort of question correctly.

There are also studies that show the order in which information is presented can have a significant effect on the conclusions people draw from them.
And?

[added] If people can't answer the question when it is presented in a particular way, then that does show that they can't reason properly. If they could, it wouldn't matter in what way the problem was presented, as long as the information presented was the same.
The Cosmides & Tooby study shows quite clearly that people can reason correctly about these sorts of problems. The fact that it does make a difference how the information is presented shows that the wording of questions such as the one you asked is the problem, not doctors' reasoning skills.
If the question you asked shows that doctors can't reason correctly, how is it possible that 92% of non-medical students get it correct, just by changing the wording?
 
First of all, other studies show that even when the presentation is altered, most people can't answer it correctly. I admit that one study shows that the majority of people can get the right answer if the question is presented just right.

Secondly, if people can't solve the problem when sufficient information is presented to them, that would seem to suggest there are some problems with their reasoning.
 
First of all, other studies show that even when the presentation is altered, most people can't answer it correctly.
Would you care to reference them? You've only mentioned one so far.

I admit that one study shows that the majority of people can get the right answer if the question is presented just right.

Well, it's more than one study. For example, in the Medin & Edelson study I mentioned above, non-medical students were trained on pairs of symptoms which matched a particular disease. The results showed that the students could successfully use this training to 'diagnose' correctly. Moreover, the students' performance was entirely free of base-rate neglect. In fact, the students would often make diagnoses based on symptoms with low base rates, if those symptoms provided a perfect predictor of the disease. Once again, this is non-medical students showing no base-rate neglect in a clinical task. And, in this particular study, the data were presented as a series of matched stimuli (i.e. symptom 1, symptom 2, disease) rather than in sentence form, so claiming that the question needs to be presented 'just right' is incorrect.

There is also further work on this topic by Weber et al. (1993). I'll quote you some of their conclusion:
When mentioning a particular hypothesis at multiple levels of specificity, doctors generated them in a logically consistent general-to-specific order most of the time and increasingly so with more clinical experience.


Secondly, if people can't solve the problem when sufficient information is presented to them, that would seem to suggest there are some problems with their reasoning.
As I have already pointed out to you, 92% of the non-medical students in the Cosmides & Tooby study were able to reason quite correctly on this type of question. There are plenty of other studies which back this up, and I have quoted two of them above. The simple fact is that doctors do interpret test results well, and the research backs this up. The idea that people neglect base rates is outmoded and incorrect.

Weber, E. U., Böckenholt, U., Hilton, D. J., & Wallace, B. (1993). Determinants of diagnostic hypothesis generation: Effects of information, base rates, and experience. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1151-1164.
 
Wrath of the Swarm said:
Secondly, if people can't solve the problem when sufficient information is presented to them, that would seem to suggest there are some problems with their reasoning.

What Steve-integer said makes a lot of sense (and I recognize his writing from somewhere; he reminds me of a type of dinosaur). I'd like to add that the only way the phrasing of the question really matters is in how the doctor encounters it in terms of his practice. Clearly, given the nature of the practice, the frequentist mode dominates.
 
I would agree that the base-rate neglect problem has been magnified out of proportion, but I disagree strongly that it doesn't exist at all.

Doctors (and all human beings) seem to rely on intuitive heuristics rather than consciously performing calculations. I submit that this can lead to dangerous "blind spots" and errors in reasoning.

In this link, even presenting general frequencies didn't lead to very good accuracies. Only when the numbers for each individual state were provided did accuracy rise significantly.

Don't you find it even a little disturbing that people can only reach the right answer when most of the work is done for them? It's quite simple to derive the frequencies from the standard presentation, yet hardly anyone does so correctly.

I'm sorry, but I find your attempted defense of medical professionals and humanity in general to be lacking.
 
I would agree that the base-rate neglect problem has been magnified out of proportion, but I disagree strongly that it doesn't exist at all.

Doctors (and all human beings) seem to rely on intuitive heuristics rather than consciously performing calculations. I submit that this can lead to dangerous "blind spots" and errors in reasoning.
I would submit that one of the people who has been magnifying it out of proportion is yourself. You claimed that doctors who couldn't solve the question you posed were unable to interpret test results accurately. The evidence suggests otherwise.
In this link, even presenting general frequencies didn't lead to very good accuracies. Only when the numbers for each individual state were provided did accuracy rise significantly.

Don't you find it even a little disturbing that people can only reach the right answer when most of the work is done for them? It's quite simple to derive the frequencies from the standard presentation, yet hardly anyone does so correctly.
The point is that people's reasoning, probably for evolutionary reasons, is biased towards frequentist interpretations of data. This does not mean that they are unable to reason correctly, as the studies I cited above show.

I'm sorry, but I find your attempted defense of medical professionals and humanity in general to be lacking.
Really, how so? I've presented plenty of evidence that your assertion that doctors can't interpret test results correctly is incorrect. I've also presented plenty of evidence that the general population doesn't show base rate neglect. Throughout this thread you have been backing away from your original position. I find your attempts to discredit the medical profession to be so wide of the mark, they're almost laughable.
 
Um... how can I put this? ... no, I haven't.

I'm sure you'd like to believe otherwise, but I've offered the same point I always have:

People don't understand statistics very well at all, even doctors. I find it odd that you emphasize the one experiment that found high levels of subject comprehension when the presentation of the problem was greatly simplified, while ignoring all the others that show most people still can't do the problem.
 
Wrath of the Swarm said:
Um... how can I put this? ... no, I haven't.

I'm sure you'd like to believe otherwise, but I've offered the same point I always have:

People don't understand statistics very well at all, even doctors. I find it odd that you emphasize the one experiment that found high levels of subject comprehension when the presentation of the problem was greatly simplified, while ignoring all the others that show most people still can't do the problem.

You have quite demonstrably backed down:

I would agree that the base-rate neglect problem has been magnified out of proportion

This from someone who started a thread with a base-rate neglect question to support his argument that doctors can't interpret tests accurately!

And please don't start with this 'one experiment' nonsense. I have referenced five studies in this thread (and there are many more) which support the position that base-rate neglect is an illusion. Despite all the evidence that contradicts it, you still cling to your dogma.
 
People aren't *completely* unaware of the base rate - but some researchers mistakenly thought they were, precisely because performance on such questions is so terrible.

It's most certainly not an illusion.
 
