Poll: Accuracy of Test Interpretation

Crap, I hadda open my big mouth.

This stuff can be resolved only by appealing to signal detection theory (this was a method perfected by communication engineers working for the military in WWII. They used it to help radar operators tell if that spot on the screen was just noise, or actually enemy planes).

SDT can be used in any situation where two states of reality are possible (you have cancer / you don't), and you have some way-- a test-- of identifying which is which on any trial (i.e., trial being a single patient that you test).

So, as many posted earlier, on any one trial (i.e. patient tested) there are four possibilities:


[Attached image: sdbox.jpg, the 2x2 table of the four signal detection outcomes (hit, miss, false alarm, correct rejection)]


The "response" is from the test (positive or negative); the "signa"l is reality / from the patient. God knows (sorry about that) the patient either has the disease or he doesn't

So, this is a detection task. Based on this test, can we detect the presence or absence of disease in our patients?

Detection tasks depend on two issues:

1) Sensitivity (unfortunately, this is not the "sensitivity" people are referring to above). Sensitivity in this example gets at the test's validity. A valid test is more sensitive than an invalid one (in the graph below, the bigger the distance between the means of the two bell curves, the greater the test's sensitivity / validity). If a test had zero validity (or reliability), the two distributions would sit right on top of each other.

Finally, sensitivity is often referred to as d' (d prime)
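For the curious, here is a minimal sketch of how d' is usually computed from a hit rate and a false-alarm rate; the rates below are made up, and the z-transform assumes equal-variance normal distributions for noise and signal:

```python
# d' = z(hit rate) - z(false alarm rate), using the inverse of the
# standard normal CDF. Example rates are invented for illustration.
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    z = NormalDist().inv_cdf        # z-score (inverse normal CDF)
    return z(hit_rate) - z(false_alarm_rate)

print(round(d_prime(0.90, 0.20), 2))   # -> about 2.12; bigger d' = more sensitive test
```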

2) Decision Making Criterion. This is up to the tester. At what point (test score) do I conclude the patient has the disease, and at what point (test score) do I assume the person does not have the disease?

This is called beta, and is represented by the black vertical line in the graph. Any observed test score higher than beta, and the doctor will conclude the patient is positive. Any test score lower than beta, and the doctor concludes the patient is negative.
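As a minimal sketch of that decision rule (the threshold value here is an arbitrary placeholder, not a real cutoff):

```python
# Any observed score above the criterion (beta) is called "positive".
BETA = 50.0                           # arbitrary placeholder criterion

def diagnose(test_score: float) -> str:
    return "positive" if test_score > BETA else "negative"

print(diagnose(63.2))                 # -> positive
print(diagnose(41.7))                 # -> negative
```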

[Attached image: ir2.gif, two overlapping bell curves (noise and signal distributions) with the decision criterion beta drawn as a vertical line]


But, where do you put beta? Ideally, to be unbiased, you should put it exactly at the point where the two curves intersect (in the graph above, beta looks like it's about a centimeter left of being optimal).

I'm pretty sure at this point, specificity = sensitivity (as sensitivity is defined by other posters).

But, there are times when you want to be biased one way or the other.

If the detection task is hearing the linebacker husband come home while you are diddling his wife, you might want to avoid misses at all costs (which will result, though, in making lots of false alarms). So, you would set beta way to the left, so that any remotely loud noise will force you to get up and investigate.

In this scenario, you will make lots of hits, but also lots of false alarms. To the best of my understanding, when beta is non-optimal, you can run into situations where sensitivity (hits / (hits + misses)) differs from specificity (correct rejections / (correct rejections + false alarms)).
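To make the two rates concrete, here is a small sketch using invented counts; note that the two values need not match once beta has been shifted:

```python
# Invented outcome counts for illustration only.
hits, misses = 90, 10                          # trials where disease is present
correct_rejections, false_alarms = 60, 40      # trials where disease is absent

sensitivity = hits / (hits + misses)                                      # true positive rate
specificity = correct_rejections / (correct_rejections + false_alarms)   # true negative rate

print(sensitivity, specificity)   # -> 0.9 0.6 (unequal because beta is shifted left)
```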

If the detection task were deciding whether an accused person is guilty, well, then we have to set beta way to the right (given our real high standard of proof beyond a reasonable doubt).

So, although we lower the incidence of putting innocent people in jail (i.e., false alarms), we raise the incidence of letting the guilty go free (i.e., misses).

Now, imagine testing not just one patient but hundreds or thousands. The results can be graphed as an ROC curve, which lets you see the validity of the test, along with its sensitivity and specificity.
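A rough sketch of the idea, with invented normal distributions standing in for the healthy and diseased groups; sweeping beta across the score range gives the points on the curve:

```python
# Sweep the criterion (beta) and record (false-alarm rate, hit rate)
# at each setting; those pairs trace out the ROC curve.
import random

random.seed(0)
healthy = [random.gauss(0.0, 1.0) for _ in range(1000)]    # noise distribution
diseased = [random.gauss(1.5, 1.0) for _ in range(1000)]   # signal distribution (d' about 1.5)

for beta in [-1.0, 0.0, 0.75, 1.5, 2.5]:
    hit_rate = sum(s > beta for s in diseased) / len(diseased)
    fa_rate = sum(s > beta for s in healthy) / len(healthy)
    print(f"beta={beta:5.2f}  FA rate={fa_rate:.2f}  hit rate={hit_rate:.2f}")
```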

But, that's a post for a later date!

B
 
Re: Re: Poll: Accuracy of Test Interpretation

bpesta22 said:


I haven't read all the replies, but on at least the first page, no one mentions the issue of base rates, which is what this question is all about. Hey, I give this lecture every semester in an HR class.

If a test is 99% accurate, then if 100 people WITH the disease took it, there would be:

99 hits
1 miss

And, if 100 people without the disease took it, there would be

99 correct rejections and
1 false alarm.

But, the practical value of a test depends on the base rate (the % of the population that has what's being tested for).

Actually, I think Rolfe mentions this on the first page (though she doesn't call them base rates):

The thing is, to get the predictive value of a test (which is what Wrath is asking), you need to know the incidence of the condition in the population representative of the individual being tested. This is obviously higher if that population is "sick people with clinical signs typical of the disease in question". In fact, the relevant figure is the clinical probability that this individual is affected.

Wrath said this was incorrect though.
 
Indeed.

First, it's the sick people in the tested sample population that's important for determining the accuracy.

Second, Rolfe is confusing having different sample base rates and performing multiple tests. If people are screened before tests are ever used on them, then the chance that the final conclusion is correct increases dramatically - more than one test is involved. This has nothing whatsoever to do with the accuracy of the test under discussion.

Third, that's just one of the things she got wrong, not the only one.
 
Wrath of the Swarm said:
Indeed.

First, it's the sick people in the tested sample population that's important for determining the accuracy.

Second, Rolfe is confusing having different sample base rates and performing multiple tests. If people are screened before tests are ever used on them, then the chance that the final conclusion is correct increases dramatically - more than one test is involved. This has nothing whatsoever to do with the accuracy of the test under discussion.

I think that was entirely her point. That the prevalence of the disease in the population representative of the patient is the important parameter, and in most situations this will be higher than that of the population as a whole. This affects the final result, making the result of the test more reliable.

Note that here Rolfe is talking about the probability that the final conclusion is correct, not about the "accuracy" of the test (whatever that means).
 
But that is precisely my point: such a discussion is invalid.

Sorting patients according to symptoms is a test in itself, which results in a certain statistical distribution in the resulting population. Presumably the screening means that more people in the testing population will have the disease.

But that doesn't make any difference to the accuracy of the main test in this case. Moreover, we're dealing with a test sample of one (as is always the case), and this person either has the disease or he doesn't. The distribution is discrete instead of continuous.

We could compute how likely it was that the patient was in a particular category, work out how accurate the test would be in such a case, and average out the accuracy across possible states to find a mean accuracy, but that figure wouldn't be very useful at all.

And again: it makes the conclusion drawn more reliable. It doesn't change any aspect of the test itself.
 
Wrath of the Swarm said:
But that is precisely my point: such a discussion is invalid.

Sorting patients according to symptoms is a test in itself, which results in a certain statistical distribution in the resulting population. Presumably the screening means that more people in the testing population will have the disease.

Well, it doesn't necessarily have to be much of a test. For example, if a particular disease is much more prevalent in white males over 50, and the patient fits into this category, then the conclusion drawn from the test will be more reliable than for the general population, no?

But that doesn't make any difference to the accuracy of the main test in this case.

Just to clarify something, in your question the "accuracy" is 99%, yes? If so, then I fail to see how the comment by Rolfe contradicts this. Bear in mind that this was before all the semantic quibbling about whether the specificity and sensitivity are equal, or whatever. What's being talked about here is the probability that the diagnosis is correct.

Moreover, we're dealing with a test sample of one (as is always the case), and this person either has the disease or he doesn't. The distribution is discrete instead of continuous.

We could compute how likely it was that the patient was in a particular category, work out how accurate the test would be in such a case, and average out the accuracy across possible states to find a mean accuracy, but that figure wouldn't be very useful at all.

Aren't we talking about the probability that the diagnosis is correct? If not, what are we talking about?

And again: it makes the conclusion drawn more reliable. It doesn't change any aspect of the test itself.

Again, Rolfe's comment which you said was incorrect was:

The thing is, to get the predictive value of a test (which is what Wrath is asking), you need to know the incidence of the condition in the population representative of the individual being tested. This is obviously higher if that population is "sick people with clinical signs typical of the disease in question". In fact, the relevant figure is the clinical probability that this individual is affected.

To me, this seems to be saying exactly the same thing as you are saying, if you take "the predictive value of a test" as meaning the probability that the diagnosis is correct. So why did you say it was incorrect?
 
Because it's not true if the test's error is blind to the subject condition. If the test is as likely to be wrong when the patient has the disease as not, the distribution in the tested population doesn't matter.

To simplify things, I didn't want to ask about a case where two tests were applied, just one. Using other factors to reach a conclusion about the probability of disease is performing another test.
 
Wrath of the Swarm said:
Because it's not true if the test's error is blind to the subject condition. If the test is as likely to be wrong when the patient has the disease as not, the distribution in the tested population doesn't matter.

Again, I'm referring to the probability that the diagnosis is correct, which is the question you were originally asking (the question to which the answer was 9.02%). Do you agree that this depends upon the incidence of the condition in the population representative of the individual being tested? It seems from your other replies that you do, but a simple yes or no answer would be clearer.

To simplify things, I didn't want to ask about a case where two tests were applied, just one. Using other factors to reach a conclusion about the probability of disease is performing another test.

Okay, you were just considering the case where one test is used. That's fine for a hypothetical case. However, I think that Rolfe's point was that in clinical situations this is unrealistic. This seems to me to be a perfectly valid point, since it is an important caveat when extrapolating from the hypothetical case, which you presented, to real life.
 
In this case, no, it doesn't. Because the test has a specified accuracy.

If there were 9,000 healthy people and 1,000 sick ones, it would give the right answer 99% of the time. If there were 9,990 healthy people and 10 sick ones, it would still be right 99% of the time.

Why is this concept so difficult to understand?
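For what it's worth, a quick numeric check of that claim, assuming the test errs at the same 1% rate whether or not the patient is sick (the numbers match the post above):

```python
# Overall accuracy is the same for any mix of sick and healthy people,
# provided the error rate is the same in both groups.
def overall_accuracy(n_sick: int, n_healthy: int, error_rate: float = 0.01) -> float:
    correct = n_sick * (1 - error_rate) + n_healthy * (1 - error_rate)
    return correct / (n_sick + n_healthy)

print(round(overall_accuracy(1_000, 9_000), 4))   # -> 0.99
print(round(overall_accuracy(10, 9_990), 4))      # -> 0.99
```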
 
Wrath of the Swarm said:
In this case, no, it doesn't. Because the test has a specified accuracy.

If there were 9,000 healthy people and 1,000 sick ones, it would give the right answer 99% of the time. If there were 9,990 healthy people and 10 sick ones, it would still be right 99% of the time.

Why is this concept so difficult to understand?

I understand this concept. As I have repeated several times, I'm talking about the probability that the diagnosis is correct. From your original question:

Let's say that I went for an annual medical checkup, and the doctor wanted to know if I had a particular disease that affects one out of every thousand people. To check, he performed a blood test that is known to be about 99% accurate. The test results came back positive. The doctor concluded that I have the disease.

How likely is it that the diagnosis is correct?

The answer to the question in bold, "how likely is it that the diagnosis is correct?" depends upon the incidence of the condition in the population, which in this case is one in a thousand, or 0.001. However, the point that Rolfe was trying to make, which you disagreed with, is that often in real clinical situations this figure will be higher since there will be a screening as well. You can say that this constitutes an extra test, which you've discounted from your scenario, but the fact is that in real life it is the more common situation.
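To put numbers on this, here is a small sketch of the calculation, assuming 99% sensitivity and 99% specificity (the "error blind to condition" reading of the question) and the stated 1-in-1,000 prevalence; the 10% screened-population figure at the end is invented purely for comparison:

```python
# Probability the diagnosis is correct given a positive result (Bayes).
prevalence = 0.001
sensitivity = 0.99          # P(test positive | disease)
specificity = 0.99          # P(test negative | no disease)

p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
print(round(prevalence * sensitivity / p_positive, 4))   # -> 0.0902, the 9.02% figure

# Same test, but a screened population where prevalence is (say) 10%:
prevalence = 0.10
p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
print(round(prevalence * sensitivity / p_positive, 4))   # -> 0.9167
```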

And to clarify again, I'm not arguing with the 99% accuracy figure.
 
Originally posted by Wrath of the Swarm
Because it's not true if the test's error is blind to the subject condition.
Which you didn't explicitly state in your question.

I think a much better question to ask would be: How likely is it the doctor is going to conclude the test is correct? Because you seem to believe that likelihood is very high, while in reality, he is probably not even going to administer the test without prior symptoms of some sort, or at least base his decision on a number of other factors as well.

So despite being an interesting puzzle question, it has very little to do with reality, which is what you intended to discuss, wasn't it?
 
exarch said:
Which you didn't explicitly state in your question.

I think a much better question to ask would be: How likely is it the doctor is going to conclude the test is correct? Because you seem to believe that likelihood is very high, while in reality, he is probably not even going to administer the test without prior symptoms of some sort, or at least base his decision on a number of other factors as well.

So despite being an interesting puzzle question, it has very little to do with reality, which is what you intended to discuss, wasn't it?

Face it, exarch, the original question was left ambiguous in order to allow for the creation of a controversy no matter what the answer.
 
Brian the Snail said:
The answer to the question in bold, "how likely is it that the diagnosis is correct?" depends upon the incidence of the condition in the population, which in this case is one in a thousand, or 0.001. However, the point that Rolfe was trying to make, which you disagreed with, is that often in real clinical situations this figure will be higher since there will be a screening as well. You can say that this constitutes an extra test, which you've discounted from your scenario, but the fact is that in real life it is the more common situation.
So what?

In real life, it's unusual for tests to have such a low chance of errors as well. Should we complain about that?

That was not the point I disagreed with.
 
Since screening tests frequently are not prefaced by another test, whether formally or informally, and research has shown that doctors aren't very good at understanding how test results actually work, I'd say this scenario has a great deal to do with reality.
 
Originally posted by Wrath of the Swarm
Since screening tests frequently are not prefaced by another test, whether formally or informally, and research has shown that doctors aren't very good at understanding how test results actually work, I'd say this scenario has a great deal to do with reality.
Screening tests are just that, screening. They have a high false positive rate, and, hopefully, a very low false negative rate. And they are often followed by a second, more accurate verification test.
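A small sketch of that two-stage idea, with invented sensitivity and specificity figures for the screening and confirmatory tests; the posterior after the first test becomes the prior for the second:

```python
# Sequential testing: apply Bayes' rule twice, feeding the posterior
# from the screening test in as the prior for the confirmatory test.
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    p_pos = prior * sensitivity + (1 - prior) * (1 - specificity)
    return prior * sensitivity / p_pos

p = 0.001                             # population prevalence
p = posterior(p, 0.99, 0.90)          # screening: low miss rate, many false alarms
p = posterior(p, 0.99, 0.99)          # confirmatory test on the screen-positives
print(round(p, 3))                    # -> 0.495 with these made-up rates
```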

And I would say that not all screening tests are performed in a lab either. Feeling for lumps, spotting a rash, problems with eyesight or balance, they can all be a tell-tale indication of something more serious, which is then checked with x-rays, MRIs, ultra-sound, EEGs, etc...

And I'm pretty sure MRIs are pretty "accurate" at spotting a lot of things, especially if you're looking for them, but it's just not possible to give everyone an MRI every 6 months just to make sure they're not ill. So other tests are employed, with high false positive rates, to make sure people who DO have a lethal affliction might be helped in time. Something which you would probably oppose because of the "unknown dangers" involved :rolleyes:

Basically, you are assuming the doctor will tell anyone who has a lump or a mole that they have cancer.
 
No, I am not assuming that.

The point of the question was to illustrate that many people do not understand how to draw conclusions from such tests.
 
Rolfe is right. Sensitivity and specificity, and by corollary, the true positive and true negative rates, cannot be determined from the word "accuracy", defined as (tests = condition) / (all tests). Sorry Wrath.
 
Yes they can. Accuracy cannot be defined in the absence of a known testing sample unless alpha and beta are equal to accuracy and each other.

If you'd like to claim otherwise, you could try posting an example. I've already demonstrated that, under the given circumstances, no other values for alpha and beta are possible. If you can find a flaw in the argument, or a counterexample, that would defeat my position quite readily.
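For anyone following along, here is one way to see the claim, assuming "alpha" and "beta" refer to the per-condition correct rates (sensitivity and specificity); that reading is my assumption, not stated in the thread. If overall accuracy must be 99% regardless of the mix of sick and healthy people tested, both rates are forced to 99% as well:

```python
# If p*sens + (1 - p)*spec = 0.99 must hold at two different prevalences,
# subtracting the equations gives (p1 - p2)*(sens - spec) = 0, so
# sens = spec, and then either equation gives spec = 0.99.
def rates_from_fixed_accuracy(p1: float, p2: float, accuracy: float = 0.99):
    assert p1 != p2
    sens = spec = accuracy            # the unique solution of the linear system
    for p in (p1, p2):                # verify both equations are satisfied
        assert abs(p * sens + (1 - p) * spec - accuracy) < 1e-12
    return sens, spec

print(rates_from_fixed_accuracy(0.001, 0.5))   # -> (0.99, 0.99)
```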
 
