Poll: Accuracy of Test Interpretation

Still waiting for your links of evidence Wrath.

You have no evidence for any of the idiotic claims you make.

Quit referring to it as "evidence" and call it what it is: your own personal opinion and nothing more.
 
alpha: the chance of making a Type I error (rejecting the null hypothesis when it is actually true)

beta: the chance of making a Type II error (failing to reject the null hypothesis when it is actually false)

In medicine, the null hypothesis is usually that the person lacks the condition being tested for. (Sometimes not, of course, but generally so.)

So alpha represents the chance of giving a positive response when the patient is actually negative. (A false positive.)

Beta represents the chance of giving someone a negative response when they're actually positive. (A false negative.)

For many simple tests, alpha is equal to beta. For more complex ones, alpha and beta aren't necessarily the same. Generally, it is considered better to have as few false negatives as possible, even if it results in false positives. Medical testing is usually biased towards false positives for this reason.
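As a concrete sketch (the counts here are invented for illustration, not from any real assay), both error rates fall straight out of the evaluation counts, and nothing forces them to be equal:

```python
# Hypothetical evaluation of a test biased towards false positives
# (null hypothesis: the patient lacks the condition).
tp, fn = 980, 20       # affected patients: detected / missed
tn, fp = 9_500, 500    # unaffected patients: cleared / false alarms

alpha = fp / (fp + tn)   # Type I error rate: false positives among the unaffected
beta = fn / (fn + tp)    # Type II error rate: false negatives among the affected

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")  # alpha = 0.050, beta = 0.020
```

With these made-up counts the test trades a 5% false-positive rate for only a 2% false-negative rate, the bias described above.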
 
Accuracy, in the Wolfram site sense, is a concept involved with measurement - and ultimately, all testing is just a subcategory of measurement.

The definition provided applies.
 
Rolfe said:
First, Wrath is also continuing with the same assertions he was making on about page 2. Just avoiding defining what he means. While he continues, why should I not continue?

Because you're better than him? Because it's futile. Because you have more important things to do.

If someone calls you a name in the street and continues to stand there repeating the insult, do you stand there for a few hours and continue to explain how he is incorrect? It seems a futile stance to me.

This is my academic subject. Wrath has continually asserted that I am stupid, and ignorant, and worse. I feel that to retire while he is still taking that attitude is to concede to him, and I will not do that. He is trying to pontificate on a subject he is not familiar with, and I will not have it appear by my retiring that I give the slightest acknowledgement to his ill-conceived and ill-thought-out arguments.

I understand what you are saying, but how long are you willing to continue? Should WOS have stamina to boggle the mind and keep this up for the next 10 weeks, are you still going to be here on page 666 saying the same things? I just think that people who feed trolls have no right to complain about them.

Also, there is a more interesting discussion waiting in the wings. One which may make this look like a tea-party given Wrath's temperament, but interesting nonetheless.

Then move along. Is WOS the only person this new discussion could take place with? Is that why you are waiting?

Now if we can only agree that Wrath meant to imply that both specificity and sensitivity were 99% (and since sensitivity doesn't matter an iota for the purpose of the problem he set, he might as well have said specificity was 99% from the start), we might just manage to go on.

Rolfe.

Why do you need this to go on? It's patently obvious that the basic message is true. Move on.

Your refusal to do so just makes you look like WOS to me. You can't let go.

BUT! I don't want you to think I'm unfairly picking on you in this idiotic affair. It's just that with so many people pointing out what a jerk WOS is being, it's easy to see :D

I think, though, that you display similar behaviour in not being able to let this go and move on (to what you admit is the more interesting bit!). I suppose it's really just me wanting to vent about the troll feeders that give them just what they want. More and more and more and more and more posts to respond to.....

Adam
 
Wrath of the Swarm said:
Rolfe: You can't weasel out that easily. You insisted that the question could not be answered without my explicitly providing the rate of Type I error (although you used different terminology) and proclaimed that my question was ambiguous because of my ignorance.

The data I gave you was more than sufficient to answer the question, and you know it. Your mealy-mouthed evasions wouldn't be nearly as mealy if you weren't aware of how deep a hole you've dug yourself into.
OK, note that I tried to move this on, but Wrath won't have it. (He also admits he's being deliberately insulting, but then we're told to cultivate a thick skin so I will let that run off.)

NOW HEAR THIS. THE PROBLEM AS IT STANDS IS NOT SOLVABLE UNLESS THE SPECIFICITY VALUE FOR THE TEST IS PROVIDED.

I'm finding Wrath's insistence on using his own words rather than the words clearly defined at the beginning of all primers in the subject (including American ones, the example in front of me now is American) annoying, but we will proceed.

As several people said at the start, the result might be different depending on how you have combined sensitivity and specificity to get this "accuracy" figure. (x<SUB>1</SUB> + x<SUB>2</SUB>) / 2 = 99%, fine, but now we no longer know what either x<SUB>1</SUB> or x<SUB>2</SUB> was. And since only one of those is required to solve the problem, we are stymied.

The only way to proceed is to assume that we do know, because there is an underlying assumption that x<SUB>1</SUB> = x<SUB>2</SUB>. Finally, Wrath confirmed this. His very use of the term "accuracy" was to be taken to imply that (however unlikely this may be in practice), x<SUB>1</SUB> must be equal to x<SUB>2</SUB>. Because otherwise this "powerful" term couldn't be used at all.

So, we had a problem that required we know the specificity. Wrath told us a figure which we sort of gathered might be (specificity + sensitivity) / 2. Several people protested that this wasn't really good enough. About five pages later it gradually became obvious from Wrath's posts that since the term was indeed meaningless unless sensitivity was equal to specificity, then we were intended to assume that this was the case.

So, rather than simply tell us that specificity was 99%, he told us (we finally figured) that (specificity + sensitivity) / 2 was 99%. Then he admitted that the extra wrinkle was that sensitivity = specificity, implied by his particularly unique use of terminology.

OK, if (specificity + sensitivity) / 2 = 99%, and specificity = sensitivity, we are finally able to deduce that specificity (the term we need the value for) is 99%.

WOW.

And now Wrath is saying that we don't need the specificity at all? That is just plain wrong, whichever way you slice it. The figure you need for the calculation is specificity, and the fact that Wrath chose to make it a very uphill battle to discover for sure how he meant us to derive it doesn't alter the necessity of deriving it.

Who is in the hole?

Rolfe.
 
Originally posted by Wrath of the Swarm
they're insults, toots, not ad hominem attacks, not that you can tell the difference
From here

An ad hominem argument, also known as argumentum ad hominem (Latin, literally "argument against the man"), is a fallacy that involves replying to an argument or assertion by attempting to discredit the person offering the argument or assertion.
 
We've been over this.

For any patient, under any condition, the chance of the test producing an incorrect result is 1%. The accuracy for any patient, under any condition, is 99%.

Of the people who don't have the condition, 99% will receive a negative result and 1% will receive a positive.

Of the people who do have the condition, 99% will receive a positive result and 1% will receive a negative.

This is almost an ideal situation. Very low alpha, very low beta. Tests don't get much more predictive power than this.
 
slimshady2357 said:
Because you're better than him? Because it's futile. Because you have more important things to do.

I understand what you are saying, but how long are you willing to continue? Should WOS have stamina to boggle the mind and keep this up for the next 10 weeks, are you still going to be here on page 666 saying the same things? I just think that people who feed trolls have no right to complain about them.
I do have more important things to do, you're right. But I'm also a teacher (though I no longer hold a university teaching post, I still teach this subject at post-graduate level).

If I'm failing to get my point across, if there is one of the audience (or it looks like more) who don't understand me, I like to try to do better, one more attempt to make the situation clear.

Rolfe.
 
Wrath of the Swarm said:
For any patient, under any condition, the chance of the test producing an incorrect result is 1%. The accuracy for any patient, under any condition, is 99%.
This is almost a meaningless statement.

Wrath, unless you tell me how you get this particular 99% from the TP, FP, TN and FN numbers gathered from the evaluation testing of the assay, we can't even discuss it.

Rolfe.
 
I really do have a catalogue of laboratory tests to proofread. You know, some of these things that give concentration results, where accuracy and precision are the watchwords, and others that give positive/negative results, which we judge by assessing sensitivity and specificity.

So, either Wrath admits that the figure required for the problem he posed is the specificity (however derived), and the accuracy doesn't matter and doesn't need to be equal to the specificity for the problem still to be meaningful....

Or he explains how his "accuracy" figure would be worked out as a numerical value from the findings of the evaluation testing of the assay, which we recall are limited to:
number of patients testing true-positive
number of patients testing false-positive
number of patients testing true-negative
number of patients testing false-negative.

Because that is how it is done. Binary assays are tested in that way, those are the results you acquire, and those are the only data you have to derive all the numerical terms you want to define the assay.

BillyJoe showed how all the terms we use are derived from these four figures. Knowing how they are derived makes us able to use them "accurately". So tell us how to use "accuracy" in that way.

Or simply agree to move on, the only relevant figure needed for the problem was specificity, you took a helluva roundabout way to give it to us but we got there, now let's look at that 9.02% number and its implications.

Rolfe (off to proofread, see you in the morning - or 4 am if insomnia strikes!).
 
Rolfe said:
This is almost a meaningless statement.

Wrath, unless you tell me how you get this particular 99% from the TP, FP, TN and FN numbers gathered from the evaluation testing of the assay, we can't even discuss it.

Rolfe.
Well, here's one way simple enough for all to understand:

Try the test on 100,000 people known to have the disease, the test says that 99,000 have it, and 1,000 don't.

Estimate of probability of a false negative is 1,000 / 100,000 = 1%


Now try the test on 100,000 people known to be free of the disease, the test says 99,000 are clear and 1,000 of them have it.

Estimate of probability of a false positive is 1,000 / 100,000 = 1%

As both figures have turned out the same we combine them together and say the test is always 99% accurate.

Now quibble away.
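That arithmetic, written out as a quick sketch (just the numbers from the post above, not ceptimus's own code):

```python
# Trial 1: 100,000 people known to have the disease; the test calls 1,000 clear.
false_negative_rate = 1_000 / 100_000   # 1%

# Trial 2: 100,000 people known to be disease-free; the test calls 1,000 positive.
false_positive_rate = 1_000 / 100_000   # 1%

# Only because the two estimates happen to coincide can a single
# "accuracy" figure be quoted at all:
assert false_negative_rate == false_positive_rate
accuracy = 1 - false_positive_rate
print(f"accuracy = {accuracy:.0%}")  # 99%
```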
 
ceptimus said:
As both figures have turned out the same we combine them together and say the test is always 99% accurate.

Now quibble away.
Flying visit.

Most of the time, in fact nearly all of the time, both figures do not turn out to be the same. So your definition needs to be able to cope with that to be of any use in the real world.

How does it do that?

Rolfe.
 
Rolfe said:
Flying visit.

Most of the time, in fact nearly all of the time, both figures do not turn out to be the same. So your definition needs to be able to cope with that to be of any use in the real world.

How does it do that?

Rolfe.
It doesn't need to, as IN THIS CASE they are both the same. Of course, IF they were not both the same, then both would have to be given.

There are plenty of real world tests where the chance of a false positive IS the same as the chance of a false negative, even though this may not be true of medical tests.

The fact remains that Wrath's question, as posted, has a definite answer. You may not like him, and you may believe he has a hidden (or not so hidden) agenda, and find it distasteful to admit that he is right, but the mathematics speaks for itself. Saying that real-world situations are almost always different from Wrath's hypothetical situation does not prove him wrong.

I think you (Rolfe) cast the first stone in this thread, by posting the answer and your chart, and so spoiling Wrath's poll. I think both of you are equally guilty of hurling insults at each other.

Anyway, I am happy to continue to debate for as long as is necessary, and it doesn't make me angry - I enjoy it. I might still learn something about statistics, though I've not learned any statistics from this thread yet.
 
Originally posted by ceptimus
As both figures have turned out the same we combine them together and say the test is always 99% accurate.
Why combine them at all?

Anyway, the real issue is 9.02%, as has been pointed out (or, alternatively, 90.98%).
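For reference, that 9.02% comes straight from Bayes' theorem, taking a 0.1% prevalence and both sensitivity and specificity at 99% (a sketch of the arithmetic, not anyone's posted working):

```python
prevalence = 0.001      # 0.1% of the general population affected
sensitivity = 0.99      # affected patients who test positive
specificity = 0.99      # unaffected patients who test negative

# P(disease | positive) = P(positive | disease) P(disease) / P(positive)
p_pos_diseased = sensitivity * prevalence
p_pos_healthy = (1 - specificity) * (1 - prevalence)
ppv = p_pos_diseased / (p_pos_diseased + p_pos_healthy)

print(f"P(disease | positive test) = {ppv:.2%}")  # 9.02%
```

The positives from the huge unaffected pool swamp the positives from the tiny affected pool, which is the whole point of the exercise.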

Even that figure is purely theoretical, though, since a case like this will only pop up in obligatory tests, like the physical you get when joining the army, or testing for HIV infection.
What do you think is the first thing people who test positive for HIV do? Get a second opinion I think ...

Does anyone know how well a second test performs? Are those "accurate"?
 
I had decided to cease posting to this thread after Wrath admitted he didn't take his question from a study involving doctors:

I will admit that I can't find a source that used precisely the same question. That is the question I remember being given (quite vividly), but I can't demonstrate that it was ever used in a study.

I retract the claim and admit I was wrong to make it.

But Wrath wouldn't let it go and accused me of being a liar. When I pointed out the blatant lie he told in this thread in the TRSOTTTWND, he had no reply over there. Normally I'd let it go (hell, I've had fun today making Wrath eat his words), but when someone accuses me of lying, it becomes a point of principle.

Wrath, you lied in this thread. You claimed:

I finally found the sources that duplicated the question (I even pointed them out, remember?).

That is a lie. Look at the first quote in this post. You contradict yourself. You have lied, and yet you accuse me of being a liar. Wrath, you are a bad joke. It's time for you to own up to your lie and apologise for claiming that I lied.

Edited to add: I apologise for derailing this thread but when someone lies in a thread I feel it is important that they are called up on it.
 
ceptimus said:
It doesn't need to, as IN THIS CASE they are both the same. Of course, IF they were not both the same, then both would have to be given.
It does need to. A descriptive parameter for the test which is only applicable in the unlikely event that the two values are the same is useless.

You can see the artificiality of the situation from the way you posted your calculation (which was clear, thank you).
Try the test on 100,000 people known to have the disease, the test says that 99,000 have it, and 1,000 don't.

Estimate of probability of a false negative is 1,000 / 100,000 = 1%

Now try the test on 100,000 people known to be free of the disease, the test says 99,000 are clear and 1,000 of them have it.

Estimate of probability of a false positive is 1,000 / 100,000 = 1%
You had to force the two groups to be the same, or your scenario simply wouldn't have worked. But in real life there is no reason at all why they should be the same. They are not independent variables, because (as Wrath pointed out) there is a tendency for improvements on one side to cause a deterioration on the other - efforts to eliminate false negatives tend to increase false positives and vice versa. They tend to be opposite variables, which actually makes it less likely that they will happen to be identical in any given example.

Wrath has been asserting that this "accuracy" definition is a "more powerful" way to describe the test. However, you have just shown quite clearly that the term simply cannot be used at all except in the very unlikely chance that false positive rate and false negative rate are identical. This is precisely why it is not used. There is in fact no meaningful way to combine the two sides for the general case, and give an overall "accuracy" figure which does not vary with prevalence. This is why it is standard practice to quote sensitivity and specificity separately.
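Rolfe's point here can be checked directly: any single "fraction of results that are correct" depends on prevalence unless sensitivity and specificity happen to be equal (my own illustration, with invented figures, not anything posted in the thread):

```python
def overall_accuracy(prevalence, sensitivity, specificity):
    """Fraction of all results that are correct, at a given prevalence."""
    return prevalence * sensitivity + (1 - prevalence) * specificity

# Equal rates: the figure is the same at any prevalence...
for p in (0.001, 0.1, 0.5):
    assert abs(overall_accuracy(p, 0.99, 0.99) - 0.99) < 1e-12

# ...unequal rates: the figure drifts as prevalence changes.
print(overall_accuracy(0.001, 0.95, 0.99))  # ~0.98996
print(overall_accuracy(0.5,   0.95, 0.99))  # ~0.97
```

So a prevalence-independent "accuracy" only exists in the special case sensitivity = specificity, which is exactly why the two figures are normally quoted separately.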
ceptimus said:
There are plenty of real world tests where the chance of a false positive IS the same as the chance of a false negative, even though this may not be true of medical tests.
This may be so. But we are not talking about other situations here, this discussion is specifically about medical tests. For the purpose of going on to discuss the competency or otherwise of doctors, as Wrath has told us several times. Therefore the deliberate use of a defining term which is simply not applicable to the medical testing situation is perverse, to put it mildly.
ceptimus said:
The fact remains that Wrath's question, as posted has a definite answer.
No, Ceptimus, it doesn't.

You cannot calculate a positive predictive value (which was in effect the question) unless you know the SPECIFICITY. That is the percentage of unaffected patients who test negative, which in turn tells you the percentage of unaffected patients who test positive. (We don't care, for this question, how many affected patients test negative; we don't need to know.)

Wrath did not provide this information in a way that could be understood without making assumptions.

My assumption was, OK, I know I need the specificity figure. I've been given something called "accuracy", which I know is not a meaningful concept in the context of this type of testing. I will assume that this is just sloppy terminology, and that what I have actually been given is in fact the specificity. I did this, and got the expected result.

However, Wrath explicitly denies that this is what happened. He states that he deliberately used this "accuracy" term because it is "more powerful", because it incorporates both sensitivity and specificity. I ask again, how can a term be "more powerful" for the purpose of (real-world, in which we actually live and in which the terms we choose to use have to have general utility) test description, when it cannot be used at all in most (real-life) instances?

("The very fact that I used the term at all should have told you that I was referring to the rare and unrealistic situation of equal specificity and sensitivity!" Oh, God give me strength!)

Others reading the thread realised that "accuracy" must somehow incorporate sensitivity and specificity - and started speculating how. Arithmetical mean? This loses information, as Geni pointed out, but not only that. As Wrath pointed out, you can't just give equal weight to sensitivity and specificity when in reality more unaffected patients will be tested than affected, and so a higher rate of false positives will be disproportionately reflected in the "wrong" results. This notion is actually dragging the discussion back towards the concept of the predictive value, which is in fact the answer, not the question.

(In fact, this is the reasoning that led to the adoption of the predictive value calculation as another way to characterise test performance, with all its advantages and disadvantages which we will no doubt get to some time next week.)

So, we're left scratching our heads. Wrath has told us the "accuracy", specifically denying that he just meant to say "specificity". We need the specificity figure. How to get it? We don't know what Wrath means by "accuracy", because as you've just demonstrated, it's not a term which has a meaningful definition which can be used to characterise real-life tests.

Now we begin to suspect that the only way to get any sense out of this is to assume that Wrath must mean that both sensitivity and specificity are 99%. This is such an unlikely situation in real life (for medical tests, but remember, we are specifically dealing with medical tests, and we are trying to test those used to handling medical tests), that it hadn't really occurred to us that he could possibly mean that. But maybe he does.

Yes, he does. Since the question cannot be answered without the specificity value, Wrath must have given us the specificity value, we're told. OK, that's what I thought, but no, "accuracy" isn't specificity. But I still have to be able to deduce the specificity from this "accuracy" value. Arithmetical means run into the sands of the predictive value. So there is only one other possible explanation. He means that for this remarkable test, sensitivity and specificity are identical, therefore he can produce this all-in-one "accuracy" figure (which we've never heard used in the real world, for reasons already gone into).

Can you see that this inevitably involves an assumption? Either that Wrath meant specificity when he said accuracy, or that he meant that specificity must equal sensitivity for this very singular assay, and therefore accuracy means both sensitivity and specificity, which brings us back to where we were, accuracy here means specificity.

However you slice it, it is an assumption. Necessitated by Wrath's choice to parachute in this "accuracy" figure, rather than simply use the standard terminology for the discipline, which is applicable to all tests, not just a (tiny) subset.

Recall, Wrath kept declaring that he was posing the question in exactly the way it had been put to the medical personnel in the studies he was replicating (from memory). But when sources were finally produced, none of them quoted an "accuracy" value. In fact the popular choice was the "false positive rate" (100 - specificity), which is in effect the same information, but the term is more intuitive in its meaning than specificity itself. No problem with this, it's the right way to do it.

Why did Wrath choose to do it differently?

I think, because he knows pure statistics, but not applied statistics. He was trying to do it from memory, and didn't understand what any medical person would automatically know - that tests are usually not of equal sensitivity and specificity, and that specificity is the term you need to know. He therefore got into a huge tangle which was entirely unnecessary.

I'm sorry if I've still failed to explain it to you. Doesn't the fact that none of the source studies introduced an "accuracy" figure reveal anything, even if my explanations are inadequate?

Yes, I threw an early stone, overtly. But it was as a response to an unprovoked covert stone, Wrath's OP. I knew exactly where he was going. I'd quite like to go there and explore that place. But to set out to trap/criticise the medical community and yet not to employ the unambiguous terms that are provided for the purpose of this question, and which it turns out were employed by his sources, got my back up from the first moment. If Wrath is going to set himself up as the oracle in the medical statistics department, then for goodness sake use the correct medical statistics in the OP!

However, that wasn't my main reason for pulling the trigger on that sloppily-put and simplistic problem. My main reason was that Wrath had shown us a scenario deliberately designed to make the doctor look stupid. First he had built into the scenario a clinical examination, which led to the test request, but he did not tell us the reason for the request. This is hugely dishonest, because it is this reason that would influence the doctor's decision to accept the test result as correct. Which we're told he did.

Wrath tells us that the doctor examined the patient. Then he decided to order the test. Then he received a positive result. Which he decided was correct.

He wants us to assume that there was no special reason for the ordering of the test. That the figure to be assumed for "prevalence" in the case of this particular patient is the 0.1% figure for the population as a whole. And that the doctor therefore jumped to a wrong conclusion for no good reason.

But these are all assumptions. We can make equally (if not more) valid assumptions from the same data.

The doctor ordered the test because of something he observed while examining the patient. When he got the positive result he was perfectly well aware that the condition had only a 0.1% incidence in the general population, but he knew that the probability of this patient having the disease was much greater than that, so the predictive value of this result in this patient was much higher than the baseline 9.02%, indeed high enough to make it a racing certainty.

The reason I say this is the more likely scenario is that Wrath couldn't just ask us the probability that the positive result was correct, oh no. He chose to tell us that the doctor decided it was correct, and ask us the probability that the doctor was wrong. Now Wrath has a very low opinion of doctors. But I don't think they're as stupid as he assumes. If part of the information I'm explicitly given is that "the doctor chose to believe that the result was correct" I feel I am entitled to use this information to reflect on the entire problem. If he made this decision, might not this be an indicator that he'd requested the test because of clinical suspicion, not as a routine?

Ceptimus, wording of these problems is of crucial importance. You mustn't tell the audience too little, or too much. Here, we were told too little, in that we had to guess at the specificity figure we needed, and we're kept in the dark as to the reason for requesting the test. And we're told too much, in that we're told what the doctor concluded. We didn't need that to be able to work out the basic maths, but once we've been told it, it introduces other possible assumptions which may affect the interpretation of how the problem should be viewed.

Again, when we look at the source material, we find that the question isn't "how likely is it that the doctor was wrong?" It's "how likely is the test to be wrong?" Much more neutral, but not Wrath's style.

And (certainly in the example quoted by Steve74, which looked like the original 1978 study) there has to be some way of indicating that we are not allowed to take signs and symptoms into consideration (explicitly stated in the 1978 question), or in fact that the patient is low-probability and we should use a low-probability in calculating predictive value.

Wrath missed that part completely.

Now if we ignore Wrath's justifications for his wording, how could it have been worded to pose the same question, but to rein in all those assumptions and keep the reader on the desired train of thought? Easy.

Last month I had to go for an insurance medical. The doctor could find nothing wrong with me, but the insurance company required that I have a blood test for a particular disease, which actually has a 0.1% incidence in the clinically healthy population. The test, which has a false positive rate of only 1%, came back positive. What is the probability that I really have the disease?

That's how you do it (and believe me, I've set this one for clinicians often enough; I know how careful you have to be to stop them jumping all over the question, but this one is bomb-proof).

Note that I explicitly identified the patient in question with the population I gave the incidence figure for. No weaselling that the prevalence isn't necessarily valid for that individual. It's the right prevalence. And by the way, no silly "accuracy", you have a number there from which you can directly derive the specificity. And I didn't tell you what the doctor thought. Why should I? You're the doctor! Come on, what do you think? I've been very careful to give you NO reason to go for a high probability that the result is right. Will you still fall for it?

And quite often, they do.

And that, actually, is the start of the class, not the end.

So, I went for Wrath. This is no more than he dishes out - he frequently tells posters they can expect no mercy from him. The reason I did it is that he was setting out to have a go at medical comprehension of statistics, but he himself had been extremely sloppy in his wording of the question. A question which is regularly used in the sort of classes I teach, and the parameters of which are well-known.

I wish he'd worded the question better. We might have had a much more constructive discussion by now. But Wrath takes no prisoners. When he himself is less than perfect, why should we refrain from retaliation?

Rolfe. Hoping to start the real discussion soon.
 
They can be the same. They often are. They don't need a reason to be identical any more than they need a reason to be different.

Indeed, the term 'accuracy' cannot meaningfully be applied unless alpha and beta are equal - but that is precisely why it's more powerful. When it does apply, it says a great deal.

No assumptions need be made. We know the disease incidence in the general population. We know how frequently healthy people will test as positive, as well as how frequently sick people will test as positive, because we were told how often the test is wrong: 1% of the time.

Your assumption, Rolfe, was that the error rate must be dependent upon the possibility being examined. This is not the case. It's not even the general case. It's simply the case that applies most often in medical situations.

In your ignorance, and arrogance, you decried this as a mistake, an ambiguous statement. There's no ambiguity about it. The statement made can be valid only if the Type I and Type II errors are equally likely - and since we were discussing a hypothetical test whose properties are determined by fiat, there is no reason for any logically consistent properties we assign to it to be considered invalid.

The basic point holds even if we don't use an example with identical alpha and beta rates. People aren't able to answer the question correctly no matter what permutations are used.
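That claim is easy to verify with a quick sketch (the unequal parameter sets below are my own illustrations, not figures from the thread): at 0.1% prevalence the predictive value stays far below what intuition suggests, however the two error rates are split.

```python
def ppv(prevalence, sensitivity, specificity):
    """Probability of disease given a positive result, via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

print(f"{ppv(0.001, 0.99, 0.99):.1%}")  # 9.0% (equal error rates)
print(f"{ppv(0.001, 0.95, 0.99):.1%}")  # 8.7% (more false negatives)
print(f"{ppv(0.001, 0.99, 0.95):.1%}")  # 1.9% (more false positives)
```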

The simple truth is that the example was meant to illustrate that most medical professionals do not understand basic issues of statistics and are not qualified to make judgements regarding them. The irony is that your witless babbling has proven my point better than I could ever have done.

Thank you, Rolfe.
 
Wrath of the Swarm said:
Your assumption, Rolfe, was that the error rate must be dependent upon the possibility being examined. This is not the case. It's not even the general case. It's simply the case that applies most often in medical situations.

I think you're accurate in your assessment of medical professionals. I've presented a similar conundrum to several, and it usually results in a lot of blustering and tap-dancing.

However, vets tend to be smarter than physicians anyway. For one thing, at least in the US, it's harder to get into vet school than med school. For another, vets have to deal with multiple species, none of whom can speak in the normal way. For another, they're permitted to retain much more of their humanity than docs are.

In any event, I get the impression that Rolfe is mostly expostulating on the vagaries of trying to reduce statistics to a single number. I hope you consider saving your anger for people like HopkinsMedStudent.
 
