Wrath of the Swarm said:
Secondly, the overall accuracy of the test cannot be established - or even spoken about - without reference to a specific test population if the proportion of false positives is different from the proportion of false negatives.
Wrong.
The "accuracy" of the test cannot be established, or even spoken about, until you have clearly defined what you mean by "accuracy". This you have not done.
This subject is a very well-defined one, with its own well-defined vocabulary. So if I were to refer to the "positive predictive value" for example, I would be justified in believing that anyone familiar with the subject would know exactly what I meant by this. If someone unfamiliar with the subject asked me for a definition, it would be easy to provide one.
BillyJoe kindly supplied a list of concise definitions of nearly all the terms recognised for discussing this problem. I'll repeat them now.
POSITIVES/NEGATIVES, SENSITIVITY/SPECIFICITY, PREDICTIVE VALUES, PREVALENCE.
TRUE POSITIVE: a person who tests positive and has the disease.
FALSE POSITIVE: a person who tests positive but does not have the disease.
TRUE NEGATIVE: a person who tests negative and does not have the disease.
FALSE NEGATIVE: a person who tests negative but does have the disease.
SENSITIVITY: the percentage of people with the disease for whom the test is positive.
Sensitivity = TP / (TP + FN)
SPECIFICITY: the percentage of people without the disease for whom the test is negative.
Specificity = TN / (TN + FP)
POSITIVE PREDICTIVE VALUE (PPV): the percentage of people who test positive who actually have the disease
PPV = TP / (TP + FP)
NEGATIVE PREDICTIVE VALUE (NPV): the percentage of people who test negative who actually do not have the disease
NPV = TN / (TN + FN)
PREVALENCE: the percentage of the population who have the disease
courtesy,
BillyJoe
The two extra that BillyJoe didn't mention are:
FALSE POSITIVE RATE: 100 - specificity
FALSE NEGATIVE RATE: 100 - sensitivity.
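For anyone who finds it easier to read the definitions as arithmetic, here is a minimal Python sketch of the whole list above. The counts are invented purely for illustration:

```python
# Invented evaluation counts, purely for illustration.
TP = 90    # affected people who test positive (true positives)
FN = 10    # affected people who test negative (false negatives)
TN = 980   # unaffected people who test negative (true negatives)
FP = 20    # unaffected people who test positive (false positives)

sensitivity = 100 * TP / (TP + FN)   # 90.0%
specificity = 100 * TN / (TN + FP)   # 98.0%
ppv = 100 * TP / (TP + FP)           # positive predictive value, ~81.8%
npv = 100 * TN / (TN + FN)           # negative predictive value, ~99.0%
prevalence = 100 * (TP + FN) / (TP + FN + TN + FP)  # ~9.1% in this group
false_positive_rate = 100 - specificity  # 2.0%
false_negative_rate = 100 - sensitivity  # 10.0%
```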
Nowhere in that list do the terms "accuracy" or "alpha and beta values" appear.
The way these characteristics of a test are evaluated is by taking a group of patients, x% of whom have the condition (established by other means - the reference method), and testing them all with the test you are evaluating. Some of the affected people will test positive - the true positives (TP) - and some will test negative - the false negatives (FN). Some of the unaffected people will test negative - the true negatives (TN) - and some will test positive - the false positives (FP).
This is all the information you get about the test. From this you have to derive the numbers that describe the test to its users. Above, BillyJoe has detailed exactly how this is done for the terms we actually use (and I added a couple more, and defined them).
Now, you will note that the sensitivity and specificity values are completely independent of the proportion of affected people, x. So long as you have enough in each group (affected and unaffected) to give a good representation of test performance, the proportions are irrelevant. Switch them round and the values for sensitivity and specificity will remain the same. And THEY DO NOT HAVE TO BE EQUAL FOR THIS TO BE TRUE.
This is why these two figures are the cardinal descriptive terms of the test. They are absolute values which describe the test independently of population prevalence (which may vary widely).
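To see that invariance in action, here is a small Python check (the counts and the tenfold rescaling are invented for the example). Scale up the unaffected group and the sensitivity and specificity do not budge, while the predictive value certainly does:

```python
def describe(TP, FN, TN, FP):
    sens = 100 * TP / (TP + FN)
    spec = 100 * TN / (TN + FP)
    ppv = 100 * TP / (TP + FP)
    return sens, spec, ppv

# Evaluation group: 100 affected, 1,000 unaffected.
print(describe(TP=90, FN=10, TN=980, FP=20))
# (90.0, 98.0, 81.8...) - sensitivity, specificity, PPV

# Same test, ten times as many unaffected people (much lower prevalence):
# every unaffected count simply scales by 10.
print(describe(TP=90, FN=10, TN=9800, FP=200))
# (90.0, 98.0, 31.0...) - sensitivity and specificity unchanged, PPV falls
```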
However, if you actually want to examine the implications of these values for the probability of the test being right in populations of differing prevalence (or as I like to think of it, in patients with differing probabilities of being affected), you need to factor in your assumed prevalence, and derive the PPV and the NPV. And indeed, sometimes the results you get for particular permutations of this sum are somewhat counter-intuitive.
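Spelled out, that derivation just weights the test's error rates by how common each group is in the population you have in mind. A sketch, with sensitivity, specificity and prevalence all supplied as percentages:

```python
def ppv(sens, spec, prev):
    # Positive predictive value (%), from sensitivity, specificity and
    # prevalence, all supplied as percentages.
    tp = sens * prev                  # true positives per 10,000 tested
    fp = (100 - spec) * (100 - prev)  # false positives per 10,000 tested
    return 100 * tp / (tp + fp)

def npv(sens, spec, prev):
    # Negative predictive value (%), same conventions.
    tn = spec * (100 - prev)          # true negatives per 10,000 tested
    fn = (100 - sens) * prev          # false negatives per 10,000 tested
    return 100 * tn / (tn + fn)
```

Feed different prevalences into these and you can watch the counter-intuitive results appear for yourself.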
Wrath is persistently referring to "accuracy", and to "alpha and beta values". Now it is impossible to talk about this rationally unless all terms are defined. And by defined, I mean explaining how you get the number from the results you obtained when you did the evaluation described above. Because that is all the information you will ever have (although you may improve the validity of the exercise by increasing the number of individuals involved).
I thought "alpha and beta values" were just another way, one I'd never come across, of expressing sensitivity and specificity, and I still suspect that from the way Wrath is using the words. But he won't even confirm that, or say which is which.
He persistently refuses to explain what he means by "accuracy".
I think I'm beginning to get it from the more recent posts. It's one of the possibilities I considered earlier: "accuracy" is defined by Wrath as the figure you get for both sensitivity and specificity if these two parameters happen to be equal.
No wonder this isn't a defined term! It's a completely useless term. To dream up a term like this, which can only be used in the improbable fluke of TP / (TP + FN) happening to come out equal to TN / (TN + FP) in the evaluation scenario described above, is meaningless. To then castigate everyone else who doesn't intuit this remarkable definition from your ill-defined scenario is arrogant and unjustified.
Wrath of the Swarm said:
IF a general statement about the test's accuracy is made, THEN the alpha must be the same as the beta for the statement to be valid. These tests are perfectly possible - common, even.
All right, Wrath, name six. I said this pages ago. In the real world in which most of us actually practise, real tests don't come like that. And if one does happen to come like that, it's just a coincidence (and a coincidence which might well disappear if you extend the number of patients in your evaluation study and publish revised and better estimates of the figures). Not worth coining a special defined term for, which is why nobody did. Until Wrath came along.
Now this is boring me too. It isn't the aspect of the problem I really wanted to talk about. I think the use of the word "accuracy" was just a piece of sloppy terminology for "specificity" in the first place. I was perfectly happy all along just to substitute the word "specificity" for "accuracy" in the original question and call it quits on that aspect. Because we don't need the sensitivity value at all to do the sum! It doesn't have to be the same as the specificity, it doesn't have to be anything in particular. If you just say, the test specificity is 99%, you can carry on.
Which is what I was trying to do, honest. (Because there is a lot of carrying on to do, in fact.)
However, we do have something to clear up properly before we do. I've stated that sensitivity and specificity values are a constant property of the test, and do not change with the prevalence of disease in the population being tested. And that not only do they not have to be equal, it is no more than a mildly interesting (and unusual) coincidence if they are. I can be told the values for any given test, and know that they describe the performance of the test as stated.
This is why they are the parameters you would quote when asking the initial question the thread is about. In fact the question was simply, "given a test with specificity 99%, what is its positive predictive value in a population with a prevalence of disease of 0.1%?" Answer, no arguing, 9.02%. No need for the sensitivity even to be mentioned, you don't need it. And without the dressing-up as a doctor's appointment, soluble as a pure statistics question.
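For the record, here is that sum worked through. One hedge: the quoted 9.02% falls out if the sensitivity is also taken as 99% (as in the original framing of the puzzle). At a prevalence this low the sensitivity scarcely matters anyway, since even with 100% sensitivity the answer only rises to about 9.1%:

```python
# Per 100,000 people tested, with prevalence 0.1%:
affected = 100            # 0.1% of 100,000
unaffected = 99_900

TP = 0.99 * affected      # 99 true positives (taking sensitivity as 99%)
FP = 0.01 * unaffected    # 999 false positives (specificity 99%)

ppv = 100 * TP / (TP + FP)
print(round(ppv, 2))      # 9.02
```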
So, do I have agreement that sensitivity and specificity are absolutes, do not have to be equal (and in fact will probably not be equal), and do not depend (equal or not) on the prevalence of disease in the population used to evaluate the test?
I'm asking this because I've seen some posts that make me think this isn't clear in everyone's minds, and if we don't get it clear we could be in for another three pages of cross-purposes.
Rolfe.