
Statistics help please

Deetee (Illuminator; joined Jul 8, 2003; 3,789 messages)
I need guidance/help regarding some comparisons. Any advice much appreciated.

I have a small group of 9 patients with an uncommon disease X.
In 8 of them it seemed to be associated/triggered by a problem (Y) but in one case it seemed to be linked with a different problem (Z).
Now Y is common in the general "at risk" population of 600,000 people (about 500,000 of them have it), but Z is rare (about 100 people).

However, my sample is incomplete, and I don't know how many other cases of disease X are out there.

Can I determine whether having Y or Z is a greater risk factor for developing disease X?

What is the best way to compare, and what confidence limits would there be?
 
It has really been a long time since I last used statistics, but I think you are looking for Bayes' theorem:

P(X|Y) = [P(Y|X)*P(X)]/P(Y), which gives you the probability that a person will have disease X given that she shows problem Y. Substitute Z for Y and you could compare.



From your sample, you could set P(Y|X) to 8/9, P(Y) to 5/6, P(Z|X) to 1/1 and P(Z) to 1/6000.

You have two problems. First, you don't show any data about P(X), i.e. how common disease X is given no other information.

Second, there is your sample size, especially the single case with Z.
You simply can't do statistics with sample sizes of one :)

Zee
 
ETA: P(Z|X) should be 1/9, not 1/1, but this doesn't help you either.

Come to think of it...
Since P(X) is the same in both equations, you could divide one by the other and get a relative comparison: P(X) cancels, leaving P(X|Z)/P(X|Y) = [P(Z|X)/P(Z)] / [P(Y|X)/P(Y)].
If I did my quick calculation correctly, P(X|Z) comes out 625 times higher than P(X|Y), but again, your sample size makes this meaningless.
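A quick check of the ratio arithmetic, using exact fractions so rounding doesn't muddy the comparison. The inputs are the sample-based figures from this post with the ETA correction P(Z|X) = 1/9; the helper name `lift` is just an illustrative label for P(factor|X)/P(factor). With these point estimates the ratio P(X|Z)/P(X|Y) works out to exactly 625, so it is Z, not Y, that looks more strongly associated, with the one-case caveat still applying in full.

```python
from fractions import Fraction

def lift(p_factor_given_x, p_factor):
    """P(factor|X) / P(factor), which by Bayes' theorem equals P(X|factor) / P(X)."""
    return p_factor_given_x / p_factor

# Sample-based figures from the thread (P(Z|X) corrected to 1/9).
lift_y = lift(Fraction(8, 9), Fraction(500_000, 600_000))  # P(Y) = 5/6
lift_z = lift(Fraction(1, 9), Fraction(100, 600_000))      # P(Z) = 1/6000

# P(X) cancels when the two Bayes expressions are divided:
# P(X|Z) / P(X|Y) = lift_z / lift_y
ratio = lift_z / lift_y
print(ratio)  # 625
```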
 

Bit confused still....
The estimates are that disease X occurs in about 1 in every 125 of the overall population. Does that help?
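Plugging that prevalence estimate, P(X) = 1/125, into Bayes' theorem together with the sample-based figures from earlier in the thread (using the corrected P(Z|X) = 1/9) gives absolute values. The Y figure is plausible, but the Z figure comes out above 1, which is impossible for a probability. This is a sketch of the arithmetic only; the nonsensical Z result is another sign that a single Z case among 9 patients can't be taken as an estimate of P(Z|X).

```python
from fractions import Fraction

def posterior(p_factor_given_x, p_x, p_factor):
    """Bayes' theorem: P(X|factor) = P(factor|X) * P(X) / P(factor)."""
    return p_factor_given_x * p_x / p_factor

p_x = Fraction(1, 125)  # prevalence estimate for disease X
p_x_given_y = posterior(Fraction(8, 9), p_x, Fraction(5, 6))
p_x_given_z = posterior(Fraction(1, 9), p_x, Fraction(1, 6000))

print(float(p_x_given_y))  # about 0.0085
print(float(p_x_given_z))  # 16/3, about 5.33: greater than 1, so not a valid probability

# The same problem in head counts: 600,000 / 125 = 4,800 people with X.
# If 1/9 of them had Z, that would be about 533 people, but only about 100 people have Z at all.
```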
 
Are you trying to make inference on incidence using only prevalent cases (because that's complicated...)? Wouldn't you need undiseased subjects, both with and without Y or Z, to compute odds ratios and the like? And yes, the sample is just too small for anything beyond unreliable point estimates.
 

What you want is a case-control study. Find 18 comparable controls from your patient population and then measure for the presence of Y and Z. Calculate the odds-ratio for each factor:

| | Case | No case |
|---|---|---|
| Exposed | a | b |
| Not exposed | c | d |
OR = ad/bc

You then convert this to a z-score by taking the natural log (ln) of the OR and dividing by its standard error, SE = sqrt(1/a + 1/b + 1/c + 1/d), and use the usual tests for statistical significance. The confidence interval is formed on the log scale (ln(OR) ± 1.96 × SE for a 95% interval), and you then take the anti-log of both limits to get an interval on the OR scale that makes sense.
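Here is a minimal sketch of those steps (odds ratio, log-scale SE, z-score, two-sided p-value and 95% interval), assuming a table laid out as above. The function name and the example counts are hypothetical. Note that it refuses tables with a zero cell, because ln(OR) and the SE are undefined there; a common workaround is to add 0.5 to every cell (the Haldane-Anscombe correction), but with counts this small Fisher's exact test is usually the safer choice.

```python
import math

def odds_ratio_ci(a, b, c, d, z_crit=1.96):
    """Odds ratio, z-score, two-sided p-value and CI for a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    Uses the log-scale (Woolf) standard error; every cell must be > 0.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    or_ = (a * d) / (b * c)
    log_or = math.log(or_)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal two-sided tail
    # Build the interval on the log scale, then take the anti-log.
    lower = math.exp(log_or - z_crit * se)
    upper = math.exp(log_or + z_crit * se)
    return or_, z, p_two_sided, (lower, upper)

# Hypothetical counts: 8 of 9 cases and 10 of 18 controls exposed.
print(odds_ratio_ci(8, 10, 1, 8))
```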

That tells you whether each factor is associated with X, and whether that association is statistically significant.

If you want to compare the relative influence of Y and Z, use logistic regression (I presume you have a stats program?).
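If a stats program isn't to hand, the sketch below fits a logistic regression by Newton-Raphson in plain Python, assuming each subject is coded as a row of 0/1 predictors (for example `[has_y, has_z]`) with outcome 1 for a case of X. It is an illustration, not a replacement for a proper package (it reports no standard errors). exp(coefficient) is the adjusted odds ratio for that factor. With a single binary predictor, the fitted slope equals ln(OR) from the 2x2 table exactly, which makes a handy sanity check. If a factor never appears among the controls (likely for Z), the maximum-likelihood estimate doesn't exist ("separation"), the coefficient keeps growing, and the function reports that it did not converge.

```python
import math

def _sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for cc in range(col, n + 1):
                M[r][cc] -= f * M[col][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][cc] * x[cc] for cc in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Return (coefficients, converged); coefficients[0] is the intercept."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                     # X^T (y - p)
        hess = [[0.0] * k for _ in range(k)]  # X^T W X with W = p(1 - p)
        for r, yi in zip(rows, y):
            p = _sigmoid(sum(bj * xj for bj, xj in zip(beta, r)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += w * r[i] * r[j]
        step = _solve(hess, grad)  # Newton step
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, True
    return beta, False  # no convergence: suspect separation
```

For a hypothetical two-factor fit you would call `fit_logistic(rows, outcomes)` with `rows` like `[[1, 0], [0, 1], ...]` and read `math.exp(beta[1])` and `math.exp(beta[2])` as the adjusted odds ratios for Y and Z.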

Linda
 

You guys are just too much. Why can't I have some of your spare brain capacity?

If I make 2x2 tables can I not just run a Chi square with correction for small nos? I am afraid I'm rather lost with ORs and z-scores and the like.

And what if with the control samples one of the boxes comes up with a zero?
 
If I make 2x2 tables can I not just run a Chi square with correction for small nos? I am afraid I'm rather lost with ORs and z-scores and the like.

If you're only testing for Y as a factor, probably. Not for Z, though: at 1 in 6,000 in the general population, there's better than a 99% chance (about 99.7%) that none of the 18 controls fls suggested will have Z.
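The "no Z among the controls" figure is simple arithmetic, assuming controls are drawn at random from the at-risk population: the chance that none of n controls falls into a category with probability p is (1 - p)^n. The same function also gives the chance that all 18 controls have Y, which would leave a zero in the "not exposed, no case" cell for Y. The helper name is hypothetical.

```python
def prob_none_in_category(n_controls, p_category):
    """Chance that none of n randomly drawn controls falls in a category of probability p."""
    return (1 - p_category) ** n_controls

# Z: about 100 of 600,000 people, i.e. 1 in 6,000.
p_no_z = prob_none_in_category(18, 100 / 600_000)  # about 0.997

# Y: about 1 in 6 people lack Y; chance that no control lacks it (all 18 have Y).
p_all_y = prob_none_in_category(18, 1 / 6)  # about 0.038

print(round(p_no_z, 4), round(p_all_y, 4))
```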

And what if with the control samples one of the boxes comes up with a zero?

Technically, there's Fisher's exact test, but again, for Z you don't have the sample size to make any inference.
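For anyone who wants to see what Fisher's exact test actually does on a 2x2 table: with the row and column totals held fixed, the top-left count follows a hypergeometric distribution, and the two-sided p-value sums the probabilities of every table that is no more likely than the observed one. This is a minimal plain-Python sketch (function name hypothetical); if SciPy is available, `scipy.stats.fisher_exact` does the same job.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d  # row totals
    c1 = a + c             # first column total
    n = r1 + r2
    denom = comb(n, c1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(r1, k) * comb(r2, c1 - k) / denom

    lo = max(0, c1 - r2)
    hi = min(r1, c1)
    p_obs = prob(a)
    # Sum tables at least as extreme (no more probable) than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical usage with case/control counts for one factor:
# fisher_exact_two_sided(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls)
```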

Finally, because you have prevalent cases (subjects who already have the disease at recruitment, and so have aged since onset), any association between Y or Z and the disease might not be representative of incident cases. That is, prevalent cases tend to have had the disease longer than incident cases (the longer you have the disease, the more likely it is to be detected eventually, and thus the more likely you are to be included in the sample), so what you observe may be an association with longer disease duration rather than with increased incidence.
 
If I make 2x2 tables can I not just run a Chi square with correction for small nos? I am afraid I'm rather lost with ORs and z-scores and the like.

You can. The OR gives a useful measure of the strength of that association, which is the information that you are looking for.

And what if with the control samples one of the boxes comes up with a zero?

I misinterpreted the part about the incidence of Z and missed that it was so low. As Jorghnassen pointed out, you probably won't find any incident examples of Z in your controls, which won't make it possible to do a case-control analysis for Z (the numbers should work for Y). You can do a Fisher's exact test (instead of a Chi-square test) when any of your cells have less than 5 cases (you can do a Fisher's exact test in any case, it's just that it moves you away from the realm of 'pencil and paper').

A rough rule of thumb, when you are dealing with anything rare, is to focus on collecting a group with the rarest factor. Is it possible to collect a group of people with Z for a retrospective cohort study? Alternatively, if you already have a good measure of the underlying incidence of these factors in your population, you could simply advertise for people with X and Z. If you get any additional cases, that gives you enough ammunition to make a more involved study worthwhile, since you really shouldn't have more than one person with both to begin with (specifically, 3 or more people with both would occur with less than 5% probability based on the numbers you gave). If you don't get any additional cases, that suggests you can drop the idea.
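The "less than 5%" figure can be checked with a binomial tail, assuming Z is unrelated to X so that each of the roughly 100 people with Z has X with the population prevalence of 1/125 quoted earlier in the thread. That gives an expected 0.8 people with both, and a chance of about 4.7% of seeing 3 or more. The helper name is hypothetical.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k successes) for a Binomial(n, p) count."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

# Assumption: no link between Z and X, so P(X) = 1/125 applies to the ~100 people with Z.
n_with_z = 100
p_x = 1 / 125
expected_both = n_with_z * p_x                   # 0.8 people
p_three_or_more = prob_at_least(3, n_with_z, p_x)  # about 0.047

print(expected_both, round(p_three_or_more, 4))
```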

Linda
 
If I make 2x2 tables can I not just run a Chi square with correction for small nos? I am afraid I'm rather lost with ORs and z-scores and the like.

And what if with the control samples one of the boxes comes up with a zero?



Linda's right that the chi-square test needs about 5 per cell to be reliable, but that rule applies only to the table of expected values. A zero in your observed table is not a problem in itself. What matters is that when you compute the expected values under the null hypothesis, each cell comes out at 5 or more.
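A small sketch of that check: compute the expected counts from the row and column totals (row total × column total / N), flag any expected count below 5, and otherwise run the chi-square test with the Yates continuity correction for small numbers that Deetee asked about. The function names are hypothetical; if SciPy is available, `scipy.stats.chi2_contingency` applies the same correction to 2x2 tables by default.

```python
import math

def expected_counts(table):
    """Expected 2x2 cell counts under independence: row total * column total / N."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

def yates_chi_square(table):
    """Return (chi2, p_value, any_expected_below_5) for a 2x2 table."""
    expct = expected_counts(table)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Yates correction: shrink each |O - E| by 0.5 (not below zero).
            diff = max(0.0, abs(table[i][j] - expct[i][j]) - 0.5)
            chi2 += diff ** 2 / expct[i][j]
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    small = min(min(row) for row in expct) < 5
    return chi2, p, small

# An observed zero is fine as long as the expected counts are large enough:
print(yates_chi_square([[0, 10], [10, 0]]))  # expected counts are all 5, so no flag
```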
 
