
Interpreting nonsignificant results

TruthSeeker

I'm writing a systematic review (never again!).

It is on a very focused question of whether age is related to outcome A.

There are 4 studies which have looked at the question. In two studies, age is a continuous variable, and in two it is dichotomous, with the split at 65 years. Unfortunately, the studies have each used a slightly different measure of A. Each of these measures, however, has been validated for use across the adult lifespan. Due to the heterogeneity across studies, I can't do a meta-analysis.

In terms of results: in the two studies where age is dichotomous, the difference between age groups is not statistically significant. Likewise, where age is continuous, it is not associated with the outcome (whether by correlation or multivariate regression).

None of the studies is very large or highly powered - not unusual in clinical studies. In the studies where age is continuous, the sample sizes are 71 and 120; in the other two, the age group sizes are 31 and 136.

So, the straightforward conclusion is that there is no evidence that age is related to outcome A.

Another conclusion is that larger studies with more representative samples might find a difference and should be encouraged.

My questions: 1. Is there a third possibility I'm overlooking?
2. How much evidence of "no significant difference" does one need before concluding there is no difference?

Any other input very welcome

Thanks
 
I would think a valid thing to say would be almost a combination of your two conclusions: "At this sample size, there does not appear to be any significant correlation."
 
What are the p-values of the four studies? Are they consistent and close to the significance cut-off? If so, that's an indication to do more research. On the other hand, if they vary wildly, that's an indication it's just random chance producing the actual results.

If you want to conclude "no difference" with a specified high level of confidence, then decide how large the difference could be and still count as "no difference". This difference is termed delta. You can then run a hypothesis test of whether the true difference is less than delta, at a specified confidence level (1 - alpha).
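For concreteness, here is a minimal sketch of what that delta-based equivalence test could look like, using two one-sided tests (TOST) via statsmodels. The data, group sizes, and choice of delta below are all made up for illustration:

[code]
# Sketch of an equivalence (TOST) test for the "delta" idea above.
# All data and the choice of delta are invented for illustration.
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(42)
young = rng.normal(50.0, 10.0, size=71)   # hypothetical outcome scores
old = rng.normal(50.5, 10.0, size=120)    # hypothetical outcome scores

delta = 3.0  # largest mean difference still counted as "no difference"

# H0: |true difference| >= delta; a small p-value supports equivalence.
p_value, lower, upper = ttost_ind(young, old, -delta, delta)
print(f"TOST p-value: {p_value:.3f}")
if p_value < 0.05:
    print("Means are equivalent within +/- delta at the 5% level.")
else:
    print("Cannot conclude equivalence from these data.")
[/code]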

Beth


 
If you're worried about making a Type II error (failing to reject a false null hypothesis), then you should work out

[latex] \beta [/latex] = the probability of making a Type II error.

This will be affected by the significance level [latex] \alpha [/latex] that you're using.

If you have a high probability of a Type II error given the data you have, then it might be worth running a larger study - or altering [latex] \alpha [/latex]
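To make the alpha-beta trade-off concrete, here is a rough sketch using statsmodels' power routines; the per-group n of 50 and the small effect size (Cohen's d = 0.2) are illustrative assumptions, not numbers from the studies:

[code]
# Sketch: beta for a two-sample t-test as alpha is varied.
# n = 50 per group and d = 0.2 are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.01, 0.05, 0.10):
    power = analysis.power(effect_size=0.2, nobs1=50, alpha=alpha)
    print(f"alpha = {alpha:.2f}: beta = {1 - power:.2f}")
[/code]

Loosening alpha lowers beta, which is exactly the trade-off described above.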
 
Betting on the null!

I would mention the reliability of the survey measure, which could reduce the chance of finding an effect if one exists. If I remember right, the square root of the reliability sets the theoretical upper limit on validity.
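For reference, the standard psychometric result being recalled here (Spearman's correction for attenuation, not anything from the four studies) is that the observed correlation is bounded by the measures' reliabilities:

[latex] r_{xy} \leq \sqrt{r_{xx} \, r_{yy}} [/latex]

so the disattenuated estimate is [latex] r_{x'y'} = r_{xy} / \sqrt{r_{xx} \, r_{yy}} [/latex]. A survey with reliability 0.64, say, caps the observable validity near [latex] \sqrt{0.64} = 0.8 [/latex].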

Also, limited variability in age could restrict the range (if everyone in the sample were 50+, for example).

And the variability on the survey itself could be a problem.

I'm leery of dichotomizing age-- one wakes up at 65 and is now in a completely different group than the night before his/her birthday?

That said, extreme groups comparisons can sometimes be useful. Is there any way to look at the survey results for just the youngest and oldest people in the studies?

What does the graph of the effect by age look like? Is there a weak relationship masked by a few outliers, or does it look truly random?

The best way to handle it is by reporting some type of power analysis. Assuming a small effect size, calculate the probability of rejecting the null with the present data sets and their variability.

Then you could say something like: assuming a small effect does exist here, the present studies had an 80% chance of finding it. So the best conclusion is that, with roughly 80% confidence, no age differences exist on this measure.
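As a rough sketch of that calculation for the thread's four studies (reading "31 and 136" as the two group sizes in one comparison, which is my assumption; Cohen's "small" benchmarks d = 0.2 and r = 0.1 are also assumptions):

[code]
# Sketch: power of the four studies to detect a small effect.
# Sample sizes are from the thread; effect sizes are Cohen's "small"
# benchmarks, and the 31-vs-136 grouping is an assumed reading.
import numpy as np
from scipy.stats import norm
from statsmodels.stats.power import TTestIndPower

alpha = 0.05

# Dichotomous-age comparison: two-sample t-test, groups of 31 and 136.
t_power = TTestIndPower().power(effect_size=0.2, nobs1=31,
                                ratio=136 / 31, alpha=alpha)
print(f"t-test study (31 vs 136): power = {t_power:.2f}")

# Continuous-age studies: power to detect r = 0.1 via the Fisher z
# approximation (two-sided, ignoring the negligible far tail).
for n in (71, 120):
    z_effect = np.arctanh(0.1) * np.sqrt(n - 3)
    power = norm.sf(norm.ppf(1 - alpha / 2) - z_effect)
    print(f"correlation study (n = {n}): power = {power:.2f}")
[/code]

Power values this low would mean the studies had little chance of detecting a small effect even if one exists.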

Hope this helps

B
 
Oh, I forgot to mention, what are the mean age differences (that weren't significant) on the survey, and what does that mean?

For example, suppose the older folks are scoring about half a point per item differently on the survey, with older people doing better in two of the studies and worse in the other two. That sounds pretty random.

On the other hand, a 2- or 3-point mean difference in each study, consistent in direction across the 4 studies, might imply an important difference masked by poor power.
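A quick sanity check on the direction-consistency idea (my aside, not bpesta's): under a null of no real effect, each study's nonsignificant difference is equally likely to fall in either direction, so the chance that all four studies agree in direction by luck alone is

[latex] 2 \times \left( \tfrac{1}{2} \right)^4 = \tfrac{1}{8} \approx 0.125 [/latex]

Suggestive if it happens, but on its own far from conclusive.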
 
get back to finding that elusive g-spot :D
 
I'm lost navigating these waters-- can't even find the little man in the boat:(
 
Thanks, everyone.

Some great suggestions here. I'm limited somewhat by the amount of information the authors provide in their manuscripts (very frustrating!) but you have given me some possibilities.

Pesta, I agree with you about the age dichotomization, but I have to deal with what's out there. I have already written a paragraph about the strengths/weaknesses of this approach.

Thanks again! I'll update once I've moved forward a bit.
 
Statistical studies are designed to find evidence of a difference, not evidence of no difference. In fact, we can pretty much reject an absence of a difference on philosophical/a priori grounds: for a continuous variable, the probability that two different quantities will be exactly the same is essentially zero. So the only question is whether there is any difference large enough to worry about. And for that, you first need to figure out what difference is "large enough to worry about". To find beta, you first have to figure out what the alternative hypothesis is.

bpesta22 said:
"I'm leery of dichotomizing age-- one wakes up at 65 and is now in a completely different group than the night before his/her birthday?"
Everything has to be dichotomized at some point. After all, everything is, in the end, either in the rejection region or not. And to decide that, you generally turn data into discrete variables. No study is going to record someone's age as 63.789302124...
 
TruthSeeker said:
"So, the straightforward conclusion is that there is no evidence that age is related to outcome A.

Another conclusion is that larger studies with more representative samples might find a difference and should be encouraged.

My questions: 1. Is there a third possibility I'm overlooking?"

Smaller studies with less representative samples might find a difference and should be encouraged?

:boxedin:

TruthSeeker said:
"2. How much evidence of "no significant difference" does one need before concluding there is no difference?"
I don't have much to add, really. I'll just repeat what a bunch of other people have already said, because it's important, namely, that the answer depends on how small a difference is, practically speaking, as good as no difference at all. Generally, a non-significant study provides strong evidence against a large effect but only weak evidence against a small effect. (In fact, of course, it provides weak evidence for an effect of about the same size as actually occurred in the study).

Beth said:
"What are the p-values of the four studies? Are they consistent and close to the significance cut-off? If so, that's an indication to do more research. On the other hand, if they vary wildly, that's an indication it's just random chance producing the actual results."
I'm not sure how literally you meant this to be taken, but in the presence of a real effect one wouldn't expect the p-values themselves to be the same across studies of different sizes: in a test of the null hypothesis of no effect, larger studies will tend to have smaller (more significant) p-values than smaller studies. So one should look instead for consistency of some other, more direct, measure of the effect size.
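If the papers report means, SDs, and group sizes, a standardized effect size such as Cohen's d can be computed per study and compared directly. A sketch with invented summary statistics:

[code]
# Sketch: comparing standardized effect sizes (Cohen's d) across studies
# instead of p-values. The summary statistics below are invented.
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# (mean_old, mean_young, sd_old, sd_young, n_old, n_young) -- hypothetical
studies = {
    "study A": (52.0, 50.5, 10.0, 9.5, 31, 31),
    "study B": (49.2, 50.0, 11.0, 10.5, 136, 136),
}
for name, stats in studies.items():
    print(f"{name}: d = {cohens_d(*stats):+.2f}")
[/code]

Consistent signs and similar magnitudes across studies would point to a real (if small) effect; scattered signs would look like noise.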
 
Thanks again, everyone.

Just as an indication of the magnitude of the effects: I calculated that it would require a sample size of over 10,000 patients for the effect to be significant at p < 0.05 with power 0.80.
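As a sketch of how a number that large falls out of a power calculation (the effect sizes here are illustrative, chosen only to show how fast the required n grows as the effect shrinks):

[code]
# Sketch: required sample size per group for 80% power at alpha = .05.
# Effect sizes are illustrative; d around 0.055 needs roughly 5,000 per
# group, i.e. over 10,000 in total, in line with the post above.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.5, 0.2, 0.055):
    n_per_group = analysis.solve_power(effect_size=d, power=0.80,
                                       alpha=0.05)
    print(f"d = {d}: about {n_per_group:.0f} subjects per group")
[/code]

Required n scales roughly with 1/d^2, which is why tiny effects demand enormous samples.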

So, that helps.

69dodge, I love the idea of asking for smaller, less representative studies. Just to see what the reviewers say.
 
So far the discussion has centred on statistical significance. The last post, indicating a required sample size of 10,000, raises the question of clinical significance. In other words, there may be something going on, but is it useful?
 
Which is exactly the direction of my paper at this point. It is highly unlikely that such a small difference would be associated with a clinically relevant impact.
 
TS

I'm not an expert in power analysis, but 10,000 seems pretty steep to me.

It seems like there might be two ways to do it.

1) Calculate the presumed effect size based on the mean differences (the nonsignificant ones) found in the 4 studies you looked at. From there, figure out what sample size would have been needed for power = .80 to reject the null.

My problem with this approach-- and I could be wrong-- is that if the 4 studies represent a Type II error, then by definition they are unfairly underestimating the true effect size. Using the observed effect size then seems off.

I'm not sure if you did this, but I guess the other way to go is to figure out what the smallest clinically meaningful effect size might be (say .20) and, from there, figure out the power these studies had to detect that effect.

Take the above with a grain of salt, but it seems odd to me that one would need 10,000 subjects to find something here.
 
bpesta22 said:
"My problem with this approach-- and I could be wrong-- is that if the 4 studies represent a Type II error, then by definition they are unfairly underestimating the true effect size. Using the observed effect size then seems off."

I'm not sure this follows. A Type II error also, by definition, includes the case where the null hypothesis is false but the true effect size is not large enough to be "significant" in a statistical sense.

Our best guess is that the true effect size is the size that has been estimated. From that, we can conclude how many subjects would be necessary to successfully detect an effect of that magnitude.
 
Kit-- you may be right, I dunno.

It just seems odd to me that a reasonably reliable and valid survey / assessment would need 10,000 subjects to detect a semi-meaningful age difference.

Most studies are lucky to run 100 subjects, never mind 10,000!

We need Stanley Cohen!
 
bpesta22 said:
"It just seems odd to me that a reasonably reliable and valid survey / assessment would need 10,000 subjects to detect a semi-meaningful age difference."

Doesn't seem odd at all to me. If the difference is actually very small, then it will take a lot of subjects to detect it reliably.
 
bpesta22 said:
"We need Stanley Cohen!"

Jacob Cohen?

Linda
 
