Well, sorta. You can quantify how you *might* have adequately accounted for the risk. Look at poker. We can calculate absolute pot odds without a problem and determine that a heavy bet with a full house is a good idea. If the other guy has a straight flush, I did not "adequately" account for this situation. I made the "right" decision but I did the "wrong" thing because I lost the hand.
That's not really a good example, since a simple understanding of how you calculate odds on the probability of any particular poker hand does not actually include the conditions under which you encounter those hands.
That's an assertion with no evidence
It is basic statistics. I have simply described what "power" means in terms of your ability to demonstrate an effect.
and, quite frankly, it's rather counterintuitive unless you mean the likelihood of passing *one* trial.
It is somewhat counter-intuitive, which I suspect is why it usually gets little to no consideration in protocol discussions. It is not the likelihood of passing one trial. It is the likelihood of passing your threshold for success with a given effect size.
I discussed this in Pavel's thread with examples (http://www.internationalskeptics.com/forums/showthread.php?postid=5032589#post5032589).
Let's take an effect size of 0.80, which represents a 'large' effect size. For trials with p=0.50, this translates to the following numbers of hits for increasing trial numbers:
1/1, 9/10, 22/25, and 43/50.
The p-values for each of those results, if they were due to chance, are:
1.00, 0.01, 0.0001, and 0.0000001.
The number of hits necessary to exceed a standard of 0.001 would be:
N/A, 10/10, 21/25, and 37/50.
Which translates to success rates of:
N/A, 100%, 84%, and 74%.
Which reflects effect sizes of:
N/A, 1.571, 0.748, and 0.500.
While the person is able to accomplish the same thing in each set of trials, whether or not they will be able to exceed the threshold depends upon the total number of trials. Conversely, the larger the total number of trials, the lower their success rate needs to be in order to pass, and smaller and smaller effect sizes (i.e. the effect of 'holes') will be enough to let them pass.
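As a rough check, the numbers above can be reproduced with a short calculation like the one below (a sketch of mine, assuming Python with scipy is available; the effect sizes quoted are consistent with Cohen's h, the arcsine-transformed difference between the observed hit rate and chance):

```python
from math import asin, sqrt
from scipy.stats import binom

def cohens_h(p_obs, p_chance=0.5):
    """Cohen's h for an observed hit rate against chance."""
    return 2 * asin(sqrt(p_obs)) - 2 * asin(sqrt(p_chance))

for n in (1, 10, 25, 50):
    # fewest hits whose hit rate gives an effect size of at least 0.80
    k_effect = next(k for k in range(n + 1) if cohens_h(k / n) >= 0.80)
    # fewest hits that beat a p < 0.001 criterion under the chance (p = 0.5) null
    k_crit = next((k for k in range(n + 1)
                   if binom.sf(k - 1, n, 0.5) < 0.001), None)
    if k_crit is None:
        print(f"n={n:2d}: hits for h=0.80: {k_effect}/{n}; p<0.001 not reachable")
    else:
        rate = k_crit / n
        print(f"n={n:2d}: hits for h=0.80: {k_effect}/{n}; "
              f"need {k_crit}/{n} ({rate:.0%}, h={cohens_h(rate):.2f}) to pass")

# expected output (matching the figures above):
# n= 1: hits for h=0.80: 1/1; p<0.001 not reachable
# n=10: hits for h=0.80: 9/10; need 10/10 (100%, h=1.57) to pass
# n=25: hits for h=0.80: 22/25; need 21/25 (84%, h=0.75) to pass
# n=50: hits for h=0.80: 43/50; need 37/50 (74%, h=0.50) to pass
```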
I can't believe you mean that since you *always* increase the likelihood of passing one trial when adding additional trials.
Suppose the "gap" we don't find practical to close is cheating by some form of signaling. With two trials, I stand virtually no chance of determining a pattern in the environment (finding a signal within the noise). With 2,000 trials you can bet I'm going to find that signal.
Suppose the gap is that there's some little thing a decoy might do to reveal that he's not the target. The step I would need to take to entirely prevent this is too expensive. I estimate that there's only a 1 in 100 chance that this might happen.
But you realize that you are pulling this number out of your ass, right? What if it's one in ten or one in two?
So, the claimant has a 99 in 100 chance of having three trials with a 1 in 12 chance and a 1 in 100 chance of having three trials with a 1 in 10 chance (I'm on the 2 kidney thing).
If I add one more trial, then my worst case scenario is four trials of a 1 in 10 chance. Those odds are more difficult than my original best-case scenario of three trials at 1 in 12 odds.
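To make the arithmetic explicit, here is a quick check of those figures (a sketch, treating the trials as independent so the per-trial odds simply multiply):

```python
# chance of passing every trial by luck under each scenario described above
best_case_3_trials  = (1 / 12) ** 3   # three trials at 1 in 12 -> ~1 in 1728
worst_case_3_trials = (1 / 10) ** 3   # three trials at 1 in 10 -> 1 in 1000
worst_case_4_trials = (1 / 10) ** 4   # four trials at 1 in 10  -> 1 in 10000

# weighting the two three-trial scenarios by the 99-to-1 estimate above
overall_3_trials = 0.99 * best_case_3_trials + 0.01 * worst_case_3_trials

print(f"{best_case_3_trials:.6f}")    # 0.000579
print(f"{worst_case_3_trials:.6f}")   # 0.001000
print(f"{overall_3_trials:.6f}")      # 0.000583
print(f"{worst_case_4_trials:.6f}")   # 0.000100
```

Which bears out the claim: the four-trial worst case (1 in 10,000) still leaves a smaller chance of a lucky pass than the original three-trial best case (about 1 in 1,728).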
This works if the number you have pulled out of your ass is reasonable. How would you go about figuring out whether it is or not?
The amount of residual bias present in good-quality RCTs is estimated to be 0.10. In good-quality studies without control groups, it is estimated to be 0.20. Ray Hyman looked at the amount of bias which may be present in the ganzfeld studies (as in Anita's test, these involve making guesses whilst attempting to remove any possible sources of normal information) and found that there may be at least 0.30. What these numbers indicate is the proportion of studies that should come out negative but will nonetheless appear positive.

Now, as you can see, a bias of 0.10 utterly dwarfs the effect of playing around with the odds. If you are worried about the one false-positive result due to chance in 1000 tests, it will be dwarfed by the 100 false positives due to bias - an effect that won't even be touched by removing that one chance false positive.
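To put numbers on that comparison (a back-of-the-envelope sketch using the 0.10 bias figure above):

```python
# out of 1000 tests of claimants with no actual ability...
n_tests = 1000
false_positives_by_chance = n_tests * 0.001  # the 1-in-1000 pass criterion -> ~1
false_positives_by_bias   = n_tests * 0.10   # a residual bias of 0.10      -> ~100
print(false_positives_by_chance, false_positives_by_bias)  # 1.0 100.0
```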
Now, under the conditions of Anita's test, some of those sources of bias will not be present - the effect of multiple testing, flexibility in specifying outcomes (at least for the purpose of passing the test), and publication bias. You've never taken other biases into consideration, like the bias introduced by the asymmetry in the location of the missing kidney and the asymmetry in her guesses, or randomization (which isn't mentioned in the protocol). But mostly we worry about the effect of her picking up subconscious or conscious clues from the subjects and examiners, and we've tried to mitigate this through partial blinding. So how successful are we at reducing bias? Is it a thousand-fold less? Is it ten-fold less?
In other situations where people claimed to have eliminated bias (parapsychology studies, claimants performing informal tests), comparison with subsequent testing can show effect sizes of 0.20 or 0.50 or more due to the sort of bias we are worried about with Anita. And as I illustrated in my example above, increasing numbers of trials allow smaller and smaller effect sizes to produce a result which passes the criteria for a successful test.
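To connect this back to the example above, here is a rough power calculation (again a sketch assuming Python with scipy, and treating effect size as Cohen's h): if bias alone pushes the hit rate to an effect size of 0.20 or 0.50 above chance, the probability of a claimant with no ability passing a p < 0.001 test grows with the number of trials.

```python
from math import asin, sin, sqrt
from scipy.stats import binom

def rate_from_h(h, p_chance=0.5):
    """Hit rate corresponding to Cohen's h above chance."""
    return sin(asin(sqrt(p_chance)) + h / 2) ** 2

for n in (25, 50, 200, 1000):
    # pass mark: fewest hits with P(X >= k) < 0.001 under the chance (p = 0.5) null
    k = next(k for k in range(n + 1) if binom.sf(k - 1, n, 0.5) < 0.001)
    powers = [binom.sf(k - 1, n, rate_from_h(h)) for h in (0.20, 0.50)]
    print(f"n={n:4d}: pass mark {k}/{n}; "
          f"chance of passing on bias alone: h=0.20 -> {powers[0]:.0%}, "
          f"h=0.50 -> {powers[1]:.0%}")
```

Very roughly, an effect of 0.50 gives close to even odds of a pass at 50 trials and near-certainty by a couple of hundred, while even an effect of 0.20 passes almost every time by a thousand trials.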
Linda
References:
Why Most Published Research Findings Are False, John P.A. Ioannidis, PLoS Medicine, 2005.
Statistical Power Analysis for the Behavioral Sciences, Jacob Cohen.
Commentary on John P.A. Ioannidis' 'Why Most Published Research Findings Are False', Ray Hyman, Skeptical Inquirer, Vol. 30, March-April 2006.