
Statistical significance

The experiment that would be insisted on is something like this.

[...]

So under your theory you're going to want 0.25N+3.08*sqrt(0.1875N) <= 0.35N-1.28*sqrt(0.2275N). Solving that algebra problem we see that the critical condition is N>=10*(3.08*sqrt(0.1875) + 1.28*sqrt(0.2275))^2 which is about 37.8.

Oops, through a silly algebra error I forgot to square the 10. This answer is therefore off by a factor of 10. And the later post comparing 27% and 25% is off by a factor of 50.

Now if you wanted to be precise you'd have to work out your exact odds using the binomial distribution to see what the actual cutoff that satisfies both you and Randi is, but it will be about 38 trials.

38 should, of course, be 378.

My apologies,
Ben
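For anyone who wants to check the corrected figure, here is a minimal sketch of the same normal-approximation algebra. The function name `trials_needed` is made up for illustration; the z-values 3.08 and 1.28 are the ones used in the post (99.9% confidence, 90% chance of success), and the result is only as good as the normal approximation.

```python
from math import sqrt

def trials_needed(p_chance, p_claimed, z_alpha, z_power):
    """Smallest N (normal approximation) satisfying
    p_chance*N + z_alpha*sqrt(p_chance*(1-p_chance)*N)
        <= p_claimed*N - z_power*sqrt(p_claimed*(1-p_claimed)*N).
    Dividing by sqrt(N) and solving gives sqrt(N) >= numerator / gap.
    """
    numerator = (z_alpha * sqrt(p_chance * (1 - p_chance))
                 + z_power * sqrt(p_claimed * (1 - p_claimed)))
    gap = p_claimed - p_chance
    return (numerator / gap) ** 2

# 25% expected by chance, 35% claimed, z-values as in the post.
n = trials_needed(0.25, 0.35, 3.08, 1.28)
print(round(n))  # about 378
```

Dividing by the 0.1 gap is the same as multiplying by 10, and squaring that is where the factor of 100 (not 10) comes from.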
 
That sounds good, but I wonder what the most time-consuming protocol that Randi has accepted is. Does anyone know?

I don't know, but it doesn't really matter since all expenses for running the experiment are the responsibility of the testee.

Cheers,
Ben
 
I don't know, but it doesn't really matter since all expenses for running the experiment are the responsibility of the testee.

It most certainly does matter. Randi's time -- and that of the volunteers who actually run most of the testing for him (since very few preliminary tests require his personal attention) -- is given freely in the interest of scientific progress and general good-fellowship.

I would be perfectly happy to give up one of my Saturday afternoons to go watch someone stare at Zener cards. I'd be a little more leery of giving up the entire weekend, but, hey, it's for science (and for Randi). But if Randi wants a month, I simply can't give it to him. Not won't, but can't. I have classes to teach, animals to feed, my country's five hundredth birthday to plan, my wedding to arrange, my wife to murder, and Guilder to frame for it. I'm swamped. :D
 
It most certainly does matter. Randi's time -- and that of the volunteers who actually run most of the testing for him (since very few preliminary tests require his personal attention) -- is given freely in the interest of scientific progress and general good-fellowship.

I would be perfectly happy to give up one of my Saturday afternoons to go watch someone stare at Zener cards. I'd be a little more leery of giving up the entire weekend, but, hey, it's for science (and for Randi). But if Randi wants a month, I simply can't give it to him. Not won't, but can't. I have classes to teach, animals to feed, my country's five hundredth birthday to plan, my wedding to arrange, my wife to murder, and Guilder to frame for it. I'm swamped. :D

The rules to the challenge put no upper limit on how much work it may take to verify the claim. They just insist that all costs are the responsibility of the testee. Randi makes the offer to all, and as long as someone steps forward with a request to be tested according to his rules, he is legally bound to find a way to test them.

If Randi is unable to conduct the test with volunteers, then hiring people to observe the tests becomes a cost. And that cost would be the responsibility of the testee.

Cheers,
Ben
 
The rules to the challenge put no upper limit on how much work it may take to verify the claim.

The rules of the challenge also don't obligate Randi to accept every challenge if he has a reasonable belief that the proposed challenge structure is beyond what he is capable of handling. For example, Randi has rejected at least one proposal because the claimant put unreasonable (and nonsensical) restrictions on who the participants in the challenge would be.


Randi makes the offer to all, and as long as someone steps forward with a request to be tested according to his rules, he is legally bound to find a way to test them.

Er,.... no. The challenge is only legally binding if Randi accepts it -- and he's not obligated to accept every proposal that comes down the pike. He's only obligated to make a good-faith effort to negotiate a protocol.
 
Oops, through a silly algebra error I forgot to square the 10. This answer is therefore off by a factor of 10. And the later post comparing 27% and 25% is off by a factor of 50.
38 should, of course, be 378.

My apologies,
Ben

I think you'll find that about 10,500 trials would be necessary for a claimant performing at a 27% success rate in a ganzfeld experiment where a 25% success rate would be expected to demonstrate with 99.9999% probability that the success was not due to chance. See http://faculty.vassar.edu/lowry/binomialX.html and insert the following values: For n, 10500; for k, 2835 (27% of 10500); for p, .25. Click calculate and you will note that "P: 2835 or more out of 10500" is 0.000001.
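The same tail probability can be checked without the online calculator. This is a rough sketch that sums the binomial probabilities in log space; `binom_tail` is a name invented here, and the only inputs are the n, k and p quoted above.

```python
from math import exp, lgamma, log

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Each term is built from log-gamma values so the huge binomial
    coefficients never overflow a float.
    """
    log_p, log_q = log(p), log(1 - p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += exp(log_pmf)  # far-tail terms underflow harmlessly to 0.0
    return total

tail = binom_tail(10500, 2835, 0.25)
print(f"{tail:.1e}")  # on the order of one in a million
```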
 
I think you'll find that about 10,500 trials would be necessary for a claimant performing at a 27% success rate in a ganzfeld experiment where a 25% success rate would be expected to demonstrate with 99.9999% probability that the success was not due to chance. See http://faculty.vassar.edu/lowry/binomialX.html and insert the following values: For n, 10500; for k, 2835 (27% of 10500); for p, .25. Click calculate and you will note that "P: 2835 or more out of 10500" is 0.000001.

You've missed two critical factors.

First of all if a claimant expects to perform on average at a 27% rate, then the odds that they actually perform at a 27% rate or better on any given occasion are only about 50%. If the claimant is going to do all of that work, the claimant is going to want better than even odds of winning the bet. The calculation that I was presenting was estimating the number of trials needed to succeed with a given probability of success. (The figure that I chose was a 90% chance of succeeding at a 99.9% confidence level.)

The second is that the structure of the challenge says that you must succeed to a 1/1000 level of significance on two occasions. And the data that you use in the first trial cannot be reused in the second one. That means that a lot more trials will be needed in the end. This is very unfortunate for the hypothetical claimant in this case because succeeding at a 99.9% level takes well over half the data that succeeding at a 99.9999% level does.
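To put a number on the first point, here is a small sketch using the normal approximation with a continuity correction. It asks how often someone who really does hit 27% would reach the 2835-out-of-10500 mark. The helper `approx_power` is hypothetical, and the approximation is only rough this far out in the tail.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n, k, p_true):
    """Approximate P(X >= k) for X ~ Binomial(n, p_true),
    using a normal curve with the usual 0.5 continuity correction."""
    mu = n * p_true
    sigma = sqrt(n * p_true * (1 - p_true))
    return 1 - NormalDist(mu, sigma).cdf(k - 0.5)

# A genuine 27% performer reaching 2835 hits in 10500 trials:
power_at_27 = approx_power(10500, 2835, 0.27)
# A pure guesser (25%) reaching the same mark:
power_at_25 = approx_power(10500, 2835, 0.25)

print(f"true 27% performer: {power_at_27:.2f}")  # close to a coin flip
print(f"pure chance:        {power_at_25:.1e}")
```

The cutoff sits right at the 27% performer's own average, which is why the chance of clearing it comes out near one half.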

Cheers,
Ben
 
Er,.... no. The challenge is only legally binding if Randi accepts it -- and he's not obligated to accept every proposal that comes down the pike. He's only obligated to make a good-faith effort to negotiate a protocol.

I just re-read it, and you are clearly right. Randi does make a promise that it is available to all comers, but in the fine print it says that Randi will only sign it (making it a contract) after a test protocol has been agreed on. If no test protocol is agreed on, then he is under no obligation to sign it.

My bad.

Cheers,
Ben
 
By the way, Ben, I owe you a "thank you" for your answer to my question earlier. I'm still thinking about what it means, which is why I didn't say so earlier.
 
Where is that stated in the challenge rules?

The particular p-value is not in the rules, but that's the level at which he usually sets the bar during protocol negotiation. Succeed at a 1:1000 chance during the preliminary, and then succeed at a 1:1000000 chance during the (hypothetical) final, for a total chance of winning of 1:1000000000 if the null hypothesis is true.

In this regard, Mr. Tilly is slightly wrong; the final test is supposed to be stricter than the prelim.
 
The particular p-value is not in the rules, but that's the level at which he usually sets the bar during protocol negotiation. Succeed at a 1:1000 chance during the preliminary, and then succeed at a 1:1000000 chance during the (hypothetical) final, for a total chance of winning of 1:1000000000 if the null hypothesis is true.

In this regard, Mr. Tilly is slightly wrong; the final test is supposed to be stricter than the prelim.

just out of interest, has anyone achieved any kind of significance (say p<0.05) on their preliminaries?
 
just out of interest, has anyone achieved any kind of significance (say p<0.05) on their preliminaries?

Oh, sure. Almost by definition, half of the participants have achieved p < 0.50 and all of them have achieved p <= 1.0 :D

Seriously, though -- when you test twenty or more people, you will expect to see at least one p < 0.05. Randi has tested hundreds. The credophiles are well aware of the few cases -- which I'm too lazy to look up, but I'm sure that your Google-fu is up to the task -- where the claimant has achieved better-than-chance but not up to the contractual level.
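A quick sketch of that arithmetic, assuming the tests are independent and every claimant is in fact just guessing (the function name is made up for illustration):

```python
def prob_at_least_one(alpha, n_tests):
    """Chance that at least one of n_tests independent null results
    comes in below the significance level alpha."""
    return 1 - (1 - alpha) ** n_tests

p_twenty = prob_at_least_one(0.05, 20)
expected_hits = 0.05 * 20  # average number of p < 0.05 results among 20 guessers

print(f"{p_twenty:.2f}")   # about 0.64
print(expected_hits)       # 1.0
```

So with twenty claimants one such result is expected on average, and with hundreds of claimants it is all but guaranteed.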
 
Oh, sure. Almost by definition, half of the participants have achieved p < 0.50 and all of them have achieved p <= 1.0 :D

Seriously, though -- when you test twenty or more people, you will expect to see at least one p < 0.05. Randi has tested hundreds. The credophiles are well aware of the few cases -- which I'm too lazy to look up, but I'm sure that your Google-fu is up to the task -- where the claimant has achieved better-than-chance but not up to the contractual level.

Probably the most famous one is Natasha Demkina.

This is why statistical significance is not always the last word: we are not dealing with natural phenomena, here: we're dealing with human intention. Demkina did better than chance, but not necessarily better than educated guessing or cheating. Those are the null hypotheses: not chance.
 
Probably the most famous one is Natasha Demkina.

Thank you.


This is why statistical significance is not always the last word: we are not dealing with natural phenomena, here: we're dealing with human intention. Demkina did better than chance, but not necessarily better than educated guessing or cheating. Those are the null hypotheses: not chance.

Well, "chance" is much more mathematically tractable than "educated guessing," because we know how to calculate chance in a way that we don't know how to calculate "guessing." So part of the reason that Randi sets the bar so high is that it reduces the possibility of a false positive obtained by running the numbers against a particularly naive null hypothesis.
 
Thank you.




Well, "chance" is much more mathematically tractable than "educated guessing," because we know how to calculate chance in a way that we don't know how to calculate "guessing." So part of the reason that Randi sets the bar so high is that it reduces the possibility of a false positive obtained by running the numbers against a particularly naive null hypothesis.

ok - that kinda answers my next question, which would be as to whether a participant achieving (say) p<0.01 could "double up" - ie do the experiment again - and then if they achieved p<0.01 again that this would be accepted as a preliminary pass....

but if the bar is set high to protect against a "naive null hypothesis" then i guess this wouldn't be acceptable....
 
Well, "chance" is much more mathematically tractable than "educated guessing," because we know how to calculate chance in a way that we don't know how to calculate "guessing." So part of the reason that Randi sets the bar so high is that it reduces the possibility of a false positive obtained by running the numbers against a particularly naive null hypothesis.

Yeah, but it's an important point: Wiseman is left holding the bag trying to explain why her results (4/7) were astronomically better than chance - cannot be explained by chance - but she *still* officially failed. Since the tradition is to compare to chance, there is a lot of misunderstanding among her supporters. Wiseman is accused of denying a legitimate victory.

Come to think of it: this may not be an example of a JREF challenge. I think this was a "demonstration" with the assistance of the usual JREF suspects (Wiseman, Hyneman...)

I'll shut up, now: I'm not trying to derail the thread.
 
Yeah, but it's an important point: Wiseman is left holding the bag trying to explain why her results (4/7) were astronomically better than chance - cannot be explained by chance

It was only significant to 0.02 - so it's not that astronomical at all.....

http://www.csicop.org/specialarticles/natasha2.html

though it's annoying me as to how they arrived at that figure.....

you have 7 people all with 7 different conditions (or no condition) which requires identifying.....
The girl has cards with the 7 conditions and just has to match them to the correct people.....

Person 1 steps up and assuming complete guesswork, she has 1/7 prob of getting it right.....

so it seems that p(7) has to be 0.0002 from 1/7!

but....as for the rest....

it's obviously not independent....as the first wrong guess guarantees at least another one....


hmmmm

help?
 
good old JREF....there's an old thread on this :)


P(N=0) = (1/0!) * ((1/0!) - (1/1!) + (1/2!) - (1/3!) + (1/4!) - (1/5!) + (1/6!) - (1/7!)) = 0.3679
P(N=1) = (1/1!) * ((1/0!) - (1/1!) + (1/2!) - (1/3!) + (1/4!) - (1/5!) + (1/6!)) = 0.3681
P(N=2) = (1/2!) * ((1/0!) - (1/1!) + (1/2!) - (1/3!) + (1/4!) - (1/5!)) = 0.183
P(N=3) = (1/3!) * ((1/0!) - (1/1!) + (1/2!) - (1/3!) + (1/4!)) = 0.0625
P(N=4) = (1/4!) * ((1/0!) - (1/1!) + (1/2!) - (1/3!)) = 0.0139
P(N=5) = (1/5!) * ((1/0!) - (1/1!) + (1/2!)) = 0.0042
P(N=6) = (1/6!) * ((1/0!) - (1/1!)) = 0
P(N=7) = (1/7!) * (1/0!) = 0.000198

http://www.internationalskeptics.com/forums/showthread.php?t=38646&highlight=Natasha+Demkina

apparently the inclusion-exclusion principle.....hmmm why does this work?
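One way to see it: to get exactly k matches you pick which k people are matched correctly (C(7,k) ways) and then every one of the remaining 7-k cards has to be wrong, which is a derangement. Inclusion-exclusion counts derangements of m items as m! * (1/0! - 1/1! + ... ± 1/m!), and dividing the product by 7! leaves the (1/k!) * (alternating sum) form in the table. As a sanity check, here is a brute-force sketch that simply tries all 5040 ways of handing out the cards, assuming pure guessing; `match_distribution` is a name invented for this example.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def match_distribution(n):
    """P(exactly k correct matches) for k = 0..n when n cards are
    assigned to n people uniformly at random, by full enumeration."""
    counts = Counter(
        sum(person == card for person, card in enumerate(perm))
        for perm in permutations(range(n))
    )
    total = factorial(n)
    return {k: counts.get(k, 0) / total for k in range(n + 1)}

dist = match_distribution(7)
for k, prob in dist.items():
    print(f"P(N={k}) = {prob:.6f}")

# Chance of 4 or more correct by guessing alone:
p_four_or_more = sum(dist[k] for k in range(4, 8))
print(f"P(N>=4) = {p_four_or_more:.3f}")  # about 0.018
```

P(N=6) comes out as zero because if six cards are right the seventh has nowhere wrong to go, and the 4-or-more tail is where the roughly 0.02 figure comes from.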
 
