Merged Odds Standard for Preliminary Test

On the specific technical issue here, Rodney is right. Honestly, I don't remember the details of the math either. But here is a Matlab program that gives the result.

Code:
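% P(X >= 30) for X ~ Binomial(40, 0.7), via the regularized incomplete beta function:
% P(X >= k) = 1 - I_{1-p}(n-k+1, k), with n = 40, k = 30, p = 0.7
pp = betainc(0.3, 40-(30-1), 30, 'upper')
pp =
    0.3087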

I have no experience with Matlab, and as such, I'm interested in the details of the math involved. The figure could well be correct, sure, but for things like this, it's best to nail everything down solidly.

Startz said:
You raise an interesting point about both sides negotiating "in their own best interest." Presumably, JREF's interest is in conducting a fair test.

An _unbiased_ test of _what the claimant claims to be able to do_, yes. If the claimant can only perform consistently at 28 out of 40, it is not in the claimant's best interest to accept protocols specifying 30 out of 40.

Startz said:
Primarily, they want to be sure the applicant neither cheats nor wins by luck.

Correct. The primary goal of the protocol design is to rule out, beyond a reasonable statistical doubt, that the applicant has successfully performed their claim by random chance alone. A secondary (call it "primary-A", really) goal of that design is to ensure that the applicant cannot 'cheat', and in fact can only succeed at the test if they do, in fact, possess the claimed ability.

Startz said:
Past that, I hope JREF would do what it can to help an applicant demonstrate any paranormal skills he has.

No. This is a challenge. Either the applicant can do what he says, or he cannot. The JREF should be considered an adversarial party who is only willing to accept hard results.
 
Past that, I hope JREF would do what it can to help an applicant demonstrate any paranormal skills he has.

Why? Would you expect the JREF to cover the expenses that the applicant incurs, for example? If not, why not -- that's certainly something the JREF could do "to help an applicant demonstrate any paranormal skills." But that's not in the JREF's interest, nor will they do that.

If you consider it reasonable for it to conserve its financial resources, then you understand why it also wants to conserve its (much more valuable) time resources....
 
You raise an interesting point about both sides negotiating "in their own best interest." Presumably, JREF's interest is in conducting a fair test. Primarily, they want to be sure the applicant neither cheats nor wins by luck. Past that, I hope JREF would do what it can to help an applicant demonstrate any paranormal skills he has.
Indeed, this is a very interesting point. We always focus on the probability of passing the test without having the stated ability, but never on the probability of passing if the claimant has the ability as stated. From the info on this thread, it looks like Pavel has less than a 1 in 3 chance of passing the proposed protocol even if he has the ability he has stated. If I were the claimant, I would not accept this. If JREF would not allow a test where I was more likely to pass, I would have a legitimate complaint. Nowhere in the rules does it state that the test must be completed in a given time. Here we have a claimant who can be tested in a reasonable amount of time, and yet the JREF seems to want to limit him to a test that he is not likely to pass even if he performs at the 70% accuracy that he claims.

I hate to say it, but I am beginning to see Rodney's point! Previously he has talked about tests (e.g. Ganzfeld or RNG) that would take an unreasonable amount of time, but now we have an example where the time is not unreasonable and the claimant is not being offered a fair test. Hopefully, the negotiations will continue until this is remedied.

IXP
 
Yes, but it's a minor difference. The way the preliminary test is tentatively designed for Pavel, the odds are he will fail even if he has a paranormal ability. Is that the design you want?

It is unfortunate that Pavel's described ability is barely discernible from random chance. One hopes that he will work to find a way to demonstrate it conclusively in a reasonable timeframe. I try to support him in this every chance I get.

So far you've suggested adding the possibility of an extra test on an additional day if Pavel is close on the first test. If an extra day of testing is available, one could simply increase the number of trials to both allow for Pavel's estimated margin of success and still keep the overall chance of passing by luck at the desired 1:1000. This could even be done with somewhat lighter per-day work than the currently proposed test.

Since the results of each trial will be known at the completion of each, it is possible that after the first day Pavel's performance will be so poor that even perfect performance on the second day will prove insufficient to pass the preliminary test. The number of trials completed in the first day could be kept the same (remember I had suggested it could be reduced because of the extra day) to increase the chance of such culling if chance-only performance is expected.

This would give exactly the functionality you were looking for (retest given if performance is close) without changing the general format of the test. In the current format it would be something like 29 out of 40 means success, 28 out of 40 means he comes back another day to try for 1 out of 3, 27 out of 40 means on the extra day he'll need to get 2 out of 3, and 26 out of 40 means he still gets to come back for the extra test but will have to perform 3 out of 3 that day. I'll admit that I didn't spend the time to calculate out the odds, but I think that's pretty close to numbers that would work. Obviously you could increase the portion of trials performed on the second day, but that will increase the odds of requiring it.

All this requires only that JREF is willing to extend their general testing time guideline from 1 testing day to 2. Therefore your suggestion boils down to, "I think it is reasonable that some tests may require two days instead of limiting all of them to 1 day." I suppose we can add in "I think 1 in 537 is every bit as reasonable a threshold for chance performance as 1 in 1000." I have my doubts the JREF would agree with you on either point, but there you are.
 
Indeed, this is a very interesting point. We always focus on the probability of passing the test without having the stated ability, but never on the probability of passing if the claimant has the ability as stated.

I have discussed this on numerous occasions.

I hate to say it, but I am beginning to see Rodney's point!

This point has always been relevant to challenge discussions.

Previously he has talked about tests (e.g. Ganzfeld or RNG) that would take an unreasonable amount of time, but now we have an example where the time is not unreasonable and the claimant is not being offered a fair test. Hopefully, the negotiations will continue until this is remedied.

IXP

Damn. Does this mean I actually have to read through the challenge thread?

Linda
 
...Here we have a claimant who can be tested in a reasonable amount of time, and yet the JREF seems to want to limit him to a test that he is not likely to pass even if he performs at the 70% accuracy that he claims.

I hate to say it, but I am beginning to see Rodney's point! Previously he has talked about tests (e.g. Ganzfeld or RNG) that would take an unreasonable amount of time, but now we have an example where the time is not unreasonable and the claimant is not being offered a fair test. Hopefully, the negotiations will continue until this is remedied.

IXP
A fair test is one thing, but demonstrating a phenomenon greater than random chance is another.

Pavel claims 70% success.
From the http://www.automeasure.com/chance.html tables I quoted above, if he were to pick 1 card from 3 correctly 70 percent of the time over 10 picks (trials), this is still within the bounds of random chance at 1:100 odds.

So the test is fair, and he achieves what he says he can, but he has NOT demonstrated that such a skill is anything remarkable.

I think this is what is bogging down here. I've followed Pavel's attempts at working a protocol. It is obvious he doesn't have a good grip on probability. That is not a criticism of him at all. In fact it is quite common. He has then tried in good faith to adapt to demands of a more "impressive" demonstration which, I agree, he should not have gone down because it now has emerged that he is trying to demonstrate something that he doesn't claim to be able to do.

But that's the point of the Challenge.

To demonstrate that your talent produces results significantly greater than random chance.

The fact is that his skill cannot do this, and he has acknowledged that in a number of semi-blind trials he has attempted during discussions in other threads with him.

There is little point in putting together a protocol that merely demonstrates an ability to guess at the level of random chance.
It is to demonstrate a paranormal ability, not a normal one.
 
I have no experience with Matlab, and as such, I'm interested in the details of the math involved. The figure could well be correct, sure, but for things like this, it's best to nail everything down solidly.

See http://faculty.vassar.edu/lowry/binomialX.html. Method 1 there is applicable, so the formula for determining exactly 30 hits in 40 trials with a P of .7 is: 40!/(30! * 10!) * .7^30 * .3^10. That works out to be about 11%, but then you have to add to it the probability of obtaining exactly 31 hits, 32 hits, etc., which brings the total to about 31%. Fortunately, the above website makes it easy by allowing you to type in the values and obtaining the answer for 30 or more hits. Just type 40 under "n", 30 under "k", and 0.7 under "p". Then click on "calculate" and you will see the result displayed below under "P: 30 or more out of 40" for "Method 1. exact binomial calculation."
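
For anyone who would rather check that arithmetic without the website, here is a minimal Python sketch of the same exact binomial calculation. It uses only the standard library. The function names `binom_pmf` and `binom_tail` are made up for this illustration and do not come from the Vassar page or the Matlab snippet.

```python
from math import comb

def binom_pmf(n, k, p):
    """Probability of exactly k hits in n independent trials with hit rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_tail(n, k, p):
    """Probability of k or more hits in n trials (sum of the exact terms)."""
    return sum(binom_pmf(n, j, p) for j in range(k, n + 1))

exactly_30 = binom_pmf(40, 30, 0.7)     # the 40!/(30! * 10!) * .7^30 * .3^10 term
at_least_30 = binom_tail(40, 30, 0.7)   # adds the terms for 31, 32, ..., 40 hits

print(f"exactly 30 of 40: {exactly_30:.4f}")
print(f"30 or more of 40: {at_least_30:.4f}")
```

The two printed values should land near the 11% and 31% figures quoted above. Changing the second argument of `binom_tail` gives the chance of reaching any other pass mark at the same 70% hit rate.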
 
See http://faculty.vassar.edu/lowry/binomialX.html. Method 1 there is applicable, so the formula for determining exactly 30 hits in 40 trials with a P of .7 is: 40!/(30! * 10!) * .7^30 * .3^10. That works out to be about 11%, but then you have to add to it the probability of obtaining exactly 31 hits, 32 hits, etc., which brings the total to about 31%. Fortunately, the above website makes it easy by allowing you to type in the values and obtaining the answer for 30 or more hits. Just type 40 under "n", 30 under "k", and 0.7 under "p". Then click on "calculate" and you will see the result displayed below under "P: 30 or more out of 40" for "Method 1. exact binomial calculation."

Rodney, I'm asking _you_ to explain it, and ideally in layman's terms, not citing arcane mathematics, nor using an alternate automated tool.

Put another way, Pavel claims he can consistently (i.e., 100%) perform at 28 out of 40. Why should the chances of him having a good day and picking 30 or more out of 40 suddenly drop to 30%? He only has to get two more right, and given that the baseline is 100% at 28, ... see where I'm coming from?

Also, you appear to have neglected all the salient points to the topic of interest, and are instead focusing a great deal of effort on something which, even if you _are_ correct, is dismissed with the phrase "Pavel should not accept protocols which require him to perform beyond his claimed capabilities", and even the JREF will tell Pavel that -- they are not interested in testing someone and having that person have a prepared position ready to fall back to when they fail. They wish it to be as clear as possible that "You claimed X, you actually can only do Y, you fail, no excuses."

Do you have any thoughts on the rest of the points raised?
 
I hate to say it, but I am beginning to see Rodney's point!
I certainly do not! If Pavel signs the protocol that he can do 30 out of 40, then 30 is the number he claims, not 29, and everything below 30 is a failure.

As we can see from Patricia Putt's protocol, the claimant will be required to sign a statement like this one:
protocol said:
I, the undersigned, agree to all terms and conditions listed in this document outlining the protocol for my preliminary test in the James Randi Educational Foundation’s One Million Dollar Challenge. I agree that the protocol outline describes a fair test of my claimed ability.
Pavel will presumably have to sign the same clause, so he should be very careful in stating exactly what he thinks he can do. Rodney's idea that the test criteria should be lowered successively until Pavel succeeds by luck is totally unfair!
 
A fair test is one thing, but demonstrating a phenomenon greater than random chance is another.

Pavel claims 70% success.
From the http://www.automeasure.com/chance.html tables I quoted above, if he were to pick 1 card from 3 correctly 70 percent of the time over 10 picks (trials), this is still within the bounds of random chance at 1:100 odds.

So the test is fair, and he achieves what he says he can, but he has NOT demonstrated that such a skill is anything remarkable.

I think this is what is bogging down here. I've followed Pavel's attempts at working a protocol. It is obvious he doesn't have a good grip on probability. That is not a criticism of him at all. In fact it is quite common. He has then tried in good faith to adapt to demands of a more "impressive" demonstration which, I agree, he should not have gone down because it now has emerged that he is trying to demonstrate something that he doesn't claim to be able to do.

But that's the point of the Challenge.

To demonstrate that your talent produces results significantly greater than random chance.

The fact is that his skill cannot do this, and he has acknowledged that in a number of semi-blind trials he has attempted during discussions in other threads with him.

There is little point in putting together a protocol that merely demonstrates an ability to guess at the level of random chance.
It is to demonstrate a paranormal ability, not a normal one.
See the bolded part. This is just plain false. If Pavel can get it correct 70% of the time, then a test with sufficient trials can both give him a fair chance of succeeding (assuming he has the ability) and a sufficiently low probability of succeeding without having the ability.

For example, a set of 100 trials with success defined by 66 or more correct would give us:

0.83714 chance of succeeding if he can identify the picture 70% of the time.
0.00089 chance of succeeding if he can only identify the picture 50% of the time.
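
Those two figures can be reproduced with a short Python sketch, shown here as an illustration rather than as anything used in the negotiations. The helper name `pass_probability` is hypothetical, and the 0.5 guessing rate is an assumption taken from the example above.

```python
from math import comb

def pass_probability(n_trials, pass_mark, hit_rate):
    """Chance of scoring pass_mark or more hits in n_trials at the given hit rate."""
    return sum(
        comb(n_trials, k) * hit_rate**k * (1 - hit_rate)**(n_trials - k)
        for k in range(pass_mark, n_trials + 1)
    )

N_TRIALS, PASS_MARK = 100, 66   # the example protocol above
CLAIMED_RATE = 0.7              # Pavel's stated accuracy
CHANCE_RATE = 0.5               # assumption: guessing rate used in the example

power = pass_probability(N_TRIALS, PASS_MARK, CLAIMED_RATE)
false_pass = pass_probability(N_TRIALS, PASS_MARK, CHANCE_RATE)

print(f"passes with the claimed ability: {power:.5f}")
print(f"passes by guessing alone:        {false_pass:.5f}")
```

Looking at both numbers together is the point: the first is the claimant's chance of a fair result, the second is the JREF's protection against luck.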

IXP
 
I hate to say it, but I am beginning to see Rodney's point! Previously he has talked about tests (e.g. Ganzfeld or RNG) that would take an unreasonable amount of time, but now we have an example where the time is not unreasonable and the claimant is not being offered a fair test. Hopefully, the negotiations will continue until this is remedied.
IXP
I take back this part of my post. I looked back through the negotiation thread, and it is Pavel who suggested the test that required more than his claimed ability! I thought it was JREF.

Pavel: You should consider your chances of passing at your 70% accuracy figure when deciding on a protocol, not just the JREF's 1000:1 criterion. You can calculate these on the binomial calculator already linked.

IXP
 
This would give exactly the functionality you were looking for (retest given if performance is close) without changing the general format of the test. In the current format it would be something like 29 out of 40 means success, 28 out of 40 means he comes back another day to try for 1 out of 3, 27 out of 40 means on the extra day he'll need to get 2 out of 3, and 26 out of 40 means he still gets to come back for the extra test but will have to perform 3 out of 3 that day. I'll admit that I didn't spend the time to calculate out the odds, but I think that's pretty close to numbers that would work.

The probability of passing the test that you have just described, by random chance, is about 1 in 63.
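
For readers who want to see where a figure like that comes from, here is a rough Python sketch of the two-day scheme as quoted: 29 or more of 40 passes outright, while 28, 27 or 26 earn a three-trial retest needing 1, 2 or 3 hits. The function names are made up for this illustration, and the 0.5 guessing rate is an assumption about what "random chance" means for this test.

```python
from math import comb

def pmf(n, k, p):
    """Probability of exactly k hits in n trials at hit rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def tail(n, k, p):
    """Probability of k or more hits in n trials at hit rate p."""
    return sum(pmf(n, j, p) for j in range(k, n + 1))

def staged_pass_probability(p):
    """Chance of passing the 40-trial test plus the 3-trial retest described above."""
    total = tail(40, 29, p)               # 29+ on day one: immediate pass
    # Day-one score -> hits needed out of 3 on the extra day.
    retest_needs = {28: 1, 27: 2, 26: 3}
    for day_one_hits, needed in retest_needs.items():
        total += pmf(40, day_one_hits, p) * tail(3, needed, p)
    return total

CHANCE_RATE = 0.5  # assumption: pure guessing succeeds half the time
p_chance = staged_pass_probability(CHANCE_RATE)
print(f"by chance: {p_chance:.5f}  (about 1 in {1 / p_chance:.1f})")
```

With a 0.5 guessing rate this comes out at roughly 1 in 63. The scheme amounts to asking for 29 hits in 43 trials, which is why the retest loosens the odds so much compared with 29 of 40 on its own.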
 
The probability of passing the test that you have just described, by random chance, is about 1 in 63.

A lot closer to 1 in 64 now that I take a moment, but that is still quite a bit further off than I'd hoped. It seems the test has to be extended rather far to meet both criteria: it has to go all the way to 46 out of 66 to achieve 1:1000 odds at 70% performance. With only 40 trials fitting in a day, there is no way to guarantee success by the end of the first day, I'm afraid.

Still doable in two days though, so I can at least validate my point that if an extra day is deemed reasonable it is possible to satisfy the requirements of Pavel's success claim and the JREF demand of performance-over-chance.
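
The trade-off between test length, pass mark and the claimant's chances can be explored with a small Python sketch along these lines. It is only an illustration: `lowest_pass_mark` is a hypothetical helper, and the 0.5 guessing rate and the trial counts tried are assumptions, not agreed protocol values.

```python
from math import comb

def tail(n, k, p):
    """Chance of k or more hits in n trials at hit rate p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def lowest_pass_mark(n, chance_rate, max_false_pass):
    """Smallest pass mark whose by-chance pass probability is within the limit."""
    for k in range(n + 1):
        if tail(n, k, chance_rate) <= max_false_pass:
            return k
    return None  # no pass mark meets the limit for this number of trials

CHANCE_RATE = 0.5        # assumption: guessing rate
CLAIMED_RATE = 0.7       # Pavel's stated accuracy
MAX_FALSE_PASS = 1 / 1000

for n in (40, 66, 100):
    k = lowest_pass_mark(n, CHANCE_RATE, MAX_FALSE_PASS)
    print(f"{n} trials: pass mark {k}, "
          f"by chance {tail(n, k, CHANCE_RATE):.5f}, "
          f"at 70% {tail(n, k, CLAIMED_RATE):.3f}")
```

For each trial count the loop prints the strictest-necessary pass mark and the chance that someone who really hits 70% of the time would reach it, so longer tests can be compared directly.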
 
I see you have trouble distinguishing between the words "probably" and "in my fantasies".

Randi's Personal FAQ states that "... a couple hundred have completed and failed the preliminaries." Assuming an average chance of a false positive in the preliminaries on the order of 1 in a thousand, someone having passed a preliminary test by now would be, statistically, utterly unremarkable.

I have an idea of how to get the million. We get 10,000 people to take the test using the exact same procedure. One of the ten left standing might have a shot.
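
The arithmetic behind both remarks can be sketched in a few lines of Python. This is only an illustration: it assumes every applicant faces the same 1 in 1,000 false-positive rate and that the tests are independent, and the function name `chance_of_any_pass` is made up here.

```python
def chance_of_any_pass(n_applicants, false_pass_rate):
    """Chance that at least one of n independent applicants passes by luck."""
    return 1 - (1 - false_pass_rate) ** n_applicants

FALSE_PASS_RATE = 1 / 1000   # the assumed preliminary-test odds from the post above

for n_applicants in (200, 1000, 10_000):   # 200 stands in for "a couple hundred"
    expected = n_applicants * FALSE_PASS_RATE
    p_any = chance_of_any_pass(n_applicants, FALSE_PASS_RATE)
    print(f"{n_applicants:>6} applicants: expect {expected:.1f} lucky passes, "
          f"chance of at least one = {p_any:.3f}")
```

Under these assumptions a couple hundred applicants give a bit under a one in five chance of at least one lucky pass, and 10,000 applicants would be expected to produce about ten.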
 
Why? Would you expect the JREF to cover the expenses that the applicant incurs, for example? If not, why not -- that's certainly something the JREF could do "to help an applicant demonstrate any paranormal skills." But that's not in the JREF's interest, nor will they do that.

If you consider it reasonable for it to conserve its financial resources, then you understand why it also wants to conserve its (much more valuable) time resources....

If JREF covered expenses then my claim can only be demonstrated with a $10,000 bankroll at the Dunes. I prefer the luxury suite for the night.
 
Much of the discussion in this thread could apply to many challenges, but some is specific to Pavel's protocol. In that regard it may be worth remembering that Pavel submitted a protocol that:

1) Met the 1 in 1,000 standard.
2) Checked through many cards, reducing the probability of a false negative.
3) Could be conducted in one session in one afternoon.

JREF rejected it.
 
Much of the discussion in this thread could apply to many challenges, but some is specific to Pavel's protocol. In that regard it may be worth remembering that Pavel submitted a protocol that:

1) Met the 1 in 1,000 standard.
2) Checked through many cards, reducing the probability of a false negative.
3) Could be conducted in one session in one afternoon.

JREF rejected it.
Which protocol was that, and what was the reason for the rejection?
 
A lot closer to 1 in 64 now that I take a moment...

Well, not that it would really matter, but are you sure? I recalculated it again and got the same result again, 1 in 63.41255...etc. I mean, maybe it's me who's making a mistake somewhere, and in that case I'd rather find out.

tsig said:
I have an idea of how to get the million. We get 10,000 people to take the test using the exact same procedure. One of the ten left standing might have a shot.

This form of cheating (just trying your luck) has always been available. When JREF is free to adjust their odds requirements according to the number of applicants, the risk is reasonably mitigated. With Rodney's propositions such as "you should set a fixed odds standard" and "you could as well make it 1 in 537, the difference is minor", your scenario could be theoretically carried out, and you would have a 1 in 29 chance of getting the million.

Of course, now that there are fewer than a thousand days left in the challenge, the point is mostly moot.
 
See the bolded part. This is just plain false. If Pavel can get it correct 70% of the time, then a test with sufficient trials can both give him a fair chance of succeeding (assuming he has the ability) and a sufficiently low probability of succeeding without having the ability.

For example, a set of 100 trials with success defined by 66 or more correct would give us:

0.83714 chance of succeeding if he can identify the picture 70% of the time.
0.00089 chance of succeeding if he can only identify the picture 50% of the time.

IXP
I agree - my statement was referring only to the case that I quoted - 10 trials, choosing 1 from 3 test objects at a time. Indeed it would only require 30 trials at odds of 1:1000 for him to win with a 70% success rate.

The problem is, that he says he can't consistently reach a 70% success rate. So even his 70% success rate isn't.
 
Rodney, I'm asking _you_ to explain it, and ideally in layman's terms, not citing arcane mathematics, nor using an alternate automated tool.

Put another way, Pavel claims he can consistently (i.e., 100%) perform at 28 out of 40. Why should the chances of him having a good day and picking 30 or more out of 40 suddenly drop to 30%? He only has to get two more right, and given that the baseline is 100% at 28, ... see where I'm coming from?
What you're missing is that, if Pavel averages 70% hits, he has only a 58% (not a 100%) chance of getting 28 or more hits in 40 trials. So, even if 28 were the passing grade, Pavel still might well fail, but at least he would have a better than even chance of passing. However, with a 70% average hit rate, the probability of getting 29 or more hits is 44% and the probability of getting 30 or more hits is 31%, which makes his chances of passing less than even.

Also, you appear to have neglected all the salient points to the topic of interest, and are instead focusing a great deal of effort on something which, even if you _are_ correct, is dismissed with the phrase "Pavel should not accept protocols which require him to perform beyond his claimed capabilities", and even the JREF will tell Pavel that -- they are not interested in testing someone and having that person have a prepared position ready to fall back to when they fail. They wish it to be as clear as possible that "You claimed X, you actually can only do Y, you fail, no excuses."

Do you have any thoughts on the rest of the points raised?
The fundamental point that most here are missing is that the preliminary test should not be so rigid as to eliminate applicants who actually do have a paranormal ability.
 
