
What's the required p-value to beat?

I was just pointing out that even if we concede your point that we shouldn't expect these abilities to be any more consistent than those of talented batsmen, chess players, etc., we would still expect that they would (as with such abilities) produce results that are significantly better than random chance. And they don't.

Thanks for clarifying.
 
Why? Any human ability should fall within normal parameters compared to other human abilities. Anyone can play piano after a few lessons, but only some people will reach virtuoso level after many years of study and practice.
Claimants are not expected to reach virtuoso level. They are just expected to do anything paranormal at all.

An exceptional baseball pitcher may be defined as one who pitches a no-hitter game. There have only been 236 no-hitters in the past 111 years, so the success rate does not exactly jump significantly.

And no, despite your bizarre claim about my presumed motive, choosing pitchers or any other skilled human would not weaken my argument. Education and practice are the keys to acquiring skill in any field. If psychic skills exist, why should they be any different? Because you say so?
Perhaps there is confusion about the nature of the challenge. A claimant need not beat anyone, unlike a baseball hitter, pitcher, chess player, or any of those examples.
He or she is simply expected to demonstrate something. It would be as if the baseball pitcher were only required to throw the ball a certain distance. He or she could set any distance for the throw, and even demand to succeed only one time out of ten, and so on. And of course, the pitcher could demand that there should or should not be a hitter.

Under such test conditions one would expect that people should pass a million times in a row.
 
...
If psychic skills exist, why should they be any different? Because you say so?

1. There is no evidence psychic skills exist, hence speculating about whether they would differ from other skills is at best moot and at worst a colossal waste of time. As soon as there is evidence, we could, but would not necessarily need to, speculate. But not earlier.

2. Please do not put words in my mouth. I did not say anything "so", I merely criticized your analogy and I stand by that criticism.

Let's agree to disagree and move on.
 
1. There is no evidence psychic skills exist, hence speculating about whether they would differ from other skills is at best moot and at worst a colossal waste of time. As soon as there is evidence, we could, but would not necessarily need to, speculate. But not earlier.

There was no evidence the world was not flat until there was. Despite your dogmatic proclamation ("because I said so, so there!"), envisioning parameters is a tried and true method toward proving or disproving unexplained phenomena.
 
There was no evidence the world was not flat until there was. Despite your dogmatic proclamation ("because I said so, so there!"), envisioning parameters is a tried and true method toward proving or disproving unexplained phenomena.

Again, please do not put words in my mouth.
 
Expecting an exact 100% or near-100% replication is ridiculous and extremely conservative. Telling a psychic to pass 100 tests in a row is like telling famous basketball player Brian to never miss a basket.

No, it's like telling famous basketball player Brian, who claims he can make a basket in 1 out of 10 free throws, to make 1 out of 10, 100 times in a row.

The thing that's being repeatedly tested is not the actual act (making a free throw) but the claim (making at least 1 free throw out of 10, in this case).

It's up to Brian to figure out that he usually gets 8 baskets out of 10 free throws but he's never tried 9 shots in a row without getting at least one basket, and make his claim accordingly.
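To put numbers on the free-throw analogy, here's a minimal Python sketch. It assumes each shot is independent with a fixed per-shot success rate q (an illustrative assumption, not something from the thread); q = 0.8 stands in for Brian's "usually 8 out of 10", and q = 0.1 for a shooter who really only makes 1 in 10.

```python
def pass_probability(q: float, shots_per_round: int = 10, rounds: int = 100) -> float:
    """Chance of at least one basket in every round, assuming independent shots
    with a fixed per-shot success rate q (an illustrative assumption)."""
    # Missing every shot in one round happens with probability (1 - q) ** shots_per_round.
    p_round = 1 - (1 - q) ** shots_per_round
    # The claim has to hold in every one of the rounds.
    return p_round ** rounds

# Brian "usually gets 8 baskets out of 10", so take q = 0.8 (hypothetical).
print(f"q = 0.8: {pass_probability(0.8):.5f}")
# Someone who truly makes only 1 shot in 10 would almost never pass 100 rounds.
print(f"q = 0.1: {pass_probability(0.1):.1e}")
```

A conservatively stated claim is nearly certain to survive 100 repetitions, while a claim pitched right at the true rate almost certainly fails.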
 
A million tests will succeed in a row? That is so unrealistic in practical terms, even via conventional research. So, if a study found a p-value of 0.001, what is the probability of getting five 0.001 p-values in a row? Simple! 1 × 10^-15

Not even conventional research has reached those kinds of odds.
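The "five in a row" arithmetic only works under a specific assumption: the null hypothesis is true and the runs are independent. Under those conditions a p-value is uniform on [0, 1], so each run lands at or below α with probability α. A minimal Python sketch:

```python
def prob_all_significant(alpha: float, runs: int) -> float:
    """Chance that every one of `runs` independent tests gives p <= alpha,
    assuming the null hypothesis is true (p-values are then uniform on [0, 1])."""
    return alpha ** runs

print(prob_all_significant(0.001, 5))  # on the order of 1e-15
```

If the effect is real, each run succeeds with probability equal to the test's power rather than α. The answer then depends on an effect size nobody knows in advance.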

I do not believe that psychic powers exist. But I believe that such powers should be tested the same way one tests other skills.

As a result, I agree with the above. There is also a related point: if the JREF truly wants to test whether psychic powers exist, then it should also consider beta, the probability that a false null hypothesis is not rejected.

I think there's a good argument for keeping the beta reasonably low. If it's an issue of protecting the money, I'd cut down the amount of money involved to, say, $100,000 or even $50,000. I understand that there would be PR issues with doing this, but that's OT.
 
I do not believe that psychic powers exist. But I believe that such powers should be tested the same way one tests other skills.

As a result, I agree with the above.
You do? But it's nonsense!
The correct answer to that question is: We do not know.

There is also a related point: if the JREF truly wants to test whether psychic powers exist, then it should also consider beta, the probability that a false null hypothesis is not rejected.
What the JREF truly wants to do, AFAIU, is hammer home the point that all those professional psychics who charge a lot of money for their tricks can't deliver the goods.

I think there's a good argument for keeping the beta reasonably low.
How do you propose to calculate beta in practice?

If it's an issue of protecting the money, I'd cut down the amount of money involved to, say, $100,000 or even $50,000. I understand that there would be PR issues with doing this, but that's OT.
Ridiculous.
 
I do not believe that psychic powers exist. But I believe that such powers should be tested the same way one tests other skills.

As a result, I agree with the above. There is also a related point: if the JREF truly wants to test whether psychic powers exist, then it should also consider beta, the probability that a false null hypothesis is not rejected.

I think there's a good argument for keeping the beta reasonably low. If it's an issue of protecting the money, I'd cut down the amount of money involved to, say, $100,000 or even $50,000. I understand that there would be PR issues with doing this, but that's OT.

I argue that the best way to keep beta low is plenty of repetition, effectively increasing the sample size. It does not require letting more type I errors through, and thus no need to reduce the reward.

Intuitively: if someone claims they can perform a task 51 times out of 100 and you run the test 200 times, they might fail to outperform a coin toss purely by chance. However, if you run the same test 2000 times, the effect should start to emerge clearly.
 
I argue that the best way to keep beta low is plenty of repetition, effectively increasing the sample size. It does not require letting more type I errors through, and thus no need to reduce the reward.

Intuitively: if someone claims they can perform a task 51 times out of 100 and you run the test 200 times, they might fail to outperform a coin toss purely by chance. However, if you run the same test 2000 times, the effect should start to emerge clearly.

The intuition is right, but...

If JREF wanted to allow 200 coin tosses with only a 1 in a thousand chance of success by luck, it would require more than 122 heads out of the 200 tosses. (Someone should check my math; I've spent the day making turkey broth.) The chance of a win for someone with a 51% chance of getting a head is only 1.8 out of a thousand.

Raise the number of tosses to 2000 and JREF would require more than 1069 heads. And the fictional 51% psiguy would have a 1.34 percent chance of winning. (If I did it right, you'd have to run nearly 25,000 trials before psiguy would have a better-than-even chance of winning.)
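For anyone who wants to check the math, here's a minimal Python sketch using the exact binomial distribution rather than a normal approximation. It assumes a one-tailed test against a fair coin at α = 0.001, and it treats the hypothetical 51% claimant as independent tosses with p = 0.51. The pmf is computed in log space with `math.lgamma` so that 2000 tosses don't underflow.

```python
import math

def upper_tail(n: int, p: float) -> list[float]:
    """Return tail[k] = P(X >= k) for X ~ Binomial(n, p), for k = 0..n+1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    # Binomial pmf computed via log-gamma to avoid huge or tiny intermediates.
    pmf = [
        math.exp(log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * log_p + (n - k) * log_q)
        for k in range(n + 1)
    ]
    # Accumulate from the top so the small upper-tail values stay accurate.
    tail = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        tail[k] = tail[k + 1] + pmf[k]
    return tail

def critical_heads(n: int, alpha: float) -> int:
    """Smallest head count k with P(X >= k) <= alpha for a fair coin."""
    tail = upper_tail(n, 0.5)
    return next(k for k in range(n + 1) if tail[k] <= alpha)

def power(n: int, alpha: float, p_true: float) -> float:
    """Chance that a claimant with true hit rate p_true reaches the critical count."""
    return upper_tail(n, p_true)[critical_heads(n, alpha)]

for n in (200, 2000):
    k = critical_heads(n, 0.001)
    print(f"n={n}: need at least {k} heads; power at p=0.51 is {power(n, 0.001, 0.51):.4f}")
```

Printing the critical count as "at least k" makes it easy to see whether a hand calculation meant "k heads" or "more than k heads", which is a common off-by-one in these threshold discussions.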

My limited experience with JREF is that they don't have much patience for long trials.
 
My limited experience with JREF is that they don't have much patience for long trials.

That may well be because nobody who claims an ability with such a low success-rate can ever explain how they came to know that they had that ability.

How would you notice that your coinflips are slightly biased? Who flips a coin 2000 times, keeps track and - assuming a result slightly out of the ordinary - proceeds to repeat the experiment with different coins etc?

Also, I have yet to see someone who applies and then not only understands the difficulties of such a test, but makes a reasonable effort to accommodate them. IIRC the rules make it clear that the JREF will not pay for the testing procedures; all an applicant would have to do to get a long and complicated test done would be to pay the test team.
 
That may well be because nobody who claims an ability with such a low success-rate can ever explain how they came to know that they had that ability.

How would you notice that your coinflips are slightly biased? Who flips a coin 2000 times, keeps track and - assuming a result slightly out of the ordinary - proceeds to repeat the experiment with different coins etc?

Also, I have yet to see someone who applies and then not only understands the difficulties of such a test, but makes a reasonable effort to accommodate them. IIRC the rules make it clear that the JREF will not pay for the testing procedures; all an applicant would have to do to get a long and complicated test done would be to pay the test team.

I helped, or tried to help, on the Pavel negotiations. His claim was much higher than 51 percent but lower than 100 percent. Pavel proposed tests that would have been completed within a day at his expense. While some of his protocol design was perhaps overcomplicated, JREF was unaccommodating.

JREF is pretty clear that they're not interested in conducting neutral scientific experiments. JREF is interested in a challenge to debunk false claims of the paranormal, preferably in a way that educates the public.
 
You do? But it's nonsense!
The correct answer to that question is: We do not know.

I do what? I said I don't believe in the existence of psychic powers. However, if JREF wishes to test whether such powers exist, there's no reason that a discussion of the parameters of the hypothesis tests should be deemed ridiculous.

That part of my post didn't contain a question, so it's unclear what question you're referring to.

How do you propose to calculate beta in practice?

For particular values of the population mean and standard deviation, there are alpha-beta curves that specify the relationship between the two parameters. Of course, similar curves exist for given proportions, which would be more applicable to most MDC tests. For more details, see this text:

http://www.econ.nyu.edu/user/ramseyj/textbook/chapter11.pdf
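The alpha-beta trade-off for a proportion can be sketched in a few lines of Python. This uses the normal approximation to a one-sided test of H0: p = p0 against one specific alternative p = p1. The numbers (100 trials, chance rate 0.5, true rate 0.6) are hypothetical and chosen only to show how tightening α raises β at a fixed sample size.

```python
import math
from statistics import NormalDist

def beta_for_proportion(n: int, alpha: float, p0: float, p1: float) -> float:
    """Approximate type II error (beta) of a one-sided test of H0: p = p0
    against a specific alternative p = p1 > p0, using the normal approximation."""
    std_normal = NormalDist()
    # Sample proportion needed to reject H0 at level alpha.
    critical = p0 + std_normal.inv_cdf(1 - alpha) * math.sqrt(p0 * (1 - p0) / n)
    # Beta: chance the sample proportion stays below that cutoff when p = p1.
    return std_normal.cdf((critical - p1) / math.sqrt(p1 * (1 - p1) / n))

# Hypothetical test: 100 trials, chance rate 0.5, claimant's true rate 0.6.
for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(beta_for_proportion(100, alpha, 0.5, 0.6), 3))
```

Raising n for the same α and p1 pushes β down. That is the "more repetitions" route to a low beta, as opposed to a looser alpha.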

As for "in practice," it's unclear why practice in this context differs from the standard examples in statistics texts.

Ridiculous.

Please clarify why reducing the prize to allow for an increased beta is ridiculous. Why is one million the magic number?
 
I do what? I said I don't believe in the existence of psychic powers. However, if JREF wishes to test whether such powers exist, there's no reason that a discussion of the parameters of the hypothesis tests should be deemed ridiculous.

That part of my post didn't contain a question, so it's unclear what question you're referring to.
When you click on that little arrow at the top of a quote, right next to the poster's name, it will take you back to the original post. Do that when you can't recall the context of a quote.

As for "in practice," it's unclear why practice in this context differs from the standard examples in statistics texts.
How do you obtain your effect size estimate?

Please clarify why reducing the prize to allow for an increased beta is ridiculous. Why is one million the magic number?
Ultimately, the JREF doesn't set a beta value.
If I understand you correctly, then your solution for decreasing beta is increasing alpha. When you can't get a real psychic to step forward by offering 1 million, then the solution is certainly not handing out smaller sums to non-psychics.
 
How would you notice that your coinflips are slightly biased? Who flips a coin 2000 times, keeps track and - assuming a result slightly out of the ordinary - proceeds to repeat the experiment with different coins etc?
Good point.
Those questions are rhetorical but here are answers nevertheless.

The answer to the last question is: These guys:
We analyze the natural process of flipping a coin which is caught in the hand. We prove that vigorously-flipped coins are biased to come up the same way they started. The amount of bias depends on a single parameter, the angle between the normal to the coin and the angular momentum vector. Measurements of this parameter based on high-speed photography are reported. For natural flips, the chance of coming up as started is about .51.
http://comptop.stanford.edu/u/preprints/heads.pdf

Persi Diaconis started out as a magician, btw.

The answer to the first question... Well, anyone ever notice how coin flips are really biased?
 
I argue that the best way to keep beta low is plenty of repetition, effectively increasing the sample size. It does not require letting more type I errors through, and thus no need to reduce the reward.

Intuitively: if someone claims they can perform a task 51 times out of 100 and you run the test 200 times, they might fail to outperform a coin toss purely by chance. However, if you run the same test 2000 times, the effect should start to emerge clearly.

I agree with this. The question is really whose responsibility it is to include it in the protocol. The JREF is interested in alpha, and the claimant should be the one interested in beta. Few, if any, are, though, and that may be because they believe their abilities are infallible.

On the other hand, the JREF does not accept protocols that would take an excessive amount of time. The 2,000 repetitions example might be disallowed as excessive. (Didn't Pavel run afoul of this?) I'd like that particular MDC condition reconsidered.
 
On the other hand, the JREF does not accept protocols that would take an excessive amount of time. The 2,000 repetitions example might be disallowed as excessive. (Didn't Pavel run afoul of this?) I'd like that particular MDC condition reconsidered.

I think the JREF would consider protocols that take a long time, as long as the applicant is willing to compensate (monetarily) everyone involved for their time. We've seen groups of skeptics give up an afternoon or a whole day to administer a test. And remember, that's just the day of a test. It does not take into account all the time and effort of negotiating a protocol, acquiring and setting up equipment, etc. That amount of time is something those groups seem willing to give freely, in the same way that Civil War re-enactors give their time and effort for something they enjoy.

However, when it comes to a protocol that must be administered over days, weeks, or months, it's a different story, just as a Civil War re-enactor would expect to be compensated for participating in battle scenes for a movie that took weeks to shoot.

Eventually, a test becomes so complex and time-consuming that the prize money will be eaten up by the costs of the test, and that's assuming that the applicant can actually pass the test and win the prize.

Ward
 
Earwig 0 again.

Back in the dim and distant days when I was a Psychology undergrad, I had a running disputation with my statistics lecturer on the subject of significance – not levels as such, but what it meant. It got to the point where on the Finals paper on Statistics and Experimental Design there was a question which meant nothing to my fellow students but was an invitation to me to expound!

Here goes.

If I carry out the same experiment 20 times – whatever the experiment – on one occasion out of the 20 I am likely to get significance at the 5% level. I say “I am likely … ” because we are getting into probability cadence: what is the probability of getting a significant result at, say, the 5% level if the effect is non-existent? It doesn't seem to me to be 5%, but ...

The significance level is actually irrelevant – it just changes how long it will take for it to happen.

Shift now to the disputation.

If I run 20 different experiments, I can expect one of those to produce a significant result at the 5% level – even if none of the 20 effects which I am investigating is “real”.
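The arithmetic behind "one in 20" can be sketched in Python. It assumes the 20 experiments are independent and every null hypothesis is true. The expected number of false positives is exactly 1, but the chance of getting at least one is about 64%, not 5%.

```python
def prob_at_least_one_false_positive(alpha: float, experiments: int) -> float:
    """Chance that at least one of several independent tests is 'significant'
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** experiments

def expected_false_positives(alpha: float, experiments: int) -> float:
    """Average number of 'significant' results under the same assumptions."""
    return alpha * experiments

print(round(prob_at_least_one_false_positive(0.05, 20), 3))  # about 0.64
print(expected_false_positives(0.05, 20))                     # about 1
```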

OK guys, over to you!
 
When you click on that little arrow at the top of a quote, right next to the poster's name, it will take you back to the original post. Do that when you can't recall the context of a quote.

I did look at the original post. It was unclear what you were referring to. Defining your pronouns ("But it's nonsense") would have helped.

How do you obtain your effect size estimate?

The effect size is calculated the same way it would be for alpha, as the difference between the sample proportion (correct answers/total answers in a given test) and the proportion of correct answers JREF believes a psychic should be able to achieve.


If I understand you correctly, then your solution for decreasing beta is increasing alpha. When you can't get a real psychic to step forward by offering 1 million, then the solution is certainly not handing out smaller sums to non-psychics.

If I understand you correctly, you're saying that anyone not able to achieve 100% correct answers all the time is a non-psychic. If so, you're defining away the issue we're debating here.
 
Earwig 0 again.

Back in the dim and distant days when I was a Psychology undergrad, I had a running disputation with my statistics lecturer on the subject of significance – not levels as such, but what it meant. It got to the point where on the Finals paper on Statistics and Experimental Design there was a question which meant nothing to my fellow students but was an invitation to me to expound!

Here goes.

If I carry out the same experiment 20 times – whatever the experiment – on one occasion out of the 20 I am likely to get significance at the 5% level. I say “I am likely … ” because we are getting into probability cadence: what is the probability of getting a significant result at, say, the 5% level if the effect is non-existent? It doesn't seem to me to be 5%, but

You are correct that p-values work only for supernatural abilities that differ only in quantity from normal abilities. But many abilities can be stated in this way. For example, all of us can defy gravity for half a second by jumping. However, the standard deviation of the time we spend in the air is pretty low, so spending even 15 seconds in the air would indicate some kind of ability.
 
The significance level is actually irrelevant – it just changes how long it will take for it to happen.

It seems to me that you're talking about the sample size, not the significance level.

As far as "waiting for it to happen" goes, what you're "waiting" for may never happen. The downside of drawing inferences from a smaller sample is that you're less certain those inferences actually apply to the population. (I'm assuming that the standard deviation and effect size are held constant.)
 
I appreciate the answers I received to the earlier question, which proved useful when I taught that particular class.

The recent discussion about what the significance levels OUGHT to be (rather than are) should be split into its own thread, IMO. But that's up to the mods.
 
I did look at the original post. It was unclear what you were referring to. Defining your pronouns ("But it's nonsense") would have helped.
You were agreeing with this:
A million tests will succeed in a row? That is so unrealistic in practical terms, even via conventional research. So, if a study found a p-value of 0.001, what is the probability of getting five 0.001 p-values in a row? Simple! 1 × 10^-15
The problematic question/answer bolded by me.

The effect size is calculated the same way it would be for alpha, as the difference between the sample proportion (correct answers/total answers in a given test) and the proportion of correct answers JREF believes a psychic should be able to achieve.
That's not how the challenge works and I don't think it would be appropriate to change it so.

If I understand you correctly, you're saying that anyone not able to achieve 100% correct answers all the time is a non-psychic. If so, you're defining away the issue we're debating here.
You do not understand me correctly. I don't even know how you get the idea.
 
You were agreeing with this:


OK, now I see what you mean. I agreed that the significance level of 0.001 is far lower than that used in conventional research--a claim that does not require the (incorrect) calculation to justify it. The conventional significance level in most applications is 0.05.


You do not understand me correctly. I don't even know how you get the idea.

My copy function isn't working at the moment, but you implied that someone isn't a psychic if they can't achieve a p-value below whatever threshold (< 0.001) happens to be set. Is that correct?
 
That's not how the challenge works and I don't think it would be appropriate to change it so.

That's exactly how tests of conventional treatments (throughout the social and medical sciences) are executed. Why should, say, homeopathic treatments be tested completely differently?
 
That's exactly how tests of conventional treatments (throughout the social and medical sciences) are executed. Why should, say, homeopathic treatments be tested completely differently?

Because A) this isn't a clinical trial; it's a million-dollar challenge, and B) you're allowed to include an uncertainty factor here, which absolutely wouldn't be allowed in a clinical trial. Saying, "this magical remedy works in 80% of all cases" would give you an inherent p-value of .05, and is a claim that can be tested. The point of the MDC is to get near-certainty on the claim.
 
you're allowed to include an uncertainty factor here, which absolutely wouldn't be allowed in a clinical trial. Saying, "this magical remedy works in 80% of all cases" would give you an inherent p-value of .05, and is a claim that can be tested. The point of the MDC is to get near-certainty on the claim.

I don't understand.
(1) Are you saying that medical tests aren't conducted on treatments that only work sometimes?

(2) What does "inherent p-value" mean?
 
I don't understand.
(1) Are you saying that medical tests aren't conducted on treatments that only work sometimes?
Huh? No. Where on Earth did you get that from?

(2) What does "inherent p-value" mean?

It's part of the claim. MDC tests claims. If you say, "this succeeds 80% of the time", then that's what will be tested (assuming a protocol can be agreed on, and other requirements met).
 
I guess I misunderstood what you meant when you said an uncertainty factor wouldn't be allowed in a clinical trial. Sorry.

If I understand the second part right you're saying the MDC would require you to beat your own claim (say 80%) by more than chance--not just do better than 50/50. That makes sense, although a claimant who believed they could do 80 percent would still be wise to ask for a protocol that only beat 50 percent. Admittedly, "claimant" and "would still be wise" doesn't seem to fit together too well.
 
I guess I misunderstood what you meant when you said an uncertainty factor wouldn't be allowed in a clinical trial. Sorry.
Oh, yeah, I guess I didn't phrase that as well as I could, but you seem to have the basic idea now.

That makes sense, although a claimant who believed they could do 80 percent would still be wise to ask for a protocol that only beat 50 percent.

Not necessarily. Attempting to prove you can beat 50 percent with the degree of certainty needed for the MDC is going to be a lot harder, because it would require so much more testing to reach the necessary p-values for that claim. If you really think you can reach 80% consistently, it's better to ask for that, or just barely less; otherwise the testing may prove too expensive.
 
If I did this right, and if there isn't too much rounding error, then JREF's p=.001 can be met by getting more than 20 of 25 heads, and the probability of doing that for someone with an 80% ability is 42%.

But if the person claimed a 70% ability, JREF would require beating 42/60. And given a true ability of 0.8, the chance of doing that or better is 95%. So if you can go for a few more trials, then it pays to claim a performance level lower than the truth.
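The 25-trial and 60-trial figures can be checked exactly with Python's `fractions.Fraction` and `math.comb`, since the trial counts are small. This sketch assumes a one-tailed test against a fair coin at α = 0.001. It models the claimant's "80% ability" as independent trials with p = 0.8, which is an assumption about what the ability means.

```python
from fractions import Fraction
from math import comb

def tail(n: int, k: int, p: Fraction) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def threshold(n: int, alpha: Fraction) -> int:
    """Smallest k with P(X >= k) <= alpha when the true rate is 1/2 (chance)."""
    half = Fraction(1, 2)
    return next(k for k in range(n + 1) if tail(n, k, half) <= alpha)

alpha = Fraction(1, 1000)
for n in (25, 60):
    k = threshold(n, alpha)
    # Power: chance a true 80% performer reaches the threshold.
    print(n, k, round(float(tail(n, k, Fraction(4, 5))), 4))
```

At 25 trials the cutoff comes out as 21 heads ("more than 20"), with roughly a 42% pass chance for a true 80% performer. At 60 trials the pass chance rises to the mid-90s, in line with the post.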
 
If I did this right, and if there isn't too much rounding error, then JREF's p=.001 can be met by getting more than 20 of 25 heads, and the probability of doing that for someone with an 80% ability is 42%.

I don't believe one-tailed p-values should be used in binomial experiments such as coin tossing. Psi in binomial experiments can show up either as hits above chance or as misses above chance. Thus the proper p-value for a coin is two-tailed, and it would take the claimant about 21 heads to pass the preliminary test. You are correct about the 80% probability, though.
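The one-tailed versus two-tailed point can be checked for the 25-toss case with a short Python sketch. It assumes a fair coin under the null and computes the two-tailed p-value by doubling the smaller tail. That is a common convention and is valid here because the fair-coin distribution is symmetric.

```python
from math import comb

def upper_p(heads: int, n: int) -> float:
    """One-tailed p-value P(X >= heads) for a fair coin, exact."""
    return sum(comb(n, i) for i in range(heads, n + 1)) / 2**n

def lower_p(heads: int, n: int) -> float:
    """One-tailed p-value P(X <= heads) for a fair coin, exact."""
    return sum(comb(n, i) for i in range(0, heads + 1)) / 2**n

def two_tailed_p(heads: int, n: int) -> float:
    """Double the smaller tail; valid because the fair-coin distribution is symmetric."""
    return min(1.0, 2 * min(upper_p(heads, n), lower_p(heads, n)))

for heads in (20, 21):
    print(heads, round(upper_p(heads, 25), 5), round(two_tailed_p(heads, 25), 5))
```

For this particular n, both conventions put the cutoff at 21 heads at α = 0.001. The choice between them matters more at other trial counts.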

But if the person claimed a 70% ability, JREF would require beating 42/60. And given a true ability of 0.8, the chance of doing that or better is 95%. So if you can go for a few more trials, then it pays to claim a performance level lower than the truth.

Effects that are large can be easily detected via small sample-size whereas small effects cannot.
 
The effect size is calculated the same way it would be for alpha, as the difference between the sample proportion (correct answers/total answers in a given test) and the proportion of correct answers JREF believes a psychic should be able to achieve.

Where did you get this idea? There are many ways of calculating an effect size; however, the effect size in a binomial experiment (coins, Zener cards, etc.) is calculated by taking the z-score of the result and dividing it by the square root of N (effect size = z/√N).
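A minimal Python sketch of the z/√N effect size for a binomial experiment. It assumes z comes from the normal approximation to the binomial without a continuity correction. The chance rate is 1/2 for coins and 1/5 for Zener cards, and the 8-of-25 Zener score is a hypothetical example.

```python
import math

def binomial_effect_size(hits: int, trials: int, p_chance: float) -> float:
    """Effect size z / sqrt(N), with z from the normal approximation to the binomial."""
    expected = trials * p_chance
    sd = math.sqrt(trials * p_chance * (1 - p_chance))
    z = (hits - expected) / sd
    return z / math.sqrt(trials)

print(round(binomial_effect_size(59, 60, 0.5), 3))  # 59 heads in 60 coin tosses
print(round(binomial_effect_size(8, 25, 0.2), 3))   # hypothetical Zener score, 8 of 25
```

Algebraically, z/√N reduces to (observed hit rate − chance rate) / √(p(1 − p)). So it doesn't grow with N the way z does, which is why it is used to compare studies of different sizes.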


If I understand you correctly, you're saying that anyone not able to achieve 100% correct answers all the time is a non-psychic. If so, you're defining away the issue we're debating here.

I agree. Saying that getting 59 heads out of 60 coin tosses is not evidence of psi is absurd.
 
Saying, "this magical remedy works in 80% of all cases" would give you an inherent p-value of .05,

No, it doesn't. The significance level, which is what you're apparently referring to, is separate from the threshold proportion and the sample size.
 
It's part of the claim. MDC tests claims. If you say, "this succeeds 80% of the time", then that's what will be tested (assuming a protocol can be agreed on, and other requirements met).

A significance test does not simply test the claim "this succeeds 80% of the time". The p-value tests the null hypothesis, not the alternative.
 
My copy function isn't working at the moment, but you implied that someone isn't a psychic if they can't achieve a p-value below whatever threshold (< 0.001) happens to be set. Is that correct?
I don't see how you get that from what I wrote.
You could say that I think someone whose psychic ability is like a baseball player's ability to throw a ball should have no trouble achieving virtually arbitrary p-values.
Any professional psychic should be able to achieve tiny p-values if they are able to do what they claim.
More generally, anyone whose psychic ability is such that they notice it in daily life should have no problem achieving low p-values in a short time.

That's exactly how tests of conventional treatments (throughout the social and medical sciences) are executed. Why should, say, homeopathic treatments be tested completely differently?
That's fairly difficult to explain in a way that's understandable to someone who never dealt with "true believers".
A true believer will never give up his belief but always conclude that the test method is false, or that there was some foul play by nefarious forces. So when the tried and true methods of science say that homeopathy does not work...
That's what necessitates these individual protocol negotiations. And just because people are delusional does not mean that they are stupid, even if they may have mild formal thought disorder. I think, IMHO, that much of the problem of reaching agreements comes from the fact that they navigate around negative results the way a person with blindsight navigates around obstacles.

Anyways, it is not the JREF that dictates what someone should be able to do to qualify as psychic. The claimant should come up with the protocol. The JREF is only interested that whatever claim is being made cannot be achieved by normal means. That is certainly a departure from normal scientific practice but as far as normal science is concerned all that stuff is wrong/non-existent.
 
I don't see how you get that from what I wrote.
You could say that I think someone whose psychic ability is like a baseball player's ability to throw a ball should have no trouble achieving virtually arbitrary p-values.
Any professional psychic should be able to achieve tiny p-values if they are able to do what they claim.
More generally, anyone whose psychic ability is such that they notice it in daily life should have no problem achieving low p-values in a short time.

Not necessarily. It actually depends on how good the psychic is. A telekinetic, for example, could be a professional at getting 55% heads/tails. However, the psychic will most likely not reach a p-value of 0.001 in a short run. If the JREF wants to test his/her claim with an 80% chance of getting a p-value of 0.001, it would need a sample size of 1704 coin tosses. Of course, this isn't a problem with claimants who have high hit rates and claim to do so professionally.
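The post doesn't say how 1704 was derived. The standard normal-approximation sample-size formula for a single proportion, with a two-tailed α = 0.001 and 80% power, gives the same number. So it may be what was used, but treat that as an assumption. A minimal Python sketch:

```python
import math
from statistics import NormalDist

def coin_tosses_needed(p_true: float, alpha: float, power: float,
                       p_chance: float = 0.5, two_tailed: bool = True) -> int:
    """Approximate number of trials needed to detect a true hit rate p_true
    against chance p_chance (normal-approximation formula for one proportion)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2 if two_tailed else 1 - alpha)
    z_beta = std_normal.inv_cdf(power)
    numerator = (z_alpha * math.sqrt(p_chance * (1 - p_chance))
                 + z_beta * math.sqrt(p_true * (1 - p_true)))
    return math.ceil((numerator / (p_true - p_chance)) ** 2)

print(coin_tosses_needed(0.55, 0.001, 0.8))  # 55% telekinetic, two-tailed
print(coin_tosses_needed(0.80, 0.001, 0.8))  # a claimant with a high hit rate
```

The required sample shrinks roughly with the square of the effect, which is why high-hit-rate claimants need only a few dozen trials.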
 