
JREF Challenge Statistics

Completely unrelated, but does T'ai Chi's avatar give anyone else a headache?
 
Completely unrelated, but does T'ai Chi's avatar give anyone else a headache?
More of an eyeache than a headache in my case, but I do find it distracting and annoying. I suppose if you choose an avatar to reflect your personality this sort of thing is bound to happen.

Robert
 
It does seem theoretically possible.

Let's take a simple case where someone says that they can predict whether a coin will come up heads or tails. A test is designed whereby they don't get to touch or even see the coin (they assure us that this will not affect their powers). Assume you require 10 coin tosses, and they need to get all of them right to pass the preliminary test. As any computer programmer could tell you, that's a 1/1024 probability of passing by chance alone - not by any means impossible.

For the actual test, let's say that they double the number of coin tosses. That's a 1/1048576 probability (and yes, I'm enough of a geek to type that from memory) of chance coming to the rescue, and 1/1024 * 1/1048576 is slightly less probable than 1 in a billion.

I have no idea what the tests look like, but I'm guessing that 1 in a billion chances are a fair bit less than what would be accepted as some sort of positive demonstration. While it is not likely that even 10000 people or more will pull this off... it's not impossible by any means.
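A minimal sketch of the arithmetic above, in Python. The 10-toss and 20-toss figures come from the post; the pool of 10,000 independent applicants who all guess at random is an assumption made purely for illustration, and the names are hypothetical.

```python
import math
from fractions import Fraction

def all_correct(n_tosses: int) -> Fraction:
    """Chance of calling n fair coin tosses correctly by guessing alone."""
    return Fraction(1, 2) ** n_tosses

preliminary = all_correct(10)   # 10 tosses in the preliminary test
final = all_correct(20)         # 20 tosses in the final test
both = preliminary * final      # passing both tests by luck

# Assumption for illustration: 10,000 independent applicants, all guessing.
n_applicants = 10_000
# 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
p_any = -math.expm1(n_applicants * math.log1p(-float(both)))

print(preliminary, final, both)
print(f"P(at least one of {n_applicants} passes both) ~ {p_any:.3g}")
```

Under those assumptions the chance that anyone at all gets through both stages by luck is on the order of one in a hundred thousand: small, but not zero.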
 
The trick is, 666, the test will depend on the claim. If the claim is that a person can predict 90% of coin tosses, it can be falsified much more easily than a claim of 55%. But the trick is, a test that fails to achieve a claimed 90% may be ended before sufficient data are collected to answer the 55% claim; as such, it would be inappropriate to include those data in a simple "against chance" test. This would have nothing to do with whether the person had any powers at all, but simply a practical consideration of test design.
 
I'm specifically talking about the preliminary tests. I've been informed that the alpha for these is typically .001.

Apparently you were (ahem) misinformed.

The standardized alpha cutoff of 0.001 for a preliminary test is JREF's nominal maximum that they will accept for a preliminary test, when it is practical to calculate.

Depending upon what the claim is, the claimant may suggest something that is much less probable than 0.001, or even something for which calculating an alpha cutoff is impractical because we can't even determine a baseline situation.

As a simple example, a recently accepted protocol involved the claimant suggesting he could summon UFOs. Offhand, I don't know how to estimate the a priori probability of something that has never been reliably seen in human history, but I suspect it's much less than 0.001. Similarly, if I claim to be able to levitate for thirty seconds without any physical support, that would certainly be a paranormal claim, almost certainly be accepted by the JREF, and much less likely than the nominal alpha cutoff. On the other hand, if I claim to be able to detect whether a given person (perhaps by being given a personal article and using some form of psychic perception), we can directly calculate the probability of my getting N correct answers simply by guessing, and set N to be "high enough" to give us the desired cutoff.

Until and unless we can calculate actual alpha cutoffs for each test as it is performed, we will not be able to assess the overall probability that the JREF challenge will be met "by chance alone."
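To make "set N to be high enough" concrete, here is a rough sketch in Python. It assumes the simplest possible protocol, where the claimant must get every one of N independent trials right and pure guessing succeeds with a fixed probability on each trial; the function name and the two example guess rates are illustrative, not anything JREF has published.

```python
from fractions import Fraction

def min_trials(p_guess: Fraction, alpha: Fraction) -> int:
    """Smallest N such that p_guess**N <= alpha (N out of N required).

    Assumes 0 < p_guess < 1 and alpha > 0, otherwise the loop never ends.
    """
    n = 1
    while p_guess ** n > alpha:
        n += 1
    return n

alpha = Fraction(1, 1000)  # the nominal maximum quoted in the thread

coin_like = min_trials(Fraction(1, 2), alpha)    # 50/50 guess per trial
one_in_ten = min_trials(Fraction(1, 10), alpha)  # 1-in-10 guess per trial

print(coin_like, one_in_ten)
```

Exact fractions are used so that a borderline case such as (1/10)^3 = 1/1000 is not pushed over the cutoff by floating-point rounding.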
 
In this example:

http://www.randi.org/jr/032902.html



What's the alpha for these tests, and how is it calculated? I think there's enough information there for a calculation to be made.

There is not enough information there for us to calculate an alpha value.

The test description makes it very clear what "chance" performance is: random guessing will result in "Mike" finding one item out of ten correctly, a 10% chance (or more formally, p = 0.10 per trial). Since the trials are independent, the chance of him getting two out of two correctly would be 0.01 (0.10^2), and more generally, the chance of him getting N out of N correctly would be (0.10^N).

However, we are not told how many he would have needed to get correctly to succeed on this test. If he were required to correctly find all ten items, the alpha cutoff would be 0.10^10, or 0.0000000001, one in ten b-for-billion. If he were only required to get eight out of ten, the mathematics gets a little more complicated and I'd have to pull out the binomial distribution to answer it. (Please don't make me do math. You wouldn't like me when I do math.)
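For anyone who does want the math done, a short Python sketch of the binomial tail follows. It assumes ten independent trials with a 0.10 chance of a correct guess on each, as described above; the pass marks of ten and eight are the two hypothetical thresholds from the post, not the actual protocol.

```python
from math import comb

def tail_probability(n: int, k_min: int, p: float) -> float:
    """P(at least k_min successes in n independent trials), binomial model."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )

p_guess = 0.10  # one item in ten found by pure guessing

all_ten = tail_probability(10, 10, p_guess)        # must get 10 of 10
eight_or_more = tail_probability(10, 8, p_guess)   # must get at least 8 of 10

print(f"10 of 10:          {all_ten:.3e}")
print(f"at least 8 of 10:  {eight_or_more:.3e}")
```

Relaxing the pass mark from ten to eight raises the chance of a lucky pass by a factor of a few thousand, yet it stays far below 0.001.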
 
Apparently you were (ahem) misinformed.

The standardized alpha cutoff of 0.001 for a preliminary test is JREF's nominal maximum that they will accept for a preliminary test, when it is practical to calculate.

Depending upon what the claim is, the claimant may suggest something that is much less probable than 0.001, or even something for which calculating an alpha cutoff is impractical because we can't even determine a baseline situation.

As a simple example, a recently accepted protocol involved the claimant suggesting he could summon UFOs. Offhand, I don't know how to estimate the a priori probability of something that has never been reliably seen in human history, but I suspect it's much less than 0.001. Similarly, if I claim to be able to levitate for thirty seconds without any physical support, that would certainly be a paranormal claim, almost certainly be accepted by the JREF, and much less likely than the nominal alpha cutoff. On the other hand, if I claim to be able to detect whether a given person (perhaps by being given a personal article and using some form of psychic perception), we can directly calculate the probability of my getting N correct answers simply by guessing, and set N to be "high enough" to give us the desired cutoff.

Until and unless we can calculate actual alpha cutoffs for each test as it is performed, we will not be able to assess the overall probability that the JREF challenge will be met "by chance alone."


Excellent point, and I was going to add that you cannot really calculate the statistical odds of a success by deception, or by delusion. That is, there is no reason to assume that all of the candidates are using random guessing as their strategies, so there is no reason to believe that their results will be randomly distributed. Of course the JREF works to eliminate any correlation between these other factors and actual results, so over a large sample the results will be random. But the samples in question are not really large because every test is different.
 
I've been thinking about this for quite some time, and finally put up a webpage on it

You write on your page:

I think it would be interesting if they made the data more accessible. Not everyone can afford to fly to Florida, forget about their job and etc., and spend what most likely would be weeks searching through paper files.

Making the data more accessible will require money. Are you a paying member of JREF?

Each test is a "Stand Alone" event. Like throwing dice or flipping a coin, the current test is not dependent on the previous test. The probability of "Success by chance" does not change after any number of tests--unless the current testee has learned new tricks from the previous tests...

rwguinn made the best point, I think. The tests are separate events. Someone losing one does not increase the chance of someone else winning. If you flip a coin 100 times and get all heads, the probability of you getting heads the next time is still 1/2.

Correct. The page is based on the flawed assumption that after a string of "heads", there will be a bigger chance of "tails".

A rookie error only someone totally ignorant of statistics would make.
 
Correct. The page is based on the flawed assumption that after a string of "heads", there will be a bigger chance of "tails".

I disagree. T'ai Chi is fairly explicit about the hypothesis that he is testing, and it's not related to the gambler's fallacy at all:

This information could allow one to test the incredible notion that Randi, or skeptics in general, exert a "negative energy" on those they are testing, and cause the results to be worse than what one would expect.

My reading is that this is yet another thinly-disguised accusation of cheating on the part of the JREF. The idea, of course, being that if we had seen 10,000 preliminary tests at a nominal alpha cutoff of 0.001, then the results are being biased against success, or alternatively that Randi & Co. are not giving claimants a fair shot at the million. In principle, this is no different than my noticing that almost no one manages to find the lady at the three-card Monte game down on the corner, and that therefore it's probably rigged.

Unfortunately, we don't yet have a sufficient sample size to be able to make any meaningful determinations, and at the current rate of three or four preliminary tests per year, I don't expect to have enough data within my lifetime. Nor should T'ai, unless he expects his martial arts practice to grant him a supernaturally prolonged life.
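A back-of-the-envelope version of that argument in Python. It assumes, purely for illustration, that every one of the 10,000 preliminary tests is independent and run at exactly the nominal cutoff of 0.001, which other posts in this thread dispute; the three-or-four-tests-per-year rate is the one quoted above.

```python
n_tests = 10_000
alpha = 0.001  # nominal maximum, assumed here to apply to every test

# Expected number of chance passes, and the chance of at least one.
expected_passes = n_tests * alpha
p_at_least_one = 1 - (1 - alpha) ** n_tests

# Time needed to accumulate that many tests at 3 to 4 per year.
years_at_four_per_year = n_tests / 4
years_at_three_per_year = n_tests / 3

print(f"expected chance passes: {expected_passes:.1f}")
print(f"P(at least one pass):   {p_at_least_one:.5f}")
print(f"years needed:           {years_at_four_per_year:.0f} to {years_at_three_per_year:.0f}")
```

On those assumptions a run of 10,000 tests with no passes at all would be striking, but collecting that many tests would take millennia at the current rate.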
 
You write on your page:

Making the data more accessible will require money. Are you a paying member of JREF?

Correct. The page is based on the flawed assumption that after a string of "heads", there will be a bigger chance of "tails".

A rookie error only someone totally ignorant of statistics would make.

Except that the tests are NOT independent. Each claimant has the opportunity to learn from previous claimants. This could be significant if the claimant plans to cheat, or if the claimant really has psychic powers, but they happen to be temperamental.
 
Correct. The page is based on the flawed assumption that after a string of "heads", there will be a bigger chance of "tails".

A rookie error only someone totally ignorant of statistics would make.
To be fair, this does mean that the probability of someone passing one of the first 10,000 preliminary tests is not just based on an estimation of the pass rate but our knowledge of existing failures.

I can see how this could be confusing for the poor chap.
 
Except that the tests are NOT independent. Each claimant has the opportunity to learn from previous claimants. This could be significant if the claimant plans to cheat, or if the claimant really has psychic powers, but they happen to be temperamental.

I can see how it might be useful if they plan to cheat, but the test will most likely be given by different people and have at least a slightly different protocol, so I'm not sure knowing that other people failed will help all that much.

When I think of results that depend on previous results, I picture pulling numbers out of a hat - for each one you pull out, the chance of getting any specific remaining number increases.
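The contrast between the hat and the coin can be sketched in a few lines of Python. The hat of ten numbers and the streak of three heads are illustrative choices, not figures from the thread, and the helper name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def p_specific_from_hat(hat_size: int, already_drawn: int) -> Fraction:
    """Chance the next draw is one particular number still in the hat,
    when draws are made without replacement."""
    return Fraction(1, hat_size - already_drawn)

# Dependent events: the chance climbs as the hat empties.
hat_chances = [p_specific_from_hat(10, k) for k in range(4)]

# Independent events: enumerate every sequence of 4 fair coin flips,
# keep those that start with three heads, and look at the fourth flip.
streaks = [s for s in product("HT", repeat=4) if s[:3] == ("H", "H", "H")]
p_heads_after_streak = Fraction(sum(s[3] == "H" for s in streaks), len(streaks))

print(hat_chances)
print(p_heads_after_streak)
```

The hat probabilities rise with each draw, while the coin stays at one half no matter what came before.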
 
The standardized alpha cutoff of 0.001 for a preliminary test is JREF's nominal maximum that they will accept for a preliminary test, when it is practical to calculate.

First, one doesn't calculate alpha, one sets it before the experiment. Second, ... OK? I'm not sure how this goes against what I've been saying that .001 is the typical alpha for a preliminary test.

For example, from http://www.randi.org/jr/08-24-01.html, Randi writes

As always, as described in the rules, a preliminary test for the JREF prize would be performed. That test would have odds of only 1 in 1,000 against the results being positive by chance alone. Should your product pass this preliminary test, we would be prepared, as outlined in our published rules, to go to the second and final test for the million-dollar prize.

Similarly, if I claim to be able to levitate for thirty seconds without any physical support,...

I'm not talking about all possible preliminary tests, but only those that are statistical in nature, and I make this very clear. The claims you describe above are not statistical in nature, they're not like having 10 cups, with gold under one of them, and the person gets 20 trials, etc.
 
My reading is that this is yet another thinly-disguised accusation of cheating on the part of the JREF.

I'm not sure why you have the need to read something sinister into it.

The 2 things that one would hope to get out of seeing such data are:

1) the actual data! It would be nice to actually see some

2) testing the incredible claim that skeptics make results lower than expected by chance

These are scientific matters, not pot-shots at JREF.
 
What I spend my money on is none of your business.

Indeed. But if you request something that will cost JREF money, the very least you could do is to support JREF with money.

If not, you insist that others pay for what you want.
 
Except that the tests are NOT independent. Each claimant has the opportunity to learn from previous claimants. This could be significant if the claimant plans to cheat, or if the claimant really has psychic powers, but they happen to be temperamental.
You are wrong. The tests are independent. Each claimant has a test designed to test the specific claim.
 
You are wrong. The tests are independent. Each claimant has a test designed to test the specific claim.

The fact remains that claimants have an opportunity to learn from other claimants' tests, or even their own previous tests. The tests are independent only if the only factor in the outcomes of the tests is chance. Clearly chance is a big part of most tests, but the fact is that most claimants are NOT using random choice as their strategy.

One obvious example would be a person who looks at a test by a previous claimant and sees a way to cheat not anticipated by the JREF. This might inspire the new claimant to practice and then apply with the same protocol as the previous claimant.
 
First, one doesn't calculate alpha, one sets it before the experiment.


Wrong.

Second, ... OK? I'm not sure how this goes against what I've been saying that .001 is the typical alpha for a preliminary test.

Because 0.001 is not the "typical" alpha, but the nominal maximum alpha.

Case in point, the experiment with Mike cited earlier. If Mike were required to get 20/20 correct to pass the preliminary test, the alpha cutoff would not be 0.001, but 0.10^20, or 1e-20, and we have no knowledge about whether Mike is "typical."
 
