• Quick note - the problem with Youtube videos not embedding on the forum appears to have been fixed, thanks to ZiprHead. If you do still see problems let me know.

JREF Challenge Statistics

I'd just like to see T'ai Chi answer any one of the questions Claus asked...

But WE know that he will just continue to ignore...

DB
 
Thank you.

I am happy now; I know that it is not my reasoning that is wrong.

Your error is that you are continuing to look at each individual test, rather than at the cumulative nature of combining them in the meta-analysis. You also continue your misunderstanding of the reasoning behind alpha, but that is ok. The two z tests you compare are precisely the difference between laboratory parapsychology work (where the tc z reasoning determines alpha) and the challenge (where the merc z reasoning determines alpha and the cutoff). My hypothetical coin-flip example was merely a demonstration which combined the multiple challenge tests into one test, for ease of understanding. Your analysis here confirms that it is a sound argument.

My argument is with the test design, certainly; it is a design that is appropriate for the challenge, but inappropriate for a meta-analysis.

I do thank you for finally putting your reasoning on the table. My mind is at ease now.
I don't want to give T.C. ammunition but;

Wouldn't the example you gave about the strings of heads be analysable as a geometric distribution?
We've lost a massive amount of information in throwing away the data, so our certainty about the heads-to-tails ratio will be lower, but I think we can create an unbiased estimator of E(heads) as (occurrences of a string of heads of length L+1)/(occurrences of a string of length L).

In a similar way, the data from a repeated JREF experiment of the form 2 failures in 20 trials could be analysed as a tail-truncated sum of two geometric distributions.

I'm mentioning this because, if I'm right, the strings of successes generated by paranormal 'researchers' could be re-analysed as a head-truncated geometric distribution to generate an unbiased estimator.

Thoughts anyone?
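If I'm reading the geometric-distribution idea right, it can be checked with a quick simulation. A minimal sketch (the run-generating process and the true heads probability are my own assumed demo values, not from the thread):

```python
import random
from collections import Counter

rng = random.Random(42)
p_true = 0.6          # assumed P(heads) for the demo
run_counts = Counter()

for _ in range(100_000):
    # Length of a maximal run of heads before the first tail:
    # geometrically distributed, P(L = k) = p^k * (1 - p).
    length = 0
    while rng.random() < p_true:
        length += 1
    run_counts[length] += 1

# Ratio estimator from the post: counts of runs of length L+1
# divided by counts of runs of length L should recover p.
L = 2
p_hat = run_counts[L + 1] / run_counts[L]
```

With 100,000 simulated runs, p_hat lands close to the true 0.6, which at least suggests the ratio-of-counts idea is sound in the simplest case.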
 
I don't want to give T.C. ammunition but;
Don't worry about that--first off, he was comparing to .5, so this is a different animal altogether. Secondly, if it turns out he's right, that is what is important, not winning some argument.
Thoughts anyone?
At first glance, that looks really neat--I would certainly defer to someone who knows more about math than I do (drkitten?), but I do think that addresses the systematic bias that I was talking about.
 
I would hardly claim to know more than Mercutio about statistics, but as far as I can tell, the question you asked is a legitimate one that T'ai Chi should have asked.

Wouldn't the example you gave about the strings of heads be analysable as a geometric distribution?
We've lost a massive amount of information in throwing away the data, so our certainty about the heads-to-tails ratio will be lower, but I think we can create an unbiased estimator of E(heads) as (occurrences of a string of heads of length L+1)/(occurrences of a string of length L).

Yes, we could do this analysis. But we would need essentially to build our statistics, our estimator, and our tables of significance from scratch, and to perform meta-analysis on this kind of data under field conditions would be a nightmare.

In particular, building our estimator hinges crucially on the idea that there are lots of people, all running near-identical coin-flip experiments. Under field conditions, this is exactly what we don't get -- instead, we get one person who can "influence" coin flips, another person who can "predict" the fall of a pair of dice, a third who can clairvoy (is that a word?) the cards drawn from a conventional deck, all to different claimed thresholds of accuracy. And that's not counting the nutcases who believe that they can summon UFOs.

Again, the people who are in the greatest need of this kind of analysis are not the JREF, but the field researchers at the parapsychology department of Redbrick Uni; the JREF has neither the facilities, the interest, the mission, nor the capacity for doing this kind of meta-analysis.
 
Yes, we could do this analysis. But we would need essentially to build our statistics, our estimator, and our tables of significance from scratch, and to perform meta-analysis on this kind of data under field conditions would be a nightmare.
Agreed. It would be one fun Monte Carlo simulation, though, no?
In particular, building our estimator hinges crucially on the idea that there are lots of people, all running near-identical coin-flip experiments. Under field conditions, this is exactly what we don't get -- instead, we get one person who can "influence" coin flips, another person who can "predict" the fall of a pair of dice, a third who can clairvoy (is that a word?) the cards drawn from a conventional deck, all to different claimed thresholds of accuracy. And that's not counting the nutcases who believe that they can summon UFOs.
Correct me if I am thinking fuzzy on this...it seems to me that the type of test and claimed level of accuracy are much less important in this analysis than on the sort that TC was asking for. Or maybe I am just looking at the bias I saw, and am ignoring some other source of bias.
Again, the people who are in the greatest need of this kind of analysis are not the JREF, but the field researchers at the parapsychology department of Redbrick Uni; the JREF has neither the facilities, the interest, the mission, nor the capacity for doing this kind of meta-analysis.
Agreed wholeheartedly. Although it might (might, I say, I am speaking out of ignorance here) be a fun project for someone pursuing a math degree!
 
Correct me if I am thinking fuzzy on this...it seems to me that the type of test and claimed level of accuracy are much less important in this analysis than on the sort that TC was asking for. Or maybe I am just looking at the bias I saw, and am ignoring some other source of bias.
It's still not really applicable to the JREF stats, just because of the pick 'n' mix nature of their tests. Something which is necessary for them to test every applicant.
Agreed wholeheartedly. Although it might (might, I say, I am speaking out of ignorance here) be a fun project for someone pursuing a math degree!
*Sigh*.
If someone could point me towards an electronic copy of the data recorded by paranormal researchers, and could talk me through the relevant stats, I'd be prepared to spend an hour or two MATLABing the data.

For a sufficiently large body of data, exponential decay should be a valid approximation for the initial stages. Does anyone in the know about radioactive physics want to fill us in on significance values etc.?
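On the exponential-decay point: for a geometric distribution the count of runs of length L decays as p^L, so log-counts fall on a straight line with slope ln(p), exactly like radioactive decay on a log plot. A quick sketch with simulated data (the true p and sample sizes are assumed demo values):

```python
import math
import random
from collections import Counter

rng = random.Random(1)
p_true = 0.5
counts = Counter()

for _ in range(200_000):
    # Run of heads before the first tail: geometric in length.
    length = 0
    while rng.random() < p_true:
        length += 1
    counts[length] += 1

# Log-counts of run lengths are linear in L with slope ln(p_true):
# the discrete analogue of exponential decay.
xs = [L for L in range(6) if counts[L] > 0]
ys = [math.log(counts[L]) for L in xs]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
p_hat = math.exp(slope)
```

The decay constant recovered from the slope gives back the heads probability, which is the "half-life" style analysis the post is gesturing at.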
 
Agreed. It would be one fun Monte Carlo simulation, though, no?

For a sufficiently limited and personal definition of "fun," perhaps.

Correct me if I am thinking fuzzy on this...it seems to me that the type of test and claimed level of accuracy are much less important in this analysis than on the sort that TC was asking for.

Well, the type of test would be crucial because it would establish what our baseline for "chance" is, and it would need to be assessed, in detail, for each claim. (For example, if I claim to be able to predict the next card drawn from a poker deck : are the cards drawn with or without replacement? What kind of feedback do I get?)
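To put a rough number on why the replacement question matters, here is a back-of-the-envelope sketch (the full-feedback and optimal-guesser assumptions are mine, not from the post):

```python
# Guessing the exact next card through one pass of a 52-card deck.

# With replacement, every guess succeeds with probability 1/52,
# so the expected number of correct guesses over 52 draws is 1.
exp_with_replacement = 52 * (1 / 52)

# Without replacement and with full feedback, an optimal guesser always
# names a card still in the deck: with k cards remaining, the guess
# succeeds with probability 1/k, so the expected total is the 52nd
# harmonic number -- roughly 4.5, not 1.
exp_without_replacement = sum(1 / k for k in range(1, 53))
```

A factor of four or so in the "chance" baseline, from nothing but the deal procedure, is exactly why each claim's protocol has to be assessed in detail.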

The claimed level of performance is less relevant, but I think that would need to be assessed on a case-by-case basis.

I still think it would be too difficult to do well, and too easy to do badly -- and therefore shouldn't be done at all.
 
The claimed level of performance is less relevant, but I think that would need to be assessed on a case-by-case basis.

I know, quoting my own post is a sure sign of narcissism or something. But I just realized something relevant regarding bias.

The claimed level of performance is important, because in some regards, it determines the stringency of the controls that Randi & Co. would need to apply in testing conditions.

Just as an example -- if I claimed to be able to predict with 100% accuracy whether a pregnant woman was carrying a male or a female (a traditional question of divination, btw), that would indeed be paranormal. If I claimed to be able to predict with 51% accuracy -- that's not that hard, since the ratio of males to females at birth is approximately 51.5% males in most First World countries. That 1.5% might make all the difference between a successful and an unsuccessful test, depending upon how Randi set up the test (and calculated the probability of the null hypothesis).

In particular, if I claimed 100% accuracy, Randi could safely ignore this little demographic factoid in designing the test. Even if I scored slightly above "chance" (as defined by a 50/50 null hypothesis), his million would still be safe.
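A rough sketch of how much that 1.5% can matter, using hypothetical numbers (n = 1000 predictions, a 0.001 cutoff naively computed against a 50/50 null):

```python
from math import comb

n = 1000        # hypothetical number of births predicted
alpha = 0.001

# Exact binomial pmfs under the naive 50/50 null, and for a guesser
# who exploits only the ~51.5% male base rate.
pmf_null = [comb(n, i) * 0.5 ** n for i in range(n + 1)]
pmf_base = [comb(n, i) * 0.515 ** i * 0.485 ** (n - i) for i in range(n + 1)]

# Smallest cutoff k with P(X >= k) <= alpha under the 50/50 null.
tail = 0.0
k = n + 1
for i in range(n, -1, -1):
    if tail + pmf_null[i] > alpha:
        break
    tail += pmf_null[i]
    k = i

# Probability that the base-rate guesser passes a test set at that cutoff.
p_pass = sum(pmf_base[k:])
```

On these assumptions the base-rate guesser passes at well above the nominal 0.001 rate, which is the point: the null hypothesis has to be the real-world base rate, not an assumed 50/50.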
 
I don't know what the alpha level is on the JREF preliminary test (the ones that are statistical in nature that is) but if it's 0.001 then it's too large IMO. To prevent fraud you have to have it such that a large number of people can't take the test in a short time and have one win by chance and then go "A-ha!"

Yes, it's the claimant's responsibility to pay for the trip to the JREF, and it'd be hard to get a large number of people from all over the world to do that, but there are plenty of people who already live close to the JREF, and it would cost them almost nothing to take the challenge. All someone would have to do is organize a large number of them to take the test. Have them all make the same claim, so it'd be easy to train them all at once. March several of them into the JREF every day. You'd only need enough to cover a year, because after a year a person is eligible to take the challenge again.

If all the tests were the same and the alpha level were 0.001, then the chance that a given person wouldn't win by luck on a single test would be 0.999, which means it'd take only 693 people taking the test for there to be a greater than 50% chance that someone would pass by luck alone.
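The arithmetic behind the 693 figure, spelled out:

```python
import math

alpha = 0.001                      # per-test false-positive rate
p_nobody = 1 - alpha               # 0.999: one test produces no fluke pass

# Chance that at least one of n independent tests passes by luck alone.
n = 693
p_someone = 1 - p_nobody ** n      # just over 0.5

# Equivalently, solve (1 - alpha)^n = 0.5 for n directly.
n_half = math.log(0.5) / math.log(p_nobody)   # about 693
```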

If someone did this the JREF would be able to figure out their trick and they'd know they were being scammed but they'd be in a position where backing out would make them look bad.

Why would each person take the challenge? Consider the cost, chance of winning and the reward. The cost is a few hours of your time. The chance of winning is 1 in 1000 (if we have alpha=0.001). The reward is to be able to say you beat James Randi in his challenge. The woos worldwide would eat this up and the winner of the preliminary challenge would make a lot of money from them as a result. S/He'd be a cult hero to them. You'd make, at a minimum, tens of thousands of dollars, if not millions, all for a few hours of your time at a 1 in 1000 shot.

The fact that you passed the preliminary challenge instead of the real challenge would be lost in the shuffle. The fact that you failed when the real challenge came along would get lost too. Or, if someone passed the preliminary challenge, they'd likely just not take the real challenge, because they'd have more to lose than to gain at that point.

I hope the alpha is generally smaller than 0.001. Tests can be designed so that there is both a good chance the person will pass if they have the abilities they claim and a small alpha.
 
Why would each person take the challenge? Consider the cost, chance of winning and the reward. The cost is a few hours of your time. The chance of winning is 1 in 1000 (if we have alpha=0.001). The reward is to be able to say you beat James Randi in his challenge. The woos worldwide would eat this up and the winner of the preliminary challenge would make a lot of money from them as a result. S/He'd be a cult hero to them. You'd make, at a minimum, tens of thousands of dollars, if not millions, all for a few hours of your time at a 1 in 1000 shot.

You can't get those good odds in Vegas. You risk nothing, you win all.
 
I don't know what the alpha level is on the JREF preliminary test (the ones that are statistical in nature that is) but if it's 0.001 then it's too large IMO. To prevent fraud you have to have it such that a large number of people can't take the test in a short time and have one win by chance and then go "A-ha!"

[snip]

If all the tests were the same and the alpha level were 0.001, then the chance that a given person wouldn't win by luck on a single test would be 0.999, which means it'd take only 693 people taking the test for there to be a greater than 50% chance that someone would pass by luck alone.

It hasn't been a problem so far; at present rates, 693 people taking the test would represent something like a century of testing. Not only will Randi be long-dead by then, but probably so will you, I, and the person selected to replace Kramer as the challenge coordinator.

I think that the 0.001 alpha level does a good job serving the Educational part of the JREF mission. If hundreds of people were to apply in a very short time with identical or near-identical claims, especially with something that could obviously be related to simply getting lucky with wild guesses, I think that would actually help the cause of skeptical reasoning. Very few people, for example, think there's anything supernatural involved when they read about someone winning the lottery -- there are millions of tickets sold, and someone has to be a winner. This applies on a smaller scale, too.... if I win a new car at the firehouse raffle, it's not because I'm magic, but because there were only 1000 tickets sold, and someone had to get it. If it can be made obvious -- and people like Penn and Teller, Michael Shermer, and the Mythbusters are very good at making things obvious -- that of the hundreds of people who entered the JREF raffle, one person finally drew a winning ticket, that's actually a strong argument against the paranormal.

Especially when they then fail the final test with results not much better than chance.
 
My understanding is that the .001 level is for preliminary testing only, with results expected to meet the .000001 level for the actual MDC.

That's correct. I think 6's point is that even passing the preliminary test (irrespective of one's performance on the final) could and would be spun as a tremendous step forward for paranormalism. "Even Randi admits that this kind of performance merits further investigation!" Et cetera, et cetera, and so forth.

And to some extent he's right, because "against stupidity, the Gods themselves contend in vain," and there are demonstrably still people out there who believe all sorts of dumb things. But I also think that he's wrong, because the percentage of those people is slowly getting smaller and smaller. Even homeopaths will go in for surgery if they get appendicitis.
 
And to some extent he's right, because "against stupidity, the Gods themselves contend in vain," and there are demonstrably still people out there who believe all sorts of dumb things. But I also think that he's wrong, because the percentage of those people is slowly getting smaller and smaller. Even homeopaths will go in for surgery if they get appendicitis.

Ah, I see now.

As to the stupidity quote, that only makes sense. After all, you wouldn't expect a god to be more powerful than that which created it...

*Huntsman ducks and runs*

:D
 
If I were a cheerleader for the paranormal I'd be very keen to see these statistics compiled, so that I could claim that paranormal activities are observed, just not at a level required to win the challenge. Think about Brian Josephson's hysterical response to the testing of 'The Girl Without X-Ray Vision' (as she's now known), namely that since she performed better than chance there must be something there, even though she didn't meet her own agreed criterion for success. The cry for a statistical analysis sounds awfully like a prelude to an attempt to claim the same thing for challenge applicants en masse.
 
If I were a cheerleader for the paranormal I'd be very keen to see these statistics compiled,

Or just any person who is curious about seeing the actual data from interesting tests.

so that I could claim that paranormal activities are observed, just not at a level required to win the challenge.

If there's nothing there, what does one have to be afraid of?
 
Or just any person who is curious about seeing the actual data from interesting tests.
Especially those who are ignorant of statistics.
If there's nothing there, what does one have to be afraid of?
Misinterpretation of statistical artifacts as effects.

Hey, it happens! Regression to the mean gets misinterpreted quite often. I'd hate to see what sort of spin a less obvious artifact gets!
 
Especially those who are ignorant of statistics.

But we're all ignorant of the statistics if we're not able to see any actual statistics.

What % of the applicants have been female? Interesting question. Seems unnecessarily difficult to get a numeric answer.
 
But we're all ignorant of the statistics if we're not able to see any actual statistics.
Not ignorant of the statistics. Ignorant of statistics. Unable to understand a bias inherent in the accumulation of trial data, for one example.
What % of the applicants have been female? Interesting question. Seems unnecessarily difficult to get a numeric answer.
It is a self-selected sample. Suppose you found a particular percentage female; what possible interpretation could you give it? Is that the percentage in the population? Are there pressures that might lead a greater proportion, or a smaller proportion, of women to apply?

The data cannot answer those questions, even if they were available; for the purposes of the challenge, there is no reason to make any inquiries of this sort. They are meaningless data.

What sort of things do you think you could learn from the gender percentages of these data? Why do you think they are appropriate?
 
Unable to understand a bias inherent in the accumulation of trial data, for one example.

The only thing presented on that note was a poor argument based on optional stopping occurring, something which does not actually happen in the JREF's well-designed tests, and on the observed data being tested against what the claimant claims, which also does not occur, since z-scores are of the form

z = (observed-expected by chance)/stuff
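Spelled out for a simple binomial test (n, the hit count, and the chance rate below are hypothetical numbers of my own), the point being that the claimant's claimed hit rate never enters the formula:

```python
import math

n = 200           # trials (hypothetical)
hits = 120        # observed successes (hypothetical)
p_chance = 0.5    # chance baseline -- not the claimant's claimed rate

expected = n * p_chance                          # expected by chance
se = math.sqrt(n * p_chance * (1 - p_chance))    # the "stuff": binomial SE
z = (hits - expected) / se
```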

Suppose you found a particular percentage female; what possible interpretation could you give it? Is that the percentage in the population? Are there pressures that might lead a greater proportion, or a smaller proportion, of women to apply?

The hang-up is thinking of doing inference based on the %. Viewing the % as a descriptive statistic, there are no problems whatsoever.

They are meaningless data.

You do not want to learn numerical results about the test?

What sort of things do you think you could learn from the gender percentages of these data?

The characteristics of the applicants. It would be interesting to know what kind of people took the test: gender, where they are from, age, and so on.

As do the categories of claims tested.

As do the scores from the tests, for reasons already explained.
 
