• Quick note - the problem with YouTube videos not embedding on the forum appears to have been fixed, thanks to ZiprHead. If you do still see problems, let me know.

Statistical significance

The second is that the structure of the challenge says that you must succeed at a 1/1000 level of significance on two occasions.

Where is that stated in the challenge rules?

From http://www.randi.org/research/challenge.html

Upon properly completing this document and agreeing upon the test protocol, you will receive your application back, signed on the reverse by JR. The applicant then becomes eligible for the preliminary test, which, if successful, will result in the formal test.


The fact that there are independent preliminary and formal tests means that you must succeed on two occasions.

Cheers,
Ben
 
this is also from the csicop article......

why is this necessary instead of (as well as) the inc/excl odds? It seems a rather unmathematical method - i.e. looking at the test results, and then using them to calculate what you were measuring the test against....

I chose the criterion for the test within a Bayesian framework. Within this framework, the probabilities for the matching procedure given in the preceding table are only part of the story. We have to consider explicitly the prior odds of both the null and the alternative hypotheses. The information provided by the outcome does not, in itself, provide us the probabilities that the null or the alternative hypothesis is correct. Instead, the information from the outcome is used to revise the prior odds.

A logical problem with the NHT was recognized by Fisher. For both the null and the alternative hypotheses we can calculate the probabilities for each possible outcome. For example, given the null hypothesis, the preceding table gives the probability of four correct matches as .0139. The probability of four correct matches given the alternative hypothesis is .1562. The problem is that the investigator is not interested in these probabilities. Rather, he or she wants to know the probability that the null or the alternative hypothesis is true given the outcome of the test. This is a subtle, but crucially important difference. The experiment or the test provides us with data (an outcome). We can compute the probability of this outcome given the hypotheses.

What we want, however, is the probability of the hypotheses given the outcome. This is the problem of “inverse probabilities.” Philosophers and statisticians engage in complicated and never-ending debates about whether such inverse probabilities can be justified. The difficulty is that we need to know the prior probabilities of null and alternative hypotheses before we can get the probabilities for these hypotheses after we have observed the outcome.

In our case, the Bayesian context requires us to specify two hypotheses to compare. In addition, we have to specify a prior probability that each is true. Consider the claim that Natasha has X-ray vision and can use this ability to diagnose medical conditions. What are the odds that this claim is true? The Bayesian approach is often criticized because the assignment of prior odds to hypotheses is subjective and arbitrary. This article is not the place to debate this matter. I only need to say that we have an empirical basis for assigning prior odds to Natasha’s claim. The assignment does not need to be exact. A crude approximation will do.

snip

I decided to assume that the prior odds in favor of the null hypothesis were 99:1. This means that I was also assuming that the prior odds against the alternative hypothesis are also 99:1. The null hypothesis in our test is that the average number of correct matches will be one. The alternative hypothesis is that the average number of correct matches will be five. These two hypotheses are statistical hypotheses. The statistical procedure uses the outcome of the test as a basis for deciding between these two statistical hypotheses. We should distinguish these statistical hypotheses from conceptual or substance hypotheses. [3]
http://www.csicop.org/specialarticles/natasha2.html
 
this is also from the csicop article......

why is this necessary instead of (as well as) the inc/excl odds? It seems a rather unmathematical method - i.e. looking at the test results, and then using them to calculate what you were measuring the test against....

http://www.csicop.org/specialarticles/natasha2.html

Unfortunately the description given is incomplete. If his alternative hypothesis were that she has complete X-ray vision, then her odds of making any mistake would be close to zero, and 3 failures would be strong evidence against it. So when he did that calculation he must have based it on a hypothesis that she would have some specific success rate. Without knowing that rate, nobody can duplicate his calculation.

But that said, it is a perfectly valid mathematical calculation with a solid foundation in probability theory. The only question is whether it is the right method to use.

The idea is simple. Given a set of specific hypotheses and prior beliefs about their relative likelihoods, one can use Bayes' Theorem to calculate the correct posterior expectations and beliefs. So come up with a reasonable set of hypotheses, run the formula, and you get a reasonable set of conclusions. The obvious catch is where you get the hypotheses and prior beliefs. And the answer is that you have to make them up, which people are resistant to for fairly obvious reasons. (But in practice you'll find that when people do make them up and run the experiments, their conclusions are surprisingly often close to each other.)
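To make that concrete, here is a minimal sketch (in Python) of the kind of Bayesian update being described. The 99:1 prior and the two probabilities for four correct matches are the figures quoted from the csicop article; the function and its names are purely illustrative.

[code]
# Minimal sketch of the Bayesian updating described above.  The prior (99:1
# against the claim) and the two likelihoods for "4 correct matches" are the
# figures quoted from the csicop article; the rest is illustrative.

def update_odds(prior_odds_for, p_outcome_given_claim, p_outcome_given_chance):
    """Posterior odds in favour of the claim = prior odds * likelihood ratio."""
    return prior_odds_for * (p_outcome_given_claim / p_outcome_given_chance)

posterior = update_odds(prior_odds_for=1 / 99,          # 99:1 against the claim
                        p_outcome_given_claim=0.1562,   # P(4 right | claimed ability)
                        p_outcome_given_chance=0.0139)  # P(4 right | chance)
print(posterior)  # about 0.11, i.e. still roughly 9:1 against the claim
[/code]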

It seems far more "scientific" to use hypothesis testing instead. However, the problem with hypothesis testing is that it has a rather dubious philosophical foundation. (I've discussed this extensively in this thread.) It is simple and gives unambiguous answers, so people like it. However, there is a persistent, well-educated minority who are bothered enough by its poor underpinnings to choose not to use it. Ray Hyman would appear to be one of them.

The key problem, as he points out, is that the question everyone really wants to answer in statistics is fundamentally impossible. It requires data that we do not have. The approach of Bayesian statistics is to make up the data we do not have (or better yet, to see what happens if we make up different sets of data). The approach of hypothesis testing is to ask a somewhat related question that we actually can answer. (And then most people promptly mistake the question answered for the one they really wanted answered...)

So in summary, the procedure that he uses is unfamiliar to most people, but it is not a mathematically unreasonable procedure to use.

Cheers,
Ben
 
The fact that there are independent preliminary and formal tests means that you must succeed on two occasions.
I already knew that. I asked specifically about your claim that the rules require an applicant to succeed at a 1/1,000 level of significance on two occasions. That's not in the rules set forth by the Randi Foundation. In fact, Drkitten states that generally the second test must be at the 1/1,000,000 level of significance, but that's not in the rules either. Further, drkitten raises the possibility that the Randi Foundation might not be willing to monitor a time-consuming test, such as thousands of ganzfeld trials.

In light of both this fact and the fact that Dean Radin and other parapsychologists contend that ganzfeld experiments have been successful at a rate far exceeding chance over many years, it seems to me that the Randi Foundation should explicitly specify what would constitute a ganzfeld test successful enough to warrant award of the one million dollar prize. Otherwise, I think parapsychologists can properly argue that the prize isn't actually available for successful ganzfeld experiments.
 
I already knew that. I asked specifically about your claim that the rules require an applicant to succeed at a 1/1,000 level of significance on two occasions. That's not in the rules set forth by the Randi Foundation. In fact, Drkitten states that generally the second test must be at the 1/1,000,000 level of significance, but that's not in the rules either. Further, drkitten raises the possibility that the Randi Foundation might not be willing to monitor a time-consuming test, such as thousands of ganzfeld trials.

I just re-read the rules and the FAQ in more detail. I had been under the impression that the preliminary and formal tests were identical. However, http://www.randi.org/research/faq.html#5.2 corrected that misimpression. I don't see anything in the official rules about the required level of significance, and I haven't been around as long as drkitten. So if she thinks that the formal test is to 1/1,000,000, then she would know better than I.

As for willingness to monitor: how much time would be required? Suppose you need 10,000 trials and each trial takes 6 seconds. That is 60,000 seconds, which is 1,000 minutes, or about 17 hours - call it 2 solid days of testing. No single person will volunteer for that, but you might get people to volunteer for different segments of time. Or a procedure could be agreed on where the two participants are left for extended periods inside two locked rooms (locked from the outside, of course) while data is collected by a computer program. And as a last resort, the testee could offer to pay for observers. (That is unlikely to be necessary, though; it shouldn't be hard to find a volunteer to write the computer program, which could then be independently verified by both parties.)

In short the logistics could be worked out if there was a sufficiently motivated testee.

In light of both this fact and the fact that Dean Radin and other parapsychologists contend that ganzfeld experiments have been successful at a rate far exceeding chance over many years, it seems to me that the Randi Foundation should explicitly specify what would constitute a ganzfeld test successful enough to warrant award of the one million dollar prize. Otherwise, I think parapsychologists can properly argue that the prize isn't actually available for successful ganzfeld experiments.

I think this would be useless.

There are an infinite number of criteria that would be accepted by the Randi Foundation, and the boundary between "acceptable" and "unacceptable" is derived from well-known probability theory calculations. Therefore the criterion to be used will always be one that the testee is happy with. For instance, someone who thinks they succeed at a 35% rate will want to settle for a criterion that requires fewer trials - why do extra work when you do not think you'll need it? By contrast, someone who thinks they succeed at a 27% rate will want to ask for more trials, because they think they are unlikely to succeed with fewer.
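To illustrate that trade-off, here is a rough sketch. The 25% chance rate, the 1/1000 criterion, and the 27% and 35% claimed rates are just assumed numbers for a ganzfeld-style test, not anything taken from the official rules; the helper functions are likewise illustrative.

[code]
# Rough sketch of why the preferred test size depends on the claimed hit rate.
# The 25% chance rate, the 1/1000 criterion and the claimed rates are assumed.
from math import lgamma, log, exp

def log_binom_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def p_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))

def threshold(n, chance=0.25, alpha=0.001):
    """Smallest hit count that chance alone reaches with probability < alpha."""
    k, tail = n, 0.0
    while True:
        tail += exp(log_binom_pmf(k, n, chance))
        if tail >= alpha:
            return k + 1
        k -= 1

for n in (200, 1000, 5000):
    k = threshold(n)
    print(f"{n} trials: need {k} hits; "
          f"P(pass | 35% rate) = {p_at_least(k, n, 0.35):.2f}, "
          f"P(pass | 27% rate) = {p_at_least(k, n, 0.27):.2f}")
[/code]

Run it and you can see why a 35% claimant would be content with a short test while a 27% claimant would want a long one.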

Therefore an official statement by the Randi Foundation that "this number of successes in this many trials will earn the prize" would not be useful, because serious applicants are unlikely to want that specific setup. It is better to be flexible.

Incidentally, the fact that Dean Radin et al. contend this does not make it true. For instance, there is a selection bias problem. Suppose that over time 10 researchers each test 100 people with 100 trials apiece, and then fail to report their bottom 10 people. Those people will, on average, each be about 7 successes short of the chance expectation. On average those successes did happen - they just happened in the included group. So each experimenter will report 9,000 trials that are, on average, about 70 successes above expectation. The result is a data set of 90,000 trials that is about 700 successes ahead, which is over 5 standard deviations above what chance predicts.
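Here is a quick simulation of that file-drawer scenario, assuming a pure-chance 25% hit rate and the 10-researchers / 100-subjects / drop-the-worst-10 numbers used above; all of the parameters are just the illustrative ones from this paragraph.

[code]
# Quick simulation of the file-drawer scenario described above: 10 researchers,
# 100 subjects each, 100 trials per subject at a pure-chance 25% hit rate,
# with each researcher quietly dropping their 10 worst subjects.
import random

def simulate_once(researchers=10, subjects=100, trials=100, p=0.25, dropped=10):
    reported_hits = reported_trials = 0
    for _ in range(researchers):
        scores = sorted(sum(random.random() < p for _ in range(trials))
                        for _ in range(subjects))
        kept = scores[dropped:]          # silently discard the 10 worst subjects
        reported_hits += sum(kept)
        reported_trials += len(kept) * trials
    return reported_hits, reported_trials

hits, n = simulate_once()
expected = 0.25 * n
sd = (n * 0.25 * 0.75) ** 0.5
print(f"{hits} hits in {n} reported trials: "
      f"{(hits - expected) / sd:.1f} standard deviations above chance")
[/code]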

Even if only some researchers fail to report some data, it doesn't take many for the combined result to look significant. And those who do withhold data will often withhold a lot more than the bottom 10%. Plus there are other ways they can fool themselves. The result is a mass of statistics presenting an apparently compelling case.

But that issue is neither here nor there. Should Dean Radin be convinced that he has a good enough applicant to win, he is free to help that applicant apply and be properly tested under a mutually agreed protocol. Even if being tested properly would cost more than a million dollars, the publicity from actually winning the challenge would be more than worth it to them.

Cheers,
Ben
 
snip

So in summary, the procedure that he uses is unfamiliar to most people, but it is not a mathematically unreasonable procedure to use.

Cheers,
Ben

really interesting post - i'll do some reading on this :)
 
Someone with a great deal of practice in cold reading, given 4 hours to "diagnose" conditions, should be expected to do better than blind chance..... So say you wanted to take this into account..... starting with an assumption of ability is one option - if they have X-ray eyes then 7/7 should be no problem.....

but two other possible options;

Perhaps running a parallel test with a skilled cold reader - and then using their results as the null.

choosing arbitrary probabilities for correctly guessing the first condition (and later ones) (based on some reasoning!)


For this second option, say you started with an arbitrary 1/5 chance of correctly guessing the first condition - how could you amend the below formula to take this into account? Can you do it?

http://www.ds.unifi.it/VL/VL_EN/urn/urn6.html

[latex]P(N_n=k)= \frac{1}{k!}\sum_{j=0}^{n-k}\frac{(-1)^j}{j!}[/latex]
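For reference, here is a minimal implementation of that matching formula - the probability of exactly k correct matches when n items are paired purely at random; the function name is just for illustration.

[code]
# Minimal implementation of the matching formula above: the probability of
# exactly k correct matches when n items are matched purely at random.
from math import factorial

def p_matches(k, n):
    """P(N_n = k) = (1/k!) * sum_{j=0}^{n-k} (-1)^j / j!"""
    return sum((-1) ** j / factorial(j) for j in range(n - k + 1)) / factorial(k)

# For the seven-condition test discussed in the csicop article:
for k in range(8):
    print(k, round(p_matches(k, 7), 4))
# p_matches(4, 7) comes out to about 0.0139, the value quoted in the article.
[/code]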
 
In light of both this fact and the fact that Dean Radin and other parapsychologists contend that ganzfeld experiments have been successful at a rate far exceeding chance over many years,

I contend that they haven't -- that the experiments are fraudulent and the researchers are incompetent.

Good. Now we've both expressed our opinions.

And Dean Radin can get in line along with everyone else.
 
I contend that they haven't -- that the experiments are fraudulent and the researchers are incompetent.
Evidence?

And Dean Radin can get in line along with everyone else.
But you have expressed the opinion that a ganzfeld test protocol might be too time-consuming for the Randi Foundation. So how, exactly, can a ganzfeld applicant win the prize, even if in fact the test achieves a hit rate of 27% (as opposed to the expected 25%) over more than 10,000 trials?
 
Evidence?

The fact that no reputable journal will touch Radin's "research" with a hay-fork, for a start. Radin is a card-carrying academic and knows how to work the journal system.....

But you have expressed the opinion that a ganzfeld test protocol might be too time-consuming for the Randi Foundation. So how, exactly, can a ganzfeld applicant win the prize, even if in fact the test achieves a hit rate of 27% (as opposed to the expected 25%) over more than 10,000 trials?

By designing a better protocol.
 
But you have expressed the opinion that a ganzfeld test protocol might be too time-consuming for the Randi Foundation. So how, exactly, can a ganzfeld applicant win the prize, even if in fact the test achieves a hit rate of 27% (as opposed to the expected 25%) over more than 10,000 trials?

If someone truly wishes to challenge with a ganzfeld experiment, I'm willing to volunteer to write a computer program to allow the experiment to proceed without active supervision. That should reduce the work on the volunteers enough to make supervision doable.

I'm presuming, of course, that Randi is willing to accept having participants observed by camera in separate locked rooms. (The participants having been searched for phones, and with people occasionally peeking in.)

Cheers,
Ben
 
For this second option, say you started with an arbitrary 1/5 chance of correctly guessing the first condition - how could you amend the below formula to take this into account? Can you do it?

http://www.ds.unifi.it/VL/VL_EN/urn/urn6.html

[latex]P(N_n=k)= \frac{1}{k!}\sum_{j=0}^{n-k}\frac{(-1)^j}{j!}[/latex]

You can do it. But it gets messy.

If you want to assume that the person just has a 1/5 chance of getting the first, and the rest are chance, then you have 2 cases. The first case, with 20% likelihood, is that the first match is right. Then you just apply the formula for 6 random choices to the rest.

The remaining possibility is that the person has the first wrong. In that case their answer will be a permutation of 7 in which something else definitely maps onto the first element, which in turn maps onto a third thing. Let's "patch out" the first element by saying that whatever maps onto it maps onto what it maps onto instead. This gives you a completely random permutation of 6 things. Applying the formula for 6 random choices to this random permutation of 6, you can find the probability of that permutation having 0, 1, 2, etc. right. And now you can figure out the odds of getting various answers with your original 7. For instance, among solutions with 2 right out of the 6, there is a 2/3 chance that the original also had 2 right, and a 1/3 chance that it had only 1 right. (Was the first element patched back in at a spot that mapped onto itself, making a previously right answer wrong?)

It is ugly, but you can do the calculation. By hand even.
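Rather than doing the patching argument by hand, here is a brute-force sketch of the same calculation: weight every permutation of the 7 answers according to the assumed 1/5 chance of getting the first one right (everything else random), and tally how many come out correct. The 1/5 figure is, of course, just the assumption from the question.

[code]
# Brute-force version of the calculation sketched above.  Assume the claimant
# gets the first condition right with probability 1/5 (instead of the chance
# 1/7) and is otherwise guessing at random; weight every permutation of the
# 7 answers accordingly and tally the number correct.
from itertools import permutations
from collections import defaultdict
from math import factorial

n = 7
p_first_right = 1 / 5           # the assumption from the question
dist = defaultdict(float)

for perm in permutations(range(n)):
    if perm[0] == 0:
        # uniform over the 6! permutations that get the first one right
        weight = p_first_right / factorial(n - 1)
    else:
        # uniform over the permutations that get the first one wrong
        weight = (1 - p_first_right) / (factorial(n) - factorial(n - 1))
    correct = sum(guess == truth for truth, guess in enumerate(perm))
    dist[correct] += weight

for k in sorted(dist):
    print(k, round(dist[k], 4))
[/code]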

For a more complete analysis, http://www.csicop.org/specialarticles/natasha2.html points you to Diaconis, P., & Holmes, S. (2002). A Bayesian peek into Feller Volume I. Indian Journal of Statistics, 64, 820-841. I haven't read that article, but Persi Diaconis is a well-known probability theorist, and I am confident that he is right.

Incidentally I don't know what scenario that article analyzed. But he says that he originally gave the odds as 99-1 against. After seeing 4 correct, the odds were still 9-1 against. Working Bayes' Theorem backwards and knowing the probabilities for the chance hypothesis, his alternative scenario must have said that her odds of getting 4 right were about 15%. He said that if she'd gotten 5 right the odds would have been up to 1-1, which means that under that scenario her odds of getting 5 right were about 40%. So while we don't know exactly how that scenario was defined, he must be analyzing a hypothesis close to the one she claimed - which is that she looks at a person and is usually right.
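The back-calculation itself is short enough to show. The 99-1 prior, the 9-1 posterior, and the 0.0139 chance probability of 4 correct matches are the figures quoted above; the variable names are just for illustration.

[code]
# Working Bayes' Theorem backwards, as described above: given the 99:1 prior
# against, the 9:1 posterior against after 4 correct, and the chance probability
# of 4 correct (0.0139 from the matching table), recover the probability that
# the alternative scenario must have assigned to 4 correct.
prior_odds_against = 99          # 99:1 against the claim
posterior_odds_against = 9       # 9:1 against after seeing 4 correct
p_4_given_chance = 0.0139

# posterior odds against = prior odds against / likelihood ratio, so:
likelihood_ratio = prior_odds_against / posterior_odds_against   # = 11
p_4_given_claim = likelihood_ratio * p_4_given_chance
print(round(p_4_given_claim, 3))  # about 0.15, the 15% figure mentioned above
[/code]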

Cheers,
Ben
 
If you want to assume that the person just has a 1/5 chance of getting the first, and the rest are chance, then you have 2 cases. The first case, with 20% likelihood, is that the first match is right. Then you just apply the formula for 6 random choices to the rest.

My idea was to try to factor in the increased ability to "guess" due to cold-reading techniques for every condition... so say the first guess was 1/5 rather than 1/7: then you could credit the girl with odds 40% better than pure chance - and then carry that through the other guesses - i.e. 1/6 becomes 7/30, 1/5 becomes 7/25, etc. ..... which would give P(k=7) at around 0.0015. Which (if you can alter all the odds that way, hmmm I think you can :) ) shows quite clearly how rapidly the odds come down if you apply a (not unreasonable) inference of cold-reading expertise....
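For what it's worth, here is that back-of-envelope arithmetic spelled out; the uniform 1.4 boost at every stage is purely the assumption described above.

[code]
# The back-of-envelope amendment described above: boost every stage's guessing
# odds by the same factor 1.4 (= (1/5)/(1/7)) and see what P(all 7 correct)
# becomes.  The uniform 1.4 cold-reading boost is an assumption.
boost = (1 / 5) / (1 / 7)             # 1.4
p_all_seven = 1.0
for remaining in range(7, 1, -1):
    p_all_seven *= boost / remaining  # 1.4/7, 1.4/6, ..., 1.4/2
# the last condition is forced once the first six are right
print(round(p_all_seven, 5))          # about 0.0015, as stated above
[/code]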


I found the article referenced at the bottom of the csicop piece pretty informative as an intro to the problems of null hypothesis testing.... http://htpprints.yorku.ca/archive/00000234/01/03denis.html

In a coin-flip paradigm, let the null hypothesis be that the coin is fair. After many trials, if indeed the null is rejected, what shall we infer? Assuming we have an alternative conceptual hypothesis, this necessarily implies a statistical alternative hypothesis. Although the statistical hypothesis may be easily inferred (upon rejection of the null), this does not necessarily suggest the conceptual can be inferred with equal ease. If p < .05, we will reject the null and infer the statistical alternative. This follows from the Neyman-Pearson model of hypothesis-testing. However, we may not be so willing to infer the conceptual alternative, especially if it is something that is not a plausible explanation for why the null was rejected. For instance, we would hardly infer an alternative such as the coin is governed by spiritual agents (given say, many consecutive heads) that are invisible in this room. This conceptual alternative is unlikely in that it would probably not be suggested by a social scientist. The scientist would more likely infer something of the nature that the coin is biased, due to a physical defect. This would be a more “common-sense” alternative to chance factors having produced the successive heads. What is crucial to note is that the alternative could comprise of almost anything and is not restricted to a particular number of hypotheses. Indeed, the logical possibility of alternative hypotheses is practically infinite. The point at hand is that although the “spiritual agents” hypothesis would likely receive little attention, it cannot be refuted based solely on the logic of hypothesis-testing, and certainly not by any statistical procedure. Simply because we have inferred a statistical alternative in no way guarantees that we have inferred the correct conceptual alternative. Indeed, whether the statistical alternative implies any justification for the conceptual alternative is debatable.
snip

What should be concluded from the above discussion? Is it that we should not even attempt inferences of alternative hypotheses? Certainly not. What is to be noted by this short exposition is to recognize the “leap” made when inferring conceptual alternative hypotheses, and to be intimately aware of it. Too many times the statistical and conceptual alternatives are conflated, and the latter is assumed to be correct based merely on the “statistical truth” of the former. The primary goal of this note was to clarify that these two hypotheses cannot and should not be equated. To do so constitutes a methodological error. Furthermore, as demonstrated by the above examples, construing the alternative requires rigid isolation of experimental variables, and even then it is difficult to conclude that the correct alternative hypothesis (out of a presumably infinite supply) has been selected.
 
If someone truly wishes to challenge with a ganzfeld experiment, I'm willing to volunteer to write a computer program to allow the experiment to proceed without active supervision. That should reduce the work on the volunteers enough to make supervision doable.

I'm presuming, of course, that Randi is willing to accept having participants observed by camera in separate locked rooms. (The participants having been searched for phones, and with people occasionally peeking in.)

Cheers,
Ben
I doubt if Randi will accept anything less than active supervision, but you might e-mail him.
 
Can someone point me at a website that has a good explanation, in layman's terms, of why statistical significance is important in an experiment? I'd like to read up a bit on what makes a scientific result statistically significant, and what expressions of error mean (the term escapes me right now, but I'm talking about what is expressed as a +/- range of accuracy in experimental results).

Best statistical manuals written yet:

Snedecor, George W. and Cochran, William G. (1989), Statistical Methods, Eighth Edition, Iowa State University Press.

Cochran, William G. and Cox, Gertrude M. (1957), Experimental Designs, Second Edition, Wiley.

Check 'em out. Add 'em to your library.
 
this is in part a problem of language: statisticians know what they mean, but it often sounds like "rejecting the null" implies the claim that the null is "wrong". it simply does not mean this.
So you are drawing a distinction between "I reject the null" and "I think that the null is false"?

i would think both misleading, and both likely to be misunderstood by the "woman in the street" (and most undergraduate statistics majors).

(and independently: of course the null is "false"! all models are wrong: are you aiming for utility or Truth?)

while i believe there are often better ways of communicating evidence, i'd not reject this style of hypothesis testing totally (as many professionals do). nevertheless the implications are widely misunderstood (even when they are coherent).
 
My previous thought experiment seems not to have been sufficiently convincing. So here's an even better one.

An urn contains a hundred balls. Either all of them are labelled "1", or one is labelled "3" and the rest "2". These are known to be the only two possibilities. Call the second the null hypothesis.

To decide between the two hypotheses, an experiment is to be done which consists of removing one ball from the urn and looking at the number printed on it. I devise the following statistical significance test: Reject the null hypothesis if the number is odd.

If the null hypothesis is true, the probability (alpha) of rejecting it is a respectably low 1%. If it's false, I'll definitely reject it, so the test's power is as good as one could possibly hope for.

Great test, right?

I reach in and pull out a ball. Today is not my lucky day. I get the "3". The test says I should reject the null hypothesis. But I know that it's true!

Clearly, that a significance test has low alpha and high power is no guarantee that it will not lead to extremely silly inferences.

By "silly inferences," of course I do not mean reaching a conclusion which is in fact false. This cannot always be avoided, although it can be in the present experiment. Rather, I mean reaching a conclusion which, in light of the experimental result obtained, is known to be false; and, more generally, reaching a conclusion which the experimental result provides a great deal of evidence against.

As this thought experiment shows, I can conclude nothing about which hypothesis an experimental result is evidence for, from the fact that the result was improbable given the null hypothesis, if I don't also know how probable it was given the alternative hypothesis. Getting a "3" had only a 1% chance on the null hypothesis, but it was absolutely impossible on the alternative hypothesis, and 1% is still infinitely greater than zero.

By changing the numbers on the balls under the null and alternative hypotheses, I can cause the hypothesis test "reject the null hypothesis if the number is odd" to have all sorts of different values for its level of significance (alpha) and its power. And yet, it's all totally irrelevant to the proper conclusion on seeing a "3". Provided that "3" is impossible on one of the two hypotheses, the obvious conclusion is that the other hypothesis is certainly the true one.
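A tiny calculation makes the point explicit. The 50:50 prior here is an arbitrary assumption; any nonzero prior on the null gives the same answer once a "3" is drawn.

[code]
# Posterior probability of each hypothesis in the urn example after drawing
# a "3".  The 0.5 prior is arbitrary; any nonzero prior on the null works.
prior_null = 0.5                 # one ball labelled "3", ninety-nine labelled "2"
prior_alt = 1 - prior_null       # all hundred balls labelled "1"

p_draw3_given_null = 1 / 100
p_draw3_given_alt = 0.0          # a "3" is impossible if every ball is a "1"

posterior_null = (prior_null * p_draw3_given_null) / (
    prior_null * p_draw3_given_null + prior_alt * p_draw3_given_alt)
print(posterior_null)            # 1.0: the null is certain, despite "reject the null"
[/code]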
 
