Psi in the Ganzfeld

Not as easy as I thought, but I've certainly made a dent. None of the choices I made about what to include or exclude are particularly controversial, and I'm sure a more skeptical person than myself could find other reasons to filter the data.

First, with regards to the pre-1983 data, I reintroduced the 50% MCE experiments, as per Radin's criterion of using only experiments with a hit/miss method of scoring. Then, following Utts' lead, I excluded Sargent's data due to issues with the protocol. I also took out Terry's 1976 work, after criticisms by Parker and Kennedy (more on which later), and also York's, since the paper did not report the results according to the primary method, but instead by a secondary scoring method.

I then reintroduced data from post-1983 which were not carried out by the five laboratories covered by Radin's work, although I removed Bierman's 1987 experiment since the experimenter in the room during the judging knew what the target was, so there was a problem with possible subliminal cueing.

I also removed any paper which did not give numerical results, nor any description from which a reasonable estimate could be made.

To remove any problems with optional stopping, I cut out those experiments that did not have a pre-set number of trials, or that did not complete the pre-set number of trials.

To address the problem of informal experiments being published only if they should happen to get good results, I also removed any experiments that were explicitly labelled as pilots, or were media or classroom demonstrations, or student work, or experiments of twenty trials or fewer.

Then I took out experiments that did not use white or pink noise (i.e., static) or silence as the auditory stimulus, or that didn't have a random selection of targets. Finally, I took out experiments that used audio targets.

79 experiments, 3960 trials, 1073 hits, 27% average hit rate, Stouffer z of 2.71, or odds of 1 in 297

I could've done more, but I don't have much time to check the effect of outliers, nor to go through each experiment and read the paper again to make sure I wasn't missing any that should've been excluded. Nevertheless, I'm confident I haven't made any major gaffes.

The point of the exercise is to show how easy it is to make a perfectly sensible-looking meta-analysis that agrees with your pre-existing hypothesis. This took me about an hour to do, and was largely a question of sorting the database according to z-scores, and looking at the positive experiments to see what flaws they shared. However, it wouldn't take a genius to write this up as if these criteria were decided upon before any analysis was carried out, and as if these were simply the results we got.

For the record, what really made a difference was when I noticed how many of the highest z-scores came from experiments with very few trials. Removing the shortest experiments really took a chunk off the results.

ETA: mucked about for a bit more: by excluding all experiments of thirty trials or fewer and then removing Dalton's 1997 work as an outlier, the Stouffer z falls to 1.97, or about 1 in 41.
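For anyone who wants to see the mechanics, here's a minimal sketch in Python (made-up numbers, not the actual database) of the filter-and-recombine step I'm describing, assuming per-study z-scores computed against the 25% chance rate:

```python
# Minimal sketch of the exercise: filter a table of (hypothetical) studies
# by exclusion criteria, then combine the survivors' z-scores with
# Stouffer's method.
import numpy as np
import pandas as pd
from scipy.stats import norm

# Hypothetical study records; the real database has far more entries.
studies = pd.DataFrame({
    "trials": [10, 20, 32, 40, 100],
    "hits":   [6, 9, 11, 12, 30],
    "pilot":  [True, False, False, False, False],
})

# Per-study z-score against a 25% chance hit rate (normal approximation).
p0 = 0.25
expected = studies["trials"] * p0
sd = np.sqrt(studies["trials"] * p0 * (1 - p0))
studies["z"] = (studies["hits"] - expected) / sd

# Apply exclusion criteria (here: no pilot studies, more than thirty trials).
kept = studies[(~studies["pilot"]) & (studies["trials"] > 30)]

# Stouffer's combined z and its one-tailed p-value.
stouffer_z = kept["z"].sum() / np.sqrt(len(kept))
print(f"kept {len(kept)} studies, Stouffer z = {stouffer_z:.2f}, "
      f"p = {norm.sf(stouffer_z):.4f}")
```

The headline figure moves every time a single line in that filter changes, which is rather the point.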
 
Okay, and just before the weekend, here's a brief bit about the Terry experiments I mentioned earlier. I'm quoting this chunk from my articles on the ganzfeld as an example of a flawed experiment.

Honorton, Terry, "Psi-Mediated Imagery and ideation in the ganzfeld: a confirmatory study", Research in Parapsychology 1974, 1975
"The present study is an attempt to confirm the findings of our previous investigation. Eighteen undergraduate honors students enrolled in C.H.Ts parapsychology honors seminar at St. John's University participated. The students were divided into six experimental teams, consisting of three students per team. The experimental protocol called for each team to complete ten experimental sessions with one member of the team serving as subject, another as agent, and a third as recording experimenter. Due to a hospital workers’ strike which closed the laboratory building, only one of the six teams completed all ten sessions. The other teams completed between three and seven sessions each. Twelve of the students served as subjects, completing between one and four sessions each."


It should be noted that the report details the results of only those trials where Honorton or Terry were personally observing. Thus of the 38 completed sessions, only the results from 27 were reported.

The second study was written by Terry alone.

Terry, "A multiple session ganzfeld study", Research in Parapsychology 1974, 1975
"The subjects were self-selected volunteers and the agents were either friends of the subjects or acquaintances drawn from the Maimonides staff. The experimental plan called for 10 subject-agent pairs to complete 10 sessions each, following the procedures used by Honorton and Harper. Six of the 10 pairs completed their 10 sessions, but the others did not. The data reported here include only those pairs who completed all 10 sessions."


This experiment, with its 60 sessions and its 45% hit rate, became one of the most criticised of the early ganzfeld experiments. Firstly, as with all the early Maimonides work, the randomisation process was less than optimal:

Terry, "A multiple session ganzfeld study", Research in Parapsychology 1974, 1975
"The specific target for a given session was selected randomly by shuffling a deck of 31 numbered cards corresponding to the number of target packets. Each packet contained four thematically heterogeneous slide reels. The uppermost reel of the selected packet served as target for the session. [...] At the end of the "sending period," the agent replaced the target reel in the packet and shuffled it with the three other reels in the packet."


Meanwhile, Kennedy criticised the Terry experiment for its lack of thoroughness in publishing all of the results, even those from the uncompleted series.

Kennedy, "Methodological issues in free-response psi experiments", JASPR 73, 1979
"When several subjects are tested in multiple sessions, the experimenter may, for various reasons, choose to discard the data for subjects who did not complete the intended number of sessions. This selection is not intrinsically improper, but the results for the discarded data should be reported. In ESP tests, those subjects who do poorly on the first sessions may become discouraged and drop out, while those initially doing well will finish the required number of sessions. This situation would create a biased sample. A confirmatory ganzfeld experiment (Terry and Honorton, 1976, p. 211) is a recent example in which results for the discarded data were not reported."


Honorton evidently replied in the same issue, but I do not have access to his paper, only Kennedy's reply.

Kennedy, "More on methodological issues in free-response psi experiments", JASPR 73, 1979
"Honorton comments (p. 397) that I unfairly implied that a study by Terry and Honorton (1976) was biased due to data selection. In fact, I stated that the discarded data should be reported and explained why this is so without making any implications about the outcome of that report. The hypothesis of data selection was and will remain a viable alternative to the ESP hypothesis until it is shown (empirically) that the data are not in accordance with the selection hypothesis. While the absence of a decline in the selected data is favorable to the ESP hypothesis, the most important analysis is to show that the discarded data do not have a lower scoring rate than the selected data—particularly the first few trials of the selected data."


Parker and Wiklund also had something to add to the debate:

Parker, Wiklund, "The ganzfeld experiments: towards an assessment", Journal of Parapsychology 54, 1987
"Two further Maimonides studies by Terry and Honorton in 1976 were judged to be flawed because of their procedure of eliminating each used target in the series from future selection. This could increase the chance expectation from 1/4 to 1/3 (or higher) if subjects gained knowledge of the target pool. Parker, however, calculated in a worse case analysis of the number of subjects relative to the number of target packs re-appearing, that such an effect would be negligible. Wiklund regarded it, nevertheless, as a serious methodological flaw. In addition to this, the series had other flaws concerning the randomisation of the target series and its reconstruction after viewing."

So, poor randomisation, incomplete data and no guarantee of a 25% MCE. Any thoughts, anyone?
 
79 experiments, 3960 trials, 1073 hits, 27% average hit rate, Stouffer z of 2.71, or odds of 1 in 297

According to the binomial distribution, the probability of obtaining at least 1073 hits in 3960 trials with an expected hit rate of 25% is 0.118%, or about 1 in 845. But even your odds of 1 in 297 are easily significant at the relatively stringent 1% level. So how do these probabilities support your argument?
 
According to the binomial distribution, the probability of obtaining at least 1073 hits in 3960 trials with an expected hit rate of 25% is 0.118%, or about 1 in 845. But even your odds of 1 in 297 are easily significant at the relatively stringent 1% level. So how do these probabilities support your argument?

If he can get the odds down from quintillions-to-one to 300-to-one or 1000-to-one in one hour (or let's assume a few days if he's secretly been working on this all Christmas), then it seems reasonable he could reduce it further with some more time.

Additionally, he is still working under the assumption that studies that cannot be excluded for verifiable reasons are good studies. This is not a reasonable assumption. Of course we cannot always tell if there was an error in an experiment only by looking at the available information. Many mistakes, and outright fraud, can happen without leaving any obvious clues. What we could do, perhaps, is calculate how many studies would have to be cocked up, or how many scientists would have to be fraudsters, in order to get the result down to chance. But the assumption of a normal distribution is not reasonable.


Edit: And additionally, while Ersby has removed studies that could be suspected of having a very high 'file drawer effect', we cannot assume the 'file drawer effect' to be zero for the remaining studies. Even large studies are sometimes cancelled before publication. To correct for this effect, it is not enough to remove studies; we also need to figure out how many below-chance results would have to be added in order to get down to chance expectations. If this number is reasonably small, the file drawer effect can be suspected to be a major explanation.
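As a rough illustration of what such a calculation could look like, here is a sketch of Rosenthal's fail-safe N, which asks the slightly simpler question of how many unpublished chance-level (z = 0) studies would have to exist to pull the combined result below significance. The inputs are just the headline figures quoted earlier in the thread; nothing here comes from the actual database.

```python
# Sketch of Rosenthal's fail-safe N: the number of unreported studies
# averaging z = 0 needed to drag a Stouffer-combined result below the
# one-tailed 5% threshold (z = 1.645).
import math

def failsafe_n(sum_of_z, n_studies, z_crit=1.645):
    # Solve sum_of_z / sqrt(n_studies + n_missing) = z_crit for n_missing.
    n_missing = (sum_of_z / z_crit) ** 2 - n_studies
    return max(0, math.ceil(n_missing))

# Headline figures from the thread: 79 studies, Stouffer z of 2.71,
# so the per-study z-scores sum to roughly 2.71 * sqrt(79).
print(failsafe_n(2.71 * math.sqrt(79), 79))  # about 136 for these inputs
```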
 
According to the binomial distribution, the probability of obtaining at least 1073 hits in 3960 trials with an expected hit rate of 25% is 0.118%, or about 1 in 845. But even your odds of 1 in 297 are easily significant at the relatively stringent 1% level. So how do these probabilities support your argument?
If my understanding of statistics is correct, when summing the total results of numerous experiments, simply treating them as one great big experiment and working out the z-score of that is incorrect. Something called the Stouffer z is used. That's my understanding, at least.

And let me repeat, the purpose of the exercise was to demonstrate how easy it was to construct a meta-analysis to support any particular theory. It took just one hour (honest, I haven't been working on this over Christmas!) to get the results down as low as 1 in 297, and then a further hour to get it down to 1 in 41, which is no longer significant at p=0.01.

There's no doubt in my mind that someone else coming along with a pro-psi view could take my figures and re-jig them to get the result that they want. And so it can go back and forth. This is why I think the total database of ganzfeld experiments is not solid enough to be considered strong proof of psi. There are simply too many issues still unanswered.
 
Additionally, he is still working under the assumption that studies that cannot be excluded for verifiable reasons are good studies. This is not a reasonable assumption.

Absolutely. A very similar point was made way back in the Journal of Parapsychology in 1986, after Hyman and Honorton examined the earliest ganzfeld experiments. After beginning by saying that he feels Honorton wins on points, the author, Christopher Scott, makes some interesting observations:

"More seriously, I would question the value of the meta-analysis approach as a basis for psi skepticism. It is unrealistic to hope to find all the flaws in a large corpus of work by studying the published reports. The strategy would make sense only if one assumed that the reports were accurate. In my view, reporting deficiencies are easier to accept than psi. Consider two categories of error source.

Fraud. Given the existing motivation structure of parapsychology as a profession, it is reasonable to expect some fraudulent experiments. (Practising parapsychologists such as J. B. Rhine and Carl Sargent have publicly stated their belief that fraudulent experiments are not unusual in parapsychology, and of course there have been several celebrated exposures.) A fraudulent experiment will naturally be supported by a dishonest report, and Hyman's approach, being entirely based on the report, will find nothing wrong.

Self-deception. Some (perhaps many) experimenters are slipshod in their laboratory work. Some of them will tidy up the mess in writing the report. (A well-known example is provided by the Brugmans experiment; see my paper in Research in Parapsychology, 1982.) Again, Hyman will find nothing wrong."
 
If my understanding of statistics is correct, when summing the total results of numerous experiments, simply treating them as one great big experiment and working out the z-score of that is incorrect. Something called the Stouffer z is used. That's my understanding, at least.

I don't think that's correct. By aggregating all studies into one meta-analysis, appropriate weight can be given to each study; e.g., a study with 25 trials will have one-quarter the weight of a study with 100 trials.
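To make the difference concrete, here's a sketch (with made-up numbers) contrasting the two approaches being discussed: pooling every trial into one binomial test, which weights each study by its size, versus the unweighted Stouffer combination of per-study z-scores.

```python
# Contrast two ways of combining studies: one big binomial test (trial
# counts do the weighting) versus Stouffer's method (each study's z-score
# counts equally, regardless of size).
import numpy as np
from scipy.stats import binom, norm

trials = np.array([25, 50, 100])   # hypothetical study sizes
hits   = np.array([10, 14, 27])    # hypothetical hit counts
p0 = 0.25                          # chance hit rate

# Method 1: pool everything into a single binomial tail probability.
pooled_p = binom.sf(hits.sum() - 1, trials.sum(), p0)   # P(X >= total hits)

# Method 2: Stouffer's z over per-study z-scores.
z = (hits - trials * p0) / np.sqrt(trials * p0 * (1 - p0))
stouffer_p = norm.sf(z.sum() / np.sqrt(len(z)))

print(pooled_p, stouffer_p)  # the two can give noticeably different answers
```

Whether a small study should count as much as a large one is precisely the sort of judgment call being argued about here.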

And let me repeat, the purpose of the exercise was to demonstrate how easy it was to construct a meta-analysis to support any particular theory. It took just one hour (honest, I haven't been working on this over Christmas!) to get the results down as low as 1 in 297, and then a further hour to get it down to 1 in 41, which is no longer significant at p=0.01.
But your methodology doesn't make sense to me. You can't simply exclude studies on the basis that they are too small or exclude a large study as an outlier because it produced too many hits. I'll e-mail Beth, who is a Randi forum member and professional statistician, and see if she is willing to weigh in on the validity of your methodology.

There's no doubt in my mind that someone else coming along with a pro-psi view could take my figures and re-jig them to get the result that they want. And so it can go back and forth. This is why I think the total database of ganzfeld experiments is not solid enough to be considered strong proof of psi. There are simply too many issues still unanswered.
I disagree but, again, let's see if Beth is willing to weigh in.
 
I'll e-mail Beth, who is a Randi forum member and professional statistician, and see if she is willing to weigh in on the validity of your methodology.

That sounds like an excellent idea. I'm always happy to be proven wrong. But don't forget to ask her about the validity of Radin's methodology, too. Let's keep this a fair playing field, after all.
 
That sounds like an excellent idea. I'm always happy to be proven wrong. But don't forget to ask her about the validity of Radin's methodology, too. Let's keep this a fair playing field, after all.
Okay, I just e-mailed her. I specifically asked her to evaluate your post #41, but hopefully she will have the time to get up to speed on the entire issue, including Radin's methodology.
 
I'm not going to try to understand Radin's methodology. Ersby's point is basically sound: you can't trust meta-analysis results. There are a number of subjective decisions that have to be made regarding which studies to include and which to exclude. By adjusting the criteria, the outcome can be manipulated to produce contradictory results. Thus, meta-analysis isn't a good tool for any type of controversial conclusion.

I think it can be a useful exploratory tool for earnest investigators, but in order to feel confident in the results of any particular analysis, you would have to study not only how the analysis was done, but all of the experiments that were/might have been included. Personally, I don't have the desire to spend that kind of time on the Ganzfeld experiments.
 
Thanks for the vote of confidence, Beth. Did you have anything to say about the Stouffer z?

If I'm wrong in using it, then a lot of parapsychology authors have made the same mistake, since it was while reading parapsychological papers that I learnt about this. In fact, if you do a search on "stouffer z" on google, a lot of parapsychological papers come up. This has always bothered me - to use a statistical method which doesn't seem to be too frequently used in other scientific fields.
 
According to the binomial distribution, the probability of obtaining at least 1073 hits in 3960 trials with an expected hit rate of 25% is 0.118%, or about 1 in 845. But even your odds of 1 in 297 are easily significant at the relatively stringent 1% level. So how do these probabilities support your argument?


I am not sure about this. Why is a twenty-seven per cent hit rate anything other than random chance? Why is it significant in a statistical sense? If you toss a coin, it is only over long runs that the distribution approaches 50%.

That is why the standard deviation is an important statistic. What is the standard deviation in the Ganzfeld, and why don't people discuss it?
 
Thanks for the vote of confidence, Beth. Did you have anything to say about the Stouffer z?

If I'm wrong in using it, then a lot of parapsychology authors have made the same mistake, since it was while reading parapsychological papers that I learnt about this. In fact, if you do a search on "stouffer z" on google, a lot of parapsychological papers come up. This has always bothered me - to use a statistical method which doesn't seem to be too frequently used in other scientific fields.

I don't know about the Stouffer z. I don't recall having studied it, though some things only get a brief mention. I would think a jackknife approach would be the best to use, but as I haven't read the papers I don't know if they tried that.
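For what it's worth, here is a minimal sketch (hypothetical numbers) of how a jackknife, leave-one-out pass over per-study z-scores might look:

```python
# Jackknife (leave-one-out) sensitivity check: recompute the Stouffer z
# with each study removed in turn, to see how much any single study
# drives the combined result.
import numpy as np

def stouffer(zs):
    zs = np.asarray(zs, dtype=float)
    return zs.sum() / np.sqrt(len(zs))

def jackknife_stouffer(zs):
    zs = np.asarray(zs, dtype=float)
    return [stouffer(np.delete(zs, i)) for i in range(len(zs))]

# Hypothetical per-study z-scores with one large outlier.
zs = [0.3, -0.2, 0.5, 0.1, 3.2]
print(stouffer(zs))            # combined z with every study included
print(jackknife_stouffer(zs))  # combined z with each study dropped in turn
```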
 
I am not sure about this. Why is a twenty-seven per cent hit rate anything other than random chance? Why is it significant in a statistical sense? If you toss a coin, it is only over long runs that the distribution approaches 50%.

That is why the standard deviation is an important statistic. What is the standard deviation in the Ganzfeld, and why don't people discuss it?

It has to do with the number of trials. It's significant in a statistical sense, meaning that the probability of being that far from the expected mean is less than 5%. The standard deviation isn't usually quoted because, given the probability of success and the number of trials, you can compute it directly.
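To spell that out, here is the arithmetic for the figures quoted above:

```python
# With n trials and hit probability p, the expected number of hits and
# its standard deviation follow directly from the binomial distribution.
import math

n, p = 3960, 0.25
mean = n * p                      # 990 expected hits
sd = math.sqrt(n * p * (1 - p))   # about 27.2
z = (1073 - mean) / sd            # roughly 3.0 for the 1073 hits reported
print(mean, sd, z)
```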
 
I'm not going to try to understand Radin's methodology. Ersby's point is basically sound: you can't trust meta-analysis results. There are a number of subjective decisions that have to be made regarding which studies to include and which to exclude. By adjusting the criteria, the outcome can be manipulated to produce contradictory results. Thus, meta-analysis isn't a good tool for any type of controversial conclusion.

I think it can be a useful exploratory tool for earnest investigators, but in order to feel confident in the results of any particular analysis, you would have to study not only how the analysis was done, but all of the experiments that were/might have been included. Personally, I don't have the desire to spend that kind of time on the Ganzfeld experiments.
Thanks for weighing in, Beth. However, I'm a little unclear about your position on meta-analysis, and so let me ask you a specific question: Assuming that there are several experiments that use the same methodology (for example, ganzfeld experiments in which a recipient attempts to determine which of four photographs was telepathically transmitted) and that have no known bias, would it be proper or improper to aggregate these experiments into a meta-analysis?
 
According to the binomial distribution, the probability of obtaining at least 1073 hits in 3960 trials with an expected hit rate of 25% is 0.118%, or about 1 in 845.
I get 0.11836% for strictly greater than 1073 hits, but 0.13333% for at least 1073 hits.
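For anyone checking, here is how the two tail probabilities differ (scipy's survival function is strict, so the "at least" case shifts the argument down by one):

```python
# "At least 1073" versus "strictly more than 1073" hits in 3960 trials.
from scipy.stats import binom

n, p = 3960, 0.25
print(binom.sf(1072, n, p))  # P(X >= 1073), the "at least" figure
print(binom.sf(1073, n, p))  # P(X >  1073), the strictly-greater figure
```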
 
If my understanding of statistics is correct, when summing the total results of numerous experiments, simply treating them as one great big experiment and working out the z-score of that is incorrect. Something called the Stouffer z is used. That's my understanding, at least.
I'd expect both to give the same answer here.

Can you give more details about how you computed the Stouffer z of 2.71?
 
I used the formula (sum of z-scores)/(sq rt of number of trials)

I hope that's right.
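For reference, the textbook form of Stouffer's method divides the sum of the per-study z-scores by the square root of the number of z-scores being combined; a minimal sketch:

```python
# Textbook Stouffer combination: under the null each study's z-score is
# standard normal, so their sum has variance k, and dividing by sqrt(k)
# gives another standard normal statistic.
import math

def stouffer_z(z_scores):
    return sum(z_scores) / math.sqrt(len(z_scores))

print(stouffer_z([1.2, -0.4, 0.9, 2.1]))  # hypothetical per-study z-scores
```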
 
Thanks for weighing in, Beth. However, I'm a little unclear about your position on meta-analysis, and so let me ask you a specific question: Assuming that there are several experiments that use the same methodology (for example, ganzfeld experiments in which a recipient attempts to determine which of four photographs was telepathically transmitted) and that have no known bias, would it be proper or improper to aggregate these experiments into a meta-analysis?

I can't say more than maybe from this brief description. Certainly it's not necessarily a problem to do so, but I would have to read through the reports of the different experiments before I could make that assessment. The problem that I see with meta-analysis isn't so much the proper/improper aspect, but the fact that in order to properly assess one that's been done requires a huge amount of investigative work. Thus, unless you're willing to do the research - i.e. determine all possible experiments to be included, assess the inclusion criteria, review the experiments for how well they meet the criteria - you're left with trusting the folks who did it to be honest and unbiased in their assessment.

It really comes down to how much you trust in the people who did the work to have done a good job, reducing their own subjective biases to the greatest extent possible. For my part, I think meta-analysis is useful to researchers who are analyzing the data but not a good tool to convince others of a particular conclusion. There are too many subjective judgments involved in setting up the analysis that can drastically impact the results.
 
79 experiments, 3960 trials, 1073 hits, 27% average hit rate, Stouffer z of 2.71, or odds of 1 in 297

I used the formula (sum of z-scores)/(sq rt of number of trials)
The denominator was the square root of 79? And the numerator was the sum of 79 z-scores, one for each of the 79 experiments? And the individual z-scores you didn't calculate yourself, but just used whatever was reported by the experimenters?

Or something else?
 
