Psi in the Ganzfeld

The denominator was the square root of 79? And the numerator was the sum of 79 z-scores, one for each of the 79 experiments?

Yes.

And the individual z-scores you didn't calculate yourself, but just used whatever was reported by the experimenters?

Or something else?

Where possible I used reported z-scores, or sometimes calculated them myself. On those few occasions where a result was given as "at chance" or "unsuccessful", I gave it a z-score of zero.
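For reference, here is a minimal sketch of the combination being described, assuming each experiment's z-score is standard normal under chance (an illustration only, not the original calculation):

    import math

    def stouffer_z(z_scores):
        # Combined (Stouffer) z: the sum of the individual z-scores divided
        # by the square root of the number of experiments. Under chance each
        # z is standard normal, so the sum has variance equal to the count.
        return sum(z_scores) / math.sqrt(len(z_scores))

    # e.g. the 79 experiment z-scores, with "at chance"/"unsuccessful"
    # results entered as 0.0, as described above:
    # combined = stouffer_z(z_list)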
 
I can't say more than maybe from this brief description. Certainly it's not necessarily a problem to do so, but I would have to read through the reports of the different experiments before I could make that assessment. The problem I see with meta-analysis isn't so much the proper/improper aspect, but the fact that properly assessing one that's already been done requires a huge amount of investigative work. Thus, unless you're willing to do the research - i.e. determine all possible experiments to be included, assess the inclusion criteria, review the experiments for how well they meet the criteria - you're left with trusting the folks who did it to be honest and unbiased in their assessment.
Okay, but is it ever proper to exclude an experiment from any type of aggregative statistical analysis solely based upon that experiment's small size? Also, is it ever proper to exclude an experiment as an "outlier" from an aggregative statistical analysis solely based upon the results being highly significant?

It really comes down to how much you trust in the people who did the work to have done a good job, reducing their own subjective biases to the greatest extent possible. For my part, I think meta-analysis is useful to researchers who are analyzing the data but not a good tool to convince others of a particular conclusion. There are too many subjective judgments involved in setting up the analysis that can drastically impact the results.
But are not subjective judgments made in all scientific research? If so, why isn't there the same skepticism about, say, cancer research findings as there is about psi research findings?
 
Okay, but is it ever proper to exclude an experiment from any type of aggregative statistical analysis solely based upon that experiment's small size?

Also, is it ever proper to exclude an experiment as an "outlier" from an aggregative statistical analysis solely based upon the results being highly significant?

Yes, you could exclude an experiment based on small sample size or because that experiment is statistically significantly different from the others. Whether or not it's "proper" depends to a very large extent on how the results are being used. Sometimes it's a reasonable decision to make for that situation and experimental data.

But are not subjective judgments made in all scientific research? If so, why isn't there the same skepticism about, say, cancer research findings as there is about psi research findings?


Yes, it's inherent to all scientific research. The same skepticism is shown to any research findings that are contrary to established dogma in any field.
 
Yes, you could exclude an experiment based on small sample size or because that experiment is statistically significantly different from the others. Whether or not it's "proper" depends to a very large extent on how the results are being used. Sometimes it's a reasonable decision to make for that situation and experimental data.
Let me give you a hypothetical: Laboratories A, B, and C each conduct eight separate experiments, with 25 trials in each experiment (total of 600 trials in 24 experiments). Laboratories D, E, and F each conduct two separate experiments, with 100 trials in each experiment (total of 600 trials in 6 experiments). Laboratories G, H, and I each conduct one experiment, with 200 trials in each experiment (total of 600 trials in 3 experiments). Protocols appear to be identical in all 33 experiments, with no obvious bias in any of them, but hit rates are higher in the experiments conducted by Laboratories A, B, and C. Are you saying that it might be proper to aggregate the 1200 trials conducted by Laboratories D, E, F, G, H, and I, but exclude the 600 trials conducted by Laboratories A, B, and C? If so, why?

Yes, it's inherent to all scientific research. The same skepticism is shown to any research findings that are contrary to established dogma in any field.
I agree with your choice of the word "dogma." ;) But why should not an equal amount of skepticism be applied to all research findings? And what statistical burden would psi research have to meet to convince you that psi is real?
 
Let me give you a hypothetical: Laboratories A, B, and C each conduct eight separate experiments, with 25 trials in each experiment (total of 600 trials in 24 experiments). Laboratories D, E, and F each conduct two separate experiments, with 100 trials in each experiment (total of 600 trials in 6 experiments). Laboratories G, H, and I each conduct one experiment, with 200 trials in each experiment (total of 600 trials in 3 experiments). Protocols appear to be identical in all 33 experiments, with no obvious bias in any of them, but hit rates are higher in the experiments conducted by Laboratories A, B, and C. Are you saying that it might be proper to aggregate the 1200 trials conducted by Laboratories D, E, F, G, H, and I, but exclude the 600 trials conducted by Laboratories A, B, and C? If so, why?
Yes, it might be proper. For example, one effect you might be concerned about is whether there is a change between early experiments and later ones, and 25 trials could be insufficient for that purpose. It all depends on what the research question of interest is and how it would be affected (or not) by the inclusion or exclusion of experiments with a smaller number of trials.
I agree with your choice of the word "dogma." ;) But why should not an equal amount of skepticism be applied to all research findings?
No reason not to. It's just that research that confirms current thinking is rarely challenged - an aspect of human nature more than anything else I imagine.

And what statistical burden would psi research have to meet to convince you that psi is real?

Good question. I don't know. It's a question I've been mulling over for over two years now. But I haven't yet arrived at a firm conclusion.
 
Are you saying that it might be proper to aggregate the 1200 trials conducted by Laboratories D, E, F, G, H, and I, but exclude the 600 trials conducted by Laboratories A, B, and C? If so, why?

I excluded the smallest experiments because I wanted to minimise the effect of informal experiments being published only if they were successful.

By labelling the shortest experiments A, B and C, you seem to be telling yourself that they are the earliest of the experiments. In truth, this is not so. You would have been better to label them experiments A, G and J.

If the effect of psi is genuine, then removing the smallest experiments should strengthen the results, since if an effect is real, the more data you have, the clearer the effect should become. Instead, the best results come from the smallest experiments. This is the opposite of what you'd expect if the effect were real.
 
If the effect of psi is genuine, then removing the smallest experiments should strengthen the results, since if an effect is real, the more data you have, the clearer the effect should become.

Why? If it was real, then surely we'd get the same results for any kind of test (albeit with much greater variation for small tests). But of course, even if the effect was real, there might still be a file drawer effect.

However, maybe you mean that if it was real, then surely all the large tests should give a positive result. Unless, of course, some researchers missed something crucial about how to do these tests, but what that would be is, I think, something for the pro-PSI researchers to suggest.
 
My thinking is that for a small effect like psi, it'd only be apparent as the experiment got larger. Small experiments would vary too much, as you said. If you were trying to detect a slight bias in a coin, ten or twenty flips wouldn't be enough; any difference from the expected 50% could be chance. You'd need to do an experiment with many trials.
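As a rough illustration of that point, here is a sketch of the usual sample-size calculation for a proportion; the 55% and 60% bias figures, the 5% significance level and the 80% power are illustrative assumptions only:

    import math

    def flips_needed(p1, p0=0.5, z_alpha=1.96, z_beta=0.8416):
        # Approximate number of flips needed to detect a coin whose true
        # heads rate is p1, using the normal approximation with a two-sided
        # 5% significance level and 80% power.
        num = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
        return math.ceil((num / (p1 - p0)) ** 2)

    print(flips_needed(0.55))   # about 783 flips for a 55% coin
    print(flips_needed(0.60))   # about 194 flips for a 60% coin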

Parapsychologists have long said that this raises the risk of boredom in the subject and/or experimenter, which would lessen the effect of psi.

Btw, the second paragraph in my last post doesn't seem to make any sense, and I apologise for that. I'm sure I knew what I meant at the time...
 
My thinking is that for a small effect like psi, it'd only be apparent as the experiment got larger. Small experiments would vary too much, as you said. If you were trying to detect a slight bias in a coin, ten or twenty flips wouldn't be enough; any difference from the expected 50% could be chance. You'd need to do an experiment with many trials.
Yes, but if you aggregate several small experiments, you obtain -- all things being equal -- the equivalent of one large experiment. You seem to justify excluding small experiments by speculating that only the ones that produced above the expected number of hits were published, but I fail to see any evidence supporting your speculation. Do you have any?
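A small sketch of that equivalence; the hit counts and the 25% chance rate (the usual four-choice ganzfeld design) are assumptions for illustration only:

    import math

    def binomial_z(hits, trials, p=0.25):
        # Normal-approximation z-score for 'hits' successes in 'trials'
        # attempts when the chance hit rate is p.
        return (hits - trials * p) / math.sqrt(trials * p * (1 - p))

    # 24 hypothetical small experiments of 25 trials each, pooled trial by trial...
    small = [(8, 25)] * 24                     # 8 hits out of 25 in each
    pooled_hits = sum(h for h, n in small)     # 192
    pooled_trials = sum(n for h, n in small)   # 600

    # ...give exactly the same z as one 600-trial experiment with 192 hits,
    # since the z depends only on the total hits and total trials.
    print(binomial_z(pooled_hits, pooled_trials))
    print(binomial_z(192, 600))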

Parapsychologists have long said that this raises the risk of boredom in the subject and/or experimenter, which would lessen the effect of psi.
Don't you think that's a possibility in a long experiment?

Btw, the second paragraph in my last post doesn't seem to make any sense, and I apologise for that. I'm sure I knew what I meant at the time...
I thought that paragraph exhibited the same logic as the others. ;)
 
You seem to justify excluding small experiments by speculating that only the ones that produced above the expected number of hits were published, but I fail to see any evidence supporting your speculation.

He does not assume this. He assumes that (small) studies that produced above the expected number of hits were published with a higher probability, not that these would be the only ones to be published.

Evidence supporting this speculation comes from a) the observation that such ad-hoc tests often occur, with no significant results and no articles getting published (unless someone reports it here at the JREF, perhaps), and b) the data discussed in this thread showing a higher rate of positive results for small studies than for large studies, as this speculation predicts.

Anyway, the underlying assumption that PSI effects would be small is very strange in the first place. If we had this extremely useful ability, surely evolution would have selected it to be reliable, not very close to chance expectation? Or do people believe that this is an ability that we have acquired very recently, giving no time for evolution to improve it?
 
He does not assume this. He assumes that (small) studies that produced above the expected number of hits were published with a higher probability, not that these would be the only ones to be published.

Evidence supporting this speculation comes from a) the observation that such ad-hoc tests often occur, with no significant results and no articles getting published (unless someone reports it here at the JREF, perhaps)
Whose observation? Can you document with a specific example?

and b) the data discussed in this thread showing a higher rate of positive results for small studies than for large studies, as this speculation predicts.
Or small studies may be more likely than large studies to produce positive results because of the fatigue/boredom factor in large studies.

Anyway, the underlying assumption that PSI effects would be small is very strange in the first place. If we had this extremely useful ability, surely evolution would have selected it to be reliable, not very close to chance expectation? Or do people believe that this is an ability that we have acquired very recently, giving no time for evolution to improve it?
One hypothesis is that psi used to be much stronger in early humans because it was more useful then. To the extent that we can communicate with morse code, radios, telephones, computers, etc., we don't need psi as much.
 
Whose observation? Can you document with a specific example?
Louie Savva, in the thread that started this thing, said it was common for informal experiments to be carried out, with only positive results getting any publicity. How far this applies to a cumbersome protocol like the ganzfeld I'm not sure, but I certainly think this should be taken into account.

Or small studies may be more likely than large studies to produce positive results because of the fatigue/boredom factor in large studies.

Or because successful experiments are more likely to be written up.

Plus, you have to be careful not to fall into the trap of inventing post hoc reasons to explain away bad results. Parapsychologists have a whole raft of explanations they can call upon when faced with contrary results.

Of course, these same reasons are never called upon when they have good results. You say that long-term experiments suffer from boredom/fatigue? Then what of the large-scale experiments that got good results? That goes against the boredom theory, but I'm sure you're not concerned that these experiments seem to be disobeying one of the tenets of psi.

One hypothesis is that psi used to be much stronger in early humans because it was more useful then. To the extent that we can communicate with morse code, radios, telephones, computers, etc., we don't need psi as much.

I doubt that evolution has made a major difference in the last two hundred years.
 
If my understanding of statistics is correct, when summing the total results of numerous experiments, simply treating them as one great big experiment and working out the z-score of that is incorrect. Something called the Stouffer z is used. That's my understanding, at least.

I'd expect both to give the same answer here.
Or ... maybe not. :D

I hadn't thought about it too deeply, but I've since thought about it some more and read some more, and I've changed my mind.

If a bunch of experiments are basically the same---so that it makes any sense, to begin with, to combine their individual trials---then I think combining them is better.

Even if the experiments are different, http://www.blackwell-synergy.com/doi/pdf/10.1111/j.1420-9101.2005.00917.x says, and I agree, that Stouffer's simple method isn't as good as a similar method which weights each study differently based on, roughly speaking, its size.

Or, to look at it from the other side, Stouffer's unweighted method in effect gives too much weight to small studies and not enough to big ones.

I think (today; who knows what tomorrow may bring) that combining the trials of a bunch of similar experiments will give roughly the same answer as the weighted method rather than the unweighted one.
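For concreteness, a sketch of the two combination rules being contrasted; weighting by the square root of each study's size is one common choice, and the assumption here, though the linked paper discusses the choice of weights in more detail:

    import math

    def stouffer_unweighted(zs):
        # Simple Stouffer: every study counts equally, whatever its size.
        return sum(zs) / math.sqrt(len(zs))

    def stouffer_weighted(zs, ns):
        # Weighted Stouffer: each study's z is weighted by sqrt(n_i), so larger
        # studies count for more. For similar experiments this gives roughly the
        # same answer as pooling all the trials into one big experiment.
        ws = [math.sqrt(n) for n in ns]
        return sum(w * z for w, z in zip(ws, zs)) / math.sqrt(sum(w * w for w in ws))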
 
Or small studies may be more likely than large studies to produce positive results because of the fatigue/boredom factor in large studies.

First, though I don't know how these studies were conducted, a large study does not necessarily use the same subjects for a long period of time. Second, we could equally postulate a competing theory saying that testing the same subject for a long time may be more likely to produce positive results due to an 'adaptation' factor. However, both the 'fatigue/boredom' factor and the 'adaptation' factor are of course completely unproven. How do we know that bored people are less susceptible to PSI? They may in fact be more susceptible (perhaps because they are more relaxed).

One hypothesis is that psi used to be much stronger in early humans because it was more useful then. To the extent that we can communicate with morse code, radios, telephones, computers, etc., we don't need psi as much.

Apart from the fact that evolution does not work that quickly (especially not when removing features that are of no hindrance), I must disagree. A real PSI ability would be extremely useful in today's society. There are countless imaginable schemes where this could make an individual very wealthy, for example. Not to mention the fact that most people in need of help still don't have a cell phone handy.
 
Louie Savva, in the thread that started this thing, said it was common for informal experiments to be carried out, with only positive results getting any publicity. How far this applies to a cumbersome protocol like the ganzfeld I'm not sure, but I certainly think this should be taken into account.
Louie provided anecdotes with no documentation. On other threads, Randi Forum members tear into anecdotes as being completely valueless. So why accept anecdotes here?

Of course, these same reasons are never called upon when they have good results. You say that long-term experiments suffer from boredom/fatigue? Then what of the large-scale experiments that got good results? That goes against the boredom theory, but I'm sure you're not concerned that these experiments seem to be disobeying one of the tenets of psi.
I'm not excluding large-scale experiments, but you are excluding -- with no basis that I can see -- small-scale ones. You have to explain why, when large-scale experiments are combined with small-scale ones, the results are highly statistically significant.

I doubt that evolution has made a major difference in the last two hundred years.
As far as I know, there is no good database of controlled psi experiments from 200 years ago or longer.
 
Louie provided anecdotes with no documentation. On other threads, Randi Forum members tear into anecdotes as being completely valueless. So why accept anecdotes here?

I have never said that anecdotes are completely worthless, nor do I think they are. If you disagree with other members on this point, you should discuss it with them.

I'm not excluding large-scale experiments, but you are excluding -- with no basis that I can see -- small-scale ones. You have to explain why, when large-scale experiments are combined with small-scale ones, the results are highly statistically significant.

Because the small-scale ones are the ones most likely to be informal experiments written up for publication only after the results were found to be good.

Once again, that meta-analysis I did was to prove a point - that anyone with enough time or patience can manipulate a database to their own ends. You can't keep thinking that they somehow prove something. Using quite sensible criteria, I knocked the odds down to 1 in 297. Chopping out the experiments with under 30 trials was stretching things, I grant you, but not bad for an hour's work.

I'll have a look at the pdf that omega-6 pointed to. See what that does.

As far as I know, there is no good database of controlled psi experiments from 200 years ago or longer.

In other words, the hypothesis you quoted earlier has no evidence to support it.
 
I have never said that anecdotes are completely worthless, nor do I think they are. If you disagree with other members on this point, you should discuss it with them.
Fair enough, but I'm not prepared to accept Louie's undocumented assertion. As I commented on the other thread, he appears to have rapidly changed from an enthusiastic parapsychologist to someone who now wants nothing to do with parapsychology. As such, I don't consider him to be an objective third party.

Because the small-scale ones are the ones most likely to be informal experiments written up for publication only after the results were found to be good.
Again, evidence is needed, not Louie's undocumented assertion.

Once again, that meta-analysis I did was to prove a point - that anyone with enough time or patience can manipulate a database to their own ends. You can't keep thinking that they somehow prove something. Using quite sensible criteria, I knocked the odds down to 1 in 297. Chopping out the experiments with under 30 trials was stretching things, I grant you, but not bad for an hour's work.
But you still didn't succeed in reducing the odds to a level of insignificance, and I'm not sure that your calculation employs the appropriate statistical test.

In other words, the hypothesis you quoted earlier has no evidence to support it.
No statistical evidence as far as I know, but none against it either.
 
As far as I know, there is no good database of controlled psi experiments from 200 years ago or longer.
He was making a general statement, I think, about how slowly evolution tends to operate, not claiming that he has specific evidence that it operates slowly in the particular case of psi. If it operates slowly everywhere else, then it very probably does here too.
 
As I commented on the other thread, he appears to have rapidly changed from an enthusiastic parapsychologist to someone who now wants nothing to do with parapsychology. As such, I don't consider him to be an objective third party.
I don't understand. What indicates to you that he's not objective? That he no longer believes there's anything to parapsychology? Isn't he in a position to know, considering that he had been working in the field?

What does "objective" mean, anyway? Undecided? Or right?

Again, evidence is needed, not Louie's undocumented assertion.
Huh? We're talking about the possibility of unsuccessful experiments going unpublished. What would you consider sufficient evidence? A published report saying, "We ran this experiment, here are the details, but it was unsuccessful, so we decided not to publish ... oops, too late." ?
 
And what statistical burden would psi research have to meet to convince you that psi is real?
Good question. Here's what I think the general approach should be.

State a theory of psi which makes specific enough predictions that the results of an experiment could be inconsistent with chance and yet also be inconsistent with your theory. Then do the experiment, and see whether the results not only are inconsistent with chance but also are consistent with your theory. The more specific are the predictions that your theory makes, the stronger is the support that an experiment provides for your theory should the experimental results turn out to be consistent with the predictions.

In any event, a theory that says, "funny things happen sometimes, we don't know what or when" is not a very useful theory, well-supported or not. What can one do with such a theory? What difference does it make if I believe it or not? I'm perfectly happy to admit that funny things happen sometimes. Now what?
 
