
psychics

How many remote viewing and ganzfeld experiments have skeptics like Wiseman attempted to replicate? I believe that the overwhelming majority of such experiments have been undertaken by parapsychologists who are open to the idea that psi is real. I know that Wiseman did undertake a staring experiment, but has he undertaken any remote viewing or ganzfeld experiments?

Wiseman, R. & Greening, E. (2002). The mind machine: A mass participation experiment into the possible existence of extrasensory perception. The British Journal of Psychology, 93, 487-499.

Smith, M. D., Wiseman, R., Machin, D., Harris, P. & Joiner, R. (1997). Luckiness, competition, and performance on a psi task. Journal of Parapsychology, 61, 33-44.

Wiseman, R., West, D., & Stemman, R. (1996). An experimental test of psychic detection. Journal of the Society for Psychical Research, 61(842), 34-45.

Many of these papers can be found on Wiseman's website:
 
Mostly from my copy of Parapsychology: Frontier Science of the Mind, by J. B. Rhine and J. G. Pratt (Charles C Thomas, 1957). See the table on page 49, for example. And subsequent readings.
Do you disagree with Honorton's evaluation, in which he found that 27 of 33 Rhine experiments produced statistically significant results? If so, what is wrong with Honorton's evaluation?
 
Wiseman, R. & Greening, E. (2002). The mind machine: A mass participation experiment into the possible existence of extrasensory perception. The British Journal of Psychology, 93, 487-499.

Smith, M. D., Wiseman, R., Machin, D., Harris, P. & Joiner, R. (1997). Luckiness, competition, and performance on a psi task. Journal of Parapsychology, 61, 33-44.

Wiseman, R., West, D., & Stemman, R. (1996). An experimental test of psychic detection. Journal of the Society for Psychical Research, 61(842), 34-45.

Many of these papers can be found on Wiseman's website:
Thanks, but I still don't see that Wiseman has undertaken his own remote viewing or ganzfeld experiments.
 
Thanks, but I still don't see that Wiseman has undertaken his own remote viewing or ganzfeld experiments.

David Marks undertook remote viewing experiments. And I think the results are very illustrative of the problems that concern skeptics.

He was unable to replicate the results of Targ and Puthoff. He tried to figure out what he was doing wrong, exploring different ways of running the experiments, but still could not get results different from chance. When reviewing Targ and Puthoff's description of the experiments yet again, he noticed a small, seemingly irrelevant detail: they didn't mention editing the descriptions to remove references to previously visited locations, something he had found he needed to do in order to avoid giving the judges clues as to which description matched which location. When he looked into this further, he discovered that not only were such clues left in the descriptions, but the judges were also given the list of places in the order in which they had been visited. With those two pieces of information, it was relatively easy to make matches. When Marks took the descriptions from Targ and Puthoff's data, removed those pieces of information, and had them re-judged, the matches were consistent with chance.

Now, this was a huge error. It completely negated what appeared to be a p-value of one-in-a-million. And not only did Targ and Puthoff miss this error, they refused to release any more of their data to Marks in order to discover whether this same error poisoned all of their data.

This is an example of why skeptics are unimpressed when believers claim that their methods are rigorous and unbiased and that all possible means of normal information transfer have been eliminated. Add to that reports like Susan Blackmore's, in which errors in following protocols went unmentioned in the published research, and it would be foolish to assume that parapsychologists' research is sound simply because they say it is.

Parapsychologists are not alone in this. This is an issue in any field of research. It's just that replication by people who are open to the possibility of error weeds out these flaws. And until that has been done, history tells us that we shouldn't take any of it at face value. In parapsychology, the same kind of claim ("experiment X proves psi") has not held up in the past.

Linda
 
David Marks undertook remote viewing experiments. And I think the results are very illustrative of the problems that concern skeptics.

He was unable to replicate the results of Targ and Puthoff . . .
I've read Marks' critique of the early Puthoff-Targ remote viewing experiments, and I agree that he raised some valid points. Did he also critique the SAIC remote viewing experiments? In any event, I don't see where he has ever undertaken his own ganzfeld experiments or even critiqued the most recent and tightly-controlled ganzfeld experiments. For that matter, has any skeptic critiqued the findings in the September 2001 article "Updating the Ganzfeld Database: A Victim of Its Own Success"?
 
I've read Marks' critique of the early Puthoff-Targ remote viewing experiments, and I agree that he raised some valid points. Did he also critique the SAIC remote viewing experiments? In any event, I don't see where he has ever undertaken his own ganzfeld experiments or even critiqued the most recent and tightly-controlled ganzfeld experiments. For that matter, has any skeptic critiqued the findings in the September 2001 article "Updating the Ganzfeld Database: A Victim of Its Own Success"?

What's to critique? They show 40 studies with results that you'd expect to see due to chance with a little bit of bias thrown in (bias that parapsychologists do not deny is present).

Linda
 
What's to critique? They show 40 studies with results that you'd expect to see due to chance with a little bit of bias thrown in (bias that parapsychologists do not deny is present).

Linda
The 40 studies show an average hit rate of 30.1%, whereas a hit rate of 25% would be expected by chance. That 30.1% hit rate is statistically significant at the 0.48% level. Further, only 29 of these 40 studies followed the standard ganzfeld protocol (the other 11 studies were exploratory, and so a lower hit rate would be expected). When those 29 studies were isolated, the hit rate was 31.2%, which is statistically significant at the 0.02% level.
 
The 40 studies show an average hit rate of 30.1%, whereas a hit rate of 25% would be expected by chance. That 30.1% hit rate is statistically significant at the 0.48% level. Further, only 29 of these 40 studies followed the standard ganzfeld protocol (the other 11 studies were exploratory, and so a lower hit rate would be expected). When those 29 studies were isolated, the hit rate was 31.2%, which is statistically significant at the 0.02% level.

You are talking about the result of combining the studies. The studies themselves show a mix of results - some with effects larger than 25%, some with effects smaller than 25%. The bulk of the results are within 2 standard deviations above and below, with a handful of larger effects, above and below. It basically follows the pattern of what you'd expect to see due to chance. Throw in a little 'publication' (I put that in quotes because some of these studies aren't published in the usual meaning of the word) bias and the results are indistinguishable from chance. It's hard to convince others that the effect is even present if it cannot be reliably duplicated, but even more importantly, it makes it difficult to study the effect. You don't know whether your attempt to transmit music was negative because music isn't amenable to psi or because you lost the coin toss this time 'round.
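
To illustrate what "the pattern you'd expect due to chance" looks like, here's a quick simulation sketch. It's purely hypothetical: I'm assuming 40 studies of about 40 trials each, which is only a rough guess at typical study size, and a true hit rate of exactly 25%.

Code:
# Rough illustration only: simulate 40 ganzfeld-sized studies under the
# null hypothesis (true hit rate exactly 25%) and look at the spread of
# per-study z-scores. The 40-trials-per-study figure is an assumption
# made for illustration, not taken from the actual database.
import math
import random

random.seed(1)

N_STUDIES = 40
TRIALS_PER_STUDY = 40    # assumed typical study size
P_CHANCE = 0.25

z_scores = []
for _ in range(N_STUDIES):
    hits = sum(random.random() < P_CHANCE for _ in range(TRIALS_PER_STUDY))
    expected = TRIALS_PER_STUDY * P_CHANCE
    sd = math.sqrt(TRIALS_PER_STUDY * P_CHANCE * (1 - P_CHANCE))
    z_scores.append((hits - expected) / sd)

print("z-scores range from %.2f to %.2f" % (min(z_scores), max(z_scores)))
print("studies with |z| > 2: %d of %d" % (sum(abs(z) > 2 for z in z_scores), N_STUDIES))

Run it a few times with different seeds and you get the same general picture: most z-scores inside plus or minus 2, with the occasional larger value in either direction, even though nothing but chance is operating.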

Maybe something can be salvaged by combining the studies. And if you add them all together, you can demonstrate a small effect that is unlikely to be due to chance. However, if you look more carefully, you discover that this is due to a single study. There is one study (Dalton 1997) in that list that is a strong outlier. Its z-score is more than 2 standard deviations larger than the next highest z-score. Its results don't fit with any of the rest of the studies, nor do its results fit with the overall pattern. And if you remove that one study from the calculation, the combined effect is no longer significant - the combined z-score is 1.79 with a p-value of 7.4%. So basically, the claimed 'success' of which psi is a victim is based on a single study as opposed to the other 39. And this isn't a study that has been published or subjected to peer review (for what it's worth) or scrutiny, because it was simply a paper presented at a conference. You'd think the single most important paper in the recent history of the ganzfeld would deserve better than that.
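
For anyone following along at home, the combining method here is Stouffer's: add up the individual z-scores and divide by the square root of the number of studies. Here's a toy sketch of how one extreme study can carry the combined result. The z-scores are invented just to show the mechanics; they are not the real ganzfeld numbers.

Code:
# Stouffer's method for combining studies: Z = sum(z_i) / sqrt(k).
# The z-scores below are invented for illustration (39 unremarkable
# studies plus one extreme outlier); they are not the actual database.
import math

ordinary = [0.3] * 39   # hypothetical modest per-study z-scores
outlier = 4.5           # one hypothetical extreme study

def stouffer(z_list):
    return sum(z_list) / math.sqrt(len(z_list))

print("with the outlier:    Z = %.2f" % stouffer(ordinary + [outlier]))  # ~2.56
print("without the outlier: Z = %.2f" % stouffer(ordinary))              # ~1.87

In this made-up example the combined result crosses from "significant" to "not significant" entirely on the strength of the one outlier, which is the same kind of sensitivity I'm describing for the real database.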

Then we look at the effect of exploration vs. replication. The studies were ranked according to replicability and then combined. And lo and behold, those studies that conformed to prior studies showed a statistically significant effect. But the variable "adheres to standard Ganzfeld protocol" could have been defined in a variety of ways. The authors 'happened' to divide the studies into those which fell above the mid-point of the scale and those that fell below. But they could have divided them up in other ways, such as using the median, the average, weighting the z-scores by standardness, etc. And how you choose to do this makes a difference. For example, if you take those studies which are average/above average in standardness, the combined z-score is no longer statistically significant. The cut-off that the authors chose just happens to be the one that maximizes statistical significance.

An effect which is absent when conservative assumptions are made, and whose presence can only be achieved through specific post hoc regrouping of the data, is not only unconvincing but completely inadequate as the basis of exploratory research. Parapsychologists are doing themselves a grave disservice by promoting the ganzfeld as a method by which to demonstrate psi. It's like promoting shoe size as a way to measure intelligence and then using that measure to explore various relationships between social factors and intelligence.

Linda
 
And if you add them all together, you can demonstrate a small effect that is unlikely to be due to chance.
No, you discover a large effect that is unlikely to be due to chance. ;)

However, if you look more carefully, you discover that this is due to a single study.
Even without that study, the hit rate is well above chance, at 28.7%.

There is one study (Dalton 1997) in that list that is a strong outlier.
What about the included studies that had negative z scores? For example a 1993 study by Kanthamani and Palmer had a z score of -2.17 and a hit rate of 9.1%, and a 1994 study by Williams had a z score of -2.30 and a hit rate of 11.9%. Were those the properly done studies and the Dalton study the improper one?

Then we look at the effect of exploration vs. replication. The studies were ranked according to replicability and then combined. And lo and behold, those studies that conformed to prior studies showed a statistically significant effect. But the variable "adheres to standard Ganzfeld protocol" could have been defined in a variety of ways. The authors 'happened' to divide the studies into those which fell above the mid-point of the scale and those that fell below. But they could have divided them up in other ways, such as using the median, the average, weighting the z-scores by standardness, etc. And how you choose to do this makes a difference. For example, if you take those studies which are average/above average in standardness, the combined z-score is no longer statistically significant. The cut-off that the authors chose just happens to be the one that maximizes statistical significance.
Have you done alternative calculations? In any event, I believe that the authors relied on three independent raters and conventional statistical theory to make their determinations. However, since Beth is a professional statistician, I'll solicit her assistance and see what her opinion is on the authors' methodology.
 
No, you discover a large effect that is unlikely to be due to chance. ;)

According to Cohen (Statistical Power Analysis for the Behavioral Sciences), the effect size (represented by 'h') for that difference is 0.112, with h=0.2 a small effect, h=0.5 a medium effect, and h=0.8 a large effect.
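
In case the arithmetic helps, Cohen's h for two proportions is h = 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2)); plugging an observed hit rate of roughly 30% against the 25% chance rate reproduces approximately the figure quoted above.

Code:
# Cohen's h for the difference between two proportions:
#   h = 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2))
# Here: an observed hit rate of roughly 30% versus the 25% chance rate.
import math

def cohens_h(p1, p2):
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

print("h = %.3f" % cohens_h(0.30, 0.25))   # ~0.112, below Cohen's 'small' benchmark of 0.2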

Even without that study, the hit rate is well above chance, at 28.7%.

But we don't actually expect the average to be exactly 25%. We just expect it to be within a few percentage points, and 28.7% falls within the range we expect. And we already know that there is some publication bias (we have been told about unpublished negative studies), so we really should expect the average to be higher than 25% anyway.

What about the included studies that had negative z scores? For example a 1993 study by Kanthamani and Palmer had a z score of -2.17 and a hit rate of 9.1%, and a 1994 study by Williams had a z score of -2.30 and a hit rate of 11.9%. Were those the properly done studies and the Dalton study the improper one?

Those studies all fall within the main cluster on a forest plot or a funnel plot. The Dalton study is barely even on the same page.

Have you done alternative calculations? In any event, I believe that the authors relied on three independent raters and conventional statistical theory to make their determinations. However, since Beth is a professional statistician, I'll solicit her assistance and see what her opinion is on the authors' methodology.

I didn't challenge their ranking of the standardness of the studies. I just noted that there were several obvious ways that the studies could be divided into 'replication' or 'exploration'. The method that they chose isn't even the most reasonable or obvious, since it results in uneven groups and is skewed. I did perform alternative calculations. For example, dividing the studies into 'above average standardness' and 'below average standardness' led to a p-value of 0.1 for the studies that were most like the original ganzfeld. None of the other calculations I tried, such as using the median, or using those studies ranked higher than 6 resulted in a p-value as strong as the one they chose. Putting that all together suggests that they tried many different ways of analyzing the data and then picked only those that supported their conclusion to include in their report. This is (unfortunately) not an uncommon technique, but it does violate the assumptions of hypothesis testing and alter what conclusions can be drawn from the p-values.
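
To make the cut-off point concrete, here is a toy sensitivity check showing how the combined z for the 'replication' group can move above or below significance depending on where the line is drawn. The standardness ratings and z-scores below are invented for illustration; they are not the published values.

Code:
# Toy sensitivity check: how the combined (Stouffer) z-score for the
# "standard replication" group depends on where the standardness cut-off
# is drawn. Ratings (1-7 scale) and z-scores are invented for illustration.
import math

studies = [(6.8, 1.9), (6.5, 0.4), (6.1, 1.2), (5.9, -0.3), (5.5, 0.9),
           (5.2, 1.4), (4.8, -1.1), (4.4, 0.2), (3.9, -0.6), (3.1, 0.7)]

def stouffer(z_list):
    return sum(z_list) / math.sqrt(len(z_list))

for cutoff in (4.0, 5.0, 6.0):   # e.g. mid-point, mean, "high standardness" only
    group = [z for rating, z in studies if rating >= cutoff]
    print("cut-off %.1f: %d studies, combined Z = %.2f"
          % (cutoff, len(group), stouffer(group)))

With this made-up data the combined Z ranges from about 1.6 to about 2.3 depending on the cut-off, which is exactly the kind of latitude I'm objecting to when the cut-off is chosen after looking at the data.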

Linda
 
Rodney,

Your pseudo-scientific blathering has brought you up against a real scientist. If I were you, I would beat a silent and hasty retreat before Linda makes you look sillier than you already do. You have been grasping at statistical straws here for months, and have repeatedly had your Root Beer spilled in your lap. If I were you, I'd take a break and change your pants.
 
According to Cohen (Statistical Power Analysis for the Behavioral Sciences), the effect size (represented by 'h') for that differences is 0.112, with h=0.2 a small effect, h=0.5 a medium effect, and h=0.8 a large effect.
Fine. I was referring to the unlikelihood of obtaining the overall results by chance; i.e., the results were significant not only at the 5% significance level, but at the 0.5% level, according to the authors.

But we don't actually expect the average to be exactly 25%. We just expect it to be within a few percentage points, and 28.7% falls within the range we expect.
Beth can correct me if I'm wrong, but I believe that obtaining 440 hits in 1533 trials, which -- if you add up the numbers -- was the number of hits and trials in the 39 experiments excluding the 1997 Dalton experiment, is statistically significant.
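
As a rough check of that claim (a simple pooled binomial test on the raw totals, which I realize is not the z-score combination the authors pre-selected), the numbers work out roughly as follows.

Code:
# Rough check: treat the 39 studies (excluding Dalton 1997) as a single
# pooled binomial sample of 1533 trials with 440 hits, chance rate 25%.
# This pooling is a simplification, not the authors' z-combination method.
import math

hits, trials, p = 440, 1533, 0.25
expected = trials * p                    # 383.25 hits expected by chance
sd = math.sqrt(trials * p * (1 - p))     # ~16.95
z = (hits - expected) / sd
p_value = 0.5 * math.erfc(z / math.sqrt(2))   # one-tailed, normal approximation
print("hit rate = %.1f%%, z = %.2f, one-tailed p = %.4f"
      % (100.0 * hits / trials, z, p_value))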

And we already know that there is some publication bias (we have been told about unpublished negative studies), so we really should expect the average to be higher than 25% anyway.
Do you have (non-anecdotal) evidence for unpublished negative ganzfeld studies?

Those studies all fall within the main cluster on a forest plot or a funnel plot. The Dalton study is barely even on the same page.
Yes, but you can't automatically assume it was done improperly. Human performance is highly variable -- check out athletic performance and you'll find it varies dramatically from day to day.

I didn't challenge their ranking of the standardness of the studies. I just noted that there were several obvious ways that the studies could be divided into 'replication' or 'exploration'. The method that they chose isn't even the most reasonable or obvious, since it results in uneven groups and is skewed. I did perform alternative calculations. For example, dividing the studies into 'above average standardness' and 'below average standardness' led to a p-value of 0.1 for the studies that were most like the original ganzfeld. None of the other calculations I tried, such as using the median, or using those studies ranked higher than 6 resulted in a p-value as strong as the one they chose.
The question is whether what the authors did was biased. I'll let Beth weigh in on that.

Putting that all together suggests that they tried many different ways of analyzing the data and then picked only those that supported their conclusion to include in their report.
How can you possibly know that?

This is (unfortunately) not an uncommon technique, but it does violate the assumptions of hypothesis testing and alter what conclusions can be drawn from the p-values.
Again, I'll let Beth weigh in on whether what the authors did was biased.
 
Rodney,

Your pseudo-scientific blathering has brought you up against a real scientist. If I were you, I would beat a silent and hasty retreat before Linda makes you look sillier than you already do. You have been grasping at statistical straws here for months, and have repeatedly had your Root Beer spilled in your lap. If I were you, I'd take a break and change your pants.
Thanks for your insightful comment. Now let mommy have her computer back. ;)
 
I am no expert by any stretch of the imagination, but I am having trouble with the definition of "chance" in this case. If we were talking about flipping a coin I would have no trouble, but when someone says "bridge" and it can be judged to be a hit for a duck or a boat, then I can't help but think one must redefine what it means to beat random occurrence. In this case it seems a random word generator would beat "chance" as well as these researchers claim to.
 
Fine. I was referring to the unlikelihood of obtaining the overall results by chance; i.e., the results were significant not only at the 5% significance level, but at the 0.5% level, according to the authors.

It is a common error to mistake the p-value for the strength of the effect. However, I have pointed this out to you several times now so you don't really have an excuse for doing it again.

The analysis that I was referring to with my comment did not have results that were significant at the 5% significance level (p=.074).

Beth can correct me if I'm wrong, but I believe that obtaining 440 hits in 1533 trials, which -- if you add up the numbers -- was the number of hits and trials in the 39 experiments excluding the 1997 Dalton experiment, is statistically significant.

The pre-selected method of analysis was to combine the z-scores. Why did you change to a different method of analysis?

Do you have (non-anecdotal) evidence for unpublished negative ganzfeld studies?

Yes.

Yes, but you can't automatically assume it was done improperly. Human performance is highly variable -- check out athletic performance and you'll find it varies dramatically from day to day.

This is the equivalent of somebody running a 2 minute mile.

I'm not assuming the study was improperly done (although it would be foolish not to consider that one of the possibilities). I'm saying that it doesn't look like it belongs to this particular database. Maybe it was a 1000m race.

The question is whether what the authors did was biased. I'll let Beth weigh in on that.

How can you possibly know that?

I don't know. I'm merely suspicious - like when my 8-year-old daughter shows up with paint all over her hands and tells me that the dog needs a bath.

The point is that researchers generally seriously consider that their conclusions may be wrong. Which means that they go looking for reasons to distrust the results, like determining how sensitive the results are to the assumptions they make. Or recognize that caution (if not outright dismissal) is in order when dealing with such a heterogeneous group of results as is found here. It bothers me that there is not even a hint of this consideration within the article.

Again, I'll let Beth weigh in on whether what the authors did was biased.

I'll be bold and state that progress has depended upon the serious consideration that we may be wrong.

Linda
 
It is a common error to mistake the p-value for the strength of the effect. However, I have pointed this out to you several times now so you don't really have an excuse for doing it again.

The analysis that I was referring to with my comment did not have results that were significant at the 5% significance level (p=.074).
Okay, I'll make you happy by retracting my comment about a "large effect" as technically inaccurate, but why the focus on the strength of the effect if the likelihood of the results occurring by chance is only .48%? (I understand that you disagree with that percentage, but p=.0048 was the figure used by the authors, following the same methodology used by Milton and Wiseman in their previous article.)

The pre-selected method of analysis was to combine the z-scores. Why did you change to a different method of analysis?
Again, because I'm interested in the likelihood of the results occurring by chance. You seem to think that obtaining 440 hits in 1533 trials with a hit probability of 25% is within the range of what would be expected by chance. I don't think that's true, but I'd like to hear Beth's viewpoint.

And that evidence is . . .

This is the equivalent of somebody running a 2 minute mile.

I'm not assuming the study was improperly done (although it would be foolish not to consider that one of the possibilities). I'm saying that it doesn't look like it belongs to this particular database. Maybe it was a 1000m race.

I don't know. I'm merely suspicious - like when my 8-year-old daughter shows up with paint all over her hands and tells me that the dog needs a bath.
I would simply note that some prior ganzfeld experiments have also shown high hit rates. (I know -- you think they weren't sufficiently tightly controlled, but it's unclear to me whether that made a difference.)

The point is that researchers generally seriously consider that their conclusions may be wrong. Which means that they go looking for reasons to distrust the results, like determining how sensitive the results are to the assumptions they make. Or recognize that caution (if not outright dismissal) is in order when dealing with such a heterogeneous group of results as is found here. It bothers me that there is not even a hint of this consideration within the article.
The authors were responding to Milton and Wiseman's prior article, which was not exactly a study in humility, either. ;)

I'll be bold and state that progress has depended upon the serious consideration that we may be wrong.
Speak for yourself. (With the exception of my "large effect" comment, which was the first (minor) error I've ever made in my whole entire life.) ;)
 
.. the likelihood of the results occurring by chance is only .0048%? (I understand that you disagree with that percentage, but p=.0048 was the figure used by the authors, following the same methodology used by Milton and Wiseman in their previous article...
I must tell you that p=.0048 corresponds to 0.48%, not .0048% -- the difference is a factor of one hundred. Two orders of magnitude.
 
