Chris Carter Exposes Richard Wiseman Misdeeds

This was mentioned on Rational Skepticism too, so I'll just post what I said there:

Chris Carter's paper is a bit strange. Instead of rebutting Wiseman's claims of what parapsychologists do to maintain significant results, he simply says "Well, you do it too!" which is fine as a playground squabble, but leaves the initial claims untouched.

He's misleading on a few things. He says Wiseman doesn't offer a shred of evidence regarding cherry-picking new methods, but in Wiseman's paper, he references a paper by Caroline Watt. So that's a shred of evidence, no?

Regarding the experimenter effect, Carter only references the Wiseman/Schlitz work and doesn’t even mention the follow-up paper in which the effect no longer seemed to occur (Schlitz, Wiseman et al, 2006) though I’d expect he’s heard of it.

Regarding the debate about Jay-tee, I’d say that Wiseman’s position isn’t very strong, and I do wonder what he was expecting to prove with just four trials. (I could hazard a guess, though)

The stuff Chris Carter writes on meta-analyses is just plain wrong. He says M&W used a statistical measure that didn’t take sample size into account, but Milton & Wiseman used z-scores (which incorporate the standard deviation, which is linked to sample size) and the weighted z (or Effect Size ES – my least favourite statistical measure), which is the z-score divided by the square root of the number of trials. So sample size is taken into account, although a binomial distribution would’ve been better. (I should also point out that in the past Radin, Utts and the most recent meta-analysis by Storm et al have also used Effect Size ES on individual experiments, but he doesn't seem to be complaining about them!)
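For what it's worth, here's a minimal sketch (in Python, with made-up hit counts rather than figures from any real study) of the quantities in question: the z-score from the normal approximation, the Effect Size ES, and the exact binomial test I'd have preferred.

```python
# Illustrative only: hypothetical hit counts, not data from any actual ganzfeld study.
from math import sqrt
from scipy.stats import binomtest

hits, trials, p0 = 32, 100, 0.25   # hypothetical study; chance hit rate of 1 in 4

# z-score from the normal approximation to the binomial
z = (hits - trials * p0) / sqrt(trials * p0 * (1 - p0))

# "Effect Size ES" as used in these meta-analyses: z over the square root of the number of trials
es = z / sqrt(trials)

# The exact binomial test that would have been preferable
p_exact = binomtest(hits, trials, p0, alternative='greater').pvalue

print(f"z = {z:.3f}, ES = {es:.3f}, exact binomial p = {p_exact:.4f}")
```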

Meanwhile, he mentions that Dalton’s work was published two years before M&W’s meta-analysis was published. This isn’t quite true. It was presented at a PA Convention (it’s never been published in a peer-reviewed journal) in 1997, and so was Milton and Wiseman’s meta-analysis. Plus, the pdf of the PA article of M&W’s work states it was received in June 1997. (Of course, that leads into debates about the haste with which they went to print... etc etc.)

I find Chris Carter quite a tedious writer. He presents such an incomplete picture of the debate that it’s depressing just to wade through it all.
 
I'm glad this thread is here. I read this article by Carter, though I have yet to go back to read Wiseman's original, and was wondering how much merit was in it. As I'm not by the slightest stretch a statistician, I am unable to dissect it as Ersby has begun.

That said, the comments on Sheldrake and Jaytee did seem to have some merit--not that the original experiments showed real significance but that Wiseman's replication was not properly handled/discussed.
 
(Hi from your friends at www.skeptiko.com :))

JSPR has published a penetrating critique by Chris Carter of Richard Wiseman's latest attack on parapsychology - his 'Heads I win, tails you lose' paper published in Skeptical Inquirer. Chris Carter's reply:
http://www.skepticalinvestigations.org/Mediaskeptics/Carter_Wiseman.pdf

That's disappointing. Many of the claims are invalid or are directed at criticizing Wiseman, rather than addressing Wiseman's criticisms.

The issue about unpublished results (and the recent discussion of Bem's paper shows how extensive the 'exploratory' null results might be) is dismissed by quoting the Fail Safe N, which Carter must know by now is not valid. After all, Scargle published the work showing the Fail Safe N to be a gross overestimation in almost all cases in a parapsychology journal (The Journal of Scientific Exploration) over ten years ago.
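For readers who haven't met it, here's a rough sketch of Rosenthal's fail-safe N (the standard formula, which may differ in detail from whatever Carter actually computed), with invented z-scores, and the gist of Scargle's objection in the closing comment.

```python
# Illustrative only: the per-study z-scores are invented.
from math import sqrt

z_scores = [1.2, 0.4, 2.1, -0.3, 1.8, 0.9]   # hypothetical published studies
k = len(z_scores)
z_crit = 1.645                                # one-tailed p = 0.05

# Combined (Stouffer) z for the k observed studies
stouffer_z = sum(z_scores) / sqrt(k)

# Rosenthal's fail-safe N: how many unpublished studies averaging z = 0 would be
# needed to drag the combined z down to the critical value
fail_safe_n = (sum(z_scores) / z_crit) ** 2 - k

print(f"Stouffer z = {stouffer_z:.2f}, fail-safe N = {fail_safe_n:.1f}")
# Scargle's objection, roughly: studies left in the file drawer would tend to have
# negative z-scores, not an average of zero, so this figure grossly overestimates
# how many missing studies it would take to wipe out the result.
```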

Carter continues to support the nonsensical idea that statistics based on chance are relevant to our expectations. Who cares whether Nadia performed in a way which would be unexpected if it were due to chance? Who would seriously propose that matching a person to a condition under non-blinded conditions wouldn't involve the use of clues? Or take Sheldrake's experiments, which were analyzed under the assumption that the dog's behaviour when Pam wasn't returning home was random, when their own observations showed it wasn't: his baseline behaviour matched the pattern they were searching for regardless of the movements of his owner.

The comments on the ganzfeld meta-analysis are almost bizarre, as Wiseman simply used the same method of combining studies as had been used in prior meta-analyses by parapsychologists (and continues to be used by them). Plus, the measure does have some input from sample size. And the practice of combining all the hits and misses as one big study is an invalid technique anyway (the only way it could be treated as valid is if psi doesn't exist), especially since we have discovered that the base rate probably differs from study to study (per prior discussions).
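To make the base-rate point concrete, here's a toy example (entirely hypothetical numbers) of how pooling every hit and miss into one binomial test under a single assumed chance rate can look significant even though no individual study comes close.

```python
# Entirely hypothetical numbers, just to illustrate the pooling problem.
from scipy.stats import binomtest

# (hits, trials, chance hit rate) for three made-up studies
studies = [(30, 100, 0.25),   # e.g. a 4-choice design
           (55, 100, 0.50),   # e.g. a 2-choice design
           (28, 100, 0.25)]

# Pooled test that wrongly assumes a single 25% chance rate for everything
total_hits = sum(h for h, n, p in studies)
total_trials = sum(n for h, n, p in studies)
pooled_p = binomtest(total_hits, total_trials, 0.25, alternative='greater').pvalue

# Per-study tests using each study's own chance rate
per_study_p = [binomtest(h, n, p, alternative='greater').pvalue for h, n, p in studies]

print(f"pooled p (assuming 25% chance everywhere) = {pooled_p:.2g}")
print("per-study p-values:", [round(p, 3) for p in per_study_p])
```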

And then we get to the bizarre conclusion at the end which seems to consist of "Wiseman is a jerk therefore psi is real".

Who exactly is this supposed to persuade?

Linda
 
The stuff Chris Carter writes on meta-analyses is just plain wrong. He says M&W used a statistical measure that didn’t take sample size into account, but Milton & Wiseman used z-scores (which incorporate the standard deviation, which is linked to sample size) and the weighted z (or Effect Size ES – my least favourite statistical measure), which is the z-score divided by the square root of the number of trials. So sample size is taken into account, although a binomial distribution would’ve been better. (I should also point out that in the past Radin, Utts and the most recent meta-analysis by Storm et al have also used Effect Size ES on individual experiments, but he doesn't seem to be complaining about them!)

Maybe that's because the latter knew what they were doing? See http://www.internationalskeptics.com/forums/showpost.php?p=4398145&postcount=21
 
I'm glad this thread is here. I read this article by Carter, though I have yet to go back to read Wiseman's original, and was wondering how much merit was in it. As I'm not by the slightest stretch a statistician, I am unable to dissect it as Ersby has begun.

That said, the comments on Sheldrake and Jaytee did seem to have some merit--not that the original experiments showed real significance but that Wiseman's replication was not properly handled/discussed.

I have picked through Sheldrake's paper and Wiseman's paper, and there doesn't seem to be much merit in these criticisms. For one thing, the characterization of Wiseman's criteria is grossly misleading, and if they (Carter and Sheldrake) believe their characterizations, it shows a surprising ignorance of validity and reliability in the choice of outcome measures.

I can link to some threads where this was discussed.

http://www.internationalskeptics.com/forums/showthread.php?postid=5379381#post5379381

Here is one link, but I'm not sure how useful it will be since apparently you read it already. :)

Linda
 
Maybe that's because the latter knew what they were doing? See http://www.internationalskeptics.com/forums/showpost.php?p=4398145&postcount=21

Yeah, I remember in a prior discussion with you that there were differences between my calculations and the numbers listed in these meta-analyses (not just Wiseman's, as the same numbers were used in other analyses). I suggested we shouldn't trust the numbers as written without investigating them. But I haven't bothered since then, as we can't draw any valid conclusions about whether psi exists from those studies anyway. Not that I seem to be averse to pointless statistical analyses. :)

Linda
 
This was mentioned on Rational Skepticism too, so I'll just post what I said there:

Chris Carter's paper is a bit strange. Instead of rebutting Wiseman's claims of what parapsychologists do to maintain significant results, he simply says "Well, you do it too!" which is fine as a playground squabble, but leaves the initial claims untouched.

He's misleading on a few things. He says Wiseman doesn't offer a shred of evidence regarding cherry-picking new methods, but in Wiseman's paper, he references a paper by Caroline Watt. So that's a shred of evidence, no?

Regarding the experimenter effect, Carter only references the Wiseman/Schlitz work and doesn’t even mention the follow-up paper in which the effect no longer seemed to occur (Schlitz, Wiseman et al, 2006) though I’d expect he’s heard of it.

Regarding the debate about Jay-tee, I’d say that Wiseman’s position isn’t very strong, and I do wonder what he was expecting to prove with just four trials. (I could hazard a guess, though)

The stuff Chris Carter writes on meta-analyses is just plain wrong. He says M&W used a statistical measure that didn’t take sample size into account, but Milton & Wiseman used z-scores (which incorporate the standard deviation, which is linked to sample size) and the weighted z (or Effect Size ES – my least favourite statistical measure), which is the z-score divided by the square root of the number of trials. So sample size is taken into account, although a binomial distribution would’ve been better. (I should also point out that in the past Radin, Utts and the most recent meta-analysis by Storm et al have also used Effect Size ES on individual experiments, but he doesn't seem to be complaining about them!)

Meanwhile, he mentions that Dalton’s work was published two years before M&W’s meta-analysis was published. This isn’t quite true. It was presented at a PA Convention (it’s never been published in a peer-reviewed journal) in 1997, and so was Milton and Wiseman’s meta-analysis. Plus, the pdf of the PA article of M&W’s work states it was received in June 1997. (Of course, that leads into debates about the haste with which they went to print... etc etc.)

I find Chris Carter quite a tedious writer. He presents such an incomplete picture of the debate that it’s depressing just to wade through it all.

Good starting points for a discussion. I'm going to email Chris Carter and see if he will weigh-in.
 
Maybe that's because the latter knew what they were doing? See http://www.internationalskeptics.com/forums/showpost.php?p=4398145&postcount=21

Or perhaps they didn't? http://www.internationalskeptics.com/forums/showthread.php?p=5713571 (start at post 116 to watch Storm et al's work unravel)

As far as I can tell, Wiseman and Milton calculated the Stouffer z for each experiment, and then calculated another Stouffer z from all the individual Stouffer zs. Usually people do a Stouffer z of the individual z-scores.
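A purely schematic sketch of the two combination schemes described above (the numbers are invented, and only meant to show that the two procedures needn't agree):

```python
# Purely schematic numbers; only meant to show the two procedures can differ.
from math import sqrt

def stouffer(zs):
    """Stouffer's method: sum of z-scores over the square root of their number."""
    return sum(zs) / sqrt(len(zs))

# Hypothetical experiments, each represented by a list of component z-scores
experiments = [[0.5, 1.2, -0.3],
               [2.0, 1.5],
               [-0.4, 0.1, 0.6, 0.2]]

# Two-stage version: a Stouffer z per experiment, then a Stouffer z of those
per_experiment = [stouffer(zs) for zs in experiments]
two_stage = stouffer(per_experiment)

# Usual version: one Stouffer z over all the individual z-scores
one_stage = stouffer([z for zs in experiments for z in zs])

print(f"two-stage Stouffer z = {two_stage:.3f}")   # ~2.04
print(f"one-stage Stouffer z = {one_stage:.3f}")   # ~1.80
```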
 
You better explain since your opinion seems to be in the minority... even among skeptics.

I edited the post and added a link to the thread in which I discuss this in more detail.

Also, since I'm not sure that I explained it all that well, I will elaborate on this again.

Linda
 
Or perhaps they didn't? http://www.internationalskeptics.com/forums/showthread.php?p=5713571 (start at post 116 to watch Storm et al's work unravel)

As far as I can tell, Wiseman and Milton calculated the Stouffer z for each experiment, and then calculated another Stouffer z from all the individual Stouffer zs. Usually people do a Stouffer z of the individual z-scores.
But all Wiseman's and Milton's negative Stouffer Z figures were wrong, and I demonstrated that, using the binomial distribution for the studies that they analyzed, the results were statistically significant.
 
Their figures weren't wrong, given the statistical measure they'd chosen. But the statistical measure, it could be argued, was wrong.

EDIT: wait a sec... I remember now. There were some results that looked odd. I'll have to check in a couple of days. Not at home at the mo'.
 
So here's something I think about frequently: Is statistics really this difficult? Or are there "master statisticians" who could take a look at a psi experiment or group of experiments and pass judgment on the statistical methods chosen? Masters who the rest of the community would trust?

Because if not, I can't help but be reminded of this:

http://lesswrong.com/lw/1ib/parapsychology_the_control_group_for_science/

~~ Paul
 
Wiseman's selection of criteria:

The initial observation was that Pam's parents could tell when Pam was returning because of the way her dog behaved. This observation will be complicated by the fact that this occurs in the presence of knowledge about when Pam is expected to return and knowledge of when she has returned. Also, as the videotapes showed, there is wide variation in the behaviour of the dog depending upon time of day, which house he is in, how busy it is inside and outside the house, people coming and going to the door, some knowledge of Pam's routines, etc.

I'm a physician, so I am familiar with looking for patterns in signs and symptoms in order to form diagnoses. And it is typical to start with the open-ended approach Sheldrake uses, whereby you look for the pattern you expect to see, and if it is present, consider that your diagnosis is correct. This isn't a particularly reliable method, however. Even if you are careful about not letting your biases creep into the gathering of information, it turns out, once you apply an evidence-based approach, that you will be wrong often enough for it to be a problem. So instead, we depend on the criteria-based approach Wiseman uses to improve the reliability and validity of our conclusions. Instead of saying "if someone is pregnant, I expect them to have nausea and vomiting plus a missed period", we ask, what is the probability they are pregnant if they have nausea and vomiting plus a missed period? Instead of looking for the pattern we'd expect if the dog knew when its owner was returning, we ask, what is the probability their owner is returning when they exhibit this behavior?
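To put numbers on that flip in conditioning, here's a toy Bayes'-rule calculation with invented probabilities (not real clinical figures):

```python
# Toy illustration with invented probabilities, not real clinical figures.

p_pregnant = 0.05            # hypothetical base rate in the population being seen
p_sx_given_pregnant = 0.80   # chance of the symptom pattern if the diagnosis is right
p_sx_given_not = 0.10        # the same pattern also turns up without pregnancy

# Total probability of seeing the symptom pattern
p_sx = p_sx_given_pregnant * p_pregnant + p_sx_given_not * (1 - p_pregnant)

# What we actually want: probability of pregnancy given the pattern (Bayes' rule)
p_pregnant_given_sx = p_sx_given_pregnant * p_pregnant / p_sx

print(f"P(pattern | pregnant) = {p_sx_given_pregnant:.2f}")
print(f"P(pregnant | pattern) = {p_pregnant_given_sx:.2f}")
# The pattern is very likely when the diagnosis is right, yet the diagnosis
# is far from certain when the pattern is present.
```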

Wiseman chose criteria for success based on discussions with the family about how they could tell when Pam was returning from the behaviour of the dog. It turned out that this behavior was not associated with her return. He altered the criteria based on further discussion, and they still weren't associated with her return. Now, it may be that the criteria needed refinement. And four experiments is a small number to completely rule out any effect. However, the impression given was that this behavior was consistent, not that it was so sporadic that it would easily be absent on four trials.

Regardless of whether or not the criteria were sufficiently refined and the number of trials sufficiently large, the approach chosen by Wiseman was a more valid and reliable means of discovering whether the dog's behaviour predicted the owner's return than Sheldrake's confirmatory approach. So to call criteria based on the claims made by the parents "arbitrary" is a mischaracterization. And he certainly doesn't deserve the heavy criticism for sticking with methods which are more reliable and valid, rather than discarding them in favor of less reliable methods because they give the answer you want.

Linda
 
So here's something I think about frequently: Is statistics really this difficult? Or are there "master statisticians" who could take a look at a psi experiment or group of experiments and pass judgment on the statistical methods chosen? Masters who the rest of the community would trust?

Because if not, I can't help but be reminded of this:

http://lesswrong.com/lw/1ib/parapsychology_the_control_group_for_science/

~~ Paul

Heh. I've said the very same thing about homeopathy. :)

This isn't a statistics problem though. It's more of a research methods problem and more about understanding exactly which hypothesis is being tested in order to choose the appropriate methods, rather than knowing how to apply the methods once chosen. I don't know what field that is. It wasn't part of my statistics courses. I took courses in research methodologies, epidemiology, etc. as part of my graduate studies in public health. I would presume that they are part of other graduate programs. I was taught by people from departments of anthropology, psychology, education, medicine, and economics, as far as I recall. No statisticians except for my statistics classes. No philosophers either. But I don't know if that's representative.

Linda
 
Or perhaps they didn't? http://www.internationalskeptics.com/forums/showthread.php?p=5713571 (start at post 116 to watch Storm et al's work unravel)

As far as I can tell, Wiseman and Milton calculated the Stouffer z for each experiment, and then calculated another Stouffer z from all the individual Stouffer zs. Usually people do a Stouffer z of the individual z-scores.

They calculated an effect size for each experiment (the z-score divided by the square root of N), but this isn't what they combined for the Stouffer z. The Stouffer z used the sum of the individual z-scores divided by the square root of the number of experiments. This is based on the numbers I get when performing those calculations matching up with the reported numbers (and not matching up when done the other way).
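A small sketch of that kind of check, with invented numbers standing in for the per-study z-scores and the reported combined value:

```python
# Illustrative only: both the per-study z-scores and the "reported" value are invented.
from math import sqrt, isclose

per_study_z = [0.8, -0.2, 1.5, 0.4, -1.1]   # hypothetical per-study z-scores
reported_combined_z = 0.63                   # hypothetical figure from a paper

# Stouffer z: sum of the per-study z-scores over the square root of the number of studies
stouffer_z = sum(per_study_z) / sqrt(len(per_study_z))

print(f"computed Stouffer z = {stouffer_z:.2f}")
print("matches reported value:", isclose(stouffer_z, reported_combined_z, abs_tol=0.01))
```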

Linda
 
Okay. I'm not at home, so I'm working from memory here, so there's every possibility I'm wrong.
 
But all Wiseman's and Milton's negative Stouffer Z figures were wrong, and I demonstrated that, using the binomial distribution for the studies that they analyzed, the results were statistically significant.

It may interest you to know that the only way combining all the hits and misses into one big study and testing for significance using the binomial distribution can be presumed valid is if you presume that psi does not have an effect. Otherwise, it is an invalid test.

So which is it?

Linda
 
Okay. I'm not at home, so I'm working from memory here, so there's every possibility I'm wrong.

Okay. I had to go back and check it myself. Too confusing to rely on memory. :)

Linda
 
