
Check my methodology - prayer study

Let me say I have a deep suspicion that it is a good thing that JREF gets run by magicians rather than statisticians. And I will be shocked if a challenge ever comes out of this. Having said all that, these discussions sometimes provide a useful platform to clear up scientific matters.

Digithead has provided a citation from a scientific journal. The referenced article supports the notion that you get better results using stratification. But on the narrow point that Saizai has claimed, the article completely supports him. The authors write:


In other words, if you randomize as Saizai suggests and use the critical value for a five percent test, you get a false positive five percent of the time - just as basic statistics tells you will happen.

-Dick Startz

No, their article makes that claim for large sample trials (n>400). Their simulations clearly demonstrate that controlling for confounding by stratification is necessary for small sample trials. They find that the false positive rate increases when there is confounding in those trials...

They even have a clear delineation on how to proceed depending on the circumstances and endpoints...
 
As a matter of fact, they finish their article with 10 guidelines, the first being:

For superiority trials that seek to demonstrate the superiority of one therapy over another, consider stratified randomization when the overall sample size for a trial is small (<200 patients per treatment arm) or when interim analyses or subgroup analyses are planned that will involve small samples of a larger cohort. (p. 25)

And Randi consults with statisticians, read the challenge rules....
 
FWIW, digithead, I don't currently have access to the fulltext of the article. If you can pass me a PDF (or txt) I will read and respond.
 
No, their article makes that claim for large sample trials (n>400). Their simulations clearly demonstrate that controlling for confounding by stratification is necessary for small sample trials. They find that the false positive rate increases when there is confounding in those trials...

They even have a clear delineation on how to proceed depending on the circumstances and endpoints...

While I suspect that the number of forum members who find this a useful way to learn statistics is small, let me try to set the record straight anyhow.

1. Non-stratified sampling does not affect the size of a test. It does affect power. Therefore, if the Challenge standard is no more than 1/1000 (or whatever) of a false positive, randomization works fine. If one is trying to get a better research design, then power also matters. For this, stratification can help.

2. Nothing in (1) is affected by sample size, because the critical value for a test adjusts for sample size.

(1) and (2) are just facts of mathematics. They don't really admit of argument. In contrast, legitimate objections to the outcome of simulations are that what's being simulated doesn't correspond to the experiment being run. What's important is that the math correspond to the experiment being run, which isn't always the experiment that we think we see.
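For anyone who wants to see point (1) in action, here is a quick Monte Carlo sketch in Python (my own illustration with arbitrary numbers, not the setup from any cited paper). The null hypothesis is true, a binary prognostic factor shifts some patients' scores, treatment is assigned by simple randomization, and an unstratified Welch t-test is run; the false positive rate comes out near the nominal 5% even though the factor is ignored:

```python
import math
import random
import statistics

def one_trial(n=30, p_factor=0.3, factor_effect=10.0):
    """One simulated trial under the null: no treatment effect,
    but a binary prognostic factor raises some patients' scores."""
    scores = [50.0
              + (factor_effect if random.random() < p_factor else 0.0)
              + random.gauss(0, 15)
              for _ in range(n)]
    treated = [random.random() < 0.5 for _ in range(n)]  # simple randomization
    a = [s for s, t in zip(scores, treated) if t]
    b = [s for s, t in zip(scores, treated) if not t]
    if len(a) < 2 or len(b) < 2:       # degenerate allocation; call it negative
        return False
    # Welch t statistic, ignoring the prognostic factor entirely
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return abs(t_stat) > 2.05          # ~5% two-sided critical value, df ~ 28

random.seed(1)
reps = 20000
false_positive_rate = sum(one_trial() for _ in range(reps)) / reps
print(f"false positive rate ~ {false_positive_rate:.3f}")
```

Stratifying would mainly buy power (a smaller standard error for the treatment contrast), not a smaller false positive rate.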

There's a long history of scientists being fooled when investigating "paranormal" events. I suspect statisticians are as vulnerable as physicists. That's the source of my opinion (which is not a fact of mathematics) that magicians and those giving magicians a skeptical assist are awfully important.
-Dick Startz
 
An article by Kernan et al. (1999) in the Journal of Clinical Epidemiology 52(1) did a simulation of the effect of simple randomization in the presence of an important or prognostic factor (e.g. things such as age, disease severity, etc.).

[...]

They also found that as the incidence of the prognostic factor increases, the false positive rate also increases. For an n of 30 and a factor present in 30% of patients, this false positive rate is 43%.


I'm also interested in seeing the full paper, if possible.

What is the definition of a positive result? If it's defined, as usual, so that 5% is the probability of a positive result given the null hypothesis, then of course, on the null hypothesis, 5% of results will be positive. So, what, exactly, was 43% of what?

I bet that the statistical test they use, which defines "positive result", is based on a null hypothesis that assumes no confounding factors. (What? You mean, not everything is normally distributed? :D) So, then, where there are confounding factors, the statistics come out wrong.

Saizai, you also need to be more specific about what the definition of a positive result is, in your setup. I understand that each patient will get a score that is a number between 0 and 100. So, you'll have a bunch of numbers for the patients who were prayed for, and another bunch of numbers for the patients who weren't prayed for. Now what? How do you get a binary positive/negative decision out of all these numbers?

To satisfy the JREF, I think you'll need to use some sort of non-parametric test, which makes no assumptions about the overall distribution of the scores, but assumes only that the patients were randomly assigned.
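As a concrete (and entirely hypothetical) example of such a test, here is a sketch of a two-sided permutation test on the difference in mean scores; the helper name and the data are made up, and the only assumption is that the group labels were randomly assigned:

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in means.
    Valid under the sole assumption that labels were randomly assigned."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # reshuffle the labels
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    return hits / n_perm

# Made-up 0-100 scores, for illustration only
prayed_for = [62, 55, 71, 48, 66, 59]
not_prayed_for = [50, 47, 58, 61, 44, 52]
p_value = permutation_test(prayed_for, not_prayed_for)
print(f"permutation p-value = {p_value:.3f}")
```

The binary positive/negative decision is then just `p_value < alpha` for whatever alpha the Challenge demands.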
 
1. Non-stratified sampling does not affect the size of a test. It does affect power. Therefore, if the Challenge standard is no more than 1/1000 (or whatever) of a false positive, randomization works fine. If one is trying to get a better research design, then power also matters. For this, stratification can help.
If this were true then why ever adjust or stratify for a confounder?

2. Nothing in (1) is affected by sample size, because the critical value for a test adjusts for sample size.
You should really look at some of the Bayesian thoughts on this, anything by James Berger out of Duke should help you...

(1) and (2) are just facts of mathematics. They don't really admit of argument. In contrast, legitimate objections to the outcome of simulations are that what's being simulated doesn't correspond to the experiment being run. What's important is that the math correspond to the experiment being run, which isn't always the experiment that we think we see.
No, they're also facts of clinical trials. Do you really think that pure randomization will handle clinically relevant factors such as age, disease severity, disease treatment, socioeconomic status, smoking status, comorbid conditions, etc. that have proven to be important in nearly every clinical trial?

And the paper I submitted for proof only discussed the presence of 1 clinically relevant factor. Stratification is an absolute necessity in small sample trials.

There's a long history of scientists being fooled when investigating "paranormal" events. I suspect statisticians are as vulnerable as physicists. That's the source of my opinion (which is not a fact of mathematics) that magicians and those giving magicians a skeptical assist are awfully important.
-Dick Startz
Absolutely correct but irrelevant. There are also people who think of statistics as a sort of magic that conveys a sacred advancement into the realm of believability. Do any of my statements and suggestions make you believe that I am credulous when it comes to statistical design?

Seriously, this guy wants to design a trial to test the power of intercessory prayer on disease outcomes. But it doesn't matter if he were testing a new drug designed to eliminate AIDS; the clinical trial protocols are the same. Regardless of what you're trying to test, you need to adjust for clinically relevant factors that can enhance or obscure an effect. You need to isolate the effect of just the treatment in question from all of the other things that could also explain it. Simple randomization does not do this unless he samples from a homogeneous population, which is about as likely as winning the Powerball.

He has multiple outcomes with the QOL instrument he's using. He is also collecting data regarding costs, etc. to see if IP has an effect. Can anyone here really support his decision to ignore clinically relevant factors such as age, disease severity, disease treatment, etc. given what his sample size (n=50) and experimental design (t-tests) are? Does anyone besides Saizai think that this is a good design?
 
You have not made any reasonable a priori constraints on what you are looking for.

You have made no attempt to choose variables that are a reliable and valid measure of what you claim to be attempting to measure.
Such as? I repeatedly asked for specific examples.

You want us to give you examples of something you haven't done? Linda's point was that you have not made any constraints. How can she possibly give you an example of something that isn't there?

Not so. They already say that they will be different in order to make it more difficult. That is a change, right? It's in the choice of N.

The protocol is identical: I choose, arbitrarily, a score equation at the beginning of the round. In terms of ensuring that there is no fallacy or fraud going on, it does not matter what I choose. You have ignored my repeated requests for an example of what I might choose that would cause an unusually high false positive rate.

Exactly. You choose a scoring system at the start of each round. That is, you score each round in a different way. The rounds do not have the same protocol. It is not at all the same as changing N. All that means is that you are following the same protocol, but you follow it a different number of times. In fact, it has been debated many times that N could be exactly the same for both. If the preliminary has a 1/1000 chance and the final has a 1/1000 chance, then the total chance of winning is 1/1,000,000. In your case you are proposing changing the very thing that determines whether you win or lose. How do you not understand that this could never be acceptable?
 
You want us to give you examples of something you haven't done? Linda's point was that you have not made any constraints. How can she possibly give you an example of something that isn't there?

I got a kick out of that as well. :)

Linda
 
I got a kick out of that as well. :)

...where?:D

Jokes aside. You say that Saizai can't do it while you can design a solid study in your sleep. Fine. Can you please specify:
- how much would you charge to provide the design
- how much do you estimate the cost of conducting the study would be.
Then we can talk practicalities.
The paradox of these chat rooms is that they are next to useless without competent professionals, and the professionals who join them jeopardise their credibility by stooping to the amateurs' level.
 
Baby and Bathwater

saizai,

Please allow me to join the debate late. I believe that I may have a fresh outlook on the task you present for us here. I hope that I don't throw out the baby with the bathwater.

I have a bit of a reputation around this Forum of proposing radical changes and my own challenges, replete with cash incentives. Let's all work together to build the best protocol.

Your goal is to show that prayer influences health, even at great distances. I'd suggest that you hypothesize that prayer even improves health.

First, I suggest that you've selected your populations poorly. Selecting those suffering from disease, those using the Internet, and those willing to volunteer greatly limits your ability to use randomization to avoid confounding effects. The power of your tests will suffer accordingly. Why not select healthy individuals from a controlled population, such as a prison, a church, or a university?

Second, I suggest that you've selected your outcomes poorly. Pain, for example, is highly subjective and readily confounded. As a terminal cancer patient (not quite end-stage, thank you), I can tell you that my pain varies based on whether a friend called today, the amount of sunshine, how long my palliative drugs have been in the refrigerator (a shelf-life issue), how long it's been since my chemotherapy (the chemo has a palliative benefit too), and a host of other items. You must accept one of: 1) a very large sample size (Don't! It's too expensive.), 2) stratification (Don't. It's too complicated.), or 3) a new outcome that you can readily and accurately measure (Do this!). Why not select blood pressure (give everyone an electronic BP cuff)? How about mental acuity (have everyone take a web-based skill test)?

Third, I suggest that you're not blind enough. Your studies include a great deal of cognitive science. You know how we deceive and trick even ourselves into believing what we want. You, indeed absolutely no one, should know the assignments until the end of the trials. You must not be assigning individuals based on Round 1 results into groups in Round 2. Totally blind your studies. It's the only way to go, really! I know it may seem hard, but we can help. It's really easier than you think. (Randi, by the way, has a "magical" way of "divining" such blinded studies that is just "amazing"!)

Fourth, I urge you to go with an A-B study to reduce the confounding effects. Assign half of the receivers to Group A, half to Group B. Assign the providers to A. Run one trial. Measure all receivers. Assign the providers to B (tell them that A have been cured, killed, died, forgotten, converted to Pastamania, or something not so cruel or funny). Run one trial. Measure all receivers. Now ANOVA. If you assume (and you really can't, but hey, there's still the randomization) that no confounding variables were more likely to occur during one trial than the other, you've blocked the confounding effects.
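(A sketch, for the curious: once each receiver has a measurement from both trials, the simplest version of the analysis is a within-subject comparison. The numbers below are invented blood pressure readings, and a paired t statistic stands in for the full ANOVA.)

```python
from math import sqrt
from statistics import mean, stdev

# Invented systolic BP readings: one per receiver per trial
prayed   = [118, 126, 131, 122, 115, 128, 120, 133]  # trial with prayer support
unprayed = [121, 124, 135, 125, 114, 130, 123, 131]  # trial without

# Each subject serves as their own control: take the within-subject difference
diffs = [p - u for p, u in zip(prayed, unprayed)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
print(f"paired t = {t_paired:.2f} on {len(diffs) - 1} df")
```

The within-subject pairing is what blocks subject-level confounders such as age or disease severity out of the comparison.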

To tie a bow around it:
1) You and I design a simple web-based intelligence test.
2) You and I design a simple survey that asks the questions (first name, age, gender, eye color, nose length, whatever, and verified email address) the answers to which (except for the email address) we'd like to provide to the providers, and a second survey to get email addresses of providers.
3) I drive 50 miles north on a Sunday morning and visit a number of churches, posting fliers asking them to visit a certain website to register for the study as receivers.
4) I drive 50 miles south on the next Sunday morning and visit a number of churches posting fliers asking them to visit another certain website to register for the study as providers.
5) The computer programs (I write them and you review them) implement our protocol, sending the correct emails to the receivers and providers at the appropriate times, and after two months they send both of us an email of the resultant data. I provide all needed software, hardware, and domain services. I maintain complete lockdown of the machines, the code, and the data. You do not get to know the website names or towns or churches. Indeed, we do not get any data (except for a daily "heartbeat" report listing the number of registrations, tests, emails sent, and days remaining in the current step) until after the test's conclusion.
6) If the test shows a significant positive effect of prayer, then you win $1000 (of my own money) and my support in applying for MDC. If not, you agree to cease all requests for donations on any website forever more regarding any paranormal claim, especially the healing effects of prayer.

By the way, I am tempted to offer $5 to anyone who improves by better than average in test 2 over test 1 or test 3 over test 2. Perhaps it should be $100 to the church with the person with the best improvement during a trial and $100 to the church of the provider associated with the best provided-for improvement. I'm still considering the ramifications, such as collecting information about where to send checks, and the effect on my checking account.

I believe that you'll do us both a favor by taking some serious time to really consider this proposal. I believe that you won't find more support anywhere than in the text above.

Determinedly,
Gulliver
 
What timing! Gulliver, it looks like you have saved me some money (assuming that Linda does not find fault with your design also ....:D )

Happy to shore up your checking account if this goes anywhere.
(Based on previous examples, you will excuse me for not running to the cheque book just yet...)
 
I suspect our balances are safe.

What timing! Gulliver, it looks like you have saved me some money (assuming that Linda does not find fault with your design also ....:D )

Happy to shore up your checking account if this goes anywhere.
(Based on previous examples, you will excuse me for not running to the cheque book just yet...)

Thanks so much for the support, and so quickly too. Now if I had just prayed for such a miracle, we might have something :)

Oh, and there's definitely fault to be found in the proposal, but time and the Forum's kind members will address most of it. (And I do subscribe to the pretty-good-now-is-better-than-perfect-never philosophy.)

Most gratefully,
Gulliver
 
You, indeed absolutely no one, should know the assignments until the end of the trials.


I don't think he will. His computer will.

(I don't know what would stop him from asking his computer, though. He should clarify this.)

You must not be assigning individuals based on Round 1 results into groups in Round 2.


The rounds involve different individuals. No one participates in both rounds.

He plans to look at the results of round 1 to see what things prayer apparently influenced the most, and then he predicts that in round 2 it will influence the same things in totally different people.

Fourth, I urge you to go with an A-B study to reduce the confounding effects. Assign half of the receivers to Group A, half to Group B. Assign the providers to A. Run one trial. Measure all receivers. Assign the providers to B (tell them that A have been cured, killed, died, forgotten, converted to Pastamania, or something not so cruel or funny). Run one trial. Measure all receivers. Now ANOVA. If you assume (and you really can't, but hey, there's still the randomization) that no confounding variables were more likely to occur during one trial than the other, you've blocked the confounding effects.


That makes a lot of sense.
 
1) You and I design a simple web-based intelligence test.

I question the choice of the ability to take an intelligence test as a valid metric. There are far too many things that will affect this, even if you ignore the risk of deliberate cheating. Just as many things affect pain, so do many things affect people's ability to take tests: time of day, state of mind, hunger, thirst, caffeine, etc. In addition, it is fairly well established that IQ tests are not actually valid tests of IQ because people learn to do better at them. Given that, testing to see if people do better at what is effectively an IQ test given lots of time and practice seems virtually guaranteed to find a difference.

Finally, you have not solved the most important objection to Saizai's claim. Neither his test nor yours will provide proof of anything. The whole point of Randi's challenge is that it provides undisputed proof that the applicant can do what they claim, although the actual method of doing so can still be disputed. For example, if a dowser can find which bucket has water hidden under it 19/20 times, it certainly shows that they can find water, although it does not prove that dowsing itself actually works. However, a study that shows a statistically significant difference between two groups proves absolutely nothing; all it shows is that something interesting might be happening and more research could be needed. Neither Saizai's test nor yours will ever be acceptable as a challenge for the million because they are just not challenges; they are simply medical studies.
 
By the way, I am tempted to offer $5 to anyone who improves by better than average in test 2 over test 1 or test 3 over test 2. Perhaps it should be $100 to the church with the person with the best improvement during a trial and $100 to the church of the provider associated with the best provided-for improvement. I'm still considering the ramifications, such as collecting information about where to send checks, and the effect on my checking account.
This would be enough incentive for me to intentionally cheat by answering badly on the first test. I think that cash prizes should probably be avoided unless you control for this.
 
...where?:D

Jokes aside. You say that Saizai can't do it while you can design a solid study in your sleep. Fine. Can you please specify:
- how much would you charge to provide the design
- how much do you estimate the cost of conducting the study would be.
Then we can talk practicalities.
The paradox of these chat rooms is that they are next to useless without competent professionals, and the professionals who join them jeopardise their credibility by stooping to the amateurs' level.

I'm sorry. I can't tell how I'm supposed to take this. Perhaps it's just my pre-coffee state.

Linda
 
Grateful Reply

First and foremost, I thank you for your reply. I learn a great deal from such comments, and yours are most kind as well.
I don't think he will. His computer will.

(I don't know what would stop him from asking his computer, though. He should clarify this.)
May I clarify please? I intend that saizai will not be able to obtain any data except for the heartbeat email. I'll maintain the computer involved. I won't tell him the websites involved. I won't give him access at any time. The fliers will contain a "password" for the church involved. Without the password, no one will be able to register. I intend not to disclose the towns or churches or websites and to maintain a complete and secure lock-down on the computer involved. We get the results only after the complete test. I even believe that we can agree on the statistical test before the first step. I'd like to have my computer set up before the lock-down to run the test automatically and email the results without further human intervention.
The rounds involve different individuals. No one participates in both rounds.

He plans to look at the results of round 1 to see what things prayer apparently influenced the most, and then he predicts that in round 2 it will influence the same things in totally different people.
That's a fair gig. I should have done better on this point. To improve my point, please let me say that saizai might still accidentally cause a bias. Let's say he notices that men are less likely to improve with prayer. He might assign more women to the receiving group, even when random assignment is used. I've actually made this mistake once. In a graduate level experimental design course, I rejected certain coin tosses (fell on the floor, didn't rotate enough, didn't catch it right, etc.) whenever I didn't like the result. I'm rather sure that I wasn't intentionally introducing the bias, but the videotape sure made it look that way. My professor was happy to have proven her point on two of the ten students. I was saddened to earn a poor grade for my stupidity.

Most gratefully,
Gulliver
 
Lost here

Cuddles,

As I recall, we've worked to great results before. Consider, for example, our teaming on dealing with Dargo. May I please have your kindness to review this carefully? I believe in you and your abilities, but find your responses here difficult to accept. I will do my best to respect your comments and to appreciate the effort you must have made to share your insights with me.

I question the choice of the ability to take an intelligence test as a valid metric. There are far too many things that will affect this, even if you ignore the risk of deliberate cheating. Just as many things affect pain, so do many things affect people's ability to take tests: time of day, state of mind, hunger, thirst, caffeine, etc. In addition, it is fairly well established that IQ tests are not actually valid tests of IQ because people learn to do better at them. Given that, testing to see if people do better at what is effectively an IQ test given lots of time and practice seems virtually guaranteed to find a difference.

I must respectfully disagree. Intelligence tests are well established as not varying under many conditions. Caffeine intake is probably the most notorious, but please reference http://www.springerlink.com/content/t25366v1554q33m0/ for an interesting summary of a recent article in the peer-reviewed _Cellular and Molecular Life Sciences (CMLS)_. Even given your claim, won't the A-B nature of the experimental design eliminate any influence? Let's say 10 out of 100 subjects cheat to improve their performance, by whatever means. Since they don't know in which trial they're receiving prayer support, they can't bias the result.

Finally, you have not solved the most important objection to Saizai's claim. Neither his test nor yours will provide proof of anything. The whole point of Randi's challenge is that it provides undisputed proof that the applicant can do what they claim, although the actual method of doing so can still be disputed. For example, if a dowser can find which bucket has water hidden under it 19/20 times, it certainly shows that they can find water, although it does not prove that dowsing itself actually works. However, a study that shows a statistically significant difference between two groups proves absolutely nothing; all it shows is that something interesting might be happening and more research could be needed. Neither Saizai's test nor yours will ever be acceptable as a challenge for the million because they are just not challenges; they are simply medical studies.

Now, Cuddles, I really think you need to sit in a cozy armchair and ponder your statements above. You're much too smart to make this claim. We aren't interested in how it works. We are concerned with whether, after eliminating other factors, we can show with statistical confidence that a paranormal force affected the outcome under controlled conditions. If so, then the claimant deserves our further consideration. I would, of course, defer to JREF for the next test if this one shows a positive result. I would not claim to have proven anything.

If you insist on maintaining your position, would you please provide a quote from the JREF FAQs or instructions that eliminates "medical studies" as "just not challenges"?

I sincerely hope that we can disagree here and maintain our professional friendship. You have my respect, and I ask your indulgence to allow me to disagree so tersely.

With real gratitude,
Gulliver
 
Point taken

This would be enough incentive for me to intentionally cheat by answering badly on the first test. I think that cash prizes should probably be avoided unless you control for this.

I must agree with you in many ways. I can see your point. While I'm only tempted to make cash incentives a part of the protocol, I remain undecided. To counter the harm you so clearly expressed, I offer that we might double our sample size with only $200 in incentives. Since the cheaters wouldn't know in which A-B trial they're receiving prayer, they could not bias the outcome.

(Full Disclosure: I'm a theist. I know. I know. But I am not religious and I definitely don't believe in any type of divine intervention. It's difficult for me to write the next few sentences.)

I suggest that cheaters could create bad data points, assuming that a fair God wouldn't intercede for a cheater, even one receiving prayer support. This problem would reduce the test's ability to detect legitimate effects on those who don't cheat. So cheaters, under this assumption, hurt, not help, saizai's case.

I'll keep pondering the point, but I'm leaning toward no cash incentives.

Better educated now,
Gulliver
 
May I clarify please? I intend that saizai will not be able to obtain any data except for the heartbeat email. I'll maintain the computer involved. I won't tell him the websites involved. I won't give him access at any time.


Right, I understand that this is what you're proposing.

It seemed that you were criticising his different proposal (as found on his website www.prayermatch.org), and I thought that the criticism wasn't necessarily warranted. Even under his own proposal, he claims that he won't know which patients get prayed for and which don't until the end of the study.

Let's say he notices that men are less likely to improve with prayer. He might assign more women to the receiving group, even when random assignment is used.


Yes, that is exactly the sort of thing he intends to do. But why would it be a problem? I think that prayer can't improve anyone's health. A demonstration that prayer improved the health of women would still be remarkable.

I've actually made this mistake once. In a graduate level experimental design course, I rejected certain coin tosses (fell on the floor, didn't rotate enough, didn't catch it right, etc.) whenever I didn't like the result.


That's different, I'm pretty sure. The decision to reject a particular coin toss was based, at least in part, on the result of that coin toss.

But suppose you had two rounds of coin tossing. In the first round, coins that land on the floor happen to be mostly heads. You are aiming for tails, say. Therefore, you decide to ignore, in the second round, all coins that land on the floor. This would not be a problem. Assuming that coins which land on the floor have a 50% chance of coming up heads, you haven't increased your chances of getting lots of tails in the second round, by choosing to ignore coins that land on the floor.

Similarly, assuming prayer doesn't work for anyone, saizai can't increase his chances of getting a positive outcome from his study, by limiting the second round to women. Of course, if prayer does work for women, he can. But that's ok---we want him to, in that case!
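The coin-toss argument is easy to verify by simulation. In this toy Python sketch (my own construction, with arbitrary numbers), round 1 picks the subgroup that looked most "promising", and round 2 tests only fresh subjects from that subgroup; because the round 2 data are new and the null is true, the round 2 false positive rate stays near the nominal 5% despite the post-hoc selection:

```python
import math
import random
import statistics

def z_stat(n, rng):
    """One round on n fresh subjects from one subgroup; the null is true."""
    scores = [rng.gauss(50, 10) for _ in range(n)]
    treated = [rng.random() < 0.5 for _ in range(n)]
    a = [s for s, t in zip(scores, treated) if t]
    b = [s for s, t in zip(scores, treated) if not t]
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b))

rng = random.Random(7)
reps, hits = 5000, 0
for _ in range(reps):
    # Round 1: look at men and women separately, keep the "promising" subgroup
    z_men, z_women = z_stat(40, rng), z_stat(40, rng)
    chosen = "men" if z_men > z_women else "women"  # post-hoc choice
    # Round 2: brand-new subjects drawn from the chosen subgroup only; under
    # the null their scores have the same distribution either way
    if abs(z_stat(40, rng)) > 1.96:                # ~5% two-sided test
        hits += 1

round2_rate = hits / reps
print(f"round-2 false positive rate ~ {round2_rate:.3f}")
```

The selection in round 1 only matters if the subgroups actually differ, which is exactly the case where we want the effect to show up.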
 
