• Quick note - the problem with Youtube videos not embedding on the forum appears to have been fixed, thanks to ZiprHead. If you do still see problems let me know.

Check my methodology - prayer study

While I thank you for your concern, that is not something I am interested in discussing, as it falls within the "what will you do with it if you win" category.

I'm very disappointed to hear that. You do not seem to be a professional and so you are probably not under an obligation to act in an ethical manner, but I was hoping that you would choose to do so, anyway. That you have no interest in improving your methodology in order to decrease the probability that you are wasting part of what life remains for these people is disturbing to me.

That is only one aspect of it.

You could just as well claim that serious researchers would never accept a positive finding, and that therefore any research is completely fruitless. I happen to disagree. However, per above, I do not want to discuss this further, as it is not related to a specific critique of (and preferably, improvement to) my methodology.

It is related to critique and improvement of your methodology, just not the part that you care about.

It is not vaguely defined, except the comments section, which I have said that I do not intend to use for the purposes of conclusion-relevant analysis.

It is vaguely defined in that at no point have you indicated what useful or relevant concept you are measuring when you collect data such as "cost of treatment". The only measure you have that seems to be reasonably reliable and valid (as a measure of pain) is your 10 point pain scale. Even your hard outcomes, such as death, will be measured in an unreliable fashion, making it unclear what exactly it means to not be registered as a death.

Certainly. Which is why I set the analysis before the data is collected, per standard rigorous protocol. No sharpshooter fallacy here. :)

I was referring only to your first study, which is exploratory in nature and does not involve pre-set comparisons.

Argument from ignorance is generally valid only in some very limited circumstances, where you have proven the ability of the test to detect the thing tested for, and are claiming that the (new) negative results are therefore evidence that the thing tested for does not exist in the place it was newly tested for.

See my "Charlie the Treasure Hunter" analogy; should come up on a forum search.

The circumstances are no more limited than they are for testing a positive.

The analogy didn't come up for me, but I think I know what you mean.

Correct. I would like 50 < n < 500, but it's primarily a pragmatic question of how many qualified participants can be recruited.

It would be more responsible of you to figure out beforehand how many you need before undertaking a three year project.

How would it do so [at likelihood > p], given that the sorting into groups is random?

I will work on an illustrative example (which may be more useful than a Monte Carlo sim).

If you believe that the sim is invalid, please propose an alternate sim so that we can test your hypothesis. :)

It would make more sense to wait until you have an idea what the parameters will be.

Linda
 
I'm very disappointed to hear that. You do not seem to be a professional and so you are probably not under an obligation to act in an ethical manner, but I was hoping that you would choose to do so, anyway. That you have no interest in improving your methodology in order to decrease the probability that you are wasting part of what life remains for these people is disturbing to me.

You seem to have grossly misunderstood what I said.

I am interested in improving methodology. The quote I was responding to, however, had no suggestions for methodology, only for what I might do with the study once it was completed, how it might be published, etc. That I do not see any reason to discuss here.

I am under obligation to act ethically simply on ethical grounds, not out of any academic contract.

It is related to critique and improvement of your methodology, just not the part that you care about.
If you have a suggestion for how to improve the methodology of the study itself - to make it more reliable, more sensitive, more comprehensive, etc - I'm listening.

Your definition of 'methodology' does not seem to fall within that.

It is vaguely defined in that at no point have you indicated what useful or relevant concept you are measuring when you collect data such as "cost of treatment". The only measure you have that seems to be reasonably reliable and valid (as a measure of pain) is your 10 point pain scale. Even your hard outcomes, such as death, will be measured in an unreliable fashion, making it unclear what exactly it means to not be registered as a death.
"Cost of treatment" = total pre-insurance medical bills. Not really a subjective or 'soft' measure.

The 10 point pain scale is superseded by the SF-36v2 HRQOL.

Explain what you mean re deaths? That was unclear. Preferably, propose an improvement to whatever flaw(s) you see...

I was referring only to your first study, which is exploratory in nature and does not involve pre-set comparisons.
Ah. That is part of the design that has changed since. The new one is:
Round 1: score equation = SF36v2 HRQOL (tentatively)
Round 2: score equation determined based on Round 1 data, default to SF36v2 again

No exploratory round.

It would be more responsible of you to figure out beforehand how many you need before undertaking a three year project.
Two year (it's been revised).

And yes, it would be decided beforehand; but choice of N is dependent on quite a number of things, some of which aren't yet settled.

Again, do you have a specific suggestion for what it should be and why rather than just non-constructive criticism?

I will work on an illustrative example (which may be more useful than a Monte Carlo sim).
Please make it one that is simable. Previous posters have claimed to have examples that, when simulated, turned out (per my argument) to be spurious.
 
Suppose this was a "wishing on a star" study--and you were going to match up wishees and wishers-- do you see how hard it would be to prove that any of the star wishing had any effect on the wishees? Sure, there might be some effect of some sort--but we would have no way of attributing it to star wishing.

People have probably already pointed you to the Harvard heart prayer study--that is what a good study looks like. You can easily see if there are effects and if the effects are related to prayer and/or knowing that one might be being prayed for etc. You have nothing to control for a placebo effect or any way of measuring healing.

You've asked us to go to your website and evaluate your protocol, now do something for the people who were kind enough to respond to you. Go watch this video http://www.whydoesgodhateamputees.com/video8.htm and tell us how you propose to control for such an illusion? People attribute healings to all sorts of things--prayer, wheatgrass juice, rain dances, good karma, etc.--but that doesn't mean any of these things are responsible. Most things get better on their own. Many things respond positively just to the idea that someone is doing something or cares. The placebo effect is very real.

Listen to what the smart people on this forum are telling you if you are serious. And learn how to design a good prayer study. http://www.hno.harvard.edu/gazette/2006/04.06/05-prayer.html

Are you familiar with Randi's astrology "class"? http://www.youtube.com/watch?v=3Dp2Zqk8vHw

If you want to be taken seriously, you need to show some awareness of how people fool themselves and what a double blind study is--you seem not to understand either...or even why your protocol isn't a protocol. You may as well see if chanting for world peace works while you're implementing your study.
 
articulett - I don't think you understand my methodology. The measurement is clear; there is no need to control for placebo since all recipients are treated identically (it's blinded); and you can tell whether it has an effect by comparing the groups.

You do not seem to understand either that this is a randomized controlled trial, or what one is; I am not making any analysis from any specific situation, but from the group differences (or lack thereof). The illusion video you linked to does not apply to me. (Also, please note that I am not myself a theist.)

If you have a specific reason why you believe that my protocol is not in fact a blinded randomized controlled trial, please quote my study design and explain what fallacy you believe is going on.

P.S. It happens that my major is in cognitive science, so I am well familiar with a very wide range of cognitive and visual illusions, errors, fallacies, etc. :) I am not making any.
 
*laugh* Correct. Figures there'd be someone around who wants to be precise.

In any case, p is effectively the uncertainty factor, i.e. the chance that you haven't proven what you thought you had, which is all I was using it for in my argument.


The chance that you haven't proven what you thought you had, given what?

I think you underestimate the importance of precision on this point.

Per above, this is certainly true. However, I think we are getting outside the range of valid objections to my protocol, and into simply general objections to all research, particularly speculative research. As it is not related to any particular flaw of *my* methodology, I'd rather not get into it.


I do not object to all research. I object to hypothesis testing that concerns itself only with the level of significance of the test but ignores questions of power. A positive result from such a test is quite impossible to interpret.

And that's not possible to know. (Though theists may claim otherwise.)

Certainly it's not possible to even discuss what p(positive | it works [in manner X]) is without discussing X. And I will not get into any discussion about X, as I consider it a waste of time given the dearth of valid non-contradictory evidence to determine X.


One does not have to believe that X exists, to discuss X for the purpose of designing a study to check whether X exists.

One has to discuss X, to design a study to check whether X exists.

How would it do so [at likelihood > p], given that the sorting into groups is random?

If you believe that the sim is invalid, please propose an alternate sim so that we can test your hypothesis. :)


Can we back up here?

A simulation is fine, but what should be simulated?

You think ("at likelihood > p") that the simulation should simulate many runs of the experiment, all conducted under the assumption that prayer is ineffective. Some of these runs will have a negative result; some will have a postive result. The simulation should report the fraction of runs that give a positive result.

How does this kind of simulation accurately reflect the situation we find ourselves in, should the real experiment produce a positive result? In this situation, we have to decide whether the positive result is probably a false positive or probably a true positive.

The simulation involves many negative results; in the real situation, the result is known to be positive.

The simulation involves no true positives; in the real situation, the result might be a true positive. (Obviously. The whole question is to decide whether it is or isn't!)

Such a simulation is irrelevant.

Well, let me qualify that. It is irrelevant from a scientific point of view, from the point of view of someone who isn't sure whether prayer works and who is interested in finding out what a positive experimental result would imply about the issue. From the point of view of (a caricature of) the JREF, which is already sure that prayer doesn't work and which is interested solely in keeping its million dollars with high probability, it is sufficient.
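
To make the argument concrete, here is a minimal sketch of the calculation it implies. The power figure and the prior probabilities are illustrative assumptions on my part; the 1-in-1000 acceptance threshold is the one mentioned elsewhere in the thread for the challenge.

```python
# Minimal sketch: a null-only simulation tells you alpha (the false positive
# rate), but interpreting an actual positive result also requires the power
# of the test under some assumed effect, plus a prior probability that the
# effect is real. The power and priors below are illustrative assumptions.

def prob_true_positive(alpha, power, prior):
    true_pos = power * prior           # effect is real AND the test is positive
    false_pos = alpha * (1 - prior)    # effect is absent AND the test is positive
    return true_pos / (true_pos + false_pos)

alpha = 0.001   # roughly the 1-in-1000 acceptance threshold mentioned in the thread
power = 0.80    # assumed probability of a positive result if the effect were real
for prior in (0.5, 0.01, 1e-6):
    print(f"prior = {prior}: P(true positive | positive) = "
          f"{prob_true_positive(alpha, power, prior):.4f}")
```

On the caricatured keep-the-million view, only alpha matters; on the scientific question of what a positive result would mean, the answer swings wildly with the assumed power and prior.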
 
The chance that you haven't proven what you thought you had, given what?

I think you underestimate the importance of precision on this point.

Please explain then how the difference is relevant here, and how exactly a better design could be made.

If the flaw is with any research that attempts to ask what I'm asking, well, ain't much I can do about that. :p

I do not object to all research. I object to hypothesis testing that concerns itself only with the level of significance of the test but ignores questions of power. A positive result from such a test is quite impossible to interpret.

See above.

I'm happy to discuss ways to improve the methodology. I am not particularly interested in discussing the problems with trying to do a study of prayer at all, e.g. ones based on claiming that the prior probability is infinitesimal and therefore ANY result from ANY study design is going to yield a very low posterior probability. That's just not very useful.

So please: offer a suggestion for how it can be improved, or don't waste your time and mine.

One does not have to believe that X exists, to discuss X for the purpose of designing a study to check whether X exists.

One has to discuss X, to design a study to check whether X exists.

Not very much. I am interested in whether prayer can influence the outcomes I am measuring. I am not, per se, testing whether prayer exists.

This is because I don't care whether prayer exists, unless it is a sort that can influence the things I am interested in measuring.

Hopefully the distinction makes sense.

Thus, my choice of measure is not dependent on X, but on what types of prayer, as a class, I care about.

A simulation is fine, but what should be simulated?

You think ("at likelihood > p") that the simulation should simulate many runs of the experiment, all conducted under the assumption that prayer is ineffective.

Why? Conduct them under either assumption.

Some of these runs will have a negative result; some will have a positive result. The simulation should report the fraction of runs that give a positive result.

Yup.

How does this kind of simulation accurately reflect the situation we find ourselves in, should the real experiment produce a positive result? In this situation, we have to decide whether the positive result is probably a false positive or probably a true positive.

The simulation involves many negative results; in the real situation, the result is known to be positive.

The simulation involves no true positives; in the real situation, the result might be a true positive. (Obviously. The whole question is to decide whether it is or isn't!)

Such a simulation is irrelevant.

Well, let me qualify that. It is irrelevant from a scientific point of view, from the point of view of someone who isn't sure whether prayer works and who is interested in finding out what a positive experimental result would imply about the issue. From the point of view of (a caricature of) the JREF, which is already sure that prayer doesn't work and which is interested solely in keeping its million dollars with high probability, it is sufficient.

I see. Interesting points.

However, the JREF has previously taken on experiments - less complicated perhaps - which still reduce to the same issue: there is some chance that the applicant will get a positive result by luck alone. In this context, what we are discussing, ultimately, must resolve to that.

Take that into account when proposing your modified simulation, one which compares both prior assumptions of prayer working and prayer not working. Remember that JREF is willing to accept positive results simply by virtue of them being very unlikely (e.g. 1/1000th).

Clearly they are not making the rather more complicated prior/posterior probability sort of argument based on previous research, the power of the experiment being conducted, etc.

And again, rather than purely attacking the power of the study, please suggest improvements to the methodology that would not have the flaws you see in the current one.

If you cannot do so, then there is no point having this discussion.
 
Ah. That is part of the design that has changed since. The new one is:
Round 1: score equation = SF36v2 HRQOL (tentatively)
Round 2: score equation determined based on Round 1 data, default to SF36v2 again

No exploratory round.

Round 2 is not the same as round 1. Round 2 is based on data gathered in round 1. Therefore round 1 is exploratory. In addition, the JREF preliminary and final test have to follow the same protocol. As it stands, your protocol is not acceptable for the JREF challenge because you plan on changing the most important part of your test specifically to make it more likely to get a positive in the final test.
 
There may be hope for you yet. :)

While you have disparaged my (and others') suggestions as not relevant, you have gone ahead and made some of the suggested changes.

"Cost of treatment" = total pre-insurance medical bills. Not really a subjective or 'soft' measure.

Height is not a soft measure either.

Do you really have any idea whether or not "cost of treatment" is related to improvement in health? Or even in which direction?

There are costs associated with asking too many questions - more drop outs for one. Now that you've chosen a reasonable outcome measure, I would suggest dropping all the miscellaneous questions you had on your list.

Explain what you mean re deaths? That was unclear. Preferably, propose an improvement to whatever flaw(s) you see...

This is the question you need to answer - will you find out about all of the deaths if you take a passive approach?

Ah. That is part of the design that has changed since. The new one is:
Round 1: score equation = SF36v2 HRQOL (tentatively)
Round 2: score equation determined based on Round 1 data, default to SF36v2 again

No exploratory round.

That's a good start. You will need information on how the SF36 performs in people with cancer. Does it discriminate between different levels of HRQOL or are most people clustered at one end (poor) of the scale? What's the variability, sensitivity, specificity, etc.?

I don't understand what you mean by "score equation determined based on Round 1 data". Wouldn't you just look for significant differences in the score between the two groups? Are you talking about adjusting the scores based on other variables? 'Cuz then it's back to being exploratory.

And yes, it would be decided beforehand; but choice of N is dependent on quite a number of things, some of which aren't yet settled.

Again, do you have a specific suggestion for what it should be and why rather than just non-constructive criticism?

Yes, I did make specific suggestions as to what and why. You should have a sample size that has an adequate power to detect a small effect. The "why" was contained in one of the paragraphs you dismissed.
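
For reference, a minimal sketch of what such a power calculation looks like; the effect sizes, alpha, and power target below are illustrative assumptions rather than figures anyone in this thread has committed to.

```python
# Minimal sketch: per-group sample size for comparing two group means, using
# the normal-approximation formula n = 2 * ((z_{1-alpha/2} + z_{1-beta}) / d)^2.
# The effect sizes, alpha, and power target are illustrative assumptions.
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for a two-sided test
    z_beta = norm.ppf(power)            # quantile matching the desired power
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

for d in (0.2, 0.5, 0.8):   # conventional "small", "medium", "large" effects
    print(f"d = {d}: roughly {n_per_group(d):.0f} participants per group")
```

Detecting a small standardized effect (d around 0.2) at conventional thresholds takes on the order of 400 participants per group, which is why a range like 50 < n < 500 needs to be pinned down before the trial starts.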

Please make it one that is simable. Previous posters have claimed to have examples that, when simulated, turned out (per my argument) to be spurious.

I think with the changes that are being made, it will become a non-issue.

Linda
 
Round 2 is not the same as round 1. Round 2 is based on data gathered in round 1. Therefore round 1 is exploratory. In addition, the JREF preliminary and final test have to follow the same protocol. As it stands, your protocol is not acceptable for the JREF challenge because you plan on changing the most important part of your test specifically to make it more likely to get a positive in the final test.

No, the protocol is still the same: I decide what the test will be before the data is gathered, and the test has no way to know which group someone is in. I happen to be choosing the test arbitrarily in the case of the first round, and based on data acquired in the case of the second round. There is no test I can choose within that constraint that would invalidate proper protocol. If you believe there is, propose one.

The first round data is still valid if the thing I was testing for turns out to show an effect.

While you have disparaged my (and others') suggestions as not relevant, you have gone ahead and made some of the suggested changes.

If you are referring to the two-round design, that is something I decided on a few months ago.

Height is not a soft measure either.

Do you really have any idea whether or not "cost of treatment" is related to improvement in health? Or even in which direction?

Who said it has to be? Suppose prayer makes things cheaper (or more expensive). :p

There are costs associated with asking too many questions - more drop outs for one. Now that you've chosen a reasonable outcome measure, I would suggest dropping all the miscellaneous questions you had on your list.

That would reduce data available for finding a good test to use in the second round. And it would reduce the data available for other purposes; I am not doing this just for the Challenge after all.

This is the question you need to answer - will you find out about all of the deaths if you take a passive approach?

No. But that's why you get the doctors' phone numbers and call 'em up if someone stops responding. And show them a document that the participant signed at the outset allowing them to release relevant medical information to you.

That's a good start. You will need information on how the SF36 performs in people with cancer. Does it discriminate between different levels of HRQOL or are most people clustered at one end (poor) of the scale? What's the variability, sensitivity, specificity, etc.?

*nod* And I don't have that yet. Hence "tentative".

I don't understand what you mean by "score equation determined based on Round 1 data". Wouldn't you just look for significant differences in the score between the two groups? Are you talking about adjusting the scores based on other variables? 'Cuz then it's back to being exploratory.

How to score round 2 is decided based on what happened in round 1 (e.g. if it turned out that from variables A..Z, variables F, H, and Y showed significant group differences in round 1, then you'd make a score equation for round 2 that averages F, H, & Y somehow). You are still only scoring round 2 data of course, and what you're calculating for the conclusion is still the difference between groups.
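
A minimal sketch of that procedure as I read it; the dictionary-of-arrays data layout, the 0.05 screening threshold, and the equal-weight averaging are my illustrative assumptions, not part of the stated protocol.

```python
# Minimal sketch of the two-round scoring idea described above: variables that
# showed significant group differences in round 1 are combined into a single
# composite score, which is then the pre-declared test applied to round 2 data.
# Data layout, the 0.05 screen, and equal weighting are illustrative choices.
import numpy as np
from scipy.stats import ttest_ind

def select_round2_variables(round1_treated, round1_control, alpha=0.05):
    """round1_treated / round1_control: dicts mapping variable name -> list of values."""
    selected = []
    for var in round1_treated:
        _, p = ttest_ind(round1_treated[var], round1_control[var])
        if p < alpha:
            selected.append(var)
    # If nothing passes the screen, the protocol described above defaults
    # back to the SF-36v2 score for round 2.
    return selected

def composite_score(selected, participant):
    """Round-2 score for one participant: mean of the selected variables."""
    return float(np.mean([participant[var] for var in selected]))
```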

Yes, I did make specific suggestions as to what and why. You should have a sample size that has an adequate power to detect a small effect. The "why" was contained in one of the paragraphs you dismissed.

So what exactly do you believe to be an adequate sample size and why? Please show me the math you use to arrive at your proposed number.
 
Okay. I realize by saying this that I run the risk of appearing condescending. But I've thought about this quite a bit and I've decided that it's more important for me to say this than it is for me to worry if someone thinks I'm arrogant. And I also want to make it clear that I am not claiming that amateurs cannot do good science. There are many excellent examples from science fairs and other venues that prove otherwise.

Saizai, you do not understand what I have said. And this is my fault, as I assumed a certain level of technical knowledge on your part. But as it stands, your "experiment" cannot be considered a legitimate exercise. You have not made any reasonable a priori constraints on what you are looking for. You have made no attempt to choose variables that are a reliable and valid measure of what you claim to be attempting to measure. You have introduced a bias into your methods that increases the probability of obtaining a falsely significant result which will lead to two problems. The naive will falsely proclaim that the results show that prayer has an effect, and legitimate researchers will not be able to use your results for further study as they will be useless when there is no way to separate out spurious effects from possible real effects. There's more, but that's more than enough.

Your method of dismissing these concerns unless I demonstrate how they could cause a problem and how I would suggest they be fixed, only works if you are talking about fine-tuning an otherwise solid design. Otherwise, what you are basically asking is for me to design your study for you. Since this is exactly the kind of thing that I do, I could probably do this in my sleep (and my students/colleagues might claim that sometimes I did ;)). But frankly, with your attitude, it's like pulling teeth to make any sort of progress (assuming it is actually possible to make any progress - I'm not sure, now, that I've seen any in spite of 6 pages of attempts).

I realize that I can't stop you. But perhaps I can influence whether or not the JREF is involved. I will point out to the JREF that presenting this as a legitimate scientific endeavor takes advantage of vulnerable people. And I will also point out to them any biases or other methodologic problems that increase the possibility of obtaining a result that will be falsely presented as "significant", in order to prevent an unwarranted awarding of the prize.

Linda
 
No, the protocol is still the same: I decide what the test will be before the data is gathered, and the test has no way to know which group someone is in. I happen to be choosing the test arbitrarily in the case of the first round, and based on data acquired in the case of the second round. There is no test I can choose within that constraint that would invalidate proper protocol. If you believe there is, propose one.

The first round data is still valid if the thing I was testing for turns out to show an effect.

No, the protocol is clearly not the same. You say that you base part of round two on the results of round one. If anything about round two depends in any way whatsoever on round one, then it cannot be the same as round one. The preliminary and final in the JREF challenge must be exactly the same. Not similar. Exactly. If you do not assess both rounds in exactly the same way then you do not have a valid protocol for the JREF challenge.

As Linda says, you have had plenty of advice and criticism here. I suggest you start paying attention to it rather than dismissing everything out of hand. If you carry on in the same way I guarantee you will never be accepted for the challenge.
 
This is not for Saizai but for anyone else that might be persuaded by his argument that simple random sampling will overcome any confounding issues...

An article by Kernan, et al (1999) in the Journal of Clinical Epidemiology 52(1) did a simulation of the effect of simple randomization in the presence of an important or prognostic factor (e.g. things such as age, disease severity, etc.). Note that these examples were included in their article. To quote them:

To illustrate the chance that simple (unstratified) randomization may lead to treatment groups that are unbalanced with respect to a prognostic factor, consider a trial of two therapies in a disease with an important prognostic factor that is present in 15% of patients. The chance that the two treatment groups will differ by more than 10% for the proportion of patients with the prognostic factor is 33% for a trial of 30 patients, 24% for a trial of 50 patients, 10% for a trial of 100 patients, 3% for a trial of 200 patients, and 0.3% for a trial of 400 patients. (p. 20)

They also found that as the incidence of the prognostic factor increases, the false positive rate also increases. For an n of 30 and a factor present in 30% of patients, this false positive rate is 43%. For an n of 50, it drops to 38%. One has to have an n of 400 to drop the rate to 2%. Note that this is in the presence of only one confounder. It logically follows that more than one confounder increases this false positive rate.

So for small sample trials (n<400), simple randomization that does not account for confounders through stratification increases the risk of a false positive.

So unless Saizai is incredibly lucky and has a completely homogeneous population to sample from, his experimental design and proposed sample size will likely increase the probability of a false positive, contrary to all of his claims and hand waving.
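
A rough simulation of the imbalance point quoted above. The model here (equal group sizes, independent draws of the factor) is my simplifying assumption, so the output will be in the same ballpark as, but not identical to, the percentages Kernan et al. report.

```python
# Rough sketch: under simple randomization, how often do two equal groups end
# up differing by more than 10 percentage points in a prognostic factor that
# 15% of patients carry? Equal group sizes and independent draws are
# simplifying assumptions; Kernan et al.'s exact figures use their own model.
import numpy as np

rng = np.random.default_rng(0)

def imbalance_rate(n_total, prevalence=0.15, gap=0.10, trials=100_000):
    per_group = n_total // 2
    prop_a = rng.binomial(per_group, prevalence, size=trials) / per_group
    prop_b = rng.binomial(per_group, prevalence, size=trials) / per_group
    return float(np.mean(np.abs(prop_a - prop_b) > gap))

for n in (30, 50, 100, 200, 400):
    print(f"n = {n}: imbalance > 10 points in {imbalance_rate(n):.1%} of trials")
```

The trend is the substantive point: the chance of a meaningful imbalance in a prognostic factor falls steeply as the trial grows.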
 
You have not made any reasonable a priori constraints on what you are looking for.

Such as? I repeatedly asked for specific examples.

You have made no attempt to choose variables that are a reliable and valid measure of what you claim to be attempting to measure.
Such as? I repeatedly asked for specific examples.

And what do you think I am attempting to measure?

You have introduced a bias into your methods that increases the probability of obtaining a falsely significant result
Such as? I repeatedly asked for specific examples.

Your method of dismissing these concerns
I am dismissing only talk about what one might do with the results, as that isn't what I am interested in discussing here.

I have not dismissed anything about specific ways to improve the methodology, I have argued against them. My arguments have not been refuted, just ignored.

No, the protocol is clearly not the same. You say that you base part of round two on the results of round one. If anything about round two depends in any way whatsoever on round one, then it cannot be the same as round one. The preliminary and final in the JREF challenge must be exactly the same. Not similar. Exactly. If you do not assess both rounds in exactly the same way then you do not have a valid protocol for the JREF challenge.

Not so. They already say that they will be different in order to make it more difficult. That is a change, right? It's in the choice of N.

The protocol is identical: I choose, arbitrarily, a score equation at the beginning of the round. In terms of ensuring that there is no fallacy or fraud going on, it does not matter what I choose. You have ignored my repeated requests for an example of what I might choose that would cause an unusually high false positive rate.

An article by Kernan, et al (1999) in the Journal of Clinical Epidemiology 52(1) did a simulation of the effect of simple randomization in the presence of an important or prognostic factor (e.g. things such as age, disease severity, etc.). Note that these examples were included in their article. To quote them:

And I've repeatedly asked you to propose a stratified random sampling methodology that you believe would be better.

So far you have not.

P.S. It helps if you provide a link to the article, even if it's only available to the plebes in abstract.
 
You asked us to critique your methodology; several of us who have significant experience with clinical trials and program evaluation have repeatedly shown you the errors in your design...

I repeatedly at the beginning of this thread implored you to seek the assistance of a biostatistician to help you work through the flaws...

We have now shown you clear evidence that you are wrong regarding your simple random sampling scheme...

Yet you have repeatedly dismissed all of our concerns, in some instances very arrogantly...

Why should any of us help you given your hubris?
 
digithead - So I take it you're not going to propose an improvement via stratified random sampling?
 
digithead - So I take it you're not going to propose an improvement via stratified random sampling?

No, there is nothing that I can propose for you as you obviously do not want to take any advice....

This board is an entirely inappropriate place to provide you with experimental design advice beyond what we've already done...

We've shown you that your sampling design and sample size are inadequate for the hypothesis that you have and are prone to increasing false positives in the face of confounders, especially at the small differences that you expect...

What you do with that fact is entirely up to you...

I'll say it again despite all evidence that you cannot take any advice that does not conform to your belief that you can't possibly be wrong - seek the assistance of a biostatistician who will work with you throughout your entire clinical trial. Engaging this board as that person will not give you the level of service you so obviously need...
 
Pity, I thought you might be up for constructive discussion rather than just bashing. Ah well.

I suppose that if I were proposing a stratified random sampling, you'd be bashing me for doing something that's 'too complicated', right? ;)
 
Pity, I thought you might be up for constructive discussion rather than just bashing. Ah well.

I suppose that if I were proposing a stratified random sampling, you'd be bashing me for doing something that's 'too complicated', right? ;)

You're beyond help at this point...

I'm with Linda; I feel sorry for any of the patients who take part in your study. Hopefully, the hospital or medical staff that you will need to engage in this study will force you into some sort of IRB approval before you even recruit one patient, because that's probably the only thing that will get you to listen to criticism...
 
This is not for Saizai but for anyone else that might be persuaded by his argument that simple random sampling will overcome any confounding issues...

An article by Kernan, et al (1999) in the Journal of Clinical Epidemiology 52(1) did a simulation of the effect of simple randomization in the presence of an important or prognostic factor (e.g. things such as age, disease severity, etc.). Note that these examples were included in their article. To quote them:

So unless Saizai is incredibly lucky and has a completely homogeneous population to sample from, his experimental design and proposed sample size will likely increase the probability of a false positive, contrary to all of his claims and hand waving.

Let me say I have a deep suspicion that it is a good thing that JREF gets run by magicians rather than statisticians. And I will be shocked if a challenge ever comes out of this. Having said all that, these discussions sometimes provide a useful platform to clear up scientific matters.

Digithead has provided a citation from a scientific journal. The referenced article supports the notion that you get better results using stratification. But on the narrow point that Saizai has claimed, the article completely supports him. The authors write:
For trials with unstratified randomization, the erroneous finding of a statistically significant (P<0.05) difference between treatments occurred about 50 of 1,000 times in their computer simulation, regardless of endpoint rates in the constituent strata or sample size.

In other words, if you randomize as Saizai suggests and use a critical value for five percent test, you get a false positive five percent of the time - just as basic statistics tells you will happen.
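
A minimal simulation sketch of that narrow point, with arbitrary illustrative numbers: under a null treatment effect, simple randomization in the presence of a strong prognostic factor still yields roughly a 5% false positive rate at the 5% level.

```python
# Minimal sketch: with no true treatment effect, a 5% test after simple
# randomization stays near a 5% false positive rate even when a strong
# prognostic factor is present. Prevalence and effect sizes are arbitrary.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def false_positive_rate(n_per_group=25, prevalence=0.15, factor_effect=2.0,
                        trials=20_000, alpha=0.05):
    hits = 0
    for _ in range(trials):
        # Outcome = noise plus a large shift for patients carrying the factor.
        carrier_a = rng.random(n_per_group) < prevalence
        carrier_b = rng.random(n_per_group) < prevalence
        outcome_a = rng.normal(size=n_per_group) + factor_effect * carrier_a
        outcome_b = rng.normal(size=n_per_group) + factor_effect * carrier_b
        _, p = ttest_ind(outcome_a, outcome_b)
        hits += p < alpha
    return hits / trials

print(f"false positive rate ≈ {false_positive_rate():.3f}")   # expect about 0.05
```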

-Dick Startz
 
No, there is nothing that I can propose for you as you obviously do not want to take any advice....

This board is an entirely inappropriate place to provide you with experimental design advice beyond what we've already done...

We've shown you that your sampling design and sample size are inadequate for the hypothesis that you have and are prone to increasing false positives in the face of confounders, especially at the small differences that you expect...

What you do with that fact is entirely up to you...

I'll say it again despite all evidence that you cannot take any advice that does not conform to your belief that you can't possibly be wrong - seek the assistance of a biostatistician who will work with you throughout your entire clinical trial. Engaging this board as that person will not give you the level of service you so obviously need...

Just want to thank you for your posts...I'm learning a lot--it's been a while since I've done statistics--I know that the OP cannot hear you, but you are edifying others, I assure you.

It's always a bad sign when someone makes an assertion and then says "prove me wrong"--as though that had anything to do with whether they were "right".
 
