
How to manipulate data, get results as good as PEAR, and be $1,000,000 richer

I'm sorry. Now I feel bad that you did all that work. I didn't give an adequate description of what I meant. That result is not at all surprising (except that I'm surprised it dropped as much as it did).

I meant that out of your 10,000 (or 1,000,000) trials, the only trials where you start testing for significance after 30 guesses would be trials number 2, 3, 5, 7, 11, 13, 17, etc. I chose prime numbers just as an example, but I don't know whether the proportion of numbers that are prime is anywhere near the proportion of trials subject to early review (I suspect primes are way too common - something like squares of whole numbers may be closer).
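To put rough numbers on that aside, here's a quick check (a Python sketch; the two sets are only the examples above, not a claim about the real proportion) of how sparse primes and perfect squares actually are among 10,000 trials:

```python
# Quick check of how sparse primes and perfect squares are among 10,000 trials.

def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

N = 10_000
primes  = sum(is_prime(i) for i in range(1, N + 1))
squares = sum(int(i ** 0.5) ** 2 == i for i in range(1, N + 1))

print(primes / N)    # about 0.12 -- primes are still fairly common
print(squares / N)   # 0.01 -- squares are much sparser
```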

Linda

Sadly, it's actually fun for me to run these sims. I need a life.

Perhaps I'm missing something here. I think I follow you that we only have the option to stop at the trial level, and not at the tester level. In real life, then, I don't see how this can work. If we look at the AIDS in Africa case, we may only have one trial. One doctor sets up the study, patients come in, are given either drug or placebo, and are tested at time = t. The patient is my coin flip. The doctor, if p << 0.001, will stop the trial early and publish the results.

In the coin flip/psychic case, I'm going to have 10,000 people come in to be tested, each with 385 guesses to make. Your suggestion states that most of these people have to go the full 385. I'm suggesting that even though the rule may be in place, it may not be observed. In fact, looking at my data, 99% of the people did go the full 385 guesses. So someone may innocently believe that stopping early should have no major impact. Time is money, money is time, people have lives - and why keep running more and more tests when a person is clearly showing better-than-random results? How many testers would be honest enough to continue?
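To make the setup concrete, this is roughly the shape of the sim (a Python sketch; the z cut-off of 3.1, which is about p < 0.001 one-sided, and the first peek at guess 30 are assumptions standing in for the exact rules used):

```python
# Sketch of an optional-stopping psychic test: 10,000 chance guessers,
# up to 385 guesses each, with early peeking from guess 30 onward.
import random

N_TESTERS = 10_000
N_GUESSES = 385
Z_CUT = 3.1        # assumed stopping/claim threshold (about p < 0.001 one-sided)
FIRST_PEEK = 30    # assumed first guess at which the tester peeks

stopped_early = 0
declared_psychic = 0

for _ in range(N_TESTERS):
    hits = 0
    for g in range(1, N_GUESSES + 1):
        hits += random.random() < 0.5              # a pure chance guesser
        if g >= FIRST_PEEK:
            se = (0.25 / g) ** 0.5                 # sd of the hit rate under p = 0.5
            if (hits / g - 0.5) / se >= Z_CUT:     # looks "clearly" better than chance
                stopped_early += 1
                declared_psychic += 1
                break
    else:
        # went the full 385 guesses; apply the same cut-off at the end
        se = (0.25 / N_GUESSES) ** 0.5
        if (hits / N_GUESSES - 0.5) / se >= Z_CUT:
            declared_psychic += 1

print(f"went the full {N_GUESSES}: {1 - stopped_early / N_TESTERS:.1%}")
print(f"declared psychic: {declared_psychic / N_TESTERS:.2%}")
```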
 
Perhaps I'm missing something here. I think I follow you that we only have the option to stop at the trial level, and not at the tester level. In real life, then, I don't see how this can work. If we look at the AIDS in Africa case, we may only have one trial. One doctor sets up the study, patients come in, are given either drug or placebo, and are tested at time = t. The patient is my coin flip. The doctor, if p << 0.001, will stop the trial early and publish the results.

I was thinking of the body of medical research as representing a large number of trials, some of which (like the HIV/circumcision trial) have the p value calculated before the trial is finished and will stop early if p is less than their cut-off.

So one could attempt to argue that the benefit of conventional medical treatment (taken as a whole) has been exaggerated by the presence of a systematic bias, just like you argued that the actual number of psychics is exaggerated by the presence of a systematic bias (i.e. 3 psychics out of 200 when you'd expect none).

In the coin flip/psychic case, I'm going to have 10,000 people come in to be tested, each with 385 guesses to make. Your suggestion states that most of these people have to go the full 385. I'm suggesting that even though the rule may be in place, it may not be observed. In fact, looking at my data, 99% of the people did go the full 385 guesses. So someone may innocently believe that stopping early should have no major impact. Time is money, money is time, people have lives - and why keep running more and more tests when a person is clearly showing better-than-random results? How many testers would be honest enough to continue?

I don't think it's a matter of honesty, but rather a matter of whether the effect of the bias is recognized - something that's easy to miss when you tend to focus only on your own trial. It's a more complicated concept than a single test.

Linda
 
As an interesting thought that just popped into my head: there are often reports of medical trials that have been stopped early because of obvious negative effects. Is it possible that at least some of these could be due to this effect and not actually due to the treatment? Of course, if this does happen it would be false negatives rather than false positives, and with experimental drugs it is always better safe than sorry, but it would be interesting to know whether we could be losing potential treatments due to this effect.
 
As an interesting thought that just popped into my head: there are often reports of medical trials that have been stopped early because of obvious negative effects. Is it possible that at least some of these could be due to this effect and not actually due to the treatment? Of course, if this does happen it would be false negatives rather than false positives, and with experimental drugs it is always better safe than sorry, but it would be interesting to know whether we could be losing potential treatments due to this effect.

That's an interesting idea. I'll have to think about this some more, but I don't think it would be as big an issue. In the first case, you're looking for a result on the high side, and you stop on a result on the high side. In this case, you're looking for a result on the high side, but stop on a result on the low side. I think it's a wash.

For example, the test is usually of no difference between drug A, already on the market, and drug B, which you just developed. You're thinking that we stop too early and say B is worse when in fact B is better. So if observed - expected is too small, we stop - and that assumes expected is 0. If B is in fact better, then expected should be more like +5 or +10. So:

Z obs: if (observed - 0)/se < -3.1 we stop, and p inflates from 0.001 to 0.02 (because of the repeated looks).
We stop in error when (observed - i)/se < -3.1, where i is the true improvement (the +5 or +10 above). The chances of making this mistake are much smaller than for Z obs.
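A rough numerical check of that (Python; expressing the true improvement as +1 or +2 standard errors is just an assumed stand-in for the +5 or +10 above):

```python
# Chance of crossing a stop-for-"B looks worse" boundary at a single look,
# under no true difference versus a genuine improvement.
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

Z_STOP = -3.1   # stopping boundary at one look

# If there is truly no difference:
print(phi(Z_STOP))                           # about 0.001

# If B is really better by 1 or 2 standard errors:
for improvement_in_se in (1, 2):
    print(phi(Z_STOP - improvement_in_se))   # much smaller still
```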

Or let's think of 100 coin tosses. I'll stop if the person gets 40% correct or worse. The chance of me stopping the test, assuming no ability, is: Z = (.4 - .5)/sqrt(.5*.5/100) = -2, therefore p ≈ 2.3%.
However, if the person really and truly had ESP and was running at a 60% success rate, the probability that I would stop and say "no evidence" is now
Z = (.4 - .6)/sqrt(.6*.4/100) ≈ -4.1, therefore p << 0.1%.
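The same two numbers worked out in code (Python), just to check the arithmetic:

```python
# Z and one-sided p for stopping at 40% or worse, under chance vs. real ability.
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 100
stop_at = 0.40                                    # stop if 40% correct or worse

z_null = (stop_at - 0.5) / sqrt(0.5 * 0.5 / n)    # no ability, true rate 0.5
z_esp  = (stop_at - 0.6) / sqrt(0.6 * 0.4 / n)    # genuine 60% hit rate

print(z_null, phi(z_null))   # about -2.0 and 0.023
print(z_esp, phi(z_esp))     # about -4.1 and well under 0.001
```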

I made my own head hurt now. Ouch.
 
