
JREF Challenge Statistics

So you say.

Say an average is theoretically distributed normally with a mean of 5 and a standard deviation of 1.

We take 20 samples and observe a mean of 4.7.

Test the hypothesis that mu = 5.

This book just sets alpha = .05. It then calculates a test statistic and a p-value, compares the p-value to alpha, and ends up not rejecting the null hypothesis that mu = 5.
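For reference, here is a minimal sketch of that calculation, assuming the stated standard deviation of 1 describes individual observations (so the standard error of the mean is 1/sqrt(20)) and a two-sided test:

Code:
from math import sqrt
from scipy.stats import norm

mu0, xbar, sigma, n, alpha = 5.0, 4.7, 1.0, 20, 0.05

se = sigma / sqrt(n)              # standard error of the sample mean
z = (xbar - mu0) / se             # test statistic, about -1.34
p_value = 2 * norm.cdf(-abs(z))   # two-sided p-value, about 0.18

print(f"z = {z:.2f}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")

With a p-value of roughly .18 against an alpha of .05, the null is not rejected, matching the book's conclusion.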

But you're saying you can calculate alpha. Can you do it here please to shut me up?
Ah! I see the problem (maybe). You are conflating two different sets of calculations. Of course we cannot take the numbers you give here and calculate alpha. What we need to know is what this theoretical distribution of numbers is measuring. Suppose we are looking at medical research, for instance, where a drug is very costly and a false alarm would be expensive, or where an illness is devastating and a miss would be intolerable. These things, which can be quantified in terms of dollar costs or person-hours or other numbers, are the relative costs of type I and type II errors in the real world (not in some hypothetical normal distribution). The relative costs of a drug, the prevalence of a disease in the population, those sorts of things are part of the analysis that goes into determining alpha.

And yes, one could determine that the ideal balance of type I and II error was an alpha of .07 (HIV drugs have been approved at that level, before the sample had sufficient power to have reached .05), although in such cases, social reasons will likely push researchers toward the .05 or .01 because everybody else uses them.

My grad stats courses spent enough time on this topic that I cannot simply think alpha is chosen out of thin air.
 
One reason is: you could think of it sort of like adding up all the cumulative errors of each test.

And

Flipping 10 coins, you expect 5 H, 5 T. But with only 10 coins, 6, 7, or 8 H is not that rare at all, and 9 or 10, although rare, is certainly something you might find if you spent just one day flipping coins.

Flipping 100 coins, you expect 50 H...but it is much more difficult to get the same percentage of H as in the smaller sample. 60 H you might find, but 70 is already very rare, and 80 you probably won't find in several days' attempts at flipping 100 coins in a row. 90 or 100 could take you weeks. (Basically, with just 10 flips, you only need to be off by 4 from the expected count in order to get 90% heads; if you flip 100, you need to be off by 40 flips to get the same percentage. A much more difficult task.)
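A rough illustration of that point with exact binomial tail probabilities; the two cases below are just the 90%-heads examples from above:

Code:
from scipy.stats import binom

for n, k in [(10, 9), (100, 90)]:
    p = binom.sf(k - 1, n, 0.5)   # P(at least k heads in n fair flips)
    print(f"P(>= {k} heads in {n} flips) = {p:.3g}")
# 10 flips: roughly 1 in 93; 100 flips: on the order of 1 in 10^17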

OK. Thanks to both of you for the explanation. I understand better why now...
 
Of course we cannot take the numbers you give here and calculate alpha.

Ok..

What we need to know is what this theoretical distribution of numbers is measuring.

Let's say weight in grams of a variety of beetle.

I'll even make up that cost(making Type I error) = $1, and cost(making Type II error) = $2, if that will help get things moving.
 
I've seen discussions about the costs of making Type I and Type II errors before. But does that book have a formula that one can plug in cost(Type I) and cost(Type II), and possibly other things, and out comes an alpha?
Check out Kirk (2nd ed is 1984; my guess is there is a newer edition), chapter 1.

Yes, most researchers choose .05 or .01 out of ignorance, but that does not mean it is the only way.
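For the curious, a minimal sketch of one standard decision-theoretic way such a formula can work: pick the alpha that minimizes expected cost. This is not necessarily Kirk's treatment; the prior probability of H0, the assumed effect size, and the sample size below are all made up for illustration, and only the $1 / $2 costs echo the post above.

Code:
import numpy as np
from scipy.stats import norm

cost_I, cost_II = 1.0, 2.0        # made-up costs from the post above
p_h0 = 0.5                        # assumed prior probability that H0 is true
effect, sigma, n = 1.0, 1.0, 20   # assumed true effect size under H1

se = sigma / np.sqrt(n)
alphas = np.linspace(0.001, 0.30, 300)
z_crit = norm.isf(alphas / 2)                                  # two-sided critical values
beta = norm.cdf(z_crit - effect / se) - norm.cdf(-z_crit - effect / se)
expected_cost = p_h0 * cost_I * alphas + (1 - p_h0) * cost_II * beta

best = alphas[np.argmin(expected_cost)]
print(f"cost-minimizing alpha = {best:.3f}")   # lands in the .02-.03 range with these inputs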
 
Wow. Tai, for someone who claims to love stats, you really do not understand them.

He doesn't just claim to love statistics. He claims that he has a degree in the field.

Yet, he keeps making one rookie error after another.
 
No. Because you simply have to wait for a long enough run, which we know will happen eventually (by the Drunkard's Walk theorem). Even if you don't wait that long on this trial, the cumulative effect of a half-dozen positive-but-not-significant experiments might be enough to produce an overall finding of significance in the hands of a sufficiently corrupt statistician.
I understand the general idea of how it's supposed to work, but I don't think that it does work in the end when you look at the details. If the trials are long enough to get an excess of successes purely by chance, then they're also long enough that the small excess is not statistically significant. How could it be otherwise? The coin is fair.

I'm not sure of this, but that's how it seems to me, right now. If you or Mercutio could fill in some of the details, I'd be happy to learn.
 
Ok..

Let's say weight in grams of a variety of beetle.

I'll even make up that cost(making Type I error) = $1, and cost(making Type II error) = $2, if that will help get things moving.
First things first. Was I right?

Were you, in fact, complaining about the use of bayesian statistics allegedly to arrive at the same numbers you did through non-bayesian means...when in fact your calculations began with an assumed alpha level, whereas the bayesian inference in question was used to come up with an alpha level in the first place? It seems very much like your complaint is based on a misunderstanding.

Your choice of .01 as seeming more reasonable than .05--was that essentially an intuitive use of bayesian statistics, yourself? Or did you use some other means of determining whether it seemed right?
 
First things first. Was I right?

Don't know. You haven't shown any calculation yet.


, whereas the bayesian inference in question was used to come up with an alpha level in the first place?

By replacing the subjectiveness of choosing alpha with the subjectiveness of choosing prior odds?
 
Don't know. You haven't shown any calculation yet.
You have a tough time picking up on context, don't you? Was I right about your confusing the two?
By replacing the subjectiveness of choosing alpha with the subjectivenes of choosing prior odds?
I'll take that as a yes.
 
I'll take that as a yes.

You can interpret however you'd like. Not sure how you interpret me asking a question as a "Yes" though.

You said Dunn (2001), I believe. Is this a book, an article, what? And what is the title?
 
You can interpret however you'd like. Not sure how you interpret me asking a question as a "Yes" though.

You said Dunn (2001), I believe. Is this a book, an article, what? And what is the title?

All you need to do is answer "yes" or "no" to Mercutio's question.

What you don't need is to try to steer the discussion in another direction.

You are very quick to demand answers from others, but you shy away when the onus is on you.
 
Dr, alpha is still set in that problem. You've just been asked to solve backwards to find what it was set at.

Which is exactly the point I have been making for the past dozen or so posts.

What has been set is not an alpha value, but an acceptance criterion. To quote from the problem,
"It is decided to reject the null hypothesis [...] if the mean withdrawal time is less than 1.7 minutes." That's not setting "an alpha value." That's setting an acceptance criterion.

You are then asked to solve for "the probability of a Type I error," which is the definition of an alpha value.

In other words, given an ad hoc acceptance criterion and a testing scheme, solve for the alpha value.
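As a sketch of that "solve backwards" step: only the 1.7-minute cutoff comes from the quoted problem; the null mean, standard deviation, and sample size below are hypothetical placeholders.

Code:
from math import sqrt
from scipy.stats import norm

mu0, sigma, n = 2.0, 0.8, 32      # made-up values for illustration only
criterion = 1.7                   # reject H0 if sample mean < 1.7 minutes

se = sigma / sqrt(n)
alpha = norm.cdf((criterion - mu0) / se)   # P(reject H0 | H0 true)
print(f"implied alpha = {alpha:.4f}")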

Yes, you've mentioned that.

And you've just proven it -- again.
 
I understand the general idea of how it's supposed to work, but I don't think that it does work in the end when you look at the details. If the trials are long enough to get an excess of successes purely by chance, then they're also long enough that the small excess is not statistically significant. How could it be otherwise?

Not quite.

Remember that even a "fair" coin will yield a statistically significant result 5% of the time.
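A quick simulation of that claim, under an assumed trial length of 100 flips and an exact binomial test (which, being discrete, is slightly conservative, so the observed rate comes in at or a bit under .05):

Code:
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(0)
n_flips, n_experiments, alpha = 100, 2000, 0.05

significant = sum(
    binomtest(rng.binomial(n_flips, 0.5), n_flips, 0.5).pvalue < alpha
    for _ in range(n_experiments)
)
print(f"fraction of fair-coin experiments found significant: {significant / n_experiments:.3f}")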
 
Alpha determines the acceptance criterion, and vice versa.

Try again.

Dude. Seriously. Read what you just wrote.

Alpha determines the acceptance criterion -- so if you know what alpha is, you can calculate the value of the acceptance criterion, yeah?

And vice versa. So if you know the acceptance criterion, you can ... calculate the value of alpha.
 
Alpha determines the acceptance criterion, and vice versa.

Which means that given an ad hoc acceptance criterion, you can calculate the alpha value of the relevant experiment and the probability of a Type I error.

What's the conceptual difficulty?
 
Which means that given an ad hoc acceptance criterion, you can calculate the alpha value of the relevant experiment and the probability of a Type I error.

You don't do a hypothesis test saying 'I'll set the criterion at 5, then I'll solve for alpha'. You set the alpha at .05, then get the criterion.

Either way, subjectively setting the criterion and alpha are the exact same information. The point is that you're not calculating alpha from thin air, so to speak. You're just setting one subjectively, and inverting to find the other.
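A sketch of that "same information, inverted" point, reusing the normal null with mean 5 and standard error 1/sqrt(20) from the earlier textbook example and assuming a one-sided (lower-tail) rejection region:

Code:
from math import sqrt
from scipy.stats import norm

mu0, sigma, n = 5.0, 1.0, 20
se = sigma / sqrt(n)

# Direction 1: set alpha, get the (lower-tail) criterion.
alpha = 0.05
criterion = norm.ppf(alpha, loc=mu0, scale=se)
print(f"alpha {alpha} -> reject if sample mean < {criterion:.3f}")

# Direction 2: set the criterion, recover the implied alpha.
alpha_back = norm.cdf(criterion, loc=mu0, scale=se)
print(f"criterion {criterion:.3f} -> alpha {alpha_back:.3f}")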
 
You don't do a hypothesis test saying 'I'll set the criterion at 5, then I'll solve for alpha'. You set the alpha at .05, then get the criterion.

Either way, subjectively setting the criterion and alpha are the exact same information. The point is that you're not calculating alpha from thin air, so to speak. You're just setting one subjectively, and inverting to find the other.
I actually agree with you here. The bayesian inference process happens before this. You, in your analysis, appear to have pulled alpha out of...thin air. The authors do not. They weigh the costs and benefits of type I and II errors, and explain the process by which they determined the appropriate alpha AND criterion level. (Admittedly there may be some subjectivity in this: hard costs are not always available, and deciding which factors are worth adding into the cost-benefit analysis can be a lot of work. Still, the ability to specify even a range of values for, say, the base rate of an event, based on a posteriori numbers in the population, makes the bayesian process a much more honest and useful method of determining alpha than the simple "it seemed reasonable" approach.)

Those who were criticising you for equating alpha and criterion level have been missing the same point you have. The bayesian step is not this one, but rather the bayesian step replaces your "out of thin air"/"reasonable"/"sensible" choice of alpha.
 
You, in your analysis, appear to have pulled alpha out of...thin air. The authors do not. They weigh the costs and benefits of type I and II errors (admittedly there may be some subjectivity in this...

I mentioned that .05 and .01 are typically used in medical studies, that this is a medical study, and that extraordinary claims require stronger evidence and therefore a lower alpha. This is not exactly out of thin air, but using standard statistical methods that have worked well for us for over 100 years in a huge variety of fields.

In regard to setting the prior odds, Hyman said

Since then, and continuing into our time, thousands of individuals have made these claims. Yet, not one of these claims has withstood a scientific test.

I agree, therefore I'd choose to set alpha low.

Indeed, given that not one of these claimants have produced scientific evidence in support of their ability, it would be reasonable to assign odds of several thousand to one against the truth of the claim.

"reasonable". Quick, where is Andy to say one is foolish for saying this word, but ignore that Hyman said it? :D

I decided to assume that the prior odds in favor of the null hypothesis were 99:1.

Assume.

Why not 95:1? Why not 90:1? 75:1? Etc. This approach just replaces subjectively choosing alpha with subjectively choosing the prior odds. Different people choosing different priors could lead to different results. Setting alpha, on the other hand, is pretty much standardized, so different people pretty much choose the same alpha of .05, .01, or .001.

If I ask a person to choose alpha, they'll typically say .05, .01, or .001. Ask a person to set prior odds, and see what range of numbers you get.

This means that I was also assuming that the prior odds against the alternative hypothesis are also 99:1.

Assuming.

The null hypothesis in our test is that the average number of correct matches will be one.

No disagreement there. That follows from mathematical expectation given the set up of the matching problem.

Even with my setting the prior odds at this modest level, the evidence provided by the outcome still fell far short of swinging the odds in her favor.

Note "setting", not calculating.

Those who were criticising you for equating alpha and criterion level have been missing the same point you have. The bayesian step is not this one, but rather the bayesian step replaces your "out of thin air"/"reasonable"/"sensible" choice of alpha.

Discussions of costs of making various errors are abundant in frequentist literature. The Bayesian step occurs when they set prior odds.
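To illustrate that sensitivity, a toy calculation: posterior odds are just prior odds times a likelihood ratio (Bayes factor). The Bayes factor of 10 below is made up; only the prior-odds values echo the numbers in the thread.

Code:
bayes_factor = 10.0   # hypothetical evidence strength in favor of the claim

for prior_against in (99, 95, 90, 75):
    posterior_against = prior_against / bayes_factor
    print(f"prior odds {prior_against}:1 against -> "
          f"posterior odds {posterior_against:.1f}:1 against")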
 
I mentioned that .05 and .01 are typically used in medical studies, and this is a medical study, and that extraordinary claims require stronger evidence therefore lower alpha. This is not exactly out of thin air, but using standard statistical methods that have worked for us well for over 100 years in a huge variety of fields.
And when you realize why the alpha levels are set as they have been, you will see that it is (tacit or expressed) bayesian inference at work.

I snipped the rest of your post--in it, you point out that certain inputs are "assum[ed]". Um...yeah, assumed based on a posteriori observation of prior occurrence in the population...not pulled out of thin air, as your "assuming" comment leads one to believe. Other than that spin, you appear to accept that their choice of alpha (explicitly) and your own (implicitly) was based on fairly similar considerations of prior odds and costs of error.

I am a bit baffled, then, as to your disagreement with them. You go through the same process, but simply do not disclose what factors influenced your choice of alpha. They do, and you criticise them for "assuming" the same things that you simply sweep under the rug. Their process is much more honest and open than yours, but it appears that you have also engaged in your own form of bayesian inference at the same point they did. You just don't label it thus.

Is it only the wrong thing to do when someone else does it? Is it only the wrong thing to do when those who do it actually disclose the factors that they took into account?

Do you, at this point, acknowledge that your non-bayesian calculations address a different part of the problem than their bayesian inference? Or am I assuming too much understanding on your part?
 
