• Quick note - the problem with YouTube videos not embedding on the forum appears to have been fixed, thanks to ZiprHead. If you do still see problems, let me know.

Statistical scandals

Any analysis technique would reflect a prioritization of some value over another, which just emphasizes that research is a human enterprise.

Yes and no; your sentence could be taken a variety of ways. I will just respond by saying that using p-values to reflect one's values is very messy if not impossible.
Values are reflected in stat analyses when making decisions, i.e., in decision analyses. One uses the values to assign a utility (or cost) to each potential combination of (decision, state of nature); this makes a utility function. Then, on the basis of a statistical inference process, one derives the probability of each state of nature (given the data, if any). Combining those utilities with those probabilities yields the expected utility of each decision; the idea in the end is to choose the decision with the greatest expected utility. I'd be happy to go into this more with you, but in another thread as it's slightly off-topic at the moment.
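If a concrete sketch helps, here is roughly what that recipe looks like in Python; the decisions, states of nature, utilities and probabilities below are all invented purely for illustration.

Code:
# Sketch of the decision-analysis recipe above; every number is made up.
# Utility of each (decision, state of nature) pair -- this is where values enter.
utilities = {
    ("treat",       "drug works"): 100,
    ("treat",       "drug inert"): -20,   # cost and side effects for no benefit
    ("don't treat", "drug works"): -50,   # missed benefit
    ("don't treat", "drug inert"):   0,
}

# Probability of each state of nature given the data -- this is where the
# statistical inference step feeds in.
prob_state = {"drug works": 0.7, "drug inert": 0.3}

def expected_utility(decision):
    return sum(prob_state[s] * utilities[(decision, s)] for s in prob_state)

# Choose the decision with the greatest expected utility.
best = max(("treat", "don't treat"), key=expected_utility)
print(best, expected_utility(best))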
 
This is why when biologists do experiments, they consult a mathematician who specializes in biostats. For example, our research center shares a biostatistician with another CFE.

Division of labor results in more productivity except when it comes to mathematics. All scientists should be extremely well versed in maths, including stats. Outsiders should only be used to double check, not to provide the strategy in the first place.
 
Statistics is the means whereby everyone may arrive at the same guess.

Ceptimus, that's a cute one. Sounds like how Dennis Lindley said "Objectivity is subjectivity when everyone agrees." This sentiment is definitely consonant with the postmodern times we live in.

I would just respond that statistics doesn't have to be that way. It is that way because the standard statistical results don't answer the important scientific questions. To get important meaning from those results, people have to invent that meaning.

Take the arbitrary value for alpha of 0.05, for instance. It only seems objective because most people agree among themselves to use it.

But there is another, better statistical paradigm.
 
Originally Posted by amhartley:<<I can agree that, if p-values must be used, your art of combining them with other factors (e.g., study design quality) should definitely be standard practice. However, you have not shown how that can be done in any sensible manner.>>

Well, I wasn't asked, and anyway 'a reasonable manner' is too vague to come up with a response.

What's reasonable is more or less decided by the person doing the critique, right? No two experiments are the same, so we have to use judgement.

Admittedly, studies vary, so setting up well-defined, hard and fast statistical rules for all scientific investigation may be impossible. E.g., it would be difficult to say exactly when a study design is “moderately weak” vs. “moderately strong.”

Nonetheless, we should seek objectivity & rules whenever possible. That is one of the beautiful things about bayesian inference, and one of its advantages over p-values. By mentioning bayesianism, I know I’m going to set off a storm of posts about prior distributions (let ‘em come!, tho perhaps in a separate thread please). But once one has a prior, one combines it with the likelihood function L according to mathematical rules, to get the post-experimental probability distribution function of the unknown quantity. One doesn’t combine the prior with L just any way one pleases; nor is the method decided upon by popular decree (as with alpha = 0.05). Instead, one depends in this process on mathematics.
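For anyone who wants to see that combination rule in the simplest possible terms, here is a rough sketch in Python using a discrete grid for the unknown quantity; the flat prior and the 7-out-of-10 data are invented for illustration.

Code:
# Sketch of prior x likelihood -> posterior on a grid; the numbers are illustrative.
import numpy as np
from scipy.stats import binom

theta = np.linspace(0.01, 0.99, 99)        # grid of possible values of the unknown
prior = np.ones_like(theta) / len(theta)   # flat prior (any prior could go here)

k, n = 7, 10                               # hypothetical data: 7 successes in 10 trials
likelihood = binom.pmf(k, n, theta)        # L(theta) = P(data | theta)

posterior = prior * likelihood             # combine by the product rule...
posterior /= posterior.sum()               # ...and renormalize

print("posterior mean of theta:", (theta * posterior).sum())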
 

Yes, medical researchers depend on mathematicians. Since there appears to be debate, p<.05 remains the rule of thumb. If/when statisticians make a persuasive case I'm sure things will change.

Be mindful that p values for medical research are all over the map. .05 is a bare minimum. As I mentioned, I push for .01 whenever the budget allows, and ideally .005.

Add to this that nothing is decided on one study, regardless of its p value. Cochrane has little to say about single-study questions. They're "in progress." When you say you wish there were standards, well, Cochrane is a standard.

One of the other costs associated with changing analysis, is that it becomes very difficult to integrate new findings into the base of knowledge, if we can't tell if study results are comparable. It adds an element of doubt, and makes either the past or recent experiments wasted effort.

Medical research is a human endeavour that costs money. If you can convince enough statisticians that this will save researchers time or money, then you may have a case. Right now, it sounds like you think researchers prefer to use p<=.05, which is not true, and that they use this as the final word on an experiment's merit, which is also not true. Especially in Phase I trials, when we don't care much about Type II errors.

Stats are the *last* thing analyzed in a paper. Clarity of endpoints, controls, blinding, selection bias... these are much more important. This is why I get exhausted when I hear "anecdotes are useless." Actually, some anecdotes are very useful.
 
Division of labor results in more productivity except when it comes to mathematics. All scientists should be extremely well versed in maths, including stats. Outsiders should only be used to double check, not to provide the strategy in the first place.

Obviously, scientists take math up to the level necessary to do their job. My job is to design protocols to test hypotheses of my teammates. To do this, I'm expected to consult with a biostatistician who is trained in this.

Any study I design is done as part of a team: the senior researcher, the MD, the pharmacist, the biochemist, the specialist (on the subject at hand), the biostatistician, and myself: the protocol designer.

If anybody whipped up an experiment by himself without engaging experts, he'd be fired pretty quick.
 
Yes, medical researchers depend on mathematicians. Since there appears to be debate, p<.05 remains the rule of thumb. If/when statisticians make a persuasive case I'm sure things will change.

Be mindful that p values for medical research are all over the map. .05 is a bare minimum. As I mentioned, I push for .01 whenever the budget allows, and ideally .005.

Where I work, the standard for >75% of the tests is alpha=0.05. When we test for first-order interactions, it is usually 0.10. I am aware that different alphas are used throughout statistical applications. That doesn’t take away from the fact, though, that choosing alpha is an arbitrary exercise.

Add to this that nothing is decided on one study, regardless of its p value. Cochrane has little to say about single-study questions. They're "in progress." When you say you wish there were standards, well, Cochrane is a standard.

I said that IF we insist on using p-values, we should combine them with prior, extra-test information. It’s good that that is the case in many fields. But it’s worrisome that p-values don’t allow such a combination process in any systematic, traceable fashion that follows mathematical/logical rules. See my other post about how bayesian inference accomplishes this.
Furthermore, our dependence on stat testing prevents and excuses us from choosing the number of studies with any rationale. The standard number in phase 3 of the pharma industry is two adequate and well-controlled studies, but most of the time we have no reason for not choosing 3 or 4 studies (except maybe to fall back on tradition & convention).

One of the other costs associated with changing analysis, is that it becomes very difficult to integrate new findings into the base of knowledge, if we can't tell if study results are comparable. It adds an element of doubt, and makes either the past or recent experiments wasted effort.

Not sure where you’re going with that; are you saying having a standard alpha always & everywhere would facilitate such integration?

Medical research is a human endeavour that costs money. If you can convince enough statisticians that this will save researchers time or money, then you may have a case. Right now, it sounds like you think researchers prefer to use p<=.05, which is not true, and that they use this as the final word on an experiment's merit, which is also not true. Especially in Phase I trials, when we don't care much about Type II errors.

A de-emphasis on type II errors and not type I errors (is that what you mean?) would be surprising to me, although I’ve never worked in Phase 1. I would think the ability to detect a safety signal would be highly important, maybe even more so than the ability to avoid false-alarms. Unless you’re saying that stat testing altogether is less important in phase 1, which I have heard from other sources.

Stats are the *last* thing analyzed in a paper. Clarity of endpoints, controls, blinding, selection bias... these are much more important. This is why I get exhausted when I hear "anecdotes are useless." Actually, some anecdotes are very useful.
No disagreement on that from me. But I’m trying to discern its relevance to this thread.
 

I just don't see "a scandal" here. At least, nothing as serious as the other problems with research design: improper placebo design, absent controls, poor selection. If I were to ask about how we need to improve the system, I'd target selection first, then controls, then placebo design.

ETA: Oh, and endpoints. We're way too slack on endpoints.
 
amhartley said:
Medical research relies heavily on statistical results to substantiate its findings. Yet, stat results are almost always misunderstood and misinterpreted. It's really scandalous.
You really don't give any examples of objectively false interpretations.

Yet, the common statistical results (p-values, hypothesis test results & confidence intervals) are not inductive, but deductive; they assume this or that theory or hypothesis, & then make statements about data.
Uhh, no. That would be begging the question. Simply because it calculates P(A|B) doesn't mean it's assuming B.

So, p is a statement about data x and Y, assuming Ho. It's not a statement about Ho.
It involves Ho, and therefore is, in a sense, about Ho.

If I tell you that one in a million cats weighs more than 200 pounds, and 95% of cows weigh more than 200 pounds, and that I have an animal that weighs more than 200 pounds, have I given you any information about what animal I have?
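For concreteness, here is the arithmetic of that example under a couple of invented assumptions about how many cats and cows are around: the two weight likelihoods stay fixed, but what you conclude about the animal also depends on the mix.

Code:
# The cat/cow example in numbers; the animal mixes are invented assumptions.
p_heavy_given_cat = 1e-6    # one in a million cats weighs > 200 lb
p_heavy_given_cow = 0.95    # 95% of cows weigh > 200 lb

def p_cow_given_heavy(prior_cow):
    prior_cat = 1.0 - prior_cow
    num = prior_cow * p_heavy_given_cow
    return num / (num + prior_cat * p_heavy_given_cat)

# Same two likelihoods, different mixes of animals, very different conclusions.
for prior_cow in (0.5, 0.01, 1e-7):
    print(prior_cow, p_cow_given_heavy(prior_cow))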

Yet, one can hardly blame researchers for these misinterpretations; the misinterpretations even populate introductory statistical books, where p-values & the like are proposed as "inferential" measures and where "inference" is defined as "extending sample results to general populations."
So where does the disagreement lie? Do you deny that they are used to extend sample results to general populations? Do you deny that that is the proper meaning of "inference"? Or do you deny that they should be used to decide whether to extend sample results?

Just put "+michael +oakes +p-value +medical" into google or yahoo searches & read the results! None of what I've said so far is news.
It's rather silly to expect everyone reading the thread to look through a bunch of pages trying to figure out what you're talking about, rather than just quoting, or at least giving a link.

1. Why do people almost universally misunderstand p-values, hypothesis test results & confidence intervals?
Isn't this a loaded question?

2. Why are these tools still used, despite these shortcomings?
Before anyone can answer that question, first you have to say what you think those shortcomings actually are.

amhartley said:
The null hypothesis (in this case, IIRC) of no difference is almost never correct. So what’s the purpose of the experiment? IIRC is false a priori.
What does IIRC stand for?

Where did this value of 0.05 come from? It seems quite arbitrary.
Where did the name "amhartley" come from? It seems quite arbitrary.

However, the p-value (as well as the alpha you are actually talking about) is a statement about data assuming the hypothesis IIRC. It’s not a statement about any hypothesis itself.
How can you possibly say something about P(A|B) without saying something about B?

Being new to JREF, I would have thought that the people here take great pride in avoiding “standard assumptions.”
Blindly ignoring assumptions is just as silly as blindly following them. Also, the word "assumption" is used in statistics (and math in general) in a manner that is different from its normal use.

amhartley said:
The problem here is that any choice of alpha, beyond a WAG (wild-a**ed guess) like Fisher's,
Uh... what do you mean by "WAG"?

But neither Fisher nor Neyman-Pearson (who at least had the guts to deal with type II errors) specified a way to bring costs into their testing paradigms. To do so would have required making statements about hypotheses (and that, as I said in my first post here, is something frequentism doesn't do).
If we already knew what confidence to assign to the hypothesis, we wouldn't need statistical tests to begin with.

amhartley said:
You haven’t explained how p<=0.05 is any better than a WAG.
Again, I don't understand what you mean by "WAG".

Food for thot: e.g., in the point-null testing situation, the probability of the tested hypothesis H could be >50% even when p<0.05.
H is either true or not. It's not a random variable, and doesn't have a probability.

Plus, data producing p<0.05 may actually constitute evidence in favor of H.
Example?

It may “increase our confidence” but it may not, and it may increase our confidence for or against H, depending on such things as power.
That would show that the rejection region is not properly constructed. It is therefore an issue of the implementation of statistical methods, rather than the methods themselves.

Putting a stake in the ground: I think most people in JREF would agree that we should expunge science of as much arbitrariness as possible, and found our findings on solid principles whenever we can.
While getting rid of arbitrariness is a good general principle, it's not always a bad thing, and it certainly isn't always the most important issue.

Otherwise, science becomes a tool for the powerful & influential, i.e., WHOSE “commonsense” will we rely on?
If it is arbitrary, then it, by definition, doesn't matter.

Are you saying that the average Randi member, obviously interested in questioning the assumptions of the paranormal etc, would not be willing to examine their own assumptions?
I take it that by "examine", you mean "see if they're good", or something like that? If they can be objectively evaluated, then they are, again, by definition, not arbitrary.

amhartley said:
“a one in five chance of being wrong:” Can you explain this more? Are you saying that, if I conclude H is false, my chances of being wrong are 20%?
Apparently, that is indeed what Athon meant. However, it is incorrect. The correct statement is "If H is false, the chances of being wrong are 1 in 20".

“it is up to you to determine whether my results are useful:” I don’t think you mean to say that any person has complete freedom to ignore statistical results.
Anyone setting up a test has the freedom to choose any rejection region they want. That's not "ignoring" the results.

If, as you say, "Statistics is merely a tool," then would you conclude stats is not a science? I.e., it's useful for behavior & decisionmaking, but not increasing knowledge?
It's a branch of mathematics used in science, and therefore is useful in increasing knowledge.

amhartley said:
As I have said 2 times above, the “number” we pick (and I think you are referring to alpha?) has no correspondence to the “level of confidence” we can place in this or that hypothesis.
No, alpha does correspond to how much confidence we have in rejecting the null hypothesis.

amhartley said:
Athon, as I mentioned to Blutoski, p<0.05 can constitute evidence FOR the tested hypothesis H, not just evidence against it.
You are being a bit inaccurate in switching from "data that gives p<.05" to "p<.05".

Plus, it is possible that Prob(H given data)>50% even though p<0.05. Therefore, p has nothing to do with levels of confidence in hypotheses. Maybe my post referring to bayesian probabilities will clear this up for you.
Besides the issue of assigning a probability to H that I discussed earlier, it is a fallacy to say that, because there are cases in which one thing is true and another is not, it follows that the two are not linked.

amhartley said:
I work with professional statisticians all the time who think p-values are statements about the tested hypotheses.
They are.

Most often, statisticians follow RA Fisher's pattern of interpretation, claiming that p measures the evidence against H,
It is a measure.

But p-values & alpha don't do that.
But they are used to do so.

amhartley said:
There are many papers & books you could read about this. Berger & Sellke had a paper in 1987 (in The American Statistician) showing the disparity, in the point-null testing situation, between p-values & the post-experimental prob of the tested hypothesis H.
Can you quote them?

This invalidates the standard guidance, followed by statisticians as well as medical types, to consider p<0.05 as “moderate evidence against H.”
Again, that's a fallacy. A single counterexample can't contradict a claim of correlation.

That would be a WAG. But a WAG is all one can develop for measuring pressure using a thermometer
Again, "WAG"?

amhartley said:
But with respect to continuous properties, I am claiming that no 2 entities are equal. Any person's 2 eyes, for instance, will differ from each other in diameter, tho perhaps only by a few micrometers.
So do electrons in the Northern Hemisphere have rest masses different from those in the Southern Hemisphere?

amhartley said:
I would just respond that statistics doesn't have to be that way. It is that way because the standard statistical results don't answer the important scientific questions. To get important meaning from those results, people have to invent that meaning.
So how can it be different? What questions need to be answered? What new meanings can be invented?

amhartley said:
nor is the method decided upon by popular decree (as with alpha = 0.05).
.05 isn't decided upon by popular decree, and it's a parameter, not a method. Furthermore, if you use Bayesian Confidence, you still have to decide what BC you're willing to be satisfied with. You'll never get a BC of 100%, so you have to decide what's "good enough". That's just as arbitrary as alpha, and it's in addition to having to decide on priors.

amhartley said:
That doesn’t take away from the fact, though, that choosing alpha is an arbitrary exercise.
No more than choosing where to set the thermostat.
Grundar said:
If the probability is lower than a previously selected number (usually 0.05 this is the p-value)
No, it's alpha.

Using [an alpha] of 0.05 is to accept that [when the hypothesis is correct] you are wrong 1 time in 20.
Your statement is true with the above corrections.

blutoski said:
Be mindful that p values for medical research are all over the map. .05 is a bare minimum. As I mentioned, I push for .01 whenever the budget allows, and ideally .005.
That should be "alpha".

blutoski said:
Medical science isn't so totally committed to p<.05. I do studies with .01 when I can afford it.
Of course, if you use alpha=.01 for the JREF Challenge, that would mean that everyone who applies gets, on average, $1000 just for showing up. In that sort of a situation, you'd want an alpha of at most .00001. Even with that, if everyone in the world were to apply, over 6,000 would win.
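To put rough numbers on that last point (the applicant pool size and the pass-by-luck rate here are assumptions for illustration only):

Code:
# Back-of-the-envelope expected number of no-ability claimants who pass by luck.
applicants = 6_000_000_000   # "everyone in the world"
alpha = 0.00001              # chance that a claimant with no ability passes anyway

print(applicants * alpha)    # 60,000 -- indeed "over 6,000" lucky winners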

athon said:
Alpha is simply the chance that your null hypothesis is correct.
No, it's not.

The lower it is, the less likely it is the situation which explains the phenomena you are observing,
Alpha is not determined by the data.

Jorghnassen said:
The thing is that, and you can test this with simulation (or possibly figure it out analytically, maybe), if you do an experiment and get a p-value below but close to .05, repeating the experiment will likely yield a p>.05, that is, a non-significant result.
How is that relevant?

If you get a p-value below .025 (or something like that), subsequent experiments will much more consistently get p<.025 (hence it is a less arbitrary cut off than .05).
How so?
 
Originally Posted by athon :
Alpha is simply the chance that your null hypothesis is correct.

No, it's not.

Small nitpick here;

It really bugs the ◊◊◊◊ out of me when you get a gunslinger walking in making short, sharp sentences disagreeing with a statement. I don't mind being wrong; however, when somebody just says 'nu-uh' without a justification, it comes across as smart-arse. I've learned nothing new, and the person making the correction looks arrogant.

That said, I've always understood that the alpha is the probability that the test statistic is lying somewhere in the field of possibilities where the null hypothesis is correct. I have no problem being corrected if this is outlandishly wrong, especially as I'm not a statistician and admit that it does confuse me. I learn when corrected.

Sorry to derail for a moment, but this is just one of my little pet hates.

Athon
 
I read through much of this thread but only skimmed through the last part, so forgive me if I echo something already said.

I believe amhart raises a critical point. Many researchers are not as well-versed in statistics as might be desirable. The real meaning of things like p-value and alpha value escapes them. Most common is probably the belief that the p-value is the probability of the Null-Hypothesis being correct, when in truth it is the probability of the Null-Hypothesis creating the kind of data being found. Furthermore, they do not appreciate that almost all Nulls are incorrect for a given degree of exactness.
I was made painfully aware of this during a recent students' conference (psychology). Everything revolved around the .05, and one of the best talks was ruined because the presenters interpreted findings that were very interesting as a no-result, because p=.062. This is where statistics 101 has gone too far.

But I also believe that just complaining about the shortcomings of present research is not enough. One also has to see the benefits and suggest improvements.
Let's start with the benefits of current research. Although P(data | Ho) (i.e. p) is not the same as P(Ho | data), the two values are connected through the prior probability P(Ho). Regrettably, this probability cannot be determined. Yet, in all but the more extreme cases, a low value of p also indicates a low P(Ho | data).
It's easy to criticize, but hard to come up with an easy solution. As was mentioned above, we will never be able to be sure if any of our hypotheses is really true. We will not even be able to come up with an exact probability of them being true. If one can't stand that, he's in need of another profession; researcher will not do.

Additionally, I believe there is no reason to despair. As Cohen and others have pointed out, there are no easy solutions, but that does not mean, there is nothing we can do.
First, we can start putting more emphasis on descriptive statistics. Never report the significance test first. Tell me about the effect size. Cohen's d and many other measures are available and extend the tools we can use to tackle reality. Use figures. A bar graph (ideally with confidence intervals) will give your reader much more complete information than a value of p, so he can more easily decide what to make of it.
Second, we can use the Bayesian approach. Formulate a set of Hypotheses in advance and evaluate how much support each of them receives. If we make some educated guesses about the priors, all the better.

Here's a link to an excellent paper by Jacob Cohen who puts all the above into more eloquent words: http://www.personal.kent.edu/~dfresco/CRM_Readings/Cohen_1990.pdf
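To make the first suggestion above (effect sizes and intervals before p) concrete, here is a rough sketch in Python with made-up data for two groups; the group sizes and effect are arbitrary choices.

Code:
# Report an effect size and an interval, not just p; the data here are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=40)   # hypothetical control group
b = rng.normal(0.4, 1.0, size=40)   # hypothetical treatment group

# Cohen's d: difference in means over the pooled standard deviation
pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                    / (len(a) + len(b) - 2))
d = (b.mean() - a.mean()) / pooled_sd

# 95% confidence interval for the raw difference in means
diff = b.mean() - a.mean()
se = pooled_sd * np.sqrt(1 / len(a) + 1 / len(b))
t_crit = stats.t.ppf(0.975, df=len(a) + len(b) - 2)

p = stats.ttest_ind(b, a).pvalue
print(f"Cohen's d = {d:.2f}, 95% CI = ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f}), p = {p:.3f}")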
 

I just got back from an applied statistics exam this morning. It is a first year module that I'm taking as a second year option as part of an Economics and Politics degree. Anyhoo I would agree with what Art said up to the point where I understand it, I may have only attended one lecture but I've been cramming for a couple of days.

Anyway, I suspect he may have been irritated by the way the topic starter makes sweeping statements about people not understanding statistics while demonstrating a lack of understanding himself.

As I understand it, alpha = 5% means that you would expect to reject Ho 5% of the time, if Ho is actually true. That is not the same thing as the chance your null hypothesis is correct. It is not even the same thing as the chance your null hypothesis is incorrect.
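That reading of alpha is easy to check by brute force; here is a small simulation (the sample size and number of experiments are arbitrary choices made only for illustration).

Code:
# With Ho true and alpha = 0.05, about 5% of experiments reject Ho.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, n, alpha = 10_000, 30, 0.05

rejections = 0
for _ in range(n_experiments):
    sample = rng.normal(0.0, 1.0, size=n)    # Ho (true mean = 0) really is true
    if stats.ttest_1samp(sample, popmean=0.0).pvalue < alpha:
        rejections += 1

print(rejections / n_experiments)            # comes out close to 0.05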
 

Don't get me wrong, I don't disagree with his content, although I agree there's much of it I can't claim to have enough knowledge on to be able to form an opinion.

It was just a narky comment I had to make, as this is much the reason I try to avoid Politics while liking the Science forum. I learn stuff here; people can make an incorrect statement and others will address it, hopefully without the need to get emotional about it. When somebody acts the expert in order to simply look like an expert, it annoys me. I learned nothing new by being told 'you're wrong', other than that my position might be incorrect. In contrast, 003998's comment also disagreed with the thread's claim, and yet the way it was phrased was polite without seeming arrogant.

It's no big deal, and maybe I should have just grumbled to myself and said nothing. It's my puritan side coming out; keep politics out of science. :)

Athon
 

You were just expressing your view that telling someone they are wrong without helping to correct their understanding irritates you, there's nothing wrong with that, it is constructive criticism.

They might then express the view that they don't have the time or the inclination to explain something that is covered by (well, over here at least) the A-Level maths syllabus.

Either way, no harm no foul.
 
How is that relevant?

Well, I didn't really want to get into this conversation but I did want to explain why Fisher's .05 is so arbitrary as a cutoff point. That is, with a p-value close to .05, one wants to say P(H0|data)<=.05 when the correct interpretation is P(observed data|H0)~.05 (I'm probably paraphrasing Berger & Sellke here). It is relevant as .05 is often chosen to be the cutoff point for statistical significance when it is in fact a poor choice, as gathering more data (repeating the experiment) will "often" yield non-significant results at the .05 level (proportion may vary depending on scale parameter, assuming Normal data, etc...). I do believe this is called the Astronomer's Paradox, the moral of which is if .02275<p<=.05, gather more data.
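For anyone who wants to try the simulation, here is one rough way to set it up; the effect-size prior, sample size and thresholds are all arbitrary choices made purely for illustration.

Code:
# Results that just squeak under p = .05 often fail to replicate at .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 40

def p_value(true_mean):
    sample = rng.normal(true_mean, 1.0, size=n)
    return stats.ttest_1samp(sample, popmean=0.0).pvalue

failed = []
for _ in range(50_000):
    true_mean = rng.normal(0.0, 0.3)              # the unknown real effect, drawn fresh
    if 0.04 < p_value(true_mean) <= 0.05:         # first study just barely significant
        failed.append(p_value(true_mean) > 0.05)  # repeat the study: did it miss .05?

print(len(failed), np.mean(failed))               # the failure rate usually lands above 1/2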

Now I do believe most statisticians have better things to do than decry the apparent misinterpretation of p-values, so this post will probably be my last in this thread.
 
But now we are back to a misunderstanding I’ve addressed 3 or 4 times in this thread already: The p-value is a probability that assumes H; how then could it be a probability about H itself?

p-values do not assume Ho. What are you talking about? Most researchers agree that you must choose between 5% and 10% p values to test the null hypothesis. It is just common sense.

I think that most of your critique comes from your misunderstanding of basic statistics.

I’ll try to state it another way. Reasoning as if
p-value = Prob (H given data)
is like saying “I’m going to prove proposition P. Step 1: Assume P.” My point here is that you can’t derive a probability about H once you have assumed H. Once you have assumed H, Prob(H assuming H) would always be 100%.

No, you don't have one assumption, you have TWO assumptions. One is H0 (the null hypothesis) and the other is H1 (the alternative hypothesis). You seem to believe that by "assumption" we mean "this is how things are and I am going to prove it". You are wrong, these are working assumptions about whether or not my sample's mean converges to the true mean (from the population).

H0: x is zero
H1: x is not zero

In fact, technically we will always want to reject H0 because it says that my sample is not representative of the true population values. Rejecting or not rejecting a hypothesis does not imply that we can generalise our sample results, it just says that our estimates are reliable and consistent with repetitive experiments.

I am not sure who or what you are criticising.
 
I just don't see "a scandal" here. At least, nothing as serious as the other problems with research design: improper placebo design, absent controls, poor selection. If I were to ask about how we need to improve the system, I'd target selection first, then controls, then placebo design.

ETA: Oh, and endpoints. We're way too slack on endpoints.

I guess you're saying something like this: At the restaurant my flounder is too dry and my wine is too wet; then for dessert my strawberry shortcake lacks the shortcake.

In other words, like the shortcake being missing from my shortcake, so-called "statistical inference" in medical research is not inference at all (at least that's my claim). However, that's not the biggest problem when the study design, controls, etc. are done badly.
 
p-values do not assume Ho. What are you talking about?
There is a sense in which they do, but it is not the same meaning of "assume" as is commonly used. P-value is "If Ho is true, then what's the probability of getting data as far or further from the mean?" Insofar as it's an "if then" statement, it "assumes" Ho.
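As a tiny numeric rendering of that "if ... then" reading for a two-sided z-test (all the numbers are made up):

Code:
# "If Ho is true, what is the probability of data at least this far from the mean?"
from scipy.stats import norm

observed_mean, mu0, sigma, n = 0.42, 0.0, 1.0, 25   # hypothetical study
z = (observed_mean - mu0) / (sigma / n ** 0.5)
p = 2 * norm.sf(abs(z))                             # two-sided tail probability under Ho
print(z, p)                                         # z = 2.1, p is about 0.036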

blutoski said:
This is why I get exhausted when I hear "anecdotes are useless." Actually, some anecdotes are very useful.
Next time someone says that to you, perhaps you should slap him across the face.
"Why did you slap me?"
"Do you have any evidence that I slapped you?"
"What do you mean? You just did!"
"That's an anecdote. Anecdotes are useless".

athon said:
It really bugs the ◊◊◊◊ out of me when you get a gunslinger walking in making short, sharp sentences disagreeing with a statement. I don't mind being wrong; however, when somebody just says 'nu-uh' without a justification, it comes across as smart-arse. I've learned nothing new, and the person making the correction looks arrogant.
Well, if you're going to criticize someone's post, you should mention whose post you're quoting. Also, if you read my post carefully, you'll see that I first quoted amhartley asking you "Are you saying that, if I conclude H is false, my chances of being wrong are 20%?" Since you later answered in the affirmative, I responded to amhartley by saying "Apparently, that is indeed what Athon meant. However, it is incorrect. The correct statement is 'If H is false, the chances of being wrong are 1 in 20'." I then later quoted your actual affirmative answer, and mentioned that you were wrong, having already explained how that is so. You are saying that alpha=P(Ho|reject Ho) when in fact alpha=P(reject Ho|Ho). I admit that it was a bit confusing in that I put the explanation of why you were wrong in the section quoting amhartley, but the statement with which I was disagreeing wasn't actually in your post.

That said, I've always understood that the alpha is the probability that the test statistic is lying somewhere in the field of possibilities where the null hypothesis is correct.
I don't know what you mean here.

003998 said:
I was made painfully aware of this during a recent students' conference (psychology). Everything revolved around the .05, and one of the best talks was ruined because the presenters interpreted findings that were very interesting as a no-result, because p=.062.
Statistically, they were. Now, we can make decisions based on more than just statistical tests, but the fact remains that if alpha=.05, then p=.0501 is not a statistically significant result.

Let´s start with the benefits of current research. Although P(data | Ho) (i.e. p) is not the same as P(Ho | data), the two values are connected through the prior probability P(Ho).
Strictly speaking, P(Ho|data) isn't a valid probability, but putting that aside,
P(data|Ho)=P(data&Ho)/P(Ho)
P(Ho|data)=P(data&Ho)/P(data)

therefore
P(data|Ho)/P(Ho|data)=P(data)/P(Ho)

Both values on the RHS are rather meaningless terms. P(data) is the probability of getting the given data. For instance, if your data is that you rolled a die and got a two, then P(data) is the probability of getting a two. Note that I did not say that P(data) is the probability of getting a two when you roll a die, because that's not what P(data) is. P(data) is the probability of getting a two overall. Which makes it, as I said, rather meaningless. How can we assign a probability to getting a two, when we don't even know whether or not we're rolling a die, or if it's a six sided die, or if it's fair? P(Ho), in this example, would be the probability that you have a fair six sided die. Which, again, is meaningless.

Jorghnassen said:
It is relevant as .05 is often chosen to be the cutoff point for statistical significance when it is in fact a poor choice, as gathering more data (repeating the experiment) will "often" yield non-significant results at the .05 level (proportion may vary depending on scale parameter, assuming Normal data, etc...).
This isn't explaining why your claim is relevant, it is simply repeating the claim.

I do believe this is called the Astronomer's Paradox, the moral of which is if .02275<p<=.05, gather more data.
It seems to me that this is rather similar to that scene in "This is Spinal Tap" where they talk about how their amp goes to 11.
 
You really don't give any examples of objectively false interpretations.
Read on in my post. If that’s not what you’re looking for, then indicate what “objectively false” means, please. Also, be precise about what you take to be probability, since, I suspect, a different understanding of probability underlies what seems to be a lack of communication here.
Uhh, no. That would be begging the question. Simply because it calculates P(A|B) doesn't mean it's assuming B.
For Pr(A|B) to mean anything, one must assume B, however momentarily & provisionally. And as soon as one stops assuming B, P(A|B) is not interpretable (if you think it remains interpretable, then pls specify what is the interpretation, in terms of inductive inference).
It involves Ho, and therefore is, in a sense, about Ho.

If I tell you that one in a million cats weighs more than 200 pounds, and 95% of cows weigh more than 200 pounds, and that I have an animal that weighs more than 200 pounds, have I given you any information about what animal I have?
I’ll try to shore up your language before answering the question. Best I can tell, in your example the tested hypothesis Ho would be “this animal is a cat.” The p-value then would be 1/1,000,000 (now that you have dichotomized the data & removed most of the stat information). In this case (unlike most applications), the p-value is the same as the likelihood of the data itself (no “more extreme” part). . . are you following & should I continue for you?
So where does the disagreement lie? Do you deny that they are used to extend sample results to general populations? Do you deny that that is the proper meaning of "inference"? Or do you deny that they should used to decide whether to extend sample results?
This issue depends on the more fundamental issue of whether P(A|B) is a statement about B.
amhartley wrote: “Just put "+michael +oakes +p-value +medical" into google or yahoo searches & read the results! None of what I've said so far is news.”
It's rather silly to expect everyone reading the thread to look through a bunch of pages tryingto figure out what you're talking about, rather than just quoting, or at least giving a link.
Did you try the search? It’s quite easy to find out what I’m talking about. Maybe I could give you some pointers on internet searches? Control-c on your IBM-compatible keyboard is for copy. . .But since you asked I will suggest you go to http://www.warnercnr.colostate.edu/~anderson/thompson1.html. Not all the citations therein are saying exactly what I’m saying here, but most are at least related.
amhartley wrote: “1. Why do people almost universally misunderstand p-values, hypothesis test results & confidence intervals?”
Isn't this a loaded question?
Again, this depends on understanding that P(A|B) is the “probability of A assuming B.”
amhartley wrote: “2. Why are these tools still used, despite these shortcomings?”
Before anyone can answer that question, first you have to say what you think those shortcomings actually are.
Did you see the jingle in a restaurant commercial on TV a few years ago: “Where’s the beef (in my beef burger)?” In the case of this thread, “where’s the inference (in this statistical inference)?” The shortcomings are that statements about data given hypotheses are offered as statements about hypotheses given data.
Originally Posted by amhartley : “The null hypothesis (in this case, IIRC) of no difference is almost never correct. So what’s the purpose of the experiment? IIRC is false a priori.”
What does IIRC stand for?
See the post by Grundar on the first page of this thread.
Where did this value of 0.05 come from? It seems quite arbitrary.
Where did the name "amhartley" come from? It seems quite arbitrary.
Just a regular guy here. What can I say?
amhartley wrote: “However, the p-value (as well as the alpha you are actually talking about) is a statement about data assuming the hypothesis IIRC. It’s not a statement about any hypothesis itself.”
How can you possibly say something about P(A|B) without saying something about B?
All we say about B is that we’re assuming it. There’s a neat analogy between interpreting P(Data | hypothesis) inferentially & “affirming the consequent” if you want me to explain it. . .
amhartley wrote: “Being new to JREF, I would have thought that the people here take great pride in avoiding standard assumptions.”
Blindly ignoring assumptions is just as silly as blindly following them. Also, the word "assumption" is used in statistics (and math in general) in a manner that is different from its normal use.
Please explain.
Originally Posted by amhartley : “The problem here is that any choice of alpha, beyond a WAG (wild-a**ed guess) like Fisher's,”
Uh... what do you mean by "WAG"?
Keep filling in the stars with letters until you come up with something.
amhartley wrote: “But neither Fisher nor Neyman-Pearson (who at least had the guts to deal with type II errors) specified a way to bring costs into their testing paradigms. To do so would have required making statements about hypotheses (and that, as I said in my first post here, is something frequentism doesn't do). “
If we already knew what confidence to assign to the hypothesis, we wouldn't need statistical tests to begin with.
Now I’m confused. Your first quip asked for objective examples, as if anything about stats or probability could be completely objective. But now you’re talking about confidence. Do you mean objective confidence (if there is such a thing) or subjective confidence?
amhartley wrote: “Food for thot: e.g., in the point-null testing situation, the probability of the tested hypothesis H could be >50% even when p<0.05.”
H is either true or not. It's not a random variable, and doesn't habe a probability.
That is a popular impression. However, above you wrote about assigning confidence to a hypothesis. How is confidence different from probability? (Of course, you could deny that confidence in a H has anything to do with statistics, but if that’s true then what’s the point of statistical inference?)
amhartley wrote: “Plus, data producing p<0.05 may actually constitute evidence in favor of H.”
You might want to pick up the paper by Richard Royall, 1986, I think in “The American Statistician,” with a title like “The relation of Evidence to Sample Size.” One of the claims: Any given p can be associated with any amount of evidence against Ho. See example below.
amhartley wrote: “It may “increase our confidence” but it may not, and it may increase our confidence for or against H, depending on such things as power.”
That would show that the rejection region is not properly constructed. It is therefore an issue of the implementation of statiscal methods, rather than the methods themselves.
First, keep in mind that the question of “confidence” in stats now seems to be subject to debate, which makes me wonder why you say a change in our confidence would “show” anything. Nonetheless, here’s an example: symmetric likelihoods, Ho says mu=0 and H1 says mu=1. mu-hat=xbar=.5, p=0.01. However, the evidence for H1 vis-à-vis Ho is neutral. No change in confidence.
I admit, you could say the rejection region is badly formed, i.e., “too much power.” However, this setup follows all the rules set forth by Fisher, Neyman and Pearson. If you can fix their methods by adding thumb-rules or whatever, be my guest. People have been trying that, and failing, for half a century, so if you succeeded, you’d be famous. I’ll even buy your book.
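For anyone who wants to check the arithmetic of that example, here is a minimal sketch; the sample size and sigma are my choices, picked only so that p comes out near 0.01.

Code:
# Ho: mu = 0, H1: mu = 1, observed mean 0.5: small p against Ho, yet the
# likelihood ratio between the two hypotheses is exactly 1 (neutral evidence).
import numpy as np
from scipy.stats import norm

n, sigma, xbar = 27, 1.0, 0.5
se = sigma / np.sqrt(n)

p = 2 * norm.sf((xbar - 0.0) / se)          # about 0.009: "reject Ho at the 0.01 level"

lik_H0 = norm.pdf(xbar, loc=0.0, scale=se)  # likelihood of the observed mean under Ho
lik_H1 = norm.pdf(xbar, loc=1.0, scale=se)  # ...and under H1
print(p, lik_H1 / lik_H0)                   # small p, likelihood ratio = 1.0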
amhartley wrote: “Putting a stake in the ground: I think most people in JREF would agree that we should expunge science of as much arbitrariness as possible, and found our findings on solid principles whenever we can.”
While getting rid of arbitrariness is a good general principle, it's not always a bad thing, and it certainly isn't always the most important issue.
The arbitrariness points to how we’re using the wrong measures for evidence & (if I can use the word) probability. You want examples?
Q: “How long will it take Cousin Jeb to drive to Lake Wobegon?”
A: “As long as it takes Uncle George to finish the haying.”
Viz., you could certainly say the time haying measures the time to arrive at the lake. But I wouldn’t count on the gasoline in the tank lasting long enough, just because George happens to be hanging up the pitchfork. And saying “5 minutes longer than the haying” or “twice as long as the haying” or anything of the sort won’t fix it either, because haying & driving to the lake have little (discernable) relation. These proposed measures of driving time are arbitrary in the sense that knowing the haying time tells you little about the driving time. Maybe rain would slow down both Jeb & George, so there’s a correlation between their times. . .And so it is with p-values and evidence for or against hypotheses.
amhartley wrote: “Otherwise, science becomes a tool for the powerful & influential, i.e., WHOSE “commonsense” will we rely on?”
If it is arbitrary, then it, by definition, doesn't matter.
Same end result as if levels of confidence in hypotheses have no place in statistical inference. What’s the point?
amhartley wrote: “Are you saying that the average Randi member, obviously interested in questioning the assumptions of the paranormal etc, would not be willing to examine their own assumptions?”
I take it that by "examine", you mean "see if they're good", or something like that? If they can be objectively evaluated, then they are, again, by definition, not arbitrary.
Sorry, I don’t follow you.
Originally Posted by amhartley : ““a one in five chance of being wrong:” Can you explain this more? Are you saying that, if I conclude H is false, my chances of being wrong are 20%?”
Apparently, that is indeed what Athon meant. However, it is incorrect. The correct statement is "If H is false, the chances of being wrong are 1 in 20".
Correct. But above you said a hypothesis is either right or wrong, and cannot have a probability. So the hypothesis of “being wrong” is either wrong or right, and cannot have a probability. C’mon, does probability apply to hypotheses maybe only intermittently? Or just when Art says it does?
amhartley wrote: ““it is up to you to determine whether my results are useful:” I don’t think you mean to say that any person has complete freedom to ignore statistical results.”
Anyone setting up a test has the freedom to choose any rejection region they want. That's not "ignoring" the results.
Please look again at the context. Rejection regions were not being discussed in that way.
amhartley wrote: “If, as you say, "Statistics is merely a tool," then would you conclude stats is not a science? I.e., it's useful for behavior & decisionmaking, but not increasing knowledge?”
It's a branch of mathematics used in science, and therefore is useful in increasing knowledge.
If I pick my nose while working my calculator, is that useful too? I’m trying to give you the benefit of the doubt that you want to contribute here, but that hypothesis is starting to present problems.
Originally Posted by amhartley : “As I have said 2 times above, the “number” we pick (and I think you are referring to alpha?) has no correspondence to the “level of confidence” we can place in this or that hypothesis.”
No, alpha does correspond to how much confidence we have in rejecting the null hypothesis.
First, please explain how it’s meaningful to talk about “confidence” and still deny that hypotheses can have probabilities. Once you’ve explained that, maybe you could explain in what way “alpha does correspond to how much confidence we have in rejecting the null hypothesis.” The language is not precise enough to understand.
Originally Posted by amhartley : “Athon, as I mentioned to Blutoski, p<0.05 can constitute evidence FOR the tested hypothesis H, not just evidence against it.”
You are being a bit inaccurate in switching from "data that gives p<.05" to "p<.05".
Sorry; however, does it interfere with getting my point across? At least within that post I didn’t use “data that gives.” I guess you got me there. Score one for Art.
amhartley wrote: “Plus, it is possible that Prob(H given data)>50% even though p<0.05. Therefore, p has nothing to do with levels of confidence in hypotheses. Maybe my post referring to bayesian probabilities will clear this up for you.”
Besides the issue of assigning a probability to H that I discussed earlier, it is a fallacy to say that since there are cases in which one thing is true and another is not, that it somehow follows that the two are not linked.
Let me serve up the baked beans for Jeb. George just put up his pitchfork! After all, their times are linked.
Originally Posted by amhartley : “I work with professional statisticians all the time who think p-values are statements about the tested hypotheses.”
They are.
Well, George is taking a bit longer. . .maybe he got a flat tire.
amhartley wrote: “Most often, statisticians follow RA Fisher's pattern of interpretation, claiming that p measures the evidence against H,”
It is a measure.
Then again, Jeb only did 75% of the field today, ‘cuz it’s starting to hail out there.
amhartley wrote: “But p-values & alpha don't do that.”
But they are used to do so.
but my statistician friend told me haying time measured driving time. Hey, what’s a statistician, anyway?
Originally Posted by amhartley : “There are many papers & books you could read about this. Berger & Sellke had a paper in 1987 (in The American Statistician) showing the disparity, in the point-null testing situation, between p-values & the post-experimental prob of the tested hypothesis H.”
Can you quote them?
All my papers burned up in the barn when George lit up his pipe after finishing haying, and then passed out for hunger ‘cuz we couldn’t start dinner ‘cuz Jeb took too long to get up to the Lake. Seriously tho, I’ll get my papers & books in about 3 more weeks.
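In the meantime, here is a minimal sketch of the kind of disparity they describe, under one conventional set of assumptions (prior mass 1/2 on the point null, a normal prior on the alternative); the specific numbers are my illustrative choices, not theirs.

Code:
# Point-null testing: data sitting right at p = 0.05 can leave P(Ho | data) above 1/2.
import numpy as np
from scipy.stats import norm

n, sigma, tau, prior_H0 = 100, 1.0, 1.0, 0.5
xbar = 1.96 * sigma / np.sqrt(n)                 # observed mean placed exactly at p = .05

p_value = 2 * norm.sf(abs(xbar) / (sigma / np.sqrt(n)))

# Marginal density of the observed mean under Ho (mu = 0) and under H1 (mu ~ N(0, tau^2))
m0 = norm.pdf(xbar, loc=0.0, scale=sigma / np.sqrt(n))
m1 = norm.pdf(xbar, loc=0.0, scale=np.sqrt(sigma**2 / n + tau**2))

posterior_H0 = prior_H0 * m0 / (prior_H0 * m0 + (1 - prior_H0) * m1)
print(p_value, posterior_H0)                     # p = 0.05, yet P(Ho | data) is about 0.6 here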
amhartley: “This invalidates the standard guidance, followed by statisticians as well as medical types, to consider p<0.05 as “moderate evidence against H.””
Again, that's a fallacy. A single counterexample can't contradict a claim of correlation.
Correlated in what way? Be precise. We wouldn’t want George to faint again. He’s depending on your “correlations” here.
Originally Posted by amhartley : “But with respect to continuous properties, I am claiming that no 2 entities are equal. Any person's 2 eyes, for instance, will differ from each other in diameter, tho perhaps only by a few micrometers.”
So do electrons in the Northern Hemisphere has rest masses different from those in the Southern Hemisphere?
Is the mass of an electron a continuous property?
Originally Posted by amhartley : “I would just respond that statistics doesn't have to be that way. It is that way because the standard statistical results don't answer the important scientific questions. To get important meaning from those results, people have to invent that meaning.”
So how can it be different? What questions need to be answered? What new meanings can be invented?
In “how can that be different?”, are you saying that meaning is whatever we invent? That’s farther out, more postmodern, than even Kant maintained, with his logical categories.
Originally Posted by amhartley : “nor is the method decided upon by popular decree (as with alpha = 0.05).”
.05 isn't decided upon by popular decree, and it's a parameter, not a method. Furthermore, if you use Baysesian Confidence, you still have to decide what BC you're willing to be satisfied with. You'll never get a BC of 100%, so you have to decide what's "good enough". That's just as arbitrary as alpha, and it's in addition to having to decide on priors.
How, then is 0.05 decided? According to what I’ve read, Fisher first proposed 1/20 as a thumb-rule. It has since morphed into a standard on which the futures of entire companies hinge in, e.g., the pharmaceutical industry. The justification now for 0.05 is that it provides a common standard. Blutoski seems to have mentioned this in this thread: “One of the other costs associated with changing analysis, is that it becomes very difficult to integrate new findings into the base of knowledge, if we can't tell if study results are comparable. It adds an element of doubt, and makes either the past or recent experiments wasted effort.” But I don’t know for sure why he brought up the topic.
Further, alpha is not a parameter; it’s a feature of a method called hypothesis testing. In statistics, a parameter is unknown. We know alpha.
Originally Posted by amhartley : “That doesn’t take away from the fact, though, that choosing alpha is an arbitrary exercise.”
No more than choosing where to set the thermostat.
I set the thermostat so that I’m not too hot or cold. By analogy, should I feel uncomfortable if alpha is too big or small? On what grounds?

Request: Art, I fear such lengthy posts will prevent us from concentrating on anything sufficiently. Can we pls try to focus?
 
