Statistical scandals

This isn't explaining why your claim is relevant, it is simply repeating the claim.

It seems to me that this is rather similar to that scene in "This is Spinal Tap" where they talk about how their amp goes to 11.

Here's the irrelevant reference for you:

Berger, J.O. & Sellke, T. (1987). Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence. Journal of the American Statistical Association, 82, 112-139.

Now I'm really done with this thread.
 
The simple cause of all this is that statistics seems essential to the biological sciences, so it is used heavily there, yet it is mathematically well beyond the vast majority of biologists.

Until everyone is well enough versed in maths to really understand stats, there will be problems of all sorts. As an example, how many biologists appeal to the normal (gaussian) distribution like zombies? Many. How many actually know where it comes from and therefore when it applies? Hardly any.
Odd. That doesn't match my experience doing research with actual biologists at all. Just who are you talking about here?
 
But I also believe that just complaining about the shortcomings of present research is not enough. One also has to see the benefits and suggest improvements.
Thanks for a refreshing perspective & potentially helpful post. Yes, eventually we need to get around to what to do about all this. After a few years of talking with people about these issues, though, I’ve come to believe that people are not open to solutions until they are aware of the severity of the problems. Status quo is just way too comfortable.
Let's start with the benefits of current research. Although P(data | Ho) (i.e. p) is not the same as P(Ho | data), the two values are connected through the prior probability P(Ho). Regrettably, this probability cannot be determined. Yet, in all but the more extreme cases, a low value of p also indicates a low P(Ho | data).
Like I was saying, there is a temptation to minimize the weaknesses of p. In the easiest cases, at least, p is a combination of estimated effect size (delta), precision (1/sigma) and sample size (n). If you know p, sigma-hat & n, you can often work backwards to estimate delta (a rough sketch of that back-calculation follows the story). However, this process reminds me of the two guys in the helicopter counting caribou:
Joe: Hey, that’s a huge herd!
Jim (pauses for 15 sec): Yeah, 347 caribou.
Joe: That’s incredible! How did you count those so fast?
Jim: Easy; just count the legs and divide by 4.
My point with that story is that, whenever possible, we should go for direct answers, rather than the more complicated approaches. Occam rules.
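(To make that back-calculation concrete, here is a minimal sketch, assuming a two-sided z-test with known sigma; the p, sigma and n are made-up numbers.)

# Recover the estimated effect size (delta-hat) from a reported two-sided
# z-test p-value, a known sigma, and the sample size n.
from scipy.stats import norm

p, sigma, n = 0.03, 2.0, 50           # reported p-value, known SD, sample size (illustrative)
z = norm.ppf(1 - p / 2)               # |z| statistic implied by the two-sided p-value
delta_hat = z * sigma / n ** 0.5      # since z = delta_hat / (sigma / sqrt(n))
print(round(z, 2), round(delta_hat, 2))   # about 2.17 and 0.61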

The second thing to say here is that it doesn’t take an ‘extreme case’ to see a huge disparity between Pr(data |Ho) and Pr(Ho|data). For instance, around where I work at least, the vast majority of our stat tests test point null hypotheses (2 sided). In that situation (see Berger & Sellke 1987, already mentioned in this thread), it is easy to get a huge disparity. Other situations present this difficulty too.
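(For anyone who wants to see how big that disparity can get, here is a rough sketch of the kind of point-null calculation Berger & Sellke discuss, assuming 50/50 prior odds on Ho and a normal prior under H1; the particular numbers are mine, purely for illustration.)

# Point null Ho: mu = 0 vs H1: mu != 0, with xbar sitting right at the p = 0.05 boundary.
from scipy.stats import norm

n, sigma, tau = 25, 1.0, 1.0          # sample size, known SD, prior SD under H1 (assumed)
se = sigma / n ** 0.5
xbar = 1.96 * se                      # data that just reach two-sided p = 0.05
p_value = 2 * norm.sf(abs(xbar) / se)

m0 = norm.pdf(xbar, loc=0, scale=se)                        # likelihood of the data under Ho
m1 = norm.pdf(xbar, loc=0, scale=(tau**2 + se**2) ** 0.5)   # marginal likelihood under H1
post_H0 = m0 / (m0 + m1)              # Pr(Ho | data) with equal prior odds
print(round(p_value, 3), round(post_H0, 2))   # p of 0.05 alongside Pr(Ho | data) near 0.45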

It's easy to criticize, but hard to come up with an easy solution. As was mentioned above, we will never be able to be sure whether any of our hypotheses is really true. We will not even be able to come up with an exact probability of them being true. If one can't stand that, he's in need of another profession; researcher will not do.
Amen to all of that, except that the Bayesian can in fact develop an exact Pr(Ho|data). This probability just may not apply to someone else with different priors.
Additionally, I believe there is no reason to despair. As Cohen and others have pointed out, there are no easy solutions, but that does not mean there is nothing we can do. First, we can start putting more emphasis on descriptive statistics. Never report the significance test first. Tell me about the effect size. Cohen's d and many other measures are available and extend the tools we can use to tackle reality. Use figures. A bar graph (ideally with confidence intervals) will give your reader much more complete information than a value of p, so he can more easily decide what to make of it.
My only reservation here is that I doubt our ability to assimilate all of that info together and come up with an optimal inference or decision. Even when you consider only p and delta, there’s no assurance of optimality.
Second, we can use the Bayesian approach. Formulate a set of hypotheses in advance and evaluate how much support each of them receives. If we make some educated guesses about the priors, all the better.
If you can get people to consider subjective bayesianism, I’m behind you.
Here's a link to an excellent paper by Jacob Cohen, who puts all of the above into more eloquent words: http://www.personal.kent.edu/~dfresco/CRM_Readings/Cohen_1990.pdf
Yes, excellent paper. Or at least thought provoking. As I mentioned above, though, it’s like alchemy to try to come up with inferences about hypotheses when all one has are probabilities about data assuming hypotheses, even (or especially?) if one adds Cohen’s d or Max Like Estimators & whatnot to the mix.
 
Anyway, I suspect he may have been irritated by the way in which the topic starter makes sweeping statements about people not understanding statistics and then demonstrates a lack of understanding himself.
Please be more specific. When I don't understand something I'm eager to know about it. And I think the sweeping statements are justified. Truly yours, The Topic Starter.
As I understand it, alpha = 5% means that you would expect to reject Ho 5% of the time, if Ho is actually true. That is not the same thing as the chance your null hypothesis is correct. It is not even the same thing as the chance your null hypothesis is incorrect.
Accurate explanation. Another way to see the difference is in the case where A and B are just propositions (discarding the idea of p-values for the moment). Then, we have (if, as Art denies, it makes sense to assign probabilities to propositions like hypotheses):
1. Pr(B given A) = Pr(A and B) / Pr(A);
2. Pr(A given B) = Pr(A and B) / Pr(B);
3. From 2, Pr(A and B) = Pr(A given B) Pr(B);
4. Putting 3 into 1: Pr(B given A) = Pr(A given B) Pr(B) / Pr(A) (Bayes theorem).
So, Pr(H given D) = Pr(D given H) Pr(H)/Pr(D), for hypothesis H and data D.

In this, you can see the relation between Pr(B given A) and Pr(A given B). If Pr(B)/Pr(A) = 1, then Pr(B given A) = Pr(A given B).
This points to the correlations, however weak, that some here have mentioned between Pr(data given hypothesis) & Pr(hypothesis given data).
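(A tiny numeric sketch of that relation; the joint probabilities below are invented purely for illustration.)

# Two propositions A and B with an assumed joint distribution.
p_A_and_B = 0.04
p_A = 0.05
p_B = 0.40

p_B_given_A = p_A_and_B / p_A         # 0.8
p_A_given_B = p_A_and_B / p_B         # 0.1
# Bayes theorem: Pr(B given A) = Pr(A given B) * Pr(B) / Pr(A)
assert abs(p_B_given_A - p_A_given_B * p_B / p_A) < 1e-12
print(p_B_given_A, p_A_given_B)       # 0.8 vs 0.1 -- equal only when Pr(A) = Pr(B)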
 
Well, I didn't really want to get into this conversation but I did want to explain why Fisher's .05 is so arbitrary as a cutoff point. That is, with a p-value close to .05, one wants to say P(H0|data)<=.05 when the correct interpretation is P(observed data|H0)~.05 (I'm probably paraphrasing Berger & Sellke here). It is relevant as .05 is often chosen to be the cutoff point for statistical significance when it is in fact a poor choice, as gathering more data (repeating the experiment) will "often" yield non-significant results at the .05 level (proportion may vary depending on scale parameter, assuming Normal data, etc...). I do believe this is called the Astronomer's Paradox, the moral of which is if .02275<p<=.05, gather more data.
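(A rough simulation sketch of that "gather more data" point, assuming normal data with known sigma and a true effect that happens to sit exactly at the two-sided 0.05 boundary; the numbers are illustrative.)

# A result exactly at p = 0.05 gives an identical replication only ~50% power,
# so nonsignificant follow-ups are common.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, sigma = 25, 1.0
se = sigma / np.sqrt(n)
true_mu = 1.96 * se                   # assumed true effect, right at the 0.05 boundary

xbar = rng.normal(true_mu, se, size=100_000)     # means of replicated experiments
p = 2 * norm.sf(np.abs(xbar) / se)               # two-sided p-values against Ho: mu = 0
print((p > 0.05).mean())              # roughly half the replications are nonsignificant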
That's interesting; I hadn't heard of that paradox.
Now I do believe most statisticians have better things to do than decry the apparent misinterpretation of p-values, so this post will probably be my last in this thread.
Sorry to lose you, even tho your "justification" of 0.05 is problematical 'cuz, as I mentioned at other places, Pr(data given hypothesis) can differ widely from Pr(hypothesis given data), and not only in what some call "extreme cases."

I don't know what could be more important to a statistician, as a statistician, than putting the inference back into "statistical inference."
 
For Pr(A|B) to mean anything, one must assume B, however momentarily & provisionally.
No. According to your formulation, P(A|B)=P(A&B)/P(B). That is defined regardless of whether we assume B. The probability of getting a two when you roll a fair die is well-defined regardless of whether you actually roll a fair die.

I’ll try to shore up your language before answering the question.
Why are you using hypothesis testing if you don’t approve of it?

It’s quite easy to find out what I’m talking about.
It would be even easier if you just told me.

The shortcomings are that statements about data given hypotheses are offered as statements about hypotheses given data.
That’s not a shortcoming of the method, but its presentation.

All we say about B is that we’re assuming it.
Nope. We’re saying what happens when we assume it.

There’s a neat analogy between interpreting P(Data | hypothesis) inferentially & “affirming the consequent” if you want me to explain it. . .
If I say “If A then B, B is true, therefore A is definitely true”, then that’s affirming the consequent, and it’s a fallacy. If I say “If A then B, B is true, that supports A”, that’s not a fallacy.

Please explain.
Math “assume”: if… then
Common “assume”: accept as true

Keep filling in the stars with letters until you come up with something.
It’s not the middle word that I have a problem with. How is this a “guess”?

That is a popular impression. However, above you wrote about assigning confidence to a hypothesis. How is confidence different from probability?
Confidence is about the person. Probability is about random variables.

One of the claims: Any given p can be associated with any amount of evidence against Ho. See example below.
I see no example.

First, keep in mind that the question of “confidence” in stats now seems to be subject to debate, which makes me wonder why you say a change in our confidence would “show” anything.
A properly constructed rejection region consists of situations which would decrease our confidence. Therefore, if the rejection region contains situations that would increase our confidence, it’s not properly constructed.

Ho says mu=0 and H1 says mu=1. mu-hat=xbar=.5, p=0.01. However, the evidence for H1 vis-à-vis Ho is neutral. No change in confidence.
I admit, you could say the rejection region is badly formed, i.e., “too much power.” However, this setup follows all the rules set forth by Fisher, Neyman and Pearson. If you can fix their methods by adding thumb-rules or whatever, be my guest. People have been trying that, and failing, for half a century, so if you succeeded, you’d be famous. I’ll even buy your book.
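(For concreteness, a quick numerical sketch of that setup; I've assumed a standard error of about 0.215 so the one-sided p against Ho comes out near 0.01.)

# Simple vs simple: Ho says mu = 0, H1 says mu = 1, and we observe xbar = 0.5.
from scipy.stats import norm

se = 0.215                            # assumed standard error of xbar
xbar = 0.5
p = norm.sf(xbar / se)                # one-sided p-value under Ho
lr = norm.pdf(xbar, 0, se) / norm.pdf(xbar, 1, se)   # likelihood ratio of Ho to H1
print(round(p, 3), round(lr, 3))      # p near 0.01, yet the likelihood ratio is exactly 1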
Okay. The alternative hypothesis should include all situations other than the null hypothesis. If Ho says mu=0, then H1 should say mu!=0. That is, it should include the possibility that mu=.5. Am I going to be famous for pointing that out?

The arbitrariness points to how we’re using the wrong measures for evidence & (if I can use the word) probability.
You don’t explain how. I’m not going to respond to your cryptic Jeb/Bush stuff.

Sorry, I don’t follow you.
Then explain what you meant by it.

Correct. But above you said a hypothesis is either right or wrong, and cannot have a probability. So the hypothesis of “being wrong” is either wrong or right, and cannot have a probability. C’mon, does probability apply to hypotheses maybe only intermittently? Or just when Art says it does?
I’ve never said it does.

If I pick my nose while working my calculator, is that useful too? I’m trying to give you the benefit of the doubt that you want to contribute here, but that hypothesis is starting to present problems.
Pot, kettle.

Is the mass of an electron a continuous property?
Mass is.

In “how can that be different?”, are you saying that meaning is whatever we invent?
You said that it can be different. I’m just asking you how.

Further, alpha is not a parameter; it’s a feature of a method called hypothesis testing. In statistics, a parameter is unknown. We know alpha.
Parameters are sometimes known in statistics, I was using the term in the general mathematical sense anyway, and .05 is not a feature of hypothesis testing.

The second thing to say here is that it doesn’t take an ‘extreme case’ to see a huge disparity between Pr(data |Ho) and Pr(Ho|data). For instance, around where I work at least, the vast majority of our stat tests test point null hypotheses (2 sided). In that situation (see Berger & Sellke 1987, already mentioned in this thread), it is easy to get a huge disparity. Other situations present this difficulty too.
Except that you haven't shown how this is a difficulty.

Amen to all of that, except that the Bayesian can in fact develop an exact Pr(Ho|data).
No, not probability. Confidence.
 
Since it wasn't answered by amhartley, IIRC = if I recall correctly.

/Hans
 
Since it wasn't answered by amhartley, IIRC = if I recall correctly.

/Hans

Hans, thanks; I misunderstood. You had said
"To analyze your results you make the fairly standard assumption that there is no difference. Called the null hypothesis IIRC."
I took you to mean you were defining a new (to me) notation for the hypothesis of no difference.
 
amhartley said: <<For Pr(A|B) to mean anything, one must assume B, however momentarily & provisionally.>>
No. According to your formulation, P(A|B)=P(A&B)/P(B). That is defined regardless of whether we assume B. The probability of getting a two when you roll a fair die is well-defined regardless of whether you actually roll a fair die.
Just try to say “P(A|B),” i.e., “the probability of A assuming B,” without using the word “assuming” (and it's no fair saying “given” instead). Sorry if that appears frivolous; I’m just at a loss to understand you. Applying what you say below: to calculate the probability of getting a 2 when rolling a fair die, we “see what happens when/if”, even contrafactually, a fair die is being rolled. You & I are probably saying the same thing here.
amhartley said: <<I’ll try to shore up your language before answering the question. >>
Why are you using hypothesis testing if you don’t approve of it?
You had described a testing situation in rough terms and, to be able to discuss it, one needs more precise language. I was filling in that language but wasn’t sure exactly the scenario you had in mind.
amhartley wrote: <<It’s quite easy to find out what I’m talking about. >>
It would be even easier if you just told me.
Example: As an experiment, Oakes gave a short quiz to 70 psychology academics. Only 3 people interpreted the p-value correctly. The majority of respondents interpreted the p-value as the post-experimental probability of the tested hypothesis.

Additionally, the experiment has been repeated many times, with similar results.
amhartley wrote: <<The shortcomings are that statements about data given hypotheses are offered as statements about hypotheses given data. >>
That’s not a shortcoming of the method, but its presentation.
The average textbook or professor defines statistical inference as “extending sample results to a larger population,” sets up a correspondence between hypotheses & populations, and then says p-values, conf intervals, and hypothesis test results are inferential. How, then, can you hope to avoid the misunderstanding that these outcomes are statements about hypotheses? How would you change the presentation?
amhartley wrote: <<All we say about B is that we’re assuming it. >>
Nope. We’re saying what happens when we assume it.
That seems right; I see no difference.
amhartley wrote: <<There’s a neat analogy between interpreting P(Data | hypothesis) inferentially & “affirming the consequent” if you want me to explain it. . . >>
If I say “If A then B, B is true, therefore A is definitely true”, then that’s affirming the consequent, and it’s a fallacy. If I say “If A then B, B is true, that supports A”, that’s not a fallacy.
First, to nitpick: If A then B (A=>B), then B supports A only when B is not a tautology.
More general comment: In statistics we don’t have “A=>B.” We only have Pr(B assuming A), which is in the interval (0,1). Now, B only supports A in relation to how well another proposition (say, C) predicts B, viz., Pr(B assuming C). Specifically, the degree to which B supports A vis-à-vis C is Pr(B assuming A)/Pr(B assuming C).
amhartley wrote: <<Please explain. >>
Math “assume”: if… then
Common “assume”: accept as true
Okay.
amhartley wrote: <<Keep filling in the stars with letters until you come up with something. >>
It’s not the middle word that I have a problem with. How is this a “guess”?
It’s a guess in that the Neyman-Pearson H testing methodology specifies no rational way of choosing alpha. That is why people choose alpha depending on thumb-rules (such as Fisher’s 0.05) or the conventions of communities of scientists, rather than on logical/mathematical laws. In simple cases, one can reverse-engineer H testing from a bayesian standpoint to choose alpha, but that’s another story. . .
amhartley wrote: <<That is a popular impression. However, above you wrote about assigning confidence to a hypothesis. How is confidence different from probability? >>
Confidence is about the person. Probability is about random variables.
Interesting. Does confidence obey any laws, or can we assign (feel?) any degree of confidence we wish? Is it quantitative? If so, is it bounded?
amhartley wrote: << One of the claims: Any given p can be associated with any amount of evidence against Ho. See example below. >>
I see no example.
I was referring to the cases, noted lower in that post, in which x supports Ho relative to H1, even though p<0.05.
amhartley wrote: <<First, keep in mind that the question of “confidence” in stats now seems to be subject to debate, which makes me wonder why you say a change in our confidence would “show” anything. >>
A properly constructed rejection region consists of situations which would decrease our confidence. Therefore, if the rejection region contains situations that would increase our confidence, it’s not properly constructed.
Maybe it is useless to discuss this until, in the discussion a few lines back, we sort out how confidence differs from probability. In the meantime, I am just wondering how, if (as you say) confidence is “about the person,” that it is also “in a hypothesis.”
amhartley wrote: << Ho says mu=0 and H1 says mu=1. mu-hat=xbar=.5, p=0.01. However, the evidence for H1 vis-à-vis Ho is neutral. No change in confidence. I admit, you could say the rejection region is badly formed, i.e., “too much power.” However, this setup follows all the rules set forth by Fisher, Neyman and Pearson. If you can fix their methods by adding thumb-rules or whatever, be my guest. People have been trying that, and failing, for half a century, so if you succeeded, you’d be famous. I’ll even buy your book. >>
Okay. The alternative hypothesis should include all situations other than the null hypothesis. If Ho says mu=0, then H1 should say mu!=0. That is, it should include the possibility that mu=.5. Am I going to be famous for pointing that out?
Sorry, there must be a typo. What did you mean by “mu!=0?” I hope you didn't send that to the publisher.
amhartley wrote: <<The arbitrariness points to how we’re using the wrong measures for evidence & (if I can use the word) probability. >>
You don’t explain how. I’m not going to respond to your cryptic Jeb/Bush stuff.
Not a republican, eh?
amhartley wrote: <<Sorry, I don’t follow you. >>
Then explain what you meant by it.
“examine” could mean “put out in the open” at least, or even better, “evaluate.”
amhartley wrote: <<Correct. But above you said a hypothesis is either right or wrong, and cannot have a probability. So the hypothesis of “being wrong” is either wrong or right, and cannot have a probability. C’mon, does probability apply to hypotheses maybe only intermittently? Or just when Art says it does? >>
I’ve never said it does.
You stated “a one in five chance of being wrong.” Someone is wrong iff s/he’s concluded a hypothesis, & the hypothesis s/he’s concluded is false, so now the hypothesis has a probability of 4 in 5. Therefore, you’re attributing a probability to a hypothesis. Unless you want to distinguish between probability and chance, in which case we’d have 3 concepts to keep distinct: probability, chance, and confidence. This is getting complicated!
amhartley wrote: <<In “how can that be different?”, are you saying that meaning is whatever we invent? >>
You said that it can be different. I’m just asking you how.
I said that the inductive meanings people derive from deductive statistical outcomes are products of their minds, or of arbitrary conventions. Viz., they’re fabricated rather than discovered. You asked, “how that can be different.” If, as you seem to suggest, fabricated meaning and objective meaning are the same, then I suppose there’s no such thing as objective truth.
amhartley wrote: <<Further, alpha is not a parameter; it’s a feature of a method called hypothesis testing. In statistics, a parameter is unknown. We know alpha. >>
Parameters are sometimes known in statistics, I was using the term in the general mathematical sense anyway, and .05 is not a feature of hypothesis testing.
The focus was on alpha, which is the primary design probability of Neyman-Pearson hypothesis testing. Alpha is not just a feature, it's the pre-eminent feature.
amhartley wrote: <<The second thing to say here is that it doesn’t take an ‘extreme case’ to see a huge disparity between Pr(data |Ho) and Pr(Ho|data). For instance, around where I work at least, the vast majority of our stat tests test point null hypotheses (2 sided). In that situation (see Berger & Sellke 1987, already mentioned in this thread), it is easy to get a huge disparity. Other situations present this difficulty too. >>
Except that you haven't shown how this is a difficulty.
Well, I suppose that if inference is about assigning confidences to people, rather than assigning probabilities to hypotheses, then we have a few things to clear up before disparities between Pr(data |Ho) and Pr(Ho|data) would take any meaning or importance.

To me, probability = confidence = chance = degree of certainty. When people take Pr(data |Ho) as Pr(Ho|data), and they do (as shown in the studies of Oakes & others, tho I'm being sloppy here about both the p-value & Ho), then they are apt to place unwarranted degrees of certainty in hypotheses, viz., states of affairs. This can lead them to make decisions that have low probabilities of turning out in their favor.

For instance, in many applications, p=0.03 and Pr(Ho|data)=.20, say, could be realized given the same data. Mistaking p for Pr(Ho|data) leads here to an unjustified strength of belief that Ho is false. That strength of belief could, in turn, support a decision to bet 7 to 1 against Ho, whereas the max justified bet against Ho would be 4 to 1.
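(The betting arithmetic in one line, taking fair odds against Ho as (1 - P)/P; the 0.20 and 0.03 are just the illustrative figures above.)

def odds_against(p_H0):
    # fair odds against Ho implied by a probability that Ho is true
    return (1 - p_H0) / p_H0

print(odds_against(0.20))   # 4.0 -- the max justified bet of 4 to 1 against Ho
print(odds_against(0.03))   # about 32 -- far longer odds, if p is mistaken for Pr(Ho|data)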

Are you saying “confidence,” not probability, constitutes certainty & dictates what risks are acceptable to take? In that case, it becomes important to investigate the laws & behaviors, if any, of confidence. How should confidence be affected by statistical evidence? What is the range of confidence? If we can’t express confidence numerically, can we express it at all, & how? What levels of confidence make what levels of risk acceptable? In short, the investigation of confidence sounds like a whole new, and important, scientific discipline.
 
Since the question of "Whence alpha = 0.05" came up a few times in the above 2 1/2 pages, I thought I'd provide the following for anyone interested, tho of course you can't trust everything you see on the net.

http://www.isixsigma.com/forum/showmessage.asp?messageID=67035
http://ourworld.compuserve.com/homepages/rajm/jspib.htm (look under #2: “Frequentist methods are arbitrary”)

Also, here's one I hadn't seen before:
http://www.skepdic.com/psiassumption.html (look under the paragraph beginning “We should also note that the notion of statistical significance itself is an. . .”), although this paragraph contains one of the myths about p-values even as it tries to dispel myths about parapsychology etc: “Statistical significance only tells us the probability that a given statistic is not spurious or due to a statistical accident.” This is incorrect.

To understand why, note that a given statistic is not spurious only if the tested hypothesis H is false. So the quote here is saying that stat significance=Pr(-H), or 1 – stat signif=Pr(H). This is, once more, equating Pr(H given Data) with Pr(Data given H).

Just one more sign that misinterpretations of p-values etc. are very difficult, if not impossible, to avoid.
 
Sorry, there must be a typo. What did you mean by “mu!=0?”

That is a notation that normally means "mu is not equal to 0". Most often used in programming or when you don't have easy access to symbols.

I think that is what Art meant.

/Hans
 
Amhartley said: <<Any given p can be associated with any amount of evidence against Ho. See example below. It may “increase our confidence” but it may not, and it may increase our confidence for or against H, depending on such things as power. >>
Art Vandelay said: <<That would show that the rejection region is not properly constructed. It is therefore an issue of the implementation of statistical methods, rather than the methods themselves. >>
amhartley said: <<Ho says mu=0 and H1 says mu=1. mu-hat=xbar=.5, p=0.01. However, the evidence for H1 vis-à-vis Ho is neutral. No change in confidence.
I admit, you could say the rejection region is badly formed, i.e., “too much power.” However, this setup follows all the rules set forth by Fisher, Neyman and Pearson. If you can fix their methods by adding thumb-rules or whatever, be my guest. People have been trying that, and failing, for half a century, so if you succeeded, you’d be famous. I’ll even buy your book. >>
Art Vandelay said: <<Okay. The alternative hypothesis should include all situations other than the null hypothesis. If Ho says mu=0, then H1 should say mu!=0. That is, it should include the possibility that mu=.5. Am I going to be famous for pointing that out? >>
Amhartley said: <<Sorry, there must be a typo. What did you mean by “mu!=0?” I hope you didn't send that to the publisher.>>
That is a notation that normally means "mu is not equal to 0". Most often used in programming or when you don't have easy access to symbols.

I think that is what Art meant.

/Hans

Hans, thx for clearing that up. I’m more used to “mu^=0” or “mu<>0.”

Art, we may not be able to understand one another with respect to the interaction of p-values, H testing & confidence, until you explain (as I have asked twice) how you understand “confidence.” This explanation would probably have to specify how confidence differs from probability and (according to your most recent posts) chance.

As I said, for me, confidence = probability = chance; saying that, I am committing myself to my confidence obeying Kolmogorov’s axioms of probability and the definition of conditional probability, at least as an ideal. That is, I submit myself to believing in certain ways. I try to make my degrees of confidence follow norms, mathematical and otherwise. I’m not free to believe anything I wish. When my beliefs are inconsistent, at least with one another, others can call me to account & I can be corrected. I view certain beliefs as justified, others as merely unjustified (speculative), and still others as mutually exclusive (inconsistent).

In contrast, in talking with some non-bayesians, I get the feeling that their desired distinction between confidence & probability is a deliberately vague explainer (a “soft pillow,” like Einstein’s cosmological constant) which allows them to escape from constraints on beliefs. This distinction opens the door for viewing confidence in a completely postmodern fashion, as a mysterious, speculative phenomenon that may or may not follow any set of rules. Without explicit & acknowledged rules that confidence must adhere to, it becomes impossible to say whether someone is right for interpreting inductively a p-value (say) in such and such a way. Without those rules, about all one can say to a person inferring such-and-such confidence from a p-value is that “that’s not how *I* would interpret the p-value” or “I don’t know anyone else who would feel that way” or “You’ll never get that published;” the highest court of appeal becomes either intersubjectivity or professional survival. Any discussion about propriety of beliefs is dead.

Once we agree on, or at least clarify, some hypothetical (at least) nature of "confidence," what you say statistical inference is concerned about, we may be able to communicate concerning the relations of p-values and confidence.
 
Just try to say “P(A|B),” i.e., “the probability of A assuming B,” without using the word “assuming” (and it's no fair saying “given” instead).
So what am I allowed to say? Am I allowed to say “if”? “When”? “Evaluated at”?

You had described a testing situation in rough terms
I’ve described a simple hypothetical. Put aside your issues regarding testing, and just answer it directly. I’m not talking about testing, I’m just talking about inferences.

The average textbook or professor defines statistical inference as “extending sample results to a larger population,” sets up a correspondence between hypotheses & populations, and then says p-values, conf intervals, and hypothesis test results are inferential. How, then, can you hope to avoid the misunderstanding that these outcomes are statements about hypotheses? How would you change the presentation?
Well, they are statements about the hypothesis. They are not statements about the probability of the hypothesis.

As for how I would try to prevent fallacious reasoning, I would be sure to go through the precise mathematical formulation, and emphasize that validity flows from mathematical foundation.

That seems right; I see no difference.
I do. “A implies B” means “If we assume A, then we must assume B”. It is true whether we assume A or not.

First, to nitpick: If A then B (A=>B), then B supports A only when B is not a tautology.
The full statement is that if 0<P(A),P(B)<1, and P(B|A)>P(B), then P(A|B)>P(A).

Now, B only supports A in relation to how well another proposition (say, C) predicts B, viz., Pr(B assuming C). Specifically, the degree to which B supports A vis-à-vis C is Pr(B assuming A)/Pr(B assuming C).
That makes no sense. B supports A regardless of whether C does.

It’s a guess in that the Neyman-Pearson H testing methodology specifies no rational way of choosing alpha.
”Guess” refers to issues of fact, not choices in general.

Interesting. Does confidence obey any laws, or can we assign (feel?) any degree of confidence we wish? Is it quantitative?
”Confidence” is a general term. “Bayesian Confidence” is a type of confidence, and it does follow rules.

I was referring to the cases, noted lower in that post, in which x supports Ho relative to H1, even though p<0.05.
Except that that does not do what you claimed it does, nor what you are now claiming. Earlier you said it’s neutral, but now you’re saying it supports Ho.

Maybe it is useless to discuss this until, in the discussion a few lines back, we sort out how confidence differs from probability.
Probability is what’s studied in probability theory. Confidence is not.

In the meantime, I am just wondering how, if (as you say) confidence is “about the person,” that it is also “in a hypothesis.”
If I say that I trust someone, am I saying something about that other person? Or myself? Or both?

“examine” could mean “put out in the open” at least, or even better, “evaluate.”
These assumptions are in the open. And what would evaluating them entail?

You stated “a one in five chance of being wrong.” Someone is wrong iff s/he’s concluded a hypothesis, & the hypothesis s/he’s concluded is false,
Or if he rejects the hypothesis, and the hypothesis is correct.

so now the hypothesis has a probability of 4 in 5.
That doesn’t follow.

The focus was on alpha, which is the primary design probability of Neyman-Pearson hypothesis testing. Alpha is not just a feature, it's the pre-eminent feature.
No, the focus was on the specific value of .05, and that specific value is not a feature of hypothesis testing.

For instance, in many applications, p=0.03 and Pr(Ho|data)=.20, say, could be realized given the same data. Mistaking p for Pr(Ho|data) leads here to an unjustified strength of belief that Ho is false. That strength of belief could, in turn, support a decision to bet 7 to 1 against Ho, whereas the max justified bet against Ho would be 4 to 1.
Or you could have a situation where P(Ho)=1 but p=.001. Such a situation will happen only .1% of the time. If you’re willing to have it happen that frequently, then you should set alpha at .001.

Without explicit & acknowledged rules that confidence must adhere to, it becomes impossible to say whether someone is right for interpreting inductively a p-value (say) in such and such a way.
It’s already impossible.
 
Hans, thanks; I misunderstood. You had said
"To analyze your results you make the fairly standard assumption that there is no difference. Called the null hypothesis IIRC."
I took you to mean you were defining a new (to me) notation for the hypothesis of no difference.
Which in turn confused me, since I expected it to mean "if I recall correctly".

003998 said:
A bar graph (ideally with confidence intervals) will give your reader much more complete information than a value of p, so he can more easily decide what to make of it.
Confidence intervals? Those are even worse. If 3 out of 70 don't understand what alpha is, I doubt 3 out of 140 know what "confidence interval" means. Answer: very little. A confidence interval really doesn't say anything about the true value, yet people think that it does.
 
Confidence intervals? Those are even worse. If 3 out of 70 don't understand what alpha is, I doubt 3 out of 140 know what "confidence interval" means. Answer: very little. A confidence interval really doesn't say anything about the true value, yet people think that it does.

In fact in Oakes' study 3 out of 70 were the ones who *did* understand the p-value.

The way I explain a 95% conf interval? If we repeated an experiment a million times, about 95% of the intervals would contain the population parameter.

The trouble with this is that it's tempting but invalid (if probabilities cannot apply to hypotheses) to conclude that the probability is 95% that any given 95% CI contains the parameter.
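(Here's a minimal simulation sketch of that "repeat the experiment" reading of a 95% interval, assuming normal data with known sigma; purely illustrative.)

# Long-run coverage of the usual 95% confidence interval for a normal mean.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 2.0, 30, 100_000
se = sigma / np.sqrt(n)

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
lo, hi = xbar - 1.96 * se, xbar + 1.96 * se
print(np.mean((lo <= mu) & (mu <= hi)))   # about 0.95 of the intervals contain the true mu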
 
So what am I allowed to say? Am I allowed to say “if”? “When”? “Evaluated at”?
Any of those options indicate you’re assuming, at least provisionally, in the “mathematical” sense you identified in another post, that H is true. It doesn’t mean that you believe H, or that you’re committing yourself to act as if H. The assumption is purely for argument’s sake.

This type of assumption is as if I’m an official for a city and, because earthquakes sometimes occur there, I design the building code to demand a certain level of earthquake protection in building structures. That doesn’t mean, however, that I’m saying an earthquake WILL occur. Further, if no earthquake occurs, I’m not liable for having demanded too much from builders.

This type of assumption is also evident in "devil's advocate" type of arguments. The arguer could be quite sure the assumption is false, but s/he makes the assumption for the sake of showing the assumption's implications.
I’ve described a simple hypothetical. Put aside your issues regarding testing, and just answer it directly. I’m not talking about testing, I’m just talking about inferences.
Before I can address that, we have foundational questions (raised above) to answer about confidence & its difference, if any, from probability. The only way I know to talk about statistical inference involves probabilities of hypotheses. Probability as inference obeys laws & can be discussed; confidence (in your as-yet undefined sense) as inference I don’t know how to discuss meaningfully.
Well, they are statements about the hypothesis. They are not statements about the probability of the hypothesis.
Well, when no data either demonstrate or falsify H, we can’t say whether H. So, if we can’t even speak of Pr(H given data), then what can we say?
As for how I would try to prevent fallacious reasoning, I would be sure to go through the precise mathematical formulation, and emphasize that validity flows from mathematical foundation.
If, as I believe you hold, statistical inference involves assigning confidence to hypotheses (or is it to people?), then does confidence get included in this “precise mathematical formulation?”
The full statement is that if 0<P(A),P(B)<1, and P(B|A)>P(B), then P(A|B)>P(A).
Good point.
That makes no sense. B supports A regardless of whether C does.
The degree of statistical support by B of A is only in relation to the degree B supports C. But I’m seeing now that this, too, depends on unanswered questions about “support.” Because I believe statistical “support,” or statistical evidence, in favor of H, is anything that increases Pr(H), I can talk meaningfully about support. But you, with your emphasis on “confidence,” may have something completely different in mind.
”Guess” refers to issues of fact, not choices in general.
Any rational choice depends on (imperfect) knowledge of issues of fact. E.g., if I have to choose whether to invest in Stock A, I will gather & use as much information as possible about whether Stock A will appreciate or depreciate. If I have no knowledge about those issues of fact, as in the choice of alpha, I am guessing, rather than choosing rationally.
”Confidence” is a general term. “Bayesian Confidence” is a type of confidence, and it does follow rules.
I ask again: Does your confidence in general follow rules? Or only bayesian confidence?
Except that that does not do what you claimed it does, nor what you are now claiming. Earlier you said it’s neutral, but now you’re saying it supports Ho.
That was a different example. Art, we need to focus on foundational issues first. What is confidence, does it obey rules, etc.
Probability is what’s studied in probability theory. Confidence is not.
You’ve said more about what confidence is not than what it is.
If I say that I trust someone, am I saying something about that other person? Or myself? Or both?
Good question; I think it leads in a potentially productive direction.

Humans experience (elements of) the world in what is often called the subject-object relationship. The subject is usually the one who acts or experiences; the object is the one that is acted upon or is experienced. Subjects & objects only exist together; without subjects there are no objects, & vice versa. Confidence, as a function within the s-o relationship, applies subjectively to humans, & objectively to hypotheses. The answer to your question is BOTH.

The s-o relationship “happens” in a variety of “aspects,” or types of properties and laws: quantitative, logical, biological, legal, ethical, certitudinal, and other aspects. Confidence, trust, certainty, reliance, etc. are all functions within the certitudinal aspect. They involve the degree of certainty subjects have in objects. Without objects, there’s nothing to be certain about. Without subjects, there’s no one to be certain.

I could go on. . .
Or if he rejects the hypothesis, and the hypothesis is correct.
True. Either way, once someone has concluded a state (true or false) of a hypothesis, they are wrong if the other state (false or true) is true about the hypothesis. So, the probability they are wrong is a probability of the other state.
No, the focus was on the specific value of .05, and that specific value is not a feature of hypothesis testing.
Well, I guess we were not communicating, because I was talking about alpha.
Or you could have a situation where P(Ho)=1 but p=.001. Such a situation will happen only .1% of the time. If you’re willing to have it happen that frequently, then you should set alpha at .001.
You didn’t comment on the more important part of what I was saying in “For instance, in many applications, p=0.03 and Pr(Ho|data)=.20, say, could be realized given the same data. Mistaking p for Pr(Ho|data) leads here to an unjustified strength of belief that Ho is false. That strength of belief could, in turn, support a decision to bet 7 to 1 against Ho, whereas the max justified bet against Ho would be 4 to 1.” Maybe that is because you doubt probabilities apply to hypotheses at all (even tho you said here “P(Ho)=1”, but perhaps just to participate in the discussion)?
It’s already impossible.
It’s impossible “to say whether someone is right for interpreting inductively a p-value (say) in such and such a way?” I’m surprised you would say that, and I wonder about the reason. Is that because you believe there are no norms for induction, or the only norm is complete freedom, or what?
 
Well, when no data either demonstrate or falsify H, we can’t say whether H. So, if we can’t even speak of Pr(H given data), then what can we say?
We can say that we are more willing to accept (or reject) Ho.

If, as I believe you hold, statistical inference involves assigning confidence to hypotheses (or is it to people?), then does confidence get included in this “precise mathematical formulation?”
No.

The degree of statistical support by B of A is only in relation to the degree B supports C. But I’m seeing now that this, too, depends on unanswered questions about “support.” Because I believe statistical “support,” or statistical evidence, in favor of H, is anything that increases Pr(H), I can talk meaningfully about support.
Don't those ideas contradict each other? Whether A increases P(B) is independent of whether C increases P(B).

Any rational choice depends on (imperfect) knowledge of issues of fact. E.g., if I have to choose whether to invest in Stock A, I will gather & use as much information as possible about whether Stock A will appreciate or depreciate. If I have no knowledge about those issues of fact, as in the choice of alpha, I am guessing, rather than choosing rationally.
Which stock to pick is a choice. The claim that one will do better than the other is a guess. Picking a stock is not a guess. It's based on guesses, but it's not a guess.

I ask again: Does your confidence in general follow rules? Or only bayesian confidence?
Confidence in general does not follow rules. Bayesian confidence, as well as other types, do.

True. Either way, once someone has concluded a state (true or false) of a hypothesis, they are wrong if the other state (false or true) is true about the hypothesis. So, the probability they are wrong is a probability of the other state.
This is a common misconception of hypothesis testing. Alpha should be calculated before the test is conducted, before any data is collected, and before any conclusion is reached. As you say, once one has reached a conclusion, it makes little sense to talk about the probability of it being true unless you are assigning that probability to the hypothesis itself.

You didn’t comment on the more important part of what I was saying in
What do you think is the important part? What position does it establish?

It’s impossible “to say whether someone is right for interpreting inductively a p-value (say) in such and such a way?” I’m surprised you would say that, and I wonder about the reason. Is that because you believe there are no norms for induction, or the only norm is complete freedom, or what?
It is inherently a subjective process, and any objectivity is illusory.
 
amhartley wrote: <<True. Either way, once someone has concluded a state (true or false) of a hypothesis, they are wrong if the other state (false or true) is true about the hypothesis. So, the probability they are wrong is a probability of the other state. >>
This is a common misconception of hypothesis testing. Alpha should be calculated before the test is conducted, before any data is collected, and before any conclusion is reached. As you say, once one has reached a conclusion, it makes little sense to talk about the probability of it being true unless you are assigning that probability to the hypothesis itself.
You stated “a one in five chance of being wrong.” If you were referring to alpha in that, you didn’t say so. I, like most scientists outside statistics, was and remain concerned about inference, i.e., “post-experimentally, what can we say about unknown states of nature, based on observed data?” Apparently, you were talking about the deductive pre-experimental probability assuming H, i.e., “pre-experimentally, assuming H, what is the chance I’ll reject H?”

My different interpretation is not a “common misconception,” but rather a persistent focus on statistics as inference.
amhartley wrote: <<It’s impossible “to say whether someone is right for interpreting inductively a p-value (say) in such and such a way?” I’m surprised you would say that, and I wonder about the reason. Is that because you believe there are no norms for induction, or the only norm is complete freedom, or what? >>
It is inherently a subjective process, and any objectivity is illusory.
Yes, that is the supreme irony in frequentist testing as “inference:” It purports to be objective (celebrating its freedom from bayesian priors) but, because its testing procedures make no statements about hypotheses, the inferences (what you interpret as statements about hypotheses) it ends with are a subjectivist human construct. Neyman-Pearson hypothesis test inferences are fabricated, rather than discovered (in N-P's defense, they explicitly excluded inference from what their testing can do).

This subjectivism is a result of the modern-to-postmodern “Copernican Revolution” within the philosophy of science, with Kant as the pivotal figure. Before Kant, scientists sought to discover truths about nature & the world, etc. Ideally, at least, humans would adjust their beliefs to what nature revealed. With Kant, science and, indeed, human experience became a product of humans’ “logical categories.” After Kant, the reversal in science was completed: human experience became whatever people make it. Instead of science helping people to learn about the world, humans are envisioned as creating that world.
 
You stated “a one in five chance of being wrong.”
It was originally Athon that brought it up, and it sure seemed like he was talking about alpha.

My different interpretation is not a “common misconception,” but rather a persistent focus on statistics as inference.
It is a common misconception that alpha is based on the data, rather than the test. Once you have completed the test, the result of the test is no longer a random variable.
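(A short simulation sketch of that pre-data reading of alpha, assuming normal data generated with Ho true; illustrative only.)

# Run the same alpha = 0.05 test on many data sets drawn with Ho: mu = 0 true.
# About 5% of them reject -- alpha describes the procedure, not any one completed test.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, sigma, reps = 25, 1.0, 100_000
se = sigma / np.sqrt(n)

xbar = rng.normal(0.0, sigma, size=(reps, n)).mean(axis=1)   # data generated under Ho
p = 2 * norm.sf(np.abs(xbar) / se)
print((p < 0.05).mean())              # close to 0.05, the long-run Type I error rate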

It purports to be objective (celebrating its freedom from bayesian priors) but, because its testing procedures make no statements about hypotheses, the inferences (what you interpret as statements about hypotheses) it ends with are a subjectivist human construct.
Yes. Which is its strength. It clearly separates the objective statements from the subjective interpretations. As opposed to Bayesian, which lumps it all together in one package.
 
