You really don't give any examples of objectively false interpretations.
Read on in my post. If that’s not what you’re looking for, then indicate what “objectively false” means, please. Also, be precise about what you take to be probability, since, I suspect, a different understanding of probability underlies what seems to be a lack of communication here.
Uhh, no. That would be begging the question. Simply because it calculates P(A|B) doesn't mean it's assuming B.
For P(A|B) to mean anything, one must assume B, however momentarily & provisionally: by definition P(A|B) = P(A & B)/P(B), a probability computed within the world where B holds. And as soon as one stops assuming B, P(A|B) is not interpretable (if you think it remains interpretable, then please specify what the interpretation is, in terms of inductive inference).
It involves Ho, and therefore is, in a sense, about Ho.
If I tell you that one in a million cats weighs more than 200 pounds, and 95% of cows weigh more than 200 pounds, and that I have an animal that weighs more than 200 pounds, have I given you any information about what animal I have?
I’ll try to shore up your language before answering the question. Best I can tell, in your example the tested hypothesis Ho would be “this animal is a cat.” The p-value then would be 1/1,000,000 (now that you have dichotomized the data & removed most of the statistical information). In this case (unlike most applications), the p-value is the same as the likelihood of the data itself (no “more extreme” part)... Are you following, & should I continue for you?
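If it helps, here’s a minimal Python sketch of why the answer to your question depends on a prior (the two likelihoods come from your example; the priors are numbers I’m adding purely for illustration): P(weight > 200 | cat) and P(weight > 200 | cow) by themselves say nothing about P(cat | weight > 200) until you assume some prior mix of cats and cows.

    # Likelihoods from the cat/cow example; the priors are assumed for
    # illustration only -- they are not part of the original example.
    def posterior_cat(prior_cat):
        p_heavy_given_cat = 1e-6   # P(weight > 200 lb | cat)
        p_heavy_given_cow = 0.95   # P(weight > 200 lb | cow)
        num = p_heavy_given_cat * prior_cat
        denom = num + p_heavy_given_cow * (1 - prior_cat)
        return num / denom         # P(cat | weight > 200 lb), by Bayes' rule

    for prior_cat in (0.5, 0.99, 0.999999):
        print(f"P(cat) = {prior_cat}: P(cat | heavy) = {posterior_cat(prior_cat):.6f}")

Even with a million-to-one prior in favor of “cat,” the posterior P(cat | heavy) only reaches about 51%, while P(heavy | cat) stays at one in a million. What animal you have depends on your prior, which is exactly what the p-value-style quantity doesn’t supply.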
So where does the disagreement lie? Do you deny that they are used to extend sample results to general populations? Do you deny that that is the proper meaning of "inference"? Or do you deny that they should be used to decide whether to extend sample results?
This issue depends on the more fundamental issue of whether P(A|B) is a statement about B.
amhartley wrote: “Just put "+michael +oakes +p-value +medical" into google or yahoo searches & read the results! None of what I've said so far is news.”
It's rather silly to expect everyone reading the thread to look through a bunch of pages trying to figure out what you're talking about, rather than just quoting, or at least giving a link.
Did you try the search? It’s quite easy to find out what I’m talking about. Maybe I could give you some pointers on internet searches? Control-C on your IBM-compatible keyboard is for copy... But since you asked, I will suggest you go to
http://www.warnercnr.colostate.edu/~anderson/thompson1.html. Not all the citations therein are saying exactly what I’m saying here, but most are at least related.
amhartley wrote: “1. Why do people almost universally misunderstand p-values, hypothesis test results & confidence intervals?”
Isn't this a loaded question?
Again, this depends on understanding that P(A|B) is the “probability of A assuming B.”
amhartley wrote: “2. Why are these tools still used, despite these shortcomings?”
Before anyone can answer that question, first you have to say what you think those shortcomings actually are.
Did you see the jingle in that restaurant commercial on TV a few years ago: “Where’s the beef (in my beef burger)?” In the case of this thread, “where’s the inference (in this statistical inference)?” The shortcomings are that statements about data given hypotheses are offered as statements about hypotheses given data.
Originally Posted by amhartley : “The null hypothesis (in this case, IIRC) of no difference is almost never correct. So what’s the purpose of the experiment? IIRC is false a priori.”
What does IIRC stand for?
See the post by Grundar on the first page of this thread.
Where did this value of 0.05 come from? It seems quite arbitrary.
Where did the name "amhartley" come from? It seems quite arbitrary.
Just a regular guy here. What can I say?
amhartley wrote: “However, the p-value (as well as the alpha you are actually talking about) is a statement about data assuming the hypothesis IIRC. It’s not a statement about any hypothesis itself.”
How can you possibly say something about P(A|B) without saying something about B?
All we say about B is that we’re assuming it. There’s a neat analogy between interpreting P(Data | hypothesis) inferentially & “affirming the consequent” if you want me to explain it. . .
amhartley wrote: “Being new to JREF, I would have thought that the people here take great pride in avoiding standard assumptions.”
Blindly ignoring assumptions is just as silly as blindly following them. Also, the word "assumption" is used in statistics (and math in general) in a manner that is different from its normal use.
Please explain.
Originally Posted by amhartley : “The problem here is that any choice of alpha, beyond a WAG (wild-a**ed guess) like Fisher's,”
Uh... what do you mean by "WAG"?
Keep filling in the stars with letters until you come up with something.
amhartley wrote: “But neither Fisher nor Neyman-Pearson (who at least had the guts to deal with type II errors) specified a way to bring costs into their testing paradigms. To do so would have required making statements about hypotheses (and that, as I said in my first post here, is something frequentism doesn't do). “
If we already knew what confidence to assign to the hypothesis, we wouldn't need statistical tests to begin with.
Now I’m confused. Your first quip asked for objective examples, as if anything about stats or probability could be completely objective. But now you’re talking about confidence. Do you mean objective confidence (if there is such a thing) or subjective confidence?
amhartley wrote: “Food for thot: e.g., in the point-null testing situation, the probability of the tested hypothesis H could be >50% even when p<0.05.”
H is either true or not. It's not a random variable, and doesn't have a probability.
That is a popular impression. However, above you wrote about assigning confidence to a hypothesis. How is confidence different from probability? (Of course, you could deny that confidence in a H has anything to do with statistics, but if that’s true then what’s the point of statistical inference?)
amhartley wrote: “Plus, data producing p<0.05 may actually constitute evidence in favor of H.”
You might want to pick up Richard Royall’s 1986 paper in The American Statistician, “The Effect of Sample Size on the Meaning of Significance Tests.” One of its claims: any given p can be associated with any amount of evidence against Ho. See the example below.
amhartley wrote: “It may “increase our confidence” but it may not, and it may increase our confidence for or against H, depending on such things as power.”
That would show that the rejection region is not properly constructed. It is therefore an issue of the implementation of statistical methods, rather than the methods themselves.
First, keep in mind that the question of “confidence” in stats now seems to be subject to debate, which makes me wonder why you say a change in our confidence would “show” anything. Nonetheless, here’s an example: symmetric likelihoods, Ho says mu=0 and H1 says mu=1, and mu-hat = xbar = 0.5, giving p=0.01. But xbar=0.5 sits exactly midway between the two means, so the likelihood ratio is 1: the evidence for H1 vis-à-vis Ho is neutral. No change in confidence.
I admit, you could say the rejection region is badly formed, i.e., “too much power.” However, this setup follows all the rules set forth by Fisher, Neyman and Pearson. If you can fix their methods by adding thumb-rules or whatever, be my guest. People have been trying that, and failing, for half a century, so if you succeeded, you’d be famous. I’ll even buy your book.
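Here’s a small Python sketch of that example (sigma = 1 and n = 22 are my assumptions, chosen just so the one-sided p comes out near 0.01):

    from math import sqrt
    from scipy.stats import norm

    sigma, n, xbar = 1.0, 22, 0.5
    z = xbar * sqrt(n) / sigma        # test statistic under Ho: mu = 0
    p = 1 - norm.cdf(z)               # one-sided p-value, about 0.01

    # Likelihood ratio of H1 (mu = 1) to Ho (mu = 0) at the observed xbar.
    # xbar ~ Normal(mu, sigma^2/n), and 0.5 is equidistant from both means,
    # so the ratio is exactly 1: neutral evidence.
    se = sigma / sqrt(n)
    lr = norm.pdf(xbar, 1.0, se) / norm.pdf(xbar, 0.0, se)

    print(f"p = {p:.4f}, likelihood ratio H1:Ho = {lr:.4f}")

The test rejects Ho at p = 0.01 while the likelihoods leave the two hypotheses exactly tied.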
amhartley wrote: “Putting a stake in the ground: I think most people in JREF would agree that we should expunge science of as much arbitrariness as possible, and found our findings on solid principles whenever we can.”
While getting rid of arbitrariness is a good general principle, arbitrariness isn't always a bad thing, and it certainly isn't always the most important issue.
The arbitrariness points to how we’re using the wrong measures for evidence & (if I can use the word) probability. You want examples?
Q: “How long will it take Cousin Jeb to drive to Lake Wobegon?”
A: “As long as it takes Uncle George to finish the haying.”
That is, you could certainly say the time haying measures the time to arrive at the lake. But I wouldn’t count on the gasoline in the tank lasting long enough just because George happens to be hanging up the pitchfork. And saying “5 minutes longer than the haying” or “twice as long as the haying” or anything of the sort won’t fix it either, because haying & driving to the lake have little (discernible) relation. These proposed measures of driving time are arbitrary in the sense that knowing the haying time tells you little about the driving time. Maybe rain would slow down both Jeb & George, so there’s a correlation between their times... And so it is with p-values and evidence for or against hypotheses.
amhartley wrote: “Otherwise, science becomes a tool for the powerful & influential, i.e., WHOSE “commonsense” will we rely on?”
If it is arbitrary, then it, by definition, doesn't matter.
Same end result as if levels of confidence in hypotheses have no place in statistical inference. What’s the point?
amhartley wrote: “Are you saying that the average Randi member, obviously interested in questioning the assumptions of the paranormal etc, would not be willing to examine their own assumptions?”
I take it that by "examine", you mean "see if they're good", or something like that? If they can be objectively evaluated, then they are, again, by definition, not arbitrary.
Sorry, I don’t follow you.
Originally Posted by amhartley : ““a one in five chance of being wrong:” Can you explain this more? Are you saying that, if I conclude H is false, my chances of being wrong are 20%?”
Apparently, that is indeed what Athon meant. However, it is incorrect. The correct statement is "If H is true, the chances of wrongly rejecting it are 1 in 20".
Correct. But above you said a hypothesis is either right or wrong, and cannot have a probability. So the hypothesis of “being wrong” is either wrong or right, and cannot have a probability. C’mon, does probability apply to hypotheses maybe only intermittently? Or just when Art says it does?
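A little simulation may make the distinction concrete (the 90% base rate of true nulls and 80% power are numbers I’m assuming for illustration, not anything from the thread): alpha fixes P(reject | Ho true), but “the chance of being wrong when we reject” is P(Ho true | reject), and that depends on how often nulls are true to begin with.

    import random
    random.seed(0)

    trials, alpha, power = 100_000, 0.05, 0.8
    share_true_nulls = 0.9            # assumed base rate of true nulls

    false_rej = true_rej = 0
    for _ in range(trials):
        if random.random() < share_true_nulls:       # Ho is true
            false_rej += random.random() < alpha     # type I error
        else:                                        # Ho is false
            true_rej += random.random() < power      # correct rejection

    print("P(Ho true | rejected) =", false_rej / (false_rej + true_rej))

With these assumed numbers, roughly 36% of rejections are wrong, nothing like 1 in 20.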
amhartley wrote: ““it is up to you to determine whether my results are useful:” I don’t think you mean to say that any person has complete freedom to ignore statistical results.”
Anyone setting up a test has the freedom to choose any rejection region they want. That's not "ignoring" the results.
Please look again at the context. Rejection regions were not being discussed in that way.
amhartley wrote: “If, as you say, "Statistics is merely a tool," then would you conclude stats is not a science? I.e., it's useful for behavior & decisionmaking, but not increasing knowledge?”
It's a branch of mathematics used in science, and therefore is useful in increasing knowledge.
If I pick my nose while working my calculator, is that useful too? I’m trying to give you the benefit of the doubt that you want to contribute here, but that hypothesis is starting to present problems.
Originally Posted by amhartley : “As I have said 2 times above, the “number” we pick (and I think you are referring to alpha?) has no correspondence to the “level of confidence” we can place in this or that hypothesis.”
No, alpha does correspond to how much confidence we have in rejecting the null hypothesis.
First, please explain how it’s meaningful to talk about “confidence” and still deny that hypotheses can have probabilities. Once you’ve explained that, maybe you could explain in what way “alpha does correspond to how much confidence we have in rejecting the null hypothesis.” The language is not precise enough to understand.
Originally Posted by amhartley : “Athon, as I mentioned to Blutoski, p<0.05 can constitute evidence FOR the tested hypothesis H, not just evidence against it.”
You are being a bit inaccurate in switching from "data that gives p<.05" to "p<.05".
Sorry; however, does it interfere with getting my point across? At least within that post I didn’t use “data that gives.” I guess you got me there. Score one for Art.
amhartley wrote: “Plus, it is possible that Prob(H given data)>50% even though p<0.05. Therefore, p has nothing to do with levels of confidence in hypotheses. Maybe my post referring to bayesian probabilities will clear this up for you.”
Besides the issue of assigning a probability to H that I discussed earlier, it is a fallacy to say that, because there are cases in which one thing is true and the other is not, it somehow follows that the two are not linked.
Let me serve up the baked beans for Jeb. George just put up his pitchfork! After all, their times are linked.
Originally Posted by amhartley : “I work with professional statisticians all the time who think p-values are statements about the tested hypotheses.”
Well, George is taking a bit longer. . .maybe he got a flat tire.
amhartley wrote: “Most often, statisticians follow RA Fisher's pattern of interpretation, claiming that p measures the evidence against H,”
Then again, Jeb only did 75% of the field today, ‘cuz it’s starting to hail out there.
amhartley wrote: “But p-values & alpha don't do that.”
But they are used to do so.
But my statistician friend told me haying time measured driving time. Hey, what’s a statistician, anyway?
Originally Posted by amhartley : “There are many papers & books you could read about this. Berger & Sellke had a paper in 1987 (in The American Statistician) showing the disparity, in the point-null testing situation, between p-values & the post-experimental prob of the tested hypothesis H.”
All my papers burned up in the barn when George lit up his pipe after finishing haying, and then he passed out from hunger ‘cuz we couldn’t start dinner ‘cuz Jeb took too long to get up to the Lake. Seriously tho, I’ll get my papers & books in about 3 more weeks.
amhartley: “This invalidates the standard guidance, followed by statisticians as well as medical types, to consider p<0.05 as “moderate evidence against H.””
Again, that's a fallacy. A single counterexample can't contradict a claim of correlation.
Correlated in what way? Be precise. We wouldn’t want George to faint again. He’s depending on your “correlations” here.
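For anyone who wants to see the Berger & Sellke disparity numerically, here is a rough Python sketch (the 50/50 prior on Ho and the Normal(0, sigma^2) prior on mu under H1 are inputs I’m assuming; they are in the spirit of, not identical to, the paper’s setup):

    from math import sqrt
    from scipy.stats import norm

    sigma = 1.0
    for n in (10, 100, 1000):
        xbar = 1.96 * sigma / sqrt(n)   # data sitting right at two-sided p = 0.05
        m0 = norm.pdf(xbar, 0.0, sigma / sqrt(n))                # marginal under Ho: mu = 0
        m1 = norm.pdf(xbar, 0.0, sqrt(sigma**2 + sigma**2 / n))  # marginal under H1
        post_h0 = m0 / (m0 + m1)        # posterior P(Ho | data) with prior odds 1:1
        print(f"n = {n}: P(Ho | data) = {post_h0:.2f}")

At p = 0.05 the posterior probability of Ho comes out around 0.4 to 0.8 here, growing with n. “Moderate evidence against Ho” it is not.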
Originally Posted by amhartley : “But with respect to continuous properties, I am claiming that no 2 entities are equal. Any person's 2 eyes, for instance, will differ from each other in diameter, tho perhaps only by a few micrometers.”
So do electrons in the Northern Hemisphere have rest masses different from those in the Southern Hemisphere?
Is the mass of an electron a continuous property?
Originally Posted by amhartley : “I would just respond that statistics doesn't have to be that way. It is that way is because the standard statistical results don't answer the important scientific questions. To get important meaning from those results, people have to invent that meaning.”
So how can it be different? What questions need to be answered? What new meanings can be invented?
In “how can it be different?”, are you saying that meaning is whatever we invent? That’s farther out, more postmodern, than even Kant maintained with his logical categories.
Originally Posted by amhartley : “nor is the method decided upon by popular decree (as with alpha = 0.05).”
.05 isn't decided upon by popular decree, and it's a parameter, not a method. Furthermore, if you use Bayesian confidence, you still have to decide what level of confidence you're willing to be satisfied with. You'll never get a confidence of 100%, so you have to decide what's "good enough". That's just as arbitrary as alpha, and it's in addition to having to decide on priors.
How, then, is 0.05 decided? According to what I’ve read, Fisher first proposed 1/20 as a thumb-rule. It has since morphed into a standard on which the futures of entire companies hinge in, e.g., the pharmaceutical industry. The justification now for 0.05 is that it provides a common standard. Blutoski seems to have mentioned this in this thread: “One of the other costs associated with changing analysis, is that it becomes very difficult to integrate new findings into the base of knowledge, if we can't tell if study results are comparable. It adds an element of doubt, and makes either the past or recent experiments wasted effort.” But I don’t know for sure why he brought up the topic.
Further, alpha is not a parameter; it’s a feature of a method called hypothesis testing. In statistics, a parameter is unknown. We know alpha.
Originally Posted by amhartley : “That doesn’t take away from the fact, though, that choosing alpha is an arbitrary exercise.”
No more than choosing where to set the thermostat.
I set the thermostat so that I’m not too hot or cold. By analogy, should I feel uncomfortable if alpha is too big or small? On what grounds?
Request: Art, I fear such lengthy posts will prevent us from concentrating on anything sufficiently. Can we please try to focus?