
Statistical scandals

Until everyone is well enough versed in maths to really understand stats, there will be problems of all sorts. As an example, how many biologists appeal to the normal (Gaussian) distribution like zombies? Many. How many actually know where it comes from, and therefore when it applies? Hardly any.

As I've already said several times in this thread, even professional statisticians don't understand that confidence intervals, p-values, alpha etc. are not inferential. And this situation persists despite decades of lectures & hundreds of papers making the same points I've been making here. I'm convinced it's a deeper problem than a lack of education.
 
amhartley wrote <<What does picking a representative sample have to do with my question as to whether chance exists?>>
1. What I meant is that if you repeated the measurement you would not get exactly the same results, so chance affects your data. A quick analogy: imagine that you throw a six-sided die 1000 times and record all the results. If you want to know the average result but don't want to add up all the numbers, you could randomly pick 30 of them and calculate a mean; let's say you get 3.43. If you again pick 30 numbers at random you might get 3.56. If you do this enough times you will eventually get values like 2.1 or 4.9. You won't get them often, but sometimes you will, and that's where chance comes in.
. . ./Hans
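
To make the dice analogy concrete, here is a minimal sketch of the resampling Hans describes (Python with NumPy is assumed; the specific numbers are only illustrative):

[code]
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1000)        # 1000 throws of a fair six-sided die

# Repeatedly pick 30 of the recorded throws at random and compute the mean.
sample_means = [rng.choice(rolls, size=30, replace=False).mean()
                for _ in range(10_000)]

print(f"mean of all 1000 throws : {rolls.mean():.2f}")
print(f"smallest 30-throw mean  : {min(sample_means):.2f}")
print(f"largest 30-throw mean   : {max(sample_means):.2f}")
[/code]

Most of the 30-throw means cluster near 3.5, but the extremes wander well away from it, which is the chance variation Hans is pointing at.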

Hans,
What do you mean by chance? Lonergan’s review of the use of this word revealed that most people mean by a chance event one that occurs for no reason.
 
And you might, by dumb luck, pick people who were all exactly six feet tall within the tolerance of your measuring device. So you can't guarantee the means will be different.

The only reason I could pick any person 'exactly 6 ft tall' would be because of rounding error. Nobody is exactly 6 ft tall. We seem to agree on that.

So I agree that the experiment (with the imperfect tolerances of my devices) might yield exactly the same observed means for the 2 cities. But the point of the experiment is to say something about the actual (theoretical) mean heights of the 2 cities. These 2 means will be different. The hypothesis of no difference is a priori false.
 
But the point of the experiment is to say something about the actual (theoretical) mean heights of the 2 cities. These 2 means will be different.

I still don't see why that would be the case.

You seem to be implying that for any property, p, measured for any set of entities, each p must be unique, at least theoretically, if not experimentally. I don't agree with this statement. Where am I going wrong?
 
Hans,
What do you mean by chance? Lonergan’s review of the use of this word revealed that most people mean by a chance event one that occurs for no reason.

For me, something happens by chance when something unlikely happens, like when I met my cousins in Denmark (I live in Sweden). For me that was a chance meeting. When talking about statistics, I'm using chance in the sense of a random event. As in: if you pick 5 cards from a deck, there is a chance you get a straight. Or, as in the height measuring mentioned earlier, there is a chance you pick slightly taller (or shorter) than average people to measure.

/Hans
 
The simple cause of all this is that statistics is something that seems essential to the biological sciences, so is used heavily there, yet is mathematically well beyond the vast majority of biologists.

Until everyone is well enough versed in maths to really understand stats, there will be problems of all sorts. As an example, how many biologists appeal to the normal (Gaussian) distribution like zombies? Many. How many actually know where it comes from, and therefore when it applies? Hardly any.

This is why when biologists do experiments, they consult a mathematician who specializes in biostats. For example, our research center shares a biostatistician with another CFE.

After that, the papers are submitted for peer-review, and the stats are examined by another pair of statisticians' eyes.
 
Imagine if I tried to measure barometric pressure with a thermometer. I might come up with statements like “if the temp is above 40 degrees C, we can consider pressure to be high. Otherwise, it’s low.” That would be a WAG. But a WAG is all one can develop for measuring pressure using a thermometer, because a thermometer is a tool made for measuring temperature, not pressure. Similarly, WAGs are all one can come up with for measuring evidence (as do statisticians) using p-values.

Well, see, this is the first concrete example, and it's 100% supporting my previous statement that statistics are subordinate to experimental design. As I exemplified, I've moved forward with experiments where p=.1, and rejected experiments where p=.01, because of design quality.

If the experimenter is using the wrong instrument, why are we even talking about statistics? How is this an example of a statistical issue?
 
The root problem is that p is a statement about data given H, not an inferential statement. To consider it inferential, as do most stat textbooks, statisticians and consumers of statistics, is a mistake.
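
For what it's worth, here is a minimal sketch of what "a statement about data given H" looks like in practice: the p-value is computed entirely inside the world where H is assumed true (Python with NumPy; the coin-tossing numbers are made up for illustration):

[code]
import numpy as np

# H0: the coin is fair. Observed data: 60 heads in 100 tosses (made up).
# The p-value is the probability, *assuming H0*, of data at least as
# extreme as what was observed - so it is simulated under H0.
rng = np.random.default_rng(1)
observed_heads = 60
sims = rng.binomial(n=100, p=0.5, size=100_000)   # 100,000 fair-coin experiments
p_value = np.mean(np.abs(sims - 50) >= abs(observed_heads - 50))
print("two-sided p-value (simulated under H0):", round(p_value, 3))
[/code]

Nothing in the calculation ever assigns a probability to H0 itself; every simulated draw comes from the fair-coin world.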


Yeah, see, still, I'm not grokking the meaning of this. In day-to-day operations, I guess I'm a consumer of statistics, and I don't think I've met a single person who has ever used "p value" and "inferential" in the same sentence.

p value is used to keep screwups down to a dull roar.
 
No, p-values are not inferential.
They state the probability of incorrectly rejecting the Null hypothesis, given the data. What they do is provide you with a measure of strength for a point estimate you have derived, which is inferential.
The value of alpha is irrelevant, except when doing power calculations. The use of p<0.05 as significant / non-significant is rare these days. It's generally meant as a statement about how strong the evidence is against the Null.
It 'slightly' comes into play when constructing confidence intervals for point estimates. Then 95% is indeed arbitrary, but the whole reason we do it, as Martin Bland has argued elsewhere, is that C.I.s aid interpretation for clinicians, rather than just a p-value. You can create confidence intervals of any size you like.
The reason you appeal to the Gaussian / Normal distribution is the Central Limit Theorem...
Bayesian stuff is OK, but even then the arbitrariness creeps in, with 95% 'credibility intervals'. I happen to find these even more pernicious than confidence intervals, as depending on the way your data are structured, prior choice can radically alter your inference. e.g. see here
The alternative is a likelihood-based approach. This is favoured by people like Jeff Blume at Brown, who has a paper called
"What your statistician never told you about P-values."
Whilst perhaps attractive, it suffers from 2 problems:
firstly, the level of arbitrariness still exists (likelihood ratio greater than 8 = evidence);
secondly, it attaches as much significance to Type II as to Type I errors, which is perhaps not appropriate.
Moreover, in said paper he argues that C.I.s are more informative than just p-values, as we all know....
At the end of the day, if clinicians understood stats, I'd be out of a job, so it is my duty to confuse scientists so I am always needed
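
A rough sketch of two of CB's points, the Central Limit Theorem and the arbitrariness of the 95% level, assuming Python with NumPy and an exponential variable chosen purely as a stand-in for skewed data:

[code]
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 20_000

# Central Limit Theorem: means of a strongly skewed variable (exponential
# waiting times) behave roughly like a Gaussian with sd sigma/sqrt(n).
mu, sigma = 2.0, 2.0                              # mean and sd of Exp(scale=2.0)
means = rng.exponential(scale=2.0, size=(reps, n)).mean(axis=1)
print("simulated sd of the sample mean:", round(means.std(ddof=1), 3))
print("CLT prediction sigma/sqrt(n)   :", round(sigma / np.sqrt(n), 3))

# Confidence intervals of any size you like, from a single sample.
x = rng.exponential(scale=2.0, size=n)
for z, level in [(1.645, "90%"), (1.960, "95%"), (2.576, "99%")]:
    half = z * x.std(ddof=1) / np.sqrt(n)
    print(level, "CI for the mean:", (round(x.mean() - half, 2),
                                      round(x.mean() + half, 2)))
[/code]

Nothing singles out the 95% interval; the 90% and 99% intervals are built the same way with a different multiplier.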
 
I still don't see why that would be the case.

You seem to be implying that for any property, p, measured for any set of entities, each p must be unique, at least theoretically, if not experimentally. I don't agree with this statement. Where am I going wrong?

Looking back, I did speak of "any property." I apologize; I should have said "any continuous property." E.g., certain properties, such as the number of fingers, may be equal for different people. Those properties take discrete values. Other, logical properties, e.g. self-identity (the fact that "I am me"), are equally true for different people.

But with respect to continuous properties, I am claiming that no 2 entities are equal. Any person's 2 eyes, for instance, will differ from each other in diameter, though perhaps only by a few micrometers.

Just try, if you will, to construct 2 metal bars that are exactly the same mass. Even if they are so close to each other that every instrument says they're the same, their mass will differ at some level, some "further" decimal place.
 
For me, something happens by chance when something unlikely happens, like when I met my cousins in Denmark (I live in Sweden). For me that was a chance meeting. When talking about statistics, I'm using chance in the sense of a random event. As in: if you pick 5 cards from a deck, there is a chance you get a straight. Or, as in the height measuring mentioned earlier, there is a chance you pick slightly taller (or shorter) than average people to measure.

/Hans

Hans, I've responded to you in a separate thread "nature of chance."
 
amhartley wrote: Of course, with our imperfect measuring equipment, we might get, say, the same (rounded) height from 2 people. But that's just because we can only measure things to a limited exactitude.

And you might, by dumb luck, pick people who were all exactly six feet tall within the tolerance of your measuring device. So you can't guarantee the means will be different.

James, maybe you are speaking about the sample (observed) mean? I'm speaking about the theoretical mean. Any statistical hypothesis (to which my claim "the null hypothesis is a priori false" refers) is about a theoretical measure, not an observed one.
 
This is why when biologists do experiments, they consult a mathematician who specializes in biostats. For example, our research center shares a biostatistician with another CFE.

After that, the papers are submitted for peer-review, and the stats are examined by another pair of statisticians' eyes.

Too bad the statisticians are usually as clueless on p-values as are the biologists.

But then again I've not met your particular biostatistician. Ask him or her for me, would you, whether s/he thinks a p-value is part of inferential statistics, and why. And make sure Blutoski (oh, that's you, isn't it?) hears about it. 'Cause in a post below Blutoski said 'I don't think I've met a single person who has ever used "p value" and "inferential" in the same sentence.' Just about any textbook introduces p-values and hypothesis tests as "inferential" statistical methods.
 
Amhartley wrote: <<Imagine if I tried to measure barometric pressure with a thermometer. I might come up with statements like “if the temp is above 40 degrees C, we can consider pressure to be high. Otherwise, it’s low.” That would be a WAG. But a WAG is all one can develop for measuring pressure using a thermometer, because a thermometer is a tool made for measuring temperature, not pressure. Similarly, WAGs are all one can come up with for measuring evidence (as do statisticians) using p-values.>>
Well, see, this is the first concrete example, and it's 100% supporting my previous statement that statistics are subordinate to experimental design. As I exemplified, I've moved forward with experiments where p=.1, and rejected experiments where p=.01, because of design quality.

If the experimenter is using the wrong instrument, why are we even talking about statistics? How is this an example of a statistical issue?

Blutoski,
Look back if you will at the context of those 2 quotes. My point was that people try to measure evidence and confidence using p-values and hypothesis test results. They are using the wrong instruments. Doing so is as useless as trying to measure barometric pressure with a thermometer. Another example is a bank loan officer prejudging a potential borrower on the basis of the borrower’s clothing. Manner of dress is certainly correlated with the ability to repay a loan, but it’s a very poor way to measure that ability. Similarly, a p-value is a very poor way to measure evidence or confidence.

I can agree that, if p-values must be used, your art of combining them with other factors (e.g., study design quality) should definitely be standard practice. However, you have not shown how that can be done in any sensible manner.
 
No, p-values are not inferential.
They state the probability of incorrectly rejecting the Null hypothesis, given the data. What they do is provide you with a measure of strength for a point estimate you have derived, which is inferential.
The value of alpha is irrelevant, except when doing power calculations. The use of p<0.05 as significant / non-significant is rare these days. It's generally meant as a statement about how strong the evidence is against the Null.
. . .

CB,
Thanks; that’s a refreshing post. However, do you really mean that a p-value “states the probability of incorrectly rejecting the Null hypothesis, given the data?” “Given the data” means “once the data are observed,” correct? If so, then once the data are observed, how is probability involved at all? Will we, say, flip a coin after doing our experiment, to decide whether to reject the hypothesis? I don’t understand.

Also, can you please go into more detail about how p-values provide “a measure of strength for a point estimate you have derived”? I don’t get that.

As for p conveying a strength of evidence: Since you are aware of Blume & the Likelihood Paradigm for measuring evidence, are you also aware that p can be associated with any likelihood ratio at all (even after both Ho and H1 are fixed)? So, either p doesn’t measure evidence, or the LR doesn’t measure evidence, or both. You can't measure evidence both ways.
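
As a sketch of that point (normal mean with known sigma = 1; the alternative delta = 0.5 and the sample sizes below are assumptions made purely for illustration): holding the p-value fixed, the likelihood ratio between two fixed simple hypotheses can land almost anywhere, depending on the sample size.

[code]
import numpy as np

# One-sided z-test of H0: mu = 0 against the simple alternative H1: mu = 0.5,
# sigma = 1 known. Fix z = 1.96 (p roughly 0.025 one-sided) and watch the
# likelihood ratio L(H1)/L(H0) swing as n changes.
delta, z = 0.5, 1.96
for n in (4, 25, 100, 400):
    xbar = z / np.sqrt(n)                       # data giving exactly z = 1.96
    log_lr = n * (xbar * delta - delta**2 / 2)  # normal log-likelihood ratio
    print(f"n = {n:4d}   xbar = {xbar:5.3f}   LR(H1 : H0) = {np.exp(log_lr):10.3g}")
[/code]

With the small samples the same p favours H1; with the large ones it overwhelmingly favours H0.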

Those thoughts on CIs & likelihood ratios are definitely worth pursuing; however, I’m trying to keep this thread from spinning out of control. If you want to start other posts about them, I’d love to join you. . .

At the end of the day, if clinicians understood stats, i'd be out of a job, so it is my duty to confuse scientists so i am always needed
CB, you set me a-dreamin’! I would suggest that if clinicians understood [basic, at least] stats, then you & I as statisticians wouldn’t be unemployed at all. We’d be free to move on to more important things than calculating p-values. E.g., we’d be capturing pre-experimental information in priors, combining them with likelihood functions to produce posterior distributions, and (with the addition of utility functions) comparing expected utilities of different decisions. That would be a blast. Don’t worry about a thing; there would be plenty of work for all of us.
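
For concreteness, here is a toy version of that workflow (a conjugate Beta-Binomial model; the prior, data, and utility numbers are entirely made up for illustration, assuming Python with NumPy):

[code]
import numpy as np

# Prior belief about a success probability, updated with binomial data.
prior_a, prior_b = 3, 7                 # prior roughly centred on 0.3
successes, failures = 18, 22            # hypothetical experimental results
post_a, post_b = prior_a + successes, prior_b + failures
print("posterior mean:", round(post_a / (post_a + post_b), 3))

# Expected utility of two hypothetical decisions, averaged over the posterior.
rng = np.random.default_rng(3)
theta = rng.beta(post_a, post_b, size=100_000)
print("E[utility | act]       :", round((10 * theta - 4).mean(), 2))  # made-up utility
print("E[utility | do nothing]:", 0.0)
[/code]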
 
I can agree that, if p-values must be used, your art of combining them with other factors (e.g., study design quality) should definitely be standard practice. However, you have not shown how that can be done in any sensible manner.


Well, I wasn't asked, and anyway 'a reasonable manner' is too vague to come up with a response.

What's reasonable is more or less decided by the person doing the critique, right? No two experiments are the same, so we have to use judgement.
 
Look back if you will at the context of those 2 quotes.

Ah. I see what you're saying now. The analytic approach was wrong, not the experimental protocol.



My point was that people try to measure evidence and confidence using p-values and hypothesis test results. They are using the wrong instruments. Doing so is as useless as trying to measure barometric pressure with a thermometer. Another example is a bank loan officer prejudging a potential borrower on the basis of the borrower’s clothing. Manner of dress is certainly correlated with the ability to repay a loan, but it’s a very poor way to measure that ability. Similarly, a p-value is a very poor way to measure evidence or confidence.

Aaaand I disagree, right? That's why I missed the analogy, I think. IMO, p-value is perfectly appropriate, given that the purpose is answering the question: 'are these results just chance, or what?'


Maybe we could switch from p-values to other techniques, but that doesn't make the choice less arbitrary. Any analysis technique would reflect a prioritization of one value over another, which just emphasizes that research is a human enterprise.
 
IMO, p-value is perfectly appropriate, given that the purpose is answering the question: 'are these results just chance, or what?'
I don’t think (based on previous posts) you are saying the p-value can tell us whether “these results are just chance,” although a few scientists & other statisticians would make that error. I will take you to mean, therefore, that the p-value can measure the probability “these results are just chance.” Now, the results are due to chance if and only if the tested (null) hypothesis H is true. So, we would have p-value = Prob (H given data). If that’s not what you’re saying, skip the next 2 paras.

But now we are back to a misunderstanding I’ve addressed 3 or 4 times in this thread already: The p-value is a probability that assumes H; how then could it be a probability about H itself?

I’ll try to state it another way. Reasoning as if
p-value = Prob (H given data)
is like saying “I’m going to prove proposition P. Step 1: Assume P.” My point here is that you can’t derive a probability about H once you have assumed H. Once you have assumed H, Prob(H assuming H) would always be 100%.

However, once you’ve assumed H and you see data that absolutely could not arise if H were true, then H has to be false; that’s falsificationism (aka contrapositive syllogism; disproof by contradiction). I know people who try to justify p-values as an “analogy” or whatnot to falsificationism; is that what you’re talking about?
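
To put a number on the gap between the two quantities, here is a toy simulation (all of its ingredients, the 50/50 prior on H, the effect of 0.5 when H is false, n = 25, sigma = 1, are assumptions for illustration; Python with NumPy and SciPy):

[code]
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps = 25, 200_000
h_true = rng.random(reps) < 0.5                  # H (no effect) true half the time
mu = np.where(h_true, 0.0, 0.5)                  # effect of 0.5 when H is false
xbar = rng.normal(loc=mu, scale=1 / np.sqrt(n))  # observed sample means
z = xbar * np.sqrt(n)
p = 2 * stats.norm.sf(np.abs(z))                 # two-sided p-values

near_005 = (p > 0.045) & (p < 0.055)             # experiments landing near p = 0.05
print("experiments with p near 0.05:", int(near_005.sum()))
print("fraction of those where H is true:", round(h_true[near_005].mean(), 2))
[/code]

In this toy world, roughly a quarter of the experiments that report p close to 0.05 come from a true null, which is one way of seeing that p cannot be read as Prob(H given data).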
 
