
(Another) random / coin tossing thread

My understanding is that frequentists deny that you can assign probabilities to distributions that you can't sample from.

OK, I can see where that might come from (although I don't agree with it).

As a frequentist doesn't have this initial P(unbiased), they will:
1) assign different weights to the possible outcomes compared to a Bayesian using informative priors (see the sketch below).
2) not look to generate an updated probability, as they believe the idea to be meaningless. They just want to know if it is reasonable to continue holding their initial assumptions.
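
A small sketch of point 1, with a toy grid of candidate values of p = P(heads) and two made-up priors (none of these numbers come from the thread; they are only there to show how the weights differ):

    # Toy illustration: the same data weighted through an informative prior
    # versus a flat prior over a few candidate values of p = P(heads).
    def posterior(heads, tails, prior):
        like = {p: p**heads * (1 - p)**tails for p in prior}   # binomial likelihood (up to a constant)
        norm = sum(like[p] * prior[p] for p in prior)
        return {p: round(like[p] * prior[p] / norm, 3) for p in prior}

    grid = [0.3, 0.5, 0.7]
    flat        = {p: 1 / 3 for p in grid}            # uninformative prior
    informative = {0.3: 0.1, 0.5: 0.8, 0.7: 0.1}      # strongly expects a fair coin

    print(posterior(heads=62, tails=38, prior=flat))         # most weight on p = 0.7
    print(posterior(heads=62, tails=38, prior=informative))  # weight pulled back toward p = 0.5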

OK - but don't they need to know numerically how unreasonable it would be to continue holding their assumptions? And once they've computed that, what meaning do they assign that number? How do they reconcile it with your above statement of their position?

And furthermore, suppose we allow for the (rather reasonable) possibility that the coin is only partially biased. In other words, not that it has heads on both sides, but just that it's weighted a little, and so more likely to come up heads. That weight is a continuous variable - how can I assign probabilities of either 0 or 1 to it? And how can I decide which part of the range is ruled out, and with what confidence, if I can't use some version of Bayes' theorem (with or without all priors set equal)?
 
What's the difference between saying the prob the coin is biased is X%, and setting up a cutoff and saying you're past that cutoff? The second simply contains less information than the first (as far as I can tell).

Technically speaking, a statement "I reject the null hypothesis" contains no probability information whatsoever. It's a pure proposition that describes the result of an experiment. If you want your conclusions to have a probabilistic form, then you will need to use Bayes' theorem. Frequentist statistics are functions from observations to propositions. Bayesian statements are functions from (observations and probability distributions) to probability distributions.

But most people don't want their conclusions to be probabilistic, and don't feel comfortable "priming the pump" with unsupported prior distributions. And even with a Bayesian statement, you still need to get to propositional content at some point. Somehow, you need to get from the statement "there is a 75% chance that the coin is biased" to "he is a cheating bastard; break his thumbs." Where do you draw the cutoff?

EDIT - the stuff about patterns in the flips is a red herring (I think). Neither a Bayesian nor a frequentist would notice them if she didn't look. I think we should focus just on the total (heads-tails).

Shrug. I wasn't the one who brought it up first.
 
Translating back to the coin case, if you're trying to decide how likely a coin is to be fair, by tossing it n times and observing the resulting sequence, you need to know (1) the prior probability that it is fair, (2) the probability, supposing it's fair, of getting that sequence, and (3) the probability, supposing it's not fair, of getting that sequence.

And that's exactly what you cannot have.

How many different ways are there of being unfair? There are, quite literally, infinitely many ways. More than that, there are uncountably infinitely many ways of being unfair -- in fact, there are at least the power set of uncountably infinitely many ways of being unfair.

That's too many to assign a probability distribution over. There is literally no way to quantify "not fair" in all possible ways (you run out of numbers first), so Bayesian statistics cannot look for all patterns at once.
 
No. In fact, it's a domain error to try to calculate that as a numeric term.

Interesting. So frequentists are like ostriches...

As far as I can see this kind of reasoning is totally useless in the physical sciences where there is a continuum of possible models, and one needs to define confidence intervals.
 
OK - but don't they need to know numerically how unreasonable it would be to continue holding their assumptions? And once they've computed that, what meaning do they assign that number? How do they reconcile it with your above statement of their position?

They only calculate P(S|unbiased), which tells you the probability of obtaining a sample containing the same information assuming your model is correct.
If this probability falls below an arbitrary threshold, they decide to reject the null hypothesis and move on to the next one, in which case the number isn't useful.
If the probability is above the threshold they keep the hypothesis and move on to the next data set/test, in which case the number still isn't useful as they can't talk about the probability of a hypothesis being correct.
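
As a rough sketch of that procedure (the 62-heads-in-100 data and the 5% threshold are just made-up choices for illustration):

    # Compute P(result at least this extreme | unbiased coin) and compare it
    # to an arbitrary threshold, as described above.
    from math import comb

    def p_value_two_sided(heads, flips):
        # probability, under an unbiased coin, of a head count at least as far
        # from flips/2 as the one observed
        dist = abs(heads - flips / 2)
        return sum(comb(flips, k) for k in range(flips + 1)
                   if abs(k - flips / 2) >= dist) / 2**flips

    p = p_value_two_sided(heads=62, flips=100)
    print(p, "reject 'unbiased'" if p < 0.05 else "keep 'unbiased'")   # ~0.02, so reject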

And furthermore, suppose we allow for the (rather reasonable) possibility that the coin is only partially biased. In other words, not that it has heads on both sides, but just that it's weighted a little, and so more likely to come up heads. That weight is a continuous variable - how can I assign probabilities of either 0 or 1 to it? And how can I decide which part of the range is ruled out, and with what confidence, if I can't use some version of Bayes' theorem (with or without all priors set equal)?
The talk of a continuous variable is something of a red herring. Any particular coin either has a probability of 0.7631245... of coming up heads or it doesn't. So you still get probabilities of 0 or 1 for the distribution of any individual coin; we just don't currently know the appropriate value.

Frequentists can talk about P(sample|model), so they are able to pick the model which is most likely to give rise to a sample; this is called the maximum likelihood estimate. They can also think of confidence intervals, which allow you to pick a range that has a probability of p of containing the correct value of a parameter.

If I remember correctly it is the range that is considered to be randomly sampled from a fixed distribution, rather than the parameter having a distribution associated with it.
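
A rough sketch of both quantities mentioned above for the coin (the normal-approximation interval is just one common construction, picked here for brevity):

    # Maximum likelihood estimate of p and a ~95% confidence interval for it.
    from math import sqrt

    def mle_and_ci(heads, flips, z=1.96):          # z = 1.96 for ~95% confidence
        p_hat = heads / flips                      # maximum likelihood estimate
        se = sqrt(p_hat * (1 - p_hat) / flips)     # standard error of p_hat
        return p_hat, (p_hat - z * se, p_hat + z * se)

    print(mle_and_ci(heads=62, flips=100))   # 0.62 and roughly (0.52, 0.72)
    # Note the frequentist reading: the interval is the random object, and the
    # procedure covers the true p in ~95% of repeated experiments.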
 
Technically speaking, a statement "I reject the null hypothesis" contains no probability information whatsoever. It's a pure proposition that describes the result of an experiment. If you want your conclusions to have a probabilistic form, then you will need to use Bayes' theorem. Frequentist statistics are functions from observations to propositions. Bayesian statements are functions from (observations and probability distributions) to probability distributions.

But most people don't want their conclusions to be probabilistic, and don't feel comfortable "priming the pump" with unsupported prior distributions. And even with a Bayesian statement, you still need to get to propositional content at some point. Somehow, you need to get from the statement "there is a 75% chance that the coin is biased" to "he is a cheating bastard; break his thumbs." Where do you draw the cutoff?
I don't care what people want their conclusions to be. When a Bayesian gives you a probability, he is referring to how much we know about something. If he says there is a 75% chance that the coin is biased and did all the math correctly, but you disagree, then either
a) he used different definitions of "coin" and "biased" than you did (aka different priors)
b) you introduced unsupported information or left out available information (aka screwed up)

Obviously two people working with different definitions of "coin" and "biased" may conclude different things. But the frequentist approach just says "Hey, use these definitions 'cause I told you so!" Fine. Specify the definitions, Bayes gives you the answer.

Do you act now as if the coin was biased? I don't care. Bayes has given you what you know about the situation. It claims no more than that. You cannot know more than a Bayesian knows, but you can certainly decide to act on knowledge however you wish.

And that's exactly what you cannot have.

How many different ways are there of being unfair? There are, quite literally, infinitely many ways. More than that, there are uncountably infinitely many ways of being unfair -- in fact, there are at least the power set of uncountably infinitely many ways of being unfair.

That's too many to assign a probability distribution over. There is literally no way to quantify "not fair" in all possible ways (you run out of numbers first), so Bayesian statistics cannot look for all patterns at once.
And every one of these objections also applies to frequentists, but since they hide their assumptions instead of putting them out in the open to be criticized, they seem more objective. So good job there, frequentists. Way to pull the wool over our eyes.
 
And every one of these objections also applies to frequentists,

No, because frequentists do not need to quantify the number of ways that something can be biased. They simply calculate the probability that an unbiased coin would show the observed behavior, and if that probability is "low enough" (which is indeed an assumption, but one out in the open), they reject the idea that it is unbiased.

Their results are arguably less informative because they have fewer assumptions. They need not assume an unquantifiable distribution of potential sources of error, because they simply infer that an error exists, with no information about its type or degree.
 
Interesting. So frequentists are like ostriches...

In the sense that people falsely attribute behavior to them through total ignorance of the subject --- yes, I'm afraid so.


As far as I can see this kind of reasoning is totally useless in the physical sciences where there is a continuum of possible models, and one needs to define confidence intervals.

Well, that says more about what you can and can't see than it does about frequentist statistics, I'm afraid.
 
No, because frequentists do not need to quantify the number of ways that something can be biased. They simply calculate the probability that an unbiased coin would show the observed behavior, and if that probability is "low enough" (which is indeed an assumption, but one out in the open), they reject the idea that it is unbiased.

Their results are arguably less informative because they have fewer assumptions. They need not assume an unquantifiable distribution of potential sources of error, because they simply infer that an error exists, with no information about its type or degree.

See there? You just did it. You hid assumptions about the definition of "bias" by neglecting to mention that the test you choose corresponds to a choice of priors by the Bayesian.

Let's have an example, shall we? Suppose I'm about to toss a coin 100 times. What's a method a frequentist might use to determine whether the coin is biased? I'll do my best to extract the priors used, do a Bayesian analysis using those priors, and discuss the ways (if any! usually there are none because the approximation is exact) that the results differ.
 
I don't care what people want their conclusions to be. When a Bayesian gives you a probability, he is referring to how much we know about something. If he says there is a 75% chance that the coin is biased and did all the math correctly, but you disagree, then either
a) he used different definitions of "coin" and "biased" than you did (aka different priors)
b) you introduced unsupported information or left out available information (aka screwed up)

or c) he used unsupportable assumptions that cannot be justified in the forms of his priors.

The basic problem is that a Bayesian considers a statement like P(heads|biased) to be meaningful, which is a domain error unless you artificially restrict the set of "biases" that you accept.

We can agree -- frequentist and Bayesian alike -- on what the behavior of an unbiased coin is. (The usual properties of fair, independent, and stationary define it.) We can similarly agree on the probability of any specific sequence of outcomes (1/2^N where N is the number of flips) and apply the axioms of probability theory to determine the probability of equivalence classes of flips (for example, the t-test says that two sequences are equivalent if their numbers of heads and tails are identical, regardless of arrangement). Picking the equivalence classes is equivalent to defining the particular test. Once this has been done, it is well understood what the probability of getting a particular sequence, or anything equivalent to it, is (given an unbiased coin). If the probability is low enough, we reject the idea that it is unbiased. All of these functions are mathematically well-grounded.
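
For concreteness, here is what those two agreed-upon probabilities look like for a single run of 100 flips (the 62-head count is just an example of mine):

    # Under an unbiased coin: every specific sequence has probability 1/2^N,
    # and the equivalence class "all sequences with k heads" has probability
    # C(N, k) / 2^N.
    from math import comb

    def p_sequence(flips):
        return 1 / 2**flips

    def p_equivalence_class(heads, flips):
        return comb(flips, heads) / 2**flips

    print(p_sequence(100))               # ~7.9e-31, the same for every sequence
    print(p_equivalence_class(62, 100))  # ~0.0045 for the class of 62-head sequences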

To go further in application of Bayesian stats, however, requires you to calculate P(sequence|biased), which is literally impossible (you can't take the probability as a discrete sum or as an integral; the set of possible "biases" is too large). Instead, you need to restrict yourself to a finite or continuum-sized set of possible biases, which is NOT something frequentists need to do, because they never need to quantify what "biased" means.
 
See there? You just did it. You hid assumptions about the definition of "bias" by neglecting to mention that the test you choose corresponds to a choice of priors by the Bayesian.

Quite the contrary. The test I choose (a simple t-test) corresponds to a choice of priors that is literally impossible for a Bayesian to make, although he can approximate it rather well if you assume that the probability of pathological cases such as probabilities distributed as Cantor dusts is everywhere zero. Since this nonpathology assumption corresponds well to the real world, this is one reason that Bayesian stats can be used.
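
To sketch what that nonpathology restriction buys the Bayesian (this is just the standard flat-prior-on-a-single-i.i.d.-bias setup, offered as an illustration rather than as the exact prior anyone here has in mind):

    # Restrict "biased" to i.i.d. flips with some fixed p and put a flat prior on p;
    # the posterior density for p after h heads and t tails is then
    # Beta(h+1, t+1), i.e. (n+1) * C(n, h) * p^h * (1-p)^t with n = h + t.
    from math import comb

    def posterior_density(p, heads, tails):
        flips = heads + tails
        return (flips + 1) * comb(flips, heads) * p**heads * (1 - p)**tails

    print(posterior_density(0.50, heads=62, tails=38))   # modest density at "fair"
    print(posterior_density(0.62, heads=62, tails=38))   # much higher density near the MLE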
 
My own PhD supervisor used to like Bayesian statistics until he dabbled with nonparametric Bayesian statistics. Just thought I'd share that bit of info.

/things are so much simpler with finitely many parameters
 
No, because frequentists do not need to quantify the number of ways that something can be biased. They simply calculate the probability that an unbiased coin would show the observed behavior, [...]

The observed behavior, or something arbitrarily considered equivalent to it.

We can agree -- frequentist and Bayesian alike -- on what the behavior of an unbiased coin is. (The usual properties of fair, independent, and stationary define it.) We can similarly agree on the probability of any specific sequence of outcomes (1/2^N where N is the number of flips) and apply the axioms of probability theory to determine the probability of equivalence classes of flips (for example, the t-test says that two sequences are equivalent if their numbers of heads and tails are identical, regardless of arrangement). Picking the equivalence classes is equivalent to defining the particular test. Once this has been done, it is well understood what the probability of getting a particular sequence, or anything equivalent to it, is (given an unbiased coin).

What are the criteria for choosing one set of equivalence classes rather than another? What makes one test appropriate and another inappropriate?

If the probability is low enough, we reject the idea that it is unbiased.

Yes, but why? Events with low probability happen all the time. Everything has low probability.

Why are a bunch of events relevant which didn't occur, simply because we happened to call them "equivalent" to the one that did?

The choice of a test is implicitly a decision about which sorts of departure from perfect randomness are not too improbable and which ones, on the other hand, have negligibly small probability.

The decision has to be made, whether explicitly or implicitly, because the answer depends on it.
 
In the sense that people falsely attribute behavior to them through total ignorance of the subject --- yes, I'm afraid so.

Touchy, aren't we? But yes, I asked for it. :)

Well, that says more about what you can and can't see than it does about frequentist statistics, I'm afraid.

Fair enough. I freely admit to having trouble comprehending the frequentist point of view. I'll try to digest the posts here later when I have more time and see if I can get it.

If there were a concrete and physical example where frequentists and Bayesians disagree on something non-philosophical, that would help.

Actually, let me ask this - suppose we take our coin example, paying attention only to the total of heads-tails after N flips. As a physicist I would take that data point and use it to draw confidence intervals. That is, I would take a continuum of models for the coin - from always tails (p=0) to fair coin (p=.5) to always heads (p=1), parametrized by the probability p a given flip gives heads - and use the data (heads-tails) to draw contours of 90% confidence, 95% confidence, etc. in that p-space (here "contour" just means two points, since it's a 1-dimensional space). I could use Bayes' theorem to decide where to draw the contours, say assuming a flat prior on p.
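
Just to pin down what I mean, here is the kind of calculation I have in mind (the flat prior, the equal-tailed intervals, and the grid resolution are all arbitrary choices on my part):

    # Flat prior on p, posterior proportional to the binomial likelihood,
    # and the pair of points in p-space enclosing a given fraction of the
    # posterior mass (the "contour").
    def contour(heads, tails, level=0.90, steps=10001):
        grid = [i / (steps - 1) for i in range(steps)]
        w = [p**heads * (1 - p)**tails for p in grid]   # flat prior times likelihood
        total = sum(w)
        cdf, acc = [], 0.0
        for wi in w:
            acc += wi / total
            cdf.append(acc)
        lo = next(p for p, c in zip(grid, cdf) if c >= (1 - level) / 2)
        hi = next(p for p, c in zip(grid, cdf) if c >= 1 - (1 - level) / 2)
        return lo, hi

    print(contour(heads=62, tails=38, level=0.90))   # roughly (0.54, 0.70)
    print(contour(heads=62, tails=38, level=0.95))   # a slightly wider pair of points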

Does that approach make sense to a frequentist?

(I'd really like to understand this, and I appreciate the patience of all the people kindly answering my questions so far.)
 
They only calculate P(S|unbiased), which tells you the probability of obtaining a sample containing the same information assuming your model is correct.
If this probability falls below an arbitrary threshold, they decide to reject the null hypothesis and move on to the next one, in which case the number isn't useful.
If the probability is above the threshold they keep the hypothesis and move on to the next data set/test, in which case the number still isn't useful as they can't talk about the probability of a hypothesis being correct.

I don't see how the number isn't useful. Could you answer my question in the post above?

Actually, on second thought I'm starting to get the point... following your procedure is going to allow you to draw one of those contours. But what will you say about the points inside it? They are all acceptable null hypotheses, I suppose, and you can't speak about their relative probabilities? So that means you can only ever draw one contour? Or are you allowed to change your cut-off probability and plot several contours associated with different values for it? If so, that seems dangerously close to Bayes...

The talk of a continuous variable is something of a red herring. Any particular coin either has a probability of 0.7631245... of coming up heads or it doesn't. So you still get probabilities of 0 or 1 for the distribution of any individual coin; we just don't currently know the appropriate value.

You're right - thanks.

If I remember correctly it is the range that is considered to be randomly sampled from a fixed distribution, rather than the parameter having a distribution associated with it.

Yes, I'm starting to see what you're saying. Although I still can't see how this is truly different from Bayes.
 
Actually, let me ask this - suppose we take our coin example, paying attention only to the total of heads-tails after N flips. As a physicist I would take that data set and use it to draw confidence intervals. That is, I would take a continuum of models for the coin - from always tails (p=0) to fair coin (p=.5) to always heads (p=1), parametrized by the probability p a given flip gives heads - and use the data (heads-tails) to draw contours of 90% confidence, 95% confidence, etc. in that p-space (here "contour" just means two points, since it's a 1-dimensional space). I could use Bayes' theorem to decide where to draw the contours, say assuming a flat prior on p.

Does that approach make sense to a frequentist?

Sure. In fact, a frequentist could do more or less exactly the same thing, since coin flips are something that can be sampled, and therefore you could simply get a coin that comes down heads with probability p for all the values of p you were interested in, take large enough samples, and run the appropriate numbers.

There are basically two ways to interpret "confidence intervals" in this sense -- and frequentists would have no problem with either. One is the confidence that we have that a coin with known probability p will have flipped between x1 and x2 heads in 1000 trials. The other is the confidence we have that a coin that flipped x heads has a "true" probability between p1 and p2. Both of those are legitimate frequentist calculations.
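
A quick simulation sketch of the first reading (the particular p, bounds, and trial counts are placeholder choices, nothing canonical):

    # For a coin of known p, how often does the head count in 1000 flips land
    # between x1 and x2? That relative frequency is the "confidence" in the
    # first sense described above.
    import random

    def coverage(p, x1, x2, flips=1000, trials=5000):
        hits = 0
        for _ in range(trials):
            heads = sum(random.random() < p for _ in range(flips))
            hits += x1 <= heads <= x2
        return hits / trials

    print(coverage(p=0.5, x1=469, x2=531))   # close to 0.95 for a fair coin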
 
I can see that a Bayesian treatment can be more accurate, and that there are algorithms and applications that demonstrate that it can work; however, I have never got my head round how you decide to assign the prior probability value.

At least I can understand the simple null hypothesis and p-values.


ETA:

Any chance of a different concrete example for demonstrating the Bayesian approach?
 
What are the criteria for choosing one set of equivalence classes rather than another? What makes one test appropriate and another inappropriate?

The nature of what you are testing, of course. If you are interested in a question about similarity of means, then any two samples that have the same mean are equivalent. That's hardly arbitrary.

Yes, but why? Events with low probability happen all the time. Everything has low probability.

Why are a bunch of events relevant which didn't occur, simply because we happened to call them "equivalent" to the one that did?

Because we don't "happen" to call them equivalent -- the experimental hypothesis we wish to test does. If I wish to know if college students have higher mean self-esteem than soldiers, then I'm interested by definition in means. If I'm interested in whether college students have greater variance in self-esteem, then I'm interested in variance. If I'm interested in whether the sets differ at all, then I'm still not interested in differences in subject ordering that are an artefact of my sampling procedure. (I.e. any two data sets that are permutations of each other are equivalent, because sets are unordered).
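
For instance, here's a toy permutation test built on exactly that equivalence (the self-esteem numbers are made up purely for illustration):

    # Any relabelling of the pooled data counts as equivalent, because the
    # grouping/ordering is an artefact of sampling; the test asks how often a
    # relabelling gives a mean difference at least as large as the observed one.
    import random

    def permutation_test(a, b, trials=10000):
        observed = abs(sum(a) / len(a) - sum(b) / len(b))
        pooled = a + b
        extreme = 0
        for _ in range(trials):
            random.shuffle(pooled)
            x, y = pooled[:len(a)], pooled[len(a):]
            extreme += abs(sum(x) / len(x) - sum(y) / len(y)) >= observed
        return extreme / trials

    students = [34, 29, 31, 36, 33, 30]   # hypothetical self-esteem scores
    soldiers = [27, 25, 30, 26, 28, 24]
    print(permutation_test(students, soldiers))   # a small fraction suggests the difference is not a labelling artefact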

The choice of a test is implicitly a decision about which sorts of departure from perfect randomness are not too improbable and which ones, on the other hand, have negligibly small probability.

But it's explicitly about what sort of variance from randomness is irrelevant to the question at hand.
 
