The Puzzle of Probability Itself

It's easy to understand those numbers, assuming they were based on polls. The polls are based on samples. Samples have uncertainty. Take a simplified situation, like the race between two candidates for a US Senate seat, where a simple majority of votes determines the winner. A polling firm polls 500 voters and finds that 52% of them say they'll vote for the Democratic candidate. As an estimate of how the population of voters will vote, the polling firm's 52% is only accurate to about ± 4%. The Democrat will win if the true proportion of voters who will vote for him is at least 50%. A little math shows that, given the poll result of 52% ± 4%, the probability that the true percentage is at least 50% is about 83%. So, based on this poll, the Democrat has an 83% chance of winning.

Later, say another poll of 500 voters is conducted and finds that 53% of voters now say they'll vote for the Democratic candidate. The same math shows that the Democrat's chances have improved to 92%.
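To make that arithmetic concrete, here is a rough sketch of the calculation in Python. It uses a plain normal approximation to the sampling distribution (an assumption on my part), which is why it lands near, rather than exactly on, the 83% and 92% figures:

Code:
import math

def win_probability(p_hat, n):
    """P(true support > 50%) under a normal approximation
    to the sampling distribution of the poll estimate."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # standard error of the sample proportion
    z = (p_hat - 0.50) / se                        # SEs by which the poll exceeds 50%
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

print(win_probability(0.52, 500))  # ~0.81
print(win_probability(0.53, 500))  # ~0.91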

So, the bottom line is that the probability of winning at any point in time directly depends on the proportion of voters the polls say favor the candidate.

Yes, that is a good description of how the numbers are generated, but I still think the meaning isn't quite as clear as I usually take it to be. For example, if I ran the same poll (pick one) and got the same numbers, would they mean the same thing if the poll was taken 5 minutes before the election? How about 10 minutes after?

Somehow, the probability collapses into the actual.
 
This is a variant of a classic probability problem (usually stated in terms of sons or daughters). The correct answer is 1/11, as you proved by enumerating the sample space here:

(excellent analysis snipped to save space)

However, there is actually an ambiguity in the wording of the problem. The probability that the second die is a 6 depends on how Sally came to learn that one die was a 6.

My hunch is that there is still ambiguity, even if we straighten out the wording. I think it arises because of what we take probability to mean. If I am making a statement as Sally (after seeing the dice), I know 100% what the other die is. So the answer given is dependent on the state of knowledge of the person asked.

But that feels like a cheat because we assume the question should be about the mathematics we've learned about how to calculate probabilities. "Sally" is just a placeholder variable. But eliminating her by saying "You look and see one of the dice is a six" alters the problem - it matters, not just what information we get (at least one die shows six), but how we got it (Sally saw both and showed us one, or we just happened to see one die and it was a six).

This business of process having an effect on our calculation pulls the whole enterprise away from simple deduction.

I can put it back by stating the problem as, "There are 11 equally possible outcomes for some event. What is the probability that event X (one of the set) will occur?" Here I've eliminated the process, something which I cannot do in any "real" experiment. I've also inserted "equally possible" - introducing a prior probability which is then preserved.
 
Yes, that is a good description of how the numbers are generated, but I still think the meaning isn't quite as clear as I usually take it to be. For example, if I ran the same poll (pick one) and got the same numbers, would they mean the same thing if the poll was taken 5 minutes before the election? How about 10 minutes after?

Somehow, the probability collapses into the actual.


The earlier the poll relative to the election, the more uncertainty there is. If the election is still a long way off when the poll is taken, then there is more uncertainty in the outcome than the naive statistical distribution would suggest. When I said that 52% of respondents in the poll (of 500 voters) implied that the candidate had an 83% chance of winning, I was assuming that the only source of uncertainty was random sampling. Or to put it another way, that if someone says they're going to vote for the Democratic candidate, then they will. But for polls taken long before the election, this is not even close to a reasonable assumption, and win probabilities computed under this assumption will overstate the leader's chances.

In Bayesian terms, my calculation of the probability of winning, above, used a non-informative prior, that is, one that has very little influence on the calculation, and lets the polling data speak for itself. But when we know that there is uncertainty from sources besides random sampling error, we can use a more informative prior that takes this into account, and shrinks the probabilities toward 0.5. Nate Silver of 538 does something along these lines in his so-called polls-only and polls-plus models. If you look at the predictions from these models over time, compared with his now-cast model, you'll see that polls-plus and polls-only models show the race is more uncertain than the now-cast model, which basically assumes that random sampling error is the main source of uncertainty. Of course, now that the election is only a few weeks away, the models are becoming closer in agreement.
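As a rough illustration of that shrinkage (my own sketch, with a made-up prior strength), here is the same poll pushed through a flat prior and through a skeptical Beta(50, 50) prior centered on 0.5; the informative prior pulls the win probability back toward a coin flip:

Code:
from scipy.stats import beta

n, x = 500, 260                         # 500 respondents, 52% for the Democrat
flat = beta(1 + x, 1 + (n - x))         # Beta(1,1) prior: let the data speak
skeptical = beta(50 + x, 50 + (n - x))  # Beta(50,50) prior: ~100 pseudo-votes split 50/50

print(flat.sf(0.5))       # ~0.81, close to the naive calculation above
print(skeptical.sf(0.5))  # ~0.79, shrunk toward 0.5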
 
(some snipped)

...If you look at the predictions from these models over time, compared with his now-cast model, you'll see that polls-plus and polls-only models show the race is more uncertain than the now-cast model, which basically assumes that random sampling error is the main source of uncertainty. Of course, now that the election is only a few weeks away, the models are becoming closer in agreement.

Yes, much the same happens in forecasting tournaments. My interest isn't so much in the mathematics or their justification, but what it means. Surely the exercise is pointless if we are just playing a "math game"? We want to know something about the real world and think the mathematics tells us this "something."

Maybe I'm not expressing it well. A parallel would be how the Turing test dodges the question by reframing it into a question more amenable to generating an answer. Or maybe I'm just asking a bad question.
 
So long as Sally could show either die as a six, then yes, it does not matter whether they are thrown as a pair or where they are thrown (on the moon, say). Because she selects a six to show, however, the dice are linked (for one of the arguments to work). So, for example, she couldn't toss one last year and show it immediately. That would give you more information than waiting until both had been thrown.

I would contend that the dice are not linked. The set of the dice is evaluated differently, simply by counting them as a set.

Imagine two scenarios exactly the same, except for one thing:

In the first scenario, we throw two dice at once. We reveal one, and guess what the other will be.

In the other scenario, the same throwing of dice occurs, but the first die is not revealed. Is the chance of the second die's value changed?

If the dice themselves are linked, the behavior of the second die should be influenced by the first whether or not it's known, which suggests some supernatural linkage that I think dubious, unless you're suggesting a supernatural linkage caused by the knowledge itself. Are you implying that knowing how a die on the moon fell directs the falling of a die on the earth? The chance of the second die being the same as the first is dependent on both dice. It is an artifact of a person's decision to count them as a set.
 
Yes, much the same happens in forecasting tournaments. My interest isn't so much in the mathematics or their justification, but what it means. Surely the exercise is pointless if we are just playing a "math game"? We want to know something about the real world and think the mathematics tells us this "something."

Well, if the model is "good" it will have good frequentist properties. If Nate runs the model on 100 elections in which the model gave the leading candidate an 80% chance of winning, and, in fact, about 80% of those candidates won, then the model makes good predictions in the frequentist sense.
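A calibration check like that is easy to sketch in code. Here the history of (forecast, outcome) pairs is entirely hypothetical:

Code:
def calibration(history, lo, hi):
    """Empirical win rate among forecasts that fell in [lo, hi)."""
    hits = [won for prob, won in history if lo <= prob < hi]
    return sum(hits) / len(hits) if hits else float("nan")

# history = [(0.80, 1), (0.78, 1), (0.83, 0), ...]   # hypothetical past forecasts
# A well-calibrated model has calibration(history, 0.75, 0.85) close to 0.80.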
 
I would contend that the dice are not linked. The set of the dice is evaluated differently, simply by counting them as a set.

Imagine two scenarios exactly the same, except for one thing:

In the first scenario, we throw two dice at once. We reveal one, and guess what the other will be.

In the other scenario, the same throwing of dice occurs, but the first die is not revealed. Is the chance of the second die's value changed?

It depends on the part I highlighted - "We reveal one". If we just randomly reveal one or the other, then they aren't linked. If we use both in deciding which to reveal, they are.

If the dice themselves are linked, the behavior of the second die should be influenced by the first whether or not it's known, which suggests some supernatural linkage that I think dubious, unless you're suggesting a supernatural linkage caused by the knowledge itself. Are you implying that knowing how a die on the moon fell directs the falling of a die on the earth? The chance of the second die being the same as the first is dependent on both dice. It is an artifact of a person's decision to count them as a set.

It isn't that the behavior of one dice influences the other - the dice are fair and the outcome is random. It's Sally's behavior that matters - the information she has available and how she chooses to reveal it. So yes, the problem is constructed to treat them as a set (or at least one answer does so).

I agree that the puzzle doesn't have to be constructed that way, but it happens to be.
 
Well, if the model is "good" it will have good frequentist properties. If Nate runs the model on 100 elections in which the model gave the leading candidate an 80% chance of winning, and, in fact, about 80% of those candidates won, then the model makes good predictions in the frequentist sense.

Maybe we could turn it into a many-worlds thing.

I don't know QM well enough to make the case, but it sounds like something similar is going on - we have a prediction which eventually collapses: someone wins the election.

I wonder if we can spot the transition, say by tracking each vote as it is cast. Can we do better than having to wait until a majority votes one way or the other? It seems like there should be a smooth transition as the probability narrows to one (or zero for the loser).

And all this reminds me of something I mentioned in the OP - probability as a kind of clock which measures before and after an event goes from potential to actual. I have some idea that the "ticks" would be information, but it's only a very loose idea.
 
My hunch is that there is still ambiguity, even if we straighten out the wording. I think it arises because of what we take probability to mean. If I am making a statement as Sally (after seeing the dice), I know 100% what the other die is. So the answer given is dependent on the state of knowledge of the person asked.

But that feels like a cheat because we assume the question should be about the mathematics we've learned about how to calculate probabilities. "Sally" is just a placeholder variable. But eliminating her by saying "You look and see one of the dice is a six" alters the problem - it matters, not just what information we get (at least one die shows six), but how we got it (Sally saw both and showed us one, or we just happened to see one die and it was a six).

This business of process having an effect on our calculation pulls the whole enterprise away from simple deduction.


No, it doesn't. The probabilities are the same whether we are talking about an actual experiment conducted by a person, or about abstract mathematics. They must be, because to compute the probability of the outcome of the experiment in a rigorous manner, you have to go through that very same abstract math.

We say that Sally will throw two fair, independent dice repeatedly until one of them turns up a 6. She will then ask you what the probability is that the other die is a 6. Mathematically, the experiment is described as follows: let X and Y be independent random variables, each with a discrete uniform distribution on the set {1, 2, 3, 4, 5, 6}. Let Z be the event that X = 6 or Y = 6. Let W be the event that X = 6 and Y = 6. Then

P(W|Z) = P(W, Z) / P(Z)
= P(W) / P(Z)
= P(X = 6, Y = 6) / [P(X = 6) + P(Y = 6) – P(X = 6, Y = 6)]
= (1/6)(1/6) / [1/6 + 1/6 – (1/6)(1/6)]
= (1/36) / (11/36)
= 1/11
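The same 1/11 falls out of a brute-force enumeration of the 36 equally likely rolls:

Code:
from itertools import product

rolls = list(product(range(1, 7), repeat=2))        # the 36 equally likely (X, Y) pairs
z = [(x, y) for x, y in rolls if x == 6 or y == 6]  # event Z: at least one six
w = [(x, y) for x, y in z if x == 6 and y == 6]     # event W: both sixes
print(len(w), "/", len(z))                          # 1 / 11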
 
No, it doesn't. The probabilities are the same whether we are talking about an actual experiment conducted by a person, or about abstract mathematics. They must be, because to compute the probability of the outcome of the experiment in a rigorous manner, you have to go through that very same abstract math.

We say that Sally will throw two fair, independent dice repeatedly until one of them turns up a 6. She will then ask you what the probability is that the other die is a 6. Mathematically, the experiment is described as follows: let X and Y be independent random variables, each with a discrete uniform distribution on the set {1, 2, 3, 4, 5, 6}. Let Z be the event that X = 6 or Y = 6. Let W be the event that X = 6 and Y = 6. Then

P(W|Z) = P(W, Z) / P(Z)
= P(W) / P(Z)
= P(X = 6, Y = 6) / [P(X = 6) + P(Y = 6) – P(X = 6, Y = 6)]
= (1/6)(1/6) / [1/6 + 1/6 – (1/6)(1/6)]
= (1/36) / (11/36)
= 1/11

That's fine if I get the information (at least one die is a 6) from Sally. Does the probability change if we eliminate Sally?

I roll two dice without peeking. I look at one (only one) and see it is a six. Now I want to estimate the probability the other is a six as well. In this case, despite having the same information (one die is a six) the probability is now 1/6.

That's what I meant by the same information giving different outcomes, depending on how the information was obtained.
 
That's fine if I get the information (at least one die is a 6) from Sally. Does the probability change if we eliminate Sally?

I roll two dice without peeking. I look at one (only one) and see it is a six. Now I want to estimate the probability the other is a six as well. In this case, despite having the same information (one die is a six) the probability is now 1/6.

That's what I meant by the same information giving different outcomes, depending on how the information was obtained.



Given that it is not possible to make "fair dice", I would say the probability that the other die is a six could be better than one in six. How much better depends on whether the dice were deliberately made unfair (in this case one may not be identical to the other), or if they were made to the best of man's ability to be fair.

Given that the six side has more dimples drilled in it, it would be slightly lighter and have a bias to be on top.
 
That's fine if I get the information (at least one die is a 6) from Sally. Does the probability change if we eliminate Sally?

No. That was my point. The problem is this:

Consider two independent random variables X and Y each with discrete uniform distribution on the set {1, 2, 3, 4, 5, 6}. Let Z be the event that X = 6 or Y = 6. Let W be the event that X = 6 and Y = 6. Compute P(W|Z).

The problem has nothing to do with Sally, dice, or anything else. It is a pure probability problem.

I roll two dice without peeking. I look at one (only one) and see it is a six. Now I want to estimate the probability the other is a six as well. In this case, despite having the same information (one die is a six) the probability is now 1/6.

You don't have the same information. The information you had from Sally is that she looked at both dice and reported that one of them was a six. You, on the other hand, only looked at one die. If Sally had only looked at one die and told you it was a six, then you'd have the same information as if you only looked at one die.

The information you have matters. This seems entirely banal. I don't understand why you think it is so remarkable.
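If it helps, a quick simulation (my sketch, not part of the original puzzle) shows the two protocols producing the two different answers:

Code:
import random

def roll():
    return random.randint(1, 6), random.randint(1, 6)

# Protocol A: Sally sees both dice and reports when at least one is a six.
hits = trials = 0
while trials < 100_000:
    x, y = roll()
    if x == 6 or y == 6:
        trials += 1
        hits += (x == 6 and y == 6)
print(hits / trials)  # ~1/11

# Protocol B: you peek at one die chosen at random, and it happens to be a six.
hits = trials = 0
while trials < 100_000:
    x, y = roll()
    seen, hidden = (x, y) if random.random() < 0.5 else (y, x)
    if seen == 6:
        trials += 1
        hits += (hidden == 6)
print(hits / trials)  # ~1/6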
 
Sally rolls two dice (6-sided, assumed fair). She shows one is a six. What is the probability that the other one is a six?

1) Either zero or one.
2) 1/2
3) 1/6
4) 1/11
5) 1/12
6) 1/36
7) Make up another answer or even reject the premise.

This is a variant of a classic probability problem (usually stated in terms of sons or daughters). The correct answer is 1/11, as you proved by enumerating the sample space here:

I disagree; your solution assumes facts not stated in the problem. As you state:

However, there is actually an ambiguity in the wording of the problem. The probability that the second die is a 6 depends on how Sally came to learn that one die was a 6.

No, I don't think it is. The assumption you're making here is that Sally knows before showing you that one of the dice rolled a six, and this is not stated in the original problem. All that the problem states is that Sally shows you one of the dice, and that it is a six. It doesn't state anything about how Sally chooses which die to show, whether she is aware or cares that it's a six, or in fact anything about her knowledge or choices; it simply says that she rolls two dice which are assumed fair, and shows one of them which is a six. In effect, you're committing the Texas Sharpshooter Fallacy by assuming that Sally first reduced the solution space to outcomes including at least one six; nowhere in the initial problem does it say so.

If the problem stated that Sally rolled two dice until she reached a result where at least one was a six, then you would be correct; there are eleven possible outcomes in which at least one is a six, and in only one of them does the other show a six. But as stated, the problem specifies a single roll of fair dice; Sally showing one of them therefore reduces the solution space to the six possible outcomes in which that die shows a six.

The lesson there is that the probability depends on the actual procedure followed, and the model used to understand the probability must also model the actual procedure followed. And if the problem doesn't specify the procedure in enough detail, then there isn't enough information to determine the answer.

Dave
 
jt512 said:
This is a variant of a classic probability problem (usually stated in terms of sons or daughters). The correct answer is 1/11, as you proved by enumerating the sample space

Dave Rogers said:
I disagree; your solution assumes facts not stated in the problem. As you state:

jt512 said:
However, there is actually an ambiguity in the wording of the problem. The probability that the second die is a 6 depends on how Sally came to learn that one die was a 6.


No, I don't think it is. The assumption you're making here is that Sally knows before showing you that one of the dice rolled a six, and this is not stated in the original problem.


The original problem doesn't state what Sally knows. That's why it's ambiguous. But I don't think the problem is about Sally. Stripped down, I think its intent is to ask: if two dice are rolled, and one of them is a 6, what is the probability that the other die is a 6? As I said, the problem appears to be a variation on a problem that is in every elementary probability text: You meet a man on the street who tells you he has two children, one of whom is a girl. What is the probability that his other child is a girl?
 
I Monty-Halled it and chose 1/36. I read it as Sally purposefully showing us one of the sixes. If no six had been rolled, she wouldn't show us anything.

But the problem really doesn't say that. Looking at it again, I'd say 1/6.
 
Maybe we could turn it into a many-worlds thing.

I don't know QM well enough to make the case, but it sounds like something similar going on - we have a prediction which eventually collapses: someone wins the election.

I wonder if we can spot the transition, say by tracking each vote as it is cast. Can we do better than having to wait until a majority votes one way or the other? It seems like there should be a smooth transition as the probability narrows to one (or zero for the loser).

That is really off the wall. What we have is a prior distribution and some observations from a sampling model. We multiply them together, divide by a normalizing constant, and get an updated distribution.

And all this reminds me of something I mentioned in the OP - probability as a kind of clock which measures before and after an event goes from potential to actual. I have some idea that the "ticks" would be information, but it's only a very loose idea.


Yes, we update our probabilities as new information comes to light. That should hardly be a shock.
 
The original problem doesn't state what Sally knows. That's why it's ambiguous. But I don't think the problem is about Sally. Stripped down, I think its intent is to ask: if two dice are rolled, and one of them is a 6, what is the probability that the other die is a 6? As I said, the problem appears to be a variation on a problem that is in every elementary probability text: You meet a man on the street who tells you he has two children, one of whom is a girl. What is the probability that his other child is a girl?

I also read it that Sally knew nothing and the experiment would always be done the same way. In other words, she would always hide one die and reveal the outcome of the other roll, whatever pips were on top. That's an assumption, but it strips the question of its confusing parts (which I assume were added to make the question seem harder) and makes it a simple question of what one fair die rolls.

Of course, the answer is that if you roll it many times, it will come up a 6 about one time in six, with the proportion approaching exactly 1/6 as the number of rolls grows.

The Hillary question seems to be worded the same way. If this moment were frozen in time, then run forward to the end of the election, Hillary would win approximately 60 times if you ran the experiment 100 times. If you ran the experiment 1000 times, it'd be almost exactly 600.

Now let's say that time moves forward a week, then we freeze it again, still before the election occurs. Now, if you ran it forward 1000 times, Hillary would win around 800.

That's obviously impossible, except as a thought experiment. It's almost impossible to imagine. Does probability work like that? I don't know.

If we stop time now that Hillary has rolled a die twice, then restart time until she finishes her next roll, we can predict that she would roll a six in 100 of the next 600 rolls, or close to it. Is that essentially the same thing as predicting the election, except for all the variables?
 
(much snipped)

That's obviously impossible, except as a thought experiment. It's almost impossible to imagine. Does probability work like that? I don't know.

I don't know either, which is why I started this thread - to explore the ideas.

We have at least two meanings for probability already in this thread. One is a kind of "shut up and calculate (SUAC)" which seems to say (correct me if I'm straw manning here) that once we have fixed the priors/givens, it's merely a matter of using mathematics to generate a number we call the probability. Change the givens in a certain way and you get a different answer. Anything beyond that is out of scope and philosophical shenanigans.

I'll grant that the SUAC framing is fine, but not very interesting. Which is why this is in the philosophy section and not in the math section. I'm more interested in how the priors are set and the meaning of whatever the calculation produces. There are steps which extract facts about the real world, make them abstract for calculation purposes, and then reintroduce the answer as another kind of fact about the real world. It's the "out of scope" parts I want to chew on.
 
I also read it that Sally knew nothing and the experiment would always be done the same way. In other words, she would always hide one die and reveal the outcome of the other roll, whatever pips were on top. That's an assumption, but it strips the question of its confusing parts (which I assume were added to make the question seem harder) and makes it a simple question of what one fair die rolls.

I don't take the anthropomorphic elements of these problems seriously. I think they are generally added to keep students entertained (or, less cynically, "engaged"). Stripped of its anthropomorphic elements, I suspect that the intent of the question is to ask, if you roll two (fair, independent) dice, what is the probability that both come up "six" given that (at least) one of them does? The answer to this question, though perhaps unintuitive, is unquestionably 1/11.

The Hillary question seems to be worded the same way. If this moment were frozen in time, then run forward to the end of the election, Hillary would win approximately 60 times if you ran the experiment 100 times. If you ran the experiment 1000 times, it'd be almost exactly 600.

Now let's say that time moves forward a week, then we freeze it again, still before the election occurs. Now, if you ran it forward 1000 times, Hillary would win around 800.

That's obviously impossible, except as a thought experiment. It's almost impossible to imagine. Does probability work like that? I don't know.

If we stop time now that Hillary has rolled a die twice, then restart time until she finishes her next roll, we can predict that she would roll a six in 100 of the next 600 rolls, or close to it. Is that essentially the same thing as predicting the election, except for all the variables?


You can't rerun the election, even in principle. What you can do, at least in principle, is apply the same model to numerous elections. If you were to do that, and the model is good, then in about 60% of elections in which the model gives the leading candidate a 60% chance of winning, the leading candidate will win. Ditto for other percentages.
 
We have at least two meanings for probability already in this thread. One is a kind of "shut up and calculate (SUAC)" which seems to say (correct me if I'm straw manning here) that once we have fixed the priors/givens, it's merely a matter of using mathematics to generate a number we call the probability. Change the givens in a certain way and you get a different answer. Anything beyond that is out of scope and philosophical shenanigans.


If you can summarize that in half as many words, I swear I will print it on a t-shirt.

I'll grant that the SUAC framing is fine, but not very interesting. Which is why this is in the philosophy section and not in the math section. I'm more interested in how the priors are set and the meaning of whatever the calculation produces.


Here's the difference between a Bayesian statistician and a Bayesian philosopher. If a Bayesian philosopher wants to know where priors come from, he thinks about it for weeks on end, consults the wisest philosophers he can find, and asks about it on philosophy forums. Here's what a Bayesian statistician does: he looks it up in a Bayesian statistics textbook. Being a member of the latter camp, here's my answer:

1. The prior is chosen for convenience. Actually, that's a bit glib. In truth, the family of distributions from which the prior is chosen is chosen for convenience. So, we may choose a beta(a, b) prior for the "p" parameter of a binomial distribution, because the beta distribution is constrained to be between 0 and 1, just as the binomial parameter is (and because the posterior will also be a beta distribution; see the sketch after this list). So, not only do we pick a prior, and shut up and calculate, we often pick a prior specifically to make the calculation easy. But we still have to pick the parameters, a and b, of the beta distribution. So, read on.

2. A prior is chosen that will have the least possible influence on the calculation. This is done when we have little or no prior information or, for whatever reason, we want the data to speak for themselves. So, not only do we pick a prior and shut up and calculate, we often pick a prior that doesn't even affect the calculation.

3. Consensus of experts. We ask a bunch of subject-matter experts what they think the prior should be, and we do whatever they tell us. Thus, not only do we shut up and calculate, sometimes we don't even do our own calculations.

4. OMG, we actually use prior information! If 10 studies have been done on the effect of psychotherapy on the growth rate of man-in-the-moon marigolds, and we're doing the 11th study, we might actually use the results of the previous 10 studies as our prior.
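And here is what the conjugate beta-binomial update from point 1 looks like in practice, as a minimal sketch with made-up data:

Code:
from scipy.stats import beta

a, b = 1, 1                           # Beta(1,1) prior on the binomial parameter p
x, n = 52, 100                        # hypothetical data: 52 successes in 100 trials
posterior = beta(a + x, b + (n - x))  # conjugacy: the posterior is again a beta
print(posterior.mean())               # posterior mean of p, ~0.52
print(posterior.interval(0.95))       # central 95% credible interval for p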
 