Why Malerin is Wrong About Bayes' Theorem

And of course anybody who thinks about this realizes that it makes no sense. How can ignorance actually improve your chances?

You're not reading well today, Linda.

Ignorance doesn't improve your chances; ignorance means that your estimated probability of winning the bet is far off the actual probability -- because you don't know enough to make a good estimate.

Ignorance is what lets you believe that it's a good bet.
 
And you wonder why?
You mean how you ignore the fact that everyone who understands Bayes' Theorem has found that you blatantly misuse the theorem, and that you willfully and fraudulently manipulate the numbers to make your sad god claim somehow valid?
 
Even that's not true, because not all events that happen happen with probability 1.

If I roll a die and get a four, does that mean that P(rolling a four and it is sunny) + P(rolling a four and it is not sunny) = 1? Of course not, since P(rolling a four) is known a priori to be one in six.
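
A quick numeric sketch of that point (the weather probability below is an arbitrary placeholder): the two joint probabilities sum to P(rolling a four), not to 1.

```python
# Joint probabilities for an independent die roll and the weather.
# p_sunny is an arbitrary placeholder; any value in [0, 1] gives the same conclusion.
p_four = 1 / 6
p_sunny = 0.3

p_four_and_sunny = p_four * p_sunny
p_four_and_not_sunny = p_four * (1 - p_sunny)

# The joints sum to the marginal P(rolling a four) = 1/6, not to 1.
print(p_four_and_sunny + p_four_and_not_sunny)  # 0.1666...
```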

Yes, but my point is precisely that in the case of existence we don't know the a priori odds.

All we know is that in a single trial (our universe) life existed (us). As far as I am aware, all the state of the art models and theories simply posit possible other universes with possible alternate conditions, etc. But this information does not lead to a better estimate because we don't have any way to choose from among these various possibilities.

Like in my example. All you know is that a number was chosen. Of course we can come up with a myriad of different possible ways to choose a number, leading to infinitely many possible choices, etc., but since we only have data from a single trial there is really no way to converge on which choice method is most likely and therefore there is really no way to generate a valid a priori estimate.
 
I don't see how that follows.

My guess is that you wrote "P(E|H)" but thought "P(E and H)", and likewise you wrote "P(E|~H)" but thought "P(E and ~H)".

P(E and H) + P(E and ~H) is 1, because the left side equals P(E), and E is known with certainty to have happened.
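
A minimal numeric sketch of that decomposition (the split of E between H and ~H below is an arbitrary placeholder): the two joints always sum to P(E), and they only sum to 1 because P(E) is being taken as 1.

```python
# P(E and H) + P(E and ~H) = P(E), however E is split between H and ~H.
# Both numbers below are placeholders for illustration.
p_e = 1.0          # E is taken to have happened with certainty
p_h_given_e = 0.4  # arbitrary split of E between H and ~H

p_e_and_h = p_e * p_h_given_e            # P(E and H) = P(H|E) P(E)
p_e_and_not_h = p_e * (1 - p_h_given_e)  # P(E and ~H) = P(~H|E) P(E)

print(p_e_and_h + p_e_and_not_h)  # 1.0, i.e. P(E)
```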

No I meant P(E|H) and P(E|~H).

I don't think we can come up with an a priori estimate of P(E). What I think we can come up with is an estimate of the sum of the conditionals P(E|H) + P(E|~H).

Again, since we only have information from a single trial, my claim is that the sum of all conditionals for a life-supporting universe must equal 1.0.

Furthermore, the two conditionals P(E|H) and P(E|~H) are the only conditionals due to H being a binary random variable.

All this means -- and this is my assertion -- is that if one is doing a nonsense analysis like Malerin is doing, and if one pulls a value of P(H) out of their backsides like Malerin is doing, then P(E) is going to be automatically set as well. One can't pull independent values of P(H) and P(E) out of their backsides.
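
To illustrate that dependence with made-up numbers: once the conditionals are pinned so that P(E|H) + P(E|~H) = 1, choosing P(H) and one conditional leaves no freedom at all in P(E).

```python
# Under the constraint P(E|H) + P(E|~H) = 1, picking P(H) and P(E|H)
# fixes P(E) via total probability. All values are arbitrary placeholders.
p_h = 0.2
p_e_given_h = 0.7
p_e_given_not_h = 1 - p_e_given_h  # forced by the constraint

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
print(p_e)  # 0.7*0.2 + 0.3*0.8 = 0.38 -- no longer an independent choice
```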
 
There is either a million dollars in my closet or there is not. That's a .5 chance. If it is there I'll give it to anyone who will give me $100. You have to pay me before I look though. Those are damn good odds. $100 for a 50/50 chance to win $1,000,000.

Any takers?

Wow, Malerin, a million dollar challenge! If you truly believe what you've been saying in other threads, and you are willing to consistently apply your arguments, then why not take RandFan's challenge?

C'mon Malerin, put your money where your mouth is.
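
For anyone tempted, here is a back-of-the-envelope sketch of the expected value. The "either it's there or it isn't, so 0.5" prior makes the bet look fantastic; the tiny "realistic" prior below is a made-up illustration, not a measurement.

```python
stake = 100
payout = 1_000_000

def expected_profit(p_money_in_closet):
    """Expected profit of paying the stake for a shot at the payout."""
    return p_money_in_closet * payout - stake

print(expected_profit(0.5))   # +499900 under the naive "either/or" prior
print(expected_profit(1e-6))  # about -99 under a (made-up) realistic prior
```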
 
Yes, but my point is precisely that in the case of existence we don't know the a priori odds.

And in light of that, making the statement that the probabilities MUST sum to one is, frankly, ludicrous.

therefore there is really no way to generate a valid a priori estimate.

...in which case, we choose a maximally uninformative prior estimate, which works out, mathematically, to be the binomial case with maximum entropy, i.e. p(God exists) = 0.5.
 
maximum entropy (and therefore maximum uncertainty). In the case of a binary, yes/no, decision, that is the distribution where p(yes) = 0.5, regardless of the question.

This is another issue, and I am glad you brought it up.

Every binary decision, "will the choice result in X or Y?", can be rewritten as "will state X or Y occur as a result of this choice?"

And you have to think of this in context with the entire set of all possible states.

If we know only two possible states will result, as when a coin is flipped or when a person is asked a yes/no question, we can say for sure that either X or Y will occur, or X will or will not occur, or whatever. In that case, 0.5 is the right value to "maximize entropy."

But what about events we know literally nothing about -- such as Malerin's "universe creator?"

In that case, what does saying "X will or will not occur" mean? It doesn't mean that if X does not occur only one other state will take its place. It means that if X does not occur, any state may take its place. This is because if we know nothing about the event we necessarily know nothing about which other states may be chosen.

So my contention is that in such a case, to maximize entropy, you would need to "assign an equal probability to each state" as the article says, and that results in an a priori estimate of 1 / |{set of all possible states}|, or infinitesimal.
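
To make that concrete, here is a minimal sketch (the state counts below are arbitrary illustrations, nothing derived from the actual problem): in the binary case maximum entropy lands on 0.5, and over N indistinguishable states it lands on 1/N, which shrinks toward zero as N grows.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Binary case: entropy is maximized at p = 0.5.
print(max((entropy([p, 1 - p]), p) for p in [i / 100 for i in range(1, 100)]))
# -> (1.0, 0.5)

# Many-state case: the maximum-entropy (uniform) prior assigns 1/N to each state.
for n in (2, 10, 10_000):
    print(n, 1 / n, entropy([1 / n] * n))  # entropy = log2(n) bits
```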

This agrees with fls's formulation in the other thread, where we want to maximize the effect of new information on the conditional.

Am I wrong about this, drkitten?
 
Wow, Malerin, a million dollar challenge! If you truly believe what you've been saying in other threads, and you are willing to consistently apply your arguments, then why not take RandFan's challenge?

C'mon Malerin, put your money where your mouth is.

See Dr.Kitten's answer. In fact, read the whole thread. How many people besides yourself agree with RD?
 
And in light of that, making the statement that the probabilities MUST sum to one is, frankly, ludicrous.

But I am not talking about the unconditional probabilities; I am saying the conditionals must sum to 1.

All making the conditionals sum to 1 (or any number, for that matter) does is create a dependence between the a priori estimates of E and H.

...in which case, we choose a maximally uninformative prior estimate, which works out, mathematically, to be the binomial case with maximum entropy, i.e. p(God exists) = 0.5.

See my post on this issue.
 
You're not reading well today, Linda.

Ignorance doesn't improve your chances; ignorance means that your estimated probability of winning the bet is far off the actual probability -- because you don't know enough to make a good estimate.

Ignorance is what lets you believe that it's a good bet.

That's what I mean. Any understanding of what it means to make a choice of p=0.5 has to take into account the situation where that is clearly the wrong choice. You suggested earlier that p=0.5 was a good choice in the face of no knowledge, but the question that someone attempting to understand your proposition would ask is "how can that be a good choice, when most of the time, once I have a little knowledge, it becomes a bad choice?" I'm trying to come up with a way to present it so it doesn't appear nonsensical - that it includes all the bad choices and all the good choices, and you are maximally uncertain as to where your choice falls.

Linda
 
You suggested earlier that p=0.5 was a good choice in the face of no knowledge, but the question that someone attempting to understand your proposition would ask is "how can that be a good choice, when most of the time, once I have a little knowledge, it becomes a bad choice?"

Making a choice when you have no knowledge is almost always bad. The best option, if you have no knowledge, is not to make a choice at all. But if you are forced at gunpoint to make a choice when you have no knowledge, then the best of your choices, mathematically, is to choose 0.5 (or the least informed choice, more generally, which in this case is the maximum entropy choice).

Bayes' theorem is one of those situations where you are forced at gunpoint to make a choice; you cannot apply Bayesian reasoning without a choice of priors. The nice thing about Bayes' theorem is it allows you to measure the effect of increasing knowledge --- as you learn more, you refine your posterior probability estimate to get a better choice in light of increasing knowledge.

The reason for the 0.5 choice is that you learn the most from new information. Information captured in a biased prior estimate will bias your conclusions; to take an extreme example, if I assume that the prior probability of unicorns existing, or of rocks falling from the sky, is zero, then there is no amount of evidence at all that can change my mind. I've basically guaranteed that I will learn nothing from new information.
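
A minimal sketch of that last point, with made-up likelihoods: a prior of exactly zero survives any amount of evidence unchanged, while even a tiny nonzero prior gets pulled upward.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """One Bayes update: P(H|E) from P(H), P(E|H) and P(E|~H)."""
    numerator = likelihood_if_true * prior
    return numerator / (numerator + likelihood_if_false * (1 - prior))

# Made-up evidence strength: E is 100x more likely if H is true than if it is false.
for prior in (0.0, 1e-6):
    p = prior
    for _ in range(5):  # five independent pieces of evidence of the same strength
        p = bayes_update(p, 0.9, 0.009)
    print(prior, "->", p)  # 0.0 stays exactly 0.0; 1e-6 climbs toward 1
```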
 
Are you going to say why?

If I am wrong I would really like you to explain why because otherwise I won't learn anything from this.

Okay, second try, doing the math correctly this time. Let's start with the law of total probability:

P(E) = P(E|H)P(H) + P(E|-H)P(-H).

We now assume that I have access to an event generator that will generate events independent of E with probability 0.5. This is not a big assumption, since I have access to a whole jar-full of them; they're called coins. H is therefore just an event "I flipped heads," and P(H) = P (-H) = 0.5.

Furthermore, since this event is by construction independent of E, P(E|H) = P(E) and P(E|H)P(H) = P(E)/2 = P(E|H)/2.

Similarly, P(E|-H) = P(E) and P(E|-H)P(-H) = P(E)/2 = P(E|-H)/2.

And since P(E|H)/2 = P(E)/2 = P(E|-H)/2, we have that P(E) = P(E|H) = P(E|-H).

Your claim is that, in the absence of any other information about P(E), we should assume that P(E|H) + P(E|-H) = 1.

But, mathematically, this works out to be that P(E|H) + P(E|H) = 1.0, which in turn means that P(E|H) = 0.5, which in turn means that P(E) = 0.5.

So your suggested constraint ends up being much stronger than the claim you rejected, namely that we cannot estimate P(E). We have now estimated it: if I have no information on it whatsoever, then P(E) must be equal to 0.5.

So you end up either supporting Malerin's formulation (that the a priori probability of God existing is 0.5), or you end up needing to reject your own overconstraint.
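
For anyone who wants to check the middle step numerically, a short simulation sketch (the 0.37 below is an arbitrary placeholder for P(E)): with an independent fair coin as H, both conditionals come out equal to P(E), so forcing them to sum to 1 leaves P(E) = 0.5 as the only possibility.

```python
import random

random.seed(0)

# Simulate an event E with an arbitrary placeholder probability alongside an
# independent fair coin H, then estimate the two conditionals from the sample.
p_e_true = 0.37  # placeholder; the argument works for any value
trials = [(random.random() < p_e_true, random.random() < 0.5) for _ in range(200_000)]

p_e_given_h = sum(e for e, h in trials if h) / sum(1 for _, h in trials if h)
p_e_given_not_h = sum(e for e, h in trials if not h) / sum(1 for _, h in trials if not h)

# Both conditionals estimate P(E), so P(E|H) + P(E|-H) = 1 would force P(E) = 0.5.
print(p_e_given_h, p_e_given_not_h)  # both roughly 0.37
```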
 
The reason for the 0.5 choice is that you learn the most from new information. Information captured in a biased prior estimate will bias your conclusions; to take an extreme example, if I assume that the prior probability of unicorns existing, or of rocks falling from the sky, is zero, then there is no amount of evidence at all that can change my mind. I've basically guaranteed that I will learn nothing from new information.

But a probability can't ever be zero, so at the very least your example is showing that if the prior estimate is very low then it will take a huge amount of evidence to change your mind.

But even that isn't necessarily true, because of the other prior. If P(H) is extremely low, and P(E) is near unity, then it makes sense that any amount of evidence won't change the result.

Suppose H is unicorns exist and E is a unicorn horn exists.

If P(H) is infinitesimal, and we know P(E|H) is 1, then it all depends on P(E), like we would expect. If P(E) is high, then we would expect that even if we find a unicorn horn it would still be unlikely that unicorns exist -- some other explanation, implicitly included in P(E), would be more likely. If P(E) is low enough -- that is, if the a priori odds of finding a unicorn horn sitting in a field are low enough -- then finding one would indeed suggest that a unicorn was here, i.e., P(H|E) would reflect this staggering evidence.
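
To put rough numbers on that (all of them made up purely for illustration): holding P(H) and P(E|H) fixed, Bayes' theorem P(H|E) = P(E|H)P(H)/P(E) swings exactly the way described as P(E) changes.

```python
def posterior(p_h, p_e_given_h, p_e):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

p_h = 1e-9         # made-up "infinitesimal" prior that unicorns exist
p_e_given_h = 1.0  # if unicorns exist, a horn certainly exists

print(posterior(p_h, p_e_given_h, p_e=0.5))   # ~2e-9: a commonplace horn proves little
print(posterior(p_h, p_e_given_h, p_e=1e-8))  # 0.1: a very unlikely horn is strong evidence
```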
 
That's what I mean. Any understanding of what it means to make a choice of p=0.5 has to take into account the situation where that is clearly the wrong choice. You suggested earlier that p=0.5 was a good choice in the face of no knowledge, but the question that someone attempting to understand your proposition would ask is "how can that be a good choice, when most of the time, once I have a little knowledge, it becomes a bad choice?" I'm trying to come up with a way to present it so it doesn't appear nonsensical - that it includes all the bad choices and all the good choices, and you are maximally uncertain as to where your choice falls.

Linda

If one or more of these are your objections, I think I can clear it up for you.

1) Choosing p=0.5 is nonsensical because you anticipate that after gaining just a little information, you will set p to a value far different than 0.5.
You don't yet have that information. More importantly, you don't know in which direction that information will sway you. If, for example, you think it's very likely that the new information will make you set p to a value very near 1, then that is prior knowledge you have, and it must be incorporated into your first guess, so that instead of 0.5 you choose 0.95 or something. For p=0.5 we specify that you have no prior knowledge. Note that having literally no prior knowledge happens approximately 0% of the time.


2) Choosing p=0.5 is nonsensical because at the time you choose p=0.5, if asked what the most likely result after 100 trials would be, you would not choose 50.
Make sure you are conducting your "trials" correctly. They must be independent. The correct question is "After being asked 100 independent binary questions for which I have the same information that I have for this question, how many will I get right?" Obviously if you're asked the same question, or a similar question, then the results will be correlated.


3) Choosing any value of p is nonsensical because it specifies a level of information you just don't have.
No it doesn't. Only if you interpret p to mean "and after getting more information, for some reason the p I chose is still relevant!" does it specify a level of information you don't have. As it is, your choice of p is simply the best current guess at how much you should believe one proposition or the other.
 
Below is a little diagram to explain the problem.

Theists believe P(E|~H) = 0.
Atheists believe P(E|~H) > 0.
Agnostics believe P(E|~H) >= 0.
 

Making a choice when you have no knowledge is almost always bad. The best option, if you have no knowledge, is not to make a choice at all. But if you are forced at gunpoint to make a choice when you have no knowledge, then the best of your choices, mathematically, is to choose 0.5 (or the least informed choice, more generally, which in this case is the maximum entropy choice).

Bayes' theorem is one of those situations where you are forced at gunpoint to make a choice; you cannot apply Bayesian reasoning without a choice of priors. The nice thing about Bayes' theorem is it allows you to measure the effect of increasing knowledge --- as you learn more, you refine your posterior probability estimate to get a better choice in light of increasing knowledge.

The reason for the 0.5 choice is that you learn the most from new information. Information captured in a biased prior estimate will bias your conclusions; to take an extreme example, if I assume that the prior probability of unicorns existing, or of rocks falling from the sky, is zero, then there is no amount of evidence at all that can change my mind. I've basically guaranteed that I will learn nothing from new information.

I think that's a reasonably good way to explain it. But then the conclusion one would draw is that the use of Bayes' theorem and its uninformative prior should be confined to situations where you have increasing knowledge - that is, absent information, its prior should be avoided (i.e. it can't serve as a guess for RandFan's scenario). And that when you find yourself in a situation where Bayes' theorem does not increase your knowledge (the likelihood ratio is derived intuitively (i.e. pulled out of your ass :)), rather than empirically), such as Malerin's fine-tuning scenario, a maximally uninformative prior should also be avoided.

For those reasons, I don't think RandFan's scenario proves Malerin's case, because I think the application of p=0.5 is unsupported in either case.

Linda
 
