
Why Malerin is Wrong About Bayes' Theorem

rocketdodger

Malerin contends that it is valid to use an unconditional value of 0.5 for God existing and 0.1 for life existing in his analysis.

It is not valid, and this is why.

Malerin agrees that in Bayes' Theorem, P(H|E) = P(E|H)P(H)/P(E), the denominator P(E) can be replaced with P(E|H)P(H) + P(E|~H)P(~H).

What Malerin is missing is the fact that since we exist we know P(E|H) + P(E|~H) must sum to 1. That is, we know for sure that P(E|H) + P(E|~H) == 1.

What happens when we follow this through?

P(E) = P(E|H)P(H) + P(E|~H)P(~H) -->
P(E) = P(E|H)P(H) + (1 - P(E|H))P(~H) -->
P(E) = P(E|H)P(H) + P(~H) - P(E|H)P(~H)

Now, Malerin's whole argument is contingent upon a known value of 1.0 for P(E|H). This is fine, since we can assume that if God did exist then we would exist 100% of the time as well. But if we use that value we arrive at

P(E) = 1.0 * 0.5 + 0.5 - 1.0 * 0.5 -->
P(E) = 0.5 + 0.5 - 0.5 -->
P(E) = 0.5

In other words, because we know we exist we cannot arrive at any valid estimates for P(E) and P(H) that are independent of each other. We can't just pull numbers out of our backsides. In particular, since we actually know the dependence, if Malerin wants to say P(H) is an "agnostic" 0.5 then this implies P(E) must also be an "agnostic" 0.5.

This should be intuitively true to most who understand statistics -- you can't estimate an unconditional probability using conditioned evidence and expect to learn anything.
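
A quick numerical check of the algebra above, as a minimal sketch in Python (the 0.5 prior and the 1.0 likelihood are the assumed values from the argument, not established figures):

# Sketch: plug the disputed constraint P(E|H) + P(E|~H) == 1 into the expansion of P(E),
# using the assumed "agnostic" prior P(H) = 0.5 and P(E|H) = 1.0.
p_h = 0.5                              # assumed prior that God exists
p_e_given_h = 1.0                      # assumed: if God exists, we exist for certain
p_e_given_not_h = 1.0 - p_e_given_h    # the disputed constraint from this post

p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)
print(p_e)   # 0.5, reproducing the result derived above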
 
What Malerin is missing is the fact that since we exist we know P(E|H) + P(E|~H) must sum to 1. That is, we know for sure that P(E|H) + P(E|~H) == 1.

I don't see how that follows.

Suppose I hear that my friend just gave birth.

Let H = It's a girl!
Let E = The child's name is Sue

Rough guesstimates: Pr(E|H) = .001 (there are lots of girls' names, so even if the child is a girl, it's not very likely her name is Sue)

Pr (E|~H) = .00001 (rabid Johnny Cash fans aside, not many parents name their boys "Sue")

Pr(E|H) + Pr (E|~H) = .001 + .00001 = .00101

(Note that there is a "useful" application of Bayes' Theorem here, as the knowledge that the child's name is Sue allows me to conclude with a high degree of certainty that the child is female.)

I'm not defending Malerin's overall approach, but I think this particular critique is misplaced.
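
A quick check of the numbers in this example, as a minimal sketch in Python (the 0.5 prior for a girl is an added assumption, not stated above):

# Sketch: posterior probability that the child is a girl, given the name Sue.
p_h = 0.5                  # added assumption: roughly half of newborns are girls
p_e_given_h = 0.001        # guesstimate above: a girl is named Sue
p_e_given_not_h = 0.00001  # guesstimate above: a boy is named Sue

p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)
print(p_e_given_h * p_h / p_e)   # about 0.99: the name makes "girl" very likely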
 
I don't see how that follows.

Suppose I hear that my friend just gave birth.

Let H = It's a girl!
Let E = The child's name is Sue

Rough guesstimates: Pr(E|H) = .001 (there are lots of girls' names, so even if the child is a girl, it's not very likely her name is Sue)

Pr (E|~H) = .00001 (rabid Johnny Cash fans aside, not many parents name their boys "Sue")

Pr(E|H) + Pr (E|~H) = .001 + .00001 = .00101

(Note that there is a "useful" application of Bayes' Theorem here, as the knowledge that the child's name is Sue allows me to conclude with a high degree of certainty that the child is female.)

I'm not defending Malerin's overall approach, but I think this particular critique is misplaced.

You can arrive at unconditional estimates in your example even if you know the outcome already because the outcome does not affect the unconditional probabilities. In the case of existence, however, that is impossible -- since we exist, and could not estimate an unconditional probability if we did not exist, then any estimate is conditioned upon our existence and hence there is no notion of an unconditional probability to begin with from our point of view.

Specifically, you are not aware of a dependence between P(E|H) and P(E|~H) in your example of the child who is named Sue. In Malerin's case, we are. Since we exist, we are guaranteed that P(E|H) + P(E|~H) == 1.0.
 
You can arrive at unconditional estimates in your example even if you know the outcome already because the outcome does not affect the unconditional probabilities. In the case of existence, however, that is impossible -- since we exist, and could not estimate an unconditional probability if we did not exist, then any estimate is conditioned upon our existence and hence there is no notion of an unconditional probability to begin with from our point of view.

Specifically, you are not aware of a dependence between P(E|H) and P(E|~H) in your example of the child who is named Sue. In Malerin's case, we are. Since we exist, we are guaranteed that P(E|H) + P(E|~H) == 1.0.

I still don't see how that follows. What "dependence" are you talking about? And why does it mean that P(E|H) and P(E|~H) must sum to one? Why can't you just as easily say that "since we know the child is named Sue, then the conditional probabilities must sum to one"?

Edited to change "zero" to "one" in above paragraph, and to add: I reread your post and it seems that you're suggesting a sort of anthropic principle, that since we exist we cannot come up with conditional probabilities. But that's simply a way of questioning the reasonableness of the estimates from the fine-tuning argument, and I still don't see how that creates a requirement that the two probabilities sum to one.
 
I reread your post and it seems that you're suggesting a sort of anthropic principle, that since we exist we cannot come up with conditional probabilities. But that's simply a way of questioning the reasonableness of the estimates from the fine-tuning argument, and I still don't see how that creates a requirement that the two probabilities sum to one.

Yes that is what I am saying.

I have to think about my reasoning before I respond, I will do so by tomorrow hopefully.
 
What Malerin is missing is the fact that since we exist we know P(E|H) + P(E|~H) must sum to 1. That is, we know for sure that P(E|H) + P(E|~H) == 1.

We do?

Let H = This is a coin that always lands heads
Let E = A toss that results in heads

P(E|H) = 1 (Given a coin that always lands heads, heads must always result)
P(E|~H) = .5

~H simply means the coin does NOT always land heads. That doesn't mean you can't get heads. The coin can still land heads half the time, 1/4 of the time, never, 90% of the time, etc. It just can't land heads ALL THE TIME.

1 + .5 = 1.5

Try it a different way:

H = God (defined as a being that creates life supporting universes) exists.
E = A life-supporting universe exists

Pr(E|H) = 1 (If such a God exists, there's going to be a life-supporting universe).
Pr(E|~H) = .0000000001 (Given such a God does not exist, there could still be a life-supporting universe through naturalistic processes or random chance).

Again, the sum is over 1.
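
A quick check of both examples, as a minimal sketch in Python (it only verifies the sums quoted above):

# Sketch: the two likelihoods P(E|H) and P(E|~H) need not sum to 1.
# Coin example:
p_e_given_h = 1.0       # a coin that always lands heads must land heads
p_e_given_not_h = 0.5   # e.g. a fair coin, one of many "not always heads" cases
print(p_e_given_h + p_e_given_not_h)   # 1.5

# Universe example:
p_e_given_h = 1.0          # if such a God exists, a life-supporting universe follows
p_e_given_not_h = 1e-10    # the guesstimate above for a life-supporting universe by chance
print(p_e_given_h + p_e_given_not_h)   # 1.0000000001, again over 1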
 
I don't see how that follows.

Suppose I hear that my friend just gave birth.

Let H = It's a girl!
Let E = The child's name is Sue

Rough guesstimates: Pr(E|H) = .001 (there are lots of girls' names, so even if the child is a girl, it's not very likely her name is Sue)

Pr (E|~H) = .00001 (rabid Johnny Cash fans aside, not many parents name their boys "Sue")

Pr(E|H) + Pr (E|~H) = .001 + .00001 = .00101

(Note that there is a "useful" application of Bayes' Theorem here, as the knowledge that the child's name is Sue allows me to conclude with a high degree of certainty that the child is female.)

I'm not defending Malerin's overall approach, but I think this particular critique is misplaced.

That's a good one. I went the other way (sums over 1).
 
since we exist we know P(E|H) + P(E|~H) must sum to 1.
buh buh what?

This is just very, very wrong. I can't even figure out what went wrong in your head to make you think it is right. Can you tell us what you think E and H are so we can help you figure out your mistake?
 
Not to mention, Malerin's assumption that "agnostic" means the probability of God's existence is 0.5 is just plain silly. Agnostic means we don't know, not that we can just arbitrarily assign values.

For example, I asked Malerin in another thread if this also meant the "agnostic" probability of the existence of the Flying Spaghetti Monster was 0.5 - he/she said yes.

 
I think the better question is what would inspire RD to create a separate thread about a mistake he thinks I made?
 
buh buh what?

This is just very, very wrong. I can't even figure out what went wrong in your head to make you think it is right. Can you tell us what you think E and H are so we can help you figure out your mistake?

H = A creator of life supporting universes
E = A life supporting universe

My argument is that because we are part of a life supporting universe, any estimates we generate are already conditioned, i.e. we can't generate a valid unconditional probability for existence.

I am not claiming we can't generate valid estimates for conditional probabilities.
 
I think the better question is what would inspire RD to create a separate thread about a mistake he thinks I made?

Your complete lack of responses to valid arguments brought against your points in other threads.
 
I have to think about my reasoning before I respond, I will do so by tomorrow hopefully.

My reasoning is that because we only have a single event to gather data from -- our own universe -- it is invalid to assume the conditional probabilities sum to anything other than 1.0 for a life supporting universe.

To illustrate, an example:

Let H = a person who chooses a number
Let E = a number X is chosen by some process

Suppose we are given information that X was indeed chosen at least once. Absent any other information, what can we tell?

We can't really estimate a prior P(E) because we have literally no idea what the mechanics of the choice process are. Assume we can pull any value for P(H) out of our backsides.

We do know that X was chosen at least once conditioned on there either being or not being a person doing the choosing. So we know the sum P(E|H) + P(E|~H) is greater than zero. But we don't have knowledge of any other events nor information that could help us estimate a probability of other events. All we have is that single event. Thus P(E|H) + P(E|~H) == 1.0.

As you suggested, Dunstan, this is more along the lines of questioning the values used in the fine-tuning argument, but I think (assuming I am not wrong) it illustrates why it is pointless to apply statistical analysis to single events absent any other information.
 
I suspect the probability of God existing shrinks every time He/She/It is no longer needed to explain something. And, it is probably well below .5, by now.

One might have more success in arguing that the probability of life existing, in the specific configurations we know, is relatively small. Maybe .1. Maybe even lower.

However, we could also say that the probability of any life-like entities existing, and evolving to the point where they develop something like the Internet, could well be close to 1, given the trillions of different possible ways that could happen, and the limits under which physics seems to work.
 
There is either a million dollars in my closet or there is not. That's a .5 chance. If it is there I'll give it to anyone who will give me $100. You have to pay me before I look though. Those are damn good odds. $100 for a 50/50 chance to win $1,000,000.

Any takers?
 
To illustrate, an example:

Let H = a person who chooses a number
Let E = a number X is chosen by some process

Suppose we are given information that X was indeed chosen at least once. Absent any other information, what can we tell?

[...]

We do know that X was chosen at least once conditioned on there either being or not being a person doing the choosing. So we know the sum P(E|H) + P(E|~H) is greater than zero.

Yes, I agree.

The only way for the sum to be zero is for both terms to be zero. That would mean that E is impossible supposing H true, and also impossible supposing H false. As H is either true or false, E would simply be impossible. But that can't be, because E happened.

But we don't have knowledge of any other events nor information that could help us estimate a probability of other events. All we have is that single event. Thus P(E|H) + P(E|~H) == 1.0.

I don't see how that follows.

My guess is that you wrote "P(E|H)" but thought "P(E and H)", and likewise you wrote "P(E|~H)" but thought "P(E and ~H)".

P(E and H) + P(E and ~H) is 1, because the left side equals P(E), and E is known with certainty to have happened.
 
My guess is that you wrote "P(E|H)" but thought "P(E and H)", and likewise you wrote "P(E|~H)" but thought "P(E and ~H)".

P(E and H) + P(E and ~H) is 1, because the left side equals P(E), and E is known with certainty to have happened.

Even that's not true, because not all events that happen happen with probability 1.

If I roll a die and get a four, does that mean that P(rolling a four and it is sunny) + P (rolling a four and it is not sunny) = 1? Of course not, since P(rolling a four) is known a priori to be one in six.
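
A numerical version of the die example, as a minimal sketch in Python (the 0.3 chance of sun is an arbitrary illustrative number):

# Sketch: P(E and H) + P(E and ~H) equals P(E), not 1, even though E happened.
p_four = 1.0 / 6.0   # prior probability of rolling a four
p_sunny = 0.3        # arbitrary illustrative probability of sun, independent of the roll

p_four_and_sunny = p_four * p_sunny
p_four_and_not_sunny = p_four * (1.0 - p_sunny)
print(p_four_and_sunny + p_four_and_not_sunny)   # 1/6, not 1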

I'm afraid that I have to back Malerin on this one. While it's true that the prior estimate of the probability of God's existence will more or less have to be pulled out by a proctologist, to use Bayes' theorem at all requires that you pull out a prior estimate. Appropriate practice, therefore, is not to choose an arbitrary prior, but to choose an uninformative prior, basically a prior that is maximally sensitive to the new evidence.

As to what that prior is... well, there are a number of different mathematical camps, but the one that is the most computationally tractable is to choose the distribution with maximum entropy (and therefore maximum uncertainty). In the case of a binary, yes/no, decision, that is the distribution where p(yes) = 0.5, regardless of the question.

Do all gloxenpfeffers hyperbolicate? I don't even know what the word means, so I might as well guess "yes" with probability 1/2.

Here's Wikipedia's take on the same question:

Another idea, championed by Edwin T. Jaynes, is to use the principle of maximum entropy (MAXENT). The motivation is that the Shannon entropy of a probability distribution measures the amount of information contained in the distribution. The larger the entropy, the less information is provided by the distribution. Thus, by maximizing the entropy over a suitable set of probability distributions on X, one finds that distribution that is least informative in the sense that it contains the least amount of information consistent with the constraints that define the set. For example, the maximum entropy prior on a discrete space, given only that the probability is normalized to 1, is the prior that assigns equal probability to each state. And in the continuous case, the maximum entropy prior given that the density is normalized with mean zero and variance unity is the standard normal distribution.
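
As a small illustration of the maximum-entropy point, a sketch in Python (not part of the post above; it just shows that for a binary question the Shannon entropy peaks at p = 0.5):

# Sketch: entropy of a Bernoulli(p) distribution is maximized at p = 0.5.
import math

def bernoulli_entropy(p):
    # Shannon entropy in bits; 0*log(0) is taken as 0 by convention
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(bernoulli_entropy(p), 4))
# p = 0.5 gives 1.0 bit, the maximum: the MAXENT "uninformative" prior for a yes/no question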
 
There is either a million dollars in my closet or there is not. That's a .5 chance. If it is there I'll give it to anyone who will give me $100. You have to pay me before I look though. Those are damn good odds. $100 for a 50/50 chance to win $1,000,000.

Any takers?

No, because I have other information based on other closets and other people.

In a sense, you're actually proving Malerin's case for him. If you knew nothing about money and closets, then it would be a good bet for you. You know it's a bad bet because you have an informed prior. Really, it's no different than your offering to play craps with me using a pair of dice that you know are weighted and I don't.
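
A small expected-value sketch of why the prior matters for this bet (the one-in-a-million informed prior is an invented illustrative number):

# Sketch: expected profit of the $100 closet bet under two different priors.
stake = 100.0
prize = 1_000_000.0

def expected_profit(p_money_in_closet):
    return p_money_in_closet * (prize - stake) - (1 - p_money_in_closet) * stake

print(expected_profit(0.5))    # +499,900: a great bet under the naive "either/or" 0.5 prior
print(expected_profit(1e-6))   # about -99: a near-certain loss under an informed prior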
 
No, because I have other information based on other closets and other people.

In a sense, you're actually proving Malerin's case for him. If you knew nothing about money and closets, then it would be a good bet for you. You know it's a bad bet because you have an informed prior. Really, it's no different than your offering to play craps with me using a pair of dice that you know are weighted and I don't.

And of course anybody who thinks about this realizes that it makes no sense. How can ignorance actually improve your chances?

I think the key to making this understandable is to focus on the "maximum uncertainty" part, rather than the p=0.5 part. Choosing a distribution which maximizes your uncertainty means that when you are ignorant, you are maximally uncertain as to whether it's a good bet until you have new information. That is, the choice of your distribution allows you to specify a probability of 0.5, but you are maximally uncertain as to whether it's a good choice. Information starts to reduce that uncertainty.

Linda
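
To illustrate the point that information reduces the initial maximum uncertainty, a sketch with a uniform Beta(1,1) prior (the yes/no observations are invented for illustration):

# Sketch: start from a maximally uncertain uniform prior on a yes/no probability,
# then watch the posterior uncertainty (variance) shrink as evidence arrives.
def beta_mean_var(a, b):
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, var

a, b = 1, 1                 # Beta(1,1) is the uniform prior: mean 0.5, maximum uncertainty
print(beta_mean_var(a, b))  # (0.5, 0.0833...)

for outcome in [1, 1, 0, 1, 1, 1, 0, 1]:   # invented observations (1 = "yes")
    a, b = a + outcome, b + (1 - outcome)

print(beta_mean_var(a, b))  # mean tracks the data, variance is much smaller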
 
