• Quick note - the problem with Youtube videos not embedding on the forum appears to have been fixed, thanks to ZiprHead. If you do still see problems let me know.

The Birthday Problem/Paradox for Dummies

Foolmewunz

Grammar Resistance Leader, TLA Dictator
Joined
Aug 11, 2006
Messages
41,468
Location
Pattaya, Thailand
Okay, making as much fun of my lack of math comprehension as you choose, can anyone explain the probability of the birthday paradox in terms that someone who's math-challenged can understand.

Forget Wiki. I have a hangover and it's far too thick(or I am).
 
It's 1 minus the probability that all the people in the room have different birthdays.
On Planet X there are only 3 days in a year. If you have 2 people, what is the probability that they will have the same Day of Hatching?
What if you have 3 people?
If you can work that out, we can try more days in the year.
 
Last edited:
Quick definition of the problem:
Given a number of people (N), what is the probability that 2 of them will have the same birthday?

Well, with N=367 of them, you run out of different days, so that's the 100% point.
But the 50% point happens at only N=23! That's way less than 367/2!

How can that be?

The answer is fairly simple. Each person has to be compared to every other person to see if they have the same birthday. So it's not the number of people you need to pay attention to, it's the number of possible ways to make a single pair of people. That number grows very fast.

The explanation is a bit more complicated.

With 2 people, you have only 1 'pair' of people (1,2)
With 3 people, you have 3 'pairs' of people (1,2)(1,3)(2,3)
With 4 people, you have 6 'pairs' of people (1,2)(1,3)(1,4)(2,3)(2,4)(3,4)
With 23 people, you have a whopping 253 pairs of people!
The general formula for the number of pairs is to add up every counting number (1,2,3...) less than the number of people. That happens to be equal to N * (N-1) / 2, or (N^2-N)/2.
(Rule of thumb: Anything with an N^2 in it will grow faster than you expect.)

So, that's the possible number of pairs of people that *could* match birthdays. Given that number (We'll call it P, for Pairs), what's the probability that at least one pair shares a birthday? Mathematically, it turns out to be a bit easier to answer the opposite question: What's the probability that NONE of them share a birthday? That ends up being a simple multiplying of the probability that any two people don't share a birthday (1 - 1/365.2425 (Stupid leap years)) by itself a number of times equal to the number of pairs

probability of no matches = (364/365) ^ P

You either have no matches, or at least one match. Or, to put it mathematically:
probability of at least one match = 1 - probability of no matches
=1 - (364/365) ^ P

For 253 matches (that we get with only 23 people), that becomes:
1 - (364/365) ^ 253
which works out to be about 50%
 
Way back when I was at university studying math, a professor mentioned this. There were about 40 students, and he said he was willing to bet that at least 2 of us were sharing a birthday. So we pointed to the twins in the front row: "Yeah, it's those two".
 
Quick definition of the problem:

probability of no matches = (364/365) ^ P

You either have no matches, or at least one match. Or, to put it mathematically:
probability of at least one match = 1 - probability of no matches
=1 - (364/365) ^ P

For 253 matches (that we get with only 23 people), that becomes:
1 - (364/365) ^ 253
which works out to be about 50%

This may or may not help someone develop an intuition to why the birthday problem works out the way it does, but there's a flaw in the logic here.

You can't just multiply (364/365) by itself a bunch of times to calculate the probability of no matches. This only works for independent events.

Say you have three people, A, B, and C. (And we'll ignore leap year).

Take the first pair, A and B. The probability of no match here is 364/365.

Take the next pair, A and C. Probability of no match: 364/365.

Now take the pair B and C. You already know they don't share a birthday with A, so one day is ruled out completely. There's now a probability of 363/364 that this pair doesn't match.

The formula actually should be:

1 - 365! / [(365-n)! * 365^n]
 
Way back when I was at university studying math, a professor mentioned this. There were about 40 students, and he said he was willing to bet that at least 2 of us were sharing a birthday. So we pointed to the twins in the front row: "Yeah, it's those two".

:p:p

Hans
 
This problem is actually a brilliant example of attaching meaning after the event.

23 seems low because after you've found 2 people with the same birthday (or imagined that you have, if you're purely doing it as a thought experiment) your brain goes from "two completely random people having a birthday on a completely random day" to "these two people having a birthday on the xth day of month y". A specific two people having the same birthday is unlikely, two people having the same birthday on a specific date is unlikely, and a specific two people having the same birthday on a specific date is very unlikely.

Had you specified a couple and/or a date beforehand, then it'd be unlikely with 23. But the brain tends to attach significance to those two people after the fact when that's not warranted.

Hopefully that's clear. I've not really woken up yet, so probably shouldn't have embarked on writing an explanatory post.
 
Thanks, guys. I sat and did some of the math, and it actually checks out once I got an understanding of the "series" (if that's the right term) I was looking at. And it does come out to somewhere just over fifty persons to come out to a 99% chance that you'll find two people matching.

Now comes the difficulty of explaining it to someone. I trusted the fact that it worked, but I was making a dog's breakfast of the explanation.
 
Thanks, guys. I sat and did some of the math, and it actually checks out once I got an understanding of the "series" (if that's the right term) I was looking at. And it does come out to somewhere just over fifty persons to come out to a 99% chance that you'll find two people matching.

Now comes the difficulty of explaining it to someone. I trusted the fact that it worked, but I was making a dog's breakfast of the explanation.

I understand how it works as well, but have great difficulty explaining it. I just tried it with a daughter, and made a hash of it.
 
You can also think of the problem as asking each person in a row what their birthday is, then comparing that to the list of 'taken' days (birthdays already claimed by others). What is the probability that a person would manage to name one that hadn't been claimed by someone else?

Person 1: 365/365
Person 2: 364/365
Person 3: 363/365
and so on. And to figure out the cumulative probability you multiply them all together. Even relatively low probabilities tend to accumulate quickly.
 
The formula actually should be:

1 - 365! / [(365-n)! * 365^n]

As a tangent, I'll point out that 365! is a gigantic number, much larger than a googol (10100). As a result, you can't easily work this out on most hand calculators (as written), even though the final answer is always a number between 0 and 1.
 
The formula actually should be:

1 - 365! / [(365-n)! * 365^n]

As a tangent, I'll point out that 365! is a gigantic number, much larger than a googol (10100). As a result, you can't easily work this out on most hand calculators (as written), even though the final answer is always a number between 0 and 1.
With n=23, Cabbage's formula produces
Code:
38093904702297390785243708291056390518886454060947061/75091883268515350125426207425223147563269805908203125

I'm not sure that much accuracy is really necessary, especially with a formula that doesn't take leap years into account.
 
This is a great example showing how useful it can be to recognize that ...

P = 1 - (~P)

Seems so simple/obvious that initially one questions why state something so plain and simple. Then you show them this thread's problem.
 
With n=23, Cabbage's formula produces
Code:
38093904702297390785243708291056390518886454060947061/75091883268515350125426207425223147563269805908203125

I'm not sure that much accuracy is really necessary, especially with a formula that doesn't take leap years into account.

I worked out that formula well before I was aware of stuff like Stirling's approximation, so the fact that I had no idea how big 365! was was a real issue.

There are much bigger sources of error anyway, like the fact that birthdays aren't distributed randomly, and twins.
 
Don't large factorials have a long string of zeros at the tail end?
Yes.

1! has 0 zeros at the end.
10! has 2 zeros at the end.
100! has 24 zeros at the end.
1000! has 249 zeros at the end.
10000! has 2499 zeros at the end.
100000! has 24999 zeros at the end.

Did you have some other factorial in mind?
 
Don't the number of zeros at the end of a power-of-ten factorial have a long string of nines at the end?

Yes, indeedy.

~~ Paul
 

Back
Top Bottom