Probability question help

With 9 slots you want to compute (7/9)^28 * 36, which is .0316.
9 events, not 9 slots. :)

The problem with using 9 events is that 9<10, so the only way to have exactly two empty slots is to have the first 8 events placed in different spots and the last event placed in an already occupied slot.

Wouldn't the probability of the first 8 getting different slots be 10/10 x 9/10 x 8/10 x 7/10 x 6/10 x 5/10 x 4/10 x 3/10, and the probability of the last event hitting an already occupied slot be 8/10 for a product of .0145? Multiplying that times the 45 ways gives .652 which sounds like a reasonable number.

Raising (8/10) to a power produces too large of a number because it includes the possibility that all events fell into just 7 or just 6 or just 5 (etc.) categories.
This is not the case. You are correct about using 10/10 * 9/10 ... 3/10 * 8/10, but after that there is no need to multiply by 45. The problem with the example of using only 9 events isn't that 9<10, since 11 events yields (8/10)^11*45 = 3.865470566.
 
I wasn't clear enough in my post. (8/10)^n is always wrong. One must start with the 10/10 * 9/10 * 8/10... and then multiply by (8/10)^(n-8). So 11 events is .0181 * (8/10)^3 or .0181 * .512 = .00927. This result must be multiplied by 45 because if it is not then one ends up with an answer far too small to make sense. Randomly dropping 11 items into 10 slots will result in exactly two of the slots being empty much more than ~1 % of the time.
 
So to answer the original questions:

exactly two empty slots: .00939 or just under 1%

exactly three empty slots:
(10/10 * 9/10 * 8/10 * ... * 4/10) * (7/10)^21 * 120 =

.0605 * .000559 * 120 = .00405 or just under one half of one percent.


I'll take a stab at the others later. I'm off to work right now.
 
1. What is the probability that if you randomly drop 28 unique events into 10 different slots,

2 slots would have 0 events,
3 slots would have 0 events,

2. What is the probability that if you randomly drop 28 unique events into 10 different slots,

3 slots would have a total of 18 or more events,
2 slots would have a total of 18 or more events,
Unique events? I'm pretty sure you want indistinguishable events here, so that's what I'll go with.

Whether or not these questions are actually related to whatever you want to calculate, here's how to find the answers: first count the ways the desired thing can occur, then divide by the number of ways anything can occur. The desired thing: choose K slots out of 10 to be empty, put one event in each of the other 10-K slots, then use the formula for combinations with repetition to distribute the remaining 28-(10-K) events among those 10-K slots. In general: use the formula for combinations with repetition to put 28 events in 10 slots.

Exactly 2 slots with 0 events:
[latex]$$\frac{{{10}\choose{2}}{{[28-(10-2)]+(10-2)-1}\choose{(10-2)-1}}}
{{{28+10-1}\choose{10-1}}}=0.32122$$[/latex]

Exactly 3 slots with 0 events:
[latex]$$\frac{{{10}\choose{3}}{{[28-(10-3)]+(10-3)-1}\choose{(10-3)-1}}}
{{{28+10-1}\choose{10-1}}}=0.28553$$[/latex]

The other two are much harder analytically, but I've run a program to find out the answers for the specific questions you've asked: the probability of having 3 slots that have at least 18 events is 88532120/124403620=0.71165, and of having 2 slots that have at least 18 events is 22676090/124403620=0.18228.

Edit: Haha, nope. This assumes that each configuration is equally likely, but like you said, there's only one way to put all 28 events in one slot, but lots of ways to put 14 each in two slots. Ignore the above.
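
A small worked count makes the retraction concrete, assuming the 28 events are distinguishable and each lands in a uniformly random slot. The number of assignments that give the slots c_1, ..., c_10 events is the multinomial coefficient 28!/(c_1! ... c_10!), so occupancy patterns are far from equally likely:

[latex]$$\frac{28!}{28!\,0!\cdots 0!}=1 \qquad\text{versus}\qquad \frac{28!}{14!\,14!\,0!\cdots 0!}=\binom{28}{14}=40116600$$[/latex]

Every individual assignment has probability 10^-28, so a particular 14/14 split is about 40 million times more likely than all 28 events landing in one particular slot, whereas the combinations-with-repetition count treats the two patterns as equally likely.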
 
Good point. I'll think about it some more.

Beth, you're a professional at this yes?

What does chi-squared say about that particular distribution?

(Yes, I could probably run the test myself, but why re-invent the wheel?)
 
I did a chi-square test on your data, assuming the following
  • it's based on 200 years (20 each of years ending in 0-9)
  • whether or not there was a recession in any given year is not dependent on there being a recession or not in any other given year
So, given 28 events in 200 years, there is an expected value of 2.8 events and 17.2 non-events for each final digit. Using the CHITEST function in Excel, I get:
  • Chi-Square (with 9 degrees of freedom) = .00326
  • at 9 df, the critical value of Chi-Square (p=.05) = 16.92, thus we can't reject the null hypothesis that there is no significant difference between years in the data set you provided.
Comments-
When you look at just the table of observed values presented in the OP, it looks like there is something going on, with some years getting 0 and some getting 6 or 7. However, when you look at what was observed vs. what is expected given the entire population of data (i.e. you'd expect 2.8 events per final digit), the observed values don't seem as "non-random".
That said, I am not sure that the assumption of independence is valid (i.e. that an event in one year has no "influence" on an event in another). This would depend on the average length in years of economic recessions. If recessions generally last only a year or less, then an event in a year ending in 0 would generally preclude there being an event in the following year ending in 1. This might explain why, in the observed data from the OP, there are not any consecutive final digits with high numbers of events.

Other than the possible violation of independence, I think the non-parametric chi-square is an appropriate test for answering the general question "are these data different than what you would expect given no difference between years?"

ETA- It has since been pointed out in the following post that chi-square is not appropriate when the expected value for any cell is less than 5, so read the above with caution and disdain!
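
For anyone who wants to redo the arithmetic by hand, the statistic behind this kind of test can be sketched as follows. This is a minimal sketch, not the Excel computation itself (CHITEST reports a p-value rather than the statistic). The function names `chi_square_by_digit` and `expected_counts_ok` are made up for illustration, every final digit is assumed to cover the same number of years, and the observed counts from the OP are not reproduced here, so the caller has to supply them.

```c
#include <assert.h>
#include <stddef.h>

/* Pearson chi-square statistic for a 2 x ndigits table:
   row 1 = years with an event, row 2 = years without, per final digit.
   Assumption: every digit covers the same number of years. */
double chi_square_by_digit(const int *events, size_t ndigits, int years_per_digit)
{
    assert(events != NULL && ndigits > 0 && years_per_digit > 0);

    int total = 0;
    for (size_t d = 0; d < ndigits; d++) {
        assert(events[d] >= 0 && events[d] <= years_per_digit);
        total += events[d];
    }

    /* Expected counts under "no difference between digits":
       2.8 and 17.2 for 28 events, 10 digits, 20 years per digit. */
    double exp_event = (double)total / ndigits;
    double exp_none = years_per_digit - exp_event;
    assert(exp_event > 0.0 && exp_none > 0.0);

    double chi2 = 0.0;
    for (size_t d = 0; d < ndigits; d++) {
        /* The event and non-event cells deviate by the same amount. */
        double diff = events[d] - exp_event;
        chi2 += diff * diff / exp_event + diff * diff / exp_none;
    }
    return chi2;
}

/* Rule of thumb quoted in the thread: every expected count should be >= 5.
   Returns 1 if the rule is satisfied, 0 otherwise. */
int expected_counts_ok(int total_events, size_t ndigits, int years_per_digit)
{
    double exp_event = (double)total_events / ndigits;
    double exp_none = years_per_digit - exp_event;
    return exp_event >= 5.0 && exp_none >= 5.0;
}
```

With 10 digits the statistic has 9 degrees of freedom and would be compared against the 16.92 critical value quoted above. `expected_counts_ok(28, 10, 20)` returns 0, because the expected 2.8 events per digit is below 5.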
 
The chi-squared test is inappropriate here because the expected value in each cell would be 2.8 and the chi-squared test should only be used when the expected value is 5 or more in each cell.

Sorry about the erroneous computations I posted Saturday. I wasn't, ahem, completely sober. I should know better than to attempt to make computations in that condition.
 
beth said:
The chi-squared test is inappropriate here because the expected value in each cell would be 2.8

Thanks beth, I knew I was probably wrong somewhere. It's been at least 15 years since my last chi-square (does that make me cooler or more of a dork?). Is there a more appropriate non-parametric test for this type of data?

What about the assumption of independence? Does the possible lack thereof affect some of the probability calculations done in the other posts?
 
Well, it's got too many cells for Fisher's exact test. What is an appropriate test, then?

My point -- and I think that jskowron backs me up on this -- is that by plotting the data first and then looking at what kind of odd pattern emerges implicitly cherry-picks. What needs to be established first is if the data, taken as a whole, looks unusually non-uniform.

If the number in the cells is too small, can we turn the problem around? For the same 200 year set, figure out if the number of years in which there isn't a recession is unusually non-uniform?
 
drkitten said:
If the number in the cells is too small, can we turn the problem around? For the same 200 year set, figure out if the number of years in which there isn't a recession is unusually non-uniform?

That would still give an expected value of <5 in the same number of cells and would not change the chi-square value.
 
No problem. Otherwise your analysis was on target.

I'm sure that there is an appropriate test, but I'd have to do some research - i.e. actually look stuff up in books I haven't read in years - before I could say which it is.
 
If you don't care about the closed form of the equation, here is the solution.
Let P(n, f, k) be the probability that, with n slots and k events, exactly f slots are filled.
Let C(1, 1) = 1
Let C(0, k) = 0, and C(i, 1) = 0 for i > 1
Let C(i, k) = C(i-1, k-1) + i * C(i, k-1)
then P(n, f, k) = C(f, k) * (n!/(n-f)!)/(n^k)

In the case of f = 2, this reduces to the closed form
P(n, 2, k) = (2^(k-1) - 1) * (n-1)/(n^(k-1))

This is for exactly 2 being filled at the end, not 2 being empty. The general case is not in closed form; a friend and I are having some difficulty getting it there. We have written a computer program that computes P(n, f, k) from the above recurrence and then plays the game at random 1,000,000 times. The empirical frequencies match what the formula predicts, but we cannot put it in closed form, so if anyone can do that for all cases, that would be nice.
Here is the program:
Code:
#include <stdio.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

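// Simulate one game: drop k events into n slots at random and report
// whether exactly f slots ended up filled.
bool playgame(int n, int f, int k)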
{
	bool* val = (bool*)(malloc(n*sizeof(bool)));
	
	for(int x = 0; x < n; x++)
		val[x] = false;
	
	for(int x = 0; x < k; x++)
		val[lrand48()%n] = true;
	
	int total = 0;
	for(int x = 0; x < n; x++)
	{
		if(val[x])
			total++;
	}
	
	free(val);
	return total==f;
}

double runplaygame(int n, int f, int k)
{
	int wins = 0;
	int max = 1000000;
	for(int x = 0; x < max; x++)
	{
		if(playgame(n, f, k))
			wins++;
	}
	
	return (double)wins / (double)max;
}

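// Evaluate C(f, k) / n^k with the recurrence above, using a single row
// of storage that is updated in place.
double solve_c(int n, int f, int k)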
{
	//Allocate array
	double* c_mem = (double*)malloc((f + 1) * sizeof(double));
	
	//Initialize array
	for(int i=0; i<=f; i++)
		c_mem[i] = 0.0;
	c_mem[1] = pow((double)n, -(double)(k));
	
	//Compute result
	for(int x=1; x<k; x++)
	for(int i=f; i>0; i--)
	{
		c_mem[i] = c_mem[i-1] + i * c_mem[i];
	}
	
	double res = c_mem[f];
	
	free(c_mem);
	
	return res;
}

// P(n, f, k): multiply by n * (n-1) * ... * (n-f+1), i.e. n!/(n-f)!.
double solve_p(int n, int f, int k)
{
	double c = solve_c(n, f, k);
	
	for(int i=n; i > n-f; i--)
		c *= (double)i ;
	
	return c;
}

void printComparison(int n, int f, int k)
{
	double actual = solve_p(n, f, k);
	double experiment = runplaygame(n, f, k);
	
	printf("actual: %f, experiment: %f\n", (float)actual, (float)experiment);
}

int main(int argc, char** argv)
{
	
	printComparison(10,1,1);
	printComparison(10,2,6);
	printComparison(10,8,28);
	printComparison(5,3,10);
}
Here is the output:
Code:
actual: 1.000000, experiment: 1.000000
actual: 0.002790, experiment: 0.002783
actual: 0.071248, experiment: 0.071200
actual: 0.057324, experiment: 0.057504
 
As if there wasn't enough confusion in this thread already, I get a different answer for the first problem. The chances that there are exactly two slots open is 0.0712. I used a Markov Chain analysis. Since that involves taking a 10x10 matrix to the 28th power, it's kind of hard to show my work. Doing a quick simulation in Excel gets me 0.0703, leading me to believe I'm on the right track.
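
The Markov chain idea can be sketched without an explicit matrix power: track the distribution of the number of occupied slots and apply one transition per event. This is a minimal sketch, assuming each event lands in a uniformly random slot independently of the others. `prob_empty_markov` is a hypothetical name, and the state numbering (0 to n occupied slots) may differ from the 10x10 matrix used in the post.

```c
#include <assert.h>
#include <stdlib.h>

/* Probability that exactly `empty` of `n` slots are still empty after
   `k` events. State j = number of occupied slots. One event keeps the
   chain in j with probability j/n, or moves it to j+1 with probability
   (n-j)/n. Returns -1.0 if memory allocation fails. */
double prob_empty_markov(int n, int k, int empty)
{
    assert(n > 0 && k >= 0 && empty >= 0 && empty <= n);

    double *dist = calloc((size_t)n + 1, sizeof *dist);
    if (dist == NULL)
        return -1.0;
    dist[0] = 1.0;                      /* before any event, no slot is occupied */

    for (int step = 0; step < k; step++) {
        /* Update from high j to low j so dist[j-1] still holds the
           value from before this event. */
        for (int j = n; j >= 1; j--)
            dist[j] = dist[j] * j / n + dist[j - 1] * (n - j + 1) / n;
        dist[0] = 0.0;                  /* state 0 cannot survive an event */
    }

    double result = dist[n - empty];
    free(dist);
    return result;
}
```

`prob_empty_markov(10, 28, 2)` comes out at about 0.0712, in line with the figure quoted in this post.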
 
I did a quick simulation in SAS. I get a probability of .07 for 2 open slots and .005 for 3 open slots. I think you are on the right track!
 
I believe the answers to the first question are .002 and .000046.

After that it gets hard. Hell, I might be wrong about this.

You may not have enough points for this to be significant. Also, don't forget to try the tens digit of the year.

~~ Paul

Upon further review, I believe that this initial analysis was indeed correct. We have all over-complicated the problem. All that is required is conventional probability analysis. Each 2- and 3-year period must contain probabilities for a number of events ranging from 0 to 28 that sum to 100 percent. Thus, on average, the probability of a given event is 100% divided by 29, which equals 3.45%. The higher probabilities will cluster around 5.6 events for a 2-year period (2.8 times 2) and 8.4 events for a 3-year period (2.8 times 3). Zero events is far below the average, and 18 events is even farther above the average. So, the probabilities would be quite low for zero events and even lower for 18 events. Specifically, for 18 events, the probabilities, according to my calculation, are only .000000037 for a 2-year period and .0000145 for a 3-year period.
 
That is the same answer that I got for 28 events in 10 slots with exactly 2 open.
 
I think I have a (somewhat) closed form solution:

n = number of slots
k = number of events
m = number of open slots
F(n,k,m) = probability function

[latex]F(n,k,m) = \sum_{i=0}^n(-1)^{i+m-1}\binom{n}{i}\binom{i}{m}(1-i/n)^k[/latex]

The starting point came from GreedyAlgorithm's formula in this thread. He came up with a closed formula for when all slots are filled. It's then a matter of using a recursion formula to work downwards from there.
 
Does that formula work?

The idea is right, in any case. Inclusion-exclusion is the key.

Here's what I get (but my m is the number of non-empty slots, like HappyCat's f):

[latex]\[\frac{\binom{n}{m}\sum_{i=0}^m(-1)^{m-i}\binom{m}{i}i^k}{n^k}\][/latex]
 
Duh. boooeee's formula does work (except the signs are backwards, I think). Sorry.
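
The two sums can be checked against each other numerically. The sketch below takes the sign in the first formula as (-1)^(i-m), which is the reading under which the result comes out non-negative, and uses f = n - m filled slots for the second. The function names are hypothetical, and a small integer-power helper is used instead of `pow` so the block needs nothing beyond the standard headers shown.

```c
#include <assert.h>

/* base^exp for a non-negative integer exponent (ipow(0, 0) is 1). */
static double ipow(double base, int exp)
{
    double r = 1.0;
    for (int e = 0; e < exp; e++)
        r *= base;
    return r;
}

/* Binomial coefficient C(n, r) as a double; 0 outside 0 <= r <= n. */
static double binom(int n, int r)
{
    if (r < 0 || r > n)
        return 0.0;
    double c = 1.0;
    for (int i = 1; i <= r; i++)
        c = c * (n - r + i) / i;
    return c;
}

/* P(exactly m of n slots empty after k events):
   sum over i = m..n of (-1)^(i-m) C(n,i) C(i,m) (1 - i/n)^k. */
double prob_empty_incl_excl(int n, int k, int m)
{
    assert(n > 0 && k >= 0 && m >= 0 && m <= n);
    double sum = 0.0;
    for (int i = m; i <= n; i++) {
        double sign = ((i - m) % 2 == 0) ? 1.0 : -1.0;
        sum += sign * binom(n, i) * binom(i, m) * ipow(1.0 - (double)i / n, k);
    }
    return sum;
}

/* Same probability in terms of f = n - m filled slots:
   C(n,f) * sum over i = 0..f of (-1)^(f-i) C(f,i) (i/n)^k. */
double prob_filled_incl_excl(int n, int k, int f)
{
    assert(n > 0 && k >= 0 && f >= 0 && f <= n);
    double sum = 0.0;
    for (int i = 0; i <= f; i++) {
        double sign = ((f - i) % 2 == 0) ? 1.0 : -1.0;
        sum += sign * binom(f, i) * ipow((double)i / n, k);
    }
    return binom(n, f) * sum;
}
```

For 28 events in 10 slots, `prob_empty_incl_excl(10, 28, 2)` and `prob_filled_incl_excl(10, 28, 8)` both come out at about 0.0712, matching the recurrence and the simulations earlier in the thread.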
 
