T'ai Chi : Some math/stat questions for you

Originally posted by T'ai Chi
1. For a mound-shaped distribution, what is a decent way to estimate the standard deviation from the range?

Assuming approximate normality, take the range/4 for small data sets, or range/5 or range/6 for larger datasets.
I'm not too happy with this answer. The appropriate denominator goes to infinity as the sample size goes to infinity, though much more slowly to be sure.
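A quick Monte Carlo sketch of this point, assuming i.i.d. standard normal samples: the average sample range, measured in units of the true SD, keeps growing with n. No single denominator (4, 5 or 6) therefore fits every sample size. The function name and the replication count are illustrative choices, not taken from the thread.

```python
import random
import statistics

def mean_range_in_sds(n, reps=500, seed=0):
    """Estimate E[range]/sigma for n i.i.d. N(0, 1) draws by simulation."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return statistics.fmean(ranges)

# The "right" denominator for range -> SD at each sample size.
for n in (2, 10, 100, 1000):
    print(n, round(mean_range_in_sds(n), 2))
```

The printed values rise from about 1.1 at n = 2 to about 3.1 at n = 10 and keep climbing slowly after that.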
9. If we only observe the outcomes from coin flipping (Heads = 1, Tails = 0): 1000100010101, what is the most sensible estimate of the probability of Heads? What general mathematical technique would you use here?

5/13 would be a sensible estimate, found by maximum likelihood. Call each head a success, with probability p, and each tail a failure, with probability 1-p. From our string of 1's and 0's, the likelihood function (the product of the individual probabilities) is p^5*(1-p)^8. Differentiating this (or, more easily, its logarithm) and setting the derivative equal to 0 yields p^ = 5/13. (Of course, you still have to show it is a maximum.)
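A small numerical check of this derivation: the log-likelihood x*log(p) + (n-x)*log(1-p) has the same maximiser as the likelihood, and a brute-force grid search should land on the closed form x/n. The maximum itself follows because the second derivative, -x/p^2 - (n-x)/(1-p)^2, is negative on (0, 1). The grid resolution is an arbitrary choice.

```python
import math
from fractions import Fraction

flips = "1000100010101"          # the observed sequence from question 9
x = flips.count("1")             # number of heads (successes)
n = len(flips)                   # number of flips

def log_likelihood(p):
    """log of p^x * (1 - p)^(n - x), valid for 0 < p < 1."""
    return x * math.log(p) + (n - x) * math.log(1 - p)

# Brute-force maximisation over a fine grid of interior points.
grid = [i / 100000 for i in range(1, 100000)]
p_grid = max(grid, key=log_likelihood)

print(Fraction(x, n), p_grid)    # closed-form MLE vs. grid maximiser
```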

The question arose: what do we do if we get a string like 111111 or 0000? Does the above maximum likelihood estimation method break down? There is something called a Wilson estimate that overcomes this. Instead of estimating p as X/n, where X is the number of successes, it estimates p as (X+2)/(n+4).
What do you mean by "break down"? The maximum likelihood estimate of the coin's probability is indeed 1 and 0, respectively, in those cases. If those answers don't seem right, it means that your intuitive notion of "the most sensible estimate" does not correspond to the maximum likelihood estimate. So why use maximum likelihood in other cases? It's not as obviously wrong in those cases, but it still isn't "the most sensible estimate," I'd say. For one reason, it doesn't take into account any prior information you may have.

I'm not sure how (X+2)/(n+4) was arrived at. If the question is "what's the probability that the next toss will yield heads?" (which is not exactly the same question as "what's the best estimate of the coin's 'true' probability?"), Laplace's rule of succession gives (X+1)/(n+2). That assumes a uniform prior for the coin's probability. In other words, we suppose that all the possible coin probabilities, from 0 to 1, are equally likely a priori. (This is unlikely to correctly represent our state of knowledge about a real coin, of course.)
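A sketch of the rule of succession as a posterior predictive probability, assuming a Beta(a, b) prior on the coin's probability. The uniform prior is Beta(1, 1). Under that prior, the probability that the next toss is heads after X heads in n tosses is (X + a)/(n + a + b), which for a = b = 1 is (X + 1)/(n + 2). The function name `prob_next_heads` is illustrative.

```python
from fractions import Fraction

def prob_next_heads(x, n, a=1, b=1):
    """Posterior predictive P(next toss = heads) under a Beta(a, b) prior,
    after observing x heads in n tosses. a = b = 1 is the uniform prior
    (Laplace's rule of succession)."""
    return Fraction(x + a, n + a + b)

for seq in ("111111", "0000", "1000100010101"):
    x, n = seq.count("1"), len(seq)
    print(seq, "MLE:", Fraction(x, n), "Laplace:", prob_next_heads(x, n))
```

For 111111 this gives 7/8 instead of the maximum likelihood answer 1, and for 0000 it gives 1/6 instead of 0.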
30. For what distributions is range/SD >= sqrt(2) ?

All of them. I found this in my notes from a theory class (without proof). I'm trying to prove it, but am having a hard time.
For a given range, the distribution with the largest variance is the one where all the probability is as far from the mean as possible. Without loss of generality, take the mean to be zero. If the range is 2r, place half the probability at -r and the other half at r. In other words, consider a random variable X with P(X = -r) = P(X = r) = 1/2. The standard deviation is r, so range/SD = 2.

I'm not sure why your notes say sqrt(2). Have I made a mistake? Can anyone give a distribution where the ratio is less than 2?
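A numerical sanity check of this argument, offered as a sketch. It computes range/SD for the two-point distribution and for many randomly generated discrete distributions on [0, 1] with mass at both endpoints. Here SD is the distribution's own SD, weighting by probability rather than dividing by n - 1. The bound range/SD >= 2 for distributions is known as Popoviciu's inequality on variances. The random generator is an arbitrary illustrative choice.

```python
import math
import random

def range_over_sd(values, probs):
    """range/SD of a discrete distribution with the given support points and probabilities."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return (max(values) - min(values)) / math.sqrt(var)

# The two-point distribution P(X = -r) = P(X = r) = 1/2, with r = 3.
print(range_over_sd([-3, 3], [0.5, 0.5]))   # 2.0 (range 6, SD 3)

# Random discrete distributions on [0, 1] with mass at both endpoints.
rng = random.Random(1)
worst = math.inf
for _ in range(10000):
    k = rng.randint(2, 6)
    values = [0.0, 1.0] + [rng.random() for _ in range(k - 2)]
    weights = [rng.random() + 1e-9 for _ in values]
    total = sum(weights)
    probs = [w / total for w in weights]
    worst = min(worst, range_over_sd(values, probs))
print(worst)   # stays at or above 2, up to rounding
```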
 
Number Six said:
13 and 14 also need independence to be true.

16 needs independence and identical Exponential distributions.


Oops, yes, thanks.


Shuffle the cards and place them face down on the table. What is the probability that the top card is the 3 of Clubs? Most said "1 in 52" but I thought that a better answer was "Either 0 or 1" because after the cards are sitting on the table the top card is either the 3 of Clubs or is not the 3 of Clubs.


Then your probabilities for the cards' outcomes do not form a true probability distribution, because the probabilities for the individual cards add up to more than 1.

Right?
 
69dodge said:
I'm not too happy with this answer. The appropriate denominator goes to infinity as the sample size goes to infinity, though much more slowly to be sure.

What do you mean by "break down"?


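Well, it is an approximation. :) For a normal distribution, the proportion of cases more than 3 standard deviations from the mean (i.e., outside a span of 6 standard deviations) is very tiny, about 0.27%.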
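Assuming an exactly normal population, the share of observations outside mean ± k·SD can be computed directly. A span of 4, 5 or 6 SDs corresponds to k = 2, 2.5 or 3. This sketch uses Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal
for span in (4, 5, 6):
    k = span / 2
    outside = 2 * (1 - z.cdf(k))   # two-sided tail probability beyond ±k SD
    print(f"range/{span}: about {outside:.4f} of a normal population lies outside ±{k} SD")
```

The three tail shares come out at roughly 4.6%, 1.2% and 0.27%.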


So why use maximum likelihood in other cases?... For one reason, it doesn't take into account any prior information you may have.


Well, maximum likelihood estimation is widely used and has been shown to perform well in many situations. I agree about the prior information: if we had prior information from flipping the same coin, we could certainly incorporate it. Although one could argue that incorporating prior information from flipping other coins might not be appropriate.


I'm not sure how (X+2)/(n+4) was arrived at.


I'm not sure exactly how, mathematically, either. Adding 2 successes and 2 failures clearly moves the estimate of p away from the extremes, 0 and 1, but I don't know how those particular numbers were chosen mathematically.
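One way to see where (X+2)/(n+4) can come from, offered as a sketch rather than as the textbook's own derivation. The Wilson score interval for p is centred at (X + z^2/2)/(n + z^2). Rounding the 95% critical value z ≈ 1.96 up to 2 turns that centre into (X + 2)/(n + 4). The same number is also the posterior mean of p under a Beta(2, 2) prior. The function names are illustrative.

```python
from fractions import Fraction
from statistics import NormalDist

def wilson_center(x, n, z):
    """Midpoint of the Wilson score interval: (x + z^2/2) / (n + z^2)."""
    return (x + z * z / 2) / (n + z * z)

def plus_four(x, n):
    """The (X + 2)/(n + 4) estimate discussed in the thread."""
    return Fraction(x + 2, n + 4)

z95 = NormalDist().inv_cdf(0.975)   # about 1.96
for x, n in [(6, 6), (0, 4), (5, 13)]:
    print(x, n, round(wilson_center(x, n, z95), 4), wilson_center(x, n, 2), plus_four(x, n))
```

With z = 2 the last two columns agree exactly. With z ≈ 1.96 the centre is only slightly different.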


For a given range, the distribution with the largest variance is the one where all the probability is as far from the mean as possible. Without loss of generality, take the mean to be zero. If the range is 2r, place half the probability at -r and the other half at r. In other words, consider a random variable X with P(X = -r) = P(X = r) = 1/2. The standard deviation is r, so range/SD = 2.


Ahh, I see the approach. Oops, the standard deviation of the set {-r, r} is sqrt(2)*r, so range/SD = 2r/(sqrt(2)*r) = sqrt(2).
 
T'ai Chi said:


Oops, yes, thanks.


Then your probabilities for the cards' outcomes do not form a true probability distribution, because the probabilities for the individual cards add up to more than 1.

Right?

No, what I mean to say is that the probability distribution for the random variable associated with the question "Is the 3 of Clubs the top card?" has either all its mass on "Yes" or all its mass on "No." The same is true for each of the other 51 cards. In other words, we don't know what the probability distribution is, but we do know it is one of two things. Once the cards are sitting on the table, either the 3 of Clubs *is* on top of the deck (i.e., all probability is on "Yes") or it *is not* (i.e., all probability is on "No"). More specifically, for 1 of the 52 cards all the probability mass is on "Yes," and for the other 51 all of it is on "No." It's just that we don't know which card has the "Yes" distribution, so for convenience we assign a probability of 1/52 to each of the 52 cards.
 
Number Six said:

No, what I mean to say is that the probability distribution for the random variable associated with the question "Is the 3 of Clubs the top card?" has either all its mass on "Yes" or all its mass on "No." The same is true for each of the other 51 cards.

Under your probability assignments, if you shuffled the deck, drew the top card, noted if it was a 3 of clubs or not, then put it back in the deck, and repeated this process for, say, 5200 times, how many 3 of clubs would you expect to draw?
 
T'ai Chi said:


Under your probability assignments, if you shuffled the deck, drew the top card, noted if it was a 3 of clubs or not, then put it back in the deck, and repeated this process for, say, 5200 times, how many 3 of clubs would you expect to draw?

The probability distribution for the number of times the 3 of Clubs comes out on top in the scenario you describe is Binomial with parameters n=5200 and p=1/52, assuming everything is fair and each draw is independent.
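A simulation sketch of this scenario, assuming a fair shuffle each time (modelled with `random.shuffle`) and independent repetitions. The card labels and the function name are hypothetical. Under the Binomial(5200, 1/52) model the expected count is 5200/52 = 100, with standard deviation sqrt(5200 · (1/52) · (51/52)) ≈ 9.9.

```python
import math
import random

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["Clubs", "Diamonds", "Hearts", "Spades"]
DECK = [f"{r} of {s}" for s in SUITS for r in RANKS]   # 52 distinct cards

def count_top_card(target="3 of Clubs", draws=5200, seed=0):
    """Shuffle, look at the top card, put it back; repeat. Count how often target is on top."""
    rng = random.Random(seed)
    deck = DECK[:]
    hits = 0
    for _ in range(draws):
        rng.shuffle(deck)
        if deck[0] == target:
            hits += 1
    return hits

n = 5200
expected = n / 52                          # Binomial mean n*p with p = 1/52
sd = math.sqrt(n * (1 / 52) * (51 / 52))   # Binomial SD sqrt(n*p*(1-p))
print("expected:", expected, "sd:", round(sd, 2))
print("simulated:", count_top_card())
```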
 
Number Six said:

The probability distribution for the number of times the 3 of Clubs comes out on top in the scenario you describe is Binomial with parameters n=5200 and p=1/52, assuming everything is fair and each draw is independent.

So p changed from 1 or 0 to 1/52 just from a sampling process?
 
T'ai Chi said:


So p changed from 1 or 0 to 1/52 just from a sampling process?

The sampling process isn't what changes p, rather the reason the p's are different is that we're talking about two different random variables.

Before you make your 5200 draws the number of times that the 3 of Clubs will come up has a Binomial probability distribution with parameters n=5200 and p=1/52.

After you make your 5200 draws, "the number of times the 3 of Clubs will come up" is no longer relevant, because that phrase implies a future event whereas the event in question is in the past. Suppose 104 draws had the 3 of Clubs. Then the probability that 104 draws had the 3 of Clubs is 1, whereas the probability that any other number of draws had the 3 of Clubs is 0.
 
Originally posted by T'ai Chi
Oops, the standard deviation of the set {-r, r} is sqrt(2)*r, so range/SD = 2r/(sqrt(2)*r) = sqrt(2).
I still don't get it.

Var(X) = E(X^2) - E(X)^2.
Here E(X) = 0, so we're left with E(X^2). Now X^2 = r^2 in all cases, so E(X^2) = r^2 as well. Then just take the square root of the variance to get the standard deviation. Where does the sqrt(2) come from?
 
69dodge said:
I still don't get it.

Var(X) = E(X^2) - E(X)^2.
Here E(X) = 0, so we're left with E(X^2). Now X^2 = r^2 in all cases, so E(X^2) = r^2 as well. Then just take the square root of the variance to get the standard deviation. Where does the sqrt(2) come from?

Oops(?) I was thinking about data, and using the sample estimator of the SD, whose denominator is n-1 instead of n (the square root of the unbiased variance estimator).

So SD = sqrt[((-r-0)^2+(r-0)^2)/(2-1)] =

sqrt(2r^2) = sqrt(2)*r.
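The two readings of "SD" in this exchange can be compared directly with Python's `statistics` module. `pstdev` divides by n, which matches the SD of the two-point distribution. `stdev` divides by n - 1, which is the sample estimator used here. This sketch takes r = 1.

```python
import statistics

data = [-1, 1]                          # the two points -r and r with r = 1
data_range = max(data) - min(data)      # 2r = 2

pop_sd = statistics.pstdev(data)        # divides by n:     sqrt(2/2) = 1 = r
sample_sd = statistics.stdev(data)      # divides by n - 1: sqrt(2/1) = sqrt(2) * r

print(data_range / pop_sd)              # 2.0
print(data_range / sample_sd)           # about 1.414, i.e. sqrt(2)
```

Shifting the data, for example to {2, 4}, changes neither SD, so the same two ratios come out.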
 
Originally posted by Number Six
Shuffle the cards and place them face down on the table. What is the probability that the top card is the 3 of Clubs? Most said "1 in 52" but I thought that a better answer was "Either 0 or 1" because after the cards are sitting on the table the top card is either the 3 of Clubs or is not the 3 of Clubs.
It's certainly true that the card either is or is not the 3 of Clubs. But is "I don't know" the most you can say about it?

What if I asked you whether the top card was red or black? Presumably you would also answer, "I don't know".

Are those two "I don't know"s really the same?

Aren't you pretty sure it's not the 3 of Clubs? What's wrong with giving a number (i.e., 51/52) to describe exactly how sure you are?

I don't see what difference it makes whether the cards were shuffled yesterday, or whether they will be shuffled tomorrow. In both cases, either the top card is the 3 of Clubs or it isn't, in both cases I don't know for certain what the top card is, and in both cases I know that there are 51 other cards which might be on top instead of the 3 of Clubs.

Probability, from a Bayesian point of view, is all about what you know, not about what is 'really the truth'. If you knew for certain what the truth was, you wouldn't need probability in the first place. Conversely, any time you're uncertain about what the truth is, you can use probability to describe that uncertainty. It's irrelevant whether the thing you're uncertain about is in the past or the future; if you're certain about a future event, you wouldn't use probability, and if you're uncertain about a past event, you should use it.
 
Uhhh... I can't even begin to follow these maths questions. :p

But what does jump out at me is that Bill did in fact challenge Tai-Chi to prove that he had expertise in the field. Furthermore, Bill asserted that he does not (that Tai-Chi doesn't have the expertise). I think it's only fair that Bill now produce his proof of why he was in a position to judge whether Tai-Chi had the expertise in the first place.
 
Originally posted by T'ai Chi
Oops(?) I was thinking about data, and using the sample estimator of the SD, whose denominator is n-1 instead of n (the square root of the unbiased variance estimator).

So SD = sqrt[((-r-0)^2+(r-0)^2)/(2-1)] =

sqrt(2r^2) = sqrt(2)*r.
Ok, I see what you're doing. The question mentioned distributions, not samples, so that's why I did what I did.

I'm not sure why you'd compare the range of the sample to (an estimate of) the standard deviation of the population. Why wouldn't you also estimate the range of the population?
 
I agree that we treat the situation as if there were a 1 in 52 probability that the card on top of the deck is the 3 of Clubs. I mean, if someone offers us 51 to 1 odds on the deal, we consider it fair even though it is definitely either better than fair or worse than fair (i.e., we've already won or lost the bet as soon as we make it). But we proceed on that basis because of our ignorance about the outcome of an event that has already happened, rather than because of the probability distribution of such an event.
 
I tried to approach it by seeing if I can rewrite the inequality into something that I recognize. I think I may have got it, though I'm not totally sure, since it is kind of late. :)

We know that

range/SD >= sqrt(2) is equivalent to

(max-min)^2/2 >= Var, which is

(max-min)^2/2 >= SUM (X_i-X_bar)^2/n (i=1 to n), for all n, so

Possible error somewhere between these lines I feel..

(max-min)^2/2 >= SUM (X_i-X_bar)^2/2 (i=1 to n), which implies

(max-min)^2/2 >= (X_i-X_bar)^2/2 (for any X_i), which implies

(max-min)^2 >= (X_i-X_bar)^2, which implies

max-min >= X_i-X_bar.

We know that X_bar >= min, so therefore, we have shown(?) that:

range/SD >= sqrt(2).
 
What is the probability of the top card being the 3 of Clubs (3C)? The top card is either 3C or not, which is a 0 or 1 outcome, but I don't feel that either probability, 0 or 1, is useful.

P(3C)=0 or P(3C)=1, how would you interpret those probabilities? In the first case, you are saying that the 3C is definitely not on top. In the second case, you are saying that the 3C is definitely on top. In reality, you have no clue what is really on top because you don't have access to that information.
 
T'ai Chi

Under your probability assignments, if you shuffled the deck, drew the top card, noted if it was a 3 of clubs or not, then put it back in the deck, and repeated this process for, say, 5200 times, how many 3 of clubs would you expect to draw?

If my memory serves (I hope so; I don't even remember when I last played cards :-) ), there are 4 "3 of Clubs" in a pack of cards. I'd therefore say that the probability of getting a 3 of Clubs as the first card is P[3C] = 4/52 = 1/13, while the probability of not getting it is P[~3C] = 12/13.

Now, choosing a confidence level of 0.99 (with α = β) and applying Laplace's formula for n = 5200 (5200 is large enough that the number of times a 3 of Clubs comes up as the first card can be treated as normally distributed):

[(N1/5200) - (1/13)] * √[5200 * 13 * 13/(1 * 12)] = 2.57

N1 ≈ 449, so N lies in [351, 449] (with a confidence of 0.99).
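The calculation above is presumably the de Moivre-Laplace normal approximation to a Binomial count. The sketch below reproduces it and, for comparison, repeats it with p = 1/52. That value is the one used in the Binomial answer earlier in the thread, since a standard 52-card deck contains exactly one 3 of Clubs. The function name is illustrative.

```python
import math
from statistics import NormalDist

def normal_approx_interval(n, p, confidence=0.99):
    """Central interval for a Binomial(n, p) count using the normal
    (de Moivre-Laplace) approximation: n*p +/- z * sqrt(n*p*(1-p))."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 2.576 for 0.99
    mean = n * p
    half_width = z * math.sqrt(n * p * (1 - p))
    return mean - half_width, mean + half_width

print(normal_approx_interval(5200, 1 / 13))   # the post's p = 4/52
print(normal_approx_interval(5200, 1 / 52))   # one 3 of Clubs per deck
```

The first line gives roughly (350.5, 449.5), matching the post's [351, 449]. The second gives roughly (74.5, 125.5) around an expected count of 100.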
 
T'ai Chi said:
What is the probability of the top card being the 3 of Clubs (3C)? The top card is either 3C or not, which is a 0 or 1 outcome, but I don't feel that either probability, 0 or 1, is useful.

P(3C)=0 or P(3C)=1, how would you interpret those probabilities? In the first case, you are saying that the 3C is definitely not on top. In the second case, you are saying that the 3C is definitely on top. In reality, you have no clue what is really on top because you don't have access to that information.

Yes, in reality we have no clue what is really on top and so the probability 1/52 is useful although not correct. OTOH, if you say "I'm going to shuffle the cards, what is the chance the 3C will be on top at the end?" the probability 1/52 is correct (and useful too).
 
T'ai Chi said:
I tried to approach it by seeing if I can rewrite the inequality into something that I recognize. I think I may have got it, though I'm not totally sure, since it is kind of late. :)

We know that

range/SD >= sqrt(2) is equivalent to

(max-min)^2/2 >= Var, which is

(max-min)^2/2 >= SUM (X_i-X_bar)^2/n (i=1 to n), for all n, so

Possible error somewhere between these lines I feel..

(max-min)^2/2 >= SUM (X_i-X_bar)^2/2 (i=1 to n), which implies

(max-min)^2/2 >= (X_i-X_bar)^2/2 (for any X_i), which implies

(max-min)^2 >= (X_i-X_bar)^2, which implies

max-min >= X_i-X_bar.

We know that X_bar >= min, so therefore, we have shown(?) that:

range/SD >= sqrt(2).
Some of your steps imply the next step, while some imply the previous step. This is not good. :) To prove something using a chain of implications, you need to start with something you know, and end up with what you want to prove; and all the implications have to be in that direction.

Anyway, I agree that range/SD >= sqrt(2). I'm making the stronger claim that range/SD >= 2.
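A randomized check of both readings, as a sketch. For random data sets of size n, it records the smallest range/SD seen with the population SD (divide by n), to compare against 2. It also records the smallest range/SD seen with the sample SD (divide by n - 1), to compare against 2·sqrt((n - 1)/n). That second bound equals sqrt(2) at n = 2 and is never smaller than sqrt(2). The uniform data generator is an arbitrary choice.

```python
import random
import statistics

rng = random.Random(2)
rows = []  # (n, smallest range/pstdev, smallest range/stdev, 2*sqrt((n-1)/n))
for n in range(2, 8):
    worst_pop = worst_sample = float("inf")
    for _ in range(1000):
        data = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        r = max(data) - min(data)
        worst_pop = min(worst_pop, r / statistics.pstdev(data))       # divide by n
        worst_sample = min(worst_sample, r / statistics.stdev(data))  # divide by n - 1
    rows.append((n, worst_pop, worst_sample, 2 * ((n - 1) / n) ** 0.5))

for row in rows:
    print(*(round(v, 3) for v in row))
```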
 
69dodge said:
I'm making the stronger claim that range/SD >= 2.

Well, that can be disproved by finding one counter-example where range/SD < 2.

data = {2,4}
The range = 2, and the SD (with the n-1 denominator) = sqrt(2), so

the range/SD = sqrt(2) < 2.


Some of your steps imply the next step, while some imply the previous step. This is not good. To prove something using a chain of implications, you need to start with something you know, and end up with what you want to prove; and all the implications have to be in that direction.


For any dataset, we know that X_bar >= min. I think the inequality works because of this fact, I just can't show it in mathematical steps yet. :) :)
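One way to turn this intuition into a chain of implications running in the right direction is a short worked derivation. Take data x_1, ..., x_n with minimum m, maximum M, range R = M - m and midpoint c = (m + M)/2. Let σ be the SD with denominator n and s the SD with denominator n - 1. This is the standard argument behind Popoviciu's inequality, not a reconstruction of anyone's notes.

```latex
\begin{align*}
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
  &\le \frac{1}{n}\sum_{i=1}^{n}(x_i-c)^2
  && \text{the mean minimises } \textstyle\sum_i (x_i-a)^2 \text{ over } a\\
  &\le \frac{R^2}{4}
  && \text{since } |x_i-c| \le R/2 \text{ for every } i,\\
\frac{R}{\sigma} &\ge 2,\\
\frac{R}{s} &\ge 2\sqrt{\frac{n-1}{n}} \ge \sqrt{2}
  && \text{since } s^2 = \tfrac{n}{n-1}\,\sigma^2 \le \tfrac{n}{n-1}\cdot\tfrac{R^2}{4},\ n \ge 2.
\end{align*}
```

Equality in the last step holds only at n = 2, as in the {2, 4} example. This suggests, though it cannot confirm, that the sqrt(2) in the notes refers to the sample SD.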
 
