The "Process" of John Edward

I did a Google search and found this nifty little Poisson calculator:
http://hyperphysics.phy-astr.gsu.edu/hbase/math/poifcn.html
If I plug in BillHoyt's numbers, I get this:

If the probability of a single event is p = .1336 and there are n = 85 events, then the value of the Poisson distribution function at value x = 18 is 1.802468785965208 x 10^-2. For these conditions, the mean number of events is 11.356 and the standard deviation is 3.1366922705295783.

If I plug in Lurker's numbers for "A", I get this:

If the probability of a single event is p = .065 and there are n = 231 events, then the value of the Poisson distribution function at value x = 6 is 4.796096105218159 x 10^-3. For these conditions, the mean number of events is 15.015 and the standard deviation is 3.7468686926552417.

It would seem that the numbers for Lurker's office are even more statistically significant than for JE's "J" guesses. I think this may support the argument being made by both Lurker and myself regarding small sample sizes and the meaningfulness of the analysis.

edited to fix link
 
Lurker said:


Agreed again. But I was merely comparing averages. And clearly I was not comparing apples to oranges. I was comparing oranges from one shipment to all the known oranges. Methinks you are being unreasonably pedantic in your definitions.

I note you did not have the same problems when discussing the PASS/FAIL test for the rare/common letter test. Why do you think that is...

Lurker

I'm hardly being pedantic here. You simply can't compare percentages unless you know the denominators are truly the same. If one could do this, then why would one bother with the hassle of inferential statistics? Why not simply set a percentage difference level and do what you did?

Actually, I did have the same problem with the rare/common letter test after all the specifics were discussed. The problem there, though, lies in the combined male/female data. I'm looking into the Census data to try to get those denominators so that the combined male/female percentages are done properly.
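
To make that arithmetic concrete, here is a toy example in Python. The counts are made up for illustration only, not the show's data:

```python
# Toy illustration of the denominator problem (made-up counts, not JE data):
# averaging two percentages is only valid when both denominators are equal.
male = {"hits": 12, "total": 40}      # hypothetical male "J" names
female = {"hits": 30, "total": 160}   # hypothetical female "J" names

naive = (male["hits"] / male["total"] + female["hits"] / female["total"]) / 2
pooled = (male["hits"] + female["hits"]) / (male["total"] + female["total"])

print(f"average of the two percentages: {naive:.1%}")   # 24.4%
print(f"percentage from pooled counts:  {pooled:.1%}")  # 21.0%
```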

Cheers,
 
OK, Bill. Let's go back and be a bit more precise: when I made the comparison, I meant I was comparing the means. Does that help? I never meant to imply that my "for kicks and giggles" test was any sort of statistical analysis. All I did was compare a SAMPLE versus the census. Nothing more, nothing less.

And if you want to make the claim that the confidence intervals created by the standard deviations are too high, that is your option.

So, after all this talk, do you have any interest in forming an opinion on how big a SAMPLE it would take to form a fairly accurate (you decide the parameters) histogram compared to census data? I still stand by my position that it would take far more than 78, unless you are content with data that is all washed over.

Lurker
 
Thanz said:
I did a Google search and found this nifty little Poisson calculator:
http://hyperphysics.phy-astr.gsu.edu/hbase/math/poifcn.html

If I plug in BillHoyt's numbers, I get this:

If the probability of a single event is p = .1336 and there are n = 85 events, then the value of the Poisson distribution function at value x = 18 is 1.802468785965208 x 10^-2. For these conditions, the mean number of events is 11.356 and the standard deviation is 3.1366922705295783.

If I plug in Lurker's numbers for "A", I get this:

If the probability of a single event is p = .065 and there are n = 231 events, then the value of the Poisson distribution function at value x = 6 is 4.796096105218159 x 10^-3. For these conditions, the mean number of events is 15.015 and the standard deviation is 3.7468686926552417.
It would seem that the numbers for Lurker's office are even more statistically significant than for JE's "J" guesses. I think this may support the argument being made by both Lurker and myself regarding small sample sizes and the meaningfulness of the analysis.

Check your numbers again, Thanz. Also, check that calculator. For Poisson, the mean = the variance. The standard deviation is the square root of the variance. Those reported std devs. are off. More importantly, though, you reported the pdfs, not the cdfs of the tails of interest. Even more important than that, however, is that a test that fails to reject the null hypothesis does not mean the test procedure is wrong or that the sample size is wrong!
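
If anyone wants to check the mechanics, here is a rough sketch in Python (my own code, not the hyperphysics page's) that redoes the "J" numbers the way I'm describing:

```python
# Rough sketch (not the hyperphysics calculator's code) redoing the "J" numbers.
# Under the Poisson approximation, mean = variance, so sd = sqrt(mean), and the
# quantity a test needs is a cumulative tail, not the pmf at a single value.
from scipy.stats import poisson

n, p, x = 85, 0.1336, 18            # guesses, single-event probability, observed "J"s
mu = n * p                          # 11.356
sd = mu ** 0.5                      # ~3.37, not the ~3.14 the calculator printed

pmf = poisson.pmf(x, mu)            # P(X = 18) -- roughly the 1.8e-2 Thanz quoted
right_tail = poisson.sf(x - 1, mu)  # P(X >= 18) -- the tail of interest

print(f"mean {mu:.3f}, sd {sd:.3f}, P(X=18) {pmf:.4f}, P(X>=18) {right_tail:.4f}")
```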

This is fundamental to understanding testing and statistics. If we go through the alphabet with a significance level of .05 and test each letter individually, then even if the sample perfectly reflects the population, we should still expect a rejection or two of the null hypothesis purely by chance.
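
A quick back-of-the-envelope check of that point, assuming 26 independent letter tests at the .05 level:

```python
# With 26 letters each tested at alpha = .05, some false rejections are expected
# even when the sample matches the population (assuming independent tests).
alpha, letters = 0.05, 26
expected_rejections = letters * alpha            # 1.3 rejections on average
p_at_least_one = 1 - (1 - alpha) ** letters      # about 0.74
print(expected_rejections, round(p_at_least_one, 2))
```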

Now let's do what you just did for "B", but get the tail and its cdf straight. Lurker reported .039, meaning he counted 9 "B"s. The expected percentage is .046, or 10.626 for this sample size. Of interest here is the left-tail, or the probability of this few observed or less. That is, .383. We accept the null hypothesis here, for B.
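
The same check in code, again just a sketch following the setup above:

```python
# Left-tail check for "B": 9 observed against 231 * 0.046 = 10.626 expected.
from scipy.stats import poisson

observed = 9
expected = 231 * 0.046
left_tail = poisson.cdf(observed, expected)   # P(X <= 9), roughly the .383 above
print(round(left_tail, 3))
```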

Cheers,
 
Lurker said:
OK, Bill. Let's go back and be a bit more precise: when I made the comparison, I meant I was comparing the means. Does that help? I never meant to imply that my "for kicks and giggles" test was any sort of statistical analysis. All I did was compare a SAMPLE versus the census. Nothing more, nothing less.

And if you want to make the claim that the confidence intervals created by the standard deviations are too high, that is your option.
I'm running out of ways to get the point across to you, Lurker. The "means"? How does that change the denominator problem? The denominator problem is arithmetic. You can't take 50% of an apple, add it to 50% of an orange and get 100% of anything.

Here is the Australian government making the same point to people handling their census data (emphasis mine):

"Any Indigenous statistical comparisons made between two censuses must be made with caution and should not be accepted at face value until the user has explored, to his/her satisfaction, the possibility that the differences might be solely or largely a consequence of non-demographic increase in census counts. Failure to do this could lead users to draw incorrect conclusions about whether changes in social conditions have occurred.

2 Use percentages

Users should present their statistical estimates as percentages where both numerator and denominator are data from the same census. Analyses of intercensal statistical differences should be made by comparing percentages from two times, rather than directly comparing counts or numbers. In most instances appropriate percentages will be less biased than the numerator and denominator counts. In particular, percentages are estimated without bias, if the bias in the counts is the same in percentage terms for the numerator and denominator."

source
So, after all this talk, do you have any interest in forming an opinion on how big a SAMPLE it would take to form a fairly accurate (you decide the parameters) histogram compared to census data? I still stand by my position that it would take far more than 78, unless you are content with data that is all washed over.

Lurker
I have already commented on this. It depends on a large number of choices. Most particularly the choice of hypothesis.
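
For what it's worth, here is the kind of calculation those choices feed into: a standard one-proportion sample-size formula under the normal approximation, with an alternative rate I have simply made up for illustration.

```python
# Sketch of a one-proportion sample-size calculation (normal approximation).
# The 20% "alternative" rate is made up purely for illustration; pick your own.
from statistics import NormalDist

def n_required(p0, p1, alpha=0.05, power=0.80):
    z_a = NormalDist().inv_cdf(1 - alpha)   # one-sided test
    z_b = NormalDist().inv_cdf(power)
    num = z_a * (p0 * (1 - p0)) ** 0.5 + z_b * (p1 * (1 - p1)) ** 0.5
    return (num / (p1 - p0)) ** 2

# e.g. census rate of "J" initials vs. a hypothetical elevated guess rate
print(round(n_required(0.1336, 0.20)))
```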

Cheers,
 
BillHoyt said:
Check your numbers again, Thanz. Also, check that calculator. For Poisson, the mean = the variance. The standard deviation is the square root of the variance. Those reported std devs. are off. More importantly, though, you reported the pdfs, not the cdfs of the tails of interest. Even more important than that, however, is that a test that fails to reject the null hypothesis does not mean the test procedure is wrong or that the sample size is wrong!
I understand that, but my point is that small sample sizes will give strange, unreliable results.

What is the hypothesis here - for Lurker's office?

I say that there is some sort of "A-hole" located somewhere in Lurker's office. If someone with a name that starts with "A" gets too close, they are sucked into the "A-hole" never to be heard from again.

Therefore, I hypothesize that the number of people in Lurker's office with names that start with "A" will be under-represented when compared to the normal population.

Let's look at the data. Hmmm.... it looks like it supports my hypothesis quite strongly!

Look out Lurker! There is an "A-hole" somewhere in your office!!

Now let's do what you just did for "B", but get the tail and its cdf straight. Lurker reported .039, meaning he counted 9 "B"s. The expected percentage is .046, or 10.626 for this sample size. Of interest here is the left-tail, or the probability of this few observed or less. That is, .383. We accept the null hypothesis here, for B.
I don't know what the calculator is doing, but when I plug those numbers in I get a mean of 10.626, but a standard dev. that is not the square root of that number.
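
One possible explanation, and this is only my guess: the standard deviations the page prints match the binomial formula sqrt(n*p*(1-p)) rather than the Poisson sqrt(n*p). A quick check in Python:

```python
# Guess at what the calculator printed: the quoted sds match the binomial
# sd sqrt(n*p*(1-p)), not the Poisson sd sqrt(n*p).
import math

for n, p, quoted in [(85, 0.1336, 3.1366922705295783),
                     (231, 0.065, 3.7468686926552417)]:
    print(n, p,
          round(math.sqrt(n * p * (1 - p)), 4),  # binomial sd
          round(math.sqrt(n * p), 4),            # Poisson sd
          quoted)
```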
 
>You can't take 50% of an apple, add it to 50% of an orange and get 100% of anything.

Why do you persist in such a silly comparison? I think most here can see that your comparison is invalid.

>2 Use percentages

>Users should present their statistical estimates as percentages where both numerator and denominator are data from the same census. Analyses of intercensal statistical differences should be made by comparing percentages from two times, rather than directly comparing counts or numbers. In most instances appropriate percentages will be less biased than the numerator and denominator counts. In particular, percentages are estimated without bias, if the bias in the counts is the same in percentage terms for the numerator and denominator."

Need I point out that I DID use percentages? I think YOU are misinterpreting the Aussies here. I did not compare numbers, but percentages, just as advised above. You are clearly confused if you thought otherwise. Why did you post #2 here? Did you not realize I was referring to percentages? Please explain, because I cannot see why you would post #2.

Further, I note you again avoided attempting to provide your opinion on how many it might take. If you have done so, please repeat the ballpark number. If not, why not? We're not requesting a thesis here, just a freakin' opinion.

Lurker
 
Thanz said:
I understand that, but my point is that small sample sizes will give strange, unreliable results.
Good ol' Thanz, I point out fundamental errors and out come the insults. You didn't get the point. You claim you understand but you posted pdfs and not cdfs. Those numbers are wrong. Period. You want cumulative probabilities.

I don't know what the calculator is doing, but when I plug those numbers in I get a mean of 10.626, but a standard dev. that is not the square root of that number.
Me too. Do you understand that the program is wrong?

Cheers,
 
Lurker said:
Why do you persist in such a silly comparison?
Lurker,

If you're being deliberately obtuse, I'm done with you. The "silly comparison" is perfectly apt. The Australian web site I quoted fully supports it. I even highlighted the text for you.
 
BillHoyt said:


The way you frame the hypothesis doesn't get to the issue. This would simply distinguish between random and non-random letters. We're specifically interested in the idea that the letter frequencies might give away cold-reading. Therefore, we're looking for higher hits on the more frequent letters or lower hits on the lower frequency letters.


But Bill, you are comparing apples to oranges here. Letter frequencies as defined by what? The census? Versus a JE audience? Aren't you comparing Population A to Population B? How can you derive anything meaningful from this?

Please define "more frequent letters" and "lower frequency letters".

thanks!

Lurker
 
BillHoyt said:

Lurker,

If you're being deliberately obtuse, I'm done with you. The "silly comparison" is perfectly apt. The Australian web site I quoted fully supports it. I even highlighted the text for you.

Well, you have your opinion, I have mine. I find you continually use apples and oranges because you think it bolsters your argument to use that comparison.

Why didn't you address my Houston vs US murder rate? That sort of stat is pretty commonly seen, is it not?

Lurker
 
Lurker said:


But Bill, you are comparing apples to oranges here. Letter frequencies as defined by what? The census? Versus a JE audience? Aren't you comparing Population A to Population B? How can you derive anything meaningful from this?

Please define "more frequent letters" and "lower frequency letters".

thanks!

Lurker
Lurker,

Have you read anything I've written? Have you read the web site? Do you not understand?
 
Lurker said:


Well, you have your opinion, I have mine. I find you continually use apples and oranges because you think it bolsters your argument to use that comparison.

Why didn't you address my Houston vs US murder rate? That sort of stat is pretty commonly seen, is it not?

Lurker

Alright. You're wasting my time. I have told you everything you need to understand this. Absolutely everything. You think about Houston in the context of what I said about the units of the denominator.
 
BillHoyt said:


Alright. You're wasting my time. I have told you everything you need to understand this. Absolutely everything. You think about Houston in the context of what I said about the units of the denominator.

Do you have reservations about the Houston example? Why? Why not?

And if you think you provided everything people need to understand this then there are two possibilities:

1. I am an idiot, which is possible, but my two degrees mitigate that possibility somewhat.

2. You are a truly atrocious teacher and we should all be thankful that you are a bouncer at a bar and not a teacher.

Let's use Occam's Razor, shall we?

Lurker
 
Bill:

One more point which I think YOU need to consider.

In my office of 231 people I used a sample size of 231. What is the standard deviation for the frequency of each letter?

Lurker
 
Lurker said:
Bill:

One more point which I think YOU need to consider.

In my office of 231 people I used a sample size of 231. What is the standard deviation for the frequency of each letter?

Lurker

Finally, you might begin to answer your own question about Houston if you turn this gaffe around.
 
I still don't understand why we're using the Poisson here. Aren't we interested in a group of letters, the high or the low frequency letters? Or are we just interested in the J?
 
T'ai Chi said:
I still don't understand why we're using the Poisson here. Aren't we interested in a group of letters, the high or the low frequency letters? Or are we just interested in the J?

We can define the test any way we want, given that the null hypothesis, the data set, the distribution and the level of significance all work together. I chose "J" because it is the highest-frequency initial in the population.

Cheers,
 
BillHoyt said:

Good ol' Thanz, I point out fundamental errors and out come the insults. You didn't get the point. You claim you understand but you posted pdfs and not cdfs. Those numbers are wrong. Period. You want cumulative probabilities.

Me too. Do you understand that the program is wrong?
First, what insults? Where in my post did I insult anyone?

Do you disagree that small sample sizes can produce strange and unreliable results?

If my numbers are wrong, what are the right numbers for Lurker's office and the letter A?

I have already admitted that my knowledge of stats is limited. But even with my limited understanding of stats, I can see that Lurker's office representation of the letter "A" is further away from the norm than JE's guesses of the letter "J". Or are you saying that this is incorrect as well?
 
