How irises "reveal personality"

The sample size isn't large enough for the test to mean anything.

The sample size seems more than adequate, since several significant associations were found. Why do you say it isn't large enough?

Linda
 
i disagree with the sample size being big enough. because the results are based on correlation the sample size has to be VERY BIG as any correlation could be a freak chance, especially in the hundreds as in this case. i think you'd need at least 10k to be sure, i may even push for a 100k.

like i've said, there doesn't even seem to be a correlation or at least a good clear correlation. whatever happened to personality being nurture and not nature? this sort of classification of people makes me very uncomfortable.
 
i disagree with the sample size being big enough. because the results are based on correlation the sample size has to be VERY BIG as any correlation could be a freak chance, especially in the hundreds as in this case. i think you'd need at least 10k to be sure, i may even push for a 100k.

But that's the point of using statistical methods! We all realize that we don't want to draw conclusions from freak occurrences. So subjecting something to significance testing is effectively saying "there is a one in 20 chance that this is a freak occurrence". You've got it backwards. If the sample size is too small, then any differences, including freaky chance differences, are likely to be not statistically significant. Small samples prevent you from detecting real differences. They do not artificially inflate your chances of concluding that spurious results are significant.
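To put rough numbers on that, here's a minimal simulation sketch (assuming normal data and an ordinary two-sample t-test, which are my assumptions, not the study's): under a true null the false-positive rate stays near 5% whatever the sample size, while a small sample mostly fails to detect a real, modest difference.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def rejection_rate(n, true_diff, trials=2000, alpha=0.05):
    """Fraction of two-sample t-tests reaching p < alpha."""
    hits = 0
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_diff, 1.0, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / trials

for n in (20, 100, 1000):
    print(n,
          rejection_rate(n, 0.0),   # null true: stays around 0.05 at every n
          rejection_rate(n, 0.3))   # real difference (d = 0.3): power grows with n
```

Small samples cost you power; they don't buy you extra false positives.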

like i've said, there doesn't even seem to be a correlation or at least a good clear correlation. whatever happened to personality being nurture and not nature? this sort of classification of people makes me very uncomfortable.

I don't think the problem with this study is sample size. It is with performing dozens of comparisons, but holding them to a standard that is only suitable for one comparison. There is a helluva difference between trying once to roll double sixes and trying 90 times to roll double sixes.
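To make the dice analogy concrete (assuming, for illustration, roughly 90 independent comparisons each tested at the 5% level - the study's exact count may differ):

```python
# Chance of at least one "significant" fluke across many independent tests.
alpha, k = 0.05, 90
print(1 - (1 - alpha) ** k)   # ≈ 0.99 - almost guaranteed to roll double sixes somewhere
```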

And, even if these results are real and reproducible, their predictive ability may be minuscule - e.g. if there is a 10 percent chance you are neurotic (an example of a baseline frequency) and someone looks in your eyes and sees furrows, that may change it to a 10.5 percent chance that you are neurotic.
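As a back-of-the-envelope Bayes calculation (the furrow likelihoods here are made up purely to illustrate how little a weak marker moves the odds):

```python
base = 0.10               # baseline chance of being "neurotic"
p_furrows_if_yes = 0.55   # assumed: furrows a bit more common among neurotics
p_furrows_if_no = 0.52    # assumed: but nearly as common in everyone else

posterior = (p_furrows_if_yes * base) / (
    p_furrows_if_yes * base + p_furrows_if_no * (1 - base))
print(round(posterior, 3))   # ≈ 0.105 - from 10% to about 10.5%
```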

Linda
 
But that's the point of using statistical methods! We all realize that we don't want to draw conclusions from freak occurrences. So subjecting something to significance testing is effectively saying "there is a one in 20 chance that this is a freak occurrence". You've got it backwards. If the sample size is too small, then any differences, including freaky chance differences, are likely to be not statistically significant. Small samples prevent you from detecting real differences. They do not artificially inflate your chances of concluding that spurious results are significant.

I don't think the problem with this study is sample size. It is with performing dozens of comparisons, but holding them to a standard that is only suitable for one comparison. There is a helluva difference between trying once to roll double sixes and trying 90 times to roll double sixes.
the sample size will technically never be big enough because it's completely correlation, but 430 (or however many) is no where near enough considering all the possible outcomes. huge freak chances often occur from more than 3 times the number of possible outcomes over 10k samples. 430 is not enough to come to ANY conclusion.

And, even if these results are real and reproducible, their predictive ability may be minuscule - e.g. if there is a 10 percent chance you are neurotic (an example of a baseline frequency) and someone looks in your eyes and sees furrows, that may change it to a 10.5 percent chance that you are neurotic.
yes, it could have uses. but personality may not even be genetic so using genetics to determine someones personality may not have any value if it turns out personality is nurture rather than nature. if they want to show it's use with mental conditions, then they should have done so.
 
the sample size will technically never be big enough because it's completely correlation, but 430 (or however many) is no where near enough considering all the possible outcomes. huge freak chances often occur from more than 3 times the number of possible outcomes over 10k samples. 430 is not enough to come to ANY conclusion.

I can't make any sense of this. Can you elaborate?

Linda
 
JMO, but I think you're way off re your requirement for sample sizes. Perhaps you are confusing error variance with confounding variables?

Also, the Meyer article I cited was not a pro-personality test article. The claim in the article was that psych testing in general produces validities that rival or beat those found in other areas of "real science".

I think it's ok to use the term "test" loosely, even in the context of determining whether some medical treatment works (as another example, the EEOC considers an employment interview to be a test). Screening mammography is a medical test, and it has only .32 validity for detecting breast cancer (within 1 year).

The validity of using aspirin to reduce heart attacks compared with the validity of using personality to predict job performance, imo, is apples to apples.

I cherry picked the examples because they seemed most interesting to me. The article has about 7 pages of examples. It's a good read for anyone interested in testing in general.
 
as in the possible results from one subject?

the correlation might even strengthen with more testing or it may lose more ground. obviously more testing needs to be done in some form, as these results are far from conclusive.
 
That's the strange beauty of a personality test. Unlike an IQ test, personality can be faked, and people do fake. In fact (though I don't claim expertise here, I have read some of the lit), the current consensus is that willingness to fake a personality test is itself a personality trait, and that that variance is captured by the big 5, and that faking in general seems to have little effect on validity.
The practice of faking one's personality is much more common than you might think. Some do it for personal gain and awards, others for political power and re-election.

That is, actors and politicians, respectively.
 
the sample size will technically never be big enough because it's completely correlation, but 430 (or however many) is no where near enough considering all the possible outcomes. huge freak chances often occur from more than 3 times the number of possible outcomes over 10k samples. 430 is not enough to come to ANY conclusion.

yes, it could have uses. but personality may not even be genetic so using genetics to determine someones personality may not have any value if it turns out personality is nurture rather than nature. if they want to show it's use with mental conditions, then they should have done so.

We have statistical tests for a reason. You do not need a sample size of 100,000 to draw meaningful conclusions. The results found significance at 5% and at 1%. That is enough to draw conclusions.
 
Also, the Meyer article I cited was not a pro-personality test article. The claim in the article was that psych testing in general produces validities that rival or beat those found in other areas of "real science".

Okay. I didn't know if it was you or the authors that were presenting obviously cherry-picked examples.

I think it's ok to use the term "test" loosely, even in the context of determining whether some medical treatment works (as another example, the EEOC considers an employment interview to be a test). Screening mammography is a medical test, and it has only .32 validity for detecting breast cancer (within 1 year).

The validity of using aspirin to reduce heart attacks compared with the validity of using personality to predict job performance, imo, is apples to apples.

I cherry picked the examples because they seemed most interesting to me. The article has about 7 pages of examples. It's a good read for anyone interested in testing in general.

Well, there's "loosely" and then there's "strains credulity".

In the study under consideration, the NEO was used as a way to measure personality - i.e. as an accurate indicator of the presence of the personality trait. You brought up mammograms as a comparable example, and I'd accept that as loosely relevant. Is a mammogram an accurate indicator of the presence of breast cancer? Not even close. A mammogram cannot tell you whether or not you have breast cancer. It can only select out a group of people that should have a definitive test for the presence of breast cancer (biopsy and pathological examination). So personality testing is similar in validity to a marginally acceptable screening test that no one considers an accurate indicator of the presence of breast cancer.
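Here is the arithmetic behind that, as a sketch - the sensitivity, specificity and prevalence are round numbers I've assumed for illustration, not figures from Meyer et al. or any particular screening program:

```python
prevalence = 0.005    # assume 0.5% of screened women have breast cancer
sensitivity = 0.85    # assume the screen catches 85% of true cases
specificity = 0.93    # assume 7% of healthy women still screen positive

true_pos = prevalence * sensitivity
false_pos = (1 - prevalence) * (1 - specificity)
ppv = true_pos / (true_pos + false_pos)
print(round(ppv, 2))   # ≈ 0.06 - the vast majority of positives still need a biopsy to sort out
```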

But considering the kinds of questions we ask when studying therapies, and the vast differences in acceptable outcomes, I confess I am unable to find a way to make "aspirin treatment" and "personality testing" both apples.

Linda
 
Linda, thanks!

I think in terms of effect sizes-- the mean difference between groups on some outcome measure.

I'm pretty sure effect size can be synonymous with validity.

Looking at people who do and do not take aspirin (categorizing people into discrete groups here to make the example easier to understand) the effect size is .08*. In other words, aspirin has a .08 validity / effect on reducing the incidence of heart attacks.

That's a small effect, but if you did an extreme groups comparison-- followed 10000 people around who never took aspirin and 10000 others who always did, at the end there would be a statistically significant mean difference showing fewer heart attacks in the aspirin group.


The .30 validity for C predicting job performance is small in the sense that only 9% (.3 x .3) of the variance in job performance is explained by C.

That said, it's free to measure and takes 5 minutes. The return on investment for using C to hire people will be amazingly high, even though the validity is .30.

And, going back to the effect size example, get 100 people low in C and compare them to 100 people high in C. I'd bet money you'd find significant and non-trivial differences in job performance across the two groups.
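A quick simulation sketch of that bet (assuming a true r of .30 between C and performance, both standard normal - illustrative assumptions, not data from the study):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
r = 0.30
c = rng.normal(size=5000)
performance = r * c + np.sqrt(1 - r**2) * rng.normal(size=5000)

order = np.argsort(c)
low, high = performance[order[:100]], performance[order[-100:]]
print(high.mean() - low.mean())           # the extreme groups end up over a full SD apart
print(stats.ttest_ind(low, high).pvalue)  # comfortably significant
```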



*I'm pretty sure the correlation coefficient is the effect size, but not positive. I'm more used to seeing effect sizes that are calculated by taking the mean difference between 2 groups and then dividing by some measure of error (d'). Does anyone know how to calculate effect sizes from correlation coefficients, or is the correlation the measure of the effect?
 
Linda, thanks!

I think in terms of effect sizes-- the mean difference between groups on some outcome measure.

I'm pretty sure effect size can be synonymous with validity.

I think I understand what you are getting at. Both effect size and validity can be measured by the same tools - e.g. both a person and a table can be measured with a ruler and comparisons can be made as to the size?

Looking at people who do and do not take aspirin (categorizing people into discrete groups here to make the example easier to understand) the effect size is .08*. In other words, aspirin has a .08 validity / effect on reducing the incidence of heart attacks.

That's a small effect, but if you did an extreme groups comparison-- followed 10000 people around who never took aspirin and 10000 others who always did, at the end there would be a statistically significant mean difference showing fewer heart attacks in the aspirin group.

Yes (an r of 0.08 translates to an effect size of 0.16). I just want to note that the effect of therapy is not usually reported as a correlation (I'm curious about where Meyer et al. got their numbers), as it doesn't provide useful information compared to something like "risk reduction". It doesn't tell you whether or not you should recommend aspirin therapy.

The .30 validity for C predicting job performance is small in the sense that only 9% (.3 x .3) of the variance in job performance is explained by C.

That said, it's free to measure and takes 5 minutes. The return on investment for using C to hire people will be amazingly high, even though the validity is .30.

Under many conditions (including this one), you could actually be worse off than you were before you applied the test. If you use a test that is not particularly accurate (and an r of only 0.30 is a not particularly accurate test), and the condition you are looking for is not present in the majority of those being tested, then you will end up with more false positives than true positives. I can go through an example for you, if necessary.
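Here's a rough sketch of the kind of example I mean (all the numbers are assumptions for illustration: a true r of .30, "good performer" defined as the top 20%, and hiring anyone who scores in the top 20% on the test):

```python
import numpy as np

rng = np.random.default_rng(2)
r, n = 0.30, 100_000
score = rng.normal(size=n)
perf = r * score + np.sqrt(1 - r**2) * rng.normal(size=n)

flagged = score > np.quantile(score, 0.80)   # "positives" on the test
good = perf > np.quantile(perf, 0.80)        # actually good performers

true_pos = np.sum(flagged & good)
false_pos = np.sum(flagged & ~good)
print(true_pos, false_pos)   # false positives outnumber true positives roughly 2 to 1
```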

And, going back to the effect size example, get 100 people low in C and compare them to 100 people high in C. I'd bet money you'd find significant and non-trivial differences in job performance across the two groups.

Yes, but it doesn't help you when deciding whether or not to hire someone.

*I'm pretty sure the correlation coefficient is the effect size, but not positive. I'm more used to seeing effect sizes that are calculated by taking the mean difference between 2 groups and then dividing by some measure of error (d'). Does anyone know how to calculate effect sizes from correlation coefficients, or is the correlation the measure of the effect?

d = 2r/√(1 − r²) and r = d/√(d² + 4)
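In code, those two conversions are just (a direct transcription of the formulas above, with the numbers from this thread as a check):

```python
import math

def d_from_r(r):
    """Cohen's d from a correlation coefficient."""
    return 2 * r / math.sqrt(1 - r**2)

def r_from_d(d):
    """Correlation coefficient from Cohen's d."""
    return d / math.sqrt(d**2 + 4)

print(round(d_from_r(0.08), 2))   # ≈ 0.16, the aspirin example above
print(round(d_from_r(0.30), 2))   # ≈ 0.63, the conscientiousness example
print(round(r_from_d(0.16), 2))   # ≈ 0.08, and back again
```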
 
FLS, thanks again!

I have more to say later, but real quick:

The specific aspirin study is from the Steering Committee of the Physicians' Health Study Research Group (1988), with an N of 22,071, which might make even the guy above happy!


Also, I screwed up in my second post on this example; the effect size correlation for aspirin is indeed .02-- in my second example, I misremembered it as being .08.


Feel free to email me bpesta22@cs.com

I think the Meyer article kicks major butt, and is a very interesting read. I'll send it to whoever wants it.

B
 
