I should mention that these closed-deck experiments, which are what statistically untrained people think of first, are not well suited for the intended purpose: it is difficult or impossible to calculate the false negative probability of the test, and hence hard to know whether the test is fair to the person being tested. Much better are experiments that use independent trials, where the false negative rate is easy to calculate and the number of trials in the test can be chosen to achieve the desired false negative and false positive rates.
Could you explain that a little more? (I believe you, I just don't understand why it's so.)
To understand the problem with the closed-deck approach, it helps to first understand the advantage of the alternative: independent trials. Say an individual claims that if you draw a random card from a freshly shuffled 52-card deck, he can guess the card (number and suit) 10% of the time. This, if true, would be quite a feat, since the probability of guessing the card by chance is just 1/52. We can test this person's claim by giving him n independent trials where for each trial one card is drawn from the deck, he makes his guess, the card is replaced, and the deck reshuffled for the next trial.
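The draw-guess-replace-reshuffle procedure is easy to simulate. Here is a quick sketch; the fixed guess is an arbitrary stand-in for a chance-level guesser, since under pure chance any fixed call succeeds with probability 1/52:

```python
# Sketch of the independent-trials protocol: draw one card from a
# freshly shuffled 52-card deck, check it against the guess, replace,
# and reshuffle. random.choice models draw + replace + reshuffle.
import random

random.seed(1)
deck = [(rank, suit) for rank in range(13) for suit in range(4)]
guess = (0, 0)  # an arbitrary fixed call; chance of a hit is 1/52

trials = 100_000
hits = sum(random.choice(deck) == guess for _ in range(trials))
print(hits / trials)  # close to 1/52, i.e. about 0.019
```

Because each draw is from a full, reshuffled deck, the trials are independent with constant success probability, which is exactly what the analysis below relies on.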
Let's say for now we decide to give him n=100 trials. It might seem natural to decide that if he gets at least k=10 of the trials correct, then we will declare that he has demonstrated his claim, and if he fails to get 10 correct then he has failed to demonstrate it. Is this a fair test? The answer is no. The reason is that if he truly has a 10% chance of guessing each card correctly, then the probability that he will get 10 or more correct guesses out of 100 trials is only a little over 0.5 (0.55 to two decimal places). Thus even if he has the ability he claims, there is nearly a 50% chance that we will wrongly conclude his claim is false. It follows that for the test to be fair to the claimant we must make it more lenient, declaring the test successful if he gets at least k correct for some threshold k with k/n < 0.1.
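The 0.55 figure can be verified with an exact binomial tail sum rather than an approximation:

```python
# Exact P(at least 10 correct out of 100 trials) when each guess
# succeeds with probability 0.1, via the binomial formula.
from math import comb

n, p = 100, 0.1
prob = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(10, n + 1))
print(round(prob, 2))  # 0.55
```

So a claimant with a genuine 10% ability fails the "at least 10 of 100" test about 45% of the time.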
In designing our test we need to be fair to the claimant, but also fair to ourselves. That is, if the claim is true, we want our test to have a high probability of declaring that it is true; this probability is called the power of the test. However, if the claim is false we want to have a low probability of erroneously declaring it to be true; that is, we want the test to have a low false positive rate (FPR). Notice that for a fixed n, the smaller we set k, the greater the power of the test, but also the greater the test's FPR. So, for a given n there is a trade-off between power and FPR. The question is, is there a way we can design a test that has both high power and low FPR? The answer is yes: we can increase the number of trials n. In fact, for a given target power and FPR, when the trials are independent there is a smallest n (with its corresponding threshold k) that meets both targets. For example, say we want the test to have power of 0.9 and an FPR of 1/10,000. It turns out that a test comprising n=187 trials and requiring at least k=14 correct guesses meets both targets.
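This design search is easy to carry out with exact binomial tails. The sketch below is one natural convention, not necessarily the one used to produce the figures quoted above: for each n, take the smallest threshold k whose FPR is acceptable, and stop at the first n where that threshold also meets the power target. Depending on the convention, the returned (n, k) may differ slightly from the quoted pair, but by construction it satisfies both targets:

```python
# Search for the smallest number of independent trials n (with its
# threshold k) giving power >= 0.9 and false positive rate <= 1/10,000.
from math import comb

def tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def design(p0, p1, power, fpr):
    """p0: chance success rate; p1: claimed success rate (p1 > p0)."""
    n = 1
    while True:
        # smallest threshold whose false positive rate is acceptable
        # (k = n + 1 always qualifies, since its tail probability is 0)
        k = next(k for k in range(n + 2) if tail(k, n, p0) <= fpr)
        if tail(k, n, p1) >= power:  # is the test also powerful enough?
            return n, k
        n += 1

n, k = design(p0=1/52, p1=0.10, power=0.9, fpr=1e-4)
print(n, k)
```

The same search works for any claim of the form "I succeed with probability p1 per trial," which is what makes the independent-trials design so convenient.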
The results above follow from the assumptions that (1) each trial is independent and (2) the probability of success is constant from trial to trial. These assumptions imply that the number of successes in the sequence of trials follows a binomial distribution, and knowing this allows us to identify the values of k and n that give us our desired power and FPR.
In the closed-deck design, however, neither of the above assumptions holds. If an individual claims he can correctly identify, say, 5 cards out of a sequence of 52 with probability 0.5, we know that if we require him to identify at least 5 cards, then our test will have only 50% power and will be unfair to him. We know we have to set k to some value less than 5 to give the test sufficient power, but we have no principled way to determine k. The probability of correctly identifying k cards, given that he can identify 5 cards with 50% probability, does not follow any standard distribution we can compute. If we set k to some value, we have no idea what the power of the test will be, and if we decide we want power of, say, 0.9, we have no idea what value to choose for k. We could try asking the claimant how many cards he can identify with 90% probability, but it is unlikely he would have a concrete idea, and in any case it is better to take the claim we are given and design the test around it than to require the claimant to amend his claim to accommodate the limitations of a statistically awkward test design.
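For contrast, the chance (no-ability) distribution of the closed-deck test is at least easy to simulate, under the assumption that the claimant calls out some fixed ordering of the whole deck (the number of matches is then the number of fixed points of a random permutation). What we cannot simulate, without a model of the claimed ability, is the distribution under the claim, and that is exactly the power problem described above:

```python
# Monte Carlo of the closed-deck null: shuffle a 52-card deck, compare
# it position by position against a fixed sequence of calls, and count
# how often 5 or more positions match by pure chance.
import random

random.seed(7)
deck = list(range(52))
calls = list(range(52))  # the guesser's fixed sequence of calls

reps = 100_000
at_least_5 = 0
for _ in range(reps):
    random.shuffle(deck)
    matches = sum(d == c for d, c in zip(deck, calls))
    at_least_5 += matches >= 5

rate = at_least_5 / reps
print(rate)  # chance of 5+ matches, a few tenths of a percent
```

So the FPR side of a closed-deck test is tractable; it is the false negative (power) side that has no usable distribution, which is the original point.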