Merged Odds Standard for Preliminary Test

I'll try to clarify.

If the self-evident results have a set number of choices and/or answers, then you can pretty much calculate The Odds. In a perfect world, these worst-case odds indicate how likely a person flipping a coin or rolling dice is to pass the test. Let's call this scenario Blind Luck.

Unfortunately, the world's not perfect. There are going to be occasions where, due to time, space, money, personalities, or whatever, the protocol is not going to whittle it down to Success by Ability vs. Success by Blind Luck. Our odds calculation remains the same, but our confidence level is reduced.

Unless the subject is making their guesses based on a random-like process, such as rolling a die, it has to be assumed that their guesses will not be random, but will follow some sort of pattern. So unless there is no pattern to the placement of the target, the distribution of guesses based on the pattern of guessing will be different from the distribution of guesses based on random sampling.

We typically see two kinds of guessing games - one where the placement of the target is random, and one where the placement is not random. Connie Sonne's test is an example of the former, and the VFF test is an example of the latter. Just think how easy it would have been to test VFF if the presence or absence of a kidney was determined randomly. The distribution of blind guesses in the former, even though those guesses are not likely to be random, will still correspond to the distribution of guesses based on random sampling. The distribution of guesses in the latter would have to be determined empirically. And that is the stumbling block in the Challenge, because it is not set up to empirically measure the distribution (and thereby provide us with a way to calculate the odds), preferring to depend upon a theoretical distribution.

So what we can try to do, instead, is to make the subject as blind as possible when it comes to the application of any sort of pattern to their guessing. For VFF this means that the subjects are covered in clothing which hides anything that could be used as a clue, such as age. The subjects are presented one at a time, chosen randomly with replacement (which means that some subjects may be read more than once, and some not at all). And the claimant is not told beforehand how many missing kidneys there are. This is what you referred to earlier as Blind Luck vs. Ability.

It sounds to me like you're saying that if we don't have complete confidence that we have made it Ability vs. Blind Luck that we have no right to be discussing odds. Is that your stance?

My stance is that you are using the wrong distribution in order to calculate odds simply because to do otherwise is inconvenient.

I say that this is a challenge and not scientific research. We're perfectly entitled to say, "We acknowledge that in this protocol an ordinary person without a special ability will outperform an ordinary person flipping a coin."

And that is where empirical measurement would be necessary if we want to have some way of quantifying this for comparison to a person with a special ability. Right now, we can guess that both a person without a special ability and a person with a special ability will outperform random sampling, but we don't know by how much. This works against us if we don't make very good guesses as to how much and against the claimant if we way over-compensate. I think my main complaint is that the odds based on random sampling are used as though they are meaningful. :)

We can do that because the more important statement we're making is, "We're so confident that only a person who has an ability could pass this test that we're gonna put up our money."

Yeah, I'd just like to see some indication that our confidence does not reflect the odds based on random sampling, but rather our certainty that we've made a good guess as to how a person without special abilities would perform.

Linda
 
So why all the fuss here? Why should the JREF bother coming up with a hard answer on what odds they'll accept? As long as they are happy that the odds in each case are high enough that someone without an ability isn't going to win, what difference does it make?

I think that the fuss is to remove some of the apparent capriciousness as to whose application and/or protocol is accepted. I'm not sure that setting an odds standard will prevent that, though.

Linda
 
Yeah, I'd just like to see some indication that our confidence does not reflect the odds based on random sampling, but rather our certainty that we've made a good guess as to how a person without special abilities would perform.

Linda

Just to elaborate on this a bit... the odds will not reflect our confidence that the protocol has removed any bias. For example, the odds for the meta-analyses of the ganzfeld tests and for the PEAR random-number generator tests are billions to one, yet the results aren't persuasive because we aren't particularly confident that a tiny amount of residual bias doesn't account for the effect (i.e. the person without ability performing better than random sampling). On the other hand, when we are confident that we have removed all bias, lower odds would command our attention. Connie Sonne's test would have been impressive if she had passed (or maybe even if she had guessed two out of three correctly).

Linda
 
So why all the fuss here? Why should the JREF bother coming up with a hard answer on what odds they'll accept? As long as they are happy that the odds in each case are high enough that someone without an ability isn't going to win, what difference does it make?

I thought of something else.

Sometimes the organizations running the preliminary tests set standards higher than those that would apply if Randi were running the test. For example (I don't have the exact details on this), Suitbert Ertel told me of a test run by a skeptical group in Europe using a couple of his psi-stars, who passed the 1 in 1,000 threshold but did not pass the 1 in 10,000 threshold set by the group. Under other circumstances (say, undertaking the test at TAM and broadcasting it to a wider audience) this could have been good enough to pass and to proceed to the final. Setting a standard could prevent this sort of situation, where the circumstances, rather than the actual performance, mean the difference between qualifying for a million-dollar prize and receding into obscurity.

Linda
 
fls, I think your starting premises are flawed. Here's how I, and I believe the JREF and IIG, approach these challenges.

* It's a challenge, not a scientific experiment.

* The protocol has to be accepted by the subject, which means the subject has to believe their powers will work under those conditions. This is usually irrational, so if the test is to happen, some rather odd requests might need to be honored.

* The protocol has to greatly reduce, if not eliminate, the chances of ordinary means resulting in success.

* The theoretical odds combined with a degree of confidence in the above has to be sufficient for the organization to believe their money is safe.

* The protocol has to be practical in terms of time, logistics and expense.

* And here's the one that you seem to be missing, the entire thing must be designed in such a way that it can be easily explained to and convincing enough for the Average Joe on the street. What the skeptics, True Believers and the claimant think about it is irrelevant because this publicity stunt won't change any of their minds.

Your suggestions seemed to be based on the idea that this publicity stunt should be treated like an experiment that will be reviewed by scientists or deemed "consistent" from claimant to claimant. That's not the case.

The VFF test is easily explained to the Average Joe: "Anita claims to be able to detect through means unknown to science that a person is missing a kidney and which kidney is missing just by looking at a person. We have set up three trials with six people each and with one person in each missing a kidney. We have taken steps to reduce visual clues available to Anita. Since there are 12 possible locations for kidneys (two per person) in each trial, Anita has a 1 in 12 chance of being right by just guessing. The odds of guessing all three are 1/12 x 1/12 x 1/12, which means the overall odds are 1/1,728. This is the level she must reach to pass the test. Probability says that with three guesses there's a 1 in 4 chance she'll get one right and a 1 in 51 chance she'll get two right. Her selections will be verified by an on-site sonogram technician."
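As a quick sanity check, the figures in that blurb can be reproduced in a few lines of Python, assuming three independent trials at a 1-in-12 chance each (which is how the blurb frames it):

```python
from math import comb

p, n = 1 / 12, 3   # 1-in-12 chance per trial, three trials

print(1 / p**3)    # all three right: ~1728 -> the quoted 1/1,728

def p_exactly(k):
    # Binomial: exactly k correct out of n independent trials
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(1 / sum(p_exactly(k) for k in (1, 2, 3)))  # one or more right: ~4.4 -> "1 in 4"
print(1 / sum(p_exactly(k) for k in (2, 3)))     # two or more right: ~50.8 -> "1 in 51"
```

So the blurb's "1 in 4" and "1 in 51" are rounded at-least-one and at-least-two probabilities.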

While I could write it better, the point is that it's short enough and simple enough for any media outlet to write a blurb about it, which is the ultimate goal. She failed, and it was plain as day to reasonable people that she failed. The challenge served its purpose.

Your suggestions for the VFF protocol would not have been accepted by Anita. Even if they were, your test is overly complicated to explain. I'm no expert in stats, but I'm way above the average layman, and I'm not sure how to calculate the odds. The protocol is just not accessible to the people you need to reach.

The three trials of six people is dirt simple to understand. Understanding the 1 in 4 chance of getting one right is easy to understand if you think, "Okay, 12 choices. 3 guesses. 12 divided by 3 is 4. Okay, I get it." The test is useless if the audience is saying, "Oh, she got one right. What does that mean? What? She picked the same person again, but this time guessed the other kidney? You can pick the same person twice? What if she could tell it was the same person because the guy was a smoker?"
 
fls, I think your starting premises are flawed. Here's how I, and I believe the JREF and IIG, approach these challenges.

* It's a challenge, not a scientific experiment.

* The protocol has to be accepted by the subject, which means the subject has to believe their powers will work under those conditions. This is usually irrational, so if the test is to happen, some rather odd requests might need to be honored.

* The protocol has to greatly reduce, if not eliminate, the chances of ordinary means resulting in success.

* The theoretical odds combined with a degree of confidence in the above has to be sufficient for the organization to believe their money is safe.

* The protocol has to be practical in terms of time, logistics and expense.

* And here's the one that you seem to be missing, the entire thing must be designed in such a way that it can be easily explained to and convincing enough for the Average Joe on the street. What the skeptics, True Believers and the claimant think about it is irrelevant because this publicity stunt won't change any of their minds.

Your suggestions seemed to be based on the idea that this publicity stunt should be treated like an experiment that will be reviewed by scientists or deemed "consistent" from claimant to claimant. That's not the case.

No, I'm of the opinion that this is a challenge, not a scientific experiment, that it should be acceptable to the claimant, that it should be easily understandable to a casual observer, and that failure or success should be easily observed. I have already explicitly stated each of those points in this thread.

The VFF test is easily explained to the Average Joe: "Anita claims to be able to detect through means unknown to science that a person is missing a kidney and which kidney is missing just by looking at a person. We have set up three trials with six people each and with one person in each missing a kidney. We have taken steps to reduce visual clues available to Anita. Since there are 12 possible locations for kidneys (two per person) in each trial, Anita has a 1 in 12 chance of being right by just guessing. The odds of guessing all three are 1/12 x 1/12 x 1/12, which means the overall odds are 1/1,728. This is the level she must reach to pass the test. Probability says that with three guesses there's a 1 in 4 chance she'll get one right and a 1 in 51 chance she'll get two right. Her selections will be verified by an on-site sonogram technician."

Your calculations should take into consideration that we already know what proportion of missing kidneys were on the left or the right, and we already know that there is a pattern to which side she guesses.

While I could write it better, the point is that it's short enough and simple enough for any media outlet to write a blurb about it, which is the ultimate goal. She failed, and it was plain as day to reasonable people that she failed. The challenge served its purpose.

How is it plain that she failed? She gave the appearance of performing better than chance, even if she didn't surpass some unreasonably rigid standard. I realize that it is difficult to step out of the shoes of a non-believer, but the casual observer doesn't necessarily see much of a difference between a very unexpected event and a very, very unexpected event.

Your suggestions for the VFF protocol would not have been accepted by Anita. Even if they were, your test is overly complicated to explain. I'm no expert in stats, but I'm way above the average layman, and I'm not sure how to calculate the odds. The protocol is just not accessible to the people you need to reach.

Why wouldn't it have been accepted? Anita already agreed that they could wear clothes and cover their heads, and her original claim involved an individual reading. And my point with regards to the use of odds and statistics is that failure and success should be obvious, rather than relying upon not only the ability to calculate odds, but upon convincing someone that there is a world of difference between one in nineteen and one in twenty. You shouldn't have to explain the odds, it should be set up so that it doesn't give the appearance of something unexpected unless something unexpected actually happens.

The three trials of six people is dirt simple to understand.

How is "one at a time, Anita indicates whether she sees the right, left or both kidneys" difficult to understand?

Understanding the 1 in 4 chance of getting one right is easy to understand if you think, "Okay, 12 choices. 3 guesses. 12 divided by 3 is 4. Okay, I get it." The test is useless if the audience is saying, "Oh, she got one right. What does that mean? What? She picked the same person again, but this time guessed the other kidney? You can pick the same person twice? What if she could tell it was the same person because the guy was a smoker?"

Smell is one of those things you'd try to eliminate.

Eighteen people, 17 right kidneys, 16 left kidneys. Each individual has a one in nine chance of missing a left kidney and a one in 18 chance of missing a right kidney. Getting one left kidney correct (one in nine) and no right kidneys correct, as well as seeing kidneys that aren't there, plus not seeing kidneys that are there, isn't going to look all that remarkable to the casual observer.
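For concreteness, here is a minimal sketch of those per-person chances under the 18-person design (two missing left, one missing right, as above):

```python
# Per-person chances in an 18-person line-up with three missing
# kidneys (2 left, 1 right), as described above.
people, missing_left, missing_right = 18, 2, 1

p_left   = missing_left / people                               # 1/9  ~ 0.11
p_right  = missing_right / people                              # 1/18 ~ 0.06
p_intact = (people - missing_left - missing_right) / people    # 15/18 ~ 0.83

# Most answers a blind guesser gets "right" are unremarkable:
# simply saying "both kidneys present" every time scores 15 of 18.
print(p_left, p_right, p_intact)
```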

Linda
 
Your calculations should take into consideration that we already know which proportion of missing kidneys were on the left or the right and we already know that there is a pattern to which side she guesses.
To compensate for that, I'm thinking the IIG used six people instead of five.

How is it plain that she failed? She gave the appearance of performing better than chance, even if she didn't surpass some unreasonably rigid standard. I realize that it is difficult to step out of the shoes of a non-believer, but the casual observer doesn't necessarily see much of a difference between a very unexpected event and a very, very unexpected event.
Do you have any evidence that a fence-sitter feels *better* about Anita's claims being true than before? People I have spoken to don't have any problems understanding that she failed, especially when I point out that the lady with the sonogram got 100% correct in about 30 seconds per subject.

Why wouldn't it have been accepted?
Did you follow the kidney protocol thread? Anita rejected burqas. At first she was insisting that she be allowed to observe the entire group and then have an opportunity to dismiss any number she wanted (one time) so that she could then concentrate on the remaining subjects. I am very confident that she would not agree to a one person at a time reading.

Anita already agreed that they could wear clothes and cover their heads, and her original claim involved an individual reading. And my point with regards to the use of odds and statistics is that failure and success should be obvious, rather than relying upon not only the ability to calculate odds, but upon convincing someone that there is a world of difference between one in nineteen and one in twenty. You shouldn't have to explain the odds, it should be set up so that it doesn't give the appearance of something unexpected unless something unexpected actually happens.
Nothing unexpected happened. She got one right, and that was a 1 in 4 chance. It's an excellent springboard to a discussion about confirmation bias and possible reasons why people believe they have special abilities. Of course, you remind them that she agreed in advance that she could do exactly what was tested and that before the test when she was told who was missing a kidney, she verified this in a matter of seconds.


How is "one at a time, Anita indicates whether she sees the right, left or both kidneys" difficult to understand?
Explain to me how I calculate the odds with an unknown number of targets and the possibility of a target being viewed more than once and some others not being viewed at all.

Smell is one of those things you'd try to eliminate.
I am very confident I can do a lot of things, but I would probably not agree to any test where I had to wear nose plugs during the process. Want to put me in a glass cage? No way. That would freak me out. Want to put the subjects in a glass cage? Same thing. Want to put them in a room with a glass window? Well, Anita will say that the glass blocks her perceptions.

You think like a scientist who sets up a protocol and then spends a bunch of research money finding people willing to be tested. A challenge is about a bunch of volunteers trying to organize a test that is acceptable to themselves, unpaid volunteers, and a subject who has already demonstrated a tenuous grasp on reality. Oh, and to do so as cheaply as possible.

Eighteen people, 17 right kidneys, 16 left kidneys. Each individual has a one in nine chance of missing a left kidney and a one in 18 chance of missing a right kidney. Getting one left kidney correct (one in nine) and no right kidneys correct, as well as seeing kidneys that aren't there, plus not seeing kidneys that aren't there, isn't going to look all that remarkable to the casual observer.
That's what happened, so I'm not seeing your point.
 
I think that the fuss is to remove some of the apparent capriciousness as to whose application and/or protocol is accepted. I'm not sure that setting an odds standard will prevent that, though.

Linda

That's the thing. Without a fairly major overhaul of the whole procedure, the JREF are free to dismiss an application at any time, up until they've actually signed it. Of course, applicants are free to do likewise. I agree with many of your concerns, but I don't see how this would make any difference. For example, with Pavel's protocol, they were aiming for odds of 1/1000. We can be reasonably sure that a set standard would not allow better odds than that, so it would have made no difference at all in his case.

Sometimes the organizations running the preliminary tests set standards higher than those that would apply if Randi were running the test.

I've not heard of the case you refer to but remember that Randi has to agree to all protocols, regardless of who is running the tests.
 
For example, with Pavel's protocol, they were aiming for odds of 1/1000. We can be reasonably sure that a set standard would not allow better odds than that, so it would have made no difference at all in his case.
I have no idea "what they were aiming for", but Pavel states that in August he was informed by the JREF that to pass the preliminary test he must perform at a 100% success rate in 20 trials, where his probability of success in each trial was 50%. See http://www.internationalskeptics.com/forums/showpost.php?p=5027325&postcount=228. The odds of doing that are less than one in a million, not one in a thousand.
 
I have no idea "what they were aiming for", but Pavel states that in August he was informed by the JREF that to pass the preliminary test he must perform at a 100% success rate in 20 trials, where his probability of success in each trial was 50%. See http://www.internationalskeptics.com/forums/showpost.php?p=5027325&postcount=228. The odds of doing that are less than one in a million, not one in a thousand.
That was one of Randi's gruff remarks that came after numerous attempts at fixing a protocol had come to nothing. It is not even clear if 20 out of 20 was what was really meant, or if he just wanted 20 attempts. The JREF has done many such tests, and they were never 20 out of 20, but more commonly 16 out of 20.
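For what it's worth, the gap between those two thresholds is easy to quantify, assuming each trial is an independent 50/50 event as Pavel describes:

```python
from math import comb

n, p = 20, 0.5  # 20 trials, 50% chance per trial (as described)

def p_at_least(k_min):
    """P(at least k_min successes out of n fair 50/50 trials)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

print(round(1 / p_at_least(20)))   # 20/20 -> about 1 in 1,048,576
print(round(1 / p_at_least(16)))   # 16/20 -> about 1 in 169
```

So 20 out of 20 is worse than one-in-a-million odds, while 16 out of 20 is roughly 1 in 169 by luck alone.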
 
That was one of Randi's gruff remarks that came after numerous attempts at fixing a protocol had come to nothing. It is not even clear if 20 out of 20 was what was really meant, or if he just wanted 20 attempts. The JREF has done many such tests, and they were never 20 out of 20, but more commonly 16 out of 20.
If the JREF is willing to test Pavel under the condition that he get 16 of 20 right in the preliminary test, he might accept that (although he claims less than an 80% hit rate). In any event, if the JREF is willing to back off what it told Pavel in August, why doesn't it clarify what would be an acceptable performance by Pavel?
 
Do you have any evidence that a fence-sitter feels *better* about Anita's claims being true than before?

I don't know. I certainly wouldn't feel comfortable assuming that they would see the results the same way as those who are busy crowing about how she got pwned.

Did you follow the kidney protocol thread?

No. I read the protocol for the test.

Anita rejected burqas. At first she was insisting that she be allowed to observe the entire group and then have an opportunity to dismiss any number she wanted (one time) so that she could then concentrate on the remaining subjects. I am very confident that she would not agree to a one person at a time reading.

I didn't suggest burqas. I suggested those things she had agreed to, but most of the subjects chose not to use - head-coverings, for example.

I realize that you will deny that any suggestions I make are feasible. My point is that the perception of what she is able to accomplish will depend more upon the experimental set-up than what she actually does. And that there is no point in making it easy for her to obtain dramatic results.

Nothing unexpected happened. She got one right, and that was a 1 in 4 chance. It's an excellent springboard to a discussion about confirmation bias and possible reasons why people believe they have special abilities. Of course, you remind them that she agreed in advance that she could do exactly what was tested and that before the test when she was told who was missing a kidney, she verified this in a matter of seconds.

It is important to realize that she will be judged on two fronts - whether she did something unexpected and whether she passed the test. You seem to be treating it as though, because she didn't pass the test, her results weren't unexpected. However, she correctly identified two people as missing a kidney, something that if it were due to random sampling, would only happen 8 percent of the time, under the conditions of the test. This intuitively seems to be close to our usual 5 percent cut-off. What's more, the way in which this was revealed was dramatic and obvious, as first the chosen subject is asked to stand and then the person without a kidney is asked to stand. This really emphasizes that it's the same person. And the reveal as to whether or not the correct side was chosen becomes almost an after-thought - especially because that part of the test is unremarkable once the person is identified.
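A quick check of that figure, assuming a blind 1-in-6 pick of the person in each of the three trials:

```python
from math import comb

p = 1 / 6   # blind chance of picking the right person in one six-person trial
p_two_or_more = sum(comb(3, k) * p**k * (1 - p)**(3 - k) for k in (2, 3))
print(p_two_or_more)   # 16/216 ~ 0.074, close to the quoted "8 percent"
```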

Explain to me how I calculate the odds with an unknown number of targets and the possibility of a target being viewed more than once and some others not being viewed at all.

I did in my prior post. If we are given the information that there are 18 subjects with 3 of them missing a kidney, two on the left and one on the right, then any individual, selected randomly, has a one in nine chance of missing a left kidney and a one in eighteen chance of missing a right kidney (ignoring the slight bias introduced by nobody missing two kidneys).


I am very confident I can do a lot of things, but I would probably not agree to any test where I had to wear nose plugs during the process. Want to put me in a glass cage? No way. That would freak me out. Want to put the subjects in a glass cage? Same thing. Want to put them in a room with a glass window? Well, Anita will say that the glass blocks her perceptions.

Or Anita could be asked what sort of scents she finds pleasing and a bunch of roses or an incense stick (or whatever) could be placed in the room with her.

You think like a scientist who sets up a protocol and then spends a bunch of research money finding people willing to be tested. A challenge is about a bunch of volunteers trying to organize a test that is acceptable to themselves, unpaid volunteers, and a subject who has already demonstrated a tenuous grasp on reality. Oh, and to do so as cheaply as possible.

I think that you simply plan to reject whatever I suggest.

That's what happened, so I'm not seeing your point.

This set-up takes the same results, but changes the extent to which they can be seen as unusual. Instead of making it look like she only made two errors (one of them trivial) on her way to doing something unexpected, all of her errors can be noted - seeing kidneys that aren't there and failing to see kidneys that are there. Plus all of her correct guesses will be expected, except for one. And that one would happen 11 percent of the time if due to random sampling - something that intuitively isn't as close to a remarkable finding.

Linda
 
In any event, if the JREF is willing to back off what it told Pavel in August, why doesn't it clarify what would be an acceptable performance by Pavel?
I think it is because they are fed up with Pavel, and the way he develops his abilities as he goes. If he had stated his ability clearly from the beginning, and if he did not constantly make last-minute changes to protocols, he would have been tested by now.

As I said at the time, I would not have dismissed Pavel's claim, and I would have liked to see him getting an extra chance, but I am more patient than Randi.
 
If the JREF is willing to test Pavel under the condition that he get 16 of 20 right in the preliminary test, he might accept that (although he claims less than an 80% hit rate). In any event, if the JREF is willing to back off what it told Pavel in August, why doesn't it clarify what would be an acceptable performance by Pavel?

I do not know.

What Randi said in his last statement was essentially one finger held up. The JREF's reasoning of "lack of personnel and time" has not cut it for me since then.
 
I didn't suggest burqas. I suggested those things she had agreed to, but most of the subjects chose not to use - head-coverings, for example.
They all wore straw hats and scarves on the backs of their necks.

I realize that you will deny that any suggestions I make are feasible. My point is that the perception of what she is able to accomplish will depend more upon the experimental set-up than what she actually does. And that there is no point in making it easy for her to obtain dramatic results.
I am trying to hammer home the point that when a scientist wants to conduct a study, she comes up with a protocol that she believes will be accepted as evidence by not only herself but by her peers. She then seeks out subjects for testing. If she can't find enough willing subjects, she either gives up because no other protocol is good enough or she makes adjustments. She doesn't negotiate with people.

A challenge is very different. The claimant makes some wild-ass claim that defies our current understandings of science. The organization and the claimant then negotiate a protocol. The claimant wants something acceptable for their "powers" to work, even if it is as irrational as the temperature of the room. The organization merely needs to eliminate ordinary and known means of passing the challenge. Nothing is actually proven if the claimant passes, so that's not an issue.

This is important to understand because while burqas (for example) might make an excellent choice for blinding, the subject has just as much control over the protocol and can reject it. You can't sit in an ivory tower and make proclamations about what is a good protocol. You need to wrestle with the pig to find something you both can accept.

It is important to realize that she will be judged on two fronts - whether she did something unexpected and whether she passed the test. You seem to be treating it as though, because she didn't pass the test, her results weren't unexpected. However, she correctly identified two people as missing a kidney, something that if it were due to random sampling, would only happen 8 percent of the time, under the conditions of the test. This intuitively seems to be close to our usual 5 percent cut-off. What's more, the way in which this was revealed was dramatic and obvious, as first the chosen subject is asked to stand and then the person without a kidney is asked to stand. This really emphasizes that it's the same person. And the reveal as to whether or not the correct side was chosen becomes almost an after-thought - especially because that part of the test is unremarkable once the person is identified.

Whose 5% cut-off? Do you think the general public knows about this number and how it is used? I don't think so. There's always a risk that is balanced against logistics and ease of presentation.


I did in my prior post. If we are given the information that there are 18 subjects with 3 of them missing a kidney, two on the left and one on the right, then any individual, selected randomly, has a one in nine chance of missing a left kidney and a one in eighteen chance of missing a right kidney (ignoring the slight bias introduced by nobody missing two kidneys).
What I am asking is how you actually conduct the test and calculate the odds. I *think* you are saying that people are randomly selected and presented to her. The person selected would then go back into the pool for possible selection again.

So, how does that work? How many people does she see?

Or perhaps you can explain to me how to calculate the odds for another suggestion, which is where each person was presented to Anita one at a time where she doesn't know how many people are missing kidneys. For each person she would indicate both kidneys or which kidney was missing. I figure with dependent events it is 3/36 * 2/35 * 1/34 to calculate the odds of getting all three correct.

Beyond that, I'm hitting a brick wall trying to derive the math and haven't tried to look it up.
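One way around the brick wall is a Monte Carlo sketch, which also illustrates the earlier point that the answer depends on the guesser's pattern. The rules below (18 people, each seen exactly once, guesser blind to the counts, a fixed guessing strategy) are my assumptions, since the exact protocol was never pinned down:

```python
import random

def simulate(guess_dist, trials=100_000, seed=0):
    """Estimate the chance a blind guesser correctly calls all three
    missing kidneys in an 18-person, one-at-a-time line-up (2 left,
    1 right missing).  guess_dist = probabilities of answering
    'both', 'left missing', 'right missing' for each person."""
    rng = random.Random(seed)
    states = ['both'] * 15 + ['left'] * 2 + ['right']
    answers = ['both', 'left', 'right']
    hits = 0
    for _ in range(trials):
        rng.shuffle(states)
        guesses = rng.choices(answers, weights=guess_dist, k=18)
        # Success = every missing kidney correctly called;
        # wrong calls on intact subjects are ignored here.
        if all(g == s for g, s in zip(guesses, states) if s != 'both'):
            hits += 1
    return hits / trials

print(simulate((1/3, 1/3, 1/3)))   # uniform guessing: ~1/27 ~ 0.037
print(simulate((0.8, 0.1, 0.1)))   # mostly answers 'both': ~0.001
```

With uniform guessing the estimate lands near 1 in 27; a strategy that mostly answers "both" drops it to roughly 1 in 1,000. Same protocol, very different odds, which is exactly the problem with leaning on a theoretical distribution.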

Or Anita could be asked what sort of scents she finds pleasing and a bunch of roses or an incense stick (or whatever) could be placed in the room with her.
And let's make sure that all 18 people don't have an allergic reaction to the scent.

I think that you simply plan to reject whatever I suggest.
Not at all, because it doesn't matter what you or I suggest. What matters is what the claimant is willing to accept, what is practical for the organization to do, and what the volunteer subjects are willing to endure.

Take your perfume suggestion. First, Anita would have to decide on what scent she likes. Then on the other coast the IIG has to try to find that scent. What if they don't make it anymore and Anita doesn't have enough? We're back to the drawing board.

Assume they can find the scent. Before they agree to it, they have to find 18 people, three of whom are missing a kidney, and see if they can handle being around the scent for 30 minutes without getting visibly annoyed or having an allergic reaction. Even nose plugs, assuming you can get 15 normal people and 3 people missing a kidney willing to wear them for 30 minutes without getting visibly annoyed, don't prevent eye irritation. Of course, if you trot them out one at a time, it's less of an issue.

Assuming you can arrange all that, you still have the problem of the claimant right before the test saying the scent is too strong and bothers her. Or maybe combined with the scent from the freshly shampooed carpets it annoys her so much she won't take the test.

This is the real world of challenge negotiations. To get back on track, what I'm saying is that it's easier to add an extra subject or two to change the odds than it is to go through all this ******** trying to find a perfectly blinded protocol. Therefore, the JREF and the IIG are smart not to cite any fixed odds requirements in advance.

ETA: Unlike ordinary scientific research, a challenge is a one-time event (maybe someday 2-times if anybody ever passes) that is likely to have press coverage. There are no second chances. There are no trial runs you can use to refine the process. That means you really do have to worry about things like people being allergic to perfumes or freaking out about wearing a nose plug.
 
They all wore straw hats and scarves on the backs of their necks.

I'm sorry. That sentence started out somewhat differently and I didn't edit it properly. It should say "chose to use" (rather than "not to use").

I am trying to hammer home the point that when a scientist wants to conduct a study, she comes up with a protocol that she believes will be accepted as evidence by not only herself but by her peers. She then seeks out subjects for testing. If she can't find enough willing subjects, she either gives up because no other protocol is good enough or she makes adjustments. She doesn't negotiate with people.

But we've already agreed that this is irrelevant, since this is not a scientific study. Why do you keep bringing it up? Are you under the impression that you are the only person here who has ever walked through a protocol negotiation with a claimant?

A challenge is very different. The claimant makes some wild-ass claim that defies our current understandings of science. The organization and the claimant then negotiate a protocol. The claimant wants something acceptable for their "powers" to work, even if it is as irrational as the temperature of the room. The organization merely needs to eliminate ordinary and known means of passing the challenge. Nothing is actually proven if the claimant passes, so that's not an issue.

Exactly. And you don't need to keep making this point, because as far as I can tell, we have been in agreement on this all along. I've been making this point for several years, so unless you came to this realization after we started this conversation and you are now telling me your position, it's a bit of a distraction for you to keep bringing it up.

This is important to understand because while burqas (for example) might make an excellent choice for blinding, the subject has just as much control over the protocol and can reject it. You can't sit in an ivory tower and make proclamations about what is a good protocol. You need to wrestle with the pig to find something you both can accept.

Yes, exactly.

Whose 5% cut-off?

The cut-off which is in general use for research in fields like medicine or parapsychology.

Do you think the general public knows about this number and how it is used?

Anyone with a bit of a science background will be familiar with it, or they might pick it up from press releases about research studies which often include statements about whether the results exceeded a certain p-value.

I don't think so. There's always a risk that is balanced against logistics and ease of presentation.

Yes, my point is that it should be relatively easy for your audience to tell whether or not something is unexpected or expected, without going through the process of calculating odds. If someone declares that they can roll doubles when they roll a pair of dice, I don't need to be able to do the calculation to see that doing so two times out of three would be unusual.
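As a side note, that dice intuition checks out with a short binomial calculation (Python sketch; the two-out-of-three figure is the one from the example above):

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)  # P(doubles) on one roll of two fair dice: 6/36

# Binomial: chance of at least 2 doubles in 3 rolls under pure luck.
p_at_least_2 = sum(comb(3, k) * p**k * (1 - p)**(3 - k) for k in (2, 3))

print(p_at_least_2, float(p_at_least_2))  # 2/27, about 7.4%
```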

What I am asking is how you actually conduct the test and calculate the odds. I *think* you are saying that people are randomly selected and presented to her. The person selected would then go back into the pool for possible selection again.

So, how does that work? How many people does she see?

Ah, I see - you mean coming up with a threshold and figuring out whether she exceeded that threshold for the purpose of the challenge. I was thinking about whether it would be clear to a casual observer, or whether it could be clear in a media blurb, just how likely or unlikely each guess was. As I mentioned earlier, I think the fence-sitters are more interested in whether Anita can do something interesting and unexpected than whether or not she can pass some particularly rigid standard.

Or perhaps you can explain to me how to calculate the odds for another suggestion, in which each person is presented to Anita one at a time and she doesn't know how many people are missing kidneys. For each person she would indicate both kidneys or which kidney was missing. I figure with dependent events it is 3/36 * 2/35 * 1/34 to calculate the odds of getting all three correct.

Beyond that, I'm hitting a brick wall trying to derive the math and haven't tried to look it up.

We can sit down and work out the probabilities for various outcomes for a specific protocol, and of course that part would be necessary for the purposes of handing out prize money. I don't know if you're asking for a tutorial, but I see it as a bit of a side issue to the idea of making the results intuitively obvious to the casual observer.

It really seems to me that the reason for the MDC and then smaller challenges like Anita's, is to show that these claims need to be treated with skepticism. And we can't do that if people won't subject their claims to testing (hence the carrot), and we also can't do that if people are able to demonstrate that they can perform unusual and unexpected feats. Whether or not they are unusual enough to pass (some might say) an unreasonable standard is somewhat secondary.

And let's make sure that all 18 people don't have an allergic reaction to the scent.

Not at all, because it doesn't matter what you or I suggest. What matters is what the claimant is willing to accept, what is practical for the organization to do, and what the volunteer subjects are willing to endure.

Take your perfume suggestion. First, Anita would have to decide on what scent she likes. Then on the other coast the IIG has to try to find that scent. What if they don't make it anymore and Anita doesn't have enough? We're back to the drawing board.

Assume they can find the scent. Before they agree to it, they have to find 18 people, three of whom are missing a kidney, and see if they can handle being around the scent for 30 minutes without getting visibly annoyed or having an allergic reaction. Even nose plugs, assuming you can get 15 normal people and 3 people missing a kidney willing to wear them for 30 minutes without getting visibly annoyed, don't prevent eye irritation. Of course, if you trot them out one at a time, it's less of an issue.

Assuming you can arrange all that, you still have the problem of the claimant right before the test saying the scent is too strong and bothers her. Or maybe combined with the scent from the freshly shampooed carpets it annoys her so much she won't take the test.

Ah, I see. The issue of various smells present in the building and carried on or about people was so minor that it didn't deserve any mention in the protocol, but once you need a way to reject whatever I say, it becomes an insurmountable problem.

Silly me.

Linda
 
But we've already agreed that this is irrelevant, since this is not a scientific study. Why do you keep bringing it up? Are you under the impression that you are the only person here who has ever walked through a protocol negotiation with a claimant?

You said, "there's no point in making this any easier" for the claimant as if the JREF or the IIG are sitting around trying to find ways to make things easier. Every time they make it "easier" it's a calculated concession.

The cut-off which is in general use for research in fields like medicine or parapsychology.

Anyone with a bit of a science background will be familiar with it, or they might pick it up from press releases about research studies which often include statements about whether the results exceeded a certain p-value.
I know what it is. By asking "whose" I was pointing out that your use of "our cutoff" presumes a familiarity that just isn't there. I doubt that the general public is even vaguely aware of this number.

Yes, my point is that it should be relatively easy for your audience to tell whether or not something is unexpected or expected, without going through the process of calculating odds.
How should they do it? Intuitively? Isn't that how these beliefs come about in the first place? If the test can serve as a mini-lesson in critical thinking, that's a good thing. That's why it's important to have the probability calculation be as simple as possible.

Ah, I see - you mean coming up with a threshold and figuring out whether she exceeded that threshold for the purpose of the challenge. I was thinking about whether it would be clear to a casual observer, or whether it could be clear in a media blurb, just how likely or unlikely each guess was. As I mentioned earlier, I think the fence-sitters are more interested in whether Anita can do something interesting and unexpected than whether or not she can pass some particularly rigid standard.
Sorry, but that's not what I'm asking. With the "three rounds of six" protocol, it's easy for a layperson to understand how the odds for getting all three correct are calculated. It's also easy to understand how with three guesses a person has about a 25% chance of getting one right.

Right now, I don't even understand what it is you propose with selecting people and allowing them to be put back in the pool. Is she looking at all 18 people at once? Sequentially? Is she randomly being presented people to read? If so, how many?

I can't comment on whether it's better that what they did if I don't even understand how it works.

We can sit down and work out the probabilities for various outcomes for a specific protocol, and of course that part would be necessary for the purposes of handing out prize money. I don't know if you're asking for a tutorial, but I see it as a bit of a side issue to the idea of making the results intuitively obvious to the casual observer.
Intuition: direct perception of truth, fact, etc., independent of any reasoning process;

I don't want people using intuition. That's what Anita did with the results, and look where that got her. I want to present people with basic facts and simple reasoning so they can think about it critically. The "three trials of six" scenario is pretty easy to understand for a layperson.

What I want to know is your proposal for the test and the probabilities for her getting k right in n guesses. Maybe it's better. I don't know.

Ah, I see. The issue of various smells present in the building and carried on or about people was so minor that it didn't deserve any mention in the protocol, but once you need a way to reject whatever I say, it becomes an insurmountable problem.

Silly me.
You're taking things personally. This has nothing to do with you. What I am trying to show is my reasoning behind not having fixed or consistent odds for challenges.

Suppose the claimant does not use the sense of smell as part of her special abilities. Suppose further that we think there's a small chance that smell might give her a slight edge over chance in our protocol, but at the same time we don't think it's at all possible that she could use it to ace the test.

A) No mention of smell is made during negotiations. Nothing about smell is written into the contract we call a protocol. At the time of the test, the claimant refuses to take the test because she doesn't like a smell that doesn't seem to bother anyone else. Does she get a retest? No, and I doubt any judge would grant her one even if he didn't toss the case.

B) Same as the above except she complains about the smell after the test. Does she get a retest? No way in hell.

C) Smell is discussed during negotiations as a method of blocking ordinary means of detection, and a specific scent is agreed upon. It becomes part of the signed contract where it is the organization's duty to create the testing environment. Right before the test the claimant complains that the scent is too strong or when combined with the scent from the freshly shampooed carpet is too distracting. Does she get a retest? Yep.

So, if I'm the IIG or the JREF, I'm fine with scenarios A & B, but not C. If I normally use 1,000 to 1 odds in situations where I am 99.9% sure of my blinding, then I might go with 1,728 to 1 odds in this challenge and skip trying to deal with the perfume issue. It's a trade-off.

Thus, no fixed odds.
 
You said, "there's no point in making this any easier" for the claimant as if the JREF or the IIG are sitting around trying to find ways to make things easier. Every time they make it "easier" it's a calculated concession.

I understand that. I'm suggesting that it's valuable to consider not conceding on a point that provides the opportunity for a dramatic result, but to look for a different option instead.

I know what it is. By asking "whose" I was pointing out that your use of "our cutoff" presumes a familiarity that just isn't there. I doubt that the general public is even vaguely aware of this number.

I'm not sure the general public is the relevant audience here. I'm thinking of the people who pay any attention to the Challenge and the results, or who would be watching a program on Randi or attending one of his lectures.

How should they do it? Intuitively? Isn't that how these beliefs come about in the first place?

Yes, exactly. If they are forming their beliefs based on intuitive assessments, then I suspect that addressing that intuition will be more persuasive than asking them to rely on methods they are not comfortable using.

If the test can serve as a mini-lesson in critical thinking, that's a good thing. That's why it's important to have the probability calculation be as simple as possible.

I agree. I'm especially looking for ways to combine that with something familiar. Research on decision-making shows that we tend to depend upon intuitive assessments, which are sometimes very wrong. But if you present the information in a way that shows a closer match between the way we use our intuition and the actual probability, our assessments become more accurate.

Sorry, but that's not what I'm asking. With the "three rounds of six" protocol, it's easy for a layperson to understand how the odds for getting all three correct are calculated. It's also easy to understand how with three guesses a person has about a 25% chance of getting one right.

It still needs a basic understanding of binomial probabilities, and that is all that is needed for my suggestion as well.

Right now, I don't even understand what it is you propose with selecting people and allowing them to be put back in the pool. Is she looking at all 18 people at once? Sequentially? Is she randomly being presented people to read? If so, how many?

The subjects are presented one at a time, chosen randomly with replacement (which means that some subjects may be read more than once, and some not at all). For each one, Anita indicates whether she sees the right, left, or both kidneys.

The number of trials will depend upon how accurate Anita thinks she would be with that test and the p-value threshold that the IIG is looking for.
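As a sketch of how those two inputs determine the trial count (hypothetical numbers: a per-trial chance probability of 1/6 and the 1,728-to-1 target mentioned elsewhere in the thread; a real protocol would substitute the actual per-trial chance):

```python
from fractions import Fraction

# HYPOTHETICAL inputs: chance probability of a correct call per trial,
# and the target odds the organization wants to beat.
p_chance = Fraction(1, 6)
threshold = Fraction(1, 1728)

# Smallest run of consecutive correct calls whose chance probability
# falls at or below the threshold.
n = 0
p_all = Fraction(1)
while p_all > threshold:
    n += 1
    p_all *= p_chance

print(n, p_all)  # 5 1/7776
```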

Intuition: direct perception of truth, fact, etc., independent of any reasoning process;

I don't want people using intuition. That's what Anita did with the results, and look where that got her. I want to present people with basic facts and simple reasoning so they can think about it critically. The "three trials of six" scenario is pretty easy to understand for a layperson.

But intuition is what people use and (as you mentioned) what Anita used. If you had presented Anita with a protocol in which the results she achieved did not intuitively seem all that remarkable, maybe she'd be less of a pain (I realize that's a laughable idea :)).

What I want to know is your proposal for the test and the probabilities for her getting k right in n guesses. Maybe it's better. I don't know.

You would use a binomial distribution to calculate the probabilities. I would go through the work of figuring out the exact numbers and explaining it if this was a real protocol.

You're taking things personally. This has nothing to do with you.

Don't worry. I'm aware that others enjoy the same treatment. :)

What I am trying to show is my reasoning behind not having fixed or consistent odds for challenges.

Suppose the claimant does not use the sense of smell as part of her special abilities. Suppose further that we think there's a small chance that smell might give her a slight edge over chance in our protocol, but at the same time we don't think it's at all possible that she could use it to ace the test.

A) No mention of smell is made during negotiations. Nothing about smell is written into the contract we call a protocol. At the time of the test, the claimant refuses to take the test because she doesn't like a smell that doesn't seem to bother anyone else. Does she get a retest? No, and I doubt any judge would grant her one even if he didn't toss the case.

B) Same as the above except she complains about the smell after the test. Does she get a retest? No way in hell.

C) Smell is discussed during negotiations as a method of blocking ordinary means of detection, and a specific scent is agreed upon. It becomes part of the signed contract where it is the organization's duty to create the testing environment. Right before the test the claimant complains that the scent is too strong or when combined with the scent from the freshly shampooed carpet is too distracting. Does she get a retest? Yep.

So, if I'm the IIG or the JREF, I'm fine with scenarios A & B, but not C. If I normally use 1,000 to 1 odds in situations where I am 99.9% sure of my blinding, then I might go with 1,728 to 1 odds in this challenge and skip trying to deal with the perfume issue. It's a trade-off.

Thus, no fixed odds.

I was simply pointing out that since your solution does not actually address the problem - i.e. increasing the odds does not necessarily affect whether or not it will be too easy for her to pass the test when holes are left in the protocol - it may be better to address holes in the protocol instead.

Linda
 
Yes, exactly. If they are forming their beliefs based on intuitive assessments, then I suspect that addressing that intuition will be more persuasive than asking them to rely on methods they are not comfortable using.
Wow. I find that condescending as well as ineffective. The way to teach critical thinking skills is to, well, teach critical thinking skills. Setting up a test so their "intuition" matches the results is a rather ridiculous approach.

The subjects are presented one at a time, chosen randomly with replacement (which means that some subjects may be read more than once, and some not at all). For each one, Anita indicates whether she sees the right, left, or both kidneys.

The number of trials will depend upon how accurate Anita thinks she would be with that test and the p-value threshold that the IIG is looking for.

Anita claimed 100% accuracy and we know the IIG was okay with 1,728 to 1 odds, so please continue with your explanation.

* How many trials?
* What are her odds of getting 1 or more correct while still failing?

You see, Linda, I'm way above the average layman when it comes to understanding stats, but I am at a loss right now how to calculate this. Their version of the test is incredibly simple to calculate and explain even if you've never read a statistics textbook.

You would use a binomial distribution to calculate the probabilities. I would go through the work of figuring out the exact numbers and explaining it if this was a real protocol.
The explanation of the statistics and the chances of getting "some" right are incredibly important factors in setting up a test like this. They don't mean squat in real research but they mean everything in a publicity stunt.

I was simply pointing out that since your solution does not actually address the problem - i.e. increasing the odds does not necessarily affect whether or not it will be too easy for her to pass the test when holes are left in the protocol - it may be better to address holes in the protocol instead.
And thus we're back to wrestling with the pig.

Look, you patch all the holes that money, time, and the irrational person on the other side of the table will let you patch. You then make a judgment call about whether you have enough confidence that your money is completely safe. Sometimes you tweak the odds to make yourself feel more comfortable.

After all, there's nothing inherently "right" about 1,000 to 1, 20 to 1, 10,000 to 1, or 1,728 to 1. It boils down to a judgment call anyway.
 
Anita claimed 100% accuracy and we know the IIG was okay with 1,728 to 1 odds, so please continue with your explanation.

* How many trials?
* What are her odds of getting 1 or more correct while still failing?

You see, Linda, I'm way above the average layman when it comes to understanding stats, but I am at a loss right now how to calculate this. Their version of the test is incredibly simple to calculate and explain even if you've never read a statistics textbook.

I agree that the results of her test were fairly simple to calculate, which is why the average layman can understand that she did something unexpected.

There are two aspects to consider when looking at the test I suggested. As Anita makes each guess, the average layman will be able to understand the probability that a "missing kidney" guess is correct and the probability that a "kidney" guess is correct. And their focus will be on whether or not she gets the "missing kidney" guesses correct, since those will obviously be the low probability guesses. The layman is looking at the results after they have happened which makes these estimates fairly straightforward.

The IIG, on the other hand, is looking at the results before they have happened. And this is the part which can be fairly straightforward or more complicated depending upon the set-up, because they basically have to guess at what the distribution of results might be, and set criteria based on that. But this part is also relatively unimportant in terms of whether or not your audience understands the test and the results, as the explanation can be more direct.
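For what it's worth, the "k right in n guesses" distribution asked about earlier is easy to tabulate once a per-trial chance probability is fixed. Hypothetical placeholders below: n = 5 trials and a per-trial chance of 1/6, standing in for whatever the real protocol would imply:

```python
from fractions import Fraction
from math import comb

# HYPOTHETICAL placeholders: n trials, per-trial chance probability p.
n = 5
p = Fraction(1, 6)

# Full binomial distribution of k correct calls out of n.
dist = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
p_one_or_more = 1 - dist[0]

print({k: float(v) for k, v in dist.items()})
print(float(p_one_or_more))  # chance of 1 or more correct by luck
```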

The explanation of the statistics and the chances of getting "some" right are incredibly important factors in setting up a test like this. They don't mean squat in real research but they mean everything in a publicity stunt.

I wish it was true that background information means squat when setting up real research. It would make my life much easier. :)

And thus we're back to wrestling with the pig.

Look, you patch all the holes that money, time, and the irrational person on the other side of the table will let you patch. You then make a judgment call about whether you have enough confidence that your money is completely safe. Sometimes you tweak the odds to make yourself feel more comfortable.

After all, there's nothing inherently "right" about 1,000 to 1, 20 to 1, 10,000 to 1, or 1,728 to 1. It boils down to a judgment call anyway.

Like I said, I don't disagree that there are valid reasons to maintain flexibility in setting the odds. I just don't think "let's pretend it's a solution even though it doesn't actually solve the problem" should be one of them.

Linda
 
