Is Telekinesis Real?

Consider that tampering with voting machines is not the only, or even remotely the best, way of affecting the results of an election. A far more effective and more vulnerable point of attack is the tabulator, which is why that's what Fancy Bear goes after.



Let's be clear what the actual anomalies in the data are.

First, the baselines. These are the runs that supposedly established the unaffected behavior of the REGs such that putatively affected runs would have empirically strong data to compare against. This is, as I mentioned before, an example of good empirical control. Rather than compare the experimental results to some theoretical expectation, they were compared against a measured expectation. But the problem is that the calibration runs were too closely correlated to have credibly come from actual equipment. Dr. Steven Jeffers provides the statistical argument establishing this. As I explained at length earlier, baselines that are "too good to be true" will amplify any variance in experimental data.
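
To make the "too good to be true" point concrete, here's a minimal sketch of the kind of check involved -- my own illustration, not Jeffers' actual analysis, and the run counts and bit lengths are invented: compare the spread of the calibration runs against the spread a fair binary REG should theoretically produce, and ask how probable a spread that small is by chance.

[CODE]
# Illustration only -- invented parameters, not Jeffers' actual method or PEAR's data.
# Ask whether calibration runs cluster MORE tightly than a fair binary REG should.
import numpy as np
from scipy import stats

bits_per_run = 200                               # hypothetical bits summed per run
theoretical_var = bits_per_run * 0.5 * 0.5       # binomial variance n*p*(1-p)

rng = np.random.default_rng(0)
calib_runs = rng.binomial(bits_per_run, 0.5, size=500)   # stand-in for real baseline totals

n = len(calib_runs)
sample_var = calib_runs.var(ddof=1)

# Chi-square test for variance: (n-1)*s^2/sigma0^2 ~ chi2(n-1) under the null.
chi2_stat = (n - 1) * sample_var / theoretical_var
p_too_small = stats.chi2.cdf(chi2_stat, df=n - 1)        # lower-tail p-value

print(f"sample variance {sample_var:.1f} vs theoretical {theoretical_var:.1f}")
print(f"P(variance this small or smaller by chance) = {p_too_small:.3f}")
# A tiny lower-tail p-value flags baselines clustered more tightly than
# genuine random equipment plausibly could be.
[/CODE]

A baseline that passes that sort of check can serve as the measured expectation; one that fails it is itself anomalous.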

As an aside, this anomaly was brought to Jahn's attention, and his explanation is cause for concern. He opined that the operators performing the calibration runs unconsciously willed them to have exceptionally good performance. If so, then they are not unaffected runs and have no value as a baseline. This is where the controls suggested by various reviewers would have been useful. Jahn essentially admits that the calibration runs may have been affected by the subjects being present at the time of the calibration. That alone justifies the controls and invalidates his results.

Second, Operator 010. All the data responsible for significant variance in the experimental data was produced by a single subject, whose purported effect was greater than that of all the other subjects combined. We do not have to propose that all 22 subjects discovered a way to falsify data. Only one subject produced data that was at all interesting, and that's the one subject we should look more closely at. Nefarious suspicions aside, if one subject displays the purported ability in spades and 21 don't at all, that's a red flag for anomalous data no matter how you suspect it happened. You want more variance across subjects.

Third, volitional control -- which is what you touch on above. The effect only appeared (and keep in mind it only ever appeared for Operator 010) when the subject got to choose which of several available experimental protocols to attempt. If the choice of which to do was taken away from the subject, the ability to affect the REG disappeared entirely. Again, if we think pessimistically, this merely suggests that that was the only protocol Operator 010 had discovered how to subvert. This is an example of an empirical control doing its job. When the effect correlates perfectly to a control variable, you know that the basis underlying the control variable is what's explaining the data. "Operator 010 could affect the REG, but only if she got to choose how the experiment that day would be conducted."

Again, it doesn't matter whether we can imagine how Operator 010 got those results to happen. What matters is that the data point much more strongly to anomalous empirical outcomes that were either uncontrolled, or controlled but ignored in the analysis by diluting them with the aggregation.
 
I am guilty of repeating what I have said before -- I will get to Palmer's article and Operator 010 later this week. I will explain why Palmer is wrong about this particular reading.

One more thing -- you do not compare measured results to measured expectation, you compare them to theoretical expectation to rule out the possibility of experimental error, which is always present. In other words, the measured expectation is not precise, so it is not used as a benchmark.
 
The purpose of the Princeton study, as I understand it, was to show that so-called everyday people have occasional sparks of telekinesis, but they cannot sustain their telekinetic abilities for long; this is the reason why they chose statistical methods to analyze their results.

And you have completely ignored the fact that only one of PEAR's 22 subjects managed to display any "gift" at all, and only under circumstances she chose. Does that not make you suspicious? The "everyday people" in the experiments -- and in the attempts to replicate the findings -- showed no "gift" of psychokinesis. Yet for some reason you seem to think this study is scientific proof of the phenomenon.

Then there is Uri Geller, who claims to have telekinetic abilities, but he is a conman who made millions demonstrating his "gift".

Given the desire and ability of claimants in this field to dupe experts and the public, why are you trying to argue that reasonable controls to prevent such occurrences were unjustified in what was billed as scientific research of a true phenomenon? Is it just so you can have some apparent means to belittle Jahn's critics?
 
I am guilty of repeating what I have said before...

Which you suddenly seem to have time to do. That was written to bring another poster up to speed on the discussion. You seem to have time to make myriad promises of future debate, but no time to do it.

The problem is that you claimed -- at the very beginning of this thread -- that all Jahn's critics were incompetent, and you claimed later that Palmer must have been biased. But this was all before you even knew who PEAR's critics were or what they said. Why should we believe now that at some future time -- once you've finally digested what these critics actually wrote -- you will have an honest and skilled answer to them? You've obviously approached this research and its critics with a heavy bias in favor of Jahn, et al.

Unless, of course, you're prepared to admit that you passed judgment on these people without even reading what they wrote. Are you honest enough to do that?
 
The purpose of the Princeton study, as I understand it, was to show that so-called everyday people have occasional sparks of telekinesis, but they cannot sustain their telekinetic abilities for long...


This is not an uncommon result in paranormal research. Certain subjects on certain runs get better than expected results, but a following run fails to show the same effect. It’s otherwise known as “luck”.
 
As for being repetitive, I agree with you. I try to respond to some posts that I find interesting, although they may contain similar data.

If you pick and choose the posts to respond to in a way that makes you spend your precious little time simply repeating yourself, then I reject your claim to be too busy to attend to salient rebuttals.

For six pages I've asked you to address the analysis by Steven Jeffers. You claim to be a statistician, and his argument is statistical in nature. You told us if there were problems with the protocol, they would have manifested themselves in anomalies in the data. Jeffers found just such anomalies. By any objective measure, he is a critic who has addressed your very concerns. Yet you pretend he doesn't exist and instead speak only about a critic you chose.

This is very suspicious, Buddha. You're clearly not pressed for time, as you claim. And you're clearly unwilling to address critics your opponents name. You've even wasted one of today's precious posts trying to justify why your opponents here should respect your choice of critic.

Answer Jeffers immediately or admit you cannot.
 
Please read my post carefully. I already explained why the methods of data tampering that you described wouldn't work.

No, you rejected the methods of data tampering that you described. That was properly pointed out to be a straw man. I further went into substantial detail showing why your whole line of reasoning missed the most salient point about empirical controls -- a subject you've already admitted ignorance of.

I suspect that, unlike one of my opponents, you are not an engineer so I think it would take some time for you to understand my post...

The comical naivete of your relevant post proved you are not an engineer. And we have a long string of your prior threads to show you will happily claim expertise you eventually cannot demonstrate if you think it will help your argument. Once again, instead of addressing the actual criticism of an "intelligent person," you have simply tried to gaslight him into believing your naive arguments still somehow work because you are somehow more qualified than they are, regardless of the field. "I'm so much smarter than you and you evidently don't understand me," is an arrogant and ineffective response. Correct it.
 
It doesn't even have to be that conclusory. I find it amusing that Buddha happily believes people can affect machines with the power of their minds, but at the same time finds it impossible that a clever subject can figure out how to rig the data in a poorly-controlled experiment conducted in a controversial field its practitioners admit is fraught with prior malfeasance. His opinion of what's a priori feasible is pretty far off in the fringe sigmas.
If you say that a particular experiment could be rigged, you should suggest how to do it, but so far I do not see a plausible suggestion from you or anyone else. But I have a surprise for you -- it is possible to rig this experiment, although this comes with a cost.

You could disconnect the cable from the device and connect your own device to it. By doing this you would be able to enter any data you like into the recording device. But there is a catch -- you would have to calibrate the recording device, otherwise your data won't make any sense. It takes time to calibrate a device; I did it before and I know how it works, it took me about an hour to achieve the goal. I am not the most efficient calibrator, I admit that. But even a technological whiz won't be able to do it in less than 10 minutes. While the calibration is in progress, the recording device doesn't record any data, so there is a time gap that is easily noticeable. Then, after you're done, you would have to connect the REG back to the recording device, and calibrate it again. In theory this method would work, but it is completely impractical.

Now, about my convictions -- I didn't choose to believe in telekinesis because I believe in God; there is no connection between these areas. As a matter of fact, the head of the Princeton research team is an atheist; he thinks that people acquired telekinetic ability as a result of the evolution of species. There are other atheist scientists who agree with him.
 
If you say that a particular experiment could be rigged, you should suggest how to do it...

No, for the reasons already given, which you did not address. You failed to understand the very thorough post I wrote, and are still bent on reversing the burden of proof for empiricism. Nor is your ongoing ignorance of empiricism corrected by endless strings of "impossible" straw men that you propose. Address the point I actually made.

Now, about my convictions -- I didn't choose to believe in telekinesis because I believe in God...

Straw man. You believe in telekinesis, and -- regardless of why -- this has led you to approach PEAR and its critics with obvious bias. You are unwilling to address and correct that bias.

As a matter of fact, the head of the Princeton research team is an atheist; he thinks that people acquired telekinetic ability as a result of the evolution of species. There are other atheist scientists who agree with him.

Irrelevant. None of Jahn's critics attribute to him any ulterior motive that you need address. His critics focused entirely on the statistical and methodological errors he made which made his findings untenable. Address those. Do not make up new ways in which you think your opponents or Jahn's critics have been unfair.
 
One more thing -- you do not compare measured results to measured expectation, you compare them to theoretical expectation to rule out the possibility of experimental error, which is always present. In other words, the measured expectation is not precise, so it is not used as a benchmark.

No, this is a complete misunderstanding of the t-test for significance. The measured expectation includes and subsumes all the ways in which the experimental apparatus may differ from theoretical expectations, or may vary according to unknown variables, without the experimenter having to know what they are and control for them individually. It controls for the result, not for any suspected process. This is qualitatively different from comparing observation to theory and attempting thereafter to control for specific anticipated error. Most notably, it's one of the ways in which Palmer argues PEAR improved over its predecessors. Yes, there are always sources of error in an empirical study. No, you will not always know what they are. That's why the t-test exists. If you employ the t-test method, you don't have to know what they are in order to control for them. You only have to be able to measure their combined effect to a certain degree of certainty. That degree then controls the significance of any observed variance from expectation in the experimental results. Yes, the tradeoff is accepting uncertainty in the baseline. No, that doesn't make it inappropriate as a baseline.
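
If it helps, here's a rough sketch of the comparison I'm describing, with invented numbers rather than PEAR's actual data. The point to notice is that the measured baseline enters the test as its own noisy distribution, and that an artificially tight baseline inflates the apparent significance of the very same experimental runs -- which is exactly why the calibration anomaly matters.

[CODE]
# Illustrative sketch only -- invented numbers, not PEAR's data or anyone's published analysis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
baseline_runs = rng.normal(loc=100.0, scale=7.0, size=400)      # measured expectation
experimental_runs = rng.normal(loc=100.4, scale=7.0, size=400)  # putatively "affected" runs

# Welch's two-sample t-test: the baseline's own uncertainty is part of the calculation.
t_stat, p_value = stats.ttest_ind(experimental_runs, baseline_runs, equal_var=False)
print(f"against the measured baseline: t = {t_stat:.2f}, p = {p_value:.3f}")

# An artificially tight baseline shrinks the pooled error estimate and inflates
# the apparent significance of the same experimental runs.
tight_baseline = rng.normal(loc=100.0, scale=1.0, size=400)
t2, p2 = stats.ttest_ind(experimental_runs, tight_baseline, equal_var=False)
print(f"against a too-good baseline:   t = {t2:.2f}, p = {p2:.3f}")
[/CODE]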

You've just illustrated that you have no idea how the t-test works, no idea when it should be used, no idea about how practitioners reckon its uncertainty, no idea how uncertainty plays into statistics at large, and therefore no prayer whatsoever of understanding what Dr. Steven Jeffers says about PEAR's method. In your haste to pretend to be an expert, you keep demonstrating profound ignorance. You're not just messing up the details; you're fumbling over foundational concepts.

Another foundational concept you've completely stumbled over is the approach of engineering to systems analysis. It's well and good to be able to predict or extrapolate the behavior of an engineered system according to theory, or even according to an expert knowledge of its design. But that is no substitute for measuring actual behavior and actual error. Again, measurement of results subsumes all the effects one knows about and all the ones that the engineer doesn't know about. Researchers such as Petroski and Perrow have shown repeatedly that when designs reach a certain complexity they become fundamentally unpredictable. This results in what Perrow terms "normal accidents" unless controlled for in a closed-loop fashion. Hence the response of engineering is to measure where possible instead of theorizing and predicting.

You're suggesting the measurement of error in the REG is an inappropriate standard against which to compare alleged other sources of variance, and that theory should be the guide. That's all the proof I need that you have zero experience or knowledge in any field relevant to your contributions to this forum. I had my suspicions during your attempt to prove reincarnation, but this post of yours today cements my conclusion that you have no relevant authority or expertise from which to argue as condescendingly as you have.
 
If you say that a particular experiment could be rigged, you should suggest how to do it, but so far I do not see a plausible suggestion from you or anyone else. But I have a surprise for you -- it is possible to rig this experiment, although this comes with a cost.

You could disconnect the cable from the device and connect your own device to it. By doing this you would be able to enter any data you like into the recording device. But there is a catch -- you would have to calibrate the recording device, otherwise your data won't make any sense. It takes time to calibrate a device; I did it before and I know how it works, it took me about an hour to achieve the goal. I am not the most efficient calibrator, I admit that. But even a technological whiz won't be able to do it in less than 10 minutes. While the calibration is in progress, the recording device doesn't record any data, so there is a time gap that is easily noticeable. Then, after you're done, you would have to connect the REG back to the recording device, and calibrate it again. In theory this method would work, but it is completely impractical.

Now, about my convictions -- I didn't choose to believe in telekinesis because I believe in God; there is no connection between these areas. As a matter of fact, the head of the Princeton research team is an atheist; he thinks that people acquired telekinetic ability as a result of the evolution of species. There are other atheist scientists who agree with him.

Mewling excuses, nothing more.

Sad. Low energy.

Several far simpler examples of how the data could be compromised have been given. Your obsessive need to focus on irrelevant minutiae reveals more about the nature of your arguments than I think you realize.



I did it before and I know how it works, it took me about an hour to achieve the goal.

I don't believe you. The ignorance of basic principles reflected in your posts proves to me, beyond all doubt, that you are either lying about your professional training and experience, or are singularly inept in your field. Your own words have neutered your claims of expertise. Such statements from you are devoid of any meaning.
 
One must regularly fulfill promises if one is to be taken seriously when one makes them.

Indeed, when he asked for links to criticism of PEAR that we would like him to address, and we gave them to him, he lies and says none were provided and so he's justified in launching into constantly-deferred criticism of the ones he's cherry-picked. He's trying to script both sides of the debate and he thinks we can't see him doing it. Like I said, this approach seems to be favored by folks for whom I suspect gaslighting has been a primary means of manipulation in the past.
 
“However, this issue loses importance when one considers how the significance is distributed among the various subjects tested. In the formal series, only two of the 22 subjects tested provided independently significant results. The bulk of the significance is attributable to one of these subjects, who contributed 14 of the 61 formal series (23%). In these series, this subject achieved a hitting rate (in terms of his or her intent) of 50.05% over 105,150 runs (z = 4.49, p < 10^-4). When the results of this subject are eliminated, the remaining series are no longer significant (50.01%, z = 1.36). This subject's scoring rate is significantly higher than that of the other subjects combined (z = 3.14, p < .005).”
Palmer, page 112.
http://www.dtic.mil/dtic/tr/fulltext/u2/a169486.pdf

Palmer suggests that the scores of this particular subject should be discarded. In statistics such unusual scores are called “outliers.”

Should an outlier be discarded? It depends on the application. For example, during analysis of a yearly company performance an outlier is usually discarded if the purpose of the analysis is to determine the overall earnings trend (usually the method of linear regression analysis is used for this purpose). There is a good reason for that – a company’s earnings might have increased twofold in a given month if it had sold some of its raw materials during that month, for instance; otherwise, the trend would be calculated incorrectly. But if the company is engaged in so-called instant stock trading, the exclusion of an outlier would be a big mistake because it would lead to incorrect prediction of the stock’s next movement. In control systems the outliers are kept in place because a failure to take them into account could lead to the system’s instability and even destruction.
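
A crude sketch with invented monthly earnings illustrates the point -- one anomalous month can swing the fitted trend, so for trend analysis it is set aside:

[CODE]
# Invented monthly earnings, purely for illustration of outlier handling in a trend fit.
import numpy as np

months = np.arange(12)
earnings = np.array([10.0, 10.4, 10.8, 11.1, 11.5, 11.9,
                     12.2, 12.6, 13.0, 13.3, 13.7, 30.0])  # last month: one-off asset sale

slope_all, _ = np.polyfit(months, earnings, 1)

keep = earnings < 25.0                                      # crude exclusion of the anomalous month
slope_trimmed, _ = np.polyfit(months[keep], earnings[keep], 1)

print(f"trend with the outlier:    {slope_all:.2f} per month")
print(f"trend without the outlier: {slope_trimmed:.2f} per month")
[/CODE]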

My former colleague designs control systems for chemical processes involving combustion; a decision not to take an outlier into account would, certainly, lead to a dangerous explosion in such systems.

The goal of the Princeton research was to investigate telekinetic capabilities of the whole group, not of its individual members. In his evaluation Palmer made the same mistake that I did when, as a rookie data analyst, I was assigned to a clinical drug trial.

My job was to collect the trial data and put it in a prescribed format, so a more experienced data analyst could use statistical analysis of it to draw conclusions. The company management wanted me to do an independent statistical analysis and compare it to the one made by my colleague, so I could learn from my mistakes, as they put it. Indeed I made a gross mistake when I set aside a set of statistical data belonging to the subject who was declared cancer-free, while some participants were declared to be in remission only, and the others had experienced only minor improvements. “The doctors should investigate this person’s genetic makeup so they could find more effective treatment for the rest of the patients,” I said. “This is not how it works. The purpose of the trial was not to identify a particular person for whom the tested drug worked wonders; we should judge the overall effectiveness of the medication. The FDA does not care about this person’s ‘miraculous’ recovery, they want to see the overall drug performance, so they can decide whether it deserves the license or not,” said my more experienced colleague.

If this subject’s scores are discarded, then the researchers would come to the inevitable conclusion that he possesses highly developed telekinetic abilities, which would prove that telekinesis exists. However, the purpose of the research was not to identify the individuals with pronounced telekinetic skills but to judge the performance of the group as a whole.

An outlier policy depends on the research purpose, but it seems to me that Palmer didn’t realize that. Perhaps he is a straw man after all, as one of my opponents suggested. Well, it was not my intent to choose a weak report criticizing the Princeton ESP research, I just followed the data presented in a Wikipedia article.

Here is the link to the Wikipedia article https://en.wikipedia.org/wiki/Princeton_Engineering_Anomalies_Research_Lab
 
Palmer suggests that the scores of this particular subject should be discarded. In statistics such unusual scores are called “outliers.”

Yes, we all took basic statistics. Don't pontificate. You're not the teacher here.

Should an outlier be discarded? It depends on the application.

No, not really. If one data point out of many is responsible for all the variance purported for a distribution, it is clearly anomalous. In most cases, theory predicts the general shape to expect from the data. When all the data but one point fit any of the general predictions, that's how we are able to determine that it is anomalous.
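
As a rough sketch -- with invented per-subject numbers, not PEAR's actual results -- one common screening approach is to ask how far each subject's result sits from the distribution formed by everyone else:

[CODE]
# Illustration only: invented effect sizes for 22 subjects; the last one is the odd point out.
import numpy as np

effects = np.array([0.01, -0.02, 0.00, 0.03, -0.01, 0.02, 0.00, -0.03,
                    0.01, 0.00, 0.02, -0.01, 0.01, 0.00, -0.02, 0.03,
                    0.00, 0.01, -0.01, 0.02, 0.00, 0.45])

for i, x in enumerate(effects):
    rest = np.delete(effects, i)
    z = (x - rest.mean()) / rest.std(ddof=1)   # leave-one-out z-score
    if abs(z) > 3:                             # conventional, if arbitrary, screening threshold
        print(f"subject {i}: effect {x:+.2f}, z = {z:.1f}  <-- does not fit the rest")
[/CODE]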

For example, during analysis of a yearly company performance...

...which has nothing to do with psychology experiments on human subjects.

My former colleague designs control systems for chemical processes involving combustion...

...which has nothing to do with psychology experiments on human subjects.

The goal of the Princeton research was to investigate telekinetic capabilities of the whole group, not of its individual members.

That's why it was dishonest of PEAR to attribute the variance contributed by a single subject to the tendency of a group. It is not necessary to assert that testing individual ability was the goal in order to reject anomalous data. In fact, if aggregate results are what is desired, it is even more important to reject data that do not aggregate well.

In his evaluation Palmer made the same mistake that I did...

No, your mistake was to misclassify the data. That's not what happened in the PEAR studies. You're simply drawing upon your obviously limited experience in data analysis and deciding that all problems must conform to that experience.

If this subject’s scores are discarded, then the researchers would come to the inevitable conclusion that he possesses highly developed telekinetic abilities...

No, that doesn't follow. The score should be set aside simply because its variance is not consistent with the distribution to which it is supposed to belong. The decision is made purely on the basis of what is expected from the profile, not by guessing at why it's anomalous. The check on that decision is made by the same subject having failed another control -- the volitional variable, which you do not discuss. From that, the critics suspect that Operator 010 somehow tainted her results, but it is neither possible nor necessary to confirm that in order to take appropriate action.

What you utterly failed to consider is that during replication, both by PEAR and by the two other organizations that tried to reproduce the findings, the outcomes most closely matched what Dr. Palmer found when he excluded Operator 010. Subsequent replication confirms that Operator 010 was anomalous data that should have been excluded.

An outlier policy depends on the research purpose, but it seems to me that Palmer didn’t realize that.

The research purpose in this case is one in which Dr. Palmer had years of experience and in which he specialized for his entire career. You, however, have a history of lying about what you know. Specifically you demonstrated undeniable ignorance in your reincarnation thread of how research should be conducted to obtain reliable empirical results. Under those circumstances it's comically hubristic of you to try to tell us what the professionals "must" have forgotten, because it doesn't match your layman's expectations.

Once again your argument boils down merely to demanding that we accept you as an expert despite all the evidence that we shouldn't, and your subsequent dictum that professionals in the field are incompetent compared to you. That's pure ego, not science. You don't address the replication. You don't address the volitional control. You just say, "...because I said so."

Perhaps he is a straw man after all, as one of my opponents suggested.

Asked and answered. Dr. Palmer himself is not the straw man. The straw man is your decision that his criticism should be addressed instead of what your opponents told you to address. The label "straw man" applies to an argument, not to a person. It's a metaphor.

Well, it was not my intent to choose a weak report criticizing the Princeton ESP research, I just followed the data presented in a Wikipedia article.

There's a lot to unpack here. First, the Palmer view is not at all weak. Nor is it biased, as you suggested before you even read it. Second, of course it was your intent to follow that. You wrongly stated that your critics had not provided you with criticism to address, despite their having done so clearly and repeatedly. On the basis of that lie, you chose to address only the critics whom you had cherry-picked. As I said before, you even expended some rhetoric to try to show that your opponents should accept your choice of critic.
 
My former colleague designs control systems for chemical processes involving combustion; a decision not to take an outlier into account would, certainly, lead to a dangerous explosion in such systems.

The desired behavior of a control system is not at all the same as the expectation from a naturally-occurring system. Yes, there are some control applications where you want a "hair trigger" and are able to tolerate false positives. In those cases you would know from analysis or theory that the control action has low consequence and/or low cost, and/or that inaction has high consequence. But on the other hand there are control applications where the opposite is true. Your one example is hardly representative.

Take for example the weight-on-wheels (WOW) sensor on a typical airliner. A number of control discretes are tied to that process variable. Most visibly, the speed brakes (spoilers) and tire brakes are frequently "armed" to deploy or activate on the WOW signal. And if the landing gear trucks are pitched to fit the gear bay, the WOW signal relaxes the retaining actuator so that the truck pitches properly onto the runway. But the WOW signal is filtered, integrated over time. Only when the airframe has properly settled for sufficient time do these actions take place.

Why? Because premature control in that case is high-consequence, whereas inaction is neutral -- spoilers and brakes can be manually activated, and truck-pitch actuators have fail-safes. If the airliner merely bounced, say in rough weather, and the WOW signal were not properly integrated, the spoilers would deploy with the airliner possibly several meters above the runway, leading to a sudden loss of lift and a sudden, unacceptable increase in sink rate. The truck-pitch actuator would relax without the runway being in contact with at least one axle, possibly causing the trucks to swing and the airframe's c.g. to fluctuate unacceptably as a result. The brakes would set, causing the tires eventually to hit the runway with a higher level of resistance and therefore increased skid. This prematurely wears the tires, but also risks blowout and loss of control -- especially if one main gear should hit the runway before the other after WOW application. Hence the WOW sensor must be steady for at least 1 second before the control variables are triggered. Momentary WOW signals are discarded as anomalous.

When the danger of false-positive control output is significant, one designs the system to recognize and reject spurious inputs. Heck, even amateur electronics hobbyists quickly learn to "debounce" mechanical switches in order to avoid control outputs cycling rapidly during the milliseconds in which the switch sporadically makes contact as it closes. This is basic stuff.
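
For the curious, the logic is as simple as it sounds. Here's a toy sketch -- not actual avionics code, just an illustration of the principle -- of the "must be steady for N samples" conditioning:

[CODE]
# Toy sketch of signal conditioning, not actual avionics code.
def conditioned_wow(raw_samples, required_consecutive):
    """Assert weight-on-wheels only after the raw sensor has read 'on ground'
    for the required number of consecutive samples; momentary signals are rejected."""
    streak = 0
    for on_ground in raw_samples:
        streak = streak + 1 if on_ground else 0
        yield streak >= required_consecutive

# A bounce (two stray readings) never arms anything; a sustained run does.
raw = [0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(list(conditioned_wow(raw, required_consecutive=5)))
[/CODE]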

Let's move slowly back to the topic. In spacecraft designs -- both manned and unmanned -- it is common to provide a global inhibit signal to cut out things such as propulsion under certain conditions such as when the spacecraft has landed. In the Apollo lunar module this was done explicitly by the LMP "poking" a non-negative value into location 0413 in the guidance computer's memory after the pilot had completed the landing. In several unmanned lander designs, this was detected by an onboard accelerometer operating on the vertical-axis dimension of the spacecraft. In all those cases, the effect was to provide an INHIBIT signal, either to software programmed to consult the appropriate signal, or to combinatorial electronics using the register voltage as a discrete. The goal in either case was to preclude actions that would be considered undesirable for a landed spacecraft, such as firing the descent motors. Sadly more than one spacecraft in our history of space exploration did not properly filter the accelerometer to exclude such spurious signals as the shock of the spring-loaded landing legs deploying, and thus inappropriately inhibited the descent-engine operation. Just because the designers didn't anticipate or imagine the eventual cause of failure, that was no excuse not to apply appropriate control.

So in your rush to criminalize the advisable conditioning of PEAR's experimental data, you've managed to bring up an example that not only fails to represent the behavior of data in the experimental sciences, but fails to represent the experience of control-systems design. It takes effort to misrepresent to that degree. I would expect someone claiming expertise in both applied mathematics and control system design to have drawn the parallel between basic control designs such as PID controllers and their counterpart concepts in statistical sampling. You don't seem to understand either concept, or the relationship.

To be sure, you are correct in saying that excluding anomalous data requires judgment. But you are not correct in pretending your one example is suitable for either situation. You are further wrong in insisting that you have the proper judgment in this case to comment on the data conditioning recommended by others who are experts in the field.

Theory desires that input discretes should be well-behaved and represent accurately the real-world condition we conceptualize as a discrete event in the process we are controlling. We therefore introduce conditioning procedures to mitigate the departure of practice from theory and formalize rules for rejecting false inputs so that the data suitably satisfy theory. Similarly, theory predicts that certain variables sampled from the real world should result in any of several possible distributions of outcome. In like manner we adopt practices to help mitigate departure from theory due to error, and we do so in a way that doesn't require us to know or speculate about the possible causes of error in advance.

In the context of PEAR research, we consider the outcomes theory would predict on either side of the question whether PK is real. If there is no such thing as a psychokinetic effect in humans, then a test properly designed to measure one would fail to show significant variance. The distribution of measurements for all subjects would cluster very closely around the null measurement, with standard deviation corresponding only to measurement and sampling error. There would be only inconsequential variance across all subjects.

In contrast, if there were a PK effect in humans, we could expect a test properly designed to measure that effect to produce results for a good sample of subjects that resembled one of the normal distributions. That is, at one end of the curve we would have a few people who -- for whatever reason -- had little if any PK ability. At the other end we would have a few people who -- again for whatever reason -- had prodigious PK ability. We should expect the majority of people to cluster within a standard deviation or so of the mean PK measurement, for that mean to differ significantly from the mean for null, and for the standard deviation to be broad enough to indicate actual variation in the data that isn't explained by the baseline (which includes both measurement error and sample error). This would be consistent not only with a normal variance in any human ability -- natural or supernatural -- across a proper sample, but consistent also with what the believers in PK have long believed to be the case. Specifically, in Buddhism the PK ability is thought to vary in individuals according to the degree to which one has attained enlightenment, varying from spoon-bending and other parlour tricks to full-blown corporeal flying. A big part of exercising proper judgment in data conditioning is knowing what the data should look like.
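
To put rough numbers on those two predictions, here's an illustrative simulation -- parameters invented, not PEAR's protocol -- of what per-subject hit rates would look like in a no-PK world versus a world where PK ability varies normally across people:

[CODE]
# Illustrative simulation only: invented parameters, not PEAR's protocol or data.
import numpy as np

rng = np.random.default_rng(2)
n_subjects, trials = 22, 100_000

# No-PK world: every subject is at chance; spread is sampling error only.
null_rates = rng.binomial(trials, 0.5, size=n_subjects) / trials

# Hypothetical PK world: the underlying ability itself varies normally across subjects.
abilities = np.clip(rng.normal(loc=0.502, scale=0.001, size=n_subjects), 0.0, 1.0)
pk_rates = rng.binomial(trials, abilities) / trials

print("no-PK hit rates:", np.round(np.sort(null_rates), 4))
print("PK hit rates:   ", np.round(np.sort(pk_rates), 4))
# Neither prediction looks like 21 subjects glued to chance plus one lone extreme point.
[/CODE]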

The original PEAR data resembles neither of those theoretical predictions. PEAR misleadingly represents that the coarsest of aggregations (omitting per-subject data) shows a significant variance over baseline. That's because the coarsest of aggregations does not reveal that the variance across subjects is suspicious. Moreover, when the anomalous data point is removed, the data across all subjects does resemble one of the expected distributions -- the non-PK one. This is the judgment that's required in this case. Human-subjects research in a properly vetted sample should exhibit variance across subjects conforming to past experience in the field -- generally a one-result cluster (no significance) or a normal distribution (possible significance). Having all the data congregate at one end of the distribution and one data point all by its lonesome at the other end is highly indicative of an anomalous result that should be conditioned away. Without it, the data conform to one of the two expected outcomes.

But Dr. John Palmer wasn't the only one to eliminate Operator 010. Robert Jahn did that too, in the follow-on attempts to replicate that you haven't yet read. There are four datasets in this study: two from Jahn/Boone and one each from two independent researchers. Three of those data sets, when reckoned across all subjects instead of relative to the other variables, conform to the "peaky" one-result distribution, the expected no-PK output. The only dataset that doesn't conform to any expected distribution is the one that includes Operator 010. Hence subsequent testing easily confirms that removing Operator 010 was the proper judgment in that case.
 
The goal of the Princeton research was to investigate telekinetic capabilities of the whole group, not of its individual members.

Yes, that's true, but it misses the mark of the criticism. While the goal may have been to investigate aggregated behavior, that doesn't preclude inter-subject tests to ensure that the data are homogeneous and therefore that the aggregation has meaning. You seem to be trying to say that Dr. Palmer found an anomaly in something that PEAR wasn't trying to study, so it doesn't matter. That's not what happened. Dr. Palmer found an anomaly while testing the data for integrity and internal consistency. That kind of test is always appropriate, and very important when conclusions are to be drawn from broad aggregations.

Let's say we have a random sample of twenty sixth-graders (12 years old) who are being sent to basketball camp. We test their ability to score a basketball free throw by giving each of them 10 trials. Each player's score is the number of free throws they hit, from zero to 10. Certainly we can create descriptive statistics about the score -- the mean score and the standard deviation. I have no idea what that distribution would look like, but let's say the mean score is 3.1 hits out of ten. Make up a standard deviation; it's not important.

Since it's a random sample of kids, we would expect them to vary in their ability. Some 12-year-olds have more practice and skill than others. If, for each score, we look at the histogram of kids who got that score, it should also look something like a normal distribution. You'd have nerds like me who would score very few hits, and athletes like my 12-year-old nephew who would score a lot. Most kids, we figure, would score somewhere around that mean of 2-4 hits. Few if any would score higher than two standard deviations better than the mean.

That's our empirically determined baseline, although as I continue you will probably see there's a slight problem with method. Ignore it for the purposes of the example; I know it's there.

Now send all the kids to basketball camp for two weeks and draw another test sample. A reasonable test of the effectiveness of the basketball camp would be to see if the mean scores rose. Now let's say the post-camp mean score was 4.2. Eureka! The camp works! Except there's a hitch; the second sample included LeBron James, and the analysts don't know that. They don't know the identities of the subjects, only their scores.

A quick surf over to the NBA stats page says LeBron's free-throw percentage is around 75%, far better than anyone else in the group. When we look at the histogram -- which, for a properly distributed sample, should still look like a normal distribution -- we see that suspicious-looking spike out there in the 7-8 score range. It doesn't fit what we expect the inter-subject data to look like, whether the camp works or not. He's dragging the average artificially upward, so the aggregation is not as meaningful as it otherwise would be. If the mean score without him is only 3.15, we might conclude the camp is not effective.
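
With made-up scores (the exact numbers don't matter), the arithmetic of the distortion is easy to see:

[CODE]
# Made-up free-throw scores for the sake of the example -- not real data.
# One professional-level scorer drags the group mean upward and distorts
# the apparent effect of the camp.
import numpy as np

post_camp_scores = np.array([2, 3, 3, 4, 2, 3, 4, 3, 2, 4,
                             3, 3, 4, 2, 3, 4, 3, 3, 4, 8])  # last entry: the ringer

print(f"mean with the ringer:    {post_camp_scores.mean():.2f}")
print(f"mean without the ringer: {post_camp_scores[:-1].mean():.2f}")
[/CODE]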

In his evaluation Palmer made the same mistake I did [...] when I set aside a set of statistical data belonging to the subject who was declared cancer-free.
[...]
If [Operator 010]’s scores are discarded, then the researchers would come to the inevitable conclusion that he possesses highly developed telekinetic abilities, which would prove that telekinesis exists.

This is a mess. You didn't give us enough detail in your story to determine what exactly you think the mistake was that you made or why exactly you think Palmer made the same mistake. But let's first discuss what's obviously wrong about your comparison. Then we'll work through the rest as best we can and hope to cover all the bases.

The last statement is simply wrong. "Operator 010 is psychokinetic" is not the "inevitable" conclusion of conscientious scientists looking at outliers. Outliers are not presumed to vary according to the variable proffered in the hypothesis. In fact, if there is any presumption at all it's most often that the anomalous variance comes from a sporadically confounding variable, in my example the considerable outside training and expertise of a professional athlete. It sometimes becomes a subsidiary exercise to discover -- and later control for -- that variable. Dr. Palmer didn't go any further, nor likely would he have been able to.

In your anecdote you proposed to disregard a subject because of questions regarding what may have caused the very low score and its possible ability to confound the intent of the test to measure drug effectiveness. If I'm reading your reasoning correctly, you propose that Palmer wants to disregard Operator 010's score similarly because of what may possibly have caused it. You dance around the concept that it's because of assumed PK ability, but it's not clear that's what you mean. And in my example above, that would be like disregarding the anomalously high score because you suspected it was a professional basketball player.

This is simply inapt. Palmer gives no reason for disregarding Operator 010 beyond the inability of the data to fit reasonably within the expected inter-subject distribution. It would have been appropriate, for example, to disregard Operator 009 if his score had been two orders of magnitude below everyone else's. That could indicate, for example, some pervasive difficulty that subject had operating the machinery, and that effect would mask any intended measurement. We don't have to speculate why scores are anomalous in either direction, although it is often attractive to do so. It is sufficient to reject the score based on its incongruence in context, not on what it may conceivably represent. It would have also been appropriate to remove Operator 010 if the remaining distribution was rendered coherent and fit the pro-PK distribution. Yes, the means would have been slightly lower, but they may still have been significant compared to baseline. And that significance would have statistical validity because the inter-subject distribution would have been as expected.

It's your standard straw-man argument. You're ascribing to Palmer motives you may have once naively had, when there is no evidence for any such motive on Palmer's part and considerable evidence for an entirely different -- and completely necessary -- motive altogether. One that you seem blithely unaware of. Just because Dr. Palmer's actions superficially resemble ones that, in a different context, would be wrong doesn't make them wrong in this context. From the very start you accused him of trying to make the data fit his wishes, which you assumed wrongly to be anti-PK. In fact it's quite obvious he's trying to make the data fit any of the expected distributions so that the descriptive statistics and correlations have the intended meaning. This is common practice, and you know it is. And your protests that Dr. Palmer somehow doesn't know how to do that properly in this field, and that you somehow do, are comically self-serving.

By way of background, just so everyone is up to speed:

Drug trials typically follow the double-blind, placebo-based model. A sample is drawn, and various categorical variables are consulted for the sample that determine how well it represents the relevant population. That population is then divided (typically randomly) into two groups: one that will receive the drug and the other that will receive a placebo. The subjects don't know which one they're getting. The experimenters who interact with the patients, in turn, don't know which one they're administering. The placebo group serves as the baseline control against which the variable group is measured. In order for that to be valid, all those pre-trial categorical variables have to match up fairly evenly between the groups. They're usually demographic in nature, like age, sex, ethnicity, prior medical conditions, etc. But they're really proxies for effects known, suspected, or speculated to introduce confounding influences in the outcome. Dr. Philip Zimbardo, of Stanford prison experiment infamy, includes a layman-accessible description in his book The Lucifer Effect of how he homogenized his sample between the Guards and Prisoners groups. If all those potential confounders are balanced in both groups, you can confidently attribute variance in outcome (in this case, the disposition or occurrence of cancer) to the only category you varied -- what pill the patient actually got. If your placebo group were mostly men and the drug seemed to have worked well on the mostly-women group who took the actual drug, you may not be able to separate the desired effect of the drug from the baseline fact that cancer rates are higher among men.

In your anecdote, the observation you wished to remove was one in which no cancer was observed. From context, I glean that this was anomalous, such as the patient previously having had cancer, and that the expected effect of the drug was simply to put cancer into remission. The causal category (spontaneous disappearance) you proposed to eliminate from whichever side of the placebo line that patient was on still has to be represented on both sides, and -- within both groups -- along its entire categorical spectrum in order for the correlations to the placebo/drug category to remain valid. This has nothing whatsoever to do with disregarding data that is self-evidently out of place, irrespective of known or speculated cause.

Let's talk more about categorical variables. Imposed controls are often categorical variables, in which case strong correlations to them suggest the presence of the confounding condition that motivated the control. The example I gave above relevant to your anecdote was in controlling the sample for sex, because sex is known to affect cancer rates. For a PK example, if a subject demonstrates the ability to move a paper cup around on the table "using only his mind," and that ability completely disappears when the cup is covered under a bell jar, then any of several confounding phenomena is indicated. The control is applied to prevent the subject from physically manipulating the cup in a way the experimenters wouldn't otherwise detect. In past trials this has been accomplished by invisible loops of monofilament held between the subject's hands, or tricks as prosaic as blowing on the cup. It isn't necessary for the experimenters to think of every possible way of surreptitiously moving the cup by ordinary physical means. Isolating it physically from the subject eliminates most if not all such methods.

If we have some number of such subjects and some of them can move the cup to varying degrees and others can't, and some can move it with the bell jar in place, and others can't, then we have the basis to perform tests for significant relationships among categorical variables, where in this case the category might be "jar" vs. "no jar." This is how categories would more properly be considered in an experiment, and we would switch to something more akin to the chi-square test for independence.
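
A quick sketch of that kind of test, with invented counts for the bell-jar example above:

[CODE]
# Invented counts for the bell-jar example -- purely illustrative, not real trial data.
# Chi-square test for independence between the control category (jar vs. no jar)
# and the outcome (cup moved vs. did not move).
from scipy.stats import chi2_contingency

#                     moved  did not move
table = [[18,  2],    # no jar
         [ 1, 19]]    # jar in place

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.4g}")
# A tiny p-value says the outcome is NOT independent of the control category --
# the signature of an ordinary physical explanation rather than PK.
[/CODE]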

In the PEAR study the volitional variable was imposed as a control to preclude anything that would have the effect of knowing before the trial how the outcomes would appear. It doesn't matter how such knowledge could be acquired or such preparation could be accomplished. It doesn't matter that you, I, Palmer, or anyone else fails to imagine how it could be done. That's not how empirical control typically works. What matters is that the data were collected in a way that recorded whether the subject was able, at the time of testing, to select the method of trial that would occur. That's a category in the study.

In a double-blind placebo study, we would want the measured cancer rate at the end of the study to be significantly independent of all variables except the placebo-vs-drug variable. Toward that end we compare measured cancer rates to those variables regardless of whether they got the drug or the placebo. If the rate correlates in your sample more strongly to, say, whether a patient exercises regularly than to whether he got the drug, you can't say the placebo-vs-drug category is sufficiently independent to be significant.

If the PK effect hypothesized by PEAR is real, the measurement of it in terms of variance from the baseline was expected to be independent of the volition category. It wasn't. It was quite strongly dependent on it. Now you can read all sorts of nefarious intent into the notion that all the variance in the one out of three studies that showed any variance at all depended on whether one subject knew in advance how the day's experiment was going to be done. But what's more important is that PEAR had no answer for this. They didn't propose any sort of PK-compatible explanation (e.g., "For PK to work, the subject had to be in a certain mindset that was defeated by the volition variable.") They did no further testing to isolate and characterize this clearly predictive variable that wasn't supposed to be predictive.

You can't just leave a failed test for independence alone and claim victory nonetheless. Palmer didn't explicitly perform the independence test, but he didn't have to. The errant correlation is trivially apparent. This lengthy exposition is meant to reach this one point: You accuse Palmer of eliminating or disregarding Operator 010, and you claim -- based on irrelevant and wrong comparisons to control-system design -- that this is a no-no. A better way of looking at Palmer's review is that he considered the problem categorically. The categories of volition-vs-not and significance-vs-nonsignificance are not independent in the way they would need to be for Jahn's hypothesis to have been properly supported by his first study. It's not that Operator 010 wasn't taken into account. Palmer took Operator 010 into account according to the way the categories in which she fell could be reasoned about intelligently and correctly.
 
And you have completely ignored the fact that only one of PEAR's 22 subjects managed to display any "gift" at all, and only under circumstances she chose. Does that not make you suspicious? The "everyday people" in the experiments -- and in the attempts to replicate the findings -- showed no "gift" of psychokinesis. Yet for some reason you seem to think this study is scientific proof of the phenomenon.



Given the desire and ability of claimants in this field to dupe experts and the public, why are you trying to argue that reasonable controls to prevent such occurrences were unjustified in what was billed as scientific research of a true phenomenon? Is it just so you can have some apparent means to belittle Jahn's critics?
There are different types of controls. If a person claims that he has a "telekinetic" gift and brings his own equipment, that equipment should be rigorously inspected, as happened in Geller's case. The same is true for a group of scientists who claim that their equipment enabled them to prove that telekinesis exists. But checking if a subject was truthful during an experiment that doesn't involve her equipment is something unheard of in scientific circles.

In theory, it is possible that during a clinical trial a subject takes a medication and instead of swallowing it puts it under his tongue and then spits it out when the doctor doesn't see it. But not a single researcher in his right mind would seriously consider such a possibility.
 
There are different types of controls

True, but you can't seem to think of any that don't involve a machine of some kind. You have no idea what a scientist means when he has designed an experiment that involves empirical controls.

But checking if a subject was truthful during an experiment that doesn't involve her equipment is something unheard of in scientific circles.

Bwahahaha! You don't know anything about how to do science. You're not a scientist, or even remotely close to one. Did you read Loftus like I asked you during your reincarnation thread? If you think psychologists can't design experiments and tests that detect deception, you are completely in over your head here. Most psychometric instruments, for example, have internal-consistency controls that score a certain way when there is an attempt at deception.

In theory, it is possible that during a clinical trial a subject takes a medication...

What a bizarre straw man.

But not a single researcher in his right mind would seriously consider such a possibility.

Correct, which is why magical truth-telling pills are not what psychologists use. Do you literally think that everything in the world must necessarily work the way your layman's imagination suggests?
 
Yes, we all took basic statistics. Don't pontificate. You're not the teacher here.



No, not really. If one data point out of many is responsible for all the variance purported for a distribution, it is clearly anomalous. In most cases, theory predicts the general shape to expect from the data. When all the data but one point fit any of the general predictions, that's how we are able to determine that it is anomalous.



...which has nothing to do with psychology experiments on human subjects.



...which has nothing to do with psychology experiments on human subjects.



That's why it was dishonest of PEAR to attribute the variance contributed by a single subject to the tendency of a group. It is not necessary to assert that testing individual ability was the goal in order to reject anomalous data. In fact, if aggregate results are what is desired, it is even more important to reject data that do not aggregate well.



No, your mistake was to misclassify the data. That's not what happened in the PEAR studies. You're simply drawing upon your obviously limited experience in data analysis and deciding that all problems must conform to that experience.



No, that doesn't follow. The score should be set aside simply because its variance is not consistent with the distribution to which it is supposed to belong. The decision is made purely on the basis of what is expected from the profile, not by guessing at why it's anomalous. The check on that decision is made by the same subject having failed another control -- the volitional variable, which you do not discuss. From that, the critics suspect that Operator 010 somehow tainted her results, but it is neither possible nor necessary to confirm that in order to take appropriate action.

What you utterly failed to consider is that during replication, both by PEAR and by the two other organizations that tried to reproduce the findings, the outcomes most closely matched what Dr. Palmer found when he excluded Operator 010. Subsequent replication confirms that Operator 010 was anomalous data that should have been excluded.



The research purpose in this case is one in which Dr. Palmer had years of experience and in which he specialized for his entire career. You, however, have a history of lying about what you know. Specifically you demonstrated undeniable ignorance in your reincarnation thread of how research should be conducted to obtain reliable empirical results. Under those circumstances it's comically hubristic of you to try to tell us what the professionals "must" have forgotten, because it doesn't match your layman's expectations.

Once again your argument boils down merely to demanding that we accept you as an expert despite all the evidence that we shouldn't, and your subsequent dictum that professionals in the field are incompetent compared to you. That's pure ego, not science. You don't address the replication. You don't address the volitional control. You just say, "...because I said so."



Asked and answered. Dr. Palmer himself is not the straw man. The straw man is your decision that his criticism should be addressed instead of what your opponents told you to address. The label "straw man" applies to an argument, not to a person. It's a metaphor.



There's a lot to unpack here. First, the Palmer view is not at all weak. Nor is it biased, as you suggested before you even read it. Second, of course it was your intent to follow that. You wrongly stated that your critics had not provided you with criticism to address, despite their having done so clearly and repeatedly. On the basis of that lie, you chose to address only the critics whom you had cherry-picked. As I said before, you even expended some rhetoric to try to show that your opponents should accept your choice of critic.
I think it was you who called Palmer a straw man just because I chose his article; I was just following your lead.

You keep saying that in psychological studies the outliers should be rejected. It is your personal point of view; however, you didn't provide a single example of a psychological study rejecting an outlier, while I provided several examples showing why in many cases the outliers should be discarded, including the one involving a clinical trial. That clinical trial was to test a cancer medication. But it could have been about testing a psychiatric medication and the conclusion would have been the same -- the outlier should have been rejected.
 
