To me the whole thing was over from the very beginning...
For me it was essentially over when he deployed his standard argument in his opening post: "These guys must be wrong because I'm so much smarter than they are." That doesn't mean there isn't value in a subsequent explanation of the errors Buddha makes. But at a certain point, after a certain number of pages, there are diminishing returns. Once it becomes readily apparent to the reader that Buddha bluffs every time, there's little interest in pursuing that to the inevitable conclusion.
Jeffers showed that the baseline is biased, so all results are rendered moot: they didn't use a random device as claimed...
Well, let's be absolutely clear about something. Jahn et al. are the ones who presented the anomalous baseline data. It's attractive to write off some of these researchers as hopelessly biased, but Jahn did not conceal his data. What Jeffers has done is to point out, for those who don't readily see it, why the condition of PEAR's baseline data renders the study non-probative. It's one thing to say, "Yes, the variance in our baseline data is too high in some cases and too low in other cases (i.e., it did not correlate to the variance in the calibration runs)." The layman might say, "Okay, so what?" Jeffers comes in and says, "That means the estimates of significance were measured against an obviously erroneous baseline and can't be relied upon to show actual significance."
Jabba insisted that the physics behind the device guaranteed a random output. Even in a well-designed device, in those days some degree of impurity migration in electronic components might have played a role in long runs.
I assume you mean Buddha. The slip is understandable.
Yes, we covered this before. While the process around which an apparatus is based may theoretically be governed by a normal distribution, its exhibited behavior over a finite number of runs will (according to theory) only ever approximate -- to a measurable extent -- the normal distribution, and (according to practice) be subject to innumerable confounds of varying significance -- e.g., degradation in electronic components, wear on mechanical parts, environmental factors.
Far from being a problem, these departures from the ideal form the basis for idealizing measured data to within a certain confidence interval. We expect the machine to be a certain degree off from ideal just by chance, owing to the factors I allude to above. If it's more off than that during a baseline run, the baseline is known to be biased. If it's less off than that during a baseline run, there's an uncontrolled factor and you cannot assert that the baseline is not biased.
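To make that concrete, here's a minimal sketch of the kind of check being described, with made-up numbers rather than PEAR's actual data: compare the variance of a set of baseline runs against the variance theory predicts for the device, and flag the baseline if it falls outside the expected band in either direction.

```python
# Minimal sketch (hypothetical numbers): is the variance of a set of baseline
# runs consistent with what theory predicts for the device? Each "run" here is
# the sum of n_bits fair bits, so theory says mean = n_bits/2, variance = n_bits/4.
# The scaled sample variance (k-1)*s^2/sigma^2 approximately follows a
# chi-square distribution with k-1 degrees of freedom.
import numpy as np
from scipy import stats

n_bits = 200                    # bits summed per run (assumed)
sigma2_theory = n_bits / 4.0    # theoretical variance of a run

rng = np.random.default_rng(0)
baseline_runs = rng.binomial(n_bits, 0.5, size=500)   # stand-in for real baseline data

k = len(baseline_runs)
s2 = baseline_runs.var(ddof=1)                        # observed sample variance
chi2_stat = (k - 1) * s2 / sigma2_theory

# Two-sided check: variance that is *either* too high or too low is a red flag,
# which is exactly the point about the PEAR baselines.
lo = stats.chi2.ppf(0.025, df=k - 1)
hi = stats.chi2.ppf(0.975, df=k - 1)
print(f"sample variance = {s2:.1f}, theoretical = {sigma2_theory:.1f}")
print("within expected band" if lo <= chi2_stat <= hi else "baseline variance is anomalous")
```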
Buddha dismisses this whole idea. He doesn't understand it, obviously, but he contrived an example that he thinks -- from his "infallible" knowledge -- means that such a method could not work. He writes:
Suppose, you have determined in advance the length of a sequence of samples of electric circuits you have to take to decide that the number of defective ones exceeds specified limit indicating that something is wrong with the manufacturing process. For whatever reason that only Palmer has knowledge of, you might decide that the process had produced a “biased” sequence, and continue sampling. Then a new sequence would indicate that the manufacturing process is fine, and stop sampling. But Palmer will tell you that your new sequence is, actually, a “biased” subsequence, and you must go on with the sampling. This means that, if Palmer is correct, a sampling never ends. This nonsense implies that all statistical methods are at fault no matter what application you choose.
Right, so we have a population of N widgets that comprises a production lot. Our goal is to estimate the defect rate. First, we know it's not zero -- just as we know that the departure of Jahn's REGs from a theoretical normal is not zero. No production process is defect-free -- intentionally so; it would be far too expensive. Similarly, no machine runs perfectly every time, but we tolerate its misbehavior. So we calculate an acceptable defect rate -- call it p -- one that ensures the cost of replacing the defective widgets that make it into the field is considerably below the cost of improving the production process to eliminate the defects. For many large-scale manufacturing processes, that's around p = 0.03: for every hundred widgets, you allow three defects. This is akin to saying that for every 100 baseline runs of the REG, you expect three baseline runs to depart significantly from the ideal normal. In the widget case the threshold p is a business determination. In the REG case it is empirically determined.
He then alludes to the unremarkable concept of margin of error: how many widgets from the lot would we need to test in order to convince ourselves, within a certain confidence interval, that the lot has met its defect-rate requirement? (I'm leaving out the one-tailed versus two-tailed distinction here for simplicity.) For some desired plus-or-minus amount, that's straightforwardly computed from N. The measured defect rate of the sample of n widgets is expected to correspond to the defect rate of the lot of N widgets within the specified tolerance.
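For the curious, the textbook computation Buddha's example gestures at looks roughly like this -- all the figures below are assumed purely for illustration:

```python
# Sketch of the standard margin-of-error sample-size calculation (assumed numbers):
# how many widgets must we inspect to estimate the lot's defect rate to within
# +/- E at a given confidence level?
import math
from scipy import stats

p_target = 0.03      # acceptable defect rate
E = 0.01             # desired margin of error (+/- 1 percentage point)
conf = 0.95          # confidence level
N = 50_000           # lot size

z = stats.norm.ppf(1 - (1 - conf) / 2)            # two-sided critical value
n0 = (z**2) * p_target * (1 - p_target) / E**2    # infinite-population sample size
n = math.ceil(n0 / (1 + (n0 - 1) / N))            # finite-population correction

print(f"inspect roughly {n} widgets out of {N}")
```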
But what if it doesn't? asks Buddha. Indeed, under the doctrine of normal variance the defect rate of one sample may vary considerably from the defect rate of the next sample, and so forth. If we're not sure which one (or both) of those is biased, and by how much, then how are we to know when to stop sampling the lot? We'd have to keep sampling indefinitely, he claims. And since that's practically impossible, Buddha is claiming a sort of reductio ad absurdum that requires either Palmer to be wrong or all of statistics to be wrong. Since statistics is obviously right, Palmer is obviously wrong.
Buckle up, kids. There are a lot of problems here.
First and foremost is the central limit theorem, which is, oh, only the most important theorem in all of statistics! (Buddha seems always to fall afoul, not of little nitpicky things, but of foundational knowledge.) The central limit theorem says many things, but here it says that the accumulated sample defect rate and the lot defect rate must converge, even if the rates are determined by factors that aren't normally distributed. Further, that convergence proceeds at a known, predictable rate. This means we don't have to take samples ad infinitum. (This is not the first time Buddha has made this elementary error.) The samples are guaranteed to converge at a useful rate.
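A toy simulation makes the point; the numbers are invented purely for illustration. Draw fixed-size samples from the same lot and watch the accumulated defect rate settle onto the lot's true rate after a handful of samples, not an infinite number.

```python
# Toy simulation (assumed numbers) of the convergence point: as more fixed-size
# samples are drawn from the same lot, the accumulated defect rate closes in on
# the lot's true defect rate, so sampling need not go on forever.
import numpy as np

rng = np.random.default_rng(1)
lot = rng.random(100_000) < 0.03     # True marks a defective widget; lot rate ~3%
true_rate = lot.mean()

sample_size = 200
rng.shuffle(lot)                     # sample order is arbitrary
seen = 0
defects = 0
for i in range(0, 10 * sample_size, sample_size):
    batch = lot[i:i + sample_size]
    seen += len(batch)
    defects += batch.sum()
    running = defects / seen
    print(f"after {seen:5d} widgets: running rate {running:.4f} (lot rate {true_rate:.4f})")
```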
Second, we have an expectation. If we aim for a defect rate of p = 0.03, the expectation is initially that we met it. That's a prior probability of defect. We can then feed samples through Bayes' theorem and come up with a posterior probability that the lot defect rate has departed from expectation, given the results of each sample. This method tends to converge extremely rapidly to a conclusion. This alludes again to what Palmer and May were talking about. The Bayesian inference method is based on the expected uniform distribution of biased samples in the lot. If the samples are suspiciously front-loaded or back-loaded, or otherwise clustered, Palmer says this is qualitatively significant in another way, even if the "defect rate" in the data is what it should be.
Bayes helps us in other ways too. We can compute the likelihood that the samples are biased, based on estimates of the probability of front-loaded or back-loaded sampling given the expectation of a uniform distribution. That is, given an assumption, say, that the first sample was biased, what is the likelihood that the second sample will also be biased if the lot is good?
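Here's a rough sketch of that Bayes-update idea, using a conjugate Beta prior with assumed parameters and made-up sample counts. It illustrates the mechanism -- the posterior sharpening after only a few samples -- and is not anyone's actual analysis.

```python
# Sketch of the Bayes-update idea: a Beta prior (parameters assumed) centered
# near the target defect rate of 0.03 is updated by each inspected sample, and
# we ask how much posterior probability sits above the acceptable rate.
from scipy import stats

alpha, beta = 3.0, 97.0          # assumed prior: mean 0.03, fairly diffuse
p_limit = 0.03

# (defects found, widgets inspected) for each successive sample -- made-up data
samples = [(7, 200), (9, 200), (11, 200)]

for defects, n in samples:
    alpha += defects
    beta += n - defects
    prob_exceeds = 1 - stats.beta.cdf(p_limit, alpha, beta)
    print(f"after this sample: P(lot defect rate > {p_limit}) = {prob_exceeds:.3f}")
```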
Finally, the PEAR data as treated by May and Palmer is not sampled from a larger population, as Buddha's example discusses, such that a margin of error applies. Buddha is alluding to the rather simplistic concept that you can determine something about a population, to a quantified degree of certainty, by measuring a representative sample of it. That's not at all what Palmer is talking about. He's talking about reasoning about and from all the available data. If you have a lot of size N and you divide it into samples of size n and measure all the samples, the sample means will be expected to converge to the population mean, but each sample mean will be expected to vary individually from the population mean by a predictable amount. The error is expected to diffuse among the samples, but not perfectly uniformly. The determination of what is biased and what is not, in this case, is not purely guesswork.
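A quick sketch of that "measure everything" case, again with invented sizes: partition the whole lot into equal samples, compute each sample's defect rate, and compare the observed scatter of those rates to the scatter theory predicts. A spread that is far too large or far too small is itself informative.

```python
# Sketch (assumed sizes): exhaustively partition a lot into equal samples and
# compare the observed sample-to-sample scatter of defect rates to the
# predicted scatter sqrt(p*(1-p)/n).
import numpy as np

rng = np.random.default_rng(2)
n = 200                                           # widgets per sample
num_samples = 500                                 # lot is fully partitioned
lot = rng.random(n * num_samples) < 0.03          # whole lot, defect rate ~3%

rates = lot.reshape(num_samples, n).mean(axis=1)  # defect rate of every sample
p_hat = lot.mean()                                # overall (population) rate

expected_sd = np.sqrt(p_hat * (1 - p_hat) / n)    # predicted sample-to-sample scatter
print(f"observed scatter {rates.std(ddof=1):.4f} vs predicted {expected_sd:.4f}")
```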
Where Jeffers is concerned, the baselines are expected to exhibit the same sample-wise bias rate as the calibration data. This would be akin to having your N-sized lot of widgets partitioned into n-sized samples and looking at how many of those samples were biased, then sampling according to a different criterion -- again into n-sized samples -- and seeing a radically different bias rate. It suggests that the bias might be caused by the variable that distinguishes the second sampling criterion.
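As an illustration only -- the counts below are made up -- a standard 2x2 comparison of bias rates between two sampling conditions looks like this:

```python
# Sketch (made-up counts): out of the baseline samples and the calibration
# samples, how many were flagged as biased? A 2x2 chi-square test asks whether
# the two bias rates plausibly come from the same underlying process; a wildly
# different rate points at whatever variable distinguishes the two conditions.
from scipy.stats import chi2_contingency

baseline_biased, baseline_total = 14, 100
calibration_biased, calibration_total = 3, 100

table = [
    [baseline_biased, baseline_total - baseline_biased],
    [calibration_biased, calibration_total - calibration_biased],
]
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```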
When Buddha suggests that what Palmer and May contemplate would take infinite sampling, I just have to laugh. Again, the whole foundation of statistics is to avoid having to do that, yet retain a quantified degree of certainty in the observations. Buddha not only doesn't understand statistics, he doesn't understand the core concept of what statistics is for.
This long debate with "Buddha" only served to show the holes in his knowledge, and that "telekinesis" really has nothing to show.
His evolution book wasn't really about evolution. His God thread wasn't really about proof for God. His reincarnation thread wasn't really about reincarnation. And this thread really isn't about psychokinesis. All Buddha's threads seem to be his effort to convince people he's very smart. The side effect of correcting Buddha's errors in an on-topic fashion is, regrettably, that he doesn't come off looking very smart, or prudent.