Yes, the Ward and Wilson test is "a chi-squared test." The Pearson test that you first erroneously referred us to is also "a chi-squared test." Any test that involves computing a test statistic that is meant to be χ²-distributed can be called "a chi-squared test." Again, just because χ² values appear in the paper doesn't mean your interpretation of the purpose and implications of the test that produced those values is correct or that it necessarily follows from those values. Once again, you seem to be regarding these statistical tests as some kind of a machine where you put data in the top and turn a crank and a pat answer comes out the bottom that requires no further thought.
But they don't draw your conclusions.
Asked and answered. A statement of the form, "X doesn't seem to explain all the variance in Y," is not at all equivalent to, nor a predicate for, the statement, "There is too much variance in Y for Y to be valid." You are purporting that the latter statement necessarily follows from the former in some way that cannot be questioned.
None of this tap dance fixes your misrepresentation of Damon et al. First you told us that the judgment that there was "too much error" in the dates and that the "dating was invalid" was "straight from the Damon paper." But when pressed, all you could come up with was the statement that one stated cause of scatter didn't seem to be the entire cause in this case. Then you switched sides and seemed to tacitly admit that the conclusions you were trying to pin on Damon et al. weren't actually in the paper, but that the authors were somehow dishonest for not stating them anyway. Now you're trying to fix that mess of contradiction by telling us the conclusions that you really want to put in Damon's mouth so obviously follow from the numbers themselves that there can be no question about them—straight-up begging the question.
The statement you made and attributed to Damon et al. is not in their paper. They are not dishonest in any way for not having drawn the same conclusions as you. Your conclusions do not inexorably follow from the χ² values reported.
You're the one telling us it invalidates the measurement in this case, therefore it's your burden of proof. Begging the question does not carry that burden.
A root-cause analysis of what caused a purported outlier in a radiocarbon dating run is generally not possible according to what we find in the literature. But if you think it is, by all means describe how and give some examples. Accounting for unknown or unknowable error is exactly what statistics is for, and why we properly call it the study of uncertainty. If you could figure out all the root causes for why something didn't go the way you expected, you wouldn't have uncertainty—you'd have certainty. The entire raison d'être for distributions like the normal distribution is to help us reason mathematically about things we can't account for any other way.
What are you supposed to do when you have a χ² value for radiocarbon dates that exceeds the critical value for your desired confidence interval? What is common in the archaeology field? What do Ward and Wilson recommend? You told us a few days ago that the dating was invalid as the result of the Ward and Wilson test, but now you seem to be softening and getting ready to accept that the Ward and Wilson test is not some black-and-white slam-dunk. (Hint: Ward and Wilson actually don't treat it that way in their own paper.) Damon et al. accounted for the scatter in the Sample 1 interlaboratory data by not combining the data directly as they could do for the other samples, and as advised in Ward and Wilson, but by applying a different statistical aggregation method supported by the t-distribution and common in the chemistry field.
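For concreteness, the test itself is short enough to sketch. Below is a minimal Python version of the homogeneity test as Ward and Wilson describe it (inverse-variance pooled mean, then a statistic referred to chi-squared with n−1 degrees of freedom), run on the three Sample 1 laboratory means as I read them from Damon et al., Table 2. The function name is mine, not theirs.

```python
def ward_wilson(dates, errors):
    """Ward & Wilson homogeneity test: inverse-variance pooled mean,
    plus the statistic T = sum((x_i - pooled)^2 / s_i^2), which is
    compared against chi-squared with n-1 degrees of freedom."""
    w = [1.0 / s**2 for s in errors]
    pooled = sum(wi * x for wi, x in zip(w, dates)) / sum(w)
    t = sum((x - pooled) ** 2 / s**2 for x, s in zip(dates, errors))
    return pooled, t

# Sample 1 laboratory means (yr BP) from Damon et al., Table 2:
# Arizona 646 +/- 31, Oxford 750 +/- 30, Zurich 676 +/- 24.
pooled, t = ward_wilson([646, 750, 676], [31, 30, 24])
print(round(pooled), round(t, 1))  # -> 689 6.4, vs. chi2(df=2, 95%) = 5.99
```

Note that this reproduces both the 689 yr BP weighted mean and the χ² of 6.4 reported in the paper, and 6.4 > 5.99 is exactly the marginal exceedance the whole argument is about.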
I have no idea what you're talking about. And I don't recall you bringing it up before, so it's hardly something I'm "avoiding." But by all means keep trying to direct my rebuttal. It's hilarious how desperate you are to control the discussion and steer it away from things you can't fathom. Fig. 1 in Damon is a pretty picture. The actual data are elsewhere in the paper.
Hardly. Science is about vigorously challenging each other's methods and findings. Most of science is about doing just that, not even breaking new ground. Ward and Wilson spend two-thirds of their paper explaining why previous methods devised by their colleagues weren't any good. Christopher Ramsey at Oxford spends half his paper telling us why Ward and Wilson's method is no good. If the science in Damon et al. is so bad, where are the legions of radiocarbon dating experts that should be rising up to show those egregious flaws? Why are the only people offering criticism people from outside the field and—in their other work—focused only on the shroud of Turin?
Not really. You're trying to foist a conclusion regarding those facts by vigorously begging its question and assiduously avoiding any sort of test of the reasoning that would ordinarily have to support such a conclusion.
Now let's go back (as promised) and look at your own attempt to analyze the lower-level data. I should warn you that it's a sign of bad faith when a person who has just learned a tidbit of new information goes off running in some direction with it according to a revised argument (now featuring the tidbit) but always in service of the same old preconceived notion.
No, because that doesn't follow. Why did you do it that way? The t₅-distribution was part of a combined solution that kept the inverse-square weightings for intra-laboratory data while the pooled means were reckoned according to the t-distribution (but not according to t₂) in order to maintain congruence with the other samples in Table 2. This is accounted for by the notations on Table 3. If you had just wanted to lump all the runs from all the labs together and treat them as one homogeneous data set, you should have used the factors from t₉ (per Damon) or t₁₁ (per your implied homogeneity), but either way something that doesn't entail any pooling. But then your results would not compare correctly with the control samples.
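Since the choice of degrees of freedom is what's doing the work here, it's easy to see how much the 95% multiplier moves with it. These are the standard two-tailed 95% critical values of Student's t (from ordinary t tables), scaled by the ±61 yr error on the Sample 1 mean purely for illustration:

```python
# Two-tailed 95% critical values of Student's t, standard tables.
t_crit = {2: 4.303, 5: 2.571, 9: 2.262, 11: 2.201}

# Half-width of a 95% interval on a mean with a 61 yr pooled error,
# under each choice of degrees of freedom.
for df, t in t_crit.items():
    print(f"t{df}: 689 +/- {t * 61:.0f} yr BP")
```

The spread runs from ±134 yr at t₁₁ up to ±262 yr at t₂, so which factor you pick is anything but a formality.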
You don't show your work, so it's hard to imagine where all your mistakes might be. But yes, if you simply treat all 12 radiocarbon dates from all the runs for Sample 1 as t₅-distributed data (and it's not correct to do so), you probably came up with 1102-1420 CE according to N{689, 61} and 95% CI giving you ±(2.6×61).
As a check, do the same thing with Sample 4, using the ordinary ±2σ. Instead of 1263-1283 CE cal (Damon, Table 3), you'll get radiocarbon dates 1091-1354 CE. So there again, a much larger interval.
All you're showing with this is that using the wrong method gives you the wrong answer.
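To make the arithmetic above explicit, here is the calculation I take you to have done, using the rounded t₅ factor of 2.6 (the exact two-tailed 95% point of t₅ is about 2.571):

```python
# 95% interval on the Sample 1 pooled radiocarbon age, 689 +/- 61 yr BP,
# using the rounded t5 factor of 2.6.
mean_bp, sigma = 689, 61
t5 = 2.6
lo_bp, hi_bp = mean_bp - t5 * sigma, mean_bp + t5 * sigma

# Subtracting from the BP epoch (1950 CE) gives nominal years -- but note
# these are still radiocarbon years, NOT calibrated calendar dates.
early, late = 1950 - hi_bp, 1950 - lo_bp
print(round(early), round(late))  # -> 1102 1420
```

Which reproduces your 1102-1420 range exactly, and also shows why the range is wrong: the method, not the arithmetic, is the problem.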
It's possible because you're incorrectly comparing radiocarbon dates directly with calibrated calendar dates. Table 2 is radiocarbon dates in years before 1950. Table 3 is calibrated calendar dates. In Damon, Fig. 2, you see the calibration curve with the inside and outside dates at 95% CI for Sample 1 (the shroud) shown as intercepts. This is the proper method. You don't calibrate each radiocarbon date and then compute the confidence. You do all the statistical operations on the radiocarbon dates and then look up your calendar dates based on your final error (not just the mean).
In Table 3, the calibrated values for Sample 1 at 95% confidence are 1262-1312 CE and 1353-1385 CE. That's because of the two intercepts for the 660 yr BP lower limit on the radiocarbon date. (That's the mean radiocarbon date from Table 3, or the unweighted mean radiocarbon date from Table 2, minus its error: 691 − 31.) However, note that the paper states the values (meaning the calibrated calendar dates) have been "rounded up/down to the nearest 10 yr" (Damon, p. 614). From the calibrated dates, the numbers in whatever color this is are the upper and lower values in calibrated calendar years. 1262 is rounded down to 1260 and 1385 is rounded up to 1390.
But the key concept is that if you attempt to find calendar age by resolving the radiocarbon BP epoch (1950 CE) against mean radiocarbon date—691 yr BP—with simple subtraction, what you're left with is still a radiocarbon date. You can't compare 1259 in radiocarbon years to 1260 CE (or 1262 CE) in calendar years.
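A sketch of the intercept idea, with a made-up wiggle standing in for the real calibration curve. The numbers below are invented purely for illustration (real work uses a published calibration curve); the point is only that a single radiocarbon age can cross the curve more than once, producing multiple calendar intercepts:

```python
def intercepts(cal_years, curve_bp, target_bp):
    """Calendar years at which a calibration curve crosses target_bp.
    cal_years: calendar years, ascending; curve_bp: the curve's
    radiocarbon age at each of those years. Linear interpolation."""
    hits = []
    for i in range(len(cal_years) - 1):
        a, b = curve_bp[i], curve_bp[i + 1]
        if a == b:
            continue
        frac = (target_bp - a) / (b - a)
        if 0 <= frac < 1:  # crossing lies within this segment
            hits.append(cal_years[i] + frac * (cal_years[i + 1] - cal_years[i]))
    return hits

# A hypothetical wiggle that dips through 660 yr BP and comes back up,
# yielding two intercepts -- NOT real calibration data.
print(intercepts([1250, 1300, 1340, 1400], [700, 640, 680, 690], 660))
```

With this toy curve you get two intercepts (about 1283 and 1320), which is the same structural behavior that splits the Sample 1 result into two calendar ranges.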
Here's a suggestion for you that might be a bit more revealing. Do a Ward and Wilson test on the intra-laboratory findings for each sample and then on the aggregated (non-pooled) data points from Table 1 for each sample. Put an X in the cell for each combination of laboratory and sample that fails the Ward and Wilson test at 95% CI. Then tell me what you think it might mean.
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
|---|---|---|---|---|
| Arizona | | | | |
| Oxford | | | | |
| Zurich | | | | |
| Aggregated | | | | |
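If it helps, here's a sketch of how you could fill that grid in mechanically. The per-run numbers below are placeholders only, not the actual Table 1 data; substitute the real runs for each laboratory/sample cell yourself:

```python
# 95% chi-squared critical values for df = 1..11, from standard tables.
CHI2_95 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07, 6: 12.59,
           7: 14.07, 8: 15.51, 9: 16.92, 10: 18.31, 11: 19.68}

def fails_ward_wilson(dates, errors):
    """True if the runs fail the Ward & Wilson homogeneity test at 95%."""
    w = [1.0 / s**2 for s in errors]
    pooled = sum(wi * x for wi, x in zip(w, dates)) / sum(w)
    t = sum((x - pooled) ** 2 / s**2 for x, s in zip(dates, errors))
    return t > CHI2_95[len(dates) - 1]

# Placeholder per-run data (dates in yr BP, 1-sigma errors) --
# invented values, NOT taken from Damon et al., Table 1.
runs = {
    ("Arizona", "Sample 1"): ([650, 700, 620], [40, 45, 50]),
    ("Oxford",  "Sample 1"): ([600, 780, 650], [35, 40, 45]),
}

for (lab, sample), (dates, errors) in runs.items():
    mark = "X" if fails_ward_wilson(dates, errors) else ""
    print(f"{lab:10s} {sample}: {mark}")
```

Run that over every cell and the pattern of X's (or their absence) is the thing worth interpreting, not any single cell in isolation.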