KAJ
Scholar
"How about a paper with two statisticians contributing?"
OK, I'll bite.
Reviewing this kind of work takes time, and IRL I don't have much spare, so expect delayed responses from me - reading the paper you cited and writing this post used a lot of time.
I'm hesitant to get involved because you have used three distinct statistical approaches to support your proposition that the sample results are so heterogeneous as to indicate subterfuge.
- In #70 you used non-overlap of 1 sd intervals. I addressed that briefly in #98 .
- Next, in #159 you relied on the "chi^2 test" saying "If the carbon 14 dating fails the chi^2 test, then the results are no good" without giving any more detail of the "chi^2 test". In #176 I pointed you to Bray and said "If you want to continue with your heterogeneity argument, you really need to address that expert opinion." In #196 you said "Look at table two [in Damon et al], and the reported X^2 value of 6.4". I addressed that in #203.
- Next, in #206 you said "How about a paper with two statisticians contributing?" linking to a paper you apparently hadn't read (#215) and I'm practically certain you hadn't understood, and tried to reverse the burden of proof ("Can you explain why it is wrong?").
OK, on to the statistics...
Firstly I'll address the crux of your proposition, that data differences ("dispersion") between labs suggest that the samples did not come from one source. In #76 you said:
That doesn't matter. No matter what date, if the samples don't agree, then the samples were not from the same thing.
The control samples do agree on the date, that means the control samples were from the same items.
This evidences a fundamental misconception (not restricted to you!): that heterogeneity is binary, that things are either heterogeneous or not. In reality nothing is perfectly homogeneous; there are always real differences between subsamples, not just measurement errors. Those differences may be too small to matter but, no matter how small, they can be demonstrated given enough data and appropriate data analyses. If data analysis doesn't show a difference, that does not mean there is no difference, only that the data and/or analysis are inadequate to show it.
This illustrates one of the problems with null hypothesis significance tests (NHST): we don't believe the null hypothesis anyway (see, for example, link). We know the SoT is not perfectly homogeneous; it is at least somewhat heterogeneous. If NHST does not demonstrate heterogeneity then either the data or the data analysis are inadequate.
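To put rough numbers on the "given enough data" point, here is a minimal sketch. The 5-year true difference and the 50-year measurement sd are made-up illustrative values, not figures from Damon et al., and the comparison is a plain two-group z-test with a known sd, which is a simplification.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def expected_z(true_diff, sigma, n_per_group):
    """Expected z statistic when comparing the means of two groups
    of n_per_group measurements, each with known sd sigma."""
    return true_diff / (sigma * math.sqrt(2.0 / n_per_group))

# Hypothetical values: a real but tiny 5-year difference between two
# subsamples, measured with a 50-year sd per measurement.
TRUE_DIFF = 5.0
SIGMA = 50.0

for n in (4, 100, 1000, 10000):
    z = expected_z(TRUE_DIFF, SIGMA, n)
    print(f"n per group = {n:>5}: expected z = {z:5.2f}, p = {two_sided_p(z):.3g}")
```

The same real difference goes from invisible to overwhelmingly "significant" purely because n grows, which is why a non-significant result only tells you about the data and analysis, not about whether a difference exists.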
To support your proposition you would need to show that the sample dispersion was so great as to be incompatible with "samples ... from the same thing". But we do not know the dispersion expected of "samples ... from the same thing". The control samples give some relevant information, but it would be a big assumption that the heterogeneity of the SoT and the controls were similar.
The non-overlap of standard deviations you used first (#70) and the chi^2 analysis used by Damon et al. compare the between-lab dispersion to (estimates of) within-lab dispersion, which is not relevant to your proposition. link is relevant here.
Even worse, in this data set samples and labs are inextricably confounded; it isn't possible to distinguish sample effects from lab effects. Again, the control samples give some relevant information, but it would be a big assumption that between-lab differences in treatment of SoT samples were the same as differences in their treatment of controls.
To add to the difficulties, Damon et al. Table 1 gives only 12 results for the SoT, an average of 4 for each sample/lab. With such a small data set no statistical test will have useful power; you really have no chance of showing undue heterogeneity unless it is massive.
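For anyone unsure what a chi^2 test of this kind actually computes, here is a generic sketch. It is the standard weighted-mean form, not necessarily the exact calculation in Damon et al., and the lab means and errors below are invented for illustration. The p-value shortcut assumes 2 degrees of freedom (three labs), where the chi^2 tail probability is exactly exp(-stat/2).

```python
import math

def chi2_homogeneity(means, errors):
    """Weighted mean and chi^2 statistic for k lab means with quoted errors.
    The statistic is the between-lab scatter measured in units of the
    quoted within-lab errors."""
    weights = [1.0 / e ** 2 for e in errors]
    pooled = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    stat = sum(w * (m - pooled) ** 2 for w, m in zip(weights, means))
    return pooled, stat

def chi2_sf_2df(stat):
    """Upper-tail probability of a chi^2 variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)

# Hypothetical lab means and quoted errors (years BP), NOT the published values.
means = [650.0, 750.0, 680.0]
errors = [30.0, 30.0, 25.0]

pooled, stat = chi2_homogeneity(means, errors)
print(f"pooled mean = {pooled:.1f}, chi^2 = {stat:.2f}, p = {chi2_sf_2df(stat):.3f}")

# Doubling the quoted errors quarters the statistic: the verdict depends
# entirely on how well the within-lab errors are estimated.
_, stat_wide = chi2_homogeneity(means, [2 * e for e in errors])
print(f"chi^2 with doubled errors = {stat_wide:.2f}")

# The X^2 value of 6.4 quoted earlier in the thread, assuming 2 d.f.
print(f"p for chi^2 = 6.4 on 2 d.f.: {chi2_sf_2df(6.4):.3f}")
```

Note that nothing in the statistic refers to how much scatter "samples from the same thing" should show; it only asks whether the lab means differ by more than the quoted within-lab errors allow.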
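A back-of-envelope power calculation shows how little 12 results can do. This is a sketch under simplifying assumptions of my own: three labs with exactly 4 measurements each, a known and equal measurement sd, normally distributed errors, and a chi^2 test on the three lab means at the 5% level. Under those assumptions the statistic is (1 + n r^2) times a chi^2 variable with 2 d.f., where r is the ratio of the true between-lab sd to the single-measurement sd, so the power has a closed form.

```python
def power_chi2_3labs(ratio, n_per_lab=4, alpha=0.05):
    """Power of a chi^2 homogeneity test on 3 lab means (2 d.f.).

    ratio     -- true between-lab sd divided by single-measurement sd
    n_per_lab -- measurements per lab (assumed equal across labs)
    alpha     -- significance level of the test

    With 2 d.f. the chi^2 tail is exp(-x/2), so the rejection
    probability simplifies to alpha ** (1 / inflation).
    """
    inflation = 1.0 + n_per_lab * ratio ** 2
    return alpha ** (1.0 / inflation)

for r in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(f"between-lab sd = {r:4.2f} x measurement sd -> power = {power_chi2_3labs(r):.2f}")
```

On these assumptions, real between-lab differences as large as half the measurement sd would be missed most of the time; only differences well above the measurement sd are detected reliably.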
OK, on to the paper of which you asked "How about a paper with two statisticians contributing?"
TL;DR This paper really is rubbish. Its approaches are "different" enough to be considered wacky. Even if their conclusion ("... a decrease in age BP as x1 increases ...") were correct, it would not support your conclusion that the 3 samples did not all come from the SoT.
There are (at least) three versions of the paper:
1) You linked to Fanti et al. (2010). That 5-page document is from a Workshop Proceedings and "The purpose of this paper is to summarize the results obtained in Ref. 2". Ref 2 is Riani M., Atkinson A.C., Fanti G., Crosilla F.: “Carbon Dating of the Shroud of Turin: Partially Labelled Regressors and the Design of Experiments”, which I'll refer to as...
2) Riani et al. (2010). The link to this in Fanti et al <https://www.lse.ac.uk/collections/statistics/research/RAFC04May2010.pdf> gives a 404 error but I found it at ResearchGate. The coincidence in dates (May 4, 2010) suggests this 20-page document is from the same workshop as Fanti et al (2010).
3) Riani et al (2013), by the same authors, is an 11-page article in Stat Comput (2013).
My comments apply to all three, but I've concentrated on Riani et al (2010) as being the most complete. Pulling it apart point by point would be tedious and pointless. I'll highlight some of the worst bits; if you want more justification of my opinion, indicate what would satisfy you.
The Riani et al approach is wildly unusual and, in my opinion, fatally flawed.
They have only 12 data points, far too few to show excessive heterogeneity. Their section on heterogeneity (section 2 in Riani et al 2010) really just demonstrates that they don't have a good model of the errors, which is required for any realistic analysis.
Damon et al. say "The results, together with the statistical assessment of the data prepared in the British Museum, were forwarded to Professor Bray of the Istituto di Metrologia 'G. Colonetti', Turin, for his comments. He confirmed that the results of the three laboratories were mutually compatible, and that, on the evidence submitted, none of the mean results was questionable".
Riani et al ignore this and work on the basis that there is heterogeneity. In 2010 they said:
Fanti et al. "The 12 datings, furnished by the three laboratories, show a lack of homogeneity"
Riani et al. "The twelve results from the 1988 radio carbon dating of the Shroud of Turin show surprising heterogeneity. We try to explain this lack of homogeneity by regression on spatial coordinates."
By Riani et al 2013 the heterogeneity was not only undeniably present, it was "egregious" - last sentence of section 2 Heterogeneity "We now use a spatial analysis to try to discover the source of the egregious heterogeneity in the readings on the TS."
They fit a spatial model when they don't have good spatial data. Their generation of "387,072 possible cases to analyse" is really just fiction; they don't seem to realise that all except one (at most!) of these cases are wrong - there is only one "real" arrangement and they don't know which (if any) of the "possible" cases this is.
They fit a rectilinear model with no clear reason (but not surprising considering they have only 3 "real" x values). They conclude (Riani et al 2010 p14) "The effect is that of a decrease in age BP as x1 increases. The effect is not large over the sampled region; between x1 = 43 and 81, our estimate of the change is about two centuries. Extrapolation of this linear trend to unsampled values of x1 eventually leads to meaningless negative results". Such an impossible conclusion should have been taken as clear evidence that the model is just plain wrong.
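The arithmetic behind the "meaningless negative results" is easy to sketch. The slope comes from the quoted "about two centuries" between x1 = 43 and 81; the starting age of 800 years BP at x1 = 43 is purely my illustrative assumption, not a figure from the paper.

```python
def zero_crossing(x_start, age_start, slope):
    """x at which the line age = age_start + slope * (x - x_start) reaches zero."""
    return x_start - age_start / slope

# Slope implied by the quote: about -200 years over x1 = 43 .. 81.
slope = -200.0 / (81 - 43)

# ASSUMPTION for illustration only: age of 800 years BP at x1 = 43.
ASSUMED_AGE_AT_43 = 800.0

x_zero = zero_crossing(43.0, ASSUMED_AGE_AT_43, slope)
print(f"slope = {slope:.2f} years per unit of x1")
print(f"fitted age reaches zero at x1 = {x_zero:.0f}; beyond that it is negative")
```

Whatever starting age you plug in, a straight line with that slope predicts cloth from the future a modest distance beyond the sampled strip.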
Overall this work illustrates the phrase "if you torture the data long enough, it will confess to anything".
