Indeed, this is being presented as if it's some kind of smoking gun. There's no smoke. There's not even a gun.
Students usually can't give you a full soup-to-nuts account of their line of reasoning that got them to an erroneous conclusion, complete with all their hidden assumptions. Occasionally you can deduce what the wrong assumption is, just because the wrong answer can only come from a small set of assumptions or premises. But other times you just have to admit you can't read minds and you can't accurately identify what wrong thing is stuck in a student's head. That's kind of where I find myself now. He's so far off in the weeds—and unwilling to call for help—that I think he might be hopelessly lost. But hope springs eternal.
A radiocarbon date is just the age inferred from the ratio of steadily decaying carbon isotopes. There is some inherent uncertainty in measuring that decay because it's based on a discrete number of decays in a short continuous amount of time. But that's not the calibration problem. The calibration problem is due to finding out back in the 1950s that the amount of ¹⁴C available for absorption varies across time and place and that this had a noticeable effect on dating accuracy.
The radiocarbon date isn't a number. It's a normally-distributed random variable with a given mean and a given standard error. And that is itself usually pooled data from several runs, each of which has an uncertainty arising from the inherent measurement error. The ¹⁴C concentration is a 5-dimensional object: it's a normally-distributed random variable parameterized by coordinates on Earth and years of time, yielding a mean and standard error for ¹⁴C concentration at that time and place.
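Here's a minimal sketch, in Python, of what that five-dimensional object looks like as a data type. Everything in it is hypothetical: the function name, the signature, and the returned numbers are placeholders for whatever a real calibration dataset would supply.

```python
from dataclasses import dataclass

@dataclass
class NormalRV:
    """A normally-distributed random variable: a mean and a standard error."""
    mean: float
    sigma: float

def c14_concentration(lat: float, lon: float, year: int) -> NormalRV:
    """Hypothetical lookup: two coordinates plus a year in, a mean and a
    standard error out -- five dimensions in all. The value returned here
    is a placeholder, not real calibration data."""
    return NormalRV(mean=100.0, sigma=0.5)

print(c14_concentration(45.07, 7.69, 1300))  # NormalRV(mean=100.0, sigma=0.5)
```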
In 1986 there didn't exist One True Method of distributing a set of distributions across a distribution. It's not that statistics didn't have a way. It's that there were many (different) ways of conceptualizing the problem—some even dating back to Gauss himself—and the scientist has to figure out which method works best for how he knows the phenomenon he's measuring behaves. By 1988, radiocarbon dating had settled on either (or both) of two methods.
But the problem in converting radiocarbon dates to calendar dates is not just managing the statistical uncertainty to arrive at an appropriate distribution to which you can apply any of several normative confidence metrics. The problem is that the underlying physical processes are inherently and intractably ambiguous. When the fundamental physical property you're measuring is an isotope ratio, you have to understand that any given ratio in a specimen will be the product of elapsed time since death and also of the ¹⁴C concentration profile during its life on Earth. One of those varies steadily and monotonically. The other doesn't. That lack of monotonicity is what bites us, because it means we're dealing with a non-invertible function.
Simply put, the same high ratio of ¹⁴C could be observed in a young specimen in which not much of the isotope has decayed and in an older specimen that was around when there was more ¹⁴C present to absorb. If all you have is the specimen, you don't know how much each possible factor contributed to what you measured. Now by 1993(?), when we did our analysis of a shoreline, there was an agreed-upon method for estimating probability weightings for all the different dates it could be. This was the result of further studying the dataset that the calibration curve was derived from.
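If you want to see the ambiguity in miniature, here's a toy sketch. The curve is invented; the real calibration curve wiggles for physical reasons, but the effect on inversion is the same: one measured value maps back to several disjoint runs of calendar years.

```python
import numpy as np

# Made-up, non-monotonic "calibration curve": radiocarbon age as a
# function of calendar age. The wiggle is invented, but the real curve
# wiggles too, and that's all that matters here.
cal_years = np.arange(600, 800)                    # hypothetical calendar ages
radiocarbon = cal_years + 30.0 * np.sin(cal_years / 15.0)

# Try to "invert" a measured radiocarbon age of 700, allowing a +/- 10
# measurement error: collect every calendar year consistent with it.
measured, err = 700.0, 10.0
candidates = cal_years[np.abs(radiocarbon - measured) <= err]
print(candidates)  # several disjoint runs of years -- a set, not one answer
```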
Still, what comes out of the calibration process is ostensibly normally distributed. Everything that went into it is normally distributed and everything that comes out of it is normally distributed. But the vital thing to know about the calibration result is that it's almost certainly going to be a set of normally-distributed random variables—not just one random variable. We can't do anything to fix that. It's simply that the problem of assessing true age from radiocarbon age requires us to invert a non-invertible function—i.e., to do something that's mathematically impossible.
Now we deal with this all the time in mathematics. It's usually best understood if we set aside the notion of statistical distributions and just work with finite, scalar numbers—with a huge caveat that I'll point out later.
Let's say the length of a certain part (ignoring tolerance and error) is calculated by finding the roots (zeros) of this equation.
A = (x - 1)² - q
That is, find the value of x when A = 0 and q is some value we measure on the factory floor. Don't worry about why that's the right equation. Let's say we just know it because our hypothetical company mathematician has derived it for us. If you do the algebra for q = 2, you come up with √2 + 1. But algebra reminds us that the proper answer is ±√2 + 1, or a solution set approximately {-0.414, 2.414}. Now we know that a length is a physical quantity that can't be less than zero, so we know to ignore the impossible option and take the length to be 2.414.
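For anyone who wants to poke at it, here's the same algebra as a few lines of Python; the function name and the non-negativity filter are mine, not anything standard.

```python
import math

def roots(q):
    """Full solution set of (x - 1)^2 - q = 0: both signs of the radical."""
    r = math.sqrt(q)
    return {1 - r, 1 + r}

solutions = roots(2)                         # {-0.414..., 2.414...}
physical = {x for x in solutions if x >= 0}  # a length can't be negative
print(solutions, physical)
```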
That doesn't mean the other root is wrong. It's a correct solution to the equation. We didn't do the algebra wrong. We just know from other information which answer is the one that works in the physical-world problem we were modeling with the math. We need to avoid saying it is the "right" answer. The other answer is a right answer. It's just not the one that can solve our problem. Our problem faces an inherent ambiguity. We didn't make a mistake. Our use of the word "ambiguous" doesn't imply we can't know the answer to our problem.
In the ℝ-valued example, our solution set is a set of values that we must select from on an either-or basis. Both are valid answers to the problem, but there is no way to reckon the notion of "combining" them in any way other than to show them as a set. When we turn back to statistics, our values are not either-or numbers but random variables that collectively model the underlying phenomenon along with its uncertainty. We can certainly say that if we get two calibrated dates—sometimes you get three—you can still express them as the set {N(μ₁,σ₁²), N(μ₂,σ₂²)}, but it's absolutely, incontrovertibly wrong to treat the set as discrete, distinct distributions. Unlike our simplistic example, it is absolutely not an either-or determination. This is the caveat I alluded to earlier. Instead you have to blend the distributions. There are many ways to do that, and knowing the right way to do it depends on knowing what your variables represent. Here they represent a functional mapping from the same underlying domain.
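Here's a minimal sketch of one blending method, a plain finite mixture, assuming normal components and made-up weights. It is only one of the many ways I just mentioned, not the method any particular lab used.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, components, weights):
    """One blended density from several normal components.
    components: list of (mu, sigma) pairs; weights: same length, summing to 1."""
    return sum(w * normal_pdf(x, mu, s)
               for (mu, s), w in zip(components, weights))

# Illustrative numbers only, not the shroud data: two calibrated humps
# and a hypothetical weighting between them.
components = [(1287.0, 15.0), (1368.0, 10.0)]
weights = [0.7, 0.3]
print(mixture_pdf(1300.0, components, weights))
```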
Incidentally, you can also represent the set of distributions as a set of intervals {A₁-B₁, A₂-B₂} at some given nσ, where each Aᵢ and Bᵢ are the lower and upper bounds of the error band defined by n. You might recognize that as a confidence interval, which embodies the mean and the standard error. That's just a different way to notate a random variable that has a given parameterization. And that's how radiocarbon dating experts prefer to express it, with n = 1 or n = 2. Each element in the calibration set is thus termed a confidence band. It is expressly wrong to consider them to be discrete intervals into only one of which the true date must fall. Students can be expected to be confused by a notation that seems to be expressing a set of tolerances rather than modes in a single distribution. But the remedy for that is to learn more, not to demand that the world shrink to fit what you already know.
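To convince yourself the two notations carry the same information, here's a sketch of the round trip; the function names and numbers are mine.

```python
def band(mu, sigma, n):
    """The n-sigma confidence band (A, B) of a normal component."""
    return (mu - n * sigma, mu + n * sigma)

def from_band(a, b, n):
    """Recover (mu, sigma) from a reported n-sigma band: same information,
    different notation."""
    return ((a + b) / 2.0, (b - a) / (2.0 * n))

print(band(1287.0, 15.0, 2))          # (1257.0, 1317.0)
print(from_band(1257.0, 1317.0, 2))   # (1287.0, 15.0)
```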
When you blend the distributions—optionally applying the weights I talked about above—you get a single multimodal distribution. That's a distribution with more than one mode, or hump. Typically for radiocarbon dating you get two humps, but I have seen some with three. Now if your modes are widely separated, the trough in between them might drop very close to the x-axis. But it will never be zero. The distribution will never, ever, ever devolve into an either-or choice. But in most cases those modes will be close enough together that their tails blend into a nontrivial trough. Either way, what's important to remember is that it's one distribution, not several.
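You can check the never-zero claim numerically. Same illustrative components as the mixture sketch above:

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Same illustrative blend as before: modes at 1287 and 1368.
def blended(x):
    return 0.7 * normal_pdf(x, 1287.0, 15.0) + 0.3 * normal_pdf(x, 1368.0, 10.0)

# Scan the gap between the modes: the trough is tiny but strictly positive.
trough = min(blended(1287.0 + 0.1 * k) for k in range(811))
print(trough, trough > 0)  # a small positive number, True
```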
Because this is still a probability mass function, the true calendar date will gravitate toward one hump or another. The weighting method we got from Stuiver and his colleagues after the shroud analysis was done will help. But just because the true date might prefer one hump to the other doesn't mean the non-preferred hump is wrong or that it indicates some flaw in the underlying lab procedure. This is where I think our poor statistics student is getting confused.
Yes, in the case of the shroud we can rely on a documentary historical source that says it existed in 1356 CE.
Under no circumstances does that mean that Damon et al.'s 1353-1384 CE confidence band (95%)—which includes ostensible "creation" dates some decades later—constitutes any kind of flaw, mistake, or coverup that we need to investigate.
It is not a smoking gun.
It is literally a non-issue, just like getting -0.414 for a length is a non-issue. It's simply how the math works, and the solution is to understand what to do with what the math gives you. The other mode identified by the 1262-1312 CE confidence band comfortably accommodates an earlier creation date that allows for the shroud to be witnessed in 1356 CE. And that's the proper interpretation of the reported 68% confidence band, which skews older and produced only a single mode.
Now any time you're reasoning from a single standard error, you have to ask yourself where the other 32% of the data is. It could be in the later portion of the distribution, but based on what we can know from all sources—including the statistics themselves—it probably isn't. It is probably not gravitating toward the later mode. If the modern weighted reckoning could be time-warped back to 1988, the confidence bands in Damon et al. would be accompanied by a weighting factor that would have given more weight to the 1262-1312 CE confidence band. The existence of another confidence band—another mode—containing values that, according to other evidence, are probably not where the true date lies is not any sort of mistake or cause for concern. It's literally just how the math works, and you have to know how to apply it to your specific problem.
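To see how a weighting factor can fall straight out of the blended distribution, here's a sketch that computes each band's share of the total mass. The component parameters and weights are my illustrative toys; only the band endpoints come from the published results, so the split it prints says nothing about the actual shroud data.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mass_in(a, b, components, weights):
    """Probability mass the blended distribution assigns to [a, b]."""
    return sum(w * (normal_cdf(b, mu, s) - normal_cdf(a, mu, s))
               for (mu, s), w in zip(components, weights))

# Illustrative components again; only the band edges are Damon et al.'s.
comps, wts = [(1287.0, 15.0), (1368.0, 10.0)], [0.7, 0.3]
print(mass_in(1262.0, 1312.0, comps, wts))  # the older band's share
print(mass_in(1353.0, 1384.0, comps, wts))  # the younger band's share
```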
We've covered the shroudie invisible patch nonsense in excruciating detail.
It remains nonsense.
I agree. What seems to be confusing our student is the notion that the "errant" confidence band corresponds to a later date than he says we can know is possible. To the best of my mindreading ability, he seems to be thinking this is evidence of contamination from a modern patch. If that's the case, it's simply as wrong as it can be. The later confidence band simply comes from the vagaries of ¹⁴C concentration over time. By the time we're fiddling around with the multimodal distribution, the outliers that could conceivably be attributed to younger fabric will have already been handled statistically and either accommodated by a larger confidence interval (Leese, writing with Damon) or rejected from the data pool (Christen, Bronk Ramsey).
When he once said the two confidence bands can be explained only by there having been two sources of fabric in the sample, that was as ignorant as it could possibly be. As Pauli might have said, "That's not right, it isn't even wrong." I fear that our student's comprehension level hasn't improved much since those days.
Falling back to declarative statements like, "The data are heterogeneous, therefore the results are unreliable!" is simply to ignore everything we've gone over in the past few weeks. It's a fringe reset. In over his head, our student is simply fleeing back to Square One, knees jerking all the way, hoping the real world will just simplify itself and obey black-and-white rules that lay people can apply without any pesky acquisition of expertise. That's not how data behave.
Similarly, complaining that people aren't agreeing that scientific measurements of the same value should be as homogeneous as possible is the strawiest of straw men. No one is claiming that. Every scientist strives to measure accurately and precisely. The question is what to do when the data are inevitably distributed. Casabianca et al. have one idea, but neither he nor his coauthors work in the field, and therefore they have little understanding of the value of their idea. The people who actually do radiocarbon dating know how much heterogeneity to expect, how to identify it, and what to do about it.
According to one method that was popular in 1988, the heterogeneity simply doesn't correlate to the shroud sample. It correlates to the Arizona lab, no matter what sample they were working on. According to another method (Christen), we only have to reject two outliers to bring the study into proper conformity. This is a correct, proper expression of how radiocarbon daters use statistics to inform them of the shape of their data. Casabianca's mechanical application of normative values simply doesn't cut it in the field.
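For the curious, here's a sketch in the spirit of the Ward and Wilson chi-squared check that radiocarbon daters actually use: pool the determinations by inverse variance, then ask whether the scatter is consistent with a single underlying age. The numbers below are made up, not the 1988 lab results.

```python
def pooled_age_and_chi2(dates, errors):
    """Inverse-variance pooled mean and the chi-squared statistic used to
    test whether several determinations share one underlying age.
    Compare chi2 against the chi-squared distribution with n-1 d.o.f."""
    w = [1.0 / e**2 for e in errors]
    pooled = sum(wi * d for wi, d in zip(w, dates)) / sum(w)
    chi2 = sum((d - pooled)**2 / e**2 for d, e in zip(dates, errors))
    return pooled, chi2

# Hypothetical radiocarbon determinations (years BP) with 1-sigma errors.
print(pooled_age_and_chi2([640.0, 755.0, 680.0], [30.0, 30.0, 25.0]))
```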
Similarly Van Haelst tries to throw a whole lot of statistical mud at the wall to see if any sticks. I got as far as watching him try to apply a beta distribution statistic to data that aren't beta-distributed. That tells me all I need to know. People need to understand that just because an author can array all his errors in neat tables doesn't mean it suddenly becomes science.
We await your interpretation.
I simply don't know what to make of this. Historiographic interpretation is notoriously unreliable and difficult. It will always just come down to someone's opinion that some text or another might be talking about some artifact or another. The debate over the Pray Codex is never going to end because it's always just a debate over how creatively one is allowed to fill in the blanks, to say this is equivalent to that, and to quibble over poor artistic ability.
But science is science. That's hard data. That's an objective physical phenomenon that can be measured objectively, and an uncertainty can be reckoned for that measurement. It's one thing to quibble over whether that uncertainty has been properly expressed and whether it makes the dating unreliable. But it's another thing altogether to simply up and say that objectively-obtained data are simply wrong for no better reason than someone is filling in the historiographic blanks the way they want. That's like saying you can ignore your blood pressure at 175/110 because your wife says she thinks you're looking good these days.