The only way of producing the identical PDF that I have seen work reliably is to reproduce exactly the same scan on exactly the same computer, using exactly the same settings.
Agreed. This is why I asked Robert early on to describe all the parameters and settings that affect what a PDF contains. I asked twice and he never answered. The answer is "practically infinite," meaning a large enough number to preclude exhaustive investigation. That is not why Arpaio's toy investigation is legally inadmissible. It is, however, why it's useless for any practical evidentiary purpose.
Not even the same paper on the same scanner and computer using the same settings would be guaranteed to produce the same PDF. Small factors such as the fine alignment of the paper on the scanner bed have an observable effect. Why? Because, for example, PDF optimizers can attempt to correct what they recognize as alignment errors. Thus they replace objects in the original PDF with new objects that are transformations (i.e., scaling, nudging, or rotation) of previous elements. If, for example, the document was more perfectly aligned on the scanner bed, the optimizer might not recognize it as something that needs adjustment. Therefore the optimized PDF won't have those adjustments.
This is something we see all the time in the field of computing for science and engineering, as we collect and measure data from the real world. The care with which we obtain that data affects the behavior of downstream algorithms, often to a nonlinear extent.
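To make the alignment point concrete, here is a minimal sketch in Python of a toy "optimizer" with a deskew cutoff. Everything in it is made up for illustration: the threshold value, the optimize function, and the object names do not come from any real PDF software. It only shows how a fraction of a degree on the scanner bed can decide whether transformed objects appear in the output at all.

```python
import math

# Hypothetical cutoff: below this measured skew the toy optimizer does nothing.
SKEW_THRESHOLD_DEG = 0.3


def optimize(page_objects, measured_skew_deg):
    """Return the object list a toy optimizer would emit for one scanned page."""
    if abs(measured_skew_deg) < SKEW_THRESHOLD_DEG:
        # Page looks straight enough: objects pass through untouched.
        return list(page_objects)
    # Otherwise each object is replaced by a rotated version of itself,
    # expressed as a PDF-style matrix (a, b, c, d, e, f).
    angle = math.radians(-measured_skew_deg)
    cos_a = round(math.cos(angle), 6)
    sin_a = round(math.sin(angle), 6)
    matrix = (cos_a, sin_a, -sin_a, cos_a, 0, 0)
    return [{"source": obj, "transform": matrix} for obj in page_objects]


# Same paper, same scanner, same settings; only the placement differs slightly.
page = ["image_background", "image_text_mask"]  # illustrative object names
well_aligned = optimize(page, measured_skew_deg=0.1)
slightly_crooked = optimize(page, measured_skew_deg=0.4)
```

The two results differ in structure, not just in pixel values: one is the original object list and the other is a list of transformed replacements. That is the nonlinear part. A difference of 0.3 degrees in the input produces an all-or-nothing difference in the output.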
Yes, Robert will just try to handwave these facts away as "more techno-babble from Jay," but of course what he's really saying by that is, "I don't understand a single word Jay said, and I can't refute it." And that's why experts, not Robert, make the important decisions and acquire credibility.
I could, for example, write a program in, say, VB which organises my image files. 20 other programmers could do the same, but no two solutions would be identical.
I.e., what happens several times a semester in any computer science curriculum. As a former teacher of college engineering and computer science courses, I can attest that if you get two solutions that behave suitably alike, that's evidence the students may have collaborated and you should look more closely for evidence of outright cheating. The natural condition is for solutions to differ markedly.
That's as simple as I can make it without getting into technicalities.
Feel free to get into the technicalities with me. I love it, because it's part of my profession, and Robert can't follow those arguments and so gets visibly flustered.
The notion of PDF, PS, and other page description languages as languages is vital. I mention one of the subjects I taught was computer graphics, which is intensely fun -- especially if you're teaching at the university that produced the people who went on to found Evans and Sutherland, Pixar, and Adobe. And one of the assignments was indeed to hand-write programs in Postscript and PDF.
I remember one student wrote a program, in Postscript, to generate fractals as output. It was a very tiny program, as far as PS files go, and just for bleeps and giggles I ran it on our department printer -- took three hours to run and produced gorgeous device-resolution output. Now an equivalent PS file (in terms of visual output) could have been written simply as an embedded bitmap. Or in any number of ways, programmatically. The field is literally wide open, when you have a Turing-complete programming language to work in.
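The same idea can be sketched in a few lines of Python rather than Postscript. This is only an illustration, not the student's program: the two functions and the tiny 8x8 pattern are invented for the example. One "document" computes a Sierpinski-style fractal, the other just carries the finished pixels, and the rendered output is the same either way.

```python
SIZE = 8  # illustrative raster size


def render_procedural(size=SIZE):
    """Compute the picture: a pixel is set when x AND y share no bits."""
    return [
        "".join("#" if (x & y) == 0 else "." for x in range(size))
        for y in range(size)
    ]


# The same picture stored as literal data, like an embedded bitmap.
EMBEDDED_BITMAP = [
    "########",
    "#.#.#.#.",
    "##..##..",
    "#...#...",
    "####....",
    "#.#.....",
    "##......",
    "#.......",
]


def render_embedded():
    """'Render' the picture by simply copying the stored pixels."""
    return list(EMBEDDED_BITMAP)


procedural_output = render_procedural()
embedded_output = render_embedded()
same_picture = procedural_output == embedded_output
```

Two files that look nothing alike internally produce the same page. Looking only at the output, you cannot tell which one made it, and looking only at the internals, you cannot conclude that two different-looking files must show different pages.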
The Birthers loved technicalities when they thought the technicalities were on their side. It was fun to listen to them foam about "layers" and "bit masks" as if they suddenly knew what these things were. Nowadays it's hard to get a Birther to talk about them. Why? Because the world saw them shoot themselves in the foot. "PDFs from scanned documents ever only contain the one layer!" they said. And after quickly realizing that that was a most inexpert position from which to argue, they've been backpedaling ever since and hoping the technical argument would go away.