And it finally clicked: you're trying to replay the Justice Sheen PDF story. Because that story was plausible in one case, you have it stuck in your head now that if you copy-paste something from one document to another, it might change some of the characters. And once again faced with having typed something wrong, not realizing it until too late, and not wanting to admit to any conceivable error, you're grasping at that straw again.
I've been trying to figure out why you didn't just screen-shot the WhatsApp conversation to try to prove your point. Why did you have to copy from WhatsApp, paste it into a different program, and screen-shot that program? And the answer is that this is what had to happen in the Sheen story, so copy-pasting is what you think you have to do to make a credible story in this case.
Very often, as a knowledgeable skeptic dealing with uninformed and misinformed people, you have to figure out which of an infinity of wrong ideas they have in their head. Someone who knows how character encodings work in computers has the one right idea. But someone who doesn't know anything about that bases their understanding of what happens behind the scenes when you copy and paste on who-knows-what.
Here is the letter R. What is actually stored in the computer memory for that letter in this post? It's an 8-bit number, 0x52 in hexadecimal or 01010010 in binary. Why that particular number and not, say, 42? Because we've agreed internationally for decades that the binary value 01010010, interpreted as a letter, will be the uppercase letter R. We call that agreement ISO/IEC 8859-1, which comes from the old ASCII standard from the days of the teletypes. The assignment of these 8-bit values to stand for letters and symbols in a computer's memory, or in digital communications, is called a character encoding. You may have seen other names for similar encodings such as UTF-8 or Unicode.
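If you'd rather check that than take my word for it, here's a minimal Python sketch. The variable names are mine, purely for illustration; `ord` and `str.encode` are standard Python.

```python
# Look up the number that stands for the letter R.
code = ord("R")                # the numeric value assigned to the character
as_hex = f"0x{code:02X}"       # same value written in hexadecimal
as_bits = f"{code:08b}"        # same value written as 8 binary digits

# Encode the letter under ISO/IEC 8859-1 (Python calls it "latin-1")
# and under plain ASCII. Both produce the same single byte.
latin1_bytes = "R".encode("latin-1")
ascii_bytes = "R".encode("ascii")

print(as_hex, as_bits)
print(latin1_bytes == ascii_bytes)
```

The point is only that the letter is a number by agreement, and the same number under both encodings.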
So back in the day when you were misusing the way primes notation was supposed to indicate time, a teletype that received the binary value 01010010 over its bulky, noisy, slow serial digital connection would put in motion some physical mechanism that would result in a metal type for R being positioned between a hammer and an ink ribbon. Then the hammer would fire, and you'd get an R-shaped ink smudge on the paper.
In the days of primitive computer terminals, the terminal received 01010010 over a slightly better serial line, and this told electronics to aim an electron gun in a certain way to paint a picture of an R in green phosphor. Go watch the old Andromeda Strain. They use this kind of terminal extensively.
We've improved on that a bit. How are you able to see an actual letter R in your browser, in this post, as you read it? Your browser reads 01010010 out of the computer memory in a programmatic context that tells it it's supposed to interpret it as a letter. Then it goes to the data for the typeface you've chosen for your browser and looks up the glyph for 01010010. The glyph is simply the set of instructions that tells the pixel-painting portion of your browser how to make an uppercase R, in that typeface, in pixels.
Here's R in the default typeface. Here it is, R, in a different typeface. And a third: R. You see different pixels painted in each case because the glyph is different in each typeface. But if you could magically peek into your computer's memory where it's holding the text of the post (the binary values, not the picture of the post that your browser has painted out of its typefaces), you'd see 01010010 in all three cases. By default, the browser translates binary values into the proper glyphs using the ISO/IEC 8859-1 encoding.
What's stored for 𝕽, a stylized uppercase R used in mathematics? In this case, not 01010010. Here we've switched encodings (and told the browser so) to use Unicode.[1] We need more bits for that, because the encoded value for this character is 0x1D57D in hexadecimal. Not only is it looking up a different glyph, its underlying representation is different from just plain R. It is a different character with a different meaning.
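A quick sketch of the difference, again in Python with illustrative variable names of my own. I'm assuming UTF-8 as the byte-level encoding here, since that's the common one; the code point itself doesn't depend on that choice.

```python
import unicodedata

plain = "R"
fraktur = "\U0001D57D"   # the stylized mathematical R shown above

# Two different code points, so two different characters.
plain_code = ord(plain)          # 0x52
fraktur_code = ord(fraktur)      # 0x1D57D

# Unicode gives each character its own official name.
fraktur_name = unicodedata.name(fraktur)

# Under UTF-8 the plain R still fits in one byte; the stylized one needs four.
plain_utf8 = plain.encode("utf-8")
fraktur_utf8 = fraktur.encode("utf-8")

print(hex(plain_code), hex(fraktur_code))
print(fraktur_name)
print(len(plain_utf8), len(fraktur_utf8))
```

No typeface is involved anywhere in that. The two values differ before a single pixel gets painted.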
So when you copy and paste, what happens? The glyphs don't get copied. The picture doesn't get copied. The encoded values stored in the underlying bytes get copied. In WhatsApp's program memory, 01010010 for R is packaged up and sent to, say, Firefox where it appears in that program's memory as 01010010. It may look different because Firefox is using a different typeface than WhatsApp to paint the pixels. But the underlying encoded characters do not change.
Put more applicably, if an encoded text byte is 0xBA (the symbol º), copy-pasting it into another program won't change that encoded byte to 0x22 (the basic double-quote, ").
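Here's a rough model of that handoff in Python. The two functions are hypothetical stand-ins I made up for "the source program" and "the destination program", and I'm assuming the text travels as UTF-8. Real clipboards use platform-specific formats, but the principle is the same: encoded values go in, the same encoded values come out.

```python
def copy_text(text: str) -> bytes:
    """Hypothetical source program: put the encoded characters on the clipboard."""
    return text.encode("utf-8")


def paste_text(payload: bytes) -> str:
    """Hypothetical destination program: read the encoded characters back."""
    return payload.decode("utf-8")


source = 'R \u00ba "'            # an R, the symbol º (0xBA), and a plain double-quote (0x22)
pasted = paste_text(copy_text(source))

codes_before = [hex(ord(ch)) for ch in source]
codes_after = [hex(ord(ch)) for ch in pasted]

print(codes_before)
print(codes_after)
print(pasted == source)
```

Nothing in that path looks at a glyph, so nothing in it can mistake one symbol for another.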
So what made that happen in the Justice Sheen report on Herald of Free Enterprise? JimOfAllTrades already covered that. What's stored in the PDF file for the symbol º as it appears in the report?
Not an encoded character. That PDF is a scanned document. Someone put the paper report on a scanner, and the scanner took a picture of the symbol. That picture is what is stored in the PDF. Sure, your browser (and other programs) know how to receive the binary description of the picture and turn it into pixels for you. But it's just doing the same thing dumbly for every kind of picture: cats, scanned text, people's naughty bits.
Copy-pasting from this works entirely differently. As Jim notes, it's a bit of a software miracle that it can happen at all. When you select and copy the picture of the text, the PDF viewer program is furiously trying to interpret bits of the picture as letters, the way our eyes and brain do. Once it has done so, the data that goes into the clipboard is not the picture, but the encoded value for the character the program thinks the picture shows. If the picture looks like an R, the program puts 01010010 into the clipboard memory. But because this process isn't perfect, sometimes the picture of a º might look more like a picture of a " to the algorithm, so it stores 0x22 instead of 0xBA. Then at the destination, the ordinary character rendering process I described above paints the glyph for " instead of for º because that's what got (wrongly) encoded as the result of the picture-interpreting part of the PDF viewer. The destination has no way of knowing any different.
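To make the failure mode concrete, here's a toy version of picture-to-character guessing in Python. Everything in it is an assumption for illustration: the 4x4 bitmaps are made up, and matching by counting differing pixels is far cruder than what real recognition software does. It only shows how a degraded picture of one symbol can end up encoded as another.

```python
# Made-up 4x4 "ideal pictures" of two symbols (1 = ink, 0 = paper).
TEMPLATES = {
    "\u00ba": ("0110", "1001", "1001", "0110"),   # º, a small ring
    '"':      ("1001", "1001", "1001", "0000"),   # ", two vertical strokes
}


def pixel_difference(a, b):
    """Count the pixels where two bitmaps disagree."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(bitmap):
    """Return the character whose template is closest to the scanned bitmap."""
    return min(TEMPLATES, key=lambda ch: pixel_difference(TEMPLATES[ch], bitmap))


clean_scan = ("0110", "1001", "1001", "0110")   # a crisp º
faded_scan = ("0000", "1001", "1001", "0000")   # same º, top and bottom strokes lost

clipboard_clean = recognize(clean_scan)
clipboard_faded = recognize(faded_scan)

print(hex(ord(clipboard_clean)), hex(ord(clipboard_faded)))
```

With the top and bottom of the ring faded out, what's left is two vertical strokes, and the closest match is the double-quote. That guess is what goes to the clipboard.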
But wait, there's more.
When you type the " key on your computer, a binary signal for that keystroke is being given to whatever program has the keyboard focus at that moment on your computer. Most ordinary keystrokes get translated into the encoded character appropriate for that key sequence. So when I hold down the Shift key and press the R key, the binary value 01010010 (for uppercase R) gets delivered to the program as data that it's supposed to do something with. When I hold down Shift and press the key with the single and double quotes, the program gets 00100010, the code for a plain old double-quote.
So what does it do with it? Depends on the program. The vast majority of programs (including WhatsApp and Google Mail) simply add that character code to the current place in the document without further fuss. And then the glyph-painting part of the program paints some pixels in the right place on the screen according to what the typeface glyph says that character should look like.
But some programs try to be clever. A word processing program like Microsoft Word wants to make documents as pleasing as possible. Typographers know that plain old straight up-and-down double quotes " don't look good in type. We want “pretty” marks, where the open-quote and closed-quote symbols (single or double) are slanted, curled inward, or possibly inverted. But we don't want to make the writer hunt around for the right way to do it. “ and ” aren't just alternate glyphs for 0x22. They're completely differently encoded characters. If you look into the computer memory for this post, you won't see 0x22 surrounding the word pretty.
So when Word (and other Microsoft programs) see 0x22, they don't just dump that character into the data they're accumulating for your document. Word tries to see whether you're at the beginning or the end of a quotation, and instead of storing 0x22 it stores 0x201C or 0x201D, the encoded characters for the open- or closed-double-quotation mark. Those are entirely different characters than 0x22. It's helping you make your document look prettier without you having to try too hard.
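A minimal sketch of that kind of rewrite-as-you-type rule, in Python. To be clear, this is not Word's actual algorithm, which I haven't seen; `smarten` and `OPENERS` are hypothetical names, and the rule (opening mark after whitespace or an opening bracket, closing mark otherwise) is just a plausible assumption.

```python
# Characters after which a quote mark is assumed to be an opening one.
OPENERS = " \t\n([{"


def smarten(text: str) -> str:
    """Replace straight quotes (0x22, 0x27) with typographic ones based on context."""
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else " "   # treat start of text like a space
        opening = prev in OPENERS
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
        elif ch == "'":
            out.append("\u2018" if opening else "\u2019")
        else:
            out.append(ch)
    return "".join(out)


stored = smarten('We want "pretty" marks')
mangled = smarten("x'\"")   # a single-quote then a double-quote typed after x

print([f"U+{ord(ch):04X}" for ch in stored if ord(ch) > 0x7F])
print([f"U+{ord(ch):04X}" for ch in mangled])
```

Under that rule, what gets stored is no longer the 0x22 or 0x27 that came off the keyboard, and a technical expression typed with those keys is silently turned into something else.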
If you copy and paste from Word, it copies those binary codes, not the 0x22 you originally typed. If you paste 0x22 into Word (as opposed to pressing the keys for "), you just get the plain glyph ". Later, other algorithms in Word might kick in and do the translation, replacing 0x22 with 0x201C or 0x201D. And if you leave the 0x22 in there and copy it out of Word, the translation into the typographic characters does not occur.
WhatsApp doesn't do any of this when you type. Text-entry widgets in Outlook for Web, Google Mail, or this forum don't do any of that. None at all. You only get that pattern of encoded typographical quote marks (not 0x22 or 0x27) when you type into Word. The statement you purport to be from a mathematician was typed into a Microsoft product, which then mangled it.
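Anyone can check a piece of text for this themselves. Here's a short Python sketch that lists every typographic quote mark in a string along with its code point. The function name and the sample strings are mine, made up for illustration.

```python
# The four typographic quote characters discussed above.
TYPOGRAPHIC_QUOTES = {"\u2018", "\u2019", "\u201c", "\u201d"}


def find_typographic_quotes(text: str):
    """Return (position, code point) for each typographic quote mark in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in TYPOGRAPHIC_QUOTES
    ]


typed_plain = "x'\""             # straight quotes: 0x27 then 0x22
rewritten = "x\u2019\u201d"      # the same keystrokes after a typographic rewrite

print(find_typographic_quotes(typed_plain))
print(find_typographic_quotes(rewritten))
```

Text that only ever passed through plain text-entry boxes comes back with an empty list. Text that comes back with U+2019 or U+201D in it was rewritten somewhere along the way.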
At my company we vacillate between turning the helpful rewrite features off or leaving them on. We want them to do their thing when we're writing plain English documents. But we don't want them to happen when we're typing technical expressions or symbols. At best they get in the way, and at worst they silently rewrite things incorrectly.
And we never, ever, ever use a single-quote followed by a double-quote, '", to approximate a forward triple prime. You always type three single-quotes, ''', or three backticks, ```, for a reverse triple prime. That's where you slipped up. Your "mathematician" (who clearly exists only in your head for the purposes of this thread) made a cardinal mistake. Your Microsoft typography slip-up just helped us see it more clearly.
So there's the full debunking of your, "It just copied that way, I swear," ploy. This isn't the same situation as the Sheen report, and you don't know enough about how computers work to lie effectively about how your symbols got botched up again.
____________
[1] I'm aware that the first 256 Unicode code points are identical to the ISO/IEC 8859-1 values, and that the browser is most likely just using Unicode all the time.